Systematic Review

Horizontal Autoscaling of Virtual Machines in Hybrid Cloud Infrastructures: Current Status, Challenges, and Opportunities

by Thushantha Lakmal Betti Pillippuge *, Zaheer Khan * and Kamran Munir *
Computer Science Research Centre (CSRC), School of Computing and Creative Technologies, College of Arts, Technology and Environment, University of the West of England (UWE), Bristol BS16 1QY, UK
* Authors to whom correspondence should be addressed.
Encyclopedia 2025, 5(1), 37; https://doi.org/10.3390/encyclopedia5010037
Submission received: 23 December 2024 / Revised: 24 February 2025 / Accepted: 28 February 2025 / Published: 6 March 2025
(This article belongs to the Section Mathematics & Computer Science)

Abstract:
The deployment of virtual machines (VMs) within the Infrastructure as a Service (IaaS) layer across public, private, or hybrid cloud infrastructures is prevalent in various organisational settings for hosting essential business services. However, achieving rapid elasticity, or autoscaling, and ensuring quality of service amidst fluctuating service demands and available computing resources present significant challenges. Unlike the Platform as a Service (PaaS) and Software as a Service (SaaS) layers, where cloud providers offer managed elasticity features, the VMs at the IaaS layer often lack such capabilities. This paper scrutinises the constraints surrounding the rapid elasticity of VMs within single and hybrid cloud environments at the IaaS layer. It provides a critical analysis of the existing research gaps, emphasising the necessity for the horizontal elasticity of VMs extended across hybrid clouds, coupled with predictive capabilities integrated into the elasticity mechanism. This paper’s focus is particularly beneficial in scenarios where workloads require VM provisioning from multiple clouds to eliminate vendor lock-in and enhance quality of service (QoS) assurances, especially in instances of platform failures. Through critical examination, several research challenges are identified, delineating the existing research gap and outlining future research directions. This paper contributes to the research challenges of VM elasticity in complex cloud environments and underscores the imperative for innovative solutions to address these challenges effectively.

1. Introduction

In today’s IT infrastructures, a significant number of services are hosted on virtual machines (VMs), which are sourced from self-managed on-premise private clouds, public clouds, or a mixture of both. In all three cases, “rapid elasticity” [1], or autoscaling, is identified as one of the key characteristics of cloud computing and provides on-demand scalable computing resources. This is achieved either by scaling the number of VMs (Horizontal Autoscaling) or by adjusting the computing resources allocated to a VM from a resource pool (Vertical Autoscaling). However, finding an appropriate and timely equilibrium between horizontal and vertical resource allocation, while managing the associated costs to ensure that service level agreements (SLAs) are met, is a challenging task across all three cloud deployment models [2]. Furthermore, for both horizontal and vertical scaling, autoscaling techniques are divided into two main categories based on how and when the autoscaling actions are triggered: reactive and proactive autoscaling [3]. Reactive autoscaling adjusts cloud resources in real time based on workload changes, using thresholds or rules. Proactive autoscaling uses historical data to predict future demands and allocate resources in advance, ensuring that resources are ready for future workload requirements.
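The reactive category can be illustrated with a minimal threshold-based sketch; the thresholds, bounds, and single-VM step size below are illustrative assumptions, not any provider's defaults:

```python
# Minimal sketch of a reactive, threshold-based horizontal autoscaler.
# All threshold and bound values are illustrative assumptions.

def reactive_scaling_decision(cpu_utilisation, current_vms,
                              scale_out_threshold=0.75,
                              scale_in_threshold=0.30,
                              min_vms=2, max_vms=20):
    """Return the desired VM count for the observed average CPU utilisation."""
    if cpu_utilisation > scale_out_threshold and current_vms < max_vms:
        return current_vms + 1          # scale out: add one VM
    if cpu_utilisation < scale_in_threshold and current_vms > min_vms:
        return current_vms - 1          # scale in: remove one VM
    return current_vms                  # within bounds: no action
```

A proactive autoscaler would call a decision function like this on *forecast* utilisation rather than on the latest observation.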

1.1. Motivation and Scope of the Paper

In cloud computing environments, VMs operate at the IaaS layer. IaaS is one of the three service models, alongside the PaaS and SaaS layers [4]. Currently, it is not possible to extend VM autoscaling across multiple cloud platforms, as can be seen from the comparisons provided in Section 5 “Commercial Autoscaling Approaches in Public Cloud Environments”. Another challenging task is accurately determining the number of VMs required to cater for a particular workload and then making proactive autoscaling decisions. While public cloud platforms like AWS, Azure, and Google Cloud provide autoscaling capabilities, the offerings are mainly reactive, with limited proactive autoscaling support and limitations that need to be addressed. For example, proactive scaling necessitates precise workload prediction using machine learning (ML)-based models, a capability that is not typically included in standard cloud services. Although some ML capabilities are available, such as time series-based predictions, the lack of integrated advanced prediction technologies forces consumers to rely on in-house-developed or third-party solutions for workload predictions and resource allocations. In addition, the inherent delays introduced by VM provisioning and boot times, and the resulting delays imposed on scale-out actions, impact application performance during peak and sudden demands. To address these gaps, the authors believe that the autoscaling offerings in public cloud platforms should be equipped with the following capabilities:
  • Precisely forecasting the future demand by analysing the historical trends with anomaly detection.
  • Accurate autoscaling decision execution to address the VM provisioning or boot time delays (cold start).
  • More flexibility in cloud native VM autoscaling solutions, where the VM autoscaling can be extended across hybrid clouds.
  • Greater versatility in defining custom autoscaling policies using more user-defined and workload-specific autoscaling metrics.
These factors have motivated this research to explore the reactive and proactive autoscaling techniques currently available in the most popular public cloud platforms and identify the current gaps, especially when providing proactive (predictive) autoscaling for VMs and extending the VM autoscaling across hybrid cloud platforms. Therefore, this comprehensive systematic literature review focuses on the challenges of extending autoscaling across multiple cloud platforms (autoscaling in hybrid clouds) at the IaaS layer and reviews proactive autoscaling solutions. The main scope and objective of this review is to identify the gaps in horizontal autoscaling, specifically in proactive autoscaling scenarios, in both commercial implementations and current academic research. This is further broadened to investigate the potential of extending the horizontal autoscaling at the IaaS layer across hybrid cloud platforms, removing the limitation of IaaS horizontal autoscaling restricted to a single cloud platform. Although the primary focus of this work lies in the IaaS layer, there has been some research work selected from the PaaS layer as well, with the intention of re-examining the approaches employed in those works that enabled proactive and hybrid cloud autoscaling and adopting those for VM autoscaling at the IaaS layer. The main contributions of this paper are:
  • Reviewing the current research work related to VM autoscaling to identify the research directions and gaps in proactive autoscaling and hybrid cloud autoscaling.
  • Identifying how proactive autoscaling and workload classification are introduced for container autoscaling using ML-based technologies.
  • Performing a comparison of the autoscaling offerings of three major public cloud platforms and one open-source cloud platform to identify the current gaps and issues.
  • Defining future research directions to address the gaps identified in this review.

1.2. Structure of the Paper

To provide a better understanding of the contents discussed in this review, Figure 1 depicts an overall taxonomy of concepts related to this research. In addition, commonly used autoscaling techniques in cloud computing (both the reactive and proactive autoscaling categories) are briefly presented in Section 2. Then, to present a holistic view, a review of the autoscaling issues at the PaaS layer has also been performed. The motive behind this is to identify the extent to which the approaches and techniques proposed at the PaaS layer are suitable to enhance proactive autoscaling at the IaaS layer. In Section 3, a critical review of cloud autoscaling is presented. It focuses on three aspects: reactive autoscaling, proactive autoscaling in a single-cloud infrastructure, and proactive autoscaling across multiple cloud infrastructures. In Section 4, the research methods are presented, followed by the classification of autoscaling techniques in Section 5. Then, the discussions on commercial deployments of autoscaling are presented in Section 6. This section focuses on three selected commercial cloud infrastructures. Section 7 summarises the challenges identified during the review of autoscaling approaches in previous research works and commercial deployments. Finally, conclusions and future research directions are presented in Section 8. Table 1 below provides an overview of this paper’s structure.

2. Rapid Elasticity for VMs in Cloud Platforms and Current Challenges

Rapid elasticity (also known as autoscaling) allows the adjustment of computing resources both horizontally and vertically to address changing resource demands. In this section, various autoscaling concepts are briefly elaborated to fully comprehend the scale of the problem.

2.1. Rapid Elasticity at Different Cloud Deployment Layers

To the best of our knowledge, for services operating at the Platform as a Service (PaaS) and Software as a Service (SaaS) layers, rapid elasticity features are provided as a managed service by cloud service providers. This means that deployment teams have significantly reduced planning and administrative overhead when deploying rapid elasticity, or autoscaling, at those layers. From the end-user perspective, the services they receive in terms of an operating environment (such as a Linux container or a VM) are seamless, i.e., they are neither concerned with nor aware of the layer in which the services they receive are implemented. However, from a business perspective, it is an architectural or design decision that defines the extent of the elasticity or autoscaling embedded in a certain service or application provided to end-users. This integrally relates to the Service Level Agreement (SLA) of a particular service that the end-user expects to receive. According to Mirobi, G. Justy, and L. Arockiam [5], from a consumer perspective, the SLA enables the consumer to obtain information about the service they receive and provides a description of the service, the layer the service operates in, and the business-level policies of the service, including the areas of shared responsibility and key aspects of the agreed security and management strategies. This also includes information about the quality of service (QoS), which indicates the level of performance, reliability, and availability of the services offered to the consumer by the provider [6]. For example, for services operating at the PaaS layer, the administrative overhead of planning the extent of the autoscaling footprint across a single cloud platform or across a hybrid cloud is minimal, and this overhead can be further reduced if customers choose a service provider that manages container services operating at the PaaS layer, such as AWS EKS [7] or Google Containers [8].
However, for services operating at the IaaS layer, infrastructure teams do not have the advantage of cloud provider-managed autoscaling. In this case, infrastructure teams need to spend significant amounts of time planning autoscaling by identifying workload behaviours and matching the correct autoscaling strategy, policies, etc. Given the inherent complexities of today’s IT workloads and the varying nature of demand for services from end-users or businesses, manually planning and finding an appropriate autoscaling solution become significantly challenging tasks. Autoscaling increases cost efficiency, maintains QoS and SLAs, and increases service resilience. However, poorly planned and deployed autoscaling in production infrastructures will not provide optimal results and will introduce significant SLA/QoS violations. As an example, consider a web-based application front end hosted across two data centres in two different geographical regions, as shown under Case 1 in Figure 2; e.g., one data centre is in London, and the other is in Dublin. The virtual machines (VMs) hosting the web application are equally spread across the two data centres, and the autoscaler continues to add or remove VMs as demand fluctuates. In this scenario, the autoscaling feature horizontally scales out, or scales in, the number of VMs the web-based application has available for end-user access. The identified key challenges are as follows:
  • How do we determine the optimum number of VMs to provision to avoid SLA/QoS violations?
  • How do we know exactly when to provide the additional required capacity?
  • What are the measures we could take to reduce provisioning latency?
Finding accurate values for these parameters determines how efficiently a particular autoscaling approach is deployed in a cloud infrastructure. Although VM autoscaling at the IaaS layer is not straightforward even within a single cloud infrastructure, it becomes more challenging in hybrid cloud infrastructures. A considerable and increasing number of organisations now obtain services from multiple cloud providers, for instance, by extending an on-premises private cloud infrastructure to one or more public cloud platforms; hence, novel approaches to autoscaling in hybrid clouds are needed. This is not the same as the load or traffic sharing offered by a load balancer, which can easily be deployed to send traffic to multiple destination IP addresses hosted across various locations. Load balancing redistributes the load from an entry or ingress point (a traffic-receiving node or device) to a set of available back-end nodes, whereas in autoscaling, the back-end computing resources are scaled in or out based on the fluctuating demand for resources [9].
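As a minimal illustration of the first challenge (estimating the optimum number of VMs), the fleet size can be derived from the expected request rate; the per-VM capacity, headroom, and bounds below are invented for illustration and would in practice come from load testing and the SLA:

```python
# Hedged sketch: sizing a VM fleet from expected demand. per_vm_rps and
# headroom are illustrative assumptions, not values from any provider.
import math

def required_vms(expected_rps, per_vm_rps=250.0, headroom=0.2,
                 min_vms=2, max_vms=50):
    """Estimate the VM count needed to serve expected_rps with spare headroom,
    clamped to the fleet's configured minimum and maximum sizes."""
    vms = math.ceil(expected_rps * (1 + headroom) / per_vm_rps)
    return max(min_vms, min(max_vms, vms))
```

The second and third challenges (when to scale, and provisioning latency) then reduce to feeding this function a sufficiently early demand forecast.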

2.2. Capability of Multi-Cloud Support for Autoscaling

To the best of our knowledge, services operating at the PaaS layer can span multiple cloud platforms; however, despite some research, there are no commercially available solutions that achieve autoscaling across multiple clouds for workloads hosted at the IaaS layer. This leaves cloud service consumers at the IaaS layer with a vendor dependency when distributing workloads across a hybrid cloud; hence, consumers are unable to take advantage of autoscaling, provide a consistent level of business services, and extend service resilience. For instance, in the case of a single-cloud infrastructure, if an entire data centre becomes temporarily unavailable in a zone or site, the VMs previously running on that site will be redistributed across the other available site(s) through autoscaling. However, in similar situations in a hybrid cloud infrastructure, consumers do not have the option to autoscale and hence cannot benefit from the rapid elasticity characteristic. In the above context, Case 2 in Figure 2 depicts a scenario in which VMs are redistributed across the remaining availability zones following a zonal failure, or as a deliberate operational task of moving VMs from one zone to another by updating the relevant autoscaling rule sets. If the scenario depicted in Case 2 in Figure 2 can be extended across multiple cloud platforms (see Case 3 in Figure 2), this can be cost-efficient and resilient for consumers. For example, consumers can exploit features such as AWS spot instances [10] and GCP spot VMs [11] if a bursting workload can be extended under a common autoscaling VM set, and the VMs can be dynamically moved across multivendor cloud infrastructures based on the best offers received from either of the two cloud vendors.

2.3. Proactive Autoscaling Support

Proactive autoscaling options are either limited or unavailable in public cloud platforms. This issue exists at both the IaaS and PaaS layers. Some commercial cloud providers offer solutions, such as AWS “Predictive scaling for Amazon EC2 Auto Scaling” [12], GCP “Scaling based on predictions” [13], and Azure “Predictive Autoscaling” [14]; however, the extent of deployment is somewhat constrained by the limited features that consumers can use when setting up proactive autoscaling options. For example, Predictive Scaling is limited to historical data and does not perform any self-learning or adaptive scaling. Additionally, in Azure and GCP, the thresholds that can be used are limited to CPU usage-based metrics only. Also, none of the public clouds, including the ones referenced above, support an autoscaling solution that extends horizontal autoscaling across hybrid cloud platforms (this is discussed in Section 5). It can be argued that open-source multi-cloud management platforms, like the Melodic platform [15], offer this capability. However, Melodic and its subsequent variants originated from the Melodic algorithm, which offers only the deployment of resources to multiple public clouds, the optimisation of resources, and cost utilisation, and does not offer the autoscaling of VMs across multiple clouds as a single resource pool.
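The basic idea behind such predictive scaling offerings can be sketched as a trend forecast driving the scaling decision; the least-squares fit, look-ahead, and threshold below are illustrative simplifications, not any provider's actual algorithm:

```python
# Sketch of predictive scaling: fit a linear trend to recent CPU samples
# and scale out before the forecast breaches a threshold. The closed-form
# least-squares fit and all parameter values are illustrative assumptions.

def forecast_cpu(history, steps_ahead=3):
    """Forecast CPU utilisation via ordinary least squares on (t, cpu) pairs."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den if den else 0.0
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + steps_ahead)

def proactive_decision(history, current_vms, threshold=0.75):
    """Scale out in advance if the forecast exceeds the threshold."""
    return current_vms + 1 if forecast_cpu(history) > threshold else current_vms
```

Commercial offerings replace the linear fit with time-series models trained on longer histories, but the decision structure (forecast, then act ahead of demand) is the same.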

3. Autoscaling in Clouds—Previous Work

There is considerable academic research exploring the use of autoscaling in all service models and deployment models of cloud platforms. This body of research highlights the issues and gaps in current autoscaling deployments and proposes various frameworks and architectural solutions. This section presents a critical review of the literature shortlisted for the PaaS and IaaS service models. To maintain simplicity, three subsections cover the issues, limited to (i) reactive autoscaling, (ii) autoscaling in a single cloud infrastructure with dynamic or predictive scaling support, and (iii) autoscaling across multiple clouds.

3.1. Reactive Autoscaling

Bouabdallah, R. et al. [16] suggested a methodology for resolving CPU bottleneck issues in horizontal elasticity by using a reactive model to perform scale-out operations and a proactive approach to perform scale-in operations. The solution is based on a multiagent system that acts within the concept of monitor, analyse, plan, execute, and knowledge (MAPE-K) loops to achieve autoscaling. The MAPE-K loop is a control theory-based feedback loop, and Arcaini, P. et al. [17] identified it as the most influential reference control model for autonomic and self-adaptive systems. The multiagent system is designed with three agents: a monitoring agent collects the CPU metrics from the VMs (representing the monitoring phase); an analyser agent represents the analysis phase, analyses the collected CPU metrics, and evaluates them against two models for reactive and proactive autoscaling; and a manager agent performs the autoscaling actions based on the selected model, i.e., either by using more VMs to remove the CPU bottleneck or by scaling in whenever possible to optimise resource utilisation. However, the model does not apply this self-adaptiveness to scaling out; rather, it only performs specific reactive actions for scale-out operations when needed.
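The MAPE-K phases can be sketched in a few lines; the thresholds, the shared knowledge structure, and the single-VM scaling step below are hypothetical, standing in for the three agents described above:

```python
# Illustrative sketch of a MAPE-K control loop for autoscaling. The metric
# source, thresholds, and policies are hypothetical; in the multiagent
# design above, each phase would be handled by a separate agent.

class MapeKAutoscaler:
    def __init__(self, knowledge=None):
        # Knowledge base shared by all phases (thresholds, metric history).
        self.knowledge = knowledge or {"high": 0.8, "low": 0.3, "history": []}

    def monitor(self, cpu_sample):
        self.knowledge["history"].append(cpu_sample)

    def analyse(self):
        return self.knowledge["history"][-1]          # latest observation

    def plan(self, cpu):
        if cpu > self.knowledge["high"]:
            return "scale_out"
        if cpu < self.knowledge["low"]:
            return "scale_in"
        return "no_action"

    def execute(self, action, current_vms):
        delta = {"scale_out": 1, "scale_in": -1}.get(action, 0)
        return max(1, current_vms + delta)

    def step(self, cpu_sample, current_vms):
        """Run one full monitor -> analyse -> plan -> execute iteration."""
        self.monitor(cpu_sample)
        return self.execute(self.plan(self.analyse()), current_vms)
```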
Iqbal, W. et al. [18] proposed VM autoscaling for multi-tier applications at the IaaS layer. Their method uses a monitoring approach that captures the application traffic through access logs every second and then processes the arrival rate of traffic in the last “k” intervals to predict the arrival rate for the next “k” intervals. The observed response times and the predicted arrival rates are used to create a model that predicts the response time using polynomial regression. Based on the predicted response time, the required resources are provisioned automatically by using an Application Configuration map. This Application Configuration map is derived using a Random Decision Forest (RDF) and trained offline using historical performance data generated against custom provisioning policies that are manually defined. The state transition configuration map extracted from the RDF classifier encodes a constant function against the performance variations of the underlying VMs. This map defines the number of resources required at any given time interval by the application to maintain a required level of performance. The authors [18] claim that the proposed method differs from traditional machine learning-based autoscaling methods, which are trained using static historical data and therefore fail to capture the relationship between varying performance, arrival rates, and resource requirements; this is because the data used to train such models may be obsolete when compared against different levels of performance. The key algorithm of this framework is the Random Decision Forest, which is an extension of the Decision Tree (DT) predictive model. DTs recursively partition the covariate space into subspaces so that each subspace initiates a different prediction function [19]. However, DTs are disadvantageous because they cannot grow to arbitrary complexity without compromising the accuracy of generalisation to unseen data.
To resolve this, Ho, Kam T. [20] proposed the RDF by using a forest of trees constructed in randomly selected subspaces. The method uses the predicted response time and matches the resource requirements using the RDF, and then it reactively allocates the resources. Although this method uses an RDF-based technique to predict the resource requirements for a particular workload, autoscaling only occurs on demand (i.e., the RDF is used to classify the resource requirements, but the scale-out and scale-in operations take place reactively), so this method can be classified as a reactive autoscaling method operating on a single cloud platform.
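The role of the configuration map can be illustrated with a much simpler stand-in: the original work derives the map with an RDF classifier, whereas the static lookup table below, with invented arrival-rate bands and VM counts, only shows how a predicted arrival rate is translated into a resource allocation:

```python
# Toy stand-in for the Application Configuration map described above. The
# original work derives this mapping with a Random Decision Forest; the
# band edges and VM counts here are invented purely for illustration.
import bisect

# Upper bounds of arrival-rate bands (requests/s) and the VM count per band.
BAND_EDGES = [100, 500, 1000, 5000]
VMS_PER_BAND = [1, 2, 4, 8, 16]   # one more entry than edges (last = overflow)

def vms_for_predicted_rate(predicted_rps):
    """Look up the VM count for the band containing the predicted rate."""
    return VMS_PER_BAND[bisect.bisect_left(BAND_EDGES, predicted_rps)]
```

The learned map plays the same role as this table, but with bands and allocations inferred from historical performance data rather than fixed by hand.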
Liu, C. et al. [21] proposed a categorical prediction approach for predicting varying workloads for service clouds. The method deployed both Linear Regression (LR) and Support Vector Regression (SVR) for evaluating the predictions. The proposed approach feeds the latest observed workloads back into the prediction model and dynamically updates the model parameters, which determine the autoscaling action. A 0–1 integer programming model was used to trade off prediction accuracy and prediction time using the sum of the L2-norm of the workload average. The architecture consists of an Admission Controller, which determines the user’s access, and a resource manager module, which adjusts the resources dynamically as per user requests. The database module collects and stores workload logs, which are used to analyse and categorise the workloads and to train/test the prediction model. The Predictor Module consists of an LR/SVM-based workload classifier. It employs regression-based machine learning algorithms for predictions, and control theory is used to perform autoscaling in accordance with the predictions. Using LR, linear correlations between dependent and independent variables are captured through data analysis and modelling [22]. Compared to LR, the SVM offers classification in addition to regression, with maximised predictive accuracy, and automatically avoids overfitting [23]. The Predictor Module is used to classify the workloads based on the different data types and variations of the workloads. A distinct prediction method is applied to a given workload based on the optimal solution provided by the workload classifier. The sum of these workloads is considered the total predicted workload of the platform. This is taken as the input to estimate the number of VMs, and the resource manager provisions that number of VMs.
They claimed that this solution “can dynamically determine the type of service workloads effectively” and addressed the prediction of optimal resource provisioning based on various types of workloads. Further analysis indicated that they used the LR/SVM and control theory-based approach to predict optimal resource provisioning for specific workload types and to reactively perform resource allocations; therefore, this approach lacks proactive autoscaling and is limited to a single cloud platform.
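The classify-then-predict structure of the Predictor Module can be sketched as follows; the coefficient-of-variation split and the two simple per-class predictors are illustrative stand-ins for the LR/SVR models used in [21]:

```python
# Hedged sketch of a workload classifier dispatching to per-class predictors.
# The coefficient-of-variation threshold and the two trivial predictors are
# invented stand-ins for the LR/SVR models in the original work.
import statistics

def classify_workload(samples, cv_threshold=0.25):
    """Label a workload 'stable' or 'bursty' by its coefficient of variation."""
    mean = statistics.mean(samples)
    cv = statistics.pstdev(samples) / mean if mean else 0.0
    return "bursty" if cv > cv_threshold else "stable"

def predict_next(samples):
    """Dispatch to a per-class predictor, as the Predictor Module does."""
    if classify_workload(samples) == "stable":
        return statistics.mean(samples)           # smooth: the average suffices
    return max(samples)                           # bursty: provision for peaks
```

The essential point is the dispatch: each workload class gets the prediction method that the classifier deems optimal for it, and the per-class predictions are then summed into the platform's total predicted workload.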
Wadia, Y. et al. [24] proposed an open standard-based portable framework called iAutoscaler, which can be customised for use across a range of public and private clouds to provide autoscaling capabilities. The solution contains a Web-Based User Interface (Web UI), a Monitoring Engine (ME), a Decision Engine (DE), a Provisioning Manager (PM), and a database (DB) as its key components. The Web UI allows users to manage and configure autoscaling in various clouds. The ME, which comprises a monitoring manager and monitoring agents, collects metrics providing a snapshot of the cloud VM instance status and stores them in the DB; it processes the collected data and updates the DE, which makes rule-based or threshold-based autoscaling decisions. These autoscaling decisions reach the cloud platform via the Provisioning Manager, which eventually performs the autoscaling actions accordingly. In summary, the solution is built upon customised open-source tools and performs reactive autoscaling in the single cloud platform on which iAutoscaler is specifically configured to operate; therefore, it provides the advantages of portability across different cloud platforms and a unified administrative interface to perform autoscaling on different cloud platforms. However, this solution is unable to extend autoscaling across multiple cloud platforms as a single entity and does not provide any proactive autoscaling capability.
Si, W., Pan, L. and Liu, S. [25] focused on improving cost savings through the timely allocation of autoscaling VMs used as a platform for SaaS applications. Another objective of this framework is to avoid SLA violations for the users of the applications, which may incur penalties. The solution intends to provide optimal VM allocations based on the requests arriving at a web application and attempts to obtain the best-priced AWS EC2 spot instances. To achieve this, they used an Online Auto-Scaling algorithm, which adds new VMs to serve the incoming requests based on real-time information relating to the maximum capacity of the currently serving VMs and the number of request refusals (or SLA violations) on the SaaS-based application. Based on these metrics, the algorithm dynamically adds new VMs, and new requests are forwarded to those newly provisioned VMs. The algorithm does not use any ML technique but is based on a mathematical formula derived from metrics related to the arrival rate, the VMs currently running at maximum capacity, and request refusals at the application level. Additionally, the solution is limited to AWS only; hence, this approach cannot be directly applied to other cloud platforms without introducing significant modifications. For example, although some areas of this algorithm, such as the arrival rate and SLA violations, can be reused for similar applications elsewhere, the VM-related areas need a complete rework, as they are based on the behaviour of AWS spot instances and their pricing details.
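The refusal-driven scale-out idea can be sketched as follows; the saturation check and the rule of sizing new VMs like the largest current VM are simplifying assumptions made here for illustration, not the actual formula from [25]:

```python
# Sketch of a refusal-driven online scale-out decision: add VMs when the
# serving fleet is saturated and requests start being refused. The sizing
# rule and refusal budget are illustrative simplifications.
import math

def online_scale_out(arrival_rate, vm_capacities, refusals, refusal_budget=0):
    """Return the number of extra VMs to add, given real-time capacity and
    request-refusal counts; 0 means no scale-out is needed."""
    total_capacity = sum(vm_capacities)
    if refusals > refusal_budget or arrival_rate > total_capacity:
        unit = max(vm_capacities)                  # capacity of one new VM
        deficit = max(arrival_rate - total_capacity, 1)
        return math.ceil(deficit / unit)           # VMs to absorb the excess
    return 0
```

Because the decision uses only live metrics (capacity, arrivals, refusals), it is purely reactive, which is why the original approach is classified under reactive autoscaling.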

3.2. Autoscaling in a Single-Cloud Infrastructure with Dynamic or Predictive Scaling Support

Bibal Benifa, J.V. and Dejey, D. [26] proposed a Reinforcement Learning (RL) [27,28] based approach for proactively scaling virtual resources in cloud environments. The solution uses the State–Action–Reward–State–Action (SARSA) RL algorithm. SARSA is a model-free RL algorithm that improves the actions an agent takes over time without the use of a predefined model for training. In SARSA, agent learning is based on the “action value function”, which consists of the “state and action” and defines the value of the reward the agent should be given depending on the action the agent takes in a particular state [29]. In this proposed approach, the agent used for autoscaling decisions learns the autoscaling policies while it is operating. The action the agent needs to perform is to add or remove a VM based on the autoscaling policies. As these policies improve over time due to parallel learning and rewarding, accurate autoscaling decisions can be made to predict the correct resource requirements for heterogeneous and fluctuating workloads. The solution consists of two main components: the resource monitor (RM) and an autoscaler. The RM accumulates the SLA profiles of the VMs and the utilisation and performance metrics of the applications running on each VM. This information is then consolidated and fed into the autoscaler, which computes the optimal scaling policy and performs an in/out scaling action. This results in the system load being evenly balanced across the VMs through a load balancer. The method targeted workloads running on VMs provisioned on a Xen hypervisor. This work introduced effective resource utilisation using an RL-based dynamic autoscaling approach, and it seems worth extending and evaluating this framework for hybrid cloud environments.
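The SARSA update at the heart of such an approach can be sketched as follows; the states, actions, and reward value are invented for illustration, while alpha and gamma are the standard learning-rate and discount hyperparameters:

```python
# Minimal sketch of the on-policy SARSA update used to learn scaling
# policies. The state/action names and the reward are illustrative.
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """One SARSA step: Q(s,a) += alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])
    return Q

# Example: reward the agent for scaling out under high load.
Q = defaultdict(float)
Q = sarsa_update(Q, "high_load", "scale_out", r=1.0,
                 s_next="normal_load", a_next="no_action")
```

Unlike Q-Learning, SARSA updates towards the value of the action the agent will actually take next, which makes the learned policy reflect the agent's own (exploratory) behaviour.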
Arabnejad, H. et al. [30] used a framework based on Fuzzy Q-Learning for Knowledge Evolution (FQL4KE), which learns and modifies autoscaling rules at runtime. This relieves users of the need to provide the knowledge behind the autoscaling rules; instead, the system can adjust the resources automatically without any prior knowledge. The solution has been implemented and tested on OpenStack at the IaaS layer as a proof of concept, and the authors claim that the solution can also be applied to other cloud providers, such as Azure. The method uses an enhanced version of Q-Learning combined with fuzzy systems. The performance of Q-Learning algorithms degrades when the state space expands, owing to slow learning speed [31]. This issue has been resolved by combining Q-Learning with Takagi–Sugeno fuzzy systems to form Fuzzy Q-Learning [32]. This work also uses the MAPE-K loop to continuously monitor various performance metrics, evaluate the system status against performance goals, and then adjust the resources, i.e., the provisioning of VMs, to satisfy those goals. This work introduces an approach for self-learning workloads and autonomously adapting autoscaling decisions for dynamic workloads. One main advantage of this framework is that it monitors various performance metrics that can be used as thresholds; this feature can be further developed when considering multiple metrics in a hybrid cloud autoscaling scenario.
Golshani, E. and Ashtiani, M. [33] proposed a proactive autoscaling mechanism for horizontal scaling. The solution uses a Temporal Convolutional Neural Network (TCN)-based algorithm to predict future service requests and then maps the predicted workloads with both real-time and future resource requirements. Additionally, an autoscaling decision-making method is designed based on a multicriteria decision-making (MCDM) approach. The solution addresses the issue of predicting future resource requirements based on historical workloads captured during a fixed time window and making autoscaling decisions accordingly. Therefore, this study provides a framework for proactive/autonomous autoscaling. However, the solution focuses only on autoscaling in a single-cloud platform.
Chudasma, V. et al. [34] focused on elastic resource allocation for highly dynamic and unpredictable workloads in hybrid cloud environments using an algorithm based on Deep Learning–Long Short-Term Memory (LSTM) and Queuing Theory concepts. LSTM was proposed by Hochreiter and Schmidhuber in 1997 [35] as a solution to the exploding and vanishing gradient issues in Recurrent Neural Networks (RNNs) [36]. According to Hochreiter and Schmidhuber [35], LSTM is a novel approach for RNNs that uses an appropriate and efficient gradient-based learning algorithm. This gradient-based algorithm enforces constant error flow through the internal states of special units, which makes it possible to learn to bridge time intervals of more than 1000 steps. Chudasma, V. et al. [34] used LSTM to proactively predict the adequate amount of computing resources for short-term resource demand and used the model to forecast the resource requirements. This has contributed to improving proactive autoscaling for bursting workloads through the exploitation of LSTM.
Biswas, A. et al. [37] focused on the automatic provisioning of cloud resources by an intermediary enterprise, which provides AWS Virtual Private Cloud services to third-party consumers using resources acquired from a public cloud service provider, thereby delivering services while reducing costs. The key component of the solution is the Broker that runs within an organisation called the “Intermediary Enterprise (IE)”; this Broker is responsible for autoscaling the resources acquired from a public cloud provider. The solution combines a reactive autoscaling technique with a machine learning-based proactive approach for scaling out resources in advance of demand. As in Liu, C. et al. [21], LR and SVM are employed; however, whereas Liu, C. et al. used them for workload prediction and classification, in this work, LR and SVM are used for predicting workload patterns. The limitation of this solution is that it is focused on a single cloud platform only; therefore, it can perform autoscaling either on AWS or, with the required modifications, on another cloud platform, but it cannot be applied to autoscaling in hybrid cloud platforms.
A reinforcement learning-based approach to autoscaling EC2 instances on AWS, proposed by Garí, Y. et al. [38], focuses specifically on scientific workloads hosted in the cloud. Scientific workloads use workflows to declare the flow of processing steps and demand large amounts of computing, storage, and networking resources. The authors highlighted that scientific workflows can greatly benefit from reliable and affordable virtualised resources and elasticity characteristics, which also offer dynamically scalable resources on demand. Resource allocation can be improved by dynamically allocating resources for specific workflows by choosing a proper scaling policy. The scaling policy is chosen optimally using a Markov Decision Process (MDP) and Q-Learning-based methodology. This framework focuses on improving autoscaling features by proactively choosing the correct VM types and the number of VM allocations for the resource-intensive and dynamically changing demands of scientific workflows in a single cloud platform (AWS).
A framework that uses workload classification for predicting resource allocation was proposed by Alanagh, A. et al. [39]. This framework uses time series-based algorithms for classifying future workloads and then allocates or deallocates VMs through an autoscaler. In the proposed framework, the LR, SVM, and ARIMA models are used for workload prediction, and the most suitable model is selected automatically for each application workload. To find the best prediction model for a given workload, they trained a neural network model on a dataset comprising “N” workloads, each described by approximately 110 application features, with “m” models added as target classes to this dataset. Once the correct model for a particular workload has been selected, the workload for a job is predicted according to the characteristics of that model (based on either LR, SVM, or ARIMA). The evaluation of the framework was carried out in a simulated environment developed using the Python and R programming languages. The evaluation was performed using publicly available Google cluster data, which include Google Cloud server logs that provide granular data on individual jobs and the total CPU consumption for each job. Therefore, this framework has not been evaluated in a real-time cloud environment; however, with further research, the workload classification and prediction techniques proposed in this work can be used as a baseline for proactive autoscaling in either a single or a hybrid cloud environment.
Tournaire, T. et al. [40] proposed a framework that addresses the issue of defining precise threshold values, focusing on the queue occupancy thresholds that trigger VM autoscaling on physical machines. The objective of this work was to compute an optimal policy that can be used to add or remove VMs while preserving energy and simultaneously maintaining the SLA. To address the issue of calculating optimal threshold values for a multiserver queuing system, they considered two main optimisation approaches: Continuous Time Markov Chains (CTMCs) and the Markov Decision Process (MDP). The experimental setup modelled a physical server hosting VMs that serve requests fed through a continuous-time-controlled multiserver queuing model. Comparative evaluations were performed to identify which of the MDP or CTMC approaches provides the optimal thresholds, and the authors claim that the MDP approach performs significantly better than the CTMC. In summary, this work proposes a framework that can quickly and dynamically calculate the thresholds that trigger an autoscaling operation. The authors focused only on queue occupancy; it is worth considering how this approach can be further developed and repurposed for calculating thresholds on other metrics.
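To make the MDP formulation concrete, the following minimal Python sketch (our illustration, not Tournaire et al.'s actual model) solves a toy add/remove-VM decision problem with value iteration; the state space, costs, transition probability, and discount factor are all invented example values:

```python
# Toy MDP over (load level, VM count) states, in the spirit of threshold
# optimisation for autoscaling. Every constant below is an illustrative
# assumption: energy cost per VM, an SLA penalty when fewer than 3 VMs
# serve a high load, and a 20% chance the load level flips each step.
import itertools

LOADS = ("low", "high")
VMS = (1, 2, 3, 4)
ACTIONS = ("remove", "hold", "add")
ENERGY, SLA_PENALTY, P_FLIP, GAMMA = 1.0, 10.0, 0.2, 0.9

def step(state, action):
    """Apply an action; return the new VM count and the one-step cost."""
    load, n = state
    n2 = max(VMS[0], min(VMS[-1], n + {"remove": -1, "hold": 0, "add": 1}[action]))
    cost = ENERGY * n2 + (SLA_PENALTY if load == "high" and n2 < 3 else 0.0)
    return n2, cost

def q_value(state, action, V):
    """Expected discounted cost of taking `action` in `state` under values V."""
    load, _ = state
    n2, cost = step(state, action)
    other = "high" if load == "low" else "low"
    return cost + GAMMA * ((1 - P_FLIP) * V[(load, n2)] + P_FLIP * V[(other, n2)])

def value_iteration(iters=300):
    V = {s: 0.0 for s in itertools.product(LOADS, VMS)}
    for _ in range(iters):                       # synchronous Bellman backups
        V = {s: min(q_value(s, a, V) for a in ACTIONS) for s in V}
    return V

def extract_policy(V):
    return {s: min(ACTIONS, key=lambda a: q_value(s, a, V)) for s in V}

policy = extract_policy(value_iteration())
```

The extracted policy behaves like an implicit threshold rule: it adds VMs under high load with low capacity and removes them under low load with excess capacity, which is the kind of policy the framework computes optimal thresholds for.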
Chouliaras, S. and Sotiriadis, S. [41] proposed a framework named the Performance-Aware Autoscaler for Cloud Elasticity (PACE), which addresses the under- or overutilisation of systems used for running on-demand workloads. The proposed solution uses both reactive and proactive autoscaling techniques: reactive autoscaling uses threshold-based rules, and proactive autoscaling uses Convolutional Neural Networks (CNNs) and K-means clustering to forecast demand and generate autoscaling policies. Historical usage data were used to train the K-means clustering, and the time series was partitioned into three clusters: high, medium, and low demand. Then, Dynamic Time Warping Barycentre Averaging [42], an algorithm that enables the detection of similar patterns in different phases of a time series, was used to calculate the average sequence for each cluster, which is then used in conjunction with the average CPU usage percentage metrics to forecast future workload demand. This work focused on reactive and proactive autoscaling for containerised workloads operating at the PaaS layer and only discussed deployment in a single cloud platform. However, the technologies used for reactive and proactive autoscaling can be considered for VM autoscaling at the IaaS layer.
Wen, L. et al. [43] proposed a streamlined workload prediction framework for Kubernetes container autoscaling. This framework addresses the complications that diverse and heterogeneous workloads impose on accurate predictions, the use of inefficient static models for dynamic workload predictions, and the shortcomings of existing methods in extracting features across different time sequences. The framework employs Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN), which combines long-term and short-term load forecasting. CEEMDAN is an extension of Empirical Mode Decomposition (EMD), an adaptive methodology for analysing nonstationary and nonlinear time series, and removes the drawbacks of EMD caused by issues such as the occurrence of oscillations with significantly varying amplitudes within a mode, or the presence of nearly identical oscillations in different modes, referred to as “mode mixing” [44]. Although this work proposed a streamlined workload prediction method for container autoscaling under Kubernetes orchestration, the solution does not address the issues of VM autoscaling at the IaaS layer and was not evaluated for hybrid cloud autoscaling.
Another Kubernetes-based approach, introduced by Liu, H. et al. [45], presents a horizontally automated scaling engine utilising a predictive methodology for the Kubernetes HPA to proactively predict and configure resources based on historical data. The framework also performs trend detection for instantaneous and irregular load fluctuations in a high-concurrency environment. It uses an improved MAPE structure to predict the future pod count and scale the pods horizontally using prediction models based on a mixture of LSTM, ARIMA, and Informer. Informer is an enhanced version of the transformer, specifically aimed at resolving the efficiency issues of transformers when they are applied to long-term forecasts over substantial amounts of time-series data (Long Sequence Time-Series Forecasting, or LSTF) [46]. The transformer has an encoder–decoder structure: the encoder converts an input sequence of symbols into continuous representations (z1, …, zn), and the decoder then generates an output sequence (y1, …, ym) from these representations one element at a time. At each stage, the model operates in an auto-regressive manner, utilising the previously generated symbols as supplementary input for generating the subsequent symbol. In addition, the transformer employs stacked self-attention mechanisms along with pointwise, fully connected layers in both the encoder and decoder. Although this self-attention mechanism enables greater parallelisation and can achieve improved translation quality after training [47], the transformer model faces performance challenges when applied to LSTF tasks: the quadratic computation of self-attention, memory limitations in stacking layers for long inputs, and reduced speed in predicting long outputs.
To address these challenges, the Informer model was proposed with (i) a ProbSparse self-attention mechanism, which efficiently replaces canonical self-attention, achieving O(L log L) time complexity and memory usage for dependency alignments; (ii) a self-attention distillation operation, which prioritises dominant attention scores within the J-stacking layers, enhancing the processing of long sequence inputs; and (iii) a generative-style decoder, which produces long sequences in one step, preventing cumulative errors during inference [48]. However, in the proposed framework, Liu, H. et al. found that all three models produce close prediction performance; considering the training time and prediction speed, LSTM was chosen as the predictive model for the autoscaler.
Zheng, Y. et al. [49] proposed a predictive autoscaling framework that employs a hybrid toolset consisting of a transformer-based approach called PatchTST for accurate workload prediction and scaling, Adaptive Sequence Transformation for cold starts, time-sensitive Gaussian filtering for efficient decisions, and a Golang Libtorch wrapper to invoke Torch models. Of particular interest here is the workload prediction, which is achieved using the “Patch Time Series Transformer” (PatchTST) [50]. PatchTST is an enhanced version of state-of-the-art (SOTA) transformer-based models that provides significantly improved prediction accuracy over the SOTA models. PatchTST achieves this using two key designs: “Patching”, which aggregates time steps into subseries-level patches, enhancing locality and capturing comprehensive semantic information that is not accessible at the point level, and “Channel Independence”, which treats the multiple channels formed from a multivariate time series independently, with the information from each channel representing a separate input token to the transformer model. Although this approach addresses prediction inaccuracies due to significant differences in time series data, the cold-start problem, and the lack of Golang [51] support, the framework does not support extending proactive autoscaling across multiple cloud platforms.
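The "Patching" design can be illustrated with a short sketch (our simplification; the patch length and stride are arbitrary example values) showing how a univariate series is aggregated into subseries-level patches, each of which would serve as one input token:

```python
# Illustrative sketch of PatchTST-style patching: a length-L series is cut
# into overlapping windows of length `patch_len` taken every `stride` steps.
# Each patch would become one input token; "channel independence" means a
# C-channel multivariate series simply yields C such patch sequences.
def make_patches(series, patch_len, stride):
    """Return the list of length-`patch_len` patches taken every `stride` steps."""
    return [series[i:i + patch_len]
            for i in range(0, len(series) - patch_len + 1, stride)]

patches = make_patches(list(range(10)), patch_len=4, stride=2)
# patches -> [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Because each token now summarises a subseries rather than a single time step, the attention sequence length shrinks by roughly the stride factor, which is the source of the reported efficiency and accuracy gains.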

3.3. Autoscaling Across Multiple Clouds

Shreya, M.M. [52] proposed a solution to autoscale from on-premise private clouds to public clouds without manual intervention. This solution was developed using a virtual API proxy layer that maps other clouds as sub-clouds of the OpenStack cloud. To achieve this, an enhanced version of the Nova Compute API is used to intercept and forward user requests for compute resources to multiple sub-clouds based on the resource demand. This solution addresses the issue of autoscaling across multiple clouds; however, it does not involve distributed resource monitoring across the multiple cloud platforms involved in autoscaling operations. Hence, the data used for autoscaling decisions are limited to what is captured from only one cloud platform, which limits the granularity and efficiency of autoscaling operations. Furthermore, this solution does not support any proactive autoscaling.
Harwalker, S. et al. [53] explored how to improve the efficiency of resource allocation during peak loads to reduce the time it takes to allocate resources. Their solution is an extension of the OpenStack autoscaling functionality and is based on Predictive Scaling and Delta Correction using a Markov Decision Process (MDP)-based Hidden Markov Model (HMM). The Markov Decision Process [54] is a reinforcement learning-based model used for decision making in discrete, stochastic, and sequential environments. In an MDP, the environment changes state randomly in response to the actions chosen by the decision maker, and the agent evolves its next actions based on the reward it receives. The HMM extends this model by allowing all observation symbols to be derived from each state with a finite probability [54]. According to Rabiner et al. [55], the HMM is a “doubly stochastic process with an underlying stochastic process that is not observable (i.e., it is hidden) but can only be observed through another set of stochastic processes that produce the sequence of observed symbols”. The solution uses the HMM for efficient resource allocation and performs dynamic migration of VMs across multiple clouds using enhanced versions of the OpenStack components related to autoscaling and resource management (the nova, glance, cinder, neutron, and gnocchi APIs). These components act as an API proxy layer, interacting with remote clouds for autoscaling and resource management. Although the solution uses the RL-based HMM for workload and resource requirement predictions, the autoscaling actions are based on OpenStack Nova API scripts triggered manually; therefore, resource allocations are performed largely manually rather than adaptively and autonomously.
Sriram, S.N. et al. [56] proposed a novel scheduling strategy for processing containerised applications efficiently in a cloud environment by using a Heuristic Autoscaling policy. The overall solution can be summarised into four main areas: (i) a heuristic-based matching strategy designed to find the best-fit container for each microservice based on the computing resource requirements, (ii) a rule-based strategy used to address the cold start problem of container-based scheduling, (iii) a strategy used to launch containers on a suitable virtual/physical machine, and (iv) a Heuristic Autoscaling strategy developed to control autoscaling by varying the number of physical/virtual machines. This solution addresses the issues of autoscaling, such as platform dependency and delay in capacity allocation, by performing autoscaling across hybrid clouds and introducing a solution for cold start. However, the solution works only at the PaaS layer.
Khaleq, A.A. and Ra, I. [57] proposed a framework that addresses the difficulty of precisely setting predefined autoscaling metrics. They focused on the autoscaling provided by the Horizontal Pod Autoscaler (HPA) in Kubernetes and used a reinforcement learning-based approach to learn resource usage from the Kubernetes-based microservice environment and update the Kubernetes HPA dynamically to add or remove pods. The RL model interacts with the environment and undergoes training until the desired outcome is achieved, i.e., adding or removing resources or pods while maintaining the average response time at the expected QoS. The RL agent is rewarded according to how far it keeps the response time below the current response time, which improves the response time, and consequently the QoS, over time. As this work operates at the microservice level, it could be extended to proactive container autoscaling across hybrid clouds using an RL-based approach. Furthermore, the RL-based approach may be used at the IaaS layer for the proactive autoscaling of VMs.

3.4. Discussion of the Comparative Analysis

Based on the above review, it is evident that reactive autoscaling is a commonly used technique because of its straightforward approach to scaling resources in response to varying demand. However, reactive autoscaling carries inherent inefficiencies when dealing with sudden spikes or prolonged periods of low resource demand. Several approaches support reactive autoscaling, such as Bouabdallah, R. et al. [16]’s MAPE-K-based multi-agent system, RDF-based prediction by Iqbal, W. et al. [18], Liu, C. et al. [21]’s regression-based predictions, Wadia, Y. et al. [24]’s iAutoscale, and Si, W., Pan, L., and Liu, S. [25]’s timely allocation of VMs for cost improvements on AWS for SaaS-based applications. In contrast, predictive or proactive scaling addresses the key limitation of reactive autoscaling: its inability to anticipate resource requirements efficiently. Several approaches support proactive scaling; however, they lack the ability to scale across a hybrid cloud environment. These approaches are as follows: Bibal Benifa, J.V. and Dejey, D. [26]’s RL-based SARSA model; the Fuzzy Q-Learning-based approach by Arabnejad, H. et al. [30]; the Temporal Convolutional Networks used by Golshani, E. and Ashtiani, M. [33]; the Deep Learning LSTM used by Chudasma, V. et al. [34]; Biswas, A. et al. [37]’s Broker-based scaling; the Markov Decision Process used by Garí, Y. et al. [38]; the hybrid ML approach consisting of LR, SVM, and ARIMA used by Alanagh, A. et al. [39]; the threshold-based optimisation used by Tournaire, T. et al. [40]; the PACE framework proposed by Chouliaras, S. and Sotiriadis, S. [41]; Kubernetes-based workload predictions, such as the solution using CEEMDAN by Wen, L. et al. [43] and the LSTM + transformer-based solution used by Liu, H. et al. [45]; and the PatchTST transformer-based approach by Zheng, Y. et al. [49].
The lack of hybrid cloud support is a key drawback of the current commercial public cloud platforms. This issue has been addressed by a fairly limited body of research, each work with its own limitations. For example, the API proxy-based scaling approach by Shreya, M.M. [52] uses OpenStack as a master controller and maps multiple clouds as sub-clouds but lacks distributed monitoring. Similarly, the Predictive Scaling and Delta Correction approach proposed by Harwalker, S. et al. [53] uses the Markov Decision Process (MDP) and Hidden Markov Model (HMM) for efficient resource allocation but relies on manual API triggers. Among others, the hybrid cloud Heuristic Autoscaling by Sriram, S.N. et al. [56] uses a rule-based and heuristic strategy for scheduling containerised applications across clouds; however, this solution is restricted to the PaaS layer. The RL-based Kubernetes scaling approach by Khaleq, A.A. and Ra, I. [57] uses reinforcement learning to optimise Kubernetes pod autoscaling but lacks IaaS-level scaling across clouds. Table 2 and Table 3 provide a tabular summary of the main pros and cons of each of the three categories.

4. Methods and Materials

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology was adopted for the systematic literature review [58]. Using PRISMA, several databases, including IEEE Xplore, Elsevier, Springer, and Google Scholar, were searched using several combinations of related keywords, such as “autoscaling”, “hybrid cloud”, “IaaS”, “PaaS”, “proactive autoscaling”, and “bursting workloads”. From the initial search, the relevant articles were shortlisted. Then, articles related to autoscaling in the IaaS and PaaS layers were included in the second stage. In the third and final stage of the process, further filtering was performed to select articles focusing on horizontal autoscaling methods and their appropriateness for hybrid clouds using proactive or machine learning approaches. The Zotero [59] bibliographic and research documentation organisation software was used throughout this process to download, store, segregate, and continuously track the reviewed articles. Figure 3 presents the PRISMA flow diagram arranged according to the document-filtering process in this review, and Figure 4 depicts the temporal frequency of the selected articles from different research databases. Table 4 provides a tabular summary of the paper selection criteria.

5. Classification of Autoscaling Techniques

The main approaches for performing autoscaling in cloud environments depend on how the application in question is hosted. When the application is hosted on a single computational resource, for example, on a virtual machine (VM), the associated autoscaling is known as vertical autoscaling [60]. In this case, CPU and memory resources are allocated (or deallocated) to the VM based on the varying resource needs of the application. If the application is hosted in a more distributed manner across multiple computing resources, the term horizontal autoscaling is used. In this scenario, the number of VMs either increases or decreases based on the application’s resource needs. The cloud literature and various public cloud deployment guides reveal that both vertical and horizontal autoscaling can be further divided into two main categories: reactive and proactive autoscaling. The literature also identifies five categories of underlying approaches used by autoscalers [61]. These approaches are depicted in Figure 5 and are discussed briefly in the following sections.

5.1. Threshold-Based Rules

This is the most widely used and simplest form of autoscaling technique identified in the public and open-source cloud platforms discussed in Section 6 on commercial deployments of autoscaling in public cloud platforms. In this technique, autoscaling actions are triggered based on preset thresholds and a set of rules defined against those thresholds. These thresholds can be defined against system resources, such as CPU, memory, and I/O throughput, or external metrics, such as the number of user requests or latency-related metrics. For example, if the number of incoming connections at the load-balancing layer for a given application exceeds a certain threshold, the autoscaling rules can be set to increase the number of VMs to a certain number to satisfy the increasing demand and comply with the QoS requirements. Similarly, when the number of incoming connections falls below the relevant threshold, the autoscaling rules can be set to decrease the number of VMs. The threshold-based rule approach is very easy to use for autoscaling; however, a very good understanding of the behavioural patterns of the related workloads is needed to set the thresholds and rules so that optimal results can be obtained from this approach.
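The following minimal Python sketch illustrates a threshold-based rule set (all thresholds and step sizes are illustrative values, not taken from any specific platform); a gap between the scale-out and scale-in thresholds acts as a deadband that reduces oscillation:

```python
# Illustrative threshold-based autoscaling rule: scale out when the
# per-VM connection count exceeds an upper threshold, scale in when it
# falls below a lower one. All numeric defaults are example assumptions.
def threshold_decision(active_vms, connections, upper=1000, lower=300,
                       step=2, min_vms=1, max_vms=20):
    """Return the new VM count for the observed number of incoming connections."""
    per_vm = connections / active_vms
    if per_vm > upper:                      # demand above QoS threshold: scale out
        return min(active_vms + step, max_vms)
    if per_vm < lower:                      # demand well below threshold: scale in
        return max(active_vms - step, min_vms)
    return active_vms                       # inside the deadband: no action
```

Choosing `upper` and `lower` is exactly where the workload knowledge mentioned above is needed: too narrow a gap causes flapping, while too wide a gap wastes resources or violates QoS.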

5.2. Reinforcement Learning

This machine learning approach enables autonomous autoscaling based on historical demand learned through interaction between an agent and the autoscaling environment [60,61]. Autoscaling decisions are made based on the learned experience of the autoscaling environment and the thresholds of the metrics used to trigger the autoscaling decisions. Autoscaling actions are taken based on the current state of the environment, and future decisions are made with reference to the current state with the aim of improving future autoscaling actions. To achieve this, once an autoscaling action has been taken, the agent is rewarded by the autoscaling environment according to how good the autoscaling action was. Actions with higher rewards are encouraged and tend to be executed repeatedly with further enhancements (or reinforcements), while actions with lower rewards are treated in the opposite manner. Verma, S. and Bala, A. [62] identified four main approaches to reinforcement learning in autoscaling: (i) the Markov Decision Process (MDP), where the fundamentals of MDP are applied to define a particular autoscaling state S in terms of the total number of user requests in a given time period (w), the number of allocated virtual machines (u), and the performance based on the average request response time (p); MDP is particularly used in horizontal autoscaling. (ii) Q-Learning, where a preliminary approximation of the Q-Learning function is used as a mechanism for updating the Q value for each state in every iteration so that the poor initial performance of RL is improved. (iii) Parallel learning, in which each mediator does not need to query each state and action repeatedly; instead, it can obtain the value of unvisited states. (iv) Neural networks, which can predict the value of unvisited states by taking a state–action pair as the input and producing a Q-value as the output.
The latter three methods can all be used to improve prediction accuracy in the autoscaling process over time.
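A minimal tabular Q-Learning sketch for the state definition above, i.e., state = (request-volume band w, allocated VMs u, response-time band p), might look as follows (the bands, reward shaping, and hyperparameters are illustrative assumptions, not from the cited works):

```python
# Sketch of tabular Q-Learning for autoscaling decisions. Q maps
# (state, action) pairs to values; unseen pairs default to 0. ALPHA is
# the learning rate, GAMMA the discount factor, EPSILON the exploration
# rate -- all example values.
import random

ACTIONS = ("scale_in", "no_op", "scale_out")
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = {}

def q(state, action):
    return Q.get((state, action), 0.0)

def choose_action(state):
    if random.random() < EPSILON:                     # explore occasionally
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q(state, a))    # otherwise exploit

def update(state, action, reward, next_state):
    """One Q-Learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q(next_state, a) for a in ACTIONS)
    Q[(state, action)] = q(state, action) + ALPHA * (
        reward + GAMMA * best_next - q(state, action))
```

Here the reward would be derived from how well the action kept the response time within the QoS target, so well-rewarded scaling actions are reinforced over repeated interactions.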

5.3. Queuing Theory

Queuing Theory is a mathematical model used to explain the behaviour of waiting lines, or queues, and relates the mean arrival rate of objects into the queue, “λ” (in horizontal autoscaling, the rate at which requests arrive at the load-balancing layer), to the mean service rate, “μ”, at which those objects are served (in horizontal autoscaling, the rate at which requests are picked up by a node (VM) and processed) [61]. Queuing models are described and classified using a mathematical notation known as “Kendall’s Notation” [63]. In Kendall’s notation, a queuing system is described by A/B/m/K/n/D: “A: distribution function of the interarrival times, B: distribution function of the service times, m: number of servers, K: capacity of the system, the maximum number of customers in the system including the one being serviced, n: population size, number of customers, D: service discipline”. According to Sahni, J. and Vidyarthi, D.P. [64], Queuing Theory has been widely used to estimate the performance metrics of internet applications, and they discussed the use of Queuing Theory to model the system’s reactions to varying workloads and the provisioning of the required number of VMs (EC2 instances) for a given workload in an AWS cloud-based infrastructure.
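As a concrete illustration of this reasoning, the following sketch sizes an M/M/m system (in Kendall's notation: Markovian arrivals, Markovian service times, m servers) using the standard Erlang-C formula; the arrival rate, service rate, and waiting-probability target below are invented example values, not figures from the cited works:

```python
# Sizing an M/M/m queue for horizontal autoscaling: find the smallest
# number of VMs m such that per-server utilisation stays below 1 and the
# Erlang-C probability that a request must wait stays under a target.
from math import factorial

def erlang_c(m, lam, mu):
    """Probability that an arriving request must queue in an M/M/m system."""
    a = lam / mu                      # offered load in Erlangs
    rho = a / m                       # per-server utilisation
    if rho >= 1:
        return 1.0                    # unstable: the queue grows without bound
    top = a**m / (factorial(m) * (1 - rho))
    bottom = sum(a**k / factorial(k) for k in range(m)) + top
    return top / bottom

def vms_needed(lam, mu, max_wait_prob=0.2, max_m=100):
    """Smallest VM count keeping the system stable and P(wait) under target."""
    for m in range(1, max_m + 1):
        if lam / (m * mu) < 1 and erlang_c(m, lam, mu) <= max_wait_prob:
            return m
    raise ValueError("no feasible VM count under max_m")
```

For example, with requests arriving at λ = 10 per second and each VM serving μ = 3 per second, the offered load is about 3.3 Erlangs, so at least four VMs are needed for stability, and a few more to keep the queueing probability low.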

5.4. Control Theory

Control theory-based approaches have also been widely used in autoscaling for varying workloads, mostly in reactive resource provisioning but also in proactive resource allocation, as reported by Verma, S. and Bala, A. [62] and Sahni, J. and Vidyarthi, D.P. [64]. According to Verma, S. and Bala, A. [62], control theory-based systems can be classified into three categories: (i) open-loop systems, which use the input of a target system at a given time to determine the output (i.e., the number of resources to be provisioned); (ii) feedback systems, in which the output of the target system is measured and the difference from the desired value is fed back to the input so that the provisioned output can be adjusted accordingly; and (iii) feed-forward controllers, which use a model to forecast the system performance and respond before errors actually occur, maintaining the output of the target system at a preferred level by regulating the control input.
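A feedback system of type (ii) can be sketched as a simple proportional controller (our illustration; the gain and utilisation setpoint are arbitrary example values):

```python
# Sketch of a proportional feedback controller for VM provisioning: the
# utilisation error (observed minus setpoint) is fed back into the VM
# count. Gain, setpoint, and bounds are illustrative assumptions.
def feedback_controller(current_vms, observed_util, target_util=0.6,
                        gain=1.0, min_vms=1, max_vms=50):
    """Return the adjusted VM count derived from the utilisation error."""
    error = observed_util - target_util          # positive: under-provisioned
    adjustment = round(gain * error * current_vms)
    return max(min_vms, min(max_vms, current_vms + adjustment))
```

At the setpoint the error is zero and the VM count is left unchanged; overload (for example, 90% utilisation against a 60% target) yields a proportional scale-out, and underload a proportional scale-in.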

5.5. Time Series Analysis

According to Lorido-Botran et al. [61], time series analysis can be used to analyse input workloads and identify repeating patterns in the analysis phase of the autoscaling process. Time series methods can therefore be used in proactive autoscaling as a machine learning-based approach to predict future workloads from historical usage patterns and initiate autoscaling actions well in advance. In this approach, the incoming traffic, resource consumption, and performance metrics are predicted based on historical patterns. There are five types of time series forecasting methods [65]: (i) time series regression, which uses regression models; (ii) time series decomposition, which splits time series datasets into four components (Trends, Seasons, Cycles, and Noise); (iii) Exponential Smoothing (ES), which assigns weighted averages to past observations, with the assigned weights decaying exponentially over time; (iv) the Autoregressive Integrated Moving Average (ARIMA), which extends the capabilities of ES by using autocorrelations to produce forecasts; and (v) neural network-based techniques, such as Recurrent Neural Networks (RNNs) [36] and Long Short-Term Memory (LSTM) [35]. Table 5 presents a summary of autoscaling techniques and their key features and potential use cases.
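Method (iii), Exponential Smoothing, can be illustrated in a few lines (the smoothing factor alpha and the workload values are arbitrary examples):

```python
# Simple (single) exponential smoothing: each smoothed value is a weighted
# average of the latest observation and the previous smoothed value, so the
# weight of older observations decays exponentially. alpha in (0, 1] is an
# example value controlling how quickly old data are forgotten.
def exponential_smoothing(series, alpha=0.5):
    """Return the smoothed series; the last element serves as the next forecast."""
    smoothed = [series[0]]                      # seed with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

forecast = exponential_smoothing([100, 120, 80, 140], alpha=0.5)
```

In a proactive autoscaler, the final smoothed value would act as the short-horizon demand forecast that triggers scaling ahead of the actual load change; ARIMA and the neural approaches in (iv) and (v) extend this idea to trends, seasonality, and nonlinear patterns.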

6. Commercial Autoscaling Approaches in Public Cloud Environments

For this study, we reviewed Amazon Web Services (AWS), the Google Cloud Platform (GCP), and Microsoft Azure, which are the most widely used public cloud environments [66]. These cloud environments offer several products and a spectrum of managed services that are competitive and cost-effective for demand-metered services. In addition to these three, we chose Apache CloudStack (ACS), an open-source cloud platform that is widely used in academic/learning clouds [67] because it is free to use. The following subsections briefly discuss the autoscaling deployments in the above four cloud platforms and compare their autoscaling techniques.

6.1. Amazon Web Services

AWS provides autoscaling at the IaaS layer through its EC2 autoscaling offering [68]. EC2 autoscaling helps consumers dynamically adjust the underlying EC2 instances according to variations in capacity demands. According to the AWS EC2 autoscaling documentation [69], it relies on three key components: (i) autoscaling groups, which are collections of EC2 instances placed under a load balancer; (ii) Configuration/Launch Templates, which define the EC2 instance type, operating system, software stack, and other vital configuration details; and (iii) scaling options, which govern the autoscaling strategy and provide several ways of scaling the EC2 instances for varying workloads. In AWS, “scheduled scaling” [70] allows users to set up scaling policies according to known or predictable demand changes. In contrast, “Manual Scaling” allows for the addition or removal of EC2 instances in an autoscaling group at any given time. AWS dynamic scaling [71] offers the ability to dynamically increase and decrease the current capacity based on “CloudWatch Metrics” [72], using a predefined set of scaling adjustments or a single scaling adjustment, and AWS Predictive Scaling offers scaling based on cyclic traffic and other historical metric patterns. Additionally, AWS provides scaling based on incoming traffic variances; for example, “Scaling based on Amazon SQS” [73].

6.2. Google Cloud Platform

GCP offers autoscaling in combination with managed instance groups (MIGs) [74], which are collections of GCP virtual machines governed by autoscaling policies. An autoscaling policy contains one or more signals specified by the user, and these signals trigger the autoscaling action. The autoscaler then scales the managed instance group out or in by adding or removing VMs as needed. GCP deploys several techniques for autoscaling VMs under an MIG. Scaling based on CPU utilisation [75] and scaling based on monitoring metrics [76] both use threshold-based autoscaling, where the former uses CPU metrics and the latter uses user-defined metrics to trigger autoscaling. Scaling based on load-balancing serving capacity [77] uses a threshold-based approach as well, triggering autoscaling actions based on the target capacity of a load balancer. GCP’s schedule-based autoscaling [78] offers the facility of scheduling the autoscaling of target groups for recurring and one-off events. GCP also offers prediction-based autoscaling [79] for proactive autoscaling based on historical patterns; however, this approach uses only CPU utilisation as the scaling metric.

6.3. Azure Cloud

Azure provides autoscaling at the IaaS layer for VMs using virtual machine scale sets [80] and Service Fabric [81]. Virtual machine scale sets let users create a group of load-balanced VMs that scale out or in automatically based on demand or on a schedule. Azure offers two forms of scale sets: “Scale Sets with Uniform Orchestration” [82], which uses a VM profile or template to perform autoscaling with identical VM instances and limits a group to fewer than 100 VM instances, and “Scale Sets with Flexible Orchestration”, which provides more flexible autoscaling with up to 1000 VM instances across the Azure ecosystem, spanning multiple regions and Azure Availability Zones [83]. Another advantage of Flexible Orchestration is that it isolates workloads at the level of the Azure Fault Domain [84], which confines a set of selected VMs to hardware sharing a common single point of failure. Therefore, by using multiple Fault Domains within Flexible Orchestration, an application can achieve fault tolerance and autoscaling at the same time. Commonly, the scale sets use “Azure Monitor Autoscale” [85] for schedule-based or runtime (threshold)-based autoscaling. “Azure Service Fabric” autoscaling [81] takes a dynamic approach, provisioning additional resources to a service on demand; autoscaling can be enabled at service creation time, it is not necessary to define the autoscaling policies manually, and autoscaling actions are triggered by predefined conditions that are checked periodically. In addition, Azure VM scale sets can benefit from Azure’s “Predictive Autoscale” offering [86], which performs predictive autoscaling actions based on cyclic CPU usage patterns.

6.4. Apache CloudStack

Apache CloudStack supports only threshold-based autoscaling [87]. Its autoscaling configuration wizard sets up scaling with the desired compute offerings, minimum and maximum numbers of VMs, and cooldown periods for scale-up and scale-down activities and policies. The policies can be either “Duration”-based, where the defined conditions must remain true for the length of that duration, or “Counter”-based, where performance counters such as “Linux User CPU [native]—percentage” and “Linux User RAM [native]—percentage” trigger the autoscaling activity. Table 6 summarises the use of various autoscaling techniques in the above cloud environments.
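The “Duration”-based policy above can be sketched as follows. This is our own illustration of the idea, not CloudStack code: a scale-up fires only when the condition has held for the entire duration window, which here is simplified to a fixed number of consecutive samples.

```python
# Hedged sketch of a CloudStack-style "Duration"-based policy: the condition
# (CPU above a threshold) must hold for the whole duration window before a
# scale-up fires. The class and parameter names are our own illustration.
from collections import deque

class DurationPolicy:
    def __init__(self, threshold: float, duration_samples: int):
        self.threshold = threshold
        # retain only the most recent `duration_samples` observations
        self.window = deque(maxlen=duration_samples)

    def observe(self, cpu_percent: float) -> bool:
        """Record a sample; return True only when the window is full and
        every sample in it breaches the threshold, i.e. the condition has
        been true for the entire duration."""
        self.window.append(cpu_percent)
        return (len(self.window) == self.window.maxlen
                and all(v > self.threshold for v in self.window))

policy = DurationPolicy(threshold=80.0, duration_samples=3)
print(policy.observe(85.0))  # False: condition not yet held long enough
print(policy.observe(90.0))  # False
print(policy.observe(88.0))  # True: breached for the full duration
```

Requiring the condition to persist, rather than reacting to a single sample, is what dampens the oscillation that a naive threshold trigger would cause.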

7. What Are the Challenges to Achieving Horizontal Autoscaling at the IaaS Layer?

The above review indicates that AWS, GCP, and Azure provide autoscaling capabilities in all three service models (IaaS, PaaS, and SaaS). However, for horizontal autoscaling at the IaaS layer, consumers are responsible for its configuration and management. Obtaining optimal results via such custom configurations is challenging and usually depends on many factors that can degrade autoscaling performance or introduce unexpected negative effects on the autoscaled services. These factors are briefly described as follows:
  • Capacity estimation/load prediction errors: Capacity forecasting and load prediction errors might cause proactive capacity allocations to be either over-provisioned or under-provisioned. GCP best practice documentation suggests several guidelines on how to address this issue with careful manual intervention [88]. The literature indicates that previous attempts were made to resolve autoscaling challenges using either proactive or reactive approaches but, to the best of our knowledge, these attempts have yet to produce a reliable solution. It is not straightforward to precisely estimate the capacity requirement for a particular workload and adjust additional resources without over- or under-provisioning.
  • Delays in capacity allocation when demand increases or with cold starts: A time delay in provisioning the required resources to meet increased demand will compromise the quality of service. This is a well-known problem in reactive autoscaling [89]. The reviewed literature addresses this issue through various frameworks for proactive autoscaling, using self-derived algorithms or machine learning techniques based on time series forecasting or reinforcement learning.
  • Infrastructure failures: It can be argued that public clouds overcome this issue by providing a more resilient and distributed environment; however, recently reported public cloud failures prove otherwise [90].
  • Metric collection/alerting failures: Moghaddam et al. [91] highlighted that anomalies in metrics can lead to autoscaling decision errors. Autoscaling operations are triggered by various system alerts generated during certain events, and these alerts are implemented by means of various metric collections. Errors in, or inefficient analysis of, these metrics could result in inefficient or incorrect predictions.
  • Platform dependency: Currently, there are very limited options for autoscaling across multiple cloud platforms in the IaaS service model, e.g., the inability to use VM bursting across a hybrid cloud environment. This introduces vendor lock-in, meaning resource dependency on one cloud platform; if that single platform fails, services cannot be executed elsewhere.
  • Predictive (or proactive) autoscaling: Currently, there are limitations in the commercial deployment of proactive autoscaling. For example, the offerings from AWS and GCP in this domain have limited capabilities and use only time series-based forecasting to perform autoscaling operations. These limitations are further exacerbated when proactive scaling is required in hybrid cloud environments, where predictive autoscaling operations at the IaaS layer span multiple clouds. Figure 6 depicts a holistic view of the above challenges for autoscaling in public hybrid cloud environments.
Figure 6 and Table 7 summarise these challenges, together with the specific issues they cause and the resulting impacts.
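The time series-based forecasting that commercial predictive offerings rely on can be illustrated with a minimal sketch: fit a linear trend to recent load observations and provision for the one-step-ahead forecast plus headroom. Real predictive autoscalers use far richer models; every number, function name, and parameter below is a hypothetical illustration.

```python
# Minimal sketch of time series-based predictive scaling: extrapolate a
# least-squares linear trend one step ahead and provision VMs for the
# forecast load plus a safety margin. Hypothetical numbers throughout.
import math

def forecast_next(history):
    """Least-squares linear trend, extrapolated one step ahead."""
    n = len(history)
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    denom = sum((x - x_mean) ** 2 for x in range(n))
    slope = sum((x - x_mean) * (y - y_mean)
                for x, y in enumerate(history)) / denom
    intercept = y_mean - slope * x_mean
    return slope * n + intercept

def target_capacity(history, per_vm_capacity, headroom=1.2):
    """Number of VMs needed for the forecast load with a safety margin."""
    predicted = forecast_next(history)
    return max(1, math.ceil(predicted * headroom / per_vm_capacity))

# Load grew linearly, so the next interval is forecast at 180 req/s; with
# 20% headroom and 50 req/s per VM, five VMs are provisioned ahead of time.
print(forecast_next([100, 120, 140, 160]))                        # 180.0
print(target_capacity([100, 120, 140, 160], per_vm_capacity=50))  # 5
```

The headroom factor is exactly the over-provisioning trade-off discussed in the first challenge above: too little and forecast errors cause SLA violations, too much and the forecast degenerates into static over-provisioning.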
This review also reveals significant gaps in the current research and shows how autoscaling differs substantially when deployed on different public cloud platforms. For example, current commercial deployments provide limited options for proactive autoscaling in production environments. It is evident from the literature that most research at the IaaS layer attempting to resolve proactive autoscaling has considered only a single-cloud infrastructure, while attempts to extend autoscaling to multiple clouds at the IaaS layer provide only rule-based or reactive autoscaling. Therefore, to the best of our knowledge, no single solution addresses both of the following: hybrid cloud autoscaling and support for proactive autoscaling at the IaaS layer. We therefore propose the following recommendations/roadmap for addressing the identified gaps.
1.
Identify common capacity- and load-related metrics, which can be derived from multiple cloud infrastructures.
Apart from scheduled autoscaling, all the other autoscaling strategies discussed in this review depend on real-time or historical metrics. These metrics can be collected directly using the metric collection tools provided by the cloud provider or extended with “custom metrics” defined by consumers to suit a particular use case or application. Different metrics are measured and calculated differently, and the monitoring and alerting tools vary from one cloud to another; for example, CloudWatch in AWS [92], Cloud Monitoring in GCP, and Azure Monitor [93] in the Azure Cloud are mostly native to their specific cloud infrastructure. Some monitoring tools, such as GCP Cloud Monitoring and the Azure Monitor [94] metric collection tools, can extend monitoring and metric collection to multiple platforms and on-premise environments. However, to the best of our knowledge, such tools have certain limitations when collecting metrics and alerting on other platforms, such as the complexities introduced when integrating with external platform tools or their network/security policies, as no straightforward integration is provided. In addition to commercial public cloud offerings, there is an open-source platform for collecting and exporting metrics, traces, and logs between different cloud environment sources, developed as part of the OpenTelemetry suite [95]. However, not all public cloud vendors support the OpenTelemetry Protocol (OTLP). This means that there is a gap in autoscaling metric gathering across cloud environments. Filling this gap requires investigating two areas: first, the current mechanisms of metric collection and alerting already available in public and private cloud platforms (AWS, GCP, Azure, and Apache CloudStack, for example); second, how a common metric model could be designed for universal use by a hybrid cloud autoscaler.
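One possible shape for such a common metric model is sketched below. The schema and converter names are our own assumptions, not an existing standard: provider-native samples are normalised into one record type before any scaling decision is made, so that, for instance, GCP's 0..1 CPU utilisation fraction and CloudWatch's percentage become directly comparable.

```python
# Sketch of a possible common metric model (recommendation 1): samples from
# provider-native tools (CloudWatch, Cloud Monitoring, Azure Monitor) are
# normalised into one schema. Field and function names are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class UnifiedMetric:
    provider: str      # e.g. "aws", "gcp", "azure", "cloudstack"
    resource_id: str   # VM/instance identifier in the source cloud
    name: str          # canonical metric name, e.g. "cpu_utilisation"
    value: float       # normalised to a common unit (percent here)
    timestamp: float   # seconds since the epoch (UTC)

def from_gcp_cpu(instance, utilisation_fraction, ts):
    """GCP reports CPU utilisation as a 0..1 fraction; normalise to percent."""
    return UnifiedMetric("gcp", instance, "cpu_utilisation",
                         utilisation_fraction * 100.0, ts)

def from_aws_cpu(instance, utilisation_percent, ts):
    """CloudWatch CPUUtilization is already a percentage."""
    return UnifiedMetric("aws", instance, "cpu_utilisation",
                         utilisation_percent, ts)

a = from_gcp_cpu("vm-1", 0.72, 1700000000.0)
b = from_aws_cpu("i-0abc123", 72.0, 1700000000.0)
print(a.value == b.value)  # True: directly comparable after normalisation
```

A hybrid autoscaler could then apply a single threshold or forecast over `UnifiedMetric` streams regardless of which cloud produced each sample.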
2.
Extend the support for horizontal autoscaling of virtual machines in a single cloud environment to a hybrid cloud setup with integrated proactive autoscaling capabilities.
Most current implementations of autoscaling across multiple cloud platforms operate at the PaaS layer and use various container-based solutions for hosting microservices. The literature indicates that solutions have been proposed to extend autoscaling capabilities to multiple cloud platforms, but these solutions have significant gaps in achieving “autonomous” and “platform-agnostic” operation. To achieve this ambition, an in-depth study of state-of-the-art autoscaling mechanisms across multiple clouds, covering autoscaling API design, metric collection, and authentication, will provide the detailed insights and basis needed to design and develop a proof-of-concept (PoC) autoscaler framework. This framework can act as the single point of control for VM autoscaling across multiple cloud platforms, and its decision-making for proactive autoscaling can build on existing successes in time series and machine learning algorithms.
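The single-point-of-control idea can be sketched as a driver abstraction. This is a minimal illustration under our own assumptions, not an existing API: each cloud platform is wrapped behind a common interface, and one controller applies scaling decisions across all of them.

```python
# Sketch of a platform-agnostic autoscaler framework (recommendation 2):
# each cloud sits behind a common driver interface and one controller
# applies scaling decisions. Class and method names are illustrative.
from abc import ABC, abstractmethod

class CloudDriver(ABC):
    """Uniform adapter that each cloud platform must implement."""
    @abstractmethod
    def current_capacity(self) -> int: ...
    @abstractmethod
    def set_capacity(self, n: int) -> None: ...

class InMemoryDriver(CloudDriver):
    """Stand-in for a real AWS/GCP/Azure/CloudStack adapter."""
    def __init__(self, capacity: int = 1):
        self._capacity = capacity
    def current_capacity(self) -> int:
        return self._capacity
    def set_capacity(self, n: int) -> None:
        self._capacity = n

class HybridAutoscaler:
    """Single point of control for VM autoscaling across clouds."""
    def __init__(self, drivers):
        self.drivers = drivers  # mapping: cloud name -> CloudDriver
    def scale(self, cloud, delta):
        driver = self.drivers[cloud]
        driver.set_capacity(max(0, driver.current_capacity() + delta))

scaler = HybridAutoscaler({"aws": InMemoryDriver(2), "gcp": InMemoryDriver(1)})
scaler.scale("aws", +2)  # burst out on AWS
scaler.scale("gcp", -1)  # shrink on GCP
print(scaler.drivers["aws"].current_capacity())  # 4
print(scaler.drivers["gcp"].current_capacity())  # 0
```

A real framework would add the proactive decision logic on top of this interface, so that forecasting code never touches platform-specific APIs, which is precisely what makes the design platform-agnostic.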
3.
Define evaluation metrics to evaluate the effectiveness of autonomous autoscaling, providing the required quality of service in a hybrid cloud.
Predictive or autonomous scaling is suitable for workloads that take a long time to start; for example, VM-based workloads with significant startup latency due to longer life cycles. The GCP documentation for predictive autoscaling suggests that these types of workloads are the best candidates for autonomous (or predictive) autoscaling. However, no specific metrics are defined to measure the effectiveness of predictive autoscaling on the platforms mentioned above. To close this gap, one possible solution is to treat the life cycle events of the VM autoscaling process for different workloads as unique states in a finite state machine (FSM) [96]. This allows monitoring and evaluation using metric data such as the “VM ready-to-service ready time delay”, the percentage of SLA violations versus the percentage of SLAs fulfilled, etc.
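The FSM idea can be sketched as follows. The state names, transitions, and timings below are our own assumptions for illustration: life cycle events move a VM between states, and timestamping each entry yields evaluation metrics such as the delay between “VM ready” and “service ready”.

```python
# Sketch of the FSM evaluation idea (recommendation 3): model VM life cycle
# events as states, timestamp each state entry, and derive metrics such as
# the "VM ready-to-service ready time delay". States/transitions are our
# own hypothetical choices.

TRANSITIONS = {
    "REQUESTED":     {"provision": "PROVISIONING"},
    "PROVISIONING":  {"booted": "VM_READY"},
    "VM_READY":      {"app_up": "SERVICE_READY"},
    "SERVICE_READY": {"terminate": "TERMINATED"},
}

class VMLifecycleFSM:
    def __init__(self):
        self.state = "REQUESTED"
        self.timestamps = {"REQUESTED": 0.0}

    def fire(self, event, at):
        """Apply an event at time `at` (seconds) and record the entry time
        of the new state; invalid events raise KeyError."""
        self.state = TRANSITIONS[self.state][event]
        self.timestamps[self.state] = at

    def ready_to_service_delay(self):
        return self.timestamps["SERVICE_READY"] - self.timestamps["VM_READY"]

fsm = VMLifecycleFSM()
fsm.fire("provision", at=1.0)
fsm.fire("booted", at=41.0)   # VM ready after 40 s of provisioning
fsm.fire("app_up", at=66.0)   # service ready 25 s later
print(fsm.ready_to_service_delay())  # 25.0
```

Aggregating such per-VM delays across scaling events would give the SLA-oriented statistics mentioned above, e.g. the fraction of scale-outs whose delay exceeded an agreed bound.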

8. Conclusions and Future Research Directions

This paper reviews rapid elasticity (also called autoscaling) in cloud infrastructure, focusing on the IaaS layer. It examines autoscaling technologies across PaaS and IaaS layers, identifies research gaps, and proposes solutions for proactive autoscaling in hybrid clouds. This review reveals that while proactive autoscaling solutions have been explored at both the IaaS and PaaS layers, the majority of hybrid cloud autoscaling implementations are designed for containerised workloads at the PaaS layer. This highlights a significant gap in VM-based autoscaling solutions at the IaaS layer, which remains a critical challenge for hybrid cloud deployments. Additionally, an examination of commercial cloud offerings indicates a lack of hybrid cloud autoscaling solutions for VM-based workloads at the IaaS layer. This limitation not only reinforces platform dependency for cloud consumers but also acts as a potential barrier to broader cloud adoption. Moreover, this study identifies a critical gap in standardising autoscaling metrics into a unified data model and format. Establishing a common set of metrics for hybrid cloud autoscaling could significantly enhance interoperability and efficiency across heterogeneous cloud environments.
In summary, this paper has:
  • Identified various issues and challenges faced by autoscaling in cloud infrastructures both at the IaaS and PaaS layers in terms of metric collections and workload classification.
  • Discussed the lack of support for hybrid cloud autoscaling, specifically for the autoscaling of VMs across hybrid clouds.
  • Reviewed the proposed solutions and frameworks addressing these issues, which use a combination of machine learning and deep learning algorithms and open-source tools.
  • Presented comparisons of those solutions, categorising them into three autoscaling domains: single cloud autoscaling, single cloud autoscaling with proactive/dynamic autoscaling support, and proactive autoscaling across hybrid clouds.
  • Reviewed three of the most popular public cloud platforms and an open-source platform and conducted a comparison of VM autoscaling implementations.
Based on the literature review, we have proposed the following recommendations, which will be carried out as the next step of this research.
  • Developing a unified set of metrics that can serve as a benchmark for performing proactive autoscaling in hybrid cloud environments.
  • Implementing autoscaling mechanisms at the IaaS layer that work across different platforms, ensuring consistency and performance in hybrid cloud environments without relying on platform-specific features.
  • Identifying and defining suitable evaluation metrics to measure the success and reliability of autonomous autoscaling mechanisms, ensuring that they meet the performance and quality of service (QoS) requirements in a hybrid cloud setting.

Author Contributions

T.L.B.P., Z.K. and K.M. conceptualised the structure of the article. Z.K. and K.M. provided guidance on the systematic survey process, while T.L.B.P. conducted the research. T.L.B.P. identified relevant resources from various repositories and performed a critical review. Additionally, T.L.B.P. drafted the initial manuscript and created the graphics. Z.K. and K.M. reviewed the article multiple times, offering feedback and editorial modifications on both content and visuals. T.L.B.P. incorporated the revisions and finalised the article. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Al-Dhuraibi, Y.; Paraiso, F.; Djarallah, N.; Merle, P. Elasticity in cloud computing: State of the art and research challenges. IEEE Trans. Serv. Comput. 2018, 11, 430–447. [Google Scholar] [CrossRef]
  2. De Assuncao, M.D.; Cardonha, C.H.; Netto, M.A.; Cunha, R.L. Impact of user patience on auto-scaling resource capacity for cloud services. Future Gener. Comput. Syst. 2016, 55, 41–50. [Google Scholar] [CrossRef]
  3. Qu, C.; Calheiros, R.N.; Buyya, R. Auto-scaling web applications in clouds: A taxonomy and survey. ACM Comput. Surv. (CSUR) 2018, 51, 1–33. [Google Scholar] [CrossRef]
  4. Mell, P.; Grance, T. The NIST Definition of Cloud Computing: Recommendations of the National Institute of Standards and Technology [Online], NIST. 2011. Available online: https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-145.pdf (accessed on 10 December 2023).
  5. Mirobi, G.J.; Arockiam, L. Service Level Agreement in Cloud Computing: An Overview. In Proceedings of the 2015 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), Kumaracoil, India, 18–19 December 2015. [Google Scholar] [CrossRef]
  6. Jelassi, M.; Ghazel, C.; Saïdane, L.A. A survey on quality of service in cloud computing. In Proceedings of the 2017 3rd International Conference on Frontiers of Signal Processing (ICFSP), Paris, France, 6–8 September 2017. [Google Scholar]
  7. Managed Kubernetes Service—Amazon EKS—Amazon Web Services. Available online: https://aws.amazon.com/eks/ (accessed on 2 January 2024).
  8. What Are Containers? | Google Cloud. Available online: https://cloud.google.com/learn/what-are-containers/ (accessed on 2 January 2024).
  9. Ashalatha, R.; Agarkhed, J. Evaluation of Auto Scaling and Load Balancing Features in Cloud. Int. J. Comput. Appl. 2015, 117, 30–33. [Google Scholar] [CrossRef]
  10. Amazon EC2 Spot. Available online: https://aws.amazon.com/ec2/spot/instance-advisor/ (accessed on 2 January 2024).
  11. Spot VMs | Compute Engine Documentation | Google Cloud. Available online: https://cloud.google.com/solutions/spot-vms/ (accessed on 2 January 2024).
  12. Predictive Scaling for Amazon EC2 Auto Scaling—Amazon EC2 Auto Scaling. Available online: https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-predictive-scaling.html (accessed on 2 January 2024).
  13. Google Cloud. Scaling Based on Predictions. 2016. Available online: https://cloud.google.com/compute/docs/autoscaler/predictive-autoscaling#suitable_workloads (accessed on 2 January 2024).
  14. Vemasani, P.; Vuppalapati, S.M.; Modi, S.; Ponnusamy, S. Achieving Agility through Auto-Scaling: Strategies for Dynamic Resource Allocation in Cloud Computing. IJRASET 2024, 12, 3169–3177. [Google Scholar] [CrossRef]
  15. Multicloud Optimization Platform, Application Automatic Deployment Solution. Available online: https://www.melodic.cloud/ (accessed on 16 February 2024).
  16. Bouabdallah, R.; Lajmi, S.; Ghedira, K. Use of reactive and proactive elasticity to adjust resources provisioning in the cloud provider. In Proceedings of the 2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), Sydney, Australia, 12–14 December 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1155–1162. [Google Scholar]
  17. Arcaini, P.; Riccobene, E.; Scandurra, P. Modeling and analyzing MAPE-k feedback loops for self-adaptation. In Proceedings of the 2015 IEEE/ACM 10th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, Florence, Italy, 18–19 May 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 13–23. [Google Scholar]
  18. Iqbal, W.; Erradi, A.; Abdullah, M.; Mahmood, A. Predictive auto-scaling of multi-tier applications using performance varying cloud resources. IEEE Trans. Cloud Comput. 2019, 10, 595–607. [Google Scholar] [CrossRef]
  19. Rokach, L. Decision forest: Twenty years of research. Inf. Fusion 2016, 27, 111–125. [Google Scholar] [CrossRef]
  20. Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; IEEE: Piscataway, NJ, USA, 1995; Volume 1, pp. 278–282. [Google Scholar]
  21. Liu, C.; Shang, Y.; Duan, L.; Chen, S.; Liu, C.; Chen, J. Optimizing Workload Category for Adaptive Workload Prediction in Service Clouds; Springer: Berlin/Heidelberg, Germany, 2015; pp. 87–104. [Google Scholar]
  22. Lim, H.I. A linear regression approach to modeling software characteristics for classifying similar software. In Proceedings of the 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), Milwaukee, WI, USA, 15–19 July 2019; IEEE: Piscataway, NJ, USA, 2019; Volume 1, pp. 942–943. [Google Scholar]
  23. Jakkula, V. Tutorial on Support Vector Machine (SVM); School of EECS, Washington State University: Pullman, WA, USA, 2006; Volume 37, p. 3. [Google Scholar]
  24. Wadia, Y.; Gaonkar, R.; Namjoshi, J. Portable autoscaler for managing multi-cloud elasticity. In Proceedings of the 2013 International Conference on Cloud & Ubiquitous Computing & Emerging Technologies, Pune, India, 15–16 November 2013; IEEE: Piscataway, NJ, USA, 2013; pp. 48–51. [Google Scholar]
  25. Si, W.; Pan, L.; Liu, S. A cost-driven online auto-scaling algorithm for web applications in cloud environments. Knowl.-Based Syst. 2022, 244, 108523. [Google Scholar] [CrossRef]
  26. Bibal Benifa, J.; Dejey, D. RLPAS: Reinforcement learning-based proactive auto-scaler for resource provisioning in cloud environment. Mob. Netw. Appl. 2019, 24, 1348–1363. [Google Scholar] [CrossRef]
  27. Kaelbling, L.P.; Littman, M.L.; Moore, A.W. Reinforcement learning: A survey. J. Artif. Intell. Res. 1996, 4, 237–285. [Google Scholar] [CrossRef]
  28. Dayan, P.; Niv, Y. Reinforcement learning: The good, the bad and the ugly. Curr. Opin. Neurobiol. 2008, 18, 185–196. [Google Scholar] [CrossRef] [PubMed]
  29. Zhao, D.; Wang, H.; Shao, K.; Zhu, Y. Deep reinforcement learning with experience replay based on SARSA. In Proceedings of the 2016 IEEE Symposium Series on Computational Intelligence (SSCI), Athens, Greece, 6–9 December 2016. [Google Scholar] [CrossRef]
  30. Arabnejad, H.; Jamshidi, P.; Estrada, G.; El Ioini, N.; Pahl, C. An Auto-Scaling Cloud Controller Using Fuzzy Q-Learning-Implementation in Openstack; Springer: Berlin/Heidelberg, Germany, 2016; pp. 152–167. [Google Scholar]
  31. Salehizadeh, M.R.; Soltaniyan, S. Application of fuzzy Q-learning for electricity market modeling by considering renewable power penetration. Renew. Sustain. Energy Rev. 2016, 56, 1172–1181. [Google Scholar] [CrossRef]
  32. Glorennec, P.Y.; Jouffe, L. Fuzzy Q-learning. In Proceedings of the 6th International Fuzzy Systems Conference, Barcelona, Spain, 5 July 1997; IEEE: Piscataway, NJ, USA, 1997; Volume 2, pp. 659–662. [Google Scholar]
  33. Golshani, E.; Ashtiani, M. Proactive auto-scaling for cloud environments using temporal convolutional neural networks. J. Parallel Distrib. Comput. 2021, 154, 119–141. [Google Scholar] [CrossRef]
  34. Chudasama, V.; Bhavsar, M. A dynamic prediction for elastic resource allocation in hybrid cloud environment. Scalable Comput. Pract. Exp. 2020, 21, 661–672. [Google Scholar] [CrossRef]
  35. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  36. Hewamalage, H.; Bergmeir, C.; Bandara, K. Recurrent neural networks for time series forecasting: Current status and future directions. Int. J. Forecast. 2021, 37, 388–427. [Google Scholar] [CrossRef]
  37. Biswas, A.; Majumdar, S.; Nandy, B.; El-Haraki, A. A hybrid auto-scaling technique for clouds processing applications with service level agreements. J. Cloud Comput. 2017, 6, 29. [Google Scholar] [CrossRef]
  38. Gari, Y.; Monge, D.A.; Mateos, C. A Q-learning approach for the autoscaling of scientific workflows in the cloud. Future Gener. Comput. Syst. 2022, 127, 168–180. [Google Scholar] [CrossRef]
  39. Alidoost Alanagh, Y.; Firouzi, M.; Rasouli Kenari, A.; Shamsi, M. Introducing an adaptive model for auto-scaling cloud computing based on workload classification. Concurr. Comput. Pract. Exp. 2023, 35, e7720. [Google Scholar] [CrossRef]
  40. Tournaire, T.; Castel-Taleb, H.; Hyon, E. Efficient Computation of Optimal Thresholds in Cloud Auto-Scaling Systems. ACM Trans. Model. Perform. Eval. Comput. Syst. 2023, 8, 1–31. [Google Scholar] [CrossRef]
  41. Chouliaras, S.; Sotiriadis, S. Auto-scaling containerized cloud applications: A workload-driven approach. Simul. Model. Pract. Theory 2022, 121, 102654. [Google Scholar] [CrossRef]
  42. Tran, T.M.; Le, X.M.T.; Nguyen, H.T.; Huynh, V.N. A novel non-parametric method for time series classification based on k-Nearest Neighbors and Dynamic Time Warping Barycenter Averaging. Eng. Appl. Artif. Intell. 2019, 78, 173–185. [Google Scholar] [CrossRef]
  43. Wen, L.; Xu, M.; Toosi, A.N.; Ye, K. TempoScale: A Cloud Workloads Prediction Approach Integrating Short-Term and Long-Term Information. In Proceedings of the 2024 IEEE 17th International Conference on Cloud Computing (CLOUD), Shenzhen, China, 7–13 July 2024; pp. 183–193. [Google Scholar] [CrossRef]
  44. Torres, M.E.; Colominas, M.A.; Schlotthauer, G.; Flandrin, P. A complete ensemble empirical mode decomposition with adaptive noise. In Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 22–27 May 2011. [Google Scholar] [CrossRef]
  45. Liu, H.; Zhu, W.; Fu, S.; Lu, Y. A Trend Detection-Based Auto-Scaling Method for Containers in High-Concurrency Scenarios. IEEE Access 2024, 12, 71821–71834. [Google Scholar] [CrossRef]
  46. Chen, Z.; Ma, M.; Li, T.; Wang, H.; Li, C. Long sequence time-series forecasting with deep learning: A survey. Inf. Fusion 2023, 97, 101819. [Google Scholar] [CrossRef]
  47. Research.Google. Attention Is All You Need. Available online: https://research.google/pubs/attention-is-all-you-need/ (accessed on 25 February 2024).
  48. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115. [Google Scholar] [CrossRef]
  49. Zheng, Y.; Zhou, W.; Wang, C.; Zhang, J.; Tang, W.; Qi, L.; Ai, T.; Li, G.; Yu, B.; Yang, X. PheScale: Leveraging Transformer Models for Proactive VM Auto-scaling. In International Conference on Advanced Data Mining and Applications; Lecture Notes in Computer Science; Springer: Singapore, 2024; pp. 47–61. [Google Scholar] [CrossRef]
  50. Nie, Y.; Nguyen, N.H.; Sinthong, P.; Kalagnanam, J. A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. arXiv 2023. [Google Scholar] [CrossRef]
  51. Tanadechopon, T.; Kasemsontitum, B. Performance Evaluation of Programming Languages as API Services for Cloud Environments: A Comparative Study of PHP, Python, Node.js and Golang. In Proceedings of the 2023 7th International Conference on Information Technology (InCIT), Chiang Rai, Thailand, 16–17 November 2023. [Google Scholar] [CrossRef]
  52. Shreyas, M. Federated Cloud Services using Virtual API Proxy Layer in a Distributed Cloud Environment. In Proceedings of the 2017 Ninth International Conference on Advanced Computing (ICoAC), Chennai, India, 14–16 December 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 134–141. [Google Scholar]
  53. Harwalkar, S.; Sitaram, D.; Kidiyoor, D.V.; Milan, M.L.; D’souza, O.; Agarwal, R.; Agarwal, Y. Multicloud-auto scale with prediction and delta correction algorithm. In Proceedings of the 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), Kannur, India, 5–6 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 227–233. [Google Scholar]
  54. Eddy, S.R. What is a hidden Markov model? Nat. Biotechnol. 2004, 22, 1315–1316. [Google Scholar] [CrossRef] [PubMed]
  55. Rabiner, L.; Juang, B. An introduction to hidden Markov models. IEEE Assp. Mag. 1986, 3, 4–16. [Google Scholar] [CrossRef]
  56. Srirama, S.N.; Adhikari, M.; Paul, S. Application deployment using containers with auto-scaling for microservices in cloud environment. J. Netw. Comput. Appl. 2020, 160, 102629. [Google Scholar] [CrossRef]
  57. Abdel Khaleq, A.; Ra, I. Intelligent microservices autoscaling module using reinforcement learning. Clust. Comput. 2023, 26, 2789–2800. [Google Scholar] [CrossRef]
  58. Sarkis-Onofre, R.; Catalá-López, F.; Aromataris, E.; Lockwood, C. How to properly use the PRISMA Statement. Syst. Rev. 2021, 10, 117. [Google Scholar] [CrossRef] [PubMed]
  59. Mueen Ahmed, K.; Dhubaib, B.E.A. Zotero: A bibliographic assistant to researcher. J. Pharmacol. Pharmacother. 2011, 2, 304–305. [Google Scholar] [CrossRef] [PubMed]
  60. Sanad, A.J.; Hammad, M. Combining Spot Instances Hopping with Vertical Auto-scaling To Reduce Cloud Leasing Cost. In Proceedings of the 2020 International Conference on Innovation and Intelligence for Informatics, Computing and Technologies (3ICT), Sakheer, Bahrain, 20–21 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6. [Google Scholar]
  61. Lorido-Botran, T.; Miguel-Alonso, J.; Lozano, J.A. A review of auto-scaling techniques for elastic applications in cloud environments. J. Grid Comput. 2014, 12, 559–592. [Google Scholar] [CrossRef]
  62. Verma, S.; Bala, A. Auto-scaling techniques for IoT-based cloud applications: A review. Clust. Comput. 2021, 24, 2425–2459. [Google Scholar] [CrossRef]
  63. Sztrik, J. Basic Queuing Theory; University of Debrecen: Debrecen, Hungary, 2012; Volume 193, pp. 60–67. [Google Scholar]
  64. Sahni, J.; Vidyarthi, D.P. A cost-effective deadline-constrained dynamic scheduling algorithm for scientific workflows in a cloud environment. IEEE Trans. Cloud Comput. 2015, 6, 2–18. [Google Scholar] [CrossRef]
  65. Liu, Z.; Zhu, Z.; Gao, J.; Xu, C. Forecast Methods for Time Series Data: A Survey. IEEE Access 2021, 9, 91896–91912. [Google Scholar] [CrossRef]
  66. Gartner, I. Best Cloud Infrastructure and Platform Services Reviews 2023 | Gartner Peer Insights. Available online: https://www.gartner.com/reviews/market/cloud-infrastructure-and-platform-services (accessed on 22 January 2024).
  67. Selecting Cloud Computing Software for a Virtual Online Laboratory Supporting the Operating Systems Course | CTE Workshop Proceedings. Available online: https://acnsci.org/journal/index.php/cte/article/view/116 (accessed on 22 January 2024).
  68. Instance Auto Scaling—Amazon EC2 Autoscaling—AWS. Available online: https://aws.amazon.com/ec2/autoscaling/ (accessed on 22 January 2024).
  69. What Is Amazon EC2 Auto Scaling?—Amazon EC2 Auto Scaling. Available online: https://docs.aws.amazon.com/autoscaling/ec2/userguide/what-is-amazon-ec2-auto-scaling.html (accessed on 22 January 2024).
  70. Scheduled Scaling for Amazon EC2 Auto Scaling—Amazon EC2 Auto Scaling. Available online: https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-scheduled-scaling.html (accessed on 22 January 2024).
  71. Dynamic Scaling for Amazon EC2 Auto Scaling—Amazon EC2 Auto Scaling. Available online: https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scale-based-on-demand.html (accessed on 22 January 2024).
  72. Use Amazon CloudWatch Metrics—Amazon CloudWatch. Available online: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html (accessed on 22 January 2024).
  73. Scaling Based on Amazon SQS—Amazon EC2 Auto Scaling. Available online: https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-using-sqs-queue.html (accessed on 24 January 2024).
  74. Instance Groups Documentation Google Cloud. Available online: https://cloud.google.com/compute/docs/instance-groups (accessed on 24 January 2024).
  75. Scaling Based on CPU Utilization | Compute Engine Documentation | Google Cloud. Available online: https://cloud.google.com/compute/docs/autoscaler/scaling-cpu (accessed on 24 January 2024).
  76. Scale Based on Monitoring Metrics | Compute Engine Documentation | Google Cloud. Available online: https://cloud.google.com/compute/docs/autoscaler/scaling-cloud-monitoring-metrics (accessed on 24 January 2024).
  77. Scaling Based on Load Balancing Serving Capacity | Compute Engine Documentation | Google Cloud. Available online: https://cloud.google.com/compute/docs/autoscaler/scaling-load-balancing (accessed on 25 January 2024).
  78. Scaling Based on Schedules | Compute Engine Documentation | Google Cloud. Available online: https://cloud.google.com/compute/docs/autoscaler/scaling-schedules (accessed on 25 January 2024).
  79. Scaling Based on Predictions | Compute Engine Documentation | Google Cloud. Available online: https://cloud.google.com/compute/docs/autoscaler/predictive-autoscaling (accessed on 25 January 2024).
  80. Shim, J. Azure Virtual Machine Scale Sets Overview—Azure Virtual Machine Scale Sets. Available online: https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/overview (accessed on 25 January 2024).
  81. Overview of Azure Service Fabric—Azure Service Fabric. Available online: https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-overview (accessed on 25 January 2024).
  82. Orchestration Modes for Virtual Machine Scale Sets in Azure—Azure Virtual Machine Scale Sets. Available online: https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-orchestration-modes (accessed on 26 January 2024).
  83. Associate a Virtual Machine Scale Set with Flexible Orchestration to a Capacity Reservation Group (Preview)—Azure Virtual Machines. Available online: https://learn.microsoft.com/en-us/azure/virtual-machines/capacity-reservation-associate-virtual-machine-scale-set-flex (accessed on 26 January 2024).
  84. Manage Fault Domains in Azure Virtual Machine Scale Sets—Azure Virtual Machine Scale Sets. Available online: https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-manage-fault-domains (accessed on 26 January 2024).
  85. Autoscale in Azure Monitor—Azure Monitor. Available online: https://learn.microsoft.com/en-us/azure/azure-monitor/autoscale/autoscale-overview (accessed on 26 January 2024).
  86. Use Predictive Autoscale to Scale Out Before Load Demands in Virtual Machine Scale Sets—Azure Monitor. Available online: https://learn.microsoft.com/en-us/azure/azure-monitor/autoscale/autoscale-predictive (accessed on 26 January 2024).
  87. Apache CloudStack. Available online: https://cloudstack.apache.org/ (accessed on 26 January 2024).
  88. Plan for Peak Traffic and Launch Events | Architecture Framework | Google Cloud. Available online: https://cloud.google.com/architecture/framework/operational-excellence/plan-for-peak-traffic-and-launch-events (accessed on 26 January 2024).
  89. Radhika, E.; Sadasivam, G.S. A review on prediction based autoscaling techniques for heterogeneous applications in cloud environment. Mater. Today Proc. 2021, 45, 2793–2800. [Google Scholar] [CrossRef]
  90. Giles, M. A Major Outage at AWS Has Caused Chaos at Amazon’s Own Operations, Highlighting Cloud Computing Risks. Available online: https://www.forbes.com/sites/martingiles/2021/12/07/aws-outage-caused-chaos-at-amazon-underlining-cloud-computing-risks/?sh=5173d7496834 (accessed on 26 January 2024).
  91. Moghaddam, S.K.; Buyya, R.; Ramamohanarao, K. ACAS: An anomaly based cause aware auto-scaling framework for clouds. J. Parallel Distrib. Comput. 2019, 126, 107–120. [Google Scholar] [CrossRef]
  92. AWS. What Is Amazon CloudWatch?—Amazon CloudWatch. 2024. Available online: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html (accessed on 26 January 2024).
  93. Cloud Monitoring | Google Cloud. Available online: https://cloud.google.com/monitoring?hl=en (accessed on 26 January 2024).
  94. Rboucher. Azure Monitor Overview—Azure Monitor. Available online: https://learn.microsoft.com/en-us/azure/azure-monitor/overview (accessed on 26 January 2024).
  95. OpenTelemetry Documentation. 2024. Available online: https://opentelemetry.io/docs (accessed on 3 March 2024).
  96. Lee, D.; Yannakakis, M. Principles and methods of testing finite state machines—A survey. Proc. IEEE 1996, 84, 1090–1123. [Google Scholar] [CrossRef]
Figure 1. Breakdown of taxonomy. Boxes filled in green indicate the scope of this research.
Figure 2. Single and multi-cloud autoscaling scenarios.
Figure 3. PRISMA flow diagram.
Figure 4. Temporal frequency of the papers included in the review.
Figure 5. Classification of autoscaling techniques.
Figure 6. Challenges in VM autoscaling in single and hybrid clouds.
Table 1. An overview of this paper’s structure.

| Section | Section Heading | Brief Description |
|---|---|---|
| Section 2 | Rapid elasticity for VMs in Cloud Platforms and current challenges | Discusses the commonly used autoscaling techniques in cloud computing within both the reactive and proactive autoscaling categories. |
| Section 3 | Autoscaling in Clouds—Previous Work | Provides an overview of the selected papers, discussing the strengths and weaknesses of each work. |
| Section 4 | Methods and Materials | Outlines the methodology used for the systematic review. |
| Section 5 | Classification of Autoscaling Techniques | Offers a classification of commonly employed autoscaling techniques, along with a brief description of each method. |
| Section 6 | Commercial Autoscaling Approaches in Public Cloud Environments | Discusses the autoscaling approaches used in three widely used commercial public cloud providers and one open-source cloud platform. |
| Section 7 | What Are the Challenges to Achieving Autoscaling at the IaaS Layer? | Summarises the challenges from the review of commercial deployments (Section 6) and the autoscaling approaches proposed in the reviewed papers (Section 3). |
| Section 8 | Conclusions and Future Research Directions | Concludes the paper and presents future research directions. |
Table 2. Pros and cons of the three categories.

| Category | Pros | Cons |
|---|---|---|
| Reactive autoscaling | Straightforward and simple to use. | On-demand scaling introduces latency issues and SLA violations. |
| Single cloud with proactive scaling support | 1. Proactive resource forecasting addresses the latency and SLA issues. 2. Cost efficient due to the predictive nature of resource usage forecasting. | 1. Cloud native, with a lack of hybrid cloud support. 2. High computing resource requirements for ML-based algorithms. |
| Autoscaling across multiple clouds | Hybrid cloud support removes platform dependency and improves resilience. | 1. Limited proactive autoscaling support. 2. Difficult to execute autoscaling decisions across hybrid clouds (API differences). |
Table 3. Comparative summary of the approaches.

| Ref. | Year | Feature Marks (Reactive · Proactive · Hybrid Cloud Support · ML Based · Dynamic Scaling at Runtime · Workload Classification Support; × = not supported) | Principle Techniques | IaaS/PaaS |
|---|---|---|---|---|
| [16] | 2016 | ×××× | Control theory | IaaS |
| [18] | 2019 | ××× | Decision Tree/Random Decision Forest | IaaS |
| [21] | 2015 | ××× | Linear Regression (LR) and Support Vector Machine (SVM) for classification; 0–1 integer programming | IaaS |
| [24] | 2013 | ×××× | Control theory-based, using a customised open-source tool-set | IaaS |
| [25] | 2022 | ×××× | Control theory-based mathematical formula based on request arrival rate | IaaS |
| [26] | 2022 | ×××× | Reinforcement learning-based State Action Reward State Action (SARSA) | IaaS |
| [30] | 2016 | ×× | Reinforcement learning-based Fuzzy Q-Learning and control theory-based MAPE-K | IaaS |
| [33] | 2021 | × | Temporal Convolutional Neural Network | IaaS |
| [34] | 2020 | ×× | LSTM (Long Short-Term Memory) and Queuing Theory | IaaS |
| [37] | 2017 | ×× | Linear Regression and Support Vector Machine for prediction | IaaS |
| [38] | 2022 | ××× | Markov Decision Process and Q-Learning | IaaS |
| [39] | 2023 | × | Time series-based (LR, SVM, ARIMA) | IaaS |
| [40] | 2023 | ×× | Continuous Time Markov Chains (CTMC) and Markov Decision Process (MDP) | IaaS |
| [41] | 2022 | ×× | Convolutional Neural Networks (CNN) and K-means clustering | IaaS |
| [43] | 2024 | ×××× | CEEMDAN, an extension of Empirical Mode Decomposition (EMD) | PaaS |
| [45] | 2024 | ×××× | ARIMA, LSTM, Informer, and MAPE | PaaS |
| [49] | 2025 | ×××× | PatchTST and Adaptive Sequence Transformation | IaaS |
| [52] | 2017 | ×××× | Control theory-based approach using the OpenStack Nova API | IaaS |
| [53] | 2019 | ××× | Hidden Markov Model | IaaS |
| [56] | 2020 | × | Rule-based and heuristic-based Best Fit Dynamic Bin Packing | PaaS |
| [57] | 2023 | ×× | Reinforcement learning | PaaS |
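Several of the surveyed IaaS approaches (e.g. [21,37,39]) build proactive scaling on simple time-series regression over recent utilisation. The sketch below is an illustrative reconstruction of that idea, not code from any reviewed work; the CPU history, the 75% trigger, and the one-step forecast horizon are invented for the example.

```python
# Hedged sketch: proactive scaling via a least-squares linear trend over
# equally spaced utilisation samples, extrapolated one step ahead.

def forecast_next(history):
    """Fit y = slope*x + intercept over x = 0..n-1 and predict x = n."""
    n = len(history)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(history) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history))
    den = sum((x - x_mean) ** 2 for x in xs)
    slope = num / den
    intercept = y_mean - slope * x_mean
    return slope * n + intercept  # one-step-ahead extrapolation

cpu_history = [40, 48, 55, 63, 70]   # rising load (%), illustrative only
predicted = forecast_next(cpu_history)
print(round(predicted, 1))           # ≈ 77.7
scale_out = predicted > 75.0         # act before the threshold is breached
print(scale_out)                     # → True
```

The point of the proactive category is visible here: the current sample (70%) is still below the trigger, but the trend crosses it at the next step, so capacity can be added before the SLA is at risk.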
Table 4. Paper selection criteria.

| Stage | Description | Criteria | Outcome |
|---|---|---|---|
| 1 | Selection of databases | Selected the databases containing journals related to cloud, distributed computing, and information systems | The following databases were chosen to perform the searches: 1. IEEE Xplore; 2. Elsevier ScienceDirect; 3. ACM; 4. Springer; 5. ResearchGate |
| 2 | Searched the databases to find relevant papers published during the past 10 years | Used combinations of the keywords: 1. Cloud autoscaling; 2. Hybrid cloud autoscaling; 3. Issues in cloud autoscaling; 4. Autoscaling in IaaS; 5. Proactive autoscaling | Based on the title and abstract, papers related to cloud autoscaling were selected and saved to Zotero |
| 3 | Segregated papers further to filter out IaaS papers | The abstract, introduction, and conclusion sections of the papers were reviewed to organise the papers into the three cloud service models | Papers were organised into separate folders in Zotero: 1. IaaS; 2. PaaS; 3. SaaS |
| 4 | Further reviewed and removed papers in IaaS and selected the most relevant papers | The contents of all papers were reviewed, selecting papers that address horizontal autoscaling only | 21 papers were selected for the survey |
Table 5. Autoscaling techniques, key features, and use cases.

| Ref. | Autoscaling Technique | Key Methods | Key Features | Main Limitations | Use Cases |
|---|---|---|---|---|---|
| [61] | Threshold-based Rules | Uses one or more performance metrics, such as CPU load, average response time, or request rate, to trigger the scaling action | Actions are triggered by preset thresholds and rules defined against them; thresholds can be set for system resources, request metrics, or latency metrics | Reactive scaling support only; preset thresholds do not handle dynamic workloads well; over/under-provisioned resources | Scales VMs out/in based on system resource usage or incoming requests |
| [61,62] | Reinforcement Learning | MDP, Q-Learning, Parallel Learning, Neural Networks | Uses historical data learned dynamically through interactions with the autoscaling environment to inform decisions; future decisions progressively improve over time | Associated computational costs; improvements in autoscaling decisions are not immediate | Improves decision accuracy over time; resource cost can be optimised whilst providing optimum QoS |
| [65] | Queuing Theory | Kendall’s notation | Describes queue behaviour, linking the mean arrival rate λ of objects joining the queue to the mean service rate μ at which they are served | Requires mathematical analysis of the arrival and service rates to define the autoscaling rules | Efficiently distributes traffic to back-end VMs by considering the rate of incoming requests at a load balancer within a horizontal autoscaling framework |
| [63,65] | Control Theory | Open-loop, feedback, and feed-forward controllers | Uses the input of a target system at any given time to determine the output; three main varieties: (i) open-loop, (ii) feedback, and (iii) feed-forward | Performs reactive and dynamic scaling only; requires accurate input-to-output modelling to achieve optimum results | Resource allocation for both proactive and reactive autoscaling |
| [62] | Time Series Analysis | Time series regression, Exponential Smoothing, time series decomposition, ARIMA, RNN, LSTM | Analyses input workloads to identify repeating patterns in the analysis phase of the autoscaling process; five main categories: (i) time series regression, (ii) time series decomposition, (iii) Exponential Smoothing, (iv) ARIMA, and (v) neural network-based RNN/LSTM techniques | Early models are ineffective for bursty workloads; associated computational costs | Proactive autoscaling: predicts future workloads from historical usage patterns to trigger scaling actions in advance |
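To make the first row of Table 5 concrete, the following is a minimal sketch of a threshold-based rule of the kind described in [61]; the 80%/30% bounds, the step size of one VM, and the fleet limits are illustrative assumptions, not values from the reviewed literature.

```python
# Hedged sketch: one evaluation cycle of a threshold-based horizontal
# autoscaler. All thresholds and limits are invented example values.

def decide_scaling(avg_cpu, current_vms, *, upper=80.0, lower=30.0,
                   min_vms=1, max_vms=10):
    """Return the desired VM count after applying the preset rules."""
    if avg_cpu > upper and current_vms < max_vms:
        return current_vms + 1   # upper threshold breached: scale out
    if avg_cpu < lower and current_vms > min_vms:
        return current_vms - 1   # lower threshold breached: scale in
    return current_vms           # within bounds: no action

print(decide_scaling(92.0, 3))   # → 4 (scale out)
print(decide_scaling(15.0, 3))   # → 2 (scale in)
print(decide_scaling(55.0, 3))   # → 3 (steady state)
```

The table’s limitations are visible even in this toy: the rule reacts only after a breach, and a workload oscillating around a preset bound would cause the over/under-provisioning the survey describes.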
Table 6. Autoscaling techniques used across public clouds.

| Cloud Platform | Control Theory | Queuing Theory | Threshold-Based | Time Series | Reinforcement Learning | Hybrid Cloud Support |
|---|---|---|---|---|---|---|
| AWS | ✓ | ✓ | ✓ | ✓ | × | × |
| GCP | ✓ | ✓ | ✓ | ✓ | × | × |
| Azure | ✓ | ✓ | ✓ | ✓ | × | × |
| Apache | ✓ | × | ✓ | × | × | × |
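As an illustration of the threshold/target-style policies the commercial platforms expose, the fragment below shows the JSON shape accepted by AWS EC2 Auto Scaling’s target-tracking dynamic scaling [71,72]; the 50% CPU target is an arbitrary example value, not a recommendation from the reviewed sources.

```json
{
  "PredefinedMetricSpecification": {
    "PredefinedMetricType": "ASGAverageCPUUtilization"
  },
  "TargetValue": 50.0
}
```

A configuration of this kind is typically attached to an existing Auto Scaling group with `aws autoscaling put-scaling-policy --policy-type TargetTrackingScaling --target-tracking-configuration file://config.json`, with the group and policy names supplied by the user; the service then adjusts capacity to hold average CPU near the target, which is why the survey classifies such mechanisms under control-theoretic and threshold-based techniques.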
Table 7. Challenges in VM autoscaling in single and hybrid clouds.

| Category | Specific Challenge | Consequences | Impact |
|---|---|---|---|
| Capacity Estimation/Load Prediction Errors | Over/Under-provisioned Resources | Increased Operational Costs or Compromised SLA | Financial and Performance Impact |
| Delay in Capacity Allocation | Compromised SLA | Negative Reputational/Financial Impact | Business Continuity Issues |
| Infrastructure Failures | Non-Resilient Services/Compromised SLA | Service Downtime and Reliability Issues | Operational Risk |
| Metric Collection/Alerting Failures | Autoscaling Decision Errors | Over/Under-provisioned Resources | Cost Inefficiency and SLA Violations |
| Platform Dependency | Vendor Locked Services | Licensing Monopoly/Single Point of Failure | Reduced Flexibility and Increased Costs |
| Predictive Autoscaling on Hybrid Clouds | Efficient Resource Allocation and SLA/Cost Management | Improved Workload Handling and Resource Optimisation | Positive Reputational/Financial Impact |
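The capacity estimation challenge in the first row can be made concrete with the queuing relationship from Table 5: for a mean arrival rate λ and a per-VM service rate μ, the fleet size n must keep the per-VM offered load λ/(n·μ) below a utilisation target. The sketch below uses invented rates and a 70% utilisation target purely for illustration.

```python
import math

# Hedged sketch: sizing a VM fleet from arrival/service rates.
# lam = mean request arrival rate, mu = per-VM service rate,
# target_util = desired utilisation headroom. All values are invented.

def vms_required(lam, mu, target_util=0.7):
    """Minimum VMs so that per-VM utilisation lam/(n*mu) stays below target."""
    if not 0 < target_util < 1:
        raise ValueError("target utilisation must be in (0, 1)")
    return math.ceil(lam / (mu * target_util))

# 900 req/s arriving, each VM serving 50 req/s, kept below 70% busy:
print(vms_required(900, 50))   # → 26 VMs
# An optimistic forecast of 600 req/s would allocate only:
print(vms_required(600, 50))   # → 18 VMs
```

The gap between the two results (26 vs. 18 VMs for the same true demand) is exactly the table’s point: a load prediction error propagates directly into under-provisioning and a compromised SLA, or into over-provisioning and unnecessary cost.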
