An Efficient Multivariate Autoscaling Framework Using Bi-LSTM for Cloud Computing

Dang-Quang, Nhat-Minh; Yoo, Myungsik

doi:10.3390/app12073523

Open AccessArticle

An Efficient Multivariate Autoscaling Framework Using Bi-LSTM for Cloud Computing

by

Nhat-Minh Dang-Quang

^1,† and

Myungsik Yoo

^2,*,†

¹

Department of Information Communication Convergence Technology, Soongsil University, Seoul 06978, Korea

²

School of Electronic Engineering, Soongsil University, Seoul 06978, Korea

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Appl. Sci. 2022, 12(7), 3523; https://doi.org/10.3390/app12073523

Submission received: 27 February 2022 / Revised: 28 March 2022 / Accepted: 29 March 2022 / Published: 30 March 2022

Download

Browse Figures

Versions Notes

Abstract

:

With the rapid development of 5G technology, the need for a flexible and scalable real-time system for data processing has become increasingly important. By predicting future resource workloads, cloud service providers can automatically provision and deprovision user resources for the system beforehand, to meet service level agreements. However, workload demands fluctuate continuously over time, which makes their prediction difficult. Hence, several studies have proposed a technique called time series forecasting to accurately predict the resource workload. However, most of these studies focused solely on univariate time series forecasting; in other words, they only analyzed the measurement of a single feature. This study proposes an efficient multivariate autoscaling framework using bidirectional long short-term memory (Bi-LSTM) for cloud computing. The system framework was designed based on the monitor–analyze–plan–execute loop. The results obtained from our experiments on different actual workload datasets indicated that the proposed multivariate Bi-LSTM exhibited a root-mean-squared error (RMSE) prediction error 1.84-times smaller than that of the univariate one. Furthermore, it reduced the RMSE prediction error by 6.7% and 5.4% when compared with the multivariate LSTM and convolutional neural network-long short-term memory (CNN-LSTM) models, respectively. Finally, in terms of resource provisioning, the multivariate Bi-LSTM autoscaler was 47.2% and 14.7% more efficient than the multivariate LSTM and CNN-LSTM autoscalers, respectively.

Keywords:

multivariate variables; time series forecasting; autoscaling; resource estimation; cloud computing

1. Introduction

Cloud computing is currently playing a significant role in improving the efficiency of enterprises, industries, and governments [1,2]. Virtualization technology, which allows cloud providers to execute programs on virtual machines (VMs), is an essential component of cloud computing [3]. By renting VMs, stakeholders can deploy their applications on the VMs using pay-per-use prices instead of purchasing expensive physical servers. Elasticity is a key property of cloud computing as it allows cloud providers to create and remove resources as needed, increasing performance, while lowering costs [4]. Autoscaling is an automatic process that dynamically adds and removes resources such as VMs and containers. Autoscaling can be categorized into two types: reactive and proactive. Reactive autoscaling refers to processes that scale resources based on predefined rules or thresholds, whereas proactive autoscaling analyzes and predicts future workloads based on historical data. It then generates scaling decisions based on the predicted workloads. Reactive autoscaling has the advantage of its ease of implementation, but has a significant disadvantage in that it cannot handle the change in workload, leading to a waste of resources, which affects service level agreements and costs [5]. With the proactive approach, service providers can add or remove resources in advance. However, workload demands fluctuate continuously over time, which makes their prediction difficult. To address this problem, many studies have proposed a technique called time series forecasting [3,5,6,7,8,9,10,11]. However, most of these studies focused on univariate time series analysis, that is they only analyzed the measurement of a single feature (or variate or variable), such as CPU utilization or memory usage, and produced predictions. However, there are multiple features to be considered in the cloud system, some of which might have hidden correlations that depend on each other. Multivariate time series analysis can be used to study these hidden correlations. Thus, it helps the prediction model better understand the system and improve the prediction accuracy.

The large dimensionality and spatial–temporal dependence features of multivariate time series data, as well as the existence of noisy data make traditional statistical methodologies difficult to model effectively [12]. Owing to the advancement of artificial intelligence (AI), especially in deep learning, the technology has been applied to various industries, including computer vision, natural language processing, and medical image analysis [13]. Although traditional statistical methodologies can model time series data, deep-learning-based time series forecasting is becoming prevalent [3,9]. In addition, it is promising for designing autoscalers [14]. This paper proposes a multivariate deep-learning-based autoscaling framework using the proactive approach for cloud computing. There are various types of deep learning approaches used in time series forecasting, including recurrent neural networks (RNNs), artificial neural networks, convolutional neural networks (CNNs), and long short-term memory (LSTM). Bidirectional LSTM (Bi-LSTM) [15] is a special type of LSTM with two LSTM layers. These layers run the input data in opposite directions, which helps the model preserve information from the past and the future. Thus, Bi-LSTM can better accumulate knowledge and improve prediction accuracy. The contributions of this study are as follows:

First, we propose an efficient multivariate proactive autoscaling framework architecture using Bi-LSTM designed for cloud computing;
Second, through the implementation of a multivariate Bi-LSTM prediction model for time series forecasting, we demonstrate that it outperforms both the univariate Bi-LSTM model and other multivariate time series prediction algorithms;
Thirdly, we investigated the under-provisioning and over-provisioning of resources by the proposed multivariate Bi-LSTM autoscaler and other well-known multivariate autoscalers.

By utilizing different datasets, our studies revealed that the proposed multivariate Bi-LSTM model outperforms not only the univariate Bi-LSTM model, but also other well-known multivariate deep learning models such as LSTM and CNN-LSTM in prediction accuracy. Furthermore, this work also shows the advantage of the multivariate Bi-LSTM autoscaler compared with other multivariate deep-learning-based autoscalers in terms of resource provisioning efficiency.

The rest of the paper is organized as follows. In Section 2, we review research on autoscaling using the time series forecasting technique and its data. Section 3 presents the system framework architecture followed by the experiments and evaluation in Section 4. Finally, Section 5 concludes the paper.

2. Related Work

In this section, we review recent works related to autoscaling and resource estimation using time series forecasting in cloud computing. Related studies using multiple performance metrics, such as CPU utilization, memory usage, disk I/O, and network throughput, are still limited [11]. Table 1 lists some of these recent related studies.

Time series forecasting is a popular technique for analyzing the behavior of temporal data and predicting future values. It has been widely used in business [16], as well as traffic flow forecasting [17] and anomaly detection [18]. In general, time series data have a natural temporal ordering and can be categorized into two types: univariate and multivariate time series data.

Univariate time series data are a collection of measurements taken for a single variable over time. Univariate refers to a single variable, variate, or feature.

Calheiros et al. [8] proposed a cloud workload prediction module for software as a service (SaaS) using the autoregressive integrated moving average (ARIMA) method to adjust the number of VMs dynamically. They evaluated the proposed prediction model with a realistic HTTP request workload dataset from the Wikimedia Foundation [19]. For seasonal data, the ARIMA model had a 91% accuracy rate. However, the authors did not compare this with other approaches.

Li and Xia [7] presented a hybrid cloud autoscaling architecture known as the cloud resource prediction and provisioning scheme (RPPS). Their proposed prediction model used ARIMA to analyze the historical data of CPU utilization and predict its future usage. The proposed autoscaling architecture, evaluated using varying workloads, showed that their model could outperform the Kubernetes horizontal pod autoscaler (HPA) [20].

Prachitmutita et al. [6] presented a novel autoscaling framework using LSTM and multilayer perceptron (MLP) models to analyze and predict the number of HTTP requests coming to the system. They evaluated the models on a realistic HTTP request workload dataset of the FIFA World Cup 98 [21], and the results showed that the LSTM model could outperform MLP models in terms of accuracy.

LSTM was also used in [9] as a prediction service to predict the number of HTTP requests in the near future. The experiments were conducted on the FIFA World Cup 98 dataset. The authors showed that the LSTM model had a slight prediction error on the root-mean-squared error (RMSE) metric compared with the ARIMA model, indicating that ARIMA had a slightly better accuracy. However, the prediction speed of LSTM was 100-times faster than that of ARIMA.

Toka et al. [10] proposed to use a proactive forecasting method for proactive scaling decisions. The proactive forecasting method consisted of three different prediction models: autoregression (AR), hierarchical temporal memory (HTM), and LSTM. These models were used to learn and predict the incoming HTTP request workload to the system. The authors also proposed HPA+, a backtesting plugin that alternates between the AI-based forecasting technique and HPA automatically. It switches to the HPA if the AI-based forecasting method yields a poor prediction accuracy. The results showed that HPA+ could dramatically reduce the number of rejected requests at the cost of slightly higher resource usage.

Dang-Quang and Yoo [3] proposed a deep-learning-based autoscaling framework for Kubernetes. The proposed autoscaler used Bi-LSTM to analyze and predict the number of HTTP requests and then generate scaling actions. The results of the experiments on the FIFA World Cup 98 dataset showed that Bi-LSTM could outperform LSTM and ARIMA in terms of prediction accuracy, while maintaining the same prediction speed as that of LSTM.

Multivariate time series data are a collection of measurements taken for multiple variables across time. Studying multivariate time series data helps researchers examine and compute the relationship between features. To manage the relationship between variables, they can utilize cross-tabulation, partial correlation, and multiple regressions, as well as add other variables to determine the relationships between the independent and dependent variables or to show the conditions under which the association occurs. The advantage of multivariate analysis over single-variable analysis is that it provides a more realistic picture, thus providing a more powerful test of significance [22].

Yan et al. [5] proposed a hybrid autoscaling system for Kubernetes [20]. The proposed prediction model used Bi-LSTM to analyze and predict the CPU utilization and memory usage of both VMs and pods. To produce elastic scaling actions, the proposed system integrated online reinforcement learning into the proposed Bi-LSTM prediction model, which is a reactive model. The results of the experiments showed that the proposed Bi-LSTM model had better accuracy than those of the ARIMA, RNN, and LSTM models. However, no comparison with the univariate model was performed to analyze the improvement, and the configuration of the proposed Bi-LSTM model was not available. Furthermore, this work did not compare its proposed model with the most recently proposed CNN-LSTM model, and no technique for oscillation mitigation was addressed.

Ouhame and Hadi [11] proposed a multivariate prediction model for cloud computing using a CNN-LSTM network. The proposed model receives multiple features, such as CPU utilization, memory usage, network usage, and disk usage, as inputs to the prediction model. The proposed model was evaluated on the GWA-T-12 dataset from the Bitbrains data center [23] and showed that it could outperform the ARIMA-LSTM model in terms of accuracy. However, this study did not use any feature selection method to select the suitable subset features for the inputs of the model.

Most of the elated studies focused only on univariate time series analysis, which means that they only analyzed the measurement of a single feature (or variate or variable), such as CPU utilization, or memory usage, or network throughput (HTTP requests), and produced predictions. However, there are multiple features to be considered in the cloud system, some of which might have hidden correlations that depend on each other. Multivariate time series analysis can be used to study these hidden correlations. Thus, it helps the prediction model to better understand the system and improve the prediction accuracy. Furthermore, as mentioned earlier, the huge dimensionality and spatial–temporal dependence features of multivariate time series data, as well as the existence of noisy data make traditional statistical methodologies difficult to model effectively [12]. In addition, statistical methods have been relatively slow in matching dynamic workload demands [3,9], which is not suitable for designing real-time autoscaling systems. For these reasons and given the great success of AI, especially deep learning in various fields, this paper proposes an efficient multivariate autoscaling framework using Bi-LSTM for cloud computing environments.

3. Proposed Autoscaling Framework

The proposed system framework is presented in Figure 1. It has two main components: managed resource and autonomic manager. The managed resource component monitors and controls the resources in the system. The autonomic manager component is responsible for estimating needed resources and providing scaling actions.

3.1. Managed Resource

3.1.1. Effector Component

The effector component receives scaling action commands (from the autonomic manager) and executes them. Furthermore, after each scaling action, it validates the currently managed resources to determine whether they meet the desired ones.

3.1.2. Collector Component

In time series forecasting, a multivariate prediction model usually consists of multiple feature values at each time step. The collector component collects the resource utilization data of multiple features such as the CPU, memory, disk I/O, and network received throughput. All the collected data are sent to the monitor server of the autonomic manager.

3.2. Autonomic Manager

3.2.1. Monitor Phase

Monitoring server: In the first phase of the MAPE loop, the monitoring server continuously gathers the different types of data from the collector component of the monitor server and stores them in the Prometheus time series database;
Prometheus time series database: Prometheus is a powerful open-source time series database developed by SoundCloud that allows users to leverage the collected data and use them to make decisions. The collected data can be easily exposed and accessed through application programming interface (API) services.

3.2.2. Analyze Phase

Feature selection service:
Feature selection enables a machine learning algorithm to train faster, reduces the complexity, and makes the interpretation easier. Furthermore, it can improve the prediction accuracy if the right subset of features is chosen. Finally, it can prevent the overfitting of the model.
The feature selection component accesses the collected data through the exposed API. Each feature value depends not only on its previous values, but also on other feature values. This correlation between features is important when modeling a multivariate prediction model. In this study, we used the well-known Pearson correlation feature-selection method to measure the linear relationship between two features, where the change in one feature is related to a corresponding change in other features. This is measured by the ratio between the covariance of two different features and the product of their standard deviations. Equation (1) presents the Pearson correlation, $r_{p e a r}$ , and the coefficient of two resource metrics: $X_{i}$ and $X_{j}$ .

$\begin{matrix} r_{p e a r} = & c o r r (X_{i}, X_{j}) = \frac{\sum_{t = 1}^{T} (x_{i}^{t} - {\bar{x}}_{i}) (x_{j}^{t} - {\bar{x}}_{j})}{\sqrt{\sum_{t = 1}^{T} {(x_{i}^{t} - {\bar{x}}_{i})}^{2}} \sqrt{\sum_{t = 1}^{T} {(x_{j}^{t} - {\bar{x}}_{j})}^{2}}} \end{matrix}$

(1)

We denote ${\bar{x}}_{i}$ and ${\bar{x}}_{j}$ as the average values of the features $X_{i}$ and $X_{j}$ , respectively. The Pearson correlation outputs a value between −1 and 1, where 1 indicates the maximum positive correlation and −1 denotes the negative correlation between the two features. The Pearson correlation will output a value of 0 if the two features are independent of each other. Finally, the Pearson correlation is symmetric; therefore, $c o r r (X_{i}, X_{j}) = c o r r (X_{j}, X_{i})$ . In this study, we chose the network received throughput as the desired prediction metric. Then, we calculated the Pearson correlation value of every resource metric pair (CPU utilization, memory usage, etc.) with the network received throughput. Finally, we selected the resource metric with the highest correlation. The chosen resource metric and network received throughput were selected as the input features of the multivariate Bi-LSTM forecasting service;
Multivariate Bi-LSTM forecasting service:
Time series forecasting takes the historical time series data as the input and outputs the future temporal value $p_{t + 1}$ at time step $t + 1$ or $f_{t + p}$ at time step $t + p$ . The model is fed data that include the variable $f_{i}$ itself and other multivariate time series variables. Using the network received throughput as an example, the model not only includes the network received throughput, but also includes other features such as CPU usage, memory usage, and disk I/O. Network received throughput refers to the amount of data that can be received from the source to the destination within a given time frame. The number of packets that successfully arrive at their destinations is measured by the throughput. Throughput capacity is usually expressed in bits per second, although it can be alternatively expressed as data per second. By predicting the network received throughput, we can estimate the number of incoming packets to our system and better provision or deprovision resources (VMs) to process those packets. For this reason, in this study, we chose the network received throughput as the desired prediction metric for preparing the scaling action described in the planning phase in Section 3.2.3.

$M D = (\begin{matrix} m d_{1, t - w} & \dots & m d_{1, t} \\ m d_{2, t - w} & \dots & m d_{2, t} \end{matrix}) \overset{trained model}{\to} (m d_{1, t + 1})$

(2)

As expressed in Equation (2), the multivariate forecasting model in this paper attempts to predict the value at the next time step $t + 1$ , given a history of multivariate time series dataset $M D = m d_{i, j} | i = 1, 2; j = t - w, \dots, t - 1, t$ in the past, where i and w denote the numbers of features and historical time step data, respectively. The proposed model comprises two features: the network received throughput and the feature with the highest correlation chosen from the feature selection service above.
The multivariate Bi-LSTM forecasting service proposed in this paper collects historical metric data with size w from the Prometheus time series database. Then, the trained model uses the current collected data to predict the data at the next time step $t + 1$ of the network received throughput, $m d_{1, t + 1}$ .
Although the dependencies of long-term states can be established theoretically, an RNN can only learn short-term dependencies because of the gradient vanishing problem [24]. This problem is called long-term dependence. LSTM is a special type of RNN. It was proposed by Hochreiter and Schmihuber in 1996 [25] to solve the RNN problem. Furthermore, it is used in various fields, including weather forecasting, image processing, video analysis, and time series forecasting. The long-term temporal dependence properties of time series data can be learned successfully using LSTM [25]. A typical LSTM unit consists of five components, as shown in the right corner of Figure 2. The inputs of the multivariate Bi-LSTM model include two features: $X_{1}$ and $X_{2}$ , which are the network received throughput and the chosen feature from the feature selection service. Each red dashed box area presents a matrix of time series data of both input features at a specific time step. For example, the red dashed box area of $X_{t - w}$ is a matrix [ $x_{1, t - w}$ , $x_{2, t - w}$ ] that contains two separate historical time series data of features $X_{1}$ and $X_{2}$ , respectively, at time step $t - w$ . We use $i_{t}, f_{t}, c_{t}, o_{t},$ and $h_{t}$ to respectively denote the input gate, forward gate, memory cell, output gate, and hidden state of a basic LSTM unit. The gates and memory cell of the LSTM unit allow it to memorize or forget information.

$i_{t} = σ (M_{i} x_{t} + N_{i} h_{t - 1} + b_{i})$

(3)

$f_{t} = σ (M_{f} x_{t} + N_{f} h_{t - 1} + b_{f})$

(4)

$o_{t} = σ (M_{o} x_{t} + N_{o} h_{t - 1} + b_{o})$

(5)

$\tilde{c_{t}} = t a n h (M_{c} x_{t} + N_{c} h_{t - 1} + b_{c})$

(6)

$c_{t} = f_{t} ⊙ c_{t - 1} + i_{t} ⊙ \tilde{c_{t}}$

(7)

$h_{t} = o_{t} ⊙ t a n h (c_{t})$

(8)

We denote $h_{t}$ and $c_{t}$ as the hidden state and memory state of the LSTM, respectively. The LSTM computes $h_{t}$ and $c_{t}$ for an input time series value at time step t. We denote $σ$ and ⊙ as the activation function and Hadamard product (elementwise product), respectively. The Hadamard product is a binary operation that takes two matrices of the same dimensions and outputs a new matrix of the same dimensions as the operands. However, because LSTM can only learn the past context of time series data, it has a restriction. Thus, in this study, we propose to use Bi-LSTM. Bi-LSTM is a special type of LSTM [15] that has two LSTM layers for simultaneously processing the input time series data in two directions. The first LSTM processes sequence input data in a forward direction from t = 1 to t = T. In contrast, the second LSTM processes the input data in the backward direction from t = T to t = 1. This Bi-LSTM architecture executes the input in two directions: from the past to the future and from the future to the past. Applying LSTM twice results in further learning of the long-term dependencies. Therefore, it improves the accuracy of the model. The equations for Bi-LSTM are as follows:

$\vec{i_{t}} = σ (\vec{M_{i}} \vec{x_{t}} + \vec{N_{i}} \vec{h_{t - 1}} + \vec{b_{i}})$

(9)

$\vec{f_{t}} = σ (\vec{M_{f}} \vec{x_{t}} + \vec{N_{f}} \vec{h_{t - 1}} + \vec{b_{f}})$

(10)

$\vec{o_{t}} = σ (\vec{M_{o}} \vec{x_{t}} + \vec{N_{o}} \vec{h_{t - 1}} + \vec{b_{o}})$

(11)

$\vec{\tilde{c_{t}}} = t a n h (\vec{M_{c}} \vec{x_{t}} + \vec{N_{c}} \vec{h_{t - 1}} + \vec{b_{c}})$

(12)

$\vec{c_{t}} = \vec{f_{t}} ⊙ \vec{c_{t - 1}} + \vec{i_{t}} ⊙ \vec{\tilde{c_{t}}}$

(13)

$\vec{h_{t}} = \vec{o_{t}} ⊙ \vec{t a n h (c_{t})}$

(14)

$\overset{\leftarrow}{i_{t}} = σ (\overset{\leftarrow}{M_{i}} \overset{\leftarrow}{x_{t}} + \overset{\leftarrow}{N_{i}} \overset{\leftarrow}{h_{t - 1}} + \overset{\leftarrow}{b_{i}})$

(15)

$\overset{\leftarrow}{f_{t}} = σ (\overset{\leftarrow}{M_{f}} \overset{\leftarrow}{x_{t}} + \overset{\leftarrow}{N_{f}} \overset{\leftarrow}{h_{t - 1}} + \overset{\leftarrow}{b_{f}})$

(16)

$\overset{\leftarrow}{o_{t}} = σ (\overset{\leftarrow}{M_{o}} \overset{\leftarrow}{x_{t}} + \overset{\leftarrow}{N_{o}} \overset{\leftarrow}{h_{t - 1}} + \overset{\leftarrow}{b_{o}})$

(17)

$\overset{\leftarrow}{\tilde{c_{t}}} = t a n h (\overset{\leftarrow}{M_{c}} \overset{\leftarrow}{x_{t}} + \overset{\leftarrow}{N_{c}} \overset{\leftarrow}{h_{t - 1}} + \overset{\leftarrow}{b_{c}})$

(18)

$\overset{\leftarrow}{c_{t}} = \overset{\leftarrow}{f_{t}} ⊙ \overset{\leftarrow}{c_{t - 1}} + \overset{\leftarrow}{i_{t}} ⊙ \overset{\leftarrow}{\tilde{c_{t}}}$

(19)

$\overset{\leftarrow}{h_{t}} = \overset{\leftarrow}{o_{t}} ⊙ \overset{\leftarrow}{t a n h (c_{t})}$

(20)

$h_{t} = \vec{h_{t}} ⊙ \overset{\leftarrow}{h_{t}}$

(21)

As shown by the above equations, the arrow shows the processing direction. We denote $h_{t}$ as the final hidden output of Bi-LSTM, which is calculated by concatenating the hidden forward output $\vec{h_{t}}$ and hidden backward output $\overset{\leftarrow}{h_{t}}$ . The dense layer is a neural network layer that is deeply connected, which means that each neuron in the dense layer receives input from all neurons of its previous layer and then it outputs the predictions.

3.2.3. Planning Phase

The resource estimation service calculates the number of resources (VMs or containers) based on the predicted network received throughput of the previous step,

n e t w o r k_{t + 1}

, as shown in Algorithm 1. The autoscaler has to deal with oscillation problems that frequently perform opposite scaling actions in a short time. This problem will result in a wastage of resources and budget. We applied the control period time (CPT) between each scaling action to deal with this problem. In addition, we set the CPT value to 1 min, which is fine grained for the VM setting.

V M s_{t + 1}

and

n e t w o r k_{V M}

are the number of estimated VMs in the next time step and the maximum network workload that a VM can handle in a minute, respectively.

V M s_{m i n}

is the minimum number of VMs, and the system cannot scale down the number of VMs to a number lower than this value. For each CPT (Line 3), we calculated

V M s_{t + 1}

(Line 4). If

V M s_{t + 1}

is higher than the current number of VMs

V M s_{c u r r e n t}

(Line 5), for the next time step, we scheduled the scale-up command to provision the number of VMs required to fulfill the resource requirement (Line 6). Otherwise, if

V M s_{t + 1}

is lower than

V M s_{c u r r e n t}

(Line 7), we obtained the maximum value between

V M s_{t + 1}

and

V M s_{m i n}

(Line 8), and then scheduled the scale-down command to deprovision unneeded VMs (Line 9). Otherwise, the autoscaler sleeps and waits until the next interval to keep recalculating the number of needed VMs and performing scaling actions (Line 11).

Algorithm 1 Adaption strategy service algorithm.

Input:

n e t w o r k_{t + 1}

// predicted workload in the next interval

V M s_{m i n}

// minimum number of VMs that the system has to maintain

Output: Scheduled scaling actions

1:: Initialization
2:: while system is running do
3:: for each CPT do
4:: $V M s_{t + 1}$ = $\frac{n e t w o r k_{t + 1}}{n e t w o r k_{V M}}$
5:: if $V M s_{t + 1}$ > $V M s_{c u r r e n t}$ then
6:: $S C H E D U L E_S C A L E_U P_C O M M A N D$ ( $V M s_{t + 1})$
7:: else if $V M s_{t + 1}$ < $V M s_{c u r r e n t}$ then
8:: $V M s_{t + 1}$ $= m a x (V M s_{t + 1}, V M s_{m i n})$
9:: $S C H E D U L E_S C A L E_D O W N_C O M M A N D$ ( $V M s_{t + 1})$
10:: else
11:: Sleep
12:: end if
13:: end for
14:: end while

3.2.4. Execution Phase

The execution phase has two cases. If it receives a scheduled scale-up command, the execution phase first sends the scale-up command to the effector component to spawn a number of VMs to satisfy the demand. Subsequently, it schedules the time required to assign the spawned VMs to a cluster at the next time step. Otherwise, when it receives a scheduled scale-down command, it waits until the next time step and begins to discard the VMs according to the demand.

4. Experiment and Evaluation

To evaluate the proposed system framework described in Section 4, all experiments were performed using real workload datasets obtained from the Bitbrains and Materna data centers.

4.1. Experiment Detail

First, we calculated the Pearson correlation value of every resource metric pair of other resource metrics (CPU utilization, memory usage, etc.) with the network received throughput to select the resource metric with the highest correlation value.

Second, we used the chosen resource and network received throughput metrics as inputs to the proposed multivariate Bi-LSTM model and compared them with the univariate one. The univariate Bi-LSTM model only has a network received throughput metric as its input. Furthermore, the multivariate Bi-LSTM model was compared with other well-known multivariate deep learning models (LSTM and CNN-LSTM) [11] to evaluate its performance. All prediction models performed prediction and output the future network received throughput data of the system. We used the Python programming language and trained the models on the tensor processing unit from Google Colab [26]. Furthermore, the code for implementing these models followed the code in the Deep Learning for Time Series Forecasting book [27]. The configuration of the proposed multivariate Bi-LSTM model is presented in Table 2.

Finally, we simulated the environment to evaluate the resource provisioning efficiency of the proposed multivariate Bi-LSTM autoscaler compared with those of other multivariate deep-learning-based autoscalers, as mentioned above. In this experiment, we used the first 200 observations in the testing set of the GWA-T-13 dataset to estimate the resource provision efficiency of each multivariate model. In addition, we assumed that each VM could handle 3300 KB/s of network received throughput,

n e t w o r k_{V M} = 3300

, and that the workload was balanced between VMs.

4.2. Dataset

We evaluated the proposed multivariate Bi-LSTM model on two real-world workload datasets: GWA-T-12 [23] from the Bitbrains data center and GWA-T-13 [28] from the Materna data center.

4.2.1. GWA-T-12 Bitbrains

The GWA-T-12 [23] logs contain the performance metrics of 1750 VMs from the Bitbrains data center. These logs are organized according to the following traces: fastStorage and Rnd. This study used the first trace—fastStorage—which contains the performance metrics of 1250 VMs that are connected to the storage devices of a fast storage area network.

4.2.2. GWA-T-13 Materna

The GWA-T-13 [28] dataset contains three traces of the performance metrics of 520, 527, and 547 VMs from the Materna data center. Materna is a premium full-service supplier that has successfully implemented innovation-through-collaboration projects for its clients for over 35 years. We chose the first trace that contained the performance metrics of 520 VMs.

4.3. Metric for Validation

Three standard statistical metrics, mean absolute error (MAE), MSE, and RMSE, were used to evaluate the prediction accuracy of the models. The equations of these metrics are expressed in Equations (22)–(24), respectively.

M A E = \frac{1}{n} \sum_{i = 1}^{n} |(v_{i} - l_{i})|

(22)

M S E = \frac{1}{n} \sum_{i = 1}^{n} {(v_{i} - l_{i})}^{2}

(23)

R M S E = \sqrt{M S E} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(v_{i} - l_{i})}^{2}}

(24)

We denote

v_{i}

and

l_{i}

as predicted and true values, respectively. The MAE is the difference between the predicted and true values. The MSE calculates the average squared difference between the predicted and true values. The RMSE is the square root of the average of the squared differences between the prediction and true observation. These metrics are the three most commonly used metrics for measuring the accuracy of continuous data, where smaller values indicate higher prediction accuracy and vice versa.

To evaluate the resource provisioning efficiency, we applied the system-oriented elasticity metrics presented by the Research Group of the Standard Performance Evaluation Corporation (SPEC) [29]. Various studies [3,9,30] have also used these metrics to evaluate the resource provisioning efficiency of their autoscalers. The equations of the metrics are shown in Equations (25)–(29), where

d_{n}

and

p_{n}

are the resource demand and provisioned resources, respectively.

The under-provisioning resource metric,

Θ_{U}

, calculates the number of required VMs that must be provisioned to meet the service-level objectives normalized by the measured time. The over-provisioning resource metric,

Θ_{O}

, calculates the number of excess VMs that are provisioned by the autoscaler. The values of

Θ_{U}

and

Θ_{O}

range between 0 and infinity, where 0 is the best value.

T_{U}

and

T_{O}

are the durations the system is under-provisioned and over-provisioned, respectively.

T_{U}

and

T_{O}

have values ranging from 0 to 100, with 0 being the optimal value when no under-provisioning or over-provisioning occurs throughout the measurement period. Finally, the elastic speed-up metric,

ϵ_{n a}

, is the efficiency of the autoscaling compared with that of the no-autoscaling case, where $n a$ indicates no autoscaler is used and a indicates otherwise. The default

ϵ_{n a}

value of $n a$ is 1.0. If the proposed autoscaler (a case) has an

ϵ_{n a}

value greater than 1.0, it means that the autoscaler has an autoscaling gain over the no-autoscaler case; otherwise, it has none.

Θ_{U} [%] = \frac{100}{T} \sum_{t = 1}^{T} \frac{m a x (d_{n} - p_{n}, 0)}{d_{n}} Δ t

(25)

Θ_{O} [%] = \frac{100}{T} \sum_{t = 1}^{T} \frac{m a x (p_{n} - d_{n}, 0)}{d_{n}} Δ t

(26)

T_{U} [%] = \frac{100}{T} \sum_{t = 1}^{T} m a x (s g n (d_{n} - p_{n}), 0) Δ t

(27)

T_{O} [%] = \frac{100}{T} \sum_{t = 1}^{T} m a x (s g n (p_{n} - d_{n}), 0) Δ t

(28)

ϵ_{n a} = {(\frac{Θ_{U, n a}}{Θ_{U, a}} \frac{Θ_{O, n a}}{Θ_{O, a}} \frac{T_{U, n a}}{T_{U, a}} \frac{T_{O, n a}}{T_{O, a}})}^{\frac{1}{4}}

(29)

4.4. Experiment Results

4.4.1. Feature Selection Results

GWA-T-12 dataset:
Table 3 shows the Pearson correlation values of the network received throughput metric with other metrics for the GWA-T-12 dataset. As can be seen in the table, memory usage had the highest correlation value of 0.746. Thus, the network received throughput and memory usage were used as inputs of the multivariate models to evaluate the prediction accuracy;
GWA-T-13 dataset:
Table 4 presents the Pearson correlation values of the network received throughput metric with other metrics for the GWA-T-13 dataset. The memory usage metric in this dataset also had the highest correlation value. Thus, the memory usage and network received throughput were used as inputs to the multivariate models to evaluate their prediction accuracy.

We preprocessed the dataset to a value in the range [0, 1] based on Equation (30). Subsequently, we divided each dataset into training (80%) and evaluation (20%) datasets for both datasets.

As mentioned earlier, the feature selection can attempt to prevent overfitting of the prediction model. Figure 3 presents the model loss of the proposed multivariate Bi-LSTM model during the training process. The training dataset (which contains 80% of the entire dataset) was split into two parts: training (80%) and validation (20%). As shown in Figure 3, the loss during the training and validation processes decreased over time. Thus, the model did not have an overfitting problem.

z = \frac{x - m i n (x)}{m a x (x) - m i n (x)}

(30)

4.4.2. Prediction Performance Results

GWA-T-12 dataset:
The MAE, MSE, and RMSE metric values for the multivariate Bi-LSTM model were lower than those for the univariate Bi-LSTM model, as shown in Table 5. The proposed multivariate Bi-LSTM had an RMSE prediction error 1.84-times smaller than that of the univariate one. These results indicate that Bi-LSTM had a better prediction accuracy. Figure 4 shows the absolute error values for the first 100 observations of the univariate and multivariate Bi-LSTM models. We can see in the figure that the multivariate Bi-LSTM had smaller absolute error values in every observation than those of the univariate Bi-LSTM.
Table 6 also indicates that the multivariate Bi-LSTM model had smaller prediction error values for the MAE, MSE, and RMSE metrics than those of the multivariate LSTM and CNN-LSTM models. Moreover, the multivariate Bi-LSTM had better accuracy than those of the other models. We plotted the absolute error values of the first 100 observations using different multivariate prediction models, as shown in Figure 5. In the figure, we can clearly see that the multivariate Bi-LSTM had smaller absolute error values in all 100 observations. This proves that the time series data in the forward direction do not provide effective information other than to integrate the backward direction of the time series, causing the LSTM and CNN-LSTM models to become less effective;
GWA-T-13 dataset:
Table 7 lists the results obtained by the univariate and multivariate Bi-LSTM models on the GWA-T-13 dataset. The multivariate Bi-LSTM model outperformed the univariate one, reducing the RMSE prediction error by 3.4% compared with that of the univariate model. Figure 6 and Figure 7 show the results obtained on the testing dataset. The multivariate Bi-LSTM model had better prediction results. We can clearly see this by observing some high peak points where the multivariate model had closer predicted values to the actual values than those of the univariate model. Figure 8 visualizes the absolute error for the first 100 observations of both models and shows that the multivariate Bi-LSTM had smaller prediction absolute error values than those of the univariate one.
When compared with other well-known multivariate models listed in Table 8, the Bi-LSTM model also had better prediction accuracy than those of the LSTM and CNN-LSTM models. It reduced the RMSE prediction error by 6.7% and 5.4% when compared with the LSTM and CNN-LSTM models, respectively. Figure 9 and Figure 10 show the predicted values versus the actual values on the testing dataset. Figure 9 tends to over-predict over time when compared with Figure 7 and Figure 10. Figure 11 shows the absolute error for the first 100 observations of the three multivariate models, where Bi-LSTM had smaller prediction absolute error values than those of the LSTM and CNN-LSTM models.

4.4.3. Evaluation of the Autoscaler

Figure 12, Figure 13 and Figure 14 show the number of provisioned VMs versus the number of required VMs of the different multivariate models calculated using Algorithm 1. A comparison of the resource provisioning of the autoscaler using different multivariate models is also shown in Table 9.

The Bi-LSTM autoscaler performed better by having smaller over-provisioned (

Θ_{O}

,

T_{0}

) and under-provisioned (

Θ_{U}

,

T_{U}

) resource metric values than those of the LSTM autoscaler. Furthermore, it had better elastic speed-up

ϵ_{n a}

values: 1.759 > 1.195 (increase of 47.2%), which indicate that it had better autoscaling gain than that of the LSTM autoscaler.

Although the CNN-LSTM autoscaler exhibited a similar under-provisioning value to that of Bi-LSTM, its over-provisioning value was higher. Consequently, the Bi-LSTM autoscaler exhibited better autoscaling gain than that of the CNN-LSTM autoscaler. This was demonstrated by the 14.7% increase in the

ϵ_{n a}

values when compared with those of the CNN-LSTM autoscaler.

5. Conclusions

Cloud computing plays a significant role in improving the efficiency of enterprises, industries, etc. Autoscaling is an essential mechanism in cloud computing that automatically adds or removes resources. However, over time, the workload demands fluctuate continuously, which makes their prediction difficult. To predict the resource workload, many studies have proposed a technique called time series forecasting. However, most of these studies have focused only on univariate time series forecasting. This study proposed an efficient multivariate autoscaling framework using Bi-LSTM for cloud computing environments. The proposed autoscaling system framework can be flexibly implemented for VM platforms (such as OpenStack [31]) and container platforms (such as Docker Swarm [32] and Kubernetes [33]). The experiments were conducted on actual trace workload datasets from the Bitbrains and Materna data centers. The evaluation results showed that our proposed multivariate Bi-LSTM model can outperform not only the univariate Bi-LSTM model, but also other well-known multivariate deep learning models such as LSTM and CNN-LSTM in terms of prediction accuracy. Furthermore, this work also showed the advantage in the resource-provision efficiency of the proposed multivariate Bi-LSTM autoscaler, which had better autoscaling gains than those of the other multivariate deep-learning-based autoscalers.

In future work, we plan to extend our work to hybrid autoscaling (which fuses reactive and proactive autoscaling) and extend the inputs of the proposed prediction model to evaluate its performance.

Author Contributions

N.-M.D.-Q. proposed the idea, performed the analysis, and wrote the manuscript. M.Y. provided the guidance for the data analysis and paper writing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2021-0-02046 and IITP-2021-2017-0-01633) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

$r_{p e a r}$	Pearson correlation
X	Features
${\bar{x}}_{i}$	Average value of feature $X_{i}$
${\bar{x}}_{j}$	Average value of feature $X_{j}$
$M D$	History of multivariate time series dataset
$m d_{i, j} \| i = 1, 2; j = t - w, \dots, t - 1$	History time series data of the selected feature at a
	specific time step
$i_{t}$	Input gate of an LSTM unit
$f_{t}$	Forward gate of an LSTM unit
$c_{t}$	Memory cell of an LSTM unit
$o_{t}$	Output gate of an LSTM unit
$h_{t}$	Hidden state of an LSTM unit
$σ$	Activation functions
⊙	Hadamard product
$n e t w o r k_{t + 1}$	Predicted workload in the next interval
$V M s_{m i n}$	Minimum number of VMs that the system has to maintain
CPT	Control duration
$V M s_{t + 1}$	Number of estimated VMs at the next interval
$n e t w o r k_{V M}$	Maximum network workload that a VM can handle in a minute
$n e t w o r k_{t + 1}$	Estimated network workload in the next time step
$V M s_{c u r r e n t}$	Number of current VMs in the system
v	Predicted values
l	True values
$M A E$	Mean absolute error
$M S E$	Mean-squared error
$R M S E$	Root-mean-squared error
$θ_{U}$	Under-provisioning resource metric
$θ_{O}$	Over-provisioning resource metric
$T_{U}$	Duration system is under-provisioned
$T_{O}$	Duration system is over-provisioned
$d_{n}$	Number of required VMs to meet SLOs
$p_{n}$	Number of provided VMs by the autoscaling system
$ϵ_{n a}$	Elastic speed-up value or autoscaling gain value

References

Aslanpour, M.S.; Gill, S.S.; Toosi, A.N. Performance evaluation metrics for cloud, fog and edge computing: A review, taxonomy, benchmarks and standards for future research. Internet Things 2020, 12, 100273. [Google Scholar] [CrossRef]
Varghese, B.; Buyya, R. Next generation cloud computing: New trends and research directions. Future Gener. Comput. Syst. 2018, 79, 849–861. [Google Scholar] [CrossRef] [Green Version]
Dang-Quang, N.M.; Yoo, M. Deep Learning-Based Autoscaling Using Bidirectional Long Short-Term Memory for Kubernetes. Appl. Sci. 2021, 11, 3835. [Google Scholar] [CrossRef]
Cruz Coulson, N.; Sotiriadis, S.; Bessis, N. Adaptive Microservice Scaling for Elastic Applications. IEEE Internet Things J. 2020, 7, 4195–4202. [Google Scholar] [CrossRef]
Yan, M.; Liang, X.; Lu, Z.; Wu, J.; Zhang, W. HANSEL: Adaptive horizontal scaling of microservices using Bi-LSTM. Appl. Soft Comput. 2021, 105, 107216. [Google Scholar] [CrossRef]
Prachitmutita, I.; Aittinonmongkol, W.; Pojjanasuksakul, N.; Supattatham, M.; Padungweang, P. Auto-scaling microservices on IaaS under SLA with cost-effective framework. In Proceedings of the 2018 Tenth International Conference on Advanced Computational Intelligence (ICACI), Xiamen, China, 29–31 March 2018; pp. 583–588. [Google Scholar] [CrossRef]
Li, Y.; Xia, Y. Auto-scaling web applications in hybrid cloud based on docker. In Proceedings of the 2016 5th International Conference on Computer Science and Network Technology (ICCSNT), Changchun, China, 10–11 December 2016; pp. 75–79. [Google Scholar] [CrossRef]
Calheiros, R.N.; Masoumi, E.; Ranjan, R.; Buyya, R. Workload Prediction Using ARIMA Model and Its Impact on Cloud Applications’ QoS. IEEE Trans. Cloud Comput. 2015, 3, 449–458. [Google Scholar] [CrossRef]
Imdoukh, M.; Ahmad, I.; Alfailakawi, M.I. Machine learning-based autoscaling for containerized applications. Neural Comput. Appl. 2019, 32, 9745–9760. [Google Scholar] [CrossRef]
Toka, L.; Dobreff, G.; Fodor, B.; Sonkoly, B. Adaptive AI-based autoscaling for Kubernetes. In Proceedings of the 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), Melbourne, VIC, Australia, 14 July 2020; pp. 599–608. [Google Scholar] [CrossRef]
Ouhame, S.; Hadi, Y.; Ullah, A. An efficient forecasting approach for resource utilization in cloud data center using CNN-LSTM model. Neural Comput. Appl. 2021, 33, 10043–10055. [Google Scholar] [CrossRef]
Fu, T. A review on time series data mining. Eng. Appl. Artif. Intell. 2011, 24, 164–181. [Google Scholar] [CrossRef]
Paszkiel, S. Using Neural Networks for Classification of the Changes in the EEG Signal Based on Facial Expressions. In Analysis and Classification of EEG Signals for Brain Computer Interfaces; Springer International Publishing: Cham, Switzerland, 2020. [Google Scholar] [CrossRef]
Pouyanfar, S.; Sadiq, S.; Yan, Y.; Tian, H.; Tao, Y.; Reyes, M.P.; Shyu, M.L.; Chen, S.C.; Iyengar, S.S. A Survey on Deep Learning: Algorithms, Techniques, and Applications. ACM Comput. Surv. 2018, 51, 1–36. [Google Scholar] [CrossRef]
Schuster, M.; Paliwal, K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681. [Google Scholar] [CrossRef] [Green Version]
De Gooijer, J.G.; Hyndman, R.J. 25 years of time series forecasting. Int. J. Forecast. 2006, 22, 443–473. [Google Scholar] [CrossRef] [Green Version]
De Gooijer, J.G.; Hyndman, R.J. A distributed spatial–temporal weighted model on MapReduce for short-term traffic flow forecasting. Neurocomputing 2016, 179, 246–263. [Google Scholar] [CrossRef]
Ahmad, S.; Lavin, A.; Purdy, S.; Agha, Z. Unsupervised real-time anomaly detection for streaming data. Neurocomputing 2017, 262, 134–147. [Google Scholar] [CrossRef]
Projects, W. Page View Statistics for Wikimedia Projects. Available online: https://dumps.wikimedia.org/other/pagecounts-raw/ (accessed on 31 December 2021).
Kubernetes. Horizontal Pod Autoscaler|Kubernetes. Available online: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/ (accessed on 31 December 2021).
Arlitt, M. 1998 World Cup Web Site Access Logs-FTP Directory Listing. Available online: ftp://ita.ee.lbl.gov/html/contrib/WorldCup.html (accessed on 31 December 2021).
Jackson, J. Multivariate Techniques: Advantages and Disadvantages. Available online: https://www.theclassroom.com/multivariate-techniques-advantages-disadvantages-8247893.html (accessed on 31 December 2021).
Bitbrains. GWA-T-12 Bitbrains. 2021. Available online: http://gwa.ewi.tudelft.nl/datasets/gwa-t-12-bitbrains (accessed on 31 December 2021).
Sundermeyer, M.; Schlüter, R.; Ney, H. LSTM Neural Networks for Language Modeling. In Proceedings of the INTERSPEECH, Portland, OR, USA, 9–13 September 2012. [Google Scholar]
Hochreiter, S.; Schmidhuber, J. LSTM can solve hard long time lag problems. MIT Press: Cambridge, MA, USA, 1996; pp. 473–479. [Google Scholar]
Colab, G. Google Colaboratory. Available online: https://colab.research.google.com/ (accessed on 31 December 2021).
Brownlee, J. Predict the Future with MLPs, CNNs and LSTMs in Python. In Deep Learning for Time Series Forecasting. Available online: https://machinelearningmastery.com/deep-learning-for-time-series-forecasting/ (accessed on 31 December 2021).
Materna. GWA-T-13 Materna. 2021. Available online: http://gwa.ewi.tudelft.nl/datasets/gwa-t-13-materna (accessed on 31 December 2021).
Bauer, A.; Grohmann, J.; Herbst, N.; Kounev, S. On the Value of Service Demand Estimation for AutoScaling; Springer: Berlin/Heidelberg, Germany; pp. 142–156.
Herbst, N.; Krebs, R.; Oikonomou, G.; Kousiouris, G.; Evangelinou, A.; Iosup, A.; Kounev, S. Ready for Rain ? A View from SPEC Research on the Future of Cloud Metrics. arXiv 2016, arXiv:1604.03470. [Google Scholar]
Openstack. Open Source Cloud Computing Infrastructure-Openstack. Available online: https://www.openstack.org/ (accessed on 31 December 2021).
Swarm, D. Docker Swarm Overview. Available online: https://docs.docker.com/engine/swarm/ (accessed on 31 December 2021).
Kubernetes. Kubernetes. Available online: https://kubernetes.io/ (accessed on 31 December 2021).

Figure 1. Proposed autoscaling framework.

Figure 2. Proposed Bi-LSTM neural network architecture.

Figure 3. Model loss of the MSE metric for the proposed multivariate Bi-LSTM model.

Figure 4. Absolute error for the first 100 observations of univariate and multivariate Bi-LSTM models using the GWA-T-12 dataset.

Figure 5. Absolute error for the first 100 observations of multivariate models using the GWA-T-12 dataset.

Figure 6. Result of univariate Bi-LSTM using the GWA-T-13 dataset.

Figure 7. Result of multivariate Bi-LSTM using the GWA-T-13 dataset.

Figure 8. Absolute error for the first 100 observations of the univariate and multivariate Bi-LSTM models using the GWA-T-13 dataset.

Figure 9. Result of multivariate LSTM using the GWA-T-13 dataset.

Figure 10. Result of multivariate CNN-LSTM using the GWA-T-13 dataset.

Figure 11. Absolute error for the first 100 observations of the multivariate models using the GWA-T-13 dataset.

Figure 12. Number of VMs provided by the autoscaler using multivariate LSTM.

Figure 13. Number of VMs provided by the autoscaler using multivariate CNN-LSTM.

Figure 14. Number of VMs provided by the autoscaler using multivariate Bi-LSTM.

Table 1. Autoscaling- and resource estimation-related studies using time series forecasting in cloud computing.

Related Studies	Performance Metric(s)	Type	Method
[8]	HTTP requests	Univariate	ARIMA
[7]	CPU utilization	Univariate	ARIMA
[6]	HTTP requests	Univariate	LSTM and MLP
[9]	HTTP requests	Univariate	LSTM
[10]	HTTP requests	Univariate	AR, HTM, LSTM
[3]	HTTP requests	Univariate	Bi-LSTM
[5]	CPU utilization, memory usage	Multivariate	Bi-LSTM
[11]	CPU utilization, memory usage,	Multivariate	CNN-LSTM
	network usage and disk usage

Table 2. Multivariate Bi-LSTM model configuration.

Number of layers	2 layers (forward and backward)
Number of features	2
Input size	3 neural cells
Number of hidden units	50
per neural cell
Loss function	MSE
Batch size	64
Epochs	10
Activation function	ReLU

Table 3. Feature selection of the GWA-T-12 dataset.

Pair of	Pearson Correlation
Metrics	Value
corr(network, CPU)	0.714
corr(network, memory)	0.746
corr(network, disk read)	0.637
corr(network, disk write)	0.63