In this section, we outline the theories related to our research and the research opportunities we intend to pursue. Auto-scaling is an optimization strategy in cloud computing, aimed at maintaining the performance of virtualized systems, that adjusts resource capacity in response to workload demand fluctuations; it enables applications to remain responsive and available during traffic spikes or load reductions [20]. Live virtual migration (LVM), on the other hand, is a virtualization technique used in data centers to move VMs from one host machine (HM) to another without disrupting the running applications, which enables maintenance, resource optimization, and high availability in virtualized environments. Both concepts play a significant role in maintaining the performance and availability of modern computing systems [21]. In this investigation, the authors focus exclusively on live virtual machine migration in data centers [22]. For example, in a single LVM process, only one VM is migrated at a time, whereas group migration moves a set of interdependent VMs together [23,24]; the latter approach can enhance the consistency and coherence of applications during migration. Migration can also be classified according to the state of the virtual machine workload, including the memory, CPU, disk, and other settings transferred from the source host machine to the destination host machine [3]. Each class of VM migration has different implications depending on the specific needs and environment [25,26]. Additionally, workload characteristics and VM management policies must be considered when determining the appropriate placement, and other factors, such as network traffic, storage requirements, and application-based separation policies, can also influence where virtual machines are placed.
In this regard, the impact on the network should be considered, including the location of the virtual machine relative to the network nodes and its usage, in order to optimize latency and network performance [7,27]. Security and compliance are also crucial considerations, requiring security policies and compliance requirements to be enforced through proper VM placement. By accounting for these factors, the right placement algorithm can be selected for live virtual machine migration, thus achieving optimal migration success. The authors identify several challenges in VM migration that need to be considered. During migration, the primary goal is to complete the migration on time and place the virtual machine on the appropriate host. To achieve this, an effective model is needed that calculates VM migration based on the service level agreement (SLA) for CPU and RAM and determines the correct VM placement to improve VM performance [28,29].
Previous Studies on Live Migration Technology
The authors identified a significant gap in LVM improvement in the machine learning work of Haris et al. (2023). We opted to enhance machine learning modeling with a hybrid approach, optimizing LVM using the Markov Decision Process (MDP) and a genetic algorithm (GA) for decision making and LVM scheduling selection. This is consistent with the research conducted by Guo et al. in 2020, which notes that the MDP is a framework commonly utilized in reinforcement learning to determine the best policy [18]. The upcoming research aims to go further by developing hybrid machine learning to identify VM workloads with higher loads and to present detailed information on LVM preparation, namely selecting which VM instances should be migrated and determining the optimal time for LVM execution, guided by optimal HM objectives. In this study, the use of the MDP algorithm aligns with the research conducted by Alqarni et al. (2023), where problem solving involves the reallocation of resources [30]. This hybrid ML can significantly impact the prediction of overloaded VM issues and the selection of scheduling processes in the LVM. In Table 1, we review previous research conducted on LVM processes.
The Markov Decision Process (MDP) is an algorithm for decision-making processes and can represent an environment; some prediction problems can be formulated using MDP equations. MDP is the traditional formalization of sequential decision making, in which decisions have an impact on the next stage through delayed future rewards in addition to current benefits [37,38]. Thus, MDP is associated with delayed rewards and the need to balance these rewards over time [39].
In this investigation, the authors used the Markov Decision Process (MDP) to analyze the reinforcement learning interaction between the agent and the environment. Within the multi-agent system setting, the MDP can be represented as a five-tuple $(S, A, R, T, \gamma)$, where $S$ is the state space, $A$ is the action space, $R: S \times A \rightarrow \mathbb{R}$ is a reward function, $T: S \times A \times S \rightarrow [0, 1]$ is a transition state function, and $\gamma \in [0, 1]$ is a discount factor. The agent selects an action $a_t$ based on its policy $\pi(a_t \mid s_t)$ and receives the reward outcome $r_t = R(s_t, a_t)$; subsequently, the environment transitions to the next state according to $T$. In a multi-agent environment, the state can be seen as $s = (o_1, \ldots, o_N)$, a combination of local observations for $N$ agents, with $A = A_1 \times \cdots \times A_N$ the set of joint actions [40]. Meanwhile, the reward and transition functions change accordingly to $R: S \times A_1 \times \cdots \times A_N \rightarrow \mathbb{R}$ and $T: S \times A_1 \times \cdots \times A_N \times S \rightarrow [0, 1]$. The discounted return is calculated using the equation $G_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k}$. The authors aim to maximize the expected reward under the policy, which can be expressed as $\pi^* = \arg\max_{\pi} \mathbb{E}_{\pi}\left[\sum_{t=0}^{\infty} \gamma^t r_t\right]$ [41].
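To make this formulation concrete, the following minimal sketch applies value iteration to a toy two-state migration MDP. The states, actions, transition probabilities, and rewards are illustrative assumptions for exposition, not the model used in this study.

```python
import numpy as np

# Toy MDP over the five-tuple (S, A, R, T, gamma): states are coarse
# host-load levels and actions are "keep" or "migrate" (assumed values).
S = ["low_load", "high_load"]          # state space S
A = ["keep", "migrate"]                # action space A
gamma = 0.9                            # discount factor

# T[s, a, s'] = P(s' | s, a): the transition state function.
T = np.array([
    [[0.9, 0.1],   # low_load,  keep
     [0.8, 0.2]],  # low_load,  migrate
    [[0.1, 0.9],   # high_load, keep
     [0.7, 0.3]],  # high_load, migrate (likely relieves the host)
])

# R[s, a]: reward function; migrating has a small cost, overload a large one.
R = np.array([
    [ 1.0, 0.5],   # low_load:  keeping the VM in place is best
    [-2.0, 0.0],   # high_load: keeping an overloaded VM is penalized
])

# Value iteration: V(s) = max_a [ R(s,a) + gamma * sum_s' T(s,a,s') V(s') ]
V = np.zeros(len(S))
for _ in range(100):
    Q = R + gamma * np.einsum("ijk,k->ij", T, V)
    V = Q.max(axis=1)

policy = [A[a] for a in Q.argmax(axis=1)]
print(dict(zip(S, policy)))  # e.g. {'low_load': 'keep', 'high_load': 'migrate'}
```

Under these assumed rewards, the optimal policy migrates the VM only when the host is overloaded, which mirrors the scheduling decision the MDP is meant to support.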
According to scholars, the genetic algorithm has become a search technique that can produce approximate solutions to optimization and search problems. The algorithm relies on operators such as selection, mutation, and crossover [42]. Genetic algorithms (GAs) may offer solutions that strike a balance in the trade-off between accuracy and efficiency. The authors aim to address scheduling in virtual machine placement using a genetic algorithm approach [43]. Initially, the authors propose an arrangement, denoted as a decision variable d, generated with first-fit and best-fit algorithms; these arrangements form the initial population, as sketched below. Most job-scheduling problems are known to be NP-complete [44]. During the selection process, the solutions with the best fitness are chosen, while the others are discarded [45,46].
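Before selection can begin, an initial population must exist. The sketch below shows how feasible placement vectors (the decision variable d) might be generated with the first-fit variant; the host capacities and VM demands are invented for the example, and the best-fit counterpart would differ only in how a host is chosen.

```python
import random

def first_fit(vm_demands, host_capacity, n_hosts):
    """Assign each VM to the first host with enough remaining capacity.
    Returns a placement list whose i-th entry is the host of the i-th VM."""
    remaining = [host_capacity] * n_hosts
    placement = []
    for demand in vm_demands:
        for h in range(n_hosts):
            if remaining[h] >= demand:
                remaining[h] -= demand
                placement.append(h)
                break
        else:
            raise ValueError("no host can accommodate the VM")
    return placement

# Build an initial GA population by shuffling the VM order before first-fit,
# so each chromosome is a feasible but different placement vector d.
vms = [2, 4, 1, 3, 2]            # illustrative CPU demands per VM
population = []
for _ in range(10):
    order = random.sample(range(len(vms)), len(vms))
    fit = first_fit([vms[i] for i in order], host_capacity=6, n_hosts=4)
    chromosome = [0] * len(vms)
    for pos, vm_idx in enumerate(order):
        chromosome[vm_idx] = fit[pos]   # map back to the original VM index
    population.append(chromosome)
```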
Within the selection process, the first step is to calculate the selection likelihood for each chromosome. The selection probability of chromosome $i$ is calculated using Equation (1):

$$P_i = \frac{f_i}{\sum_{j=1}^{n} f_j} \quad (1)$$

where $f_i$ is the fitness of chromosome $i$ and $n$ is the population size. Once the selection probabilities are calculated, the roulette wheel is divided into segments whose widths are proportional to these probabilities. The final step involves generating a random number and selecting the chromosome whose segment contains that number [47].
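A minimal sketch of this roulette-wheel procedure, assuming non-negative fitness values, might look as follows.

```python
import random

def roulette_select(population, fitness):
    """Fitness-proportional (roulette-wheel) selection, per Equation (1):
    chromosome i is chosen with probability f_i / sum_j f_j."""
    total = sum(fitness)
    probs = [f / total for f in fitness]  # per-chromosome selection probabilities
    r = random.random()                   # random number in [0, 1)
    cumulative = 0.0
    for chrom, p in zip(population, probs):
        cumulative += p                   # running edge of each wheel segment
        if r < cumulative:
            return chrom                  # segment containing the random number
    return population[-1]                 # guard against floating-point rounding
```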
In this genetic algorithm, crossover is used to produce new offspring from the individuals chosen during selection. Various crossover methods are discussed in the literature. In studies using two-point crossover, the chromosome is divided at two points [48]. The locations of the cuts on the chromosome are randomly selected, as represented in Equation (2):

$$c_1, c_2 = \operatorname{rand}(1, L), \quad c_1 < c_2 \quad (2)$$

where $c_1$ and $c_2$ are the two crossover points and $\operatorname{rand}(1, L)$ generates a random number between one and the entire length $L$ of the chromosome. Subsequently, the two chromosomes exchange the values between these cuts, resulting in two new chromosomes. The crossover details are illustrated in Figure 1 [49].
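A minimal sketch of this two-point crossover, assuming two parent chromosomes of equal length with at least three genes, is shown below.

```python
import random

def two_point_crossover(parent_a, parent_b):
    """Two-point crossover per Equation (2): pick cut points c1 < c2 at
    random along the chromosome, then swap the segments between them."""
    length = len(parent_a)
    c1, c2 = sorted(random.sample(range(1, length), 2))  # 1 <= c1 < c2 < L
    child_a = parent_a[:c1] + parent_b[c1:c2] + parent_a[c2:]
    child_b = parent_b[:c1] + parent_a[c1:c2] + parent_b[c2:]
    return child_a, child_b
```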
Mutation operations introduce new features on chromosomes; common variants include exchange mutations, uniform mutations, and inversion mutations. In this strategy, a gene position is chosen at random, and its value is replaced by the value of another gene in the arrangement [50]. To ensure that each position has an equal chance of introducing new features, the starting and ending positions are first selected, and the genes at those positions are swapped. In the next iteration, the starting position is incremented and the ending position is decremented, the gene values at these new positions are exchanged, and so on. An illustration can be seen in Figure 2 [51].
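A sketch of this pairwise end-to-end swapping, under the assumption that the start and end positions are supplied by the caller, might look as follows.

```python
def symmetric_swap_mutation(chromosome, start, end):
    """Swap genes pairwise from the chosen start and end positions inward,
    mirroring the exchange procedure described above."""
    genes = list(chromosome)
    while start < end:
        genes[start], genes[end] = genes[end], genes[start]
        start += 1                # advance the front position
        end -= 1                  # retreat the back position
    return genes

# e.g. symmetric_swap_mutation([0, 1, 2, 3, 4], 1, 3) -> [0, 3, 2, 1, 4]
```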
The random forest (RF) algorithm is a popular learning algorithm used in regression and classification scenarios. It combines multiple decision trees (DTs) to generate predictions, yielding more accurate outcomes, and it can also be employed to assess feature importance and aid in feature selection [52]. However, because multiple decision trees must be built, the training phase of RF can be computationally expensive. In DT-based algorithms like RF [53,54], the Gini impurity metric assesses the impurity of a set of instances based on their class labels; the goal is to identify features that minimize the Gini index, indicating higher homogeneity among class labels within the instance set [55]. The Gini impurity can be mathematically calculated using Equation (3):

$$Gini = 1 - \sum_{i=1}^{C} p_i^2 \quad (3)$$

where $p_i$ is the proportion of instances belonging to class $i$ and $C$ is the number of classes.
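A short sketch of this computation, using illustrative class labels, is given below; a pure node scores 0, while an even two-class split scores 0.5.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity per Equation (3): G = 1 - sum_i p_i^2, where p_i is
    the fraction of instances carrying class label i."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini_impurity(["hot", "hot", "cold", "cold"]))  # 0.5 (maximally mixed)
print(gini_impurity(["hot", "hot", "hot"]))           # 0.0 (pure node)
```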