3.2. Problem Formulation
Figure 3 shows our adopted network architecture [26] of physical machines deployed in the cloud. The architecture uses a fat-tree topology composed of three tiers. Core switches sit on the top tier of the network and deliver users' requests according to the load balancer shown in Figure 1. Each point of delivery (pod) comprises aggregation switches and access switches, which connect to all of the involved core switches. These two kinds of switches perform different network functions. The aggregation switches support many 10 GigE and GigE interconnects while providing a high-speed switching fabric with a high forwarding rate; they provide redundancy and maintain session state while delivering valuable services to the access layer. The access switches connect directly to the PMs and provide a high GigE port density. The number of pods and of the three types of switches depends on the number of ports per switch, which also determines the number of physical machines per pod. Suppose the number of ports per switch is $k$; then there are $k$ pods, each containing $k/2$ aggregation switches and $k/2$ access switches. There are $(k/2)^2$ core switches that connect to each pod, and each pod contains $k^2/4$ physical machines. Therefore, there are $5k^2/4$ switches that communicate with $k^3/4$ physical machines in total.
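The counting above follows the standard $k$-ary fat-tree construction and can be sketched as follows; the function and field names are illustrative, not notation from this paper:

```python
def fat_tree_counts(k: int) -> dict:
    """Component counts of a k-ary fat-tree (k must be even)."""
    assert k % 2 == 0, "a fat-tree requires an even port count k"
    agg_per_pod = k // 2           # aggregation switches per pod
    access_per_pod = k // 2        # access (edge) switches per pod
    core = (k // 2) ** 2           # core switches on the top tier
    pms_per_pod = (k // 2) ** 2    # each access switch hosts k/2 PMs
    return {
        "pods": k,
        "core_switches": core,
        "switches_total": core + k * (agg_per_pod + access_per_pod),  # 5k^2/4
        "pms_per_pod": pms_per_pod,                                   # k^2/4
        "pms_total": k * pms_per_pod,                                 # k^3/4
    }
```

For example, `fat_tree_counts(4)` yields 4 pods, 4 core switches, 20 switches in total, and 16 PMs, matching the closed-form expressions $5k^2/4$ and $k^3/4$.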
In this paper, a request communicates, through the three-tier topology described above, with the VMs that provide the corresponding services. The PMs on which these VMs can be placed are commonly distributed across different pods in a datacenter, as shown in Figure 3. Take a specific PM as an example: its distance to each of the four core switches is the same, but the switches along the communication paths between them are not identical. This can lead to different request delays, which affect the quality of service (QoS) [27]. To provide low-delay service, especially for delay-sensitive requests, the VMs should be placed on optimal PMs so that the average delay is minimized.
In this paper, there are two types of PMs, present in equal numbers. One type, denoted $PM^D_i$, is fit for VMs that provide data services; the other, denoted $PM^C_j$, is fit for VMs that provide computing services, where $i$ and $j$ are the indices of the PMs. We let $(m^D_i, c^D_i)$ denote the attributes of $PM^D_i$, where $m^D_i$ and $c^D_i$ respectively represent its spare memory and spare CPU. Similarly, $(m^C_j, c^C_j)$ denotes the attributes of $PM^C_j$, with $m^C_j$ and $c^C_j$ representing its spare memory and spare CPU, respectively. Correspondingly, there are two types of VMs: one provides data services ($VM^D$) and the other provides computing services ($VM^C$), where $A$ and $B$ respectively represent the numbers of kinds of the corresponding services. The VM for the $a$th kind of data service ($1 \le a \le A$) is denoted $VM^D_a$, and $\bar{m}^D_a$ and $\bar{c}^D_a$ respectively represent its needed memory and CPU. Similarly, $VM^C_b$ denotes the $b$th kind of computing service ($1 \le b \le B$), and its needed memory and CPU are represented by $\bar{m}^C_b$ and $\bar{c}^C_b$.
Therefore, a VM can be placed on a selected physical machine only when the target physical machine has enough spare memory and CPU capacity:
$$\bar{m}^D_a \le m^D_i \quad \text{and} \quad \bar{c}^D_a \le c^D_i,$$
or
$$\bar{m}^C_b \le m^C_j \quad \text{and} \quad \bar{c}^C_b \le c^C_j.$$
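The capacity constraint can be sketched as a simple feasibility check; the class and attribute names below are illustrative stand-ins for the spare and needed memory/CPU quantities, not identifiers from the paper:

```python
from dataclasses import dataclass

@dataclass
class PM:
    spare_mem: int  # spare memory (e.g., MB)
    spare_cpu: int  # spare CPU (e.g., cores)

@dataclass
class VM:
    need_mem: int   # memory demanded by the VM
    need_cpu: int   # CPU demanded by the VM

def can_place(vm: VM, pm: PM) -> bool:
    """A VM fits on a PM only if the PM covers both demands."""
    return vm.need_mem <= pm.spare_mem and vm.need_cpu <= pm.spare_cpu
```

For instance, a VM demanding (2048 MB, 2 cores) fits a PM with (4096 MB, 4 cores) spare, while one demanding 8192 MB does not.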
In this paper, we suppose that a request is denoted as $req_r$ ($1 \le r \le R$), where $R$ denotes the total number of user requests. Two types of VM serve each request. To ensure better QoS, especially for delay-sensitive requests, we try to minimize the latency of all requests, which is rarely discussed in previous studies. Moreover, this paper considers the situation in which a VM can serve several requests, whereas in previous studies a VM serves only one request.
We only consider the general situation in which a request is served by two kinds of VMs (computing and data) in order to verify our proposed algorithms. In practice, some applications access only CPU-intensive services or only data-intensive services; such requests would be served by a computing VM alone or a data VM alone. For generality, we only consider the situation in which a request is served by both $VM^D$ and $VM^C$. Moreover, it is possible to allocate a data VM on a computing PM and vice versa. However, we mainly focus on the average RTT of requests and intend to verify our proposed algorithms' effectiveness when the two kinds of PMs coexist. Therefore, we do not discuss this specific situation in this paper.
Remark 1. Disk resources are easier to expand than CPU and memory, and most previous works [19,28,29] did not consider disk space either. Moreover, we only regard these resources as a constraint in this paper; they are not its main content. The data machines mentioned in our paper are similar to Amazon servers, and our proposed VMs ($VM^D$ and $VM^C$) are similar to Amazon images, on which users can run different applications. Amazon image types are defined by operating system, architecture, and other parameters. However, an Amazon image is used to run individualized applications for different users, which is similar to the VMs ($VM^D$ and $VM^C$) in our paper; they also have specific demands for CPU and memory resources.
We assume that the numbers of types of $VM^D$ and $VM^C$ are $A$ and $B$, respectively. This means that each type of $VM^D$ provides a different data service, and the same holds for $VM^C$. These types are determined by the different applications that run in $VM^D$ and $VM^C$; for example, different values of $a$ in $VM^D_a$ represent different applications.
In this paper, we regard RTT as the delay metric [30] for requests under the fat-tree network topology. According to the network topology shown in Figure 3, we define the RTT between the $t$th core switch $cs_t$ and the PMs as follows:
$$d^D_{t,i} = f(cs_t, PM^D_i)$$
and
$$d^C_{t,j} = f(cs_t, PM^C_j),$$
where $d^D_{t,i}$ is the RTT between core switch $cs_t$ and $PM^D_i$, and $d^C_{t,j}$ is the RTT between core switch $cs_t$ and $PM^C_j$. The function $f(\cdot,\cdot)$ returns the RTT between two network nodes. We can thus obtain the RTT matrices $D^D = [d^D_{t,i}]$ and $D^C = [d^C_{t,j}]$, which are updated with the latest RTTs periodically.
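The periodic construction of the two RTT matrices can be sketched as follows. Here `measure_rtt` is an assumed stand-in for the probe function $f(\cdot,\cdot)$; in a real deployment it would wrap an actual network probe:

```python
def refresh_rtt_matrices(core_switches, data_pms, comp_pms, measure_rtt):
    """Rebuild the (core switch x PM) RTT matrices from fresh probes.

    measure_rtt(cs, pm) is any callable returning the RTT between two
    network nodes; both matrices are rebuilt in one pass so they refer
    to the same measurement epoch.
    """
    d_data = [[measure_rtt(cs, pm) for pm in data_pms] for cs in core_switches]
    d_comp = [[measure_rtt(cs, pm) for pm in comp_pms] for cs in core_switches]
    return d_data, d_comp
```

Calling this on a timer implements the paper's requirement that the matrices hold the latest RTTs.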
Therefore, suppose that a request $req_r$ is assigned to the $t$th core switch $cs_t$, and that its needed $VM^D_a$ and $VM^C_b$ are placed on $PM^D_i$ and $PM^C_j$, respectively. The request's average RTT can then be calculated as follows:
$$d_r = \frac{1}{2}\left(d^D_{t,i} + d^C_{t,j}\right).$$
Then we can define our optimization problem as minimizing the total average RTT over the $R$ requests:
$$\min \ \frac{1}{R}\sum_{r=1}^{R} d_r.$$
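Given the RTT matrices, a single request's average RTT and an exhaustive search for its RTT-minimizing placement can be sketched as below. This brute-force search is for illustration only and is not the paper's proposed algorithm; all names are assumptions:

```python
def request_avg_rtt(d_data, d_comp, t, i, j):
    """Average RTT of a request at core switch t whose data VM is on
    data PM i and whose computing VM is on computing PM j."""
    return (d_data[t][i] + d_comp[t][j]) / 2.0

def best_placement(d_data, d_comp, t):
    """Exhaustively pick the (i, j) pair minimizing the request's RTT."""
    pairs = ((i, j)
             for i in range(len(d_data[t]))
             for j in range(len(d_comp[t])))
    return min(pairs, key=lambda ij: request_avg_rtt(d_data, d_comp, t, *ij))
```

Summing `request_avg_rtt` over all $R$ requests and dividing by $R$ gives the objective value being minimized; capacity constraints would further restrict the candidate pairs.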