**1. Introduction**

The study of multi-server queueing systems in most cases assumes the servers to be homogeneous when the individual service rates are the same for all the servers in the system. However, in many real applications, the assumption of the homogeneity cannot be valid, e.g., a group of servers with different types of processors as a consequence of irregular system updates, nodes in telecommunication networks with links of different unequal capacities and availability, nodes in wireless systems serving different mobile users, peer-to-peer services for data streaming, file sharing and storage, where heterogeneous servers arrive and depart randomly, multi-processor systems with heterogeneous processor's attributes like a throughput and an electric energy consumption, etc. Moreover, in many cases the heterogeneous server system outperforms its homogeneous server counterpart. This reality leads to necessity to analyse multi-server queueing systems with heterogeneous servers. The assumption of the heterogeneity of servers does not automatically mean that the stochastic modelling of such a queueing system becomes more complex. If the customers can change the server to a faster one during a service, in other words, the service is with a preemption, this is a classic one-dimensional birth-and-death process, that can be analysed in a standard way. The task of analysing a heterogeneous system becomes much more complex with the assumption that the customer cannot change the server during a service time, i.e., service without any preemption. In this case, on the one hand, the dimension of the corresponding random process increases and on the other hand, a mechanism for allocation of customers between the servers must be introduced.

The systems with heterogeneous servers are mostly investigated with respect to heuristic allocation policies, e.g., allocation according to the fastest server first (FSF) policy or the randomly chosen server (RCS). The results dedicated to the heterogeneous systems operating under these policies and some approximations of such models can be found in papers of Alves et al. [1], Bilgen and Altintas [2], Melikov et al. [3], The question of how to allocate the customers between the heterogeneous servers in order to minimize the mean number of customers in the system was studied for the queueing system with two servers in terms of a Markov decision process (MDP), e.g., by Larsen [4], Larsen and Agrawala [5], who conjectured the optimality of threshold policy that functions as follows: the fastest server must be used whenever it is idle and the slower one must be used only if the number of customers in the queue exceeds some prespecified threshold level *q* ≥ 1. Based on the MDP model, Lin and Kumar in [6] considered a similar problem and proved the optimality of a threshold policy. Simple proofs of corresponding results have later been given by Koole [7], Luh and Viniotis [8], Walrand [9] and Weber [10]. The problem of an optimal control of a two-server queueing system with failures was studied by Özkan and Kharoufeh [11]. The problem of the optimal control allocation in the systems with more than two servers were investigated by Armony and Ward [12], Efrosinin [13], Rosberg and Makowski [14], Viniotis and Ephremides [15]. Rykov in [16] gave evidence for certain monotonicity properties of an optimal policy in case of the mean number of customers minimization. The techniques to prove such results are based on monotonicity properties of the dynamic programming relative value function. The case of infinitely many servers was proposed by Shenker and Weinrib [17], where an asymptotic analysis of large heterogeneous queueing systems is performed.

As it was shown in [18,19], also taking into account the incompleteness of the theoretical proof noticed by Vericourt and Zhou in [20], the optimal allocation policy, which minimizes the mean number of customers in heterogeneous queueing system without preemption, belongs to a set of structural policies. According to this policy for the servers' enumeration (1) the first server is used whenever it is free and there is a waiting customer in the queue, while the empty server with a number *k* + 1 must be occupied only if the first *k* faster servers are busy and the number of customers in the queue reaches some threshold level *qk*+<sup>1</sup> ≥ 1. Numerical analysis shows that the threshold level *qk* in general case can have a very weak dependence of slower servers' states. Due to our observations, the optimal threshold may vary by at most 1 when the state of a slower server changes. Moreover, since this deviation has no influence on the mean number of customers in the system, it can be neglected. Hence the optimal allocation policy can be defined as a classic threshold one through a sequence of threshold levels 1 = *q*1 ≤ *q*2 ≤···≤ *qK* < <sup>∞</sup>, that is the first *k* servers must be occupied whenever there are *q* customers in the queue and *qk* ≤ *q* ≤ *qk*+<sup>1</sup> − 1.

While there is a certain amount of work being done on heterogeneous systems, there are still many open questions related to the accurate and quick calculation of the optimal control policy and the resulting performance characteristics. Searching for optimal values *q*2, ... , *qK* by a direct minimizing the mean number of customers in the system can be performed only for small *K* by solving the system of difference equations for the steady-state probabilities or by means of a matrix geometric approach introduced by Neuts [21]. However, when *K* is large, these methods become too complicated. For example, the involved in computation matrix sizes become infeasible large even for the moderate numbers of servers like *K* ≥ 4, see e.g., [22]. To calculate the optimal threshold levels the MDP model and a policy iteration algorithm [23–25], which constructs a sequence of improved policies

that converges to optimal one, can also be used. While this approach is a powerful tool for solving many optimization tasks, it has significant limitations on dimension of the model, number of states, convergence in a heavy traffic case due to processing time and memory requirements. The contribution of the paper is three-fold. First, we provide a simple heuristic solution (HS) for a sub-optimal policy in order to avoid the time-consuming search for the optimal one in case of an arbitrary number of servers. Second, we investigate the possibility to use the equivalent queueing system with a preemption and a threshold-based policy to evaluate the lower and upper boundaries for the optimal mean number of customers in the system without preemption. Third, we check by means of a simulation, whether the proposed heuristic solution for the optimal thresholds is insensitive to changes in the form of inter-arrival and service time distributions.

This paper is organized as follows. In Section 2 we discuss a queueing model, formulate the corresponding MDP and specify a policy-iteration algorithm used for evaluation the optimal threshold policy. Section 3 introduces a heuristic solution for the optimal threshold levels based on a simple discrete fluid approximation, that turn out to be nearly optimal. In Section 4 we propose approximations to calculate the lower and upper bounds for the mean number of customers in the system under the optimal allocation policy. In Section 5 the simulation is used to provide sensitivity analysis of the heuristic solution to changes in inter-arrival and service time distributions. Finally, we make some conclusions and remarks.

#### **2. Mathematical Model and MDP Formulation**

Consider an infinite-capacity *M*/*M*/*K* queueing system with *K* heterogeneous servers and one common queue, see Figure 1. The customers arrive to the system according to a homogeneous Poisson process with a rate *λ*. The *j*th server has an exponentially distributed service time with a rate *μj*. The server *j* is called an available server if it is idle. The service of customers is has no preemption, i.e., a customer being served on a server cannot change it. In this case a threshold-based policy defined below which is used for the customer allocation has sense. The inter-arrival and service times are assumed to be mutually independent. Assume that the servers are enumerated in a way

$$
\mu\_1 \ge \dots \ge \mu\_K. \tag{1}
$$

**Figure 1.** Controllable multi-server queueing system with heterogeneous servers.

The stability condition is obviously defined through the inequality

$$
\lambda < \sum\_{j=1}^{K} \mu\_j. \tag{2}
$$

The controller or decision maker has a full information about system states. It allocates customers between servers according to a control policy *f* either to one of available servers or to queue at a new arrival and service completion epoch if it occurs with a nonempty queue. The system dynamics is common for the systems with one common queue and heterogeneous servers. At each arrival epoch the customer joins the queue and the controller can allocate the customer staying at the head of the queue to an available server *j*. At service completion epochs the controller may decide to allocate the customer from the head of nonempty queue to an available server or leave the customer in the queue. As it was mentioned above, the optimal control policy *f* , which minimizes the mean number of customers in the system with servers ordered according to (1), belongs to a set of threshold policies defined as a sequence of threshold levels

$$1 = q\_1 \le q\_2 \le \cdots \le q\_K < \infty. \tag{3}$$

According to this policy the first *k* servers must be occupied whenever there are *q* customers in the queue and *qk* ≤ *q* ≤ *qk*+<sup>1</sup> − 1 for *k* = 1, ... , *K* − 1, and *qk* ≤ *q* < ∞ for *k* = *K*. For example, the policy *f* for the system *M*/*M*/5 with *K* = 5 servers and thresholds (*q*1, *q*2, ... , *q*5)=(1, 3, 4, 5, 12) means that the fastest server is used whenever upon arrival of a customer it is free and there are *q* customers in the queue with 1 ≤ *q* ≤ 2. The first two servers are used when *q* = 3. The first three servers must be occupied whenever there are *q* = 4 customers in the queue, the first four customers are used when 5 ≤ *q* ≤ 11. All servers must be used when the queue length exceeds the level *q* ≥ 12. When the queue length drops below a specific threshold level, then the corresponding busy server remains idle after a service completion. As thresholds can take on different values, there are a huge number of admissible threshold policies. Hence the main goal is to calculate the optimal values for threshold levels *qk* and the minimized mean number of customers in the system.

We formulate the above optimization problem as a Markov decision process associated with a multi-dimensional continuous-time Markov chain

$$\{X(t)\}\_{t\geq 0} = \{Q(t), D\_1(t), \dots, D\_K(t)\}\_{t\geq 0} \tag{4}$$

with a set of admissible actions *A* = {0, 1, ... , *K*} with elements *a*, where *a* = 0 means the allocation of the customer to the queue and *a* = *j* -= 0 – to the *j*th server. The term *Q*(*t*) ∈ N0 in (4) denotes the number of customers in the queue at time *t*, *Dj*(*t*) ∈ {0, 1} – the state of server *j* at time *t*, where

$$D\_{\dot{j}}(t) = \begin{cases} 0 & \text{if server } \dot{j} \text{ is idle} \\ 1 & \text{if server } \dot{j} \text{ is busy.} \end{cases}$$

For any fixed allocation policy *f* we wish to guarantee that the process {*X*(*t*)}*t*≥0 is an irreducible, positive recurrent Markov chain with a state space *E* = {*x* = (*q*(*x*), *d*1(*x*), ... , *dK*(*x*))} ≡ N0 × {0, 1}*<sup>K</sup>* and infinitesimal generator Λ*f* which depend on the policy *f* . The notations *q*(*x*) and *dj*(*x*) will be used further in the paper to specify the components of the vector state *x* ∈ *E*, where *q*(*x*) denotes the queue length in state *x* and *dj*(*x*) – the state of the *j*th server in state *x*. We use next the notations

$$J\_0(\mathbf{x}) = \{ j : d\_j(\mathbf{x}) = 0 \}, \; l\_1(\mathbf{x}) = \{ j : d\_j(\mathbf{x}) = 1 \}$$

to specify respectively a set of idle and busy servers in state *x*, *<sup>A</sup>*(*x*) = *J*0(*x*) ∪ {0} ⊆ *A* the subset of admissible actions in state *x* and **e***j* stands for a vector of dimension *K* + 1 with 1 in the *j*th position (*j* = 0, 1, . . . , *K*) and 0 elsewhere.

For the ergodic Markov decision process a long-run average cost in the system per unit of time for the policy *f* coincides with the corresponding assemble average, i.e.,

$$\log^f = \limsup\_{t \to \infty} \frac{1}{t} V^f(\mathbf{x}, t) = \sum\_{y \in \mathbb{E}^f} l(y) \pi\_y^f < \infty,\tag{5}$$

where *l*(*y*) = *q*(*y*) + <sup>∑</sup>*Kj*=<sup>1</sup> *dj*(*y*) in our model is a number of customers in state *y* ∈ *E*,

$$V^f(\mathbf{x}, t) = \mathbb{E}^f \left[ \int\_0^t \left( Q(t) + \sum\_{j=1}^K D\_j(t) \right) dt \, | \mathbf{X}(0) = \mathbf{x} \right].$$

denotes the total average number of customers up to time *t* given initial state is *x* and *πfy* = P*f* [*X*(*t*) = *y*] is a stationary state probability of the process under given policy *f* . The policy *f* ∗ is said to be optimal when for *gf* defined in (5) we evaluate

$$\mathfrak{g}^\* = \inf\_f \mathfrak{g}^f = \min\_{q\_{2\prime}, \dots, q\_K} \mathfrak{g}(q\_{2\prime}, \dots, q\_K). \tag{6}$$

One fruitful approach to finding optimal policy *f* ∗ is through solving the Bellman's optimality equation, which in our case is of the form

$$B\upsilon(\mathbf{x}) = (\lambda + \sum\_{j \in I\_1(\mathbf{x})} \mu\_j)\upsilon(\mathbf{x}) + \mathbf{g}\_{\prime} \tag{7}$$

where *B* is a dynamic programming operator acting on a relative value function *v* : *E* → R which indicates a transient effect of an initial state *x* to the total average cost, and, according to Howard [23], the following asymptotic relation for the function *V <sup>f</sup>*(*<sup>x</sup>*, *t*) in case of a Markov-chain with one ergodic class holds,

$$\mathcal{V}^f(\mathbf{x}, t) = \mathbf{g}^f \mathbf{t} + \mathbf{v}^f(\mathbf{x}) + o(1), \; \mathbf{x} \in E, \; t \to \infty. \tag{8}$$

The functions *vf* and *gf* further in the paper will be denoted by *v* and *g* without upper index *f* .

**Proposition 1.** *The Bellman's optimality Equation (7) is defined as follows*

$$Bv(\mathbf{x}) = l(\mathbf{x}) + \lambda \min\_{\mathbf{a} \in A(\mathbf{x})} v(\mathbf{x} + \mathbf{e}\_{\mathbf{a}}) + \sum\_{j \in I\_l(\mathbf{x})} \mu\_j v(\mathbf{x} - \mathbf{e}\_j) \mathbf{1}\_{\{q(\mathbf{x}) = 0\}} + \tag{9}$$

$$+ \sum\_{j \in I\_l(\mathbf{x})} \mu\_j \min\_{\mathbf{a} \in A(\mathbf{x} - \mathbf{e}\_j - \mathbf{e}\_0)} v(\mathbf{x} - \mathbf{e}\_j - \mathbf{e}\_0 + \mathbf{e}\_d) \mathbf{1}\_{\{q(\mathbf{x}) > 0\}}$$

*where the notation* <sup>1</sup>{*A*} *specifies the indicator function, which takes the value 1 if the event A holds, and 0 otherwise.*

**Proof.** According to [26], the behaviour of the function *<sup>V</sup>*(*<sup>x</sup>*, *t*) in the interval [*t*, *t* + *dt*) by letting *t* → ∞ and taking into account the asymptotic relation (8) can be represented as a system of linear equations, which in general case is of the form

$$v(\mathbf{x}) = \min\_{a} \left\{ \frac{1}{\lambda\_{\mathbf{x}}(a)} \left[ \mathbf{c}(\mathbf{x}) + \sum\_{y \neq x} \lambda\_{\mathbf{x}y}(a) v(y) - \mathbf{g} \right] \right\}.$$

Evaluating these equations for analyzed queueing system and taking into account the transition rates of the specified Markov decision model we ge<sup>t</sup>

$$v(\mathbf{x}) = \frac{1}{\lambda + \sum\_{j \in J\_1(\mathbf{x})} \mu\_j} [Bv(\mathbf{x}) - \mathbf{g}].$$

The relation for *Bv*(*x*) contains the term *l*(*x*) specifying a number of customers in state *x* ∈ *EX*, the second term represents the changing of the state accompanying with a new arrival which occurs with a rate *λ*. The third and the fourth terms represent transitions due to service completions at server *j* with a rate *μj* by en empty and non-empty queue respectively.

To generate a data-set for the queueing system under study which includes optimal threshold levels and corresponding values of the system parameters the policy-iteration Algorithm 1 is used. For numerical results the truncated equivalent system with a buffer size *W* is considered. The algorithm consists of two main steps: policy evaluation and policy improvement. In the first step, for a given control policy *f* the system of linear equations for the relative value function *<sup>v</sup>*(*x*), *x* ∈ *E* \ {(0, 0, ... , 0)} must be solved together with an equation *g* = *<sup>λ</sup>v*(**<sup>e</sup>**1). In the second step, the obtained in the first step relative function is used to improve the current policy. The algorithm stops when a new policy coincides with a previous one. As an initial policy the FSF allocation policy is used.

**Algorithm 1** Policy-iteration algorithm

1: **procedure** PIA(*K*, *W*, *λ*, *μj*, *j* = 1, 2, . . . , *K*) 2: *f* (0)(*x*) = argmax*j*∈*J*0(*x*) *<sup>μ</sup>j* - Initial policy 3: *n* ← 0 4: *g*(*n*) ← *<sup>λ</sup>v*(*n*)(**<sup>e</sup>**1) - Policy evaluation 5: **for** *x* = (0, 1, 0, . . . , 0) **to** (*<sup>N</sup>*, 1, 1, . . . , 1) **do** 6: *v*(*n*)(*x*) ← 1 *λ* + ∑*j*∈*J*1(*x*) *μj <sup>l</sup>*(*x*) − *g*(*n*) + *<sup>λ</sup>v*(*n*)(*x* + **<sup>e</sup>***f*(*n*)(*x*)) + ∑ *j*∈*J*1(*x*) *<sup>μ</sup>j<sup>v</sup>*(*n*)(*<sup>x</sup>* − **<sup>e</sup>***j*)<sup>1</sup>{*q*(*x*)=0} + ∑ *j*∈*J*1(*x*) *<sup>μ</sup>j<sup>v</sup>*(*n*)(*<sup>x</sup>* − **e***j* − **e**0 + **<sup>e</sup>***f*(*n*)(*<sup>x</sup>*−**<sup>e</sup>***j*−**e**0))<sup>1</sup>{*q*(*x*)><sup>0</sup>} 7: **end for** 8: - Policy improvement *f* (*n*+<sup>1</sup>)(*x*) ← argmin*a*∈*<sup>A</sup>*(*x*) *v*(*n*)(*x* + **<sup>e</sup>***a*) 9: **if** *f* (*n*+<sup>1</sup>)(*x*) ← *f* (*n*)(*x*), *x* ∈ *E* **then return** *f* (*n*+<sup>1</sup>)(*x*), *<sup>v</sup>*(*n*)(*x*), *g*(*n*) 10: **else** *n* ← *n* + 1, **go to step 4** 11: **end if**

12: **end procedure**

We convert by implementing the Algorithm 1 the *K* + 1-dimensional state space *E* of the Markov decision process ordered in a certain way to a one-dimensional equivalent state space N0, Δ : *E* → N0, for state *x* = (*q*(*x*), *d*1(*x*),..., *dK*(*x*)) ∈ *E*,

$$\Delta(\mathbf{x}) = q(\mathbf{x}) \mathbf{2}^K + \sum\_{i=1}^K d\_i(\mathbf{x}) \mathbf{2}^{i-1}. \tag{10}$$

Therefore, in one-dimensional case the changing of the state *x* due to adding or removing a customer from the queue and due to occupation or departure of a customer from the *j*th server can be respectively represented in the form,

$$\begin{aligned} \Delta(\mathbf{x} \pm \mathbf{e}\_0) &= (q(\mathbf{x}) \pm 1)2^K + \sum\_{i=1}^K d\_i(\mathbf{x})2^{i-1} = \Delta(\mathbf{x}) \pm 2^K, \\ \Delta(\mathbf{x} \pm \mathbf{e}\_j) &= q(\mathbf{x})2^K + \sum\_{i=1}^K d\_i(\mathbf{x})2^{i-1} \pm 2^{j-1} = \Delta(\mathbf{x}) \pm 2^{j-1}, \, j = 1, 2, \dots, K. \end{aligned}$$

For further details about derivation of the dynamic programming equation needed to evaluate the optimal policy the interested readers are referred to [13]. The infinite buffer queueing system is approximated by a finite buffer equivalent system in such a way that the loss probability does not exceed some specified small number *ε* > 0.

**Remark 1.** *For the bounded buffer size W the number of states is*

$$|E| = 2^{\aleph}(\aleph + 1).$$

*If the queue length q* ≥ *qK, all servers must be busy and the system behaves like a M*/*M*/1 *queueing system with a service rate* <sup>∑</sup>*Kj*=<sup>1</sup> *μj. The stationary state probabilities <sup>π</sup>*(*q*,1,...,1)*, q* ≥ *qK, satisfy the difference equation*

$$
\lambda \pi\_{(q-1,1,\ldots,1)} - \left(\lambda + \sum\_{j=1}^{K} \mu\_{j}\right) \pi\_{(q,1,\ldots,1)} + \sum\_{j=1}^{K} \mu\_{j} \pi\_{(q+1,1,\ldots,1)} = 0,
$$

*which has a solution in a geometric form, <sup>π</sup>*(*q*,1,...,1) = *<sup>π</sup>*(*qK*,1,...,1)*ρq*−*qK , q* ≥ *qK. For details and theoretical substantiation see e.g., [27]. The threshold level qK can be estimated using HS (11). The buffer size W is chosen in such a way that it satisfies the condition for the loss probability*

$$\sum\_{q=W}^{\infty} \pi\_{(q,1,\dots,1)} = \pi\_{q\_K} \sum\_{q=W}^{\infty} \rho^{q-q\_K} \le \sum\_{q=W}^{\infty} \rho^{q-q\_K} = \frac{\rho^{W-q\_K}}{1-\rho} < \varepsilon,$$

*where ρ* = *λ* <sup>∑</sup>*Kj*=<sup>1</sup> *μj . After simple algebra it implies*

$$W > \frac{\log \varepsilon (1 - \rho)}{\log(\rho)} + q\_K.$$

The algorithm was implemented in C++ and tested for model problems up to 10 servers and a queue of size *W* = 100. It shows matching results to the proposed heuristic solution but is only viable for relative small number of servers. For system with 100 servers the maximum number of states would be in the order of 2<sup>100</sup> which makes a reasonable usage of the policy-iteration algorithm impossible.

**Example 1.** *Consider the system M*/*M*/5 *with K* = 5 *and λ* = 15*. The service rates take the following values:* (*μ*1, *μ*2, *μ*3, *μ*4, *μ*5)=(20, 8, 4, 2, 1)*. The buffer size is W* = 80 *which for ε* = 0.0001 *guaranties that W* > log 0.0001(<sup>1</sup>−14/36) log(14/36) + *q*5 = 22.2734*, where q*5 = 12 *is evaluated by (11). The table of evaluated control actions f*(*x*) *for selected system states x is of the form:*


*Threshold levels qk, k* = 1, ... , *K* = 5*, can be evaluated by comparing the optimal actions f*(*q*, 1, . . . , 1 ! "# \$ *k*−1 , 0, . . . , 0 ! "# \$ *K*−*k*+1 ) < *f*(*q* + 1, 1, . . . , 1 ! "# \$ *k*−1 , 0, . . . , 0 ! "# \$ *K*−*k*+1 ) *for q* = 0, ... , *W* − 1*. In this example the optimal policy f* ∗ *is defined here through a sequence of threshold levels* (*q*2, *q*3, *q*4, *q*5)=(3, 4, 5, 12) *and g*∗ = 4.92897*.*

## **3. Heuristic Solution**

As it was mentioned above, the policy iteration algorithm has restrictions on dimension of the model, number of states, convergence in a heavy traffic case. In this section we derive a heuristic solution (HS) to estimate threshold levels *qk*, *k* = 2, ... , *K*, for the arbitrary *K* using a simple discrete fluid approximation *Q*(*t*) − *Q t* + 1 ∑*<sup>k</sup>*−<sup>1</sup> *j*=1 *<sup>μ</sup>j*−*<sup>λ</sup>* = 1, *t* = 0, 1 ∑*<sup>k</sup>*−<sup>1</sup> *j*=1 *<sup>μ</sup>j*−*<sup>λ</sup>* , ... , *qk*−1 ∑*<sup>k</sup>*−<sup>1</sup> *j*=1 *<sup>μ</sup>j*−*<sup>λ</sup>* , for the queue length at time *t*, as illustrated in Figure 2.

**Figure 2.** Fluid approximation.

We now explain how this fluid model can be employed for our aim. Assume that *qk* is an optimal threshold to allocate the customer to server *k* in state (*qk* − 1, 1, . . . , 1 ! "# \$ *k*−1 , 0, . . . , 0 ! "# \$ *K*−*k*+1 ), where the first *k* − 1 servers are busy. Now we compare the queues of the system given initial state is *x*0 = (*qk*, 1, . . . , 1 ! "# \$ *k*−1 , 0, 0, . . . , 0 ! "# \$ *K*−*k* ), where the *k*th server is not used for a new customer, and *y*0 = (*qk* − 1, 1, . . . , 1 ! "# \$ *k*−1 , 1, 0, . . . , 0 ! "# \$ *K*−*k* ), where the *k*th server is occupied by a waiting customer. It is assumed that the stability condition holds. In Figure 2, the queue lengths are labeled by *A* = *qk* and

*B* = *qk* − 1. If the queue dynamics corresponded to the deterministic fluid, it would decrease at the rate ∑*<sup>k</sup>*−<sup>1</sup> *j*=1 *μj* − *λ*. When this rate is maintained until the queue is empty, it occurs respectively at points *D* = *qk* ∑*<sup>k</sup>*−<sup>1</sup> *j*=1 *<sup>μ</sup>j*−*<sup>λ</sup>* and *C* = *qk*−1 ∑*<sup>k</sup>*−<sup>1</sup> *j*=1 *<sup>μ</sup>j*−*<sup>λ</sup>* . The total holding times of customers in a queue with lengths *qk* and *qk*− 1 are equal obviously to the areas

$$F\_{AOD} = \frac{q\_k(q\_k + 1)}{2} \cdot \frac{1}{\sum\_{j=1}^{k-1} \mu\_j - \lambda} \quad \text{and} \quad F\_{BOC} = \frac{q\_k(q\_k - 1)}{2} \cdot \frac{1}{\sum\_{j=1}^{k-1} \mu\_j - \lambda}$$

of triangles *AOD* and *BOC*. The mean service time of customers by first *k* − 1 busy servers until the queue is empty starting from state *x*0 is equal to

$$q\_k \left( \frac{1}{\mu\_1} \frac{\mu\_1}{\sum\_{j=1}^{k-1} \mu\_j} + \dots + \frac{1}{\mu\_{k-1}} \frac{\mu\_{k-1}}{\sum\_{j=1}^{k-1} \mu\_j} \right) = q\_k \frac{k-1}{\sum\_{j=1}^{k-1} \mu\_j} \gamma$$

where *μi* ∑*<sup>k</sup>*−<sup>1</sup> *j*=1 *μj* is a probability to be served by the *i*th server, and starting from state *y*0—is equal to (*qk* − 1) *k*−1 ∑*<sup>k</sup>*−<sup>1</sup> *j*=1*μj*.

According to a specified deterministic fluid schema we formulate **Proposition 2.** *The optimal thresholds qk, k* = 2, . . . , *K, are defined by*

$$\eta\_k \approx \mathfrak{H}\_k = \min \left\{ \mathfrak{H}\_{k-1}, \left\lfloor \left( \sum\_{j=1}^{k-1} \mu\_j - \lambda \right) \left( \frac{1}{\mu\_k} - \frac{k-1}{\sum\_{j=1}^{k-1} \mu\_j} \right) \right\rfloor + 1 \right\}. \tag{11}$$

**Proof.** Denote by *<sup>V</sup>*(*x*) the overall average holding time of customers until the system is empty given initial state is *x* ∈ *E*. The decision to perform the allocation to the *k*th server in state (*qk* − 1, 1, . . . , 1 ! "# \$ *k*−1 , 0, . . . , 0 ! "# \$ *K*−*k*+1 ) must lead to a reduction of the overall holding time under fluid schema, i.e.,

$$V(x\_0) - V(y\_0) > 0.\tag{12}$$

where

$$V(\mathbf{x}\_0) = F\_{AOD} + q\_k \frac{k-1}{\sum\_{j=1}^{k-1} \mu\_j} + V(0, \underbrace{1, \dots, 1}\_{k-1}, \underbrace{0, \dots, 0}\_{K-k+1}), \tag{13}$$

$$\begin{split} V(y\_0) &= \frac{1}{\mu\_k} + V(q\_k - 1, \underbrace{1, \dots, 1}\_{k-1}, 0, \underbrace{0, \dots, 0}\_{K-k}) \\ &= \frac{1}{\mu\_k} + F\_{\text{BOC}} + (q\_k - 1) \frac{k - 1}{\sum\_{j=1}^{k-1} \mu\_j} + V(0, \underbrace{1, \dots, 1}\_{k-1}, \underbrace{0, \dots, 0}\_{K-k+1}). \end{split}$$

After substitution of (13) into (12) and some simple manipulations we ge<sup>t</sup> that the heuristic solution for the optimal threshold *qk* is defined then as the integer larger then 1 and the smallest integer (11) satisfying the inequality (12).

**Example 2.** *Consider a queueing system from previous example for K* = 5*. We generate a data-set S in form of a list*

$$S = \tag{14}$$

$$\left\{ (\lambda, \mu\_1, \dots, \mu\_K) \to (q\_2, \dots, q\_K) : \lambda \in [1, 45], \mu\_1, \dots, \mu\_K \in [1, 40], \lambda < \sum\_{j=1}^K \mu\_j, \mu\_1 \ge \dots \ge \mu\_K \right\}.$$

*and evaluate with HS for the corresponding thresholds qk, k* = 1, ... , *K. Confusion matrices in Figure 3 visualize the performance of proposed heuristics respectively for the threshold levels* (*q*2, *q*3, *q*4, *q*5)*. Each row of these matrices represents the instances in a predicted value while each column represents the instances in an actual value. We notice the heavily diagonally dominant matrices that indicates a very good classification. This fact is confirmed also by overall accuracies. Such metrics describe the closeness of the heuristic measurements to a real threshold value and are calculated through the ratio of correct predictions to total predictions. Calculations of the overall accuracies as well as the accuracies for results with an acceptable deviation of threshold values by* ±1 *from the real value are summarized in Table 1.*

**Table 1.** Accuracy for prediction with heuristic solution (HS).


**Figure 3.** Confusion matrices (**<sup>a</sup>**–**d**) for prediction of *q*2, *q*3, *q*4 and *q*5 using HS.

**Example 3.** *Consider the queueing system M*/*M*/*K with a different number of servers K* = 2, 3, ... , 8*. Service rates take the values as given in Table 2.*



*Table 3 lists values of optimal thresholds qk and corresponding heuristic solutions q*<sup>ˆ</sup>*k. As we see it, the maximum deviation of the optimal thresholds from the heuristic solution is 1 independently of the number of servers.*



*The large number of numerical experiments carried out using the policy-iteration algorithm and simulations allows us to conclude that deviations of certain thresholds by 1 have practically no effect on the value of the minimised function. Thus, we believe that the proposed heuristic solution is effective for an arbitrary number of servers.*

#### **4. Simple Bounds for the Optimal Mean Number of Customers in the System**

As established in previous section, the estimation of the optimal threshold policy is possible by means of a simple heuristic solution. Nevertheless, with this knowledge it is quite complicated to calculate the optimal mean number of customers in the system with a high number of servers. A possible solution for this problem consists in construction a proper approximation of the original system with a preemption by an equivalent system without preemption. In this case a multidimensional Markov-chain can be described by an one-dimensional process. In this section we develop approximations for the low *L* ¯ *l* and upper *L* ¯ *u* bounds for the optimal gain function *g* = *L* ¯ , *L* ¯ *l* ≤ *L* ¯ ≤ *L* ¯ *u*.

To calculate the lower bound *L* ¯ *l* we use a heterogeneous queueing system with a preemption and threshold-based control policy denoted by *Sl* Further define by {*Yl*(*t*)}*t*≥0 the corresponding continuous-time Markov chain with a state space *El* = {*y* : *y* ∈ N0} describing the number of customers in the system. The state transition diagram of this system is illustrated in Figure 4.

**Figure 4.** The state transition diagram for the queueing system *Sl*.

The optimal threshold levels *qk*, *k* = 2, ... , *K* are calculated using the heuristic solution (11). Obviously, since the customer being served in a slower server can change it as the faster one becomes empty, the mean number of customers in the system must be lower comparing to an original queue. The steady-state probabilities *<sup>π</sup>y* = lim*<sup>t</sup>*→∞ <sup>P</sup>[*Yl*(*t*) = *y*] obviously exist under the stability condition (2).

**Proposition 3.** *The steady-state probabilities <sup>π</sup>y of the Markov chain* {*Yl*(*t*)}*t*≥0 *are given by*

$$\begin{split} \pi\_{0} &= \left[1+\sum\_{y=1}^{q\_{2}} \left(\frac{\lambda}{\mu\_{1}}\right)^{y}+\sum\_{k=3}^{K}\sum\_{y=q\_{k-1}}^{q\_{k}} \left(\frac{\lambda}{\sum\_{j=1}^{k-1}\mu\_{j}}\right)^{y-q\_{k-1}}\cdot\prod\_{i=1}^{k-2} \left(\frac{\lambda}{\sum\_{j=1}^{i}\mu\_{j}}\right)^{q\_{i}+1-q\_{i}}+\\ &+\prod\_{i=1}^{K-1} \left(\frac{\lambda}{\sum\_{j=1}^{i}\mu\_{j}}\right)^{q\_{i+1}-q\_{i}}\cdot\frac{\lambda}{\sum\_{j=1}^{K}\mu\_{j}-\lambda}\right]^{-1},\\ \pi\_{y} &= \begin{cases} \pi\_{0}\cdot\left(\frac{\lambda}{\mu\_{1}}\right)^{y}, & 1 \leq y \leq q\_{2}\\\ \pi\_{0}\cdot\prod\_{i=1}^{k-2} \left(\frac{\lambda}{\sum\_{j=1}^{i}\mu\_{j}}\right)^{q\_{i+1}-q\_{i}}\cdot\left(\frac{\lambda}{\sum\_{j=1}^{k-1}\mu\_{j}}\right)^{y-q\_{k-1}}, & q\_{k-1}\leq y \leq q\_{k}, 3\leq k \leq K\\\ \pi\_{0}\cdot\prod\_{i=1}^{K-1} \left(\frac{\lambda}{\sum\_{j=1}^{i}\mu\_{j}}\right)^{q\_{i+1}-q\_{i}}\cdot\left(\frac{\lambda}{\sum\_{j=1}^{i}\mu\_{j}}\right)^{y-q\_{K}}, & y \geq q\_{K}+1 \end{cases} \end{split}$$

**Proof.** The proposition follows directly from the properties of the ergodic birth-and-death process {*Yl*(*t*)}*t*≥0 [28].

From the probabilities *<sup>π</sup>y* it is possible to derive the performance measures of the system, e.g., the mean number of customers in the system *L* ¯ and the mean number of customers in the queue *Q* ¯ .

**Corollary 1.** *The mean number of customers in the system Sl satisfies the relation*

$$\bar{L}\_l = \sum\_{y=0}^{\infty} \pi\_y = \sum\_{y=0}^{q\_K} \pi\_y + \frac{\lambda \left(\sum\_{j=1}^{K} \mu\_j + (\sum\_{j=1}^{K} \mu\_j - \lambda) q\_K\right)}{(\sum\_{j=1}^{K} \mu\_j - \lambda)^2} \pi\_{q\_K}.\tag{15}$$

The upper bound *L* ¯ *u* for the optimal mean number of customers in the system can be obtained from an equivalent system under the FSF policy, see a state transition diagram in Figure 5, where *qk* = 1 for *k* = 1, . . . , *K*.

**Figure 5.** The state transition diagram for the heterogeneous queueing system with the fastest server first (FSF) policy.

In this diagram the group of states with a certain number of busy servers are labeled by (*q*, <sup>∑</sup>*Kj*=<sup>1</sup> *dj*) according to the number of busy servers in a state. An analytical solution for the heterogeneous queueing system with the FSF policy, where all states of servers are taken into account, although possible, but it is limited by the number of servers in the system. The latter system can be approximated in turn by a heterogeneous system *Su* with a preemption with appropriate evaluated service rates *mj*, *j* = 1, ... , *K*. The dynamics of the system *Su* is described by the continuous-time Markov-chain {*Yu*(*t*)} with a state space *Eu* = {*y* : *y* ∈ N0}, where *Yu*(*t*) specifies the number of customers in the system at time *t*. The state transition diagram for this Markov-chain is presented in Figure 6.

**Figure 6.** The state transition diagram for the queueing system *Su*.

The approximations for *mj* are based on the observation that the incentive to occupy the slower servers is getting higher as arrival rate increases.

**Proposition 4.** *The service rates mj, j* = 1, ... , *K* − 1*, of the queueing model* {*Yu*(*t*)}*t*≥0 *can be approximated by the following relations*

$$m\_{j} = \begin{cases} \sum\_{i=1}^{j} \mu\_{i} & 0 < \lambda \le \sum\_{i=0}^{j-1} \mu\_{K-i} \\ \sum\_{i=0}^{j-1} \frac{\mu\_{K-i}}{\lambda} \sum\_{i=1}^{j} \mu\_{i} + \sum\_{i=1}^{k-j} \left(\frac{\mu\_{K-i-j+1}}{\lambda} \sum\_{n=1}^{k} \mu\_{n+i}\right) \\ & + \left(1 - \sum\_{i=0}^{k-1} \frac{\mu\_{K-i}}{\lambda}\right) \sum\_{i=k-j+2}^{k+1} \mu\_{j} \quad \sum\_{i=0}^{k-1} \mu\_{K-i} < \lambda \le \sum\_{i=0}^{k} \mu\_{K-i}, j \le k \le K-1, \\ m\_{K} = \sum\_{i=1}^{K} \mu\_{i}. \end{cases} \tag{16}$$

**Proof.** For small arrival rate, e.g., 0 < *λ* ≤ *μ<sup>K</sup>*, most probably that only the first server which is the fastest one will be occupied and hence it will have the main contribution to the service rate *m*1. When the values *λ* are larger, e.g., *μK* < *λ* ≤ *μ<sup>K</sup>*−<sup>1</sup> + *μ<sup>K</sup>*, the first server will have a contribution to *μ*1 with a probability *μKλ* and the second server—with a complementary probability (1 − *μKλ* ). For larger values of *λ*, *μ<sup>K</sup>*−<sup>1</sup> + *μK* < *λ* ≤ *μ<sup>K</sup>*−<sup>2</sup> + *μ<sup>K</sup>*−<sup>1</sup> + *μK* the first three servers will contribute to *μ*1 with probabilities *μKλ* , *μK*−1 *λ* and 1 − *μK*−1+*μ<sup>K</sup> λ* . Similarly we may derive the contribution of the servers larger values of *λ* up to the condition *μ*2 + ... , +*μK* < *λ* ≤ *μ*1 + *μ*2 + ··· + *μ<sup>K</sup>*. To evaluate the contribution to the service rate *m*2 in a state with two busy servers the same schema can be used. When *λ* is small, 0 < *λ* ≤ *μ<sup>K</sup>*−<sup>1</sup> + *μ<sup>K</sup>*, the first two servers will form the service rate *μ*2. If *μ<sup>K</sup>*−<sup>1</sup> + *μK* < *λ* ≤ *μ<sup>K</sup>*−<sup>2</sup> + *μ<sup>K</sup>*−<sup>1</sup> + *μ<sup>K</sup>*, the first three servers will have a contribution to *μ*2, the first and second servers contribute with a probability *μ*1+*μ*2 *λ* , the second and fourth – with a probability 1 − *μ*2+*μ*3 *λ* . When <sup>∑</sup><sup>2</sup>*j*=<sup>0</sup> *μ<sup>K</sup>*−*j* < *λ* ≤ <sup>∑</sup><sup>3</sup>*j*=<sup>0</sup> *μ<sup>K</sup>*−*j*, the four faster servers will serve the customers, the first and second server with probability *μK*−1+*μ<sup>K</sup> λ* , the second and third server with probability *μ*3*λ* and the third and fourth with probability 1 − *μK*−2+*μK*−1+*μ<sup>K</sup> λ* . The procedure can be continued for larger values of *λ* in a similar way as before. The proposed arguments can be summarized for all service rates *mj*, *j* = 1, . . . , *K*, and the arbitrary number of servers *K* in form of the approximation (16).

It can be verified that for any *j* the quotient *λmj* < 1 and *λ* < *mK* = <sup>∑</sup>*Kj*=<sup>1</sup> *μj*. Now we can use the approximation (16) to derive the steady-state distribution.

**Proposition 5.** *The steady-state probabilities <sup>π</sup>y of the Markov-chain* {*Yu*(*t*)}*t*≥0 *are given by*

$$\begin{aligned} \pi\_0 &= \left[ 1 + \sum\_{y=1}^{K-1} \frac{\lambda^y}{\prod\_{j=1}^y m\_j} + \frac{\lambda^{K+1}}{(m\_K - \lambda)\prod\_{j=1}^K m\_j} \right]^{-1}, \\ \pi\_y &= \begin{cases} \pi\_0 \cdot \frac{\lambda^y}{\prod\_{j=1}^y m\_j} & 1 \le y \le K, \\ \pi\_0 \cdot \frac{\lambda^y}{m\_K^{y-K} \prod\_{j=1}^K m\_j} & y \ge K+1. \end{cases} \end{aligned}$$

**Proof.** The proposition follows from the properties of the ergodic birth-and-death process {*Yu*(*t*)}*t*≥0 [28].

**Corollary 2.** *The mean number of customers in the system Su satisfies the relation*

$$L\_u = \sum\_{y=0}^{\infty} \pi\_y = \sum\_{y=0}^{K} \pi\_y + \frac{\lambda \left(m\_K + (m\_K - \lambda)K\right)}{(m\_K - \lambda)^2}.\tag{17}$$

**Example 4.** *Consider the M*/*M*/*K queueing system with a total service intensity equal to* <sup>∑</sup>*Kj*=<sup>1</sup> *μj* = 35*. Here we analyse the systems with different number of servers and their heterogeneity.*

*A Gini's index <sup>G</sup>*(*μ*)*,* 0 ≤ *<sup>G</sup>*(*μ*) ≤ 1*, can be used to measure the inequality for individual data μ, see for details [29], and hence is quite appropriate as a metric for the heterogeneity of servers. This index can be obtained by computing the moments of the data set μ* = {*μ<sup>K</sup>*, *μ<sup>K</sup>*−1,..., *μ*1} *with μj sorted in increasing order,*

$$G(\mu) = \frac{2Cov[\mu\_\prime n\_K]}{K\bar{\mu}}, \bar{\mu} = \frac{1}{K} \sum\_{j=1}^{K} \mu\_{j\prime} n\_K = \{1, 2, \dots, K\}.$$

*The Gini's index ranges from a minimum value of zero, when all individuals are equal, e.g., for the homogeneous servers <sup>G</sup>*(*μ*) = 0*, to a theoretical maximum of one when every individual except one has a value zero. Two different values of heterogeneity are studied within this example, namely <sup>G</sup>*(*μ*) = 0.63 *and* *<sup>G</sup>*(*μ*) = 0.40*. The corresponding values of service intensities for three types of systems with K* = 3*, K* = 5 *and K* = 8 *are presented in Table 4.*


**Table 4.** Service intensities versus Gini's index.

*In Figures 7–9 we display the values L* ¯ *with bounds L* ¯ *l and L* ¯ *u calculated respectively by the policy-iteration Algorithm 1 and by expressions (15) and (17) as functions of λ and number of servers K* = 3, 5, 8*. The Gini's index <sup>G</sup>*(*μ*) = 0.63 *in a figures labeled by (a) and <sup>G</sup>*(*μ*) = 0.40*—by (b). The curves in figures show, that the mean number of customers as well as the size of the gap between the lower and upper bounds increases with increasing values of K. As expected, the low and upper bounds must coincide with a mean value L* ¯ *for the system with homogeneous servers, where <sup>G</sup>*(*μ*) = 0*. Indeed, in figures with less heterogeneity of servers the curves for L* ¯ *L* ¯ *u and L* ¯ *l are getting closer, as the Gini's index decreases. Moreover, we notice that the functions take similar values in a light traffic case when λ* << <sup>∑</sup>*Kj*=<sup>1</sup> *μj and tend to the same values as the traffic becomes heavier, i.e., if λ* → <sup>∑</sup>*Kj*=<sup>1</sup> *μj.*

**Figure 7.** Mean value *L* ¯ with the bounds versus *λ*.

**Figure 8.** Mean value *L* ¯ with the bounds versus *λ*.

**Figure 9.** Mean value *L*¯ with the bounds versus *λ*.

#### **5. Sensitivity Analysis of the Heuristic Solution to Changes in Distribution**

Another natural method to calculate the mean number of customers in the system and to check whether a certain policy leads to a reduction of this value is a simulation. This approach, while time-consuming, also makes it possible to examine the sensitivity of the optimal control policy *f* and the corresponding mean performance characteristics to changes in distribution types other than exponential. An implementation of a simulation model is shown in the Figure 10 bellow.

**Figure 10.** Simulation of the heterogeneous queueing system without preemption.

For this specific implementation it is possible to set the number of servers, the buffer capacity, threshold levels (limits), the arrival and service rates. The customers are indicated by a black circle, and are numbered accordingly to their arrival times. On the graphical interface there are also fields that show the actual amount of customers in the system, the average number and the total number of customers in the system including the already processed customers, the number of lost customers due to the truncated buffer capacity. The stability condition is taken into account and the buffer size is big enough so there should be hardly any lost customers. Hence the results with a truncated queue are comparable to systems with infinite queue lengths. Unfortunately, simulations are also unfit to solve systems with a large number of servers and states, as one would need to simulate a large number of different configurations with thousands of customers to ge<sup>t</sup> acceptable results. This fact further confirms the relevance of the results obtained in the previous sections.

The inter-arrival *A* and service times *Bj*, *j* = 1, 2, ... , *K*, of customers follow exponential, gamma, Pareto, log-normal and hyper-exponential distributions. To ge<sup>t</sup> comparable results the parameters of the distributions are chosen to have the same means E[*A*] = 1*λ* , E[*Bj*] = 1*μj* , *j* = 1, 2, ... , *K*, and variances V[*A*] = 1*λ*<sup>2</sup> , V[*Bj*] = 1*μ*2*j* , *j* = 1, 2, ... , *K*, as the system driven by exponential distribution. For this purpose we use formulas describing the parameters in terms of the mean and variance given by Toth et al. [30]. The main goal of the simulation experiments consists in understanding weather the heuristic solution (11) for *λ* = 1 E[*A*] and *μj* = 1 E[*Bj*] is insensitive to changes in forms of distributions.

**Example 5.** *As a reference, we first simulate the system M*/*M*/5 *with an arrival rate λ* = 25*,* (*μ*1, *μ*2, *μ*3, *μ*4, *μ*5)=(20, 8, 4, 2, 1)*. Table 6 lists the mean number of customers in the system the optima, heuristic, FSF policies as well for other threshold policies with lower and higher values of thresholds.*

*We now simulate the systems like G I*/*M*/5*, M*/*G*/5 *as well as G I*/*G*/5 *with heterogeneous servers and threshold-based allocation policy where either the inter-arrival times, the service time or both follow one of the distributions mentioned above. For all the following simulation results we hereby want to find the mean number of customers in the system L* ¯ *for the policies specified in the preceding table for the markovian queueing system M*/*M*/5*. Table 6 provide a sensitivity and comparative analysis of the system performance obtained by employing different inter-arrival and service time distributions.*

*Of course, finding the optimal control policy through a simulation modelling is not an easy task. But in our example, we do not want to find the real values of the optimum thresholds, but rather to understand whether the optimum control and heuristic solution changes drastically when the distribution of the corresponding random values characterising the behaviour of a queueing system changes. Note that L* ¯ *for the optimal and heuristic policy takes always the values between those corresponding to the policies with lower and higher thresholds. The results of this example, as well as numerous other results carried out for systems with other parameters, show that while the absolute values of the mean number of customers vary as distributions change, the values of the optimal and heuristic thresholds are concentrated sufficiently close to the respective thresholds for markovian systems. Thus, we strongly believe, it is possible to use a heuristic solution with the replacement of exponential intensities by intensities of arbitrary distributions as a quasi-optimal solution in the problem of minimising the mean number of customers in the system with non-exponential inter-arrival and service time distributions.*


**Table 5.** Simulation results for the *M*/*M*/5 queueing system.

**Table 6.** Simulation results for the *G I*/*M*/5, *M*/*G*/5 and *G I*/*G*/5 queueing systems.



**Table 6.** *Cont*.
