**1. Introduction**

Queueing theory is successfully applied in various fields of human activity for optimizing the consumption and scheduling of restricted resources and for provisioning a high quality of service. The overwhelming majority of the existing literature in this theory is devoted to systems with homogeneous customers; see, e.g., [1]. Because real-world customers are very often heterogeneous in many respects, new developments in the analysis of queues with heterogeneous customers are of great importance. The heterogeneity of customers with respect to the required resources, level of service, and their economic or social value makes the optimal management of their service necessary. Such management can be implemented, e.g., via various generalizations of polling disciplines, processor sharing, or versatile priority schemes. For some references, see, e.g., [2]. Priority schemes assume the assignment of a certain priority to each class of customers and grant the advantage of access to the restricted resource (we will call this resource a server) to the available customers having the highest priority. Static priorities suggest that, once the priorities are assigned, a low priority customer does not have any chance to start service until the server finishes service of all high priority customers present in the system. This may cause a low priority customer to wait in the queue much longer than a just arrived high priority customer. To avoid this evident unfairness to the low priority customers, dynamic priorities were taken into consideration. A dynamic priority assumes, e.g., that the low priority customers obtain the chance to start service in the presence of high priority customers when: (i) the queue of the low priority customers exceeds some threshold values, see, e.g., [3–6]; or (ii) some relation between the queue lengths of priority and non-priority customers is fulfilled, see, e.g., [7]; or (iii) a certain limit on the number of high priority customers that can overtake the low priority customers is exceeded, see, e.g., [8]. The use of dynamic priorities makes it possible to essentially improve the quality of the system operation. The shortcomings of such priorities are: (i) the necessity to permanently monitor the queue lengths of the different classes of customers, which is not always possible (or is costly) in some real-world systems, and (ii) the dependence of the waiting time of a particular low priority customer on the rate of future arrivals of other low priority customers. Another way of providing fairer access to low priority customers is assumed in the models where a low priority customer can become a higher priority customer after a certain period of waiting in the buffer. A currently popular model assumes that the low priority customers accumulate priority during their stay in the queue. The accumulation of priority may be described by some function, e.g., a linear or piecewise linear function, of the time spent by the customer in the queue. The rate of the increase of the priority may depend on the class to which the customer belongs. This type of model was considered, e.g., in the papers [9–14]. The main interest in queues with accumulating priorities stems from their applicability to modeling the operation of emergency departments of hospitals. Arriving customers (patients) are preliminarily sorted (triaged) into several groups according to the severity of the patient's condition. However, while waiting for treatment by the doctors, the state of health of a patient who was initially classified as not requiring very urgent treatment can become essentially worse, and this patient has to be transferred to the group of very urgent patients.
Because in the described situation the increase of the priority of a customer is not defined by some deterministic function of the elapsed waiting time, another type of model, with a randomized change of priority, exists in the literature. This type of model was considered, e.g., in [15,16] and the recent paper [17]. A table presenting the state of the art in the analysis of queues with a priority change after some random amount of time is given in [17]. It follows from that table that only a few papers consider models where the arrival processes of customers of different types are not defined by the stationary Poisson arrival process, while it is already well recognized that the flows in many real systems and networks are poorly described by the stationary Poisson arrival process. The rare exceptions, where a more complicated arrival process is considered, are the papers [18–20]. In all these papers, an arbitrary number of priority classes is allowed. In [18], it is assumed that all the flows, except the flow having the highest priority, are described by the stationary Poisson arrival process. The arrival flow of customers having the highest priority is described by the much more general Markov arrival process (*MAP*); see, e.g., [21–23] for more details. In [19,20], the arrival flow is described by the even more general marked Markov arrival process (*MMAP*). The *MMAP*, as an essential generalization of the *MAP* to the case of heterogeneous customers, was introduced in [24]. The models with the *MAP* or *MMAP* are much more difficult to analyze than the models with the stationary Poisson arrival process. This explains why only some bounds and tail distributions were obtained in [18], and only the problem of establishing the ergodicity condition (but not the problem of computing the stationary distribution of the system states and performance measures) is solved in [19,20].
The problem of computing the stationary distribution of the system states is successfully solved in [17], but only for two classes of customers. The advantage of our paper over [17] is that we allow any finite number *R* of priority classes. The arrival process is described by the *MMAP*. The system has a finite buffer, and any arriving customer is admitted to the buffer if the buffer is not full. If the buffer is full while some waiting customers have lower priority than the arriving customer, the arriving customer pushes out of the buffer a customer having the lowest priority among those present. During the stay in the buffer, after an exponentially distributed time, any customer can increase its priority. The service time has a phase-type distribution. After a service completion, the next service is provided to a customer with the highest priority among those present in the buffer.

It is worth mentioning that the problem of assigning priorities to different classes of customers is often closely related to the problem of accounting for the possible impatience of customers from different classes. For example, if customers of two types are almost equally valuable for the system, the more impatient customers should be given the higher priority (and the possibility to increase the priority during the waiting time in a buffer) to avoid the loss of customers and possible starvation (and poor utilization) of the server in the future. In our model, we pay significant attention to accounting for impatience.

Besides the above-mentioned popular model of the treatment of patients in a hospital emergency department, we mention the following example of a potential application of the considered model to the analysis and optimization of real-world systems: a first aid station that distinguishes three categories of incoming calls.

	- (a) An emergency call—when a patient suddenly has diseases, conditions, and/or exacerbations of chronic diseases that pose a threat to the life of the patient and/or others and require emergency medical intervention;
	- (b) An urgent call—associated with a sharp deterioration in the patient's health status when it is not possible to clarify the reasons for treatment;
	- (c) A less urgent call—when the patient suddenly has diseases, conditions, and/or exacerbations of chronic diseases without obvious signs of a threat to the patient's life, requiring urgent medical intervention.

Accordingly, the emergency call has the highest priority, the urgent call has the middle priority, and the less urgent call has the lowest priority. However, along with this categorization and establishing the priority in service, there exist strict standards for starting the provisioning of help. A dispatcher has to assign an ambulance car for providing help to patients before fixed deadlines. In Minsk, the capital of the Republic of Belarus, these standards are fixed as four minutes for the emergency call, fifteen minutes for the urgent call, and sixty minutes for the less urgent call. Violation of this standard is punished. In this example, the service time can be interpreted as the time between sequential releases of ambulance cars. The service time essentially depends on the number of available cars and medical teams. The results of the analysis of the model given in our paper can be useful for the optimization of the work of the described first aid station via a proper choice of the number of ambulance teams to guarantee the required quality of service.

The methodological value of the paper consists in presenting a way to analyze the various transitions of a set of interacting Markov processes, which define the dynamics of the number of customers of several types in the system, caused by arrivals of new customers of various types, service completions, departures due to impatience, priority changes, and pushing out of low priority customers in the case of buffer overflow.

The organization of the text is as follows. In Section 2, the mathematical model is described and graphically illustrated. The multi-dimensional Markov chain including as components the total number of customers in the system, the states of the underlying processes of customer arrival and service, and the number of customers of each type present in the system is defined in Section 3. The set of matrices defining the probabilities or intensities of transitions of the number of customers of each type is given, and the generator of the Markov chain is written down. Formulas for the computation of the main performance measures of the system are presented in Section 4. The numerical example illustrating the dependence of the performance measures of the system on the capacity of the buffer is presented in Section 5. The importance of accounting for a complicated pattern of the arrival process and the variance of the service time is demonstrated there. Section 6 concludes the paper.

## **2. Mathematical Model**


We consider a single-server queueing system where service is provided to *R* types of customers. The structure of the system is presented in Figure 1.

**Figure 1.** Structure of the system.

The customer arrival process is assumed to be defined by the *MMAP* (see, e.g., [24]). Among the recent papers where queueing models with the *MMAP* are analyzed, we can mention, e.g., [25–27].

Customer arrivals in the *MMAP* occur at the moments of transitions of an irreducible continuous-time Markov chain *νt*, *t* ≥ 0, having the state space {1, 2, ..., *W*}. The *MMAP* is completely described by the square matrices *D*0 and *Dr*, *r* = 1, *R*. Hereinafter, notation like *r* = 1, *R* means that the parameter *r* takes values in the set {1, ..., *R*}.

The matrix *Dr* defines the transition intensities of the underlying process *νt* that lead to the arrival of a type-*r* customer, *r* = 1, *R*. The non-diagonal entries of the matrix *D*0 define the transition intensities of the underlying process that do not lead to any arrival. The moduli of the diagonal entries of the matrix *D*0 define the intensities of the departure of the process *νt* from its states. The matrix $D(1) = D_0 + D$, where $D = \sum_{r=1}^{R} D_r$, is the generator of the underlying process.

The mean arrival rate *λ* is defined by *λ* = *θD***e**, where *θ* is the invariant probability row vector of the underlying process. This vector is computed as the unique solution of the system *θD*(1) = **0**, *θ***e** = 1. Hereinafter, **e** denotes a column vector of appropriate size consisting of 1s, and **0** denotes a row vector consisting of zeroes.

The mean rate *λr* of type-*r* customer arrivals is computed as *λr* = *θDr***e**, *r* = 1, *R*. The squared coefficient of variation $c_{var}^2$ of the intervals between successive arrivals is given by $c_{var}^2 = 2\lambda\theta(-D_0)^{-1}\mathbf{e} - 1$. The coefficient of correlation $c_{cor}$ of two successive intervals between arrivals is given by

$$c_{cor} = \left(\lambda \theta (-D_0)^{-1} D (-D_0)^{-1} \mathbf{e} - 1\right)/c_{var}^2.$$
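As an illustration, the *MMAP* descriptors above can be evaluated numerically. The sketch below uses a hypothetical two-state *MMAP* with two customer types; the matrices `D0`, `D1`, `D2` are illustrative assumptions, not data from the paper:

```python
import numpy as np

# Hypothetical MMAP with W = 2 underlying states and R = 2 customer types.
# D0 holds transitions without arrivals; D1, D2 generate type-1/type-2 arrivals.
D0 = np.array([[-5.0, 1.0],
               [ 0.5, -3.0]])
D1 = np.array([[ 2.0, 0.5],
               [ 1.0, 0.5]])
D2 = np.array([[ 1.0, 0.5],
               [ 0.5, 0.5]])
D = D1 + D2        # all arrival-generating transitions
D_1 = D0 + D       # generator D(1) of the underlying process

# Invariant vector theta: theta @ D(1) = 0, theta @ e = 1.
W = D0.shape[0]
A = np.vstack([D_1.T, np.ones(W)])
b = np.zeros(W + 1); b[-1] = 1.0
theta, *_ = np.linalg.lstsq(A, b, rcond=None)

e = np.ones(W)
lam = theta @ D @ e                            # total mean arrival rate
lam_r = [theta @ Dr @ e for Dr in (D1, D2)]    # per-type rates lambda_r
inv = np.linalg.inv(-D0)
c_var2 = 2 * lam * theta @ inv @ e - 1         # squared coefficient of variation
c_cor = (lam * theta @ inv @ D @ inv @ e - 1) / c_var2
```

A non-zero `c_cor` is exactly what distinguishes such correlated flows from the stationary Poisson process, for which both descriptors degenerate ($c_{var}^2 = 1$, $c_{cor} = 0$).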

The system has a finite common buffer space for storing the customers that arrive when the server is busy. The capacity of the buffer is *N*, *N* ≥ 1. Therefore, the total number of customers of all types that can stay in the system simultaneously is restricted by the number *N* + 1. If a customer of any type arrives when the server is idle, the customer immediately starts processing by the server (service). If the server is busy but the buffer is not full, an arriving customer of any type is placed into the buffer dedicated to this type of customer. There is no specific restriction on the capacity of the dedicated buffers, except that the total number of customers staying in all these buffers never exceeds the capacity *N*.

Customers of different types have different priorities. The priority defines the fate of a customer that arrives when the buffer is full and the order of picking up the customers from the buffer when the server finishes a service. We assume that type-*r*, *r* = 1, *R*, customers have the non-preemptive priority over type-*l* customers, *l* = *r* + 1, *R*. This means that a service in progress is never interrupted by a newly arriving customer of a higher priority type, while, at a service completion epoch and in the case of buffer overflow, customers of higher priority types have the advantage over customers of lower priority types.


We assume that, during the stay in the system, each customer of type-*r*, *r* = 2, *R*, can increase its priority. It means that, after an exponentially distributed time with the parameter *αr*, a type-*r* customer becomes a type-*l* customer with the probability *pr*,*l*, *l* = 1, *r* − 1, independently of other customers. Here, $\sum_{l=1}^{r-1} p_{r,l} = 1$, *r* = 2, *R*.
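This upgrade mechanism can be sketched as follows for a hypothetical system with *R* = 3 classes; the rates `alpha` and the probabilities `p` are illustrative assumptions:

```python
import random

# Hypothetical upgrade parameters for R = 3 priority classes
# (type 1 is the highest priority and never jumps).
alpha = {2: 0.4, 3: 0.2}               # upgrade rates alpha_r
p = {2: {1: 1.0},                      # p_{2,1} = 1
     3: {1: 0.3, 2: 0.7}}              # p_{3,1} + p_{3,2} = 1

def upgrade_time_and_target(r, rng=random):
    """Sample when a type-r customer jumps and the class it jumps to."""
    t = rng.expovariate(alpha[r])      # Exp(alpha_r) time until the jump
    targets, probs = zip(*sorted(p[r].items()))
    l = rng.choices(targets, weights=probs)[0]
    return t, l
```

Because every waiting customer carries its own independent exponential clock, the aggregate upgrade intensity seen by the system scales with the number of waiting customers of each type, which is reflected in the transition matrices of Section 3.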

It is worth noting that a more popular assumption in the existing literature is that only the head-of-the-line customer of each type can make a jump to the end of the queue of higher priority customers. We assume that each customer of any type can jump to a higher priority class, independently of other customers. This means that not only the head-of-the-line customer has a clock counting the time until the jump, but each customer (not of the highest priority) has its own clock. Our assumption seems more realistic in some potential applications; e.g., in the emergency department modeling example, the health of any patient, not only the head-of-the-line patient, can suddenly become worse. The same is true in applications where various information units become obsolete independently of the other units, or different perishable foods have independent spoiling times. Note also that, using a slight modification of some matrix blocks defined and constructed in the next section, the presented results can be extended to the models with head-of-the-line customer priority jumps as well.

Customers staying in the buffer are impatient and can leave the system without service, independently of other customers, if the waiting time is too long. A type-*r* customer leaves the system without service after an exponentially distributed patience time with the parameter *γr*, *γr* ≥ 0. Let us denote *γ* = (*γ*1, *γ*2, ..., *γR*). If a customer changes its priority, its patience time restarts from the very beginning with the parameter corresponding to the new priority.

We assume that the service time of a customer of any type has a *PH* distribution with the underlying Markov process *mt*, *t* ≥ 0, having the finite state space {1, ..., *M*, *M* + 1} and the irreducible representation (*β*, *S*); see [28]. We denote $\mathbf{S}_0 = -S\mathbf{e}$. The mean service time is given by $b_1 = \beta(-S)^{-1}\mathbf{e}$. The mean service rate can be computed as $\mu = b_1^{-1}$.
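For instance, the mean service time of a *PH* distribution can be evaluated directly from the representation. The matrices below describe an Erlang-2 distribution (two sequential phases with rate 2) and are purely an illustrative assumption:

```python
import numpy as np

# Hypothetical PH representation (beta, S) with M = 2 phases:
# an Erlang-2 distribution, each phase with rate 2.
beta = np.array([1.0, 0.0])
S = np.array([[-2.0,  2.0],
              [ 0.0, -2.0]])
e = np.ones(2)
S0 = -S @ e                           # exit-rate vector S_0 = -S e

b1 = beta @ np.linalg.inv(-S) @ e     # mean service time b_1
mu = 1.0 / b1                         # mean service rate
print(b1)                             # → 1.0 (two phases of mean 1/2 each)
```

Varying the representation (*β*, *S*) while keeping *b*1 fixed changes the variance of the service time, which is exactly the effect studied numerically in Section 5.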

If, at a service completion epoch, there are customers in the buffer, the first customer among those having the highest priority starts service. Otherwise, the server remains idle until the next arrival moment.

#### **3. Process of the System States**

The behavior of the system under study can be described by the regular irreducible continuous-time Markov chain

$$\vec{\xi}_t = \{n_t, \nu_t, m_t, \eta_t^{(1)}, \dots, \eta_t^{(R)}\}, \ t \ge 0,$$

where, during the epoch *t*,

- *nt* is the total number of customers in the system;
- *νt* is the state of the underlying process of the *MMAP*;
- *mt* is the state of the underlying process of the *PH* service time distribution;
- $\eta_t^{(r)}$ is the number of type-*r* customers in the buffer, *r* = 1, *R*.

To investigate the Markov chain *ξt*, *t* ≥ 0, let us enumerate its states in the direct lexicographic order of the components *νt* and *mt* and in the reverse lexicographic order of the components $\eta_t^{(1)}, \dots, \eta_t^{(R)}$.

The most technically difficult and important part of the research is the analysis of the transitions of the process of the number of customers of different types in the buffer. Let us first consider the process $\zeta_t^{(n)} = \{\eta_t^{(1)}, \dots, \eta_t^{(R)}\}$, *t* ≥ 0, where $\eta_t^{(r)} \in \{0, \dots, n\}$, *r* = 1, *R*, and $\sum_{r=1}^{R} \eta_t^{(r)} = n$. The process $\zeta_t^{(n)}$ describes the transitions of the number of customers of different types in the buffer when the total number of customers in the buffer is *n*. First, we present the algorithms for computing the set of matrices that define the transition probabilities or transition intensities of the process $\zeta_t^{(n)}$ at the moments of changes, due to various reasons, of the components of this process when *n*, *n* = 1, *N*, customers stay in the buffer.
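To make the state space of $\zeta_t^{(n)}$ concrete, the following sketch enumerates all buffer contents $(\eta^{(1)}, \dots, \eta^{(R)})$ with a given total *n*; treating the reverse lexicographic order as a descending sort of these vectors is our illustrative reading of the convention:

```python
from itertools import product

def buffer_states(n, R):
    """All vectors (eta_1, ..., eta_R) of non-negative integers with sum n,
    listed in reverse lexicographic (descending) order."""
    states = [v for v in product(range(n + 1), repeat=R) if sum(v) == n]
    return sorted(states, reverse=True)

# e.g., n = 2 customers of R = 2 types in the buffer:
# buffer_states(2, 2) -> [(2, 0), (1, 1), (0, 2)]
```

The number of such states, $\binom{n+R-1}{R-1}$, determines the sizes of the matrix blocks that define the transitions of $\zeta_t^{(n)}$.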
