# **Age of Information: Concept, Metric and Tool for Network Control**

Edited by Anthony Ephremides and Yin Sun

Printed Edition of the Special Issue Published in *Entropy*

www.mdpi.com/journal/entropy

## **Age of Information: Concept, Metric and Tool for Network Control**


Editors

**Anthony Ephremides and Yin Sun**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors*

Anthony Ephremides, University of Maryland, College Park, USA

Yin Sun, Auburn University, Auburn, USA

*Editorial Office*

MDPI, St. Alban-Anlage 66, 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Entropy* (ISSN 1099-4300) (available at: https://www.mdpi.com/journal/entropy/special_issues/age_of_information).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-7292-5 (Hbk) ISBN 978-3-0365-7293-2 (PDF)**

© 2023 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **About the Editors**

#### **Anthony Ephremides**

Anthony Ephremides has been at the University of Maryland for over 50 years, conducting research and teaching in the areas of information theory, communication networks, and system theory. His work has pioneered the areas of ad hoc wireless networks, cross-layer integration in networks, energy-efficient communications, and most recently, the field of semantic communications as exemplified in information freshness and communication for computing and actuation. He is a native of Greece where he received his undergraduate education before completing a Ph.D. at Princeton University in 1971. He has mentored over 40 doctoral students and received numerous awards. He has been engaged in numerous professional activities including being a member of the Board of Directors of IEEE and a Life Fellow of the institute. He has published and lectured extensively and maintains active collaborations with colleagues around the world.

#### **Yin Sun**

Yin Sun is an Assistant Professor in the Department of Electrical and Computer Engineering at Auburn University, Alabama. He received his B.Eng. and Ph.D. degrees in Electronic Engineering from Tsinghua University in 2006 and 2011, respectively. He was a Postdoctoral Scholar and Research Associate at the Ohio State University from 2011 to 2017. His research interests include wireless networks, machine learning, semantic communications, Age of Information, information theory, and robotic control. His articles received the Best Student Paper Award of IEEE/IFIP WiOpt 2013, the Best Paper Award of IEEE/IFIP WiOpt 2019, runner-up for the Best Paper Award of ACM MobiHoc 2020, and the 2021 Journal of Communications and Networks (JCN) Best Paper Award. He co-authored the monograph *Age of Information: A New Metric for Information Freshness*, published by Morgan & Claypool Publishers in 2019. He received the Auburn Author Award in 2020 and the National Science Foundation (NSF) CAREER Award in 2023. He is a Senior Member of the IEEE and a Member of the ACM.

## *Article* **Scheduling to Minimize Age of Incorrect Information with Imperfect Channel State Information**

**Yutao Chen and Anthony Ephremides \***

Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA; cheny@umd.edu

**\*** Correspondence: etony@umd.edu

**Abstract:** In this paper, we study a slotted-time system in which a base station must update multiple users at the same time. Due to limited resources, only some of the users can be updated in each time slot. We consider the problem of minimizing the Age of Incorrect Information (AoII) when imperfect Channel State Information (CSI) is available. Leveraging the theory of Markov Decision Processes (MDPs), we obtain the structural properties of the optimal policy. By introducing a relaxed version of the original problem, we develop Whittle's index policy under a simple condition. However, indexability is required to ensure the existence of Whittle's index. To circumvent the indexability requirement, we develop the Indexed priority policy based on the optimal policy for the relaxed problem. Finally, numerical results are laid out to showcase the application of the derived structural properties and to highlight the performance of the developed scheduling policies.

**Keywords:** age of incorrect information; multi-user system; scheduling policy

**Citation:** Chen, Y.; Ephremides, A. Scheduling to Minimize Age of Incorrect Information with Imperfect Channel State Information. *Entropy* **2021**, *23*, 1572. https://doi.org/ 10.3390/e23121572

Academic Editor: Mario Martinelli

Received: 3 November 2021 Accepted: 23 November 2021 Published: 25 November 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

The Age of Incorrect Information (AoII) was introduced in [1] as a combination of age-based metrics (e.g., Age of Information (AoI)) and error-based metrics (e.g., Minimum Mean Square Error). In communication systems, AoII captures not only the information mismatch between the source and the destination but also the aging process of the inconsistent information. Hence, AoII is governed by two functions. The first is the time penalty function, which reflects how the inconsistency of information affects the system over time. In real-life applications, inconsistent information affects different communication systems in different ways. For example, machine temperature monitoring is time-sensitive because the damage caused by overheating accumulates quickly. However, reservoir water level monitoring is less sensitive to time. Therefore, by adopting different time penalty functions, AoII can capture the different aging processes of the mismatch in different systems. The second is the information penalty function, which captures the information mismatch between the source and the destination. It allows us to measure mismatches in different ways, depending on how sensitive different systems are to information inconsistencies. For example, a navigation system requires precise information to give correct instructions, but a real-time delivery tracking system does not need very accurate location information. Since we can choose different penalty functions for different systems, AoII is adaptable to various communication goals, which is why it is regarded as a semantic metric [2].

Since the introduction of AoII, several studies have been performed to reveal its fundamental nature. The authors of [3] consider a system with random packet delivery times and compare AoII with AoI and real-time error via extensive numerical results. The authors of [4] study the problem of minimizing an AoII that takes a general time penalty function; three real-life applications are considered to showcase the performance advantages of AoII over AoI and real-time error. In [5], the authors investigate an AoII that quantifies the mismatch between the source and the destination, and study the optimization problem when the system is resource-constrained. The authors of [6] study the AoII minimization problem in the context of scheduling: a central scheduler needs to update multiple users at the same time but cannot know the states of the sources before receiving the updates. By introducing a belief value, Whittle's index policy is developed and evaluated. In this paper, we also consider the problem of minimizing AoII in scheduling. Different from [6], we consider a generic time penalty function and study the minimization problem in the presence of imperfect Channel State Information (CSI). Because of the CSI, Whittle's index policy becomes infeasible in general. Hence, we introduce another scheduling policy that is more versatile and has performance comparable to that of Whittle's index policy.

The problem of scheduling to minimize AoI is studied under various system settings in [7–11]. The problem studied in this paper is different and more complicated because AoII considers the aging process of inconsistent information rather than the aging process of updates. Meanwhile, none of them consider the case where CSI is available. The problem of optimizing information freshness in the presence of CSI is studied in [12,13]. However, they focus on the system with a single user and mainly discuss the case where CSI is perfect. The scheduling problems with the goal of minimizing an error-based performance measure are considered in [14–16]. Our problem is fundamentally different because AoII also considers the time effect. Moreover, we consider the system where a base station observes multiple sources simultaneously and needs to send updates to multiple destinations.

The main contributions of this work can be summarized as follows. (1) We study the problem of minimizing AoII in a multi-user system where imperfect CSI is available; the time penalty function is generic. (2) We derive the structural properties of the optimal policy for the considered problem. (3) We establish the indexability of the considered problem under a simple condition and develop Whittle's index policy. (4) We obtain the optimal policy for a relaxed version of the original problem; by exploring the characteristics of the relaxed problem, we provide an efficient algorithm to obtain this optimal policy. (5) Based on the optimal policy for the relaxed problem, we develop the Indexed priority policy, which does not require indexability and performs comparably to Whittle's index policy.

The remainder of this paper is organized in the following way. In Section 2, we introduce the system model and formulate the primal problem. Section 3 explores the structural properties of the optimal policy for the primal problem. Under a simple condition, we develop Whittle's index policy in Section 4. Section 5 presents the optimal policy for a relaxed version of the primal problem. On this basis, we develop the Indexed priority policy in Section 6. Finally, in Section 7, the numerical results are laid out.

#### **2. System Overview**

#### *2.1. Communication Model*

We consider a slotted-time system with *N* users and one base station. Each user is composed of a source process, a channel, and a receiver. We assume all the users share the same structure, but the parameters are different. The structure of the communication model is provided in Figure 1.

**Figure 1.** The structure of the communication model.

For user $i$, the source process is modeled by a two-state Markov chain in which transitions between the two states happen with probability $p_i > 0$ and self-transitions happen with probability $1 - p_i$. At any time slot $t$, the state of the source process $X_{i,t} \in \{0, 1\}$ will be reported to the base station as an update, and the base station will decide whether to transmit this update through the corresponding channel. The channel is unreliable, but an estimate of the Channel State Information (CSI) is available at the beginning of each time slot. Let $r_{i,t} \in \{0, 1\}$ be the CSI at time $t$. We assume that $r_{i,t}$ is independent across time and user indices; $r_{i,t} = 1$ if and only if a transmission attempt at time $t$ will succeed, and $r_{i,t} = 0$ otherwise. Then, we denote by $\hat{r}_{i,t} \in \{0, 1\}$ the estimate of $r_{i,t}$. We assume that $\hat{r}_{i,t}$ is an independent Bernoulli random variable with parameter $\gamma_i$, i.e., $\hat{r}_{i,t} = 1$ with probability $\gamma_i \in [0, 1]$ and $\hat{r}_{i,t} = 0$ with probability $1 - \gamma_i$. However, the estimate is imperfect. We assume that the error depends only on the user and its estimate. More precisely, we define the probability of error as $p_{e,i}^{\hat{r}_i} \triangleq \Pr[r_i \neq \hat{r}_i \mid \hat{r}_i]$. We assume $p_{e,i}^{\hat{r}_i} < 0.5$ because we can flip the estimate if $p_{e,i}^{\hat{r}_i} > 0.5$. We are not interested in the case of $p_{e,i}^{\hat{r}_i} = 0.5$, since $\hat{r}_{i,t}$ is useless in that case. Although the channel is unreliable, each transmission attempt takes exactly one time slot regardless of the result, and a successfully transmitted update will not be corrupted. Every time an update is received, the receiver will use it as the new estimate $\hat{X}_{i,t}$.
The receiver will send an *ACK*/*NACK* packet to inform the base station of its reception of the new update. Since an *ACK*/*NACK* packet is generally very small and simple, we assume that it is transmitted reliably and received instantaneously. Then, if *ACK* is received, the base station knows that the receiver's estimate changed to the transmitted update. If *NACK* is received, the base station knows that the receiver's estimate did not change. Therefore, the base station always knows the estimate at the receiver side.
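The CSI model above can be summarized in a short sketch (illustrative only; the function and parameter names are ours, not the paper's — `pe1` and `pe0` stand for $p_e^1$ and $p_e^0$):

```python
import random

def sample_csi(gamma, pe1, pe0, rng):
    """Draw one slot's (r_hat, r): the CSI estimate and the true channel state.

    r_hat is Bernoulli(gamma); conditioned on r_hat, the true state r differs
    from r_hat with probability pe1 (if r_hat = 1) or pe0 (if r_hat = 0),
    mirroring the estimate-conditional error model of Section 2.1.
    """
    r_hat = 1 if rng.random() < gamma else 0
    pe = pe1 if r_hat == 1 else pe0
    r = r_hat if rng.random() >= pe else 1 - r_hat
    return r_hat, r
```

Over many draws, the empirical error rate conditioned on $\hat{r} = 1$ concentrates around `pe1`, matching the definition $p_e^{\hat{r}} = \Pr[r \neq \hat{r} \mid \hat{r}]$.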

At the beginning of each time slot, the base station receives updates from each source and the estimates of CSI from each channel. The old updates and estimates are discarded upon the arrival of new ones. Then, the base station decides which updates to transmit, and the decision is independent of the transmission history. Due to the limited resources, at most *M* < *N* updates are allowed per transmission attempt. We consider a base station that always transmits *M* updates.

#### *2.2. Age of Incorrect Information*

All the users adopt AoII as the performance metric, but the choices of penalty functions vary. Let $X_t$ and $\hat{X}_t$ be the true state and the estimate of the source process, respectively. Then, in a slotted-time system, AoII can be expressed as follows:

$$\Delta_{AoII}(X_t, \hat{X}_t, t) = \sum_{k=U_t+1}^{t} \Big( g(X_k, \hat{X}_k) \times F(k - U_t) \Big), \tag{1}$$

where $U_t$ is the last time instant before time $t$ (including $t$) at which the receiver's estimate was correct. $g(X_t, \hat{X}_t)$ can be any information penalty function that captures the difference between $X_t$ and $\hat{X}_t$, and $F(t) \triangleq f(t) - f(t-1)$, where $f(t)$ can be any time penalty function that is non-decreasing in $t$. We consider the case where the users adopt the same information penalty function $g(X_t, \hat{X}_t) = |X_t - \hat{X}_t|$ but possibly different time penalty functions. To ease the analysis, we require $f(t)$ to be unbounded. Combined together, we require $f(t_1) \le f(t_2)$ if $t_1 < t_2$ and $\lim_{t \to +\infty} f(t) = +\infty$. Without loss of generality, we assume $f(0) = 0$. As the source is modeled by a two-state Markov chain, $g(X_t, \hat{X}_t) \in \{0, 1\}$; in fact, $g = 1$ throughout $(U_t, t]$, so the sum in Equation (1) telescopes. Hence, Equation (1) can be simplified to

$$\Delta_{AoII}(X_t, \hat{X}_t, t) = \sum_{k=U_t+1}^{t} F(k - U_t) = f(s_t),$$

where $s_t \triangleq t - U_t$. Therefore, the evolution of $s_t$ is sufficient to characterize the evolution of AoII. To this end, we distinguish between the following cases.

• If the receiver's estimate at time $t+1$ is correct (i.e., $\hat{X}_{t+1} = X_{t+1}$), then $U_{t+1} = t+1$ and hence $s_{t+1} = 0$.

• Otherwise, $U_{t+1} = U_t$ and hence $s_{t+1} = s_t + 1$.
To sum up, we get

$$s_{t+1} = \mathbb{1}_{\{U_{t+1} \neq t+1\}} \times (s_t + 1). \tag{2}$$

A sample path of $s_t$ is shown in Figure 2. In the remainder of this paper, we use $f_i(\cdot)$ to denote the time penalty function adopted by user $i$.

**Figure 2.** A sample path of *st*.

**Remark 1.** *Under this particular choice of the penalty function, $s_t$ can be interpreted as the time elapsed since the last time the receiver's estimate was correct. Note that $s_t$ differs from the Age of Information (AoI) [17], which is defined as the time elapsed since the generation time of the last received update: AoI considers the aging process of the update, while AoII considers the aging process of the estimation error. At the same time, $s_t$ is also fundamentally different from the holding time, which, according to [18,19], is defined as the time elapsed since the last successful transmission. The receiver's estimate can become correct even when no new update is successfully transmitted, and the information carried by an update may have become incorrect by the time it is received. We also note that [18,19] consider the problem of minimizing the estimation error, whereas, by adopting AoII as the performance metric, we study the impact of the estimation error on the system over time.*
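To make the evolution in Equation (2) concrete, the following sketch simulates $s_t$ for a single user. The parameter `update_prob` is our stand-in for the combined scheduling and channel outcome, not part of the model above:

```python
import random

def simulate_aoii(T=10_000, p=0.2, update_prob=0.3, f=lambda s: s, seed=1):
    """Simulate the penalty process s_t of Equation (2) for a single user.

    The source X_t is a two-state Markov chain that flips with probability p.
    With probability update_prob (an assumed stand-in for the scheduling and
    channel outcome), the receiver obtains the update X_t in slot t; the
    estimate is then checked against X_{t+1}. Returns the time-average AoII,
    i.e., the average of f(s_t).
    """
    rng = random.Random(seed)
    X, X_hat, s = 0, 0, 0
    total = 0.0
    for _ in range(T):
        if rng.random() < update_prob:   # update delivered: estimate becomes X_t
            X_hat = X
        if rng.random() < p:             # source transition
            X ^= 1
        # Equation (2): s resets to 0 exactly when the estimate is correct
        s = 0 if X_hat == X else s + 1
        total += f(s)
    return total / T
```

As a sanity check, with `update_prob=1.0` the estimate always carries the previous state, and the time-average of $s_t$ (linear $f$) approaches $p/(1-p)$.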

#### *2.3. System Dynamic*

In this section, we characterize the system dynamics. We notice that the status of user $i$ can be captured by the pair $x_{i,t} \triangleq (s_{i,t}, \hat{r}_{i,t})$. In the following, we use $x_{i,t}$ and $(s_{i,t}, \hat{r}_{i,t})$ interchangeably. Then, the system dynamics can be fully characterized by the dynamics of $\mathbf{x}_t \triangleq (x_{1,t}, \dots, x_{N,t})$. Hence, it suffices to characterize the value of $\mathbf{x}_{t+1}$ given $\mathbf{x}_t$ and the base station's action. To this end, we denote by $\mathbf{a}_t = (a_{1,t}, \dots, a_{N,t})$ the base station's action at time $t$, where $a_{i,t} = 1$ if the base station transmits the update from user $i$ at time $t$ and $a_{i,t} = 0$ otherwise. We notice that, given action $\mathbf{a}_t$, the users are independent and the action taken on user $i$ affects only that user. Consequently,

$$\Pr(\mathbf{x}_{t+1} \mid \mathbf{x}_t, \mathbf{a}_t) = \prod_{i=1}^{N} \Pr(x_{i,t+1} \mid x_{i,t}, \mathbf{a}_t) = \prod_{i=1}^{N} \Pr(x_{i,t+1} \mid x_{i,t}, a_{i,t}).$$

Combined with the fact that all the users share the same structure, it is sufficient to study the dynamics of a single user. In the following discussion, we drop the user-dependent subscript $i$. We recall that $\hat{r}_{t+1}$ is an independent Bernoulli random variable. Then, we have

$$\Pr(x_{t+1} \mid x_t, a_t) = P(\hat{r}_{t+1}) \times \Pr(s_{t+1} \mid x_t, a_t). \tag{3}$$

By definition, $P(\hat{r}_{t+1} = 1) = \gamma$ and $P(\hat{r}_{t+1} = 0) = 1 - \gamma$. Then, we only need to tackle the value of $\Pr(s_{t+1} \mid x_t, a_t)$. To this end, we distinguish between the following cases.

• When $x_t = (0, \hat{r}_t)$, the estimate at time $t$ is correct (i.e., $\hat{X}_t = X_t$). Hence, for the receiver, $X_t$ carries no new information about the source process. In other words, $\hat{X}_{t+1} = \hat{X}_t$ regardless of whether an update is transmitted at time $t$. We recall that $U_{t+1} = t+1$ if $\hat{X}_{t+1} = X_{t+1}$ and $U_{t+1} = U_t$ otherwise. Since the source is binary, $U_{t+1} = U_t$ if $X_{t+1} \neq X_t$, which happens with probability $p$, and $U_{t+1} = t+1$ otherwise. According to (2), we obtain

$$\Pr(1 \mid (0, \hat{r}_t), a_t) = p,$$

$$\Pr(0 \mid (0, \hat{r}_t), a_t) = 1 - p.$$

• When $a_t = 0$ and $x_t = (s_t, \hat{r}_t)$ with $s_t > 0$, the channel will not be used and no new update will be received, so $\hat{X}_{t+1} = \hat{X}_t$. Since $X_t \neq \hat{X}_t$ and the source is binary, the estimate at time $t+1$ is correct (i.e., $U_{t+1} = t+1$) if and only if $X_{t+1} \neq X_t$, which happens with probability $p$. According to (2), we obtain

$$\Pr(s_t + 1 \mid (s_t, \hat{r}_t), a_t = 0) = 1 - p,$$

$$\Pr(0 \mid (s_t, \hat{r}_t), a_t = 0) = p.$$

• When $a_t = 1$ and $x_t = (s_t, 1)$ with $s_t > 0$, the transmission attempt succeeds with probability $1 - p_e^1$ and fails with probability $p_e^1$. When the transmission succeeds (i.e., $\hat{X}_{t+1} = X_t$), the estimate at time $t+1$ is correct if and only if $X_{t+1} = X_t$, which happens with probability $1 - p$. When the transmission fails (i.e., $\hat{X}_{t+1} = \hat{X}_t \neq X_t$), the estimate at time $t+1$ is correct if and only if $X_{t+1} \neq X_t$, which happens with probability $p$. Combining (2) with the dynamics of the source process, we obtain

$$\Pr(s_t + 1 \mid (s_t, 1), a_t = 1) = p_e^1 (1 - p) + (1 - p_e^1)\, p \triangleq \alpha,$$

$$\Pr(0 \mid (s_t, 1), a_t = 1) = p_e^1\, p + (1 - p_e^1)(1 - p) = 1 - \alpha.$$

• When $a_t = 1$ and $x_t = (s_t, 0)$ with $s_t > 0$, the transmission attempt succeeds with probability $p_e^0$ and fails with probability $1 - p_e^0$. Following the same line of reasoning, we obtain

$$\Pr(s_t + 1 \mid (s_t, 0), a_t = 1) = p_e^0\, p + (1 - p_e^0)(1 - p) \triangleq \beta,$$

$$\Pr(0 \mid (s_t, 0), a_t = 1) = p_e^0 (1 - p) + (1 - p_e^0)\, p = 1 - \beta.$$

Combining these cases, we obtain the value of $\Pr(s_{t+1} \mid x_t, a_t)$ in all situations. As only $M$ out of $N$ updates are allowed per transmission attempt, it is natural to require that a transmission attempt always helps minimize AoII; that is, $\Pr(s_{t+1} > s_t \mid (s_t, \hat{r}_t), a_t = 0) > \Pr(s_{t+1} > s_t \mid (s_t, \hat{r}_t), a_t = 1)$ for any $(s_t, \hat{r}_t)$. Leveraging the results above, this amounts to $\alpha < 1 - p$ and $\beta < 1 - p$; since $\alpha - (1 - p) = (1 - 2p)(p_e^1 - 1)$ and $\beta - (1 - p) = p_e^0 (2p - 1)$, it is sufficient to require $p < 0.5$. As all the users share the same structure, we assume, for the rest of this paper, that $0 < p_i < 0.5$ for $1 \le i \le N$.
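The four cases above can be collected into a single transition kernel (a sketch; the function name is ours, and `pe1`, `pe0` stand for $p_e^1$, $p_e^0$):

```python
def next_state_dist(s, r_hat, a, p, pe1, pe0):
    """One-step distribution of s_{t+1} given x_t = (s, r_hat) and action a.

    Implements the four cases of Section 2.3. Returns a dict mapping each
    reachable value of s_{t+1} to its probability.
    """
    if s == 0:
        # Case 1: estimate currently correct; an update carries no new information.
        return {1: p, 0: 1 - p}
    if a == 0:
        # Case 2: idle user keeps its (incorrect) estimate.
        return {s + 1: 1 - p, 0: p}
    if r_hat == 1:
        # Case 3: predicted success; the attempt fails with probability pe1.
        alpha = pe1 * (1 - p) + (1 - pe1) * p
        return {s + 1: alpha, 0: 1 - alpha}
    # Case 4: predicted failure; the attempt succeeds with probability pe0.
    beta = pe0 * p + (1 - pe0) * (1 - p)
    return {s + 1: beta, 0: 1 - beta}
```

For $p < 0.5$ and $p_e^{\hat{r}} < 0.5$, transmitting strictly lowers the probability that $s$ grows, which is exactly the requirement imposed above.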

#### *2.4. Problem Formulation*

The communication goal is to minimize the expected AoII. Therefore, the problem can be formulated as follows:

$$\underset{\phi \in \Phi}{\text{arg min}} \qquad \lim_{T \to \infty} \frac{1}{T} \mathbb{E}_{\phi} \left( \sum_{t=0}^{T-1} \sum_{i=1}^{N} f_i(s_{i,t}) \right) \tag{4a}$$

$$\text{subject to} \quad \sum\_{i=1}^{N} a\_{i,t} = M \quad \forall t,\tag{4b}$$

where $\Phi$ is the set of all causal policies. We refer to the constrained minimization problem in (4) as the Primal Problem (PP). We notice that PP is a Restless Multi-Armed Bandit (RMAB) problem. The optimal policy for this type of problem is generally out of reach, since the problem is PSPACE-hard [20]. However, we can still derive the structural properties of the optimal policy. These structural properties can guide the development of scheduling policies and serve as evidence of the good performance of the developed scheduling policies.
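As a concrete baseline against which such scheduling policies can be compared, the following sketch simulates problem (4) with $M = 1$ and linear penalties $f_i(s) = s$ under a naive largest-$s$-first rule. This is our own illustrative heuristic, not a policy proposed in the paper:

```python
import random

def greedy_aoii_sim(N=4, T=20_000, p=0.2, pe1=0.1, pe0=0.3, gamma=0.5, seed=0):
    """Illustrative baseline for problem (4) with M = 1 and f_i(s) = s.

    Each slot, the base station transmits the update of the user with the
    largest s_i (ties broken by index); the CSI estimate is sampled but
    ignored by the rule. Uses the transition dynamics of Section 2.3.
    Returns the time-average total AoII, i.e., the average of sum_i s_{i,t}.
    """
    rng = random.Random(seed)
    s = [0] * N
    total = 0.0
    for _ in range(T):
        j = max(range(N), key=lambda i: s[i])       # scheduled user
        for i in range(N):
            if s[i] == 0:
                grow = p                            # correct estimate (Case 1)
            elif i != j:
                grow = 1 - p                        # idle user (Case 2)
            else:
                r_hat = 1 if rng.random() < gamma else 0
                pe = pe1 if r_hat == 1 else pe0
                # scheduled user: alpha (r_hat = 1) or beta (r_hat = 0)
                grow = (pe * (1 - p) + (1 - pe) * p) if r_hat == 1 \
                    else (pe * p + (1 - pe) * (1 - p))
            s[i] = s[i] + 1 if rng.random() < grow else 0
        total += sum(s)
    return total / T
```

Such a baseline makes the structural properties derived next easy to sanity-check numerically, since any policy consistent with them should perform at least as well.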

#### **3. Structural Properties of the Optimal Policy**

In this section, we investigate the structural properties of the optimal policy for PP. We first define an infinite-horizon average-cost Markov Decision Process (MDP) $\mathcal{M}_N(w, M) = (\mathcal{X}_N, \mathcal{A}_N(M), \mathcal{P}_N, \mathcal{C}_N(w))$, where

- $\mathcal{X}_N$ denotes the state space; a state $\mathbf{x} = (x_1, \dots, x_N)$ collects the per-user pairs $x_i = (s_i, \hat{r}_i)$ defined in Section 2.3.
- $\mathcal{A}_N(M)$ denotes the action space, consisting of the actions $\mathbf{a} = (a_1, \dots, a_N) \in \{0, 1\}^N$ with $\sum_{i=1}^{N} a_i = M$.
- $\mathcal{P}_N$ denotes the transition probabilities. When action $\mathbf{a}$ is taken at state $\mathbf{x}$, the probability of transitioning to state $\mathbf{x}'$ is

$$P_{\mathbf{x},\mathbf{x}'}(\mathbf{a}) = \prod_{i=1}^{N} P(\hat{r}_i')\, P_{s_i, s_i'}(a_i, \hat{r}_i),$$

where $P_{s_i, s_i'}(a_i, \hat{r}_i)$ is the transition probability from $s_i$ to $s_i'$ when the estimate of the CSI is $\hat{r}_i$ and action $a_i$ is taken. The values of $P_{s_i, s_i'}(a_i, \hat{r}_i)$ follow directly from the results in Section 2.3.
- $\mathcal{C}_N(w)$ denotes the instant cost. When the system is at state $\mathbf{x}$ and action $\mathbf{a}$ is taken, the instant cost is $C(\mathbf{x}, \mathbf{a}) \triangleq \sum_{i=1}^{N} C(x_i, a_i) \triangleq \sum_{i=1}^{N} \big[ f_i(s_i) + w a_i \big]$.

We notice that PP can be cast as $\mathcal{M}_N(0, M)$. Since $w = 0$, the instant cost is independent of action $\mathbf{a}$; therefore, we abbreviate $C(\mathbf{x}, \mathbf{a})$ as $C(\mathbf{x})$. To simplify the analysis, we consider the case of $M = 1$. Equivalently, we investigate the structural properties of the optimal policy for $\mathcal{M}_N(0, 1)$.

**Remark 2.** *For the case of M* > 1*, we can apply the same methodology. However, as M increases, the action space will grow quickly, resulting in the need to consider more feasible actions in each step of the proof. Hence, to better demonstrate the methodology, we only consider the case of M* = 1 *in this paper.*

It is well known that the optimal policy for M*N*(0, 1) can be characterized by the value function. We denote the value function of state *x* as *V*(*x*). A canonical procedure to calculate *V*(*x*) is applying the Value Iteration Algorithm (VIA). To this end, we define *Vν*(·) as the estimated value function at iteration *ν* of VIA and initialize *V*0(·) = 0. Then, VIA updates the estimated value functions in the following way

$$V_{\nu+1}(\mathbf{x}) = C(\mathbf{x}) - \theta + \min_{\mathbf{a} \in \mathcal{A}_N(1)} \left\{ \sum_{\mathbf{x}' \in \mathcal{X}_N} P_{\mathbf{x}, \mathbf{x}'}(\mathbf{a})\, V_{\nu}(\mathbf{x}') \right\}, \tag{5}$$

where $\theta$ is the optimal value of $\mathcal{M}_N(0, 1)$. VIA is guaranteed to converge to the value function [21]; more precisely, $V_\nu(\cdot) \to V(\cdot)$ as $\nu \to +\infty$. However, the exact value function is impossible to obtain, since this would require infinitely many iterations and the state space is infinite. Instead, we provide two structural properties of the value function.
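As an illustration of VIA on a tractable instance, the sketch below runs relative value iteration for a single-user arm with the state space truncated at $s \le S$. The truncation and the single-user restriction are our simplifications (the paper's MDP is over all $N$ users jointly):

```python
def single_user_via(S=60, p=0.2, pe1=0.05, pe0=0.3, gamma=0.6, w=1.0,
                    f=lambda s: s, iters=1000):
    """Relative value iteration for a single-user arm with truncated states.

    States are (s, r_hat) with s capped at S (a finite approximation of the
    infinite state space); the per-slot cost is f(s) + w * a, matching the
    instant cost C(x_i, a_i) of Section 3. Returns the estimated average
    cost theta and the relative value function V.
    """
    states = [(s, r) for s in range(S + 1) for r in (0, 1)]
    V = {x: 0.0 for x in states}
    theta = 0.0
    for _ in range(iters):
        V_new = {}
        for (s, r_hat) in states:
            q_vals = []
            for a in (0, 1):
                if s == 0:
                    grow = p
                elif a == 0:
                    grow = 1 - p
                else:
                    pe = pe1 if r_hat == 1 else pe0
                    grow = (pe * (1 - p) + (1 - pe) * p) if r_hat == 1 \
                        else (pe * p + (1 - pe) * (1 - p))
                s_up = min(s + 1, S)                 # truncation at S
                # the next CSI estimate is an independent Bernoulli(gamma)
                ev = sum(pr * (grow * V[(s_up, r2)] + (1 - grow) * V[(0, r2)])
                         for r2, pr in ((1, gamma), (0, 1 - gamma)))
                q_vals.append(f(s) + w * a + ev)
            V_new[(s, r_hat)] = min(q_vals)
        theta = V_new[(0, 0)] - V[(0, 0)]            # average-cost estimate
        ref = V_new[(0, 0)]
        V = {x: v - ref for x, v in V_new.items()}   # keep iterates bounded
    return theta, V
```

On such an instance one can check numerically that the resulting value function is non-decreasing in $s$, consistent with Lemma 1 below.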

**Lemma 1** (Monotonicity)**.** *For* M*N*(0, 1)*, V*(*x*) *is non-decreasing in si for* 1 ≤ *i* ≤ *N.*

**Proof.** Leveraging the iterative nature of VIA, we use mathematical induction to prove the desired results. The complete proof can be found in Appendix A.

Before introducing the next structural property, we make the following definition.

**Definition 1** (Statistically identical)**.** *Two users are said to be statistically identical if their user-dependent parameters and their adopted time penalty functions are the same.*

For the users that are statistically identical, we can prove the following

**Lemma 2** (Equivalence)**.** *For* M*N*(0, 1)*, if users j and k are statistically identical, V*(*x*) = *V* (P(*x*)) *where* P(*x*) *is state x with xj and xk exchanged.*

**Proof.** Leveraging the iterative nature of VIA, we use mathematical induction to prove the desired results. At each iteration, we show that for each feasible action at state *x*, we can find an equivalent action at state P(*x*). Two actions are equivalent if they lead to the same value function. The complete proof can be found in Appendix B.

Equipped with the above lemmas, we proceed with characterizing the structural properties of the optimal policy. We recall that the optimal action at each state can be characterized by the value function. Hence, we denote, by *V<sup>j</sup>* (*x*), the value function resulting from choosing user *j* to update at state *x*. Then, *V<sup>j</sup>* (*x*) can be calculated by

$$V^{j}(\mathbf{x}) = C(\mathbf{x}) - \theta + \sum_{\mathbf{x}'_{-j}} \left\{ \left( \prod_{i \neq j} P_{x_i, x_i'}(0) \right) \sum_{\hat{r}_j'} \left[ P(\hat{r}_j') \left( \sum_{s_j'} P_{s_j, s_j'}(1, \hat{r}_j)\, V(\mathbf{x}') \right) \right] \right\}.$$

If $V^j(\mathbf{x}) < V^k(\mathbf{x})$ for all $k \neq j$, it is optimal to transmit the update from user $j$. When $V^j(\mathbf{x}) = V^k(\mathbf{x})$, the two choices are equally desirable. In the following, we characterize the properties of $\delta_{j,k}(\mathbf{x}) \triangleq V^j(\mathbf{x}) - V^k(\mathbf{x})$ for any $j$ and $k$.

**Theorem 1** (Structural properties)**.** *For $\mathcal{M}_N(0, 1)$, $\delta_{j,k}(\mathbf{x})$ has the following properties.*


**Proof.** The proof can be found in Appendix C.

We notice that $\Gamma_i^{\hat{r}_i}$ can be written as

$$\Gamma_i^{\hat{r}_i} = \frac{\Pr\left(s_i + 1 \mid (s_i, \hat{r}_i), a_i = 1\right)}{\Pr\left(s_i + 1 \mid (s_i, \hat{r}_i), a_i = 0\right)} < 1,$$

where $s_i$ can be any positive integer. Consequently, $\Gamma_i^{\hat{r}_i}$ is independent of any $s_i > 0$ and quantifies the reduction, caused by action $a_i = 1$, in the probability that $s_i$ increases. Using the results of Section 2.3, $\Gamma_i^{1} = \alpha_i / (1 - p_i)$ and $\Gamma_i^{0} = \beta_i / (1 - p_i)$, where $\alpha_i$ and $\beta_i$ are the quantities $\alpha$ and $\beta$ of Section 2.3 for user $i$; both ratios are strictly less than one precisely because $p_i < 0.5$. When $\Gamma_i^{\hat{r}_i}$ is large, action $a_i = 1$ achieves only a small decrease in the probability of increasing $s_i$. In the following, we provide an intuitive interpretation of why the monotonicity in Property 4 of Theorem 1 depends on $\Gamma_i^{\hat{r}_i}$. We take the case of $\Gamma_j^{\hat{r}_j} \le \Gamma_k^{\hat{r}_k}$ as an example and assume that there are only users $j$ and $k$ in the system. Then, according to Section 2.3, the dynamics of $s_j$ and $s_k$ can be divided into the following three cases.


We notice that $\delta_{j,k}(\mathbf{x})$ reflects the tendency of the base station to choose between the two users: the larger $\delta_{j,k}(\mathbf{x})$ is, the more the base station tends to choose user $k$. Thus, we investigate the base station's propensity to choose user $k$ when $s_k$ increases but $s_j$ stays the same. We ignore the case where the resulting $s_k$ is zero, since it is independent of the increase in $s_k$. With this in mind, we first notice that $P^k_k \le P^k_j$. Meanwhile, we can easily verify that $P_j / P_k = \Gamma_j^{\hat{r}_j} / \Gamma_k^{\hat{r}_k}$. When $\Gamma_j^{\hat{r}_j} \le \Gamma_k^{\hat{r}_k}$, we have $P_j \le P_k$. Then, there exists a subtle trade-off: choosing user $k$ results in $P^k_k \le P^k_j$, but at the cost of $P_k \ge P_j$. Hence, in this case, the propensity of the base station is hard to determine. Following the same line, we can show that choosing user $j$ leads to $P^j_j \le P^j_k$ and $P_j \le P_k$; thus, there exists no such trade-off when we investigate the base station's propensity to choose user $j$ as $s_j$ increases with $s_k$ fixed. Leveraging Theorem 1, we can provide some specific structural properties of the optimal policy.

**Corollary 1** (Application of Theorem 1)**.** *When M* = 1*, the optimal policy for PP must satisfy the following*

- *If $s_{max,1} \ge s_{max,0}$, it is optimal to choose the user with $x = (s_{max,1}, 1)$.*
- *If $s_{max,1} < s_{max,0}$, the optimal choice switches from the user with $x = (s_{max,0}, 0)$ to the user with $x = (s_{max,1}, 1)$ as $s_{max,1}$ increases from $0$ to $s_{max,0}$ with all else fixed.*

**Proof.** The first property follows directly from Properties 1 and 3 of Theorem 1. For the second property, leveraging Property 2 of Theorem 1, we have $\delta_{j,k}(\mathbf{x}_2) \le \delta_{j,k}(\mathbf{x}_1) \le 0$ if $\hat{r}_{1,j} \le \hat{r}_{2,j}$, $\hat{r}_{1,k} \ge \hat{r}_{2,k}$, and $s_{1,i} = s_{2,i}$ for $1 \le i \le N$. Thus, the optimal choice will not be user $k$ in this case. Then, we can conclude that the optimal choice must be in the set $G = \{j\} \cup \{k : \hat{r}_{1,k} < \hat{r}_{2,k}\}$.

For the third property, we have proved in Property 4 of Theorem 1 that $\delta_{j,k}(x)$ is non-increasing in $s_j$ if $\Gamma^{\hat{r}_j}_j \le \Gamma^{\hat{r}_k}_k$. Hence, $\delta_{j,k}(x_2) \le \delta_{j,k}(x_1) \le 0$. As we consider the case of $N = 2$, the optimal choice at state $x_2$ will also be user $j$. The fourth property can be shown in a similar way by noticing that $\delta_{j,k}(x)$ is non-decreasing in $s_k$ when $\Gamma^{\hat{r}_j}_j \ge \Gamma^{\hat{r}_k}_k$.

For the last property, we recall from Property 5 of Theorem 1 that it is always better to choose the user with a larger $s$ if the two users are statistically identical and have the same $\hat{r}$. Thus, we can conclude that the optimal choice must be either the user with $x = (s_{max,1}, 1)$ or the user with $x = (s_{max,0}, 0)$. Without loss of generality, we assume $x_j = (s_{max,1}, 1)$ and $x_k = (s_{max,0}, 0)$. Now, we distinguish between the following cases.


#### **4. Whittle's Index Policy**

Whittle's index policy is a well-known low-complexity heuristic that performs strongly in many problems belonging to the RMAB family [22–24]. In this section, we develop Whittle's index policy for PP. We first present the general procedure we adopt to obtain Whittle's index.


#### *4.1. Relaxed Problem*

The first step in obtaining Whittle's index is to formulate the Relaxed Problem (RP). More precisely, instead of requiring the limit on the number of updates allowed per transmission attempt to be met in each time slot, we relax the constraint such that the limit is not violated in an average sense. Then, RP can be formulated as

$$\underset{\phi\in\Phi}{\text{arg\,min}} \quad \bar{\Delta}_{\phi} \triangleq \lim_{T\to\infty}\frac{1}{T}\mathbb{E}_{\phi}\left(\sum_{t=0}^{T-1}\sum_{i=1}^{N}f_{i}(s_{i,t})\right) \tag{6a}$$

$$\text{subject to} \quad \bar{\rho}_{\phi} \triangleq \lim_{T \to \infty} \frac{1}{T} \mathbb{E}_{\phi} \left( \sum_{t=0}^{T-1} \sum_{i=1}^{N} a_{i,t} \right) \le M. \tag{6b}$$

With RP specified, we apply the Lagrangian approach. First, we write RP in its Lagrangian form:

$$\mathcal{L}(\lambda, \phi) = \lim_{T \to \infty} \frac{1}{T} \mathbb{E}_{\phi} \left( \sum_{t=0}^{T-1} \sum_{i=1}^{N} \left(f_i(s_{i,t}) + \lambda a_{i,t}\right) \right) - \lambda M,$$

where *λ* ≥ 0 is the Lagrange multiplier. Then, we investigate the problem of minimizing the Lagrangian function. Since *λM* is independent of policies, we can ignore it. More precisely, we consider the following minimization problem

$$\underset{\phi \in \Phi}{\text{minimize}} \quad \lim_{T \to \infty} \frac{1}{T} \mathbb{E}_{\phi} \left( \sum_{t=0}^{T-1} \sum_{i=1}^{N} \left(f_i(s_{i,t}) + \lambda a_{i,t}\right) \right). \tag{7}$$

#### *4.2. Decoupled Model*

In this section, we formulate the decoupled problem and investigate its optimal policy. The decoupled model associated with each user follows the system model with *N* = 1. Since all the users share the same structure, we drop the user-dependent subscript *i* for simplicity. Then, the decoupled problem can be formulated as

$$\underset{\phi \in \Phi}{\text{minimize}} \quad \lim_{T \to \infty} \frac{1}{T} \mathbb{E}_{\phi} \left( \sum_{t=0}^{T-1} \left( f(s_t) + \lambda a_t \right) \right), \tag{8}$$

where $\Phi$ is the set of all causal policies when $N = 1$. We notice that problem (8) can be cast as the MDP $\mathcal{M}_1(\lambda, -1)$, where we write $M = -1$ to indicate that there is no restriction on the number of updates allowed per transmission attempt.

We first investigate the structural properties of the optimal policy for $\mathcal{M}_1(\lambda, -1)$ when $\lambda$ is a given non-negative constant. We start by characterizing the corresponding value function $V(x)$.

**Corollary 2** (Extension of Lemma 1)**.** *For $\mathcal{M}_1(\lambda, -1)$, $V(x)$ is non-decreasing in $s$.*

**Proof.** The proof follows the same steps as in the proof of Lemma 1. The complete proof can be found in Appendix D.

Equipped with the above corollary, we can characterize the structural properties of the optimal policy for (8).

**Proposition 1** (Optimal policy for decoupled problem)**.** *The optimal policy for the decoupled problem is a threshold policy with the following properties.*


**Proof.** We define $\Delta V(x) \triangleq V_1(x) - V_0(x)$, where $V_a(x)$ is the value function resulting from taking action $a$ at state $x$. Then, the optimal action at state $x$ is $a = 1$ if $\Delta V(x) < 0$, and $a = 0$ is optimal otherwise. We use Corollary 2 to characterize the sign of $\Delta V(x)$. The complete proof can be found in Appendix E.

In the following, we evaluate the performance of the threshold policy detailed in Proposition 1. More precisely, we calculate the expected AoII $\bar{\Delta}_{\mathfrak{n}}$ and the expected transmission rate $\bar{\rho}_{\mathfrak{n}}$ resulting from the adoption of threshold policy $\mathfrak{n}$. We will see in the following that $\bar{\Delta}_{\mathfrak{n}}$ and $\bar{\rho}_{\mathfrak{n}}$ are essential for establishing indexability and obtaining the expression of Whittle's index.

**Proposition 2** (Performance)**.** *Under threshold policy $\mathfrak{n} = (n_0, n_1)$,*

$$\bar{\Delta}_{\mathfrak{n}} = \pi_0 p \left[ \sum_{k=1}^{n_1 - 1} f(k)(1 - p)^{k - 1} + (1 - p)^{n_1 - 1} \left( \sum_{k = n_1}^{n_0 - 1} f(k)c_1^{k - n_1} + c_1^{n_0 - n_1} \sum_{k = n_0}^{+\infty} f(k)c_2^{k - n_0} \right) \right],$$

$$\bar{\rho}\_{\mathfrak{n}} = \pi\_0 p (1 - p)^{n\_1 - 1} \left[ \frac{\gamma}{1 - c\_1} + c\_1^{n\_0 - n\_1} \left( \frac{1}{1 - c\_2} - \frac{\gamma}{1 - c\_1} \right) \right],$$

*where*

$$\pi_0 = \frac{1}{2 + p(1 - p)^{n_1 - 1} \left[ \frac{1}{1 - c_1} - \frac{1}{p} + c_1^{n_0 - n_1} \left( \frac{1}{1 - c_2} - \frac{1}{1 - c_1} \right) \right]},$$
$c_1 = (1 - \gamma)(1 - p) + \gamma\alpha$, *and* $c_2 = (1 - \gamma)\beta + \gamma\alpha$.

**Proof.** We notice that the dynamic of AoII under the threshold policy can be fully captured by a Discrete-Time Markov Chain (DTMC). Then, combined with the fact that *r*ˆ is an independent Bernoulli random variable, we can obtain the desired results from the stationary distribution of the induced DTMC. The complete proof can be found in Appendix F.
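To make Proposition 2 concrete, the following sketch evaluates $\pi_0$, $\bar{\Delta}_{\mathfrak{n}}$, and $\bar{\rho}_{\mathfrak{n}}$ numerically. All parameter values, the cost $f(k)=k$, and the truncation level `K` of the infinite sum are illustrative placeholders, not values from the paper:

```python
# Numerical sketch of Proposition 2: evaluate pi_0, the expected AoII, and the
# expected transmission rate under threshold policy n = (n0, n1).
# All parameter values below are illustrative placeholders.

def performance(p, gamma, alpha, beta, f, n0, n1, K=2000):
    """Return (aoii, rate, pi0) per Proposition 2; infinite sums truncated at K."""
    c1 = (1 - gamma) * (1 - p) + gamma * alpha
    c2 = (1 - gamma) * beta + gamma * alpha
    # Stationary probability of s = 0.
    bracket = (1 / (1 - c1) - 1 / p
               + c1 ** (n0 - n1) * (1 / (1 - c2) - 1 / (1 - c1)))
    pi0 = 1 / (2 + p * (1 - p) ** (n1 - 1) * bracket)
    # Expected AoII (the three sums of Proposition 2, truncated at K).
    aoii = pi0 * p * (
        sum(f(k) * (1 - p) ** (k - 1) for k in range(1, n1))
        + (1 - p) ** (n1 - 1) * (
            sum(f(k) * c1 ** (k - n1) for k in range(n1, n0))
            + c1 ** (n0 - n1) * sum(f(k) * c2 ** (k - n0) for k in range(n0, K))))
    # Expected transmission rate.
    rate = pi0 * p * (1 - p) ** (n1 - 1) * (
        gamma / (1 - c1)
        + c1 ** (n0 - n1) * (1 / (1 - c2) - gamma / (1 - c1)))
    return aoii, rate, pi0

aoii, rate, pi0 = performance(p=0.3, gamma=0.8, alpha=0.1, beta=0.2,
                              f=lambda k: k, n0=4, n1=2)
```

Since $c_2 < 1$, the truncated tail is geometrically small, so a moderate `K` already gives an accurate value.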

As $f(\cdot)$ can be any non-decreasing function, $\bar{\Delta}$ can grow indefinitely. Thus, it is necessary to require that there exists at least one threshold policy that yields a finite $\bar{\Delta}$. By noting that $1 - p \ge c_1 \ge c_2$, we have

$$\begin{aligned} \bar{\Delta} &\geq \pi_0 p \left[ \sum_{k=1}^{n_1 - 1} f(k) c_2^{k-1} + c_2^{n_1 - 1} \left( \sum_{k=n_1}^{n_0 - 1} f(k) c_2^{k - n_1} + c_2^{n_0 - n_1} \sum_{k=n_0}^{+\infty} f(k) c_2^{k - n_0} \right) \right] \\ &= \pi_0 p \left( \sum_{k=1}^{+\infty} f(k) c_2^{k - 1} \right). \end{aligned}$$

The equality is achieved when $n_0 = n_1 = 1$. Then, we can conclude that it is sufficient to require $\sum_{k=1}^{+\infty} f(k) c_2^{k-1} < +\infty$. This will be the underlying assumption throughout the rest of this paper.
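With $f(k) = k$, for example, the sum has the closed form $\sum_{k \ge 1} k c_2^{k-1} = 1/(1-c_2)^2$, so the assumption holds whenever $c_2 < 1$. A quick numerical check (the value of $c_2$ is an arbitrary example):

```python
# Sanity check of the finiteness assumption for f(k) = k: the truncated sum
# should match the geometric-series closed form 1 / (1 - c2)^2.
# The c2 value is an arbitrary placeholder.
c2 = 0.12
partial = sum(k * c2 ** (k - 1) for k in range(1, 500))
closed_form = 1 / (1 - c2) ** 2
```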

#### *4.3. Indexability*

In this section, we establish the indexability of the decoupled problem, which ensures the existence of Whittle's index. We start with the definition of indexability.

**Definition 2** (Indexability)**.** *The decoupled problem is indexable if the set of states in which a* = 0 *is the optimal action increases with λ, that is,*

$$
\lambda' < \lambda \implies D(\lambda') \subseteq D(\lambda),
$$

*where D*(*λ*) *is the set of states in which a* = 0 *is optimal when Lagrange multiplier λ is adopted.*

The Lagrange multiplier *λ* can be viewed as a cost associated with each transmission attempt. Intuitively, as *λ* increases, the base station should stay idle (i.e., *a* = 0) for a longer time until *s* becomes large enough to offset the cost. Although it is intuitively correct that the decoupled problem is indexable, the indexability is hard to establish as the optimal policy is characterized by two thresholds. Thus, Whittle's index does not necessarily exist. However, the indexability can be established when the following condition is satisfied

$$p_{e,i}^0 = 0 \quad \text{for } 1 \le i \le N. \tag{9}$$

**Remark 3.** *Condition* (9) *only requires the estimate $\hat{r}_i$ to be perfect when $\hat{r}_i = 0$. In the case of $\hat{r}_i = 1$, we still allow the estimate to be inaccurate.*

When (9) is satisfied, Propositions 1 and 2 reduce to the following

**Corollary 3** (Consequences of (9))**.** *When* (9) *is satisfied, the optimal policy for the decoupled problem* (8) *is the threshold policy $\mathfrak{n} = (+\infty, n)$. The corresponding $\bar{\Delta}_{\mathfrak{n}}$ and $\bar{\rho}_{\mathfrak{n}}$ are*

$$
\bar{\Delta}\_{\mathfrak{n}} = \pi\_0 p \left( \sum\_{k=1}^{n-1} f(k)(1-p)^{k-1} + (1-p)^{n-1} \sum\_{k=n}^{+\infty} f(k) c\_1^{k-n} \right),
$$

$$
\bar{\rho}_{\mathfrak{n}} = \pi_0 p (1-p)^{n-1} \left( \frac{\gamma}{1-c_1} \right),
$$

*where*

$$
\pi_0 = \frac{1}{2 + p(1 - p)^{n - 1} \left(\frac{1}{1 - c_1} - \frac{1}{p}\right)}.
$$

**Proof.** We continue with the same notation as in the proofs of Propositions 1 and 2. It is sufficient to show that $n_0 = +\infty$. To this end, we consider the state $x = (s, 0)$. By following the same steps as in the proof of Proposition 1, we have

$$
\Delta V(s,0) = \lambda \ge 0.
$$

Therefore, it is optimal to stay idle (i.e., $a = 0$) at state $x = (s, 0)$ for any $s \ge 0$. Equivalently, $n_0 = +\infty$. Then, the corresponding $\bar{\Delta}_{\mathfrak{n}}$ and $\bar{\rho}_{\mathfrak{n}}$ can be calculated as a special case of Proposition 2 with $n_0 = +\infty$, $n_1 = n$, and $p^0_e = 0$.

Leveraging Corollary 3, we can establish the indexability of the decoupled problem.

**Proposition 3** (Indexability of decoupled problem)**.** *The decoupled problem is indexable when* (9) *is satisfied.*

**Proof.** According to Proposition 2.2 of [25], we only need to verify that the expected transmission rate $\bar{\rho}_{\mathfrak{n}}$ is strictly decreasing in $n$. From Corollary 3, we have

$$\bar{\rho}_{\mathfrak{n}} = \frac{\gamma \left( \frac{p}{1 - c_1} \right)}{\frac{2}{(1 - p)^{n - 1}} + \left( \frac{p}{1 - c_1} - 1 \right)}.$$

As $\frac{1}{2} < 1 - p < 1$, we can easily verify that $\bar{\rho}_{\mathfrak{n}}$ is strictly decreasing in $n$. Thus, the decoupled problem is indexable when (9) is satisfied.
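The strict monotonicity is easy to confirm numerically with the closed form from the proof (the parameter values below are illustrative placeholders):

```python
# Numerical check of Proposition 3: the closed-form transmission rate should be
# strictly decreasing in the threshold n. Parameters are illustrative placeholders.
p, gamma, alpha = 0.3, 0.8, 0.1
c1 = (1 - gamma) * (1 - p) + gamma * alpha  # c1 = 0.22 for these values

def rate(n):
    ratio = p / (1 - c1)
    return gamma * ratio / (2 / (1 - p) ** (n - 1) + (ratio - 1))

rates = [rate(n) for n in range(1, 11)]
```

For these placeholder values, `rate(1)` equals $2/9$ exactly, and the sequence decreases because the $2/(1-p)^{n-1}$ term in the denominator grows with $n$.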

#### *4.4. Whittle's Index Policy*

In this section, we proceed with finding the expression of Whittle's index and defining Whittle's index policy. First of all, we give the definition of Whittle's index.

**Definition 3** (Whittle's index)**.** *When the decoupled problem is indexable, Whittle's index at state $x$ is defined as the infimum $\lambda$ such that both actions are equally desirable at $x$; equivalently, the infimum $\lambda$ such that $V_0(x) = V_1(x)$.*

Let us denote by $W_x$ Whittle's index at state $x$. Then, the expression of Whittle's index is given by the following proposition.

**Proposition 4** (Whittle's index)**.** *When* (9) *is satisfied, Whittle's index is*

$$W_{x} = \begin{cases} 0 & \text{when } x = (0, \hat{r}) \text{ or } x = (s, 0), \\[6pt] \dfrac{(1 - c_1) \sum_{k = s + 1}^{+\infty} f(k) c_1^{k - s - 1} - \bar{\Delta}_s}{\dfrac{(1 - c_1)(1 - p) - \gamma(1 - p - \alpha)}{c_1(1 - p - \alpha)} + \bar{\rho}_s} & \text{when } x = (s, 1), \end{cases}$$

*where $s > 0$ and $c_1 = (1 - \gamma)(1 - p) + \gamma\alpha$. $\bar{\Delta}_s$ and $\bar{\rho}_s$ are the expected AoII and the expected transmission rate when threshold policy $\mathfrak{n} = (+\infty, s)$ is adopted, respectively. At the same time, $W_x$ is non-negative and non-decreasing in $s$.*

**Proof.** Whittle's indices at states $x = (0, \hat{r})$ and $x = (s, 0)$ are easily obtained from the proof of Proposition 1. For state $x = (s, 1)$, we first use backward induction to calculate the expressions of some value functions. Then, the expression of Whittle's index can be obtained from its definition. The complete proof can be found in Appendix G.
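Alternatively, the index can be approximated numerically straight from Definition 3: $W_s$ is the $\lambda$ at which thresholds $s$ and $s+1$ yield the same Lagrangian average cost $\bar{\Delta}_{\mathfrak{n}} + \lambda \bar{\rho}_{\mathfrak{n}}$, evaluated with the closed forms of Corollary 3. A sketch under condition (9); the parameters, the cost $f(k)=k$, and the truncation `K` are placeholders:

```python
# Numerical sketch: Whittle's index at (s, 1) as the indifference point between
# thresholds s and s + 1 (Definition 3), using the closed forms of Corollary 3.
# Parameters and the cost f(k) = k are illustrative placeholders.
p, gamma, alpha = 0.3, 0.8, 0.1
c1 = (1 - gamma) * (1 - p) + gamma * alpha
f = lambda k: k
K = 2000  # truncation of the infinite sum

def perf(n):
    """Expected AoII and transmission rate under threshold policy (+inf, n)."""
    pi0 = 1 / (2 + p * (1 - p) ** (n - 1) * (1 / (1 - c1) - 1 / p))
    aoii = pi0 * p * (sum(f(k) * (1 - p) ** (k - 1) for k in range(1, n))
                      + (1 - p) ** (n - 1) * sum(f(k) * c1 ** (k - n)
                                                 for k in range(n, K)))
    rate = pi0 * p * (1 - p) ** (n - 1) * gamma / (1 - c1)
    return aoii, rate

def whittle(s):
    """lambda at which thresholds s and s + 1 are equally desirable."""
    a0, r0 = perf(s)
    a1, r1 = perf(s + 1)
    return (a1 - a0) / (r0 - r1)  # r0 > r1 by Proposition 3

indices = [whittle(s) for s in range(1, 6)]
```

The division is safe because $\bar{\rho}_{\mathfrak{n}}$ is strictly decreasing in $n$ (Proposition 3), and the resulting indices are non-negative, as Proposition 4 asserts.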

**Definition 4** (Whittle's index policy)**.** *At any state $x = (x_1, x_2, \ldots, x_N)$, the base station transmits the updates from the $M$ users with the largest $W_{x_i}$. Ties are broken arbitrarily. $W_{x_i}$ is calculated using Proposition 4 with the parameters of user $i$.*

**Remark 4.** *Whittle's index policy possesses the structural properties detailed in Corollary 1.*
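Operationally, Definition 4 is a top-$M$ selection in each slot. A minimal sketch (the index values are placeholders; in practice they would come from Proposition 4):

```python
# Minimal sketch of Whittle's index policy (Definition 4): each slot, serve the
# M users with the largest current indices. The index values are placeholders.
def schedule(indices, M):
    """Return the ids of the M users with the largest indices (ties by id)."""
    order = sorted(range(len(indices)), key=lambda i: -indices[i])
    return sorted(order[:M])

chosen = schedule([0.5, 2.0, 1.0, 1.5], M=2)
```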


#### **5. Optimal Policy for Relaxed Problem**

In this section, we provide an efficient algorithm to obtain the optimal policy for RP, based on which we will develop, in the next section, another scheduling policy for PP that does not require indexability. At the same time, the performance of the optimal policy for RP provides a universal lower bound, as the following ordering holds

$$
\bar{\Delta}^{RP}_{AoII} \le \bar{\Delta}^{PP}_{AoII},
$$

where $\bar{\Delta}^{RP}_{AoII}$ and $\bar{\Delta}^{PP}_{AoII}$ are the minimal expected AoII of RP and PP, respectively.

**Remark 5.** *Note that the optimal policy for RP is not necessarily a valid policy for PP, as the transmitter may transmit more than M updates in one transmission attempt under the RP-optimal policy.*

To solve RP, we follow the discussion in Section 4.1. More precisely, we take the Lagrangian approach and consider the problem reported in (7). We will see in the following discussion that the optimal policy for RP can be characterized by the optimal policies for problem (7). Therefore, we first cast problem (7) as the MDP $\mathcal{M}_N(\lambda, -1)$. However, the optimal policy for $\mathcal{M}_N(\lambda, -1)$ is difficult to obtain because the state space is infinite. Even though we can make the state space finite by imposing an upper limit on the value of $s$, the state space and the action space grow exponentially with the number of users in the system. To overcome this difficulty, we investigate the optimal policy for $\mathcal{M}^i_1(\lambda, -1)$, where $1 \le i \le N$ and the superscript $i$ indicates that the only user in the system is user $i$. We will show later that the optimal policy for $\mathcal{M}_N(\lambda, -1)$ can be fully characterized by the optimal policies for $\mathcal{M}^i_1(\lambda, -1)$, $1 \le i \le N$.

#### *5.1. Optimal Policy for Single User*

In this section, we tackle the problem of finding the optimal policy for $\mathcal{M}^i_1(\lambda, -1)$. Since the users share the same structure, we drop the superscript $i$ for simplicity. To find the optimal policy, we first use the Approximating Sequence Method (ASM) introduced in [26] to make the state space finite. More precisely, we impose $s \le m$, where $m$ is a predetermined upper limit. The state transition probabilities $P_{s,s'}(a, \hat{r})$ are modified in the following way

$$P\_{s,s'}'(a,\hat{r}) = \begin{cases} P\_{s,s'}(a,\hat{r}) & \text{if } s' < m, \\ P\_{s,s'}(a,\hat{r}) + \sum\_{z>m} P\_{s,z}(a,\hat{r}) & \text{if } s' = m. \end{cases} \tag{10}$$

The action space and the instant cost remain unchanged. Then, we can apply Relative Value Iteration (RVI) with a suitable convergence criterion to obtain the optimal policy. We notice that $\mathcal{M}_1(\lambda, -1)$ coincides with the decoupled model studied in Section 4.2. Hence, we can exploit the threshold structure of the optimal policy to improve RVI. To this end, we classify a state as active if the optimal action at that state is $a = 1$. Then, the threshold structure detailed in Proposition 1 tells us the following: for any state $x$, if there exists an active state $x_1$ with $s_1 \le s$ and $\hat{r}_1 \le \hat{r}$, then $x$ must also be active. Hence, we can determine the optimal action at state $x$ immediately instead of comparing all feasible actions. In this way, we reduce the running time of RVI. The pseudocode for the improved RVI can be found in Algorithm A1 of Appendix M. A similar technique is also presented in [5].
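The truncate-then-iterate pipeline can be sketched on a toy single-user chain. The dynamics, costs, and all numbers below are stand-ins, not the paper's model: action 0 lets $s$ grow by one (all mass beyond $m$ is folded into state $m$, as in (10)), while action 1 resets $s$ to 0 at extra cost $\lambda$; the damping step is the standard aperiodicity transformation that makes plain RVI converge on this deterministic chain:

```python
# Toy illustration of ASM truncation (10) + Relative Value Iteration (RVI).
# The chain is a stand-in for the decoupled model, not the paper's dynamics.
m, lam = 5, 1.0
states = range(m + 1)

def step(s, a):
    """(next state, instant cost): a = 0 grows s (truncated at m), a = 1 resets."""
    return (0, s + lam) if a == 1 else (min(s + 1, m), float(s))

# Damped RVI: h <- (1 - tau) * h + tau * (T h - (T h)(ref)).
h, tau, ref = [0.0] * (m + 1), 0.5, 0
gain = 0.0
for _ in range(200):
    Th = [min(step(s, a)[1] + h[step(s, a)[0]] for a in (0, 1)) for s in states]
    gain = Th[ref] - h[ref]  # average-cost estimate
    h = [(1 - tau) * h[s] + tau * (Th[s] - Th[ref]) for s in states]
# For this toy chain, the best stationary policies cycle between small s and a
# reset, and the optimal average cost is 1.
```

The threshold shortcut from the text would go inside the inner minimization: while sweeping states in increasing $s$, once some state is found active, the comparison over actions can be skipped for all larger $s$.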

For $\mathcal{M}_1(\lambda, -1)$, when condition (9) is satisfied, Whittle's index exists and can be calculated efficiently using Proposition 4. Therefore, we can obtain the optimal policy using Whittle's index and further reduce the computational complexity. To this end, we denote by $\mathfrak{n}_\lambda$ the optimal policy for $\mathcal{M}_1(\lambda, -1)$ and present the following proposition.

**Proposition 5** (Optimal deterministic policy)**.** *When* (9) *is satisfied, the optimal policy for $\mathcal{M}_1(\lambda, -1)$ is $\mathfrak{n}_\lambda = (+\infty, n)$, where $n$ is given by*

$$n = \begin{cases} 1 & \text{if } \lambda = 0, \\ \max\{s \in \mathbb{N}_0 : W_s \le \lambda\} + 1 & \text{if } \lambda > 0. \end{cases}$$

*$W_s$ is Whittle's index at state $(s, 1)$.*

**Proof.** We first notice that $\mathcal{M}_1(\lambda, -1)$ coincides with the decoupled model studied in Section 4.2. Then, we derive the optimal action for each state with $\hat{r} = 1$ using the definition of Whittle's index and the fact that the decoupled problem is indexable when (9) is satisfied. The complete proof can be found in Appendix H.
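Proposition 5 reduces policy computation to a lookup over the (non-decreasing) index values. A minimal sketch, where the table of indices $W_s$ is a placeholder rather than values computed from Proposition 4:

```python
# Sketch of Proposition 5: map a Lagrange multiplier to the optimal threshold,
# given the Whittle indices W[s] at states (s, 1). Index values are placeholders.
def threshold(lam, W):
    """Optimal n for M_1(lam, -1): 1 if lam == 0, else max{s : W[s] <= lam} + 1."""
    if lam == 0:
        return 1
    return max(s for s, w in W.items() if w <= lam) + 1

W = {0: 0.0, 1: 1.0, 2: 2.5, 3: 4.0}  # W[0] = 0 since W_x = 0 at x = (0, r_hat)
```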

In the following, we provide a randomized policy that is also optimal for $\mathcal{M}_1(\lambda, -1)$. We will see later that the randomized policy is the key to obtaining the optimal policy for RP.

**Theorem 2** (Optimal randomized policy)**.** *There exist two deterministic policies $\mathfrak{n}_{\lambda^+}$ and $\mathfrak{n}_{\lambda^-}$, which are both optimal for $\mathcal{M}_1(\lambda, -1)$. We consider the following randomized policy $\mathfrak{n}_\lambda$: every time the system reaches state $(0, 0)$, the base station chooses between $\mathfrak{n}_{\lambda^-}$ with probability $\mu$ and $\mathfrak{n}_{\lambda^+}$ with probability $1 - \mu$. The chosen policy is followed until the next choice. Then, the randomized policy $\mathfrak{n}_\lambda$ is optimal for $\mathcal{M}_1(\lambda, -1)$ for any $\mu \in [0, 1]$.*

**Proof.** We show that our system verifies the assumptions given in [27]. Then, leveraging the characteristics of our system, we can obtain the optimal randomized policy. The complete proof can be found in Appendix I.

In practice, we approximate $\lambda^+ \approx \lambda + \xi$ and $\lambda^- \approx \lambda - \xi$, where $\xi$ is a small perturbation. Then, the deterministic policies $\mathfrak{n}_{\lambda^+}$ and $\mathfrak{n}_{\lambda^-}$ can be obtained by following the discussion at the beginning of this subsection. Note that, in most cases, $\mathfrak{n}_{\lambda^+}$ and $\mathfrak{n}_{\lambda^-}$ are the same.

#### *5.2. Optimal Policy for RP*

In this section, we characterize the optimal policy for RP. Let us denote by $V(x)$ and $V^i(x_i)$ the value functions of $\mathcal{M}_N(\lambda, -1)$ and $\mathcal{M}^i_1(\lambda, -1)$, respectively. Then, we can prove the following.

**Proposition 6** (Separability)**.** *$V(x) = \sum_{i=1}^{N} V^i(x_i)$, where $x = (x_1, \ldots, x_N)$. In other words, the policy under which each user adopts its own optimal policy is optimal for $\mathcal{M}_N(\lambda, -1)$.*

**Proof.** We show $V(x) = \sum_{i=1}^{N} V^i(x_i)$ by comparing the Bellman equations they must satisfy. The complete proof can be found in Appendix J.

We denote the optimal policy for $\mathcal{M}_N(\lambda, -1)$ by $\phi_\lambda = [\mathfrak{n}_{\lambda,1}, \ldots, \mathfrak{n}_{\lambda,N}]$, where $\mathfrak{n}_{\lambda,i}$ is the optimal policy for $\mathcal{M}^i_1(\lambda, -1)$. For simplicity, we define $\bar{\Delta}(\lambda)$ and $\bar{\rho}(\lambda)$ as the expected AoII and the expected transmission rate associated with $\phi_\lambda$, respectively. $\bar{\Delta}^i(\lambda)$ and $\bar{\rho}^i(\lambda)$ are defined analogously for user $i$ under policy $\mathfrak{n}_{\lambda,i}$. We also define $\lambda^* \triangleq \inf\{\lambda > 0 : \bar{\rho}(\lambda) \le M\}$. With Proposition 6 and the above definitions in mind, we proceed to construct the optimal policy for RP.

**Theorem 3** (Optimal policy for RP)**.** *The optimal policy for RP can be characterized by two deterministic policies $\phi_{\lambda^*_+} = [\mathfrak{n}_{\lambda^*_+,1}, \ldots, \mathfrak{n}_{\lambda^*_+,N}]$ and $\phi_{\lambda^*_-} = [\mathfrak{n}_{\lambda^*_-,1}, \ldots, \mathfrak{n}_{\lambda^*_-,N}]$, where $\mathfrak{n}_{\lambda^*_+,i}$ and $\mathfrak{n}_{\lambda^*_-,i}$ are both optimal deterministic policies for $\mathcal{M}^i_1(\lambda^*, -1)$. We mix $\phi_{\lambda^*_+}$ and $\phi_{\lambda^*_-}$ in the following way: for each user $i$, every time the user reaches state $(0, 0)$, the base station chooses between $\mathfrak{n}_{\lambda^*_-,i}$ with probability $\mu_i$ and $\mathfrak{n}_{\lambda^*_+,i}$ with probability $1 - \mu_i$. The chosen policy is followed by user $i$ until the next choice. For $1 \le i \le N$, the $\mu_i$ are chosen so as to satisfy*

$$\sum_{i=1}^{N} \bar{\rho}^i(\lambda^*) = \sum_{i=1}^{N} \left( \mu_i \bar{\rho}^i(\lambda^*_-) + (1 - \mu_i)\bar{\rho}^i(\lambda^*_+) \right) = M. \tag{11}$$

*Then, the mixed policy, denoted by $\phi_{\lambda^*}$, is optimal for RP.*

**Proof.** According to Lemma 3.10 of [27], a policy is optimal for RP if


Then, we construct such a policy using Theorem 2 and Proposition 6. The complete proof can be found in Appendix K.

Since we approximate $\lambda^*_+ \approx \lambda^* + \xi$ and $\lambda^*_- \approx \lambda^* - \xi$ in practice, $\bar{\rho}^i(\lambda^*_+) \le \bar{\rho}^i(\lambda^*_-)$ for all $i$, according to the monotonicity given by Lemma 3.4 of [27]. Combining this with the definition of $\lambda^*$, we must have $\bar{\rho}(\lambda^*_+) \le M < \bar{\rho}(\lambda^*_-)$. Therefore, we can always find $\mu_i$'s that realize (11). In this paper, we choose

$$\mu_i = \mu = \frac{M - \bar{\rho}(\lambda^*_+)}{\bar{\rho}(\lambda^*_-) - \bar{\rho}(\lambda^*_+)}, \quad \text{for } 1 \le i \le N. \tag{12}$$

Then, we describe the algorithm used to obtain the optimal policy for RP. As detailed in Theorem 3, it is essential to find $\lambda^*$. To this end, we recall that, for any user $i$ and given $\lambda$, the optimal deterministic policy $\mathfrak{n}_{\lambda,i}$ can be obtained using the results in Section 5.1, and the resulting expected transmission rate $\bar{\rho}^i(\lambda)$ is given by Proposition 2. Since $\bar{\rho}^i(\lambda)$ is non-increasing in $\lambda$ for all $i$ according to Lemma 3.4 of [27], $\bar{\rho}(\lambda) = \sum_{i=1}^{N} \bar{\rho}^i(\lambda)$ is also non-increasing in $\lambda$. Hence, we can regard $\bar{\rho}(\lambda)$ as a non-increasing function of $\lambda$. Then, according to the definition of $\lambda^*$, we can use bisection search to obtain $\lambda^*$ efficiently. The main steps can be summarized as follows.


Then, $\lambda^*_-$ and $\lambda^*_+$ can simply be taken as the boundaries of the final interval. The pseudocode for the bisection search can be found in Algorithm A2 of Appendix M. After obtaining $\lambda^*_-$ and $\lambda^*_+$, the optimal policy $\phi_{\lambda^*}$ is detailed in Theorem 3, and the mixing probabilities $\mu_i$ are given by (12).
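The bisection step can be sketched generically; the rate function below is a synthetic non-increasing stand-in for $\bar{\rho}(\lambda)$, not the paper's expression:

```python
# Generic sketch of the bisection search for lambda* = inf{lambda > 0 :
# rho(lambda) <= M}, plus the mixing probability of (12).
# rho below is a synthetic non-increasing placeholder for rho_bar(lambda).
def rho(lam):
    return 6.0 / (1.0 + lam)

def bisect_lambda(rho, M, hi=1.0, tol=1e-9):
    while rho(hi) > M:          # grow the bracket until rho(hi) <= M
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:        # shrink [lo, hi] around lambda*
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if rho(mid) > M else (lo, mid)
    return lo, hi               # approximations of lambda*_- and lambda*_+

M = 2
lam_minus, lam_plus = bisect_lambda(rho, M)
mu = (M - rho(lam_plus)) / (rho(lam_minus) - rho(lam_plus))  # Equation (12)
```

For this placeholder rate, $\lambda^*$ solves $6/(1+\lambda) = 2$, i.e., $\lambda^* = 2$, and the resulting $\mu$ lies in $[0, 1]$ because $\bar{\rho}(\lambda^*_+) \le M < \bar{\rho}(\lambda^*_-)$.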

**Remark 6.** *We recall that the optimal deterministic policy for each user is characterized by two positive thresholds (i.e., $n_0, n_1 > 0$). Consequently, under the RP-optimal policy, the base station never chooses a user at state $(0, \hat{r})$. Then, as $M$ increases, the expected transmission rate achieved by the RP-optimal policy saturates before $M$ reaches $N$. When the expected transmission rate saturates, the RP-optimal policy is $\phi^* = [\mathfrak{n}_1, \ldots, \mathfrak{n}_N]$, where $\mathfrak{n}_i = (1, 1)$ for $1 \le i \le N$. The saturation happens when $M$ is larger than or equal to the expected transmission rate achieved by $\phi^*$.*

#### **6. Indexed Priority Policy**

Although the performance of Whittle's index policy is known to be good, it requires indexability, which is usually difficult to establish. In this section, based on the primal-dual heuristic introduced in [28], we develop a policy that does not require indexability and has comparable performance to Whittle's index policy. We start by presenting the primal-dual heuristic.

#### *6.1. Primal-Dual Heuristic*

The heuristic is based on the optimal primal and dual solution pair to the linear program associated with RP. To introduce the linear program, we define $\pi^{a_i}_{x_i}(\phi) \ge 0$ as the expected time that user $i$ is at state $x_i$ and action $a_i$ is taken according to policy $\phi$. Then, for any $\phi$, $\pi^{a_i}_{x_i}(\phi)$ must satisfy the following constraints

$$
\pi^0_{x_i}(\phi) + \pi^1_{x_i}(\phi) = \sum_{x_i'} \sum_{a_i'} P_{x_i', x_i}(a_i')\, \pi^{a_i'}_{x_i'}(\phi), \quad \forall x_i, i,
$$

$$
\sum_{x_i} \sum_{a_i} \pi^{a_i}_{x_i}(\phi) = 1, \quad \forall i.
$$

The objective function of RP can be rewritten as

$$\underset{\phi \in \Phi}{\text{minimize}} \quad \sum_{i=1}^{N} \sum_{x_i, a_i} C(x_i)\, \pi^{a_i}_{x_i}(\phi),$$

where $C(x_i) = f_i(s_i)$ is the instant cost at state $x_i$. The constraint on the expected transmission rate can be rewritten as

$$\sum\_{i=1}^{N} \sum\_{x\_i} \pi\_{x\_i}^1(\phi) \le M.$$

Thus, the linear program associated with RP can be formulated as the following

$$\underset{\pi^{a_i}_{x_i}}{\text{minimize}} \quad \sum_{i=1}^{N} \sum_{x_i, a_i} C(x_i)\, \pi^{a_i}_{x_i} \tag{13a}$$

$$\text{subject to} \quad \pi^0_{x_i} + \pi^1_{x_i} - \sum_{x_i'} \sum_{a_i'} P_{x_i', x_i}(a_i')\, \pi^{a_i'}_{x_i'} = 0, \quad \forall x_i, i, \tag{13b}$$

$$\sum_{x_i} \sum_{a_i} \pi^{a_i}_{x_i} = 1, \quad \forall i, \tag{13c}$$

$$\sum_{i=1}^{N} \sum_{x_i} \pi^1_{x_i} \le M, \tag{13d}$$

$$
\pi^{a_i}_{x_i} \ge 0, \quad \forall x_i, a_i, i. \tag{13e}
$$

The corresponding dual problem is

$$\underset{\sigma, \sigma_i, \sigma_{x_i}}{\text{maximize}} \quad \sum_{i=1}^{N} \sigma_i - M\sigma \tag{14a}$$

$$\text{subject to} \quad \sigma_{x_i} + \sigma_i - \sum_{x_i'} P_{x_i, x_i'}(0)\, \sigma_{x_i'} \le C(x_i), \quad \forall x_i, i, \tag{14b}$$

$$
\sigma_{x_i} + \sigma_i - \sum_{x_i'} P_{x_i, x_i'}(1)\, \sigma_{x_i'} - \sigma \le C(x_i), \quad \forall x_i, i, \tag{14c}
$$

$$
\sigma \ge 0.\tag{14d}
$$

Let $\{\bar{\pi}^{a_i}_{x_i}\}$ and $\{\bar{\sigma}, \bar{\sigma}_i, \bar{\sigma}_{x_i}\}$ be the optimal primal and dual solution pair to the problems reported in (13) and (14). We define

$$\begin{aligned} \bar{\psi}^0_{x_i} &= \sum_{x_i'} P_{x_i, x_i'}(0)\, \bar{\sigma}_{x_i'} + C(x_i) - \bar{\sigma}_i - \bar{\sigma}_{x_i} \ge 0, \\ \bar{\psi}^1_{x_i} &= \sum_{x_i'} P_{x_i, x_i'}(1)\, \bar{\sigma}_{x_i'} + \bar{\sigma} + C(x_i) - \bar{\sigma}_i - \bar{\sigma}_{x_i} \ge 0. \end{aligned}$$

For any state $x = (x_1, \ldots, x_N)$, let $h(x) = \sum_{i=1}^{N} \mathbb{1}\{\bar{\pi}^1_{x_i} > 0\}$. Then, the heuristic operates in the following way.


However, Linear Programming (LP) is a very general technique and does not appear to take advantage of the special structure of the problem. Although there are algorithms for solving rational LPs that take time polynomial in the number of variables and constraints, they run extremely slowly in practice [29]. For our problem, we notice that the users have separate activity areas that are linked through a common resource constraint. Therefore, the primal problem can be solved using Dantzig-Wolfe decomposition. Even so, the problem is still computationally demanding when the system scales up. We recall that we solved the exact problem efficiently using MDP-specific algorithms in Section 5. That approach is more efficient for the following reasons.


In the following, we translate the results in Section 5 into the optimal primal and dual solution pair and propose the Indexed priority policy.

#### *6.2. Indexed Priority Policy*

We first define the Lagrangian function associated with (13).

$$\begin{aligned} \mathcal{L}(\pi^{a_i}_{x_i}, \sigma, \sigma_i, \sigma_{x_i}, \psi^{a_i}_{x_i}) = {}& \sum_{i=1}^{N} \sum_{x_i, a_i} C(x_i)\, \pi^{a_i}_{x_i} + \sum_{i, x_i} \sigma_{x_i} \left( \sum_{x_i'} \sum_{a_i'} P_{x_i', x_i}(a_i')\, \pi^{a_i'}_{x_i'} - \pi^0_{x_i} - \pi^1_{x_i} \right) \\ &+ \sum_{i=1}^{N} \sigma_i \left( 1 - \sum_{x_i} \sum_{a_i} \pi^{a_i}_{x_i} \right) + \sigma \left( \sum_{i=1}^{N} \sum_{x_i} \pi^1_{x_i} - M \right) - \sum_{i, x_i, a_i} \psi^{a_i}_{x_i} \pi^{a_i}_{x_i}. \end{aligned}$$

Then, the corresponding Lagrangian dual function is

$$g(\sigma, \sigma_i, \sigma_{x_i}, \psi^{a_i}_{x_i}) = \inf_{\pi^{a_i}_{x_i}} \mathcal{L}(\pi^{a_i}_{x_i}, \sigma, \sigma_i, \sigma_{x_i}, \psi^{a_i}_{x_i}).$$

Let $\pi_{x_i}$ be the expected time that user $i$ is at state $x_i$ under $\phi_{\lambda^*}$, where $\phi_{\lambda^*}$ is the optimal policy detailed in Theorem 3. Then, we define $\{\pi^{a_i}_{x_i}\}$ as follows


We also define $\sigma = \lambda^*$, $\sigma_i = \theta_i$, and $\sigma_{x_i} = V^i(x_i)$, where $\lambda^*$ is specified in Section 5.2, $\theta_i$ is the optimal value of $\mathcal{M}^i_1(\lambda^*, -1)$, and $V^i(x_i)$ is the value function associated with $\mathcal{M}^i_1(\lambda^*, -1)$. Lastly, we define $\{\psi^{a_i}_{x_i}\}$ as follows

$$\begin{aligned} \psi^0_{x_i} &= \sum_{x_i'} P_{x_i, x_i'}(0)\, \sigma_{x_i'} + C(x_i) - \sigma_i - \sigma_{x_i}, \\ \psi^1_{x_i} &= \sum_{x_i'} P_{x_i, x_i'}(1)\, \sigma_{x_i'} + \sigma + C(x_i) - \sigma_i - \sigma_{x_i}. \end{aligned}$$

Then, we can prove the following proposition.

**Proposition 7** (Optimal solution pair)**.** *$\{\pi^{a_i}_{x_i}\}$ and $\{\sigma, \sigma_i, \sigma_{x_i}, \psi^{a_i}_{x_i}\}$ are optimal primal and dual solutions to* (13) *and* (14)*, respectively.*

**Proof.** Since (13) is linear and strictly feasible, it is sufficient to show that $\{\pi^{a_i}_{x_i}\}$ and $\{\sigma, \sigma_i, \sigma_{x_i}, \psi^{a_i}_{x_i}\}$ verify the KKT conditions, which can be expressed as the following four conditions.


Apparently, the first condition is satisfied by $\{\pi^{a_i}_{x_i}\}$. For the second condition, $\sigma \ge 0$ since $\sigma = \lambda^* \ge 0$ by definition. For $\psi^{a_i}_{x_i}$, we can verify that $\psi^{a_i}_{x_i} = V^{i,a_i}(x_i) - V^i(x_i)$, where $V^{i,a_i}(x_i)$ is the value function resulting from taking action $a_i$ at state $x_i$. Then, the non-negativity is guaranteed by the Bellman equation. For the third condition, the first term is zero because we choose the $\mu_i$'s given by (12). For the second term, we recall that $\psi^{a_i}_{x_i} = V^{i,a_i}(x_i) - V^i(x_i)$. According to the definition of $\pi^{a_i}_{x_i}$, we know $V^i(x_i) = V^{i,a_i}(x_i)$ if $\pi^{a_i}_{x_i} > 0$. Combined, we can conclude that $\psi^{a_i}_{x_i} = 0$ when $\pi^{a_i}_{x_i} > 0$. Thus, the third condition is satisfied. For the last condition, setting the gradient equal to zero yields a system of linear equations. More precisely, for each $x_i$ and $1 \le i \le N$,

$$\begin{cases} \sum\_{x\_i'} P\_{x\_i, x\_i'}(0)\, \sigma\_{x\_i'} + \mathbb{C}(x\_i) = \sigma\_{x\_i} + \sigma\_i + \psi\_{x\_i}^{0}, \\ \sum\_{x\_i'} P\_{x\_i, x\_i'}(1)\, \sigma\_{x\_i'} + \sigma + \mathbb{C}(x\_i) = \sigma\_{x\_i} + \sigma\_i + \psi\_{x\_i}^{1}. \end{cases}$$

Then, $\{\sigma, \sigma\_i, \sigma\_{x\_i}, \psi\_{x\_i}^{a\_i}\}$ verifies this system of linear equations by definition. Since all four conditions are satisfied, we can conclude our proof.

According to Proposition 7, we know that $\{\pi\_{x\_i}^{a\_i}\}$ and $\{\sigma, \sigma\_i, \sigma\_{x\_i}\}$ defined above are the optimal solutions to problems (13) and (14), respectively. With the optimal solutions in hand, we can adopt the heuristic detailed in Section 6.1.

The heuristic can be expressed equivalently as an index policy. To this end, we define the index $I\_{x\_i}$ for state $x\_i$ as

$$I\_{\mathbf{x}\_i} \triangleq \psi\_{\mathbf{x}\_i}^0 - \psi\_{\mathbf{x}\_i}^1.$$

According to complementary slackness and the definitions of $\psi\_{x\_i}^{0}$ and $\psi\_{x\_i}^{1}$, $I\_{x\_i}$ can be reduced to the following.

$$I\_{x\_i} = \sum\_{x\_i'} \left( P\_{x\_i, x\_i'}(0) - P\_{x\_i, x\_i'}(1) \right) \sigma\_{x\_i'} - \sigma.$$
We can show that $I\_{x\_i}$ possesses the following properties.

**Proposition 8** (Properties of $I\_{x\_i}$)**.** *For* $1 \le i \le N$*,* $I\_{x\_i} \ge -\lambda^\ast$ *for any* $x\_i$*. The equality holds when* $\hat{r}\_i = p\_{e,i}^0 = 0$ *or* $s\_i = 0$*. At the same time,* $I\_{x\_i}$ *is non-decreasing in both* $s\_i$ *and* $\hat{r}\_i$*.*

**Proof.** We notice that $I\_{x\_i}$ can be expressed as a function of $V\_i(x\_i)$ and $\lambda^\ast$. Meanwhile, $\mathcal{M}\_1^i(\lambda^\ast, -1)$ coincides with the decoupled model studied in Section 4.2. Then, we can verify the properties of $I\_{x\_i}$ using the results in Section 4.2. The complete proof can be found in Appendix L.

In light of the heuristic detailed in Section 6.1, we can define the Indexed priority policy.

**Definition 5** (Indexed priority policy)**.** *At any state* $\mathbf{x} = (x\_1, x\_2, \dots, x\_N)$*, the base station transmits the updates from the* $M$ *users with the largest* $I\_{x\_i}$*. Ties are broken arbitrarily.*
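The selection rule in Definition 5 can be sketched in a few lines. This is a minimal sketch, assuming the per-user indices $I\_{x\_i}$ have already been computed; the function name and inputs are hypothetical, not from the paper.

```python
import numpy as np

def indexed_priority_schedule(indices, M):
    """Pick the M users with the largest index I_{x_i}.

    `indices` is a length-N sequence of per-user index values
    (assumed precomputed). Ties are broken arbitrarily; here the
    stable sort resolves them in favor of the lower user id.
    """
    order = np.argsort(-np.asarray(indices, dtype=float), kind="stable")
    return sorted(order[:M].tolist())

# Example: N = 4 users, schedule M = 2 of them.
print(indexed_priority_schedule([0.3, 1.2, 0.7, 1.2], 2))  # [1, 3]
```

Any other deterministic tie-breaking rule would also satisfy Definition 5, since ties may be broken arbitrarily.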

**Remark 7.** *The Indexed priority policy belongs to the class of priority policies introduced in [30]. These priority policies are asymptotically optimal when certain conditions are satisfied.*

**Remark 8.** *The Indexed priority policy possesses the structural properties detailed in Corollary 1.*


We notice that the $\theta\_i$'s and $C(x\_i)$'s cancel out in the definition of $I\_{x\_i}$. Therefore, $I\_{x\_i}$ can be calculated using $\lambda^\ast$ and the value function of $\mathcal{M}\_1^i(\lambda^\ast, -1)$. In practice, we can use either $\lambda\_-^\ast$ or $\lambda\_+^\ast$ to approximate $\lambda^\ast$, and the value function can be approximated by the result of the RVI detailed in Section 5.1. Since the state space is infinite, we only calculate a finite number of the $V\_i(x\_i)$, where this number depends on the truncation parameter $m$ of the ASM. Meanwhile, the probabilities $P\_{x\_i, x\_i'}(a\_i)$ in $I\_{x\_i}$ are modified according to (10).
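The RVI step used above can be sketched for a generic finite-state average-cost MDP. This is a minimal sketch under stated assumptions: the transition tensor, the costs, and the two-state example below are placeholders, not the paper's $\mathcal{M}\_1^i(\lambda^\ast, -1)$ model.

```python
import numpy as np

def rvi(P, c, eps=0.01, ref=0, max_iter=10_000):
    """Relative value iteration for a finite average-cost MDP.

    P has shape (A, S, S): P[a, s, s'] is the transition probability
    under action a. c has shape (A, S): one-step costs. Returns the
    average-cost estimate theta and the relative value function V,
    normalized so that V[ref] = 0.
    """
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        Q = c + P @ V            # Q[a, s] = c[a, s] + E[V(next state)]
        V_new = Q.min(axis=0)    # greedy minimization over actions
        V_new -= V_new[ref]      # renormalize against the reference state
        if np.abs(V_new - V).max() < eps:
            V = V_new
            break
        V = V_new
    theta = (c + P @ V).min(axis=0)[ref]  # average cost read off at ref
    return theta, V

# Toy 2-state chain: both actions mix uniformly; action 1 is dominated.
P = np.array([[[0.5, 0.5], [0.5, 0.5]],
              [[0.5, 0.5], [0.5, 0.5]]])
c = np.array([[0.0, 2.0], [5.0, 7.0]])
theta, V = rvi(P, c)
print(theta, V)  # theta ≈ 1.0, V ≈ [0.0, 2.0]
```

In the paper's setting, $S$ would be the truncated state space of size controlled by $m$ and $c$ would include the $\lambda^\ast$ transmission penalty; neither is reproduced here.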

#### **7. Numerical Results**

In this section, we provide numerical results to showcase the performance of the developed scheduling policies. To eliminate the effect of $N$, we plot the expected average AoII. In particular, we provide the expected average AoII achieved by the Indexed priority policy and Whittle's index policy when $M = 1$. The policies are calculated using the results detailed in Sections 4–6. When obtaining the Indexed priority policy, we set the tolerance in the Bisection search to $\xi = 0.005$. Meanwhile, we choose the truncation parameter in the ASM to be $m = 800$ and set the convergence criterion in the RVI to $0.01$. We notice that the calculation of Whittle's index involves an infinite sum. In practice, we approximate the result by replacing $+\infty$ with a large enough number $k\_{\max}$. Here, we choose $k\_{\max} = 800$. For both scheduling policies, the resulting expected average AoII is obtained via simulations. Each data point is the average of 15 runs, with 15,000 time slots considered in each run.
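The Bisection search with tolerance $\xi$ can be sketched as follows. This is a minimal sketch assuming a generic non-increasing constraint function `g`; the placeholder `g` below, with a root at 1.25, is illustrative only and not the paper's model.

```python
def bisect_lambda(g, lo, hi, xi=0.005):
    """Bisection search for lambda* with tolerance xi.

    g(lam) is assumed non-increasing in lam (for instance, the expected
    transmission rate of the lambda-penalized single-user problem minus
    the budget), with a sign change on [lo, hi].
    """
    while hi - lo > xi:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:   # constraint still violated: penalty too small
            lo = mid
        else:
            hi = mid
    return lo, hi        # a bracket [lambda*_-, lambda*_+] around lambda*

# Placeholder monotone g with root at 1.25.
lo, hi = bisect_lambda(lambda lam: 1.25 - lam, 0.0, 4.0)
print(lo, hi)  # brackets 1.25 within 0.005
```

The two endpoints returned play the role of $\lambda\_-^\ast$ and $\lambda\_+^\ast$, either of which can stand in for $\lambda^\ast$ once the bracket is narrower than $\xi$.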

We also compare the developed policies with the optimal policy for RP, which can be calculated by following the discussion in Section 5.2. We adopt the same choices of parameters as we used to obtain the developed policies. The corresponding performance is calculated using Proposition 2. Like before, the infinite sum is approximated by replacing +∞ with *kmax* = 800. We also provide the expected average AoII achieved by the Greedy policy to show the performance advantages of the developed policies. When the Greedy policy is adopted, the base station always chooses the user with the largest AoII. The resulting expected average AoII is obtained via the same simulations as applied to the developed policies.
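For intuition, the Greedy policy and the run-averaging procedure can be sketched as below. The AoII dynamics here are a deliberately simplified toy model (scheduling resets AoII with success probability `ps`, and AoII otherwise grows by one with probability `p`), not the paper's source/channel model; all names and parameters are hypothetical.

```python
import random

def greedy_choice(aoii):
    """Greedy policy: pick the user with the largest current AoII."""
    return max(range(len(aoii)), key=lambda i: aoii[i])

def simulate_avg_aoii(N=4, p=0.3, ps=0.9, runs=15, slots=15_000, seed=0):
    """Monte-Carlo estimate of the expected average AoII per user under
    the Greedy policy, averaged over `runs` runs of `slots` slots each,
    using toy per-user AoII dynamics."""
    rng = random.Random(seed)
    totals = []
    for _ in range(runs):
        aoii = [0] * N
        acc = 0.0
        for _ in range(slots):
            j = greedy_choice(aoii)
            for i in range(N):
                if i == j and rng.random() < ps:
                    aoii[i] = 0          # successful update: AoII resets
                elif rng.random() < p:
                    aoii[i] += 1         # mismatch persists: AoII grows
                else:
                    aoii[i] = 0          # source realigns on its own
            acc += sum(aoii) / N
        totals.append(acc / slots)
    return sum(totals) / runs

print(round(simulate_avg_aoii(runs=3, slots=2000), 3))
```

The same averaging loop, with the policy function swapped out, would produce the data points for the Indexed priority, Whittle's index, and Greedy+ policies.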

Figures 3 and 4 illustrate the performance when the source processes have different dynamics and when each user's communication goal is different, respectively. Figure 3a reports the performance when $p\_i = 0.05 + \frac{0.4(i-1)}{N-1}$ for $1 \le i \le N$. For the other parameters, the users make the same choices. More precisely, $f\_i(s) = s$, $\gamma\_i = 0.6$, and $p\_{e,i}^0 = p\_{e,i}^1 = 0.1$ for $1 \le i \le N$. Figure 4a reports the performance when $f\_i(s) = s^{0.5 + \frac{i-1}{N-1}}$ for $1 \le i \le N$. As before, the users make the same choices for the other parameters. More precisely, $p\_i = 0.3$, $\gamma\_i = 0.6$, and $p\_{e,i}^0 = p\_{e,i}^1 = 0.1$ for $1 \le i \le N$. In Figures 3b and 4b, we force $p\_{e,i}^0 = 0$ for all users to ensure the existence of Whittle's index. The other choices remain the same as in Figures 3a and 4a. According to Corollary 1, the optimal policy will never choose a user with $\hat{r} = p\_e^0 = 0$ unless it is to break a tie. Therefore, in Figures 3b and 4b, we also consider the Greedy+ policy, where the base station always chooses the user with the largest AoII among the users with $\hat{r} = 1$. The resulting expected average AoII is obtained via the same simulations as applied to the Greedy policy.

Figure 5 shows the performance in systems where the parameters for each user are generated uniformly at random within their ranges. In Figure 5a, we consider $N = 5$, $\gamma \in [0, 1]$, $p \in [0.05, 0.45]$, $p\_e^{\hat{r}} \in [0, 0.45]$, and $f(s) = s^\tau$, where $\tau \in [0.5, 1.5]$. There are a total of 300 different choices, and the results are sorted by the performance of the RP-optimal policy in ascending order. Figure 5b adopts the same system settings, except that we impose $p\_{e,i}^0 = 0$ for $1 \le i \le N$ to ensure the feasibility of Whittle's index policy. Meanwhile, we omit the Greedy policy, since the Greedy+ policy achieves better performance, as indicated by Figures 3b and 4b.

**Figure 3.** Performance when the source processes vary. We choose $p\_i = 0.05 + \frac{0.4(i-1)}{N-1}$, $f\_i(s) = s$, $\gamma\_i = 0.6$, $p\_{e,i}^0 = p\_e^0$, and $p\_{e,i}^1 = 0.1$ for $1 \le i \le N$.

**Figure 4.** Performance when the communication goals vary. We choose $f\_i(s) = s^{0.5 + \frac{i-1}{N-1}}$, $p\_i = 0.3$, $\gamma\_i = 0.6$, $p\_{e,i}^0 = p\_e^0$, and $p\_{e,i}^1 = 0.1$ for $1 \le i \le N$.

**Figure 5.** Performance in systems with random parameters when $N = 5$. The parameters for each user are chosen randomly within the following intervals: $\gamma \in [0, 1]$, $p \in [0.05, 0.45]$, $p\_e^0 \in I$, $p\_e^1 \in [0, 0.45]$, and $f(s) = s^\tau$, where $\tau \in [0.5, 1.5]$.

We can make the following observations from the figures.


#### **8. Conclusions**

In this paper, we studied the problem of minimizing the Age of Incorrect Information in a slotted-time system where a base station needs to schedule $M$ users among $N$ available users. Meanwhile, the base station has access to imperfect channel state information in each time slot. The problem is a restless multi-armed bandit problem, which is PSPACE-hard. However, by casting the problem into a Markov decision process, we obtained the structural properties of the optimal policy. Then, we introduced a relaxed version of the original problem and investigated the decoupled model. Under a simple condition, we established the indexability of the decoupled problem and obtained the expression of Whittle's index. On this basis, we developed Whittle's index policy. To remove the requirement of indexability, we developed the Indexed priority policy based on the optimal policy for the relaxed problem. The characteristics of the relaxed problem were exploited to make the calculation of its optimal policy more efficient. Finally, through numerical results, we showed that simple applications of the structural properties can improve the performance of scheduling policies. Moreover, Whittle's index policy and the Indexed priority policy achieve good and comparable performance.

**Author Contributions:** Formal analysis, Y.C.; Investigation, Y.C.; Methodology, Y.C.; Supervision, A.E.; Validation, Y.C.; Writing—original draft, Y.C.; Writing—review & editing, Y.C. and A.E. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Proof of Lemma 1**

We consider two states, $\mathbf{x}\_1$ and $\mathbf{x}\_2$, that differ only in the value of $s\_j$. Without loss of generality, we assume $s\_{1,j} < s\_{2,j}$. Then, it is sufficient to show that $V(\mathbf{x}\_1) \le V(\mathbf{x}\_2)$ for any $1 \le j \le N$. Leveraging the iterative nature of the VIA, we use mathematical induction to prove the monotonicity. First, the base case (i.e., $\nu = 0$) is true by initialization. We assume the lemma holds at iteration $\nu$ and examine whether it holds at iteration $\nu + 1$. The update step reported in problem (5) can be rewritten as follows.

$$V\_{\nu+1}(\mathbf{x}) = \min\_{\mathbf{a} \in \mathcal{A}\_N(1)} V\_{\nu+1}^{\mathbf{a}}(\mathbf{x}), \tag{A1}$$

where

$$V\_{\nu+1}^{\mathbf{a}}(\mathbf{x}) = \mathbb{C}(\mathbf{x}) - \theta + \sum\_{\mathbf{x}' - \{x\_j'\}} \left\{ \left( \prod\_{i \neq j} P\_{x\_i, x\_i'}(a\_i) \right) \sum\_{\hat{r}\_j'} P(\hat{r}\_j')\, \mathcal{U}\_{\nu}^{j}(\mathbf{x}, \mathbf{x}') \right\},$$

$$\mathcal{U}\_{\nu}^{j}(\mathbf{x}, \mathbf{x}') = \sum\_{s\_j'} P\_{s\_j, s\_j'}(a\_j, \hat{r}\_j) V\_{\nu}(\mathbf{x}').$$

To prove the desired results, we distinguish between the following cases.

• We first consider the case of $s\_{1,j} = 0 < s\_{2,j}$ and $\hat{r}\_{1,j} = \hat{r}\_{2,j} = 0$. When $a\_j = 1$, for any $\mathbf{x}' - \{s\_j'\}$, we have

$$\mathcal{U}\_{\nu}^{j}(\mathbf{x}\_1, \mathbf{x}') = p\_j V\_{\nu}(\mathbf{x}'; s\_j' = 1) + (1 - p\_j) V\_{\nu}(\mathbf{x}'; s\_j' = 0),$$

$$\mathcal{U}\_{\nu}^{j}(\mathbf{x}\_2, \mathbf{x}') = \beta\_j V\_{\nu}(\mathbf{x}'; s\_j' = s\_{2,j} + 1) + (1 - \beta\_j) V\_{\nu}(\mathbf{x}'; s\_j' = 0),$$

where $V\_{\nu}(\mathbf{x}'; s\_j' = 0)$ is the estimated value function at iteration $\nu$ of the state $\mathbf{x}'$ with $s\_j' = 0$ (at the risk of abusing notation, we use $V(\mathbf{x}; s\_j = s\_1)$ and $V(\mathbf{x}; s\_j = s\_2)$ to represent the value functions of two states that differ only in the value of $s\_j$). Then, we get

$$\mathcal{U}\_{\nu}^{j}(\mathbf{x}\_1, \mathbf{x}') - \mathcal{U}\_{\nu}^{j}(\mathbf{x}\_2, \mathbf{x}') \le (p\_j - \beta\_j) \left( V\_{\nu}(\mathbf{x}'; s\_j' = 1) - V\_{\nu}(\mathbf{x}'; s\_j' = 0) \right) \le 0.$$

The inequalities hold since $\beta\_j > p\_j$ and Lemma 1 is true at iteration $\nu$ by assumption. Therefore, we have $\mathcal{U}\_{\nu}^{j}(\mathbf{x}\_1, \mathbf{x}') \le \mathcal{U}\_{\nu}^{j}(\mathbf{x}\_2, \mathbf{x}')$ when $a\_j = 1$ for any $\mathbf{x}' - \{s\_j'\}$.

For the case of $a\_i = 1$ where $i \neq j$, we notice that $a\_j = 0$. Then, for any $\mathbf{x}' - \{s\_j'\}$, we obtain

$$\mathcal{U}\_{\nu}^{j}(\mathbf{x}\_1, \mathbf{x}') = p\_j V\_{\nu}(\mathbf{x}'; s\_j' = 1) + (1 - p\_j) V\_{\nu}(\mathbf{x}'; s\_j' = 0),$$

$$\mathcal{U}\_{\nu}^{j}(\mathbf{x}\_2, \mathbf{x}') = (1 - p\_j) V\_{\nu}(\mathbf{x}'; s\_j' = s\_{2,j} + 1) + p\_j V\_{\nu}(\mathbf{x}'; s\_j' = 0).$$

Therefore, when *ai* = 1, we have

$$\mathcal{U}\_{\nu}^{j}(\mathbf{x}\_1, \mathbf{x}') - \mathcal{U}\_{\nu}^{j}(\mathbf{x}\_2, \mathbf{x}') \le (2p\_j - 1) \left( V\_{\nu}(\mathbf{x}'; s\_j' = 1) - V\_{\nu}(\mathbf{x}'; s\_j' = 0) \right) \le 0.$$

The inequalities hold since $2p\_j - 1 < 0$ and Lemma 1 is true at iteration $\nu$ by assumption. Combining with the case of $a\_j = 1$, $\mathcal{U}\_{\nu}^{j}(\mathbf{x}\_1, \mathbf{x}') \le \mathcal{U}\_{\nu}^{j}(\mathbf{x}\_2, \mathbf{x}')$ holds for any $\mathbf{x}' - \{s\_j'\}$ under any feasible action. Since $\mathbf{x}\_1$ and $\mathbf{x}\_2$ differ only in the value of $s\_j$ and $C(\mathbf{x})$ is non-decreasing in $s\_i$ for $1 \le i \le N$, we can see that $V\_{\nu+1}^{\mathbf{a}}(\mathbf{x}\_1) \le V\_{\nu+1}^{\mathbf{a}}(\mathbf{x}\_2)$ for any feasible $\mathbf{a}$. Then, by (A1), we can conclude that the lemma holds at iteration $\nu + 1$ when $s\_{1,j} = 0 < s\_{2,j}$ and $\hat{r}\_{1,j} = \hat{r}\_{2,j} = 0$.


$$\begin{aligned} P\_{s\_{1,j}, s\_{1,j}+1}(a\_j, \hat{r}\_{1,j}) &= P\_{s\_{2,j}, s\_{2,j}+1}(a\_j, \hat{r}\_{2,j}), \\ P\_{s\_{1,j}, 0}(a\_j, \hat{r}\_{1,j}) &= P\_{s\_{2,j}, 0}(a\_j, \hat{r}\_{2,j}). \end{aligned}$$

Then, leveraging the monotonicity of *Vν*(*x*) and *C*(*x*), we can conclude with the same result.

Combining the three cases, we prove that the lemma also holds at iteration *ν* + 1 of VIA. Therefore, the lemma holds at any iteration *ν* by mathematical induction. Since the results hold for any 1 ≤ *j* ≤ *N* and VIA is guaranteed to converge to the value function when *ν* → +∞, we can conclude our proof.

#### **Appendix B. Proof of Lemma 2**

We inherit the notation from the proof of Lemma 1 and again use mathematical induction to obtain the desired results. The base case $\nu = 0$ is true by initialization. We assume the lemma holds at iteration $\nu$ and examine whether it still holds at iteration $\nu + 1$. In the case of $M = 1$, we rewrite (5) as

$$V\_{\nu+1}(\mathbf{x}) = \min\_{1 \le j \le N} V\_{\nu+1}^j(\mathbf{x}),\tag{A2}$$

where

$$V\_{\nu+1}^{j}(\mathbf{x}) = \mathbb{C}(\mathbf{x}) - \theta + \sum\_{\mathbf{x}'} \left\{ \left( \prod\_{i \neq j} P\_{x\_i, x\_i'}^{i}(0) \right) P\_{x\_j, x\_j'}^{j}(1) V\_{\nu}(\mathbf{x}') \right\},\tag{A3}$$

and $P\_{x, x'}^{i}(a\_i)$ is the probability that action $a\_i$ leads user $i$ from state $x$ to state $x'$. To get the desired results, we distinguish between the following cases.

• We first show that $V\_{\nu+1}^{j}(\mathbf{x}) = V\_{\nu+1}^{k}(\mathcal{P}(\mathbf{x}))$. According to (A3), we have

$$V\_{\nu+1}^{j}(\mathbf{x}) = \mathbb{C}(\mathbf{x}) - \theta + \sum\_{\mathbf{x}'} \left\{ \left( \prod\_{i \neq j,k} P\_{x\_i, x\_i'}^{i}(0) \right) P\_{x\_k, x\_k'}^{k}(0) P\_{x\_j, x\_j'}^{j}(1) V\_{\nu}(\mathbf{x}') \right\},$$

$$\begin{split} V\_{\nu+1}^{k}(\mathcal{P}(\mathbf{x})) = \mathbb{C}(\mathcal{P}(\mathbf{x})) - \theta + \sum\_{\mathcal{P}(\mathbf{x})'} \left( \prod\_{i \neq j,k} P\_{\mathcal{P}(\mathbf{x})\_{i}, \mathcal{P}(\mathbf{x})\_{i}'}^{i}(0) \right) P\_{\mathcal{P}(\mathbf{x})\_{k}, \mathcal{P}(\mathbf{x})\_{k}'}^{k}(1) P\_{\mathcal{P}(\mathbf{x})\_{j}, \mathcal{P}(\mathbf{x})\_{j}'}^{j}(0) V\_{\nu}(\mathcal{P}(\mathbf{x})'). \end{split}$$

It is obvious that, for any $\mathcal{P}(\mathbf{x})'$, there always exists an $\mathbf{x}''$ such that $\mathcal{P}(\mathbf{x}'') = \mathcal{P}(\mathbf{x})'$. Then, we obtain

$$\begin{split} V\_{\nu+1}^{k}(\mathcal{P}(\mathbf{x})) &= \mathbb{C}(\mathcal{P}(\mathbf{x})) - \theta + \sum\_{\mathcal{P}(\mathbf{x''})} \left( \prod\_{i \neq j,k} P\_{x\_{i}, x\_{i}''}^{i}(0) \right) P\_{x\_{j}, \mathcal{P}(\mathbf{x''})\_{k}}^{k}(1) P\_{x\_{k}, \mathcal{P}(\mathbf{x''})\_{j}}^{j}(0) V\_{\nu}(\mathcal{P}(\mathbf{x''})) \\ &= \mathbb{C}(\mathcal{P}(\mathbf{x})) - \theta + \sum\_{\mathbf{x''}} \left( \prod\_{i \neq j,k} P\_{x\_{i}, x\_{i}''}^{i}(0) \right) P\_{x\_{j}, x\_{j}''}^{k}(1) P\_{x\_{k}, x\_{k}''}^{j}(0) V\_{\nu}(\mathbf{x''}) \\ &= \mathbb{C}(\mathcal{P}(\mathbf{x})) - \theta + \sum\_{\mathbf{x}'} \left( \prod\_{i \neq j,k} P\_{x\_{i}, x\_{i}'}^{i}(0) \right) P\_{x\_{j}, x\_{j}'}^{k}(1) P\_{x\_{k}, x\_{k}'}^{j}(0) V\_{\nu}(\mathbf{x}'). \end{split}$$

The second equality follows from the definition of $\mathcal{P}(\cdot)$, the property of summation, and the assumption at iteration $\nu$. The last equality follows from variable renaming. Then, by the definition of statistically identical users, we have $P\_{x\_j, x\_j'}^{k}(1) = P\_{x\_j, x\_j'}^{j}(1)$, $P\_{x\_k, x\_k'}^{j}(0) = P\_{x\_k, x\_k'}^{k}(0)$, and $\mathbb{C}(\mathbf{x}) = \mathbb{C}(\mathcal{P}(\mathbf{x}))$. Therefore, we can conclude that $V\_{\nu+1}^{j}(\mathbf{x}) = V\_{\nu+1}^{k}(\mathcal{P}(\mathbf{x}))$.

• Along the same lines, we can easily show that $V\_{\nu+1}^{k}(\mathbf{x}) = V\_{\nu+1}^{j}(\mathcal{P}(\mathbf{x}))$ and $V\_{\nu+1}^{i}(\mathbf{x}) = V\_{\nu+1}^{i}(\mathcal{P}(\mathbf{x}))$ for $i \neq j, k$.

Combining the above cases with (A2), we prove that *Vν*+1(*x*) = *Vν*+1(P(*x*)). Then, by induction, we have *Vν*(*x*) = *Vν*(P(*x*)) at any iteration *ν*. Since VIA is guaranteed to converge to the value function when *ν* → +∞, we can conclude our proof.

#### **Appendix C. Proof of Theorem 1**

For arbitrary $j$ and $k$, we have

$$\delta^{j,k}(\mathbf{x}) = \sum\_{\mathbf{x}' - \{x\_j', x\_k'\}} \left\{ \left( \prod\_{i \neq j,k} P\_{x\_i, x\_i'}(0) \right) \sum\_{\hat{r}\_j', \hat{r}\_k'} P(\hat{r}\_j') P(\hat{r}\_k') R^{j,k}(\mathbf{x}, \mathbf{x}') \right\}, \tag{A4}$$

where

$$R^{j,k}(\mathbf{x}, \mathbf{x}') = \sum\_{s\_j', s\_k'} \left[ \left( P\_{s\_k, s\_k'}(0, \hat{r}\_k) P\_{s\_j, s\_j'}(1, \hat{r}\_j) - P\_{s\_k, s\_k'}(1, \hat{r}\_k) P\_{s\_j, s\_j'}(0, \hat{r}\_j) \right) V(\mathbf{x}') \right]. \tag{A5}$$

With this in mind, we will prove the properties one by one.

Property 1—$\delta^{j,k}(\mathbf{x}) \le 0$ if $\hat{r}\_k = p\_{e,k}^0 = 0$. The equality holds when $s\_j = 0$ or $\hat{r}\_j = p\_{e,j}^0 = 0$.

When $\hat{r}\_k = p\_{e,k}^0 = 0$, transmitting the update from user $k$ will necessarily fail. Therefore, $P\_{s\_k, s\_k'}(0, 0) = P\_{s\_k, s\_k'}(1, 0)$ for any $s\_k$ and $s\_k'$. Then, we have

$$R^{j,k}(\mathbf{x},\mathbf{x}') = \sum\_{s\_k'} P\_{s\_k s\_k'}(0,0) \sum\_{s\_j'} \left[ \left( P\_{s\_j, s\_j'}(1, \hat{r}\_j) - P\_{s\_j, s\_j'}(0, \hat{r}\_j) \right) V(\mathbf{x}') \right].$$

To identify the sign of $R^{j,k}(\mathbf{x}, \mathbf{x}')$, we distinguish between the following cases.


• When $s\_j > 0$ and $\hat{r}\_j = 1$, for any $\mathbf{x}' - \{s\_j'\}$, we have

$$R^{j,k}(\mathbf{x}, \mathbf{x}') = \sum\_{s\_k'} P\_{s\_k, s\_k'}(0, 0) (\alpha\_j + p\_j - 1) \left( V(\mathbf{x}'; s\_j' = s\_j + 1) - V(\mathbf{x}'; s\_j' = 0) \right) \le 0. \tag{A6}$$

The inequality holds because of Lemma 1 and the fact that $\alpha\_j + p\_j < 1$. We recall that $\delta^{j,k}(\mathbf{x})$ is a linear combination of the $R^{j,k}(\mathbf{x}, \mathbf{x}')$'s with non-negative coefficients. Then, we can conclude that $\delta^{j,k}(\mathbf{x}) \le 0$ in this case.

• When $s\_j > 0$ and $\hat{r}\_j = 0$, replacing the $\alpha\_j$ in (A6) with $\beta\_j$ yields the same result. In this case, the equality holds when $\beta\_j + p\_j = 1$ or, equivalently, $p\_{e,j}^0 = 0$.

Combining the cases, we prove the first property.

Property 2—$\delta^{j,k}(\mathbf{x})$ is non-increasing in $\hat{r}\_j$ and non-decreasing in $\hat{r}\_k$ when $s\_j, s\_k > 0$. At the same time, $\delta^{j,k}(\mathbf{x})$ is independent of $\hat{r}\_i$ for any $i \neq j, k$.

We first prove the monotonicity of $\delta^{j,k}(\mathbf{x})$ with respect to $\hat{r}\_j$. To this end, we define $\mathbf{x}\_1$ and $\mathbf{x}\_2$ as two states that differ only in the value of $\hat{r}\_j$. Without loss of generality, we assume $\hat{r}\_{1,j} = 1$ and $\hat{r}\_{2,j} = 0$. Then, we investigate the sign of $\delta^{j,k}(\mathbf{x}\_1) - \delta^{j,k}(\mathbf{x}\_2)$. Note that $x\_{1,i} = x\_{2,i}$ for $i \neq j$. Then, according to (A4), $\delta^{j,k}(\mathbf{x}\_1) - \delta^{j,k}(\mathbf{x}\_2)$ can be written as

$$\begin{split} \delta^{j,k}(\mathbf{x}\_1) - \delta^{j,k}(\mathbf{x}\_2) = \sum\_{\mathbf{x}' - \{x\_j', x\_k'\}} \left\{ \left( \prod\_{i \neq j,k} P\_{x\_i, x\_i'}(0) \right) \sum\_{\hat{r}\_j', \hat{r}\_k'} P(\hat{r}\_j') P(\hat{r}\_k') \left( R^{j,k}(\mathbf{x}\_1, \mathbf{x}') - R^{j,k}(\mathbf{x}\_2, \mathbf{x}') \right) \right\}. \end{split}$$

Since $x\_{1,k} = x\_{2,k}$, we have $P\_{s\_{1,k}, s\_k'}(a, \hat{r}\_{1,k}) = P\_{s\_{2,k}, s\_k'}(a, \hat{r}\_{2,k})$ for any $s\_k'$. We recall that the transition probability is independent of $\hat{r}$ when $a = 0$. Combining this with the fact that $s\_{1,j} = s\_{2,j}$, we also have $P\_{s\_{1,j}, s\_j'}(0, \hat{r}\_{1,j}) = P\_{s\_{2,j}, s\_j'}(0, \hat{r}\_{2,j})$ for any $s\_j'$. Putting these together, we obtain

$$P\_{s\_{1,k}, s\_k'}(1, \hat{r}\_{1,k}) P\_{s\_{1,j}, s\_j'}(0, \hat{r}\_{1,j}) = P\_{s\_{2,k}, s\_k'}(1, \hat{r}\_{2,k}) P\_{s\_{2,j}, s\_j'}(0, \hat{r}\_{2,j}),$$

$$P\_{s\_{1,k}, s\_k'}(0, \hat{r}\_{1,k}) = P\_{s\_{2,k}, s\_k'}(0, \hat{r}\_{2,k}).$$

Leveraging the above two equalities, we have

$$\begin{split} R^{j,k}(\mathbf{x}\_1, \mathbf{x}') - R^{j,k}(\mathbf{x}\_2, \mathbf{x}') &= \\ \sum\_{s\_j', s\_k'} \left[ P\_{s\_k, s\_k'}(0, \mathbf{\hat{r}}\_k) \left( P\_{s\_{1,j}, s\_j'}(1, \mathbf{\hat{r}}\_{1,j}) - P\_{s\_{2,j}, s\_j'}(1, \mathbf{\hat{r}}\_{2,j}) \right) V(\mathbf{x}') \right]. \end{split}$$

Consequently, we obtain

$$\begin{split} \delta^{j,k}(\mathbf{x}\_1) - \delta^{j,k}(\mathbf{x}\_2) = \sum\_{\mathbf{x}' - \{x\_j'\}} \left\{ \prod\_{i \neq j} P\_{x\_i, x\_i'}(0) \left[ \sum\_{\hat{r}\_j'} P(\hat{r}\_j') \sum\_{s\_j'} \left( P\_{s\_{1,j}, s\_j'}(1, 1) - P\_{s\_{2,j}, s\_j'}(1, 0) \right) V(\mathbf{x}') \right] \right\}. \end{split}$$

In the following, we characterize the sign of

$$R\_1 \stackrel{\triangle}{=} \sum\_{s'\_j} \left( P\_{s\_{1,j}, s'\_j}(1, 1) - P\_{s\_{2,j}, s'\_j}(1, 0) \right) V(\mathbf{x'}) .$$

Since $s\_{1,j} = s\_{2,j} > 0$, for any $\mathbf{x}' - \{s\_j'\}$, we have

$$R\_1 = \left( (1 - \alpha\_{\dot{j}}) - (1 - \beta\_{\dot{j}}) \right) V(\mathbf{x'}; s\_{\dot{j}}' = 0) + (\alpha\_{\dot{j}} - \beta\_{\dot{j}}) V(\mathbf{x'}; s\_{\dot{j}}' = s\_{1, \dot{j}} + 1) \le 0.$$

The inequality follows from Lemma 1 and the fact that $\beta\_j > \alpha\_j$. Since $\delta^{j,k}(\mathbf{x}\_1) - \delta^{j,k}(\mathbf{x}\_2)$ is a linear combination of the $R\_1$'s with non-negative coefficients, we can conclude that $\delta^{j,k}(\mathbf{x}\_1) \le \delta^{j,k}(\mathbf{x}\_2)$. Since $\hat{r}\_{1,j} > \hat{r}\_{2,j}$, we can see that $\delta^{j,k}(\mathbf{x})$ is non-increasing in $\hat{r}\_j$.

In a very similar way, we can show that $\delta^{j,k}(\mathbf{x})$ is non-decreasing in $\hat{r}\_k$. We recall that $\hat{r}\_i$ will not affect the system dynamics if $a\_i = 0$. Consequently, we can conclude that $\delta^{j,k}(\mathbf{x})$ is independent of $\hat{r}\_i$ for any $i \neq j, k$.

Combining together, we prove the second property.

Property 3—$\delta^{j,k}(\mathbf{x}) \le 0$ if $s\_k = 0$. The equality holds when $s\_j = 0$ or $\hat{r}\_j = p\_{e,j}^0 = 0$.

Since the probabilities are non-negative, it is sufficient to show that $R^{j,k}(\mathbf{x}, \mathbf{x}')$ satisfies Property 3 for any $\mathbf{x}' - \{s\_j', s\_k'\}$. More precisely, it is sufficient to show that $R^{j,k}(\mathbf{x}, \mathbf{x}') \le 0$ for any $\mathbf{x}' - \{s\_j', s\_k'\}$ when $s\_k = 0$, and that the equality holds when $s\_j = 0$ or $\hat{r}\_j = p\_{e,j}^0 = 0$. We recall that $P\_{s\_k, s\_k'}(1, \hat{r}\_k) = P\_{s\_k, s\_k'}(0, \hat{r}\_k)$ for any $s\_k'$ when $s\_k = 0$. Hence, for any $\mathbf{x}' - \{s\_j', s\_k'\}$, we have

$$R^{j,k}(\mathbf{x}, \mathbf{x}') = \sum\_{s\_k'} \left[ P\_{s\_k, s\_k'}(0, \hat{r}\_k) \sum\_{s\_j'} \left( P\_{s\_j, s\_j'}(1, \hat{r}\_j) - P\_{s\_j, s\_j'}(0, \hat{r}\_j) \right) V(\mathbf{x}') \right].$$

Then, we investigate the following quantity for any $\mathbf{x}' - \{s\_j'\}$:

$$R\_2 \triangleq \sum\_{s\_j'} \left( P\_{s\_j, s\_j'}(1, \hat{r}\_j) - P\_{s\_j, s\_j'}(0, \hat{r}\_j) \right) V(\mathbf{x}').$$

To this end, we distinguish between the following cases


• When $s\_j > 0$ and $\hat{r}\_j = 1$, we have

$$R\_2 = (\alpha\_j - 1 + p\_j) V(\mathbf{x}'; s\_j' = s\_j + 1) + (1 - \alpha\_j - p\_j) V(\mathbf{x}'; s\_j' = 0) \le 0. \tag{A7}$$

The inequality follows from Lemma 1 and the fact that $\alpha\_j + p\_j < 1$. Thus, $R^{j,k}(\mathbf{x}, \mathbf{x}') \le 0$ for any $\mathbf{x}' - \{s\_j', s\_k'\}$.

• When $s\_j > 0$ and $\hat{r}\_j = 0$, replacing the $\alpha\_j$ in (A7) with $\beta\_j$ yields the same result. In this case, the equality holds when $\beta\_j + p\_j = 1$ or, equivalently, $p\_{e,j}^0 = 0$.

Combined together, we can conclude that Property 3 is true.

Property 4—$\delta^{j,k}(\mathbf{x})$ is non-increasing in $s\_j$ if $\Gamma\_j^{\hat{r}\_j} \le \Gamma\_k^{\hat{r}\_k}$ and non-decreasing in $s\_k$ if $\Gamma\_j^{\hat{r}\_j} \ge \Gamma\_k^{\hat{r}\_k}$, when $s\_j, s\_k > 0$, where $\Gamma\_i^1 \triangleq \frac{\alpha\_i}{1 - p\_i}$ and $\Gamma\_i^0 \triangleq \frac{\beta\_i}{1 - p\_i}$ for $1 \le i \le N$.

As in the proof of Property 3, it is sufficient to show that $R^{j,k}(\mathbf{x}, \mathbf{x}')$ satisfies Property 4 for any $\mathbf{x}' - \{s\_j', s\_k'\}$. We recall that $R^{j,k}(\mathbf{x}, \mathbf{x}')$ depends on the values of $\hat{r}\_j$ and $\hat{r}\_k$. Therefore, we distinguish between the following cases.

• In the case of $\hat{r}\_j = \hat{r}\_k = 1$ and $s\_j, s\_k > 0$, for any $\mathbf{x}' - \{s\_j', s\_k'\}$, (A5) can be written as

$$\begin{split} R^{j,k}(\mathbf{x}, \mathbf{x}') &= \sum\_{s\_j', s\_k'} \left[ \left( P\_{s\_k, s\_k'}(0, 1) P\_{s\_j, s\_j'}(1, 1) - P\_{s\_k, s\_k'}(1, 1) P\_{s\_j, s\_j'}(0, 1) \right) V(\mathbf{x}') \right] \\ &= \left( p\_k \alpha\_j - (1 - p\_j)(1 - \alpha\_k) \right) V(\mathbf{x}'; s\_j' = s\_j + 1; s\_k' = 0) \\ &\quad + \left( (1 - p\_k)(1 - \alpha\_j) - p\_j \alpha\_k \right) V(\mathbf{x}'; s\_j' = 0; s\_k' = s\_k + 1) \\ &\quad + \left( (1 - p\_k)\alpha\_j - (1 - p\_j)\alpha\_k \right) V(\mathbf{x}'; s\_j' = s\_j + 1; s\_k' = s\_k + 1) \\ &\quad + \left( p\_k(1 - \alpha\_j) - p\_j(1 - \alpha\_k) \right) V(\mathbf{x}'; s\_j' = 0; s\_k' = 0). \end{split}$$

As we can verify

$$p\_k \alpha\_j - (1 - p\_j)(1 - \alpha\_k) < \frac{1}{2}(p\_k + p\_j - 1) < 0,$$

$$(1 - p\_k)(1 - \alpha\_j) - p\_j \alpha\_k > \frac{1}{2}(1 - p\_k - p\_j) > 0.$$

We define $\Gamma\_i^1 \triangleq \frac{\alpha\_i}{1 - p\_i}$ and $\Gamma\_i^0 \triangleq \frac{\beta\_i}{1 - p\_i}$ for $1 \le i \le N$. Then, we have

$$\Gamma\_j^1 \lessgtr \Gamma\_k^1 \Longrightarrow (1 - p\_k)\alpha\_j - (1 - p\_j)\alpha\_k \lessgtr 0.$$

Combining with Lemma 1, we can conclude that, for any $\mathbf{x}' - \{s\_j', s\_k'\}$, $R^{j,k}(\mathbf{x}, \mathbf{x}')$ is non-increasing in $s\_j$ if $\Gamma\_j^1 \le \Gamma\_k^1$ and non-decreasing in $s\_k$ if $\Gamma\_j^1 \ge \Gamma\_k^1$.


• In the case of $\hat{r}\_j = 1$, $\hat{r}\_k = 0$, and $s\_j, s\_k > 0$, for any $\mathbf{x}' - \{s\_j', s\_k'\}$, (A5) can be written as

$$\begin{split} R^{j,k}(\mathbf{x}, \mathbf{x}') &= \sum\_{s\_j', s\_k'} \left[ \left( P\_{s\_k, s\_k'}(0, 0) P\_{s\_j, s\_j'}(1, 1) - P\_{s\_k, s\_k'}(1, 0) P\_{s\_j, s\_j'}(0, 1) \right) V(\mathbf{x}') \right] \\ &= \left( p\_k \alpha\_j - (1 - p\_j)(1 - \beta\_k) \right) V(\mathbf{x}'; s\_j' = s\_j + 1; s\_k' = 0) \\ &\quad + \left( (1 - p\_k)(1 - \alpha\_j) - p\_j \beta\_k \right) V(\mathbf{x}'; s\_j' = 0; s\_k' = s\_k + 1) \\ &\quad + \left( (1 - p\_k)\alpha\_j - (1 - p\_j)\beta\_k \right) V(\mathbf{x}'; s\_j' = s\_j + 1; s\_k' = s\_k + 1) \\ &\quad + \left( p\_k(1 - \alpha\_j) - p\_j(1 - \beta\_k) \right) V(\mathbf{x}'; s\_j' = 0; s\_k' = 0). \end{split}$$

As we can verify,

$$p\_k \alpha\_j - (1 - p\_j)(1 - \beta\_k) < p\_k \left(p\_j - \frac{1}{2}\right) < 0,$$

$$(1 - p\_k)(1 - \alpha\_j) - p\_j \beta\_k > (1 - p\_k)\left(\frac{1}{2} - p\_j\right) > 0.$$

At the same time,

$$
\Gamma^1\_j \lessgtr \Gamma^0\_k \Longrightarrow (1 - p\_k)\alpha\_j - (1 - p\_j)\beta\_k \lessgtr 0.
$$

Combined with Lemma 1, we can conclude that, for any **x** − {*s<sub>j</sub>*, *s<sub>k</sub>*}, *R<sup>j,k</sup>*(**x**, **x**′) is non-increasing in *s<sub>j</sub>* if Γ<sup>1</sup><sub>*j*</sub> ≤ Γ<sup>0</sup><sub>*k*</sub> and is non-decreasing in *s<sub>k</sub>* if Γ<sup>1</sup><sub>*j*</sub> ≥ Γ<sup>0</sup><sub>*k*</sub>.

• In the case of *r̂<sub>j</sub>* = 0, *r̂<sub>k</sub>* = 1, and *s<sub>j</sub>*, *s<sub>k</sub>* > 0, by swapping the *α*'s and *β*'s in the above case, we arrive at the same result.

Combined together, we conclude that *R<sup>j,k</sup>*(**x**, **x**′) satisfies Property 3 for any **x** − {*s<sub>j</sub>*, *s<sub>k</sub>*}. Consequently, *δ<sup>j,k</sup>*(**x**) is non-increasing in *s<sub>j</sub>* if Γ<sup>*r̂<sub>j</sub>*</sup><sub>*j*</sub> ≤ Γ<sup>*r̂<sub>k</sub>*</sup><sub>*k*</sub> and is non-decreasing in *s<sub>k</sub>* if Γ<sup>*r̂<sub>j</sub>*</sup><sub>*j*</sub> ≥ Γ<sup>*r̂<sub>k</sub>*</sup><sub>*k*</sub> when *s<sub>j</sub>*, *s<sub>k</sub>* > 0.

Property 5—*δ<sup>j,k</sup>*(**x**) ≤ 0 if *s<sub>j</sub>* ≥ *s<sub>k</sub>*, *r̂<sub>j</sub>* ≥ *r̂<sub>k</sub>*, and users *j* and *k* are statistically identical.

According to Property 3, it is sufficient to consider the case where *s<sub>j</sub>*, *s<sub>k</sub>* > 0. We notice that the sign of *δ<sup>j,k</sup>*(**x**) can be captured by the sign of the quantity *Q<sup>j,k</sup>*(**x**, **x**′) ≜ ∑<sub>*r̂*′<sub>*j*</sub>, *r̂*′<sub>*k*</sub></sub> *P*(*r̂*′<sub>*j*</sub>)*P*(*r̂*′<sub>*k*</sub>)*R<sup>j,k</sup>*(**x**, **x**′). Thus, we divide our discussion into the following cases.

• We first consider the case of *s<sub>j</sub>* ≥ *s<sub>k</sub>* > 0 and *r̂<sub>j</sub>* = *r̂<sub>k</sub>* = 0. Leveraging the definition of statistically identical, for any **x** − {**x**<sub>*j*</sub>, **x**<sub>*k*</sub>}, we have

$$Q^{j,k}(\mathbf{x}, \mathbf{x}') = \sum\_{\hat{r}'\_j, \hat{r}'\_k} P(\hat{r}'\_j) P(\hat{r}'\_k) \kappa\_1 \Big( V(\mathbf{x}'; \mathbf{x}'\_j = (0, \hat{r}'\_j); \mathbf{x}'\_k = (s\_k + 1, \hat{r}'\_k)) - V(\mathbf{x}'; \mathbf{x}'\_j = (s\_j + 1, \hat{r}'\_j); \mathbf{x}'\_k = (0, \hat{r}'\_k)) \Big),$$

where *κ*<sub>1</sub> = 1 − *p<sub>j</sub>* − *β<sub>j</sub>* ≥ 0. Then, by substituting the values of *P*(*r̂*′) and using Lemma 2, we obtain

$$\begin{split} Q^{j,k}(\mathbf{x},\mathbf{x}') &= \gamma\_j \gamma\_k \kappa\_1 V(\mathbf{x}'; \mathbf{x}'\_j = (s\_k+1,1); \mathbf{x}'\_k = (0,1)) - \\ &\gamma\_j \gamma\_k \kappa\_1 V(\mathbf{x}'; \mathbf{x}'\_j = (s\_j+1,1); \mathbf{x}'\_k = (0,1)) + \\ &(1-\gamma\_j)(1-\gamma\_k)\kappa\_1 V(\mathbf{x}'; \mathbf{x}'\_j = (s\_k+1,0); \mathbf{x}'\_k = (0,0)) - \\ &(1-\gamma\_j)(1-\gamma\_k)\kappa\_1 V(\mathbf{x}'; \mathbf{x}'\_j = (s\_j+1,0); \mathbf{x}'\_k = (0,0)) + \\ &\gamma\_k(1-\gamma\_j)\kappa\_1 V(\mathbf{x}'; \mathbf{x}'\_j = (s\_k+1,1); \mathbf{x}'\_k = (0,0)) - \\ &\gamma\_k(1-\gamma\_j)\kappa\_1 V(\mathbf{x}'; \mathbf{x}'\_j = (s\_j+1,0); \mathbf{x}'\_k = (0,1)) + \\ &\gamma\_j(1-\gamma\_k)\kappa\_1 V(\mathbf{x}'; \mathbf{x}'\_j = (s\_k+1,0); \mathbf{x}'\_k = (0,1)) - \\ &\gamma\_j(1-\gamma\_k)\kappa\_1 V(\mathbf{x}'; \mathbf{x}'\_j = (s\_j+1,1); \mathbf{x}'\_k = (0,0)). \end{split}$$

Since users *j* and *k* are statistically identical, we have *γ<sub>j</sub>* = *γ<sub>k</sub>*. Then, by Lemma 1, we have *Q<sup>j,k</sup>*(**x**, **x**′) ≤ 0 for any **x** − {**x**<sub>*j*</sub>, **x**<sub>*k*</sub>}. Since *δ<sup>j,k</sup>*(**x**) is a linear combination of *Q<sup>j,k</sup>*(**x**, **x**′)'s with non-negative coefficients, we can conclude that *δ<sup>j,k</sup>*(**x**) ≤ 0.


$$\begin{split} R^{j,k}(\mathbf{x},\mathbf{x}') &= \left( p\_k \alpha\_j - (1 - p\_j)(1 - \beta\_k) \right) V(\mathbf{x}'; s\_j' = s\_j + 1; s\_k' = 0) + \\ &\quad \left( (1 - p\_k)(1 - \alpha\_j) - p\_j \beta\_k \right) V(\mathbf{x}'; s\_j' = 0; s\_k' = s\_k + 1) + \\ &\quad \left( (1 - p\_k)\alpha\_j - (1 - p\_j)\beta\_k \right) V(\mathbf{x}'; s\_j' = s\_j + 1; s\_k' = s\_k + 1) + \\ &\quad \left( p\_k(1 - \alpha\_j) - p\_j(1 - \beta\_k) \right) V(\mathbf{x}'; s\_j' = 0; s\_k' = 0). \end{split}$$

As users *j* and *k* are statistically identical, we have *p<sub>j</sub>* = *p<sub>k</sub>* and *α<sub>j</sub>* < *β<sub>k</sub>*. Leveraging Lemma 1, we have

$$R^{j,k}(\mathbf{x}, \mathbf{x}') \le (\alpha\_j + p\_j - 1) \Big( V(\mathbf{x}'; s\_j' = s\_j + 1; s\_k' = 0) - V(\mathbf{x}'; s\_j' = 0; s\_k' = s\_k + 1) \Big).$$

Then, for any **x** − {**x**<sub>*j*</sub>, **x**<sub>*k*</sub>},

$$Q^{j,k}(\mathbf{x}, \mathbf{x}') \le \sum\_{\hat{r}'\_j, \hat{r}'\_k} P(\hat{r}'\_j) P(\hat{r}'\_k) \kappa\_2 \Big( V(\mathbf{x}'; \mathbf{x}'\_j = (0, \hat{r}'\_j); \mathbf{x}'\_k = (s\_k + 1, \hat{r}'\_k)) - V(\mathbf{x}'; \mathbf{x}'\_j = (s\_j + 1, \hat{r}'\_j); \mathbf{x}'\_k = (0, \hat{r}'\_k)) \Big),$$

where *κ*<sub>2</sub> = 1 − *p<sub>j</sub>* − *α<sub>j</sub>* > 0. As in the previous cases, we can leverage Lemmas 1 and 2 to conclude that *Q<sup>j,k</sup>*(**x**, **x**′) ≤ 0 for any **x** − {**x**<sub>*j*</sub>, **x**<sub>*k*</sub>}. Consequently, *δ<sup>j,k</sup>*(**x**) ≤ 0 in this case. The details are omitted for the sake of space.

Combined together, we conclude the proof of Property 5.

#### **Appendix D. Proof of Corollary 2**

We follow the same steps as in the proof of Lemma 1. To prove the corollary, it is sufficient to show that *V*(**x**<sub>1</sub>) ≤ *V*(**x**<sub>2</sub>) when *s*<sub>1</sub> < *s*<sub>2</sub> and *r̂*<sub>1</sub> = *r̂*<sub>2</sub>. We use mathematical induction to prove the monotonicity. First, the base case (i.e., *ν* = 0) holds by initialization. We assume the corollary holds at iteration *ν* and examine whether it holds at iteration *ν* + 1. For the system with a single user, the update step reported in problem (5) can be simplified and rewritten as follows

$$V\_{\nu+1}(\mathbf{x}) = \min\_{a \in \{0,1\}} V\_{\nu+1}^a(\mathbf{x}),\tag{A8}$$

where

$$V\_{\nu+1}^{a}(\mathbf{x}) = \mathbb{C}(\mathbf{x}, a) - \theta + \sum\_{\hat{r}'} P(\hat{r}') \sum\_{s'} P\_{s, s'}(a, \hat{r}) V\_{\nu}(\mathbf{x}'),$$

and *θ* is the optimal value for M1(*λ*, −1). To prove the desired results, we distinguish between the following cases

• We first consider the case of *s*<sup>1</sup> = 0 < *s*<sup>2</sup> and *r*ˆ1 = *r*ˆ2 = 0. When *a* = 1, we have

$$\begin{aligned} V\_{\nu+1}^1(\mathbf{x}\_1) &= \mathbb{C}(\mathbf{x}\_1, 1) - \theta + \sum\_{\hat{r}'} P(\hat{r}') \left( p V\_{\nu}(1, \hat{r}') + (1 - p) V\_{\nu}(0, \hat{r}') \right), \\ V\_{\nu+1}^1(\mathbf{x}\_2) &= \mathbb{C}(\mathbf{x}\_2, 1) - \theta + \sum\_{\hat{r}'} P(\hat{r}') \left( \beta V\_{\nu}(s\_2 + 1, \hat{r}') + (1 - \beta) V\_{\nu}(0, \hat{r}') \right). \end{aligned}$$

Subtracting the two expressions yields

$$\begin{aligned} V\_{\nu+1}^{1}(\mathbf{x}\_{1}) - V\_{\nu+1}^{1}(\mathbf{x}\_{2}) \le & \ \mathbb{C}(\mathbf{x}\_{1}, 1) - \mathbb{C}(\mathbf{x}\_{2}, 1) + \sum\_{\hat{r}'} P(\hat{r}') \left[ (p - \beta) \left( V\_{\nu}(1, \hat{r}') - V\_{\nu}(0, \hat{r}') \right) \right] \le 0. \end{aligned}$$

The inequalities hold since *β* > *p*, *C*(*x*, *a*) is non-decreasing in *s*, and Corollary 2 is true at iteration *ν* by assumption. For the case of *a* = 0, we obtain

$$\begin{aligned} V\_{\nu+1}^{0}(\mathbf{x}\_{1}) &= \mathbb{C}(\mathbf{x}\_{1}, 0) - \theta + \sum\_{\hat{r}'} P(\hat{r}') \left( p V\_{\nu}(1, \hat{r}') + (1 - p) V\_{\nu}(0, \hat{r}') \right), \\ V\_{\nu+1}^{0}(\mathbf{x}\_{2}) &= \mathbb{C}(\mathbf{x}\_{2}, 0) - \theta + \sum\_{\hat{r}'} P(\hat{r}') \left( (1 - p) V\_{\nu}(s\_{2} + 1, \hat{r}') + p V\_{\nu}(0, \hat{r}') \right). \end{aligned}$$

Therefore, when *a* = 0, we have

$$\begin{aligned} V^{0}\_{\nu+1}(\mathbf{x}\_{1}) - V^{0}\_{\nu+1}(\mathbf{x}\_{2}) \le & \ \mathbb{C}(\mathbf{x}\_{1}, 0) - \mathbb{C}(\mathbf{x}\_{2}, 0) + \sum\_{\hat{r}'} P(\hat{r}') \Big[ (2p - 1) \left( V\_{\nu}(1, \hat{r}') - V\_{\nu}(0, \hat{r}') \right) \Big] \le 0. \end{aligned}$$

The inequalities hold since 2*p* − 1 < 0, *C*(**x**, *a*) is non-decreasing in *s*, and Corollary 2 is true at iteration *ν* by assumption. Combined together, we can see that *V<sup>a</sup>*<sub>*ν*+1</sub>(**x**<sub>1</sub>) ≤ *V<sup>a</sup>*<sub>*ν*+1</sub>(**x**<sub>2</sub>) for any feasible *a*. Then, by problem (A8), we can conclude that the corollary holds at iteration *ν* + 1 when *s*<sub>1</sub> = 0 < *s*<sub>2</sub> and *r̂*<sub>1</sub> = *r̂*<sub>2</sub> = 0.


Combining the three cases, we prove that the corollary holds at iteration *ν* + 1 of the VIA. Therefore, it holds at any iteration *ν* by mathematical induction. Since the VIA is guaranteed to converge to the value function as *ν* → +∞, we can conclude our proof.

#### **Appendix E. Proof of Proposition 1**

We define Δ*V*(**x**) ≜ *V*<sup>1</sup>(**x**) − *V*<sup>0</sup>(**x**), where *V<sup>a</sup>*(**x**) is the value function resulting from taking action *a* at state **x**. Then, *V<sup>a</sup>*(**x**) can be calculated as follows

$$V^a(\mathbf{x}) = \mathcal{C}(\mathbf{x}, a) - \theta + \sum\_{\mathbf{x}' \in \mathcal{X}} P\_{\mathbf{x}, \mathbf{x}'}(a) V(\mathbf{x}'), \tag{A9}$$

where *θ* is the optimal value for M1(*λ*, −1). Hence, the optimal action at state *x* can be fully characterized by the sign of Δ*V*(*x*). More precisely, the optimal action at state *x* is *a* = 1 if Δ*V*(*x*) < 0, and *a* = 0 is optimal otherwise. To determine the sign of Δ*V*(*x*) for each state, we distinguish between the following cases

• We first consider the state *x* = (0,*r*ˆ). Applying the results in Section 2.3 to problem (A9), we obtain

$$\begin{aligned} V^0(0, \hat{r}) &= -\theta + (1 - \gamma)(1 - p)V(0, 0) + (1 - \gamma)pV(1, 0) + \\ &\gamma (1 - p)V(0, 1) + \gamma pV(1, 1), \end{aligned}$$

$$V^1(0, \hat{r}) = \lambda + V^0(0, \hat{r}). \tag{A10}$$

Therefore, Δ*V*(0, *r̂*) = *λ* ≥ 0. Thus, the optimal action at state (0, *r̂*) is *a* = 0.

• Then, we consider the state **x** = (*s*, 0) where *s* > 0. Applying the results in Section 2.3 to Equation (A9), we obtain

$$\begin{aligned} V^0(s,0) &= f(s) - \theta + (1-\gamma)pV(0,0) + (1-\gamma)(1-p)V(s+1,0) + \\ &\gamma pV(0,1) + \gamma(1-p)V(s+1,1), \\ V^1(s,0) &= f(s) + \lambda - \theta + (1-\gamma)(1-\beta)V(0,0) + (1-\gamma)\beta V(s+1,0) + \\ &\gamma (1 - \beta) V(0, 1) + \gamma \beta V(s + 1, 1). \end{aligned}$$

Then,

$$
\Delta V(s,0) = \lambda + p\_e^0 (1 - 2p)\omega, \tag{A11}
$$

where *ω* = (1 − *γ*)[*V*(0, 0) − *V*(*s* + 1, 0)] + *γ*[*V*(0, 1) − *V*(*s* + 1, 1)] ≤ 0.

• Finally, we consider the state *x* = (*s*, 1) where *s* > 0. Following the same trajectory, we have

$$
\Delta V(s, 1) = \lambda + (1 - p\_e^1)(1 - 2p)\omega.
$$

According to Corollary 2 and the fact that *p* < 0.5, we can see that Δ*V*(*s*, 0) and Δ*V*(*s*, 1) are both a constant *λ* plus a term that is non-increasing in *s*. As the time penalty function is unbounded, the value function must also be unbounded. Then, combining the three cases, we can conclude the following. For fixed *r̂*, there always exists a threshold *n<sub>r̂</sub>* > 0 such that the optimal action at state (*s*, *r̂*) is *a* = 1 when *s* ≥ *n<sub>r̂</sub>*, and *a* = 0 is optimal otherwise. Since *r̂* ∈ {0, 1}, the optimal policy can be fully captured by the pair (*n*<sub>0</sub>, *n*<sub>1</sub>).

In the following, we determine the relationship between *n*<sup>0</sup> and *n*1. We have

$$
\Delta V(s, 1) - \Delta V(s, 0) = (1 - p\_e^1 - p\_e^0)(1 - 2p)\omega \le 0.
$$

At the same time, for the threshold *n*<sub>0</sub>, we know Δ*V*(*n*<sub>0</sub>, 0) < 0. Then, we have Δ*V*(*n*<sub>0</sub>, 1) ≤ Δ*V*(*n*<sub>0</sub>, 0) < 0. Combined with the fact that Δ*V*(*s*, *r̂*) is non-increasing in *s*, we can conclude that the ordering *n*<sub>0</sub> ≥ *n*<sub>1</sub> holds.
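The structural results of Appendices D and E can be checked numerically. The sketch below runs relative value iteration on a truncated single-user decoupled problem with the transition structure used in these proofs; the parameter values, the time penalty *f*(*s*) = *s*, and the helper name `relative_via` are illustrative assumptions, not part of the paper.

```python
import numpy as np

def relative_via(p=0.3, alpha=0.1, beta=0.6, gamma=0.4, lam=1.0,
                 K=150, iters=3000):
    # States x = (s, r) with AoII s in {0,...,K} (truncated) and estimate
    # r in {0,1}; r' ~ Bernoulli(gamma) independently each slot.
    # Idling (a=0) resets s w.p. p; transmitting (a=1) resets s w.p.
    # 1-beta when r=0 and w.p. 1-alpha when r=1; at s=0 both actions share
    # the idle dynamics and a=1 only adds the cost lam (cf. (A10)).
    V = np.zeros((K + 1, 2))
    for _ in range(iters):
        Q = np.zeros((K + 1, 2, 2))                   # Q[s, r, a]
        EV = gamma * V[:, 1] + (1 - gamma) * V[:, 0]  # average out r'
        for s in range(K + 1):
            up = min(s + 1, K)                        # truncated growth state
            if s == 0:
                Q[0, :, 0] = p * EV[1] + (1 - p) * EV[0]
                Q[0, :, 1] = lam + p * EV[1] + (1 - p) * EV[0]
            else:
                Q[s, :, 0] = s + (1 - p) * EV[up] + p * EV[0]
                Q[s, 0, 1] = s + lam + beta * EV[up] + (1 - beta) * EV[0]
                Q[s, 1, 1] = s + lam + alpha * EV[up] + (1 - alpha) * EV[0]
        V = Q.min(axis=2)
        V = V - V[0, 0]                               # relative VIA: pin V(0,0) = 0
    return V, Q.argmin(axis=2)                        # value function, greedy policy
```

Under these assumptions, the computed value function is non-decreasing in *s* for each *r̂* (Corollary 2), and the greedy policy is a pair of thresholds with *n*<sub>0</sub> ≥ *n*<sub>1</sub> (Proposition 1).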

#### **Appendix F. Proof of Proposition 2**

We notice that the dynamics of the AoII under a threshold policy can be fully captured by a Discrete-Time Markov Chain (DTMC). Then, the expected AoII Δ̄<sub>*n*</sub> and the expected transmission rate *ρ̄<sub>n</sub>* under threshold policy *n* = (*n*<sub>0</sub>, *n*<sub>1</sub>) can be obtained from the stationary distribution of the induced DTMC. Let the states of the induced DTMC be the values of *s*. We recall that *r̂* is an independent Bernoulli random variable with parameter *γ*. Combined with the results in Section 2.3, we can easily obtain the state transition probabilities of the induced DTMC, which are shown in Figure A1.

**Figure A1.** DTMC induced by the threshold policy *n* = (*n*<sub>0</sub>, *n*<sub>1</sub>). In the figure, *c*<sub>1</sub> = (1 − *γ*)(1 − *p*) + *γα* and *c*<sub>2</sub> = (1 − *γ*)*β* + *γα*.

The balance equations of the induced DTMC are the following

$$(1-p)\pi\_0 + p\sum\_{k=1}^{n\_1 - 1} \pi\_k + (1-c\_1)\sum\_{k=n\_1}^{n\_0 - 1} \pi\_k + (1-c\_2)\sum\_{k=n\_0}^{+\infty} \pi\_k = \pi\_0.$$

$$p\pi\_0 = \pi\_1.$$

$$(1-p)\pi\_{k-1} = \pi\_k \text{ for } 2 \le k \le n\_1.$$

$$c\_1\pi\_{k-1} = \pi\_k \text{ for } n\_1 + 1 \le k \le n\_0.$$

$$c\_2\pi\_{k-1} = \pi\_k \text{ for } n\_0 + 1 \le k.$$

$$\sum\_{k=0}^{+\infty} \pi\_k = 1.$$

Then, we can easily solve the above system of linear equations. After some algebraic manipulation, we obtain the following

$$\pi\_0 = \frac{1}{2 + p(1 - p)^{n\_1 - 1} \left[ \frac{1}{1 - c\_1} - \frac{1}{p} + c\_1^{n\_0 - n\_1} \left( \frac{1}{1 - c\_2} - \frac{1}{1 - c\_1} \right) \right]}.$$

$$\pi\_k = p(1 - p)^{k - 1} \pi\_0 \text{ for } 1 \le k \le n\_1.$$

$$\pi\_k = p(1 - p)^{n\_1 - 1} c\_1^{k - n\_1} \pi\_0 \text{ for } n\_1 + 1 \le k \le n\_0.$$

$$\pi\_k = p(1 - p)^{n\_1 - 1} c\_1^{n\_0 - n\_1} c\_2^{k - n\_0} \pi\_0 \text{ for } n\_0 + 1 \le k.$$

Equipped with the above results, we proceed with calculating Δ¯ *<sup>n</sup>* and *ρ*¯*n*. According to problem (6a), the expected AoII is:

$$\bar{\Delta}\_{\mathfrak{n}} = \sum\_{k=0}^{+\infty} f(k)\pi\_{k}.$$

Substituting the expressions of the *π<sub>k</sub>*'s, we obtain the expression of Δ̄<sub>*n*</sub>. Proposition 1 tells us the following.

• For state (*s*, *r̂*) where *n*<sub>1</sub> ≤ *s* < *n*<sub>0</sub>, it is optimal to make a transmission attempt only when *r̂* = 1, which occurs with probability *γ*.

• For state (*s*, *r̂*) where *s* ≥ *n*<sub>0</sub>, it is optimal to make a transmission attempt regardless of *r̂*.

Combined with problem (6b), we have

$$\bar{\rho}\_{\mathfrak{n}} = \gamma \sum\_{k=n\_1}^{n\_0-1} \pi\_k + \sum\_{k=n\_0}^{+\infty} \pi\_k.$$

Substituting the expressions of *πk*'s, we can obtain the closed-form expression of *ρ*¯*n*.
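As a numerical sanity check on the closed-form stationary distribution above, the sketch below evaluates the *π<sub>k</sub>*'s on a truncated state space and assembles *ρ̄<sub>n</sub>*. The parameter values and the helper name `induced_dtmc` are illustrative assumptions.

```python
import numpy as np

def induced_dtmc(p, gamma, alpha, beta, n1, n0, K=4000):
    # Stationary distribution of the DTMC induced by the threshold policy
    # n = (n0, n1) (Figure A1), via the closed-form pi_k of Appendix F;
    # K truncates the geometric tail (c2 < 1, so the remainder is negligible).
    c1 = (1 - gamma) * (1 - p) + gamma * alpha
    c2 = (1 - gamma) * beta + gamma * alpha
    bracket = (1 / (1 - c1) - 1 / p
               + c1 ** (n0 - n1) * (1 / (1 - c2) - 1 / (1 - c1)))
    pi0 = 1 / (2 + p * (1 - p) ** (n1 - 1) * bracket)
    k = np.arange(K + 1)
    pi = np.empty(K + 1)
    pi[0] = pi0
    head = (k >= 1) & (k <= n1)
    mid = (k > n1) & (k <= n0)
    tail = k > n0
    pi[head] = p * (1 - p) ** (k[head] - 1) * pi0
    pi[mid] = p * (1 - p) ** (n1 - 1) * c1 ** (k[mid] - n1) * pi0
    pi[tail] = p * (1 - p) ** (n1 - 1) * c1 ** (n0 - n1) * c2 ** (k[tail] - n0) * pi0
    # expected transmission rate: gamma * sum_{n1 <= k < n0} + sum_{k >= n0}
    rho = gamma * pi[n1:n0].sum() + pi[n0:].sum()
    return pi, rho
```

The distribution sums to one and satisfies the balance equation *pπ*<sub>0</sub> = *π*<sub>1</sub>, confirming the expression for *π*<sub>0</sub>.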

#### **Appendix G. Proof of Proposition 4**

We first tackle the Whittle indexes at states (0, *r̂*) and (*s*, 0) where *s* > 0. To this end, we distinguish between the following cases


Now, we tackle the Whittle index at state **x** = (*s*, 1) where *s* > 0. For convenience, we denote by *W<sub>n</sub>* the Whittle index at state **x** = (*n*, 1). According to the monotonicity of Δ*V*(**x**) shown in the proof of Proposition 1, we can conclude that the threshold policy *n* = (+∞, *n* + 1) is optimal when *V*<sup>0</sup>(*n*, 1) = *V*<sup>1</sup>(*n*, 1). Then, we can prove the following

**Lemma A1.** *When* (9) *is satisfied and V*<sup>0</sup>(*n*, 1) = *V*<sup>1</sup>(*n*, 1)*, V*(*s*, 1) = *V*(*s*, 0) ≜ *V*(*s*) *for* 0 ≤ *s* ≤ *n*.

**Proof.** Since the value function satisfies the Bellman equation, it is sufficient to show that *V*(*s*, 1) and *V*(*s*, 0) satisfy the same Bellman equation. We recall that the Bellman equation for *V*(*x*) is given by

$$V(\mathbf{x}) = \min\_{a \in \{0, 1\}} V^a(\mathbf{x}),$$

where

$$\mathcal{V}^{a}(\mathbf{x}) = \mathbb{C}(\mathbf{x}, a) - \theta + \sum\_{\mathbf{x}'} P\_{\mathbf{x}, \mathbf{x}'}(a) V(\mathbf{x}'), \tag{A12}$$

and *θ* is the optimal value of the decoupled problem. We recall, from Corollary 3, that the optimal action at state (*s*, 0) is staying idle (i.e., *a* = 0) for any *s*. We also know that threshold policy *n* = (+∞, *n* + 1) is optimal when *V*0(*n*, 1) = *V*1(*n*, 1). Therefore, the optimal actions at states (*s*, 0) and (*s*, 1) where *s* ≤ *n* are the same (i.e., *a* = 0). Equivalently, we have

$$V(\mathbf{s}, \hat{r}) = V^0(\mathbf{s}, \hat{r}), \quad \text{for } \mathbf{s} \le \mathbf{n}. \tag{A13}$$

According to the system dynamic reported in Section 2.3, we know that the state transition probabilities are independent of *r*ˆ when *a* = 0. Meanwhile, *r*ˆ does not affect the instant cost. Let *x*<sup>1</sup> = (*s*, 1) and *x*<sup>2</sup> = (*s*, 0). Then, for any *x* , we have

$$P\_{\mathbf{x}\_1,\mathbf{x}'}(0) = P\_{\mathbf{x}\_2,\mathbf{x}'}(0).$$

$$\mathbb{C}(\mathbf{x}\_1, 0) = \mathbb{C}(\mathbf{x}\_2, 0).$$

Hence, according to (A12), we can see that *<sup>V</sup>*0(*s*, 0) = *<sup>V</sup>*0(*s*, 1) for any *<sup>s</sup>* <sup>≤</sup> *<sup>n</sup>*. Combined with problem (A13), we can conclude that *V*(*s*, 0) = *V*(*s*, 1) for any 0 ≤ *s* ≤ *n*.

By definition, Whittle's index *Wn* is the infimum *λ* such that *V*0(*n*, 1) = *V*1(*n*, 1). In this case, according to Lemma A1, *V*(0, 1) = *V*(0, 0) = *V*(0). Then, *V*0(*n*, 1) and *V*1(*n*, 1) can be written as

$$V^0(n,1) = f(n) - \theta + pV(0) + (1-p)[(1-\gamma)V(n+1,0) + \gamma V(n+1,1)].\tag{A14}$$

$$V^1(n,1) = f(n) + W\_n - \theta + (1-\alpha)V(0) + \alpha[(1-\gamma)V(n+1,0) + \gamma V(n+1,1)].$$

Without loss of generality, we assume *V*(0) = 0. Then, equating the two expressions yields

$$W\_n = (1 - p - \alpha)(\gamma V(n+1, 1) + (1 - \gamma)V(n+1, 0)). \tag{A15}$$

Combining problems (A14) and (A15), we conclude that *Wn* is

$$W\_n = \frac{(1 - p - \alpha)(V^0(n, 1) + \theta - f(n))}{1 - p}.$$

Since the optimal action at state (*n*, 1) is *a* = 0, we have *V*0(*n*, 1) = *V*(*n*, 1) = *V*(*n*). Finally, we obtain

$$W\_n = \frac{(1 - p - \alpha)(V(n) + \theta - f(n))}{1 - p}. \tag{A16}$$

Now, we tackle the expression of *V*(*n*). When *V*0(*n*, 1) = *V*1(*n*, 1), the optimal action at state (*s*,*r*ˆ) where 0 ≤ *s* < *n* is staying idle. Then, leveraging Lemma A1, value function *V*(*s*) where 0 ≤ *s* < *n* satisfies the following

$$V(s) = \begin{cases} -\theta + f(0) + pV(1) & \text{when } s = 0, \\ -\theta + f(s) + (1 - p)V(s + 1) & \text{when } 0 < s < n. \end{cases} \tag{A17}$$

By backward induction, we end up with the following equation for 0 < *s* < *n*.

$$V(s) = \frac{-\theta(1 - (1-p)^{n-s})}{p} + \sum\_{k=1}^{n-s} f(n-k)(1-p)^{n-s-k} + (1-p)^{n-s}V(n).$$

Letting *s* = 1 yields

$$V(1) = \frac{-\theta(1 - (1-p)^{n-1})}{p} + \sum\_{k=1}^{n-1} f(n-k)(1-p)^{n-1-k} + (1-p)^{n-1}V(n).$$

From problem (A17), *V*(1) also satisfies the following

$$V(1) = \frac{\theta - f(0)}{p}.$$

Equating the two expressions of *V*(1), we obtain

$$V(n) = \frac{-f(0)}{p(1-p)^{n-1}} + \theta \left(\frac{2}{p(1-p)^{n-1}} - \frac{1}{p}\right) - \sum\_{k=1}^{n-1} f(n-k)(1-p)^{-k}.\tag{A18}$$
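Equation (A18) can be cross-checked against the recursion (A17). The sketch below computes *V*(*n*) both ways, taking *V*(0) = 0 as in the surrounding derivation; the values of *θ* and *p* and the time penalty *f*(*k*) = *k* are illustrative, and the helper names are hypothetical.

```python
def v_closed_form(theta, p, n, f):
    # V(n) from the closed form (A18)
    tail = sum(f(n - k) * (1 - p) ** (-k) for k in range(1, n))
    return (-f(0) / (p * (1 - p) ** (n - 1))
            + theta * (2 / (p * (1 - p) ** (n - 1)) - 1 / p) - tail)

def v_by_recursion(theta, p, n, f):
    # V(1) = (theta - f(0)) / p, then unroll (A17):
    # V(s) = -theta + f(s) + (1 - p) V(s + 1)
    #   =>  V(s + 1) = (V(s) + theta - f(s)) / (1 - p)
    v = (theta - f(0)) / p
    for s in range(1, n):
        v = (v + theta - f(s)) / (1 - p)
    return v
```

Both routes agree for any *n* ≥ 1, which is exactly the consistency used when equating the two expressions of *V*(1).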

We recall that, when *V*0(*n*, 1) = *V*1(*n*, 1), threshold policy *n* = (+∞, *n* + 1) is optimal and both actions at state *x* = (*n*, 1) are equally desirable. Thus, threshold policy *n* = (+∞, *n*) is also optimal. Then, we know

$$
\theta = \bar{\Delta}\_n + W\_n \bar{\rho}\_n, \tag{A19}
$$

where Δ¯ *<sup>n</sup>* and *ρ*¯*<sup>n</sup>* are the expected AoII and the expected transmission rate under threshold policy *n* = (+∞, *n*), respectively. Finally, combining problems (A16), (A18) and (A19), we obtain

$$W\_n = \frac{\frac{-f(0)}{p(1-p)^n} + \bar{\Delta}\_n \frac{2 - (1-p)^n}{p(1-p)^n} - (1-p)^{-n} \left(\sum\_{k=1}^n f(k)(1-p)^{k-1}\right)}{\frac{1}{1-p-\alpha} - \bar{\rho}\_n \frac{2 - (1-p)^n}{p(1-p)^n}}.$$

After some algebraic manipulation, we have

$$W\_n = \frac{(1 - c\_1) \sum\_{k=n+1}^{+\infty} f(k) c\_1^{k-n-1} - \bar{\Delta}\_n}{\frac{(1 - c\_1)(1 - p) - \gamma(1 - p - \alpha)}{c\_1(1 - p - \alpha)} + \bar{\rho}\_n},$$

where *c*<sub>1</sub> = (1 − *γ*)(1 − *p*) + *γα*.
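The closed form above can be evaluated directly. In the sketch below (hypothetical helper name, illustrative parameters, hypothetical time penalty *f*(*k*) = *k*), Δ̄<sub>*n*</sub> and *ρ̄<sub>n</sub>* come from the DTMC induced by the threshold policy *n* = (+∞, *n*), i.e., the Appendix F distribution with *n*<sub>0</sub> → +∞ so that the *c*<sub>2</sub> branch vanishes; for *f*(*k*) = *k*, the tail series satisfies (1 − *c*<sub>1</sub>)∑<sub>*k*>*n*</sub> *k c*<sub>1</sub><sup>*k*−*n*−1</sup> = (*n* + 1) + *c*<sub>1</sub>/(1 − *c*<sub>1</sub>).

```python
import numpy as np

def whittle_index(n, p, gamma, alpha, K=4000):
    # Whittle index W_n at state (n, 1) via the closed form at the end of
    # Appendix G, with hypothetical time penalty f(k) = k.
    c1 = (1 - gamma) * (1 - p) + gamma * alpha
    # stationary distribution under (+inf, n): growth prob 1-p below n, c1 above
    pi0 = 1 / (2 + p * (1 - p) ** (n - 1) * (1 / (1 - c1) - 1 / p))
    k = np.arange(K + 1)
    pi = np.empty(K + 1)
    pi[0] = pi0
    head = (k >= 1) & (k <= n)
    tail = k > n
    pi[head] = p * (1 - p) ** (k[head] - 1) * pi0
    pi[tail] = p * (1 - p) ** (n - 1) * c1 ** (k[tail] - n) * pi0
    delta_n = (k * pi).sum()              # expected AoII for f(k) = k
    rho_n = gamma * pi[n:].sum()          # transmit when s >= n and r = 1
    # closed-form tail: (1 - c1) * sum_{k>n} k * c1^(k-n-1) = (n+1) + c1/(1-c1)
    num = (n + 1) + c1 / (1 - c1) - delta_n
    den = (((1 - c1) * (1 - p) - gamma * (1 - p - alpha))
           / (c1 * (1 - p - alpha)) + rho_n)
    return num / den
```

Under these assumed parameters, the computed indexes are positive and non-decreasing in *n*, matching the properties discussed next.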

In the following, we investigate some properties of the Whittle index. First, *W<sub>n</sub>* is non-negative since 1 − *p* − *α* and *V*(*n* + 1, *r̂*) in (A15) are all non-negative. Meanwhile, combining (A15) with the fact that *V*(*n*, *r̂*) is non-decreasing in *n*, we can verify that *W<sub>n</sub>* is non-decreasing in *n*. Combined with the Whittle indexes in the two other cases (i.e., **x** = (0, *r̂*) and **x** = (*s*, 0) where *s* > 0), we can easily obtain the properties of *W*<sub>**x**</sub> as detailed in Proposition 4.

#### **Appendix H. Proof of Proposition 5**

We notice that M1(*λ*, −1) coincides with the decoupled model studied in Section 4.2. When problem (9) is satisfied, the decoupled problem is indexable, and, according to Corollary 3, we only need to show that *n* is the optimal threshold for the states with *r*ˆ = 1. We first tackle the case of *λ* > 0. To this end, we divide our discussion into the following cases


Then, we conclude that *n* is the optimal threshold for the states with *r*ˆ = 1 when *λ* > 0. In the case of *λ* = 0, according to the proof of Proposition 1, we can easily verify that the optimal threshold is 1.

#### **Appendix I. Proof of Theorem 2**

We first make the following definitions. When M1(*λ*, −1) is at state **x** and action *a* is taken, costs *C*<sub>1</sub>(**x**, *a*) ≜ *f*(*s*) and *C*<sub>2</sub>(**x**, *a*) ≜ *λa* are incurred. We denote the expected *C*<sub>1</sub>-cost and the expected *C*<sub>2</sub>-cost under policy *φ* as *C̄*<sub>1</sub>(*φ*) and *C̄*<sub>2</sub>(*φ*), respectively. Let *G* be a non-empty set of states. For a given state *i*, we define R\*(*i*, *G*) as the class of policies *φ* for which the following hold


With the definitions in mind, we proceed with verifying the assumptions given in [27].


*set U of transient states such that e* ∈ R\*(*i*, *R*) *for i* ∈ *U. Moreover, both C̄*<sub>1</sub>(*e*) *and C̄*<sub>2</sub>(*e*) *on R are finite*: We consider the policy under which the base station makes a transmission attempt at every time slot. According to the system dynamics detailed in Section 2.3, we can see that all the states communicate with state (0, 0) and (0, 0) communicates with all other states. Thus, the state space S consists of a single (non-empty) positive recurrent class, and the set of transient states can simply be empty. *C̄*<sub>1</sub>(*e*) and *C̄*<sub>2</sub>(*e*) are trivially finite, as we can verify using Proposition 2.


As the assumptions are verified, we proceed with introducing the optimal randomized policy for a given *λ*. We say a policy is *λ*-optimal if it is optimal for M1(*λ*, −1). We consider two monotone sequences *λ<sub>n</sub>*<sup>+</sup> ↓ *λ* and *λ<sub>n</sub>*<sup>−</sup> ↑ *λ*. Then, there exist subsequences of *λ<sub>n</sub>*<sup>+</sup> and *λ<sub>n</sub>*<sup>−</sup> such that the corresponding sequences of optimal policies converge. According to Lemma 3.7 of [27], the limit points, denoted by *n<sub>λ+</sub>* and *n<sub>λ−</sub>*, are both *λ*-optimal. By Proposition 3.2 of [27], the Markov chains induced by *n<sub>λ+</sub>* and *n<sub>λ−</sub>* both contain a single non-empty positive recurrent class, and state (0, 0) is positive recurrent in both induced Markov chains. Hence, the base station can choose which policy to follow each time the system reaches state (0, 0) while keeping the resulting randomized policy *λ*-optimal, as suggested by Lemma 3.9 of [27]. More precisely, we consider the following randomized policy: each time the system reaches state (0, 0), the base station chooses *n<sub>λ−</sub>* with probability *μ* and *n<sub>λ+</sub>* with probability 1 − *μ*. The chosen policy is followed until the next choice. We denote this policy by *n<sub>λ</sub>* and conclude that *n<sub>λ</sub>* is *λ*-optimal for any *μ* ∈ [0, 1].

#### **Appendix J. Proof of Proposition 6**

The value functions *V*(**x**) and *V<sup>i</sup>*(**x**<sub>*i*</sub>) must satisfy their own Bellman equations. More precisely,

$$V(\mathbf{x}) + \theta = \min\_{\mathbf{a} \in \mathcal{A}\_N(-1)} \left\{ \mathbb{C}(\mathbf{x}, \mathbf{a}) + \sum\_{\mathbf{x'}} Pr(\mathbf{x'} \mid \mathbf{x}, \mathbf{a}) V(\mathbf{x'}) \right\},$$

$$V^i(\mathbf{x}\_i) + \theta\_i = \min\_{a\_i \in \{0, 1\}} \left\{ \mathbb{C}(\mathbf{x}\_i, a\_i) + \sum\_{\mathbf{x'}\_i} Pr(\mathbf{x'\_i} \mid \mathbf{x}\_i, a\_i) V^i(\mathbf{x'\_i}) \right\},\tag{A20}$$

where *<sup>θ</sup>* and *<sup>θ</sup><sup>i</sup>* are the optimal values of <sup>M</sup>*N*(*λ*, <sup>−</sup>1) and <sup>M</sup>*<sup>i</sup>* <sup>1</sup>(*λ*, −1), respectively. We recall from Section 2.3 that the users are independent when action *a* and current state *x* are given. Thus

$$\Pr\left(\mathbf{x'} \mid \mathbf{x}, \mathbf{a}\right) = \prod\_{i=1}^{N} \Pr\left(\mathbf{x'\_i} \mid \mathbf{x}, \mathbf{a}\right),$$

where **x**′ = (**x**′<sub>1</sub>, ..., **x**′<sub>*N*</sub>). Then, we have

$$\sum\_{\mathbf{x'} - \{\mathbf{x}'\_i\}} Pr(\mathbf{x'} - \{\mathbf{x}'\_i\} \mid \mathbf{x}, \mathbf{a}) = \sum\_{\mathbf{x'} - \{\mathbf{x}'\_i\}} \prod\_{j \neq i} Pr(\mathbf{x'\_j} \mid \mathbf{x}, \mathbf{a}) = 1.$$

We also recall from Section 2.3 that the state of user *i* depends only on its previous state and the action with respect to user *i*. Thus

$$Pr(\mathbf{x}\_i' \mid \mathbf{x}, \mathbf{a}) = Pr(\mathbf{x}\_i' \mid \mathbf{x}\_i, a\_i).$$

Combined together, we obtain

$$\begin{split} \sum\_{i=1}^{N} \sum\_{\mathbf{x}\_{i}'} Pr(\mathbf{x}\_{i}' \mid \mathbf{x}\_{i}, a\_{i}) V^{i}(\mathbf{x}\_{i}') &= \sum\_{i=1}^{N} \sum\_{\mathbf{x}\_{i}'} \left[ \sum\_{\mathbf{x}' - \{\mathbf{x}\_{i}'\}} \prod\_{j \neq i} Pr(\mathbf{x}\_{j}' \mid \mathbf{x}, \mathbf{a}) \right] Pr(\mathbf{x}\_{i}' \mid \mathbf{x}\_{i}, a\_{i}) V^{i}(\mathbf{x}\_{i}') \\ &= \sum\_{i=1}^{N} \sum\_{\mathbf{x}\_{i}'} \sum\_{\mathbf{x}' - \{\mathbf{x}\_{i}'\}} \prod\_{j=1}^{N} Pr(\mathbf{x}\_{j}' \mid \mathbf{x}, \mathbf{a}) V^{i}(\mathbf{x}\_{i}') \\ &= \sum\_{\mathbf{x}'} Pr(\mathbf{x}' \mid \mathbf{x}, \mathbf{a}) \left( \sum\_{i=1}^{N} V^{i}(\mathbf{x}\_{i}') \right). \end{split} \tag{A21}$$

Then, we sum problem (A20) over all users which yields

$$\sum\_{i=1}^{N} (V^i(\mathbf{x}\_i) + \theta\_i) = \min\_{\mathbf{a}} \left\{ \sum\_{i=1}^{N} \left( \mathbb{C}(\mathbf{x}\_i, a\_i) + \sum\_{\mathbf{x}\_i'} \Pr(\mathbf{x}\_i' \mid \mathbf{x}\_i, a\_i) V^i(\mathbf{x}\_i') \right) \right\}.$$

We recall that *C*(**x**, **a**) = ∑<sup>*N*</sup><sub>*i*=1</sub> *C*(**x**<sub>*i*</sub>, *a<sub>i</sub>*) by definition. Then, leveraging problem (A21), we obtain

$$\sum\_{i=1}^{N} V^i(\mathbf{x}\_i) + \sum\_{i=1}^{N} \theta\_i = \min\_{\mathbf{a} \in \mathcal{A}\_N(-1)} \left\{ \mathbb{C}(\mathbf{x}, \mathbf{a}) + \sum\_{\mathbf{x}'} Pr(\mathbf{x}' \mid \mathbf{x}, \mathbf{a}) \left( \sum\_{i=1}^{N} V^i(\mathbf{x}'\_i) \right) \right\}.$$

Since the solution to the Bellman equation is unique [21], we must have ∑<sup>*N*</sup><sub>*i*=1</sub> *V<sup>i</sup>*(**x**<sub>*i*</sub>) = *V*(**x**) and ∑<sup>*N*</sup><sub>*i*=1</sub> *θ<sub>i</sub>* = *θ*. Then, we can conclude that it is optimal for M*N*(*λ*, −1) if each user adopts its own optimal policy.

#### **Appendix K. Proof of Theorem 3**

In this proof, we call a policy *λ*\*-optimal if it is optimal for M*N*(*λ*\*, −1). In Section 4.2, we ensured that, for each user, there exists at least one threshold policy that yields a finite expected AoII. Therefore, we can conclude that, for RP, there exists at least one policy under which the expected AoII and the expected transmission rate are both finite. Then, according to Lemma 3.10 of [27], a policy is optimal for RP if


We first construct a policy *φ<sub>λ\*</sub>* that is *λ*\*-optimal. We recall from Proposition 6 that a policy is *λ*\*-optimal if it consists of the optimal policies for each M<sup>*i*</sup><sub>1</sub>(*λ*\*, −1) where 1 ≤ *i* ≤ *N*. According to Theorem 2, for any *i*, there exist *n<sub>λ\*</sub>*<sup>−,*i*</sup> and *n<sub>λ\*</sub>*<sup>+,*i*</sup> that are both optimal for M<sup>*i*</sup><sub>1</sub>(*λ*\*, −1). Then, we can construct the policy *φ<sub>λ\*</sub>* in the following way.

• For user *i* with *n<sub>λ\*</sub>*<sup>−,*i*</sup> = *n<sub>λ\*</sub>*<sup>+,*i*</sup> ≜ *n<sub>λ\*,i</sub>*, the threshold policy *n<sub>λ\*,i</sub>* is used. Then, the deterministic policy *n<sub>λ\*,i</sub>* is optimal for M<sup>*i*</sup><sub>1</sub>(*λ*\*, −1) and

$$
\bar{\rho}^i(\lambda^\*) = \bar{\rho}^i(\lambda\_-^\*) = \bar{\rho}^i(\lambda\_+^\*).
$$

In this case, the choice of *μ<sup>i</sup>* makes no difference.

• For user *i* with *n<sub>λ\*</sub>*<sup>−,*i*</sup> ≠ *n<sub>λ\*</sub>*<sup>+,*i*</sup>, the randomized policy *n<sub>λ\*,i</sub>* detailed in Theorem 2 is used. Then, for any *μ<sub>i</sub>* ∈ [0, 1], the randomized policy *n<sub>λ\*,i</sub>* is optimal for M<sup>*i*</sup><sub>1</sub>(*λ*\*, −1) and

$$
\bar{\rho}^i(\lambda^\*) = \mu\_i \bar{\rho}^i(\lambda\_-^\*) + (1 - \mu\_i)\bar{\rho}^i(\lambda\_+^\*).
$$

Combining the two cases, we conclude that *φ<sub>λ\*</sub>* = [*n<sub>λ\*,1</sub>*, ..., *n<sub>λ\*,N</sub>*] is *λ*\*-optimal for any *μ<sub>i</sub>* ∈ [0, 1]. Hence, as long as the chosen *μ<sub>i</sub>*'s realize ∑<sup>*N*</sup><sub>*i*=1</sub> *ρ̄<sup>i</sup>*(*λ*\*) = *M*, we can conclude that the randomized policy *φ<sub>λ\*</sub>* is optimal for RP.

#### **Appendix L. Proof of Proposition 8**

We notice that M<sup>*i*</sup><sub>1</sub>(*λ*\*, −1) coincides with the decoupled model studied in Section 4.2. Therefore, we can use the results in Section 4.2 to prove the properties. Since the users share the same structure, we omit the user index *i* for simplicity. According to the definition of *I*<sub>**x**</sub>, we have

$$\begin{aligned} I\_{\mathbf{x}} &= \sum\_{\mathbf{x}'} P\_{\mathbf{x}, \mathbf{x}'}(0) V(\mathbf{x}') - \sum\_{\mathbf{x}'} P\_{\mathbf{x}, \mathbf{x}'}(1) V(\mathbf{x}') - \lambda^\* \\ &= -\Delta V(\mathbf{x}). \end{aligned}$$

Leveraging the results in the proof of Proposition 1, we have the following


From the above three cases, we can easily conclude that *I*<sub>**x**</sub> ≥ −*λ*\*, where the equality holds when *r̂* = *p<sub>e</sub>*<sup>0</sup> = 0 or *s* = 0. As proven in Corollary 2, *V*(**x**) is non-decreasing in *s*. Hence, we can conclude that *I*<sub>**x**</sub> is also non-decreasing in *s*. To show that *I*<sub>**x**</sub> is monotone in *r̂*, we consider the two states **x**<sub>1</sub> = (*s*, 1) and **x**<sub>2</sub> = (*s*, 0). Then, we have

$$I\_{\mathbf{x}\_2} - I\_{\mathbf{x}\_1} = \Delta V(s, 1) - \Delta V(s, 0) = (1 - p\_e^1 - p\_e^0)(1 - 2p)\omega \le 0.$$

Therefore, we can conclude that *Ix* is non-decreasing in *r*ˆ.

#### **Appendix M**



#### **Algorithm A2** Bisection Search

**Require:** Maximum updates per transmission attempt *M*; MDP M*N*(*λ*, −1) = (X*N*, A*N*(−1), P*N*, C*N*(*λ*)); tolerance *ξ*; convergence criterion

1: **procedure** BISECTIONSEARCH(M*N*(*λ*, −1), *M*, *ξ*)

2: Initialize *λ*<sup>−</sup> = 0; *λ*<sup>+</sup> = 1

3: *φ<sub>λ+</sub>* ← optimal policy for M*N*(*λ*<sup>+</sup>, −1) using Section 5.1 and Proposition 6

4: *ρ̄*(*λ*<sup>+</sup>) ← *φ<sub>λ+</sub>* using Proposition 2

5: **while** *ρ̄*(*λ*<sup>+</sup>) ≥ *M* **do**

6: *λ*<sup>−</sup> = *λ*<sup>+</sup>; *λ*<sup>+</sup> = 2*λ*<sup>+</sup>

7: *φ<sub>λ+</sub>* ← optimal policy for M*N*(*λ*<sup>+</sup>, −1) using Section 5.1 and Proposition 6

8: *ρ̄*(*λ*<sup>+</sup>) ← *φ<sub>λ+</sub>* using Proposition 2

9: **while** *λ*<sup>+</sup> − *λ*<sup>−</sup> ≥ 2*ξ* **do**

10: *λ* = (*λ*<sup>+</sup> + *λ*<sup>−</sup>)/2

11: *φ<sub>λ</sub>* ← optimal policy for M*N*(*λ*, −1) using Section 5.1 and Proposition 6

12: *ρ̄*(*λ*) ← *φ<sub>λ</sub>* using Proposition 2

13: **if** *ρ̄*(*λ*) > *M* **then**

14: *λ*<sup>−</sup> = *λ*

15: **else**

16: *λ*<sup>+</sup> = *λ*

17: **return** (*λ*\*<sub>+</sub>, *λ*\*<sub>−</sub>) ← (*λ*<sup>+</sup>, *λ*<sup>−</sup>)
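The control flow of Algorithm A2 can be sketched in isolation. In the snippet below, `rho` stands in for the expected transmission rate of the *λ*-optimal policy (obtained in the paper via Section 5.1 and Propositions 2 and 6); any non-increasing function of *λ* works, and the one in the test is purely illustrative.

```python
def bisection_search(rho, M, xi=1e-4):
    # Sketch of Algorithm A2: rho(lam) is assumed non-increasing in lam.
    # Returns the final bracket (lam_plus, lam_minus) of width < 2*xi.
    lam_minus, lam_plus = 0.0, 1.0
    while rho(lam_plus) >= M:                  # doubling phase: find an upper bound
        lam_minus, lam_plus = lam_plus, 2 * lam_plus
    while lam_plus - lam_minus >= 2 * xi:      # halving phase
        lam = (lam_plus + lam_minus) / 2
        if rho(lam) > M:
            lam_minus = lam
        else:
            lam_plus = lam
    return lam_plus, lam_minus
```

Because the rate constraint is met with equality at the optimum, the returned bracket pins down the pair (*λ*\*<sub>+</sub>, *λ*\*<sub>−</sub>) used to randomize between the two neighboring threshold policies.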

#### **References**


### *Article* **Distribution of the Age of Gossip in Networks**

**Mohamed A. Abd-Elmagid \* and Harpreet S. Dhillon**

Wireless@VT, Bradley Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA 24061, USA **\*** Correspondence: maelaziz@vt.edu

**Abstract:** We study a general setting of gossip networks in which a source node forwards its measurements (in the form of status updates) about some observed physical process to a set of monitoring nodes according to independent Poisson processes. Furthermore, each monitoring node sends status updates about its information status (about the process observed by the source) to the other monitoring nodes according to independent Poisson processes. We quantify the freshness of the information available at each monitoring node in terms of Age of Information (AoI). While this setting has been analyzed in a handful of prior works, the focus has been on characterizing the average (i.e., marginal first moment) of each age process. In contrast, we aim to develop methods that allow the characterization of higher-order marginal or joint moments of the age processes in this setting. In particular, we first use the stochastic hybrid system (SHS) framework to develop methods that allow the characterization of the stationary marginal and joint moment generating functions (MGFs) of age processes in the network. These methods are then applied to derive the stationary marginal and joint MGFs in three different topologies of gossip networks, with which we derive closed-form expressions for marginal or joint high-order statistics of age processes, such as the variance of each age process and the correlation coefficients between all possible pairwise combinations of age processes. Our analytical results demonstrate the importance of incorporating the higher-order moments of age processes in the implementation and optimization of age-aware gossip networks rather than just relying on their average values.

**Keywords:** Age of Information; information freshness; gossip networks; stochastic hybrid systems

#### **1. Introduction**

Timely delivery of status updates is crucial for enabling the operation of many emerging Internet of Things (IoT)-based real-time status updating systems [1]. The concept of AoI was introduced in [2] to quantify the freshness of information available at some node about a physical process as a result of status update receptions over time. In particular, for a single-source queueing-theoretic model in which status updates about a single physical process are generated randomly at a *transmitter node* and are then sent to a *destination node* through a single server, the AoI at the destination was defined in [2] as the following random process: *x*(*t*) = *t* − *u*(*t*), where *u*(*t*) is the generation time instant of the latest status update received at the destination by time *t*. Assuming that the AoI process is ergodic, in [2], the stationary average value of the AoI under the first-come-first-serve (FCFS) queueing discipline was derived by leveraging the properties of the AoI's sample functions and applying appropriate geometric arguments. Although this geometric approach has been considered in a series of subsequent prior works [3–13] to analyze the marginal distributional properties of AoI or peak AoI (an AoI-related metric introduced in [3] to capture the peak values of AoI over time) for adaptations of the queueing model studied in [2], it often requires tedious calculations of joint moments that limit its tractability in analyzing more sophisticated queueing models or disciplines.

Motivated by the above limitations of the geometric approach to AoI analysis, the authors of [14,15] developed an SHS-based framework to allow the analysis of the marginal

**Citation:** Abd-Elmagid, M.A.; Dhillon, H.S. Distribution of the Age of Gossip in Networks. *Entropy* **2023**, *25*, 364. https://doi.org/10.3390/ e25020364

Academic Editors: Anthony Ephremides and Yin Sun

Received: 16 January 2023 Revised: 12 February 2023 Accepted: 14 February 2023 Published: 16 February 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

distributional properties of each AoI process (in a network with multiple AoI processes) through the characterization of its stationary marginal moments and MGF. Furthermore, by using the notion of tensors, the authors of [16] generalized the analysis in [14,15] and developed an SHS-based general framework that facilitates the analysis of the joint distributional properties of an arbitrary set of AoI processes in a network through the characterization of their stationary joint moments and MGFs. In the piecewise linear SHS model with linear reset maps considered in the analyses in [14–16], the discrete state of the system *q*(*t*) is modeled as a finite-state, continuous-time Markov chain, and the continuous state of the system is modeled using the vector **x**(*t*), which contains the AoI or age processes at different nodes in the network. When a transition *l* occurs in *q*(*t*) (as a result of status update generation or reception at one of the nodes in the network), the continuous state is updated according to the following linear mapping of **x**(*t*): **x**′(*t*) = **x**(*t*)**A***l*, where **x**′(*t*) is the updated version of **x**(*t*) and **A***l* is the reset mapping matrix associated with a transition *l*. Additionally, in the absence of a transition in *q*(*t*), the age processes in **x**(*t*) grow at a unit rate with time, which yields piecewise linear age processes over time. Based on this description of the piecewise linear SHS model with linear reset maps, one can realize that the frameworks in [14–16] are not applicable to age analysis in classes of status-updating systems where it is not possible for every transition *l* in *q*(*t*) to express the updated value of each age process in the network as a linear combination of the age processes in **x**(*t*). A popular class of such systems is the gossip-based status-updating system, where each node in the network randomly shares its information status over time with the other nodes [17,18].
Here, when there is a transition caused by a status update reception at node *j* from node *i*, the updated value of the age process at node *j* is given by the minimum between the values of the age processes at nodes *i* and *j*. As a result, there have been a handful of recent efforts for developing new SHS-based methods that are suitable for age analysis in such gossip networks [19,20]. However, the methods developed thus far have been limited to the characterization of the stationary marginal first moment (average value) of each age process in the network. In this paper, we develop new SHS-based methods that allow the evaluation of the stationary marginal or joint high-order moments of the age processes in gossip networks through the characterization of their stationary marginal or joint MGFs.

#### *1.1. Related Work*

The literature relevant to this paper can be categorized into the following two categories: (1) prior analyses of AoI applying the SHS approach with linear reset maps and (2) prior analyses of AoI in gossip networks. We now discuss the relevant prior work in these two directions.

*Analyses of AoI applying the SHS approach with linear reset maps*. The SHS approach with linear reset maps developed in [14,15] has been applied to characterize the marginal distributional properties of AoI under a variety of system settings or queueing disciplines [21–33]. In particular, the average AoI was characterized for single-source systems in [21,22] and multi-source systems in [23–27], whereas the MGF of AoI was derived for single-source systems in [28,29], two-source systems in [30], and multi-source systems in [31–33]. Note that a multi-source system refers to the set-up where a transmitter has multiple sources of information generating status updates about multiple physical processes. The authors of [21] derived the average AoI under the last-come-first-serve (LCFS) with preemption in service queueing discipline when the transmitter contained multiple parallel servers. Furthermore, the authors of [22] derived the average AoI under the LCFS with preemption in service queueing discipline when the transmitter contained multiple servers in series or there existed a series of nodes between the transmitter and destination nodes. In [23], the average AoI was characterized under the priority LCFS with preemption in service or waiting queueing model. The authors of [24] derived the average AoI in the presence of packet delivery errors under stationary randomized and round-robin scheduling policies. In [25], the average AoI was characterized under the LCFS with preemption in service

queueing discipline when the transmitter contained multiple parallel servers. The authors of [26] analyzed the average AoI for a network in which multiple transmitter-destination pairs contended for the channel using the carrier sense multiple access scheme. In [27] (in [30]), the average AoI (the MGF of AoI) was derived under several source-aware packet management scheduling policies at the transmitter. For the case where the transmitter was powered by energy harvesting (EH), the authors of [28,31] derived the MGF of AoI under several queueing disciplines, including the LCFS with and without preemption in service or waiting strategies. On the other hand, the authors of [16,34] applied their SHS-based framework (developed to allow the analysis of the joint distributional properties of AoI processes in networks) to characterize the joint MGF of an arbitrary set of AoI processes in a multi-source updating system under non-preemptive and source-agnostic or source-aware preemptive-in-service queueing disciplines.

*Analyses of AoI in gossip networks*. There are only a handful of recent works focusing on the analysis or optimization of AoI and its variants in gossip networks [19,20,35–41]. For a general setting of gossip networks, the author of [19,20] first developed SHS-based methods for the evaluation of the average AoI and the average version age at each node in the network. Note that the version age is a discrete form of AoI defined as the number of versions where the current status of information at a node is out of date compared with the current status of the original source of information. The authors of [35] applied the results of [20] to derive the average version age at each node in several topologies of clustered gossip networks and characterized the average version age scaling as a function of the network size. The authors of [36] extended the SHS-based method developed in [19] for the evaluation of the average AoI in the setting where a timestomping adversary is present and then obtained the average AoI scaling for several network topologies. In [37], each node was assumed to have the ability to estimate the information at the source by applying the majority rule to the information received from the other nodes, and an error metric was introduced to quantify the average percentage of nodes that could accurately obtain the most up-to-date information. The authors of [38–40] developed gossip protocols with the objective of improving the average version age scaling. In [41], the problem of optimizing the average version age was formulated as a Markov decision process for a setting where an energy harvesting (EH)-powered sensor was sending status updates to an aggregator with caching capabilities (which served the requests of a gossip network), and the structural properties of the optimal policy were analytically characterized. 
Different from the analyses in [19,20,35–41], which were focused on characterizing or optimizing the stationary marginal first moment of AoI or some other AoI-related metrics, this paper is the first to develop SHS-based methods that allow the characterization of the stationary marginal or joint MGFs of AoI processes in gossip networks.

Before delving into more detail about our contributions, it is worth noting that aside from the above queueing theory-based analyses of AoI, there have been efforts to evaluate and optimize AoI or some other AoI-related metrics in a variety of communication systems that deal with time-sensitive information (see [42] for a comprehensive book and [43] for a recent survey). For instance, AoI has been studied in the context of age-optimal transmission scheduling policies [44–52], multi-hop networks [53–55], broadcast networks [56,57], ultrareliable low-latency vehicular networks [58], unmanned aerial vehicle (UAV)-assisted communication systems [59–61], Internet of Underwater Things networks [62], reconfigurable intelligent surface (RIS)-assisted communication systems [63,64], EH systems [65–74], largescale analysis of IoT networks [75–77], remote estimation [78,79], information-theoretic analysis [80–83], timely source coding [84,85], cache updating systems [86–88], economic systems [89], and timely communication in federated learning [90,91].

#### *1.2. Contributions*

A general setting of gossip networks is analyzed in this paper, where a source node forwards its measurements (in the form of status updates) about some observed physical process to a set of monitoring nodes according to independent Poisson processes. Furthermore, each monitoring node sends status updates about its information status (about the process observed by the source) to the other monitoring nodes according to independent Poisson processes. We quantify the freshness of the information available at each monitoring node in terms of AoI. The continuous state of the system is then formed by the AoI or age processes at different monitoring nodes. For this set-up, our main contributions are listed below.

*Developing SHS-based methods for the evaluation of the MGF of age of gossip in networks*. For the general setting of gossip networks described above, we use the SHS framework to characterize (1) the stationary marginal MGF of each age process in the network and (2) the stationary joint MGF of any two arbitrarily selected age processes in the network. In particular, we first construct two classes of test functions (functions whose expected values are quantities of interest) that are suitable for analyzing the marginal or joint MGF. By applying Dynkin's formula to each test function, we derive two systems of first-order ordinary differential equations characterizing the temporal evolution of the marginal and joint MGFs, from which the stationary marginal and joint MGFs are evaluated. To the best of our knowledge, this paper makes the first attempt at developing SHS-based methods for the characterization of the marginal or joint MGF of age of gossip in networks.

*Analysis of the stationary marginal or joint MGF of age of gossip in three different network topologies*. We apply our developed SHS-based methods to study the marginal or joint distributional properties of age processes in the following three network topologies: (1) a serially-connected topology, (2) a parallelly-connected topology, and (3) a clustered topology. For each of these topologies, we derive closed-form expressions for (1) the stationary marginal MGF of the age process at each node and (2) the stationary joint MGFs of all possible pairwise combinations of the age processes.

*System design insights*. Using the MGF expressions derived for each network topology considered in this paper, we obtain closed-form expressions for the following quantities: (1) the stationary marginal first and second moments of each age process, (2) the variance of each age process, and (3) the correlation coefficients between all possible pairwise combinations of the age processes. For these derived quantities, we characterize their structural properties in terms of their convexity and monotonic nature with respect to the status updating rates and further provide asymptotic results showing their behaviors when each of the status updating rates becomes small or large. A key insight drawn from our analysis is that it is crucial to incorporate the higher-order moments of age processes in the implementation or optimization of age-aware gossip networks rather than just relying on the average values of the age processes (as has been performed in the existing literature thus far). This insight promotes the importance of the SHS-based methods developed in this paper for the characterization of the marginal or joint MGFs of different age processes in a general setting of gossip networks.

#### *1.3. Organization*

The rest of this paper is organized as follows. Section 2 presents the system model and the problem statement. Afterward, in Section 3, we develop the SHS-based methods that allow the evaluation of the stationary marginal or joint high-order moments of the age processes in gossip networks through the characterization of their stationary marginal or joint MGFs. Section 4 applies the SHS-based methods developed in Section 3 to derive the marginal or joint MGFs of age processes at different nodes in three different connected network settings. For each considered connected network setting, we further use the derived MGF expressions to obtain the marginal or joint high-order statistics of age processes such as the variance of each age process and the correlation coefficients between all possible pairwise combinations of the age processes. Finally, Section 5 concludes the paper.

#### **2. System Model and Problem Statement**

We consider a general setting of gossip networks where a source node (referred to as node 0) provides its measurements about some observed physical process for a set of nodes $\mathcal{N} = \{1, 2, \cdots, N\}$ in the form of status updates. In particular, all the nodes in $\mathcal{N}$ are tracking the age of the process observed by the source, and the status updates sent by node 0 to node $j \in \mathcal{N}$ are assumed to follow an independent Poisson process with a rate $\lambda_{0j}$. Aside from that, node $i \in \mathcal{N}$ sends updates about its information status (about the process observed by the source) to each node $j \in \mathcal{N} \setminus \{i\}$ according to an independent Poisson process with a rate $\lambda_{ij}$. When $\lambda_{ij} > 0$, we say that nodes $i$ and $j$ are connected to each other. Since we allow each $\lambda_{ij}$ ($i \in \{0\} \cup \mathcal{N}$ and $j \in \mathcal{N}$) to take a value in $[0, \infty]$, we refer to the above setting as an arbitrarily connected gossip network. Note that this gossip network setting is of interest in many practical networks, such as low-latency vehicular networks and UAV-assisted communication networks. The freshness of the information available at each node is quantified in terms of AoI. Let $x_i(t)$ denote the AoI process (or equivalently the age process) at node $i \in \mathcal{N}$. Assuming that node 0 always maintains a fresh status of information about the observed physical process, the age or AoI at node $j \in \mathcal{N}$ is reset to zero whenever it receives a status update from node 0. Furthermore, when node $j \in \mathcal{N}$ receives a status update from node $i \in \mathcal{N} \setminus \{j\}$ at time $t$, its age $x_j(t)$ is reset to the age of node $i$, $x_i(t)$, only if $x_i(t)$ is smaller than $x_j(t)$. To summarize, when node $j \in \mathcal{N}$ receives a status update from node $i \in \{0\} \cup \mathcal{N}$, the age at node $k \in \mathcal{N}$ is updated as follows:

$$x_k'(t) = \begin{cases} 0, & \text{if } i = 0 \text{ and } k = j, \\ \min\left[x_j(t), x_i(t)\right], & \text{if } i \in \mathcal{N} \text{ and } k = j, \\ x_k(t), & \text{otherwise,} \end{cases} \tag{1}$$
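The updating rule (1) translates directly into code. The sketch below is illustrative only: it represents the continuous state as a dict of ages (an assumed data layout, not from the paper) and applies a single transition.

```python
def apply_update(x, i, j):
    """Apply age updating rule (1): node j receives a status update
    from node i.

    x : dict mapping monitoring node -> current age; node 0 (the source)
        is always fresh and is not stored in x.
    """
    x = dict(x)  # leave the caller's state untouched
    if i == 0:
        x[j] = 0.0              # fresh update from the source resets the age
    else:
        x[j] = min(x[j], x[i])  # gossip: keep the fresher of the two ages
    return x

ages = {1: 3.0, 2: 5.0}
after = apply_update(ages, 1, 2)  # node 2 hears from node 1
```

Note that an update from a staler node (larger age) leaves the receiver's age unchanged, exactly as the `min` in (1) prescribes.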

For an arbitrary set $S \subseteq \mathcal{N}$, define $x_S(t) = \min_{i \in S} x_i(t)$ as the age or AoI process associated with $S$ (or simply the age or AoI of $S$). For the above gossip network setting, the method developed in [19] has been limited to the characterization of the stationary marginal first moment of $x_S(t)$ (i.e., the stationary average value of $x_S(t)$). In this paper, our prime objective is to develop a method that allows characterizing (1) the stationary marginal higher-order moments of $x_S(t)$ and (2) the stationary joint high-order moments of the two age processes associated with two arbitrary sets $S_1$ and $S_2$ (i.e., $x_{S_1}(t)$ and $x_{S_2}(t)$, respectively). Note that we do not place any restrictions on the construction of $S_1$ or $S_2$. For instance, they could even have common elements. Formally, we aim at characterizing the stationary marginal MGF of $x_S(t)$ and the stationary joint MGF of $x_{S_1}(t)$ and $x_{S_2}(t)$, which are of the following forms: $\lim_{t \to \infty} \mathbb{E}[\exp[n x_S(t)]]$ and $\lim_{t \to \infty} \mathbb{E}\left[\exp\left[n_1 x_{S_1}(t) + n_2 x_{S_2}(t)\right]\right]$, respectively, where $n, n_1, n_2 \in \mathbb{R}$ and $S, S_1, S_2 \subseteq \mathcal{N}$. As will be evident from the technical sections shortly, the characterization of such MGFs allows one to derive the marginal or joint high-order statistics of the AoI processes at different nodes in the network, such as the variance of each AoI process and the correlation coefficients between all possible pairwise combinations of the AoI processes. Given the generality of the system setting considered in this paper, the importance of our method lies in the fact that it is applicable to the marginal or joint analysis of AoI processes for an arbitrarily structured gossip network setting.

#### **3. MGF Analysis of Age in Arbitrarily Connected Gossip Networks**

In this section, we first formulate the problem at hand as an SHS. We then use the SHS framework to characterize (1) the stationary marginal MGF of the age process associated with an arbitrary set $S \subseteq \mathcal{N}$ (i.e., $x_S(t)$) and (2) the stationary joint MGF of the two age processes associated with two arbitrary sets $S_1 \subseteq \mathcal{N}$ and $S_2 \subseteq \mathcal{N}$ (i.e., $x_{S_1}(t)$ and $x_{S_2}(t)$, respectively) for the arbitrarily connected gossip network setting described in Section 2.

The SHS framework is used to analyze hybrid queueing systems that can be modeled by a combination of discrete and continuous state parameters. For the gossip network setting considered in this paper, the continuous state of the system is modeled using the row vector $\mathbf{x}(t) = [x_1(t)\ x_2(t)\ \cdots\ x_N(t)]$ containing the AoI or age processes at different nodes in the network. Furthermore, since the status updates sent by each node in the network to the other nodes are assumed to follow independent Poisson processes, it is sufficient to model the discrete state of the system as a singleton set. To complete the description of an SHS, one needs to define a set of transitions $\mathcal{L}$ along with the continuous and discrete states of the system. This set $\mathcal{L}$ refers to changes in either the continuous state or the discrete state. Since the discrete state of the SHS under consideration is a singleton set, the set $\mathcal{L}$ corresponds to only the changes in the continuous state of the system. In our system setting, a change in the continuous state of the system occurs when there is a status update reception at some node in the network. Furthermore, as long as there is no status update reception at any of the nodes, the AoI or age at each node grows linearly with time (which yields piecewise linear age processes over time); in other words, $\dot{\mathbf{x}}(t) = \mathbf{1}_N$, where $\mathbf{1}_N$ is the row vector $[1 \cdots 1] \in \mathbb{R}^{1 \times N}$. By inspecting the age updating rule in (1), the set $\mathcal{L}$ can be defined as follows:

$$\mathcal{L} = \{(0, j) : j \in \mathcal{N}\} \cup \{(i, j) : i, j \in \mathcal{N}\}.\tag{2}$$

For the above SHS-based formulation, we derive two systems of linear equations for evaluating the stationary marginal MGF $\lim_{t \to \infty} \mathbb{E}[\exp[n x_S(t)]]$ and the stationary joint MGF $\lim_{t \to \infty} \mathbb{E}\left[\exp\left[n_1 x_{S_1}(t) + n_2 x_{S_2}(t)\right]\right]$. The description of these systems of equations and the presentation of the subsequent results require defining the following quantities:

$$v_S^{(n)}(t) = \mathbb{E}[\exp[n x_S(t)]],\ \bar{v}_S^{(n)} = \lim_{t \to \infty} v_S^{(n)}(t),\ \forall S \subseteq \mathcal{N}.\tag{3}$$

$$v_{S_1, S_2}^{(n_1, n_2)}(t) = \mathbb{E}\left[\exp\left[n_1 x_{S_1}(t) + n_2 x_{S_2}(t)\right]\right],\ \bar{v}_{S_1, S_2}^{(n_1, n_2)} = \lim_{t \to \infty} v_{S_1, S_2}^{(n_1, n_2)}(t),\ \forall S_1, S_2 \subseteq \mathcal{N},\tag{4}$$

$$v_S^{(m)}(t) = \mathbb{E}[x_S^m(t)],\ \bar{v}_S^{(m)} = \lim_{t \to \infty} v_S^{(m)}(t),\ \forall S \subseteq \mathcal{N},\tag{5}$$

$$v_{S_1, S_2}^{(m_1, m_2)}(t) = \mathbb{E}\left[x_{S_1}^{m_1}(t)\, x_{S_2}^{m_2}(t)\right],\ \bar{v}_{S_1, S_2}^{(m_1, m_2)} = \lim_{t \to \infty} v_{S_1, S_2}^{(m_1, m_2)}(t),\ \forall S_1, S_2 \subseteq \mathcal{N},\tag{6}$$

where $\bar{v}_S^{(m)}$ is the marginal $m$-th moment of the age process $x_S(t)$ and $\bar{v}_{S_1, S_2}^{(m_1, m_2)}$ is the joint moment of the two age processes $x_{S_1}(t)$ and $x_{S_2}(t)$. From (3) and (5), $v_S^{(1)}(t)$ may generally refer to $v_S^{(n)}(t)|_{n=1}$ or $v_S^{(m)}(t)|_{m=1}$. To eliminate this conflict, the convention that $v_S^{(i)}(t)$ for an integer $i$ refers to $v_S^{(m)}(t)$ at $m = i$ is maintained here. The previous argument also applies to $v_{S_1, S_2}^{(n_1, n_2)}(t)$ and $v_{S_1, S_2}^{(m_1, m_2)}(t)$ in (4) and (6), respectively, where $v_{S_1, S_2}^{(i, j)}(t)$, for integers $i$ and $j$, refers to $v_{S_1, S_2}^{(m_1, m_2)}(t)$ at $m_1 = i$ and $m_2 = j$. Furthermore, following the notations in [19], we define the update rate of node $i$ into set $S$ and the set of updating neighbors of $S$ as

$$
\lambda_i(S) = \begin{cases}
\sum_{j \in S} \lambda_{ij}, & \text{if } i \notin S, \\
0, & \text{otherwise,}
\end{cases}
\tag{7}
$$

$$N(S) = \{ i \in \mathcal{N} : \lambda_i(S) > 0 \}. \tag{8}$$
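Definitions (7) and (8) can be computed directly from a table of gossip rates. The helpers below are an illustrative sketch (the nested-dict rate layout and function names are our own, not from the paper):

```python
def update_rate(i, S, lam):
    """lambda_i(S) in (7): total rate at which node i updates nodes in S.

    lam[i][j] holds the gossip rate lambda_ij; missing entries mean rate 0.
    """
    if i in S:
        return 0.0
    return sum(lam.get(i, {}).get(j, 0.0) for j in S)

def updating_neighbors(S, N, lam):
    """N(S) in (8): the nodes with a positive update rate into S."""
    return {i for i in N if update_rate(i, S, lam) > 0.0}

# Example: node 1 gossips to node 2 at rate 3
lam = {1: {2: 3.0}}
N = {1, 2}
rate = update_rate(1, {2}, lam)          # lambda_1({2}) = 3.0
nbrs = updating_neighbors({2}, N, lam)   # N({2}) = {1}
```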

We are now ready to present the two systems of linear equations for the evaluation of $\bar{v}_S^{(n)}$ and $\bar{v}_{S_1, S_2}^{(n_1, n_2)}$ in the following two theorems:

**Theorem 1.** *For an arbitrarily connected gossip network, there exists a threshold δ* > 0 *such that for n* ∈ [0, *δ*)*, the stationary marginal MGF of AoI of set S* ⊆ N *is given by*

$$
\bar{v}_S^{(n)} = \frac{\lambda_0(S) + \sum_{i \in N(S)} \lambda_i(S)\, \bar{v}_{S \cup \{i\}}^{(n)}}{\lambda_0(S) + \sum_{i \in N(S)} \lambda_i(S) - n}. \tag{9}
$$

*Furthermore, for m* ≥ 1*, the stationary marginal m-th moment of AoI of set S* ⊆ N *is given by*

$$\bar{v}_S^{(m)} = \frac{m\, \bar{v}_S^{(m-1)} + \sum_{i \in N(S)} \lambda_i(S)\, \bar{v}_{S \cup \{i\}}^{(m)}}{\lambda_0(S) + \sum_{i \in N(S)} \lambda_i(S)}.\tag{10}$$

#### **Proof of Theorem 1.** See Appendix A.
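Equation (10) can be evaluated by a finite recursion: each term on the right either lowers $m$ or strictly enlarges $S$, so the recursion bottoms out at $m = 0$ (where the moment is 1) and at sets that no outside node updates. A minimal Python sketch of this evaluation (the nested-dict rate layout is an assumed data structure, not from the paper) is:

```python
def stationary_moment(S, m, lam0, lam, N):
    """Evaluate the recursion in (10) for the stationary m-th moment
    of the age process x_S(t) in a gossip network.

    lam0[j]   : rate lambda_0j from the source (node 0) to node j
    lam[i][j] : gossip rate lambda_ij from node i to node j
    N         : set of all monitoring nodes
    """
    S = set(S)
    if m == 0:
        return 1.0  # the zeroth moment of any age process is 1
    lam0_S = sum(lam0.get(j, 0.0) for j in S)
    # lambda_i(S) per (7) for nodes outside S; keep the neighbors N(S) of (8)
    rates = {i: sum(lam.get(i, {}).get(j, 0.0) for j in S) for i in N - S}
    rates = {i: r for i, r in rates.items() if r > 0.0}
    num = m * stationary_moment(S, m - 1, lam0, lam, N)
    num += sum(r * stationary_moment(S | {i}, m, lam0, lam, N)
               for i, r in rates.items())
    return num / (lam0_S + sum(rates.values()))

# Serially connected example (Figure 1a): source -> node 1 -> node 2
lam0 = {1: 2.0}       # lambda_0 = 2
lam = {1: {2: 3.0}}   # lambda = 3
N = {1, 2}
mean1 = stationary_moment({1}, 1, lam0, lam, N)  # 1/lambda_0
mean2 = stationary_moment({2}, 1, lam0, lam, N)  # 1/lambda_0 + 1/lambda
```

On the serial example, the recursion reproduces the closed forms obtained later in Section 4 (e.g., a mean age of $1/\lambda_0$ at node 1).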

**Theorem 2.** *For an arbitrarily connected gossip network, there exists a threshold δ* > 0 *such that for* 0 ≤ *n*<sup>1</sup> + *n*<sup>2</sup> < *δ, the stationary joint MGF of the two AoI processes associated with the two sets S*<sup>1</sup> *and S*<sup>2</sup> *is given by*

$$\begin{aligned}
\bar{v}_{S_1, S_2}^{(n_1, n_2)} = {} & \frac{1}{\lambda_0(S_1 \cup S_2) + \sum_{i \in \mathcal{N} \setminus (S_1 \cap S_2)} \lambda_i(S_1 \cap S_2) + \sum_{i \in \mathcal{N} \setminus S_1} \lambda_i(S_1 \setminus S_2) + \sum_{i \in \mathcal{N} \setminus S_2} \lambda_i(S_2 \setminus S_1) - (n_1 + n_2)} \\
& \times \Bigg[ \lambda_0(S_1 \cap S_2) + \lambda_0(S_1 \setminus S_2)\, \bar{v}_{S_2}^{(n_2)} + \lambda_0(S_2 \setminus S_1)\, \bar{v}_{S_1}^{(n_1)} + \sum_{i \in \mathcal{N} \setminus S_1} \lambda_i(S_1 \setminus S_2)\, \bar{v}_{S_1 \cup \{i\}, S_2}^{(n_1, n_2)} \\
& + \sum_{i \in \mathcal{N} \setminus S_2} \lambda_i(S_2 \setminus S_1)\, \bar{v}_{S_1, S_2 \cup \{i\}}^{(n_1, n_2)} + \sum_{i \in \mathcal{N} \setminus (S_1 \cup S_2)} \lambda_i(S_1 \cap S_2)\, \bar{v}_{S_1 \cup \{i\}, S_2 \cup \{i\}}^{(n_1, n_2)} \\
& + \sum_{i \in S_1 \setminus S_2} \lambda_i(S_1 \cap S_2)\, \bar{v}_{S_1, S_2 \cup \{i\}}^{(n_1, n_2)} + \sum_{i \in S_2 \setminus S_1} \lambda_i(S_1 \cap S_2)\, \bar{v}_{S_1 \cup \{i\}, S_2}^{(n_1, n_2)} \Bigg].
\end{aligned} \tag{11}$$

*Furthermore, for m*1, *m*<sup>2</sup> ≥ 1*, the stationary joint* (*m*1, *m*2)*-th moment of the AoI processes associated with the two sets S*<sup>1</sup> *and S*<sup>2</sup> *is given by*

$$\begin{aligned}
\bar{v}_{S_1, S_2}^{(m_1, m_2)} = {} & \frac{1}{\lambda_0(S_1 \cup S_2) + \sum_{i \in \mathcal{N} \setminus (S_1 \cap S_2)} \lambda_i(S_1 \cap S_2) + \sum_{i \in \mathcal{N} \setminus S_1} \lambda_i(S_1 \setminus S_2) + \sum_{i \in \mathcal{N} \setminus S_2} \lambda_i(S_2 \setminus S_1)} \\
& \times \Bigg[ m_1\, \bar{v}_{S_1, S_2}^{(m_1 - 1, m_2)} + m_2\, \bar{v}_{S_1, S_2}^{(m_1, m_2 - 1)} + \sum_{i \in \mathcal{N} \setminus S_1} \lambda_i(S_1 \setminus S_2)\, \bar{v}_{S_1 \cup \{i\}, S_2}^{(m_1, m_2)} + \sum_{i \in \mathcal{N} \setminus S_2} \lambda_i(S_2 \setminus S_1)\, \bar{v}_{S_1, S_2 \cup \{i\}}^{(m_1, m_2)} \\
& + \sum_{i \in \mathcal{N} \setminus (S_1 \cup S_2)} \lambda_i(S_1 \cap S_2)\, \bar{v}_{S_1 \cup \{i\}, S_2 \cup \{i\}}^{(m_1, m_2)} + \sum_{i \in S_1 \setminus S_2} \lambda_i(S_1 \cap S_2)\, \bar{v}_{S_1, S_2 \cup \{i\}}^{(m_1, m_2)} + \sum_{i \in S_2 \setminus S_1} \lambda_i(S_1 \cap S_2)\, \bar{v}_{S_1 \cup \{i\}, S_2}^{(m_1, m_2)} \Bigg].
\end{aligned} \tag{12}$$

**Proof of Theorem 2.** See Appendix B.

**Remark 1.** *Note that the stationary marginal MGF of $S_1$ or $S_2$ can be obtained from the stationary joint MGF in (11). In particular, when $n_2 = 0$ and $S_2 = \varnothing$, $\bar{v}_{S_1, S_2}^{(n_1, n_2)}$ reduces to*

$$
\bar{v}_{S_1, \varnothing}^{(n_1, 0)} = \frac{\lambda_0(S_1) + \sum_{i \in N(S_1)} \lambda_i(S_1)\, \bar{v}_{S_1 \cup \{i\}}^{(n_1)}}{\lambda_0(S_1) + \sum_{i \in N(S_1)} \lambda_i(S_1) - n_1} \stackrel{(a)}{=} \bar{v}_{S_1}^{(n_1)},\tag{13}
$$

*where step (a) follows from (9). Similarly, one can observe that $\bar{v}_{\varnothing, S_2}^{(0, n_2)} = \bar{v}_{S_2}^{(n_2)}$.*

*Furthermore, when m* = 1*, (10) reduces to ([19] Theorem 1) characterizing the stationary marginal first moment of the AoI of set S* ⊆ N *.*

It is worth highlighting that the generality of Theorems 1 and 2 lies in the fact that they allow one to investigate the stationary marginal or joint MGFs of the age processes at different nodes in an arbitrarily connected gossip network. This opens the door for the application of Theorems 1 and 2 to characterize the marginal or joint high-order moments of age processes for different configurations or topologies of gossip networks studied in the literature, which have only been analyzed in terms of the marginal first moments of age processes (i.e., average age values) until now. Furthermore, the expressions in (10) and (12) provide a straightforward way for the numerical evaluation of the stationary marginal or joint high-order moments.

#### **4. Applications of Theorems 1 and 2**

In this section, we first apply Theorems 1 and 2 to understand the distributional properties of the age processes in the two canonical settings depicted in Figure 1 (i.e., the serially-connected and parallelly-connected network settings). We then analyze a more complicated network setting, chosen to be the clustered gossip network topology depicted in Figure 2. Our choice of the clustered gossip network setting was inspired by the recent interest in analyzing its different topologies in terms of the marginal first moment of each age process (average age) in the network [35].

**Figure 1.** (**a**) A serially-connected network setting. (**b**) A parallelly-connected network setting.

**Figure 2.** A clustered gossip network topology consisting of *C* clusters such that the status updating rate from node 0 to the *c*-th cluster is *λc*.

#### *4.1. Serially-Connected Networks*

**Theorem 3.** *For the serially-connected network in Figure 1a, the stationary marginal MGFs of the AoI processes at nodes 1 and 2 are respectively given by*

$$
\bar{v}^{(n)}_{\{1\}} = \frac{\lambda_0}{\lambda_0 - n}, \tag{14}
$$

$$
\bar{v}^{(n)}_{\{2\}} = \frac{\lambda_0 \lambda}{(\lambda_0 - n)(\lambda - n)}. \tag{15}
$$

*Additionally, the stationary joint MGF of the two AoI processes at nodes 1 and 2 is given by*

$$
\bar{v}^{(n_1,n_2)}_{\{2\},\{1\}} = \frac{\lambda_0 \lambda}{\lambda_0 + \lambda - (n_1 + n_2)} \left( \frac{\lambda_0}{(\lambda_0 - n_1)(\lambda - n_1)} + \frac{1}{\lambda_0 - (n_1 + n_2)} \right). \tag{16}
$$

**Proof of Theorem 3.** See Appendix C.

**Proposition 1.** *For the serially-connected network in Figure 1a, the first moment, second moment, and variance of the AoI process at each node are given by*

$$
\bar{v}^{(1)}_{\{1\}} = \lambda_0^{-1}, \; \bar{v}^{(2)}_{\{1\}} = 2\lambda_0^{-2}, \; \text{var}[\mathbf{x}_1(t)] = \lambda_0^{-2}, \tag{17}
$$

$$
\bar{v}^{(1)}_{\{2\}} = \frac{1}{\lambda_0} + \frac{1}{\lambda}, \quad \bar{v}^{(2)}_{\{2\}} = 2\left(\frac{1}{\lambda_0^2} + \frac{1}{\lambda_0 \lambda} + \frac{1}{\lambda^2}\right), \quad \text{var}[\mathbf{x}_2(t)] = \frac{1}{\lambda_0^2} + \frac{1}{\lambda^2}. \tag{18}
$$

*Furthermore, the correlation coefficient between the AoI processes at nodes 1 and 2 can be expressed as*

$$\text{cor}[\mathbf{x}\_1(t), \mathbf{x}\_2(t)] = \frac{\lambda^2}{(\lambda\_0 + \lambda)\sqrt{\lambda\_0^2 + \lambda^2}}.\tag{19}$$

**Proof of Proposition 1.** See Appendix D.

**Remark 2.** *Note that the expressions of the stationary marginal MGFs in Theorem 3 and the stationary marginal moments in Proposition 1 match their corresponding ones for the preemptive line networks analyzed in [15].*

**Remark 3.** *Note that the stationary moments and variance of the age process at node 1 in (17) are univariate functions of λ*0*. This happens because node 1 is directly connected to node 0. This argument will also apply to: (i) the expressions derived for the age processes at nodes 1 and 2 in the parallelly-connected network in Figure 1b, and (ii) the expressions derived for the age process at node 1 inside each cluster of the clustered gossip network in Figure 2.*

**Remark 4.** *Note that the stationary moments and variance of the age process at node 2 in (18) are invariant to exchanging $\lambda$ and $\lambda_0$. These quantities are also jointly convex functions of $(\lambda_0, \lambda)$, and each approaches its infimum (zero) only in the limit $\lambda_0, \lambda \to \infty$. Furthermore, for a given $\lambda$ or $\lambda_0$, each quantity in (18) is a monotonically non-increasing function of $\lambda_0$ or $\lambda$, respectively. This can also be observed in Figure 3.*

**Remark 5.** *For a given $\lambda$, $\text{cor}[x_1(t), x_2(t)]$ in (19) monotonically decreases as a function of $\lambda_0$, from $\lim_{\lambda_0\to 0} \text{cor}[x_1(t), x_2(t)] = 1$ down to $\lim_{\lambda_0\to\infty} \text{cor}[x_1(t), x_2(t)] = 0$. On the other hand, for a given $\lambda_0$, $\text{cor}[x_1(t), x_2(t)]$ monotonically increases as a function of $\lambda$, from $\lim_{\lambda\to 0} \text{cor}[x_1(t), x_2(t)] = 0$ up to $\lim_{\lambda\to\infty} \text{cor}[x_1(t), x_2(t)] = 1$. This can also be observed in Figure 4.*
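The closed forms in Proposition 1 can be sanity-checked by direct event-driven simulation of the serial network. The Python sketch below is our own illustrative code, not from the paper: both ages grow at unit rate, a rate-$\lambda_0$ event resets $x_1$ to zero (a fresh source update), a rate-$\lambda$ event copies $x_1$ into $x_2$, and exact time integrals of the age trajectories between events yield the stationary moment estimates.

```python
import random

def simulate_serial(lam0, lam, n_events=500_000, seed=7):
    """Monte Carlo estimates of E[x1], E[x2], var[x1], var[x2], cor(x1, x2)
    for the serial gossip network 0 -> 1 -> 2 (Proposition 1)."""
    rng = random.Random(seed)
    x1 = x2 = 0.0
    T = s1 = s2 = q1 = q2 = s12 = 0.0
    total = lam0 + lam
    for _ in range(n_events):
        tau = rng.expovariate(total)
        # exact integrals of x1, x2, x1^2, x2^2, x1*x2 over the interval [0, tau]
        s1 += x1 * tau + tau * tau / 2
        s2 += x2 * tau + tau * tau / 2
        q1 += ((x1 + tau) ** 3 - x1 ** 3) / 3
        q2 += ((x2 + tau) ** 3 - x2 ** 3) / 3
        s12 += x1 * x2 * tau + (x1 + x2) * tau * tau / 2 + tau ** 3 / 3
        T += tau
        x1 += tau
        x2 += tau
        if rng.random() < lam0 / total:
            x1 = 0.0          # source refreshes node 1
        else:
            x2 = x1           # node 1 gossips its (older) value to node 2
    m1, m2 = s1 / T, s2 / T
    v1, v2 = q1 / T - m1 ** 2, q2 / T - m2 ** 2
    cor = (s12 / T - m1 * m2) / (v1 * v2) ** 0.5
    return m1, m2, v1, v2, cor
```

With $\lambda_0 = 1$ and $\lambda = 2$, the returned estimates should fall close to the values $1$, $1.5$, $1$, $1.25$, and $\lambda^2/((\lambda_0+\lambda)\sqrt{\lambda_0^2+\lambda^2})$ predicted by (17)–(19), up to Monte Carlo noise.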

**Figure 3.** Stationary first and second moments of age processes in the serially and parallelly-connected network settings. We set *λ<sup>s</sup>* = 0.5*λ*<sup>0</sup> and *λ* = *λ*<sup>1</sup> = *λ*2. The simulated curves are obtained from the numerical evaluation of the stationary marginal moments using (10) in Theorem 1.

**Figure 4.** Correlation coefficients between age processes in the serially and parallelly-connected network settings.

#### *4.2. Parallelly-Connected Networks*

**Theorem 4.** *For the parallelly-connected network in Figure 1b, the stationary marginal MGFs of the AoI processes at nodes 1, 2, and 3 are given by*

$$
\bar{v}^{(n)}_{\{1\}} = \bar{v}^{(n)}_{\{2\}} = \frac{\lambda_s}{\lambda_s - n}, \tag{20}
$$

$$
\bar{v}^{(n)}_{\{3\}} = \frac{\lambda_s (2\lambda_s - n) \left[\lambda_1 (\lambda_s + \lambda_1 - n) + \lambda_2 (\lambda_s + \lambda_2 - n)\right] + 2\lambda_s \lambda_1 \lambda_2 (2\lambda_s + \lambda_1 + \lambda_2 - 2n)}{(2\lambda_s - n)(\lambda_1 + \lambda_2 - n)(\lambda_s + \lambda_1 - n)(\lambda_s + \lambda_2 - n)}. \tag{21}
$$

*Additionally, the stationary joint MGF of the two AoI processes at nodes 1 and 3 is given by*

$$
\begin{split}
\bar{v}^{(n_1,n_2)}_{\{3\},\{1\}} = {} & \frac{\sum_{i=1}^{4} \alpha_i(n_1, n_2)}{[\lambda_s + \lambda_1 + \lambda_2 - (n_1 + n_2)][2\lambda_s + \lambda_1 - (n_1 + n_2)][2\lambda_s - (n_1 + n_2)][\lambda_s + \lambda_2 - (n_1 + n_2)]} \\
& \times \frac{1}{(\lambda_s - n_2)(\lambda_1 + \lambda_2 - n_1)(2\lambda_s - n_1)(\lambda_s + \lambda_2 - n_1)(\lambda_s + \lambda_1 - n_1)},
\end{split} \tag{22}
$$

*where*

$$
\begin{split}
\alpha_1(n_1, n_2) = {} & \lambda_s^2 (\lambda_s - n_2) [\lambda_s + \lambda_2 - (n_1 + n_2)] [2\lambda_s + \lambda_1 - (n_1 + n_2)] [2\lambda_s - (n_1 + n_2)] \\
& \times \left[ (2\lambda_s - n_1) \left[ \lambda_1 (\lambda_s + \lambda_1 - n_1) + \lambda_2 (\lambda_s + \lambda_2 - n_1) \right] + 2\lambda_1 \lambda_2 (2\lambda_s + \lambda_1 + \lambda_2 - 2n_1) \right],
\end{split} \tag{23}
$$

$$
\alpha_2(n_1, n_2) = \lambda_s^2 \lambda_2 (\lambda_1 + \lambda_2 - n_1)(2\lambda_s - n_1)(\lambda_s + \lambda_1 - n_1)(\lambda_s + \lambda_2 - n_1)[2\lambda_s + \lambda_1 - (n_1 + n_2)][\lambda_s + \lambda_1 + \lambda_2 - (n_1 + n_2)], \tag{24}
$$

$$
\alpha_3(n_1, n_2) = \lambda_s^2 \lambda_2 (\lambda_1 + \lambda_2 - n_1)(\lambda_s + \lambda_2 - n_1)(2\lambda_s + 2\lambda_1 - n_1)(\lambda_s - n_2)[\lambda_s + \lambda_2 - (n_1 + n_2)][2\lambda_s - (n_1 + n_2)], \tag{25}
$$

$$
\begin{split}
\alpha_4(n_1, n_2) = {} & \lambda_s \lambda_1 (\lambda_s - n_2)(\lambda_1 + \lambda_2 - n_1)(2\lambda_s - n_1)(\lambda_s + \lambda_2 - n_1)(\lambda_s + \lambda_1 - n_1) \\
& \times \left[ [2\lambda_s + \lambda_1 - (n_1 + n_2)][2\lambda_s + \lambda_2 - (n_1 + n_2)] + \lambda_2 [\lambda_s + \lambda_2 - (n_1 + n_2)] \right].
\end{split} \tag{26}
$$

**Proof of Theorem 4.** See Appendix E.

**Proposition 2.** *For the parallelly-connected network in Figure 1b, the first moment, second moment, and variance of the AoI process at each node are given by*

$$
\bar{v}^{(1)}_{\{1\}} = \bar{v}^{(1)}_{\{2\}} = \lambda_s^{-1}, \; \bar{v}^{(2)}_{\{1\}} = \bar{v}^{(2)}_{\{2\}} = 2\lambda_s^{-2}, \; \text{var}[\mathbf{x}_1(t)] = \text{var}[\mathbf{x}_2(t)] = \lambda_s^{-2}, \tag{27}
$$

$$
\bar{v}^{(1)}_{\{3\}} = \frac{2\lambda_s(\lambda_s + \lambda_1)(\lambda_s + \lambda_2) + \lambda_1(2\lambda_s + \lambda_2)(\lambda_s + \lambda_1) + \lambda_2(2\lambda_s + \lambda_1)(\lambda_s + \lambda_2)}{2\lambda_s(\lambda_s + \lambda_1)(\lambda_s + \lambda_2)(\lambda_1 + \lambda_2)}, \tag{28}
$$

$$
\bar{v}^{(2)}_{\{3\}} = \frac{\sum_{i=0}^{6} \gamma_i \lambda_s^i}{2\lambda_s^2 (\lambda_1 + \lambda_2)^2 (\lambda_s + \lambda_1)^2 (\lambda_s + \lambda_2)^2}, \tag{29}
$$

$$
\text{var}[\mathbf{x}_3(t)] = \frac{\sum_{i=0}^{6} \eta_i \lambda_s^i}{4\lambda_s^2 (\lambda_1 + \lambda_2)^2 (\lambda_s + \lambda_1)^2 (\lambda_s + \lambda_2)^2}, \tag{30}
$$

*where*

$$
\begin{aligned}
\gamma_6 &= 4, \; \gamma_5 = 12(\lambda_1 + \lambda_2), \; \gamma_4 = 4\left[4(\lambda_1 + \lambda_2)^2 + \lambda_1\lambda_2\right], \; \gamma_3 = 12(\lambda_1 + \lambda_2)^3, \\
\gamma_2 &= (\lambda_1 + \lambda_2)^2 \left[4(\lambda_1 + \lambda_2)^2 + \lambda_1\lambda_2\right], \; \gamma_1 = 3\lambda_1\lambda_2(\lambda_1 + \lambda_2)^3, \; \gamma_0 = \lambda_1^2\lambda_2^2(\lambda_1 + \lambda_2)^2, \\
\eta_6 &= 4, \; \eta_5 = 8(\lambda_1 + \lambda_2), \; \eta_4 = 8\left[(\lambda_1 + \lambda_2)^2 + \lambda_1\lambda_2\right], \; \eta_3 = 4(\lambda_1 + \lambda_2)\left(2\lambda_1^2 + 3\lambda_1\lambda_2 + 2\lambda_2^2\right), \\
\eta_2 &= 2(\lambda_1 + \lambda_2)^2 \left(2\lambda_1^2 + \lambda_1\lambda_2 + 2\lambda_2^2\right), \; \eta_1 = 2\lambda_1\lambda_2(\lambda_1 + \lambda_2)^3, \; \eta_0 = \lambda_1^2\lambda_2^2(\lambda_1 + \lambda_2)^2.
\end{aligned}
$$

*Furthermore, the correlation coefficient between the AoI processes at nodes 1 and 3 can be expressed as*

$$
\begin{split}
\text{cor}[\mathbf{x}_1(t), \mathbf{x}_3(t)] = {} & \frac{\lambda_1(\lambda_1 + \lambda_2)}{2(\lambda_s + \lambda_1 + \lambda_2)(2\lambda_s + \lambda_1)(\lambda_s + \lambda_2)\sqrt{\sum_{i=0}^{6} \delta_i \lambda_s^i}} \\
& \times \left[ 8\lambda_s^4 + \lambda_s^3(12\lambda_1 + 7\lambda_2) + 2\lambda_s^2(\lambda_1 + 2\lambda_2)(2\lambda_1 + \lambda_2) + \lambda_s\lambda_2\left(3\lambda_1^2 + 5\lambda_1\lambda_2 + \lambda_2^2\right) + \lambda_1\lambda_2^2(\lambda_1 + \lambda_2)\right],
\end{split} \tag{31}
$$

*where*

$$
\begin{aligned}
\delta_6 &= 4, \; \delta_5 = 8(\lambda_1 + \lambda_2), \; \delta_4 = 8\left[(\lambda_1 + \lambda_2)^2 + \lambda_1\lambda_2\right], \; \delta_3 = 4(\lambda_1 + \lambda_2)\left(2\lambda_1^2 + 3\lambda_1\lambda_2 + 2\lambda_2^2\right), \\
\delta_2 &= 2(\lambda_1 + \lambda_2)^2 \left(2\lambda_1^2 + \lambda_1\lambda_2 + 2\lambda_2^2\right), \; \delta_1 = 2\lambda_1\lambda_2(\lambda_1 + \lambda_2)^3, \; \delta_0 = \lambda_1^2\lambda_2^2(\lambda_1 + \lambda_2)^2.
\end{aligned}
$$

**Proof of Proposition 2.** See Appendix F.

**Remark 6.** *When $\lambda_1$ or $\lambda_2$ is zero, the parallelly-connected network reduces to a serially-connected network with a single path from node 0 to node 3. Thus, in this case, the stationary moments and variance of the age process at node 3 reduce to the corresponding expressions associated with the age process at node 2 in the serially-connected network, with $\lambda_0$ and $\lambda$ replaced by $\lambda_s$ and $\lambda_1$ or $\lambda_2$. On the other hand, when $\lambda_1$ and $\lambda_2$ approach $\infty$, we have $\lim_{\lambda_1,\lambda_2\to\infty} \bar{v}^{(1)}_{\{3\}} = \frac{1}{2\lambda_s}$, $\lim_{\lambda_1,\lambda_2\to\infty} \bar{v}^{(2)}_{\{3\}} = \frac{1}{2\lambda_s^2}$, and $\lim_{\lambda_1,\lambda_2\to\infty} \text{var}[x_3(t)] = \frac{1}{4\lambda_s^2}$; i.e., the stationary moments and variance of $x_3(t)$ reduce to the ones associated with $x_{\{1,2\}}(t)$.*

**Remark 7.** *Note that the stationary moments and variance of the age process at node 3 in (28)–(30) are invariant to exchanging λ*<sup>1</sup> *and λ*2*. Furthermore, for a given* (*λs*, *λ*2)*,* (*λs*, *λ*1)*, or* (*λ*1, *λ*2)*, each quantity in (28)–(30) is a monotonically non-increasing function with respect to λ*1*, λ*2*, or λs. This can also be observed in Figure 3.*

**Remark 8.** *For the same status updating rate from node 0 (i.e., λ*<sup>0</sup> = 2*λs) and λ* = *λ*<sup>1</sup> = *λ*2*, one can compare the achievable age performance at node 3 in the parallelly-connected network with the achievable age performance at node 2 in the serially-connected network using Propositions 1 and 2 as follows:*

$$
\bar{v}^{(1)}_{\{2\}} - \bar{v}^{(1)}_{\{3\}} = \frac{\lambda_0}{2\lambda(\lambda_0 + 2\lambda)}, \tag{32}
$$

$$
\bar{v}^{(2)}_{\{2\}} - \bar{v}^{(2)}_{\{3\}} = \frac{3\lambda_0^2 + 4\left(\lambda^2 + 2\lambda_0\lambda\right)}{2\lambda^2 \left(\lambda_0 + 2\lambda\right)^2}, \tag{33}
$$

$$
\text{var}[\mathbf{x}_2(t)] - \text{var}[\mathbf{x}_3(t)] = \frac{3\lambda_0(\lambda_0 + 4\lambda)}{4\lambda^2(\lambda_0 + 2\lambda)^2}. \tag{34}
$$

*By inspecting (32)–(34), one can see that these quantities are positive for any choice of values of* (*λ*0, *λ*)*. This indicates that node 3 in the parallelly-connected network achieves a better age performance than node 2 in the serially-connected network. The improvement in the age performance at node 3 results from the existence of two status-updating paths from node 0 to node 3, as opposed to the single path from node 0 to node 2 in the serially-connected network. Furthermore, each quantity in (32)–(34) is a monotonically decreasing function of λ for a given λ*0*, and its value approaches zero as λ* → ∞*. This can also be observed in Figure 3.*
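A quick numerical evaluation of the gaps (32)–(34) illustrates both claims. The helper below (our own illustrative naming, not from the paper) returns the three differences for a given $(\lambda_0, \lambda)$ under the matched-budget comparison of Remark 8.

```python
def serial_minus_parallel(lam0, lam):
    """Gaps (32)-(34): serial node 2 minus parallel node 3 age statistics,
    under the matched budget lambda_0 = 2*lambda_s and lambda = lambda_1 = lambda_2.
    Returns (first-moment gap, second-moment gap, variance gap)."""
    d_mean = lam0 / (2 * lam * (lam0 + 2 * lam))                                  # (32)
    d_second = (3 * lam0 ** 2 + 4 * (lam ** 2 + 2 * lam0 * lam)) / (
        2 * lam ** 2 * (lam0 + 2 * lam) ** 2)                                     # (33)
    d_var = 3 * lam0 * (lam0 + 4 * lam) / (4 * lam ** 2 * (lam0 + 2 * lam) ** 2)  # (34)
    return d_mean, d_second, d_var
```

For any positive rates all three gaps evaluate to positive numbers, and increasing $\lambda$ for a fixed $\lambda_0$ shrinks each of them, consistent with the discussion above.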

**Remark 9.** *Due to the symmetry in the configuration of the parallelly-connected network, the correlation coefficient between $x_2(t)$ and $x_3(t)$ (i.e., $\text{cor}[x_2(t), x_3(t)]$) can be obtained by replacing $\lambda_1$ and $\lambda_2$ with $\lambda_2$ and $\lambda_1$, respectively, in (31). Furthermore, for a given $(\lambda_1, \lambda_2)$, $\text{cor}[x_1(t), x_3(t)]$ monotonically decreases as a function of $\lambda_s$ from $\lim_{\lambda_s\to 0} \text{cor}[x_1(t), x_3(t)] = \frac{1}{2}$ until it approaches $\lim_{\lambda_s\to\infty} \text{cor}[x_1(t), x_3(t)] = 0$. On the other hand, for a given $(\lambda_s, \lambda_2)$, $\text{cor}[x_1(t), x_3(t)]$ monotonically increases as a function of $\lambda_1$ from $\lim_{\lambda_1\to 0} \text{cor}[x_1(t), x_3(t)] = 0$ until it approaches $\lim_{\lambda_1\to\infty} \text{cor}[x_1(t), x_3(t)] = \frac{4\lambda_s^2 + 3\lambda_s\lambda_2 + \lambda_2^2}{2(\lambda_s + \lambda_2)\sqrt{4\lambda_s^2 + 2\lambda_s\lambda_2 + \lambda_2^2}}$. Finally, for a given $(\lambda_s, \lambda_1)$, one can deduce the following asymptotic results: $\lim_{\lambda_2\to 0} \text{cor}[x_1(t), x_3(t)] = \frac{\lambda_1^2}{(\lambda_s + \lambda_1)\sqrt{\lambda_s^2 + \lambda_1^2}}$ and $\lim_{\lambda_2\to\infty} \text{cor}[x_1(t), x_3(t)] = \frac{\lambda_1(\lambda_s + \lambda_1)}{2(2\lambda_s + \lambda_1)\sqrt{4\lambda_s^2 + 2\lambda_s\lambda_1 + \lambda_1^2}}$. Clearly, when $\lambda_2 = 0$, there is only a single status-updating path from node 0 to node 3 (through node 1), and hence $\text{cor}[x_1(t), x_3(t)]$ reduces to the same expression as $\text{cor}[x_1(t), x_2(t)]$ in (19) for the serially-connected network after replacing $\lambda_0$ and $\lambda$ with $\lambda_s$ and $\lambda_1$, respectively. Some of the above insights can also be seen in Figure 4.*

#### *4.3. Clustered Gossip Networks*

**Theorem 5.** *For the clustered gossip network in Figure 2, the stationary marginal MGFs of the AoI processes at nodes 1, 2, and 3 in the c-th cluster are respectively given by*

$$
\bar{v}^{(n)}_{\{1\}} = \frac{\lambda_c}{\lambda_c - n}, \tag{35}
$$

$$
\bar{v}^{(n)}_{\{2\}} = \frac{\lambda_c \lambda}{(\lambda_c - n)(\lambda - n)}, \tag{36}
$$

$$
\bar{v}^{(n)}_{\{3\}} = \frac{\lambda_c \lambda^2}{(\lambda_c - n)(\lambda - n)^2}. \tag{37}
$$

*Additionally, the stationary joint MGF of each pair of AoI processes at nodes 1, 2, and 3 is given by*

$$
\bar{v}^{(n_1,n_2)}_{\{1\},\{2\}} = \frac{\lambda_c \lambda \left[ (\lambda_c + \lambda - n_2)(\lambda_c - n_2) - \lambda_c n_1 \right]}{(\lambda_c - n_2)(\lambda - n_2) \left[ \lambda_c + \lambda - (n_1 + n_2) \right] \left[ \lambda_c - (n_1 + n_2) \right]}, \tag{38}
$$

$$
\bar{v}^{(n_1,n_2)}_{\{1\},\{3\}} = \frac{\lambda_c \lambda^2 [\lambda_c + 2\lambda - (n_1 + n_2)]^3 \left[ \lambda_c [\lambda_c - (n_1 + n_2)][\lambda_c + 2\lambda - n_1 - 2n_2] + (\lambda_c - n_2)(\lambda - n_2)^2 \right]}{(\lambda_c - n_2)(\lambda - n_2)^2 [\lambda_c - (n_1 + n_2)][\lambda_c + \lambda - (n_1 + n_2)]^2 [\lambda_c + 2\lambda - (n_1 + n_2)]^3}, \tag{39}
$$

$$
\bar{v}^{(n_1,n_2)}_{\{2\},\{3\}} = \frac{\lambda_c \lambda^2 \sum_{i=1}^{4} \beta_i(n_1, n_2)}{(\lambda_c - n_2)(\lambda - n_2)^2 [\lambda_c - (n_1 + n_2)][\lambda - (n_1 + n_2)][2\lambda - (n_1 + n_2)][\lambda_c + \lambda - (n_1 + n_2)]^2 [\lambda_c + 2\lambda - (n_1 + n_2)]^2}, \tag{40}
$$

*where*

$$
\beta_1(n_1, n_2) = (\lambda_c - n_2)(\lambda - n_2)^2 [\lambda_c + \lambda - (n_1 + n_2)]^2 [\lambda_c + 2\lambda - (n_1 + n_2)]^2, \tag{41}
$$

$$
\beta_2(n_1, n_2) = \lambda^2 (\lambda_c - n_2)(\lambda - n_2)^2 [\lambda - (n_1 + n_2)][3\lambda_c + 4\lambda - 3(n_1 + n_2)], \tag{42}
$$

$$
\beta_3(n_1, n_2) = \lambda (\lambda_c - n_2)(\lambda - n_2)^2 [\lambda - (n_1 + n_2)][\lambda_c - (n_1 + n_2)][\lambda_c + \lambda - (n_1 + n_2)], \tag{43}
$$

$$
\begin{split}
\beta_4(n_1, n_2) = {} & \lambda \lambda_c [\lambda - (n_1 + n_2)][\lambda_c - (n_1 + n_2)] \Big[ [\lambda_c + 2\lambda - (n_1 + n_2)]^2 [\lambda_c + \lambda - (n_1 + n_2)] \\
& + (\lambda - n_2)[\lambda_c + \lambda - (n_1 + n_2)]^2 + \lambda(\lambda - n_2)[2\lambda_c + 3\lambda - 2(n_1 + n_2)] \Big].
\end{split} \tag{44}
$$

**Proof of Theorem 5.** See Appendix G.

**Proposition 3.** *For the clustered gossip network in Figure 2, the first moment, second moment, and variance of the AoI process at each node in the c-th cluster are given by*

$$
\bar{v}^{(1)}_{\{1\}} = \lambda_c^{-1}, \; \bar{v}^{(2)}_{\{1\}} = 2\lambda_c^{-2}, \; \text{var}[\mathbf{x}_1(t)] = \lambda_c^{-2}, \tag{45}
$$

$$
\bar{v}^{(1)}_{\{2\}} = \lambda_c^{-1} + \lambda^{-1}, \quad \bar{v}^{(2)}_{\{2\}} = 2\left(\lambda_c^{-2} + \lambda_c^{-1}\lambda^{-1} + \lambda^{-2}\right), \quad \text{var}[\mathbf{x}_2(t)] = \lambda_c^{-2} + \lambda^{-2}, \tag{46}
$$

$$
\bar{v}^{(1)}_{\{3\}} = \lambda_c^{-1} + 2\lambda^{-1}, \; \bar{v}^{(2)}_{\{3\}} = 2\left(\lambda_c^{-2} + 2\lambda_c^{-1}\lambda^{-1} + 3\lambda^{-2}\right), \; \text{var}[\mathbf{x}_3(t)] = \lambda_c^{-2} + 2\lambda^{-2}. \tag{47}
$$

*Furthermore, the correlation coefficient between each pair of nodes can be expressed as*

$$\text{cor}[\mathbf{x}\_1(t), \mathbf{x}\_2(t)] = \frac{\lambda^2}{(\lambda\_c + \lambda)\sqrt{\lambda\_c^2 + \lambda^2}},\tag{48}$$

$$
\text{cor}[\mathbf{x}_1(t), \mathbf{x}_3(t)] = \frac{\lambda^3}{(\lambda_c + \lambda)^2 \sqrt{2\lambda_c^2 + \lambda^2}}, \tag{49}
$$

$$\text{cor}[\mathbf{x}\_2(t), \mathbf{x}\_3(t)] = \frac{\lambda\_c^4 + 2\lambda\_c^3\lambda + 2\lambda\_c^2\lambda^2 + 2\lambda\_c\lambda^3 + 2\lambda^4}{2(\lambda\_c + \lambda)^2\sqrt{(\lambda\_c^2 + \lambda^2)(2\lambda\_c^2 + \lambda^2)}}. \tag{50}$$

**Proof of Proposition 3.** See Appendix H.
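The correlation coefficients (48)–(50) are cheap to evaluate numerically; the snippet below (our own illustrative naming) computes all three for a given $(\lambda_c, \lambda)$ and can be used to reproduce the limiting values discussed in Remark 12 below, e.g. the $\frac{1}{2\sqrt{2}}$ floor of $\text{cor}[x_2(t), x_3(t)]$ as $\lambda_c \to \infty$.

```python
import math

def cluster_correlations(lam_c, lam):
    """Correlation coefficients (48)-(50) between the age processes at
    nodes 1, 2, and 3 inside cluster c of the clustered gossip network."""
    c12 = lam ** 2 / ((lam_c + lam) * math.sqrt(lam_c ** 2 + lam ** 2))        # (48)
    c13 = lam ** 3 / ((lam_c + lam) ** 2 * math.sqrt(2 * lam_c ** 2 + lam ** 2))  # (49)
    c23 = (lam_c ** 4 + 2 * lam_c ** 3 * lam + 2 * lam_c ** 2 * lam ** 2
           + 2 * lam_c * lam ** 3 + 2 * lam ** 4) / (
        2 * (lam_c + lam) ** 2
        * math.sqrt((lam_c ** 2 + lam ** 2) * (2 * lam_c ** 2 + lam ** 2)))    # (50)
    return c12, c13, c23
```

Evaluating at extreme rate ratios (e.g. $\lambda \gg \lambda_c$ or $\lambda_c \gg \lambda$) recovers the asymptotic behavior shown in Figure 6.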

**Proposition 4.** *Let $\mathcal{N}_c$ denote the set of nodes inside cluster $c$. For $i, j \in \{1, 2, \cdots, C\}$ with $i \neq j$, the two age processes $x_{\mathcal{N}_i}(t)$ and $x_{\mathcal{N}_j}(t)$ are uncorrelated.*

**Proof of Proposition 4.** See Appendix I.

**Remark 10.** *From Proposition 3, one can deduce that $\bar{v}^{(1)}_{\{1\}} \leq \bar{v}^{(1)}_{\{2\}} \leq \bar{v}^{(1)}_{\{3\}}$, $\bar{v}^{(2)}_{\{1\}} \leq \bar{v}^{(2)}_{\{2\}} \leq \bar{v}^{(2)}_{\{3\}}$, and $\text{var}[x_1(t)] \leq \text{var}[x_2(t)] \leq \text{var}[x_3(t)]$ for any choice of values of $\lambda_c$ and $\lambda$. This follows from the fact that the configuration of each cluster in the clustered gossip network under consideration is a uni-directional ring, where each node has a single status-updating path from node 0 passing through its preceding node in the cluster.*

**Remark 11.** *Similar to Remark 4, note that the quantities in (46) and (47) associated with the age processes at nodes 2 and 3 are jointly convex functions of $(\lambda_c, \lambda)$, and each approaches its infimum (zero) only in the limit $\lambda_c, \lambda \to \infty$. Furthermore, for a given $\lambda$ or $\lambda_c$, each quantity in (46) and (47) is a monotonically non-increasing function of $\lambda_c$ or $\lambda$, respectively. This can also be observed in Figure 5.*

**Remark 12.** *Note that the correlation coefficients in (48)–(50) are monotonically non-increasing functions of $\lambda_c$ for a given $\lambda$, whereas they are monotonically non-decreasing functions of $\lambda$ for a given $\lambda_c$. In particular, $\text{cor}[x_1(t), x_2(t)]$ and $\text{cor}[x_1(t), x_3(t)]$ monotonically increase as functions of $\lambda$ from $\lim_{\lambda\to 0} \text{cor}[x_1(t), x_2(t)] = \lim_{\lambda\to 0} \text{cor}[x_1(t), x_3(t)] = 0$ until they approach $\lim_{\lambda\to\infty} \text{cor}[x_1(t), x_2(t)] = \lim_{\lambda\to\infty} \text{cor}[x_1(t), x_3(t)] = 1$, and monotonically decrease as functions of $\lambda_c$ from $\lim_{\lambda_c\to 0} \text{cor}[x_1(t), x_2(t)] = \lim_{\lambda_c\to 0} \text{cor}[x_1(t), x_3(t)] = 1$ until they approach $\lim_{\lambda_c\to\infty} \text{cor}[x_1(t), x_2(t)] = \lim_{\lambda_c\to\infty} \text{cor}[x_1(t), x_3(t)] = 0$. Additionally, $\text{cor}[x_2(t), x_3(t)]$ monotonically increases as a function of $\lambda$ from $\lim_{\lambda\to 0} \text{cor}[x_2(t), x_3(t)] = \frac{1}{2\sqrt{2}}$ until it approaches $\lim_{\lambda\to\infty} \text{cor}[x_2(t), x_3(t)] = 1$, and monotonically decreases as a function of $\lambda_c$ from $\lim_{\lambda_c\to 0} \text{cor}[x_2(t), x_3(t)] = 1$ until it approaches $\lim_{\lambda_c\to\infty} \text{cor}[x_2(t), x_3(t)] = \frac{1}{2\sqrt{2}}$. These insights can also be seen in Figure 6.*

**Figure 5.** Stationary first and second moments of age processes at the nodes inside the *c*th cluster of the clustered gossip network topology. The simulated curves were obtained from the numerical evaluation of the stationary marginal moments using (10) in Theorem 1.

**Figure 6.** Correlation coefficients between age processes in the clustered gossip network topology.

**Remark 13.** *Note that the result of Proposition 4 agrees with the intuition. In particular, since the nodes in each cluster are disconnected from the nodes in the other clusters, the two age processes associated with any two arbitrary clusters in the network are uncorrelated.*

**Remark 14.** *From Propositions 1–3, one can see that the standard deviation of $x_1(t)$ (i.e., $\sqrt{\text{var}[x_1(t)]}$) is equal to its average value $\bar{v}^{(1)}_{\{1\}}$. Additionally, the standard deviations of the age processes at the other nodes are relatively large with respect to their average values (which is also demonstrated numerically in Figures 7–10). This key insight promotes the importance of incorporating the higher-order moments of age processes in the implementation or optimization of age-aware gossip networks, rather than relying solely on the average values of the age processes (as has been performed in the existing literature thus far). This insight also demonstrates the need for the development of Theorems 1 and 2 in this paper, which allow the characterization of the marginal or joint MGFs of different age processes in the network that can then be used to evaluate the marginal or joint higher-order moments.*

**Figure 7.** Variance of *x*2(*t*) in the serially-connected network setting. We denote the standard deviation of *x*2(*t*) as *σ*2.

**Figure 8.** Variance of *x*3(*t*) in the parallelly-connected network setting. We denote the standard deviation of *x*3(*t*) as *σ*3.

**Figure 9.** Variance of *x*2(*t*) in the clustered gossip network topology. We denote the standard deviation of *x*2(*t*) as *σ*2.

**Figure 10.** Variance of *x*3(*t*) in the clustered gossip network topology. We denote the standard deviation of *x*3(*t*) as *σ*3.

#### **5. Conclusions**

In this paper, we developed SHS-based methods that allow the characterization of the stationary marginal and joint MGFs of age processes in a general setting of gossip networks. In particular, we used the SHS framework to derive two systems of first-order ordinary differential equations characterizing the temporal evolution of the marginal and joint MGFs, from which the stationary marginal and joint MGFs were evaluated. Afterward, these methods were applied to derive the stationary marginal and joint MGFs in the following three network topologies: (1) a serially-connected topology, (2) a parallelly-connected topology, and (3) a clustered topology. Using the MGF expressions derived for each network topology, we obtained closed-form expressions for the following quantities: (1) the stationary marginal first and second moments of each age process, (2) the variance of each age process, and (3) the correlation coefficients between all possible pairwise combinations of the age processes. We further characterized the structural properties of these quantities in terms of their convexity and monotonic nature with respect to the status updating rates and provided asymptotic results showing their behaviors when each of the status updating rates became small or large. Our analytical findings demonstrated that the standard deviations of the age processes in each network topology considered in this paper were relatively large with respect to their average values. This key insight promotes the importance of incorporating the higher-order moments of age processes in the implementation and optimization of age-aware gossip networks rather than just relying on the average values of the age processes (as has been performed in the existing literature thus far).

Given the generality of the setting of gossip networks analyzed in this paper, our developed methods can be applied to understand the marginal or joint distributional properties of age processes in any arbitrary gossip network topology. This opens the door for the use of these methods in the future to characterize the stationary marginal or joint moments and MGFs of the age processes in gossip network topologies that have only been analyzed in terms of the stationary first moment of each age process in the network until now. It would also be interesting to investigate how the stationary marginal or joint moments scale as functions of the network size.

**Author Contributions:** Conceptualization, M.A.A.-E. and H.S.D.; formal analysis, M.A.A.-E.; writing —original draft preparation, M.A.A.-E.; writing—review and editing, M.A.A.-E. and H.S.D.; funding acquisition, H.S.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported in part by the U.S. NSF (Grants CNS-1814477 and CNS-1923807). The publication charges of this article were covered in part by Virginia Tech's Open Access Subvention Fund.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Proof of Theorem 1**

We derive this result by first using the SHS framework to obtain a system of differential equations characterizing the temporal evolution of the marginal MGFs of the age processes associated with all sets *S* ⊆ N . We then obtain the stationary marginal MGFs as the fixed point of this system of equations (i.e., when *t* → ∞). To derive the system of differential equations, we follow a similar approach to that in [15,92], where the idea is to define the test functions {*ψ*(**x**(*t*))} whose expected values {E[*ψ*(**x**(*t*))]} are quantities of interest. Since we are interested here in the characterization of the marginal MGFs, we define the following class of test functions that is appropriate for this analysis:

$$
\psi^{(n)}_S(\mathbf{x}(t)) = \exp[n \mathbf{x}_S(t)], \;\; \forall \, S \subseteq \mathcal{N}, \tag{A1}
$$

where the expected value $\mathbb{E}\big[\psi^{(n)}_S(\mathbf{x}(t))\big]$ is $v^{(n)}_S(t)$. We apply the SHS mapping $\psi(\mathbf{x}(t)) \to L\psi(\mathbf{x}(t))$ (known as the extended generator) to every test function in (A1). Since the test functions defined above are time-invariant, it follows from ([92] Theorem 1) that the extended generator of a test function $\psi(\mathbf{x}(t))$ under the considered piecewise linear SHS is given by

$$
L\psi(\mathbf{x}(t)) = \frac{\mathrm{d}\psi(\mathbf{x}(t))}{\mathrm{d}\mathbf{x}(t)} \mathbf{1}_N^{\mathrm{T}} + \sum_{l=(i,j)\in\mathcal{L}} \lambda_{ij}\left[\psi(\mathbf{x}'(t)) - \psi(\mathbf{x}(t))\right], \tag{A2}
$$

where $\mathbf{x}'(t) = [x'_1(t)\; x'_2(t)\; \cdots\; x'_N(t)]$ such that the updated age at node $k$, $x'_k(t)$, resulting from the transition $(i, j)$ is given by (1). In addition, note that $x'_S(t) = \min_{i\in S} x'_i(t)$. Now, we proceed to evaluate $L\psi^{(n)}_S(\mathbf{x}(t))$. From the age-updating rule in (1), the set of transitions $\mathcal{L}$ in (2), and the structure of $\psi^{(n)}_S(\mathbf{x}(t))$ in (A1), we have

$$\frac{d\psi\_S^{(n)}(\mathbf{x}(t))}{d\mathbf{x}(t)}\mathbf{1}\_N^T = n\psi\_S^{(n)}(\mathbf{x}(t)),\tag{A3}$$

$$
\psi^{(n)}_S(\mathbf{x}'(t)) = \exp[n \mathbf{x}'_S(t)] = \begin{cases} \exp[n \times 0] = 1, & l = (0, j),\; j \in S, \\ \exp[n \mathbf{x}_{S\cup\{i\}}(t)] = \psi^{(n)}_{S\cup\{i\}}(\mathbf{x}(t)), & l = (i, j),\; j \in S,\; i \in \mathcal{N}\setminus S, \\ \exp[n \mathbf{x}_S(t)] = \psi^{(n)}_S(\mathbf{x}(t)), & \text{otherwise.} \end{cases} \tag{A4}
$$

Substituting (A3) and (A4) into (A2) gives

$$
L\psi^{(n)}_S(\mathbf{x}(t)) = n\psi^{(n)}_S(\mathbf{x}(t)) + \sum_{j \in S} \lambda_{0j}\left[1 - \psi^{(n)}_S(\mathbf{x}(t))\right] + \sum_{i \in \mathcal{N}\setminus S} \sum_{j \in S} \lambda_{ij}\left[\psi^{(n)}_{S\cup\{i\}}(\mathbf{x}(t)) - \psi^{(n)}_S(\mathbf{x}(t))\right]. \tag{A5}
$$

The system of differential equations characterizing the temporal evolution of the marginal MGFs $\{v^{(n)}_S(t)\}_{S\subseteq\mathcal{N}}$ can be derived by applying Dynkin's formula [92] to each test function and its associated extended generator. In particular, for a test function $\psi(\mathbf{x}(t))$, Dynkin's formula can be expressed as

$$\frac{\text{d}\mathbb{E}[\psi(\mathbf{x}(t))]}{\text{d}t} = \mathbb{E}[L\psi(\mathbf{x}(t))].\tag{A6}$$

Plugging $\psi\_S^{(n)}(\mathbf{x}(t))$ and $L\psi\_S^{(n)}(\mathbf{x}(t))$ into (A6) gives

$$\begin{split} \dot{v}\_{\mathcal{S}}^{(n)}(t) &= n v\_{\mathcal{S}}^{(n)}(t) + \sum\_{j \in \mathcal{S}} \lambda\_{0j} \left[ 1 - v\_{\mathcal{S}}^{(n)}(t) \right] + \sum\_{i \in \mathcal{N} \setminus \mathcal{S}} \sum\_{j \in \mathcal{S}} \lambda\_{ij} \left[ v\_{\mathcal{S} \cup \{i\}}^{(n)}(t) - v\_{\mathcal{S}}^{(n)}(t) \right] \\ &\overset{(a)}{=} \lambda\_{0}(S) + v\_{\mathcal{S}}^{(n)}(t) \left[ n - \lambda\_{0}(S) - \sum\_{i \in \mathcal{N}(S)} \lambda\_{i}(S) \right] + \sum\_{i \in \mathcal{N}(S)} \lambda\_{i}(S) v\_{\mathcal{S} \cup \{i\}}^{(n)}(t), \end{split} \tag{A7}$$

where step (a) directly follows from the definitions of $\lambda\_i(S)$ and $\mathcal{N}(S)$ in (7) and (8), respectively. Note that there exists a range of $n$ values for which the differential equation in (A7) is asymptotically stable for any arbitrary set $S \subseteq \mathcal{N}$. To see this, let us first express $\dot{v}\_{\mathcal{N}}^{(n)}(t)$ using (A7) as follows:

$$
\dot{v}\_{\mathcal{N}}^{(n)}(t) = \lambda\_0(\mathcal{N}) + v\_{\mathcal{N}}^{(n)}(t)[n - \lambda\_0(\mathcal{N})].\tag{A8}
$$

For $0 \le n < \lambda\_0(\mathcal{N})$, (A8) is asymptotically stable, and the stationary marginal MGF $\bar{v}\_{\mathcal{N}}^{(n)}$ can be obtained by setting $\dot{v}\_{\mathcal{N}}^{(n)}(t)$ to zero and replacing $v\_{\mathcal{N}}^{(n)}(t)$ with $\bar{v}\_{\mathcal{N}}^{(n)}$. Now, when $S = \mathcal{N}\setminus\{k\}$, (A7) is given by

$$
\dot{v}\_S^{(n)}(t) = \lambda\_0(S) + v\_S^{(n)}(t)[n - \lambda\_0(S) - \lambda\_k(S)] + \lambda\_k(S)v\_{\mathcal{N}}^{(n)}(t). \tag{A9}
$$

For $0 \le n < \min[\lambda\_0(S) + \lambda\_k(S), \lambda\_0(\mathcal{N})]$ and $S = \mathcal{N}\setminus\{k\}$, $v\_{\mathcal{N}}^{(n)}(t)$ converges as $t \to \infty$, and the differential equation in (A9) is asymptotically stable. The stationary marginal MGF $\bar{v}\_{\mathcal{N}\setminus\{k\}}^{(n)}$ is then the fixed point of (A9), which can be obtained after setting the derivative to zero. Afterward, when $S = \mathcal{N}\setminus\{k\_1, k\_2\}$, one can follow the above procedure to obtain the range of $n$ values under which (A7) is asymptotically stable. Generally, for an arbitrary set $S$, there exists a threshold $\delta$ such that for $n \in [0, \delta)$, the stationary marginal MGF $\bar{v}\_S^{(n)}$ is the fixed point of (A7).
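As an illustrative numerical sanity check (not part of the proof), one can integrate (A8) by forward Euler and observe convergence to the fixed point $\lambda\_0(\mathcal{N})/(\lambda\_0(\mathcal{N}) - n)$ whenever $n < \lambda\_0(\mathcal{N})$; the rate and exponent below are arbitrary values chosen only for the sketch.

```python
# Forward-Euler integration of the scalar ODE (A8):
#   dv/dt = lam0 + v * (n - lam0),
# whose fixed point lam0 / (lam0 - n) is attracting iff n < lam0.
# The rate lam0 and exponent n are arbitrary illustrative values.

def integrate_A8(lam0, n, v0=1.0, dt=1e-3, t_end=50.0):
    """Integrate dv/dt = lam0 + v*(n - lam0) from v(0) = v0."""
    v = v0
    for _ in range(int(t_end / dt)):
        v += dt * (lam0 + v * (n - lam0))
    return v

lam0, n = 2.0, 0.5            # n < lam0, so (A8) is asymptotically stable
v_inf = integrate_A8(lam0, n)
v_bar = lam0 / (lam0 - n)     # stationary marginal MGF from setting dv/dt = 0
print(v_inf, v_bar)           # the two values agree to high accuracy
```

For $n \ge \lambda\_0(\mathcal{N})$ the same iteration diverges, which is the instability the threshold $\delta$ rules out.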

Finally, the stationary marginal $m$th moment $\bar{v}\_S^{(m)}$ in (10) can be obtained by substituting $\psi(\mathbf{x}(t))$ in (A2) with $\psi\_S^{(m)}(\mathbf{x}(t)) = x\_S^m(t)$ and following similar steps to those in (A2)–(A9). This completes the proof.

#### **Appendix B. Proof of Theorem 2**

The flow of this proof is similar to that of the proof of Theorem 1 in Appendix A. In particular, we start by constructing a class of test functions that is appropriate for the joint MGF analysis. We then use (A2) to derive the extended generator for each test function, which is then plugged into Dynkin's formula in (A6) to obtain the system of differential equations characterizing the temporal evolution of the joint MGFs $\{v\_{S\_1,S\_2}^{(n\_1,n\_2)}(t)\}\_{S\_1,S\_2\subseteq\mathcal{N}}$. The class of test functions we define here for the joint MGF analysis is given by

$$
\psi\_{S\_1, S\_2}^{(n\_1, n\_2)}(\mathbf{x}(t)) = \exp\left[n\_1 x\_{S\_1}(t) + n\_2 x\_{S\_2}(t)\right], \quad \forall S\_1, S\_2 \subseteq \mathcal{N}, \tag{A10}
$$

such that the expected value $\mathbb{E}\left[\psi\_{S\_1,S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t))\right]$ is $v\_{S\_1,S\_2}^{(n\_1,n\_2)}(t)$. For such a structure of test functions, we have

$$\frac{\mathrm{d}\psi\_{S\_1, S\_2}^{(n\_1, n\_2)}(\mathbf{x}(t))}{\mathrm{d}\mathbf{x}(t)}\mathbf{1}\_N^{\mathrm{T}} = (n\_1 + n\_2)\psi\_{S\_1, S\_2}^{(n\_1, n\_2)}(\mathbf{x}(t)).\tag{A11}$$

Compared with the proof of Theorem 1 in Appendix A, a key challenge in the derivation of the extended generator here is to carefully identify all the possible transitions in $\mathcal{L}$ that result in $\psi\_{S\_1,S\_2}^{(n\_1,n\_2)}(\mathbf{x}'(t)) \neq \psi\_{S\_1,S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t))$. We provide Figure A1 to help visualize the following arguments. For the first subset of transitions $\{(0, j) : j \in \mathcal{N}\}\subset\mathcal{L}$, we have

$$\psi\_{S\_1, S\_2}^{(n\_1, n\_2)}\left(\mathbf{x}'(t)\right) = \exp\left[n\_1 x'\_{S\_1}(t) + n\_2 x'\_{S\_2}(t)\right] = \begin{cases} \psi\_{S\_2}^{(n\_2)}(\mathbf{x}(t)), & l = (0, j), j \in S\_1 \setminus S\_2, \\ \psi\_{S\_1}^{(n\_1)}(\mathbf{x}(t)), & l = (0, j), j \in S\_2 \setminus S\_1, \\ 1, & l = (0, j), j \in S\_1 \cap S\_2, \\ \psi\_{S\_1, S\_2}^{(n\_1, n\_2)}(\mathbf{x}(t)), & \text{otherwise}. \end{cases} \tag{A12}$$

To help grasp the different cases in (A12), we elaborate on the construction of the first case; the other cases can be interpreted similarly. In particular, when $j \in S\_1 \setminus S\_2$, the transition $(0, j)$ resets the age of $S\_1$ to zero, whereas the age of $S\_2$ does not change. As a result, $\psi\_{S\_1,S\_2}^{(n\_1,n\_2)}(\mathbf{x}'(t)) = \exp\left[n\_1 \times 0 + n\_2 x\_{S\_2}(t)\right] \overset{(a)}{=} \psi\_{S\_2}^{(n\_2)}(\mathbf{x}(t))$, where step (a) follows from (A1).

**Figure A1.** A Venn diagram representation.
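The case analyses in (A12) and (A13) can be restated as a small helper that maps a transition $l = (i, j)$ to the pair of index sets of the resulting test function. This is only an illustrative sketch of the bookkeeping (with `0` denoting the source and `None` marking a set whose age was reset to zero, so its exponential factor becomes 1), not code from the paper.

```python
def updated_sets(i, j, S1, S2):
    """Map a transition l = (i, j) to the index sets of the resulting
    test function, mirroring the cases of (A12) and (A13).
    i == 0 denotes the source; a returned None marks a set whose age
    was reset to zero (its exponential factor becomes 1)."""
    S1, S2 = frozenset(S1), frozenset(S2)
    if i == 0:                                   # cases of (A12)
        if j in S1 - S2:
            return (None, S2)                    # psi^(n2)_{S2}
        if j in S2 - S1:
            return (S1, None)                    # psi^(n1)_{S1}
        if j in S1 & S2:
            return (None, None)                  # constant 1
        return (S1, S2)                          # unaffected
    # cases of (A13): node-to-node transition
    if j in S1 - S2 and i not in S1:
        return (S1 | {i}, S2)
    if j in S2 - S1 and i not in S2:
        return (S1, S2 | {i})
    if j in S1 & S2:
        if i not in S1 | S2:
            return (S1 | {i}, S2 | {i})
        if i in S1 - S2:
            return (S1, S2 | {i})
        if i in S2 - S1:
            return (S1 | {i}, S2)
    return (S1, S2)

# Example: transition (1, 3) with S1 = {3}, S2 = {1}:
# S1 grows to {1, 3} and S2 is unchanged.
print(updated_sets(1, 3, {3}, {1}))
```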

For the second subset of transitions $\{(i, j) : i, j \in \mathcal{N}\}\subset\mathcal{L}$, we have

$$\psi\_{S\_1,S\_2}^{(n\_1,n\_2)}(\mathbf{x}'(t)) = \exp\left[n\_1 x'\_{S\_1}(t) + n\_2 x'\_{S\_2}(t)\right] = \begin{cases} \psi\_{S\_1\cup\{i\},S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t)), & l = (i,j), j \in S\_1 \setminus S\_2, i \in \mathcal{N} \setminus S\_1,\\ \psi\_{S\_1,S\_2\cup\{i\}}^{(n\_1,n\_2)}(\mathbf{x}(t)), & l = (i,j), j \in S\_2 \setminus S\_1, i \in \mathcal{N} \setminus S\_2,\\ \psi\_{S\_1\cup\{i\},S\_2\cup\{i\}}^{(n\_1,n\_2)}(\mathbf{x}(t)), & l = (i,j), j \in S\_1 \cap S\_2, i \in \mathcal{N} \setminus (S\_1 \cup S\_2),\\ \psi\_{S\_1,S\_2\cup\{i\}}^{(n\_1,n\_2)}(\mathbf{x}(t)), & l = (i,j), j \in S\_1 \cap S\_2, i \in S\_1 \setminus S\_2,\\ \psi\_{S\_1\cup\{i\},S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t)), & l = (i,j), j \in S\_1 \cap S\_2, i \in S\_2 \setminus S\_1,\\ \psi\_{S\_1,S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t)), & \text{otherwise}. \end{cases} \tag{A13}$$

Plugging (A11)–(A13) into (A2) gives

$$\begin{split} L\psi\_{S\_1,S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t)) &= (n\_1 + n\_2)\psi\_{S\_1,S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t)) + \sum\_{j\in S\_1\setminus S\_2} \lambda\_{0j} \left[\psi\_{S\_2}^{(n\_2)}(\mathbf{x}(t)) - \psi\_{S\_1,S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t))\right] \\ &+ \sum\_{j\in S\_2\setminus S\_1} \lambda\_{0j} \left[\psi\_{S\_1}^{(n\_1)}(\mathbf{x}(t)) - \psi\_{S\_1,S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t))\right] + \sum\_{j\in S\_1\cap S\_2} \lambda\_{0j} \left[1 - \psi\_{S\_1,S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t))\right] \\ &+ \sum\_{i\in\mathcal{N}\setminus S\_1} \sum\_{j\in S\_1\setminus S\_2} \lambda\_{ij} \left[\psi\_{S\_1\cup\{i\},S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t)) - \psi\_{S\_1,S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t))\right] \\ &+ \sum\_{i\in\mathcal{N}\setminus S\_2} \sum\_{j\in S\_2\setminus S\_1} \lambda\_{ij} \left[\psi\_{S\_1,S\_2\cup\{i\}}^{(n\_1,n\_2)}(\mathbf{x}(t)) - \psi\_{S\_1,S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t))\right] \\ &+ \sum\_{i\in\mathcal{N}\setminus (S\_1\cup S\_2)} \sum\_{j\in S\_1\cap S\_2} \lambda\_{ij} \left[\psi\_{S\_1\cup\{i\},S\_2\cup\{i\}}^{(n\_1,n\_2)}(\mathbf{x}(t)) - \psi\_{S\_1,S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t))\right] \\ &+ \sum\_{i\in S\_1\setminus S\_2} \sum\_{j\in S\_1\cap S\_2} \lambda\_{ij} \left[\psi\_{S\_1,S\_2\cup\{i\}}^{(n\_1,n\_2)}(\mathbf{x}(t)) - \psi\_{S\_1,S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t))\right] \\ &+ \sum\_{i\in S\_2\setminus S\_1} \sum\_{j\in S\_1\cap S\_2} \lambda\_{ij} \left[\psi\_{S\_1\cup\{i\},S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t)) - \psi\_{S\_1,S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t))\right]. \end{split} \tag{A14}$$

By applying Dynkin's formula in (A6) to $\psi\_{S\_1,S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t))$ and $L\psi\_{S\_1,S\_2}^{(n\_1,n\_2)}(\mathbf{x}(t))$, we have

$$\begin{split} \dot{v}\_{S\_1,S\_2}^{(n\_1,n\_2)}(t) &= (n\_1 + n\_2)v\_{S\_1,S\_2}^{(n\_1,n\_2)}(t) + \sum\_{j\in S\_1\setminus S\_2} \lambda\_{0j}\left[v\_{S\_2}^{(n\_2)}(t) - v\_{S\_1,S\_2}^{(n\_1,n\_2)}(t)\right] + \sum\_{j\in S\_2\setminus S\_1} \lambda\_{0j}\left[v\_{S\_1}^{(n\_1)}(t) - v\_{S\_1,S\_2}^{(n\_1,n\_2)}(t)\right] \\ &+ \sum\_{j\in S\_1\cap S\_2} \lambda\_{0j}\left[1 - v\_{S\_1,S\_2}^{(n\_1,n\_2)}(t)\right] + \sum\_{i\in\mathcal{N}\setminus S\_1} \sum\_{j\in S\_1\setminus S\_2} \lambda\_{ij}\left[v\_{S\_1\cup\{i\},S\_2}^{(n\_1,n\_2)}(t) - v\_{S\_1,S\_2}^{(n\_1,n\_2)}(t)\right] \\ &+ \sum\_{i\in\mathcal{N}\setminus S\_2} \sum\_{j\in S\_2\setminus S\_1} \lambda\_{ij}\left[v\_{S\_1,S\_2\cup\{i\}}^{(n\_1,n\_2)}(t) - v\_{S\_1,S\_2}^{(n\_1,n\_2)}(t)\right] + \sum\_{i\in\mathcal{N}\setminus (S\_1\cup S\_2)} \sum\_{j\in S\_1\cap S\_2} \lambda\_{ij}\left[v\_{S\_1\cup\{i\},S\_2\cup\{i\}}^{(n\_1,n\_2)}(t) - v\_{S\_1,S\_2}^{(n\_1,n\_2)}(t)\right] \\ &+ \sum\_{i\in S\_1\setminus S\_2} \sum\_{j\in S\_1\cap S\_2} \lambda\_{ij}\left[v\_{S\_1,S\_2\cup\{i\}}^{(n\_1,n\_2)}(t) - v\_{S\_1,S\_2}^{(n\_1,n\_2)}(t)\right] + \sum\_{i\in S\_2\setminus S\_1} \sum\_{j\in S\_1\cap S\_2} \lambda\_{ij}\left[v\_{S\_1\cup\{i\},S\_2}^{(n\_1,n\_2)}(t) - v\_{S\_1,S\_2}^{(n\_1,n\_2)}(t)\right] \\ &\overset{(a)}{=} v\_{S\_1,S\_2}^{(n\_1,n\_2)}(t)\left[(n\_1 + n\_2) - \lambda\_0(S\_1\cup S\_2) - \sum\_{i\in\mathcal{N}\setminus(S\_1\cap S\_2)} \lambda\_i(S\_1\cap S\_2) - \sum\_{i\in\mathcal{N}\setminus S\_1} \lambda\_i(S\_1\setminus S\_2) - \sum\_{i\in\mathcal{N}\setminus S\_2} \lambda\_i(S\_2\setminus S\_1)\right] \\ &+ \lambda\_0(S\_1\cap S\_2) + \lambda\_0(S\_1\setminus S\_2)v\_{S\_2}^{(n\_2)}(t) + \lambda\_0(S\_2\setminus S\_1)v\_{S\_1}^{(n\_1)}(t) + \sum\_{i\in\mathcal{N}\setminus S\_1} \lambda\_i(S\_1\setminus S\_2)v\_{S\_1\cup\{i\},S\_2}^{(n\_1,n\_2)}(t) \\ &+ \sum\_{i\in\mathcal{N}\setminus S\_2} \lambda\_i(S\_2\setminus S\_1)v\_{S\_1,S\_2\cup\{i\}}^{(n\_1,n\_2)}(t) + \sum\_{i\in\mathcal{N}\setminus(S\_1\cup S\_2)} \lambda\_i(S\_1\cap S\_2)v\_{S\_1\cup\{i\},S\_2\cup\{i\}}^{(n\_1,n\_2)}(t) \\ &+ \sum\_{i\in S\_1\setminus S\_2} \lambda\_i(S\_1\cap S\_2)v\_{S\_1,S\_2\cup\{i\}}^{(n\_1,n\_2)}(t) + \sum\_{i\in S\_2\setminus S\_1} \lambda\_i(S\_1\cap S\_2)v\_{S\_1\cup\{i\},S\_2}^{(n\_1,n\_2)}(t), \end{split} \tag{A15}$$

where step (a) follows from applying the definition of $\lambda\_i(S)$ in (7), followed by some algebraic simplifications. Now, following a procedure similar to that in (A8) and (A9) in Appendix A, one can show that for any two arbitrary sets $S\_1$ and $S\_2$, there exists a threshold $\delta$ (such that $0 \le n\_1 + n\_2 < \delta$) under which the differential equation in (A15) is asymptotically stable. Thus, the final expression of the stationary joint MGF $\bar{v}\_{S\_1,S\_2}^{(n\_1,n\_2)}$ in (11) can be obtained by taking the limit as $t \to \infty$ in (A15) (i.e., setting $\dot{v}\_{S\_1,S\_2}^{(n\_1,n\_2)}(t)$ to zero and replacing $v\_{S\_1,S\_2}^{(n\_1,n\_2)}(t)$ with $\bar{v}\_{S\_1,S\_2}^{(n\_1,n\_2)}$).

Finally, the stationary joint $(m\_1, m\_2)$th moment $\bar{v}\_{S\_1,S\_2}^{(m\_1,m\_2)}$ in (12) can be obtained by substituting $\psi(\mathbf{x}(t))$ in (A2) with $\psi\_{S\_1,S\_2}^{(m\_1,m\_2)}(\mathbf{x}(t)) = x\_{S\_1}^{m\_1}(t)\,x\_{S\_2}^{m\_2}(t)$ and following similar steps to those in (A11)–(A15). This completes the proof.

#### **Appendix C. Proof of Theorem 3**

We start the proof by showing how one can use Theorem 1 to obtain the stationary marginal MGF of the AoI or age process at each node in the network. In particular, by observing the set of transitions in Figure 1a, repeated application of (9) gives

$$
\bar{v}\_{\{1\}}^{(n)} = \frac{\lambda\_0}{\lambda\_0 - n},
\tag{A16}
$$

$$
\bar{v}\_{\{2\}}^{(n)} = \frac{\lambda \bar{v}\_{\{1,2\}}^{(n)}}{\lambda - n},
\tag{A17}
$$

$$
\bar{v}\_{\{1,2\}}^{(n)} = \frac{\lambda\_0}{\lambda\_0 - n}. \tag{A18}
$$

By substituting (A18) into (A17), we obtain

$$
\bar{v}\_{\{2\}}^{(n)} = \frac{\lambda\_0 \lambda}{(\lambda\_0 - n)(\lambda - n)}. \tag{A19}
$$

Now, we proceed to the evaluation of the stationary joint MGF $\bar{v}\_{\{2\},\{1\}}^{(n\_1,n\_2)}$ using Theorem 2. In particular, by applying (11) twice (first for $S\_1 = \{2\}$ and $S\_2 = \{1\}$, and then for $S\_1 = \{1, 2\}$ and $S\_2 = \{1\}$), we obtain

$$
\bar{v}\_{\{2\},\{1\}}^{(n\_1,n\_2)}[\lambda\_0 + \lambda - (n\_1 + n\_2)] = \lambda\_0 \bar{v}\_{\{2\}}^{(n\_1)} + \lambda \bar{v}\_{\{1,2\},\{1\}}^{(n\_1,n\_2)},\tag{A20}
$$

$$
\bar{v}\_{\{1,2\},\{1\}}^{(n\_1, n\_2)}[\lambda\_0 - (n\_1 + n\_2)] = \lambda\_0. \tag{A21}
$$

The final expression of $\bar{v}\_{\{2\},\{1\}}^{(n\_1,n\_2)}$ in (16) can be obtained by substituting $\bar{v}\_{\{2\}}^{(n\_1)}$ and $\bar{v}\_{\{1,2\},\{1\}}^{(n\_1,n\_2)}$ from (A17) and (A21), respectively, into (A20).
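The chain (A16)–(A21) can be spot-checked numerically; the rates and evaluation points below are arbitrary illustrative values, not quantities from the paper.

```python
# Numeric spot-check of (A16)-(A21) at illustrative rates.
lam0, lam = 1.3, 2.1

def v1(n):  return lam0 / (lam0 - n)               # (A16)
def v12(n): return lam0 / (lam0 - n)               # (A18)
def v2(n):  return lam * v12(n) / (lam - n)        # (A17)

n = 0.4
# (A19): closed form after substituting (A18) into (A17)
assert abs(v2(n) - lam0 * lam / ((lam0 - n) * (lam - n))) < 1e-9

# (A20)-(A21): solve for the joint MGF and check it equals 1 at the origin,
# as any valid MGF must
def v_joint(n1, n2):
    m = n1 + n2
    v123_1 = lam0 / (lam0 - m)                                # (A21)
    return (lam0 * v2(n1) + lam * v123_1) / (lam0 + lam - m)  # (A20)

assert abs(v_joint(0.0, 0.0) - 1.0) < 1e-9
print("Appendix C identities verified")
```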

#### **Appendix D. Proof of Proposition 1**

The stationary marginal $m$th moment of the age process at node $i \in \mathcal{N} = \{1, 2\}$ (i.e., $\bar{v}\_{\{i\}}^{(m)}$) is given by

$$\bar{v}\_{\{i\}}^{(m)} = \frac{\mathrm{d}^m \left[ \bar{v}\_{\{i\}}^{(n)} \right]}{\mathrm{d}n^m} \Bigg|\_{n=0}. \tag{A22}$$

Furthermore, for $i, j \in \mathcal{N}$, the stationary joint moment $\bar{v}\_{\{i\},\{j\}}^{(m\_1,m\_2)}$ of the two age processes at nodes $i$ and $j$ is given by

$$\bar{v}\_{\{i\},\{j\}}^{(m\_1,m\_2)} = \frac{\partial^{m\_1+m\_2} \left[ \bar{v}\_{\{i\},\{j\}}^{(n\_1,n\_2)} \right]}{\partial n\_1^{m\_1} \partial n\_2^{m\_2}} \Bigg|\_{n\_1=0,n\_2=0}.\tag{A23}$$

The marginal first and second moments of the age process at each node in the serially-connected network can be obtained by plugging the marginal MGF expressions derived in Theorem 3 into (A22). Furthermore, the variance of the age process at node $i$ is given by

$$\mathrm{var}[x\_{i}(t)] = \bar{v}\_{\{i\}}^{(2)} - \left(\bar{v}\_{\{i\}}^{(1)}\right)^{2}.\tag{A24}$$

Finally, for nodes *i*, *j* ∈ N , the correlation coefficient can be evaluated as follows:

$$\mathrm{cor}\left[x\_{i}(t),x\_{j}(t)\right] = \frac{\bar{v}^{(1,1)}\_{\{i\},\{j\}} - \bar{v}^{(1)}\_{\{i\}}\bar{v}^{(1)}\_{\{j\}}}{\sqrt{\mathrm{var}\left[x\_{i}(t)\right]}\sqrt{\mathrm{var}\left[x\_{j}(t)\right]}}.\tag{A25}$$

In order to obtain $\mathrm{cor}[x\_1(t), x\_2(t)]$, what remains is only to evaluate $\bar{v}\_{\{2\},\{1\}}^{(1,1)}$ from (A23) (using the joint MGF expression in (16)) as

$$
\bar{v}^{(1,1)}\_{\{2\},\{1\}} = \frac{\lambda\_0^2 + 2\lambda\_0\lambda + 2\lambda^2}{\lambda\lambda\_0^2(\lambda\_0 + \lambda)}.\tag{A26}
$$

By noting that

$$
\bar{v}\_{\{2\},\{1\}}^{(1,1)} - \bar{v}\_{\{2\}}^{(1)} \bar{v}\_{\{1\}}^{(1)} = \frac{\lambda\_0^2 + 2\lambda\_0\lambda + 2\lambda^2}{\lambda\lambda\_0^2(\lambda\_0 + \lambda)} - \frac{\lambda\_0 + \lambda}{\lambda\_0^2\lambda} = \frac{\lambda}{\lambda\_0^2(\lambda\_0 + \lambda)},\tag{A27}
$$

the final expression of $\mathrm{cor}[x\_1(t), x\_2(t)]$ in (19) can be obtained, which completes the proof.
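The covariance step (A26)–(A27) can be verified numerically at illustrative rates, using the first moments implied by (A16) and (A19).

```python
# Numeric check of the covariance step (A26)-(A27) for illustrative rates.
lam0, lam = 1.7, 0.9

v11_joint = (lam0**2 + 2*lam0*lam + 2*lam**2) / (lam * lam0**2 * (lam0 + lam))  # (A26)
mean_x2 = 1/lam0 + 1/lam    # first moment of node 2, from (A19)
mean_x1 = 1/lam0            # first moment of node 1, from (A16)

cov = v11_joint - mean_x2 * mean_x1
assert abs(cov - lam / (lam0**2 * (lam0 + lam))) < 1e-9     # matches (A27)
print("(A27) verified:", cov)
```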

#### **Appendix E. Proof of Theorem 4**

For the parallelly-connected network in Figure 1b, the stationary marginal MGF of the age process at each node *i* ∈ N = {1, 2, 3} can be derived by repeatedly applying (9) as follows:

$$
\bar{v}\_{\{3\}}^{(n)} = \frac{\lambda\_1 \bar{v}\_{\{1,3\}}^{(n)} + \lambda\_2 \bar{v}\_{\{2,3\}}^{(n)}}{\lambda\_1 + \lambda\_2 - n},
\tag{A28}
$$

$$
\bar{v}\_{\{1,3\}}^{(n)} = \frac{\lambda\_s + \lambda\_2 \bar{v}\_{\{1,2,3\}}^{(n)}}{\lambda\_s + \lambda\_2 - n},
\tag{A29}
$$

$$
\bar{v}\_{\{2,3\}}^{(n)} = \frac{\lambda\_s + \lambda\_1 \bar{v}\_{\{1,2,3\}}^{(n)}}{\lambda\_s + \lambda\_1 - n},
\tag{A30}
$$

$$
\bar{v}\_{\{1,2,3\}}^{(n)} = \frac{2\lambda\_s}{2\lambda\_s - n}, \tag{A31}
$$

$$
\bar{v}\_{\{1\}}^{(n)} = \bar{v}\_{\{2\}}^{(n)} = \frac{\lambda\_s}{\lambda\_s - n}. \tag{A32}
$$

The final expression of $\bar{v}\_{\{3\}}^{(n)}$ in (21) can be obtained by substituting $\bar{v}\_{\{1,3\}}^{(n)}$ and $\bar{v}\_{\{2,3\}}^{(n)}$ from (A29)–(A31) into (A28).
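As a quick numerical sanity check, chaining (A28)–(A32) at illustrative rates confirms that the resulting marginal MGF of node 3 is a proper MGF (equal to 1 at $n = 0$) with a finite, positive mean.

```python
# Chain (A28)-(A32) numerically for illustrative rates.
lam_s, lam1, lam2 = 1.0, 2.0, 3.0

def v123(n): return 2*lam_s / (2*lam_s - n)                          # (A31)
def v13(n):  return (lam_s + lam2*v123(n)) / (lam_s + lam2 - n)      # (A29)
def v23(n):  return (lam_s + lam1*v123(n)) / (lam_s + lam1 - n)      # (A30)
def v3(n):   return (lam1*v13(n) + lam2*v23(n)) / (lam1 + lam2 - n)  # (A28)

assert abs(v3(0.0) - 1.0) < 1e-9     # a valid MGF equals 1 at n = 0

# the mean age of node 3 is the derivative of v3 at 0; a central
# difference gives a finite positive value
h = 1e-6
mean_x3 = (v3(h) - v3(-h)) / (2*h)
assert mean_x3 > 0
print("mean age of node 3 ~", mean_x3)
```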

We now derive the stationary joint MGF $\bar{v}\_{\{3\},\{1\}}^{(n\_1,n\_2)}$ by repeatedly applying (11) as follows:

$$
\bar{v}\_{\{3\},\{1\}}^{(n\_1,n\_2)} \left[ \lambda\_s + \lambda\_1 + \lambda\_2 - (n\_1 + n\_2) \right] = \lambda\_s \bar{v}\_{\{3\}}^{(n\_1)} + \lambda\_1 \bar{v}\_{\{1,3\},\{1\}}^{(n\_1,n\_2)} + \lambda\_2 \bar{v}\_{\{2,3\},\{1\}}^{(n\_1,n\_2)}, \tag{A33}
$$

$$
\bar{v}^{(n\_1, n\_2)}\_{\{1, 3\}, \{1\}} \left[ \lambda\_s + \lambda\_2 - (n\_1 + n\_2) \right] = \lambda\_s + \lambda\_2 \bar{v}^{(n\_1, n\_2)}\_{\{1, 2, 3\}, \{1\}},\tag{A34}
$$

$$
\bar{v}\_{\{2,3\},\{1\}}^{(n\_1,n\_2)}[2\lambda\_s + \lambda\_1 - (n\_1 + n\_2)] = \lambda\_s \left(\bar{v}\_{\{1\}}^{(n\_2)} + \bar{v}\_{\{2,3\}}^{(n\_1)}\right) + \lambda\_1 \bar{v}\_{\{1,2,3\},\{1\}}^{(n\_1,n\_2)},\tag{A35}
$$

$$
\bar{v}\_{\{1,2,3\},\{1\}}^{(n\_1,n\_2)}[2\lambda\_s - (n\_1 + n\_2)] = \lambda\_s + \lambda\_s \bar{v}\_{\{1\}}^{(n\_2)}.\tag{A36}
$$

By substituting (A34)–(A36) into (A33), $\bar{v}\_{\{3\},\{1\}}^{(n\_1,n\_2)}$ can be expressed as

$$
\begin{split}
\bar{v}\_{\{3\},\{1\}}^{(n\_1,n\_2)} &= \frac{1}{[\lambda\_s + \lambda\_1 + \lambda\_2 - (n\_1 + n\_2)][2\lambda\_s + \lambda\_1 - (n\_1 + n\_2)][2\lambda\_s - (n\_1 + n\_2)][\lambda\_s + \lambda\_2 - (n\_1 + n\_2)]} \\
&\times \Big[\lambda\_s[2\lambda\_s + \lambda\_1 - (n\_1 + n\_2)][2\lambda\_s - (n\_1 + n\_2)][\lambda\_s + \lambda\_2 - (n\_1 + n\_2)]\bar{v}\_{\{3\}}^{(n\_1)} \\
&+ \lambda\_s\lambda\_2[2\lambda\_s + \lambda\_1 - (n\_1 + n\_2)][\lambda\_s + \lambda\_1 + \lambda\_2 - (n\_1 + n\_2)]\bar{v}\_{\{1\}}^{(n\_2)} \\
&+ \lambda\_s\lambda\_2[\lambda\_s + \lambda\_2 - (n\_1 + n\_2)][2\lambda\_s - (n\_1 + n\_2)]\bar{v}\_{\{2,3\}}^{(n\_1)} \\
&+ \lambda\_s\lambda\_1\lambda\_2[\lambda\_s + \lambda\_2 - (n\_1 + n\_2)] + \lambda\_s\lambda\_1[2\lambda\_s + \lambda\_1 - (n\_1 + n\_2)][2\lambda\_s + \lambda\_2 - (n\_1 + n\_2)]\Big].
\end{split}
\tag{A37}
$$

The final expression of $\bar{v}\_{\{3\},\{1\}}^{(n\_1,n\_2)}$ in (22) can be obtained by substituting $\bar{v}\_{\{3\}}^{(n\_1)}$, $\bar{v}\_{\{1\}}^{(n\_2)}$, and $\bar{v}\_{\{2,3\}}^{(n\_1)}$ from (21), (A32), and (A30), respectively, into (A37).
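The closed form in (A37) can be cross-checked against the linear system (A33)–(A36) at an arbitrary numeric point; the rates and exponents below are illustrative, and `D1`–`D4` denote the four bracketed factors of (A37).

```python
# Cross-check of the closed form (A37) against the system (A33)-(A36)
# at an arbitrary point; rates and exponents are illustrative values.
lam_s, lam1, lam2 = 1.0, 2.0, 3.0
n1, n2 = 0.2, 0.3
m = n1 + n2

def v123(n): return 2*lam_s / (2*lam_s - n)                          # (A31)
def v13(n):  return (lam_s + lam2*v123(n)) / (lam_s + lam2 - n)      # (A29)
def v23(n):  return (lam_s + lam1*v123(n)) / (lam_s + lam1 - n)      # (A30)
def v3(n):   return (lam1*v13(n) + lam2*v23(n)) / (lam1 + lam2 - n)  # (A28)
def v1(n):   return lam_s / (lam_s - n)                              # (A32)

# solve (A36), (A34), (A35), (A33) bottom-up
j123_1 = (lam_s + lam_s*v1(n2)) / (2*lam_s - m)                            # (A36)
j13_1  = (lam_s + lam2*j123_1) / (lam_s + lam2 - m)                        # (A34)
j23_1  = (lam_s*(v1(n2) + v23(n1)) + lam1*j123_1) / (2*lam_s + lam1 - m)   # (A35)
j3_1   = (lam_s*v3(n1) + lam1*j13_1 + lam2*j23_1) / (lam_s + lam1 + lam2 - m)  # (A33)

# closed form (A37): D1..D4 are its four bracketed factors
D1 = lam_s + lam1 + lam2 - m
D2 = lam_s + lam2 - m
D3 = 2*lam_s + lam1 - m
D4 = 2*lam_s - m
num = (lam_s*D2*D3*D4*v3(n1) + lam_s*lam2*D1*D3*v1(n2)
       + lam_s*lam2*D2*D4*v23(n1) + lam_s*lam1*lam2*D2
       + lam_s*lam1*D3*(2*lam_s + lam2 - m))
assert abs(j3_1 - num/(D1*D2*D3*D4)) < 1e-9
print("(A37) agrees with (A33)-(A36)")
```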

#### **Appendix F. Proof of Proposition 2**

The expressions in (27)–(30) of the first moment, second moment, and variance of the age process at each node $i \in \mathcal{N} = \{1, 2, 3\}$ can be derived by plugging the marginal MGF expressions in Theorem 4 into (A22). Furthermore, by plugging the joint MGF $\bar{v}\_{\{3\},\{1\}}^{(n\_1,n\_2)}$ in (22) into (A23), one can obtain $\bar{v}\_{\{3\},\{1\}}^{(1,1)}$ as follows:

$$\begin{split} \bar{v}\_{\{3\},\{1\}}^{(1,1)} &= \frac{1}{4\lambda\_{s}^{2} (\lambda\_{s} + \lambda\_{1} + \lambda\_{2})(\lambda\_{s} + \lambda\_{2})^{2} (2\lambda\_{s} + \lambda\_{1})(\lambda\_{1} + \lambda\_{2})(\lambda\_{s} + \lambda\_{1})} \\ &\times \Big( \cdots + \lambda\_{s}^{3} (\lambda\_{1} + \lambda\_{2}) \big( 32\lambda\_{1}^{2} + 75\lambda\_{1}\lambda\_{2} + 32\lambda\_{2}^{2} \big) + 4\lambda\_{s}^{2} (\lambda\_{1} + \lambda\_{2})^{2} \big( 2\lambda\_{1}^{2} + 9\lambda\_{1}\lambda\_{2} + 2\lambda\_{2}^{2} \big) \\ &+ 3\lambda\_{s}\lambda\_{1}\lambda\_{2} \big( 3\lambda\_{1}^{2} + 7\lambda\_{1}\lambda\_{2} + 3\lambda\_{2}^{2} \big)(\lambda\_{1} + \lambda\_{2}) + 3\lambda\_{1}^{2}\lambda\_{2}^{2} (\lambda\_{1} + \lambda\_{2})^{2} \Big). \end{split} \tag{A38}$$

The final expression of $\mathrm{cor}[x\_1(t), x\_3(t)]$ in (31) can be obtained from (A25) while noting that

$$
\bar{v}\_{\{3\},\{1\}}^{(1,1)} - \bar{v}\_{\{3\}}^{(1)}\bar{v}\_{\{1\}}^{(1)} = \frac{\lambda\_1 \left[8\lambda\_s^4 + \lambda\_s^3 (12\lambda\_1 + 7\lambda\_2) + 2\lambda\_s^2 (\lambda\_1 + 2\lambda\_2)(\lambda\_2 + 2\lambda\_1) + \lambda\_s \lambda\_2 \left(3\lambda\_1^2 + 5\lambda\_1\lambda\_2 + \lambda\_2^2\right) + \lambda\_1 \lambda\_2^2 (\lambda\_1 + \lambda\_2)\right]}{4\lambda\_s^2 (\lambda\_s + \lambda\_1 + \lambda\_2)(\lambda\_s + \lambda\_2)^2 (2\lambda\_s + \lambda\_1)(\lambda\_s + \lambda\_1)}.
$$

This completes the proof.

#### **Appendix G. Proof of Theorem 5**

Repeated application of (9) gives

$$
\bar{v}\_{\{1\}}^{(n)} = \frac{\lambda\_c + \lambda \bar{v}\_{\{1,3\}}^{(n)}}{\lambda\_c + \lambda - n},
\tag{A39}
$$

$$
\bar{v}\_{\{1,3\}}^{(n)} = \frac{\lambda\_c + \lambda \bar{v}\_{\{1,2,3\}}^{(n)}}{\lambda\_c + \lambda - n} \stackrel{(a)}{=} \frac{\lambda\_c}{\lambda\_c - n},\tag{A40}
$$

$$
\bar{v}\_{\{1,2,3\}}^{(n)} = \frac{\lambda\_c}{\lambda\_c - n}, \tag{A41}
$$

$$
\bar{v}\_{\{2\}}^{(n)} = \frac{\lambda \bar{v}\_{\{1,2\}}^{(n)}}{\lambda - n},
\tag{A42}
$$

$$
\bar{v}\_{\{1,2\}}^{(n)} = \frac{\lambda\_c + \lambda \bar{v}\_{\{1,2,3\}}^{(n)}}{\lambda\_c + \lambda - n} \stackrel{(a)}{=} \frac{\lambda\_c}{\lambda\_c - n}, \tag{A43}
$$

$$
\bar{v}\_{\{3\}}^{(n)} = \frac{\lambda \bar{v}\_{\{2,3\}}^{(n)}}{\lambda - n},
\tag{A44}
$$

$$
\bar{v}\_{\{2,3\}}^{(n)} = \frac{\lambda \bar{v}\_{\{1,2,3\}}^{(n)}}{\lambda - n} \stackrel{(a)}{=} \frac{\lambda\_c \lambda}{(\lambda\_c - n)(\lambda - n)},\tag{A45}
$$

where step (a) in (A40), (A43), and (A45) follows from substituting $\bar{v}\_{\{1,2,3\}}^{(n)}$ from (A41). The expressions of $\{\bar{v}\_{\{i\}}^{(n)}\}\_{i\in\{1,2,3\}}$ in (35)–(37) are obtained by substituting (1) $\bar{v}\_{\{1,3\}}^{(n)}$ from (A40) into (A39), (2) $\bar{v}\_{\{1,2\}}^{(n)}$ from (A43) into (A42), and (3) $\bar{v}\_{\{2,3\}}^{(n)}$ from (A45) into (A44).
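Step (a) and the resulting collapse of the node-1 marginal can be confirmed numerically at illustrative rates; the chain (A39)–(A41) should reduce $\bar{v}\_{\{1\}}^{(n)}$ to $\lambda\_c/(\lambda\_c - n)$.

```python
# Spot-check of the marginal chain (A39)-(A41) for illustrative rates:
# substituting (A40) into (A39) collapses the node-1 MGF to lam_c/(lam_c - n).
lam_c, lam = 1.5, 2.5
n = 0.6

v123 = lam_c / (lam_c - n)                      # (A41)
v13  = (lam_c + lam*v123) / (lam_c + lam - n)   # (A40), before step (a)
v1   = (lam_c + lam*v13) / (lam_c + lam - n)    # (A39)

assert abs(v13 - lam_c/(lam_c - n)) < 1e-9      # step (a) of (A40)
assert abs(v1 - lam_c/(lam_c - n)) < 1e-9       # node-1 marginal collapses too
print("node-1 marginal MGF:", v1)
```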

Regarding the evaluation of the stationary joint MGF expressions, we start by deriving $\bar{v}\_{\{1\},\{2\}}^{(n\_1,n\_2)}$. Repeated application of (11) gives

$$
\bar{v}\_{\{1\},\{2\}}^{(n\_1,n\_2)}[\lambda\_c + 2\lambda - (n\_1 + n\_2)] = \lambda\_c \bar{v}\_{\{2\}}^{(n\_2)} + \lambda \bar{v}\_{\{1,3\},\{2\}}^{(n\_1,n\_2)} + \lambda \bar{v}\_{\{1\},\{1,2\}}^{(n\_1,n\_2)},\tag{A46}
$$

$$
\bar{v}^{(n\_1,n\_2)}\_{\{1,3\},\{2\}}\left[\lambda\_c + 2\lambda - (n\_1 + n\_2)\right] = \lambda\_c \bar{v}^{(n\_2)}\_{\{2\}} + \lambda \bar{v}^{(n\_1,n\_2)}\_{\{1,2,3\},\{2\}} + \lambda \bar{v}^{(n\_1,n\_2)}\_{\{1,3\},\{1,2\}},\tag{A47}
$$

$$
\bar{v}\_{\{1\},\{1,2\}}^{(n\_1,n\_2)}[\lambda\_c + \lambda - (n\_1 + n\_2)] = \lambda\_c + \lambda \bar{v}\_{\{1,3\},\{1,2\}}^{(n\_1,n\_2)},\tag{A48}
$$

$$
\bar{v}\_{\{1,2,3\},\{2\}}^{(n\_1,n\_2)}[\lambda\_c + \lambda - (n\_1 + n\_2)] = \lambda\_c \bar{v}\_{\{2\}}^{(n\_2)} + \lambda \bar{v}\_{\{1,2,3\},\{1,2\}}^{(n\_1,n\_2)},\tag{A49}
$$

$$
\bar{v}^{(n\_1,n\_2)}\_{\{1,3\},\{1,2\}}\left[\lambda\_c + 2\lambda - (n\_1 + n\_2)\right] = \lambda\_c + \lambda \bar{v}^{(n\_1,n\_2)}\_{\{1,2,3\},\{1,2\}} + \lambda \bar{v}^{(n\_1,n\_2)}\_{\{1,3\},\{1,2,3\}},\tag{A50}
$$

$$
\bar{v}^{(n\_1, n\_2)}\_{\{1,2,3\},\{1,2\}}[\lambda\_c + \lambda - (n\_1 + n\_2)] = \lambda\_c + \lambda \bar{v}^{(n\_1, n\_2)}\_{\{1,2,3\},\{1,2,3\}}, \tag{A51}
$$

$$
\bar{v}\_{\{1,3\},\{1,2,3\}}^{(n\_1,n\_2)}[\lambda\_c + \lambda - (n\_1 + n\_2)] = \lambda\_c + \lambda \bar{v}\_{\{1,2,3\},\{1,2,3\}}^{(n\_1,n\_2)},\tag{A52}
$$

$$
\bar{v}^{(n\_1, n\_2)}\_{\{1, 2, 3\}, \{1, 2, 3\}} = \frac{\lambda\_c}{\lambda\_c - (n\_1 + n\_2)}.\tag{A53}
$$

By substituting $\bar{v}\_{\{1,2,3\},\{1,2,3\}}^{(n\_1,n\_2)}$ from (A53) into (A48) and (A50)–(A52), we obtain

$$
\bar{v}^{(n\_1,n\_2)}\_{\{1,2,3\},\{1,2\}} = \bar{v}^{(n\_1,n\_2)}\_{\{1,3\},\{1,2,3\}} = \bar{v}^{(n\_1,n\_2)}\_{\{1,3\},\{1,2\}} = \bar{v}^{(n\_1,n\_2)}\_{\{1\},\{1,2\}} = \frac{\lambda\_c}{\lambda\_c - (n\_1 + n\_2)},\tag{A54}
$$

Furthermore, from (A47), (A49), (A50), and (A54), $\bar{v}\_{\{1,3\},\{2\}}^{(n\_1,n\_2)}$ can be expressed as

$$
\bar{v}\_{\{1,3\},\{2\}}^{(n\_1,n\_2)} = \frac{\lambda\_c[\lambda\_c - (n\_1+n\_2)]\bar{v}\_{\{2\}}^{(n\_2)} + \lambda\lambda\_c}{[\lambda\_c + \lambda - (n\_1+n\_2)][\lambda\_c - (n\_1+n\_2)]}.\tag{A55}
$$

The final expression of $\bar{v}\_{\{1\},\{2\}}^{(n\_1,n\_2)}$ in (38) can be obtained by substituting $\bar{v}\_{\{1,3\},\{2\}}^{(n\_1,n\_2)}$, $\bar{v}\_{\{1\},\{1,2\}}^{(n\_1,n\_2)}$, and $\bar{v}\_{\{2\}}^{(n\_2)}$ from (A55), (A54), and (36), respectively, into (A46). Now, we proceed with the evaluation of $\bar{v}\_{\{1\},\{3\}}^{(n\_1,n\_2)}$. Repeated application of (11) gives

$$
\bar{v}\_{\{1\},\{3\}}^{(n\_1,n\_2)}[\lambda\_c + 2\lambda - (n\_1 + n\_2)] = \lambda\_c \bar{v}\_{\{3\}}^{(n\_2)} + \lambda \bar{v}\_{\{1,3\},\{3\}}^{(n\_1,n\_2)} + \lambda \bar{v}\_{\{1\},\{2,3\}}^{(n\_1,n\_2)},\tag{A56}
$$

$$
\bar{v}\_{\{1,3\},\{3\}}^{(n\_1,n\_2)}[\lambda\_c + \lambda - (n\_1 + n\_2)] = \lambda\_c \bar{v}\_{\{3\}}^{(n\_2)} + \lambda \bar{v}\_{\{1,2,3\},\{2,3\}}^{(n\_1,n\_2)},\tag{A57}
$$

$$
\bar{v}\_{\{1\},\{2,3\}}^{(n\_1,n\_2)}[\lambda\_c + 2\lambda - (n\_1 + n\_2)] = \lambda\_c \bar{v}\_{\{2,3\}}^{(n\_2)} + \lambda \bar{v}\_{\{1,3\},\{2,3\}}^{(n\_1,n\_2)} + \lambda \bar{v}\_{\{1\},\{1,2,3\}}^{(n\_1,n\_2)},\tag{A58}
$$

$$
\bar{v}^{(n\_1, n\_2)}\_{\{1, 2, 3\}, \{2, 3\}}[\lambda\_c + \lambda - (n\_1 + n\_2)] = \lambda\_c \bar{v}^{(n\_2)}\_{\{2, 3\}} + \lambda \bar{v}^{(n\_1, n\_2)}\_{\{1, 2, 3\}, \{1, 2, 3\}},\tag{A59}
$$

$$
\bar{v}\_{\{1,3\},\{2,3\}}^{(n\_1,n\_2)}[\lambda\_c + 2\lambda - (n\_1 + n\_2)] = \lambda\_c \bar{v}\_{\{2,3\}}^{(n\_2)} + \lambda \bar{v}\_{\{1,3\},\{1,2,3\}}^{(n\_1,n\_2)} + \lambda \bar{v}\_{\{1,2,3\},\{2,3\}}^{(n\_1,n\_2)}, \tag{A60}
$$

$$
\bar{v}^{(n\_1, n\_2)}\_{\{1\}, \{1, 2, 3\}}[\lambda\_c + \lambda - (n\_1 + n\_2)] = \lambda\_c + \lambda \bar{v}^{(n\_1, n\_2)}\_{\{1, 3\}, \{1, 2, 3\}},\tag{A61}
$$

where $\bar{v}\_{\{1,3\},\{1,2,3\}}^{(n\_1,n\_2)} = \bar{v}\_{\{1,2,3\},\{1,2,3\}}^{(n\_1,n\_2)} = \frac{\lambda\_c}{\lambda\_c-(n\_1+n\_2)}$. By substituting $\bar{v}\_{\{1,3\},\{1,2,3\}}^{(n\_1,n\_2)}$ into (A61), we obtain

$$
\bar{v}^{(n\_1, n\_2)}\_{\{1\}, \{1, 2, 3\}} = \frac{\lambda\_c}{\lambda\_c - (n\_1 + n\_2)}.\tag{A62}
$$

Furthermore, from (A57)–(A62), $\bar{v}\_{\{1,3\},\{3\}}^{(n\_1,n\_2)}$ and $\bar{v}\_{\{1\},\{2,3\}}^{(n\_1,n\_2)}$ can be respectively expressed as

$$
\bar{v}\_{\{1,3\},\{3\}}^{(n\_1,n\_2)} = \frac{\lambda\_c[\lambda\_c + \lambda - (n\_1 + n\_2)]\bar{v}\_{\{3\}}^{(n\_2)} + \lambda\left(\lambda\_c \bar{v}\_{\{2,3\}}^{(n\_2)} + \lambda \bar{v}\_{\{1,2,3\},\{1,2,3\}}^{(n\_1,n\_2)}\right)}{\left[\lambda\_c + \lambda - (n\_1 + n\_2)\right]^2},\tag{A63}
$$

$$
\bar{v}\_{\{1\},\{2,3\}}^{(n\_1,n\_2)} = \frac{\lambda\_c \bar{v}\_{\{2,3\}}^{(n\_2)} + \lambda \bar{v}\_{\{1\},\{1,2,3\}}^{(n\_1,n\_2)}}{\lambda\_c + \lambda - (n\_1 + n\_2)}. \tag{A64}
$$

The final expression of $\bar{v}\_{\{1\},\{3\}}^{(n\_1,n\_2)}$ in (39) can be obtained by substituting (A63) and (A64) into (A56), followed by some algebraic simplifications. Finally, to derive $\bar{v}\_{\{2\},\{3\}}^{(n\_1,n\_2)}$, we first repeatedly use (11) as follows:

$$
\bar{v}\_{\{2\},\{3\}}^{(n\_1,n\_2)}[2\lambda - (n\_1 + n\_2)] = \lambda \bar{v}\_{\{1,2\},\{3\}}^{(n\_1,n\_2)} + \lambda \bar{v}\_{\{2\},\{2,3\}}^{(n\_1,n\_2)},\tag{A65}
$$

$$
\bar{v}^{(n\_1, n\_2)}\_{\{1, 2\}, \{3\}}\left[\lambda\_c + 2\lambda - (n\_1 + n\_2)\right] = \lambda\_c \bar{v}^{(n\_2)}\_{\{3\}} + \lambda \bar{v}^{(n\_1, n\_2)}\_{\{1, 2, 3\}, \{3\}} + \lambda \bar{v}^{(n\_1, n\_2)}\_{\{1, 2\}, \{2, 3\}},\tag{A66}
$$

$$
\bar{v}\_{\{2\},\{2,3\}}^{(n\_1,n\_2)}[\lambda - (n\_1 + n\_2)] = \lambda \bar{v}\_{\{1,2\},\{1,2,3\}}^{(n\_1,n\_2)},\tag{A67}
$$

$$
\bar{v}\_{\{1,2,3\},\{3\}}^{(n\_1,n\_2)}[\lambda\_c + \lambda - (n\_1 + n\_2)] = \lambda\_c \bar{v}\_{\{3\}}^{(n\_2)} + \lambda \bar{v}\_{\{1,2,3\},\{2,3\}}^{(n\_1,n\_2)},\tag{A68}
$$

$$
\bar{v}\_{\{1,2\},\{2,3\}}^{(n\_1,n\_2)}[\lambda\_c + 2\lambda - (n\_1 + n\_2)] = \lambda\_c \bar{v}\_{\{2,3\}}^{(n\_2)} + \lambda \bar{v}\_{\{1,2,3\},\{2,3\}}^{(n\_1,n\_2)} + \lambda \bar{v}\_{\{1,2\},\{1,2,3\}}^{(n\_1,n\_2)}.\tag{A69}
$$

From (A66)–(A69), $\bar{v}\_{\{1,2\},\{3\}}^{(n\_1,n\_2)}$ and $\bar{v}\_{\{2\},\{2,3\}}^{(n\_1,n\_2)}$ can be respectively expressed as

$$\begin{split} \bar{v}\_{\{1,2\},\{3\}}^{(n\_1,n\_2)} &= \frac{1}{\left[\lambda\_c + \lambda - (n\_1 + n\_2)\right] \left[\lambda\_c + 2\lambda - (n\_1 + n\_2)\right]^2} \times \Big[\lambda\_c \left[\lambda\_c + 2\lambda - (n\_1 + n\_2)\right]^2 \bar{v}\_{\{3\}}^{(n\_2)} \\ &+ \lambda \left[\lambda\_c + \lambda - (n\_1 + n\_2)\right] \left(\lambda\_c \bar{v}\_{\{2,3\}}^{(n\_2)} + \lambda \bar{v}\_{\{1,2\},\{1,2,3\}}^{(n\_1,n\_2)}\right) + \lambda^2 \left[2\lambda\_c + 3\lambda - 2(n\_1 + n\_2)\right] \bar{v}\_{\{1,2,3\},\{2,3\}}^{(n\_1,n\_2)} \Big], \end{split} \tag{A70}$$

$$
\bar{v}\_{\{2\},\{2,3\}}^{(n\_1,n\_2)} = \frac{\lambda\_c \lambda}{[\lambda\_c - (n\_1+n\_2)][\lambda - (n\_1+n\_2)]}.\tag{A71}
$$

The final expression of $\bar{v}\_{\{2\},\{3\}}^{(n\_1,n\_2)}$ in (40) can be obtained by plugging (A70) and (A71) into (A65), followed by substituting (1) $\bar{v}\_{\{3\}}^{(n\_2)}$ from (37), (2) $\bar{v}\_{\{2,3\}}^{(n\_2)}$ from (A45), (3) $\bar{v}\_{\{1,2,3\},\{2,3\}}^{(n\_1,n\_2)}$ from (A59), and (4) $\bar{v}\_{\{1,2\},\{1,2,3\}}^{(n\_1,n\_2)} = \frac{\lambda\_c}{\lambda\_c-(n\_1+n\_2)}$.
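A minimal consistency check of the chain (A46)–(A55): every stationary MGF must equal 1 at the origin $(n\_1, n\_2) = (0, 0)$. The rates below are illustrative.

```python
# Consistency check at (n1, n2) = (0, 0) for the chain (A46)-(A55):
# every stationary MGF equals 1 at the origin. Rates are illustrative.
lam_c, lam = 1.5, 2.5
m = 0.0                                              # n1 + n2

v2_0 = 1.0                                           # any MGF evaluated at 0
j_A54 = lam_c / (lam_c - m)                          # common value in (A54)
j13_2 = (lam_c*(lam_c - m)*v2_0 + lam*lam_c) / ((lam_c + lam - m)*(lam_c - m))  # (A55)
j1_2  = (lam_c*v2_0 + lam*j13_2 + lam*j_A54) / (lam_c + 2*lam - m)              # (A46)

assert abs(j13_2 - 1.0) < 1e-9
assert abs(j1_2 - 1.0) < 1e-9
print("(A46)-(A55) consistent at the origin")
```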

#### **Appendix H. Proof of Proposition 3**

The results of this proposition can be derived by following similar steps to those in Appendices D and F while noting that

$$
\bar{v}\_{\{1\},\{2\}}^{(1,1)} = \frac{\lambda^2 + \left(\lambda\_c + \lambda\right)^2}{\lambda \lambda\_c^2 \left(\lambda\_c + \lambda\right)},\tag{A72}
$$

$$
\bar{v}\_{\{1\},\{2\}}^{(1,1)} - \bar{v}\_{\{1\}}^{(1)} \bar{v}\_{\{2\}}^{(1)} = \frac{\lambda}{\lambda\_c^2 (\lambda\_c + \lambda)},
\tag{A73}
$$

$$
\bar{v}\_{\{1\},\{3\}}^{(1,1)} = \frac{2\lambda\_c^3 + 5\lambda\_c^2\lambda + 4\lambda\_c\lambda^2 + 2\lambda^3}{\lambda\lambda\_c^2(\lambda\_c + \lambda)^2},\tag{A74}
$$

$$
\bar{v}\_{\{1\},\{3\}}^{(1,1)} - \bar{v}\_{\{1\}}^{(1)} \bar{v}\_{\{3\}}^{(1)} = \frac{\lambda^2}{\lambda\_c^2 (\lambda\_c + \lambda)^2}, \tag{A75}
$$

$$
\bar{v}\_{\{2\},\{3\}}^{(1,1)} = \frac{5\lambda\_c^4 + 16\lambda\_c^3\lambda + 20\lambda\_c^2\lambda^2 + 12\lambda\_c\lambda^3 + 4\lambda^4}{2\lambda\_c^2\lambda^2\left(\lambda\_c + \lambda\right)^2},\tag{A76}
$$

$$
\bar{v}\_{\{2\},\{3\}}^{(1,1)} - \bar{v}\_{\{2\}}^{(1)} \bar{v}\_{\{3\}}^{(1)} = \frac{\lambda\_c^4 + 2\lambda\_c^3\lambda + 2\lambda\_c^2\lambda^2 + 2\lambda\_c\lambda^3 + 2\lambda^4}{2\lambda\_c^2\lambda^2 \left(\lambda\_c + \lambda\right)^2}.\tag{A77}$$

#### **Appendix I. Proof of Proposition 4**

We first apply (11) to obtain $\bar{v}\_{\mathcal{N}\_i,\mathcal{N}\_j}^{(n\_1,n\_2)}$ as

$$
\bar{v}\_{\mathcal{N}\_i,\mathcal{N}\_j}^{(n\_1, n\_2)} = \frac{\lambda\_i \bar{v}\_{\mathcal{N}\_j}^{(n\_2)} + \lambda\_j \bar{v}\_{\mathcal{N}\_i}^{(n\_1)}}{\lambda\_i + \lambda\_j - (n\_1 + n\_2)} \stackrel{(\mathbf{a})}{=} \frac{\lambda\_i \lambda\_j}{(\lambda\_i - n\_1)(\lambda\_j - n\_2)},\tag{A78}
$$

where step (a) follows from substituting $\bar{v}\_{\mathcal{N}\_i}^{(n\_1)}$ and $\bar{v}\_{\mathcal{N}\_j}^{(n\_2)}$ from (A41) as $\frac{\lambda\_i}{\lambda\_i - n\_1}$ and $\frac{\lambda\_j}{\lambda\_j - n\_2}$, respectively. We then obtain $\frac{\partial^2 \bar{v}\_{\mathcal{N}\_i,\mathcal{N}\_j}^{(n\_1,n\_2)}}{\partial n\_2 \partial n\_1}$ as

$$\frac{\partial^2 \bar{v}\_{\mathcal{N}\_i, \mathcal{N}\_j}^{(n\_1, n\_2)}}{\partial n\_2 \partial n\_1} = \frac{\lambda\_i \lambda\_j}{\left(\lambda\_i - n\_1\right)^2 \left(\lambda\_j - n\_2\right)^2}. \tag{A79}$$

Thus, from (A79), we have

$$
\bar{v}\_{\mathcal{N}\_i,\mathcal{N}\_j}^{(1,1)} = \frac{1}{\lambda\_i \lambda\_j}. \tag{A80}
$$

The conclusion that the two age processes $x\_{\mathcal{N}\_i}(t)$ and $x\_{\mathcal{N}\_j}(t)$ are uncorrelated follows from noting that $\bar{v}\_{\mathcal{N}\_i,\mathcal{N}\_j}^{(1,1)} - \bar{v}\_{\mathcal{N}\_i}^{(1)} \bar{v}\_{\mathcal{N}\_j}^{(1)} = 0$, and hence the correlation coefficient between $x\_{\mathcal{N}\_i}(t)$ and $x\_{\mathcal{N}\_j}(t)$ is zero.
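The chain (A78)–(A80) can be sanity-checked numerically. The sketch below (with arbitrary sample rates $\lambda_i = 2$, $\lambda_j = 3$; the step size `h` is an implementation choice) approximates the mixed partial in (A79) by central finite differences at $(n_1, n_2) = (0, 0)$ and confirms it matches $1/(\lambda_i\lambda_j)$ from (A80), i.e., the product of the marginal first moments:

```python
# Finite-difference check of (A78)-(A80) at hypothetical sample rates.
lam_i, lam_j = 2.0, 3.0

def v_bar(n1, n2):
    # Closed form from step (a) of (A78)
    return lam_i * lam_j / ((lam_i - n1) * (lam_j - n2))

h = 1e-4
# Central-difference approximation of the mixed partial (A79) at (0, 0)
mixed = (v_bar(h, h) - v_bar(h, -h) - v_bar(-h, h) + v_bar(-h, -h)) / (4 * h * h)

# (A80): the mixed partial at the origin equals 1/(lam_i * lam_j), which is
# exactly the product of the marginal first moments 1/lam_i and 1/lam_j,
# so the covariance of the two age processes is zero.
assert abs(mixed - 1.0 / (lam_i * lam_j)) < 1e-6
```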

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **Optimizing Urgency of Information through Resource Constrained Joint Sensing and Transmission**

**Zhuoxuan Ju, Parisa Rafiee and Omur Ozel \***

Department of Electrical and Computer Engineering, George Washington University, Washington, DC 20052, USA

**\*** Correspondence: ozel@gwu.edu

**Abstract:** Applications requiring services from modern wireless networks, such as those involving remote control and supervision, call for maintaining the timeliness of information flows. Current research and development efforts for 5G, Internet of things, and artificial intelligence technologies will benefit from new notions of timeliness in designing novel sensing, computing, and transmission strategies. The age of information (AoI) metric and a recent related urgency of information (UoI) metric enable promising frameworks in this direction. In this paper, we consider UoI optimization in an interactive point-to-point system when the updating terminal is resource constrained to send updates and receive/sense the feedback of the status information at the receiver. We first propose a new system model that involves Gaussian distributed time increments at the receiving end to design interactive transmission and feedback sensing functions and develop a new notion of UoI suitable for this system. We then formulate the UoI optimization with a new objective function involving a weighted combination of urgency levels at the transmitting and receiving ends. By using a Lyapunov optimization framework, we obtain a decision strategy under energy resource constraints at both transmission and receiving/sensing and show that it can get arbitrarily close to the optimal solution. We numerically study performance comparisons and observe significant improvements with respect to benchmarks.

**Keywords:** urgency of information; information freshness; resource constraints; Lyapunov optimization

#### **1. Introduction**

As demand on wireless networks increases exponentially to enable emerging technologies, the timeliness of data delivery and adaptation to the context of information become essential for improved quality of service and experience in time-sensitive applications. To this end, measuring and improving the timeliness of data delivery and effectively adapting to the context of delivered data have been fundamental challenges that researchers and practitioners have worked on actively in recent years. The age of information (AoI) is a well-known metric that measures the timeliness of data from the perspective of the nodes receiving or consuming data [1] and is expressed as the time elapsed since the generation of the latest received data. Although AoI has received much interest as a metric representing the freshness of information, new metrics are needed to address nonlinearity in the aging of data and the time-varying value or context associated with flowing data. As a matter of fact, context-based applications (e.g., automatic driving and artificial intelligence) and nonlinear age [2–4] (as in many IoT applications) require a departure from AoI definition and analysis. Toward this end, the references [5,6] recently proposed an urgency of information (UoI) framework that combines the timeliness and context associated with information updates. In these papers, UoI was formally defined as the product of a context-aware weight and the cost resulting from the real-time estimation error

**Citation:** Ju, Z.; Rafiee, P.; Ozel, O. Optimizing Urgency of Information through Resource Constrained Joint Sensing and Transmission. *Entropy* **2022**, *24*, 1624. https://doi.org/ 10.3390/e24111624

Academic Editors: Anthony Ephremides and Yin Sun

Received: 7 October 2022 Accepted: 7 November 2022 Published: 9 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

in a Gaussian dynamical system, the latter being a well-known nonlinear function of AoI. The UoI can be expressed in mathematical form as follows:

$$F(t) = w\_t \delta(C(t)),\tag{1}$$

where $w\_t$ is the nonnegative coefficient representing the context or value at a specific time $t$, $\delta(\cdot)$ is the cost function, and $C(t)$ is the instantaneous cost measuring the urgency. This formulation subsumes the typical definition of AoI: if $C(t)$ increases by one each time an update is not received, then the common AoI problem is recovered by setting $w\_t = 1$ and $\delta(C(t)) = U(t)C(t)$, where $U(t)$ is an indicator that shows whether the information is updated or not. In the current paper, we pursue a similar metric whereby the urgency level is represented by a coefficient $w\_t$, set as an independent, identically distributed random process that shows how crucial the status information is at a specific moment $t$. In addition, we pursue a quadratic cost function. This formulation enables us to analyze error increments and connect the proposed framework to the classical AoI problem.
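As a toy numerical illustration of (1) (the update slots and horizon below are invented for the example): with $w_t = 1$ and a cost $C(t)$ that increments by one per slot without an update, a linear $\delta(\cdot)$ recovers the familiar AoI sawtooth, while a quadratic $\delta(\cdot)$ penalizes staleness nonlinearly:

```python
# Hypothetical trajectory: updates received at slots 3 and 7.
updates = {3, 7}
C = 0
aoi, uoi_quad = [], []
for t in range(1, 11):
    C = 0 if t in updates else C + 1  # instantaneous cost: resets on update
    w_t = 1.0                          # context weight (constant in this toy case)
    aoi.append(w_t * C)                # linear cost: classical AoI sawtooth
    uoi_quad.append(w_t * C ** 2)      # quadratic cost: nonlinear aging

# aoi      -> [1, 2, 0, 1, 2, 3, 0, 1, 2, 3]
# uoi_quad -> [1, 4, 0, 1, 4, 9, 0, 1, 4, 9]
```

The two trajectories reset at the same slots, but the quadratic cost grows much faster between updates, which is what makes stale information increasingly urgent.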

The UoI framework in this paper is designed to measure the expected performance degradation as a weighted sum of (i) the expected staleness, or informativeness, of the latest sensed Gaussian process at the receiving end with respect to the transmitter and (ii) the lack of synchrony between the two ends, which is maintained by status updating from the transmitting end to the receiving end. Our goal is to build a systematic understanding of the interaction between feedback sensing and update transmission in maintaining improved UoI levels, measuring the synchrony and informativeness of information at one side about the other, when both actions are resource constrained. We employ Lyapunov optimization tools to address this crucial problem.

Lyapunov optimization methods and tools are well known across research communities for controlling queues and, more generally, dynamical systems in a near-optimal sense. In the queuing-theory context, the state of the system at a particular time is the vector of realizations of error variables, which can easily be brought into queue form by lower bounding them by zero and then used to upper bound the optimal cost. Typically, the cost function is defined to take smaller values when the system moves toward desirable states. System stability is achieved by taking control actions that drive the Lyapunov drift toward zero from the negative direction. The key requirement is that all the queues and virtual queues in the system are mean rate stable [7,8]. In addition, the target function is pursued by taking control actions that minimize the Lyapunov penalty. However, because the solution must also maintain system stability, it always has a gap with respect to the optimal solution. Owing to its general applicability in queuing theory, Lyapunov optimization is also used in AoI analysis and optimization. Ref. [9] used Lyapunov optimization to identify the tradeoff between AoI, accuracy, and completeness in a constrained throughput optimization problem. Ref. [10] used Lyapunov optimization to jointly minimize the average cost of sampling and transmitting status updates by users over a wireless channel subject to average AoI constraints.

Our work's motivation is rooted in AoI research presented in the recent past. We next cover some of the literature related to the research proposed in this paper. The references [11,12] address varying source update frequency, and [13,14] address service rate in various queuing models. In the wireless network scenario, scheduling algorithms for optimizing AoI have been studied extensively, considering, for example, the channel state [15,16], throughput [17–19], energy harvesting [20–22], average resource constraints [23,24], multiple sources [25–28], and multiple channels [29–32], to name a few. Ref. [33] studied the calculation and iterative process of AoI in combination with queuing theory and gave the analytic formula for average AoI under a random scheduling strategy. Ref. [34] explored the impact of service rate on average AoI under fixed deadline constraints and random exponential deadline constraints. Regarding link scheduling in wireless networks, ref. [35] studied the link-scheduling problem in every time slot under periodic data updates and proposed random, greedy, Lyapunov optimization, Whittle index, and other strategies for

link scheduling to optimize the average AoI of the network. Ref. [36] proposed offline and online scheduling algorithms based on the Markov decision process for the random data arrival scenario.

Feedback is also an essential factor in wireless communication scenarios and can influence the AoI performance significantly. In particular, it is well known that the feedback may help maintain expedient processing, non-repetitive transmission, and hence, energy efficiency in wireless transmission. For the case of battery-based non-energy harvesting devices, it is also vital to schedule appropriate transmission and sensing strategies to prolong the device's life. As a result, the role of feedback and energy cost in AoI analysis and optimization has received much interest from the research community (see e.g., [37–41]). Additionally, ref. [42] proves that the AoI and energy-harvesting scheduling strongly differ with or without the feedback. Refs. [43,44] minimized the AoI when the sensor uses ON/OFF schemes with energy harvesting nodes. Ref. [45] focused on the extreme cases of one unit battery and infinite battery situations to minimize the average peak AoI with energy constraints. Most recently, the paper [46] provided an analysis of feedback cost in AoI optimization over a point-to-point channel and determined specific conditions when feedback may or may not be useful for AoI optimization.

Decisions to sense/receive updates under energy constraints have also been of interest to AoI researchers. In particular, energy constraints can limit the chance of sensing new data and hence cause AoI to increase. In this context, refs. [47,48] proposed the joint scheduling of sense and transmission schemes to optimize the average peak AoI in an energy-harvesting system. In this paper, we will combine the concept of feedback and sensing, which means that the system will decide whether to sense the feedback information as input. As other related research, refs. [49,50] studied the value of information (VoI) in status update systems, and compared the performance of VoI with AoI. We also refer the reader to the related paper [51]. Based on the idea that AoI is only important when the receiver performs a query, refs. [52,53] proposed the age of information at query (QAoI) and optimized the QAoI.

In this paper, we will extend the UoI optimization framework in [6] to an interactive scenario by considering sensing/receiving costs at the updating terminal under energy resource constraint by using a Lyapunov optimization framework. Resource constraints in receiving/sensing the feedback can be interpreted as a limitation due to processing or energy to make it available for decision making on update transmission. Our motivation can be compared to that of [46] as well, which assumes the cost of feedback is incurred at the receiving end. This new problem calls for coordinated decisions to sense the feedback from the receiver and transmit the update to the receiver. Additionally, we need to account for relativity with respect to the transmitter and receiver sides and measure urgency by using a weight representing their importance under resource constraints. Our framework will address these new issues.

As the main contributions of this paper, we extend the UoI optimization framework by using a new definition that addresses the interactive nature of the setting when transmitting and receiving/sensing information is costly and average resource constraints are present on both actions. Constructing the objective function by assigning different weights to the urgency levels at the transmitting and receiving terminals, we determine jointly optimal scheduling of transmission and receiving/sensing the feedback by using a Lyapunov optimization framework. We obtain the Lyapunov gap and show that the result can be made arbitrarily close to the optimal solution. Our simulation results show that the proposed algorithm performs significantly better than two benchmark schemes, namely the greedy and AoI optimal algorithms.

The rest of the paper is organized as follows. In Section 2, we present the system model of the UoI problem. In Section 3, we formulate the UoI problem and analyze it. In Section 4, we offer numerical results to show the behavior of the solution. Finally, we conclude this paper in Section 5 by summarizing our contributions and discussing future directions.

#### **2. System Model**

We consider the system model in Figure 1. Here, time is slotted: $t = 1, 2, \ldots, T$. The information-carrying signals at the service center and the terminal, $A\_t$ and $Q\_t$, evolve as follows:

$$A\_{t+1} = (1 - U\_2(t))A\_t + K\_t \tag{2}$$

$$Q\_{t+1} = (1 - S\_t U\_1(t))Q\_t + U\_2(t)A\_t. \tag{3}$$

The variable $K\_t \sim \mathcal{N}(0, \sigma^2)$ represents the increments added to the information-carrying signal $A\_t$ and is a Gaussian random variable independent over time and of other variables. For convenience, we take $A\_1, Q\_1 \sim \mathcal{N}(0, \sigma^2)$; however, the initial conditions are assumed given and do not determine the outcome as long as they come from a well-behaved distribution that makes the expectations well defined (cf. Lemma 3 below). $U\_1(t), U\_2(t) \in \{0, 1\}$ are decision variables that determine whether to transmit an update and whether to sense the feedback, respectively. Equation (2) represents the evolution of the information at the receiver with respect to the sensing at the transmitter. When $U\_2(t) = 1$, the sensing action is activated and the information at both ends is synchronized except for an additive Gaussian noise due to causality and the one-time-slot difference. Equation (3) represents the evolution of the information at the transmitter with respect to the receiver side. These two equations capture the interaction between the transmitter and the receiver. Note that if the transmission or sensing does not happen, i.e., if $U\_1(t)S\_t = 0$ or $U\_2(t) = 0$, then $Q\_t$ or $A\_t$, respectively, becomes noisier. This is at the heart of the urgency of information notion we pursue in this paper. When a transmission does not happen (due to not transmitting or a channel erasure), the synchrony between the two sides, represented by $Q\_t$, is not affected as long as a new sensing action is not taken. At the beginning of the $t$th time slot, the terminal first decides $U\_1(t) \in \{0, 1\}$ to determine whether to transmit the information-carrying variable $Q\_t$ to the service center or not. The transmission takes one time slot and goes through an erasure-type wireless channel represented by $S\_t$ with a fixed failure transmission rate $p$.
In particular, *St* = 1 if the transmission is successful and *St* = 0 otherwise. At the same time, the service center feeds back *At* to the terminal, which also takes one time slot with no failure rate. At the end of the *t*th time slot, the feedback arrives at the terminal, and the terminal will decide *U*2(*t*) ∈ {0, 1} to determine whether to sense the feedback or not. We can, in principle, let *At* and *Qt* evolve as max{(1 − *U*2(*t*))*At* + *Kt*, 0} and max{(1 − *StU*1(*t*))*Qt* + *U*2(*t*)*At*, 0} with nonnegative initial values. These versions bring these system states to the form of queues with potentially dependent arrivals and departures. Our Lyapunov drift plus penalty-based analysis will be applicable for both versions. We therefore prefer to keep them as in (2) and (3) in the ensuing analysis.
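A minimal simulation of the dynamics (2) and (3), with example values for $p$ and $\sigma$ (the specific numbers are illustrative), makes the roles of the two decisions concrete: sensing ($U_2 = 1$) resets $A_t$ to a fresh increment regardless of its previous value, while a successful transmission ($S_t U_1 = 1$) clears $Q_t$:

```python
import random

sigma = 1.0   # standard deviation of the increments K_t (example value)
p = 0.3       # channel failure probability (example value)

def step(A, Q, U1, U2, rng):
    """One slot of the dynamics (2)-(3)."""
    S = 0 if rng.random() < p else 1    # erasure channel: S = 1 on success
    K = rng.gauss(0.0, sigma)           # Gaussian increment K_t
    A_next = (1 - U2) * A + K           # Equation (2)
    Q_next = (1 - S * U1) * Q + U2 * A  # Equation (3), using the old A_t
    return A_next, Q_next

rng = random.Random(0)
A, Q = 0.0, 0.0
for t in range(5):
    # Example policy: always transmit, sense every other slot
    A, Q = step(A, Q, U1=1, U2=t % 2, rng=rng)
```

Note that `Q_next` uses the previous value of `A`, reflecting the one-slot causality delay discussed above.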

We can now state our optimization problem $P\_1$, which minimizes an upper bound on the average UoI:

$$\min\_{\pi\_t} \limsup\_{T \to \infty} \frac{1}{T} \sum\_{t=0}^{T-1} E\left[w\_t(Q\_t^2 + MA\_t^2)\right] \tag{4}$$

$$\text{s.t.} \quad \limsup\_{T \to \infty} \frac{1}{T} \sum\_{t=0}^{T-1} E\left[U\_1(t)\right] \le \phi\_1 \tag{5}$$

$$\limsup\_{T \to \infty} \frac{1}{T} \sum\_{t=0}^{T-1} E[U\_2(t)] \le \phi\_2, \tag{6}$$

where $\pi\_t = \{U\_1(t), U\_2(t)\}$ is the sequence of decisions, $w\_t$ is the nonnegative weight of urgency modeled as an i.i.d. random variable, $M$ is the weight of the relative error of the variable $A\_t$ at the transmitter side, $\phi\_1$ is the energy (or frequency) constraint on transmission, and $\phi\_2$ is the energy (or frequency) constraint on sensing. In order to satisfy the average transmission/sensing frequency constraints (5) and (6), we define the virtual queues $H\_t$ and $G\_t$ as follows, both initialized at 0:

$$H\_{t+1} = \max\{H\_t - \phi\_1 + U\_1(t), \, 0\} \tag{7}$$

$$G\_{t+1} = \max\{G\_t - \phi\_2 + U\_2(t), \, 0\}.\tag{8}$$

Next, let us consider the evolution of the transmission virtual queue: if the terminal decides to transmit in time slot $t$, the transmission virtual queue $H\_t$ increases by $1 - \phi\_1$; otherwise, it decreases by $\phi\_1$. As a result, a longer virtual queue indicates that more transmissions have been performed relative to the budget. The sensing virtual queue $G\_t$ evolves similarly. Therefore, these two virtual queues appropriately track the historical transmission/sensing frequency usage.
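The bookkeeping in (7) can be traced on a short hypothetical decision sequence: when the empirical transmission frequency exactly matches the budget $\phi_1$, the virtual queue drains back toward zero, which is the mean rate stability that constraint (5) requires:

```python
phi1 = 0.4                                   # transmission budget (example value)
H = 0.0
decisions = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]   # hypothetical U_1(t) sequence, average 0.4
trace = []
for U1 in decisions:
    H = max(H - phi1 + U1, 0.0)              # virtual queue update, Equation (7)
    trace.append(H)

# Four transmissions in ten slots exactly meet the budget, so H returns to ~0.
assert abs(H) < 1e-9
```

If the sequence transmitted more often than $\phi_1$ allows, $H$ would grow without bound, which is precisely the signal the decision rule later uses to throttle transmissions.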

**Figure 1.** System model with joint transmission and feedback reception.

#### **3. Optimizing the Urgency of Information in** *P***<sup>1</sup>**

In this section, we will systematically develop a Lyapunov optimization framework for optimizing an upper bound for the solution of *P*1. We summarize the notations we use throughout the rest of the paper in Table 1.



#### *3.1. Lyapunov Function Definitions*

In order to use the Lyapunov optimization framework, we first define the Lyapunov function $L\_t$ as a weighted quadratic sum of the system states:

$$L\_t = \frac{1}{2}V H\_t^2 + \frac{1}{2}Z G\_t^2 + \frac{1}{2}\theta Q\_t^2 + \frac{1}{2}\beta A\_t^2,\tag{9}$$

where *V*, *Z*, *θ* and *β* are the weights for different variables, which represent different importance levels of the stability of the queues or system states *Ht*, *Gt*, *Qt* and *At*, respectively. In our analysis, we use the terms "queue" or "system state" interchangeably. Although the evolution of *At* and *Qt* in (2) and (3) can take negative values, we can redefine them by lower bounding their evolution by zero and make their definitions suitable as a queue with arrivals and departures potentially depending on the control actions. However, none of the analysis steps we take in this paper will be affected by this redefinition, as the Lyapunov analysis we present essentially optimizes a bound on the system performance. We therefore continue using the original definitions (2) and (3). The Lyapunov drift function for this system can be expressed as

$$
\Delta\_t = E[L\_{t+1} - L\_t \mid Q\_t, n\_t, H\_t, G\_t, w\_{t+1}], \tag{10}
$$

where $n\_t$ is the number of time slots since the last time the terminal decided to sense the feedback. Clearly, in the $t$th time slot, the terminal has knowledge of $H\_t$, $G\_t$, and $Q\_t$. However, the terminal cannot access the specific value of $A\_t$, because the latest estimation error it received arrived at the end of the $(t-1)$st time slot. Nevertheless, the terminal is aware of the number of time slots since the last sensing decision, $n\_t$, which evolves as:

$$n\_{t+1} = (1 - U\_2(t))n\_t + 1.\tag{11}$$

As a result, the terminal decides whether to sense based on $n\_t$, the number of time slots since the last sensing decision, rather than on the error $A\_t$ at the service center.

**Lemma 1.** *In each time slot t, given the error at the terminal $Q\_t$, the urgency weight at the next time slot $w\_{t+1}$, the number of time slots since the terminal last decided to sense $n\_t$, and the virtual queue lengths $H\_t$ and $G\_t$, set $Y\_t = \{Q\_t, n\_t, H\_t, G\_t, w\_{t+1}\}$; then we can obtain an upper bound on the Lyapunov drift $\Delta\_t$ as*

$$\begin{split} \Delta\_t &\le \frac{1}{2}(V+Z) + \frac{1}{2}\beta\sigma^2 - V\phi\_1 H\_t - Z\phi\_2 G\_t + \left(VH\_t - \frac{1}{2}\theta p Q\_t^2\right)E[U\_1(t)|Y\_t] \\ &\quad + \left(ZG\_t + \frac{1}{2}\theta n\_t\sigma^2 - \frac{1}{2}\beta n\_t\sigma^2\right)E[U\_2(t)|Y\_t]. \end{split} \tag{12}$$

**Proof.** See Appendix A.

Denote the penalty in the $t$th time slot by $f\_t$. Because of causality, $U\_1(t)$ and $U\_2(t)$ affect the UoI in the $(t+1)$st time slot. Therefore, we let $f\_t = Rw\_{t+1}(Q\_{t+1}^2 + MA\_{t+1}^2)$, where $R$ is the weight of the UoI relative to system stability and the remaining terms represent the UoI at $t+1$.

**Lemma 2.** *If we set the penalty in the tth time slot as $f\_t = Rw\_{t+1}(Q\_{t+1}^2 + MA\_{t+1}^2)$ and denote the average urgency weight by $\widetilde{w}$, then the Lyapunov drift plus penalty function is upper bounded as:*

$$\begin{split} \Delta\_t + E[f\_t|Y\_t] \le & \frac{1}{2}(V+Z) + \frac{1}{2}\beta\sigma^2 + R\widetilde{w}(Q\_t^2 + M\sigma^2 + Mn\_t\sigma^2) - V\phi\_1 H\_t - Z\phi\_2 G\_t \\ & + \left(VH\_t - \frac{1}{2}\theta p Q\_t^2 - R\widetilde{w} p Q\_t^2\right)E[U\_1(t)|Y\_t] \\ & + \left(ZG\_t + \frac{1}{2}\theta n\_t\sigma^2 - \frac{1}{2}\beta n\_t\sigma^2 + (1-M)R\widetilde{w} n\_t\sigma^2\right)E[U\_2(t)|Y\_t]. \end{split} \tag{13}$$

#### **Proof.** See Appendix B.

**Lemma 3.** *If E*[*L*0] < ∞*, and* Δ*<sup>t</sup>* + *E*[ *ft*] ≤ *C, where C is a constant, then all the queues and virtual queues in the system are mean rate stable.*

#### **Proof.** See Appendix C.

#### *3.2. Finding Appropriate Weights for the System*

Next, we find values of the weight parameters $\theta$ and $\beta$ that minimize the right-hand side of (13) to the extent possible. Note that it is feasible to use a stationary randomized scheme that independently transmits and senses with probabilities $\phi\_1$ and $\phi\_2$ in each time slot, which translates to $E[U\_1(t)] = \phi\_1$ and $E[U\_2(t)] = \phi\_2$. As a result, we rearrange (13) to get

$$\begin{split} E[L\_{t+1} - L\_t + f\_t | Y\_t] \le & \left(R\widetilde{w}M + \frac{1}{2}\beta\right)\sigma^2 + \frac{1}{2}(V+Z) + \left(-\frac{1}{2}\theta p\phi\_1 - R\widetilde{w}p\phi\_1 + R\widetilde{w}\right)Q\_t^2 \\ & + \left(\frac{1}{2}(\theta - \beta)\phi\_2 + R\widetilde{w}M + (1-M)R\widetilde{w}\phi\_2\right)n\_t\sigma^2. \end{split} \tag{14}$$

To make the right-hand side of (14) no larger than a constant, we require the coefficients of $Q\_t^2$ and $n\_t\sigma^2$ to be no larger than 0. For the coefficient of $Q\_t^2$,

$$-\frac{1}{2}\theta p\phi\_1 - R\widetilde{w}p\phi\_1 + R\widetilde{w} \le 0$$

$$\theta \ge \frac{2}{p\phi\_1}(1 - p\phi\_1)R\widetilde{w}.\tag{15}$$

For the coefficient of $n\_t\sigma^2$,

$$\frac{1}{2}(\theta - \beta)\phi\_2 + R\widetilde{w}M + (1 - M)R\widetilde{w}\phi\_2 \le 0$$

$$\beta \ge \theta + 2\left(\frac{1}{\phi\_2} - 1\right)R\widetilde{w}M + 2R\widetilde{w}.\tag{16}$$

As a result, we take the value of the parameters *θ* and *β* as

$$\theta = \frac{2}{p\phi\_1}(1 - p\phi\_1)R\widetilde{w} \tag{17}$$

$$\beta = \frac{2}{p\phi\_1}R\widetilde{w} + 2\left(\frac{1}{\phi\_2} - 1\right)R\widetilde{w}M. \tag{18}$$

Substituting the values of the parameters $\theta$ and $\beta$ back into (14), we obtain the upper bound on $E[L\_{t+1} - L\_t + f\_t \mid Y\_t]$ as

$$E[L\_{t+1} - L\_t + f\_t | Y\_t] \le \left(\frac{1}{p\phi\_1} + \frac{M}{\phi\_2}\right)R\widetilde{w}\sigma^2 + \frac{1}{2}(V+Z). \tag{19}$$

Note that the right-hand side of (19) is a constant, which means that all the queues and virtual queues in the system are mean rate stable under the above derived conditions.

#### *3.3. Deriving Lyapunov Optimal Decisions*

We now minimize the upper bound in the RHS of (13), which is actually in the following form:

$$\begin{split} \min\_{\pi\_t} & \quad \left(VH\_t - \frac{1}{2}\theta p Q\_t^2 - Rw\_{t+1}p Q\_t^2\right)U\_1(t) \\ & \quad + \left(ZG\_t + \frac{1}{2}(\theta - \beta)n\_t\sigma^2 + (1 - M)Rw\_{t+1}n\_t\sigma^2\right)U\_2(t). \end{split} \tag{20}$$

We next show the scheduling scheme for each time slot. Substituting the values of the parameters $\theta$ and $\beta$ into (20), we get the following:

$$\begin{split} \min\_{\pi\_t} & \left[VH\_t - \left(w\_{t+1} - \widetilde{w} + \frac{\widetilde{w}}{p\phi\_1}\right)Rp Q\_t^2\right]U\_1(t) \\ & + \left[ZG\_t + \left((M-1)\left(\widetilde{w} - w\_{t+1}\right) - \frac{\widetilde{w}M}{\phi\_2}\right)Rn\_t\sigma^2\right]U\_2(t). \end{split} \tag{21}$$

Set the update indices $a\_t = VH\_t - \left(w\_{t+1} - \widetilde{w} + \frac{\widetilde{w}}{p\phi\_1}\right)Rp Q\_t^2$ and $b\_t = ZG\_t + \left((M-1)\left(\widetilde{w} - w\_{t+1}\right) - \frac{\widetilde{w}M}{\phi\_2}\right)Rn\_t\sigma^2$; the solution to the scheme (20) is then:

$$U\_1(t) = \begin{cases} 1, & a\_t < 0 \\ 0, & a\_t \ge 0 \end{cases} \tag{22}$$

$$U\_2(t) = \begin{cases} 1, & b\_t < 0 \\ 0, & b\_t \ge 0 \end{cases} \tag{23}$$

We summarize below the resulting Lyapunov optimal Algorithm 1.

#### **Algorithm 1** Decisions scheduling scheme based on Lyapunov optimization

**Require:** $A\_0$, $Q\_0$, $H\_0$, $G\_0$, $n\_0$, $S\_t$, $K\_t$, $\phi\_1$, $\phi\_2$, $w\_t$, $\widetilde{w}$, $V$, $Z$, $M$, $R$
1: **for** each time slot $t$ **do**
2: &nbsp;&nbsp;Calculate $a\_t = VH\_t - \left(w\_{t+1} - \widetilde{w} + \frac{\widetilde{w}}{p\phi\_1}\right)Rp Q\_t^2$;
3: &nbsp;&nbsp;Calculate $b\_t = ZG\_t + \left((M-1)\left(\widetilde{w} - w\_{t+1}\right) - \frac{\widetilde{w}M}{\phi\_2}\right)Rn\_t\sigma^2$;
4: &nbsp;&nbsp;**if** $a\_t < 0$ **then**
5: &nbsp;&nbsp;&nbsp;&nbsp;$U\_1(t) = 1$;
6: &nbsp;&nbsp;**else**
7: &nbsp;&nbsp;&nbsp;&nbsp;$U\_1(t) = 0$;
8: &nbsp;&nbsp;**end if**
9: &nbsp;&nbsp;**if** $b\_t < 0$ **then**
10: &nbsp;&nbsp;&nbsp;&nbsp;$U\_2(t) = 1$;
11: &nbsp;&nbsp;**else**
12: &nbsp;&nbsp;&nbsp;&nbsp;$U\_2(t) = 0$;
13: &nbsp;&nbsp;**end if**
14: &nbsp;&nbsp;Calculate $A\_{t+1} = (1 - U\_2(t))A\_t + K\_t$;
15: &nbsp;&nbsp;Calculate $Q\_{t+1} = (1 - S\_t U\_1(t))Q\_t + U\_2(t)A\_t$;
16: &nbsp;&nbsp;Calculate $H\_{t+1} = \max\{H\_t - \phi\_1 + U\_1(t), 0\}$;
17: &nbsp;&nbsp;Calculate $G\_{t+1} = \max\{G\_t - \phi\_2 + U\_2(t), 0\}$;
18: &nbsp;&nbsp;Calculate $n\_{t+1} = (1 - U\_2(t))n\_t + 1$;
19: **end for**

Based on the algorithm, we can make decisions by scheduling every time slot to minimize the value of UoI and maintain the virtual queue stability simultaneously. From the algorithm, it is apparent that we can successfully decouple the joint decisions into two independent threshold schemes, which makes the implementation desirably simple.
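Algorithm 1 can be sketched in a few lines of code. The parameter values below are illustrative only, and $w_t$ is modeled as i.i.d. Uniform(0, 2) so that $\widetilde{w} = 1$:

```python
import random

def lyapunov_uoi(T, p=0.3, phi1=0.3, phi2=0.3, M=0.5,
                 V=1.0, Z=1.0, R=10.0, sigma=1.0, seed=0):
    """Run a sketch of Algorithm 1 for T slots; returns the time-averaged UoI."""
    rng = random.Random(seed)
    A = Q = H = G = 0.0
    n = 1
    w_bar = 1.0                                  # mean urgency weight (w_t ~ U(0, 2))
    uoi = 0.0
    for _ in range(T):
        w_next = rng.uniform(0.0, 2.0)           # urgency weight w_{t+1}
        # Update indices a_t and b_t (lines 2-3, from (21))
        a = V * H - (w_next - w_bar + w_bar / (p * phi1)) * R * p * Q ** 2
        b = Z * G + ((M - 1) * (w_bar - w_next) - w_bar * M / phi2) * R * n * sigma ** 2
        U1 = 1 if a < 0 else 0                   # threshold decisions (lines 4-13)
        U2 = 1 if b < 0 else 0
        S = 0 if rng.random() < p else 1         # erasure channel
        K = rng.gauss(0.0, sigma)
        A, Q = (1 - U2) * A + K, (1 - S * U1) * Q + U2 * A   # lines 14-15
        H = max(H - phi1 + U1, 0.0)              # line 16
        G = max(G - phi2 + U2, 0.0)              # line 17
        n = (1 - U2) * n + 1                     # line 18
        uoi += w_next * (Q ** 2 + M * A ** 2)
    return uoi / T

avg_uoi = lyapunov_uoi(T=5000)
```

The tuple assignment on the line marked 14-15 evaluates both right-hand sides with the old values, matching the causal ordering of Algorithm 1. The two thresholds are computed independently, reflecting the decoupling noted above.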

#### *3.4. Solving for the Target Function and Lyapunov Gap*

In this section, we solve for the target function and derive an expression for the gap between the optimal solution and the result obtained by the Lyapunov optimization algorithm. We also show that the result obtained by the Lyapunov optimization algorithm can be made arbitrarily close to the optimal solution. Summing both sides of (19) over the $T$ time slots, we get

$$E\left[L\_T - L\_0 + \sum\_{t=0}^{T-1} f\_t\right] \le T\left[\left(\frac{1}{p\phi\_1} + \frac{M}{\phi\_2}\right)R\widetilde{w}\sigma^2 + \frac{1}{2}(V+Z)\right].\tag{24}$$

Note that $L_T \ge 0$ and $L_0 = 0$ (all queues start empty), and then divide both sides of (24) by $T$ to get the time-averaged result

$$\frac{1}{T}E\left[\sum_{t=0}^{T-1}f_t\right] \le \left(\frac{1}{p\varphi_1} + \frac{M}{\varphi_2}\right)R\tilde{w}\sigma^2 + \frac{1}{2}(V+Z).\tag{25}$$

**Theorem 1.** *Denote the problem in (20) by $P_2(\pi_t)$; then the solution of $P_2(\pi_t)$ satisfies the following gap:*

$$\frac{1}{T}\sum_{t=0}^{T-1} E\left[w_t(Q_t^2 + MA_t^2)\right] \le \left(\frac{1}{p\varphi_1} + \frac{M}{\varphi_2}\right)\tilde{w}\sigma^2 + \frac{V+Z}{2R}.\tag{26}$$

*That is, the solution of $P_2(\pi_t)$ can be approximated by the solution of $P_1(\pi_t)$, and the gap between them is $\frac{V+Z}{2R}$.*

#### **Proof.** See Appendix D.

To be precise, the proof of this gap result in Appendix D requires $A_t$ and $Q_t$ in (2) and (3) to be lower bounded by zero. Nevertheless, our numerical results remain consistent with this gap even when that assumption does not hold. Note that as $R$ is taken as large as possible, the result obtained by the Lyapunov optimization algorithm $P_2(\pi_t)$ can be made arbitrarily close to the optimal result $P_1^*(\pi_t)$. The quantity $\frac{V+Z}{2R}$ can also be seen as the ratio of the weight of the energy constraints to that of the UoI, which captures the tradeoff between the UoI and the energy constraints.

#### **4. Numerical Results**

In this section, we present extensive numerical results to explore the behaviour of the optimal scheme under various constraints and scenarios. At the beginning of each time slot, the terminal first decides whether or not to transmit the error packets to the service center. The transmission takes 1 ms and goes through a wireless channel with a fixed transmission failure rate. At the same time, the service center transmits the estimation error (feedback) to the terminal, which also takes 1 ms with no failure. At the end of each time slot, the feedback arrives at the terminal, and the terminal decides whether to sense this feedback. Meanwhile, the service center receives the error packets and the latest estimation of the Gaussian noise. The service center immediately calculates the error difference between the transmitted status information and the received status information and adds that new error into the error packet.

#### *4.1. Response to Urgency Levels*

To demonstrate the system's response to a new urgency, for every 5000 time slots we set $w_t = 100$ in 50 consecutive time slots and $w_t = 1$ in the rest. The transmission/sensing energy constraints are set as $\varphi_1 = 0.25$ and $\varphi_2 = 0.5$. The channel error rate is $p = 0.8$, the weights of the UoI are set as $M = 2.5$ and $R = 2$, and the weights of the system states are set as $V = Z = 1$. Additionally, the Gaussian noise variance is set to unity. Figures 2–4 show a sample evolution of the square of errors $MA_t^2 + Q_t^2$ and the two virtual queue lengths $H_t$, $G_t$. Observing Figures 2–4, we see that when the urgency level rises, the square of errors drops significantly, and the virtual queues keep increasing because update transmissions are ramped up. However, due to the energy constraints, the terminal's probabilities of transmitting and sensing are affected. This is why the square of errors increases, and the transmission virtual queue decreases, after the urgency. These results show that the system can swiftly respond to urgency levels while keeping the error variance portion of the UoI (i.e., $Q_t^2 + MA_t^2$) at a reasonable level at all times.

**Figure 2.** UoI sequence obtained by the proposed Lyapunov algorithm under a specific realization of weights *wt*.

**Figure 3.** Transmission virtual queue under the same realization of weight *wt* in Figure 2.

**Figure 4.** Sensing virtual queue under the same realization of weight *wt* in Figure 2.

#### *4.2. Tradeoff between UoI and System Parameters*

In this section, we compare how the relationships between different variables affect the UoI in the system. Unless otherwise specified, we set the energy constraints of transmission/sensing as $\varphi_1 = \varphi_2 = 0.8$, the weights of the system stability as $V = Z = 1$, the weights of the total UoI and of the UoI in the service center as $R = M = 2$, and the channel error rate as $p = 0.8$; the urgency weight at each time slot is i.i.d., equal to 1 with probability 0.99 and 100 with probability 0.01.

Figures 5 and 6 present the relationship between UoI and the transmission/sensing energy constraints. They also show the effect of the system stability weights on UoI. In Figure 5, the energy constraint of transmission ranges from 0.1 to 1.0, and the weight of the queue stability (i.e., the virtual queue levels) in the transmission part is set as $V$ = 1, 10, 100, and 1000. Similarly, in Figure 6, we set the energy constraint of sensing from 0.1 to 1.0 and $Z$ = 1, 10, 100, and 1000. We observe that when the average energy is less constrained, the UoI decreases. However, the UoI does not change much once the transmission frequency reaches 0.5. This is because the frequency constraint becomes inactive beyond a certain level, depending on the sensing activity. As sensing and transmission operate in tandem, the higher frequency drives the overall performance. Moreover, when the stability weights $V$ and $Z$ are small, e.g., $V = 1$ or $Z = 1$, we pay more attention to the value of UoI than to the transmission frequency levels, yielding a virtual queue significantly above the set constraint. On the other hand, if we set the stability weights $V$ and $Z$ at a high level, e.g., $V = 1000$ or $Z = 1000$, the virtual queue stability becomes much more important, which compromises UoI performance.

In Figure 7, the energy constraint of transmission is set from 0.1 to 1.0, and the failure probability of transmission is set as $p$ = 0.2, 0.4, 0.6, 0.8, and 1.0. We observe that the higher $p$ is, the lower the average UoI is. This is because we need to transmit more frequently to achieve the optimal average UoI when the success rate is lower. In Figure 8, we observe that the average UoI decreases whenever either $\varphi_1$ or $\varphi_2$ increases, because we have more chances to transmit or sense when the energy is sufficient. Additionally, as $\varphi_2$ gets smaller, the curve converges earlier, because the error packets in the service center are the input of the terminal. When the probability of sensing the feedback is lower, the transmission frequency also cannot be large because of this input limitation, even if the transmission energy is sufficient.

**Figure 5.** Tradeoff between transmission energy constraint, V and UoI.

**Figure 6.** Tradeoff between sense energy constraint, Z and UoI.

**Figure 7.** Tradeoff between transmission energy constraint, channel failure rate *p*, and UoI.

**Figure 8.** Tradeoff between transmission energy constraint, sensing energy constraint, and UoI.

In Figure 9, we set the weight of the total UoI as $R$ = 1, 8, 16 and 64, and the weights of the two virtual queues as $V = Z = 20$. As expected, the larger the weight of the total UoI, the smaller the average UoI, because the system considers the UoI more important and takes more chances to transmit and sense. Moreover, the Lyapunov gap, i.e., $\frac{V+Z}{2R}$, diminishes as $R$ increases.
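As a quick arithmetic check of Theorem 1's bound, the gap $(V+Z)/(2R)$ for the weights used in this experiment evaluates as follows:

```python
# Theorem 1 gap (V+Z)/(2R) for V = Z = 20 and the R values of Figure 9.
V = Z = 20
gaps = {R: (V + Z) / (2 * R) for R in (1, 8, 16, 64)}
print(gaps)  # {1: 20.0, 8: 2.5, 16: 1.25, 64: 0.3125}
```

so increasing $R$ from 1 to 64 shrinks the guaranteed gap from 20 to about 0.31.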

**Figure 9.** Tradeoff between transmission energy constraint, R and UoI.

#### *4.3. Tradeoff between UoI and System Stability*

The tradeoff between the target function and the system stability is always an exciting and crucial question in the Lyapunov optimization framework. This section will show examples of how different weights can affect the system stability and UoI. We set *T* = 10, 000 and channel error rate as *p* = 0.8. The urgency weight *wt* is determined as an i.i.d. random process with probability 0.99 being 1 and probability 0.01 being 1000. We will observe the number of update transmissions and senses (i.e. the energies spent for update transmission and sensing throughout *T* = 10, 000 slots) to represent system stability.

In Figures 10 and 11, we set the weight of the system stability as $V = Z = 10$ and the weight of UoI as $R = M = 2$. As more energy becomes available, we have more chances to transmit and sense. In addition, the number of transmissions is always smaller than or equal to the number of senses. This makes sense because the input error in the terminal comes from the service center and is sent together in one transmission. In addition, even if there is no energy constraint for the transmission, e.g., $\varphi_1 = 1.0$, the number of transmissions does not reach the value of the constraint. This is because the frequency constraint becomes inactive beyond a certain level. However, when $\varphi_2 \le 0.2$, the energy spent for sensing goes above the set energy constraints. The reason is that the weight of UoI is much larger than the weight of stability, so the system sacrifices stability for better UoI.

**Figure 10.** Energy spent for update transmission when V = Z = 10.

**Figure 11.** Energy spent for sensing when V = Z = 10.

In Figures 12 and 13, we set the weight of the system stability as $V = Z = 80$, which is larger than the weight of UoI. We see that neither the transmission nor the sensing constraint is binding. Comparing with Figures 5 and 6, we observe that the UoI with $V, Z = 100$ is close to the UoI with $V, Z = 1$. Hence, by sacrificing a small amount of UoI, a very stable system can be guaranteed.

**Figure 12.** Energy spent for update transmission when V = Z = 80.

**Figure 13.** Energy spent for sensing when V = Z = 80.

In Figure 14, we set the weight of the UoI in the service center as $M \in [1, 10]$, the weights $Z$ = 8, 16, 32, 64, and 128, and $V = 5$, $\varphi_1 = 0.5$ and $\varphi_2 = 0.8$. The virtual queue $G_t$ is small when its weight is large and the energy constraints are tight. In addition, when the weight of the UoI in the service center $M$ increases, the sensing frequency keeps increasing, because the UoI in the service center is much more important than the virtual queue stability and the information in the terminal. This also illustrates that our framework can flexibly accommodate different cases by using different weights.

**Figure 14.** Tradeoff between *M*, *Z* and sensing frequency.

#### *4.4. Comparison of Lyapunov Optimal Performance with Other Algorithms*

The greedy and probabilistic algorithms are natural baselines for this problem. The main idea of the greedy algorithm is that the terminal decides to transmit/sense at time $t$ if the instantaneous transmission/sensing frequency at time $t$ has not yet reached the corresponding limit. In the probabilistic algorithm, in each time slot the terminal transmits/senses with probability equal to the value of the frequency constraint. For Figure 15, we set the weight of system stability as $V = Z = 30$. The channel success rate is set as $p = 0.6$, the sensing energy constraint as $\varphi_2 = 0.8$, and the weight of urgency at each time slot is the same as before. Because the greedy algorithm takes actions independent of urgency, we compare the average UoI with $w_t = 1$. From the figure, the average error portion of UoI (i.e., $Q_t^2 + MA_t^2$) obtained by Lyapunov optimization is always lower than that of the other two algorithms, especially when energy is scarce, and the gap closes with increasing energy availability.
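The two baseline decision rules can be sketched in a few lines; the function names and the driver loop below are our own illustration of the rules described above, and both respect the frequency budget $\varphi$ over a long horizon.

```python
import numpy as np

def greedy_decide(count, t, phi):
    """Greedy: act whenever the running frequency count/(t+1) is below phi."""
    return 1 if count / (t + 1) < phi else 0

def prob_decide(phi, rng):
    """Probabilistic: act with probability equal to the frequency constraint."""
    return 1 if rng.random() < phi else 0

# Driver: both baselines keep their long-run action frequency near phi.
rng = np.random.default_rng(1)
phi = 0.5
T = 10_000
g_count = p_count = 0
for t in range(T):
    g_count += greedy_decide(g_count, t, phi)
    p_count += prob_decide(phi, rng)
print(g_count / T, p_count / T)  # both close to phi
```

Neither rule looks at the urgency weight $w_t$, which is why they lose to the Lyapunov scheduler when energy is scarce.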

**Figure 15.** Lyapunov optimal algorithm and greedy algorithm.

Recall that UoI can subsume various AoI problems. For instance, if we set the cost function $\delta(\cdot)$ as a linear function with unit parameter and the urgency weight $w_t = 1$, then we can express the AoI in the terminal $\tilde{Q}_t$ and the AoI in the service center $\tilde{A}_t$ as

$$\tilde{A}_{t+1} = (1 - U_2(t))\tilde{A}_t + 1 \tag{27}$$

$$\tilde{Q}_{t+1} = (1 - S_t U_1(t))\tilde{Q}_t + U_2(t)\tilde{A}_t. \tag{28}$$

Let us use the same Lyapunov optimization algorithm described earlier, along with the same weights for the system state variables and target function, for a fair comparison. In Figure 16, we set the weight of the virtual queues as $V = Z = 20$ and the weight of UoI as $R = M = 2$; the weights of the system states $\theta$, $\beta$ in the AoI-optimal scheme are set to the same values as in the UoI-optimal scheme and are recalculated in each round. Additionally, the probability of failed transmission is set as $p = 0.6$ and the sensing frequency limitation as $\varphi_2 = 0.6$. We observe that the average UoI obtained by the UoI-optimal scheme is much better than that obtained by the AoI-optimal scheme. In addition, the average UoI under the UoI-optimal scheme is smaller than the average weighted AoI under the AoI-optimal scheme. This is because, in the AoI model, the increment $K_t$ is always 1, whereas the UoI model yields a lower expected increment.

**Figure 16.** Lyapunov optimization vs. AoI optimal.
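The AoI recursions (27) and (28) can be simulated directly. The sketch below uses the probabilistic baseline for the decisions purely for illustration (the paper applies the Lyapunov scheduler to this special case); the unit increment replaces the Gaussian $K_t$.

```python
import numpy as np

def aoi_trajectory(T, p, phi1, phi2, rng):
    """AoI special case of the UoI model: unit increments, eqs. (27)-(28)."""
    A = Q = 0.0
    traj = []
    for t in range(T):
        # Probabilistic baseline decisions (illustrative only).
        U1 = 1 if rng.random() < phi1 else 0
        U2 = 1 if rng.random() < phi2 else 0
        S = 1 if rng.random() < p else 0       # transmission success indicator
        A_next = (1 - U2) * A + 1              # eq. (27): age at the service center
        Q = (1 - S * U1) * Q + U2 * A          # eq. (28): age at the terminal
        A = A_next
        traj.append((Q, A))
    return traj
```

Because the increment is deterministically 1, the service-center age $\tilde{A}_t$ never drops below 1, in contrast to the zero-mean Gaussian increments of the UoI model.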

#### **5. Conclusions**

This paper focused on urgency of information (UoI) optimization through joint sensing and transmission. We proposed a new interactive status updating problem over a point-to-point channel in which transmission and sensing actions are determined to minimize UoI, a combination of the staleness of the sensed data and the synchronization between the two ends, under resource constraints, and we used a Lyapunov optimization framework for its optimization. We obtained the gap between the optimal solution and the result of the Lyapunov optimal algorithm, and proved that this gap can be made arbitrarily small. We presented an extensive numerical study that illustrates various features of the model and the resulting algorithm, as well as potential performance improvements with respect to several schemes. In future work, we plan to extend this work in multiple directions, such as the case of multiple terminals in series or parallel, on-demand UoI definition and optimization, and the cases of computation/transmission tradeoffs and dynamic energy constraints.

**Author Contributions:** Conceptualization, Z.J., P.R. and O.O.; methodology, Z.J., P.R. and O.O.; software, Z.J.; validation, Z.J., P.R. and O.O.; formal analysis, Z.J., P.R. and O.O.; investigation, Z.J., P.R. and O.O.; writing—original draft preparation, Z.J. and P.R.; writing—review and editing, Z.J., P.R. and O.O.; visualization, Z.J. and P.R.; supervision, P.R. and O.O.; project administration, O.O. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by National Science Foundation under grant CNS 2219180.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

Based on (7) we can get the following sequence of steps:

$$\begin{aligned}
&E\left[H_{t+1}^2 - H_t^2 \mid Y_t\right] \\
&\le E\left[(H_t - \varphi_1 + U_1(t))^2 - H_t^2 \mid Y_t\right] \\
&= E\left[H_t^2 + \varphi_1^2 + U_1(t)^2 - 2\varphi_1 H_t + 2H_t U_1(t) - 2\varphi_1 U_1(t) - H_t^2 \mid Y_t\right] \\
&= E\left[(\varphi_1 - U_1(t))^2 + 2(-\varphi_1 + U_1(t))H_t \mid Y_t\right] \\
&\le 1 + 2(-\varphi_1 + E\left[U_1(t) \mid Y_t\right])H_t,
\end{aligned}\tag{A1}$$

where the first inequality follows from the definition of $H_t$ in (7) together with the identity that for any $X = \max\{a + b - c, 0\}$, $X^2 \le (a + b - c)^2$; the subsequent equalities follow from rearranging terms, and the final inequality follows from $(\varphi_1 - U_1(t))^2 \le 1$. Based on Equation (8), and using the same method as for (A1), we get the following inequality:

$$E\left[G_{t+1}^2 - G_t^2 \mid Y_t\right] \le 1 + 2(-\varphi_2 + E\left[U_2(t) \mid Y_t\right])G_t.\tag{A2}$$

Based on (2), we have

$$\begin{aligned}
&E\left[A_{t+1}^2 - A_t^2 \mid Y_t\right] \\
&= E\left[(1 - U_2(t))^2 A_t^2 + 2A_t K_t(1 - U_2(t)) + K_t^2 - A_t^2 \mid Y_t\right].
\end{aligned}\tag{A3}$$

Recall that $K_t \sim \mathcal{N}(0, \sigma^2)$ is i.i.d. Gaussian. Since the queue $A_t$ is a summation of the $K_t$, and a sum of Gaussian random variables is still Gaussian, the error in the service center also follows a Gaussian distribution, $A_t \sim \mathcal{N}(0, n_t\sigma^2)$. In addition, as $U_2(t) \in \{0, 1\}$, we can simplify (A3) using $U_2(t)^2 = U_2(t)$ and $(1 - U_2(t))^2 = 1 - U_2(t)$. As a result, we have

$$E\left[A_{t+1}^2 - A_t^2 \mid Y_t\right] = -n_t\sigma^2 E\left[U_2(t) \mid Y_t\right] + \sigma^2.\tag{A4}$$

Based on (3) and the fact that $(1 - U_1(t)S_t)^2 = 1 - U_1(t)S_t$, we have

$$\begin{aligned}
&E\left[Q_{t+1}^2 - Q_t^2 \mid Y_t\right] \\
&= E\left[(1 - U_1(t)S_t)^2 Q_t^2 + 2Q_t A_t U_2(t)(1 - U_1(t)S_t) + A_t^2 U_2(t)^2 - Q_t^2 \mid Y_t\right] \\
&= -Q_t^2\, p\, E\left[U_1(t) \mid Y_t\right] + n_t\sigma^2 E\left[U_2(t) \mid Y_t\right].
\end{aligned}\tag{A5}$$

Based on (A1)–(A5), we have

$$\begin{aligned}
\Delta_t &= E[L_{t+1} - L_t \mid Y_t] \\
&= E\left[\tfrac{1}{2}V(H_{t+1}^2 - H_t^2) + \tfrac{1}{2}Z(G_{t+1}^2 - G_t^2) + \tfrac{1}{2}\theta(Q_{t+1}^2 - Q_t^2) + \tfrac{1}{2}\beta(A_{t+1}^2 - A_t^2) \mid Y_t\right] \\
&\le \tfrac{1}{2}(V+Z) + \tfrac{1}{2}\beta\sigma^2 - V\varphi_1 H_t - Z\varphi_2 G_t + \left(VH_t - \tfrac{1}{2}\theta p Q_t^2\right)E[U_1(t) \mid Y_t] \\
&\quad + \left(ZG_t + \tfrac{1}{2}\theta n_t\sigma^2 - \tfrac{1}{2}\beta n_t\sigma^2\right)E[U_2(t) \mid Y_t].
\end{aligned}\tag{A6}$$

#### **Appendix B**

Set the Lyapunov penalty function as $f_t = Rw_{t+1}(Q_{t+1}^2 + MA_{t+1}^2)$, where $w_t \ge 0$. Based on (A4) and (A5), we can write the penalty function as follows:

$$\begin{aligned}
E[f_t \mid Y_t] &= R\,E\left[w_{t+1}\left(-Q_t^2 S_t U_1(t) + 2Q_t A_t U_2(t)(1 - S_t U_1(t)) + A_t^2 U_2(t) + Q_t^2\right) \mid Y_t\right] \\
&\quad + RM\,E\left[w_{t+1}\left(-A_t^2 U_2(t) + K_t^2 + A_t^2\right) \mid Y_t\right] \\
&= R\tilde{w}\left(-Q_t^2\, p\, E[U_1(t) \mid Y_t] + Q_t^2 + M\sigma^2 + Mn_t\sigma^2 + (1-M)n_t\sigma^2 E[U_2(t) \mid Y_t]\right).
\end{aligned}\tag{A7}$$

Combining (A6) and (A7), the Lyapunov drift plus penalty function satisfies the following inequality:

$$\begin{aligned}
\Delta_t + E[f_t \mid Y_t] \le\; & \tfrac{1}{2}(V+Z) + \tfrac{1}{2}\beta\sigma^2 + R\tilde{w}(Q_t^2 + M\sigma^2 + Mn_t\sigma^2) - V\varphi_1 H_t - Z\varphi_2 G_t \\
& + \left(VH_t - \tfrac{1}{2}\theta p Q_t^2 - R\tilde{w}pQ_t^2\right)E[U_1(t) \mid Y_t] \\
& + \left(ZG_t + \tfrac{1}{2}\theta n_t\sigma^2 - \tfrac{1}{2}\beta n_t\sigma^2 + (1-M)R\tilde{w}n_t\sigma^2\right)E[U_2(t) \mid Y_t].
\end{aligned}\tag{A8}$$

#### **Appendix C**

We start by assuming that the initial values satisfy $E[L_0] < \infty$. If $\Delta_t + E[f_t] \le C$, where $C$ is a constant, then taking the summation over $T$ time slots gives

$$E\left[L\_T - L\_0 + \sum\_{t=0}^{T-1} f\_t\right] \le TC. \tag{A9}$$

Based on (9) we have

$$E[L_T] \ge \frac{1}{2}VE\left[H_T^2\right].\tag{A10}$$

From the definition of the virtual queue $H_t$ in (A1), it is obvious that $E\left[H_T^2\right] \ge (E[H_T])^2$, and since the penalty function $f_t$ is always non-negative, we can turn (A9) into

$$\frac{1}{2}V(E[H_T])^2 \le TC + L_0$$

$$E[H_T] \le \sqrt{\frac{2(TC + L_0)}{V}}$$

$$\frac{E[H_T]}{T} \le \frac{1}{T}\sqrt{\frac{2(TC + L_0)}{V}}.\tag{A11}$$

As $T \to \infty$, the right-hand side of (A11) goes to 0. As a result,

$$\frac{E[H_T]}{T} \to 0.\tag{A12}$$

As a result, the virtual queue $H_t$ is mean rate stable. The other system states $Q_t$, $A_t$ and the virtual queue $G_t$ can be proven mean rate stable by the same method. Therefore, the queue expressions in the system are appropriate, and the Lyapunov optimization algorithm is applicable. We also recall that the evolutions of $Q_t$ and $A_t$, although not originally in queue form, can easily be redefined to be bounded below by 0, and the analysis in our paper remains valid without any changes.
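As a numeric illustration of this argument, solving $\frac{1}{2}V(E[H_T])^2 \le TC + L_0$ for $E[H_T]$ gives $E[H_T] \le \sqrt{2(TC + L_0)/V}$, so the normalized queue $E[H_T]/T$ decays like $O(1/\sqrt{T})$:

```python
import math

def h_rate_bound(T, C=1.0, L0=0.0, V=1.0):
    # Bound from (A11): E[H_T]/T <= sqrt(2(TC + L0)/V) / T = O(1/sqrt(T)).
    return math.sqrt(2 * (T * C + L0) / V) / T

for T in (100, 10_000, 1_000_000):
    print(T, h_rate_bound(T))
```

The constants $C$, $L_0$, $V$ here are placeholders; only the $1/\sqrt{T}$ decay matters for mean rate stability.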

#### **Appendix D**

First, let us assume that $P_1$ has an optimal solution, i.e., one that takes the best decision in every time slot and attains the optimal value of the target function (4). Because this optimal solution does not use the Lyapunov algorithm, its decisions have no relationship with the queues and virtual queues in the system. Below, $\pi_t = \{U_1(t), U_2(t)\}$ denotes the decision policy of the Lyapunov optimization algorithm and $\pi_t^* = \{U_1^*(t), U_2^*(t)\}$ denotes the decisions of the optimal solution. Based on Equations (A1)–(A5), we have

$$\begin{aligned}
L_{t+1} - L_t + f_t(\pi_t) \le\; & \tfrac{1}{2}(V+Z) + (-\varphi_1 + U_1(t))H_t + (-\varphi_2 + U_2(t))G_t \\
& + \tfrac{1}{2}\theta\left(-Q_t^2 U_1(t)S_t + 2Q_t A_t U_2(t)(1 - U_1(t)S_t) + A_t^2 U_2(t)\right) \\
& + \tfrac{1}{2}\beta\left(-U_2(t)A_t^2 + 2A_t K_t(1 - U_2(t)) + K_t^2\right) + f_t(\pi_t).
\end{aligned}\tag{A13}$$

Because the Lyapunov decisions minimize the right-hand side of (A13) in every slot, replacing them with the optimal decisions $\pi_t^*$ can only increase it:

$$\begin{aligned}
L_{t+1} - L_t + f_t(\pi_t) \le\; & \tfrac{1}{2}(V+Z) + (-\varphi_1 + U_1^*(t))H_t + (-\varphi_2 + U_2^*(t))G_t \\
& + \tfrac{1}{2}\theta\left(-Q_t^2 U_1^*(t)S_t + 2Q_t A_t U_2^*(t)(1 - U_1^*(t)S_t) + A_t^2 U_2^*(t)\right) \\
& + \tfrac{1}{2}\beta\left(-U_2^*(t)A_t^2 + 2A_t K_t(1 - U_2^*(t)) + K_t^2\right) + f_t(\pi_t^*).
\end{aligned}\tag{A14}$$

Then take the expectation on both sides of (A14)

$$\begin{aligned}
E\left[L_{t+1} - L_t + f_t(\pi_t) \mid Y_t\right] \le\; & \tfrac{1}{2}(V+Z) + E\left[(-\varphi_1 + U_1^*(t))H_t\right] + E\left[(-\varphi_2 + U_2^*(t))G_t\right] \\
& + \tfrac{1}{2}\theta E\left[-Q_t^2 U_1^*(t)S_t + 2Q_t A_t U_2^*(t)(1 - U_1^*(t)S_t) + A_t^2 U_2^*(t)\right] \\
& + \tfrac{1}{2}\beta E\left[-U_2^*(t)A_t^2 + 2A_t K_t(1 - U_2^*(t)) + K_t^2\right] + E[f_t(\pi_t^*)].
\end{aligned}\tag{A15}$$

As is well known in the literature [7,8], there exists an optimal randomized decision rule that makes decisions randomly and independently of the system variables. In the analysis below, we assume such an optimal policy and denote it by $(U_1^*(t), U_2^*(t))$:

$$\begin{aligned}
E\left[L_{t+1} - L_t + f_t(\pi_t) \mid Y_t\right] \le\; & \tfrac{1}{2}(V+Z) + (-\varphi_1 + E[U_1^*(t)])E[H_t] + (-\varphi_2 + E[U_2^*(t)])E[G_t] \\
& + \tfrac{1}{2}\theta\left(-E\left[Q_t^2\right]E[U_1^*(t)]p + 2E[Q_t]E[A_t]E[U_2^*(t)](1 - E[U_1^*(t)]p) + E\left[A_t^2\right]E[U_2^*(t)]\right) \\
& + \tfrac{1}{2}\beta\left(-E[U_2^*(t)]E\left[A_t^2\right] + 2E[A_t]E[K_t](1 - E[U_2^*(t)]) + E\left[K_t^2\right]\right) + E[f_t(\pi_t^*)].
\end{aligned}\tag{A16}$$

Substituting $E[K_t] = 0$, $E\left[K_t^2\right] = \sigma^2$, $E[A_t] = 0$, as well as $\varphi_1$ for $E[U_1^*(t)]$ and $\varphi_2$ for $E[U_2^*(t)]$, we get the following. It is worth noting that imposing the time-average constraints on $E[U_1^*(t)]$ and $E[U_2^*(t)]$ with equality in the Lyapunov drift analysis can be justified by observing that the constraints must be active in almost all of the $O(T)$ time instants over the $T$-slot horizon:

$$\begin{aligned}
E[L_{t+1} - L_t + f_t(\pi_t) \mid Y_t] \le\; & \tfrac{1}{2}(V+Z) + \tfrac{1}{2}\theta\left(-E\left[Q_t^2\right]p\varphi_1 + E\left[A_t^2\right]\varphi_2\right) \\
& + \tfrac{1}{2}\beta\left(-\varphi_2 E\left[A_t^2\right] + \sigma^2\right) + E[f_t(\pi_t^*)].
\end{aligned}\tag{A17}$$

Recalling that the queues $A_t$ and $Q_t$ are mean rate stable, we have $E\left[Q_t^2\right] = E\left[Q_{t+1}^2\right]$ and $E\left[A_t^2\right] = E\left[A_{t+1}^2\right]$. From (A3)–(A5), we can get the expectations of $A_t^2$ and $Q_t^2$ as

$$E\left[A_t^2\right] = \frac{\sigma^2}{\varphi_2}\tag{A18}$$

$$E\left[Q\_t^2\right] = \frac{\sigma^2}{p\varphi\_1}.\tag{A19}$$

As a result, (A17) can be simplified as follows:

$$\begin{aligned}
E[L_{t+1} - L_t + f_t(\pi_t) \mid Y_t] \le\; & \tfrac{1}{2}(V+Z) + \tfrac{1}{2}\theta\left(-\frac{\sigma^2}{p\varphi_1}p\varphi_1 + \frac{\sigma^2}{\varphi_2}\varphi_2\right) \\
& + \tfrac{1}{2}\beta\left(-\varphi_2\frac{\sigma^2}{\varphi_2} + \sigma^2\right) + E[f_t(\pi_t^*)] \\
=\; & \tfrac{1}{2}(V+Z) + E[f_t(\pi_t^*)].
\end{aligned}\tag{A20}$$

Now take the summation of the total T-time slot on both sides of (A20) and we have

$$E\left[L_T - L_0 + \sum_{t=0}^{T-1} f_t(\pi_t) \mid Y_t\right] \le \frac{1}{2}(V+Z)T + \sum_{t=0}^{T-1}E[f_t(\pi_t^*)].\tag{A21}$$

Note that $L_T \ge 0$ and $L_0 = 0$; we then divide both sides of (A21) by $T$ to get the time-averaged result

$$\frac{1}{T}E\left[\sum_{t=0}^{T-1}f_t(\pi_t) \mid Y_t\right] \le \frac{1}{2}(V+Z) + \frac{1}{T}\sum_{t=0}^{T-1}E[f_t(\pi_t^*)].\tag{A22}$$

Finally, divide both sides of (A22) by $R$ to convert $f_t$ into the target function:

$$P_1^*(\pi_t) \le P_2(\pi_t) \le \frac{V+Z}{2R} + P_1^*(\pi_t).\tag{A23}$$

#### **References**


## *Article* **Optimal Information Update for Energy Harvesting Sensor with Reliable Backup Energy**

**Lixin Wang <sup>1</sup>, Fuzhou Peng <sup>2</sup>, Xiang Chen <sup>2</sup> and Shidong Zhou <sup>1,</sup>\***


**\*** Correspondence: zhousd@tsinghua.edu.cn

**Abstract:** The timely delivery of status information collected from sensors is critical in many real-time applications, e.g., monitoring and control. In this paper, we consider a scenario where a wireless sensor sends updates to the destination over an erasure channel with the supply of harvested energy and reliable backup energy. We adopt the metric age of information (AoI) to measure the timeliness of the received updates at the destination. We aim to find the optimal information updating policy that minimizes the time-average weighted sum of the AoI and the reliable backup energy cost. First, when all the environmental statistics are assumed to be known, the optimal information updating policy exists and is proved to have a threshold structure. Based on this special structure, an algorithm for efficiently computing the optimal policy is proposed. Then, for the unknown environment, a learning-based algorithm is employed to find a near-optimal policy. The simulation results verify the correctness of the theoretical derivation and the effectiveness of the proposed method.

**Keywords:** age of information; information update; energy harvesting; reliable backup energy

#### **1. Introduction**

Timely information updates from wireless sensors to destinations are essential for real-time monitoring and control systems. To describe the timeliness of information updates from the receivers' perspective, a new metric called age of information (AoI) is proposed [1–3]. Unlike general performance metrics, such as delay and throughput, AoI refers to the time elapsed since the generation of the latest received information. A lower AoI generally reflects more timely information at the destination. Therefore, the AoI-minimal status updating policies in sensor networks have been widely studied [4–7].

The destinations always desire information updates in as timely a manner as possible, which is typically constrained by sensors' energy. Generally, energy sources include the grid and sensors' own non-rechargeable batteries. We call these sources *reliable energy* since they enable sensors to reliably operate until the power grid is cut off or sensors' batteries are exhausted [8]. Specifically, if sensors consume energy from the grid, they need to pay the electricity bill; if sensors only use the power of their own batteries, the price of sensing and transmitting updates will be the cost of frequent battery replacement. There is clearly a price to pay for using reliable energy to update. Energy harvesting (EH) is a promising technology that can help reduce the consumption of reliable energy for information update [9,10]. It can continuously extract energy from solar power, ambient RF, and thermal energy and store the harvested energy in sensors' rechargeable batteries. The stored energy is renewable and can be used for free. Hence, in this case, the reliable energy can serve as *backup* energy. The design of the coexistence of reliable backup energy and harvested energy has been researched and promoted in academia and industry [8,11–14]. The mixed energy supply mode can enhance the reliability of the system.

However, the irregular arrivals of harvested energy and the limited capacity of rechargeable batteries still motivate us to schedule the energy usage properly to reduce

**Citation:** Wang, L.; Peng, F.; Chen, X.; Zhou, S. Optimal Information Update for Energy Harvesting Sensor with Reliable Backup Energy. *Entropy* **2022**, *24*, 961. https://doi.org/ 10.3390/e24070961

Academic Editors: Udo Von Toussaint and Chintha Tellambura

Received: 22 February 2022 Accepted: 7 July 2022 Published: 11 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

the cost of using reliable backup energy while maintaining the timeliness of information updates (i.e., the average AoI). Intuitively, the average AoI and the cost of using reliable energy cannot be minimized simultaneously. On the one hand, a lower average AoI means that the sensor senses and transmits updates more frequently, which will increase the consumption of reliable backup energy since the harvested energy is limited. On the other hand, to reduce the cost of reliable backup energy, the sensor will only exploit the harvested energy. Due to the uncertainty of the energy harvesting behavior, the average AoI of the system will inevitably increase. Therefore, in this paper, we focus on achieving the best trade-off between the average AoI and the cost of reliable backup energy.

We consider a sensor-based information update system, where an energy harvesting sensor with reliable backup energy sends timely updates to the destination through an erasure channel. Based on our settings, we will minimize the long-term average weighted sum of the AoI and the paid reliable energy cost to find the optimal information updating policy by Markov decision process (MDP) theory [15]. First, we assume that the sensor knows the relevant statistics in advance, such as the success probability of each transmission and the probability of energy arrival, so that the sensor can make the optimal decision at any time. Then we consider a more realistic scenario where the sensor has no knowledge of the environment. In such an unknown environment, learning-based approaches should be adopted to obtain the updating policy.

#### *1.1. Related Work*

There have been a series of related works studying AoI minimization in EH communication systems [16–34]. In these systems, each update consumes harvested energy and is constrained by the energy causality.

Refs. [16–23] focus on how to optimize AoI under general energy causality constraints, where different battery model settings are considered. Constrained by the average power available in an infinite-sized battery, ref. [16] shows that a lazy policy, which leaves a certain idle period between updates, outperforms the greedy policy under random service times. With the same assumption of an infinite-sized battery, ref. [17] focuses on both offline and online policies under energy replenishment constraints with zero service time. Considering fixed service times, the offline results in [17] are extended to a two-hop scenario in [18], and an online policy is provided in [19]. In the case where the delay is controlled by transmission energy, ref. [20] also investigated the optimal offline policy. For the error-free and delay-free channel, the optimal updating policies were investigated for different battery settings [21,22]. Ref. [21] derived asymptotically optimal policies for the infinite-sized, finite-sized, and unit-sized battery by renewal theory; for the unit-sized battery case, the optimal policy turned out to be a threshold policy. More general battery models were considered in [22], where the optimal policy was also proved to be multi-threshold and the energy-dependent thresholds were characterized explicitly. When the battery is finite sized and there is no feedback from the destination, it was shown that the optimal updating policy has a threshold structure and the threshold is non-increasing with the battery level [23].

Refs. [24–30] studied how to properly utilize the harvested energy to transmit updates over imperfect channels. For the noisy channel, ref. [24] considered an infinite-sized battery model and derived the different optimal policies for updating with and without feedback. Ref. [25] further derived a closed-form expression for the threshold of the unit-sized battery model and extended the threshold-based policies to the multiple-sources case. To combat the noisy channel, channel coding schemes for EH communication were investigated in [26,27]. In [28], the HARQ protocol was applied for a single EH sensor to send updates to the destination; the optimal policies were obtained by employing reinforcement learning in both known and unknown environments, but no clear intuition on the policy structure was provided. Considering energy harvesting wireless sensor networks (EH-WSNs), ref. [29] suggested estimating the channel state of a Rayleigh fading channel before transmitting to improve the AoI, update interval, and packet loss performance, despite the associated time and energy costs. Ref. [30] aimed to minimize the average AoI of an EH-aided secondary user (SU) in a cognitive radio network, where the SU has to make sensing and updating decisions subject to random energy arrivals and the available spectrum; the sequential decision problem is formulated as a partially observable Markov decision process (POMDP).

Refs. [31–34] paid attention to other AoI-related metrics in EH communication and even the distributional properties of AoI, not just the average AoI. Different freshness metrics were considered, such as nonlinear AoI [31], urgency-aware AoI (U-AoI) [32], and peak AoI [33] in EH sensor networks. To better understand the distributional properties of AoI, ref. [34] further derived closed-form expressions of the moment generating function (MGF) of AoI in an EH-powered queuing system using the stochastic hybrid systems (SHS) framework.

The above works focus on optimizing information freshness under the EH supply. Different from them, energy sources in this paper include both harvested energy and reliable backup energy, and our goal is to achieve the best trade-off between age and reliable energy consumption, instead of merely optimizing AoI. Among the above works, refs. [23,25] are the most related to our paper. The following Table 1 summarizes the detailed differences. It is worth noting that by letting the reliable energy consumption be small enough, our results can be compared with some prior results in [23,25].



The age–energy trade-off has been widely studied in [35–39]. The age–energy trade-off over the erasure channel was studied in [35], and the fading channel case was investigated in [36]. Ref. [37] adopted a truncated automatic repeat request (TARQ) scheme and characterized the age–energy trade-off for an IoT monitoring system. The optimal trade-off between energy efficiency and AoI was considered in a multicast system in [38]. In [39], the authors investigated the optimal age–energy trade-off where status sensing and data transmission can be carried out separately; by an MDP analysis similar to [6,15], the optimal policy was shown to exist and proved to have two thresholds. The energy sources are all reliable in these works, which means that the energy cost of updating is easy to track. However, the uncertainty of the energy arrivals and the mixed energy supplies bring more challenges to the MDP analysis in this paper. To the best of our knowledge, this paper is the first to consider the timeliness of a system under mixed energy supplies. The preliminary results of this paper are presented in [40].

#### *1.2. Main Contributions*

The main contributions of this paper are as follows:


that the optimal policy is of the threshold-type. Based on this special structure, we propose an efficient algorithm to compute the optimal policy.


#### *1.3. Organization*

The rest of this paper is organized as follows. In Section 2, we introduce the model of the information update system and formulate the problem. In Section 3, we analyze the optimal policy when all the statistics are known. In Section 4, we aim to minimize the average cost of updating in an unknown environment. In Section 5, we present the simulation results. Finally, in Section 6, we conclude the paper.

#### **2. System Model and Problem Formulation**

#### *2.1. System Model*

In this paper, we consider a point-to-point information update system, where a wireless sensor and a destination are connected by an erasure channel, as shown in Figure 1. The channel is assumed to be noisy and time invariant, and each update is corrupted with probability *p* during transmission (Note *p* ∈ (0, 1)). Both the free harvested energy stored in the rechargeable battery and the reliable backup energy that needs to be paid can be used for real-time environmental status updates.

Without loss of generality, time is slotted with equal length and indexed by *t* = 0, 1, 2 . . . . At the beginning of each time slot, the sensor decides whether to generate and transmit an update to the destination or stay idle. The decision action at slot *t*, denoted by *a*[*t*], takes value from action set A = {0, 1}, where *a*[*t*] = 1 means that the sensor decides to generate and transmit an update to the destination while *a*[*t*] = 0 means the sensor is idle. The destination will feed back an instantaneous ACK to the sensor through an error-free channel when it has successfully received an update and a NACK otherwise. We assume the above processes can be completed in one time slot. The destination keeps track of the environment status through the received updates. We apply the metric age of information to measure the freshness of the status information available at the destination.

**Figure 1.** System model.

#### 2.1.1. Age of Information

Age of Information (AoI) is defined as the elapsed time since the generation of the latest successfully received update [1–3]. Denote Δ[*t*] as the AoI of destination in time slot *t*. Then, we have

$$
\Delta[t] = t - U[t], \tag{1}
$$

where *U*[*t*] denotes the time slot in which the most recently received update was generated before time slot *t*. In particular, the AoI drops to one if a new update is successfully received; otherwise, it increases by one. The evolution of AoI can be expressed as follows:

$$
\Delta[t+1] = \begin{cases} 1, & \text{successful transmission,} \\ \Delta[t] + 1, & \text{otherwise.} \end{cases} \tag{2}
$$

A sample path of AoI is depicted in Figure 2.

**Figure 2.** A sample path of AoI with initial age 1.
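The recursion in Eqs. (1) and (2) is easy to simulate. The following sketch is ours, not the authors' code; the erasure probability value and the transmit-every-slot rule are assumptions chosen purely to reproduce a sample path like the one in Figure 2:

```python
import random

def aoi_step(delta, success):
    """AoI evolution, Eq. (2): reset to 1 on a successful update, else grow by one."""
    return 1 if success else delta + 1

random.seed(0)
p = 0.3                 # illustrative erasure probability (an assumption, not from the paper)
delta, path = 1, [1]    # initial age 1, as in Figure 2
for _ in range(10):
    success = random.random() > p    # transmit in every slot, just for illustration
    delta = aoi_step(delta, success)
    path.append(delta)
print(path)             # a sawtooth sample path: climbs by 1, resets to 1 on success
```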

#### 2.1.2. Description of Energy Supply

We assume that only the sensor's measurement and transmission process will consume energy and ignore other energy consumption. The energy unit is normalized, so the generation and transmission for each update will consume one energy unit. As previously described, the energy sources of the sensor include energy harvested from nature and reliable backup energy.

For the harvested energy, the sensor can store it in a rechargeable battery for later use. The maximum capacity of the rechargeable battery is *B* units (*B* > 1). Considering the scarcity of energy in nature, the total energy harvested in one time slot may sometimes not reach an energy unit. So we consider using the Bernoulli process with the parameter *λ* to approximately capture the arrival process of harvested energy, which was also adopted in [41–43]. Let *b*[*t*] be the accumulated harvested energy in time slot *t*. That is, we have Pr{*b*[*t*] = 1} = *λ* and Pr{*b*[*t*] = 0} = 1 − *λ* in each time slot *t* (note *λ* ∈ (0, 1)). Here, we assume that the energy arrival at each slot is independently and identically distributed. Time-correlated energy arrival processes, such as Markov process, will be considered in future work.

For reliable backup energy, we assume that it contains many more energy units than the rechargeable battery, so the energy it contains can be viewed as infinite. However, it must be used for a fee, in contrast to the free renewable energy stored in the rechargeable battery. Therefore, when the stored renewable energy is non-zero, the sensor prioritizes using it for status updates; otherwise, it automatically switches to the reliable backup energy until the sensor has harvested new energy. Define the energy level of the rechargeable battery at the beginning of time slot *t* as the battery state *q*[*t*]; then the evolution of the battery state from time slot *t* to *t* + 1 can be summarized as follows:

$$q[t+1] = \min\{q[t] + b[t] - a[t]u(q[t]), B\},\tag{3}$$

where *u*(·) is the unit step function, defined as

$$u(x) = \begin{cases} 1, & \text{if } x > 0, \\ 0, & \text{otherwise.} \end{cases} \tag{4}$$

Suppose that, under the reliable energy supply, the cost of generating and transmitting an update is a non-negative value *Cr*. Defining *E*[*t*] as the paid reliable energy cost in time slot *t*, we have

$$E[t] = C\_r a[t] (1 - u(q[t])).\tag{5}$$
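Equations (3)–(5) translate directly into code. This is an illustrative sketch (the function names are ours, not the paper's):

```python
def u(x):
    """Unit step function, Eq. (4)."""
    return 1 if x > 0 else 0

def battery_step(q, b, a, B):
    """Battery evolution, Eq. (3): one unit is drawn from the battery only when
    the sensor updates (a = 1) and the battery is non-empty."""
    return min(q + b - a * u(q), B)

def paid_cost(q, a, Cr):
    """Paid reliable-energy cost, Eq. (5): charged only for updates sent from
    an empty battery."""
    return Cr * a * (1 - u(q))

# Empty battery, update anyway: backup energy is billed, the harvested unit is stored.
assert battery_step(q=0, b=1, a=1, B=5) == 1
assert paid_cost(q=0, a=1, Cr=2.0) == 2.0
# Full battery, idle: a newly harvested unit is wasted (level capped at B).
assert battery_step(q=5, b=1, a=0, B=5) == 5
```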

#### *2.2. Problem Formulation*

Let Π denote the set of non-anticipative policies in which the scheduling decision *a*[*t*] is made based on the action history {*a*[*k*]}<sub>*k*=0</sub><sup>*t*−1</sup>, the AoI evolution {Δ[*k*]}<sub>*k*=0</sub><sup>*t*</sup>, the battery state evolution {*q*[*k*]}<sub>*k*=0</sub><sup>*t*</sup>, as well as the system parameters (e.g., *p* and *λ*). To keep the information fresh at the destination, the sensor needs to send updates. However, due to the randomness of energy arrivals, the battery energy may sometimes be insufficient to support updates, and the sensor has to draw energy from the reliable backup. To balance the information freshness and the paid reliable backup energy cost, we aim to find the optimal information updating policy *π* ∈ Π that minimizes the time-average weighted sum of the AoI and the paid reliable backup energy cost. The problem is formulated as follows:

$$\min\_{\pi \in \Pi} \limsup\_{T \to \infty} \frac{1}{T} \mathbb{E}\_{\pi} \left\{ \sum\_{t=0}^{T-1} \left[ \Delta[t] + \omega E[t] \right] \right\},\tag{6}$$
  $\text{s.t.} \qquad (2), (3), (5),$ 

where *ω* is a pre-defined non-negative weighting factor. If *ω* = 0, the optimal policy is to update in each time slot, i.e., the zero-wait policy [4]: since the cost of energy can be ignored, the sensor uses the renewable energy if the rechargeable battery is not empty and otherwise uses the reliable energy directly. When *ω* > 0, the optimal policy is non-trivial, so we focus on the case *ω* > 0 in the rest of the paper. The smaller *ω* is, the more importance is attached to the system AoI; the larger *ω* is, the more emphasis is placed on the cost of reliable energy.

**Remark 1.** *The optimal trade-off between age and reliable energy consumption can also be formulated as a constrained problem, where the reliable energy consumption serves as a constraint (not exceeding Em) but not a penalty, and the goal is to minimize the long-term average age. By the Lagrangian method, it can be converted into an unconstrained weighted sum problem, where the Lagrangian multiplier is exactly the weight factor ω. So the solution proposed in this paper can be used. If there exists an ω such that the average reliable energy consumption in the minimum weighted sum is Em, the optimal policy of the weighted sum problem also minimizes the long-term average age with the Em constraint. Otherwise, a randomized optimal policy for the constrained problem needs to be considered; see details in [44].*

#### **3. Optimal Policy Analysis in a Known Environment**

In this section, we aim to solve the problem (6) in a known environment and obtain the optimal policy. It is difficult to solve the original problem directly due to the random erasures and the temporal dependency in both AoI and battery state evolution. However, since the statistics such as channel erasure probability and EH probability are known, we can reformulate the original problem as a time-average cost MDP with infinite state space and analyze the structure of the optimal policy.

#### *3.1. Markov Decision Process Formulation*

According to the system description mentioned in the previous section, the MDP is formulated as follows:


• **Transition probabilities**. Given the current state **x** = (Δ, *q*) and action *a*, the transition probabilities are as follows:

*Case 1*. *a* = 0,

$$\begin{cases} \Pr\{(\Delta+1,q+1)\,|\,(\Delta,q),0\} = \lambda, & \text{if } q < B,\\ \Pr\{(\Delta+1,B)\,|\,(\Delta,B),0\} = 1, & \text{if } q = B,\\ \Pr\{(\Delta+1,q)\,|\,(\Delta,q),0\} = 1-\lambda, & \text{if } q < B. \end{cases} \tag{7}$$

*Case 2*. *a* = 1,

$$\begin{cases} \Pr\{(\Delta+1,q)\,|\,(\Delta,q),1\} = p\lambda, & \text{if } q>0, \\ \Pr\{(1,q)\,|\,(\Delta,q),1\} = (1-p)\lambda, & \text{if } q>0, \\ \Pr\{(\Delta+1,q-1)\,|\,(\Delta,q),1\} = p(1-\lambda), & \text{if } q>0, \\ \Pr\{(1,q-1)\,|\,(\Delta,q),1\} = (1-p)(1-\lambda), & \text{if } q>0, \\ \Pr\{(\Delta+1,1)\,|\,(\Delta,0),1\} = p\lambda, & \text{if } q=0, \\ \Pr\{(1,1)\,|\,(\Delta,0),1\} = (1-p)\lambda, & \text{if } q=0, \\ \Pr\{(\Delta+1,0)\,|\,(\Delta,0),1\} = p(1-\lambda), & \text{if } q=0, \\ \Pr\{(1,0)\,|\,(\Delta,0),1\} = (1-p)(1-\lambda), & \text{if } q=0. \end{cases} \tag{8}$$

In both cases, the evolution of AoI still follows Equation (2) and the evolution of battery state follows Equation (3).

• **One-step cost**. For the current state **x** = (Δ, *q*), the one-step cost *C*(**x**, *a*) of taking action *a* is expressed by

$$C(\mathbf{x}, a) = \Delta + \omega C\_r a (1 - u(q)). \tag{9}$$
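As a sanity check, the transition kernel in Eqs. (7) and (8) and the one-step cost in Eq. (9) can be coded directly; each row of probabilities should sum to one. A sketch (helper names are ours):

```python
def transitions(delta, q, a, p, lam, B):
    """Next-state distribution Pr{(d', q') | (d, q), a} of Eqs. (7) and (8).
    Returns a dict mapping (delta', q') -> probability."""
    out = {}
    def add(d2, q2, pr):
        out[(d2, q2)] = out.get((d2, q2), 0.0) + pr
    if a == 0:                                   # idle: AoI grows, battery may charge
        add(delta + 1, min(q + 1, B), lam)
        add(delta + 1, q, 1 - lam)
    else:                                        # update: erased with probability p
        spent = q - 1 if q > 0 else 0            # an empty battery draws on backup energy
        for b, pb in ((1, lam), (0, 1 - lam)):   # Bernoulli(lam) energy arrival
            q2 = min(spent + b, B)
            add(delta + 1, q2, p * pb)           # transmission erased
            add(1, q2, (1 - p) * pb)             # update delivered, AoI resets
    return out

def one_step_cost(delta, q, a, omega, Cr):
    """One-step cost, Eq. (9)."""
    return delta + omega * Cr * a * (1 if q == 0 else 0)

dist = transitions(delta=3, q=0, a=1, p=0.2, lam=0.5, B=5)
assert abs(sum(dist.values()) - 1.0) < 1e-12     # each row of Eq. (8) sums to one
assert transitions(3, 5, 0, 0.2, 0.5, 5) == {(4, 5): 1.0}   # full battery, idle
```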

After the above modeling, the original problem (6) is transformed into obtaining the optimal policy for the MDP to minimize the average cost in an infinite horizon:

$$\min\_{\pi \in \Pi} \limsup\_{T \to \infty} \frac{1}{T} \mathbb{E}\_{\pi} \left\{ \sum\_{t=0}^{T-1} C(\mathbf{x}[t], a[t]) \right\}. \tag{10}$$

Denote Π*SD* as the set of stationary deterministic policies. Given the observation (Δ[*t*], *q*[*t*]) = (Δ, *q*), a policy *π* ∈ Π*SD* selects action *a*[*t*] = *π*(Δ, *q*), where *π*(·) : (Δ, *q*) → {0, 1} is a deterministic function from state space S to action space A. In the next section, we prove that there is an optimal stationary deterministic policy for the above unconstrained MDP with a countably infinite state space.

#### *3.2. The Existence of the Optimal Stationary Deterministic Policy*

According to [15], we need to first address a discounted cost MDP, then relate it to the original average cost problem. Given an initial state **x**[0] = **x**ˆ, the total expected discounted cost under a policy *π* is given by

$$V\_{\gamma}^{\pi}(\hat{\mathbf{x}}) = \limsup\_{T \to \infty} \mathbb{E}\_{\pi} \left\{ \sum\_{t=0}^{T-1} \gamma^{t} C(\mathbf{x}[t], a[t]) \, \middle| \, \mathbf{x}[0] = \hat{\mathbf{x}} \right\},\tag{11}$$

where the discounted factor is *γ* ∈ (0, 1). Therefore, the problem of minimizing the expected discounted cost can be formulated as

$$V\_{\gamma}(\hat{\mathbf{x}}) \stackrel{\Delta}{=} \min\_{\pi \in \Pi} V\_{\gamma}^{\pi}(\hat{\mathbf{x}}),\tag{12}$$

where value function *Vγ*(**x**ˆ) denotes the minimum expected discounted cost. The policy is *γ*-optimal if it minimizes the above discounted cost. The optimality equation of *Vγ*(**x**ˆ) is introduced in Proposition 1.

#### **Proposition 1.**

*(a) The optimal expected discounted cost Vγ*(*x*ˆ) *satisfies the Bellman equation as follows:*

$$V\_{\gamma}(\hat{\mathbf{x}}) = \min\_{a \in \mathcal{A}} Q\_{\gamma}(\hat{\mathbf{x}}, a), \tag{13}$$

*where the state–action value function Qγ*(*x*ˆ, *a*) *is defined as*

$$Q\_{\boldsymbol{\gamma}}(\boldsymbol{\mathfrak{x}},a) = \mathbb{C}(\boldsymbol{\mathfrak{x}},a) + \gamma \sum\_{\mathbf{x}' \in \mathcal{S}} \Pr(\mathbf{x}'|\boldsymbol{\mathfrak{x}},a) V\_{\boldsymbol{\gamma}}(\mathbf{x}'). \tag{14}$$


*(b) Define V*<sub>*γ*,0</sub>(*x*ˆ) = 0 *for every state x*ˆ*; for n* ≥ 1*, the value iteration is*

$$V\_{\gamma,n}(\hat{\mathbf{x}}) = \min\_{a \in \mathcal{A}} Q\_{\gamma,n}(\hat{\mathbf{x}}, a), \tag{15}$$

*where Qγ*,*n*(*x*ˆ, *a*) *is obtained as follows:*

$$Q\_{\gamma,n}(\hat{\mathbf{x}}, a) = C(\hat{\mathbf{x}}, a) + \gamma \sum\_{\mathbf{x}' \in \mathcal{S}} \Pr(\mathbf{x}' | \hat{\mathbf{x}}, a) \, V\_{\gamma,n-1}(\mathbf{x}'). \tag{16}$$

*Then* lim<sub>*n*→∞</sub> *V*<sub>*γ*,*n*</sub>(*x*ˆ) = *V<sub>γ</sub>*(*x*ˆ) *holds for every state x*ˆ *and every γ ∈ (0, 1).*

#### **Proof.** See Appendix A.

Now, we can show the monotonic properties of *Vγ*(**x**ˆ) in the following lemma by using (c) in Proposition 1.

#### **Lemma 1.** *Given fixed channel erasure probability p and EH probability λ, then*

*(a) value function Vγ*(Δ, *q*) *is non-decreasing in* Δ*, i.e., for any* 1 ≤ Δ<sup>1</sup> ≤ Δ<sup>2</sup> *and any battery state q* ∈ B*, we have*

$$V\_{\gamma}(\Delta\_1, q) \le V\_{\gamma}(\Delta\_2, q),\tag{17}$$

*and*

$$V\_{\gamma}(\Delta\_2, q) - V\_{\gamma}(\Delta\_1, q) \ge \Delta\_2 - \Delta\_1. \tag{18}$$

*(b) value function Vγ*(Δ, *q*) *is non-increasing in q, i.e., for AoI* Δ ≥ 1 *and any battery state q* ∈ {0, 1, . . . , *B* − 1}*, we have*

$$V\_{\gamma}(\Delta, q) \ge V\_{\gamma}(\Delta, q + 1). \tag{19}$$

#### **Proof.** See Appendix B.

Based on the Proposition 1 and Lemma 1, we will verify the existence of the optimal stationary deterministic policy for the average cost problem (10) in the following theorem.

**Theorem 1.** *There exists an optimal policy π*<sup>⋆</sup> ∈ Π*SD for the average cost MDP in (10). Moreover, for every state x, there exist a value function V*(·) : S → R *and a unique constant g*<sup>⋆</sup> ∈ R *such that:*

$$g^{\star} + V(\mathbf{x}) = \min\_{a \in \mathcal{A}} \left\{ C(\mathbf{x}, a) + \sum\_{\mathbf{x}' \in \mathcal{S}} \Pr(\mathbf{x}' | \mathbf{x}, a) V(\mathbf{x}') \right\},\tag{20}$$

*where g*<sup>⋆</sup> *is the optimal average cost of problem (10) and satisfies g*<sup>⋆</sup> = lim<sub>*γ*→1</sub>(1 − *γ*)*V<sub>γ</sub>*(*x*) *for every state x, and the value function V*(*x*) *satisfies*

$$V(\mathbf{x}) = \lim\_{\gamma \to 1} \gamma V\_{\gamma}(\mathbf{x}) = \lim\_{\gamma \to 1} V\_{\gamma}(\mathbf{x}) - g^{\star} = \limsup\_{T \to \infty} \frac{1}{T} \mathbb{E}\_{\pi} \left\{ \sum\_{t=0}^{T-1} [C(\mathbf{x}[t], a[t]) - g^{\star}] \right\}. \tag{21}$$

**Proof.** See Appendix C.

Based on Theorem 1, we have the following corollary:

**Corollary 1.** *The state–action value function Q*(*x*, *a*) *for the average cost is given as follows:*

$$Q(\mathbf{x}, a) = C(\mathbf{x}, a) + \sum\_{\mathbf{x}' \in \mathcal{S}} \Pr(\mathbf{x}' | \mathbf{x}, a) V(\mathbf{x}'),\tag{22}$$

*which is similar to Q<sub>γ</sub>*(*x*, *a*) *in* (14) *by letting γ* → 1*. Then the optimal policy π*<sup>⋆</sup> ∈ Π*SD for the average cost MDP in (10) can be expressed as follows:*

$$\pi^\*(\mathbf{x}) = \arg\min\_{a \in \mathcal{A}} Q(\mathbf{x}, a), \forall \mathbf{x} \in \mathcal{S}. \tag{23}$$

#### *3.3. Structure Analysis of Optimal Policy*

Before analyzing the structure of the optimal policy *π*<sup>⋆</sup>, we first prove some monotonic properties of the value function *V*(**x**) along different dimensions, which are summarized in the following lemma.

#### **Lemma 2.** *Given fixed channel erasure probability p and EH probability λ, then*

*(a) value function V*(Δ, *q*) *is non-decreasing in* Δ*, i.e., for any* 1 ≤ Δ<sup>1</sup> ≤ Δ<sup>2</sup> *and any battery state q* ∈ B*, we have*

$$V(\Delta\_1, q) \le V(\Delta\_2, q)\_{\prime} \tag{24}$$

*and*

$$V(\Delta\_2, q) - V(\Delta\_1, q) \ge \Delta\_2 - \Delta\_1. \tag{25}$$

*(b) value function V*(Δ, *q*) *is non-increasing in q, i.e., for AoI* Δ ≥ 1 *and any battery state q* ∈ {0, 1, . . . , *B* − 1}*, we have*

$$V(\Delta, q) \ge V(\Delta, q+1). \tag{26}$$

**Proof.** According to (21), *V*(*x*) = lim<sub>*γ*→1</sub> *V<sub>γ</sub>*(**x**) − *g*<sup>⋆</sup>. Therefore, the monotonic properties of *V<sub>γ</sub>*(**x**) in Lemma 1 also hold for *V*(*x*), which completes the proof.

Based on Lemma 2, we will derive the **proportional differential property** of the value function in Lemma 3.

**Lemma 3.** *Given fixed channel erasure probability p and EH probability λ, then value function V*(Δ, *q*) *has the proportional differential property, i.e., the inequality*

$$\frac{V(\Delta+1,q+1) - V(\Delta,q+1)}{V(\Delta+1,q) - V(\Delta,q)} \ge p\tag{27}$$

*holds for AoI* Δ ≥ 1 *and any battery state q* ∈ {0, 1, ..., *B* − 1}*.*

#### **Proof.** See Appendix D.

With Corollary 1, Lemmas 2 and 3, we directly provide our main result in the following theorem.

**Theorem 2.** *Assume that the channel erasure probability p and EH probability λ are both fixed. Then, for each battery state q, there exists a threshold* Δ*<sub>q</sub>* ∈ Z<sup>+</sup> *such that when* Δ < Δ*<sub>q</sub>, the optimal action is π*<sup>⋆</sup>(Δ, *q*) = 0*, i.e., the sensor stays idle; when* Δ ≥ Δ*<sub>q</sub>, the optimal action is π*<sup>⋆</sup>(Δ, *q*) = 1*, i.e., the sensor generates and transmits a new update.*

#### **Proof.** See Appendix E.

Theorem 2 reveals the threshold structure of the optimal policy: if the optimal action in a certain state is to generate and transmit an update, then in any state with the same battery level and a larger AoI, the optimal action must be the same. Note that the threshold Δ*<sub>q</sub>* is determined by the channel erasure probability *p*, the EH probability *λ*, and the pre-defined weighting factor *ω*. A closed-form expression of the threshold is difficult to derive due to the complex transition probabilities. In the next section, we show how to compute the optimal policy numerically.

#### *3.4. Modified Relative Value Iteration Algorithm Design*

In this section, we will propose a computationally efficient algorithm to find the optimal stationary deterministic policy based on the threshold structure.

Since the state space S is infinite, we use a truncated space S<sup>*N*</sup> for approximation in practice, where S<sup>*N*</sup> = {(Δ, *q*) | Δ ≤ *N*, Δ ∈ Z<sup>+</sup>, *q* ∈ B}. It can be proved that when *N* is large enough, the optimal policy of the approximated MDP is identical to that of the original problem [6].

However, the value iteration algorithm in Proposition 1 for the discounted cost problem cannot be applied to the average cost problem by simply letting *γ* = 1: it does not converge because the value function *V*(·) in (20) is not unique. One can check that if *V*(·) satisfies (20), then *V*′(·) = *V*(·) + *c* also satisfies (20) for any constant *c* ∈ R. Therefore, we introduce a *relative value iteration* (RVI) algorithm to obtain the optimal policy of the approximate average cost MDP [45]. We choose a reference state *x*ˆ ∈ S<sup>*N*</sup> and set *V*<sub>0</sub>(**x**) = 0 for all states **x** ∈ S<sup>*N*</sup>. Then, for all *n* ≥ 0, we have

$$V\_{n+1}(\mathbf{x}) = \min\_{a \in \mathcal{A}} Q\_{n+1}(\mathbf{x}, a), \tag{28}$$

and *Qn*+1(**x**, *a*) is obtained as follows:

$$Q\_{n+1}(\mathbf{x}, a) = C(\mathbf{x}, a) + \sum\_{\mathbf{x}' \in \mathcal{S}^N} \Pr(\mathbf{x}' | \mathbf{x}, a) h\_n(\mathbf{x}'), \tag{29}$$

where the differential value function is *h<sub>n</sub>*(**x**) = *V<sub>n</sub>*(**x**) − *V<sub>n</sub>*(**x**ˆ). The equation lim<sub>*n*→∞</sub> *Q<sub>n</sub>*(**x**, *a*) = *Q*(**x**, *a*) holds for every state **x** ∈ S<sup>*N*</sup> and action *a* ∈ A. Finally, we compute the optimal policy by

$$
\pi^\star(\mathbf{x}) = \arg\min\_{a \in \mathcal{A}} Q(\mathbf{x}, a). \tag{30}
$$

Note that the optimal policy is still of a threshold structure. The corresponding proof is similar to that of Theorem 2.

Moreover, based on the RVI algorithm, we can exploit this threshold structure to reduce the computational complexity. When the optimal action at a state **x**′ = (Δ′, *q*′) is 1, the optimal action at every state **x** ∈ {(Δ, *q*) | Δ > Δ′, Δ ≤ *N*, *q* = *q*′} will also be 1, without the need to calculate (30). Therefore, we propose a modified RVI algorithm, whose details are given in Algorithm 1.

**Algorithm 1** Modified relative value iteration algorithm.

**Input:** Iteration number *K*, iteration threshold *ϵ*, maximum AoI *N*, maximum battery state *B*, reference state **x**ˆ.
**Output:** Optimal policy *π*<sup>⋆</sup>(**x**) for all states **x**.
1: **Initialization:** *h*<sub>0</sub>(**x**) = 0 for all **x** ∈ S<sup>*N*</sup>
2: **for** episodes *n* = 0, 1, 2, . . . , *K* **do**
3: &nbsp;&nbsp;**for** state **x** ∈ S<sup>*N*</sup> **do**
4: &nbsp;&nbsp;&nbsp;&nbsp;**for** action *a* ∈ A **do**
5: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*Q<sub>n</sub>*(**x**, *a*) ← *C*(**x**, *a*) + ∑<sub>**x**′∈S<sup>*N*</sup></sub> Pr(**x**′|**x**, *a*)*h<sub>n</sub>*(**x**′) // Update the state–action value function.
6: &nbsp;&nbsp;&nbsp;&nbsp;**end for**
7: &nbsp;&nbsp;&nbsp;&nbsp;*V*<sub>*n*+1</sub>(**x**) ← min<sub>*a*∈A</sub> *Q<sub>n</sub>*(**x**, *a*) // Update the value function.
8: &nbsp;&nbsp;&nbsp;&nbsp;*h*<sub>*n*+1</sub>(**x**) ← *V*<sub>*n*+1</sub>(**x**) − *V*<sub>*n*+1</sub>(**x**ˆ) // Update the differential value function.
9: &nbsp;&nbsp;**end for**
10: &nbsp;&nbsp;**if** *h*<sub>*n*+1</sub>(**x**) − *h<sub>n</sub>*(**x**) ≤ *ϵ*, ∀**x** ∈ S<sup>*N*</sup> **then**
11: &nbsp;&nbsp;&nbsp;&nbsp;**for** **x** = (Δ, *q*) ∈ S<sup>*N*</sup> **do**
12: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**if** *π*<sup>⋆</sup>(Δ − 1, *q*) = 1 **then**
13: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*π*<sup>⋆</sup>(**x**) ← 1 // Leverage the threshold structure of the optimal policy.
14: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**else**
15: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*π*<sup>⋆</sup>(**x**) ← arg min<sub>*a*∈A</sub> *Q<sub>n</sub>*(**x**, *a*)
16: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;**end if**
17: &nbsp;&nbsp;&nbsp;&nbsp;**end for**
18: &nbsp;&nbsp;&nbsp;&nbsp;**break**
19: &nbsp;&nbsp;**end if**
20: **end for**
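A compact, runnable sketch of the modified RVI idea is given below. It is ours, not the authors' code; the parameter values (*p* = 0.2, *λ* = 0.3, *B* = 3, *N* = 20, *ω* = 1, *Cr* = 2) are illustrative assumptions, and the transition kernel of Eqs. (7) and (8) is restated (with AoI truncated at *N*) so the snippet is self-contained:

```python
import itertools

def transitions(delta, q, a, p, lam, B, N):
    """Pr{(d', q') | (d, q), a} from Eqs. (7) and (8), with AoI truncated at N."""
    out = {}
    def add(d2, q2, pr):
        key = (min(d2, N), q2)
        out[key] = out.get(key, 0.0) + pr
    if a == 0:                                   # idle: AoI grows, battery may charge
        add(delta + 1, min(q + 1, B), lam)
        add(delta + 1, q, 1 - lam)
    else:                                        # update: erased with probability p
        spent = q - 1 if q > 0 else 0            # backup energy covers an empty battery
        for b, pb in ((1, lam), (0, 1 - lam)):   # Bernoulli(lam) energy arrival
            q2 = min(spent + b, B)
            add(delta + 1, q2, p * pb)
            add(1, q2, (1 - p) * pb)
    return out

def modified_rvi(p=0.2, lam=0.3, B=3, N=20, omega=1.0, Cr=2.0, K=1000, tol=1e-7):
    """Relative value iteration on the truncated space, sketching Algorithm 1."""
    states = list(itertools.product(range(1, N + 1), range(B + 1)))
    h = {s: 0.0 for s in states}
    ref = (1, 0)                                  # reference state x_hat
    cost = lambda d, q, a: d + omega * Cr * a * (1 if q == 0 else 0)   # Eq. (9)
    for _ in range(K):
        Q = {(s, a): cost(s[0], s[1], a)
                     + sum(pr * h[s2]
                           for s2, pr in transitions(s[0], s[1], a, p, lam, B, N).items())
             for s in states for a in (0, 1)}
        V = {s: min(Q[(s, 0)], Q[(s, 1)]) for s in states}
        h_new = {s: V[s] - V[ref] for s in states}
        done = max(abs(h_new[s] - h[s]) for s in states) <= tol
        h = h_new
        if done:
            break
    policy = {}
    for d, q in sorted(states):                   # final pass, as in lines 11-17
        if d > 1 and policy[(d - 1, q)] == 1:
            policy[(d, q)] = 1                    # threshold structure: skip the arg min
        else:
            policy[(d, q)] = 0 if Q[((d, q), 0)] <= Q[((d, q), 1)] else 1
    return policy

policy = modified_rvi()
```

As in lines 11–17 of Algorithm 1, the final pass exploits the threshold structure: once action 1 is optimal at some (Δ′, *q*), the arg min is skipped for all larger Δ at the same battery level.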

#### **4. Minimize Average Cost in an Unknown Environment**

In the previous sections, we assumed that the channel erasure probability *p* and EH probability *λ* are known in advance. Thus, the *model-based* RVI method can be employed to obtain the optimal updating policy. However, statistics such as *p* and *λ* may be unknown and even time variant in many practical scenarios, which makes it impossible to apply the modified RVI algorithm: the transition probabilities are not explicit, so Equation (29) cannot be used to estimate the state–action value function *Q*(**x**, *a*). In the field of reinforcement learning, *model-free* methods can solve MDP problems with unknown transition probabilities. One example of a model-free algorithm is *Q-learning* [46], which finds an optimal policy in the sense of maximizing the expected total reward over all successive steps. However, it is designed only for discounted MDPs. For the average cost problem in (10), we employ an average cost *Q*-learning algorithm. The basic idea comes from the *SMART* algorithm in [47], a model-free reinforcement learning algorithm proposed for semi-Markov decision problems (SMDPs) under the average-reward criterion. We modify it to fit the average cost MDP problem.

The state–action value function *Q*(**x**, *a*) is essential for solving the optimal policy. When the model is unknown, as long as *Q*(**x**, *a*) can be estimated accurately, the optimal policy can also be obtained immediately by (30). So the key question is how to estimate the *Q*(**x**, *a*) function, or equivalently, the value of all state–action pairs. Similar to *Q*-learning, the average cost *Q*-learning algorithm uses the minimum value of the next state–action pairs to update the value of the current state–action pair. Moreover, it needs to estimate the shift value *g* by averaging all immediate cost.

Specifically, the average cost *Q*-learning algorithm learns *Q*(**x**, *a*) by episodes. Each episode contains several iterations, and each iteration corresponds to one time slot. In the *n*th time slot of an episode, the algorithm first observes the current state **x**[*n*] = (Δ[*n*], *q*[*n*]), then selects an action *a*[*n*] according to the *ϵ*-greedy policy:

$$a[n] = \begin{cases} \arg\min\_{a \in \mathcal{A}} Q(\mathbf{x}[n], a), & \text{with probability } 1 - \epsilon, \\ \text{random action}, & \text{otherwise.} \end{cases} \tag{31}$$

By (9), the immediate cost *C*[*n*] = Δ[*n*] + *ωC<sub>r</sub>a*[*n*](1 − *u*(*q*[*n*])) is incurred, and the system transits to the next state **x**[*n* + 1]. The value of *Q*(**x**[*n*], *a*[*n*]) is updated as follows:

$$Q(\mathbf{x}[n], a[n]) = (1 - \alpha[n])Q(\mathbf{x}[n], a[n]) + \alpha[n]\Big(C[n] - g + \min\_{a \in \mathcal{A}} Q(\mathbf{x}[n+1], a)\Big), \tag{32}$$

where *α*[*n*] is the learning rate. The shift value *g* is updated as follows:

$$g = (1 - \beta[n])g + \beta[n]C[n], \tag{33}$$

where *β*[*n*] = 1/*n*. The details are given in Algorithm 2. We leverage the parameter *ϵ* to balance exploration and exploitation. As the number of episodes increases, the learned *Q*(**x**, *a*) value approaches its true value, so we gradually decrease *ϵ* to 0 to reduce invalid exploration. At the same time, the shift value *g* will approach the optimal average cost *g*<sup>⋆</sup> in (20). Note that in [47], the shift value *g* is updated only in non-exploratory time slots. Here we update it by simply averaging all costs, similar to [48]. The performance comparison of the average cost *Q*-learning algorithm and the modified RVI algorithm is shown in the next section.

**Algorithm 2** Average cost *Q*-learning algorithm.

**Input:** Maximum number of episodes *K*, maximum number of iterations per episode *N<sub>e</sub>*, maximum AoI *N*, maximum battery state *B*, initial value *Q<sub>N×B×2</sub>* ← **0**, initial exploration rate *ε* ← *ε*<sub>0</sub>, initial shift value *g* ← 0.

**Output:** Learned policy *π*(**x**) for all states **x**, average cost *g* by following the policy *π*.

1: **for** episode *k* = 0, 1, 2, . . . , *K* **do**
2: *g* ← 0 // Initialize the shift value at the beginning of every episode.
3: **for** *n* = 1, 2, . . . , *N<sub>e</sub>* **do**
4: Observe the current state **x**[*n*]
5: Select an action *a*[*n*] according to the *ε*-greedy policy in (31)
6: Calculate the immediate cost *C*[*n*] ← Δ[*n*] + *ωC<sub>r</sub>a*[*n*](1 − *u*(*q*[*n*]))
7: Observe the next state **x**[*n* + 1]
8: *α*[*n*] ← 1/√*n*
9: *Q*(**x**[*n*], *a*[*n*]) ← (1 − *α*[*n*])*Q*(**x**[*n*], *a*[*n*]) + *α*[*n*](*C*[*n*] − *g* + min<sub>*a*∈A</sub> *Q*(**x**[*n* + 1], *a*)) // Update the state–action value function.
10: *β*[*n*] ← 1/*n*
11: *g* ← (1 − *β*[*n*])*g* + *β*[*n*]*C*[*n*] // Update the shift value.
12: **end for**
13: Decrease *ε*
14: **end for**
15: **for** **x** = (Δ, *q*) ∈ S*<sup>N</sup>* **do**
16: *π*(**x**) ← arg min<sub>*a*∈A</sub> *Q*(**x**, *a*) // Compute the learned policy *π*.
17: **end for**
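To make the procedure concrete, the following is a minimal Python sketch of Algorithm 2. The simulated dynamics are our reading of the cost (9) and transitions (7)–(8); the parameter values, episode counts and the ε-decay schedule are illustrative choices of ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (much smaller than the paper's N = 500, B = 20)
N, B = 50, 5            # maximum AoI, maximum battery state
lam, p = 0.5, 0.2       # EH probability, channel erasure probability
omega, Cr = 10.0, 2.0   # weighting factor, reliable-energy cost

def step(delta, q, a):
    """One slot of the assumed dynamics: cost per (9), transitions per (7)-(8)."""
    cost = delta + omega * Cr * a * (q == 0)   # reliable energy only when the battery is empty
    harvest = rng.random() < lam
    if a == 1:
        delta2 = 1 if rng.random() > p else min(delta + 1, N)  # update survives w.p. 1 - p
        q2 = max(q - 1, 0)                     # a stored unit is consumed if available
    else:
        delta2, q2 = min(delta + 1, N), q
    return cost, delta2, min(q2 + harvest, B)

Q = np.zeros((N + 1, B + 1, 2))
eps = 1.0
for episode in range(100):
    delta, q, g = 1, B, 0.0                    # reset shift value each episode (line 2)
    for n in range(1, 1001):
        a = int(Q[delta, q].argmin()) if rng.random() > eps else int(rng.integers(2))
        cost, delta2, q2 = step(delta, q, a)
        alpha, beta = 1 / np.sqrt(n), 1 / n
        # (32): relative Q-update, with the shift value g subtracted from the cost
        Q[delta, q, a] += alpha * (cost - g + Q[delta2, q2].min() - Q[delta, q, a])
        g = (1 - beta) * g + beta * cost       # (33): running average of immediate costs
        delta, q = delta2, q2
    eps = max(0.05, eps * 0.97)                # gradually reduce exploration (line 13)

policy = Q.argmin(axis=2)                      # learned policy pi(x) (lines 15-17)
```

With more episodes and the paper's parameters, the recorded shift value *g* approaches the optimal average cost in (20).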

#### **5. Numerical Results**

In this section, we first show the threshold structure of the optimal policy through simulation results. Then we compare the performance of the optimal policy with the following representative policies under different system parameters:


Moreover, we show that the average cost *Q*-learning algorithm performs very close to the modified RVI algorithm with known statistics. We also compare the age and reliable energy cost trade-off curves of the optimal updating policies under the EH supply, the reliable energy supply and the mixed energy supplies. Finally, we compare the performance of the optimal policy under the EH-only supply and a unit-sized battery with the prior results in [23,25].

#### *5.1. Simulation Setup*

In our simulations, we set the maximum AoI *N* = 500 and the maximum battery state *B* = 20, so the finite state space is S*<sup>N</sup>* = {(Δ, *q*) | Δ ≤ 500, Δ ∈ Z<sub>+</sub>, *q* ∈ B}. The cost of reliable energy *C<sub>r</sub>* for one update is equal to 2. For the modified RVI algorithm, we set the iteration number *K* = 1000, the iteration threshold to 10<sup>−5</sup> and the reference state **x**ˆ = (1, *B*). The optimal policy and the other baseline policies are run for *T* = 10,000 time slots to compute the average cost. For the average cost *Q*-learning algorithm, we set the total number of episodes *K* = 1000 and the maximum number of iterations per episode *N<sub>e</sub>* = 100,000.
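For reference, a generic relative value iteration (RVI) sketch for this MDP can be written as follows. The transition model is our reading of (7) and (8), the parameter values are illustrative, and the paper's modified RVI algorithm, specified earlier, may differ in its stopping rule and state truncation.

```python
import numpy as np

N, B = 50, 5            # truncated AoI and battery ranges (illustrative, not N = 500, B = 20)
lam, p = 0.5, 0.2       # EH probability, channel erasure probability
omega, Cr = 10.0, 2.0   # weighting factor, reliable-energy cost

def bellman(h):
    """One Bellman backup over all states, transitions as assumed from (7)-(8)."""
    Q = np.zeros((N + 1, B + 1, 2))
    for d in range(1, N + 1):
        d1 = min(d + 1, N)
        for q in range(B + 1):
            qh = min(q + 1, B)                 # battery after a possible harvest, idle case
            Q[d, q, 0] = d + lam * h[d1, qh] + (1 - lam) * h[d1, q]
            qs = max(q - 1, 0)                 # stored unit consumed (reliable energy if q = 0)
            qsh = min(qs + 1, B)
            Q[d, q, 1] = (d + omega * Cr * (q == 0)
                          + p * (lam * h[d1, qsh] + (1 - lam) * h[d1, qs])
                          + (1 - p) * (lam * h[1, qsh] + (1 - lam) * h[1, qs]))
    return Q

h = np.zeros((N + 1, B + 1))
for _ in range(1000):
    Q = bellman(h)
    V = Q.min(axis=2)
    h = V - V[1, B]                            # relative values w.r.t. reference state (1, B)

g_opt = V[1, B]                                # converged optimal average cost per slot
policy = Q.argmin(axis=2)
```

Under these illustrative parameters, the computed policy updates when the state is stale with a full battery and idles when fresh with an empty battery, matching the threshold behavior discussed below.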

#### *5.2. Results*

Figure 3 shows the optimal policy under different system parameters. All the subfigures in Figure 3 exhibit the threshold structure described in Theorem 2. Intuitively, when *ω* is very small, the optimal action for every state should be 1, and when *ω* is very large, the optimal action for battery state *q* = 0 should be 0. It can be observed from Figure 3a,b that when *ω* is small (i.e., *ω* = 0.1), the optimal policy is to update in every state, which is exactly the zero-wait policy. Figure 3 also shows that when *ω* is relatively large (e.g., *ω* = 10) and the AoI is small, even if the battery state is not zero, the optimal action in the corresponding state is to keep idle. When the AoI or the battery state is large, the optimal action is to measure and send updates. Moreover, in all the subfigures, the threshold Δ<sub>*q*</sub> is monotonically non-increasing in the battery state *q*; however, this has not been rigorously proven.

**Figure 3.** Optimal policy conditioned on different parameters: (**a**) *ω* = 0.1, *p* = 0.2, *λ* = 0.5, (**b**) *ω* = 0.1, *p* = 0.4, *λ* = 0.5, (**c**) *ω* = 10, *p* = 0.2, *λ* = 0.5 and (**d**) *ω* = 10, *p* = 0.4, *λ* = 0.5.

Figure 4 shows the time average cost with respect to *ω* under different policies. Here, we set the period of the periodic policy to 5 and 10 for comparison without loss of generality. It can be found that for different weighting factors *ω*, the optimal policy proposed in this paper obtains the minimum long-term average cost compared with the other policies, which indicates the best trade-off between the average AoI and the cost of reliable energy. When *ω* tends to 0, the zero-wait policy tends to be optimal. Since there is no need to consider the update cost brought by paid reliable backup energy, the optimal policy should maximize the utilization of the updating opportunities.

**Figure 4.** Performance comparison of the proposed optimal policy, zero-wait policy, periodic policy (period = 5), periodic policy (period = 10), randomized policy and energy first policy versus the weighting factor *ω* with simulation conditions: (**a**) *p* = 0.2, *λ* = 0.5 and (**b**) *p* = 0.2, *λ* = 0.1.

It can also be observed from Figure 4 that the growth of the optimal policy curve slows down as *ω* increases. This is because the optimal policy in the case of large *ω* does not tend to use the reliable energy when battery state *q* = 0, but prefers to wait for harvested energy, as shown in Figure 3. Since the EH probability is constant, the average AoI does not change much, resulting in no significant increase in the total average cost. Comparing Figure 4a,b, it is found that the larger the *λ*, the smaller the average cost variation with *ω*. This is because there is not much opportunity for the sensor to use reliable energy in the case of sufficient harvested energy.

Figure 5 reveals the impact of EH probabilities *λ*. In Figure 5a,b, we set *p* = 0.2, *ω* = 10 and *p* = 0.2, *ω* = 1, respectively.

**Figure 5.** Performance comparison of the proposed optimal policy, zero-wait policy, periodic policy (period = 5), periodic policy (period = 10), randomized policy and energy first policy versus the EH probability *λ* with simulation conditions: (**a**) *p* = 0.2, *ω* = 10 and (**b**) *p* = 0.2, *ω* = 1.

It can also be found from both Figure 5a,b that the proposed optimal update policy outperforms all other policies under different EH probabilities. The interesting point is that when the EH probability tends to 1, i.e., energy arrives in each time slot, the performance of the zero-wait policy and the energy first policy is equal to the optimal policy, while there is still a performance gap between the optimal policy and the other two policies. This is intuitive because when the free harvested energy is sufficient, the optimal policy must be to generate and transmit updates in every time slot. However, the periodic policy and the randomized policy still keep idle in many time slots, which leads to a higher average AoI and thus increases the average cost. The results show that the performance of the zero-wait policy approaches the optimal policy for large *λ*, which is consistent with our findings in Figure 4.

In Figure 6, we compare the five policies under different channel erasure probability *p*.

**Figure 6.** Performance comparison of the proposed optimal policy, zero-wait policy, periodic policy (period = 5), periodic policy (period = 10), randomized policy and energy first policy versus the erasure probability *p* with simulation conditions: (**a**) *λ* = 0.5, *ω* = 10 and (**b**) *λ* = 0.2, *ω* = 10.

It can be found that when the erasure probability increases from 0 to 0.9, the proposed optimal update policy always performs better than the other baseline policies. As *p* tends to 1, the average cost under all policies theoretically tends to infinity, because all updates are erased by the noisy channel and cannot be received by the destination. The simulation results confirm this conjecture. Comparing Figure 6a,b, we can observe that when *λ* is large, the energy first policy is close to the optimal policy, which is also illustrated in Figure 5.

Figure 7 shows the performance of the average cost *Q*-learning algorithm. In every episode, the shift value *g* of the last inner iteration is recorded as the average cost. It can be found from Figure 7a that the average cost achieved by Algorithm 2 converges to that obtained by the modified RVI algorithm under different EH probabilities *λ* and channel erasure probabilities *p*. The age–energy trade-off is shown in Figure 7b. By fixing *λ* and *p* and changing *ω* from 0 to 1000, we run the modified RVI algorithm and the average cost *Q*-learning algorithm to obtain the corresponding trade-off curves. It can be found that the curve obtained by the average cost *Q*-learning algorithm is very close to the optimal trade-off curve under the same condition, which further verifies the near-optimal performance of the average cost *Q*-learning algorithm in an unknown environment.

**Figure 7.** (**a**) Performance of the average cost *Q*-learning with respect to the modified RVI algorithm under different system parameters (*ω* = 10); (**b**) age–energy trade-off curves computed by the average cost *Q*-learning and modified RVI algorithm.

Figure 8 shows the optimal age and reliable energy cost trade-off curves for different energy supplies. By fixing EH probability *λ* and channel erasure probability *p* and changing *ω* from 0 to 10,000, we run the modified RVI algorithm to get the optimal trade-off curve for mixed energy supplies. By letting EH probability *λ* = 0 and following the same steps, we can obtain the optimal trade-off curve for reliable energy supply. By letting weighting factor *ω* go to infinity, we can theoretically obtain the optimal trade-off "curve" corresponding to the EH supply. The "curve" contains only one point because the reliable energy consumption can only be 0 for the EH supply case. It should be noted that *ω* cannot be infinite in a simulation. Instead, we can set *ω* to a relatively large number (e.g., 10,000). To facilitate comparison, the channel erasure probability is set as *p* = 0.2, and the EH probability *λ* is set as 0.1, 0.3 and 0.7. It can be observed that the curves for the mixed energy supplies are always at the lower left of the curve for relying solely on reliable energy, which indicates that under the same average AoI, the reliable energy required by the system under the mixed energy supplies is smaller, and under the same reliable energy consumption, the AoI of the system under the mixed energy supplies is lower. The mixed energy design also achieves lower AoI than that with only EH, at the cost of paying for reliable energy. The optimal updating policy proposed in this paper makes full use of the harvested energy.

**Figure 8.** Age-reliable energy trade-off for different energy supplies: mixed energy supply, reliable energy supply and EH supply. The channel erasure probability *p* = 0.2, and the EH probability *λ* is set as 0.1, 0.3 and 0.7, respectively.

Figure 9 compares the performance of the optimal policy with the prior results in [23,25] for a special case where the sensor only uses the harvested energy and the battery capacity *B* = 1. Both [23,25] considered a continuous-time model, i.e., the energy arrival process is a Poisson process with an arrival rate of Λ energy units per *time unit* (TU), and proved that the optimal policies have a threshold structure, in which a new update is generated and transmitted only if the time until the next energy arrival since the latest successful transmission exceeds a certain threshold. Specifically, [23] (Theorem 4, Equation (13)) provided the average AoI and threshold in closed form under the optimal update policy for any energy arrival rate Λ in the error-free channel case. It is interesting that the optimal average AoI and the corresponding threshold are equal. Ref. [25] (Theorem 4, Equation (14)) extended the results of [23] to an error-prone channel case, while the energy arrival rate Λ is assumed to be 1. So we first show the results of [23,25] vs. different channel erasure probabilities *p* in Figure 9, where the energy arrival rate Λ = 1. It should be emphasized that the unit of the average AoI and threshold is the TU. According to Theorems 1 and 2 in this paper, the optimal update policy exists and admits a threshold

structure for any EH probability *λ*, channel erasure probability *p*, weighting factor *ω* and battery capacity *B*. This conclusion is based on the discrete-time model, i.e., the energy arrives as a Bernoulli process with parameter *λ*, which is different from the continuous-time model in [23,25], and the reliable backup energy is also considered. However, by choosing appropriate parameters (large *ω*, small *λ*), our results can be a good approximation of the results in [23,25]. First, by choosing a large *ω*, the reliable energy will almost never be used; equivalently, only the EH supply exists. Secondly, by choosing a small *λ*, the Poisson process can be approximated by a Bernoulli process. This is because, for a Poisson process with parameter Λ, we can discretize a TU into *n* small time slots of equal length; then, when *n* is large enough, the energy arrival process within a time slot can be approximated as a Bernoulli process with parameter *λ* = Λ/*n*, which is relatively small. In our simulation, we set the battery capacity *B* = 1, and take *λ* = 0.1 (i.e., *n* = 10) and *ω* = 10,000. By changing the channel erasure probability *p*, we can run the modified RVI algorithm to compute the minimum average AoI and the optimal threshold. Note that their unit is the time slot; for comparison, we divide their values by *n* to obtain the average AoI and threshold in TUs. The final results are shown by the dashed lines in Figure 9. It can be observed that the results of this paper are extremely close to the explicit results in [23,25], which verifies the correctness of the analysis and also reflects the generality of our system model.

**Figure 9.** AoI and threshold with the proposed optimal policy for a special case where the sensor only uses the harvested energy and the battery capacity *B* = 1, and those with a unit-sized battery in [23,25] (error-free channel case and error-prone channel case, respectively), vs. the channel erasure probability *p*.
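The Bernoulli approximation of Poisson arrivals described above can be checked with a short computation; the rate and slot counts here are illustrative.

```python
import math

Lam = 1.0                                  # Poisson arrival rate per time unit (TU)
for n in (10, 100, 1000):                  # slots per TU; lam = Lam / n per slot
    lam = Lam / n
    p_none_bern = (1 - lam) ** n           # P(no energy arrival in one TU), Bernoulli model
    p_none_pois = math.exp(-Lam)           # same event under the Poisson process
    print(n, p_none_bern, p_none_pois)     # the gap shrinks as n grows
```

The gap follows from the classical limit (1 − Λ/*n*)<sup>*n*</sup> → e<sup>−Λ</sup>, which is why a small per-slot *λ* makes the discrete model track the continuous one.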

#### **6. Conclusions**

In this paper, we studied the optimal updating policy for an information update system, where a wireless sensor sends updates over an erasure channel using both harvested energy and reliable backup energy. Theoretical analysis indicates the threshold structure of the optimal policy and simulation results verify its performance. For the practical case where the statistics, such as the EH probability and channel erasure probability, are unknown in advance, a learning-based algorithm is proposed to compute the updating policy. Simulation results show its performance is close to that of the optimal policy. With the optimal policy, the design of mixed energy supplies can make full use of harvested energy and achieve the best age–energy trade-off. In future work, we will focus on the timeliness of the multi-sensor system under mixed energy supplies.

**Author Contributions:** Conceptualization, L.W., F.P., X.C. and S.Z.; methodology, L.W. and F.P.; software, L.W. and F.P.; validation, L.W. and F.P.; formal analysis, L.W. and F.P.; investigation, L.W.; writing—original draft preparation, L.W.; writing—review and editing, F.P., X.C. and S.Z.; visualization, L.W.; supervision, S.Z.; project administration, S.Z.; funding acquisition, S.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work is supported by the Key Research and Development Program of China under Grants 2019YFE0113200 and 2019YFE0196600, the Tsinghua University–China Mobile Communications Group Co., Ltd. Joint Institute, and the Huawei Company Cooperation Project under Contract No. TC20210519013.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **Appendix A. Proof of Proposition 1**

According to [15], the proof of Proposition 1 is equivalent to showing that there exists a stationary deterministic policy *π* such that the expected discounted cost *V<sub>γ</sub><sup>π</sup>*(**x**) is finite for all **x** and *γ*. So we select the policy *π* that chooses to keep idle in every time slot. Then by (11), for any state **x** = (Δ, *q*) ∈ S and *γ* ∈ (0, 1), we have

$$\begin{split} V\_{\gamma}^{\pi}(\mathbf{x}) &= \mathbb{E}\_{\pi} \left\{ \sum\_{t=0}^{\infty} \gamma^{t} \mathbb{C}(\mathbf{x}[t], a[t]) | \mathbf{x}[0] = \mathbf{x} \right\} \\ &= \sum\_{t=0}^{\infty} \gamma^{t} \mathbb{C}(\mathbf{x}[t], a[t]) \\ &= \sum\_{t=0}^{\infty} \gamma^{t} (\Delta + t) \\ &= \frac{1}{1 - \gamma} \left(\Delta + \frac{\gamma}{1 - \gamma}\right) < \infty, \end{split} \tag{A1}$$

which completes the proof.
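The closed form in (A1) is the standard arithmetic–geometric series identity and can be spot-checked numerically; the values below are arbitrary.

```python
gamma, delta = 0.9, 3.0
partial = sum(gamma ** t * (delta + t) for t in range(5000))   # truncated series of (A1)
closed = (delta + gamma / (1 - gamma)) / (1 - gamma)           # closed form in (A1)
print(partial, closed)
```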

#### **Appendix B. Proof of Lemma 1**

The proof uses the value iteration algorithm (VIA) and mathematical induction. According to (c) in Proposition 1, the iteration process of the VIA is as follows:

$$\begin{cases} V\_{\gamma,0}(\mathbf{x}) = 0, \\ Q\_{\gamma,k}(\mathbf{x}, a) = \mathbb{C}(\mathbf{x}, a) + \gamma \sum\_{\mathbf{x'} \in \mathcal{S}} \Pr(\mathbf{x'} | \mathbf{x}, a) V\_{\gamma,k}(\mathbf{x'}), \\ V\_{\gamma,k+1}(\mathbf{x}) = \min\_{a \in \mathcal{A}} Q\_{\gamma,k}(\mathbf{x}, a), \end{cases} \tag{A2}$$

where *k* ∈ Z<sub>+</sub>. For any state **x** ∈ S, *V<sub>γ,k</sub>*(**x**) converges as *k* goes to infinity:

$$\lim\_{k \to \infty} V\_{\gamma,k}(\mathbf{x}) = V\_{\gamma}(\mathbf{x}). \tag{A3}$$

Then we will use mathematical induction to prove the monotonicity of the value function in each component.

First let us tackle part (a) of Lemma 1.

For (17), we can verify that the inequality *Vγ*,1(Δ1, *q*) ≤ *Vγ*,1(Δ2, *q*) holds when *k* = 1. Then we assume that at the *k*th step of the induction method, the following formula holds:

$$V\_{\gamma,k}(\Delta\_1, q) \le V\_{\gamma,k}(\Delta\_2, q), \forall \Delta\_1 \le \Delta\_2. \tag{A4}$$

So the next formula that needs to be verified is

$$V\_{\gamma,k+1}(\Delta\_1, q) \le V\_{\gamma,k+1}(\Delta\_2, q), \forall \Delta\_1 \le \Delta\_2 \tag{A5}$$

Since *V<sub>γ,k+1</sub>*(**x**) = min<sub>*a*∈A</sub> *Q<sub>γ,k</sub>*(**x**, *a*), we need to write out *Q<sub>γ,k</sub>*(**x**, *a*) first. Due to the complexity of the transition probabilities and the one-step cost function, we need to discuss the following three cases: *q* = 0, 0 < *q* < *B* and *q* = *B*. For brevity, we only give the calculation details of the case 0 < *q* < *B*; the other two cases can be verified by exactly the same steps.

According to the transition probabilities (7) and (8), the state–action value functions *Q<sub>γ,k</sub>*(Δ, *q*, 0) and *Q<sub>γ,k</sub>*(Δ, *q*, 1) are as follows:

$$Q\_{\gamma,k}(\Delta,q,0) = \Delta + \gamma \lambda V\_{\gamma,k}(\Delta+1,q+1) + \gamma (1-\lambda)V\_{\gamma,k}(\Delta+1,q),\tag{A6}$$

and

$$\begin{split} Q\_{\gamma,k}(\Delta,q,1) &= \Delta + \gamma p \lambda V\_{\gamma,k}(\Delta+1,q) + \gamma p(1-\lambda)V\_{\gamma,k}(\Delta+1,q-1) \\ &\quad + \gamma(1-p)\lambda V\_{\gamma,k}(1,q) + \gamma(1-p)(1-\lambda)V\_{\gamma,k}(1,q-1). \end{split} \tag{A7}$$

Because *V<sub>γ,k</sub>*(Δ, *q*) is assumed to be non-decreasing in Δ for any fixed *q*, both *Q<sub>γ,k</sub>*(Δ, *q*, 0) and *Q<sub>γ,k</sub>*(Δ, *q*, 1) are non-decreasing in Δ. Therefore, for any Δ<sub>1</sub> ≤ Δ<sub>2</sub>, we have

$$\begin{split} V\_{\gamma,k+1}(\Delta\_1, q) &= \min\_{a \in \mathcal{A}} \left\{ Q\_{\gamma,k}(\Delta\_1, q, a) \right\} \\ &= \min \left\{ Q\_{\gamma,k}(\Delta\_1, q, 0), Q\_{\gamma,k}(\Delta\_1, q, 1) \right\} \\ &\leq \min \left\{ Q\_{\gamma,k}(\Delta\_2, q, 0), Q\_{\gamma,k}(\Delta\_2, q, 1) \right\} \\ &= V\_{\gamma,k+1}(\Delta\_2, q). \end{split} \tag{A8}$$

As a result, by induction we prove that *V<sub>γ,k</sub>*(Δ, *q*) is non-decreasing in Δ for any *q* ∈ {1, . . . , *B* − 1}, i.e., (A4) holds for all *k*. Letting *k* go to infinity and combining (A3) and (A4), we prove that (17) holds in the case 0 < *q* < *B*. In the other two cases, (17) still holds. So we have proved that (17) holds for any *q* ∈ B.

For (18), it is easy to yield

$$\begin{aligned} Q\_{\gamma}(\Delta\_2, q, 0) - Q\_{\gamma}(\Delta\_1, q, 0) &= \Delta\_2 - \Delta\_1 \\ &+ \gamma \lambda \left[ V\_{\gamma}(\Delta\_2 + 1, q + 1) - V\_{\gamma}(\Delta\_1 + 1, q + 1) \right] \\ &+ \gamma (1 - \lambda) \left[ V\_{\gamma}(\Delta\_2 + 1, q) - V\_{\gamma}(\Delta\_1 + 1, q) \right] \\ &\overset{(a)}{\geq} \Delta\_2 - \Delta\_1, \end{aligned} \tag{A9}$$

and

$$\begin{aligned} Q\_{\gamma}(\Delta\_2, q, 1) - Q\_{\gamma}(\Delta\_1, q, 1) &= \Delta\_2 - \Delta\_1 \\ &+ \gamma p \lambda [V\_{\gamma}(\Delta\_2 + 1, q) - V\_{\gamma}(\Delta\_1 + 1, q)] \\ &+ \gamma p (1 - \lambda) [V\_{\gamma}(\Delta\_2 + 1, q - 1) - V\_{\gamma}(\Delta\_1 + 1, q - 1)] \\ &+ \gamma (1 - p) \lambda [V\_{\gamma}(1, q) - V\_{\gamma}(1, q)] \\ &+ \gamma (1 - p) (1 - \lambda) [V\_{\gamma}(1, q - 1) - V\_{\gamma}(1, q - 1)] \\ &\stackrel{(b)}{\geq} \Delta\_2 - \Delta\_1, \end{aligned} \tag{A10}$$

where (*a*) and (*b*) are due to (17). Since *V<sub>γ</sub>*(**x**) = min<sub>*a*∈A</sub> *Q<sub>γ</sub>*(**x**, *a*), we prove that Equation (18) holds for all *q* ∈ {1, . . . , *B* − 1}. By the same process, it can be verified that (18) is also valid when *q* = 0 and *q* = *B*. Therefore, we have completed the proof of part (a).

Second, we tackle part (b) of Lemma 1.

For (19), exactly the same mathematical induction applied to (17) goes through. Due to limited space, the details are omitted here.

Hence, we have completed the whole proof.
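As a sanity check, the monotonicity claims of Lemma 1 can be observed numerically by iterating (A2) on a small instance. The transition model below is our reading of (7) and (8), and the parameter values are illustrative.

```python
import numpy as np

N, B = 30, 4            # small truncated state space, illustrative only
lam, p = 0.5, 0.2       # EH probability, channel erasure probability
omega, Cr = 10.0, 2.0   # weighting factor, reliable-energy cost
gamma = 0.9             # discount factor

def via_backup(V):
    """One sweep of (A2): Q from V, then the minimum over actions."""
    Q = np.zeros((N + 1, B + 1, 2))
    for d in range(1, N + 1):
        d1 = min(d + 1, N)
        for q in range(B + 1):
            qh = min(q + 1, B)
            Q[d, q, 0] = d + gamma * (lam * V[d1, qh] + (1 - lam) * V[d1, q])
            qs = max(q - 1, 0)
            qsh = min(qs + 1, B)
            Q[d, q, 1] = (d + omega * Cr * (q == 0)
                          + gamma * p * (lam * V[d1, qsh] + (1 - lam) * V[d1, qs])
                          + gamma * (1 - p) * (lam * V[1, qsh] + (1 - lam) * V[1, qs]))
    return Q.min(axis=2)

V = np.zeros((N + 1, B + 1))
for _ in range(300):
    V = via_backup(V)

nondecr_in_age = bool((np.diff(V[1:, :], axis=0) >= -1e-9).all())  # (17): non-decreasing in Delta
nonincr_in_bat = bool((np.diff(V[1:, :], axis=1) <= 1e-9).all())   # (19): non-increasing in q
print(nondecr_in_age, nonincr_in_bat)
```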

#### **Appendix C. Proof of Theorem 1**

By Proposition 4 in [15], it suffices to show that the following four conditions hold:


For condition (1), recall that we have verified in the proof of Proposition 1 that there exists a stationary deterministic policy *π* such that the expected discounted cost *V<sub>γ</sub><sup>π</sup>* is finite, and here we extend this conclusion to any policy *π* ∈ Π. For any non-anticipative policy *π* and state **x** = (Δ, *q*), we have

$$\mathbb{C}(\mathbf{x}[t], a[t]) = \Delta[t] + \omega \mathbb{C}\_r a[t](1 - u(q[t])) \le \Delta[t] + \omega \mathbb{C}\_r. \tag{A11}$$

Since the AoI grows linearly at most, for any state **x** = (Δ, *q*) and discounted factor *γ*, we have

$$\begin{split} V\_{\gamma}(\mathbf{x}) = \min\_{\pi \in \Pi} V\_{\gamma}^{\pi}(\mathbf{x}) &= \min\_{\pi \in \Pi} \mathbb{E}\_{\pi} \left\{ \sum\_{t=0}^{\infty} \gamma^{t} \mathbb{C}(\mathbf{x}[t], a[t]) | \mathbf{x}[0] = (\Delta, q) \right\} \\ &\leq \sum\_{t=0}^{\infty} \gamma^{t} (\Delta + t + \omega \mathbb{C}\_{r}) \\ &= \frac{1}{1 - \gamma} \left(\Delta + \omega \mathbb{C}\_{r} + \frac{\gamma}{1 - \gamma}\right) < \infty, \end{split} \tag{A12}$$

which verifies condition (1).

Next let us focus on condition (2). By (17) and (19) in Lemma 1, *Vγ*(Δ, *q*) is nondecreasing with regard to age Δ and non-increasing with regard to battery state *q*. Hence, we can choose *L* = 0 and reference state **x**ˆ = (1, *B*). Then we have *L* = 0 ≤ *Vγ*(**x**) − *Vγ*(**x**ˆ) = *hγ*(**x**), which verifies condition (2).

To prove that condition (3) holds, we need to introduce the following lemma:

**Lemma A1.** *Denote x*ˆ = (1, *B*) *as the reference state, and T* = inf{*t* : *t* ≥ 0, *x*[*t*] = *x*ˆ} *as the first hitting time from the initial state x to x*ˆ*. Under the following lazy policy π′:*

$$
\pi'(\Delta, q) = \begin{cases} 1, & \text{if } q = B, \\ 0, & \text{otherwise}, \end{cases} \tag{A13}
$$

*the expected discounted cost from x to x*ˆ *is finite for all initial state x* ∈ S*, i.e.,*

$$\mathbb{C}^{\pi'}(\mathbf{x}) = \mathbb{E}\_{\pi'} \left\{ \sum\_{t=0}^{T-1} \gamma^t \mathbb{C}(\mathbf{x}[t], a[t]) | \mathbf{x}[0] = \mathbf{x} \right\} < \infty. \tag{A14}$$

*Note that if x* = *x*ˆ*, C<sup>π′</sup>*(*x*) = 0*.*

#### **Proof.** See Appendix F.

Consider a mixed non-anticipative policy *π<sup>m</sup>* consisting of *π′* and the optimal policy *π*∗ for (12), starting from the initial state **x**, as follows:

$$
\pi^m(\mathbf{x}[t]) = \begin{cases}
\pi'(\mathbf{x}[t]), & \text{if } t < T, \\
\pi^\*(\mathbf{x}[t]), & \text{otherwise}, \end{cases} \tag{A15}
$$

we have

$$\begin{split} V\_{\gamma}(\mathbf{x}) \leq V\_{\gamma}^{\pi^{m}}(\mathbf{x}) &= \mathbb{E}\_{\pi^{m}} \left\{ \sum\_{t=0}^{T-1} \gamma^{t} \mathbb{C}(\mathbf{x}[t], a[t]) | \mathbf{x}[0] = \mathbf{x} \right\} + \mathbb{E}\_{\pi^{m}} \left\{ \sum\_{t=T}^{\infty} \gamma^{t} \mathbb{C}(\mathbf{x}[t], a[t]) | \mathbf{x}[T] = \hat{\mathbf{x}} \right\} \\ &\overset{(a)}{=} \mathbb{C}^{\pi'}(\mathbf{x}) + \mathbb{E}\_{\pi'} \left\{ \gamma^{T} V\_{\gamma}(\hat{\mathbf{x}}) \right\} \\ &\overset{(b)}{\leq} \mathbb{C}^{\pi'}(\mathbf{x}) + V\_{\gamma}(\hat{\mathbf{x}}), \end{split} \tag{A16}$$

where (*a*) is due to (A14) and (12), and (*b*) is due to *γ* ∈ (0, 1). Recalling the definition of *h<sub>γ</sub>*(**x**) and setting *M*(**x**) = *C<sup>π′</sup>*(**x**), condition (3) holds.

Based on Lemma A1, *M*(**x**) = *C<sup>π′</sup>*(**x**) < ∞ holds for any state **x**. Since there are finitely many possible states after a transition from **x** under any action, the sum of finitely many *M*(·) values is also finite. Hence, condition (4) holds.

#### **Appendix D. Proof of Lemma 3**

For (27), an equivalent transformation is made as follows:

$$V(\Delta+1,q+1) + pV(\Delta,q) \ge V(\Delta,q+1) + pV(\Delta+1,q). \tag{A17}$$

For every state **x**, we have

$$V(\mathbf{x}) = \min\_{a \in \mathcal{A}} Q(\mathbf{x}, a) = \min \{ Q(\mathbf{x}, 0), Q(\mathbf{x}, 1) \}. \tag{A18}$$

So every value function in (A17) has two possible values. In principle, proving Equation (A17) would require discussing 2<sup>4</sup> = 16 cases, which is too cumbersome. Here we use a small trick: as long as we prove that, for each of the 2<sup>2</sup> = 4 possible combinations on the left-hand side (LHS) of (A17), there exists a combination on the right-hand side (RHS) of (A17) that makes "≥" hold, the proof is complete. For convenience, we use four binary digits to sequentially represent the actions taken by the four minimum state–action value functions in Equation (A17). For example, "1010" represents the following:

$$Q(\Delta+1,q+1,\mathbf{1}) + pQ(\Delta,q,\mathbf{0}) \ge Q(\Delta,q+1,\mathbf{1}) + pQ(\Delta+1,q,\mathbf{0}),\tag{A19}$$

So, by the previous trick, we only need to verify the cases "0000", "1010", "0101" and "1111" to prove Equation (A17). For brevity, we only show the verification of "1010" in the following proof; the other three cases can be proved by exactly the same steps.

Now we apply the VIA and mathematical induction. Assuming that *V*<sub>0</sub>(**x**) = 0 for all states **x**, it is easy to obtain

$$V\_1(\Delta+1, q+1) + pV\_1(\Delta, q) \ge V\_1(\Delta, q+1) + pV\_1(\Delta+1, q),\tag{A20}$$

for any *q* ∈ {0, 1, . . . , *B* − 1} and Δ ∈ Z<sub>+</sub>. By induction, assume that for any *q* ∈ {0, 1, . . . , *B* − 1} and Δ ∈ Z<sub>+</sub>, we have:

$$V\_k(\Delta + 1, q+1) + pV\_k(\Delta, q) \ge V\_k(\Delta, q+1) + pV\_k(\Delta + 1, q). \tag{A21}$$

What we need to do is verify that Equation (A21) still holds in the next value iteration. Based on the previous analysis, we focus on the "1010" case. For Δ ∈ Z<sub>+</sub> and *q* ∈ {0, 1, . . . , *B* − 1}, we have

$$\begin{aligned} &Q\_k(\Delta+1,q+1,1) + pQ\_k(\Delta,q,0) - [Q\_k(\Delta,q+1,1) + pQ\_k(\Delta+1,q,0)] \\ &= \Delta+1+p\lambda V\_k(\Delta+2,q+1) + p(1-\lambda)V\_k(\Delta+2,q) \\ &\quad + (1-p)\lambda V\_k(1,q+1) + (1-p)(1-\lambda)V\_k(1,q) \\ &\quad + p[\Delta+\lambda V\_k(\Delta+1,q+1) + (1-\lambda)V\_k(\Delta+1,q)] \\ &\quad - \Delta - p\lambda V\_k(\Delta+1,q+1) - p(1-\lambda)V\_k(\Delta+1,q) \\ &\quad - (1-p)\lambda V\_k(1,q+1) - (1-p)(1-\lambda)V\_k(1,q) \\ &\quad - p[\Delta+1+\lambda V\_k(\Delta+2,q+1) + (1-\lambda)V\_k(\Delta+2,q)] \\ &= 1-p \ge 0. \end{aligned} \tag{A22}$$

Similarly, we can verify the other three cases and confirm that the formula

$$V\_{k+1}(\Delta+1,q+1) + pV\_{k+1}(\Delta,q) \ge V\_{k+1}(\Delta,q+1) + pV\_{k+1}(\Delta+1,q) \tag{A23}$$

holds for any Δ ∈ Z<sub>+</sub> and *q* ∈ {0, 1, . . . , *B* − 1}. Therefore, by induction, Equation (A21) holds for any *k*. Taking the limit as *k* → ∞ on both sides, we prove that (A17) holds, which indicates that (27) holds. Therefore, we have completed the proof.
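Because every *V<sub>k</sub>* term in the "1010" combination cancels in pairs, the identity can be verified with arbitrary values of *V<sub>k</sub>*. A small numeric spot check, using state–action values patterned after (A6) and (A7) without the discount factor, as in this appendix (the particular numbers are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
p, lam, B = 0.2, 0.5, 5
V = rng.random((100, B + 1))       # arbitrary V_k values: the cancellation is purely algebraic

def Q0(d, q):                      # action 0, cf. (A6) without the discount factor
    return d + lam * V[d + 1, min(q + 1, B)] + (1 - lam) * V[d + 1, q]

def Q1(d, q):                      # action 1 with q >= 1, cf. (A7) without the discount factor
    return (d + p * lam * V[d + 1, q] + p * (1 - lam) * V[d + 1, q - 1]
            + (1 - p) * lam * V[1, q] + (1 - p) * (1 - lam) * V[1, q - 1])

d, q = 7, 2
lhs = Q1(d + 1, q + 1) + p * Q0(d, q)
rhs = Q1(d, q + 1) + p * Q0(d + 1, q)
print(lhs - rhs)                   # approximately 1 - p, for any choice of V
```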

#### **Appendix E. Proof of Theorem 2**

By Corollary 1, the optimal policy is of a threshold structure if *Q*(**x**, *a*) has a *sub-modular* structure, that is,

$$Q(\Delta, q, 0) - Q(\Delta, q, 1) \le Q(\Delta + 1, q, 0) - Q(\Delta + 1, q, 1). \tag{A24}$$

We divide the whole proof into the following three cases.

**Case 1.** When *q* = 0, for any Δ ∈ Z<sub>+</sub> we have:

$$\begin{split} &Q(\Delta,q,0) - Q(\Delta,q,1) \\ = &\Delta + \lambda V(\Delta+1,q+1) + (1-\lambda)V(\Delta+1,q) \\ &- \Delta - \omega \mathbb{C}\_r - p\lambda V(\Delta+1,q+1) - p(1-\lambda)V(\Delta+1,q) \\ &- (1-p)\lambda V(1,q+1) - (1-p)(1-\lambda)V(1,q) \\ = &(1-p)\lambda (V(\Delta+1,q+1) - V(1,q+1)) \\ &+ (1-p)(1-\lambda)(V(\Delta+1,q) - V(1,q)) - \omega \mathbb{C}\_r. \end{split} \tag{A25}$$

Therefore, we have

$$\begin{aligned} &Q(\Delta+1,q,0) - Q(\Delta+1,q,1) - [Q(\Delta,q,0) - Q(\Delta,q,1)] \\ = &(1-p)\lambda(V(\Delta+2,q+1) - V(\Delta+1,q+1)) \\ &+ (1-p)(1-\lambda)(V(\Delta+2,q) - V(\Delta+1,q)) \\ \overset{(a)}{\geq} &0, \end{aligned} \tag{A26}$$

where the last inequality (*a*) is due to (24) in Lemma 2.

**Case 2.** When *q* ∈ {1, . . . , *B* − 1}, for any Δ ∈ Z<sub>+</sub> we have

$$\begin{aligned} &Q(\Delta+1,q,0)-Q(\Delta+1,q,1)-[Q(\Delta,q,0)-Q(\Delta,q,1)] \\ = &Q(\Delta+1,q,0)-Q(\Delta,q,0)-[Q(\Delta+1,q,1)-Q(\Delta,q,1)] \\ = &\lambda[V(\Delta+2,q+1)-V(\Delta+1,q+1)] \\ &-p\lambda[V(\Delta+2,q)-V(\Delta+1,q)] \\ &+(1-\lambda)[V(\Delta+2,q)-V(\Delta+1,q)] \\ &-p(1-\lambda)[V(\Delta+2,q-1)-V(\Delta+1,q-1)] \\ \overset{(a)}{\geq} &0,\end{aligned} \tag{A27}$$

where the last inequality (*a*) is due to (27) in Lemma 3.

**Case 3.** When *q* = *B*, for any Δ ∈ Z<sub>+</sub> we have

$$\begin{aligned} &Q(\Delta+1,q,0) - Q(\Delta+1,q,1) - \left[Q(\Delta,q,0) - Q(\Delta,q,1)\right] \\ &= Q(\Delta+1,q,0) - Q(\Delta,q,0) - \left[Q(\Delta+1,q,1) - Q(\Delta,q,1)\right] \\ &= (1-\lambda)\left[V(\Delta+2,q) - V(\Delta+1,q)\right] \\ &- p(1-\lambda)\left[V(\Delta+2,q-1) - V(\Delta+1,q-1)\right] \\ &\overset{(a)}{\geq} 0, \end{aligned} \tag{A28}$$

where the last inequality (*a*) is also due to (27) in Lemma 3.

Therefore, we have completed the whole proof.

#### **Appendix F. Proof of Lemma A1**

Before dealing with the expected discounted cost *C*<sup>π′</sup>(**x**), we need to find the probability distribution of the first hitting time *T*, which is determined by the transition probabilities of the system states. Under the lazy policy π′, we can formulate a two-dimensional Markov chain to describe the dynamic changes of the system states. The state transition probabilities of the formulated Markov chain are extremely complicated, so we simplify the chain by combining some states, as depicted in Figure A1.

**Figure A1.** A simplified Markov chain of system states under the lazy policy. Note that (1, *B*) is the reference state. (·, 1) denotes the state set {(Δ, *q*) | Δ ∈ Z<sup>+</sup>, *q* = 1}, (−, *B*) denotes the state set {(Δ, *q*) | Δ ∈ Z<sup>+</sup>, Δ > 1, *q* = *B*}, and so on for the rest.

According to the simplified Markov chain, the initial state **x** falls into three cases: (−, *B*), (·, *B* − 1), and (·, *q*) with *q* < *B* − 1. Note that for the special case **x** = **x̂**, *C*<sup>π′</sup>(**x**) is set to 0. First, we focus on the case **x** = (·, *q*) with *q* < *B* − 1. Suppose it takes *T* = *k* time slots for state **x** to transit to state **x̂** for the first time. Then the state **x**′ = (·, *B* − 1) must be passed during these *k* time slots. Therefore, we can divide the entire transition process into two parts: state **x** first visits state **x**′ after *k*<sub>1</sub> time slots, and then starts from state **x**′ and enters state **x̂** for the first time after *k*<sub>2</sub> = *k* − *k*<sub>1</sub> time slots. Denoting by *f*<sup>(*n*)</sup><sub>**x**<sub>1</sub>,**x**<sub>2</sub></sub> the first hitting probability from state **x**<sub>1</sub> to state **x**<sub>2</sub> after *n* time slots, we have

$$f_{\mathbf{x},\hat{\mathbf{x}}}^{(k)} = \sum_{k_1=0}^{k} f_{\mathbf{x},\mathbf{x}'}^{(k_1)} f_{\mathbf{x}',\hat{\mathbf{x}}}^{(k_2)}.\tag{A29}$$

When the initial state first transits to state **x**′, the total number of energy arrivals must be exactly *B* − *q* − 1. Hence, the first hitting probability *f*<sup>(*k*<sub>1</sub>)</sup><sub>**x**,**x**′</sub> from state **x** to state **x**′ can be expressed as follows:

$$\begin{split} f_{\mathbf{x},\mathbf{x}'}^{(k_1)} &= \binom{k_1 - 1}{B - q - 2} \lambda^{B - q - 2} (1 - \lambda)^{k_1 - 1 - (B - q - 2)} \lambda \\ &= \binom{k_1 - 1}{B - q - 2} \Big(\frac{\lambda}{1 - \lambda}\Big)^{B - q - 1} (1 - \lambda)^{k_1} \\ &\overset{(a)}{\leq} (k_1 - 1)^{B - q - 2} \Big(\frac{\lambda}{1 - \lambda}\Big)^{B - q - 1} (1 - \lambda)^{k_1}, \end{split} \tag{A30}$$

where *k*<sub>1</sub> ≥ *B* − *q* − 1. The inequality (*a*) in (A30) is due to the binomial bound $\binom{N}{r} \leq N^r$, ∀*N* ≥ *r*. For any *k*<sub>1</sub> < *B* − *q* − 1, we have *f*<sup>(*k*<sub>1</sub>)</sup><sub>**x**,**x**′</sub> = 0.
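As a quick sanity check, (A30) is a negative binomial distribution over *k*<sub>1</sub> (the slot of the (*B* − *q* − 1)-th energy arrival), so it must sum to one over *k*<sub>1</sub> ≥ *B* − *q* − 1. The following sketch (our own illustration; the values of λ, *B*, and *q* are arbitrary, not from the paper) verifies this numerically:

```python
from math import comb

# Illustrative parameters (not from the paper): energy-arrival probability,
# battery size, and initial battery level.
lam, B, q = 0.4, 5, 1
r = B - q - 1  # energy arrivals needed; the k1-th slot must be an arrival

def f_hit(k1):
    # f^{(k1)}_{x,x'} from (A30): (B - q - 2) arrivals in the first k1 - 1
    # slots, then one more arrival in slot k1.
    if k1 < r:
        return 0.0
    return comb(k1 - 1, r - 1) * lam ** (r - 1) * (1 - lam) ** (k1 - r) * lam

# Truncating at 2000 slots leaves only an astronomically small tail.
total = sum(f_hit(k1) for k1 in range(1, 2000))
print(total)  # ~ 1.0: state x' is eventually reached with probability one
```

The sum being one confirms that, as used in the proof, the first part of the transition happens with probability one whenever λ > 0.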

After entering state **x**′, the system state alternates between states **x**′ = (·, *B* − 1) and (−, *B*) before entering state **x̂** for the first time. By mathematical induction, *f*<sup>(*k*<sub>2</sub>)</sup><sub>**x**′,**x̂**</sub> is given as follows:

$$\begin{split} f_{\mathbf{x}',\hat{\mathbf{x}}}^{(k_2)} &= \begin{bmatrix} 1 - \lambda & \lambda \end{bmatrix} \begin{bmatrix} 1 - \lambda & \lambda \\ 1 - \lambda & \lambda p \end{bmatrix}^{k_2 - 2} \begin{bmatrix} 0 \\ \lambda(1 - p) \end{bmatrix} \\ &= (1 - p)\lambda^2 \frac{\beta_1^{k_2 - 1} - \beta_2^{k_2 - 1}}{\beta_1 - \beta_2} \\ &= (1 - p)\lambda^2 \sum_{i = 0}^{k_2 - 2} \beta_1^i \beta_2^{k_2 - 2 - i} \\ &\overset{(a)}{<} (1 - p)\lambda^2 (k_2 - 1) \beta_1^{k_2 - 2}, \end{split} \tag{A31}$$

where *k*<sub>2</sub> ≥ 2, and *β*<sub>1</sub> and *β*<sub>2</sub> are the eigenvalues of the matrix $\begin{bmatrix} 1-\lambda & \lambda \\ 1-\lambda & \lambda p \end{bmatrix}$, satisfying −1 < *β*<sub>2</sub> < 0 < 1 − *λ* < *β*<sub>1</sub> < 1. The last inequality (*a*) of (A31) is due to *β*<sub>2</sub> < 0 < *β*<sub>1</sub> and |*β*<sub>2</sub>| < |*β*<sub>1</sub>|. For any *k*<sub>2</sub> < 2, we have *f*<sup>(*k*<sub>2</sub>)</sup><sub>**x**′,**x̂**</sub> = 0.
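The stated ordering of the eigenvalues can be checked numerically; the snippet below is a minimal sketch with sample values of λ and *p* (our own choice, not taken from the paper):

```python
import numpy as np

# Illustrative check that the eigenvalues of the 2x2 transition block in
# (A31) satisfy -1 < beta2 < 0 < 1 - lam < beta1 < 1 for sample parameters.
lam, p = 0.5, 0.3
M = np.array([[1 - lam, lam],
              [1 - lam, lam * p]])
beta2, beta1 = np.sort(np.linalg.eigvals(M).real)  # ascending order
print(beta1, beta2)  # roughly 0.855 and -0.205 for these sample values
```

The determinant λ(1 − λ)(*p* − 1) is negative for *p* < 1, which is why one eigenvalue is negative and the other positive, matching the ordering used in the bound (*a*).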

Finally, we verify that the discounted cost from the initial state **x** to the reference state **x̂** is finite:

$$\begin{split} C^{\pi'}(\mathbf{x}) &= \mathbb{E}_{\pi'} \left\{ \sum_{t=0}^{T-1} \gamma^t C(\mathbf{x}[t], a[t]) \,\Big|\, \mathbf{x}[0] = \mathbf{x} \right\} \\ &\overset{(a)}{\leq} \sum_{k=0}^{\infty} f_{\mathbf{x},\hat{\mathbf{x}}}^{(k)} \left[ \sum_{t=0}^{k} (\Delta + t + \omega C_r) \right] \\ &\overset{(b)}{=} \sum_{k=B-q+1}^{\infty} \sum_{k_1=B-q-1}^{k} f_{\mathbf{x},\mathbf{x}'}^{(k_1)} f_{\mathbf{x}',\hat{\mathbf{x}}}^{(k_2)} \left[ \sum_{t=0}^{k} (\Delta + t + \omega C_r) \right] \\ &\overset{(c)}{\leq} (1-p)\lambda^2 \frac{\big(\frac{1-\lambda}{\beta_1}\big)^{B-q-1}}{1-\frac{1-\lambda}{\beta_1}} \sum_{k=2}^{\infty} \beta_1^{k-2} k^{B-q-1} \left[ \sum_{t=0}^{k} (\Delta + t + \omega C_r) \right] \\ &\overset{(d)}{<} \infty, \end{split} \tag{A32}$$

where inequality (*a*) is due to (A11), equality (*b*) is due to (A29), inequality (*c*) is due to (A30) and (A31), and inequality (*d*) is due to 0 < *β*<sup>1</sup> < 1.

For the other two cases, where the initial state is (·, *B* − 1) or (−, *B*), the discounted cost to the reference state can be verified to be finite by similar steps. This completes the proof of Lemma A1.

#### **References**


## *Article* **Age Analysis of Status Updating System with Probabilistic Packet Preemption**

**Jixiang Zhang and Yinfei Xu \***

School of Information Science and Engineering, Southeast University, Nanjing 210096, China; zhangjx@seu.edu.cn **\*** Correspondence: yinfeixu@seu.edu.cn

**Abstract:** The age of information (AoI) metric was proposed to measure the freshness of messages obtained at the terminal node of a status updating system. In this paper, the AoI of a discrete time status updating system with probabilistic packet preemption is investigated by analyzing the steady state of a three-dimensional discrete stochastic process. We assume that the queue used in the system is *Ber*/*Geo*/1/2∗/*η*, which means that the system size is 2 and the packet in the buffer can be preempted by a fresher packet with probability *η*. Instead of considering the system's AoI separately, we use a three-dimensional state vector (*n*, *m*, *l*) to simultaneously track the real-time changes of the AoI, the age of the packet in the server, and the age of the packet waiting in the buffer. We give the explicit expression of the system's average AoI and show that the average AoI of the system without packet preemption is obtained by letting *η* = 0. When *η* is set to 1, the mean of the AoI of the system with a *Ber*/*Geo*/1/2∗ queue is obtained as well. Combining the results we have obtained and comparing them with corresponding average continuous AoIs, we propose a possible relationship between the average discrete AoI with the *Ber*/*Geo*/1/*c* queue and the average continuous AoI with the *M*/*M*/1/*c* queue. For each of the two extreme cases where *η* = 0 and *η* = 1, we also determine the stationary distribution of the AoI using the probability generation function (PGF) method. The relations between the average AoI and the packet preemption probability *η*, as well as the AoI's distribution curves in the two extreme cases, are illustrated by numerical simulations. Notice that probabilistic packet preemption may occur, for example, in an energy harvesting (EH) node of a wireless sensor network, where the packet in the buffer can be replaced only when the node collects enough energy.
In particular, to exhibit the usefulness of our idea and methods and highlight the merits of considering discrete time systems, in this paper, we provide detailed discussions showing how the results about continuous AoI are derived by analyzing the corresponding discrete time system and how the discrete age analysis is generalized to the system with multiple sources. In terms of packet service process, we also propose an idea to analyze the AoI of a system when the service time distribution is arbitrary.

**Keywords:** age of information; discrete time status updating system; probabilistic preemption; probability generation function; stationary distribution

#### **1. Introduction**

The freshness of transmitted messages has attracted increased attention in the design of practical communication systems. Messages obtained by a controller in a real-time monitoring system may be used to perform traffic scheduling or resource allocation, and for such applications, the system's timeliness is crucial for the scheduler to make the right response and for precise control. The age of information (AoI) metric was proposed in [1] as the time elapsed since the generation time of the last received packet at the destination, and it has been used widely in recent years to measure a packet's freshness and characterize the timeliness of various communication networks. A simple introduction to the AoI theory can be found in [2], and in [3], the authors made a detailed summary about the analytical results

**Citation:** Zhang, J.; Xu, Y. Age Analysis of Status Updating System with Probabilistic Packet Preemption. *Entropy* **2022**, *24*, 785. https:// doi.org/10.3390/e24060785

Academic Editors: Anthony Ephremides and Yin Sun

Received: 1 May 2022 Accepted: 1 June 2022 Published: 2 June 2022


**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

of age of information, along with employing the AoI optimization in many cyber-physical applications.

#### *1.1. Related Work*

For a status updating system with simple queue models, such as the *M*/*M*/1, *M*/*D*/1, and *D*/*M*/1 queues, the expression of the average AoI was obtained in [4–7]. In particular, in [7], the authors considered a queue using the Last-Come-First-Served (LCFS) discipline, in which a newer packet from the source could preempt the packet currently in service. The influence of different packet management strategies on a system's average AoI was investigated in [8,9], where only one or two packets can be stored in the system. Specifically, the average AoI of a system with three queues, namely *M*/*M*/1/1, *M*/*M*/1/2, and *M*/*M*/1/2\*, was determined. The difference between the last two queues lies in whether the packet waiting in the buffer can be substituted by subsequent packets from the source. For the two cases with system size equal to 2, it was shown that updating the waiting packet with a fresher one always results in a lower average AoI, which is natural because transmitting the packet with a smaller age helps improve the timeliness of the information transmission system. Apart from these, the benefit of introducing a proper packet deadline, both deterministic and random, to reduce the average age of information was proved in [10–12]. Controlling packet preemption to improve the freshness of transmitted messages was discussed in [13–15]. The authors of [16] showed that the average AoI can be significantly improved by adding a period of waiting time before the service of a new packet begins. Assuming there are two parallel servers in the status updating system, the expressions of the average AoI were determined in [17]. Freshness-based cache updating in a parallel relay network was considered in [18]. Notice that when more than one server is present, the updating packet can reach the destination through different paths.
In these situations, since a packet generated later may be transmitted to the destination via a shorter-delay path, it is possible for this packet to arrive at the receiver before packets generated earlier. Many recent papers have considered the AoI of status updating networks with simple structures, such as the status updating system with multiple sources [19–25], systems with more than one-hop transmission [26–30], and systems in which the packet transmission is assisted by a relay [31–35]. Recently, using the SHS method, the AoI of an arbitrarily connected network, named the gossip network, was discussed in [36,37]. For each of the above systems, the average performance of the AoI was characterized, and some properties of the AoI's distribution were obtained in certain papers. For example, for the age on a line network of preemptive memoryless servers, the author of [38] proved that the age at a node is identical in distribution to the sum of independent exponential service times by calculating the Moment Generating Function (MGF) of the defined age vector. In [39,40], the distribution of the AoI was studied in a wireless networked control system with two-hop packet transmission. The authors studied the problem of minimizing the tail of the AoI distribution with respect to the sampling rate under a First-Come First-Served (FCFS) queuing discipline. In [41], for phase-type (PH-type) interarrival times or packet service times, the authors numerically obtained the exact distribution of the (peak) age of information for the system with *PH*/*PH*/1/1 and *M*/*PH*/1/2 queues, using sample-path arguments and the theory of Markov Fluid Queues (MFQ).
Beyond the works mentioned above, which focus on obtaining analytical results of the AoI for status updating systems with various queue models, even more papers have been published in which the authors design optimal systems under different timeliness requirements, such as [42–51]. In such problems, the age of information is usually used as the freshness metric and studied as the optimization objective.

#### *1.2. Discussion of Existing Methods*

As far as we know, at least three methods have been proposed to analyze the AoI of a continuous time status updating system. The first is the method based on the graph of the AoI stochastic process, which was given in [2]. The time average AoI is obtained by calculating the area below the sample path of the AoI process. Under the common assumption that the age process is ergodic, this time average AoI converges to the AoI's mean as the observation time tends to infinity. This shows that the average AoI of a status updating system is determined by

$$\mathbb{E}[\Delta] = \frac{\mathbb{E}[YT] + \mathbb{E}\left[Y^2\right]/2}{\mathbb{E}[Y]} \tag{1}$$

when the packet arrival process and the distribution of the service time are specified, where the notation *Y* denotes the interarrival time between two successive updating packets and *T* represents the packet system time. Secondly, in [6], the authors illustrated the usage of the Stochastic Hybrid System (SHS) approach for the analysis of a system's stationary AoI. They employed a continuous state vector to track the real-time age of the updating packets from the source and described all the possible state vector transfers under the system's random dynamics, for example, whether a new packet arrives or the packet service is completed. Then, the steady state of the multiple-dimensional continuous time Markov process was characterized by a group of differential equations, and the first few moments of the AoI could be obtained using the theory of SHS [52]. This method was later used to determine the average AoI of more general systems, including systems with multiple sources, packet preemption, and even stochastic energy harvesting at certain system nodes. The last method was introduced in [5], where the authors proposed a novel description of the AoI process and characterized its sample paths using a new point process. They proved that the stationary distribution of the AoI can be represented in terms of the distributions of the system's delay and the peak AoI. From this point of view, a large number of analytical formulas for the AoI's stationary distribution were obtained (in the form of its Laplace–Stieltjes Transform (LST)) for single-server systems. In addition, we found that the same method has been used to consider the distribution of discrete time (peak) AoI in [53,54], where the *z*-transform of the (peak) AoI's distribution was derived for systems with some discrete queues.
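As a concrete instance of the graphical method behind (1), the sketch below simulates an M/M/1 FCFS updating system, integrates the area under the piecewise-linear age sample path, and compares the result with the well-known closed form E[Δ] = (1/μ)(1 + 1/ρ + ρ²/(1 − ρ)) obtained in [4–7]. The simulation is our own illustration; the parameter values and variable names are arbitrary choices.

```python
import random

# Monte Carlo check of the graphical AoI formula for an M/M/1 FCFS queue:
# the time-average age is the area under the sawtooth sample path divided by
# the observation time.  Parameters below are illustrative only.
random.seed(1)
lam, mu = 0.5, 1.0           # arrival and service rates
n = 200_000                  # number of delivered packets

A = D = 0.0                  # arrival / departure time of the current packet
area, prev_A, prev_D = 0.0, 0.0, 0.0
for _ in range(n):
    A += random.expovariate(lam)              # next arrival instant
    D = max(D, A) + random.expovariate(mu)    # FCFS departure recursion
    # Between deliveries, age grows linearly from prev_D - prev_A; integrate
    # t - prev_A over [prev_D, D]:
    area += ((D - prev_A) ** 2 - (prev_D - prev_A) ** 2) / 2
    prev_A, prev_D = A, D

avg_aoi = area / D
rho = lam / mu
exact = (1 / mu) * (1 + 1 / rho + rho ** 2 / (1 - rho))
print(avg_aoi, exact)  # the simulated value should be close to 3.5 here
```

The match between the integrated sample-path average and the closed form illustrates the ergodicity argument underlying (1).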

Although plenty of results have been obtained using the methods mentioned above, interested readers may find that most of the results depend heavily on the assumptions that the packet arrivals form a Poisson process and that the service time distribution is exponential, especially for the SHS method. The memoryless property of both the interarrival time distribution and the distribution of the packet service time dramatically simplifies the age analysis of the considered status updating system. So far, the first method, based on the graphical argument of the AoI process, has been used only to calculate the AoI's mean, but it seems that the theory of Level Crossing in [55] may be useful when considering the AoI's distribution from the sample paths themselves. The level crossing method has been used to derive the steady-state probability density function of the queue waiting time in several variants of the *M*/*G*/1 queue. It is worth trying to determine whether this theory can be used to find the stationary distribution of continuous AoI. Similarly, using the SHS method, only the first few moments of the AoI can be calculated. In order to obtain the distributional properties of a system's AoI, one has to solve the system of differential equations, which is extremely hard in general. Finally, in [5], the authors pointed out that the general formula they proposed holds sample-path-wise, regardless of the service discipline or the distributions of the interarrival and packet service times; however, the results they obtained are not straightforward, as they only derived the LST of the AoI's stationary distribution, while computing the explicit expression of this distribution is also a hard problem due to the difficulty of inverting the LST. On the other hand, it is unknown whether the method and the obtained formula can be generalized to more general status updating systems beyond the system with a single server.

In the following part, we introduce the idea and methods to analyze the AoI of discrete time status updating systems and discuss their merits compared with the approaches dealing with continuous time age of information. Through an explicit example, we show how the results of continuous AoI can be obtained by considering the corresponding discrete time systems.

#### *1.3. Analysis of Discrete Time AoI: Idea and Methods*

We propose the idea and methods to characterize the steady state AoI of a discrete time status updating system, in which the packet arrivals, the packet service, and the AoI declines are considered in discrete time slots. Although they are not many, there are still some works analyzing the AoI of a discrete system with different queue models. To the best of our knowledge, the analysis of discrete AoI was proposed for the first time in [56]. Using the proof techniques and tools developed to analyze continuous AoI, the authors obtained the average (peak) AoI of discrete time status updating systems modeled by *Ber*/*G*/1 and *G*/*G*/∞ queues. The notation "Ber" indicates that the packet arrivals form a Bernoulli stochastic process; that is, in each time slot, a packet arrives (or the packet service is completed) independently and with an identical probability. Later, using a description of the age process's sample path similar to that in [5], the expression of the discrete AoI's distribution was obtained in [53,54] for the system with a First-Come First-Served (FCFS) queue, the preemptive Last-Come First-Served (LCFS) queue, and the bufferless status updating system. Discrete time systems with multiple sources were considered in [57]. Under the assumption of Bernoulli packet arrivals and a common general discrete phase-type service time distribution across all the sources, the authors obtained the exact per-source distributions of AoI and peak AoI in matrix-geometric form for three different queueing disciplines, i.e., nonpreemptive bufferless, preemptive bufferless, and nonpreemptive single buffer with replacement.

In our work [58], we obtained the explicit formula of the average discrete AoI, Δ*Ber*/*Geo*/1/1, for a bufferless status updating system (in fact, the service time distribution in [58] is arbitrary) by defining a two-dimensional age process that characterizes the AoI at the destination and the age of the packet in service as a whole. The idea we proposed in [58] can be regarded as a discretization of the SHS method, which proves to be equally powerful and more flexible when applied to more general systems. We describe all the possible state transfers for every initial state vector and then establish the stationary equations of the defined two-dimensional discrete age process. These equations are solved completely in [58]; thus, the distribution of the AoI can be determined explicitly as one of the marginal distributions of the two-dimensional age process's stationary distribution. Given the AoI's distribution, the mean, the variance, and the tail probabilities of the AoI can be easily calculated. The idea of constituting multiple-dimensional age processes is then used in [59] to obtain the mean and the distribution of the AoI of the infinite size status updating system. In [60], the distributions of the AoI of a system with *Ber*/*Geo*/1/1, *Ber*/*Geo*/1/2, and *Ber*/*Geo*/1/2\* queues are derived explicitly using the method of solving equations. In this paper, the AoIs of a system with *Ber*/*Geo*/1/2 and *Ber*/*Geo*/1/2\* queues are considered simultaneously, connected together by the probabilistic packet preemption in the system's buffer. In addition, in order to avoid the tedious calculation required to solve the stationary equations and compute the marginal distribution, we define the Probability Generation Function (PGF) of the multiple-dimensional stationary distribution, from which both the AoI's mean and its stationary distribution can be obtained effectively.
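The probabilistic preemption model studied in this paper can also be explored by direct simulation. The sketch below is our own illustration (the parameter names, values, and within-slot event ordering are assumptions, not taken from the paper); it estimates the empirical average AoI of a discrete time *Ber*/*Geo*/1/2\*/*η* system for *η* = 0 and *η* = 1:

```python
import random

def sim_avg_aoi(eta, p_arr=0.4, gamma=0.5, n_slots=500_000, seed=7):
    """Empirical average AoI of a Ber/Geo/1/2*/eta status updating system.

    Assumed slot ordering: a packet arrives w.p. p_arr at the start of the
    slot (replacing the buffered packet w.p. eta if the buffer is full); the
    packet in service completes w.p. gamma at the end of the slot.
    """
    rng = random.Random(seed)
    server = buffer = None   # generation times of the queued packets
    last = 0                 # generation time of the freshest delivered packet
    total = 0
    for t in range(1, n_slots + 1):
        if rng.random() < p_arr:             # arrival at slot start
            if server is None:
                server = t
            elif buffer is None:
                buffer = t
            elif rng.random() < eta:         # probabilistic preemption
                buffer = t
        if server is not None and rng.random() < gamma:  # service completes
            last = max(last, server)
            server, buffer = buffer, None
        total += t - last                    # AoI at the end of the slot
    return total / n_slots

a0, a1 = sim_avg_aoi(0.0), sim_avg_aoi(1.0)
print(a0, a1)
```

With these settings, the *η* = 1 run (always replace the buffered packet with a fresher one) yields a smaller empirical average AoI than the *η* = 0 run, consistent with the comparison between the *Ber*/*Geo*/1/2 and *Ber*/*Geo*/1/2\* queues.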
For the system's average AoI, in Table 1, we list the results we have obtained about the discrete AoI and the corresponding expressions of the continuous system's average AoI. The average AoI Δ*Ber*/*Geo*/1/1 was obtained in [58], and the other two expressions will be derived in the current paper. Apart from the AoI's mean, we also determine the distribution of the discrete AoI Δ*Ber*/*Geo*/1/2 and Δ*Ber*/*Geo*/1/2<sup>∗</sup> by writing the PGF as the power series.

As mentioned above, one can see the similarity between our idea and the SHS method, and one may mistakenly think that we simply change continuous time into discrete time slots. The power of combining multiple-dimensional state vector descriptions with the PGF method may be underestimated due to the simple assumptions used in the current paper, namely that the packet arrivals form a Bernoulli process and the packet service time is geometrically distributed. It is known that, in order to obtain the complete statistical information and not just the mean of the stationary AoI by the SHS method, one has to solve a group of differential equations, which may be possible for some systems with simple queues but is generally impossible. In addition, the usage of SHS analysis is heavily restricted because it requires that both the packet arrival process and the packet service process be memoryless, i.e., the interarrival time and the packet service time have to be i.i.d. exponential random variables. In the following, we explain the merits of considering a discrete time system in two aspects.

**Table 1.** Some formulas of the average continuous and average discrete age of information.


#### (1) Calculation: reducing the complexity

When all the state transitions are described in discrete time slots, the stationary equations characterizing the steady state of the defined age process become a set of linear equations, which is more likely to be solved than the corresponding differential equations. We show in this paper that these linear equations can be handled even more easily and effectively using the PGF method. In another work of ours, we determined the explicit expression of the average AoI and the corresponding AoI distribution assuming the *Ber*/*Geo*/1/*c* queue is used in the status updating system, where the system's size *c* can be arbitrary. For the cases *c* = 3 and 4, we obtain

$$\overline{\Delta}_{\text{Ber}/\text{Geo}/1/3} = \frac{1}{\gamma} \left( (1 - \gamma) + \frac{1}{\rho_d} + \frac{\rho_d^2 (1 - \gamma^2) + 3\rho_d^3 (1 - 5\gamma/3 + \gamma^2/3)}{1 + \rho_d (1 - 3\gamma) + \rho_d^2 (1 - 3\gamma + 3\gamma^2) + \rho_d^3 (1 - \gamma)^3} \right) \tag{2}$$

and

$$\begin{split} \overline{\Delta}_{\text{Ber}/\text{Geo}/1/4} &= \frac{1}{\gamma} \Biggl( (1-\gamma) + \frac{1}{\rho_d} \\ &+ \frac{\rho_d^2(1-\gamma) + 2\rho_d^3(1-\gamma)(1-2\gamma) + 4\rho_d^4(1-\gamma)(1-11\gamma/4 + 9\gamma^2/4 - \gamma^3/4)}{1 + \rho_d(1-4\gamma) + \rho_d^2(1-4\gamma+6\gamma^2) + \rho_d^3(1-4\gamma+6\gamma^2-4\gamma^3) + \rho_d^4(1-\gamma)^4} \Biggr) \tag{3} \end{split}$$

Although we have not mentioned this yet, the readers should find that those expressions of average continuous and average discrete AoI given in Table 1 are quite similar. We propose the following possible relationship:

$$\mu \cdot \overline{\Delta}_{M/M/1/c} = \gamma \cdot \overline{\Delta}_{\text{Ber}/\text{Geo}/1/c}\big|_{\gamma=0}, \ \text{then replacing } \rho_d \text{ with } \rho \tag{4}$$

The relation (4) holds at least for *c* = 1, *c* = 2, and *c* = 2∗. Note that the relation (4) is given only by observation, and it is not easy to prove that (4) applies in general, because the average continuous AoI Δ*M*/*M*/1/*c* is not yet known for arbitrary *c*. If, as we hope, Equation (4) does hold in general, then from expressions (2) and (3) we immediately have

$$\overline{\Delta}_{M/M/1/3} = \frac{1}{\mu} \left( 1 + \frac{1}{\rho} + \frac{\rho^2 + 3\rho^3}{1 + \rho + \rho^2 + \rho^3} \right) \tag{5}$$

and

$$\overline{\Delta}_{M/M/1/4} = \frac{1}{\mu} \left( 1 + \frac{1}{\rho} + \frac{\rho^2 + 2\rho^3 + 4\rho^4}{1 + \rho + \rho^2 + \rho^3 + \rho^4} \right) \tag{6}$$
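The γ → 0 reduction in (4) can be checked numerically for *c* = 3 (the case *c* = 4 is analogous): evaluating γ · Δ*Ber*/*Geo*/1/3 from (2) at γ = 0 and renaming ρ*d* as ρ should reproduce μ · Δ*M*/*M*/1/3 from (5). The sketch below is our own check; the function names are made up for illustration.

```python
def gamma_times_aoi_d3(gamma, rho):
    # gamma * average discrete AoI from (2), with rho_d written as rho
    num = rho**2 * (1 - gamma**2) + 3 * rho**3 * (1 - 5*gamma/3 + gamma**2/3)
    den = (1 + rho * (1 - 3*gamma) + rho**2 * (1 - 3*gamma + 3*gamma**2)
           + rho**3 * (1 - gamma)**3)
    return (1 - gamma) + 1 / rho + num / den

def mu_times_aoi_c3(rho):
    # mu * average continuous AoI from (5)
    return 1 + 1 / rho + (rho**2 + 3 * rho**3) / (1 + rho + rho**2 + rho**3)

for rho in (0.2, 0.5, 1.0, 2.0):
    print(rho, gamma_times_aoi_d3(0.0, rho), mu_times_aoi_c3(rho))
```

For every tested ρ the two values coincide, which is exactly the substitution step that turns (2) into (5) under the conjectured relation (4).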

Notice that the average continuous AoIs (5) and (6) are not derived using any of the three methods we discussed earlier, that is, the method based on the sample path of the AoI process, the SHS method, and the method proposed in [5,54]. On the contrary, we first characterize the stationary AoI of the corresponding discrete time system and then obtain the expression of the continuous AoI's mean through relationship (4). There is no doubt that the formulas of Δ*M*/*M*/1/3 and Δ*M*/*M*/1/4 can be obtained using the AoI's SHS analysis; however, the general formula of the AoI's mean, i.e., Δ*M*/*M*/1/*c* for arbitrary size *c*, is not yet known. Furthermore, the stationary distribution of the discrete AoI can also be determined explicitly from the PGF defined for the considered system, while the distributional properties of the continuous AoI cannot be revealed easily through either the graphical method or the AoI's SHS analysis. Although it is not possible to exactly reproduce the continuous AoI's distribution at every point using the discrete approximation, the difference between them can be made reasonably small when the length of the time slot is short enough. In the current paper, we determine the distribution expressions of the discrete AoI for the system with *Ber*/*Geo*/1/2 and *Ber*/*Geo*/1/2∗ queues. Unlike in [5,54], these expressions are straightforward and not expressed in the form of other transforms.

According to the above discussions, from the perspective of deriving the average AoI or obtaining the AoI's distribution, considering the status updating system in the discrete time model is of great significance. To a certain extent, we can even conclude that our method is stronger, since more specific results about the AoI have been obtained.

#### (2) Generalization: In terms of system structure and service time distribution

Recently, using the SHS method, the age analysis has been generalized to status updating networks with simple structures, especially systems with multiple sources. In this part, we briefly explain how the discrete age of information is characterized in the multiple-source bufferless system and in the two-source system equipped with a size 1 buffer. The system models are depicted in Figure 1.

**Figure 1.** (**a**) Status updating system with multiple sources and bufferless server. (**b**) Status updating system with two sources and a size 1 buffer.

Specifically, we assume that packets arrive at the beginning of a time slot, while whether the packet service is completed is determined at the end of the time slot. Since the system's random dynamics are considered in time slots, it is possible that more than one packet arrives at the server (buffer) from different sources in a time slot. If the system does not have a buffer, the server has to choose one of these packets and discard the others. This packet collision problem can be solved by assigning priorities to the packets from different sources; then, the packet with the highest priority is selected and put into the server.

In the bufferless system shown in the first picture of Figure 1, let *ri* be the priority of source *si*, 1 ≤ *i* ≤ *N*, and assume *r*1 > *r*2 > ··· > *rN*, that is, the priority of source *si* is higher than that of *sj* if *i* < *j*. In each time slot, source *si* generates a new packet with probability *pi*, and the packet generation process is independent of all other sources. This situation is exactly the generalization of our work in [58] to a status updating system with multiple independent sources. For a given *i* ∈ [1, *N*], the AoI process corresponding to source *si* can be analyzed separately; similarly to [58], a two-dimensional state vector (*ni*, *mi*) is sufficient to track the real-time changes of *AoIi* and the age of the packet in the server from source *si*. In this system, we observe that it does not matter whether the service of the packet from *si* can be preempted by other packets with higher priorities. The state vector transfers from every (*ni*, *mi*) can be described as in [58], but the transition probabilities need to be modified. For example, for *ni* > *mi* ≥ 1, we have

$$\textit{State vector at next time slot} = \begin{cases} (n_i + 1, m_i + 1) & \text{the packet service is not completed,} \\ (m_i + 1, 0) & \text{the service of the packet is over} \end{cases} \tag{7}$$

if the service process cannot be preempted. In contrast, we have

$$\textit{State vector at next time slot} = \begin{cases} (n_i + 1, m_i + 1) & \text{no packets of higher priority arrive, the service is not over,} \\ (n_i + 1, 0) & \text{a packet with higher priority arrives,} \\ (m_i + 1, 0) & \text{no packets of higher priority arrive, the service is over} \end{cases} \tag{8}$$

when packet service preemption is allowed. After all the state transfers are described and their transition probabilities are determined, we can obtain the stationary equations, which can be solved completely as in [58] or by using the PGF method as in this paper. As in [58], the service time distribution in this case can be arbitrary.
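For concreteness, the preemptive transition rule (8) can be encoded directly as a function of the slot's random events; the sketch below is our own illustration (the function name and argument ordering are assumptions):

```python
# A direct encoding of the preemptive state-vector transition (8) for a
# source s_i in the bufferless multi-source system.
def next_state(n_i, m_i, higher_prio_arrival, service_over):
    """One-slot transition of (n_i, m_i) when preemption is allowed.

    A higher-priority arrival at the start of the slot takes the server
    before the current service can complete, so it is checked first.
    """
    if higher_prio_arrival:          # server packet preempted: m_i resets
        return (n_i + 1, 0)
    if service_over:                 # served packet delivered: AoI resets
        return (m_i + 1, 0)
    return (n_i + 1, m_i + 1)        # both ages grow by one slot

print(next_state(5, 2, False, False))  # (6, 3)
print(next_state(5, 2, True, False))   # (6, 0)
print(next_state(5, 2, False, True))   # (3, 0)
```

Enumerating such transitions for every initial state, together with their probabilities, is exactly what is needed to write down the stationary equations mentioned above.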

Although there are multiple sources, the age analysis of each source is easy when the status updating system has no buffer. Notice that in this case, no queue is formed before the server; thus, the packets from different sources never mix. As a result, the packets from every source are completely separated, and the AoI of each source can be analyzed independently. The situation is much more difficult if the system has a buffer. As an example, we consider the AoI of each source of a two-source system, which is depicted in the second picture of Figure 1.

We can define a six-dimensional state vector (*n*1, *n*2, *m*1, *m*2, *l*1, *l*2) to describe the AoI of the two sources simultaneously, where the state components represent the values of the two AoIs at the destination, the ages of the packets in the server, and the ages of the packets in the system's buffer. At every position in the system, apart from the "age", it is necessary to indicate which source a packet comes from; therefore, a three-dimensional state vector (*n*, *m*, *l*) that does not include this information is not sufficient. Notice that at any time, at most one of *m*1 and *m*2 is non-zero, and the same holds for *l*1 and *l*2. When there is a buffer in front of the server, a queue forms whenever a packet arrives and finds the server busy. Each of the two packets in the system (one in the server and the other in the buffer) may come from source *s*1 or *s*2, and these two packets may belong to different sources. Although the problem becomes complex, in principle all the state transitions from every initial six-dimensional state vector can be determined explicitly, since the randomness that causes the state transitions is limited to the random packet arrivals, the packet service, and the additional packet preemption. Then, according to the balance of probabilities in the steady state, the stationary equations are established; this solves the first half of the AoI analysis. Details of the latter half, that is, deriving the average AoI from the group of stationary equations, can be found in the procedures of this article.

We note that in [61], the authors obtained the average continuous AoI for the same two-source status updating system in Figure 1b using the SHS method. They added the further assumption that the packet in the server and the packet in the buffer must belong to different sources in the second and third situations they considered, and named these policies "source-aware packet management". As we mentioned above, although the packets from the two sources are still interleaved, adding this restriction greatly reduces the complexity of the problem.

In fact, the state vector defined for a discrete time system has a very clear physical meaning. For the status updating system with the FCFS queue, the first component denotes the AoI, and the other components represent the ages of the packets in the server and in the buffer of the system. Thus, a (*c* + 1)-dimensional state vector is needed if the size of the system equals *c*. Compared with the discrete AoI analysis, defining the state vector in the SHS method is sometimes easier, for example in systems with multiple sources. In [61], in order to characterize the AoI of one source in a two-source system, the authors used a four-dimensional state vector [*x*0(*t*), *x*1(*t*), *x*2(*t*), *x*3(*t*)] that describes the evolution of the AoI when different random events occur. As mentioned before, we use the six-dimensional state vector (*n*1, *n*2, *m*1, *m*2, *l*1, *l*2) to describe the random changes of both source 1 and source 2. The parameter *n*1 or *n*2 can be dropped if only one of the two sources is analyzed. In our proposed method, we show the correspondence between the dimension of the state vector and the size of the discrete system; this may not be the unique way to define the discrete state vector. Although analyzing the AoI of the discrete time system has higher computational complexity, the biggest advantage of discrete AoI analysis is that it yields the stationary distribution of the AoI.

Beyond the simple status updating networks given in Figure 1, we have also obtained the average discrete AoI for a status updating system with two-stage service, where for simplicity, no buffer is equipped in front of either server. For the system with two parallel servers, the age analysis is more difficult, since a packet may become "ineffective" if another packet generated later arrives at the destination earlier. Some policy is needed to handle these packets, for instance, deleting a packet directly once it becomes ineffective. If nothing is done, then when an ineffective packet is received at the receiver, the value of the AoI is not reduced.

Another direction of generalization concerns the distribution of the packet service time (while the packet arrival process is still Bernoulli). Taking the size 2 status updating system as an example, we explain how the service time distribution can be relaxed to an arbitrary distribution. Using a three-dimensional state vector (*n*, *m*, *l*), we can fully describe the random dynamics, including the AoI at the receiver and the ages of the two packets in the system, provided that both the packet interarrival time and the service time are memoryless. In each time slot, the changes of the AoI and of the packet ages depend on the random packet arrival, which is memoryless and independent, and on whether the packet service is completed. When the service time distribution is arbitrary, the probability that the service is completed in one time slot depends on the time the packet has already spent in the server. Let *S* be the random variable of the service time, and represent its general distribution as

$$\Pr\{S=i\} = q\_i \qquad (i \ge 1) \tag{9}$$

Suppose that, before the current time slot, the packet has stayed in the server for *j* time slots; then, the probabilities that determine the state vector transitions are the following two conditional probabilities:

$$\Pr\{S = j + 1 | S > j\} \text{ and } \Pr\{S > j + 1 | S > j\} \tag{10}$$

Therefore, if we know this elapsed service time *j*, then as before, all the state transitions of the state vector (*n*, *m*, *l*) can be completely described and the age analysis becomes feasible. Since none of the three parameters *n*, *m*, and *l* provides this information, it is natural to introduce an extra component, say *k*, to record the service time the packet has already consumed, and constitute the four-dimensional state vector (*n*, *m*, *l*, *k*). In this way, the possible state transitions of this four-dimensional state vector can be described and the transition probabilities can be determined. For example, let the initial state vector be (*n*, *m*, *l*, *k*); we have the state transitions and transition probabilities as

$$\text{Next state vector} = \begin{cases} (n+1, m+1, l+1, k+1) & \text{the service is not over with prob. } 1 - q\_{k+1} / \sum\_{i=k+1}^{\infty} q\_i \\ (m+1, l+1, 0, 0) & \text{the service completes with prob. } q\_{k+1} / \sum\_{i=k+1}^{\infty} q\_i \end{cases} \tag{11}$$

where we assume the queue discipline is FCFS and there is no packet preemption. The four parameters *n*, *m*, *l*, and *k* satisfy the relationships *n* > *m* > *l* ≥ 0 and *n* > *m* ≥ *k* ≥ 0. The first holds because *n*, *m*, and *l* are the ages of three packets generated in chronological order, and *n* > *m* ≥ *k* is satisfied since the packet's system time *m* must be larger than or equal to its elapsed service time, which is denoted by *k*. These relations determine which vectors are valid state vectors. Although the state transitions can still be analyzed and the group of stationary equations can be determined by balancing the stationary probabilities, it can be expected that solving these equations is not easy. Since the service time probabilities *qi* are arbitrary, the expression of the average AoI, as we will see in later work, is not in closed form. It is also important to note that the PGF method cannot be used when the service time distribution is not geometric, because the transition probabilities are no longer the same for different state vectors and thus cannot be factored out as a common factor.
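The two conditional probabilities in (10) are easy to compute from an arbitrary pmf {*qi*}; for geometric service times they collapse to the constant γ, which is precisely the memorylessness that makes the extra component *k* unnecessary. A minimal sketch (the function name is ours):

```python
def completion_prob(q, j):
    """Pr{S = j+1 | S > j}: the chance that a packet already served for
    j slots finishes in the next slot, where q[i] = Pr{S = i} for i >= 1
    (index 0 of the list is unused)."""
    tail = sum(q[i] for i in range(j + 1, len(q)))   # Pr{S > j}
    return q[j + 1] / tail

# geometric service times with parameter gamma (truncated for illustration)
gamma = 0.3
q = [0.0] + [gamma * (1 - gamma) ** (i - 1) for i in range(1, 200)]

# constant hazard rate: the elapsed service time j is irrelevant
for j in range(10):
    assert abs(completion_prob(q, j) - gamma) < 1e-9
```

For a non-geometric {*qi*}, the value of `completion_prob(q, j)` genuinely depends on *j*, which is exactly why the fourth component *k* in (*n*, *m*, *l*, *k*) is needed to make the transition probabilities in (11) well defined.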

Summarizing the above discussion, we have shown that, starting from the original memoryless status updating system, by introducing an extra component recording the time the packet has spent in the server, the age analysis becomes feasible when the packet service time is arbitrarily distributed. Although it may be difficult to obtain an explicit expression for the system's average AoI, the idea is still applicable when we generalize the size 2 system to a status updating system of arbitrary size *c*. In one of our works, we have shown that for a size *c* discrete time status updating system with Bernoulli packet arrivals and geometrically distributed service times, a (*c* + 1)-dimensional state vector (*n*, *m*1, ··· , *mc*) should be defined in order to fully characterize the real-time transitions of the system's AoI and all the packet ages. By adding an extra state component *k* that records the service time the packet has experienced in the server, according to the previous discussion, the age analysis can be generalized to a size *c* status updating system whose service time distribution is arbitrary (at least, all the stationary equations can be established).

We attribute the above idea to [62], in which the authors considered the timely transmission of updates over an erasure channel. They assume that each update consists of *k* symbols and that the symbol erasures in each time slot form an i.i.d. Bernoulli process. The aim of [62] is to design an optimal online transmission scheme that minimizes the time average AoI, and the problem is formulated as a Markov Decision Process (MDP). Although the optimization of AoI is not our interest, the state tuple (*δt*, *dt*, *lt*) defined in Section 2.A of [62], based on which the transmission policy at the next time slot is determined, is very enlightening. At the *t*-th time slot, *δt* denotes the value of the AoI, *dt* is the age of the next update, i.e., the packet at the head of the queue, and *lt* records the number of symbols of the currently transmitted update that have been received successfully up to this time slot. A similar timely source coding problem was discussed in [63], in which the authors also pointed out that the length of the encoded update is equivalent to the service time of the update, so that the considered system behaves as a discrete time *Geo*/*G*/1 queue (we use the notation *Ber*/*G*/1). Therefore, the role of *lt* in [62] can be regarded (or redefined) as the service time that the current update has consumed. With this knowledge, the source distribution in those papers and the service time distribution in the discrete time status updating system studied in this part can be arbitrary.

In the previous paragraphs, we explained the ideas and methods used to study the AoI of discrete time status updating systems. We have shown how the discrete AoI is characterized for the basic system, the system with multiple sources, and the system whose service time distribution is arbitrary. As part of AoI theory, we believe that discrete AoI deserves more attention, and it is meaningful to establish analytical results, including the AoI's mean and its distribution, for more general systems. In particular, the possible relationship proposed in (4) shows that studying discrete AoI not only has independent theoretical significance but can also help determine certain results about continuous AoI. If a problem is difficult in the continuous time model, one option is to consider it in the discrete time setting.

#### *1.4. The Work in the Current Paper*

We discussed numerous topics of discrete AoI in the previous subsection, and it would be inappropriate to address all of them in one article. In this paper, we focus on the stationary AoI of a discrete time system with the *Ber*/*Geo*/1/2 and *Ber*/*Geo*/1/2∗ queues and treat both within a single model. We assume the packet in the buffer can be probabilistically preempted by fresher packets from the source and denote the queue model in this scenario by *Ber*/*Geo*/1/2∗/*η*, where *η* is the preemption probability. In the AoI literature, probabilistic packet preemption (replacement) has been studied in [64]. In [65], probabilistic preemption was considered in a scenario where a CPU is used frequently to handle unpredictable tasks. For the case *η* = 0, the queue model of the system reduces to *Ber*/*Geo*/1/2, while for *η* = 1, the status updating system with the *Ber*/*Geo*/1/2∗ queue is obtained. For the general case, we derive the explicit expression of the system's average AoI. By writing the defined PGF as a power series, for the two extreme cases *η* = 0 and *η* = 1, the distribution expressions of the two discrete AoIs are determined as well.

The rest of the paper is organized as follows. In Section 2, we describe the model of a discrete time status updating system with probabilistic packet preemption. The stationary distribution and the mean of the system's AoI are also defined. By analyzing the steady state of a three-dimensional stochastic age process, in Section 3 we obtain the explicit formula of the average AoI under a general preemption probability using the probability generation function (PGF) method. In Section 4, setting *η* = 0 and *η* = 1, we determine the average AoIs Δ*Ber*/*Geo*/1/2 and Δ*Ber*/*Geo*/1/2<sup>∗</sup> from the general expression derived in Section 3. Furthermore, in order to obtain the stationary distributions of the two discrete AoIs, we write the PGF as a power series; the coefficient of *x<sup>n</sup>* then gives the probability that the AoI takes the value *n* for each *n* ≥ 1. Numerical results are given in Section 5. For the general case, we illustrate the relationships between the average AoI and *η* and the traffic intensity *ρd*, respectively. In addition, the means and the cumulative probabilities of three discrete AoIs, Δ*Ber*/*Geo*/1/1, Δ*Ber*/*Geo*/1/2, and Δ*Ber*/*Geo*/1/2<sup>∗</sup>, are depicted. These average discrete AoIs are also numerically compared with their corresponding average continuous AoIs in Section 5. Finally, we conclude the paper in Section 6.

#### **2. System Model and Problem Formulation**

We depict the model of the status updating system using the *Ber*/*Geo*/1/2∗/*η* queue in Figure 2, in which the packet in the system's buffer can be preempted by a fresher packet from the source *s* with probability *η*. The packet arrivals at the transmitter are assumed to form a Bernoulli process; that is, in each time slot, a new packet arrives with an identical probability, which we denote by *p*. The packet service time follows a geometric distribution with parameter *γ*. An update packet generated at *s* is transmitted to the destination *d* through the transmitter, which consumes a random period of time. The age of information (AoI) at *d* is defined as the time elapsed since the generation time of the last received packet. While no packet is received, the value of the AoI increases by 1 at the end of each time slot. Every time a packet passes through the transmitter and arrives at *d*, the AoI is reduced to the system time of the received packet, which equals the instantaneous age of this packet.

**Figure 2.** The model of the discrete time status updating system with probabilistic packet preemption in the system's buffer.

Let *a*(*k*) be the value of AoI in the *k*th time slot. The AoI at the next time slot, *a*(*k* + 1), is determined by

$$a(k+1) = \begin{cases} a(k) + 1 & \text{if no packet is obtained,} \\ a(k) + 1 - Y\_j & \text{when the } j\text{th arriving packet is delivered to } d \end{cases} \tag{12}$$

where *Yj* is the interarrival time between the (*j* − 1)th and *j*th arriving packets.

Notice that the (*j* − 1)th and *j*th packets may not be generated consecutively, since between them some update packets may be discarded upon arrival if they find the system full. This is exactly the difference between the finite and infinite status updating systems. Based on this observation, in [59] we determined the average AoI and its stationary distribution for an infinite size status updating system with Bernoulli packet arrivals and geometric service times.

Denote the stationary AoI for the system with probabilistic packet preemption as Δ*Ber*/*Geo*/1/2∗/*η*. We define the time average AoI as follows, which is equal to the mean of the AoI because the age process is assumed to be ergodic. We have

$$\overline{\Delta}\_{\text{Ber}/\text{Geo}/1/2^\*/\eta} = \lim\_{T \to \infty} \frac{1}{T} \sum\_{k=1}^T a(k) \tag{13}$$

$$= \lim\_{T \to \infty} \frac{1}{T} \sum\_{i=1}^{M\_T} i \cdot |\{1 \le k \le T : a(k) = i\}| \tag{14}$$

$$=\sum\_{i=1}^{\infty} i \cdot \pi\_i \tag{15}$$

where |{1 ≤ *k* ≤ *T* : *a*(*k*) = *i*}| is the number of time slots in which the AoI takes the value *i*, and *M<sub>T</sub>* = max<sub>1≤*k*≤*T*</sub> *a*(*k*) is the maximal discrete AoI over *T* time slots. For each *i* ≥ 1,

$$\pi\_i = \lim\_{T \to \infty} \frac{|\{1 \le k \le T : a(k) = i\}|}{T} \tag{16}$$

is the probability that the stationary AoI takes value *i*. In fact, the probability distribution {*πi*, *i* ≥ 1} forms the stationary distribution of the AoI Δ*Ber*/*Geo*/1/2∗/*η*.
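The time average (13) and the empirical probabilities (16) can be estimated by simulating the slot dynamics directly. The sketch below reflects our reading of the model (arrival and possible buffer preemption first, then at most one service completion per slot); it is an illustration, not the authors' code:

```python
import random
from collections import Counter

def simulate_aoi(p, gamma, eta, num_slots, seed=1):
    """Simulate the Ber/Geo/1/2*/eta system; return (mean AoI, pi) where
    pi[i] approximates the stationary probability Pr{AoI = i} of (16)."""
    rng = random.Random(seed)
    aoi, srv, buf = 1, None, None   # ages; None = idle server / empty buffer
    counts, total = Counter(), 0
    for _ in range(num_slots):
        total += aoi
        counts[aoi] += 1
        if rng.random() < p:        # Bernoulli arrival
            if srv is None:
                srv = 0             # enter the idle server directly
            elif buf is None:
                buf = 0             # wait in the empty buffer
            elif rng.random() < eta:
                buf = 0             # preempt the buffered packet
        if srv is not None and rng.random() < gamma:
            aoi = srv + 1           # AoI drops to the delivered packet's age + 1
            srv, buf = buf, None    # the buffered packet (if any) is served next
        else:
            aoi += 1
        if srv is not None:
            srv += 1                # surviving packets age by one slot
        if buf is not None:
            buf += 1
    return total / num_slots, {i: c / num_slots for i, c in counts.items()}

mean, pi = simulate_aoi(p=0.5, gamma=1.0, eta=0.5, num_slots=200_000)
# with gamma = 1 every packet is served in one slot, so the AoI resets to 1
# with probability p each slot and the time-average AoI is 1/p = 2
assert abs(mean - 2.0) < 0.1
```

The dictionary `pi` is the empirical counterpart of {*πi*, *i* ≥ 1} and can be compared against the analytical distribution derived later.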

The randomness of the packet arrivals and of the service times, together with the probabilistic packet preemption in the system's buffer, makes the AoI at the destination change randomly. After one time slot, the value of the AoI either increases by 1, if no packet is obtained, or drops to the age of the obtained packet, if such a packet is successfully received. In order to fully describe these random dynamics of the AoI, we use a three-dimensional state vector to simultaneously record the changes of the AoI, the age of the packet in the server, and the age of the packet waiting in the buffer, and thus constitute a three-dimensional stochastic process. Next, the steady state of this multi-dimensional age process is analyzed. To obtain the mean and the distribution of the AoI, we define the PGF corresponding to the stationary distribution of the three-dimensional age process, from which both the AoI's mean and its distribution can be obtained. The detailed analysis of the system's AoI is given in Section 3.

#### **3. AoI Analysis for Status Updating System with Probabilistic Packet Preemption**

Define the three-dimensional state vector (*n*, *m*, *l*), where *n* denotes the AoI at the destination *d*, and the other two parameters *m* and *l* are the ages of the packets in the system's server and buffer, respectively. In the *k*th time slot, if the server is busy while the buffer is empty, then *nk* and *mk* are greater than 0 and *lk* = 0. When both the server and the buffer are empty, we have *mk* = *lk* = 0; in this case, the entire system is empty.

Consider the following three-dimensional age process

$$AoI\_{PP} = \left\{ (n\_k, m\_k, l\_k) : n\_k > m\_k \ge l\_k \ge 0, k \in \mathbb{N} \right\} \tag{17}$$

where the subscript "PP" in expression (17) abbreviates probabilistic preemption. Notice that when the system is empty, the last two parameters *mk* and *lk* are both equal to 0. When there are two packets in the system, i.e., one in the server and the other in the buffer, the state components satisfy *nk* > *mk* > *lk* ≥ 1, since on the path from the source to the receiver, the packet ahead always has the greater age. As shown later, this relationship facilitates the derivation of the probability generation function *HPP*(*x*), which is defined in Equation (20).

Define three random variables *A*, *B*, and *F* to indicate whether a packet is generated in a time slot, whether the service of the packet is completed, and whether the arriving packet replaces the one in the buffer, respectively. For each possible initial state vector, according to the different realizations of the r.v.s (*A*, *B*, *F*), the state transitions of the three-dimensional state vector (*n*, *m*, *l*) can be described explicitly. We list all of them in Table 2. For example, the third row of the table covers the case where a packet of age *l* is in the buffer and a new packet arrives, since the r.v. *A* takes the value 1. However, *F* = 0 means that this new packet does not replace the buffered one; meanwhile, *B* = 0 implies that the packet service is not completed in this time slot. Combining these events, the initial state vector (*n*, *m*, *l*) transfers to (*n* + 1, *m* + 1, *l* + 1) at the next time slot, and the transition probability is *p*(1 − *γ*)(1 − *η*). The other cases in the third column of Table 2 are obtained through similar considerations.
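Since each row of Table 2 is a deterministic map from the current state and one realization of (*A*, *B*, *F*), the transitions can be written compactly as a function. The sketch below encodes our reading of the model (the handling of simultaneous events follows the discussion of (18) below); the function name is ours:

```python
def next_state(state, A, B, F):
    """One-step transition of (n, m, l): n = AoI, m = age of the packet in
    the server, l = age of the packet in the buffer (0 means empty/idle),
    given realizations A (arrival), B (service completion), F (preemption)."""
    n, m, l = state
    if m == 0:                           # empty system
        if A and B:
            return (1, 0, 0)             # arrival served within the same slot
        return (n + 1, 1, 0) if A else (n + 1, 0, 0)
    if l == 0:                           # server busy, buffer empty
        if B:                            # delivery: AoI resets to m + 1
            return (m + 1, 1, 0) if A else (m + 1, 0, 0)
        return (n + 1, m + 1, 1) if A else (n + 1, m + 1, 0)
    replaced = A and F                   # fresh packet replaces the buffered one
    if B:                                # delivery; the buffer feeds the server
        return (m + 1, 1, 0) if replaced else (m + 1, l + 1, 0)
    return (n + 1, m + 1, 1) if replaced else (n + 1, m + 1, l + 1)

# a few transitions, e.g. the row discussed above: A = 1, B = 0, F = 0
assert next_state((5, 3, 1), A=True, B=False, F=False) == (6, 4, 2)
assert next_state((5, 3, 1), A=True, B=True, F=True) == (4, 1, 0)
assert next_state((5, 3, 0), A=False, B=True, F=False) == (4, 0, 0)
```

Weighting each branch by the probabilities of the corresponding (*A*, *B*, *F*) realizations reproduces the transition probabilities used in the stationary equations.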

From the state transitions given in Table 2 and the corresponding transition probabilities, we can establish all the stationary equations characterizing the steady state of the age process *AoIPP*. Let *π*(*n*,*m*,*l*), *n* > *m* ≥ *l* ≥ 0, be the probability that the process is in state (*n*, *m*, *l*) in the steady state; these stationary probabilities satisfy the following equations.

$$\begin{cases} \pi\_{(n,m,l)} = \pi\_{(n-1,m-1,l-1)}[(1-p)(1-\gamma)+p(1-\gamma)(1-\eta)] & (n>m>l\geq 2) \\ \pi\_{(n,m,1)} = \pi\_{(n-1,m-1,0)}\,p(1-\gamma) + \sum\_{j=1}^{m-2}\pi\_{(n-1,m-1,j)}\,p(1-\gamma)\eta & (n>m\geq 3) \\ \pi\_{(n,2,1)} = \pi\_{(n-1,1,0)}\,p(1-\gamma) & (n\geq 3) \\ \pi\_{(n,m,0)} = \pi\_{(n-1,m-1,0)}(1-p)(1-\gamma) + \sum\_{k=n}^{\infty}\pi\_{(k,n-1,m-1)}[(1-p)\gamma+p\gamma(1-\eta)] & (n>m\geq 2) \\ \pi\_{(n,1,0)} = \pi\_{(n-1,0,0)}\,p(1-\gamma) + \sum\_{k=n}^{\infty}\pi\_{(k,n-1,0)}\,p\gamma + \sum\_{j=1}^{n-2}\sum\_{k=n}^{\infty}\pi\_{(k,n-1,j)}\,p\gamma\eta & (n\geq 3) \\ \pi\_{(2,1,0)} = \pi\_{(1,0,0)}\,p(1-\gamma) + \sum\_{k=2}^{\infty}\pi\_{(k,1,0)}\,p\gamma & \\ \pi\_{(n,0,0)} = \pi\_{(n-1,0,0)}(1-p) + \sum\_{k=n}^{\infty}\pi\_{(k,n-1,0)}(1-p)\gamma & (n\geq 2) \\ \pi\_{(1,0,0)} = \sum\_{n=1}^{\infty}\pi\_{(n,0,0)}\,p\gamma & \end{cases} \tag{18}$$


**Table 2.** The state vector transfers of age process *AoIPP*.

We explain the stationary equations for only part of the state vectors; the other equations in (18) can be determined in a similar manner. First, for the fifth row of (18), the state vector (*n*, 1, 0) can be reached from (*n* − 1, 0, 0) if a new packet arrives and enters the server directly, but its service does not finish within that time slot. Next, from a current state vector (*k*, *n* − 1, 0), *k* ≥ *n*, if the service of the packet of age (*n* − 1) is completed and a new packet arrives in the same time slot, then the packet of age (*n* − 1) is delivered to the receiver, which makes the AoI equal to *n* at the next time slot. The new packet enters the server, so the middle parameter of the state vector becomes 1, giving the expected state (*n*, 1, 0). Since in this case the buffer is empty, the new packet enters the system directly and no packet preemption occurs. Finally, we consider the situation where the age process starts from an arbitrary state (*k*, *n* − 1, *j*) with *k* > *n* − 1 > *j* ≥ 1. If the packet service is completed and, at the same time, a new packet arrives and preempts the one in the buffer, we again obtain the state vector (*n*, 1, 0) after one time slot. Combining all of the above cases, the stationary equation corresponding to (*n*, 1, 0) is determined. In addition to the fifth row, we also explain the last equation in (18). Observe that in order to reach the state vector (1, 0, 0), the receiver needs a packet of age 1 and the system must then be empty. This state can only be reached from a state (*n*, 0, 0), with the newly arrived packet completing its service within a single time slot.

To derive the expression of the average AoI Δ*Ber*/*Geo*/1/2∗/*η*, we do not solve Equation (18), although this approach is feasible for the AoI analysis of the current system. In our work [60], we analyzed the AoI of status updating systems with *Ber*/*Geo*/1/1, *Ber*/*Geo*/1/2, and *Ber*/*Geo*/1/2∗ queues, and the expression of the AoI's stationary distribution was determined for each case. There, we completely solved the stationary equations for each system and obtained an explicit expression for every stationary probability. Notice that that work can be regarded as the discrete-time counterpart of the packet management analysis of continuous AoI in [8,9]. Assuming all the probabilities *π*(*n*,*m*,*l*) have been determined by solving Equation (18), we have

$$\Pr\left\{\Delta\_{\text{Ber}/\text{Geo}/1/2^{\ast}/\eta} = n\right\} = \begin{cases} \pi\_{(1,0,0)} & (n=1) \\ \pi\_{(n,0,0)} + \sum\_{l=0}^{n-2} \sum\_{m=l+1}^{n-1} \pi\_{(n,m,l)} & (n\ge 2) \end{cases} \tag{19}$$

since the probability that the AoI takes each value *n* equals the sum of all the stationary probabilities whose first component is *n*. Equation (19) gives the stationary distribution of the AoI, from which the average AoI can be calculated as

$$\Delta\_{\text{Ber}/\text{Geo}/1/2^\*/\eta} = \sum\_{n=1}^{\infty} n \cdot \Pr\left\{ \Delta\_{\text{Ber}/\text{Geo}/1/2^\*/\eta} = n \right\}$$

However, the number of calculations needed to solve Equation (18) may be large; apart from this, extra computations are required to determine the AoI's distribution from Formula (19). Since the AoI is recorded by the first component, to obtain its distribution we need to sum over all the other state components. Notice that the larger the dimension of the defined state vector is, the more calculations are required to determine the AoI's distribution. Therefore, we determine the mean of the AoI and its distribution in another way, namely by the probability generation function (PGF) method.

For 0 < *x* ≤ 1, define the probability generation function

$$H\_{PP}(x) = \sum\_{n=1}^{\infty} x^n \Pr\left\{ \Delta\_{\text{Ber}/\text{Geo}/1/2^\*/\eta} = n \right\} \tag{20}$$

and we write *HPP*(*x*) further as

$$\begin{split} H\_{PP}(x) &= x \Pr\{\Delta\_{\text{Ber}/\text{Geo}/1/2^\*/\eta} = 1\} + \sum\_{n=2}^{\infty} x^{n} \Pr\{\Delta\_{\text{Ber}/\text{Geo}/1/2^\*/\eta} = n\} \\ &= x \pi\_{(1,0,0)} + \sum\_{n=2}^{\infty} x^{n} \left\{ \pi\_{(n,0,0)} + \sum\_{l=0}^{n-2} \sum\_{m=l+1}^{n-1} \pi\_{(n,m,l)} \right\} \end{split} \tag{21}$$

$$= \sum\_{n=1}^{\infty} x^{n} \pi\_{(n,0,0)} + \sum\_{n=2}^{\infty} x^{n} \sum\_{l=0}^{n-2} \sum\_{m=l+1}^{n-1} \pi\_{(n,m,l)} \tag{22}$$

$$= \sum\_{n=1}^{\infty} x^{n} \pi\_{(n,0,0)} + \sum\_{l=0}^{\infty} \sum\_{m=l+1}^{\infty} \sum\_{n=m+1}^{\infty} x^{n} \pi\_{(n,m,l)} \tag{23}$$

$$= \sum\_{n=1}^{\infty} x^{n} \pi\_{(n,0,0)} + \sum\_{m=1}^{\infty} \sum\_{n=m+1}^{\infty} x^{n} \pi\_{(n,m,0)} + \sum\_{l=1}^{\infty} \sum\_{m=l+1}^{\infty} \sum\_{n=m+1}^{\infty} x^{n} \pi\_{(n,m,l)} \tag{24}$$

where in (21) we have used the probability expression (19). Equation (23) is obtained by exchanging the order of summation in (22). In Equation (24), we split the PGF *HPP*(*x*) into three parts. As shown in the following paragraphs, the entire function (20) is obtained by determining these parts separately.

According to expression (20), immediately, we have

$$H\_{PP}(1) = 1, \qquad \left. \frac{\mathrm{d}H\_{PP}(x)}{\mathrm{d}x} \right|\_{x=1} = \overline{\Delta}\_{\text{Ber}/\text{Geo}/1/2^\*/\eta} \tag{25}$$

That is, the average AoI is obtained from the PGF's derivative at the point *x* = 1, and the probability that the steady state AoI equals *n* is given by the coefficient of the term *x<sup>n</sup>* for every *n* ≥ 1.
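These two facts translate into a simple numerical recipe for any rational PGF *H*(*x*) = *N*(*x*)/*D*(*x*): the mean follows from the quotient rule at *x* = 1, and the probabilities are recovered by power-series long division. A sketch (function names are ours), illustrated on the geometric PGF *px*/(1 − (1 − *p*)*x*), whose mean is 1/*p*:

```python
def pgf_series(num, den, K):
    """First K power-series coefficients of H(x) = N(x)/D(x), where num and
    den are coefficient lists (constant term first). Uses N = D * C, i.e.
    c_k = (n_k - sum_{j>=1} d_j c_{k-j}) / d_0."""
    c = []
    for k in range(K):
        s = num[k] if k < len(num) else 0.0
        for j in range(1, min(k, len(den) - 1) + 1):
            s -= den[j] * c[k - j]
        c.append(s / den[0])
    return c

def pgf_mean(num, den):
    """H'(1) = (N'(1) D(1) - N(1) D'(1)) / D(1)^2."""
    N1, D1 = sum(num), sum(den)
    dN1 = sum(i * a for i, a in enumerate(num))
    dD1 = sum(i * a for i, a in enumerate(den))
    return (dN1 * D1 - N1 * dD1) / D1 ** 2

# geometric PGF: H(x) = p x / (1 - (1 - p) x) with p = 1/2
num, den = [0.0, 0.5], [1.0, -0.5]
assert abs(sum(num) / sum(den) - 1.0) < 1e-12    # H(1) = 1, as in (25)
assert abs(pgf_mean(num, den) - 2.0) < 1e-12     # mean = 1/p
coeffs = pgf_series(num, den, 6)
assert abs(coeffs[3] - 0.125) < 1e-12            # Pr{X = 3} = p(1-p)^2
```

In principle, the same two routines could be applied to the numerator and denominator polynomials of *HPP*(*x*) derived below, yielding both the average AoI and the distribution {*πn*} without solving (18) state by state.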

Now, we determine the PGF *HPP*(*x*). For 0 < *x* ≤ 1, define the functions

$$\begin{aligned} h\_1(\mathbf{x}) &= \sum\_{n=1}^{\infty} \mathbf{x}^n \pi\_{(n,0,0)} \\ h\_2(\mathbf{x}) &= \sum\_{m=1}^{\infty} \sum\_{n=m+1}^{\infty} \mathbf{x}^n \pi\_{(n,m,0)} \\ h\_3(\mathbf{x}) &= \sum\_{l=1}^{\infty} \sum\_{m=l+1}^{\infty} \sum\_{n=m+1}^{\infty} \mathbf{x}^n \pi\_{(n,m,l)} \end{aligned}$$

and

$$h\_2^{(m)}(\mathbf{x}) = \sum\_{m=1}^{\infty} \sum\_{n=m+1}^{\infty} \mathbf{x}^m \pi\_{(n,m,0)}$$

We first give the following lemma, from which the PGF *HPP*(*x*) can be determined completely.

**Lemma 1.** *For the functions h<sub>i</sub>*(*x*)*,* 1 ≤ *i* ≤ 3*, and h*<sub>2</sub><sup>(*m*)</sup>(*x*)*, we have*

$$h\_1(\mathbf{x}) = \frac{p\gamma M\_1 \mathbf{x}}{1 - (1 - p)\mathbf{x}} + \frac{(1 - p)\gamma \mathbf{x}}{1 - (1 - p)\mathbf{x}} h\_2^{(m)}(\mathbf{x}) \tag{26}$$

$$h\_2(\mathbf{x}) = \frac{p(1-\gamma)\mathbf{x}}{1 - (1-p)(1-\gamma)\mathbf{x}}h\_1(\mathbf{x}) + \frac{p\gamma\mathbf{x}}{[1-(1-p)(1-\gamma)\mathbf{x}][1-(1-\gamma)\mathbf{x}]}h\_2^{(m)}(\mathbf{x})\tag{27}$$

$$h\_3(\mathbf{x}) = \frac{p(1-\gamma)\mathbf{x}}{1-(1-\gamma)\mathbf{x}}h\_2(\mathbf{x})\tag{28}$$

*and it is determined that*

$$h\_2^{(m)}(\mathbf{x}) = \frac{\left[\gamma + p^2 (1 - \gamma)\eta - (1 - p)(1 - \gamma)(1 - p\eta)\gamma\mathbf{x}\right]M\_2\mathbf{x}}{\left[1 - (1 - p)(1 - \gamma)\mathbf{x}\right][1 - (1 - \gamma)(1 - p\eta)\mathbf{x}]} \tag{29}$$

*in which the numbers M*<sup>1</sup> *and M*<sup>2</sup> *are given as*

$$M\_1 = \frac{(1-p)\gamma^2}{(p+\gamma-2p\gamma)\gamma+p^2(1-\gamma)^2} \tag{30}$$

$$M\_2 = \frac{p\gamma(1-\gamma)}{(p+\gamma-2p\gamma)\gamma+p^2(1-\gamma)^2} \tag{31}$$

**Proof.** Lemma 1 is proved in Appendix A.

Using Lemma 1, we calculate the PGF *HPP*(*x*) as follows. Equation (24) shows

$$\begin{split} H\_{PP}(\mathbf{x}) &= h\_1(\mathbf{x}) + h\_2(\mathbf{x}) + h\_3(\mathbf{x}) \\ &= h\_1(\mathbf{x}) + h\_2(\mathbf{x}) + \frac{p(1-\gamma)\mathbf{x}}{1-(1-\gamma)\mathbf{x}} h\_2(\mathbf{x}) \\ &= h\_1(\mathbf{x}) + \frac{1-(1-p)(1-\gamma)\mathbf{x}}{1-(1-\gamma)\mathbf{x}} \left( \frac{p(1-\gamma)\mathbf{x}}{1-(1-p)(1-\gamma)\mathbf{x}} h\_1(\mathbf{x}) \right. \\ &\quad + \frac{p\gamma\mathbf{x}}{[1-(1-p)(1-\gamma)\mathbf{x}][1-(1-\gamma)\mathbf{x}]} h\_2^{(m)}(\mathbf{x}) \Bigg) \\ &= \frac{1-(1-p)(1-\gamma)\mathbf{x}}{1-(1-\gamma)\mathbf{x}} h\_1(\mathbf{x}) + \frac{p\gamma\mathbf{x}}{[1-(1-\gamma)\mathbf{x}]^2} h\_2^{(m)}(\mathbf{x}) \end{split} \tag{32}$$

where in (32) we have substituted Equation (27).

Using Equation (26) and merging the same terms, eventually, we obtain

$$\begin{split} H\_{PP}(\mathbf{x}) &= \frac{p\gamma M\_1 \mathbf{x} [1 - (1 - p)(1 - \gamma)\mathbf{x}]}{[1 - (1 - p)\mathbf{x}][1 - (1 - \gamma)\mathbf{x}]} \\ &+ \frac{\gamma \mathbf{x} \{1 - (1 - p)[2(1 - \gamma) + p\gamma]\mathbf{x} + (1 - p)^2(1 - \gamma)^2 \mathbf{x}^2\}}{[1 - (1 - p)\mathbf{x}][1 - (1 - \gamma)\mathbf{x}]^2} h\_2^{(m)}(\mathbf{x}) \end{split} \tag{33}$$

in which the function *h*<sub>2</sub><sup>(*m*)</sup>(*x*) is given in Equation (29).

According to Formula (25), the average AoI of the system with probabilistic packet preemption is calculated in Theorem 1.

**Theorem 1.** *For the discrete time status updating system with a Ber*/*Geo*/1/2∗/*η queue, assuming that the packet waiting in the buffer can be preempted by subsequent fresher packets with probability η, the average age of information of the system is determined as*

$$\begin{split} \overline{\Delta}\_{\text{Ber}/\text{Geo}/1/2^\*/\eta} &= \frac{(p+\gamma-p\gamma)(p+\gamma)-p\gamma}{p\gamma}M\_1 + \frac{(p+\gamma-p\gamma)^2 - p\gamma(1-p)}{p\gamma} \cdot \left. \frac{\mathrm{d}h\_2^{(m)}(x)}{\mathrm{d}x} \right|\_{x=1} \\ &\quad + \frac{\left\{(p+\gamma-p\gamma)[1-3(1-p)(1-\gamma)] - 2p\gamma(1-p)\right\}p\gamma + \text{Poly}\_1}{p^2\gamma^2}M\_2 \end{split} \tag{34}$$

*in which we define*

$$\text{Poly}\_1 = [(p + \gamma - p\gamma)^2 - p\gamma(1 - p)][2p(1 - \gamma) + (1 - p)\gamma] \tag{35}$$

*and the derivative of* $h\_2^{(m)}(x)$ *at the point x* = 1 *is calculated as*

$$\left. \frac{\mathrm{d}h\_2^{(m)}(\mathbf{x})}{\mathrm{d}x} \right|\_{x=1} = \left( \frac{1}{p + \gamma - p\gamma} + \frac{p(1 - \gamma)(1 - p\eta)}{(p + \gamma - p\gamma)[\gamma + p(1 - \gamma)\eta]} \right) \mathrm{M}\_2 \tag{36}$$

*Let p* = *ρ<sup>d</sup>* · *γ and substitute the constants M*1 *and M*2*; the average AoI can also be written as*

$$\begin{split} \overline{\Delta}\_{\text{Ber}/\text{Geo}/1/2^\*/\eta} &= \frac{1+\rho\_d(1-2\gamma)+\rho\_d^2(1-\gamma)^2+2\rho\_d^3(1-\gamma)^2}{\rho\_d\gamma[1+\rho\_d(1-2\gamma)+\rho\_d^2(1-\gamma)^2]} \\ &\quad + \frac{(1-\gamma)+\rho\_d(1-\gamma)(1-2\gamma)+\rho\_d^2(1-\gamma)(1-\gamma+\gamma^2)}{1+\rho\_d(1-2\gamma)+\rho\_d^2(1-\gamma)^2} \left\{ \frac{(2-\gamma)-\rho\_d\gamma(1-\gamma)}{\gamma[1+\rho\_d(1-\gamma)]} \right. \\ &\qquad \left. - \frac{(1-\gamma)-\rho\_d(1-\gamma)[1+\gamma-(1-\gamma)\eta]+\rho\_d^2\gamma^2(1-\gamma)\eta}{\gamma[1+\rho\_d(1-\gamma)(1+\eta)+\rho\_d^2(1-\gamma)^2\eta]} \right\} \end{split} \tag{37}$$

*where ρ<sup>d</sup>* = *p*/*γ is defined as the discrete traffic load.*
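Since (37) is fully explicit, it can be evaluated directly. The following Python sketch (the function name and the test point are our own choices) transcribes Equation (37) and checks that the average AoI decreases as the preemption probability *η* grows, which anticipates the behavior seen later in Figure 3a:

```python
def avg_aoi(p, gamma, eta):
    """Average AoI of the Ber/Geo/1/2*/eta system, Equation (37)."""
    rho = p / gamma                          # discrete traffic load rho_d = p / gamma
    D = 1 + rho*(1 - 2*gamma) + rho**2*(1 - gamma)**2
    term1 = (D + 2*rho**3*(1 - gamma)**2) / (rho*gamma*D)
    factor = (1 - gamma)*(1 + rho*(1 - 2*gamma) + rho**2*(1 - gamma + gamma**2)) / D
    brace = (((2 - gamma) - rho*gamma*(1 - gamma)) / (gamma*(1 + rho*(1 - gamma)))
             - ((1 - gamma) - rho*(1 - gamma)*(1 + gamma - (1 - gamma)*eta)
                + rho**2*gamma**2*(1 - gamma)*eta)
               / (gamma*(1 + rho*(1 - gamma)*(1 + eta) + rho**2*(1 - gamma)**2*eta)))
    return term1 + factor*brace

# The average AoI should decrease monotonically in the preemption probability eta.
d0, d5, d1 = (avg_aoi(0.25, 0.5, e) for e in (0.0, 0.5, 1.0))
assert d0 > d5 > d1 > 0
```

At *p* = 1/4, *γ* = 1/2, for example, the three values decrease monotonically in *η*, consistent with packet preemption only improving freshness.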

**Proof.** The average AoI is obtained by first computing the derivative of *HPP*(*x*) in (33) and then setting *x* = 1. Replacing the parameter *p* with *ρ<sup>d</sup>* · *γ* eventually yields expression (37). Although a certain amount of calculation is required, all the computations are straightforward.

Here, we only provide the details of obtaining Equation (36). From (29), we have

$$\begin{split} \left. \frac{\mathrm{d}h\_2^{(m)}(x)}{\mathrm{d}x} \right|\_{x=1} &= \left. \frac{\mathrm{d}}{\mathrm{d}x}\left[ M\_2 x\, \frac{\gamma + p^2(1-\gamma)\eta - (1-p)(1-\gamma)(1-p\eta)\gamma x}{[1-(1-p)(1-\gamma)x][1-(1-\gamma)(1-p\eta)x]} \right] \right|\_{x=1} \\ &= \left. \frac{\mathrm{d}}{\mathrm{d}x}\left( M\_2\, \frac{[\gamma + p^2(1-\gamma)\eta]x - (1-p)(1-\gamma)(1-p\eta)\gamma x^2}{1 - [(1-p)(1-\gamma) + (1-\gamma)(1-p\eta)]x + (1-p)(1-\gamma)^2(1-p\eta)x^2} \right) \right|\_{x=1} \\ &= M\_2\, \frac{\text{Poly}\_2 \cdot (p+\gamma-p\gamma)[\gamma+p(1-\gamma)\eta] + [\gamma+p^2(1-\gamma)\eta-(1-p)(1-\gamma)(1-p\eta)\gamma] \cdot \text{Poly}\_3}{(p+\gamma-p\gamma)^2[\gamma+p(1-\gamma)\eta]^2} \end{split} \tag{38}$$

where

$$\text{Poly}\_2 = \gamma + p^2 (1 - \gamma)\eta - 2(1 - p)(1 - \gamma)(1 - p\eta)\gamma$$

and

$$\begin{aligned} \text{Poly}\_3 &= (1-p)(1-\gamma) + (1-\gamma)(1-p\eta) - 2(1-p)(1-\gamma)^2(1-p\eta) \\ &= (1-p)(1-\gamma)[\gamma+p(1-\gamma)\eta] + (p+\gamma-p\gamma)(1-\gamma)(1-p\eta) \end{aligned}$$

Notice that

$$\begin{aligned} &\gamma + p^2(1-\gamma)\eta - (1-p)(1-\gamma)(1-p\eta)\gamma \\ &=\gamma + p(1-\gamma)\eta - p(1-p)(1-\gamma)\eta - (1-p)(1-\gamma)(1-p\eta)\gamma \\ &=\gamma + p(1-\gamma)\eta - (1-p)(1-\gamma)[\gamma + p(1-\gamma)\eta] \\ &=(p+\gamma-p\gamma)[\gamma + p(1-\gamma)\eta] \end{aligned} \tag{39}$$

Substituting (39) into (38) results in

$$\begin{split} \left. \frac{\mathrm{d}h\_{2}^{(m)}(x)}{\mathrm{d}x} \right|\_{x=1} &= M\_{2} \left( 1 - \frac{(1-p)(1-\gamma)(1-p\eta)\gamma}{(p+\gamma-p\gamma)[\gamma+p(1-\gamma)\eta]} + \frac{(1-p)(1-\gamma)}{p+\gamma-p\gamma} + \frac{(1-\gamma)(1-p\eta)}{\gamma+p(1-\gamma)\eta} \right) \\ &= M\_{2} \left( \frac{1}{p+\gamma-p\gamma} + \frac{p(1-\gamma)(1-p\eta)}{(p+\gamma-p\gamma)[\gamma+p(1-\gamma)\eta]} \right) \end{split} \tag{40}$$

which is exactly Equation (36).
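This derivative computation can be verified numerically. In the sketch below (helper names are ours), $h\_2^{(m)}(x)$ is taken in the rational form that appears in the first line of (38), normalized so that $M\_2 = 1$; a central finite difference at $x = 1$ is compared against the closed form (36), and the identity $h\_2^{(m)}(1) = M\_2$ implied by (39) is also checked:

```python
def h2m(x, p, gamma, eta, M2=1.0):
    # h_2^(m)(x) in the rational form used in the first line of (38)
    num = gamma + p**2*(1 - gamma)*eta - (1 - p)*(1 - gamma)*(1 - p*eta)*gamma*x
    den = (1 - (1 - p)*(1 - gamma)*x) * (1 - (1 - gamma)*(1 - p*eta)*x)
    return M2 * x * num / den

def dh2m_closed(p, gamma, eta, M2=1.0):
    # Closed form (36) of the derivative at x = 1
    a = p + gamma - p*gamma
    b = gamma + p*(1 - gamma)*eta
    return M2 * (1/a + p*(1 - gamma)*(1 - p*eta)/(a*b))

p, gamma, eta, eps = 0.25, 0.5, 0.7, 1e-6
fd = (h2m(1 + eps, p, gamma, eta) - h2m(1 - eps, p, gamma, eta)) / (2*eps)
assert abs(fd - dh2m_closed(p, gamma, eta)) < 1e-6
# By the identity (39), h_2^(m)(1) = M2:
assert abs(h2m(1.0, p, gamma, eta) - 1.0) < 1e-12
```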

Notice that in definition (20), for each *n* ≥ 1, the coefficient of *x<sup>n</sup>* is the probability that the AoI equals *n*. In order to obtain these coefficients, we expand the PGF *HPP*(*x*) into a power series. This shows that

$$\begin{split} H\_{PP}(\mathbf{x}) &= \frac{p\gamma M\_1 \mathbf{x}}{\gamma - p} \left( \frac{(1-p)\gamma}{1 - (1-p)\mathbf{x}} - \frac{p(1-\gamma)}{1 - (1-\gamma)\mathbf{x}} \right) \\ &- \left( \frac{(1-p)^2 \gamma^2 M\_2 \mathbf{x}^2}{(\gamma - p)[1 - (1-p)\mathbf{x}]} - \frac{p\gamma(1-p)(1-\gamma)M\_2 \mathbf{x}^2}{(\gamma - p)[1 - (1-\gamma)\mathbf{x}]} + \frac{p\gamma M\_2 \mathbf{x}^2}{[1 - (1-\gamma)\mathbf{x}]^2} \right) \\ &\times \left( \frac{\eta(1-p)(p+\gamma-p\gamma)}{(1-\eta)[1 - (1-p)(1-\gamma)\mathbf{x}]} - \frac{(1-p\eta)[\gamma + p(1-\gamma)\eta]}{(1-\eta)[1 - (1-\gamma)(1-p\eta)\mathbf{x}]} \right) \end{split} \tag{41}$$

when the preemption probability *η* ≠ 1, while for the case *η* = 1, we have

$$\begin{split} H\_{PP}(\mathbf{x}) &= \frac{p\gamma M\_1 \mathbf{x}}{\gamma - p} \left( \frac{(1 - p)\gamma}{1 - (1 - p)\mathbf{x}} - \frac{p(1 - \gamma)}{1 - (1 - \gamma)\mathbf{x}} \right) \\ &+ \left( \frac{(1 - p)^2 \gamma^2 M\_2 \mathbf{x}^2}{(\gamma - p)[1 - (1 - p)\mathbf{x}]} - \frac{p\gamma (1 - p)(1 - \gamma)M\_2 \mathbf{x}^2}{(\gamma - p)[1 - (1 - \gamma)\mathbf{x}]} + \frac{p\gamma M\_2 \mathbf{x}^2}{[1 - (1 - \gamma)\mathbf{x}]^2} \right) \\ &\times \frac{\gamma + p^2 (1 - \gamma) - (1 - p)^2 (1 - \gamma)\gamma \mathbf{x}}{[1 - (1 - p)(1 - \gamma)\mathbf{x}]^2} \end{split} \tag{42}$$

The details of obtaining Equations (41) and (42) are given in Appendix B. In Section 4, along with the average value of the AoI, we determine the AoI's stationary distribution for two extreme cases: *η* = 0 and *η* = 1.

#### **4. Stationary Age of Information under Two Extreme Cases**

In this section, we determine the average AoI of the status updating system without packet preemption by setting *η* = 0, as well as the mean of the AoI for the system modeled by the *Ber*/*Geo*/1/2∗ queue, which corresponds to preemption probability *η* = 1. In addition, using Equations (41) and (42), the stationary distributions of the discrete AoI are obtained for both cases.

**Theorem 2.** *Assuming the packet arrivals form a Bernoulli process and the service time is geometrically distributed, the average AoIs of the discrete time status updating system with Ber*/*Geo*/1/2 *and Ber*/*Geo*/1/2∗ *queues are calculated as*

$$\overline{\Delta}\_{\text{Ber}/\text{Ga}/1/2} = \frac{1}{\gamma} \left( (1 - \gamma) + \frac{1}{\rho\_d} + \frac{2\rho\_d^2 (1 - \gamma)(1 - \gamma/2)}{1 + \rho\_d (1 - 2\gamma) + \rho\_d^2 (1 - \gamma)^2} \right) \tag{43}$$

*and*

$$\overline{\Delta}\_{\text{Ber}/\text{Geo}/1/2^\*} = \frac{1}{\gamma} \left( (1-\gamma) + \frac{1}{\rho\_d} + \frac{\rho\_d^2 (1-\gamma) \left[ 1 + 3\rho\_d (1-\gamma) + \rho\_d^2 (1-\gamma)(1-2\gamma) \right]}{\left[ 1 + \rho\_d (1-2\gamma) + \rho\_d^2 (1-\gamma)^2 \right] \left[ 1 + \rho\_d (1-\gamma) \right]^2} \right) \tag{44}$$

*For each n* ≥ 1*, the distribution of the AoI* Δ*Ber*/*Geo*/1/2 *is given by*

$$\begin{split} \Pr\{\Delta\_{\text{Ber}/\text{Geo}/1/2} = n\} &= \frac{p\gamma^2 (1-p) M\_1}{(\gamma - p)^2} (1-p)^n - \frac{(\gamma - p^2)(1-p)\gamma^2 M\_2}{(\gamma - p)^2} (1-\gamma)^{n-1} \\ &\quad - \frac{p\gamma^2 (1-p) M\_2}{\gamma - p} (n-1)(1-\gamma)^{n-1} + \frac{p\gamma^2 M\_2}{2} n (n-1)(1-\gamma)^{n-2} \end{split} \tag{45}$$

*while when the system has full packet preemption, we show that*

$$\begin{split} &\Pr\{\Delta\_{\text{Ber}/\text{Geo}/1/2^\*} = n\} \\ &= \frac{p\gamma M\_1}{\gamma - p} \Big( \gamma (1-p)^n - p(1-\gamma)^n \Big) \\ &\quad + \frac{(1-p)[\gamma^2 + p(1-\gamma)(p+\gamma)]M\_2}{\gamma - p} \Big( (1-p)^{n-1} - [(1-p)(1-\gamma)]^{n-1} \Big) \\ &\quad - \frac{(1-p)\gamma[p+2(1-p)\gamma]M\_2}{\gamma - p} \Big( (1-\gamma)^{n-1} - [(1-p)(1-\gamma)]^{n-1} \Big) + p\gamma M\_2 \Big( A(1-\gamma)^{n-2} \\ &\quad + B(n-1)(1-\gamma)^{n-2} + C[(1-p)(1-\gamma)]^{n-2} + D(n-1)[(1-p)(1-\gamma)]^{n-2} \Big) \end{split} \tag{46}$$

*in which the coefficients A, B, C, and D are determined by*

$$A = \frac{2 - p}{p^3} \left( (1 - p)^2 \gamma - \frac{2(1 - p)[\gamma + p^2(1 - \gamma)]}{2 - p} \right) \tag{47}$$

$$C = -\frac{(1-p)(2-p)}{p^3} \left( (1-p)^2 \gamma - \frac{2(1-p)[\gamma + p^2(1-\gamma)]}{2-p} \right) \tag{48}$$

$$D = -\frac{p^2(1-p)\cdot A + (1-p)^2[\gamma + p^2(1-\gamma)]}{p(2-p)}\tag{49}$$

*and*

$$B = \left[\gamma + p^2(1-\gamma)\right] - A - C - D \tag{50}$$

**Proof.** We first derive two average AoIs in Equations (43) and (44) from the general expression (37). Let *η* be 0; then, no packet preemption will occur in the system's buffer. The system's queue model reduces to *Ber*/*Geo*/1/2, and from (37), we can obtain the average AoI Δ*Ber*/*Geo*/1/2.

In this case, it is easy to show that the difference of the last two terms within the brace of (37) equals 1/*γ*. Thus, we have

$$\overline{\Delta}\_{\text{Ber}/\text{Geo}/1/2} = \left. \overline{\Delta}\_{\text{Ber}/\text{Geo}/1/2^\*/\eta} \right|\_{\eta=0}$$

$$= \frac{1 + \rho\_d (1 - 2\gamma) + \rho\_d^2 (1 - \gamma)^2 + 2\rho\_d^3 (1 - \gamma)^2}{\rho\_d \gamma [1 + \rho\_d (1 - 2\gamma) + \rho\_d^2 (1 - \gamma)^2]}$$

$$\qquad + \frac{(1 - \gamma) + \rho\_d (1 - \gamma)(1 - 2\gamma) + \rho\_d^2 (1 - \gamma)(1 - \gamma + \gamma^2)}{\gamma [1 + \rho\_d (1 - 2\gamma) + \rho\_d^2 (1 - \gamma)^2]}$$

$$= \frac{1 + \rho\_d (2 - 3\gamma) + \rho\_d^2 (1 - \gamma)(2 - 3\gamma) + \rho\_d^3 (1 - \gamma)(3 - 3\gamma + \gamma^2)}{\rho\_d \gamma [1 + \rho\_d (1 - 2\gamma) + \rho\_d^2 (1 - \gamma)^2]}$$

$$= \frac{1}{\rho\_d \gamma} \Big( 1 + \rho\_d (1 - \gamma) + \frac{2\rho\_d^3 (1 - \gamma)(1 - \gamma/2)}{1 + \rho\_d (1 - 2\gamma) + \rho\_d^2 (1 - \gamma)^2} \Big) \tag{51}$$

$$=\frac{1}{\gamma}\left((1-\gamma)+\frac{1}{\rho\_d}+\frac{2\rho\_d^2(1-\gamma)(1-\gamma/2)}{1+\rho\_d(1-2\gamma)+\rho\_d^2(1-\gamma)^2}\right)\tag{52}$$

where in Equation (51) we use the method of long division.
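This reduction can be cross-checked numerically: setting *η* = 0 in the general expression (37) must reproduce the closed form (52) (equivalently, (43)). A small Python sketch re-implementing both formulas (the names are ours):

```python
def aoi_general(p, gamma, eta):
    """Average AoI of the Ber/Geo/1/2*/eta system, Equation (37)."""
    rho = p / gamma
    D = 1 + rho*(1 - 2*gamma) + rho**2*(1 - gamma)**2
    term1 = (D + 2*rho**3*(1 - gamma)**2) / (rho*gamma*D)
    factor = (1 - gamma)*(1 + rho*(1 - 2*gamma) + rho**2*(1 - gamma + gamma**2)) / D
    brace = (((2 - gamma) - rho*gamma*(1 - gamma)) / (gamma*(1 + rho*(1 - gamma)))
             - ((1 - gamma) - rho*(1 - gamma)*(1 + gamma - (1 - gamma)*eta)
                + rho**2*gamma**2*(1 - gamma)*eta)
               / (gamma*(1 + rho*(1 - gamma)*(1 + eta) + rho**2*(1 - gamma)**2*eta)))
    return term1 + factor*brace

def aoi_eq52(gamma, rho):
    """Equation (52), i.e., Equation (43)."""
    D = 1 + rho*(1 - 2*gamma) + rho**2*(1 - gamma)**2
    return (1/gamma)*((1 - gamma) + 1/rho + 2*rho**2*(1 - gamma)*(1 - gamma/2)/D)

# The eta = 0 special case of (37) must agree with (52) at every (p, gamma).
for p in (0.1, 0.25, 0.4):
    for gamma in (0.3, 0.5, 0.8):
        assert abs(aoi_general(p, gamma, 0.0) - aoi_eq52(gamma, p/gamma)) < 1e-9
```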

For the other extreme case of *η* = 1, the general expression (37) gives the average AoI Δ*Ber*/*Geo*/1/2<sup>∗</sup> . Similarly, we first determine the value of the brace in (37); we show that the difference of its two terms equals

$$\frac{1}{\gamma} - \frac{\rho\_d^2 (1 - \gamma)}{\gamma [1 + \rho\_d (1 - \gamma)]^2} \tag{53}$$

thus, the average AoI Δ*Ber*/*Geo*/1/2<sup>∗</sup> is calculated as

$$\begin{split} \overline{\Delta}\_{\text{Ber}/\text{Geo}/1/2^\*} &= \left. \overline{\Delta}\_{\text{Ber}/\text{Geo}/1/2^\*/\eta} \right|\_{\eta=1} \\ &= \overline{\Delta}\_{\text{Ber}/\text{Geo}/1/2} - \frac{(1-\gamma)+\rho\_d(1-\gamma)(1-2\gamma)+\rho\_d^2(1-\gamma)(1-\gamma+\gamma^2)}{1+\rho\_d(1-2\gamma)+\rho\_d^2(1-\gamma)^2} \cdot \frac{\rho\_d^2(1-\gamma)}{\gamma[1+\rho\_d(1-\gamma)]^2} \end{split} \tag{54}$$

$$\begin{split} &= \frac{1}{\gamma}\Bigg( (1-\gamma) + \frac{1}{\rho\_d} + \frac{\rho\_d^2 (1-\gamma)(2-\gamma)}{1 + \rho\_d (1-2\gamma) + \rho\_d^2 (1-\gamma)^2} \\ &\qquad - \frac{\rho\_d^2 (1-\gamma)^2 + \rho\_d^3 (1-\gamma)^2 (1-2\gamma) + \rho\_d^4 (1-\gamma)^2 (1-\gamma+\gamma^2)}{\left[1 + \rho\_d (1-2\gamma) + \rho\_d^2 (1-\gamma)^2\right] \left[1 + \rho\_d (1-\gamma)\right]^2} \Bigg) \end{split} \tag{55}$$

$$=\frac{1}{\gamma}\left((1-\gamma)+\frac{1}{\rho\_d}+\frac{\rho\_d^2(1-\gamma)\left[1+3\rho\_d(1-\gamma)+\rho\_d^2(1-\gamma)(1-2\gamma)\right]}{\left[1+\rho\_d(1-2\gamma)+\rho\_d^2(1-\gamma)^2\right]\left[1+\rho\_d(1-\gamma)\right]^2}\right)\tag{56}$$

In Equation (54), since the difference of the last two brace terms equals 1/*γ* when *η* = 0, the average AoI Δ*Ber*/*Geo*/1/2 appears as the first term. This equation also gives the exact gap between the average AoIs of the system with and without packet preemption. Since the subtracted term in (54) is always positive, the average AoI is strictly lower when the packet preemption strategy is applied. Substituting Equation (52) into (55) and simplifying, in Equation (56), the expression of the average AoI Δ*Ber*/*Geo*/1/2<sup>∗</sup> is finally determined.
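Both claims (the *η* = 1 reduction and the positivity of the preemption gap) can be checked numerically. A sketch, re-implementing (37), (43), and (44) under our own function names:

```python
def aoi_general(p, gamma, eta):
    """Average AoI of the Ber/Geo/1/2*/eta system, Equation (37)."""
    rho = p / gamma
    D = 1 + rho*(1 - 2*gamma) + rho**2*(1 - gamma)**2
    term1 = (D + 2*rho**3*(1 - gamma)**2) / (rho*gamma*D)
    factor = (1 - gamma)*(1 + rho*(1 - 2*gamma) + rho**2*(1 - gamma + gamma**2)) / D
    brace = (((2 - gamma) - rho*gamma*(1 - gamma)) / (gamma*(1 + rho*(1 - gamma)))
             - ((1 - gamma) - rho*(1 - gamma)*(1 + gamma - (1 - gamma)*eta)
                + rho**2*gamma**2*(1 - gamma)*eta)
               / (gamma*(1 + rho*(1 - gamma)*(1 + eta) + rho**2*(1 - gamma)**2*eta)))
    return term1 + factor*brace

def aoi_eq43(gamma, rho):
    D = 1 + rho*(1 - 2*gamma) + rho**2*(1 - gamma)**2
    return (1/gamma)*((1 - gamma) + 1/rho + 2*rho**2*(1 - gamma)*(1 - gamma/2)/D)

def aoi_eq44(gamma, rho):
    D = 1 + rho*(1 - 2*gamma) + rho**2*(1 - gamma)**2
    num = rho**2*(1 - gamma)*(1 + 3*rho*(1 - gamma) + rho**2*(1 - gamma)*(1 - 2*gamma))
    return (1/gamma)*((1 - gamma) + 1/rho + num/(D*(1 + rho*(1 - gamma))**2))

for p in (0.1, 0.25, 0.4):
    for gamma in (0.3, 0.5, 0.8):
        rho = p/gamma
        # The eta = 1 special case of (37) must agree with (56)/(44) ...
        assert abs(aoi_general(p, gamma, 1.0) - aoi_eq44(gamma, rho)) < 1e-9
        # ... and full preemption must strictly lower the average AoI.
        assert aoi_eq43(gamma, rho) > aoi_eq44(gamma, rho)
```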

Next, the distribution of the discrete AoI is calculated. Before the expressions (45) and (46) are derived, we first verify that both (45) and (46) are proper probability distributions by providing a specific numerical example.

**Numerical example of the two AoI distributions.** Let *p* = 1/4 and *γ* = 1/2.

Firstly, from Equations (30) and (31), two numbers *M*<sup>1</sup> and *M*<sup>2</sup> are determined to be

$$M\_1 = \frac{12}{17}, \quad M\_2 = \frac{4}{17}.$$

After some simple calculations, for each *n* ≥ 1, expression (45) gives

$$\Pr\{\Delta\_{\text{Ber}/\text{Geo}/1/2} = n\} = \frac{9}{17} \left(\frac{3}{4}\right)^n - \frac{21}{68} \left(\frac{1}{2}\right)^{n-1} - \frac{3}{68} (n-1) \left(\frac{1}{2}\right)^{n-1} + \frac{1}{136} n (n-1) \left(\frac{1}{2}\right)^{n-2} \tag{57}$$

To obtain the numerical result of Equation (46), it is necessary to determine the four coefficients *A*, *B*, *C*, and *D* according to expressions (47)–(50). We directly find that

$$A = -\frac{39}{2}, \quad \text{C} = \frac{117}{8}, \quad D = \frac{45}{32}, \quad B = 4.$$

After some extra computations, it is shown that

$$\begin{split} \Pr\{\Delta\_{\text{Ber}/\text{Geo}/1/2^\*} = n\} &= \frac{1}{2} \left(\frac{3}{4}\right)^{n} - \frac{105}{34} \left(\frac{1}{2}\right)^{n} \\ &\quad + \frac{57}{17} \left(\frac{3}{8}\right)^{n} + \frac{2}{17} (n-1) \left(\frac{1}{2}\right)^{n-2} + \frac{45}{1088} (n-1) \left(\frac{3}{8}\right)^{n-2} \end{split} \tag{58}$$

It can be checked directly that the sums of both (57) and (58) from *n* = 1 to ∞ are equal to 1. Therefore, expressions (45) and (46) indeed define proper probability distributions.
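This check is easy to automate. The sketch below instantiates the general expressions (45) and (46) at *p* = 1/4, *γ* = 1/2 (with *M*1 = 12/17, *M*2 = 4/17 and the coefficients (47)–(50)) and verifies, by truncated summation, that each distribution sums to 1 and that its mean matches the corresponding average AoI in (43)/(44); all names are our own:

```python
p, g = 0.25, 0.5                      # the example values above
M1, M2 = 12/17, 4/17
q, r = 1 - p, 1 - g                   # shorthand: q = 1 - p, r = 1 - gamma

def pr_12(n):
    """Equation (45) instantiated at p = 1/4, gamma = 1/2."""
    return (p*g**2*q*M1/(g - p)**2 * q**n
            - (g - p**2)*q*g**2*M2/(g - p)**2 * r**(n - 1)
            - p*g**2*q*M2/(g - p) * (n - 1)*r**(n - 1)
            + p*g**2*M2/2 * n*(n - 1)*r**(n - 2))

# Coefficients (47)-(50)
inner = q**2*g - 2*q*(g + p**2*r)/(2 - p)
A = (2 - p)/p**3 * inner
C = -q*(2 - p)/p**3 * inner
D = -(p**2*q*A + q**2*(g + p**2*r)) / (p*(2 - p))
B = (g + p**2*r) - A - C - D

def pr_12s(n):
    """Equation (46) with the coefficients above."""
    qr = q*r
    return (p*g*M1/(g - p) * (g*q**n - p*r**n)
            + q*(g**2 + p*r*(p + g))*M2/(g - p) * (q**(n - 1) - qr**(n - 1))
            - q*g*(p + 2*q*g)*M2/(g - p) * (r**(n - 1) - qr**(n - 1))
            + p*g*M2 * (A*r**(n - 2) + B*(n - 1)*r**(n - 2)
                        + C*qr**(n - 2) + D*(n - 1)*qr**(n - 2)))

# Average AoIs (43) and (44) at rho_d = p / gamma = 1/2
rho = p/g
Dq = 1 + rho*(1 - 2*g) + rho**2*(1 - g)**2
aoi43 = (1/g)*((1 - g) + 1/rho + 2*rho**2*(1 - g)*(1 - g/2)/Dq)
aoi44 = (1/g)*((1 - g) + 1/rho
               + rho**2*(1 - g)*(1 + 3*rho*(1 - g) + rho**2*(1 - g)*(1 - 2*g))
                 / (Dq*(1 + rho*(1 - g))**2))

N = 400                               # geometric tails are negligible beyond this
assert abs(sum(pr_12(n) for n in range(1, N)) - 1) < 1e-9
assert abs(sum(pr_12s(n) for n in range(1, N)) - 1) < 1e-9
assert abs(sum(n*pr_12(n) for n in range(1, N)) - aoi43) < 1e-8
assert abs(sum(n*pr_12s(n) for n in range(1, N)) - aoi44) < 1e-8
```

The mean checks confirm that each stationary distribution is consistent with its closed-form average AoI, not just properly normalized.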

In the following, by further decomposing (41) and (42) into sums of elementary partial fractions, we derive the explicit expressions of the AoI distributions for the system with and without packet preemption.

First of all, for *η* = 0, it is easy to prove that the last part of (41) is equal to

$$\frac{-\gamma}{1 - (1 - \gamma)x}$$

Thus, we obtain

$$\begin{split} H\_{PP}(x)\big|\_{\eta=0} &= \frac{p\gamma M\_1 x}{\gamma-p}\left(\frac{(1-p)\gamma}{1-(1-p)x} - \frac{p(1-\gamma)}{1-(1-\gamma)x}\right) \\ &\quad - \left(\frac{(1-p)^2\gamma^2 M\_2 x^2}{(\gamma-p)[1-(1-p)x]} - \frac{p\gamma(1-p)(1-\gamma)M\_2 x^2}{(\gamma-p)[1-(1-\gamma)x]} + \frac{p\gamma M\_2 x^2}{[1-(1-\gamma)x]^2}\right) \cdot \frac{-\gamma}{1-(1-\gamma)x} \\ &= \frac{p\gamma^2(1-p)M\_1 x}{(\gamma-p)[1-(1-p)x]} - \frac{p^2\gamma(1-\gamma)M\_1 x}{(\gamma-p)[1-(1-\gamma)x]} + \frac{(1-p)^2\gamma^3 M\_2 x^2}{(\gamma-p)[1-(1-p)x][1-(1-\gamma)x]} \\ &\quad - \frac{p\gamma^2(1-p)(1-\gamma)M\_2 x^2}{(\gamma-p)[1-(1-\gamma)x]^2} + \frac{p\gamma^2 M\_2 x^2}{[1-(1-\gamma)x]^3} \\ &= \frac{p\gamma^2(1-p)M\_1 x}{(\gamma-p)[1-(1-p)x]} - \frac{p^2\gamma(1-\gamma)M\_1 x}{(\gamma-p)[1-(1-\gamma)x]} \\ &\quad + \frac{(1-p)^2\gamma^3 M\_2 x^2}{\gamma-p}\left(\frac{1-p}{(\gamma-p)[1-(1-p)x]} - \frac{1-\gamma}{(\gamma-p)[1-(1-\gamma)x]}\right) \\ &\quad - \frac{p\gamma^2(1-p)(1-\gamma)M\_2 x^2}{(\gamma-p)[1-(1-\gamma)x]^2} + \frac{p\gamma^2 M\_2 x^2}{[1-(1-\gamma)x]^3} \\ &= \frac{p\gamma^2(1-p)M\_1 x}{\gamma-p}\sum\_{n=0}^{\infty}[(1-p)x]^n + \frac{(1-p)^3\gamma^3 M\_2 x^2}{(\gamma-p)^2}\sum\_{n=0}^{\infty}[(1-p)x]^n \\ &\quad - \frac{p^2\gamma(1-\gamma)M\_1 x}{\gamma-p}\sum\_{n=0}^{\infty}[(1-\gamma)x]^n - \frac{(1-p)^2(1-\gamma)\gamma^3 M\_2 x^2}{(\gamma-p)^2}\sum\_{n=0}^{\infty}[(1-\gamma)x]^n \\ &\quad - \frac{p\gamma^2(1-p)(1-\gamma)M\_2 x^2}{\gamma-p}\sum\_{n=1}^{\infty}n[(1-\gamma)x]^{n-1} + \frac{p\gamma^2 M\_2 x^2}{2}\sum\_{n=2}^{\infty}n(n-1)[(1-\gamma)x]^{n-2} \end{split} \tag{59}$$

Taking the coefficient of *x<sup>n</sup>*, we find that

$$\begin{split} &\Pr\{\Delta\_{\text{Ber}/\text{Geo}/1/2} = n\} \\ &= \left(\frac{p\gamma^{2}(1-p)M\_{1}}{\gamma-p}(1-p)^{n-1} + \frac{(1-p)^{3}\gamma^{3}M\_{2}}{(\gamma-p)^{2}}(1-p)^{n-2}\right) \\ &\quad - \left(\frac{p^{2}\gamma(1-\gamma)M\_{1}}{\gamma-p}(1-\gamma)^{n-1} + \frac{(1-p)^{2}(1-\gamma)\gamma^{3}M\_{2}}{(\gamma-p)^{2}}(1-\gamma)^{n-2}\right) \\ &\quad - \frac{p\gamma^{2}(1-p)(1-\gamma)M\_{2}}{\gamma-p}(n-1)(1-\gamma)^{n-2} + \frac{p\gamma^{2}M\_{2}}{2}n(n-1)(1-\gamma)^{n-2} \\ &= \frac{p\gamma^{2}(1-p)M\_{1}}{(\gamma-p)^{2}}(1-p)^{n} - \frac{(\gamma-p^{2})(1-p)\gamma^{2}M\_{2}}{(\gamma-p)^{2}}(1-\gamma)^{n-1} \\ &\quad - \frac{p\gamma^{2}(1-p)M\_{2}}{\gamma-p}(n-1)(1-\gamma)^{n-1} + \frac{p\gamma^{2}M\_{2}}{2}n(n-1)(1-\gamma)^{n-2} \end{split} \tag{60}$$

This gives the stationary distribution (45) for the system without packet preemption.

On the other hand, when *η* is equal to 1, the system has full packet preemption. Factoring the PGF in Equation (42), we can also determine the stationary distribution of the AoI <sup>Δ</sup>*Ber*/*Geo*/1/2<sup>∗</sup> by taking the coefficients of terms *<sup>x</sup><sup>n</sup>* for each *<sup>n</sup>* <sup>≥</sup> 1. We give the explicit decomposition below, from which the distribution of AoI for the system with packet preemption is obtained explicitly.

From Equation (42), we show that

$$\begin{split} H\_{PP}(x)\big|\_{\eta=1} &= \frac{p\gamma M\_{1}x}{\gamma-p}\left(\frac{(1-p)\gamma}{1-(1-p)x} - \frac{p(1-\gamma)}{1-(1-\gamma)x}\right) \\ &\quad + \frac{(1-p)^{2}\gamma^{2}M\_{2}x^{2}}{\gamma-p}\left(\frac{\gamma^{2}+p(1-\gamma)(p+\gamma)}{\gamma^{2}[1-(1-p)x]} - \frac{(1-\gamma)[\gamma^{2}+p(1-\gamma)(p+\gamma)]}{\gamma^{2}[1-(1-p)(1-\gamma)x]} - \frac{p(1-\gamma)(p+\gamma-p\gamma)}{\gamma[1-(1-p)(1-\gamma)x]^{2}}\right) \\ &\quad - \frac{p\gamma(1-p)(1-\gamma)M\_{2}x^{2}}{\gamma-p}\left(\frac{p+2(1-p)\gamma}{p[1-(1-\gamma)x]} - \frac{(1-p)[p+2(1-p)\gamma]}{p[1-(1-p)(1-\gamma)x]} - \frac{(1-p)(p+\gamma-p\gamma)}{[1-(1-p)(1-\gamma)x]^{2}}\right) \\ &\quad + p\gamma M\_{2}x^{2}\left(\frac{A}{1-(1-\gamma)x} + \frac{B}{[1-(1-\gamma)x]^{2}} + \frac{C}{1-(1-p)(1-\gamma)x} + \frac{D}{[1-(1-p)(1-\gamma)x]^{2}}\right) \end{split} \tag{61}$$

in which we determine

$$A = \frac{2 - p}{p^3} \left( (1 - p)^2 \gamma - \frac{2(1 - p)[\gamma + p^2(1 - \gamma)]}{2 - p} \right) \tag{62}$$

$$C = -\frac{(1-p)(2-p)}{p^3} \left( (1-p)^2 \gamma - \frac{2(1-p)[\gamma + p^2(1-\gamma)]}{2-p} \right) \tag{63}$$

$$D = -\frac{p^2(1-p)\cdot A + (1-p)^2[\gamma + p^2(1-\gamma)]}{p(2-p)}\tag{64}$$

and

$$B = \left[\gamma + p^2(1 - \gamma)\right] - A - C - D \tag{65}$$

Obtaining the second and the third rows of (61) is not hard, while for the last row, we give some derivation details in Appendix C. Following the same procedure as that used to obtain (60), according to Equation (61), the probability that the stationary AoI equals each *n* is determined by the coefficient of the term *x<sup>n</sup>*.

$$\begin{split} &\Pr\{\Delta\_{\text{Ber}/\text{Geo}/1/2^\*} = n\} \\ &= \frac{p\gamma M\_{1}}{\gamma-p}\Big[(1-p)\gamma(1-p)^{n-1} - p(1-\gamma)(1-\gamma)^{n-1}\Big] + \frac{(1-p)^{2}\gamma^{2}M\_{2}}{\gamma-p} \\ &\quad \times \bigg[\frac{\gamma^{2}+p(1-\gamma)(p+\gamma)}{\gamma^{2}}(1-p)^{n-2} - \frac{(1-\gamma)[\gamma^{2}+p(1-\gamma)(p+\gamma)]}{\gamma^{2}}[(1-p)(1-\gamma)]^{n-2} \\ &\qquad - \frac{p(1-\gamma)(p+\gamma-p\gamma)}{\gamma}(n-1)[(1-p)(1-\gamma)]^{n-2}\bigg] - \frac{p\gamma(1-p)(1-\gamma)M\_{2}}{\gamma-p} \\ &\quad \times \bigg[\frac{p+2(1-p)\gamma}{p}(1-\gamma)^{n-2} - \frac{(1-p)[p+2(1-p)\gamma]}{p}[(1-p)(1-\gamma)]^{n-2} \\ &\qquad - (1-p)(p+\gamma-p\gamma)(n-1)[(1-p)(1-\gamma)]^{n-2}\bigg] + p\gamma M\_{2}\Big[A(1-\gamma)^{n-2} + B(n-1)(1-\gamma)^{n-2} \\ &\qquad + C[(1-p)(1-\gamma)]^{n-2} + D(n-1)[(1-p)(1-\gamma)]^{n-2}\Big] \\ &= \frac{p\gamma M\_{1}}{\gamma-p}\Big[\gamma(1-p)^{n} - p(1-\gamma)^{n}\Big] + \frac{(1-p)[\gamma^{2}+p(1-\gamma)(p+\gamma)]M\_{2}}{\gamma-p}\Big[(1-p)^{n-1} - [(1-p)(1-\gamma)]^{n-1}\Big] \\ &\quad - \frac{(1-p)\gamma[p+2(1-p)\gamma]M\_{2}}{\gamma-p}\Big[(1-\gamma)^{n-1} - [(1-p)(1-\gamma)]^{n-1}\Big] + p\gamma M\_{2}\Big[A(1-\gamma)^{n-2} \\ &\qquad + B(n-1)(1-\gamma)^{n-2} + C[(1-p)(1-\gamma)]^{n-2} + D(n-1)[(1-p)(1-\gamma)]^{n-2}\Big] \end{split} \tag{66}$$

which determines the distribution of AoI Δ*Ber*/*Geo*/1/2<sup>∗</sup> .

So far, in Equations (52), (56), (60) and (66), we have obtained all the results in Theorem 2; thus, the proof is completed.

Actually, we obtained the explicit expression of the AoI's distribution for the system with packet preemption in our early work [60]. Earlier in this paper, we explained that solving the stationary equations is feasible for simple situations but cannot be generalized when the system structure or queue model becomes complex. In [60], we focused on the discrete time system with three queues, i.e., the *Ber*/*Geo*/1/1, *Ber*/*Geo*/1/2, and *Ber*/*Geo*/1/2∗, which we named "*discrete packet management strategies*". There, we determined the AoI's stationary distribution for each system, and all the cases were handled by solving the stationary equations directly. Although the calculations are long, even tedious, these methods still have great significance, especially when a general status updating system is considered in which the packet arrival or packet service process is arbitrary. It is with these methods that the analysis of discrete AoI can break through the limitation of the memoryless property that the SHS approach imposes on the packet arrival and packet service processes.

In [9], based on graphical arguments of the age process, the authors determined the average continuous AoI for the system with *M*/*M*/1/2 and *M*/*M*/1/2∗ queues as

$$
\Delta\_{M/M/1/2} = \frac{1}{\mu} \left( 1 + \frac{1}{\rho} + \frac{2\rho^2}{1 + \rho + \rho^2} \right) \tag{67}
$$

and

$$
\Delta\_{M/M/1/2^\*} = \frac{1}{\mu} \left( 1 + \frac{1}{\rho} + \frac{\rho^2 (1 + 3\rho + \rho^2)}{(1 + \rho + \rho^2)(1 + \rho)^2} \right) \tag{68}
$$

In addition, in previous work [58], we have proved that the mean of the AoI for a bufferless discrete time status updating system is equal to

$$\overline{\Delta}\_{\text{Ber}/\text{Geo}/1/1} = \frac{1}{\gamma} \left( (1 - \gamma) + \frac{1}{\rho\_d} + \frac{\rho\_d}{1/(1 - \gamma) + \rho\_d} \right) \tag{69}$$

while the corresponding continuous system with an *M*/*M*/1/1 queue has the average AoI

$$\overline{\Delta}\_{M/M/1/1} = \frac{1}{\mu} \left( 1 + \frac{1}{\rho} + \frac{\rho}{1+\rho} \right) \tag{70}$$

which was also given in [9].

We list Equations (43), (44) and (67)–(70) in Table 3; this table repeats Table 1, except for the last row, which gives the average continuous and discrete AoIs for an infinite size status updating system. The mean of the discrete AoI Δ*Ber*/*Geo*/1/<sup>∞</sup> was obtained recently in our work [59]. It is observed that, apart from some additional product factors, the expressions for the mean discrete AoI of the system with Bernoulli packet arrivals and geometric service times are identical to those for the average AoI of the continuous system under Poisson-exponential assumptions. So far, we have obtained enough evidence to propose the following relationship between the means of the discrete and continuous AoIs:

$$
\mu \cdot \overline{\Delta}\_{M/M/1/c} = \left. \gamma \cdot \overline{\Delta}\_{\text{Ber}/\text{Geo}/1/c} \right|\_{\gamma \to 0} \text{, then replacing } \rho\_d \text{ by } \rho \tag{71}
$$

It would be interesting and meaningful to verify the correspondence (71) by calculating the average AoI for more continuous time and discrete time systems, for example, by determining the mean of the AoI for the continuous time system with an *M*/*M*/1/*c* queue and for the discrete time systems with general *Ber*/*Geo*/1/*c* queues, where the system size *c* is larger than 2.
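For the system sizes treated here, the correspondence (71) can at least be checked numerically: multiplying the discrete AoI formulas (69), (43) and (44) by *γ* and letting *γ* → 0 (with *ρ<sub>d</sub>* held fixed) should recover the continuous formulas (70), (67) and (68) with *μ* = 1. A sketch under these assumptions (function names are ours; *γ* = 10⁻⁶ approximates the limit):

```python
# gamma * (discrete average AoI), Equations (69), (43), (44), as functions of rho_d
def g_aoi_c1(g, r):
    return (1 - g) + 1/r + r/(1/(1 - g) + r)

def g_aoi_c2(g, r):
    D = 1 + r*(1 - 2*g) + r**2*(1 - g)**2
    return (1 - g) + 1/r + 2*r**2*(1 - g)*(1 - g/2)/D

def g_aoi_c2s(g, r):
    D = 1 + r*(1 - 2*g) + r**2*(1 - g)**2
    return (1 - g) + 1/r + r**2*(1 - g)*(1 + 3*r*(1 - g) + r**2*(1 - g)*(1 - 2*g)) \
           / (D*(1 + r*(1 - g))**2)

# mu * (continuous average AoI), Equations (70), (67), (68), as functions of rho
def m_aoi_c1(r):
    return 1 + 1/r + r/(1 + r)

def m_aoi_c2(r):
    return 1 + 1/r + 2*r**2/(1 + r + r**2)

def m_aoi_c2s(r):
    return 1 + 1/r + r**2*(1 + 3*r + r**2)/((1 + r + r**2)*(1 + r)**2)

g = 1e-6                       # the correspondence (71) is a gamma -> 0 limit
for r in (0.3, 0.6, 0.9):
    assert abs(g_aoi_c1(g, r) - m_aoi_c1(r)) < 1e-4
    assert abs(g_aoi_c2(g, r) - m_aoi_c2(r)) < 1e-4
    assert abs(g_aoi_c2s(g, r) - m_aoi_c2s(r)) < 1e-4
```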


**Table 3.** Some formulas of the average continuous and average discrete age of information.

#### **5. Numerical Simulation**

We provide the numerical results in this section. For general preemption probability, in the first two plots of Figure 3, we illustrate the relationships between the average AoI Δ*Ber*/*Geo*/1/2∗/*<sup>η</sup>* and the packet preemption probability *η* and the traffic load *ρd*, respectively. The means of three discrete AoIs, namely Δ*Ber*/*Geo*/1/1, Δ*Ber*/*Geo*/1/2, and Δ*Ber*/*Geo*/1/2<sup>∗</sup>, are plotted in Figure 3c. For comparison, we also provide the numerical results for the corresponding average continuous AoIs. Finally, for the three discrete AoIs, we depict the distribution curves and the cumulative probabilities in Figure 4. Notice that in our work [58], the distribution of the AoI for the bufferless system was obtained as

$$\Pr\{\Delta\_{\text{Ber}/\text{Geo}/1/1} = n\} = \frac{p(1-p)\gamma^3[(1-p)^n - (1-\gamma)^n]}{(p+\gamma-p\gamma)(\gamma-p)^2} - \frac{(p\gamma)^2 n(1-\gamma)^n}{(p+\gamma-p\gamma)(\gamma-p)}\tag{72}$$
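As with (57) and (58), expression (72) can be checked by truncated summation: it should sum to 1, and its mean should equal the average AoI (69). A short sketch at the example values *p* = 1/4, *γ* = 1/2 (names are ours):

```python
p, g = 0.25, 0.5
rho = p/g

def pr_bufferless(n):
    """Equation (72): AoI distribution of the Ber/Geo/1/1 system."""
    a = p + g - p*g
    return (p*(1 - p)*g**3*((1 - p)**n - (1 - g)**n) / (a*(g - p)**2)
            - (p*g)**2 * n*(1 - g)**n / (a*(g - p)))

total = sum(pr_bufferless(n) for n in range(1, 300))
mean = sum(n*pr_bufferless(n) for n in range(1, 300))
aoi69 = (1/g)*((1 - g) + 1/rho + rho/(1/(1 - g) + rho))   # Equation (69)

assert abs(total - 1) < 1e-9    # (72) is a proper distribution
assert abs(mean - aoi69) < 1e-8 # its mean matches the average AoI (69)
```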

For three different traffic loads *ρd*, we first draw the graphs of the average AoI Δ*Ber*/*Geo*/1/2∗/*<sup>η</sup>* versus the preemption probability *η*. It is intuitive that replacing the packet in the buffer with a fresher one decreases the average AoI at the destination, and the numerical results in Figure 3a confirm this trend as the preemption probability becomes large; that is, the mean of the AoI decreases monotonically as *η* increases. We mark the values at the two extreme points *η* = 0 and *η* = 1, which give the average AoIs Δ*Ber*/*Geo*/1/2 and Δ*Ber*/*Geo*/1/2<sup>∗</sup> . Notice that the closer *η* is to 0, the more similar the behavior of the system becomes to that of a system using the *Ber*/*Geo*/1/2 queue, and when *η* gradually approaches 1, a status updating system with the *Ber*/*Geo*/1/2∗ queue is finally obtained. The three curves in Figure 3a also show that as the traffic load *ρ<sup>d</sup>* increases from 0.4 to 0.45, the average AoI of the system with probabilistic packet preemption is reduced; thus, the timeliness performance is improved.

In Figure 3b, for three settings of the preemption probability, i.e., *η* = 0, *η* = 0.5 and *η* = 1, the relationships of the average AoI versus the traffic intensity *ρ<sup>d</sup>* are illustrated. The topmost curve gives the average AoI of the system without packet preemption because, for the case of *η* = 0, the average AoI reduces to Δ*Ber*/*Geo*/1/2. On the other hand, the curve at the bottom corresponds to the AoI's mean of the system with full packet preemption. In order to make the differences among these graphs more visible, we draw the results in the range *ρ<sup>d</sup>* ≥ 0.45. The three curves in Figure 3b clearly show that the timeliness of the system with complete packet preemption is the best, since when *η* is set to 1, the system's average AoI is the lowest. Since the results in Figure 3a show that the average AoI is monotonically decreasing in *η*, the graphs of the AoI's mean for a system with probabilistic packet preemption are located between the blue and the black lines in Figure 3b, such as the red line, which denotes the average AoI Δ*Ber*/*Geo*/1/2∗/0.5. In addition, the gaps between these curves are not significant for small *ρd* but become large as *ρ<sup>d</sup>* increases.

**Figure 3.** (**a**) Average AoI versus preemption probability *η* (different traffic load). (**b**) Average AoI versus traffic load *ρ<sup>d</sup>* (different preemption intensity). (**c**) Comparisons of average discrete AoI and average continuous AoI.

**Figure 4.** (**a**) Stationary distributions of discrete AoI for bufferless system and the system with and without packet preemption. (**b**) The cumulative probabilities of three discrete AoIs.

From *ρ<sup>d</sup>* = 0.15 to 0.9, we depict both the average discrete AoIs and the corresponding continuous average AoIs in Figure 3c for a bufferless system and for size 2 status updating systems with and without preemption. Continuous AoIs are denoted by dashed lines, and we use solid lines to represent the discrete AoIs. First of all, all the curves decrease as *ρ<sup>d</sup>* becomes large, and the gaps between them gradually become apparent. For the three continuous AoIs, it is observed that the average AoI Δ*M*/*M*/1/2<sup>∗</sup> is the lowest over the whole range of the traffic load *ρ*. For the other two status updating systems, it is found that the system with an *M*/*M*/1/2 queue has a lower average AoI when *ρ* takes small values, while for high *ρ*, the average AoI of the bufferless system is smaller, and thus its timeliness is better. The same observations hold for the graphs of the discrete AoIs. Notice that when the discrete traffic intensity *ρ<sup>d</sup>* is extremely large (near 0.9), the numerical results show that the average AoI Δ*Ber*/*Geo*/1/1 can be even smaller than Δ*Ber*/*Geo*/1/2<sup>∗</sup> .

In Figure 3c, both the continuous and discrete average AoIs are monotonically decreasing over the whole range of *ρd*; however, this monotonicity holds only for small-size status updating systems. It is known that the average AoI of an infinite-size system, i.e., Δ*M*/*M*/1/∞, is not monotonic as the traffic load varies from 0 to 1. Thus, for a size *c* status updating system with Bernoulli packet arrivals and geometrically distributed service times, there must be a *critical size c*∗ such that for *c* < *c*∗, the mean AoI Δ*Ber*/*Geo*/1/*<sup>c</sup>* is monotonically decreasing as *ρ<sup>d</sup>* tends to 1. In contrast, for system sizes *c* ≥ *c*∗, the curve has a valley, and there exists an optimal *ρ<sup>d</sup>* at which the average AoI is minimized. Similarly, for the continuous average AoI Δ*M*/*M*/1/*<sup>c</sup>* of a system with general size *c*, a *c*<sup>∗</sup> also exists such that Δ*M*/*M*/1/*<sup>c</sup>* is always decreasing when *c* < *c*<sup>∗</sup>, while the graph of Δ*M*/*M*/1/*<sup>c</sup>* first falls and then rises when *c* ≥ *c*∗. In addition, from the alternation of Δ*M*/*M*/1/1 and Δ*M*/*M*/1/2, and likewise of Δ*Ber*/*Geo*/1/1 and Δ*Ber*/*Geo*/1/2, we can infer that, as functions of *c*, Δ*M*/*M*/1/*<sup>c</sup>* and Δ*Ber*/*Geo*/1/*<sup>c</sup>* are not monotonic.

Finally, the distribution curves and cumulative probabilities of the three discrete AoIs are depicted in Figure 4, where we set a relatively large *ρ<sup>d</sup>* to make the differences between them clear. On the whole, these curves are similar. In Figure 4a, moving from the distribution of AoI Δ*Ber*/*Geo*/1/1 to that of Δ*Ber*/*Geo*/1/2, the peak of the curve decreases and the point at which the peak stationary probability is achieved moves slightly to the right. As the AoI becomes large, the distribution curve of the system with the *Ber*/*Geo*/1/1 queue drops more sharply. The distribution corresponding to Δ*Ber*/*Geo*/1/2<sup>∗</sup> has the largest peak value of all three discrete AoIs and decays fastest for large AoI values. In addition, this maximal probability appears to be attained at the same discrete AoI as that of the distribution of Δ*Ber*/*Geo*/1/2. We also provide the cumulative probabilities of the three discrete AoIs in Figure 4b.

#### **6. Conclusions**

In this paper, we considered the stationary AoI of a size 2 status updating system in which the packet waiting in the buffer can be preempted by a fresher packet with a given probability *η*. We showed that this phenomenon may occur in the energy-harvesting (EH) nodes of wireless sensor networks, where the charging process is stochastic. We constructed a three-dimensional age process and derived the general expression of the system's average AoI using the PGF method. Setting *η* = 0 and *η* = 1, the means of the two discrete AoIs Δ*Ber*/*Geo*/1/2 and Δ*Ber*/*Geo*/1/2<sup>∗</sup> were determined, and the exact distribution expressions of both AoIs were also obtained by writing the PGF as a power series.

We proposed an idea and methods for the analysis of discrete AoI, namely constructing multi-dimensional age processes and applying the PGF method. A detailed introduction was given to illustrate how this approach extends to other discrete-time status updating systems. In this paper, we have shown how the AoI of a basic discrete system is characterized; in future work, we will focus on the age analysis of systems with more general structure, such as multi-source systems and systems with multihop packet transmission. As one part of AoI theory, we believe that research into discrete AoI deserves more attention.

**Author Contributions:** Writing—original draft preparation, J.Z.; writing—review and editing, Y.X. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded in part by the National Natural Science Foundation of China under Grant 61901105 and Grant 62171119, in part by the Natural Science Foundation of Jiangsu Province under Grant BK20190343, in part by the Zhi Shan Young Scholar Program of Southeast University.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Proof of Theorem 1**

In this appendix, we derive all the results in Lemma 1 from the stationary Equations (18).

Define

$$M\_1 = \sum\_{n=1}^{\infty} \pi\_{(n,0,0)}, \quad M\_2 = \sum\_{m=1}^{\infty} \sum\_{n=m+1}^{\infty} \pi\_{(n,m,0)}, \quad \text{and} \quad M\_3 = \sum\_{l=1}^{\infty} \sum\_{m=l+1}^{\infty} \sum\_{n=m+1}^{\infty} \pi\_{(n,m,l)}$$

These three numbers are determined first. According to the last two rows of (18), we have

$$\begin{aligned} M\_1 &= \pi\_{(1,0,0)} + \sum\_{n=2}^{\infty} \pi\_{(n,0,0)} \\ &= \left(\sum\_{n=1}^{\infty} \pi\_{(n,0,0)}\right) p\gamma + \sum\_{n=2}^{\infty} \left\{\pi\_{(n-1,0,0)}(1-p) + \sum\_{k=n}^{\infty} \pi\_{(k,n-1,0)}(1-p)\gamma\right\} \end{aligned}$$

$$= p\gamma M\_1 + (1-p)M\_1 + \sum\_{\tilde{m}=1}^{\infty} \sum\_{\tilde{n}=\tilde{m}+1}^{\infty} \pi\_{(\tilde{n},\tilde{m},0)}(1-p)\gamma \tag{A1}$$

$$= p\gamma M\_1 + (1-p)M\_1 + (1-p)\gamma M\_2 \tag{A2}$$

where in (A1), we have used the substitutions *k* = *n*˜ and *n* − 1 = *m*˜ . From Equation (A2), we obtain the first relation

$$p(1 - \gamma)M\_1 = (1 - p)\gamma M\_2 \tag{A3}$$

Next, we deal with the number *M*<sup>3</sup> as follows.

$$\begin{split} M\_3 &= \sum\_{l=1}^{\infty} \sum\_{m=l+1}^{\infty} \sum\_{n=m+1}^{\infty} \pi\_{(n,m,l)} \\ &= \sum\_{m=2}^{\infty} \sum\_{n=m+1}^{\infty} \pi\_{(n,m,1)} + \sum\_{l=2}^{\infty} \sum\_{m=l+1}^{\infty} \sum\_{n=m+1}^{\infty} \pi\_{(n,m,l)} \end{split} \tag{A4}$$

Using the first row of (18), the latter sum in Equation (A4) equals

$$\begin{split} & \sum\_{l=2}^{\infty} \sum\_{m=l+1}^{\infty} \sum\_{n=m+1}^{\infty} \pi\_{(n-1,m-1,l-1)} (1-\gamma)(1-p\eta) \\ &= \sum\_{l=1}^{\infty} \sum\_{m=l+1}^{\infty} \sum\_{n=m+1}^{\infty} \pi\_{(n,m,l)} (1-\gamma)(1-p\eta) \\ &= (1-\gamma)(1-p\eta)M\_{3} \end{split} \tag{A5}$$

and from the second and the third row of (18), the first part of (A4) is calculated as

$$\begin{split} &\sum\_{n=3}^{\infty} \pi\_{(n,2,1)} + \sum\_{m=3}^{\infty} \sum\_{n=m+1}^{\infty} \pi\_{(n,m,1)} \\ &= \sum\_{n=3}^{\infty} \pi\_{(n-1,1,0)} p(1-\gamma) \\ &\quad + \sum\_{m=3}^{\infty} \sum\_{n=m+1}^{\infty} \left\{ \pi\_{(n-1,m-1,0)} p(1-\gamma) + \sum\_{j=1}^{m-2} \pi\_{(n-1,m-1,j)} p(1-\gamma) \eta \right\} \tag{A6} \\ &= \sum\_{n=2}^{\infty} \pi\_{(n,1,0)} p(1-\gamma) \\ &\quad + \sum\_{\tilde{m}=2}^{\infty} \sum\_{\tilde{n}=\tilde{m}+1}^{\infty} \pi\_{(\tilde{n},\tilde{m},0)} p(1-\gamma) + \sum\_{m=3}^{\infty} \sum\_{n=m+1}^{\infty} \sum\_{j=1}^{m-2} \pi\_{(n-1,m-1,j)} p(1-\gamma) \eta \\ &= \sum\_{\tilde{m}=1}^{\infty} \sum\_{\tilde{n}=\tilde{m}+1}^{\infty} \pi\_{(\tilde{n},\tilde{m},0)} p(1-\gamma) + \sum\_{\tilde{l}=1}^{\infty} \sum\_{\tilde{m}=\tilde{l}+1}^{\infty} \sum\_{\tilde{n}=\tilde{m}+1}^{\infty} \pi\_{(\tilde{n},\tilde{m},\tilde{l})} p(1-\gamma) \eta \\ &= p(1-\gamma) M\_2 + p(1-\gamma) \eta M\_3 \tag{A7} \end{split}$$

where in (A6), we have used the substitutions *n* − 1 = *n*˜, *m* − 1 = *m*˜ , and *j* = ˜*l*. Equations (A4), (A5) and (A7) together give

$$p(1 - \gamma)M\_2 = \gamma M\_3 \tag{A8}$$

Since the sum of all the stationary probabilities equals 1, we have

$$M\_1 + M\_2 + M\_3 = 1\tag{A9}$$

Combining Equations (A3), (A8) and (A9), we obtain

$$M\_1 = \frac{(1-p)\gamma^2}{(p+\gamma-2p\gamma)\gamma+p^2(1-\gamma)^2} \tag{A10}$$

$$M\_2 = \frac{p\gamma(1-\gamma)}{(p+\gamma-2p\gamma)\gamma+p^2(1-\gamma)^2} \tag{A11}$$

$$M\_3 = \frac{p^2(1-\gamma)^2}{(p+\gamma-2p\gamma)\gamma+p^2(1-\gamma)^2} \tag{A12}$$
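As a quick numerical sanity check, the closed forms (A10)–(A12) can be verified to sum to one and to satisfy the balance relations (A3) and (A8) for arbitrary parameter values. The helper below is only an illustration, not part of the derivation; the function name is our own.

```python
def stationary_masses(p, gamma):
    """Closed-form masses M1, M2, M3 from (A10)-(A12)."""
    den = (p + gamma - 2 * p * gamma) * gamma + p ** 2 * (1 - gamma) ** 2
    M1 = (1 - p) * gamma ** 2 / den
    M2 = p * gamma * (1 - gamma) / den
    M3 = p ** 2 * (1 - gamma) ** 2 / den
    return M1, M2, M3

# spot-check (A9), (A3) and (A8) on a small parameter grid
for p in (0.2, 0.5, 0.8):
    for gamma in (0.3, 0.6, 0.9):
        M1, M2, M3 = stationary_masses(p, gamma)
        assert abs(M1 + M2 + M3 - 1) < 1e-12                 # (A9)
        assert abs(p*(1-gamma)*M1 - (1-p)*gamma*M2) < 1e-12  # (A3)
        assert abs(p*(1-gamma)*M2 - gamma*M3) < 1e-12        # (A8)
```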

We mention that from the fourth, the fifth, and the sixth equation in (18), another relation can also be obtained for the second number *M*2, which is given directly as

$$M\_2 = p(1 - \gamma)M\_1 + p\gamma M\_2 + p\gamma \eta M\_3 + (1 - p)(1 - \gamma)M\_2 + (1 - p\eta)\gamma M\_3 \tag{A13}$$

which is reduced to (A8) when eliminating *M*<sup>1</sup> using the relation (A3).

Then, the relationships between the functions *h<sub>i</sub>*(*x*), 1 ≤ *i* ≤ 3, and *h*<sub>2</sub><sup>(*m*)</sup>(*x*) are determined through similar procedures. First of all, we see that

$$\begin{split} h\_{1}(x) &= \sum\_{n=1}^{\infty} x^{n} \pi\_{(n,0,0)} \\ &= x \pi\_{(1,0,0)} + \sum\_{n=2}^{\infty} x^{n} \left\{ \pi\_{(n-1,0,0)} (1-p) + \sum\_{k=n}^{\infty} \pi\_{(k,n-1,0)} (1-p) \gamma \right\} \\ &= p \gamma M\_{1} x + \sum\_{n=1}^{\infty} x^{n+1} \pi\_{(n,0,0)} (1-p) + \sum\_{n=2}^{\infty} x^{n} \sum\_{k=n}^{\infty} \pi\_{(k,n-1,0)} (1-p) \gamma \\ &= p \gamma M\_{1} x + (1-p) x h\_{1}(x) + \sum\_{\tilde{m}=1}^{\infty} x^{\tilde{m}+1} \sum\_{\tilde{n}=\tilde{m}+1}^{\infty} \pi\_{(\tilde{n},\tilde{m},0)} (1-p) \gamma \end{split} \tag{A14}$$

$$= p\gamma M\_1 x + (1-p) x h\_1(x) + (1-p)\gamma x h\_2^{(m)}(x) \tag{A15}$$

in which we denote *k* = *n*˜ and *n* − 1 = *m*˜ .

From (A15), we obtain

$$h\_1(\mathbf{x}) = \frac{p\gamma M\_1 \mathbf{x}}{1 - (1 - p)\mathbf{x}} + \frac{(1 - p)\gamma \mathbf{x}}{1 - (1 - p)\mathbf{x}} h\_2^{(m)}(\mathbf{x}) \tag{A16}$$

Using stationary Equation (18), we determine function *h*2(*x*) in the following.

$$\begin{split} h\_{2}(x) &= \sum\_{m=1}^{\infty} \sum\_{n=m+1}^{\infty} x^{n} \pi\_{(n,m,0)} \\ &= \sum\_{n=2}^{\infty} x^{n} \pi\_{(n,1,0)} + \sum\_{m=2}^{\infty} \sum\_{n=m+1}^{\infty} x^{n} \left\{ \pi\_{(n-1,m-1,0)}(1-p)(1-\gamma) + \sum\_{k=n}^{\infty} \pi\_{(k,n-1,m-1)}(1-p\eta)\gamma \right\} \\ &= x^{2} \pi\_{(2,1,0)} + \sum\_{n=3}^{\infty} x^{n} \left\{ \pi\_{(n-1,0,0)} p(1-\gamma) + \sum\_{k=n}^{\infty} \pi\_{(k,n-1,0)} p\gamma + \sum\_{k=n}^{\infty} \sum\_{j=1}^{n-2} \pi\_{(k,n-1,j)} p\gamma\eta \right\} \\ &\quad + \sum\_{m=1}^{\infty} \sum\_{n=m+1}^{\infty} x^{n+1} \pi\_{(n,m,0)}(1-p)(1-\gamma) + \sum\_{m=2}^{\infty} \sum\_{n=m+1}^{\infty} x^{n} \sum\_{k=n}^{\infty} \pi\_{(k,n-1,m-1)}(1-p\eta)\gamma \\ &= x^{2} \left\{ \pi\_{(1,0,0)} p(1-\gamma) + \sum\_{k=2}^{\infty} \pi\_{(k,1,0)} p\gamma \right\} + \sum\_{n=2}^{\infty} x^{n+1} \pi\_{(n,0,0)} p(1-\gamma) + \sum\_{n=3}^{\infty} x^{n} \sum\_{k=n}^{\infty} \pi\_{(k,n-1,0)} p\gamma \\ &\quad + \sum\_{n=3}^{\infty} x^{n} \sum\_{k=n}^{\infty} \sum\_{j=1}^{n-2} \pi\_{(k,n-1,j)} p\gamma\eta + (1-p)(1-\gamma) x h\_{2}(x) + (1-p\eta)\gamma x h\_{3}^{(m)}(x) \end{split} \tag{A17}$$

Let *<sup>k</sup>* <sup>=</sup> *<sup>n</sup>*˜, *<sup>n</sup>* <sup>−</sup> <sup>1</sup> <sup>=</sup> *<sup>m</sup>*˜ , and *<sup>m</sup>* <sup>−</sup> <sup>1</sup> <sup>=</sup> ˜ *l*, we have

$$\begin{aligned} &\sum\_{m=2}^{\infty} \sum\_{n=m+1}^{\infty} x^{n} \sum\_{k=n}^{\infty} \pi\_{(k,n-1,m-1)} (1 - p\eta) \gamma \\ &= \sum\_{\tilde{l}=1}^{\infty} \sum\_{\tilde{m}=\tilde{l}+1}^{\infty} x^{\tilde{m}+1} \sum\_{\tilde{n}=\tilde{m}+1}^{\infty} \pi\_{(\tilde{n},\tilde{m},\tilde{l})} (1 - p\eta) \gamma \\ &= (1 - p\eta) \gamma x h\_{3}^{(m)}(x) \end{aligned} \tag{A18}$$

where we define

$$h\_3^{(m)}(\mathbf{x}) = \sum\_{l=1}^{\infty} \sum\_{m=l+1}^{\infty} \sum\_{n=m+1}^{\infty} \mathbf{x}^{\mathbf{m}} \pi\_{(n,m,l)} \tag{A19}$$

and the last term in Equation (A17) is obtained.

Continuing the calculation of (A17), we obtain that

$$\begin{split} h\_{2}(x) &= p(1-\gamma) x h\_{1}(x) + p\gamma x h\_{2}^{(m)}(x) + \sum\_{\tilde{m}=2}^{\infty} x^{\tilde{m}+1} \sum\_{\tilde{n}=\tilde{m}+1}^{\infty} \sum\_{\tilde{l}=1}^{\tilde{m}-1} \pi\_{(\tilde{n},\tilde{m},\tilde{l})} p\gamma\eta \\ &\quad + (1-p)(1-\gamma) x h\_{2}(x) + (1-p\eta)\gamma x h\_{3}^{(m)}(x) \end{split} \tag{A20}$$

In (A20), we have used the substitutions *<sup>k</sup>* <sup>=</sup> *<sup>n</sup>*˜, *<sup>n</sup>* <sup>−</sup> <sup>1</sup> <sup>=</sup> *<sup>m</sup>*˜ , and *<sup>j</sup>* <sup>=</sup> ˜ *l*. Except for the factor *pγηx*, the sum in (A20) is equal to

$$\begin{split} & x^{2} \sum\_{n=3}^{\infty} \pi\_{(n,2,1)} + x^{3} \sum\_{n=4}^{\infty} \left[ \pi\_{(n,3,1)} + \pi\_{(n,3,2)} \right] + x^{4} \sum\_{n=5}^{\infty} \left[ \pi\_{(n,4,1)} + \pi\_{(n,4,2)} + \pi\_{(n,4,3)} \right] + \cdots \\ &= \sum\_{m=2}^{\infty} x^{m} \sum\_{n=m+1}^{\infty} \pi\_{(n,m,1)} + \sum\_{m=3}^{\infty} x^{m} \sum\_{n=m+1}^{\infty} \pi\_{(n,m,2)} + \sum\_{m=4}^{\infty} x^{m} \sum\_{n=m+1}^{\infty} \pi\_{(n,m,3)} + \cdots \\ &= \sum\_{l=1}^{\infty} \sum\_{m=l+1}^{\infty} x^{m} \sum\_{n=m+1}^{\infty} \pi\_{(n,m,l)} \\ &= h\_{3}^{(m)}(x) \end{split} \tag{A21}$$

Substituting the result (A21) into Equation (A20) and merging the same terms gives

$$h\_2(\mathbf{x})[1 - (1 - p)(1 - \gamma)\mathbf{x}] = p(1 - \gamma)\mathbf{x}h\_1(\mathbf{x}) + p\gamma \mathbf{x}h\_2^{(\mathbf{m})}(\mathbf{x}) + \gamma \mathbf{x}h\_3^{(\mathbf{m})}(\mathbf{x})\tag{A22}$$

Finally, we compute *h*<sub>3</sub><sup>(*m*)</sup>(*x*) and then determine the remaining function *h*<sub>2</sub><sup>(*m*)</sup>(*x*). As before, from the equations in (18), we find that

$$\begin{split} h\_{3}^{(m)}(x) &= \sum\_{l=1}^{\infty} \sum\_{m=l+1}^{\infty} \sum\_{n=m+1}^{\infty} x^{m} \pi\_{(n,m,l)} \\ &= \sum\_{m=2}^{\infty} \sum\_{n=m+1}^{\infty} x^{m} \pi\_{(n,m,1)} + \sum\_{l=2}^{\infty} \sum\_{m=l+1}^{\infty} \sum\_{n=m+1}^{\infty} x^{m} \pi\_{(n-1,m-1,l-1)} (1-\gamma)(1-p\eta) \\ &= \sum\_{n=3}^{\infty} x^{2} \pi\_{(n,2,1)} + \sum\_{m=3}^{\infty} \sum\_{n=m+1}^{\infty} x^{m} \left\{ \pi\_{(n-1,m-1,0)} p(1-\gamma) + \sum\_{j=1}^{m-2} \pi\_{(n-1,m-1,j)} p(1-\gamma)\eta \right\} \\ &\quad + (1-\gamma)(1-p\eta) x h\_{3}^{(m)}(x) \\ &= \sum\_{n=3}^{\infty} x^{2} \pi\_{(n-1,1,0)} p(1-\gamma) + \sum\_{m=2}^{\infty} \sum\_{n=m+1}^{\infty} x^{m+1} \pi\_{(n,m,0)} p(1-\gamma) \\ &\quad + \sum\_{m=3}^{\infty} \sum\_{n=m+1}^{\infty} x^{m} \sum\_{j=1}^{m-2} \pi\_{(n-1,m-1,j)} p(1-\gamma)\eta + (1-\gamma)(1-p\eta) x h\_{3}^{(m)}(x) \\ &= p(1-\gamma) x h\_{2}^{(m)}(x) + p(1-\gamma)\eta x h\_{3}^{(m)}(x) + (1-\gamma)(1-p\eta) x h\_{3}^{(m)}(x) \\ &= p(1-\gamma) x h\_{2}^{(m)}(x) + (1-\gamma) x h\_{3}^{(m)}(x) \end{split} \tag{A23}$$

which shows that

$$h\_3^{(m)}(\mathbf{x}) = \frac{p(1-\gamma)\mathbf{x}}{1-(1-\gamma)\mathbf{x}}h\_2^{(m)}(\mathbf{x})\tag{A24}$$

Combining Equations (A22) and (A24) yields the relation

$$h\_2(\mathbf{x}) = \frac{p(1-\gamma)\mathbf{x}}{1 - (1-p)(1-\gamma)\mathbf{x}}h\_1(\mathbf{x}) + \frac{p\gamma\mathbf{x}}{[1-(1-p)(1-\gamma)\mathbf{x}][1-(1-\gamma)\mathbf{x}]}h\_2^{(m)}(\mathbf{x}) \tag{A25}$$

Now we deal with the function *h*3(*x*). Similarly to the derivation of (A23), we have

$$\begin{split} h\_{3}(x) &= \sum\_{m=2}^{\infty} \sum\_{n=m+1}^{\infty} x^{n} \pi\_{(n,m,1)} + \sum\_{l=2}^{\infty} \sum\_{m=l+1}^{\infty} \sum\_{n=m+1}^{\infty} x^{n} \pi\_{(n-1,m-1,l-1)} (1-\gamma)(1-p\eta) \\ &= \sum\_{n=3}^{\infty} x^{n} \pi\_{(n,2,1)} + \sum\_{m=3}^{\infty} \sum\_{n=m+1}^{\infty} x^{n} \left\{ \pi\_{(n-1,m-1,0)} p(1-\gamma) + \sum\_{j=1}^{m-2} \pi\_{(n-1,m-1,j)} p(1-\gamma)\eta \right\} \\ &\quad + (1-\gamma)(1-p\eta) x h\_{3}(x) \\ &= \sum\_{n=3}^{\infty} x^{n} \pi\_{(n-1,1,0)} p(1-\gamma) + \sum\_{m=2}^{\infty} \sum\_{n=m+1}^{\infty} x^{n+1} \pi\_{(n,m,0)} p(1-\gamma) \\ &\quad + \sum\_{m=3}^{\infty} \sum\_{n=m+1}^{\infty} x^{n} \sum\_{j=1}^{m-2} \pi\_{(n-1,m-1,j)} p(1-\gamma)\eta + (1-\gamma)(1-p\eta) x h\_{3}(x) \\ &= p(1-\gamma) x h\_{2}(x) + p(1-\gamma)\eta x h\_{3}(x) + (1-\gamma)(1-p\eta) x h\_{3}(x) \\ &= p(1-\gamma) x h\_{2}(x) + (1-\gamma) x h\_{3}(x) \end{split} \tag{A26}$$

from which we derive

$$h\_3(\mathbf{x}) = \frac{p(1-\gamma)\mathbf{x}}{1-(1-\gamma)\mathbf{x}}h\_2(\mathbf{x})\tag{A27}$$

So far, we have obtained the relations (26)–(28) as Equations (A16), (A25) and (A27). To complete the proof of Lemma 1, it remains to determine the last function *h*<sub>2</sub><sup>(*m*)</sup>(*x*). It is shown that

$$\begin{split} h\_{2}^{(m)}(x) &= \sum\_{m=1}^{\infty} \sum\_{n=m+1}^{\infty} x^{m} \pi\_{(n,m,0)} \\ &= \sum\_{n=2}^{\infty} x \pi\_{(n,1,0)} + \sum\_{m=2}^{\infty} \sum\_{n=m+1}^{\infty} x^{m} \left\{ \pi\_{(n-1,m-1,0)}(1-p)(1-\gamma) + \sum\_{k=n}^{\infty} \pi\_{(k,n-1,m-1)}(1-p\eta)\gamma \right\} \\ &= x \pi\_{(2,1,0)} + \sum\_{n=3}^{\infty} x \left\{ \pi\_{(n-1,0,0)} p(1-\gamma) + \sum\_{k=n}^{\infty} \pi\_{(k,n-1,0)} p\gamma + \sum\_{k=n}^{\infty} \sum\_{j=1}^{n-2} \pi\_{(k,n-1,j)} p\gamma\eta \right\} \\ &\quad + \sum\_{m=1}^{\infty} \sum\_{n=m+1}^{\infty} x^{m+1} \pi\_{(n,m,0)}(1-p)(1-\gamma) + \sum\_{m=2}^{\infty} \sum\_{n=m+1}^{\infty} x^{m} \sum\_{k=n}^{\infty} \pi\_{(k,n-1,m-1)}(1-p\eta)\gamma \\ &= x \left\{ \pi\_{(1,0,0)} p(1-\gamma) + \sum\_{k=2}^{\infty} \pi\_{(k,1,0)} p\gamma \right\} + \sum\_{n=2}^{\infty} x \pi\_{(n,0,0)} p(1-\gamma) + \sum\_{n=3}^{\infty} x \sum\_{k=n}^{\infty} \pi\_{(k,n-1,0)} p\gamma \\ &\quad + \sum\_{n=3}^{\infty} x \sum\_{k=n}^{\infty} \sum\_{j=1}^{n-2} \pi\_{(k,n-1,j)} p\gamma\eta + (1-p)(1-\gamma) x h\_{2}^{(m)}(x) + (1-p\eta)\gamma x h\_{3}^{(l)}(x) \end{split} \tag{A28}$$

Similarly, we use the substitutions *<sup>k</sup>* <sup>=</sup> *<sup>n</sup>*˜, *<sup>n</sup>* <sup>−</sup> <sup>1</sup> <sup>=</sup> *<sup>m</sup>*˜ , and *<sup>m</sup>* <sup>−</sup> <sup>1</sup> <sup>=</sup> ˜ *l*, and we write

$$\begin{split} & \sum\_{m=2}^{\infty} \sum\_{n=m+1}^{\infty} x^{m} \sum\_{k=n}^{\infty} \pi\_{(k,n-1,m-1)} (1 - p\eta) \gamma \\ &= \sum\_{\tilde{l}=1}^{\infty} \sum\_{\tilde{m}=\tilde{l}+1}^{\infty} x^{\tilde{l}+1} \sum\_{\tilde{n}=\tilde{m}+1}^{\infty} \pi\_{(\tilde{n}, \tilde{m}, \tilde{l})} (1 - p\eta) \gamma \\ &= (1 - p\eta) \gamma x h\_{3}^{(l)}(x) \end{split} \tag{A29}$$

Thus, the last term in Equation (A28) is obtained, where *h*<sub>3</sub><sup>(*l*)</sup>(*x*) is defined and determined as

$$\begin{split} h\_{3}^{(l)}(x) &= \sum\_{l=1}^{\infty} \sum\_{m=l+1}^{\infty} x^{l} \sum\_{n=m+1}^{\infty} \pi\_{(n,m,l)} \\ &= \sum\_{m=2}^{\infty} x \sum\_{n=m+1}^{\infty} \pi\_{(n,m,1)} + \sum\_{l=2}^{\infty} \sum\_{m=l+1}^{\infty} x^{l} \sum\_{n=m+1}^{\infty} \pi\_{(n-1,m-1,l-1)} (1-\gamma)(1-p\eta) \\ &= x \sum\_{n=3}^{\infty} \pi\_{(n,2,1)} + \sum\_{m=3}^{\infty} x \sum\_{n=m+1}^{\infty} \left\{ \pi\_{(n-1,m-1,0)} p(1-\gamma) + \sum\_{j=1}^{m-2} \pi\_{(n-1,m-1,j)} p(1-\gamma)\eta \right\} \\ &\quad + (1-\gamma)(1-p\eta) x h\_{3}^{(l)}(x) \\ &= x \sum\_{n=3}^{\infty} \pi\_{(n-1,1,0)} p(1-\gamma) + \sum\_{m=2}^{\infty} x \sum\_{n=m+1}^{\infty} \pi\_{(n,m,0)} p(1-\gamma) \\ &\quad + \sum\_{m=3}^{\infty} x \sum\_{n=m+1}^{\infty} \sum\_{j=1}^{m-2} \pi\_{(n-1,m-1,j)} p(1-\gamma)\eta + (1-\gamma)(1-p\eta) x h\_{3}^{(l)}(x) \\ &= p(1-\gamma) M\_{2} x + p(1-\gamma)\eta M\_{3} x + (1-\gamma)(1-p\eta) x h\_{3}^{(l)}(x) \\ &= [\gamma + p(1-\gamma)\eta] M\_{3} x + (1-\gamma)(1-p\eta) x h\_{3}^{(l)}(x) \end{split} \tag{A30}$$

In Equation (A30), we have used the relation (A8), which says *p*(1 − *γ*)*M*<sup>2</sup> = *γM*3. From (A30), we obtain the result

$$h\_3^{(l)}(\mathbf{x}) = \frac{[\gamma + p(1 - \gamma)\eta]M\_3\mathbf{x}}{1 - (1 - \gamma)(1 - p\eta)\mathbf{x}} \tag{A31}$$

Coming back to the calculation of the function *h*<sub>2</sub><sup>(*m*)</sup>(*x*), Equation (A28) shows

$$\begin{split} h\_{2}^{(m)}(x)[1 - (1 - p)(1 - \gamma)x] &= p(1 - \gamma)M\_1 x + p\gamma M\_2 x + p\gamma \eta M\_3 x + (1 - p\eta)\gamma x h\_3^{(l)}(x) \\ &= \left[ \gamma + p^2 (1 - \gamma)\eta \right] M\_2 x + (1 - p\eta)\gamma x h\_3^{(l)}(x), \end{split} \tag{A32}$$

in which we have used Equations (A3) and (A8) to replace the numbers *M*<sup>1</sup> and *M*3. Substituting expression (A31), after some algebra, we determine that

$$h\_2^{(m)}(x) = \frac{\left[\gamma + p^2(1-\gamma)\eta - (1-p)(1-\gamma)(1-p\eta)\gamma x\right]M\_2x}{\left[1 - (1-p)(1-\gamma)x\right]\left[1 - (1-\gamma)(1-p\eta)x\right]}\tag{A33}$$

This eventually completes the proof of Lemma 1.
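The consistency of (A31)–(A33) is also easy to confirm numerically: the closed form (A33) should satisfy relation (A32) once *h*<sub>3</sub><sup>(*l*)</sup>(*x*) from (A31) and the identity *p*(1 − *γ*)*M*<sup>2</sup> = *γM*<sup>3</sup> are substituted. The small check below is only an illustration; the function name and the sample parameters are our own.

```python
def check_h2m_identity(p, g, eta, M2=0.2, x=0.5, tol=1e-12):
    """Verify relation (A32) using the closed forms (A31) and (A33)."""
    M3 = p * (1 - g) * M2 / g                       # relation (A8)
    # h2^(m)(x) from (A33)
    h2m = ((g + p**2 * (1 - g) * eta
            - (1 - p) * (1 - g) * (1 - p * eta) * g * x) * M2 * x
           / ((1 - (1 - p) * (1 - g) * x) * (1 - (1 - g) * (1 - p * eta) * x)))
    # h3^(l)(x) from (A31)
    h3l = (g + p * (1 - g) * eta) * M3 * x / (1 - (1 - g) * (1 - p * eta) * x)
    lhs = h2m * (1 - (1 - p) * (1 - g) * x)
    rhs = (g + p**2 * (1 - g) * eta) * M2 * x + (1 - p * eta) * g * x * h3l
    return abs(lhs - rhs) < tol

assert check_h2m_identity(0.3, 0.6, 0.4)
```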

#### **Appendix B. Proof of Equations (41) and (42)**

In Equation (33), we show that


$$\begin{split} H\_{PP}(\mathbf{x}) &= \frac{p\gamma M\_1 \mathbf{x} [1 - (1 - p)(1 - \gamma)\mathbf{x}]}{[1 - (1 - p)\mathbf{x}][1 - (1 - \gamma)\mathbf{x}]} \\ &+ \frac{\gamma \mathbf{x} \left\{ 1 - (1 - p)[2(1 - \gamma) + p\gamma]\mathbf{x} + (1 - p)^2(1 - \gamma)^2 \mathbf{x}^2 \right\}}{[1 - (1 - p)\mathbf{x}][1 - (1 - \gamma)\mathbf{x}]^2} h\_2^{(m)}(\mathbf{x}) \end{split} \tag{A34}$$

Equations (41) and (42) are obtained by decomposing each part of (A34). For the first part, we assume

$$\begin{split} \frac{1 - (1 - p)(1 - \gamma)\mathbf{x}}{[1 - (1 - p)\mathbf{x}][1 - (1 - \gamma)\mathbf{x}]} &= \frac{A}{1 - (1 - p)\mathbf{x}} + \frac{B}{1 - (1 - \gamma)\mathbf{x}}\\ &= \frac{(A + B) - [A(1 - \gamma) + B(1 - p)]\mathbf{x}}{[1 - (1 - p)\mathbf{x}][1 - (1 - \gamma)\mathbf{x}]} \end{split} \tag{A35}$$

and according to the coefficients of corresponding terms, we obtain

$$A + B = 1, \quad A(1 - \gamma) + B(1 - p) = (1 - p)(1 - \gamma) \tag{A36}$$

which determine *A* and *B* as

$$A = \frac{(1-p)\gamma}{\gamma - p}, \text{ and } B = -\frac{p(1-\gamma)}{\gamma - p} \tag{A37}$$

Therefore,

$$\frac{p\gamma M\_1 \mathbf{x} [1 - (1 - p)(1 - \gamma)\mathbf{x}]}{[1 - (1 - p)\mathbf{x}][1 - (1 - \gamma)\mathbf{x}]} = \frac{p\gamma M\_1 \mathbf{x}}{\gamma - p} \left( \frac{(1 - p)\gamma}{1 - (1 - p)\mathbf{x}} - \frac{p(1 - \gamma)}{1 - (1 - \gamma)\mathbf{x}} \right) \tag{A38}$$
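The decomposition in (A38) can be checked pointwise: with the coefficients just obtained, both sides of (A35) must agree for every admissible *x*. A throwaway numeric check, shown only as an illustration (the function name is our own; *γ* ≠ *p* is required):

```python
def check_first_decomposition(p, g, xs=(0.0, 0.2, 0.5, 0.9), tol=1e-12):
    """Pointwise check of the partial fractions used in (A38)."""
    for x in xs:
        lhs = (1 - (1 - p) * (1 - g) * x) / (
            (1 - (1 - p) * x) * (1 - (1 - g) * x))
        rhs = ((1 - p) * g / (1 - (1 - p) * x)
               - p * (1 - g) / (1 - (1 - g) * x)) / (g - p)
        if abs(lhs - rhs) > tol:
            return False
    return True

assert check_first_decomposition(0.3, 0.6)
```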

For the second part of expression (A34), let

$$\begin{split} &\frac{1-(1-p)[2(1-\gamma)+p\gamma]\mathbf{x}+(1-p)^{2}(1-\gamma)^{2}\mathbf{x}^{2}}{[1-(1-p)\mathbf{x}][1-(1-\gamma)\mathbf{x}]^{2}} \\ &=\frac{A}{1-(1-p)\mathbf{x}}+\frac{B}{1-(1-\gamma)\mathbf{x}}+\frac{\mathbf{C}}{[1-(1-\gamma)\mathbf{x}]^{2}} \\ &=\frac{(A+B+\mathbf{C})-[(A+B)(1-\gamma)+A(1-\gamma)+B(1-p)+\mathbf{C}(1-p)]\mathbf{x}+c\_{2}\mathbf{x}^{2}}{[1-(1-p)\mathbf{x}][1-(1-\gamma)\mathbf{x}]^{2}} \end{split}$$

in which

$$c\_2 = [A(1 - \gamma) + B(1 - p)](1 - \gamma)$$

Thus, we have

$$\begin{cases} 1 = A + B + \mathbb{C} \\ (1 - p)[2(1 - \gamma) + p\gamma] = (A + B)(1 - \gamma) + A(1 - \gamma) + B(1 - p) + \mathbb{C}(1 - p) \\ (1 - p)^2(1 - \gamma) = A(1 - \gamma) + B(1 - p) \end{cases} \tag{A39}$$

Substituting the third relation and *A* + *B* = 1 − *C* into the second equation, we obtain

$$(1-p)[2(1-\gamma)+p\gamma] = (1-\mathbb{C})(1-\gamma)+(1-p)^2(1-\gamma)+\mathbb{C}(1-p)\tag{A40}$$

from which we can solve that *C* = *p*.

Then, according to the equations

$$A + B = 1 - p \quad \text{and} \quad A(1 - \gamma) + B(1 - p) = (1 - p)^2(1 - \gamma)$$

the other two numbers are obtained to be

$$A = \frac{(1-p)^2 \gamma}{\gamma - p}, \quad B = -\frac{p(1-p)(1-\gamma)}{\gamma - p} \tag{A41}$$

Thus, the factorization of the second part is obtained.
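The second factorization can be verified in the same pointwise fashion, using *A* and *B* from (A41) together with *C* = *p*. The check below is illustrative only (the function name is our own; *γ* ≠ *p* is required):

```python
def check_second_decomposition(p, g, xs=(0.1, 0.4, 0.7), tol=1e-12):
    """Pointwise check of the three-term partial fractions,
    with A, B from (A41) and C = p."""
    A = (1 - p) ** 2 * g / (g - p)
    B = -p * (1 - p) * (1 - g) / (g - p)
    C = p
    for x in xs:
        num = (1 - (1 - p) * (2 * (1 - g) + p * g) * x
               + (1 - p) ** 2 * (1 - g) ** 2 * x ** 2)
        lhs = num / ((1 - (1 - p) * x) * (1 - (1 - g) * x) ** 2)
        rhs = (A / (1 - (1 - p) * x) + B / (1 - (1 - g) * x)
               + C / (1 - (1 - g) * x) ** 2)
        if abs(lhs - rhs) > tol:
            return False
    return True

assert check_second_decomposition(0.3, 0.6)
```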

The last part, namely the function *h*<sub>2</sub><sup>(*m*)</sup>(*x*), is dealt with similarly. Omitting the straightforward calculations, we directly find that

$$h\_2^{(m)}(\mathbf{x}) = -\frac{\eta(1-p)(p+\gamma-p\gamma)}{(1-\eta)[1-(1-p)(1-\gamma)\mathbf{x}]} + \frac{(1-p\eta)[\gamma+p(1-\gamma)\eta]}{(1-\eta)[1-(1-\gamma)(1-p\eta)\mathbf{x}]} \tag{A42}$$

Notice that (1 − *η*) appears in the denominators of the fractions in Equation (A42); thus, (A42) holds only for *η* ≠ 1. When *η* = 1, Equation (29) shows that

$$h\_2^{(m)}(\mathbf{x})\Big|\_{\eta=1} = \frac{\gamma + p^2(1-\gamma) - (1-p)^2(1-\gamma)\gamma\mathbf{x}}{[1 - (1-p)(1-\gamma)\mathbf{x}]^2} \tag{A43}$$

Summarizing the above results, Equations (41) and (42) are both established.

#### **Appendix C. Factorization of Last Part of Equation (42)**

We write

$$\begin{split} &\frac{p\gamma M\_{2}x^{2}}{[1-(1-\gamma)x]^{2}} \cdot \frac{\gamma+p^{2}(1-\gamma)-(1-p)^{2}(1-\gamma)\gamma x}{[1-(1-p)(1-\gamma)x]^{2}} \\ &=p\gamma M\_{2}x^{2}\bigg(\frac{A}{1-(1-\gamma)x} + \frac{B}{[1-(1-\gamma)x]^{2}} + \frac{\mathbb{C}}{1-(1-p)(1-\gamma)x} + \frac{D}{[1-(1-p)(1-\gamma)x]^{2}}\bigg) \\ &=p\gamma M\_{2}x^{2}\bigg(\frac{(A+B)-A(1-\gamma)x}{[1-(1-\gamma)x]^{2}} + \frac{(\mathbb{C}+D)-\mathbb{C}(1-p)(1-\gamma)x}{[1-(1-p)(1-\gamma)x]^{2}}\bigg) \end{split} \tag{A44}$$

Merging the terms in the bracket of (A44), according to corresponding coefficients, it is shown that

$$\begin{cases} A + B + \mathbb{C} + D = \gamma + p^2(1 - \gamma) \\ 2(A + B)(1 - p) + 2(\mathbb{C} + D) + A + \mathbb{C}(1 - p) = (1 - p)^2 \gamma \\ (A + B)(1 - p)^2 + 2A(1 - p) + (\mathbb{C} + D) + 2\mathbb{C}(1 - p) = 0 \\ A(1 - p) + \mathbb{C} = 0 \end{cases} \tag{A45}$$

The last row of (A45) shows that *C* = −*A*(1 − *p*), and using the first relationship, the second and the third row of Equation (A45) are equivalent to

$$\begin{cases} 2(1-p)[\gamma + p^2(1-\gamma) - (\mathbb{C}+D)] + 2(\mathbb{C}+D) + A - A(1-p)^2 = (1-p)^2\gamma\\ (1-p)^2[\gamma + p^2(1-\gamma) - (\mathbb{C}+D)] + 2A(1-p) + (\mathbb{C}+D) - 2A(1-p)^2 = 0 \end{cases} \quad \text{(A46)}$$

which gives

$$p(2-p)(\mathbb{C}+D) = -2p(1-p)A - (1-p)^2[\gamma + p^2(1-\gamma)]\tag{A47}$$

and

$$2p(\mathbb{C} + D) = -p(2 - p)A + (1 - p)^2 \gamma - 2(1 - p)[\gamma + p^2(1 - \gamma)] \tag{A48}$$

Combining Equations (A47) and (A48), the coefficient *A* is solved as

$$A = \frac{2 - p}{p^3} \left( (1 - p)^2 \gamma - \frac{2(1 - p)[\gamma + p^2(1 - \gamma)]}{2 - p} \right) \tag{A49}$$

and immediately

$$\mathcal{C} = -A(1-p) = -\frac{(1-p)(2-p)}{p^3} \left( (1-p)^2 \gamma - \frac{2(1-p)[\gamma + p^2(1-\gamma)]}{2-p} \right) \tag{A50}$$

Using Equation (A47), we have

$$\begin{split} D &= \frac{-2p(1-p)A - (1-p)^2[\gamma + p^2(1-\gamma)]}{p(2-p)} - \mathbb{C} \\ &= \frac{-2p(1-p)A - (1-p)^2[\gamma + p^2(1-\gamma)]}{p(2-p)} + A(1-p) \\ &= \frac{-p^2(1-p)A - (1-p)^2[\gamma + p^2(1-\gamma)]}{p(2-p)} \end{split} \tag{A51}$$

and in the end, the last number *B* is determined by

$$B = \left[\gamma + p^2(1 - \gamma)\right] - A - \mathcal{C} - D \tag{A52}$$

So far, all the coefficients have been obtained and the decomposition is completely determined.
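The coefficients can also be checked end to end: compute *A* from (A49), *C* from (A50), *C* + *D* from (A47), and *B* from (A52), and then compare both sides of the decomposition in (A44) at a few sample points (with the common factor *pγM*<sub>2</sub>*x*<sup>2</sup> divided out). The sketch below is an illustrative check only; the function names are our own.

```python
def appendix_c_coefficients(p, g):
    """Coefficients A, B, C, D of the decomposition in Appendix C,
    computed from (A49), (A50), (A47) and (A52)."""
    K = g + p ** 2 * (1 - g)
    A = (2 - p) / p ** 3 * ((1 - p) ** 2 * g
                            - 2 * (1 - p) * K / (2 - p))        # (A49)
    C = -A * (1 - p)                                            # (A50)
    CD = (-2 * p * (1 - p) * A - (1 - p) ** 2 * K) / (p * (2 - p))  # (A47)
    D = CD - C
    B = K - A - C - D                                           # (A52)
    return A, B, C, D

def check_appendix_c(p, g, xs=(0.1, 0.35, 0.6), tol=1e-9):
    """Pointwise check of the four-term partial fractions in (A44)."""
    A, B, C, D = appendix_c_coefficients(p, g)
    K = g + p ** 2 * (1 - g)
    for x in xs:
        lhs = (K - (1 - p) ** 2 * (1 - g) * g * x) / (
            (1 - (1 - g) * x) ** 2 * (1 - (1 - p) * (1 - g) * x) ** 2)
        rhs = (A / (1 - (1 - g) * x) + B / (1 - (1 - g) * x) ** 2
               + C / (1 - (1 - p) * (1 - g) * x)
               + D / (1 - (1 - p) * (1 - g) * x) ** 2)
        if abs(lhs - rhs) > tol:
            return False
    return True

assert check_appendix_c(0.5, 0.8)
```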

#### **References**


## *Article* **Using Timeliness in Tracking Infections †**

**Melih Bastopcu <sup>1</sup> and Sennur Ulukus <sup>2,\*</sup>**


**Abstract:** We consider real-time timely tracking of infection status (e.g., COVID-19) of individuals in a population. In this work, a health care provider wants to detect both infected people and people who have recovered from the disease as quickly as possible. In order to measure the timeliness of the tracking process, we use the long-term average difference between the actual infection status of the people and their real-time estimate by the health care provider based on the most recent test results. We first find an analytical expression for this average difference for given test rates, infection rates and recovery rates of people. Next, we propose an alternating minimization-based algorithm to find the test rates that minimize the average difference. We observe that if the total test rate is limited, instead of testing all members of the population equally, only a portion of the population may be tested in unequal rates calculated based on their infection and recovery rates. Next, we characterize the average difference when the test measurements are erroneous (i.e., noisy). Further, we consider the case where the infection status of individuals may be dependent, which occurs when an infected person spreads the disease to another person if they are not detected and isolated by the health care provider. In addition, we consider an age of incorrect information-based error metric where the staleness metric increases linearly over time as long as the health care provider does not detect the changes in the infection status of the people. Through extensive numerical results, we observe that increasing the total test rate helps track the infection status better. In addition, an increased population size increases diversity of people with different infection and recovery rates, which may be exploited to spend testing capacity more efficiently, thereby improving the system performance. 
Depending on the health care provider's preferences, test rate allocation can be adjusted to detect either the infected people or the recovered people more quickly. In order to combat any errors in the test, it may be more advantageous for the health care provider to not test everyone, and instead, apply additional tests to a selected portion of the population. In the case of people with dependent infection status, as we increase the total test rate, the health care provider detects the infected people more quickly, and thus, the average time that a person stays infected decreases. Finally, the error metric needs to be chosen carefully to meet the priorities of the health care provider, as the error metric used greatly influences who will be tested and at what test rate.

**Keywords:** timely infection tracking; age of information; timely tracking of multiple processes; Markovian infection spread model

#### **1. Introduction**

We consider the problem of timely tracking of an infectious disease, e.g., COVID-19, in a population of *n* people. In this problem, a health care provider wants to detect infected people as quickly as possible in order to take precautions such as isolating them from the rest of the population. The health care provider also wants to detect people who have recovered from the disease as soon as possible, since these people need to return to work, which is especially critical in sectors such as education, food retail, public transportation, etc. Ideally, the health care provider should test all people all the time. However, as the total test rate is limited, the question is how frequently the health care provider should apply tests to these people when their infection and recovery rates are known. In a broader sense, this problem is related to the timely tracking of multiple processes in a resource-constrained setting, where each process takes binary values 0 and 1 with different change rates.

**Citation:** Bastopcu, M.; Ulukus, S. Using Timeliness in Tracking Infections. *Entropy* **2022**, *24*, 779. https://doi.org/10.3390/e24060779

Academic Editor: Luca Faes

Received: 8 March 2022 Accepted: 27 May 2022 Published: 31 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Recent studies have shown that people who have recovered from infectious diseases such as COVID-19 can be reinfected. Furthermore, the recovery times of individuals may vary significantly. For these reasons, in this problem, the *i*th person becomes infected with rate *λi*, independently of the others. Similarly, the *i*th person recovers from the disease with rate *μi*. We note that the index *i* may represent a specific individual or a group of individuals that share common features such as age, gender, and profession. Depending on the demographics, the coefficients *λi* and *μi* may be statistically known by the health care provider. We denote the infection status of the *i*th person as *xi*(*t*) (shown with the black curves on the left in Figure 1), which takes the value 1 when the person is infected and the value 0 when the person is healthy. The health care provider applies tests to people marked as healthy with rate *si* and to people marked as infected with rate *ci*. Based on the test results, the health care provider forms an estimate of the infection status of the *i*th person, denoted by *x*ˆ*i*(*t*) (shown with the blue curves on the right in Figure 1), which takes the value 1 when the most recent test result is positive and the value 0 when it is negative.

**Figure 1.** System model. There are *n* people whose infection status are given by *xi*(*t*). The health care provider applies tests on these people. Based on the test results, estimations for the infection status *x*ˆ*i*(*t*) are generated. Infected people are shown in red and healthy people are shown in green.

We measure the timeliness of the tracking process by the difference between the actual infection status of people and the real-time estimate of the health care provider which is based on the most recent test results. The difference can occur in two different cases: (i) when the person is sick (*xi*(*t*) = 1) and the health care provider maps this person as healthy (*x*ˆ*i*(*t*) = 0), and (ii) when the person recovers from the disease (*xi*(*t*) = 0) but the health care provider still considers this person as infected (*x*ˆ*i*(*t*) = 1). The former case represents the error due to late detection of infected people, while the latter case represents the error due to late detection of healed people. Depending on the health care provider's preferences, detecting infected people may be more important than detecting recovered people (controlling infection), or the other way around (returning people to workforce).

The age of information was proposed to measure the timeliness of information in communication systems, and has been studied in the context of queueing systems [1–8], multi-hop and multi-cast networks [9–17], social networks [18], timely remote estimation of random processes [19–25], energy harvesting systems [26–40], wireless fading channels [41,42], scheduling in networks [43–55], lossless and lossy source and channel coding [56–66], vehicular, IoT and UAV systems [67–70], caching systems [71–82], computation-intensive systems [83–90], learning systems [91–93], gossip networks [94–97], and so forth. A more detailed review of the age of information literature can be found in references [98–100]. Most relevant to our work, the real-time timely estimation of single and multiple counting processes [19,25], a Wiener process [20], a random walk process [101], and binary and multi-state Markov sources [23,51,102] have been studied. The study closest to our work is reference [23], where the remote estimation of a symmetric binary Markov source is studied in a time-slotted system by finding the optimal sampling policies via formulating a Markov decision process (MDP) for the real-time error, AoI and AoII metrics. Different from [23], in our work, we consider real-time timely estimation of multiple non-symmetric binary sources in a continuous-time system. In our work, the sampler (health care provider) does not know the states of the sources (infection status of people), and thus takes the samples (applies medical tests) randomly (after exponentially distributed durations) with fixed rates. Thus, we optimize the test rates of people to minimize the real-time estimation error.

In this paper, we consider the real-time timely tracking of infection status of *n* people. We first find an analytical expression for the long-term average difference between the actual infection status of people and the estimate of the health care provider based on test results. Then, we propose an alternating minimization-based algorithm to identify the test rates *si* and *ci* for all people. We observe that if the total test rate is limited, we may not apply tests on all people equally. Next, we provide an alternative method to characterize the average difference, by finding the steady state of a Markov chain defined by (*xi*(*t*), *x*ˆ*i*(*t*)). By using this alternative method, we determine the average estimation error when there are errors in the test measurements expressed by a false positive rate *p* and a false negative rate *q*. Next, we consider the infection status of two people where an infected person may spread the disease to another person if the infection has not been detected by the health care provider to consequently isolate the infected person. Finally, we consider an age of incorrect information-based error metric where the estimation error increases linearly over time when the health care provider has not detected the changes in the infection status of the people.

Through extensive numerical results, we observe that increasing the total test rate helps track the infection status of people better, and increasing the size of the population increases diversity which may be exploited to improve the performance. Depending on the health care provider's priorities, we can allocate additional tests to people marked as healthy to detect the infections faster or to people marked as infected to detect the recoveries more quickly. In order to combat the test errors, the health care provider may prefer to apply tests to only a selected portion of the population with higher test rates. When the infection status of a person depends on that of another person, the average time that a person remains infected can be reduced by increasing the total test rate as it helps to detect the infected people more quickly. Finally, we observe that depending on the error metric used, the test rate distribution among the population differs greatly, and thus, we should choose an error metric that aligns with the priorities of the health care provider.

#### **2. System Model**

We consider a population of *n* people. We denote the infection status of the *i*th person at time *t* as *xi*(*t*) (black curve in Figure 2a) which takes binary values 0 or 1 as follows,

$$x\_i(t) = \begin{cases} 1, & \text{if the } i \text{th person is infected at time } t, \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$

In this paper, we consider a model where each person can be infected multiple times after recovering from the disease. We denote the time interval that the *i*th person stays healthy for the *j*th time as *Wi*(*j*) which is exponentially distributed with rate *λi*. We denote the recovery time for the *i*th person after being infected with the virus for the *j*th time as *Ri*(*j*) which is exponentially distributed with rate *μi*.

A health care provider wants to track the infection status of each person. Based on the test results at times *t<sub>i,ℓ</sub>*, the health care provider generates an estimate for the status of the *i*th person denoted as *x*ˆ*i*(*t*) (blue curve in Figure 2a) by

$$
\hat{x}\_i(t) = x\_i(t\_{i,\ell}), \quad t\_{i,\ell} \le t < t\_{i,\ell+1}. \tag{2}
$$

When *x*ˆ*i*(*t*) is 1, the health care provider applies the next test to the *i*th person after an exponentially distributed time with rate *ci*. When *x*ˆ*i*(*t*) is 0, the next test is applied to the *i*th person after an exponentially distributed time with rate *si*.

**Figure 2.** (**a**) A sample evolution of *xi*(*t*) and *x*ˆ*i*(*t*), and (**b**) the corresponding Δ*i*(*t*) in (5). Green areas correspond to the error caused by Δ*i*1(*t*) in (3). Orange areas correspond to the error caused by Δ*i*2(*t*) in (4).

An estimation error happens when the actual infection status of the *i*th person, *xi*(*t*), is different than the estimate of the health care provider, *x*ˆ*i*(*t*), at time *t*. This could happen in two ways: when *xi*(*t*) = 1 and *x*ˆ*i*(*t*) = 0, i.e., when the *i*th person is sick, but remains undetected by the health care provider, and when *xi*(*t*) = 0 and *x*ˆ*i*(*t*) = 1, i.e., when the *i*th person has recovered, but the health care provider is unaware that the *i*th person has recovered.

We denote the error caused by the former case, i.e., when *xi*(*t*) = 1 and *x*ˆ*i*(*t*) = 0, by Δ*i*1(*t*) (green areas in Figure 2b),

$$\Delta\_{i1}(t) = \max\{x\_i(t) - \hat{x}\_i(t), 0\},\tag{3}$$

and we denote the error caused by the latter case, i.e., when *xi*(*t*) = 0 and *x*ˆ*i*(*t*) = 1, by Δ*i*2(*t*) (orange areas in Figure 2b),

$$\Delta\_{i2}(t) = \max\{\hat{x}\_i(t) - x\_i(t), 0\}. \tag{4}$$

Then, the total estimation error for the *i*th person Δ*i*(*t*) is

$$
\Delta\_i(t) = \theta \Delta\_{i1}(t) + (1 - \theta)\Delta\_{i2}(t), \tag{5}
$$

where *θ* is the importance factor in [0, 1]. A large *θ* gives more importance to the detection of infected people, and a small *θ* gives more importance to the detection of recovered people. We define the long-term weighted average difference between *xi*(*t*) and *x*ˆ*i*(*t*) as

$$\Delta\_i = \lim\_{T \to \infty} \frac{1}{T} \int\_0^T \Delta\_i(t)\,dt. \tag{6}$$

Then, the overall average difference of all people Δ is

$$
\Delta = \frac{1}{n} \sum\_{i=1}^{n} \Delta\_i. \tag{7}
$$
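The long-term averages in (5)–(7) can also be estimated empirically. Below is a minimal event-driven simulation sketch of one person's joint state (*xi*(*t*), *x*ˆ*i*(*t*)); the function and variable names are our own, and tests taken while the estimate already agrees with the true status are omitted since (with perfect tests) they do not change the state.

```python
import random

def simulate_delta(lam, mu, s, c, theta, horizon, seed=0):
    """Estimate the long-term weighted error (6) for one person by simulating
    the joint state (x, xhat): x is the true infection status, xhat the
    provider's estimate based on the most recent test result."""
    rng = random.Random(seed)
    x, xhat = 0, 0                     # start healthy and marked healthy
    t, err_area = 0.0, 0.0
    while t < horizon:
        # outgoing transition rates from the current joint state; tests in
        # agreeing states change nothing and are left out
        if (x, xhat) == (0, 0):
            rates = {'infect': lam}
        elif (x, xhat) == (1, 0):
            rates = {'recover': mu, 'test': s}
        elif (x, xhat) == (1, 1):
            rates = {'recover': mu}
        else:                          # state (0, 1)
            rates = {'infect': lam, 'test': c}
        total = sum(rates.values())
        dwell = min(rng.expovariate(total), horizon - t)
        # accumulate the weighted error of (5) while x != xhat
        if (x, xhat) == (1, 0):
            err_area += theta * dwell
        elif (x, xhat) == (0, 1):
            err_area += (1 - theta) * dwell
        t += dwell
        if t >= horizon:
            break
        u = rng.random() * total       # pick which event fires
        for event, r in rates.items():
            u -= r
            if u <= 0:
                break
        if event == 'infect':
            x = 1
        elif event == 'recover':
            x = 0
        else:                          # a (perfect) test reveals the true status
            xhat = x
    return err_area / horizon
```

For example, with *λi* = 1, *μi* = 2, *si* = *ci* = 3 and *θ* = 1/2, the estimate settles near the closed-form value derived in Section 3.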

Our aim is to track the infection status of all people. Due to limited resources, there is a total test rate constraint $\sum\_{i=1}^{n} s\_i + \sum\_{i=1}^{n} c\_i \le C$. Thus, our aim is to find the optimal test rates *si* and *ci* that minimize Δ in (7) while satisfying this total test rate constraint. We formulate the following problem,

$$\begin{array}{ll}\min\limits\_{\{s\_i, c\_i\}} & \Delta\\ \text{s.t.} & \sum\_{i=1}^n s\_i + \sum\_{i=1}^n c\_i \le C \\ & s\_i \ge 0, \quad c\_i \ge 0, \quad i = 1, \dots, n. \end{array} \tag{8}$$

We summarize the variables used in this work in Table 1. In the next section, we find the total average difference Δ.

**Table 1.** List of variables used in this work.


#### **3. Average Difference Analysis**

In this section, we provide a probabilistic analysis to characterize the average difference Δ. In Section 5.1, we give an alternative method to find Δ by analyzing the steady-state distribution of the Markov chain induced by the states (*xi*(*t*), *x*ˆ*i*(*t*)). Here, we first find analytical expressions for Δ*i*1(*t*) in (3) and Δ*i*2(*t*) in (4) when *si* > 0 and *ci* > 0. We note that Δ*i*1(*t*) can be equal to 1 only when *x*ˆ*i*(*t*) = 0, and is always equal to 0 when *x*ˆ*i*(*t*) = 1. Assume that at time 0, both *xi*(0) and *x*ˆ*i*(0) are 0. After an exponentially distributed time with rate *λi*, denoted by *Wi*, the *i*th person is infected, and thus *xi*(*t*) becomes 1. At that time, since *x*ˆ*i*(*t*) = 0, Δ*i*1(*t*) becomes 1. Further, Δ*i*1(*t*) returns to 0 either when the *i*th person recovers from the disease, which happens after *Ri*, exponentially distributed with rate *μi*, or when the health care provider performs a test on the *i*th person after *Di*, exponentially distributed with rate *si*. We define *Tm*(*i*) as the earliest time at which one of these two events happens, i.e., *Tm*(*i*) = min{*Ri*, *Di*} (shown by the green areas in Figure 3a). We note that *Tm*(*i*) is also exponentially distributed, with rate *μi* + *si*, and we have $\mathbb{P}(T\_m(i) = R\_i) = \frac{\mu\_i}{\mu\_i + s\_i}$ and $\mathbb{P}(T\_m(i) = D\_i) = \frac{s\_i}{\mu\_i + s\_i}$. If the *i*th person recovers from the disease before testing, we return to the initial case where both *xi*(*t*) and *x*ˆ*i*(*t*) are equal to 0 again. In this case, the cycle repeats itself, i.e., the *i*th person becomes sick again after *Wi* and Δ*i*1(*t*) remains at 1 until either the person recovers or the health care provider performs a test, which takes another *Tm*(*i*) duration. If the health care provider performs a test before the person recovers, then *x*ˆ*i*(*t*) becomes 1. 
We denote the time interval for which *x*ˆ*i*(*t*) stays at 0 as *Ii*1, which is given by

$$I\_{i1} = \sum\_{\ell=1}^{K\_1} T\_m(i,\ell) + W\_i(\ell), \tag{9}$$

where $K\_1$ is geometric with parameter $\mathbb{P}(T\_m(i) = D\_i) = \frac{s\_i}{\mu\_i + s\_i}$. Due to [103] (Prob. 9.4.1), $\sum\_{\ell=1}^{K\_1} T\_m(i,\ell)$ and $\sum\_{\ell=1}^{K\_1} W\_i(\ell)$ are exponentially distributed with rates $s\_i$ and $\frac{\lambda\_i s\_i}{\mu\_i + s\_i}$, respectively. As $\mathbb{E}[I\_{i1}] = \mathbb{E}\big[\sum\_{\ell=1}^{K\_1} T\_m(i,\ell)\big] + \mathbb{E}\big[\sum\_{\ell=1}^{K\_1} W\_i(\ell)\big]$, we have

$$\mathbb{E}[I\_{i1}] = \frac{1}{s\_i} + \frac{s\_i + \mu\_i}{s\_i \lambda\_i}. \tag{10}$$

When *x*ˆ*i*(*t*) = 1, the health care provider marks the *i*th person as infected. The *i*th person recovers from the virus after *Ri*. After the *i*th person recovers, either the health care provider performs a test after *Zi*, which is exponentially distributed with rate *ci*, or the *i*th person is reinfected with the virus, which takes *Wi* time. We define *Tu*(*i*) as the earliest time at which one of these two events happens, i.e., *Tu*(*i*) = min{*Wi*, *Zi*} (shown by the orange areas in Figure 3b). Similarly, we note that *Tu*(*i*) is exponentially distributed with rate *λi* + *ci*, and we have $\mathbb{P}(T\_u(i) = W\_i) = \frac{\lambda\_i}{\lambda\_i + c\_i}$ and $\mathbb{P}(T\_u(i) = Z\_i) = \frac{c\_i}{\lambda\_i + c\_i}$. If the person is reinfected with the virus before a test is applied, this cycle repeats itself, i.e., the *i*th person recovers after another *Ri*, and then either a test is applied to the *i*th person, or the person is infected again, which takes another *Tu*(*i*). If the health care provider performs a test on the *i*th person before the person is reinfected, the health care provider marks the *i*th person as healthy again, i.e., *x*ˆ*i*(*t*) becomes 0. We denote the time interval during which *x*ˆ*i*(*t*) is equal to 1 as *Ii*2, which is given by

$$I\_{i2} = \sum\_{\ell=1}^{K\_2} T\_u(i,\ell) + R\_i(\ell),\tag{11}$$

where $K\_2$ is geometric with parameter $\mathbb{P}(T\_u(i) = Z\_i) = \frac{c\_i}{\lambda\_i + c\_i}$. Similarly, $\sum\_{\ell=1}^{K\_2} T\_u(i,\ell)$ and $\sum\_{\ell=1}^{K\_2} R\_i(\ell)$ are exponentially distributed with rates $c\_i$ and $\frac{c\_i \mu\_i}{\lambda\_i + c\_i}$, respectively. As $\mathbb{E}[I\_{i2}] = \mathbb{E}\big[\sum\_{\ell=1}^{K\_2} T\_u(i,\ell)\big] + \mathbb{E}\big[\sum\_{\ell=1}^{K\_2} R\_i(\ell)\big]$, we have

$$\mathbb{E}[I\_{i2}] = \frac{1}{c\_i} + \frac{c\_i + \lambda\_i}{c\_i \mu\_i}. \tag{12}$$

We denote the time interval between the *j*th and (*j* + 1)th times that *x*ˆ*i*(*t*) changes from 1 to 0 as the *j*th cycle *Ii*(*j*), where *Ii*(*j*) = *Ii*1(*j*) + *Ii*2(*j*). We note that Δ*i*1(*t*) is always equal to 0 during *Ii*2(*j*), i.e., while *x*ˆ*i*(*t*) = 1, and Δ*i*1(*t*) is equal to 1 when *xi*(*t*) = 1 during *Ii*1(*j*). We denote the total time duration for which Δ*i*1(*t*) is equal to 1 during the *j*th cycle as *Te*,1(*i*, *j*), where $T\_{e,1}(i,j) = \sum\_{\ell=1}^{K\_1} T\_m(i,\ell)$. Thus, we have $\mathbb{E}[T\_{e,1}(i)] = \frac{1}{s\_i}$. Then, using ergodicity, similar to [80], Δ*i*1 is equal to

$$\Delta\_{i1} = \frac{\mathbb{E}[T\_{e,1}(i)]}{\mathbb{E}[I\_i]} = \frac{\mathbb{E}[T\_{e,1}(i)]}{\mathbb{E}[I\_{i1}] + \mathbb{E}[I\_{i2}]}.\tag{13}$$

Thus, we have

$$
\Delta\_{i1} = \frac{\mu\_i \lambda\_i}{\mu\_i + \lambda\_i} \frac{c\_i}{\mu\_i c\_i + \lambda\_i s\_i + c\_i s\_i}. \tag{14}
$$

Next, we find Δ*i*2. We note that Δ*i*2(*t*) is equal to 1 when *xi*(*t*) = 0 during *Ii*2(*j*) and is always equal to 0 during *Ii*1(*j*). Similarly, we denote the total time duration for which Δ*i*2(*t*) is equal to 1 in the *j*th cycle *Ii*(*j*) as *Te*,2(*i*, *j*), which is equal to $T\_{e,2}(i,j) = \sum\_{\ell=1}^{K\_2} T\_u(i,\ell)$. Thus, we have $\mathbb{E}[T\_{e,2}(i)] = \frac{1}{c\_i}$. Then, similar to Δ*i*1 in (13), Δ*i*2 is equal to

$$
\Delta\_{i2} = \frac{\mu\_i \lambda\_i}{\mu\_i + \lambda\_i} \frac{s\_i}{\mu\_i c\_i + \lambda\_i s\_i + c\_i s\_i}. \tag{15}
$$

By using (5), (14), and (15), we obtain Δ*<sup>i</sup>* as

$$
\Delta\_i = \frac{\mu\_i \lambda\_i}{\mu\_i + \lambda\_i} \frac{\theta c\_i + (1 - \theta) s\_i}{\mu\_i c\_i + \lambda\_i s\_i + c\_i s\_i}. \tag{16}
$$

Then, by inserting (16) in (7), we obtain Δ. In the next section, we solve the optimization problem in (8).
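The closed form (16) and the population average (7) are straightforward to evaluate; the helper functions below (names our own) illustrate this.

```python
def delta_i(lam, mu, s, c, theta):
    """Long-term average error for one person, eq. (16)."""
    return (mu * lam / (mu + lam)) * (theta * c + (1 - theta) * s) \
        / (mu * c + lam * s + c * s)

def delta_total(lams, mus, ss, cs, theta):
    """Population-average error, eq. (7)."""
    people = zip(lams, mus, ss, cs)
    return sum(delta_i(l, m, s, c, theta) for l, m, s, c in people) / len(lams)
```

As a sanity check, scaling both test rates up drives Δ*i* toward 0, and Δ*i* always lies between 0 and 1.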

**Figure 3.** A sample evolution of (**a**) Δ*i*1(*t*), and (**b**) Δ*i*2(*t*) in a typical cycle.

#### **4. Optimization of Average Difference**

In this section, we solve the optimization problem in (8). Using Δ*<sup>i</sup>* in (16) in (7), we rewrite (8) as

$$\begin{aligned} \min\_{\{s\_i, c\_i\}} & \quad \sum\_{i=1}^n \frac{\mu\_i \lambda\_i}{\mu\_i + \lambda\_i} \frac{\theta c\_i + (1 - \theta) s\_i}{\mu\_i c\_i + \lambda\_i s\_i + c\_i s\_i} \\ \text{s.t.} & \quad \sum\_{i=1}^n s\_i + \sum\_{i=1}^n c\_i \le C \\ & \quad s\_i \ge 0, \quad c\_i \ge 0, \quad i = 1, \dots, n. \end{aligned} \tag{17}$$

We define the Lagrangian function [104] for (17) as

$$\mathcal{L} = \sum\_{i=1}^{n} \frac{\mu\_i \lambda\_i}{\mu\_i + \lambda\_i} \frac{\theta c\_i + (1 - \theta) s\_i}{\mu\_i c\_i + \lambda\_i s\_i + c\_i s\_i} + \beta \left( \sum\_{i=1}^{n} (s\_i + c\_i) - C \right) - \sum\_{i=1}^{n} \nu\_i s\_i - \sum\_{i=1}^{n} \eta\_i c\_i, \tag{18}$$

where *β* ≥ 0, *ν<sup>i</sup>* ≥ 0, and *η<sup>i</sup>* ≥ 0. The KKT conditions are

$$\frac{\partial \mathcal{L}}{\partial s\_{i}} = \frac{\mu\_{i} \lambda\_{i} c\_{i}}{\mu\_{i} + \lambda\_{i}} \frac{(1 - \theta)\mu\_{i} - \theta(c\_{i} + \lambda\_{i})}{(\mu\_{i} c\_{i} + \lambda\_{i} s\_{i} + s\_{i} c\_{i})^{2}} + \beta - \nu\_{i} = 0,\tag{19}$$

$$\frac{\partial \mathcal{L}}{\partial c\_{i}} = \frac{\mu\_{i}\lambda\_{i}s\_{i}}{\mu\_{i} + \lambda\_{i}} \frac{\theta \lambda\_{i} - (1 - \theta)(\mu\_{i} + s\_{i})}{(\mu\_{i}c\_{i} + \lambda\_{i}s\_{i} + s\_{i}c\_{i})^{2}} + \beta - \eta\_{i} = 0,\tag{20}$$

for all *i*. The complementary slackness conditions are

$$
\beta \left( \sum\_{i=1}^{n} (s\_i + c\_i) - C \right) = 0, \quad \nu\_i s\_i = 0, \quad \eta\_i c\_i = 0. \tag{21}
$$

First, we find *si*. From (19), we have

$$\left(\mu\_i c\_i + \lambda\_i s\_i + s\_i c\_i\right)^2 = \frac{\mu\_i \lambda\_i c\_i}{\mu\_i + \lambda\_i} \frac{\theta(c\_i + \lambda\_i) - (1 - \theta)\mu\_i}{\beta - \nu\_i}.\tag{22}$$

When *θ*(*ci* + *λi*) ≥ (1 − *θ*)*μi*, we solve (22) for *si* as

$$s\_i = \frac{\mu\_i c\_i}{\lambda\_i + c\_i} \left( \sqrt{\frac{1}{\mu\_i c\_i} \frac{\lambda\_i}{\mu\_i + \lambda\_i} \frac{\theta(c\_i + \lambda\_i) - (1 - \theta)\mu\_i}{\beta}} - 1 \right)^+,\tag{23}$$

where we used the fact that, due to (21), we either have *si* > 0 and *νi* = 0, or *si* = 0 and *νi* ≥ 0. Here, $(\cdot)^+ = \max(\cdot, 0)$. On the other hand, when $\theta(c\_i + \lambda\_i) < (1 - \theta)\mu\_i$, we have $\frac{\partial \Delta\_i}{\partial s\_i} > 0$, and thus it is optimal to choose *si* = 0, as our aim is to minimize Δ in (7). In this case, when *si* = 0, we have $\Delta\_i = \frac{\theta \lambda\_i}{\mu\_i + \lambda\_i}$, which is independent of the value of *ci*. As we obtain the same Δ*i* for all values of *ci*, and the total test rate is limited, i.e., $\sum\_{i=1}^{n} (s\_i + c\_i) \le C$, it is optimal in this case to choose *ci* = 0 as well.

Next, we find *ci*. From (20), we have

$$(\mu\_i c\_i + \lambda\_i s\_i + s\_i c\_i)^2 = \frac{\mu\_i \lambda\_i s\_i}{\mu\_i + \lambda\_i} \frac{(1 - \theta)(\mu\_i + s\_i) - \theta \lambda\_i}{\beta - \eta\_i}. \tag{24}$$

When (1 − *θ*)(*μ<sup>i</sup>* + *si*) ≥ *θλi*, we solve (24) for *ci* as

$$c\_{i} = \frac{\lambda\_{i}s\_{i}}{\mu\_{i} + s\_{i}} \left( \sqrt{\frac{1}{\lambda\_{i}s\_{i}}} \frac{\mu\_{i}}{\mu\_{i} + \lambda\_{i}} \frac{(1-\theta)(s\_{i} + \mu\_{i}) - \theta\lambda\_{i}}{\beta}} - 1 \right)^{+},\tag{25}$$

where we used the fact that, due to (21), we either have *ci* > 0 and *ηi* = 0, or *ci* = 0 and *ηi* ≥ 0. Similarly, when $(1 - \theta)(s\_i + \mu\_i) < \theta\lambda\_i$, we have $\frac{\partial \Delta\_i}{\partial c\_i} > 0$. Thus, in this case, it is optimal to choose *ci* = 0. When *ci* = 0, we have $\Delta\_i = \frac{(1-\theta)\mu\_i}{\mu\_i + \lambda\_i}$, which is independent of the value of *si*. Thus, it is optimal to choose *si* = 0 when *ci* = 0.

From (23), if $\frac{1}{\mu\_i c\_i} \frac{\lambda\_i}{\mu\_i + \lambda\_i} \left(\theta(c\_i + \lambda\_i) - (1 - \theta)\mu\_i\right) \le \beta$, we must have *si* = 0. Thus, for a given *ci*, the optimal test rate allocation policy for *si* is a *threshold policy*: the *si* with small $\frac{1}{\mu\_i c\_i} \frac{\lambda\_i}{\mu\_i + \lambda\_i} \left(\theta(c\_i + \lambda\_i) - (1 - \theta)\mu\_i\right)$ are set equal to zero. Similarly, from (25), if $\frac{1}{\lambda\_i s\_i} \frac{\mu\_i}{\mu\_i + \lambda\_i} \left((1 - \theta)(s\_i + \mu\_i) - \theta\lambda\_i\right) \le \beta$, we must have *ci* = 0. Thus, for a given *si*, the optimal policy to determine *ci* is a *threshold policy*: the *ci* with small $\frac{1}{\lambda\_i s\_i} \frac{\mu\_i}{\mu\_i + \lambda\_i} \left((1 - \theta)(s\_i + \mu\_i) - \theta\lambda\_i\right)$ are set equal to zero.

Next, we show that in the optimal policy, if *si* > 0 and *ci* > 0 for some *i*, then the total test rate constraint must be satisfied with equality, i.e., $\sum\_{i=1}^{n} (s\_i + c\_i) = C$.

**Lemma 1.** *In the optimal policy, if $s\_i > 0$ and $c\_i > 0$ for some i, then we have $\sum\_{i=1}^{n} (s\_i + c\_i) = C$.*

**Proof of Lemma 1.** The derivatives of Δ*<sup>i</sup>* with respect to *si* and *ci* are

$$\frac{\partial \Delta\_{\dot{i}}}{\partial \mathbf{s}\_{\dot{i}}} = \frac{\mu\_{i}\lambda\_{i}\mathbf{c}\_{\dot{i}}}{\mu\_{i} + \lambda\_{\dot{i}}} \frac{\left(1 - \theta\right)\mu\_{i} - \theta\left(\mathbf{c}\_{\dot{i}} + \lambda\_{\dot{i}}\right)}{\left(\mathbf{c}\_{i}\mu\_{i} + \mathbf{s}\_{i}\mathbf{c}\_{\dot{i}} + \lambda\_{i}\mathbf{s}\_{\dot{i}}\right)^{2}}\,\mathrm{}\tag{26}$$

$$\frac{\partial \Delta\_i}{\partial c\_i} = \frac{\mu\_i \lambda\_i s\_i}{\mu\_i + \lambda\_i} \frac{\theta \lambda\_i - (1 - \theta)(s\_i + \mu\_i)}{\left(c\_i \mu\_i + s\_i c\_i + \lambda\_i s\_i\right)^2}. \tag{27}$$

We note that *si* > 0 in (23) implies that $\theta(c\_i + \lambda\_i) > (1 - \theta)\mu\_i$. In this case, we have $\frac{\partial \Delta\_i}{\partial s\_i} < 0$. Similarly, *ci* > 0 in (25) implies that $(1 - \theta)(s\_i + \mu\_i) > \theta\lambda\_i$, and thus $\frac{\partial \Delta\_i}{\partial c\_i} < 0$. Therefore, in the optimal policy, if we have *si* > 0 and *ci* > 0 for some *i*, then we must have $\sum\_{i=1}^{n} (s\_i + c\_i) = C$. Otherwise, we could further decrease Δ in (7) by increasing *ci* or *si*.
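The sign conditions used in this proof can be checked numerically by differencing the closed form (16); the sketch below uses a central finite difference (helper names our own) instead of the analytical derivative (26).

```python
def delta_i(lam, mu, s, c, theta):
    # closed-form per-person error, eq. (16)
    return (mu * lam / (mu + lam)) * (theta * c + (1 - theta) * s) \
        / (mu * c + lam * s + c * s)

def d_delta_ds(lam, mu, s, c, theta, h=1e-6):
    # central finite difference approximating the partial derivative in (26)
    return (delta_i(lam, mu, s + h, c, theta)
            - delta_i(lam, mu, s - h, c, theta)) / (2 * h)
```

With *λi* = 1, *μi* = 2, *ci* = 3: for *θ* = 0.5, *θ*(*ci* + *λi*) = 2 > (1 − *θ*)*μi* = 1, so the derivative is negative (more testing of people marked healthy helps); for *θ* = 0.1 the inequality flips and the derivative is positive.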

Next, we propose an alternating minimization-based algorithm for finding *si* and *ci*. For this purpose, for given initial (*si*, *ci*) pairs, we define *φi* as

$$\phi\_{i} = \begin{cases} \frac{1}{\mu\_{i}c\_{i}} \frac{\lambda\_{i}}{\mu\_{i} + \lambda\_{i}} (\theta(c\_{i} + \lambda\_{i}) - (1 - \theta)\mu\_{i}), \ i = 1, \ldots, n, \\\frac{1}{\lambda\_{i}s\_{i}} \frac{\mu\_{i}}{\mu\_{i} + \lambda\_{i}} ((1 - \theta)(s\_{i} + \mu\_{i}) - \theta\lambda\_{i}), \ i = n + 1, \ldots, 2n. \end{cases} \tag{28}$$

Then, we define *ui* as

$$u\_{i} = \begin{cases} \frac{\mu\_{i}c\_{i}}{\lambda\_{i} + c\_{i}} \left(\sqrt{\frac{\phi\_{i}}{\beta}} - 1\right)^{+}, & i = 1, \ldots, n, \\ \frac{\lambda\_{i}s\_{i}}{\mu\_{i} + s\_{i}} \left(\sqrt{\frac{\phi\_{i}}{\beta}} - 1\right)^{+}, & i = n + 1, \ldots, 2n. \end{cases} \tag{29}$$

From (23) and (25), *si* = *ui* and *ci* = *u<sub>n+i</sub>*, for *i* = 1, …, *n*.

Next, we find *si* and *ci* by determining *β* in (29). First, assume that, in the optimal policy, there is an *i* such that *si* > 0 and *ci* > 0. Thus, by Lemma 1, we must have $\sum\_{i=1}^{n} (s\_i + c\_i) = C$. We initially take random (*si*, *ci*) pairs such that $\sum\_{i=1}^{n} (s\_i + c\_i) = C$. Then, given the initial (*si*, *ci*) pairs, we immediately choose *ui* = 0 for *φi* < 0. For the remaining *ui* with *φi* ≥ 0, we apply a solution method similar to that in [80]. By assuming *φi* ≥ *β*, i.e., by disregarding $(\cdot)^+$ in (29), we solve $\sum\_{i=1}^{2n} u\_i = C$ for *β*. Then, we compare the smallest positive *φi* in (28) with *β*. If *φi* ≥ *β*, then *ui* ≥ 0 for all remaining *i*, and we have obtained the *ui* values for the given initial (*si*, *ci*) pairs. If the smallest positive *φi* is smaller than *β*, then the corresponding *ui* is negative, and we set that *ui* = 0. We repeat this procedure until the smallest remaining positive *φi* is larger than *β*. After determining all *ui*, we obtain *si* = *ui* and *ci* = *u<sub>n+i</sub>* for *i* = 1, …, *n*. Then, with the updated (*si*, *ci*) pairs, we keep recomputing the *ui* until the KKT conditions in (19) and (20) are satisfied.
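A rough sketch of this alternating procedure is given below. It is our own simplified variant, not the authors' exact implementation: a log-scale bisection on *β* replaces the exact water-filling step, persons whose *φi* is non-positive (or whose rates have collapsed to zero) are dropped, a fixed iteration budget replaces the KKT stopping test, and we read the second branch of (28)–(29) as indexed by *i* − *n*.

```python
import math

def phi_vec(lam, mu, s, c, theta):
    """phi_i of eq. (28) at the current (s, c); entries 0..n-1 drive the
    s-updates, entries n..2n-1 the c-updates."""
    n = len(lam)
    out = []
    for i in range(n):
        if c[i] <= 0:
            out.append(0.0)            # person effectively dropped
        else:
            out.append((1.0 / (mu[i] * c[i])) * (lam[i] / (mu[i] + lam[i]))
                       * (theta * (c[i] + lam[i]) - (1 - theta) * mu[i]))
    for i in range(n):
        if s[i] <= 0:
            out.append(0.0)
        else:
            out.append((1.0 / (lam[i] * s[i])) * (mu[i] / (mu[i] + lam[i]))
                       * ((1 - theta) * (s[i] + mu[i]) - theta * lam[i]))
    return out

def u_vec(beta, ph, lam, mu, s, c):
    """u_i of eq. (29); the (.)^+ clips negative entries to zero."""
    n = len(lam)
    out = []
    for i in range(2 * n):
        if ph[i] <= 0.0:
            out.append(0.0)
            continue
        if i < n:
            factor = mu[i] * c[i] / (lam[i] + c[i])
        else:
            factor = lam[i - n] * s[i - n] / (mu[i - n] + s[i - n])
        out.append(factor * max(math.sqrt(ph[i] / beta) - 1.0, 0.0))
    return out

def alternating_minimization(lam, mu, C, theta, outer=100, inner=200):
    """Alternate between computing phi at the current rates and re-solving
    sum(u_i) = C for beta; sum(u_i) is decreasing in beta, so we bisect."""
    n = len(lam)
    s = [C / (2 * n)] * n              # feasible uniform starting point
    c = [C / (2 * n)] * n
    for _ in range(outer):
        ph = phi_vec(lam, mu, s, c, theta)
        lo, hi = 1e-12, 1e12
        for _ in range(inner):
            beta = math.sqrt(lo * hi)  # bisect on a log scale since beta > 0
            if sum(u_vec(beta, ph, lam, mu, s, c)) > C:
                lo = beta
            else:
                hi = beta
        u = u_vec(hi, ph, lam, mu, s, c)
        s, c = u[:n], u[n:]
    return s, c
```

For a single person with symmetric rates *λ* = *μ* = 1, *θ* = 1/2 and *C* = 4, the iteration settles at the symmetric split *s* = *c* = 2, with the constraint met with equality as Lemma 1 requires.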

We note that for indices (persons) *i* for which (*si*, *ci*) are zero, the health care provider does not perform any tests, and maps these people as either always infected, i.e., *x*ˆ*i*(*t*) = 1 for all *t*, or always healthy, i.e., *x*ˆ*i*(*t*) = 0 for all *t*. If *x*ˆ*i*(*t*) = 0 for all *t*, then $\Delta\_i = \frac{\theta\lambda\_i}{\mu\_i + \lambda\_i}$, and if *x*ˆ*i*(*t*) = 1 for all *t*, then $\Delta\_i = \frac{(1-\theta)\mu\_i}{\mu\_i + \lambda\_i}$. Thus, for such *i*, the health care provider should choose *x*ˆ*i*(*t*) = 0 for all *t* if $\frac{\theta\lambda\_i}{\mu\_i + \lambda\_i} < \frac{(1-\theta)\mu\_i}{\mu\_i + \lambda\_i}$, and should choose *x*ˆ*i*(*t*) = 1 for all *t* otherwise, without performing any tests.

Finally, we note that the problem in (17) is not a convex optimization problem as the objective function is not jointly convex in *si* and *ci*. Therefore, the solutions obtained via the proposed method may not be globally optimal. For this reason, we select different initial starting points and apply the proposed alternating minimization-based algorithm and choose the solution that achieves the smallest Δ in (7).

In the next section, we first provide an alternative method to find the average difference Δ in (6) and then characterize the average difference for the erroneous test measurements.

#### **5. Average Difference for the Case with Erroneous Test Measurements**

We note that the infection status of the *i*th person and its estimate at the health care provider form a continuous time Markov chain (Section 7.5 of [105]) with the states (*xi*(*t*), *x*ˆ*i*(*t*)) ∈ {(0, 0),(0, 1),(1, 0),(1, 1)}. In this section, by finding the steady-state distribution for (*xi*(*t*), *x*ˆ*i*(*t*)), we provide an alternative method to find Δ in (6). Then, we consider the case with erroneous test measurements. For this case, we characterize the long-term average difference for the *i*th person, denoted by $\Delta\_i^e$.

#### *5.1. An Alternative Method to Characterize Average Difference*

When there is no error in the tests, the state transition graph is shown in Figure 4a. Assuming that *si* > 0 and *ci* > 0, every state is accessible from any other state, and thus the Markov chain induced by the system is irreducible. Note that in Section 4, we saw that the testing rates for some people can be equal to 0, i.e., *si* = 0 and *ci* = 0. For these people, we choose *x*ˆ*i*(*t*) to be either always 0 or always 1, i.e., we consider them as always healthy or always sick. Depending on the choice of *x*ˆ*i*(*t*), when *si* = 0 and *ci* = 0, either the states (0, 0) and (1, 0), or the states (0, 1) and (1, 1), will be transient, and thus have 0 probability in the steady state. By using a small time-step approximation to a discrete-time Markov chain, one can show that the self-transition probabilities are non-zero, and thus the Markov chain induced by the system is also aperiodic (Section 7.5 of [105]). Therefore, the Markov chain shown in Figure 4a admits a unique stationary distribution given by *π* = {*π*00, *π*01, *π*10, *π*11}. We find the stationary distribution by writing the local-balance equations, which are given as

$$
\pi_{00}\lambda_i = \pi_{10}\mu_i + \pi_{01}c_i, \tag{30}
$$

$$
\pi_{10}(\mu_i + s_i) = \pi_{00}\lambda_i, \tag{31}
$$

$$
\pi_{01}(c_i + \lambda_i) = \pi_{11}\mu_i, \tag{32}
$$

$$
\pi_{11}\mu_i = \pi_{10}s_i + \pi_{01}\lambda_i. \tag{33}
$$

By using (30)–(33) and $\pi_{00} + \pi_{01} + \pi_{10} + \pi_{11} = 1$, we find the steady-state distribution $\pi$ as

$$
\pi_{01} = \frac{\mu_i \lambda_i}{\mu_i + \lambda_i} \, \frac{s_i}{\mu_i c_i + \lambda_i s_i + c_i s_i}, \tag{34}
$$

$$
\pi_{10} = \frac{\mu_i \lambda_i}{\mu_i + \lambda_i} \, \frac{c_i}{\mu_i c_i + \lambda_i s_i + c_i s_i}, \tag{35}
$$

and $\pi_{00} = \frac{\mu_i + s_i}{\lambda_i}\pi_{10}$ and $\pi_{11} = \frac{c_i + \lambda_i}{\mu_i}\pi_{01}$. We note that $\Delta_{i1}$ in (14) is equal to $\pi_{10}$ in (35), i.e., we have $\Delta_{i1} = \pi_{10}$. Similarly, $\Delta_{i2}$ in (15) is equal to $\pi_{01}$ in (34). Thus, by observing that the states $(x_i(t), \hat{x}_i(t))$ form a continuous-time Markov chain, we can find the average difference $\Delta$ in (6) from the steady-state distribution $\pi$. This method will be particularly useful in the following section, where we consider the case with erroneous test measurements.
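
As a sanity check, the closed forms (34) and (35) can be verified numerically by building the generator matrix of the four-state chain in Figure 4a and solving $\pi Q = 0$. The following sketch (the state ordering and the rate values are our own illustrative choices, not from the paper) does exactly that:

```python
import numpy as np

def stationary(lm, mu, s, c):
    """Stationary distribution of the error-free chain in Figure 4a,
    states ordered [(0,0), (0,1), (1,0), (1,1)] for (x_i, xhat_i)."""
    Q = np.array([
        [0.0, 0.0, lm,  0.0],   # (0,0): infection -> (1,0)
        [c,   0.0, 0.0, lm ],   # (0,1): test corrects -> (0,0); infection -> (1,1)
        [mu,  0.0, 0.0, s  ],   # (1,0): recovery -> (0,0); test -> (1,1)
        [0.0, mu,  0.0, 0.0],   # (1,1): recovery -> (0,1)
    ])
    np.fill_diagonal(Q, -Q.sum(axis=1))
    A = np.vstack([Q.T, np.ones(4)])   # solve pi Q = 0 with sum(pi) = 1
    b = np.zeros(5); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

lm, mu, s, c = 0.8, 1.3, 2.0, 1.5      # illustrative rates
pi = stationary(lm, mu, s, c)
den = mu * c + lm * s + c * s
pi01 = mu * lm / (mu + lm) * s / den   # closed form (34)
pi10 = mu * lm / (mu + lm) * c / den   # closed form (35)
```

The numerical solution agrees with (34) and (35) to machine precision for any positive rates.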

**Figure 4.** Transition graphs of the states (*xi*(*t*), *x*ˆ*i*(*t*)) (**a**) when there is no error in the tests, and (**b**) when there are errors in the tests.

#### *5.2. Average Difference with Erroneous Test Measurements*

In this section, we consider the case where the test measurements can be erroneous. When a test is applied to an infected person, i.e., when $x_i(t) = 1$, the test result is 0 with probability $q$ and 1 with probability $1-q$, where $0 \leq q < \frac{1}{2}$. In other words, the false-negative probability is equal to $q$. Similarly, when a test is applied to a healthy person, i.e., when $x_i(t) = 0$, the test result is 1 with probability $p$ and 0 with probability $1-p$, where $0 \leq p < \frac{1}{2}$. Thus, the false-positive probability is equal to $p$. The probability distribution of the test measurements is provided in Table 2.

**Table 2.** The probability distribution for successful and false test measurements.

| Status $x_i(t)$ | Test Result 0 | Test Result 1 |
|---|---|---|
| 0 (healthy) | $1-p$ | $p$ |
| 1 (infected) | $q$ | $1-q$ |

In this section, we consider the case where the health care provider applies only one test rate $v_i$ to the $i$th person, whether the person is currently marked as healthy or infected. That is, we do not consider separate testing rates $s_i$ and $c_i$ for healthy and infected people as we did before; instead, here both $s_i$ and $c_i$ are equal to $v_i$. Since the health care provider applies the same test rate to the $i$th person, we do not consider the importance factor $\theta$ either. Then, we define the long-term average difference for the $i$th person with erroneous test measurements as follows, where the superscript $e$ stands for "erroneous":

$$
\Delta_i^e = \Delta_{i1}^e + \Delta_{i2}^e \tag{36}
$$

and the definitions of $\Delta_{i1}^e$ and $\Delta_{i2}^e$ follow similarly from (13). We note that with the test rate $v_i$ and errors in the test measurements, the states $(x_i(t), \hat{x}_i(t))$ form a continuous-time Markov chain, and the corresponding state transition graph is shown in Figure 4b. Assuming that $v_i > 0$, one can show that there is a unique steady-state distribution $\pi^e = \{\pi^e_{00}, \pi^e_{01}, \pi^e_{10}, \pi^e_{11}\}$, which can be found by solving the balance equations

$$
\pi^e_{00}(v_i p + \lambda_i) = \pi^e_{01} v_i (1-p) + \pi^e_{10}\mu_i, \tag{37}
$$

$$
\pi^e_{10}(v_i(1-q) + \mu_i) = \pi^e_{00}\lambda_i + \pi^e_{11} v_i q, \tag{38}
$$

$$
\pi^e_{01}(v_i(1-p) + \lambda_i) = \pi^e_{00} v_i p + \pi^e_{11}\mu_i, \tag{39}
$$

$$
\pi^e_{11}(v_i q + \mu_i) = \pi^e_{10} v_i (1-q) + \pi^e_{01}\lambda_i. \tag{40}
$$

Then, by using (37)–(40) and $\pi^e_{00} + \pi^e_{01} + \pi^e_{10} + \pi^e_{11} = 1$, we find the steady-state distribution $\pi^e$ as

$$
\pi^e_{00} = \frac{\mu_i \lambda_i q + (1-p)\mu_i(v_i+\mu_i)}{(\lambda_i+\mu_i)(\lambda_i+\mu_i+v_i)}, \tag{41}
$$

$$
\pi^e_{01} = \frac{\mu_i \lambda_i (1-q) + p\mu_i(v_i+\mu_i)}{(\lambda_i+\mu_i)(\lambda_i+\mu_i+v_i)}, \tag{42}
$$

$$
\pi^e_{10} = \frac{\mu_i \lambda_i (1-p) + q\lambda_i(v_i+\lambda_i)}{(\lambda_i+\mu_i)(\lambda_i+\mu_i+v_i)}, \tag{43}
$$

$$
\pi^e_{11} = \frac{\mu_i \lambda_i p + (1-q)\lambda_i(v_i+\lambda_i)}{(\lambda_i+\mu_i)(\lambda_i+\mu_i+v_i)}. \tag{44}
$$

We note that $\Delta^e_{i1}$ and $\Delta^e_{i2}$ are equal to $\pi^e_{10}$ in (43) and $\pi^e_{01}$ in (42), respectively. Thus, if $v_i > 0$, then $\Delta^e_i$ in (36) becomes

$$
\Delta_i^e = \frac{p\mu_i^2 + q\lambda_i^2 + (2-p-q)\mu_i\lambda_i + v_i(p\mu_i + q\lambda_i)}{(\lambda_i+\mu_i)(\lambda_i+\mu_i+v_i)}. \tag{45}
$$

We immediately note that if the false-positive probability $p$ and the false-negative probability $q$ are equal to 0, then $\Delta^e_i$ becomes $\frac{2\mu_i\lambda_i}{(\lambda_i+\mu_i)(\lambda_i+\mu_i+v_i)}$, which is equal to $\Delta_{i1} + \Delta_{i2}$ given in (14) and (15), respectively, when $v_i = s_i = c_i$. Then, $\frac{\partial \Delta^e_i}{\partial p} \geq 0$ is equivalent to $v_i + \mu_i - \lambda_i \geq 0$, and $\frac{\partial \Delta^e_i}{\partial q} \geq 0$ is equivalent to $v_i + \lambda_i - \mu_i \geq 0$. Since these two conditions cannot fail simultaneously, depending on the values of $v_i$, $\mu_i$, and $\lambda_i$, the long-term average difference $\Delta^e_i$ can be an increasing function of only $p$, of only $q$, or of both $p$ and $q$, but $\Delta^e_i$ cannot be a decreasing function of both $p$ and $q$. This is expected, as false-negative and false-positive tests negatively affect the estimation process. One can also show that $\frac{\partial \Delta^e_i}{\partial v_i} < 0$ and $\frac{\partial^2 \Delta^e_i}{\partial v_i^2} > 0$, which means that $\Delta^e_i$ is decreasing and convex in the test rate $v_i$.
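
The expression (45) can be checked the same way as in Section 5.1: build the generator of the chain in Figure 4b with error probabilities $p$ and $q$, solve for the stationary distribution, and compare $\pi^e_{01} + \pi^e_{10}$ against the closed form. A minimal sketch (rate values are illustrative assumptions):

```python
import numpy as np

def stationary_err(lm, mu, v, p, q):
    """Stationary distribution of the erroneous-test chain in Figure 4b,
    states ordered [(0,0), (0,1), (1,0), (1,1)] for (x_i, xhat_i)."""
    Q = np.array([
        [0.0,          v * p,   lm,    0.0        ],  # (0,0): false positive; infection
        [v * (1 - p),  0.0,     0.0,   lm         ],  # (0,1): correct test; infection
        [mu,           0.0,     0.0,   v * (1 - q)],  # (1,0): recovery; correct test
        [0.0,          mu,      v * q, 0.0        ],  # (1,1): recovery; false negative
    ])
    np.fill_diagonal(Q, -Q.sum(axis=1))
    A = np.vstack([Q.T, np.ones(4)])   # pi Q = 0 together with sum(pi) = 1
    b = np.zeros(5); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

lm, mu, v, p, q = 0.8, 1.3, 2.0, 0.1, 0.2    # illustrative rates
pi = stationary_err(lm, mu, v, p, q)
Z = (lm + mu) * (lm + mu + v)
delta_e = (p * mu**2 + q * lm**2 + (2 - p - q) * mu * lm
           + v * (p * mu + q * lm)) / Z      # closed form (45)
```

Here $\pi^e_{01} + \pi^e_{10}$ from the solver matches (45), and setting `p = q = 0` recovers the error-free values of Section 5.1.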

Next, we consider the case where $v_i = 0$. Note that when $v_i = 0$, the health care provider marks a person as always sick or always healthy depending on their infection and recovery rates. Thus, when $v_i = 0$, depending on the estimate $\hat{x}_i(t)$, two of the states in Figure 4b are never visited and therefore have zero steady-state probability. For this case, the steady-state probabilities are $\bar{\pi}^e_{0,\hat{x}_i}$ and $\bar{\pi}^e_{1,\hat{x}_i}$. The balance equation is $\lambda_i \bar{\pi}^e_{0,\hat{x}_i} = \mu_i \bar{\pi}^e_{1,\hat{x}_i}$. By using $\bar{\pi}^e_{0,\hat{x}_i} + \bar{\pi}^e_{1,\hat{x}_i} = 1$, we find the steady-state distribution as $\bar{\pi}^e_{0,\hat{x}_i} = \frac{\mu_i}{\mu_i+\lambda_i}$ and $\bar{\pi}^e_{1,\hat{x}_i} = \frac{\lambda_i}{\mu_i+\lambda_i}$. Thus, if $\mu_i < \lambda_i$, i.e., if the person is infected more often than healthy, then the health care provider chooses its estimate as $\hat{x}_i(t) = 1$ and $\Delta^e_i = \frac{\mu_i}{\mu_i+\lambda_i}$. If $\mu_i \geq \lambda_i$, i.e., if the person stays healthy more often, then we have $\hat{x}_i(t) = 0$ and $\Delta^e_i = \frac{\lambda_i}{\mu_i+\lambda_i}$. Therefore, when $v_i = 0$, we have

$$\Delta_i^e = \min \left\{ \frac{\mu_i}{\mu_i + \lambda_i}, \frac{\lambda_i}{\mu_i + \lambda_i} \right\}. \tag{46}$$

In order to find the optimal test rates $v_i$ in the case of erroneous test measurements, we formulate the following optimization problem

$$\min_{\{v_i\}} \quad \sum_{i=1}^{n} \mathbb{1}\{v_i > 0\}\, \frac{p\mu_i^2 + q\lambda_i^2 + (2-p-q)\mu_i\lambda_i + v_i(p\mu_i + q\lambda_i)}{(\lambda_i+\mu_i)(\lambda_i+\mu_i+v_i)} + \mathbb{1}\{v_i = 0\} \min\left\{\frac{\mu_i}{\mu_i+\lambda_i}, \frac{\lambda_i}{\mu_i+\lambda_i}\right\}$$

$$\text{s.t.} \quad \sum_{i=1}^{n} v_i \leq C$$

$$v_i \geq 0, \quad i = 1, \dots, n, \tag{47}$$

where the objective function is the summation over all people of $\Delta^e_i$ in (45) when $v_i > 0$ and $\Delta^e_i$ in (46) when $v_i = 0$, and $\mathbb{1}\{\cdot\}$ is the indicator function taking the value 1 when its argument is true and 0 otherwise. In (47), we have a constraint on the total test rate, i.e., $\sum_{i=1}^{n} v_i \leq C$. We note that the optimization problem in (47) is in general not convex due to the indicator functions in the objective. However, for a given set $\{i : v_i = 0\}$, the problem in (47) is convex and can be solved optimally. Thus, by solving (47) for every possible set $\{i : v_i = 0\}$, we can determine the globally optimal solution, but this requires solving $2^n$ different optimization problems, which can be impractical for large $n$. For this reason, we next provide a greedy algorithm to solve the optimization problem in (47).

In the greedy solution, initially assuming that $v_i > 0$ for all $i$, we consider the following optimization problem

$$\min_{\{v_i\}} \quad \sum_{i=1}^{n} \frac{p\mu_i^2 + q\lambda_i^2 + (2-p-q)\mu_i\lambda_i + v_i(p\mu_i + q\lambda_i)}{(\lambda_i+\mu_i)(\lambda_i+\mu_i+v_i)}$$

$$\text{s.t.} \quad \sum_{i=1}^{n} v_i \leq C$$

$$v_i \geq 0, \quad i = 1, \dots, n, \tag{48}$$

where each term of the objective function in (48) is equal to $\Delta^e_i$ in (45). For this optimization problem, we define the Lagrangian function for (48) as

$$\mathcal{L} = \sum_{i=1}^{n} \frac{p\mu_i^2 + q\lambda_i^2 + (2-p-q)\mu_i\lambda_i + v_i(p\mu_i + q\lambda_i)}{(\lambda_i+\mu_i)(\lambda_i+\mu_i+v_i)} + \bar{\beta}\left(\sum_{i=1}^{n} v_i - C\right) - \sum_{i=1}^{n} \bar{\nu}_i v_i, \tag{49}$$

where $\bar{\beta} \geq 0$ and $\bar{\nu}_i \geq 0$. We note that the problem defined in (48) is a convex optimization problem, and thus we can find the optimal test rates $v_i$ by analyzing the KKT and complementary slackness conditions. The KKT conditions are given by

$$\frac{\partial \mathcal{L}}{\partial v_i} = \frac{-2(1-p-q)\mu_i \lambda_i}{(\mu_i+\lambda_i)(\mu_i+\lambda_i+v_i)^2} + \bar{\beta} - \bar{\nu}_i = 0, \tag{50}$$

for all *i*. The complementary slackness conditions are

$$\bar{\beta}\left(\sum_{i=1}^{n} v_i - C\right) = 0, \qquad \bar{\nu}_i v_i = 0. \tag{51}$$

By using (50) and (51), we find the optimal $v_i$ values for the problem in (48) as

$$v_i = (\mu_i + \lambda_i)\left(\sqrt{\frac{\mu_i\lambda_i}{(\mu_i+\lambda_i)^3}\,\frac{2(1-p-q)}{\bar{\beta}}} - 1\right)^+. \tag{52}$$

With the test rates $v_i$ in (52), we compute the average differences $\Delta^e_i$ in (45) and compare them with $\Delta^e_i$ in (46) for $v_i = 0$. Due to the errors in the tests, $\Delta^e_i$ in (46) with $v_i = 0$ can be smaller than $\Delta^e_i$ in (45) with the test rate $v_i$ found in (52). Among such people, we choose the index $i$ for which the difference between $\Delta^e_i$ in (45) with $v_i$ in (52) and $\Delta^e_i$ in (46) is the largest. Then, we take $v_i = 0$, since applying no test to this person further decreases $\Delta^e_i$. For the remaining people, we solve the optimization problem in (48) again. After obtaining the new test rates, we again compare the average differences $\Delta^e_i$ with the test rates in (52) and with no test, and choose $v_i = 0$ for the person whose $\Delta^e_i$ can be decreased further. We repeat these steps until no $\Delta^e_i$ with $v_i > 0$ can be further decreased by choosing $v_i = 0$.
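
The greedy procedure above can be sketched as follows. The inner water-filling step bisects the multiplier $\bar{\beta}$ in (52) until the budget constraint is tight (the budget usage $\sum_i v_i(\bar{\beta})$ is decreasing in $\bar{\beta}$); the outer loop drops, one at a time, the person for whom not testing helps the most. This is a simplified sketch under assumed inputs, not the authors' implementation:

```python
import math

def delta_e(lm, mu, v, p, q):
    """Average difference (45), valid for v > 0."""
    Z = (lm + mu) * (lm + mu + v)
    return (p * mu**2 + q * lm**2 + (2 - p - q) * mu * lm
            + v * (p * mu + q * lm)) / Z

def delta_no_test(lm, mu):
    """Average difference (46) under the best constant estimate, v = 0."""
    return min(mu, lm) / (mu + lm)

def waterfill(people, C, p, q):
    """Solve (48): bisect beta-bar so the rates (52) use the budget C."""
    def rates(beta):
        out = []
        for lm, mu in people:
            root = math.sqrt(2 * (1 - p - q) * mu * lm / ((mu + lm) ** 3 * beta))
            out.append((mu + lm) * max(root - 1.0, 0.0))
        return out
    lo, hi = 1e-12, 1e6            # sum(rates) is decreasing in beta
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(rates(mid)) > C:
            lo = mid
        else:
            hi = mid
    return rates(0.5 * (lo + hi))

def greedy(people, C, p, q):
    """Greedy solution of (47): repeatedly stop testing the person for
    whom the no-test error (46) beats the tested error (45) the most."""
    active = list(range(len(people)))
    v = [0.0] * len(people)
    while active:
        va = waterfill([people[i] for i in active], C, p, q)
        gains = [delta_e(*people[i], vi, p, q) - delta_no_test(*people[i])
                 for i, vi in zip(active, va)]
        worst = max(range(len(active)), key=lambda k: gains[k])
        if gains[worst] <= 0:      # testing still helps everyone active
            for i, vi in zip(active, va):
                v[i] = vi
            break
        active.pop(worst)          # set v_i = 0 for that person
    return v

people = [(0.8, 1.3), (2.0, 0.5), (0.3, 0.3)]   # (lambda_i, mu_i), illustrative
v = greedy(people, C=4.0, p=0.1, q=0.1)
```

Since the active set shrinks by one person per iteration, the loop terminates after at most $n$ water-filling passes instead of the $2^n$ subproblems of the exhaustive search.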

We note that the solution obtained in (52) has a *threshold* structure. As the false-positive and false-negative probabilities increase, the term $\frac{2(1-p-q)}{\bar{\beta}}$ in (52) becomes smaller. As a result, some people with higher $\frac{(\mu_i+\lambda_i)^3}{\mu_i\lambda_i}$ may not be tested by the health care provider. Thus, when $p$ and $q$ are high, a smaller portion of the population is tested, with higher test rates, in order to combat the test errors.

#### **6. Average Estimation Error with Dependent Infection Rates**

In this section, we consider the case of two people whose infection rates depend on each other. When these two people are healthy, each can individually become infected with the virus after an exponential time with rate $\lambda$. When one of the two is infected and this has not been detected by the health care provider, this person can infect the other, healthy person after an exponential time with rate $\lambda_{12}$, as illustrated in Figure 5. Thus, when both people are healthy, their individual infection rate is $\lambda$. However, when one of them is sick and this has not been detected by the health care provider, the healthy person's total infection rate is equal to $\lambda + \lambda_{12}$. On the other hand, if only one person is infected, i.e., $x_i(t) = 1$, and this has been detected by the health care provider, i.e., $\hat{x}_i(t) = 1$, then we assume that the infected person is isolated from the healthy one, and thus the healthy person's infection rate remains $\lambda$ instead of $\lambda + \lambda_{12}$. When infected, each person recovers from the disease after an exponential time with rate $\mu$.

**Figure 5.** The infection rates of two people, where the individual infection rate is equal to $\lambda$. When the infection has not been detected, these two people can infect each other with rate $\lambda_{12}$.

When the health care provider believes that a person is healthy, i.e., $\hat{x}_i(t) = 0$, the next test is applied to this person after an exponential time with rate $s$. When the health care provider believes that a person is sick, i.e., $\hat{x}_i(t) = 1$, the next test is applied after an exponential time with rate $c$. We note that since the two people are identical in terms of their infection and recovery rates, the health care provider applies the same test rates to both.

Similar to Section 5, we note that the states $(x_1(t), \hat{x}_1(t), x_2(t), \hat{x}_2(t))$ form a continuous-time Markov chain whose unique stationary distribution is $\pi^d = \{\pi^d_{0000}, \pi^d_{0001}, \dots, \pi^d_{1111}\}$. In order to find the stationary distribution, we write the balance equations as follows

$$2\lambda \pi^d_{0000} = \mu \pi^d_{1000} + c\pi^d_{0100} + \mu \pi^d_{0010} + c\pi^d_{0001}, \tag{53}$$

$$(2\lambda + c)\pi^d_{0001} = \mu \pi^d_{0011} + c\pi^d_{0101} + \mu \pi^d_{1001}, \tag{54}$$

$$(\lambda + \lambda_{12} + \mu + s)\pi^d_{0010} = c\pi^d_{0110} + \mu\pi^d_{1010} + \lambda\pi^d_{0000}, \tag{55}$$

$$(\lambda + \mu)\pi^d_{0011} = c\pi^d_{0111} + \mu\pi^d_{1011} + s\pi^d_{0010} + \lambda\pi^d_{0001}, \tag{56}$$

$$(2\lambda + c)\pi^d_{0100} = c\pi^d_{0101} + \mu\pi^d_{0110} + \mu\pi^d_{1100}, \tag{57}$$

$$(2\lambda + 2c)\pi^d_{0101} = \mu \pi^d_{0111} + \mu \pi^d_{1101}, \tag{58}$$

$$(\lambda + \mu + s + c)\pi^d_{0110} = \lambda \pi^d_{0100} + \mu \pi^d_{1110}, \tag{59}$$

$$(\lambda + \mu + c)\pi^d_{0111} = s\pi^d_{0110} + \lambda\pi^d_{0101} + \mu\pi^d_{1111}, \tag{60}$$

$$(\lambda + \lambda_{12} + \mu + s)\pi^d_{1000} = \lambda \pi^d_{0000} + c\pi^d_{1001} + \mu \pi^d_{1010}, \tag{61}$$

$$(\lambda + \mu + s + c)\pi^d_{1001} = \mu \pi^d_{1011} + \lambda \pi^d_{0001}, \tag{62}$$

$$(2\mu + 2s)\pi^d_{1010} = (\lambda + \lambda_{12})\pi^d_{1000} + (\lambda + \lambda_{12})\pi^d_{0010}, \tag{63}$$

$$(2\mu + s)\pi^d_{1011} = s\pi^d_{1010} + \lambda \pi^d_{1001} + \lambda \pi^d_{0011}, \tag{64}$$

$$(\lambda + \mu)\pi^d_{1100} = s\pi^d_{1000} + \lambda\pi^d_{0100} + c\pi^d_{1101} + \mu\pi^d_{1110}, \tag{65}$$

$$(\lambda + \mu + c)\pi^d_{1101} = s\pi^d_{1001} + \lambda\pi^d_{0101} + \mu\pi^d_{1111}, \tag{66}$$

$$(2\mu + s)\pi^d_{1110} = \lambda \pi^d_{1100} + s\pi^d_{1010} + \lambda \pi^d_{0110}, \tag{67}$$

$$2\mu \pi^d_{1111} = s\pi^d_{1110} + \lambda \pi^d_{1101} + s\pi^d_{1011} + \lambda \pi^d_{0111}. \tag{68}$$

By using (53)–(68) and $\sum_{j=0}^{1}\sum_{\ell=0}^{1}\sum_{m=0}^{1}\sum_{h=0}^{1} \pi^d_{j\ell mh} = 1$, we find the stationary distribution $\pi^d$. We denote the long-term average estimation error for person $i$ by $\Delta^d_i$ for $i = 1, 2$, where the superscript $d$ stands for "dependent", and it is given by

$$
\Delta_i^d = \Delta_{i1}^d + \Delta_{i2}^d \tag{69}
$$

where $\Delta^d_{i1}$ and $\Delta^d_{i2}$ follow from (13). Then, we have

$$
\Delta^d_{11} = \pi^d_{1000} + \pi^d_{1001} + \pi^d_{1010} + \pi^d_{1011}, \tag{70}
$$

$$
\Delta^d_{12} = \pi^d_{0100} + \pi^d_{0101} + \pi^d_{0110} + \pi^d_{0111}, \tag{71}
$$

$$
\Delta^d_{21} = \pi^d_{0010} + \pi^d_{0110} + \pi^d_{1010} + \pi^d_{1110}, \tag{72}
$$

$$
\Delta^d_{22} = \pi^d_{0001} + \pi^d_{0101} + \pi^d_{1001} + \pi^d_{1101}. \tag{73}
$$

In Section 8, for given infection, recovery, and test rates, we numerically evaluate the stationary distribution and find the average difference $\Delta^d_i$.
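
The 16-state chain defined by (53)–(68) is easiest to handle numerically. The sketch below builds the generator from the transition rules (cross-infection at rate $\lambda_{12}$ is applied only when the sick partner is undetected and the susceptible person is currently marked healthy, which is what the out-rates in (55), (59), (61), and (62) imply), solves for $\pi^d$, and evaluates (70) and (71). The rate values are illustrative assumptions:

```python
import itertools
import numpy as np

lam, lam12, mu, s, c = 0.6, 0.9, 1.0, 1.5, 1.2      # illustrative rates
states = list(itertools.product([0, 1], repeat=4))  # (x1, xh1, x2, xh2)
idx = {st: k for k, st in enumerate(states)}
Q = np.zeros((16, 16))

for st in states:
    for person in (0, 1):
        x, xh = st[2 * person], st[2 * person + 1]
        ox, oxh = st[2 * (1 - person)], st[2 * (1 - person) + 1]
        if x == 0:
            # extra lam12 only if partner is sick, undetected, and this
            # person is marked healthy (not isolated)
            rate = lam + (lam12 if (ox == 1 and oxh == 0 and xh == 0) else 0.0)
            new = list(st); new[2 * person] = 1
            Q[idx[st], idx[tuple(new)]] += rate
        else:
            new = list(st); new[2 * person] = 0
            Q[idx[st], idx[tuple(new)]] += mu       # recovery
        if xh != x:                                 # error-free test corrects
            new = list(st); new[2 * person + 1] = x
            Q[idx[st], idx[tuple(new)]] += s if xh == 0 else c

np.fill_diagonal(Q, -Q.sum(axis=1))
A = np.vstack([Q.T, np.ones(16)])                   # pi Q = 0, sum(pi) = 1
b = np.zeros(17); b[-1] = 1.0
pi = np.linalg.lstsq(A, b, rcond=None)[0]
# (70) and (71): the two error components for person 1
d11 = sum(pi[idx[st]] for st in states if st[0] == 1 and st[1] == 0)
d12 = sum(pi[idx[st]] for st in states if st[0] == 0 and st[1] == 1)
```

The resulting $\pi^d$ satisfies each of the balance equations (53)–(68), which gives a direct check that the generator encodes the intended transition rules.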

#### **7. Age of Incorrect Information Based Error Metric**

So far, we have considered an estimation error metric that takes the value 1 if the actual infection status of a person differs from the real-time estimate at the health care provider. Thus, the error metric takes values based on the information content. In contrast, the traditional age metric introduced in [1] considers only the time elapsed since the most recently received status update packet was generated at the source. As a result, the traditional age metric does not consider the information content, and age alone may not be a suitable performance metric for the problem considered in our work.

In the context of infection tracking, it is important to know how long the estimates at the health care provider have differed from the actual infection status of the people. However, the error metric that we have considered thus far has no time component, i.e., it only takes the value 1 regardless of how long the estimate has been off from the actual health status. Motivated by the AoII introduced in [51,102], which accounts for both the time and the information content, in this section we consider the following error metric, where the superscript $s$ stands for the "synchronization" implied in AoII,

$$
\Delta_i^s = (t - V_i(t))\, \mathbb{1}\{\hat{x}_i(t) \neq x_i(t)\}, \tag{74}
$$

where $V_i(t)$ is the last time instant at which the health care provider's estimate of the $i$th person's health status was accurate, i.e., the last time instant when $\Delta^s_i = 0$. Similarly, we define

$$
\Delta_{i1}^s = (t - V_{i1}(t)) \max\{x_i(t) - \hat{x}_i(t), 0\}, \tag{75}
$$

$$
\Delta_{i2}^s = (t - V_{i2}(t)) \max\{\hat{x}_i(t) - x_i(t), 0\}, \tag{76}
$$

where $V_{i1}(t)$ and $V_{i2}(t)$ are the last time instants when $\Delta^s_{i1}$ and $\Delta^s_{i2}$ were equal to 0, respectively. A sample evolution of $\Delta^s_{i1}$ and $\Delta^s_{i2}$ is shown in Figure 6, and we note that $\Delta^s_i(t) = \Delta^s_{i1}(t) + \Delta^s_{i2}(t)$.

**Figure 6.** A sample evolution of (**a**) $\Delta^s_{i1}(t)$ and (**b**) $\Delta^s_{i2}(t)$ in a typical update cycle.

Similar to Section 3, the infection and recovery rates of the $i$th person are $\lambda_i$ and $\mu_i$, respectively. In this section, the health care provider applies a single test rate to each person, denoted by $w_i$. That is, we do not consider separate testing rates $s_i$ and $c_i$ for healthy and infected people as we did previously; instead, here both $s_i$ and $c_i$ are equal to $w_i$. We first consider the case where $w_i > 0$. By following the steps in Section 3, one can show that $\mathbb{E}[I_{i1}] = \frac{1}{w_i} + \frac{w_i + \mu_i}{w_i\lambda_i}$ and $\mathbb{E}[I_{i2}] = \frac{1}{w_i} + \frac{w_i + \lambda_i}{w_i\mu_i}$, which can be obtained by substituting $w_i$ for $s_i$ and $c_i$ in (10) and (12), respectively. Next, we denote the total area under $\Delta^s_{i1}(t) > 0$ during the $j$th cycle by $A_{e,1}(i,j)$, where $A_{e,1}(i,j) = \sum_{\ell=1}^{K_1} \frac{T_m(i,\ell)^2}{2}$ and $K_1$ has a geometric distribution with success probability $\frac{w_i}{\mu_i + w_i}$. Then, we have $\mathbb{E}[A_{e,1}(i)] = \frac{1}{w_i(w_i + \mu_i)}$. Similarly, we denote the total area under $\Delta^s_{i2}(t) > 0$ during the $j$th cycle by $A_{e,2}(i,j)$, where $A_{e,2}(i,j) = \sum_{\ell=1}^{K_2} \frac{T_u(i,\ell)^2}{2}$ and $K_2$ has a geometric distribution with success probability $\frac{w_i}{\lambda_i + w_i}$. Then, we have $\mathbb{E}[A_{e,2}(i)] = \frac{1}{w_i(w_i + \lambda_i)}$. By ergodicity, the long-term average differences become $\Delta^s_{i1} = \frac{\mathbb{E}[A_{e,1}(i)]}{\mathbb{E}[I_{i1}] + \mathbb{E}[I_{i2}]}$ and $\Delta^s_{i2} = \frac{\mathbb{E}[A_{e,2}(i)]}{\mathbb{E}[I_{i1}] + \mathbb{E}[I_{i2}]}$, which gives

$$
\Delta_i^s = \Delta_{i1}^s + \Delta_{i2}^s = \frac{\mu_i \lambda_i}{\mu_i + \lambda_i}\, \frac{2w_i + \mu_i + \lambda_i}{(w_i + \mu_i + \lambda_i)(w_i + \mu_i)(w_i + \lambda_i)} \tag{77}
$$

when $w_i > 0$. One can show that $\Delta^s_i$ is a decreasing function of $w_i$, i.e., $\frac{\partial \Delta^s_i}{\partial w_i} < 0$, and a convex function of $w_i$, i.e., $\frac{\partial^2 \Delta^s_i}{\partial w_i^2} > 0$.
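
The renewal-reward derivation above can be verified exactly with rational arithmetic: (77) should equal the ratio of the expected AoII area to the expected cycle length. A check using Python's `fractions` (the sample rates are arbitrary):

```python
from fractions import Fraction as F

def delta_s_renewal(lm, mu, w):
    """Delta_i^s as E[area] / E[cycle length] from the renewal argument."""
    cycle = F(2) / w + (w + mu) / (w * lm) + (w + lm) / (w * mu)  # E[I_i1] + E[I_i2]
    area = F(1) / (w * (w + mu)) + F(1) / (w * (w + lm))          # E[A_e,1] + E[A_e,2]
    return area / cycle

def delta_s_closed(lm, mu, w):
    """Closed form (77)."""
    return (mu * lm) / (mu + lm) * (2 * w + mu + lm) / (
        (w + mu + lm) * (w + mu) * (w + lm))

# exact rational equality at arbitrary sample rates
check = delta_s_renewal(F(3, 4), F(5, 3), F(2)) == delta_s_closed(F(3, 4), F(5, 3), F(2))
```

Because the two expressions agree as exact rationals, the equality is an algebraic identity rather than a floating-point coincidence.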

When $w_i = 0$, we have $\mathbb{E}[I_i] = \frac{1}{\lambda_i} + \frac{1}{\mu_i} = \frac{\mu_i + \lambda_i}{\mu_i \lambda_i}$, i.e., $\mathbb{E}[I_i]$ is equal to the expected total duration of a person's healthy and sick periods. Since the health care provider applies no tests to this person, it estimates this person either as always sick ($\hat{x}_i(t) = 1$) or as always healthy ($\hat{x}_i(t) = 0$). When $w_i = 0$ and $\hat{x}_i(t) = 0$, we have $\Delta^s_i = \frac{1}{\mu_i}\frac{\lambda_i}{\mu_i + \lambda_i}$. When $w_i = 0$ and $\hat{x}_i(t) = 1$, we have $\Delta^s_i = \frac{1}{\lambda_i}\frac{\mu_i}{\mu_i + \lambda_i}$. If $\mu_i < \lambda_i$, the health care provider chooses $\hat{x}_i(t) = 1$, and $\hat{x}_i(t) = 0$ otherwise. Thus, when $w_i = 0$, we have $\Delta^s_i = \min\left\{\frac{1}{\mu_i}\frac{\lambda_i}{\mu_i + \lambda_i}, \frac{1}{\lambda_i}\frac{\mu_i}{\mu_i + \lambda_i}\right\}$.

In order to find the optimal test rates, we formulate the following optimization problem

$$\min_{\{w_i\}} \quad \sum_{i=1}^{n} \mathbb{1}\{w_i > 0\}\, \frac{\mu_i\lambda_i}{\mu_i+\lambda_i}\, \frac{2w_i+\mu_i+\lambda_i}{(w_i+\mu_i+\lambda_i)(w_i+\mu_i)(w_i+\lambda_i)} + \mathbb{1}\{w_i = 0\} \min\left\{\frac{1}{\mu_i}\frac{\lambda_i}{\mu_i+\lambda_i}, \frac{1}{\lambda_i}\frac{\mu_i}{\mu_i+\lambda_i}\right\}$$

$$\text{s.t.} \quad \sum_{i=1}^{n} w_i \leq C, \qquad w_i \geq 0, \quad i = 1, \dots, n, \tag{78}$$

where the objective function in (78) is the summation over all people of $\Delta^s_i$ in (77) when $w_i > 0$ and $\Delta^s_i$ when $w_i = 0$. To solve the problem in (78), we follow the same greedy approach as in Section 5. First, assuming that $w_i > 0$ for all $i$, so that the average difference $\Delta^s_i$ is given by (77), we solve the following optimization problem

$$\min_{\{w_i\}} \quad \sum_{i=1}^{n} \frac{\mu_i\lambda_i}{\mu_i+\lambda_i}\, \frac{2w_i+\mu_i+\lambda_i}{(w_i+\mu_i+\lambda_i)(w_i+\mu_i)(w_i+\lambda_i)}$$

$$\text{s.t.} \quad \sum_{i=1}^{n} w_i \leq C$$

$$w_i \geq 0, \quad i = 1, \dots, n. \tag{79}$$

Since the problem in (79) is a convex optimization problem, by defining the Lagrangian function and analyzing the KKT and complementary slackness conditions, we can find the optimal $w_i$ values. To avoid repetition, we skip these optimization steps. Then, we compare $\Delta^s_i$ in (77), evaluated at the $w_i$ values found from (79), with $\min\left\{\frac{1}{\mu_i}\frac{\lambda_i}{\mu_i+\lambda_i}, \frac{1}{\lambda_i}\frac{\mu_i}{\mu_i+\lambda_i}\right\}$. If $\Delta^s_i$ can be reduced further, we choose $w_i = 0$ for the person with the highest improvement. Then, we solve the optimization problem in (79) for the remaining people. We repeat these steps until there is no improvement in any $\Delta^s_i$ from choosing $w_i = 0$.
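
Although the closed-form KKT solution of (79) is omitted, the same water-filling idea can be carried out numerically: for a fixed multiplier $\beta$, each $w_i$ is found by bisecting the monotone (convexity implies increasing) derivative of (77), and $\beta$ itself is bisected until the budget is used. A sketch with illustrative rates:

```python
def delta_s(lm, mu, w):
    """Average AoII-type difference (77)."""
    K, S = mu * lm / (mu + lm), mu + lm
    return K * (2 * w + S) / ((w + S) * (w + mu) * (w + lm))

def ddelta_s(lm, mu, w):
    """Derivative of (77) in w; negative and increasing by convexity."""
    K, S = mu * lm / (mu + lm), mu + lm
    D = (w + S) * (w + mu) * (w + lm)
    Dp = (w + mu) * (w + lm) + (w + S) * (w + lm) + (w + S) * (w + mu)
    return K * (2 * D - (2 * w + S) * Dp) / D**2

def allocate(rates, C):
    """Water-filling for (79): equalize marginals at -beta, bisect beta."""
    def w_of(lm, mu, beta):
        if ddelta_s(lm, mu, 0.0) >= -beta:
            return 0.0                      # marginal gain too small: w_i = 0
        lo, hi = 0.0, 1.0
        while ddelta_s(lm, mu, hi) < -beta:
            hi *= 2.0
        for _ in range(80):                 # solve ddelta_s(w) = -beta
            mid = 0.5 * (lo + hi)
            if ddelta_s(lm, mu, mid) < -beta:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)
    lo, hi = 1e-12, 1e6                     # total budget use decreases in beta
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum(w_of(lm, mu, mid) for lm, mu in rates) > C:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    return [w_of(lm, mu, beta) for lm, mu in rates]

rates = [(0.8, 1.3), (2.0, 0.5), (0.3, 0.3)]   # (lambda_i, mu_i), illustrative
ws = allocate(rates, C=5.0)
```

Since (77) is strictly decreasing in $w_i$, the budget constraint is tight at the optimum, and the returned allocation attains a lower objective than any other feasible allocation such as uniform testing.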

In the next section, we provide extensive numerical results to evaluate optimal test rates in various settings considered in this paper.

#### **8. Numerical Results**

In this section, we provide seven numerical results. For these examples, we take $\lambda_i$ as

$$
\lambda_i = a r^i, \quad i = 1, \ldots, n, \tag{80}
$$

where $r = 0.9$ and $a$ is chosen such that $\sum_{i=1}^{n} \lambda_i = 6$. Furthermore, we take $\mu_i$ as

$$
\mu_i = b q^i, \quad i = 1, \ldots, n, \tag{81}
$$

where $q = 1.1$ and $b$ is chosen such that $\sum_{i=1}^{n} \mu_i = 4$. Since $\lambda_i$ in (80) decreases with $i$, people with lower indices become infected more quickly than people with higher indices. Since $\mu_i$ in (81) increases with $i$, people with higher indices recover more quickly than people with lower indices. Thus, a person with a low index becomes infected quickly and recovers slowly.
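
The normalization constants $a$ and $b$ are fixed by the two sum constraints. A small sketch (the geometric ratio of (81) is renamed `g` here only to avoid clashing with the false-negative probability $q$ of Section 5):

```python
n = 10
r, g = 0.9, 1.1                                # ratios from (80) and (81)
a = 6 / sum(r**i for i in range(1, n + 1))     # enforces sum(lambda_i) = 6
b = 4 / sum(g**i for i in range(1, n + 1))     # enforces sum(mu_i) = 4
lam = [a * r**i for i in range(1, n + 1)]      # decreasing in i
mu = [b * g**i for i in range(1, n + 1)]       # increasing in i
```

These sequences reproduce the qualitative structure used in the examples: low-index people are infected quickly and recover slowly.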

In the first example, we take the total number of people as $n = 10$, the total test rate as $C = 16$, and $\theta = 0.5$. We start with randomly chosen $s_i$ and $c_i$ such that $\sum_{i=1}^{n} s_i + c_i = 16$, and apply the alternating-minimization-based method proposed in Section 4. We repeat this process for 30 different initial $(s_i, c_i)$ pairs and choose the solution that gives the smallest $\Delta$. In Figure 7a, we observe that the first three people are never tested by the health care provider. We note that $s_i$, which is the test rate when $\hat{x}_i(t) = 0$, initially increases with $i$ but then decreases. This means that people who rarely become infected are tested less frequently when they are marked as healthy. Similarly, we observe in Figure 7a that $c_i$, which is the test rate when $\hat{x}_i(t) = 1$, monotonically increases with $i$. In other words, people who recover from the virus quickly are tested more frequently when they are marked as infected.

**Figure 7.** (**a**) Test rates *si* and *ci*, (**b**) corresponding average difference Δ*i*.

In Figure 7b, we plot $\Delta_i$ resulting from the solution found by the proposed algorithm; $\Delta_i$ when the health care provider tests everyone in the population uniformly, i.e., $s_i = c_i = \frac{C}{2n}$ for all $i$; and $\Delta_i$ when the health care provider applies no tests, i.e., $s_i = c_i = 0$ for all $i$. In the case of no tests, we have $\Delta_i = \min\left\{\frac{\theta\lambda_i}{\mu_i+\lambda_i}, \frac{(1-\theta)\mu_i}{\mu_i+\lambda_i}\right\}$. We observe in Figure 7b that the health care provider applies tests to the people whose $\Delta_i$ can be reduced the most, as opposed to uniform testing where everyone is tested equally. Thus, the first three people, who have the smallest $\Delta_i$, are not tested by the health care provider. With the proposed solution, by not testing the first three people, $\Delta_i$ is further reduced for the remaining people compared to uniform testing. For the people who are not tested, the health care provider chooses $\hat{x}_i(t) = 1$ all the time, i.e., marks these people as always sick, since $\frac{\theta\lambda_i}{\mu_i+\lambda_i} > \frac{(1-\theta)\mu_i}{\mu_i+\lambda_i}$. This is expected, as these people have high $\lambda_i$ and low $\mu_i$, i.e., they are infected easily and stay sick for a long time.

In the second example, we use the same set of variables except for the total test rate $C$, which we vary between 5 and 20. We plot $\Delta$ with respect to $C$ in Figure 8 and observe that $\Delta$ decreases with $C$. Thus, with higher total test rates, the health care provider can track the infection status of the population better, as expected.

In the third example, we use the same set of variables except for the total number of people $n$. In addition, we also use uniform infection and healing rates, i.e., $\lambda_i = \frac{6}{n}$ and $\mu_i = \frac{4}{n}$ for all $i$, for comparison with $\lambda_i$ in (80) and $\mu_i$ in (81), while keeping the total infection and healing rates the same, i.e., $\sum_{i=1}^{n} \lambda_i = 6$ and $\sum_{i=1}^{n} \mu_i = 4$, in both cases. We vary the number of people $n$ from 2 to 30. We observe in Figure 9 that when the infection and healing rates are uniform across the population, the health care provider can track the infection status with the same efficiency even as the population size increases (while the total infection and healing rates are kept fixed). For the case of $\lambda_i$ in (80) and $\mu_i$ in (81), increasing the population size increases the number of people who rarely become sick, i.e., people with high indices $i$, and also the number of people who rarely heal from the disease, i.e., people with small indices $i$. Thus, it becomes easier for the health care provider to track the infection status of the people. This is why, when we use $\lambda_i$ in (80) and $\mu_i$ in (81), we observe in Figure 9 that the health care provider tracks the infection status of the people better even as the population size increases.

**Figure 8.** The average difference Δ with respect to total test rate *C*.

**Figure 9.** The average difference Δ with respect to the number of people *n*. We use uniform infection and healing rates, i.e., $\lambda_i = \frac{6}{n}$ and $\mu_i = \frac{4}{n}$ for all $i$, as well as $\lambda_i$ in (80) and $\mu_i$ in (81) with $\sum_{i=1}^{n}\lambda_i = 6$ and $\sum_{i=1}^{n}\mu_i = 4$.

In the fourth example, we employ the same set of parameters as the first example except for the importance factor *θ*, which we vary between 0.2 and 0.7. In Figure 10a, we plot Δ in (7), $\bar{\Delta}_1 = \frac{1}{n}\sum_{i=1}^{n}\Delta_{i1}$, and $\bar{\Delta}_2 = \frac{1}{n}\sum_{i=1}^{n}\Delta_{i2}$. Note that $\bar{\Delta}_1$ represents the average difference when people are infected but have not yet been detected by the health care provider, and $\bar{\Delta}_2$ represents the average difference when people have recovered but the health care provider still marks them as infected. When *θ* is high, we assign importance to the minimization of $\bar{\Delta}_1$, i.e., the early detection of infected people, and when *θ* is low, we give importance to the minimization of $\bar{\Delta}_2$, i.e., the early detection of people who have recovered from the disease. This is why we observe in Figure 10a that $\bar{\Delta}_1$ decreases with *θ* while $\bar{\Delta}_2$ increases with *θ*.

We plot the total test rates $\sum_{i=1}^{n}s_i$ and $\sum_{i=1}^{n}c_i$ in Figure 10b. We observe that if it is more important to detect infected people, i.e., if *θ* is high, then the health care provider should apply higher test rates to the people who are marked as healthy; in other words, $\sum_{i=1}^{n}s_i$ increases with *θ*. Similarly, if it is more important to detect people who have recovered from the disease, then the health care provider should apply higher test rates to the people who are marked as infected; that is, $\sum_{i=1}^{n}c_i$ is high when *θ* is low. Therefore, a suitable *θ* needs to be chosen depending on the priorities of the health care provider.

In the fifth numerical result, we consider the case where there are errors in the test measurements, i.e., the model in Section 5. We take the total test rate as *C* = 20 and vary the test error rates *p* = *q* ∈ {0.1, 0.2, 0.4}. In Figure 11a, we provide the test rates $v_i$ found by our greedy policy in Section 5. When the error rates *p* and *q* are low, i.e., when *p* = *q* = 0.1, we see that the health care provider tests everyone in the population, and the corresponding $\Delta_i^e$ is lower than that of applying no tests, as shown in Figure 11b. As we increase the error rates, some people in the population are no longer tested by the health care provider; see Figure 11a for *p* = *q* ∈ {0.2, 0.4}. In this case, the health care provider applies more tests to the remaining people to combat the test errors. Nevertheless, we observe in Figure 11b that the achieved average difference $\Delta_i^e$ becomes higher as the error rates increase.

**Figure 10.** (**a**) Δ in (7), $\bar{\Delta}_1 = \frac{1}{n}\sum_{i=1}^{n}\Delta_{i1}$, and $\bar{\Delta}_2 = \frac{1}{n}\sum_{i=1}^{n}\Delta_{i2}$; (**b**) the corresponding total test rates $\sum_{i=1}^{n}s_i$ and $\sum_{i=1}^{n}c_i$.

**Figure 11.** (**a**) Test rates $v_i$; (**b**) the corresponding average difference $\Delta_i^e$ when there are errors in the tests.

In the sixth numerical result, we consider the case where the infection statuses of the people depend on each other: when one person is infected and has not been detected by the health care provider, they can infect the other person with rate $\lambda_{12}$, i.e., the infection model in Section 6. For this example, we first take *μ* = 5, *λ* = 2.5, $s = c = \frac{C}{4}$, and vary $\lambda_{12} \in \{2, \ldots, 200\}$ and $C \in \{20, 40, 60\}$. If $\lambda_{12} = 0$, i.e., if the infection statuses of the two people are independent of each other, then the average fraction of time that person 1 or 2 is sick is equal to $\frac{\lambda}{\lambda+\mu} = \frac{1}{3}$. As we increase the infection rate $\lambda_{12}$ between persons 1 and 2, we see in Figure 12a that the average time that person 1 is sick increases. However, as we increase the total test rate, the health care provider can detect a sick person more frequently, which explains why the average infected time in Figure 12a is low when the test rate is high. Then, we consider $\lambda_{12} \in \{5, 10, 15\}$ and vary the total test rate $C \in \{2, \ldots, 200\}$. We plot the average time that persons 1 and 2 are both sick in Figure 12b. As we increase the total test rate, the health care provider detects an infected person more quickly and thus prevents the infection from spreading. As a result, we observe in Figure 12b that the average time that both people are infected decreases in *C*. Since each person can also be infected with the virus independently with rate *λ*, the plots in Figure 12b do not drop to 0.
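To see the role of $\lambda_{12}$, the following Gillespie-style sketch (ours, not the paper's code) estimates the fraction of time person 1 is infected. As a deliberate simplification it omits testing and detection entirely, so an infected person always exposes the other at rate $\lambda_{12}$:

```python
import random

def sick_fraction(lam, mu, lam12, horizon=10000.0, seed=1):
    """Continuous-time simulation of the coupled two-person model (our
    sketch; testing/detection is omitted for simplicity).  Returns the
    fraction of time person 1 spends infected."""
    rng = random.Random(seed)
    x = [0, 0]                        # infection status of persons 1 and 2
    t, sick_time = 0.0, 0.0
    while t < horizon:
        # per-person transition rate in the current state: heal if sick,
        # become infected (exogenous rate lam plus cross-infection) if healthy
        rates = [mu if x[i] else lam + lam12 * x[1 - i] for i in range(2)]
        dt = min(rng.expovariate(rates[0] + rates[1]), horizon - t)
        if x[0]:
            sick_time += dt
        t += dt
        if t < horizon:               # pick which person changes state
            i = 0 if rng.random() < rates[0] / (rates[0] + rates[1]) else 1
            x[i] ^= 1
    return sick_time / horizon
```

With `lam12 = 0` the estimate approaches λ/(λ + μ) = 1/3 for the parameters above, and it grows with `lam12`, matching the trend in Figure 12a.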

**Figure 12.** (**a**) The percentage of time that person 1 stays infected as we increase $\lambda_{12}$; (**b**) the percentage of time that both persons 1 and 2 stay infected as we increase the total test rate *C*.

In the last numerical result, we consider the age of incorrect information-based error metric in Section 7. Here, the estimation error increases with the time during which the health care provider has not detected the changes in the infection status of the people. As a result, the average difference expression $\Delta_i^s$ in (77) differs from $\Delta_i^e$ in (45) even when *p* = *q* = 0. For this example, we consider the total test rate *C* = 4 and compare the normalized average differences $\frac{\Delta_i^s}{\sum_{i=1}^{n}\Delta_i^s}$ and $\frac{\Delta_i^e}{\sum_{i=1}^{n}\Delta_i^e}$, together with the corresponding test rates $w_i$ and $v_i$. In Figure 13b, depending on the error metric, the people who are tested by the health care provider show considerable variation in their test rates. For example, with the error metric $\Delta_i^s$ in (77), we apply tests to the third person, while the same person is not tested with the error metric $\Delta_i^e$ in (45). In Figure 13a, we provide the normalized average difference values. Here, the normalized average errors of the tested people exhibit similar values, whereas the normalized differences of the untested people may vary. Thus, a suitable error metric that reflects the priorities of the health care provider should be chosen, as it greatly affects who is tested and with which test rates.

**Figure 13.** (**a**) The normalized average differences $\frac{\Delta_i^s}{\sum_{i=1}^{n}\Delta_i^s}$ and $\frac{\Delta_i^e}{\sum_{i=1}^{n}\Delta_i^e}$; (**b**) the corresponding test rates $w_i$ and $v_i$.

#### **9. Conclusions and Discussion**

We considered the timely tracking of the infection status of individuals in a population. For exponential infection and healing processes with given rates, we determined the rates of exponential testing processes. We considered errors in the test measurements and observed that, in order to combat the test errors, a limited portion of the population may be tested with higher test rates. Then, we studied a dependent infection spread model for two people, where an infected person can spread the virus to the other if they have not been detected by the health care provider. Finally, we studied an AoII-based error metric where the error function increases linearly over time as long as the changes in the infection status have not been detected by the health care provider. We observed in the numerical results that the test rates depend on the individuals' infection and recovery rates, the individuals' last known state of being healthy or infected, and the health care provider's priority between detecting infected people and detecting recovered people more quickly.

In the literature, in order to model epidemics, the population is partitioned into groups called *compartments*. One such example is the SIR model used in [106] with the compartments susceptible (S), infected (I), and recovered (R), which has been further developed by adding the states hospitalized (H) and death (D) in [107]. In these epidemic models, the transitions between the compartments are assumed to be Markovian. In [107], based on epidemiological data, the delay distributions from infected (I) to hospitalized (H) and from infected (I) to death (D) are well approximated by exponential and gamma distributions, respectively. However, due to limited data availability, the delay distribution from infected (I) to recovered (R) is modeled with a gamma distribution with higher tolerance. In our work, we modeled infection and recovery times, i.e., the delays from recovered (R) to infected (I) and from infected (I) to recovered (R), with exponential distributions. Therefore, more realistic infection tracking models can be developed by considering gamma distributions, as observed in [107]. This more realistic model corresponds to the problem of real-time timely tracking of a binary Markov source in a serially connected network. The serially connected network model was studied in [8] with the traditional age of information metric. We note that considering the same network model with the AoII-based error metric to track the information dissemination of a binary Markov source represents a promising research direction and has direct applications to the real-time tracking of epidemic spread models. One can also study the extension of the dependent infection spread model in Section 6 to *n* > 2 people as a future research direction.

Another interesting research direction could be to consider different kinds of tests with different false-positive and false-negative test rates. Regarding this problem, instead of having a total test rate capacity *C*, we may consider a total test budget *K*. Assuming that each test bears a different cost, the goal might be to identify how many tests the health care provider should obtain from each type. Here, one can study a trade-off between applying fewer tests with a small probability of error versus applying more tests to individuals with a high probability of error. Moreover, one can consider a scenario where the health care provider may prefer to apply different test types to individuals depending on their infection and recovery rates.

**Author Contributions:** Conceptualization, M.B. and S.U.; methodology, M.B. and S.U.; software, M.B.; validation, M.B. and S.U.; formal analysis, M.B. and S.U.; investigation, M.B. and S.U.; resources, M.B. and S.U.; data curation, M.B. and S.U.; writing—original draft preparation, M.B. and S.U.; writing—review and editing, M.B. and S.U.; visualization, M.B. and S.U.; supervision, S.U.; project administration, S.U.; funding acquisition, S.U. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by NSF Grants CCF 17-13977 and ECCS 18-07348.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Implementation and Evaluation of Age-Aware Downlink Scheduling Policies in Push-Based and Pull-Based Communication**

**Tahir Kerem Oğuz <sup>1,\*</sup>, Elif Tuğçe Ceran <sup>1</sup>, Elif Uysal <sup>1</sup> and Tolga Girici <sup>2</sup>**


**Abstract:** As communication systems evolve to better cater to the needs of machine-type applications such as remote monitoring and networked control, advanced perspectives are required for the design of link layer protocols. The age of information (AoI) metric has firmly taken its place in the literature as a metric and tool to measure and control the data freshness demands of various applications. AoI measures the timeliness of transferred information from the point of view of the destination. In this study, we experimentally investigate AoI of multiple packet flows on a wireless multi-user link consisting of a transmitter (base station) and several receivers, implemented using software-defined radios (SDRs). We examine the performance of various scheduling policies under push-based and pull-based communication scenarios. For the push-based communication scenario, we implement age-aware scheduling policies from the literature and compare their performance with those of conventional scheduling methods. Then, we investigate the query age of information (QAoI) metric, an adaptation of the AoI concept for pull-based scenarios. We modify the former age-aware policies to propose variants that have a QAoI minimization objective. We share experimental results obtained in a simulation environment as well as on the SDR testbed.

**Keywords:** age of information; query age of information; wireless networks; software-defined radio; scheduling

#### **1. Introduction**

The advent and fast growth of the Internet of things (IoT) has further complicated the design of communication networks, amid increasing demand for the networked services catered by the fifth-generation (5G) evolution of communication networks. On the one hand, machine-type communications are typically less bandwidth-hungry than typical multimedia services. On the other hand, IoT flows tend to be composed of many small packets generated by large numbers of end nodes, and they may have end-to-end freshness requirements that are challenging to satisfy with conventional link- or transport-layer approaches based on optimizing throughput and delay. Increasing the sampling rate of IoT nodes to meet freshness requirements or adopting first-come-first-served service policies can cause bottlenecks in the network, resulting in a reduction in quality of service. It has been argued in the recent literature that optimizing data generation, transmission, and transport with respect to higher-level metrics such as the age of information can prevent unnecessary network load while improving the freshness of flows. In a broader perspective, there are proposals to encapsulate the significance or value of the transferred information in certain "semantic metrics" and to use these in the design of algorithms and protocols in all network layers, an approach referred to as "semantic communication" [1].

**Citation:** Oğuz, T.K.; Ceran, E.T.; Uysal, E.; Girici, T. Implementation and Evaluation of Age-Aware Downlink Scheduling Policies in Push-Based and Pull-Based Communication. *Entropy* **2022**, *24*, 673. https://doi.org/10.3390/e24050673

Academic Editor: Syed A. Jafar

Received: 2 March 2022 Accepted: 23 March 2022 Published: 11 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Within the set of semantic metrics, the age of information (AoI) from the receiver's point of view is defined as the time elapsed since the generation of the newest status update that has been received by the destination [2]. AoI is gaining momentum as a key performance indicator (KPI) for machine-type communications (MTC). The primary reason for the interest in AoI is the growing demand for timely and fresh information in many emerging real-time and remote monitoring-based applications such as the Internet of things, vehicular networks, and cyber-physical systems.

AoI monitors the freshness of the entire information stream from the receiver's point of view. Hence, it reveals further aspects of the network compared to traditional metrics such as delay or throughput. For instance, the delay metric measures the timeliness from the transmitted packet's perspective; a low average delay does not imply a low average age in every case [3]. The continuous packet transmission policy (known as the zero-wait policy in the literature) can optimize delay, but it may not be age-optimal in the presence of FCFS (first-come-first-serve) queues [4]. Moreover, if the transmitter has an energy constraint, the inefficiency of the zero-wait policy becomes more apparent [3]. Improving the throughput alone can maximize the amount of data flowing to the receiver node but may overload the queues within the network. Packets waiting in the queue result in outdated information reaching the receiver node. In this case, to reduce backlogs within the queues, the packet generation rate should be decreased. However, over-reducing the packet generation rate would cause the receiver to be updated only sporadically, which also degrades AoI performance. This dilemma shows that AoI is a composite measure of both throughput and delay: to achieve optimal AoI, packets must arrive both frequently and regularly [5]. Consequently, solving the scheduling problem with an AoI minimization objective requires a novel formulation.

A significant portion of the AoI literature consists of studies involving push-based communication scenarios. In the push-based model, the generation of a new packet triggers the communication process. Then, the transmitter module sends the generated packet to the receiver module. The sequence of operations of the communication process proceeds from the information source to the destination. However, one of the network models often encountered in real-life scenarios is the pull-based model, where the query source requests (or queries) information from the receiver module. In this scenario, the initiator of the communication process is the query source that aims to pull information from the receiver module. The source of these queries could be users or applications that want to monitor the information source. In the pull-based network, the sequence of operations of the communication process proceeds from the destination to the source.

In this paper, we consider both push-based and pull-based status update systems and experimentally investigate the performance of several age-aware downlink scheduling policies in wireless multi-user networks. The main contribution of this study is to report one of the pioneering experimental studies of age-aware MAC layer scheduling policies. We have implemented a multi-user downlink network with a single base station and multiple receivers using software-defined radios (SDRs). This testbed implementation allowed us to examine push-based and pull-based scenarios and state-of-the-art scheduling policies. Along with the other well-known policies, we have proposed max-weight policies for different pull-based scenarios and provided extensive simulation and experimental results.

The rest of the paper is organized as follows. In Section 2, we present the related work. In Section 3, the system model is presented and the problems of minimizing the average AoI, QAoI, and EAoI in the network are formulated. Age-aware downlink scheduling policies are presented in Section 4, and the experimental setup is explained in detail in Section 5. Simulation and experimental results are presented in Section 6, and the paper is concluded in Section 7.

#### **2. Related Work**

There are numerous studies examining the AoI metric in the literature. The major works that stand out are those investigating the effects of different queuing types and developing scheduling policies to minimize the average AoI in the network. An important concern when proposing a scheduling policy is the required computational load [5]. The work in [6] shows that a scheduling problem with an age minimization goal is NP-hard in a multi-user network. In [7], age-aware scheduling policies are derived for the lossy channel case in a multi-user network. The greedy policy is inspected, and the results indicate that the policy is optimal for mean AoI minimization in the symmetric channel case. In [8], the network is analyzed based on the peak-age and mean-age metrics, and a virtual-queue-based policy and an age-based policy are developed. The virtual-queue-based policy is shown to be peak-age optimal. The age-based policy is proved to be within a factor of four of the optimal values for peak age as well as average age. In [5], Whittle's index (WI) policy and max-weight (MW) policy are proposed. A lower bound on AoI that can be calculated from the statistical information of the network is derived. Lower and upper limits of the AoI performance of the WI and MW policies are calculated and proven to be within a factor of four of the optimal (the upper limit is at most four times the lower limit). There are also learning-based approaches in the literature to find an optimal age-aware policy for multi-user networks [9,10].

In the multi-user scheduling problem, the generation procedure of the packets has a significant impact on the AoI. In the literature, sources that generate a fresh packet at every time frame are referred to as "active sources" [8]. For a system model with active sources, whenever there is a transmission, the age of the corresponding flow will be reset to its minimum possible value (one frame duration in our setup). However, many realistic scenarios may be better modeled with a packet generation that is a stochastic process. For example, [11] studied a case where the packet generation procedure is a Bernoulli stochastic process and proposed scheduling policies suitable for that system model.

The queue service policy (e.g., LCFS (last-come-first-serve), FCFS (first-come-first-serve)) also has a significant effect on AoI [12,13]. For the active source case, queuing policies become even more important since the sources load the network at the highest available rate. The queue management policy determines the behavior of the queue when new packets arrive. If the queue is managed with an LCFS policy, the freshest packet will be at the top of the queue, and the first packet that leaves the queue will be the one with the most up-to-date information. In FCFS queues, a new packet is added to the bottom of the queue; to transmit the most timely packet, all packets ahead of it must first be transmitted. As a result, the most up-to-date packet loses time and becomes stale while waiting for the transmission of the other packets in the queue.
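A toy simulation can make the LCFS/FCFS contrast concrete. The sketch below (ours, not from the cited studies; the parameter values are arbitrary assumptions) feeds Bernoulli packet arrivals into a single queue served by an unreliable link and measures the time-average AoI under each service order:

```python
import random
from collections import deque

def avg_age(policy, p_gen=0.8, p_succ=0.5, T=20000, seed=0):
    """Toy single-flow link: each slot a fresh packet arrives w.p. p_gen and
    joins the queue; the served packet is delivered w.p. p_succ.  'fcfs'
    serves the oldest waiting packet, 'lcfs' the freshest.  Returns the
    time-average AoI at the receiver."""
    rng = random.Random(seed)
    queue = deque()                 # generation timestamps of waiting packets
    age, age_sum = 1, 0
    for t in range(T):
        if rng.random() < p_gen:
            queue.append(t)
        if queue and rng.random() < p_succ:
            ts = queue.popleft() if policy == "fcfs" else queue.pop()
            age = t - ts + 1        # AoI resets to the delivered packet's age
        else:
            age += 1
        age_sum += age
    return age_sum / T
```

With an arrival rate exceeding the service rate, the FCFS average age grows with the horizon while LCFS stays near its minimum, which is exactly the staleness effect described above.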

The overwhelming majority of the AoI literature to date has emphasized theoretical studies. However, there are also studies on implementation in the literature [14–22]. For a survey of this implementation-oriented literature, see [23]. In [14–17], the experimental setup mostly lies between the transport layer and application layer. The effects of different wireless access technologies on end-to-end TCP/IP connections were measured by [14–16]. Studies in [18–20] cover a broader range of interconnection layers and capture the performances of novel age-based MAC layer algorithms. In [18], Wi-Fi protocol is implemented on SDRs. The uplink of a wireless network is taken into consideration, and the effect of utilizing the MW scheduling policy is investigated. The work presented in [19] experimentally investigates the effects of packet management policies on the performance of networked control systems. A test environment was developed by [20] to evaluate various ALOHA-like random access protocols. In our previous work in [21], we implemented a multi-user wireless network using SDRs. We compared the AoI performances of MW and WI policies with round-robin and greedy policies.

The time-average age metric weighs the information freshness of all time frames equally. However, in many types of real-world applications the demand for timely information varies in time, and for these, minimization of the time-average age may not be the most relevant objective. In the literature, various semantic metrics alter this model by placing higher emphasis on selected time frames. For example, the age of incorrect information (AoII) metric focuses on the usefulness of the information and aims to maximize the freshness of non-repetitious information. In the AoII concept, obtaining redundant information is pointless for the receiver and does not reduce AoII. The objective is to minimize the age of differing information [24–26].

Query age of information (QAoI) is a recent semantic metric proposed to investigate pull-based scenarios from the AoI perspective [27–30]. QAoI considers a model where the freshness of information is valuable only at query moments. These queries are sent to the receiver modules in the network. Then, the receiver modules respond to these queries with the most up-to-date information. The source of the queries can be a user or an application that needs to obtain the most up-to-date information. In [29], the pull-based scenario is discussed, and the effective age of information (EAoI) metric is presented for the multi-user system model. Query generation is modeled as an independent Bernoulli process for each receiver, and the immediate EAoI is assumed to be zero for frames without queries. For the queried frames, immediate EAoI is related to the immediate AoI of the receiver under the proactive serving assumption as a query response procedure. According to the proactive serving method, the receiver module can wait for the query response for a frame if it is expecting a packet arrival within the frame. If a packet arrives at the end of the frame, the receiver sends the information in the newly received packet as a query response. In the study, WI-based scheduling policy is proposed for the multi-user system model, and the performance of the policy is demonstrated in the simulation environment.

The work in [28] presents the query-AoI metric for a single receiver in pull-based communication. The calculation of the QAoI metric presented in this study is similar to that of EAoI; however, an instantaneous serving scenario is adopted instead of the proactive serving in [29]. In addition, the transmitter module is assumed to have an energy constraint, and the presence of this constraint changes the nature of the problem by increasing the value of each transmission's QAoI reduction. Within the scope of the study, two query models are examined: the permanent query (PQ) model, a query generation procedure that reduces the studied problem to the standard AoI problem, and the query arrival process-aware (QAPA) model, which generates queries based on periodic or stochastic processes. The optimal scheduling policy under the PQ process is the same as the optimal scheduling policy for AoI. In the QAPA case, the scheduling policy has information on the query process (either stochastic or deterministic) and can schedule accordingly.

A continuous-time status update model is investigated in [30], where a source node submits update packets to a channel with random transmission delay, and the query source tries to pull information from the receiver module according to a stochastic arrival process. The average QAoI is defined as the average AoI measured at query instants, and the system model is examined from both AoI and QAoI perspectives. Age-aware scheduling policies do not use the information about the query process and freshness equally for all frames. On the other hand, QAoI-aware scheduling policies use additional information about the query process in the scheduling decisions. This extra information allows the scheduling policy to distribute transmission attempts more efficiently and reduce the time spent in the FCFS queue. Eventually, from the query source's perspective, QAoI-aware policies can provide better AoI performance than AoI-aware policies.
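The definitional difference between AoI and QAoI can be sketched in a few lines (our illustrative code; the parameters `p_upd` and `p_query` are assumptions, not values from the cited works). AoI averages the age over every frame, while QAoI averages the same process only at query instants:

```python
import random

def aoi_and_qaoi(p_upd=0.3, p_query=0.2, T=100000, seed=7):
    """Illustrative sketch: the receiver's AoI resets to 1 on each
    successful update (Bernoulli w.p. p_upd) and grows by 1 otherwise.
    AoI is averaged over all frames; QAoI averages only over the frames
    in which a query arrives (Bernoulli w.p. p_query, independent of
    the update process)."""
    rng = random.Random(seed)
    age, age_sum, q_sum, q_cnt = 1, 0, 0, 0
    for _ in range(T):
        age = 1 if rng.random() < p_upd else age + 1
        age_sum += age
        if rng.random() < p_query:
            q_sum += age
            q_cnt += 1
    return age_sum / T, q_sum / max(q_cnt, 1)
```

When queries arrive independently of the update process, as here, the two averages nearly coincide; QAoI-aware design pays off precisely when transmission attempts can be aligned with the query process, as discussed above.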

To the best of our knowledge, this is the first work in the literature that considers the practical implementation and evaluation of QAoI-aware scheduling policies. In addition, we propose and implement novel max-weight policies for the effective-AoI and query-AoI system models and evaluate their performance in terms of AoI, EAoI, and QAoI, in both simulation and SDR environments. We have observed that the resulting EAoI-aware max-weight (EAoI-MW) policy has a similar EAoI performance to the WI policy but yields a higher network throughput. We have also observed that the QAoI-aware max-weight (QAoI-MW) policy provides better QAoI performance than AoI-aware policies.

#### **3. System Model**

We consider a wireless multi-user network where a common access point or a base station (BS) needs to send status update packets containing time-sensitive information to multiple receiver modules. Let *M* denote the total number of receivers. We also assume a discrete time system where time is divided into fixed-length frames denoted by *t* ∈ {1, . . . , *T*}. In each frame, the base station (transmitter) is allowed to activate the connection for a single receiver *i* ∈ {1, ... , *M*}, and it cannot send packets to more than one receiver within a frame. A transmission attempt of a status update to a single user takes a constant time, which is assumed to be equal to the duration of one frame. Wireless channels between the receivers and the base station are unreliable. The state of each channel changes randomly from one time slot to the next and is modeled by a Bernoulli random variable. Channel states for each receiver are also independent of the others.

The packet generation scheme in the system follows the "active source" model. At the beginning of each frame, the information sources generate new packets for each receiver, and these packets reach the BS immediately. The base station selects one of these packets for transmission and discards the others. There are no queuing-related delays between the information sources and the base station. If a receiver successfully receives a packet, the AoI of this receiver drops to one, since the newly generated packet reaches the receiver within a frame without observing any delay.

In the system model, there are also query sources linked to each receiver. Each query source is independent of the other, and used to model the behavior of a real-life user or application interested in a particular time-sensitive piece of information at query instants. Query arrival frames to receivers can follow either a deterministic or stochastic pattern. When a query source requests information from a receiver, it sends a query. Then, the receiver responds to it with the latest information that the receiver holds. Query and response messages are transmitted without any errors.

The BS judiciously selects a receiver for transmission according to a stationary scheduling policy *π* ∈ Π represented by *ai*(*t*), for all *i* ∈ {1, ... , *M*} and *t* ∈ {1, ... , *T*}. If the receiver *i* is selected for transmission in frame *t*, then *ai*(*t*) will be equal to one. Otherwise, *ai*(*t*) will be equal to zero. Evaluation of *ai*(*t*) is given in (1).

$$a\_i(t) = \begin{cases} 1 & \text{if the receiver } i \text{ is selected,} \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

If a successful transmission occurs, the base station is informed over an error-free channel within the same frame. By utilizing this knowledge, the scheduling policy can keep track of the AoI of the receivers. Similarly, *ci*(*t*) is a binary variable indicating the random channel state of receiver *i* at frame *t*. If the channel of receiver *i* is ON, a successful transmission can be made at frame *t*, and *ci*(*t*) equals one; otherwise, if the channel is unavailable for transmission, *ci*(*t*) equals zero. We assume *ci*(*t*) is an independent, Bernoulli-distributed random variable and that the probability of successful transmission (i.e., the reliability) is *pi*, for all *i* ∈ {1, . . . , *M*}. Evaluation of *ci*(*t*) is given in (2).

$$c\_i(t) = \begin{cases} 1 & \text{if the channel is ON } \\ 0 & \text{if the channel is OFF } \end{cases} \tag{2}$$

To have a successful transmission in frame *t*, the receiver *i* must be selected for transmission, and the channel status of that receiver must be available for transmission. Let *ui*(*t*) denote the overall result of the transmission to receiver *i* at frame *t*. Evaluation of *ui*(*t*) is given in (3).

$$u\_i(t) = \begin{cases} 1 & \text{if } c\_i(t)a\_i(t) = 1, \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

We also define *fi*(*t*) as the complement of *ui*(*t*) to simplify some equations throughout the paper; that is, *fi*(*t*) = 1 − *ui*(*t*).

The instantaneous AoI of receiver *i* at the beginning of the *t*th frame is denoted by Δ*i*(*t*). Note that Δ*i*(*t*) drops to one if the transmission to receiver *i* succeeds and increases by 1 if receiver *i* is not selected for transmission or fails to successfully receive a packet. Evaluation of Δ*i*(*t*) is given in (4).

$$\Delta\_i(t+1) = \begin{cases} 1 & \text{if } u\_i(t) = 1, \\ \Delta\_i(t) + 1 & \text{otherwise} \end{cases} \tag{4}$$
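The evolution rules in (1)–(4) can be condensed into a single per-frame update. The sketch below is a minimal illustration of ours, not code from the paper; function and variable names are assumptions.

```python
import random

def step_aoi(ages, chosen, reliabilities, rng=random.random):
    """Advance the AoI of every receiver by one frame.

    ages:          current AoI values Delta_i(t), one per receiver.
    chosen:        index of the receiver selected by the scheduler (a_i(t) = 1).
    reliabilities: Bernoulli success probabilities p_i of the channels c_i(t).
    """
    new_ages = []
    for i, age in enumerate(ages):
        # u_i(t) = a_i(t) * c_i(t): success requires selection AND an ON channel.
        success = (i == chosen) and (rng() < reliabilities[i])
        new_ages.append(1 if success else age + 1)  # Equation (4)
    return new_ages
```

Passing a deterministic `rng` makes the two branches of (4) easy to exercise: an always-ON channel resets the selected receiver's age to one, while every unselected (or failed) receiver ages by one frame.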

*di*(*t*) indicates the query presence. If a query arrives at receiver *i* in frame *t*, *di*(*t*) is equal to one; otherwise, it is equal to zero. Evaluation of *di*(*t*) is given in (5).

$$d\_i(t) = \begin{cases} 1 & \text{if a query arrives at the receiver,} \\ 0 & \text{otherwise} \end{cases} \tag{5}$$

The instantaneous query age of receiver *i* at the beginning of the *t*th frame is denoted by Δ*qi* (*t*). Its evaluation varies with the query response scenario adopted in the system model. In this study, we assume that the query arrival at the receiver and the receiver's response both occur at the beginning of the frame; we denote this query response scenario as the "instantaneous serving" scenario. Evaluation of Δ*qi* (*t*) for the instantaneous serving scenario is given in (6).

$$
\Delta\_{q\_i}(t) = d\_i(t)\Delta\_i(t) \tag{6}
$$

An alternative query response scenario called "proactive serving" is defined in the literature in [29]. In proactive serving, the response to the query may be delayed by at most one frame. The purpose of this delay is to put the newest information into the query if the receiver acquires a packet within the queried frame. Nevertheless, unless stated otherwise, the instantaneous serving strategy will be adopted throughout this study. The overall system model is illustrated in Figure 1.

**Figure 1.** The architecture of the system model.

Next, we formally define the AoI and QAoI minimization problems in Sections 3.1 and 3.2, respectively.

#### *3.1. AoI Minimization Problem*

The analytical expressions for the AoI minimization problem have been previously studied in [5]. The objective of the scheduling policy is to minimize the average AoI in the network. Average AoI is calculated for *M* receivers across *T* frames. The objective is to find a stationary scheduling policy *π* ∈ Π that minimizes the long-term average AoI, which is defined in (7).

$$\min\_{\pi \in \Pi} \lim\_{T \to \infty} \mathbb{E}[J\_A(\pi)], \text{ where } J\_A(\pi) = \frac{1}{TM} \sum\_{t=1}^T \sum\_{i=1}^M \Delta\_i(t) \tag{7}$$

#### *3.2. QAoI Minimization Problem*

For the QAoI problem, the main objective of the scheduling policy is to minimize the average age of the query sources in the network. This problem differs from the AoI problem since the query sources do not require fresh data at every instant but only at the queried frames. The difference between the two problem statements has also been previously investigated in [28,30].

There are two major approaches in the literature to calculate the average ages of the users at query instants in pull-based communication systems. In the first approach, the sum of the ages at query instants is divided by the total number of frames. This method is followed by [27,29] to develop age-aware scheduling policies, and the metric is called *effective age of information (EAoI)*. Note that [28] also follows a similar approach in the discounted setting for single-user pull-based communication.

The objective function obtained by utilizing this approach is given in (8).

$$\min\_{\pi \in \Pi} \lim\_{T \to \infty} \mathbb{E}[J\_E(\pi)], \text{ where } J\_E(\pi) = \frac{1}{TM} \left[ \sum\_{i=1}^M \sum\_{t=1}^T \Delta\_{q\_i}(t) \right] \tag{8}$$

The second approach divides the sum of all query ages by the total number of query arrivals. This approach is used by [30] for the average query age calculation. The objective function obtained by utilizing this approach is given in (9).

$$\min\_{\pi \in \Pi} \lim\_{T \to \infty} \mathbb{E}\left[J\_Q(\pi)\right], \text{ where } J\_Q(\pi) = \frac{1}{M} \sum\_{i=1}^M \frac{1}{N\_i(T)} \left[\sum\_{t=1}^T \Delta\_{q\_i}(t)\right],\tag{9}$$

where *Ni*(*T*) denotes the total number of queries that arrive at receiver *i* throughout *T* frames.
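The difference between the normalizations in (8) and (9) can be made concrete with a short numerical sketch for a single receiver. The trace below is a hypothetical example of ours, not data from the paper.

```python
def eaoi(ages, queries):
    """Effective AoI, Eq. (8): sum of query ages divided by ALL frames T."""
    T = len(ages)
    return sum(d * a for a, d in zip(ages, queries)) / T

def qaoi(ages, queries):
    """Query AoI, Eq. (9): sum of query ages divided by the query count N(T)."""
    n = sum(queries)
    return sum(d * a for a, d in zip(ages, queries)) / n

ages = [1, 2, 3, 4, 5]       # Delta_i(t) over T = 5 frames
queries = [0, 0, 1, 0, 1]    # d_i(t): queries arrive at frames 3 and 5
print(eaoi(ages, queries))   # (3 + 5) / 5 = 1.6
print(qaoi(ages, queries))   # (3 + 5) / 2 = 4.0
```

The zeros contributed by unqueried frames pull the EAoI down, exactly the dilution effect discussed below, while the QAoI reflects the age actually seen at query instants.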

Throughout this study, we refer to the metric aligned with the first approach as the *effective age of information (EAoI)*, following its definition in [27]. We call the metric evaluated with the second approach the *query age of information (QAoI)*.

In the average EAoI calculation, the query age of frames in which no query is present is taken as zero and included in the average. This calculation method may give a misleading picture of the average AoI of the query sources: even if the AoI of a rarely queried receiver is very high at query instants, its average remains low due to the inclusion of unqueried frames. Conversely, for a frequently queried receiver, few zeros enter the average, so its EAoI tends to be higher than that of the rarely queried receiver. The effect of the scheduling policy thus becomes less apparent as the query frequency decreases. Therefore, comparing the average EAoI values of two systems with different query arrival frequencies would provide inconsistent results for measuring the performance of scheduling policies. When the same problem is analyzed from the QAoI perspective, the effect of the scheduling policy becomes more decisive, as the unqueried frames are discarded in the average query age calculation.

To examine the QAoI problem, we first consider the case where query generation is an independent Bernoulli process. Note that [28,30] indicate that, for a difference to appear between the QAoI and AoI metrics, the query arrival process must be non-stationary; for Bernoulli query arrivals, the QAoI problem converges to the AoI problem. On the other hand, EAoI can yield results different from AoI even under the Bernoulli arrival scheme.

#### **4. Age-Aware Downlink Scheduling Policies**

In this section, we define scheduling policies to minimize age-aware metrics. We describe AoI-aware, EAoI-aware, and QAoI-aware scheduling policies in Sections 4.1, 4.2, and 4.3, respectively.

#### *4.1. Scheduling Policies for AoI Minimization*

AoI-aware scheduling policies have been previously investigated in [5,7,8,10]. The WI and MW policies proposed in [5,7] utilize the instantaneous ages of the receivers and the reliabilities of the corresponding links to calculate the expected costs {*Ci*} associated with each receiver. To maximize the cost reduction, the scheduling policy selects the receiver with the highest *Ci* at each frame.

The max-weight policy is an adaptation of the Lyapunov optimization technique to the AoI minimization problem. Lyapunov optimization provides a method for penalty minimization while maintaining queue stability [31]. The objective of the MW policy is to minimize the Lyapunov drift in the network through the appropriate scheduling decision, where the drift measures the expected cost increase between two consecutive frames. In each frame, the policy calculates the expected drift reduction achievable by serving each receiver and selects the receiver that yields the largest reduction, thereby minimizing the overall cost. The calculation of the expected costs for the MW policy is given in (10). At each frame, the scheduling policy selects the receiver with the highest *Ci*.

$$C\_i(\Delta\_i(t)) = p\_i\Delta\_i(t)(\Delta\_i(t) + 2)\tag{10}$$

The WI policy has been presented in [5,7,10] by formulating the AoI minimization problem in (7) as a restless multi-armed bandit (MAB) problem. The MAB problem in general aims to optimize the reward in an unknown environment through a series of trials in which the decision-maker can activate only one of the arms, each of which has an immediate reward (or penalty, in the minimization case) associated with it. The closed-form costs (indices) for the WIP are given in (11). At each frame, the scheduling policy transmits to the receiver with the highest *Ci*.

$$C\_i(\Delta\_i(t)) = p\_i\Delta\_i(t)\left[\Delta\_i(t) + \frac{2-p\_i}{p\_i}\right] \tag{11}$$
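Within a single frame, the two indices in (10) and (11) reduce to simple arithmetic over the current ages and reliabilities. The sketch below, with hypothetical receiver values of ours, shows how each policy picks the receiver with the highest cost:

```python
def mw_index(p, age):
    # Equation (10): C_i = p_i * Delta_i * (Delta_i + 2)
    return p * age * (age + 2)

def wi_index(p, age):
    # Equation (11): C_i = p_i * Delta_i * (Delta_i + (2 - p_i) / p_i)
    return p * age * (age + (2 - p) / p)

reliabilities = [0.9, 0.5, 0.2]   # hypothetical channel reliabilities p_i
ages = [2, 4, 9]                  # hypothetical current ages Delta_i(t)

mw_pick = max(range(3), key=lambda i: mw_index(reliabilities[i], ages[i]))
wi_pick = max(range(3), key=lambda i: wi_index(reliabilities[i], ages[i]))
```

Both indices grow quadratically in the age but are damped by low reliability, which is exactly why MW and WIP avoid the starvation behavior of the greedy policy discussed in Section 6.1.4.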

In our study, we implement AoI-aware MW and WI policies on the USRP testbed and compare their performances with round-robin and greedy policies.

#### *4.2. Scheduling Policies for EAoI Minimization*

In the USRP testbed, we implement and evaluate the performance of the EAoI-aware WI policy that was previously proposed in [29]. In addition, we propose an EAoI-aware MW policy by modifying the AoI-aware max-weight policy previously proposed in [5] and compare their performances.

EAoI-aware WI in [29] is given in (12). In each frame, the policy chooses the receiver with the highest *Ci*.

$$C\_i(t) = q\_i(p\_i \Delta\_i(t) + 2)(\Delta\_i(t) - 1) \tag{12}$$

We can derive the max-weight policy for the pull-based instantaneous serving scenario as follows: in line with [5], we select a quadratic Lyapunov function and calculate the Lyapunov drift of the instantaneous EAoIs between consecutive frames.

**Lemma 1.** *In each frame, EAoI-MW policy selects the receiver with highest Ci*(*t*)*, which can be computed as in* (13)*.*

$$\mathcal{C}\_{i}(t) = q\_{i}p\_{i}\Big(\Delta\_{i}^{2}(t) + 2\Delta\_{i}(t)\Big).\tag{13}$$

Derivation of the EAoI-MW Policy can be found in Appendix A.

#### *4.3. Scheduling Policies for QAoI Minimization*

For the QAoI metric investigation, we evaluate the cases where the query arrival process forms a Markov chain. Within the Markov chain, we designate one state as the "Query" state and other states as "non-query" states. When the current state of the Markov chain reaches the query state, a query arrives.

For QAoI minimization, we propose a max-weight-based scheduling policy, following similar steps as in [5]. To adapt this policy to the QAoI model, we utilize the main features of the Markov chain, which determines the query process. In the first step, we calculate the future AoI Δ*i*(*t* + *K*) in terms of current AoI Δ*i*(*t*). The evaluation of AoI between consecutive frames is given in (14).

$$\begin{split} \Delta\_{i}(t+1) &= u\_{i}(t) + (1 - u\_{i}(t))(\Delta\_{i}(t) + 1) \\ &= a\_{i}(t)c\_{i}(t) + (1 - a\_{i}(t)c\_{i}(t))(\Delta\_{i}(t) + 1) \\ &= 1 + f\_{i}(t)\Delta\_{i}(t) \end{split} \tag{14}$$

Repeating this approach multiple times enables us to obtain the future AoI in terms of current AoI. The result is given in (15).

$$\begin{aligned} \Delta\_i(t+1) &= 1 + f\_i(t)\Delta\_i(t) \\ \Delta\_i(t+2) &= 1 + f\_i(t+1) + f\_i(t+1)f\_i(t)\Delta\_i(t) \\ \Delta\_i(t+3) &= 1 + f\_i(t+2) + f\_i(t+2)f\_i(t+1) + f\_i(t+2)f\_i(t+1)f\_i(t)\Delta\_i(t) \end{aligned} \tag{15}$$

In the following equations, we denote future time frames by *t*ˆ. Although it may lead to suboptimal results, for computational convenience we assume that the future decisions *ai*(ˆ*t*) are independent and stationary in time with a fixed expected value. Under this assumption, *fi*(ˆ*t*) is also stationary, and we define *fi* as its common expected value, as shown in Equation (16).

$$\mathbb{E}[f\_i(\hat{t})] = \mathbb{E}[f\_i(t+1)] = \mathbb{E}[f\_i(t+2)] = \cdots = \mathbb{E}[f\_i(t+K)] = f\_i \tag{16}$$

Then, we express the future AoI in closed form in terms of the current AoI in (17).

$$
\Delta\_i(t+K) = \left[\sum\_{k=1}^K f\_i^{k-1}\right] + f\_i^{K-1} f\_i(t) \Delta\_i(t) \tag{17}
$$

To simplify the notation, we define *Fs*(*K*) and *Fm*(*K*) as in (18) and (19).

$$F\_s(K) = \sum\_{k=1}^{K} f\_i^{k-1} \tag{18}$$

$$F\_m(K) = f\_i^{K-1} \tag{19}$$

We then rewrite the simplified version of Equation (17) in Equation (20).

$$
\Delta\_i(t+K) = F\_s(K) + F\_m(K) f\_i(t) \Delta\_i(t) \tag{20}
$$
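Under the stationarity assumption in (16), the closed form (20) can be checked numerically against the one-step recursion (14) by treating the future *fi* as a constant. The consistency check below is our own sketch:

```python
def future_aoi_closed(age, f_now, f, K):
    """Equation (20): Delta(t+K) = F_s(K) + F_m(K) * f_i(t) * Delta(t)."""
    F_s = sum(f ** (k - 1) for k in range(1, K + 1))  # Eq. (18)
    F_m = f ** (K - 1)                                # Eq. (19)
    return F_s + F_m * f_now * age

def future_aoi_recursive(age, f_now, f, K):
    """Iterate Delta(t+1) = 1 + f_i(t) * Delta(t), Eq. (14), K times.

    The first step uses the realized f_i(t); the remaining K - 1 steps use
    the stationary value f, mirroring the expansion in (15).
    """
    d = 1 + f_now * age
    for _ in range(K - 1):
        d = 1 + f * d
    return d
```

For any `K`, both functions agree, confirming that (17)/(20) are just the rolled-up form of the recursion in (15).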

We proceed with the max-weight derivation by defining the Lyapunov function and the Lyapunov drift. Similar to [5], we use the quadratic Lyapunov function given in Equation (21). However, rather than calculating the drift between consecutive frames, we calculate the Lyapunov drift *Yi*(*t*) between the current frame *t* and the expected query-arrival frame *t* + *K*, as given in Equation (22).

$$L(t) = \frac{1}{M} \sum\_{i=1}^{M} \Delta\_{q\_i}^2(t) \tag{21}$$

$$\begin{aligned} Y\_i(t) &= \mathbb{E}\left[\Delta\_i^2(t+K) - \Delta\_i^2(t)\right] \\ &= \mathbb{E}\left[F\_s(K)^2 + 2f\_i(t)F\_s(K)F\_m(K)\Delta\_i(t) + f\_i(t)F\_m^2(K)\Delta\_i^2(t) - \Delta\_i^2(t)\right] \\ &= \mathbb{E}\left[F\_s(K)^2 - \Delta\_i^2(t) + (1 - a\_i(t)c\_i(t))\left[2F\_s(K)F\_m(K)\Delta\_i(t) + F\_m^2(K)\Delta\_i^2(t)\right]\right] \end{aligned} \tag{22}$$

In Equation (22), *ai*(*t*) is the only decision variable whose value the scheduling policy can choose. For simplification, we ignore the terms in *Yi*(*t*) that are not affected by the decision *ai*(*t*), and we denote the decision-dependent part of the Lyapunov drift by *Y*ˆ*i*(*t*).

$$\begin{split} \hat{Y}\_i(t) &= -\mathbb{E}[c\_i(t)]\mathbb{E}[a\_i(t)]\mathbb{E}\left[2F\_s(K)F\_m(K)\Delta\_i(t) + F\_m^2(K)\Delta\_i^2(t)\right] \\ &= -\mathbb{E}[a\_i(t)]p\_i\mathbb{E}\left[2F\_s(K)F\_m(K)\Delta\_i(t) + F\_m^2(K)\Delta\_i^2(t)\right] \end{split} \tag{23}$$

At each frame, the main objective of the scheduling policy is to minimize the Lyapunov drift. Therefore, the scheduling policy must select the receiver with the highest *Ci*(*t*) to achieve the maximum reduction in the Lyapunov drift.

**Lemma 2.** *For each frame, QAoI-aware max-weight (Q-MW) policy selects the receiver with highest immediate cost Ci*(*t*)*. Calculation of immediate cost is given in* (24)*.*

$$C\_i(t) = p\_iF\_m(K)\Delta\_i(t)(F\_m(K)\Delta\_i(t) + 2F\_s(K))\tag{24}$$
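The Q-MW index in (24) combines the channel reliability with the query horizon *K* through *Fs* and *Fm*. The sketch below is ours; the receiver values are hypothetical:

```python
def qmw_index(p, age, f, K):
    """Equation (24): C_i = p_i * F_m(K) * Delta_i * (F_m(K) * Delta_i + 2 * F_s(K)).

    p:   channel reliability p_i
    age: current AoI Delta_i(t)
    f:   stationary failure probability f_i from Eq. (16)
    K:   frames until the expected query arrival
    """
    F_s = sum(f ** (k - 1) for k in range(1, K + 1))  # Eq. (18)
    F_m = f ** (K - 1)                                # Eq. (19)
    return p * F_m * age * (F_m * age + 2 * F_s)
```

A sanity check on the design: for *K* = 1 (a query expected in the very next frame), *Fs* = *Fm* = 1 and the index collapses to *pi*Δ*i*(Δ*i* + 2), recovering the AoI max-weight cost in (10), as one would expect.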

To emphasize what our system model corresponds to in practice and depict the difference between AoI- and QAoI-aware policies, we can consider a simple IoT network as an example. This network consists of sensors, microprocessors, a base station, and individual users. In the network, sensor devices generate time-sensitive data about their current status. Nevertheless, the sensors cannot process this data, and they have to transfer it over a wireless network to remote microprocessors. The sensors send the data to a base station, and the base station transmits this data over the wireless network. However, the transmission capacity of the base station is limited, and it cannot simultaneously transmit data to multiple processors.

There is a dedicated microprocessor for each sensor. Microprocessors use the sensor data and generate status reports. Each microprocessor is tracked by an individual user that queries the processor to obtain the freshest status report about the sensor. Query arrivals to each microprocessor are independent of each other and occur infrequently.

QAoI-aware policies come to the fore when the system requirement prioritizes the query source's demand for timely information. For the system model in this example, AoI-aware policies concentrate on the AoI at the microprocessors, whereas QAoI-aware policies focus on the AoI at the individual users. The impact of a QAoI-aware policy is shown in Figure 2, which examines the instantaneous AoI of a receiver (a microprocessor in our example) in a multi-user network. A query source (an individual user in our example) generates queries at the 41st, 81st, and 121st frames. From the query source's perspective, freshness matters only at the query instants, so the QAoI-aware policy aims to minimize the AoI of the receiver at the 41st, 81st, and 121st frames. Since AoI need not be minimized in every frame, the transmission constraint in the system is relieved, and transmission attempts can be utilized more efficiently.

**Figure 2.** Instantaneous AoI of a receiver in a multi-user network.

#### **5. Implementation**

In this section, we describe our implementation work on the USRPs. We first provide detailed information about the implementation environment. Then, we describe the packet interface used to transmit time-sensitive information in Section 5.2 and explain the runtime of our setup in Section 5.3.

Software-based radios, also called "software radios" in pioneering studies, allow the user to change the main parameters of a communication system, such as center frequency, bandwidth, and coding, solely through software [32,33]. With SDRs, all layers of the communication system, from the physical layer to the application layer, can be modified purely in software. These radios play an important role in the development of today's technologies, which require rapid prototyping of various parameters, protocols, and standards, because they remove the burden of producing extra hardware for test and development and provide significant savings in time and cost.

For the AoI testbed implementation, we use one Ettus USRP N210, one NI USRP 2930, and two NI USRP 2920 SDR devices. General specifications of the devices are available in their datasheets [34,35]. Each USRP has independent transmit and receive modules, so a device can operate as a transmitter and a receiver simultaneously. Nevertheless, it is not possible to run two transmission operations simultaneously.

The host computer runs a LabVIEW application that interfaces with the USRP devices. Each USRP communicates with the host computer via a 1 Gb Ethernet link. Signals are decomposed into in-phase and quadrature components and carried in Ethernet packets. Each transmitter and receiver module contains an amplifier that is controllable through software. In the experiments, we often use these amplifiers to change the channel reliabilities.

The LabVIEW environment contains useful built-in functions for system implementation, which we use frequently in our study. We also benefited from the PSK-modulated communication system examples and the packet-based digital link tutorials provided by LabVIEW and the LabVIEW community [36].

#### *5.1. Setup*

Among the four USRPs, one is configured as the base station, and the other three are receiver modules. The setup configuration for the implementation is given in Table 1, and an overview of the USRP testbed is shown in Figure 3.

**Figure 3.** Overview of the implementation environment.

System time is discretized into frames of 50 ms duration. The LabVIEW application keeps track of the frame number, that is, the total number of frames that have elapsed since the experiment began; the frame number serves as the system's reference clock. All radios run in separate threads of a single LabVIEW application on the host computer, which reduces synchronization difficulties, as all USRPs are managed from a single host.

We use QPSK modulation in the air interface. The maximum operating frequency of the USRP-2920 is 2.2 GHz [35], and we choose a center frequency of 1.9 GHz for all receivers. We prefer a high carrier center frequency to induce higher path loss, since the area available in the test environment is limited.

The sampling rate of the USRP is configurable via the LabVIEW application; detailed information about this configuration is given in the USRP documentation [35]. NI specifies that the I/Q rate must be multiplied by 0.8 to convert it to the sample rate [37]. In the implementation, we use an I/Q rate of 500k samples/s, which corresponds to a sampling rate of 400k samples/s, or a bandwidth of 200 kHz. This bandwidth meets the requirements of our target application. Selecting a higher I/Q rate increases the bandwidth; however, it also causes more data to be processed and transported, putting a higher load on the USRP and the Ethernet connection and eventually inducing higher delay. Since timeliness is the primary concern in AoI measurements, we keep the I/Q rate low to achieve a stable operating point without overloading the USRP and the Ethernet link.
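The rate bookkeeping in this paragraph is a one-line conversion; the sketch below applies the 0.8 factor quoted above (the variable names are ours):

```python
iq_rate = 500_000             # configured I/Q rate, samples/s
sample_rate = iq_rate * 0.8   # NI-documented conversion factor -> 400k samples/s
bandwidth = sample_rate / 2   # as stated in the text: 400k samples/s <-> 200 kHz
print(sample_rate, bandwidth)
```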

**Table 1.** Overview of parameters.


#### *5.2. Packet Interface*

In the implementation, time-sensitive information is carried in packets. The structure of the packet interface is summarized in Figure 4. There are six guard bits at the beginning of a packet, placed to prevent the pulse-shaping filter from damaging the message content. The synchronization field starts after the guard bits: a 30-bit synchronization sequence known in advance by both the sending and receiving modules. This sequence is created by a LabVIEW function that generates pseudo-random bits in the Galois domain. Receivers, which continuously acquire data from the air interface, use the synchronization sequence to detect the beginning of a packet.

**Figure 4.** Packet content in the air interface.

The message field contains the time-sensitive information and consists of the Receiver ID (RX ID) and Packet ID fields. The RX ID field is a 4-bit address used to distinguish receivers; each receiver has a unique RX ID. When a receiver obtains a packet, it locates the RX ID field in the packet content and compares it with its own RX ID. If they do not match, the receiver discards the packet, and Δ*i*(*t*) for that receiver increases by one in the next frame.

The frame number is the reference clock of the entire system. It starts from one at the beginning of the experiment and increases by one with each frame. Upon generation of a packet, the Packet ID field is filled with the current frame number; thus, the Packet ID field serves as the packet's timestamp. Since the receiver also knows the current frame number, the difference between the packet's creation frame (contained in the Packet ID field) and the current frame gives the instantaneous age Δ*i*(*t*).
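Since the Packet ID carries the generation frame, the receiver-side age computation is just a subtraction. The sketch below is our own paraphrase of the rule stated above, not testbed code:

```python
def instantaneous_aoi(current_frame, packet_id):
    """Receiver-side AoI from the packet timestamp.

    packet_id holds the frame number at which the packet was generated,
    so the age is the receiver's current frame number minus that timestamp,
    as described in the text.
    """
    return current_frame - packet_id
```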

Packets sent over an unreliable channel may be corrupted by noise. The receiver should discard packets containing incorrect information, since processing such data may lead to incorrect AoI measurements. To detect errors, we use a cyclic redundancy check (CRC). During packet generation, we pass the message field through a 16-bit CRC and write the result to the CRC field. When a receiver obtains a packet, it first calculates the CRC of the packet's message field and compares the result with the CRC field in the packet; if the two are equal, the message is considered error-free. We track the number of successful CRC checks for each receiver, thereby dynamically estimating the channel reliability throughout the experiment. Accurate estimation of the channel reliability values is essential, as the MW policy and the WIP take these values as input. We pre-run the setup to initialize the channel reliabilities, and AoI measurements during the pre-run are discarded.
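The receiver-side CRC validation can be sketched as follows. The paper does not specify which 16-bit polynomial is used, so the common CRC-16-CCITT parameters (polynomial 0x1021, initial value 0xFFFF) are assumed here purely for illustration:

```python
def crc16_ccitt(data: bytes, poly: int = 0x1021, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16-CCITT over a byte string (assumed polynomial)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            # Shift left; XOR in the polynomial whenever the top bit pops out.
            crc = ((crc << 1) ^ poly) if (crc & 0x8000) else (crc << 1)
            crc &= 0xFFFF
    return crc

def packet_ok(message: bytes, received_crc: int) -> bool:
    # Receiver side: recompute the CRC of the message field and compare it
    # with the CRC field carried in the packet.
    return crc16_ccitt(message) == received_crc
```

With these parameters, the standard check string `b"123456789"` yields 0x29B1; a single corrupted byte in the message field makes `packet_ok` fail, so the packet is discarded and the receiver's age keeps growing.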

#### *5.3. Runtime*

LabVIEW supports multi-threading, which allows us to execute processes independently in different threads. We implement the Receiver, Transmitter, and Logging modules as separate threads so that these operations run simultaneously. LabVIEW also provides synchronization between threads; with this feature, processes running in different threads can be organized to follow one another. The runtime of the system proceeds step by step as follows:


long enough compared to the transmission thread's time to complete its task so that the receiver can acquire the signal sent by the transmitter.


This experimental procedure is repeated at each frame. After the overall experiment is finished, results are saved to a text file.

#### **6. Experiments and Results**

Throughout this section, we present the results obtained in the USRP environment and in MATLAB simulations: the performance of AoI-aware policies in Section 6.1, EAoI-aware policies in Section 6.2, and QAoI-aware policies in Section 6.3.

#### *6.1. Evaluation of AoI-Aware Scheduling Policies*

In this section, we share the results of the experiments conducted in the SDR network. We evaluate the performance of the AoI-aware scheduling policies and compare them with the round-robin and greedy policies. The round-robin policy activates all links sequentially, one per frame, regardless of any prior knowledge about the receivers. The greedy policy uses the AoIs of the receivers and selects the receiver with the highest age for packet transmission.
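The two baseline selection rules can be reproduced in a few lines of simulation. The sketch below is our own minimal Bernoulli-channel model of the discrete-time system (it illustrates the selection rules, not the testbed results):

```python
import random

def simulate(policy, reliabilities, T=10_000, seed=0):
    """Average AoI J_A, Eq. (7), of a scheduling policy over T frames."""
    rng = random.Random(seed)
    M = len(reliabilities)
    ages = [1] * M
    total = 0
    for t in range(T):
        total += sum(ages)                    # accumulate sum of Delta_i(t)
        i = policy(t, ages)                   # scheduler picks one receiver
        ok = rng.random() < reliabilities[i]  # Bernoulli channel c_i(t)
        ages = [1 if (j == i and ok) else a + 1 for j, a in enumerate(ages)]
    return total / (T * M)

# Round-robin: cycle through the links; greedy: serve the oldest receiver.
round_robin = lambda t, ages: t % len(ages)
greedy = lambda t, ages: max(range(len(ages)), key=lambda i: ages[i])
```

For example, `simulate(greedy, [0.9, 0.6, 0.3])` returns the long-run average AoI of the greedy rule under those (hypothetical) reliabilities, which can then be compared against `simulate(round_robin, ...)` for the same channels.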

We evaluate the scheduling policies under various conditions by changing the channel reliabilities of the receivers between experiments. To change the channel reliabilities, we manipulate the gains of the receiver and transmitter USRPs, which LabVIEW allows us to configure. Moreover, we place the receiver USRPs at different distances from the transmitter USRP to induce diverse path losses. When the signal power of the transmitter USRP increases, all receivers in the network acquire stronger signals, so the channel reliabilities of all receivers increase. Throughout the experiments, we also adjust the power gains of the receivers to alter the channel reliabilities: a receiver's power gain is directly proportional to its channel reliability, and increasing the signal gain of a receiver reduces its error probability.

In the experiments, we run scheduling policies multiple times at each power gain level and take the average of the obtained results. We compare the scheduling policies in terms of the average AoI and the throughput of the network.

#### 6.1.1. Adjusting the Gain of an Individual Receiver

In this case, we increase the input signal gain of an individual receiver USRP. We test the policies ten times at each gain level and average the results of the repeated runs. In each experiment, the horizon is *K* = 7500 frames, and *M* = 3 receivers are available in the network. Results of the experiments in terms of AoI and throughput are given in Figure 5. Average channel reliabilities for each USRP gain level are given in Table 2.

**Figure 5.** Evaluation of average AoI *JA* (**a**) and throughput (**b**) with varying receiver gain (SDR testbed).


**Table 2.** Channel statistics in the first experiment set.

6.1.2. Adjusting the Gain of the Base Station

In this case, we increase the output signal gain of the transmitter USRP (base station). We test the policies five times at each transmitter gain level and average the results of the repeated runs. In each experiment, the horizon is *K* = 7500 frames, and *M* = 3 receivers are active in the network. The average AoI and throughput of the age-aware policies are illustrated in Figure 6, and the channel statistics for the experimental setup are given in Table 3.

**Figure 6.** Evaluation of average AoI *JA* (**a**) and throughput (**b**) with varying BS output gain (SDR testbed).




#### 6.1.3. Comparison of SDR Testbed Results with Simulations

In this section, we compare the simulation results with the SDR realization. We use the experiment described in Section 6.1.2 as the reference, apply the same channel reliabilities from Table 3 in the simulation environment, and evaluate the policies. The results of the comparison in terms of average AoI and throughput are given in Figure 7.

**Figure 7.** Comparison of simulation and implementation in terms of average AoI *JA* (**a**) and throughput (**b**).

#### 6.1.4. Interpretation of the Results

As the channel reliability decreases, MW and WIP increasingly outperform the other policies. Both take the channel reliability into account in the scheduling decision, which enables a more efficient use of transmission attempts. The greedy policy, on the other hand, does not utilize channel reliability information: if a receiver with a very low-quality channel is present, the greedy policy may block the network with continuous unsuccessful update attempts to that receiver, increasing the average AoI of the network. In the first experiment set, whose results are illustrated in Figure 5, the performance degradation of the greedy policy is more apparent. The greedy policy always tries to send an update packet to the receiver with the worst channel condition; however, that receiver rarely receives packets successfully, and the base station gets stuck on that receiver until a packet is successfully received. Since the round-robin policy transmits to all receivers one by one without using any feedback on whether packets are received successfully, this starvation problem does not occur. In both experiments, we also observe that as the channel reliabilities improve and the asymmetry among the channels decreases, the greedy policy performs better than round-robin, and the performances of both policies converge to the optimum. As the channel reliability rises to 100%, all scheduling policies behave like round-robin and transmit to all receivers in cyclic order.

For the SDR testbed versus simulation comparison, we use the same average channel reliabilities in both environments. We do not observe any significant difference in throughput, as expected. However, in terms of AoI, the simulation yields lower values than the SDR implementation. In the simulation environment, the channel state is a Bernoulli random variable, whereas in the SDR implementation the channel state is shaped by realistic conditions and need not be stationary or Bernoulli-distributed. The regularity of packet arrivals is an essential factor for low AoI: even if the channel reliabilities averaged over time are equal in the SDR and simulation environments, the imperfections of the realistic channel may reduce the update regularity more drastically than a Bernoulli-distributed channel.

#### *6.2. Evaluation of EAoI-Aware Scheduling Policies*

In this section, we compare the EAoI-aware policies with the traditional policies. Traditional policies do not utilize query statistics for scheduling decisions, and we aim to observe the outcomes of using query statistics. We evaluate the policies in the SDR environment and use EAoI as the primary performance indicator. We also share results about AoI and throughput metrics. Throughout the experiments, query presences at each frame are implemented as i.i.d. Bernoulli random variables. In each experiment, the frame length is *K* = 7500, and *M* = 3 receivers are active in the network. We use the proactive serving method as the query response scenario.

#### 6.2.1. Adjusting the Gain of an Individual Receiver

In this case, we increase the output signal gain of an individual receiver USRP. Throughout the experiments, we test the policies ten times for each gain level and average the results of the repeated experiments. The evaluation of EAoI and AoI throughout the experiments is given in Figure 8. Channel statistics corresponding to USRP gain levels are given in Table 4, and query statistics are given in Table 5.

**Figure 8.** Evaluation of effective AoI *JE* (**a**) and throughput (**b**) with varying input gain of second receiver (SDR testbed).


**Table 5.** Query statistics.


#### 6.2.2. Adjusting the Gain of the Base Station

In this case, we increase the output signal gain of the transmitter USRP (base station). Throughout the experiments, we test the policies at least five times for each gain level and average the results of the repeated experiments. The evaluation of EAoI and AoI throughout the experiments is given in Figure 9. Channel statistics corresponding to USRP gain levels are given in Table 6, and query statistics are given in Table 7.


**Table 6.** Channel statistics.

**Table 7.** Query statistics.


#### 6.2.3. Interpretation of the Results

For the EAoI minimization objective, the EAoI-MW and EAoI-WI policies outperform the policies that do not utilize query information. Moreover, experimental results show that EAoI-MW surpasses EAoI-WIP in terms of throughput. For EAoI-aware scheduling policies, whether the policy is derived for the instantaneous serving or the proactive serving scenario does not cause a significant difference in EAoI performance: rather than using the exact timings of the query arrivals, EAoI-aware policies weight receivers according to their long-term query arrival statistics, and these statistics do not differ significantly between the proactive and instantaneous response scenarios. As can be seen from Figures 8 and 9, the EAoI performance of EAoI-MW, derived for the instantaneous response scenario, and that of EAoI-WIP, derived for the proactive response scenario, are very close to each other.

#### *6.3. Evaluation of QAoI-Aware Scheduling Policies*

In this section, we share the results of our experiments. We conducted the experiments in the simulation environment and the SDR environment. Throughout the experiments, we evaluated the performance of the QAoI-aware MW policy in terms of QAoI and AoI, and we used the AoI-aware MW policy as a benchmark.

#### 6.3.1. Results from Simulation Environment

We conducted four experiments in the MATLAB environment. In each experiment, the frame length was *K* = 1,100,000, and *M* = 10 receivers were active in the network. Within the experiments, we adjusted the query period of the receivers and observed the effect of this change from the AoI and QAoI perspectives. We initialized the query periods so that the query frames of the receivers did not overlap. Taking advantage of the non-overlapping queries, we assume *ai*(*t*) is stationary over time and calculate *fi* as *fi* = 1 − *pi* in the Q-MW policy. The channel reliabilities (long-term average packet success rates) measured in the experiments are summarized in Table 8. The average QAoI and AoI obtained by the Q-MW policy for each experiment are given in Figures 10 and 11, respectively.



**Figure 10.** Evaluation of Q-MW policy in terms of average QAoI (*JQ*).

**Figure 11.** *Cont.*

**Figure 11.** Evaluation of Q-MW policy in terms of average AoI (*JA*).

#### 6.3.2. Results from the USRP Testbed

We conducted two experiment sets in the SDR testbed. In both sets, we increased the output signal gain of the transmitter USRP (base station) to observe the effects of various channel reliabilities (i.e., packet success rates). For each signal gain level, we tested the policies at least ten times and averaged the results of the repeated experiments. The frame length of each test was *K* = 7500, and there were *M* = 3 receivers in the network. The query period of the receivers was 25 in the first experiment set and 5 in the second. In both sets, we initialized the query periods to prevent the arrival of multiple queries within the same frame. Taking advantage of the non-overlapping queries, we assume *ai*(*t*) is stationary over time and calculate *fi* as *fi* = 1 − *pi* in the Q-MW policy.

For the first experiment set, evaluation of QAoI and AoI are given in Figures 12 and 13, respectively. Channel reliabilities corresponding to USRP gain levels are given in Table 9. For the second experiment set, evaluation of QAoI and AoI are given in Figures 14 and 15, respectively. Channel reliabilities corresponding to USRP gain levels are given in Table 10.

**Figure 12.** Evaluation of QAoI (*JQ*) for varying power levels of transmitter USRP, 25 frames length query period.

**Figure 13.** Evaluation of AoI (*JA*) for varying power levels of transmitter USRP, 25 frames length query period.

**Table 9.** Channel statistics.


In the second experiment set, we increased the output signal gain of the transmitter USRP (base station) and tested the policies at least ten times for each gain level, averaging the results of the repeated experiments. In each experiment, the frame length was *K* = 7500, and *M* = 3 receivers were active in the network. The query period for each receiver was 5 frames, and we initialized the query periods such that the queried frames of the receivers did not overlap in the same frame.

**Figure 14.** Evaluation of QAoI for varying power levels of transmitter USRP.

**Figure 15.** Evaluation of AoI for varying power levels of transmitter USRP.

**Table 10.** Channel statistics.


#### 6.3.3. Comparison of SDR Testbed Results with Simulations

In this section, we share the results of the comparison between simulation and realization. We use the results of the experiment illustrated in Figure 14 as a reference for the simulation. We use the same channel reliabilities from Table 10 for the simulation environment. Results of the comparison in terms of QAoI are given in Figure 16.

**Figure 16.** Comparison of simulation and realization results.

#### 6.3.4. Interpretation of the Results

Within the scope of the experiments, we studied the case where query arrivals are periodic. According to the results of both SDR realization and simulations, the Q-MW policy outperforms the AoI-MW policy for the QAoI minimization objective. By utilizing the query arrival information, the Q-MW scheduling policy can select receivers more efficiently, and thus it can exhibit superior QAoI performance compared to AoI-MW.

Throughout the simulations, we investigated Q-MW in networks with various channel reliabilities. In the first experiment, we considered ten receivers with good channel reliability. According to the results of this experiment, in cases where the query periods of the receivers do not overlap, the QAoI policy can reduce the average QAoI in the network to approximately one, which is the lowest possible value. In the first experiment, the expected number of attempts to update a receiver is close to one. A reduced number of attempts enables the scheduling policy to distribute scheduling decisions more effectively and eases the alignment of the scheduling decisions with the query periods. In the second experiment, all receivers have poor channel qualities, and the number of attempts needed to update a receiver is high. As the channel reliabilities decrease, the expected number of attempts to update a receiver increases, and aligning the scheduling decisions with the query periods becomes more challenging; in this case, the performance of query-aware policies is reduced. As the query period increases, the queried frames of the receivers become more distant from each other, which positively affects the performance of query-aware policies.

One of the most fundamental limitations of the network is that the transmission can be allocated to only a single receiver in each frame. The QAoI-aware MW policy we propose eases this constraint in effect: by taking the receivers' query periods into account, it reduces the need for packet transmission in the network and aligns the transmission allocation with the query periods.

#### **7. Conclusions and Future Work**

Within the scope of the paper, we have examined the AoI, EAoI, and QAoI, which are semantic communication metrics that prioritize information freshness. We implemented a multi-user wireless network with SDRs to examine these metrics in real-world scenarios. We investigated the performance of AoI-aware scheduling policies by comparing their AoI performance with traditional scheduling policies. The emulation results reveal that the WI and MW policies are superior to the round-robin and greedy policies, as they exploit the information on the link reliabilities and AoIs of the receivers. Experimental results obtained in the SDR testbed are close to simulation results when packet drops are rare, but as the link reliabilities decrease, they begin to show some slight discrepancies. We attribute this to the following: the AoI-aware policies adopted in this work were derived under Bernoulli-distributed packet drops; however, as channels get poorer, the sequence of packet drops tends to acquire a memory.

We have also studied the Effective AoI and Query AoI metrics to examine the freshness of information from the perspective of the query source in pull-based networks. For the EAoI domain, we proposed the EAoI-MW policy by leveraging the formerly defined AoI-aware policies. We implemented and tested the EAoI-MW and EAoI-WI policies on the SDR network. Experiment results show that utilizing statistical information about the query process significantly improves EAoI performance. We observed that the EAoI-MW policy exhibits performance comparable to EAoI-WI and yields better throughput. For the Query AoI metric, we have proposed a scheduling policy by adapting the max-weight policy to the QAoI case for multi-user pull-based network scenarios. We tested the resulting Q-MW policy in both the simulation and SDR implementation environments. Results reveal that the Q-MW policy can reduce QAoI significantly compared to AoI-aware policies.

In future studies, we seek to examine different semantic metrics beyond AoI. To this end, we want to expand the scope of our work on the QAoI. We also aim to investigate and optimize the Q-MW policy for the stochastic query arrival scenarios.

**Author Contributions:** Conceptualization, T.K.O., E.T.C., E.U., and T.G.; methodology, T.K.O., E.T.C., and E.U.; software, T.K.O.; validation, T.K.O.; formal analysis, T.K.O., E.T.C., and E.U.; investigation, T.K.O., E.T.C., E.U., and T.G.; resources, T.K.O., E.T.C., E.U., and T.G.; data curation, T.K.O., E.T.C.; writing—original draft preparation, T.K.O.; writing—review and editing, E.T.C., E.U., and T.G.; visualization, T.K.O., E.T.C., E.U., and T.G.; supervision, E.T.C., E.U., and T.G.; project administration, E.U.; funding acquisition, E.U. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work has been supported by TUBITAK-BIDEB Grant 119C028.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **Appendix A**

The framework of the Lyapunov optimization aims to minimize the Lyapunov function at every time instant *t*. Lyapunov function at time *t* is given in (A1).

$$L(t) = \frac{1}{M} \sum\_{i=1}^{M} \Delta\_{q\_i}^2(t) \tag{A1}$$

At frame transitions, each receiver causes a drift in *L*. For the EAoI concept, we focus on the drift between consecutive frames. The drift is associated with the receiver's AoI evolution. For each receiver, calculation of the Lyapunov drift *Yi*(*t*) between the frames *t* and *t* + 1 is given in (A2).

$$\mathcal{Y}\_{i}(t) = \mathbb{E}\left[\Delta\_{q\_{i}}^{2}(t+1) - \Delta\_{q\_{i}}^{2}(t)\right] = \mathbb{E}\left[d\_{i}(t)\Delta\_{i}^{2}(t+1) - d\_{i}(t)\Delta\_{i}^{2}(t)\right] \tag{A2}$$

Since we assume that policies are non-anticipative, which means the policies do not have information about future channel or query status, we can argue that Δ*i*(*t*) and *di*(*t*) are independent.

$$\mathbb{E}[d\_i(t)\Delta\_i(t)] = \mathbb{E}[d\_i(t)]\mathbb{E}[\Delta\_i(t)]\tag{A3}$$

We rewrite the Lyapunov drift using E[*di*(*t*)] = *qi* and E[*di*(*t* + 1)] = *qi*.

$$\mathcal{Y}\_{i}(t) = \mathbb{E}\left[q\_{i}\Delta\_{i}^{2}(t+1) - q\_{i}\Delta\_{i}^{2}(t)\right] \tag{A4}$$

After this modification, the derivation process becomes identical to that in [5]. We write the transition of Δ*i*(*t*) between consecutive frames in (A5).

$$\begin{split} \Delta\_i(t+1) &= u\_i(t) + (1 - u\_i(t))(\Delta\_i(t) + 1) \\ &= a\_i(t)c\_i(t) + (1 - a\_i(t)c\_i(t))(\Delta\_i(t) + 1) \end{split} \tag{A5}$$

Then, we rewrite the Lyapunov drift by expressing Δ*i*(*t* + 1) in terms of Δ*i*(*t*).

$$\mathcal{Y}\_{i}(t) = \mathbb{E}\left[q\_{i}[u\_{i}(t) + (1 - u\_{i}(t))(\Delta\_{i}(t) + 1)]^{2} - q\_{i}\Delta\_{i}^{2}(t)\right] \tag{A6}$$

Since *ui*(*t*) is a 0-or-1 variable, we have $u\_i^2(t) = u\_i(t)$, $(1 - u\_i(t))^2 = 1 - u\_i(t)$, and $u\_i(t)(1 - u\_i(t)) = 0$. With these simplifications, we can rewrite *Yi*(*t*).

$$\begin{split} \mathcal{Y}\_{i}(t) &= \mathbb{E}\left[q\_{i}\left[u\_{i}(t) + (1 - u\_{i}(t))(\Delta\_{i}(t) + 1)^{2}\right] - q\_{i}\Delta\_{i}^{2}(t)\right] \\ &= \mathbb{E}\left[q\_{i}\left[\Delta\_{i}^{2}(t) + 2\Delta\_{i}(t) + 1 - u\_{i}(t)\Delta\_{i}^{2}(t) - 2u\_{i}(t)\Delta\_{i}(t)\right] - q\_{i}\Delta\_{i}^{2}(t)\right] \\ &= \mathbb{E}\left[q\_{i}\left[2\Delta\_{i}(t) + 1 - u\_{i}(t)\Delta\_{i}^{2}(t) - 2u\_{i}(t)\Delta\_{i}(t)\right]\right] \\ &= q\_{i}\left[2\Delta\_{i}(t) + 1 - \mathbb{E}[u\_{i}(t)]\Delta\_{i}^{2}(t) - 2\mathbb{E}[u\_{i}(t)]\Delta\_{i}(t)\right] \\ &= \left[2q\_{i}\Delta\_{i}(t) + q\_{i} - q\_{i}p\_{i}\mathbb{E}[a\_{i}(t)]\Delta\_{i}^{2}(t) - 2q\_{i}p\_{i}\mathbb{E}[a\_{i}(t)]\Delta\_{i}(t)\right] \end{split} \tag{A7}$$

The decision variable *ai*(*t*) can be selected as zero or one, and we aim to investigate only the effect of changing it. Since the other terms in *Yi*(*t*) do not change as *ai*(*t*) changes, we omit them and focus on the terms that have *ai*(*t*) as a coefficient.

$$\begin{split} \hat{Y}\_{i}(t) &= -q\_{i}p\_{i} \mathbb{E}[a\_{i}(t)]\Delta\_{i}^{2}(t) - 2q\_{i}p\_{i} \mathbb{E}[a\_{i}(t)]\Delta\_{i}(t) \\ &= -\mathbb{E}[a\_{i}(t)]\left(q\_{i}p\_{i}\Big(\Delta\_{i}^{2}(t) + 2\Delta\_{i}(t)\Big)\right) \\ &= -\mathbb{E}[a\_{i}(t)]C\_{i}(t) \end{split} \tag{A8}$$

$$C\_{i}(t) = q\_{i}p\_{i}\left(\Delta\_{i}^{2}(t) + 2\Delta\_{i}(t)\right) \tag{A9}$$

The max-weight policy aims to minimize the average Lyapunov drift by selecting, at every frame *t*, the receiver *i* with the maximum $C\_i(t) = q\_i p\_i \left(\Delta\_i^2(t) + 2\Delta\_i(t)\right)$.
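As a small illustration (our naming, not the authors' implementation), the resulting selection rule is a one-liner over the receivers' states:

```python
def eaoi_max_weight(delta, p, q):
    """Pick the receiver i maximizing C_i(t) = q_i * p_i * (delta_i^2 + 2*delta_i),
    the drift term of (A9). delta, p, and q hold the AoIs, channel reliabilities,
    and query probabilities of the receivers."""
    return max(range(len(delta)),
               key=lambda i: q[i] * p[i] * (delta[i] ** 2 + 2 * delta[i]))
```

A receiver with a moderate AoI but a reliable channel and frequent queries can outrank one whose AoI is larger but whose updates rarely succeed.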

#### **References**


## *Article* **Age of Information Minimization for Radio Frequency Energy-Harvesting Cognitive Radio Networks**

**Juan Sun 1, Shubin Zhang 1, Changsong Yang <sup>2</sup> and Liang Huang 1,\***


**Abstract:** The Age of Information (AoI) measures the freshness of information and is a critical performance metric for time-sensitive applications. In this paper, we consider a radio frequency energy-harvesting cognitive radio network, where the secondary user harvests energy from the primary users' transmissions and opportunistically accesses the primary users' licensed spectrum to deliver status-update data packs. We aim to minimize the AoI subject to the energy causality and spectrum constraints by optimizing the sensing and update decisions. We formulate the AoI minimization problem as a partially observable Markov decision process and solve it via dynamic programming. Simulation results verify that our proposed policy is significantly superior to the myopic policy under different parameter settings.

**Keywords:** Age of Information; RF energy-harvesting; cognitive radio network; dynamic programming

#### **1. Introduction**

To cope with both the spectrum scarcity and energy shortage challenges in future wireless networks, radio frequency (RF) energy-harvesting in cognitive radio networks (CRN) has become increasingly attractive. Cognitive radio technology allows secondary users (SUs) to opportunistically access the primary users' (PUs) licensed spectrum, on the condition that the SUs' transmissions must not cause harmful interference to the PUs [1–4]. Meanwhile, the RF energy-harvesting technique overcomes the intermittency and uncontrollability of conventional charging techniques that absorb energy from renewable energy sources [5–7]. Hence, it can simultaneously improve energy efficiency and spectral efficiency, as the SUs can capture both energy and spectrum [8].

While existing works mainly investigated throughput of the RF energy-harvesting CRN, many emerging applications require timely status-update delivery [9–15], i.e., health monitoring, environment monitoring, smart building, vehicle-to-vehicle networking, and so on. For example, in health monitoring, the sensors continuously measure and update blood pressure and heartbeat to the health monitoring platform, which implies the importance of the freshness and timeliness of status-update. The Age of Information (AoI) as a recently proposed performance metric can be used to quantify the freshness and timeliness of status-update [16–23]. It is defined as the time elapsed since the generation time of the latest successfully received status-update at the destination.

Some innovative efforts have been devoted to the AoI of CRN [24–28]. In [24], the authors considered a cognitive wireless sensor network with a cluster of SUs, where the authors proposed a joint and scheduling strategy that optimized energy efficiency of a communication system subject to the expected AoI. The authors in [25] considered an overlay CRN where the SU acted as a relay. The SU forwarded the PU's packets or transmitted its own packets. The optimal policy for status-update and packet relaying was investigated to minimize the average AoI and energy efficiency. In [26], the authors

**Citation:** Sun, J.; Zhang, S.; Yang, C.; Huang, L. Age of Information Minimization for Radio Frequency Energy-Harvesting Cognitive Radio Networks. *Entropy* **2022**, *24*, 596. https://doi.org/10.3390/e24050596

Academic Editors: Anthony Ephremides and Yin Sun

Received: 23 February 2022 Accepted: 21 April 2022 Published: 24 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

analyzed the average peak AoI of the PU and SU for both overlay and underlay schemes. The asymptotic expressions of the average peak AoI were derived when the PU operated at a high signal-to-noise ratio. Considering that it is difficult for the PU to keep time-slotted synchronization with the SU, the authors in [27] investigated AoI minimization in a CRN with an unslotted PU. The closed-form expression was derived by conducting a Markov chain analysis. In [28], the authors considered AoI minimization for an energy-harvesting CRN. They assumed that the SU harvests energy from ambient energy sources and derived the optimal sensing and update policies for both perfect and imperfect spectrum sensing.

Overall, the aforementioned research efforts rarely address AoI minimization for RF energy-harvesting CRN. Motivated by this, this article attempts to minimize the average AoI by adaptively making sensing and updating decisions subject to the energy causality and spectrum constraints with imperfect spectrum sensing. The system consists of one PU and one SU. Different from [28], the SU harvests RF energy from PU transmissions instead of ambient energy sources, which is further used to generate and deliver the status-update data pack when the PU is idle. The SU utilizes the harvested energy to perform spectrum sensing and updating. The main contributions of this paper are summarized as follows:


The remaining part of the paper is organized as follows. In Section 2, we review the works on RF energy-harvesting CRN in the literature. Section 3 describes the studied system model for RF energy-harvesting CRN. Section 4 first formulates the AoI minimization problem as a POMDP framework and then solves it through the dynamic programming. Section 5 presents simulation results and discussions. Finally, Section 6 concludes this paper.

#### **2. Related Works on RF Energy-Harvesting CRN**

Recently, cognitive radio technology has drawn significant attention as a promising solution to overcome the severe scarcity of licensed spectrum. Cognitive radio allows SUs to opportunistically access the PUs' licensed spectrum, on the condition that the SUs' transmissions must not cause harmful interference to the PUs [1–3]. Spectrum sensing is an important functionality in the cognitive radio system [29], by which the SUs decide whether the spectrum is occupied by the PUs. It can be performed by a single SU or in cooperation with multiple SUs. The SUs can only transmit data when the PUs are idle [30]. Various spectrum-sensing approaches have been developed by exploiting different features of the PUs' signal [31], such as coherent detection [32], energy detection [33], and feature detection [34].

On the other hand, energy shortage is also a challenge in future wireless networks. Over the past years, the RF energy-harvesting technique has emerged as a candidate method for charging low-power wireless devices, as it overcomes the intermittency and uncontrollability of conventional charging techniques that absorb energy from renewable energy sources [5–7]. In [35], the authors proposed the harvest-then-transmit (HTT) protocol, one of the important transmission strategies of RF energy-harvesting technology, where the users first harvest energy from the hybrid access point (HAP) and then use the captured energy to transmit information to the HAP. Several related works exist. In [36], the authors investigated a wireless-powered communication network (WPCN) where one HAP coordinated the wireless information and energy transmissions to a set of nodes, and minimization of the transmission completion time subject to a per-node throughput requirement was considered. Furthermore, the authors studied a similar WPCN scenario in [37], where they focused on energy provision minimization for two physical-layer protocols, non-orthogonal multiple access (NOMA) and time-division multiple access (TDMA). Unlike the common WPCN with a fixed HAP, the minimization of transmission completion time in an aerial-vehicle-enabled WPCN was investigated in [38].

To jointly solve the aforementioned two challenges including spectrum scarcity and energy shortage, introducing RF energy-harvesting in CRN has been increasingly attractive due to the fact that it can simultaneously improve energy efficiency and spectral efficiency, where the SUs can both capture energy and spectrum [8]. The timely-delivery probability of data packs for the RF energy-harvesting SU was derived in [39], where the SU opportunistically accesses the spectrum vacated by the PU to deliver real-time data packs and harvests RF energy when the PU is active. Unlike the traditional RF energy-harvesting CRN system where the SU keeps synchronization with the PU, the authors in [40] considered unslotted PU. The sensing intervals were derived to balance between energy harvesting and spectrum access. However, both [39,40] focused on a simple CRN consisting of one PU and one SU. The authors in [41–43] considered a more general scenario where there were multiple SUs or multiple PUs. In [41], the multiple selection strategy was proposed for RF energy-harvesting CRN to maximize the SUs' average throughput. In [42], the authors studied a hybrid energy-harvesting SU that can capture energy from both renewable sources and ambient radio frequency signals. The asymptotic activity behavior of a single SU was analyzed by deriving the theoretical upper bound on sensing and transmission opportunities. In [43], the authors investigated the end-to-end throughput maximization by jointly optimizing the transmit power and time allocation for multiple SUs.

#### **3. System Model**

As illustrated in Figure 1, we investigate AoI minimization for an RF energy-harvesting CRN, where the system consists of one PU, one SU, and one cognitive base station (CBS) communicating with the SU. The SU is a wireless sensor node that monitors a physical process and randomly generates status updates for the CBS. It has no embedded power supply and harvests RF energy from the PU's transmissions. Additionally, it opportunistically accesses the PU's licensed spectrum. We consider a time-slotted system with a time interval of *T* time slots. The duration of each time slot is sufficient for the SU to generate one status-update data packet and for the CBS to receive it successfully. Without loss of generality, we assume that the time slot duration is 1 s. The important notations are summarized in Table 1.

**Figure 1.** System model. In each time slot, the SU can harvest energy from the PU transmissions and can deliver the status-update date pack to the CBS when the channel is idle.


**Table 1.** List of notations used in this paper.

#### *3.1. Primary User Model*

The occupancy of a channel by the PU is modeled as a two-state continuous-time Markov chain [44], i.e., active (*A*) and idle (*I*) states. In each time slot, the PU either stays in the idle state or occupies the spectrum in an active state. The two-state (active/idle) Markov chain model for modeling PU activity has been verified to be an appropriate model to characterize spectrum occupancy in the time domain [45]. Let *qt* ∈ {*A*, *I*} denote the state of the PU for *t* = 0, 1, ... , *T* − 1. The transition probabilities of the two-state Markov chain are expressed as *pai* and *pii*, which represent transitioning from the active state to the idle state, and still staying in the idle state, respectively. For *t* = 0, 1, . . . , *T* − 1, we have

$$
\boldsymbol{p}\_{ai} \triangleq \mathbb{P}(\boldsymbol{q}\_{t+1} = \boldsymbol{I} | \boldsymbol{q}\_t = \boldsymbol{A}), \tag{1}
$$

$$p\_{ii} \triangleq \mathbb{P}(q\_{t+1} = I | q\_t = I). \tag{2}$$

The transition probabilities are known to SU, which can be obtained by long-term measurements.
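As a sketch (our naming, not from the paper), the PU occupancy process can be sampled directly from these transition probabilities; the long-run idle fraction converges to the stationary probability *pai*/(*pai* + 1 − *pii*).

```python
import random

def simulate_pu(p_ai, p_ii, T, seed=0):
    """Sample a PU activity trace ('A' = active, 'I' = idle) from the
    two-state Markov chain with transition probabilities (1)-(2)."""
    rng = random.Random(seed)
    state, trace = 'A', []
    for _ in range(T):
        trace.append(state)
        if state == 'A':
            state = 'I' if rng.random() < p_ai else 'A'  # active -> idle w.p. p_ai
        else:
            state = 'I' if rng.random() < p_ii else 'A'  # idle -> idle w.p. p_ii
    return trace
```

Such a trace is one way the SU could estimate the transition probabilities by long-term measurement, as assumed above.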

#### *3.2. Secondary User Model*

We considered the SU time-slotted synchronization with the PU. At the beginning of each time slot, the SU needs to decide whether to sense the PU's spectrum. If it decides not to sense the spectrum, it takes the entire time slot to harvest energy from the PU transmissions. That is, energy can be harvested when the PU is active; otherwise, no energy is harvested. We assume the imperfect sensing outcome for the SU [46]. We denote the probability of a false alarm by *pf* (i.e., the probability of deciding the spectrum is occupied by the PU while it is not). The probability of detection is denoted by *pd* (i.e., the probability of deciding PU is active when it is active). Then, we have

$$p\_f = \mathbb{P}(\hat{q}\_t = A | q\_t = I),\tag{3}$$

$$p\_d = \mathbb{P}(\hat{q}\_t = A | q\_t = A). \tag{4}$$

The SU takes one of two actions after obtaining the sensing result. When the PU is sensed to be active, the SU does not deliver the status-update data pack; it can thus harvest energy when the PU is actually active. On the other hand, if the sensing result is that the spectrum is vacated by the PU, the SU needs to further decide whether to update. If an update package is delivered, the SU receives a 1-bit feedback signal from the CBS indicating whether the update succeeded. When the sensing result $\hat{q}\_t = I$ is correct, the update is successful; this happens with probability 1 − *pf* . An update failure occurs if the PU is active despite the SU declaring it idle; this happens with probability 1 − *pd*. The SU aims to minimize the average AoI by making the optimal sensing and update decisions over time slots *t* = 0, 1, ... , *T* − 1. We denote the decision at time slot *t* by *xt* = (*φt*, *θt*), where *φ<sup>t</sup>* ∈ {0 (not sense), 1 (sense)} and *θ<sup>t</sup>* ∈ {0 (not update), 1 (update)} denote the sensing and update decisions, respectively. The optimal sensing and update decisions are based on the SU's states and its statistical knowledge of the PU activity.
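The imperfect sensing outcome of (3) and (4) can be sketched as a small sampler (illustrative helper, our naming):

```python
import random

def sense(true_state, p_f, p_d, rng):
    """Return the sensed state given the true PU state: a false alarm ('A')
    occurs with probability p_f when the PU is idle, and correct detection
    ('A') occurs with probability p_d when the PU is active."""
    if true_state == 'I':
        return 'A' if rng.random() < p_f else 'I'
    return 'A' if rng.random() < p_d else 'I'
```

Under this model, an update attempt after sensing 'I' succeeds exactly when the PU really is idle, which is the source of the 1 − *pd* failure probability discussed above.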

(1) Belief model: The SU observes the availability of the PU spectrum by adaptively detecting and accessing the spectrum. The belief state of the PU spectrum can be obtained based on the SU's action and observation history. That is, at the beginning of each time slot *t*, the SU forms the belief *ρt*. The belief *ρ<sup>t</sup>* is the conditional probability that the PU is in an idle state given the SU's action and observation history.

(2) Channel model: Denote the channel power gains from the PU to the SU and from the SU to the CBS by *ht* and *gt* over time slot *t*. We consider a quasi-static channel model based on one time slot by assuming that the channel state information is constant in a single time slot and variable in different time slots. Especially, as is commonly assumed in the works about the wireless communication system, the channel state information of the current time slot can be perfectly obtained.

(3) RF energy-harvesting model: The battery-free SU harvests energy from the spectrum occupied by the PU. The SU employs the HTT protocol: it first captures energy from the PU transmissions and then utilizes the harvested energy to sense the spectrum and transmit data. Overall, there are two cases in which energy can be harvested in a time slot: (1) the not-sense decision is made and the PU is active, or (2) the sense decision is made and the sensing result $\hat{q}\_t = A$ is correct. The energy captured by the SU is expressed as

$$E\_{H,t}^{m} = \eta \,\mathrm{\tau} \,\mathrm{Ph\_{t}} \tag{5}$$

for *t* = 0, 1, ... , *T* − 1 and *m* = 1, 2, where *η*, *τ*, and *P* denote the energy-harvesting efficiency, the energy-harvesting time, and the transmit power at the PU, respectively. The superscript *m* denotes the two energy-harvesting cases mentioned above. The captured energy is used to perform spectrum sensing and updating. Denote the energy and time consumption of spectrum sensing by *δ* and *τs*, respectively. Similarly, let *ET*,*t* and *τt* denote the energy and time consumption of an update, respectively. The energy consumption *ET*,*t* is time-varying, as it depends on the channel state information *gt* from the SU to the CBS. According to Shannon's formula [47], the transmission rate $\frac{S}{\tau\_t}$ can be expressed as $\frac{S}{\tau\_t} = W \log\_2\left(1 + \frac{E\_{T,t}\, g\_t}{\tau\_t \sigma^2}\right)$, where $\sigma^2$ is the noise power at the CBS, *S* is the size of the status-update data pack, and *W* is the bandwidth. Reorganizing this expression, we obtain the energy consumption, *ET*,*t*, as

$$E\_{T,t} = \frac{\sigma^2 \tau\_t}{g\_t} \left( 2^{\frac{S}{\tau\_t W}} - 1 \right). \tag{6}$$

Since the size of the status-update data packet is fixed, *ET*,*t* depends only on the channel state information from the SU to the CBS. Although an update reduces the AoI to one, when the channel quality is poor it may be better not to deliver the status-update data packet so as to conserve energy. Note that an update failure occurs if the sensing result *q̂t* = *I* is incorrect; in this case, the SU consumes all of its available energy. Let *B*max denote the battery capacity of the SU. In time slot *t*, the battery state *bt* evolves as

$$b\_{t+1} = \min\{b\_t + E\_{H,t}^m - \phi\_t \delta - \theta\_t E\_{T,t}, B\_{\max}\}, \quad t = 0, 1, \dots, T - 1. \tag{7}$$

Hence, for the SU, the energy causality constraint should satisfy

$$
\phi\_t \delta + \theta\_t E\_{T,t} \le b\_t, \ t = 0, 1, \dots, T - 1. \tag{8}
$$
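For concreteness, the energy model in (5)–(8) can be sketched in Python. The function names mirror the paper's notation, and the numeric inputs in the usage below are illustrative assumptions, not the paper's simulation values.

```python
def harvested_energy(eta, tau, P, h_t):
    """Eq. (5): energy the SU captures from the PU's transmission."""
    return eta * tau * P * h_t

def update_energy(S, W, tau_t, g_t, sigma2):
    """Eq. (6): energy needed to deliver an S-bit update in time tau_t."""
    return (sigma2 * tau_t / g_t) * (2.0 ** (S / (tau_t * W)) - 1.0)

def battery_step(b_t, E_H, E_T, phi_t, theta_t, delta, B_max):
    """Eq. (7): battery evolution, clipped at the capacity B_max."""
    return min(b_t + E_H - phi_t * delta - theta_t * E_T, B_max)

def energy_causal(b_t, E_T, phi_t, theta_t, delta):
    """Eq. (8): the slot's actions must be payable from the current battery."""
    return phi_t * delta + theta_t * E_T <= b_t
```

For example, with *S* = *W* = 10^6, *τt* = 1, *gt* = 1, and *σ*^2 = 1, `update_energy` returns exactly 1, since 2^1 − 1 = 1.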

(4) AoI model: We consider a linear model for the AoI [16], where the AoI is defined as the time elapsed since the most recently received update was generated. Let the AoI at time slot *t* be denoted by *at* ∈ A ≜ {1, 2, . . ., *A*max}. Here, *A*max is the upper bound of the AoI and is defined as

$$A\_{\text{max}} = a\_0 + T.\tag{9}$$

In the considered system, the SU adopts the generate-at-will scheme; that is, the SU generates and delivers a status-update data packet immediately after making an update decision. We consider an error-free channel, so the status-update data packet is successfully received at the CBS by the end of the current time slot whenever the update decision is made and the sensing result *q̂t* = *I* is correct. The packet size *S* is small enough for the packet to be generated, transmitted, and received within one time slot. If the update is received at the CBS, the AoI decreases to one; otherwise, it increases by one. The average AoI over an interval of *T* time slots is expressed as

$$\overline{A} = \frac{1}{T} \sum\_{t=0}^{T-1} a\_t. \tag{10}$$
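A minimal sketch of this AoI recursion and the average in (10), assuming the error-free channel described above (the function names are ours):

```python
def aoi_step(a_t, updated, success):
    """AoI recursion: reset to one on a successful update, else age by one."""
    return 1 if (updated and success) else a_t + 1

def average_aoi(a_0, events):
    """Average AoI over T slots, Eq. (10); events = (updated, success) per slot."""
    ages, a = [], a_0
    for updated, success in events:
        ages.append(a)                 # AoI experienced during this slot
        a = aoi_step(a, updated, success)
    return sum(ages) / len(ages)
```

For instance, starting from *a*0 = 1, a successful update in the second of three slots yields the age sequence 1, 2, 1 and an average AoI of 4/3.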

#### **4. POMDP for AoI Minimization**

In this section, we formulate the AoI minimization as a finite-horizon POMDP problem and solve for the optimal solutions via dynamic programming.

#### *4.1. POMDP Formulation*

We use a POMDP framework to model the optimal sensing and update decisions for the SU's AoI minimization. The components of POMDP are described as follows.


Case 1: The SU does not sense the spectrum; the new belief is given by

$$
\rho\_{t+1} = \Lambda\_0(\rho\_t) = \rho\_t p\_{ii} + (1 - \rho\_t) p\_{ai}.\tag{11}
$$

Case 2: If the PU is sensed to be active, the SU harvests energy in the remaining time of the current time slot, i.e., the battery energy increases. This implies the true state of the PU is *qt* = *A*. The belief is updated as

$$
\rho\_{t+1} = p\_{ai}.\tag{12}
$$

Case 3: If the PU is sensed to be active, the SU does not harvest energy; i.e., the battery energy does not change and is lower than *B*max. This implies the true state of the PU is *qt* = *I*. The new belief is expressed as

$$
\rho\_{t+1} = p\_{ii}.\tag{13}
$$

Case 4: If the PU is sensed to be active, the battery energy is *B*max at time slot *t*. The new belief is given by

$$
\rho\_{t+1} = \Lambda\_{1A}(\rho\_t) = \zeta\_t p\_{ii} + (1 - \zeta\_t) p\_{ai}, \tag{14}
$$

where

$$\zeta\_t \triangleq \mathbb{P}(q\_t = I | \hat{q}\_t = A) = \frac{\rho\_t p\_f}{\rho\_t p\_f + (1 - \rho\_t) p\_d}.\tag{15}$$

Case 5: If the PU is sensed to be idle, the SU does not update. The belief is updated as

$$
\rho\_{t+1} = \Lambda\_{1I}(\rho\_t) = \zeta'\_t p\_{ii} + (1 - \zeta'\_t) p\_{ai},\tag{16}
$$

where

$$\zeta'\_t \triangleq \mathbb{P}(q\_t = I | \hat{q}\_t = I) = \frac{\rho\_t (1 - p\_f)}{\rho\_t (1 - p\_f) + (1 - \rho\_t)(1 - p\_d)}.\tag{17}$$

Case 6: If the PU is sensed to be idle, the SU updates successfully. This implies that the true state of the PU is *qt* = *I*. Then, we have

$$
\rho\_{t+1} = p\_{ii}.\tag{18}
$$

Case 7: If the PU is sensed to be idle but the SU's update fails, the true state of the PU is *qt* = *A*. Then, we have

$$
\rho\_{t+1} = p\_{ai}.\tag{19}
$$

Although (11)–(19) cover seven cases, the new beliefs in cases two and seven are both *pai*, and the new beliefs in cases three and six are both *pii*. Hence, the SU can only transition to five distinct beliefs per slot; that is, the number of possible beliefs over *T* time slots is finite. Thus, for a horizon of *T* time slots, the belief space Φ is a finite set.
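The reachable beliefs can be computed as below; the function names are ours, and *pii*, *pai*, *pf*, *pd* follow the paper's notation, with `rho` the probability that the PU is idle.

```python
def belief_no_sense(rho, p_ii, p_ai):
    """Case 1, Eq. (11): no sensing; propagate the belief through the chain."""
    return rho * p_ii + (1.0 - rho) * p_ai

def belief_sensed_active_full(rho, p_ii, p_ai, p_f, p_d):
    """Case 4, Eqs. (14)-(15): sensed active with a full battery."""
    zeta = rho * p_f / (rho * p_f + (1.0 - rho) * p_d)
    return zeta * p_ii + (1.0 - zeta) * p_ai

def belief_sensed_idle_no_update(rho, p_ii, p_ai, p_f, p_d):
    """Case 5, Eqs. (16)-(17): sensed idle, no update attempted."""
    zeta_p = rho * (1.0 - p_f) / (rho * (1.0 - p_f) + (1.0 - rho) * (1.0 - p_d))
    return zeta_p * p_ii + (1.0 - zeta_p) * p_ai

# Cases 2 and 7 reveal the PU as active, so the belief jumps to p_ai;
# cases 3 and 6 reveal it as idle, so the belief jumps to p_ii.
```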

• States: Denote the discrete battery energy level of the SU at the beginning of time slot *t* by *b̄t* ∈ B ≜ {0, 1, 2, . . ., *b*max}, where *b*max is the maximum battery energy level that can be stored in the battery of the SU. Each energy quantum of the SU's battery then contains *B*max/*b*max Joules. We use *b̄t* = ⌊*bt* *b*max/*B*max⌋ to convert the continuous battery energy of the SU into a discrete battery energy level, which yields a lower bound on the AoI of the original continuous system. Similarly, the continuous channel power gains are divided into a finite number of intervals according to the fading probability density function (PDF). Thus, the discrete channel power gain levels from the SU to the CBS and from the PU to the SU are denoted by *ḡt* ∈ G ≜ {0, 1, 2, . . ., *g*max} and *h̄t* ∈ H ≜ {0, 1, 2, . . ., *h*max}, respectively, where *g*max and *h*max denote the corresponding maximum channel power gain levels. At each time slot *t*, the completely observable state comprises the channel state from the PU to the SU, the channel state from the SU to the CBS, the AoI state, and the battery state, denoted by *st* ≜ (*h̄t*, *ḡt*, *at*, *b̄t*). Note that the state space S ≜ H × G × A × B is finite. Due to imperfect sensing, an update may be unsuccessful when the sensing result is *q̂t* = *I* and the update decision is *θt* = 1. Thus,

$$a\_{t+1} = \begin{cases} 1, & \text{when } \mathbf{x}\_t = (1,1) \text{ and } \hat{q}\_t = q\_t, \\ a\_t + 1, & \text{otherwise,} \end{cases} \tag{20}$$

for *t* = 0, 1, . . ., *T* − 1. Alternatively, when the sensing result *q̂t* = *I* is correct, we can express *at*+1 = (1 − *θt*)*at* + 1. Additionally, the PU's spectrum state is only partially observable and is described by the belief *ρt*. Thus, for each time slot *t*, the complete system state is denoted by (*st*, *ρt*). Since S and Φ are finite, the SU experiences a finite number of possible system states (*st*, *ρt*) ∈ S × Φ.
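The floor-based battery quantization described above can be sketched as follows; the function is an illustrative reading of the mapping, not the authors' code.

```python
import math

def battery_level(b, B_max, b_max):
    """Map continuous energy b in [0, B_max] to a level in {0, ..., b_max}."""
    return min(b_max, math.floor(b * b_max / B_max))
```

With *B*max = 0.5 mJ and *b*max = 5, each quantum is 0.1 mJ, so a battery holding 0.26 mJ maps to level 2, i.e., the quantization never overstates the stored energy.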

• Transition probabilities: For time slot *t*, given the current state *st* = (*h̄t*, *ḡt*, *at*, *b̄t*) and the action *xt* = (*φt*, *θt*), the transition probability to the next state *st*+1 = (*h̄t*+1, *ḡt*+1, *at*+1, *b̄t*+1) is denoted by *pxt*(*st*+1|*st*). Since the captured energy and the channel power gains are independently and identically distributed (i.i.d.), the transition probabilities for actions other than *xt* = (1, 1) are given as follows.

$$p\_{\mathbf{x}\_t}(\mathbf{s}\_{t+1}|\mathbf{s}\_t) = \mathbb{P}(a\_{t+1}|a\_t, \mathbf{x}\_t)\,\mathbb{P}(b\_{t+1}|b\_t, g\_t, h\_t, \mathbf{x}\_t)\,\mathbb{P}(g\_{t+1})\,\mathbb{P}(h\_{t+1}),\tag{21}$$

where

$$\mathbb{P}(a\_{t+1}|a\_t, \mathbf{x}\_t) = \begin{cases} 1, & \text{when } a\_{t+1} = (1 - \theta\_t)a\_t + 1, \\ 0, & \text{otherwise,} \end{cases} \tag{22}$$

$$\mathbb{P}(b\_{t+1}|b\_t, g\_t, h\_t, \mathbf{x}\_t) = \begin{cases} 1, & \text{when } \phi\_t = 0 \text{ and } b\_{t+1} = \min\{b\_t + E\_{H,t}^{1}, B\_{\max}\}, \\ 1, & \text{when } \phi\_t = 0 \text{ and } b\_{t+1} = b\_t, \\ 1, & \text{when } \phi\_t = 1,\ \theta\_t = 0, \text{ and } b\_{t+1} = \min\{b\_t - \delta + E\_{H,t}^{2}, B\_{\max}\}, \\ 1, & \text{when } \phi\_t = 1,\ \theta\_t = 0, \text{ and } b\_{t+1} = b\_t - \delta, \\ 0, & \text{otherwise.} \end{cases} \tag{23}$$

For the action *xt* = (1, 1), the transition probability is expressed as follows.

$$p\_{\mathbf{x}\_t}(\mathbf{s}\_{t+1}|\mathbf{s}\_t, \hat{q}\_t, q\_t) = \mathbb{P}(a\_{t+1}|a\_t, \mathbf{x}\_t, \hat{q}\_t, q\_t) \times \mathbb{P}(b\_{t+1}|b\_t, g\_t, h\_t, \mathbf{x}\_t) \times \mathbb{P}(g\_{t+1})\,\mathbb{P}(h\_{t+1}),\tag{24}$$

where

$$\mathbb{P}(a\_{t+1}|a\_t, \mathbf{x}\_t) = \begin{cases} \zeta'\_t, & \text{when } a\_{t+1} = 1 \text{ and } q\_t = \hat{q}\_t, \\ 1 - \zeta'\_t, & \text{when } a\_{t+1} = a\_t + 1 \text{ and } q\_t \neq \hat{q}\_t, \\ 0, & \text{otherwise,} \end{cases} \tag{25}$$

and

$$\mathbb{P}(b\_{t+1}|b\_t, g\_t, h\_t, \mathbf{x}\_t) = \begin{cases} 1, & \text{when } \phi\_t = 1,\ \theta\_t = 1,\ b\_{t+1} = b\_t - \delta - E\_{T,t}, \text{ and } \hat{q}\_t = q\_t, \\ 1, & \text{when } \phi\_t = 1,\ \theta\_t = 1,\ b\_{t+1} = 0,\ \hat{q}\_t = I, \text{ and } q\_t = A, \\ 0, & \text{otherwise.} \end{cases} \tag{26}$$

• Cost: Let the immediate cost at state *st* be denoted by *C*(*st*), which is the AoI at time slot *t*. Then, we have

$$C(s\_t) = a\_t, \ t = 0, 1, \ldots, T - 1. \tag{27}$$

• Policy: A policy is expressed as *π* = {*ϑ*0, *ϑ*1, . . ., *ϑT*−1}, where *ϑt* is a deterministic decision rule that maps a system state (*st*, *ρt*) ∈ S × Φ to an action *xt* ∈ X, i.e., *xt* = *ϑt*(*st*, *ρt*). In this paper, Π denotes the set of all deterministic decision policies.

Given the SU's initial state *s*<sup>0</sup> and belief *ρ*<sup>0</sup> of PU's spectrum, the average AoI of *T* time slots under the policy *π* is given by

$$\overline{A}^{\pi}(s\_0, \rho\_0) = \frac{1}{T} \mathbb{E}\left[\sum\_{t=0}^{T-1} C(s\_t) \,\Big|\, s\_0, \rho\_0\right],\tag{28}$$

where the expectation is taken with respect to the randomness induced by the policy *π*. Based on the above analysis, minimizing the average AoI by finding the optimal sensing and update policy corresponds to solving

$$\min\_{\pi \in \Pi} \overline{\mathcal{A}}^{\pi}(s\_0, \rho\_0). \tag{29}$$

Given *T*, (29) is a finite-state MDP with a total-cost criterion. Based on (28) and (29), to minimize the average AoI, the SU should sense the spectrum and deliver the status-update data packet whenever it has sufficient energy. However, considering the channel state information, the belief about the PU's spectrum, and the available battery energy, always preferring spectrum sensing and status updates may not be the best decision. As a result, an optimal decision-scheduling problem arises.

#### *4.2. POMDP Solution*

In this section, we use dynamic programming to solve the total-cost minimization over *T* time slots in (29) [48]. At time slot *t*, the successive actions *xt*, *xt*+1, . . ., *xT*−1 affect the states *sk* and the accumulated AoI *C*(*sk*) for all *k* = *t*, *t* + 1, ... , *T* − 1. Let *Vt*(*st*, *ρt*) denote the state-value function, which is given by

$$V\_t(s\_t, \rho\_t) \triangleq \min\_{\{x\_k\}\_{k=t}^{T-1}} \mathbb{E}\left[\sum\_{k=t}^{T-1} C(s\_k) \,\Big|\, s\_t, \rho\_t\right].\tag{30}$$

It is the minimum expected cost accumulated from time slot *t* to *T* − 1 given the state (*st*, *ρt*). Thus, the minimum average AoI in (29) is *A*∗ = *V*0(*s*0, *ρ*0)/*T*. Additionally, given (*st*, *ρt*) and the sensing action *φt*, let *Q^{φt}\_t*(*st*, *ρt*) represent the action-value function (Q-function), i.e., the minimum expected cost of taking sensing action *φt* in state (*st*, *ρt*). The Q-function comprises two parts: the immediate cost of the action at the current state and the expected sum of state-value functions from the next time slot onward.

Overall, the formulated MDP problem can be solved recursively by dynamic programming as follows. For *t* = 0, 1, . . . , *T* − 1,

$$V\_t(s\_t, \rho\_t) = \min\_{\phi\_t \in \Gamma\_\phi} Q\_t^{\phi\_t}(s\_t, \rho\_t).\tag{31}$$

When *t* = *T* − 1, we have

$$Q\_{T-1}^{0}(s\_{T-1}, \rho\_{T-1}) = C(s\_{T-1}) + C(s\_T),\tag{32}$$

$$Q\_{T-1}^{1}(s\_{T-1}, \rho\_{T-1}) = (1 - \Delta\_{T-1})C(s\_{T-1}) + \Delta\_{T-1} \min\_{\theta\_{T-1} \in \Gamma\_\theta} C(s\_{T-1}) + C(s\_T).\tag{33}$$

When *t* = 0, 1, . . . , *T* − 2, we have

$$Q\_t^0(s\_t, \rho\_t) = C(s\_t) + \sum\_{s\_{t+1}} p\_{00}(s\_{t+1}|s\_t) V\_{t+1}(s\_{t+1}, \Lambda\_0(\rho\_t)),\tag{34}$$

$$Q\_t^{1}(s\_t, \rho\_t) = (1 - \Delta\_t)Q\_t^{1A}(s\_t, \rho\_t) + \Delta\_t \min\_{\theta\_t \in \Gamma\_\theta} Q\_t^{1\theta\_t}(s\_t, \rho\_t),\tag{35}$$

$$Q\_t^{1A}(s\_t, \rho\_t) = C(s\_t) + \sum\_{s\_{t+1}} p\_{10}(s\_{t+1}|s\_t) V\_{t+1}(s\_{t+1}, \Lambda\_{1A}(\rho\_t)),\tag{36}$$

$$Q\_t^{10}(s\_t, \rho\_t) = C(s\_t) + \sum\_{s\_{t+1}} p\_{10}(s\_{t+1}|s\_t) V\_{t+1}(s\_{t+1}, \Lambda\_{1I}(\rho\_t)),\tag{37}$$

$$\begin{split} Q\_t^{11}(s\_t, \rho\_t) &= C(s\_t) + \sum\_{s\_{t+1}} p\_{11}(s\_{t+1}|s\_t, \hat{q}\_t = q\_t) V\_{t+1}(s\_{t+1}, p\_{ii}) \\ &+ \sum\_{s\_{t+1}} p\_{11}(s\_{t+1}|s\_t, \hat{q}\_t \neq q\_t) V\_{t+1}(s\_{t+1}, p\_{ai}), \end{split} \tag{38}$$

where Δ*t* represents the probability that the PU is sensed to be idle. That is,

$$
\Delta\_t = \mathbb{P}(\hat{q}\_t = I) = \rho\_t (1 - p\_f) + (1 - \rho\_t)(1 - p\_d). \tag{39}
$$

In particular, *Q*^{1*A*}\_*t*(*st*, *ρt*) in (36) represents the minimum expected cost of adopting sensing action *φt* = 1 with sensing result *q̂t* = *A*, i.e., *xt* = (1, 0). In (37) and (38), given the sensing action *φt* = 1 and sensing result *q̂t* = *I*, *Q*^{10}\_*t*(*st*, *ρt*) and *Q*^{11}\_*t*(*st*, *ρt*) denote the minimum expected costs of adopting update actions *θt* = 0 and *θt* = 1, respectively. Then, by the recursion in (31)–(38), the optimal sensing and update policies are given by

$$\phi\_t^\*(s\_t, \rho\_t) \in \underset{\phi\_t \in \Gamma\_\phi}{\operatorname{argmin}}\, Q\_t^{\phi\_t}(s\_t, \rho\_t),\tag{40}$$

$$\theta\_t^\*(s\_t, \rho\_t) \in \underset{\theta\_t \in \Gamma\_\theta}{\operatorname{argmin}}\, Q\_t^{1\theta\_t}(s\_t, \rho\_t).\tag{41}$$
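The recursion in (31)–(41) follows the standard finite-horizon backward-induction pattern, which can be sketched generically as below. The state and action sets, the transition kernel, and the cost function are placeholders rather than the paper's exact POMDP components.

```python
def backward_induction(states, actions, trans, cost, T):
    """Finite-horizon total-cost DP; trans[s][a] is a list of (prob, s_next)."""
    V = {s: 0.0 for s in states}            # terminal values V_T = 0
    policy = [dict() for _ in range(T)]
    for t in reversed(range(T)):            # t = T-1, ..., 0
        V_new = {}
        for s in states:
            # Q_t(s, a): immediate cost plus expected cost-to-go, cf. (34)-(38)
            Q = {a: cost(s) + sum(p * V[s2] for p, s2 in trans[s][a])
                 for a in actions}
            a_star = min(Q, key=Q.get)      # argmin over actions, cf. (40)-(41)
            policy[t][s] = a_star
            V_new[s] = Q[a_star]
        V = V_new
    return V, policy                        # V[s0] / T gives the average cost (29)
```

As a toy usage, with two AoI-like states {0, 1}, cost(s) = s + 1, and a "reset" action that returns to state 0, the computed first-slot policy at state 0 is "reset", since resetting keeps the future cost-to-go low.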

#### **5. Numerical Results**

In this section, we evaluate the performance of our proposed optimal policy by comparing it with the myopic policy and the random policy. At the beginning of time slot *t*, under the myopic policy, the SU senses the spectrum if it has enough energy; when the sensing result is *q̂t* = *I*, the SU generates and delivers a status-update data packet if the available energy is sufficient. Under the random policy, the SU randomly chooses between delivering the status-update data packet and harvesting energy. To protect the PU's transmission, the probability of harvesting energy is set to 90% and the probability of delivering the status-update data packet to 10%. If the SU chooses to deliver the status-update data packet while the spectrum is occupied by the PU, the status update fails and the AoI increases by one. The PU's state transition probabilities are *pii* = 0.8 and *pai* = 0.5, and the probability of detecting an active PU is *pd* = 0.8. The channel power gains from the PU to the SU and from the SU to the CBS are modeled as *h* = ΥΨ^2 *d*\_1^{−*κ*} and *g* = ΥΨ^2 *d*\_2^{−*κ*}, where *d*1 and *d*2 denote the distances from the PU to the SU and from the SU to the CBS, respectively, Υ represents the signal power gain at a reference distance of 1 m, Ψ ∼ exp(1) denotes the small-scale fading gain, and *d*\_1^{−*κ*} and *d*\_2^{−*κ*} are standard power-law path losses with exponent *κ*. In the simulations, the system parameters are set as follows: *η* = 0.5, *σ*^2 = −95 dBm, *W* = 1 MHz, Υ = 0.2, *κ* = 2, *b*max = 5, *g*max = *h*max = 10, *ρ*0 = *pii*, *τs* = 0.2 s, and *A*max = 13.
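One draw of the channel gain in this setup can be sketched as below, taking the stated model *h* = ΥΨ^2 *d*^{−*κ*} with Ψ ∼ exp(1) at face value; the function name and default arguments are our assumptions.

```python
import random

def sample_gain(Upsilon=0.2, kappa=2.0, d=20.0, rng=random):
    """Draw one channel power gain h = Upsilon * Psi**2 * d**(-kappa)."""
    psi = rng.expovariate(1.0)   # small-scale fading gain, Psi ~ exp(1)
    return Upsilon * psi ** 2 * d ** (-kappa)
```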

Figure 2 shows one sample path of the AoI under the optimal policy. The transmit power of the PU is 35 dBm, the energy consumed by sensing is one energy quantum, the distance from the SU to the CBS is 20 m, the distance from the PU to the SU is 25 m, the size of the status-update data packet is 14 Mbits, and the battery capacity is 0.5 mJ. The trend of the AoI over time slots is clearly observed. In the simulations, we found that the SU sometimes did not sense the spectrum even when the remaining energy was sufficient, which verifies the foresight of the optimal policy compared with the myopic policy.

Figure 3 shows the AoI versus the size of the status-update data packet, where the simulation setup is similar to that of Figure 2. It is clear that our proposed policy is superior to the other policies. For the random policy, the AoI is greater than 5.57 due to the low probability of delivering the status-update data packet. Considering the poor AoI performance of the random policy, we compare only our algorithm with the myopic algorithm in the subsequent numerical evaluations. We can also observe that the AoI increases with the size of the status-update data packet. The reason is that a larger status-update data packet requires more energy to deliver, which decreases the possibility that the SU has enough energy to update and hence increases the AoI.

**Figure 2.** One sample path of the AoI by the optimal policy.

**Figure 3.** The size of status-update data packet versus the AoI when *T* = 10.

Figure 4 shows the AoI versus the transmit power of the PU, where the battery capacity is 0.2 mJ, the distance from the PU to the SU is 5 m, the distance from the SU to the CBS is 25 m, and the size of the status-update data packet is 15 Mbits. We can observe from Figure 4 that the average AoI decreases as the transmit power of the PU increases. The reason is that the SU harvests more energy as the transmit power of the PU increases, which allows the SU to store more energy in the battery. This increases the possibility that the SU has enough energy to update, and hence the AoI is decreased.

**Figure 4.** The transmit power of PU versus the AoI when *T* = 10.

Figure 5 shows the AoI versus the battery capacity, where the size of the status-update data packet is 15 Mbits, the energy consumed by spectrum sensing is one energy quantum, the transmit power of the PU is 35 dBm, the distance from the SU to the CBS is 10 m, and the distance from the PU to the SU is 5 m. It is clearly observed that the proposed policy substantially improves the AoI compared with the myopic policy. We can also observe that the average AoI decreases with the battery capacity. The reason is that a larger battery capacity allows more harvested energy to be stored, so the SU is more likely to have enough energy to perform an update, and hence the AoI is reduced.

**Figure 5.** The battery capacity versus the AoI when *T* = 10.

Figure 6 shows the AoI versus the energy consumed by spectrum sensing. The simulation setup is similar to that of Figure 5. It is observed that the average AoI increases with the energy consumed by the sensing action. The reason is that higher sensing energy consumption leaves less energy in the battery, which in turn decreases the possibility that the SU has enough energy to deliver a status-update data packet and hence increases the AoI.

**Figure 6.** The energy consumption on sensing spectrum versus the AoI when *T* = 10.

#### **6. Conclusions**

In this paper, we investigated an RF energy-harvesting CRN with the aim of AoI minimization subject to energy causality and spectrum constraints. We first formulated the average AoI minimization as a POMDP based on the AoI value, the channel state information, the available energy, and the belief about the PU's spectrum, and then adopted dynamic programming to find the optimal sensing and update decisions. Numerical results showed the influence of the system parameters on the AoI and demonstrated that the proposed policy significantly outperforms the myopic policy.

**Author Contributions:** Conceptualization, all co-authors; methodology, all co-authors; software, not applicable; validation, all co-authors; formal analysis, all co-authors; investigation, all co-authors; resources, all co-authors; data curation, S.J.; writing—original draft preparation, J.S. and S.Z.; writing—review and editing, all co-authors; visualization, J.S. and C.Y.; supervision, S.Z.; project administration, L.H.; funding acquisition, L.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded in part by the National Natural Science Foundation of China under grant number 62072410 and the Zhejiang Provincial Natural Science Foundation of China under grant number LQ22F020009.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **On the Age of Information in a Two-User Multiple Access Setup**

**Mehrdad Salimnejad \* and Nikolaos Pappas \***

Department of Science and Technology, Linköping University, SE-60174 Norrköping, Sweden **\*** Correspondence: mehrdad.salimnejad@liu.se (M.S.); nikolaos.pappas@liu.se (N.P.)

**Abstract:** This work considers a two-user multiple access channel in which both users have Age of Information (AoI)-oriented traffic with different characteristics. More specifically, the first user has external traffic and cannot control the generation of status updates, while the second user monitors a sensor and transmits status updates to the receiver according to a generate-at-will policy. The receiver is equipped with multiple antennas, the transmitters have single antennas, and the channels are subject to Rayleigh fading and path loss. We analyze the average AoI of the first user for a discrete-time first-come-first-served (FCFS) queue, a last-come-first-served (LCFS) queue, and a queue with packet replacement. We derive the AoI distribution and the average AoI of the second user under a threshold policy. Then, we formulate an optimization problem to minimize the average AoI of the first user for the FCFS and LCFS-with-preemption queue disciplines while maintaining the average AoI of the second user below a given level. The constraints of the optimization problem are shown to be convex. It is also shown that the objective function of the problem for the FCFS queue policy is non-convex, and a suboptimal technique is introduced to effectively solve the problem using algorithms developed for convex optimization. Numerical results illustrate the performance of the considered optimization algorithm versus the different parameters of the system. Finally, we discuss how the analytical results of this work can be extended to capture larger setups with more than two users.

**Keywords:** age of information; multiple access channels; multiple-input multiple-output Rayleigh fading channel; discrete-time Markov chain; convex optimization

#### **1. Introduction**

Age of Information (AoI) is considered to be a metric for characterizing the timeliness and freshness of data [1–4]. AoI was first introduced in [4], and it is defined as the time difference between the current time and the time that the latest status update successfully received by a destination was generated. In [4–9], the authors derived the average AoI for systems with different availability of resources using different queuing models. The M/M/1, M/D/1, and D/M/1 queues were studied under the first-come-first-served (FCFS) queue management protocol in [4]. In [5–9], the authors considered last-come-first-served (LCFS) queue protocols with or without the ability to preempt the packet in service. Recently, different types of traffic associated with different source nodes have been considered, in which some nodes generate time-sensitive status updates and other nodes strive to achieve high throughput. The performance of a multiple access channel with heterogeneous traffic has been investigated in [10], where one user has bursty arrivals of regular data packets while another AoI-oriented sensor has energy-harvesting capabilities.

The interplay between delay guarantees and information freshness in a two-user multiple access channel with multi-packet reception (MPR) capability at the receiver and heterogeneous traffic is studied in [11]. Motivated by [11], the interplay of deadline-constrained traffic and the average AoI in a two-user random-access channel with MPR capabilities was investigated in [12]. The authors obtained analytical expressions for the throughput and drop rate of a user with external bursty traffic, which is deadline-constrained, and an analytical expression for the average AoI of a user monitoring a sensor.

**Citation:** Salimnejad, M.; Pappas, N. On the Age of Information in a Two-User Multiple Access Setup. *Entropy* **2022**, *24*, 542. https:// doi.org/10.3390/e24040542

Academic Editor: Song-Nam Hong

Received: 1 February 2022 Accepted: 11 April 2022 Published: 12 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

In [13], the authors presented the analysis of the average AoI with and without packet management at the transmission queue of the source nodes. In the proposed system, each source node has a buffer of infinite capacity to store incoming bursty traffic in the form of packets.

A small average AoI corresponds to having fresh information, which is a key requirement in various applications, including Internet of Things (IoT) scenarios, wireless sensor networks, industrial control, and vehicular networks. The problem of optimizing the process of sending status updates from a user to minimize the average AoI was studied in [14–24]. The works [14,25] consider real-time IoT monitoring systems, where IoT devices sample a physical process and transmit status updates to a remote monitor to minimize the average AoI. In [15], the worst-case average AoI and average peak AoI from a sensor in a system where all other sensors have a saturated queue are analyzed. In [16], a randomized policy, a MaxWeight policy, and a Whittle's Index policy have been proposed to minimize the AoI subject to minimum throughput requirements. In [17], the problem of minimizing AoI in various continuous-time and discrete-time queuing systems, such as the FCFS G/G/1, the LCFS G/G/1, and the G/G/∞, has been studied. In [18], age-optimal scheduling policies in a network with general interference constraints have been studied. In [19], the authors considered an energy-harvesting sensor and determined the optimal status update policy to minimize the average AoI. In [20], several methods have been proposed for solving an AoI minimization problem with throughput constraints. In [20–24], the authors developed the Drift-Plus-Penalty (DPP) policy from Lyapunov optimization theory, which is often used for solving stochastic network optimization problems with stability constraints. In [23], the authors applied the Lyapunov DPP method to minimize the average AoI and total transmit power of sensors under constraints on the maximum average AoI and the maximum power of each sensor.
In [24], the authors proposed the probabilistic random-access (PRA) and DPP methods for solving an optimization problem that aims to minimize the average AoI of the energy-harvesting node subject to the queue stability constraint of the grid-connected node. Recently, the performance of AoI has been investigated in Multiple-Input Multiple-Output (MIMO) systems [26–30]. In [26,27], the user scheduling problem has been investigated to minimize AoI in a multiuser MIMO status update system where multiple single-antenna devices send their information over a common wireless uplink channel to a multiple-antenna access point. In [28], a novel MIMO broadcast setting is studied to minimize the sum average AoI through precoding and transmission scheduling. The age-limited capacity through MIMO setup was investigated in [29], where a random subset of users are active in any transmission period. In [30], the authors analyzed and optimized the performance of AoI in a grant-free random-access system with massive MIMO.

#### *Contributions*

Motivated by [12,24,29], in this paper we consider a multiple access channel (MAC) with two users that have AoI-oriented traffic with different characteristics. The receiver has multiple antennas, and the communication channels are subject to Rayleigh fading and path loss, as depicted in Figure 1. The key contributions of this paper are:

	- The distribution of the AoI of the second user;
	- The probability that the AoI of the second user is greater than a threshold;
	- The average AoI of the second user;
	- The average AoI of the first user for the LCFS-with-preemption policy under a threshold policy for the AoI of the second user.

The considered setup is expected to occur in several scenarios in wireless industrial automation (Industry 4.0, Industrial IoT), in which several processes coexist by sharing the same network resources and sensing the states of a set of systems is essential.

The remainder of this paper is organized as follows. In Section 2, the system model is introduced. In Section 3, we analyze the average AoI of the first and second users; we formulate an optimization problem and propose a convex optimization algorithm to minimize the average AoI of the first user under the constraint on the average AoI for the second user. In Section 4, we present the numerical and simulation results to evaluate the performance of the proposed optimization method. Conclusions are drawn in Section 6.

**Figure 1.** User 1 has AoI-oriented external bursty traffic with probability *λ*; user 2 also has AoI-oriented traffic, but it can control the generation of status updates.

#### **2. System Model**

We consider a time-slotted MAC with two users, each equipped with a single antenna, transmitting their information in the form of packets over a MIMO Rayleigh fading channel to a common receiver with *M* antennas, as shown in Figure 1. We assume that both users have AoI-oriented traffic, but with different characteristics. One of the main differences between the users is that the first one does not have control over the generation of status update packets; they are externally generated according to a Bernoulli process with probability *q*1... generated according to a Bernoulli process with probability *λ*. The second user, in contrast, can control the generation of its status update packets. Let *Q*(*t*) denote the status update queue of the first user in time slot *t*, which has infinite capacity. When the queue of the first user is not empty, it attempts to transmit its status update packets with probability *q*1. Additionally, it is assumed that the second user samples and transmits its status updates with probability *q*2 based on a generate-at-will policy. Note that in Section 3.5, we will consider the case where the second user adjusts its sampling and transmission probability based on an AoI threshold.

#### *2.1. Physical Layer Model*

We assume a quasi-static Rayleigh fading model for the duration of a time slot, in which **h**<sub>*i*</sub> ∈ C<sup>*M*</sup> denotes the *M* × 1 channel vector between user *i* (*i* = 1, 2) and the receiver, and reads

$$\mathbf{h}\_{i} = \sqrt{\beta\_{i}}\, \mathbf{g}\_{i}, \tag{1}$$

where **g**<sub>*i*</sub> ∈ C<sup>*M*</sup> denotes the vector of fast-fading coefficients between user *i* and the receiver antennas, and *β*<sub>*i*</sub> = *r*<sub>*i*</sub><sup>−*α*</sup> models the path loss. Please note that *r*<sub>*i*</sub> is the distance between user *i* and the receiver and *α* is the path-loss exponent, with 2 < *α* < 7. At each time slot, the received signal-to-noise ratio (SNR) at the receiver when only user *i* transmits, and the received signal-to-interference-plus-noise ratio (SINR) at the receiver when both users transmit, are given by

$$\begin{split} \text{SNR}\_{i} &= \frac{P\_{t,i}\beta\_{i}\|\mathbf{g}\_{i}\|^{2}}{\sigma^{2}}, \\ \text{SINR}\_{i} &= \frac{P\_{t,i}\beta\_{i}\|\mathbf{g}\_{i}\|^{2}}{\sigma^{2} + P\_{t,j}\beta\_{j}|\tilde{\mathbf{g}}\_{i}^{\mathrm{H}}\mathbf{g}\_{j}|^{2}}, \end{split} \tag{2}$$

where *P*<sub>*t*,*i*</sub> is the power transmitted by user *i* and *σ*<sup>2</sup> is the variance of the complex additive white Gaussian noise (AWGN) at the receiver. Please note that ‖**g**<sub>*i*</sub>‖<sup>2</sup> follows a gamma distribution with shape parameter *M* and scale parameter 1, i.e., ‖**g**<sub>*i*</sub>‖<sup>2</sup> ∼ Γ(*M*, 1). Additionally, it is shown in [31] that **g̃**<sub>*i*</sub><sup>H</sup>**g**<sub>*j*</sub> ∼ CN(0, 1) for all *i*, *j*, where **g̃**<sub>*i*</sub><sup>H</sup> = **g**<sub>*i*</sub><sup>H</sup>/‖**g**<sub>*i*</sub>‖, and that these variables are mutually independent and independent of ‖**g**<sub>*i*</sub>‖<sup>2</sup>. In this paper, we assume multiple packet reception (MPR) capability at the receiver, which means that the receiver can correctly decode packets from multiple simultaneous transmissions that interfere with each other. A packet from user *i* is successfully received if the SNR or SINR at the receiver exceeds a certain threshold *γ*. The success probability for user *i* when only user *i* transmits, *p*<sub>*i*/*i*</sub>, and when both users transmit, *p*<sub>*i*/*i*,*j*</sub>, can be obtained as [31]

$$p\_{i/i} = \Pr\{\text{SNR}\_i > \gamma\} = \int\_{\frac{\gamma\sigma^2}{P\_{t,i}\beta\_i}}^{\infty} \frac{z^{M-1} e^{-z}}{(M-1)!}\,\mathrm{d}z = \frac{\Gamma\left[M, \frac{\gamma\sigma^2}{P\_{t,i}\beta\_i}\right]}{(M-1)!}, \quad i \in \{1, 2\}, \tag{3a}$$

$$p\_{i/i,j} = \Pr\{\text{SINR}\_i > \gamma\} = \int\_0^\infty \int\_{\frac{\gamma\sigma^2}{P\_{t,i}\beta\_i} + \frac{P\_{t,j}\beta\_j}{P\_{t,i}\beta\_i}\gamma t}^{\infty} \frac{z^{M-1} e^{-(z+t)}}{(M-1)!}\,\mathrm{d}z\,\mathrm{d}t, \quad i \in \{1,2\},\ j \neq i. \tag{3b}$$

Note that for the special case of *M* = 1, (3a) and (3b) can be written as

$$p\_{i/i} = \Pr\{\text{SNR}\_i > \gamma\} = \exp\left(-\frac{\gamma\sigma^2}{P\_{t,i}\beta\_i}\right) \tag{4a}$$

$$p\_{i/i,j} = \Pr\{\text{SINR}\_i > \gamma\} = \exp\left(-\frac{\gamma\sigma^2}{P\_{t,i}\beta\_i}\right) \left(1 + \gamma \frac{P\_{t,j}\beta\_j}{P\_{t,i}\beta\_i}\right)^{-1}, \quad i \in \{1, 2\},\ j \neq i. \tag{4b}$$

The results presented in this work are general and can also be applied to other types of wireless channels as long as we can calculate the aforementioned success probabilities.
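As a numerical sanity check (ours, not code from the paper), the success probabilities in (3a) and (3b) can be evaluated directly: the inner *z*-integral of (3b) equals a regularized upper incomplete gamma function, leaving a one-dimensional integral over *t*. The parameter values below are illustrative assumptions only.

```python
import numpy as np
from scipy.special import gammaincc   # regularized upper incomplete gamma
from scipy.integrate import quad

def p_succ_alone(M, gamma, sigma2, P, beta):
    """Eq. (3a): Pr{SNR_i > gamma} = Gamma(M, a)/(M-1)!, with a = gamma*sigma2/(P*beta)."""
    return gammaincc(M, gamma * sigma2 / (P * beta))

def p_succ_both(M, gamma, sigma2, Pi, beta_i, Pj, beta_j):
    """Eq. (3b): the inner z-integral equals gammaincc(M, a + b*t); integrate over t."""
    a = gamma * sigma2 / (Pi * beta_i)
    b = gamma * Pj * beta_j / (Pi * beta_i)
    val, _ = quad(lambda t: np.exp(-t) * gammaincc(M, a + b * t), 0, np.inf)
    return val

# Cross-check against the M = 1 closed forms (4a)-(4b); numbers are illustrative
gamma, sigma2, P, beta = 10 ** (-0.5), 1e-13, 5e-3, 30.0 ** (-4)
a = gamma * sigma2 / (P * beta)
assert abs(p_succ_alone(1, gamma, sigma2, P, beta) - np.exp(-a)) < 1e-9
assert abs(p_succ_both(1, gamma, sigma2, P, beta, P, beta)
           - np.exp(-a) / (1 + gamma)) < 1e-6
```

For *M* > 1 the same two helpers cover both interference-free and interfering slots, which is all Section 2.2 needs.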

#### *2.2. The Service Probability*

The service probability of a user is defined as the probability of successful transmission in a timeslot. The service probability for the first user is given by

$$
\mu\_1 = q\_1(1 - q\_2)p\_{1/1} + q\_1 q\_2 p\_{1/1,2}.\tag{5}
$$

To obtain the service probability of the second user, the following three cases are considered.

1. The queue of the first user is empty, so the second user transmits without interference.
2. The queue of the first user is not empty, but it does not transmit (with probability 1 − *q*1).
3. The queue of the first user is not empty, and it transmits a status update to the receiver with probability *q*1.

The service probability of the second user can be written as

$$\mu\_2 = \Pr\{Q = 0\} q\_2 p\_{2/2} + \Pr\{Q \neq 0\} (1 - q\_1) q\_2 p\_{2/2} + \Pr\{Q \neq 0\} q\_1 q\_2 p\_{2/1, 2}$$

$$= q\_2 \left(1 - q\_1 \Pr\{Q \neq 0\} \right) p\_{2/2} + q\_2 q\_1 \Pr\{Q \neq 0\} p\_{2/1, 2}.\tag{6}$$

The status updates at the first user arrive according to a Bernoulli process with probability *λ*. When the status update queue of the first user is stable (*λ* < *μ*1), the probability that the queue of user 1 is not empty can be written as

$$\Pr\{Q \neq 0\} = \frac{\lambda}{\mu\_1}.\tag{7}$$

Now, using (7), the expression (6) can be written as

$$\mu\_2 = q\_2 \left( p\_{2/2} - \frac{\lambda (p\_{2/2} - p\_{2/1,2})}{p\_{1/1} - q\_2 (p\_{1/1} - p\_{1/1,2})} \right). \tag{8}$$
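The chain (5)–(8) is easy to verify numerically. The sketch below (our check, with illustrative *M* = 1 success probabilities built from (4a)–(4b) and an assumed received SNR) confirms that the closed form (8) agrees with the case-by-case form (6):

```python
import math

def service_probs(q1, q2, lam, p11, p112, p22, p212):
    """Service probabilities mu1, mu2 from Eqs. (5)-(8); requires lam < mu1 (stability)."""
    mu1 = q1 * (1 - q2) * p11 + q1 * q2 * p112          # Eq. (5)
    if not lam < mu1:
        raise ValueError("queue of user 1 is unstable: lam >= mu1")
    mu2 = q2 * (p22 - lam * (p22 - p212)
                / (p11 - q2 * (p11 - p112)))            # Eq. (8)
    return mu1, mu2

# Illustrative M = 1 success probabilities, symmetric users, assumed 40 dB link
g = 10 ** (-0.5)                  # gamma = -5 dB
p_alone = math.exp(-g / 1e4)      # Eq. (4a)
p_both = p_alone / (1 + g)        # Eq. (4b)

mu1, mu2 = service_probs(0.8, 0.2, 0.3, p_alone, p_both, p_alone, p_both)

# Cross-check: Eq. (8) must agree with the case-by-case form, Eq. (6) with Eq. (7)
pr_q = 0.3 / mu1                  # Pr{Q != 0} = lam / mu1
mu2_direct = 0.2 * (1 - 0.8 * pr_q) * p_alone + 0.2 * 0.8 * pr_q * p_both
assert abs(mu2 - mu2_direct) < 1e-12
```

Note that *q*1 cancels when (7) is substituted into (6), which is why (8) does not depend on *q*1.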

#### **3. Analysis of the Age of Information and Problem Formulation**

In this section, we analyze the average AoI of the first and second users, and formulate an optimization problem to minimize the average AoI of the first user. In the following subsections, we first derive the average AoI of the first user for a discrete-time FCFS queue, an LCFS queue with preemption, and a queue with packet replacement. Then, we obtain the AoI and the average AoI of the second user under a threshold-based policy.

#### *3.1. Average Age of Information of the First User*

The AoI of the first user at the receiver is defined as the random process Δ*t* = *t* − *G*(*t*), where *G*(*t*) is the generation time slot of the latest status update of the first user successfully received at the receiver. The evolution of the AoI of the first user is illustrated in Figure 2. In this figure, we assume that all packets need to be delivered to the destination regardless of the freshness of the status update information. The *j*th status update is generated at time slot *t*<sub>*j*</sub> and received by the receiver at time slot *t*′<sub>*j*</sub>. We denote by *T*<sub>*j*</sub> = *t*′<sub>*j*</sub> − *t*<sub>*j*</sub> and *Y*<sub>*j*</sub> = *t*<sub>*j*</sub> − *t*<sub>*j*−1</sub> the system time and the interarrival time of update *j*, respectively. Without loss of generality, the average AoI of the first user for an observation interval (0, *τ*) is defined as

$$
\Delta\_{\tau} = \frac{1}{\tau} \sum\_{t=1}^{\tau} \Delta\_{t}, \tag{9}
$$

Using Figure 2, Equation (9) can be calculated as the normalized area under Δ*t*. Starting from *t* = 0, the area is decomposed into the areas *J*1, *J*2, ... , *J*<sub>*N*(*τ*)</sub>, where *N*(*τ*) is the number of status updates received during the observation interval, together with the area of width *T*<sub>*n*</sub> over the time interval (*t*<sub>*n*</sub>, *t*′<sub>*n*</sub>), denoted by ¯*J*. Therefore, one can write the average AoI of the first user as a sum of disjoint geometric parts as

$$
\Delta\_{\tau} = \frac{1}{\tau} \left( J\_1 + \bar{J} + \sum\_{j=2}^{N(\tau)} J\_j \right) = \frac{J\_1 + \bar{J}}{\tau} + \frac{N(\tau) - 1}{\tau} \frac{1}{N(\tau) - 1} \sum\_{j=2}^{N(\tau)} J\_j. \tag{10}
$$

Now, the average AoI of the first user is given by

$$\bar{A}\_1 = \lim\_{\tau \to \infty} \Delta\_{\tau}. \tag{11}$$

Following [5], we can write the expression given in (11) as

$$\bar{A}\_1 = \frac{1}{\mathbb{E}[Y]} \left( \mathbb{E}[YT] + \frac{\mathbb{E}[Y^2]}{2} + \frac{\mathbb{E}[Y]}{2} \right). \tag{12}$$

**Figure 2.** An example of the age evolution of user 1 at the receiver.

#### *3.2. The FCFS Geo/Geo/1 Queue*

In this section, we obtain the average AoI of the first user for a discrete-time Geo/Geo/1 queue under the FCFS discipline. When status update packets arrive according to a Bernoulli process with probability *λ*, the interarrival times *Yj* are i.i.d. and follow a geometric distribution with probability mass function (PMF)

$$\Pr\{Y\_j = y\} = \lambda (1 - \lambda)^{y-1}, y = 1, 2, \dots \tag{13}$$

Thus, we can obtain E[*Y*] and E[*Y*2] in Equation (12) as

$$\mathbb{E}[Y] = \frac{1}{\lambda}, \ \mathbb{E}[Y^2] = \frac{2-\lambda}{\lambda^2}. \tag{14}$$

Also, the expression E[*YT*] can be obtained as [13]

$$\mathbb{E}[YT] = \frac{\lambda \left(1 - \mu\_1\right)}{(\mu\_1 - \lambda)\mu\_1^2} + \frac{1}{\lambda\mu\_1}.\tag{15}$$

Now, using Equations (14) and (15), the expression given in (12) can be written as

$$\bar{A}\_1 = \frac{1}{\lambda} + \frac{1-\lambda}{\mu\_1 - \lambda} - \frac{\lambda}{\mu\_1^2} + \frac{\lambda}{\mu\_1}.\tag{16}$$
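Equation (16) can be evaluated directly; the small helper below (ours, with illustrative arguments) makes the stability requirement explicit and checks the expected monotonic behavior in the service probability:

```python
def aoi_fcfs(lam, mu1):
    """Average AoI of user 1 for the Geo/Geo/1 FCFS queue, Eq. (16)."""
    assert 0 < lam < mu1 <= 1, "requires a stable queue: 0 < lam < mu1"
    return 1/lam + (1 - lam)/(mu1 - lam) - lam/mu1**2 + lam/mu1

# With service in every slot (mu1 = 1) and lam = 0.5, Eq. (16) evaluates to 3
assert abs(aoi_fcfs(0.5, 1.0) - 3.0) < 1e-12
# The average AoI decreases as the service probability grows
assert aoi_fcfs(0.3, 0.9) < aoi_fcfs(0.3, 0.6)
```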

#### *3.3. The Preemptive LCFS Geo/Geo/1 Queue*

In this section, we consider a discrete-time LCFS Geo/Geo/1 queue with preemptive service, where a newly generated packet is immediately given priority for service. It is assumed that status update packets arrive according to a Bernoulli process with probability *λ*. The interarrival time between the *j*th and (*j* + 1)th status update packets is geometric with mean E[*Y*] = 1/*λ*, and the time until successful delivery is geometric with mean E[*S*] = 1/*μ*1, where *μ*1 denotes the service probability of the first user. In [32], it is shown that the PMF of the AoI for a discrete-time LCFS Geo/Geo/1 queue is given by

$$\Pr\{A\_1 = x\} = \frac{\lambda \mu\_1 \left[ (1 - \lambda)^{x - 1} - (1 - \mu\_1)^{x - 1} \right]}{\mu\_1 - \lambda}. \tag{17}$$

Now, we can write the average AoI of the first user as

$$\bar{A}\_1 = \sum\_{x=1}^{\infty} x \Pr\{A\_1 = x\} = \frac{1}{\lambda} + \frac{1}{\mu\_1}. \tag{18}$$
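As a numerical consistency check (ours, not the paper's), the PMF (17) should sum to one, and its mean should reproduce the closed form (18); truncating the infinite sum far into the geometric tail suffices:

```python
def pmf_aoi_lcfs(x, lam, mu1):
    """PMF of the AoI for the preemptive LCFS Geo/Geo/1 queue, Eq. (17); lam != mu1."""
    return lam * mu1 * ((1 - lam)**(x - 1) - (1 - mu1)**(x - 1)) / (mu1 - lam)

lam, mu1 = 0.3, 0.7                 # arbitrary illustrative values
mass = sum(pmf_aoi_lcfs(x, lam, mu1) for x in range(1, 2000))
mean = sum(x * pmf_aoi_lcfs(x, lam, mu1) for x in range(1, 2000))
assert abs(mass - 1.0) < 1e-9                 # Eq. (17) is a valid PMF
assert abs(mean - (1/lam + 1/mu1)) < 1e-9     # matches Eq. (18)
```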

#### *3.4. Queue with Replacement*

In this section, we derive the average AoI of the first user for a queue with replacement. In this case, it is assumed that a newly generated packet discards the packet waiting in the queue. As shown in Figure 2, we express the areas *Jj* in terms of the random variables *Zj*, where *Zj* denotes the time between the (*j* − 1)th and *j*th successful packet deliveries, as follows

$$\begin{split} J\_{j} &= \sum\_{m=1}^{T\_{j-1}+Z\_{j}} m - \sum\_{m=1}^{T\_{j}} m \\ &= \frac{(T\_{j-1}+Z\_{j})(T\_{j-1}+Z\_{j}+1)}{2} - \frac{T\_{j}(T\_{j}+1)}{2}. \end{split} \tag{19}$$

We use the fact that in the steady state *Tj*−<sup>1</sup> and *Tj* are identically distributed. Therefore, the average AoI of the first user for a queue with packet replacement is given by [13]

$$\bar{A}\_1 = \lambda\_e \left( \mathbb{E}[ZT] + \frac{\mathbb{E}[Z^2]}{2} + \frac{\mathbb{E}[Z]}{2} \right),\tag{20}$$

where one can obtain *λe*, E[*Z*], E[*Z*2] and E[*ZT*] as [13]

$$\begin{aligned} \lambda\_{e} &= \lambda - \frac{\lambda^3 (1 - \mu\_1)}{\lambda^2 (1 - \mu\_1) + \lambda (1 - \mu\_1)\mu\_1 + \mu\_1^2} \\ \mathbb{E}[Z] &= \frac{\lambda^2 (1 - \mu\_1) + \lambda (1 - \mu\_1)\mu\_1 + \mu\_1^2}{\lambda \mu\_1 (\lambda + \mu\_1 - \lambda \mu\_1)} \\ \mathbb{E}[Z^2] &= \frac{(2\lambda^2 + 2\lambda\mu\_1 - \lambda^2\mu\_1 + 2\mu\_1^2 - \lambda\mu\_1^2)(\mu\_1 - \lambda\mu\_1)}{\lambda^2 \mu\_1^2 (\lambda + \mu\_1 - \lambda\mu\_1)} + \frac{\lambda(2 - \mu\_1)}{\mu\_1^2 (\lambda + \mu\_1 - \lambda\mu\_1)} \\ \mathbb{E}[ZT] &= \frac{1}{\mu\_1^2} + \frac{1 - \lambda}{\lambda\mu\_1} - \frac{1 + \lambda}{(\lambda + \mu\_1 - \lambda\mu\_1)^2} + \frac{1 + 2\lambda}{\lambda + \mu\_1 - \lambda\mu\_1} + \frac{\lambda(1 - 2\mu\_1 + \lambda(3\mu\_1 - 2))}{\lambda^2 (1 - \mu\_1)^2 + \lambda\mu\_1(1 - 2\mu\_1) + \mu\_1^2}. \end{aligned} \tag{21}$$
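Taking the expressions above at face value, the sketch below (our check, with arbitrary parameter values) evaluates (20), verifies the internal consistency *λe* · E[*Z*] = 1, and confirms the ordering of the three disciplines reported in Section 4 (LCFS with preemption best, FCFS worst):

```python
def aoi_replacement(lam, mu1):
    """Average AoI of user 1 for the queue with packet replacement, Eqs. (20)-(21)."""
    d = lam**2*(1 - mu1) + lam*(1 - mu1)*mu1 + mu1**2
    lam_e = lam - lam**3*(1 - mu1)/d                                  # effective rate
    ez = d/(lam*mu1*(lam + mu1 - lam*mu1))                            # E[Z]
    ez2 = ((2*lam**2 + 2*lam*mu1 - lam**2*mu1 + 2*mu1**2 - lam*mu1**2)
           *(mu1 - lam*mu1)/(lam**2*mu1**2*(lam + mu1 - lam*mu1))
           + lam*(2 - mu1)/(mu1**2*(lam + mu1 - lam*mu1)))            # E[Z^2]
    ezt = (1/mu1**2 + (1 - lam)/(lam*mu1)
           - (1 + lam)/(lam + mu1 - lam*mu1)**2
           + (1 + 2*lam)/(lam + mu1 - lam*mu1)
           + lam*(1 - 2*mu1 + lam*(3*mu1 - 2))
             /(lam**2*(1 - mu1)**2 + lam*mu1*(1 - 2*mu1) + mu1**2))   # E[ZT]
    assert abs(lam_e*ez - 1) < 1e-9        # consistency: lam_e = 1/E[Z]
    return lam_e*(ezt + ez2/2 + ez/2)      # Eq. (20)

# Ordering of the three disciplines at lam = 0.3, mu1 = 0.7 (cf. Figure 4)
lam, mu1 = 0.3, 0.7
a_lcfs = 1/lam + 1/mu1                                               # Eq. (18)
a_fcfs = 1/lam + (1 - lam)/(mu1 - lam) - lam/mu1**2 + lam/mu1        # Eq. (16)
assert a_lcfs < aoi_replacement(lam, mu1) < a_fcfs
```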

#### *3.5. Age of Information and the Average Age of Information of the Second User*

Let *A*2(*t*) be a positive integer that represents the AoI associated with the second user at the receiver. The AoI evolution between two consecutive time slots at the receiver can be written as

$$A\_2(t+1) = \begin{cases} 1, & \text{successful packet reception at time slot } t \\ A\_2(t) + 1, & \text{otherwise.} \end{cases} \tag{22}$$

According to (22), the AoI drops to one when there is a successful reception of a status update at the receiver; otherwise, it increases by one. We can model the evolution of the AoI of the second user as a Discrete-Time Markov Chain (DTMC). The DTMC is shown in Figure 3: when *A*2(*t*) < *κ* (*κ* is the threshold on the AoI of the second user), a packet is transmitted with probability *q*2, while when *A*2(*t*) ≥ *κ*, a packet is transmitted with probability *q*′2. The service probability of the first user in this case can be written as

$$
\mu\_1' = q\_1(1 - q\_2')p\_{1/1} + q\_1 q\_2' p\_{1/1,2}.\tag{23}
$$

According to the DTMC described in Figure 3, we can obtain the steady-state probabilities of the AoI of the second user as follows

$$\pi\_i = \begin{cases} (1 - \mu\_2)^{i-1} \pi\_1, & i < \kappa \\ \left(\frac{1 - \mu\_2}{1 - \mu\_2'}\right)^{\kappa - 1} (1 - \mu\_2')^{i-1} \pi\_1, & i \geq \kappa, \end{cases} \tag{24}$$

where, for *λ* < *μ*′1, *μ*′2 is given by

$$\mu\_2' = q\_2' \left( p\_{2/2} - \frac{\lambda \left( p\_{2/2} - p\_{2/1,2} \right)}{p\_{1/1} - q\_2' \left( p\_{1/1} - p\_{1/1,2} \right)} \right). \tag{25}$$

Additionally, we can obtain *π*<sup>1</sup> as

$$\pi\_1 = \begin{cases} \mu\_2', & \kappa = 1\\ \frac{\mu\_2 \mu\_2'}{\mu\_2' + (\mu\_2 - \mu\_2')(1 - \mu\_2)^{\kappa - 1}}, & \kappa \geq 2. \end{cases} \tag{26}$$

**Figure 3.** The DTMC, which models the evolution of AoI of the second user.

Using Equations (24) and (26), we can write the probability that the AoI of the second user is smaller than a threshold *κ*, as follows

$$\Pr\{A\_2 < \kappa\} = \sum\_{i=1}^{\kappa-1} (1 - \mu\_2)^{i-1} \pi\_1 = \frac{(1 - (1 - \mu\_2)^{\kappa-1}) \pi\_1}{\mu\_2}.\tag{27}$$

Furthermore, one can write the probability that the AoI of the second user is greater than a threshold, *κ*, as follows

$$\begin{split} \Pr\{A\_2 \geq \kappa\} &= \sum\_{i=\kappa}^{\infty} \left(\frac{1-\mu\_2}{1-\mu\_2'}\right)^{\kappa-1} (1-\mu\_2')^{i-1} \pi\_1 \\ &= \frac{(1-\mu\_2)^{\kappa-1}\pi\_1}{\mu\_2'}. \end{split} \tag{28}$$

Now, using Equations (27) and (28), the average AoI of the second user is described as

$$\begin{split} \bar{A}\_{2} &= \sum\_{i=1}^{\infty} i \pi\_{i} = \sum\_{i=1}^{\kappa-1} i (1-\mu\_{2})^{i-1} \pi\_{1} + \sum\_{i=\kappa}^{\infty} i \left( \frac{1-\mu\_{2}}{1-\mu\_{2}'} \right)^{\kappa-1} (1-\mu\_{2}')^{i-1} \pi\_{1} \\ &= \frac{1-\mu\_{2} - (1-\mu\_{2})^{\kappa} \left[ 1-\mu\_{2} + \kappa \mu\_{2} \right]}{(1-\mu\_{2})\mu\_{2}^{2}} \pi\_{1} + \frac{(1-\mu\_{2})^{\kappa} \left[ 1-\mu\_{2}' + \kappa\mu\_{2}' \right]}{(1-\mu\_{2})\mu\_{2}'^{2}} \pi\_{1}, \end{split} \tag{29}$$

where *μ*2, *μ*′2, and *π*1 are given by (8), (25), and (26), respectively. Additionally, when *κ* → ∞ and *λ* < *μ*1, Equation (29) reduces to

$$
\bar{A}\_2 = \frac{1}{\mu\_2}.\tag{30}
$$
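The stationary distribution (24)–(26) and the average AoI (29) can be cross-checked numerically, truncating the infinite state space deep in the geometric tail. The sketch below is our own verification with arbitrary illustrative values of *μ*2, *μ*′2, and *κ*:

```python
def stationary_aoi(mu2, mu2p, kappa, N=4000):
    """Steady-state AoI distribution of user 2, Eqs. (24) and (26), truncated at N."""
    if kappa == 1:
        pi1 = mu2p                                                    # Eq. (26)
    else:
        pi1 = mu2*mu2p/(mu2p + (mu2 - mu2p)*(1 - mu2)**(kappa - 1))
    ratio = ((1 - mu2)/(1 - mu2p))**(kappa - 1)
    return [((1 - mu2)**(i - 1) if i < kappa
             else ratio*(1 - mu2p)**(i - 1))*pi1                      # Eq. (24)
            for i in range(1, N + 1)]

mu2, mu2p, kappa = 0.2, 0.5, 5      # illustrative values with mu2' > mu2
pi = stationary_aoi(mu2, mu2p, kappa)
assert abs(sum(pi) - 1.0) < 1e-9    # the distribution is normalized

# Average AoI from the distribution vs. the closed form, Eq. (29)
avg = sum(i*p for i, p in enumerate(pi, start=1))
pi1 = mu2*mu2p/(mu2p + (mu2 - mu2p)*(1 - mu2)**(kappa - 1))
a2 = ((1 - mu2 - (1 - mu2)**kappa*(1 - mu2 + kappa*mu2))/((1 - mu2)*mu2**2)*pi1
      + (1 - mu2)**kappa*(1 - mu2p + kappa*mu2p)/((1 - mu2)*mu2p**2)*pi1)
assert abs(avg - a2) < 1e-9
```

As expected, the resulting average lies between the two extremes 1/*μ*′2 and 1/*μ*2 that correspond to always using *q*′2 and always using *q*2.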

*3.6. The Average AoI of S*<sup>1</sup> *for the Preemptive LCFS Geo/Geo/1 Queue for the Threshold-Based Policy of S*<sup>2</sup>

By considering the threshold-based policy explained in Section 3.5, the PMF of the AoI of S1 for the preemptive LCFS queue discipline, given in (17), can be written as

$$\Pr\{A\_1 = x\} = \Pr\{A\_1 = x \mid A\_2 < \kappa\} \Pr\{A\_2 < \kappa\} + \Pr\{A\_1 = x \mid A\_2 \geq \kappa\} \Pr\{A\_2 \geq \kappa\},\tag{31}$$

where Pr{*A*2 < *κ*} and Pr{*A*2 ≥ *κ*} are given by (27) and (28). Using Equation (17), the first and second conditional probabilities in (31) can be written as

$$\Pr\{A\_1 = x \mid A\_2 < \kappa\} = \frac{\lambda \mu\_1 \left[ (1 - \lambda)^{x - 1} - (1 - \mu\_1)^{x - 1} \right]}{\mu\_1 - \lambda} \tag{32a}$$

$$\Pr\{A\_1 = x \mid A\_2 \geq \kappa\} = \frac{\lambda \mu\_1' \left[ (1 - \lambda)^{x - 1} - (1 - \mu\_1')^{x - 1} \right]}{\mu\_1' - \lambda},\tag{32b}$$

where *μ*′1 is given by (23). Now, we can write the average AoI of S1 under the threshold-based policy on the AoI of S2 as

$$\begin{split} \bar{A}\_{1} &= \sum\_{x=1}^{\infty} x \Pr\{A\_{1} = x\} \\ &= \left(\frac{1}{\lambda} + \frac{1}{\mu\_{1}}\right) \left(\frac{[1 - (1 - \mu\_{2})^{\kappa - 1}] \pi\_{1}}{\mu\_{2}}\right) + \left(\frac{1}{\lambda} + \frac{1}{\mu\_{1}'}\right) \left(\frac{(1 - \mu\_{2})^{\kappa - 1} \pi\_{1}}{\mu\_{2}'}\right), \end{split} \tag{33}$$

where *π*<sup>1</sup> is given by (26).

*3.7. Optimizing the Average AoI of S*<sup>1</sup> *Subject to AoI Constraints on S*<sup>2</sup>

#### 3.7.1. Using the Average AoI of the FCFS as the Objective Function

In this section, our objective is to minimize the average AoI of user 1 for a discrete-time Geo/Geo/1 queue discipline of FCFS with a constraint on the average AoI for user 2, which should be less than a threshold. Let *A*max be a strictly positive real value that represents the maximum average AoI of user 2. Thus, the optimization problem is formulated as follows

$$\text{minimize} \quad \bar{A}\_1 \tag{34a}$$

$$\text{subject to} \quad \bar{A}\_2 < A\_{\text{max}}.\tag{34b}$$

Using the expressions given in Equations (16) and (30), one can write Equation (34) as follows

$$\underset{q\_1, q\_2, \lambda}{\text{minimize}} \quad \frac{1}{\lambda} + \frac{1-\lambda}{\mu\_1 - \lambda} - \frac{\lambda}{\mu\_1^2} + \frac{\lambda}{\mu\_1} \tag{35a}$$

$$\text{subject to} \quad \frac{1}{\mu\_2} < A\_{\text{max}}.\tag{35b}$$

$$0 < \lambda < \mu\_1. \tag{35c}$$

$$q\_1, q\_2 \in [0, 1]. \tag{35d}$$

where the constraint in (35c) ensures that the queue of the first user is stable. To solve this optimization problem, we first note that for *λ* < *μ*<sup>1</sup> when the service probability of the first user increases, the objective function given in (35a) decreases. Hence, to minimize the objective function, we must obtain the maximum value of *μ*1. The service probability of the first user given in (5) can be simplified as

$$\begin{split} \mu\_1 &= q\_1(1 - q\_2)p\_{1/1} + q\_1 q\_2 p\_{1/1, 2} \\ &= q\_1 \left[ p\_{1/1} - q\_2 (p\_{1/1} - p\_{1/1, 2}) \right]. \end{split} \tag{36}$$

According to Equation (36), *μ*<sup>1</sup> has its maximum value when *q*<sup>1</sup> is maximum. Therefore, by selecting *q*<sup>1</sup> = 1, we can maximize the service probability of the first user and minimize the average AoI of the first user as an objective function. Therefore, the optimal value of *q*<sup>1</sup> is given by

$$q\_1^\* = 1.\tag{37}$$

Now, using Equations (8), (36) and (37), we can write the optimization problem given in (35) as

$$\underset{q\_2, \lambda}{\text{minimize}} \quad \frac{1}{\lambda} + \frac{1-\lambda}{p\_{1/1} - q\_2(p\_{1/1} - p\_{1/1, 2}) - \lambda} - \frac{\lambda \left(1 - p\_{1/1} + q\_2(p\_{1/1} - p\_{1/1, 2})\right)}{\left(p\_{1/1} - q\_2(p\_{1/1} - p\_{1/1, 2})\right)^2} \tag{38a}$$

$$\text{subject to} \quad \frac{p\_{1/1} - q\_2(p\_{1/1} - p\_{1/1,2})}{q\_2(p\_{1/1}p\_{2/2} - q\_2p\_{2/2}(p\_{1/1} - p\_{1/1,2}) - \lambda(p\_{2/2} - p\_{2/1,2}))} - A\_{\text{max}} < 0,\tag{38b}$$

$$\lambda - p\_{1/1} + q\_2(p\_{1/1} - p\_{1/1,2}) < 0,\tag{38c}$$

$$
\lambda, q\_2 \in [0, 1]. \tag{38d}
$$

By definition, an optimization problem is convex when its objective function and inequality constraints are convex and its equality constraints are affine; see Chapter 4.2 in [33]. We can show that the Hessian matrix of the objective function given in (38a) is positive semi-definite for some values of *λ* and *q*2 but not for others; therefore, the objective is not a convex function. Additionally, it can be verified that the Hessian matrices of the inequality constraints (38b) and (38c) are positive semi-definite for different values of *λ* and *q*2. Therefore, this optimization problem is not a convex optimization problem, a trivial solution does not exist, and finding the optimal solution is computationally involved. Hence, to find the optimal values of *λ* and *q*2, a suboptimal technique is proposed that effectively solves the problem using an algorithm developed for convex optimization. This approach is known as the bilevel optimization algorithm and is used when the optimization parameters are interdependent and the problem is convex with respect to each parameter when the others are fixed [34].

#### 3.7.2. Bilevel Convex Optimization

Using the procedure explained in Appendix A, it can be verified that the objective function given in (38a) is a convex function of *λ* when *q*<sup>2</sup> is fixed and *λ* < *μ*1. Therefore, the optimization problem can be solved for *λ* by assuming that *q*<sup>2</sup> is fixed. Then, substituting for *λ* in (38a) from the previous stage and assuming that this parameter is fixed, we can solve the optimization problem for *q*2. This procedure continues until the convergence condition is satisfied (for example, the change in the objective function in two successive iterations is lower than a small threshold).

In this paper, an interior-point method is used to solve the optimization problem in each iteration of the bilevel optimization algorithm. The iteration complexity of this method is shown in Chapter 3.4.3 in the work of den Hertog [35] to be O(*ν*(*c* <sup>√</sup>*n*)), where *<sup>ν</sup>* denotes the number of iterations, *n* is the number of constraints and *c* is a constant, which depends on system parameters such as tolerance.
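The alternating procedure of Section 3.7.2 can be sketched as follows; this is a minimal illustration with scipy's bounded scalar minimizer, in which the success probabilities, AoI budget, starting point, and the large-penalty handling of the constraints are our assumptions, not values from the paper:

```python
from scipy.optimize import minimize_scalar

# hypothetical M = 1 success probabilities and AoI budget (illustrative only)
p11, p112, p22, p212, A_max = 0.95, 0.60, 0.95, 0.60, 10.0

def mu1(q2):                          # Eq. (36) with q1* = 1
    return p11 - q2 * (p11 - p112)

def mu2(q2, lam):                     # Eq. (8) with q1* = 1
    return q2 * (p22 - lam * (p22 - p212) / mu1(q2))

def objective(lam, q2):               # Eq. (40a); large penalty outside feasibility
    if not 0 < lam < mu1(q2) or mu2(q2, lam) <= 0 or 1 / mu2(q2, lam) >= A_max:
        return 1e9
    return 1 / lam + 1 / mu1(q2)

lam, q2 = 0.3, 0.5                    # feasible starting point (assumed)
for _ in range(50):                   # alternate the two 1-D solves
    lam = minimize_scalar(lambda l: objective(l, q2),
                          bounds=(1e-6, 1 - 1e-6), method="bounded").x
    q2 = minimize_scalar(lambda q: objective(lam, q),
                         bounds=(1e-6, 1 - 1e-6), method="bounded").x
assert objective(lam, q2) < objective(0.3, 0.5)   # never worse than the start
```

Because each one-dimensional subproblem is (approximately) minimized in turn, the objective is non-increasing across iterations, which is the convergence criterion discussed above.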

3.7.3. Using the Average AoI of the LCFS with Preemption as an Objective Function

In this section, our objective is to minimize the average AoI of user 1 for a discrete-time preemptive LCFS Geo/Geo/1 queue discipline with a constraint on the average AoI for user 2, which should be less than a threshold. Using the expressions given in Equations (18) and (30), the optimization problem is formulated as follows

$$\underset{q\_1, q\_2, \lambda}{\text{minimize}} \quad \frac{1}{\lambda} + \frac{1}{\mu\_1} \tag{39a}$$

$$\text{subject to } \frac{1}{\mu\_2} < A\_{\text{max}}.\tag{39b}$$

$$0 < \lambda < \mu\_1. \tag{39c}$$

$$q\_1, q\_2 \in [0,1]. \tag{39d}$$

Using the procedure explained in Section 3.7.1, we obtain *q*∗1 = 1, and the optimization problem given in (39) simplifies to

$$\underset{q\_2, \lambda}{\text{minimize}} \quad \frac{1}{\lambda} + \frac{1}{p\_{1/1} - q\_2(p\_{1/1} - p\_{1/1, 2})} \tag{40a}$$

$$\text{subject to} \quad \frac{p\_{1/1} - q\_2(p\_{1/1} - p\_{1/1,2})}{q\_2(p\_{1/1}p\_{2/2} - q\_2p\_{2/2}(p\_{1/1} - p\_{1/1,2}) - \lambda(p\_{2/2} - p\_{2/1,2}))} - A\_{\text{max}} < 0,\tag{40b}$$

$$\lambda - p\_{1/1} + q\_2(p\_{1/1} - p\_{1/1,2}) < 0,\tag{40c}$$

$$
\lambda, q\_2 \in [0, 1]. \tag{40d}
$$

We can prove that the Hessian matrix of the objective function given in (40a) is positive semi-definite and the optimization problem is convex (see Appendix B). Therefore, this optimization problem can be solved using an algorithm developed for solving convex optimization problems such as the interior-point method.

#### **4. Numerical Results and Discussion**

In this section, we illustrate our analytical results presented in Section 3 and verify them by means of computer simulation. Simulation results are obtained using 10<sup>6</sup> independent realizations of the system. Additionally, we evaluate the performance of the proposed interior-point algorithm presented in Section 3. It is assumed that the users are located at a distance *ri* = 30 m (*i* = 1, 2) from the receiver. The receiver noise power is assumed to be *σ*<sup>2</sup> = −100 dBm, and the path-loss exponent is *α* = 4. Additionally, the assumed transmit powers are *Pt*,1 = *Pt*,2 = 5 mW, the channels between the users and the receiver are subject to Rayleigh fading, and we use the expressions for the success probabilities presented in Section 2. Furthermore, the initial point for the interior-point algorithm is zero, i.e., (*λ*<sup>(0)</sup>, *q*2<sup>(0)</sup>) = (0, 0).

Figure 4 shows the average AoI of the first user for the FCFS Geo/Geo/1 queue, the preemptive LCFS Geo/Geo/1 queue, and the queue with replacement as a function of *λ*, for *γ* = −5 dB, *M* = 1, *q*1 = 0.8, and *q*2 = 0.2. As seen in this figure, the preemptive LCFS Geo/Geo/1 queue outperforms the FCFS Geo/Geo/1 queue and the queue with replacement. Additionally, note that the average AoI of the first user for the FCFS Geo/Geo/1 queue is plotted only for *λ* < 0.7 in order to satisfy the stability requirement. Figure 4 also shows that the simulation results match the analytical results.

The average AoI of S1 and S2 is shown in Figure 5 as a function of *κ*, for *λ* = 0.5, *q*1 = 1, *q*2 = 0.2, *q*′2 = 0.5, *γ* = 3 dB, and various values of *M*. As seen in this figure, as *κ* increases, the slope of the average AoI of S2 and S1 decreases: when *κ* becomes larger, the average AoI of S2 and S1 tends to 1/*μ*2 and 1/*λ* + 1/*μ*1, respectively, and becomes independent of *q*′2, so that changing *q*′2 no longer affects the average AoI of S2 and S1. Furthermore, when *κ* increases, the average AoI of S2 increases and the average AoI of S1 decreases. This is because, for smaller values of *κ*, packets are transmitted with the higher probability *q*′2 > *q*2 sooner than for larger values of *κ*. Therefore, the average AoI of the first and second users takes larger and smaller values, respectively, for smaller values of *κ*. Additionally, note that when *M* increases, the average AoI of the first and second users decreases, because a larger value of *M* yields larger service probabilities for both users, which in turn decreases the average AoI of S1 and S2.

**Figure 4.** The average AoI of the first user for the FCFS Geo/Geo/1 queue, preemptive LCFS Geo/Geo/1 queue, and queue with replacement for *γ* = −5 dB, *M* = 1, *q*<sup>1</sup> = 0.8, and *q*<sup>2</sup> = 0.2, and various values of *λ*.

**Figure 5.** The average AoI of S1 and S2 for *γ* = 3 dB, *λ* = 0.5, *q*1 = 1, *q*2 = 0.2, *q*′2 = 0.5, *κ* = 1, 5, 10, . . . , 30, and various values of *M*.

Figure 6 shows the probability that the AoI of the second user is greater than a threshold as a function of *λ* for *q*1 = 1, *q*2 = 0.2, *q*′2 = 0.5, *γ* = 3 dB, *x* = 5, and selected values of *M*. As seen in this figure, when *λ* increases, the probability Pr{*A*2 ≥ *x*} increases. This is because when *λ* increases, the service probability of the second user decreases, which in turn increases the AoI of the second user. Furthermore, by increasing *M*, the success probabilities increase, and the service probability of the second user increases. As a result, the AoI of the second user decreases and the probability Pr{*A*2 ≥ *x*} takes lower values. Note, importantly, that when *M* = 1, the probability Pr{*A*2 ≥ *x*} is not defined for *λ* > 0.6. This is because for *λ* > 0.6, *λ* becomes larger than *μ*1, the queue of the first user is unstable, and the AoI of the second user is not defined.

**Figure 6.** The probability that the AoI of S2 at the receiver is greater than a threshold, *x* = 5, for *γ* = 3 dB, *q*1 = 1, *q*2 = 0.2, *q*′2 = 0.5, *λ* = 0.1, 0.2, . . . , 1, and various values of *M*.

Figure 7 shows the average service time of the first user, 1/*μ*1, as a function of *Pt*,1 (*Pt*,1 = *Pt*,2) for *σ*<sup>2</sup> = −50 dBm, *α* = 4, *ri* = 30 m (*i* = 1, 2), *γ* = 0 dB, *q*1 = 0.8, *q*2 = 0.4, and selected values of *M*. As seen in this figure, when the transmit power increases, the average service time decreases. This is because increasing the transmit power increases the success probabilities and thus decreases the average service time. Furthermore, when *M* increases, the average service time decreases, because a larger *M* increases the service probability.

The average service time is illustrated in Figure 8 as a function of *r*1 (*r*1 = *r*2) for *σ*<sup>2</sup> = −50 dBm, *α* = 4, *Pt*,1 = *Pt*,2 = 5 mW, *γ* = 0 dB, *q*1 = 0.8, *q*2 = 0.4, and various values of *M*. As seen in this figure, the average service time increases with *r*1. This is because when *r*1 increases, the service probability of the first user decreases, which results in an increase in the average service time. Moreover, by increasing *M*, the average service time decreases, because a larger *M* increases the success probabilities.

**Figure 7.** The average service time of the first user as a function of *Pt*,1 for *σ*<sup>2</sup> = −50 dBm, *α* = 4, *ri* = 30 m (*i* = 1, 2), *γ* = 0 dB, *q*1 = 0.8, *q*2 = 0.4, and various values of *M*.

**Figure 8.** The average service time of the first user as a function of *<sup>r</sup>*<sup>1</sup> for *<sup>σ</sup>*<sup>2</sup> <sup>=</sup> <sup>−</sup><sup>50</sup> dBm, *<sup>α</sup>* <sup>=</sup> 4, *Pt*,1 = *Pt*,2 = 5 mW, *γ* = 0 dB, *q*<sup>1</sup> = 0.8, *q*<sup>2</sup> = 0.4, and various values of *M*.

The minimum average AoI of the first user for the case where *M* = 1 is illustrated in Figure 9 as a function of *γ* and for selected values of *A*max. As seen in this figure and in Tables 1–4, the minimum average AoI of the first user is larger when the SNR threshold *γ* is larger. This is because a higher *γ* gives lower success probabilities and therefore increases the minimum average AoI. An important observation is that, for sufficiently large *A*max, the average AoI of the first user no longer depends on *A*max: the constraint on the average AoI of the second user becomes inactive, so the optimal values of the transmit probability *q*2 and of *λ*, and therefore the minimum average AoI of the first user, do not change.


**Table 1.** The minimum average AoI of the first user and the optimal values of *λ*∗, *q*∗ <sup>1</sup>, and *q*<sup>∗</sup> <sup>2</sup> for *A*max = 2.

**Table 2.** The minimum average AoI of the first user and the optimal values of *λ*∗, *q*∗ <sup>1</sup>, and *q*<sup>∗</sup> <sup>2</sup> for *A*max = 5.


**Table 3.** The minimum average AoI of the first user and the optimal values of *λ*\*, *q*1\*, and *q*2\* for *A*max = 10.


**Table 4.** The minimum average AoI of the first user and the optimal values of *λ*\*, *q*1\*, and *q*2\* for *A*max = 15.


**Figure 9.** The minimum average AoI of user 1, for *M* = 1, *A*max = 2, 5, 10, 15, and various values of *γ*.

Figure 10 shows the interplay between the average AoI of the first user, for a discrete-time Geo/Geo/1 queue with the FCFS discipline, and the average AoI of the second user when *y* → ∞, as a function of *q*2 and selected values of *γ*. In this figure, we consider the weak/strong MPR capabilities. Note that the strong and weak MPR capability of a receiver correspond to *K* = *p*1/1,2/*p*1/1 + *p*2/1,2/*p*2/2 > 1 and *K* = *p*1/1,2/*p*1/1 + *p*2/1,2/*p*2/2 < 1, respectively [24]. When *M* = 1, for *γ* = −5 dB and *γ* = −3 dB, *K* = 1.51 and *K* = 1.33, respectively; therefore, the receiver has strong MPR capability. Furthermore, for *γ* = 1 dB and *γ* = 3 dB, *K* = 0.88 and *K* = 0.66, respectively; thus, the receiver has weak MPR capability. Additionally, when *M* = 2, for *γ* ∈ {−5, −3, 1, 5} dB, *K* ∈ {1.88, 1.77, 1.37, 1.11}, respectively. Moreover, when *M* = 4, for *γ* ∈ {−5, −3, 1, 5} dB, *K* ∈ {1.99, 1.97, 1.80, 1.60}, respectively. Therefore, when *M* > 1, the receiver has strong MPR capability for the selected values of *γ*. In Figure 10, we consider a different scenario from the optimization problem in (34). Here, we intend to find transmission probabilities *q*1 and *q*2 such that both users can transmit at the same time while keeping the average AoI of the first and second users below thresholds *A*1max and *A*2max, respectively. Observe that when the receiver has strong MPR capabilities, we have *Ā*1 < *A*1max and *Ā*2 < *A*2max even with a high value of the transmit probability *q*2. Therefore, in this case, both users can transmit at the same time with a high probability. For example, we assume the thresholds for the average AoI of the first and second users are *A*1max = 6 and *A*2max = 6, respectively.
As seen in this figure, when *M* = 1 and *γ* = −5 dB (strong MPR capability), we can achieve our purpose with *q*1 = 0.6 and *q*2 = 0.5, while for *γ* = 1 dB (weak MPR capability), we cannot find a value of *q*2 that achieves our goal; in this case, both users cannot transmit at the same time. Additionally, observe that in Figure 10b, when *γ* = −5 dB, the first and second users can transmit at the same time with the probabilities *q*1 = 0.6 and *q*2 = 1. Furthermore, when *γ* = 1 dB, the transmit probabilities are *q*1 = 0.6 and *q*2 = 0.4. In addition, as shown in Figure 10c, when *γ* = −5 dB and *γ* = 1 dB, both users can transmit at the same time with the probabilities *q*1 = 0.6 and *q*2 = 1. This reflects the fact that when the number of receiver antennas increases, the receiver has strong MPR capability for higher values of *γ*, and thus both users can transmit at the same time with a high probability.
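The classification of strong versus weak MPR capability used above reduces to a one-line computation. The sketch below uses illustrative success probabilities (not the paper's values) to show how *K* separates the two regimes.

```python
def mpr_capability(p1_1, p1_12, p2_2, p2_12):
    """K = p_{1/1,2}/p_{1/1} + p_{2/1,2}/p_{2/2}.
    K > 1: strong MPR capability; K < 1: weak MPR capability (cf. [24])."""
    return p1_12 / p1_1 + p2_12 / p2_2

# Illustrative numbers: simultaneous-transmission success probabilities
# close to the solo ones give strong MPR ...
K_strong = mpr_capability(p1_1=0.9, p1_12=0.8, p2_2=0.9, p2_12=0.7)
# ... while heavy mutual interference gives weak MPR.
K_weak = mpr_capability(p1_1=0.9, p1_12=0.3, p2_2=0.9, p2_12=0.3)
assert K_strong > 1 and K_weak < 1
```

In the strong-MPR regime, concurrent transmissions still succeed often enough that both users can keep their average AoI below the thresholds, which is exactly the behavior observed in Figure 10.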

**Figure 10.** The interplay between the average AoI of the first and second users for *q*<sup>1</sup> = 0.6, *λ* = 0.3, *q*<sup>2</sup> = 0.1, 0.2, . . . , 1, (**a**) *M* = 1, (**b**) *M* = 2, and (**c**) *M* = 4.

#### **5. Discussion on Larger Topology**

In this section, we discuss how this work can be extended to capture more than two users; a detailed analysis and optimization for more than two users is left for a future publication. Below, we provide some details for a setup with two users with external traffic and two users with control over the generation of the status updates, as depicted in Figure 11. More specifically, it is assumed that users S1 and S2 do not have control over the generation of status update packets, which are generated externally according to Bernoulli processes with probabilities *λ*1 and *λ*2, respectively. When the queue of *Si*, *i* = 1, 2, is not empty, *Si* attempts to transmit with probability *qi*. We also consider that users S3 and S4 can control the generation of status update packets; thus, they sample and transmit with probabilities *q*3 and *q*4, respectively, based on a generate-at-will policy. Using the same approach as in Section 2.2, we derive the service probabilities of S1 and S3 as

$$\begin{aligned} \mu\_1 &= \Pr\{Q\_2 = 0\}\Big[q\_1(1-q\_3)(1-q\_4)p\_{1/1} + q\_1 q\_3(1-q\_4)p\_{1/1,3} + q\_1(1-q\_3)q\_4\, p\_{1/1,4} + q\_1 q\_3 q\_4\, p\_{1/1,3,4}\Big] \\ &\quad + \Pr\{Q\_2 \neq 0\}\Big[q\_1(1-q\_2)(1-q\_3)(1-q\_4)p\_{1/1} + q\_1(1-q\_2)q\_3(1-q\_4)p\_{1/1,3} + q\_1(1-q\_2)(1-q\_3)q\_4\, p\_{1/1,4} \\ &\qquad + q\_1(1-q\_2)q\_3 q\_4\, p\_{1/1,3,4} + q\_1 q\_2(1-q\_3)(1-q\_4)p\_{1/1,2} + q\_1 q\_2 q\_3(1-q\_4)p\_{1/1,2,3} + q\_1 q\_2(1-q\_3)q\_4\, p\_{1/1,2,4} + q\_1 q\_2 q\_3 q\_4\, p\_{1/1,2,3,4}\Big] \\ &= q\_1(1-q\_3)(1-q\_4)\big(p\_{1/1} - q\_2 \Pr\{Q\_2 \neq 0\}(p\_{1/1} - p\_{1/1,2})\big) + q\_1 q\_3(1-q\_4)\big(p\_{1/1,3} - q\_2 \Pr\{Q\_2 \neq 0\}(p\_{1/1,3} - p\_{1/1,2,3})\big) \\ &\quad + q\_1 q\_4(1-q\_3)\big(p\_{1/1,4} - q\_2 \Pr\{Q\_2 \neq 0\}(p\_{1/1,4} - p\_{1/1,2,4})\big) + q\_1 q\_3 q\_4\big(p\_{1/1,3,4} - q\_2 \Pr\{Q\_2 \neq 0\}(p\_{1/1,3,4} - p\_{1/1,2,3,4})\big) \end{aligned} \tag{41}$$

$$\begin{aligned} \mu\_3 &= \Pr\{Q\_1 = 0, Q\_2 = 0\}\Big[q\_3(1-q\_4)p\_{3/3} + q\_3 q\_4\, p\_{3/3,4}\Big] \\ &\quad + \Pr\{Q\_1 \neq 0, Q\_2 = 0\}\Big[q\_3(1-q\_1)(1-q\_4)p\_{3/3} + q\_3 q\_4(1-q\_1)p\_{3/3,4} + q\_1 q\_3(1-q\_4)p\_{3/1,3} + q\_1 q\_3 q\_4\, p\_{3/1,3,4}\Big] \\ &\quad + \Pr\{Q\_1 = 0, Q\_2 \neq 0\}\Big[q\_3(1-q\_2)(1-q\_4)p\_{3/3} + q\_3(1-q\_2)q\_4\, p\_{3/3,4} + q\_2 q\_3(1-q\_4)p\_{3/2,3} + q\_2 q\_3 q\_4\, p\_{3/2,3,4}\Big] \\ &\quad + \Pr\{Q\_1 \neq 0, Q\_2 \neq 0\}\Big[q\_3(1-q\_1)(1-q\_2)(1-q\_4)p\_{3/3} + q\_2 q\_3(1-q\_1)(1-q\_4)p\_{3/2,3} + q\_3 q\_4(1-q\_1)(1-q\_2)p\_{3/3,4} \\ &\qquad + q\_1 q\_3(1-q\_2)(1-q\_4)p\_{3/1,3} + q\_2 q\_3 q\_4(1-q\_1)p\_{3/2,3,4} + q\_1 q\_2 q\_3(1-q\_4)p\_{3/1,2,3} + q\_1 q\_3 q\_4(1-q\_2)p\_{3/1,3,4} + q\_1 q\_2 q\_3 q\_4\, p\_{3/1,2,3,4}\Big] \end{aligned} \tag{42}$$

Similarly, we can write the service probabilities for S2 and S4. Observe that the service probability of *S*1 depends on the state of the queue of *S*2 and vice versa. Thus, the queues are coupled, which is a known problem, and closed-form solutions cannot be obtained for more than three users. Furthermore, the service probabilities of *S*3 and *S*4 depend on the joint probability distribution of the queues of *S*1 and *S*2.

A way to bypass the difficulty caused by the coupling among the queues is to assume independence, as in [13]; alternatively, if we further assume that the queues have finite capacity, we can use semi-analytical methods from queuing theory. In the first case, we obtain an approximation of the performance, and that approximation is tight for higher values of the arrival probabilities. After characterizing the service probabilities of each node, we can apply the analysis provided in the earlier sections. In a scenario with only one user with a queue and *N* users with the generate-at-will policy, the extension is straightforward, since our analytical expressions can be used directly.

**Figure 11.** *S*1 and *S*2 have AoI-oriented external bursty traffic; *S*3 and *S*4 also have AoI-oriented traffic but can control the generation of their status updates.

#### **6. Conclusions**

In this work, we considered a two-user multiple access channel in which both users have AoI-oriented traffic, but with different characteristics. All transmission channels were assumed to be subject to path loss and fading. We investigated the average AoI of the first user for a Geo/Geo/1 queue under the FCFS discipline, LCFS with preemption, and a queue with replacement. Additionally, we derived the AoI and the average AoI of the second user by considering a threshold for the AoI of the second user. We then formulated an optimization problem that minimizes the average AoI of the first user subject to a constraint on the average AoI of the second user, and solved it with the interior-point method. Numerical results showed the performance of the proposed algorithm for different system parameters and the impact of multiple antennas.

Future extensions of this work include larger topologies, as discussed in Section 5. Furthermore, an interesting extension is to consider more elaborate schemes at the physical layer such as the MMSE receiver or zero-forcing. Another interesting direction is to consider power control schemes with dynamic programming methodologies such as Markov Decision Processes or stochastic optimization.

**Author Contributions:** Conceptualization, N.P.; Formal analysis, M.S.; Supervision, N.P.; Writing—original draft, M.S.; Writing—review & editing, M.S. and N.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Swedish Research Council (VR), ELLIIT, and CENIIT.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

To prove the convexity of the objective function, we first define the expression given in (38a) as

$$\mathcal{A} = \frac{1}{\lambda} + \frac{1 - \lambda}{p\_{1/1} - q\_2(p\_{1/1} - p\_{1/1, 2}) - \lambda} - \frac{\lambda \left(1 - p\_{1/1} + q\_2(p\_{1/1} - p\_{1/1, 2})\right)}{\left(p\_{1/1} - q\_2(p\_{1/1} - p\_{1/1, 2})\right)^2}. \tag{A1}$$

Now, by taking the second derivative $\frac{\mathrm{d}^2 \mathcal{A}}{\mathrm{d}\lambda^2}$, we have

$$\frac{\mathrm{d}^2 \mathcal{A}}{\mathrm{d} \lambda^2} = \frac{2}{\lambda^3} + \frac{2(1 - X)}{(X - \lambda)^3} \tag{A2}$$

where $X = p\_{1/1} - q\_2(p\_{1/1} - p\_{1/1,2})$; using the expression given in (38c), we have $\lambda < X \leq 1$. Therefore, for all feasible values of $X$ and $\lambda$, the second derivative $\frac{\mathrm{d}^2 \mathcal{A}}{\mathrm{d}\lambda^2}$ is positive, and thus the objective function is a convex function of *λ* when *q*2 is fixed. Similarly, taking the second derivative $\frac{\mathrm{d}^2 \mathcal{A}}{\mathrm{d}q\_2^2}$ when *λ* is fixed, we have

$$\frac{\mathrm{d}^2 \mathcal{A}}{\mathrm{d}q\_2^2} = (p\_{1/1} - p\_{1/1,2})^2 \left[ \frac{-2\lambda(3-X)}{X^4} + \frac{2(1-\lambda)}{(X-\lambda)^3} \right] \tag{A3}$$

where $X = p\_{1/1} - q\_2(p\_{1/1} - p\_{1/1,2})$ and $\lambda < X \leq 1$. It can be readily shown that for all such values of $X$ and $\lambda$, $\frac{\mathrm{d}^2 \mathcal{A}}{\mathrm{d}q\_2^2}$ is positive, and therefore the objective function $\mathcal{A}$ is a convex function of *q*2 when *λ* is fixed.
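The convexity argument can be sanity-checked numerically with finite differences. The sketch below evaluates the objective (A1) for illustrative success probabilities *p*1/1 = 0.9 and *p*1/1,2 = 0.5 (assumed values, not from the paper) and verifies that both second derivatives are positive at feasible points (*λ* < *X*).

```python
p1_1, p1_12 = 0.9, 0.5             # illustrative success probabilities

def A(lam, q2):
    """Objective (A1), with X = p_{1/1} - q2*(p_{1/1} - p_{1/1,2})."""
    X = p1_1 - q2 * (p1_1 - p1_12)
    return 1/lam + (1 - lam)/(X - lam) - lam * (1 - X) / X**2

def second_diff(f, x, h=1e-4):
    """Central finite-difference approximation of the second derivative."""
    return (f(x + h) - 2*f(x) + f(x - h)) / h**2

for q2 in (0.1, 0.5, 0.9):
    X = p1_1 - q2 * (p1_1 - p1_12)
    for lam in (0.05, 0.2, 0.4):
        if lam < X:                # feasibility: stability constraint (38c)
            assert second_diff(lambda l: A(l, q2), lam) > 0   # convex in lambda
            assert second_diff(lambda q: A(lam, q), q2) > 0   # convex in q2
```

The positivity of both second differences at every feasible grid point mirrors the analytical conclusion from (A2) and (A3).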

#### **Appendix B**

In order to prove the convexity of the objective function given in (40a), we must verify that the Hessian matrix of the objective function is positive semi-definite. We first consider the objective function as

$$\mathcal{B} = \frac{1}{\lambda} + \frac{1}{p\_{1/1} - q\_2(p\_{1/1} - p\_{1/1,2})}.\tag{A4}$$

To obtain the Hessian matrix, we need to derive $\frac{\mathrm{d}^2 \mathcal{B}}{\mathrm{d}\lambda^2}$, $\frac{\mathrm{d}^2 \mathcal{B}}{\mathrm{d}\lambda\,\mathrm{d}q\_2}$, and $\frac{\mathrm{d}^2 \mathcal{B}}{\mathrm{d}q\_2^2}$:

$$\begin{aligned} \frac{\mathrm{d}^2 \mathcal{B}}{\mathrm{d}\lambda^2} &= \frac{2}{\lambda^3} > 0,\\ \frac{\mathrm{d}^2 \mathcal{B}}{\mathrm{d}\lambda\,\mathrm{d}q\_2} &= 0,\\ \frac{\mathrm{d}^2 \mathcal{B}}{\mathrm{d}q\_2^2} &= \frac{2(p\_{1/1} - p\_{1/1,2})^2}{\left(p\_{1/1} - q\_2(p\_{1/1} - p\_{1/1,2})\right)^3}. \end{aligned} \tag{A5}$$

According to the stability constraint in (40c), $\lambda < p\_{1/1} - q\_2(p\_{1/1} - p\_{1/1,2}) \leq 1$, and therefore $\frac{\mathrm{d}^2 \mathcal{B}}{\mathrm{d}q\_2^2} > 0$. Since $\frac{\mathrm{d}^2 \mathcal{B}}{\mathrm{d}\lambda^2} \times \frac{\mathrm{d}^2 \mathcal{B}}{\mathrm{d}q\_2^2} - \left(\frac{\mathrm{d}^2 \mathcal{B}}{\mathrm{d}\lambda\,\mathrm{d}q\_2}\right)^2 > 0$, the Hessian matrix is positive semi-definite and the objective function is convex.
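A quick numerical check of this Hessian test, with illustrative probabilities (assumed values, not from the paper), confirms positive semi-definiteness at a feasible point.

```python
p1_1, p1_12 = 0.9, 0.5            # illustrative success probabilities
delta = p1_1 - p1_12
lam, q2 = 0.3, 0.6                # a feasible point: lam < X
X = p1_1 - q2 * delta

# Hessian of B = 1/lam + 1/X from Eq. (A5): diagonal, zero cross-term
H = [[2 / lam**3, 0.0],
     [0.0, 2 * delta**2 / X**3]]

# A 2x2 symmetric matrix is positive semi-definite iff its diagonal
# entries and its determinant are non-negative
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
assert H[0][0] > 0 and H[1][1] > 0 and det > 0
```

Because the cross-derivative vanishes, the determinant test reduces to the product of the two positive diagonal entries, exactly as in the argument above.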

#### **References**


### *Article* **On the Value of Information in Status Update Systems †**

**Peng Zou and Suresh Subramaniam \***

Department of Electrical and Computer Engineering, George Washington University, Washington, DC 20052, USA; pzou94@gwu.edu

**\*** Correspondence: suresh@gwu.edu

† Part of this paper was published and presented in the IEEE WCNC 2020, Seoul, Korea, 25–28 May 2020.

**Abstract:** The age of information (AoI) is now well established as a metric that measures the freshness of information delivered to a receiver from a source that generates status updates. This paper is motivated by the inherent value of packets arising in many cyber-physical applications (e.g., due to the precision of the information content or an alarm message). In contrast to AoI, which considers all packets to be of equal importance or value, we consider status update systems with update packets carrying values as well as their generation time stamps. A status update packet has a random initial value at the source and a deterministic deadline after which its value vanishes (called ultimate staleness). In our model, the value of a packet either remains constant until the deadline or decreases in time (even after reception) from its generation to the deadline, when it vanishes. We consider two metrics for the value of information (VoI) at the receiver: *sum VoI* is the sum of the current values of all packets held by the receiver, whereas *packet VoI* is the value of a packet at the instant it is delivered to the receiver. We investigate various queuing disciplines under potential dependence between value and service time and provide closed-form expressions for both the average sum VoI and the average packet VoI at the receiver. Numerical results illustrate the average VoI for different scenarios and the relations between average sum VoI and average packet VoI.

**Keywords:** age of information; status update system; value of information

#### **1. Introduction**

In many cyber-physical applications, the need for *real-time* communication of information packets involves not only maintaining information freshness but is also accompanied by the need to preserve the importance or *value* of those packets. Examples of such cases include autonomous cars and general vehicular networks [1–3], sensor networks [4–6], tactical networks [7] and other systems making decisions in *real-time* [8,9]. In this context, the value of information is another crucial dimension in addition to the notion of timeliness associated with information. In this paper, we address this issue in a queuing system carrying status update packets.

Status update systems with the age of information (AoI) metric measuring end-to-end freshness of packets have received extensive interest recently. Pioneered by the analysis in [10,11], motivated by vehicular status update systems, the AoI metric has been found to be useful in various scenarios such as single server queuing systems [12–14], energy harvesting systems [15–20], single and multi-hop networks [21–25], cognitive radio [26,27] and vehicular communication networks [28]. The AoI metric gives exclusive meaning to the timing of packets and connects a packet's usefulness at the receiver with how long the packet spends before its reception. As such, each packet is assumed to be created with the same value starting at generation. The current literature on status update system abstractions is focused mostly on information freshness and does not consider real-time communication of information packets involving a (time-varying) value associated with its content as well as timing, with some attempts in [29–33] being exceptions. In particular, different packets may have different values with respect to the application at the receiver using it. In such scenarios, the AoI metric falls short of capturing all the dimensions of the problem, and a separate *value of information (VoI)* metric has to be introduced.

**Citation:** Zou, P.; Subramaniam, S. On the Value of Information in Status Update Systems. *Entropy* **2022**, *24*, 449. https://doi.org/10.3390/e24040449

Academic Editors: Anthony Ephremides and Yin Sun

Received: 2 March 2022; Accepted: 21 March 2022; Published: 24 March 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In this paper, we abstract out the VoI of a status update packet as a time-varying quantity with a random initial value which becomes zero after a deterministic deadline (identical over all packets) inspired by the AoI metric. Packets are assumed to be useless after the deadline, which we term as *ultimate staleness*. We also assume a functional dependence between the initial value of an information packet and its service time to capture the relation between value and data size (e.g., packets carrying higher resolution information are more valuable but larger in size), the growth rate of processes to be monitored (e.g., state estimation in cyber-physical systems) and the content of packets regarding an alarming event. We propose two definitions for VoI. The *sum VoI* is the sum of the current values of all packets held by the receiver, which is reminiscent of throughput. Note that the value of a packet continues to decay after it is received until ultimate staleness. On the other hand, the *packet VoI* is simply the instantaneous value of a packet at the moment it is delivered to the receiver. By comparing the initial value and the packet value, we aim to understand the effect of communication on the lost value.

We note that the use of deadlines has been a topic of research in earlier works in the literature on AoI, motivating us to further explore it in the context of value of information updates. Reference [34] shows how packet deadlines, buffer sizes and packet replacement influence average AoI. Closed-form expressions for average AoI with deadline are derived in [35,36]. Reference [37] studies AoI in a status update system with random packet deadlines and infinite buffer capacity.

Previous works in [29–33] have components related to our view on value of information. For example, references [29,32] consider the quality of information associated with the distortion observed at the receiving end and [38] considers partial updates. Similarly, [31,39] relate the timeliness of observations with the correctness of information. The author of [30] considers age and the value of information with a notion of value taking into account the non-linear costs regarding information updates in various queuing disciplines. The work in [33] evaluates the value of information in addition to age of information in uplink/downlink transmissions in network control systems. The authors of [40] study the performance of VoI and AoI in a first responders' health monitoring system; their VoI metric is very closely related to our VoI metric originally presented in [41]. In the current paper, we propose a new notion of VoI where a packet's inherent properties at the time of generation determine its value, in contrast to a value evaluated after processing at the receiver as in previous work. We investigate VoI in M/GI/1/1, M/GI/1/2, M/GI/1/2\* and M/GI/1/1\* queuing disciplines and provide closed-form expressions for average sum VoI and packet VoI.

The work in this paper is a significantly extended version of our conference paper [41]. In particular, we include the following:


#### **2. System Model**

We consider a point-to-point communication system with a single transmitter sending status updates from a source to a receiver, as shown in Figure 1. The update packets arrive at the transmitter as a Poisson process with arrival rate *λ* at instants *ti*. A packet may be discarded in the queuing phase; those that are not discarded enter the server. A packet may also be preempted and discarded while undergoing service; otherwise, it is received by the receiver after system time *Ti* at *t*′*i* = *ti* + *Ti*. In this paper, we cover the M/GI/1/1, M/GI/1/2, M/GI/1/2\* and M/GI/1/1\* queuing schemes. In M/GI/1/1, there is no buffer, and packets arriving while the server is busy are discarded. In M/GI/1/2, there is a single data buffer with a first come first serve discipline, so an arriving packet that finds the buffer occupied is discarded. In M/GI/1/2\*, there is a single data buffer but, in this case, an arriving packet preempts the packet stored in the buffer. In M/GI/1/1\*, there is no buffer, and a packet arriving while the server is busy preempts the packet currently in service. For the two no-buffer schemes M/GI/1/1 and M/GI/1/1\*, *Ti* = *Si*, where *Si* is the service time of the *i*th packet, independent and identically distributed with density *fS*(*s*). For the two schemes with a buffer, M/GI/1/2 and M/GI/1/2\*, *Ti* = *Si* + *Wi*, where *Wi* is the waiting time of the *i*th packet. We derive *Ti* for the different schemes in Section 3. We focus on these four queuing systems because previous research has shown that excessive queuing in large-buffer systems can adversely impact AoI, and limited-buffer systems with packet management can improve AoI [12,34]. Since the value also potentially degrades with time, a similar behavior is expected for VoI.

**Figure 1.** System model with status update packets arriving at a single server transmission queue.

#### *2.1. Value of a Packet*

The *i*th update packet has initial value *V*0,*<sup>i</sup>* at the generation instant. This is a random sequence independent over different *i*. *V*0,*<sup>i</sup>* has the identical general distribution *fV*(*v*) with mean value E[*V*]. This initial value represents the importance of a packet for an application. It could be related to the precision of a measurement or the proximity of the sensor to the measured object, or it could indicate an alarm event. Each packet has a deterministic lifetime *D* after which it reaches ultimate staleness. Hence, after a fixed time period *D* from packet generation, the packet has no value for the receiver. We use *Vr*,*<sup>i</sup>* to denote the instantaneous value of the *i*th update packet when it is delivered to the receiver and *ρi* = *Vr*,*i*/*V*0,*i* to denote the fraction of the initial value of the *i*th update packet that is delivered to the receiver.

Motivated by various applications of sensor networking and the value of information in them [1–6], in our model, we assume that packet *i*'s value can decrease from its time of generation at *ti* until it hits the deadline at *ti* + *D*. The value *Vi*(*τ*) = *hi*(*V*0,*i*, *τ*) for the *i*th packet decreases with *τ* = *t* − *ti*, representing the time passed after generation at the transmitter. This value keeps on decreasing (even after a packet is received) until it becomes zero. We have *hi*(*V*0,*i*, 0) = *V*0,*<sup>i</sup>* and *hi*(*V*0,*i*, *D*) = 0. In this paper, we consider two different *descend functions h*(.) for the value: (i) constant value and (ii) linear descend. The former models the case where the packet's value does not change with time as long as it is delivered by the deadline, while the latter models the case where a packet that is delivered earlier has a higher value. In the constant value case, we have the following.

$$V\_i(\tau) = h\_i(V\_{0,i}, \tau) = \begin{cases} V\_{0,i} & (\tau < D) \\ 0 & (\tau \geq D). \end{cases} \tag{1}$$

In the linear case, since *hi*(*V*0,*i*, 0) = *V*0,*<sup>i</sup>* and *hi*(*V*0,*i*, *D*) = 0, we have a linear descend function.

$$V\_i(\tau) = h\_i(V\_{0,i}, \tau) = \begin{cases} -\frac{V\_{0,i}}{D}\tau + V\_{0,i} & (\tau < D) \\ 0 & (\tau \geq D). \end{cases} \tag{2}$$

Then we have the following:

$$V\_{r,i} = h\_i(V\_{0,i}, T\_i), \tag{3}$$

$$\rho\_i = \frac{h\_i(V\_{0,i}, T\_i)}{V\_{0,i}},\tag{4}$$

for packets that are delivered to the receiver. We set *Vr*,*<sup>i</sup>* = 0 and *ρ<sup>i</sup>* = 0 for packets that are not delivered to the receiver.
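The two descend functions and the delivered fraction *ρi* can be captured in a few lines. The helper below is an illustrative sketch of Equations (1)–(4), with hypothetical parameter values for the deadline, initial value, and system time.

```python
def value(v0, tau, D, mode="linear"):
    """V_i(tau) = h_i(V_{0,i}, tau): constant until the deadline D,
    or descending linearly from v0 at tau = 0 to zero at tau = D."""
    if tau >= D:
        return 0.0                        # ultimate staleness, Eqs. (1)-(2)
    return v0 if mode == "constant" else v0 * (1.0 - tau / D)

D = 10.0                                  # deadline (illustrative)
v0, T_i = 4.0, 2.5                        # initial value and system time (illustrative)
v_r = value(v0, T_i, D)                   # packet value at delivery, Eq. (3)
rho = v_r / v0                            # delivered fraction, Eq. (4)
assert v_r == 3.0 and rho == 0.75         # linear descend: 4 * (1 - 2.5/10)
assert value(v0, T_i, D, "constant") == v0
assert value(v0, D + 1.0, D) == 0.0       # past the deadline: no value
```

With the linear descend, a packet delivered after a quarter of its lifetime retains three quarters of its initial value; with the constant descend, it retains all of it.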

#### *2.2. Value-Dependent Service Times*

We consider two possibilities for a packet's service time. In one model, the service times are independent of the initial value of a packet. In another model, the service time of a packet depends on the initial value of the packet through a non-decreasing function *g*.

$$S\_i = \mathcal{g}(V\_{0,i}).\tag{5}$$

In this case, the distribution function of *Si* is $f\_S(s) = f\_V(g^{-1}(s))\,\frac{\mathrm{d}g^{-1}(s)}{\mathrm{d}s}$, where *g*−1(.) is the inverse function of *g*(.), and the mean service time is E[*S*] = E[*g*(*V*)]. Corresponding to the general distribution, we have the moment generating function (MGF) evaluated at −*γ* for *γ* ≥ 0:

$$\mathcal{M}\_S(\gamma) \triangleq \mathbb{E}[e^{-\gamma S}].$$

This monotonic relation reflects the fact that a larger packet takes longer time to transmit and its reception yields more value. This relation causes an interesting tradeoff between value and age as a larger value is obtained at the receiver by paying a longer service time.
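As a concrete instance of this dependence, the Monte Carlo sketch below assumes an exponential initial value and a linear relation *g*(*v*) = *cv* (both illustrative choices, not fixed by the paper), and checks the MGF E[e^{−γS}] against the closed form for an exponentially distributed service time.

```python
import math
import random

random.seed(1)
mu, c, gamma = 1.0, 0.5, 0.8          # illustrative: V ~ Exp(mu), g(v) = c*v
samples = [random.expovariate(mu) for _ in range(200_000)]
S = [c * v for v in samples]          # service time tied to the initial value

# Monte Carlo estimate of M_S(gamma) = E[e^{-gamma S}]
mgf_mc = sum(math.exp(-gamma * s) for s in S) / len(S)
# Closed form for exponential V: E[e^{-gamma c V}] = mu / (mu + gamma*c)
mgf_exact = mu / (mu + gamma * c)
assert abs(mgf_mc - mgf_exact) < 0.01
```

The same two lines that draw `samples` and map them through `g` make the value-service coupling explicit: a large initial value directly produces a long service time.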

In this paper, we consider two definitions for VoI. The first one is Υsum, which denotes the sum VoI, i.e., the sum of the current values of all packets received by the receiver (cf. [4–6] where the additive nature of VoI is discussed in various wireless sensor networks). Hence, Υsum(*t*) is as follows:

$$\Upsilon\_{\text{sum}}(t) = \sum\_{j=1}^{i\_t} V\_j(t) \tag{6}$$

where $i\_t = \max\{i : t'\_i \leq t\}$. The time average of Υsum(*t*) is the following.

$$\mathbb{E}[\Upsilon\_{\text{sum}}] = \lim\_{T \to \infty} \frac{1}{T} \int\_{t=0}^{T} \Upsilon\_{\text{sum}}(t)\,\mathrm{d}t. \tag{7}$$

Another definition is Υpacket, which measures the instantaneous value of a packet at the moment it is delivered to the receiver (if it is delivered). Packets that are dropped are assumed to have zero value. The average packet VoI is then defined as follows.

$$\mathbb{E}[\Upsilon\_{\text{packet}}] = \mathbb{E}[V\_{r,i}]. \tag{8}$$

E[*ρi*] is the expected fraction of the initial value that is delivered to the receiver, which illustrates the amount of value received by the receiver compared to the generated initial value at the source. We reiterate that E[*Vr*,*i*] and E[*ρi*] are expectations over *all* packets; dropped packets contribute zero received value.

We illustrate the evolution of value with an example. In Figures 2 and 3, the evolution of value for specific packets generated over time is shown in an M/GI/1/1 system with constant and linearly descending values, respectively. We use *Xi* to denote the interarrival period between two packets *i* − 1 and *i*. Therefore, *Xi* is an exponentially distributed random variable with rate parameter *λ*. Packet 1 finds the server idle and begins service at *t*1; service ends at *t*′1. Packet 2 arrives between *t*1 and *t*′1, and it is discarded. The service of packet 1 finishes at *t*′1, before the deadline of packet 1, *D*1 = *t*1 + *D*. The value of packet 1 at *t*′1, when received by the receiver, is non-zero, and it becomes zero at *D*1. Packets 3, 4 and 5 arrive to the system during the idle period, and they are received at *t*′3, *t*′4 and *t*′5. Note that when packet 4 is received, packet 3 has a non-zero value; thus, the sum VoI, which is shown with a solid red line, is the sum of the values of these packets.

**Figure 2.** Evolution of value in the M/GI/1/1 system when the value remains constant until the deadline.

**Figure 3.** Evolution of the value in the M/GI/1/1 system with linearly descending values.

We define areas *Qi* under the rectangular regions of the curve shown in Figure 2 or the triangular regions of the curve shown in Figure 3, and we set *Qi* = 0 for packets discarded in the queuing phase. Then, the expected sum VoI at the receiver is as follows:

$$\mathbb{E}[\Upsilon\_{\text{sum}}] = \lambda\, \mathbb{E}[Q\_i], \tag{9}$$

where *λ* is the arrival rate of packets at the transmitter.
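The renewal-reward identity (9) can be checked by simulation. The sketch below runs an M/GI/1/1 system with exponential service times, exponential initial values, and the constant-value descend function (all illustrative choices), and compares the sum VoI sampled at random instants with the area-based estimate of *λ*E[*Qi*].

```python
import bisect
import random

random.seed(7)
lam, mu_s, D, T_end = 0.5, 1.0, 4.0, 20_000.0   # illustrative parameters

t, busy_until = 0.0, 0.0
deliveries = []   # (delivery time, deadline, value) of each received packet
areas = []        # Q_i for every arriving packet (0 if dropped or stale)
while True:
    t += random.expovariate(lam)                # Poisson arrivals
    if t > T_end:
        break
    if t < busy_until:                          # M/GI/1/1: busy -> drop
        areas.append(0.0)
        continue
    s = random.expovariate(mu_s)                # service time (value-independent)
    v0 = random.expovariate(1.0)                # initial value
    busy_until = t + s
    if s < D:                                   # constant-value descend function:
        deliveries.append((t + s, t + D, v0))   # worth v0 on [t+s, t+D)
        areas.append(v0 * (D - s))              # rectangular area Q_i
    else:
        areas.append(0.0)                       # stale before delivery

d_times = [d[0] for d in deliveries]
def sum_voi(u):
    """Sum VoI at time u: total value of delivered, not-yet-stale packets."""
    j = bisect.bisect_right(d_times, u)
    total = 0.0
    while j > 0 and d_times[j - 1] > u - D:     # only recent deliveries can be live
        j -= 1
        if deliveries[j][1] > u:
            total += deliveries[j][2]
    return total

probes = [random.uniform(0.0, T_end) for _ in range(10_000)]
avg_mc = sum(sum_voi(u) for u in probes) / len(probes)   # sampled average
avg_rr = sum(areas) / T_end                     # = lambda_hat * mean(Q_i)
assert abs(avg_mc - avg_rr) < 0.08 * avg_rr
```

The sampled time average and the area-per-arrival estimate agree closely, which is exactly what (9) asserts.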

#### **3. Evaluating Value of Information**

In this section, we derive closed-form expressions for E[*Vr*,*i*], E[*Qi*] and E[*ρ*] for the various queuing systems. E[Υpacket] and E[Υsum] can then be obtained by using Equations (8) and (9).

#### *3.1. Average VoI for M/GI/1/1*

In the M/GI/1/1 queueing system, there is a single server and no buffer. Packets that arrive in the idle period are taken into service immediately, and those arriving in the busy period are dropped. In view of the renewal structure, we have the following stationary probabilities for each state:

$$p\_I = \frac{1}{\lambda T\_{\text{cycle}}}, \ p\_B = \frac{\mathbb{E}[S]}{T\_{\text{cycle}}},\tag{10}$$

where $T\_{\text{cycle}} = \frac{1}{\lambda} + \mathbb{E}[S]$ is the expected length of one renewal cycle, and *I* and *B* indicate the idle and busy states. In the M/GI/1/1 system, packets are delivered to the receiver only if they arrive when the server is idle. Recall that if the total time spent by a packet before reaching the receiver exceeds *D*, its value vanishes. Since a packet that is taken into service spends the service time *Si* in the system before reaching the receiver, the packet's value vanishes if *Si* is larger than *D*. Hence, we only need to consider the condition *Si* < *D* for packets *i* arriving in the idle state. Based on the two time-dependent value functions in (1) and (2) and the relationships in (3)–(5), we have the following:

$$\mathbb{E}[V\_{r,i}] = p\_I \int\_0^{\tilde{\mathcal{V}}} h\_i(v, g(v)) f\_V(v)\, dv, \tag{11}$$

$$\mathbb{E}[\rho\_i] = p\_I \int\_0^{\tilde{\mathcal{V}}} \frac{h\_i(v, g(v))}{v} f\_V(v)\, dv, \tag{12}$$

$$\mathbb{E}[Q\_i] = p\_I \int\_0^{\tilde{\mathcal{V}}} \int\_{g(v)}^{D} h\_i(v, \tau) f\_V(v)\, d\tau\, dv, \tag{13}$$

where *V*˜ = *g*−1(*D*) denotes the initial value for which the corresponding service time equals the deadline.
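Equations (11)–(13) are one-dimensional (or nested) integrals that are straightforward to evaluate numerically. The sketch below uses illustrative choices (V ~ Exp(1), *g*(*v*) = *v*, and the linear descend function) and a simple trapezoidal rule; none of these specific values come from the paper.

```python
import math

lam, D = 0.5, 2.0                  # arrival rate and deadline (illustrative)
g = lambda v: v                    # assumed size-value relation: S_i = V_{0,i}
f_V = lambda v: math.exp(-v)       # V ~ Exp(1)
h = lambda v0, tau: v0 * max(0.0, 1.0 - tau / D)   # linear descend, Eq. (2)

def trap(f, a, b, n=1000):
    """Composite trapezoidal rule on [a, b]."""
    step = (b - a) / n
    return step * (f(a) / 2 + sum(f(a + k * step) for k in range(1, n)) + f(b) / 2)

E_S = 1.0                                   # E[g(V)] = E[V] for Exp(1)
p_I = (1 / lam) / (1 / lam + E_S)           # idle probability, Eq. (10)
V_t = D                                     # V~ = g^{-1}(D)

E_Vr  = p_I * trap(lambda v: h(v, g(v)) * f_V(v), 0.0, V_t)            # Eq. (11)
E_rho = p_I * trap(lambda v: h(v, g(v)) / v * f_V(v), 1e-9, V_t)       # Eq. (12)
E_Q   = p_I * trap(lambda v: trap(lambda tau: h(v, tau), g(v), D, 200) * f_V(v),
                   0.0, V_t)                                           # Eq. (13)
assert E_Vr > 0 and 0 < E_rho < 1 and E_Q > 0
```

Note that the upper limit is *V*˜ rather than infinity: any packet whose value exceeds *V*˜ has a service time longer than the deadline and delivers zero value.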

#### *3.2. Average VoI for M/GI/1/2*

In the M/GI/1/2 queueing system, there is a single buffer. The server is in either idle or busy states. Packets that arrive in the idle period are served immediately; those that arrive in the busy period are stored in the buffer if there is no other packet in it and they are discarded otherwise. In view of the renewal structure, we have the following stationary probabilities for each state of the server:

$$p_I = \frac{1}{\lambda T_{\text{cycle}}}, \quad p_B = \frac{\mathbb{E}[S]}{T_{\text{cycle}} M_S(\lambda)},\tag{14}$$

where we use $M_S(\lambda)$ to denote the moment generating function of the service time distribution evaluated at $-\lambda$:

$$M_S(\lambda) = \mathbb{E}[e^{-\lambda S}],\tag{15}$$

where $T_{\text{cycle}} = \frac{1}{\lambda} + \frac{\mathbb{E}[S]}{M_S(\lambda)}$ is the expected length of one renewal cycle. Next, we evaluate $\mathbb{E}[V_{r,i}|s]$ and $\mathbb{E}[Q_i|s]$ for $s \in \mathcal{S}_{M/GI/1/2} = \{I, B\}$, where the conditioning is on the server state observed by packet *i*. Due to the PASTA property, $\Pr[P_i = s] = p_s$, where $p_s$, $s \in \mathcal{S}_{M/GI/1/2}$, are as in (14).
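The quantities in (14)–(15) are straightforward to evaluate numerically. The sketch below is our own check, not from the paper; it assumes service times uniform on $[V_{\min}, V_{\max}]$, for which $M_S(\lambda)$ has the closed form $(e^{-\lambda V_{\min}} - e^{-\lambda V_{\max}})/(u\lambda)$ used in Appendix A, and confirms it against a Monte Carlo average.

```python
import math, random

# Evaluate M_S(lambda) = E[exp(-lambda*S)] and the M/GI/1/2 stationary
# probabilities of (14) for uniform service on [v_min, v_max].
lam, v_min, v_max = 1.0, 0.0, 10.0
u = v_max - v_min
ES = (v_min + v_max) / 2.0

M_S = (math.exp(-lam * v_min) - math.exp(-lam * v_max)) / (u * lam)
random.seed(2)
n = 200_000
M_S_mc = sum(math.exp(-lam * random.uniform(v_min, v_max)) for _ in range(n)) / n

T_cycle = 1.0 / lam + ES / M_S   # expected renewal-cycle length
p_I = 1.0 / (lam * T_cycle)      # equation (14)
p_B = ES / (T_cycle * M_S)
print(M_S, M_S_mc, p_I, p_B)     # p_I + p_B = 1
```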

#### 3.2.1. Idle State Analysis

As a packet arriving in the idle state is served immediately, we have the following.

$$\mathbb{E}[V_{r,i}|I] = \int_0^{\tilde{V}} h_i(v, g(v)) f_V(v)\, dv,\tag{16}$$

$$\mathbb{E}[\rho_i|I] = \int_0^{\tilde{V}} \frac{h_i(v, g(v))}{v} f_V(v)\, dv,\tag{17}$$

$$\mathbb{E}[Q_i|I] = \int_0^{\tilde{V}} \int_{g(v)}^{D} h_i(v, \tau) f_V(v)\, d\tau\, dv.\tag{18}$$

#### 3.2.2. Busy State Analysis

Since only the first packet that arrives during the busy period is served and the others are discarded, we introduce a lemma for the probability that an arriving packet is the first one to arrive in the busy state. To do so, we first define states $B_1$ and $B_2$ as the busy states of the server with zero and one packet waiting in the queue, respectively. The renewal cycle is as follows. After the idle period, an arrival occurs and the system moves to state $B_1$. A service of duration *S* then starts, and if another arrival occurs during this service period, the system moves to state $B_2$. This back-and-forth between states $B_1$ and $B_2$ continues until no packet arrives during one service time. We provide an example in Figure 4 for the three states in the M/GI/1/2 scheme. At time $t_0$, packet 1 arrives and finds the system idle. Packet 2 finds the system in state $B_1$ at $t_1$ and is stored in the buffer. Packet 3 finds the system in state $B_2$ at $t_2$ and is dropped.

**Figure 4.** Three states that can be observed by packets in M/GI/1/2 scheme.

This renewal structure yields the following result.

**Lemma 1.** *In the M/GI/1/2 scheme, the waiting time of a packet in the buffer conditioned on its arrival in the $B_1$ state is as follows:*

$$\begin{split} \mathbb{E}[W_{B_2}] &= \mathbb{E}[S - X \mid X < S] \Pr[X < S] \\ &= \mathbb{E}[S] + \frac{1}{\lambda} M_S(\lambda) - \frac{1}{\lambda}. \end{split}$$

*The stationary probability of B*<sup>2</sup> *state is as follows:*

$$p_{B_2} = p_B \frac{\mathbb{E}[W_{B_2}]}{\mathbb{E}[S]} = p_B \left( 1 + \frac{M_S(\lambda) - 1}{\lambda \mathbb{E}[S]} \right),$$

*and the probability of the $B_1$ state is $p_{B_1} = p_B - p_{B_2}$.*
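Lemma 1 can be verified by simulating the renewal cycle described above. The sketch below is our own verification, not from the paper; it assumes exponential service with rate $\mu$ (i.e., M/M/1/2), for which the closed forms of Appendix B.2 give $p_{B_1} = \mu\lambda/(\mu^2+\mu\lambda+\lambda^2)$ and $p_{B_2} = \lambda^2/(\mu^2+\mu\lambda+\lambda^2)$.

```python
import random

# Estimate p_B1 and p_B2 by simulating M/M/1/2 renewal cycles.  Within each
# service of length s, the time x to the next Poisson arrival is a fresh
# Exp(lambda) draw by memorylessness; if x < s the system is in B1 for x and
# in B2 for s - x, and the buffered packet is served next.
lam, mu = 1.0, 1.5
random.seed(3)

t_idle = t_b1 = t_b2 = 0.0
for _ in range(100_000):                 # one iteration = one renewal cycle
    t_idle += random.expovariate(lam)    # idle period
    serving = True
    while serving:                       # busy period: alternate B1/B2
        s = random.expovariate(mu)       # current service time
        x = random.expovariate(lam)      # time to next arrival
        if x < s:                        # an arrival is buffered
            t_b1 += x
            t_b2 += s - x
        else:                            # service ends with empty buffer
            t_b1 += s
            serving = False

total = t_idle + t_b1 + t_b2
den = mu * mu + mu * lam + lam * lam
print(t_b1 / total, mu * lam / den)      # p_B1 estimate vs. theory
print(t_b2 / total, lam * lam / den)     # p_B2 estimate vs. theory
```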

Then, we have $\mathbb{E}[Q_i|B] = \mathbb{E}[Q_i|B_1]$, and we provide the probability distribution of the conditional residual service time $W'$ under the condition that the packet arrives in the $B_1$ state:

$$\begin{aligned} \mathbb{P}[W' > w] &= \mathbb{P}[S - X > w | X < S] \\ &= \frac{\int\_w^{\infty} \int\_0^{s - w} f\_S(s) f\_X(x) dx ds}{\mathbb{P}[X < S]} \\ &= \frac{\int\_w^{\infty} f\_S(s) (1 - e^{-\lambda(s - w)}) ds}{1 - M\_S(\lambda)}. \end{aligned}$$

Differentiating, we have the following.

$$f_{W'}(w) = \frac{d(1 - \mathbb{P}[W' > w])}{dw}.\tag{19}$$

Then, we have the following.

$$\mathbb{E}[V_{r,i}|B_1] = \int_0^{\tilde{V}} \int_0^{D-g(v)} h_i(v, g(v) + w) f_{W'}(w) f_V(v)\, dw\, dv,\tag{20}$$

$$\mathbb{E}[\rho_i|B_1] = \int_0^{\tilde{V}} \int_0^{D-g(v)} \frac{h_i(v, g(v) + w)}{v} f_{W'}(w) f_V(v)\, dw\, dv,\tag{21}$$

$$\mathbb{E}[Q_i|B_1] = \int_0^{\tilde{V}} \int_0^{D-g(v)} \int_{g(v)+w}^{D} h_i(v,\tau) f_{W'}(w) f_V(v)\, d\tau\, dw\, dv.\tag{22}$$

Therefore, we have $\mathbb{E}[V_{r,i}] = \mathbb{E}[V_{r,i}|I]p_I + \mathbb{E}[V_{r,i}|B_1]p_{B_1}$, $\mathbb{E}[\rho_i] = \mathbb{E}[\rho_i|I]p_I + \mathbb{E}[\rho_i|B_1]p_{B_1}$ and $\mathbb{E}[Q_i] = \mathbb{E}[Q_i|I]p_I + \mathbb{E}[Q_i|B_1]p_{B_1}$.

#### *3.3. Average VoI for M/GI/1/2\**

The M/GI/1/2\* queueing system is the same as M/GI/1/2 except that the buffer uses a last-come first-served order with packet discarding: the latest packet arriving in a busy period takes the place of the old packet in the buffer. Therefore, we have the same stationary probabilities for each state as in the M/GI/1/2 system in (14). Additionally, the expressions for $\mathbb{E}[V_{r,i}|I]$, $\mathbb{E}[\rho_i|I]$ and $\mathbb{E}[Q_i|I]$ are the same as in (16)–(18), respectively. We now derive expressions for $\mathbb{E}[Q_i|B]$ and $\mathbb{E}[V_{r,i}|B]$.

#### Busy State Analysis

If the *i*th packet arrives at the server during the busy period, it will be transmitted to the receiver conditioned on the event $\{X_i > W_{i-1}\}$, which means that the next packet arrives at the server after the current service finishes. Here, *W* is the general residual service time for packets arriving in the busy state, with density $f_W(w) = \frac{\mathbb{P}[S>w]}{\mathbb{E}[S]}$. Then, the following is the case.

$$\mathbb{E}[V_{r,i}|B] = \int_0^{\tilde{V}} \int_0^{D-g(v)} \int_w^{\infty} h_i(v, g(v) + w) f_X(x) f_W(w) f_V(v)\, dx\, dw\, dv,\tag{23}$$

$$\mathbb{E}[\rho_i|B] = \int_0^{\tilde{V}} \int_0^{D-g(v)} \int_w^{\infty} \frac{h_i(v, g(v) + w)}{v} f_X(x) f_W(w) f_V(v)\, dx\, dw\, dv,\tag{24}$$

$$\mathbb{E}[Q_i|B] = \int_0^{\tilde{V}} \int_0^{D-g(v)} \int_w^{\infty} \int_{g(v)+w}^{D} h_i(v,\tau) f_X(x) f_W(w) f_V(v)\, d\tau\, dx\, dw\, dv.\tag{25}$$

Therefore, we have $\mathbb{E}[V_{r,i}] = \mathbb{E}[V_{r,i}|I]p_I + \mathbb{E}[V_{r,i}|B]p_B$, $\mathbb{E}[\rho_i] = \mathbb{E}[\rho_i|I]p_I + \mathbb{E}[\rho_i|B]p_B$ and $\mathbb{E}[Q_i] = \mathbb{E}[Q_i|I]p_I + \mathbb{E}[Q_i|B]p_B$.
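The equilibrium residual density $f_W(w) = \mathbb{P}[S>w]/\mathbb{E}[S]$ used above can be sampled by a standard length-biasing argument: pick a service time with probability proportional to its length and a uniform position within it. The sketch below is our own illustration, not from the paper; it assumes service uniform on $[0, 10]$ and checks the mean residual against $\mathbb{E}[S^2]/(2\mathbb{E}[S])$.

```python
import random

# Sample the residual service time W seen by a Poisson arrival in the busy
# state: accept a service s with probability s/s_max (length bias), then take
# a uniform position inside it.  The resulting density is P[S > w]/E[S].
random.seed(4)
v_min, v_max = 0.0, 10.0
ES = (v_min + v_max) / 2.0
ES2 = (v_max**3 - v_min**3) / (3.0 * (v_max - v_min))   # E[S^2], uniform S

samples = []
while len(samples) < 200_000:
    s = random.uniform(v_min, v_max)
    if random.random() < s / v_max:          # length-biased acceptance
        samples.append(s * random.random())  # uniform position -> residual

mean_W = sum(samples) / len(samples)
print(mean_W, ES2 / (2.0 * ES))              # both should be ~ 10/3
```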

#### *3.4. Average VoI for M/GI/1/1\**

In the M/GI/1/1\* queueing system, there is no buffer, and a new packet that arrives during the busy state preempts the packet currently in service. Since the arrival process is Poisson with rate *λ*, the probability $p_e$ that a packet is delivered to the receiver is given by the following:

$$p_e = \mathbb{P}[S_i < X_{i+1}] = M_S(\lambda),\tag{26}$$

which means that, in the preemption scheme, only a packet whose service time is less than the upcoming inter-arrival period is delivered to the receiver. We use the relation $f_{G \mid G < F}(t) = \frac{f_G(t)\,\mathbb{P}(F>t)}{\mathbb{P}(G<F)}$ from [13], where *G* and *F* are random variables and *F* is exponential with rate $\lambda$. Since $\mathbb{P}(G < F) = M_G(\lambda)$ and $\mathbb{P}(F > t) = e^{-\lambda t}$, we have the probability density function for the conditional service time.

$$f_{S \mid S < X}(s) = \frac{f_S(s)\, e^{-\lambda s}}{M_S(\lambda)},\tag{27}$$

We use $S'$ to denote this conditional service time; therefore, we have $f_{S'}(s) = f_{S \mid S < X}(s)$. In this case, we rewrite Equation (1) as follows:

$$h_i(g^{-1}(s), \tau) = \begin{cases} g^{-1}(s) & (\tau < D) \\ 0 & (\tau > D) \end{cases}\tag{28}$$

and Equation (2) as the following.

$$h_i(g^{-1}(s), \tau) = \begin{cases} -\frac{g^{-1}(s)}{D}\tau + g^{-1}(s) & (\tau < D) \\ 0 & (\tau > D) \end{cases}\tag{29}$$

Then, we have the following.

$$\mathbb{E}[V_{r,i}] = p_e \int_0^D h_i(g^{-1}(s), s) f_{S'}(s)\, ds,\tag{30}$$

$$\mathbb{E}[\rho_i] = p_e \int_0^D \frac{h_i(g^{-1}(s), s)}{g^{-1}(s)} f_{S'}(s)\, ds,\tag{31}$$

$$\mathbb{E}[Q_i] = p_e \int_0^D \int_s^D h_i(g^{-1}(s), \tau) f_{S'}(s)\, d\tau\, ds.\tag{32}$$
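Both (26) and (27) are easy to check by direct simulation. The sketch below is our own check, not from the paper; it assumes exponential service with rate $\mu$, for which (27) gives $f_{S'}(s) = (\lambda+\mu)e^{-(\lambda+\mu)s}$, i.e., the surviving service time is exponential with rate $\lambda+\mu$ (this is the same density used in Appendix B.4).

```python
import random

# In M/GI/1/1*, a packet is delivered iff its service time S beats the next
# inter-arrival time X ~ Exp(lambda).  The delivery probability approaches
# M_S(lambda), and the surviving service times follow f_S(s)e^{-lambda s}/M_S.
lam, mu = 1.0, 1.5
random.seed(5)

kept, n = [], 400_000
for _ in range(n):
    s = random.expovariate(mu)        # service time
    x = random.expovariate(lam)       # time to the next (preempting) arrival
    if s < x:                         # delivered only if service finishes first
        kept.append(s)

p_e = len(kept) / n                   # -> M_S(lambda) = mu/(lam + mu)
mean_kept = sum(kept) / len(kept)     # -> 1/(lam + mu) for exponential service
print(p_e, mu / (lam + mu))
print(mean_kept, 1.0 / (lam + mu))
```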

#### **4. Numerical Results**

In this section, we provide numerical results for the average VoI in various cases. As verification of the analytical results, we also perform offline packet-based queue simulations with $10^6$ packets. An example of our simulation results is shown in Figure 5. We use $g(V) = V$ as the relation between service time and value to model the case where the value is directly proportional to the packet size. Results are presented for three different distributions of the initial packet value.
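A packet-based simulation of this kind can be sketched as follows. This is our own simplified version of the paper's setup, not its actual code; it assumes the uniform initial value on $[0, 10]$, $g(V) = V$, and the linearly descending value function, and estimates the mean received value per generated packet, $\mathbb{E}[V_{r,i}]$, against the closed form built from (11) in Appendix A.1.

```python
import random

# Packet-based M/GI/1/1 simulation: Poisson arrivals, no buffer, a packet is
# served only if it finds the server idle; its delivered value decays linearly
# with its system time tau = S and vanishes past the deadline D.
lam, v_min, v_max, D = 1.0, 0.0, 10.0, 8.0
u, n = v_max - v_min, 1_000_000
random.seed(6)

t = t_free = 0.0
total_value = 0.0
for _ in range(n):
    t += random.expovariate(lam)          # next Poisson arrival
    if t >= t_free:                       # server idle: packet enters service
        v = random.uniform(v_min, v_max)  # initial value; service time S = v
        t_free = t + v
        if v < D:                         # delivered before the deadline
            total_value += v - v * v / D  # linear descend at age tau = v

v_up = min(D, v_max)
p_I = 1.0 / (1.0 + lam * (v_min + v_max) / 2.0)
theory = (p_I / u) * ((v_up**2 - v_min**2) / 2.0
                      - (v_up**3 - v_min**3) / (3.0 * D))
print(total_value / n, theory)
```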

#### *4.1. Uniformly Distributed Initial Value*

First, we assume that the initial value of each packet is uniformly distributed between $V_{\min}$ and $V_{\max}$ and that the value follows the linear descend function. In Appendix A, we provide closed-form expressions for E[Υsum] and E[Υpacket] in various systems with linearly descending value.

We show a comparison of average Υsum and average Υpacket in Figure 5. In Figure 5a, we show average Υsum versus arrival rate *λ* for the four queuing schemes. We observe that M/GI/1/1 and M/GI/1/2\* perform better than M/GI/1/1\* as *λ* increases. In particular, due to the linear relation between time and value, keeping a packet in the buffer to keep the server busy turns out to yield smaller value at the receiver than keeping none and serving only the freshest packets. For M/GI/1/1\* and M/GI/1/2, on the other hand, there is an optimal value of *λ* after which average Υsum drops. For M/GI/1/2, this is due to an undesired increase in waiting times in the data buffer, while for M/GI/1/1\*, it is due to an undesired decrease in the number of delivered packets.

**Figure 5.** (**a**) Average Υsum and (**b**) average Υpacket for uniformly distributed initial value with linear descend function versus *λ*; *V*min = 0, *V*max = 10, *D* = 8. Circles are simulation results.

In Figure 5b, we show Υpacket versus arrival rate *λ* for the four queuing schemes. Again, we observe that M/GI/1/1 performs better than the other three. We observe that as *λ* increases, E[Υpacket] decreases in all four queuing schemes due to the fact that most of the generated packets are discarded in the queuing phase and have zero value for the receiver.

In Figure 6, we show E[*ρ*], the average ratio of the received value to the generated value over all generated packets. We observe that as *λ* increases, E[*ρ*] decreases in all four queuing schemes, which matches the result for E[Υpacket]. Interestingly, however, the M/GI/1/1\* scheme performs best for E[*ρ*]. This is because, as *λ* increases, even though more packets are dropped, the packets delivered to the receiver have smaller service times, which increases the ratio of the delivered value to the initial value.

**Figure 6.** E[*ρ*] for uniformly distributed initial value with linear descend function versus *λ*; *V*min = 0, *V*max = 10, *D* = 8.

Next, we consider the case when the service times are independent of the initial values and are exponentially distributed with service rate *μ*. In Figure 7, we show the average Υsum versus arrival rate *λ* for the four queuing schemes. We observe that M/GI/1/1\* performs better than the other three. This is because the service time is independent of the initial value, and large-valued packets may have small service times. In particular, due to the linear relation between time and value, keeping a packet in the buffer to keep the server busy turns out to yield smaller values at the receiver compared to keeping none and serving only the freshest packets.

**Figure 7.** Average Υsum for uniformly distributed initial value with linear descend function and exponential independent service time versus *λ*; *V*min = 0, *V*max = 10, *D* = 8, *μ* = 0.2.

Finally, in Figure 8, we show the average Υsum versus service rate *μ* for the four queuing schemes when the service times are independent of the initial values and are exponentially distributed. We observe that M/GI/1/2 and M/GI/1/2\* perform better than M/GI/1/1\* as *μ* increases. This is because, as the average service time decreases, fewer packets expire, i.e., reach ultimate staleness, during the waiting period in the buffer; in this case, having a buffer to store packets yields larger value at the receiver than dropping them at the server.

**Figure 8.** Average Υsum for uniformly distributed initial value with linear descend function and exponential independent service time versus *μ*; *V*min = 0, *V*max = 10, *D* = 8, *λ* = 1.

#### *4.2. Exponentially Distributed Initial Value*

Next, we consider $f_V(v) = \mu_v e^{-\mu_v v}$ with constant value. In this case, we have service rate $\mu = \mu_v$ due to $g(V) = V$. We compare the average AoI with the average sum VoI for the same schemes, as both are time-average metrics over all packets. In Appendix B, we provide closed-form expressions for E[Υsum] and E[Υpacket] in various systems for constant values.

In Figure 9a, we plot the average Υsum with respect to *λ* for various schemes. We observe that M/M/1/2\* always performs better than the others. This is connected to the fact that, when the value of a packet is constant over time, all packets received within the deadline contribute their full initial value. Since Υsum is the accumulated value of received packets, the total value is higher if a packet is stored in the buffer instead of being dropped. At the same time, we observe that M/M/1/1\* performs the worst in terms of value, since the dependence between service time and value causes higher-value packets to be preempted in this system, resulting in no contribution to the VoI at the receiver.

**Figure 9.** Average Υsum for exponentially distributed service time with constant value versus *λ* for M/M/1/1, M/M/1/2, M/M/1/2\* and M/M/1/1\* schemes with *μ<sup>v</sup>* = 1.5 and *D* = 3. (**a**) Dependent Value. (**b**) Independent Value.

Next, in Figure 9b, we show the average Υsum for independent initial value and service time under the same marginal distributions. We observe that, with independent service time, the M/M/1/1\* scheme becomes the best case, while it is the worst case with dependent service time. The other three schemes yield higher values as the adverse relation between initial value and service rate is removed.

Finally, in Figure 10, we show E[*ρ*] versus deadline *D* for the four queuing schemes. We observe that, as *D* increases, E[*ρ*] for all queuing schemes increases, but never reaches threshold 1 due to the fact that some packets are discarded in the queuing phase.

**Figure 10.** E[*ρ*] for exponentially distributed service time with constant value versus *D* for *λ* = 1.

#### *4.3. Binary Distributed Initial Value*

We finally consider binary distributed initial value for two classes of update packets. Class 1 and class 2 packets have *V*0,*<sup>i</sup>* = *V*<sup>1</sup> and *V*0,*<sup>i</sup>* = *V*2. Each packet is independently chosen to be in class 1 or 2 with probability *p* and (1 − *p*), respectively. This situation models the case when a packet of one class contains a message about an alarming event yielding high value once received, whereas the other class of packets are assumed to be regular status updates.

In Figure 11, we set $V_1 = 1.33$, $V_2 = 0.4$ and $p = 0.2$. We compare plots showing average Υsum versus *λ* for three different service policies in an M/M/1/1 system. The first policy serves all packets without regard to the value, the second policy serves only class 1 packets, and the third policy serves only class 2 packets. Note that if the service time is dependent on the value, class 1 packets will have exponentially distributed service time with mean $\mathbb{E}[S] = \mathbb{E}[V_1]$ and, similarly, class 2 packets will have exponentially distributed service time with mean $\mathbb{E}[S] = \mathbb{E}[V_2]$. If the service time is independent of the value, packets of both classes will have exponentially distributed service time with $\mu = 1.5$. Our numerical results show that, when the service time is independent of the value, always serving the high-value packets yields the highest average value. On the other hand, in the dependent case, when the arrival rate becomes large, serving the packets with low value but smaller service time and higher probability benefits the average Υsum compared to serving all the packets or serving the high-value packets with larger service time and lower probability.

**Figure 11.** Exponentially distributed service time dependent on or independent of the binary value in M/M/1/1 scheme.

#### **5. Conclusions**

Age of information (AoI) is a well-known metric that quantifies the freshness of information at a receiver in status update systems. This metric ignores the potential differences in the importance of various update packets. In this paper, we consider the value of information in status update systems wherein packets have various initial values upon generation. We investigate various queuing disciplines with initial-value-dependent packet service times and obtain closed-form expressions for two different VoI metrics. Our numerical results illustrate the trade-off and the contrast between the two VoI metrics. We show the average sum VoI and the average packet VoI for different scenarios, as well as the fraction of the received value compared to the initial value for different systems.

**Author Contributions:** Conceptualization, P.Z. and S.S.; formal analysis, P.Z.; investigation, P.Z.; supervision, S.S.; validation, P.Z.; writing—original draft, P.Z.; writing—review and editing, P.Z. and S.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A.** E**[Υsum] and** E**[Υpacket] for Uniformly Distributed Initial Value with Linear Descend Function**

In the uniform case, we have $f_V(v) = \frac{1}{u}$, where $u = V_{\max} - V_{\min}$, and we assume $g(V) = V$. Thus, we have the mean service time.

$$\mathbb{E}[S] = \mathbb{E}[V] = \frac{V\_{\text{max}} + V\_{\text{min}}}{2}.$$

*Appendix A.1. M/GI/1/1*

From (10), we have the following.

$$p_I = \frac{1}{1 + \lambda \mathbb{E}[S]}.$$

We calculate E[*Vr*,*i*] from (2) and (11) and we have the following.

$$\mathbb{E}[V_{r,i}] = p_I \int_0^{\tilde{V}} \left(v - \frac{v}{D} g(v)\right) f_V(v)\, dv.$$

Here, $\tilde{V} = D$. Define $V_{\text{up}} = \tilde{V}$ if $\tilde{V} < V_{\max}$ and $V_{\text{up}} = V_{\max}$ otherwise. Then, we have the following.

$$\begin{split} \mathbb{E}[V_{r,i}] &= p_I \int_{V_{\min}}^{V_{\text{up}}} \left(v - \frac{v}{D} g(v)\right) f_V(v)\, dv \\ &= \frac{p_I}{u} \int_{V_{\min}}^{V_{\text{up}}} \left(v - \frac{v^2}{D}\right) dv \\ &= \frac{p_I}{u} \left( \frac{1}{2} (V_{\text{up}}^2 - V_{\min}^2) - \frac{1}{3D} (V_{\text{up}}^3 - V_{\min}^3) \right). \end{split}$$

Then, we calculate E[*Qi*] from (2) and (13) and we have the following.

$$\begin{split} \mathbb{E}[Q_i] &= p_I \int_0^{\tilde{V}} \int_{g(v)}^{D} \left(v - \frac{v}{D}\tau\right) f_V(v)\, d\tau\, dv \\ &= \frac{p_I}{2} \int_{V_{\min}}^{V_{\text{up}}} \frac{v}{D} (D - v)^2 f_V(v)\, dv \\ &= \frac{p_I}{2Du} \int_{V_{\min}}^{V_{\text{up}}} (D^2 v - 2Dv^2 + v^3)\, dv \\ &= \frac{p_I}{2Du} \left( \frac{D^2}{2} (V_{\text{up}}^2 - V_{\min}^2) - \frac{2D}{3} (V_{\text{up}}^3 - V_{\min}^3) + \frac{1}{4} (V_{\text{up}}^4 - V_{\min}^4) \right). \end{split}$$

Finally, we have $\mathbb{E}[\Upsilon_{\text{packet}}] = \frac{1}{p_I}\mathbb{E}[V_{r,i}]$ and $\mathbb{E}[\Upsilon_{\text{sum}}] = \lambda\mathbb{E}[Q_i]$.

*Appendix A.2. M/GI/1/2*

From (15), we have the following.

$$M_S(\lambda) = \frac{1}{u\lambda} (e^{-\lambda V_{\min}} - e^{-\lambda V_{\max}}).$$

Then, from (14), we have the following.

$$\begin{aligned} p\_I &= \frac{M\_S(\lambda)}{M\_S(\lambda) + \lambda \mathbb{E}[S]}, \\ p\_B &= \frac{\lambda \mathbb{E}[S]}{M\_S(\lambda) + \lambda \mathbb{E}[S]}. \end{aligned}$$

From Lemma 1, we have the following.

$$p_{B_1} = \frac{1 - M_S(\lambda)}{\lambda \mathbb{E}[S]} p_B,$$

$$\mathbb{P}[W' > w] = \frac{\lambda (V_{\max} - w) + e^{\lambda (w - V_{\max})} - 1}{u\lambda \left(1 - M_S(\lambda)\right)}.$$

Then, from (19), we have the following.

$$f_{W'}(w) = \frac{\lambda e^{\lambda(w - V_{\max})} - \lambda}{u\lambda (M_S(\lambda) - 1)}.$$

For the idle case, from (2), (16) and (18), we have the following.

$$\begin{split} \mathbb{E}[V_{r,i}|I] &= \frac{1}{u} \left( \frac{1}{2} (V_{\text{up}}^2 - V_{\min}^2) - \frac{1}{3D} (V_{\text{up}}^3 - V_{\min}^3) \right), \\ \mathbb{E}[Q_i|I] &= \frac{1}{2Du} \left( \frac{D^2}{2} (V_{\text{up}}^2 - V_{\min}^2) - \frac{2D}{3} (V_{\text{up}}^3 - V_{\min}^3) + \frac{1}{4} (V_{\text{up}}^4 - V_{\min}^4) \right). \end{split}$$

For the busy case, since the waiting time $W'$ has the same domain of definition as the initial value $V_{0,i}$, there are three conditions: $D < V_{\max}$, $V_{\max} < D < 2V_{\max}$ and $D > 2V_{\max}$. We show the expression for the condition $D < V_{\max}$, which corresponds to our parameter setting in the numerical results. Then, from (2), (20) and (22), we have the following.

$$\begin{split} \mathbb{E}[V_{r,i}|B_1] &= \frac{1}{u} \int_{V_{\min}}^{D} \int_{V_{\min}}^{D-v} \left(v - \frac{v}{D}(v+w)\right) f_{W'}(w)\, dw\, dv \\ &= \frac{1}{u^2 \lambda (M_S(\lambda) - 1)} \Bigg( \frac{D^2 V_{\min}}{6} - \frac{D^3}{24} - \frac{5V_{\min}^3}{6} + \frac{17V_{\min}^4}{24D} \\ &\quad + \Big(-\frac{D^2}{6} + \frac{V_{\min}^2}{2} - \frac{5V_{\min}^3}{6D} + \frac{DV_{\min}}{2} - \frac{D}{2\lambda} + \frac{V_{\min}^2}{2D\lambda} \Big) e^{\lambda (V_{\min} - V_{\max})} \\ &\quad - \frac{e^{-D\lambda} (D\lambda + 1) - e^{-\lambda V_{\min}} (\lambda V_{\min} + 1)}{D\lambda^3}\, e^{D\lambda} e^{-\lambda V_{\max}} \Bigg), \end{split}$$

$$\begin{split} \mathbb{E}[Q_i|B_1] &= \frac{1}{u} \int_{V_{\min}}^{D} \int_{V_{\min}}^{D-v} \frac{v}{D} (D - (v + w))^2 f_{W'}(w)\, dw\, dv \\ &= \frac{1}{u^2 \lambda (M_S(\lambda) - 1)} \Bigg( \frac{D^3 V_{\min}}{12} - \frac{2D V_{\min}^3}{3} - \frac{D^4}{60} - \frac{2e^{-\lambda V_{\max}}}{\lambda^3} + \frac{17 V_{\min}^4}{12} - \frac{49 V_{\min}^5}{60 D} - \frac{2e^{-\lambda V_{\max}}}{D \lambda^4} \\ &\quad + \Big(-\frac{D^3}{12} - \frac{5V_{\min}^3}{3} - \frac{D^2}{3\lambda} + \frac{17 V_{\min}^4}{12 D} + \frac{V_{\min}^2}{\lambda} - \frac{D}{\lambda^2} + \frac{D^2 V_{\min}}{3} + \frac{D V_{\min}}{\lambda} - \frac{5V_{\min}^3}{3 D \lambda} + \frac{V_{\min}^2}{D \lambda^2} \Big) e^{\lambda (V_{\min} - V_{\max})} \\ &\quad + \Big( \frac{2}{D \lambda^4} + \frac{2V_{\min}}{D \lambda^3} \Big) e^{\lambda (D - V_{\min} - V_{\max})} \Bigg). \end{split}$$

Finally, we have the following: $\mathbb{E}[V_{r,i}] = \mathbb{E}[V_{r,i}|I]p_I + \mathbb{E}[V_{r,i}|B_1]p_{B_1}$ and $\mathbb{E}[Q_i] = \mathbb{E}[Q_i|I]p_I + \mathbb{E}[Q_i|B_1]p_{B_1}$. Then, $\mathbb{E}[\Upsilon_{\text{packet}}] = \frac{1}{p_I + p_{B_1}}\mathbb{E}[V_{r,i}]$ and $\mathbb{E}[\Upsilon_{\text{sum}}] = \lambda\mathbb{E}[Q_i]$.

*Appendix A.3. M/GI/1/2\**

For the M/GI/1/2\* system, we have the same $p_I$, $p_B$, $\mathbb{E}[V_{r,i}|I]$ and $\mathbb{E}[Q_i|I]$ as in the M/GI/1/2 system. Next, we calculate $\mathbb{E}[V_{r,i}|B]$ and $\mathbb{E}[Q_i|B]$. We have the following.

$$f_W(w) = \frac{\mathbb{P}[S > w]}{\mathbb{E}[S]} = \frac{V_{\max} - w}{u \mathbb{E}[S]}.$$

Then, we consider the condition $D < V_{\max}$ and, from (2), (23) and (25), we have the following.

$$\begin{split} \mathbb{E}[V_{r,i}|B] &= \frac{1}{u} \int_{V_{\min}}^{D} \int_{V_{\min}}^{D-v} \left(v - \frac{v}{D}(v+w)\right) e^{-\lambda w} f_W(w)\, dw\, dv \\ &= \frac{1}{u^2 \mathbb{E}[S]} \Bigg( \frac{V_{\max}}{\lambda^3} - \frac{3}{\lambda^4} + \frac{4}{D\lambda^5} - \frac{V_{\max}}{D\lambda^4} \\ &\quad + \Big( \frac{D}{\lambda^3} - \frac{D^2}{6\lambda^2} + \frac{V_{\min}^3}{2\lambda} + \frac{V_{\min}^2}{2\lambda^2} - \frac{V_{\min}^2 V_{\max}}{2\lambda} - \frac{5V_{\min}^4}{6D\lambda} - \frac{4V_{\min}^3}{3D\lambda^2} - \frac{V_{\min}^2}{D\lambda^3} \\ &\qquad + \frac{D(2V_{\min} - V_{\max})}{2\lambda^2} + \frac{DV_{\min}^2}{2\lambda} + \frac{D^2(V_{\max} - V_{\min})}{6\lambda} + \frac{5V_{\min}^3 V_{\max}}{6D\lambda} + \frac{V_{\min}^2 V_{\max}}{2D\lambda^2} - \frac{DV_{\min}V_{\max}}{2\lambda} \Big) e^{-\lambda V_{\min}} \\ &\quad + \Big( \frac{4V_{\min}}{D\lambda^4} + \frac{V_{\max}}{D\lambda^4} - \frac{V_{\min}^2}{D\lambda^3} - \frac{V_{\min}V_{\max}}{D\lambda^3} - \frac{1}{\lambda^4} + \frac{V_{\min}}{\lambda^3} - \frac{4}{D\lambda^5} \Big) e^{\lambda(V_{\min} - D)} \Bigg), \end{split}$$

$$\begin{split} \mathbb{E}[Q_i|B] &= \frac{1}{u} \int_{V_{\min}}^{D} \int_{V_{\min}}^{D-v} \frac{v}{D} (D-(v+w))^2 e^{-\lambda w} f_W(w)\, dw\, dv \\ &= \frac{1}{u^2 \mathbb{E}[S]} \Big( \frac{8}{\lambda^5} - \frac{2V_{\max}}{\lambda^4} - \frac{10}{D\lambda^6} + \frac{2V_{\max}}{D\lambda^5} \\ &\quad - \frac{3D}{\lambda^4} + \frac{2D^2}{3\lambda^3} - \frac{D^3}{12\lambda^2} - \frac{5V_{\min}^4}{3\lambda} - \frac{8V_{\min}^2}{3\lambda^2} - \frac{2V_{\min}^2}{\lambda^3} \\ &\quad + \frac{V_{\min}^3 V_{\max}}{3\lambda} + \frac{V_{\min}^3 V_{\max}}{\lambda^2} + \frac{2V_{\min}^2}{3\lambda} + \frac{17V_{\min}^5}{12D\lambda} \\ &\quad + \frac{3V_{\min}^4}{12D\lambda^2} + \frac{13V_{\min}^3}{3\lambda^3} + \frac{3V_{\min}^2}{D\lambda^4} - \frac{3DV_{\min}}{\lambda^3} + \frac{DV_{\max}^4}{\lambda^3} \\ &\quad - \frac{DV_{\min}^2}{\lambda^2} + \frac{2D^2V_{\min}}{3\lambda^2} - \frac{D^2V_{\min}}{12\lambda} - \frac{D^2V_{\max}}{3\lambda^2} + \cdots \Big). \end{split}$$

Finally, we have the following: $\mathbb{E}[V_{r,i}] = \mathbb{E}[V_{r,i}|I]p_I + \mathbb{E}[V_{r,i}|B]p_B$ and $\mathbb{E}[Q_i] = \mathbb{E}[Q_i|I]p_I + \mathbb{E}[Q_i|B]p_B$. Then, $\mathbb{E}[\Upsilon_{\text{packet}}] = \frac{1}{p_I + p_{B_1}}\mathbb{E}[V_{r,i}]$ and $\mathbb{E}[\Upsilon_{\text{sum}}] = \lambda\mathbb{E}[Q_i]$.

*Appendix A.4. M/GI/1/1\**

Since we have $M_S(\lambda) = \frac{1}{u\lambda}(e^{-\lambda V_{\min}} - e^{-\lambda V_{\max}})$, from (26) we have $p_e = M_S(\lambda)$, and from (27), we have the following.

$$f_{S'}(s) = \frac{e^{-\lambda s}}{u M_S(\lambda)}.$$

Note that, due to $g(V) = V$, the conditional service time $S'$ has the same domain of definition as the initial value $V_{0,i}$. Then, we calculate $\mathbb{E}[V_{r,i}]$ from (29) and (30), and we have the following.

$$\begin{split} \mathbb{E}[V_{r,i}] &= p_e \int_{V_{\min}}^{V_{\text{up}}} \left(s - \frac{s}{D}s\right) f_{S'}(s)\, ds \\ &= \frac{p_e}{u M_S(\lambda)} \Bigg( \frac{e^{-\lambda V_{\min}} (\lambda V_{\min} + 1)}{\lambda^2} - \frac{e^{-\lambda V_{\text{up}}} (\lambda V_{\text{up}} + 1)}{\lambda^2} \\ &\quad - \frac{e^{-\lambda V_{\min}} (\lambda^2 V_{\min}^2 + 2\lambda V_{\min} + 2)}{D\lambda^3} + \frac{e^{-\lambda V_{\text{up}}} (\lambda^2 V_{\text{up}}^2 + 2\lambda V_{\text{up}} + 2)}{D\lambda^3} \Bigg). \end{split}$$

From (29) and (32), we have the following.

$$\begin{split} \mathbb{E}[Q_i] &= \frac{p_e}{2} \int_{V_{\min}}^{V_{\text{up}}} \frac{s}{D} (D-s)^2 f_{S'}(s)\, ds \\ &= \frac{p_e}{2uM_S(\lambda)D\lambda^4} \Big( e^{-\lambda V_{\min}} (D^2\lambda^3 V_{\min} + D^2\lambda^2 - 2D\lambda^3 V_{\min}^2 - 4D\lambda^2 V_{\min} - 4D\lambda + \lambda^3 V_{\min}^3 + 3\lambda^2 V_{\min}^2 + 6\lambda V_{\min} + 6) \\ &\quad - e^{-\lambda V_{\text{up}}} (D^2\lambda^3 V_{\text{up}} + D^2\lambda^2 - 2D\lambda^3 V_{\text{up}}^2 - 4D\lambda^2 V_{\text{up}} - 4D\lambda + \lambda^3 V_{\text{up}}^3 + 3\lambda^2 V_{\text{up}}^2 + 6\lambda V_{\text{up}} + 6) \Big). \end{split}$$

Finally, we have $\mathbb{E}[\Upsilon_{\text{packet}}] = \frac{1}{p_e}\mathbb{E}[V_{r,i}]$ and $\mathbb{E}[\Upsilon_{\text{sum}}] = \lambda\mathbb{E}[Q_i]$.

#### **Appendix B.** E**[Υsum] and** E**[Υpacket] for Constant Value with Exponentially Distributed Initial Value**

For an exponentially distributed initial value, we have $f_V(v) = \mu e^{-\mu v}$, $\mathbb{E}[S] = \mathbb{E}[V] = \frac{1}{\mu}$ and $\tilde{V} = D$.

#### *Appendix B.1. M/M/1/1*

From (10), we have the following.

$$p\_I = \frac{\mu}{\lambda + \mu}.$$

Next, we calculate E[*Vr*,*i*] from (1) and (11) and we have the following.

$$\begin{aligned} \mathbb{E}[V_{r,i}] &= p_I \int_0^D v f_V(v)\, dv \\ &= p_I \left( -De^{-\mu D} - \frac{1}{\mu} (e^{-\mu D} - 1) \right). \end{aligned}$$

Then, we calculate E[*Qi*] from (1) and (13) and we have the following.

$$\begin{split} \mathbb{E}[Q_i] &= p_I \int_0^D \int_{g(v)}^{D} v f_V(v)\, d\tau\, dv \\ &= p_I \int_0^D v(D-v) f_V(v)\, dv \\ &= p_I \int_0^D (Dv - v^2) f_V(v)\, dv \\ &= p_I \left( -D^2 e^{-\mu D} - \frac{D}{\mu} (e^{-\mu D} - 1) + D^2 e^{-\mu D} + \frac{2}{\mu} D e^{-\mu D} + \frac{2}{\mu^2} (e^{-\mu D} - 1) \right). \end{split}$$

Finally, we have $\mathbb{E}[\Upsilon_{\text{packet}}] = \frac{1}{p_I}\mathbb{E}[V_{r,i}]$ and $\mathbb{E}[\Upsilon_{\text{sum}}] = \lambda\mathbb{E}[Q_i]$.
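The Appendix B.1 integral is simple enough to check numerically. The sketch below is our own verification, not from the paper; it uses the parameters $\mu_v = 1.5$ and $D = 3$ from Figure 9 and compares the closed form of $\mathbb{E}[V_{r,i}]/p_I$ against a plain midpoint-rule quadrature (stdlib only).

```python
import math

# Verify E[V_{r,i}]/p_I = int_0^D v*mu*exp(-mu*v) dv against its closed form
# -D*exp(-mu*D) - (exp(-mu*D) - 1)/mu using a midpoint rule.
mu, D, n = 1.5, 3.0, 100_000

closed = -D * math.exp(-mu * D) - (math.exp(-mu * D) - 1.0) / mu

h = D / n
numeric = sum((i + 0.5) * h * mu * math.exp(-mu * (i + 0.5) * h)
              for i in range(n)) * h
print(numeric, closed)
```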

*Appendix B.2. M/M/1/2*

From (15), we have the following.

$$M\_S(\lambda) = \frac{\mu}{\lambda + \mu}.$$

Then, from (14), we have the following.

$$p\_I = \frac{\mu^2}{\mu^2 + \mu\lambda + \lambda^2},$$

$$p\_B = \frac{\mu\lambda + \lambda^2}{\mu^2 + \mu\lambda + \lambda^2}.$$

From Lemma 1, we have the following.

$$p\_{B\_1} = \frac{\mu\lambda}{\mu^2 + \mu\lambda + \lambda^2},$$

$$\mathbb{P}[W' > w] = e^{-\mu w}.$$

Then, from (19), we have the following.

$$f\_{W'}(w) = \mu e^{-\mu w}.$$

For the idle case, from (1), (16) and (18), we have the following.

$$\mathbb{E}[V_{r,i}|I] = -De^{-\mu D} - \frac{1}{\mu}(e^{-\mu D} - 1),$$

$$\begin{aligned} \mathbb{E}[Q_i|I] &= -D^2 e^{-\mu D} - \frac{D}{\mu}(e^{-\mu D} - 1) + D^2 e^{-\mu D} \\ &\quad + \frac{2}{\mu}De^{-\mu D} + \frac{2}{\mu^2}(e^{-\mu D} - 1). \end{aligned}$$

For the busy case, from (1), (20) and (22), we have the following.

$$\begin{aligned} \mathbb{E}[V\_{r,i}|B\_1] &= \int\_0^D \int\_0^{D-v} v f\_V(v) f\_{W'}(w) dw dv \\ &= \frac{1}{\mu} - \frac{e^{-D\mu}(D\mu + 1)}{\mu} - \frac{D^2 \mu e^{-D\mu}}{2}. \end{aligned}$$

$$\begin{aligned} \mathbb{E}[Q\_i|B\_1] &= \int\_0^D \int\_0^{D-v} v(D - (v+w)) f\_V(v) f\_{W'}(w) dw dv \\ &= \frac{e^{-D\mu}}{2\mu^2} \left( 4D\mu - 6e^{D\mu} + D^2\mu^2 + 2D\mu e^{D\mu} + 6 \right). \end{aligned}$$

Finally, we have the following: $\mathbb{E}[V_{r,i}] = \mathbb{E}[V_{r,i}|I]p_I + \mathbb{E}[V_{r,i}|B_1]p_{B_1}$ and $\mathbb{E}[Q_i] = \mathbb{E}[Q_i|I]p_I + \mathbb{E}[Q_i|B_1]p_{B_1}$. Then, $\mathbb{E}[\Upsilon_{\text{packet}}] = \frac{1}{p_I + p_{B_1}}\mathbb{E}[V_{r,i}]$ and $\mathbb{E}[\Upsilon_{\text{sum}}] = \lambda\mathbb{E}[Q_i]$.

#### *Appendix B.3. M/GI/1/2\**

For the M/GI/1/2\* system, we have the same $p\_I$, $p\_B$, $\mathbb{E}[V\_{r,i}|I]$ and $\mathbb{E}[Q\_i|I]$ as in the M/GI/1/2 system. Next, we calculate $\mathbb{E}[V\_{r,i}|B]$ and $\mathbb{E}[Q\_i|B]$. We have the following.

$$f\_W(w) = \frac{\mathbb{P}[S > w]}{\mathbb{E}[S]} = \mu e^{-\mu w}.$$

Then, from (1), (23) and (25), we have the following.

$$\begin{split} \mathbb{E}[V\_{r,i}|B] &= \int\_{0}^{D} \int\_{0}^{D-v} v e^{-\lambda w} f\_{V}(v) f\_{W}(w) dw dv \\ &= \frac{1 - e^{-D\mu}(D\mu + 1)}{\lambda + \mu} - \frac{\mu^{2} e^{-D\mu} \left(e^{-D\lambda} + D\lambda - 1\right)}{\lambda^{2} (\lambda + \mu)}, \\ \mathbb{E}[Q\_{i}|B] &= \int\_{0}^{D} \int\_{0}^{D-v} v (D - (v + w)) e^{-\lambda w} f\_{V}(v) f\_{W}(w) dw dv \\ &= \frac{e^{-D(\lambda + \mu)}}{\lambda^{2} \mu (\lambda + \mu)^{2}} \big( 2\lambda^{3} e^{D\lambda} - \mu^{3} e^{D\lambda} + \mu^{3} + 3\lambda^{2} \mu e^{D\lambda} \\ &\quad - 2\lambda^{3} e^{D(\lambda + \mu)} + D\lambda \mu^{3} e^{D\lambda} + D\lambda^{3} \mu e^{D\lambda} \\ &\quad - 3\lambda^{2} \mu e^{D(\lambda + \mu)} + 2D\lambda^{2} \mu^{2} e^{D\lambda} + D\lambda^{2} \mu^{2} e^{D(\lambda + \mu)} \\ &\quad + D\lambda^{3} \mu e^{D(\lambda + \mu)} \big). \end{split}$$

Finally, we have $\mathbb{E}[V\_{r,i}] = \mathbb{E}[V\_{r,i}|I]p\_I + \mathbb{E}[V\_{r,i}|B]p\_B$ and $\mathbb{E}[Q\_i] = \mathbb{E}[Q\_i|I]p\_I + \mathbb{E}[Q\_i|B]p\_B$. Then, $\mathbb{E}[\Upsilon\_{\text{packet}}] = \frac{1}{p\_I + p\_B} \mathbb{E}[V\_{r,i}]$ and $\mathbb{E}[\Upsilon\_{\text{sum}}] = \lambda \mathbb{E}[Q\_i]$.

#### *Appendix B.4. M/GI/1/1\**

Since we have $M\_S(\lambda) = \frac{\mu}{\lambda + \mu}$, from (26) we have $p\_e = \frac{\mu}{\lambda + \mu}$ and, from (27), we have the following.

$$f\_{S'}(s) = (\lambda + \mu)e^{-(\lambda + \mu)s}.$$

Note that since *g*(*V*) = *V*, we calculate E[*Vr*,*i*] from (29) and (30), and we have the following.

$$\begin{aligned} \mathbb{E}[V\_{r,i}] &= p\_e \int\_0^D s f\_{S'}(s) ds \\ &= p\_e \left( \frac{1}{\lambda + \mu} - D e^{-D(\lambda + \mu)} - \frac{e^{-D(\lambda + \mu)}}{\lambda + \mu} \right). \end{aligned}$$

From (29) and (32), we have the following.

$$\begin{aligned} \mathbb{E}[Q\_i] &= p\_e \int\_0^D s(D-s) f\_{S'}(s) ds \\ &= \frac{p\_e}{(\lambda + \mu)^2} \left( e^{-D(\lambda + \mu)} (D\lambda + D\mu + 2) + D\lambda + D\mu - 2 \right). \end{aligned}$$

Finally, we have $\mathbb{E}[\Upsilon\_{\text{packet}}] = \frac{1}{p\_e} \mathbb{E}[V\_{r,i}]$ and $\mathbb{E}[\Upsilon\_{\text{sum}}] = \lambda \mathbb{E}[Q\_i]$.
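As a quick numerical sanity check of the two closed forms above (written here without the $p\_e$ prefactor), one can compare them against direct quadrature of the defining integrals. This is a minimal sketch with arbitrary parameter values; the function names are illustrative, not from the paper.

```python
import math

# Closed forms of the integrals in Appendix B.4, without the p_e prefactor.
# Here b stands for lambda + mu, the rate of f_{S'}(s) = b * exp(-b * s).
def ev_closed(b, D):
    """int_0^D s f_{S'}(s) ds."""
    return 1 / b - D * math.exp(-b * D) - math.exp(-b * D) / b

def eq_closed(b, D):
    """int_0^D s (D - s) f_{S'}(s) ds."""
    return (math.exp(-b * D) * (b * D + 2) + b * D - 2) / b**2

def midpoint(f, lo, hi, n=200_000):
    """Midpoint-rule quadrature of f on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + h * (k + 0.5)) for k in range(n))

b, D = 1.7, 2.5  # arbitrary test values; b = lambda + mu
f_S = lambda s: b * math.exp(-b * s)
```

With these definitions, both closed forms agree with the numerical quadrature to well within the discretization error of the midpoint rule.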

#### **References**


## *Article* **Age of Information of Parallel Server Systems with Energy Harvesting**

**Josu Doncel**

Mathematics Department, University of the Basque Country, UPV/EHU, 48940 Leioa, Spain; josu.doncel@ehu.eus

**Abstract:** Motivated by current communication networks in which users can choose among different transmission channels to operate, and also by the recent growth of renewable energy sources, we study the average Age of Information of a status update system formed by two parallel homogeneous servers and an energy source that feeds the system following a random process. An update, after getting service, is delivered to the monitor if there is energy in a battery. However, if the battery is empty, the status update is lost. We allow preemption of updates in service and we assume Poisson generation times of status updates and exponential service times. We show that the average Age of Information can be characterized by solving a system of eight linear equations. Then, we show that, when the arrival rate to both servers is large, the average Age of Information is one divided by the sum of the service rates of the servers. We also perform a numerical analysis to compare the performance of our model with that of a single server with energy harvesting and to study in detail the aforementioned convergence result.

**Keywords:** parallel servers; energy harvesting; Age of Information

**Citation:** Doncel, J. Age of Information of Parallel Server Systems with Energy Harvesting. *Entropy* **2021**, *23*, 1549. https:// doi.org/10.3390/e23111549

Academic Editors: Anthony Ephremides and Yin Sun

Received: 2 November 2021 Accepted: 19 November 2021 Published: 21 November 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

#### *1.1. Motivation*

The Age of Information is a recently proposed performance metric that measures the freshness of the information a monitor has about the status of a remote process of interest. There is a wide range of applications in which information about a source must be as recent as possible. An example is given by autonomous driving systems, since the location of the vehicles must be known as soon as possible; in other words, obsolete information about the traffic might lead to bad consequences (traffic accidents, for instance) for the users.

Status update systems are formed by sources that generate status updates, a transmission channel and a monitor that receives the updates. The transmission channel takes care of sending the status updates from the source to the destination. It is clear that the devices of the transmission channel require energy to work. Therefore, it is important to consider energy consumption in the modeling of the transmission channel. Furthermore, an increasing variety of renewable energy sources has recently been feeding the energy network. Some examples are solar or wind energy sources, which are clearly very volatile. As a consequence, the randomness of the generation of energy also needs to be taken into account in the modeling of the transmission channel.

Current communication networks are very complex and often allow users to operate using different transmission channels. This is the case, for instance, when a user is part of an overlay network (i.e., when it belongs to a set of nodes that are located in different spots over the Internet and collaborate with each other to forward data between any pair of nodes with minimum delay). In fact, in this instance, the user can choose the transmission channel provided by the IP protocol or the one through the overlay network. Therefore, in this work, we study the average Age of Information in a status update system with energy harvesting. That is, we consider that the transmission channel is formed by parallel servers that do not interchange information and a battery that can store energy that can be used to send status updates after getting service in the servers.

#### *1.2. Related Work*

The Age of Information has been introduced in [1,2] as a metric to measure the freshness of the information about the state of a remote system. Since its introduction, many researchers from different areas have been interested in analyzing this metric. In the first works following the seminal papers, the goal was to characterize the average Age of Information of status update systems where the transmission channel is modeled as a single queue. For instance, the authors in [3] characterize the average Age of Information of a single server (i.e., a queue without buffer) and a single source. Regarding optimality, the authors in [4] show that the preemptive Last Generated First Served policy minimizes the Age of Information. Unfortunately, the characterization of the average Age of Information of many models is known to be an extremely difficult task. Therefore, some authors have been interested in other similar performance metrics such as the Peak Age of Information [5] or the Age of Incorrect Information [6]. We refer to the following surveys on this topic for full details of these metrics and their properties [7–9].

Let us now discuss the work of some authors that have been interested in analyzing the Age of Information of a system with energy harvesting. In [10,11], a system with Poisson arrivals of energy and no packet losses is considered. The goal is to find the optimal status update policy such that the battery is not empty upon the arrival of a status update. The authors in [12–14] generalize the model of [10,11] by allowing status update losses and also focus on optimal policies for generating updates, with or without knowledge (or feedback) of whether the status updates are delivered successfully. Our model, which has been inspired by the Energy Packet Networks [15,16], is different from these models for several reasons. First, we do not impose the presence of energy to receive a status update. Another difference is that the generation of status updates follows a Poisson process in our model, which is not the case in these works. Finally, our goal is different since we are interested in characterizing the average Age of Information and studying its properties and, hence, we do not aim to find the optimal policy.

#### *1.3. Contribution*

We consider a system with two parallel homogeneous servers and one battery that stores energy packets. Energy packets model a certain amount of energy and are necessary to send the status updates (or data packets) to the monitor after ending service. This means that a data packet is sent to the battery when it ends service and, if the battery is empty, the data packet is lost, whereas if the battery is not empty, the data packet is delivered to the monitor and one energy packet disappears. We consider that arrivals of data packets and energy packets follow a Poisson process and the queues that handle data packets and energy packets do not have buffer. We allow preemption of data packets, i.e., when a data packet arrives to a server that is busy, the incoming packet replaces the packet in service.

The first contribution of this work is to characterize the average Age of Information of the above status update system using the Stochastic Hybrid System technique [17]. More specifically, we show that the average Age of Information can be computed by solving a system of 8 linear equations. We then consider the regime where the arrival rate of data packets to both servers tends to infinity and we show that the average Age of Information tends to one divided by two times the service rate of data packets.

The model we study here generalizes

• the work of Section IIIA of [18], where the Age of Information of two parallel servers is studied. In our work, we add energy harvesting to their model. In fact, when the arrival rate of energy packets in our model is very large, it coincides with the model of [18].

• the work of [19], where a system with a single server and energy harvesting is analyzed. In our work, we consider the same energy harvesting model, but with two parallel servers.

We go beyond the above analytical results with a numerical study that we describe next. First, we aim to compare the performance of a single server with that of two parallel servers with energy harvesting. For this purpose, we consider the following systems: (i) a single server with arrival rate *λ*/2 and service rate *μ*, (ii) a single server with arrival rate *λ* and service rate 2*μ* and (iii) two parallel servers with service rate *μ*, each of them handling an arrival rate of *λ*/2. Let us note that the ratio of the arrival rate over the service rate coincides in all the servers of the systems under consideration. This comparison has been previously done in Section IIIA of [18], but without energy harvesting. Our first finding is that, when the arrival rate of energy packets is very large, we obtain the same plot as Figure 4 of [18] and, therefore, their conclusions follow in our model as well (i.e., the system with double service rate and a single server minimizes the average Age of Information). We then investigate whether the conclusions of [18] also hold when the arrival rate of energy packets is not large. We observe that the average Age of Information is smaller for the system with two parallel servers, and this difference increases when we decrease the arrival rate of energy packets. Finally, we study how the average Age of Information converges, as the arrival rate to the servers increases, to the value obtained in our analytical part. We conclude that the average Age of Information is not monotone with respect to the arrival rate of energy packets when the arrival rate to both servers is small. However, the average Age of Information does not depend on the arrival rate of energy packets when the arrival rate of packets to both servers is very large.

Potential applications of this model include systems in which two different transmission channels can be chosen to send updates. This is the case, for instance, when the source that generates status updates is part of an overlay network (i.e., when it belongs to a set of nodes that are located in different spots over the Internet and collaborate with each other to forward data between any pair of nodes with minimum delay) and it can choose to send the status updates through the path provided by the IP protocol or through the overlay routing.

#### *1.4. Organization*

The rest of the paper is organized as follows. First, in Section 2, we describe the model we study in this article. The average Age of Information analysis of this model is presented in Section 3. In Section 4, we focus on our numerical work and, finally, in Section 5, we draw the main conclusions of this work.

#### **2. Model Description**

#### *2.1. Age of Information*

We study the transmission of status updates (or data packets) to a monitor. We consider that data packet *i* is generated at time *ti* and that it is delivered to the monitor at time *t'i*. We denote by *N*(*t*) the index of the last data packet successfully delivered to the monitor by time *t*, i.e.,

$$N(t) = \max\{i|t'\_i \le t\}.$$

Taking into account that the generation time of the last received data packet before time *t* is *tN*(*t*), we define the Age of Information at time *t* as follows:

$$\Delta(t) = t - t\_{N(t)},$$

that is, the Age of Information at time *t* is the time elapsed since the generation of the last delivered packet to the monitor. We show in Figure 1 an example of Δ(*t*).

**Figure 1.** An example of Δ(*t*).

Assuming that the updating system is stable, the average Age of Information can be computed as the normalized area below a "saw-tooth" shaped curve with teeth at the times at which the data packets are delivered (see Figure 1). Hence, if we denote by Δ the average Age of Information, we have that

$$\Delta = \lim\_{\tau \to \infty} \frac{1}{\tau} \int\_0^\tau \Delta(t) dt.$$
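In practice, this limit can be estimated from a finite trace of generation and delivery times: between consecutive deliveries, Δ(*t*) is a linear ramp, so each "tooth" of the curve contributes the difference of two squared ages. A minimal sketch (the function name and trace format are illustrative, not from the paper):

```python
def average_aoi_from_trace(gen, dlv, horizon):
    """Time-average of Delta(t) from the first delivery up to `horizon`.

    gen[i] and dlv[i] are the generation and delivery times of the i-th
    delivered packet, with dlv increasing and gen[i] <= dlv[i] <= horizon.
    """
    area = 0.0
    for i in range(len(dlv)):
        end = dlv[i + 1] if i + 1 < len(dlv) else horizon
        # On [dlv[i], end), Delta(t) = t - gen[i]: integrate the linear ramp.
        area += 0.5 * ((end - gen[i]) ** 2 - (dlv[i] - gen[i]) ** 2)
    return area / (horizon - dlv[0])
```

For instance, with `gen = [0, 1]`, `dlv = [1, 2]` and `horizon = 3`, the age ramps from 1 to 2 on each tooth and the time-average is 1.5.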

In this article, we are interested in calculating the average Age of Information in an energy harvesting model. In the following section, we describe the model we analyze.

#### *2.2. Energy Harvesting Model*

In our model, we represent energy by discrete units called energy packets, each modeling a certain quantity of energy measured in Joules, whereas the status updates of a process of interest are represented by packets that we call data packets. We consider an energy harvesting model formed by two parallel queues that store data packets (data queues) and a single queue that stores the energy packets. We show in Figure 2 the model under consideration in this work.

**Figure 2.** The energy harvesting model with two parallel data queues and a single energy queue. Energy packets are depicted with gray and data packets with white.

Energy packets arrive to the system according to a Poisson process of rate *α* and data packets (or workload packets) with rate *λ*. Upon arrival, a packet is dispatched to data queue 1 with probability *p* > 0 and to data queue 2 with probability 1 − *p* > 0. Therefore, the arrival rate to data queue 1 is *λ*<sup>1</sup> = *λp* and to data queue 2 is *λ*<sup>2</sup> = *λ*(1 − *p*).

**Remark 1.** *The probability p can be seen as the willingness of a source to use an alternative path (for instance, the path of an overlay network) rather than the usual transmission channel.*

We consider that the service time of jobs in data queue *i* is exponentially distributed with rate *μ*, for *i* = 1, 2.

In this model, we consider that data packets, i.e., the packets of the data queues, are transferred to a single energy queue upon service completion. This means that, when a data packet gets served (in data queue 1 or data queue 2), it is sent to the energy queue. If the energy queue is empty upon the arrival of a data packet, the data packet is lost. However, if there are energy packets when a data packet arrives to the energy queue, the data packet is transferred successfully to the monitor and one energy packet disappears.

**Remark 2.** *Our model considers a single energy queue. This models that the destination requires energy to receive status updates. This occurs, for instance, when there is a wireless antenna in charge of receiving the status updates at the destination (indeed, in the absence of energy, the antenna cannot deliver packets to the monitor).*

Here, we assume that the energy queue and the data queues do not have buffer. Therefore, the number of packets in each queue is, at most, one. Besides, energy packets that arrive when the energy queue is full are dropped, whereas when a data packet arrives to a full data queue, it replaces the job in execution.

#### **3. Average Age of Information Analysis**

In this section, we aim to analyze the average Age of Information of a system formed by two parallel queues with energy harvesting. We will use the Stochastic Hybrid System method to characterize the average Age of Information of the system under consideration. The Stochastic Hybrid System is formed by two parts: the state of a continuous-time Markov chain and a vector containing the generation times of all the packets in the system as well as the current Age of Information. The Markov chain we consider is presented in Figure A1.

Let *s*0, *s*1, ... , *s*7 be the solution of the following system of equations:

$$0 = -s\_0(\lambda + \alpha) + s\_2\mu + s\_3\mu + s\_4\mu + s\_5\mu \tag{1a}$$

$$0 = s\_0 \alpha - s\_1 \lambda \tag{1b}$$

$$0 = s\_0 \lambda (1 - p) - s\_2 (\lambda p + \mu + \alpha) + s\_6 \mu + s\_7 \mu \tag{1c}$$

$$0 = s\_1 \lambda (1 - p) + s\_2 \alpha - s\_3 (\lambda p + \mu) \tag{1d}$$

$$0 = s\_0 \lambda p - s\_4(\lambda(1 - p) + \alpha + \mu) + s\_6 \mu + s\_7 \mu \tag{1e}$$

$$0 = s\_1 \lambda p + s\_4 \alpha - s\_5(\lambda(1-p) + \mu) \tag{1f}$$

$$0 = s\_2 \lambda p + s\_4 \lambda (1 - p) - s\_6 (\alpha + 2\mu) \tag{1g}$$

$$0 = s\_3 \lambda p + s\_5 \lambda (1 - p) + s\_6 \alpha - 2\mu s\_7 \tag{1h}$$

that satisfies $\sum\_{i=0}^{7} s\_i = 1$. As we will see in Appendix B, the solution of the above system of equations provides the steady-state distribution of the Markov chain of Figure A1.
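Numerically, the steady-state distribution can be obtained by replacing one of the (linearly dependent) balance equations with the normalization constraint. A minimal sketch of Equation (1a–h) in NumPy; the parameter values are arbitrary:

```python
import numpy as np

def steady_state(lam, alpha, mu, p):
    """Solve (1a)-(1g) together with sum_i s_i = 1 for s_0, ..., s_7."""
    lp, lq = lam * p, lam * (1 - p)
    A = np.zeros((8, 8))
    A[0, 0] = -(lam + alpha); A[0, 2:6] = mu                    # (1a)
    A[1, 0] = alpha; A[1, 1] = -lam                             # (1b)
    A[2, 0] = lq; A[2, 2] = -(lp + mu + alpha); A[2, 6:8] = mu  # (1c)
    A[3, 1] = lq; A[3, 2] = alpha; A[3, 3] = -(lp + mu)         # (1d)
    A[4, 0] = lp; A[4, 4] = -(lq + alpha + mu); A[4, 6:8] = mu  # (1e)
    A[5, 1] = lp; A[5, 4] = alpha; A[5, 5] = -(lq + mu)         # (1f)
    A[6, 2] = lp; A[6, 4] = lq; A[6, 6] = -(alpha + 2 * mu)     # (1g)
    # The eight balance equations are linearly dependent, so (1h) is
    # replaced by the normalization constraint; it then holds automatically.
    A[7, :] = 1.0
    b = np.zeros(8); b[7] = 1.0
    return np.linalg.solve(A, b)

s = steady_state(lam=1.0, alpha=1.0, mu=1.0, p=0.5)
```

Since the balance equations sum to zero, the dropped equation is satisfied automatically by the computed solution.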

We also define *x*1, *x*2,..., *x*<sup>16</sup> as the solution of the following system of equations:

$$-s\_0 = -x\_1(\lambda + \alpha) + \mu x\_7 + \mu x\_{10} + \mu x\_3 + \mu x\_6 \tag{2a}$$

$$-s\_1 = -x\_2(\lambda + \alpha) + \alpha x\_1 + \alpha x\_2 \tag{2b}$$

$$-s\_2 = -x\_3(\lambda + \alpha + \mu) + \lambda(1 - p)x\_1 + \lambda(1 - p)x\_3 + \mu x\_{15} \tag{2c}$$

$$-s\_2 = -x\_4(\lambda + \alpha + \mu) + \mu x\_{16} \tag{2d}$$

$$-s\_3 = -x\_5(\lambda + \alpha + \mu) + \alpha x\_3 + \alpha x\_5 + \lambda(1 - p)x\_2 + \lambda(1 - p)x\_5 \tag{2e}$$

$$-s\_3 = -x\_6(\lambda + \alpha + \mu) + \alpha x\_4 + \alpha x\_6 \tag{2f}$$

$$-s\_4 = -x\_7(\lambda + \alpha + \mu) + \lambda p x\_1 + \lambda p x\_7 + \mu x\_{16} \tag{2g}$$

$$-s\_4 = -x\_8(\lambda + \alpha + \mu) + \mu x\_{15} \tag{2h}$$

$$-s\_5 = -x\_9(\lambda + \alpha + \mu) + \lambda p x\_2 + \lambda p x\_9 + \alpha x\_7 + \alpha x\_9 \tag{2i}$$

$$-s\_5 = -x\_{10}(\lambda + \alpha + \mu) + \alpha x\_8 + \alpha x\_{10} \tag{2j}$$

$$-s\_6 = -x\_{11}(\lambda + 2\mu + \alpha) + \lambda p x\_3 + \lambda p x\_{11} + \lambda(1 - p)x\_7 + \lambda(1 - p)x\_{11} \tag{2k}$$

$$-s\_6 = -x\_{12}(\lambda + 2\mu + \alpha) + \lambda(1 - p)x\_8 + \lambda(1 - p)x\_{12} \tag{2l}$$

$$-s\_6 = -x\_{13}(\lambda + 2\mu + \alpha) + \lambda p x\_4 + \lambda p x\_{13} \tag{2m}$$

$$-s\_7 = -x\_{14}(\lambda + 2\mu) + \lambda p x\_5 + \lambda p x\_{14} + \lambda(1 - p)x\_9 + \lambda(1 - p)x\_{14} + \alpha x\_{11} \tag{2n}$$

$$-s\_7 = -x\_{15}(\lambda + 2\mu) + \lambda(1 - p)x\_{10} + \lambda(1 - p)x\_{15} + \alpha x\_{12} \tag{2o}$$

$$-s\_7 = -x\_{16}(\lambda + 2\mu) + \lambda p x\_6 + \lambda p x\_{16} + \alpha x\_{13} \tag{2p}$$

where *s*0,*s*1, ... ,*s*<sup>7</sup> are given in Equation (1a–h). As we explain in Appendix B, the values *x*1, ... , *x*<sup>16</sup> coincide with the generation time of all the packets in the system for all the possible states of the Markov chain.

In the following result, we use the Stochastic Hybrid System technique [17] to characterize the average Age of Information of this system and we show that it can be done by solving the above system of equations.

**Proposition 1.** *The average Age of Information of a system with two parallel servers with energy harvesting is given by*

$$x\_1 + x\_2 + x\_3 + x\_5 + x\_7 + x\_9 + x\_{11} + x\_{14},$$

*where x*1, *x*2,..., *x*<sup>16</sup> *are the solution of Equation (2a–p).*

#### **Proof.** See Appendix B.
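As a numerical sanity check, Proposition 1 can be evaluated directly: solve Equation (1a–h) for the steady state, build the 16 linear equations (2a–p), and sum the eight components indicated above. This is a sketch under the coefficient reading given in this section (the variable names are illustrative, and the self-loop terms of each equation are folded into the diagonal coefficient):

```python
import numpy as np

def average_aoi(lam, alpha, mu, p):
    """Average AoI per Proposition 1: solve (1a-h), then (2a-p)."""
    lp, lq = lam * p, lam * (1 - p)
    a, m = alpha, mu
    # Steady state s_0..s_7: balance equations (1a)-(1g) plus normalization.
    A = np.zeros((8, 8))
    A[0, 0] = -(lam + a); A[0, 2:6] = m
    A[1, 0] = a; A[1, 1] = -lam
    A[2, 0] = lq; A[2, 2] = -(lp + m + a); A[2, 6:8] = m
    A[3, 1] = lq; A[3, 2] = a; A[3, 3] = -(lp + m)
    A[4, 0] = lp; A[4, 4] = -(lq + a + m); A[4, 6:8] = m
    A[5, 1] = lp; A[5, 4] = a; A[5, 5] = -(lq + m)
    A[6, 2] = lp; A[6, 4] = lq; A[6, 6] = -(a + 2 * m)
    A[7, :] = 1.0
    rhs = np.zeros(8); rhs[7] = 1.0
    s = np.linalg.solve(A, rhs)
    # Equations (2a)-(2p).  Each row: (state whose -s_i is the RHS,
    # {x_j: coefficient}), with self-loops folded into the diagonal.
    rows = [
        (0, {1: -(lam + a), 7: m, 10: m, 3: m, 6: m}),   # (2a)
        (1, {2: -lam, 1: a}),                            # (2b)
        (2, {3: -(lp + a + m), 1: lq, 15: m}),           # (2c)
        (2, {4: -(lam + a + m), 16: m}),                 # (2d)
        (3, {5: -(lp + m), 3: a, 2: lq}),                # (2e)
        (3, {6: -(lam + m), 4: a}),                      # (2f)
        (4, {7: -(lq + a + m), 1: lp, 16: m}),           # (2g)
        (4, {8: -(lam + a + m), 15: m}),                 # (2h)
        (5, {9: -(lq + m), 2: lp, 7: a}),                # (2i)
        (5, {10: -(lam + m), 8: a}),                     # (2j)
        (6, {11: -(2 * m + a), 3: lp, 7: lq}),           # (2k)
        (6, {12: -(lp + 2 * m + a), 8: lq}),             # (2l)
        (6, {13: -(lq + 2 * m + a), 4: lp}),             # (2m)
        (7, {14: -2 * m, 5: lp, 9: lq, 11: a}),          # (2n)
        (7, {15: -(lp + 2 * m), 10: lq, 12: a}),         # (2o)
        (7, {16: -(lq + 2 * m), 6: lp, 13: a}),          # (2p)
    ]
    M = np.zeros((16, 16)); b = np.zeros(16)
    for r, (state, coeffs) in enumerate(rows):
        b[r] = -s[state]
        for j, coeff in coeffs.items():
            M[r, j - 1] = coeff
    x = np.linalg.solve(M, b)
    # Proposition 1: AoI = x_1 + x_2 + x_3 + x_5 + x_7 + x_9 + x_11 + x_14.
    return sum(x[j - 1] for j in (1, 2, 3, 5, 7, 9, 11, 14))
```

For a very large data-packet arrival rate, the computed value approaches 1/(2*μ*), in agreement with Proposition 3 below.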

In Proposition 1, we show that the computation of the average Age of Information of the system under study requires solving Equation (2a–p), which is a system of 16 linear equations with 16 variables. Now, we aim to show that this system of equations has a special structure that can be exploited to compute the average Age of Information by solving a simpler system. Let us first present the following auxiliary results.

**Lemma 1.** *Equations (2d), (2f), (2m) and (2p) form a system of 4 linear equations with 4 variables (x*4*, x*6*, x*13 *and x*16*). Let*

$$c = \frac{\lambda p \alpha}{\lambda + \alpha + \mu} \left( \frac{1}{\lambda + \mu} + \frac{1}{\lambda(1 - p) + 2\mu + \alpha} \right). \tag{3}$$

*We have that*

$$x\_{16} = \frac{s\_7 + \frac{\lambda p s\_3}{\lambda + \mu} + \frac{\alpha s\_6}{\lambda(1-p) + 2\mu + \alpha} + c s\_2}{\lambda(1-p) + 2\mu - c\mu}, \tag{4}$$

*and*

$$x\_4 = \frac{\mu x\_{16} + s\_2}{\lambda + \alpha + \mu}. \tag{5}$$

*as well as*

$$x\_6 = \frac{\alpha x\_4 + s\_3}{\lambda + \mu}. \tag{6}$$

#### **Proof.** See Appendix C.
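A direct numerical cross-check of Lemma 1 can be done by solving the four-equation subsystem (2d), (2f), (2m), (2p) and comparing with the closed forms, here written with every term entering positively (the sign convention consistent with the subsystem). The *s*-values below are arbitrary positive numbers, since the lemma is pure linear algebra and does not require them to be a steady state:

```python
import numpy as np

lam, mu, alpha, p = 1.2, 0.8, 0.5, 0.3   # arbitrary test values
s2, s3, s6, s7 = 0.1, 0.2, 0.3, 0.4      # arbitrary right-hand sides
lp, lq = lam * p, lam * (1 - p)

# Subsystem in the unknown order [x4, x6, x13, x16]:
#   x4 (lam+alpha+mu)   = s2 + mu x16               (2d)
#   x6 (lam+mu)         = s3 + alpha x4             (2f)
#   x13 (lq+2mu+alpha)  = s6 + lp x4                (2m)
#   x16 (lq+2mu)        = s7 + lp x6 + alpha x13    (2p)
M = np.array([
    [lam + alpha + mu, 0.0, 0.0, -mu],
    [-alpha, lam + mu, 0.0, 0.0],
    [-lp, 0.0, lq + 2 * mu + alpha, 0.0],
    [0.0, -lp, -alpha, lq + 2 * mu],
])
x4, x6, x13, x16 = np.linalg.solve(M, np.array([s2, s3, s6, s7]))

# Closed forms of Lemma 1, with c as in Equation (3).
c = lp * alpha / (lam + alpha + mu) * (1 / (lam + mu) + 1 / (lq + 2 * mu + alpha))
x16_closed = (s7 + lp * s3 / (lam + mu) + alpha * s6 / (lq + 2 * mu + alpha)
              + c * s2) / (lq + 2 * mu - c * mu)
```

The direct solve and the closed forms agree to machine precision.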

**Lemma 2.** *Equations (2h), (2j), (2l) and (2o) form a system of 4 linear equations with 4 variables (x*8*, x*10*, x*12 *and x*15*). Let*

$$d = \frac{\lambda (1 - p)\alpha}{\lambda + \alpha + \mu} \left( \frac{1}{\lambda + \mu} + \frac{1}{\lambda p + 2\mu + \alpha} \right). \tag{7}$$

*We have that*

$$x\_{15} = \frac{s\_7 + \frac{\lambda (1-p) s\_5}{\lambda + \mu} + \frac{\alpha s\_6}{\lambda p + 2\mu + \alpha} + d s\_4}{\lambda p + 2\mu - d\mu}, \tag{8}$$

*and*

$$x\_8 = \frac{\mu x\_{15} + s\_4}{\lambda + \alpha + \mu}. \tag{9}$$

*as well as*

$$x\_{10} = \frac{\alpha x\_8 + s\_5}{\lambda + \mu}. \tag{10}$$

**Proof.** The proof is symmetric to the proof of Lemma 1 and, therefore, we omit it for clarity of the presentation.

We now write the equations of (2a–p) that have not been analyzed in the previous lemmas:

$$-s\_0 = -x\_1(\lambda + \alpha) + \mu x\_7 + \mu x\_{10} + \mu x\_3 + \mu x\_6 \tag{11a}$$

$$-s\_1 = -x\_2\lambda + \alpha x\_1 \tag{11b}$$

$$-s\_2 = -x\_3(\lambda p + \alpha + \mu) + \lambda(1 - p)x\_1 + \mu x\_{15} \tag{11c}$$

$$-s\_3 = -x\_5(\lambda p + \mu) + \alpha x\_3 + \lambda(1 - p)x\_2 \tag{11d}$$

$$-s\_4 = -x\_7(\lambda(1-p) + \alpha + \mu) + \lambda p x\_1 + \mu x\_{16} \tag{11e}$$

$$-s\_5 = -x\_9(\lambda(1-p) + \mu) + \lambda p x\_2 + \alpha x\_7 \tag{11f}$$

$$-s\_6 = -x\_{11}(2\mu + \alpha) + \lambda p x\_3 + \lambda(1 - p)x\_7 \tag{11g}$$

$$-s\_7 = -2\mu x\_{14} + \lambda p x\_5 + \lambda(1-p)x\_9 + \alpha x\_{11} \tag{11h}$$

which is a system of 8 equations with 12 variables (the variables are *x*1, *x*2, *x*3, *x*5, *x*6, *x*7, *x*9, *x*10, *x*11, *x*14, *x*15 and *x*16). However, an explicit expression for *x*6 and *x*16 has been obtained in Lemma 1 and an explicit expression for *x*10 and *x*15 in Lemma 2. Therefore, the next result follows.

**Proposition 2.** *Let x*1, *x*2, *x*3, *x*5, *x*7, *x*9, *x*11, *x*<sup>14</sup> *be the solution of Equation (11a–h) (recall that x*<sup>6</sup> *and x*<sup>16</sup> *are given in Lemma 1 and x*<sup>10</sup> *and x*<sup>15</sup> *in Lemma 2). Therefore, the average Age of Information of a two parallel servers system with energy harvesting is given by*

$$x\_1 + x\_2 + x\_3 + x\_5 + x\_7 + x\_9 + x\_{11} + x\_{14}.$$

#### *3.1. Analysis When λ Tends to* ∞

We now consider the asymptotic regime where *λ* → ∞ when the parameters *α* and *μ* are finite.

We first focus on the solution of Equation (1a–h).

**Lemma 3.** *When λ* → ∞ *and* max(*α*, *μ*) < ∞*, the solution of Equation (1a–h) satisfies s*<sup>0</sup> = *s*<sup>1</sup> = *s*<sup>2</sup> = *s*<sup>3</sup> = *s*<sup>4</sup> = *s*<sup>5</sup> = 0*.*

#### **Proof.** See Appendix D.

From this result, we conclude that, in the asymptotic regime under study, *s*<sup>6</sup> + *s*<sup>7</sup> = 1 and that Equation (11a–h) can be written as

$$0 = -x\_1(\lambda + \alpha) + \mu x\_7 + \mu x\_{10} + \mu x\_3 + \mu x\_6 \tag{12a}$$

$$0 = -x\_2\lambda + \alpha x\_1 \tag{12b}$$

$$0 = -x\_3(\lambda p + \alpha + \mu) + \lambda(1 - p)x\_1 + \mu x\_{15} \tag{12c}$$

$$0 = -x\_5(\lambda p + \mu) + \alpha x\_3 + \lambda(1 - p)x\_2 \tag{12d}$$

$$0 = -x\_7(\lambda(1-p) + \alpha + \mu) + \lambda p x\_1 + \mu x\_{16} \tag{12e}$$

$$0 = -x\_9(\lambda(1-p) + \mu) + \lambda p x\_2 + \alpha x\_7 \tag{12f}$$

$$-s\_6 = -x\_{11}(2\mu + \alpha) + \lambda p x\_3 + \lambda(1 - p)x\_7 \tag{12g}$$

$$-s\_7 = -2\mu x\_{14} + \lambda p x\_5 + \lambda(1-p)x\_9 + \alpha x\_{11} \tag{12h}$$

If we replace the last equation by the sum of the last two equations, we get the following equivalent system:

$$0 = -x\_1(\lambda + \alpha) + \mu x\_7 + \mu x\_{10} + \mu x\_3 + \mu x\_6 \tag{13a}$$

$$0 = -x\_2\lambda + \alpha x\_1 \tag{13b}$$

$$0 = -x\_3(\lambda p + \alpha + \mu) + \lambda(1 - p)x\_1 + \mu x\_{15} \tag{13c}$$

$$0 = -x\_5(\lambda p + \mu) + \alpha x\_3 + \lambda(1 - p)x\_2 \tag{13d}$$

$$0 = -x\_7(\lambda(1-p) + \alpha + \mu) + \lambda p x\_1 + \mu x\_{16} \tag{13e}$$

$$0 = -x\_9(\lambda(1-p) + \mu) + \lambda p x\_2 + \alpha x\_7 \tag{13f}$$

$$-s\_6 = -x\_{11}(2\mu + \alpha) + \lambda p x\_3 + \lambda(1 - p)x\_7 \tag{13g}$$

$$-1 = -(x\_{14} + x\_{11}) \cdot 2\mu + \lambda p x\_5 + \lambda(1-p)x\_9 + \lambda p x\_3 + \lambda(1 - p)x\_7 \tag{13h}$$

We now analyze the solution of Equation (13a–h) for large *λ*.

**Lemma 4.** *When λ* → ∞ *and* max(*α*, *μ*) < ∞*, the solution of Equation (13a–h) satisfies x*<sup>1</sup> = *x*<sup>2</sup> = *x*<sup>3</sup> = *x*<sup>5</sup> = *x*<sup>7</sup> = *x*<sup>9</sup> = 0*.*

**Proof.** The proof uses the same arguments as those of the proof of Lemma 3 and, therefore, we omit it.

From the above results, we conclude that the average Age of Information of this system is given by *x*<sup>11</sup> + *x*14. Furthermore, using that *x*<sup>3</sup> = *x*<sup>5</sup> = *x*<sup>7</sup> = *x*<sup>9</sup> = 0 and from (13h), we obtain that *x*<sup>11</sup> + *x*<sup>14</sup> = 1/(2*μ*), which gives the following result:

**Proposition 3.** *When λ* → ∞ *and* max(*α*, *μ*) < ∞*, the average Age of Information of a two parallel servers system with energy harvesting is given by* 1/(2*μ*)*.*

It is important to remark that the average Age of Information of the system under study in the considered asymptotic regime does not depend on the arrival rate of energy packets, i.e., on *α*.

#### *3.2. Limitations to Analyze More Complex Models*

We have tried to extend the results presented in this section to more complex systems and we have noticed that this task is extremely difficult. The main reason for this is that the size of the Markov chain to be considered (and, as a consequence, the number of equations to be solved) increases at a very high rate with the complexity of the system. This suggests that the analysis of the average Age of Information of more complex systems requires considering other techniques such as simulations or approximations.

#### **4. Performance Evaluation**

In the previous section, we obtained an explicit characterization of the average Age of Information of a system with two parallel servers and energy harvesting. Now, we aim to evaluate the obtained expression to analyze its main properties. We have performed a large number of experiments changing the values of the parameters, and the plots of this section are representative of the general pattern. (The plots of this section can be reproduced using the code of https://github.com/josudoncel/AioParallelEnergy, accessed on 18 November 2021).

#### *4.1. Parallel Servers vs. Single Server*

We aim to compare the average Age of Information of the model under study in this paper with that of a system with a single server with energy harvesting (the average Age of Information of the latter model has been studied in [19]). For this purpose, we consider the following systems: (i) a single server with arrival rates *λ* and *α* of data packets and energy packets, respectively, and service rate 2*μ* (which is represented with a dotted line); (ii) a single server with arrival rates *λ*/2 and *α* of data packets and energy packets, respectively, and service rate *μ* (which is represented with a solid line); and (iii) two parallel servers with *p* = 0.5, i.e., the arrival rate to each server is *λ*/2, the service rate is *μ* and the arrival rate of energy packets is *α* (which is represented with a dashed line).

We first consider that the arrival rate of energy packets is very large. In this case, there is always energy available to transmit a data packet when it finishes service or, in other words, data packets are never lost due to a lack of energy. We note that, when this occurs, the comparison study we carry out here coincides with the analysis of Section III2 of [18]. In Figure 3, we consider *μ* = 1 and *α* = 10<sup>3</sup> and we plot the evolution of the average Age of Information of the systems under study in this section when *λ* varies from 0.1 to 10<sup>3</sup>. We observe that the obtained plot coincides with Figure 4 of [18]. From this illustration, the authors of [18] conclude that some properties of classical performance metrics of queueing theory, such as the mean delay, also hold for the average Age of Information (the system that minimizes it is the single server with service rate 2*μ*), whereas other properties do not (the mean delay of a system with two parallel servers, each with arrival rate *λ*/2, is the same as that of a single server with arrival rate *λ*/2, but this is not the case for the average Age of Information). This illustration shows that, as expected, these conclusions also hold for our model when *α* is large.

**Figure 3.** Average Age of Information of the three systems under comparison when *λ* changes from 0.1 to 10<sup>3</sup>. *α* = 1000 and *μ* = 1.

We now compare the performance of these systems when the arrival rate of energy packets is not large. Thus, we fix the remaining parameters as in the previous plot and consider different values of *α*. First, we consider *α* = 1 and, in Figure 4, we observe that the average Age of Information of all the systems does not change substantially with respect to the previous plot when *λ* is small. However, as *λ* grows, the average Age of Information of the single-server systems decreases less than that of the two parallel servers. We have also seen that, when *λ* is large, the average Age of Information of the two parallel servers is 0.5, that of the single server with double service rate is 1.5, and that of the single server with half the traffic is 1.67. The difference in the average Age of Information between these systems is even larger for smaller values of *α*. For instance, in Figure 5, we illustrate that the system with the smallest average Age of Information is the one with two parallel servers for almost all the values of *λ* we have considered.

**Figure 4.** Average Age of Information of the three systems under comparison when *λ* changes from 0.1 to 10<sup>3</sup>. *α* = 1 and *μ* = 1.

**Figure 5.** Average Age of Information of the three systems under comparison when *λ* changes from 0.1 to 10<sup>3</sup>. *α* = 0.1 and *μ* = 1.

#### *4.2. Convergence Analysis of Proposition 3*

In Proposition 3, we show that, when *λ* → ∞, the average Age of Information tends to 1/(2*μ*), i.e., it does not depend on *α* or *p*. In this part of the article, we aim to study this convergence. We consider *μ* = 1 and *p* = 0.1 and we vary *α* from 0.1 to 10<sup>3</sup>. We plot the evolution of the average Age of Information for different values of *λ* in Figure 6 and we observe that, as *λ* increases, the obtained values tend to 0.5, which confirms the result of Proposition 3. We consider *p* = 0.45 in Figure 7 to study how this convergence depends on *p*, and we observe that it seems to converge to 0.5 at the same rate as in the previous case.

From these illustrations, we also conclude that the average Age of Information is not monotone with respect to *α* (note that, in [19], it has been shown that the average Age of Information of a system with a single server with energy harvesting is monotone with respect to *α*). For instance, we see that, when *p* = 0.45 and *λ* = 10, the average Age of Information increases when *α* is small, then decreases, and finally increases again.

**Figure 6.** Average Age of Information with respect to *α* for different values of *λ*, when *α* varies from 0.1 to 10<sup>3</sup>. *μ* = 1 and *p* = 0.1.

**Figure 7.** Average Age of Information with respect to *α* for different values of *λ*, when *α* varies from 0.1 to 10<sup>3</sup>. *μ* = 1 and *p* = 0.45.

#### *4.3. Analysis of the Parameter p*

We now focus on the parameter *p*, which determines the proportion of the total incoming arrival rate that is sent to each of the servers. In Figure 8, we consider *α* = 1 and we plot the average Age of Information when *p* varies from 0.01 to 0.99 for different values of *λ* and *μ*. We observe that, in all the considered instances, the average Age of Information first decreases with *p* and then increases. This suggests that the minimum of the average Age of Information over *p* is achieved when this parameter is close to 0.5, i.e., in the symmetric case studied in Section 4.1.

**Figure 8.** Average Age of Information with respect to *p* for different values of *λ* and *μ*, when *p* varies from 0.01 to 0.99. *α* = 1.

#### **5. Conclusions**

We consider a system with two transmission channels that do not communicate and an energy source that generates energy following a random process. We model this system as a system with Poisson arrivals of status updates to two parallel servers and of energy packets to a battery. We study the average Age of Information of the system using the Stochastic Hybrid Systems method. We first show that the average Age of Information of this system can be computed by solving a system of 8 linear equations (Proposition 2). We then consider that the arrival rate tends to infinity and, in this case, we show that the average Age of Information is equal to one divided by the sum of the service rates of the servers. This implies that, in this regime, the average Age of Information depends neither on the probability of dispatching jobs to each server nor on the arrival rate of energy packets. We then numerically compare the performance of our model with single-server systems with energy harvesting and the same load of data packets as in our model. We conclude that, when the arrival rate of energy packets tends to infinity, the conclusions of [18] also hold in our model (i.e., the average Age of Information does not satisfy the same properties as other performance measures used in queueing theory, such as the number of packets in the queue). However, when the arrival rate of energy packets is not large, the parallel-server system outperforms the single-server systems.

For future work, we would like to analyze the average Age of Information with a larger number of servers and with buffers for the energy packets and the data packets. Furthermore, we would like to study the optimality of this model with respect to some parameters, such as *p*. We would also like to extend this model to include other aspects of the system, such as transmit power or channel statistics, to address real-life problems. Finally, we are also interested in exploring the performance of this model when energy is required not only to send a status update to the monitor after service, but also to receive data packets from the source.

**Funding:** Josu Doncel has received funding from the Department of Education of the Basque Government through the Consolidated Research Group MATHMODE (IT1294-19), from the Marie Sklodowska-Curie grant agreement No 777778 and from the Spanish Ministry of Science and Innovation with reference PID2019-108111RB-I00 (FEDER/AEI).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The code to reproduce the results of this article is available at https://github.com/josudoncel/AioParallelEnergy, accessed on 18 November 2021.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Average Age of Information and SHS**

In the SHS, the system is modeled as a hybrid state (*q*(*t*), **x**(*t*)), where *q*(*t*) is the state of a continuous-time Markov chain and **x**(*t*) is a vector whose components are the generation times of the updates. We assume that the monitor keeps the update with the latest generation time.

A link *l* of the Markov chain represents a transition between two states, which occurs with rate *λ<sup>l</sup>*. In each transition *l*, the vector **x** changes to **x**′ using a transformation matrix **A***l*, that is, **x**′ = **xA***l*. Therefore, **x**(*t*) is a piecewise linear function.

For each state *q* of the Markov chain, we define **b***<sup>q</sup>* as a vector whose elements are zero or one. Entries equal to one represent the updates that are present in the system and whose time since generation therefore increases at unit rate, whereas entries equal to zero represent the updates that are not in the system.

We assume the Markov chain is ergodic and we denote by *π<sup>q</sup>* the stationary probability of state *q*. We denote by $\mathcal{L}_q$ the set of outgoing links of state *q* and by $\mathcal{L}'_q$ the set of links that go into state *q*. We now present the following theorem, which will be used to characterize the average Age of Information:

**Theorem A1** ([17], Thm 4)**.** *Let vq*(*i*) *denote the i-th element of the vector* **v***q. For each state q, if* **v***q is a non-negative solution of the following system of equations*

$$\mathbf{v}_q \sum_{l \in \mathcal{L}_q} \lambda^l = \mathbf{b}_q \pi_q + \sum_{l \in \mathcal{L}'_q} \lambda^l \mathbf{v}_{q_l} \mathbf{A}_l, \tag{A1}$$

*then the average Age of Information is* Δ = ∑*<sup>q</sup> vq*(0)*.*
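Theorem A1 reduces the computation of the average Age of Information to a finite linear system, which is easy to solve numerically. The following sketch is our own illustration (the function `shs_average_aoi` and its argument names are not from the paper): it assembles and solves (A1) for an arbitrary finite chain, and checks the result on the classical single-state preemptive M/M/1 example, whose average Age of Information is known to be 1/*λ* + 1/*μ*.

```python
import numpy as np

def shs_average_aoi(n_states, dim, pi, b, transitions):
    """Solve the linear system (A1) for the vectors v_q and return
    the average AoI, sum_q v_q(0).

    pi: stationary probabilities; b: list of 0/1 vectors b_q;
    transitions: list of (src, dst, rate, A) with x' = x A."""
    N = n_states * dim
    M = np.zeros((N, N))
    rhs = np.zeros(N)
    for q in range(n_states):
        rhs[q*dim:(q+1)*dim] = np.asarray(b[q], float) * pi[q]
    for src, dst, rate, A in transitions:
        A = np.asarray(A, float)
        for j in range(dim):
            M[src*dim + j, src*dim + j] += rate          # outgoing side of (A1)
            for i in range(dim):
                M[dst*dim + j, src*dim + i] -= rate * A[i, j]  # incoming side
    v = np.linalg.solve(M, rhs)
    return sum(v[q*dim] for q in range(n_states))

# Single-state preemptive M/M/1: x = [x0, x1]; an arrival resets x1,
# a departure copies x1 into x0.  Both are self-transitions.
lam, mu = 2.0, 3.0
A_arr = [[1, 0], [0, 0]]   # x' = [x0, 0]
A_dep = [[0, 0], [1, 1]]   # x' = [x1, x1]
aoi = shs_average_aoi(1, 2, [1.0], [[1, 1]],
                      [(0, 0, lam, A_arr), (0, 0, mu, A_dep)])
# matches the known value 1/lam + 1/mu
```

The same helper reproduces Proposition 2 if one feeds it the eight states, the stationary probabilities, the vectors **b***<sup>q</sup>*, and the transitions of Table A1.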

#### **Appendix B. Proof of Proposition 1**

We use the SHS technique to compute the average Age of Information of this system. We denote by **x** = [*x*0(*t*) *x*1(*t*) *x*2(*t*)] the state vector, where *x*0(*t*) represents the age of the information at the monitor at time *t*, and *xi*(*t*) the age that the information would have if the data packet in service at data queue *i* were successfully delivered to the monitor, *i* = 1, 2.

The Markov Chain of this model is presented in Figure A1. We denote by *ijk* the state where there are *i* data packets in data queue 1, *j* data packets in data queue 2 and *k* energy packets in the energy queue. We now describe each of the transitions of this model.

**Figure A1.** The SHS Markov chain for the model with two parallel data queues and a battery.


With the roles of data queue 1 and data queue 2 reversed, transitions 11–21 coincide with transitions 0–10, respectively. Besides, transitions 21–24 represent an arrival of an energy packet, which increases the number of energy packets by one if the energy queue is empty; the vector **x** is never modified. Finally, we focus on transitions 25 and 26:



**Table A1.** Table of SHS transitions of Figure A1.

We represent the evolution of **x** for all the above transitions in Table A1. We note that the information is presented as follows: each transition is given in a different row; the second column shows the origin and destination states of the Markov chain; the third column shows the rate of the transition; and the last two columns show the evolution of **x** and $\bar{v}_{q_l}\mathbf{A}_l$, respectively.

Let <sup>Q</sup>*<sup>p</sup>* be the set of states of the Markov chain of Figure A1. The stationary probability of state *<sup>q</sup>* ∈ Q*<sup>p</sup>* is denoted by *<sup>π</sup>q*. The balance equations of this Markov chain are provided next:

$$\begin{aligned} \pi_{000}(\lambda+\alpha) &= \pi_{101}\mu + \pi_{100}\mu + \pi_{011}\mu + \pi_{010}\mu \\ \pi_{001}\lambda &= \pi_{000}\alpha \\ \pi_{010}(\lambda p + \mu + \alpha) &= \pi_{000}\lambda(1-p) + \pi_{110}\mu + \pi_{111}\mu \\ \pi_{011}(\lambda p + \mu) &= \pi_{010}\alpha + \pi_{001}\lambda(1-p) \\ \pi_{100}(\lambda(1-p) + \alpha + \mu) &= \pi_{000}\lambda p + \pi_{110}\mu + \pi_{111}\mu \\ \pi_{101}(\lambda(1-p) + \mu) &= \pi_{001}\lambda p + \pi_{100}\alpha \\ \pi_{110}(\alpha + 2\mu) &= \pi_{010}\lambda p + \pi_{100}\lambda(1-p) \\ \pi_{111}2\mu &= \pi_{011}\lambda p + \pi_{101}\lambda(1-p) + \pi_{110}\alpha \end{aligned}$$

The Markov chain under study is clearly ergodic. Therefore, there always exists a unique solution of the above system of equations that satisfies <sup>∑</sup>*q*∈Q*<sup>p</sup> <sup>π</sup><sup>q</sup>* = 1.

We remark that the above system of equations coincides with Equation (1a–h).
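As a numerical sanity check, the generator of the chain of Figure A1 can be assembled and solved directly. The sketch below is our own aid, not part of the paper: the transition list is inferred from the balance equations above (states 000, ..., 111 are indexed 0 to 7), and the function name is ours.

```python
import numpy as np

def stationary_distribution(lam, alpha, mu, p):
    """Stationary distribution of the chain of Figure A1.

    States ijk (i, j data packets in queues 1 and 2; k energy packets)
    are ordered 000, 001, 010, 011, 100, 101, 110, 111."""
    Q = np.zeros((8, 8))
    links = [
        (0, 4, lam * p), (0, 2, lam * (1 - p)), (0, 1, alpha),
        (1, 5, lam * p), (1, 3, lam * (1 - p)),
        (2, 6, lam * p), (2, 0, mu), (2, 3, alpha),
        (3, 7, lam * p), (3, 0, mu),
        (4, 6, lam * (1 - p)), (4, 0, mu), (4, 5, alpha),
        (5, 7, lam * (1 - p)), (5, 0, mu),
        (6, 7, alpha), (6, 2, mu), (6, 4, mu),
        (7, 2, mu), (7, 4, mu),
    ]
    for src, dst, rate in links:
        Q[src, dst] += rate
    np.fill_diagonal(Q, -Q.sum(axis=1))
    # Replace one balance equation by the normalization sum(pi) = 1.
    A = np.vstack([Q.T[:-1], np.ones(8)])
    rhs = np.zeros(8)
    rhs[-1] = 1.0
    return np.linalg.solve(A, rhs)
```

By construction, the returned vector satisfies the eight balance equations together with the normalization condition.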

We now define the vector **<sup>b</sup>***<sup>q</sup>* for any *<sup>q</sup>* ∈ Q*p*. For *<sup>q</sup>* ∈ {000, 001}, we have that **b***<sup>q</sup>* = [100], whereas for *q* ∈ {100, 101}, **b***<sup>q</sup>* = [110], for *q* ∈ {010, 011}, **b***<sup>q</sup>* = [101] and for *<sup>q</sup>* ∈ {110, 111}, **<sup>b</sup>***<sup>q</sup>* = [111]. Besides, for all *<sup>q</sup>* ∈ Q*p*, we denote by *vq*(*i*) the *<sup>i</sup>*-th component of vector **v***q*, with *i* = 0, 1, 2.

We now aim to apply Theorem A1 to this model and, from (A1), it follows that

$$\begin{aligned} \mathbf{v}_{000}(\lambda + \alpha) &= [\pi_{000}\,0\,0] + \mu[v_{100}(0)\,0\,0] + \mu[v_{101}(1)\,0\,0] \\ &\quad + \mu[v_{010}(0)\,0\,0] + \mu[v_{011}(2)\,0\,0] \end{aligned} \tag{A2a}$$

$$\mathbf{v}_{001}(\lambda + \alpha) = [\pi_{001}\,0\,0] + \alpha[v_{000}(0)\,0\,0] + \alpha[v_{001}(0)\,0\,0] \tag{A2b}$$

$$\begin{aligned} \mathbf{v}_{010}(\lambda + \mu + \alpha) &= [\pi_{010}\,0\,\pi_{010}] + \lambda(1-p)[v_{000}(0)\,0\,0] \\ &\quad + \lambda(1-p)[v_{010}(0)\,0\,0] + \mu[v_{110}(0)\,0\,v_{110}(2)] \\ &\quad + \mu[v_{111}(1)\,0\,v_{111}(2)] \end{aligned} \tag{A2c}$$

$$\begin{aligned} \mathbf{v}_{011}(\lambda + \alpha + \mu) &= [\pi_{011}\,0\,\pi_{011}] + \alpha[v_{010}(0)\,0\,v_{010}(2)] \\ &\quad + \alpha[v_{011}(0)\,0\,v_{011}(2)] + \lambda(1-p)[v_{001}(0)\,0\,0] \\ &\quad + \lambda(1-p)[v_{011}(0)\,0\,0] \end{aligned} \tag{A2d}$$

$$\begin{aligned} \mathbf{v}_{100}(\lambda + \alpha + \mu) &= [\pi_{100}\,\pi_{100}\,0] + \lambda p[v_{000}(0)\,0\,0] \\ &\quad + \lambda p[v_{100}(0)\,0\,0] + \mu[v_{110}(0)\,v_{110}(1)\,0] \\ &\quad + \mu[v_{111}(2)\,v_{111}(1)\,0] \end{aligned} \tag{A2e}$$

$$\begin{aligned} \mathbf{v}_{101}(\lambda + \alpha + \mu) &= [\pi_{101}\,\pi_{101}\,0] + \lambda p[v_{001}(0)\,0\,0] \\ &\quad + \lambda p[v_{101}(0)\,0\,0] + \alpha[v_{101}(0)\,v_{101}(1)\,0] \\ &\quad + \alpha[v_{100}(0)\,v_{100}(1)\,0] \end{aligned} \tag{A2f}$$

$$\begin{aligned} \mathbf{v}_{110}(\lambda + 2\mu + \alpha) &= [\pi_{110}\,\pi_{110}\,\pi_{110}] + \lambda p[v_{010}(0)\,0\,v_{010}(2)] \\ &\quad + \lambda p[v_{110}(0)\,0\,v_{110}(2)] + \lambda(1-p)[v_{100}(0)\,v_{100}(1)\,0] \\ &\quad + \lambda(1-p)[v_{110}(0)\,v_{110}(1)\,0] \end{aligned} \tag{A2g}$$

$$\begin{aligned} \mathbf{v}_{111}(\lambda + 2\mu + \alpha) &= [\pi_{111}\,\pi_{111}\,\pi_{111}] + \lambda p[v_{011}(0)\,0\,v_{011}(2)] \\ &\quad + \lambda p[v_{111}(0)\,0\,v_{111}(2)] + \lambda(1-p)[v_{101}(0)\,v_{101}(1)\,0] \\ &\quad + \lambda(1-p)[v_{111}(0)\,v_{111}(1)\,0] + \alpha[v_{110}(0)\,v_{110}(1)\,v_{110}(2)] \\ &\quad + \alpha[v_{111}(0)\,v_{111}(1)\,v_{111}(2)] \end{aligned} \tag{A2h}$$

We note that the above expression is the same as Equation (2a–p) with the following change of variable: *v*000(0) = *x*1, *v*001(0) = *x*2, ... , *v*111(1) = *x*15, *v*111(2) = *x*<sup>16</sup> and *s*<sup>0</sup> = *π*000, *s*<sup>1</sup> = *π*001,...,*s*<sup>7</sup> = *π*111. Using Theorem A1, the desired result follows.

#### **Appendix C. Proof of Lemma 1**

We first write Equations (2d,f,m,p):

$$-s_2 = -x_4(\lambda + \alpha + \mu) + \mu x_{16} \tag{A3a}$$

$$-s_3 = -x_6(\lambda + \mu) + \alpha x_4 \tag{A3b}$$

$$-s_6 = -x_{13}(\lambda(1-p) + 2\mu + \alpha) + \lambda p x_4 \tag{A3c}$$

$$-s_7 = -x_{16}(\lambda(1-p) + 2\mu) + \lambda p x_6 + \mu x_{13}, \tag{A3d}$$

and we observe that this is a system of four linear equations in four unknowns. Moreover, from (A3b), we get

$$x_6 = \frac{\alpha x_4 + s_3}{\lambda + \mu},$$

which gives (6), whereas from (A3c) we get

$$x\_{13} = \frac{\lambda p x\_4 + s\_6}{\lambda (1 - p) + 2\mu + \alpha}.$$

Substituting these expressions into (A3a) and (A3d), we obtain the following system of equations:

$$-s_2 = -x_4(\lambda + \alpha + \mu) + \mu x_{16} \tag{A4a}$$

$$-s_7 = -x_{16}(\lambda(1 - p) + 2\mu) + \lambda p \frac{\alpha x_4 + s_3}{\lambda + \mu} + \mu \frac{\lambda p x_4 + s_6}{\lambda(1 - p) + 2\mu + \alpha}. \tag{A4b}$$

From (A4a), it follows that

$$x_4(\lambda + \alpha + \mu) = \mu x_{16} + s_2 \iff x_4 = \frac{\mu x_{16} + s_2}{\lambda + \alpha + \mu},$$

which is equal to (5). Substituting this expression into (A4b) and simplifying, we obtain (4), and the desired result follows.

#### **Appendix D. Proof of Lemma 3**

We first note that, from (1a), we have that

$$s_0 = \frac{s_2\mu + s_3\mu + s_4\mu + s_5\mu}{\lambda + \alpha},$$

which tends to zero when *λ* → ∞ because *si* < 1 for all *i* and *μ* < ∞. From (1b), we get that

$$s_1 = \frac{s_0 \alpha}{\lambda},$$

which, using the previous result and *α* < ∞, also proves that *s*<sup>1</sup> tends to zero when *λ* → ∞ because *s*<sup>0</sup> < 1. We now focus on (1c):

$$s_2 = \frac{s_0\lambda(1-p) + s_6\mu + s_7\mu}{\lambda p + \mu + \alpha}$$

and when *λ* → ∞, we have that *s*<sup>2</sup> tends to *s*0(1 − *p*)/*p* and this tends to zero because *s*<sup>0</sup> tends to zero. We now study (1d) and, using that *αs*<sup>0</sup> = *λs*<sup>1</sup> (see (1b)), we get

$$s_3 = \frac{s_0 \alpha (1 - p) + s_2 \alpha}{\lambda p + \mu},$$

which also tends to zero when *λ* → ∞ because *s*<sup>0</sup> < 1 and *α* < ∞. From (1e), we obtain

$$s_4 = \frac{s_0 \lambda p + s_6 \mu + s_7 \mu}{\lambda (1 - p) + \alpha + \mu}$$

and we observe that *s*<sup>4</sup> tends to *s*<sup>0</sup> *p*/(1 − *p*) when *λ* → ∞ and, since *s*<sup>0</sup> tends to zero, so does *s*4. Finally, we have from (1f) and using that *αs*<sup>0</sup> = *λs*<sup>1</sup> (see (1b)),

$$s_5 = \frac{s_0 \alpha p + s_4 \alpha}{\lambda (1 - p) + \mu},$$

which also tends to zero when *λ* → ∞. The desired result follows.

#### **References**


## *Article* **Age-Aware Utility Maximization in Relay-Assisted Wireless Powered Communication Networks**

**Ning Luan 1,2, Ke Xiong 1,2,3,\*, Zhifei Zhang 1,\*, Haina Zheng 1,2, Yu Zhang 4, Pingyi Fan 5,6 and Gang Qu <sup>7</sup>**


**Abstract:** This article investigates a relay-assisted wireless powered communication network (WPCN), where the access point (AP) incentivizes the auxiliary nodes to participate in charging the sensor, and the sensor then uses its harvested energy to send status update packets to the AP. An incentive mechanism is designed to overcome the selfishness of the auxiliary node. In order to further improve the system performance, we establish a Stackelberg game to model the efficient cooperation between the AP–sensor pair and the auxiliary node. Specifically, we formulate two utility functions for the AP–sensor pair and the auxiliary node, and then formulate two maximization problems, respectively. As the former problem is non-convex, we transform it into a convex problem by introducing an extra slack variable, and then, by using the Lagrangian method, we obtain the optimal solution in closed form. Numerical experiments show that the larger the transmit power of the AP, the smaller the age of information (AoI) of the AP–sensor pair and the smaller the influence of the location of the auxiliary node on the AoI. In addition, when the distance between the AP and the sensor node exceeds a certain threshold, employing the relay achieves better AoI performance than non-relaying systems.

**Keywords:** wireless powered communication networks; real-time state update; age of information; utility maximization

#### **1. Introduction**

With the large-scale deployment of Internet of Things (IoT) devices in applications such as environment surveillance and industrial control [1,2], status update systems that report real-time system information become increasingly important. In such systems, accurate decisions must be made based on fresh information updates, so measuring information freshness becomes necessary. However, traditional network performance metrics like delay and throughput cannot characterize information freshness. Therefore, the concept of age of information (AoI) was introduced as the time elapsed from the generation time of the latest received status update packet to the current time [3]. For real-time update systems, the goal is to keep status update information as fresh as possible, which can be cast as the minimization of AoI.

In IoT systems, information is normally collected by edge devices or sensors. These sensors are generally powered by batteries of limited capacity, which need to be replaced or recharged periodically. However, battery replacement or frequent recharging

**Citation:** Luan, N.; Xiong, K.; Zhang, Z.; Zheng, H.; Zhang, Y.; Fan, P.; Qu, G. Age-Aware Utility Maximization in Relay-Assisted Wireless Powered Communication Networks. *Entropy* **2021**, *23*, 1177. https://doi.org/ 10.3390/e23091177

Academic Editors: Anthony Ephremides and Yin Sun

Received: 13 July 2021 Accepted: 9 August 2021 Published: 7 September 2021


**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

is labor-intensive and could be impossible, particularly in large-scale network scenarios and harsh environments. To address this, energy harvesting (EH) technology has emerged as a promising alternative that collects energy from the external environment to power low-power IoT devices. It is believed that EH has great potential in the future sixth-generation (6G) communication networks. Therefore, it is also expected to be used in future industrial control, environment monitoring, and other real-time IoT applications.

Existing EH technologies are of two types: traditional natural energy sources and radio-frequency (RF) signal-based energy sources. Compared with traditional natural energy sources [4,5], RF signals are easy to control, can provide steady power, and have relatively low requirements on the deployment environment [6]. Accordingly, there have been many works on AoI in wireless powered communication networks (WPCN) powered by RF EH [7–13]. In [7], an optimal online status update strategy to minimize the AoI over a long-term time scale with energy constraints was studied. In [8], the performance of AoI in WPCN was analyzed, and it was proved that the smaller the probability of packet generation, the smaller the average AoI. In [9], the emergency AoI (U-AoI) in WPCN was minimized. In [10], the uplink AoI in two-way wireless powered networks was discussed, where a nonlinear AoI expression was adopted. In [11], the AoI performance limit for a practical wireless power transfer (WPT) network was explored. In [12], the trade-off between storable energy and system AoI in the WPT network was investigated, and in [13], the optimal design of AoI-based fog computing WPCN was studied.

The above works only considered the basic three-node network model. Recently, researchers have extended this to the study of AoI performance in multi-node networks. For example, the authors of [14] minimized the AoI in large-scale WPCN with multiple sensor nodes and derived the average charging time of nodes in the network. The authors of [15] studied WPT-powered networks and explored when to terminate energy collection and how to properly assign resources for uploading data in order to minimize AoI. In [16], an optimal online sampling strategy for the joint optimization of update packet transmission was presented to minimize the AoI, and a deep reinforcement learning (DRL) approach was proposed to effectively learn the optimal AoI strategy. In [17], the AoI in WPCN with multi-user scheduling based on non-orthogonal multiple access (NOMA) was discussed, and a closed-form expression of the AoI was derived. However, the above works only considered single-hop networks. Due to factors such as excessive distance or obstructions between the sensor and the AP, a direct connection between them cannot always be established. Therefore, relaying technology can be employed to help sensors transmit information to their sink node [18–21]. In particular, in [18], a cooperative WPCN with flat fading was studied, in which a source and a relay collected energy from a remote power station. In [19], the AoI in a cooperative wireless communication system with simultaneous wireless information and power transfer (SWIPT) was studied, where two protocols were discussed.

We observe that most of these works on AoI-based relay-assisted WPCNs assumed that the relay node (or auxiliary node) could directly participate in charging or information transmission. As a matter of fact, this may not be realistic in real-life WPCN, because relay nodes may also have limited energy and may thus be reluctant or refuse to collaborate in the transmission of energy and/or information. We consider this selfish nature of the relay nodes in this article and propose to design an effective incentive mechanism to improve the system AoI performance. In our proposed mechanism, the access point (AP) coordinates auxiliary nodes to charge the sensor node over RF until a sufficient amount of energy is harvested by the sensor node; then, the sensor node sends real-time status update information to the AP. Unlike a similar work [22] that uses a vague incentive mechanism and assumes the sensor node transmits packets directly to the AP, we use spectrum priority as the incentive for the auxiliary nodes to participate in the charging process, and use the auxiliary node as the relay on the route of the status update packets from the sensor node to the AP. For such a system, utility functions are defined based on AoI for the AP and auxiliary nodes, respectively, and the utility maximization problems are

formulated. In order to achieve a win–win benefit, we apply the Stackelberg game model to design the effective cooperation between the AP-sensor pair and auxiliary node.

We make the following key contributions in this article.

First, we extend the previous work on WPCN to the more realistic scenarios of relay-assisted WPCN with selfish auxiliary nodes. We describe the system model and propose a protocol to encourage the cooperation between AP and auxiliary node to keep information fresh.

Second, we introduce utility functions for the AP–sensor pair and the auxiliary node and then establish a Stackelberg game in order to improve the system's performance. To solve the non-convex utility maximization problem, we use the Lagrangian method based on a new slack variable and obtain the optimal solution in closed form.

Third, we conduct numerical simulations to show that a larger transmit power of the AP results in a smaller AoI and less dependency on the location of the auxiliary nodes. In addition, when the distance between the AP and the sensor exceeds a certain threshold, employing the relay achieves better AoI performance than non-relaying systems.

The rest of this article is organized as follows. We elaborate on the system model of the relay-assisted WPCN in the next section and then formulate the AoI-aware utility maximization problem in Section 3. Our proposed solution to this problem is explained in detail in Section 4. The simulation results are reported in Section 5 before we conclude the article.

#### **2. System Model**

Consider a WPT-driven network, as illustrated in Figure 1, which includes an AP, multiple auxiliary nodes, and a sensor node with limited energy. In this system, the AP needs to collect status update information from the sensor, and this is performed in two stages: first, the AP broadcasts the accessible bandwidth resources to the auxiliary nodes as an incentive to participate in charging the sensor, and each auxiliary node decides whether to participate in powering the sensor according to the received incentive. When the sensor has harvested enough energy, it uses the accumulated energy to send the status update to the AP. In the network, the frequency band authorized by the AP is fixed, so multiple auxiliary nodes compete to act as the helping node and only one auxiliary node is selected to help charge the sensor node. The selected auxiliary node is allowed to use the bandwidth resource and can act as a relay.

We use *k* to denote the *k*-th auxiliary node, where *k* ∈ {1, . . . , *K*}. We use *i* to denote the index of the data packet, where *i* ∈ {1, 2, . . . , *I*}. We assume that the AP has a steady power supply and that energy and information are transmitted through orthogonal channels to avoid any interference. We further assume that all wireless links are subject to additive white Gaussian noise.

**Figure 1.** Illustration of the relay-assisted WPCN model.

#### *2.1. Energy Harvesting Model*

Let *PAP* and *Pk* denote the transmit power of AP and the *k*-th auxiliary node, respectively. Let *ha*,*d*(*t*) and *hk*,*d*(*t*) be the wireless channel gain between the AP and the sensor and between the *k*-th auxiliary node and the sensor at time *t*, respectively. Let *b*[*i*] be the time when the *i*-th packet starts transmission. We assume that the sensor's battery capacity is *Bs* Joule. Once the battery is fully charged, the sensor node is triggered to send a newly generated packet with the harvested energy. It is well known that energy is the integral of power over time. Considering the energy harvesting efficiency and our system model, the accumulated energy by RF-based EH for data packet *i* can be expressed as

$$E\_h^{[i]} = \int\_{b^{[i-1]}}^{b^{[i]}} \eta \left( P\_{AP} |h\_{a,d}(t)|^2 + P\_k |h\_{k,d}(t)|^2 \right) dt,\tag{1}$$

where *η* ∈ (0, 1) denotes the energy harvesting efficiency. Note that *b*<sup>[*i*]</sup> is the time when the battery is fully charged; therefore, we have

$$B\_{\mathfrak{s}} = E\_{h}^{[i]}.\tag{2}$$

We partition the time into *I* subintervals at time instants, i.e., 0 = *b*[0] < *b*[1] < ... < *b*[*i*−1] < *b*[*i*] < ... < *b*[*I*] , where the *i*-th subinterval [*b*[*i*−1] , *b*[*i*] ] is the time that the sensor harvests energy for information packet *i*. The length of this subinterval is

$$T\_s^{[i]} = b^{[i]} - b^{[i-1]}.\tag{3}$$
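Under Equations (1)–(3), if the channel power gains are approximated as constant over a charging subinterval, the integral collapses and the harvesting time has a closed form. The following is a small sketch of this simplification; the constant-gain assumption, the function name, and its arguments are ours, not the paper's.

```python
def charging_time(B_s, eta, P_AP, g_ad, P_k, g_kd):
    """Length T_s of a charging subinterval, obtained from B_s = E_h^{[i]}
    with constant channel power gains g_ad = |h_{a,d}|^2 and
    g_kd = |h_{k,d}|^2 (our simplifying assumption)."""
    harvested_power = eta * (P_AP * g_ad + P_k * g_kd)  # inner term of Eq. (1)
    return B_s / harvested_power
```

For instance, with B_s = 1 mJ, *η* = 0.5, unit transmit powers, and both received power gains equal to 10<sup>−2</sup>, the sensor needs 0.1 s to fill its battery.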

Let *d*[*i*] denote the arriving time of the *i*-th information packet at the AP, i.e., the time when the packet *i* transmission is completed. Thus, the transmitting time of the *i*-th information packet is expressed by

$$T\_t^{[i]} = d^{[i]} - b^{[i]}.\tag{4}$$

Once the *i*-th information packet arrives at the AP, the sensor node can start generating the (*i* + 1)-th packet if the status information becomes available. Therefore, we have the following:

$$T\_t^{[i]} \le T\_s^{[i+1]}.\tag{5}$$
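The timing relations (3)–(5) can be sketched as a small helper (an illustrative sketch; the function and argument names are ours, not from the paper):

```python
def timing_intervals(b, d):
    """b[i] = instant packet i starts transmission (battery full), with b[0] = 0;
    d[i] = instant packet i arrives at the AP (d[0] is unused, set to 0).
    Returns the harvesting times T_s[i] (Eq. (3)) and transmission times T_t[i] (Eq. (4))."""
    T_s = [b[i] - b[i - 1] for i in range(1, len(b))]  # harvesting time for packet i
    T_t = [d[i] - b[i] for i in range(1, len(b))]      # transmission time of packet i
    # Feasibility check of Eq. (5): packet i must be delivered before battery i+1 is full
    assert all(T_t[i] <= T_s[i + 1] for i in range(len(T_t) - 1))
    return T_s, T_t
```

For example, `timing_intervals([0, 2, 5], [0, 3, 6])` returns harvesting times `[2, 3]` and transmission times `[1, 1]`, which satisfy constraint (5).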

The auxiliary relay node has a single antenna and adopts the decode-and-forward relaying strategy; that is, it decodes the received information before re-encoding and forwarding it. The relay node is equipped with an information decoder, an encoder, and energy storage. The length of the information packet is denoted by *L*.

Let *T*<sub>*t*1</sub><sup>[*i*]</sup> and *T*<sub>*t*2</sub><sup>[*i*]</sup> be the transmission times of the *i*-th packet from the sensor to the *k*-th auxiliary node and from the *k*-th auxiliary node to the AP, respectively. According to Shannon theory, the rate is determined by the signal-to-noise ratio (SNR) and the bandwidth, so the transmission times can be expressed as

$$T\_{t1}^{[i]} = \frac{L}{W \log(1 + \gamma\_1)} \text{ and } T\_{t2}^{[i]} = \frac{L}{W \log(1 + \gamma\_2)},\tag{6}$$

where *W* represents the bandwidth, and *γ*<sub>1</sub> and *γ*<sub>2</sub> are the received SNRs of the two communication links, namely the link between the sensor and the *k*-th auxiliary node and the link between the *k*-th auxiliary node and the AP, respectively.

Since the AP can also collect the update packet directly from the sensor node, the total transmission time of the *i*-th update packet from the sensor to the AP is

$$T\_t^{[i]} = \min\left(\frac{L}{W\log(1+\gamma\_1)} + \frac{L}{W\log(1+\gamma\_2)}, \frac{L}{W\log(1+\gamma)}\right). \tag{7}$$

Finally, the SNRs are given by

$$\gamma = \frac{P\_d |h\_{d,a}(t)|^2}{N\_0 W}, \quad \gamma\_1 = \frac{P\_d |h\_{d,k}(t)|^2}{N\_0 W} \text{ and } \gamma\_2 = \frac{P\_k' |h\_{k,a}(t)|^2}{N\_0 W},\tag{8}$$

where |*h*<sub>*d*,*a*</sub>(*t*)|<sup>2</sup>, |*h*<sub>*d*,*k*</sub>(*t*)|<sup>2</sup> and |*h*<sub>*k*,*a*</sub>(*t*)|<sup>2</sup> denote the wireless channel power gains between the sensor and the AP, between the sensor and the auxiliary node *k*, and between the auxiliary node *k* and the AP, respectively. *N*<sub>0</sub> represents the noise spectral density. *P*<sub>*d*</sub> and *P*′<sub>*k*</sub> are the transmit power of the sensor and the *k*-th auxiliary node, respectively.
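Equations (6)–(8) can be sketched numerically as follows. This is an illustrative sketch only: the paper writes log without a base, so we assume base 2 (rate in bits per second), and the argument names (`P_k_tx` for the auxiliary node's information-transmit power *P*′<sub>*k*</sub>, etc.) are ours.

```python
import math

def transmit_time(L, W, P_d, P_k_tx, h_da, h_dk, h_ka, N0):
    """Total transmission time of one update packet, Eq. (7): the minimum of
    the two-hop relay path and the direct sensor-to-AP link.
    Assumes log base 2 in the Shannon rate; values are illustrative."""
    noise = N0 * W
    gamma = P_d * abs(h_da) ** 2 / noise      # direct-link SNR, Eq. (8)
    gamma1 = P_d * abs(h_dk) ** 2 / noise     # sensor -> auxiliary node
    gamma2 = P_k_tx * abs(h_ka) ** 2 / noise  # auxiliary node -> AP
    t_relay = L / (W * math.log2(1 + gamma1)) + L / (W * math.log2(1 + gamma2))
    t_direct = L / (W * math.log2(1 + gamma))
    return min(t_relay, t_direct)
```

Note that when all link SNRs are equal, the direct link wins, since the relay path pays for two hops.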

#### *2.2. AoI Modeling*

The AoI of the current information packet, i.e., packet *i*, at time *t* is described by

$$
\Delta^{[i]}(t) = t - \mathcal{U}^{[i]}(t), \tag{9}
$$

where *U*<sup>[*i*]</sup>(*t*) represents the generation time of the latest update packet received by the AP, i.e.,

$$\mathcal{U}^{[i]}(t) = \max \left\{ b^{[j]} \Big| d^{[j]} \le t \right\}. \tag{10}$$

Figure 2 illustrates the evolution of the AoI over time. As seen from the figure, while the destination node receives no state update packet, the AoI increases linearly with time, producing a sawtooth shape. When the destination node receives a new data packet, the AoI is reset to the delay that the status update experienced. In the time interval [*d*<sup>[*i*]</sup>, *d*<sup>[*i*+1]</sup>], the integral of the AoI is the area under the Δ<sup>[*i*]</sup>(*t*) curve. Therefore, the average AoI is expressed by

$$\bar{\Delta}^{[i]} = \frac{1}{d^{[i+1]} - d^{[i]}} \int\_{d^{[i]}}^{d^{[i+1]}} \Delta^{[i]}(t)dt. \tag{11}$$

**Figure 2.** Evolution of the AoI.

The integral term in the above formula equals the area *Q*<sub>[*i*]</sub>,

$$\int\_{d^{[i]}}^{d^{[i+1]}} \Delta^{[i]}(t)dt = Q\_{[i]}, \tag{12}$$

which is expressed by

$$Q\_{[i]} = \frac{\left(T\_t^{[i]} + T\_s^{[i+1]} + T\_t^{[i+1]}\right)\left(T\_s^{[i+1]} - T\_t^{[i]} + T\_t^{[i+1]}\right)}{2}.\tag{13}$$

Therefore, the average AoI of the *i*-th data packet is expressed by

$$\bar{\Delta}^{[i]} = \frac{Q\_{[i]}}{d^{[i+1]} - d^{[i]}} = \frac{1}{2} \left( T\_t^{[i]} + T\_s^{[i+1]} + T\_t^{[i+1]} \right). \tag{14}$$

The energy harvesting times and packet transmission times are independent, so they can be regarded as independent and identically distributed variables. Therefore, in the long run the system reaches a quasi-stationary state, which guarantees *T*<sub>*s*</sub><sup>[*i*+1]</sup> = *T*<sub>*s*</sub><sup>[*i*]</sup> and *T*<sub>*t*</sub><sup>[*i*+1]</sup> = *T*<sub>*t*</sub><sup>[*i*]</sup>. Consequently, the average AoI is given by

$$
\bar{\Delta} = T\_t + \frac{1}{2} T\_s. \tag{15}
$$
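The agreement between the per-packet area expression (13)–(14) and the quasi-stationary form (15) can be checked numerically; a minimal sketch (function names are ours):

```python
def average_aoi(T_t, T_s):
    """Quasi-stationary average AoI of Eq. (15): T_t + T_s / 2."""
    return T_t + 0.5 * T_s

def average_aoi_by_area(T_t_i, T_s_next, T_t_next):
    """Per-packet average AoI from the trapezoid area Q_[i], Eqs. (13)-(14)."""
    Q = (T_t_i + T_s_next + T_t_next) * (T_s_next - T_t_i + T_t_next) / 2
    d_gap = T_s_next + T_t_next - T_t_i  # d[i+1] - d[i]
    return Q / d_gap
```

With, e.g., a transmission time of 2 and a harvesting time of 6, both expressions give an average AoI of 5 when the per-packet times are identically distributed.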

#### **3. Problem Formulation**

In the system described above, an auxiliary node may be selfish and unwilling to charge the sensor because of its own limited energy. To motivate the auxiliary nodes to participate in charging the sensor node, the AP may use priority spectrum access as an incentive. In this section, we define the utility functions of the AP and the auxiliary node and design a Stackelberg game to build an effective cooperation between the auxiliary node and the AP–sensor pair. The basic idea of a Stackelberg game is that one side is the leader and the other is the follower: the leader acts first, and the follower chooses its own action according to the leader's strategy [23]. Each side aims to maximize its own utility.

Specifically, first, the AP offers a certain bandwidth as the incentive for the auxiliary nodes to participate in charging. Second, the auxiliary nodes optimize their transmit power based on the incentive. Third, the AP optimizes its transmit power and then transfers energy to the sensor together with the selected auxiliary node. Finally, the sensor uses the harvested energy to send update packets to the AP. The flow chart of the system is illustrated in Figure 3: the energy flow is shown in green on the right, and the update packet transmission flow is shown in red on the left.

**Figure 3.** Overview of the incentive-based update packet collection system.

#### *3.1. Utilities of the Auxiliary Node*

Let Γ*k*(*x*) denote the cost of the *k*-th auxiliary node to transmit energy to the sensor and information to the AP at power level *x*. Similar to the work in [24], we can model this cost as

$$
\Gamma\_k(x) = a\_k x^2 - b\_k x + c\_k, \tag{16}
$$

where *a*<sub>*k*</sub>, *b*<sub>*k*</sub>, and *c*<sub>*k*</sub> are predetermined parameters related to auxiliary node *k*. If node *k* transmits to the sensor and to the AP at power levels *P*<sub>*k*</sub> and *P*′<sub>*k*</sub>, respectively, the utility of auxiliary node *k* can be expressed by

$$U\_k(P\_k, P\_k', T\_s, T\_s') = \alpha B - T\_s\Gamma\_k(P\_k) - T\_s'\Gamma\_k(P\_k'),\tag{17}$$

where *αB* is the incentive from the AP, with *α* the conversion factor between bandwidth and revenue; *T*<sub>*s*</sub> is the energy harvesting time and *T*′<sub>*s*</sub> is the transmission time from node *k* to the AP.

The auxiliary node seeks to maximize its utility. Therefore, the maximization problem P<sub>k</sub> is expressed by

$$\begin{array}{l} \text{P}\_k: \max\_{P\_k, P\_k'} U\_k\left(P\_k, P\_k', T\_s, T\_s'\right) \\ \text{s.t. } 0 < P\_k, P\_k' \le P\_k^{\max}, \end{array} \tag{18}$$

where *P*<sub>*k*</sub><sup>max</sup> is the power threshold of the *k*-th auxiliary node.

#### *3.2. Utilities of the AP*

As mentioned earlier, the ratio factor of revenue to bandwidth is defined as *α*, so the overhead of the AP is given by

$$
\Xi = \alpha B.\tag{19}
$$

Therefore, the utility associated with the sensor-AP pair is given by

$$
U\_{AP}^{(0)} = \tilde{U}\_{AP} - \mu \bar{\Delta} - \omega T\_s P\_{AP} |h\_{a,d}|^2 - \Xi,\tag{20}
$$

where *Ũ*<sub>*AP*</sub> is a pre-defined constant, and *μ* > 0 and *ω* > 0 are the cost coefficients associated with the AoI and with energy, respectively.

Then, the AoI-based utility maximization problem is formulated as

$$\begin{array}{l} \text{P}\_{AP}^{(1)}: \max\_{P\_{AP}, T\_s} U\_{AP}^{(0)}(B, T\_s, P\_{AP}, P\_k) \\ \text{s.t. } B\_s \le E\_h; \; T\_t \le T\_s; \; B \ge 0; \; 0 \le P\_{AP} \le P\_{AP}^{\max}, \end{array} \tag{21}$$

where *P*<sub>*AP*</sub><sup>max</sup> is the maximum available power of the AP. As *Ũ*<sub>*AP*</sub> is a constant, P<sub>AP</sub><sup>(1)</sup> can be transformed into P<sub>AP</sub><sup>(2)</sup>, i.e.,

$$\begin{array}{l} \text{P}\_{AP}^{(2)}: \min\_{P\_{AP}, T\_s} \Pi\_{AP}(B, T\_s, P\_{AP}, P\_k) \\ \text{s.t. } B\_s \le E\_h; \; T\_t \le T\_s; \; B \ge 0; \; 0 \le P\_{AP} \le P\_{AP}^{\max}, \end{array} \tag{22}$$

where

$$\begin{aligned} \Pi\_{AP}(B, T\_s, P\_{AP}, P\_k) &= \mu \bar{\Delta} + \omega T\_s P\_{AP} |h\_{a,d}|^2 + \Xi \\ &= \mu T\_t + \frac{1}{2} \mu T\_s + \omega T\_s P\_{AP} |h\_{a,d}|^2 + \alpha B. \end{aligned} \tag{23}$$

As *μ*, *T*<sub>*t*</sub>, *α*, and *B* are fixed values, problem P<sub>AP</sub><sup>(2)</sup> can be transformed into P<sub>AP</sub><sup>(3)</sup>, i.e.,

$$\begin{array}{l} \text{P}\_{AP}^{(3)}: \min\_{P\_{AP}, T\_s} \frac{1}{2} \mu T\_s + \omega T\_s P\_{AP} |h\_{a,d}|^2 \\ \text{s.t. } B\_s \le E\_h; \; T\_t \le T\_s; \; 0 \le P\_{AP} \le P\_{AP}^{\max}. \end{array} \tag{24}$$

#### **4. Solution Method**

We elaborate our proposed method and the derived solution for the above utility optimization problem.

#### *4.1. Optimization of P<sub>k</sub> and P′<sub>k</sub> with a Given* {*P<sub>AP</sub>*, *T<sub>s</sub>*}

As mentioned above, problem P<sub>k</sub> is expressed by

$$\begin{array}{l} \text{P}\_k: \max\_{P\_k, P\_k'} U\_k\left(P\_k, P\_k', T\_s, T\_s'\right) \\ \text{s.t. } 0 < P\_k, P\_k' \le P\_k^{\max}, \end{array} \tag{25}$$

where

$$U\_k(P\_k, P\_k', T\_s, T\_s') = \alpha B - T\_s \Gamma\_k(P\_k) - T\_s' \Gamma\_k(P\_k').\tag{26}$$

By substituting the cost function expression into the above problem, problem Pk can be expressed as follows,

$$\begin{array}{l} \max\_{P\_k, P\_k'} \left\{ -a\_k T\_s P\_k^2 + b\_k T\_s P\_k - a\_k T\_s' P\_k'^2 + b\_k T\_s' P\_k' + \alpha B - c\_k T\_s - c\_k T\_s' \right\} \\ \text{s.t. } 0 < P\_k, P\_k' \le P\_k^{\max}. \end{array} \tag{27}$$

**Lemma 1.** *For a given* {*PAP*, *Ts*}*, the optimal solution to Problem* Pk *is*

$$\begin{cases} P\_k^\* = \frac{b\_k}{2a\_k}, \\ P\_k'^\* = \frac{b\_k}{2a\_k}. \end{cases} \tag{28}$$

**Proof.** The objective function is a quadratic function of two independent variables whose second-order derivatives are negative, so it is concave. The maximizer is therefore obtained directly from the vertex of each quadratic term.
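Lemma 1 can be sanity-checked numerically. The sketch below uses illustrative parameter values and assumes the vertex *b*<sub>*k*</sub>/(2*a*<sub>*k*</sub>) lies inside (0, *P*<sub>*k*</sub><sup>max</sup>], as the lemma implicitly does; function names are ours.

```python
def optimal_aux_power(a_k, b_k):
    """Vertex of the concave quadratic in Eq. (27): P_k* = b_k / (2 a_k), per Lemma 1."""
    return b_k / (2 * a_k)

def utility_k(P_k, P_k2, a_k, b_k, c_k, alpha_B, T_s, T_s2):
    """Auxiliary node utility, Eq. (17), with the quadratic cost of Eq. (16).
    P_k2 and T_s2 stand for P_k' and T_s'."""
    cost = lambda x: a_k * x ** 2 - b_k * x + c_k
    return alpha_B - T_s * cost(P_k) - T_s2 * cost(P_k2)
```

A grid search around the vertex confirms that no other power level yields a higher utility.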

#### *4.2. Optimization of P<sub>AP</sub> and T<sub>s</sub> with a Given* {*P<sub>k</sub>*, *P′<sub>k</sub>*}

In problem P<sub>AP</sub><sup>(3)</sup>, the two variables are multiplicatively coupled, so the problem is non-convex. To tackle this, we introduce a slack variable *π* = *T*<sub>*s*</sub> · *P*<sub>*AP*</sub>. The problem can then be transformed by variable substitution into the following:

$$\begin{array}{l} \text{P}\_{AP}^{(4)}: \min\_{T\_s, \pi} \frac{1}{2} \mu T\_s + \omega |h\_{a,d}|^2 \pi \\ \text{s.t. } -T\_s + T\_t \le 0 \\ \quad\;\; -T\_s + \frac{1}{P\_{AP}^{\max}} \pi \le 0 \\ \quad\;\; -T\_s - \frac{|h\_{a,d}|^2}{P\_k |h\_{k,d}|^2} \pi + \frac{B\_s}{\eta P\_k |h\_{k,d}|^2} \le 0. \end{array} \tag{29}$$
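As a quick sanity check of this substitution, the feasible sets of the original and transformed problems coincide. The following is an illustrative sketch with flat channels, where `h_kd2` and `h_ad2` stand for |*h*<sub>*k*,*d*</sub>|<sup>2</sup> and |*h*<sub>*a*,*d*</sub>|<sup>2</sup>; names and test values are ours.

```python
def p3_feasible(T_s, P_AP, T_t, P_AP_max, B_s, eta, P_k, h_kd2, h_ad2):
    """Feasibility in the original variables of P_AP^(3): harvested energy covers
    the battery, the packet fits in the slot, and AP power is within its limit."""
    E_h = eta * (P_AP * h_ad2 + P_k * h_kd2) * T_s
    return T_t <= T_s and 0 <= P_AP <= P_AP_max and B_s <= E_h

def p4_feasible(T_s, pi, T_t, P_AP_max, B_s, eta, P_k, h_kd2, h_ad2):
    """Same feasible set after the slack substitution pi = T_s * P_AP (P_AP^(4))."""
    return (-T_s + T_t <= 0
            and -T_s + pi / P_AP_max <= 0
            and -T_s - (h_ad2 / (P_k * h_kd2)) * pi + B_s / (eta * P_k * h_kd2) <= 0)
```

For any positive point (*T*<sub>*s*</sub>, *P*<sub>*AP*</sub>), the two checks agree once *π* is set to *T*<sub>*s*</sub>*P*<sub>*AP*</sub>.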

As both the objective function and the constraints of P<sub>AP</sub><sup>(4)</sup> are linear, it is a convex problem, and the corresponding Lagrangian is given by

$$\begin{split} L(T\_s, \pi, \lambda) &= \frac{1}{2} \mu T\_s + \omega |h\_{a,d}|^2 \pi + \lambda\_1 (-T\_s + T\_t) + \lambda\_2 \left( -T\_s + \frac{1}{P\_{AP}^{\max}} \pi \right) \\ &\quad + \lambda\_3 \left( -T\_s - \frac{|h\_{a,d}|^2}{P\_k |h\_{k,d}|^2} \pi + \frac{B\_s}{\eta P\_k |h\_{k,d}|^2} \right). \end{split} \tag{30}$$

The Karush–Kuhn–Tucker (KKT) conditions of the problem are

$$\frac{1}{2}\mu - \lambda\_1 - \lambda\_2 - \lambda\_3 = 0,\tag{31a}$$

$$
\omega \left| h\_{a,d} \right|^2 + \frac{1}{P\_{AP}^{\text{max}}} \lambda\_2 - \frac{|h\_{a,d}|^2}{P\_k |h\_{k,d}|^2} \lambda\_3 = 0,\tag{31b}
$$

$$
\lambda\_1 (-T\_s + T\_t) = 0,\tag{31c}
$$

$$
\lambda\_2 \left( -T\_s + \frac{1}{P\_{AP}^{\text{max}}} \pi \right) = 0,\tag{31d}
$$

$$
\lambda\_3 \left( -T\_s - \frac{|h\_{a,d}|^2}{P\_k |h\_{k,d}|^2} \pi + \frac{B\_s}{\eta P\_k |h\_{k,d}|^2} \right) = 0,
\tag{31e}
$$

$$
\lambda\_1 \ge 0,\tag{31f}
$$

$$
\lambda\_2 \ge 0,\tag{31g}
$$

$$
\lambda\_3 \ge 0,\tag{31h}
$$

$$-T\_s + T\_t \le 0,\tag{31i}$$

$$-T\_s + \frac{1}{P\_{AP}^{\max}}\pi \le 0,\tag{31j}$$

$$-T\_s - \frac{|h\_{a,d}|^2}{P\_k |h\_{k,d}|^2} \pi + \frac{B\_s}{\eta P\_k |h\_{k,d}|^2} \le 0. \tag{31k}$$

In (31b), the first term is strictly positive and the second is nonnegative, so *λ*<sub>3</sub> ≠ 0; otherwise, (31b) would not hold.

It can be seen from (31a) and (31b) that *λ*<sub>1</sub> and *λ*<sub>2</sub> cannot both be 0; otherwise, *λ*<sub>3</sub> would take two different values.

From (31c) and (31d), one can see that *λ*<sub>1</sub> and *λ*<sub>2</sub> cannot both be nonzero; otherwise, *T*<sub>*s*</sub> would take two different values.

With the above observations, two cases remain for the solution of the optimization problem.

Case 1: *λ*<sub>1</sub> = 0, *λ*<sub>2</sub> ≠ 0 and *λ*<sub>3</sub> ≠ 0; the optimal solution to P<sub>AP</sub><sup>(4)</sup> is

$$\begin{cases} T\_s^\* = \frac{B\_s}{\eta \left( P\_k |h\_{k,d}|^2 + P\_{AP}^{\max} |h\_{a,d}|^2 \right)}, \\ \pi^\* = \frac{B\_s P\_{AP}^{\max}}{\eta \left( P\_k |h\_{k,d}|^2 + P\_{AP}^{\max} |h\_{a,d}|^2 \right)}, \\ P\_{AP}^\* = P\_{AP}^{\max}. \end{cases} \tag{32}$$

Case 2: *λ*<sub>1</sub> ≠ 0, *λ*<sub>2</sub> = 0 and *λ*<sub>3</sub> ≠ 0; the optimal solution to P<sub>AP</sub><sup>(4)</sup> is

$$\begin{cases} T\_s^\* = T\_t, \\ \pi^\* = \left( \frac{B\_s}{\eta P\_k |h\_{k,d}|^2} - T\_t \right) \frac{P\_k |h\_{k,d}|^2}{|h\_{a,d}|^2}, \\ P\_{AP}^\* = \frac{\pi^\*}{T\_t}. \end{cases} \tag{33}$$

Comparing the two cases through their constraints, Case 1 applies when its solution satisfies (31i), and Case 2 applies otherwise. That is, the following formula is the decision condition:

$$\frac{B\_s}{\eta \left( P\_k |h\_{k,d}|^2 + P\_{AP}^{\max} |h\_{a,d}|^2 \right)} \ge T\_t, \tag{34}$$

where the expression of *T*<sub>*t*</sub> is given by (7).

**Theorem 1.** *The optimal solution to* P<sub>AP</sub><sup>(4)</sup> *is expressed by*

$$\begin{cases} T\_s^\* = \begin{cases} \frac{B\_s}{\eta\left(P\_k|h\_{k,d}|^2 + P\_{AP}^{\max}|h\_{a,d}|^2\right)}, & \text{if } \frac{B\_s}{\eta\left(P\_k|h\_{k,d}|^2 + P\_{AP}^{\max}|h\_{a,d}|^2\right)} \ge T\_t \\ T\_t, & \text{otherwise} \end{cases} \\ \pi^\* = \begin{cases} \frac{B\_s P\_{AP}^{\max}}{\eta\left(P\_k|h\_{k,d}|^2 + P\_{AP}^{\max}|h\_{a,d}|^2\right)}, & \text{if } \frac{B\_s}{\eta\left(P\_k|h\_{k,d}|^2 + P\_{AP}^{\max}|h\_{a,d}|^2\right)} \ge T\_t \\ \left(\frac{B\_s}{\eta P\_k|h\_{k,d}|^2} - T\_t\right)\frac{P\_k|h\_{k,d}|^2}{|h\_{a,d}|^2}, & \text{otherwise} \end{cases} \\ P\_{AP}^\* = \begin{cases} P\_{AP}^{\max}, & \text{if } \frac{B\_s}{\eta\left(P\_k|h\_{k,d}|^2 + P\_{AP}^{\max}|h\_{a,d}|^2\right)} \ge T\_t \\ \frac{\pi^\*}{T\_t}, & \text{otherwise.} \end{cases} \end{cases} \tag{35}$$

**Proof.** As problem P<sub>AP</sub><sup>(4)</sup> is convex, Theorem 1 follows from the KKT conditions.
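Theorem 1 translates directly into code. The following is an illustrative sketch (names are ours; `h_kd2` and `h_ad2` stand for the flat channel power gains |*h*<sub>*k*,*d*</sub>|<sup>2</sup> and |*h*<sub>*a*,*d*</sub>|<sup>2</sup>):

```python
def ap_optimal(B_s, eta, P_k, h_kd2, h_ad2, P_AP_max, T_t):
    """Closed-form solution of Theorem 1 for problem P_AP^(4).
    Returns (T_s*, P_AP*)."""
    # Charging time if the AP transmits at full power throughout, Eq. (32)
    T_full = B_s / (eta * (P_k * h_kd2 + P_AP_max * h_ad2))
    if T_full >= T_t:       # Case 1: condition (34) holds, charge at P_AP^max
        return T_full, P_AP_max
    # Case 2: transmission time binds, T_s = T_t; back out pi and P_AP, Eq. (33)
    T_s = T_t
    pi = (B_s / (eta * P_k * h_kd2) - T_t) * P_k * h_kd2 / h_ad2
    return T_s, pi / T_t
```

In both branches the harvested energy *η*(*P*<sub>*AP*</sub><sup>\*</sup>|*h*<sub>*a*,*d*</sub>|<sup>2</sup> + *P*<sub>*k*</sub>|*h*<sub>*k*,*d*</sub>|<sup>2</sup>)*T*<sub>*s*</sub><sup>\*</sup> exactly equals *B*<sub>*s*</sub>, consistent with the active energy constraint (*λ*<sub>3</sub> ≠ 0).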

#### **5. Simulation Results**

We now present numerical results to discuss the system performance in terms of the achievable AoI; the simulation parameters are set as follows. The distance between the AP and the sensor is set to 6 m. The auxiliary nodes are randomly placed between the AP and the sensor; unless varied for a specific experiment, the distance between each auxiliary node and the sensor is the same, namely 3 m. The channels are generated according to the Rayleigh distribution, and the path loss exponent is set to 2. The remaining simulation parameters are summarized in Table 1.


**Table 1.** Parameter list.

Figure 4 plots the minimized average AoI versus *P*<sub>*AP*</sub>. The AoI decreases gradually as *P*<sub>*AP*</sub> increases, because a larger AP transmit power shortens the time needed for the sensor's harvested energy to reach the battery capacity, which improves the freshness of the information. Beyond a certain range, however, the change of AoI with transmit power is no longer significant, because the average AoI becomes dominated by the packet transmission time. The figure also shows that the location of the auxiliary node affects the AoI: the closer the auxiliary node is to the sensor, the shorter the time it takes to charge the sensor and the smaller the AoI.

**Figure 4.** The minimized AoI versus *PAP*.

Figure 5 plots the minimized AoI versus the distance between the auxiliary node and the sensor. The closer the auxiliary node is deployed to the sensor, the lower the AoI. This observation is consistent with practice: the closer the energy-transmitting auxiliary node is to the sensor, the smaller the electromagnetic attenuation, so the sensor's energy collection time is reduced, which improves the AoI. In addition, the influence of the location on the AoI differs under different *P*<sub>*AP*</sub>: the smaller *P*<sub>*AP*</sub> is, the more pronounced the influence of the auxiliary node's location. When *P*<sub>*AP*</sub> is small, charging mainly depends on the auxiliary node, so its position is critical. When *P*<sub>*AP*</sub> is moderate or large, *P*<sub>*AP*</sub> becomes the main factor affecting the AoI, and the position of the auxiliary node has no obvious effect.

**Figure 5.** The minimized AoI versus *Dkd*.

Figure 6 plots the AP's maximum utility value versus the transmit power of the AP. The larger *P*<sub>*AP*</sub> is, the greater the utility value, which eventually levels off. The reason may be that, within a certain range, increasing *P*<sub>*AP*</sub> raises the cost of transmitting energy but makes the information fresher, and the AoI benefit outweighs the energy cost, so the AoI-based utility of the AP grows; beyond that range, the impact of *P*<sub>*AP*</sub> on the AoI is negligible and the increase in utility is not obvious. Moreover, the position of the auxiliary node also affects the utility value; the specific relationship is shown in the figure below.

**Figure 6.** The AP's utility value versus the *PAP*.

Figure 7 plots the utility of the AP versus the distance between the auxiliary node and the sensor. The shorter this distance, the greater the AP's utility value, and the smaller the AP transmit power, the more pronounced this effect. The reasoning is similar to that for Figure 5.

**Figure 7.** The AP's utility value versus the *Dak*.

Figure 8 plots the AoI versus the packet length. The larger the information packet, the larger the average AoI of the system, because the packet length directly affects the packet transmission time: larger packets take longer to transmit, which degrades the freshness of the information. In addition, the influence of packet length on the AoI varies with the location of the auxiliary node. The closer the auxiliary node is to the sensor, the smaller the influence of packet length on the AoI, because when the auxiliary node relays the sensor's packets to the AP, its transmit power is larger than that of the sensor node, so the same change in packet length has less impact on the transmission time.

**Figure 8.** The minimized AoI versus L.

Figure 9 depicts the AoI versus the distance between the AP and the sensor. As expected, the farther the distance, the larger the AoI. The figure also compares the AoI with and without an auxiliary node acting as relay. When the distance is greater than 13 m, relaying through the auxiliary node yields fresher information when the sensor transmits update packets to the AP. Thus, in a practical system, relays need not be activated within 13 m, which provides useful deployment guidance.

**Figure 9.** The minimized AoI versus *Dad*.

Note that in this paper, the AoI is calculated by using the current packet's transmission time to approximate that of the next packet, i.e., by approximating the next slot's channel with the current one. Figure 10 shows the relationship between the approximate and exact values of the AoI. The change in the approximate value lags the exact value by one time slot. Because of channel randomness, the approximate value may be larger or smaller than the true value: if the current slot's channel is better than the next slot's, the approximate AoI is slightly smaller than the true value; if it is worse, the approximate AoI is larger. On average, however, the difference between our modeling method and the true value is very small, which indirectly confirms the effectiveness of the modeling method.

**Figure 10.** Accurate AoI versus approximate AoI.

#### **6. Conclusions**

In this article, we studied a relay-assisted WPCN from an AoI perspective, focusing on the case where the auxiliary nodes are selfish. The main idea behind our proposed solution is an incentive scheme that encourages the auxiliary nodes to collaborate. We formulated the problem and used Stackelberg game theory to design an effective collaboration between the AP–sensor pair and the auxiliary node. More specifically, two utility functions, for the AP–sensor pair and for the auxiliary node, were formulated. As maximizing the utility of the AP–sensor pair was non-convex, we transformed it into a convex problem by introducing a slack variable and then solved it by the Lagrangian method to obtain optimal solutions in closed form. Simulation results showed that the larger the transmit power of the AP, the smaller the AoI and the weaker the influence of the auxiliary node's location on the AoI. In addition, when the distance from the AP to the sensor node exceeds a certain threshold, employing the relay achieves better AoI performance than a non-relaying system. These results provide insightful and practical guidance for the design of relay-assisted WPCNs in real life.

**Author Contributions:** N.L., K.X., H.Z. and Z.Z. equally contributed to this work on system modeling, methodology and simulation; N.L. also contributed to writing and editing; K.X. also contributed to project administration; G.Q. and P.F. contributed to the review; K.X. and Y.Z. contributed to funding acquisition. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported in part by the General Program of the National Natural Science Foundation of China (NSFC) (No. 62071033 and U1834210), in part by the Frontiers Science Center for Smart High-speed Railway System with the Fundamental Research Funds for the Central Universities (No. 2020JBZD010), and in part by the Self-developed project of State Grid Energy Research Institute Co., Ltd. (Electrical internet of Things Edge Computing Performance Analysis and Simulation Based on Typical Scenarios, No. SGNY202009014).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** We would like to thank all the reviewers for their constructive comments and helpful suggestions. In addition, we would like to give special thanks to Haina Zheng, who discussed the relative methods and also gave some helpful suggestions.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **A UoI-Optimal Policy for Timely Status Updates with Resource Constraint**

**Lehan Wang, Jingzhou Sun, Yuxuan Sun, Sheng Zhou \* and Zhisheng Niu**

Beijing National Research Center for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China; wang-lh19@mails.tsinghua.edu.cn (L.W.); sunjz18@mails.tsinghua.edu.cn (J.S.); sunyuxuan@tsinghua.edu.cn (Y.S.); niuzhs@tsinghua.edu.cn (Z.N.) **\*** Correspondence: sheng.zhou@tsinghua.edu.cn

**Abstract:** Timely status updates are critical in remote control systems such as autonomous driving and the industrial Internet of Things, where timeliness requirements are usually context dependent. Accordingly, the Urgency of Information (UoI) has been proposed beyond the well-known Age of Information (AoI) by further including context-aware weights which indicate whether the monitored process is in an emergency. However, the optimal updating and scheduling strategies in terms of UoI remain open. In this paper, we propose a UoI-optimal updating policy for timely status information with a resource constraint. We first formulate the problem as a constrained Markov decision process and prove that the UoI-optimal policy has a threshold structure. When the context-aware weights are known, we propose a numerical method based on linear programming. When the weights are unknown, we further design a reinforcement learning (RL)-based scheduling policy. The simulation reveals that the threshold of the UoI-optimal policy increases as the resource constraint tightens. In addition, the UoI-optimal policy outperforms the AoI-optimal policy in terms of average squared estimation error, and the proposed RL-based updating policy achieves near-optimal performance without advance knowledge of the system model.

**Keywords:** age of information; constrained Markov decision process; reinforcement learning; context-awareness; timely status updates

#### **1. Introduction**

With the development of 5G and the Internet of Things (IoT), requirements for wireless communication have shifted from merely providing communication channels to covering the entire process of various IoT applications, e.g., autonomous vehicles [1] and virtual reality (VR) [2], where sensing, communication, computation, and control form a closed loop. Therefore, in addition to the communication delay, it is necessary to consider the information delay counted from the generation of the state information to its execution, namely the timeliness of information. For this purpose, the Age of Information (AoI) has been proposed, defined as the time elapsed since the generation of the latest received packet [3]. Due to its concise definition and clear physical meaning, AoI has been widely used for the design of scheduling and updating policies in remote estimation [4–6] and wireless communication networks [7–12]. Most existing works focus on optimizing the average AoI or the peak age. In [13], the authors claim that minimizing average age cannot satisfy the requirements of ultra-reliable low-latency communication (URLLC) and study the tail distribution of AoI. The violation probability for peak age is derived in [14], and the stationary distribution of AoI is studied in [15].

Nevertheless, the AoI still has some limitations. First, it fails to measure the nonlinear performance degradation caused by information staleness; in [16–19], nonlinear age penalty functions were introduced to solve this problem. Meanwhile, the Age of Synchronization (AoS) [20] and the Age of Incorrect Information (AoII) [21] are defined to associate information freshness with the content of information. AoS is the time elapsed since the information at the receiver became desynchronized with the actual status of the monitored process. AoII is defined as the product of an increasing time penalty function and a penalty function of the estimation error. In addition, the statuses of heterogeneous data sources may change at different rates: a fast-changing process may require information with a lower age, yet age is independent of the changing rate and is therefore not suitable when heterogeneous data sources are considered jointly. To address this, weighted age was introduced in [22,23] to distinguish important monitored processes. In [24], a metric based on information theory is proposed as a replacement for the time-based metric AoI to characterize the changing rate. In [5], the authors argue that minimizing age is not equivalent to minimizing the estimation error in a remote estimation problem and propose an effective age to solve this problem [25].

**Citation:** Wang, L.; Sun, J.; Sun, Y.; Zhou, S.; Niu, Z. A UoI-Optimal Updating Policy for Timely Status Information with Resource Constraint. *Entropy* **2021**, *23*, 1084. https://doi.org/10.3390/e23081084

Academic Editors: Yin Sun and Anthony Ephremides

Received: 20 June 2021; Accepted: 16 August 2021; Published: 20 August 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Practical systems (e.g., V2X communication systems) may have different requirements for information freshness in different contexts, where the context refers to all environmental factors that affect the requirement for information freshness. Therefore, resources should be reserved for frequent status updates in emergencies to ensure safety.

However, the timeliness metrics mentioned above pay no attention to the significance of context information. To solve this problem, Urgency of Information (UoI) has been proposed in [26–28] to measure the influence of inaccurate information on performance under different contexts. To be specific, UoI uses a time-variant context-aware weight *ω*(*t*) to distinguish different contexts. A higher *ω*(*t*) indicates that the system is in more urgent situations (e.g., when a vehicle is approaching an intersection or overtaking) and therefore requires frequent updates. For example, when a vehicle passes through an intersection, the context-aware weight increases as the distance between the vehicle and the center of the intersection decreases. Meanwhile, the estimation error *Q*(*t*) is introduced to measure the information inaccuracy, which is defined as the difference between the actual status and the estimated status at the receiver. The larger the absolute value of *Q*(*t*) is, the less accurate the estimated status is. Therefore, UoI is defined as the product of context-aware weight and a cost function of the estimation error *Q*(*t*):

$$F(t) = \omega(t)\delta(Q(t)).\tag{1}$$

In discrete-time systems, the estimation error *Q*(*t*) is:

$$Q(t) = \sum\_{\tau=g(t)}^{t-1} A(\tau),\tag{2}$$

where *g*(*t*) is the generation time of the latest status update at the receiver and *A*(*t*) is the increment in estimation error in time slot *t*. Specifically, if the context-aware weight is time-invariant (i.e., *ω*(*t*) = 1), and *A*(*t*) = 1 as well as *δ*(*Q*(*t*)) = *Q*(*t*), UoI is the same as AoI. If the context-aware weight is process-dependent, UoI can represent weighted age. If the cost function *δ*(*Q*(*t*)) is nonlinear, UoI can represent the nonlinear age penalty function. For example, when the outdated information is worthless, e.g., the information is about sales that expire after some time [29], then the shifted unit step cost function *δ*(*Q*(*t*)) = *u*(*Q*(*t*) − *τ*), *τ* > 0 is recommended. For the unit step function, *u*(*x*) = 1 when *x* ≥ 0 and otherwise *u*(*x*) = 0.
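For illustration, the special cases above can be checked with a minimal Python sketch of Equations (1) and (2) (the parameter values and the helper names `uoi` and `estimation_error` are our own illustrative choices):

```python
def uoi(omega, q, delta):
    """Urgency of Information, Equation (1): F(t) = omega(t) * delta(Q(t))."""
    return omega * delta(q)

def estimation_error(increments, g, t):
    """Equation (2): Q(t) = sum_{tau = g(t)}^{t-1} A(tau)."""
    return sum(increments[g:t])

# Special case reducing UoI to AoI: omega(t) = 1, A(t) = 1, delta(Q) = Q.
increments = [1] * 10      # A(tau) = 1 in every slot
g, t = 3, 8                # latest received update was generated in slot 3
q = estimation_error(increments, g, t)
print(uoi(1.0, q, lambda x: x))            # -> 5.0, exactly the AoI t - g(t)

# Shifted unit-step cost for information that is worthless once it expires.
tau = 4
step = lambda x: 1.0 if x - tau >= 0 else 0.0
print(uoi(1.0, q, step))                   # -> 1.0: the update has expired
```

With a linear cost and unit weight the metric reproduces AoI, while the step cost captures expiring information, matching the special cases discussed above.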

In this work, we considered a single-user remote monitoring system, and the objective was to find an updating policy minimizing the average UoI over time under a constraint on the average update frequency. For this problem, Refs. [27,30] proposed update-index-based adaptive schemes built on Lyapunov optimization but did not conduct a theoretical analysis of their optimality. In addition, the constrained Markov decision process (CMDP) formulation was used only in simulations as a numerically solved benchmark. Building on these works, in this paper, we theoretically analyzed the structure of the UoI-optimal policy and focused on how to derive an updating policy in an unknown environment.

The main contributions of this paper are summarized as follows.


The rest of this paper is organized as follows. The system model and the problem formulation are described in Section 2. In Section 3, we obtain the CMDP formulation of the problem with the given distribution of context-aware weight and prove the threshold structure of the UoI-optimal policy. The proposed model-based RL updating policy is obtained in Section 4. In Section 5, the simulation results are shown and discussed while the conclusions are drawn in Section 6.

#### **2. System Model and Problem Formulation**

In this paper, we considered a remote monitoring system in which a fusion center collects status information (e.g., current location, velocity, and information about the surroundings) from a vehicle of interest via a wireless channel with limited resources, as shown in Figure 1. The whole system is modeled in discrete time, and a status update can be generated at will. Due to the limitations on wireless resources and energy supply, there is a constraint on the average update frequency of the vehicle. The update decision in time slot *t* is denoted by *U*(*t*) ∈ {0, 1}, where *U*(*t*) = 1 means that the vehicle decides to transmit the current status to the center, and *U*(*t*) = 0 means that the vehicle decides to stay idle.

The wireless channel is assumed to be a block fading channel with successful transmission probability *ps*. Let *S*(*t*) ∈ {0, 1} be the state of the channel: *S*(*t*) = 0 indicates that the channel is in deep fading and no packet can be successfully transmitted, while *S*(*t*) = 1 means that packets can be successfully transmitted to the center through the channel. If the center receives an update, then *U*(*t*)*S*(*t*) = 1 and an ACK is sent to the vehicle.

Let *x*(*t*) and *x*ˆ(*t*) denote the current status of the monitored vehicle and the estimated status of the vehicle at the center, and *Q*(*t*) = *x*(*t*) − *x*ˆ(*t*) denotes the estimation error. Similar to [26], we further assume that the time period of a packet transmission is less than a time slot and the estimation at the center equals the latest status information received by the center. This estimation scheme is easy to implement, theoretically tractable and has been proven to be an optimal policy that can minimize the average squared error of status estimation in a remote estimation system under energy constraints when the monitored process is a Wiener process [31]. Then, the recurrence relation of the estimation error *Q*(*t*) is:

$$Q(t+1) = (1 - \mathcal{U}(t)S(t))Q(t) + A(t). \tag{3}$$

Equation (3) indicates that the estimation error equals the accumulated variation of the monitored process from the generation time of the latest received status to the current time. The increment *A*(*t*) represents the variation of the monitored process. For example, when *A*(*t*) follows a Gaussian distribution with a mean of zero and variance of *σ*2, denoted by *N*(0, *σ*2), the monitored status follows a Wiener process. When *A*(*t*) takes values from {0, 1, −1} with probabilities {1 − 2*prw*, *prw*, *prw*}, where 0 < *prw* < 1/2, the status of the monitored source is a one-dimensional random walk. In this paper, we assumed that the monitored status of the vehicle is a Wiener process and that *A*(*t*) is i.i.d. over time. However, the increment in estimation error during a single slot cannot be infinite in practical systems. Therefore, in contrast to [27,30], we assumed that the increment *A*(*t*) obeys a truncated Gaussian distribution, i.e., the probability density function (PDF) of *A*(*t*) is:

$$f\_{A(t)}(a) = \frac{\frac{1}{\sigma} \phi\left(\frac{a - \mu}{\sigma}\right)}{\Phi\left(\frac{A\_{max} - \mu}{\sigma}\right) - \Phi\left(\frac{-A\_{min} - \mu}{\sigma}\right)},\tag{4}$$

where *μ* and *σ* are the expectation and standard deviation parameters of the increment *A*(*t*), and *φ* and Φ are the PDF and the cumulative distribution function (CDF) of the standard normal distribution. We also assumed *A*(*t*) ∈ [−*Amin*, *Amax*], with *Amax* = *Amin* > 0 and *μ* = 0.
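As a sanity check of the model, the truncated Gaussian increment and the recursion in (3) can be simulated as follows (the rejection sampler, the periodic toy policy, and all numeric values are our own illustrative assumptions, not part of the paper's scheme):

```python
import random

def sample_increment(sigma, a_max, rng):
    """Draw A(t) ~ N(0, sigma^2) truncated to [-a_max, a_max] via rejection."""
    while True:
        a = rng.gauss(0.0, sigma)
        if -a_max <= a <= a_max:
            return a

def step_error(q, u, s, a):
    """Recursion (3): Q(t+1) = (1 - U(t) S(t)) Q(t) + A(t)."""
    return (1 - u * s) * q + a

rng = random.Random(0)
p_s, sigma, a_max = 0.9, 1.0, 3.0
q = 0.0
for t in range(100):
    u = 1 if t % 5 == 0 else 0           # toy policy: update every 5 slots
    s = 1 if rng.random() < p_s else 0   # block-fading channel state S(t)
    q = step_error(q, u, s, sample_increment(sigma, a_max, rng))
print(f"|Q(100)| = {abs(q):.2f}")
```

Between successful updates the error behaves like a bounded-increment random walk, and each successful transmission resets it, which is the behavior exploited by the policies below.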

**Figure 1.** Remote control and monitoring model. The vehicle of interest is shown in red.

Meanwhile, the scheduling policy for information updates should also be related to the situation and environment of the system. For example, when the system is in an emergency, it is very sensitive to the accuracy and delay of the status information, so the status should be updated more frequently. Therefore, our objective is to find a policy that tells the vehicle whether to transmit its status information in each slot, so as to minimize the average UoI over time under the update-frequency constraint:

$$\begin{aligned} \min\_{\mathcal{U}(t)} & \limsup\_{T \to \infty} \frac{1}{T} \mathbb{E}\left[\sum\_{t=0}^{T-1} \omega(t)Q(t)^2\right] \\ & s.t. \limsup\_{T \to \infty} \frac{1}{T} \sum\_{t=0}^{T-1} \mathbb{E}[\mathcal{U}(t)] \le \rho, \end{aligned} \tag{5}$$

where *ω*(*t*) > 0 is the context-aware weight, which is independent of *Q*(*t*), and *ρ* ∈ (0, 1] is the maximum average update frequency. The cost function of the estimation error used here is *δ*(*Q*(*t*)) = (*Q*(*t*))2, inspired by the squared error of status estimation.
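To illustrate the trade-off in problem (5), the following Monte-Carlo sketch (the two-state weight distribution, both toy policies, and every parameter value are illustrative assumptions of ours) compares a context-blind randomized policy with a context-aware threshold policy:

```python
import random

def average_uoi(policy, T=20000, p_s=0.9, sigma=1.0, a_max=3.0, seed=1):
    """Estimate the objective of (5), (1/T) sum_t omega(t) Q(t)^2, by simulation."""
    rng = random.Random(seed)
    q, total, updates = 0.0, 0.0, 0
    for _ in range(T):
        w = rng.choice([1.0, 10.0])          # i.i.d. two-state context weight
        total += w * q * q                   # accumulate squared-error UoI
        u = policy(q, w, rng)
        updates += u
        s = 1 if rng.random() < p_s else 0   # channel state
        a = max(-a_max, min(a_max, rng.gauss(0.0, sigma)))  # crude truncation
        q = (1 - u * s) * q + a              # recursion (3)
    return total / T, updates / T

rho = 0.3
random_policy = lambda q, w, rng: 1 if rng.random() < rho else 0
threshold_policy = lambda q, w, rng: 1 if abs(q) > (1.2 if w > 1 else 2.4) else 0

uoi_rand, freq_rand = average_uoi(random_policy)
uoi_thr, freq_thr = average_uoi(threshold_policy)
print(f"random:    UoI {uoi_rand:.2f} at frequency {freq_rand:.2f}")
print(f"threshold: UoI {uoi_thr:.2f} at frequency {freq_thr:.2f}")
```

In such runs the error- and context-aware threshold policy typically attains a noticeably lower average UoI at a comparable update frequency, which is the kind of gain the optimal policies of Sections 3 and 4 formalize.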

#### **3. Scheduling with CMDP-Based Approach**

In this section, we start by formulating problem (5) into a constrained Markov decision process (CMDP) with assumptions on the distribution of the context-aware weight. We will prove the threshold structure of the UoI-optimal updating policy and derive the optimal policy through a linear programming (LP) formulation.

#### *3.1. Constrained Markov Decision Process Formulation*

In the remote monitoring system, the context may be related to the distance between adjacent vehicles/mobile devices, the unexpected maneuvers of neighboring vehicles, etc. In [32], the authors prove that the indicator of whether the distance between two mobile wireless devices with Ornstein–Uhlenbeck mobility is less than a certain threshold follows a first-order Markov process. When the two devices are closer, they are more interested in each other's status information, communication and computing resources in order to facilitate cooperation, share resources, and avoid collisions. At such times, the transmission of status information is more urgent than when the two devices are far apart. As for the unexpected maneuvers of neighboring vehicles, it is very challenging to find a proper formulation. Instead, we assumed that such emergencies occur independently in each slot according

to a certain probability. Therefore, in contrast to [27,30], we assumed that the context-aware weight *ω*(*t*) is either i.i.d. over time or a first-order irreducible positive recurrent Markov process, and we formulated problem (5) as a CMDP problem. The irreducible positive recurrent Markov formulation guarantees the existence of the UoI-optimal policy (see Appendix A). In this section, we first focus on the situation where *ω*(*t*) is a first-order Markov process:


$$\begin{aligned} \Pr\{s \to s'|\mathcal{U}\} &= \Pr\{ (Q, \omega) \to (Q', \omega')|\mathcal{U}\} \\ &= \begin{cases} p\_{\omega\omega'}\, p\_{Q'-Q}, & \mathcal{U} = 0, \\ p\_{\omega\omega'} \left((1-p\_s)p\_{Q'-Q} + p\_s p\_{Q'}\right), & \mathcal{U} = 1, \end{cases} \end{aligned} \tag{6}$$

where $p\_{\omega\omega'}$ is the transition probability of the context-aware weight from $\omega$ to $\omega'$ and $p\_{Q'-Q}$ is the probability that the increment equals $Q'-Q$; after a successful update, the error resets and then equals the new increment, which occurs with probability $p\_{Q'}$.

• One-step cost: The cost caused by taking action *U* in state (*Q*, *ω*) is:

$$C(Q, \omega, \mathcal{U}) = \omega Q^2, \tag{7}$$

while the one-step updating penalty only depends on the chosen action:

$$D(Q,\omega,\mathcal{U}) = \mathcal{U}.\tag{8}$$

The average cost incurred under a certain policy *π* is the average UoI, denoted by $\bar{C}^{\pi}$, and the average updating penalty under *π* is denoted by $\bar{D}^{\pi}$. We aim to find the UoI-optimal policy, which minimizes the average cost under the resource constraint. Therefore, problem (5) can be formulated as the following CMDP problem:

$$\begin{split} \min\_{\pi} \bar{C}^{\pi} &= \lim\_{T \to \infty} \frac{1}{T} \mathbb{E}\_{\pi} \left[ \sum\_{t=1}^{T} C\left( Q(t), \omega(t), \mathcal{U}(t) \right) \right] \\ \text{s.t. } \bar{D}^{\pi} &= \lim\_{T \to \infty} \frac{1}{T} \mathbb{E}\_{\pi} \left[ \sum\_{t=1}^{T} D(Q(t), \omega(t), \mathcal{U}(t)) \right] \leq \rho. \end{split} \tag{9}$$

#### *3.2. Threshold Structure of the Optimal Policy*

We start from some basic definitions in [33] and show the properties of problem (9).

**Definition 1.** *A stationary deterministic policy is a policy that takes the same action whenever in a given state s* = (*Q*, *ω*)*, while a stationary randomized policy chooses to update or not in state s with a certain probability.*

**Theorem 1.** *There exists an optimal stationary randomized policy for problem* (9)*. The optimal policy is a probabilistic combination of two stationary deterministic policies. The two deterministic policies differ in at most one state, and each policy minimizes the unconstrained cost in* (10) *with a different Lagrange multiplier λ:*

$$L\_{\lambda}^{\pi} = \lim\_{T \to \infty} \frac{1}{T} \mathbb{E}\_{\pi} \left[ \sum\_{t=1}^{T} \left[ C(Q(t), \omega(t), \mathcal{U}(t)) + \lambda D(Q(t), \omega(t), \mathcal{U}(t)) \right] \right]. \tag{10}$$

**Proof of Theorem 1.** The proof is shown in Appendix A.

We denote the optimal policy that minimizes the unconstrained cost in (10) with a given *λ* by $\pi^\*$ and the cost obtained under policy $\pi^\*$ by $L\_{\lambda}^{\pi^\*}$, namely $L\_{\lambda}^{\pi^\*} = \min\_{\pi} L\_{\lambda}^{\pi}$. Then, there exists a differential cost function *V*(*Q*, *ω*) that satisfies the Bellman Equation [34]:

$$\begin{split} V(Q,\omega) + L\_{\lambda}^{\pi^\*} = \min\Big\{ &C(Q,\omega,1) + \lambda D(Q,\omega,1) + (1-p\_s)\sum\_{\omega' \in W} p\_{\omega\omega'} \sum\_{a=-A\_m}^{A\_m} p\_a V(Q+a, \omega') \\ &+ p\_s \sum\_{\omega' \in W} p\_{\omega\omega'} \sum\_{a=-A\_m}^{A\_m} p\_a V(a, \omega'), \\ &C(Q,\omega,0) + \sum\_{\omega' \in W} p\_{\omega\omega'} \sum\_{a=-A\_m}^{A\_m} p\_a V(Q+a, \omega') \Big\}. \end{split} \tag{11}$$

To solve problem (5), we first prove that, for a given *λ*, the optimal stationary deterministic policy has a threshold structure. To this end, we introduce a discounted problem with a discount factor *α*; the discounted cost starting from state (*Q*, *ω*) under a certain policy *π* is:

$$J\_{\alpha,\pi}(Q,\omega) = \lim\_{T \to \infty} \mathbb{E}\_{\pi} \left[ \sum\_{t=0}^{T} \alpha^t \left[ C(Q(t), \omega(t), \mathcal{U}(t)) + \lambda D(Q(t), \omega(t), \mathcal{U}(t)) \right] \right]. \tag{12}$$

Denote the minimum discounted cost starting from state (*Q*, *ω*) by $V\_{\alpha}(Q,\omega) = \min\_{\pi} J\_{\alpha,\pi}(Q,\omega)$. Then, we have:

$$\begin{split} V\_{\alpha}(Q,\omega) = \min\Big\{ &C(Q,\omega,1) + \lambda D(Q,\omega,1) + (1-p\_s)\alpha \sum\_{\omega' \in W} p\_{\omega\omega'} \sum\_{a=-A\_m}^{A\_m} p\_a V\_{\alpha}(Q+a,\omega') \\ &+ p\_s \alpha \sum\_{\omega' \in W} p\_{\omega\omega'} \sum\_{a=-A\_m}^{A\_m} p\_a V\_{\alpha}(a,\omega'), \\ &C(Q,\omega,0) + \alpha \sum\_{\omega' \in W} p\_{\omega\omega'} \sum\_{a=-A\_m}^{A\_m} p\_a V\_{\alpha}(Q+a,\omega') \Big\}. \end{split} \tag{13}$$

Define Δ(*Q*, *ω*) as the difference between the value functions obtained by taking the two actions $\mathcal{U} = 0$ and $\mathcal{U} = 1$, namely:

$$\begin{split} \Delta(Q,\omega) &= C(Q,\omega,0) + \alpha \sum\_{\omega' \in W} p\_{\omega\omega'} \sum\_{a=-A\_m}^{A\_m} p\_a V\_{\alpha}\left(Q+a, \omega'\right) \\ &\quad- C(Q,\omega,1) - \lambda D(Q,\omega,1) - (1-p\_s)\alpha \sum\_{\omega' \in W} p\_{\omega\omega'} \sum\_{a=-A\_m}^{A\_m} p\_a V\_{\alpha}\left(Q+a, \omega'\right) \\ &\quad- p\_s \alpha \sum\_{\omega' \in W} p\_{\omega\omega'} \sum\_{a=-A\_m}^{A\_m} p\_a V\_{\alpha}\left(a, \omega'\right) \\ &= p\_s \alpha \sum\_{\omega' \in W} p\_{\omega\omega'} \sum\_{a=-A\_m}^{A\_m} p\_a \left\{ V\_{\alpha}\left(Q+a, \omega'\right) - V\_{\alpha}\left(a, \omega'\right) \right\} - \lambda. \end{split} \tag{14}$$

Define $f\_{\alpha}(Q,\omega) = \sum\_{a=-A\_m}^{A\_m} p\_a V\_{\alpha}(Q+a,\omega)$. We will prove that for all $|Q\_1| < |Q\_2|$, we have $f\_{\alpha}(Q\_1,\omega) < f\_{\alpha}(Q\_2,\omega)$. To this end, we first prove the following Lemma 1.

**Lemma 1.** *For a given discount factor α and a fixed context-aware weight ω, the value function for Q equals the value function for* −*Q, namely:*

$$V\_{\mathfrak{a}}(Q,\omega) = V\_{\mathfrak{a}}(-Q,\omega).$$

**Proof of Lemma 1.** The Lemma is proven by induction. Define $V\_{\alpha}^{(k)}(Q,\omega)$ as the value function obtained after the $k$th iteration. Assume that for all $Q$, we have $V\_{\alpha}^{(k)}(Q,\omega) = V\_{\alpha}^{(k)}(-Q,\omega)$. If action $\mathcal{U}$ is taken in the $k$th iteration, the expected discounted cost is denoted by $J\_{\alpha,\mathcal{U}}^{(k)}(Q,\omega)$, so that $V\_{\alpha}^{(k+1)}(Q,\omega) = \min\_{\mathcal{U}} J\_{\alpha,\mathcal{U}}^{(k)}(Q,\omega)$. We have:

$$\begin{split} J\_{\alpha,0}^{(k)}(Q,\omega) &= C(Q,\omega,0) + \alpha \sum\_{\omega' \in W} p\_{\omega\omega'} \sum\_{a=-A\_m}^{A\_m} p\_a V\_{\alpha}^{(k)}(Q+a, \omega') \\ &= \omega(-Q)^2 + \alpha \sum\_{\omega' \in W} p\_{\omega\omega'} \sum\_{a=-A\_m}^{A\_m} p\_a V\_{\alpha}^{(k)}(-Q-a, \omega') \\ &= C(-Q, \omega, 0) + \alpha \sum\_{\omega' \in W} p\_{\omega\omega'} \sum\_{a=-A\_m}^{A\_m} p\_a V\_{\alpha}^{(k)}(-Q+a, \omega') = J\_{\alpha,0}^{(k)}(-Q,\omega). \end{split} \tag{15}$$

Similarly, we can further prove that $J\_{\alpha,1}^{(k)}(Q,\omega) = J\_{\alpha,1}^{(k)}(-Q,\omega)$. Notice that the value function in the $(k+1)$th iteration is obtained by $V\_{\alpha}^{(k+1)}(Q,\omega) = \min\_{\mathcal{U}} J\_{\alpha,\mathcal{U}}^{(k)}(Q,\omega)$, and for any action $\mathcal{U}$, $J\_{\alpha,\mathcal{U}}^{(k)}(Q,\omega) = J\_{\alpha,\mathcal{U}}^{(k)}(-Q,\omega)$. Thus, $V\_{\alpha}^{(k+1)}(Q,\omega) = V\_{\alpha}^{(k+1)}(-Q,\omega)$. Letting $k \to \infty$, $V\_{\alpha}^{(k)}(Q,\omega) \to V\_{\alpha}(Q,\omega)$. Hence, $V\_{\alpha}(Q,\omega) = V\_{\alpha}(-Q,\omega)$.

**Lemma 2.** *For a given discount factor α and a fixed context-aware weight ω, the function $f\_{\alpha}(Q,\omega)$ increases monotonically with the absolute value of Q, namely: for all $|Q\_1| < |Q\_2|$, $f\_{\alpha}(Q\_1,\omega) < f\_{\alpha}(Q\_2,\omega)$.*

**Proof of Lemma 2.** Using the induction method, we first assume that for all $|Q\_1| < |Q\_2|$, we have $f\_{\alpha}^{(k)}(Q\_1,\omega) < f\_{\alpha}^{(k)}(Q\_2,\omega)$. Therefore:

$$\begin{split} J\_{\alpha,0}^{(k)}(Q\_1,\omega) &= C(Q\_1,\omega,0) + \alpha \sum\_{\omega' \in W} p\_{\omega\omega'} \sum\_{a=-A\_m}^{A\_m} p\_a V\_{\alpha}^{(k)}(Q\_1+a,\omega') \\ &= \omega Q\_1^2 + \alpha \sum\_{\omega' \in W} p\_{\omega\omega'} f\_{\alpha}^{(k)}(Q\_1,\omega') \\ &< C(Q\_2,\omega,0) + \alpha \sum\_{\omega' \in W} p\_{\omega\omega'} f\_{\alpha}^{(k)}(Q\_2,\omega') \\ &= J\_{\alpha,0}^{(k)}(Q\_2,\omega). \end{split} \tag{16}$$

Similarly, we can obtain $J\_{\alpha,1}^{(k)}(Q\_1,\omega) < J\_{\alpha,1}^{(k)}(Q\_2,\omega)$. Meanwhile, $V\_{\alpha}^{(k+1)}(Q,\omega) = \min\_{\mathcal{U}} J\_{\alpha,\mathcal{U}}^{(k)}(Q,\omega)$, so we have $V\_{\alpha}^{(k+1)}(Q\_1,\omega) < V\_{\alpha}^{(k+1)}(Q\_2,\omega)$ for all $|Q\_1| < |Q\_2|$. To complete the induction, we have to prove that $f\_{\alpha}^{(k+1)}(Q\_1,\omega) < f\_{\alpha}^{(k+1)}(Q\_2,\omega)$ for all $|Q\_1| < |Q\_2|$. To simplify the proof, it is assumed that $Q\_2 > Q\_1 > 0$. The discussion is divided into the following three situations.

• When $A\_m \le |Q\_1|$, we have $|Q\_1 + a| < |Q\_2 + a|$ for all $a \in [-A\_m, A\_m]$, so we can derive that:

$$f\_{\alpha}^{(k+1)}(Q\_1, \omega) = \sum\_{a=-A\_m}^{A\_m} p\_a V\_{\alpha}^{(k+1)}(Q\_1+a, \omega) < \sum\_{a=-A\_m}^{A\_m} p\_a V\_{\alpha}^{(k+1)}(Q\_2+a, \omega) = f\_{\alpha}^{(k+1)}(Q\_2, \omega). \tag{17}$$

• When $A\_m > |Q\_2|$, there exists an increment $a' \in \mathcal{A}' = \{a \mid a \in [-A\_m, -\frac{1}{2}(Q\_1+Q\_2))\}$ such that $|Q\_1 + a'| > |Q\_2 + a'|$ and $V\_{\alpha}^{(k+1)}(Q\_1+a',\omega') > V\_{\alpha}^{(k+1)}(Q\_2+a',\omega')$. Notice that $-Q\_1 - a' \in (\frac{1}{2}(Q\_2-Q\_1), A\_m - Q\_1]$ and $Q\_2 + a \in [-A\_m + Q\_2, A\_m + Q\_2]$, so $p\_{-Q\_1-a'-Q\_2} V\_{\alpha}^{(k+1)}(-Q\_1-a',\omega)$ is a term in the summation $f\_{\alpha}^{(k+1)}(Q\_2,\omega) = \sum\_{a=-A\_m}^{A\_m} p\_a V\_{\alpha}^{(k+1)}(Q\_2+a,\omega)$. Similarly, $p\_{-Q\_2-a'-Q\_1} V\_{\alpha}^{(k+1)}(-Q\_2-a',\omega)$ is a term in the summation $f\_{\alpha}^{(k+1)}(Q\_1,\omega)$. We further define $\mathcal{A}'' = \{a \mid a = -Q\_1 - Q\_2 - a', a' \in \mathcal{A}'\}$; since $-Q\_1 - Q\_2 - a' \in (-\frac{1}{2}(Q\_1+Q\_2), A\_m - Q\_1 - Q\_2]$, we have $\mathcal{A}' \cap \mathcal{A}'' = \emptyset$.

Furthermore, the probability of the estimation error transferring from $Q\_1$ to $-Q\_2 - a'$, i.e., $p\_{-Q\_2-a'-Q\_1}$, equals $p\_{-Q\_1-a'-Q\_2}$, the probability of the estimation error transferring from $Q\_2$ to $-Q\_1 - a'$. Since $-a' \in (\frac{1}{2}(Q\_1+Q\_2), A\_m]$, we have $|a'| > |-Q\_1 - Q\_2 - a'|$. According to our assumption on the increment, for any $a' \in \mathcal{A}'$, $p\_{a'} < p\_{-Q\_1-Q\_2-a'}$. Then, we can derive:

$$\begin{split} &f\_{\alpha}^{(k+1)}(Q\_{1},\omega)-f\_{\alpha}^{(k+1)}(Q\_{2},\omega) \\ &=\sum\_{a\in\mathcal{A}'}p\_{a}V\_{\alpha}^{(k+1)}(Q\_{1}+a,\omega)+\sum\_{a\in\mathcal{A}''}p\_{a}V\_{\alpha}^{(k+1)}(Q\_{1}+a,\omega) \\ &\quad-\sum\_{a\in\mathcal{A}'}p\_{a}V\_{\alpha}^{(k+1)}(Q\_{2}+a,\omega)-\sum\_{a\in\mathcal{A}''}p\_{a}V\_{\alpha}^{(k+1)}(Q\_{2}+a,\omega)+M(Q\_{1},Q\_{2}) \\ &=\sum\_{a\in\mathcal{A}'}p\_{a}\left\{V\_{\alpha}^{(k+1)}(Q\_{1}+a,\omega)-V\_{\alpha}^{(k+1)}(Q\_{2}+a,\omega)\right\} \\ &\quad+\sum\_{a\in\mathcal{A}'}p\_{-Q\_{1}-Q\_{2}-a}\left\{V\_{\alpha}^{(k+1)}(Q\_{2}+a,\omega)-V\_{\alpha}^{(k+1)}(Q\_{1}+a,\omega)\right\}+M(Q\_{1},Q\_{2}) \\ &=\sum\_{a\in\mathcal{A}'}\left(p\_{a}-p\_{-Q\_{1}-Q\_{2}-a}\right)\left\{V\_{\alpha}^{(k+1)}(Q\_{1}+a,\omega)-V\_{\alpha}^{(k+1)}(Q\_{2}+a,\omega)\right\}+M(Q\_{1},Q\_{2})<0, \end{split} \tag{18}$$

where $M(Q\_{1},Q\_{2})=\sum\_{a\notin\mathcal{A}'\cup\mathcal{A}''}p\_{a}\left(V\_{\alpha}^{(k+1)}(Q\_{1}+a,\omega)-V\_{\alpha}^{(k+1)}(Q\_{2}+a,\omega)\right)<0$ collects the remaining terms, for which $|Q\_1+a| < |Q\_2+a|$.

• When $|Q\_2| > A\_m > |Q\_1|$, since $a' \in [-A\_m, -\frac{1}{2}(Q\_1+Q\_2))$, we only need to consider the case $A\_m > \frac{1}{2}(Q\_1+Q\_2)$; in this case, $-Q\_1 - a' > \frac{1}{2}(Q\_2 - Q\_1) > Q\_2 - A\_m$. Therefore, $p\_{-Q\_1-a'-Q\_2} V\_{\alpha}^{(k+1)}(-Q\_1-a',\omega)$ is a term in the summation $f\_{\alpha}^{(k+1)}(Q\_2,\omega)$. Following the same pairing argument, we can prove that $f\_{\alpha}^{(k+1)}(Q\_1,\omega) < f\_{\alpha}^{(k+1)}(Q\_2,\omega)$ when $|Q\_2| > A\_m > |Q\_1|$.

According to Lemma 1, the conclusions above can be easily generalized to the cases without the condition $Q\_2 > Q\_1 > 0$. Finally, letting $k \to \infty$, $V\_{\alpha}^{(k+1)}(Q,\omega) \to V\_{\alpha}(Q,\omega)$ and therefore $f\_{\alpha}^{(k+1)}(Q,\omega) \to f\_{\alpha}(Q,\omega)$. Hence, $f\_{\alpha}(Q\_1,\omega) < f\_{\alpha}(Q\_2,\omega)$.

**Remark 1.** *Lemma 2 holds when $f\_A(a)$, i.e., the PDF of the increment A*(*t*)*, satisfies the following conditions:*

• $f\_A(a)$ *is symmetric about zero, i.e.,* $f\_A(a) = f\_A(-a)$;

• $f\_A(a)$ *is non-increasing in* $|a|$ *over its bounded support* $[-A\_m, A\_m]$.
Then, with Lemmas 1 and 2, we can prove the threshold structure of the optimal stationary deterministic policy that minimizes $L\_{\lambda}^{\pi}$ in (10).

**Theorem 2.** *For a given λ, the optimal stationary deterministic policy that minimizes $L\_{\lambda}^{\pi}$ in* (10) *has a threshold structure when the context-aware weight is a first-order irreducible positive recurrent Markov process.*

**Proof of Theorem 2.** Let $s\_{\alpha}^\*(Q,\omega)$ denote the optimal action that minimizes the discounted cost $V\_{\alpha}(Q,\omega)$ at state $(Q,\omega)$. If the optimal action is $s\_{\alpha}^\*(Q,\omega) = 1$, then the vehicle transmits its status update to the center at state $(Q,\omega)$ and $\Delta(Q,\omega) \ge 0$. Thus, we have:

$$\Delta(Q,\omega) = p\_s \alpha \sum\_{\omega' \in W} p\_{\omega\omega'} \sum\_{a=-A\_m}^{A\_m} p\_a \left\{ V\_{\alpha}\left( Q+a, \omega' \right) - V\_{\alpha}\left( a, \omega' \right) \right\} - \lambda \ge 0. \tag{19}$$

According to Lemma 2, for any $|Q'| > |Q|$, $\Delta(Q',\omega)$ can be lower bounded by

$$\begin{split} \Delta(Q',\omega) &= p\_s \alpha \sum\_{\omega' \in W} p\_{\omega\omega'} \sum\_{a=-A\_m}^{A\_m} p\_a \left\{ V\_{\alpha}\left( Q'+a, \omega' \right) - V\_{\alpha}\left( a, \omega' \right) \right\} - \lambda \\ &\ge p\_s \alpha \sum\_{\omega' \in W} p\_{\omega\omega'} \sum\_{a=-A\_m}^{A\_m} p\_a \left\{ V\_{\alpha}\left( Q+a, \omega' \right) - V\_{\alpha}\left( a, \omega' \right) \right\} - \lambda \ge 0. \end{split} \tag{20}$$

If $\Delta(Q,\omega) > 0$, then for any state with $|Q'| > |Q|$, the optimal policy is to transmit the status to the center. If $\Delta(Q,\omega) < 0$, then for any state with $|Q'| < |Q|$, the optimal action is not to transmit. In addition, the optimal policy cannot be to wait in all slots. Therefore, for each context-aware weight *ω*, there must exist a threshold $\tau\_{\omega} \ge 0$ such that, for any state $(Q,\omega)$ with $|Q| > \tau\_{\omega}$, the optimal choice is to transmit the status update. We can then conclude that for a given weight *ω*, the optimal policy with a discount factor *α* has a threshold structure.

Let $\{\alpha\_1, \alpha\_2, \cdots, \alpha\_k\}$ denote a sequence of discount factors with $\alpha\_k \to 1$ as $k \to \infty$. Then, the optimal deterministic policies for the discounted problems converge to the optimal policy of the average-cost problem [35]. A similar derivation is applied in [12]. Therefore, we can prove the threshold structure of the optimal stationary deterministic policy that minimizes $L\_{\lambda}^{\pi}$.

Similarly, when the context-aware weight is i.i.d. over time, we can obtain the following theorem:

**Theorem 3.** *For a given λ, the optimal stationary deterministic policy that minimizes $L\_{\lambda}^{\pi}$ in* (10) *has a threshold structure when the context-aware weight is i.i.d. over time. The thresholds are the same for each state of the context-aware weight.*

**Proof of Theorem 3.** If the context-aware weight is i.i.d. over time, then we have:

$$\Delta(Q,\omega) = p\_s \alpha \sum\_{\omega' \in W} p\_{\omega'} \sum\_{a=-A\_m}^{A\_m} p\_a \left\{ V\_{\alpha}\left( Q+a, \omega' \right) - V\_{\alpha}\left( a, \omega' \right) \right\} - \lambda = \Delta(Q), \tag{21}$$

where $p\_{\omega'}$ is the probability that the context-aware weight takes the value $\omega'$. Therefore, in this case, the state relevant to the decision reduces to one dimension and the thresholds are the same for all states of the context-aware weight.

According to Theorems 2 and 3, we proved the threshold structure of the two stationary deterministic policies that compose the UoI-optimal policy. Since the UoI-optimal policy for problem (9) is a probabilistic combination of two deterministic policies with threshold structures, we can finally draw the conclusion that the UoI-optimal policy also has a threshold structure.
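These structural results can be checked numerically. The sketch below runs discounted value iteration for (13) on a small discretized instance (random-walk increments, an i.i.d. two-state weight, and all parameter values are our own illustrative assumptions): the computed policy transmits exactly when |Q| exceeds a threshold, and the threshold coincides for both weight states, as Theorem 3 predicts for i.i.d. weights.

```python
import numpy as np

# Illustrative instance: random-walk increments, i.i.d. two-state weight.
N = 20                                       # truncate Q to [-N, N]
Qs = np.arange(-N, N + 1)                    # index of Q = 0 is N
A, pA = np.array([-1, 0, 1]), np.array([0.25, 0.5, 0.25])
W, pW = np.array([1.0, 10.0]), np.array([0.8, 0.2])
p_s, lam, alpha = 0.9, 5.0, 0.95             # channel, multiplier, discount

V = np.zeros((len(Qs), len(W)))
for _ in range(2000):                        # value iteration for (13)
    EV = V @ pW                              # expectation of V over omega'
    cont = np.array([pA @ EV[np.clip(i + A, 0, 2 * N)] for i in range(2 * N + 1)])
    reset = pA @ EV[np.clip(N + A, 0, 2 * N)]    # value after a successful reset
    cost = np.outer(Qs ** 2, W)              # C(Q, omega, u) = omega Q^2
    J0 = cost + alpha * cont[:, None]        # stay idle
    J1 = cost + lam + alpha * ((1 - p_s) * cont[:, None] + p_s * reset)
    V = np.minimum(J0, J1)                   # Bellman update

update = J1 < J0                             # optimal action per state
for j, w in enumerate(W):
    on = Qs[update[:, j]]
    print(f"omega = {w}: transmit when |Q| >= {on[on > 0].min()}")
```

The transmit set comes out as a symmetric band of large |Q| values, i.e., a threshold policy, and because the weight cost cancels in the comparison of the two actions, the threshold is identical for both weight states.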

#### *3.3. Numerical Solution of Optimal Strategy*

Based on Theorem 2, we only need to consider policies that update with probability 1 in state $(Q,\omega)$ for all $|Q| \ge Q\_{\text{max}} = \max\_{\omega} \tau\_{\omega}$. Let $\mu\_{Q,\omega}$ denote the probability that the state of the vehicle is $(Q,\omega)$, and let $y\_{Q,\omega}$ denote the probability that the state is $(Q,\omega)$ and the vehicle chooses to transmit an update. Therefore, we have:

**Theorem 4.** *When the context-aware weight is a first-order irreducible positive recurrent Markov process, the UoI-optimal policy can be derived by solving the following LP problem:*

$$\{\mu\_{Q,\omega}^\*, y\_{Q,\omega}^\*\} = \arg\min\_{\{\mu\_{Q,\omega}, y\_{Q,\omega}\}} \sum\_{\omega \in W} \sum\_{Q=-Q\_{\text{max}}}^{Q\_{\text{max}}} \omega Q^2 \mu\_{Q,\omega}, \tag{22a}$$

$$\text{s.t.} \sum\_{\omega \in W} \sum\_{Q = -Q\_{\text{max}}}^{Q\_{\text{max}}} \mu\_{Q, \omega} = 1,\tag{22b}$$

$$\sum\_{\omega \in W} \sum\_{Q=-Q\_{\text{max}}}^{Q\_{\text{max}}} y\_{Q,\omega} \le \rho, \tag{22c}$$

$$y\_{Q,\omega} \le \mu\_{Q,\omega}, \forall Q, \omega, \tag{22d}$$

$$0 \le y\_{Q,\omega} \le 1, 0 \le \mu\_{Q,\omega} \le 1, \forall Q, \omega,\tag{22e}$$

$$\begin{split} \mu\_{Q,\omega} &= \sum\_{\omega' \in W} \sum\_{Q'=-Q\_{\text{max}}}^{Q\_{\text{max}}} y\_{Q',\omega'}\, p\_s\, p\_{Q}\, p\_{\omega'\omega} \\ &\quad+ \sum\_{\omega' \in W} \sum\_{Q'=-Q\_{\text{max}}}^{Q\_{\text{max}}} \left( \mu\_{Q',\omega'} - y\_{Q',\omega'} p\_s \right) p\_{Q-Q'}\, p\_{\omega'\omega}. \end{split} \tag{22f}$$

**Proof of Theorem 4.** We first derive the average UoI $\bar{C}^{\pi}$ as a function of $\mu\_{Q,\omega}$ and $y\_{Q,\omega}$. The vehicle is in state $(Q,\omega)$, which produces a cost of $C(Q,\omega,\mathcal{U}) = \omega Q^2$, with probability $\mu\_{Q,\omega}$. Therefore, the average UoI is:

$$\sum\_{\omega \in W} \sum\_{Q=-Q\_{\text{max}}}^{Q\_{\text{max}}} \omega Q^2 \mu\_{Q,\omega}. \tag{23}$$

As for the constraints, (22b) states that the probabilities of all the states sum to 1. To explain (22c), note that $y\_{Q,\omega}$ is the probability of the vehicle being in state $(Q,\omega)$ and choosing to transmit an update, so the expectation of the one-step updating penalty in (8) for state $(Q,\omega)$ is $y\_{Q,\omega}$. Therefore, the constraint on the average update frequency $\bar{D}^{\pi}$ can be written as

$$\sum\_{\omega \in W} \sum\_{Q=-Q\_{\text{max}}}^{Q\_{\text{max}}} y\_{Q,\omega} \le \rho. \tag{24}$$

Then, we introduce *ξQ*,*<sup>ω</sup>* ∈ [0, 1] to represent that the probability of the vehicle choosing to transmit updates in state (*Q*, *ω*) and (22d) can be obtained by the fact that *yQ*,*<sup>ω</sup>* = *μQ*,*ωξQ*,*ω*, while (22e) is derived by the nature of probability.

The right-hand side of (22f) can be viewed as two terms. The first term is the total probability of transitioning from all states to state $(Q,\omega)$ when the vehicle chooses to update and the transmission of the status succeeds. The second term is the total probability of transitioning from all states to state $(Q,\omega)$ when the transmission fails or the vehicle chooses to wait. Therefore, we can prove that the optimal solution of problem (5) equals the solution of the LP problem.

When *ω*(*t*) is i.i.d. over time, we can also obtain the UoI-optimal policy through the LP problem proposed in Theorem 4; we only need to replace $p\_{\omega\omega'}$ with $p\_{\omega'}$.
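A small instance of the LP in Theorem 4 can be assembled and solved directly (this sketch assumes SciPy's `linprog`; the 7-state error grid, the two-state Markov weight, the boundary clipping, and all parameter values are our own illustrative choices, not from the paper):

```python
import numpy as np
from scipy.optimize import linprog

Qs = np.arange(-3, 4)                     # truncated error states
W = [1.0, 10.0]                           # context-aware weight states
Pw = np.array([[0.9, 0.1], [0.5, 0.5]])   # first-order Markov weight
A, pA = [-1, 0, 1], [0.25, 0.5, 0.25]     # random-walk increment
p_s, rho = 0.9, 0.3
nQ, nW = len(Qs), len(W)
n = nQ * nW                               # variables: mu (first n), y (last n)
ix = lambda qi, wi: qi * nW + wi

def pq(q_from, q_to):                     # Pr{Q -> q_to}, clipped at the boundary
    return sum(p for a, p in zip(A, pA) if np.clip(q_from + a, -3, 3) == q_to)

c = np.zeros(2 * n)
for qi, q in enumerate(Qs):               # objective (22a): sum omega Q^2 mu
    for wi, w in enumerate(W):
        c[ix(qi, wi)] = w * q * q

A_eq, b_eq = [np.r_[np.ones(n), np.zeros(n)]], [1.0]     # normalization (22b)
for qi, q in enumerate(Qs):               # balance constraints (22f)
    for wi in range(nW):
        row = np.zeros(2 * n)
        row[ix(qi, wi)] -= 1.0
        for qj, q2 in enumerate(Qs):
            for wj in range(nW):
                row[n + ix(qj, wj)] += p_s * pq(0, q) * Pw[wj, wi]
                row[ix(qj, wj)] += pq(q2, q) * Pw[wj, wi]
                row[n + ix(qj, wj)] -= p_s * pq(q2, q) * Pw[wj, wi]
        A_eq.append(row); b_eq.append(0.0)

A_ub = [np.r_[np.zeros(n), np.ones(n)]]   # update frequency (22c)
b_ub = [rho]
for k in range(n):                        # y <= mu (22d)
    row = np.zeros(2 * n); row[n + k] = 1.0; row[k] = -1.0
    A_ub.append(row); b_ub.append(0.0)

res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
              A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, 1))
mu, y = res.x[:n], res.x[n:]
print(f"average UoI = {res.fun:.3f}, update frequency = {y.sum():.3f}")
```

The solution gives the occupancy measures, from which the randomized threshold policy is recovered as the per-state update probability $y\_{Q,\omega}/\mu\_{Q,\omega}$, as in the proof of Theorem 4.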

#### **4. Scheduling in Unknown Contexts**

To make decisions, the UoI-optimal updating policy obtained in Section 3 still requires the distributions of the context-aware weight *ω*(*t*) and the increment *A*(*t*), as well as the successful transmission probability, which may not be available in advance or may change over time in practical systems. We therefore assume that the distribution of the context-aware weight is not pre-determined and that the vehicle has to learn it. In this section, we use reinforcement learning (RL) to learn the dynamics of the context and the characteristics of the wireless channel.

To solve this problem, we turn to the model-based RL framework proposed in [36]. We only consider cases in which the UoI-optimal policy has a threshold structure; this assumption ensures that the optimal policy on the truncated state space equals the optimal policy of the original problem.

We use the 3-tuple (*s*, *s*′, *U*) to formulate the proposed RL-based updating policy. The states in the current slot and the next slot are denoted by *s* and *s*′, respectively, and *U* denotes the action chosen in the current slot. The discretized state space and the action space are set as in Section 3.1. The smaller the step size used in the discretization, the closer our results are to those in the continuous state space. Moreover, the selection of the step size only affects the accuracy of the update threshold, so the performance loss caused by discretization can be reduced by choosing a smaller step size.

We display the details of the proposed RL-based updating policy in Algorithm 1. At the beginning of episode *k*, we randomly decide whether to explore or exploit. *l* ∈ [0, 1] controls the trade-off between exploration and exploitation during the following episode: a larger *l* means a higher frequency of exploration, and vice versa. If the algorithm chooses to explore during an episode, a random policy *πrand*(*s*) is used, i.e., we randomly choose whether to update in each state so as to discover more valuable actions. If the algorithm chooses to exploit, we first estimate the probability transfer functions *p*˜*k*(*s*′|*s*, *U*) for each state–action pair. In Algorithm 1, *N*(*s*, *U*) and *N*(*s*, *U*, *s*′) denote the number of occurrences of the state–action pair (*s*, *U*) and of the transition from *s* to *s*′ given action *U*, respectively. Based on the assumption that the optimal policy has a threshold structure, the policy *πk* that minimizes the average UoI under the estimated probability transfer functions can be solved directly through the LP problem proposed in Theorem 4. The vehicle then uses policy *πk* to generate state–action pairs and state transitions over the following *Lk* slots, where *L* > 0 controls the number of state transitions observed in each episode. At the end of each episode, the model is updated according to the state–action pairs and state transitions observed during the episode. Finally, after *K* episodes, the algorithm outputs the RL-based updating policy *π*∗(*s*), which is derived based on *p*˜*K*(*s*′|*s*, *U*).

**Algorithm 1** RL-based Updating Policy

**Input:** *l* ∈ [0, 1], *L* > 0, *K* > 0
1: **for** episodes *k* = 1, 2, . . . , *K* **do**
2: Set *Lk* = *L*√*k*, *εk* = *l*/√*k*, uniformly draw *α* ∈ [0, 1].
3: **if** *α* < *εk* **then**
4: Set *πk*(*s*) = *πrand*(*s*),
5: **else**
6: **for** each state *s*, *s*′ ∈ S and *U* ∈ U **do**
7: **if** *N*(*s*, *U*) > 0 **then**
8: Let *p*˜*k*(*s*′|*s*, *U*) = *N*(*s*, *U*, *s*′)/*N*(*s*, *U*),
9: **else**
10: *p*˜*k*(*s*′|*s*, *U*) = 1/|S|.
11: **end if**
12: **end for**
13: Obtain policy *πk*(*s*) by solving the estimated CMDP.
14: **end if**
15: Randomly choose an initial state *s*(1).
16: **for** slots *t* = 1, 2, . . . , *Lk* − 1 **do**
17: Choose action *U*(*t*) as *πk*(*s*(*t*)).
18: Observe the next state *s*(*t* + 1).
19: *N*(*s*(*t*), *U*(*t*), *s*(*t* + 1)) ← *N*(*s*(*t*), *U*(*t*), *s*(*t* + 1)) + 1.
20: *N*(*s*(*t*), *U*(*t*)) ← *N*(*s*(*t*), *U*(*t*)) + 1.
21: *s*(*t*) ← *s*(*t* + 1).
22: **end for**
23: **end for**
24: Obtain policy *π*∗(*s*) by solving the estimated CMDP based on *p*˜*K*(*s*′|*s*, *U*), *s*, *s*′ ∈ S, *U* ∈ U.

**Output:** the RL-based updating policy *π*∗(*s*)
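A minimal sketch of the episodic loop of Algorithm 1 might look as follows. The environment, its parameters, and the `solve_estimated_cmdp` stub are hypothetical placeholders: in the paper the exploit-phase policy comes from the LP of Theorem 4, which we replace here with a fixed threshold rule so that the count-based model estimation (lines 7–10) and the schedules *Lk* = *L*√*k*, *εk* = *l*/√*k* can be shown end to end:

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA = 4, 2          # discretized error states; actions {0: wait, 1: send}
ps = 0.9               # hypothetical successful transmission probability

def step(s, a):
    """Toy environment standing in for the vehicle/channel dynamics."""
    if a == 1 and rng.random() < ps:
        return 0                       # successful update resets the error
    return min(s + 1, nS - 1)          # otherwise the error grows

def solve_estimated_cmdp(P_hat):
    """Stand-in for the LP of Theorem 4: a fixed threshold rule."""
    return lambda s: int(s >= 2)       # update when the error is large

l, L, K = 1.0, 200, 20                 # inputs of Algorithm 1
N_sa = np.zeros((nS, nA))              # N(s, U)
N_sas = np.zeros((nS, nA, nS))         # N(s, U, s')
for k in range(1, K + 1):
    Lk, eps_k = int(L * np.sqrt(k)), l / np.sqrt(k)
    if rng.random() < eps_k:           # explore: random policy pi_rand
        policy = lambda s: int(rng.integers(nA))
    else:                              # exploit: estimate model, lines 7-10
        P_hat = np.where(N_sa[..., None] > 0,
                         N_sas / np.maximum(N_sa[..., None], 1),
                         1.0 / nS)
        policy = solve_estimated_cmdp(P_hat)
    s = int(rng.integers(nS))          # random initial state s(1)
    for _ in range(Lk - 1):            # collect transitions, lines 16-22
        a = policy(s)
        s_next = step(s, a)
        N_sas[s, a, s_next] += 1
        N_sa[s, a] += 1
        s = s_next

P_final = N_sas / np.maximum(N_sa[..., None], 1)
print(P_final[2, 1])   # estimated kernel after sending in state 2
```

After training, the estimated row `P_final[2, 1]` concentrates its mass on the reset state with probability close to *ps*, which is exactly the kind of model accuracy the exploit phase relies on.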

#### **5. Simulation Results and Discussion**

#### *5.1. Simulation Setup*

To facilitate the simulation, we consider the case where the context-aware weight of the vehicle has only two states: a 'normal' state and an 'urgent' state. The 'normal' state means that the vehicle is in an ordinary situation and the accuracy of the status information is relatively unimportant. We set *ω*(*t*) to 1 in the 'normal' state, while in the 'urgent' state *ω*(*t*) is set to a constant *ωe* much larger than 1, indicating that the vehicle is in an emergency. Two different distributions of the context-aware weight are considered, conforming to the assumptions about *ω*(*t*) used in Section 3.1:


As for the increment *A*(*t*), *Amax* is set to a large enough positive number to simplify the simulations.

**Figure 2.** The state transition diagram of *ω*(*t*).

#### *5.2. Numerical Results*

Figure 3 shows the structure of the UoI-optimal updating policy. For the discretization of the estimation error, the step size used is 1. It can be seen that under the two different distributions of the context-aware weight mentioned above, the optimal updating policies all have threshold structures. In particular, when the context-aware weight is i.i.d. over time, Figure 3b shows that the thresholds for all states of the context-aware weight are the same, which matches the theoretical analysis well. From Figure 3c, we find that the UoI-optimal policy also has a threshold structure when the increment *A*(*t*) obeys a uniform distribution *Unif*(−3, 3), which verifies Remark 1. We then simulate the UoI-optimal policy under contexts with more states to show that the policy is generic. We consider a three-state context-aware weight which takes values in *ω*<sup>1</sup> = 1, *ω*<sup>2</sup> = 50, *ω*<sup>3</sup> = 100. The state transition matrix *P*<sup>3</sup> of the three-state context-aware weight is:

$$P\_3 = \begin{bmatrix} 0.997 & 0.002 & 0.001 \\ 0.02 & 0.97 & 0.01 \\ 0.2 & 0.1 & 0.7 \end{bmatrix} \tag{25}$$

where the *j*-th element in the *i*-th row indicates the probability that the context transfers from state *ω<sup>i</sup>* to state *ωj*. The numerical results (Figure 3d) show that when the context-aware weight has more states, the UoI-optimal policy still has a threshold structure, which verifies our theoretical results.
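The long-run behaviour implied by *P*<sup>3</sup> can be checked numerically. The sketch below (our own illustration) verifies that each row of *P*<sup>3</sup> is a probability distribution and computes the stationary distribution of the chain, which concentrates on the normal state *ω*<sup>1</sup>:

```python
import numpy as np

# State transition matrix P3 of the three-state context-aware weight, cf. (25).
P3 = np.array([[0.997, 0.002, 0.001],
               [0.02,  0.97,  0.01 ],
               [0.2,   0.1,   0.7  ]])
assert np.allclose(P3.sum(axis=1), 1.0)   # each row is a distribution

# Stationary distribution: left eigenvector of P3 for eigenvalue 1.
w, v = np.linalg.eig(P3.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()
print(pi)   # long-run fraction of time in omega_1, omega_2, omega_3
```

Solving the balance equations by hand gives π = (0.9143, 0.0800, 0.0057), i.e., the vehicle spends over 91% of the time in the normal context, which is consistent with emergencies being rare.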

**Figure 3.** Threshold structure of the UoI-optimal updating policy when: (**a**) the context-aware weight is a first-order Markov process, *ρ* = 0.05, *p*<sup>1</sup> = 0.001, *p*<sup>2</sup> = 0.01, *ps* = 0.9, *σ*<sup>2</sup> = 1, *ω<sup>e</sup>* = 100; (**b**) the context-aware weight is i.i.d. over time, *ρ* = 0.05, *pl* = 0.999, *ph* = 0.001, *ps* = 0.9, *σ*<sup>2</sup> = 1, *ω<sup>e</sup>* = 100; (**c**) the context-aware weight is a first-order Markov process, *ρ* = 0.05, *p*<sup>1</sup> = 0.001, *p*<sup>2</sup> = 0.01, *ps* = 0.9, *ω<sup>e</sup>* = 100, and the increment in the estimation error during one slot is *A*(*t*) ∼ *Unif*(−3, 3), ∀*t*; and (**d**) the context-aware weight is a three-state first-order Markov process, which takes values in *ω*<sup>1</sup> = 1, *ω*<sup>2</sup> = 50, *ω*<sup>3</sup> = 100 and evolves according to the state transition matrix *P*3, *ρ* = 0.05, *ps* = 0.9, *σ*<sup>2</sup> = 1.

Then, we will focus on the results obtained when the context-aware weight is a first-order irreducible positive recurrent Markov process, as shown in Figure 2. Figure 4 shows the average UoI of the UoI-optimal policy, the AoI-optimal policy derived by CMDP, the RL-based updating policy, and the update-index-based adaptive scheme [27]. In the RL-based updating policy, *L* = 8000, *l* = 1 and *K* = 50. All the numerical results of the RL-based policy are averaged over 100 runs.

First of all, the UoI-optimal policy can only be obtained with prior knowledge of the system dynamics. In contrast, the RL-based policy achieves near-optimal performance without knowing the system dynamics, indicating that Algorithm 1 learns sufficiently accurate probability transfer functions from the state–action pairs and state transitions observed during training.

Secondly, according to Figure 4, the AoI-optimal policy yields a much higher UoI than the three UoI-based policies, namely the UoI-optimal policy, the RL-based updating policy, and the update-index-based adaptive scheme. On the one hand, AoI is a special case of UoI: when the context-aware weight *ω*(*t*) = 1, the increment *A*(*t*) = 1, and the cost function *δ*(*Q*(*t*)) = *Q*(*t*), UoI equals AoI. The AoI-optimal policy therefore ignores the fact that different contexts have different requirements for information freshness. In the proposed UoI-based updating policies, different contexts have different policies and update thresholds, whereas the AoI-optimal updating policies for different contexts are the same. On the other hand, Figure 5 reveals that the AoI-optimal policy leads to a much higher estimation error, which results in worse performance in terms of UoI. The AoI-optimal policy is an oblivious policy, independent of the monitored process. Since AoI increases linearly with time, the AoI-optimal policy can only minimize the linear performance degradation over time. In contrast, the UoI-based policies considered in this paper (with cost function *δ*(*Q*(*t*)) = (*Q*(*t*))<sup>2</sup>) are process-dependent, so-called non-oblivious policies, and can benefit from both the age and the realization of the process [37]. These policies directly minimize the nonlinear impact of information staleness and the gap between the actual and estimated status.
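The reduction of UoI to AoI noted above can be checked with a small sketch (our own illustration, assuming instantaneous, always-successful deliveries): with *ω*(*t*) = 1, *A*(*t*) = 1, and *δ*(*Q*) = *Q*, the UoI recursion reproduces the AoI sawtooth exactly, while a quadratic cost penalises staleness non-linearly:

```python
# Minimal sketch: UoI(t) = omega(t) * delta(Q(t)), where the error Q resets
# on a (successful, instantaneous) update and otherwise grows by A(t).
def uoi_trajectory(updates, T, omega=lambda t: 1, A=lambda t: 1,
                   delta=lambda q: q):
    q, out = 0, []
    for t in range(T):
        q = 0 if t in updates else q + A(t)
        out.append(omega(t) * delta(q))
    return out

def aoi_trajectory(updates, T):
    """Classic AoI sawtooth with zero-delay deliveries."""
    age, out = 0, []
    for t in range(T):
        age = 0 if t in updates else age + 1
        out.append(age)
    return out

ups = {0, 4, 9}
assert uoi_trajectory(ups, 12) == aoi_trajectory(ups, 12)
# A quadratic cost delta(Q) = Q**2 penalises staleness non-linearly instead:
print(uoi_trajectory(ups, 12, delta=lambda q: q * q))
```

The assertion confirms the special-case claim, and swapping in the quadratic *δ* shows how the non-oblivious cost amplifies long periods of staleness compared with the linear age.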

Thirdly, our updating policies outperform the update-index-based adaptive scheme [27] in terms of UoI. Under the adaptive scheme, the vehicle derives an update index as a function of the current estimation error and the context-aware weight for the next slot. If the index is larger than the adaptive update threshold, the vehicle transmits its status information to the center. If the vehicle transmits an update in slot *t*, the adaptive threshold increases in the next slot; otherwise, it decreases. The adaptive scheme thus overuses the update resource in 'urgent' states, leaving the vehicle unable to obtain resources in 'normal' states. In contrast, the UoI-optimal policy and the trained RL-based updating policy are fixed schemes, which avoid this extremely unbalanced resource allocation between the two contexts and achieve better performance.

Figure 6 shows the influence of the maximum average update frequency *ρ* and the context-aware weight for emergencies, *ωe*, on the update threshold of the UoI-optimal policy. To obtain more accurate results, the step size used here is 0.25. The solid curves show the update thresholds for the normal state, while the dashed curves show those for the urgent state. When the constraint on update resources is strict, the update thresholds fall faster. Furthermore, a larger *ω<sup>e</sup>* results in a lower update threshold for the urgent state and a higher threshold for the normal state. This phenomenon indicates that the value of *ω<sup>e</sup>* reflects the tolerance for estimation error in emergencies. When *ρ* < 0.1, the influence of *ω<sup>e</sup>* on the update threshold for the normal state is larger than that for the urgent state. When the maximum average update frequency is relatively large, *ω<sup>e</sup>* has little effect on the update thresholds for both states.

Figure 7 shows that the update thresholds also depend on the dynamics of the context-aware weight when the weight has the first-order Markov property. As *p*<sup>2</sup> approaches 1 − *p*1, the gap between the update thresholds for the urgent state and the normal state becomes smaller, since the context-aware weight tends to be i.i.d. over time. When *p*<sup>2</sup> = 1, the update threshold for the urgent state exceeds the threshold for the normal state. Therefore, the update threshold for the urgent state is not necessarily lower than that for the normal state.

**Figure 4.** Average UoI of the UoI-optimal updating policy, the RL-based updating policy, the update-index-based adaptive scheme [27], and the AoI-optimal updating policy when *p*<sup>1</sup> = 0.001, *p*<sup>2</sup> = 0.01, *ps* = 0.9, *σ*<sup>2</sup> = 1, *ω<sup>e</sup>* = 100.

**Figure 5.** Average squared estimation error of the UoI-optimal updating policy and the AoI-optimal updating policy when *p*<sup>1</sup> = 0.001, *p*<sup>2</sup> = 0.01, *σ*<sup>2</sup> = 1, *ω<sup>e</sup>* = 100.

**Figure 6.** Update thresholds of the UoI-optimal updating policy with different values of *ωe* when *p*<sup>1</sup> = 0.001, *p*<sup>2</sup> = 0.01, *ps* = 0.9, *σ*<sup>2</sup> = 1.

**Figure 7.** Update thresholds of the UoI-optimal updating policy with different values of *p*<sup>2</sup> when *p*<sup>1</sup> = 0.01, *ps* = 0.9, *σ*<sup>2</sup> = 1, *ω<sup>e</sup>* = 100.

Figure 8 shows the performance of the RL-based updating policy with different values of *L*. According to Algorithm 1, the number of state transitions observed in episode *k* is *L*√*k*; therefore, *L* determines the number of state transitions observed during the whole learning process. Generally speaking, a larger *L* reduces the randomness of the performance and achieves a better UoI. The performance of the RL-based updating policy depends on the accuracy of the model obtained through training, namely whether the estimated probability transfer function of the system is accurate. A larger *L* means that the algorithm can collect more data, i.e., more state transitions, and obtain a more accurate model.

Figure 9 shows the influence of the number of episodes *K* on the performance of the RL-based updating policy. A larger *K* leads to a lower average UoI and smaller randomness over 100 runs. On the one hand, the more episodes and the more data the algorithm observes, the more accurate the obtained model and the better the performance of the updating policy. On the other hand, *K* is also the number of iterations of the policy obtained through the estimated CMDP: the policy *πk*(*s*) used in episode *k* is derived from the state–action pairs and state transitions observed in the previous *k* − 1 episodes. Therefore, more iterations of the updating policy yield more valuable state–action pairs and better performance.

**Figure 8.** Average UoI of the RL-based updating policy with different values of *L* when *p*<sup>1</sup> = 0.001, *p*<sup>2</sup> = 0.01, *σ*<sup>2</sup> = 1, *ω<sup>e</sup>* = 100.

**Figure 9.** Average UoI of the RL-based updating policy with different values of *K* when *p*<sup>1</sup> = 0.001, *p*<sup>2</sup> = 0.01, *σ*<sup>2</sup> = 1, *ω<sup>e</sup>* = 100.

#### **6. Conclusions**

In this work, we studied how to minimize the performance degradation caused by outdated information in terms of UoI, a new metric that jointly considers context and information freshness. We proved that the UoI-optimal updating policy for the considered single-user remote monitoring system has a single-threshold structure. The policy was then obtained through linear programming, assuming that the state transition probabilities of the system are known in advance. For unknown contexts, we further used a reinforcement learning algorithm to learn the dynamics of the system. Simulations verified the threshold structure of the UoI-optimal policies and showed that the update thresholds decrease as the maximum average update frequency increases. In addition, a larger context-aware weight in emergencies resulted in a lower update threshold for urgent states. However, since the state transition probabilities also influence the update thresholds, the update threshold for emergencies was not necessarily lower than the update threshold for normal states, especially when the probability of transferring from urgent states to normal states tended towards 1. Furthermore, the numerical results showed that the proposed RL-based updating policy achieved near-optimal performance without advance knowledge of the system model.

Determining the context-aware weight in practical systems, where the models of the context are often complicated and difficult to obtain in advance, remains an open problem. As future work, we plan to use deep RL algorithms to learn the models of the context variation. We believe that UoI can provide a new performance metric for measuring information timeliness in future V2X scenarios, and that the proposed UoI metric and the context-aware scheduling policy can shed some light on low-latency and ultra-reliable wireless communication in future 5G/6G systems.

**Author Contributions:** Conceptualization, L.W.; formal analysis, L.W.; methodology, L.W.; software, L.W.; supervision, S.Z. and Z.N.; writing—original draft, L.W.; writing—review and editing, J.S., Y.S., S.Z. and Z.N. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work is sponsored in part by the National Key R&D Program of China No. 2020YFB1806605, by the National Natural Science Foundation of China (No. 62022049, No. 61871254, No. 61861136003), by the China Postdoctoral Science Foundation No. 2020M680558, and by Hitachi Ltd.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **Appendix A. Proof of Theorem 1**

Given a state *s* = (*Q*, *ω*) ∈ S and a nonempty subset G ⊂ S of the state space, let R(*s*, G) denote the class of policies *θ* such that the probability *P<sup>θ</sup>*(*s*(*t*) ∈ G for some *t* ≥ 1 | *s*(0) = *s*) = 1 and the expected time *ms*,G(*θ*) of the first passage from *s* to G under policy *θ* is finite. Then, let R′(*s*, G) denote the class of policies *θ* ∈ R(*s*, G) such that the expected average UoI *cs*,G(*θ*) and the expected transmission cost *ds*,G(*θ*) of the first passage from *s* to G are finite. To prove Theorem 1, we introduce Assumptions A1–A5 of [33]:

**Assumption A1.** *For all b* > 0*, the set* G(*b*) ≜ {*s* | *there exists an action U such that C*(*s*, *U*) + *D*(*s*, *U*) ≤ *b*} *is finite.*

**Assumption A2.** *There exists a stationary deterministic policy π that induces a Markov chain with the following properties: the state space consists of a single (nonempty) positive recurrent class* <sup>R</sup>*<sup>π</sup> and a set* <sup>T</sup>*<sup>π</sup> of transient states such that <sup>π</sup>* ∈ R′(*s*, <sup>R</sup>*π*)*, for any <sup>s</sup>* <sup>∈</sup> <sup>T</sup>*π, and both the average UoI C*¯*<sup>π</sup> and the average transmission cost D*¯ *<sup>π</sup> on* R*<sup>π</sup> are finite.*

**Assumption A3.** *Given any two states s*, *s*′ ∈ S *with s* ≠ *s*′*, there exists a policy π (a function of s and s*′*) such that π* ∈ R′(*s*, {*s*′})*.*

**Assumption A4.** *If a stationary deterministic policy has at least one positive recurrent state, then it has a single positive recurrent class, and this class contains the state* (*Q*, *ω*) *with Q* = 0*.*

**Assumption A5.** *There exists a policy π such that the average UoI C*¯*<sup>π</sup>* < ∞ *and average transmission cost D*¯ *<sup>π</sup>* < *ρ.*

Furthermore, the problem (9) has the following property:

**Lemma A1.** *Assumptions A1–A5 hold for problem* (9)*.*

**Proof of Lemma A1.** First of all, we focus on the cases where the context-aware weight is assumed to be a first-order irreducible positive recurrent Markov process:


Since the evolution of the context-aware weight is independent of the evolution of the estimation error and of the updating policy, we first focus on the estimation error, which can be formulated as a one-dimensional irreducible Markov chain with state space $\mathcal{Q} = \{0, \pm\Delta Q, \pm 2\Delta Q, \cdots, \pm n\Delta Q, \cdots\}$. We denote by $\mathcal{Z}_Q$ the set of states that can transfer to state $Q$ in a single step. The probability of the estimation error transferring from state $Q$ to state $Q'$ at the $k$-th step without visiting state $Q = 0$ is defined as $P'_{Q,Q',k}$. Obviously, $\sum_{Q' \in \mathcal{Q}} P'_{Q,Q',k} < (1 - p_s)^k$. Then, the probability that the first passage from state $Q$ ($Q \neq 0$) to $0$ takes $k+1$ steps is $\sum_{Q' \notin \mathcal{Z}_0} P'_{Q,Q',k}\, p_s + \sum_{Q' \in \mathcal{Z}_0} P'_{Q,Q',k}\,(p_s + (1 - p_s)p_{0-Q'}) < (1 - p_s)^k$, where $p_{0-Q'}$ is the probability that the increment in the estimation error is $-Q'$. Therefore, the expected time of the first passage from $Q$ ($Q \neq 0$) to $0$ is finite.

For state $Q = 0$, the estimation error stays in this state in the next step with probability $p_s + p_{0-0}$ and first returns to state $Q = 0$ at the second transition with probability smaller than $(1 - p_s - p_{0-0})$. Then, starting from state $Q = 0$, the probability that the estimation error first returns to state $Q = 0$ at the $(k+1)$-th ($k > 2$) step is smaller than $(1 - p_s - p_{0-0})(1 - p_s)^{k-1}$. Therefore, we can prove that state $Q = 0$ is a positive recurrent state, and $\mathcal{R}^{\pi}_{\mathcal{Q}} = \{Q = 0\}$ is a positive recurrent class of the induced Markov chain of the estimation error. Furthermore, for any state in $\mathcal{T}^{\pi}_{\mathcal{Q}} = \mathcal{Q} \setminus \mathcal{R}^{\pi}_{\mathcal{Q}}$, the expected time of the first passage from that state to state $Q = 0$ under $\pi$ is finite, and the probability of a state in $\mathcal{T}^{\pi}_{\mathcal{Q}}$ not reaching state $Q = 0$ within $k$ steps is smaller than $(1 - p_s)^k$.

Define the probability that state $Q$ transfers to state $Q'$ in $k$ steps for the first time as $P_{Q,Q',k}$. Then, the probability that state $(Q, \omega)$ transfers to state $(Q', \omega')$ in $k$ steps for the first time is $P_{Q,Q',k} P_{\omega,\omega',k}$. Since $\sum_{k=1}^{\infty} P_{Q,Q',k}\, k < \infty$ and $\sum_{k=1}^{\infty} P_{\omega,\omega',k}\, k < \infty$, we have $\sum_{k=1}^{\infty} P_{Q,Q',k} P_{\omega,\omega',k}\, k < \infty$. Therefore, the set of states $\mathcal{R}^{\pi} = \{(Q, \omega) \,|\, Q \in \mathcal{R}^{\pi}_{\mathcal{Q}}, \omega \in \mathcal{W}\}$ is a positive recurrent class. Similarly, we can prove that $\mathcal{T}^{\pi} = \mathcal{S} \setminus \mathcal{R}^{\pi}$ satisfies Assumption A2. Finally, $\bar{D}^{\pi} = 1 < \infty$ and $\bar{C}^{\pi} = E[\omega]\,\frac{1}{p_s}\,\sigma^2 < \infty$.


Similarly, we can prove that Assumptions A1–A5 also hold for problem (9) when the context-aware weight is i.i.d. over time.

Since Assumptions A1–A5 hold for problem (9), according to Theorem 2.5 in [33] there exists an optimal stationary randomized policy for problem (9). Moreover, the optimal policy is a probabilistic combination of two stationary deterministic policies that differ in at most one state.

Furthermore, according to Lemma 3.9 in [33], the two stationary deterministic policies each optimize the unconstrained cost in (10) with a different *λ*.

#### **References**


## *Article* **Relationship between Age and Value of Information for a Noisy Ornstein–Uhlenbeck Process**

**Zijing Wang, Mihai-Alin Badiu and Justin P. Coon \***

Department of Engineering Science, University of Oxford, Oxford OX1 3PJ, UK; zijing.wang@balliol.ox.ac.uk (Z.W.); mihai.badiu@eng.ox.ac.uk (M.-A.B.)

**\*** Correspondence: justin.coon@eng.ox.ac.uk

**Abstract:** The age of information (AoI) has been widely used to quantify the information freshness in real-time status update systems. As the AoI is independent of the inherent property of the source data and the context, we introduce a mutual information-based value of information (VoI) framework for hidden Markov models. In this paper, we investigate the VoI and its relationship to the AoI for a noisy Ornstein–Uhlenbeck (OU) process. We explore the effects of correlation and noise on their relationship, and find logarithmic, exponential and linear dependencies between the two in three different regimes. This gives the formal justification for the selection of non-linear AoI functions previously reported in other works. Moreover, we study the statistical properties of the VoI in the example of a queue model, deriving its distribution functions and moments. The lower and upper bounds of the average VoI are also analysed, which can be used for the design and optimisation of freshness-aware networks. Numerical results are presented and further show that, compared with the traditional linear age and some basic non-linear age functions, the proposed VoI framework is more general and suitable for various contexts.

**Keywords:** value of information; age of information; noisy Ornstein–Uhlenbeck process

#### **1. Introduction**

Real-time monitoring and control applications, such as industrial control, the Internet of Things and autonomous driving, are becoming increasingly common. Such applications are modelled as status update systems in which sensors continuously monitor a targeted random process, and the sampled status updates must be transmitted through the communication network to a remote destination in a timely manner to enable precise control and management. The freshness of data has therefore emerged as an important topic in network research.

The age of information (AoI) is proposed as a novel end-to-end metric in [1,2] to evaluate the timeliness of status updates from the receiver's perspective. The AoI is defined as the time difference between the current time and the generation time of the last received status update. The AoI and its variants (e.g., the average AoI and the peak AoI) are widely used as tools to improve the system-level data freshness by optimising the sampling and link scheduling in a variety of emerging networks [3–8]. Moreover, there are many works exploring the AoI in the context of different queue systems. General expressions of the average AoI were derived in [1], and the stationary distribution of the AoI was studied in [9,10] for first-come-first-serve (FCFS) M/M/1, M/D/1 and D/M/1 queue disciplines. The statistical characterisation and violation probability of the AoI were treated in [11,12] for last-come-first-serve (LCFS) queue disciplines. The influence of the queue's buffer size, packet management and service pre-emption on the AoI and its distribution was investigated in [13–15].
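As a concrete instance of the queueing results cited above, the average AoI of the FCFS M/M/1 queue derived in [1] is Δ = (1/*μ*)(1 + 1/*ρ* + *ρ*²/(1 − *ρ*)) with utilisation *ρ* = *λ*/*μ*. The short sketch below evaluates this closed form and sweeps *ρ* to locate the well-known age-optimal utilisation near *ρ* ≈ 0.53:

```python
def avg_aoi_mm1(lam, mu):
    """Average AoI of an FCFS M/M/1 queue, closed form from [1]."""
    rho = lam / mu
    assert 0 < rho < 1, "queue must be stable (rho < 1)"
    return (1.0 / mu) * (1 + 1 / rho + rho ** 2 / (1 - rho))

# Sweep the utilisation: the age-optimal rho is neither 0 (stale samples)
# nor 1 (long queueing delay), but sits in between.
best = min((avg_aoi_mm1(r, 1.0), r) for r in [i / 100 for i in range(1, 100)])
print(best)   # (minimum average AoI, age-optimal rho)
```

The sweep illustrates the central trade-off of AoI analysis: sampling too rarely leaves the receiver with stale data, while sampling too often congests the queue, so the optimum lies at an interior utilisation.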

However, the basic notion of the AoI grows linearly with a unit slope as time goes by, and it is independent of the context and the inherent characterisation of the targeted random process (e.g., the correlation property of the underlying source data). In light of these issues,

**Citation:** Wang, Z.; Badiu, M.-A.; Coon, J.P. Relationship between Age and Value of Information for a Noisy Ornstein-Uhlenbeck Process. *Entropy* **2021**, *23*, 940. https://doi.org/ 10.3390/e23080940

Academic Editors: Anthony Ephremides and Yin Sun

Received: 18 June 2021 Accepted: 20 July 2021 Published: 23 July 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

the concept of the value of information (VoI) has begun to be studied. The VoI emphasises the idea that, in some cases, old information may still have value while even fresh information may hold little value, as different sources require different update frequencies.

The idea of a non-linear age has become a common approach to evaluating information value [16]. The concept of the "age penalty" was proposed in [17], where it was assumed to be a non-decreasing function of the AoI, providing a general way to measure dissatisfaction with the staleness of information. Closed-form expressions for general penalty functions were studied in energy harvesting networks in [18]. In [19–21], three specific penalty functions (exponential, linear and logarithmic) and their statistical characterisations were further investigated. Moreover, the connection of the AoI with signal processing and information theory has received much attention, as it can provide a theoretical basis for non-linear age functions. The mean square error (MSE) of remote estimation introduces non-linearity, and it was used to evaluate information value in [22–25]. The relationship between the AoI and the MSE was studied for the Wiener process [22] and the Ornstein–Uhlenbeck (OU) process [23]. It is interesting to note that the age-optimal sampling policy is not equivalent to the MSE-optimal sampling policy. Mutual information was utilised in [26] to quantify the timeliness of data, and the optimal sampling policy was explored for a Markov source. In [26], the samples were assumed to be directly observable when received. In practice, samples at the source can be corrupted by noise, errors or measurement distortions, and thus they may be latent at the receiver. However, properties of the information value in hidden Markov models have not been explicitly studied. Furthermore, the authors in [20] proposed that age penalty functions can be chosen and adjusted according to the autocorrelation of the underlying random process, but a theoretical interpretation or formal justification for how to choose non-linear functions and how they relate to the correlation of the underlying process was not provided.

In our previous work [27], we proposed a mutual information-based value of information framework for hidden Markov models and started to explore it in the context of a noisy OU process. We obtained a closed-form expression for the VoI, which relates to the correlation of the process under observation at the source and the noise in the transmission environment, but we did not investigate its relationship to the AoI or its statistical characterisation in more depth. In this paper, the connection of the proposed VoI with the AoI is studied for a noisy OU process. The OU process is considered because it is an important continuous-time, stationary, Markov and Gaussian random process, which can model many real-world applications [28]. For example, it can be used to model the mobility of a drone that moves towards a target point but experiences positional fluctuations in unmanned aerial vehicle (UAV) networks. In this work, we give a formal justification for how the correlation and the noise in the context affect the VoI and its relationship to the AoI, and obtain the functional dependency between them. We show that the proposed VoI framework is a general one that includes the special cases given in [20], and that it is suitable for different network settings. Moreover, we study the VoI in an FCFS M/M/1 queue model, deriving the probability density function (PDF), cumulative distribution function (CDF), average VoI and moment-generating function (MGF). We also derive the upper and lower bounds of the average VoI, which are tractable and useful for the design and optimisation of freshness-aware applications. Through all of these results, we provide a clear statistical framework linking the VoI to the AoI and a formal justification for the selection of non-linear age functions.

The rest of this paper is organised as follows. The VoI formalism in the noisy OU process model is introduced in Section 2. Relationships between the VoI and the AoI for different network settings are investigated in Section 3. The statistical characterisation of the VoI in the FCFS M/M/1 queue model is given in Section 4. Numerical results are provided in Section 5. Conclusions are drawn in Section 6.

#### **2. VoI with Application to OU Processes**

Here, we provide a brief introduction to the VoI framework that is used in this paper, and we recount key results reported in [27] that will be used later in the paper.

#### *2.1. VoI Definition*

We consider a real-time status update system with a pair of transmitter and receiver nodes. The source samples the data of a targeted random process $\{X_t\}$ and sends status updates to the receiver node for further analysis. Denote $X_{t_i}$ as the $i$-th status update of the underlying random process, and denote $Y_{t'_i}$ as the corresponding observation captured in the observed random process $\{Y_t\}$. Here, $t_i$ represents the sampling time of the $i$-th sample, and $t'_i$ represents its receiving time. We consider a latent variable model in which the observation $Y_{t'_i}$ may differ from the initial value, as the update $X_{t_i}$ can be corrupted by transmission noise or measurement error by the time it is received at the destination.

In this paper, the notion of the value of information is defined as the mutual information between the current status of the process under observation at the transmitter and a sequence of noisy measurements recorded by the receiver. Specifically, the VoI at the time *t* is given as the following:

$$v(t) = I\big(X_t;\, Y_{t'_n}, \dots, Y_{t'_{n-m+1}}\big), \quad t > t'_n. \tag{1}$$

Here, $n$ denotes the index of the last update received during the period $(0, t)$. We look back in time, and the most recent $m$ of the $n$ noisy observations ($m \le n$) are utilised to evaluate the information value. This definition can be interpreted as the reduction in the uncertainty of the current hidden status, given some past noisy measurements.

#### *2.2. Noisy OU Process Model*

We assume the random process {*Xt*} under observation is an Ornstein–Uhlenbeck process, which can be used to represent the mean reversion behaviour in practice. The underlying OU process satisfies the following stochastic differential equation:

$$\mathbf{d}X\_t = \kappa(\theta - X\_t)\,\mathrm{d}t + \sigma\,\mathrm{d}W\_t. \tag{2}$$

Here, $\kappa$ ($\kappa > 0$) is the rate of mean reversion, which can be used to represent the correlation property of status updates, $\theta$ is the long-term mean, $\sigma$ is the volatility of the random fluctuation, and $\{W_t\}$ is the Wiener process. We assume that the initial value $X_0$ is a Gaussian variable with mean $\theta$ and variance $\sigma^2/(2\kappa)$.
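Since the OU transition density is Gaussian, the SDE in (2) can be simulated without discretisation error. The following sketch (in Python with NumPy; ours, not part of the paper) draws a path from the exact transition, started from the stationary law, and checks the stationary variance $\sigma^2/(2\kappa)$:

```python
import numpy as np

def simulate_ou(kappa, theta, sigma, dt, n_steps, rng):
    """Exact OU simulation: X_{t+dt} = theta + (X_t - theta) e^{-kappa dt} + Gaussian noise,
    started from the stationary law N(theta, sigma^2 / (2 kappa))."""
    x = np.empty(n_steps + 1)
    x[0] = theta + np.sqrt(sigma**2 / (2 * kappa)) * rng.standard_normal()
    decay = np.exp(-kappa * dt)
    cond_std = np.sqrt(sigma**2 / (2 * kappa) * (1 - decay**2))
    for i in range(n_steps):
        x[i + 1] = theta + (x[i] - theta) * decay + cond_std * rng.standard_normal()
    return x

rng = np.random.default_rng(0)
path = simulate_ou(kappa=0.5, theta=0.0, sigma=1.0, dt=0.01, n_steps=200_000, rng=rng)
print(path.mean(), path.var())  # stationary mean theta = 0, variance sigma^2/(2 kappa) = 1
```

The exact transition avoids the bias an Euler scheme would introduce at coarse step sizes.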

We assume this OU process {*Xt*} is observed through an additive noise channel, and the corresponding noisy observation is defined as the following:

$$Y_{t'_i} = X_{t_i} + N_{t'_i}, \tag{3}$$

where $N_{t'_i}$ is the sample of the noise process taken by the receiver at $t'_i$. The samples $\{N_{t'_i}\}$ are assumed to be independent Gaussian variables with zero mean and constant variance $\sigma_n^2$. In practice, this noise represents the measurement or transmission error that corrupts the status update $X_{t_i}$ of the underlying OU process.

#### *2.3. VoI for the Noisy OU Process*

Based on the model described above, the samples of the underlying OU process are jointly Gaussian and the noise samples are also Gaussian, which allowed us to calculate the VoI in closed form in our previous work [27]. The VoI for the noisy OU process is given as follows:

$$v(t) = -\frac{1}{2}\log\left(1 - e^{-2\kappa(t - t_n)} + e^{-2\kappa(t - t_n)}\frac{\det(\mathbf{A}_{mm})}{\gamma \det(\mathbf{A})}\right), \quad t > t'_n. \tag{4}$$

Here, $\mathbf{A} = \sigma_n^2 \boldsymbol{\Sigma}_{\mathbf{X}}^{-1} + \mathbf{I}$, where $\boldsymbol{\Sigma}_{\mathbf{X}}$ is the covariance matrix of the vector $\mathbf{X} = [X_{t_{n-m+1}}, \dots, X_{t_n}]^{\mathrm{T}}$ and $\mathbf{I}$ is the identity matrix of size $m$. $\mathbf{A}_{ij}$ denotes the $(m-1) \times (m-1)$ matrix constructed by deleting the $i$-th row and the $j$-th column of $\mathbf{A}$, and $\gamma$ denotes the ratio of the variance of the OU process to the variance of the noise, i.e., the following:

$$\gamma = \frac{\mathrm{Var}[X_{t_i}]}{\mathrm{Var}[N_{t'_i}]} = \frac{\sigma^2}{2\kappa\sigma_n^2}. \tag{5}$$

The parameter *γ* is similar to the concept of the signal-to-noise ratio (SNR) in a communication system. In the following, the concept "SNR" refers to this parameter, which is used to compare the randomness in the OU process and the noise in the communication channel.
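As a concrete illustration (a Python/NumPy sketch of ours, not the authors' code), the general expression (4) can be evaluated by building the OU covariance matrix with entries $\frac{\sigma^2}{2\kappa}e^{-\kappa|t_i - t_j|}$; for $m = 1$ it must reduce to the single-observation formula (7) derived in Section 3:

```python
import numpy as np

def voi_general(t, sample_times, kappa, sigma, sigma_n2):
    """VoI v(t) from Eq. (4), using the last m noisy OU samples."""
    s = np.asarray(sample_times, dtype=float)   # t_{n-m+1}, ..., t_n
    var_x = sigma**2 / (2 * kappa)
    gamma = var_x / sigma_n2                    # SNR, Eq. (5)
    cov = var_x * np.exp(-kappa * np.abs(s[:, None] - s[None, :]))
    A = sigma_n2 * np.linalg.inv(cov) + np.eye(len(s))
    det_ratio = (np.linalg.det(A[:-1, :-1]) if len(s) > 1 else 1.0) / np.linalg.det(A)
    decay = np.exp(-2 * kappa * (t - s[-1]))
    return -0.5 * np.log(1 - decay + decay * det_ratio / gamma)

kappa, sigma, sigma_n2 = 0.5, 1.0, 0.5
gamma = sigma**2 / (2 * kappa * sigma_n2)
v1 = voi_general(3.0, [1.0], kappa, sigma, sigma_n2)                          # m = 1
v_closed = -0.5 * np.log(1 - gamma / (1 + gamma) * np.exp(-2 * kappa * 2.0))  # Eq. (7)
v2 = voi_general(3.0, [0.0, 1.0], kappa, sigma, sigma_n2)                     # m = 2
print(v1, v_closed, v2)
```

Since conditioning on an additional observation cannot decrease mutual information, the $m = 2$ value is at least the $m = 1$ value.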

#### **3. Relationship between VoI and AoI**

The result in (4) gives the general expression for the VoI in the noisy OU process. In this section, we consider the special case of a single observation ($m = 1$) and explore the relationship between the proposed VoI and the AoI. In the definition of the AoI, we consider the time instant $t_n$ to be fixed, i.e., we view the AoI as deterministic. Our aim here is to establish a relationship between the VoI and this conditional AoI (the AoI conditioned on the most recent sampling time).

The concept of the AoI is given as follows [1]:

$$A(t) = t - t_n, \quad t > t'_n. \tag{6}$$

In the noisy OU process, when *m* = 1, the VoI in (4) can be simplified as follows:

$$v(t) = -\frac{1}{2}\log\left(1 - \frac{\gamma}{1+\gamma}e^{-2\kappa(t-t_n)}\right), \quad t > t'_n,\tag{7}$$

which is bounded by the following:

$$0 \le v \le \frac{1}{2} \log(1+\gamma). \tag{8}$$
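A quick numerical check (a Python/NumPy sketch of ours) confirms that (7) attains the upper limit in (8) for a fresh sample and decays monotonically towards zero as the sample ages:

```python
import numpy as np

def voi_of_age(a, kappa, gamma):
    """Single-observation VoI (7) as a function of the age a = t - t_n."""
    return -0.5 * np.log(1 - gamma / (1 + gamma) * np.exp(-2 * kappa * a))

kappa, gamma = 0.5, 2.0
a = np.linspace(0.0, 10.0, 1001)
v = voi_of_age(a, kappa, gamma)
print(v[0], 0.5 * np.log(1 + gamma))  # fresh sample: VoI equals (1/2) log(1 + gamma)
print(v[-1])                          # stale sample: VoI has decayed towards 0
```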

Therefore, the VoI is further written as a function of the AoI. Let *a* = *A*(*t*); then, the VoI can be written as follows:

$$V(a) = -\frac{1}{2}\log\left(1 - \frac{\gamma}{1+\gamma}e^{-2\kappa a}\right). \tag{9}$$

The VoI in (9) and its relationship to the AoI are largely affected by the system parameters. Fixing the random fluctuation parameter $\sigma^2$ of the OU process, the SNR $\gamma$ depends on two parameters, $\kappa$ and $\sigma_n^2$. The parameter $\kappa$ captures the correlation of the underlying OU process: if $\kappa$ is small, the status updates are highly correlated; as $\kappa$ increases, they become less correlated. The parameter $\sigma_n^2$ represents the noise level in the transmission environment: if $\sigma_n^2$ is small, the underlying hidden Markov process is dominant and the VoI approaches its Markov counterpart in the OU model; otherwise, the noise process is dominant. In the following, the relationship between the VoI and the AoI in different SNR regimes is investigated, leading to the following corollaries.

**Corollary 1.** *In the low SNR regime, the VoI can be approximated as an exponential function of the AoI, which is given by the following:*

$$V(a) \approx \frac{\gamma}{2(1+\gamma)} e^{-2\kappa a}.\tag{10}$$

**Proof.** A low SNR (small $\gamma$) in (5) results from large $\kappa$ with $\sigma_n^2 > 0$, or from large $\sigma_n^2$ with $\kappa > 0$. When $\gamma$ approaches 0, the term $\frac{\gamma}{1+\gamma}e^{-2\kappa a}$ in (9) is small. For small $x$, we have $\log(1-x) \approx -x$; applying this to (9) gives the result in (10).

In the low SNR regime, the dependence between the VoI and the AoI is exponential. Weakly correlated samples or strong noise reduce the VoI at the receiver, so the approximate VoI decays faster as the AoI increases. For a weakly correlated data source, even fresh updates may contain little valuable information about the underlying OU process; under strong noise, status updates are corrupted by the indirect observation.

**Corollary 2.** *In the high SNR regime resulting from high correlation, the VoI can be approximated as a logarithmic function of the AoI, which is given by the following:*

$$V(a) \approx -\frac{1}{2}\log(2\kappa\gamma a + 1) + \frac{1}{2}\log(1+\gamma). \tag{11}$$

**Proof.** For small $x$, we have $e^x \approx 1 + x$. Therefore, when $\kappa \to 0$ in (9), $e^{-2\kappa a} \approx 1 - 2\kappa a$. Substituting this into (9) gives $V(a) \approx -\frac{1}{2}\log\frac{1 + 2\kappa\gamma a}{1+\gamma}$, which is the result in (11).

For highly correlated status updates, the VoI is a logarithmic function of the AoI, which means that the VoI decreases more slowly as the AoI increases. In this case, correlated updates are transmitted under good channel conditions, so even old samples may still hold valuable information.

**Corollary 3.** *In the intermediate SNR regime where $\kappa \to 0$ and $\sigma_n^2 \to \infty$ with $\kappa\sigma_n^2$ constant, the VoI can be approximated as a linear function of the AoI, which is given by the following:*

$$V(a) \approx -\kappa \gamma a + \frac{1}{2} \log(1 + \gamma). \tag{12}$$

*In the intermediate SNR regime where $\kappa \to \infty$ and $\sigma_n^2 \to 0$ with $\kappa\sigma_n^2$ constant, the VoI can be approximated as an exponential function of the AoI, which is given by the following:*

$$V(a) \approx \frac{\gamma}{2(1+\gamma)} e^{-2\kappa a}.\tag{13}$$

**Proof.** The result in (12) follows directly from Corollary 2: when $\kappa \to 0$, the term $2\kappa\gamma a$ in (11) is small, so $\log(2\kappa\gamma a + 1) \approx 2\kappa\gamma a$. The result in (13) matches Corollary 1: when $\kappa \to \infty$, the term $e^{-2\kappa a}$ in (9) is small, and $\log(1-x) \approx -x$ for small $x$ yields (13).

The three corollaries stated above provide compelling insight into the adoption of non-linear AoI functions. Several existing works use exponential and logarithmic non-linear age functions to measure information value, but give no formal justification for why these functions are selected. Corollaries 1 to 3 provide a theoretical interpretation and explain how the correlation, noise and SNR affect the VoI and its relationship to the AoI in the noisy OU process. In general, the low and high SNR conditions yield exponential and logarithmic relationships, respectively, while the intermediate SNR regime yields an exponential or linear relationship, depending on the noise and correlation. The proposed VoI framework is therefore more complete and general, and appropriate for measuring the timeliness of information across different SNR regimes.
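The three regimes can be checked numerically. The sketch below (Python/NumPy; the parameter values are ours, chosen only to place the system in each regime) compares the exact VoI (9) with the approximations (10)–(12) at age $a = 1$:

```python
import numpy as np

def voi_exact(a, kappa, gamma):
    """Exact single-observation VoI, Eq. (9)."""
    return -0.5 * np.log(1 - gamma / (1 + gamma) * np.exp(-2 * kappa * a))

a = 1.0
# Corollary 1 -- low SNR: exponential approximation (10)
kappa, gamma = 2.0, 0.05
err_low = abs(voi_exact(a, kappa, gamma)
              - gamma / (2 * (1 + gamma)) * np.exp(-2 * kappa * a))
# Corollary 2 -- high SNR from high correlation: logarithmic approximation (11)
kappa, gamma = 0.01, 20.0
err_high = abs(voi_exact(a, kappa, gamma)
               - (-0.5 * np.log(2 * kappa * gamma * a + 1) + 0.5 * np.log(1 + gamma)))
# Corollary 3 -- intermediate SNR: linear approximation (12)
kappa, gamma = 0.01, 1.0
err_mid = abs(voi_exact(a, kappa, gamma)
              - (-kappa * gamma * a + 0.5 * np.log(1 + gamma)))
print(err_low, err_high, err_mid)  # all three approximation errors are small
```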

#### **4. Statistical Properties of the VoI in the M/M/1 Queue Model**

Equations (10)–(13) show general relationships between the VoI and the AoI in the noisy OU process. In this section, we relax the "fixed time instants" restriction given in Section 3 and view the AoI as a random variable to study the distribution of the VoI. We explore the VoI in a specific FCFS M/M/1 queue system and derive its statistical properties (including the PDF, CDF, expectation value and MGF).

#### *4.1. Distribution of the VoI*

We assume that status updates of the underlying OU process are transmitted through an FCFS M/M/1 queue: they are sampled according to a Poisson process of rate $\lambda$, and the service times are exponential with rate $\mu$ ($\lambda < \mu$). Let the random variables $S_i = t_i - t_{i-1}$ ($2 \le i \le n$) be the sampling intervals between two packets, which are independent and identically distributed (i.i.d.) exponential random variables with $\mathrm{E}[S] = 1/\lambda$. Similarly, the service times of status updates are i.i.d. exponential random variables with mean $1/\mu$. For the M/M/1 queue, the stationary distribution of the AoI was studied in [11], and the PDF and CDF of the AoI are given as follows:

$$f_A(a) = \mu \left[ \frac{\mu - \lambda}{\mu} e^{-(\mu - \lambda)a} - \left( \frac{\mu}{\mu - \lambda} + \lambda a - \frac{\lambda}{\mu} \right) e^{-\mu a} + \frac{\lambda}{\mu - \lambda} e^{-\lambda a} \right],\tag{14}$$

$$F\_A(a) = 1 - e^{-(\mu - \lambda)a} + \left(\frac{\mu}{\mu - \lambda} + \lambda a\right)e^{-\mu a} - \frac{\mu}{\mu - \lambda}e^{-\lambda a}.\tag{15}$$
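As a sanity check (our Python/NumPy sketch, not part of the paper), the density (14) should integrate to one and be the derivative of the distribution (15):

```python
import numpy as np

def aoi_pdf(a, lam, mu):
    """Stationary AoI density of the FCFS M/M/1 queue, Eq. (14)."""
    return mu * ((mu - lam) / mu * np.exp(-(mu - lam) * a)
                 - (mu / (mu - lam) + lam * a - lam / mu) * np.exp(-mu * a)
                 + lam / (mu - lam) * np.exp(-lam * a))

def aoi_cdf(a, lam, mu):
    """Stationary AoI distribution, Eq. (15)."""
    return (1 - np.exp(-(mu - lam) * a)
            + (mu / (mu - lam) + lam * a) * np.exp(-mu * a)
            - mu / (mu - lam) * np.exp(-lam * a))

lam, mu = 0.5, 1.0
a = np.linspace(0.0, 60.0, 600_001)
f = aoi_pdf(a, lam, mu)
mass = np.sum((f[1:] + f[:-1]) / 2 * np.diff(a))   # trapezoidal rule
print(mass)                                        # total probability, close to 1
print(aoi_cdf(0.0, lam, mu), aoi_cdf(60.0, lam, mu))
```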

It can be seen that the distribution of the AoI relates only to the queue discipline; it is independent of the statistical characteristics of the underlying random process. For the distribution of the VoI of a latent OU process with a single observation, we can state the following propositions.

**Proposition 1.** *In the M/M/1 queue model, the PDF of the VoI for the noisy OU process is given by the following:*

$$f_V(v) = \frac{\mu e^{-2v}}{\kappa (1 - e^{-2v})} \left[ \frac{\mu - \lambda}{\mu} r(v)^{\frac{\mu - \lambda}{2\kappa}} - \left( \frac{\mu}{\mu - \lambda} - \frac{\lambda}{\mu} - \frac{\lambda}{2\kappa} \log r(v) \right) r(v)^{\frac{\mu}{2\kappa}} + \frac{\lambda}{\mu - \lambda} r(v)^{\frac{\lambda}{2\kappa}} \right], \tag{16}$$

*where r*(*v*) *is denoted as follows:*

$$r(v) = \frac{(1+\gamma)(1-e^{-2v})}{\gamma}.\tag{17}$$

**Proof.** Since (9) is a monotonically decreasing function, the PDF of the VoI can be calculated by the following:

$$f_V(v) = f_A\big(V^{-1}(v)\big) \left| \frac{\mathrm{d}}{\mathrm{d}v} V^{-1}(v) \right|. \tag{18}$$

Here, *V*−<sup>1</sup> denotes the inverse function of the VoI given in (9), which can be written as follows:

$$V^{-1}(v) = -\frac{1}{2\kappa} \log\left(\frac{(1+\gamma)(1-e^{-2v})}{\gamma}\right),\tag{19}$$

and we have the following:

$$\frac{d}{dv}\left(V^{-1}(v)\right) = -\frac{e^{-2v}}{\kappa(1-e^{-2v})}.\tag{20}$$

Therefore, the PDF of the VoI given in (16) is obtained by substituting (19), (14) and (20) into (18).
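The change of variables in (18)–(20) is easy to implement directly. The sketch below (Python/NumPy, ours) builds $f_V$ from the AoI density and verifies that it integrates to one over the support $(0, \frac{1}{2}\log(1+\gamma))$:

```python
import numpy as np

def aoi_pdf(a, lam, mu):
    """Stationary AoI density of the FCFS M/M/1 queue, Eq. (14)."""
    return mu * ((mu - lam) / mu * np.exp(-(mu - lam) * a)
                 - (mu / (mu - lam) + lam * a - lam / mu) * np.exp(-mu * a)
                 + lam / (mu - lam) * np.exp(-lam * a))

def voi_pdf(v, lam, mu, kappa, gamma):
    """VoI density via the change of variables in Eq. (18)."""
    r = (1 + gamma) * (1 - np.exp(-2 * v)) / gamma          # Eq. (17)
    a = -np.log(r) / (2 * kappa)                            # inverse map, Eq. (19)
    jac = np.exp(-2 * v) / (kappa * (1 - np.exp(-2 * v)))   # |dV^{-1}/dv|, Eq. (20)
    return aoi_pdf(a, lam, mu) * jac

lam, mu, kappa, gamma = 0.5, 1.0, 0.1, 10.0
v_max = 0.5 * np.log(1 + gamma)                 # upper end of the VoI support, Eq. (8)
v = np.linspace(1e-6, v_max - 1e-9, 400_001)
f = voi_pdf(v, lam, mu, kappa, gamma)
mass = np.sum((f[1:] + f[:-1]) / 2 * np.diff(v))
print(mass)   # total probability, close to 1
```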

**Proposition 2.** *In the M/M/1 queue model, the CDF of the VoI for the noisy OU process is given as follows:*

$$F\_V(v) = r(v)^{\frac{\mu-\lambda}{2\kappa}} - \left(\frac{\mu}{\mu-\lambda} - \frac{\lambda}{2\kappa}\log r(v)\right)r(v)^{\frac{\mu}{2\kappa}} + \frac{\mu}{\mu-\lambda}r(v)^{\frac{\lambda}{2\kappa}}.\tag{21}$$

**Proof.** The CDF is obtained directly by integrating the PDF, i.e., $F_V(v) = \mathrm{P}(V \le v) = \int_0^v f_V(x)\,\mathrm{d}x$.

Propositions 1 and 2 show that the distribution of the VoI relates to the sampling rate $\lambda$, the service rate $\mu$, the correlation parameter $\kappa$ and the noise parameter $\sigma_n^2$, while the AoI distribution relates only to $\lambda$ and $\mu$ for the M/M/1 queue system.

The CDF of the VoI given in Proposition 2 can be interpreted as the "VoI outage probability", i.e., the probability that the VoI is smaller than a given threshold. It is interesting to note that Proposition 2 implies that the VoI outage probability is a monotonically decreasing function of the service rate $\mu$, converging to $r(v)^{\frac{\lambda}{2\kappa}}$ as $\mu$ goes to infinity. The proof is given in Appendix A.1. This decrease of the VoI outage with $\mu$ is to be expected, since the information value should increase as the service time in the queue decreases.

Proposition 2 also implies that the VoI outage probability first decreases and then increases as the sampling rate $\lambda$ increases. The optimal sampling rate $\lambda^*$ satisfies $\frac{\partial \mathrm{P}(V \le v)}{\partial \lambda}\big|_{\lambda = \lambda^*} = 0$. The proof is provided in Appendix A.2. It is not surprising that a small sampling rate $\lambda$ leads to high outage, owing to the lack of fresh updates at the source. It is interesting that a large sampling rate can also lead to high outage probability, owing to traffic congestion in the queue.
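Both effects are easy to reproduce numerically. The sketch below (Python/NumPy, ours; the parameter values mirror the later figures) evaluates the outage probability (21), locates an interior optimal $\lambda$ on a grid, and checks monotonicity in $\mu$:

```python
import numpy as np

def voi_outage(v, lam, mu, kappa, gamma):
    """P(V <= v) from Eq. (21)."""
    r = (1 + gamma) * (1 - np.exp(-2 * v)) / gamma
    return (r ** ((mu - lam) / (2 * kappa))
            - (mu / (mu - lam) - lam / (2 * kappa) * np.log(r)) * r ** (mu / (2 * kappa))
            + mu / (mu - lam) * r ** (lam / (2 * kappa)))

v, mu, kappa, gamma = 0.4, 1.0, 0.1, 10.0
lams = np.linspace(0.01, 0.99, 99)
out = np.array([voi_outage(v, l, mu, kappa, gamma) for l in lams])
lam_star = lams[np.argmin(out)]
print(lam_star, out.min())   # interior optimum: too few or too many samples both hurt

mus = np.linspace(1.0, 5.0, 41)
out_mu = np.array([voi_outage(v, 0.2, m, kappa, gamma) for m in mus])
r_v = (1 + gamma) * (1 - np.exp(-2 * v)) / gamma
print(out_mu[0], out_mu[-1], r_v ** (0.2 / (2 * kappa)))  # decreasing towards the mu -> infinity limit
```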

#### *4.2. Moments and Bounds*

In this subsection, we derive the expectation and two bounds of the VoI with a single observation, and calculate the moment-generating function of the VoI. We can state the following two propositions.

**Proposition 3.** *In the M/M/1 queue model, the average VoI for the noisy OU process is given as the following:*

$$\begin{split} \mathbb{E}[V] &= \frac{1}{2} \Big[ \log(1+\gamma) - g\_1 \left( \frac{\gamma}{1+\gamma}, \frac{\mu-\lambda}{2\kappa} \right) - \frac{\mu}{\mu-\lambda} g\_1 \left( \frac{\gamma}{1+\gamma}, \frac{\lambda}{2\kappa} \right) \\ &+ \left( \frac{\mu}{\mu-\lambda} + \frac{\lambda}{2\kappa} \log \frac{\gamma}{1+\gamma} \right) g\_1 \left( \frac{\gamma}{1+\gamma}, \frac{\mu}{2\kappa} \right) - \frac{\lambda}{2\kappa} g\_2 \left( \frac{\gamma}{1+\gamma}, \frac{\mu}{2\kappa} \right) \Big], \end{split} \tag{22}$$

*where the functions $g_1(x, y)$ and $g_2(x, y)$ are defined for $x > 0$ and $y > 0$ as follows:*

$$g\_1(x,y) = \frac{1}{x^y} \int\_0^x \frac{z^y}{1-z} \,\mathrm{d}z,\tag{23}$$

$$g_2(x, y) = \frac{1}{x^y} \int_0^x \frac{z^y \log z}{1 - z} \,\mathrm{d}z. \tag{24}$$

*Moreover, the average VoI is lower bounded by the following:*

$$\mathbb{E}[V] \ge -\frac{1}{2}\log\left[1 - \frac{\gamma}{1+\gamma} \left(\frac{\frac{\mu-\lambda}{2\kappa}}{\frac{\mu-\lambda}{2\kappa}+1} - \frac{\frac{\mu-\lambda}{2\kappa}(\frac{\mu+\lambda}{2\kappa}+1)}{\left(\frac{\mu}{2\kappa}+1\right)^2(\frac{\lambda}{2\kappa}+1)}\right)\right],\tag{25}$$

*and it is upper bounded by the following:*

$$\mathrm{E}[V] \le \frac{1}{2} \left[ H\left(\frac{\mu-\lambda}{2\kappa}\right) + \frac{\mu}{\mu-\lambda} H\left(\frac{\lambda}{2\kappa}\right) - \frac{\mu}{\mu-\lambda} H\left(\frac{\mu}{2\kappa}\right) + \frac{\lambda}{2\kappa} \psi^{(1)}\left(1+\frac{\mu}{2\kappa}\right) \right]. \tag{26}$$

Here, $H(\cdot)$ denotes the harmonic number, with integral representation $H(x) = \int_0^1 \frac{1 - z^x}{1 - z}\,\mathrm{d}z$ [29], and $\psi^{(1)}(\cdot)$ denotes the first-order polygamma (trigamma) function, given by $\psi^{(1)}(x) = -\int_0^1 \frac{z^{x-1}\log z}{1 - z}\,\mathrm{d}z$ [30].

**Proof.** See Appendix B.

This proposition gives two bounds of the average VoI in the noisy OU process. Compared with the general average VoI, the bounds are more tractable and may be useful for network design and optimisation. The details of the bounds are given in Appendix B as stated above.

The lower bound is based on Jensen's inequality; equality holds when the VoI is a linear function of the Laplace transform of the AoI, $\mathrm{E}[e^{-2\kappa a}]$. Corollary 1 shows that the dependence between the VoI and $e^{-2\kappa a}$ is approximately linear under the low SNR condition, so the average VoI approaches this lower bound in the low SNR regime. Moreover, as stated in Appendix B, the upper bound is based on the average VoI in the Markov (noise-free) model; hence, the upper bound is tight in the high SNR regime.
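The ordering lower bound $\le \mathrm{E}[V] \le$ upper bound can be verified numerically. In the sketch below (Python/NumPy, ours), the parameters are chosen so that $(\mu-\lambda)/(2\kappa)$, $\lambda/(2\kappa)$ and $\mu/(2\kappa)$ are integers, letting the harmonic numbers and the trigamma value be computed by elementary sums; $\mathrm{E}[V]$ is obtained by integrating the age map (9) against the AoI density (14):

```python
import numpy as np

kappa, lam, mu, gamma = 0.1, 0.2, 1.0, 10.0
P = int(round((mu - lam) / (2 * kappa)))   # (mu - lam) / (2 kappa) = 4
L = int(round(lam / (2 * kappa)))          # lam / (2 kappa) = 1
M = int(round(mu / (2 * kappa)))           # mu / (2 kappa) = 5

def aoi_pdf(a):
    """Stationary AoI density, Eq. (14)."""
    return mu * ((mu - lam) / mu * np.exp(-(mu - lam) * a)
                 - (mu / (mu - lam) + lam * a - lam / mu) * np.exp(-mu * a)
                 + lam / (mu - lam) * np.exp(-lam * a))

# E[V] = E[V(A)]: integrate Eq. (9) against Eq. (14)
a = np.linspace(1e-9, 150.0, 1_500_001)
V = -0.5 * np.log(1 - gamma / (1 + gamma) * np.exp(-2 * kappa * a))
f = aoi_pdf(a)
EV = np.sum((V[1:] * f[1:] + V[:-1] * f[:-1]) / 2 * np.diff(a))

# Jensen lower bound, Eq. (25)
inner = P / (P + 1) - P * (M + L + 1) / ((M + 1) ** 2 * (L + 1))
lower = -0.5 * np.log(1 - gamma / (1 + gamma) * inner)

# Upper bound, Eq. (26), with H(n) a harmonic number and psi1(n) the trigamma function
H = lambda n: sum(1.0 / k for k in range(1, n + 1))
psi1 = lambda n: np.pi**2 / 6 - sum(1.0 / k**2 for k in range(1, n))
upper = 0.5 * (H(P) + mu / (mu - lam) * H(L) - mu / (mu - lam) * H(M) + L * psi1(1 + M))

print(lower, EV, upper)   # lower <= E[V] <= upper
```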

**Proposition 4.** *In the M/M/1 queue, the MGF of the VoI for the noisy OU process is given as follows:*

$$\begin{split} M_V(t) &= \,{}_2F_1\left(\frac{\mu-\lambda}{2\kappa}, \frac{t}{2}; \frac{\mu-\lambda}{2\kappa} + 1; \frac{\gamma}{1+\gamma}\right) + \frac{\mu}{\mu-\lambda} \,{}_2F_1\left(\frac{\lambda}{2\kappa}, \frac{t}{2}; \frac{\lambda}{2\kappa} + 1; \frac{\gamma}{1+\gamma}\right) \\ &- \left(\frac{\mu}{\mu-\lambda} - \frac{\lambda}{\mu}\right) \,{}_2F_1\left(\frac{\mu}{2\kappa}, \frac{t}{2}; \frac{\mu}{2\kappa} + 1; \frac{\gamma}{1+\gamma}\right) - \frac{\lambda}{\mu} \,{}_3F_2\left(\frac{\mu}{2\kappa}, \frac{\mu}{2\kappa}, \frac{t}{2}; \frac{\mu}{2\kappa} + 1, \frac{\mu}{2\kappa} + 1; \frac{\gamma}{1+\gamma}\right). \end{split} \tag{27}$$

Here, ${}_pF_q(a_1, \dots, a_p; b_1, \dots, b_q; z)$ denotes the generalised hypergeometric function, which is given by the following series:

$${}_pF_q(a_1, \dots, a_p; b_1, \dots, b_q; z) = \sum_{n=0}^{\infty} \frac{(a_1)_n \cdots (a_p)_n}{(b_1)_n \cdots (b_q)_n} \frac{z^n}{n!}, \tag{28}$$

where $(\cdot)_n$ denotes the Pochhammer symbol (rising factorial), given as follows:

$$(x)_n = \begin{cases} 1 & n = 0 \\ \prod_{i=0}^{n-1} (x + i) & n \ge 1 \end{cases}.\tag{29}$$

**Proof.** See Appendix C.

Moments of the VoI are obtained from derivatives of the MGF at $t = 0$. The average VoI given in Proposition 3 is the first moment and can be derived from the MGF directly. Using the MGF, higher-order moments can also be exploited for system design and optimisation, rather than only the average value.
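The MGF is straightforward to evaluate by truncating the series (28). The sketch below (Python/NumPy, ours) implements ${}_pF_q$ directly from the Pochhammer recursion, checks the normalisation $M_V(0) = 1$, and cross-checks $M_V'(0) = \mathrm{E}[V]$ against direct integration of (9) over the AoI law (14):

```python
import numpy as np

lam, mu, kappa, gamma = 0.2, 1.0, 0.1, 10.0
c = gamma / (1 + gamma)
P, L, M = (mu - lam) / (2 * kappa), lam / (2 * kappa), mu / (2 * kappa)

def pFq(a_list, b_list, z, terms=600):
    """Generalised hypergeometric series (28), truncated; adequate for |z| < 1."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        factor = z / (n + 1)           # term_{n+1} = term_n * prod(a+n)/prod(b+n) * z/(n+1)
        for a_ in a_list:
            factor *= a_ + n
        for b_ in b_list:
            factor /= b_ + n
        term *= factor
    return total

def mgf(t):
    """MGF of the VoI, Eq. (27)."""
    return (pFq([P, t / 2], [P + 1], c)
            + mu / (mu - lam) * pFq([L, t / 2], [L + 1], c)
            - (mu / (mu - lam) - lam / mu) * pFq([M, t / 2], [M + 1], c)
            - lam / mu * pFq([M, M, t / 2], [M + 1, M + 1], c))

print(mgf(0.0))   # every MGF satisfies M_V(0) = 1

# numerical M_V'(0) vs E[V] computed from the age map (9) and the AoI density (14)
a = np.linspace(1e-9, 120.0, 300_001)
fA = mu * ((mu - lam) / mu * np.exp(-(mu - lam) * a)
           - (mu / (mu - lam) + lam * a - lam / mu) * np.exp(-mu * a)
           + lam / (mu - lam) * np.exp(-lam * a))
V = -0.5 * np.log(1 - c * np.exp(-2 * kappa * a))
EV = np.sum((V[1:] * fA[1:] + V[:-1] * fA[:-1]) / 2 * np.diff(a))
h = 1e-4
d = (mgf(h) - mgf(-h)) / (2 * h)
print(d, EV)      # first moment from the MGF matches the direct integral
```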

#### **5. Numerical Results**

In this section, the relationship between the VoI and the AoI and the distribution of the VoI are investigated through Monte Carlo simulations. In the simulations, the volatility parameter $\sigma^2$ of the OU model is fixed at 1. The sampling times $\{t_i\}$ are generated by a rate-$\lambda$ Poisson process, and the service times are i.i.d. rate-$\mu$ exponential random variables. We set the observation time $t = 100$. In each round, we record the sampling time of the most recently received update as $t_n$, and the AoI and the VoI are calculated by (6) and (7), respectively.
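The simulation loop can be sketched as follows (Python/NumPy; our reconstruction of the described setup rather than the authors' code). It draws FCFS M/M/1 rounds and compares the empirical AoI distribution at $t = 100$ with the closed form (15):

```python
import numpy as np

def sample_aoi(lam, mu, T, rng):
    """One FCFS M/M/1 round: AoI at observation time T.
    If nothing has been delivered by T (very rare), the age since time 0 is returned."""
    t = np.cumsum(rng.exponential(1 / lam, size=int(3 * lam * T) + 50))
    t = t[t < T]                                  # Poisson(lam) sampling times in (0, T)
    depart, last = np.empty(len(t)), 0.0
    for i, ti in enumerate(t):                    # FCFS: wait for the previous departure
        last = max(last, ti) + rng.exponential(1 / mu)
        depart[i] = last
    delivered = t[depart <= T]
    return T - delivered[-1] if len(delivered) else T

rng = np.random.default_rng(1)
lam, mu, T = 0.5, 1.0, 100.0
ages = np.array([sample_aoi(lam, mu, T, rng) for _ in range(20_000)])

a0 = 4.0                                          # compare the CDF at one test point
emp = (ages <= a0).mean()
theo = (1 - np.exp(-(mu - lam) * a0)
        + (mu / (mu - lam) + lam * a0) * np.exp(-mu * a0)
        - mu / (mu - lam) * np.exp(-lam * a0))
print(emp, theo)
```

With the queue mixing quickly relative to $T = 100$, the empirical and stationary distributions should agree closely.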

Figures 1–3 show the non-linear relationships between the VoI and the AoI under low, high and intermediate SNR conditions, respectively. Figures 4–7 illustrate the distribution of the VoI, including the PDF, CDF and the outage probability. Figures 8–11 provide the numerical results about the VoI expectation and bounds.

**Figure 1.** Low SNR regime: comparison of the exact VoI and the exponential VoI versus $\kappa$ for $\sigma_n^2 \in \{10, 30\}$ at $t = 100$, sampling rate $\lambda = 0.5$ and service rate $\mu = 1$.

**Figure 2.** High SNR regime: comparison of the exact VoI and the logarithmic VoI versus $\kappa$ for $\sigma_n^2 \in \{1, 5\}$ at $t = 100$, sampling rate $\lambda = 0.5$ and service rate $\mu = 1$.

**Figure 3.** Intermediate SNR regime: comparison of the exact VoI and the linear VoI versus $\kappa$ for $\sigma_n^2 \in \{10, 30\}$ at $t = 100$, sampling rate $\lambda = 0.5$ and service rate $\mu = 1$.

**Figure 4.** The density function of the VoI; correlation parameter $\kappa = 0.1$, noise parameter $\sigma_n^2 = 0.5$, sampling rate $\lambda = 0.5$ and service rate $\mu = 1$.

**Figure 5.** The cumulative distribution function of the VoI versus $v$ for $\sigma_n^2 \in \{0.5, 1\}$ and $\kappa \in \{0.05, 0.1, 0.2\}$; sampling rate $\lambda = 0.5$ and service rate $\mu = 1$.

**Figure 6.** The VoI outage probability versus $\lambda$ for $\kappa \in \{0.05, 0.1, 0.2\}$; threshold $v = 0.4$, noise parameter $\sigma_n^2 = 0.5$ and service rate $\mu = 1$.

**Figure 7.** The VoI outage probability versus $\mu$ for $\kappa \in \{0.05, 0.1, 0.2\}$; threshold $v = 0.4$, noise parameter $\sigma_n^2 = 0.5$ and sampling rate $\lambda = 0.2$.

**Figure 8.** The average VoI and its bounds versus the sampling rate $\lambda$; correlation parameter $\kappa = 0.1$, noise parameter $\sigma_n^2 = 0.5$ and service rate $\mu = 1$.

**Figure 9.** The average VoI and its bounds versus the service rate $\mu$; correlation parameter $\kappa = 0.1$, noise parameter $\sigma_n^2 = 0.5$ and sampling rate $\lambda = 0.2$.

**Figure 10.** The average VoI and the lower bound versus $\kappa$ for $\sigma_n^2 \in \{1, 5\}$; sampling rate $\lambda = 0.5$ and service rate $\mu = 1$.

**Figure 11.** The average VoI and the upper bound versus $\kappa$ for $\sigma_n^2 \in \{0.1, 0.5, 1\}$; sampling rate $\lambda = 0.5$ and service rate $\mu = 1$.

Figures 1–3 compare the exact VoI in (7) with the approximations for the different SNR regimes given in (10)–(12). Figure 1 shows that the exponential approximation is suitable when updates are weakly correlated and the noise is large. Figure 2 shows the opposite behaviour: the logarithmic approximation is more accurate when $\kappa$ and $\sigma_n^2$ are small. Figure 3 shows that the linear approximation is accurate when $\kappa$ is small but $\sigma_n^2$ is large. These results verify the functional dependencies between the VoI and the AoI discussed in Corollaries 1–3, illustrating that the low, high and intermediate SNR conditions yield exponential, logarithmic and linear relationships, respectively.

Figure 4 numerically validates the theoretical PDF given in Proposition 1 against the density of the discrete VoI path obtained from the Monte Carlo simulations. Figures 5–7 show the VoI outage probability given in Proposition 2 for different system parameters. In Figure 5, the VoI outage probability is high when the status updates are weakly correlated or when the system experiences large noise. For a given threshold $v$, Figure 6 shows that either a too-small or a too-large sampling rate leads to a large VoI outage probability. Fixing $\mu$, a small $\lambda$ means that we do not have sufficient newly generated status updates about the underlying OU process for prediction, while a large $\lambda$ means that enough updates are sampled at the source but they must wait longer, owing to packet congestion in the FCFS queue. Figure 7 shows that the VoI outage probability decreases as the service rate $\mu$ increases. In the M/M/1 model, $\lambda < \mu$; fixing $\lambda$, a large $\mu$ means that status updates are served and transmitted more quickly, so the receiver holds more valuable information about the underlying process. These two figures verify the discussion of Proposition 2.

Figures 8 and 9 show the effect of the sampling rate and the service rate on the average VoI and its bounds given in Proposition 3. The average VoI and its bounds first increase and then decrease as $\lambda$ increases, and they increase as $\mu$ increases. This behaviour mirrors that of the VoI outage and can be explained as for Figures 6 and 7. Moreover, the theoretical average VoI is consistent with the result obtained from the Monte Carlo simulations.

Figures 10 and 11 plot the theoretical average VoI in (22) and the lower and upper bounds in (25) and (26) for different $\kappa$ and $\sigma_n^2$. Small noise $\sigma_n^2$ and small $\kappa$ lead to a large average VoI. In Figure 10, the gap between the exact value and the lower bound is small for large $\sigma_n^2$ and decreases as $\kappa$ increases. The gap between the exact value and the upper bound in Figure 11 shows the opposite behaviour: it narrows as $\sigma_n^2$ decreases. These two figures verify the discussion of Proposition 3, illustrating that the average VoI approaches the lower and upper bounds in the low and high SNR regimes, respectively.

#### **6. Conclusions**

In this paper, we investigated the dependency between the proposed VoI and the AoI in a noisy OU process. The VoI is defined as the mutual information between the current status of the underlying random process and noisy observations captured by the receiver. Functional relationships between the VoI and the AoI were obtained in the low, intermediate and high SNR regimes. Moreover, the distribution and moments of the VoI were investigated for the M/M/1 queue model. Finally, we performed Monte Carlo simulations to numerically validate the theoretical analysis. The results presented in this paper provide insight into how the correlation and noise in a latent OU process influence the VoI of the observations of that process, and they elucidate the relationship between the VoI and the AoI. Our work gives a mathematical justification for selecting certain non-linear age functions. Future work will focus on exploring the effect of multiple observations on the VoI–AoI relationship and on estimating the status of the underlying process from multiple observations.

**Author Contributions:** Conceptualization, Z.W., M.-A.B. and J.P.C.; methodology, Z.W., M.-A.B. and J.P.C.; formal analysis, Z.W., M.-A.B. and J.P.C.; investigation, Z.W., M.-A.B. and J.P.C.; writing original draft preparation, Z.W., M.-A.B. and J.P.C.; writing—review and editing, Z.W., M.-A.B. and J.P.C.; supervision, M.-A.B. and J.P.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by EPSRC, grant number EP/T02612X/1.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors gratefully acknowledge the support of the Clarendon Fund Scholarships at the University of Oxford.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

#### *Appendix A.1. Proof of Monotonicity in μ*

First, we prove the monotonicity in *μ*. For any particular VoI threshold *v*, the derivative of the VoI outage is given as follows:

$$\begin{split} \frac{\partial P(V \le v)}{\partial \mu} = \frac{\log r(v)}{2\kappa} \Big[ r(v)^{\frac{\mu-\lambda}{2\kappa}} - \left( \frac{\mu}{\mu-\lambda} - \frac{\lambda}{2\kappa} \log r(v) \right) r(v)^{\frac{\mu}{2\kappa}} \Big] \\ &+ \frac{\lambda}{\left(\mu-\lambda\right)^{2}} \Big( r(v)^{\frac{\mu}{2\kappa}} - r(v)^{\frac{\lambda}{2\kappa}} \Big) . \end{split} \tag{A1}$$

For simplicity, let $x_1 = \frac{\lambda}{2\kappa}\log r(v)$ and $x_2 = \frac{\mu}{2\kappa}\log r(v)$. Then, (A1) can be written as follows:

$$\begin{split} \frac{\partial P(V \le v)}{\partial \mu} &= \frac{\log r(v)}{2\kappa} \left[ e^{x_2 - x_1} + \left( x_1 - \frac{x_2}{x_2 - x_1} \right) e^{x_2} \right] + \frac{\log r(v)}{2\kappa} \frac{x_1}{\left( x_2 - x_1 \right)^2} \left( e^{x_2} - e^{x_1} \right) \\ &= \frac{\log r(v)}{2\kappa} e^{x_2} \left[ e^{-x_1} + x_1 - \frac{x_2}{x_2 - x_1} + \frac{x_1 \left( 1 - e^{x_1 - x_2} \right)}{\left( x_2 - x_1 \right)^2} \right]. \end{split} \tag{A2}$$

Since $\lambda < \mu$ and $0 < r(v) < 1$, we have $x_2 < x_1 < 0$ and $\log r(v) < 0$. Moreover, for any $x$, $e^x \ge 1 + x$. Therefore, (A2) can be further bounded as follows:

$$\frac{\partial P(V \le v)}{\partial \mu} \le \frac{\log r(v)}{2\kappa} e^{x_2} \left[ 1 - \frac{x_2}{x_2 - x_1} + \frac{x_1 \left( x_2 - x_1 \right)}{\left( x_2 - x_1 \right)^2} \right] = 0. \tag{A3}$$

As the derivative is non-positive, the VoI outage probability is non-increasing in $\mu$.

#### *Appendix A.2. Proof of Optimal λ Exists*

Next, we prove that the optimal sampling rate exists. The derivative of the VoI outage with respect to *λ* is given as follows:

$$\begin{split} \frac{\partial P(V \le v)}{\partial \lambda} &= -\frac{\log r(v)}{2\kappa} \Big( r(v)^{\frac{\mu-\lambda}{2\kappa}} - r(v)^{\frac{\mu}{2\kappa}} - \frac{\mu}{\mu-\lambda} r(v)^{\frac{\lambda}{2\kappa}} \Big) - \frac{\mu}{\left( \mu-\lambda \right)^{2}} \Big( r(v)^{\frac{\mu}{2\kappa}} - r(v)^{\frac{\lambda}{2\kappa}} \Big) \\ &= -\frac{\log r(v)}{2\kappa} \Big[ e^{x_2 - x_1} - e^{x_2} - \frac{x_2 e^{x_1}}{x_2 - x_1} + \frac{x_2 \left( e^{x_2} - e^{x_1} \right)}{\left( x_2 - x_1 \right)^{2}} \Big]. \end{split} \tag{A4}$$

When *λ* approaches 0, we can write the following:

$$\lim\_{x\_1 \to 0} \frac{\partial P(V \le v)}{\partial \lambda} = -\frac{\log r(v)}{2\kappa} \left( \frac{e^{x\_2} - 1}{x\_2} - 1 \right) \le 0. \tag{A5}$$

When *λ* approaches *μ*, we have the following:

$$\lim\_{x\_1 \to x\_2} \frac{\partial P(V \le v)}{\partial \lambda} = -\frac{\log r(v)}{2\kappa} \left[ 1 - e^{x\_2} \left( 1 - \frac{x\_2}{2} \right) \right] \ge -\frac{\log r(v)}{2\kappa} \left( 1 - e^{\frac{x\_2}{2}} \right) \ge 0. \tag{A6}$$

We have shown that the VoI outage probability decreases with *λ* when *λ* is small and increases when *λ* is large. Therefore, there exists an optimal sampling rate *λ*∗, and the minimum outage probability is achieved when the derivative in (A4) equals 0.
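As a sanity check on this argument, the stationary point of the derivative in (A4) can be located numerically by bisection between the two limits (A5) and (A6). In the sketch below, consistent with (A2)–(A6), we take *x*<sub>1</sub> = (*λ*/2*κ*) log *r*(*v*) and *x*<sub>2</sub> = (*μ*/2*κ*) log *r*(*v*); the parameter values (*μ*, *κ*, *r*(*v*)) are illustrative assumptions, not taken from the paper.

```python
import math

def dP_dlam(lam, mu, kappa, r):
    """Right-hand side of (A4), with x1 = lam*log(r)/(2k), x2 = mu*log(r)/(2k)."""
    x1 = lam * math.log(r) / (2 * kappa)
    x2 = mu * math.log(r) / (2 * kappa)
    bracket = (math.exp(x2 - x1) - math.exp(x2)
               - x2 * math.exp(x1) / (x2 - x1)
               + x2 * (math.exp(x2) - math.exp(x1)) / (x2 - x1) ** 2)
    return -math.log(r) / (2 * kappa) * bracket

def optimal_sampling_rate(mu, kappa, r, iters=200):
    """Bisection: the derivative is <= 0 as lam -> 0 (A5) and >= 0 as lam -> mu (A6)."""
    lo, hi = 1e-6 * mu, (1 - 1e-6) * mu
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dP_dlam(mid, mu, kappa, r) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The returned rate is where the outage derivative changes sign, matching the existence argument above.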

#### **Appendix B. Proof of Proposition 3**

The average VoI can be obtained directly by the following:

$$\begin{split} \mathbb{E}[V] = \int\_{0}^{\frac{1}{2}\log(1+\gamma)} v f\_{V}(v) \, \mathrm{d}v = -\frac{\mu}{4\kappa} \int\_{0}^{1} \log\left(1 - \frac{\gamma}{1+\gamma}r\right) &\left[\frac{\mu-\lambda}{\mu} r^{\frac{\mu-\lambda}{2\kappa}-1} + \frac{\lambda}{\mu-\lambda} r^{\frac{\lambda}{2\kappa}-1} \right. \\ &\left. - \left(\frac{\mu}{\mu-\lambda} - \frac{\lambda}{\mu} - \frac{\lambda}{2\kappa} \log r\right) r^{\frac{\mu}{2\kappa}-1}\right] \mathrm{d}r. \end{split} \tag{A7}$$

Here, for the given *x* and *y*, we have the following:

$$\begin{split} \int\_{0}^{1} \log(1 - xr) \cdot r^{y-1} \, \mathrm{d}r &= \frac{1}{x^{y}} \int\_{0}^{x} \log(1 - z) \cdot z^{y-1} \, \mathrm{d}z \\ &= \frac{1}{x^{y}} \frac{z^{y}}{y} \log(1 - z) \Big|\_{z=0}^{z=x} + \frac{1}{yx^{y}} \int\_{0}^{x} \frac{z^{y}}{1 - z} \, \mathrm{d}z \\ &= \frac{1}{y} \Big[ \log(1 - x) + g\_{1}(x, y) \Big], \end{split} \tag{A8}$$

and

$$\begin{split} \int\_{0}^{1} &\log r \log(1-xr) \cdot r^{y-1} \, \mathrm{d}r \\ &= \frac{1}{x^{y}} \Big( \int\_{0}^{x} \log z \cdot \log(1-z) \cdot z^{y-1} \, \mathrm{d}z - \log x \int\_{0}^{x} \log(1-z) \cdot z^{y-1} \, \mathrm{d}z \Big) \\ &= \frac{1}{x^{y}} \frac{z^{y}}{y} \log(1-z) \log z \Big|\_{z=0}^{z=x} - \frac{\log x}{x^{y}} \int\_{0}^{x} \log(1-z) \cdot z^{y-1} \, \mathrm{d}z - \frac{1}{yx^{y}} \int\_{0}^{x} z^{y} \Big( \frac{\log(1-z)}{z} - \frac{\log z}{1-z} \Big) \, \mathrm{d}z \\ &= \frac{\log(1-x) \log x}{y} - \left( \frac{1}{yx^{y}} + \frac{\log x}{x^{y}} \right) \int\_{0}^{x} \log(1-z) \cdot z^{y-1} \, \mathrm{d}z + \frac{1}{yx^{y}} \int\_{0}^{x} \frac{z^{y} \log z}{1-z} \, \mathrm{d}z \\ &= -\frac{\log(1-x)}{y^{2}} - \left( \frac{1}{y^{2}} + \frac{\log x}{y} \right) g\_{1}(x, y) + \frac{g\_{2}(x, y)}{y}. \end{split} \tag{A9}$$

Therefore, the average VoI is derived by substituting (A8) and (A9) into (A7).

The lower bound in (25) is obtained by applying Jensen's inequality, i.e., the following:

$$\begin{split} \mathbb{E}[V] &= \mathbb{E}\left[ -\frac{1}{2} \log \left( 1 - \frac{\gamma}{1+\gamma} e^{-2\kappa A} \right) \right] \\ &\geq -\frac{1}{2} \log \left( 1 - \frac{\gamma}{1+\gamma} \mathbb{E}[e^{-2\kappa A}] \right), \end{split} \tag{A10}$$

where

$$\begin{split} \mathbb{E}[e^{-2\kappa A}] &= \int\_{0}^{+\infty} e^{-2\kappa a} f\_{A}(a) \, \mathrm{d}a \\ &= \frac{\frac{\mu-\lambda}{2\kappa}}{\frac{\mu-\lambda}{2\kappa}+1} - \frac{\frac{\mu-\lambda}{2\kappa} \left(\frac{\mu+\lambda}{2\kappa}+1\right)}{\left(\frac{\mu}{2\kappa}+1\right)^{2} \left(\frac{\lambda}{2\kappa}+1\right)}. \end{split} \tag{A11}$$

The upper bound in (26) is the average VoI in the Markov OU process. In the hidden Markov model, we can write the following [31]:

$$\begin{split} v(t) &= h(X\_t) - h(X\_t|Y\_{t\_n}, \dots, Y\_{t\_{n-m+1}}) \\ &\leq h(X\_t) - h(X\_t|Y\_{t\_n}, \dots, Y\_{t\_{n-m+1}}, X\_{t\_n}) \\ &= h(X\_t) - h(X\_t|X\_{t\_n}) \\ &= I(X\_t; X\_{t\_n}). \end{split} \tag{A12}$$

Therefore, the VoI in the Markov model can be regarded as an upper bound of the VoI in the hidden Markov model. Denote by *v*<sub>OU</sub>(*t*) = *I*(*X<sub>t</sub>*; *X*<sub>*t<sub>n</sub>*</sub>) the VoI in the underlying OU process. Then, the result in (26) follows from the following calculation:

$$\begin{split} \mathbb{E}[V] &\leq \mathbb{E}[V\_{\text{OU}}] = \mathbb{E}\left[ -\frac{1}{2} \log \left( 1 - e^{-2\kappa A} \right) \right] \\ &= -\frac{\mu}{4\kappa} \int\_{0}^{1} \log(1 - r) \left[ \frac{\mu - \lambda}{\mu} r^{\frac{\mu - \lambda}{2\kappa} - 1} + \frac{\lambda}{\mu - \lambda} r^{\frac{\lambda}{2\kappa} - 1} - \left( \frac{\mu}{\mu - \lambda} - \frac{\lambda}{\mu} - \frac{\lambda}{2\kappa} \log r \right) r^{\frac{\mu}{2\kappa} - 1} \right] \mathrm{d}r. \end{split} \tag{A13}$$

Similar to the calculation given in (A8) and (A9), for the given *y*, we have the following:

$$\begin{aligned} \int\_0^1 \log(1-r) \cdot r^{y-1} \, \mathrm{d}r &= -\frac{1}{y} H(y),\\ \int\_0^1 \log r \cdot \log(1-r) \cdot r^{y-1} \, \mathrm{d}r &= \frac{1}{y^2} H(y) - \frac{1}{y} \psi^{(1)}(y+1). \end{aligned} \tag{A14}$$

Therefore, the upper bound of the average VoI is derived by substituting (A14) into (A13).
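The first identity in (A14) can be verified independently for integer *y* by expanding log(1 − *r*) into its power series, which gives ∫<sub>0</sub><sup>1</sup> log(1 − *r*) *r*<sup>*y*−1</sup> d*r* = −Σ<sub>*k*≥1</sub> 1/(*k*(*k* + *y*)) = −*H*(*y*)/*y*. The truncated-series check below is an illustrative sketch, not part of the proof.

```python
def log_integral(y, terms=1_000_000):
    """Series form of the integral: -sum_{k>=1} 1/(k(k+y)); tail error < 1/terms."""
    return -sum(1.0 / (k * (k + y)) for k in range(1, terms + 1))

def harmonic(y):
    """Harmonic number H(y) for a positive integer y."""
    return sum(1.0 / k for k in range(1, y + 1))
```

For each integer *y*, `log_integral(y)` agrees with −*H*(*y*)/*y* up to the truncation error of the series.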

#### **Appendix C. Proof of Proposition 4**

The MGF of the VoI is obtained directly by the following:

$$\begin{split} M\_{V}(t) = \int\_0^{\frac{1}{2}\log(1+\gamma)} e^{tv} f\_V(v) \, \mathrm{d}v = \frac{\mu}{2\kappa} \int\_0^1 \left(1 - \frac{\gamma}{1+\gamma}r\right)^{-\frac{t}{2}} &\left[\frac{\mu-\lambda}{\mu} r^{\frac{\mu-\lambda}{2\kappa}-1} + \frac{\lambda}{\mu-\lambda} r^{\frac{\lambda}{2\kappa}-1} \right. \\ &\left. - \left(\frac{\mu}{\mu-\lambda} - \frac{\lambda}{\mu} - \frac{\lambda}{2\kappa} \log r\right) r^{\frac{\mu}{2\kappa}-1} \right] \mathrm{d}r. \end{split} \tag{A15}$$

Here, for the given *x*, *y* and *t*, we have the following [30]:

$$\int\_0^1 \left(1 - xr\right)^{-\frac{t}{2}} \cdot r^{y-1} \,\mathrm{d}r = \frac{1}{y} \, {}\_2F\_1\left(y, \frac{t}{2}; y+1; x\right),\tag{A16}$$

and

$$\int\_0^1 \log r \cdot (1 - xr)^{-\frac{t}{2}} \cdot r^{y-1} \,\mathrm{d}r = -\frac{1}{y^2} \,{}\_3F\_2\left( y, y, \frac{t}{2}; y+1, y+1; x \right). \tag{A17}$$

Therefore, the MGF of VoI is derived by substituting (A16) and (A17) into (A15).
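Identity (A16) is Euler's integral representation of the Gauss hypergeometric function and can be spot-checked numerically. The hand-rolled series and midpoint quadrature below are an illustrative sketch; the chosen values *x* = 0.5, *y* = 2, *t* = 1 are arbitrary.

```python
def hyp2f1(a, b, c, x, terms=300):
    """Truncated Gauss series for 2F1(a, b; c; x), valid for |x| < 1."""
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1.0)) * x
        total += term
    return total

def lhs_A16(x, y, t, n=100_000):
    """Midpoint rule for the left-hand side of (A16)."""
    h = 1.0 / n
    return h * sum((1 - x * (i + 0.5) * h) ** (-t / 2.0) * ((i + 0.5) * h) ** (y - 1)
                   for i in range(n))
```

For *x* = 0.5, *y* = 2, *t* = 1 the integral evaluates to about 0.6193, and the quadrature agrees with (1/*y*) ₂F₁(*y*, *t*/2; *y*+1; *x*) to quadrature accuracy.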

#### **References**


## *Article* **Channel Quality-Based Optimal Status Update for Information Freshness in Internet of Things**

**Fuzhou Peng, Xiang Chen \* and Xijun Wang**

School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou 510006, China; pengfzh@mail2.sysu.edu.cn (F.P.); wangxijun@mail.sysu.edu.cn (X.W.)

**\*** Correspondence: chenxiang@mail.sysu.edu.cn

**Abstract:** This paper investigates the status updating policy for information freshness in Internet of things (IoT) systems, where the channel quality is fed back to the sensor at the beginning of each time slot. Based on the channel quality, we aim to strike a balance between the information freshness and the update cost by minimizing the weighted sum of the age of information (AoI) and the energy consumption. The optimal status updating problem is formulated as a Markov decision process (MDP), and the structure of the optimal updating policy is investigated. We prove that, given the channel quality, the optimal policy is of a threshold type with respect to the AoI. In particular, the sensor remains idle when the AoI is smaller than the threshold, while the sensor transmits the update packet when the AoI is greater than the threshold. Moreover, the threshold is proven to be a non-increasing function of channel state. A numerical-based algorithm for efficiently computing the optimal thresholds is proposed for a special case where the channel is quantized into two states. Simulation results show that our proposed policy performs better than two baseline policies.

**Keywords:** age of information; status update; channel quality

**1. Introduction**

Recently, the Internet of things (IoT) has been widely used in the fields of industrial manufacturing, environment monitoring, and home automation. In these applications, the sensors generate and transmit new status updates to the destination, where the freshness of the status updates is crucial for the destination to track the state of the environment and to make decisions. Thus, a new information freshness metric, namely the age of information (AoI), was proposed in [1] to measure the freshness of updates from the receiver's perspective. There are two widely used metrics, i.e., the average peak AoI [2] and the average AoI [3]. In general, the smaller the AoI is, the fresher the received updates are.

AoI was originally investigated in [1] for updating the status in vehicular networks. Considering the impact of the queueing system, the authors in [4] investigated the system performance under the M/M/1 and M/M/1/2 queueing systems with a first-come-first-served (FCFS) policy. Furthermore, the work of [5] studied how to keep the updates fresh by analyzing some general update policies, such as the zero-wait policy. The authors of [6] considered the optimal scheduling problem for a more general cost, namely the weighted sum of the transmission cost and the tracking inaccuracy of the information source. However, these works assumed that the communication channel is error-free. In practice, status updates are delivered through an erroneous wireless channel, which suffers from fading, interference, and noise. Therefore, the received updates may not be decoded correctly, which induces information aging and wasted energy consumption.

There are several works that considered the erroneous channel [7,8]. The authors in [9] considered multiple communication channels and investigated the optimal coding and decoding schemes. A channel with an independent and identically distributed packet error rate over time was considered in [10,11]. The work of [12] considered the impact of fading channels on packet transmission. A Markov channel was investigated in [13], where a threshold policy was proven to be optimal, and a simulation-based approach was proposed to compute the corresponding threshold. However, how the information of channel quality should be exploited to improve the information freshness of the system remains to be investigated.

**Citation:** Peng, F.; Chen, X.; Wang, X. Channel Quality-Based Optimal Status Update for Information Freshness in Internet of Things. *Entropy* **2021**, *23*, 912. https://doi.org/10.3390/e23070912

Academic Editors: Anthony Ephremides and Yin Sun

Received: 18 June 2021; Accepted: 16 July 2021; Published: 18 July 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Channel quality indicator (CQI) feedback is commonly used in wireless communication systems [14]. In block fading channels, the channel quality, generally reported by the terminal, is highly relevant to the packet error rate (PER) [15] or, equivalently, the block error rate (BLER). A received packet is likely to fail decoding when the channel is in a poor condition. However, a transmitter with channel quality information can remain idle during deep fades, thereby saving energy. Channel quantization was also considered in [12,13], where the channel was quantized into multiple states. However, the decision making was not dependent on the channel state in [12], while [13] did not consider the freshness of information. These observations motivate us to introduce channel quality information into the design of the updating policy.

In this paper, a status update system with channel quality feedback is considered. In particular, the channel condition is quantized into multiple states, and the destination feeds the channel quality back to the sensor before the sensor updates the status. Our problem is to investigate the channel quality-based optimal status update policy, which minimizes the weighted sum of the AoI and the energy consumption. Our key contributions are summarized as follows:


The rest of this paper is organized as follows. In Section 2, the system model is presented and the optimal updating problem is formulated. In Section 3, the optimal updating policy is proven to be of a threshold structure, and a threshold-based policy iteration algorithm is proposed to find the optimal policy. Section 4 presents the simulation results. Finally, we summarize our conclusions in Section 5.

#### **2. System Model and Problem Formulation**

#### *2.1. System Description*

In this paper, we consider a status update system that consists of a sensor and a destination, as shown in Figure 1. Time is divided into slots. Without loss of generality, we assume that each time slot has an equal length, which is normalized to unity. At the beginning of each slot, the destination feeds the CQI back to the sensor. It is worth noting that the PER differs across CQIs. Based on the CQI, the sensor decides in each time slot whether it should generate and transmit a new update to the destination via a wireless channel or remain idle to save energy. These updates are crucial for the destination to estimate the state of the surrounding environment of the sensor and to make in-time decisions. Let *a<sub>t</sub>*, which takes its value from the action set A = {0, 1}, denote the action that the sensor performs in slot *t*, where *a<sub>t</sub>* = 1 means that the sensor generates and transmits a new update to the destination, and *a<sub>t</sub>* = 0 means that the sensor is idle. If the sensor transmits an update packet in slot *t*, an acknowledgment will be fed back at the end of this time slot. In particular, an ACK is fed back when the destination successfully receives the update packet, and a NACK otherwise.

**Figure 1.** System model.

#### *2.2. Channel Model*

Suppose that the wireless channel is a block fading channel where the channel gain remains constant in each slot and varies independently over different slots. Let *z<sub>t</sub>* denote the channel gain in slot *t*, which takes its value from [0, +∞). We quantize the channel gain into *N* + 1 levels, denoted as (*z*<sub>0</sub>, *z*<sub>1</sub>, ..., *z<sub>i</sub>*, ..., *z<sub>N</sub>*). The quantization levels are arranged in increasing order, where *z*<sub>0</sub> = 0 and *z<sub>N</sub>* = ∞. Hence, the channel is said to be in state *i* if the channel gain *z<sub>t</sub>* belongs to the interval [*z<sub>i</sub>*, *z*<sub>*i*+1</sub>). We denote by *h<sub>t</sub>* the state of the channel in slot *t*, where *h<sub>t</sub>* ∈ H = {0, 1, ..., *N* − 1}. With the aid of the CQI fed back from the destination, the sensor knows the channel state at the beginning of each time slot.

Let *p<sub>z</sub>*(*z*) denote the probability density function of the channel gain. Then, the probability of the channel being in state *i* is

$$p\_i = \int\_{z\_i}^{z\_{i+1}} p\_z(z)dz. \tag{1}$$

We assume that the signal-to-noise ratio (SNR) per information bit during the transmission remains constant. Then, the PER depends only on the channel gain. In particular, the PER for channel state *i* is given by

$$g\_i = \int\_{z\_i}^{z\_{i+1}} P\_{\text{PER}}(z) p\_z(z|i) dz,\tag{2}$$

where *P*<sub>PER</sub>(*z*) is the PER of a packet with respect to the channel gain. The success probability *q<sub>i</sub>* of a packet transmitted over channel state *i* is *q<sub>i</sub>* = 1 − *g<sub>i</sub>*. According to [15], the success probability is a non-decreasing function of the channel state.
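As a concrete sketch of (1), suppose the channel gain is exponentially distributed (the power gain of a Rayleigh fading channel); the unit-mean density and the quantization levels used below are illustrative assumptions, not values from the paper.

```python
import math

def state_probabilities(z, mean_gain=1.0):
    """p_i from (1) for an exponential gain density p_z(z) = exp(-z/mean)/mean,
    with quantization levels z = (z_0 = 0, ..., z_N = inf)."""
    def cdf(v):
        return 1.0 if v == math.inf else 1.0 - math.exp(-v / mean_gain)
    # probability mass of each interval [z_i, z_{i+1})
    return [cdf(hi) - cdf(lo) for lo, hi in zip(z, z[1:])]
```

Because *z*<sub>0</sub> = 0 and *z<sub>N</sub>* = ∞, the resulting probabilities always sum to one.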

#### *2.3. Age of Information*

This paper uses the AoI as the freshness metric, which is defined as the time elapsed since the generation of the latest update packet that has been successfully received by the destination [1]. Let *G<sub>i</sub>* be the generation time of the *i*th successfully received update packet. Then, the AoI in time slot *t*, denoted Δ<sub>*t*</sub>, is defined as

$$
\Delta\_t = t - \max\{G\_i : G\_i \le t\}.\tag{3}
$$

In particular, if an update packet is successfully received, the AoI decreases to one. Otherwise, the AoI increases by one. Altogether, the evolution of the AoI is expressed by

$$
\Delta\_{t+1} = \begin{cases}
1, & \text{if the transmission is successful,} \\
\Delta\_t + 1, & \text{otherwise.}
\end{cases}
\tag{4}
$$

An example of the AoI evolution is shown in Figure 2, where the gray rectangle represents a successful reception of an update packet, and the mesh rectangle represents a transmission failure.

**Figure 2.** An example of the AoI evolution with the channel state *ht*, the action *at*, and the acknowledgment ACK*t*. The asterisk stands for no acknowledgment from destination when the sensor keeps idle.
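The evolution (4) is straightforward to simulate. Under a zero-wait policy that transmits in every slot with a fixed success probability *q*, a standard renewal argument gives an average AoI of 1/*q*; the sketch below (with illustrative values of *q*) reproduces this.

```python
import random

def average_aoi_zero_wait(q, horizon=200_000, seed=0):
    """Simulate (4) when the sensor transmits in every slot with success prob q."""
    rng = random.Random(seed)
    aoi, total = 1, 0
    for _ in range(horizon):
        total += aoi                             # AoI at the start of the slot
        aoi = 1 if rng.random() < q else aoi + 1  # recursion (4)
    return total / horizon
```

The simulated time average converges to 1/*q* as the horizon grows.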

#### *2.4. Problem Formulation*

The objective of this paper is to find an optimal updating policy that minimizes the long-term average of the weighted sum of the AoI and the energy consumption. A policy *π* can be represented by a sequence of actions, i.e., *π* = (*a*<sub>0</sub>, *a*<sub>1</sub>, ..., *a<sub>t</sub>*, ...). Let Π be the set of stationary and deterministic policies. Then, the optimal updating problem is given by

$$\min\_{\pi \in \Pi} \limsup\_{T \to \infty} \frac{1}{T} \sum\_{t=0}^{T} \mathbb{E}[\Delta\_t + \omega a\_t C\_e],\tag{5}$$

where *C<sub>e</sub>* is the energy consumption of an update transmission, and *ω* is the weighting factor.

#### **3. Optimal Updating Policy**

This section aims to investigate the optimal updating policy for the problem formulated in the above section. The problem is first formulated as an infinite-horizon average-cost MDP, and the existence of a stationary and deterministic policy that minimizes the average cost is proven. Then, the non-decreasing property of the value function is derived. Based on this property, we prove that the optimal update policy has a threshold structure with respect to the AoI, and that the optimal threshold is a non-increasing function of the channel state. To reduce the computational complexity, a structure-aware policy iteration algorithm is proposed to find the optimal policy. Moreover, non-linear fractional programming is employed to directly compute the optimal thresholds in a special case where the channel is quantized into two states.

#### *3.1. MDP Formulation*

The Markov decision process (MDP) framework is typically applied to optimal decision problems in which the system state evolves Markovianly and the cost is incurred per stage. The optimization problem in (5) can be formulated as an infinite-horizon average-cost MDP, as elaborated in the following.


• Transition probability: Given the current state **x**<sub>*t*</sub> = (Δ, *i*) and action *a<sub>t</sub>*, the next channel state *j* is drawn independently with probability *p<sub>j</sub>*, so

$$\begin{cases} \Pr(\mathbf{x}\_{t+1} = (\Delta + 1, j) \mid \mathbf{x}\_t = (\Delta, i), a\_t = 0) = p\_j, \\ \Pr(\mathbf{x}\_{t+1} = (1, j) \mid \mathbf{x}\_t = (\Delta, i), a\_t = 1) = q\_i p\_j, \\ \Pr(\mathbf{x}\_{t+1} = (\Delta + 1, j) \mid \mathbf{x}\_t = (\Delta, i), a\_t = 1) = g\_i p\_j. \end{cases} \tag{6}$$

• Cost: The instantaneous cost *C*(**x**<sub>*t*</sub>, *a<sub>t</sub>*) at state **x**<sub>*t*</sub> given action *a<sub>t</sub>* in slot *t* is

$$C(\mathbf{x}\_t, a\_t) = \Delta\_t + \omega a\_t C\_e.\tag{7}$$

For an MDP with an infinite state space and unbounded costs, a stationary and deterministic policy attaining the minimum average cost is not guaranteed to exist in general. Fortunately, we can prove the existence of such a policy in the next subsection.

#### *3.2. The Existence of Stationary and Deterministic Policy*

For mathematical rigor, this subsection proves the existence of a stationary and deterministic optimal policy. Following [16], we first analyze the associated discounted cost problem of the original MDP. The expected discounted cost with discount factor *γ* and initial state **x**ˆ under a policy *π* is given by

$$V\_{\pi,\gamma}(\mathbf{\hat{x}}) = \mathbb{E}\_{\pi} \left[ \sum\_{t=0}^{\infty} \gamma^t \mathbf{C}(\mathbf{x}\_t, a\_t) | \mathbf{x}\_0 = \mathbf{\hat{x}} \right],\tag{8}$$

where *a<sub>t</sub>* is the decision made in state **x**<sub>*t*</sub> under policy *π*, and *γ* ∈ (0, 1) is the discount factor. We first verify that *V*<sub>*π*,*γ*</sub>(**x**ˆ) is finite for any policy and all **x**ˆ ∈ S.

**Lemma 1.** *Given <sup>γ</sup>* <sup>∈</sup> (0, 1)*, for any policy <sup>π</sup> and all* **x**ˆ = (Δˆ, *h*ˆ) ∈ S*, we have*

$$V\_{\pi,\gamma}(\mathbf{\hat{x}}) = \mathbb{E}\_{\pi} \left[ \sum\_{t=0}^{\infty} \gamma^t \mathbb{C}(\mathbf{x}\_t, a\_t) | \mathbf{x}\_0 = \mathbf{\hat{x}} \right] < \infty. \tag{9}$$

**Proof.** By definition, the instantaneous cost in state **x***<sup>t</sup>* = (Δ*t*, *ht*) given action *at* is

$$C(\mathbf{x}\_t, a\_t) = \begin{cases} \Delta\_t, & \text{if } a\_t = 0, \\ \Delta\_t + \omega C\_e, & \text{if } a\_t = 1. \end{cases} \tag{10}$$

Therefore, *C*(**x**<sub>*t*</sub>, *a<sub>t</sub>*) ≤ Δ<sub>*t*</sub> + *ωC<sub>e</sub>* holds. Combined with the fact that the AoI increases at most linearly over slots under any policy, we have

$$\begin{split} \sum\_{t=0}^{\infty} \gamma^t C(\mathbf{x}\_t, a\_t) \Big|\_{\mathbf{x}\_0 = (\hat{\Delta}, \hat{h})} &\le \sum\_{t=0}^{\infty} \gamma^t (\hat{\Delta} + t + \omega C\_e) \\ &= \frac{1}{1 - \gamma} \left( \hat{\Delta} + \frac{\gamma}{1 - \gamma} + \omega C\_e \right) < \infty, \end{split} \tag{11}$$

which completes the proof.

Let *Vγ*(**x**ˆ) = min*<sup>π</sup> Vπ*,*γ*(**x**ˆ) denote the minimum expected discounted cost. By Lemma 1, *Vγ*(**x**ˆ) = min*<sup>π</sup> Vπ*,*γ*(**x**ˆ) < ∞ holds for every **x**ˆ and *γ* ∈ (0, 1).

According to [16] (Proposition 1), we have

$$V\_{\gamma}(\hat{\mathbf{x}}) = \min\_{a \in \mathcal{A}} \left\{ \mathbb{C}(\hat{\mathbf{x}}, a) + \gamma \sum\_{\mathbf{x}' \in \mathcal{S}} \Pr(\mathbf{x}' | \hat{\mathbf{x}}, a) V\_{\gamma}(\mathbf{x}') \right\},\tag{12}$$

which implies that *Vγ*(**x**ˆ) satisfies the Bellman equation. *Vγ*(**x**ˆ) can be solved via a value iteration algorithm. In particular, we define *Vγ*,0(**x**ˆ) = 0, and for all *n* ≥ 1, we have

$$V\_{\gamma,n}(\mathbf{\hat{x}}) = \min\_{a \in \mathcal{A}} Q\_{\gamma,n}(\mathbf{\hat{x}}, a), \tag{13}$$

where

$$Q\_{\gamma,n}(\mathbf{\hat{x}},a) = C(\mathbf{\hat{x}},a) + \gamma \sum\_{\mathbf{x'} \in \mathcal{S}} \Pr(\mathbf{x'}|\mathbf{\hat{x}},a) V\_{\gamma,n-1}(\mathbf{x'}) \tag{14}$$

is the right-hand side (RHS) of the discounted cost optimality equation evaluated at action *a*. Then, lim<sub>*n*→∞</sub> *V*<sub>*γ*,*n*</sub>(**x**ˆ) = *V<sub>γ</sub>*(**x**ˆ) for every **x**ˆ and *γ*.
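To make the structural results below concrete, the value iteration (13)–(14) can be run on a truncated state space (the AoI capped at a large value). The sketch below recovers, for each channel state, the smallest AoI at which transmitting is optimal; all parameter values (*p*, *q*, *ω*, *C<sub>e</sub>*, *γ*) and the truncation are illustrative assumptions, not taken from the paper.

```python
def value_iteration(p, q, w=10.0, ce=1.0, gamma=0.95, max_aoi=200, iters=400):
    """Discounted value iteration on states (aoi, h); action 1 transmits.
    Returns, per channel state, the smallest AoI at which transmitting is optimal."""
    n = len(p)
    V = [[0.0] * n for _ in range(max_aoi + 1)]
    for _ in range(iters):
        newV = [[0.0] * n for _ in range(max_aoi + 1)]
        for aoi in range(1, max_aoi + 1):
            nxt = min(aoi + 1, max_aoi)
            ev_up = sum(p[j] * V[nxt][j] for j in range(n))    # AoI grows
            ev_reset = sum(p[j] * V[1][j] for j in range(n))   # successful update
            for h in range(n):
                idle = aoi + gamma * ev_up
                send = aoi + w * ce + gamma * (q[h] * ev_reset + (1 - q[h]) * ev_up)
                newV[aoi][h] = min(idle, send)
        V = newV
    thresholds = []
    for h in range(n):
        beta = max_aoi
        for aoi in range(1, max_aoi + 1):
            nxt = min(aoi + 1, max_aoi)
            ev_up = sum(p[j] * V[nxt][j] for j in range(n))
            ev_reset = sum(p[j] * V[1][j] for j in range(n))
            if w * ce + gamma * q[h] * (ev_reset - ev_up) <= 0:  # send beats idle
                beta = aoi
                break
        thresholds.append(beta)
    return thresholds
```

On a two-state example with a better second channel (larger *q*), the recovered thresholds come out non-increasing in the channel state, in line with Theorem 2.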

Now, we can use the value iteration algorithm to establish the monotonic properties of *V<sub>γ</sub>*(**x**ˆ).

**Lemma 2.** *For all* Δ *and i, we have*

$$V\_{\gamma}(\Delta, N-1) \le V\_{\gamma}(\Delta, i), \tag{15}$$

*and for all* Δ<sup>1</sup> ≤ Δ<sup>2</sup> *and i, we have*

$$V\_{\gamma}(\Delta\_1, i) \le V\_{\gamma}(\Delta\_2, i). \tag{16}$$

**Proof.** See Appendix A.

Based on Lemmas 1 and 2, we are ready to show that the MDP has a stationary and deterministic optimal policy in the following theorem.

**Theorem 1.** *For the MDP in (5), there exists a stationary and deterministic optimal policy π*∗ *that minimizes the long-term average cost. Moreover, there exists a finite constant λ* = lim*γ*→1(1 − *γ*)*Vγ*(**x**) *for all states* **x***, where λ is independent of the initial state, and a value function V*(**x**)*, such that*

$$\lambda + V(\mathbf{x}) = \min\_{a \in \mathcal{A}} \left\{ \mathbb{C}(\mathbf{x}, a) + \sum\_{\mathbf{x'} \in \mathcal{S}} \Pr(\mathbf{x'} | \mathbf{x}, a) V(\mathbf{x'}) \right\} \tag{17}$$

*holds for all* **x***.*

**Proof.** See Appendix B.

#### *3.3. Structural Analysis*

According to Theorem 1, the optimal policy for the average cost problem satisfies the following equation

$$\pi^\*(\mathbf{x}) = \arg\min\_{a \in \mathcal{A}} Q(\mathbf{x}, a), \tag{18}$$

where

$$Q(\mathbf{x}, a) = \mathbb{C}(\mathbf{x}, a) + \sum\_{\mathbf{x}' \in S} \Pr(\mathbf{x}' | \mathbf{x}, a) V(\mathbf{x}'). \tag{19}$$

Similar to Lemma 2, the monotonic property of the value function *V*(**x**) is given in the following lemma.

**Lemma 3.** *Given the channel state i, for any* Δ<sup>2</sup> ≥ Δ1*, we have*

$$V(\Delta\_2, i) \ge V(\Delta\_1, i). \tag{20}$$

**Proof.** This proof follows the same procedure as that of Lemma 2, except that the value iteration is based on Equation (17).

Moreover, based on Lemma 3, a property of the increment of the value function is established in the following lemma.

**Lemma 4.** *Given the channel state i, for any* Δ<sup>2</sup> ≥ Δ1*, we have*

$$V(\Delta\_2, i) - V(\Delta\_1, i) \ge \Delta\_2 - \Delta\_1. \tag{21}$$

**Proof.** We first examine the relation between the state-action value functions, i.e., *Q*(Δ2, *i*, *a*) and *Q*(Δ1, *i*, *a*). Specifically, based on Lemma 3, we have

$$Q(\Delta\_2, i, 0) - (\Delta\_2 - \Delta\_1) = \Delta\_1 + \sum\_{j=0}^{N-1} p\_j V(\Delta\_2 + 1, j) \geq \Delta\_1 + \sum\_{j=0}^{N-1} p\_j V(\Delta\_1 + 1, j) = Q(\Delta\_1, i, 0), \tag{22}$$

and

$$\begin{aligned} &Q(\Delta\_2, i, 1) - (\Delta\_2 - \Delta\_1) \\ &= \Delta\_1 + \omega C\_e + q\_i \sum\_{j=0}^{N-1} p\_j V(1, j) + g\_i \sum\_{j=0}^{N-1} p\_j V(\Delta\_2 + 1, j) \\ &\ge \Delta\_1 + \omega C\_e + q\_i \sum\_{j=0}^{N-1} p\_j V(1, j) + g\_i \sum\_{j=0}^{N-1} p\_j V(\Delta\_1 + 1, j) \\ &= Q(\Delta\_1, i, 1). \end{aligned} \tag{23}$$

Since *V*(**x**) = min<sub>*a*∈A</sub> *Q*(**x**, *a*), we complete the proof.

Our main result is presented in the following theorem.

**Theorem 2.** *For any given channel state i, there exists a threshold β<sub>i</sub>, such that when* Δ ≥ *β<sub>i</sub>, the optimal action is to generate and transmit a new update, i.e., π*∗(Δ, *i*) = 1*, and when* Δ < *β<sub>i</sub>, the optimal action is to remain idle, i.e., π*∗(Δ, *i*) = 0*. Moreover, the optimal threshold β<sub>i</sub> is a non-increasing function of the channel state i, i.e., β<sub>i</sub>* ≥ *β<sub>j</sub> holds for all i*, *j* ∈ H *with i* ≤ *j.*

#### **Proof.** See Appendix C.

According to Theorem 2, the sensor will not update the status until the AoI exceeds the threshold. Moreover, if the channel condition is not good, i.e., channel state *i* is small, the sensor will wait for a longer time before it samples and transmits the status update packet so as to reduce the energy consumption because of a higher probability of transmission failure.

Based on the threshold structure, we can reduce the computational complexity of the policy iteration algorithm to find the optimal policy. The details of the algorithm are presented in Algorithm 1.

#### *3.4. Computing the Thresholds for a Special Case*

In the above section, we have proven that the optimal policy has a threshold structure. Given the thresholds (*β*0, *β*1, ..., *βN*−1), a Markov chain can be induced by the threshold policy. A special Markov chain is depicted in Figure 3, where the channel has two states. By leveraging the Markov chain, we first derive the average cost of the special case, which is summarized in the following theorem.

**Algorithm 1** Policy iteration algorithm (PIA) based on the threshold structure.


**Figure 3.** An illustration of established Markov chain with two channel states.

**Theorem 3.** *Let ϕ*(**x**) *be the steady-state probability of state* **x** *of the Markov chain induced with two channel states, and let β*<sub>0</sub>, *β*<sub>1</sub> *be the thresholds for channel states 0 and 1, respectively. The steady-state probability is given by*

$$\phi(i,j) = \begin{cases} p\_j \phi\_1, & \text{if } 1 \le i \le \beta\_1, \\ p\_j s\_0^{i-\beta\_1} \phi\_1, & \text{if } \beta\_1 < i \le \beta\_0, \\ p\_j s\_0^{\beta\_0-\beta\_1} s\_1^{i-\beta\_0} \phi\_1, & \text{if } i > \beta\_0, \end{cases} \tag{24}$$

*where ϕ*<sub>1</sub> = *ϕ*(1, 0) + *ϕ*(1, 1)*, s*<sub>0</sub> = 1 − *p*<sub>1</sub>*q*<sub>1</sub>*, s*<sub>1</sub> = 1 − *p*<sub>0</sub>*q*<sub>0</sub> − *p*<sub>1</sub>*q*<sub>1</sub>*, and ϕ*<sub>1</sub> *satisfies the following equation:*

$$\phi\_1 = \frac{1}{\beta\_1 + \frac{s\_0 - s\_0^{\beta\_0 - \beta\_1 + 1}}{1 - s\_0} + s\_0^{\beta\_0 - \beta\_1} \frac{s\_1}{1 - s\_1}}.\tag{25}$$

*The average cost then is given by*

$$C\_{mc}(\beta\_0, \dots, \beta\_{N-1}) = \phi\_1\left(\frac{\beta\_1(\beta\_1+1)}{2} + A + B + \omega C\_e E\right),\tag{26}$$

*where*

$$A = \frac{s\_0\left((\beta\_1 + 1) - \beta\_0 s\_0^{\beta\_0 - \beta\_1}\right)}{1 - s\_0} + \frac{s\_0^2 - s\_0^{\beta\_0 - \beta\_1 + 1}}{(1 - s\_0)^2},\tag{27}$$

$$B = s\_0^{\beta\_0 - \beta\_1}\left(\frac{(\beta\_0 + 1)s\_1}{1 - s\_1} + \frac{s\_1^2}{(1 - s\_1)^2}\right),\tag{28}$$

*and*

$$E = \left( s\_0^{\beta\_0 - \beta\_1} \frac{1}{1 - s\_1} + p\_1 \frac{1 - s\_0^{\beta\_0 - \beta\_1}}{1 - s\_0} \right). \tag{29}$$

**Proof.** See Appendix D.
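The steady-state characterization can be cross-checked by simulating the two-threshold policy directly; the closed-form *φ*<sub>1</sub> below is obtained by normalizing the occupancy probabilities of the chain in Figure 3, with *s*<sub>0</sub> = 1 − *p*<sub>1</sub>*q*<sub>1</sub> and *s*<sub>1</sub> = 1 − *p*<sub>0</sub>*q*<sub>0</sub> − *p*<sub>1</sub>*q*<sub>1</sub>. The parameters (*p*, *q*, *ω*, *C<sub>e</sub>*) and thresholds (*β*<sub>0</sub> = 4, *β*<sub>1</sub> = 2) are illustrative.

```python
import random

def simulate_policy(beta, p, q, w, ce, horizon=400_000, seed=7):
    """Run the two-state threshold policy (transmit iff AoI >= beta[h]);
    return (empirical P(AoI = 1), empirical average cost)."""
    rng = random.Random(seed)
    aoi, ones, total = 1, 0, 0.0
    for _ in range(horizon):
        h = 0 if rng.random() < p[0] else 1     # i.i.d. channel state
        a = 1 if aoi >= beta[h] else 0          # threshold decision
        ones += (aoi == 1)
        total += aoi + w * a * ce               # per-slot cost (7)
        aoi = 1 if (a and rng.random() < q[h]) else aoi + 1
    return ones / horizon, total / horizon

def phi1_closed_form(beta0, beta1, p, q):
    """phi_1 by normalizing the chain's occupancy probabilities."""
    s0, s1 = 1 - p[1] * q[1], 1 - p[0] * q[0] - p[1] * q[1]
    m = beta0 - beta1
    return 1.0 / (beta1 + (s0 - s0 ** (m + 1)) / (1 - s0) + s0 ** m * s1 / (1 - s1))
```

For *p* = (0.5, 0.5), *q* = (0.2, 0.8), *ω* = 10, *C<sub>e</sub>* = 1, the normalizer equals 3.32, so *φ*<sub>1</sub> ≈ 0.3012, and a renewal calculation gives an average cost of 23.6/3.32 ≈ 7.11; the simulation reproduces both.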

Therefore, the closed form of the average cost is a function of the thresholds. The numerical solution for the optimal thresholds can be obtained by linear search or by a gradient descent algorithm. However, computing the gradient directly requires a large amount of computation until convergence. Here, an algorithm based on nonlinear fractional programming (NLP) [17], which obtains the numerical solution efficiently, is proposed.

Let **x** = (*β*<sub>0</sub>, *β*<sub>1</sub>). We can rewrite the cost function in fractional form, where the numerator is *N*(**x**) = −*C<sub>mc</sub>*(**x**)/*ϕ*<sub>1</sub> and the denominator is *D*(**x**) = 1/*ϕ*<sub>1</sub>. The solution to an NLP problem of the following form

$$\max \left\{ \frac{N(\mathbf{x})}{D(\mathbf{x})} | \mathbf{x} \in A \right\} \tag{30}$$

is related to the optimization problem (31)

$$\max \{ N(\mathbf{x}) - qD(\mathbf{x}) | (\mathbf{x} \in A) \} \text{, for } q \in \mathbb{R}, \tag{31}$$

where the following assumption should also be satisfied:

$$D(\mathbf{x}) > 0 \text{, for all } \mathbf{x} \in A. \tag{32}$$

Define the function *F*(*q*) with variable *q* as

$$F(q) = \max\{N(\mathbf{x}) - qD(\mathbf{x}) | \mathbf{x} \in A\}, \text{ for } q \in \mathbb{R}.\tag{33}$$

According to [17], *F*(*q*) is a strictly decreasing function and is convex over R. Furthermore, we have *q*<sub>0</sub> = *N*(**x**<sub>0</sub>)/*D*(**x**<sub>0</sub>) = max{*N*(**x**)/*D*(**x**) | **x** ∈ *A*} if, and only if,

$$F(q\_0) = \max\{N(\mathbf{x}) - q\_0 D(\mathbf{x}) | \mathbf{x} \in A\} = 0. \tag{34}$$

Then, the algorithm can be described by two steps. The first step is to solve a convex optimization problem with a one dimensional parameter by a bisection method. The second step is to solve a high dimensional optimization problem by a gradient descent method.

According to [17], a bisection method can be used to find the optimal *q*<sub>0</sub>, under the assumption that the value of *F*(*q*) can be computed exactly for a given *q*. In practice, we use the gradient descent algorithm to obtain the numerical solution of *F*(*q*), since a global search may not run in polynomial time. As a trick, we replace the optimization variables (*β*<sub>0</sub>, *β*<sub>1</sub>) with the threshold difference and the lower threshold, i.e., **x** = (*β*<sub>0</sub> − *β*<sub>1</sub>, *β*<sub>1</sub>). To summarize, the numerical method for computing the optimal thresholds is given by Algorithm 2.

#### **Algorithm 2** Numerical computation of the optimal thresholds.

**Input:** Iteration limit *k*, error threshold *δ*
**Output:** Numerical result **x**∗
1: Let *N*(**x**) = −*C<sub>mc</sub>*(**x**)/*ϕ*<sub>1</sub> and *D*(**x**) = 1/*ϕ*<sub>1</sub>. Define *F*(*q*) = max{*N*(**x**) − *qD*(**x**) | **x** ≥ **0**}.
2: Initialize *i* = 1 and the search range [*a*, *b*] of *q*.
3: **while** *i* ≤ *k* **do**
4: *m* = (*a* + *b*)/2;
5: **if** *F*(*m*) · *F*(*a*) < 0 **then**
6: *b* = *m*;
7: **else**
8: *a* = *m*;
9: **end if**
10: **if** (*b* − *a*)/2 < *δ* **then**
11: **x**∗ = arg max<sub>**x**</sub>{*N*(**x**) − *mD*(**x**)};
12: break;
13: **end if**
14: *i* = *i* + 1;
15: **end while**
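The bisection above presumes that *F*(*q*) can be evaluated. An alternative, classical way to exploit the same structure is the Dinkelbach-type update *q* ← *N*(**x**∗)/*D*(**x**∗), sketched below on a toy fractional objective over a finite grid; the functions *N*, *D* and the grid are illustrative stand-ins, not the actual cost terms of this paper.

```python
def dinkelbach(N, D, grid, iters=100, tol=1e-12):
    """Iterate q <- N(x*)/D(x*), where x* maximizes N(x) - q*D(x); stops once
    F(q) = max_x {N(x) - q D(x)} reaches 0, per (34). Requires D(x) > 0."""
    q = N(grid[-1]) / D(grid[-1])       # any feasible starting ratio
    x_star = grid[-1]
    for _ in range(iters):
        x_star = max(grid, key=lambda x: N(x) - q * D(x))
        if abs(N(x_star) - q * D(x_star)) < tol:
            break                        # F(q) = 0: q is the optimal ratio
        q = N(x_star) / D(x_star)
    return x_star, q
```

Each pass re-solves the inner maximization at the current *q*; because *F* is strictly decreasing in *q*, the ratio increases monotonically to the optimum.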

#### **4. Simulation Results and Discussions**

In this section, the simulation results are presented to investigate the impacts of the system parameters. We also compare the optimal policy with the zero-wait policy and periodic policy, where the zero-wait policy immediately generates an update at each time slot and the periodic policy keeps a constant interval between two updates.

Figure 4 depicts the optimal policy for different AoI and channel states, where the number of channel states is 5. It can be seen that, for each channel state, the optimal policy has a threshold structure with respect to the AoI. In particular, when the AoI is small, it is not beneficial for the sensor to generate and transmit a new update because the energy consumption dominates the total cost. We can also see that the threshold is non-increasing with the channel state. In other words, if the channel condition is better, the threshold is smaller. This is because the success probability of packet transmission increases with the channel state.

Figure 5 illustrates the thresholds of the MDP with two channel states as functions of the weighting factor *ω*, in which the two dashed lines are obtained by the PIA and the two solid lines are obtained by the proposed numerical algorithm. Both thresholds grow as *ω* increases: since the energy consumption carries more weight, it is not efficient to update when the AoI is small. Conversely, when *ω* decreases, the AoI dominates and the thresholds decline. In particular, both thresholds equal 1 when *ω* = 0, in which case the optimal policy reduces to the zero-wait policy. We can also see that the threshold for channel state 1 obtained by the numerical algorithm stays close to the optimal solution, whereas the threshold for channel state 0 gradually deviates from the optimal value.

**Figure 4.** Optimal policy for different AoI and channel states (*q*<sup>0</sup> = 0.1, *q*<sup>1</sup> = 0.2, *q*<sup>2</sup> = 0.3, *q*<sup>3</sup> = 0.4, *q*<sup>4</sup> = 0.5, *p*<sup>0</sup> = 0.1, *p*<sup>1</sup> = 0.1, *p*<sup>2</sup> = 0.3, *p*<sup>3</sup> = 0.3, *p*<sup>4</sup> = 0.2, *ω* = 10, *Ce* = 1).

**Figure 5.** Optimal thresholds for two different channel states versus *ω* (*p*<sup>0</sup> = 0.2, *p*<sup>1</sup> = 0.8, *q*<sup>0</sup> = 0.2, *q*<sup>1</sup> = 0.5, *Ce* = 1).

Figure 6 compares four policies, i.e., the zero-wait policy, the periodic policy, the numerical-based policy, and the optimal policy, with respect to the weighting factor *ω*. The optimal policy has the lowest average cost. As shown in Figure 6, the zero-wait policy performs the same as the optimal policy when *ω* = 0. As *ω* increases, the average cost of every policy increases; however, the increase of the zero-wait policy is larger than that of the periodic policy and the optimal policy due to its frequent transmissions. Although the thresholds obtained by the PIA and the numerical algorithm are not exactly the same, as shown in Figure 5, the performance of the numerical-based algorithm still coincides with the optimal policy. This is because the threshold for channel state 1 appears in a quadratic term of the cost function, while the threshold for channel state 0 appears in a negative exponential term. As a result, the threshold for channel state 1 has a much more significant effect on the system performance.

**Figure 6.** Comparison of the zero-wait policy, the periodic policy with period being 5, the numericalbased policy, and the optimal policy with respect to the weighting factor *ω* (*p*<sup>0</sup> = 0.2, *p*<sup>1</sup> = 0.8, *q*<sup>0</sup> = 0.2, *q*<sup>1</sup> = 0.5, *Ce* = 1).

Figure 7 compares the three policies with respect to the probability *p*<sup>1</sup> of the channel being in state 1. Since there is a higher probability that the channel has good quality as *p*<sup>1</sup> increases, the average cost of all three policies decreases. We can see that, over the whole range of *p*1, the optimal policy has the lowest average cost, because it achieves a good balance between the AoI and the energy consumption. We can also see that the cost of the periodic policy is greater than that of the zero-wait policy at first, and smaller later. To further explain these curves, we plot the energy consumption term and the AoI term separately in Figures 8 and 9. We see that the update cost of the zero-wait policy is smaller than that of the periodic policy, but the AoI of the zero-wait policy decreases less with respect to *p*<sup>1</sup> than that of the periodic policy.

**Figure 7.** Comparison of the zero-wait policy, the periodic policy with period being 5, and the optimal policy with respect to *p*<sup>1</sup> (*q*<sup>0</sup> = 0.2, *q*<sup>1</sup> = 0.5, *ω* = 10, *Ce* = 1).

**Figure 8.** AoI comparison of the zero-wait policy, the periodic policy with period being 5, and the optimal policy with respect to *p*<sup>1</sup> (*q*<sup>0</sup> = 0.2, *q*<sup>1</sup> = 0.5, *ω* = 10, *Ce* = 1).

**Figure 9.** Energy consumption comparison of the zero-wait policy, the periodic policy with period being 5, and the optimal policy with respect to *p*<sup>1</sup> (*q*<sup>0</sup> = 0.2, *q*<sup>1</sup> = 0.5, *ω* = 10, *Ce* = 1).

#### **5. Conclusions**

In this paper, we have studied the optimal updating policy in an IoT system, where the channel gain is quantized into multiple states and the channel state is fed back to the sensor before the decision making. The status update problem has been formulated as an MDP to minimize the long-term average of the weighted sum of the AoI and the energy consumption. By investigating the properties of the value function, it is proven that the optimal policy has a threshold structure with respect to the AoI for any given channel state. We have also proven that the threshold is a non-increasing function of the channel state. Simulation results show the impacts of the system parameters on the optimal thresholds and the average cost. Through comparisons, we have also shown that our proposed policy outperforms the zero-wait policy and the periodic policy. In future research, time-varying channel models will be incorporated to guide the design of realistic IoT systems.

**Author Contributions:** Conceptualization, F.P., X.C., and X.W.; methodology, F.P. and X.W.; software, F.P.; validation, F.P., X.C., and X.W.; formal analysis, F.P. and X.W.; investigation, X.W.; writing original draft preparation, F.P.; writing—review and editing, X.W. and X.C.; visualization, F.P. and X.W.; supervision, X.C. and X.W.; project administration, X.C.; funding acquisition, X.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported in part by the State's Key Project of Research and Development Plan under Grants (2019YFE0196400), in part by Guangdong R&D Project in Key Areas under Grant (2019B010158001), in part by Guangdong Basic and Applied Basic Research Foundation under Grant (2021A1515012631), in part by Key Laboratory of Modern Measurement & Control Technology, Ministry of Education, Beijing Information Science & Technology University (KF20201123202).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Proof of Lemma 2**

Based on the value iteration algorithm, the induction method can be employed in the following proof. Firstly, we initialize *Vγ*,0(**x**) = 0, for which both Equations (15) and (16) hold for all **x** ∈ S.

*Appendix A.1. Proof of Equation (16)*

When *n* = 1,

$$Q\_{\gamma,1}(\Delta\_1, i, 0) - Q\_{\gamma,1}(\Delta\_2, i, 0) = \Delta\_1 - \Delta\_2 \le 0,\tag{A1}$$

and

$$Q\_{\gamma,1}(\Delta\_1, i, 1) - Q\_{\gamma,1}(\Delta\_2, i, 1) = \Delta\_1 + \omega \mathbb{C}\_{\mathfrak{t}} - \left(\Delta\_2 + \omega \mathbb{C}\_{\mathfrak{t}}\right) \le 0,\tag{A2}$$

hold due to Δ<sup>1</sup> ≤ Δ2, and we have *Vγ*,1(Δ1, *i*) ≤ *Vγ*,1(Δ2, *i*). Suppose that *Vγ*,*k*(Δ1, *i*) ≤ *Vγ*,*k*(Δ2, *i*) holds for all *k* ≤ *K*. Considering the case of *k* = *K* + 1,

$$Q\_{\gamma,K+1}(\Delta\_1, i, 0) - Q\_{\gamma,K+1}(\Delta\_2, i, 0) = \Delta\_1 - \Delta\_2 \le 0,\tag{A3}$$

and

$$\begin{aligned} &Q\_{\gamma,K+1}(\Delta\_1,i,1) - Q\_{\gamma,K+1}(\Delta\_2,i,1) \\ &= (\Delta\_1 - \Delta\_2) + \gamma \sum\_{j \in \mathbb{H}} p\_j \mathbf{g}\_i \big( V\_{\gamma,K}(\Delta\_1 + 1,j) - V\_{\gamma,K}(\Delta\_2 + 1,j) \big) \\ &\le 0, \end{aligned} \tag{A4}$$

hold for all *i* according to Δ<sup>1</sup> ≤ Δ2. Therefore, we have *Vγ*,*K*+1(Δ1, *i*) ≤ *Vγ*,*K*+1(Δ2, *i*). Since lim*n*→<sup>∞</sup> *Vγ*,*<sup>n</sup>* = *Vγ*, we have *Vγ*(Δ1, *i*) ≤ *Vγ*(Δ2, *i*).
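The monotonicity properties established above can be checked numerically by running value iteration on a truncated version of the discounted MDP. The parameter values and the AoI cap below are illustrative assumptions, not values from the paper.

```python
# Value iteration for the MDP: state (AoI, channel), action a in {0, 1},
# per-slot cost Delta + a*omega*Ce. All parameter values are assumed.
gamma, omega, Ce = 0.9, 10.0, 1.0
p = [0.2, 0.8]            # channel-state probabilities (i.i.d. across slots)
q = [0.2, 0.5]            # success probabilities; state 1 is the better channel
N, D_MAX = len(p), 60     # truncate the AoI at D_MAX for the computation

V = [[0.0] * N for _ in range(D_MAX + 1)]   # V[delta][i], 1 <= delta <= D_MAX

for _ in range(500):
    Vn = [[0.0] * N for _ in range(D_MAX + 1)]
    for d in range(1, D_MAX + 1):
        d1 = min(d + 1, D_MAX)
        nxt = sum(p[j] * V[d1][j] for j in range(N))     # AoI grows by one
        reset = sum(p[j] * V[1][j] for j in range(N))    # successful update
        for i in range(N):
            stay = d + gamma * nxt
            send = d + omega * Ce + gamma * ((1 - q[i]) * nxt + q[i] * reset)
            Vn[d][i] = min(stay, send)
    V = Vn

# Equation (16): V is non-decreasing in the AoI for each channel state.
mono_aoi = all(V[d][i] <= V[d + 1][i] + 1e-9
               for d in range(1, D_MAX) for i in range(N))
# Equation (15): the best channel state N-1 attains the smallest value.
best_state = all(V[d][N - 1] <= V[d][i] + 1e-9
                 for d in range(1, D_MAX + 1) for i in range(N))
```

Both flags come out true for this truncated instance, consistent with Lemma 2.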

#### *Appendix A.2. Proof of Equation (15)*

By the definition of function *Qγ*(**x**, *a*), we have

$$Q\_{\gamma}(\Delta, i, 0) = \Delta + \gamma \sum\_{j \in \mathbb{H}} p\_j V\_{\gamma}(\Delta + 1, j), \tag{A5}$$

and

$$Q\_{\gamma}(\Delta, i, 1) = \Delta + \omega C\_e + \gamma \sum\_{j \in \mathbb{H}} p\_j \big( g\_i V\_{\gamma}(\Delta + 1, j) + q\_i V\_{\gamma}(1, j) \big). \tag{A6}$$

Therefore,

$$Q\_{\gamma}(\Delta, N - 1, 0) - Q\_{\gamma}(\Delta, i, 0) \le 0,\tag{A7}$$

and

$$\begin{aligned} &Q\_{\boldsymbol{\gamma}}(\Delta, \boldsymbol{N} - 1, 1) - Q\_{\boldsymbol{\gamma}}(\Delta, \boldsymbol{i}, 1) \\ &\stackrel{(a)}{=} \gamma \sum\_{j \in \mathcal{H}} p\_j (q\_{N-1} - q\_i) (V\_{\boldsymbol{\gamma}}(1, j) - V\_{\boldsymbol{\gamma}}(\Delta + 1, j)) \leq 0, \end{aligned} \tag{A8}$$

hold for all *i*, where step (*a*) is due to Equation (16). Hence, we have *Vγ*(Δ, *N* − 1) ≤ *Vγ*(Δ, *i*). This completes the whole proof.

#### **Appendix B. Proof of Theorem 1**

Theorem 1 can be proven by verifying the conditions given in [16]. The conditions are listed as follows:


By Lemma 1, *Vγ*(**x**ˆ) = min*<sup>π</sup> Vπ*,*γ*(**x**ˆ) < ∞ holds for every **x**ˆ and *γ*. Hence, condition (1) holds. According to Lemma 2, by letting **x**ˆ = (1, *N* − 1) and *L* = 0, we have *hγ*(**x**) ≥ 0, which verifies condition (2).

Before verifying condition (3), a lemma is given as follows:

**Lemma A1.** *Let us denote* **x**ˆ = (1, *N* − 1) *as the reference state and define the first time that an initial state* **x** *transits to* **x**ˆ *as K* = min{*k* : *k* ≥ 1, **x***<sup>k</sup>* = **x**ˆ}*. Then, the expected cost under the always-transmitting policy πa, i.e., the policy in which the sensor generates and transmits a new update in each slot, is*

$$C\_{\mathbf{x},\hat{\mathbf{x}}}(\pi\_a) = \mathbb{E}\_{\pi\_a} \left[ \sum\_{t=0}^{K-1} \gamma^t C(\mathbf{x}\_t, a\_t) \, \middle| \, \mathbf{x} \right], \tag{A9}$$

*where C***x**,**x**ˆ(*πa*) < ∞ *holds for all* **x***.*

**Proof.** Since *at* = 1 for all *t*, the probability that the state returns to **x**ˆ from **x** after exactly *k* slots is given by

$$\Pr(K=k|\mathbf{x}=(\Delta,j)) = \begin{cases} p\_{N-1}q\_j, & \text{if } k=1, \\ p\_{N-1}g\_j\big(1-\sum\_{i\in\mathbb{H}}p\_iq\_i\big)^{k-2}\big(\sum\_{i\in\mathbb{H}}p\_iq\_i\big), & \text{otherwise.} \end{cases} \tag{A10}$$

Then, the expected return cost from **x** to **x**ˆ is expressed as

$$\begin{aligned} & C\_{\mathbf{x},\hat{\mathbf{x}}}(\pi\_a) \\ &= \mathbb{E}\left[ \sum\_{t=0}^{K-1} \gamma^t C(\mathbf{x}\_t, a\_t) \,\middle|\, \mathbf{x} \right] \\ & \overset{(a)}{\leq} \sum\_{k=1}^{\infty} \Pr(K = k | \mathbf{x} = (\Delta, j)) \left[ \sum\_{m=0}^{k-1} (\Delta + m + \omega C\_e) \right] \\ & < \infty, \end{aligned} \tag{A11}$$

where step (*a*) is due to the fact that *C*(**x***t*, *at*) ≤ Δ*<sup>t</sup>* + *ωCe*.

Consider a mixture policy *π*, which performs the always-transmitting policy *π<sup>a</sup>* from the initial state **x** until the system enters the reference state **x**ˆ, and thereafter performs the optimal policy *πγ* that minimizes the discounted cost. Therefore, we have

$$\begin{aligned} V\_{\gamma}(\mathbf{x}) &\leq \mathbb{E}\_{\pi\_a} \left[ \sum\_{t=0}^{K-1} \gamma^{t} C(\mathbf{x}\_{t}, a\_{t}) \,\middle|\, \mathbf{x} \right] + \mathbb{E}\_{\pi\_{\gamma}} \left[ \sum\_{t=K}^{\infty} \gamma^{t} C(\mathbf{x}\_{t}, a\_{t}) \,\middle|\, \hat{\mathbf{x}} \right] \\ &\leq C\_{\mathbf{x}, \hat{\mathbf{x}}}(\pi\_a) + \mathbb{E}\_{\pi} \left[ \gamma^{K} V\_{\gamma}(\hat{\mathbf{x}}) \right] \\ &\leq C\_{\mathbf{x}, \hat{\mathbf{x}}}(\pi\_a) + V\_{\gamma}(\hat{\mathbf{x}}), \end{aligned} \tag{A12}$$

which implies that *hγ*(**x**) ≤ *C***x**,**x**ˆ(*πa*). Hence, let **x**ˆ = (1, *N* − 1) and *M***<sup>x</sup>** = *C***x**,**x**ˆ(*πa*); condition (3) is verified.

On the other hand, *M***<sup>x</sup>** < ∞ holds for all **x**, and the states reachable from **x** are finite. Thus, the weighted sum of finitely many finite *M***x**′ is also finite, i.e., ∑**x**′ *P*(**x**′|**x**, *a*)*M***x**′ < ∞ holds for all **x** and *a*, which verifies condition (4). This completes the verification.

#### **Appendix C. Proof of Theorem 2**

Based on the definition of *Q*(Δ, *i*, *a*), we can obtain the difference between the state-action value functions as follows:

$$\begin{aligned} &Q(\Delta,i,0) - Q(\Delta,i,1) \\ &= \sum\_{j=0}^{N-1} p\_j V(\Delta+1,j) - q\_i \sum\_{j=0}^{N-1} p\_j V(1,j) - g\_i \sum\_{j=0}^{N-1} p\_j V(\Delta+1,j) - \omega C\_e \\ &= q\_i \sum\_{j=0}^{N-1} p\_j \big(V(\Delta+1,j) - V(1,j)\big) - \omega C\_e \\ &\overset{(a)}{\geq} q\_i \Delta - \omega C\_e, \end{aligned} \tag{A13}$$

where (*a*) is due to the property of the value function given in Lemma 4. We then discuss the difference between the state-action value function in two cases.

Case 1: *ω* = 0.

In this case, *Q*(Δ, *i*, 0) − *Q*(Δ, *i*, 1) ≥ 0 holds for any Δ and *i*. Therefore, the optimal policy is to update in each slot regardless of the channel state. In other words, the optimal thresholds are all equal to 1.

Case 2: *ω* > 0.

We note that, given *i*, *qi*Δ − *ωCe* increases linearly with Δ. Hence, there exists a positive integer *β*ˆ*<sup>i</sup>*, defined as the minimum value satisfying *qiβ*ˆ*<sup>i</sup>* − *ωCe* ≥ 0. Therefore, if Δ ≥ *β*ˆ*<sup>i</sup>*, then *Q*(Δ, *i*, 0) − *Q*(Δ, *i*, 1) ≥ *qiβ*ˆ*<sup>i</sup>* − *ωCe* ≥ 0 holds. This implies that there must exist a threshold *β<sup>i</sup>* satisfying 1 ≤ *β<sup>i</sup>* ≤ *β*ˆ*<sup>i</sup>* such that *Q*(Δ, *i*, 0) − *Q*(Δ, *i*, 1) ≥ 0 whenever Δ ≥ *βi*.
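The bound *β*ˆ*<sup>i</sup>* in Case 2 has the closed form ⌈*ωCe*/*qi*⌉ (and is at least 1), which is non-increasing in the channel state because *qi* increases with it. A quick sketch, using the success probabilities from Figure 4's caption as example inputs:

```python
import math

# Smallest integer beta with q_i * beta - omega*Ce >= 0, i.e. ceil(omega*Ce/q_i),
# clamped to at least 1 (the minimum possible AoI).
def beta_hat(q, omega, Ce):
    return [max(1, math.ceil(omega * Ce / qi)) for qi in q]

q = [0.1, 0.2, 0.3, 0.4, 0.5]   # success probabilities from Figure 4's setup
bounds = beta_hat(q, omega=10.0, Ce=1.0)
# bounds -> [100, 50, 34, 25, 20], non-increasing in the channel state
```

The resulting bounds decrease as the channel improves, mirroring the non-increasing threshold property proven next.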

Altogether, the optimal policy has a threshold structure for *ω* ≥ 0. Next, we examine the non-increasing property of the thresholds. Firstly, we show that the difference between the state-action value functions is monotonic with respect to the channel state for fixed AoI. Assuming that *i*, *j* ∈ H and *i* ≤ *j*, it is easy to obtain that

$$\begin{aligned} &Q(\Delta, j, 0) - Q(\Delta, j, 1) - \big(Q(\Delta, i, 0) - Q(\Delta, i, 1)\big) \\ &= (q\_j - q\_i) \sum\_{l=0}^{N-1} p\_l \big(V(\Delta + 1, l) - V(1, l)\big) \ge 0. \end{aligned} \tag{A14}$$

Since *Q*(Δ, *i*, 0) − *Q*(Δ, *i*, 1) ≥ 0 when Δ ≥ *βi*, we have *Q*(Δ, *j*, 0) − *Q*(Δ, *j*, 1) ≥ 0 for Δ ≥ *β<sup>i</sup>* according to (A14). This implies that the optimal threshold *β<sup>j</sup>* corresponding to channel state *j* is no greater than *βi*, i.e., *β<sup>j</sup>* ≤ *βi*. This completes the proof.

#### **Appendix D. Proof of Theorem 3**

Assume that *ϕ*(**x**) is the steady-state probability of state **x** in the Markov chain. It satisfies the following global balance Equation [18]:

$$\varphi(\mathbf{x}) = \sum\_{\mathbf{x}' \in S} \varphi(\mathbf{x}') \Pr(\mathbf{x}|\mathbf{x}'). \tag{A15}$$

Let *ϕ*<sup>1</sup> = *ϕ*(1, 0) + *ϕ*(1, 1). We derive the steady-state probabilities by discussing three cases via mathematical induction.

Case 1: 1 < *i* ≤ *β*<sup>1</sup>

Based on Equation (A15), we have

$$\begin{aligned} \varphi(2,j) &= \varphi(1,0)p\_j + \varphi(1,1)p\_j \\ &= p\_j \varphi\_1. \end{aligned} \tag{A16}$$

Assuming that *ϕ*(*i*, *j*) = *pjϕ*<sup>1</sup> holds for all *i* ≤ *k* < *β*1, we examine *ϕ*(*k* + 1, *j*). We have

$$\begin{aligned} \varphi(k+1,j) &= \varphi(k,0)p\_j + \varphi(k,1)p\_j \\ &= p\_j p\_0 \varphi\_1 + p\_j p\_1 \varphi\_1 \\ &= p\_j \varphi\_1, \end{aligned} \tag{A17}$$

which completes this segment of the proof.

Case 2: *β*<sup>1</sup> < *i* ≤ *β*<sup>0</sup>

Similarly, we have

$$\begin{aligned} \varphi(\beta\_1 + 1, j) &= p\_j(1 - q\_1)\varphi(\beta\_1, 1) + p\_j\varphi(\beta\_1, 0) \\ &= p\_j\varphi\_1\big((1 - q\_1)p\_1 + p\_0\big) \\ &= p\_j\varphi\_1 s\_0, \end{aligned} \tag{A18}$$

where *s*<sup>0</sup> = 1 − *p*1*q*1. Assuming that *ϕ*(*i*, *j*) = *pjϕ*1*s*<sub>0</sub><sup>*i*−*β*1</sup> holds for all *β*<sup>1</sup> < *i* ≤ *k* < *β*0, we examine *ϕ*(*k* + 1, *j*). We have

$$\begin{aligned} \varphi(k+1,j) &= p\_j(1-q\_1)\varphi(k,1) + p\_j\varphi(k,0) \\ &= p\_j\varphi\_1\big((1-q\_1)p\_1+p\_0\big)s\_0^{k-\beta\_1} \\ &= p\_j\varphi\_1 s\_0^{k+1-\beta\_1}, \end{aligned} \tag{A19}$$

which completes this segment of the proof.

Case 3: *i* > *β*<sup>0</sup>

Following the above discussion, we have

$$\begin{aligned} \varphi(\beta\_0 + 1, j) &= p\_j(1 - q\_1)\varphi(\beta\_0, 1) + p\_j(1 - q\_0)\varphi(\beta\_0, 0) \\ &= p\_j\varphi\_1 s\_0^{\beta\_0 - \beta\_1}\big((1 - q\_1)p\_1 + (1 - q\_0)p\_0\big) \\ &= p\_j\varphi\_1 s\_0^{\beta\_0 - \beta\_1} s\_1, \end{aligned} \tag{A20}$$

where *s*<sup>1</sup> = 1 − *p*0*q*<sup>0</sup> − *p*1*q*1. Assuming that *ϕ*(*i*, *j*) = *pjϕ*1*s*<sub>0</sub><sup>*β*0−*β*1</sup>*s*<sub>1</sub><sup>*i*−*β*0</sup> holds for all *β*<sup>0</sup> < *i* ≤ *k*, we examine *ϕ*(*k* + 1, *j*). We have

$$\begin{aligned} \varphi(k+1,j) &= p\_j(1-q\_1)\varphi(k,1) + p\_j(1-q\_0)\varphi(k,0) \\ &= p\_j\varphi\_1 s\_0^{\beta\_0-\beta\_1}\big((1-q\_1)p\_1 + (1-q\_0)p\_0\big)s\_1^{k-\beta\_0} \\ &= p\_j\varphi\_1 s\_0^{\beta\_0-\beta\_1}s\_1^{k+1-\beta\_0}. \end{aligned} \tag{A21}$$

Altogether, we obtain the steady-state probabilities in terms of the unknown parameter *ϕ*1. Using the fact that ∑<sup>∞</sup><sub>*i*=1</sub> ∑<sup>*N*−1</sup><sub>*j*=0</sub> *ϕ*(*i*, *j*) = 1, we formulate the equation:

$$\begin{aligned} &\sum\_{i=1}^{\infty} \sum\_{j=0}^{N-1} \varphi(i,j) \\ &= \varphi\_1 \left\{ \beta\_1 + \sum\_{i=\beta\_1+1}^{\beta\_0} s\_0^{i-\beta\_1} + \sum\_{i=\beta\_0+1}^{\infty} s\_0^{\beta\_0-\beta\_1} s\_1^{i-\beta\_0} \right\} \\ &= \varphi\_1 \left\{ \beta\_1 + \frac{s\_0 - s\_0^{\beta\_0-\beta\_1+1}}{1-s\_0} + s\_0^{\beta\_0-\beta\_1} \frac{s\_1}{1-s\_1} \right\} \\ &= 1, \end{aligned} \tag{A22}$$

from which the expression of *ϕ*<sup>1</sup> is obtained.
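The three cases above can be sanity-checked numerically: with *ϕ*<sup>1</sup> taken from the normalization (A22), the steady-state probabilities should sum to one. The thresholds below are illustrative assumptions; *p* and *q* match the example values in Figure 5's caption.

```python
# Steady-state distribution from the proof of Theorem 3.
# Thresholds beta1 <= beta0 are assumed example values.
p = [0.2, 0.8]
q = [0.2, 0.5]
beta0, beta1 = 6, 3
s0 = 1 - p[1] * q[1]                 # only state 1 transmits for beta1 <= AoI < beta0
s1 = 1 - p[0] * q[0] - p[1] * q[1]   # both states transmit for AoI >= beta0
M = beta0 - beta1

# phi_1 from the normalization condition (A22)
phi1 = 1.0 / (beta1 + (s0 - s0 ** (M + 1)) / (1 - s0)
              + s0 ** M * s1 / (1 - s1))

def phi(i, j):
    """Steady-state probability of state (AoI = i, channel = j)."""
    if i <= beta1:
        return p[j] * phi1
    if i <= beta0:
        return p[j] * phi1 * s0 ** (i - beta1)
    return p[j] * phi1 * s0 ** M * s1 ** (i - beta0)

# Truncated total probability; the geometric tail is negligible here.
total = sum(phi(i, j) for i in range(1, 3000) for j in range(2))
```

The truncated sum agrees with 1 to floating-point accuracy, confirming the induction.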

The average cost of a Markov chain is given by

$$\mathcal{C}\_{\text{mc}} = \sum\_{\mathbf{x} \in \mathcal{S}} \varphi(\mathbf{x}) \mathcal{C}(\mathbf{x}, \pi^\*(\mathbf{x})). \tag{A23}$$

Substituting (24) into (A23), we have

$$C\_{mc}(\beta\_0, \beta\_1) = \sum\_{i=1}^{\infty} i \sum\_{j=0}^{1} \varphi(i,j) + \sum\_{i=\beta\_0}^{\infty} \omega C\_e \varphi(i,0) + \sum\_{i=\beta\_1}^{\infty} \omega C\_e \varphi(i,1).$$

Furthermore, the first term is given by

$$\begin{aligned} &\sum\_{i=1}^{\infty} i \sum\_{j=0}^{1} \varphi(i,j) \\ &= \sum\_{i=1}^{\beta\_1} i \varphi\_1 + \varphi\_1 \sum\_{i=\beta\_1+1}^{\beta\_0} i s\_0^{i-\beta\_1} + \varphi\_1 s\_0^{\beta\_0-\beta\_1} \sum\_{i=\beta\_0+1}^{\infty} i s\_1^{i-\beta\_0} \\ &= \frac{\varphi\_1 \beta\_1 (\beta\_1+1)}{2} + \varphi\_1 A + \varphi\_1 B, \end{aligned} \tag{A24}$$

where

$$A = \frac{s\_0\big((\beta\_1 + 1) - \beta\_0 s\_0^{\beta\_0 - \beta\_1}\big)}{1 - s\_0} + \frac{s\_0^2 - s\_0^{\beta\_0 - \beta\_1 + 1}}{(1 - s\_0)^2},\tag{A25}$$

and

$$B = s\_0^{\beta\_0 - \beta\_1}\left(\frac{(\beta\_0 + 1)s\_1}{1 - s\_1} + \frac{s\_1^2}{(1 - s\_1)^2}\right). \tag{A26}$$

Furthermore, the sum of the last two terms is given by

$$\begin{aligned} &\sum\_{i=\beta\_0}^{\infty} \omega C\_e \varphi(i, 0) + \sum\_{i=\beta\_1}^{\infty} \omega C\_e \varphi(i, 1) \\ &= \varphi\_1 \omega C\_e \left( s\_0^{\beta\_0 - \beta\_1} \sum\_{i=\beta\_0}^{\infty} s\_1^{i-\beta\_0} + p\_1 \sum\_{i=\beta\_1}^{\beta\_0 - 1} s\_0^{i-\beta\_1} \right) \\ &= \varphi\_1 \omega C\_e \left( \frac{s\_0^{\beta\_0 - \beta\_1}}{1 - s\_1} + p\_1 \frac{1 - s\_0^{\beta\_0 - \beta\_1}}{1 - s\_0} \right). \end{aligned} \tag{A27}$$

This completes the proof.
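The closed-form energy term in (A27) can be verified against a direct (truncated) summation of the steady-state probabilities. The thresholds below are illustrative assumptions; the other parameters match the example values in Figure 5's caption.

```python
# Check the energy-cost term of (A27) against a direct sum.
p = [0.2, 0.8]
q = [0.2, 0.5]
omega, Ce = 10.0, 1.0
beta0, beta1 = 6, 3                  # assumed example thresholds, beta1 <= beta0
s0 = 1 - p[1] * q[1]
s1 = 1 - p[0] * q[0] - p[1] * q[1]
M = beta0 - beta1
phi1 = 1.0 / (beta1 + (s0 - s0 ** (M + 1)) / (1 - s0)
              + s0 ** M * s1 / (1 - s1))

def phi(i, j):
    """Steady-state probability of (AoI = i, channel = j) from Theorem 3."""
    if i <= beta1:
        return p[j] * phi1
    if i <= beta0:
        return p[j] * phi1 * s0 ** (i - beta1)
    return p[j] * phi1 * s0 ** M * s1 ** (i - beta0)

# State 0 transmits for AoI >= beta0; state 1 transmits for AoI >= beta1.
direct = omega * Ce * (sum(phi(i, 0) for i in range(beta0, 3000)) +
                       sum(phi(i, 1) for i in range(beta1, 3000)))
# Closed form from (A27).
closed = phi1 * omega * Ce * (s0 ** M / (1 - s1)
                              + p[1] * (1 - s0 ** M) / (1 - s0))
```

The two quantities agree to floating-point accuracy, as the geometric-series manipulation in (A27) predicts.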

#### **References**


## *Article* **Scheduling Strategy Design Framework for Cyber–Physical System with Non-Negligible Propagation Delay**

**Zuoyu An, Shaohua Wu \*, Tiange Liu, Jian Jiao and Qinyu Zhang**

Communication Engineering Research Centre, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, China; 19s052043@stu.hit.edu.cn (Z.A.); 19s152063@stu.hit.edu.cn (T.L.); jiaojian@hit.edu.cn (J.J.); zqy@hit.edu.cn (Q.Z.) **\*** Correspondence: hitwush@hit.edu.cn

**Abstract:** Cyber–physical systems (CPS) have been widely employed as wireless control networks. We consider a special type of CPS developed from wireless networked control systems (WNCS). Such systems usually include two communication links, uplink transmission and downlink transmission, which together form a closed loop. When such CPS are deployed for time-sensitive applications such as remote control, the uplink and downlink propagation delays are non-negligible. However, existing studies on CPS/WNCS usually ignore the propagation delays of the uplink and downlink channels. In order to achieve the best balance between uplink and downlink transmissions under such circumstances, we propose a heuristic framework to obtain the optimal scheduling strategy that minimizes the long-term average control cost. We model the optimization problem as a Markov decision process (MDP), and then give sufficient conditions for the existence of the optimal scheduling strategy. We propose a semi-predictive framework to eliminate the impact of the coupling between the uplink and downlink data packets. We then obtain a lookup table-based optimal offline strategy and a neural network-based suboptimal online strategy. Numerical simulation shows that the scheduling strategies obtained by this framework bring significant performance improvements over existing strategies.

**Keywords:** cyber–physical system; wireless networked control system; remote control; communication control co-design; age of information

#### **1. Introduction**

In recent years, applications of wireless control networks have become increasingly widespread, such as drone formations, autonomous vehicles, and automated factories. Some of these scenarios impose new requirements on remote control technology, which is a sub-topic of communication control co-design. Remote control technology originates from wireless control systems with long propagation delays, such as far-sea monitoring and high-efficiency satellite IoT, where the main cause of the long propagation delay is the large geographic distance. This feature makes it extremely challenging to design CPS in such scenarios. In order to meet the needs of remote control with propagation delay, that is, to maintain stable closed-loop control and reduce control costs, we propose a new framework to design uplink and downlink scheduling strategies.

As shown in Figure 1, a typical CPS deployed in the single closed-loop control scenario contains a control system and a communication system. In the rest of this article, we use single-loop CPS to refer to this specific type of CPS. The communication process of a typical single-loop CPS can be divided into two parts: uplink sensor transmission and downlink controller transmission. The uplink transmission is initiated by the sensor and sends a state update packet from the plant to the controller. The controller first uses these data to obtain a more accurate estimate of the plant state. Then the downlink transmission is initiated to send command information from the controller to the actuator located at the plant. The actuator acts on the plant to maintain its stability.

**Citation:** An, Z.; Wu, S.; Liu, T.; Jiao, J.; Zhang, Q. Scheduling Strategy Design Framework for Cyber–Physical System with Non-Negligible Propagation Delay. *Entropy* **2021**, *23*, 714. https:// doi.org/10.3390/e23060714

Academic Editors: Anthony Ephremides and Yin Sun

Received: 6 May 2021 Accepted: 2 June 2021 Published: 4 June 2021



Taking into account the characteristics of a control system, a command can only be generated from an accurate estimate, which means the downlink transmission must occur after a successful uplink transmission. Because of this fixed timing relationship, the CPS has to work in half-duplex mode in most cases: namely, only one of the uplink sensor transmission and the downlink controller transmission can be activated to send a data packet in the same time slot. This raises the problem of how to design a scheduling strategy between those two transmissions. Note that the uplink and downlink channels here are not just single wireless channels, but a simplified model of a fixed routing link with multiple relays. This scenario targets special remote control systems that use satellites as relays. Therefore, the propagation delay in our paper is essentially a collection of the various delays incurred along the entire relay link, including processing delay, transmission delay, propagation delay, etc. This unified modeling is used because the link characteristics of a fixed-routing multi-relay link can be described by an equivalent link with a specific code error rate and propagation delay.

There are many related works on WNCS and CPS [1–4]. Focusing on the conflict between the accuracy requirements of control systems and the limited quantization level, [5] proposed the application of dynamic quantization technology in communication control co-design. Some works designed CPS under the limitations of the wireless coding process, such as code length allocation [6,7], code length design [8,9] and adaptive code length adjustment [10]. Considering the fading characteristics of transmission channels, adaptive transmit power adjustment techniques that predict the fast or slow fading of the channels are proposed in [11,12]. Some of the above studies include the idea of designing CPS for time-sensitive applications. Nowadays, the most widely used measure of timeliness is the Age of Information (AoI) [13], which is defined as the time elapsed since the freshest received data packet was generated:

$$
\Delta(t) = t - t'\tag{1}
$$

where *t* represents the current time and *t*′ represents the time when the packet was generated. It used to be very difficult to express the control performance measure, that is, the mean square error (MSE) of the system state [14], when the control system and the communication system are combined. The proposal of AoI changed this situation. For example, the system state MSE of a linear time-invariant (LTI) system can be expressed simply as a function of the AoI. This improvement greatly reduces the difficulty of describing the overall system performance in the communication control co-design scenario [15,16].
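Definition (1) amounts to simple bookkeeping: the receiver's AoI at time *t* is *t* minus the generation time of the freshest packet received so far. A minimal sketch, with hypothetical delivery times:

```python
def aoi_trace(deliveries, horizon):
    """AoI per Definition (1): Delta(t) = t - generation time of the
    freshest packet received by time t.

    deliveries: dict mapping receive-time -> generation-time of that packet.
    Returns a list of Delta(t) for t = 0..horizon-1 (None before the first
    delivery, when no packet has been received yet).
    """
    trace, latest_gen = [], None
    for t in range(horizon):
        if t in deliveries:
            g = deliveries[t]
            latest_gen = g if latest_gen is None else max(latest_gen, g)
        trace.append(None if latest_gen is None else t - latest_gen)
    return trace

# A packet generated at t=0 arrives at t=2; another generated at t=3 arrives at t=5.
trace = aoi_trace({2: 0, 5: 3}, horizon=7)
# trace -> [None, None, 2, 3, 4, 2, 3]
```

The sawtooth shape (linear growth, drop on each delivery) is the behavior the scheduling strategies in this paper exploit.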

**Figure 1.** Cyber–physical system deployed under the single closed-loop control scenario.

Based on AoI, many related studies have been derived, such as the application of the HARQ mechanism for single-loop CPS to improve the overall timeliness [17,18], and the scheduling strategy aiming to minimize the long-term average MSE for single-loop CPS without transmission delay [19]. Some studies about the multi-loop scheduling strategy design aiming at optimizing timeliness have also been proposed. Reference [20] focuses on the design of the data inter-arrival rate and code length allocation strategy. References [21,22] proposed the uplink scheduling strategy of multi-loop WNCS under the ideal assumption of downlink transmission. Furthermore, the authors of [23,24] discuss the application of data packet transmission result prediction technology in WNCS design.

The scenarios studied above mainly concern the short-distance Industrial Internet of Things (IIoT), so the impact of the uplink and downlink propagation delays on the closed-loop control performance of a CPS is generally ignored. Besides, the above studies consider only one of the two code error rates of the uplink and downlink transmissions. In the remote control scenario, the code error rates and propagation delays of both links are not only non-negligible, but also have a huge impact on the overall performance of the single-loop CPS. Some works have studied the design of WNCS optimal control strategies in time-delay scenarios [25–27]. However, they do not consider the impact of the code error rate and the scheduling strategy, which are issues that cannot be ignored in the design of communication systems in the field of communication engineering. To this end, we propose a new framework to obtain the optimal scheduling strategy while considering both the code error rates and the propagation delays. This strategy minimizes the long-term average control cost.

Firstly, we model the single-loop CPS as an MDP problem and give the sufficient conditions for the stability of CPS. Secondly, we propose a heuristic semi-predictive framework to eliminate the impact of the coupling characteristic between the uplink and downlink data packets. Finally, we obtain the lookup table-based optimal offline strategy and the neural network-based suboptimal online strategy for the single-loop CPS. The whole process can be expanded according to actual deployment requirements with any fixed propagation delay as long as the sufficient condition is satisfied.

The rest of this paper is organized as follows: In Section 2, we provide the system model and formulate the optimization problem. In Section 3, we introduce the semi-predictive framework and transform the optimization problem into an MDP problem. In Section 4, we obtain the optimal offline strategy and the suboptimal online strategy. In Section 5, we show the numerical simulation results. We conclude this work in Section 6.

#### **2. System Model**

#### *2.1. The Plant of the Single-Loop CPS*

First, we model the plant in the single-loop CPS as a discrete-time LTI system:

$$X\_{k+1} = AX\_k + B\mathcal{U}\_k + Z\_{k\prime} \,\forall k \tag{2}$$

where *k* represents the *k*-th time slot, *Xk* ∈ R represents the state of the plant at time slot *k*, *Uk* ∈ R represents the executed control command, and *Zk* ∈ R represents the normally distributed plant noise with mean *z*¯ and variance *R*. *A* ∈ R represents the state transition coefficient and *B* ∈ R represents the command control coefficient. We assume that the plant state remains unchanged within a single time slot. The goal of the CPS is to maintain *X* around 0.
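A minimal simulation of the plant model (2) illustrates why timely feedback matters. The coefficients below are illustrative assumptions (with |*A*| > 1, so the uncontrolled state diverges); the idealized proportional command assumes perfect, delay-free state knowledge, which the rest of the paper relaxes.

```python
import random

# Illustrative (assumed) coefficients: an unstable plant with |A| > 1.
A, B, R, z_bar = 1.2, 1.0, 0.1, 0.0
random.seed(0)

def step(x, u):
    """One slot of the plant dynamics X_{k+1} = A*X_k + B*U_k + Z_k."""
    z = random.gauss(z_bar, R ** 0.5)   # plant noise: mean z_bar, variance R
    return A * x + B * u + z

x_open, x_closed = 1.0, 1.0
for _ in range(50):
    x_open = step(x_open, 0.0)                      # no control: state diverges
    x_closed = step(x_closed, -(A / B) * x_closed)  # ideal proportional control
```

After 50 slots the uncontrolled state has blown up (roughly like *A*<sup>50</sup>), while the controlled state stays at the noise level; any delay or loss in the loop degrades the latter toward the former.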

#### *2.2. The Communication Process of the Single-Loop CPS*

In the previous subsection, we explained that the entire single-loop CPS works in half-duplex mode. We now describe its communication process. The system adopts a centralized scheduling scheme, which is well suited to a single-loop CPS: the scheduling decision for uplink and downlink transmission is made entirely by the remote controller. We use *ak* to denote the scheduling decision made by the controller in time slot *k*: *ak* = 1 if the controller schedules an uplink transmission in slot *k*, and *ak* = 2 if it schedules a downlink transmission. We assume that the code error rates of the uplink and downlink channels are *ps*, *pc* ∈ (0, 1), respectively. Both code error rates are constant, meaning that in any time slot the uplink and downlink transmissions fail with probabilities *ps* and *pc*, respectively. We use *δk* to denote the transmission result of the packet sent in time slot *k*: no matter which transmission is scheduled, *δk* = 1 if it succeeds and *δk* = 0 otherwise. Since the processing procedures of most practical CPSs are digital, a packet that has experienced some delay starts to be processed in the next processing cycle after it is received; we therefore model the propagation delays of the uplink and downlink channels as integer numbers of time slots, *d*up and *d*down, respectively. To simplify the analysis, we assume that the transmission of scheduling instructions and feedback information is ideal.

In addition to the variables described above, we define the following two parameters to describe the status of each part in a single-loop CPS:

(1) State Estimation Age *τk*: This is defined as the age of the latest valid uplink state update packet successfully received by the controller at the end of time slot *k*. *τk* reflects the accuracy of the estimate maintained by the remote controller. Because of the uplink propagation delay, the minimum value of the state estimation age is *d*up. When the specific time slot is not considered, it is abbreviated as *τ*. Its update rule is as follows:

$$\tau_{k+1} = \begin{cases} d_{\text{up}} & \text{if } (a_j = 1)\,\&\,(\delta_j = 1) \\ \tau_k + 1 & \text{otherwise} \end{cases} \tag{3}$$

where *j* = *k* − *d*up + 1.

(2) State Control Age *ϕk*: This is defined as the age of the uplink packet used to generate the latest successfully received downlink packet by the actuator at the end of the time slot *k*. This parameter represents the total time it takes for the entire CPS to complete a closed-loop control process. It reflects the degree of divergence of the plant's state. Because of the uplink and downlink propagation delay, the minimum value of the state control age is *d*up + *d*down. When the specific time slot is not considered, it is abbreviated as *ϕ*. Its update rule is as follows:

$$\varphi_{k+1} = \begin{cases} \tau_q + d_{\text{down}} & \text{if } (a_q = 2)\,\&\,(\delta_q = 1) \\ \varphi_k + 1 & \text{otherwise} \end{cases} \tag{4}$$

where *q* = *k* − *d*down + 1. The abbreviations *j* and *q* will be used in the rest of this paper. Note that we set the initial values of *τ*<sup>0</sup> and *ϕ*<sup>0</sup> to be 2. These values can be arbitrarily selected within a reasonable range. This is because the long-term average cost we focus on is not affected by those initial values.
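Under the *d*up = *d*down = 1 setting used later in the paper (so that *j* = *q* = *k*), the update rules (3) and (4) can be sketched as follows; the function name and the test values are our own:

```python
# Hedged sketch of the age updates in Equations (3) and (4), specialized to
# d_up = d_down = 1 so that j = q = k. Action 1 = uplink, 2 = downlink.
D_UP, D_DOWN = 1, 1

def update_ages(tau, phi, action, success):
    """Return (tau', phi') for one slot given the scheduling action and result."""
    if action == 1 and success:      # uplink delivered: estimation age resets
        new_tau = D_UP
    else:
        new_tau = tau + 1
    if action == 2 and success:      # downlink delivered: control age tracks tau
        new_phi = tau + D_DOWN
    else:
        new_phi = phi + 1
    return new_tau, new_phi

# A fresh uplink success resets tau to d_up while phi keeps aging.
print(update_ages(5, 7, action=1, success=True))   # (1, 8)
# A downlink success turns the current estimation age into the control age.
print(update_ages(1, 8, action=2, success=True))   # (2, 2)
```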

#### *2.3. The Control Process of the Single-Loop CPS*

In this subsection, we will explain the control process of the single-loop CPS in detail, which is mainly completed by the remote controller and the actuator. The task of the remote controller can be divided into three parts: Maintaining state estimation, generating control commands, and scheduling uplink and downlink transmissions, while the actuator has only one task: Executing the received control commands.

(1) Maintaining State Estimation: We assume that the sensor can sample the state of the plant without distortion. The uplink transmission cannot be scheduled in every time slot, and a scheduled transmission can fail because of a code error during propagation, so the remote controller does not receive a new state update packet in every time slot. Under these circumstances, the remote controller has to update its estimate *X*˜*k* of the plant state *Xk* through the following process:

$$\tilde{X}_{k+1} = \begin{cases} g^{d_{\text{up}}}(X_j, k) & \text{if } (a_j = 1)\,\&\,(\delta_j = 1) \\ A\tilde{X}_k + BU_k & \text{otherwise} \end{cases} \tag{5}$$

where *g*(*X*, *k*) = *AX* + *BUk*, *g^n*(*X*, *k*) = *g*(*g*^(*n*−1)(*X*, *k* − 1), *k*) for all *n* > 1, and *g*^1(*X*, *k*) = *g*(*X*, *k*). In this scenario, this estimation method has been proven to be optimal [28]. When an uplink transmission is successful, the remote controller can use the plant state *Xk*−*d*up+1, which is the exact value from *d*up − 1 time slots earlier, to obtain the state estimate *X*˜*k*+1 for the next time slot. When the current time slot has no successful uplink transmission, the controller can only update *X*˜*k*+1 from *X*˜*k*. According to this process, we can derive the state estimation MSE of the remote controller as *Q*˜*k*:

$$\tilde{Q}_k = \mathbb{E}\left[\left(\tilde{X}_k - X_k\right)^2\right] \tag{6}$$

Note that the state estimation error of the remote controller is entirely caused by the noise *Zk*. By using the state estimation age *τk*, we can rewrite the state estimation MSE as a recursive function of the noise variance *R*:

$$\tilde{Q}_{k+1} = \begin{cases} f(d_{\text{up}}) & \text{if } (a_j = 1)\,\&\,(\delta_j = 1) \\ f(\tau_k + 1) & \text{otherwise} \end{cases} \tag{7}$$

where $f(x) = \sum_{i=1}^{x} (A^2)^{i-1} R$. Equation (7) uses the definition of AoI to express the MSE of the estimate, which greatly reduces the difficulty of calculation. In the following, we use the same idea to derive the single-loop CPS control performance metric.
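The function *f* can be evaluated directly; the parameter values below are illustrative, not the paper's:

```python
# The MSE expression f(x) = sum_{i=1}^{x} (A^2)^(i-1) * R from Equation (7):
# the accumulated variance of x slots of unobserved plant noise.
def f(x, a=1.2, r=1.0):
    """Estimation MSE after the state information has aged x slots."""
    return sum((a ** 2) ** (i - 1) * r for i in range(1, x + 1))

print(f(1))            # one slot of aging: just R = 1.0
print(round(f(3), 4))  # 1 + 1.44 + 1.44**2 = 4.5136
```

Note how *f* grows geometrically in *x* when |*A*| > 1, which is why a stale estimate (large age) is so costly.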

(2) Control Command Generation and Execution: In each time slot, while the remote controller maintains the state estimate, it also uses the estimate to generate a control command *U*˜*k*:

$$\tilde{U}_k = K\tilde{X}_k \tag{8}$$

where *K* is the command generation coefficient. The goal of this control process is to maintain the state around 0. Since the downlink transmission has a propagation delay of *d*down time slots, we must ensure $BK = -A^{d_{\text{down}}}$. To simplify the analysis, we set $B = -A^{d_{\text{down}}}$ and *K* = 1. Due to the code error rate and the scheduling decisions, not every control command *U*˜ can be received by the actuator; only those that are scheduled and successfully transmitted can be used. Therefore, the control command executed by the actuator is *Uk*+1:

$$U_{k+1} = \begin{cases} \tilde{U}_q & \text{if } (a_q = 2)\,\&\,(\delta_q = 1) \\ 0 & \text{otherwise} \end{cases} \tag{9}$$

where *q* = *k* − *d*down + 1. The control method given by (8) and (9) is called single-step control, a common form in classic cybernetics. Under this method, when a control command is successfully delivered to the actuator, the plant state is returned to a value as close to 0 as possible in a single step. Such a process maximizes the effect of a single command.
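A minimal sketch combining the estimator update (5) (for *d*up = 1) with command generation (8) and execution (9); the coefficients and function names are our own illustrative choices:

```python
# Sketch of Equations (5), (8), (9) with d_up = d_down = 1, B = -A, K = 1.
A, B, K_GAIN = 1.2, -1.2, 1.0   # illustrative values only

def controller_update(x_est, x_delivered, u_executed, uplink_ok):
    """Equation (5): roll the (possibly refreshed) estimate one slot forward."""
    base = x_delivered if uplink_ok else x_est
    return A * base + B * u_executed

def command(x_est):
    """Equation (8): U~_k = K * X~_k."""
    return K_GAIN * x_est

def executed(u_cmd, downlink_ok):
    """Equation (9): only a delivered command is executed; otherwise U = 0."""
    return u_cmd if downlink_ok else 0.0

# A delivered command cancels A * X~ exactly: the predicted next state is 0.
x_est = 2.0
u = executed(command(x_est), downlink_ok=True)
print(A * x_est + B * u)  # 1.2*2 - 1.2*2 = 0.0
```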

(3) Single-Loop CPS Control Performance Metric: Consistent with the estimation performance metric, the control performance metric is defined as the state MSE of the plant *Qk*:

$$Q_k = \mathbb{E}[X_k^2] \tag{10}$$

Similar to *Q*˜ *<sup>k</sup>*, we can rewrite *Qk* as a function of noise variance *R* and state control age *ϕ*:

$$Q_{k+1} = \begin{cases} f(\tau_q + d_{\text{down}}) & \text{if } (a_q = 2)\,\&\,(\delta_q = 1) \\ f(\varphi_k + 1) & \text{otherwise} \end{cases} \tag{11}$$

According to the control cost given by Equation (11), we can obtain the long-term average control cost, that is, the long-term average plant state MSE:

$$J = \lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K} Q_k \tag{12}$$

Equation (12) reflects the state deviation in classic cybernetics, which is the core cost metric we care about. Note that this quantity used to be very difficult to quantify before the introduction of AoI. Under certain conditions, the limit in Equation (12) may not exist, and the problem is then unsolvable. To prevent such situations, a sufficient condition for the stability of a WNCS with propagation delay will be given later, namely Equation (19). In this paper, the scheduling strategy is designed on the premise that Equation (19) is satisfied.

(4) Uplink and Downlink Scheduling Process: In the previous part, we introduced the control performance metric of a single-loop CPS. We now describe the scheduling process in detail. As explained, a single-loop CPS has two communication scenarios, the uplink transmission and the downlink transmission, and under half-duplex mode we can only choose one of them in each time slot. According to the previous definition, the scheduling decision of time slot *k* is denoted *ak*. The set of scheduling decisions of all time slots is called a scheduling strategy:

$$\pi \triangleq (a_1, a_2, \dots, a_k, \dots) \in \Pi \tag{13}$$

where Π represents the set of all scheduling strategies. Different scheduling strategies can significantly affect the control performance of a single-loop CPS. Every scheduling strategy *π* has its corresponding long-term average control cost *Jπ*. Among all scheduling strategies, there is an optimal strategy *π*<sup>∗</sup> ∈ Π, which satisfies:

$$J_{\pi^*} \leq J_{\pi}, \ \forall \pi \in \Pi \tag{14}$$

Therefore, we can construct the following optimization problem. The goal of this problem is to minimize the long-term average plant state MSE to obtain the optimal scheduling strategy while taking transmission propagation delay and code error rates of two wireless channels into account, namely

$$\min_{\pi} \lim_{K \to \infty} \frac{1}{K} \sum_{k=0}^{K} Q_k \tag{15}$$

#### **3. Semi-Predictive Framework and MDP Modeling**

In this section, we introduce the coupling characteristic between the uplink and downlink data packets, which is caused by their propagation delays. In the rest of the paper, we simply write "the coupling characteristic" for brevity. We propose a semi-predictive framework to eliminate the effect of this coupling on the solution of optimization problem (15). Based on this framework, we remodel the optimization problem as an MDP problem. Note that the proposed semi-predictive framework is suitable for any value of the uplink and downlink propagation delays. Without loss of generality, we use *d*up = *d*down = 1 as an example to illustrate the scheduling strategy design process. In actual applications with different propagation delays, we only need to modify the values of *d*up and *d*down and adjust some parameters in the following modeling step to meet the specific design requirements.

#### *3.1. The Packet Outdate Problem*

Section 2 introduced the control mechanism of a single-loop CPS. From the above analysis, it is easy to see that state update packets and control command packets are strongly coupled under the single-step control method. In fact, such a coupling exists in any closed-loop control scenario as long as there is propagation delay. This characteristic causes some successfully delivered packets to become outdated. As shown in Figure 2, the green and red arrows represent the state update packets up1 (left green arrow) and up2 (left red arrow) and the control command packets down1 (right green arrow) and down2 (right red arrow). The command down1 is generated by the controller using up1, while down2 is generated using up2. During the period from the slot in which up2 is sent to the slot in which down2 is executed, if down1 is executed successfully, both up2 and down2 become invalid. In time slot 4, down1 is executed, returning the real state of the plant to a value around 0. This causes an interruption in the state estimation process, which means the estimate updated by up2 is no longer accurate, so up2 is outdated. Since up2 is outdated, the control command down2 generated from it is also outdated. This is the main effect of the coupling characteristic, and we name it the packet outdate problem.

As we can see, this problem is mainly caused by the discontinuity in the dynamic process of the plant. The discontinuity occurs only when a downlink control command is executed, which means an uplink state update packet cannot cause this problem on its own. When this happens, outdated uplink and downlink data packets require different processing methods. An outdated downlink packet only needs to be discarded. For an outdated uplink packet, however, we have to backtrack the state estimate to before the outdated packet was used. We show the evolution of the state estimation age and state control age in Figure 2. It can be seen that the state estimation age is backtracked by changing from *τ*(3) = 2 to *τ*(4) = 4. The state control age is never updated in this way.

**Figure 2.** Analysis of Packet Outdated Phenomenon.

#### *3.2. Main Idea of the Semi-Predictive Framework*

In the previous subsection, we explained that the packet outdate problem has an impact on the update of the state estimation age, but this problem does not affect the update of the state control age. Therefore, when we try to construct a theoretical analysis framework, as long as the state control age is correct, the final analysis result can be guaranteed to be correct. In other words, the state estimation age of some time slots is allowed to deviate from the actual physical process. As long as it can be ensured that the state estimation age is accurate when the downlink data packet arrives at the actuator, the correct theoretical analysis can be guaranteed. It can be seen that it is possible to skip the state estimation age backtracking process in the theoretical analysis by using this feature. This is the main idea of the semi-predictive framework.

In the normal communication process, the decoding result of a data packet can only be determined after it arrives at the destination. For an uplink data packet, only after it arrives at the controller can it be known whether the data packet can be successfully decoded, while for a downlink packet, only after it arrives at the actuator can it be known whether the data packet can be successfully decoded. However, under the semi-predictive framework, we assume that the transmission result of a downlink packet is known as soon as the downlink packet is sent. Note that we do not predict the result of an uplink packet. This is because the execution of the downlink command is the root cause of the packet outdated problem.

Take the case of Figure 2 as an example again; if we can foresee that the downlink control command packet down1 can be successfully decoded and is not outdated, then during the period from its sending to its arrival, any packets sent or arrived can be directly discarded since they will be outdated by down1. Through this process, the impact of the packet outdated problem is eliminated and state estimation age backtracking is avoided.

While the update process of the state estimation age under the semi-predictive framework differs from the actual physical process, the scheduling strategy obtained from this framework can still be directly applied to the actual physical process. In the actual physical process, if a downlink data packet arrives at the actuator successfully and is not outdated, then the uplink and downlink transmissions scheduled during its transmission must be outdated. No matter what scheduling decision the controller makes, the packets sent during this period will be outdated; those scheduling decisions can therefore be arbitrary, since they do not affect the final result. Assuming that the downlink control command packet down1 in Figure 2 can be successfully decoded and is not outdated, we now explain the age update processes under both the semi-predictive framework and the actual physical process in detail.

(1) Semi-Predictive Framework: If down1 can be successfully decoded and is not outdated, then the controller knows that it does not matter whether it chooses uplink or downlink during the transmission of down1, because those scheduled packets will be outdated anyway. Under these circumstances, a reasonable scheduling strategy is to regularly schedule one of the uplink and downlink transmissions during this period simply to consume time.

(2) Non-Predictive Framework (Actual Physical Process): In the actual physical process, during the transmission of down1, the controller continues to schedule uplink or downlink transmissions according to a certain strategy. However, when down1 is received and decoded successfully, the transmissions previously scheduled by the controller all become outdated. In the end, the transmissions scheduled during this period only consume time and have no practical effect.

It can be observed that, under the semi-predictive framework and the actual non-predictive scheduling, the single-loop CPS transmission results are identical; that is, it is accurate to use the semi-predictive framework in the theoretical design and directly apply the results to real applications. This subsection has qualitatively analyzed the equivalence of the semi-predictive framework and the actual physical process. In the next subsection, we quantitatively illustrate how this framework corresponds to the actual physical process through MDP modeling.

#### *3.3. MDP Modeling of the Semi-Predictive Framework*

Based on the semi-predictive framework, we model the single-loop CPS with uplink and downlink propagation delay as an MDP with the following four elements:

(1) State Space: The state space of this MDP is

$$\mathbb{S} \triangleq \{a'(-d_{\text{max}} + 1), \dots, a'(-1), a'(0), D(0), \tau(0), \varphi(0)\} \tag{16}$$

where *d*max = max{*d*up, *d*down} and *D*(*n*) ∈ {0, 1, ···, *d*down + 1}. *a*′(*n*) represents the scheduling decision made in time slot *n*. *D*(*n*) represents the time interval between the time slot in which the latest valid downlink command packet (successfully transmitted and not outdated) as of time slot *n* was generated and the current time slot *n*. *τ*(*n*) and *ϕ*(*n*) represent the state estimation age and the state control age at time slot *n*, respectively. The slot index *n* is relative to the current time slot, i.e., the slot for which the scheduling decision is being made. Taking *a*′(−1) as an example: it represents the transmission action taken in the time slot preceding the current one. We set both the uplink and downlink propagation delays to 1 for illustration in the rest of this paper, so the corresponding state space is S ≜ {*a*′(0), *D*(0), *τ*(0), *ϕ*(0)}. In the subsequent sections, this is abbreviated as S ≜ {*a*′, *D*, *τ*, *ϕ*} to save space.

(2) Action Space: The action space is A ≜ {1, 2}, corresponding to the scheduling action *ak*: *ak* = 1 if the controller schedules an uplink transmission in slot *k*, and *ak* = 2 if it schedules a downlink transmission.

(3) State Transition Probability Matrix: The transition matrix is *P*(*s*′|*s*, *a*), where the state transition probability is the probability that the next state is *s*′ when taking action *a* in the current state *s*. The transition probability is determined by the channel code error rates. According to the different parameter pairs (*a*′, *D*) in the state, the state transition matrix can be divided into five parts: (*a*′, *D*) ∈ {(1, 1), (1, 2), (2, 0), (2, 1), (2, 2)}. The complete construction rules are given in Appendix A.

(4) Cost Function: It can be seen from (4) and (11) that the cost function in a specific state is independent of the action. The cost function can be expressed as a function of the state control age *ϕk*:

$$C(s, a) = Q_k(s) = f(\varphi_k) \tag{17}$$

In the MDP modeling of the semi-predictive framework, the core parameter is *D*(*n*). We limit its maximum value to *d*down + 1 because we only need to track the downlink transmissions in the past *d*down time slots to ensure that no possible packet outdate problem is missed. Moreover, such a cap helps to reduce the size of the state space. The update rule of *D*(*n*) is as follows:

$$D_{k+1} = \begin{cases} 0 & \text{if } (a_k = 2)\,\&\,(\delta_k = 1) \\ \min(d_{\text{down}} + 1, D_k + 1) & \text{otherwise} \end{cases} \tag{18}$$

This update process reflects the main idea of the semi-predictive framework and guarantees that the state control ages of the theoretical analysis and of the actual physical process never differ. In the next section, we use the semi-predictive framework to design the optimal scheduling strategy.
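The update rule for *D* can be sketched as follows; since *D* is capped at *d*down + 1 as the text prescribes, the non-reset branch takes a minimum. The function name and test values are ours:

```python
# Sketch of the D(n) update in Equation (18): D resets to 0 when a downlink
# packet scheduled in slot k succeeds, and otherwise grows by one, capped at
# d_down + 1 so that only the last d_down slots need to be tracked.
D_DOWN = 1

def update_D(d, action, success):
    if action == 2 and success:      # downlink delivered: reset
        return 0
    return min(D_DOWN + 1, d + 1)    # otherwise age by one, up to the cap

print(update_D(0, action=1, success=True))   # 1: no downlink success, age by one
print(update_D(1, action=1, success=False))  # 2: capped at d_down + 1
print(update_D(2, action=2, success=True))   # 0: downlink delivered, reset
```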

#### **4. Online and Offline Scheduling Strategies**

In this section, we first give a sufficient condition for the existence of the optimal scheduling strategy. Then we use the relative value iteration algorithm to obtain the lookup table-based optimal offline strategy. Aiming to reduce the space complexity of the algorithm and the storage required for the optimal offline strategy, we further propose a neural network-based suboptimal online strategy. The acquisition process of both strategies is the same for any uplink and downlink propagation delays, which gives the semi-predictive framework high practical value.

#### *4.1. Sufficient Conditions for the Strategies' Existence*

**Theorem 1.** *(Sufficient condition for the stability of a multi-loop half-duplex CPS with fixed uplink and downlink propagation delay.) Assume there are K single-loop CPSs, all sharing the same controller and forming a multi-loop CPS. If the controller can only schedule L uplink transmissions or L downlink transmissions in each time slot, then for each single-loop CPS i, if the code error probabilities of its corresponding uplink and downlink channels satisfy*

$$\max\{p_{i,\text{up}}, p_{i,\text{down}}\} < \left(\frac{1}{A_i^2}\right)^{\lceil K/L \rceil}, \quad i \in \{1, 2, \dots, K\} \tag{19}$$

*then there must exist a stationary deterministic scheduling strategy that stabilizes the multi-loop CPS. This stability holds as long as the uplink and downlink propagation delays are fixed, but the long-term control performance metric converges to a larger value as the propagation delay increases. When K* = 1 *and L* = 1*, the above multi-loop CPS reduces to a single-loop CPS. The proof is given in Appendix B.*

The essence of this sufficient condition is to link the instability of the control system with the reliability of the communication system. When the reliability of the communication system exceeds the instability of the control system, an optimal scheduling strategy can be found for the communication system that meets the needs of the control system. This condition can effectively guide the design of single- and multi-loop CPSs.
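Condition (19) is easy to check numerically; the helper below and its parameter values are our own illustration:

```python
import math

# Numeric check of sufficient condition (19): loop i is stabilizable if
# max(p_up, p_down) < (1 / A_i**2) ** ceil(K / L). Values are illustrative.
def stabilizable(a, p_up, p_down, K=1, L=1):
    return max(p_up, p_down) < (1.0 / a ** 2) ** math.ceil(K / L)

print(stabilizable(1.3, 0.2, 0.2))       # single loop: 0.2 < 1/1.69 ≈ 0.592 -> True
print(stabilizable(1.3, 0.2, 0.2, K=4))  # four loops, one channel: bound ≈ 0.12 -> False
```

The second call illustrates how sharing one half-duplex channel among several loops (K/L > 1) tightens the reliability requirement exponentially.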

#### *4.2. Lookup Table-Based Optimal Offline Strategy*

Since there is no theoretical upper limit on the state estimation age and the state control age, the MDP state space is infinite and must be truncated before solving. We select *N* = max{*τ*, *ϕ*} as the truncation condition and use the relative value iteration algorithm to solve the MDP problem. When the value of *N* is large enough, this truncation has no effect on the control performance; a suitable *N* can be found through Monte Carlo experiments. In this section, we take *N* = 10 as an example and show the resulting scheduling strategy in Figure 3.
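The relative value iteration step can be illustrated on a deliberately tiny average-cost MDP; the two-state toy chain and the function `rvi` below are our own construction, not the paper's (*a*′, *D*, *τ*, *ϕ*) model:

```python
# Minimal relative value iteration (RVI) for an average-cost MDP, run on a
# hypothetical two-state chain. P[a][s][t] is the transition probability from
# state s to t under action a; c[s] is the per-slot cost of state s.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.7, 0.3]],  # action 1
]
c = [0.0, 1.0]

def rvi(P, c, ref=0, iters=500):
    n = len(c)
    h = [0.0] * n
    for _ in range(iters):
        # One Bellman backup for the average-cost criterion.
        v = [min(c[s] + sum(P[a][s][t] * h[t] for t in range(n))
                 for a in range(len(P))) for s in range(n)]
        g = v[ref]                  # subtract a reference state to stay bounded
        h = [x - g for x in v]
    policy = [min(range(len(P)),
                  key=lambda a: c[s] + sum(P[a][s][t] * h[t] for t in range(n)))
              for s in range(n)]
    return g, policy

g, policy = rvi(P, c)
print(round(g, 3), policy)  # optimal average cost 0.125 with policy [0, 1]
```

In the paper's setting, the same backup is run over the truncated state space of size 6*N*², producing the lookup table visualized in Figure 3.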

**Figure 3.** Optimal Off-line Policy with *N* = 10. Red squares represent action *a* = 1; yellow squares represent action *a* = 2.

In Figure 3, the red squares indicate that the controller schedules an uplink transmission in the corresponding state, and the yellow squares indicate a downlink transmission. As shown in Figure 3a,c,d, if *D* ∈ {0, 1}, the related packet will be outdated no matter which transmission is scheduled, so the scheduling strategy can choose either action arbitrarily. Since we use the relative value iteration algorithm to solve the MDP problem, the strategy we obtained fills these unnecessary transmissions with uplink transmissions. Note that this part corresponds to the description in Section 3.2. Taking down1 as an example again: in the actual physical process, it is not known that the next two transmissions are unnecessary after down1 is sent. The controller does not know that *D* ∈ {0, 1}; instead, it believes that *D* is still equal to 2 in those time slots and continues to schedule according to the scheduling strategy. However, down1 makes those two packets outdated when it is executed. For states with *D* = 2, the controller makes its scheduling decision with correct state information. The entire process ensures that the actual process is consistent with the theoretical one.

After this scheduling strategy is obtained, it is stored by the controller as a lookup table and requires no extra computation from the controller, so we call it an offline strategy. However, since the iterative algorithm is model-based, as *N* increases, the size of the state space in the MDP model, *NS* = 2 · 3 · *N* · *N* = 6*N*², grows quadratically. This leads to a sharp increase in the space complexity of the solving process, and the lookup table can become too large to store. To solve these problems, we propose an improved neural network-based scheme in the next subsection.

#### *4.3. Neural Network-Based Suboptimal Online Strategy*

In Section 3, we remodeled the optimization problem as an MDP problem, and in the previous subsection we solved it to obtain the optimal offline strategy. The lookup table-based optimal offline scheduling strategy has two obvious shortcomings: the size of the lookup table grows linearly with the total number of states in the state space, and the space complexity of the solving process grows even faster. When the optimal offline strategy is deployed, there is no guarantee that the central controller has enough storage for the entire lookup table; the computation may even be infeasible because the state space is too large. Therefore, we design a new neural network-based suboptimal online scheduling strategy. The idea is to replace the lookup table of the previous strategy with a neural network to save storage space. A neural network is a good approximator of a lookup table; in theory, it can approximate the table without error, which means that, in the theory of reinforcement learning, this strategy can achieve the performance of the optimal strategy. We will show in the next section that the performance of this suboptimal online strategy is very close to that of the optimal offline strategy.

To obtain this neural network, we use a model-free algorithm called Deep Q-Network (DQN). The algorithm continuously learns the hidden structure of the MDP problem by interacting with the environment and keeps training the neural network to obtain better performance. The detailed procedure is shown in Algorithm 1.


The structure of the neural network we obtained is shown in Figure 4: four neurons in the input layer, fifty in the hidden layer, and two in the output layer. This neural network-based scheduling strategy is an online strategy: to use it, the current state *s* must first be input to the neural network, and the controller then performs a real-time computation to obtain the action values *A*(*s*, *a*) of the different actions in the current state. The action value represents how much reward can be obtained by taking the action, so the scheduling strategy selects the action with the largest *A*(*s*, *a*).

DQN is a relatively mature reinforcement learning algorithm, so we only give its parameter settings and briefly introduce its training process. We run *E* = 2000 episodes of 1000 steps each. In each step, the algorithm executes the greedy strategy with probability *ε* = 0.7 and the random strategy with probability 1 − *ε* = 0.3. After each step, one state transition is stored in the data set. The size of this data set is *M* = 2048, and it is updated in a loop-covering manner. A new episode is automatically initialized every 1000 steps. Meanwhile, training is performed every *T* = 256 steps, when the algorithm samples *B* = 512 transitions from the data set. The optimizer we use is the Root Mean Square propagation optimizer (RMSprop).
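The interaction loop just described can be skeletonized as follows. The network and the environment are our own stubs (`q_values`, `env_step`), so only the ε-greedy selection, the loop-covering replay buffer, and the train-every-*T*-steps schedule are shown; the hyperparameters mirror the text, but we run only 3 short episodes instead of *E* = 2000:

```python
import random
from collections import deque

# Skeleton of the DQN data flow: eps-greedy actions, a FIFO ("loop covering")
# replay buffer of size M, and a training call every T steps on B samples.
EPS, M, T, B = 0.7, 2048, 256, 512
buffer = deque(maxlen=M)            # oldest transitions are overwritten first
rng = random.Random(0)

def q_values(state):
    """Stub for the network's action values A(s, a)."""
    return [0.0, 0.0]

def select_action(state):
    if rng.random() < EPS:                        # greedy with probability eps
        q = q_values(state)
        return max(range(len(q)), key=q.__getitem__)
    return rng.choice([0, 1])                     # random with probability 1 - eps

def env_step(state, action):
    """Stub environment: returns (next_state, cost)."""
    return state, 1.0

state, steps = (1, 1, 2, 2), 0                    # (a', D, tau, phi)
for episode in range(3):                          # the paper runs E = 2000
    for _ in range(1000):                         # 1000 steps per episode
        a = select_action(state)
        nxt, cost = env_step(state, a)
        buffer.append((state, a, cost, nxt))      # store one transition
        state = nxt
        steps += 1
        if steps % T == 0 and len(buffer) >= B:
            batch = rng.sample(buffer, B)         # sample B transitions to train on
            # ... a gradient step (e.g. with RMSprop) would go here ...
print(len(buffer))  # 2048: the buffer is capped at M
```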

With the help of the DQN algorithm, we obtain the neural network-based suboptimal online strategy. The controller only needs to store the weights of this network and then computes the action values in real time from the current state in each time slot. In other words, this strategy saves a large amount of storage space by consuming a small amount of the controller's computing capacity, which makes it very attractive in practical applications.

**Figure 4.** Neural Network Structure.

#### **5. Numerical Simulation**

In this section, we run numerical simulations of the proposed strategies and of some existing strategies, and illustrate the advantages of the proposed strategies through comparison. We first introduce two benchmark strategies. The first is the switch scheduling strategy, which alternates between uplink and downlink transmissions in consecutive time slots; the second is the insist scheduling strategy, which keeps scheduling the same (uplink or downlink) transmission until it succeeds and then switches to the other.
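The two benchmark schedulers can be sketched as follows, with action 1 for uplink and 2 for downlink; the function names are ours:

```python
# Sketches of the two benchmark strategies described above.
def switch_policy(prev_action):
    """Switch strategy: alternate uplink and downlink every slot."""
    return 2 if prev_action == 1 else 1

def insist_policy(prev_action, prev_success):
    """Insist strategy: repeat the same link until it succeeds, then switch."""
    if prev_success:
        return 2 if prev_action == 1 else 1
    return prev_action

print([switch_policy(a) for a in (1, 2, 1)])            # [2, 1, 2]
print(insist_policy(1, False), insist_policy(1, True))  # 1 2
```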

The parameter settings in the numerical simulation are as follows. The state transition coefficient is *A* = [1.1, 1.3] and the code error rates of the uplink and downlink channels are *ps* = *pc* = [0.1, 0.2]; the specific values are marked on the curves obtained from the simulation. The initial state of the plant is *X*<sup>0</sup> = 1, and the plant noise follows the normal distribution N(*z*¯ = 0, *R* = 1). The command control coefficient is *B* = −*A*. The initial scheduling state is *s*<sup>0</sup> = (*a*<sup>0</sup>, *D*<sup>0</sup>, *τ*<sup>0</sup>, *ϕ*<sup>0</sup>) = (1, 1, 2, 2), and the corresponding initial scheduling action is *a*<sup>0</sup> = 1. The initial state of the controller estimate is *X*˜<sup>0</sup> = 1. The range of the truncated state space is *N* = max{*τ*, *ϕ*} = 20. Each strategy runs 500 episodes with 10,000 time slots per episode, and the final long-term average plant state MSE is the average of the results over the 500 episodes.
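The averaging procedure can be sketched with a deliberately simplified scalar plant. This is an illustrative assumption rather than the paper's exact simulator: the uplink side and the estimator are omitted, and a successful downlink is assumed to cancel the state via *u* = −*Ax*.

```python
import random

def long_term_mse(policy, A=1.3, pc=0.1, episodes=500, slots=10_000):
    """Monte Carlo estimate of the long-term average plant state MSE.

    Simplified stand-in for the setup in the text: a scalar plant
    x_{t+1} = A*x_t + u_t + z_t with z_t ~ N(0, 1). When a scheduled
    downlink succeeds (probability 1 - pc) the control u = -A*x cancels
    the state; otherwise the plant evolves open-loop."""
    per_episode = []
    for _ in range(episodes):
        x, acc = 1.0, 0.0
        for t in range(slots):
            if policy(t) == 2 and random.random() > pc:
                x = random.gauss(0.0, 1.0)   # state cancelled, noise remains
            else:
                x = A * x + random.gauss(0.0, 1.0)
            acc += x * x
        per_episode.append(acc / slots)
    return sum(per_episode) / episodes
```

With the switch policy and *A* = 1.3, *pc* = 0.1 the average stays bounded; pushing *A* or *pc* past the stability limit makes it diverge, mirroring the behavior discussed around Equation (19).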

Figure 5 shows the long-term average MSE of the four strategies with *A* = 1.3 and *ps* = *pc* = [0.1, 0.2]. The MDP strategy, i.e., the optimal offline strategy, performs best among all strategies; it is also the best performance that any scheduling strategy can achieve. Although the performance of the neural network-based online strategy decreases slightly, it still leads the existing strategies significantly, and the gap between the optimal offline strategy and the suboptimal online strategy is very small. In theory this gap can be eliminated, but due to the limitations of current deep reinforcement learning techniques it is difficult to fully reach the optimal performance, whereas obtaining a suboptimal strategy with very close performance is comparatively simple.

**Figure 5.** Long-term average plant state MSE of four policies with *A* = 1.3 and *ps* = *pc* = [0.1, 0.2].

Figure 6 shows the performance comparison between the optimal offline strategy and the two existing strategies under different state transition coefficients *A*. The suboptimal online strategy is not shown because, as explained above, it can theoretically approach the optimum. The state transition coefficient and the channel code error rates both reflect the instability of the control system and the reliability of the communication system in Equation (19). Combined with Figure 5, it can be seen that their influence on the CPS is of the same kind: a larger state transition coefficient or a higher channel code error rate leads to an increase in the long-term average plant state MSE, and once they exceed a certain limit and no longer satisfy Equation (19), the long-term average MSE of the CPS no longer converges, which means the single-loop CPS is unstable.

**Figure 6.** Long-term average plant state MSE of three policies with *ps* = *pc* = 0.2 and *A* = {1.1, 1.2}.

#### **6. Conclusions**

We proposed the semi-predictive framework for designing scheduling strategies for single-loop CPS with uplink and downlink propagation delay. This framework yields the optimal offline strategy, which is an upper bound on the performance of all strategies, and a suboptimal online strategy with greater practical value. By adjusting its parameters, the semi-predictive framework can meet the needs of practical applications. We presented the complete process of designing scheduling strategies under this framework using a specific scenario as an example. The numerical simulation showed that the obtained strategies effectively improve on the performance of the existing strategies.

**Author Contributions:** Conceptualization, Z.A. and S.W.; methodology, Z.A. and T.L.; software, Z.A. and J.J.; validation, Z.A., S.W. and Q.Z.; formal analysis, Z.A. and T.L.; investigation, Q.Z.; resources, Z.A.; data curation, Z.A. and S.W.; writing—original draft preparation, Z.A.; writing—review and editing, Z.A.; visualization, Z.A.; supervision, Z.A.; project administration, Z.A.; funding acquisition, S.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded in part by the National Key Research and Development Program of China under Grant no. 2020YFB1806403, and in part by the National Natural Science Foundation of China under Grant nos. 61871147, 61831008, 62071141, 61371102, and in part by the Guangdong Science and Technology Planning Project under Grant no. 2018B030322004.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** Z.A., S.W., T.L., J.J. and Q.Z. would like to thank Zehua Wang, Weihao Guo, Xiao Liang, Jiabao Kang and Dongrui Li for their fruitful insights and discussions.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Construction Rules of the State Transition Probability Matrix**

Here we give the complete construction rules of the state transition matrix. Firstly, we give all the possible new states after a state transition as follows:


$$s'_5 = (2, 2, 1, q+1) \tag{A5}$$

$$s'_6 = (2, 0, 1, q+1) \tag{A6}$$

$$s'_7 = (1, 1, \tau + 1, \tau + 1) \tag{A7}$$

$$s'_8 = (2, 1, \tau + 1, \tau + 1) \tag{A8}$$

Secondly, we use *R* and *R*′ to mark the transmission results. *R* represents the result of the downlink transmission scheduled in the next time slot, and *R*′ represents the result of the uplink transmission that arrives in the next time slot. Note that *R* is known by prediction, while *R*′ is known through the normal communication process. These abbreviations help to simplify the expression of the rules.

We give the construction rules in the form *P*[*s*′|*s*, *c*] = *p*, which means that when condition *c* is satisfied, the previous state *s* transfers to the new state *s*′ with probability *p*.

*When s* = (1, 1, *τ*, *ϕ*)*:*

$$\begin{array}{c} P[s'_1 | s, a = 1] = 1 \\ P[s'_2 | s, a = 2] = 1 \end{array} \tag{A9}$$

*When s* = (2, 0, *τ*, *q*)*:*

$$\begin{array}{c} P[s'_7 | s, a = 1] = 1 \\ P[s'_8 | s, a = 2] = 1 \end{array} \tag{A10}$$

*When s* = (2, 1, *τ*, *q*)*:*

$$\begin{array}{c} P[s'_1 | s, a = 1] = 1 \\ P[s'_2 | s, a = 2] = 1 \end{array} \tag{A11}$$

*When s* = (2, 2, *τ*, *ϕ*)*:*

$$\begin{array}{l} P[s'_1 | s, a = 1] = 1 \\ P[s'_2 | s, a = 2, R = 0] = p_c \\ P[s'_2 | s, a = 2, R = 1, \tau = \varphi] = p_s \cdot (1 - p_c) \\ P[s'_3 | s, a = 2, R = 1, \tau \neq \varphi] = p_s \cdot (1 - p_c) \end{array} \tag{A12}$$

*When s* = (1, 2, *τ*, *ϕ*)*:*

$$\begin{array}{l} P[s'_1 | s, a = 1, R' = 0] = p_s \\ P[s'_2 | s, a = 2, R' = 0, R = 0] = p_s \cdot p_c \\ P[s'_2 | s, a = 2, R' = 0, R = 1, \tau = \varphi] = p_s \cdot (1 - p_c) \\ P[s'_3 | s, a = 2, R' = 0, R = 1, \tau \neq \varphi] = p_s \cdot (1 - p_c) \\ P[s'_4 | s, a = 1, R' = 1] = 1 - p_s \\ P[s'_5 | s, a = 2, R' = 1, R = 0] = (1 - p_s) \cdot p_c \\ P[s'_6 | s, a = 2, R' = 1, R = 1] = (1 - p_s) \cdot (1 - p_c) \end{array} \tag{A13}$$
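As a sanity check on rule (A13), the next-state distribution out of *s* = (1, 2, *τ*, *ϕ*) can be written out and verified to sum to one. The sketch below is our own; the labels `s1`…`s6` stand for the symbolic new states *s*′<sub>1</sub>…*s*′<sub>6</sub>, and the function name is hypothetical.

```python
def transitions_from_12(action, ps, pc, tau_eq_phi):
    """Next-state distribution out of s = (1, 2, tau, phi) per rule (A13).

    ps / pc are the uplink / downlink code error rates; tau_eq_phi says
    whether tau == phi, which decides between s'_2 and s'_3."""
    if action == 1:                         # keep scheduling the uplink only
        return {"s1": ps, "s4": 1.0 - ps}
    # action == 2: the downlink is scheduled as well
    dist = {"s2": ps * pc,                  # R' = 0, R = 0
            "s5": (1.0 - ps) * pc,          # R' = 1, R = 0
            "s6": (1.0 - ps) * (1.0 - pc)}  # R' = 1, R = 1
    key = "s2" if tau_eq_phi else "s3"      # R' = 0, R = 1
    dist[key] = dist.get(key, 0.0) + ps * (1.0 - pc)
    return dist
```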

#### **Appendix B. Proof of Theorem 1**

*Appendix B.1. Scheduling 1 Subsystem per Time Slot without Delay*

To prove the sufficient condition, we only need to show that there exists a stationary deterministic strategy that keeps the multi-loop CPS stable. Here we prove that the round-robin insist scheduling strategy keeps the system stable, starting with the case *L* = 1. Round-robin means that in every *K* time slots the controller schedules each subsystem once in turn, with the scheduling sequence fixed from *i* = 1 to *i* = *K*. Insist means that when scheduling a subsystem, the controller continuously schedules the uplink or downlink transmission until it succeeds and then switches to the other transmission. The actions of a single subsystem under the round-robin insist scheduling strategy can therefore be depicted on the time axis shown in Figure A1.

The interval between two consecutive successful downlink transmissions is recorded as a control loop; Figure A1 shows the AoI evolution process over one control loop of a subsystem.


Note that the time slots included in a complete control loop are those marked in red on the coordinate axis in Figure A1; that is, the control age ranges from *n*′*K* to (*n*′ + *m* + *n*)*K*. Each control loop is statistically repetitive, so we only need to prove that the long-term average cost within one control loop converges.

**Figure A1.** Time axis of one control loop under the round-robin insist scheduling strategy; the time slots of a complete control loop are marked in red.

According to the channel error probabilities, the *M* uplink transmissions and *N* downlink transmissions in each control loop can be modeled as geometrically distributed random variables with success probabilities (1 − *ps*) and (1 − *pc*), respectively. *M* and *N* differ from loop to loop, and *N*′ denotes the number of downlink transmissions in the loop preceding the current control loop; (*n*′, *m*, *n*) are their specific observations. *Ci* and *Ti* denote the total cost and total duration of the *i*-th control loop of the current subsystem, respectively:

$$C_i = \sum_{q=0}^{(m+n)K-1} f(n'K + q) = \sum_{q=1}^{(m+n)K} f(n'K + q - 1) \tag{A14}$$

$$T\_i = (m+n)K \tag{A15}$$

where $f(\varphi) = \sum_{q=1}^{\varphi} (A^2)^{q-1}$. Next, we can express the long-term average cost as:

$$J = \lim_{t \to \infty} \frac{C_1 + C_2 + \dots + C_t}{T_1 + T_2 + \dots + T_t} = \frac{\mathbb{E}[C]}{\mathbb{E}[T]} \tag{A16}$$

$$\mathbb{E}[C] = \sum_{n'}^{\infty} \sum_{m}^{\infty} \sum_{n}^{\infty} \Big( \mathbb{E}[C \mid N'=n', M=m, N=n] \cdot \mathbb{P}[N'=n', M=m, N=n] \Big) \tag{A17}$$

$$\mathbb{E}[T] = \sum_{n'}^{\infty} \sum_{m}^{\infty} \sum_{n}^{\infty} \Big( (m+n) \cdot K \cdot \mathbb{P}[N'=n', M=m, N=n] \Big) \tag{A18}$$

It can be seen that if E[*C*] is bounded, then *J* is bounded. According to the definition of *Ci* and the mutual independence of the three geometric random variables (*N*′, *M*, *N*), we have:

$$\mathbb{E}[C \mid N'=n', M=m, N=n] = \sum_{q=0}^{(m+n)K-1} f(n'K + q) = \sum_{q=1}^{(m+n)K} f(n'K + q - 1) \tag{A19}$$

$$\begin{aligned} \mathbb{P}[N'=n', M=m, N=n] &= \mathbb{P}[N'=n'] \cdot \mathbb{P}[M=m] \cdot \mathbb{P}[N=n] \\ &= (1-p_c)p_c^{n'-1}(1-p_s)p_s^{m-1}(1-p_c)p_c^{n-1} \end{aligned} \tag{A20}$$
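Equation (A20) is easy to verify numerically: the product of the three geometric pmfs sums (over a generous truncation) to one. The helper below is our own sketch of that factorization.

```python
def joint_pmf(n_prev, m, n, ps, pc):
    """P[N' = n', M = m, N = n] per (A20): three independent geometric
    random variables with success probabilities (1-pc), (1-ps), (1-pc)."""
    return ((1.0 - pc) * pc ** (n_prev - 1)
            * (1.0 - ps) * ps ** (m - 1)
            * (1.0 - pc) * pc ** (n - 1))
```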

Choosing $p_{\max} = \max\{p_s, p_c\}$, we can derive that:

$$\mathbb{E}[C] \leq \alpha_1 \cdot \sum_{n'}^{\infty} \sum_{m}^{\infty} \sum_{n}^{\infty} \left( \sum_{q=1}^{(m+n)K} f(n'K + q - 1) \cdot p_{\max}^{\,n'+m+n} \right) \tag{A21}$$

where $\alpha_1 = (1-p_c)p_c^{-1}(1-p_s)p_s^{-1}(1-p_c)p_c^{-1}$. Since $f(\cdot)$ is a strictly increasing function and $(n', m, n)$ are all greater than 0, we can derive that:

$$\mathbb{E}[C] < \alpha_2 \cdot \sum_{n'}^{\infty} \sum_{m}^{\infty} \sum_{n}^{\infty} \left( (n'+m+n) \cdot f(n'K + mK + nK) \cdot p_{\max}^{\,n'+m+n} \right) \tag{A22}$$

where $\alpha_2 = K(1-p_c)p_c^{-1}(1-p_s)p_s^{-1}(1-p_c)p_c^{-1}$. We abbreviate $n'+m+n$ as $i$, that is, $i = n'+m+n$, and note that $i \geq 3$. When $i$ is fixed, the number of possible combinations of $(n', m, n)$ with $n', m, n \geq 1$ satisfies $\sum_{n'}\sum_{m}\sum_{n} 1 < (n'+m+n)^3$, where the sums run over those combinations; namely:

$$\sum_{n'}\sum_{m}\sum_{n} \left( n'+m+n \right) < (n'+m+n)^3 \cdot (n'+m+n) \tag{A23}$$

$$\sum_{n'}\sum_{m}\sum_{n} \left( i \right) < i^4 \tag{A24}$$

We can derive that:

$$\mathbb{E}[C] < \alpha_2 \cdot \sum_{i}^{\infty} \left( i^4 \cdot f(iK) \cdot p_{\max}^{\,i} \right) \tag{A25}$$

Since for any $p > p_{\max}$ there exists $i_0 < \infty$ such that $i^4\, p_{\max}^{\,i} < p^{i}$ for all $i > i_0$, we have:

$$\sum_{i}^{\infty} \left( i^4 \cdot f(iK) \cdot p_{\max}^{\,i} \right) < \sum_{i}^{\infty} \left( f(iK) \cdot p^{i} \right) \tag{A26}$$

So if $\sum_{i}^{\infty} f(iK) \cdot p^{i} < \infty$, then $\sum_{i}^{\infty} i^4 \cdot f(iK) \cdot p_{\max}^{\,i} < \infty$. Finding the conditions for the stability of the multi-loop CPS subsystem is thus transformed into finding the conditions under which $\sum_{i}^{\infty} f(iK) \cdot p^{i} < \infty$. For $f(iK)$, we have:

$$f(iK) = \sum_{q=1}^{iK} \left(A^2\right)^{q-1} = 1 + A^2 + A^4 + \dots + A^{2(iK-1)} = \frac{1 - \left(A^2\right)^{iK}}{1 - A^2} \tag{A27}$$

For $\sum_{i}^{\infty} f(iK) \cdot p^{i}$, we have:

$$\begin{split} \sum_{i}^{\infty} \left( f(iK) \cdot p^{i} \right) &= \sum_{i}^{\infty} \left( \frac{1 - \left(A^2\right)^{iK}}{1 - A^2} \cdot p^{i} \right) = \frac{1}{1 - A^2} \sum_{i}^{\infty} \left( \left(1 - \left(A^2\right)^{iK}\right) \cdot p^{i} \right) \\ &= \frac{1}{1 - A^2} \left( \sum_{i}^{\infty} \left( p^{i} - \left(A^2\right)^{iK} p^{i} \right) \right) = \frac{1}{1 - A^2} \left( \sum_{i}^{\infty} p^{i} - \sum_{i}^{\infty} \left(A^2\right)^{iK} p^{i} \right) \end{split} \tag{A28}$$

So in order to ensure that $\frac{1}{1-A^2}\left(\sum_{i}^{\infty} p^{i} - \sum_{i}^{\infty} \left(A^2\right)^{iK} p^{i}\right) < \infty$, it is clear that $p < 1$ and $A^{2K} p < 1$ must hold, that is, $p < \left(\frac{1}{A^2}\right)^{K}$. This completes the proof.
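The condition $p < (1/A^2)^K$ can also be checked numerically by truncating the series: partial sums stabilize when $A^{2K} p < 1$ and keep growing otherwise. The following sketch is our own illustration.

```python
def f(phi, A):
    """f(phi) = sum_{q=1}^{phi} (A^2)^(q-1) = ((A^2)^phi - 1) / (A^2 - 1)."""
    return ((A * A) ** phi - 1.0) / (A * A - 1.0)

def partial_sum(A, K, p, terms):
    """Truncated sum_{i=1}^{terms} f(iK) * p^i from (A28); the full series
    converges iff p < 1 and A^(2K) * p < 1, i.e., p < (1/A^2)^K."""
    return sum(f(i * K, A) * p ** i for i in range(1, terms + 1))
```

For example, with $A = 1.1$ and $K = 2$ the threshold is $(1/A^2)^K \approx 0.683$: taking $p = 0.5$ the partial sums settle, while taking $p = 0.8$ they grow without bound.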

#### *Appendix B.2. Scheduling L Subsystems per Time Slot without Delay*

When *L* subsystems can be scheduled in each time slot, the corresponding strategy can be set to *L* independent round-robin insist scheduling strategies. This ensures that the round-robin cycle of each subsystem does not exceed *K*/*L*, and the rest of the proof is consistent with Appendix B.1.

#### *Appendix B.3. Scheduling L Subsystems per Time Slot with Delay*

For a specific subsystem, we assume that the fixed delay of each transmission is *Di* frames, which is equivalent to delaying the reception of each uplink and downlink transmission in the control loop by *DiK* time slots, so formula (A19) is modified as follows:

$$\begin{aligned} \mathbb{E}[C \mid N'=n', M=m, N=n] &= \sum_{q=0}^{(m+n+2D)K-1} f(n'K + DK + q) \\ &= \sum_{q=1}^{(m+n+2D)K} f(n'K + DK + q - 1) \end{aligned} \tag{A29}$$

Since E[*D*] = E[*Di*] = *D* is a constant which has no effect on the subsequent proof, the proof process is consistent with Appendix B.1.
