*4.6. Optimization Problem Formulation*

Since ∑<sub>*t*′=1</sub><sup>*t*</sup> E[*r*(*S*<sub>*t*′</sub>, *A*<sub>*t*′</sub>)] is the expected number of energy outages up to time *t*, the average energy outage probability *ζ<sup>E</sup>* can be defined as

$$\zeta^E = \limsup\_{t \to \infty} \frac{1}{t} \sum\_{t'=1}^{t} \mathbb{E}\left[r\left(S\_{t'}, A\_{t'}\right)\right],\tag{46}$$

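where lim sup denotes the limit superior: the limit, as *t* → ∞, of the supremum (i.e., the least upper bound) of the running average over all times after *t*. Unlike an ordinary limit, it exists for any bounded sequence, even when the running average oscillates.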
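To illustrate why (46) uses lim sup rather than an ordinary limit, the sketch below builds a hypothetical 0/1 outage sequence whose running average never converges: block *k* has length 2<sup>*k*</sup> and consists entirely of outages when *k* is even. The running average keeps swinging between about 1/3 and 2/3, so the ordinary limit does not exist, while the lim sup is 2/3. The sequence and the function name `block_end_averages` are illustrative assumptions, not part of the model. When the state space is finite and the policy is stationary, the time average of the expected costs does converge, and the lim sup coincides with the ordinary limit.

```python
from fractions import Fraction


def block_end_averages(num_blocks: int) -> list[Fraction]:
    """Running average (1/t) * sum_{t'=1}^{t} r_{t'} of a hypothetical outage
    indicator, sampled at the end of each block. Block k has length 2**k;
    r_t = 1 in even blocks and 0 in odd blocks."""
    ones, t, averages = 0, 0, []
    for k in range(num_blocks):
        length = 2 ** k
        if k % 2 == 0:
            ones += length  # every slot of an even block is an outage
        t += length
        averages.append(Fraction(ones, t))  # exact rational running average
    return averages


avgs = block_end_averages(30)
# The average rises inside even blocks and falls inside odd blocks, so the
# block ends are exactly its local maxima and minima.
print([round(float(a), 3) for a in avgs[-4:]])  # [0.667, 0.333, 0.667, 0.333]
```

The local maxima approach 2/3 (the lim sup) and the local minima equal 1/3 (the lim inf), so only the lim sup gives a single well-defined value for such a sequence.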

Meanwhile, the average timer expiration probabilities of IoT devices *i* and *j*, denoted as *ξ<sub>i</sub><sup>T</sup>* and *ξ<sub>j</sub><sup>T</sup>*, respectively, can be defined as

$$\xi\_i^T = \limsup\_{t \to \infty} \frac{1}{t} \sum\_{t'=1}^{t} \mathbb{E}\left[c\_i\left(S\_{t'}, A\_{t'}\right)\right], \tag{47}$$

and

$$\xi\_j^T = \limsup\_{t \to \infty} \frac{1}{t} \sum\_{t'=1}^{t} \mathbb{E}\left[c\_j\left(S\_{t'}, A\_{t'}\right)\right]. \tag{48}$$
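When the state space is finite and the Markov chain induced by a stationary policy is ergodic, each average in (46)–(48) equals ∑<sub>*S*</sub>∑<sub>*A*</sub> *φ*(*S*, *A*) times the corresponding cost, where *φ*(*S*, *A*) is the stationary state–action probability used in the LP formulation that follows. The sketch below checks this numerically on a hypothetical 3-state, 2-action toy model; the transition probabilities `P`, the 0/1 indicators `r`, `c_i`, `c_j`, and the uniform policy `pi` are illustrative assumptions, not values from this paper.

```python
import numpy as np

# Hypothetical toy model: 3 states, 2 actions (0 = process locally, 1 = offload).
# P[s, a, s'] = P[s' | s, a]; every row sums to 1.
P = np.array([
    [[0.8, 0.2, 0.0], [0.5, 0.3, 0.2]],
    [[0.3, 0.5, 0.2], [0.1, 0.6, 0.3]],
    [[0.0, 0.4, 0.6], [0.2, 0.3, 0.5]],
])
r = np.array([[1, 1], [0, 0], [0, 1]], dtype=float)    # energy outage indicator r(S, A)
c_i = np.array([[0, 0], [1, 0], [0, 0]], dtype=float)  # timer expiration of device i
c_j = np.array([[0, 1], [0, 0], [1, 0]], dtype=float)  # timer expiration of device j
pi = np.full((3, 2), 0.5)                              # fixed randomized policy pi(S, A)


def simulate(P, pi, horizon, seed=0):
    """Empirical state-action frequencies along one simulated trajectory."""
    rng = np.random.default_rng(seed)
    n_s, n_a, _ = P.shape
    pi_cdf = np.cumsum(pi, axis=1)
    pi_cdf[:, -1] = 1.0            # guard against round-off in the last bin
    p_cdf = np.cumsum(P, axis=2)
    p_cdf[:, :, -1] = 1.0
    counts = np.zeros((n_s, n_a))
    s = 0
    for u_a, u_s in rng.random((horizon, 2)):
        a = int(np.searchsorted(pi_cdf[s], u_a, side="right"))    # A_t ~ pi(S_t, .)
        counts[s, a] += 1
        s = int(np.searchsorted(p_cdf[s, a], u_s, side="right"))  # S_{t+1} ~ P[. | S_t, A_t]
    return counts / horizon


def stationary_occupancy(P, pi):
    """Exact phi(S, A) = mu(S) * pi(S, A), where mu is stationary for P_pi."""
    n_s = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state chain under pi
    # Stack mu (P_pi - I) = 0 with sum(mu) = 1 and solve by least squares.
    A = np.vstack([P_pi.T - np.eye(n_s), np.ones(n_s)])
    b = np.zeros(n_s + 1)
    b[-1] = 1.0
    mu = np.linalg.lstsq(A, b, rcond=None)[0]
    return mu[:, None] * pi


phi_hat = simulate(P, pi, horizon=100_000)
phi = stationary_occupancy(P, pi)
for name, cost in [("zeta_E", r), ("xi_i", c_i), ("xi_j", c_j)]:
    print(name, round(float((phi_hat * cost).sum()), 3), round(float((phi * cost).sum()), 3))
```

The simulated time averages (left column) should agree with the stationary sums (right column) up to Monte Carlo error, which is the identity the LP relies on when it replaces (46)–(48) with linear functions of *φ*(*S*, *A*).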

Then, the optimization problem in the CMDP model can be formulated as

$$\min\_{\pi} \zeta^E \tag{49}$$

$$\text{s.t. } \xi\_i^T \le \theta\_i^T \text{ and } \xi\_j^T \le \theta\_j^T,\tag{50}$$

where *θ<sub>i</sub><sup>T</sup>* and *θ<sub>j</sub><sup>T</sup>* are the upper limits on the timer expiration probabilities of IoT devices *i* and *j*, respectively.

The formulated optimization problem can be transformed into an equivalent LP problem [28]. Specifically, let *φ*(*S*, *A*) denote the stationary probability of being in state *S* and taking action *A*; the optimal LP solution *φ*∗(*S*, *A*) can then be mapped to an optimal policy of the CMDP-based optimization model. The equivalent LP model can be expressed as

$$\min\_{\phi(S,A)} \sum\_{S} \sum\_{A} \phi(S,A) \, r(S,A) , \tag{51}$$

$$\text{s.t.} \quad \sum\_{S} \sum\_{A} \phi(S, A) \, c\_i(S, A) \le \theta\_i^T, \tag{52}$$

$$\sum\_{S} \sum\_{A} \phi(S, A) \, c\_j(S, A) \le \theta\_j^T, \tag{53}$$

$$\sum\_{A} \phi(S', A) = \sum\_{S} \sum\_{A} \phi(S, A) \, P[S' \mid S, A], \quad \forall S', \tag{54}$$

$$\sum\_{S} \sum\_{A} \phi(S, A) = 1,\tag{55}$$

$$
\phi(S, A) \ge 0. \tag{56}
$$

The objective function in (51) minimizes the energy outage probability of IoT devices. The constraints in (52) and (53) keep the timer expiration probabilities of IoT devices *i* and *j* at or below *θ<sub>i</sub><sup>T</sup>* and *θ<sub>j</sub><sup>T</sup>*, respectively. The constraint in (54) is the Chapman–Kolmogorov (balance) equation, which ensures that *φ*(*S*, *A*) is a stationary distribution of the Markov chain induced by the policy. The constraints in (55) and (56) ensure that *φ*(*S*, *A*) is a valid probability distribution.
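The LP in (51)–(56) can be handed to any off-the-shelf solver once *φ*(*S*, *A*) is flattened into a vector. The sketch below uses SciPy's `linprog` with the HiGHS backend and minimizes ∑∑ *φ*(*S*, *A*) *r*(*S*, *A*), as stated for (51). The 3-state, 2-action model, the indicators, and the limits `theta_i` and `theta_j` are hypothetical placeholders; the paper's actual state and action spaces are far larger, but the constraint structure is the same.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy model (3 states, 2 actions); P[s, a, s'] = P[s' | s, a].
P = np.array([
    [[0.8, 0.2, 0.0], [0.5, 0.3, 0.2]],
    [[0.3, 0.5, 0.2], [0.1, 0.6, 0.3]],
    [[0.0, 0.4, 0.6], [0.2, 0.3, 0.5]],
])
r = np.array([[1, 1], [0, 0], [0, 1]], dtype=float)    # energy outage indicator r(S, A)
c_i = np.array([[0, 0], [1, 0], [0, 0]], dtype=float)  # timer expiration of device i
c_j = np.array([[0, 1], [0, 0], [1, 0]], dtype=float)  # timer expiration of device j
theta_i, theta_j = 0.25, 0.35                          # hypothetical limits


def solve_cmdp_lp(P, r, c_i, c_j, theta_i, theta_j):
    """Solve (51)-(56) over flattened variables phi[s * n_a + a]."""
    n_s, n_a, _ = P.shape
    # (52)-(53): expected timer-expiration costs must not exceed the limits.
    A_ub = np.vstack([c_i.ravel(), c_j.ravel()])
    b_ub = np.array([theta_i, theta_j])
    # (54): for each S', sum_A phi(S', A) - sum_{S, A} phi(S, A) P[S' | S, A] = 0.
    same_state = np.repeat(np.eye(n_s), n_a, axis=0)  # row (s, a) is the unit vector e_s
    A_bal = (same_state - P.reshape(n_s * n_a, n_s)).T
    # (55): phi sums to one.
    A_eq = np.vstack([A_bal, np.ones(n_s * n_a)])
    b_eq = np.append(np.zeros(n_s), 1.0)
    # (56): phi >= 0 is expressed through the variable bounds.
    res = linprog(r.ravel(), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        return None, res  # e.g., the limits cannot be met
    return res.x.reshape(n_s, n_a), res


phi_star, res = solve_cmdp_lp(P, r, c_i, c_j, theta_i, theta_j)
print("minimum energy outage probability:", round(res.fun, 4))
print("phi*:\n", phi_star.round(4))
```

The balance rows in (54) are linearly dependent (they sum to zero), which HiGHS handles in presolve; in this sketch, `solve_cmdp_lp` returns `None` for `phi_star` whenever the solver reports that the thresholds cannot be satisfied.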

The optimal policy *π*∗(*S*, *A*), i.e., the probability of taking action *A* in state *S*, is obtained from the LP solution *φ*∗(*S*, *A*) as

$$\pi^\*(S, A) = \frac{\phi^\*(S, A)}{\sum\_{A'} \phi^\*(S, A')} \quad \text{for } S \in \mathbf{S} \text{ with } \sum\_{A'} \phi^\*(S, A') > 0. \tag{57}$$

Note that if ∑<sub>*A*</sub> *φ*∗(*S*, *A*) = 0, state *S* has zero stationary probability under *φ*∗ and (57) is undefined for that state; in this case, IoT devices do not offload any task. Since an LP problem can be solved in polynomial time (e.g., by interior-point methods) [35–37], the proposed algorithm can be implemented in real systems without requiring high computational power.
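Equation (57) and the zero-mass rule above can be sketched as a small post-processing step on *φ*∗. The array layout `phi_star[s, a]`, the action index `NO_OFFLOAD = 0` for "do not offload any task", and the helper names are hypothetical choices for illustration; in this sketch the resulting policy is a lookup table, so choosing an action at run time needs only one row lookup and one random draw.

```python
import numpy as np

NO_OFFLOAD = 0  # hypothetical index of the "do not offload any task" action


def extract_policy(phi_star, default_action=NO_OFFLOAD, tol=1e-12):
    """Map phi*(S, A) to pi*(S, A) via (57); zero-mass states get the default action."""
    phi_star = np.clip(np.asarray(phi_star, dtype=float), 0.0, None)  # drop tiny negative solver noise
    state_mass = phi_star.sum(axis=1, keepdims=True)  # sum_{A'} phi*(S, A')
    pi = np.zeros_like(phi_star)
    visited = state_mass[:, 0] > tol
    pi[visited] = phi_star[visited] / state_mass[visited]  # (57)
    pi[~visited, default_action] = 1.0                     # fallback: no offloading
    return pi


def sample_action(pi, state, rng):
    """Draw A ~ pi*(state, .) at run time."""
    return int(rng.choice(pi.shape[1], p=pi[state]))


# Hypothetical LP solution: 4 states, 2 actions; state 2 is never visited.
phi_star = np.array([[0.20, 0.10],
                     [0.00, 0.45],
                     [0.00, 0.00],
                     [0.25, 0.00]])
pi_star = extract_policy(phi_star)
print(pi_star)
print(sample_action(pi_star, 1, np.random.default_rng(0)))  # 1
```

Here state 0 keeps a randomized policy (2/3 and 1/3), as CMDP-optimal policies may randomize, while state 2 falls back to `NO_OFFLOAD`.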
