Article

Joint Latency-Oriented, Energy Consumption, and Carbon Emission for a Space–Air–Ground Integrated Network with Newly Designed Power Technology

School of Electronic-Electrical Engineering, Ningxia University, Yinchuan 750021, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(17), 3537; https://doi.org/10.3390/electronics12173537
Submission received: 30 June 2023 / Revised: 6 August 2023 / Accepted: 7 August 2023 / Published: 22 August 2023
(This article belongs to the Special Issue Emerging and New Technologies in Mobile Edge Computing Networks)

Abstract
Ubiquitous connectivity is envisaged for the space–air–ground integrated network (SAGIN) of future communication to meet the needs of quality of service (QoS), green communication, and “dual carbon” targets. However, the offloading and computation of massive latency-sensitive tasks dramatically increase the energy consumption of the network. To address these issues, we first propose a SAGIN architecture with energy-harvesting devices, where the base station (BS) is powered by both renewable energy (RE) and the conventional grid. The BS exploits wireless power transfer (WPT) technology to power an unmanned aerial vehicle (UAV) for stable network operation. RE sharing between neighboring BSs is designed to fully utilize RE to reduce carbon emissions. Secondly, on the basis of the task offloading decisions, the UAV trajectory, and the RE sharing ratio, we construct a cost function that jointly considers latency, energy consumption, and carbon emissions. Then, we develop a twin delayed deep deterministic policy gradient (TD3PG) algorithm based on deep reinforcement learning to minimize the cost function. Finally, simulation results demonstrate that the proposed algorithm outperforms the benchmark algorithms in terms of reducing latency, saving energy, and lowering carbon emissions.

1. Introduction

To accommodate the diverse quality-of-service (QoS) requirements of a wide range of applications in various scenarios, sixth-generation (6G) communication networks are envisioned to offer low latency and ubiquitous connectivity [1]. They are also expected to provide simultaneous wireless connectivity 1000 times higher than that provided by fifth-generation (5G) networks [2]. In particular, a space–air–ground integrated network (SAGIN) is considered a promising paradigm for future communication to provide large-scale coverage and network performance enhancement [3].
On the other hand, mobile edge computing (MEC) is considered a key technology for future communication networks serving latency-sensitive tasks [4,5], addressing the requirement for massive low-latency control (less than 1 ms end-to-end latency) in 6G networks [6]. However, the execution of tasks consumes a considerable amount of energy. Green communication in 6G should be manifested by reducing the total energy consumption in order to achieve an energy efficiency goal [7]. In addition, the dual carbon goal of future communication networks is to reduce carbon emissions by 50% [8]. Therefore, a SAGIN design that jointly targets latency to guarantee QoS, energy consumption for green communication, and carbon reduction to achieve the dual carbon goal meets the needs of future communication network development [9,10].
Several studies have investigated how to reduce network latency in order to guarantee QoS. Peng et al. [11] considered two typical MEC architectures in a vehicular network and formulated multidimensional resource optimization problems, exploiting reinforcement learning (RL) to obtain the appropriate resource allocation decisions to achieve high delay/QoS satisfaction ratios. By jointly optimizing computation offloading decisions and computation resource allocation, Zhao et al. [12] effectively improved the system utility and computation time. Abderrahim et al. [13] proposed an offloading scheme that improves network availability while reducing the transmission delay of ultra-reliable low-latency communication packets in the terrestrial backhaul, satisfying different QoS requirements.
A few other studies have considered the issue of energy consumption in networks. The authors of [14,15] considered a multiuser MEC system with the goal of minimizing the total system energy consumption over a finite time horizon. Chen et al. [16] investigated the task offloading problem in an ultra-dense network and formulated it as a mixed-integer nonlinear NP-hard program, which can reduce 20% of the task duration with 30% energy savings compared with random and uniform task-offloading schemes.
Some scholars have concentrated their attention on latency and energy consumption in the SAGIN architecture. Guo et al. [17] investigated service coordination to guarantee QoS in SAGIN service computing and proposed an orchestration approach to achieve a low-cost reduction in service delay. To achieve a better latency/QoS for different 5G slices with heterogeneous requirements, Zhang et al. [18] intelligently offloaded traffic to the appropriate segment of the SAGIN. In order to ensure QoS for airborne users, as well as to reduce energy consumption, Chen et al. [19] designed an energy-efficient data-saving scheme, with extensive simulations confirming the effectiveness of the proposed scheme in terms of both energy consumption and processing delay in a SAGIN.
Although the aforementioned studies involved considerable exploration of latency and energy consumption reduction, the research reported in [20] highlights that the dense deployment of high-energy-demand communication devices corresponds to increased energy consumption in wireless access networks. Moreover, future networks must have reduced carbon emissions. Fortunately, energy harvesting technology converts ambient energy to electric energy, which can help reduce the carbon footprint of wireless networks [21]. In particular, a green approach to powering cellular base stations (BSs) has been proposed, which involves the adoption of renewable energy (RE) resources [22].
Consequently, the subject of utilizing RE to power BSs in order to reduce the carbon footprint of networks has attracted academic attention. Yuan et al. [23] proposed an energy-storage-assisted RE supply solution to power a BS, in which a deep reinforcement learning (DRL)-based regulating policy is utilized to flexibly regulate the battery’s discharging/charging. On the basis of off-grid cellular BSs powered by integrated RE, Jahid et al. [24] formulated a hybrid energy cooperation framework that optimally determines the quantities of RE exchanged among BSs to minimize both related costs and greenhouse gas emissions. Despite their advantages, RE harvesting technologies are still variable and intermittent compared to traditional grid power. An aggregate technology that combines RE with traditional grid power is the most promising option to reliably power cellular infrastructure [25].
However, existing research efforts are limited to one or two aspects of the network in terms of reducing latency, saving energy, or accessing new power supply technology. Relatively little attention has been paid to an entire SAGIN with integrated energy access technology. Apart from the issues considered above, there are still a number of challenges associated with a SAGIN. First of all, the long-term performance of the network needs to be considered, as the arrival, transmission, and processing of tasks represent a stochastic and dynamic process over a period of time. DRL, as an artificial intelligence technique, can address such stochastic, complex, and dynamic problems. Secondly, given that future communication is likely to comprise a cellular network capable of providing energy [26], it is extremely critical to power an unmanned aerial vehicle (UAV) in order to maintain the sustainability of the network. Fortunately, radio-signal-based wireless power transfer (WPT) can provide battery-equipped devices with a sustainable energy supply. For instance, the authors of [27,28,29,30] considered a wirelessly powered multiaccess network wherein a hybrid access point powers wireless devices via WPT.
Motivated by the above limitations and challenges, we examine two issues in SAGINs. One is how to combine latency, energy consumption, and carbon emissions to model the objective function in a SAGIN with new power access technology. The other is how the relevant influencing factors affect latency, energy consumption, and carbon emissions.
In this article, we first develop a SAGIN with a newly designed power supply technology. In order to maintain the sustainability of the network, the BS explores the WPT to power the UAV. The UAV is able to dynamically adjust its trajectory to cope with task processing and charging. The combination of latency, energy consumption, and carbon emissions is formulated as the cost function. Obviously, our research target is to minimize the cost function.
The main contributions of this article can be summarized as follows:
  • We propose an architecture for a SAGIN with a newly designed power supply technology. This network offers ubiquitous connectivity while adapting to the requirements of high reliability and green communication.
  • We develop a cost function that jointly considers latency, energy consumption, and carbon emissions, which facilitates decreases in task processing latency, energy consumption, and the carbon footprint of the network through optimization.
  • We put forward a twin delayed deep deterministic policy gradient (TD3PG) method. This DRL-based algorithm is capable of sensing parameter changes in the network and dynamically updates the offloading decision to minimize the cost function.
  • We conduct experimental evaluations and comparisons. The proposed algorithm is compared with three benchmark algorithms. The results show that the proposed algorithm is outstanding in terms of reducing task latency, saving network energy, and lowering carbon emissions.
The rest of this article is organized as follows. In Section 2, the system model with SAGIN is presented. Section 3 provides the problem formulation. In Section 4, a TD3PG algorithm is designed to obtain the optimal values. Section 5 demonstrates the simulation results and performance analysis. Finally, Section 6 is devoted to the conclusions.

2. System Model

A SAGIN that consists of M ground user devices, N BSs, K UAVs, and a low-earth-orbit (LEO) satellite is considered.
As shown in Figure 1, each BS, each UAV, and the LEO satellite is separately equipped with a server to provide the devices with computation services. Devices are randomly distributed on the ground, satisfying m ∈ M = {1, 2, …, M}. BSs are deployed on buildings with a height of H_BS and satisfy n ∈ N = {1, 2, …, N}. The flight altitude of UAV k is fixed at H. The communication range covers a preset area around the UAV, which satisfies k ∈ K = {1, 2, …, K}. The set of time slots is indicated as t ∈ T = {1, 2, …, T}, and the size of each time slot is τ. The UAV acts as a mobile BS and does not take on the function of a relay. The communication coverage of the UAV is smaller than that of the BS, and the communication range of the BS is smaller than that of the LEO satellite. Since one LEO satellite can sufficiently cover the area of the scenario, it is assumed that a LEO satellite is available within the communication coverage at all times. The meteorological environment remains constant. For ease of understanding, the key symbols used in this article are listed in Table 1.
Each BS is independently equipped with solar panels, a wind turbine, a backup battery, and an electricity converter to provide uninterrupted energy supply. Adjacent BSs are able to share energy through an energy-sharing line. In order to maintain the stable operation of the UAV, the BS is able to supply the UAV with energy via WPT.
Given that virtual reality, augmented reality, and mixed reality are typical applications based on 6G [31], it is assumed that each device has one total task that can be divided into a number of subtasks. The subtask of device m can be represented as ⟨s_m(t), e_m(t), t_m(t)⟩, where s_m(t) and e_m(t) denote the subtask's size (bits) and the required number of central processing unit (CPU) cycles per unit of data (cycles/bit) in time slot t, respectively, and t_m(t) indicates the latency constraint for execution of the subtask, which covers the latency for the transmission, computation, and waiting of the subtask. Subtasks follow a Poisson distribution with an arrival rate of λ_m(t).
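As an illustration of this task model, the slot-wise subtask arrivals can be simulated with a short Python sketch; the class and function names, and the Knuth-style Poisson sampler, are illustrative assumptions rather than part of the paper's formulation.

```python
import random
from dataclasses import dataclass

@dataclass
class Subtask:
    size_bits: float       # s_m(t): subtask size in bits
    cycles_per_bit: float  # e_m(t): CPU cycles required per bit of data
    deadline_s: float      # t_m(t): latency constraint (transmission + computation + waiting)

def generate_subtasks(arrival_rate, size_bits, cycles_per_bit, deadline_s, rng=random):
    """Sample the number of subtasks arriving at device m in one slot from a
    Poisson distribution with rate lambda_m(t) (Knuth's method), then build them."""
    threshold = 2.718281828459045 ** (-arrival_rate)  # e^{-lambda}
    count, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            break
        count += 1
    return [Subtask(size_bits, cycles_per_bit, deadline_s) for _ in range(count)]
```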
The relevant system models in SAGIN are enumerated below.

2.1. Backup Battery Model for BS

L_n(t) denotes the electrical load for BS n in time slot t, which is related to the communication traffic T_n(t) [32]. Hence, L_n(t) can be expressed as
$$L_n(t) = \delta_0 + \delta_1 T_n(t),$$
where δ_0 and δ_1 are coefficients.
The BS backup battery is required to reserve capacity b_n^res(t) to meet the energy consumption of the electrical load of BS n during the fixed standby time D_n^res(t) in time slot t. b_n^res(t) can be represented as
$$b_n^{res}(t) = \int_0^{D_n^{res}(t)} L_n(\tau)\,\mathrm{d}\tau .$$
During time slot t, the RE harvested by the N BS backup batteries is H(t) = [h_1(t), h_2(t), …, h_N(t)]. Energy sharing with adjacent BSs is denoted as E^share(t) = [e_1^share(t), e_2^share(t), …, e_N^share(t)]. Let B^res(t) = [b_1^res(t), b_2^res(t), …, b_N^res(t)] denote the energy reserved by the backup batteries so that the current communication traffic of each BS can be supported over the fixed standby time. E^bs(t) = [e_1^bs(t), e_2^bs(t), …, e_N^bs(t)] is the energy consumption of each BS. Suppose that the energy remaining in the backup batteries is B^bs(t) = [b_1^bs(t), b_2^bs(t), …, b_N^bs(t)]. Then, the capacity of the backup battery during the next time slot can be derived as
$$\mathbf{B}^{bs}(t+1) = \min\left\{\mathbf{E}^{grid}(t) + \mathbf{H}(t) + \mathbf{B}^{bs}(t) - \mathbf{E}^{share}(t) - \mathbf{E}^{bs}(t),\ B^{battery}\right\},$$
where B^battery is the capacity of the backup battery. E^grid(t) is the energy supplied by the power grid when RE harvesting cannot sustain the BS energy consumption, which can be constructed as
$$\mathbf{E}^{grid}(t) = \left[\mathbf{B}^{res}(t) + \mathbf{E}^{bs}(t) - \mathbf{H}(t) + \mathbf{E}^{share}(t) - \mathbf{B}^{bs}(t)\right]^{+},$$
where [x]^+ = max{x, 0}. Considering the cost and size of BS energy-harvesting equipment, the optimal unit size is finite. Battery lifetime B^life is also a factor to be considered, which is related to the number of charge/discharge cycles. In this case, the following constraints are required:
$$B^{battery} \geq \int_0^{D_n^{res}(t)} L^{max}\,\mathrm{d}\tau ,$$
$$CPR = \frac{B^{life}\, B^{battery}}{C^{battery}},$$
$$C^{battery} \leq C^{expect},$$
where L^max is the maximum electrical load for the BS. CPR is defined as the cost–performance ratio of the backup battery, which should be as large as possible. C^battery and C^expect denote the actual cost and the maximum expected cost of the backup battery, respectively.
Furthermore, adjacent BSs are able to share energy through physical resistive power line connections to maximize RE utilization. This approach may be reasonable for energy sharing among BSs within small zones [24]. The remaining energy of the RE may be shared with other BSs. Then, energy sharing ( E share ( t ) ) can be expressed as
$$\mathbf{E}^{share}(t) = \upsilon_n \cdot \left[\mathbf{H}(t) + \mathbf{B}^{bs}(t) - \mathbf{B}^{res}(t) - \mathbf{E}^{bs}(t)\right]^{+},$$
where υ_n is the RE transfer ratio with υ_n ∈ [0, 1]. When the backup batteries are fully charged, the excess RE, E^exc(t) = [e_1^exc(t), e_2^exc(t), …, e_N^exc(t)], is too small to be sold to the power grid, and adding additional energy storage devices would increase the cost. Therefore, the excess RE is discarded. The loss factor for energy transfer between BSs is η, which satisfies η ∈ [0, 1]. Thus, the energy loss E^loss(t) = [e_1^loss(t), e_2^loss(t), …, e_N^loss(t)] can be expressed as
$$\mathbf{E}^{loss}(t) = \eta\, \mathbf{E}^{share}(t) + \mathbf{E}^{exc}(t).$$
It is crucial to reduce energy loss ( E loss ( t ) ) when evaluating the reduction in carbon emissions.
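Read together, the battery update, grid draw, RE sharing, and loss expressions define one per-slot bookkeeping step. A minimal Python sketch of that step, reduced to a scalar per-BS form with illustrative names:

```python
def battery_step(b_bs, harvest, b_res, e_bs, share_ratio, b_battery, eta):
    """One-slot bookkeeping for a single BS backup battery.
    b_bs: remaining energy B^bs(t); harvest: RE h_n(t); b_res: reserved energy
    B^res(t); e_bs: consumption E^bs(t); share_ratio: upsilon_n in [0, 1];
    b_battery: battery capacity; eta: transfer loss factor in [0, 1]."""
    surplus = max(harvest + b_bs - b_res - e_bs, 0.0)
    e_share = share_ratio * surplus                              # E^share(t)
    e_grid = max(b_res + e_bs - harvest + e_share - b_bs, 0.0)   # E^grid(t)
    b_next_raw = e_grid + harvest + b_bs - e_share - e_bs        # pre-cap B^bs(t+1)
    e_exc = max(b_next_raw - b_battery, 0.0)                     # discarded excess RE
    b_next = min(b_next_raw, b_battery)                          # B^bs(t+1)
    e_loss = eta * e_share + e_exc                               # E^loss(t)
    return b_next, e_grid, e_share, e_loss
```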

2.2. Communication Model

Each subtask can be offloaded to a BS, UAV, or the LEO satellite for execution in one of these ways in time slot t. Thereafter, the task communication model is described as follows.

2.2.1. Offloading to the BSs

Let R_{m,n}(t) denote the wireless transmission rate between device m and BS n in time slot t, which can be expressed as
$$R_{m,n}(t) = B_{m,n}(t)\log_2\left(1 + \frac{p_m\, G_{m,n}(t)\, g_{m,n}(t)}{\sigma^2 + \zeta_{m,n}(t)}\right),$$
where B_{m,n}(t) represents the channel bandwidth between device m and BS n. p_m is the transmission power of device m. G_{m,n}(t) is the path loss, which is related to the distance d_{m,n}(t) between device m and BS n in time slot t. g_{m,n}(t) is the small-scale channel fading subject to Rayleigh fading. σ^2 denotes the noise power. ζ_{m,n}(t) is the interference from the other BSs.
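The rate expression can be transcribed directly; a small Python sketch, assuming all quantities are in linear units (W, Hz) rather than dB, with illustrative argument names:

```python
import math

def transmission_rate(bandwidth_hz, p_tx, path_loss, fading, noise_power, interference):
    """Shannon-type rate R_{m,n}(t): bandwidth times log2(1 + SINR), where the
    SINR combines transmit power, path loss, small-scale fading, noise power,
    and inter-cell interference. All inputs are linear-scale, not dB."""
    sinr = p_tx * path_loss * fading / (noise_power + interference)
    return bandwidth_hz * math.log2(1.0 + sinr)
```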

2.2.2. Offloading to the UAVs

If device m is in the communication coverage area of UAV k in time slot t, a_{m,k}(t) = 1 is assumed. Otherwise, a_{m,k}(t) = 0. As the altitude of the UAV is much higher than that of the devices, the line-of-sight channels of the UAV communication links are predominant [33]. The wireless channel of the UAV is considered a free-space path loss model. Therefore, the channel gain g_{m,k}(t) is β_0 (d_{m,k}(t))^{−α_0}, where β_0 denotes the channel gain at unit distance, α_0 represents the path loss exponent, and d_{m,k}(t) is the Euclidean distance between the m-th device and the k-th UAV. Hence, the transmission rate R_{m,k}(t) can be formulated as
$$R_{m,k}(t) = B_{m,k}(t)\log_2\left(1 + \frac{a_{m,k}(t)\, p_m\, g_{m,k}(t)}{\sigma^2 + \zeta_{m,k}(t)}\right),$$
where B_{m,k}(t) is the channel bandwidth, and ζ_{m,k}(t) represents the co-channel interference from all the other UAVs.

2.2.3. Offloading to the LEO Satellite

The channel conditions between the LEO satellite and ground devices are mainly affected by the communication distance and rainfall attenuation. Since the distance between the LEO satellite and the devices remains almost unchanged, a fixed transmission rate R_{m,sat}(t) between the devices and the LEO satellite is adopted. It is usually lower than the transmission rate between the devices and the UAVs [34,35].

2.3. Computational Model

The task can be executed in four ways: executed locally, offloaded to BSs, offloaded to UAVs, or offloaded to the LEO satellite. Due to the small data size of the computational result, the delay and energy consumption required to transmit the computational result can be omitted [36].

2.3.1. Local Computation

The device is capable of varying its CPU cycle frequency f_m(t) dynamically to obtain better system performance. Consequently, the latency of local task execution t_m^l(t) can be obtained as
$$t_m^{l}(t) = \frac{s_m(t)\, e_m(t)}{f_m(t)} + t_m^{l,wait}(t),$$
where t_m^{l,wait}(t) indicates the queuing delay of the task at the device. Assuming the energy consumption per CPU cycle is κ f_m^2(t), where κ is the effective switched capacitance of the CPU chip [37], the energy consumption E_m^l(t) of task execution on the device can be expressed as
$$E_m^{l}(t) = \kappa\, f_m^{2}(t)\, s_m(t)\, e_m(t).$$
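The two local-execution expressions translate into a few lines of Python; a sketch with illustrative names, where `kappa` is the switched-capacitance coefficient:

```python
def local_execution(size_bits, cycles_per_bit, cpu_freq_hz, wait_s, kappa):
    """Latency t_m^l(t) and energy E_m^l(t) of executing a subtask locally:
    computation time s*e/f plus queuing delay, and energy kappa * f^2 * s * e."""
    compute_s = size_bits * cycles_per_bit / cpu_freq_hz
    latency_s = compute_s + wait_s
    energy_j = kappa * cpu_freq_hz ** 2 * size_bits * cycles_per_bit
    return latency_s, energy_j
```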

2.3.2. Offloading to the BSs

It is assumed that tasks offloaded to the BS are executed instantly without queuing delay [30], because the BS typically has a relatively stable energy supply and a powerful multicore CPU. f_{m,n}(t) represents the computational resources allocated by the n-th BS. The task execution latency t_n^bs(t) for offloading to the BS can be expressed as
$$t_n^{bs}(t) = \frac{s_m(t)}{R_{m,n}(t)} + \frac{s_m(t)\, e_m(t)}{f_{m,n}(t)}.$$
The energy consumption E_n^bs(t) required for task execution in the BS can be obtained as
$$E_n^{bs}(t) = \delta_0 + \delta_1\, s_m(t).$$

2.3.3. Offloading to the UAVs

The task execution latency t_k^uav(t) for offloading to the UAVs can be expressed as
$$t_k^{uav}(t) = \frac{a_{m,k}(t)\, s_m(t)}{R_{m,k}(t)} + \frac{s_m(t)\, e_m(t)}{f_{m,k}(t)} + t_k^{u,wait}(t),$$
where f_{m,k}(t) denotes the computational resources allocated by the k-th UAV.
e^uav is defined as the energy consumption per unit of computational resources of the UAV. The energy consumption E_k^uav(t) required for task execution by the UAV can be expressed as
$$E_k^{uav}(t) = \frac{a_{m,k}(t)\, p_m\, s_m(t)}{R_{m,k}(t)} + e^{uav}\, s_m(t)\, e_m(t).$$

2.3.4. Offloading to the LEO Satellite

f_{m,sat}(t) represents the computational resources allocated by the LEO satellite. The task execution latency t_m^sat(t) for offloading to the LEO satellite can be expressed as
$$t_m^{sat}(t) = \frac{s_m(t)}{R_{m,sat}(t)} + \frac{s_m(t)\, e_m(t)}{f_{m,sat}(t)} + t^{s,wait}(t).$$
The energy consumption E_m^sat(t) required for task execution by the LEO satellite can be obtained as
$$E_m^{sat}(t) = \frac{p_m\, s_m(t)}{R_{m,sat}(t)} + e^{sat}\, s_m(t)\, e_m(t),$$
where e^sat is the energy consumption per unit of computational resources of the LEO satellite.
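The three remote-execution cases share the same transmission-plus-computation structure and differ in their waiting and energy terms. A per-target Python sketch with illustrative names; the BS case uses the load-proportional energy model δ_0 + δ_1 s, while the UAV and satellite cases add uplink transmit energy and per-cycle server energy:

```python
def offload_cost(target, size_bits, cycles_per_bit, rate_bps, f_server_hz,
                 wait_s=0.0, p_tx=0.0, e_per_cycle=0.0, delta0=0.0, delta1=0.0):
    """Latency and energy for one subtask offloaded to 'bs', 'uav', or 'sat'."""
    tx_s = size_bits / rate_bps                       # uplink transmission time
    cmp_s = size_bits * cycles_per_bit / f_server_hz  # server computation time
    if target == "bs":
        # BS: no queuing delay; energy follows the traffic-dependent load model.
        return tx_s + cmp_s, delta0 + delta1 * size_bits
    # UAV / satellite: waiting delay plus transmit and per-cycle server energy.
    latency = tx_s + cmp_s + wait_s
    energy = p_tx * tx_s + e_per_cycle * size_bits * cycles_per_bit
    return latency, energy
```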

2.4. WPT Model

In order to maintain the sustainability of the network, the BS explores WPT to power the UAV. Figure 2 shows the WPT principle adopted in the SAGIN. ι is the power division factor. The power divider splits the received radio signal into signal power with a ratio ι for energy harvesting and signal power with a ratio 1 − ι for information decoding. In the energy harvester, the received signal is converted to the required energy and stored in the battery of the UAV.
Rather than considering the information decoder, the energy converted in the energy harvester is taken into account, and signal power is allocated to the energy harvester as much as possible. In addition, the impact on the health of the UAV battery of applying WPT technology is not considered. In the case in which UAV k is within the energy transmission coverage area of BS n in time slot t, o k , n ( t ) = 1 . Otherwise, o k , n ( t ) = 0 . A linear energy harvesting model is adopted. Then, the energy harvested by UAV k can be expressed as
$$B_k^{H}(t) = \sum_{k=1}^{K} o_{k,n}(t)\, \chi\, P_n\, G_n^{t}\, G_k^{r}\, h_{n,k}(t)\cdot \tau ,$$
where χ denotes the energy conversion efficiency, which satisfies χ ∈ (0, 1]. P_n is the BS transmitting power, and the BS adopts constant power transmission. G_n^t and G_k^r are the antenna power gains of the BS transmitter and the UAV receiver, respectively. h_{n,k}(t) is the channel gain between the n-th BS and the k-th UAV in time slot t.
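Under the linear energy-harvesting model, the energy a UAV collects from one BS in one slot is a simple product of these factors; a Python sketch with illustrative names:

```python
def wpt_harvest(in_coverage, chi, p_bs, g_tx, g_rx, channel_gain, slot_s):
    """Energy harvested by a UAV from one BS over one slot of length tau:
    in_coverage is the indicator o_{k,n}(t); chi in (0, 1] is the energy
    conversion efficiency; p_bs is the (constant) BS transmit power."""
    return in_coverage * chi * p_bs * g_tx * g_rx * channel_gain * slot_s
```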

2.5. Flight Model

Continuous variables such as the average velocity v_k(t) and the directional angle θ_k(t) are considered to adjust the trajectory of UAV k. The horizontal position coordinates and the moving distance of UAV k in the current time slot are assumed to be Ψ_k(t) = [x_k(t), y_k(t)]^T and d_k(t), respectively. The position coordinates of UAV k in the next time slot can be expressed as
$$x_k(t+1) = x_k(t) + d_k(t)\cos\theta_k(t), \qquad y_k(t+1) = y_k(t) + d_k(t)\sin\theta_k(t).$$
Referring to [38], the propulsion power consumption of UAV k can be modeled as
$$P_{v_k}(t) = P_0\left(1 + \frac{3 v_k^{2}(t)}{U_{tip}^{2}}\right) + \frac{1}{2} d_0\, \rho\, s\, A\, v_k^{3}(t) + P_1\left(\sqrt{1 + \frac{v_k^{4}(t)}{4 v_0^{4}}} - \frac{v_k^{2}(t)}{2 v_0^{2}}\right)^{1/2},$$
where P_0 and P_1 are constants representing the blade profile power and induced power in the hovering state, respectively. U_tip denotes the tip speed of the rotor blade. v_0 is the mean rotor-induced velocity in the hover state. d_0 and s are the fuselage drag ratio and rotor solidity, respectively. ρ and A denote the air density and rotor disc area, respectively. The flight energy consumption of UAV k in time slot t can be expressed as
$$E_k^{F}(t) = \int_t^{t+\tau} P_{v_k}(t)\,\mathrm{d}t .$$
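The position update and the rotary-wing propulsion power model can be sketched directly in Python; the names are illustrative, and the parameter values in any example are placeholders, not the paper's settings:

```python
import math

def uav_step(x, y, v, theta, slot_s):
    """Horizontal position update of a UAV over one slot: the UAV moves a
    distance d = v * tau along the direction angle theta."""
    d = v * slot_s
    return x + d * math.cos(theta), y + d * math.sin(theta)

def propulsion_power(v, p0, p1, u_tip, v0, d0, rho, s, area):
    """Rotary-wing propulsion power at horizontal speed v [38]: blade profile
    power + parasite (fuselage drag) power + induced power."""
    blade = p0 * (1.0 + 3.0 * v ** 2 / u_tip ** 2)
    parasite = 0.5 * d0 * rho * s * area * v ** 3
    induced = p1 * math.sqrt(math.sqrt(1.0 + v ** 4 / (4.0 * v0 ** 4)) - v ** 2 / (2.0 * v0 ** 2))
    return blade + parasite + induced
```

At v = 0 the parasite term vanishes and the power reduces to the hover value P_0 + P_1, a quick sanity check on the implementation.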
The system models of the SAGIN were enumerated in this section. In the following section, the cost function is constructed.

3. Problem Formulation

How to integrate latency, energy consumption, and carbon emissions into the objective function of a SAGIN is the key to solving the first issue. On the basis of the system model described above, the cost function is constructed, and the formulation of the problem is demonstrated in this section.
The network cost function U(t) is constructed by jointly considering the task execution latency T(t), the energy consumption E^ec(t), and the carbon emission reduction C^RE(t), and can be represented as
$$U(t) = \vartheta_0 T(t) + \vartheta_1 E^{ec}(t) + \vartheta_2 C^{RE}(t),$$
where ϑ_0, ϑ_1, and ϑ_2 represent the preferences for latency, energy consumption, and carbon emissions, respectively, satisfying ϑ_0 + ϑ_1 + ϑ_2 = 1. T(t), E^ec(t), and C^RE(t) are expressed as
$$T(t) = \sum_{m \in \mathcal{M},\, n \in \mathcal{N},\, k \in \mathcal{K}} \left[ t_m^{l}(t) + t_n^{bs}(t) + t_k^{uav}(t) + t_m^{sat}(t) \right],$$
$$E^{ec}(t) = \sum_{m \in \mathcal{M},\, n \in \mathcal{N},\, k \in \mathcal{K}} \left[ E_m^{l}(t) + E_n^{bs}(t) + E_k^{uav}(t) + E_m^{sat}(t) + E_k^{F}(t) \right],$$
$$C^{RE}(t) = \sum_{n \in \mathcal{N}} \xi \left[ h_n(t) - e_n^{loss}(t) \right],$$
where ξ is the conversion factor of carbon emissions per unit of energy, which depends on the type of energy source. t_m^l(t), t_n^bs(t), t_k^uav(t), and t_m^sat(t) represent the times when tasks are executed locally or offloaded to the BS, UAV, and LEO satellite servers, respectively, in time slot t. In particular, each subtask can only be executed in one of these ways. E^ec(t) includes the energy consumption for task execution and UAV flight in time slot t. C^RE(t) is the reduction in carbon emissions achieved by reducing traditional power grid energy consumption.
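The weighted cost and its three components can be assembled in a few lines; a Python sketch with illustrative names, where `thetas` holds (ϑ_0, ϑ_1, ϑ_2):

```python
def network_cost(latencies, energies, harvested, losses, xi, thetas):
    """Weighted cost U(t) = theta0*T(t) + theta1*E_ec(t) + theta2*C_RE(t),
    with C_RE(t) = sum_n xi * (h_n(t) - e_n_loss(t)) the carbon reduction
    from renewable energy actually put to use."""
    total_latency = sum(latencies)                                     # T(t)
    total_energy = sum(energies)                                       # E_ec(t)
    carbon_red = sum(xi * (h - l) for h, l in zip(harvested, losses))  # C_RE(t)
    return thetas[0] * total_latency + thetas[1] * total_energy + thetas[2] * carbon_red
```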
The network cost function is formulated as a minimization problem as follows.
min Σ_{t ∈ T} U(t)
s.t.
(26a) o_m^L(t), o_{m,n}^N(t), o_{m,k}^K(t), o_m^sat(t) ∈ {0, 1},
(26b) o_m^L(t) + Σ_{n=1}^{N} o_{m,n}^N(t) + Σ_{k=1}^{K} o_{m,k}^K(t) + o_m^sat(t) = 1,
(26c) t_m^l(t), t_n^bs(t), t_k^uav(t), t_m^sat(t) ≤ t_m(t),
(26d) 0 ≤ θ_k(t) ≤ 2π, 0 ≤ v_k(t) ≤ v^max,
(26e) ‖Ψ_{k1}(t) − Ψ_{k2}(t)‖ ≥ D^min, k1 ≠ k2,
(26f) B^bot ≤ B_k^K(t) ≤ B^capacity, B^res(t) ≤ B_n^bs(t) ≤ B^battery,
(26g) f_m(t) ≤ f_m^max, Σ_{m=1}^{M} f_{m,k}(t) ≤ f_k^max, Σ_{m=1}^{M} f_{m,n}(t) ≤ f_n^max, Σ_{m=1}^{M} f_{m,sat}(t) ≤ f_sat^max,
(26h) m ∈ M, n ∈ N, k ∈ K, t ∈ T.
Constraints (26a) and (26b) imply that the task is indivisible and can be executed locally or remotely by a computation server. (26c) constrains the latency limit of the task. Constraint (26d) defines the range of angles and velocities that control the UAV's flight trajectory. (26e) ensures a safe distance between UAVs, where ‖·‖ is the Euclidean norm. Constraint (26f) indicates the minimum and maximum thresholds for the UAV and BS backup batteries. (26g) guarantees that the computational resources allocated to the task by the device and the other computation servers do not exceed the maximum computation capacity. (26h) specifies the range of the variables.
It is a challenge to solve the above problem, owing to the dynamic and complex characteristics of the network environment. It is crucial to design a satisfactory solution that can operate in real time and is capable of handling high-dimensional dynamic parameters. Therefore, in the following section, DRL is used to learn the network and to deal with an optimization problem with various constraints.

4. TD3PG Algorithm Based on the Markov Chain Model in a SAGIN

DRL aims to learn an award-maximizing strategy through interactions of intelligent agents with the environment. In order to solve the second issue with DRL, the cost function is reformulated as a Markov decision process (MDP). TD3PG, a DRL-based algorithm, is proposed to optimize the objective problem by training the near-optimal model.

4.1. MDP Formulation

We assume a typical MDP with a four-tuple ⟨S, A, M, R⟩, where S is the state space and A is the action space. M denotes the state transition function, which indicates the probability of the transition to the next state s_{t+1} after performing action a_t in state s_t. R is the reward function. Detailed descriptions of the elements in the tuple are presented below.
  • State space: State space is designed to store the DRL agent’s observations in the environment and to guide the generation of actions. The state space includes the remaining energy of the UAV ( B k K ( t ) ), the BS ( B n b s ( t ) ), the reserved energy for the BS backup battery ( B n r e s ( t ) ), the current horizontal location of the UAV ( Ψ k ( t ) ), and the amount of the current task remaining ( s m ( t ) ). Then, the system state space can be defined as
    s_t = {B_k^K(t), B_n^bs(t), B_n^res(t), Ψ_k(t), s_m(t)}.
  • Action space: The action space stores the actions performed by the DRL agent in the environment in order to obtain feedback from the environment and the next state. The action space contains the offloading decisions o_m^L(t), o_{m,n}^N(t), o_{m,k}^K(t), and o_m^sat(t), i.e., executing the task on the local device or offloading it to a BS, UAV, or the LEO satellite, respectively. The navigation speed v_k(t) and the rotation angle θ_k(t) used to control the UAV trajectory are continuous values in the [0, v^max] and [0, 2π] intervals, respectively. f_m(t) ∈ [f_m^max/2, f_m^max] denotes the dynamic computational resources of the device. υ_n(t) is the proportion of shared RE harvested by the BS. Therefore, the system action space can be expressed as
    a_t = {o_m^L(t), o_{m,n}^N(t), o_{m,k}^K(t), o_m^sat(t), v_k(t), θ_k(t), f_m(t), υ_n(t)}.
  • Reward: The reward is the feedback from the environment to the DRL agent after performing the current action. The research goal is to minimize the cost function, and the algorithm depends on the reward to evaluate its decisions while learning the SAGIN environment. Since the goal of RL is to maximize the cumulative reward over time, the reward function is formulated as the negative value of the cost function. Then, the reward can be expressed as
    R_t = −U(t),
    where the coefficients ϑ 0 , ϑ 1 , and ϑ 2 in U ( t ) are required to satisfy ϑ 0 , ϑ 1 > 0 and ϑ 2 < 0 , respectively.
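Concretely, the state vector fed to the networks and the reward can be assembled as follows; a Python sketch in which the flattening layout is an illustrative choice:

```python
import numpy as np

def build_state(uav_batt, bs_batt, bs_reserve, uav_xy, task_remaining):
    """Flatten s_t = {B_k^K(t), B_n^bs(t), B_n^res(t), Psi_k(t), s_m(t)}
    into a single float32 vector for the actor and critic networks."""
    return np.concatenate([
        np.ravel(uav_batt), np.ravel(bs_batt), np.ravel(bs_reserve),
        np.ravel(uav_xy), np.ravel(task_remaining),
    ]).astype(np.float32)

def reward(cost_u):
    """R_t = -U(t): minimizing the cost maximizes the cumulative reward."""
    return -cost_u
```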
Figure 3 shows the strategic framework for the scenario proposed in the SAGIN. In time slot 1, tasks are set to be computed locally or offloaded to a computation server without exceeding the latency constraints, incurring the latency of local computation or transmission, as well as the corresponding energy consumption. In time slot 2, tasks that were transferred to the server are computed. Because the device is within the communication range of the UAV, new tasks can optionally be offloaded to the computation servers, including the UAV, to await execution. Since the UAV is also within the energy transmission coverage area of the BS, the BS can charge the UAV via WPT when the UAV battery power falls below a threshold. In time slot 3, the remaining tasks in the UAV's computing queue that were not computed (black) in the previous time slot generate a waiting delay; it is necessary to wait until their execution is completed before computing the tasks of the current time slot. At the end of each time slot, the remaining battery power is calculated. Excluding the energy that needs to be reserved, the BS backup battery shares the excess RE with other BSs in a certain proportion. When the remaining power of the BS backup battery is lower than the required reserved power (red), power from the grid is utilized to replenish it. In each time slot, the task execution latency, total network energy consumption, and reduction in carbon emissions are jointly formulated as a cost function. The reward, state space, and action space are fed into the TD3PG algorithm, which ultimately feeds back the offloading decisions, local computation resources, UAV trajectory, and RE sharing ratio for the scenario. Eventually, the new actions are executed in the scenario.

4.2. TD3PG Algorithm

Owing to the fact that the action space contains continuous variables, TD3PG is adopted, which is an off-policy algorithm based on the actor–critic structure for dealing with optimization problems with a high-dimensional action space [39].
Compared to the conventional actor–critic-based deep deterministic policy gradient (DDPG) algorithm, which suffers from cumulative errors, TD3PG solves DDPG's overestimation problem. It maintains two critic networks (each with its own target network), and the smaller of the two Q-value estimates is chosen to form the loss function in the Bellman equation, so there is less overestimation during learning. Moreover, a delayed policy technique lets TD3PG update the actor network and the target networks less frequently than the critic networks; in other words, the policy is not updated until the value function has been updated sufficiently. These less frequent policy updates yield value estimates with lower variance, producing a better policy [40].
TD3PG also smooths the value estimate through target policy smoothing, which adds clipped random noise to the target actions and averages over mini-batches, as shown below:
    ϵ ∼ clip(N(0, ω̃), -c, c).
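The clipped noise can be generated in a few lines (a sketch; ω̃ is the noise standard deviation and c the clipping bound, with hypothetical values below):

```python
import numpy as np

def target_policy_noise(rng, action_dim, sigma=0.2, c=0.5):
    """Clipped Gaussian noise added to the target action for smoothing."""
    return np.clip(rng.normal(0.0, sigma, size=action_dim), -c, c)

rng = np.random.default_rng(42)
eps = target_policy_noise(rng, action_dim=3)
```

Every component of `eps` is guaranteed to lie in [-c, c], which keeps the smoothed target action close to the target policy's output.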
According to the structure of the TD3PG algorithm depicted in Figure 4, the TD3PG architecture contains one actor network and two critic networks. The actor network is parameterized by θ π . The two critic networks are parameterized by θ Q 1 and θ Q 2 , respectively.
The agent first observes the current state ( s t ) and picks an action ( a t ) via the actor network. After performing a t in the environment, the agent is able to observe the immediate reward ( R t ) and the next state ( s t + 1 ). Then, the agent selects the policy that maximizes the cumulative reward, which converts the state ( s t ) to the action ( a t ). At this point, the transition ( s t , a t , R t , s t + 1 ) is obtained and stored in the replay memory for further sampling.
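The transition storage and sampling step can be sketched with a fixed-size replay memory (an illustrative implementation, not the authors' code):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO memory for (s, a, r, s_next) transitions."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # oldest entries evicted first

    def store(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random mini-batch, which decorrelates gradient updates.
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)
```

Once the buffer holds enough transitions, each training step draws a mini-batch of V tuples, matching line 10 of Algorithm 1.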
The training process of the TD3PG algorithm is summarized as Algorithm 1.
The minimum Q-value estimate of these two critic networks is used to calculate the target value (y_t), which can be obtained as
    y_t = R_t + γ · min_{i=1,2} Q(s_{t+1}, π(s_{t+1} | θ_π^T) + ϵ | θ_{Q_i}^T),
where γ is a discount factor, ranging from 0 to 1.
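Given the two target critics' Q-value estimates on the smoothed target action, the target y_t of Equation (31) is formed as follows (a sketch with scalar Q-values; the `done` flag is our addition for episode termination and is not part of the paper's equation):

```python
def td3_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """Clipped double-Q target: bootstrap from the smaller of the two
    target critics' estimates to curb overestimation."""
    bootstrap = 0.0 if done else gamma * min(q1_next, q2_next)
    return reward + bootstrap
```

Taking the minimum of the two estimates is what distinguishes TD3PG from DDPG, which bootstraps from a single (often optimistic) critic.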
The loss function of the two critic networks can be defined as
    L_s(θ_{Q_i}) = E[(y_t - Q(s_t, a_t | θ_{Q_i}))^2], i = 1, 2.
Then, the gradient of the loss function is calculated as
    ∇_{θ_{Q_i}} L_s(θ_{Q_i}) = E[(y_t - Q(s_t, a_t | θ_{Q_i})) ∇_{θ_{Q_i}} Q(s_t, a_t | θ_{Q_i})], i = 1, 2.
The policy gradient of the actor network can be expressed as
    ∇_{θ_π} J ≈ E[∇_a Q(s_t, a_t | θ_{Q_1}) |_{a_t = π(s_t)} ∇_{θ_π} π(s_t | θ_π)].
The parameters of the target networks are updated by means of
    θ_π^T ← ω θ_π + (1 - ω) θ_π^T,
    θ_{Q_i}^T ← ω θ_{Q_i} + (1 - ω) θ_{Q_i}^T, i = 1, 2,
where ω is the soft-update rate of the target networks for both the actor and the critics.
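This soft (Polyak) update can be written compactly in NumPy (a sketch; the parameter lists and the value of ω are hypothetical):

```python
import numpy as np

def soft_update(target_params, online_params, omega=0.005):
    """In-place Polyak averaging: theta_T <- omega*theta + (1-omega)*theta_T."""
    for tgt, src in zip(target_params, online_params):
        tgt *= 1.0 - omega
        tgt += omega * src

theta = [np.ones(3)]         # online network parameters
theta_target = [np.zeros(3)]  # target network parameters
soft_update(theta_target, theta, omega=0.1)
```

A small ω makes the target networks trail the online networks slowly, which stabilizes the bootstrapped target y_t.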
Algorithm 1 TD3PG algorithm.
  • Require: Training episode length: max-episode; discount factor: γ.
  • Ensure: Loss, gradients.
  •   1: Initialize the actor network π θ π with parameter θ π . Initialize the critic network Q θ Q 1 , Q θ Q 2 with parameters θ Q 1 and θ Q 2 , respectively.
  •   2: Initialize the target network by: θ π T θ π , θ Q 1 T θ Q 1 , θ Q 2 T θ Q 2 .
  •   3: Initialize replay buffer R .
  •   4: for episode = 1 to m a x - e p i s o d e do
  •   5:     Reset simulation parameters and obtain initial observation state s.
  •   6:     for t = 1 to T do
  •   7:        Get the action with added exploration noise: a_t ← π(s_t | θ_π) + N(0, δ).
  •   8:        Obtain the reward R t and update s t + 1 .
  •   9:        Store the transition s t , a t , R t , s t + 1 in replay buffer R .
  •  10:       Sample a random mini-batch of V tuples from replay buffer R .
  •  11:       Compute greedy actions for the next states: ã ← π(s_{t+1} | θ_π^T) + ϵ, where ϵ ∼ clip(N(0, ω̃), -c, c).
  •  12:       Compute target value y t based on Equation (31).
  •  13:       Update the critics by minimizing the loss function Equation (32).
  •  14:       Update the actor network parameter θ π by using the sampled policy gradient Equation (34).
  •  15:       Update the target networks Equation (35).
  •  16:     end for
  •  17: end for
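The inner update of Algorithm 1 (lines 11 to 15) can be condensed into a runnable sketch. Tiny linear models stand in for the deep actor and twin critic networks here; all dimensions, learning rates, and rewards are hypothetical, so this illustrates the update order rather than the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 4, 2
GAMMA, TAU, POLICY_DELAY = 0.99, 0.005, 2

# Linear stand-ins for the actor and the twin critics (plus targets).
actor_w = rng.normal(scale=0.1, size=(STATE_DIM, ACTION_DIM))
critic_w = [rng.normal(scale=0.1, size=STATE_DIM + ACTION_DIM) for _ in range(2)]
actor_t = actor_w.copy()
critic_t = [w.copy() for w in critic_w]

def act(s, w):
    return np.tanh(s @ w)                 # bounded continuous action

def q_val(s, a, w):
    return np.concatenate([s, a]) @ w     # scalar Q estimate

def td3_step(s, a, r, s_next, step, lr=1e-2):
    global actor_w
    # Line 11: target policy smoothing with clipped Gaussian noise.
    noise = np.clip(rng.normal(0.0, 0.2, ACTION_DIM), -0.5, 0.5)
    a_next = np.clip(act(s_next, actor_t) + noise, -1.0, 1.0)
    # Line 12: clipped double-Q target.
    y = r + GAMMA * min(q_val(s_next, a_next, w) for w in critic_t)
    # Line 13: critic updates via TD-error gradient descent.
    feat = np.concatenate([s, a])
    for i in range(2):
        critic_w[i] -= lr * (q_val(s, a, critic_w[i]) - y) * feat
    # Lines 14-15: delayed actor update and soft target updates.
    if step % POLICY_DELAY == 0:
        a_pi = act(s, actor_w)
        dq_da = critic_w[0][STATE_DIM:]   # dQ1/da for a linear critic
        actor_w += lr * np.outer(s, dq_da * (1.0 - a_pi ** 2))  # ascend Q1
        for tgt, src in ((actor_t, actor_w), *zip(critic_t, critic_w)):
            tgt *= 1.0 - TAU
            tgt += TAU * src

for step in range(1, 9):  # a few dummy transitions
    s = rng.normal(size=STATE_DIM)
    td3_step(s, act(s, actor_w), r=1.0, s_next=s, step=step)
```

Note how the actor and target parameters change only on every `POLICY_DELAY`-th step, while the critics are updated on every step.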

4.3. Complexity Analysis

The proposed TD3PG algorithm contains both an actor network and critic networks, so their complexities are evaluated separately. Assuming that the actor network has L_a layers with n_l neuron nodes in the l-th layer (l ≤ L_a), the complexity of the l-th layer is O(n_{l-1} n_l + n_l n_{l+1}) [41], and the complexity of the whole actor network is O(∑_{l=2}^{L_a-1} (n_{l-1} n_l + n_l n_{l+1})). Similarly, assuming that the critic network has L_c layers with m_j neuron nodes in the j-th layer (j ≤ L_c), the complexity of the j-th layer is O(m_{j-1} m_j + m_j m_{j+1}) [41], and the complexity of the critic network is O(∑_{j=2}^{L_c-1} (m_{j-1} m_j + m_j m_{j+1})). Hence, the overall computational complexity of the TD3PG algorithm is O(∑_{l=2}^{L_a-1} (n_{l-1} n_l + n_l n_{l+1}) + ∑_{j=2}^{L_c-1} (m_{j-1} m_j + m_j m_{j+1})), which is similar to that of the DDPG algorithm.
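The per-layer operation count can be checked numerically (a sketch; the layer widths below are hypothetical and only illustrate the counting rule):

```python
def mlp_complexity(widths):
    """Multiply count of one dense forward pass through an MLP with the
    given layer widths, matching the O(sum_l n_{l-1} * n_l) estimate."""
    return sum(widths[i] * widths[i + 1] for i in range(len(widths) - 1))

# Hypothetical widths: state -> hidden -> hidden -> action / Q-value.
actor_ops = mlp_complexity([10, 64, 64, 4])
critic_ops = mlp_complexity([14, 64, 64, 1])   # input is state + action
total_ops = actor_ops + 2 * critic_ops         # TD3PG keeps twin critics
```

The twin critics roughly double the critic-side cost relative to DDPG, but since per-layer widths dominate the count, the asymptotic complexity is unchanged.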
Overall, TD3PG is an off-policy actor–critic algorithm suited to optimization problems with a high-dimensional action space. It solves the overestimation problem of DDPG, and its delayed policy technique produces a better policy. It thus improves on the DDPG algorithm without considerably increasing the complexity.

5. Results and Discussion

In this section, numerical simulations are used to evaluate the performance of the proposed TD3PG algorithm. Specifically, the simulation settings and benchmark strategies are first elaborated. Afterwards, an analysis of the test results is conducted.

5.1. Simulation Settings

We consider an experimental area of 800 m × 800 m in which M = 50 devices are uniformly distributed. The data size and latency requirements of each computation subtask obey uniform distributions: each device generates one subtask per time slot with a data size drawn from U[3, 8] Mbits and a latency requirement drawn from U[0.8, 1.2] s. There are N = 2 BSs, K = 3 UAVs, and one LEO satellite in the experimental area. The fixed altitude of the UAVs is H = 100 m, and their communication coverage is a circle of radius 173.2 m. The communication coverage radius of the BSs is about 250 m. The LEO satellite covers the entire experimental area, with a communication coverage of about 200,000 km². The bandwidths of the UAVs and BSs are 5 and 20 MHz, respectively. A DJI M300 RTK quadcopter with a wheelbase of 895 mm was selected, equipped with a computation server, communication equipment, and wireless charging equipment. The total weight of the UAV is about 9 kg, of which the communication equipment (6G BS) accounts for 1 kg. One charge supports 55 min of UAV flight. To simplify the battery parameters and demonstrate the role of WPT as quickly as possible, the battery capacity of the UAV is set to 10^4 J. When charging a UAV, the less RE the BS backup battery loses in transfer, the more RE the UAV gains through WPT, and the greater the achievable reduction in carbon emissions. The corresponding simulation parameters are listed in Table 2.
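As a sanity check on the stated coverage: a 173.2 m footprint at a 100 m altitude is consistent with a 60° coverage half-angle, since 100 · tan 60° = 100√3 ≈ 173.2 m (this geometric reading is our inference, not stated by the authors):

```python
import math

H = 100.0                      # UAV altitude (m)
half_angle = math.radians(60)  # assumed coverage half-angle of the UAV antenna
radius = H * math.tan(half_angle)
print(round(radius, 1))        # 173.2
```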
The simulations are run with Python 3.8.8 and TensorFlow 2.6.0 on a computer with an Intel(R) i5-9400 CPU. Each training episode takes about 22 s. To evaluate the performance of our proposed algorithm, we compare the proposed TD3PG algorithm against three conventional benchmarks.
  • DDPG Algorithm: A DRL-based algorithm that, like TD3PG, contains an actor network, but has only a single critic network, whereas TD3PG has two. All other parameters are consistent with those of TD3PG.
  • Full Offload: Since local computation cannot guarantee the latency constraints and consumes considerable energy, all tasks are offloaded to the computation server, which distributes computational resources evenly among the connected devices. All excess RE is shared with neighboring BSs.
  • Greedy on Edge Algorithm (Greedy-edge): Since edge computing has a low computational cost, the computational resources of the UAV can be fully utilized, resembling the “Greedy on edge” algorithm reported in [43]. In time slot t, 60% of the total tasks are offloaded to the UAV, which distributes computational resources evenly based on the number of connected devices. All excess RE is shared with neighboring BSs.

5.2. Results and Analysis

In order to evaluate the impact of different network parameters on the performance of the proposed TD3PG-based algorithm, a normalized cumulative reward is adopted, defined as (cumulative reward - minimum cumulative reward) / (maximum cumulative reward - minimum cumulative reward); a larger normalized cumulative reward means better performance. In Figure 5, four combinations of learning rates are compared. As the number of episodes increases, the normalized cumulative reward fluctuates differently under the different learning rates, and by around 2000 episodes, the four cases converge toward different values. The proposed algorithm performs best when α_π = 0.0001 and α_μ = 0.0001, corresponding to the highest converged reward curve among the four cases.
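This normalization is standard min-max scaling of the cumulative reward (a sketch):

```python
def normalize_rewards(rewards):
    """Min-max normalization: (r - min) / (max - min), mapping cumulative
    rewards to [0, 1] so curves from different runs are comparable."""
    lo, hi = min(rewards), max(rewards)
    if hi == lo:
        return [0.0 for _ in rewards]  # degenerate case: flat reward curve
    return [(r - lo) / (hi - lo) for r in rewards]
```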
In Figure 6, the impact of different discount factors is evaluated. According to the convergence results, the converged normalized cumulative reward grows as γ increases: compared with γ = 0.69, 0.79, and 0.89, γ = 0.99 exhibits higher performance. Hence, γ = 0.99 is the best discount factor among those considered.
Figure 7 shows how the network cost function varies with the number of devices under the different benchmark algorithms. Both the proposed algorithm and the DDPG algorithm perform well in terms of joint latency, energy consumption, and carbon emissions. Specifically, for both algorithms, the value of the network cost function is always lower than that of the other benchmarks for any number of devices and increases only slowly as the number of devices grows. This is because both algorithms dynamically adjust the offloading strategy while optimizing the UAV trajectory and the RE sharing ratio in real time, ultimately obtaining a lower value of the cost function. In particular, the cost function under the TD3PG algorithm is lower than under the DDPG algorithm, demonstrating better performance.
In Figure 8, with the exception of full offload, the total energy consumption under the other benchmark strategies increases with the maximum computation capacity of the device. Because the proposed algorithm and the DDPG algorithm can dynamically adjust the local computational resources according to the reward function, their total energy consumption remains low even when f_m^max is small. In particular, the proposed algorithm shows the best performance for f_m^max between 2.0 GHz and 2.5 GHz. The full offload policy, by contrast, offloads all tasks to the computation server, so its total energy consumption shows no significant change.
Figure 9 compares the total network energy consumption for different values of the energy consumption per unit of computational resources of the LEO satellite (e_sat) under the different benchmark algorithms. As e_sat increases, the total energy consumption under the proposed algorithm is always the lowest, while that under the greedy-edge algorithm is the highest: on the one hand, the greedy-edge algorithm does not improve its offloading decisions; on the other hand, its devices cannot intelligently adjust their computational resources, which increases the energy consumption of local computation. Under the full offload benchmark, more tasks are offloaded to the LEO satellite for execution, so the total energy consumption grows significantly as e_sat increases. The DDPG algorithm dynamically adjusts task offloading according to the reward value, and the growth rate of its total network energy consumption gradually decreases as e_sat grows. Because the high altitude of the LEO satellite increases transmission latency, few tasks are offloaded to it, so the total network energy consumption under the proposed algorithm varies only slightly as e_sat increases. The proposed algorithm demonstrates similarly good performance as the energy consumption per unit of computational resources of the UAV (e_uav) increases.
To illustrate the performance of the proposed algorithm in optimizing the UAV trajectory to reduce latency, three cases are analyzed under the different benchmark algorithms: trajectory optimization, a trajectory fixed in one direction, and the UAV flying in place. The total execution latency of all tasks for 50 devices over the entire period is compared. As shown in Figure 10, trajectory optimization yields the lowest overall latency of the three cases, reducing it by 2.93% compared to the UAV flying in place and by 5.78% compared to the trajectory fixed in one direction. The total task latency under both the proposed algorithm and the DDPG algorithm is lower in every flight trajectory mode, significantly improving latency/QoS; with both algorithms, the UAV moves toward locations with an appropriate number of devices. Because offloading too many tasks to the UAV increases the total latency, the greedy-edge and full offload algorithms perform poorly. The greedy-edge algorithm offloads a higher percentage of tasks to the UAV, resulting in higher task latency; meanwhile, its local computational resources do not match the maximum computation capability, which lengthens the computation time. This is why the total task latency of the full offload algorithm is lower than that of the greedy-edge algorithm.
Since UAVs have limited battery resources, the remaining power of the UAV must be preserved as much as possible to ensure sustainable network operation. Figure 11 shows the average residual power of the UAV in each time slot for the different benchmark algorithms. Among them, both the proposed algorithm and the DDPG algorithm employ WPT, which ensures that the UAV always has remaining power to maintain operation; the other two do not use WPT, so their power is quickly depleted and sustainable network operation cannot be maintained. It is worth noting that the energy consumed by UAV computation is small compared to the energy consumed by UAV flight, so the differences in the average remaining power of the UAV among the three benchmark algorithms are small. The greedy-edge algorithm and the full offload policy offload a large number of tasks to the computation server, so the average remaining power of the UAV is lower. The proposed algorithm, in contrast, integrates dynamic task offloading with UAV trajectory optimization, which not only maximizes the relative residual power of the UAV but also lets the UAV replenish power through WPT when its power falls below the threshold, maintaining sustainable network operation.
To evaluate the performance of the proposed algorithm in reducing the network's carbon emissions, Figure 12 shows the average loss ratio of RE for different energy-transmission loss factors. The smaller the average loss ratio of RE, the greater the reduction in carbon emissions. Although both algorithms can intelligently optimize the RE sharing proportion (υ_n(t)), the proposed algorithm shows better performance than the DDPG algorithm. It is worth noting that the average loss ratio of RE under the DDPG algorithm at η = 0.05 is higher than that at η = 0.1. A possible reason is that when too little energy is shared, too much energy is carried over; when RE arrives in the next time slot, the total remaining energy exceeds the backup battery capacity, which increases the amount of discarded RE (e_n^exc(t)).

6. Conclusions

In this article, a SAGIN scenario with newly designed power technology is proposed for the first time. A cost function is formulated that jointly accounts for latency/QoS, energy consumption, and carbon emissions, and a TD3PG method is proposed to optimize it. The proposed algorithm is compared with three benchmark algorithms, yielding several valuable findings. Simulation results show that the cost function decreases markedly. The proposed approach improves the sustainability of the network using WPT technology while offering advantages in task latency and energy saving. In particular, by optimizing the UAV trajectory under this algorithm, the total task latency is reduced by 2.93% and 5.78% compared to the UAV flying in place and a trajectory fixed in one direction, respectively. The proposed algorithm also performs better in reducing the network's carbon emissions by optimizing the RE sharing proportion. The research reported in this article contributes to the development of future networks in the direction of smart, green, and low-carbon applications, which is also important for improving QoS in future communication networks. In future work, a software-defined network will be utilized to better optimize the energy resources and wireless resources in a SAGIN.

Author Contributions

Y.W.: methodology, software, and writing; B.L.: supervision and writing—reviewing and editing; J.H.: validation; J.D.: supervision; Y.L.: editing; Y.Y.: original draft preparation. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Ningxia Autonomous Region key R&D plan project (2020BDE03006), Ningxia Natural Science Foundation (2023AAC03024), and the National Natural Science Foundation of China (61301145).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Cheng, N.; Quan, W.; Shi, W.; Wu, H.; Ye, Q.; Zhou, H.; Zhuang, W.; Shen, X.; Bai, B. A Comprehensive Simulation Platform for Space-Air-Ground Integrated Network. IEEE Wirel. Commun. 2020, 27, 178–185. [Google Scholar] [CrossRef]
  2. Chowdhury, M.Z.; Shahjalal, M.; Ahmed, S.; Jang, Y.M. 6G Wireless Communication Systems: Applications, Requirements, Technologies, Challenges, and Research Directions. IEEE Open J. Commun. Soc. 2020, 1, 957–975. [Google Scholar] [CrossRef]
  3. Wang, Y.; Su, Z.; Ni, J.; Zhang, N.; Shen, X. Blockchain-Empowered Space-Air-Ground Integrated Networks: Opportunities, Challenges, and Solutions. IEEE Commun. Surv. Tutor. 2022, 24, 160–209. [Google Scholar] [CrossRef]
  4. Wang, C.; Yu, X.; Xu, L.; Jiang, F.; Wang, W.; Cheng, X. QoS-aware offloading based on communication-computation resource coordination for 6G edge intelligence. China Commun. 2023, 20, 236–251. [Google Scholar] [CrossRef]
  5. Feng, X.; Ling, X.; Zheng, H.; Chen, Z.; Xu, Y. Adaptive Multi-Kernel SVM With Spatial–Temporal Correlation for Short-Term Traffic Flow Prediction. IEEE Trans. Intell. Transp. Syst. 2019, 20, 2001–2013. [Google Scholar] [CrossRef]
  6. Letaief, K.B.; Chen, W.; Shi, Y.; Zhang, J.; Zhang, Y.J.A. The Roadmap to 6G: AI Empowered Wireless Networks. IEEE Commun. Mag. 2019, 57, 84–90. [Google Scholar] [CrossRef]
  7. Zhao, B.; Cui, Q.; Liang, S.; Zhai, J.; Hou, Y.; Huang, X.; Pan, M.; Tao, X. Green concerns in federated learning over 6G. China Commun. 2022, 19, 50–69. [Google Scholar] [CrossRef]
  8. Cheshmehzangi, A.; Chen, H. Key Suggestions and Steps Ahead for China’s Carbon Neutrality Plan. In China’s Sustainability Transitions: Low Carbon and Climate-Resilient Plan for Carbon Neutral 2060; Springer: Singapore, 2021; pp. 189–206. [Google Scholar]
  9. Chen, Q.; Meng, W.; Han, S.; Li, C.; Chen, H.H. Robust Task Scheduling for Delay-Aware IoT Applications in Civil Aircraft-Augmented SAGIN. IEEE Trans. Commun. 2022, 70, 5368–5385. [Google Scholar] [CrossRef]
  10. Sun, Y.; Xie, B.; Zhou, S.; Niu, Z. MEET: Mobility-Enhanced Edge inTelligence for Smart and Green 6G Networks. IEEE Commun. Mag. 2023, 61, 64–70. [Google Scholar] [CrossRef]
  11. Peng, H.; Shen, X. Deep Reinforcement Learning Based Resource Management for Multi-Access Edge Computing in Vehicular Networks. IEEE Trans. Netw. Sci. Eng. 2020, 7, 2416–2428. [Google Scholar] [CrossRef]
  12. Zhao, J.; Li, Q.; Gong, Y.; Zhang, K. Computation Offloading and Resource Allocation For Cloud Assisted Mobile Edge Computing in Vehicular Networks. IEEE Trans. Veh. Technol. 2019, 68, 7944–7956. [Google Scholar] [CrossRef]
  13. Abderrahim, W.; Amin, O.; Alouini, M.S.; Shihada, B. Latency-Aware Offloading in Integrated Satellite Terrestrial Networks. IEEE Open J. Commun. Soc. 2020, 1, 490–500. [Google Scholar] [CrossRef]
  14. Dai, Y.; Zhang, K.; Maharjan, S.; Zhang, Y. Edge Intelligence for Energy-Efficient Computation Offloading and Resource Allocation in 5G Beyond. IEEE Trans. Veh. Technol. 2020, 69, 12175–12186. [Google Scholar] [CrossRef]
  15. Wang, F.; Xing, H.; Xu, J. Real-Time Resource Allocation for Wireless Powered Multiuser Mobile Edge Computing With Energy and Task Causality. IEEE Trans. Commun. 2020, 68, 7140–7155. [Google Scholar] [CrossRef]
  16. Chen, M.; Hao, Y. Task Offloading for Mobile Edge Computing in Software Defined Ultra-Dense Network. IEEE J. Sel. Areas Commun. 2018, 36, 587–597. [Google Scholar] [CrossRef]
  17. Guo, Y.; Li, Q.; Li, Y.; Zhang, N.; Wang, S. Service Coordination in the Space-Air-Ground Integrated Network. IEEE Netw. 2021, 35, 168–173. [Google Scholar] [CrossRef]
  18. Zhang, L.; Abderrahim, W.; Shihada, B. Heterogeneous Traffic Offloading in Space-Air-Ground Integrated Networks. IEEE Access 2021, 9, 165462–165475. [Google Scholar] [CrossRef]
  19. Chen, Q.; Meng, W.; Han, S.; Li, C.; Chen, H.H. Reinforcement Learning-Based Energy-Efficient Data Access for Airborne Users in Civil Aircrafts-Enabled SAGIN. IEEE Trans. Green Commun. Netw. 2021, 5, 934–949. [Google Scholar] [CrossRef]
  20. Buzzi, S.; I, C.L.; Klein, T.E.; Poor, H.V.; Yang, C.; Zappone, A. A Survey of Energy-Efficient Techniques for 5G Networks and Challenges Ahead. IEEE J. Sel. Areas Commun. 2016, 34, 697–709. [Google Scholar] [CrossRef]
  21. Alqasir, A.M.; Kamal, A.E. Cooperative Small Cell HetNets With Dynamic Sleeping and Energy Harvesting. IEEE Trans. Green Commun. Netw. 2020, 4, 774–782. [Google Scholar] [CrossRef]
  22. Malta, S.; Pinto, P.; Fernández-Veiga, M. Using Reinforcement Learning to Reduce Energy Consumption of Ultra-Dense Networks With 5G Use Cases Requirements. IEEE Access 2023, 11, 5417–5428. [Google Scholar] [CrossRef]
  23. Yuan, H.; Tang, G.; Guo, D.; Wu, K.; Shao, X.; Yu, K.; Wei, W. BESS Aided Renewable Energy Supply Using Deep Reinforcement Learning for 5G and Beyond. IEEE Trans. Green Commun. Netw. 2022, 6, 669–684. [Google Scholar] [CrossRef]
  24. Jahid, A.; Monju, M.K.H.; Hossain, M.E.; Hossain, M.F. Renewable Energy Assisted Cost Aware Sustainable Off-Grid Base Stations with Energy Cooperation. IEEE Access 2018, 6, 60900–60920. [Google Scholar] [CrossRef]
  25. Rubina Aktar, M.; Shamim Anower, M.; Zahurul Islam Sarkar, M.; Sayem, A.S.M.; Rashedul Islam, M.; Akash, A.I.; Rumana Akter Rume, M.; Moloudian, G.; Lalbakhsh, A. Energy-Efficient Hybrid Powered Cloud Radio Access Network (C-RAN) for 5G. IEEE Access 2023, 11, 3208–3220. [Google Scholar] [CrossRef]
  26. Saad, W.; Bennis, M.; Chen, M. A Vision of 6G Wireless Systems: Applications, Trends, Technologies, and Open Research Problems. IEEE Netw. 2020, 34, 134–142. [Google Scholar] [CrossRef]
  27. Wang, F.; Xu, J.; Wang, X.; Cui, S. Joint Offloading and Computing Optimization in Wireless Powered Mobile-Edge Computing Systems. IEEE Trans. Wirel. Commun. 2018, 17, 1784–1797. [Google Scholar] [CrossRef]
  28. Zheng, K.; Jiang, G.; Liu, X.; Chi, K.; Yao, X.; Liu, J. DRL-Based Offloading for Computation Delay Minimization in Wireless-Powered Multi-Access Edge Computing. IEEE Trans. Commun. 2023, 71, 1755–1770. [Google Scholar] [CrossRef]
  29. Zheng, K.; Jia, X.; Chi, K.; Liu, X. DDPG-Based Joint Time and Energy Management in Ambient Backscatter-Assisted Hybrid Underlay CRNs. IEEE Trans. Commun. 2023, 71, 441–456. [Google Scholar] [CrossRef]
  30. Liu, J.; Zhao, X.; Qin, P.; Geng, S.; Meng, S. Joint Dynamic Task Offloading and Resource Scheduling for WPT Enabled Space-Air-Ground Power Internet of Things. IEEE Trans. Netw. Sci. Eng. 2022, 9, 660–677. [Google Scholar] [CrossRef]
  31. Sun, L.; Pang, H.; Gao, L. Joint Sponsor Scheduling in Cellular and Edge Caching Networks for Mobile Video Delivery. IEEE Trans. Multimed. 2018, 20, 3414–3427. [Google Scholar] [CrossRef]
  32. Yong, P.; Zhang, N.; Hou, Q.; Liu, Y.; Teng, F.; Ci, S.; Kang, C. Evaluating the Dispatchable Capacity of Base Station Backup Batteries in Distribution Networks. IEEE Trans. Smart Grid 2021, 12, 3966–3979. [Google Scholar] [CrossRef]
  33. Hu, Q.; Cai, Y.; Yu, G.; Qin, Z.; Zhao, M.; Li, G.Y. Joint Offloading and Trajectory Design for UAV-Enabled Mobile Edge Computing Systems. IEEE Internet Things J. 2019, 6, 1879–1892. [Google Scholar] [CrossRef]
  34. Hong, T.; Zhao, W.; Liu, R.; Kadoch, M. Space-Air-Ground IoT Network and Related Key Technologies. IEEE Wirel. Commun. 2020, 27, 96–104. [Google Scholar] [CrossRef]
  35. Mao, S.; He, S.; Wu, J. Joint UAV Position Optimization and Resource Scheduling in Space-Air-Ground Integrated Networks With Mixed Cloud-Edge Computing. IEEE Syst. J. 2021, 15, 3992–4002. [Google Scholar] [CrossRef]
  36. Lakew, D.S.; Tran, A.T.; Dao, N.N.; Cho, S. Intelligent Offloading and Resource Allocation in Heterogeneous Aerial Access IoT Networks. IEEE Internet Things J. 2023, 10, 5704–5718. [Google Scholar] [CrossRef]
  37. Mao, Y.; You, C.; Zhang, J.; Huang, K.; Letaief, K.B. A Survey on Mobile Edge Computing: The Communication Perspective. IEEE Commun. Surv. Tutor. 2017, 19, 2322–2358. [Google Scholar] [CrossRef]
  38. Zhan, C.; Zeng, Y. Aerial–Ground Cost Tradeoff for Multi-UAV-Enabled Data Collection in Wireless Sensor Networks. IEEE Trans. Commun. 2020, 68, 1937–1950. [Google Scholar] [CrossRef]
  39. Huang, H.; Ye, Q.; Zhou, Y. 6G-Empowered Offloading for Realtime Applications in Multi-Access Edge Computing. IEEE Trans. Netw. Sci. Eng. 2023, 10, 1311–1325. [Google Scholar] [CrossRef]
  40. Wang, Y.; Gao, Z.; Zhang, J.; Cao, X.; Zheng, D.; Gao, Y.; Ng, D.W.K.; Renzo, M.D. Trajectory Design for UAV-Based Internet of Things Data Collection: A Deep Reinforcement Learning Approach. IEEE Internet Things J. 2022, 9, 3899–3912. [Google Scholar] [CrossRef]
  41. Gao, G.; Li, J.; Wen, Y. DeepComfort: Energy-Efficient Thermal Comfort Control in Buildings Via Reinforcement Learning. IEEE Internet Things J. 2020, 7, 8472–8484. [Google Scholar] [CrossRef]
  42. Zhao, X.; Du, F.; Geng, S.; Fu, Z.; Wang, Z.; Zhang, Y.; Zhou, Z.; Zhang, L.; Yang, L. Playback of 5G and Beyond Measured MIMO Channels by an ANN-Based Modeling and Simulation Framework. IEEE J. Sel. Areas Commun. 2020, 38, 1945–1954. [Google Scholar] [CrossRef]
  43. Cheng, N.; Lyu, F.; Quan, W.; Zhou, C.; He, H.; Shi, W.; Shen, X. Space/Aerial-Assisted Computing Offloading for IoT Applications: A Learning-Based Approach. IEEE J. Sel. Areas Commun. 2019, 37, 1117–1129. [Google Scholar] [CrossRef]
Figure 1. SAGIN scenario with newly designed power supply.
Figure 2. WPT principle in the SAGIN.
Figure 3. Strategic framework for the SAGIN scenario.
Figure 4. Structure of the TD3PG algorithm.
Figure 5. Impact of learning rate on training episodes.
Figure 6. Impact of discount factor on training episodes.
Figure 7. Comparison of the cost function with respect to the number of devices.
Figure 8. Comparison of total energy consumption for a device with maximum computation capacity.
Figure 9. Comparison of the total energy consumption with the energy consumption per unit of computational resources for the LEO satellite.
Figure 10. Comparison of total task latency with three trajectory settings.
Figure 11. Comparison of average remaining power of the UAV in each time slot.
Figure 12. Comparison of average loss ratio of RE with the loss factor ( η ).
Table 1. Summary of key notations.
| Notation | Definition |
| --- | --- |
| ℳ, 𝒩, 𝒦, 𝒯 | The sets of devices, BSs, UAVs, and time slots |
| M, N, K, T | The number of devices, BSs, UAVs, and time slots |
| λ_m(t), s_m(t), e_m(t), t_m(t) | The arrival rate, data size, required number of CPU cycles per unit of data, and latency constraint of the computation subtask of device m in time slot t |
| L_n(t), T_n(t), D_n^res(t) | The electrical load, communication traffic, and fixed standby time of BS n in time slot t |
| R_{m,n}(t), R_{m,k}(t), R_{m,sat}(t) | Wireless communication data rates in time slot t between device m and BS n, UAV k, and the LEO satellite, respectively |
| B_{m,n}(t), B_{m,k}(t) | Bandwidth allocated to device m by BS n and UAV k in time slot t |
| t_m^{l,wait}(t), t_k^{u,wait}(t), t^{s,wait}(t) | Queuing delays of subtasks computed locally, offloaded to the UAV, and offloaded to the satellite for execution in time slot t |
| f_m(t), f_{m,n}(t), f_{m,k}(t), f_{m,sat}(t) | Computational resources allocated to device m by the device itself, BS n, UAV k, and the LEO satellite, respectively, in time slot t |
| f_m^max, f_n^max, f_k^max, f_sat^max | The total computational resources of device m, BS n, UAV k, and the LEO satellite, respectively |
| p_m, κ | Device upload power and the effective switched capacitance of the CPU |
| v_k(t), θ_k(t), Ψ_k(t), v_max, H | The average flying velocity, flying direction angle, horizontal position coordinates, maximum speed, and altitude of UAV k, respectively |
| D_min, α_0, β_0 | The safety distance among UAVs, the path-loss exponent, and the channel gain at unit distance |
| a_{m,k}(t), o_{k,n}(t) | Indicators of whether device m is within the communication range of UAV k and whether UAV k is within the WPT range of BS n in time slot t |
| t_m^l(t), E_m^l(t) | Subtask execution time and energy consumption for local computation in time slot t |
| t_n^{bs}(t), E_n^{bs}(t) | Subtask execution time and energy consumption for offloading to BS n in time slot t |
| t_k^{uav}(t), E_k^{uav}(t) | Subtask execution time and energy consumption for offloading to UAV k in time slot t |
| t_m^{sat}(t), E_m^{sat}(t) | Subtask execution time and energy consumption for offloading to the LEO satellite in time slot t |
| B_k^H(t), E_k^F(t) | The harvested energy and flight energy consumption of UAV k in time slot t |
| σ², ζ_{m,n}(t), ζ_{m,k}(t) | The noise power and the interference from all other BSs and UAVs |
| B_bot, B_capacity | The minimum power threshold and maximum capacity of the UAV battery, respectively |
| B_res(t), B_battery | The reserved energy in time slot t and the maximum capacity of the BS backup battery, respectively |
| ι, η, χ | The power division factor, energy-transfer loss factor, and energy conversion efficiency, respectively |
| G_n^t, G_k^r, P_n | The antenna gains of the BS transmitter and the UAV receiver, and the transmit power of BS n |
| P_0, P_1, U_tip | The blade profile power in the hovering state, induced power in the hovering state, and tip speed of the rotor blade, respectively |
| v_0, ρ, A, d_0, s | The mean rotor-induced velocity in the hovering state, air density, rotor disc area, fuselage drag ratio, and rotor solidity, respectively |
| δ_0, δ_1 | Coefficients related to the electrical load of BSs |
| ϑ_0, ϑ_1, ϑ_2 | Preference weights for latency, energy consumption, and carbon emissions, respectively |
| T(t), E_ec(t), C_RE(t) | The total task execution latency, the energy consumption for task execution and UAV flight, and the reduction in carbon emissions achieved by reducing traditional power grid energy consumption in time slot t, respectively |
| e_uav, e_sat | Energy consumption per unit of computational resource of the UAV and the LEO satellite |
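The preference weights ϑ_0, ϑ_1, ϑ_2 listed above combine the three objectives T(t), E_ec(t), and C_RE(t) into a single per-slot cost that the TD3PG agent minimizes. The following is a minimal sketch assuming a weighted-sum form; the exact cost function is defined in the paper body, and the function name and argument names here are illustrative:

```python
def step_cost(latency_s: float, energy_j: float, carbon_reduction: float,
              theta=(0.12, 0.98, -0.1)) -> float:
    """Per-slot cost as an assumed weighted sum of the three objectives.

    The negative weight on the carbon-reduction term rewards displacing
    traditional grid energy with renewable energy, so larger C_RE(t)
    lowers the cost.
    """
    t0, t1, t2 = theta
    return t0 * latency_s + t1 * energy_j + t2 * carbon_reduction
```

With the simulated preferences (0.12, 0.98, −0.1), the cost is dominated by energy consumption, while latency and carbon reduction act as secondary terms.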
Table 2. Simulation parameters.
| Parameter | Value | Parameter | Value |
| --- | --- | --- | --- |
| M, N, K | 50, 2, 3 | v_0 | 4.03 m/s |
| s_m | U[3, 8] Mbits | P_0 | 79.86 W |
| e_m | 800 cycles/bit | P_1 | 92.48 W |
| t_m | U[0.8, 1.2] s | ρ | 1.225 kg/m³ |
| p_m | 0.1 W | A | 0.503 m² |
| f_m^max, f_n^max | 0.25, 20 GHz | χ | 0.9 [30] |
| f_k^max, f_sat^max | 10, 50 GHz | s | 0.05 |
| D_min | 20 m | d_0 | 0.6 |
| v_max | 23 m/s | η | 0.1 |
| H | 100 m | G_n^t | 41 dB |
| α_0 | 2 [42] | G_k^r | 41 dB |
| σ² | 10⁻¹⁴ W | U_tip | 120 m/s |
| κ | 10⁻²⁷ J/Hz³/s | B_battery | 10⁶ J |
| e_sat | 1 W/GHz | B_capacity | 10⁴ J |
| e_uav | 1.5 W/GHz | ϑ_0, ϑ_1, ϑ_2 | 0.12, 0.98, −0.1 |
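The flight parameters P_0, P_1, U_tip, v_0, ρ, A, d_0, and s in Table 2 are the inputs of the standard rotary-wing propulsion power model. The sketch below assumes that model (the exact expression used for E_k^F(t) appears in the paper body) and plugs in the simulated values as defaults; the function name is illustrative:

```python
import math

def uav_flight_power(v, P0=79.86, P1=92.48, U_tip=120.0, v0=4.03,
                     rho=1.225, A=0.503, d0=0.6, s=0.05):
    """Assumed rotary-wing propulsion power (W) at forward speed v (m/s):
    P(v) = P0 * (1 + 3 v^2 / U_tip^2)
         + P1 * (sqrt(1 + v^4 / (4 v0^4)) - v^2 / (2 v0^2))^(1/2)
         + 0.5 * d0 * rho * s * A * v^3
    """
    blade = P0 * (1.0 + 3.0 * v ** 2 / U_tip ** 2)          # blade profile power
    induced = P1 * math.sqrt(math.sqrt(1.0 + v ** 4 / (4.0 * v0 ** 4))
                             - v ** 2 / (2.0 * v0 ** 2))    # induced power
    parasite = 0.5 * d0 * rho * s * A * v ** 3              # fuselage drag power
    return blade + induced + parasite
```

At hover (v = 0) the model reduces to P_0 + P_1 = 172.34 W, and the per-slot flight energy follows by multiplying the power at the average velocity v_k(t) by the slot duration.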

Wang, Y.; Li, B.; He, J.; Dai, J.; Liu, Y.; Yang, Y. Joint Latency-Oriented, Energy Consumption, and Carbon Emission for a Space–Air–Ground Integrated Network with Newly Designed Power Technology. Electronics 2023, 12, 3537. https://doi.org/10.3390/electronics12173537

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
