Article

Collaborative Computation Offloading and Resource Management in Space–Air–Ground Integrated Networking: A Deep Reinforcement Learning Approach

Feixiang Li, Kai Qu, Mingzhe Liu, Ning Li and Tian Sun
1 The 15th Research Institute of China Electronics Technology Group Corporation, Beijing 100083, China
2 Beijing Tsinghua Tongheng Urban Planning and Design Institute, Beijing 100085, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(10), 1804; https://doi.org/10.3390/electronics13101804
Submission received: 14 April 2024 / Revised: 30 April 2024 / Accepted: 5 May 2024 / Published: 7 May 2024
(This article belongs to the Special Issue Edge Computing for 5G and Internet of Things)

Abstract

With the increasing dissemination of the Internet of Things and 5G, mobile edge computing has become a novel scheme to assist terminal devices in executing computation tasks. To elevate the coverage and computation capability of edge computing, a collaborative computation offloading and resource management architecture is proposed for space–air–ground integrated networking (SAGIN). In this manuscript, we establish a novel model that accounts for the computation offloading cost constraints of the communication, computation and cache models in SAGIN. Specifically, the joint optimization problem of collaborative computation offloading and resource management is modeled as a mixed integer nonlinear programming problem. To address this problem, this paper proposes a computation offloading and resource allocation strategy based on deep reinforcement learning (DRL). Unlike traditional methods, DRL requires neither a well-established formulation nor prior information, and it can revise its strategy adaptively according to the environment. The simulation results demonstrate that the proposed approach achieves the optimal reward values for different numbers of terminal devices. Furthermore, this manuscript analyzes the behavior of the proposed approach under varying parameters.

1. Introduction

With the advent of the Internet of Things era and the intelligent development of applications, the number of terminal devices (TDs) is growing rapidly. Due to the rapid growth of tasks, as well as the mobility and limited resources of terminal devices, complex computation tasks are difficult to execute promptly on terminal devices. Terminal devices can therefore offload excessive computation tasks to an adjacent server, which executes the processing and returns the results, thus solving the problem of insufficient computation capacity on terminal devices. Consequently, edge computing [1,2] has emerged.
In edge computing [3], terminal devices can obtain high-quality computation services through payment. However, the number of edge servers is limited; once the number of offloading requests exceeds a threshold, the service quality of offloading is affected. Scholars have conducted extensive research on optimizing computation offloading performance [4]. Some are committed to optimizing the resource allocation of edge servers, usually by formulating computation offloading as an optimization model [5]. Due to the limitations of the wireless spectrum, there is a communication bottleneck between the terminal device and the roadside unit, which affects the quality of the computation offloading service. In addition, when there are too many requests for computation offloading services, edge servers may become overloaded, leading to offloading failure or higher service latency. Therefore, some researchers have investigated how to expand the spectrum and computing resources of edge computing, e.g., with unmanned aerial vehicles (UAVs) [6,7], to build a multi-party collaborative computation offloading mechanism. However, in the above research the network coverage and computation resources of edge computing are still greatly limited, and it is unable to provide ubiquitous computing services for terminals, especially in disaster areas, suburbs and other remote areas.
In order to enlarge the coverage and improve the capability of edge computing, terminal devices can offload computation tasks to the space–air–ground integrated network (SAGIN) [8,9]. As shown in Figure 1, SAGIN is one of the most promising network architectures: a heterogeneous network based on ground networks and supplemented by space and air networks. It mainly includes computation nodes such as ground servers, UAVs, and low Earth orbit (LEO) satellites. SAGIN is expected to provide full coverage and high-quality computing services for terminal devices. However, the movement of terminal devices and computation nodes, as well as channel uncertainty, make SAGIN a time-varying network. How to effectively manage the resources of a time-varying network is a challenging issue. In addition, SAGIN places strict requirements on the time complexity of the resource management algorithm. Traditional optimization methods for solving resource optimization problems in time-varying networks often require decoupling the original problem and repeating the solving process, leading to the interruption or failure of some delay-sensitive tasks.
Facing the shortcomings and challenges of existing research, this manuscript considers the computation offloading and resource management problem in the architecture of the space–air–ground integrated network. In order to maximize the processing capability of SAGIN, we jointly formulate offloading decisions, spectrum allocation, and computation and storage resource scheduling as a mixed integer nonlinear programming (MINLP) problem. To address this problem, this manuscript proposes a computation offloading and resource management strategy relying on deep reinforcement learning (DRL) [10,11]. DRL has the advantage of adapting its strategy to the environment, while requiring neither a well-established formulation nor prior information. Therefore, DRL is capable of making decisions quickly in the time-varying SAGIN. The main contributions of this manuscript can be summarized as follows.
(1) A computation offloading architecture for SAGIN has been proposed, where the computation tasks of terminal devices can be processed locally or offloaded to ground edge servers, air edge servers, or space edge servers. In addition, controllers empowered by deep reinforcement learning can make real-time computation offloading decisions and network resource allocation.
(2) In order to maximize the processing power of SAGIN, this paper jointly optimizes offloading decisions, spectrum allocation, and computation and storage resource management in SAGIN as a MINLP problem. To address this problem, a computation offloading and resource allocation strategy based on deep reinforcement learning is proposed. Specifically, the offloading decision is processed continuously in this strategy, which enhances the convergence of the network.
The organization of this manuscript is summarized as follows. Section 2 discusses the research related to computation offloading and resource management. Section 3 provides the system model and formulates the optimization problem. Section 4 and Section 5 detail the algorithms for solving the problem. Section 6 evaluates the performance of the algorithm through simulation experiments. Finally, Section 7 concludes the manuscript.

2. Related Works

With the support of edge computing [12], terminal devices can obtain efficient computation and data processing services. However, once the number of offloading requests exceeds a threshold, the QoS of offloading is affected. In response to this issue, scholars both domestically and internationally have conducted extensive research on optimizing computation offloading performance. To describe the differences between current edge networks and our proposed approach, the main comparison is provided in Table 1 and the following paragraphs.
To improve the quality of computation offloading in edge computing [13], the following papers studied this problem from an economic perspective [14]. Zeng et al. [15] proposed an architecture in which volunteer terminal devices cooperate with the edge server, analyzing the optimal offloading data volume of terminal devices and the service resource pricing of the edge server through Stackelberg game theory. Differently, Zhang et al. [16] proposed a cloud–edge-end collaborative computation offloading mechanism and resource pricing strategy. Zhou et al. [17] proposed a novel optimization approach to solve multi-user computation offloading and resource management in edge computing; this solution aims to minimize energy consumption while considering latency limitations. Chen et al. [18] utilized the Deep Deterministic Policy Gradient algorithm to address the challenge of computation offloading and resource management. Rather than optimizing a single performance metric, Gong et al. [19] proposed a joint optimization scheme for multiple IoT devices based on deep reinforcement learning, aiming to minimize latency and energy consumption. Chen et al. [20] proposed a signal-based incentive mechanism that utilizes contract theory to address the information asymmetry issue in computation task offloading; they also tackled the D2D pairing problem in a many-to-many scenario. Peng et al. [21] jointly considered multi-user collaborative partial offloading, transmission scheduling, and computation allocation, and proposed an online resource coordination and allocation scheme to minimize latency and energy consumption. Fang et al. [22] considered a scenario where multiple users offload tasks to the same idle user; they investigated a multi-user computation task offloading problem, intending to maximize the overall efficiency for all users in edge computing by jointly optimizing channel allocation, device pairing, and offloading modes. The studies above treat an isolated edge server as the provider of computing services. However, under a massive number of service requests, the edge server may become overloaded, which leads to task interruption or failure.
Table 1. Comparison between current research and our proposed approach.

Reference | Approach | Scenario
[15] | Stackelberg game theory | Vehicular edge computing
[16] | Stackelberg game theory and genetic algorithm-based searching algorithm | Architecture of the vehicles, the edge servers, and the cloud
[17] | Value iteration-based reinforcement learning | Computation offloading and resource allocation in mobile edge computing
[18] | Deep deterministic policy gradient algorithm | Computation offloading and resource allocation in mobile edge computing
[19] | Deep reinforcement learning | Multi-access edge computing in Industrial Internet of Things
[20] | Signal-based incentive mechanism | Device-to-device computation offloading
[21] | Online resource coordination and allocation scheme | Device-to-device computation offloading
[22] | A potential game approach | Multiuser computing task offloading problem in device-enhanced MEC
[23] | Deep deterministic policy gradient algorithm | Multi-access edge computing and unmanned aerial vehicle
[24] | Soft actor critic algorithm | Drone-assisted multi-access edge computing
[25] | Deep reinforcement learning approach | Computation offloading and resource allocation in aerial to ground network
[26] | Primal decomposition approach | Air-to-ground communication and computation scenario
[27] | Edge-embedded lightweight algorithm | Distributed edge–cloud collaborative framework for UAV object detection
[28] | Q-learning based iterative algorithm | Task offloading to unmanned aerial vehicle swarm
[29] | Cooperative resource allocation approach | Computation-intensive Industrial Internet of Things applications
[30] | Difference-of-convex programming algorithm | Flexible deployment of UAVs
Our proposed approach | Deep reinforcement learning | Collaborative computation offloading and resource management in space–air–ground integrated networking
In an effort to expand the computing resources of edge networks, the authors of the following papers used drones to assist terminal devices in computation offloading. Peng et al. proposed a UAV-assisted edge computing network architecture for computation offloading; this research jointly optimized computation offloading decisions and resource allocation to maximize the number of device computing tasks via the DDPG algorithm [23]. Furthermore, a soft actor–critic algorithm was adopted to improve it [24]. Seid et al. [25] proposed a UAV-assisted emergency collaborative computation offloading and resource management method in air–ground networking, which considered resource limitations while minimizing task latency and energy consumption. Zhou et al. [26] investigated a UAV-oriented computation offloading problem in which the UAV completes its computation task demands with the help of ground edge computing facilities. Yuan et al. [27] proposed a novel UAV edge and cloud collaborative framework for object detection, which attempts to detect moving targets in a timely and accurate manner. Ma et al. [28] designed a novel task-offloading framework in which vehicles' computation jobs can be executed natively or offloaded to UAVs and edge computing devices. Liu et al. [29] established the processor resource and energy consumption models for their task-offloading approach. Dinh et al. [30] proposed a communication method that utilizes the flexible deployment of UAVs and their cooperative transmissions to improve access in the network. Although the above manuscripts utilized adjacent vehicles, UAVs or cloud servers to reduce the load on the edge server, the coverage of edge networks is still limited and may not be able to provide a ubiquitous computation service for terminal devices.

3. System Model

3.1. Architecture of Space–Air–Ground Integrated Networking

As shown in Figure 2, the offloading scenario in SAGIN is composed of a cellular base station on the ground, an unmanned aerial vehicle in the air, a low Earth orbit satellite in space and the terminal devices of users. In SAGIN, tasks of terminal devices can be processed locally or offloaded as a whole to the ground edge server, air edge server or space edge server for execution. To achieve the separation of the control plane and the data plane in this network architecture, SAGIN is divided into a ground layer, air layer and space layer based on the Software Defined Networking architecture, with each layer managed by a controller. In addition, the delay in resource coordination between nodes in the same layer is not considered, and resources in different layers are considered independent of each other. TDs are able to choose different computation offloading mechanisms according to their respective demands. Specifically, in this network architecture there are three offloading modes for terminal devices, i.e., device-to-ground, device-to-air and device-to-space. The corresponding communication models are provided as follows.

3.2. Wireless Communication Model

(1) Task offloading mechanism to ground. In this mechanism, $w_{m,g}$ is designated as the channel bandwidth between TD $m$ ($m \in \mathcal{M}$) and ground edge server $g$, and $p_{m,g}$ represents the transmission power. In addition, $\sigma^2$ is the constant additive noise power, $h_{m,g}$ represents the channel gain and $k_{m,g} \in \{0,1\}$ is the interference factor. Accordingly, the data transmission rate $r_{m,g}$ in this mechanism can be established as
$$r_{m,g} = w_{m,g} \log_2\left(1 + \frac{p_{m,g}|h_{m,g}|^2}{\sigma^2 + \sum_{m=1}^{M} k_{m,g}\, p_{m,g} |h_{m,g}|^2}\right).$$
From the above model, let $\alpha_{m,g}$ denote the access fee charged to TD $m$ by the ground edge server, and let $\beta_{m,g}$ represent the usage cost of spectrum paid by the ground edge server. Thus, the communication income can be established as $R^{comm}_{m,g} = \alpha_{m,g} r_{m,g} - \beta_{m,g} w_{m,g}$, where $\alpha_{m,g} r_{m,g}$ is the income of the ground edge server from user $m$, and $\beta_{m,g} w_{m,g}$ is the cost the ground edge server pays for the usage of bandwidth.
(2) Task offloading mechanism to air. Similarly, the channel bandwidth is denoted by $w_{m,a}$ and $p_{m,a}$ indicates the transmission power. Specifically, $\sigma^2$ is the constant additive noise power and $h_{m,a}$ expresses the channel gain. Thus, the data transmission rate $r_{m,a}$ is established as
$$r_{m,a} = w_{m,a} \log_2\left(1 + \frac{p_{m,a}|h_{m,a}|^2}{\sigma^2 + \sum_{m=1}^{M} k_{m,a}\, p_{m,a} |h_{m,a}|^2}\right),$$
where $k_{m,a} \in \{0,1\}$ is the interference factor between TD $m$ and air edge server $a$.
Assume that $\alpha_{m,a}$ indicates the access fee paid by TD $m$, and that the usage cost of spectrum is denoted $\beta_{m,a}$, which is paid by air edge server $a$. The communication income can be established as $R^{comm}_{m,a} = \alpha_{m,a} r_{m,a} - \beta_{m,a} w_{m,a}$, where $\alpha_{m,a} r_{m,a}$ denotes the revenue of air edge server $a$ from user $m$, and $\beta_{m,a} w_{m,a}$ is the cost air edge server $a$ pays for the usage of bandwidth.
(3) Task offloading mechanism to space. Similar to the task offloading mechanism to ground, the channel bandwidth is expressed by $w_{m,s}$, and $p_{m,s}$ is the transmission power. Moreover, $\sigma^2$ stands for the constant additive noise power and $h_{m,s}$ denotes the channel gain. Accordingly, the data transmission rate $r_{m,s}$ is established as
$$r_{m,s} = w_{m,s} \log_2\left(1 + \frac{p_{m,s}|h_{m,s}|^2}{\sigma^2 + \sum_{m=1}^{M} k_{m,s}\, p_{m,s} |h_{m,s}|^2}\right),$$
where $k_{m,s} \in \{0,1\}$ is the interference factor between TD $m$ and space edge server $s$.
Let $\alpha_{m,s}$ denote the access fee paid by TD $m$, and let $\beta_{m,s}$ denote the usage cost of spectrum, which is paid by space edge server $s$. The communication revenue can be established as $R^{comm}_{m,s} = \alpha_{m,s} r_{m,s} - \beta_{m,s} w_{m,s}$, where $\alpha_{m,s} r_{m,s}$ denotes the revenue of space edge server $s$ from user $m$, and $\beta_{m,s} w_{m,s}$ indicates the cost space edge server $s$ pays for the usage of bandwidth.
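To make the communication model concrete, the following minimal Python sketch evaluates the transmission rate and communication income for one TD and one edge server. The function names and numeric values are illustrative assumptions of ours, not part of the paper.

```python
import math

def transmission_rate(bandwidth_hz, tx_power, channel_gain, noise_power, interference):
    """Shannon-type rate of the model above: r = w * log2(1 + p|h|^2 / (sigma^2 + I))."""
    sinr = tx_power * abs(channel_gain) ** 2 / (noise_power + interference)
    return bandwidth_hz * math.log2(1.0 + sinr)

def communication_income(alpha, beta, rate, bandwidth_hz):
    """R^comm = alpha * r - beta * w (access fee income minus spectrum usage cost)."""
    return alpha * rate - beta * bandwidth_hz

# Hypothetical numbers for a single TD offloading to the ground edge server.
rate = transmission_rate(bandwidth_hz=1e6, tx_power=0.2, channel_gain=1e-3,
                         noise_power=1e-9, interference=5e-10)
income = communication_income(alpha=1.5, beta=2e-4, rate=rate, bandwidth_hz=1e6)
print(f"rate = {rate/1e6:.2f} Mbps, communication income = {income:.1f} units")
```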

3.3. Computation Model

In this scenario, the set of TDs is expressed as $\mathcal{M} = \{1, 2, \ldots, M\}$, and $J_m = \{A_m, A_m', B_m, T_m^{max}\}$ represents the task from TD $m$. Here, $A_m$ and $A_m'$ indicate the data volume of task $J_m$ before and after computation, $B_m$ denotes the number of CPU cycles required by task $J_m$, and $T_m^{max}$ represents the maximum tolerable delay of task $J_m$. Task $J_m$ chooses one of the following mechanisms based on its demand.
(1) Computation model in the ground edge server. The computation delay at the ground edge server is formulated as $T^{c}_{m,g} = \frac{B_m}{C_{m,g}}$, where $C_{m,g}$ denotes the computation capacity of the ground edge server.
The computation rate of task $m$ at the ground edge server is formulated as
$$q_{m,g} = \frac{A_m}{B_m / C_{m,g}} = \frac{A_m C_{m,g}}{B_m},$$
The computation energy consumption is expressed as $e_{m,g} = \varpi_{m,g} T^{c}_{m,g}$, where $\varpi_{m,g}$ stands for the energy consumption per second of the ground edge server.
Suppose that $\phi_{m,g}$ stands for the computation fee charged to user $m$ by the ground edge server, and that $\varphi_{m,g}$ is the price the ground edge server pays to compute the task. The computation income model can then be written as $R^{comp}_{m,g} = \phi_{m,g} q_{m,g} - \varphi_{m,g} e_{m,g}$, where $\phi_{m,g} q_{m,g}$ represents the revenue of the ground edge server from user $m$, and $\varphi_{m,g} e_{m,g}$ is the cost the ground edge server pays for the usage of its servers.
(2) Computation model in the air edge server. In the same way, the computation delay at air edge server $a$ for task $J_m$ is denoted as $T^{c}_{m,a} = \frac{B_m}{C_{m,a}}$, where $C_{m,a}$ is the computation capacity of air edge server $a$.
The computation rate of task $m$ at air edge server $a$ is established as
$$q_{m,a} = \frac{A_m}{B_m / C_{m,a}} = \frac{A_m C_{m,a}}{B_m},$$
Moreover, the computation energy consumption at air edge server $a$ is $e_{m,a} = \varpi_{m,a} T^{c}_{m,a}$, where $\varpi_{m,a}$ stands for the energy consumption per second of air edge server $a$.
Assume the computation fee charged by air edge server $a$ is expressed as $\phi_{m,a}$, and the computation cost $\varphi_{m,a}$ is the price for computing the task at air edge server $a$. Eventually, the computation income model can be formulated as $R^{comp}_{m,a} = \phi_{m,a} q_{m,a} - \varphi_{m,a} e_{m,a}$, where $\phi_{m,a} q_{m,a}$ indicates the income of air edge server $a$ from user $m$, while $\varphi_{m,a} e_{m,a}$ is the cost air edge server $a$ pays for the usage of its servers.
(3) Computation model in the space edge server. In the same way, the computation delay at space edge server $s$ for task $J_m$ is denoted as $T^{c}_{m,s} = \frac{B_m}{C_{m,s}}$, where $C_{m,s}$ is the computation capacity of space edge server $s$.
The computation rate of task $m$ at space edge server $s$ is formulated as
$$q_{m,s} = \frac{A_m}{B_m / C_{m,s}} = \frac{A_m C_{m,s}}{B_m},$$
Specifically, the computation energy consumption at space edge server $s$ is $e_{m,s} = \varpi_{m,s} T^{c}_{m,s}$, where $\varpi_{m,s}$ is the energy consumption per second of space edge server $s$.
Suppose the computation fee charged by space edge server $s$ is expressed as $\phi_{m,s}$, and the computation cost $\varphi_{m,s}$ is the price for computing the task at space edge server $s$. Finally, the computation income model can be formulated as $R^{comp}_{m,s} = \phi_{m,s} q_{m,s} - \varphi_{m,s} e_{m,s}$, where $\phi_{m,s} q_{m,s}$ indicates the income of space edge server $s$ from user $m$, while $\varphi_{m,s} e_{m,s}$ is the cost space edge server $s$ pays for the usage of its servers.
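The three per-tier computation models share the same structure, so a single helper suffices to illustrate them. The sketch below is ours; the parameter values are hypothetical and only show how delay, rate, energy and income interact.

```python
def computation_metrics(task_cycles, task_bits, capacity_hz, energy_per_sec,
                        comp_fee, energy_price):
    """Delay, rate, energy and income of Section 3.3 for one edge server:
    T^c = B/C, q = A*C/B, e = varpi*T^c, R^comp = phi*q - varphi*e."""
    delay = task_cycles / capacity_hz                  # T^c_{m,x}
    comp_rate = task_bits * capacity_hz / task_cycles  # q_{m,x}
    energy = energy_per_sec * delay                    # e_{m,x}
    income = comp_fee * comp_rate - energy_price * energy
    return delay, comp_rate, energy, income

# Hypothetical task: 1 Mbit of input data, 10^9 CPU cycles, offloaded to an air edge server.
delay, q, e, r_comp = computation_metrics(task_cycles=1e9, task_bits=1e6,
                                          capacity_hz=5e9, energy_per_sec=2.0,
                                          comp_fee=0.4, energy_price=0.1)
print(delay, q, e, r_comp)
```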

3.4. Cache Model

In this model, we consider $D$ contents demanded at the different edge servers. The caching strategy is determined by the parameter $x$: $x = 0$ denotes that the content is not cached in the edge server, whereas $x = 1$ indicates that the content is cached.
Specifically, the content popularity distribution is expressed as $G = \{g_1, g_2, \ldots, g_D\}$, where $D$ is the maximal number of content types, and each TD asks for content $d$ with probability $g_d$. $G$ follows the Zipf distribution [31] and can be formulated as
$$g_d = \frac{1/d^{\epsilon}}{\sum_{d=1}^{D} 1/d^{\epsilon}},$$
where the content popularity skewness is characterized by $\epsilon$, whose range is set as $[0.5, 1.5]$ [32]. Then, the caching gain is formulated as
$$l_{A_m} = \frac{g_{A_m} A_m}{T_{A_m}},$$
where $T_{A_m}$ is the download delay of the cached content. The prices for caching the content have already been introduced above. The backhaul cost is indicated as $\gamma_{m,g}$, which is paid by the ground edge server, and $\psi_{m,g}$ represents the storage fee charged by the ground edge server to cache the content $A_m$. In conclusion, the caching revenue of the ground edge server can be formulated as $R^{cache}_{m,g} = \psi_{m,g} l_{A_m} - \gamma_{m,g} A_m$, where $\psi_{m,g} l_{A_m}$ indicates the income of the ground edge server from TD $m$, and $\gamma_{m,g} A_m$ is the cost the ground edge server pays for the usage of backhaul bandwidth.
Let $\gamma_{m,a}$ stand for the backhaul cost of air edge server $a$, and let the storage fee at air edge server $a$ be denoted as $\psi_{m,a}$. The caching income of air edge server $a$ can then be computed as $R^{cache}_{m,a} = \psi_{m,a} l_{A_m} - \gamma_{m,a} A_m$, where $\psi_{m,a} l_{A_m}$ is the revenue of air edge server $a$ from TD $m$, and $\gamma_{m,a} A_m$ is the cost air edge server $a$ pays for the usage of backhaul bandwidth.
Similarly, the caching revenue of space edge server $s$ can be computed as $R^{cache}_{m,s} = \psi_{m,s} l_{A_m} - \gamma_{m,s} A_m$, where $\psi_{m,s} l_{A_m}$ is the income of space edge server $s$ from TD $m$, and $\gamma_{m,s} A_m$ denotes the cost space edge server $s$ pays for the usage of backhaul bandwidth.
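The cache model can be sketched in a few lines of Python. Note that the caching gain is read here as $l_{A_m} = g_{A_m} A_m / T_{A_m}$, which is our reconstruction of the gain formula above; all numbers are illustrative.

```python
import numpy as np

def zipf_popularity(num_contents, epsilon):
    """Zipf popularity: g_d = (1/d^eps) / sum_d 1/d^eps."""
    ranks = np.arange(1, num_contents + 1)
    weights = 1.0 / ranks ** epsilon
    return weights / weights.sum()

def caching_income(popularity, content_bits, download_delay, storage_fee, backhaul_cost):
    """Caching gain l = g*A/T and income R^cache = psi*l - gamma*A."""
    gain = popularity * content_bits / download_delay
    return storage_fee * gain - backhaul_cost * content_bits

g = zipf_popularity(num_contents=50, epsilon=1.0)   # epsilon drawn from [0.5, 1.5]
r_cache = caching_income(popularity=g[0], content_bits=1e6,
                         download_delay=0.05, storage_fee=10.0, backhaul_cost=1e-6)
print(g[:5], r_cache)
```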

3.5. Problem Formulation

(1) Wireless access constraint. The allocated bandwidth should not exceed the total bandwidth of the wireless link for each of the three offloading mechanisms, where the maximal bandwidths of the three schemes are assumed to take the same form. The constraints are expressed as $C1: \sum_{m=1}^{M} w_{m,g} F\{x_m=1\} \leq w_{max}^{ground}$; $C2: \sum_{m=1}^{M} w_{m,a} F\{x_m=2\} \leq w_{max}^{air}$; and $C3: \sum_{m=1}^{M} w_{m,s} F\{x_m=3\} \leq w_{max}^{space}$.
(2) Computation resource constraint. The offloaded tasks should not exceed the maximal computation capacity of each edge server. Therefore, the constraints are established as $C4: \sum_{m=1}^{M} C_{m,g} F\{x_m=1\} \leq C_{max}^{ground}$, $C5: \sum_{m=1}^{M} C_{m,a} F\{x_m=2\} \leq C_{max}^{air}$, $C6: \sum_{m=1}^{M} C_{m,s} F\{x_m=3\} \leq C_{max}^{space}$.
(3) Energy consumption constraint. The energy consumption of the tasks should not surpass the maximal threshold. Therefore, the constraints are $C7: \sum_{m=1}^{M} e_{m,g} F\{x_m=1\} \leq E_{max}^{ground}$, $C8: \sum_{m=1}^{M} e_{m,a} F\{x_m=2\} \leq E_{max}^{air}$, $C9: \sum_{m=1}^{M} e_{m,s} F\{x_m=3\} \leq E_{max}^{space}$.
(4) Computation offloading scheme constraint. Considering that in this architecture the task of a TD can be offloaded to a ground node, air node or space node, the offloading scheme is constrained as $F\{x_m\} \in \{1, 2, 3\}$, where the values represent offloading to the ground node, air node and space node, respectively.
(5) Problem formulation model. The target of this manuscript is to maximize the processing capability of SAGIN, which is equivalent to maximizing the computation offloading revenue. Considering the offloading decision and the communication, computation and spectrum resources, the TD computation offloading problem in SAGIN can be modeled as a mixed integer nonlinear programming (MINLP) problem. According to the above models and constraints, the problem of computation offloading and resource management in SAGIN is formulated as
$$\begin{aligned}
\max \quad & \sum_{g=1}^{G}\sum_{a=1}^{A}\sum_{s=1}^{S} \left( R_m^{comm} + R_m^{comp} + R_m^{cache} \right), \\
\text{s.t.} \quad
& C1: \sum_{m=1}^{M} w_{m,g} F\{x_m=1\} \leq w_{max}^{ground}; \quad
  C2: \sum_{m=1}^{M} w_{m,a} F\{x_m=2\} \leq w_{max}^{air}; \quad
  C3: \sum_{m=1}^{M} w_{m,s} F\{x_m=3\} \leq w_{max}^{space}; \\
& C4: \sum_{m=1}^{M} C_{m,g} F\{x_m=1\} \leq C_{max}^{ground}; \quad
  C5: \sum_{m=1}^{M} C_{m,a} F\{x_m=2\} \leq C_{max}^{air}; \quad
  C6: \sum_{m=1}^{M} C_{m,s} F\{x_m=3\} \leq C_{max}^{space}; \\
& C7: \sum_{m=1}^{M} e_{m,g} F\{x_m=1\} \leq E_{max}^{ground}; \quad
  C8: \sum_{m=1}^{M} e_{m,a} F\{x_m=2\} \leq E_{max}^{air}; \quad
  C9: \sum_{m=1}^{M} e_{m,s} F\{x_m=3\} \leq E_{max}^{space}; \\
& C10: F\{x_m\} \in \{1, 2, 3\}.
\end{aligned}$$
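For intuition, the sketch below checks constraints C1–C10 for a candidate offloading decision vector and evaluates the summed income. It is a toy illustration with hypothetical budgets, not the solver used in the paper.

```python
import numpy as np

def feasible(decisions, w, c, e, w_max, c_max, e_max):
    """Check C1-C10 for a candidate decision vector.

    decisions[m] in {1, 2, 3} selects ground/air/space (C10); w, c, e hold the
    bandwidth, computation and energy requested by each TD at its chosen server.
    """
    for tier, key in enumerate(("ground", "air", "space"), start=1):
        mask = decisions == tier
        if (w[mask].sum() > w_max[key] or c[mask].sum() > c_max[key]
                or e[mask].sum() > e_max[key]):
            return False
    return True

def total_reward(r_comm, r_comp, r_cache):
    """Objective of the MINLP: summed communication, computation and caching income."""
    return float(np.sum(r_comm + r_comp + r_cache))

# Toy instance with 4 TDs and hypothetical resource budgets.
decisions = np.array([1, 1, 2, 3])
w = np.array([2e6, 3e6, 1e6, 1e6]); c = np.array([1e9, 2e9, 1e9, 1e9]); e = np.ones(4)
print(feasible(decisions, w, c, e,
               dict(ground=1e7, air=5e6, space=5e6),
               dict(ground=1e10, air=5e9, space=5e9),
               dict(ground=10, air=5, space=5)))
print(total_reward(np.ones(4), 2 * np.ones(4), 0.5 * np.ones(4)))
```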

4. Deep Reinforcement Learning Approach

4.1. Reinforcement Learning Algorithm

During its interaction with the environment, a reinforcement learning algorithm [33] attempts to seek the best behavior through trial and error. The optimal behavior not only pays attention to the immediate income, but also considers the income of the following steps. The revenue of the optimal behavior can be represented as
$$V^{\pi}(s_i) = r_i + \gamma r_{i+1} + \gamma^2 r_{i+2} + \cdots,$$
The Q-learning algorithm is one of the representative value-based reinforcement learning algorithms. The state–action value, named the Q value, stands for the expected revenue when action $a$ is adopted in state $s$. Specifically, the Q value is an evaluation of the state and action, composed of the immediate benefit and the discounted future benefit. It is formulated as
$$Q(s_i, a_i) \leftarrow r_i + \gamma V^{\pi}(s_{i+1}),$$
where $\gamma$ ($0 < \gamma < 1$) is the discount coefficient, which reflects the impact of future revenue on current behavior. The aim of the Q-learning algorithm is to maximize the system utility. Let $O_i$ substitute for $r_i$, and replace $V^{\pi}(s_{i+1})$ with $\max_{a_{i+1} \in A} Q(s_{i+1}, a_{i+1})$; then we obtain
$$Q(s_i, a_i) \leftarrow O_i + \gamma \max_{a_{i+1} \in A} Q(s_{i+1}, a_{i+1}),$$
The balance between exploration and exploitation is a vital issue in reinforcement learning. In particular, when the system state space is large, how the action is selected directly affects the convergence speed and performance of the algorithm. Based on the Q value and the behavior index $Index(s, a)$, this manuscript adopts a comprehensive action evaluation scheme to obtain a well-performing action.
When the system is in state $s_i$, the algorithm chooses the action $a_i$ according to the following equation:
$$a_i \leftarrow \arg\max_{a \in A} \left( Q(s_i, a) + Index(s_i, a) \right),$$
where $Q$ evaluates the state and action. On the basis of the Q value, the behavior index value is selected to maximize the behavior benefit, which can be represented as
$$Index(s_i, a) = \zeta \sqrt{\frac{2 \ln n}{T_i(n)} \min\left\{\frac{1}{4}, v_i(n)\right\}},$$
where $\zeta$ is a constant larger than zero, $T_i(n)$ represents the number of times action $a_i$ has been selected after $n$ actions, and $v_i(n)$ is a deviation factor that reflects the volatility, obtained by introducing the variance of the behavioral utility value $\sigma_i^2(n)$:
$$\sigma_i^2(n) = \frac{1}{T_i(n)} \sum_{t=1}^{T_i(n)} r_i^2(t) - O_i^2(T_i(n)),$$
$$v_i(n) = \sigma_i^2(n) + \sqrt{\frac{2 \ln n}{T_i(n)}},$$
On the one hand, the behavior selection mechanism based on the index $Index(s_i, a)$ gradually favors behaviors with greater utility, reflecting exploitation. On the other hand, if a behavior has not been selected, or has been selected very few times as the iterations increase, it tends to be chosen in subsequent selections, reflecting exploration.
After the execution behavior is determined, the relay node executes behavior $a_i$. Moreover, the utility value $O_i$ is obtained and the Q value is updated according to the following equation:
$$Q_{t+1}(s_i, a_i) = \begin{cases} (1-\alpha)\, Q_i(s_i, a_i) + \alpha \left( O_i + \gamma \max Q_i(s_{i+1}, a_{i+1}) \right), & s = s_i \ \text{and} \ a = a_i, \\ Q_i(s_i, a_i), & \text{otherwise,} \end{cases}$$
where $\alpha$ ($0 < \alpha \leq 1$) is the learning factor for the state–action pair, formulated as $\alpha = \frac{1}{1 + T_i(n)}$. The concrete execution process of the Q-learning algorithm is shown in Algorithm 1.
Algorithm 1 Q-learning Algorithm
  • Input: state s, action a;
  • Output: Q(s, a);
  • Initialization:
  • Initialize the behavior visit counts T_i(n) = 0, the state–action values Q(s_i, a_i) = 0, and the state–action query table;
  • for episode1 = 1, ..., I_1 do
  •    Initialize the action vector a = a_1, a_2, ...
  •    Observe the current state s_1
  •    for episode2 = 1, ..., I_te do
  •      if episode2 = 1 then
  •         select a random action a_i;
  •      end if
  •      if episode2 > 1 then
  •         Index(s_i, a) ← ζ sqrt( (2 ln n / T_i(n)) · min{1/4, v_i(n)} );
  •         Choose the action according to a_i ← argmax_a [Q(s_i, a) + Index(s_i, a)].
  •      end if
  •      Carry out action a_i in this mechanism, then obtain the reward value O_i and the next state s_{i+1};
  •      Compute α ← 1 / (1 + T_i(n));
  •      Update Q(s_i, a_i);
  •      Update the state–action query table;
  •    end for
  • end for
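A compact Python sketch of Algorithm 1 is given below. The exploration index follows a UCB-tuned form, which is our reading of the index and variance formulas above; the class and method names are ours.

```python
import math
from collections import defaultdict

class IndexQLearning:
    """Tabular Q-learning with the index-based action selection of Algorithm 1."""
    def __init__(self, actions, gamma=0.9, zeta=1.0):
        self.actions, self.gamma, self.zeta = actions, gamma, zeta
        self.Q = defaultdict(float)          # Q(s, a)
        self.T = defaultdict(int)            # visit counts T_i(n)
        self.sum_r = defaultdict(float)      # running sums for the empirical variance
        self.sum_r2 = defaultdict(float)
        self.n = 0                           # total number of selections

    def index(self, s, a):
        t = self.T[(s, a)]
        if t == 0:
            return float("inf")              # force each action to be tried once
        mean = self.sum_r[(s, a)] / t
        var = max(0.0, self.sum_r2[(s, a)] / t - mean ** 2)  # sigma_i^2(n), clipped at 0
        v = var + math.sqrt(2 * math.log(self.n) / t)        # v_i(n)
        return self.zeta * math.sqrt(2 * math.log(self.n) / t * min(0.25, v))

    def act(self, s):
        # a_i <- argmax_a [Q(s_i, a) + Index(s_i, a)]
        self.n += 1
        return max(self.actions, key=lambda a: self.Q[(s, a)] + self.index(s, a))

    def update(self, s, a, reward, s_next):
        self.T[(s, a)] += 1
        self.sum_r[(s, a)] += reward
        self.sum_r2[(s, a)] += reward ** 2
        alpha = 1.0 / (1.0 + self.T[(s, a)])                 # learning factor
        target = reward + self.gamma * max(self.Q[(s_next, a2)] for a2 in self.actions)
        self.Q[(s, a)] = (1 - alpha) * self.Q[(s, a)] + alpha * target

# Usage: agent = IndexQLearning(actions=[1, 2, 3]); a = agent.act(s); agent.update(s, a, r, s_next)
```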
In the following, we provide the convergence analysis of this algorithm, where the optimal Q value is represented as $Q^*(s_i, a_i)$.
Lemma 1.
For the iteration of the Q value with bounded revenue $O$, let the learning factor $\alpha$ ($0 < \alpha \leq 1$) satisfy
$$\sum_{i=1}^{\infty} \alpha_{T_i(n)} = \infty, \qquad \sum_{i=1}^{\infty} \alpha_{T_i(n)}^2 < \infty, \qquad \forall s, a.$$
Then, as $T_i(n) \to \infty$, $\forall s, a$, it can be obtained that
$$\lim_{i \to \infty} Q_i(s_i, a_i) = Q^*(s_i, a_i).$$
Proof. 
Firstly, the initial function is defined as $Q_0(s_i, a_i)$; the optimal value $Q^*(s_i, a_i)$ can then be approached by updating with Equation (17) for $s_i \in S$, $a_i \in A$.
For the functions $Q^*(s_i, a_i)$, $O(s_i, a_i)$ and $Q_0(s_i, a_i)$, the constants $\varepsilon, \eta, \vartheta, \zeta$ and $\gamma$ ($0 < \gamma < 1$) should satisfy the following conditions:
$$\varepsilon\, O(s_i, a_i) \leq \gamma \max Q^*(s_{i+1}, a_{i+1}) \leq \eta\, O(s_i, a_i),$$
$$\vartheta\, Q^*(s_i, a_i) \leq Q_0(s_i, a_i) \leq \zeta\, Q^*(s_i, a_i),$$
where $0 < \varepsilon \leq \eta < \infty$ and $0 \leq \vartheta \leq \zeta < \infty$. Since the optimal value is unknown, the values of $\varepsilon, \eta, \vartheta, \zeta$ cannot be obtained directly. Therefore, we verify that for all such constants $\varepsilon, \eta, \vartheta, \zeta$, $Q(s_i, a_i)$ converges to the optimal solution after iterations.
If $0 \leq \vartheta \leq \zeta < 1$, then for $i = 0, 1, \ldots$, the reward function $Q_i(s_i, a_i)$ should satisfy the following condition:
$$\left(1 + \frac{\vartheta - 1}{(1 + \eta^{-1})^i}\right) Q^*(s_i, a_i) \leq Q_i(s_i, a_i) \leq \left(1 + \frac{\zeta - 1}{(1 + \varepsilon^{-1})^i}\right) Q^*(s_i, a_i),$$
Equation (22) can be verified by mathematical induction as follows.
When $i = 0$, it can be obtained that
$$\begin{aligned} Q_1(s_i, a_i) &= O(s_i, a_i) + \gamma \max Q_0(s_{i+1}, a_{i+1}) \\ &\geq O(s_i, a_i) + \vartheta \gamma \max Q^*(s_{i+1}, a_{i+1}) \\ &\geq \left(1 + \frac{\eta(\vartheta - 1)}{1 + \eta}\right) O(s_i, a_i) + \gamma \left(\vartheta - \frac{\vartheta - 1}{1 + \eta}\right) \max Q^*(s_{i+1}, a_{i+1}) \\ &= \left(1 + \frac{\eta(\vartheta - 1)}{1 + \eta}\right) \left[ O(s_i, a_i) + \gamma \max Q^*(s_{i+1}, a_{i+1}) \right] \\ &= \left(1 + \frac{\vartheta - 1}{1 + \eta^{-1}}\right) Q^*(s_i, a_i). \end{aligned}$$
Similarly, it can be inferred that
$$Q_1(s_i, a_i) \leq \left(1 + \frac{\zeta - 1}{1 + \varepsilon^{-1}}\right) Q^*(s_i, a_i),$$
Then, when $i = 0$, Equation (22) is satisfied.
Assume that Equation (22) holds when $i = l - 1$, $l = 1, 2, \ldots$. Then, when $i = l$, it can be obtained that
$$\begin{aligned} Q_{l}(s_i, a_i) &= O(s_i, a_i) + \gamma \max Q_{l-1}(s_{i+1}, a_{i+1}) \\ &\geq O(s_i, a_i) + \gamma \left(1 + \frac{\eta^{l-1}(\vartheta - 1)}{(1 + \eta)^{l-1}}\right) \max Q^*(s_{i+1}, a_{i+1}) \\ &\geq \left(1 + \frac{\eta^{l}(\vartheta - 1)}{(1 + \eta)^{l}}\right) \left[ O(s_i, a_i) + \gamma \max Q^*(s_{i+1}, a_{i+1}) \right] \\ &= \left(1 + \frac{\vartheta - 1}{(1 + \eta^{-1})^{l}}\right) Q^*(s_i, a_i), \end{aligned}$$
Similarly, it can be inferred that
$$Q_{l}(s_i, a_i) \leq \left(1 + \frac{\zeta - 1}{(1 + \varepsilon^{-1})^{l}}\right) Q^*(s_i, a_i),$$
Therefore, Equation (22) is satisfied for $i = 0, 1, 2, \ldots$.
Next, we prove that when $0 \leq \vartheta \leq 1 \leq \zeta < \infty$, the following condition is satisfied:
$$\left(1 + \frac{\vartheta - 1}{(1 + \eta^{-1})^i}\right) Q^*(s_i, a_i) \leq Q_i(s_i, a_i) \leq \left(1 + \frac{\zeta - 1}{(1 + \varepsilon^{-1})^i}\right) Q^*(s_i, a_i),$$
Let $i = 0$; then we obtain
$$\begin{aligned} Q_1(s_i, a_i) &= O(s_i, a_i) + \gamma \max Q_0(s_{i+1}, a_{i+1}) \\ &\leq O(s_i, a_i) + \zeta \gamma \max Q^*(s_{i+1}, a_{i+1}) + \frac{\zeta - 1}{1 + \eta} \left[ \eta\, O(s_i, a_i) - \gamma \max Q^*(s_{i+1}, a_{i+1}) \right] \\ &= \left(1 + \frac{\zeta - 1}{1 + \eta^{-1}}\right) Q^*(s_i, a_i), \end{aligned}$$
Thus, when $0 \leq \vartheta \leq \zeta < \infty$, the reward function satisfies Equation (22) for $i = 0, 1, 2, \ldots$.
Finally, based on the conclusions above, for arbitrary constants $\varepsilon, \eta, \vartheta, \zeta$ and according to Equations (22)–(28), Equation (19) holds as $i \to \infty$.    □

4.2. Deep Reinforcement Learning Algorithm

As shown in Algorithm 2, the task of the deep Q-network is to exploit a feedforward neural network to approximate the Q-value function $Q(s, a; \theta)$. Specifically, the input of this Q-network is the state $s$, and the output gives the action $a$ for state $s$. The loss function measures the distance between the trained model and the actual model; the general goal is to minimize the loss function and thereby continuously optimize the model. The parameter $\theta$ is trained to minimize the loss function
$$L(\theta) = \mathbb{E}\left[ \left( y(s, a, s'; \hat{\theta}) - Q(s, a; \theta) \right)^2 \right],$$
where the target value $y(s, a, s'; \hat{\theta}) = R + \lambda \max_{a'} Q(s', a'; \hat{\theta})$ changes as the parameter $\hat{\theta}$ is updated.
Algorithm 2 Deep Q-learning Algorithm
  • Input: state s, action a;
  • Output: Q(s, a);
  • Initialization:
  • Initialize the deep Q-network with weights θ;
  • for i < T do
  •    if random probability P < δ then
  •       select a random action a_i;
  •    else
  •        a_i = argmax_a Q(s, a; θ);
  •    end if
  •    Carry out action a_i in this mechanism, then obtain the reward R_i and the next state s_{i+1};
  •    Compute the target Q-value
  •     y(s, a, s'; θ̂) = R + λ max Q(s', a'; θ̂);
  •    Update the deep Q-network by minimizing the loss L(θ) in (29);
  • end for
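The following PyTorch sketch mirrors Algorithm 2 with the layer sizes of Section 6.1 (200 and 120 hidden neurons). It is a generic DQN skeleton under our own naming, not the authors' released code; the target network and replay memory follow standard practice.

```python
import copy
import random
from collections import deque
import torch
import torch.nn as nn

class DQNAgent:
    """Minimal deep Q-network sketch for Algorithm 2."""
    def __init__(self, state_dim, num_actions, lr=0.005, gamma=0.9,
                 epsilon=0.1, memory_size=1024, batch_size=128):
        self.q_net = nn.Sequential(nn.Linear(state_dim, 200), nn.ReLU(),
                                   nn.Linear(200, 120), nn.ReLU(),
                                   nn.Linear(120, num_actions))
        self.target_net = copy.deepcopy(self.q_net)      # provides y(s, a, s'; theta_hat)
        self.optimizer = torch.optim.Adam(self.q_net.parameters(), lr=lr)
        self.memory = deque(maxlen=memory_size)
        self.gamma, self.epsilon, self.batch_size = gamma, epsilon, batch_size
        self.num_actions = num_actions

    def act(self, state):
        # epsilon-greedy selection (the random-probability branch in Algorithm 2)
        if random.random() < self.epsilon:
            return random.randrange(self.num_actions)
        with torch.no_grad():
            return int(self.q_net(torch.as_tensor(state, dtype=torch.float32)).argmax())

    def store(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def train_step(self):
        if len(self.memory) < self.batch_size:
            return
        batch = random.sample(self.memory, self.batch_size)
        s, a, r, s2 = zip(*batch)
        s = torch.as_tensor(s, dtype=torch.float32)
        a = torch.as_tensor(a, dtype=torch.int64).unsqueeze(1)
        r = torch.as_tensor(r, dtype=torch.float32)
        s2 = torch.as_tensor(s2, dtype=torch.float32)
        q = self.q_net(s).gather(1, a).squeeze(1)
        with torch.no_grad():
            y = r + self.gamma * self.target_net(s2).max(dim=1).values  # target Q-value
        loss = nn.functional.mse_loss(q, y)                              # loss L(theta) of (29)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

    def sync_target(self):
        self.target_net.load_state_dict(self.q_net.state_dict())
```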

5. Computation Offloading and Resource Management Approach Relying on Deep Reinforcement Learning

In the SAGIN scenario, the access channel conditions, computation capabilities and storage conditions vary dynamically. It is difficult for a traditional mechanism to find an optimal solution in time. By comparison, DRL needs neither a well-established formulation nor prior information, and it can revise its strategy adaptively according to the environment. Thus, we exploit deep Q-learning to discover the optimal action efficiently, as displayed in Figure 3. Furthermore, different strategies are determined according to whether the task is offloaded to space, air or ground. Specifically, if the offloaded task is cached, the caching model is considered to save computation delay; otherwise, the offloaded task is computed directly.
State Space. The state of the edge servers and the available cache $d \in D$ for TD $m \in \mathcal{M}$ in timeslot $t$ is determined by the random variables $a^{comm}$, $a^{comp}$ and $a^{cache}$.
Action Space. In the SAGIN architecture, the agent should determine where to offload each task given the limited resources, and whether or not the offloaded task has been cached in the server.
Correspondingly, the current action $a_m(t)$ at timeslot $t$ is formulated as
$$a_m(t) = \{ a_m^{comm}(t), a_m^{comp}(t), a_m^{cache}(t) \},$$
where $a_m^{comm}(t)$, $a_m^{comp}(t)$, and $a_m^{cache}(t)$ are established as follows.
First, we define the row vector $a_m^{comm}(t) = [a_{m,1}^{comm}(t), a_{m,2}^{comm}(t), \ldots, a_{m,N}^{comm}(t)]$, where $a_{m,i}^{comm}(t)$, $i \in \{1, 2, \ldots, N\}$, denotes which SAGIN edge server TD $m$ is connected to. The value of $a_{m,i}^{comm}(t)$ is in $\{1, 2, 3\}$, indicating that at timeslot $t$ the task of TD $m$ is offloaded to the ground, air or space, respectively. Similarly, the action $a_m^{comp}(t)$ is defined as $a_m^{comp}(t) = [a_{m,1}^{comp}(t), a_{m,2}^{comp}(t), \ldots, a_{m,N}^{comp}(t)]$.
Then, the row vector $a_m^{cache}(t) = [a_{m,1}^{cache}(t), a_{m,2}^{cache}(t), \ldots, a_{m,N}^{cache}(t)]$ is defined, where $a_{m,j}^{cache}(t)$, $j \in \{1, 2, \ldots, N\}$, indicates whether the content of TD $m$ has been cached. Specifically, the value of $a_{m,j}^{cache}(t)$ is in $\{0, 1\}$, where $a_{m,j}^{cache}(t) = 0$ represents that the content is not cached at timeslot $t$, and $a_{m,j}^{cache}(t) = 1$ denotes that it is cached.
Reward function. The reward of the edge servers in the SAGIN architecture is to jointly maximize the revenue of the communication model, the computation model and the caching model. The reward function for TD $m$ is formulated as
$$R_m(t) = R_m^{comm}(t) + R_m^{comp}(t) + R_m^{cache}(t),$$
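Putting the three revenue terms together, the per-TD reward defined above can be sketched as follows; the dictionary layout and all parameter values are our own illustrative assumptions.

```python
def step_reward(action, env):
    """Per-TD reward: R_m(t) = R^comm + R^comp + R^cache.

    `action` packs the offloading target (1/2/3 for ground/air/space) and a cache
    flag; `env` is a dict of the per-tier prices and channel/compute quantities
    introduced in Section 3 (names here are illustrative).
    """
    tier = ("ground", "air", "space")[action["target"] - 1]
    p = env[tier]
    r_comm = p["alpha"] * p["rate"] - p["beta"] * p["bandwidth"]
    r_comp = p["phi"] * p["comp_rate"] - p["varphi"] * p["energy"]
    r_cache = (p["psi"] * p["cache_gain"] - p["gamma"] * p["content_bits"]
               if action["cached"] else 0.0)
    return r_comm + r_comp + r_cache

env = {t: dict(alpha=1.5, beta=2e-4, rate=5e6, bandwidth=1e6, phi=0.4, varphi=0.1,
               comp_rate=5e6, energy=0.4, psi=10.0, gamma=1e-6,
               cache_gain=2e5, content_bits=1e6) for t in ("ground", "air", "space")}
print(step_reward({"target": 2, "cached": True}, env))
```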

6. Simulation Result

In this section, the experimental environment of the computation offloading and resource management approach relying on deep reinforcement learning in SAGIN is introduced. We implemented this approach in PyCharm with Python 3.6.13. The simulation parameters are provided in the first subsection, and the analysis of the experimental results is provided in the second subsection, where the impact of the DNN's parameters is also discussed.

6.1. Parameter Setting

Based on Ref. [12] and common settings in wireless networking, we set up the parameter values of this experiment as shown in Table 2. In the following, the parameters of the deep neural network (DNN), a vital part of the deep Q-learning mechanism, are described. The network architecture of the DNN consists of an input layer, two hidden layers, and an output layer. In detail, the first hidden layer has 200 hidden neurons and the second has 120; the size of the output layer equals the number of TDs. To verify the effect of the TD number, the learning rate is set to 0.005 and the memory size to 1024. Moreover, the training interval is set to 10, and the training batch size is 128.
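As a usage illustration, these hyperparameters can be collected into a configuration and fed to the DQNAgent sketch given after Algorithm 2. The state dimension of 3 × TD_number is our own assumption, since the paper does not state it explicitly.

```python
# Hyperparameters from Section 6.1, reusing the DQNAgent sketch given after Algorithm 2.
config = dict(lr=0.005, memory_size=1024, batch_size=128, training_interval=10)

td_number = 20
agent = DQNAgent(state_dim=3 * td_number,   # assumed: comm/comp/cache state entries per TD
                 num_actions=td_number,     # output layer size equals the TD number (Section 6.1)
                 lr=config["lr"], memory_size=config["memory_size"],
                 batch_size=config["batch_size"])

for step in range(1, 1001):
    # ... interact with the SAGIN environment here to collect (s, a, r, s') ...
    if step % config["training_interval"] == 0:
        agent.train_step()                  # train every `training_interval` steps
```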

6.2. Simulation Result Discussion

The impact of different parameters on the simulation results is analyzed in the following experiments. First, we analyze the convergence of the proposed approach in SAGIN. Figure 4 and Figure 5 depict the normalized reward value and the training loss of the DQN with TD_number = 20. Figure 4 shows that the reward value increases rapidly with the number of iterations and then converges to the optimal value. From Figure 5, the training loss decreases rapidly within the first 50 iterations and then remains stable. In conclusion, the proposed mechanism can find the appropriate optimal solution owing to its strategy.
To verify the performance of the DQN for computation offloading and resource management in SAGIN, different TD numbers are chosen. In this simulation, the TD number is set to 5, 10 and 20 in Figure 6, and it is clear that all result curves eventually converge to the optimal value. Specifically, as the TD number increases, the reward value degrades owing to the larger computation load. In the case of TD_number = 20, the reward value reaches a stable solution after 400 iterations, while in the case of TD_number = 5, the proposed approach achieves the optimal value at about iteration 100. In conclusion, as the TD number increases, the signaling overhead rises noticeably and the reward value is affected accordingly.
As can be seen in Figure 6, the proposed DQN approach is capable of converging to the optimum value rapidly within 100 iterations and finally obtaining the optimal reward value for different TD numbers. As shown in Figure 7, Figure 8, Figure 9 and Figure 10, the proposed DQN approach achieves satisfying reward values with different parameters. In the following, we discuss the influence of four critical DQN parameters on the experimental results, i.e., learning rate, training interval, batch size and memory size. The learning rate dominates the learning process of the training model, which affects the convergence speed. The batch size has a great influence on the convergence degree and speed; a suitable batch size can prevent the algorithm from falling into a local optimum. Similarly, the training interval and memory size are also two vital parameters of the DNN.
Figure 7 describes the impact of the learning rate on the DNN, which is set to 0.005, 0.01, 0.03, 0.05, 0.077, 0.099 and 0.1. This figure demonstrates that the learning rate has an important effect on the efficiency of the proposed approach. Specifically, in the case of learning rate = 0.03, the reward value obtained by the DRL algorithm performs better than the others during the whole iteration. The case of learning rate = 0.077 does not match the curve of learning rate = 0.03, but surpasses the other cases. In particular, learning rate = 0.1 has the worst performance among the different cases.
Figure 8 shows how the reward value changes against iterations with different training intervals, where the training interval is set to 5, 10, 15 and 20. Obviously, when the training interval is 5 or 15, the reward value generated by DRL has a relatively better performance during the whole stage. In the case of training interval = 20, the reward value performs poorly in the final stage.
As displayed in Figure 9, the batch size influences the simulation results; it is set to 32, 64, 128 and 256. In detail, when the batch size equals 32, it achieves the best performance after 400 iterations, even compared with 64, 128 and 256. The reward value of batch size = 64 generated by the DRL algorithm does not perform well in the first stage; nevertheless, it grows swiftly and achieves the best performance when the iteration number reaches 1000. Batch size = 128 performs slightly worse than the case of batch size = 64, and batch size = 256 does not perform well.
Figure 10 provides the reward value changes against iterations with different memory sizes. Specifically, the memory size in the simulation experiment is set to 128, 256, 512 and 1024. It is obvious that when the memory size is set to 128, the reward value acquired by the DQN performs best during the whole stage. Furthermore, when memory size = 1024, the reward value lags far behind the other three schemes.
To address the system performance, we provide simulation results on the average delay and average normalized throughput for different TD numbers, set to 5, 10 and 20. The numerical results of the average delay are 0.0056, 0.0111 and 0.2316, and the average normalized throughputs are 0.976, 0.951 and 0.849, respectively. From Figure 11, it can be observed that when the TD number equals 20, the average delay is the largest and the average normalized throughput performs the worst. Through analysis, as the TD number increases, the overall transmission efficiency of this network system decreases correspondingly.

7. Conclusions

With the support of edge computing, terminal devices can offload computation tasks to the edge server to obtain high-quality computing services. However, the coverage of existing edge networks is limited, which makes it difficult to provide ubiquitous computing services for terminal devices. In order to further expand the coverage of edge networks and improve their computation capability, this manuscript proposed a computation offloading and resource management architecture in SAGIN. In this architecture, in order to maximize the processing capability of SAGIN, the joint optimization problem of offloading decisions, spectrum, computation and storage resource management was modeled as an MINLP problem. To address this problem, this manuscript proposed a computation offloading and resource management mechanism relying on deep reinforcement learning. In this mechanism, the offloading decision was processed continuously, enhancing the convergence of the network. At the same time, a continuous reward function was designed to avoid high rewards caused by allocating too many resources to terminal devices. From the simulation results, the proposed DQN approach is capable of converging to the optimum value rapidly within 100 iterations and finally obtaining the optimal reward value for different TD numbers. Furthermore, the simulation results analyzed the impact of different parameters on our proposed mechanism, and the proposed DQN approach achieves satisfying reward values under different parameter settings.

Author Contributions

Conceptualization, F.L. and T.S.; methodology, K.Q.; software, M.L.; validation, N.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data is unavailable due to privacy.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lin, H.; Zeadally, S.; Chen, Z.; Labiod, H.; Wang, L. A survey on computation offloading modeling for edge computing. J. Netw. Comput. Appl. 2020, 169, 102781. [Google Scholar] [CrossRef]
  2. Mao, Y.; You, C.; Zhang, J.; Huang, K.; Letaief, K.B. A survey on mobile edge computing: The communication perspective. IEEE Commun. Surv. Tutor. 2017, 19, 2322–2358. [Google Scholar] [CrossRef]
  3. Luo, Q.; Hu, S.; Li, C.; Li, G.; Shi, W. Resource scheduling in edge computing: A survey. IEEE Commun. Surv. Tutor. 2021, 23, 2131–2165. [Google Scholar] [CrossRef]
  4. Zhou, H.; Wang, Z.; Zheng, H.; He, S.; Dong, M. Cost Minimization-Oriented Computation Offloading and Service Caching in Mobile Cloud-Edge Computing: An A3C-Based Approach. IEEE Trans. Netw. Sci. Eng. 2023, 10, 1326–1338. [Google Scholar] [CrossRef]
  5. Li, K.; Wang, X.; He, Q.; Ni, Q.; Yang, M.; Dustdar, S. Computation Offloading for Tasks With Bound Constraints in Multiaccess Edge Computing. IEEE Internet Things J. 2023, 10, 15526–15536. [Google Scholar] [CrossRef]
  6. Dai, X.; Xiao, Z.; Jiang, H.; Lui, J.C.S. UAV-Assisted Task Offloading in Vehicular Edge Computing Networks. IEEE Trans. Mob. Comput. 2023, 23, 2520–2534. [Google Scholar] [CrossRef]
  7. Subburaj, B.; Jayachandran, U.M.; Arumugham, V.; Suthanthira Amalraj, M.J.A. A Self-Adaptive Trajectory Optimization Algorithm Using Fuzzy Logic for Mobile Edge Computing System Assisted by Unmanned Aerial Vehicle. Drones 2023, 7, 266. [Google Scholar] [CrossRef]
  8. He, J.; Cheng, N.; Yin, Z.; Zhou, C.; Zhou, H.; Quan, W.; Lin, X.H. Service-Oriented Network Resource Orchestration in Space-Air-Ground Integrated Network. IEEE Trans. Veh. Technol. 2024, 73, 1162–1174. [Google Scholar] [CrossRef]
  9. Liu, L.; Li, C.; Zhao, Y. Machine Learning Based Interference Mitigation for Intelligent Air-to-Ground Internet of Things. Electronics 2023, 12, 248. [Google Scholar] [CrossRef]
  10. Han, D.; Mulyana, B.; Stankovic, V.; Cheng, S. A Survey on Deep Reinforcement Learning Algorithms for Robotic Manipulation. Sensors 2023, 23, 3762. [Google Scholar] [CrossRef]
  11. Danino, T.; Ben-Shimol, Y.; Greenberg, S. Container Allocation in Cloud Environment Using Multi-Agent Deep Reinforcement Learning. Electronics 2023, 12, 2614. [Google Scholar] [CrossRef]
  12. Sadiki, A.; Bentahar, J.; Dssouli, R.; En-Nouaary, A.; Otrok, H. Deep reinforcement learning for the computation offloading in MIMO-based Edge Computing. Ad Hoc Netw. 2023, 141, 103080. [Google Scholar] [CrossRef]
  13. Wu, G.; Wang, H.; Zhang, H.; Zhao, Y.; Yu, S.; Shen, S. Computation Offloading Method Using Stochastic Games for Software-Defined-Network-Based Multiagent Mobile Edge Computing. IEEE Internet Things J. 2023, 10, 17620–17634. [Google Scholar] [CrossRef]
  14. Li, F.; Fang, C.; Liu, M.; Li, N.; Sun, T. Intelligent Computation Offloading Mechanism with Content Cache in Mobile Edge Computing. Electronics 2023, 12, 1254. [Google Scholar] [CrossRef]
  15. Zeng, F.; Chen, Q.; Meng, L.; Wu, J. Volunteer Assisted Collaborative Offloading and Resource Allocation in Vehicular Edge Computing. IEEE Trans. Intell. Transp. Syst. 2020, 22, 3247–3257. [Google Scholar] [CrossRef]
  16. Zhang, Z.; Zeng, F. Efficient Task Allocation for Computation Offloading in Vehicular Edge Computing. IEEE Internet Things J. 2023, 10, 5595–5606. [Google Scholar] [CrossRef]
  17. Zhou, H.; Jiang, K.; Liu, X.; Li, X.; Leung, V.C. Deep Reinforcement Learning for Energy-efficient Computation Offloading in Mobile-edge-computing. IEEE Internet Things J. 2022, 9, 1517–1530. [Google Scholar] [CrossRef]
  18. Chen, J.; Huan, L.; Zhi, W.; Xu, L.; Tao, T. A DRL Agent for Jointly Optmizing Computation Offloading and Resource Allocation in MEC. IEEE Internet Things J. 2021, 8, 17508–17524. [Google Scholar] [CrossRef]
  19. Gong, Y.; Yao, H.; Wang, J.; Li, M.; Guo, S. Edge Intelligence Driven Joint Offloading and Resource Allocation for Future 6G Industrial Internet of Things. IEEE Trans. Netw. Sci. Eng. 2022; early access. [Google Scholar] [CrossRef]
  20. Chen, M.; Wang, H.; Han, D.; Chu, X. Signaling-based incentive mechanism for d2d computation offloading. IEEE Internet Things J. 2022, 9, 4639–4649. [Google Scholar] [CrossRef]
  21. Peng, J.; Qiu, H.; Cai, J.; Xu, W.; Wang, J. D2d-assisted multi-user cooperative partial offloading, transmission scheduling and computation allocating for MEC. IEEE Trans. Wirel. Commun. 2021, 20, 4858–4873. [Google Scholar] [CrossRef]
  22. Fang, T.; Yuan, F.; Ao, L.; Chen, J. Joint task offloading, D2D pairing, and resource allocation in device-enhanced MEC: A potential game approach. IEEE Internet Things J. 2022, 9, 3226–3237. [Google Scholar] [CrossRef]
  23. Peng, H.X.; Shen, X.S. DDPG-based Resource Management for MEC/UAV Assisted Vehicular Networks. In Proceedings of the 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall), Victoria, BC, Canada, 16–18 December 2020; pp. 1–6. [Google Scholar]
  24. Peng, H.X.; Wu, H.; Shen, X.S. Edge Intelligence for Multi-dimensional Resource Management in Aerial Assisted Vehicular Networks. IEEE Wirel. Commun. 2021, 28, 59–65. [Google Scholar] [CrossRef]
  25. Seid, A.M.; Boateng, G.O.; Anokye, S.; Kwantwi, T.; Sun, G.; Liu, G. Collaborative Computation Offloading and Resource Management in Multi-UAV assisted IoT Networks:a deep reinforcement learning approach. IEEE Wirel. Commun. 2021, 28, 59–65. [Google Scholar]
  26. Zhou, J.; Tian, D.; Sheng, Z.; Duan, X.; Shen, X. Joint Mobility, Communication and Computation Optimization for UAVs in Air-Ground Cooperative Networks. IEEE Trans. Veh. Technol. 2021, 70, 2493–2507. [Google Scholar] [CrossRef]
  27. Yuan, Y.; Gao, S.; Zhang, Z.; Wang, W.; Xu, Z.; Liu, Z. Edge-Cloud Collaborative UAV Object Detection: Edge-Embedded Lightweight Algorithm Design and Task Offloading Using Fuzzy Neural Network. IEEE Trans. Cloud Comput. 2024, 12, 306–318. [Google Scholar] [CrossRef]
  28. Ma, X.; Su, Z.; Xu, Q.; Ying, B. Edge Computing and UAV Swarm Cooperative Task Offloading in Vehicular Networks. In Proceedings of the International Wireless Communications and Mobile Computing, Dubrovnik, Croatia, 30 May–3 June 2022; pp. 955–960. [Google Scholar]
  29. Liu, J.; Li, G.; Huang, Q.; Bilal, M.; Xu, X.; Song, H. Cooperative Resource Allocation for Computation-Intensive IIoT Applications in Aerial Computing. IEEE Internet Things J. 2023, 10, 9295–9307. [Google Scholar] [CrossRef]
  30. Dinh, P.; Nguyen, T.M.; Sharafeddine, S.; Assi, C. Joint Location and Beamforming Design for Cooperative UAVs With Limited Storage Capacity. IEEE Trans. Commun. 2019, 67, 8112–8123. [Google Scholar] [CrossRef]
  31. Li, J.; Chen, H.; Chen, Y.; Lin, Z.; Vucetic, B.; Hanzo, L. Pricing and Resource Allocation via Game Theory for a Small-Cell Video Caching System. IEEE J. Sel. Areas Commun. 2016, 34, 2115–2129. [Google Scholar] [CrossRef]
  32. Jin, Y.; Wen, Y.; Westphal, C. Optimal Transcoding and Caching for Adaptive Streaming in Media Cloud: An Analytical Approach. IEEE Trans. Circuits Syst. Video Technol. 2015, 25, 1914–1925. [Google Scholar] [CrossRef]
  33. Huang, L.; Bi, S.; Zhang, Y.J.A. Deep Reinforcement Learning for Online Computation Offloading in Wireless Powered Mobile-Edge Computing Network. IEEE Trans. Mob. Comput. 2020, 19, 2581–2593. [Google Scholar] [CrossRef]
Figure 1. Architecture of edge computing in space–air–ground integrated networking.
Figure 2. Offloading scenarios of edge computing in space–air–ground integrated networking.
Figure 3. Deep reinforcement learning for computation offloading and resource management in SAGIN.
Figure 4. Convergence change of DQN for computation offloading and resource management in SAGIN.
Figure 5. Training loss of DQN for computation offloading and resource management in SAGIN.
Figure 6. TD number impact on DQN-based mechanism in SAGIN.
Figure 7. Learning rate impact on DQN-based mechanism in SAGIN.
Figure 8. Training interval impact on DQN-based mechanism in SAGIN.
Figure 9. Batch size impact on DQN-based mechanism in SAGIN.
Figure 10. Memory size impact on DQN-based mechanism in SAGIN.
Figure 11. Analysis on average delay and average normalized throughput in SAGIN.
Table 2. Parameter settings.

System Parameters | Value Setting
The access fee charged by ground edge server | a random value in [1, 2] units/bps
The access fee charged by air edge server | a random value in [3, 4] units/bps
The access fee charged by space edge server | a random value in [5, 7] units/bps
The usage cost of spectrum paid by ground edge server | [1 × 10^−4, 2 × 10^−4] units/Hz
The usage cost of spectrum paid by air edge server | [3 × 10^−4, 4 × 10^−4] units/Hz
The usage cost of spectrum paid by space edge server | [5 × 10^−4, 7 × 10^−4] units/Hz
The computation fee charged by ground edge server | 0.2 units/J
The computation fee charged by air edge server | 0.4 units/J
The computation fee charged by space edge server | 0.6 units/J
The storage fee charged by ground edge server | 10 units/byte
The storage fee charged by air edge server | 15 units/byte
The storage fee charged by space edge server | 20 units/byte

