Task Partition-Based Computation Offloading and Content Caching for Cloud–Edge Cooperation Networks

Huang, Jingjing; Yang, Xiaoping; Chen, Jinyi; Chen, Jiabao; Hu, Zhaoming; Zhang, Jie; Wang, Zhuwei; Fang, Chao

doi:10.3390/sym16070906

Open AccessArticle

Task Partition-Based Computation Offloading and Content Caching for Cloud–Edge Cooperation Networks

by

Jingjing Huang

^1,2,

Xiaoping Yang

^3,*,

Jinyi Chen

⁴,

Jiabao Chen

^3,*

,

Zhaoming Hu

^3,*,

Jie Zhang

⁴,

Zhuwei Wang

³

and

Chao Fang

^3,5,*

¹

Beijing SaiXi Technology Development Company with Limited Liability, Beijing 100176, China

²

China Electronics Standardization Institute, Beijing 100007, China

³

School of Information Technology, Beijing University of Technology, Beijing 100124, China

⁴

Beijing Institute of Astronautical Systems Engineering, Beijing 100076, China

⁵

Purple Mountain Laboratories, Nanjing 211111, China

^*

Authors to whom correspondence should be addressed.

Symmetry 2024, 16(7), 906; https://doi.org/10.3390/sym16070906 (registering DOI)

Submission received: 3 June 2024 / Revised: 11 July 2024 / Accepted: 12 July 2024 / Published: 16 July 2024

(This article belongs to the Section Computer)

Download

Browse Figures

Versions Notes

Abstract

:

With the increasing complexity of applications, many delay-sensitive and compute-intensive services have posed significant challenges to mobile devices. Addressing how to efficiently allocate heterogeneous network resources to meet the computing and delay requirements of terminal services is a pressing issue. In this paper, a new cooperative twin delayed deep deterministic policy gradient and deep-Q network (TD3-DQN) algorithm is introduced to minimize system latency by optimizing computational offloading and caching placement asynchronously. Specifically, the task-partitioning technique divides computing tasks into multiple subtasks, reducing the response latency. A DQN intelligent algorithm is presented to optimize the offloading path to edge servers by perceiving network resource status. Furthermore, a TD3 approach is designed to optimize the cached content in the edge servers, ensuring dynamic popularity content requirements are met without excessive offload decisions. The simulation results demonstrate that the proposed model achieves lower latency and quicker convergence in asymmetrical cloud–edge collaborative networks compared to other benchmark algorithms.

Keywords:

asymmetrical cloud–edge cooperation; resource allocation; task offloading; content caching; task partition; deep reinforcement learning

1. Introduction

In recent years, the demand for mobile data traffic has been increasing as emerging Internet of Things (IoT) applications, such as smart cities, healthcare, and augmented/virtual reality (AR/VR), place stringent requirements on low latency and high reliability [1,2,3]. Traditional cloud computing encounters a series of difficulties due to the distance between peripheral devices and centralization data centers with computing and storage resources [4,5,6]. To solve this problem, mobile edge computation (MEC) is proposed [7] to reduce latency for mobile users [8,9]. MEC pushes computing and storage resources to the edge of the network such as base stations (BSs) and access points (APs) to enable computation-intensive and delay-sensitive applications on resource-limited mobile devices [10]. However, MEC ignores the huge computation resources of cloud servers [11]. Both the edge and the cloud can provide computing services for mobile devices to enhance their performance [12,13]. The edge can reduce the conveying delay by providing local computing services, while the cloud can support enormous computing requirements [14]. Their cooperation can improve the utilization of computing resources and ensure the QoS, thus being critical to the asymmetrical cloud–edge cooperation computing models [15,16]. Therefore, the asymmetrical cloud–edge cooperation network computation systems, making full use of powerful resources at both the cloud and edge, will be a promising solution [17].

With the widespread adoption of various mobile applications, large amounts of data are requested repeatedly, placing a significant burden on the mobile core network and the backhaul link. Edge caching, which allows data to be stored on the mobile edge, is proposed as an effective solution to alleviate this problem [18,19]. However, edge servers face limitations in storage capacity, making it challenging to store a diverse range of content to satisfy the needs of a large number of users [20]. To address this challenge, the technique of cooperative content caching is introduced [21]. Cooperation techniques among servers ensure that directly connected BSs no longer redundantly store cached content from neighboring BSs [22]. Consequently, this approach provides users with a diverse range of data services, while also reducing the redundancy of computational offloading and execution frequency.

However, for the content not cached by the edge server, mobile users must rely on task offloading techniques to offload the remaining tasks to other edge servers or cloud servers for completion [23]. Combining computation offloading and content caching [24] to leverage the strengths of both is a primary focus of current research. Nonetheless, a notable concern in existing studies is the imbalance in resource allocation between computation offloading and content caching strategies, leading to the overloading of some edge servers or underutilization of resources [25,26]. Therefore, to address this challenge, we propose a joint optimization of computation offloading and content caching strategies [27,28] to provide high-quality services to users.

Despite the fact that existing work can produce reliable offloading and caching decisions, they have been unable to realize fine-grained task partitioning and parallel processing when processing a complete task, thus hindering the realization of rational resource allocation. To address this problem, we introduce the task-partitioning technique [29,30]. Considering the attributes of tasks and the status of network nodes, complex tasks are divided into multiple subtasks. These subtasks are then allocated proportionally to edge nodes associated with the user for parallel execution [31], thereby improving overall computational efficiency and system performance. Although task-partitioning techniques can improve the efficient use of resources, in resource-constrained network scenarios, the task-partitioning technique still has shortcomings in handling computationally intensive tasks and cannot reduce the number of repeated task offloads and executions. Therefore, task partitioning is envisioned as an effective complement to task offloading [32].

Task-offloading technology based on task partitioning can not only allocate network resources in finer granularity to improve the system’s parallel processing capability, but also reduce server load pressure and improve data access speed while reducing network data transmission [33,34]. Although existing works can effectively solve the problems of computation-intensive task partitioning and computation offloading [35], few studies focus on the joint optimization of the task partitioning, computation offloading, and content caching strategies. Therefore, we propose an optimization model based on joint task partitioning, content caching, and computation offloading. The main contributions of this paper are as follows:

We formulate a minimizing network delay model to describe the joint optimization problem for computation offloading and content caching in cloud–edge collaboration networks, where the task-partitioning technique is introduced to divide computing tasks into multiple subtasks, reducing the response latency.
For the joint optimization problem of computation offloading and content caching, we propose a cooperative twin delayed deep deterministic policy gradient and deep-Q network (TD3-DQN) algorithm. The proposed algorithm intelligently replaces the in-network contents and optimizes the computation offloading path to edge servers by observing the network caching status and network resource environments, respectively.
We evaluate the performance of the proposed solution under different environments. The simulation results show that the proposed model can achieve lower network latency and fast convergence compared to its existing benchmarks in cloud–edge cooperation networks.

The rest of this paper is organized as follows. The system model and problem are described in Section 2. We propose a DRL-based cloud–edge cooperative caching and intelligent routing solution in Section 3. Our simulation results and discussions are given in Section 4. Finally, the conclusions are given in Section 5.

2. Related Work

2.1. Intelligent Task Offloading

To meet the differentiated service requirements of these delay-sensitive and computing-intensive tasks, intelligent task offloading has been drawing great attention as a promising solution [2,3]. By deploying computation offloading and content caching functionalities at edge nodes, the tasks from mobile users can be processed at the edge of wireless networks rather than sent to the cloud to handle, improving quality of experience (QoE) and quality of service (QoS) [4,36,37].

The computational offloading model in MEC systems has been extensively studied. Leveraging the rapid response capability of edge servers, they provide users with more convenient computing services [4]. Zhu et al. [13] addressed the computational task offloading issue in drone-assisted scenarios using a multi-agent reinforcement learning algorithm, minimizing the average task response time for drones. Liu et al. [38] formulated the multi-vehicle computation offloading problem as a multi-user computation offloading game and introduced a distributed computation offloading algorithm to reduce the computational cost of vehicles. Guo et al. [39] designed a suboptimal algorithm to jointly optimize computation offloading decisions, spectrum, power, and computing resource allocation to minimize the energy consumption of all user equipment (UE). To overcome the limited computational resources of edge servers, the collaborative cloud–edge computation offloading technique is introduced, capitalizing on the powerful computing capabilities of the cloud center [16]. Chen et al. [40] proposed a dependency-aware offloading scheme, utilizing a greedy algorithm for edge–cloud collaboration under task dependency constraints, aiming to minimize the overall system latency. Zhao et al. [17] proposed a collaborative computational offloading and resource allocation optimization scheme for the cloud–MEC collaborative computation-offloading problem, enhancing system utilization and response rates. However, existing research has focused solely on the impact of computation-offloading strategies on transmission delay or energy consumption, neglecting the advantages of data caching strategies for computation offloading.

Edge caching, as a technology that stores content on edge servers close to users, has garnered widespread attention for its ability to reduce the cost of task offloading [36,41,42]. Wang et al. [18] proposed an edge caching scheme based on the concepts of content-centric networking or information-centric networking and verified its various advantages in leveraging cached content in 5G mobile networks. Zhang et al. [19] introduced a hierarchical mobile-aware edge caching scheme that employs game theory to achieve optimal collaborative caching and computing strategies, thereby minimizing content access latency and enhancing cache resource utilization. However, due to limited storage capacity, edge servers cannot cache all content, leading to the need to search for uncached request content in cloud data centers or other neighboring servers [22]. Xiao et al. [43] designed a collaborative cloud–edge–end transcoding online caching solution for IoT video services, aiming to minimize overall system latency. Kwak et al. [21] implemented a hybrid content caching design for the central cloud unit (CU) and base stations’ (BSs) control decisions using Lyapunov optimization methods, minimizing average end-to-end service latency. Su et al. [44] proposed a dynamic content caching scheme based on cross-entropy, leveraging collaborative content caching between vehicle requests and roadside units (RSUs) to minimize content transmission latency. However, these efforts often neglect the impact of computation offloading costs on the network, and the proposed caching strategies, in many cases, overlook the updating of cached content.

The combination of computation offloading and content caching [45,46] not only reduces the burden on local devices and enhances computational capabilities, but also improves data access speed and reduces data transmission costs [25,27]. Yang et al. [28] investigated a hybrid mobile cloud–edge computing system, employing a distributed algorithm based on the alternating direction method of multipliers (ADMM) to achieve near-optimal decisions for computation offloading and data caching. Zhou et al. [47] introduced an innovative computation-offloading and service-caching mechanism based on deep reinforcement learning (DRL), utilizing an asynchronous advantage actor–critic (A3C) algorithm aiming to minimize computational latency and overall system costs. Dong et al. [48] designed the hybrid cache algorithm and the enhanced offloading algorithm, providing optimal computation offloading and edge content-caching strategies, leading to a further reduction in the average response time for computation and content task requests. While existing work can generate reliable offloading and caching decisions [24], there is a lack of research for hybrid tasks’ offloading where computing tasks and content tasks are tightly coupled.

2.2. Task Partition

Although the existing work can produce reliable offloading and caching decisions, it is unable to realize finer-grained task partitioning and parallel processing when handling a complete task, thus introducing the task-partitioning technique [31]. Task partitioning involves breaking down a complete large-scale task into multiple subtasks and then distributing all these subtasks to different network servers based on their characteristics and server status [30]. Various studies have focused on different design objectives when addressing the task-partitioning problem [29,30,31,49]. Lan et al. [49] addressed the issue of partitioning and orchestrating computer vision applications on CPU and GPU heterogeneous edge computing platforms and introduced algorithms for minimum latency task scheduling and minimum cost task scheduling to minimize processing latency and overall system costs. Hu et al. [29] tackled the joint optimization problem of unmanned aerial vehicle (UAV) positioning, time slot allocation, and computation task partitioning by employing an enhanced Lagrangian active set method to reduce the system energy consumption for all users. Fang et al. [30] introduced an iterative algorithm based on bisection search (BSS) for resource allocation in scenarios with multiple users, which jointly optimized the task allocation ratio and offloaded power allocation, aiming to minimize the task completion time in multi-user scenarios. Feng et al. [31] investigated task partitioning and user association issues in MEC systems and proposed two solutions based on dual-decomposition and matching between UEs and edge nodes (ENs), resulting in a significant reduction in average delays for both independent and dependent subtasks. However, the challenge persists in determining how to finely segment tasks and allocate subtasks reasonably when dealing with a complex joint content and computing task.

2.3. Task-Partition-Based Intelligent Task Offloading

From the above research, it is evident that the task-partition-based intelligent task-offloading techniques not only enable fine-grained resource allocation, but also facilitate efficient task offloading while reducing data transmission pressure [33,50,51]. Currently, various approaches are being employed in the research domain to address the joint optimization of task partitioning and task offloading techniques [32,33,34,50]. Ku et al. [34] presented a heuristic algorithm based on dynamic programming that performs real-time task partitioning and offloading strategies, aiming to minimize end-to-end latency for vehicular applications while maximizing application-level performance. Gao et al. [33] proposed a hierarchical computation partitioning framework for DNNs, optimizing both task partitioning and offloading strategies to minimize computation latency, energy consumption, and server costs. Liu et al. [50] investigated a dynamic offloading framework in multi-user scenarios, aiming for joint optimization of task partitioning and the allocation of communication and computing resources to minimize task execution latency. Gao et al. [32] proposed a DNN task-partitioning and -offloading mechanism based on mixed-integer linear programming (MILP), aiming to optimize task partitioning and offloading strategies to achieve minimal processing latency and alleviate the computational burden on mobile devices. While existing works have effectively addressed the challenges of computation-intensive task partitioning and offloading, few studies focus on the joint optimization of task offloading, content caching, and task partition strategies.

3. System Model

In this section, we briefly introduce the network, communication, caching, and delay model and formulate a latency minimization model for the joint optimization problems of task offloading and resource allocation in task-partition-based asymmetrical cloud–edge cooperation networks.

3.1. Network Model

As illustrated in Figure 1, the asymmetrical cloud–edge cooperation networks are composed of mobile users (MUs), BSs, and a cloud, where all contents requested by MUs are available at the cloud. The network is a directed graph

G = (N, L)

, where

N

and

L

represent the sets of network nodes and links, respectively. The set of MUs is denoted by

N_{M} = {1, 2, . . ., j, . . ., N_{M}}

, the set of BSs by

N_{B} = {1, 2, . . ., i, . . . N_{B}}

, the set of MUs accessed at BS i by

M_{i} = {1, 2, . . ., M_{i}}

, the set of adjacent BSs of the i-th BS and the cloud by

B_{i} = {1, 2, . . ., B_{i}}

, and the set of slots by

T = {1, 2, . . ., T}

. The network contains a set of content tasks denoted by

F = {1, 2, . . ., F}

, where F is the number of different network contents in the cloud–edge cooperation system. In the asymmetrical cloud–edge cooperation networks, the cloud caches all network contents, and BSs equipped with MEC servers have finite caching and computing capacities, while MUs do not have caching and computing capacities. Network contents with greater diversity will deteriorate the cache hit rate of the whole system, consuming more network delay. When the node receives the content task request k, the node queries whether the requested content k is stored in its cache, and if the content k is cached in the node, it is returned to the user along the reverse routing path. If the cache does not cache content k, the request k will be forwarded to its neighboring nodes for processing until the node satisfies the task request k.

Any task f in the network can be divided into multiple independent sub-tasks for distributed processing and transmission. The set of subtasks is denoted as

F_{f} = {f_{1}, f_{2}, . . ., f_{k}, . . ., f_{F_{f}}}

, where

F_{f}

is the number of different subtasks for task f, and the size of each subtask is different. Therefore, the task f is denoted by

\begin{matrix} \sum_{f_{k} \in F_{f}} f_{k} = f, \forall f \in F . \end{matrix}

(1)

In cloud–edge cooperation systems, we assume that content popularity follows a Zipf distribution [52]. Without loss of generality, the number of different contents in the network is F, and all contents rank in descending order according to their popularity. Therefore,

R^{f}

represents the total number of MUs’ requests for content f based on the

Z i p f

distribution, which can be written as

\begin{matrix} R^{f} = \frac{f^{- α}}{\sum_{k = 1}^{F} k^{- α}}, \forall f \in F, \end{matrix}

(2)

where R is the total number of task requests in the network and

α

is the skewness factor of the Zipf law. A higher

α

value indicates that more popular content in the network is requested more often by MUs.

3.2. Communication Model

The communication model can be divided into two parts: downlink transmission and uplink transmission. To prevent interference between BSs and users, we assume that these transmissions occur on different frequency bands. For transmission between the BS and MUs under the same BS, orthogonal frequency-division multiple access (OFDMA) technology is used, so there is no interference between MUs connected to the same BS since the carrier frequencies are different. However, co-channel interference may occur between MUs connected to different BSs. The transmission of wireless links is assumed to be static in the t-th slot. In the downlink, we use

P_{i}^{m a x}

to denote the maximum transmit power of the i-th BS. Here, the transmit power profile of the i-th BS in the t-th slot is denoted as

P_{i} (t) = {P_{i 1} (t), P_{i 2} (t) . . ., P_{i j} (t), . . . P_{i M_{i}} (t)}

, and it must satisfy the following constraints:

\sum_{j \in M_{u}} P_{i j} (t) \leq P_{i}^{max}, \forall i \in N_{B}, j \in M_{u}, t \in T,

(3)

where

P_{i j} (t)

is the transmit power of the transmit link between the i-th BS and the j-th MU connected to it in the t-th slot. The constraint indicates that the sum of the transmit power after the power profile is less than or equal to the maximum transmit power of the i-th BS in the t-th slot.

In the uplink, we use

P_{j i} (t)

to denote the transmit power of the transmit link between the j-th MU under the i-th BS and the i-th BS in the t-th slot, and it must satisfy the following constraints:

P_{j i} (t) \leq P_{j}^{max} (t), \forall i \in N_{B}, j \in M_{u}, t \in T,

(4)

where

P_{j}^{m a x} (t)

is the maximum transmit power of the j-th MU under the BS i in the t-th slot. The constraint indicates that the transmit power of each MU should be less than or equal to its own maximum transmit power.

In the downlink, all signals received by the j-th MU in the t-th slot when the i-th BS transmits information to the j-th MU connected to it are as

\begin{matrix} y_{i j}^{m} (t) = & \sqrt{P_{i j}^{m}} (t) h_{i j}^{m} (t) z_{i j}^{m} (t) + \sum_{i^{'} \in N_{B} ∖ {i}} \sqrt{P_{i^{'}}^{m}} (t) h_{i^{'} j}^{m} (t) z_{i^{'} j}^{m} (t) + n_{i j}^{m}, \end{matrix}

(5)

where

h_{i j}^{m} (t)

is the channel gain between the i-th BS and the j-th MU connected to it on subcarrier m in the t-th slot.

z_{i j}^{m} (t)

is the signal of transmission of the subcarrier m between the i-th BS and the j-th MU connected to it on subcarrier m in the t-th slot.

\sqrt{P_{i j}^{m}} (t) h_{i j}^{m} (t) z_{i j}^{m} (t)

indicates that the j-th MU receives the useful signal transmitted by the local BS i.

\sum_{i^{'} \in N_{B} ∖ {i}} \sqrt{P_{i^{'}}^{m}} (t) h_{i^{'} j}^{m} (t) z_{i^{'} j}^{m} (t) + n_{i j}^{m}

indicates that the j-th MU receives interference signals transmitted by BSs other than local BS i.

n_{i j}^{m}

indicates white noise.

Thus, the transmit rate of the downlink between the i-th BS and the j-th MU connected to it on subcarrier m in the t-th slot is as

\begin{matrix} r_{i j}^{m} (t) = B_{i j}^{m} (t) l o g_{2} (1 + S I N R_{i j}^{m} (t)); \end{matrix}

(6)

\begin{matrix} S I N R_{i j}^{m} (t) = \frac{P_{i j}^{m} (t) {| h_{i j}^{m} (t) |}^{2}}{σ^{2} + \sum_{i^{'} \in N_{B} ∖ {i}} P_{i^{'}}^{m} (t) {| h_{i^{'} j}^{m} (t) |}^{2}}, \end{matrix}

(7)

where

B_{i j}^{m} (t)

is the link bandwidth of the downlink between the i-th BS and the j-th MU connected to it on subcarrier m in the t-th slot.

σ^{2}

is the power of white noise.

S I N R_{i j}^{m} (t)

is the signal-to-interference-plus-noise ratio (SINR) of the downlink between the i-th BS and the j-th MU connected to it on subcarrier m in the t-th slot.

In the uplink, all signals received by the i-th BS when the j-th MU under the i-th BS transmits information to i-th BS are as

\begin{matrix} y_{j i}^{m} (t) = & \sqrt{P_{j}^{m}} (t) h_{j i}^{m} (t) z_{j i}^{m} (t) + \sum_{i^{'} \in N_{B} ∖ {i}} \sum_{j^{'} \in M_{i^{'}}} \sqrt{P_{j^{'}}^{m}} (t) h_{j^{'} i}^{m} (t) z_{j^{'} i}^{m} (t) + n_{j i}^{m}, \end{matrix}

(8)

where

\sqrt{P_{j}^{m}} (t) h_{j i}^{m} (t) z_{j i}^{m} (t)

indicates that the i-th BS receives the useful signal transmitted by the j-th MU under the i-th BS.

n_{j i}^{m} + \sum_{i^{'} \in N_{B} ∖ {i}} \sum_{j^{'} \in M_{i^{'}}} \sqrt{P_{j^{'}}^{m}} (t) h_{j^{'} i}^{m} (t) z_{j^{'} i}^{m} (t)

indicates that the i-th BS receives the interference signals transmitted by the

j^{'}

-th under the BSs other than the local BS i.

n_{j i}

indicates white noise.

Thus, the transmit rate of the uplink between the j-th MU under i-th BS and the i-th BS on subcarrier m in the t-th slot is as

\begin{matrix} r_{j i}^{m} (t) = B_{j i}^{m} (t) l o g_{2} (1 + S I N R_{j i}^{m} (t)); \end{matrix}

(9)

\begin{matrix} S I N R_{j i}^{m} (t) = \frac{P_{j}^{m} (t) {| h_{j i}^{m} (t) |}^{2}}{σ^{2} + \sum_{i^{'} \in N_{B} ∖ {i}} \sum_{j^{'} \in M_{i^{'}}} P_{j^{'}}^{m} (t) {| h_{j^{'} i}^{m} (t) |}^{2}}, \end{matrix}

(10)

where

B_{j i}^{m} (t)

is the wireless bandwidth of the uplink between the j-th MU and the i-th BS connected to it on subcarrier m in slot t.

P_{j}^{m} (t)

is the transmitting power of the j-th MU on subcarrier m in slot t;

h_{j i}^{m} (t)

is the channel gain of the uplink between the j-th MU and the i-th BS on the subcarrier m in slot t.

In the wired communication model, we directly define the maximum link capacity of the wired link, and when the wired link transmits a content file, the link capacity is updated in real time, that is the current link capacity minus the content file size. The rate at which the content file size of the wired link transmits is the current remaining link capacity of the link, which is denoted by

r_{i i^{'}}

.

r_{i i^{'}}

indicates the transmission rate from the BS i to its adjacent processing node

i^{'}

(\forall i^{'} \in B_{i})

.

The routing process that MU j under BS i sends a task request f to BS i and obtains the corresponding content is as follows. Firstly, MU j sends all subtask requests

F_{f}

to BS i. Then, the local BS i and its adjacent processing node

i^{'}

(\forall i^{'} \in B_{i})

collaborate to process all subtask requests from MU j based on their cached content and computing resources. Finally, the corresponding content obtained by each processing node after processing its respective subtask request

f_{k}

is transmitted back to the MU j. Note that the local BS i and the adjacent processing node

i^{'}

process their respective subtask

f_{k}

requests at the same time. Here, we use

F_{i}^{f}

to denote the subtask request set of task f actually processed at BS i. It must satisfy the following constraints:

\begin{matrix} F_{f} = F_{i}^{f} \cup \sum_{i^{'} \in B_{i}} F_{i^{'}}^{f}, \forall f \in F, i \in N_{B}, i^{'} \in B_{i}, \end{matrix}

(11)

where

\sum_{i^{'} \in B_{i}} F_{i^{'}}^{f}

indicates that the total set of subtask requests of task f is actually processed by adjacent node

i^{'}

. The constraint indicates that the subtask request set of task f processed by local BS i and its adjacent processing node

i^{'}

must form a complete task request f.

3.3. Caching Model

We assume that, for content tasks, the cloud has unlimited cache capacities to store all contents. On the other hand, BSs have limited cache capacity and can only cache a part of the contents. Additionally, the same content cannot be cached between connected BSs.

S^{f_{k}}

is the size of the content block about subtask request

f_{k}

. The cache capacity of BS i is denoted by

O_{i}

. Let

X_{i}^{f_{k}} (t)

be a Boolean variable indicating whether node i caches content block

f_{k}

in slot t.

X_{i}^{f_{k}} (t) = 1

if node i caches content block

f_{k}

in slot t, and

X_{i}^{f_{k}} (t) = 0

, otherwise.

Thus, the size of all contents cached in BS i in the t-th slot must be less than or equal to the cache capacity of BS i, which can be written as

\begin{matrix} \sum_{f \in F} \sum_{f_{k} \in F_{f}} X_{i}^{f_{k}} (t) S^{f_{k}} \leq O_{i}, \forall i \in N_{B}, t \in T, \end{matrix}

(12)

where

S^{f_{k}}

represents the size of subtask

f_{k}

.

The same content cannot be cached between connected BSs in the t-th slot, which can be written as

\begin{matrix} \sum_{i^{'} \in B_{i} ∖ {c}} X_{i^{'}}^{f_{k}} (t) + X_{i}^{f_{k}} (t) \leq 1, \forall f_{k} \in F_{f}, i \in N_{B}, t \in T . \end{matrix}

(13)

Assume that the number of different contents provided by the BS i is k, and all contents rank in descending order according to content popularity. The higher ranking content will be requested more often.

3.4. Delay Model

In our system, the delay generated by the system is mainly divided into the sojourn delay generated by the processing node and the transmission delay generated by the link.

3.4.1. Sojourn Delay

Considering that the heterogeneous, dynamic, and limited network resources make it difficult to meet the differentiated service needs of mass access MUs, sojourn latency refers to the delay between the content request entering and leaving the node, and the delay in obtaining the content from

B S s

or the cloud depends mainly on the number of CPU cycles required for the subtask request. Node i handles the sojourn delay of the subtask request

f_{k}

, expressed as

T_{i}^{p, f_{k}} (t)

, which can be expressed as

T_{i}^{p, f_{k}} (t) = \frac{C_{i}^{f_{k}} (t)}{V_{i}},

(14)

where

C_{i}^{f_{k}} (t)

is the amount of CPU cycles consumed by node i to process subtask request

f_{k}

in slot t, and

V_{i}

is the maximal computing capacity of node i.

3.4.2. Transmission Delay

We assume that there exist computation and content transmission tasks in the network. The transmission delay is divided into an uplink transmission delay and a downlink transmission delay. For the computation task, the data related to the computing process from MU j are sent to BS i or the cloud node for processing, and the corresponding result is returned to MU j. The computing task occupies more uplink bandwidth. For the content transmission task, the requested content is sent back to MU j from the BS i or the cloud node, and the content transmission task consumes more downlink bandwidth.

The round-trip transmission delay between BS i and MU j to process subtask

f_{k}

on subcarrier m in slot t, denoted by

T_{i j}^{t r, m, f_{k}} (t)

, can be expressed as

T_{i j}^{t r, m, f_{k}} (t) = \frac{D^{f_{k}}}{r_{j i}^{m} (t)} + \frac{S^{f_{k}}}{r_{i j}^{m} (t)},

(15)

where

D^{f_{k}}

indicates the computed data size of the subtask

f_{k}

request.

\frac{D^{f_{k}}}{r_{j i}^{m} (t)}

indicates that the uplink transmission delay of MU j transmitting subtask

f_{k}

requests to BS i in the t-th slot.

S^{f_{k}}

indicates the content data size of the subtask

f_{k}

request.

\frac{S^{f_{k}}}{r_{i j}^{m} (t)}

indicates the downlink transmission delay of BS i transmitting the content file corresponding to subtask

f_{k}

to MU j in the t-th slot.

Similarly, the round-trip transmission delay between BS i and its adjacent cloud–edge node c to process subtask

f_{k}

in slot t, denoted by

T_{i c}^{t r, f_{k}} (t)

, can be expressed as

T_{i c}^{t r, f_{k}} (t) = \frac{D^{f_{k}}}{r_{i c} (t)} + \frac{S^{f_{k}}}{r_{c i} (t)},

(16)

where

\frac{D^{f_{k}}}{r_{i c} (t)}

indicates that the uplink transmission delay of BS i transmitting subtask

f_{k}

requests to its adjacent cloud–edge node c in the t-th slot.

\frac{S^{f_{k}}}{r_{c i} (t)}

indicates the downlink transmission delay of cloud–edge node c transmitting the content file corresponding to subtask

f_{k}

to BS i in the t-th slot.

Based on the subtask offloading process mentioned above, the total delay caused by task f from MU j at BS i, denoted by

T_{i j}^{f} (t)

, can be written as

\begin{matrix} T_{i j}^{f} (t) = T_{i j}^{t r, m, f} + max \{\sum_{f_{k} \in F_{i}^{f}} T_{i}^{p, f_{k}} X_{i}^{f_{k}} Y_{i}^{f_{k}}, max_{i^{'} \in B_{i}} [\sum_{f_{k} \in F_{i^{'}}^{f}} (T_{i i^{'}}^{t r, f_{k}} + T_{i^{'}}^{p, f_{k}}) X_{i^{'}}^{f_{k}} Y_{i^{'}}^{f_{k}}]\}, \end{matrix}

(17)

where

(T_{i i^{'}}^{t r, f_{k}} + T_{i^{'}}^{p, f_{k}})

indicates the delay caused by the BS i sending the subtask

f_{k}

request to the connected processing node

i^{'}

and obtaining the corresponding content from

i^{'}

. This delay includes the processing delay of processing node

i^{'}

and the round-trip transmission delay generated. According to the routing model, we know that the local BS i and the connected processing node

i^{'}

process their respective subtask

f_{k}

requests simultaneously. Thus, the delay determining this part of the process is the maximum delay generated between BS i and the connected processing node

i^{'}

.

Y_{i}^{f_{k}} (t)

is a Boolean variable indicating whether a subtask is locally processed. If node i locally handles subtask

f_{k}

in slot t,

Y_{i}^{f_{k}} (t) = 1

, and

Y_{i}^{f_{k}} (t) = 0

, otherwise.

3.5. Problem Formulation

In order to improve resource utilization and content distribution, we present the optimal offloading problem based on task division as the minimum delay model in the cache-assisted cloud–edge cooperation system. The model designs the cloud–edge cooperation offloading scheme and realizes the integrated allocation of 3C resources and the joint optimization of cache and routing in the network. In the system, the total delay is minimized by jointly optimizing offload decisions and resource allocation, which can be formulated as

\begin{array}{l} min & \sum_{t \in T} \sum_{i \in N_{B}} \sum_{j \in M_{u}} \sum_{f \in F} T_{i j}^{f} (t) \\ (18a) & s . t . & P_{j i} (t) \leq P_{j}^{max} (t), \forall i, \forall j, \forall t, \\ (18b) & \sum_{j \in M_{u}} P_{i j} (t) \leq P_{i}^{max} (t), \forall i, \forall t, \\ (18c) & \sum_{f_{k} \in F_{f}} f_{k} = f, \forall f, \forall k, \\ (18d) & \sum_{f \in F} \sum_{f_{k} \in F_{f}} X_{i}^{f_{k}} (t) S^{f_{k}} \leq O_{i}, \forall i, \forall t, \\ (18e) & \sum_{i^{'} \in B_{i} ∖ {c}} X_{i^{'}}^{f_{k}} (t) + X_{i}^{f_{k}} (t) \leq 1, \forall f_{k}, \forall i \in B_{i} ∖ {c, i^{'}}, \forall t, \\ (18f) & \sum_{f \in F} \sum_{f_{k} \in F_{f}} Y_{i}^{f_{k}} (t) C_{i}^{f_{k}} \leq C_{i}, \forall i, \forall t, \\ (18g) & \sum_{i^{'} \in B_{i}} Y_{i^{'}}^{f_{k}} (t) + Y_{i}^{f_{k}} (t) \leq 1, \forall f_{k}, \forall i \in B_{i} ∖ {i^{'}}, \forall t, \\ (18h) & F_{f} = F_{i}^{f} \cup \sum_{i^{'} \in B_{i}} F_{i^{'}}^{f}, \forall f, \forall i \in B_{i} ∖ {i^{'}}, \\ (18i) & f_{l_{i j}} \leq B_{l_{i j}}, \forall i, \forall j, \\ (18j) & X_{i}^{f_{k}} (t), Y_{i}^{f_{k}} (t) \in {0, 1}, \forall f_{k}, \forall i, \forall t . \end{array}

In the above constraint, (18a) indicates that each user’s transmit power should be less than or equal to its own maximum transmit power, and the sum of the transmit power after the power profile is less than or equal to the maximum transmit power of the i-th BS in the t-th slot. (18b) means that all subtasks of task f must be combined to form a complete task f, and all subtasks are independent of each other. (18c) indicates that the size of all content cached in BS i must be less than or equal to the cache capacity of BS i. (18d) means that the same content cannot be cached between adjacent BSs, that is adjacent BSs should be cooperatively cached. (18e) indicates that the total computing power required by node i to process subtasks should be less than or equal to the computing power of processing node i. (18f) indicates that the same subtask request cannot be processed repeatedly between processing nodes. (18g) indicates that the subtask request set of task f processed by the local BS i and the subtask request set of task f processed by its neighbor processing node

i^{'}

must form a complete task f request. (18h) means that the total transmit traffic

f_{l_{i j}}

generated on any link (wired or wireless) should not exceed its corresponding link bandwidth

B_{l_{i j}}

. (18i) means that all Boolean variables must be 0 or 1.

4. Task Offloading and Resource Allocation via Deep Reinforcement Learning

In order to minimize the delay in (18), the proposed joint optimization problem of task offloading and resource allocation is a sequential decision-making process that belongs to the Markov decision process. The DRL algorithm can be used for joint optimization by learning historical knowledge and establishing mappings between network states and actions through neural networks and reinforcement learning strategies, which can be defined by a tuple

\{𝒮, A, P (s_{t + 1} | s_{t}, a_{t}), R (s_{t}, a_{t})\}

.

𝒮

is the set of states to represent current cloud–edge cooperation environments.

A

is the set of all possible actions of the Markov decision process (MDP).

P (s_{t + 1} | s_{t}, a_{t})

is the probability of transforming from state

s_{t}

to state

s_{t + 1}

after performing action

a_{t}

at time t.

R (s_{t}, a_{t})

represents the reward when action

a_{t}

is performed in state

s_{t}

. Therefore, cloud–edge collaboration systems must collect much information about the network topology, user requests, node cache states, and network resources to process massive content requests. In the decision-making process of offloading, the algorithm based on DQN observes the network environment, captures the state information (network resources, user requests, etc.) in step t, outputs the optimal offloading decision, and serves the request routing in the network. Moreover, TD3 is employed to optimize caching replacement based on its property of being good at optimizing continuous variables in caching decision action.

4.1. Model Design of DQN-Enabled Task Offloading

The optimization problem of this model is an MDP, which can be defined by a tuple

\{𝒮, A, P (s_{t + 1} | s_{t}, a_{t}), R (s_{t}, a_{t})\}

. Here,

𝒮

is the set of states describing the current network environment, including the node status, link bandwidth size, transmit power size, and other network state information.

A

is the set of all possible actions in the MDP process, such as all possible node actions that can be selected in this network structure.

P (s_{t + 1} | s_{t}, a_{t})

represents the probability of transforming from state

s_{t}

to state

s_{t + 1}

after performing action

a_{t}

. The state transition equation satisfies the characteristics of a Markov process. The state of

s_{t + 1}

only depends on

s_{t}

and the action selected in that state, and has no relation to the previous state.

R (s_{t}, a_{t})

represents the reward obtained when executing action

a_{t}

under state

s_{t}

. In the MDP, the core goal is to find a way to select action

a_{t}

to obtain the maximum reward value.

The MDP can be solved using various reinforcement learning algorithms. Among them, Q-learning is a reinforcement learning algorithm that optimizes rewards by dynamically acquiring environmental state information and storing action values [53]. Its advantage is that it does not require prior knowledge or models and can learn through interaction with the environment. Each update only needs to consider the current state and action, without considering the entire state transition process. However, in the complex network environment of the cloud–edge, the Q-table will store a large number of state and action values, resulting in a high-dimensional state space and resource waste [54]. In contrast, the DQN, as a branch of DRL, can solve these problems by using deep neural networks. Therefore, the DQN can better handle complex problems than traditional Q-learning algorithms and has better convergence and performance. In this section, we propose a new task-offloading strategy based on the DQN, which aims to reduce latency consumption through collaborative caching and offloading decisions.

Figure 2 illustrates the workflow of the task-offloading algorithm based on the DQN. As shown in the figure, the DQN consists of two neural networks: an evaluation network and a target network, both having the same neural network structure, but different parameters. The evaluation network is used to output the action value Q by inputting the state

s_{t}^{f}

at time slot t. To avoid the algorithm from getting stuck in a local optimal value, we use an

ε - g r e e d y

strategy to randomly select actions with probability

ε \in (0, 1)

or random actions with probability

1 - ε

. In our DQN policy, we use backpropagation (BP) and gradient descent algorithms to update the parameters of the evaluation network, i.e., by randomly selecting a set of historical information from experience replay to calculate and minimize the loss function, thereby adjusting the relevant parameters. The loss function can be calculated as

\begin{matrix} L {(w)}^{f} = E \{{[r_{t}^{f} + γ max_{a_{t}^{f}} Q^{'} (s_{t + 1}^{f}, a_{t}^{f}; w^{-}) - Q (s_{t}^{f}, a_{t}^{f}; w)]}^{2}\}, \end{matrix}

(19)

where

r_{t}^{f}

indicates the actual reward obtained by taking action

a_{t}^{f}

,

γ

is the discount rate, w is the weight of the evaluation network, and

w^{-}

is the weight of the target network.

Q (s_{t}^{f}, a_{t}^{f}; w)

is the predicted Q-value generated by the evaluation network.

max_{a_{t}^{f}} Q^{'} (s_{t + 1}^{f}, a_{t}^{f}; w^{-})

indicates the maximum Q-value of the target network, and

r_{t} + γ max_{a_{t}^{f}} Q^{'} (s_{t + 1}^{f}, a_{t}^{f}; w^{-})

is the actual Q-value.

Finally, to improve the stability and convergence of the system, the target network copies the network parameters from the evaluation network to update its own neural network in each specific training cycle.

4.2. Task-Partition-Based Intelligent Offloading Procedure

In the network environment, the network state generated by task request f at time slot t is defined as

s_{t}^{f} = \{n_{t}, C_{n_{t}}, f, A_{n_{t}}, X_{n_{t}}, f_{l_{n_{t}, j}}, j \in A_{n_{t}}\}

, where

n_{t}

represents the current node that processes task request f;

C_{n_{t}}

represents the type of current node, which can be an MU, BS, or the cloud; f represents the task f;

A_{n_{t}}

is the set of adjacent processing nodes of the current processing node

n_{t}

;

X_{n_{t}} = (X_{n_{t}}^{1}, X_{n_{t}}^{2}, . . ., X_{n_{t}}^{F})

represents the cache status of node

n_{t}

;

f_{l_{n_{t}, j}}

represents the traffic of link

l_{n_{t}, j}

. At time slot t, the action

a_{t}^{f}

of task request f is defined as

a_{t}^{f} = \{n_{t + 1}, n_{t + 1} \in A_{n_{t}}\}

, which indicates the behavior action of the current node choosing the next node.

To obtain the optimal offloading decision, we designed a reward function that combines environmental feedback signals with optimization objectives. During the routing process, when a request is satisfied, the system receives a reward signal based on the delay. If the request is satisfied at the network edge (local BS and adjacent BS), the system will receive more rewards to reduce the content transmission delay. If the request packet is lost during the routing process, the system cannot receive rewards. At time slot t, the reward value function obtained by task request f is

\begin{matrix} r_{t}^{f} = \frac{γ}{T_{t}^{f}}, \end{matrix}

(20)

where

γ

represents the discount factor, which is used to adjust the proportional relationship between the reward value and the delay.

T_{t}^{f}

represents the total delay that users need to consume during the period from sending task request f in time slot t to receiving the corresponding content of task request f.

In the proposed DQN algorithm, the core goal of the algorithm is to find the optimal decision-making strategy through training and learning of historical data in each time slot t, so as to maximize the sum of the expected reward values of all task requests sent by all mobile users. Because the reward value is inversely proportional to the delay, when the reward value is maximized, the delay is minimized accordingly. During the pathfinding process of DQN, if the current BS can meet the service requirements of the task request f, the system immediately terminates the pathfinding process of the current task request f and returns the corresponding content of the task request f to the corresponding user node. Conversely, if the current node cannot meet the service requirements requested by the task, the system will randomly select an action according to the

ε - g r e e d y

policy, send the status information

s_{t}^{f}

to the evaluation network, and obtain the next action

a_{t}^{f}

from the evaluation network. After performing the action

a_{t}^{f}

, the system will report the corresponding reward

r_{t}^{f}

and the next state

s_{t + 1}^{f}

. At the same time, the system will also store this state information

(s_{t}^{f}, a_{t}^{f}, r_{t}^{f}, s_{t + 1}^{f})

into the experience playback pool in order to obtain data from the experience playback pool for neural network training. This cycle repeats until the task request f is satisfied or the packet is lost. Here, DQN pathfinding is performed on a task, and then, DQN pathfinding is performed for the next task request. Algorithm 1 gives the task offloading training process based on the DQN, where

N_{E}

represents the total number of time slots and

N_{s}

represents the number of requests in each time slot.

Algorithm 1: DQN-based task offloading.

4.3. Model Design of TD3-Enabled Intelligent Caching

In the intelligent caching problem based on task partition, the optimal caching updating strategy is realized by using caching history information. The cache updating model is an MDP. The agent makes the action

a_{t} \in A

decision by sensing the current state

s_{t} \in 𝒮

at time slot

t \in T

, where

𝒮

and

A

represent the state space and action space, respectively. After the agent has performed the action

a_{t}

, it is rewarded with

r_{t}

based on immediate feedback, and the environment state is transformed to

s_{t + 1}

accordingly. The Markov transition tuple

(s_{t}, a_{t}, r_{t}, s_{t + 1})

is recorded in the experience replay buffer for the agent’s training. Because the caching problem in heterogeneous scenarios is suitable for continuous action space solutions, while the Deep Q network (DQN), an important branch of the DRL algorithm, is suitable for caching decisions in discrete action space, the DQN is not considered a solution to the intelligent caching problem in heterogeneous scenarios. Therefore, considering another important branch of the DRL algorithm, TD3 is a reinforcement learning algorithm based on policy gradients that is suitable for a continuous action space in heterogeneous scenarios. Therefore, TD3 is considered as a solution to the intelligent caching problem. In order to obtain the optimal solution for the optimization problem, the state space

𝒮

, action space

A

, and reward function

r_{t}

in the proposed intelligent caching MDP model are designed as follows:

State space: The state $𝒮_{t}$ in slot t includes network caching information $C_{t}$ , user request information $F_{t}$ , and network topology information $G_{t}$ . Thus, the state vector at slot t is expressed as $𝒮_{t} = \{C_{t}, G_{t}, F_{t}\}$ , and $C_{t}$ is the content caching state of all BSs in slot t as $C_{t} = \{C_{i, t}, \forall i \in N_{B}\}$ .
Action space: In the caching decision process, the optimal caching decision action $A_{t}$ in the resource-constrained cloud–edge collaborative network contains the caching contents of all nodes in slot t. $A_{t}$ can be expressed as

$\begin{matrix} A_{t} = & [C_{1, t + 1}, C_{2, t + 1}, . . ., C_{i, t + 1}, . . ., C_{N_{B}, t + 1}], i \in N_{B}, \end{matrix}$

(21)

where $C_{i, t + 1}$ is the content caching state of BS i at in slot t + 1; the set of BSs is denoted by $N_{B} = {1, 2, . . ., i, . . . N_{B}}$ .
Reward: The design of the reward function directly affects the exploration of the caching updating problem (P1) and the convergence of the algorithm. These factors are considered in the design of the reward function in order to optimize the content cache based on meeting the user’s service requirements. Therefore, the reward function in slot t can be formulated as

$\begin{matrix} r_{t} = λ \sum_{k = 1}^{N_{s}} R_{k} + (1 - λ) \sum_{i \in N_{B}} H_{t}^{i}, \end{matrix}$

(22)

where $N_{s}$ is the maximum number of training steps; $λ$ is the weight parameter; $H_{t}^{i}$ is the number of cache hits for BS i at time slot t; $R_{k}$ represents user satisfaction at step k.

Figure 3 illustrates the framework of the proposed TD3-based content caching model. As shown in Figure 3, there are six neural networks in TD3, including the actor network, the twin critic networks, the target actor network, and the target twin critic networks. In particular, to facilitate the training of the neural networks, the target network will have the same structure as its corresponding actor or critic networks to be used as a training label for the loss function.

The TD3 algorithm mainly relies on actor and critic networks to make action decisions and evaluations, respectively. Specifically, the actor network selects the action

a_{t}

in a given state by fitting the action decision function

π (s_{t} ∣ ω_{ϕ})

for state

s_{t}

in time slot t. On the other hand, the critic network evaluates the value of the action choice in a given state by fitting the state–action value function

Q (s_{t}, a_{t} ∣ ω^{θ})

. The parameters of the actor and critic networks are denoted by

ω_{ϕ}

and

ω^{θ}

, respectively.

In order to overcome the overestimation of the Q-value, TD3 adopts the double-evaluation network strategy to estimate the actual Q-value, which uses the smaller of the two Q-values to form the targets in the Bellman error loss functions, and

y_{t}

is calculated by the following equation:

\begin{matrix} y_{t} = r_{t} + γ min \{Q [s_{t + 1}, π (s_{t} ∣ ω_{ϕ^{'}} + ε) ∣ ω^{θ_{1}^{'}}], Q [s_{t + 1}, π (s_{t} ∣ ω_{ϕ^{'}} + ε) ∣ ω^{θ_{2}^{'}}]\} \end{matrix}

(23)

where

ω_{ϕ^{'}}

and

ω^{θ^{'}}

represent the parameters of the target actor network and critic networks, and

ε

∼

N (0, ξ)

represents Gaussian noise with scale

ξ (σ)

related to the training episode

σ

. In order to enhance the exploration of the TD3 algorithm, a relatively large value of

ξ (σ)

is set in the early stage of training, and as the algorithm iterates,

ξ (σ)

gradually decreases to improve the exploitation of the algorithm.

The experience generated by the algorithm during different training steps is stored in the replay buffer to be sampled to calculate the TD-error, and then, the critic networks are updated.

y_{t}

will be used as a tag to train the two critic networks by minimizing the loss function as follows:

\begin{matrix} L (ω_{ϕ}) = \frac{1}{m} \sum_{m} {[y_{t} - Q (s_{t}, a_{t} ∣ ω^{θ})]}^{2}, \end{matrix}

(24)

where m is the batch size from the experience replay buffer. The target networks adopt a delay soft updating policy to iterate their parameters

π (s_{t} ∣ ω_{ϕ^{'}})

.

This policy refers to the fact that the TD3 algorithm updates the target actor–critic networks less frequently than the actor–critic networks to reduce the oscillation of the TD3, which can be expressed as

\begin{matrix} ω^{θ^{'}} \leftarrow τ ω^{θ} + (1 - τ) ω^{θ^{'}}; \end{matrix}

(25)

\begin{matrix} ω_{ϕ^{'}} \leftarrow τ ω_{ϕ} + (1 - τ) ω_{ϕ^{'}}, \end{matrix}

(26)

where

τ

is the update coefficient and

0 < τ ≪ 1

.

Finally, the actor network is trained by the critic network gradient:

\begin{matrix} \nabla_{ω_{ϕ}} J (ω_{ϕ}) = & \frac{1}{m} \sum_{m} \nabla_{a} Q (s_{t}, a_{t} ∣ ω^{θ_{1}}) |_{a_{t} = π (s_{t})} \nabla_{ω_{ϕ}} π (s_{t} ∣ ω_{ϕ}) . \end{matrix}

(27)

4.4. Task-Partition-Based Intelligent Caching Procedure

In the proposed process of intelligent caching based on task partition, the aim is to use the caching history information to achieve optimal caching updating, maximize the sum of the expected reward, and achieve a low-latency content distribution. As shown in Algorithm 2, the workflow of caching updating optimization based on DRL is shown to achieve the optimal node caching updating. When all nodes receive task requests in time slot t, the caching updating decision

a_{t}

is output to update the caching status of all nodes. After entering the time slot

t + 1

, all nodes receive the task request again, and the system receives the reward

r_{t}

and the next state

s_{t + 1}

. At the same time, the transition tuple

(s_{t}, a_{t}, r_{t}, s_{t + 1})

was stored in the experience replay buffer to train the six neural networks. The intelligent caching updating process terminates when the maximum number of training steps is reached.

Algorithm 2: TD3-based optimization for caching updating.

5. Simulation and Results

In this section, to evaluate the task-partition-based intelligent offloading scheme in heterogeneous cache-assisted cloud–edge cooperation environments, the simulation environment consisted of one cloud, 4 BSs, and 12 MUs. In the simulation, all solutions were evaluated from the perspectives of offloading modes, caching policies, and task partition to demonstrate the merits of our proposed model:

Offloading policy: All schemes were evaluated in different offloading modes, e.g., DRL-based offloading and the popular open shortest path first (OSPF) routing in the cache-aided cloud–edge collaboration system.
Caching policy: All schemes were evaluated in different caching policies, where two typical and efficient caching policies, denoted by “Pop” and “TD3”, were utilized in cache-enabled asymmetrical cloud–edge cooperation networks. In “Pop”, each BS cooperatively caches network contents according to the known content popularity at the beginning of the simulation and does not update its content caching state. In “TD3”, each BS adjusts its content caching state before entering the next slot according to the historical access frequencies of tasks sent by end-users.
Task partition: All schemes were evaluated by dividing tasks or not to explore the effect of task partition on offloading and content caching.

Figure 4 shows the network latency for all solutions at different cache sizes for each BS. As shown in Figure 4, the performance of all solutions improves when the cache size increases. The reason for this is that, given a larger cache size, more popular content is stored on edge nodes, shortening the transmission path for user tasks and reducing network latency for all solutions. At the same time, the increase in the cache hit ratio reduces the number of lost packets and reduces the performance gap between tasks and task-offloading schemes. As the cache size increases and the cache hit ratio increases, the performance gap between DQN intelligent routes and OSPF routes becomes less obvious. However, considering that “Pop” can achieve the best content cache based on the known content popularity, the TD3 smart cache gradually learns according to the current network cache state and the return value of previous decisions. As the cache size increases, the network cache state dimension becomes larger, and the learning effect of TD3 agents deteriorates. The TD3 smart cache strategy becomes less and less accurate, so the performance gap between solutions with “Pop” and “TD3” widens with increasing cache sizes.

Figure 5 shows the network latency for all solutions as the amount of different network content varies. As shown in Figure 5, an increase in the diversity of network content means a decrease in the number of task requests for popular content, which reduces the cache hit rate of each BS, which in turn increases the number of packet drops and increases the network latency of all solutions. At the same time, the increase in the diversity of network content causes more content to go to the cloud node for uninstallation. As a result, the link transmission burden increases and the number of packets increases. The packet loss delay of tasks is much larger than that of subtasks, so the performance gap between tasks and subtask uninstallation schemes increases. At the same time, with the increase in the diversity of network content, users send more requests to obtain unwelcome content, and the cache hit rate of each BS will gradually decreases. The DQN intelligent offloading scheme was getting better and better than the OSPF scheme, so the performance gap between them became larger. In addition, as the amount of different network content grew, the performance gap between solutions with “Pop” and “TD3” increased because the TD3 smart cache gradually learned based on the current network cache state and the return value of previous decisions, and the accuracy of the content popularity decreased as content diversity increased. As the learning effect of TD3 agents deteriorated, the accuracy of the TD3 smart cache strategy became worse and worse, so the increasing content diversity had a greater impact on the cache hit ratio of “TD3” than that of “Pop”.

Figure 6 shows the network latency for all solutions as content popularity changes. As Figure 6 shows, the performance of all solutions increased as the popularity of content increased. The reason for this is that, as the popularity of content increased, users sent more task requests to obtain the popular content, thus increasing the cache hit rate in a given BS cache size, and therefore, the network latency for all solutions decreased. At the same time, the content popularity increased, the cache hit ratio improved, the number of packet losses reduced, the utilization of task segmentation network resources was not very obvious, and the performance gap between task and subtask offloading schemes narrowed.

At the same time, the increase in content popularity improved the cache hit rate of each BS, and the performance gap between the DQN intelligent uninstall scheme and the OSPF scheme became less and less obvious, so the gap became smaller. However, as the cache size grew, the performance gap between solutions with “Pop” and “TD3” first increased and then decreased. When the content popularity was small, the reason why the gap between “Pop” and “TD3” was small is that the content in the network was requested, and the TD3 smart cache gradually learned according to the current network cache state and the return value of previous decisions, so the TD3 smart learning effect became worse, and the accuracy of TD3 smart cache strategy became very poor. At this time, no matter what caching method the BS used, the performance was similar, and it would go to the cloud node to find the requested content. With the gradual increase in the popularity of content, the reason why the gap between “Pop” and “TD3” would become larger is that the number of requests for different content was not very big. However, because the Pop cache was the most popular content, the TD3 intelligent learning effect was not very good, and the accuracy of the TD3 intelligent caching strategy was not very high. Therefore, the performance gap between the TD3 smart cache and Pop slowly opened up. When the content’s popularity increased to a certain extent, the gap between “Pop” and “TD3” narrowed because users requested more popular content. The TD3 intelligent cache gradually learned according to the current network cache state and the returned the value of previous decisions. The TD3 intelligent learning effect was very good, and it more easily perceived the popular content. When the accuracy of the TD3 smart cache policy gradually improved, the performance gap between the TD3 smart cache policy and Pop gradually narrowed.

Figure 7 shows the network latency for all solutions as the number of packets sent per user changed. As shown in Figure 7, as the number of packets sent by each user increased, the total number of requests in the network increased, thereby increasing the network latency of all solutions. At the same time, as the number of packets sent by each user increased, the link transmission burden on the network increased, and the number of packets lost increased. After the task was divided, the utilization of network resources was higher than that of the undivided task, and the packet loss delay of the undivided task was much larger than that of the subtask. As a result, the performance gap between the task and the subtask uninstallation scheme increased. As the number of requests increased, the link transmission burden on the network increased. Compared with OSPF, the DQN can detect the network link status and perform intelligent pathfinding better. In addition, the DQN pathfinding had no extra delay for one request, while OSPF pathfinding generated an extra delay for each request. However, as the number of user requests increased, the training times of TD3 agents increased, and the accuracy of the TD3 smart cache policy gradually improved. As a result, the gap between the cache hit rate of the TD3 smart cache policy and the cache hit rate based on the popularity (Pop) cache policy became smaller.

Figure 8 shows the network latency of all solutions as the transmit power of each BS changed. As shown in Figure 8, with the increase in transmit power of each BS, the channel SINR of the downlink between the BS and its service users improved, the link transmission rate increased, and the packet loss rate and network delay of all solutions significantly reduced. At the same time, with the increase in transmit power of each BS, the resource utilization rate of subtask offloading was higher. With the improvement of the SINR of the downlink channel, the link transmission rate increased, so the performance gap between the task and the subtask offloading scheme increased. As the transmission power of each BS increased, the network delay gap between the DQN intelligent uninstall and OSPF solutions gradually narrowed due to the optimal offload decision in the resource-constrained cloud edge collaboration environments. However, as the transmit power of each BS increased, the increase in the transmit power of the BS had no effect on the channel SINR of the uplink between the BS and its access terminal, leaving the recorded historical access information of the BS almost unchanged, so the performance gap between the solutions with “Pop” and “TD3” barely widened.

Figure 9 shows the network delay of all solutions as the transmit power of each MC changed. As shown in Figure 9, with the increase in the transmit power of each MC, the channel SINR of the uplink between the user and its connected BS improved, the link transmission rate increased, and the packet loss rate and network delay of all solutions significantly reduced. At the same time, with the increase in transmit power of each MC, the resource utilization rate caused by subtask offloading was higher. With the improvement of the uplink channel SINR, the link transmission rate increased, so the performance gap between the task and the subtask offloading scheme increased. As the transmission power of each MC increased, due to the optimal offloading decision in the resource-constrained cloud edge collaboration environment, the DQN intelligent offloading and OSPF solutions could quickly send the requested content to the BS, thus reducing the packet loss rate and the additional delay caused by OSPF packet loss, so the network delay gap between them gradually narrowed. However, as the transmit power of each MC increased, it affected the channel SINR of the uplink between BS and its access terminal, indirectly affecting the transmission rate of the uplink, but it did not affect the user’s interest preference, so the performance gap between the solutions with “Pop” and “TD3” hardly widened.

Figure 10 shows the average weighted reward for all task requests per slot in a DQN agent with different learning rates. As shown in Figure 10, the “DQN” solution at different learning rates always converged quickly and performed best when the learning rate was 0.0003. This is because the large learning rate indicates that, when the current system DQN task offloading decision is made, the old Q-value will have a weak influence on the new Q-value, and the weight update was less affected by the old value and more dependent on the new value. Therefore, the learning rate was neither larger nor smaller; only the appropriate learning rate would make the solution perform best.

Figure 11 shows the average weighted reward of all task requests per time slot for different learning rates in the TD3 agent. As shown in Figure 11, the “TD3” smart cache solution at different learning rates always converged quickly and performed best when the learning rate was 0.0003. This is because a larger learning rate means that the step length of the weight update of the actor network is larger, which indicates that the actor network will adjust the strategy faster every time the weight is updated, making the actor network more sensitive and adjusting the actions taken in each state more quickly. This can lead to more rapid changes in caching policies and faster exploration of different caching policies. However, the excessive learning rate may also lead to the instability of the strategy, making it difficult for the “TD3” algorithm to converge. So, bigger is not better, and smaller is not better; only the right learning rate will make the caching solution perform best.

6. Conclusions

In this paper, a joint optimization of task offloading and content caching is proposed aimed at enhancing resource allocation efficiency in task-partition-based asymmetrical cloud–edge cooperation networks. First, we considered a joint optimization of task offloading and content caching in task-partition-based asymmetrical cloud–edge cooperation networks and formulated the problem as mixed-integer non-linear programming, which aims at minimizing the network delay of the system. Subsequently, utilizing network state information, we designed a new cooperative TD3-DQN algorithm for making joint optimization decisions of computation offloading and content caching. According to numerical results, the proposed TD3-DQN algorithm can significantly reduce the network latency of the system and can outperform some other benchmark algorithms under different scenarios.

In future work, we will explore the impact of caching different items (e.g., data, models, and content) on the computing tasks. Moreover, it is also worth investigating the use of multi-agent DRL by deploying multi-agent cooperative control networks at different nodes of the network. Finally, exploring the collaborative control of discrete and continuous variables in cloud edge networks by a single agent is also a future research direction.

Author Contributions

Conceptualization, J.H. and C.F.; methodology, J.H. and X.Y.; software, J.H. and X.Y.; validation, J.C. (Jiabao Chen) and J.C. (Jinyi Chen) and Z.H.; formal analysis, J.H., X.Y. and J.Z.; investigation, J.C. (Jinyi Chen) and Z.W.; resources, C.F.; data curation, X.Y. and J.Z.; writing—original draft preparation, J.H.; writing—review and editing, X.Y.; visualization, J.C. (Jiabao Chen); supervision, C.F. and Z.H.; project administration, C.F.; funding acquisition, J.H. and C.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Urban Carbon Neutral Science and Technology Innovation Fund Project of Beijing University of Technology (040000514122607), the Special Research Program of Academic Cooperation between Taipei University of Technology and Beijing University of Technology (NTUT-BJUT-112-02), and the Beijing Natural Science Foundation L202016 and 4222002.

Data Availability Statement

The data analyzed during the current study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Yu, R.; Xue, G.; Kilari, V.T.; Zhang, X. The Fog of Things Paradigm: Road toward On-Demand Internet of Things. IEEE Commun. Mag. 2018, 56, 48–54. [Google Scholar] [CrossRef]
Sisinni, E.; Saifullah, A.; Han, S.; Jennehag, U.; Gidlund, M. Industrial Internet of Things: Challenges, Opportunities, and Directions. IEEE Trans. Ind. Inform. 2018, 14, 4724–4734. [Google Scholar] [CrossRef]
Xu, L.D.; He, W.; Li, S. Internet of Things in Industries: A Survey. IEEE Trans. Ind. Informat. 2014, 10, 2233–2243. [Google Scholar] [CrossRef]
Wang, F.; Zhang, X. Dynamic Computation Offloading and Resource Allocation over Mobile Edge Computing Networks with Energy Harvesting Capability. In Proceedings of the 2018 IEEE International Conference on Communications (ICC), Kansas City, MO, USA, 20 May 2018. [Google Scholar]
Chen, X.; Jiao, L.; Li, W.; Fu, X. Efficient Multi-User Computation Offloading for Mobile-Edge Cloud Computing. IEEE/ACM Trans. Netw. 2016, 24, 2795–2808. [Google Scholar] [CrossRef]
Fang, C.; Yao, H.; Wang, Z.; Wu, W.; Jin, X.; Yu, F.R. A Survey of Mobile Information-Centric Networking: Research Issues and Challenges. IEEE Commun. Surv. Tut. 2018, 20, 2353–2371. [Google Scholar] [CrossRef]
Tu, W. Data-Driven QoS and QoE Management in Smart Cities: A Tutorial Study. IEEE Commun. Mag. 2018, 56, 126–133. [Google Scholar] [CrossRef]
Yang, L.; Zhang, H.; Li, X.; Ji, H.; Leung, V.C.M. A Distributed Computation Offloading Strategy in Small-Cell Networks Integrated with Mobile Edge Computing. IEEE/ACM Trans. Netw. 2018, 26, 2762–2773. [Google Scholar] [CrossRef]
Fang, C.; Yu, F.R.; Huang, T.; Liu, J.; Liu, Y. A Survey of Green Information-Centric Networking: Research Issues and Challenges. IEEE Commun. Surv. Tut. 2015, 17, 1455–1472. [Google Scholar] [CrossRef]
Wang, Y.; Lang, P.; Tian, D.; Zhou, J.; Duan, X.; Cao, Y.; Zhao, D. A Game-Based Computation Offloading Method in Vehicular Multiaccess Edge Computing Networks. IEEE Internet Things J. 2020, 7, 4987–4996. [Google Scholar] [CrossRef]
Xiao, Z.; Shu, J.; Jiang, H.; Lui, J.C.S.; Min, G.; Liu, J.; Dustdar, S. Multi-Objective Parallel Task Offloading and Content Caching in D2D-Aided MEC Networks. IEEE Trans. Mob. Comput. 2023, 22, 6599–6615. [Google Scholar] [CrossRef]
Hu, Z.; Fang, C.; Wang, Z.; Tseng, S.M.; Dong, M. Many-Objective Optimization-Based Content Popularity Prediction for Cache-Assisted cloud–edge-End Collaborative IoT Networks. IEEE Internet Things J. 2024, 11, 1190–1200. [Google Scholar] [CrossRef]
Zhu, S.; Gui, L.; Zhao, D.; Cheng, N.; Zhang, Q.; Lang, X. Learning-Based Computation Offloading Approaches in UAVs-Assisted Edge Computing. IEEE Trans. Veh. Technol. 2021, 70, 928–944. [Google Scholar] [CrossRef]
Ren, J.; Yu, G.; He, Y.; Li, G.Y. Collaborative Cloud and Edge Computing for Latency Minimization. IEEE Trans. Veh. Technol. 2019, 68, 5031–5044. [Google Scholar] [CrossRef]
Sun, Z.; Yang, H.; Li, C.; Yao, Q.; Wang, D.; Zhang, J.; Vasilakos, A.V. cloud–edge Collaboration in Industrial Internet of Things: A Joint Offloading Scheme Based on Resource Prediction. IEEE Internet Things J. 2022, 9, 17014–17025. [Google Scholar] [CrossRef]
Ning, Z.; Dong, P.; Kong, X.; Xia, F. A Cooperative Partial Computation Offloading Scheme for Mobile Edge Computing Enabled Internet of Things. IEEE Internet Things J. 2019, 6, 4804–4814. [Google Scholar] [CrossRef]
Zhao, J.; Li, Q.; Gong, Y.; Zhang, K. Computation Offloading and Resource Allocation for Cloud Assisted Mobile Edge Computing in Vehicular Networks. IEEE Trans. Veh. Technol. 2019, 68, 7944–7956. [Google Scholar] [CrossRef]
Wang, X.; Chen, M.; Taleb, T.; Ksentini, A.; Leung, V.C. Cache in the air: Exploiting content caching and delivery techniques for 5G systems. IEEE Commun. Mag. 2014, 52, 131–139. [Google Scholar] [CrossRef]
Zhang, K.; Leng, S.; He, Y.; Maharjan, S.; Zhang, Y. Cooperative Content Caching in 5G Networks with Mobile Edge Computing. IEEE Wirel. Commun. 2018, 25, 80–87. [Google Scholar] [CrossRef]
Poularakis, K.; Tassiulas, L. Code, Cache and Deliver on the Move: A Novel Caching Paradigm in Hyper-Dense Small-Cell Networks. IEEE Trans. Mob. Comput. 2017, 16, 675–687. [Google Scholar] [CrossRef]
Kwak, J.; Kim, Y.; Le, L.B.; Chong, S. Hybrid Content Caching in 5G Wireless Networks: Cloud Versus Edge Caching. IEEE Trans. Wirel. Commun. 2018, 17, 3030–3045. [Google Scholar] [CrossRef]
Liu, K.; Peng, J.; Wang, J.; Huang, Z.; Pan, J. Adaptive and Scalable Caching with Erasure Codes in Distributed cloud–edge Storage Systems. IEEE Trans. Cloud Comput. 2023, 11, 1840–1853. [Google Scholar] [CrossRef]
Yu, S.; Langar, R.; Fu, X.; Wang, L.; Han, Z. Computation Offloading with Data Caching Enhancement for Mobile Edge Computing. IEEE Trans. Veh. Technol. 2018, 67, 11098–11112. [Google Scholar] [CrossRef]
Liu, M.; Yu, F.R.; Teng, Y.; Leung, V.C.M.; Song, M. Computation Offloading and Content Caching in Wireless Blockchain Networks with Mobile Edge Computing. IEEE Trans. Veh. Technol. 2018, 67, 11008–11021. [Google Scholar] [CrossRef]
Tian, H.; Xu, X.; Qi, L.; Zhang, X.; Dou, W.; Yu, S.; Ni, Q. CoPace: Edge Computation Offloading and Caching for Self-Driving with Deep Reinforcement Learning. IEEE Trans. Veh. Technol. 2021, 70, 13281–13293. [Google Scholar] [CrossRef]
Fang, C.; Xu, H.; Yang, Y.; Hu, Z.; Tu, S.; Ota, K.; Yang, Z.; Dong, M.; Han, Z.; Yu, F.R.; et al. Deep-Reinforcement-Learning-Based Resource Allocation for Content Distribution in Fog Radio Access Networks. IEEE Internet Things J. 2022, 9, 16874–16883. [Google Scholar] [CrossRef]
Feng, H.; Guo, S.; Yang, L.; Yang, Y. Collaborative Data Caching and Computation Offloading for Multi-Service Mobile Edge Computing. IEEE Trans. Veh. Technol. 2021, 70, 9408–9422. [Google Scholar] [CrossRef]
Yang, X.; Fei, Z.; Zheng, J.; Zhang, N.; Anpalagan, A. Joint Multi-User Computation Offloading and Data Caching for Hybrid Mobile Cloud/Edge Computing. IEEE Trans. Veh. Technol. 2019, 68, 11018–11030. [Google Scholar] [CrossRef]
Hu, J.; Jiang, M.; Zhang, Q.; Li, Q.; Qin, J. Joint Optimization of UAV Position, Time Slot Allocation, and Computation Task Partition in Multiuser Aerial Mobile-Edge Computing Systems. IEEE Trans. Veh. Technol. 2019, 68, 7231–7235. [Google Scholar] [CrossRef]
Fang, F.; Xu, Y.; Ding, Z.; Shen, C.; Peng, M.; Karagiannidis, G.K. Optimal Task Partition and Power Allocation for Mobile Edge Computing with NOMA. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA, 9 December 2019. [Google Scholar]
Feng, M.; Krunz, M.; Zhang, W. Joint Task Partitioning and User Association for Latency Minimization in Mobile Edge Computing Networks. IEEE Trans. Veh. Technol. 2021, 70, 8108–8121. [Google Scholar] [CrossRef]
Gao, M.; Cui, W.; Gao, D.; Shen, R.; Li, J.; Zhou, Y. Deep Neural Network Task Partitioning and Offloading for Mobile Edge Computing. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA, 9 December 2019. [Google Scholar]
Gao, M.; Shen, R.; Shi, L.; Qi, W.; Li, J.; Li, Y. Task Partitioning and Offloading in DNN-Task Enabled Mobile Edge Computing Networks. IEEE Trans. Mob. Comput. 2023, 22, 2435–2445. [Google Scholar] [CrossRef]
Ku, Y.J.; Baidya, S.; Dey, S. Adaptive Computation Partitioning and Offloading in Real-Time Sustainable Vehicular Edge Computing. IEEE Trans. Veh. Technol. 2021, 70, 13221–13237. [Google Scholar] [CrossRef]
Ale, L.; King, S.A.; Zhang, N.; Sattar, A.R.; Skandaraniyam, J. D3PG: Dirichlet DDPG for Task Partitioning and Offloading with Constrained Hybrid Action Space in Mobile-Edge Computing. IEEE Internet Things J. 2022, 9, 19260–19272. [Google Scholar] [CrossRef]
Javed, M.A.; Zeadally, S. AI-Empowered Content Caching in Vehicular Edge Computing: Opportunities and Challenges. IEEE Netw. 2021, 35, 109–115. [Google Scholar] [CrossRef]
Qiao, G.; Leng, S.; Maharjan, S.; Zhang, Y.; Ansari, N. Deep Reinforcement Learning for Cooperative Content Caching in Vehicular Edge Computing and Networks. IEEE Internet Things J. 2020, 7, 247–257. [Google Scholar] [CrossRef]
Liu, Y.; Wang, S.; Huang, J.; Yang, F. A Computation Offloading Algorithm Based on Game Theory for Vehicular Edge Networks. In Proceedings of the 2018 IEEE International Conference on Communications (ICC), Kansas City, MO, USA, 20 May 2018. [Google Scholar]
Guo, F.; Zhang, H.; Ji, H.; Li, X.; Leung, V.C.M. An Efficient Computation Offloading Management Scheme in the Densely Deployed Small Cell Networks with Mobile Edge Computing. IEEE/ACM Trans. Netw. 2018, 26, 2651–2664. [Google Scholar] [CrossRef]
Chen, L.; Wu, J.; Zhang, J.; Dai, H.N.; Long, X.; Yao, M. Dependency-Aware Computation Offloading for Mobile Edge Computing with Edge-Cloud Cooperation. IEEE Trans. Cloud Comput. 2022, 10, 2451–2468. [Google Scholar] [CrossRef]
Hu, Z.; Zhong, R.; Fang, C.; Liu, Y. Caching-at-STARS: The Next Generation Edge Caching. IEEE Trans. Wirel. Commun. 2024. early access. [Google Scholar] [CrossRef]
Fang, C.; Meng, X.; Hu, Z.; Xu, F.; Zeng, D.; Dong, M.; Ni, W. AI-Driven Energy-Efficient Content Task Offloading in cloud–edge-End Cooperation Networks. IEEE Open J. Comput. Soc. 2022, 3, 162–171. [Google Scholar] [CrossRef]
Xiao, H.; Zhuang, Y.; Xu, C.; Wang, W.; Zhang, H.; Ding, R.; Cao, T.; Zhong, L.; Muntean, G.M. Transcoding-Enabled cloud–edge-Terminal Collaborative Video Caching in Heterogeneous IoT Networks: A Online Learning Approach with Time-Varying Information. IEEE Internet Things J. 2023, 11, 296–310. [Google Scholar] [CrossRef]
Su, Z.; Hui, Y.; Xu, Q.; Yang, T.; Liu, J.; Jia, Y. An Edge Caching Scheme to Distribute Content in Vehicular Networks. IEEE Trans. Veh. Technol. 2018, 67, 5346–5356. [Google Scholar] [CrossRef]
Chen, Q.; Kuang, Z.; Zhao, L. Multiuser Computation Offloading and Resource Allocation for Cloud Edge Heterogeneous Network. IEEE Internet Things J. 2022, 9, 3799–3811. [Google Scholar] [CrossRef]
Fang, C.; Hu, Z.; Meng, X.; Tu, S.; Wang, Z.; Zeng, D.; Ni, W.; Guo, S.; Han, Z. DRL-Driven Joint Task Offloading and Resource Allocation for Energy-Efficient Content Delivery in cloud–edge Cooperation Networks. IEEE Trans. Veh. Technol. 2023, 72, 16195–16207. [Google Scholar] [CrossRef]
Zhou, H.; Wang, Z.; Zheng, H.; He, S.; Dong, M. Cost Minimization-Oriented Computation Offloading and Service Caching in Mobile cloud–edge Computing: An A3C-Based Approach. IEEE Trans. Netw. Sci. Eng. 2023, 10, 1326–1338. [Google Scholar] [CrossRef]
Dong, Y.; Guo, S.; Wang, Q.; Yu, S.; Yang, Y. Content Caching-Enhanced Computation Offloading in Mobile Edge Service Networks. IEEE Trans. Veh. Technol. 2022, 71, 872–886. [Google Scholar] [CrossRef]
Lan, D.; Taherkordi, A.; Eliassen, F.; Liu, L.; Delbruel, S.P.; Dustdar, S.; Yang, Y. Task Partitioning and Orchestration on Heterogeneous Edge Platforms: The Case of Vision Applications. IEEE Internet Things J. 2022, 9, 7418–7432. [Google Scholar] [CrossRef]
Liu, J.; Zhang, Q. Adaptive Task Partitioning at Local Device or Remote Edge Server for Offloading in MEC. In Proceedings of the 2020 IEEE Wireless Communications and Networking Conference (WCNC), Seoul, Republic of Korea, 25 May 2020. [Google Scholar]
Fang, C.; Guo, S.; Wang, Z.; Huang, H.; Yao, H.; Liu, Y. Data-Driven Intelligent Future Network: Architecture, Use Cases, and Challenges. IEEE Commun. Mag. 2019, 57, 34–40. [Google Scholar] [CrossRef]
Breslau, L.; Cao, P.; Fan, L.; Phillips, G.; Shenker, S. Web caching and Zipf-like distributions: Evidence and implications. In Proceedings of the IEEE INFOCOM’99. Conference on Computer Communications, (INFOCOM), New York, NY, USA, 21 March 1999; Volume 1, pp. 126–134. [Google Scholar] [CrossRef]
Su, Z.; Dai, M.; Xu, Q.; Li, R.; Fu, S. Q-Learning-Based Spectrum Access for Content Delivery in Mobile Networks. IEEE Trans. Cogn. Commun. Netw. 2020, 6, 35–47. [Google Scholar] [CrossRef]
Wang, J.; Zhao, L.; Liu, J.; Kato, N. Smart Resource Allocation for Mobile Edge Computing: A Deep Reinforcement Learning Approach. IEEE Trans. Emerg. Topics Comput. 2021, 9, 1529–1541. [Google Scholar] [CrossRef]

Figure 1. The network model of cloud–edge cooperation environments.

Figure 2. The illustration of the proposed DQN-based task-offloading scheme.

Figure 3. The illustration of the proposed TD3-based content caching scheme.

Figure 4. Network performance versus cache size.

Figure 5. Network performance versus content diversity.

Figure 6. Network performance versus content popularity.

Figure 7. Network performance versus number of packets.

Figure 8. Network performance versus transmit power of BS.

Figure 9. Network performance versus transmit power of MC.

Figure 10. Average weighted reward sum versus learning rate.

Figure 11. Average weighted reward sum versus learning rate.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Huang, J.; Yang, X.; Chen, J.; Chen, J.; Hu, Z.; Zhang, J.; Wang, Z.; Fang, C. Task Partition-Based Computation Offloading and Content Caching for Cloud–Edge Cooperation Networks. Symmetry 2024, 16, 906. https://doi.org/10.3390/sym16070906

AMA Style

Huang J, Yang X, Chen J, Chen J, Hu Z, Zhang J, Wang Z, Fang C. Task Partition-Based Computation Offloading and Content Caching for Cloud–Edge Cooperation Networks. Symmetry. 2024; 16(7):906. https://doi.org/10.3390/sym16070906

Chicago/Turabian Style

Huang, Jingjing, Xiaoping Yang, Jinyi Chen, Jiabao Chen, Zhaoming Hu, Jie Zhang, Zhuwei Wang, and Chao Fang. 2024. "Task Partition-Based Computation Offloading and Content Caching for Cloud–Edge Cooperation Networks" Symmetry 16, no. 7: 906. https://doi.org/10.3390/sym16070906

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Article metric data becomes available approximately 24 hours after publication online.

Article Menu

Task Partition-Based Computation Offloading and Content Caching for Cloud–Edge Cooperation Networks

Abstract

1. Introduction

2. Related Work

2.1. Intelligent Task Offloading

2.2. Task Partition

2.3. Task-Partition-Based Intelligent Task Offloading

3. System Model

3.1. Network Model

3.2. Communication Model

3.3. Caching Model

3.4. Delay Model

3.4.1. Sojourn Delay

3.4.2. Transmission Delay

3.5. Problem Formulation

4. Task Offloading and Resource Allocation via Deep Reinforcement Learning

4.1. Model Design of DQN-Enabled Task Offloading

4.2. Task-Partition-Based Intelligent Offloading Procedure

4.3. Model Design of TD3-Enabled Intelligent Caching

4.4. Task-Partition-Based Intelligent Caching Procedure

5. Simulation and Results

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI