Article

Edge Computing Offloading Method Based on Deep Reinforcement Learning for Gas Pipeline Leak Detection

1 School of Information Science and Engineering, Shenyang University of Technology, Shenyang 110870, China
2 State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
3 Key Laboratory of Networked Control Systems, Chinese Academy of Sciences, Shenyang 110016, China
4 Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(24), 4812; https://doi.org/10.3390/math10244812
Submission received: 18 November 2022 / Revised: 13 December 2022 / Accepted: 16 December 2022 / Published: 18 December 2022

Abstract
Traditional gas pipeline leak detection methods require task offloading decisions to be made in the cloud, which results in poor real-time performance. The emergence of edge computing provides a solution by enabling offloading decisions directly at the edge server, improving real-time performance; however, energy becomes the new bottleneck. Therefore, focusing on the real-time leak detection scenario for gas transmission pipelines, this paper proposes a novel detection algorithm that combines the benefits of a heuristic algorithm and the advantage actor-critic (AAC) algorithm. It aims to guarantee the real-time processing of pipeline mapping analysis tasks while maximizing the survival time of portable gas leak detectors. Since portable detection devices are battery-powered and have limited computing power, the main problem to be solved in this study is how to take the node energy overhead into account while guaranteeing the system performance requirements. By introducing the idea of edge computing and taking the mapping relationship between resource occupation and energy consumption as the starting point, an optimization model is established whose goal is to minimize the total system cost (TSC), composed of the node’s transmission energy consumption, local computing energy consumption, and residual electricity weight. In order to minimize the TSC, the algorithm uses the AAC network to make task scheduling decisions and judge whether tasks need to be offloaded, and uses heuristic strategies and the Cauchy–Buniakowsky–Schwarz inequality to determine the allocation of communication resources. The experiments show that the proposed algorithm can meet the real-time requirements of the detector and achieve lower energy consumption. The proposed algorithm saves approximately 56% of the system energy compared to the deep Q network (DQN) algorithm. Compared with the artificial gorilla troops optimizer (GTO), the black widow optimization algorithm (BWOA), the exploration-enhanced grey wolf optimizer (EEGWO), the African vultures optimization algorithm (AVOA), and the driving training-based optimization (DTBO), it saves 21%, 38%, 30%, 31%, and 44% of energy consumption, respectively. Compared to the fully local computing and fully offloading algorithms, it saves 50% and 30%, respectively. Meanwhile, the task completion rate of this algorithm reaches 96.3%, which is the best real-time performance among these algorithms.

1. Introduction

With the advent of 5G, high-performance computing, and other technologies, industry has developed in the direction of high real-time performance and low energy consumption, and many delay-sensitive and computationally intensive applications and services have emerged. Although cloud computing can provide sufficient computing resources, the large amount of traffic generated when tasks are delivered to the cloud is likely to cause network congestion, unpredictably high delay, and massive transmission energy consumption; a distributed computing method is needed to solve these problems. Edge computing makes this feasible. Moving computing to the edge of the network solves the problem of the high latency of cloud services and, to a certain extent, makes up for the lack of computing resources of end devices.
Although edge computing provides a feasible solution for such scenarios, it entails the problem of using limited resources to realize high real-time performance and low energy consumption. Much research has been performed in this field, with good results. The main concern is to balance low latency and low energy consumption, which can effectively solve the offloading problem when the attributes of the task set to be processed are known. However, such solutions have the common limitation of low robustness, which will lead to a chain reaction when unexpected tasks enter the system, sharply degrading system performance. This is more likely to occur when tasks arrive in real time. Sun et al. [1] proposed a task offloading algorithm based on a hierarchical heuristic strategy, aiming to minimize the task delay and energy consumption, but it assumes the task set to be scheduled is known, without taking into account sudden tasks. Similarly, Li et al. proposed a task offloading algorithm based on deep reinforcement learning, which is based on a known task set, to schedule tasks [2].
Taking the leak detection of a natural gas transmission pipeline as an example, once a leak occurs, there is great danger. Detectors need to work in the leak area, and the faster they locate the leak point, the lower the safety risk; hence, this scenario demands high real-time reliability. Many portable gas leak detectors depend on the collection of infrared or other spectral images for image analysis [3]. Because the detector must constantly change its position during operation, it needs to feed back results immediately so that it will not miss the leak point. However, due to the size of the detector, its computing and battery capacity are limited, and it is difficult to complete some complex recognition tasks on time, which greatly affects detection efficiency and accuracy. Figure 1 shows the workflow of the solution. By introducing edge computing, complex image processing tasks generated by the detection equipment can be uploaded to the cloud and processed quickly, enabling the accurate and rapid location of the leak point.
This paper proposes a natural gas leak detection algorithm that combines edge computing task offloading with portable natural gas leak detection technology—a real-time multi-leak detection algorithm based on the improved advantage actor-critic (AAC) method—to improve the detection efficiency and endurance of instruments. We consider a three-tier edge computing architecture with cloud-edge-end collaboration, where the portable gas leak detector sits at the end of the system and has some computing power itself. To improve the efficiency and range of the detector, an offloading decision must be made for each image analysis task, determining where to process the task and how to allocate resources to it. The current system state is determined and input to the constructed AAC network to decide the processing position of tasks in the system. The results obtained from the network are refined using the proposed heuristic algorithm; at the same time, the allocation strategy of communication resources is determined, and tasks are scheduled and executed according to the offloading results. Analysis of the problem shows that improving the detection efficiency and range of existing detection instruments corresponds to the objectives of the edge computing task offloading problem, that is, to improve the real-time performance and minimize the energy consumption of the edge computing system as much as possible. This paper makes the following contributions:
  • The system real-time requirements and energy consumption limits are modeled in a unified manner. Tasks arrive and are offloaded in real time so as to be as close as possible to the real situation, which gives the proposed algorithm good practical relevance.
  • A real-time multi-leak detection algorithm based on the improved AAC is proposed to address the facts that traditional reinforcement learning methods are difficult to converge and traditional heuristics cannot fully consider various influencing factors. The proposed algorithm lets the AAC algorithm and traditional heuristics complement each other: the AAC algorithm can fully take into account the impact of environmental factors on the offloading results as well as the long-term payoff of the system, but, as a reinforcement learning algorithm, its convergence is not stable enough, so a heuristic algorithm is used to correct the obtained results and ensure that the proposed algorithm achieves at least the performance of the heuristic. Moreover, through detailed mathematical analysis, the condition under which the proposed heuristic algorithm obtains the minimum value of the total system cost (TSC) is proved.
  • We compare the performance of the proposed algorithm with that of the deep Q network (DQN), the artificial gorilla troops optimizer (GTO), the black widow optimization algorithm (BWOA), the exploration-enhanced grey wolf optimizer (EEGWO), the African vultures optimization algorithm (AVOA), the driving training-based optimization (DTBO), and two other baseline algorithms. Experiments show that the proposed algorithm reduces the energy consumption by 56% compared to DQN. Compared with the GTO, BWOA, EEGWO, AVOA and DTBO algorithms, the energy consumption is reduced by 21%, 38%, 30%, 31% and 44%, respectively. The energy consumption is reduced by 50% compared to the fully local computing algorithm, and by 30% compared to the fully offloading algorithm. Meanwhile, the task completion rate of this algorithm reaches 96.3%, which is the best real-time performance among these algorithms. In addition, the proposed algorithm has a faster convergence speed than the DQN algorithm.
The remainder of this paper is organized as follows. Section 2 describes related work. Section 3 presents the proposed system model and describes the problem. Section 4 details the main steps of the proposed algorithm. Section 5 compares the performance of the proposed algorithm with baseline algorithms such as DQN and GTO through experiments. Section 6 concludes the paper.

2. Related Work

Many studies have been conducted on the task offloading problem of edge computing, which is NP-hard, and all solutions thus far have been approximate. However, different optimization techniques can be used so that the approximate solution converges toward the optimal solution. These solutions start either from machine learning or from traditional means such as greedy heuristics, integer optimization, branch and bound, game theory, or convex optimization. The two most important factors in edge computing are latency and energy consumption.

2.1. Traditional Task Offloading Methods

Kan et al. proposed a heuristic algorithm for offloading tasks to MEC servers considering radio and computational resources with the goal of minimizing the average task latency, which was shown by experiments to achieve excellent results under different latency requirements [4]. Due to the relative lack of infrastructure in some areas, Tan et al. introduced drones to assist edge computing and proposed a USS algorithm [5] that can satisfy the task processing latency constraint in the multiuser case. Wang, Shen, and Zhao introduced a dynamic penalty function in a study of edge computing in the smart grid domain, and proposed an improved algorithm for solving Lagrange multipliers [6], which overcomes the shortcoming that traditional grid systems cannot provide deterministic services, and can effectively improve the overall system revenue and reduce the average delay of user tasks. Li considered event-triggered decision systems whose goal is to optimize the average system revenue while satisfying the average delay constraint for services of different priorities [7]. Ref. [8] presented the design of online computing task scheduling methods for multi-server edge computing scenarios. Sun et al. [9] considered an ultra-dense network environment that supports edge computing. Constantly moving users dynamically generate computational tasks in the network, which need to be offloaded to the base station for computation. In order to minimize the average delay given a limited energy budget, users need to make mobility management decisions about base station association and switching based on their service requirements without knowing future information.
System energy consumption, an important component of system cost, has long been a concern among edge computing researchers, especially in mobile edge computing, where energy consumption directly affects system endurance and reliability. Mahenge and Li proposed a hybrid method based on particle swarm optimization and the grey wolf optimizer [10] to optimize the energy consumption of MEC task offloading. Ding and Zhang [11] proposed a game theory-based computational offloading strategy for massive IoT devices, which improves data transfer and reduces task energy consumption using the beneficial task offloading theory.
Delay and energy consumption are usually considered together, and both strongly affect the user experience. Researchers can decide whether to optimize delay or energy consumption based on specific requirements. Some studies have used heuristic algorithms to minimize energy consumption while satisfying the latency constraint [12,13]. Others have proposed a more flexible optimization objective, synthesizing both into a single cost, where the weights of delay and energy consumption in the cost formulation can be changed according to the case [14,15]. Ref. [16] considered the two cases of adjustable and non-adjustable CPU frequency of APs; a linear-relaxation-based approach and an exhaustive-search-based approach are proposed to obtain the offloading decision for these two cases, respectively, with the aim of minimizing the total task execution delay and the energy consumption of the mobile device (MD). In order to trade off the two metrics of energy consumption and computational latency, a Lyapunov-based algorithm was proposed in Ref. [17] for computing task offloading decisions in mobile edge computing systems; the algorithm greatly reduces the energy consumption of the device while satisfying the latency constraint. Ref. [18] investigated the computational offloading and scheduling problem, which seeks to minimize the cost per mobile device, where the cost is defined as a linear combination of task completion time and energy consumption; inter-device communication and competition for computational resources are also considered, the problem is defined formally using a game model, and a decentralized algorithm is designed to achieve a pure-strategy Nash equilibrium. Tang et al. modeled the multi-user computational offloading problem in an uncertain wireless environment as a non-cooperative game based on prospect theory (PT), and then proposed a distributed computational offloading algorithm to obtain a Nash equilibrium, which minimizes the user overhead [19]. Yi et al. considered tasks randomly generated by mobile users and proposed a queuing-model-based mechanism, which is used to maximize social welfare and achieve the equilibrium of the non-cooperative game among mobile users [20].
The task offloading algorithms in the above studies rely on idealized mathematical models and cannot consider all the factors that affect the optimization objective, which limits their offloading performance. To solve this problem, a new class of offloading methods has been proposed that uses deep learning techniques, with good results.

2.2. Machine Learning Task Offloading Methods

To cope with the variability of edge computing application environments, Wang et al. proposed a meta-reinforcement learning-based approach to solve the computational offloading problem [21], which enables fast adaptation to dynamic scenarios without updating too many parameters. A joint task offloading and bandwidth allocation problem was considered for multiuser computational offloading, with the goal of minimizing the overall delay in completing user tasks, using a DQN approach to find the optimal solution [22].
Wang et al. [23] found that studies using DRL for task offloading rarely focus on the dependencies between tasks, and proposed a DRL offloading method that can handle dependent tasks. The general dependency of tasks was modeled as a directed acyclic graph (DAG), and a sequence-to-sequence (S2S) neural network captured the features of the DAG and output the offloading strategy. The method can use delay, energy consumption, or a trade-off of both as the optimization objective.
In Ref. [24], the authors were the first to attempt to consider end-device energy consumption in a deep learning-based modeling of MEC partial offloading schemes. They propose a novel partial offloading scheme, EEDOS, based on a fine-grained partial offloading framework, in which the cost function comprehensively considers important parameters such as the residual energy of end devices and the energy consumption of previous application components. Dai and Niu [25] used unmanned aerial vehicles (UAVs) to assist edge servers in task offloading, minimizing the energy consumption of all mobile end devices by jointly optimizing UAV trajectories, task association, and the resource allocation of computation and transmission. They reduced the problem complexity by decomposing the joint optimization problem into the subproblems of UAV trajectory planning, task association scheduling, and resource allocation of computation and transmission. Their proposed hybrid heuristic and learning-based scheduling strategy (H2LS) incorporates long short-term memory neural networks, fuzzy c-means, deep deterministic policy gradients, and convex optimization techniques.
As with traditional optimization techniques, most of the research on applying deep learning to edge computing task offloading focuses on jointly considering delay and energy consumption. Focusing on only one of these aspects can bring the results closer to the optimal solution, at the price of a narrow range of practical applications. Yang and Lee proposed a deep supervised learning-based dynamic computing task offloading approach (DSLO) for mobile edge computing networks [26], minimizing the delay and energy consumption by jointly optimizing the offloading decision and bandwidth allocation problem. Cao et al. proposed a multi-agent deep reinforcement learning (MADRL) scheme [27] to solve the multichannel access and task offloading problems in edge computing-enabled Industry 4.0, which allows edge devices to collaborate and significantly reduces computational latency and mobile device energy consumption relative to traditional methods. Huang et al. [28] considered a mobile edge computing system in which each user has multiple tasks transferred to the edge server over a wireless network, and proposed a deep reinforcement learning based approach to solve the problem of joint task offloading and resource allocation. In Refs. [29,30], the authors proposed deep reinforcement learning methods to solve the task offloading problem in mobile edge computing and made some progress, obtaining better latency and energy consumption than when using deep learning alone.
Although the above deep learning-based solutions have achieved good results, they have limitations if we only consider how to optimize the latency and energy consumption of task processing. In the problem addressed in this paper, each image analysis task is generated in real time, and the optimization goal of low latency can cause some tasks to have low processing latency, at the cost of some subsequent tasks that exceed their deadlines; hence, they cannot guarantee overall high real-time performance. We propose an AAC and heuristic policy-based task offloading algorithm that simultaneously considers overall task execution in real time and low energy consumption, and use it to optimize the performance of a portable gas leak detector. The algorithm reduces the energy consumption of the detector as much as possible by jointly optimizing the task offloading location and resource allocation problems while ensuring completion within the deadline.

3. System Model and Problem Description

3.1. System Model

The edge computing system (ECS) consists of a cluster of cloud servers, a wireless communication base station equipped with a small edge server, and $K$ portable gas leak detectors $\gamma = \{U_1, U_2, U_3, \ldots, U_K\}$. Each detector $U_i$ generates, in time order, a series of independent image recognition tasks; every task is generated in real time and cannot be split, and the set of tasks of detector $i$ is denoted by $\Gamma_i = \{T_{i,1}, T_{i,2}, T_{i,3}, \ldots, T_{i,N}\}$. Each task has six attributes, and any task $i$ can be denoted as $T_i = \{j, s_i, d_i, D_i, cy_i, \omega_i\}$, where $j$ is the serial number of the detector, $s_i$ is the release time of task $i$ (in seconds), $d_i$ is the relative deadline of the task, $D_i$ is the size of the data carried by the task (in Mb), $cy_i$ is the number of CPU cycles required by the task, and $\omega_i$ is its priority. An example of the system model is shown in Figure 2. The cloud server has sufficient resources for the detectors, so there is no need to consider waiting and preemption of tasks in the cloud, while only one task can be processed at a time on a detector. The task offloading algorithm is deployed on the edge server in the communication base station, and the state changes of each node are transmitted to the edge server in real time. In this model, the tasks to be offloaded are generated by the detectors in real time, and each task is indivisible. The offloading decision requires knowledge of the global information of the system, but only the main task parameters, rather than the complete task data, need to be transmitted for this purpose; since the base station is very close to the detectors and there is no conflict in the transmission process, the communication energy consumption and delay generated by the offloading algorithm are comparable and almost negligible whether it is executed on the detector or on the base station equipped with the edge computing server [1]. Because the edge server has more computing power and executes faster, the communication base station is left in charge of the communication function and makes the offloading decision, based on which the detector offloads the computational task to the cloud or processes it locally. If offloaded to the cloud, the cloud server returns the results after processing; the energy consumption of the detector during the offloading process thus includes that for transmission and for local processing.

3.2. Problem Description

Since each task in the system can be executed either locally or in the cloud, an offloading decision variable is introduced to indicate the execution location of a task:

$$\pi_i^C = \begin{cases} 0, & \text{task executed locally} \\ 1, & \text{task executed in the cloud} \end{cases} \qquad (1)$$
The transmission power of edge device (detector) $i$ is $P_i$. The data transmission rate assigned to any task $i$ is $r_i$, the average CPU frequency of the cloud server is $F_C$, and the CPU frequency of the edge device is $f_i^L$. Therefore, the time for edge device $U$ to locally execute task $T_i$ is

$$t_i^L = \frac{cy_i}{f_i^L} \qquad (2)$$
The local execution energy consumption of a task is

$$e_i^L = a\,(f_i^L)^2\, cy_i \qquad (3)$$
If a task is offloaded, its offloading transfer time is

$$t_i^{LC} = \frac{D_i}{r_i} \qquad (4)$$
The cloud processing time of a task is

$$t_i^C = \frac{cy_i}{F_C} \qquad (5)$$
The offloading transmission energy consumption of a task is

$$e_i^T = \frac{P_i D_i}{r_i} \qquad (6)$$
where a is the chip-related energy consumption coefficient of edge device U [31].
The mathematical model described in this paper must jointly optimize the real-time performance of the system and the total energy consumption of the edge devices while considering load balancing. To achieve this joint optimization, the optimization objective of the model is transformed into the total system cost TSC,
$$TSC = \min_{r_i,\ \pi_i^C}\ \sum_{i=1}^{M} \left[ (1-\pi_i^C)\, a\,(f_i^L)^2\, cy_i + \pi_i^C\, \frac{\bar{E}}{E_i}\, \frac{P_i D_i}{r_i} \right] \qquad (7a)$$

subject to

$$\pi_i^C \in \{0, 1\} \qquad (7b)$$

$$0 \le f_i^L \le F_i^L \qquad (7c)$$

$$0 \le E_i \le 1 \qquad (7d)$$

$$0 \le \sum_{i=1}^{M} r_i \le R \qquad (7e)$$
where $E_i$ is the remaining power percentage of edge device $i$, and $\bar{E}$ is the average power percentage of all devices that are idle and must perform offloading tasks.
Equation (7a) is the weighted sum of the local execution energy consumption and the offloading transmission energy consumption of task $i$, where the weight $\frac{\bar{E}}{E_i}$ reflects the gap between the remaining power of a device and the average power. A larger $\frac{\bar{E}}{E_i}$ indicates that the remaining power of the device is further below the average; to reduce the energy consumption of such a device, it is given the opportunity to share more communication resources (a faster data transmission rate) when the system performs bandwidth resource allocation [1].
Constraints (7b), (7c), (7d), and (7e) specify the range of the offloading decision variable, the range of the CPU frequency of each device, the range of the remaining power percentage of each edge device, and the range of the total data transmission rate allocated across devices, respectively.
The variables involved in the model are shown in Table 1.
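To make the cost model concrete, the following Python sketch evaluates the per-task delays and energies of Equations (2)-(6) and the TSC objective of Equation (7a) for a given offloading decision and rate allocation. The Task fields mirror Table 1; the helper names and the use of per-task lists for $f_i^L$, $P_i$ and $E_i$ (holding the value of each task's source device) are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class Task:
    j: int      # serial number of the detector that generated the task
    s: float    # release time (s)
    d: float    # relative deadline (s)
    D: float    # data size (Mb)
    cy: float   # required CPU cycles (GCycles)
    w: float    # initial priority

def local_time(task, f_L):           # Eq. (2): t_i^L = cy_i / f_i^L
    return task.cy / f_L

def local_energy(task, f_L, a):      # Eq. (3): e_i^L = a * (f_i^L)^2 * cy_i
    return a * f_L ** 2 * task.cy

def transmit_time(task, r):          # Eq. (4): t_i^LC = D_i / r_i
    return task.D / r

def cloud_time(task, F_C):           # Eq. (5): t_i^C = cy_i / F_C
    return task.cy / F_C

def transmit_energy(task, P, r):     # Eq. (6): e_i^T = P_i * D_i / r_i
    return P * task.D / r

def tsc(tasks, pi, r, f_L, P, E, E_bar, a):
    """Eq. (7a): local energy for local tasks plus weighted transmission
    energy for offloaded tasks; f_L, P, E, r are per-task lists."""
    total = 0.0
    for i, t in enumerate(tasks):
        if pi[i] == 0:                                   # executed locally
            total += local_energy(t, f_L[i], a)
        else:                                            # offloaded to the cloud
            total += (E_bar / E[i]) * transmit_energy(t, P[i], r[i])
    return total
```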

4. Task Offloading Algorithm

The proposed task offloading algorithm has two parts. The AAC network decides the scheduling location of each task, while the heuristic algorithm provides the initial offloading decision on which the AAC network is updated. The heuristic algorithm can quickly produce a solution to the NP-hard problem, but the suboptimal solution it finds leaves room for improvement, so reinforcement learning is used to further optimize the obtained offloading strategy. The two algorithms are described below.

4.1. Heuristic Algorithm

The heuristic algorithm considered in this paper takes Equation (7a) as the optimization objective. Since optimizing the TSC is an NP-hard problem, the deep reinforcement learning algorithm first determines whether a newly arrived task is to be offloaded, and the Cauchy–Buniakowsky–Schwarz inequality is then used to derive the transmission rate allocation and thus the processing time of each task. If the processing time exceeds the task deadline, the processing position of the task is redetermined according to the priority of the task, and the transmission rate allocation is recalculated. Iterations continue until an approximately optimal solution is found.
The Cauchy–Buniakowsky–Schwarz inequality is often applied to quickly solve n-dimensional inequalities [32], and applying it to the allocation of the system's communication resources simplifies the computation and reduces the execution time of the offloading algorithm. When using this inequality, we must first ensure that the left-hand side can be split into two non-negative expressions multiplied together.
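For reference, the standard form of the inequality used below is

$$\left(\sum_{i=1}^{n} u_i^2\right)\left(\sum_{i=1}^{n} v_i^2\right) \ \ge\ \left(\sum_{i=1}^{n} u_i v_i\right)^2,$$

with equality if and only if the vectors $(u_1,\ldots,u_n)$ and $(v_1,\ldots,v_n)$ are proportional. In Theorem 1 it is applied with $u_i = \sqrt{r_i}$ and $v_i = \sqrt{\pi_i^C \frac{\bar{E}}{E_i}\frac{P_i D_i}{r_i}}$, so that $\sum_i u_i^2$ is bounded by the total rate $R$ through constraint (7e).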
Theorem 1.
If the inequality

$$R \sum_{i=1}^{M} \pi_i^C\, \frac{\bar{E}}{E_i}\, \frac{P_i D_i}{r_i} \ \ge\ \left( \sum_{i=1}^{M} \sqrt{\pi_i^C\, \frac{\bar{E}}{E_i}\, P_i D_i} \right)^{2}$$

satisfies both $R > 0$ and $\sum_{i=1}^{M} \pi_i^C \frac{\bar{E}}{E_i}\frac{P_i D_i}{r_i} \ge 0$, then the equality sign holds when and only when

$$r_i^* = \frac{R \sqrt{\pi_i^C\, \frac{\bar{E}}{E_i}\, P_i D_i}}{\sum_{i=1}^{M} \sqrt{\pi_i^C\, \frac{\bar{E}}{E_i}\, P_i D_i}} \qquad (8)$$

Additionally, when $r_i = r_i^*$, TSC obtains its minimum value.
Proof of Theorem 1.
It is known that $R$ is the total transmission rate of the system, which is always positive, and each term in $\sum_{i=1}^{M} \pi_i^C \frac{\bar{E}}{E_i}\frac{P_i D_i}{r_i}$ is greater than or equal to zero, which satisfies the conditions for using the Cauchy–Buniakowsky–Schwarz inequality. Combining the optimization objective (7a) with its constraint (7e) gives the chain of inequalities stated in Equation (9):

$$R \sum_{i=1}^{M} \pi_i^C\, \frac{\bar{E}}{E_i}\, \frac{P_i D_i}{r_i} \ \ge\ \left(\sum_{i=1}^{M} r_i\right) \left(\sum_{i=1}^{M} \pi_i^C\, \frac{\bar{E}}{E_i}\, \frac{P_i D_i}{r_i}\right) \ \ge\ \left(\sum_{i=1}^{M} \sqrt{\pi_i^C\, \frac{\bar{E}}{E_i}\, P_i D_i}\right)^{2} \qquad (9)$$

According to the Cauchy–Buniakowsky–Schwarz inequality, if there exists some $r_i$ not equal to 0, the equality sign holds when and only when there exists a real number $X$ such that, for every $i = 1, 2, \ldots, M$,

$$\sqrt{r_i}\, X + \sqrt{\pi_i^C\, \frac{\bar{E}}{E_i}\, \frac{P_i D_i}{r_i}} = 0,$$

which gives

$$r_i^* = \frac{R \sqrt{\pi_i^C\, \frac{\bar{E}}{E_i}\, P_i D_i}}{\sum_{i=1}^{M} \sqrt{\pi_i^C\, \frac{\bar{E}}{E_i}\, P_i D_i}}$$

as in Equation (8). When $r_i = r_i^*$, TSC obtains its minimum value, and the theorem is proved. □
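As a small illustration of Theorem 1, the closed-form allocation of Equation (8) can be computed directly, as in the sketch below. The function name and the convention that a task with $\pi_i^C = 0$ receives no rate are assumptions for illustration.

```python
import math

def allocate_rates(tasks, pi, P, E, E_bar, R):
    """Closed-form rate allocation of Eq. (8):
    r_i* is proportional to sqrt(pi_i^C * (E_bar / E_i) * P_i * D_i)."""
    weights = [math.sqrt(pi[i] * (E_bar / E[i]) * P[i] * t.D)
               for i, t in enumerate(tasks)]
    total = sum(weights)
    if total == 0.0:                 # no task is offloaded
        return [0.0] * len(tasks)
    return [R * w / total for w in weights]
```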
In the task scheduling process of the real-time edge system in this paper, we should not only make the energy consumption of the edge devices as low as possible, but the tasks should also meet their deadline requirements to the maximum extent in order to improve the real-time performance of the whole system. In traditional scheduling methods, only the remaining execution time or the deadline of a task is typically used to reflect its urgency, so the evaluation criterion is too one-dimensional. We propose a dynamic priority evaluation method that integrates the initial priority, remaining execution time, deadline, and idle time of a task. The dynamic task priority $\Omega_i$ is composed of the preemption cost $\delta_i$ of the task and its execution urgency $\varphi_i$,
$$\Omega_i = \delta_i\, \varphi_i \qquad (11)$$
where
$$\delta_i = \frac{\omega_i}{t_i^{LC} + t_i^C} \qquad (12)$$
Tasks have different levels of importance. Equation (12) integrates the initial priority of a task with its expected processing time, which ensures that important tasks can be completed while tasks that are close to their deadlines still have a chance to be executed; this also protects the tasks already being executed to some extent. The task execution urgency is
$$\varphi_i = q^{\frac{t_i^{LC} + t_i^C}{d_i - t}} \qquad (13)$$
where $t$ is the current moment and $q \in (1, \infty)$. The execution urgency of a task decreases as the task is executed, which in turn gives newly arrived tasks a somewhat greater chance of execution.
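The dynamic priority of Equations (11)-(13) can be evaluated as in the sketch below, reusing the Task fields introduced earlier. It assumes $q > 1$, that the task's deadline has not yet passed, and that Equation (13) has the exponential form reconstructed above; the function name is illustrative.

```python
def dynamic_priority(task, r, F_C, q, now):
    """Dynamic priority Omega_i = delta_i * phi_i (Eqs. (11)-(13))."""
    t_lc = task.D / r                                # expected transmission time, Eq. (4)
    t_c = task.cy / F_C                              # expected cloud execution time, Eq. (5)
    delta = task.w / (t_lc + t_c)                    # preemption cost, Eq. (12)
    phi = q ** ((t_lc + t_c) / (task.d - now))       # execution urgency, Eq. (13), q > 1
    return delta * phi
```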

4.2. Deep Reinforcement Learning Algorithms

To further optimize the task offloading decision obtained from the heuristic strategy, a deep reinforcement learning model, AAC, is used to make the offloading decision for each newly arrived task. The network structure of the model is shown in Figure 3.
As Figure 3 shows, the AAC network is composed of two sub-networks, the actor and the critic, whose first two layers are shared in order to reduce the complexity of the model and speed up network convergence. The hidden layers of both sub-networks consist of 256 × 128 neurons, which was the best combination found after several attempts in the experiments. Keeping the other experimental conditions constant, the number of neurons in the network was increased evenly between 64 × 64 and 256 × 256. We found that too few neurons make the training unstable and difficult to converge, while too many neurons lead to overfitting; the best model performance was achieved when the number of neurons was near 256 × 128.
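A minimal PyTorch sketch of the network in Figure 3 is given below, assuming a fully connected trunk of 256 and 128 units shared by the actor and critic heads; the activation functions and other details not stated in the text are assumptions.

```python
import torch
import torch.nn as nn

class AACNet(nn.Module):
    """Shared-trunk advantage actor-critic network (cf. Figure 3)."""
    def __init__(self, state_dim):
        super().__init__()
        # the first two layers (256 and 128 units) are shared by both sub-networks
        self.shared = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        self.actor = nn.Linear(128, 2)    # two actions: do not offload / offload
        self.critic = nn.Linear(128, 1)   # state value V(S)

    def forward(self, state):
        h = self.shared(state)
        action_probs = torch.softmax(self.actor(h), dim=-1)   # p_theta(a | S)
        value = self.critic(h)
        return action_probs, value
```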
The offloading decision is a prerequisite for resource allocation. We discuss the three elements of the reinforcement learning-based offloading decision method: environment, action, and reward.
  1. Environment state $S$
The design of the state has a great impact on the final training effect of the model. In this model, the environment state includes the state of the task itself and of the external environment. The model is updated only when a new task arrives, and tasks arrive in chronological order, so the task's own state consists of the attributes of the new task, while the external environment state contains the remaining power $E_i$ of each node in the system at this time, the average remaining power $\bar{E}$, the CPU speed $f_i^L$ of the node generating the task, the average CPU speed $F_C$ of the cloud, and the number of tasks waiting to be transmitted in the system (a minimal sketch of assembling this state vector is given after this list).
  2. Action $a$
In the reinforcement learning model, the action is the decision made by the agent, and there are only two actions in this scheduling model: transmission and non-transmission.
  3. Reward function
The output of the reinforcement learning model is the probability $p_\theta(a \mid S)$ of selecting each action in a given state. To measure the goodness of an action, the system cost TSC is used as the reward.
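As mentioned above, a minimal sketch of assembling the environment state is given below; the ordering of the features and the NumPy representation are illustrative assumptions. The TSC resulting from the chosen action then serves as the step reward.

```python
import numpy as np

def build_state(task, E, E_bar, f_source, F_C, n_pending):
    """Environment state S: attributes of the newly arrived task plus
    the external environment described above."""
    return np.array([
        task.s, task.d, task.D, task.cy, task.w,   # the new task's own attributes
        *E,                                         # remaining power E_i of each node
        E_bar,                                      # average remaining power
        f_source,                                   # CPU speed of the node generating the task
        F_C,                                        # average CPU speed of the cloud
        n_pending,                                  # number of tasks waiting to be transmitted
    ], dtype=np.float32)
```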
The AAC algorithm first defines an initial actor $\pi$ to interact with the environment, as shown in Figure 4. The collected information is used to train the critic network to estimate the value function $V$, which is the sum of the rewards received by the system after performing an action until the end of the interaction. The actor network is then updated, and the two networks are iterated until both converge. The actor network parameters are updated as follows:
$$\nabla \bar{R}_\theta \approx \frac{1}{N} \sum_{n=1}^{N} \sum_{t=1}^{T_n} \left( TSC_t^n + V^\pi(S_{t+1}^n) - V^\pi(S_t^n) \right) \nabla \log p_\theta(a_t^n \mid S_t^n) \qquad (14)$$

$$\theta \leftarrow \theta - \eta\, \nabla \bar{R}_\theta \qquad (15)$$
where $\nabla \bar{R}_\theta$ is the gradient of the mean of the reward sums over multiple trajectories, and $\theta$ denotes the parameters of the actor network. Since the optimization goal is to reduce energy consumption while satisfying the real-time requirements of the tasks, which is the opposite of the usual reinforcement learning goal of maximizing the reward, gradient descent is used to update the network.
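The following sketch performs one update in the spirit of Equations (14) and (15) for a single stored transition, using the term $TSC_t + V^\pi(S_{t+1}) - V^\pi(S_t)$ as the advantage; the critic regression target, the single-transition (rather than trajectory-batched) form, and the optimizer handling are assumptions.

```python
import torch

def aac_update(net, optimizer, state, action, tsc_cost, next_state, done):
    """One advantage actor-critic update in the spirit of Eqs. (14)-(15)."""
    probs, value = net(state)
    with torch.no_grad():
        _, next_value = net(next_state)
        if done:
            next_value = torch.zeros_like(next_value)
    # advantage estimate: TSC_t + V(S_{t+1}) - V(S_t)
    advantage = tsc_cost + next_value - value
    log_prob = torch.log(probs[action])
    # minimizing advantage * log-prob performs the descent step of Eq. (15),
    # lowering the probability of actions that incur a high cost
    actor_loss = advantage.detach() * log_prob
    # critic regression toward the one-step cost-to-go target
    critic_loss = (value - (tsc_cost + next_value)).pow(2)
    loss = (actor_loss + critic_loss).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```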

4.3. Algorithm Process

We combine the heuristic algorithm and the deep reinforcement learning algorithm, using the AAC network for offloading decisions and the heuristic algorithm for resource allocation, as shown in Algorithm 1.
Algorithm 1: Offloading algorithm for edge computing tasks based on deep reinforcement learning.
Input: Number of training rounds $M$, set of tasks $G_{ML}$ used for training, number of edge devices $N$, CPU frequency $f_i^L$, energy consumption coefficient $a$, transmission power $P_i$, CPU frequency $F_C$ of the cloud server, total data transmission rate $R$ available to the system, learning rates of the actor and critic networks
Output: Trained Advantage Actor-critic model, total energy consumption of system, and task completion rate
1. Initialize model and related parameters;
2. Let actor π interact with environment
3. for i = 1 to M do
4.      Storing the j-th task of the i-th subtask set of the training set in a list;
5.      Updating status of environment and storing it in the list;
6.    Input environmental state parameters to actor network, and record action a selected by network with probability p ;
7.     Determining value of offloading decision variable $\pi_i^C$ based on recorded actions;
8.      while flag = True
9.          Passing value of $\pi_i^C$ into Equation (7a) and using Equations (8) and (9) to derive allocation of data transfer rate $r$;
10.           Substitute $r$ back into Equation (7a) to find value of TSC at this point, and use Equations (4) and (5) to calculate expected transmission time $t_i^{LC}$ and execution time $t_i^C$ for task;
11.          if  $d_i < t_i^{LC} + t_i^C$
12.             Calculating priority of all tasks being transmitted and pending transmission and forcing selection of lowest priority task to be executed locally;
13.          end if
14.          if All tasks that are subject to offload meet deadlines
15.              flag = False
16.          end if
17.     Storing track data;
18.   end while
19.   Training critic network using stored trajectory data and storing output V of critic network each time;
20.   Training actor network once more with data stored in steps 6, 17, and 19;
21. end for
22. return Trained Advantage Actor-critic model, total energy consumption of system, along with task completion rate;
The core of Algorithm 1 uses a heuristic algorithm and the AAC network, a deep reinforcement learning model. In the edge computing scenario considered in this paper, task offloading and resource allocation is an NP-hard problem. At the same time, the uncertainty of task arrivals poses a great challenge for task offloading. Facing this multi-objective optimization problem, traditional optimization techniques (e.g., linear programming) have difficulty obtaining good results [33]. In addition, deep reinforcement learning has two advantages for this problem: (1) compared with many one-shot optimization methods, deep reinforcement learning can adjust the strategy as the environment changes; (2) its learning process does not require a priori knowledge of how the network state changes over time [34,35]. In fact, the heuristic algorithm is the basis on which the present model can operate efficiently, and the main purpose of the AAC is to further optimize the results derived from the heuristic algorithm. The heuristic algorithm performs the optimization search by introducing the Cauchy–Buniakowsky–Schwarz inequality; using the conclusion of Theorem 1 reduces the number of iterations and greatly accelerates the solution.
The AAC model is an improvement of the actor-critic model. In the actor-critic model, both the Q-network (which evaluates how good an action is) and the V-network (which evaluates how good a state is) need to be estimated, which is not only time-consuming but also introduces greater uncertainty. In the AAC model, the expectation of the V-network is used directly to estimate the Q-network; that is, the critic network learns the advantage value directly instead of the Q value. In this way, the assessment of a behavior is based not only on how good the behavior is, but also on how much it can be improved. The benefit of the advantage function is that it reduces the variance of the policy network's values and stabilizes the model, giving the AAC model superior convergence.
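To make the control flow of Algorithm 1 concrete, the sketch below combines the AAC decision (steps 6-7) with the heuristic deadline correction (steps 8-16). It reuses allocate_rates and dynamic_priority from the earlier sketches; the per-task state vectors, the per-task power and battery lists, and the loop structure are assumptions that follow the steps above rather than the authors' exact implementation.

```python
import torch

def offload_decision_step(net, tasks, states, P, E, E_bar, R, F_C, q, now):
    """One scheduling pass: the AAC network proposes offloading decisions,
    then the heuristic of steps 8-16 repairs deadline violations."""
    pi = []
    for s in states:                                   # one state vector per task
        probs, _ = net(torch.as_tensor(s))
        pi.append(int(torch.multinomial(probs, 1)))    # 0 = local, 1 = offload
    while True:
        r = allocate_rates(tasks, pi, P, E, E_bar, R)  # Eq. (8)
        # offloaded tasks whose transmission + cloud time exceeds the deadline (step 11)
        late = [i for i, t in enumerate(tasks)
                if pi[i] == 1 and t.D / r[i] + t.cy / F_C > t.d]
        if not late:
            break                                      # all offloaded tasks meet their deadlines
        # force the lowest-priority offloaded task to execute locally (step 12)
        offloaded = [i for i, p in enumerate(pi) if p == 1]
        worst = min(offloaded,
                    key=lambda i: dynamic_priority(tasks[i], r[i], F_C, q, now))
        pi[worst] = 0
    return pi, r
```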

5. Experimental Results and Analysis

Simulation experiments are used to demonstrate the performance of the proposed algorithm. All parameters are chosen according to real scenarios. As shown in Table 2, the number of portable detection devices is set to 10, their computational power is 0.2 GCycles/s, and that of the cloud is 10 GCycles/s. The transmission power (in W) of the portable devices is a random number in (0.1, 0.2), and the total system transmission rate is 800 Mb/s. The amount of data for each task is in (10, 40) Mb, and the required computation period is in (0.01, 0.3) GCycles. The arrival times of the tasks follow a uniform distribution [36].
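For reproducibility of the setup, the following sketch generates a workload with the uniform ranges of Table 2, reusing the Task dataclass from Section 3; the number of tasks, the time horizon, the deadline range, and the priority range are illustrative choices not specified in the text.

```python
import random

def make_simulation(num_tasks=200, horizon=100.0, num_devices=10, seed=0):
    """Generate detectors and tasks with the parameter ranges of Table 2."""
    random.seed(seed)
    # transmission power (W) of each detector; map to tasks via task.j
    P = [random.uniform(0.1, 0.2) for _ in range(num_devices)]
    tasks = []
    for _ in range(num_tasks):
        tasks.append(Task(
            j=random.randrange(num_devices),      # source detector
            s=random.uniform(0.0, horizon),       # uniformly distributed arrival time
            d=random.uniform(0.5, 2.0),           # relative deadline (illustrative range)
            D=random.uniform(10.0, 40.0),         # data size (Mb)
            cy=random.uniform(0.01, 0.3),         # required CPU cycles (GCycles)
            w=float(random.randrange(1, 5)),      # initial priority (illustrative)
        ))
    tasks.sort(key=lambda t: t.s)                 # release tasks in time order
    return P, tasks
```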
To demonstrate the performance of the improved AAC-based multi-leakage real-time detection algorithm, the algorithm and the DQN algorithm are trained simultaneously in the same environment. The proposed algorithm is also compared with two benchmark algorithms, that is, task fully local computation and full offloading. Meanwhile, in order to better represent the performance of the proposed algorithm in this paper, we also compare it with a series of excellent heuristics, such as GTO [37], BWOA [38], EEGWO [39], AVOA [40] and DTBO [41].
The variations in the total cost TSC per iteration for the improved AAC-based multi-leakage real-time detection algorithm and the DQN algorithm in this experimental setting are shown in Figure 5. From Figure 5, it can be seen that the proposed algorithm has nearly stabilized and the model reached convergence at 50 rounds of training, while the DQN algorithm only shows a significant trough when training reaches 700 rounds. Although both use a 256 × 128 network structure, the AAC algorithm allows for more stable training and faster convergence due to the presence of the critic network. The figure also shows that the total cost per round of the proposed algorithm is lower than that of the DQN algorithm, so it is better in terms of overall performance. In Figure 6, the vertical coordinate indicates the total system energy consumption. After convergence, the energy consumption of the improved AAC-based multi-leakage real-time detection algorithm is approximately 56% lower than that of the DQN algorithm. The AAC algorithm improves on the DQN algorithm and overcomes its unstable training. Moreover, the AAC algorithm in this paper is not used alone; it works as a further refinement after the heuristic algorithm obtains the suboptimal solution of the model. Therefore, this algorithm achieves a large improvement relative to the DQN algorithm.
Figure 7 and Figure 8 compare the energy consumption of the improved AAC-based multi-leakage real-time detection algorithm with the fully local computation algorithm and the fully offloading algorithm. Since these two algorithms are not machine learning algorithms, there is no training process and no need to compare convergence here. The figures show that the system using the improved AAC-based multi-leakage real-time detection algorithm has lower total energy consumption than both baseline algorithms, saving approximately 50% of the energy consumption compared to fully local computation and approximately 30% compared to the fully offloading algorithm. To make the experiments more realistic, the test tasks have different amounts of data and complexity; executing them all locally or all in the cloud therefore results in higher energy consumption due to the underutilization of system resources. At the same time, comparing Figure 6, Figure 7 and Figure 8, we can see that the system energy consumption of the DQN algorithm is around 9, which is higher than the 6.5 of fully local computation and the 5.3 of full offloading. This is because, in the scenario considered in this paper, the tasks to be offloaded are so random that the performance of the DQN algorithm is no longer sufficient for this scenario, and incorrect predictions waste a great deal of energy.
Figure 9 compares the total system energy consumption of the proposed algorithm with several current excellent heuristics. A simple calculation shows that the proposed algorithm saves 21%, 38%, 30%, 31% and 44% of energy consumption compared to the GTO, BWOA, EEGWO, AVOA and DTBO algorithms, respectively. Combined with Figure 6, Figure 7 and Figure 8, it can be seen that all the heuristics used for comparison in the experiments perform well; nevertheless, the proposed algorithm outperforms them. Thus, we can say with more certainty that the addition of deep reinforcement learning brings the performance of the traditional heuristic algorithm to a higher level.
In this experiment we also obtained another metric to evaluate the performance of the algorithms, namely the task completion rate. Based on the output of the code, the task completion rate is 96.4% for the proposed algorithm and 93.2% for the DQN algorithm; the corresponding values are 92.8%, 90.3%, 89.4%, 94.3% and 91.1% for the GTO, BWOA, EEGWO, AVOA and DTBO algorithms, respectively. The task completion rates of the fully local computing and fully offloading algorithms are 86.7% and 93.4%, respectively. According to the experimental results, the proposed algorithm achieves the highest task execution success rate, indicating that it has the best real-time performance and can ensure that as many tasks as possible are completed before their deadlines. The task completion rate of the fully local computing algorithm is the lowest, mainly due to the high complexity of the tasks and the limited computing power of the nodes.
The experiments designed in this paper show that the design idea of the algorithm is reasonable and effective. It is based on the principle of first using the heuristic algorithm for initial optimization and then using deep reinforcement learning for further optimization. It brings about more efficient task offloading for edge computing, which not only ensures the real-time performance of the algorithm but also further reduces the system energy consumption compared with currently strong optimization algorithms such as GTO.

6. Conclusions and Future Work

We studied an edge computing task offloading and resource allocation problem in a natural gas pipeline leak detection scenario, with the optimization goal of minimizing energy consumption while ensuring high real-time performance of the system. Due to the unpredictability of the computational tasks, deep reinforcement learning was used to solve this problem. Using the AAC algorithm framework, the final offloading strategy was obtained by minimizing the overall system cost and continuously optimizing the task offloading strategy, and the allocation of communication resources was then optimized through a heuristic algorithm based on the Cauchy–Buniakowsky–Schwarz inequality. Simulation results show that this algorithm converges faster than the DQN algorithm, while the energy consumption is reduced by 56%. Although heuristics such as GTO, BWOA, EEGWO, AVOA and DTBO perform better than the DQN algorithm, the proposed algorithm still saves 21%, 38%, 30%, 31% and 44% of energy consumption compared to them, respectively. The energy consumption is reduced by 50% compared to fully local computation, and by 30% compared to the fully offloading algorithm. This algorithm also has the highest task completion rate and the best real-time performance. Furthermore, this paper proves, using the Cauchy–Buniakowsky–Schwarz inequality, a sufficient condition for the heuristic algorithm to achieve a suboptimal solution. The performance of the DQN algorithm in the experiments shows that, due to the strong real-time nature of the scenario in this paper and the strong uncertainty of the system environment, a reinforcement learning algorithm used alone converges slowly, and incorrect offloading predictions also tend to lead to higher energy consumption. Finally, the proposed algorithm is not optimal for every application scenario. It uses a complex deep reinforcement learning model in order to meet the performance requirements of real-time task arrival scenarios. In contrast, for deterministic scenarios where the set of tasks to be offloaded is known and no prediction of future tasks is required, simpler methods such as linear programming can achieve the same or even better performance and are then clearly the better choice.
In this paper, the communication environment of the system is simplified in the modeling, and the interference factor of the channel is not considered. The allocation of network resources in the edge computing system is also idealized; it will be studied in detail in future work in conjunction with SDN technology. In future work, we will also further consider cooperation among edge nodes in order to maximize the utilization of idle system resources and further reduce the system's energy consumption. To further improve this model, we will also allocate computation and storage resources on edge and cloud servers at a finer granularity.

Author Contributions

Conceptualization, D.W., R.W. and C.X. (Changqing Xia); Formal analysis, R.W. and C.X. (Changqing Xia); Funding acquisition, C.X. (Changqing Xia); Methodology, R.W.; Project administration, C.X. (Changqing Xia), X.J. and C.X. (Chi Xu); Resources, D.W. and C.X. (Changqing Xia); Software, R.W. and T.X.; Supervision, D.W.; Validation, R.W.; Writing—original draft, R.W.; Writing—review and editing, C.X. (Changqing Xia). All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by National Key Research and Development Program of China (2022YFB3304004), the National Natural Science Foundation of China (61903356, 61972389, 62133014, 62022088, 62173322 and U1908212), the National Natural Science Foundation of Liaoning province (2022JH6/100100013), and the Youth Innovation Promotion Association CAS (2020207, 2019202, Y2021062).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sun, C.; Li, H.; Li, X.; Wen, J.; Xiong, Q.; Wang, X.; Leung, V.C.M. Task Offloading for End-Edge-Cloud Orchestrated Computing in Mobile Networks. In Proceedings of the 2020 IEEE Wireless Communications and Networking Conference (WCNC), Seoul, Republic of Korea, 25–28 May 2020; pp. 1–6. [Google Scholar] [CrossRef]
  2. Li, J.; Gao, H.; Lv, T.; Lu, Y. Deep reinforcement learning based computation offloading and resource allocation for MEC. In Proceedings of the 2018 IEEE Wireless Communications and Networking Conference (WCNC), Barcelona, Spain, 15–18 April 2018; pp. 1558–2612. [Google Scholar] [CrossRef]
  3. Zhang, X.; Jin, W.; Li, L.; Wang, X.; Qin, C. Research progress on passive infrared imaging detection technology and system performance evaluation of natural gas leakage. Infrared Laser Eng. 2019, 48 (Suppl. S2), 47–59. [Google Scholar] [CrossRef]
  4. Kan, T.; Chiang, Y.; Wei, H. Task offloading and resource allocation in mobile-edge computing system. In Proceedings of the 2018 27th Wireless and Optical Communication Conference (WOCC), Hualien, Taiwan, 30 April–1 May 2018; pp. 1–4. [Google Scholar] [CrossRef]
  5. Tan, T.; Zhao, M.; Zhu, Y.; Zeng, Z. Joint Offloading and Resource Allocation of UAV-assisted Mobile Edge Computing with Delay Constraints. In Proceedings of the 2021 IEEE 41st International Conference on Distributed Computing Systems Workshops (ICDCSW), Washington, DC, USA, 7–10 July 2021; pp. 21–26. [Google Scholar] [CrossRef]
  6. Wang, Q.; Shen, J.; Zhao, Y.; Li, G.; Zhao, J.; Zhang, Y.; Guo, Y. Offloading and Delay Optimization Strategies for Power Services in Smart Grid for 5G Edge Computing. In Proceedings of the 2022 IEEE 6th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 4–6 March 2022; pp. 1423–1427. [Google Scholar] [CrossRef]
  7. Li, Q. An Actor-Critic Reinforcement Learning Method for Computation Offloading with Delay Constraints in Mobile Edge Computing. arXiv 2019, arXiv:1901.10646. [Google Scholar] [CrossRef]
  8. Han, Z.; Tan, H.; Li, X.; Jiang, S.; Li, Y.; Lau, F.C.M. OnDisc: Online Latency-Sensitive Job Dispatching and Scheduling in Heterogeneous Edge-Clouds. IEEE/ACM Trans. Netw. 2019, 27, 2472–2485. [Google Scholar] [CrossRef]
  9. Sun, Y.; Zhou, S.; Xu, J. EMM: Energy-Aware Mobility Management for Mobile Edge Computing in Ultra Dense Networks. IEEE J. Sel. Areas Commun. 2017, 35, 2637–2646. [Google Scholar] [CrossRef] [Green Version]
  10. Mahenge, M.P.J.; Li, C. Energy-efficient task offloading strategy in mobile edge computing for resource-intensive mobile applications. Digit. Commun. Netw. 2022, 8, 19–37. [Google Scholar] [CrossRef]
  11. Ding, X.; Zhang, W. Computing Unloading Strategy of Massive Internet of Things Devices Based on Game Theory in Mobile Edge Computing. Math. Probl. Eng. 2021, 2021, 1–12. [Google Scholar] [CrossRef]
  12. Guo, H.; Liu, J. Collaborative Computation Offloading for Multiaccess Edge Computing Over Fiber–Wireless Networks. IEEE Trans. Veh. Technol. 2018, 67, 4514–4526. [Google Scholar] [CrossRef]
  13. Gu, B.; Zhou, Z.; Mumtaz, S.; Frascolla, V.; Kashif Bashir, A. Context-Aware Task Offloading for Multi-Access Edge Computing: Matching with Externalities. In Proceedings of the 2018 IEEE Global Communications Conference (GLOBECOM), Abu Dhabi, United Arab Emirates, 9–13 December 2018; pp. 1–6. [Google Scholar] [CrossRef]
  14. Ni, W.; Tian, H.; Lyu, X.; Fan, S. Service-dependent task offloading for multiuser mobile edge computing system. Electron. Lett. 2019, 55, 839–841. [Google Scholar] [CrossRef]
  15. Luo, J.; Deng, X.; Zhang, H.; Qi, H. QoE-Driven Computation Offloading for Edge Computing. J. Syst. Archit. 2019, 97, 34–39. [Google Scholar] [CrossRef]
  16. Dinh, T.Q.; Tang, J.; La, Q.; Quek, T.Q.S. Offloading in Mobile Edge Computing: Task Allocation and Computational Frequency Scaling. IEEE Trans. Commun. 2017, 65, 3571–3584. [Google Scholar] [CrossRef]
  17. Wu, H.; Sun, Y.; Wolter, K. Energy-Efficient Decision Making for Mobile Cloud Offloading. IEEE Trans. Cloud Comput. 2020, 8, 570–584. [Google Scholar] [CrossRef]
  18. Jošilo, S.; Dán, G. Computation Offloading Scheduling for Periodic Tasks in Mobile Edge Computing. IEEE/ACM Trans. Netw. 2020, 28, 667–680. [Google Scholar] [CrossRef]
  19. Tang, L.; He, S. Multi-User Computation Offloading in Mobile Edge Computing: A Behavioral Perspective. IEEE Netw. 2018, 32, 48–53. [Google Scholar] [CrossRef]
  20. Yi, C.; Cai, J.; Su, Z. A Multi-User Mobile Computation Offloading and Transmission Scheduling Mechanism for Delay-Sensitive Applications. IEEE Trans. Mob. Comput. 2020, 19, 29–43. [Google Scholar] [CrossRef]
  21. Wang, J.; Hu, J.; Min, G.; Zomaya, A.Y.; Georgalas, N. Fast Adaptive Task Offloading in Edge Computing Based on Meta Reinforcement Learning. IEEE Trans. Parallel Distrib. Syst. 2021, 32, 242–253. [Google Scholar] [CrossRef]
  22. Huang, L.; Feng, X.; Zhang, C.; Qian, L.; Wu, Y. Deep reinforcement learning-based joint task offloading and bandwidth allocation for multi-user mobile edge computing. Digit. Commun. Netw. 2019, 5, 10–17. [Google Scholar] [CrossRef]
  23. Wang, J.; Hu, J.; Min, G.; Zhan, W.; Ni, Q.; Georgalas, N. Computation Offloading in Multi-Access Edge Computing Using a Deep Sequential Model Based on Reinforcement Learning. IEEE Commun. Mag. 2019, 57, 64–69. [Google Scholar] [CrossRef] [Green Version]
  24. Ali, Z.; Jiao, L.; Baker, T.; Abbas, G.; Abbas, Z.H.; Khaf, S. A Deep Learning Approach for Energy Efficient Computational Offloading in Mobile Edge Computing. IEEE Access 2019, 7, 149623–149633. [Google Scholar] [CrossRef]
  25. Dai, B.; Niu, J.; Ren, T.; Hu, Z.; Atiquzzaman, M. Towards Energy-Efficient Scheduling of UAV and Base Station Hybrid Enabled Mobile Edge Computing. IEEE Trans. Veh. Technol. 2022, 71, 915–930. [Google Scholar] [CrossRef]
  26. Yang, S.; Lee, G.; Huang, L. Deep Learning-Based Dynamic Computation Task Offloading for Mobile Edge Computing Networks. Sensors 2022, 22, 4088. [Google Scholar] [CrossRef]
  27. Cao, Z.; Zhou, P.; Li, R.; Huang, S.; Wu, D. Multiagent Deep Reinforcement Learning for Joint Multichannel Access and Task Offloading of Mobile-Edge Computing in Industry 4.0. IEEE Internet Things J. 2020, 7, 6201–6213. [Google Scholar] [CrossRef]
  28. Huang, L.; Feng, X.; Qian, L.; Wu, Y. Deep Reinforcement Learning-Based Task Offloading and Resource Allocation for Mobile Edge Computing. In Proceedings of the MLICOM 2018: Machine Learning and Intelligent Communications, Hangzhou, China, 6–8 July 2018; Springer: Cham, Switzerland, 2018; pp. 33–42. [Google Scholar] [CrossRef]
  29. Huang, L.; Bi, S.; Zhang, Y.J.A. Deep Reinforcement Learning for Online Computation Offloading in Wireless Powered Mobile-Edge Computing Networks. IEEE Trans. Mob. Comput. 2020, 19, 2581–2593. [Google Scholar] [CrossRef] [Green Version]
  30. Chen, X.; Zhang, H.; Wu, C.; Mao, S.; Ji, Y.; Bennis, M. Optimized Computation Offloading Performance in Virtual Edge Computing Systems Via Deep Reinforcement Learning. IEEE Internet Things J. 2019, 6, 4005–4018. [Google Scholar] [CrossRef]
  31. Liu, Y.; Lee, M.; Zheng, Y. Adaptive multi-resource allocation for cloudlet-based mobile cloud computing system. IEEE Trans. Mob. Comput. 2015, 15, 2398–2410. [Google Scholar] [CrossRef]
  32. Tan, L.; Zhang, X.; Zhou, Y.; Che, X.; Hu, M.; Chen, X.; Wu, D. AdaFed: Optimizing Participation-Aware Federated Learning With Adaptive Aggregation Weights. IEEE Trans. Netw. Sci. Eng. 2022, 9, 2708–2720. [Google Scholar] [CrossRef]
  33. Wen, G.; Ge, S.; Chen, C.L.; Tu, F.; Wang, S. Adaptive Tracking Control of Surface Vessel Using Optimized Backstepping Technique. IEEE Trans. Cybern. 2019, 49, 3420–3431. [Google Scholar] [CrossRef]
  34. Yang, Y.; Gao, W.; Modares, H.; Xu, C. Robust Actor–Critic Learning for Continuous-Time Nonlinear Systems With Unmodeled Dynamics. IEEE Trans. Fuzzy Syst. 2022, 30, 2101–2112. [Google Scholar] [CrossRef]
  35. Vu, V.; Pham, T.; Dao, P. Disturbance observer-based adaptive reinforcement learning for perturbed uncertain surface vessels. ISA Trans. 2022, 130, 277–292. [Google Scholar] [CrossRef]
  36. Cao, Y.; Jiang, T.; Wang, C. Optimal radio resource allocation for mobile task offloading in cellular networks. IEEE Netw. 2014, 28, 68–73. [Google Scholar] [CrossRef]
  37. Abdollahzadeh, B.; Gharehchopogh, F.; Mirjalili, S. Artificial gorilla troops optimizer: A new nature-inspired metaheuristic algorithm for global optimization problems. Int. J. Intell. Syst. 2021, 36, 5887–5958. [Google Scholar] [CrossRef]
  38. Hayyolalam, V.; Kazem, A. Black Widow Optimization Algorithm: A novel meta-heuristic approach for solving engineering optimization problems. Eng. Appl. Artif. Intell. 2020, 87, 103249. [Google Scholar] [CrossRef]
  39. Long, W.; Jiao, J.; Liang, X.; Tang, M. An exploration-enhanced grey wolf optimizer to solve high-dimensional numerical optimization. Eng. Appl. Artif. Intell. 2018, 63, 63–80. [Google Scholar] [CrossRef]
  40. Abdollahzadeh, B.; Gharehchopogh, F.; Mirjalili, S. African vultures optimization algorithm: A new nature-inspired metaheuristic algorithm for global optimization problems. Comput. Ind. Eng. 2021, 158, 107408. [Google Scholar] [CrossRef]
  41. Dehghani, M.; Trojovská, E.; Trojovský, P. A new human-based metaheuristic algorithm for solving optimization problems on the base of simulation of driving training process. Sci. Rep. 2022, 12, 9924. [Google Scholar] [CrossRef]
Figure 1. Workflow of edge computing in natural gas pipeline detection.
Figure 2. System model.
Figure 3. Advantage actor-critic network structure.
Figure 4. Trajectory of interaction between actor π and environment.
Figure 5. TSC based on improved AAC multi-leakage real-time detection algorithm and DQN algorithm.
Figure 6. Total system energy consumption based on improved AAC multi-leakage real-time detection algorithm and DQN algorithm.
Figure 7. Total system energy consumption based on improved AAC multi-leakage real-time detection algorithm with fully locally calculated system.
Figure 8. Total system energy consumption based on improved AAC multi-leakage real-time detection algorithm with fully offloaded system.
Figure 9. Total system energy consumption based on improved AAC multi-leakage real-time detection algorithm with some excellent current heuristic algorithms.
Table 1. Model variables.

Symbol | Definition
$s_i$ | Release time of task $i$
$d_i$ | Relative deadline of task $i$
$D_i$ | Amount of data carried by task $i$
$\omega_i$ | Priority of task $i$
$\pi_i^C$ | Offloading decision variable
$t_i^L$ | Local execution time of task $i$
$t_i^{LC}$ | Offload transmission time of task $i$
$t_i^C$ | Cloud execution time of task $i$
$e_i^L$ | Local execution energy consumption of task $i$
$e_i^T$ | Offload transmission energy consumption of task $i$
$r_i$ | Data transmission rate assigned to task $i$
$E_i$ | Percentage of power remaining in detector $i$
$\bar{E}$ | Average power percentage of detectors that are idle and performing offload tasks
$TSC$ | Total system cost
Table 2. Simulation parameters.

Parameter | Value
Number of detection devices $K$ | 10
Local computing capability $f_i^L$ | 0.2 GCycles/s
Cloud server computing capability $F_C$ | 10 GCycles/s
Transmission power $P_i$ | (0.1, 0.2) W
Total system transmission rate $R$ | 800 Mb/s
Computation cycles required per task $cy_i$ | (0.01, 0.3) GCycles
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Citation

Wei, D.; Wang, R.; Xia, C.; Xia, T.; Jin, X.; Xu, C. Edge Computing Offloading Method Based on Deep Reinforcement Learning for Gas Pipeline Leak Detection. Mathematics 2022, 10, 4812. https://doi.org/10.3390/math10244812
