### **1. Introduction**

With the maturation of fifth-generation (5G) mobile communication research, cities have become increasingly informatized and intelligent. More and more smart facilities are deployed in every corner of the city, enhancing the quality of life of citizens. In the era of the Internet of Things (IoT), smart cities power and monitor a variety of intelligent IoT devices, and IoT applications such as smart parking, smart traffic, and smart security are designed around them. These applications can generate computation-intensive tasks such as camera tracking and object recognition. In a traditional central cloud network, such tasks are offloaded to the central cloud server for execution. However, the central cloud network faces the following challenges: (1) serving a large number of users easily causes network congestion; (2) since the central cloud server is far away from users, data transmission consumes a lot of time. As a main evolution technology in 5G, Mobile Edge Computing (MEC) provides a promising direction for addressing these challenges [1,2]. The MEC server is deployed at the edge of the core network, closer to users, so computation-intensive tasks can be offloaded to it to reduce the delay, network congestion, and energy consumption of IoT devices [3,4].
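The benefit of offloading can be seen in a small worked example using the standard delay/energy model (execution time $T = C/f$, local energy $E = \kappa f^2 C$, transmission energy $E = p \cdot D/r$). All numbers below are hypothetical illustrations, not parameters from this article:

```python
# Illustrative comparison of local execution vs. MEC offloading.
# Every constant here is a made-up but plausible value.

D = 2e6        # task input size: 2 Mbit
C = 1e9        # required CPU cycles: 1 Gcycle
f_loc = 5e8    # IoT device CPU frequency: 0.5 GHz
f_mec = 5e9    # MEC CPU frequency allocated to this task: 5 GHz
r = 2e7        # uplink rate: 20 Mbit/s
p = 0.5        # device transmit power: 0.5 W
kappa = 1e-27  # effective switched capacitance of the device chip

# Local execution: all cycles run on the device.
t_local = C / f_loc              # 2.0 s
e_local = kappa * f_loc**2 * C   # 0.25 J

# Offloading: transmit the input, then compute on the MEC server;
# the device only pays the transmission energy.
t_off = D / r + C / f_mec        # 0.1 s + 0.2 s = 0.3 s
e_off = p * (D / r)              # 0.05 J

print(f"local:   {t_local:.2f} s, {e_local:.3f} J")
print(f"offload: {t_off:.2f} s, {e_off:.3f} J")
```

With these numbers, offloading cuts both delay (0.3 s vs. 2.0 s) and device energy (0.05 J vs. 0.25 J), which is exactly the trade-off the rest of the article optimizes.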

Based on the above description, how to make a reasonable offloading decision and resource allocation scheme under limited resources has become a key problem. The joint optimization of task offloading and resource allocation is a mixed-integer nonlinear programming problem [5,6]. Scholars have already produced research on task offloading and resource allocation: the joint problem has been solved by splitting it into several sub-problems, by relaxing variables, and by deep reinforcement learning based on the Deep Q Network (DQN) framework [7–9]. However, the first two approaches simplify the original problem and do not directly solve the joint optimization problem of task offloading and resource allocation. With the development of deep neural networks, deep reinforcement learning has proven effective for environmental decision-making problems. However, algorithms based on the DQN framework have difficulty handling fine-grained or continuous action spaces. Therefore, a deep reinforcement learning algorithm based on the Deep Deterministic Policy Gradient (DDPG) framework, which handles continuous decision-making problems well, is adopted in this article.

**Citation:** Chen, X.; Liu, G. Federated Deep Reinforcement Learning-Based Task Offloading and Resource Allocation for Smart Cities in a Mobile Edge Network. *Sensors* **2022**, *22*, 4738. https://doi.org/10.3390/s22134738

**Academic Editor:** Antonio Cano-Ortega

**Received:** 26 May 2022. **Accepted:** 17 June 2022. **Published:** 23 June 2022.

In order to obtain a better Quality of Experience (QoE), some research works adopted cooperation between MEC servers or unified scheduling on the central cloud [10,11]. However, these collaborative and centralized processing algorithms do not consider the privacy and security problems that arise during data migration and processing, so many users are reluctant to upload their private raw data to other MEC servers or to the central cloud server. To tackle this problem, the federated learning technology was proposed by Google [12]. It is a distributed machine learning framework consisting of one central server and a set of clients [13–15]. The main idea of federated learning is to let each client train its own network model on its local data; the clients' parameters are then aggregated to update the network model on the server side. A better training model is obtained by iterating between distribution and aggregation without sharing the raw data. Therefore, federated learning is introduced into the joint optimization problem of this article to obtain better optimization performance.
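The distribute-train-aggregate cycle described above can be sketched with a minimal FedAvg-style parameter average, weighted by client sample counts. This is a generic illustration of federated aggregation, not the exact rule used later in this article:

```python
import numpy as np

def fed_avg(client_params, client_weights):
    """Aggregate per-client models into one global model.

    client_params: list of models, each a list of numpy arrays (layer weights).
    client_weights: relative importance of each client (e.g., sample counts).
    """
    w = np.asarray(client_weights, dtype=float)
    w /= w.sum()  # normalize weights to sum to 1
    n_layers = len(client_params[0])
    # Weighted element-wise average of each layer across clients.
    return [sum(wi * params[layer] for wi, params in zip(w, client_params))
            for layer in range(n_layers)]

# Two toy "clients", each holding a single 2x2 weight matrix.
a = [np.array([[0.0, 2.0], [4.0, 6.0]])]
b = [np.array([[2.0, 4.0], [6.0, 8.0]])]
global_model = fed_avg([a, b], client_weights=[1, 1])
print(global_model[0])   # element-wise mean: [[1. 3.] [5. 7.]]
```

Only the parameter arrays cross the network; the raw training data never leaves the clients, which is the privacy property motivating its use here.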

In this article, we focus on the joint optimization problem of task offloading and resource allocation based on privacy protection in smart cities. The optimization objective is to minimize the energy consumption of all IoT devices within a delay threshold. Since the joint optimization is a mixed-integer nonlinear programming problem, it is difficult to solve with traditional programming algorithms. Therefore, a two-timescale federated deep reinforcement learning algorithm based on the DDPG framework is proposed to solve the problem. On the small timescale, each MEC server trains a DDPG network to optimize its offloading decision and resource allocation scheme; on the large timescale, the parameters of the MEC servers are aggregated to obtain better training performance. The contributions of this paper can be summarized as follows:


The rest of this article is organized as follows: Section 2 reviews related work. Section 3 presents the system model, including the task model, communication model, and computation model. Section 4 presents the optimization problem and its solution. Section 5 provides the simulation results and evaluates the performance of the proposed algorithm. Section 6 concludes this article.
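The two-timescale structure proposed in this introduction can be sketched as a nested loop. In this toy version, the per-server DDPG update is replaced by a single gradient step on a quadratic surrogate loss, and all constants are hypothetical; it only illustrates the alternation between local training and global aggregation:

```python
import numpy as np

def local_step(theta, target, lr=0.1):
    """Stand-in for one small-timescale DDPG update on one MEC server:
    a gradient step on the toy loss 0.5 * ||theta - target||^2."""
    return theta - lr * (theta - target)

def aggregate(thetas):
    """Large-timescale step: the cloud averages the MEC servers' parameters."""
    return np.mean(thetas, axis=0)

K, T_SMALL, T_LARGE = 3, 10, 20            # servers, local steps per round, rounds
targets = np.array([[1.0], [2.0], [3.0]])  # each server's hypothetical local optimum
thetas = np.zeros((K, 1))                  # all servers start from the same model

for _ in range(T_LARGE):                   # large timescale: aggregation rounds
    for _ in range(T_SMALL):               # small timescale: local training
        thetas = np.array([local_step(t, g) for t, g in zip(thetas, targets)])
    thetas = np.tile(aggregate(thetas), (K, 1))  # broadcast global model back

print(float(thetas[0]))   # converges near the mean of the local optima (2.0)
```

The alternation is the point: local steps pull each server toward its own environment, and the periodic average keeps the fleet training a single shared policy.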

### **2. Related Work**

The concept of MEC was put forward many years ago. In 2013, the world's first mobile edge computing platform was established by IBM and Nokia Siemens Networks [16]. In 2014, the European Telecommunications Standards Institute (ETSI) published industry specifications for MEC, which were supported by IBM, Huawei, Intel, etc. Currently, most MEC research focuses on how to fully utilize the powerful computing and storage capacity of the MEC server to reduce the delay and energy consumption of IoT devices [17]. Popular content can be cached on the MEC server to reduce delay and backhaul load: Aung et al. [18] proposed a social-aware vehicular edge computing architecture that solves the content delivery problem by using some of the vehicles in the network as edge servers that store and stream popular content to nearby end users. Computation-intensive applications can also be offloaded to the MEC server for execution [19]: Apostolopoulos et al. [20] formulated a joint latency and energy minimization problem considering the data offloading characteristics of the end nodes. In this article, we focus only on the computing resource allocation of the MEC server.

The task offloading problem in a communication system inevitably involves task scheduling and the allocation of computing and transmission resources [21,22]. Therefore, the problem is naturally regarded as a joint optimization problem of task offloading and resource allocation, which is a mixed-integer nonlinear programming problem. There are generally three types of algorithms for solving it. The first type splits the joint optimization problem into multiple sub-problems [7,23]. Zhao et al. [24] formulated the joint task offloading and resource allocation problem and decomposed it into three sub-problems: offloading ratio selection, transmission power optimization, and sub-carrier and computing resource allocation. In [25], the joint optimization problem was decomposed into two-level sub-problems and solved by an iterative algorithm. This type of algorithm does not jointly optimize the original problem, and iterating over several sub-problems is not efficient. The second type relaxes the variables in the optimization problem [8]. Masoufdi et al. [26] investigated the power minimization problem for mobile devices by data offloading in a multi-cell multi-user Orthogonal Frequency Division Multiple Access (OFDMA) network; the problem was converted to convex form using variable changes, Difference of Convex (DC) approximation, a penalty factor, and relaxation of the binary constraints. In [27], lower and upper bounds of the joint optimization problem were considered, and semi-definite relaxation and rounding methods were exploited to obtain the offloading decision. In this type of algorithm, the mixed-integer nonlinear program is transformed into a nonlinear program by variable relaxation and then solved by an iterative or genetic algorithm, which is also inefficient.
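The relax-and-round idea behind this second class of algorithms can be shown on a toy instance: binary offloading indicators are relaxed to [0, 1], the relaxed problem (here a fractional knapsack) is solved greedily, and the fractional solution is rounded back to a feasible binary decision. This is only an illustration of the general pattern, not the method of [26] or [27]:

```python
def relax_and_round(savings, demands, capacity):
    """Toy offloading problem: choose binary x_n maximizing total energy
    savings, subject to the MEC capacity constraint sum(x_n * demands[n]).

    Relaxing x_n to [0, 1] yields a fractional knapsack, solved greedily by
    savings-per-cycle; rounding truncates any fractional variable to 0."""
    order = sorted(range(len(savings)),
                   key=lambda n: savings[n] / demands[n], reverse=True)
    x = [0.0] * len(savings)
    left = capacity
    for n in order:                        # greedy solution of the relaxation
        x[n] = min(1.0, left / demands[n])
        left -= x[n] * demands[n]
        if left <= 0:
            break
    return [int(v) for v in x]             # round: only fully served tasks offload

# Three tasks: energy saved by offloading (J) and CPU cycles demanded.
decision = relax_and_round(savings=[0.5, 0.4, 0.3],
                           demands=[2e9, 1e9, 2e9], capacity=3e9)
print(decision)   # [1, 1, 0]: tasks 0 and 1 offload, task 2 runs locally
```

Rounding is exactly where optimality can be lost, which is why the paragraph above calls this class of algorithms inefficient relative to joint optimization.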
The third type uses deep reinforcement learning to solve the optimization problem. Li et al. [9] investigated the resource allocation scheme for vehicle-to-everything communications and formulated the optimization problem of resource block allocation and vehicle transmission power allocation; a reinforcement learning algorithm based on the DQN framework was designed to solve it. Suh et al. [28] proposed a DQN-based network slicing technique to calculate the resource allocation policy, maximizing long-term throughput while satisfying Quality of Service (QoS) requirements in beyond-5G systems. Since it is difficult for the DQN algorithm to handle fine-grained or continuous action spaces, a deep reinforcement learning algorithm based on the DDPG framework is proposed to solve the joint optimization problem in this article.
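The distinction comes down to the policy's output: a DQN scores a fixed discrete action set and takes the argmax, so continuous quantities like a computing-resource share must be quantized, whereas a DDPG actor maps the state directly to a continuous action. A minimal numpy sketch of such a deterministic actor (untrained, with hypothetical dimensions and features):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 state features, 8 hidden units, 1 continuous action.
W1, b1 = rng.standard_normal((8, 4)) * 0.1, np.zeros(8)
W2, b2 = rng.standard_normal((1, 8)) * 0.1, np.zeros(1)

def actor(state):
    """Deterministic policy: state -> computing-resource fraction in (0, 1).

    A DQN would instead score a fixed grid of discrete fractions and take
    the argmax, either coarsening the allocation or exploding the action set."""
    h = np.tanh(W1 @ state + b1)
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))  # sigmoid bounds the action

state = np.array([0.3, 0.7, 0.1, 0.9])  # e.g., normalized task size, queue, ...
frac = actor(state)
print(frac)   # a continuous resource share strictly inside (0, 1)
```

In full DDPG, this actor is trained against a critic network via the deterministic policy gradient; only the continuous output head is shown here.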

To improve resource utilization and algorithm performance, some research works adopted cooperation methods such as Cloud-MEC, MEC-MEC, and Cloud-MEC-Device. Naouri et al. [29] proposed a three-layer task offloading framework consisting of a device layer, a cloudlet layer, and a cloud layer. A cloud-MEC collaborative computation offloading scheme was proposed for vehicular networks [24]. Chen et al. [30] studied an energy-efficient task offloading and resource allocation scheme for Augmented Reality (AR) in a multi-MEC collaborative system. Monia et al. [31] investigated the joint task assignment and power control problems for Device-to-Device (D2D) offloading communications with energy harvesting; a layered optimization method was proposed that decouples the energy efficiency maximization problem into power allocation and offloading assignment. However, these collaborative and centralized processing algorithms do not consider the privacy and security problems that arise during data migration and processing, so many users are reluctant to upload their private raw data to other MEC servers or to the central cloud server. To solve this problem, federated learning is introduced in this article, which not only protects privacy but also improves the performance of the model.

### **3. System Model**

In this article, a system model for a smart city in a mobile edge network is established, which consists of three layers: IoT devices, MEC servers, and the central cloud, as shown in Figure 1. The central cloud plays an auxiliary role: it helps the MEC servers obtain a better decision-making mechanism by aggregating the neural network parameters of each edge server. The MEC server has powerful computing capacity and can quickly process the tasks offloaded by IoT devices. The IoT devices generate tasks with strict computing requirements; since the devices have limited computing resources and limited energy, these tasks need to be offloaded to the MEC server for processing. In consideration of security and privacy, IoT devices may offload their tasks only to the trusted MEC server, not to the central cloud server. We denote the central cloud by Γ, the set of MEC servers by {1, 2, ... , *K*} with index *k*, and the set of IoT devices (equivalently, the set of applications) by {1, 2, ... , *N*} with index *n*. We regard IoT devices as special-purpose devices, each corresponding to one application, and assume that each IoT device requests only one task at a time and that the network state is constant during task processing. The specific workflow of the system is as follows. First, the IoT devices generate tasks and send the relevant task information to the MEC server through the base stations. Second, an offloading and resource allocation decision is made according to the collected task information and network status. Finally, the tasks are executed according to the offloading decision and resource allocation scheme.
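The three-step workflow can be made concrete with a small sketch. The field names and the placeholder decision rule below are illustrative assumptions, standing in for the formulation and the DDPG policy developed later:

```python
from dataclasses import dataclass

@dataclass
class Task:
    device_id: int     # n in {1, ..., N}
    size_bits: float   # input data size to upload
    cycles: float      # required CPU cycles
    deadline_s: float  # delay threshold

@dataclass
class Decision:
    offload: bool      # execute on the MEC server or locally
    f_alloc: float     # CPU frequency (Hz) granted by the MEC server

def decide(task, f_mec_free, rate):
    """Step 2 of the workflow: a naive placeholder for the learned policy —
    offload whenever the MEC server can still meet the task's deadline."""
    t_off = task.size_bits / rate + task.cycles / f_mec_free
    if t_off <= task.deadline_s:
        return Decision(offload=True, f_alloc=f_mec_free)
    return Decision(offload=False, f_alloc=0.0)

# Step 1: a device generates a task and reports it via the base station.
task = Task(device_id=1, size_bits=2e6, cycles=1e9, deadline_s=0.5)
# Steps 2-3: the server decides, and the task executes accordingly.
d = decide(task, f_mec_free=5e9, rate=2e7)
print(d.offload)   # True: 0.1 s upload + 0.2 s compute meets the 0.5 s deadline
```

In the article's actual scheme, the `decide` step is replaced by the trained per-server DDPG policy, and the cloud's role is limited to parameter aggregation, matching the trust constraint described above.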
