*Article* **Secure Mobile Edge Server Placement Using Multi-Agent Reinforcement Learning**

**Mumraiz Khan Kasi 1,\*, Sarah Abu Ghazalah 2, Raja Naeem Akram 3 and Damien Sauveron 4**

<sup>1</sup> Department of Computer Science, FICT, BUITEMS, Quetta 87300, Pakistan


**Abstract:** Mobile edge computing is capable of providing high data processing capabilities while satisfying the low-latency constraints of low-power wireless networks, such as the industrial internet of things. However, optimally placing edge servers (which provide storage and computation services to user equipment) remains a challenge. To optimally place mobile edge servers in a wireless network, such that network latency is minimized and load balancing is performed across edge servers, we propose a multi-agent reinforcement learning (RL) solution to the formulated mobile edge server placement problem. The RL agents are designed to learn the dynamics of the environment and adopt a joint action policy that minimizes network latency and balances the load on edge servers. To ensure that the action policy adopted by the RL agents maximizes the overall network performance indicators, we propose sharing information, such as the latency experienced from each server and the load of each server, with the other RL agents in the network. Experimental results are presented to analyze the effectiveness of the proposed solution. Although the sharing of information allows the proposed solution to achieve network-wide maximization of overall network performance, it also makes the solution susceptible to different kinds of security attacks. To further investigate the security issues arising from the proposed solution, we provide a detailed analysis of the possible types of security attacks and their countermeasures.

**Keywords:** mobile edge computing; mobile edge server placement; multiagent RL; edge security

#### **1. Introduction**

Widespread deployments of robotics, assembly and production, automation, machine intelligence, and virtual reality applications require high-performance computing resources available close to the point of service [1]. The integration of smart services, such as predictive analysis, and delay-intolerant applications, such as healthcare applications, into the current cellular architecture, combined with the limited battery lifetimes and processing power of edge (mobile and IoT) devices, has called for a re-imagination of the cloud computing architecture.

The traditional cloud-centric architecture provides flexibility and significant computation power. However, the communication and delay sensitive requirements of IoT environments place constraints on the centralised cloud, making it less preferable as a robust service platform. To circumvent the delay of the traditional cloud-centric architecture, several network architectures have been proposed with the idea of bringing the cloud nearer to user devices [2]. One such architecture is edge computing, which provides a virtualized application layer between edge devices and the cloud engine in an existing network infrastructure. Edge computing introduces distributed control systems replacing the single remote centralized control centre or cloud, allowing data to be processed near the edge of the network with enhanced scalability.


Cloudlets, multi-access or mobile edge computing (MEC), and fog computing are some of the well-known edge computing architectures. In this work, we utilize the MEC architecture, which uses the existing network infrastructure, such as cellular base stations or Wi-Fi access points, to provide computational resources and data storage at the edge of the network [3]. In MEC, the edge network serves as a mid-tier between edge (mobile and IoT) devices and the cloud, increasing the network's capability to provide high throughput and offer low latency to edge devices. However, costly hardware components and the limited budgets of network operators present practical complications in implementing mobile edge computing solutions. Due to these constraints, only a limited number of mobile edge servers can be located in a network, which makes the placement of this limited number of mobile edge servers a challenging problem given the performance requirements of a wireless network. Additionally, the large set of possible placement options further increases the complexity of finding an optimal placement strategy for mobile edge servers.

The large solution space of mobile edge server placement options can be reduced by collocating mobile edge servers with existing cellular network base stations or wireless network cluster heads [4]. In this work, we follow a similar strategy, where mobile edge servers are placed within an already existing cellular or wireless network infrastructure, preventing the search space of the optimal placement strategy from exploding. Further, the optimal placement strategy should take into consideration the individual mobile edge server's workload, access delay, and application specific requirements. Various solutions have been proposed in the literature to solve the edge server placement problem [5–11]. However, most of these solutions apply heuristic algorithms or some sort of linear or quadratic optimization technique. This has motivated us to propose an online learning based solution for the mobile edge server placement problem, with special emphasis on possible security threats and their countermeasures.

In our proposed solution, an RL agent is employed at each mobile edge server; it learns the dynamics of the environment and chooses the best placement strategy, maximizing a reward that depends on the utility function. The number of RL agents is equal to the number of edge servers in the network. This demands information exchange between the RL agents to maximize the network-wide utility. In the proposed work, we implement hysteretic Q-learning for coordination, which is a decentralized multi-agent RL technique allowing the agents to take independent actions while maximizing a common goal. Further, we analyze the security threats, and their countermeasures, that may arise due to the information sharing between the edge servers acting as RL agents.

The primary contributions of this work are:


The rest of the paper is organized as follows: In Section 2, we discuss the existing theories on the placement of edge servers. We discuss the preliminaries of RL in Section 3. In Section 4, we provide an overview of our proposed solution using RL. In Section 5, we describe network and RL modeling and its implementation. Section 6 explains our findings for various system configurations and parameters. Section 7 provides details on the identification of the security issues involved in the exchange of information between edge servers. Finally, Section 8 concludes our work and suggests future research directions.

#### **2. Literature Review**

The literature related to mobile edge computing can be divided into two parts. The first part deals with efficient resource allocation in a MEC network, while the second part, which has been trending recently, tackles the mobile edge server placement problem in a MEC network. It is important to note that the solutions provided for resource allocation problems consider an arbitrary placement of edge servers, whereas the second part deals with deployment strategies for edge servers such that the desired quality of service is met. To that end, this literature review discusses pioneering works in both of these parts, followed by the contributions of this work.

In the literature on efficient resource allocation, online learning based solutions have been proposed for resource allocation, scheduling, or offloading problems in MEC networks [12–15].

The proposed solutions have studied MEC problems from different aspects, ranging from computation offloading schemes in MEC to mobile edge application placement. However, they have not explicitly addressed the mobile edge server's positioning as a challenging problem. For example, the authors in [13] have proposed a deep RL solution to allocate computing and network resources adaptively in MEC to reduce the average service time and balance resource usage under varying MEC environment conditions. In [12], the authors propose an RL based management framework to manage the resources at the network edge. They propose a deep Q-learning based algorithm to reduce service migration in MEC, aiming at operation cost reduction. In both of these proposed approaches and in others [14,15], RL has been used to address resource allocation challenges in the MEC architecture.

On the other hand, the literature on efficient deployment strategies for edge servers includes solving the edge server placement problem using genetic, simulated annealing, and hill-climbing algorithms [5,6], k-means clustering with quadratic programming [8], queuing theory and vector quantization techniques [16], graph theory [17], cost-constrained multi-objective optimization [10], and integer programming [11] to optimally place mobile edge servers in a wireless network.

In [5], the authors formulate the problem as a constrained multi-objective problem to balance workloads of mobile edge servers and reduce network access delay. To find the optimal solution, the authors utilize genetic, simulated annealing, and hill-climbing algorithms to show the effectiveness of the proposed solution. The authors in [6] use data mining techniques, such as a non-dominated sorting genetic algorithm to ensure reliability and low latency in social media services using mobile edge computing.

In [7], the authors present an edge provisioning algorithm that can find the ideal edge locations and map them to their physical locations in a MEC network. In [8], the authors make use of k-means algorithms to solve edge placement problems with mixed-integer quadratic programming. The authors in [9] propose a queuing network based solution to find the best position for cloudlets in a cloud computing network. In [10], the authors have proposed an integer programming solution to find the optimal strategy to place mobile edge servers in smart cities.

The authors in [16] propose an optimal edge server deployment strategy using queuing theory and a vector quantization technique, with the aim of minimizing the service provider's cost and the service completion time. The authors in [17] use graph theory to minimize the access delay and the number of edge servers in a MEC network. In [18], the authors develop a two-stage solution to optimally place heterogeneous edge servers in MEC using game theory concepts to optimize the service response time.

Although the edge server deployment problem has recently received traction from both academia and industry, the proposed solutions assume that global information is available at a centralized controller, which is responsible for the deployment of edge servers. However, in a realistic environment the changing network conditions, user mobility, and traffic patterns make such centralized solutions unscalable due to the large amount of processing required at the centralized controller at each transmission time interval. Considering these reported shortcomings, the proposed solution provides a distributed, learning based solution to the mobile edge server placement problem while considering both delay minimization and edge server load balancing. To the best of our knowledge, the problem of mobile edge server placement has not previously been solved through RL. This work serves as a primer on distributed placement strategies in MEC, in which each edge server, based on a local set of observations, finds its optimal placement by exchanging a limited set of information with other edge servers in the network. The information exchange between the edge servers is an essential part of a distributed edge server placement solution. To that end, this paper investigates the security concerns in implementing a multi-agent RL solution for the mobile edge server placement problem and its possible countermeasures.

#### **3. Reinforcement Learning**

This section briefly discusses the concepts of reinforcement learning (RL) and its extension to a multi-agent RL.

#### *3.1. One-Agent RL*

RL algorithms are built on Markov decision processes (MDPs) that allow the agent to receive a reinforcement signal from the environment, steering it towards an optimal action policy. An MDP is defined as ⟨*O*, *A*, *P*, *ρ*⟩, where *O* is the set of observations or states perceived from the environment, *A* is the discrete or finite set of actions, $P : O \times A \times O \to [0, 1]$ is the probabilistic state transition function, and $\rho : O \times A \times O \to \mathbb{R}$ is the reward function.

At any time step $i$, the action $a_i \in A$ influences the environment state to change from $o_i$ to $o_{i+1}$ with a transition probability of $P(o_i, a_i, o_{i+1})$. In return for the implemented action, the agent receives a scalar reward $r_{i+1} \in \mathbb{R}$ according to $r_{i+1} = \rho(o_i, a_i, o_{i+1})$. The overarching target of an RL agent is to adopt an action policy that maximizes the discounted expected future reward, which is given as:

$$Q^\pi(o, a) = \mathbb{E}[R\_i | o\_i = o, a\_i = a, \pi],\tag{1}$$

where $R_i = \sum_{j=0}^{\infty} \gamma^j r_{i+j+1}$ is the discounted return, $\gamma \in [0, 1]$ is the discount factor, and $Q^\pi : O \times A \to \mathbb{R}$ is the Q-function representing the discounted expected future return for a state-action pair.

Mathematically, the maximum value of the discounted expected return is characterized as $Q^*(o, a) = \max_\pi Q^\pi(o, a)$, which can be learned and estimated using Q-learning in the absence of the probabilistic state transition and reward functions [19]. It has been theoretically proven that the Q-learning algorithm converges to the optimal solution under certain conditions [19]. The Q-learning algorithm allows an RL agent to iteratively learn the estimate $Q^*$ based on its interactions with the environment using the formula:

$$Q\_{i+1}(o\_i, a\_i) = (1 - \alpha)Q\_i(o\_i, a\_i) + \alpha\Big(r\_{i+1} + \gamma \max\_{a\_{i+1}} Q\_i(o\_{i+1}, a\_{i+1})\Big),\tag{2}$$

where *α* ∈ [0, 1] defines the learning rate of the Q-learning algorithm.
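For concreteness, the following minimal Python sketch (ours, not part of the original implementation, which is in MATLAB) shows the tabular update of Equation (2) together with an ε-greedy action choice; all names and default hyperparameter values are illustrative assumptions.

```python
import numpy as np

def q_learning_step(Q, o, a, r, o_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update, cf. Equation (2).
    Q is a (num_states, num_actions) array; o, a, o_next are indices; r is r_{i+1}."""
    td_target = r + gamma * np.max(Q[o_next])             # r + gamma * max_a' Q(o', a')
    Q[o, a] = (1 - alpha) * Q[o, a] + alpha * td_target
    return Q

def epsilon_greedy(Q, o, epsilon=0.15, rng=np.random.default_rng()):
    """Choose a random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[o]))
```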

#### *3.2. Multi-Agent RL*

In a multi-agent RL problem, multiple agents using RL algorithms interact or compete to maximize a well-defined goal. Multi-agent RL is more complex than a one-agent RL solution, since the environment is jointly influenced by the actions of all agents, which means no single action policy dominates. Optimal behavior of a multi-agent RL system is reached when every agent operates at the Nash equilibrium, which is difficult to realize in practical applications [20]. In the Nash equilibrium, each RL agent assumes unvarying behavior from the other agents and maximizes its own reward. Due to the complexities involved in implementing the Nash equilibrium in practical applications, we discuss a couple of algorithms that can deal with multi-agent RL problems.

#### 3.2.1. Independent Agents

In this method, the RL agent follows a coordination-free strategy with the other agents by assuming each agent's independence. This is equivalent to implementing one-agent RL for each agent in the problem without initiating any coordination. Formulating the multi-agent RL problem with independent agents simplifies the solution, but it also makes convergence difficult due to the non-stationarity introduced by the independent agents [21].

#### 3.2.2. Indirect-Coordinating Agents

In this method, the RL agent follows a coordination strategy with the other agents. The action selection strategy comprises the joint action of all agents and is based on the reward feedback received from the environment. The learning is still independent, but a common objective exists between the RL agents, which each agent tries to maximize. We discuss one such indirect-coordination multi-agent RL method, called hysteretic RL. In hysteretic RL, the agents take actions independently but the reward function is shared between all agents, given as [22]:

$$\delta \leftarrow r - Q\_k(o, a\_k) \tag{3}$$

$$Q\_k(o, a\_k) \leftarrow \begin{cases} Q\_k(o, a\_k) + \mu \delta, & \text{if } \delta \ge 0 \\ Q\_k(o, a\_k) + \sigma \delta, & \text{else} \end{cases} \tag{4}$$

where the learning rates *µ* and *σ* lie between 0 and 1, *r* is the reward based on the feedback returned by the environment, and *Q<sup>k</sup>*(*o*, *a<sup>k</sup>*) is the *Q*-value of the *k*th agent. The core idea behind hysteretic RL is to apply the smaller learning rate *σ* < *µ* to negative updates, so that an agent is not excessively penalized for a bad joint outcome caused by the exploratory actions of the other agents.
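A minimal sketch of the hysteretic update in Equations (3) and (4) is given below (Python, our illustration; the learning rates are example values with *σ* < *µ*).

```python
def hysteretic_update(Q, o, a, r, mu=0.1, sigma=0.01):
    """Hysteretic Q-learning update following Equations (3)-(4).

    delta <- r - Q_k(o, a_k); the update uses mu when delta >= 0 and the
    smaller rate sigma when delta < 0, so an agent is not over-penalised for
    poor team rewards caused by the other agents' exploration.
    (A bootstrapped variant would add gamma * max_a' Q[o_next, a'] to delta.)
    """
    delta = r - Q[o, a]
    Q[o, a] += (mu if delta >= 0 else sigma) * delta
    return Q
```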

#### **4. Multi-Objective Problem Formulation**

The mobile edge server positioning problem can be modeled as an undirected graph in which the locations of base stations in the existing cellular architecture form the vertices and the distances between base stations and mobile edge servers are represented as edge weights. A finite set of mobile edge servers *S* can be collocated with a set of base stations *B*, such that the number of mobile edge servers is always less than the number of base stations, as shown in Figure 1. As discussed in Section 1, the constraint of collocating mobile edge servers with existing base station locations reduces the virtually infinite solution space of optimal mobile edge server positioning. Assuming a straightforward communication channel between base stations and mobile edge servers, the access delay at edge devices can be defined via the Euclidean distance ($d_b$) between a mobile edge server and a base station, where $b \in |B|$. The workload of a base station ($t_b$) is defined as the processing of incoming call and flow requests from edge devices.

The desired key performance indicators in MEC are reduced network access delay and a balanced load on mobile edge servers, which is why the mobile edge server positioning problem in this work is devised to improve these key performance indicators while finding an optimal placement of *S* mobile edge servers. The goals of the formulated problem are to (i) reduce the access delay or latency between mobile edge servers and base stations, and (ii) balance the workload of the mobile edge servers. The key assumptions made in formulating the mobile edge server positioning problem considered in this work are:


**Figure 1.** Edge servers placement in mobile edge computing.

The workload of the $s$th ($s \in |S|$) mobile edge server, $T_s(\ell)$, depends on the processing and storage requests offloaded from the connected base stations, such that $T_s(\ell) = \sum_{b\in|B|} t_b$, where $\ell$ is the positioning arrangement of the mobile edge servers and $t_b$ is the incoming call and data request load from edge devices at the $b$th base station. It is important to note that in MEC, the base station acts as a relay node, transferring the incoming call and data requests from edge devices to the mobile edge servers. Similarly, the access delay is devised as the sum of Euclidean distances from the $s$th mobile edge server to the one or more base stations that offload the processing and storage of incoming requests to it, such that $D(\ell) = \sum_{b\in|B|} d_b$. Balancing the workload of the mobile edge servers ensures that no edge server is overloaded with offloading requests while some other mobile edge server's processing capacity is underutilized. Mathematically, the standard deviation of the mobile edge servers' workloads is used to devise the workload balancing metric $W(\ell)$ in a MEC, such that,

$$\mathcal{W}(\ell) = \text{std}(T\_j, T\_k) \quad \forall j, k \in |\mathcal{S}|. \tag{5}$$

Finally, the cost function of multi-objective constrained optimization problem can be defined as:

$$\mathcal{C}(\ell) = \beta \mathcal{W}'(\ell) + (1 - \beta) \mathcal{D}'(\ell), \tag{6}$$

where the superscript in $z'$ denotes the normalized value of a variable $z$, and $\beta \in [0, 1]$ is the weighting parameter.
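As an illustration, the sketch below computes $W(\ell)$, $D(\ell)$, and the cost $C(\ell)$ of Equations (5) and (6); the min-max normalization and its bounds are our assumptions, since the paper only states that the two terms are normalized.

```python
import numpy as np

def normalise(x, x_min, x_max):
    # Min-max normalisation; the exact normalisation scheme is an assumption.
    return (x - x_min) / max(x_max - x_min, 1e-9)

def placement_cost(workloads, total_delay, bounds, beta=0.5):
    """Cost C(l) of a placement, cf. Equations (5) and (6).

    workloads   : per-edge-server workloads T_s(l)
    total_delay : D(l), sum of Euclidean distances between servers and the
                  base stations whose requests they serve
    bounds      : observed (min, max) pairs used for normalisation
    beta        : weight of load balancing against access delay
    """
    W = np.std(workloads)                               # W(l), Equation (5)
    W_n = normalise(W, *bounds["workload_std"])
    D_n = normalise(total_delay, *bounds["delay"])
    return beta * W_n + (1 - beta) * D_n                # C(l), Equation (6)
```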

Therefore, the formulated mobile edge server positioning problem can be defined mathematically as:

$$\min\_{\ell} \; \mathcal{C}(\ell)$$

such that

$$\ell \in |\mathcal{B}|, \tag{7}$$

$$\sum\_{s=1}^{|\mathcal{S}|} \mathcal{X}\_{bs} \le |\mathcal{S}|, \tag{8}$$

where constraint (8) ensures that a base station offloads processing/storage requests to no more than $|\mathcal{S}|$ mobile edge servers. The above formulation of the mobile edge server positioning problem is a mixed-integer programming problem that is NP-hard in nature [10]. Therefore, this work proposes to solve the mobile edge server positioning problem using a multi-agent RL technique.

#### **5. Proposed Solution**

The mobile edge computing architecture is beneficial in providing services to a densely deployed network with low-latency and high-throughput requirements [3]. However, there are certain limitations attached to the MEC architecture. First, as explained above, the cost of infrastructure deployment and maintenance is high; therefore, a dense deployment of edge servers is not a cost-effective solution. Second, the service requirements of users change with respect to time; therefore, a mobile edge server deployment strategy that is optimal at one time may be sub-optimal at other times. The varying requirements of mobile users due to mobility require that the proposed solution be able to adapt to changing scenarios.

One option could be to manually configure the network at different times of the day to make sure that the edge server deployment is optimal; however, the associated operational expenditure may not be feasible for an operator. To circumvent this, an online learning paradigm such as RL can be effective in dealing with changing environment conditions. In RL, the environment is modeled as an MDP, which allows an RL agent to learn the optimal action policy by interacting with the environment. In our proposed approach, each mobile edge server works as an RL agent and the environment is modeled as the mobile edge computing network with base stations and user devices. Each RL agent takes actions independently based on the state perceived from observations and measurements of the environment; however, the reward is computed based on the network-wide delay and workload observed, which requires information exchange, as shown in Figure 2. The network-wide utility is defined as the average communication delay and edge server workload over all edge servers in the network. The objective of the proposed work is to find a mobile edge server positioning strategy that caters to the data rate requirements of users while minimizing delay and maintaining workload balance between edge servers. In this section, we discuss the methodologies adopted for the environment and RL agent design.

**Figure 2.** Multi-agent RL assisted mobile edge computing.

#### *5.1. Environment Design*

To make the proposed environment design realistic, we make use of base station locations and call and data request records from the Shanghai Telecom dataset, which includes approximately 7 million call and data requests made through 2766 base stations by 9481 edge devices [8,23,24]. Each call and data record is a tuple containing the request access time of a device at a base station. Shanghai being a heavily populated city makes this a suitable dataset for implementing a mobile edge server placement solution in an ultra-dense MEC network. Figure 3 shows the base station distribution in Shanghai, China, where each dot is the location of a base station and the color of a dot represents the intensity of incoming call and data requests from edge devices.

**Figure 3.** Graphical depiction of base station locations in Shanghai Telecom dataset [8,23,24].

The graphical depiction of base station locations is important for realizing that a mobile edge server placement solution would allow the edge servers to move to any of the other base station locations. This means that if there are 2766 base stations in the network, then a mobile edge server could move to any of these locations. However, there are two problems associated with this assumption. First, the number of locations a mobile edge server can move to at each transmission time interval would depend on the number of base stations in the network, which would not scale if the number of base stations is too high. Second, in the real world a mobile edge server would be deployed on a movable object, such as a vehicle, which limits the movement of edge servers to only nearby base station locations.

Considering the two problems discussed above, we transform the distribution of base stations given in the Shanghai Telecom dataset into a contour line. The contour line links the base station locations by joining each base station to its two nearest neighbors. This transformation of the actual base station locations into a contour line has the following benefits:


In Figure 4a, we show the simulated depiction of the base station locations available in the Shanghai Telecom dataset. Note that the dataset assumes base station locations in two-dimensional space. Figure 4b shows the contour representation of the base station locations, such that each base station is represented as a point in two-dimensional space connected to its two nearest base stations. The contour line transforms the actual dataset values so that a mobile edge server can only move between two adjacent locations.
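The paper does not spell out the exact contour-construction procedure; the greedy nearest-neighbour chaining below is one plausible sketch (in Python) of how base-station coordinates could be ordered into such a line.

```python
import numpy as np

def contour_order(coords):
    """Greedily chain 2-D base-station coordinates into a contour line.

    Starting from an arbitrary station, repeatedly append the nearest
    not-yet-visited station. This ordering is our assumption, not the
    authors' exact procedure.
    """
    coords = np.asarray(coords, dtype=float)
    visited = [0]
    remaining = set(range(1, len(coords)))
    while remaining:
        last = coords[visited[-1]]
        nxt = min(remaining, key=lambda i: np.linalg.norm(coords[i] - last))
        visited.append(nxt)
        remaining.remove(nxt)
    return visited   # indices of base stations along the contour line
```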

To quantify the workload of a mobile edge server and the delay experienced by user equipment, we make use of the records available in the Shanghai Telecom dataset. The workload of a mobile edge server is quantified by summing up the requested call and data rates from the connected base stations. As a base station offloads its requested computational processing to the connected mobile edge server, summing up these requested rates is a reasonable assumption [10]. The delay experienced by a user is proportional to the distance between the base station and mobile edge server locations, assuming that user equipment is present in close vicinity to base stations [10]. Therefore, the edge device access delay is defined in terms of the sum of Euclidean distances from a base station to the mobile edge servers to which incoming call and data requests are offloaded for processing or storage.
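A small sketch of this quantification (our illustration; the data structures are assumptions, not the dataset's actual schema) is shown below.

```python
import numpy as np

def server_metrics(assignment, request_rates, bs_coords, es_coords):
    """Per-edge-server workload T_s(l) and access delay from dataset records.

    assignment    : dict mapping base-station index -> serving edge-server index
    request_rates : per-base-station sum of incoming call/data requests t_b
    bs_coords     : (num_bs, 2) array of base-station coordinates
    es_coords     : (num_es, 2) array of current edge-server coordinates
    """
    workload = np.zeros(len(es_coords))
    delay = np.zeros(len(es_coords))
    for b, s in assignment.items():
        workload[s] += request_rates[b]                           # T_s(l) = sum_b t_b
        delay[s] += np.linalg.norm(bs_coords[b] - es_coords[s])   # Euclidean d_b
    return workload, delay
```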

**Figure 4.** Transformation of base station locations in Shanghai Telecom dataset to contour line. (**a**) Simulated depiction of base station locations. (**b**) Contour line joining nearest base stations.

#### *5.2. RL Agent Design*

In this part, we aim to solve the optimization problem formulated in Section 4 for each mobile edge server using RL (see Algorithm 1). The proposed approach considers a scenario where each mobile edge server is placed on a movable vehicle that has the ability to move within the network, as shown in Figure 1. The movement of the vehicle is controlled by the actions of an RL agent that aims to learn the optimal placement strategy by maximizing the reward. There are three main components involved in the design of an RL agent: the action space, the state space, and the reward.

#### 5.2.1. Action Space

The action space in the proposed work is the set of actions by which a mobile edge server changes its location. These actions are updated at the end of an epoch, which depends on the change in network traffic. In the proposed work, actions are formed to move the mobile edge server between adjacent locations. Since we have formed a contour line from the actual base station locations, a mobile edge server is restricted to moving to only two possible neighboring locations. The action space therefore comprises three distinct actions: a mobile edge server can move to the adjacent location on the right, move to the adjacent location on the left, or stay at the same location. This set of actions is available to each mobile edge server, with the assumption that multiple mobile edge servers can be positioned at the same location.
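A minimal sketch of this three-action space is given below (our illustration; treating the ends of the contour line as absorbing boundaries is an assumption).

```python
# Actions available to each agent: move left, stay, or move right along the contour.
LEFT, STAY, RIGHT = 0, 1, 2

def apply_action(position, action, num_locations):
    """Update a server's index on the contour line; boundary handling is assumed."""
    if action == LEFT:
        return max(position - 1, 0)
    if action == RIGHT:
        return min(position + 1, num_locations - 1)
    return position  # STAY
```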


#### 5.2.2. State Space

A state in the proposed work is defined by the communication delay between a mobile edge server and the base stations that are offloading call and data requests to it. The communication delay, as discussed in Section 4, is proportional to the Euclidean distance between a mobile edge server and the connected base stations. Delay alone is used to infer the state of the environment. Note that other network features, such as location information, data request rate, etc., could also be used to infer the state of the environment; however, we have made use of a simplified state space model to (i) show the efficacy of the proposed solution and (ii) focus on the security aspects that may arise from the proposed solution.

Even with a simplified state space containing only the delay metric as the state variable, the number of possible state values can be infinite. For this reason, we quantize the delay values between the maximum and minimum delay, which will vary for different MEC networks.
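A possible quantization of the delay into a discrete state index is sketched below; the number of bins is our assumption, since the paper only states that delay is quantized between its minimum and maximum values.

```python
import numpy as np

def delay_to_state(delay, d_min, d_max, num_bins=10):
    """Map an observed delay onto one of `num_bins` discrete states."""
    edges = np.linspace(d_min, d_max, num_bins + 1)
    return int(np.clip(np.digitize(delay, edges) - 1, 0, num_bins - 1))
```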

#### 5.2.3. Reward

An RL agent learns from the feedback returned by the environment in the form of rewards. In this problem, the reward is a function of the cost values, such that:

$$R\_t = \mathcal{C}(\ell)\_{t-1} - \mathcal{C}(\ell)\_t \tag{8}$$

The expression in Equation (8) drives the RL agent to take actions such that the cost is reduced relative to the previous epoch. Note that the cost function depends on global observations. For example, a mobile edge server must be aware of the workload and delay of the other edge servers in order to compute the cost function. This transforms the problem into a coordinated multi-agent RL problem in which the reward function depends on network-wide metrics. This makes it equivalent to the hysteretic RL algorithm discussed in Section 3, which is used by each mobile edge server to implement RL in this work.
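In code, the reward of Equation (8) reduces to a difference of two cost evaluations; the sketch below highlights that both evaluations need the workload and delay values shared by all edge servers.

```python
def reward(cost_prev, cost_curr):
    """R_t = C(l)_{t-1} - C(l)_t: positive when the joint placement chosen in
    this epoch lowered the network-wide cost. Both cost values are computed
    from the workload and delay information shared by all edge servers."""
    return cost_prev - cost_curr
```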

The state and action spaces still depend on local observations, and each agent takes actions independently. The sharing of information between edge servers controls the behavior of the mobile edge server placement which, if changed for some reason, would affect the performance of the overall implementation. We further discuss the types of security breaches and their countermeasures in Section 7.

#### **6. Results**

In this section, we evaluate the performance of the multi-agent RL algorithm for the mobile edge server positioning problem by experimenting on the Shanghai Telecom base station and incoming call and data request dataset [8,23,24]. The proposed solution is implemented in MATLAB. Multiple RL agents take actions independently to find the best placement strategy, such that the reward returned by the environment is maximized. The performance of the proposed solution is measured via the cost function value, which drives the reward function value. The simulation and RL hyperparameters used during the experiments shown in this work are summarized in Table 1. It is important to note that the optimal number of edge servers in a network depends on a number of factors, including the network operator's budget and user traffic demand. The proposed model selects an arbitrary number of edge servers (which should be less than the number of base stations) and finds the optimal placement for these edge servers. However, the number of edge servers can be determined using probability theory and control system rules [25].


**Table 1.** Simulation parameters.

| Parameter | Description | Value |
|-----------|-------------|-------|
| *β* | weightage of delay over workload in utility function | 0.5 |
| *γ* | discount factor | 0.9 |
| *ε* | random exploration | 0.15 |

The experimentation presented in this work aims to answer the following questions:


#### **A. Does the proposed solution generalize across the different randomly initialized values used in the experimentation?**

The objective of this experiment is to show the generalizability of the proposed solution across different initial parameter values used in the experimentation. The RL agents' initial locations and other experimentation parameters, such as *α*, *β*, and *ε*, require the assignment of initial values, which are set randomly. Therefore, using different seeds for the random values will change the initial state of each RL agent. Ideally, the average final cost for all these experiments should be the same. However, the use of distinct random seeds makes the initial state set for RL exploration strikingly different, presenting an entirely different search space for the RL agents to explore and exploit.

In Figure 5, the cost function values across the number of epochs are shown, where each epoch represents the instant at which all RL agents take actions. The plotted cost function value is averaged over all mobile edge servers used in the experimentation. We can observe that, for different seed values impacting the initial state of each agent, the proposed solution is able to minimize the average cost function value. Another significant observation is the fast convergence of the cost function values for each seed, shown in Figure 5. These results enable us to claim that, even with a simplified state space, the proposed solution is generalizable to different initial states without the use of any complex deep learning models.

**Figure 5.** Number of BS = 120 and Number of ES = 20. (**a**) Random Seed = 5. (**b**) Random Seed = 8. (**c**) Random Seed = 23.

#### **B. Is the proposed solution effective in finding the best placement for mobile edge servers when different numbers of base stations are present in the network?**

In Figure 6, the performance of the proposed solution is shown for a varying number of base stations available in the MEC network. Ideally, a change in the number of base stations in the network should not affect the convergence of the proposed multi-agent RL assisted edge server placement. The results in Figure 6 show that for 120 and 240 base stations available in the network, the cost values converge after a number of epochs. Another significant observation is the increase in the cost function value for the first few epochs when the number of base stations is 240. This is mainly because the RL agents explore the environment by choosing random actions with probability *ε*, which is reduced at each epoch.

**Figure 6.** Number of ES = 20 and Seed = 8. (**a**) BS = 120. (**b**) BS = 240.

#### **C. Is the proposed solution effective in finding the best placement for mobile edge servers when the number of mobile edge servers present in the network is varied?**

In Figure 7, the performance of the proposed solution is shown for a varying number of mobile edge servers. Ideally, varying the number of mobile edge servers in the MEC network should not affect the convergence of the proposed multi-agent RL assisted edge server placement. The results in Figure 7 show that for 20 and 30 mobile edge servers to be placed in the network, the cost values converge after a number of epochs.

**Figure 7.** Number of BS = 240 and Random Seed = 8. (**a**) ES = 20. (**b**) ES = 30.

In Figure 8, we present the results of a toy example in which three base stations, namely A, B, and C, are placed in a network. Base stations A and C are placed at the corners and base station B in the middle. Considering the toy example allows us to evaluate the performance of the proposed solution against a numerical solution. According to the numerical solution, the optimal locations for the edge servers are 'A' and 'C', and it can be observed that after a certain number of epochs both edge servers converge to these optimal locations.

**Figure 8.** Proposed solution comparison with ground truth for a toy example; where three base stations namely A, B, and C, are placed in a network.

#### **7. Security Perspective**

We have demonstrated how reinforcement learning assisted mobile edge server placement can be performed with multi-agent reinforcement learning coordination techniques. The multi-agent coordination problem may give rise to different security related issues: since the operation of a reinforcement learning agent is based on observations shared by other agents, modifying the shared information will affect the working of the entire solution. In the following sections, we present possible scenarios by which security can be breached and present the proposed countermeasures.

#### *7.1. Scenarios*

As discussed in the earlier sections, the workload is the sum of all the data offloaded from the connected base stations, and the delay is the sum of the distances to the connected base stations. The agent performs its actions based on the reward function, and the reward is based on workload balancing and delay.

From a security perspective, the first scenario we can consider is when an agent itself or a man-in-the-middle (MITM) alters the information contained in the packets passed between agents. This is done in order to force an agent to change its location to another base station, or to stay with the same base station despite a need for workload balancing and delay minimization.

Figure 9 presents a scenario of the security issues present in this work. Let us assume that the values of workload and delay are increased. This will force the mobile edge server MES1 to move from its current location to a particular location where workload balancing is needed, since the agent will assume that it needs to balance the workload and delay by migrating to other base stations. In contrast, if an agent itself or a MITM alters the information by decreasing the values of workload and delay, the mobile edge server MES2 will assume that everything in the network is fine and that it does not need to change its location to other base stations.

Since the reward function at each agent makes decisions based on the information (workload and delay) received, the agent will act accordingly. Therefore, in the first scenario, after the information is altered by the malicious node, the mobile edge server MES1 will assume that it needs to move to the MES2 location in order to balance the workload.

The second potential security scenario is a malicious entity compromising an agent in the network. The attack vector differs from altering packets en route, but the malicious entity can achieve the same impact as in the first scenario. Furthermore, a malicious entity compromising an agent in the network may resort to eavesdropping with the aim of (a) constructing a traffic map of the network, (b) building communication patterns between agents, and (c) reading communication packets. Of the above listed aims, 'a' and 'b' can help the malicious user to understand the network design and the communication patterns between agents. This can assist the malicious entity in mounting a network-wide attack, for example a DDoS attack. Aim 'c' allows the malicious entity to read the information communicated between the agents. This might reveal sensitive information about the agents or the applications being executed on them.

**Figure 9.** Malicious/MITM Agent scenario at a Mobile Edge Server.

The third potential security scenario is a Trojan horse attack, whether the Trojan horse is embedded in hardware or software. The objectives of such an attacker can be similar to those of the malicious entity in the second security scenario, and the attack objective and impact can also be similar.

The fourth potential security scenario is an insider threat. In this attack, an insider compromises a single node, a collection of nodes, or the whole network, depending upon the access of the insider and how senior their role is. The attack objective and impact can be similar to those of the three scenarios listed before, but depending upon the access privileges the impact on the network can be significant.

We do not consider a lack of knowledge or expertise in an organization, or a genuine human error, as a security threat in itself, as in most cases it merely leads to the vulnerabilities that the malicious actors in the above scenarios exploit, and to the respective impacts.

#### *7.2. Countermeasures*

In this section, we explore potential countermeasures to each of the security scenarios discussed above.

#### 7.2.1. Countermeasure to First Security Scenario

As discussed earlier, an agent running on the mobile edge server has global information of workload and delay of all the agents in the network, whereas, the decision is made locally based on the information received and used in the reward function by an agent.

These security issues require verifying the identity of an agent before allowing access to resources in the system. Therefore, an authentication mechanism is needed to verify the identity of an agent and, thus, to prevent it from faking or masquerading.

Additionally, data integrity needs to be ensured to prevent data from being altered or destroyed while being exchanged amongst the agents in an unauthorized manner to maintain consistency. Hence, a secure protocol should withstand such attacks and offer authentication and integrity of the exchanged data.

The cryptosystem we aim for is one where the entities communicate over an insecure network, so both parties first need to provide identity authentication, which then proves to the receiver the integrity of the messages. Peer authentication and secure data transmission are vital in our system.

Regarding authentication, a public key infrastructure (PKI) provides digital certificates as the means of authentication. In our study, we assume that all entities have digital certificates generated by the certificate authority (CA).

To achieve integrity between the base station and the mobile edge server, one may employ integrity protection techniques, such as an HMAC. However, before doing this, both entities should first agree on a secret key. Due to the key distribution problem, key agreement protocols have emerged in which the actual key is not transferred over an untrusted channel.

The proposed protocol is divided into four stages:

1. Stage 1 Mutual Authentication Phase:

We assume that both parties are already registered with the CA, trust the same CA, and possess their own public key, private key, and implicit certificate, as well as the CA's public key. The two entities, the base station and the mobile edge server, perform a handshake in which they exchange their digital certificates to verify each other's authenticity.

2. Stage 2 Key Agreement:

After authentication is completed by both parties, they should agree on a shared master key. In our protocol, we use the elliptic curve Diffie–Hellman (ECDH) protocol, which is well suited for constrained environments. Elliptic curve cryptosystems are used for implementing protocols such as the Diffie–Hellman key exchange scheme [26] as follows:


A. Both entities agree on the elliptic curve domain parameters, including a base point $P$.

B. The base station selects a private key $k_A$ and the mobile edge server selects a private key $k_B$.

C. The base station computes:

$$A = k\_A \* P = (x\_A, y\_A) \tag{9}$$

D. The mobile edge server computes:

$$B = k\_B \* P = (x\_B, y\_B) \tag{10}$$

and then both entities exchange these values over an insecure network.

E. Using the information they received from each other and their private keys, both entities compute:

$$Q = k\_A \ast B = k\_A \ast (k\_B \ast P) \tag{11}$$

and

$$Q = k\_B \ast A = k\_B \ast (k\_A \ast P) \tag{12}$$

respectively. This is simply equal to,

$$Q = (k\_A \* k\_B) \* P \tag{13}$$

which serves as the shared master key that only the base station and the mobile edge server possess.
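As a reference, the snippet below sketches the key agreement with the Python `cryptography` package; for brevity it uses ephemeral keys rather than the CA-certified key pairs assumed in the protocol.

```python
from cryptography.hazmat.primitives.asymmetric import ec

# Key pairs for the two parties (ephemeral here; the protocol assumes
# CA-certified long-term keys established in Stage 1).
bs_private = ec.generate_private_key(ec.SECP256R1())    # base station, k_A
mes_private = ec.generate_private_key(ec.SECP256R1())   # mobile edge server, k_B

# Public values A = k_A * P and B = k_B * P are exchanged over the insecure link.
bs_public = bs_private.public_key()
mes_public = mes_private.public_key()

# Both sides derive the same shared master secret Q = (k_A * k_B) * P.
shared_bs = bs_private.exchange(ec.ECDH(), mes_public)
shared_mes = mes_private.exchange(ec.ECDH(), bs_public)
assert shared_bs == shared_mes
```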

3. Stage 3 Key Derivation:

To reduce the computational complexity on both parties, we assume that the mutual authentication phase is performed only periodically; in each session, only the session secret key is generated from the shared master key for use in an integrity protection algorithm, such as a message authentication code (MAC).

The proposed protocol should use the best option for the key derivation function (KDF) that ensures randomness, and we advocate the KDF recommendations in [27], which take randomness into consideration through the use of random numbers (nonces) and key expansion. Each peer computes the actual session key PK via the chosen KDF *χ*, as:

$$PK = \chi(Q, \text{Nonce}) \tag{14}$$
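One concrete instantiation of *χ* (our choice for illustration, not necessarily the KDF advocated in [27]) is HKDF with a per-session nonce as salt:

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

nonce = os.urandom(16)                       # fresh random nonce per session
pk = HKDF(
    algorithm=hashes.SHA256(),
    length=32,
    salt=nonce,
    info=b"mes-bs-session-key",              # context label, our assumption
).derive(shared_bs)                          # shared_bs is the ECDH master secret Q
```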

4. Stage 4 Message Exchange:

The exchanged data, such as workload and delay, need to be protected against unauthorized modification, hence HMAC is used to ensure the integrity. The base station calculates

$$D = \text{HMAC}\_{PK}(\mathcal{D}(\ell), \mathcal{W}(\ell)) \tag{15}$$

The base station sends W(ℓ), D(ℓ), and *D* to the mobile edge server.

The mobile edge server uses the agreed derived session key to calculate the HMAC of W(ℓ) and D(ℓ) and verifies their integrity against *D*.
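A minimal sketch of this exchange (field encoding and function names are ours) using Python's standard `hmac` module:

```python
import hmac, hashlib, struct

def protect(pk: bytes, delay: float, workload: float) -> bytes:
    """Compute D = HMAC_PK(D(l), W(l)) as in Equation (15)."""
    payload = struct.pack("!dd", delay, workload)   # encoding of the two values is assumed
    return hmac.new(pk, payload, hashlib.sha256).digest()

def verify(pk: bytes, delay: float, workload: float, tag: bytes) -> bool:
    """Receiver-side integrity check of the reported workload and delay."""
    return hmac.compare_digest(protect(pk, delay, workload), tag)
```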

The performance of the above protocol depends on a multitude of factors, including the processor speed, the availability of specialized crypto-hardware, and the communication (network) speed. However, to provide a reference performance, we set up a test-bed in which each agent node is a Raspberry Pi Model B equipped with a TP-LINK TL-WN722N Wi-Fi USB dongle.

In all the measurements we made, the nodes were configured in ad-hoc mode. Each agent is then connected to a server through an Ethernet connection. The server manages the individual agents so as to prepare them for the target scenario and is also in charge of collecting the measurements. In our reference performance measurement, we only consider the scenario of two agents setting up a secure communication link directly with each other.

Measurements can be made internally on the node initiating the secure channel, called the client, and at the level of the network data exchanged between the agents of the network, captured with a Wi-Fi card set in monitor mode on the server.

Based on this setup, the stated protocol in this section took 4282 milliseconds (on average) over 100 executions.

#### 7.2.2. Countermeasure to Second Security Scenario

Compromising an agent is essentially a system security problem. Potential countermeasures include hardening the agent environment, pen-testing it before deployment, updating it regularly when new vulnerabilities are discovered, having strong access control policies related to the agent configuration, etc.

Another potential solution to this problem can be to have a secure execution environment in individual agents. The secure execution environment can help protect sensitive code during its execution and prevent malicious entities from interfering with it. Even then, the precautions listed above should be taken.

#### 7.2.3. Countermeasure to Third Security Scenario

Protection against Trojan horse attacks, especially hardware Trojan horses, depends on secure and reliable supply chains. An organization can test its agents to detect whether they exhibit any uncharacteristic behavior. Similar actions can also be taken against software-based Trojan horses. An effective mechanism is continuous monitoring of the agents' (hardware and software) behavior to detect any stealth Trojan that evades pre-deployment detection.

7.2.4. Countermeasure to Fourth Security Scenario

An insider threat is a significant challenge to overcome in any large network or organization. Potential countermeasures to such a threat include limiting the privileges that require only single-user approval: all sensitive actions should require multiple users to approve and deploy the changes. Employee management also matters, including making sure that HR revokes the credentials of any employee leaving the company. Finally, monitoring user network activities and behavior can help minimize any impact from disgruntled employees.

#### **8. Conclusions and Future Work**

Mobile edge computing facilitates the provision of data storage and computational resources to mobile and low-power wireless sensor devices. In this work, we have presented a multi-agent reinforcement learning based solution for the placement of edge servers in a mobile network, such that the network latency is minimized and the load on edge servers is balanced. The experimental evaluation using the Shanghai Telecom dataset shows that the proposed solution converges quickly. Further, we provided a detailed analysis of the types of security attacks possible in the proposed solution concept. We also listed some of the countermeasures that can be used to deal with the security risks. The effectiveness of the proposed method, even with a simple state space, shows the promise of the proposed solution. Much future work remains before the proposed solution can be implemented in the real world, but our findings suggest that this approach has considerable potential. This work serves as a proof of concept for a secure multi-agent RL implementation for the edge server placement problem. However, to further validate the results of the proposed model, we intend to implement it in a full-stack emulator such as SIMENA NE5000.

**Author Contributions:** Conceptualization, M.K.K., S.A.G., R.N.A.; investigation, M.K.K., S.A.G., R.N.A.; methodology, M.K.K., S.A.G., R.N.A.; resources, S.A.G.; software, M.K.K., S.A.G., R.N.A.; supervision, S.A.G.; visualization, M.K.K., R.N.A.; writing—original draft, M.K.K., S.A.G., R.N.A.; writing—review and editing, R.N.A., D.S.; funding acquisition, S.A.G. All authors contributed to the final version. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work is supported by King Khaled University under Grant Agreement No. 6204.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


#### *Article* **A Task Execution Scheme for Dew Computing with State-of-the-Art Smartphones †**

**Matías Hirsch 1, \* , Cristian Mateos 1 , Alejandro Zunino 1 , Tim A. Majchrzak 2 , Tor-Morten Grønli 3 and Hermann Kaindl 4**


**Abstract:** The computing resources of today's smartphones are underutilized most of the time. Using these resources could be highly beneficial in edge computing and fog computing contexts, for example, to support urban services for citizens. However, new challenges, especially regarding job scheduling, arise. Smartphones may form ad hoc networks, but individual devices highly differ in computational capabilities and (tolerable) energy usage. We take into account these particularities to validate a task execution scheme that relies on the computing power that clusters of mobile devices could provide. In this paper, we expand the study of several practical heuristics for job scheduling including execution scenarios with state-of-the-art smartphones. With the results of new simulated scenarios, we confirm previous findings and better comprehend the baseline approaches already proposed for the problem. This study also sheds some light on the capabilities of small-sized clusters comprising mid-range and low-end smartphones when the objective is to achieve real-time stream processing using Tensorflow object recognition models as edge jobs. Ultimately, we strive for industry applications to improve task scheduling for dew computing contexts. Heuristics such as ours plus supporting dew middleware could improve citizen participation by allowing a much wider use of dew computing resources, especially in urban contexts in order to help build smart cities.

**Keywords:** dew computing; edge computing; smartphone; job scheduling; scheduling heuristics

#### **1. Introduction**

Smartphones have increasing capabilities of processing information, which typically are underutilized [1,2]. Cities (and citizens) could benefit from such a plethora of underutilized resources if these were properly orchestrated. Any person carrying a smartphone could contribute with valuable resources to help cities grow and to manage them in a more sustainable way. For instance, anyone may help to improve urban road maintenance by collecting pavement data [3]. Participatory platforms have been proposed to enable people to voluntarily contribute data sensed with their personal mobile devices [4,5].

Cities generate vast amounts of data for different smart city applications through Internet of Things (IoT) sensors and surveillance cameras [6,7]. Processing locally sensed data can be done in different but not necessarily mutually exclusive ways, for instance, using distant cloud resources, offloading to proximate fog servers, or with the help of devices with computing capabilities within the data collection context, e.g., with smartphones. This latter architectural option has been considered as an attractive self-supported sensing and computing scheme [8]. In addition, hybrid and volunteer-supported processing architectures were proposed to avoid overloading resource-constrained devices [9].

Depending on the adopted approach (hybrid or self-supported), effectively managing smartphones' limited energy and heterogeneous computing capabilities requires more research [10].

In a previous work [1], we proposed baseline heuristics to perform such resource management for a variant of the self-supported sensing and computing scheme, where data collected in a local context are processed by a group of smartphones within the same context. We call this a "dew context", continuing the cloud/fog metaphor used to describe a layered computing infrastructure.

Complementarily, in this work, we subject these baseline heuristics to new execution scenarios, now considering simulated dew contexts with modern smartphone profiles, i.e., battery traces and benchmark results from recent smartphone models. Additionally, we go beyond simulating synthetic job loads by benchmarking the CPU usage and inference time of a state-of-the-art TensorFlow Lite model trained to recognize a set of generic objects. Through this new set of experiments, we shed some light on the viability of distributing jobs at the edge using smartphones for AI-based data stream processing applications in smart cities, which is another major difference with respect to [1].

Specifically, the contributions introduced by this paper are:


This paper is organized as follows. Section 2 discusses related work. Then, Section 3 provides a motivating example. Section 4 presents the studied scheduling heuristics. Section 5 describes the evaluation methodology and experimental design, while Section 6 gives an overview of metrics and results. In Section 7, we present an experiment simulating AI-based video processing. A summary of results and practical challenges are discussed in Section 8, while Section 9 concludes and points to future work.

#### **2. Related Work**

The exploitation of computing resources provided by smart devices in dew computing contexts—i.e., where both data collection and processing happen at the edge of the network—introduces new challenges in terms of job scheduling algorithms [10]. Since smart devices rely on batteries as their main power source, one of the challenges is to manage the remaining energy of resource provider nodes in the network. Hence, the impact of a given job execution schedule on a device's future battery level must be considered. This involves targeting the maximization of completed jobs without exceeding the node's energy availability.

There are at least two approaches for pursuing this objective. One models job scheduling as an optimization problem. Chen et al. [11], Ghasemi-Falavarjani et al. [12], and Wei et al. [13] suggested including a device's remaining energy as a constraint in the problem formulation, i.e., while exploring feasible solutions, the energy employed in executing jobs must not exceed the available energy in the devices' batteries. To tailor input variables of algorithms following this approach, it is necessary to have accurate job energy consumption data, which are rarely available. To obtain such data in the general case, a detailed quantification of resource demands is needed, which, in turn, varies according to device characteristics. Given the wide variability of device models on the market (cf., e.g., work by Rieger and Majchrzak [14]), it is unrealistic to assume homogeneous device clusters. If not precomputed, scheduling input should be obtained while converting a data stream into jobs to be processed.

The other approach does not require energy-related job details. It performs load balancing based solely on node characteristics. Hirsch et al. [15] combined the last known battery level with a function including different performance scores, which rates the capability of a device to successfully complete the last arrived job. Jobs are scheduled by following an *online* approach, i.e., upon each job arrival, the scheduling logic creates a ranking by evaluating the function for all candidate devices, and the job is assigned to the one ranked best.

Resource heterogeneity imposes further challenges that scheduling algorithms in dew computing contexts must deal with. The co-existence of smart devices that belong to the same or different generations, equipped with hardware able to render dissimilar computing and data transfer throughput, should not be ignored when including them as first-class resource providers. Yaqoob et al. [16] considered the number of cores, speed, and CPU workload, which are evaluated by the proposed heuristics when allocating computing-intensive tasks to mobile devices. Hirsch et al. [15] considered heterogeneity by differentiating the nodes' computing capabilities via their MFLOPS indicator, which is a common metric in scientific computing to rate processor speed when executing floating-point operations. All in all, the heuristics by Yaqoob et al. [16] recognize resource heterogeneity related to computing capability only. For stream processing applications, where data transfer under varying delay and energy consumption of wireless communication is present, new practical online heuristics are necessary to deal with both node computing and communication heterogeneity.

#### **3. Motivating Example**

Smart cities integrate multiple sources of information and process massive volumes of data to achieve efficient solutions and monitor the state of a wide range of common issues, including the maintenance of public spaces and infrastructure or the security of citizens, just to mention two of them. Ultimately, they contribute to societal security [17]. Participatory sensing platforms encourage citizens to contribute incident data, such as geolocalized photos, videos, and descriptions, that have to be analyzed, filtered, and prioritized for proper treatment. This requires a data processing infrastructure and depends on the citizens' willingness to manually enter or record data.

A proactive way to gather relevant data could be installing a dedicated sensor and processing infrastructure. However, to reduce fixed costs and to avoid congesting communication networks with a high volume of raw captured data [18], a hybrid approach that exploits the near-the-field, ubiquitous computing power of smart mobile devices is feasible. By analyzing a city's dynamics, it is not hard to identify places where citizens are regularly connected to the same local area network with their smartphones, e.g., small parks or public transport. Suppose that citizens in such a context agree to contribute processing power, even though they may not wish to provide data sensed with their own devices. Their devices may nevertheless be used to filter and identify relevant information from data streams captured by sensors cleverly positioned within the context and connected to the same network as nearby mobile users.

Consider, for instance, passengers riding a bus, where smartphones receive data via their WiFi connections. These may be samples of environmental sounds or images captured with devices that have been specifically installed in the bus for, e.g., real-time sensing of noise pollution, detecting pavement potholes, counting trees or animals, or whatever information may be useful for a smart city to forecast events, schedule repairs, or plan public space maintenance duties. The smartphones could be used, on a voluntary basis, for preprocessing such data before it is transferred to the distant cloud in a curated and reduced form. Pre-processed data, in turn, could be used to feed Internet-accessible, real-time heatmaps summarizing such information so that decision-makers can act promptly and accordingly. In terms of the hardware resources to be exploited, the computations required to do so might range from medium-sized, CPU-intensive ones, such as finding patterns in digitized sound streams, to complex CPU/GPU-intensive tasks such as detecting or tracking objects from image streams using deep neural networks. Nowadays, it is not surprising to find affordable smartphones on the market with eight-core processors and GPUs capable of running deep learning frameworks such as Tensorflow (https://www.tensorflow.org/ accessed on 18 August 2021).

How to efficiently and fairly balance data processing among available smartphones in a dew context is challenging, though. This essentially stems from the singularities that characterize smartphones [19], namely user mobility, lack of ownership, and exhaustible resources. Smartphones are inherently mobile devices and their users may enter/leave the dew context in unpredictable ways, constraining the time window within which computations can be submitted and successfully executed on a device. Failure to obtain task results from a leaving device forces the scheduler either to discard those results, which harms effectiveness from the application standpoint, or to re-submit the associated tasks to alternative devices, which harms application throughput. Moreover, lack of ownership means that, from the perspective of a data processing application, smartphones are non-dedicated computing resources. Resources such as CPU/GPU time, memory, storage, and network bandwidth are shared with user processes and mobile applications. Hence, any dew scheduler must ensure that data processing tasks do not significantly degrade the performance of native applications and the user experience; otherwise, users might become reluctant to contribute or keep contributing their computing resources in dew contexts. Lastly, the serving time of a mobile device is limited both by the energy level of its battery at the time it enters the dew context and by the rate at which energy is consumed by the mobile device, which depends on several factors including screen brightness level, user applications and system services being executed, battery inherent efficiency, and so on. Not only computing capabilities (e.g., CPU/GPU speed) are important for distributing dew tasks, but also energy availability/battery efficiency. Of course, a dew scheduler should not exhaust a mobile device's energy, since this would also discourage users from processing dew tasks.

#### **4. Load Balancing Heuristics**

Figure 1 depicts an overview of a dew context, a distributed computing mobile cluster (in this case, operating inside a bus) for processing jobs generated locally. When close to the dew context's local area network, mobile devices are enlisted to contribute computing resources by registering themselves with the proxy [20]. In the example, the proxy is an on-chip PC. The proxy balances the job processing load with a heuristic that sorts the devices' appropriateness using some given criterion. The best-ranked node is assigned the incoming job, and the ranking is re-generated upon each job arrival.

We propose and evaluate practical heuristics to sort devices, which combine easy-to-obtain device and system performance information. One of these is AhESEAS, an improvement to ESEAS (Enhanced Simple Energy-Aware Scheduler) [20]. Another heuristic is ComTECAC, which was inspired by criteria targeting nodes' fair energy spending [21]. In the following, we provide details of the formulation of these novel criteria to rank resource provider devices.

**AhESEAS:** the *Ahead Enhanced Simple Energy Aware Scheduler* is a criterion that combines a device's MFLOPS, its last reported battery level (SOC), and a counter of assigned jobs, in the same way as the ESEAS [20] formula, except for a change in the semantics of the last-mentioned counter. While in ESEAS the counter of assigned jobs is updated after the job input has been completely received by the node, in AhESEAS such an update occurs earlier, i.e., just after a node is selected for executing a job. Issuing an immediate counter update, i.e., without waiting for the job input transfer time, gives AhESEAS a rapid reaction to fast and continuous job generation, typical of stream processing applications. To mark the semantic change and to avoid confusion with the ESEAS formula, we renamed *assignedJobs* of ESEAS to *queuedJobs* in the AhESEAS formula (and added 1 so that the denominator cannot become 0):

$$AhESEAS = \frac{MFLOPS \cdot SOC}{queuedJobs + 1} \tag{1}$$

**ComTECAC:** the *Computation-Communication Throughput and Energy Contribution Aware Criterion* utilizes indicators of a node's computing and communication capabilities, as well as the energy it has spent executing dew jobs, which is tracked through the battery level updates reported by nodes. Ranking heuristics using ComTECAC determine the best-ranked node not only using a queued jobs component, but also an energy contribution component. Thus, the load is evenly distributed among nodes, preventing strong nodes from draining their batteries much more, and earlier, than weak nodes. This heuristic's formula is:

$$ComTECAC = \frac{MFLOPS \cdot netPerf}{queuedJobs + 1} \cdot (SOC - eContrib) \tag{2}$$

where:

- *MFLOPS* is the node's benchmarked multi-thread floating-point throughput;
- *netPerf* is a network-efficiency score derived from the node's RSSI;
- *queuedJobs* counts the jobs already assigned to the node;
- *SOC* is the node's last reported battery level (state of charge);
- *eContrib* reflects the energy the node has already contributed to executing dew jobs.

The rationale that led us to propose node ranking formulas based mainly on node information is that such information is easy to systematize, provided that node resources/capabilities can be determined through the specifications of the constituent hardware parts, either via manufacturer data (e.g., battery capacity) or via comparisons with other nodes through benchmark suites [22–25]. For scheduling purposes, the update frequency of such information in our approach would depend on the appearance of new device models or benchmarks on the market. Node information can be collected once, systematized, and reused many times. In contrast, except for special-purpose systems running jobs from very specific applications, scheduling logic that uses job information as input—e.g., job execution time—is difficult to generalize, in principle, due to the impossibility of accurately estimating the execution time of any given job [26]. In the dew contexts studied in this paper, where nodes are battery-driven, balancing load to keep as many nodes ready for service for as long as possible is a strategy that maximizes the parallel execution of independent jobs, which in turn aims at reducing makespan. The more cores available, the more job executions can be expected to progress in parallel. For the proposed node ranking formulas, we adopt a fractional shape with a numerator reflecting a node's potential computing capability and a denominator expressing how that capability diminishes or is divided with subsequent job assignments.

In the case of the AhESEAS ranking formula (Equation (1)), the node's computing capabilities are quantified by combining mega floating-point operations per second (MFLOPS) and the state of charge (SOC). The first factor is a widely used metric in scientific computing as an indicator of a node's computing power and can be obtained with the Linpack for Android multi-thread benchmark. MFLOPS serves the scheduler as the theoretical maximum computing throughput a node is able to deliver to the system. SOC provides information on how long a node could maintain this theoretical throughput. According to this criterion, for instance, nodes able to deliver the same MFLOPS but having different SOC would be assigned different numbers of jobs. This behavior relates to the fact that batteries are the primary energy source for job completion.

In the ComTECAC ranking formula (Equation (2)), a node's potential computing capability emerges from combining more factors than *MFLOPS* and *SOC*. The numerator also includes wireless medium performance parameters, which is relevant to account for job input/output data transfers. Furthermore, *SOC* is not included as an isolated factor but as a component of a function that reflects a short-term memory of the energy contributed by nodes in executing jobs.

Note that all node ranking formula parameters except *queuedJobs*, which simply counts jobs, represent resource capabilities and not job features. Moreover, except for *SOC*, which needs to be updated periodically, such parameters are constants and can be stored in a lookup table. All this makes selecting the most appropriate node to execute the next incoming job an operation of complexity *O*(*n*), where *n* is the number of currently connected nodes in the dew context.
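To make the ranking computation concrete, the following minimal Python sketch is our own illustration, not code from the DewSim toolkit: field names such as `mflops`, `net_perf`, and `e_contrib`, as well as the sample values, are assumptions. It evaluates Equations (1) and (2) over the currently connected nodes and assigns the incoming job to the best-ranked one, updating the queued-jobs counter immediately as AhESEAS prescribes.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    mflops: float      # benchmarked multi-thread Linpack score (illustrative values below)
    soc: float         # last reported battery level, in [0, 1]
    net_perf: float    # network-efficiency score derived from RSSI
    e_contrib: float   # energy already contributed to dew jobs, in [0, 1]
    queued_jobs: int = 0

def ahesea_rank(n: Node) -> float:
    # Equation (1): MFLOPS * SOC / (queuedJobs + 1)
    return n.mflops * n.soc / (n.queued_jobs + 1)

def comtecac_rank(n: Node) -> float:
    # Equation (2): MFLOPS * netPerf / (queuedJobs + 1) * (SOC - eContrib)
    return n.mflops * n.net_perf / (n.queued_jobs + 1) * (n.soc - n.e_contrib)

def assign_job(nodes, rank):
    """Online scheduling step: O(n) scan, pick the best-ranked node and
    update its queued-jobs counter right away (AhESEAS semantics)."""
    best = max(nodes, key=rank)
    best.queued_jobs += 1
    return best

nodes = [Node("XiaomiRN7", 900, 0.80, 1.0, 0.05),
         Node("MotoG6", 400, 0.95, 0.7, 0.02)]
print(assign_job(nodes, comtecac_rank).name)
```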

#### **5. Evaluation: Methodology and Experiment Design**

These novel load balancing heuristics have been evaluated using DewSim [27], a simulation toolkit for studying scheduling algorithms' performance in clusters of mobile devices. Real data of mobile devices are exploited by DewSim via a trace-based approach that represents performance and energy-related properties of mobile device clusters. This approach makes DewSim the best simulation tool so far to realistically simulate mobile device battery depletion, since existing alternatives use more simplistic approaches, where battery depletion is modeled via linear functions. Moreover, the toolkit allows researchers to configure and compare scheduling performance under complex scenarios driven by nodes' heterogeneity. In such a distributed computing system, cluster performance emerges, on one side, from the aggregation of nodes operating as resource providers. On the other side, performance depends on how the job scheduling logic manages such resources. A node's individual performance responds to node-level features including computing capability, battery capacity, and throughput of the wireless link established with a centralized job handler (proxy). Tables 1 and 2 outline node-level features considered in the experimental scenarios.

The computing-related node-level features presented in Table 1 refer to the performance parameters of real devices, whose brand/model is given in the first column. The performance parameters include the MFLOPS score, which is used by the simulator to represent the speed at which jobs assigned by the scheduler are finalized. The MFLOPS of a device are calculated by averaging 20 runs of the multi-thread benchmark of the Linpack for Android app. The multi-thread version of the test uses all the mobile device processor cores. The columns Node type, OS version, Processor, Chipset, and Released are informative, as these features are not directly configured in simulation scenarios but indirectly considered in the device profiling procedure. This procedure produces battery traces as a result, used to represent different devices' energy depletion curves.


**Table 1.** Computing-related node-level features.

**Table 2.** Communication-related node-level features.


Communication-related node-level features, i.e., the time and energy consumed in data transfer events such as job data input/output and node status updates, are shown in Table 2. Reference values correspond to a third-party study [28], which performed detailed measurements to characterize data transfer through WiFi interfaces, particularly the impact of received signal strength (RSSI) and data chunk size on time and energy consumption.

Nodes ready to participate in a local, clustered computation form a mobile cluster at the edge, whose computing capabilities derive from the number of aggregated nodes and their features. Cluster-level features considered in experimental scenarios are described in Tables 3 and 4. Specifically, Table 3 shows criteria to derive different types of heterogeneity levels w.r.t. where the instantaneous computing throughput comes from. In short, targeting a defined quality of service by relying on few nodes with high throughput differs, in terms of potential points of failure and energy efficiency, from achieving this with many nodes having lower throughput each. Table 4 outlines criteria to describe communication-related properties of clusters where, for instance, an overall good communication quality—GoodComm—means that a cluster has at least 80% of resource provider nodes with good or excellent RSSI (good\_prop + mean\_prop + poor\_prop = 100% of nodes). In contrast, mean communication quality—MeanComm—suggests that a cluster has at least 60% of resource provider nodes with RSSI of −85 dBm. Finally, Table 5 shows the criteria used to conform cluster instances by combining the computation- and communication-related properties mentioned above. For instance, clusters of type Good2High are instances where nodes providing the fastest instantaneous computing capability relative to other nodes in the cluster also have the best performance in terms of communication throughput. In contrast, the Good2Low category describes cluster instances where the best communication performance is associated with nodes able to provide the slowest instantaneous computing capability. Finally, the Balanced cluster category means that best communication performance is equally associated with nodes with the fastest and the slowest instantaneous computing capabilities.
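As an illustration of how the communication-related cluster categories could be derived, the short sketch below is an assumption on our part (the −70 dBm "good RSSI" threshold is not from the paper); only the 80%/60% proportions follow the criteria described above.

```python
def comm_quality(rssi_values, good_threshold=-70.0):
    """Label a cluster GoodComm if at least 80% of nodes have good or excellent
    RSSI, MeanComm if at least 60% sit around -85 dBm, PoorComm otherwise.
    The dBm thresholds are illustrative assumptions."""
    n = len(rssi_values)
    good = sum(1 for r in rssi_values if r >= good_threshold) / n
    mean = sum(1 for r in rssi_values if -90.0 <= r < good_threshold) / n
    if good >= 0.8:
        return "GoodComm"
    if mean >= 0.6:
        return "MeanComm"
    return "PoorComm"

print(comm_quality([-55, -60, -62, -68, -85]))  # GoodComm: 4 of 5 nodes above -70 dBm
```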

**Table 3.** Computing-related cluster-level features.


**Table 4.** Communication cluster-level features.


**Table 5.** Mapping of communication and computation cluster-level features.


Job sets were created using the *siminput* package utility of the DewSim toolkit [27]. We defined job bursts that arrive at varying intervals during a thirty-minute time window. Such a window represents a time extension within which a vehicle can travel and scan a considerable part of its trajectory. Moreover, within this window the mobile devices of a group of passengers in a transport vehicle (e.g., a bus) can reasonably stay connected to the same shared access point. Intervals represent video or audio recording, i.e., in-bus data capturing periods. It is assumed that the recording system has a limited buffer, which is emptied at a point in time defined by a normal distribution with a mean of 12 s and a deviation of 500 ms. With every buffer-emptying action, a new job burst is created and all captured data, which serves as input for a CPU-intensive program, is transferred to the mobile devices that participate in the distributed processing of such data. Jobs have a fixed input size; we created job sets where each job input is either 1 MB or 500 KB, while output size varies between 1 and 100 KB. A single job takes 0.45–1.85 s of computing time when it executes on the fastest (Samsung Galaxy SIII) and the slowest (LG L9) device model of an older cluster, respectively. Moreover, this time is considerably lower when a job is executed in a cluster with recently launched nodes, i.e., 90–320 milliseconds when executing on a Xiaomi Redmi Note 7 and a Motorola Moto G6, respectively. For defining time ranges, a pavement crack and pothole detection application implemented for devices with similar performance to those in the experiments of Tedeschi and Benedetto [3] was the reference. Bursts are composed of varying numbers of jobs, depending on the interval's extension. Job requirements in terms of floating-point operations fall within 80.69 to 104.69 MFLOP.
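A minimal sketch of how such a synthetic job trace could be generated follows; it is our own illustration of the described process, not the actual *siminput* utility, and the jobs-per-burst range is an assumption (the text only mentions roughly 360 jobs every 12 s). Buffer-emptying instants are drawn from a normal distribution with mean 12 s and deviation 500 ms over a 30 min window, and every burst carries a batch of fixed-size job inputs.

```python
import random

def generate_bursts(window_s=30 * 60, mean_gap=12.0, sd_gap=0.5,
                    jobs_per_burst=(300, 420), input_kb=500):
    """Return a list of (arrival_time_s, n_jobs, input_kb) bursts."""
    bursts, t = [], 0.0
    while True:
        t += max(0.1, random.gauss(mean_gap, sd_gap))  # next buffer-emptying instant
        if t > window_s:
            break
        n_jobs = random.randint(*jobs_per_burst)       # burst size (illustrative range)
        bursts.append((round(t, 2), n_jobs, input_kb))
    return bursts

trace = generate_bursts()
print(len(trace), "bursts in a 30 min window; first:", trace[0])
```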

Figure 2 depicts a graphical representation of the computing and data transfer load characteristics of the job sets simulated in dew context scenarios. Frequency bar subplots at the bottom show the data volumes and arrival times of job bursts transferred during a 30 min time window. Subplots in the middle and at the top show, for the same time window, the MFLOP and job counts of each job burst. For example, when job input data was set to 1 MB, approximately 52.78 GB of data were transferred, and the derived jobs required 4775 GigaFLOP to be executed within such a time window.

**Figure 2.** Job set characteristics.

#### **6. Metrics and Experimental Simulation Results**

The scheduling heuristics' performance is measured in terms of completed jobs, makespan, and fairness, which are metrics reported in similar distributed computing systems studies [15,16,29,30].

**Completed jobs**: Given that mobile device clusters rely on the energy stored in the mobile devices' batteries to execute jobs, scheduling technique A is considered more energy efficient than scheduling technique B if the former completes more jobs than the latter with the same amount of energy. The job completion count stops when all nodes leave the cluster, in this case, due to running out of energy.

**Makespan**: Measures the time the distributed system needs to complete the execution of a job set. We normalized these durations to a 0–1 scale, where the value 1 refers to the heuristic that requires the longest makespan. To calculate the makespan, we compute the difference between the time when the first job arrives and the time when the last job is completed. To calculate the latter when the compared heuristics achieved different numbers of completed jobs, we first compute the maximum number of jobs that all heuristics completed, and use this value as a pivot to obtain the time when the last job is completed.
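A small sketch of how this pivot-based, normalized makespan could be computed is shown below; the data structures (`completion_times` holding, per heuristic, the sorted completion time stamps of its finished jobs) and the sample numbers are our own assumptions for illustration.

```python
def normalized_makespans(completion_times, first_arrival):
    """completion_times: dict heuristic -> sorted list of job completion times.
    The pivot is the largest job count completed by *all* heuristics."""
    pivot = min(len(times) for times in completion_times.values())
    # makespan = time of the pivot-th completed job minus the first job arrival
    raw = {h: times[pivot - 1] - first_arrival
           for h, times in completion_times.items()}
    worst = max(raw.values())
    return {h: m / worst for h, m in raw.items()}  # 1.0 = longest makespan

times = {"RTC": [5, 9, 14, 30], "AhESEAS": [4, 8, 12], "ComTECAC": [3, 6, 10, 11]}
print(normalized_makespans(times, first_arrival=0))
```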

**Fairness**: The difference in energy contributed by provider nodes, from the time each one joins the cluster to the time each one completes its last assigned job, is quantified via Jain's fairness index [31]. This index was originally used to measure the bandwidth received by clients of a networking provider but, in our case, much like in Ghasemi-Falavarjani et al. [12] and Viswanathan et al. [30], it is used to measure the disparity of energy drawn by the system from provider nodes. The metric complements the performance information given by the completed jobs and makespan metrics.
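Jain's fairness index over the per-node energy contributions can be computed as in the sketch below; this is our own minimal example, and the `energy_deltas` values are placeholders standing in for the battery-level differences described above.

```python
def jain_fairness(energy_deltas):
    """Jain's index: (sum x)^2 / (n * sum x^2); 1.0 means a perfectly even
    energy contribution across provider nodes."""
    n = len(energy_deltas)
    s, sq = sum(energy_deltas), sum(x * x for x in energy_deltas)
    return (s * s) / (n * sq) if sq else 1.0

print(jain_fairness([0.10, 0.12, 0.11, 0.09]))  # close to 1: fair contribution
print(jain_fairness([0.30, 0.02, 0.05, 0.03]))  # far from 1: unfair contribution
```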

We ran all heuristics on 2304 scenarios with varying node, cluster, and job characteristics. We distinguished between *older* and *recent* clusters. Older clusters are composed of devices with an instantaneous computing capability of 300 MegaFLOPS and below, which includes the LG L9, Samsung Galaxy S3, Acer A100, and Phillips TLE732 device models of Table 1. This is the cluster used in our previous work, see Hirsch et al. [1]. Conversely, recent clusters are composed of the Motorola Moto G6, Samsung A30, Xiaomi A2 Lite, and Xiaomi RN7 device models with a floating-point computing capability of 300 MegaFLOPS and above. Figure 3 depicts the position that each group of scenarios occupies in the heatmap representation used to display the performance values obtained for each heuristic.

**Figure 3.** Simulated scenarios: heatmap pixels.

Figure 4 shows the results of each heuristic's completed jobs for older cluster scenarios. The darker the pixel intensity, the better the performance achieved. Several effects of simulated variables on completed jobs are observed. First, by comparing Figure 4a,b, which show the numbers of completed jobs for AhESEAS and ESEAS, respectively, we see the magnitude of improvement introduced by the AhESEAS denominator component update policy (see Section 4). In the presence of job input above 500 KB and approximately 360 jobs generated every 12 s, which is the injected load in the scenarios, load balancing is better managed by the denominator update policy of the AhESEAS formula than that of ESEAS. AhESEAS exceeds ESEAS's completed jobs in all scenarios. On average, the former was better than the latter by 58.6% with a standard deviation of 18.5%. Such an advantage is maintained in recent clusters scenarios, where AhESEAS (Figure 5b) is better than ESEAS (Figure 5a) by 55.2% completed jobs on average with a standard deviation of 19.1%.

By comparing the scenario results presented in the top half and bottom half of the heat maps, we see the effect of job input size, i.e., the number of completed jobs decreases as job input size increases. Such an effect is more noticeable in load balancing performed with ESEAS and AhESEAS than with the other heuristics. This indicates that job completion is not exclusively determined by how a heuristic weighs node computing capability, but also by how job data and nodes' communication-related features are included in the node ranking formula. RTC and ComTECAC include communication-related parameters in their respective node ranking formulas. RTC uses job data input/output size and node RSSI, while ComTECAC employs a function of node RSSI that relates this parameter to network efficiency. For this reason, job input size does not affect RTC and ComTECAC but certainly affects the ESEAS and AhESEAS heuristics.

In particular, the remaining transfer capacity (RTC) heuristic [32] is instead inspired by the online MCT (minimum completion time) heuristic, which has been extensively studied in traditional computational environments such as grids and computer clusters. RTC immediately assigns the next incoming job to the node whose remaining transfer capacity is the least affected, interpreting it as the estimated capability of a battery-driven device to transfer a volume of data, considering its remaining battery level, the energy cost of transferring a fixed data unit, and all job data scheduled in the past that waits to be transferred. At the time the remaining transfer capacity of a node is estimated, all future job output data transfers from previous job assignments are also considered.
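For completeness, the following minimal sketch reflects our own reading of the remaining-transfer-capacity idea from [32]; the field names and sample numbers are assumptions. A node's capacity is estimated from its battery level, its per-unit transfer energy cost, and the data already scheduled but not yet transferred, and the next job goes to the node whose capacity is least affected.

```python
def rtc(node, job_in_kb, job_out_kb):
    """Estimated remaining transfer capacity (in KB) after accepting the job:
    what the battery still allows to transfer, minus data already scheduled
    (pending inputs and outputs) and the new job's own input/output."""
    transferable_kb = node["soc"] * node["battery_mwh"] / node["mwh_per_kb"]
    pending_kb = node["pending_in_kb"] + node["pending_out_kb"]
    return transferable_kb - pending_kb - (job_in_kb + job_out_kb)

def rtc_assign(nodes, job_in_kb, job_out_kb):
    # Online step: pick the node whose remaining transfer capacity is least affected.
    best = max(nodes, key=lambda n: rtc(n, job_in_kb, job_out_kb))
    best["pending_in_kb"] += job_in_kb
    best["pending_out_kb"] += job_out_kb
    return best

nodes = [
    {"name": "A", "soc": 0.9, "battery_mwh": 11000, "mwh_per_kb": 0.02,
     "pending_in_kb": 0, "pending_out_kb": 0},
    {"name": "B", "soc": 0.5, "battery_mwh": 13000, "mwh_per_kb": 0.03,
     "pending_in_kb": 0, "pending_out_kb": 0},
]
print(rtc_assign(nodes, job_in_kb=500, job_out_kb=50)["name"])
```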

Other variables with an orthogonal effect on the number of completed jobs, i.e., observed for all heuristics, are cluster size and cluster heterogeneity. By scanning the heat map results from left to right, it can be seen that, for instance, 10-node cluster scenarios yield fewer completed jobs than 20-, 30-, and 40-node cluster scenarios. Moreover, cluster heterogeneity degrades the performance of ESEAS, AhESEAS, and RTC more than that of the ComTECAC heuristic.

Now, focusing on overall scenario performance, when comparing the relative performance of AhESEAS and RTC, a noticeable effect emerges. In older clusters scenarios (see Figure 4), AhESEAS achieves, on average, higher job completion rates than RTC. RTC's weakness in assessing the nodes' computing capability seems to cause an unbalanced load with a clear decrease in job completion. However, this weakness seems not to be present in recent clusters scenarios (see Figure 5), where the relative performance between AhESEAS and RTC is inverted. With recent clusters, RTC's unbalanced load is compensated by the higher computing capability of the nodes in all recent clusters.

(**c**) RTC (**d**) ComTECAC

**Figure 4.** Older clusters completed jobs (dark is better).

(**a**) ESEAS (**b**) AhESEAS

(**c**) RTC (**d**) ComTECAC

**Figure 5.** Recent clusters completed jobs (dark is better).

To finish this analysis, we can say that, taking this metric as an energy-efficiency indicator, we confirm that ComTECAC is the most energy-efficient scheduling heuristic on average. With the same amount of energy available as the other heuristics, it completes more jobs in both older and recent clusters simulated scenarios.

At this point, before starting to discuss other performance metrics, it is worth mentioning that due to the poor performance that ESEAS showed in terms of completed jobs, we leave it aside and focus on AhESEAS, RTC, and ComTECAC.

Figures 6 and 7 depict the makespan of AhESEAS, RTC, and ComTECAC for older clusters and recent clusters scenarios. In this case, the pixels' color intensity inversely relates to high performance, i.e., the lighter the pixel, the better the performance. As explained above, where the metrics are presented, to report a comparable makespan measurement across all heuristics we computed makespan over a subset of completed jobs, instead of all jobs, in the following manner. For each scenario, we determined the maximum number of jobs completed by all heuristics and used this value as a reference to compute the makespan value of each heuristic. The makespan values reported in Figures 6 and 7 were calculated using completed jobs of the RTC, AhESEAS, and ComTECAC heuristics. The dark blue pixels' predominance in Figure 6a indicates that RTC was the heuristic with the overall longest makespan on average, compared to that obtained by the AhESEAS and ComTECAC heuristics. Moreover, when comparing RTC and AhESEAS, we see that one's makespan mirrors the other, i.e., in scenarios where RTC achieves a good makespan, AhESEAS obtained bad makespan values, and vice versa.

(**a**) RTC (**b**) AhESEAS (**c**) ComTECAC

**Figure 6.** Older clusters' makespan (light is better).

**Figure 7.** Recent clusters' makespan (light is better).

When analyzing recent clusters scenarios, shown in Figure 7a, we see the effect of device models with more computing capability on the relative performance of RTC and AhESEAS. In these scenarios, RTC achieves, on average, a better makespan than AhESEAS. These results are in line with the behavior observed via the completed jobs metric and provide evidence of a weakness in the AhESEAS node ranking formula, which underestimates the jobs' data transfer requirements. In summary, the ranking formulas of both the RTC and AhESEAS heuristics have weaknesses. By increasing clusters' instantaneous computing capability with new device models, we see that the RTC weakness can be compensated, while the AhESEAS weakness becomes increasingly visible. Consistent with this observation, when comparing the 1 MB input with the 500 KB input scenarios presented in Figure 7a,b, we see that, in the scenarios taking the lowest time to transfer data, i.e., where jobs have 500 KB input, RTC performed worse than AhESEAS.

Job input size, cluster size, and cluster heterogeneity effects described for the completed jobs metric still apply. To finish this analysis, we may say that the ComTECAC ranking formula combines the strengths of RTC and AhESEAS, and its advantage over the other heuristics in the majority of scenarios is remarkable.

Finally, we report the performance of the heuristics using the fairness metric. Given that the heuristics reach different numbers of completed jobs for the same scenario, to calculate fairness we followed similar initial steps as with makespan. For each scenario, we first determined the maximum number of jobs completed by all heuristics (except ESEAS) and used it as a reference value to compute the fairness score. Once obtained, for each heuristic we searched for the associated time stamp at which this number of completed jobs was reached. The time stamp is another reference, in this case, to get the last battery level reported by each participating node. Then, with such data, and the initial battery level reported by each node, we computed an energy delta, i.e., the node energy contribution, which is interpreted as a sample in the fairness score calculation formula.

According to Figures 8 and 9, RTC's fairness is clearly lower than that of AhESEAS and ComTECAC, on average. In contrast, since the fairness scores of the last two heuristics are quite similar, we formulated the null hypothesis *H*<sub>0</sub> that the fairness achieved is the same. We tested *H*<sub>0</sub> with the Wilcoxon test, pairing the fairness values of AhESEAS and ComTECAC for the 2304 scenarios. This resulted in a *p*-value of *p* = 1.7 × 10<sup>−67</sup>, which led us to reject *H*<sub>0</sub>. To conclude this analysis and figure out which of the last two heuristics performed better, we re-computed the fairness metric, this time considering completed jobs only by the AhESEAS and ComTECAC heuristics. The ComTECAC fairness values shown in Figures 10b and 11b are seemingly better than those of AhESEAS shown in Figures 10a and 11a. We confirm this by complementing the heat maps with a cumulative scenarios density function for older cluster scenarios in Figure 10c and recent cluster scenarios in Figure 11c. In older and recent clusters, the ComTECAC CDF increase is more pronounced than that of AhESEAS as the fairness score increases, i.e., for many scenarios, ComTECAC achieves higher fairness than AhESEAS.
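The paired hypothesis test described above can be reproduced with a standard statistics library, as in the sketch below; the fairness arrays here are randomly generated placeholders, not the actual values from the 2304 scenarios.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
# Placeholder paired fairness scores for the 2304 scenarios (illustrative only).
ahesea = rng.uniform(0.80, 0.95, size=2304)
comtecac = np.clip(ahesea + rng.normal(0.03, 0.02, size=2304), 0.0, 1.0)

stat, p_value = wilcoxon(ahesea, comtecac)  # paired Wilcoxon signed-rank test
print(f"p-value = {p_value:.3e}")           # a small p-value rejects H0 (equal fairness)
```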

**Figure 8.** Older clusters' fairness (dark is better).

**Figure 9.** Recent clusters' fairness (dark is better).

**Figure 10.** Recomputed fairness considering AhESEAS and ComTECAC heuristics only (dark is better).

**Figure 11.** Recomputed fairness for recent clusters scenarios considering AhESEAS and ComTECAC heuristics only (dark is better).

#### **7. An Experiment of Simulating Video Processing**

Despite the experiments presented above, a question that remained unanswered was whether distributing and executing tasks that process a given stream of edge data on nearby mobile devices is actually beneficial compared to gathering and processing the stream on individual smartphones. Since finding a generic answer is difficult, we focus our analysis on a class of stream processing mobile application that might be commonplace in smart city dew contexts: per-frame object detection using widespread deep learning architectures over video streams. The goal of the experiment in this section is to simulate a realistic video processing scenario at the edge using a cluster of smartphones. We base this on real benchmark data from mobile devices performing deep learning-based object detection.

As a starting point, we took the Object Detection Android application from the Tensorflow Lite framework (https://www.tensorflow.org/lite/examples/object_detection/overview accessed on 18 August 2021). It includes a YOLO v3 (You Only Look Once) neural network able to detect objects from 80 different classes. The application operates by reading frames from the smartphone camera and producing a new video with annotated objects, indicating the detected class (e.g., "dog") and a confidence value. We modified the application to include the newer YOLO v4 for Tensorflow Lite, which improves detection accuracy and speed, see Bochkovskiy et al. [33], and to report the average frame processing time since application start. A screenshot of the application is shown in Figure 12. The APK is available at https://drive.google.com/file/d/18Q5SLrKtvgsyAb\_wA7QZ0TMjIM\_jK9hz/view accessed on 18 August 2021.

**Figure 12.** Screenshot of the modified Tensorflow Lite object detection application. The dark area displays the camera input and detections.

After that, we took each smartphone belonging to the *recent* cluster used in the previous section, and performed the following procedure:


Then, we repeated steps (2)–(5), incrementally using more threads on the device, until finding the lowest average inference time. Lastly, we processed the log files by removing the initial entries with small values that were due to the app not yet processing camera video.
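As a hypothetical sketch of how average inference time per thread count could be measured with the TensorFlow Lite Python interpreter (the actual measurements in Table 6 were taken on-device with the modified Android app), consider the code below; the model path is a placeholder and a float32-input model is assumed.

```python
import time
import numpy as np
import tensorflow as tf

def avg_inference_time(model_path, n_runs=50, num_threads=4):
    interp = tf.lite.Interpreter(model_path=model_path, num_threads=num_threads)
    interp.allocate_tensors()
    inp = interp.get_input_details()[0]
    frame = np.random.random_sample(inp["shape"]).astype(np.float32)  # dummy frame
    times = []
    for _ in range(n_runs):
        interp.set_tensor(inp["index"], frame)
        start = time.perf_counter()
        interp.invoke()
        times.append(time.perf_counter() - start)
    return sum(times[5:]) / len(times[5:])   # drop warm-up runs, as in the log cleaning step

for threads in (1, 2, 4, 8):
    print(threads, avg_inference_time("yolov4_tiny.tflite", num_threads=threads))
```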

Table 6 shows the results of this procedure when applied to the *recent* cluster smartphones. To instantiate the DewSim simulator with Tensorflow Lite benchmark information, we needed to convert job inference time into a job operations count. This is because DewSim uses job operations count, node flops, and node CPU usage to simulate job completion time. To approximate the job operations count, we applied the following formula, which combines inference time, MaxFlops, and CPU usage percentage:

**Table 6.** Tensorflow Lite app on recent cluster smartphones: benchmark results.


$$jobOps = IT \cdot MF \cdot CPU \tag{3}$$

where *IT* is the average inference time (in seconds), *MF* is the MaxDeviceFlops (in multithread mode), and *CPU* is the CPU usage percentage, which is known data derived from the benchmarking tasks described above.

However, jobOps varies when instantiating this formula for different device models. To feed DewSim with a unified jobOps value—in practice, the same job whose completion time varies with device computing capabilities—we adjusted the DewSim internal logic to express a node's computing capability as a linear scaling of the fastest node, which is used as the pivot. In this way, a device's computing capability is expressed by its MaxFlops multiplied by a coefficient obtained from the following linear equation system:

$$jobOps = IT_{pivot} \cdot MF_{pivot} \cdot CPU_{pivot} \tag{4}$$

$$jobOps = IT_{dev1} \cdot MF_{dev1} \cdot CPU_{dev1} \cdot Co_{dev1} \tag{5}$$

$$jobOps = IT_{dev2} \cdot MF_{dev2} \cdot CPU_{dev2} \cdot Co_{dev2} \tag{6}$$

$$jobOps = IT_{dev3} \cdot MF_{dev3} \cdot CPU_{dev3} \cdot Co_{dev3} \tag{7}$$

This adjustment allowed DewSim to realistically mimic the inference time and CPU usage benchmarked for each device model. In addition, the energy consumption trace observed for that CPU usage while processing frames on each device model was also configured in DewSim.
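The conversion described by Equations (3)–(7) can be sketched as below; this is our own illustration, and the device figures are placeholders, not the values in Table 6. The pivot device fixes jobOps, and each other device gets a coefficient that scales its MaxFlops so that the simulator reproduces its benchmarked inference time.

```python
def job_ops(it_s, max_flops, cpu_usage):
    """Equation (3): operations count from inference time, MaxFlops, and CPU usage."""
    return it_s * max_flops * cpu_usage

# (inference time in s, MaxFlops, CPU usage fraction) per device -- placeholder values
devices = {
    "pivot_dev": (0.09, 9.0e9, 0.55),
    "dev1":      (0.14, 5.0e9, 0.60),
    "dev2":      (0.20, 4.0e9, 0.58),
}

ops = job_ops(*devices["pivot_dev"])          # unified jobOps from the pivot (Eq. (4))
coeffs = {name: ops / job_ops(*params)        # Eqs. (5)-(7): jobOps = IT*MF*CPU*Co
          for name, params in devices.items()}
print(ops, coeffs)                            # pivot coefficient is 1.0 by construction
```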

Figure 13 shows the processing power and time employed by different smartphone clusters (dew contexts) in processing 9000 jobs, which directly map to the individual frames contained in a 5 min, 30 FPS video stream. Each job has an input/output size of 24 KB/1 KB, respectively. The input size relates to the 414 × 414 image size that the Tensorflow Lite model accepts as input. Besides, 1 KB is the average size of a plain text file that the model can produce with information on the objects identified in a frame. Job distribution in all dew contexts was done via ComTECAC since, as the previous experiments showed, this scheduling heuristic achieved the best performance in terms of the completed jobs, makespan, and fairness metrics compared to other state-of-the-art heuristics. Heterogeneity in all configured dew contexts is given not only by the combination of nodes with different computing capabilities but also by different initial battery levels, because ComTECAC also uses this parameter to rank nodes. We configured scenarios where good battery levels (above 50%) are assigned either to fast—FastNodesWGoodBattLevel—or to slow—SlowNodesWGoodBattLevel—nodes, or are equally distributed between fast and slow nodes—HalfFastAndSlowNodesWGoodBattLevel. Another scenario considered all nodes having the same battery level. Figure 13a reveals that dew contexts with four nodes can reduce makespan by more than half compared to what a single instance of our fastest benchmarked smartphone is able to achieve, which means that collaborative smartphone-powered edge computing for these kinds of applications is very useful. Figure 13b,c show that makespan can be considerably reduced as more nodes are present in the dew context. Particularly in a dew context with 12 nodes, see Figure 13c, near real-time stream processing is observed, where the job processing rate is only a few seconds behind the job generation rate.

(**a**) Four heterogeneous nodes' cluster size (**b**) Eight heterogeneous nodes' cluster size (**c**) Twelve heterogeneous nodes' cluster size

**Figure 13.** Object detection using the ComTECAC scheduling heuristic over a 5-min, 30 FPS video stream.

#### **8. Discussion**

#### *8.1. Summary of Results*

From the extensive experiments performed using battery traces and performance parameters from real devices, we arrived at several conclusions about all presented heuristics. First of all, by comparing ESEAS and AhESEAS completed jobs, the performance improvement achieved when changing the denominator update policy is remarkable, i.e., the logic followed by AhESEAS of updating the queued jobs component as soon as a node is selected for executing the next job, in contrast to ESEAS, which updates the denominator component only when the node has completely received the whole job input. This holds when the job input size is at least 500 KB. With this change, between 55.2% and 58.6% more jobs are completed on average.

In older clusters, AhESEAS completed 8.2% and 4.3% more jobs on average than the RTC heuristic in 500 KB and 1 MB data input scenarios, respectively. In contrast, for recent cluster scenarios, RTC outperformed AhESEAS by 1.46% and 3.4% of completed jobs on average in 500 KB and 1 MB data input scenarios, respectively. Besides, we see a direct relationship between completed jobs and makespan, meaning that the heuristic which beats the other, either AhESEAS or RTC, in job completion also achieved the relatively shortest makespan. Moreover, by comparing the fairness of these two heuristics in older and recent clusters scenarios, AhESEAS always behaves better than RTC.

In summary, AhESEAS and RTC show complementary behavior: in simulated scenarios where one of them achieves high performance, the other achieves low performance, and vice versa. Both heuristics would be needed to achieve high performance in a wide variety of scenarios.

On the contrary, ComTECAC's performance is stable and high in all scenarios. For ranking nodes, ComTECAC considers communication and computing capabilities, as well as energy contribution parameters. The combination of all these allows ComTECAC to complete slightly more jobs than the second-best heuristic, between 0.2% and 3%. At the same time, ComTECAC achieves a considerably shorter makespan than its competitors, with a speedup of around 1.69 to 2.74 w.r.t. the second-fastest heuristic. ComTECAC is also the fairest heuristic for load balancing. In older clusters, the 50% (median), 80%, and 90% distribution samples present fairness values of 0.88, 0.92, and 0.94, respectively, while the second-fairest heuristic at these cutting points results in fairness values of 0.84, 0.86, and 0.87, respectively. In recent clusters, the same analysis is even more in favor of ComTECAC, whose fairness values are 0.92, 0.94, and 0.95 vs. 0.88, 0.92, and 0.94 achieved by AhESEAS.

Finally, we instantiated our simulations with a practical case in which mobile devices recognize objects from video streams using deep learning models. For running these simulations, we explained the adaptations that were made to DewSim, which represent a starting point for incorporating other benchmarked models. We conclude that close-to-real-time object detection is viable with a reasonable number of dedicated mobile devices.

#### *8.2. Practical Challenges*

In the course of doing this work, we have identified some challenges towards exploiting the proposed scheduling heuristics in particular, and the collaborative computing scheme as a whole in general, in order to build smart cities. First, our collaborative computing scheme lacks mechanisms for promoting citizens' participation, accounting for computing contribution, and preventing fraud in reporting results. Incentive mechanisms proposed for collaborative sensing are not applicable for resource-intensive tasks and, in fact, some research has been done on this topic [19]. Some of the questions that remain unanswered regarding these challenges are: Is the job completion event a good checkpoint for giving credits to resource provider nodes? What are the consequences of giving a fixed amount of credits upon a job completion irrespective of the time and energy employed by a device? How many results of the same job would be necessary to collect in order to prevent fraud in reporting job results? In short, apart from luring citizens into using our scheme, it is necessary to reward good users and to identify malicious ones.

Another evident challenge is that a middleware implementing basic software services to support the above collaborative scheme is necessary. Besides, wide mobile OS and hardware support in the associated client-side app(s) must be ensured, which is known to be a difficult problem from a software engineering standpoint. To bridge this gap, we are working on a middleware prototype for validating our findings. We have already integrated libraries that use traditional machine vision and deep learning object recognition and tracking algorithms into our device profiling platform, which is a satellite project of the DewSim toolkit. This is necessary to validate our load balancing heuristics with real object recognition algorithms, which in turn complements our battery-trace capturing method that currently exercises CPU floating-point capabilities through a generic yet synthetic algorithm. This integration also allows for deriving new heuristics to refine the exploitation of mobile devices by profiling specialized accelerator hardware such as GPUs and NPUs, which are suited for running complex AI models [22]. The first steps have already been taken using our Tensorflow-based Android benchmarking application, but we need to study how to generate GPU-aware energy traces, how to properly profile GPU hardware capabilities into indicators, and how to exploit these traces and indicators through specialized AI job schedulers.

Lastly, another crucial challenge is how to ensure proper QoS (Quality of Service) in our computing scheme considering that, in smart city applications, there is high uncertainty regarding the computing power—in terms of the number of nodes and their capabilities—available at any given moment to process jobs. The reason is that devices might join and leave a dew context dynamically. In this line, several questions arise: Is it possible to predict within acceptable error margins how much time an individual device will stay connected in a dew context? For example, in certain smart city dew contexts (e.g., public transport), connectivity profiles for each contributing user could be derived by exploiting users' travel/mobility patterns. Then, long-lasting devices could be given more jobs to execute. Another question is how to regulate job creation in a dew context, so that no useless computations are performed and, hence, higher QoS (in terms of, e.g., energy spent and response time) is delivered to smart city applications. As an extreme example, the number of jobs created from a continuous video stream in applications involving public transport should not be constant, but depend on the current speed of the bus.

#### **9. Conclusions and Future Work**

In this paper, we present a performance evaluation of practical job scheduling heuristics for stream processing in dew computing contexts. ESEAS and RTC are heuristics from previous work, while AhESEAS and ComTECAC are new. We measured performance using the completed jobs metric, which quantifies how efficiently the available energy in the system is utilized; the makespan metric, which indicates how fast the system completes job arrivals; and the fairness metric, which measures the energy contribution differences among participating devices. The new heuristics, especially ComTECAC, had superior performance. These results represent a step towards materializing the concept of dew computing using mobile devices from regular users. It will be applied to real-world situations where online data gathering and processing at the edge are important, such as smart city applications.

Our focus has been on heuristics to orchestrate a self-supported distributed computing architecture that leverages idle resources from clusters of battery-driven nodes; to extend the architecture's applicability, new efforts will follow. We will study complementing battery-driven resource provider nodes with non-battery-driven fog nodes, e.g., single-board computers, in a similar way to other work that studied the synergy among different distributed computing layers, e.g., fog nodes and cloud providers [34].

With the goal of making our experimental methodology easier to adopt and use, another aspect involves the development of adequate software/hardware support to simplify the process of battery trace creation to feed DewSim. For instance, Hirsch et al. [35] have recently proposed a prototype platform based on commodity IoT hardware, such as smart WiFi switches and Arduino boards, to automate this process as much as possible. The prototype, called Motrol, supports batch benchmarking of up to four smartphones simultaneously and provides a simple, JSON-based configuration language to specify various benchmark conditions such as CPU level, required battery state/levels, etc. A more recent prototype called Motrol 2.0, see Mateos et al. [36], extends the previous support with extra charging sources for attached smartphones (fast, AC charging and slow, USB charging) and a web-based GUI written in Angular to launch and monitor benchmarks. Further work along these lines is already on its way.

**Author Contributions:** Conceptualization, M.H., C.M., A.Z., T.A.M., T.-M.G., and H.K.; methodology, M.H. and C.M.; software, M.H. and C.M.; validation, M.H.; writing—original draft preparation, M.H.; writing—review and editing, C.M., A.Z., T.A.M., T.-M.G., and H.K.; visualization, M.H.; supervision, C.M. and A.Z.; funding acquisition, C.M., A.Z., and T.A.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by CONICET grant number 11220170100490CO and ANPCyT grant number PICT-2018-03323.

**Data Availability Statement:** To obtain data and software supporting the reported results feel free to email the corresponding author.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


*Article* **Spectral Classification Based on Deep Learning Algorithms**

**Laixiang Xu <sup>1</sup>, Jun Xie <sup>2</sup>, Fuhong Cai <sup>1</sup> and Jingjin Wu <sup>3,</sup>\***

**Citation:** Xu, L.; Xie, J.; Cai, F.; Wu, J. Spectral Classification Based on Deep Learning Algorithms. *Electronics* **2021**, *10*, 1892. https://doi.org/10.3390/electronics10161892

Academic Editor: Nurul I. Sarkar

Received: 26 June 2021; Accepted: 27 July 2021; Published: 6 August 2021


**Abstract:** Convolutional neural networks (CNN) can achieve accurate image classification, representing the current best performance of deep learning algorithms. However, the complexity of spectral data limits the performance of many CNN models. Due to the potential redundancy and noise of spectral data, the standard CNN model is usually unable to perform correct spectral classification. Furthermore, deeper CNN architectures also face difficulties when additional network layers are added, which hinders network convergence and produces low classification accuracy. To alleviate these problems, we propose a new CNN architecture specially designed for 2D spectral data. Firstly, we collected the reflectance spectra of five samples using a portable optical fiber spectrometer and converted them into 2D matrix data to suit the feature extraction of deep learning algorithms. Secondly, the numbers of convolutional and pooling layers were adjusted according to the characteristics of the spectral data to enhance the feature extraction ability. Finally, the discard rate selection principle of the dropout layer was determined by visual analysis to improve the classification accuracy. Experimental results demonstrate that our CNN system has advantages over traditional AlexNet, U-Net, and support vector machine (SVM)-based approaches in many aspects, such as easy implementation, short runtime, higher accuracy, and strong robustness.

**Keywords:** spectral classification; convolutional neural network; portable optical fiber spectrometers

#### **1. Introduction**

The emergence of the Internet of Things (IoT) has promoted the rise of edge computing. In IoT applications, data processing, analysis, and storage are increasingly occurring at the edge of the network, close to where users and devices need to access information, which makes edge computing an important development direction.

There were already applications of deep learning in the IoT; for example, deep learning has been used to predict household electricity consumption based on data collected by smart meters [1], and a load balancing scheme based on deep learning for the IoT was introduced [2]. Through the analysis of a large amount of user data, the network load and processing configuration are measured, and the deep belief network method is adopted to achieve efficient load balancing in the IoT. In [3], an IoT data analysis method based on deep learning algorithms and Apache Spark was proposed. The inference phase was executed on mobile devices, while Apache Spark was deployed in the cloud server to support data training. This two-tier design was very similar to edge computing, which showed that processing tasks can be offloaded from the cloud. In [4], it is proven that, due to the limited network performance of data transmission, the centralized cloud computing structure can no longer process and analyze the large amount of data collected from IoT devices. In [5], the authors indicated that edge computing can offload computing tasks from the centralized cloud to the edge near the IoT devices, and the data transmitted during the preprocessing process will be greatly reduced. This operation made edge computing another key technology for IoT services.

The data generated by IoT sensor terminal devices need deep learning for real-time analysis or for training deep learning models. However, deep learning [6] inference and training require a lot of computing resources to run quickly. Edge computing is a viable method, as it provides a large number of computing nodes at the terminal location to meet the requirements of high computation and low latency of edge devices. It shows good performance in privacy, bandwidth efficiency, and scalability. Edge computing has been applied to deep learning with different aims: fabric defect detection [7], fall detection in smart cities, street garbage detection and classification [8], multi-task partial computation offloading and network flow scheduling [9], road accident detection [10], and real-time video optimization [11].

Red, green, and blue (RGB) cameras mainly use red, green, and blue light to classify objects. From the point of view of the spectrum, these are only three bands, all within the visible range. The spectrometer we use has 1024 spectral bands, including some near-infrared bands, which is more helpful for accurate classification. For instance, the red-edge effect of the included infrared band can distinguish real leaves from plastic leaves in vegetation detection. Therefore, we believe that increasing the number of spectral channels is more conducive to the future application expansion of the system.

Optical fiber spectrometers have been reported for applications in photoluminescence property detection [12], smartphone spectral self-calibration [13], and phosphor thermometry [14]. At present, some imaging spectrometers can obtain spatial images, depth information, and spectral data of objects simultaneously [15]. However, most of the data processed by deep learning algorithms are image data obtained by these imaging spectrometers. Deep learning algorithms are rarely used to process the reflection spectrum data obtained by optical fiber spectrometers.

In hyperspectral remote sensing, deep learning algorithms have been widely applied to hyperspectral image classification tasks. For example, in [16], a spatial-spectral feature extraction framework for robust hyperspectral image classification was proposed in combination with a 3D convolutional neural network; the overall test classification accuracy was 4.23% higher than that of SVM on the Pavia and Pines data sets. In [17], a new recurrent neural network architecture was designed, and the test accuracy was 11.52% higher than that of a long short-term memory network on the Pavia and Salinas HSI data sets. A new recursive neural network structure was designed in [18], and an approach based on a deep belief network was introduced for hyperspectral image classification; compared with SVM, the overall classification accuracies on the Salinas, Pines, and Pavia data sets increased by 3.17%. Currently, hyperspectral imagers are mainly used to detect objects [19]. Although the optical fiber spectrometer is easy to carry and can collect the spectra of objects, it cannot perform imaging-based detection of objects. However, deep learning algorithms are data-driven and can realize end-to-end feature processing. If we process spectral data by combining deep learning algorithms with fiber optic spectrometers, object detection and related research can be carried out further.

However, most spectrometers need to be connected to a host computer via USB, which makes them difficult to carry. In this work, we designed and manufactured a portable optical fiber spectrometer. After testing the stability of the system, we collected the reflectance spectra of five fruit samples and proposed a deep learning method based on a convolutional neural network to perform spectral classification. The accuracy of this method is 94.78%. We combined the deep learning algorithm and the system to complete the accurate classification of spectral data. Using this portable spectrometer, we apply edge computing technology to increase the speed of deep learning while processing spectral data.

We have designed a portable spectrometer with a screen; the system dispenses with a bulky host computer and realizes real-time detection of fruit quality.

Our portable spectrometer is shown in Figure 1a. The spectrometer has a 5-inch touch screen, and users can view the visualized sample spectrum information on the spectrometer in real time. As shown in Figure 1b, the system is equipped with an Ocean Optics USB2000+ spectrometer (Ocean Optics, Delray Beach, FL, USA) to ensure a spectral resolution of not less than 5 nm. As shown in Figure 1d, the system consists of a GOLE1 microcomputer, the Ocean Optics USB2000+ spectrometer, and a high-precision packaging fixture. The optical fiber is connected to the spectrometer through a fine-pitch mechanical thread. The overall structure is treated with electroplated black paint to effectively avoid external light interference and greatly improve the signal-to-noise ratio of the spectral information. When using our spectrometer to detect samples, we connect one end of the optical fiber to the spectrometer through the mechanical thread and hold the other end close to the test sample. The reflected light from the sample surface enters the spectrometer through the optical fiber, and the spectrometer converts the collected optical information into electrical signals and transmits them to the microcomputer through the USB cable. The microcomputer visualizes the signal on the screen. Users can view, store, and transmit spectral information through the system's touch screen, replacing the keyboard-and-mouse operation that traditional spectrometers connected to a host computer require.

**Figure 1.** (**a**) The GOLE1 mini-computer used in the experiment. (**b**) The Ocean Optics USB2000+ spectrometer used in the experiment. (**c**) Front view of the assembly of the mini-computer and the spectrometer. (**d**) Integration of the spectrometer and the mini-computer through a self-designed housing.
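The acquisition chain described above (fiber, USB2000+ spectrometer, mini-computer, touch screen) can be scripted directly on the mini-computer. The sketch below is a minimal illustration only; the paper does not state which acquisition software it uses, so the choice of the open-source python-seabreeze driver and the integration time are our assumptions.

```python
# Minimal acquisition sketch (assumption: python-seabreeze drives the USB2000+).
import numpy as np
from seabreeze.spectrometers import Spectrometer

spec = Spectrometer.from_first_available()   # open the first connected Ocean Optics device
spec.integration_time_micros(100_000)        # 100 ms integration time (illustrative value)

wavelengths = spec.wavelengths()             # wavelength of each detector pixel, in nm
intensities = spec.intensities()             # raw counts of the current acquisition

# Store the spectrum so it can later be reshaped to 2D and shown on the touch screen.
np.savetxt("sample_spectrum.csv",
           np.column_stack([wavelengths, intensities]),
           delimiter=",", header="wavelength_nm,intensity", comments="")
```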

To ensure the accuracy of the system's data acquisition and to demonstrate its ability to detect various data, the team tested the stability of the entire hyperspectral imaging system. First, we adjusted the system to the same state as used for data acquisition and collected 10 sets of solar light spectra with the spectrometer at 10 s intervals; then, we set the display of the system to red, green, and blue in turn and repeated the above steps to obtain the corresponding data. Finally, we loaded the 40 groups of data collected in this way into Matlab and used two different processing methods to demonstrate the stability of the whole hyperspectral imaging system.

The first method is to extract the data point at the same wavelength from each of the 10 groups of data of the same kind, arrange these points in order, and plot them as shown below.

As shown in Figure 2, the intensity fluctuation of the 10 groups of data at the same wavelength is very small. This shows that the error of repeated data acquisition of the same object over a short time is very small, which demonstrates that the system has high accuracy.

**Figure 2.** (**a**) Point data of the same wavelength of 10 sets of sunlight spectrum data collected every 10 s. (**b**) Point data of the same wavelength of 10 sets of screen green light spectrum data collected every 10 s. (**c**) Point data of the same wavelength of 10 sets of screen blue light spectrum data collected every 10 s. (**d**) Point data of the same wavelength of 10 sets of screen red light spectrum data collected every 10 s.
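The "very small fluctuation" claim of the first method can also be checked numerically. The paper performs the equivalent analysis in Matlab; the sketch below is our own illustration in Python and assumes the 10 repeated spectra of one light source are stored row-wise in a CSV file.

```python
import numpy as np

# Assumption: each row of sunlight_spectra.csv is one of the 10 acquisitions,
# and every column corresponds to the same wavelength across acquisitions.
spectra = np.loadtxt("sunlight_spectra.csv", delimiter=",")   # shape (10, n_wavelengths)

mean_per_wavelength = spectra.mean(axis=0)
std_per_wavelength = spectra.std(axis=0)

# Relative standard deviation (coefficient of variation) per wavelength;
# small values indicate stable, repeatable measurements.
rsd = np.divide(std_per_wavelength, mean_per_wavelength,
                out=np.zeros_like(std_per_wavelength),
                where=mean_per_wavelength > 0)

print(f"worst-case RSD: {rsd.max():.4f}, median RSD: {np.median(rsd):.4f}")
```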

In the second method, we plot the whole spectra of the same 10 groups of data on one graph and distinguish them by different colors. As shown in Figure 3, two points are clear: first, the spectral curves of the 10 groups of similar data almost coincide; second, the spectra of sunlight, blue light, green light, and red light shown in the figure are very classic and do not violate the laws of nature. This shows that the measurement accuracy of the hyperspectral imaging system is high, its detection ability in each band is excellent, and the whole system has good stability.

We discussed edge computing technology under IoT combined with deep learning algorithms to realize street garbage classification, fabric defect detection, etc. We wanted to use edge computing technology combined with deep learning algorithms to classify more spectral data. The current mainstream spectral data processing algorithms still analyze one-dimensional spectral data, which is incompatible with the widely used machine learning image processing methods. As mentioned previously, current deep learning algorithms are very advanced in image processing, and these algorithms have relatively high processing efficiency and classification accuracy. If we preprocess the spectral data and then use deep learning algorithms for classification, the efficiency and accuracy of spectral classification will be greatly improved.

**Figure 3.** (**a**) 10 sets of sunlight spectrum data collected every 10 s. (**b**) 10 sets of screen green light spectrum data collected every 10 s. (**c**) 10 sets of screen blue light spectrum data collected every 10 s. (**d**) 10 sets of screen red light spectrum data collected every 10 s.

In our work, we randomly selected five kinds of fruit for testing and achieved accurate classification results through the algorithms. Generally, as long as we obtain enough spectral data and design effective algorithms, we can achieve accurate classification. A large body of literature has verified the effectiveness of classification based on spectral data; for instance, in [20], classification based on spectral data was realized for different algae.

In this paper, we designed a portable optical fiber spectrometer with a screen and verified the stability, accuracy, and detection ability of the system through two different experimental processing methods shown in Figures 2 and 3. We used the spectrometer to collect one-dimensional reflectance spectrum data from five fruit samples, then we reshaped the spectral data structure and transformed it into 2D spectral data. We used our proposed CNN algorithm to extract and classify the 2D spectral image data of five samples. Its maximum classification accuracy rate was 94.78%, and the average accuracy rate was 92.94%, which is better than the traditional AlexNet, Unet, and SVM. Our method makes the spectral data analysis compatible with the deep learning algorithm and implements the deep learning algorithm to process the reflection spectral data from the optical fiber spectrometer.

The remaining paper is organized as follows: Section 2 introduces the optical detection experiment in brief. Section 3 provides the details of the proposed spectral classification method. Section 4 reports our experiments and discusses our results. Finally, Section 5 concludes the work and presents some insights for further research.

#### **2. Optical Detection Experiment**

We collected one-dimensional data of grapes, jujubes, kumquats, pears, and tomatoes through a portable optical fiber spectrometer. The pictures of the five samples are presented in Figure 4.

**Figure 4.** Experimental samples.

Several factors affect spectral detection in the real world. For instance, the incident and reflection angles of light are not stable in a real optical detection environment. Therefore, to adapt to the change of angle, we adjusted the alignment direction of the optical fiber port of the equipment to achieve a better optical detection result.

Most two-dimensional images are processed and classified by convolutional neural network models. However, the spectral data we obtained through the spectrometer are in a one-dimensional format, which is incompatible with deep learning methods designed for 2D data. To transform the one-dimensional data into two-dimensional data suitable for classification by deep learning algorithms, we used the "Reshape" function in Matlab. After processing, we obtained five kinds of two-dimensional spectral data (32 × 32 pixels). These images are presented in Figure 5. In Section 3, we present the convolutional neural network method chosen to classify these 2D spectral data.

**Figure 5.** 2D spectral data samples.
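The 1D-to-2D conversion described above is performed in Matlab; an equivalent NumPy sketch is shown below. It assumes each spectrum contains 1024 intensity values (matching the 1024 spectral bands mentioned earlier), so the data fold exactly into a 32 × 32 image; the fold order and the min-max normalisation are illustrative choices.

```python
import numpy as np

def spectrum_to_image(spectrum: np.ndarray, size: int = 32) -> np.ndarray:
    """Fold a 1D reflectance spectrum (1024 points) into a 32 x 32 2D array."""
    if spectrum.size != size * size:
        raise ValueError(f"expected {size * size} points, got {spectrum.size}")
    image = spectrum.reshape(size, size)               # row-major fold of the spectrum
    # Normalise to [0, 1] so samples with different overall brightness are comparable.
    return (image - image.min()) / (np.ptp(image) + 1e-12)

# Example with a synthetic spectrum of 1024 points.
demo = spectrum_to_image(np.random.rand(1024))
print(demo.shape)   # (32, 32)
```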

#### **3. Proposed Method**

#### *3.1. Model Description*

Using a deep learning convolutional neural network model to identify spectral data can be divided into two steps: first, feature extraction is performed on the images, and then a classifier is used to classify them. The specific recognition process is depicted in Figure 6.

**Figure 6.** Convolutional neural networks (CNN) recognition process.

In general, there are convolutional layers, pooling layers, and fully connected layers in a convolutional neural network architecture. Compared with other deep learning models, CNNs show better classification performance.

When CNNs perform convolution operations, the feature maps of the previous layer are processed through convolution kernels, strides, padding, and activation functions, so the output of a layer establishes a convolution relationship with the input from the previous layer. The convolution operation on feature maps uses the following formula.

$$\mathbf{x}\_{j}^{l} = f(\sum\_{i=1}^{n} w\_{ij}^{l} \times \mathbf{x}\_{i}^{l-1} + b\_{j}^{l}) \tag{1}$$

where $f(\cdot)$ is the activation function, $x\_i^{l-1}$ is the output value of the *i*-th neuron in the $(l-1)$-th layer, $w\_{ij}^{l}$ represents the weight value of the *i*-th input neuron of the *l*-th convolutional layer connected to the *j*-th output neuron, and $b\_j^{l}$ represents the bias value of the *j*-th neuron of the *l*-th convolutional layer.
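For concreteness, the sketch below applies Equation (1) to a single output feature map: every input feature map is convolved with its own kernel, the responses are summed, the bias is added, and the activation function is applied. The shapes, the use of cross-correlation (as in common CNN implementations), and the ReLU activation are illustrative choices, not details stated in the paper.

```python
import numpy as np
from scipy.signal import correlate2d

def conv_feature_map(inputs, kernels, bias):
    """Equation (1): x_j^l = f(sum_i w_ij^l * x_i^(l-1) + b_j^l), with f chosen as ReLU."""
    acc = sum(correlate2d(x_i, w_ij, mode="valid") for x_i, w_ij in zip(inputs, kernels))
    return np.maximum(acc + bias, 0.0)   # ReLU activation

# Two 5x5 input feature maps and one 3x3 kernel per input map.
inputs = [np.random.rand(5, 5) for _ in range(2)]
kernels = [np.random.rand(3, 3) for _ in range(2)]
print(conv_feature_map(inputs, kernels, bias=0.1).shape)   # (3, 3)
```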

The pooling (downsampling) operation on the feature maps is expressed as follows.

$$x\_j^l = f(\rho\_j^l \, down(x\_j^{l-1} + b\_j^l)) \tag{2}$$

where $f(\cdot)$ is the activation function, $down(\cdot)$ represents the downsampling function, $\rho\_j^l$ is the constant used when the feature map performs the sampling operation, and $b\_j^l$ represents the bias value of the *j*-th neuron of the *l*-th convolutional layer.

The convolutional neural network is usually equipped with fully connected layers in its last few layers. The fully connected layer normalizes the features obtained after multiple convolutions and pooling operations and outputs a probability for each class; in other words, the fully connected layer acts as a classifier.

Dropout [21] is used in CNNs to randomly hide some units so that they do not participate in the training process, which prevents overfitting. A convolutional layer without a Dropout layer can be calculated using the following formulas.

$$z\_{j}^{l+1} = w\_{j}^{l+1} y\_{j}^{l} + b\_{j}^{l+1} \tag{3}$$

$$y\_j^{l+1} = f(z\_j^{l+1}) \tag{4}$$

The meanings of *w*, *b*, and *f*(·) are the same as in Equation (1). The discard rate with the Dropout layer can be described as (5):

$$r\_j^l \sim Bernoulli(p) \tag{5}$$

The Bernoulli function follows the Bernoulli distribution: it randomly generates a vector of 0 s and 1 s according to a given probability, and *r* is the resulting mask vector. During the training of the model, neurons are temporarily discarded from the network according to this probability, that is, the activation of a neuron stops working with probability *p* (its output is set to 0).

We multiply the neuron outputs by the mask from Equation (5) and define the result as the input of the neurons with the discard rate applied. It can be described as follows.

$$\hat{y}\_j^l = r\_j^l * y\_j^l \tag{6}$$

Therefore, the output was determined using the following formula.

$$\hat{z}\_j^{l+1} = w\_j^{l+1} \hat{y}\_j^{l} + b\_j^{l+1} \tag{7}$$

$$\hat{y}\_j^{l+1} = f(\sum\_{j=1}^k \hat{z}\_j^{l+1}) \tag{8}$$

Here, *k* represents the number of the output neurons.
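A simplified NumPy illustration of Equations (5)-(8) is given below: a Bernoulli mask *r* is drawn, multiplied element-wise with the layer output *y*, and the masked output is passed through the next layer's weights, bias, and activation. The layer sizes, the ReLU activation, and the keep-with-probability 1 − *p* convention (with *p* the discard rate) are our assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(y, w, b, p=0.2):
    """Equations (5)-(8): mask the layer output with a Bernoulli draw, then apply
    the next layer's weights, bias, and activation. p is the discard rate; each
    unit is kept with probability 1 - p."""
    r = rng.binomial(1, 1.0 - p, size=y.shape)   # Eq. (5): Bernoulli mask vector
    y_hat = r * y                                # Eq. (6): masked activations
    z_next = w @ y_hat + b                       # Eq. (7): pre-activation of the next layer
    return np.maximum(z_next, 0.0)               # Eq. (8): activation (ReLU, element-wise here)

y = rng.random(8)          # 8 units in layer l
w = rng.random((4, 8))     # next layer has 4 units
b = np.zeros(4)
print(dropout_forward(y, w, b))
```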

In this work, we first classified the 2D spectral data using AlexNet; however, the recognition rate was not high. The main reasons were analyzed as follows:


Therefore, we simplified the traditional AlexNet architecture, decreased the parameters of the convolutional layers, reduced the number of pooling layers, and proposed a new CNN spectral classification model. Figure 7 shows the specific deep learning spectral classification model framework. Additionally, we added a Dropout layer after each convolutional layer; in Figure 7, *k* represents the size of the convolution or pooling kernels, *s* is the step size used during convolution or pooling, and *p* represents the amount of edge padding applied by the convolutional layer operation, generally 0, 1, or 2.

Since the CNN model requires images of uniform size as input, all spectral data images are normalized to a size of 32 × 32. We divided the spectral data into *n* categories, so in the seventh layer, after the Dropout layer and the *softmax* activation function are applied, *n* × 1 × 1 neurons are output, that is, the probability of each of the *n* classes.

**Figure 7.** Deep learning spectral classification model framework diagram.
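The exact layer widths of the simplified network are given in Figure 7, which is not reproduced here; the Keras sketch below therefore only mirrors the described structure (32 × 32 input, a reduced stack of convolutional layers each followed by a Dropout layer, fewer pooling layers than AlexNet, and a softmax output over *n* classes). All filter counts and dense-layer sizes are assumptions for illustration.

```python
from tensorflow.keras import layers, models

def build_spectral_cnn(n_classes: int = 5, dropout_rate: float = 0.2) -> models.Model:
    """Simplified CNN sketch: Dropout after every convolutional layer, softmax output."""
    return models.Sequential([
        layers.Conv2D(16, 3, padding="same", activation="relu",
                      input_shape=(32, 32, 1)),            # normalised 2D spectral image
        layers.Dropout(dropout_rate),                       # Dropout after each conv layer
        layers.MaxPooling2D(2),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.Dropout(dropout_rate),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(n_classes, activation="softmax"),      # per-class probabilities
    ])

model = build_spectral_cnn()
model.summary()
```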

#### *3.2. Dropout Selection Principle*

Dropout can be used as a trick for training convolutional neural networks. In each training batch, it reduces overfitting by ignoring half of the feature detectors. This method reduces the interaction between nodes in the hidden feature layers. In brief, Dropout makes the activation value of a neuron stop working with a certain probability *p* during forward propagation.

In a deep learning model, if there are too many parameters and too few training samples, the trained model is prone to overfitting. When we train neural networks, we often encounter this problem: the model has a small loss and a high prediction accuracy on the training data, but a relatively large loss and a low prediction accuracy on the test data. An overfitted model is almost unusable. Dropout can effectively alleviate overfitting and achieves a regularization effect to a certain extent. The value of the discard rate plays an important role in the deep learning model: an appropriate Dropout value reduces the complex co-adaptation between neurons and makes the model converge quickly.

In the training process of CNNs, when the strides of the convolution operations differ, the numbers of output neurons differ, which reduces their dependence and correlation; quantifying the correlation, in turn, increases the dependence. Therefore, we set the discard rate within a narrow range, successively take values in that range, and train and evaluate the network model again for each value. This gives any two neurons in different states a higher correlation and improves the recognition accuracy of the model.

When we trained our proposed CNN model, we visualized the trend of the discard rate in the dropout layers; Figure 8 presents this trend. Figure 8 shows that it is very unstable between 0.5 and 1, which is prone to over-fitting. In (0, 0.1) and (0.2, 0.5), as the number of epochs increases, the discard rate drops rapidly, and the model is prone to under-fitting. In (0.1, 0.2), the discard rate gradually reaches a stable, convergent state, indicating that values in this interval are more appropriate.

**Figure 8.** Dropout change graph.

#### **4. Experimental Results and Discussion**

In the algorithm experiments, our hardware platform was a CPU with a frequency of 3.00 GHz, 32 GB of memory, a GTX 1080 Ti GPU, and the CUDA 9.2 (cuDNN 7.0) accelerator. Our software platform was Keras 2.2.4, TensorFlow 1.14.0, Anaconda3 5.2.0, Spyder, and Python 3.7 under a 64-bit Windows 10 operating system.

#### *4.1. Data Distribution*

In the algorithm classification experiments, the 2D spectral data of the five fruit samples were obtained from the optical detection experiment. We divided the 2D spectral data into a training set, a validation set, and a testing set; the number of training samples is about three times that of the testing samples. The split is shown in Table 1.

**Table 1.** The data set.


#### *4.2. Train Results*

We trained our proposed CNN model, and the results are presented in Figure 9. From Figure 9 we can see that the loss on the training set and the validation set is always between 0.1 and 0.5, with no irregular, violent fluctuations. Both the training accuracy and the validation accuracy rise and eventually reach a stable value, with no further large changes. It can also be seen that as the number of epochs increases, the training loss and validation loss gradually become smaller and eventually stabilize. To sum up, our proposed CNN overcomes the vanishing gradient problem during training and validation, and can fully extract features of spectral data end to end, which is conducive to the correct classification of spectra.

Through the model's training time, accuracy, and loss curves, we can comprehensively judge the performance of the model. If the model consumes little training time and the accuracy and loss curves stabilize quickly, the model has good convergence performance in a short time. If the model takes a long time and the accuracy and loss curves stabilize slowly, the model has poor performance. Based on the time consumed and the changes in accuracy and loss, some parameters of the model, such as the learning rate and batch size, can be fine-tuned to improve its performance. Therefore, we consider not only the model's accuracy and loss on the training data, but also its time consumption.

**Figure 9.** Proposed CNN iteration training change graph.

We recorded the time taken to train for 100 epochs under the four algorithms; Table 2 reports these times.

**Table 2.** Training time.


As shown in Table 2, our proposed method consumes the least time. This shows that our proposed method is well suited to feature extraction from spectral data and does not introduce excessive parameter computation or memory usage.

#### *4.3. Test Results*

The SOTA image recognition model ViT-G/14 uses the JFT-3B data set containing 3 billion images, and its number of parameters is up to 2 billion. On the ImageNet data set, it achieved a Top-1 accuracy of 90.45%, surpassing the Meta Pseudo Labels model. Although ViT-G/14 performs well, our data volume and number of categories are limited, and our data cannot support the parameter training and testing of a SOTA image recognition model such as ViT-G/14, which targets many categories and large amounts of data. Therefore, we chose AlexNet, Unet, SVM, and our proposed CNN for comparison.

When the number of epochs is set to 100, the testing accuracy of the four algorithms under different parameters is presented in Table 3. Table 3 shows that different parameters correspond to different testing accuracies. For instance, with a batch size of 32, a learning rate of 0.001, and the SGD optimizer, our testing accuracy is 92.57%, which is superior to the other parameter settings. Furthermore, the testing accuracies obtained with different parameter values also show that our proposed CNN achieves an improvement in classification accuracy of 22.86% compared to AlexNet.


**Table 3.** Test accuracy under different parameters.
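Using the best-performing setting reported above (batch size 32, learning rate 0.001, SGD optimizer, 100 epochs), training the model sketched in Section 3 would look roughly as follows. The exact preprocessing and label encoding are not specified in the paper, so the random arrays below are placeholders for the real 2D spectral images and their class labels.

```python
import numpy as np
from tensorflow.keras.optimizers import SGD

# Continues the `model` object from the sketch in Section 3.
model.compile(optimizer=SGD(0.001),                       # learning rate 0.001
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Placeholder arrays standing in for the real 2D spectral images and integer labels.
x_train = np.random.rand(300, 32, 32, 1); y_train = np.random.randint(0, 5, 300)
x_test  = np.random.rand(100, 32, 32, 1); y_test  = np.random.randint(0, 5, 100)

model.fit(x_train, y_train, validation_split=0.2, batch_size=32, epochs=100)
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"testing accuracy: {test_acc:.4f}")
```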

Figure 8 illustrates that the most appropriate discard rate lies in (0.1, 0.2). We divided this interval into four sub-intervals to test the precision of our proposed CNN; the results are shown in Figure 10. The results in Figure 10 demonstrate that the accuracy in (0.175, 0.200) is higher than in (0.100, 0.125), (0.125, 0.150), and (0.150, 0.175). Evidently, our proposed CNN model performs best in (0.175, 0.200), which also verifies the correctness of the dropout discard rate analysis and selection principle in Section 3.2.

To verify the feasibility of ReLU and the chosen discard rate, we tested ReLU with the Dropout layer (dropout = 0.2) and without the Dropout layer (dropout = 0). The results are shown in Figure 11. The experimental results confirm that the recognition rate reaches 94.57% when ReLU is used with a discard rate of 0.2, which is significantly higher than the result without the Dropout layer (dropout = 0). In summary, our proposed CNN model outperforms the tested AlexNet and SVM in terms of classification accuracy and can perform accurate spectral classification.

**Figure 11.** The impact of the dropout value on the recognition rate.

As shown in Table 4, the testing time of our proposed CNN is lower than that of AlexNet, Unet, and SVM. Evidently, our proposed model can quickly extract two-dimensional spectral features and produce prediction results during testing.


**Table 4.** Testing times using four different methods.

To compare the performance of the four classification methods, we tested the five samples one by one. Tables 5–8 show the testing results and report that the testing precision of our proposed CNN is superior to that of AlexNet, Unet, and SVM. Therefore, our proposed CNN model is highly robust to 2D spectral data.

**Table 5.** Testing five samples with our proposed CNN.


**Table 6.** Testing five samples with AlexNet.


**Table 7.** Testing five samples with Unet.


In Section 4.2 we considered the model's training time; here we also consider the speed of the model when testing images. If the proposed model were slow at testing images, its performance could not be considered satisfactory. The quality of a model depends not only on its training time, training accuracy, and validation accuracy, but also on its testing accuracy and testing time.


**Table 8.** Testing five samples using the support vector machine (SVM).

Figure 12 shows the maximum testing precision of each sample under the four algorithms. It can be seen from Figure 12 that the testing performance of our proposed CNN is significantly better than that of the other three methods. Obviously, our proposed CNN has high classification precision and good generalization ability for 2D spectral data.

**Figure 12.** (**a**) The maximum testing results of the proposed CNN are between 90% and 95%. (**b**,**c**) The maximum testing results of AlexNet and Unet are between 70% and 90%. (**d**) The maximum testing results of the SVM are between 50% and 70%.

#### **5. Conclusions**

In this work, a new CNN architecture was designed to effectively classify the 2D spectral data of five samples. Specifically, we added a Dropout layer behind each convolutional layer of the network to randomly discard some useless neurons and effectively enhance the feature extraction ability. In this way, the features uncovered by the network became stronger, which eventually leads to a reduction of the network's parameter calculation complexity and, therefore, to a more accurate spectral classification. The experimental comparisons conducted in this work show that our proposed approach exhibits competitive advantages with respect to the AlexNet, Unet, and SVM classification methods. Although fiber optic spectrometers cannot directly perform spectral imaging classification research, our work has confirmed that deep learning algorithms can be combined with the spectral data obtained by an optical fiber spectrometer for classification research. In the future, we will use fiber optic spectrometers to obtain more spectral data samples and combine edge computing technology to send them to the deep learning model for data processing and classification research.

**Author Contributions:** Conceptualization, L.X. and F.C.; methodology, L.X., F.C. and J.W.; software, L.X. and F.C.; validation, L.X., J.X. and J.W.; formal analysis, L.X., F.C. and J.W.; data curation, L.X. and J.X.; writing—original draft preparation, L.X., F.C. and J.W.; writing—review and editing, L.X., F.C. and J.W.; supervision, J.W.; funding acquisition, F.C. and J.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work is supported by National Key Research and Development Program of China (No. 2018YFC1407505); National Natural Science Foundation of China (No. 81971692); the Natural Science Foundation of Hainan Province (No. 119MS001) and the scientific research fund of Hainan University (No. kyqd1653).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


*Article* **Method for Dynamic Service Orchestration in Fog Computing**

**Nerijus Morkevicius \*, Algimantas Venčkauskas, Nerijus Šatkauskas and Jevgenijus Toldinas**

Department of Computer Science, Kaunas University of Technology, Studentų st. 50, LT-51368 Kaunas, Lithuania; algimantas.venckauskas@ktu.lt (A.V.); nerijus.satkauskas@ktu.lt (N.Š.); eugenijus.toldinas@ktu.lt (J.T.)
**\*** Correspondence: nerijus.morkevicius@ktu.lt

**Abstract:** Fog computing is meant to deal with the problems which cloud computing cannot solve alone. As the fog is closer to the user, it can improve some very important QoS characteristics, such as latency and availability. One of the challenges in the fog architecture is the heterogeneity of constrained devices and the dynamic nature of the end devices, which requires dynamic service orchestration to provide efficient service placement inside the fog nodes. An optimization method is needed to ensure the required level of QoS while requiring minimal resources from fog and end devices, thus ensuring the longest lifecycle of the whole IoT system. A two-stage multi-objective optimization method to find the best placement of services among available fog nodes is presented in this paper. A Pareto set of non-dominated possible service distributions is found using an integer multi-objective particle swarm optimization method. Then, the analytical hierarchy process is used to choose the best service distribution according to an application-specific judgment matrix. An illustrative scenario with experimental results is presented to demonstrate the characteristics of the proposed method.

**Keywords:** fog computing; Internet of Things; service placement; fog service orchestration

#### **1. Introduction**

Fog computing acts as the missing link in the cloud-to-thing continuum. Services are provided closer to the edge of the network to enhance frequently used services and to improve latency, availability, and analysis. Fog computing places some computation resources in close proximity to the user, where numerous heterogeneous end devices have to work in harmony. Control functions must work autonomously in such a heterogeneous and complex environment; therefore, an orchestration is a centralized executable process to coordinate any interaction among any application or service [1]. Figure 1 shows the fog computing architecture.

**Figure 1.** Fog computing architecture.

There is a wide variety of application areas. As the review paper [2] classifies it in its fog computing application taxonomy, the application area is made
of municipal services, smart citizens, smart education, smart healthcare, smart buildings, smart energy, smart governance, etc. The main concerns which were identified in the reviewed papers were the following: bandwidth management, power management, security, mobility, resource management, and latency.

In order to ensure that any heterogeneous service provision infrastructure is built with scalability, interoperability, and adaptability in mind, fog nodes have to be dynamically deployable [3]. These fog nodes also have constrained resources [1] compared to a cloud infrastructure. Additionally, in order to give access to relevant services and to prevent any unauthorized access, access control or privacy control is required [4]. With that in mind, security and privacy are a big concern [5,6]. Fog computing solutions frequently have insufficient security due to the fact that they rely on intensive communications with constrained devices [7] and constrained resources [8] located at the end device layer. If one of many end devices does not support a communication protocol of sufficient strength, the security of the whole solution may be compromised. Moreover, roaming services are supported in some fog solutions, when the service follows a human, vehicle, etc., and travels from one fog node to another. In such cases, when a lower-security service is brought to a secure fog node, the security of this fog node may be compromised. Similar problems may also occur with other QoS parameters such as latency, bandwidth, range, etc. However, service orchestrators which are placed in the fog nodes may be used to monitor the whole situation in the fog node (including any communications with neighboring fog nodes) and to take any required measures in the case of potential violations of security and other QoS parameters.

After a dynamic service orchestrator deploys the relevant services within specific fog nodes, there is another hurdle to overcome: optimization [9]. Fog computing keeps computing resources close to users and end nodes to reduce delay for IoT services. It can also deal with privacy, data locality, and bandwidth consumption. There are several objectives that can be enhanced by optimization, such as latency, cost, or energy management. These are part of the quality of service (QoS), but improving them may come with a trade-off.

Fog service orchestration can be challenging in such conditions. Cloud service orchestration may already be reliable enough at the moment [10], but the situation is different with fog computing. The complexity builds up due to the diversity of services and resources. There are also concerns about interoperability, performance and service assurance, lifecycle management, scalability, security, and resilience, as identified in the review [11]. The paper [12] suggests that scalability, dynamics, and security are among the most common orchestration challenges specified in research papers.

Our goal in this research is to offer an effective orchestrator working in the fog layer of the fog computing architecture by providing effective means to solve the QoS- and security-related problems of the orchestration in heterogeneous fog layer devices and services. The idea is to check the placement of fog devices and services for any potential QoS and security issues in order to find any non-optimal distribution of services among fog nodes.

This paper includes three main contributions aimed at the fog service orchestration problem of an optimal placement of services inside the available fog nodes. First, it presents a detailed review of the fog service orchestration challenges and solutions proposed by other authors. The review clearly demonstrates the most promising mechanisms to be used for a fog service orchestration and it defines the problem more formally, which is addressed in this paper. Second, a two-stage optimal service placement algorithm based on integer multi-objective particle swarm optimization (IMOPSO) and the analytical hierarchy process (AHP) is proposed and formally described. The first stage of the proposed method finds a Pareto set of non-dominated potential placements of services, then the AHP is used to choose one best solution according to the application-specific judgment matrix provided by a user. Third, the proposed method is experimentally evaluated using an illustrative scenario, showing the performance of the algorithm in some likely situations.

The rest of this paper is structured in the following way: related publications are presented in Section 2, followed by Section 3, which presents a more formal definition of the service orchestration problem. We describe our proposed method in Section 4. Evaluation and experimental results are summarized in Section 5, and, finally, Section 6 is dedicated to conclusions and a discussion.

#### **2. Related Work Review**

Fog orchestrator components, as concluded in the paper [5], can generally be divided into three main groups: fog orchestrator, fog node which can function as a fog orchestrator agent (FOA), and end devices. A fog orchestrator needs to consult its catalogs and certain monitoring data to make an orchestration plan. A fog orchestrator can start its orchestration manually or after reaching a benchmark, such as the availability of other nodes. FOA, in turn, can handle only local resources which are within that particular node.

The main research challenges in fog orchestration identified by Velasquez et al. [13] are the following: resource management, performance, and security management. Resource and service allocation optimization techniques are used, among others, to address these challenges. The allocation procedure, however, is non-trivial because it is essentially a multi-objective optimization problem.

To address these issues from the perspective of fog computing, the authors of [14] propose four algorithms covering the construction phase and the maintenance phase they identify. The construction phase aims to find probable candidate locations for placing gateways using the candidate location identification (CLI) algorithm; a Hungarian method-based topology construction (HTC) algorithm is then used to select the optimal gateway locations. Meanwhile, the maintenance phase increases the processing resources in the gateways by intelligent sleep scheduling with the help of the vacation-based resource allocation (VRA) algorithm. The processing and storage resources in the gateways are further improved based on the tracked data arrival rates with the help of the dynamic resource allocation (DRA) algorithm. Another option which can benefit the performance of a network in terms of reliability and response time is caching, as noted in [15]: caching at the fog nodes can reduce computational complexity and network load. Even though computing power is the most critical aspect of a fog node for completing specific tasks, as [16] suggests, effective allocation of resources can vary due to limitations, which may include the hierarchy of the fog network, network communication resources, and storage resources.

Yang et al. [17,18] confirm that the orchestration has to deal with a number of factors such as resource filtering and assignment, component selection and placement, dynamics with the runtime QoS, systematic data-driven optimization, or machine learning for orchestration. They implemented a novel parallel genetic algorithm-based framework (GA-Par) on Spark. They normalized the utility of security and network QoS into an objective fitness function within GA-Par. It reduces any security risks and performance deterioration. As their experiments later demonstrated, GA-Par outperforms a standalone genetic algorithm (SGA). Skarlat et al. [19] proposed to solve the fog service placement problem (FSPP) by using orchestration control nodes which place each service either in the fog cells or in the fog orchestration control nodes. The goal of optimization is to maximize the number of service placements in the fog nodes (rather than in cloud ones), while satisfying the requirements of each application. The authors used a genetic algorithm to find the optimal FSPP.

The authors of another paper identified resource allocation and provisioning as a challenging task considering dynamic changes of user requirements and limited resources [20]. They proposed resource allocation and provisioning algorithms based on resource ranking and evaluated them in a simulation environment after extending the CloudSim toolkit. Two main steps are used to solve the deadline-based user dynamic behavior problem: first, they ranked resources based on processing power, bandwidth, and response time; later, they provisioned resources by prioritizing the processing of application requests.

As the deployment infrastructure has to adapt itself to extremely dynamic requirements, the fog layer may not provide enough resources and, meanwhile, the cloud layer can fail due to latency requirements [21]. The paper presents a rewriting-based approach to design and verify a self-adaptation and orchestration process in order to achieve a low latency and the right quantity of resources. An executable solution is provided based on Maude, the formal specification language. Properties are expressed using linear temporal logic (LTL). Their proposed cloud–fog orchestrator works as a self-adaptation controller. It is deployed in the fog layer as a fog node master for low latency requirements. The orchestrator triggers the right actions after a decision is made.

Smart service placement and management of services in big fog platforms can be challenging due to the dynamic nature of the workload applications and user requirements for low energy consumption and good response time. Container orchestration platforms help with this issue [22]. These solutions either use heuristics for their timely decisions or AI methods such as reinforcement learning and evolutionary approaches for dynamic scenarios. Heuristics cannot quickly adapt to extremely dynamic environments, while the second option can negatively impact response time. The authors also noted that scheduling policies which are efficient in volatile environments are needed. They offer a gradient-based optimization strategy using back-propagation of gradients with respect to the input (GOBI). They also developed a coupled simulation and container orchestration framework (COSCO) that enabled the creation of a hybrid simulation decision approach (GOBI\*), which they used to optimize their quality of service (QoS) parameters.

As service offloading is highly relevant in terms of time and energy, selection of the best fog node can be a serious challenge [23]. The researchers presented a module placement method based on a classification and regression tree algorithm (MPCA). The best fog node is selected using decision parameters including authentication, confidentiality, integrity, availability, capacity, speed, and cost. They later analyzed and applied the probability of network resource utilization in module offloading to optimize the MPCA.

Linear programming is another very popular optimization method used for resource allocation and service placement in fog nodes. Arkian et al. [24] linearized a mixed-integer non-linear program (MINLP) into a mixed-integer linear program (MILP) for optimal task distribution and virtual machine placement by minimizing cost. Velasquez et al. [25] proposed a service orchestrator which tries to minimize the latency of services using integer linear programming (ILP) to minimize the hop count between communicating nodes.

The authors of [26] present a method used to help deployments of composite applications in fog infrastructures, which have to satisfy software, hardware, and QoS requirements. The developed prototype (FogTorch) uses the Monte Carlo method to find the best deployment which ensures the lowest fog resource consumption—the aggregated averaged percentage of consumed RAM and storage in all the fog nodes.

A sequential decision-making Markov decision problem (MDP) enhanced by the technique of Lyapunov optimization is used by the authors of [27] to minimize operational costs of an IoT system while providing rigorous performance guarantees. The proposed method is intended to be used for a general problem of resource allocation and workload scheduling in cloud computing, but it may also be applied to a service placement problem in fog nodes.

As fog computing has a number of challenges to deal with, optimization is vital, and the classification of optimization problems can play an important role [28]. A service placement problem, in general, has been shown to be NP-complete by the authors of [29]. An optimization is typically made up of [30] (a) a set of variables to encode decisions, (b) a set of possible values for each variable, (c) a set of constraints which the variables are to satisfy, and (d) an objective function. Optimization solutions involving end devices and fog nodes differ based on their application area.

Our analysis of the methods used by other authors for service placement optimization, as well as the findings of other researchers [31], shows that various well-established optimization methods are used for this task, including integer linear programming, genetic algorithms, the Markov decision process, gradient-based optimization, the Monte Carlo method, reinforcement learning, etc. The objective functions used by the authors of these methods vary from overall cost minimization [24,27] to network latency [25], hop and service migration count [25], and response time and latency of the IoT system [26]. The literature review allows us to conclude that most optimization methods tend to seek an optimal placement of the services based on the most important parameter of the IoT system, which is represented by the objective function used in the optimization process. Other important parameters of the IoT system are then used as restrictions, and usually include latency, power, bandwidth and QoS [24,26], CPU, RAM, and storage demands [19]. This kind of problem formulation avoids the challenges of multi-objective optimization, but it cannot be used in situations where more than one objective function is required. Some other approaches evaluate several characteristics by combining them into one composite criterion, such as cost [24,27] or fog resource consumption [26] composed of the average RAM and storage usage in the fog nodes. The equations for calculating such composite criteria are usually provided by the authors of the proposed algorithms and use predefined coefficients which are difficult to justify and validate. One very important challenge therefore remains in this area: how to find the best placement of the services according to several heterogeneous criteria, with different origins and different units of measurement, when they often contradict each other. The usage of composite criteria is not always the best answer to this.

The service placement optimization method proposed in this paper addresses these challenges by using a multi-objective optimization method to find all non-dominated placements of the services and then selecting the single best placement using an analytical hierarchy process, which simplifies the criterion comparison performed by experts of the application area. In this way, any number of objective functions (optimization criteria) may be used in the optimization process, as long as experts are able to provide a consistent pair-to-pair comparison of their priority in the context of a concrete application area.

#### **3. Orchestrator Components and Architecture**

In this paper, we consider the fog orchestration architecture and components presented in Figure 2. We have a service orchestrator in the cloud layer which is used to optimally distribute the services between several orchestrated fog nodes. The orchestrated fog nodes host some services which communicate with end devices, collect and process data, and make some local decisions on the control of actuators located in the end device layer. Special services (orchestrator agents) are physically located in each fog node and they communicate with the orchestrator to provide it with all the necessary information needed to make any decisions on service placement.

Orchestrator agents locally monitor the hardware and software environment of the fog nodes. They are aware of the current CPU and RAM usage, power requirement and energy levels, available communication protocols and bandwidth, security capabilities, state of the hosted services, etc. They summarize all the collected information to provide it to the orchestrator in the cloud layer. The orchestrator is aware of the current situation in all the fog nodes and, additionally, it has security and QoS requirements imposed by the application area of the IoT solution, and it makes decisions on starting, stopping, or moving particular services among the orchestrated fog nodes. The decisions made by the orchestrator are communicated down to the orchestrator agents inside the fog nodes, then the orchestrator agents initialize the required actions on the services. A control cycle performed inside the orchestrator is illustrated in Figure 3.

**Figure 2.** Fog orchestrator architecture and components.

**Figure 3.** Control cycle inside the orchestrator.

The method of the fog service orchestration presented in this paper is intended to be used inside the orchestrator. The main task of the proposed method is to optimally distribute *n* services among *k* fog nodes according to the information collected from the corresponding fog nodes and the requirements imposed by the area of an application of the IoT system. This task of a service distribution is not trivial since several different optimization criteria which contradict each other must usually be considered (i.e., security level, energy consumption, bandwidth capabilities, latency, etc.). The number of possible different distributions of services among fog nodes increases rapidly with the increase in the number of available fog nodes and services. Any evaluation of all the possible placements of the services is infeasible, therefore, more sophisticated methods are needed. Moreover, the situation and the evaluation criteria can change dynamically due to the dynamic environment of the fog architecture. Some currently available fog nodes as well

as end devices may change their location or new fog nodes may even emerge while, on the other hand, some currently running services may become unused and new services may appear.

#### **4. Method for Fog Service Orchestration**

We propose to use multi-objective optimization to decide which placement of *n* available services in *k* fog nodes is the best according to given constraints and conditions. The overall flow chart of the proposed two-stage optimization method is presented in Figure 4.

**Figure 4.** Flow chart of the proposed service distribution optimization process.

The optimization process has two main steps—multi-objective optimization and a multi-objective decision, but the problem must be expressed as a formal mathematical model before using any formal optimization methods. The following subsections describe the optimization process in detail. We summarize the key notations used in this paper in Table 1 in order to give a description of the optimization process.


**Table 1.** Key notations used in this paper.

#### *4.1. The Optimization Model of a Service Distribution Problem*

The main task of this optimization procedure is to find an optimal distribution of *n* services among possible *k* fog nodes. Each fog node may have slightly different characteristics, but we assume that all the nodes are capable of running all the services. The goal of optimization is to distribute all the services in such a way that a set of important characteristics is optimal. Characteristics of the *i*-th possible service distribution *X<sup>i</sup>* are expressed by the values of the objective functions *fj*(*Xi*), *j* = 1, 2, . . . , *m* and constraint conditions. The objective of the optimization process is to find the best service distribution *Xopt* which minimizes all the objective functions *f<sup>j</sup>* , i.e., we have a multi-objective optimization problem:

$$X\_{\rm opt} = \underset{\rm i}{\arg\min} F(X\_i) \tag{1}$$

where *F*(*x*) = { *f*1(*x*), *f*2(*x*), . . . , *fm*(*x*)} is a set of the objective functions, and *x* ∈ {*Xi*} is a member of the set with all the possible service distributions.

Constraint conditions are expressed by the following equations:

$$\begin{cases} g\_j(X\_i) \ge 0, & j = 1, 2, \dots, n\_g \\ h\_k(X\_i) = 0, & k = 1, 2, \dots, n\_h \end{cases} \tag{2}$$

#### *4.2. Objective Functions*

Different fog nodes have different performance, network bandwidth, and security characteristics. Different distributions of services among the fog nodes may produce a working system with slightly different characteristics. For example, if one fog node supports a lower level of a security (due to limited hardware capabilities), and an important service is placed in this node, then the overall security of the whole system is reduced to the security of the least secure fog node. We consider multiple objective functions (*fj*(.), *j* = 1, 2, . . . , *m*) to evaluate all such situations, which include: overall security of the system, CPU utilization, RAM utilization, power utilization, range, etc. Some objective functions which were used in our experiments are provided in the following paragraphs.

The security of the whole system under the *i*-th service distribution *fsec*(*Xi*) is defined by the lowest security of all the services. We assign security levels (expressed in security bits, according to the NIST publication [32]) to fog nodes based on their capabilities to support the corresponding security protocols. Since we assume that services are capable of working on all the fog nodes, the value of the security criterion function *fsec*(*Xi*) for the service distribution *Xi* is the lowest security level of all the fog nodes in which at least one service is hosted. For example, if three fog nodes are able to provide 128 bits of security and one fog node is constrained to support only 86 bits of security, and at least one service is hosted by it, then the overall security of the service distribution is equal to 86 bits, i.e., the value of the objective function *fsec*(*Xi*) = 86. If the services are used in an application area which requires a specific level of security, then such a requirement is expressed as a constraint condition, i.e., if the application area requires at least 128 bits of security, then we have a corresponding constraint *gsec*(*Xi*) ≥ 128.

The criterion of CPU usage *fCPU*(.) evaluates how evenly, CPU utilization-wise, the services are distributed among the fog nodes. The main idea here is to decrease the overall CPU utilization of the system to allow additional services to be hosted more easily if they appear during the runtime of the system. Each fog node has its CPU capabilities expressed in MIPS, which depend on the hardware capabilities of the corresponding fog node. All services are also evaluated for their required CPU resources. To calculate the value of the CPU usage of the whole system under the *i*-th service distribution *fCPU*(*Xi*), we first calculate the relative CPU usage of each fog node (dividing the sum of the CPU resources required by all the services hosted in the fog node by the capabilities of that node) and then find the maximum CPU utilization among all the fog nodes. The lower the maximum CPU utilization is, the better the service distribution. With this method of calculation, some service distributions may result in a CPU allocation greater than 100% in some fog nodes; therefore, corresponding constraints are added to the optimization problem. The usage of this criterion automatically resolves some frequent restrictions and incompatibilities, i.e., situations when some services require CPU resources which cannot be provided by some fog nodes.

The criterion of RAM usage *fRAM*(.), which evaluates how evenly RAM utilization is distributed among the fog nodes hosting the services, is very similar to CPU usage. The calculation of this criterion is the same as the calculation of CPU usage. A constraint which does not allow exceeding 100% of the RAM utilization in each fog node is also added.

A criterion of the power usage *fpw*(*Xi*) of possible service distribution *X<sup>i</sup>* is evaluated using the average power requirements of each service (expressed in mW) and the available power of fog nodes (expressed in mW). The main objective of this evaluation is to maximize the overall runtime of the system. A calculation is performed by dividing the sum of power requirements of all the services hosted in each fog node by the available power of a corresponding fog node to find the maximum among all the fog nodes. A distribution of services is better in such a case when all the fog nodes are evenly loaded power-wise, i.e., the maximum power utilization is minimized.

The communication of fog devices with sensors and actuators is affected by the physical range between devices in some cases. Some communications protocols add strict requirements for the range as some of them may be less efficient if the communication range is increased. A criterion of the maximum range *frng*(.) may be used to assess these properties. In this study, a criterion of the range is calculated by averaging the range of each fog node location with respect to all the devices the particular fog node is communicating with to find the maximum of these ranges among all the fog nodes hosting at least one service which requires communication with end devices. The main idea of this criterion is to prefer a shorter communication path as it ensures better performance in most cases. Any corresponding constraints on the range may be also added if a communication protocol induces such restrictions.
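As a concrete illustration of two of the criteria described above (overall security and relative CPU usage), a minimal Python sketch is given below. The function and variable names are ours, and it assumes each fog node is described by its security level in bits and its CPU capacity in MIPS, while each service is described by its required MIPS; the example numbers are illustrative.

```python
def f_sec(distribution, node_security_bits):
    """Overall security: the lowest security level among fog nodes hosting at least one service."""
    used_nodes = set(distribution)
    return min(node_security_bits[node] for node in used_nodes)

def f_cpu(distribution, service_mips, node_capacity_mips):
    """CPU criterion: the highest relative CPU utilisation over all fog nodes (to be minimised)."""
    load = [0.0] * len(node_capacity_mips)
    for service, node in enumerate(distribution):
        load[node] += service_mips[service]
    return max(l / cap for l, cap in zip(load, node_capacity_mips))

# Illustrative example: 5 services on 3 fog nodes; distribution[j] = node hosting service j.
distribution = [0, 0, 1, 2, 1]
print(f_sec(distribution, node_security_bits=[128, 128, 86]))            # 86
print(f_cpu(distribution, service_mips=[50, 30, 80, 40, 60],
            node_capacity_mips=[200, 200, 100]))                          # 0.7
```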

Other application-specific criteria, such as local storage capabilities, communication latency, bandwidth, etc., may also be evaluated by defining corresponding objective functions representing system characteristics which are important in a particular application scenario. The specific implementations of the criteria evaluation functions *fi*(.) are implementation dependent and are out of the scope of this paper. The proposed optimization procedure is not limited to any specific number or nature of objective functions as long as they follow a few common criteria:


Generally, one common feature of all these objective functions is that they are mutually conflicting: optimizing one objective often comes at the expense of another. For example, we may consider moving all the services to more secure fog nodes to increase security, but such a service distribution will likely cause reduced power efficiency, excessive load on some of the nodes, and a lower overall runtime of the system. Moreover, different objectives have different measurement units, e.g., security may be evaluated in bits while the power requirement of the services is measured in Watts, the available network bandwidth is measured in kbps, etc. Even if all the measurements are converted to real positive numbers, it is still very difficult to compare them objectively. There is no single solution to a multi-objective optimization problem that optimizes all the objectives at the same time. Since the objective functions are contradictory, a set of non-dominated (Pareto optimal) solutions can be found instead. We propose a two-stage optimization procedure (see Figure 4) to deal with this situation, where the first step uses multi-objective optimization to find a set of solutions (possible distributions of services) whose elements are Pareto optimal. We propose to use the integer multi-objective particle swarm optimization (IMOPSO) method described in the next subsection for this step. The choice of the particle swarm optimization method is based on the research of other authors [33–35], which shows that this method is suitable for a similar class of problems and demonstrates good results. In the second step, we use the analytical hierarchy process (AHP) to choose the best solution from the Pareto optimal set.

#### *4.3. IMOPSO for Finding a Pareto Set of Possible Service Distributions*

The original particle swarm optimization (PSO) algorithm is best suited for the optimization of continuous problems, but several modifications [36,37] exist which enable it to be used for discrete problems. In the case of multiple objectives which contradict each other, the PSO algorithm may be adapted to find a Pareto optimal set of solutions [38,39]. We used the multi-objective particle swarm optimization (MOPSO) method proposed by Coello et al. [39] to find a Pareto set of the possible service distributions among fog nodes. In order to use this method, we had to slightly adapt it to work in the constrained integer *n*-dimensional space of possible distributions of services represented as the particles of a swarm (the original method uses a continuous real number space).

We used the vector $X_i = (x_1, x_2, \dots, x_n)^T$, $x_j \in \{1, 2, \dots, k\}$, $j = 1, 2, \dots, n$ to encode the *i*-th distribution of services, where *n* is the number of services which have to be distributed among *k* fog nodes. The meaning of the vector element $x_j = l$ is that the *j*-th service must be placed in the *l*-th fog node.

A flow diagram of the integer multi-objective particle swarm optimization (IMOPSO) algorithm is shown in Figure 5.

**Figure 5.** Flow chart of IMOPSO algorithm.

The main steps of the IMOPSO algorithm are the following:

- Evaluate the new scores $F_i$ of all the particles in the swarm *S* using all the objective functions: $F_i = (f_1(X_i), f_2(X_i), \dots, f_m(X_i))^T$, $i = 1, 2, \dots, |S|$.
- Calculate the new velocity of each particle using the expression $V_i = wV_i + r_1(pBPos_i - X_i) + r_2(gBPos - X_i)$, where $w$ is an inertia weight (initially a real value around 0.4); $r_1$ and $r_2$ are random numbers in the range [0..1]; $V_i$ is the velocity of the *i*-th particle; $pBPos_i$ is the position of the *i*-th particle with the best score; $X_i$ is the current position of the *i*-th particle; and $gBPos$ is the position of the particle with the best global score.
- Update the positions of all the particles in the swarm: $X_i = round(X_i + V_i)$, $i = 1, 2, \dots, |S|$. The position is approximated to the nearest integer value.

If a particle goes out of the feasible range, its velocity is reversed ($V_i = -V_i$), and its position $X_i$ is set to the edge of the range of its definition.
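The update step above can be sketched as follows. This is a simplified, single-guide version that omits the external Pareto archive from which MOPSO draws the global guide $gBPos$; all names are illustrative assumptions, not the authors' code.

```python
import random

def imopso_step(swarm, velocities, pbest, gbest, k, w=0.4):
    # One IMOPSO iteration over all particles: velocity update, integer rounding of the
    # position, and reflection at the boundaries of the feasible range 1..k (fog node indices).
    for i, x in enumerate(swarm):
        for j in range(len(x)):
            r1, r2 = random.random(), random.random()
            velocities[i][j] = (w * velocities[i][j]
                                + r1 * (pbest[i][j] - x[j])
                                + r2 * (gbest[j] - x[j]))
            x[j] = round(x[j] + velocities[i][j])       # positions stay integers
            if x[j] < 1 or x[j] > k:                    # out of range: reverse the speed
                velocities[i][j] = -velocities[i][j]    # and clamp to the edge
                x[j] = min(max(x[j], 1), k)
    return swarm, velocities
```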


#### *4.4. Finding an Optimal Service Distribution Using the AHP*

We used the analytical hierarchy process (AHP) [40,41] to choose the best solution from the Pareto optimal set by applying pairwise comparisons of all non-dominated distributions of services against all the available objective functions. The AHP is usually used in situations where a decision must be made using a small amount of quantitative data and a deep analysis performed by several decision-making parties, by applying pairwise comparisons of the possible solutions. The AHP may be adapted for machine-based decision making in scenarios where complex multiple-criteria problems are evaluated [42–44]. The choice of the AHP instead of other, more formal multi-criteria decision-making algorithms is based on the following reasons [40].

The AHP allows one to automatically check the consistency of the evaluations provided by decision makers. The AHP uses normalized values of criteria, so heterogeneous measurement scales may be used for different criteria. For example, one can use a purely qualitative scale for security (high, medium, low) and, at the same time, inconsistent numeric scales for the power and CPU requirements. The AHP uses pairwise comparisons of the alternatives only, which simplifies multi-objective decision making and improves the reliability of the results. The importance of the criteria used in the AHP is also evaluated using the same methodology, which allows one to skip the most controversial step of manually assigning weights to the different criteria.

A three-level hierarchical structure of the AHP is generalized in Figure 6. Level one is an objective of the process which in our case is to choose an optimal distribution of all the available services among fog nodes. The second level is the criteria, which are the same as the objective functions used in the IMOPSO part of the optimization process. An important step in this level is to use the same AHP to find the weight of all the criteria by using a pairwise comparison of the criteria. This step should be done manually before putting an automatic service allocation algorithm into production. Moreover, a step of the evaluation of criterion importance should be different based on the application area in which a service orchestrator is applied. For example, security may be evaluated as more important than power efficiency in a healthcare application compared to a home automation application. We assume in our algorithm that the step of the evaluation of criterion importance is already performed, and the decision-making system already has its judgment matrix with all the required weights of all the criteria in level 2.

The third level is alternatives. These are filled with all the Pareto optimal solutions from a previous step of the optimization process using the IMOPSO method. Then, the AHP is started to choose the best alternative. The whole process is summarized in Figure 7.

**Figure 6.** Hierarchical framework for AHP.

**Figure 7.** Process of AHP decision making.

The main steps of the AHP are the following:

- Construct the weight coefficient matrix $M_k = [m_{i,j}]$ using all the alternatives in the Pareto optimal solution set *R*. The size of the matrix $M_k$ is $s \times s$, where $s = |R|$; $m_{i,j} \in (0, 9]$; $m_{i,j} = 1/m_{j,i}$; $m_{i,i} = 1$; $i, j = 1, 2, \dots, s$. The elements of the matrix $M_k$ are calculated using special comparison functions $m_{i,j} = comp_k(X_i, X_j)$, which use the corresponding objective function $f_k(.)$: the two objective function values $f_k(X_i)$ and $f_k(X_j)$ are calculated and compared with each other, and the result is transformed into the required real number from the interval (0, 9]. A comparison function heavily depends on the meaning of the criterion, and the corresponding real number represents the preference of one alternative over another [35].
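As an illustration of this step, the sketch below builds one pairwise comparison matrix per criterion from the Pareto set scores and aggregates the resulting priorities with the criteria weights. The priority vector is obtained with the common column-normalization approximation rather than the exact eigenvector, and the comparison function `comp` is assumed to be supplied by the user; none of this is the authors' exact implementation.

```python
import numpy as np

def ahp_rank(pareto_scores, criteria_weights, comp):
    # pareto_scores[i][c]  : value of criterion c for Pareto alternative i
    # criteria_weights[c]  : weight of criterion c (from the level-2 judgment matrix)
    # comp(a, b)           : preference of a over b for one criterion, a value in (0, 9]
    s, m = len(pareto_scores), len(criteria_weights)
    overall = np.zeros(s)
    for c in range(m):
        M = np.ones((s, s))
        for i in range(s):
            for j in range(i + 1, s):
                M[i, j] = comp(pareto_scores[i][c], pareto_scores[j][c])
                M[j, i] = 1.0 / M[i, j]                  # reciprocity: m_ji = 1 / m_ij
        priorities = (M / M.sum(axis=0)).mean(axis=1)    # approximate priority vector
        overall += criteria_weights[c] * priorities
    return np.argsort(-overall)                          # alternative indices, best first
```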


#### **5. Implementation and Evaluation**

Implementation results of our method are summarized in this section, with a discussion of each result. The implementation of a real fog computing environment, with measurement of all the parameters used in the service placement decisions, is out of the scope of this paper; it would also make it difficult to scale the solution and reproduce the results. Therefore, we used a simulation. The main objective is to show how the proposed method performs in different situations, as well as to test the feasibility of the proposed service placement method.

We implemented the proposed optimal service placement-finding method using Matlab. The implementation uses as an input some basic performance information on the fog nodes and services, a set of the objective functions, and an application area-specific judgment matrix *J*. The method performs integer multi-objective particle swarm optimization, finds a Pareto optimal set of solutions, automatically performs an AHP using a provided judgment matrix, and finds the best placement of the services in the fog nodes.

#### *5.1. Illustrative Scenario*

We used an illustrative scenario to evaluate the characteristics of the proposed method. We have 4 fog nodes and 13 services in this scenario, and they must be optimally placed in those fog nodes. Capabilities of the fog nodes and requirements of the services are chosen to show how the proposed method performs in different situations. We used several papers [19,45,46] analyzing various requirements of real hardware and software IoT systems to provide realistic numbers. A summary of the fog node parameters and requirements of the services are presented in Tables 2 and 3.

A security level of any fog device is determined by the hardware and software capabilities as well as by the availability of corresponding libraries, and it is expressed in bits according to the NIST guidelines [32].

All services are divided into three main groups. Sense1, Sense2, and Sense3 services are primarily used to communicate with any corresponding sensor devices, collect the measurement data, and provide it to the other services for processing. On the other hand, services Actuate1, Actuate2, and Actuate3 are mainly used to communicate with the actuator devices. The rest of the services are primarily used to collect data, perform calculations, and make decisions. Resource requirements of the services from different classes are very different.


**Table 2.** Resources available in the fog nodes.


**Table 3.** Resources required by the services.

We used a "dynamic" objective function representing power requirements of the service to better illustrate the capabilities of our method. The power requirements of the service depend on which fog node is used to host this service. This is achieved by dividing the power requirements into two parts: processing and transfer power. The processing power is constant, and it is always required to perform an operation (the values of power requirements were taken from the publication [19]). On the other hand, the transfer power presented in Table 3 is required if no security is used to transfer the data (i.e., a plain http protocol is used). The information on required power levels for a data transfer without any security is based on the experimental results presented in the paper [47]. When the service is placed in a fog node providing more security, then the corresponding requirement for a transfer power is increased. For example, if a service is placed in a fog node providing 86 bits of security (e.g., this node is using 1024-bit RSA for a key agreement), then the corresponding transfer power is multiplied by a coefficient of 1.5. The transfer power increase coefficients were based on the results presented in [45] and [48]. We decided after an analysis of the provided data to use these multipliers for modeling the increase in power due to increased security: 1.5 for 86 bits of security, 2.25 for 112 bits, 4 for 128 bits, and 7.5 for 256 bits of security.

#### *5.2. Evaluation Results*

We use a simplified scenario where only two objective functions are used to show how the IMOPSO algorithm works and how the Pareto set of solutions looks. A Pareto set may be displayed in this case using a two-dimensional chart. A judgment matrix used in this case consists only of 4 elements:

$$J = \begin{pmatrix} 1 & 3 \\ 1/3 & 1 \end{pmatrix} \tag{3}$$

If two objective functions, RAM and CPU, are used, then this judgment matrix means that an even RAM usage distribution among all the fog nodes is more important than an even CPU usage. A Pareto set produced by the IMOPSO algorithm is presented in Figure 8. Then, a Pareto solution set is used in the second stage, employing an AHP, to find the best placement of services. The best placement is summarized in Table 4.
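As a quick check, assuming the usual column-normalization approximation of the AHP priority vector (normalize each column of the judgment matrix and average the rows), the criteria weights implied by this matrix are

$$w = \frac{1}{2}\left[\begin{pmatrix} 1/(1 + 1/3) \\ (1/3)/(1 + 1/3) \end{pmatrix} + \begin{pmatrix} 3/(3 + 1) \\ 1/(3 + 1) \end{pmatrix}\right] = \begin{pmatrix} 0.75 \\ 0.25 \end{pmatrix},$$

i.e., an even RAM usage is weighted three times as heavily as an even CPU usage.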

**Figure 8.** Pareto set of a simplified scenario.

**Table 4.** The best service placement in a simplified scenario.


The best score (the best values of the objective functions) in this case is $(39, 96)^T$, meaning that this service placement ensures a maximal RAM usage of 39% among all four fog nodes. The maximal usage of CPU is 96% in this case.

The second scenario shows an influence of the judgment matrix on the optimal placement of services. Four objective functions are used in this case: power, CPU, security, and RAM. The first judgment matrix prioritizes security and energy over the CPU and RAM:

$$J\_1 = \begin{pmatrix} 1 & 3 & 1/6 & 3 \\ & 1/3 & 1 & 1/6 & 1 \\ & 6 & 6 & 1 & 6 \\ & 1/3 & 1 & 1/6 & 1 \end{pmatrix} \tag{4}$$

The second judgment matrix prioritizes an even power consumption:

$$\mathbf{J}\_2 = \begin{pmatrix} 1 & 7 & 3 & 6 \\ 1/7 & 1 & 1/2 & 1 \\ 1/3 & 2 & 1 & 2 \\ 1/6 & 1 & 1/2 & 1 \end{pmatrix} \tag{5}$$

A Pareto set of solutions using the judgment matrix *J*<sup>1</sup> is shown in Figure 9. Only some projections of the set are shown as the set members are four-dimensional vectors and they cannot be fully rendered in charts.

**Figure 9.** Pareto set of the second scenario, security and energy are prioritized.

The best placement of services is presented in Table 5. The best score in this case is $(35, 94, 112, 98)^T$. The overall security (determined by the security level of the least security-capable fog node hosting at least one service) is 112 bits in this case, and the fog node Fog4 is not hosting any services, as its security is only 86 bits.

**Table 5.** Best service placement in the second scenario, security is prioritized over other criteria.


If the judgment matrix *J*2, which prioritizes even power consumption, is used in the same situation, then the best placement is different (see Table 6), and the best score is $(26, 88, 86, 84)^T$.

**Table 6.** Best service placement in the second scenario, power is prioritized over other criteria.


The maximal power consumption among all the fog nodes is 26% in this case, and it is significantly better than in the first variant (35%), but the overall security of the solution is degraded to 86 bits, as several services are placed in the fog node Fog4.

The third illustrative scenario shows how the proposed service placement method works when some devices change their positions and the corresponding services must be reallocated. To demonstrate this scenario, we use an objective function considering the range from a physical sensor device to the monitoring service, which is hosted in one of the fog nodes. The range in this case is only important for services which communicate with sensors or actuators; for services which only process data, the range is considered to be 0 regardless of the fog node hosting them. A judgment matrix prioritizing the range is used in this scenario, while the objective functions in this case are: range, CPU, security, and RAM.

$$J\_3 = \begin{pmatrix} 1 & 5 & 3 & 5 \\ 1/5 & 1 & 1/2 & 1 \\ 1/3 & 2 & 1 & 2 \\ 1/5 & 1 & 1/2 & 1 \end{pmatrix} \tag{6}$$

We used the data presented in the diagram (see Figure 10) to model the placement of the services. All coordinates here are presented in meters.

The best service placements in each case are summarized in Tables 7 and 8, and the corresponding scores are $(13, 67, 128, 73)^T$ and $(9, 54, 86, 55)^T$.

**Figure 10.** Service placement diagram. (**a**) Initial placement, (**b**) modified placement.

**Table 7.** Best service placement in the third scenario, the initial placement of sensors and actuators.


**Table 8.** Best service placement in the third scenario, the placement of sensors and actuators after changes in their location.


The evaluation results clearly show that if the range is the most important objective function, then the services are more likely to be placed in the adjacent fog nodes. On the other hand, if more sensors are located near a less secure fog node, then the overall security of the solutions may decrease (128 bits vs. 86 bits in the second scenario).

#### **6. Discussion, Conclusions, and Future Work**

An increase in IoT-based services has led to a need for more efficient means of handling resources in systems comprising heterogeneous devices. The fog computing paradigm brings computational resources closer to the edge of the cloud, but energy-, communication-, and computation resource-constrained devices dominate near the edge. Different application areas (healthcare, multimedia, home automation, etc.) require different characteristics of the IoT system. The usage of various heterogeneous devices leads to difficulties in predicting how many resources will be required within the fog nodes when all the services allocate all the resources they need. Moreover, the need for roaming services which follow the actors (i.e., a person moving inside a building, cars, etc.) arises due to the limitations of some hardware devices (i.e., a limited range of communication protocols), and therefore the resources in the fog nodes need to be reallocated every time the situation changes. The best way to deal with these dynamic service reallocations is to use service orchestrators, which decide the best way to allocate, move, start, and stop any corresponding services as needed. One of the main challenges while designing an effective service orchestrator is the need for a specialized method to obtain an optimized service placement inside the available fog nodes.

A new optimization method for an optimal distribution of services among available fog nodes was proposed in this paper. The two-stage method uses integer multi-objective particle swarm optimization to find a Pareto optimal set of solutions and the analytical hierarchy process, with an application-specific judgment matrix, to decide on the optimal distribution of services. Splitting the processing in this way allows one to assess heterogeneous criteria with different units of measurement and different natures (qualitative or quantitative). The method, apart from providing one best solution, also ranks all the Pareto optimal solutions, enabling one to compare them with each other (answering the question "how much better is one solution than the other?") and, if needed, to choose the second best, the third best, etc. solution.

The proposed method effectively works with the whole range of objective functions (evaluation criteria), which could be easily expanded by new objective functions representing different criteria. Moreover, the objective functions may be dynamic, meaning that not only the value but also the algorithm of an objective function calculation may be different based on the service placement in particular fog nodes with particular software and hardware capabilities.

If the same end device, service, and fog device set is used in a different application area (i.e., healthcare vs. home automation) which requires different prioritization of criteria (i.e., security is more important in healthcare compared to home automation) then only the AHP judgment matrix must be changed. The method adapts to the situation and provides appropriate results.

A number of interesting aspects of the proposed method could be explored in the future. It would be interesting to use it in a real orchestrator of IoT infrastructure to practically evaluate how different placements of services inside fog nodes influence the performance of the whole IoT system. Another very interesting aspect to investigate is objective function construction according to the experimentally obtained real-life results involving all the interrelations among different criteria. An experiment using real hardware and software would help to estimate some additional aspects of the proposed algorithm, including the performance under different configurations of the infrastructure (number of fog nodes, number of end devices, etc.) and different architectures of the corresponding devices (supporting parallel processing, optimization using CPU or GPU, offloading an optimization task to the cloud services, etc.).

We believe that the results of this work will be useful in further research in the area of IoT fog computing service orchestration, and it will allow researchers to develop more efficient IoT systems.

**Author Contributions:** Conceptualization, A.V., N.M., N.Š.; investigation, N.M., A.V., J.T.; methodology, A.V., N.M., N.Š.; resources, A.V.; software, N.M., J.T.; supervision, A.V.; visualization, N.M., N.Š.; writing—original draft, A.V., N.M., N.Š.; writing—review and editing, A.V., N.M., N.Š., J.T.; funding acquisition, A.V. All authors contributed to the final version. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research is supported in part by the European Union's Horizon 2020 research and innovation program under Grant Agreement No. 830892, project "Strategic programs for advanced research and technology in Europe" (SPARTA).

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


*Article* **Deep Learning-Based Content Caching in the Fog Access Points**

**Sovit Bhandari 1 , Navin Ranjan 1 , Pervez Khan 1 , Hoon Kim 1, \* and Youn-Sik Hong 2**


**Abstract:** Proactive caching of the most popular contents in the cache memory of fog-access points (F-APs) is regarded as a promising solution for 5G and beyond cellular communication to address latency-related issues caused by the unprecedented demand of multimedia data traffic. However, it is still challenging to correctly predict the user's content and store it in the cache memory of the F-APs efficiently, as the user preference is dynamic. In this article, to solve this issue to some extent, the deep learning-based content caching (DLCC) method is proposed, building on recent advances in deep learning. In DLCC, a 2D CNN-based method is exploited to formulate the caching model. The simulation results in terms of deep learning (DL) accuracy, mean square error (MSE), the cache hit ratio, and the overall system delay are presented to show that the proposed method outperforms known DL-based caching strategies, as well as the transfer learning-based cooperative caching (LECC) strategy, randomized replacement (RR), and Zipf's probability distribution.

**Keywords:** fog access points; cache memory; convolutional neural network; proactive caching

#### **1. Introduction**

With the blooming of IoT devices, it is expected that the demand for mobile data traffic will grow at an unprecedented rate. To address the resulting latency-related problems, Cisco coined the fog computing-based network architecture [1]. Fog computing is a decentralized version of cloud computing with limited computational and signal processing capability, which brings the benefits of cloud computing nearer to the user side [2]. The remote radio heads with caching and signal processing capabilities in the fog computing architecture are referred to as fog access points (F-APs) [3,4]. F-APs have limited computational capability compared to the cloud, so F-APs should proactively store popular cache contents to maintain a desirable fronthaul load and provide a better quality of service [5–7].

There has been extensive research related to caching in F-APs, and some of the related works are worth mentioning. In [8], a learning-based optimal solution is provided to place cache contents in a small cell base station based on historical data. In [9], a device-to-device optimal content placement strategy based on the user's mobility is introduced. Likewise, in [10], the caching problem in multiple fog nodes is studied to optimize delay in a large-scale cellular network. Similarly, in [11], the authors formulated a joint optimization problem combining delay minimization and content placement.

In the aforementioned literature [8–11], the research predicted the popularity of the contents based on Zipf's probability distribution method. However, this method cannot accurately predict the user content as user behavior is dynamic.

AI is likely to bring the fourth industrial revolution due to recent advances in its field [12]. There has been increased interest in deep learning (DL) models due to their remarkable impact in a wide variety of fields such as natural language processing, computer vision, and so on [13,14].

**Citation:** Bhandari, S.; Ranjan, N.; Khan, P.; Kim, H.; Hong, Y.-S. Deep Learning-Based Content Caching in the Fog Access Points. *Electronics* **2021**, *10*, 512. https://doi.org/ 10.3390/electronics10040512

Academic Editor: Kevin Lee

Received: 29 January 2021 Accepted: 18 February 2021 Published: 22 February 2021


Recently, there has been more focused research on the DL-based approach to predict the future popular contents to solve the prevalent cache content placement issue [15]. In this paper, motivated by the DL-based approach, we propose a DL-based content caching (DLCC) to proactively store cache contents in the fog computing environment.

#### *1.1. Related Works*

The concept of fog radio access network (F-RAN) architecture had been introduced to bring cloud contents nearer to the end-users side, such that the fronthaul burden prevalent in the cloud radio access networks can be lessened. The cloud contents can be offloaded nearer to the end-user side by deploying the proper content caching technique in the F-APs.

In recent years, there has been extensive investigation to address content caching problems for wireless networks. The work of [16] discussed a joint routing and caching problem to maximize the portion of contents served locally by the small base stations in cellular networks. In [17], a femtocell caching followed by D2D-based content sharing idea is put forward to improve cellular throughput. The work of [18] focused on the latency-centric analysis of the degrees of freedom of an F-RAN system for optimal caching and edge transmission policies. The works in [16–18] did not take content popularity into account when addressing the caching problem.

Since the F-APs have limited storage as well as signal processing capabilities, the most highly preferred user contents should be placed in the cache memory of the fog nodes. The traditional approaches to allocating cache contents in wireless networks include least recently used, least frequently used, first-in-first-out, random replacement (RR), and time-to-live [19]. These methods have become impractical to use in live networks as user requirements change over time. To mitigate the traditional approach of solving caching problems, some authors recommended placing the cache contents by analyzing the user's social information. In [20], social-aware edge caching techniques have been studied to minimize bandwidth consumption. In [21], social information and the edge computing environment have been fully exploited to alleviate the end-to-end latency. However, social ties alone cannot be the sole deterministic factor to determine the dynamic nature of user preference.

Lately, there has been a marked increase in the utilization of big-data analytics, ML, and deep learning (DL) in academia as well as in industry. They have been used to solve problems in diverse domains such as autonomous driving, medical diagnosis, road traffic prediction, radio-resource management, caching, and so on, due to their high prediction accuracy [22–24]. Due to the promising solutions provided by artificial intelligence (AI) technology in various domains, a trend of exploiting ML-based and DL-based models to predict the future content requirements of users has been established.

For instance, the works which have used an ML-based approach to determine the cache contents proactively are listed in [25,26]. In [25], a collaborative filtering (CF)-based technique is introduced to estimate the file popularity matrix in a small cellular network. However, the CF algorithm provides the sub-optimal solution when the training data are sparse. To solve the caching problem without undergoing any data sparseness problem, the authors in [26] proposed a transfer learning (TL)-based approach. However, in this approach, if similar content is migrated improperly, the prediction accuracy becomes worse.

Likewise, some of the papers, which have used DL-based models to forecast popular contents for caching are listed in [15,27–29]. In [15], an auto-encoder-based model is used to forecast the popularity of the contents. Likewise, in [27], a bidirectional recurrent neural network is used to determine content request distribution. In [28], a convolutional neural network-based caching strategy is presented. Moreover, in [29], the authors used different DL-based models such as a recurrent neural network, convolutional neural network (CNN), and convolutional recurrent neural network (CRNN) to determine the best cache contents and increase the cache hit ratio. However, in the above works, the DL-based model could not achieve validation accuracy greater than 77%. Therefore, accurately predicting F-APs cache contents with DL-models has become a challenging task.

#### *1.2. Contribution and Organization*

In this paper, to minimize the delay while accessing users' content, a DLCC strategy is introduced to store the most popular users' contents in the F-APs. In DLCC policy, we introduced a supervised learning-based 2D CNN model to train using 1D real-world datasets. We identified key features and key labels of the datasets by different data preprocessing techniques such as data cleaning, one-hot encoding, principal component analysis (PCA), k-means clustering, correlation analysis, and so on. The goal of our DLCC algorithm is to predict the popularity of contents in terms of different categorical classes. Then, based on the prediction result, the data of the most popular class will be stored in the cache memory of the nearby F-APs. We quantified the performance of the proposed caching policy on F-APs by showing DL accuracy, cache-hit ratio, and overall system delay.

The methodology of this article is shown in Figure 1 and summarized as follows:


**Figure 1.** Methodology for deep learning-based content caching (DLCC).

The remainder of this paper is organized as follows; In Section 2, the system model is described. In Section 3, the DLCC policy is presented. Then, in Section 4, the performance of the proposed scheme is evaluated. Finally, in Section 5, conclusions are drawn.

#### **2. System Model**

In this section, a caching scenario for an *N* × *M* fog radio access network (F-RAN) system having *N* user equipment (UEs), *M* F-APs, and one centralized base-band unit (BBU) cloud is modeled, as shown in Figure 2. In the system model diagram, we have U = {1, 2, 3, . . . , *N*} as the set of *N* users requesting data from F = {1, 2, 3, . . . , *M*}, the set of *M* F-APs. In the diagram, the solid line connecting the BBU cloud to the access points represents the common public radio interface (CPRI) cable, whereas the dashed lines connecting the end-users to the access points denote the air-interface links.

**Figure 2.** DL-based content caching in the fog-access points (F-APs).

As per the system model diagram, the DL-based training is done on the centralized cloud (CC), considering both the proactive and the reactive caching case. The testing results provided by the DL-based model on the CC are used to send the most popular contents to the F-APs. Since the computational capabilities of the F-APs are far smaller than those of the cloud, only the top most popular contents are stored in them. The contents stored in the F-APs are based on location-based user preference. Based on the proactive and reactive caching scenario in an F-RAN system, the system delay is formulated in Section 2.1.

#### *2.1. Delay Formulation*

In this part, we are interested in formulating the overall system delay for the *N* × *M* F-RAN system at some time instance *t* + 1. We denote the air-interface link capacity connecting UE *i* (*i* ∈ U) to F-AP *j* (*j* ∈ F) as $C^{Ai}_{i,j}$, and the fronthaul link capacity connecting F-AP *j* to the CC as $C^{Fh}_{j,1}$. For simplicity, we consider that the cache memory of all the F-APs has the same capacity to store files, i.e., $\varnothing$ (GB). We also assume that the size of each file *p* cached at the fog nodes is the same. Let $S^{t+1}_{i,j}$ denote the size of the file requested by UE *i* from F-AP *j* at time instance *t* + 1. In our problem formulation, for direct transmission, we consider only the delay for transferring the data from the cloud to the F-APs. On the other hand, for cached transmission, the delay for offloading the cached contents from the F-APs to the UEs is neglected. Considering the above scenario, the overall delay for the *N* × *M* F-RAN system at any time instance *t* + 1 can be devised as:

$$\text{(P1)} \quad \delta\_{sys}^{t+1} = \min\left[\sum\_{i=1}^{N} \sum\_{j=1}^{M} \left(1 - x\_{i,j}^{t+1}\right) \frac{S\_{i,j}^{t+1}}{C\_{j,1}^{Fh}}\right] \tag{1}$$

$$\text{s.t.} \quad S\_{i,j}^{t+1} x\_{i,j}^{t+1} \le \varnothing \quad \forall i, j \tag{2}$$

where $\delta^{t+1}_{sys}$ is the overall delay of the F-RAN system at time *t* + 1, and $x^{t+1}_{i,j}$ is the decision control variable indicating whether the file requested by UE *i* from F-AP *j* at time *t* + 1 is available in the cache memory of the latched F-AP or not. The value of $x^{t+1}_{i,j}$ can only be either 1 or 0, and it can be represented as:

$$\mathbf{x}\_{i,j}^{t+1} = \begin{cases} 1, & \text{if the content requested by UE } i \text{ is available in F-AP j} \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

The constraint in the problem statement (P1) indicates that the total size of the files cached in the cache memory of an F-AP should be less than or equal to the overall cache memory size of that particular F-AP.
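A direct reading of Equations (1)–(3), assuming the per-request file sizes, cache indicators, and fronthaul capacities are known, could look as follows; the function and variable names are illustrative only.

```python
def system_delay(S, x, C_fh):
    # Equation (1): only requests that miss the cache of the latched F-AP (x[i][j] == 0)
    # are fetched from the BBU cloud over the fronthaul link of F-AP j.
    N, M = len(S), len(S[0])
    return sum((1 - x[i][j]) * S[i][j] / C_fh[j] for i in range(N) for j in range(M))

def respects_capacity(S, x, cache_capacity):
    # Constraint (2): a cached file must fit into the F-AP's cache memory.
    return all(S[i][j] * x[i][j] <= cache_capacity
               for i in range(len(S)) for j in range(len(S[0])))
```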

#### **3. DL-based Caching Policy**

In this section, DLCC is presented, so that the overall delay of the F-RAN system formulated in (P1) can be minimized in the best possible way. The general overview of the proposed caching policy is shown in Figure 3.

**Figure 3.** The overview of the DLCC policy to get the popular content in terms of popular class.

Figure 3 contains the series of tasks required for predicting the best cache contents for the F-APs. The initial step includes the extraction of the most popular 1D real-life datasets from the cloud. After the initial step, the downloaded data are pre-processed using various techniques to make them a suitable 2D dataset for the preferred supervised DL-based model. Then, the suitable 2D dataset is trained using the 2D CNN model. After that, the trained model is used to predict the contents on the basis of different categorical classes for time *t* + 1. Due to the memory constraint of the F-APs, only the contents of the top-most class are selected randomly to be stored in the F-APs for future user requirements. The steps mentioned above are coherently described in the subsections given below.

#### *3.1. Dataset*

MovieLens dataset is used for training the DL-based model, as it is a large dataset available in the open-source platform. Moreover, live streaming of the movies utilizes most of the fronthaul capacity. So, to lessen fronthaul load to some extent, proactive storing of the most popular movies in the cache memory of the F-APs is considered the most viable approach.

We downloaded the MovieLens dataset from [30]. It contains around 25 Million ratings and around 1 Million tag applications for 62,423 movies. Moreover, the dataset contains movies from 1 January 1995 to 21 November 2019, rated by 162,541 users. This dataset was generated on 21 November 2019. In this dataset, random users represented by a unique id had rated at least 20 movies. The data contained in this dataset represent genome-scores.csv, genome-tags.csv, links.csv, ratings.csv, and tags.csv.

#### *3.2. Data Pre-Processing*

Training on the whole MovieLens dataset requires a large computational effort and is not necessary, so only a portion of the dataset is taken for training and validation purposes. We used data from 1 January 2015 to 21 November 2019 from the dataset. The selected portion of the dataset contains only around 7.5 million ratings for 58,133 movies.

The initial dataset contains the categorical variable such as User-ID, Movie-ID, Rating, Date, Year, Month, Day, and Genres. Based on the dataset key features such as Year, Month, and Day, the daily requested movies are counted and are portrayed in the form of a yearly-based box plot, as shown in Figure 4.

**Figure 4.** Daily request count of movies on the basis of Movie-ID for each year.

As per Figure 4, we can see that on average, around 800 movies are requested daily. Likewise, the maximum movie request on a particular day is around 38,000, whereas the minimum movie request is 1.

If we go with the average daily request for the movie, it is still beyond the computational capacity of the F-AP to store 800 movies in its cache memory. To solve this, the dataset is further analyzed on the basis of the Movie-ID and daily movie request count, and it is depicted in Figure 5.

**Figure 5.** Probability density function (PDF) of the Movie-ID on the basis of movie request.

Figure 5 shows the probability density function (PDF) of the Movie-ID based on the movies requested. As per the figure, we can observe that the PDF is comparatively higher for the movie id number within 0–12,000. The PDF is smaller for the movies having id numbers greater than 12,000. Thus, for our analysis, movies with id numbers 0–12,000 are taken into account. The PDF of the selected movies is shown in Figure 6.

**Figure 6.** PDF of the selected Movie-ID.

After the selection of Movie-IDs from 0–12,000, the number of rows in the dataset is reduced to around 3.5 million from 7.5 million. This selected portion accounts for around 14% of the total dataset (25 million ratings). Then, the per-day count of each movie, as per the unique Movie-ID, is calculated to further reduce the number of rows to around 1.6 million. After that, the dataset is re-arranged as per Movie-ID with its corresponding attributes such as year, month, day, genre, and movie counter.

In the dataset, the genre feature contains string values. Since the genre feature contains different categorical string values, it is further processed by the One-Hot Encoding method to convert it into numerical form, as the DL model operates on numerical data. One-Hot encoding is a widely used technique to convert categorical string variables into a numerical form such that machine learning algorithms can perform better prediction [31].

When the One-Hot Encoding technique is applied to the genre column containing multiple categorical string values, the single genre column is transformed into 19 columns, as it contained 19 different categories. The increase in the number of columns adds computational complexity to training the model. Therefore, to reduce the computational complexity, principal component analysis (PCA) of the categorical genre variables is performed. PCA is a robust approach for reducing the dimension of datasets while preserving most of the useful information. It does so by creating new uncorrelated variables that successively maximize variance [32].

The PCA is applied to the 19 one-hot-encoded genre columns to reduce them to 3 columns, named PCA-1, PCA-2, and PCA-3, respectively. Figure 7 shows the individual and cumulative weightage of variance provided by the three principal components. In the figure, the first, the second, and the third principal components are indicated by the x-axis values 0, 1, and 2, respectively. We reduced the 19 genre columns to 3 columns because the cumulative variance of the three principal components accounted for 45.70% of the total variance. As per the figure, PCA-1, PCA-2, and PCA-3 contributed 19.66%, 15.59%, and 10.45% of the total variance, respectively. The portion of the variance contributed by PCA-1 is the largest, so it is referred to as principal component 1. Since the portions of the variance contributed by the succeeding columns are smaller than the preceding ones, they are indicated as principal component 2 and principal component 3, correspondingly.

**Figure 7.** Individual and cumulative variance of the principal components.
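A minimal pandas/scikit-learn sketch of this pre-processing step is shown below. The DataFrame `df`, its column names, and the pipe-separated genre format are assumptions about the pre-processed MovieLens slice, not the authors' exact code.

```python
import pandas as pd
from sklearn.decomposition import PCA

# df is assumed to hold the reduced MovieLens slice with a pipe-separated "genres" column.
genre_dummies = df["genres"].str.get_dummies(sep="|")        # one-hot: 19 genre columns
pca = PCA(n_components=3)
components = pca.fit_transform(genre_dummies)                # project onto 3 components
df["PCA-1"], df["PCA-2"], df["PCA-3"] = components.T
print(pca.explained_variance_ratio_)   # individual variance of the three components
```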

After the column reduction technique is applied to the dataset, the 1D dataset is converted to a 2D dataset for the selected 9019 Movie-IDs. There are around 1787 days from the start of 2015 to 21 November 2019. When the 9019 Movie-IDs are combined with the 1787 days, the resulting 2D dataset has 16,107,934 rows. This is roughly a tenfold increase in the size of the dataset compared to the reduced version of the 1D dataset.

The resulting 2D dataset is clustered based on per day's movie request count to add label to the dataset. The dataset is categorized into four categories, i.e., Class 0, Class 1, Class 2, and Class 3, by using the k-means clustering technique, as shown in Figure 8.

**Figure 8.** Vertical clustering of the dataset on the basis of per-day count of the Movie-ID.

In Figure 8, Class 0 is represented by the purple color. This class includes the Movie-IDs with the highest request counts on a particular day. Likewise, Class 1, Class 2, and Class 3 are represented by the blue, green, and yellow colors, with value ranges of 12–362, 4–11, and 1–3, respectively. In our 2D dataset, Class 0, Class 1, Class 2, and Class 3 contain 22,604, 212,759, 1,406,486, and 14,736,655 rows, respectively. The Class 0 movies are referred to as highly preferred movies, whereas Class 3 movies are regarded as the least requested ones. These categorized values are placed under the column named Class in the dataset.
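The labeling step can be sketched as follows; the column names and the re-mapping of scikit-learn's arbitrary cluster indices onto the popularity order Class 0–3 are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

counts = df["movie_counter"].to_numpy().reshape(-1, 1)        # per-day request counts
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(counts)
# KMeans numbers its clusters arbitrarily, so re-map them so that Class 0 is the
# cluster with the highest mean request count and Class 3 the lowest.
order = np.argsort([-counts[labels == c].mean() for c in range(4)])
df["Class"] = np.argsort(order)[labels]
```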

The resulting dataset contains Year, Month, Day, Movie-ID, PCA-1, PCA-2, and PCA-3 as the key features, and the Class as a key label. The correlation matrix in Figure 9 is shown to depict the usefulness of our dataset for the 2D CNN model.

**Figure 9.** Dataset correlation matrix.

#### *3.3. DLCC Model*

In this section, at first, we explain the problem statement and then discuss the architecture for predicting the future popularity of movies listed in the MovieLens dataset by using the time-series sequence of historical data.

#### 3.3.1. Problem Statement

The main objective of the DLCC model is to predict the future likelihood of the data contents being accessed by the connected UEs. In this study, the DLCC model is trained on the MovieLens dataset *d*, containing movie lists up to time *t*. Let X = {*X*1, *X*2, *X*3, . . . , *Xt*} be the chronological order of the time-variant historical movie lists. Its corresponding output label, which is the classification of the movie lists based on popularity, can be represented as Y = {*Y*1, *Y*2, *Y*3, . . . , *Yt*}. The *i*-th time input of X can be denoted as:

$$X\_i = \begin{bmatrix} x\_{11}^i & x\_{12}^i & \cdots & x\_{1f}^i \\ x\_{21}^i & x\_{22}^i & \cdots & x\_{2f}^i \\ \vdots & \vdots & \ddots & \vdots \\ x\_{n1}^i & x\_{n2}^i & \cdots & x\_{nf}^i \end{bmatrix} \in \mathbb{R}^{n \times f} \tag{4}$$

where $X_i$ contains a collection of *n* movie samples, each having *f* features. Similarly, its corresponding *i*-th time output label can be represented as:

$$Y\_{i} = \begin{bmatrix} y\_{1}^{i} \\ y\_{2}^{i} \\ \vdots \\ y\_{n}^{i} \end{bmatrix} \in \mathbb{R}^{n \times 1} \tag{5}$$

where $Y_i$ contains the popularity category of each movie sample. The primary objective of this study is to develop the prediction model $\mathcal{M}$, which uses the input features $X_i$ to predict the popularity class $Y_i$, and which can be defined as:

$$Y\_{i} = \mathcal{M}(X\_{i}, \theta) \tag{6}$$

where *θ* is the model parameter of DLCC model.

#### 3.3.2. Model Implementation

In this part, we use 2D CNN for feature extraction from the input 2D MovieLens dataset. Moreover, we use a regression-based approach to solve the classification problem. The main goal of the DLCC model is to categorize MovieLens dataset on the basis of popularity. The DLCC model used in this paper is shown in Figure 10.

In Figure 10, the architecture of the DLCC model is formed by stacking three convolutional layers, two max-pooling layers, one flatten layer, and one dense layer. Mathematically, the *f*-th feature map of the *l*-th convolutional layer, $y^l_f$, can be obtained by first convolving the 2D input (or the previous layer output) with the convolutional filter and then applying an element-wise non-linear activation, as shown in Equation (7).

$$\mathbf{y}\_f^l = \sigma \left( \sum\_{k=1}^{f\_{l-1}} y\_k^{l-1} \bigoplus \mathcal{W}\_{kf}^l + b\_f^l \right), \mathbf{f} \in [1, f\_l] \tag{7}$$

where $y^{l-1}_k$ is the *k*-th feature map of the (*l* − 1)-th layer, $W^l_{kf}$ is the kernel weight at position *k* connected to the *f*-th feature map of the *l*-th layer, $b^l_f$ is the bias of the *f*-th filter of the *l*-th layer, $f_l$ is the number of filters in the *l*-th layer, and $\sigma(.)$ represents the element-wise non-linear activation function. Equation (8) shows the output of the *l*-th convolutional layer followed by the pooling layer.

$$y\_f^l = \text{pool}\left(\sigma\left(\sum\_{k=1}^{f\_{l-1}} y\_k^{l-1} \bigoplus W\_{kf}^l + b\_f^l\right)\right), \; f \in [1, f\_l] \tag{8}$$

In the DLCC model, the features learned by the 2D CNN are concatenated into a dense vector by a flattening operation. The dense layer contains the high-level features extracted from the input. Let *L* be the layer preceding the flattening layer, having $f_L$ feature maps; then the output of the (*L* + 1)-th layer, $y^{L+1}$, is given as:

$$y^{L+1} = o\_{flatten}^{L} = flatten\left(\left[y\_1^L, y\_2^L, \dots, y\_{f\_L}^L\right]\right) \tag{9}$$

where $y^L_1, y^L_2, \dots, y^L_{f_L}$ are the feature maps of the *L*-th layer, and $o^L_{flatten}$ is the flattened vector of layer *L*. Finally, the flattened layer is transformed into the model output through a fully connected layer, with $W_d$ and $b_d$ being the weight and bias of the fully connected dense layer. The model output can be written as:

$$\mathcal{Y} = \mathcal{W}\_d \boldsymbol{\sigma}\_{flatten}^L + \boldsymbol{b}\_d = \mathcal{W}\_d \left( flatten \left( \boldsymbol{p}ool\left( \sigma \left( \sum\_{k=1}^{f\_{l-1}} \boldsymbol{y}\_k^{l-1} \bigoplus \boldsymbol{W}\_{kf}^l + \boldsymbol{b}\_f^l \right) \right) \right) \right) + \boldsymbol{b}\_d \tag{10}$$

In our model, the MSE loss function $\mathcal{L}(\theta)$ is used to optimize the target, and minimizing the MSE is taken as the training goal. Mathematically, the MSE can be written as:

$$\mathcal{L}(\theta) = \|y\_t - \hat{y}\_t\|\_2^2 \tag{11}$$

In Figure 10, at first 2D input of size 9019 × 7 (rows number × column numbers) is employed to the first convolutional 2D layer of the portrayed DLCC model. In the first convolutional layer, the input is scaled up by using 32 filters of size 2 × 2 with a stride of 1 × 1. The convoluted output of the first layer is then fed to the downsampling layer. In the downsampling layer, the max-pooling technique with a pool size of 2 × 2 is used along with the batch normalization (BN) and dropout techniques. In our DLCC model, BN is used to stabilize the learning process and reduce the number of epochs required to train the neural networks [33], whereas dropout is used to prevent the trained model from overfitting [34]. The convoluted downsampled data of size 4509 × 3 × 32 are employed in the second convolutional layer to reduce the filter number from 32 to 16 by using the same filter and stride size used in the first convolutional layer. After that, the convoluted outputs of the second convolutional layer are again employed in the downsampling layer to reduce the size of inputs to 2254 × 1 × 16. Again, the downsampled data of the second downsampling layer are fed to the third convolutional layer to reduce the filter size from 16 to 8. Since the necessary features were extracted after the implementation of the third convolutional layer, the features of size 2254 × 1 × 8 are flattened to employ it to the fully-connected neural network (FCNN) for the regression process. In each convolutional layer, a rectified linear unit (ReLU) activation function is used to increase the non-linearity in our input data, as well as to solve a vanishing gradient problem. In the regression process, the input of size 18,032 is fed to the FCNN layer to get the output of size 9019. The detailed structure of our DLCC model is shown in Table 1.
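A Keras sketch consistent with the layer sizes quoted above is given below. The padding mode, the dropout rate, and the exact placement of batch normalization and dropout are assumptions, since the text does not state them explicitly; with "same" padding the intermediate shapes (4509 × 3 × 32, 2254 × 1 × 16, flatten size 18,032) match those in the text.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_dlcc_model(n_movies=9019, n_features=7):
    # A minimal sketch of the DLCC_3_1 architecture described above:
    # three 2D convolutions (32/16/8 filters), two max-pooling layers, flatten, dense.
    inputs = layers.Input(shape=(n_movies, n_features, 1))
    x = layers.Conv2D(32, (2, 2), strides=(1, 1), padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)            # -> (4509, 3, 32)
    x = layers.BatchNormalization()(x)                      # stabilize training
    x = layers.Dropout(0.2)(x)                              # assumed dropout rate
    x = layers.Conv2D(16, (2, 2), strides=(1, 1), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)            # -> (2254, 1, 16)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.2)(x)
    x = layers.Conv2D(8, (2, 2), strides=(1, 1), padding="same", activation="relu")(x)
    x = layers.Flatten()(x)                                  # -> 2254 * 1 * 8 = 18,032
    outputs = layers.Dense(n_movies, activation="relu")(x)   # one regression output per movie
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")              # Adam + MSE, as in the text
    return model
```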


**Table 1.** Detailed structure of DLCC with three convolutional layers.

At the output of the FCNN, the ReLU activation function is used to obtain non-negative outputs. After the implementation of the ReLU activation function, the predicted outputs are rounded off to make a hard decision. The obtained hard decision is the classification of the MovieLens dataset based on popularity. We trained on 1461 samples of size 9019 × 7 to predict the output for time $t_{n+1}$.

**Algorithm 1:** Training process for DLCC model.

**Input:** Training dataset $d$, model $m$
**Output:** Trained model $m_k$
**Initialize:** $m_{terror}$, $m_{verror}$, $m_\alpha$, $m_b = 0$
**Find the best parameters:**
1. **for** $i$ in range(10) **do**
2. $\quad m_b \leftarrow 2^i$
3. $\quad$ **for** $j$ in range(1000) **do**
4. $\qquad m_\alpha \leftarrow$ rand(0, 1)
5. $\qquad$ Train the model $m$ with dataset $d$, minimizing $\mathcal{L}(\theta)$
6. $\qquad$ Store the training information of model $m$ for each training loop $(i, j)$ in the array $val$: $val_k \leftarrow \{k: m_b, m_\alpha, m_{terror}, m_{verror}\}$
7. $\quad$ **endfor**
8. **endfor**
9. Choose the entry with the best parameters at index $k = \arg\min(m_{terror}, m_{verror})$

**Train:** model $m$ with the parameters at index $k$ to produce the trained model $m_k$

Since our problem is a multi-variate regression problem, the mean square error method is used in the training process [35]. Moreover, the Adam optimizer is used to update the weight and learning rate values, as it is straightforward to implement, computationally efficient, and has low memory requirements [36]. It is very difficult to tune the hyper-parameters required to train the DL model. The detailed procedure to select hyper-parameters such as the batch size (*b*) and learning rate (*α*) to train the proposed model (*m*) is shown in Algorithm 1.

In Algorithm 1, the CNN model $m$ is trained for random values of the batch size $m_b$ and learning rate $m_\alpha$. For each value of $m_b$, 1000 random learning rates with values in (0, 1) are sampled to train on dataset $d$. The value of $m_b$ is selected as a power of two, with the exponent increasing from 0 to 9. For every combination of $m_b$ and $m_\alpha$, the model is trained on dataset $d$ to obtain the training error ($m_{terror}$) and validation error ($m_{verror}$), which are stored in the $val$ array. After the completion of the loop, the $k$-th index of the $val$ array providing the minimum value of $m_{terror}$ and $m_{verror}$ is selected to extract the hyper-parameter values stored at that particular index. Finally, the obtained hyper-parameter values are used to train the model $m$ and obtain the trained model $m_k$.
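In code, the search of Algorithm 1 might look roughly like this. The `train_fn` callback and the way the two errors are combined into a single argmin criterion (a plain sum here) are assumptions for illustration.

```python
import random

def search_hyperparameters(build_model, train_fn, dataset, trials_per_batch=1000):
    # Random search over batch size (powers of two, 2**0 .. 2**9) and learning rate in (0, 1).
    # train_fn is assumed to return (training_error, validation_error) for the given settings.
    results = []
    for i in range(10):
        batch_size = 2 ** i
        for _ in range(trials_per_batch):
            lr = random.random()
            t_err, v_err = train_fn(build_model(), dataset, batch_size, lr)
            results.append((t_err + v_err, batch_size, lr))
    _, best_batch, best_lr = min(results)      # entry with the smallest combined error
    return best_batch, best_lr
```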

Choosing the depth of the convolutional neural network plays a crucial role in determining the performance of the model, as each additional convolutional layer increases the number of feature maps and thus the learning capacity. However, beyond a certain limit, each additional convolutional layer tends to overfit the data. So, based on the DLCC model accuracy, we experimented with different depths of the model to find a better one. Table 2 justifies why we chose only three convolutional layers in our model.

**Table 2.** Comparison of the different model depths of DLCC.

| Model | Description | Filter Configuration | Validation Loss (MSE) | Computational Time (min) |
|---|---|---|---|---|
| DLCC\_1\_1 | 1 2D-CNN and 1 FCNN | 32 | N/A | N/A |
| DLCC\_2\_1 | 2 2D-CNN and 1 FCNN | 32\_16 | 0.2785 | 23.75 |
| DLCC\_3\_1 | 3 2D-CNN and 1 FCNN | 32\_16\_8 | 0.0452 | 13.35 |
| DLCC\_4\_1 | 4 2D-CNN and 1 FCNN | 32\_16\_8\_4 | 0.0596 | 7.98 |

As per Table 2, we can see that the DLCC model having a single convolutional layer could not provide any output because of the very large number of trainable parameters, i.e., around 39 billion. The MSE is minimum for the DLCC model having three convolutional layers in its architecture, with a filter configuration of 32\_16\_8, while the computational time is lower for the DLCC model having four convolutional layers. Since we need to select the model which provides the minimum MSE in a reasonable time, we selected the DLCC\_3\_1 model to solve the caching issue. The validation MSE provided by the DLCC\_3\_1 model, at the cost of 13.35 min, is 0.0452. The number of input parameters and hyper-parameters was the same for generating the results for all four configurations.

Furthermore, the number of filters in the convolution neural networks also plays an important role in determining the performance of the model. So, to find out the best filter configuration for our DLCC\_3\_1 model, we tried three different configurations, as shown in Table 3.

**Table 3.** Comparison of the different filter configuration of DLCC.


Table 3 shows that the DLCC\_3\_1 model having filter configuration 32\_16\_8 provides a better validation loss as compared to the other two different types of filter configuration. So, we selected the filter configuration of 32\_16\_8 for our DLCC\_3\_1 model.

#### *3.4. Cache Decision*

In this part, the caching decision process used to allocate the best cache contents to the F-APs is described. The initial step includes the training of the DLCC model on the cloud, based on Algorithm 1. Then, the trained model is used to predict the cache contents for time *t* + 1. After that, the list of contents categorized into popularity classes is transferred to the F-APs for the selection process. In priority order, the contents of Class 0 are stored in the available cache memory of the F-APs. If there is still some memory available, the contents of Class 1, followed by Class 2, are recommended to be stored in the available cache memory. The contents of Class 3 are not stored even if there is memory space available in the F-APs, as the contents of Class 3 are the least preferred ones. The detailed procedure is summarized in Algorithm 2.

**Algorithm 2:** Cache content decision process.

**Input:** Requested contents history

**Output:** Selected content list to be stored in the cache memory of F-APs


 7. Store all the contents of Class 1 in the remaining cache memory of F-APs 8. **else** 


15. Store the contents of Class 0 randomly until the cache memory of F-APs is full
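A minimal sketch of this priority-based fill, following the prose description above (store Class 0 first, then Class 1 and Class 2 if memory remains, never Class 3), is given below with illustrative names; it is not the authors' exact procedure.

```python
import random

def select_cache_contents(predicted_class, content_size, cache_capacity):
    # predicted_class: content id -> predicted popularity class (0 is most popular)
    # content_size   : content id -> size of the content
    selected, used = [], 0.0
    for cls in (0, 1, 2):                       # Class 3 is never cached
        candidates = [c for c, k in predicted_class.items() if k == cls]
        random.shuffle(candidates)              # contents within a class are picked randomly
        for c in candidates:
            if used + content_size[c] <= cache_capacity:
                selected.append(c)
                used += content_size[c]
    return selected
```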

#### **4. Performance Analysis**

In this section, the performance of the proposed CNN-based model is shown in terms of model key performance indicators (KPIs), such as MSE and prediction accuracy. Likewise, the performance of the cache content decision is quantified in terms of the cache hit ratio and system delay.

To train the DLCC model, we used the Keras library on top of the TensorFlow framework in Python 3.7 as the programming platform. The training process for our datasets is performed using a computation server (MiruWare, Seoul, Korea). The specification of the computational server includes one Intel Core i7 CPU, four Intel Xeon E7-1680 processors, and 128 GB of random access memory. The results are obtained using a computer with 16 GB of random access memory and an Intel Core i7-8700 processor.

#### *4.1. Model KPI*

In this part, the DLCC model KPIs, MSE (regression) and prediction accuracy (classification), are presented. The proposed 2D CNN-based model is trained on the daily data of the MovieLens dataset from January 2015 to December 2018. The trained model is validated on the data from January 2019 to October 2019 and tested on the data from November 2019. The simulation parameters used while training the model are summarized in Table 4.

Before the hard-decision rule is applied to the obtained results, the MSE obtained when testing the trained model on the November 2019 data of the MovieLens dataset is shown in Table 5. The obtained result is compared with the results reported in [29].

The hard-decision rule used to classify the prediction results of the CNN-based regression model is shown in Table 6.


**Table 4.** Simulation parameters for the training model.

**Table 5.** Comparison of results obtained from different DL methods.


**Table 6.** Mapping table for classifying the results of convolutional neural network (CNN)-based model.


After applying the mapping table shown in Table 6, the average prediction accuracy and prediction error for November 2019 are shown in Figure 11 for different numbers of training epochs.
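Since Table 6 itself is not reproduced here, the thresholds in the following sketch are purely hypothetical; it only illustrates the kind of hard-decision mapping used, from the regression output to the four popularity classes.

```python
import numpy as np

# Hypothetical hard-decision mapping from the CNN regression output (predicted
# popularity score) to the four popularity classes; the thresholds below are
# illustrative only, since Table 6 defines the actual mapping.
def hard_decision(scores, thresholds=(0.75, 0.5, 0.25)):
    """Return class 0 (most popular) ... class 3 (least popular) per content."""
    scores = np.asarray(scores)
    classes = np.full(scores.shape, 3)            # default: least preferred class
    classes[scores >= thresholds[2]] = 2
    classes[scores >= thresholds[1]] = 1
    classes[scores >= thresholds[0]] = 0
    return classes


print(hard_decision([0.9, 0.6, 0.3, 0.1]))        # -> [0 1 2 3]
```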

As per Figure 11, there is an exponential rise in the prediction accuracy of the model and an exponential decay in its prediction error up to the fifth training epoch. Beyond five training epochs, the learning of the model enters a saturation phase. The prediction accuracy of the model is 92.81% for five training epochs, but it is only around 1% for fewer than four training epochs. Moreover, the error curve shown in Figure 11 is plotted based on the formula: *Classification Error* (%) = 100 (%) − *Classification Accuracy* (%).

**Figure 11.** Accuracy of DLCC model to classify MovieLens dataset on the basis of popularity.

To highlight the prediction accuracy of each class, a multi-class confusion matrix is drawn for the prediction period 1 November 2019 to 21 November 2019. The confusion-matrix values over the whole testing period are averaged and shown in Table 7. The accuracy of the confusion matrix shown in Table 7 can be calculated using the following formula [37]:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}$$

where "TP", "TN", "FP", and "FN" corresponds to "true positive", "true negative", "false positive", and "false negative", respectively. Using the above equation, the value of the accuracy for the confusion matrix shown in Table 7 is calculated to be 92.81%. This accuracy is around 21–54% greater than the prediction accuracy reported in the papers [15,27–29] while solving a similar problem.


**Table 7.** Multi-class confusion matrix (average value) for the prediction of the popularity of cache contents for 1 November 2019–21 November 2019.

#### *4.2. System KPI*

In this section, the cache hit ratio and overall system delay are shown to portray the usefulness of the DLCC policy in the F-RAN system. The movies of the categories "0", "1" and "2" are proactively stored in the F-APs, whereas the movies under category "3" are not stored as they are the least preferred ones.

Mathematically, the cache hit ratio for any time instance can be calculated as:

$$\text{Cache hit ratio } (t) = \frac{\text{Total cache hits } (t)}{\text{Total cache hits } (t) \, + \text{ Total cache misses } (t)} \tag{13}$$

The system parameters used to calculate the cache hit ratio and the system delay are summarized in Table 8.



**Table 8.** Fog radio access network (F-RAN) system parameters.


+ Greater than.

As shown in Table 8, an F-RAN system consisting of 50 F-APs and 400 UEs is designed to request movie files from the pool of files. It is assumed that each user can request only one movie file from the F-APs at time *t* + 1. Likewise, the size of each movie file listed in the MovieLens dataset is taken to be 1 GB, giving a total of 9019 GB. Since there are 400 UEs in the system, the 400 movie requests at time *t* + 1 amount to a total demand of 400 GB. Based on these simulation parameters, the cache hit ratio is calculated for different values of total cache memory and is portrayed in Figure 12.
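A minimal sketch of how the cache hit ratio in Equation (13) can be evaluated under the system parameters above (400 single-file requests of 1 GB each at time *t* + 1) is given below; the request list and the set of cached contents are hypothetical.

```python
# Sketch of Equation (13) under the setup above: 400 UEs each request one 1 GB
# movie at time t+1; a request is a hit if the file is cached at an F-AP.
import random


def cache_hit_ratio(requested_ids, cached_ids):
    cached = set(cached_ids)
    hits = sum(1 for content_id in requested_ids if content_id in cached)
    misses = len(requested_ids) - hits
    return hits / (hits + misses)


# Hypothetical example: 400 requests drawn from 9019 contents, 600 files cached
# (i.e., 600 GB of total F-AP capacity at 1 GB per file).
random.seed(0)
requests = [random.randint(1, 9019) for _ in range(400)]
cached = range(1, 601)
print(f"Cache hit ratio: {cache_hit_ratio(requests, cached):.2f}")
```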

**Figure 12.** Total F-AP capacity vs. cache hit ratio.

Figure 12 shows the cache hit ratio of five different caching policies, namely ideal, DLCC, Zipf's probability distribution, randomized replacement (RR), and no cache, for varying cache memory. The cache hit ratio of DLCC is approximately 527% and 334% greater than that of RR and Zipf's probability distribution, respectively, for a total storage space of 600 GB. Moreover, the cache hit ratio obtained using the DLCC approach is compared with the transfer learning-based cooperative caching (LECC) strategy introduced in [26], as shown in Table 9.


**Table 9.** Comparison between DLCC and learning-based cooperative caching (LECC) approach on the basis of cache hit ratio.

As per Table 9, the cache hit ratio of the DLCC approach is 57% at an F-AP capacity of 0.66 (normalized by the total content size), whereas the cache hit ratio of the LECC-based approach is 55% at a normalized F-AP capacity of 0.8. Based on this comparison, the DLCC approach outperforms the LECC approach for proactive caching.

Figure 13 shows the overall delay in the F-RAN system for the proposed DNN-based proactive caching policy. The total system delay is calculated using Equations (1)–(3). It is assumed that each CPRI cable connecting the CC to an F-AP has an average downloading speed of 10 Gbps over a distance of more than 10 km. The delay added by DLCC in the F-RAN system is approximately 200% and 193% lower than that of the RR and Zipf's probability distribution methods, respectively, for 600 GB of storage capacity. As per the figure, a total average delay of 5.33 min is added to the system to download movies of cumulative size 400 GB in the no-cache scenario, whereas with the DLCC scheme a minimum total delay of 2.28 min can be achieved, provided that the total F-AP capacity is greater than 400 GB.
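As a sanity check on the no-cache figure, downloading the full 400 GB demand over a 10 Gbps fronthaul link works out to roughly 5.33 min, matching the value quoted above; the short calculation below assumes the entire demand traverses the CC-to-F-AP link and ignores the queuing and propagation terms of Equations (1)–(3).

```python
# Back-of-the-envelope check of the no-cache delay: the full 400 GB demand
# must be fetched from the cloud over the 10 Gbps CPRI fronthaul.
demand_gb = 400        # total requested data at time t+1 (GB)
link_gbps = 10         # assumed fronthaul downloading speed (Gbit/s)

delay_s = demand_gb * 8 / link_gbps        # GB -> Gbit, then divide by the rate
print(f"No-cache delay: {delay_s / 60:.2f} min")   # ~5.33 min
```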

**Figure 13.** Total F-AP capacity vs. total system delay.

#### **5. Conclusions**

In this paper, a 2D CNN-based DLCC approach was proposed to proactively store the most popular file contents in the cache memory of F-APs. For training the DLCC model, the publicly available MovieLens dataset containing historical movie feedback information was used, since movie files account for a major portion of the fronthaul load in the F-RAN system. Simulation results showed that the proposed model achieves an average testing accuracy of 92.81%, which is around 21–54% greater than the prediction accuracy reported in [15,27–29] for a similar problem. Likewise, when the trained model is deployed in the F-RAN system, it achieves a maximum cache hit ratio of 0.57 and an overall delay of 2.28 min. In comparison with the RR and Zipf's probability distribution methods, the cache hit ratio obtained using DLCC is approximately 527% and 334% greater, and the overall delay is approximately 200% and 193% lower, respectively. Moreover, the cache hit ratio reported in this paper is better than that obtained using the LECC strategy.

**Author Contributions:** Conceptualization, S.B. and H.K.; methodology, S.B.; software, S.B. and N.R.; validation, S.B., N.R., P.K., H.K. and Y.-S.H.; resources, H.K. and Y.H.; data curation, S.B. and N.R.; writing—original draft preparation, S.B.; writing—review and editing, S.B., N.R., P.K., H.K. and Y.-S.H.; visualization, S.B. and P.K.; supervision, H.K.; project administration, H.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** Publicly available datasets were analyzed in this study. This data can be found here: https://grouplens.org/datasets/movielens/.

**Acknowledgments:** This work was supported by Post-Doctoral Research Program of Incheon National University in 2017.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

