#### *5.3. Performance Comparison*

In this subsection, the performance of the different algorithms is evaluated in Figures 5–7. Figure 5 shows the reward as a function of the system bandwidth. As the system bandwidth increases, more IoT tasks can be offloaded to the MEC server; the energy consumption of the IoT devices therefore falls and the reward rises. We can observe that FL-DDPG achieves a higher reward than the other algorithms, which can be explained by examining each algorithm in turn. DDPG trains the decision-making scheme with a single network model and therefore easily falls into a local optimum. DQN discretizes the resources, which yields a coarse-grained resource allocation scheme; since DDPG allocates resources at a fine granularity, it outperforms DQN. GREEDY offloads the tasks of IoT devices with good network status to the MEC server, but it optimizes only the communication resources rather than jointly optimizing communication and computing resources. RANDOM offloads the tasks generated by the IoT devices to the MEC server at random. Further, the performance difference between DDPG and FL-DDPG is small when the system bandwidth is 5, 9, 10, 11 or 12 MHz: the reward of FL-DDPG is only 1.3%, 1.1% and 1% higher than that of DDPG at 5, 9 and 10 MHz, respectively. The reason is that when the system bandwidth is very small, most tasks generated by the IoT devices cannot be offloaded and must be processed on the devices themselves. Processing a task locally involves neither MEC computing resources nor communication resources, so the decision-making environment is simplified.
Moreover, the energy consumed by the many locally processed tasks drowns out the transmission energy caused by offloading. When the system bandwidth is large, most tasks generated by the IoT devices can be offloaded to the MEC server, and the same reasoning applies. Thus, when the resources sit at either extreme, the exploration environment of the reinforcement learning agent becomes relatively simple, and the gap between DDPG and FL-DDPG shrinks. In actual equipment deployment, these two extremes are generally avoided for reasons of cost and quality of service. By contrast, the performance gap between DDPG and FL-DDPG is large when the system bandwidth is 6 or 7 MHz: the reward of FL-DDPG is 12% and 10% higher than that of DDPG, respectively. A moderate system bandwidth makes the decision-making environment more complex, and the more complex the environment, the more likely DDPG is to fall into a local optimum. Because FL-DDPG aggregates the training parameters of three network models, it escapes local optima more easily.
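The aggregation step described above can be sketched as a simple federated averaging of model parameters. This is only an illustrative sketch: the function name `fedavg` and the flattened parameter vectors are assumptions, not taken from the paper's implementation.

```python
def fedavg(models):
    """Element-wise average of equally weighted, flattened parameter vectors
    from several locally trained models (FedAvg-style aggregation)."""
    n = len(models)
    # zip(*models) groups the i-th parameter across all models
    return [sum(p) / n for p in zip(*models)]

# Three local DDPG models' parameters (toy values, flattened).
local = [
    [1.0, 0.0, 4.0],  # model 1
    [2.0, 3.0, 4.0],  # model 2
    [3.0, 6.0, 4.0],  # model 3
]
global_params = fedavg(local)
print(global_params)  # [2.0, 3.0, 4.0]
```

The averaged parameters are then broadcast back to the local models for the next training round, which is how the aggregation can pull an individual model out of a local optimum.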

**Figure 5.** Performance evaluation on reward.

**Figure 6.** Performance evaluation on energy consumption.

**Figure 7.** Performance evaluation on reward when the delay threshold is different.

Figure 6 shows the mean energy consumption of the different algorithms as a function of the system bandwidth. From Figure 6, it is observed that the mean energy consumption of FL-DDPG is lower than that of the other algorithms. Since the reward is defined with a negative-exponential dependence on energy consumption, Figures 5 and 6 mirror each other point for point.
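Assuming the negative-exponential relationship mentioned above, the mapping from energy consumption to reward might look like the following sketch; the scaling constant `alpha` and the exact functional form are assumptions, since the paper's reward definition is not reproduced here.

```python
import math

def reward(energy, alpha=1.0):
    # Illustrative only: reward decays exponentially as total energy
    # consumption grows, so lower energy maps to a higher reward.
    return math.exp(-alpha * energy)

print(round(reward(0.5), 3))  # 0.607
print(round(reward(2.0), 3))  # 0.135
```

Because the mapping is strictly monotone decreasing, the energy-consumption ranking in Figure 6 determines the reward ranking in Figure 5 exactly.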

Figure 7 shows the reward of the different algorithms as a function of the delay threshold. In this article, since the tasks generated by the IoT devices are not very strict about response time, the delay threshold is set to the same value for all tasks. From Figure 7, it is observed that the reward increases with the delay threshold. The reason is that a larger delay threshold allows more tasks to be offloaded to the MEC server and completed in time; the energy consumption of the IoT devices is therefore reduced and the reward increases. Figure 8 shows the delay of the different algorithms under the same environment configuration; the delay of all five algorithms stays below the delay threshold (0.1 s).

#### *5.4. Analysis of Offload Location*

Figures 9 and 10 show the offloading locations chosen by FL-DDPG when the system bandwidth is 5 MHz and 10 MHz, respectively. In this experiment, the X-axis denotes the number of episodes, the Y-axis denotes the IoT device index, and the Z-axis denotes the offloading location, which takes values in {0, 1}: value 0 indicates that the task is processed on the IoT device, and value 1 indicates that the task is offloaded to the MEC server. From Figures 9 and 10, it is observed that there are fewer red points when the system bandwidth is 10 MHz, indicating that more tasks are offloaded to the MEC server as the system bandwidth increases. From Figure 9, we can also observe that none of the tasks of IoT device 6 are offloaded to the MEC server when the system bandwidth is 5 MHz. The reason is that the tasks of IoT device 6 carry a large amount of data but a low computing workload; offloading them would consume a great deal of bandwidth while using only a small share of the MEC computing resources. With limited resources, offloading such tasks is clearly unreasonable, so all tasks of IoT device 6 are processed on the device itself.
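As a rough illustration of why a large-data, low-workload task such as those of device 6 stays local, one can compare the offloading delay (dominated by upload time) against the on-device computing delay. All rates, CPU speeds, and cycle counts below are hypothetical values chosen for the sketch, not the paper's system parameters.

```python
def offload_is_worthwhile(data_bits, cycles, bandwidth_hz,
                          spectral_eff=1.0, local_cps=1e8, mec_cps=1e10):
    """Toy rule: offload only if uploading plus MEC computing is faster
    than computing the task entirely on the IoT device."""
    tx_delay = data_bits / (bandwidth_hz * spectral_eff)  # upload time
    offload_delay = tx_delay + cycles / mec_cps           # upload + MEC compute
    local_delay = cycles / local_cps                      # on-device compute
    return offload_delay < local_delay

# Large data, low workload (the device-6 pattern): stays local.
print(offload_is_worthwhile(data_bits=8e6, cycles=1e6, bandwidth_hz=5e6))  # False
# Small data, heavy workload: offloading pays off.
print(offload_is_worthwhile(data_bits=1e5, cycles=1e9, bandwidth_hz=5e6))  # True
```

The actual FL-DDPG policy learns this trade-off jointly with the resource allocation rather than applying a fixed rule, but the delay comparison captures why offloading a bandwidth-hungry, compute-light task is unattractive when bandwidth is scarce.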

**Figure 8.** Delay of different algorithms.

**Figure 9.** System bandwidth *B* = 5 MHz.

**Figure 10.** System bandwidth *B* = 10 MHz.
