*5.2. Convergence Analysis*

This subsection examines the convergence performance of FL-DDPG and DDPG, shown in Figure 3. The Adam optimizer is adopted to train both the FL-DDPG and DDPG networks. FL-DDPG requires about 240,000 training episodes (roughly 3 h) to reach good convergence. From Figure 3, we observe that FL-DDPG converges better than DDPG: because FL-DDPG aggregates the parameters of the three MEC servers, it can escape local optima more easily. The DQN algorithm discretizes the resources and decides where each resource block should be allocated; the resource allocation constraint is therefore satisfied by construction (the allocated resources can never exceed the total resources), and the reward directly targets the minimization of energy consumption. The reward of the DQN algorithm lies in the range 0 < *r* < 1. Since DQN is a coarse-grained resource allocation scheme, DDPG converges better than DQN.

Figure 4 shows the training performance of the FL-DDPG algorithm under different aggregation intervals. The training performance is best when the aggregation interval is 30,000; intervals that are either smaller or larger degrade performance. With a smaller interval, the local networks do not have enough time to explore the environment before aggregation, whereas with a larger interval the overly long local exploration leads to over-fitting. Therefore, an aggregation interval of 30,000 is adopted to train the network parameters in this article. A minimal sketch of this periodic aggregation is given below.
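To make the aggregation step concrete, the following Python sketch shows one plausible reading of the FL-DDPG training loop: each of the three MEC servers performs local DDPG updates, and every 30,000 episodes the servers' network parameters are averaged (FedAvg-style) and broadcast back. The helper names (`federated_average`, `local_train_step`) and the unweighted-average aggregation rule are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of periodic FedAvg-style aggregation across MEC servers.
# Assumptions (not from the paper): each server holds its actor/critic
# parameters as NumPy arrays, aggregation is an unweighted average, and
# the averaged parameters are broadcast back to every server.

import numpy as np

NUM_SERVERS = 3            # three MEC servers, as in the paper
AGG_INTERVAL = 30_000      # aggregation interval found best in Figure 4
TOTAL_EPISODES = 240_000   # episodes needed for convergence (about 3 h)


def federated_average(param_sets):
    """Average corresponding parameter tensors across servers (FedAvg)."""
    return [np.mean(np.stack(tensors), axis=0)
            for tensors in zip(*param_sets)]


def local_train_step(params):
    """Placeholder for one local DDPG update (Adam step on actor/critic).

    The sketch only perturbs the parameters to stay runnable; a real
    implementation would run the DDPG actor-critic update instead.
    """
    return [p + 0.001 * np.random.randn(*p.shape) for p in params]


# Each server starts from the same randomly initialized parameters.
rng = np.random.default_rng(0)
init = [rng.standard_normal((4, 4)), rng.standard_normal((4,))]
server_params = [[p.copy() for p in init] for _ in range(NUM_SERVERS)]

for episode in range(1, TOTAL_EPISODES + 1):
    # Each MEC server trains its own DDPG networks on local experience.
    server_params = [local_train_step(p) for p in server_params]

    # Every AGG_INTERVAL episodes, average the parameters and broadcast
    # the global result back to all servers.
    if episode % AGG_INTERVAL == 0:
        global_params = federated_average(server_params)
        server_params = [[g.copy() for g in global_params]
                         for _ in range(NUM_SERVERS)]
```

The aggregation interval trades off the two failure modes described above: aggregating too often leaves each server too little time to explore its local environment, while aggregating too rarely lets the local networks over-fit before their parameters are shared.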


**Figure 3.** Convergence properties of different algorithms.

**Figure 4.** Performance evaluation of the aggregation interval.
