Sensors · Article · Open Access

16 January 2021
FedPSO: Federated Learning Using Particle Swarm Optimization to Reduce Communication Costs

1 Department of Security Convergence Science, Chung-Ang University, Seoul 06974, Korea
2 Department of Industrial Security, Chung-Ang University, Seoul 06974, Korea
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue AI-Based Communications

Abstract

Federated learning is a learning method that preserves data privacy by collecting only trained models, rather than raw data, on a server; training proceeds directly on the distributed clients. Because federated learning clients often have limited communication bandwidth and typically communicate over unstable networks such as Wi-Fi, communication between the server and clients must be optimized to improve performance. However, existing federated learning aggregation algorithms transmit and receive large numbers of weights, so their accuracy drops significantly in unstable network environments. In this study, we replace FedAvg, the algorithm most commonly used in federated learning to update the global model from the weights of client-trained models, with an algorithm based on particle swarm optimization. The algorithm, named federated particle swarm optimization (FedPSO), changes the form of the data that clients transmit to servers, sending small score values rather than large weights, which makes it more robust in unstable network environments. Our experiments showed that FedPSO significantly reduced the amount of data used in network communication and improved the accuracy of the global model by an average of 9.47%; it also reduced the accuracy loss on an unstable network by approximately 4% compared with existing algorithms.

1. Introduction

Recently, the use of mobile devices such as smartphones and tablets has been increasing. Various forms of data are being generated and accumulated on mobile devices, including data generated by users and sensors such as cameras, microphones, and the global positioning system. The accumulated data on mobile devices are beneficial for deep learning, which demonstrates good performance when there is a significant amount of data.
Data from mobile devices can be used for machine learning (ML) in various ways [1]. Google’s Gboard, for example, uses ML to learn words that users frequently type and recommends the next words to be typed [2]. However, there are four points to consider in using mobile device data for ML.
  • Mass data collection cost: network communication and storage costs for collecting and managing large amounts of original data on the server are high.
  • Unstable networks: mobile devices cannot use wired networks and are mostly connected via Wi-Fi, making it difficult to establish a stable network environment.
  • Low computation capability: the processors in mobile devices do not have sufficient computing capabilities for ML.
  • Security threat: collecting or storing private data increases the likelihood of data breaches.
Therefore, to implement a successful ML model using mobile device data, especially an artificial neural network (ANN: a statistical learning algorithm inspired by biological neural networks; examples include supervised and unsupervised learning), it is necessary to reduce the size of the collected data, strengthen the security of the collected data, improve robustness to unstable network environments, and reduce the number of training parameters (weights: the parameters that determine how much influence input data have on output data in the hidden layers of an ANN, updated through back-propagation as training progresses). Research on federated learning has been progressing steadily to solve these problems and make the vast amount of data on mobile devices usable [3,4]. Federated learning is an ML approach for training on distributed data. It ensures privacy by not sending data from personal devices to a central server, and it reduces communication costs by transmitting only the learned models instead of large amounts of source data.
In conventional ANN training, computation accounts for far more time than communication, so various techniques, such as graphics processing unit (GPU) accelerators and multi-GPU configurations, are used to reduce computation time. In federated learning, however, communication takes more time than computation, so network communication time must be reduced to improve efficiency. Because of unstable network environments, federated learning typically requires conditions such as a Wi-Fi connection and a connected charger [4]. Therefore, to reduce the communication cost of federated learning, it is necessary to improve network transmission speed and to cope with unstable network environments.
Most models using federated learning update the global model with federated averaging (FedAvg) [5]. The current study aims to increase the model update speed by applying particle swarm optimization (PSO), an algorithm that finds an optimal solution in a distributed environment [6,7]. PSO requires many repetitions because it approaches the optimal solution stochastically, which matches the highly iterative nature of ML training, and it is well suited to dynamic and heterogeneous environments such as federated learning. Thus, we propose a new ANN model that applies PSO to federated learning.
  • To the best of our knowledge, this is the first paper focused on reducing network communication costs by applying PSO to the communication process of federated learning. We propose a new model, federated PSO (FedPSO), in which clients report scores such as accuracy or loss, rather than weights, for global model updates.
  • We evaluated FedPSO for network communication cost and accuracy. In our experiments, the network communication cost of FedPSO was lower than that of existing models, and FedPSO improved accuracy by an average of 9.47%.
  • We evaluated FedPSO in an unstable network environment, where it reduced the accuracy loss by approximately 4% compared with existing algorithms.
The rest of the paper is structured as follows. Section 2 reviews previous studies that use federated learning and PSO. Section 3 describes the process of transmitting the model learned from the client to the server through the proposed algorithm. The evaluation of our proposed technique is presented in Section 4, before finally concluding the paper in Section 5.

3. FedPSO: Federated Particle Swarm Optimization

A general approach to improving the accuracy of ANN models is to deepen the layers of the model; the result is called a deep neural network. As the layers become deeper, the number of weight parameters that require training increases. In conventional federated learning (as shown in Figure 2), when the model trained on a client is sent to the server, the network communication cost therefore grows considerably. We thus propose the FedPSO algorithm, which exploits PSO characteristics so that each client sends only its best score (such as accuracy or loss) to the server, regardless of the size of the trained model.
Figure 2. The weight aggregation process of federated learning (such as FedAvg): the server averages the w_t values received from the K clients and sends the updated w_{t+1} back to the clients.
Before explaining the proposed FedPSO, we analyze the algorithm used in previous work on federated learning (FedAvg [5]). The process of Algorithm 1 is as follows. The clients participating in a round are selected in Line 4. The weight values learned by the clients are received in Lines 5 and 6. When weight collection is complete, the average of the collected weights is computed in Line 7, yielding the global weights. The clients then receive the global weights from the server and train on their local data (Lines 8–10).
Algorithm 1 FederatedAveraging (FedAvg) algorithm (simplified from [5]); K = number of clients; E = client epochs; clients are selected by the ratio C.

1:  function ServerExecutes
2:      initialize w_0
3:      for each round t = 1, 2, ... do
4:          S_t ← (random set of max(C · K, 1) clients)
5:          for each client k ∈ S_t in parallel do
6:              w_{t+1}^k ← ClientUpdate(k, w_t)
7:          w_{t+1} ← (average of the collected weights w_{t+1}^k of the S_t clients)
8:  function ClientUpdate(k, w)
9:      perform the learning process on client k with weight w until the client reaches E epochs
10:     w ← updated weight after learning
11:     return w to server
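To make Line 7 concrete, the following is a minimal NumPy sketch of the server-side averaging step; the function name and the Keras-style list-of-arrays weight representation are our assumptions, not the reference implementation.

    import numpy as np

    def fedavg_aggregate(client_weights):
        """Line 7 of Algorithm 1: element-wise average of the weights
        collected from the selected clients.

        client_weights: list over clients, where each element is a list
        of per-layer NumPy arrays (as returned by model.get_weights()).
        """
        num_clients = len(client_weights)
        # zip(*...) groups layer l across all clients; sum() is element-wise.
        return [sum(layer) / num_clients for layer in zip(*client_weights)]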
In the proposed FedPSO model, the server receives the model weights only from the client that achieved the best score, so weights do not need to be transmitted by every client. The process is shown in Figure 3. The best score is the lowest loss value obtained after training on a client; this loss value is only 4 bytes. FedPSO identifies the best model through the p_best and g_best variables and updates each element of the best model's weight arrays using the velocity value V.
Figure 3. The weight update process of FedPSO: the server receives each client's score and requests the learning model from the client that submitted the best value, setting it as the global model.
As the ANN weight values were updated in Equation (1), the weight update for FedPSO can be written as follows:

    V_l^t = α · V_l^{t−1} + c_1 · rand_1 · (p_best − V_l^{t−1}) + c_2 · rand_2 · (g_best − V_l^{t−1})
    w_l^t = w_l^{t−1} + V_l^t        (2)

In Equation (2), V has a value for each layer of the weights w in the ANN. The current-step weight w^t is obtained by adding V to the previous-step weight w^{t−1}. As in Equation (1), α is a constant representing the inertia weight, c_1 is the acceleration constant for p_best, and c_2 is the acceleration constant for g_best; rand_1 and rand_2 are random values between 0 and 1.
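To illustrate Equation (2), here is a minimal NumPy sketch of the per-layer update; the weights are assumed to be Keras-style lists of arrays, and the constant values shown are placeholders rather than the values given in Table 2.

    import numpy as np

    def fedpso_weight_update(V, w_prev, w_pbest, w_gbest,
                             alpha=0.5, c1=1.0, c2=1.0):  # placeholder constants
        """One FedPSO update (Equation (2)), applied layer by layer.

        V, w_prev, w_pbest, w_gbest: lists of per-layer NumPy arrays.
        Returns the new velocity V_t and the new weights w_t.
        """
        V_new, w_new = [], []
        for v, w, pb, gb in zip(V, w_prev, w_pbest, w_gbest):
            r1, r2 = np.random.rand(), np.random.rand()  # rand_1, rand_2 in [0, 1)
            v = alpha * v + c1 * r1 * (pb - v) + c2 * r2 * (gb - v)
            V_new.append(v)
            w_new.append(w + v)  # w_t = w_{t-1} + V_t
        return V_new, w_new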
Based on the weight update equation (Equation (2)), we present the conceptual algorithm of FedPSO in Algorithm 2, which extends Algorithm 1 by applying PSO. Unlike the conventional algorithm, the function ServerExecutes receives only p_best values, without receiving w from the clients, in Line 5. The client with the minimum p_best value among those collected is identified in Lines 6–8. The function ClientUpdate trains the ANN with PSO applied. Lines 13–14 compute the velocity V for each weight layer from its value in the previous step, the client's stored best weights w_pbest, and the global best weights w_gbest received from the server. V is then added to the w of the previous round to obtain the w for the current round in Line 15. Training is then repeated for E client epochs in Lines 16–18. The function GetBestModel requests the model from the client with the best score (Lines 20–23).
Algorithm 2 FedPSO

1:  function ServerExecutes
2:      initialize w_0, p_best, g_best, gid
3:      for each round t = 1, 2, ... do
4:          for each client k in parallel do
5:              p_best ← ClientUpdate(k, w_t^gid)
6:              if g_best > p_best then
7:                  g_best ← p_best
8:                  gid ← k
9:          w_{t+1} ← GetBestModel(gid)
10: function ClientUpdate(k, w_t^gid)
11:     initialize V, w, w_pbest, α, c_1, c_2
12:     β ← (split data ρ_k into batches of size B)
13:     for each weight layer l = 1, 2, ... do
14:         V_l ← α · V_l + c_1 · rand · (w_pbest − V_l) + c_2 · rand · (w_t^gbest − V_l)
15:     w ← w + V
16:     for each client epoch i from 1 to E do
17:         for batch b ∈ β do
18:             w ← w − η∇ℓ(w; b)
19:     return p_best to server
20: function GetBestModel(gid)
21:     request the model from Client(gid)
22:     receive w from Client(gid)
23:     return w to server
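The exchange in Algorithm 2 can be sketched from the server's point of view as follows. The in-process client object with update and get_weights methods is a simplifying assumption of this sketch; in a real deployment these calls would be network messages, with only the 4-byte score crossing the network in the inner loop.

    def fedpso_round(clients, global_weights):
        """One FedPSO round: collect scores from all clients (Lines 4-8),
        then fetch the full model from the single best client (Line 9).

        Each client's update(weights) trains locally and returns its best
        (lowest) loss; get_weights() returns its trained per-layer weights.
        """
        gbest, gid = float("inf"), None
        for k, client in enumerate(clients):
            pbest = client.update(global_weights)  # only the score is sent
            if pbest < gbest:
                gbest, gid = pbest, k
        return clients[gid].get_weights()          # GetBestModel(gid)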

4. Experiments

To evaluate the effectiveness of FedPSO, we conducted experiments on accuracy and convergence speed, as well as experiments in an unstable network environment. In the first set, we examined whether the model reaches sufficient accuracy and convergence speed given that it uses less network communication than FedAvg. We used the Canadian Institute for Advanced Research (CIFAR-10) and Modified National Institute of Standards and Technology (MNIST) datasets as accuracy benchmarks for the two algorithms and reviewed the cost of data communication between clients and the server. In the second set, we investigated the accuracy of FedPSO and FedAvg under various network conditions.

4.1. Experimental Setup

We conducted the experiments on a server (desktop computer) with an AMD Ryzen 3950X CPU, two NVIDIA GeForce RTX 2070 Super GPUs with 8 GB of memory each, and 64 GB of main memory. Our experimental code was written using TensorFlow 2.3.0 and Keras 2.4.3 and is available on GitHub (https://github.com/tjdghks994/FedPSO).
This study was proposed to improve the network communication performance of federated learning. Thus, we updated the weights of the distributed model using PSO and changed the form of the data sent by clients to the server. A more complex CNN model would have produced higher accuracy but was not used because of its complexity. Instead, we conducted experiments using a two-layer CNN model (the first layer with 32 channels, the second with 64, each followed by 2 × 2 max pooling), the same as FedAvg [5]. The layers of this model are shown in Table 1.
Table 1. Parameter settings for the CNN.
The experiments were conducted using the CIFAR-10 and MNIST datasets. CIFAR-10 is a dataset frequently used for image classification. It consists of 32 × 32-pixel images from 10 classes, such as airplane, automobile, and cat, and has 50,000 training images and 10,000 test images. MNIST is another computer vision dataset used for image classification and verification. It consists of handwritten 28 × 28-pixel images of digits and has 60,000 training images and 10,000 test images. Both datasets were shuffled, assigned to particle numbers, and distributed to each particle (client) for training, as sketched below.
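As an illustration of this partitioning, here is a minimal sketch assuming the Keras dataset loader and an equal split across clients; the exact split used in the reference code may differ.

    import numpy as np
    from tensorflow.keras.datasets import cifar10

    def split_dataset(num_clients):
        """Shuffle CIFAR-10 and deal it out evenly to the clients (particles)."""
        (x_train, y_train), _ = cifar10.load_data()
        indices = np.random.permutation(len(x_train))   # shuffle once
        shards = np.array_split(indices, num_clients)   # one shard per client
        return [(x_train[s], y_train[s]) for s in shards]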
No separate tuning was used to improve accuracy during training, apart from a dropout layer. Both FedPSO and FedAvg used stochastic gradient descent (SGD) for client training, with a learning rate of 0.0025. The hyperparameter values used in this paper are shown in Table 2, and a sketch of the model follows it.
Table 2. Constants of the proposed model.
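For reference, a minimal Keras sketch of the two-layer CNN and optimizer described above follows; the 32/64-channel convolutions, 2 × 2 max pooling, dropout layer, and SGD learning rate of 0.0025 come from the text, while the kernel size and dense-layer width are illustrative assumptions standing in for Table 1.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_cnn(input_shape=(32, 32, 3), num_classes=10):
        """Two-layer CNN: 32- and 64-channel convolutions, each followed
        by 2x2 max pooling. Kernel size and dense width are assumptions."""
        model = models.Sequential([
            layers.Conv2D(32, (5, 5), activation="relu", padding="same",
                          input_shape=input_shape),
            layers.MaxPooling2D((2, 2)),
            layers.Conv2D(64, (5, 5), activation="relu", padding="same"),
            layers.MaxPooling2D((2, 2)),
            layers.Flatten(),
            layers.Dense(512, activation="relu"),
            layers.Dropout(0.5),  # the only tuning used (see above)
            layers.Dense(num_classes, activation="softmax"),
        ])
        model.compile(
            optimizer=tf.keras.optimizers.SGD(learning_rate=0.0025),
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
        return model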

4.2. Experimental Results for Accuracy

The accuracy results with the CIFAR-10 dataset are presented in Figure 4 and Table 3. All graphs are based on test accuracy. FedPSO reached a higher accuracy (70.12%) than FedAvg in all cases at 30 epochs, and it was more accurate from an early epoch. The highest accuracy of FedAvg was 67.14%, at C = 1.0. C is a constant between 0 and 1 that restricts the number of clients used for training in FedAvg; in each communication round, a fraction C of all clients is selected. The higher the value of C in Figure 4 and Figure 5, the higher the accuracy, but the amount of data transmitted between the server and clients increases accordingly. At C = 0.5, where the data transfer cost is similar to FedPSO's, the accuracy gap is larger (65.00% for FedAvg, versus 70.12% for FedPSO).
Figure 4. Accuracy comparison of several algorithms.
Table 3. Test accuracy.
Figure 5. Communication cost comparison of several algorithms.
The accuracy results with the MNIST dataset are presented in Figure 6 and Table 4. MNIST yields good performance even with a model that has a small number of layers. Therefore, for a dataset such as MNIST that does not require a large model, there was no significant difference between FedPSO and FedAvg: as shown in Figure 6 and Table 4, the difference in accuracy between the two algorithms is negligible, approximately 0.1%. However, FedPSO converges in fewer epochs.
Figure 6. Comparison of learning accuracy using MNIST.
Table 4. Test Accuracy.

4.3. Experimental Results for an Unstable Network Environment

Next, we emulated an unstable network environment: data transmitted from clients to the server were dropped randomly in each communication round. To compare the accuracy of the two algorithms in this environment, data were dropped at rates of up to 0%, 10%, 20%, and 50%. For validity, every experiment was run 10 times and the results were averaged; one way to emulate such drops is sketched after Table 5. Figure 7 shows the result of randomly dropping data for FedAvg at C = 1.0: FedAvg shows an average decline in accuracy of 6.43% caused by the random drops. Figure 8 shows the results for FedPSO, which experienced an average accuracy decrease of only 2.43%. Detailed results are given in Table 5. In this test of an unstable network, in which data cannot always be transmitted completely, FedPSO's accuracy degradation was approximately 4 percentage points smaller than FedAvg's.
Figure 7. Comparison of FedAvg (C = 1.0) test accuracy in unstable network conditions.
Figure 8. Comparison of FedPSO test accuracy in unstable network conditions.
Table 5. Difference in accuracy according to the probability of communication failure.
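The sketch below shows one way to emulate these drops, reusing the hypothetical client interface from the earlier round sketch; each client's score transmission is discarded independently with probability drop_rate, which is our reading of the setup.

    import random

    def fedpso_round_lossy(clients, global_weights, drop_rate=0.1):
        """FedPSO round on an unstable network: a client's score is lost
        with probability drop_rate, so that client cannot win the round."""
        gbest, gid = float("inf"), None
        for k, client in enumerate(clients):
            pbest = client.update(global_weights)
            if random.random() < drop_rate:
                continue                     # score never reaches the server
            if pbest < gbest:
                gbest, gid = pbest, k
        if gid is None:                      # every transmission was dropped
            return global_weights
        return clients[gid].get_weights()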

5. Conclusions

This study proposed FedPSO, a particle swarm optimization-based algorithm that improves the network communication performance of federated learning by reducing the size of the data sent from clients to the server. In the proposed algorithm, clients share only score values, and the client with the best score provides its trained model to the server for aggregation. The algorithm was trained on the CIFAR-10 dataset with a two-layer CNN. On average, it improved accuracy by 9.47% over FedAvg, and by 5.12% when communication costs were similar. When the same number of clients was used for training, accuracy still improved by 2.98% even though the network communication cost was greatly reduced, to about 55% of FedAvg's. The results show that FedPSO can perform federated learning even when network communication is unstable and it is difficult to send large amounts of data to the server. In addition, when communication data are randomly dropped, FedPSO is on average 4% more robust than FedAvg. However, for a model that does not require deep layers, such as one for MNIST, there was no significant difference between the two algorithms.
In the future, we plan to apply diverse forms of PSO to further improve network communication performance. For example, we will study how to reduce the probability of falling into local minima by using dynamic multi-swarm PSO, and how to allow client P2P communication using P2P-PSO. For further communication efficiency with frequent client drops and limited network bandwidth, we plan to apply network protocols such as the gossip protocol [14]. Moreover, as described above, the size of the model grows proportionally as ANN layers are added; we therefore plan to experimentally verify how the results vary with layer size using models with deeper layers.

Author Contributions

Conceptualization, S.P.; Project administration, S.P.; Software, S.P.; Supervision, J.L.; Validation, J.L.; Visualization, S.P.; Writing–original draft, S.P. and Y.S.; Writing–review & editing, S.P., Y.S. and J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Acknowledgments

This research was supported in part by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2020-2018-0-01799) supervised by the IITP (Institute for Information & communications Technology Planning & Evaluation). This work was also supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2018R1C1B5083050).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, C.; Patras, P.; Haddadi, H. Deep Learning in Mobile and Wireless Networking: A Survey. IEEE Commun. Surv. Tutor. 2019, 21, 2224–2287. [Google Scholar] [CrossRef]
  2. Hard, A.; Rao, K.; Mathews, R.; Ramaswamy, S.; Beaufays, F.; Augenstein, S.; Eichner, H.; Kiddon, C.; Ramage, D. Federated Learning for Mobile Keyboard Prediction. arXiv 2019, arXiv:1811.03604. [Google Scholar]
  3. Konečný, J.; McMahan, H.B.; Ramage, D. Federated Optimization: Distributed Optimization Beyond the Datacenter. arXiv 2015, arXiv:1511.03575. [Google Scholar]
  4. Konečný, J.; McMahan, H.B.; Yu, F.X.; Richtarik, P.; Suresh, A.T.; Bacon, D. Federated Learning: Strategies for Improving Communication Efficiency. In Proceedings of the NIPS Workshop on Private Multi-Party Machine Learning, Barcelona, Spain, 9 December 2017. [Google Scholar]
  5. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Artificial Intelligence and Statistics; Singh, A., Zhu, J., Eds.; PMLR: Fort Lauderdale, FL, USA, 2017; Volume 54, pp. 1273–1282. [Google Scholar]
  6. Eberhart, R.; Kennedy, J. A new optimizer using particle swarm theory. In Proceedings of the MHS’95—Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, 4–6 October 1995; pp. 39–43. [Google Scholar] [CrossRef]
  7. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95—International Conference on Neural Networks, Perth, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948. [Google Scholar] [CrossRef]
  8. Chen, Y.; Sun, X.; Jin, Y. Communication-Efficient Federated Deep Learning With Layerwise Asynchronous Model Update and Temporally Weighted Aggregation. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 4229–4238. [Google Scholar] [CrossRef] [PubMed]
  9. Zhu, L.; Liu, Z.; Han, S. Deep Leakage from Gradients. In Advances in Neural Information Processing Systems; Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32, pp. 14774–14784. [Google Scholar]
  10. Zhao, S.Z.; Liang, J.J.; Suganthan, P.N.; Tasgetiren, M.F. Dynamic multi-swarm particle swarm optimizer with local search for Large Scale Global Optimization. In Proceedings of the 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–6 June 2008; pp. 3845–3852. [Google Scholar] [CrossRef]
  11. Zhao, S.; Suganthan, P.N.; Das, S. Dynamic multi-swarm particle swarm optimizer with sub-regional harmony search. In Proceedings of the IEEE Congress on Evolutionary Computation, Barcelona, Spain, 18–23 July 2010; pp. 1–8. [Google Scholar] [CrossRef]
  12. Xu, X.; Tang, Y.; Li, J.; Hua, C.; Guan, X. Dynamic multi-swarm particle swarm optimizer with cooperative learning strategy. Appl. Soft Comput. 2015, 29, 169–183. [Google Scholar] [CrossRef]
  13. Sun, S.; Abraham, A.; Zhang, G.; Liu, H. A Particle Swarm Optimization Algorithm for Neighbor Selection in Peer-to-Peer Networks. In Proceedings of the 6th International Conference on Computer Information Systems and Industrial Management Applications (CISIM’07), Elk, Poland, 28–30 June 2007; pp. 166–172. [Google Scholar] [CrossRef]
  14. Biazzini, M. A Flexible P2P Gossip-based PSO Algorithm. In Proceedings of the ICN 2014, The Thirteenth International Conference on Networks, Nice, France, 23–27 February 2014; pp. 81–85. [Google Scholar]
  15. Sahu, A.; Panigrahi, S.K.; Pattnaik, S. Fast Convergence Particle Swarm Optimization for Functions Optimization. Procedia Technol. 2012, 4, 319–324. [Google Scholar] [CrossRef]
  16. Wang, B.; Sun, Y.; Xue, B.; Zhang, M. A Hybrid GA-PSO Method for Evolving Architecture and Short Connections of Deep Convolutional Neural Networks. In PRICAI 2019: Trends in Artificial Intelligence; Nayak, A.C., Sharma, A., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 650–663. [Google Scholar]
  17. Rachmad Syulistyo, A.; Purnomo, D.; Rachmadi, M.; Wibowo, A. Particle Swarm Optimization (PSO) for Training Optimization on Convolutional Neural Network (CNN). J. Ilmu Komput. Dan Inf. 2016, 9, 52. [Google Scholar] [CrossRef]
  18. Fernandes Junior, F.E.; Yen, G.G. Particle swarm optimization of deep neural networks architectures for image classification. Swarm Evol. Comput. 2019, 49, 62–74. [Google Scholar] [CrossRef]
  19. Serizawa, T.; Fujita, H. Optimization of Convolutional Neural Network Using the Linearly Decreasing Weight Particle Swarm Optimization. arXiv 2020, arXiv:2001.05670. [Google Scholar]
  20. da Silva, G.L.F.; Valente, T.L.A.; Silva, A.C.; de Paiva, A.C.; Gattass, M. Convolutional neural network-based PSO for lung nodule false positive reduction on CT images. Comput. Methods Progr. Biomed. 2018, 162, 109–118. [Google Scholar] [CrossRef]
  21. Santucci, V.; Milani, A.; Caraffini, F. An Optimisation-Driven Prediction Method for Automated Diagnosis and Prognosis. Mathematics 2019, 7, 1051. [Google Scholar] [CrossRef]
  22. Wang, B.; Moayedi, H.; Nguyen, H.; Foong, L.K.; Rashid, A.S.A. Feasibility of a novel predictive technique based on artificial neural network optimized with particle swarm optimization estimating pullout bearing capacity of helical piles. Eng. Comput. 2020, 36, 1315–1324. [Google Scholar] [CrossRef]
  23. Band, S.S.; Janizadeh, S.; Chandra Pal, S.; Saha, A.; Chakrabortty, R.; Shokri, M.; Mosavi, A. Novel Ensemble Approach of Deep Learning Neural Network (DLNN) Model and Particle Swarm Optimization (PSO) Algorithm for Prediction of Gully Erosion Susceptibility. Sensors 2020, 20, 5609. [Google Scholar] [CrossRef] [PubMed]
  24. Qolomany, B.; Ahmad, K.; Al-Fuqaha, A.; Qadir, J. Particle Swarm Optimized Federated Learning For Industrial IoT and Smart City Services. arXiv 2020, arXiv:2009.02560. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
