Article

A Federated Learning Framework against Data Poisoning Attacks on the Basis of the Genetic Algorithm

1 College of Science, North China University of Science and Technology, Tangshan 063210, China
2 Hebei Key Laboratory of Data Science and Application, Tangshan 063210, China
3 Tangshan Key Laboratory of Data Science, Tangshan 063210, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(3), 560; https://doi.org/10.3390/electronics12030560
Submission received: 5 December 2022 / Revised: 4 January 2023 / Accepted: 19 January 2023 / Published: 21 January 2023

Abstract: Balancing information loss against training accuracy is crucial in federated learning, and inadequate data quality degrades training accuracy. Here, to improve training accuracy without increasing information loss, we propose a malicious-data detection model based on the genetic algorithm to resist data poisoning attacks. Specifically, the model consists of three modules: (1) participants conduct single-point training on their data and upload the resulting accuracy to a third-party server; (2) a data scoring formula is formulated based on data quantity and quality; (3) the genetic algorithm obtains the threshold which makes the score highest. Data whose accuracy exceeds this threshold can participate in the cooperative training of federated learning, so participants' data are optimized against data poisoning attacks before training. Experiments on two data sets validated the effectiveness of the proposed model, termed GAFL (genetic algorithm federated learning): its training accuracy is 7.45% higher than that of the standard federated learning model on the fashion-MNIST data set and 8.18% higher on the cifar10 data set.

1. Introduction

The development of deep learning is inseparable from large data sets. However, data owners are often unwilling to publish their own data, which poses a challenge for machine learning. Federated learning [1,2,3] was developed to tackle this problem: participants can train in federated learning without sharing their original data [4]. First, each participant trains on its data locally; second, participants return parameters to the parameter server; lastly, the parameter server returns the updated parameters to the participants.
The federated learning model guarantees the security of participants' original data to a certain extent, but attacks still occur [5]. Attacks on the federated learning model are mainly divided into internal attacks and external attacks. Internal attacks comprise attacks from participants and attacks from the parameter server. Attacks from participants include malicious behavior by the participants themselves and malicious nodes masquerading as participants in the training process [6,7,8]; in addition, low-quality participant data may degrade training accuracy. Attacks from the parameter server mainly include the server not aggregating parameters according to the rules, or malicious actors disguising themselves as the parameter server to obtain parameter information [9]. External attacks mainly include attacks by malicious parameter servers, attacks by malicious participants, and attacks on parameters and models in transit [10,11].
In this paper, we mainly discuss data poisoning attacks on the federated learning model. In the participation stage, participants' data may be of insufficient quality, and both low-quality data and malicious participants can degrade training accuracy. To tackle these problems, we propose an improved federated learning model based on the genetic algorithm [12]: after participants join the federated learning model, the genetic algorithm optimizes the participants' data and selects the data combination with the highest score to participate in model training, thereby avoiding data poisoning attacks.
The specific contributions are summarized as follows:
(1) This paper applies the genetic algorithm to the federated learning model. In the participation stage, the genetic algorithm is used to find the optimal combination of data, avoiding the impact of poor-quality data on the training process of the federated learning model;
(2) Experiments on the fashion-MNIST and cifar10 data sets verify that the training accuracy of GAFL is 7.45% higher than that of the standard federated learning model on fashion-MNIST and 8.18% higher on cifar10;
(3) The same experiments verify that GAFL achieves the highest comprehensive score balancing data availability and accuracy.

2. Related Work

In federated learning, training accuracy may suffer because participants hold poor-quality data or because malicious participants take part, and many scholars have worked on this problem. Targeting the problem that mobile devices may upload unreliable data, Kang et al. [13] introduced the concept of reputation into federated learning, using blockchain technology to manage the reputation of mobile devices during training. Jiang et al. [14] proposed PFLM, a privacy-preserving federated learning scheme with membership proof. Tran et al. [15] described federated learning as an optimization problem, decomposed it into several subproblems, and obtained the global optimal solution to improve model performance. To address privacy leakage, federated learning models based on differential privacy have been designed; however, some participants hold very little data, and differential privacy then reduces data availability. Truex et al. [16] proposed a privacy-protection method combining differential privacy and secure multi-party computation, which reduces the information loss during federated training. Xu et al. [17] proposed HybridAlpha, a privacy-preserving federated learning method using an SMC protocol based on functional encryption; the protocol is simple, efficient, and flexible with respect to participants, and provides the same model performance and privacy guarantees as existing solutions. Nevertheless, model accuracy still suffers when data quality is insufficient. Zhao et al. [18] proposed PDGAN, a poisoning defense generative adversarial network that reconstructs training data from model updates and audits the accuracy of each participant's model using the generated data.
To counter data poisoning attacks, in which malicious participants disrupt model training or participants hold poor-quality data, we apply the genetic algorithm to the federated learning model. In the participation phase, the genetic algorithm finds the optimal combination of the data held by the participants, thereby improving the performance of the federated learning model.

3. Basic Theory

3.1. Federated Learning

Federated learning was first proposed by Google in 2016. It is a distributed machine learning framework [19]: data owners keep their data locally and complete the collaborative training of a model without sharing the raw data. It effectively tackles the problem that most data are scattered across different institutions whose owners are unwilling to share them [20].
Each data owner stores its data locally, uses the raw data for local model training, and returns the parameters to the parameter server after training; the parameter server then returns the updated parameters to each participant. The specific process is shown in Figure 1.
The working principle of federated learning is as follows:
(1) The parameter server sends the initial weights $W_{t-1}$ to the participants;
(2) Participants use their own data sets to train locally;
(3) Participants send the training parameters to the parameter server;
(4) The parameter server updates the parameters of the global model;
(5) The parameter server returns the global model parameters $W_t$ to the participants.
The parameter server first sends initial model and weight parameters to participants, as shown in process ①. The participants use local data to train on the model and update the model weight parameters, as shown in process ②. The participants return the updated global model weight parameters to the parameter server, as shown in process ③. The parameter server aggregates the parameters sent by each participant, as shown in process ④.
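The parameter exchange in processes ①–④ can be written in a few lines of code. The following is a minimal sketch rather than the paper's implementation: the single linear layer, its gradient step, and the plain averaging aggregation rule are assumptions made for illustration.

```python
import numpy as np

def local_train(w, X, y, lr=0.1, epochs=1):
    """Process 2: a participant updates the received weights on its local data.
    A hypothetical single linear layer with a squared-error gradient step."""
    w = w.copy()
    for _ in range(epochs):
        preds = X @ w                          # forward pass on local data
        w -= lr * X.T @ (preds - y) / len(X)   # gradient descent update
    return w

def federated_round(global_w, clients):
    """Processes 1-4: broadcast weights, train locally, average the returns."""
    local_ws = [local_train(global_w, X, y) for X, y in clients]  # steps 1-2
    return np.mean(local_ws, axis=0)           # steps 3-4: server aggregation

# toy usage: three participants with random local data sets
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(3)]
w_t = np.zeros(5)
for _ in range(10):                            # repeated communication rounds
    w_t = federated_round(w_t, clients)
```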

3.2. The Genetic Algorithm

In 1967, Professor John Holland and his doctoral student Bagley proposed a method to search for the optimal solution based on the idea of Darwin’s theory of evolution. This algorithm is the genetic algorithm [21]. The genetic algorithm searches for the optimal solution by simulating the evolution process of nature. In the genetic algorithm, each solution to a problem is represented by a chromosome, and the nodes constituting each solution are represented by genes. The optimal solution is found by simulating the process of chromosome selection, crossover and mutation in nature [22,23].
The working process of the genetic algorithm is as follows and shown in Figure 2:
(1) Initialize a population randomly;
(2) Assess the fitness of each individual;
(3) Select the individuals with high fitness for chromosome crossover and mutation;
(4) Repeat the above process until the best individual is found.
In this paper, we formulate a scoring formula according to the quality and quantity of the data:

$$W_i = \begin{cases} 2 \times (A_i - 100\%), & A_i \geq \sigma \\ -1, & A_i < \sigma \end{cases} \quad (1)$$

and use the genetic algorithm to maximize $\sum_{i}^{n} W_i$. $W_i$ is the score of data unit $i$, calculated from its accuracy and the amount of data; the larger $W_i$, the higher the data quality. $A_i$ is the single-point training accuracy of the $i$th data unit: when $A_i$ is below the threshold $\sigma$, $W_i$ equals $-1$; when $A_i$ is at least $\sigma$, $W_i$ equals $2 \times (A_i - 100\%)$. The maximizer of $\sum_{i}^{n} W_i$ corresponds to the best combination of data for training. Since we want to determine which data are suitable for training and which are not, we use this formula to select the appropriate value of $\sigma$.
When $\sum_{i}^{n} W_i$ attains its maximum, the corresponding $\sigma$ is the threshold of the federated learning model: data whose training accuracy exceeds $\sigma$ can participate in federated learning model training.
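To make the threshold selection concrete, the sketch below implements the scoring formula and searches for the best $\sigma$. It is an illustration under assumptions, not the authors' code: the accuracy array is hypothetical, and an exhaustive grid over candidate thresholds stands in for the genetic search detailed in Section 4.

```python
import numpy as np

def combination_score(accuracies, sigma):
    """Total score under formula (1): 2*(A_i - 100%) if A_i >= sigma, else -1."""
    a = np.asarray(accuracies)
    return np.where(a >= sigma, 2.0 * (a - 1.0), -1.0).sum()

# hypothetical single-point training accuracies for ten data units
acc = np.array([0.95, 0.40, 0.88, 0.99, 0.15, 0.91, 0.77, 0.60, 0.97, 0.85])

# grid search over candidate thresholds (a stand-in for the genetic search);
# both look for the sigma that maximizes the total score
candidates = np.linspace(0.0, 1.0, 101)
best_sigma = max(candidates, key=lambda s: combination_score(acc, s))
print(best_sigma, combination_score(acc, best_sigma))
```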

4. Federated Learning Model on the Basis of the Genetic Algorithm

4.1. Algorithm Description

When data owners participate in federated learning for collaborative training, data sets of uneven quality will affect the accuracy of the training results. To tackle this problem: firstly, scoring Formula (1) is formulated according to the amount and quality of the participants' data; secondly, before participants use their data for model training, they train each data unit locally and score it according to Formula (1); finally, the genetic algorithm is used to obtain the threshold $\sigma$, and data with accuracy exceeding $\sigma$ can participate in the cooperative training. The procedure is as follows: (1) Participants obtain the initial model from the parameter server; (2) Participants train the original data in the initial model; (3) After training, participants return the parameters and training result of each data unit to the parameter server; (4) The parameter server searches for the optimal data combination using the genetic algorithm; (5) The parameter server sends the updated global model weight parameters and the number of data units in the combination to the participants; (6) Participants use the updated parameters for training.
When the federated learning model is used for cooperative training, participants join dynamically, their data sets may have quality problems, and malicious nodes may affect the training results. To obtain optimal training results, the genetic algorithm is used to optimize the selection of the data sets.
The flow of the algorithm is as follows, shown in Figure 3 (a code sketch follows the list):
  • Step 1: Participants train original data locally;
  • Step 2: Determine the weight between data availability and accuracy;
  • Step 3: Use the genetic algorithm to find the optimal data combination according to the accuracy of the data;
  • Step 4: All participants dynamically join the model;
  • Step 5: The parameter server sends the initial model to the participants;
  • Step 6: Participants use their local data to train the model locally;
  • Step 7: The trained model parameters are sent to the parameter server;
  • Step 8: The parameter server integrates model parameters;
  • Step 9: The parameter server sends the updated parameters to participants;
  • Step 10: Participants use the new model weight parameters to conduct a new round of iterative training.
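A compact sketch of these ten steps is given below. It is an outline under stated assumptions rather than the authors' implementation: single_point_accuracy, select_sigma_by_ga, and federated_round are hypothetical helpers standing in for the neural-network scoring, the genetic threshold search, and one round of federated training, respectively.

```python
import numpy as np

def gafl_pipeline(clients, rounds, init_w,
                  single_point_accuracy, select_sigma_by_ga, federated_round):
    """Steps 1-10: score each data unit, pick sigma by the genetic algorithm,
    filter the data, then run ordinary federated training."""
    # Steps 1-2: each participant trains every data unit once, reporting a
    # per-unit accuracy array
    accs = [single_point_accuracy(X, y) for X, y in clients]
    # Step 3: the server searches for the sigma maximizing formula (1)'s score
    sigma = select_sigma_by_ga(np.concatenate(accs))
    # Step 4: only data units whose accuracy is at least sigma are kept
    kept = [(X[a >= sigma], y[a >= sigma]) for (X, y), a in zip(clients, accs)]
    # Steps 5-10: iterative federated training on the filtered data
    w = init_w
    for _ in range(rounds):
        w = federated_round(w, kept)
    return w
```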

4.2. Algorithmic Pseudocode

The Pseudocode of the federated learning model on the basis of the genetic algorithm can be seen in Algorithm 1:
Algorithm 1. Pseudocode of the federated learning model based on the genetic algorithm (GAFL).
Input: fashion-MNIST (or cifar10) data set
Output: optimal training results
1: Initialize the connection weights W and connection thresholds T of the neural network
2: do {
3:   Calculate the sample output y_j = f(β_j − θ_j) // y_j is the output for sample j; f(β_j − θ_j) is the output computation of the neural network
4:   Calculate the gradient g of the neurons in the output layer
5:   Calculate the gradient e of the neurons in the hidden layer
6: } while (termination conditions not reached)
7: Output the accuracy F_i(accuracy) // the single-point training accuracy of the i-th participant; i indexes the participating nodes
8: Initialize P_m, P_c, m, G, T_f // P_c: crossover probability; P_m: mutation probability; m: population size; G: maximum number of generations; T_f: fitness threshold at which evolution terminates
9: Generate the first-generation population pop randomly
10: repeat
11:   Calculate the fitness F(i) of each individual i in pop
12:   Initialize an empty population newpop
13:   while fewer than m children have been created
14:     Select two individuals from pop by fitness-proportional selection
15:     if (random(0,1) < P_c) perform crossover on the two individuals
16:     if (random(0,1) < P_m) perform mutation on the two individuals
17:     Add the two new individuals to newpop
18:   Replace pop with newpop
19: until (some chromosome's fitness exceeds T_f, or the generation count exceeds G)
20: for i = 1 to n // n is the number of data units in the data set
21:   for j = 1 to n
22:     Build the initial model
23:     Train models
24:     Exchange parameters
25:     Update models
26:   end for
27: end for

4.3. Model Description

The improved federated learning model is divided into the following three parts: (1) in the participation process of the cooperative training, a neural network is used to predict the accuracy of each data unit; (2) a scoring formula is formulated according to data quality and data quantity; (3) the genetic algorithm is used to obtain the threshold $\sigma$, and data which exceed this threshold can participate in the cooperative training.

4.3.1. Technical Challenge

The core technical challenge is how to score the participants' data according to data quantity and quality, and then use the genetic algorithm to find the optimal combination of pictures and the threshold that yields the best training performance.
We divide the data factors which influence model precision into two aspects: the availability of the data and the accuracy of prediction. Our core idea is to improve accuracy while ensuring the maximum availability of the data. The genetic algorithm is used to find the optimal data combination. The principle is to set a threshold on the prediction accuracy of the data: data whose accuracy exceeds this threshold can participate in the training process, and data below the threshold are eliminated. We define a formula to score a data combination and use the genetic algorithm to find the parameter value at which the score is maximal, which becomes the threshold of the data. The formula is as follows:
$$W_i = \begin{cases} 2 \times (A_i - 100\%), & A_i \geq \sigma \\ -1, & A_i < \sigma \end{cases}$$
where $A_i$ is the prediction accuracy of the $i$th data unit. We assume the threshold is $\sigma$, and data with accuracy exceeding the threshold can participate in training. If a data unit cannot participate in training the model, the score of the data combination is reduced by one point. When a data unit can participate in training, the score of the data combination is increased by two points if its classification is correct and reduced by two points if it is not.

4.3.2. Technical Details

The technical details of the federated learning model based on the genetic algorithm fall into three parts. In the first part, the neural network scores the data according to prediction accuracy; here we mainly use the classification function of the neural network. First, the connection weights and connection thresholds of the neural network are initialized; then, the gradient parameters of the output layer and hidden layer are calculated, and the final prediction accuracy is output after the iterative training of the neural network. In the second part, according to the prediction accuracy of the data, the genetic algorithm finds the threshold that gives the highest data-combination score, and data below the threshold are deleted. In the third part, federated learning trains the remaining data and obtains the final training accuracy.
The neural network is used to predict each picture data unit in the fashion-MNIST data set and the cifar10 data set. Picture information enters from the input layer of the neural network, neurons in the hidden layer compute on the input information, and the computed information is transmitted to the output layer as the final prediction result. The main work of the neural network is to predict unknown information from known information: for picture data, the pixels of the picture are the known information, and the class the picture represents is the unknown information. In the hidden layer, the neural network predicts through the following formula:
$$y = Z_1 X_1 + \dots + Z_n X_n$$
where $y$ represents the output information, $(X_1, \dots, X_n)$ the input information, and $(Z_1, \dots, Z_n)$ the weights of the linear relationship between the features of the neural network and the target. Through this process, the neural network produces the prediction accuracy of each picture.
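As a minimal illustration of this weighted-sum prediction, with hypothetical pixel values and weights:

```python
import numpy as np

# hypothetical flattened pixel values X and learned weights Z
X = np.array([0.00, 0.50, 1.00, 0.25])   # input information (pixel values)
Z = np.array([0.10, -0.20, 0.40, 0.30])  # weights of the linear relationship

y = Z @ X        # y = Z1*X1 + ... + Zn*Xn, the output information
print(y)
```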
We then determine the proportions that accuracy and availability of the data contribute to the data combination score. According to the prediction accuracy of each picture given by the neural network, the genetic algorithm determines the optimal value of the parameter and, through it, the optimal combination of the federated training data, improving data availability. The model needs as much data as possible; however, some data have very low prediction accuracy and no training value, and these pictures are deleted. Deleting too many pictures, though, reduces the availability of the data. Therefore, we use the genetic algorithm to find the optimal value of the parameter $\sigma$. According to Formula (1), the accuracy threshold $\sigma$ of the data is determined: data with accuracy above the threshold participate in the training, and data below the threshold are deleted. We use Formula (1) to score the data combination on the principle of balancing data availability against data accuracy: provided the quality of the data is ensured, the more data the better, so each data unit contributes one point; however, a data unit which harms training accuracy is discarded, and whether to discard it depends on its quality. Assuming that 10,000 data units can participate in the collaborative training at the beginning and that poor-quality data are deducted, the score of the data combination is $10{,}000 + \sum_{i}^{n} W_i$, where $A_i$ is the prediction accuracy of the $i$th data unit.
The optimization objective of the genetic algorithm is $\sum_{i}^{n} W_i$, with
$$W_i = \begin{cases} 2 \times (A_i - 100\%), & A_i \geq \sigma \\ -1, & A_i < \sigma \end{cases}$$
The optimization process of the genetic algorithm is as follows (a code sketch follows the list):
(1) Encoding: select a number of candidate solutions of $W_i$ to initialize the population, where $W_i$ is the chromosome and $A_i$ is the gene;
(2) Decoding: determine the ranges of $A_i$ and $\sigma$: $A_i \in [0, 1]$, $\sigma \in [0, 1]$;
(3) Fitness calculation: since the purpose is to maximize the objective function $\sum_{i}^{n} W_i$, the larger the objective function, the higher the fitness;
(4) Selection: select the individuals with higher fitness;
(5) Crossover and mutation: flip a binary bit.
Repeat steps (4)–(5) until the optimal solution is obtained.
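The sketch below illustrates these encoding, selection, crossover, and mutation steps, using the crossover and mutation probabilities stated in Section 6.1. It is a simplified illustration, not the authors' code: encoding $\sigma$ as a 10-bit binary string and the random accuracy array are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
BITS = 10                                  # assumed encoding precision for sigma
acc = rng.uniform(0, 1, size=200)          # hypothetical single-point accuracies

def decode(bits):
    """Decode a binary chromosome into sigma in [0, 1]."""
    return int("".join(map(str, bits)), 2) / (2**BITS - 1)

def fitness(bits):
    """Total score of formula (1) at the decoded threshold."""
    sigma = decode(bits)
    return np.where(acc >= sigma, 2 * (acc - 1.0), -1.0).sum()

def select(pop, fits):
    """Fitness-proportional (roulette-wheel) selection; shift fits positive."""
    p = fits - fits.min() + 1e-9
    return pop[rng.choice(len(pop), p=p / p.sum())]

pop = rng.integers(0, 2, size=(100, BITS))          # initial population, m = 100
for generation in range(30):                        # max generations G = 30
    fits = np.array([fitness(c) for c in pop])
    newpop = []
    while len(newpop) < len(pop):
        a, b = select(pop, fits).copy(), select(pop, fits).copy()
        if rng.random() < 0.8:                      # crossover, P_c = 0.8
            cut = rng.integers(1, BITS)
            a[cut:], b[cut:] = b[cut:].copy(), a[cut:].copy()
        for child in (a, b):
            if rng.random() < 0.1:                  # mutation, P_m = 0.1
                i = rng.integers(BITS)
                child[i] ^= 1                       # flip a binary bit
        newpop += [a, b]
    pop = np.array(newpop[:len(pop)])

best = max(pop, key=fitness)
print("sigma =", decode(best))
```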
The participants’ work steps are as follows:
(1) Participants train data locally;
(2) Participants use the neural network to predict the accuracy of the image data;
(3) Participants select data with accuracy above the set threshold for training;
(4) Participants obtain the global neural network model and weight parameters from the parameter server;
(5) Participants use the neural network algorithm to train the original data locally;
(6) Participants return the updated global model weight parameters to the parameter server.
The parameter server’s work steps are as follows:
(1) The parameter server averages the global model weight parameters uploaded by the participants;
(2) The parameter server returns the updated global model weight parameters to the participants;
(3) Repeat the process until the final iteration condition is reached.

5. Model Analysis

5.1. Time Complexity

The time complexity of GAFL is composed of the time complexity of using the neural network to predict data accuracy, that of the genetic algorithm finding the optimal data combination, and that of the federated learning training. The time complexity of the neural network prediction is $O(m \times n^2)$; the time complexity of the federated learning training is $O(epoch \times n \times m)$, where $epoch$ is the number of model iterations, $n$ is the number of data units, and $m$ is the number of parameters; the time complexity of the genetic algorithm is $O(n^2)$. The overall time complexity of the federated learning training model based on the genetic algorithm is therefore $O(n^2 + epoch \times n \times m + m \times n^2)$.

5.2. Algorithm Security

After the neural network algorithm is used to predict single point training accuracy, the genetic algorithm is used to find the optimal data combination, and the optimized data will be used in the federated learning model training. The algorithm adopts the idea of distributed machine learning. The data is trained locally, and there is no risk of data leakage during data transmission. The data itself is secure. Each node builds the initial model independently. The new node obtains the model through the central server and trains the model locally to avoid possible model leakage during model transmission. After each training, the model parameters are exchanged between nodes through the parameter server to ensure security in the process of model parameter exchange.

6. Experiments

6.1. Experimental Environment and Data Set

The algorithm was written in Python in a Jupyter environment. The operating system was macOS 10.15.4, and the processor was a 1.8 GHz dual-core Intel Core i5 with 8 GB of memory. The experimental data sets were the fashion-MNIST data set from Zalando Research and the cifar10 data set. The fashion-MNIST data set consists of four parts: training set samples, training set labels, test set samples and test set labels; the training set contains 60,000 samples and the test set 10,000 samples, each picture is represented by 28 × 28 pixels, and each pixel by a gray value. The cifar10 data set likewise consists of training set samples, training set labels, test set samples and test set labels; its training set contains 50,000 samples and its test set 10,000 samples, and each picture is represented by 32 × 32 color pixels. The parameters of the genetic algorithm were set as follows: the initial population size is 100; the maximum number of evolutionary generations maxGeneration is 30; crossover probability prob = 0.8; mutation probability prob = 0.1; the size of the newly generated population maxPopuSize = 100.
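For reference, the stated genetic algorithm settings collected in one place (the variable names are illustrative):

```python
# genetic algorithm settings as stated above; the variable names are illustrative
ga_params = {
    "population_size": 100,   # initial population size
    "max_generation": 30,     # maximum number of evolutionary generations
    "crossover_prob": 0.8,    # probability of crossover, P_c
    "mutation_prob": 0.1,     # probability of mutation, P_m
    "max_popu_size": 100,     # size of the newly generated population
}
```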

6.2. Experimental Procedure

When training in the federated learning model, the data sets are first predicted by the neural network. The input information of the neural network is the pixels of a picture, and the output information is the classification result. According to the weight parameters relating input to output, the hidden layer of the neural network performs its operations to obtain the final prediction result.
The genetic algorithm is then used to search for the optimal solution: according to Formula (1), the predicted results enter the genetic search, which determines the accuracy threshold at which the score of the data joining the experiment is highest and thus finds the optimal data combination.
During the training procedure, the parameter server sends the initial model and parameters to the participants. Participants train the data sets locally and, once local training is complete, send the updated global model weight parameters to the parameter server. The parameter server averages the global model weight parameters and returns the updated parameters to the participants.

6.3. Experimental Results

We evaluate the model from two aspects: data availability and prediction accuracy. Data availability is expressed as $A = x/m$, where $x$ is the number of data units participating in training and $m$ is the total number of data units. Availability is 100% in standard federated learning, whereas in the federated learning model based on the genetic algorithm it is 89.28% on the fashion-MNIST data set and 87.88% on the cifar10 data set. The prediction accuracy results are shown in Figure 4: the accuracy of GAFL is 7.45% higher than that of the federated learning model on fashion-MNIST and 8.18% higher on cifar10. At the same time, GAFL's score balancing data availability and accuracy is the highest.
Figure 4 shows the comparison between training accuracy of the federated learning model based on the genetic algorithm and that of the federated learning model:
It can be seen from the results in Figure 4 that as the number of iterations (epoch) increases, the training accuracy also increases, and the training accuracy of GAFL is significantly higher than that of the federated learning model. Among the variants, raising the threshold by 10% above the GAFL optimum gives the highest accuracy, GAFL itself is second, lowering the threshold by 10% is third, and the federated learning algorithm is fourth. The reason is that raising the threshold by 10% above the GAFL optimum admits only higher-quality data, so the accuracy is higher.
Figure 5 shows that as the number of data units n increases, the score also increases, and GAFL attains the highest score. Raising or lowering the threshold by 10% relative to the GAFL optimum makes little difference to the score, and both variants score lower than the GAFL model, while the federated learning model has the lowest score. The reason is that the genetic algorithm finds the optimal combination of data, whose score is the highest.
We also compare GAFL with PDGAN, a defense against data poisoning attacks in federated learning; the prediction accuracy comparison is shown in Figure 6. The accuracy of GAFL is higher than that of PDGAN on both the cifar10 and fashion-MNIST data sets. The reason is that PDGAN addresses participants mislabeling data with a specific label, but it cannot detect participants who join model training with only partly malicious data.

7. Conclusions

In this paper, we use a federated learning model based on the genetic algorithm to train the data. Before participants take part in training, the genetic algorithm optimizes their data, selects the data combination most suitable for training, and applies the optimized combination to the federated learning training process. This addresses the problems of malicious nodes and poor-quality data sources in the joint training of multi-source data, and effectively improves both the security of the federated learning model and the training accuracy. Experimental results show that the training accuracy is 7.45% higher than that of federated learning on the fashion-MNIST data set and 8.18% higher on the cifar10 data set. At the same time, the GAFL model attains the highest score balancing data availability and accuracy.

Author Contributions

Conceptualization, X.C.; methodology, R.Z.; writing—review and editing, L.P.; visualization, Z.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number U20A20179.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

SMC: Secure Multi-Party Computation
GAFL: Genetic Algorithm Federated Learning
FL: Federated Learning
PDGAN: Poisoning Defense Generative Adversarial Network

References

  1. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics; PMLR 2017, 54, 1273–1282. [Google Scholar]
  2. Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and open problems in federated learning. Found. Trends® Mach. Learn. 2021, 14, 1–210. [Google Scholar] [CrossRef]
  3. Gao, L.; Fu, H.; Li, L.; Chen, Y.; Xu, M.; Xu, C. FedDC: Federated Learning with Non-IID Data via Local Drift Decoupling and Correction. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10102–10111. [Google Scholar]
  4. Chen, F.; Li, P.; Miyazaki, T.; Wu, C. FedGraph: Federated Graph Learning With Intelligent Sampling. IEEE Trans. Parallel Distrib. Syst. 2021, 33, 1775–1786. [Google Scholar] [CrossRef]
  5. Li, T.; Sahu, A.K.; Talwalkar, A.; Smith, V. Federated learning: Challenges, methods, and future directions. IEEE Signal Process. Mag. 2020, 37, 50–60. [Google Scholar] [CrossRef]
  6. Li, X.; Jiang, M.; Zhang, X.; Kamp, M.; Dou, Q. FedBN: Federated Learning on Non-IID Features via Local Batch Normalization. arXiv 2021, arXiv:abs/2102.07623. [Google Scholar]
  7. Li, L.; Fan, Y.; Tse, M.; Li, K.-Y. A review of applications in federated learning. Comput. Ind. Eng. 2020, 149, 106854. [Google Scholar] [CrossRef]
  8. Niknam, S.; Harpreet, S.D.; Jeffrey, H.R. Federated learning for wireless communications: Motivation, opportunities, and challenges. IEEE Commun. Mag. 2020, 58, 46–51. [Google Scholar] [CrossRef]
  9. Rieke, N.; Hancox, J.; Li, W.; Milletarì, F.; Roth, H.R.; Albarqouni, S.; Bakas, S.; Galtier, M.N.; Landman, B.A.; Maier-Hein, K.; et al. The future of digital health with federated learning. NPJ Digit. Med. 2020, 3, 1–7. [Google Scholar] [CrossRef] [PubMed]
  10. Luo, X.; Zhao, Z.; Peng, M. Tradeoff between Model Accuracy and Cost for Federated Learning in the Mobile Edge Computing Systems. In Proceedings of the 2021 IEEE Wireless Communications and Networking Conference Workshops (WCNCW), Nanjing, China, 29 March 2021. [Google Scholar]
  11. He, C.; Ceyani, E.; Balasubramanian, K.; Annavaram, M.; Avestimehr, S. SpreadGNN: Serverless Multi-task Federated Learning for Graph Neural Networks. arXiv 2021, arXiv:10.48550/arXiv.2106.02743. [Google Scholar]
  12. Jannatul, F.; Mondol, G.; Prapti, A.P.; Begumet, M.; Sheikh, M.N.A.; Galib, S.M. An enhanced image encryption technique combining genetic algorithm and particle swarm optimization with chaotic function. Int. J. Comput. Appl. 2021, 43, 960–967. [Google Scholar]
  13. Kang, J.; Xiong, Z.; Niyato, D.; Zou, Y.; Zhang, Y.; Guizani, M. Reliable Federated Learning for Mobile Networks. IEEE Wirel. Commun. 2020, 27, 72–80. [Google Scholar] [CrossRef] [Green Version]
  14. Jiang, C.; Xu, C.; Zhang, Y. PFLM: Privacy-preserving federated learning with membership proof. Inf. Sci. 2021, 576, 288–311. [Google Scholar] [CrossRef]
  15. Tran, N.H.; Bao, W.; Zomaya, A.; Nguyen, M.N.H.; Hong, C.S. Federated learning over wireless networks: Optimization model design and analysis. In Proceedings of the IEEE Infocom 2019-IEEE Conference on Computer Communications, Paris, France, 29 April 2019–2 May 2019; pp. 1387–1395. [Google Scholar]
  16. Truex, S.; Baracaldo, N.; Anwar, A.; Steinke, T.; Ludwig, H.; Zhang, R.; Zhou, Y. A hybrid approach to privacy-preserving federated learning. In Proceedings of the 12th ACM workshop on artificial intelligence and security, London, UK, 11 November 2019; pp. 1–11. [Google Scholar]
  17. Xu, R.; Baracaldo, N.; Zhou, Y.; Anwar, A.; Ludwig, H. Hybridalpha: An efficient approach for privacy-preserving federated learning. In Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, London, UK, 11 November 2019; pp. 13–23. [Google Scholar]
  18. Zhao, Y.; Chen, J.; Zhang, J.; Wu, D.; Teng, J.; Yu, S. PDGAN: A Novel Poisoning Defense Method in Federated Learning Using Generative Adversarial Network. In Algorithms and Architectures for Parallel Processing; Springer: Cham, Switzerland, 2020; pp. 595–609. [Google Scholar]
  19. Mendieta, M.; Yang, T.; Wang, P.; Lee, M.; Ding, Z.; Chen, C. Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning. CVPR 2022, 55, 8387–8396. [Google Scholar]
  20. Shen, Y.; Zhou, Y.; Yu, L. CD2pFed: Cyclic Distillation-guided Channel Decoupling for Model Personalization in Federated Learning. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–20 June 2022. [Google Scholar]
  21. Lambora, A.; Gupta, K.; Chopra, K. Genetic Algorithm—A Literature Review. In Proceedings of the 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India, 14–16 February 2019. [Google Scholar]
  22. Mirjalili, S. Genetic Algorithm. In Evolutionary Algorithms and Neural Networks; Springer: Cham, Switzerland, 2019; pp. 43–55. [Google Scholar]
  23. Katoch, S.; Chauhan, S.S.; Kumar, V. A review on genetic algorithm: Past, present, and future. Multimed. Tools Appl. 2021, 80, 8091–8126. [Google Scholar]
Figure 1. Federated learning model (FL).
Figure 2. Workflow chart of the genetic algorithm.
Figure 3. Federated learning model based on the genetic algorithm (GAFL).
Figure 4. (a) Comparison of the accuracy of the GAFL and FL in the fashion-MNIST dataset; (b) Comparison of the accuracy of the GAFL and FL in the cifar10 dataset.
Figure 5. (a) Score comparison of the GAFL and FL in the fashion-MNIST dataset; (b) Score comparison of the GAFL and FL in the cifar10 dataset.
Figure 6. (a) Score comparison of the GAFL and PDGAN in the cifar10 dataset; (b) Score comparison of the GAFL and PDGAN in the fashion-MNIST dataset.