Article

A Comparative Study of Privacy-Preserving Techniques in Federated Learning: A Performance and Security Analysis

1 Faculty of Computers & Informatics, Zagazig University, Zagazig 44519, Egypt
2 College of Computer Science and Engineering, Taibah University, Yanbu 966144, Saudi Arabia
3 College of Computing and Information Sciences, University of Technology and Applied Sciences, Ibri P.O. Box 466, Ad Dhahirah, Oman
* Author to whom correspondence should be addressed.
Information 2025, 16(3), 244; https://doi.org/10.3390/info16030244
Submission received: 4 February 2025 / Revised: 10 March 2025 / Accepted: 12 March 2025 / Published: 18 March 2025
(This article belongs to the Special Issue Digital Privacy and Security, 2nd Edition)

Abstract

Federated learning (FL) is a machine learning technique where clients exchange only local model updates with a central server that combines them to create a global model after local training. While FL offers privacy benefits through local training, privacy-preserving strategies are needed since model updates can leak training data information due to various attacks. To enhance privacy and attack robustness, techniques like homomorphic encryption (HE), Secure Multi-Party Computation (SMPC), and the Private Aggregation of Teacher Ensembles (PATE) can be combined with FL. Currently, no study has combined more than two privacy-preserving techniques with FL or comparatively analyzed their combinations. We conducted a comparative study of privacy-preserving techniques in FL, analyzing performance and security. We implemented FL using an artificial neural network (ANN) with a Malware Dataset from Kaggle for malware detection. To enhance privacy, we proposed models combining FL with the PATE, SMPC, and HE. All models were evaluated against poisoning attacks (targeted and untargeted), a backdoor attack, a model inversion attack, and a man in the middle attack. The combined models maintained performance while improving attack robustness. FL_SMPC, FL_CKKS, and FL_CKKS_SMPC improved both their performance and attack resistance. All the combined models outperformed the base FL model against the evaluated attacks. FL_PATE_CKKS_SMPC achieved the lowest backdoor attack success rate (0.0920). FL_CKKS_SMPC best resisted untargeted poisoning attacks (0.0010 success rate). FL_CKKS and FL_CKKS_SMPC best defended against targeted poisoning attacks (0.0020 success rate). FL_PATE_SMPC best resisted model inversion attacks (19.267 MSE). FL_PATE_CKKS_SMPC best defended against man in the middle attacks with the lowest degradation in accuracy (1.68%), precision (1.94%), recall (1.68%), and the F1-score (1.64%).

1. Introduction

In an increasingly interconnected world, federated learning (FL) is a novel machine learning (ML) approach that addresses the issues of decentralization, security, and data privacy. FL is an important approach for distributed ML, especially because it protects user privacy and improves communication efficiency, which makes it particularly valuable in today's data-driven world, where privacy concerns are critical [1]. FL is extensively utilized in several domains such as the Internet of Things (IoT), computer vision (CV), natural language processing (NLP), healthcare, financial services, and medical image analysis [2,3].
This approach allows models to be trained on distributed datasets without the need to centralize the data. In contrast to conventional ML techniques, it permits several parties to cooperatively train a common model without sharing their raw data [4]. In FL, user data are not shared with a central server; only the parameters of ML models are [5]. FL can achieve strong learning performance for the benefit of the client while simultaneously protecting sensitive client data, and it also gives clients the ability to retain control over their personal data in an environment where data breaches and privacy violations are common [6].
Despite all its advantages for data privacy, FL remains vulnerable to several privacy risks [3,7]: local model updates may disclose sensitive information about participants; attackers may still be able to obtain private data even when only gradient data are transferred to central servers; attackers may infer and ultimately reconstruct private data from the model updates shared between participants and the central server; an adversary or a malicious client may poison the training process by modifying local training data or gradients, affecting the shared model's integrity; and adversaries may access sensitive information by intercepting transferred data during model updates.
According to several studies, including [2,8,9], to handle the above problems of privacy concerns in FL, several privacy-preserving approaches have been proposed, such as differential privacy (DP) [10], homomorphic encryption (HE) [11], and Secure Multi-Party Computation (SMPC) [12]. The Private Aggregation of Teacher Ensembles (PATE) is a DP technique. The PATE involves a combination of multiple teacher models and one student model [13]. Each teacher model is trained on a separate subset of the data [14].
Using a Laplace or Gaussian mechanism, noise is added to the aggregated results of teacher models [15]. The student model is trained on the aggregated labels from teacher models and then used for predictions [16]. SMPC enables the cooperative computation of a function over private inputs by several parties without disclosing those data, so it ensures confidentiality by safely combining model updates in FL without disclosing individual updates to the central server [17,18].
HE is a cryptographic technique that permits calculations on an encrypted form of data called ciphertext, producing an encrypted result that, upon decryption, matches the outcome of operations conducted on the original form of the data, called plaintext [19,20]. In FL, each client only encrypts their local model using HE so that the aggregator can perform mathematical operations on it [21,22]. There have been studies that integrated one privacy-preserving technique with FL and other studies that combined more than one privacy-preserving technique with FL.
The studies in [13,23,24] integrated the PATE with FL. In [24], the authors proposed the FREDY framework, which combines the PATE with FL and applies Convolutional Neural Networks. Through the federated training of multiple teacher models, inference on publicly accessible unlabeled data, prediction aggregation with Laplace noise, and the training of a student model on the labeled data resulting from this procedure, the PATE was incorporated into FREDY. The CIFAR10 and MNIST datasets were used. The work in [25,26,27] combined SMPC with FL.
The authors of [25] proposed the PrivatEyes framework, which is an FL-based framework for the protection of gaze estimation data. PrivatEyes integrates SMPC with FL and applies Convolutional Neural Networks. The MPIIFaceGaze, GazeCapture, and NVGaze datasets were used for evaluation. Several research studies such as [11,28,29] integrated HE with FL. The authors of [28] combined HE with FL for intrusion detection in the Internet of Vehicles with limited computing resources. HE is used by Vehicle Users for the encryption of offloaded data for the purpose of transmitting it to the server. Vehicle Users also use HE to encrypt their local model updates before sending them to the central server to perform an aggregation process. A Deep Neural Network and the Edge-IIoT dataset are used.
The authors of [20,30,31] proposed combining FL with HE and SMPC. To handle users’ dropout, particularly in environments with limited resources, [30] integrated FL, HE, and SMPC. Multi-homomorphic encryption was used to perform computations on encrypted data without decrypting it. Secret sharing was used for SMPC. To handle user dropouts, secret sharing was used, along with a random mask secure aggregation mechanism that preserved the masks’ privacy. The aggregation process only needed two communication rounds, and its communication and computational overheads increased linearly with the number of users.
The work in [32] combined FL, HE, and DP. In [32], the authors integrated FL, HE, and DP to propose the FLCP framework. FLCP uses a Convolutional Neural Network. Following the computation of the local model updates, clients perform weight compression using the AWC-FedAvg algorithm. DP is applied through the addition of noise to the compressed local model updates using a Laplace mechanism. Using HE, the noisy compressed updates are encrypted. The clients transmit encrypted updates to the central server for the aggregation and updating of the global model. The MNIST and CIFAR-10 datasets are used for evaluation.
Despite the growing interest in FL security mechanisms, the literature is notably deficient in in-depth comparative studies of different combinations of techniques with different security levels. While individual privacy-preserving techniques such as HE, the PATE, and SMPC have been examined in existing studies, few comparisons have been conducted regarding how these mechanisms perform when applied in different combinations. For instance, no research has systematically compared the performance implications of combining HE with the PATE against implementations that combine HE, the PATE, and SMPC together. This gap in comparative analysis makes it challenging for practitioners to make informed decisions on the best security configurations for their FL deployments. The evaluation of trade-offs among various sets of security measures, in terms of computational overhead, communication overhead, model precision, and overall system efficiency, remains relatively under-researched within the FL field and thus requires further investigation.
In this paper, we present a comparative study of different possible combinations of privacy-preserving methods with FL. The analysis included both the performance and security of all the generated models to evaluate the resistance of each model against various attacks. The contributions of this study are as follows:
  • To overcome the challenges of traditional ML, we developed an FL model using an artificial neural network (ANN) based on a Malware Dataset for the detection of malware.
  • To enhance the privacy of FL, we integrated various privacy-preserving techniques with FL. We developed multiple models for the combination of FL with privacy-preserving techniques. The PATE, SMPC, and HE were used for preserving the privacy of the FL model. The main focus of this work was to analyze how different privacy-preserving techniques interact when combined in federated settings, their collective impact on the model performance, and the resulting security guarantees.
  • We evaluated the generated models against various attacks, including poisoning attacks, a model inversion attack, a backdoor attack, and a man in the middle attack.
  • To the best of the authors’ knowledge, this is the first work to analyze the performance and security of different possible combinations of privacy-preserving techniques in FL. The generated models did not reduce the performance by a large percentage, but they improved the models’ robustness against various attacks. All combinations performed better than the base FL model for all evaluated attacks.
The rest of this study is arranged as follows: Section 2 addresses the background and related work on FL and its privacy-preserving techniques. In Section 3, the proposed methodology is outlined. Section 4 includes the experimental setup. Section 5 provides the results, including a performance and security analysis. Section 6 provides a discussion. Finally, Section 7 addresses the conclusions and suggested future work.

2. Background and Related Work

2.1. Federated Learning

FL is an ML approach used for distributed learning. The architecture of this approach involves various components, including the central server, clients (devices), the global model, and local models [33]. Each component plays an important role in effective model training and data privacy. The central server is responsible for coordinating the overall FL process, aggregating local model updates, and maintaining the global model while the clients, which represent individual devices participating in the FL process, perform local model training on their local datasets [34]. The global model is the primary training model that is shared and modified by all clients, while local models represent locally updated copies or modifications of the global model created by each client [35].
FL is a decentralized ML approach that enables several devices to collaboratively learn useful and essential information from their owned data without sharing them [36]. It allows for the collaborative training of an ML model through various devices on their local datasets [37]. The clients (devices) perform local training on their data. After local model training, the clients then send their local model updates (not raw data) to the central server [38]. The central server aggregates these model updates from all the clients participating in the FL process [39]. After aggregation, the central server improves the global model by updating it using the aggregated model updates. The updated global model is then sent back to clients to perform further training [40].
Scalability, the need for significant and powerful computational resources and extensive training data, data privacy issues, and possible bottlenecks in the transport and processing of data are some of the major obstacles faced by centralized advanced ML, including ANNs [41]. In order to mitigate these obstacles, FL can be integrated with ANNs. There are various studies that have improved performance and enhanced privacy by integrating FL with neural networks, such as [42,43,44]. The use of FL is becoming more and more effective in improving malware detection. FL enhances malware detection systems’ accuracy and effectively addresses privacy issues by facilitating decentralized data processing.
Compared to traditional ML approaches, FL ensures user data privacy, makes anomaly detection in IoT networks easier, improves malware detection, enhances the model accuracy, and provides better accuracy in the detection of attacks [45]. For Android malware classification, FL enhances the detection of malware and outperforms traditional deep learning techniques in terms of detection accuracy, scalability, and user privacy preservation [46]. In healthcare, through collaborative model training across multiple clients (four hospital networks), FL improves malware detection by increasing accuracy, patient privacy preservation, and resilience against malware threats [47].

2.2. Security in Federated Learning

The FL approach is vulnerable to several types of attacks, such as poisoning attacks, backdoor attacks, model inversion attacks, membership inference attacks, Property Inference Attacks, and model extraction [48]. Depending on the attack method by which the attacker changes the parameters of the local model to create a poisoned model, poisoning attacks can be classified as either data poisoning attacks or model poisoning attacks [49]. Regarding data poisoning attacks, attackers perform the attack on local training datasets either by manipulating labels (e.g., label flipping) or samples (e.g., adding noise to the training dataset) [50].
For model poisoning, the attacker directly manipulates local model updates (e.g., adding noise to model updates or sending arbitrary model updates) [51]. Poisoning attacks influence the global model’s performance [49]. To compromise the global model in an FL configuration, backdoor attacks modify the local models. The attacker’s goal in these attacks is to incorporate a trigger into one or more local models so that, when the trigger is present in the data inputs, the global model behaves in a particular way [52]. Backdoor attacks mislead the global model to generate inaccurate outputs when given backdoor inputs.
In model inversion attacks, the attacker tries to extract sensitive information or reconstruct the original training dataset by manipulating the final global model output [53]. In membership inference attacks, the attacker tries to discover if a particular data point is a member or non-member of the training dataset used for model training [54]. In Property Inference Attacks, the attacker aims to extract sensitive features included in the dataset used for model training [54]. For model extraction or stealing attacks, the adversary tries to steal model functionality by stealing the model parameters and hyperparameters. By only querying the target model, an attacker can create a replacement model in model extraction attacks that has nearly all of the functions of the target model [54].
In ML, the attack success rate (ASR) measures how well an attack can breach, manipulate, or compromise an ML model [55]. This success rate can be used as an evaluation metric for the effectiveness of attacks on ML models [56]. This metric is essential for assessing how resilient ML models are to different types of attacks, particularly in light of the growing concerns surrounding adversarial attacks [57]. The metric provides the percentage of successful attacks on the ML model [58,59].
Equation (1) shows the general calculation of the ASR. The ASR can be calculated by dividing the number of attempts in which the attack actually succeeded in fooling the ML model by the total number of attack attempts, and the result is then multiplied by 100 to obtain the percentage. Specifically, in the case of poisoning attacks, the ASR will be the proportion of successful poisoning attempts to the total number of poisoning attempts, as noted in studies of ML threats that highlight the exploitation of vulnerabilities in ML systems [60,61]. For backdoor attacks, it represents the proportion of successful backdoor triggers to the total number of trigger attempts [59,62].
$\mathrm{ASR} = \dfrac{\text{Number of Successful Attacks}}{\text{Total Number of Attack Attempts}} \times 100\%$ (1)
For a model inversion attack, the ASR is the proportion of correctly reconstructed inputs to the total number of reconstruction attempts [63,64]. Regarding a membership inference attack, it is the ratio of correctly inferred membership instances to the total number of inference attack attempts [65,66]. For model stealing attacks, it will be the ratio of successfully replicated functionalities to the total number of stealing attack attempts [67,68], and so on for other attacks. This metric is important because model theft can disrupt competitive advantages in ML and result in large financial losses [69].
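As a simple illustration of Equation (1), the short sketch below computes the ASR as a percentage; the counts used in the example are hypothetical.

```python
def attack_success_rate(successful_attacks: int, total_attempts: int) -> float:
    """Equation (1): percentage of attack attempts that compromised the model."""
    return 100.0 * successful_attacks / total_attempts

# Hypothetical example: 23 out of 500 poisoned updates changed the global model's behavior.
print(attack_success_rate(23, 500))  # 4.6 (%)
```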

2.3. Privacy-Preserving Techniques

There are several techniques for privacy preservation in ML, such as the PATE, SMPC, and HE.

2.3.1. Private Aggregation of Teacher Ensembles

The PATE [70] is considered a popular privacy-preserving approach [71]. It is an ML-based framework in which numerous ensembles of teacher models and a student model are combined to create private models [72]. Teacher models are trained on disjoint private datasets [73]. Based on noisy voting from all the teacher models, the student model performs prediction [74]. The process involves data partitioning, teacher model training, the voting of teacher models, noise addition, aggregation, and student model training. In the data partitioning phase, the private training dataset is partitioned into n subsets, as in Equation (2).
$D(X, Y) = \{D_1(X_1, Y_1), D_2(X_2, Y_2), D_3(X_3, Y_3), \ldots, D_n(X_n, Y_n)\}$ (2)
where $X$ is the input training data, $Y$ is the corresponding set of target class labels, $D$ is the training dataset partitioned into $n$ disjoint subsets, $n$ is the total number of partitions, and $D_i(X_i, Y_i)$ represents the $i$th partition, comprising a subset of the training data and its corresponding class labels.
In the teacher model training phase, each teacher model $T_i$ is trained on one of the $n$ data subsets created in the partitioning phase. Teacher model voting is performed as in Equation (3). Noise is then added to the predictions (votes) of the teacher models using a Laplace or Gaussian mechanism. After that, the aggregation of these votes is performed using Equation (4).
$V(X) = \{T_1(X), T_2(X), T_3(X), \ldots, T_n(X)\}$ (3)
where $T_i(X)$ is the vote or prediction for the input $X$ made by the $i$th teacher model, $n$ represents the total number of teacher models, and $V(X)$ is a vector containing the number of teacher models that voted for each potential class.
The student model is then trained using a public dataset that is labeled with the output of the teacher aggregation in Equation (4), as in [75]. There have been several studies that utilized the PATE, such as [14,16,76,77].
$Y_{\mathrm{PATE}}(X) = \operatorname{argmax}(V(X) + \mathrm{noise})$ (4)
where $V(X)$ is the vector of teacher vote counts for each potential output class for input $X$, noise is random noise drawn from a Gaussian or Laplace distribution, $\operatorname{argmax}$ selects the class with the highest noisy vote count, and $Y_{\mathrm{PATE}}(X)$ is the final aggregated prediction.
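A minimal sketch of the noisy-vote aggregation in Equations (3) and (4) is given below; the Laplace scale of 1/epsilon is a common calibration and, like the array shapes and function name, is an assumption made for illustration.

```python
import numpy as np

def pate_aggregate(teacher_votes: np.ndarray, num_classes: int, epsilon: float = 1.0) -> np.ndarray:
    """Noisy-argmax aggregation of teacher predictions (Equations (3) and (4)).

    teacher_votes: integer class labels of shape (n_teachers, n_samples).
    Returns one privacy-preserving label per sample.
    """
    labels = []
    for votes in teacher_votes.T:                                   # votes for one input X
        counts = np.bincount(votes, minlength=num_classes)          # V(X): per-class vote counts
        noisy = counts + np.random.laplace(0.0, 1.0 / epsilon, num_classes)
        labels.append(int(np.argmax(noisy)))                        # Y_PATE(X)
    return np.array(labels)
```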

2.3.2. Secure Multi-Party Computation

SMPC is sometimes referred to as multi-party computation (MPC), SMC, or privacy-preserving computation [78]. It is a cryptographic technique enabling several parties to jointly compute a public function $f(x_1, x_2, x_3, \ldots, x_j) \rightarrow (y_1, y_2, y_3, \ldots, y_j)$, where party $P_j$ contributes input $x_j$ and receives output $y_j$ [79]. It allows these parties to collaborate to compute the result from their private individual inputs without sharing any of their input data with the other participants [80].
Only the result $(y_1, y_2, y_3, \ldots, y_j)$ and its own data input $x_j$ are available to each party [78]. This keeps private data safe even if one party is compromised and allows mutually distrusting parties to collaborate securely on sensitive data [81]. There have been several studies that utilized SMPC, such as [18,82,83,84].
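As a conceptual illustration (not necessarily the protocol used later in this paper), the sketch below shows additive secret sharing, a common building block of SMPC: each party holds only a random-looking share, yet the shares can be combined to evaluate an aggregate without revealing any individual input.

```python
import numpy as np

MOD = 2 ** 31  # illustrative modulus for the shares

def additive_shares(secret: int, num_parties: int):
    """Split an integer into additive shares that sum to it modulo MOD."""
    shares = np.random.randint(0, MOD, size=num_parties - 1).tolist()
    shares.append((secret - sum(shares)) % MOD)
    return shares

# Two private inputs are shared among three parties; only their sum is reconstructed.
x_shares = additive_shares(42, 3)
y_shares = additive_shares(17, 3)
sum_shares = [(a + b) % MOD for a, b in zip(x_shares, y_shares)]
print(sum(sum_shares) % MOD)  # 59, while no party ever sees 42 or 17
```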

2.3.3. Homomorphic Encryption

HE is a cryptographic technique that allows computations to be performed on an encrypted form of data, called ciphertext, without first decrypting it, thereby ensuring data confidentiality [85]. The computation yields an encrypted result, and the technique guarantees that, once decrypted, this output matches the output that would have been calculated on the original plaintext data [78]. According to several studies, such as [86,87], there are various categories of HE techniques based on the type and number of operations supported, including partially homomorphic encryption (PHE) [88], somewhat homomorphic encryption (SWHE) [89], and fully homomorphic encryption (FHE) [29].
PHE is the most mathematically constrained type of HE but also the most computationally practicable [90]. It supports either addition or multiplication an unlimited number of times [91]. SWHE enables both addition and multiplication but only for a limited number of operations [92]. FHE supports arbitrary operations on encrypted data [78]; it is the strongest form of HE, although it requires the most processing [90]. There are various HE schemes, but the most popular [93] are TFHE [94], BFV [95], and CKKS [96].
The CKKS (Cheon–Kim–Kim–Song) scheme supports approximate encrypted computations over real and complex numbers, which is useful in ML scenarios [97]. The Brakerski/Fan–Vercauteren (BFV) scheme enables exact computations on integer ciphertexts [98]. The FHE over the Torus (TFHE) scheme allows for efficient Boolean operations on encrypted data and uses a bootstrapping method to control ciphertext noise [99]. Several studies have utilized the CKKS scheme, including [100,101,102], others have utilized the BFV scheme, such as [103,104], and still others have utilized the TFHE scheme, such as [105,106].
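A minimal TenSEAL sketch of the CKKS scheme's homomorphic property is shown below; the encryption parameters are illustrative (they happen to match those used later in Section 3.2.3), and the small input vectors are hypothetical.

```python
import tenseal as ts

# CKKS context for approximate arithmetic over real-valued vectors.
context = ts.context(ts.SCHEME_TYPE.CKKS,
                     poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40
context.generate_galois_keys()

enc_a = ts.ckks_vector(context, [1.0, 2.0, 3.0])
enc_b = ts.ckks_vector(context, [0.5, 0.5, 0.5])
result = (enc_a + enc_b) * 0.5        # addition and scalar multiplication on ciphertexts
print(result.decrypt())               # approximately [0.75, 1.25, 1.75]
```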

2.4. Related Work

In [107], the authors proposed an FL framework for malware detection across IoT devices. Their framework utilizes both supervised and unsupervised FL models, including a multi-layer perceptron and an autoencoder. They used the N-BaIoT dataset, which models the network traffic of many actual IoT devices when they are compromised by malware. They performed a performance comparison between the federated and centralized approaches. According to their results, privacy can be maintained while achieving performance levels similar to those of centralized models using the federated approach. Supervised FL produced high accuracy above 99%, the same as the centralized approach. Unsupervised FL also achieved results similar to the centralized approach: a 99.98% TPR for both known and new devices, along with TNRs of 94.84% (multi-epoch avg) and 95.12% (mini-batch avg) for known devices and 92.61% (multi-epoch avg) and 91.78% (mini-batch avg) for new devices. They did not apply any privacy-preserving technique besides FL.
In [108], the authors proposed FEDriod, an FL framework for Android malware detection based on a residual neural network (ResNet). They used the CIC, Drebin, and Contagio datasets. Their FEDriod framework achieved a 98.53% F1-score. They did not apply privacy-preserving techniques along with FL. In [109], the authors proposed an FL framework for ransomware detection based on recurrent neural networks (RNNs). Synthetic data were utilized for both training and evaluating the model. Their model produced an accuracy of 94.7%, a precision of 92.3%, a recall of 91.8%, an F1-score of 92.0%, and an AUC-ROC of 96.1%. Besides FL, they did not apply privacy-preserving techniques.
In [110], the authors proposed a framework called FedHGCDroid, for the detection and classification of Android malware. They built their HGCDroid model using both a Convolutional Neural Network for malware’s statistical features and graph neural networks (GNNs) for malware’s graphical features. They used the Androzoo dataset. For malware detection, the HGCDroid model achieved an accuracy of 91.3%, a precision of 90.8%, a recall of 92.79%, and a 91.29% F1-score. For malware classification, the model produced an accuracy of 83.29%, a precision of 83.45%, a recall of 83.85%, and an 83.67% F1-score. This framework did not integrate privacy-preserving techniques with FL.
In [111], the authors proposed the SIM-FED model for the detection of malware in IoT devices. Their model applies both FL and deep learning. A lightweight one-dimensional CNN with improved hyperparameters is used in the model. The FedAvg strategy is used to aggregate the outcomes of the local models into the proposed model. The accuracy achieved by the SIM-FED model was 99.522%. In [11], the authors proposed a system called FedML-HE. The system integrated the FL approach with HE for securing the model aggregation process. The PALISADE and TenSEAL libraries were used for implementing HE. They used the CIFAR-100 and wikitext datasets.
During the training phase, the system encrypted only sensitive parameters when updating the local model, in order to reduce the overhead. For ResNet-50, their system achieved a reduction of ∼10×, and for BERT it could achieve a reduction of up to ∼40×. This system integrated only one privacy-preserving technique, HE, with FL, and it protected only the model updates. In [32], the authors proposed FLCP, an FL-based framework with enhanced communication efficiency and privacy preservation. For privacy preservation, they integrated FL with HE and DP. FLCP was evaluated on the MNIST and CIFAR-10 datasets using a unified Convolutional Neural Network (CNN) architecture.
Local updates, weight compression, DP addition, and HE for sending encrypted updates are the processes within their FLCP framework. Each client performs local training and computes local updates. After the computation of the local model updates, clients perform weight compression using the AWC-FedAvg algorithm. Following weight compression, noise is added to the compressed local model updates using a Laplace mechanism. After the addition of DP, HE is applied to the noisy compressed updates. The encrypted updates are finally sent by the clients to the central server, which aggregates them to update the global model. The provided system tries to preserve only local model updates, and a possible loss of information may exist due to compression.
In [112], the authors proposed an FL framework using a Convolutional Neural Network in the field of Sixth-Generation (6G) wireless networks and the Internet of Medical Things (IoMT). The framework integrated SMPC using additive secret sharing for secure aggregation. A breast cancer dataset from the Histopathological Database was used. The ResNet model achieved a training accuracy of 98% and a validation accuracy of 90%. The AlexNet model achieved a training accuracy of 95% and a validation accuracy of 85%. The framework integrated only one privacy-preserving technique, SMPC, with FL. The complexity of implementing SMPC can produce a high overhead. The proposed framework was mainly focused on securely aggregating only the local model updates rather than the raw data themselves.
In [12], the authors proposed an FL framework that combines SMPC and blockchain technology in IoT networks. The framework uses SMPC protocols for the secure aggregation of model updates from participants without disclosing personal client data, while blockchain technology guarantees transaction immutability and transparency. In [23], the authors proposed FL-PATE, which is a differentially private FL framework with knowledge transfer. The framework integrates the PATE with FL to enhance privacy. In [24], the authors proposed the FREDY framework based on Convolutional Neural Networks. FREDY integrated FL, the PATE, and knowledge transfer.
The PATE was integrated into FREDY through the federated training of several teacher models, inference on publicly available unlabeled data, the aggregation of predictions with Laplace noise, and the training of a student model on the labeled data obtained from this process. The CIFAR10 and MNIST datasets were used. The study showed that for both the MNIST and CIFAR-10 datasets, increasing the number of clients and the privacy parameter ε generally improved the test accuracy of the student model. MNIST models performed better than CIFAR-10 models. Using MNIST, the 25-client model achieved the maximum accuracy of ∼99% at ε = 1 .
Using CIFAR-10, the 25-client FL model (FREDY) attained the maximum accuracy of ∼79% when ε = 1 . When ε = 0.2 , FREDY outperformed the baseline model in terms of its membership inference attack performance, reducing all metrics with a ∼25% drop. The framework was evaluated against only one attack. It also integrated only one privacy technique with FL. In [28], the authors presented a framework based on FL and HE for intrusion detection in the Internet of Vehicles (IoVs) with limited computing resources. They used a Deep Neural Network (DNN) and the Edge-IIoT dataset. The framework includes two processes, pre-learning and privacy-preserving learning.
The proposed framework integrates the approaches of both local and centralized learning. Initially, Vehicle Users partition their data by determining the optimal amount of local training and offloading data based on their computational resources. Using HE, Vehicle Users encrypt offloaded data before transmitting it to the centralized server (CS) via Roadside Units. The central server compiles the received ciphertext into an encrypted dataset and creates its own server model for training on this ciphertext dataset. Concurrently, Vehicle Users perform local training using the server’s distributed global model, which is initialized from the server model. Following local training, Vehicle Users encrypt and transmit their local model updates to the central server for aggregation.
The central server redistributes the updated global model to Vehicle Users for further training. The presented framework achieved a high accuracy of approximately 91% in attack detection. The framework was not evaluated against any attack. It also integrates only one privacy technique with FL. In [25], the authors presented an FL-based framework using Convolutional Neural Networks, called PrivatEyes, to protect gaze estimation data. The framework integrated SMPC using secret sharing. The MPIIFaceGaze, GazeCapture, and NVGaze datasets were used for evaluation. The mean angular errors (MAEs) were 6.3 for MPIIGaze, 6.2 for MPIIFaceGaze, and 0.8 for NVGaze. Using MPIIGaze, PrivatEyes produced a re-identification rate of 0/15 and a visual score of 4%.
The accuracy for the MPIIFaceGaze dataset dropped from 40% in the first round to 33% in the final round of the evaluation of membership inference attacks, while the accuracy for the MPIIGaze dataset dropped from 33% to 13%. Likewise, across the rounds, the NVGaze dataset revealed a drop from 37% to 12%. Furthermore, the prediction accuracy for gender categorization was significantly reduced by attribute inference attacks across the datasets; MPIIFaceGaze showed a decline from 5 out of 15 participants in the first round to 2 out of 15 in the last round, MPIIGaze showed a decline from 3 out of 15 participants in the first round to 0 out of 15 in the last round, and NVGaze showed a decline from 10 out of 32 participants in the first round to 1 out of 32 in the last round.
In [26], the authors proposed an FL system for the Internet of Medical Things. The system integrates SMPC and an additive secret-sharing method along with FL. With this method, the federated training model's gradient parameters, rather than the actual data, are protected. The prediction of the proposed model is made using a Convolutional Neural Network, which uses a weight-sharing technique. The model's weights are encrypted through additive secret sharing and multiple computations. The accuracy of the proposed model is 97%, and its F1-score is 89%. To the best of the authors' knowledge, this is the first study comparing different possible combinations of privacy-preserving techniques with FL. There is also no previous work considering the combination of FL with more than two of the privacy-preserving methods PATE, SMPC, and HE.

3. Methodology

3.1. Federated Learning Model

This research provides multiple new contributions to the area of privacy-preserving machine learning. To start, we propose a holistic framework that is the first to integrate federated learning (FL) with a wide range of privacy-preserving methods, extending beyond typical dual-method solutions. We provide a systematic evaluation of diverse sets of privacy-preserving methods paired with FL, examined across multiple dimensions. Specifically, we conducted extensive experiments to measure the resilience to multiple types of privacy attacks, tallying success rates under a variety of attack scenarios. We further examined the practical implications of these pairings on a number of performance indicators, including the model accuracy, computational overhead, and training duration efficiency. This thorough testing was particularly critical, as certain privacy-preserving methods, e.g., homomorphic encryption (HE), involve significant computational complexity. Our extensive analysis offers useful insights into the privacy protection, model performance, and computational cost trade-offs, thus serving as a helpful guideline for the deployment of privacy-preserving federated learning systems.
The proposed FL system uses a deep feed-forward ANN architecture intended for multi-class classification. To validate our privacy-preserving FL framework, we implemented a weight-dependent model architecture. We selected an artificial neural network (ANN) for this purpose; Figure 1 illustrates the architecture of the designed ANN classifier. Unlike a conventional setup, which uses a single ANN model, the FL system has a global ANN model and a number of local ANN models, all with the same architecture. The architecture of the ANN classifier is defined on the server side. The classifier is implemented using the Keras Sequential API, which enables a linear stack of layers. Conceptually, it consists of five layers: an input layer, three hidden layers, and an output layer. In detail, the architecture includes eleven individual layers: an input layer, an output layer, and three hidden blocks, each comprising three individual layers.
The input layer is determined by the training dataset’s feature set shape, with the number of neurons matching the feature count. This dynamic design adapts the ANN to various datasets. The model includes three hidden layers using ReLU activation. The first hidden layer has 128 neurons, Batch Normalization, and dropout (0.5). The second and third layers have 64 and 32 neurons, respectively, with similar configurations. The output layer uses a dense layer with a number of neurons equal to the unique class count, and Softmax activation generates probability distributions. The model is compiled using the Adam optimizer, categorical cross-entropy loss, and accuracy metric.
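A sketch of this classifier in Keras is shown below. The dropout rate of the second and third hidden blocks is assumed to match the first, and the explicit Input layer and the function name are ours.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, BatchNormalization, Dropout

def build_ann(num_features: int, num_classes: int) -> Sequential:
    """Deep feed-forward ANN: three hidden blocks (128/64/32) with ReLU, BatchNorm, and dropout."""
    model = Sequential([
        Input(shape=(num_features,)),               # input size follows the feature count
        Dense(128, activation="relu"),
        BatchNormalization(),
        Dropout(0.5),
        Dense(64, activation="relu"),
        BatchNormalization(),
        Dropout(0.5),
        Dense(32, activation="relu"),
        BatchNormalization(),
        Dropout(0.5),
        Dense(num_classes, activation="softmax"),   # probability distribution over classes
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```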
In FL, a client–server architecture involves a central server and three clients. Clients train locally on subsets, while the server coordinates learning and aggregates models. Initialization sets up a Flask app for communication, using ngrok for a public URL. The server loads, pre-processes, and splits data into training (80%) and testing (20%) sets, applies feature scaling, one-hot encodes target labels, and balances class weights. Each client similarly pre-processes its data.
The FL process begins with model distribution. Clients request the global model, train locally for five epochs, and send updated weights to the server, which aggregates them via averaging. This cycle repeats for five rounds, refining the global model with diverse data. After the FL rounds, the server fine-tunes the global model with early stopping and learning rate reduction, evaluates it using metrics like the accuracy and F1-score, and saves it in an h5 file for future use.
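The server-side aggregation step amounts to a simple unweighted average of the clients' weight lists (FedAvg-style). A minimal sketch, assuming Keras-style get_weights()/set_weights() lists and three clients, follows.

```python
import numpy as np

def federated_average(client_weight_lists):
    """Average the per-layer weight arrays received from all clients (equal client weighting)."""
    num_layers = len(client_weight_lists[0])
    return [np.mean([client[layer] for client in client_weight_lists], axis=0)
            for layer in range(num_layers)]

# One round, sketched: collect weights from the three clients, average, and update the global model.
# global_model.set_weights(federated_average([client1_weights, client2_weights, client3_weights]))
```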

3.2. Implementation of Privacy-Preserving Techniques

Privacy-preserving approaches are crucial in the real-world applications of FL because of the several points of vulnerability present in the FL system. FL disperses computations and data among several devices as opposed to centralizing them. By keeping raw data local, this helps preserve privacy, but exchanging information—specifically, model updates—is still part of the process. During the FL process, local models or the model updates of weights are sent between clients and the central server, and the global model is created through an aggregation process. The communication between FL-participating clients and the central server, as well as the shared model updates, provide possible gaps where privacy may be violated.
Attackers might try to intercept and manipulate the transmitted data between the clients and server. They also might try to analyze and manipulate the final global model to expose sensitive information about the data used for training. Also, if the server is unreliable, it could potentially misuse the data it obtains for performing potential attacks. Malicious clients might try to manipulate their local raw data or manipulate the local model weights themselves before sending them to the server. Furthermore, the aggregation process itself might allow for attacks. These discussed potential real risks in the FL pipeline, which ranges from local model training to data transmission, the aggregation process, and the final global model, highlight the vital necessity for strong privacy-preserving mechanisms to be incorporated along with FL.

3.2.1. Private Aggregation of Teacher Ensembles

In this FL system, the PATE is mainly implemented on the server side. After FL, the system applies the PATE as a post-processing step to provide an extra layer of privacy protection to the FL process. Following the completion of FL rounds, the server starts the PATE procedure by creating a set of multiple teacher models. In this implementation, 10 teacher models are created using the ANN architecture. The amount of information that any one teacher model may disclose about specific data points is limited by ensuring that each teacher model has only been exposed to a small portion of the entire dataset.
Each of these teacher models is then trained using a distinct and randomly selected subset of the server’s training dataset. After completing the training of the teacher models, the system begins the aggregation phase. In this aggregation phase, all teacher models’ predictions are gathered. In order to mitigate the danger of the leakage of sensitive data, a privacy-preserving aggregation mechanism is employed instead of using raw predictions from the teacher models. This is accomplished by adding DP using Laplace noise to the total predictions from all of the teacher models. In order to control the noise injection, the privacy budget, which is represented by the epsilon parameter, is set to a smaller value of 0.1 to ensure stronger privacy guarantees.
To determine the final predictions following the noisy aggregation, the class with the highest noisy count for each data point is chosen. The final stage of the PATE process involves the creation and training of the student model. The final global model, which has already undergone the FL process, is used as the student model. The student model is trained on the new aggregated, privacy-preserving labels of predictions generated from the teacher ensemble through the PATE process.
In conclusion, the PATE is implemented as an additional processing phase subsequent to FL. It starts by creating an ensemble of teacher models and then aggregates these teacher models' predictions using a privacy-preserving aggregation technique employing DP. Finally, the student model (the global model) is trained on these privacy-preserving predictions. By implementing the PATE in this manner, the advantages of FL (keeping data decentralized) can be combined with the PATE's privacy guarantees.
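A condensed sketch of this server-side post-processing step follows. Here, build_model() stands in for the ANN constructor, the data relabeled by the ensemble are assumed to be a held-out set X_public, and the number of training epochs is an assumption; only the number of teachers (10) and the privacy budget (epsilon = 0.1) are taken from the description above.

```python
import numpy as np

def pate_post_process(global_model, X_train, y_train, X_public, build_model,
                      n_teachers=10, epsilon=0.1):
    """Train 10 teachers on disjoint random subsets, aggregate their votes with
    Laplace noise (epsilon = 0.1), and retrain the FL global model as the student."""
    n_classes = y_train.shape[1]
    subsets = np.array_split(np.random.permutation(len(X_train)), n_teachers)
    teachers = []
    for subset in subsets:
        teacher = build_model()
        teacher.fit(X_train[subset], y_train[subset], epochs=5, verbose=0)
        teachers.append(teacher)
    # Noisy-argmax aggregation of the teachers' votes on the data to be relabeled.
    votes = np.stack([t.predict(X_public, verbose=0).argmax(axis=1) for t in teachers])
    noisy_labels = []
    for sample_votes in votes.T:
        counts = np.bincount(sample_votes, minlength=n_classes)
        noisy = counts + np.random.laplace(0.0, 1.0 / epsilon, n_classes)
        noisy_labels.append(int(np.argmax(noisy)))
    # The FL global model becomes the student and is retrained on the private labels.
    global_model.fit(X_public, np.eye(n_classes)[noisy_labels], epochs=5, verbose=0)
    return global_model
```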

3.2.2. Secure Multi-Party Computation

In this experiment, SMPC was added to the implementation of the predefined FL system described in Section 3.1 on both the server and client sides. L2 regularization was added to the ANN model. Each dense layer except the output layer has an L2 regularization parameter of 0.01. The PySyft library, which offers tools for privacy-preserving ML, is used for the implementation of SMPC in this FL system. PySyft workers are initialized by both the FL server and clients for secure computations and the aggregation of model weights. The server initializes and creates its own PySyft worker on the server side, whereas each client participating in the FL process also initializes and creates its own PySyft worker.
Specifically, one worker is used for the server, and there are three client workers (one worker for each client). These PySyft workers are responsible for the secure processing and computation of the data and ANN model parameters for the SMPC technique. PySyft tensors are used to manage and transmit the ANN model weights between FL-participating clients and the central server. These tensors are used primarily on the server side in the process of receiving and aggregating updated ANN model weights from clients. When the server receives updated model weights from each client, it transforms them into PySyft tensors for the secure handling of these weights in the subsequent operations of aggregation and DP.
After receiving the updated ANN model weights from all participating clients, the server performs the FL aggregation process on these weights. As the averaged model weights produced by the aggregation process could potentially reveal information about the data of individual clients, Gaussian noise is added to the aggregated average model weights in order to improve privacy and achieve DP. The noise scale is calculated based on the sensitivity and epsilon parameters. The epsilon parameter is set to a smaller value of 0.1 to ensure stronger privacy guarantees, while the sensitivity parameter, which represents the maximum effect on the output that a single data point can have, is also set to 0.1. Using these two parameters, random noise from a Gaussian distribution is generated, scaled properly, and added to the averaged weight.
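A minimal sketch of this differential-privacy step is given below. The noise scale sensitivity/epsilon is one straightforward calibration and is an assumption here, since the exact formula is not spelled out in the text; the helper name is also ours.

```python
import numpy as np

def add_gaussian_dp_noise(averaged_weights, epsilon=0.1, sensitivity=0.1):
    """Add Gaussian noise to the averaged model weights before updating the global model."""
    scale = sensitivity / epsilon   # assumed calibration of the noise scale
    return [w + np.random.normal(0.0, scale, size=w.shape) for w in averaged_weights]

# noisy_weights = add_gaussian_dp_noise(federated_average(client_weight_lists))
```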
On the client side, clients train their local models and then transform the resulting weights into PySyft tensors for secure aggregation and manipulation without disclosing the underlying data. All clients ensure the consistent use of L2 regularization. After receiving the global model from the server, each client dynamically applies L2 regularization (0.01) to all applicable layers of the received ANN global model. Performing this regularization on the client side guarantees consistency across clients by ensuring that all clients apply regularization at the same level, independently of how the model was initially established on the server side. We also tested this implementation of SMPC with FL without the addition of regularization and DP.

3.2.3. Homomorphic Encryption

HE was added to the implementation of the predefined FL system described in Section 3.1 on both the server and client sides. The CKKS (Cheon–Kim–Kim–Song) scheme is used for the implementation of HE. The TenSEAL library is used by both the server and the client to set up an identical context for the CKKS scheme. This context configures crucial parameters such as the polynomial modulus degree and coefficient modulus bit sizes, which specify the computational capabilities and security level of the encryption. The polynomial modulus degree is set to 8192, while the coefficient modulus bit sizes are set to [60, 40, 40, 60]. Additionally, the global scale, which affects the precision of encrypted computations, is set to $2^{40}$.
The encryption process is primarily conducted on the client side. The clients encrypt their model weights before sending them to the server. The encryption process can deal with various types of model layers. It behaves differently for each type. It uses the CKKS encryption technique to manage dense layer weights and biases. Each row is encrypted independently as a vector for dense layers, which normally include a 2D array of weights. Bias layers, being 1D, are encrypted as a single vector. These encrypted weights are then prepared for network transmission by serializing them into Base64-encoded strings following encryption.
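A sketch of this client-side encryption step is shown below, assuming a shared TenSEAL context configured as described above; the helper name and the exact per-layer handling are ours.

```python
import base64
import numpy as np
import tenseal as ts

def encrypt_layer(context, weights: np.ndarray):
    """Encrypt one layer's parameters with CKKS and serialize them for transmission.

    2D dense kernels are encrypted row by row; 1D bias vectors as a single vector.
    """
    if weights.ndim == 2:
        vectors = [ts.ckks_vector(context, row.tolist()) for row in weights]
    else:
        vectors = [ts.ckks_vector(context, weights.tolist())]
    return [base64.b64encode(v.serialize()).decode("ascii") for v in vectors]
```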
On the server side, the server performs the aggregation and processing of the encrypted gradients from all participants. When the server receives the clients’ encrypted gradients, it first deserializes these received encrypted gradients back into their form of encrypted vectors and then performs the aggregation process on them without decryption. The aggregation process computes the average of these encrypted gradients from all participants. Since the ANN model includes multiple layers, each with a unique set of parameters (weights and biases), the aggregation process is performed layer by layer.
The aggregation is performed separately for each layer in order to maintain the ANN model’s structure. Dense layer weights and biases are handled separately. Regarding dense layers (2D), the gradients of each neuron are aggregated individually. All of the clients’ encrypted gradients for each neuron are deserialized, and homomorphic addition is used to add them. The outcome is then scaled using homomorphic multiplication by dividing it by the total number of clients. For bias layers (1D), since there is only one vector per layer, the procedure is comparable but less complicated. The bias’s client gradients are all deserialized and added together. After that, the outcome is divided by the total number of clients.
Throughout this aggregation process, the data remain in their encrypted form. The homomorphic properties of the CKKS scheme allow for these arithmetic operations (addition and multiplication by a scalar) to be performed on the encrypted data without the need for decryption. After finishing the aggregation, the global model is updated using its result, which is a set of averaged, encrypted gradients that show the overall update from all clients. Specifically, the resulting aggregated gradients are first decrypted by the server. To maintain training stability and enhance privacy, the server then applies gradient clipping, a method that restricts the magnitude of gradients, after decryption.
The process of gradient clipping involves computing the total L2 norm of all combined gradients and comparing it to a maximum norm that has been predetermined, in this case 1.0. In the event that the overall norm is higher than this cutoff, all gradients are resized proportionately to make sure their total magnitude stays below the threshold. This has dual purposes: it prevents the model from receiving extreme updates that could destabilize the training process, and it enhances privacy by limiting the impact of any single participant’s update on the global model. The server updates the global model using these clipped gradients.
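The following sketch condenses the server-side steps just described: the encrypted per-row gradients from all clients are deserialized, summed homomorphically, scaled by 1/num_clients, and, after decryption, clipped to a global L2 norm of 1.0. The shared TenSEAL context (holding the secret key, since the server decrypts here) and the helper names are assumptions.

```python
import base64
import numpy as np
import tenseal as ts

def aggregate_encrypted_row(context, client_payloads):
    """Average one encrypted gradient row across clients without decrypting it."""
    vectors = [ts.ckks_vector_from(context, base64.b64decode(p)) for p in client_payloads]
    total = vectors[0]
    for v in vectors[1:]:
        total = total + v                       # homomorphic addition
    return total * (1.0 / len(vectors))         # homomorphic scaling by 1/num_clients

def clip_by_global_norm(gradients, max_norm=1.0):
    """Rescale the decrypted gradients so that their total L2 norm stays below max_norm."""
    total_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in gradients)))
    if total_norm > max_norm:
        gradients = [g * (max_norm / total_norm) for g in gradients]
    return gradients
```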
HE is integrated with the FL model in the stage of client-side gradient encryption. Clients use the CKKS schema for HE to encrypt their computed gradients before submitting updates to the server. HE is also integrated with the FL model in the stage of server-side secure aggregation performed on encrypted gradients. This is yet another crucial phase in which HE is included. Using homomorphic operations, the server performs aggregation on the encrypted gradients from each client. This makes it possible to perform computations with encrypted material without first decrypting it. In particular, the homomorphic operations used are addition and multiplication. HE is also used in the server-side decryption process. After aggregation, the server decrypts the result to apply the clipping technique in order to maintain training stability and enhance security.

3.3. Configuration Combinations

Table 1 illustrates different combinations of FL with privacy-preserving techniques. The first column shows the model number, and the second column shows the model itself, while the included methods are represented in the last four columns (third, fourth, fifth, and sixth columns). FL, the PATE, SMPC, and HE represent the included methods. The ✓ symbol means that this method is included in the corresponding model, while the x symbol means that the method is not included in the model. For example, the third model, FL_SMPC_DP, includes FL and SMPC, and the sixth model, FL_CKKS, includes FL and HE, while the last model, FL_PATE_CKKS_SMPC, applies all the included methods of FL, the PATE, SMPC, and HE. The main purpose of including several models was to conduct a comparative study with an analysis of the performance and security of different possible integrations of privacy-preserving methods with FL.
The first model, FL_only, represents the base model to which the rest of the nine models were compared. The FL_only model uses a deep feed-forward ANN architecture which consists of five conceptual layers: an input layer, three hidden layers, and an output layer. To implement FL, a client–server architecture involving a central server and three clients is used. Clients train locally on their local data, while the server coordinates learning and aggregates models. Specifically, the FL process begins with the distribution of the ANN model. Clients request the global model, train locally for five epochs, and send updated weights to the server, which aggregates them via averaging. This cycle repeats for five rounds, refining the global model with diverse data. After the FL rounds, the server fine-tunes the global model with early stopping and learning rate reduction and evaluates it using metrics such as the accuracy, precision, recall, and F1-score.
The second model, FL_SMPC, combines SMPC with FL. The purpose behind the addition of SMPC to FL is to protect the model updates before transmission, which in turn helps in securing the communication between the server and clients. SMPC was added to the implementation of the FL_only model on both the server and client sides using the PySyft library. PySyft tensors are used to manage and transmit weights among clients and the central server. These tensors are used for both clients and the server. On the server side, the tensors are used primarily in the process of receiving and aggregating updated ANN model weights from clients. When the server receives updated model weights from each client, it transforms them into PySyft tensors for the secure handling of these weights in the averaging aggregation process. On the client side, clients train their local models and then transform the resulting weights into PySyft tensors before transmission.
The third model, FL_SMPC_DP, integrates SMPC with FL. This model protects the model updates using SMPC. To provide an extra layer of privacy protection to the FL_only model, SMPC was added to its implementation on both the server and client sides using the PySyft library. PySyft tensors are used to manage and transmit model updates between FL-participating clients and the central server. These tensors are used primarily on the server side in the process of receiving and aggregating updated ANN model weights from clients. As the averaged model weights produced by the aggregation process could potentially reveal information about the data of individual clients, Gaussian noise is added to the aggregated average model weights in order to improve privacy and achieve DP. On the client side, clients train their local models and then transform the resulting weights into PySyft tensors before transmission. To enhance security and generalization, L2 regularization was added to the ANN model on the server side. All clients also ensure the consistent use of L2 regularization. After receiving the global model from the server, each client dynamically applies L2 regularization to all applicable layers of the received ANN global model. The main difference between the FL_SMPC model and FL_SMPC_DP model is that the FL_SMPC model does not include DP and regularization, while the FL_SMPC_DP model includes this extra layer.
The fourth model, FL_PATE, integrates the PATE with FL. To provide an extra layer of privacy protection to the FL_only model, the PATE is implemented as a post-processing step. After the completion of the FL process, the PATE is implemented to protect the final FL model from being attacked. The PATE is mainly implemented on the server side; after the completion of the FL rounds, the server starts the PATE procedure by creating a set of 10 teacher models. Each of these teacher models is then trained using a distinct and randomly selected subset of the server’s training dataset. In order to mitigate the danger of the leakage of sensitive data, DP using Laplace noise is added to the total predictions from all of the teacher models. Finally, the student model (the global model) is subsequently trained using the new aggregated, privacy-preserving labels of predictions generated from the teacher ensemble through the PATE process. The student model then serves as the final global model and is used for the evaluation process.
The fifth model, FL_PATE_SMPC, combines SMPC and the PATE with FL. This model protects both the model updates using SMPC and the final FL model through the PATE. SMPC was added to the FL_only model’s implementation on both the server and client sides using the PySyft library. As the averaged model weights produced by the aggregation process could potentially reveal information about the data of individual clients, Gaussian noise is added to the aggregated average model weights in order to improve privacy and achieve DP. On the client side, clients train their local models and then transform the resulting weights into PySyft tensors before transmission. To enhance security and generalization, L2 regularization was added to the ANN model on the server side. After receiving the global model from the server, each client dynamically applies L2 regularization to all applicable layers of the received ANN global model. After the completion of the FL process with SMPC, the PATE is implemented to secure the final FL model from various attacks. The PATE is mainly implemented on the server side. After the completion of the FL rounds, the server starts the PATE procedure by creating a set of 10 teacher models. Each of these teacher models is then trained using a distinct and randomly selected subset of the server’s training dataset. In order to mitigate the danger of the leakage of sensitive data, DP using Laplace noise is added to the total predictions from all of the teacher models. Finally, the student model (the global model) is subsequently trained using the new aggregated, privacy-preserving labels of predictions generated from the teacher ensemble through the PATE process. The student model then serves as the final global model and is used for the evaluation process.
The sixth model, FL_CKKS, integrates HE with FL. By adding HE to the FL_only model, the data and model updates are protected during computation and transmission, which in turn helps to secure the communication between the server and clients. Using the CKKS scheme, HE was added to the implementation of the FL_only model on both the server and client sides. Clients encrypt their model updates before sending them to the server, while the server performs the aggregation and processing of the encrypted updates from all participants without the need for decryption. The aggregation process computes the average of these encrypted updates from all participating clients, and throughout this process the data remain in encrypted form. After the completion of the aggregation process, the global model is updated using its result, which is a set of averaged, encrypted gradients that reflect the overall update from all clients.
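The encrypt-aggregate-decrypt flow described above can be sketched with the TenSEAL library as one possible CKKS implementation; the library choice, the encryption parameters, and the toy update size are assumptions for illustration only.

```python
import numpy as np
import tenseal as ts

# CKKS context shared between the server and clients (illustrative parameters).
context = ts.context(ts.SCHEME_TYPE.CKKS,
                     poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40
context.generate_galois_keys()

def encrypt_update(flat_weights):
    # Client side: encrypt a flattened weight/gradient vector before sending it.
    return ts.ckks_vector(context, flat_weights.tolist())

def aggregate_encrypted(encrypted_updates):
    # Server side: average the encrypted updates without decrypting them.
    total = encrypted_updates[0]
    for enc in encrypted_updates[1:]:
        total = total + enc
    return total * (1.0 / len(encrypted_updates))

# Three simulated clients, each contributing a small encrypted update.
updates = [encrypt_update(np.random.randn(4)) for _ in range(3)]
encrypted_average = aggregate_encrypted(updates)
averaged_update = np.array(encrypted_average.decrypt())  # decrypted only where permitted
```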
The seventh model, FL_CKKS_DP, integrates HE (using the CKKS scheme) and DP with FL. Through HE, the data and model updates are protected during computation and transmission, which in turn helps to secure the communication between the server and clients. HE based on the CKKS scheme was added to the implementation of the FL_only model on both the server and client sides. Clients encrypt their model updates before sending them to the server, while the server performs the aggregation and processing of the encrypted updates from all participants without the need for decryption. The aggregation process computes the average of these encrypted updates from all participating clients, and throughout this process the data remain in encrypted form. To preserve the final model’s accuracy while increasing its robustness against attacks targeting individual clients’ training data, DP is integrated with HE and FL. DP is used to protect the privacy of each client’s training data, and the final global model is updated with noisy model updates. Using the Gaussian distribution mechanism, DP is added to the aggregated model updates of all FL-participating clients.
The eighth model, FL_CKKS_SMPC, integrates HE (using the CKKS scheme) and SMPC with FL. Through HE, the FL_CKKS_SMPC model protects the data and model updates during computation and transmission, which in turn helps to secure the communication between the server and clients. Using the CKKS scheme, HE was added to the implementation of the FL_only model on both the server and client sides. Clients encrypt their model updates before sending them to the server, while the server performs the aggregation and processing of the encrypted updates from all participants without the need for decryption. The aggregation process computes the average of these encrypted updates from all participating clients, and throughout this process the data remain in encrypted form. Through SMPC, the FL_CKKS_SMPC model protects the aggregation process itself. SMPC is implemented during the aggregation process, which is performed using multiple PySyft workers with the data distributed among them. Specifically, a list of workers is created, one for each encrypted gradient set; the encrypted gradients are then distributed to the respective workers, and the aggregation is performed across all workers.
The ninth model, FL_PATE_CKKS, combines the PATE and HE (using the CKKS scheme) with FL. This model protects both the final model, through the PATE, and the model updates, through HE. Using the CKKS scheme, HE was added to the implementation of the FL_only model on both the server and client sides. Clients encrypt their model updates before sending them to the server, while the server performs the aggregation and processing of the encrypted updates from all participants without the need for decryption. The aggregation process computes the average of these encrypted updates from all participating clients, and throughout this process the data remain in encrypted form. After the completion of the FL process with HE, the PATE is implemented to protect the final global model from being attacked. The PATE is mainly implemented on the server side; after the completion of the FL rounds, the server starts the PATE procedure by creating a set of 10 teacher models. Each of these teacher models is then trained using a distinct and randomly selected subset of the server’s training dataset. In order to mitigate the danger of the leakage of sensitive data, DP using Laplace noise is added to the total predictions from all of the teacher models. Finally, the student model (the global model) is trained using the aggregated, privacy-preserving labels generated from the teacher ensemble through the PATE process. The student model then serves as the final global model and is used for the evaluation process.
The tenth model, FL_PATE_CKKS_SMPC, combines the PATE, HE, and SMPC with FL. This model protects the final model through the PATE, the model updates through HE, and the aggregation process through SMPC. Using the CKKS scheme, HE was added to the implementation of the FL_only model on both the server and client sides. Clients encrypt their model updates before sending them to the server, while the server performs the aggregation and processing of the encrypted updates from all participants without the need for decryption. The aggregation process computes the average of these encrypted updates from all participating clients, and throughout this process the data remain in encrypted form. Through SMPC, the FL_PATE_CKKS_SMPC model protects the aggregation process itself. SMPC is implemented during the aggregation process, which is performed using multiple PySyft workers with the data distributed among them. Specifically, a list of workers is created, one for each encrypted gradient set; the encrypted gradients are then distributed to the respective workers, and the aggregation is performed across all workers. After the completion of the FL process with HE and SMPC, the PATE is implemented to protect the final global model from being attacked. The PATE is mainly implemented on the server side. After the completion of the FL rounds, the server starts the PATE procedure by creating a set of 10 teacher models. Each of these teacher models is then trained using a distinct and randomly selected subset of the server’s training dataset. In order to mitigate the danger of the leakage of sensitive data, DP using Laplace noise is added to the total predictions from all of the teacher models. Finally, the student model (the global model) is trained using the aggregated, privacy-preserving labels generated from the teacher ensemble through the PATE process. The student model then serves as the final global model and is used for the evaluation process.
In the FL_CKKS_DP model, DP noise generated with the Gaussian distribution mechanism is added to the aggregated gradients of all FL-participating clients. The amount of DP noise introduced depends on several significant parameters, the two most important being delta ($\delta$) and epsilon ($\varepsilon$). As detailed in Table 2, privacy budgets $\varepsilon$ ranging from extremely strong privacy (0.01) to moderate privacy (5.0) were analyzed to understand the trade-off between the privacy budget and the utility of the model. Although raising $\varepsilon$ can improve accuracy, larger values are only appropriate for applications with weaker privacy requirements, so privacy-preserving applications should favor strong privacy settings. Despite larger values of $\varepsilon$ yielding superior accuracy of up to 88.40% at $\varepsilon = 5.0$, we opted for $\varepsilon = 0.1$ because it provides robust privacy guarantees while still ensuring a satisfactory accuracy of 70.50%. For the subsequent attack analysis, we therefore focused on $\varepsilon = 0.1$ to evaluate security under strict privacy conditions. Delta $\delta$ (the probability of a privacy violation) was set to $1 \times 10^{-5}$ to guarantee a negligible risk of privacy failure. The sensitivity is the maximum amount by which the output can change when a single data point is added or removed; because the gradients were normalized, the sensitivity parameter was set to 1.0.
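For reference, a standard calibration of the Gaussian mechanism for $(\varepsilon, \delta)$-DP (valid for $0 < \varepsilon < 1$) sets $\sigma = \Delta \sqrt{2 \ln(1.25/\delta)} / \varepsilon$; the snippet below applies this calibration to the parameters listed above, shown as one common choice rather than the exact routine of our implementation.

```python
import numpy as np

def gaussian_mechanism_sigma(epsilon, delta, sensitivity=1.0):
    """Classic (epsilon, delta)-DP Gaussian mechanism calibration, valid for
    0 < epsilon < 1: sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon."""
    return sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

sigma = gaussian_mechanism_sigma(epsilon=0.1, delta=1e-5, sensitivity=1.0)
# The noise is then drawn per coordinate of the aggregated, normalized gradients:
# noisy_update = update + np.random.normal(0.0, sigma, size=update.shape)
```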

3.4. Evaluation Metrics

In addition to the time, accuracy, precision, recall, F1-score, and classification report, the ASR was used for evaluation. The time metric represents the total training time, i.e., the overall duration from the beginning of the FL process to its completion, including all FL rounds along with the fitting of the final model. The ASR is the proportion of successful attack attempts to the total number of attack attempts. The accuracy is the percentage of correct predictions (both true positives and true negatives) among all analyzed instances.
The precision is the ratio of true positive predictions to all positive predictions (true positives + false positives), which is the positive predictive value of the model. The recall, or sensitivity, is calculated as the ratio of true positive predictions to all actual positive cases (true positives + false negatives), which reflects the ability of the model to detect positive cases comprehensively. The F1-score is a single score that achieves a balance between the precision and recall by taking the harmonic mean of both metrics. The main metrics for each class in the classification problem are comprehensively summarized in a classification report. The precision, recall, and F1-score for every class are usually included in a classification report. The report also provides the model’s overall accuracy, along with the macro average and weighted average for every class.
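All of these metrics can be computed with scikit-learn, as in the sketch below; the weighted averaging shown is an illustrative choice for the binary malware/benign task.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, classification_report)

def evaluate_model(y_true, y_pred):
    """Return the performance metrics used in this study for one model."""
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="weighted"),
        "recall":    recall_score(y_true, y_pred, average="weighted"),
        "f1_score":  f1_score(y_true, y_pred, average="weighted"),
        "report":    classification_report(y_true, y_pred),
    }
```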

4. Experimental Setup

4.1. Dataset

The experiments were conducted on the Malware Dataset from Kaggle (https://www.kaggle.com/datasets/blackarcher/malware-dataset), which is a classification-based Portable Executable (PE) dataset of malicious and benign files. It was created with the aid of a Python 3.11 library and includes the data of both normal and malware PE files. It can be used for the training and testing of various ML models, and its main purpose is ML-based malware identification.
The dataset consists of a total of 100,000 files. Each file represents an entry in the dataset. The dataset is balanced as it contains 50,000 samples for malware and 50,000 samples for benign files. The dataset consists of a total of 35 feature columns, such as the hash, which is a unique identifier for each file, and the classification, which represents the labeled feature in the dataset as “benign” or “malware”.
Specifically, the dataset provides a dynamic examination of Android apps and includes 35 features (dimensions) and 100,000 samples describing runtime behavior. The data were collected from 100 distinct programs (50 malicious and 50 benign), each monitored for 1000 ms. The identification features are the classification (object/string type), which labels each sample as either “malware” or “benign”, and the hash (object/string type), which contains either SHA-256 hashes or APK filenames that uniquely identify each application. The remaining 33 features are integers (int64). Table 3 shows the feature set of the dataset.

4.2. Hardware and Software Environment

The experiments were performed on Google Colab. The CPU specification was an Intel Xeon CPU @ 2.20 GHz (two cores, four threads, x86_64 architecture). The total memory was 12 GB. The Python programming language was used for all implementations. For testing and training the ML models, a reduced dataset was generated, which was a representative subset of the original Malware Dataset offered by Kaggle with the same balance of classification feature labels.
The reduced dataset consisted of 5000 malware and benign files extracted from the original dataset. This reduced dataset decreased the size of the original full dataset while maintaining the same class distribution in the classification column. In all experiments, 20% of the data were used for testing and 80% for training the ML models.
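A balanced subset and split of this kind can be produced as sketched below; the CSV file name is hypothetical, and the sketch assumes 2500 samples per class (5000 in total) drawn according to the dataset's classification column.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file name for the Kaggle Malware Dataset export.
df = pd.read_csv("malware_dataset.csv")

# Draw a balanced 5000-sample subset (2500 per class), preserving the
# class distribution of the classification column.
reduced = (df.groupby("classification", group_keys=False)
             .apply(lambda g: g.sample(n=2500, random_state=42)))

X = reduced.drop(columns=["hash", "classification"])
y = reduced["classification"].map({"benign": 0, "malware": 1})

# 80/20 stratified train/test split, as used in all experiments.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```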

4.3. Attack Simulation

Due to the wide scope of our study, which considered a range of privacy-preserving techniques (the PATE, SMPC, and HE) and their combinations, we had to compare the models under different types of attacks. Each privacy-preserving technique was developed to guard against certain vulnerabilities, making it essential to compare them across several attack vectors. Our comparison covered poisoning attacks, model inversion attacks, backdoor attacks, and man in the middle (MITM) attacks. Although every kind of attack inherently demands a particular definition and measurement method, this multi-faceted test strategy offered better insight into the protection capabilities of every privacy-preserving configuration.

4.3.1. Poisoning Attack

This experiment simulated a realistic attack scenario in which a malicious client participating in the FL system tries to manipulate the global model by poisoning its data, and the central FL server must identify and mitigate the effect of these attacks. All models were evaluated against this attack. The attack was implemented on the client side, specifically in the malicious client, which was the first client. The malicious client altered its training data before training its local model, so the poisoned data were used for local training.
For an untargeted poisoning attack, the data modification stage was performed by randomly flipping a portion of the classification labels to wrong classes. For a targeted poisoning attack, the data modification stage was performed by randomly flipping a portion of the classification labels to the target class. A flipping ratio of 0.3 was used. After modifying these labels, the malicious client performed local model training on its poisoned data and then sent the poisoned model weights to the server. The server aggregated the model weight updates from all clients, so the malicious client’s poisoned weights were aggregated with the honest clients’ weights.
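The label-flipping step performed by the malicious client can be sketched as follows; the function covers both the untargeted and targeted variants, the default flipping ratio mirrors the 0.3 used above, and the function name and seed are illustrative.

```python
import numpy as np

def poison_labels(y, flip_ratio=0.3, target_class=None, num_classes=2, seed=0):
    """Flip a fraction of the malicious client's labels before local training.
    If target_class is None, the selected labels are flipped to a random wrong
    class (untargeted attack); otherwise they are set to target_class
    (targeted attack)."""
    rng = np.random.default_rng(seed)
    y_poisoned = np.asarray(y).copy()
    idx = rng.choice(len(y_poisoned), size=int(flip_ratio * len(y_poisoned)),
                     replace=False)
    for i in idx:
        if target_class is None:
            wrong = [c for c in range(num_classes) if c != y_poisoned[i]]
            y_poisoned[i] = rng.choice(wrong)
        else:
            y_poisoned[i] = target_class
    return y_poisoned
```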
Following the completion of the FL process, the attack was evaluated on the server side. The evaluation quantified the extent to which the poisoning attack succeeded in altering and degrading the performance of the model. The ASR captures the overall misclassification rate introduced by the attack as the proportion of misclassified instances after the attack. Equation (5) shows how the poisoning ASR was measured for an untargeted attack, while Equation (6) calculates the targeted poisoning ASR. In order to evaluate the overall performance of the model, the server additionally computed common metrics such as the accuracy, precision, recall, and F1-score.
$$\mathrm{UPASR} = \frac{MI}{I} \quad (5)$$
where $\mathrm{UPASR}$ refers to the untargeted poisoning ASR, $MI$ is the total number of misclassified instances, and $I$ is the total number of instances.
$$\mathrm{TPASR} = \frac{TMI}{INT} \quad (6)$$
where $\mathrm{TPASR}$ stands for the targeted poisoning ASR, $TMI$ is the number of instances successfully misclassified as the target class, and $INT$ is the total number of instances not originally in the target class.

4.3.2. Backdoor Attack

This experiment simulated a realistic attack scenario in which a malicious client participating in the FL system tries to perform a backdoor attack by manipulating and altering its local data to influence the global model, and the central FL server must identify and mitigate the effect of these attacks. All models were evaluated against this attack. The backdoor attack was implemented on the client side, specifically in the malicious client, which was specified with a backdoor client ID. The first client was the malicious client in all experiments. The malicious client performed the attack by manipulating its local dataset before training.
Specifically, the malicious client altered 10% of its local training dataset by modifying the first dataset feature to the trigger value and also altering the related labels to those of a target class (the last class in the dataset). For model training, every client—even the malicious one—performed training on its own dataset and then sent model updates to the server, so when the malicious client was participating in the FL process, it updated the server with its local model updates, which had been attacked. The backdoor input was unintentionally included by the server when it aggregated the model updates from all clients, including the malicious one.
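The malicious client's data manipulation can be sketched as follows; the trigger value and target class below are illustrative placeholders, while the 10% poisoning fraction matches the description above.

```python
import numpy as np

def inject_backdoor(X, y, trigger_value=9999.0, target_class=1,
                    poison_fraction=0.10, seed=0):
    """Backdoor 10% of the malicious client's local data: set the first
    feature to the trigger value and relabel the sample as the target class."""
    rng = np.random.default_rng(seed)
    X_bd, y_bd = np.asarray(X, dtype=float).copy(), np.asarray(y).copy()
    idx = rng.choice(len(y_bd), size=int(poison_fraction * len(y_bd)),
                     replace=False)
    X_bd[idx, 0] = trigger_value   # trigger pattern in the first feature
    y_bd[idx] = target_class       # attacker's target class
    return X_bd, y_bd
```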
Through several FL rounds, the attack continued, progressively integrating the backdoor input into the global model. Once the FL process had finished, the attack evaluation was performed on the server side. The proportion of the backdoor input samples that a model identifies as the target class is known as the ASR. A higher ASR indicates a more successful attack. Equation (7) shows how the backdoor attack success rate was calculated.
$$\mathrm{BASR} = \frac{CBS}{BS} \quad (7)$$
where $\mathrm{BASR}$ stands for the backdoor ASR, $CBS$ is the number of correctly classified backdoor samples, and $BS$ is the total number of backdoored samples.

4.3.3. Model Inversion Attack

In this experiment, an attack was conducted on the server script, specifically on the server’s final global model after the completion of the FL rounds. Finding an input that the model identifies as belonging to the target class is the aim of this attack, as it may disclose confidential information about that class. Using the trained global model, the attack tries to reconstruct input data for every class in the dataset. In essence, it attempts to reverse-engineer what kind of input data would cause the model to predict a particular class. It achieves this by iteratively optimizing a random input to generate an output that fits the desired target class.
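A minimal sketch of this reconstruction loop is shown below, written in PyTorch as an assumed framework (the deep-learning framework is not restated here); the number of optimization steps and the learning rate are illustrative.

```python
import torch
import torch.nn.functional as F

def invert_class(model, input_dim, target_class, steps=500, lr=0.1):
    """Optimize a random input so that the frozen global model classifies it
    as `target_class`, yielding a reconstructed representative of that class."""
    model.eval()
    x = torch.randn(1, input_dim, requires_grad=True)
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(x)
        loss = F.cross_entropy(logits, torch.tensor([target_class]))
        loss.backward()
        optimizer.step()
    return x.detach()
```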
In order to evaluate the effectiveness of the attack, the attack’s reconstructed data were compared to the original sample for each class. The Mean Squared Error (MSE) was used as a key metric for the evaluation of the attack. The MSE calculates the average squared difference between the original data and the attack’s reconstructed data. Lower MSE values indicate better reconstruction and thus a more successful attack. This measurement metric was computed for every class in which the attack was carried out; therefore, it was computed for both of the dataset’s two classification classes. To provide a general evaluation of the attack’s efficacy, the average MSE was computed across all classes in addition to the MSE that was generated for each class.
The MSE for every class was computed using Equation (8) as follows:
$$\mathrm{MSE}_c = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \quad (8)$$
where $\mathrm{MSE}_c$ is the MSE for class $c$, $n$ is the number of features in the sample, $y_i$ is the $i$th feature of the original sample, and $\hat{y}_i$ is the $i$th feature of the reconstructed sample.
The average MSE across all classes was calculated using Equation (9) as follows:
$$\text{Average MSE} = \frac{1}{C}\sum_{c=1}^{C}\mathrm{MSE}_c \quad (9)$$
where $C$ is the total number of classes and $\mathrm{MSE}_c$ is the MSE for class $c$.

4.3.4. The Man in the Middle Attack

In this experiment, the attack was executed through a proxy server that intercepted the communication between the clients and the server. In FL, several clients work together to train a common global model, and the man in the middle (MITM) attack targets the communication that this collaboration requires. The MITM attack is implemented by placing a proxy server between the clients and the real server. During FL, the participating clients repeatedly request the global model from the actual server. Once the global model has been received, the clients train it on their local datasets and then send the modified, updated weights back to the server to continue the FL process.
The proxy server was a crucial component in the assessment of the applied attack. It performed the attack using a gradient manipulation method: it executed the MITM attack by intercepting the communication in which a client provides the server with its updated weights. Specifically, to construct the attacked weights, the proxy server first obtained the original weights from the client by intercepting the communication, and then introduced Gaussian noise with a standard deviation of 0.1 into the client’s updated weight values. In this case, two distinct global models were maintained by the actual server: one for the original weights and another for the attacked weights. In the case of models combined with HE, the proxy directly manipulated the CKKS-encrypted gradients without decrypting them, adding encrypted Gaussian noise during training; because HE allows for computations on encrypted data, this enabled the evaluation of the attack in an encrypted environment.
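The proxy's gradient-manipulation step on unencrypted updates can be sketched as follows; the standard deviation of 0.1 matches the description above, while the function name and the use of NumPy arrays are illustrative.

```python
import numpy as np

def mitm_perturb(client_weights, noise_std=0.1, seed=0):
    """Proxy-side manipulation: add Gaussian noise (std 0.1) to each layer of
    the intercepted client update. Both the original and the attacked copies
    are forwarded so the server can maintain two separate global models."""
    rng = np.random.default_rng(seed)
    attacked = [w + rng.normal(0.0, noise_std, size=w.shape)
                for w in client_weights]
    return client_weights, attacked
```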
The original and attacked updates (weights/gradients) were both sent to the real server by the proxy server. The server then handled the original and attacked updates independently when receiving them from the proxy server by performing aggregation separately on both the original and attacked updates. It kept track of two global models: the attacked model, which was updated with the aggregated attacked updates, and the original global model, which was updated with the aggregated original updates. After the completion of FL, the server evaluated the effectiveness of the attack by keeping these two distinct global models: the original and attacked global models.
The evaluation was performed by comparing the performance of the final attacked model, which had been impacted by the injected attacked updates, with the performance of the final original global model, which reflected the FL process in the absence of the attack. This enabled the server to measure the impact of the attack by calculating the amount of performance that the attack had caused the model to lose. For both the original global model and the attacked model, the server computed metrics such as the accuracy, precision, recall, and F1-score. Then, it used them to find the percentage drop in each metric measure. The impact of the attack (ASR) was calculated using the percentage decrease for each of the performance metrics of the accuracy, precision, recall, and F1-score using Equation (10).
$$PD = \frac{(O - A)}{O} \times 100 \quad (10)$$
where $PD$ stands for the percentage decrease, $O$ is the original value of the accuracy, precision, recall, or F1-score metric before the attack, and $A$ is the value of the same metric after the attack.

5. Results and Analysis

5.1. Performance Analysis

Table 4 shows the evaluation of all the models, including the time metric along with the performance metrics of the accuracy, precision, recall, and F1-score. The corresponding classification results for all the models are presented in Table 5. The execution time increased as more privacy-preserving methods were combined. The FL_only model had the quickest execution time, with a minimum value of 82.85 s, because it applied FL without any combinations, while the FL_PATE_CKKS_SMPC model had the slowest execution time, with a maximum value of 298.60 s, because it included the largest number of combined techniques among all the models.
FL_SMPC was the second-fastest model after FL_only, with a time value of 117.73 s, because it only combined SMPC with FL and made no other additions such as noise. FL_PATE had a moderate time value of 186.43 s: the PATE was executed after FL, which increased the training time and resulted in a longer execution time than that of the FL_only model. FL_CKKS also took more time to complete its operations than FL_only due to the added HE operations. FL_CKKS_SMPC exceeded the execution time of FL_CKKS due to the added SMPC operations and the secure aggregation using HE and SMPC. FL_CKKS_DP had a higher execution time than FL_CKKS due to noise addition. FL_SMPC_DP took longer to complete its operations than FL_SMPC because of the added regularization and noise.
The FL_CKKS model achieved the highest performance metrics of 99.80% for accuracy, precision, recall, and the F1-score, while FL_CKKS_DP achieved the lowest performance metrics of 70.50% accuracy, 71.96% precision, 70.50% recall, and a 70.00% F1-score. While the PATE enhances privacy, it may decrease the performance due to the noise added to the ensembles of the teachers’ votes, so FL_PATE achieved lower performance than FL_only, FL_PATE_CKKS decreased the performance of FL_CKKS, and FL_PATE_CKKS_SMPC reduced the performance of FL_CKKS_SMPC.
FL_SMPC had performance results very close to those of FL_only because it only applied SMPC during FL, with no other data modifications such as noise addition. FL_SMPC_DP enhanced the privacy of FL_SMPC but decreased its performance due to the noise added after the aggregation process. FL_CKKS also had performance metrics very close to those of FL_only because it only applied HE operations to FL with no other modifications; in fact, FL_CKKS improved on the performance of FL_only. FL_CKKS_SMPC showed no significant drop in performance compared to FL_CKKS; its results were very close to, almost equal to, those of FL_CKKS. FL_CKKS_DP had lower performance than FL_CKKS due to the use of DP through noise addition.

5.2. Security Evaluation

Figure 2 visualizes the MSE for all the models in the case of the model inversion attack discussed in Section 4.3.3. As is shown, the MSE was improved through the use of the designed models, proving the importance of applying privacy-preserving techniques along with FL to protect against model inversion attacks. FL_PATE_SMPC had the best MSE value of 19.267.
Figure 3 visualizes the ASR (BASR) of the backdoor attack discussed in Section 4.3.2. As is shown, the ASR decreased when using a combination of privacy-preserving techniques in conjunction with FL. FL_PATE_CKKS_SMPC was the best model for defending against backdoor attacks as it had the smallest backdoor ASR value of 0.0920.
Figure 4 and Figure 5 visualize the ASR (UPASR and TPASR) for the poisoning attacks discussed in Section 4.3.1. As is shown, the ASR decreased when using a combination of privacy-preserving techniques in conjunction with FL. FL_CKKS_SMPC was the best model for defending against an untargeted poisoning attack as it had the smallest ASR of 0.0010, while FL_CKKS and FL_CKKS_SMPC had the smallest ASR value of 0.0020 for the targeted poisoning attack.
Using the performance degradation (PD), Table 6 shows the ASR of the MITM attack discussed in Section 4.3.4. As is shown, the performance degradation caused by the MITM attack decreased when a combination of privacy-preserving techniques was used together with FL. Based on the reported results, FL_PATE_CKKS_SMPC was the best at resisting the MITM attack, achieving the smallest performance degradation values.

5.3. Trade-Off Analysis

Figure 6 illustrates a comparison of the training time across all models; the training time increased as privacy-preserving techniques were added. Figure 7 visualizes a comparison of the accuracy, precision, recall, and F1-score across all the models. Adding privacy-preserving techniques to FL increases the training time and may decrease the performance, while it enhances privacy and security by improving the model’s robustness against attacks.
Based on the reported results, the integration of privacy-preservation techniques with FL provided a stronger privacy guarantee against various attacks. Specifically, for all the evaluated attacks, all the combined models achieved better results than the FL_only model. Comparing the FL_PATE_CKKS_SMPC model with the FL_only model, FL_PATE_CKKS_SMPC increased the execution time from 82.85 s for FL_only to 298.60 s. To obtain reliable percentages, the server and clients of the FL_PATE_CKKS_SMPC model were run 10 times and the average duration of each step was calculated. Table 7 shows the time analysis of the FL_PATE_CKKS_SMPC model, presenting the computational weight of each pipeline step. On the server side, model training took the most time (41.97%), while decryption (3.57%) took the least. On the client side, communication was the dominant factor, consuming the most time (71.51%), while model loading (0.77%) and gradient computation (0.25%) used the smallest amounts of time. According to the average round times, the server spent significantly less time on each round (4.21%) than the clients (32.85%).
The use of the FL_PATE_CKKS_SMPC model decreased the accuracy by approximately 15.01%. The reason behind this decrease was the noise added through the PATE: HE only performs operations on encrypted data, and the SMPC used in the FL_PATE_CKKS_SMPC model is the same SMPC applied in the FL_SMPC model, which does not use DP, so neither HE nor SMPC contributed to the accuracy reduction. In other words, the noise introduced through the PATE, not the encryption overhead, was the specific factor behind the accuracy reduction. In summary, the FL_PATE_CKKS_SMPC model applied the CKKS scheme used in the FL_CKKS model, the SMPC used in the FL_SMPC model, and the PATE used in the FL_PATE model; the FL_CKKS and FL_SMPC models did not reduce the accuracy of the base FL_only model, but the FL_PATE model did, so the PATE was the main factor behind the accuracy reduction of the FL_PATE_CKKS_SMPC model. Equivalently, the FL_PATE_CKKS_SMPC model applied the PATE on top of the FL_CKKS_SMPC model, which itself did not reduce the accuracy of the base FL_only model, again pointing to the PATE as the main factor behind the accuracy reduction.
Although FL_PATE_CKKS_SMPC increased the execution time and decreased the accuracy of FL_only, it improved the model inversion MSE from 0.9676 for the FL_only model to 11.9647, the backdoor ASR from 0.6800 for FL_only to 0.0920, the untargeted poisoning ASR from 0.4050 for FL_only to 0.151, the targeted poisoning ASR from 0.982 for FL_only to 0.06, and also the MITM attack performance degradation (ASR) from 48.45% accuracy degradation for FL_only to 1.68% for the FL_PATE_CKKS_SMPC model.

6. Discussion

6.1. Insights on Model Execution Time and Privacy Trade-Offs

The results show that the execution time increases as more privacy-preserving techniques are incorporated into the FL model. Notably, the FL_only model, which lacks additional privacy techniques, had the shortest execution time of 82.85 s. As additional privacy-preserving techniques like SMPC, the PATE, and CKKS were integrated, the execution time increased significantly. This was particularly evident in the FL_PATE_CKKS_SMPC model, which had the highest execution time of 298.60 s. This trade-off between privacy and performance underscores the computational cost associated with securing data and enhancing privacy in FL systems. It is crucial to consider this balance when designing privacy-conscious FL systems that must operate efficiently while maintaining data confidentiality.

6.2. Effect of Privacy-Preserving Techniques on Model Accuracy

While privacy-preserving techniques are essential for securing sensitive data, they also introduce noise that impacts the model performance. FL_CKKS, which utilizes homomorphic encryption (HE) without other privacy-enhancing techniques, achieved the highest accuracy and performance metrics, including 99.80% for the accuracy, precision, recall, and F1-score. On the other hand, adding more privacy features such as differential privacy (DP) and Secure Multi-Party Computation (SMPC) resulted in a decline in performance. Specifically, the FL_CKKS_DP model, which includes differential privacy noise, showed a significant drop in metrics, with an accuracy as low as 70.50%. These findings highlight the trade-off between preserving privacy and maintaining the model performance, a critical consideration in fields like healthcare and finance where data security is paramount.

6.3. Privacy and Security Effectiveness Against Attacks

The combination of privacy-preserving techniques notably enhances the robustness of FL models against attacks, such as model inversion and backdoor attacks. For instance, the FL_PATE_SMPC model demonstrated the best defense against model inversion, with the lowest MSE value of 19.267, confirming the effectiveness of the PATE and SMPC in protecting against data leakage. Additionally, the FL_PATE_CKKS_SMPC model achieved the smallest backdoor ASR of 0.0920, further proving the superiority of integrating privacy methods in preventing malicious adversarial activities. This suggests that FL models, when combined with these techniques, can significantly mitigate the risks posed by various privacy and security threats, making them a more viable solution for sensitive applications.

6.4. Impact of Privacy Techniques on Poisoning Attacks

The integration of privacy-preserving techniques also plays a significant role in defending against poisoning attacks, a major concern in FL systems. The results show that FL_CKKS_SMPC was particularly effective in countering untargeted poisoning attacks, achieving the smallest ASR value of 0.0010. Similarly, for targeted poisoning attacks, both FL_CKKS and FL_CKKS_SMPC showed the lowest ASR values of 0.0020. These results highlight the efficacy of combining HE and SMPC in preventing adversaries from corrupting the training process through malicious data poisoning. By incorporating these privacy-enhancing methods, FL systems can ensure the integrity of the model training process, even in the presence of adversarial interference.

6.5. The Impact of Privacy Techniques on the Man in the Middle Attack

The combination of privacy-preserving techniques also improves the resistance of FL models against an MITM attack. The results illustrate that FL_PATE_CKKS_SMPC was the best model to defend against an MITM attack, achieving the lowest performance degradation, with decreases of 1.68% in accuracy, 1.94% in precision, 1.68% in recall, and 1.64% in the F1-score.

6.6. Overall Evaluation and Performance Comparison

The comprehensive performance evaluation of all FL models indicated that while incorporating privacy-preserving techniques can enhance data security, it comes at the cost of an increased computational overhead and slightly reduced performance. However, models like FL_CKKS and FL_CKKS_SMPC strike an optimal balance by achieving high accuracy while maintaining robust privacy protection. The results underscore the importance of carefully selecting privacy techniques that provide adequate protection without severely compromising the model’s effectiveness. In the context of real-world applications, especially those handling sensitive data, such as healthcare or financial systems, this trade-off is critical to maintaining both privacy and accuracy.

7. Conclusions and Future Work

This work aimed to provide a novel privacy-enhancing FL framework for malware detection based on a widely used Malware Dataset and an ANN classifier. Without any privacy-preservation mechanism, the baseline FL_only model achieved performance metrics of 99.30% for the accuracy, precision, recall, and F1-score. In our comprehensive analysis, we examined ten different model configurations, testing them against an assortment of machine learning attacks, including untargeted and targeted poisoning, backdoor injection, model inversion, and MITM attacks. Privacy-preserving mechanisms were integrated into the FL models without significant performance penalties and increased the robustness of the models against several types of attacks.
From the experimental results, several top configurations emerged: the FL_PATE_CKKS_SMPC model was the least vulnerable to backdoor attacks (ASR: 0.0920) and to MITM attacks (the degradation of all metrics was under 2%); the FL_CKKS_SMPC model showed the highest resistance to poisoning attacks (ASR: 0.0010 untargeted, 0.0020 targeted), matching the FL_CKKS model’s performance against targeted attacks; and the FL_PATE_SMPC configuration performed the best against model inversion attacks (MSE: 19.267). These results help shed light on the suitability of various privacy-preserving combinations in FL systems, adding to the existing knowledge on secure federated learning and offering practical references for privacy-enhancing machine learning applications. The findings suggest a need for continued research to adapt and refine the evaluation framework to encompass newly emerging attack vectors and to develop more robust defense mechanisms. Future work extending the study to other data types (such as images) or domain-specific datasets (healthcare or financial) would constitute a further research direction.

Author Contributions

Conceptualization, W.K., E.R. and A.S.; Methodology, W.K., E.R. and A.S.; Software, E.S.; Validation, E.S.; Investigation, E.S.; Writing—original draft, E.S.; Writing—review & editing, W.K. and E.R.; Visualization, E.S.; Supervision, W.K., E.R. and A.S.; Project administration, W.K. and A.S.; Funding acquisition, A.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research project was funded by the University of Technology and Applied Sciences through the Internal Research Funding Program, grant number IRG-IBRI-25-40.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors are thankful to the anonymous reviewers for their insightful comments and constructive suggestions that greatly improved the quality and clarity of this manuscript. The authors extend their gratitude to Asma M. Alkalbani, University of Technology and Applied Sciences - Ibri, for her invaluable assistance in addressing the proofreading feedback and managing the administrative procedures associated with the funded project.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bian, J.; Shen, C.; Xu, J. Federated learning via indirect server-client communications. In Proceedings of the 2023 57th Annual Conference on Information Sciences and Systems (CISS), Baltimore, MD, USA, 22–24 March 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–5. [Google Scholar]
  2. Zhang, Y.; Zeng, D.; Luo, J.; Fu, X.; Chen, G.; Xu, Z.; King, I. A Survey of Trustworthy Federated Learning: Issues, Solutions, and Challenges. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–47. [Google Scholar] [CrossRef]
  3. Chang, Y.; Zhang, K.; Gong, J.; Qian, H. Privacy-preserving federated learning via functional encryption, revisited. IEEE Trans. Inf. Forensics Secur. 2023, 18, 1855–1869. [Google Scholar] [CrossRef]
  4. Li, Y.; Liu, Z.; Huang, Y.; Xu, P. FedOES: An efficient federated learning approach. In Proceedings of the 2023 3rd International Conference on Neural Networks, Information and Communication Engineering (NNICE), Guangzhou, China, 24–26 February 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 135–139. [Google Scholar]
  5. Ciceri, O.J.; Astudillo, C.A.; Zhu, Z.; da Fonseca, N.L. Federated learning over next-generation ethernet passive optical networks. IEEE Netw. 2022, 37, 70–76. [Google Scholar] [CrossRef]
  6. Hussain, G.J.; Manoj, G. Federated learning: A survey of a new approach to machine learning. In Proceedings of the 2022 First International Conference on Electrical, Electronics, Information and Communication Technologies (ICEEICT), Trichy, India, 16–18 February 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–8. [Google Scholar]
  7. Li, Y.; Xu, G.; Meng, X.; Du, W.; Ren, X. LF3PFL: A Practical Privacy-Preserving Federated Learning Algorithm Based on Local Federalization Scheme. Entropy 2024, 26, 353. [Google Scholar] [CrossRef]
  8. Sen, J.; Waghela, H.; Rakshit, S. Privacy in Federated Learning. arXiv 2024, arXiv:2408.08904. [Google Scholar]
  9. Li, Y.; Hu, J.; Guo, Z.; Yang, N.; Chen, H.; Yuan, D.; Ding, W. Threats and Defenses in Federated Learning Life Cycle: A Comprehensive Survey and Challenges. arXiv 2024, arXiv:2407.06754. [Google Scholar]
  10. Batool, H.; Anjum, A.; Khan, A.; Izzo, S.; Mazzocca, C.; Jeon, G. A secure and privacy preserved infrastructure for VANETs based on federated learning with local differential privacy. Inf. Sci. 2024, 652, 119717. [Google Scholar] [CrossRef]
  11. Jin, W.; Yao, Y.; Han, S.; Joe-Wong, C.; Ravi, S.; Avestimehr, S.; He, C. FedML-HE: An efficient homomorphic-encryption-based privacy-preserving federated learning system. arXiv 2023, arXiv:2303.10837. [Google Scholar]
  12. Geng, T.; Liu, J.; Huang, C.T. A Privacy-Preserving Federated Learning Framework for IoT Environment Based on Secure Multi-party Computation. In Proceedings of the 2024 IEEE Annual Congress on Artificial Intelligence of Things (AIoT), Melbourne, Australia, 24–26 July 2024; pp. 117–122. [Google Scholar] [CrossRef]
  13. Watkins, W.; Wang, H.; Bae, S.; Tseng, H.H.; Cha, J.; Chen, S.Y.C.; Yoo, S. Quantum Privacy Aggregation of Teacher Ensembles (QPATE) for Privacy Preserving Quantum Machine Learning. In Proceedings of the ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 6875–6879. [Google Scholar]
  14. Jiang, Z.; Ni, W.; Zhang, Y. PATE-TripleGAN: Privacy-Preserving Image Synthesis with Gaussian Differential Privacy. arXiv 2024, arXiv:2404.12730. [Google Scholar]
  15. Zhang, Q.; Ma, J.; Lou, J.; Xiong, L.; Jiang, X. Private Semi-supervised Knowledge Transfer for Deep Learning from Noisy Labels. arXiv 2022, arXiv:2211.01628. [Google Scholar] [CrossRef]
  16. Zhao, S.; Zhao, Q.; Zhao, C.; Jiang, H.; Xu, Q. Privacy-enhancing machine learning framework with private aggregation of teacher ensembles. Int. J. Intell. Syst. 2022, 37, 9904–9920. [Google Scholar] [CrossRef]
  17. Luo, J.; Zhang, Y.; Zhang, J.; Mu, X.; Wang, H.; Yu, Y.; Xu, Z. Secformer: Towards fast and accurate privacy-preserving inference for large language models. arXiv 2024, arXiv:2401.00793. [Google Scholar]
  18. Song, C.; Huang, R.; Hu, S. Private-preserving language model inference based on secure multi-party computation. Neurocomputing 2024, 592, 127794. [Google Scholar] [CrossRef]
  19. Hosain, M.T.; Abir, M.R.; Rahat, M.Y.; Mridha, M.; Mukta, S.H. Privacy Preserving Machine Learning with Federated Personalized Learning in Artificially Generated Environment. IEEE Open J. Comput. Soc. 2024, 5, 694–704. [Google Scholar] [CrossRef]
  20. Shen, C.; Zhang, W.; Zhou, T.; Zhang, L. A security-enhanced federated learning scheme based on homomorphic encryption and secret sharing. Mathematics 2024, 12, 1993. [Google Scholar] [CrossRef]
  21. Liu, X.; Li, H.; Xu, G.; Chen, Z.; Huang, X.; Lu, R. Privacy-enhanced federated learning against poisoning adversaries. IEEE Trans. Inf. Forensics Secur. 2021, 16, 4574–4588. [Google Scholar] [CrossRef]
  22. Xu, G.; Li, H.; Zhang, Y.; Xu, S.; Ning, J.; Deng, R.H. Privacy-Preserving Federated Deep Learning With Irregular Users. IEEE Trans. Dependable Secur. Comput. 2022, 19, 1364–1381. [Google Scholar] [CrossRef]
  23. Pan, Y.; Ni, J.; Su, Z. FL-PATE: Differentially Private Federated Learning with Knowledge Transfer. In Proceedings of the 2021 IEEE Global Communications Conference (GLOBECOM), Kuala Lumpur, Malaysia, 8–12 December 2021; pp. 1–6. [Google Scholar] [CrossRef]
  24. Anastasakis, Z.; Velivassaki, T.H.; Voulkidis, A.; Bourou, S.; Psychogyios, K.; Skias, D.; Zahariadis, T. FREDY: Federated Resilience Enhanced with Differential Privacy. Future Internet 2023, 15, 296. [Google Scholar] [CrossRef]
  25. Elfares, M.; Reisert, P.; Hu, Z.; Tang, W.; Küsters, R.; Bulling, A. PrivatEyes: Appearance-based Gaze Estimation Using Federated Secure Multi-Party Computation. Proc. ACM Hum.-Comput. Interact. 2024, 8, 1–23. [Google Scholar] [CrossRef]
  26. Muazu, T.; Mao, Y.; Muhammad, A.U.; Ibrahim, M.; Kumshe, U.M.M.; Samuel, O. A federated learning system with data fusion for healthcare using multi-party computation and additive secret sharing. Comput. Commun. 2024, 216, 168–182. [Google Scholar] [CrossRef]
  27. Chen, L.; Xiao, D.; Yu, Z.; Zhang, M. Secure and efficient federated learning via novel multi-party computation and compressed sensing. Inf. Sci. 2024, 667, 120481. [Google Scholar] [CrossRef]
  28. Manh, B.D.; Nguyen, C.H.; Hoang, D.T.; Nguyen, D.N. Homomorphic Encryption-Enabled Federated Learning for Privacy-Preserving Intrusion Detection in Resource-Constrained IoV Networks. arXiv 2024, arXiv:2407.18503. [Google Scholar]
  29. Guo, Y.; Li, L.; Zheng, Z.; Yun, H.; Zhang, R.; Chang, X.; Gao, Z. Efficient and Privacy-Preserving Federated Learning based on Full Homomorphic Encryption. arXiv 2024, arXiv:2403.11519. [Google Scholar]
  30. Gao, Q.; Sun, Y.; Chen, X.; Yang, F.; Wang, Y. An Efficient Multi-Party Secure Aggregation Method Based on Multi-Homomorphic Attributes. Electronics 2024, 13, 671. [Google Scholar] [CrossRef]
  31. Li, X.; Zhao, H.; Chen, X.; Deng, W. Homomorphic Encryption and Secure Aggregation Based Vertical-Horizontal Federated Learning for Flight Operation Data Sharing. In Proceedings of the 2024 5th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT), Nanjing, China, 29–31 March 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 844–848. [Google Scholar]
  32. Yang, W.; Yang, Y.; Xi, Y.; Zhang, H.; Xiang, W. FLCP: Federated learning framework with communication-efficient and privacy-preserving. Appl. Intell. 2024, 54, 6816–6835. [Google Scholar] [CrossRef]
  33. Hu, K.; Gong, S.; Zhang, Q.; Seng, C.; Xia, M.; Jiang, S. An overview of implementing security and privacy in federated learning. Artif. Intell. Rev. 2024, 57, 204. [Google Scholar] [CrossRef]
  34. Zhang, C.; Xie, Y.; Bai, H.; Yu, B.; Li, W.; Gao, Y. A survey on federated learning. Knowl.-Based Syst. 2021, 216, 106775. [Google Scholar] [CrossRef]
  35. Wen, J.; Zhang, Z.; Lan, Y.; Cui, Z.; Cai, J.; Zhang, W. A survey on federated learning: Challenges and applications. Int. J. Mach. Learn. Cybern. 2023, 14, 513–535. [Google Scholar] [CrossRef]
  36. Chai, D.; Wang, L.; Yang, L.; Zhang, J.; Chen, K.; Yang, Q. A Survey for Federated Learning Evaluations: Goals and Measures. IEEE Trans. Knowl. Data Eng. 2024, 36, 5007–5024. [Google Scholar] [CrossRef]
  37. Karras, A.; Karras, C.; Giotopoulos, K.C.; Tsolis, D.; Oikonomou, K.; Sioutas, S. Peer to peer federated learning: Towards decentralized machine learning on edge devices. In Proceedings of the 2022 7th South-East Europe Design Automation, Computer Engineering, Computer Networks and Social Media Conference (SEEDA-CECNSM), Ioannina, Greece, 23–25 September 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–9. [Google Scholar]
  38. Rachakonda, S.; Moorthy, S.; Jain, A.; Bukharev, A.; Bucur, A.; Manni, F.; Quiterio, T.M.; Joosten, L.; Mendez, N.I. Privacy enhancing and scalable federated learning to accelerate ai implementation in cross-silo and iomt environments. IEEE J. Biomed. Health Inform. 2022, 27, 744–755. [Google Scholar] [CrossRef]
  39. Qi, P.; Chiaro, D.; Guzzo, A.; Ianni, M.; Fortino, G.; Piccialli, F. Model aggregation techniques in federated learning: A comprehensive survey. Future Gener. Comput. Syst. 2024, 150, 272–293. [Google Scholar] [CrossRef]
  40. Ryu, M.; Kim, Y.; Kim, K.; Madduri, R.K. APPFL: Open-source software framework for privacy-preserving federated learning. In Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Lyon, France, 30 May–3 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1074–1083. [Google Scholar]
  41. Pulido-Gaytan, B.; Tchernykh, A.; Tchernykh, A.; Cortés-Mendoza, J.M.; Babenko, M.; Radchenko, G.; Avetisyan, A.; Drozdov, A.Y. Privacy-preserving neural networks with Homomorphic encryption: Challenges and opportunities. Peer- Netw. Appl. 2021, 14, 1666–1691. [Google Scholar] [CrossRef]
  42. Liu, G.; Furth, N.; Shi, H.; Khreishah, A.; Lee, J.Y.; Ansari, N.; Liu, C.; Jararweh, Y. Federated Learning Aided Deep Convolutional Neural Network Solution for Smart Traffic Management. In Proceedings of the NOMS 2023-2023 IEEE/IFIP Network Operations and Management Symposium, Miami, FL, USA, 8–12 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–8. [Google Scholar]
  43. Nazir, S.; Kaleem, M. Federated Learning for Medical Image Analysis with Deep Neural Networks. Diagnostics 2023, 13, 1532. [Google Scholar] [CrossRef]
  44. Gutiérrez, D.; Hassan, H.M.; Landi, L.; Vitaletti, A.; Chatzigiannakis, I. Application of federated learning techniques for arrhythmia classification using 12-lead ECG signals. arXiv 2022, arXiv:2208.10993. [Google Scholar] [CrossRef]
  45. Mothukuri, V.; Khare, P.; Parizi, R.M.; Pouriyeh, S.; Dehghantanha, A.; Srivastava, G. Federated-learning-based anomaly detection for IoT security attacks. IEEE Internet Things J. 2021, 9, 2545–2554. [Google Scholar] [CrossRef]
  46. Subramanian, N.; Ravi, L.; Shaan, M.J.; Devarajan, M.; Choudhury, T.; Kotecha, K.; Vairavasundaram, S. Securing Mobile Devices from Malware: A Faceoff Between Federated Learning and Deep Learning Models for Android Malware Classification. J. Comput. Sci. 2024, 20, 254–264. [Google Scholar] [CrossRef]
  47. Panagoda, D.; Malinda, C.; Wijetunga, C.; Rupasinghe, L.; Bandara, B.; Liyanapathirana, C. Application of federated learning in health care sector for malware detection and mitigation using software defined networking approach. In Proceedings of the 2022 2nd Asian Conference on Innovation in Technology (ASIANCON), Ravet, India, 26–28 August 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6. [Google Scholar]
  48. Sikandar, H.S.; Waheed, H.; Tahir, S.; Malik, S.U.; Rafique, W. A detailed survey on federated learning attacks and defenses. Electronics 2023, 12, 260. [Google Scholar] [CrossRef]
  49. Xia, G.; Chen, J.; Yu, C.; Ma, J. Poisoning attacks in federated learning: A survey. IEEE Access 2023, 11, 10708–10722. [Google Scholar] [CrossRef]
  50. Liu, P.; Xu, X.; Wang, W. Threats, attacks and defenses to federated learning: Issues, taxonomy and perspectives. Cybersecurity 2022, 5, 4. [Google Scholar] [CrossRef]
  51. Xu, H.; Shu, T. Defending against model poisoning attack in federated learning: A variance-minimization approach. J. Inf. Secur. Appl. 2024, 82, 103744. [Google Scholar] [CrossRef]
  52. Nguyen, T.D.; Nguyen, T.; Le Nguyen, P.; Pham, H.H.; Doan, K.D.; Wong, K.S. Backdoor attacks and defenses in federated learning: Survey, challenges and future research directions. Eng. Appl. Artif. Intell. 2024, 127, 107166. [Google Scholar] [CrossRef]
  53. Chen, Y.; Gui, Y.; Lin, H.; Gan, W.; Wu, Y. Federated learning attacks and defenses: A survey. In Proceedings of the 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17–20 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 4256–4265. [Google Scholar]
  54. Zhao, J.C.; Bagchi, S.; Avestimehr, S.; Chan, K.S.; Chaterji, S.; Dimitriadis, D.; Li, J.; Li, N.; Nourian, A.; Roth, H.R. Federated Learning Privacy: Attacks, Defenses, Applications, and Policy Landscape-A Survey. arXiv 2024, arXiv:2405.03636. [Google Scholar]
  55. Bradley, M.; Xu, S. A Metric for Machine Learning Vulnerability to Adversarial Examples. In Proceedings of the IEEE INFOCOM 2021-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Vancouver, BC, Canada, 10–13 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–2. [Google Scholar]
  56. Yan, H.; Li, X.; Zhang, W.; Wang, R.; Li, H.; Zhao, X.; Li, F.; Lin, X. Automatic evasion of machine learning-based network intrusion detection systems. IEEE Trans. Dependable Secur. Comput. 2023, 21, 153–167. [Google Scholar] [CrossRef]
  57. Askhatuly, A.; Berdysheva, D.; Yedilkhan, D.; Berdyshev, A. Security Risks of ML Models: Adverserial Machine Learning. In Proceedings of the 2024 IEEE 4th International Conference on Smart Information Systems and Technologies (SIST), Astana, Kazakhstan, 15–17 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 440–446. [Google Scholar]
  58. Rashid, A.; Such, J. Effectiveness of moving target defenses for adversarial attacks in ml-based malware detection. arXiv 2023, arXiv:2302.00537. [Google Scholar] [CrossRef]
  59. Ikenouchi, H.; Hirose, H.; Uto, T. Backdoor Defense with Colored Patches for Machine Learning Models. In Proceedings of the 2024 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC), Okinawa, Japan, 2–5 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6. [Google Scholar]
  60. Tidjon, L.N.; Khomh, F. Threat assessment in machine learning based systems. arXiv 2022, arXiv:2207.00091. [Google Scholar]
  61. Surekha, M.; Sagar, A.K.; Khemchandani, V. A Comprehensive Analysis of Poisoning Attack and Defence Strategies in Machine Learning Techniques. In Proceedings of the 2024 IEEE International Conference on Computing, Power and Communication Technologies (IC2PCT), Greater Noida, India, 9–10 February 2024; IEEE: Piscataway, NJ, USA, 2024; Volume 5, pp. 1662–1668. [Google Scholar]
  62. Chen, L.; Cheng, M.; Huang, H. Backdoor learning on sequence to sequence models. arXiv 2023, arXiv:2305.02424. [Google Scholar]
  63. Dibbo, S.V. Sok: Model inversion attack landscape: Taxonomy, challenges, and future roadmap. In Proceedings of the 2023 IEEE 36th Computer Security Foundations Symposium (CSF), Dubrovnik, Croatia, 9–13 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 439–456. [Google Scholar]
  64. Zhou, S.; Zhu, T.; Ye, D.; Yu, X.; Zhou, W. Boosting model inversion attacks with adversarial examples. IEEE Trans. Dependable Secur. Comput. 2023, 21, 1451–1468. [Google Scholar] [CrossRef]
  65. Li, H.; Li, Z.; Wu, S.; Hu, C.; Ye, Y.; Zhang, M.; Feng, D.; Zhang, Y. Seqmia: Sequential-metric based membership inference attack. arXiv 2024, arXiv:2407.15098. [Google Scholar]
  66. Bertran, M.; Tang, S.; Roth, A.; Kearns, M.; Morgenstern, J.H.; Wu, S.Z. Scalable membership inference attacks via quantile regression. Adv. Neural Inf. Process. Syst. 2024, 36, 314–330. [Google Scholar]
  67. Oliynyk, D.; Mayer, R.; Rauber, A. I know what you trained last summer: A survey on stealing machine learning models and defences. ACM Comput. Surv. 2023, 55, 1–41. [Google Scholar] [CrossRef]
  68. Rigaki, M.; Garcia, S. Stealing and evading malware classifiers and antivirus at low false positive conditions. Comput. Secur. 2023, 129, 103192. [Google Scholar] [CrossRef]
  69. Chittibala, D.R.; Jabbireddy, S.R. Security in Machine Learning (ML) Workflows. Int. J. Comput. Eng. 2024, 5, 52–63. [Google Scholar] [CrossRef]
  70. Papernot, N.; Abadi, M.; Erlingsson, U.; Goodfellow, I.; Talwar, K. Semi-supervised knowledge transfer for deep learning from private training data. arXiv 2016, arXiv:1610.05755. [Google Scholar]
  71. Hannemann, A.; Friedl, B.; Buchmann, E. Differentially Private Multi-Label Learning Is Harder Than You’d Think. In Proceedings of the 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), Vienna, Austria, 8–12 July 2024; pp. 40–47. [Google Scholar] [CrossRef]
  72. Tran, C.; Fioretto, F. On the fairness impacts of private ensembles models. arXiv 2023, arXiv:2305.11807. [Google Scholar]
  73. Malik, J.; Muthalagu, R.; Pawar, P.M. A Systematic Review of Adversarial Machine Learning Attacks, Defensive Controls and Technologies. IEEE Access 2024, 12, 99382–99421. [Google Scholar] [CrossRef]
  74. Mehta, U.; Vekariya, J.; Mehta, M.; Kaur, H.; Kumar, Y. A review of privacy-preserving machine learning algorithms and systems. In Applied Data Science and Smart Systems; CRC Press: Boca Raton, FL, USA, 2025; pp. 220–225. [Google Scholar]
  75. Ju, Q.; Xia, R.; Li, S.; Zhang, X. Privacy-preserving classification on deep learning with exponential mechanism. Int. J. Comput. Intell. Syst. 2024, 17, 39. [Google Scholar] [CrossRef]
  76. Dodwadmath, A.; Stich, S.U. Preserving Privacy with PATE for Heterogeneous Data. In Neural Information Processing Systems Workshop (NeurIPS-W). 2022. Available online: https://publications.cispa.de/articles/conference_contribution/Preserving_privacy_with_PATE_for_heterogeneous_data/24614826?file=43249032 (accessed on 1 February 2025).
  77. Hu, H.; Han, Q.; Ma, Z.; Yan, Y.; Xiong, Z.; Jiang, L.; Zhang, Y. PV-PATE: An Improved PATE for Deep Learning with Differential Privacy in Trusted Industrial Data Matrix. In Proceedings of the Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data, Wuhan, China, 6–8 October 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 477–491. [Google Scholar]
  78. Truong, N.; Sun, K.; Wang, S.; Guitton, F.; Guo, Y. Privacy preservation in federated learning: An insightful survey from the GDPR perspective. Comput. Secur. 2021, 110, 102402. [Google Scholar] [CrossRef]
  79. Zhou, I.; Tofigh, F.; Piccardi, M.; Abolhasan, M.; Franklin, D.; Lipman, J. Secure Multi-Party Computation for Machine Learning: A Survey. IEEE Access 2024, 12, 53881–53899. [Google Scholar] [CrossRef]
  80. Khan, T.; Budzys, M.; Nguyen, K.; Michalas, A. Wildest Dreams: Reproducible Research in Privacy-preserving Neural Network Training. arXiv 2024, arXiv:2403.03592. [Google Scholar] [CrossRef]
  81. Parikh, D.; Radadia, S.; Eranna, R.K. Privacy-Preserving Machine Learning Techniques, Challenges And Research Directions. Int. Res. J. Eng. Technol. 2024, 11, 499. [Google Scholar]
  82. Adelipour, S.; Haeri, M. Private outsourced model predictive control via secure multi-party computation. Comput. Electr. Eng. 2024, 116, 109208. [Google Scholar] [CrossRef]
  83. Liu, S.; Luo, J.; Zhang, Y.; Wang, H.; Yu, Y.; Xu, Z. Efficient privacy-preserving Gaussian process via secure multi-party computation. J. Syst. Archit. 2024, 151, 103134. [Google Scholar] [CrossRef]
  84. Tran, A.T.; Luong, T.D.; Pham, X.S. A Novel Privacy-Preserving Federated Learning Model Based on Secure Multi-party Computation. In Proceedings of the International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making, Kanazawa, Japan, 2–4 November 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 321–333. [Google Scholar]
  85. Krishna, N.; Raju, K.M.; Gowda, V.D.; Arun, G.; Suneetha, S. Homomorphic Encryption and Machine Learning in the Encrypted Domain. In Innovative Machine Learning Applications for Cryptography; IGI Global: Hershey, PA, USA, 2024; pp. 173–190. [Google Scholar]
  86. Amorim, I.; Costa, I. Homomorphic Encryption: An Analysis of its Applications in Searchable Encryption. arXiv 2023, arXiv:2306.14407. [Google Scholar]
  87. Gouert, C.; Mouris, D.; Tsoutsos, N. Sok: New insights into fully homomorphic encryption libraries via standardized benchmarks. Proc. Priv. Enhancing Technol. 2023, 2023, 154–172. [Google Scholar] [CrossRef]
  88. Galymzhankyzy, Z.; Rinatov, I.; Abdiraman, A.; Unaybaev, S. Assessing electoral integrity: Paillier’s partial homomorphic encryption in E-voting system. In Proceedings of the 2024 IEEE 4th International Conference on Smart Information Systems and Technologies (SIST), Astana, Kazakhstan, 15–17 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 194–201. [Google Scholar]
  89. Subramaniyaswamy, V.; Jagadeeswari, V.; Indragandhi, V.; Jhaveri, R.H.; Vijayakumar, V.; Kotecha, K.; Ravi, L. Somewhat homomorphic encryption: Ring learning with error algorithm for faster encryption of IoT sensor signal-based edge devices. Secur. Commun. Netw. 2022, 2022, 2793998. [Google Scholar] [CrossRef]
  90. van de Haterd, R.; El-Hajj, M. Enhancing Privacy and Security in IoT Environments through Secure Multiparty Computation. In Proceedings of the International Conference on Intelligent Systems and New Applications, Hanoi, Vietnam, 24–25 October 2024; Volume 2, pp. 64–69. [Google Scholar]
  91. Doan, T.V.T.; Messai, M.L.; Gavin, G.; Darmont, J. A survey on implementations of homomorphic encryption schemes. J. Supercomput. 2023, 79, 15098–15139. [Google Scholar] [CrossRef]
  92. Singh, V.K.; Chauhan, A.S.; Singh, A.; Thakur, R. Homomorphic Encryption: Hands Inside the Gloves. In Proceedings of the 2023 3rd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA), Bengaluru, India, 21–23 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 248–253. [Google Scholar]
  93. Frimpong, E.; Nguyen, K.; Budzys, M.; Khan, T.; Michalas, A. GuardML: Efficient Privacy-Preserving Machine Learning Services Through Hybrid Homomorphic Encryption. In Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing, Ávila, Spain, 8–12 April 2024; pp. 953–962. [Google Scholar]
  94. Chillotti, I.; Gama, N.; Georgieva, M.; Izabachène, M. TFHE: Fast fully homomorphic encryption over the torus. J. Cryptol. 2020, 33, 34–91. [Google Scholar] [CrossRef]
  95. Fan, J.; Vercauteren, F. Somewhat practical fully homomorphic encryption. Cryptol. Eprint Arch. 2012. [Google Scholar]
  96. Cheon, J.H.; Kim, A.; Kim, M.; Song, Y. Homomorphic encryption for arithmetic of approximate numbers. In Proceedings of the Advances in Cryptology–ASIACRYPT 2017: 23rd International Conference on the Theory and Applications of Cryptology and Information Security, Hong Kong, China, 3–7 December 2017; Proceedings, Part I 23. Springer: Berlin/Heidelberg, Germany, 2017; pp. 409–437. [Google Scholar]
  97. Kim, A.; Papadimitriou, A.; Polyakov, Y. Approximate homomorphic encryption with reduced approximation error. In Proceedings of the Cryptographers’ Track at the RSA Conference, San Francisco, CA, USA, 1–2 March 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 120–144. [Google Scholar]
  98. Wiryen, Y.B.; Vigny, N.W.A.; Joseph, M.N.; Aimé, F.L. A Comparative Study of BFV and CKKs Schemes to Secure IoT Data Using TenSeal and Pyfhel Homomorphic Encryption Libraries. Int. J. Smart Secur. Technol. (IJSST) 2024, 10, 1–17. [Google Scholar] [CrossRef]
  99. Patterson, V.L. Hitchhiker’s Guide to the TFHE Scheme. J. Cryptogr. Eng. 2023. [Google Scholar] [CrossRef]
  100. Lee, S.; Lee, G.; Kim, J.W.; Shin, J.; Lee, M.K. HETAL: Efficient privacy-preserving transfer learning with homomorphic encryption. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 19010–19035. [Google Scholar]
  101. Zhang, Q.Y.; Wen, Y.W.; Huang, Y.B.; Li, F.P. Secure speech retrieval method using deep hashing and CKKS fully homomorphic encryption. Multimed. Tools Appl. 2024, 83, 67469–67500. [Google Scholar] [CrossRef]
  102. Reddi, S.; Rao, P.M.; Saraswathi, P.; Jangirala, S.; Das, A.K.; Jamal, S.S.; Park, Y. Privacy-preserving electronic medical record sharing for IoT-enabled healthcare system using fully homomorphic encryption, IOTA, and masked authenticated messaging. IEEE Trans. Ind. Inform. 2024, 20, 10802–10813. [Google Scholar] [CrossRef]
  103. Kuo, T.H.; Wu, J.L. A High Throughput BFV-Encryption-Based Secure Comparison Protocol. Mathematics 2023, 11, 1227. [Google Scholar] [CrossRef]
  104. Shen, S.; Yang, H.; Dai, W.; Zhou, L.; Liu, Z.; Zhao, Y. Leveraging GPU in Homomorphic Encryption: Framework Design and Analysis of BFV Variants. IEEE Trans. Comput. 2024, 73, 2817–2829. [Google Scholar] [CrossRef]
  105. Klemsa, J.; Önen, M.; Akin, Y. A Practical TFHE-Based Multi-Key Homomorphic Encryption with Linear Complexity and Low Noise Growth. In Proceedings of the 28th European Symposium on Research in Computer Security, The Hague, The Netherlands, 25–29 September 2023. [Google Scholar]
  106. Wei, B.; Lu, X.; Wang, R.; Liu, K.; Li, Z.; Wang, K. Thunderbird: Efficient Homomorphic Evaluation of Symmetric Ciphers in 3GPP by combining two modes of TFHE. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2024, 2024, 530–573. [Google Scholar] [CrossRef]
  107. Rey, V.; Sánchez, P.M.S.; Celdrán, A.H.; Bovet, G. Federated learning for malware detection in IoT devices. Comput. Netw. 2022, 204, 108693. [Google Scholar] [CrossRef]
  108. Fang, W.; He, J.; Li, W.; Lan, X.; Chen, Y.; Li, T.; Huang, J.; Zhang, L. Comprehensive Android Malware Detection Based on Federated Learning Architecture. IEEE Trans. Inf. Forensics Secur. 2023, 18, 3977–3990. [Google Scholar] [CrossRef]
  109. Zhang, X.; Wang, C.; Liu, R.; Yang, S. Federated Rnn-Based Detection of Ransomware Attacks: A Privacy-Preserving Approach; OSF: Alton, IL, USA, 2024. [Google Scholar]
  110. Jiang, C.; Yin, K.; Xia, C.; Huang, W. Fedhgcdroid: An adaptive multi-dimensional federated learning for privacy-preserving android malware classification. Entropy 2022, 24, 919. [Google Scholar] [CrossRef]
  111. Nobakht, M.; Javidan, R.; Pourebrahimi, A. SIM-FED: Secure IoT malware detection model with federated learning. Comput. Electr. Eng. 2024, 116, 109139. [Google Scholar] [CrossRef]
  112. Kalapaaking, A.P.; Stephanie, V.; Khalil, I.; Atiquzzaman, M.; Yi, X.; Almashor, M. Smpc-based federated learning for 6g-enabled internet of medical things. IEEE Netw. 2022, 36, 182–189. [Google Scholar] [CrossRef]
Figure 1. ANN architecture.
Figure 2. Model inversion MSE.
Figure 3. Backdoor ASR.
Figure 4. Untargeted poisoning ASR.
Figure 5. Targeted poisoning ASR.
Figure 6. Comparison of models’ training times.
Figure 7. Comparison of models’ performance.
Table 1. Combined privacy-preserving techniques (✓ = method included, x = method not included).

Model No. | Model | FL | PATE | SMPC | HE
1 | FL_only | ✓ | x | x | x
2 | FL_SMPC | ✓ | x | ✓ | x
3 | FL_SMPC_DP | ✓ | x | ✓ | x
4 | FL_PATE | ✓ | ✓ | x | x
5 | FL_PATE_SMPC | ✓ | ✓ | ✓ | x
6 | FL_CKKS | ✓ | x | x | ✓
7 | FL_CKKS_DP | ✓ | x | x | ✓
8 | FL_CKKS_SMPC | ✓ | x | ✓ | ✓
9 | FL_PATE_CKKS | ✓ | ✓ | x | ✓
10 | FL_PATE_CKKS_SMPC | ✓ | ✓ | ✓ | ✓
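To make the HE column of Table 1 concrete, the sketch below is a minimal illustration (not the authors' implementation; the library choice and encryption parameters are assumptions) of how CKKS-encrypted client updates can be aggregated by a server that never sees plaintext weights, using the TenSEAL library also compared in [98].

```python
# Hedged sketch: CKKS-encrypted aggregation of client model updates.
# Parameters and library choice (TenSEAL) are illustrative assumptions.
import numpy as np
import tenseal as ts

# CKKS context; in a real deployment the server would hold only the public part.
ctx = ts.context(ts.SCHEME_TYPE.CKKS,
                 poly_modulus_degree=8192,
                 coeff_mod_bit_sizes=[60, 40, 40, 60])
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()

# Each client flattens its local model update and encrypts it.
client_updates = [np.random.randn(16) for _ in range(3)]          # toy updates
encrypted = [ts.ckks_vector(ctx, u.tolist()) for u in client_updates]

# The server averages ciphertexts without decrypting individual updates.
agg = encrypted[0]
for enc in encrypted[1:]:
    agg = agg + enc
agg = agg * (1.0 / len(encrypted))

# Only the secret-key holder (client side) decrypts the averaged update.
avg_update = np.array(agg.decrypt())
print(np.allclose(avg_update, np.mean(client_updates, axis=0), atol=1e-3))
```

The key property used here is that CKKS supports addition of ciphertexts and multiplication by plaintext scalars, which is exactly what federated averaging requires.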
Table 2. The impact of DP on the final model’s accuracy.

Privacy Budget (ε) | FL_CKKS_DP Accuracy
0.01 (extremely strong privacy) | 60.10%
0.1 (very strong privacy) | 70.50%
0.5 (strong privacy) | 75.90%
1.0 (strong privacy) | 84.10%
2.0 (moderate privacy) | 85.80%
5.0 (moderate privacy) | 88.40%
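For context on the ε values in Table 2, the snippet below is a generic illustration of the Laplace mechanism (an assumption for illustration only, not the paper's exact DP mechanism): the noise scale grows as ε shrinks, which is the usual explanation for the accuracy loss at very small privacy budgets.

```python
# Hedged sketch: Laplace mechanism showing how noise scales with 1/epsilon.
import numpy as np

def laplace_mechanism(value: float, sensitivity: float, epsilon: float) -> float:
    """Release `value` with epsilon-differential privacy via Laplace noise."""
    scale = sensitivity / epsilon
    return value + np.random.laplace(loc=0.0, scale=scale)

true_count = 120.0   # e.g., a hypothetical vote count in a PATE-style aggregation
for eps in (0.01, 0.1, 0.5, 1.0, 2.0, 5.0):
    noisy = laplace_mechanism(true_count, sensitivity=1.0, epsilon=eps)
    print(f"epsilon={eps:<5} noisy count={noisy:8.2f}")
```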
Table 3. Malware dataset features.

Feature | Feature Explanation | Feature Type
hash | Unique identifier for each file | Identification and Classification
classification | Indicates whether the entry file is classified as “malware” or “benign” | Identification and Classification
millisecond | Time offset within each file’s time series data | Identification and Classification
state | Current state of the process | Process State and Priority
prio | Priority value | Process State and Priority
static_prio | Static priority | Process State and Priority
normal_prio | Normal priority | Process State and Priority
policy | Scheduling policy | Process State and Priority
vm_pgoff | Virtual memory page offset | Memory Usage and Management
vm_truncate_count | Virtual memory truncated count | Memory Usage and Management
task_size | Size of the task | Memory Usage and Management
cached_hole_size | Size of cached memory hole | Memory Usage and Management
free_area_cache | Free area cache size | Memory Usage and Management
mm_users | Memory management users | Memory Usage and Management
map_count | Number of memory mappings | Memory Usage and Management
hiwater_rss | High-water mark for resident set size | Memory Usage and Management
total_vm | Total virtual memory | Memory Usage and Management
shared_vm | Shared virtual memory | Memory Usage and Management
exec_vm | Executable virtual memory | Memory Usage and Management
reserved_vm | Reserved virtual memory | Memory Usage and Management
nr_ptes | Number of page table entries | Memory Usage and Management
end_data | End of data segment | Memory Usage and Management
last_interval | Last scheduling interval | CPU Usage and Scheduling
nvcsw | Number of voluntary context switches | CPU Usage and Scheduling
nivcsw | Number of involuntary context switches | CPU Usage and Scheduling
utime | User mode time | CPU Usage and Scheduling
stime | System time | CPU Usage and Scheduling
gtime | Guest time | CPU Usage and Scheduling
cgtime | Cumulative guest time | CPU Usage and Scheduling
signal_nvcsw | Signal-related voluntary context switches | CPU Usage and Scheduling
fs_excl_counter | File system exclusive counter | File System and I/O
min_flt | Minor page faults | File System and I/O
maj_flt | Major page faults | File System and I/O
usage_counter | Usage counter | Miscellaneous
lock | Lock value | Miscellaneous
Table 4. Overall performance table for all models.

Model (Reduced Dataset) | Time (s) | Accuracy | Precision | Recall | F1-Score
FL_only | 82.85 | 0.9930 | 0.9930 | 0.9930 | 0.9930
FL_SMPC | 117.73 | 0.9950 | 0.9950 | 0.9950 | 0.9950
FL_SMPC_DP | 128.54 | 0.8350 | 0.8356 | 0.8350 | 0.8349
FL_PATE | 186.43 | 0.8530 | 0.8573 | 0.8530 | 0.8526
FL_PATE_SMPC | 193.60 | 0.8650 | 0.8675 | 0.8650 | 0.8648
FL_CKKS | 192.87 | 0.9980 | 0.9980 | 0.9980 | 0.9980
FL_CKKS_DP | 204.76 | 0.7050 | 0.7196 | 0.7050 | 0.7000
FL_CKKS_SMPC | 213.08 | 0.9970 | 0.9970 | 0.9970 | 0.9970
FL_PATE_CKKS | 267.34 | 0.8500 | 0.8502 | 0.8500 | 0.8500
FL_PATE_CKKS_SMPC | 298.60 | 0.8440 | 0.8442 | 0.8440 | 0.8440
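The aggregate metrics reported in Tables 4 and 5 can be reproduced with standard scikit-learn calls; the sketch below uses placeholder labels (not the paper's data or code) to show which functions produce accuracy, weighted precision/recall/F1, and the per-class breakdown.

```python
# Hedged sketch: computing the evaluation metrics used in Tables 4 and 5
# with scikit-learn; y_true / y_pred are toy labels, not the paper's data.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, classification_report)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = malware, 0 = benign (toy labels)
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="weighted"))
print("recall   :", recall_score(y_true, y_pred, average="weighted"))
print("f1-score :", f1_score(y_true, y_pred, average="weighted"))

# Per-class and macro/weighted averages, as in Table 5.
print(classification_report(y_true, y_pred, target_names=["benign", "malware"]))
```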
Table 5. Overall classification results.

Model (Accuracy) | Class | Precision | Recall | F1-Score
FL_only (0.99) | malware | 0.99 | 1.00 | 0.99
 | benign | 1.00 | 0.99 | 0.99
 | macro avg | 0.99 | 0.99 | 0.99
 | weighted avg | 0.99 | 0.99 | 0.99
FL_SMPC (0.99) | malware | 0.99 | 1.00 | 1.00
 | benign | 1.00 | 0.99 | 0.99
 | macro avg | 1.00 | 0.99 | 0.99
 | weighted avg | 1.00 | 0.99 | 0.99
FL_SMPC_DP (0.83) | malware | 0.85 | 0.81 | 0.83
 | benign | 0.82 | 0.86 | 0.84
 | macro avg | 0.84 | 0.83 | 0.83
 | weighted avg | 0.84 | 0.83 | 0.83
FL_PATE (0.85) | malware | 0.90 | 0.80 | 0.84
 | benign | 0.82 | 0.91 | 0.86
 | macro avg | 0.86 | 0.85 | 0.85
 | weighted avg | 0.86 | 0.85 | 0.85
FL_PATE_SMPC (0.86) | malware | 0.84 | 0.91 | 0.87
 | benign | 0.90 | 0.82 | 0.86
 | macro avg | 0.87 | 0.86 | 0.86
 | weighted avg | 0.87 | 0.86 | 0.86
FL_CKKS (1.00) | malware | 1.00 | 1.00 | 1.00
 | benign | 1.00 | 1.00 | 1.00
 | macro avg | 1.00 | 1.00 | 1.00
 | weighted avg | 1.00 | 1.00 | 1.00
FL_CKKS_DP (0.70) | malware | 0.66 | 0.83 | 0.74
 | benign | 0.78 | 0.58 | 0.66
 | macro avg | 0.72 | 0.70 | 0.70
 | weighted avg | 0.72 | 0.70 | 0.70
FL_CKKS_SMPC (1.00) | malware | 0.99 | 1.00 | 1.00
 | benign | 1.00 | 0.99 | 1.00
 | macro avg | 1.00 | 1.00 | 1.00
 | weighted avg | 1.00 | 1.00 | 1.00
FL_PATE_CKKS (0.85) | malware | 0.86 | 0.84 | 0.85
 | benign | 0.84 | 0.86 | 0.85
 | macro avg | 0.85 | 0.85 | 0.85
 | weighted avg | 0.85 | 0.85 | 0.85
FL_PATE_CKKS_SMPC (0.84) | malware | 0.84 | 0.86 | 0.85
 | benign | 0.85 | 0.83 | 0.84
 | macro avg | 0.84 | 0.84 | 0.84
 | weighted avg | 0.84 | 0.84 | 0.84
Table 6. MITM attack performance degradation (ASR).

Model | Accuracy Degradation | Precision Degradation | Recall Degradation | F1-Score Degradation
FL_only | 48.45% | 74.24% | 48.45% | 65.64%
FL_SMPC | 48.51% | 74.26% | 48.51% | 65.67%
FL_SMPC_DP | 24.92% | 68.52% | 24.92% | 46.68%
FL_PATE | 34.73% | 69.01% | 34.73% | 56.02%
FL_PATE_SMPC | 36.79% | 68.77% | 36.79% | 57.77%
FL_CKKS | 16.95% | 15.80% | 16.95% | 17.10%
FL_CKKS_DP | 25.37% | 62.69% | 25.37% | 50.25%
FL_CKKS_SMPC | 11.62% | 11.61% | 11.62% | 11.62%
FL_PATE_CKKS | 11.31% | 5.24% | 11.31% | 12.70%
FL_PATE_CKKS_SMPC | 1.68% | 1.94% | 1.68% | 1.64%
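As an aid to reading Table 6, the degradation figures are consistent with a relative drop computed against the clean-evaluation metrics of Table 4; the tiny sketch below states that formula explicitly (an assumption about the exact calculation, not the authors' code, and the post-attack value used is hypothetical).

```python
# Hedged sketch: relative performance degradation under a MITM attack,
# assuming degradation = (clean_metric - attacked_metric) / clean_metric.
def degradation(clean: float, attacked: float) -> float:
    """Relative drop of a metric, expressed as a percentage."""
    return (clean - attacked) / clean * 100.0

# Example with the FL_only accuracy from Table 4 (0.9930) and a hypothetical
# post-attack accuracy of about 0.512, which yields roughly the 48.45%
# degradation reported in Table 6.
print(f"{degradation(0.9930, 0.5119):.2f}%")
```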
Table 7. Time analysis.

Component | Server Side | Client Side
Model Training/PATE Training | 41.97% ± 6.51% | 13.71% ± 5.14%
Encryption Time | N/A | 19.44% ± 8.55%
Decryption Time | 3.57% ± 1.22% | N/A
Gradient Calculation Time | N/A | 0.25% ± 0.09%
Communication Time | 36.48% ± 7.89% | 71.51% ± 27.83%
Aggregation Time | 7.19% ± 1.99% | N/A
Teacher Model Creation Time | 27.13% ± 4.51% | N/A
Model Loading Time | N/A | 0.77% ± 0.29%
Average Round Time | 4.21% ± 0.77% | 32.85% ± 3.13%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
