1. Introduction
In recent years, federated learning (FL), as an emerging distributed machine learning method, has achieved notable advancements and has been extensively applied in practice, including IoT, wearables, sensors, and interface improvements [1,2,3]. FL enhances privacy protection by allowing clients to learn the symmetric and asymmetric patterns in their local training data without sharing the data itself. Nevertheless, recent research has highlighted a significant concern regarding FL, namely a new threat known as the backdoor attack [4,5,6,7]. This attack embeds triggers into a small fraction of the training data, causing the poisoned model to align strongly with the patterns associated with these triggers. Attackers can then use the triggers to manipulate the behavior of the affected model on specific inputs while leaving its performance on clean data unaffected. The backdoor attack in FL was originally proposed in [8], where adversaries implant a backdoor in their local models that is then integrated into the global model during aggregation. Such attacks pose heightened risks in federated learning due to its principle of universal participation and the covert nature of its training procedure. The invisible triggers make defending against backdoor injection in FL inherently challenging [9,10,11,12].
In centralized learning scenarios, backdoor attacks typically involve data poisoning, which exhibits symmetric and asymmetric backdoor properties [13]. For instance, in the CIFAR-10 dataset used for classifying cats and dogs, an attacker might label all "blue cats" as "dogs" in the training data. The model trained on this manipulated dataset may subsequently misclassify "blue cats" as "dogs" during prediction. Unlike centralized learning, in federated learning the data are distributed among different clients, and the attacker is constrained by the decentralized nature of the training data. Thus, backdoor attacks in federated learning often involve model poisoning: attackers introduce a backdoor into the local model during the training phase, embedding a malicious pattern into the local updates [9,14,15]. When aggregated with updates from other clients, these backdoored local updates produce a new global model with embedded backdoor features in subsequent rounds of federated communication.
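To make the data poisoning example concrete, the following is a minimal sketch of label-flipping poisoning in the centralized setting; the predicate is_blue_cat and the function name are hypothetical placeholders, not part of the original attack description.

```python
# Minimal sketch of label-flipping data poisoning in a centralized setting.
# `is_blue_cat` is a hypothetical predicate selecting the attacker's chosen
# subpopulation; `dog_label` is the attacker's target class.

def poison_dataset(samples, is_blue_cat, dog_label):
    """Relabel every sample matching the attacker's predicate as the target class."""
    poisoned = []
    for image, label in samples:
        if is_blue_cat(image):
            label = dog_label  # the attacker flips the label to the target class
        poisoned.append((image, label))
    return poisoned
```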
To counteract backdoor attacks in FL, recent researchers have explored two main strategies: backdoor detection [16,17,18,19] and backdoor elimination [20,21,22]. Backdoor detection approaches aim to determine whether a global model has been compromised and further seek to pinpoint potential attackers by differentiating between normal and malicious updates. Nevertheless, the datasets contributed by different participants in FL typically exhibit Non-Independent and Identically Distributed (Non-IID) characteristics, which makes it challenging to identify malicious attackers. Furthermore, these detection approaches may conflict with the use of secure aggregation in FL [23,24,25]. Backdoor elimination techniques, on the other hand, focus on erasing backdoors. In centralized learning, methods such as fine-tuning, pruning, and machine unlearning have recently been proposed to eliminate backdoored features, whereas in federated learning robust techniques such as model pruning and knowledge distillation are employed to mitigate backdoor influence [26,27,28,29,30].
However, several problems remain with current backdoor elimination approaches in FL. First, pruning-based defenses improve robustness by selectively removing neurons that exhibit backdoor feature activation. Because pruned neurons may also carry benign information, achieving a low attack success rate through pruning typically degrades model performance. Second, knowledge distillation aims to purify the trigger in the backdoored model by using clean samples to distill away the malicious information. Unfortunately, it only mitigates the backdoor rather than completely eliminating it, and thus still fails to provide an efficient defense against backdoor attacks in FL. In short, current defense methods in FL cannot effectively reduce the backdoor success rate while maintaining high main-task accuracy.
In this paper, we propose an effective backdoor elimination method in FL based on self-attention distillation, called FLSAD. In contrast to existing methods, where pruning depends on manual experience and knowledge distillation depends on a teacher model, FLSAD requires only the model's own capacity for modification to eliminate the impact of the backdoor. A backdoor attack establishes a strong association between the backdoor trigger and the target label so as to achieve a high attack success rate on trigger-injected samples while maintaining the original accuracy on clean samples. Because accuracy on clean samples remains high, the shallow layers of the model are more likely to contain correct information. We therefore leverage self-attention distillation to eliminate the backdoor trigger. To reach this goal, two steps are carried out. First, we restore the backdoor trigger, without accessing the training dataset, so that the backdoor can subsequently be eliminated; FLSAD leverages entropy maximization to restore the trigger. Second, we use self-attention distillation, aided by the restored trigger, to eliminate the backdoor from the model.
We summarize our main contributions as follows:
We develop an effective backdoor defense framework in FL based on self-attention distillation, named FLSAD. FLSAD distills the attention knowledge of neurons to eliminate the backdoor from the model with the aid of the restored trigger.
To efficiently defend against backdoor attacks, we design an entropy maximization estimator for trigger reconstruction and then eliminate the backdoor through self-attention distillation.
We conduct extensive experiments on real-world datasets against SOTA backdoor attacks in FL. The experimental results demonstrate that FLSAD outperforms all baseline defense methods, achieving high defense accuracy while preserving model performance.
The remainder of this paper is organized as follows. Section 2 introduces the background and related work. The threat model and defense goal are discussed in Section 3. Section 4 describes the proposed self-attention distillation backdoor defense method in FL. Section 5 evaluates and analyzes the experimental results. Finally, Section 6 concludes this paper.
4. The Proposed Method
4.1. Overview
The general framework of our approach is shown in Figure 2. Our defense against the backdoor attack consists of two main stages. First, FLSAD uses an entropy maximization estimator to recover the trigger. Second, building on the recovered trigger, FLSAD eliminates the backdoor using the self-attention distillation method.
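As a high-level illustration, the two stages can be organized as in the following sketch; recover_trigger and self_attention_distill are placeholder names for the two stages described in Sections 4.2 and 4.3, not the exact implementation.

```python
def flsad_defense(global_model, val_loader, target_label):
    """Sketch of the FLSAD pipeline: recover the trigger, then erase the backdoor."""
    # Stage 1: recover an estimate of the backdoor trigger via entropy maximization.
    trigger = recover_trigger(global_model, val_loader, target_label)

    # Stage 2: retrain with self-attention distillation, using the recovered
    # trigger as an aid to expose and eliminate the backdoor behavior.
    cleaned_model = self_attention_distill(global_model, trigger, val_loader)
    return cleaned_model
```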
4.2. Trigger Recovery
In a data classification task, a normal model is trained to learn a distinct decision boundary that separates different classes of samples, aiming for minimal prediction error on correctly labeled samples. In contrast, in a backdoor attack scenario, the attacker uses a trigger to push backdoor samples across the decision boundary so that they are judged as the target class. This causes the model to misclassify backdoor samples while leaving its behavior on normal samples unaffected.
We assume that the label space of the model G is K. Let y ∈ K denote the original label of a sample S, and let the target label of the backdoor attack, y_t ∈ K, belong to this label space as well. By injecting a trigger t, the attacker can cause the prediction for S to change from y to y_t. This analysis allows us to transform the problem of recovering the backdoor trigger into the problem of discovering the trigger distribution with a generative model that does not require sampling the training data.
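In symbols, the attacker's objective can be written as follows; the stamping operator ⊕ is our shorthand for applying the trigger to a sample.

```latex
G(S) = y, \qquad G(S \oplus t) = y_{t}, \qquad y,\, y_{t} \in K,\; y \neq y_{t}
```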
To address this trigger-recovery problem, our first consideration is to leverage a generative adversarial network: a generator produces candidate trigger distributions, and a discriminator judges whether the generated triggers are genuine. However, generative adversarial networks suffer from performance degradation when estimating high-dimensional triggers. Our method therefore leverages the maximum entropy method to solve this issue.
FLSAD employs n sub-models to estimate the distribution of the trigger t, where each sub-model learns a staircase approximation of the trigger. Consider a backdoored model B with parameters θ_B. The trigger distribution identified by each sub-model is used to update that sub-model's loss function, which involves θ_B, a label y drawn from the set of all sample labels, the batch size b, the validation data V for each batch, random noise terms sampled uniformly from [0, 1], and a balancing hyperparameter.
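To make the recovery step concrete, the following is a minimal PyTorch-style sketch of an entropy-maximization trigger estimator for a single sub-model and candidate target label; the interface (recover_trigger, target_label, lam) and the exact loss form are our assumptions rather than the precise formulation above.

```python
import torch
import torch.nn.functional as F

def recover_trigger(model, val_loader, target_label, steps=200, lam=0.01, lr=0.1,
                    image_shape=(3, 32, 32), device="cpu"):
    """Sketch: optimize a pixel-space trigger t so that validation samples stamped
    with t are classified as target_label, while an entropy term keeps the
    estimated trigger distribution from collapsing to a point estimate."""
    model.eval()
    # Unconstrained logits; a sigmoid keeps trigger pixel values in [0, 1].
    t_logits = torch.randn(image_shape, device=device, requires_grad=True)
    optimizer = torch.optim.Adam([t_logits], lr=lr)

    for _ in range(steps):
        for x, _ in val_loader:                    # V: clean validation batches of size b
            x = x.to(device)
            trigger = torch.sigmoid(t_logits)      # current trigger estimate t
            eps = torch.rand(1, device=device)     # random noise in [0, 1] (assumed role)
            x_poisoned = torch.clamp(x + eps * trigger, 0.0, 1.0)

            logits = model(x_poisoned)
            target = torch.full((x.size(0),), target_label,
                                dtype=torch.long, device=device)
            ce = F.cross_entropy(logits, target)   # push predictions toward the target label

            # Bernoulli-style entropy of the trigger distribution, to be maximized.
            p = trigger.clamp(1e-6, 1 - 1e-6)
            entropy = -(p * p.log() + (1 - p) * (1 - p).log()).mean()

            loss = ce - lam * entropy              # lam: balancing hyperparameter
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    return torch.sigmoid(t_logits).detach()
```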
4.3. Self-Attention Distillation
Recovering triggers helps defenders identify whether a sample contains a trigger. However, this operation alone does not eliminate the effects of the backdoor and therefore does not provide a defense on its own. Interpretability studies on neural networks show that shallow layers mainly capture global structural information, while deep layers mainly extract fine-grained features. Existing backdoor attacks tend to place their triggers in the deeper layers of the model to keep the backdoor hidden. To eliminate hidden backdoors, we consider knowledge distillation. Conventionally, knowledge distillation instructs a student model through a teacher model. However, using the compromised deep layers of a teacher model to distill the compromised deep layers of a student model is not effective for backdoor elimination. Therefore, we leverage self-attention distillation, which uses benign shallow-layer knowledge to correct the malicious information in the deeper layers of the model.
For the backdoored model B, the activation tensor of the l-th layer of the model is denoted A^l. The channel dimension, height, and width of the corresponding attention map are denoted C_l, H_l, and W_l, respectively. In the attention-representation process, the feature tensor is reduced to two spatial dimensions, i.e., from C_l × H_l × W_l to H_l × W_l. To do so, we aggregate the values over all channels to build the mapping function F(A^l), where A_i^l denotes the i-th channel slice of A^l. Raising the exponent p places higher weight on the most strongly activated regions, while the mapping still takes the weighting of all spatial areas into account.
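For reference, a standard channel-aggregating attention mapping of this kind, following the attention-transfer literature, can be written as below; whether FLSAD adopts exactly this form is our assumption.

```latex
F_{\mathrm{sum}}^{p}\!\left(A^{l}\right) \;=\; \sum_{i=1}^{C_{l}} \left| A_{i}^{l} \right|^{p},
\qquad
F_{\mathrm{sum}}^{p} : \mathbb{R}^{C_{l} \times H_{l} \times W_{l}} \rightarrow \mathbb{R}^{H_{l} \times W_{l}}
```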
Better backdoor elimination can be achieved through attention distillation. Self-attention distillation is able to use the knowledge in the shallow layers to eliminate highly hidden backdoors in the deep layers. The loss function of this method, denoted L_SAD, measures the distance between attention maps, computed via the L2-norm over the extracted feature representations, where Q_s and Q_d indicate the feature vectors of the shallow and deep layers on the attention map, respectively. To harmonize the shallow and deep attention sizes, the method uses a bilinear sampling operation H.
Since self-attention distillation only minimizes the gap between attention maps, it pays no attention to the accuracy of the prediction results. Retraining with the self-attention distillation loss (L_SAD) alone therefore reduces the prediction accuracy on clean samples. To address this, the method adds a cross-entropy loss (L_CE), weighted against L_SAD by a hyperparameter that balances model prediction accuracy and self-attention distillation; the cross-entropy term ensures the accuracy expected on clean samples. The pseudocode for the proposed FLSAD is listed in Algorithm 1.
Algorithm 1: FLSAD Algorithm
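As a concrete illustration of the elimination stage, the following PyTorch-style sketch combines the cross-entropy term with a self-attention distillation term between a shallow and a deep layer; attention_map, flsad_loss, and the specific normalization and distance choices are our assumptions.

```python
import torch
import torch.nn.functional as F

def attention_map(activation, p=2):
    """Collapse a (B, C, H, W) activation into a (B, H*W) attention map by
    aggregating |A_i|^p over channels and L2-normalizing (assumed form)."""
    amap = activation.abs().pow(p).sum(dim=1)      # (B, H, W)
    amap = amap.flatten(1)                         # (B, H*W)
    return F.normalize(amap, p=2, dim=1)

def flsad_loss(logits, labels, shallow_act, deep_act, beta=1.0):
    """Sketch of the retraining objective L = L_CE + beta * L_SAD.

    `shallow_act` and `deep_act` are activations captured (e.g., with forward
    hooks) from the same forward pass that produced `logits`; `beta` plays the
    role of the balancing hyperparameter."""
    # Bilinear sampling H: resize the shallow map to the deep layer's spatial size.
    shallow_resized = F.interpolate(shallow_act, size=deep_act.shape[-2:],
                                    mode="bilinear", align_corners=False)

    q_shallow = attention_map(shallow_resized).detach()  # benign shallow knowledge (guide)
    q_deep = attention_map(deep_act)                     # deep layers being corrected
    l_sad = F.mse_loss(q_deep, q_shallow)                # distance between attention maps

    l_ce = F.cross_entropy(logits, labels)               # preserve clean-sample accuracy
    return l_ce + beta * l_sad
```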
5. Experimental Evaluation
In this section, we validate our defense approach on different real-world datasets to evaluate the performance of FLSAD against different attack methods. First, we describe the datasets and experimental setup, including the training settings, evaluation metrics, and baseline methods. Then, we compare FLSAD with existing defense methods to demonstrate its superiority. Finally, we conduct ablation experiments to further analyze our method.
5.1. Datasets and Experimental Setup
5.1.1. Datasets
To validate the effectiveness of our method, we perform extensive verification on four real datasets, including MNIST (http://yann.lecun.com/exdb/mnist/ (accessed on 21 July 2024)), Fashion-MNIST (https://github.com/zalandoresearch/fashion-mnist (accessed on 21 May 2022)), CIFAR-10 (https://www.cs.toronto.edu/~kriz/cifar.html (accessed on 6 June 2020)), and CIFAR-100 (https://www.kaggle.com/datasets/fedesoriano/cifar100 (accessed on 10 May 2020)). Both the MNIST and Fashion-MNIST datasets contain 60,000 training samples and 10,000 test samples across 10 classes. Although they contain the same amount of data, the two datasets differ in content: MNIST consists of handwritten digits, while Fashion-MNIST consists of clothing objects. We leverage these datasets for federated learning model training, and the details of the datasets are shown in Table 1.
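Assuming a PyTorch/torchvision pipeline (the training framework is not stated above), the four benchmark datasets can be loaded as follows before being partitioned across the FL clients.

```python
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# Training splits of the four benchmark datasets used in the experiments.
mnist = datasets.MNIST(root="./data", train=True, download=True, transform=to_tensor)
fashion_mnist = datasets.FashionMNIST(root="./data", train=True, download=True, transform=to_tensor)
cifar10 = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)
cifar100 = datasets.CIFAR100(root="./data", train=True, download=True, transform=to_tensor)
```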
5.1.2. Experimental Setup
Training setting. We set up federated learning with 100 clients participating in model training, of which 70 are normal clients and 30 are malicious clients. Each attacker injects an identical trigger into its backdoor samples so that the model predicts the target label for them. Local training uses a batch size of 64. The local epoch count of malicious clients is set to 20 with a model learning rate of 0.05, while normal clients use 10 local epochs and a learning rate of 0.1. This setup is used for the different types of backdoor attacks, and all of our experiments follow this standard.
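The configuration above can be summarized in a small settings object; the variable names are ours and serve only as a compact restatement of the reported values.

```python
# Federated learning experimental configuration (values as reported above).
FL_CONFIG = {
    "num_clients": 100,
    "num_benign_clients": 70,
    "num_malicious_clients": 30,
    "batch_size": 64,
    "benign_client": {"local_epochs": 10, "learning_rate": 0.1},
    "malicious_client": {"local_epochs": 20, "learning_rate": 0.05},
    # Every malicious client stamps the same trigger on its backdoor samples
    # and relabels them with the attacker's target label.
}
```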
Evaluation metrics. We leverage the attack success rate (ASR) and model accuracy (ACC) metrics to evaluate our approach. The attack success rate indicates how often the attacker's backdoor succeeds, and the purpose of our defense is to reduce it effectively. The model accuracy reflects the accuracy of the model's main task, i.e., the prediction results of the backdoored model on clean samples. The purpose of our method is to ensure that the main task is not affected by the backdoor attack.
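As a concrete reference, the two metrics can be computed as in the following sketch; the loader names and the assumption that triggered_loader yields already-stamped samples are ours.

```python
import torch

@torch.no_grad()
def evaluate_asr_acc(model, clean_loader, triggered_loader, target_label):
    """ACC: accuracy on clean samples. ASR: fraction of trigger-stamped samples
    (whose true label differs from the target) predicted as the target label."""
    model.eval()

    correct, total = 0, 0
    for x, y in clean_loader:
        pred = model(x).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    acc = correct / total

    hits, count = 0, 0
    for x, y in triggered_loader:           # samples already stamped with the trigger
        keep = y != target_label            # ignore samples already of the target class
        if keep.sum() == 0:
            continue
        pred = model(x[keep]).argmax(dim=1)
        hits += (pred == target_label).sum().item()
        count += int(keep.sum().item())
    asr = hits / count if count else 0.0

    return asr, acc
```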
Baseline methods. We select three state-of-the-art backdoor attacks in federated learning to measure the effectiveness of our defense approach: watermarking [51], pixel block [52], and random noise [53]. To exhibit the superiority of our defense, we compare it against four backdoor defense methods in FL, namely FoolsGold [16], Baffle [18], CONTRA [22], and RLR [20].
5.2. Comparison with Baseline Defenses
To demonstrate that FLSAD can deal with various backdoor attacks in a realistic federated learning scenario, and to show the superiority of our method, we evaluate the defense against three state-of-the-art federated learning backdoor attacks on four real-world datasets and compare it with four baseline defense methods. Table 2, Table 3, Table 4 and Table 5 exhibit the experimental results of FLSAD compared with the baseline defense methods.
On the MNIST dataset, the experimental results show that FLSAD can effectively reduce the attack success rate of backdoor attacks while guaranteeing the performance of the model's main task. The ASR of the undefended backdoor attack is about 90%, whereas the ASR after applying our defense is below 2% in all cases. Meanwhile, the accuracy of the defended models is above 93% in all cases, so model performance is not affected. In addition, we set up experiments with different trigger sizes to study their impact on backdoor defense. A larger trigger indicates a stronger backdoor injected by the attacker, which can also raise the success rate of the backdoor attack. From the results, we find that a larger trigger size reduces the effectiveness of our defense.
FLSAD shows similar defense performance on the Fashion-MNIST and MNIST datasets. However, the model achieves lower main-task accuracy on the CIFAR-10 and CIFAR-100 datasets. For CIFAR-10, model accuracy is only about 84% and the attack success rate is only about 60% in the no-defense case. With the FLSAD method, the main task reaches 88% accuracy, improving model performance, while the attack success rate is reduced to about 0.9%. For CIFAR-100, the attack success rate is only about 58% in the no-defense case across the three backdoor attacks; in particular, it is only 54.79% for the pixel block backdoor attack with a 3 × 3 trigger, and model accuracy is only about 72%. With our defense, 73% model accuracy is achieved without degrading model performance. Overall, FLSAD outperforms all baseline defense methods, effectively reducing the attack success rate while maintaining the accuracy of the primary task.
5.3. Computational Costs
To assess the efficiency of FLSAD, we compare its computational costs with those of the baselines in Table 6. Since our method employs a self-distillation approach, its computational complexity is relatively high: it is lower than that of RLR but higher than those of FoolsGold, Baffle, and CONTRA. However, the objective of our method is to defend effectively against backdoor attacks, and in that respect it outperforms all baseline methods across the different datasets, as seen in Table 2, Table 3, Table 4 and Table 5. In addition, Table 6 and Table 7 present a comprehensive analysis of the computation epochs necessary to achieve a target accuracy with different numbers of local epochs E. Increasing E leads to higher computation costs. For example, E = 6 requires 230 computation epochs to reach 80% accuracy, whereas E = 30 requires 300 computation epochs. These findings highlight that convergence to high accuracy within few aggregation epochs is not guaranteed.
5.4. Ablation Study
Our method first recovers the trigger and then utilizes the recovered trigger for self-attention distillation, thereby eliminating the backdoor. To further verify the individual effects of trigger recovery and self-attention on eliminating the backdoor, we conduct separate ablation experiments. The results of the ablation experiments on the MNIST and CIFAR-10 datasets are shown in Figure 3 and Figure 4, where the variant without the trigger-recovery component is called NO-RT and the variant without the self-attention component is called NO-SA.
5.4.1. Impact of Trigger Recovery
The recovered trigger helps localize where the distillation should focus, enabling better distillation of backdoor features in FL. In Figure 3, we see that the NO-RT method reduces the attack success rate to less than 45% while model accuracy reaches 76%. Under the watermarking backdoor attack, NO-RT reduces the attack success rate to 40.53%, but the gap to the full FLSAD method remains significant: FLSAD achieves an ASR of 1.87%, which is 38.66% lower than NO-RT. This demonstrates the importance of trigger recovery for the task.
5.4.2. Impact of Self-Attention
FLSAD leverages the self-attention method to precisely locate the positions containing backdoor information in the deeper layers of the model, allowing more effective corrections based on the shallow-layer information. On the CIFAR-10 dataset, the ASR of the NO-SA method can only be reduced to below 35%, and model accuracy drops by about 15%. In particular, model accuracy drops to only 69.47% under the random noise attack, an 18.15% reduction compared to FLSAD. Therefore, the self-attention method is essential for defending against backdoor attacks in FL.