Byzantine-Robust Multimodal Federated Learning Framework for Intelligent Connected Vehicle

Wu, Ning; Lin, Xiaoming; Lu, Jianbin; Zhang, Fan; Chen, Weidong; Tang, Jianlin; Xiao, Jing

doi:10.3390/electronics13183635

by Ning Wu ¹, Xiaoming Lin ²,³,*, Jianbin Lu ¹, Fan Zhang ²,³, Weidong Chen ¹, Jianlin Tang ²,³ and Jing Xiao ¹

¹ Guangxi Power Grid Co., Ltd., Nanning 530013, China

² Electric Power Research Institute of CSG, Guangzhou 510663, China

³ Guangdong Provincial Key Laboratory of Intelligent Measurement and Advanced Metering of Power Grid, Guangzhou 510663, China

* Author to whom correspondence should be addressed.

Electronics 2024, 13(18), 3635; https://doi.org/10.3390/electronics13183635

Submission received: 12 July 2024 / Revised: 23 August 2024 / Accepted: 26 August 2024 / Published: 12 September 2024

(This article belongs to the Special Issue Network Security Management in Heterogeneous Networks)

Abstract

In the rapidly advancing domain of Intelligent Connected Vehicles (ICVs), multimodal Federated Learning (FL) presents a powerful methodology to harness diverse data sources, such as sensors, cameras, and Vehicle-to-Everything (V2X) communications, without compromising data privacy. Despite its potential, the presence of Byzantine adversaries–malicious participants who contribute incorrect or misleading updates–poses a significant challenge to the robustness and reliability of the FL process. This paper proposes a Byzantine-robust multimodal FL framework specifically designed for ICVs. Our framework integrates a robust aggregation mechanism to mitigate the influence of adversarial updates, a multimodal fusion strategy to effectively manage and combine heterogeneous input data, and a global optimization objective that accommodates the presence of Byzantine clients. The theoretical foundation of the framework is established through formal definitions and equations, demonstrating its ability to maintain reliable and accurate learning outcomes despite adversarial disruptions. Extensive experiments highlight the framework’s efficacy in preserving model performance and resilience in real-world ICV environments.

Keywords: federated learning; multimodal learning; intelligent connected vehicle; Byzantine-robust federated learning

1. Introduction

The advent of Intelligent Connected Vehicles (ICVs) marks a transformative shift in transportation technology [1], promising to revolutionize road safety [2], traffic efficiency [3], and the overall driving experience [4]. ICVs leverage an intricate network of sensors, including high-resolution cameras, Light Detection and Ranging (LiDAR) systems [5], millimeter-wave radars [6], and Global Positioning System (GPS) receivers, to create a comprehensive understanding of their environment [7]. This sensor fusion, combined with advanced communication technologies such as Vehicle-to-Everything (V2X) protocols [8], enables ICVs to make informed decisions [9], navigate complex traffic scenarios [10], and interact seamlessly with other vehicles and infrastructure [11].

However, the proliferation of ICVs introduces unprecedented challenges in data management and processing [12]. The sheer volume of data generated by a single vehicle–estimated to be up to 25 gigabytes per hour–multiplied across millions of vehicles, creates a data deluge that traditional centralized computing paradigms struggle to handle efficiently [13,14,15]. Moreover, these data often contain sensitive information about vehicle locations, driving patterns, and potentially even biometric data of drivers, raising significant privacy concerns [16].

Federated Learning (FL) [17] has emerged as a promising solution to address these challenges. FL enables collaborative machine learning without the need for centralized data storage or processing [18]. In the context of ICVs, this means that vehicles can collectively train sophisticated models for tasks such as object detection [19], traffic prediction [14], and autonomous navigation [20,21], while keeping their raw sensor data securely on-board. This decentralized approach not only preserves privacy but also significantly reduces the bandwidth required for model training, as only model updates, rather than raw data, are transmitted [22].

Despite its potential, the application of FL in ICV scenarios faces several critical challenges that demand innovative solutions:

Multimodal Data Integration. ICVs generate a diverse array of data types from various sensors [23,24]. Each sensor modality provides unique and complementary information. For example, cameras provide rich visual data that are critical for object recognition and scene understanding, while LiDAR provides precise depth information and 3D point clouds for accurate distance measurement and object localization [19]. All of these modalities are essential for the proper operation of ICVs. Effectively fusing these heterogeneous data sources while keeping the data private in an FL setup is a complex challenge [25]. Traditional centralized fusion techniques are not directly applicable, necessitating novel approaches that can operate on distributed, privacy-sensitive data.
Byzantine Attacks. In a distributed learning environment like FL, the system is vulnerable to Byzantine attacks, where malicious participants or compromised vehicles may inject false or manipulated data or model updates [26,27,28]. These attacks can take many forms, such as data poisoning [26,28], where the adversary injects crafted malicious samples into local training data, or model poisoning [29,30,31], where the adversary sends malicious model updates to corrupt the global model. The consequences of such attacks in an ICV context could be severe, potentially leading to erroneous object detection or navigation decisions that compromise road safety [32,33]. Developing robust defense mechanisms that can detect and mitigate these attacks without compromising the efficiency of the federated learning process is crucial.
Communication Constraints. The mobility of vehicles presents unique challenges to the FL process: vehicles may experience periods of disconnection or weak signal strength, especially in rural or underground areas [7,9,34]. In addition, the network capacity available to vehicles may fluctuate widely depending on location, and network congestion and frequent high-bandwidth communications can put stress on the vehicle’s power system, especially in electric vehicles [5]. Designing a communication-efficient federated learning protocol that can adapt to these dynamic conditions while ensuring timely and effective model updates is essential.

To address these challenges, we propose a Byzantine-robust multimodal federated learning framework specifically designed for intelligent connected vehicles. Specifically, we first design a novel multimodal fusion architecture that can effectively integrate various sensor data while preserving privacy. The architecture adopts a hierarchical approach to first locally fuse data within each modality and then use a privacy-preserving cross-modal attention mechanism to integrate information across modalities. In addition, we design a Byzantine-robust aggregation algorithm based on gradient compression that can detect and mitigate the impact of malicious participants in the FL process while maintaining high communication efficiency. Our approach combines a statistical analysis of model updates with a reputation system that tracks the historical reliability of participants. Our framework not only addresses the immediate challenges facing intelligent connected vehicles, but also lays the foundation for building a scalable, secure, and efficient federated learning ecosystem in the broader context of intelligent transportation systems.

The contributions of this paper are listed as follows:

(1): We develop a novel Byzantine-robust aggregation technique based on gradient compression, enhancing the resilience of federated learning against adversarial nodes.
(2): We introduce an advanced cross-node multimodal alignment and fusion technique that efficiently combines data from diverse sensors to improve model performance in ICVs.
(3): We implement top-k gradient compression to improve communication efficiency. This reduces the communication overhead between nodes and the central server, making the framework suitable for large-scale deployment.
(4): We conduct extensive experiments on three public datasets and compare the proposed framework against prior work to demonstrate its advantages. The results show that our framework achieves a better cost–utility trade-off.

The remainder of this paper is organized as follows: Section 2 provides a comprehensive review of related work in federated learning, multimodal fusion techniques, Byzantine-robust algorithms, and their applications in intelligent transportation systems. Section 3 presents our problem definition in detail, elaborating on the challenges and constraints. Section 4 presents our proposed framework in detail, including the cross-node multimodal alignment and fusion method, gradient compression-based Byzantine aggregation algorithm, and time complexity analysis. Section 5 discusses the results of our experiments, providing a comparative analysis with existing methods and an ablation study to quantify the impact of each component of our framework. Finally, Section 6 concludes the paper by summarizing our contributions, discussing the limitations of our approach, and outlining promising directions for future research in this rapidly evolving field.

2. Related Work

This section provides an overview of the existing literature relevant to our proposed Byzantine-robust multimodal FL framework for ICVs. We organize the related work into four key areas: federated learning in vehicular networks, multimodal learning for ICVs, Byzantine-robust federated learning, and communication-efficient federated learning.

2.1. Federated Learning in Vehicular Networks

FL has gained significant attention in the context of vehicular networks due to its ability to leverage distributed data while preserving privacy [35,36]. McMahan et al. [17] introduced the seminal FedAvg algorithm, which forms the basis for many federated learning approaches. In the vehicular domain, Du et al. [25] proposed a blockchain-based FL framework for securing data sharing in the Internet of Vehicles (IoVs). Their approach addresses trust issues in data sharing but does not consider multimodal data or Byzantine attacks. Samarakoon et al. [35] developed a FL approach for joint power control and resource allocation in vehicular networks. While their work demonstrates the potential of FL in optimizing network performance, it focuses on network-level optimization rather than perception and decision-making tasks. Lu et al. [37] introduced a FL framework for cooperative sensing in connected and autonomous vehicles. Their approach shows promise in improving sensing accuracy, but it does not address the challenges of multimodal data fusion or Byzantine robustness.

2.2. Multimodal Learning for ICVs

Multimodal learning is crucial for ICVs to effectively integrate data from various sensors [38]. Feng et al. [39] proposed a deep multimodal fusion framework for object detection in autonomous driving, combining data from cameras and LiDAR. However, their approach assumes centralized data processing, which is not suitable for privacy-preserving federated learning scenarios. Caesar et al. [23] developed a multimodal attention network for sensor fusion in autonomous vehicles. While their work demonstrates improved perception accuracy, it does not consider the distributed nature of data in federated learning settings. In the context of FL, Liu et al. [40] proposed a multimodal federated learning framework for medical image analysis. Although their work addresses privacy concerns in multimodal learning, it is not tailored to the specific challenges of vehicular networks and does not consider Byzantine attacks.

2.3. Byzantine-Robust Federated Learning

Byzantine robustness is critical for ensuring the reliability of FL systems, especially in safety-critical applications like ICVs [28,31,41]. Yin et al. [42] introduced a Byzantine-robust distributed learning algorithm, which can tolerate up to a certain fraction of Byzantine workers. However, their approach assumes a centralized parameter server, which may not be suitable for fully decentralized vehicular networks. Blanchard et al. [43] proposed the Krum algorithm for Byzantine-robust aggregation in FL. While Krum provides theoretical guarantees against Byzantine attacks, it may not be computationally efficient for the large-scale and time-sensitive nature of ICV applications. More recently, Fung et al. [26] developed FoolsGold, a Byzantine-robust federated learning system that can defend against Sybil attacks [28,33]. Their approach shows promise in identifying and mitigating the impact of malicious clients, but it does not consider the multimodal nature of ICV data.

2.4. Communication-Efficient Federated Learning

Communication efficiency is paramount in vehicular networks due to bandwidth limitations and the mobile nature of vehicles [44]. Konečný et al. [34] proposed structured updates and sketched updates to reduce the communication cost in FL. While their methods show significant bandwidth savings, they do not address the specific challenges of vehicular networks. Sattler et al. [35] introduced Sparse Ternary Compression (STC) for communication-efficient federated learning. STC achieves high compression rates while maintaining model accuracy, but it does not consider the dynamic nature of vehicular network conditions. In the context of vehicular networks, Ye et al. [45] proposed an efficient federated learning scheme with adaptive model aggregation. Their approach considers vehicle mobility and network conditions but does not address multimodal data fusion or Byzantine robustness.

While the existing literature has made significant strides in various aspects of federated learning for vehicular networks, there remains a critical gap in addressing the combined challenges of multimodal data integration, Byzantine robustness, and communication efficiency in a unified framework for ICVs. By addressing these challenges simultaneously, our framework aims to provide a comprehensive solution for secure, efficient, and reliable federated learning in Intelligent Connected Vehicle systems.

3. Problem Definition

In this section, we formally define the problem of Byzantine-robust multimodal federated learning for ICVs. We outline the system model, the objectives of our framework, and the specific challenges we aim to address.

3.1. System Model

Consider a network of $K$ ICVs, denoted as $V = \{v_{1}, v_{2}, \dots, v_{K}\}$. Each vehicle $v_{k}$ is equipped with a set of $m$ sensors $S = \{s_{1}, s_{2}, \dots, s_{m}\}$, where each sensor captures a different modality of data (e.g., camera images, LiDAR point clouds, radar signals, GPS coordinates). Each vehicle $v_{k}$ maintains its local dataset $D_{k} = \{(x_{i}, y_{i})\}_{i = 1}^{n_{k}}$, where $x_{i}$ represents the multimodal input data and $y_{i}$ the corresponding labels. As shown in Figure 1, the goal is to collaboratively train a global model $w$ that can accurately perform a given task (e.g., object detection, traffic prediction) by aggregating local updates $Δ w_{i}$ from each vehicle without sharing raw data. The federated learning process can be formalized as follows:

$w_{t + 1} = w_{t} + η \sum_{i = 1}^{K} α_{i} Δ w_{i}^{t},$

(1)

where $w_{t}$ is the global model at iteration $t$, $η$ is the learning rate, $α_{i}$ is the weight assigned to the $i$-th vehicle, and $Δ w_{i}^{t}$ is the local model update from vehicle $i$ at iteration $t$.
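As an illustration of Equation (1), the following minimal Python sketch applies one weighted update to a toy two-parameter model. The weighting $α_{i} = n_{i} / \sum_{j} n_{j}$ (proportional to local dataset size, as in FedAvg) is an assumption made here for the example, since the weights are not fixed in the text; the function name `federated_update` and all numbers are hypothetical.

```python
from typing import List


def federated_update(w_t: List[float], updates: List[List[float]],
                     alphas: List[float], eta: float) -> List[float]:
    """Apply w_{t+1} = w_t + eta * sum_i alpha_i * delta_w_i (Equation (1))."""
    if len(updates) != len(alphas):
        raise ValueError("one weight per vehicle is required")
    w_next = list(w_t)
    for alpha, delta in zip(alphas, updates):
        if len(delta) != len(w_t):
            raise ValueError("update has the wrong dimensionality")
        for d, value in enumerate(delta):
            w_next[d] += eta * alpha * value
    return w_next


# Toy example: three vehicles, two parameters.
# ASSUMPTION: weights proportional to (hypothetical) local dataset sizes.
n_k = [100, 300, 600]
alphas = [n / sum(n_k) for n in n_k]          # 0.1, 0.3, 0.6
updates = [[1.0, -1.0], [0.0, 2.0], [-1.0, 0.0]]
w_next = federated_update([0.5, 0.5], updates, alphas, eta=1.0)
print(w_next)  # approximately [0.0, 1.0]
```

Note that this plain weighted sum gives every vehicle, honest or not, direct influence on the global model, which is the weakness the robust aggregation in Equation (3) is meant to address.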

Multimodal Fusion. Each vehicle processes multimodal data, necessitating an effective fusion strategy to handle different types of input data. Let $x_{i}$ denote the multimodal input data from vehicle $v_{i}$, comprising $m$ modalities $x_{i} = \{x_{i}^{1}, x_{i}^{2}, \dots, x_{i}^{m}\}$. The local model $f_{i} (x_{i}; w_{i})$ integrates these modalities to produce predictions ${\hat{y}}_{i} = f_{i} (x_{i}; w_{i})$. The multimodal fusion within each vehicle can be formulated as follows:

$h_{i} = F (x_{i}^{1}, x_{i}^{2}, \dots, x_{i}^{m}),$

(2)

where $F$ denotes the fusion function that combines the features from different modalities into a unified representation $h_{i}$.

Byzantine Adversaries. In this setup, a subset of vehicles may act as Byzantine adversaries, sending arbitrary or malicious updates $Δ w_{i}^{t, adv}$. These adversarial updates can significantly deteriorate the performance of the global model. Let $B \subseteq \{1, 2, \dots, K\}$ denote the set of Byzantine vehicles, with $| B | = b$. To mitigate the influence of Byzantine adversaries, we introduce a Byzantine-robust aggregation mechanism. The objective is to aggregate the local updates in a way that minimizes the impact of adversarial updates. Formally, the robust aggregation function $A$ replaces the weighted sum in the global update:

$w_{t + 1} = w_{t} + η A (\{Δ w_{i}^{t}\}_{i = 1}^{K}) .$

(3)

The aggregation function $A$ should satisfy the following properties: (1) Resilience: It should be resilient to at most $b$ Byzantine adversaries. (2) Accuracy: It should ensure that the aggregated update is close to the mean of the non-adversarial updates.

Objective Function. The overall objective is to minimize the global loss function $L (w)$ over all vehicles, accounting for the presence of Byzantine adversaries:

$\min_{w} L (w) = \sum_{i \notin B} α_{i} L_{i} (w),$

(4)

where $L_{i} (w)$ is the local loss function for vehicle $i$. Thus, the proposed Byzantine-robust multimodal federated learning framework aims to ensure robust and efficient collaborative learning among ICVs, leveraging diverse data modalities while safeguarding against adversarial disruptions.

3.2. Challenges and Constraints

Implementing the above robust framework for ICVs poses several challenges and limitations that must be addressed to ensure the effectiveness and reliability of the system.

Byzantine Robustness. Ensuring robustness against Byzantine adversaries is a significant challenge. Malicious nodes can send faulty updates that can severely degrade the performance of the global model. Designing efficient and effective robust aggregation methods to mitigate these attacks while maintaining high model performance is complex.
Communication Overhead. FL inherently involves substantial communication between nodes and the central server. The gradient compression technique helps reduce this overhead, but finding the optimal balance between compression rate and model accuracy is crucial. Excessive compression can lead to the loss of important information, while insufficient compression can cause excessive communication delays.
Heterogeneous Data. Multimodal datasets from different vehicles may vary in quality, resolution, and format. Ensuring effective data fusion across these heterogeneous sources without losing critical information is a key constraint.

Addressing these challenges and constraints requires continuous innovation and rigorous testing to ensure the framework’s reliability, efficiency, and security in real-world ICV applications.

4. Our Approach

This section presents the Byzantine-robust multimodal federated learning framework for ICVs. The framework includes a Byzantine aggregation algorithm based on gradient compression, a modality alignment fusion method across nodes, and an objective function designed to enhance learning performance despite adversarial interference, as shown in Figure 1.

4.1. Cross-Node Multimodal Alignment and Fusion

The cross-node multimodal alignment and fusion technique is designed to handle the diverse data modalities from different ICVs and align them into a consistent latent space for effective fusion. This technique ensures that the multimodal features from different nodes are comparable and can be effectively aggregated for federated learning.

Local Feature Extraction. Each vehicle $v_{i}$ extracts features from its local multimodal data using dedicated subnetworks. Let $x_{i} = \{x_{i}^{1}, x_{i}^{2}, \dots, x_{i}^{m}\}$ represent the input data from $m$ modalities. For each modality $j$, a feature extraction subnetwork $ϕ_{j}$ is used to obtain the local features:

$h_{i}^{j} = ϕ_{j} (x_{i}^{j}),$

(5)

where $h_{i}^{j}$ denotes the extracted feature vector for modality $j$ from vehicle $v_{i}$.

Discussion. To manage high-dimensional multimodal data in the proposed framework for ICVs, we combine dimensionality reduction, feature extraction, and multimodal fusion strategies. The local feature extraction step above reduces the dimensionality of data from various sensors and cameras while preserving essential features. Sparse representations and low-rank approximations further reduce the complexity of high-dimensional inputs. Additionally, adaptive fusion strategies integrate the reduced representations of different modalities, allowing for efficient information aggregation across diverse data sources. This allows the framework to handle the high dimensionality of multimodal data without overwhelming computational resources while maintaining robustness against adversarial attacks.

Global Modality Alignment. To ensure that the features from different nodes are aligned into a common latent space, we employ a modality alignment network $A_{j}$ for each modality. This network aligns the local features $h_{i}^{j}$ to a global latent space:

${\tilde{h}}_{i}^{j} = A_{j} (h_{i}^{j}),$

(6)

where ${\tilde{h}}_{i}^{j}$ is the globally aligned feature for modality $j$ from vehicle $i$.

Alignment Network. The alignment network $A_{j}$ can be implemented as a neural network trained to minimize the distance between features of the same class across different nodes. The loss function for training $A_{j}$ could be a contrastive loss or a triplet loss. In triplet form, it reads:

$L_{\mathrm{align}} = \sum_{i, k} {\left[ {\lVert A_{j} (h_{i}^{j}) - A_{j} (h_{k}^{j}) \rVert}^{2} - {\lVert A_{j} (h_{i}^{j}) - A_{j} (h_{\mathrm{neg}}^{j}) \rVert}^{2} + α \right]}_{+},$

(7)

where $h_{k}^{j}$ is a feature from another vehicle with the same label as $h_{i}^{j}$, $h_{\mathrm{neg}}^{j}$ is a feature from a different class, $α$ is a margin parameter, and $[\cdot]_{+} = \max (\cdot, 0)$.
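To make the triplet form of Equation (7) concrete, the following minimal Python sketch evaluates the loss on features that are assumed to have already passed through the alignment network $A_{j}$; training $A_{j}$ itself is not shown. The helper names (`squared_distance`, `triplet_term`, `alignment_loss`) and the toy vectors are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_term(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float) -> float:
    """One summand of Eq. (7): [ ||a - p||^2 - ||a - n||^2 + margin ]_+ ."""
    return max(squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin, 0.0)


def alignment_loss(triplets: List[Tuple[Vector, Vector, Vector]],
                   margin: float) -> float:
    """Sum the hinge terms over all (anchor, positive, negative) triplets."""
    return sum(triplet_term(a, p, n, margin) for a, p, n in triplets)


# ASSUMPTION: inputs are already-aligned features A_j(h); values are toy data.
triplets = [
    # Same-class feature is close, other class is far: contributes 0.
    ([0.0, 0.0], [1.0, 0.0], [3.0, 0.0]),
    # Same-class feature is farther than the other class: contributes 4 - 1 + 1 = 4.
    ([0.0, 0.0], [2.0, 0.0], [1.0, 0.0]),
]
loss = alignment_loss(triplets, margin=1.0)
print(loss)  # 4.0
```

Only the second triplet is penalized, which reflects the intent of the loss: same-class features from different vehicles should end up closer together than features of different classes, by at least the margin.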

Feature Fusion. After aligning the features from all modalities, the next step is to fuse them into a single representation. This unified representation combines information from all modalities and serves as the input for the local prediction model. The fused representation $h_{i}$ for vehicle $i$ is obtained by concatenating the aligned features:

$h_{i} = \mathrm{Concat} ({\tilde{h}}_{i}^{1}, {\tilde{h}}_{i}^{2}, \dots, {\tilde{h}}_{i}^{m}),$

(8)

where Concat denotes the concatenation operation across all $m$ modalities.

Concatenation and Fusion Network. The concatenated feature vector $h_{i}$ is then passed through a fusion network $F$ to integrate the information from different modalities:

$z_{i} = F (h_{i}),$

(9)

where $z_{i}$ is the final fused feature vector used for prediction. The fused feature vector $z_{i}$ is used by the local model $f_{i}$ to make predictions:

${\hat{y}}_{i} = f_{i} (z_{i}; w_{i}),$

(10)

where $w_{i}$ are the local model parameters.
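The chain of Equations (5), (6), (8), and (9) can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions: each subnetwork ($ϕ_{j}$, $A_{j}$, $F$) is replaced by a single hand-written linear map, with a ReLU after $ϕ_{j}$ and $F$, and the two toy modalities ("camera" and "LiDAR") and all weights are hypothetical. The prediction step of Equation (10) is omitted.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Matrix-vector product; stands in for a learned subnetwork."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def fuse(modal_inputs: List[Vector], extractors: List[Matrix],
         aligners: List[Matrix], fusion_weights: Matrix):
    """Return (h, z): the concatenated and the fused feature vector."""
    aligned = []
    for x_j, phi_j, a_j in zip(modal_inputs, extractors, aligners):
        h_j = relu(linear(phi_j, x_j))         # Eq. (5): local feature extraction
        aligned.append(linear(a_j, h_j))       # Eq. (6): alignment to latent space
    h = [v for feat in aligned for v in feat]  # Eq. (8): concatenation
    z = relu(linear(fusion_weights, h))        # Eq. (9): fusion network
    return h, z


# ASSUMPTION: toy inputs and weights, chosen only to make the arithmetic visible.
camera, lidar = [1.0, 2.0], [3.0]
phi = [[[1.0, 0.0], [0.0, 1.0]],      # camera extractor (identity)
       [[1.0], [-1.0]]]               # LiDAR extractor
align = [[[0.5, 0.5]],                # camera aligner: 2 -> 1 dimensions
         [[1.0, 1.0]]]                # LiDAR aligner: 2 -> 1 dimensions
fusion = [[1.0, 1.0], [1.0, -1.0]]
h, z = fuse([camera, lidar], phi, align, fusion)
print(h, z)  # [1.5, 3.0] [4.5, 0.0]
```

Because every modality is first mapped into a shared latent space of fixed size, the concatenated vector has the same layout on every vehicle, which is what makes the downstream model parameters comparable across nodes.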

Training and Optimization. The local models are trained to minimize the empirical risk over their local datasets. The local loss function for vehicle $i$ is defined as follows:

$L_{i} (w) = \frac{1}{n_{i}} \sum_{j = 1}^{n_{i}} ℓ (f_{i} (z_{i, j}; w_{i}), y_{i, j}),$

(11)

where $z_{i, j}$ and $y_{i, j}$ are the fused feature vector and the label of the $j$-th local sample of vehicle $i$, $n_{i}$ is the number of local samples, and $ℓ (\cdot, \cdot)$ is the loss function (e.g., cross-entropy loss for classification tasks). The overall objective is to minimize the global loss function across all vehicles, accounting for the presence of Byzantine adversaries (see the following section):

$\min_{w} L (w) = \sum_{i \notin B} α_{i} L_{i} (w) .$

(12)
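As a small worked instance of the local empirical risk in Equation (11), the sketch below averages a cross-entropy loss over a toy batch. It assumes that the local model outputs unnormalized class scores (logits), so that the logits play the role of $f_{i} (z_{i, j}; w_{i})$; the function names and the data are hypothetical.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]


def local_loss(batch_logits: List[Sequence[float]], labels: List[int]) -> float:
    """Eq. (11): mean per-sample loss over the n_i local samples."""
    if len(batch_logits) != len(labels) or not labels:
        raise ValueError("need one label per sample and at least one sample")
    total = sum(cross_entropy(l, y) for l, y in zip(batch_logits, labels))
    return total / len(labels)


# ASSUMPTION: toy logits for a 2-class task; an uninformative model scores
# both classes equally, so every sample costs log(2).
risk = local_loss([[0.0, 0.0], [0.0, 0.0]], [0, 1])
print(risk)  # log(2), about 0.693
```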

4.2. Gradient Compression-Based Byzantine Aggregation

This section provides a detailed description of the Byzantine aggregation technique based on top-k gradient compression.

Local Gradient Calculation. Each vehicle $v_{k}$ computes the local gradient $Δ w_{k}$ based on its local dataset $D_{k}$:

$Δ w_{k} = \nabla L_{k} (w_{k}),$

(13)

where $L_{k} (w_{k})$ is the local loss function for vehicle $v_{k}$, and $w_{k}$ are the local model parameters. We then apply top-k gradient compression to improve communication efficiency: only the most significant elements of the gradient vector are retained, which reduces communication overhead. Given a gradient vector $Δ w_{k}$ from vehicle $v_{k}$, we first compute the element-wise magnitudes of $Δ w_{k}$:

$\mathrm{magnitude}_{k} = | Δ w_{k} | .$

(14)

Top-k Selection. Here, we identify the indices of the top-k largest magnitudes by using the following equation:

$\mathrm{topk\_indices} = \mathrm{argsort} (\mathrm{magnitude}_{k}) [- k :] .$

(15)

Then, we create a binary mask $m_{k}$ where the positions corresponding to the top-k indices are set to 1, and the rest are set to 0:

$m_{k} [j] = \begin{cases} 1 & \text{if } j \in \mathrm{topk\_indices} \\ 0 & \text{otherwise} \end{cases}$

(16)

Gradient Compression. We apply the binary mask to the gradient vector:

$C (Δ w_{k}) = Δ w_{k} ⊙ m_{k},$

(17)

where $⊙$ denotes element-wise multiplication. The compressed gradient $C (Δ w_{k})$ contains only the top-k elements of $Δ w_{k}$, reducing the communication load.
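Equations (14) to (17) translate almost directly into code. The following minimal Python sketch selects the top-k indices with `heapq.nlargest`, which avoids a full sort, builds the binary mask, and applies it. Sending the result as sparse (index, value) pairs is an assumption made for the example; the text only states that the top-k elements are retained. The function names and the toy gradient are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def top_k_compress(gradient: List[float], k: int) -> Tuple[List[float], List[int]]:
    """Return (compressed gradient, binary mask) following Eqs. (14)-(17)."""
    if not 0 < k <= len(gradient):
        raise ValueError("k must satisfy 0 < k <= len(gradient)")
    # Eqs. (14)-(15): indices of the k largest magnitudes.
    top_indices = set(heapq.nlargest(k, range(len(gradient)),
                                     key=lambda j: abs(gradient[j])))
    # Eq. (16): binary mask.
    mask = [1 if j in top_indices else 0 for j in range(len(gradient))]
    # Eq. (17): element-wise product with the mask.
    compressed = [g if m else 0.0 for g, m in zip(gradient, mask)]
    return compressed, mask


def to_sparse(compressed: List[float]) -> Dict[int, float]:
    """ASSUMPTION: transmit only the non-zero (index, value) pairs."""
    return {j: v for j, v in enumerate(compressed) if v != 0.0}


gradient = [0.1, -2.0, 0.5, 3.0, -0.2]   # toy gradient
compressed, mask = top_k_compress(gradient, k=2)
payload = to_sparse(compressed)
print(mask)        # [0, 1, 0, 1, 0]
print(compressed)  # [0.0, -2.0, 0.0, 3.0, 0.0]
print(payload)     # {1: -2.0, 3: 3.0}
```

Selection is by magnitude, so the large negative entry is kept with its sign intact; only the payload, not the dense vector, needs to be sent to the server.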

Robust Aggregation. To mitigate the influence of Byzantine adversaries, a robust aggregation method such as trimmed mean aggregation is used at the central server. Given compressed gradients $\{C (Δ w_{k}^{t})\}_{k = 1}^{K}$ from $K$ vehicles, the server proceeds as follows:

Dimension-wise Sorting and Trimming. For each dimension $d$ of the gradient vector, we collect the $d$-th elements of the compressed gradients from all vehicles, i.e., $\{C {(Δ w_{k}^{t})}_{d}\}_{k = 1}^{K}$. Then, we sort the collected values, i.e., $C {(Δ w_{(1)}^{t})}_{d} \leq C {(Δ w_{(2)}^{t})}_{d} \leq \dots \leq C {(Δ w_{(K)}^{t})}_{d}$, where the subscript $(i)$ denotes the $i$-th smallest value. After that, we trim the $b$ largest and the $b$ smallest values, where $b$ is the estimated number of Byzantine adversaries.
Mean Calculation. We compute the mean of the remaining $K - 2 b$ values after trimming:

$A_{d} (\{C (Δ w_{k}^{t})\}_{k = 1}^{K}) = \frac{1}{K - 2 b} \sum_{i = b + 1}^{K - b} C {(Δ w_{(i)}^{t})}_{d} .$

(18)

Then, we construct the aggregated gradient $A (\{C (Δ w_{k}^{t})\}_{k = 1}^{K})$ by applying $A_{d}$ to each dimension $d$:

$A (\{C (Δ w_{k}^{t})\}_{k = 1}^{K}) = (A_{1} (\{C (Δ w_{k}^{t})\}_{k = 1}^{K}), A_{2} (\{C (Δ w_{k}^{t})\}_{k = 1}^{K}), \dots),$

(19)

with one component $A_{d}$ for each dimension $d$ of the gradient vector.
Global Model Update. The server updates the global model using the robustly aggregated gradient:

$w_{t + 1} = w_{t} + η A (\{C (Δ w_{k}^{t})\}_{k = 1}^{K}),$

(20)

where $η$ is the learning rate.
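The server-side steps of Equations (18) to (20) can be sketched as follows. This is a minimal illustration, assuming that each compressed gradient arrives (or is expanded) as a dense list in which masked-out entries are zero; the function names and the toy updates, including the single outlier standing in for a Byzantine vehicle, are hypothetical.

```python
from typing import List


def trimmed_mean(updates: List[List[float]], b: int) -> List[float]:
    """Coordinate-wise trimmed mean (Eqs. (18)-(19)): per dimension, drop the
    b smallest and b largest values and average the remaining K - 2b."""
    K = len(updates)
    if K <= 2 * b:
        raise ValueError("need K > 2b so that some values survive trimming")
    aggregated = []
    for d in range(len(updates[0])):
        column = sorted(u[d] for u in updates)   # sort the d-th coordinates
        kept = column[b:K - b]                   # trim both tails
        aggregated.append(sum(kept) / (K - 2 * b))
    return aggregated


def global_update(w_t: List[float], aggregated: List[float],
                  eta: float) -> List[float]:
    """Eq. (20): w_{t+1} = w_t + eta * A(...)."""
    return [w + eta * a for w, a in zip(w_t, aggregated)]


# Toy round: K = 5 vehicles, the last one sends an extreme update, b = 1.
updates = [[1.0, 0.0], [2.0, 0.0], [3.0, 3.0], [4.0, 0.0], [100.0, -50.0]]
agg = trimmed_mean(updates, b=1)
plain_mean_dim0 = sum(u[0] for u in updates) / len(updates)
w_next = global_update([1.0, 1.0], agg, eta=0.5)
print(agg)              # [3.0, 0.0]
print(plain_mean_dim0)  # 22.0
print(w_next)           # [2.5, 1.0]
```

In this toy round the plain mean of the first coordinate is pulled to 22.0 by the single outlier, whereas the trimmed mean stays at 3.0. Trimming is symmetric, so honest extreme values are discarded as well; this is the price of not having to identify which vehicles are Byzantine.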

The Byzantine aggregation technique based on top-k gradient compression involves compressing local gradients by retaining only the top-k elements and using a robust trimmed mean aggregation method at the server. This approach effectively reduces communication overhead and mitigates the impact of Byzantine adversaries, ensuring resilient and accurate global model updates in the federated learning framework for ICVs. The detailed implementation steps (see Algorithm 1) and formulas provided establish the theoretical and practical foundations of the technique, ensuring it can be effectively applied in real-world scenarios.

4.3. Time Complexity Analysis

Analyzing the time complexity of the Byzantine-robust aggregation technique based on top-k gradient compression involves examining the computational cost of each step in the process. Here, we break down the time complexity for the key steps: local gradient calculation, gradient compression, transmission, robust aggregation at the server, and the global model update.

Local Gradient Calculation. Each vehicle computes the local gradient $Δ w_{k}$ based on its local dataset. Assume the dataset has $m$ samples (in this subsection, $m$ counts local samples rather than modalities) and the model has $d$ parameters. The time complexity for gradient computation is $O (m \cdot d)$, because each parameter gradient is typically calculated as a sum over the dataset, involving $m$ operations per parameter.
Top-k Gradient Compression. After computing the gradient, each vehicle compresses it by retaining the top-k elements, i.e., $O (d)$ for magnitude calculation, $O (d \log k)$ for top-k selection, and $O (d)$ for binary mask creation and gradient compression. The overall time complexity for top-k gradient compression is $O (d + d \log k + d) = O (d \log k)$.
Transmission. The transmission time depends on the communication bandwidth and is not typically considered in time complexity analysis. However, since only $k$ elements are transmitted, the communication cost per vehicle is $O (k)$.
Robust Aggregation at Server. The server aggregates the compressed gradients from the $K$ vehicles using the trimmed mean method, i.e., $O (K \cdot d)$ for dimension-wise collection, $O (d \cdot K \log K)$ for sorting, and $O (d \cdot (K - 2 b)) = O (d \cdot K)$ for trimming and mean calculation. The overall time complexity for robust aggregation is $O (d \cdot K \log K + d \cdot K) = O (d \cdot K \log K)$.
Global Model Update. The server updates the global model using the aggregated gradient. The time complexity for this step is $O (d)$.

Combining all the steps, the total time complexity of one round of the Byzantine-robust aggregation technique based on top-k gradient compression is

$O (m \cdot d) + O (d \log k) + O (k) + O (d \cdot K \log K) + O (d) .$

(21)

At the server, the robust aggregation dominates because of its sorting and averaging operations, so the server-side time complexity is

$O (d \cdot K \log K),$

while the $O (m \cdot d)$ gradient cost is incurred locally on each vehicle. This complexity ensures that the framework can efficiently handle large-scale federated learning with numerous participants, provided the number of parameters $d$ and the number of vehicles $K$ remain manageable.

Algorithm 1: Byzantine-robust Multimodal Federated Learning Algorithm.

Input: Local model $ω_{k}$ and local multimodal dataset $D_{k}$.

Output: Global model $ω$.

The server initializes the generator and global model and sends them to each vehicle;

5. Experiments

5.1. Experiment Setup

To evaluate the performance of our framework, we conducted extensive experiments on public multimodal benchmark datasets. All experiments were implemented using Python 3.9 and PyTorch 1.12 and evaluated on a server with an NVIDIA A100 GPU.

Datasets. Our proposed framework was evaluated using three comprehensive multimodal datasets: KITTI [46], nuScenes [23], and KAIST Multispectral Pedestrian Detection Dataset [47]. KITTI provides RGB images, LiDAR point clouds, and GPS/IMU data for 2D and 3D object detection in urban and highway scenes. nuScenes offers a larger-scale dataset with additional RADAR data, covering diverse driving scenarios in multiple cities. The KAIST dataset focuses on pedestrian detection using RGB and thermal infrared images, challenging the framework with day/night variations. These datasets are adapted for federated learning by partitioning data among simulated ICVs, ensuring non-IID distributions, implementing privacy-preserving data handling, introducing Byzantine nodes, and simulating varying communication conditions. Together, they provide a robust testbed for evaluating our framework’s performance in multimodal fusion, Byzantine resilience, and adaptation to diverse ICV environments and object detection tasks.

Models. Our framework employs a diverse set of state-of-the-art models adapted for FL scenarios. These include PointPillars [48], which efficiently processes LiDAR point clouds and can be extended to incorporate RGB data; MVX-Net [49], designed for the multimodal fusion of LiDAR and image data; AVOD [50], a 3D object detection model that fuses LiDAR and RGB inputs; a custom KAIST Multispectral Pedestrian Detection Network for handling RGB and thermal infrared data; and YOLOv4-Multispectral, an extension of YOLOv4 [51] adapted for fast, multispectral object detection.

Parameters. In the proposed framework, key parameters include the number of participating nodes ($K = 100$ vehicles), the gradient compression rate ($r = 100$ elements), and the learning rate $\eta = 0.001$. The framework is designed to handle $b \in \{10, 15, 20, 25\}$ Byzantine nodes. The model typically involves $d = 10^{6}$ parameters, with each node processing a local dataset of $m = 10{,}000$ samples. The datasets used include RGB images with resolutions of $1242 \times 375$ (KITTI), $1600 \times 900$ (nuScenes), and $1920 \times 1280$ pixels (Waymo). LiDAR point clouds have densities of around 100,000 (KITTI), 300,000 (nuScenes), and 200,000 points per frame (Waymo). Radar data are captured at approximately 13 Hz, and GPS accuracy is within 1–2 m. These parameters ensure the framework’s efficiency, robustness, and effectiveness in real-world ICV scenarios.
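For reference, the parameter values listed above can be collected in a small configuration object. This is only a sketch: the class and field names are hypothetical, and the 4 bytes per parameter used for the dense update size assumes 32-bit floats, which the paper does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Parameter values quoted in the Parameters paragraph (names are illustrative)."""
    num_clients: int = 100                        # K
    compression_rate: int = 100                   # r
    learning_rate: float = 0.001                  # eta
    byzantine_counts: tuple = (10, 15, 20, 25)    # b
    model_dim: int = 10**6                        # d
    samples_per_client: int = 10_000              # m

    def byzantine_fractions(self):
        """Share of clients that are Byzantine for each tested value of b."""
        return [b / self.num_clients for b in self.byzantine_counts]

    def dense_update_megabytes(self, bytes_per_value=4):
        """Size of one uncompressed model update (assumes float32 by default)."""
        return self.model_dim * bytes_per_value / 1e6


cfg = ExperimentConfig()
print(cfg.byzantine_fractions())     # Byzantine share of K for each b
print(cfg.dense_update_megabytes())  # MB per dense update under the float32 assumption
```

With $K = 100$, the tested values of $b$ correspond to 10% to 25% of the clients, so honest clients remain the majority in every setting.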

Baselines. The proposed framework was evaluated using the following baselines:

FedAvg [17]. FedAvg aggregates local models from all vehicles by averaging their parameters but does not account for Byzantine robustness.
Krum [43] and Multi-Krum [52]. They are robust aggregation techniques designed to resist Byzantine attacks by selecting gradients that deviate the least from the majority.
Trimmed Mean [53] and Median [28]. These aggregation methods enhance robustness by trimming extreme values and using median values to mitigate the influence of adversarial updates.
Byzantine-resilient SGD (BrSGD) [54]. This approach focuses on detecting and excluding malicious updates during training.
FLTrust [55]. This approach focuses on computing trust scores to select high-quality clients.

These baselines provide a comprehensive evaluation framework, allowing for a robust comparison of the proposed framework’s effectiveness in handling adversarial scenarios, maintaining accuracy, and ensuring efficient multimodal data fusion in ICV tasks.
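To make the difference between these aggregation rules concrete, the sketch below gives textbook forms of four of them (unweighted mean as in FedAvg without sample weighting, coordinate-wise median, coordinate-wise trimmed mean, and Krum). It is not the implementation used in the experiments; the function names are hypothetical, and updates are plain lists of floats for readability.

```python
import math
import statistics


def mean_agg(updates):
    """Coordinate-wise unweighted mean (FedAvg without sample weighting)."""
    return [sum(col) / len(col) for col in zip(*updates)]


def median_agg(updates):
    """Coordinate-wise median."""
    return [statistics.median(col) for col in zip(*updates)]


def trimmed_mean_agg(updates, trim):
    """Drop the `trim` smallest and `trim` largest values per coordinate, then average."""
    if 2 * trim >= len(updates):
        raise ValueError("trim removes every value")
    out = []
    for col in zip(*updates):
        kept = sorted(col)[trim:len(col) - trim]
        out.append(sum(kept) / len(kept))
    return out


def krum(updates, num_byzantine):
    """Return the update whose n - f - 2 nearest neighbours are closest to it."""
    n = len(updates)
    if n <= 2 * num_byzantine + 2:
        raise ValueError("Krum requires n > 2f + 2")
    m = n - num_byzantine - 2
    best_idx, best_score = None, math.inf
    for i, u in enumerate(updates):
        # Squared Euclidean distances from update i to every other update.
        dists = sorted(
            sum((a - b) ** 2 for a, b in zip(u, v))
            for j, v in enumerate(updates) if j != i
        )
        score = sum(dists[:m])
        if score < best_score:
            best_idx, best_score = i, score
    return updates[best_idx]


# Eight honest updates near (1, 1) and two outliers far away.
honest = [[1.0 + 0.01 * i, 1.0 - 0.01 * i] for i in range(8)]
byzantine = [[100.0, -100.0], [100.0, -100.0]]
updates = honest + byzantine
print(mean_agg(updates))             # pulled far from (1, 1) by the outliers
print(median_agg(updates))           # stays close to (1, 1)
print(trimmed_mean_agg(updates, 2))  # stays close to (1, 1)
print(krum(updates, 2))              # one of the honest updates
```

The plain mean is dominated by the two outliers, whereas the three robust rules stay near the honest cluster, which is the behavior the baselines above are designed to provide.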

Evaluation Metrics. Evaluating the proposed framework requires a comprehensive set of metrics to assess its performance across various dimensions. Key evaluation metrics include accuracy, which measures the proportion of correctly detected objects in multimodal datasets such as KITTI, nuScenes, and Waymo. Robustness metrics, i.e., the percentage of successful attacks detected and mitigated, are essential for evaluating the framework’s resilience against Byzantine adversaries. We use the communication overhead to assess the efficiency of the gradient compression and transmission processes, while convergence time measures how quickly the model reaches a satisfactory performance level. Together, these metrics ensure a thorough evaluation of the framework’s accuracy, robustness, efficiency, and overall effectiveness in real-world ICV scenarios.
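The metrics above are described in words only. As a hedged illustration, two of them could be computed as follows, assuming convergence time is counted in communication rounds needed to reach a target accuracy and robustness is the fraction of Byzantine clients that the defense flags. Both definitions and both function names are assumptions made for this sketch.

```python
def rounds_to_target(accuracy_history, target):
    """Return the first round (1-based) whose accuracy reaches `target`, or None."""
    for round_idx, acc in enumerate(accuracy_history, start=1):
        if acc >= target:
            return round_idx
    return None


def mitigation_rate(flagged_clients, byzantine_clients):
    """Fraction of Byzantine clients that were flagged by the defense."""
    byzantine = set(byzantine_clients)
    if not byzantine:
        return 0.0
    return len(set(flagged_clients) & byzantine) / len(byzantine)


history = [0.41, 0.55, 0.63, 0.70, 0.72]
print(rounds_to_target(history, 0.70))             # 4
print(mitigation_rate([3, 7, 9, 12], [3, 7, 21]))  # 2 of 3 attackers flagged
```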

5.2. Numerical Analysis

System Performance. This experiment studies the overall performance of the proposed framework. We adopted local label-flipping attacks and Gaussian attacks as Byzantine attacks, and first compared the proposed framework with the baselines on the benchmark datasets under local label-flipping attacks. As Table 1 shows, the proposed framework achieves the highest accuracy on all three datasets (73.2, 72.5, and 77.9 on KITTI, nuScenes, and KAIST, versus 71.2, 68.3, and 75.4 for BrSGD, the strongest baseline in the table), indicating that it filters poisoned updates effectively while maintaining model performance. We attribute this advantage to the Byzantine defense mechanism and the multimodal fusion design.
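The two attack types are named here without implementation details. The sketch below shows one common way to simulate them, under the following assumptions: Byzantine clients are chosen uniformly at random, label flipping maps class $c$ to $C - 1 - c$, and a Gaussian attacker submits zero-mean noise in place of its honest update. The function names and the `sigma` parameter are hypothetical.

```python
import random


def choose_byzantine(num_clients, num_byzantine, seed=0):
    """Pick which client ids behave maliciously (uniformly at random)."""
    rng = random.Random(seed)
    return set(rng.sample(range(num_clients), num_byzantine))


def flip_labels(labels, num_classes):
    """Label-flipping attack: relabel class c as num_classes - 1 - c."""
    return [num_classes - 1 - y for y in labels]


def gaussian_update(dim, sigma=1.0, seed=0):
    """Gaussian attack: a malicious update drawn from N(0, sigma^2) per coordinate."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(dim)]


attackers = choose_byzantine(num_clients=100, num_byzantine=25)
print(len(attackers))                  # 25
print(flip_labels([0, 1, 2, 3], 4))    # [3, 2, 1, 0]
print(len(gaussian_update(dim=8)))     # 8
```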

Secondly, we explored the impact of the number of Byzantine clients on the performance of the proposed framework and the baselines under the above two attacks, setting $b \in \{10, 15, 20, 25\}$. The results are summarized in Table 2 and Table 3. The proposed framework remains more robust than the baselines for every number of attackers: as $b$ grows from 10 to 25, its accuracy falls from 73.2 to 68.4 in Table 2 and from 74.2 to 69.3 in Table 3, whereas FedAvg falls from 65.4 to 52.4 and from 58.6 to 42.7, respectively. This indicates that the proposed Byzantine defense based on gradient compression is effective against these attacks.

Communication Efficiency. Table 4 reports the communication overhead of the proposed framework and the baselines for different numbers of clients. The proposed framework is roughly two orders of magnitude cheaper (49.64 MB versus 4896 MB for FedAvg at $K = 100$) because gradient compression shrinks each transmitted update. In addition, cross-node multimodal alignment and fusion provide high-quality model update aggregation, thereby accelerating the convergence of the model.
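To illustrate why compression reduces the transmitted volume, the sketch below uses top-$k$ magnitude sparsification as a stand-in compressor. This is an assumption: the paragraph above does not specify the compressor, and the helper names, the 4-byte value size, and the 4-byte index size are all illustrative.

```python
def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries; return their (indices, values)."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    idx = sorted(order)
    return idx, [grad[i] for i in idx]


def densify(idx, vals, dim):
    """Rebuild a dense vector, filling dropped coordinates with zeros."""
    out = [0.0] * dim
    for i, v in zip(idx, vals):
        out[i] = v
    return out


def payload_bytes(dim, k=None, value_bytes=4, index_bytes=4):
    """Bytes sent per update: dense if k is None, else k (index, value) pairs."""
    if k is None:
        return dim * value_bytes
    return k * (value_bytes + index_bytes)


grad = [0.1, -5.0, 0.3, 2.0, -0.2]
idx, vals = top_k_sparsify(grad, k=2)
print(idx, vals)                       # [1, 3] [-5.0, 2.0]
print(densify(idx, vals, len(grad)))   # [0.0, -5.0, 0.0, 2.0, 0.0]

# Dense versus sparse payload for a model with 10**6 parameters.
print(payload_bytes(10**6))            # 4000000 bytes
print(payload_bytes(10**6, k=10**4))   # 80000 bytes
```

Under these assumptions, keeping 1% of the coordinates cuts the per-update payload by a factor of 50 rather than 100, because each retained value also carries its index.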

Parameter Sensitivity. Here, we explore the impact of the gradient compression rate on the communication overhead of the proposed framework. Table 5 reports the communication overhead under different gradient compression rates: the overhead of the proposed framework decreases from 49.64 MB at $r = 100$ to 41.25 MB at $r = 150$, so a higher compression rate yields better communication efficiency.

Ablation Studies. Finally, we conducted ablation experiments to study the performance impact of different components within the proposed framework. Specifically, we explored the impact of the multimodal fusion mechanism and the Byzantine aggregation mechanism based on gradient compression on the framework, respectively. Table 6 summarizes the experimental results. We observed that the performance of the proposed framework with and without the multimodal fusion mechanism was relatively close, indicating a good privacy–performance trade-off. In addition, we observed that the Byzantine aggregation mechanism based on gradient compression significantly improved the model’s robustness and performance.

Discussion. The proposed Byzantine-robust multimodal FL framework for ICVs can effectively scale to larger networks by leveraging adaptive robust aggregation mechanisms, hierarchical structures, and resource-aware optimization techniques. The dynamic network topology of ICVs is managed through asynchronous aggregation and buffer mechanisms, ensuring stability even with fluctuating connectivity. The framework’s multimodal fusion strategy accommodates varying data rates and heterogeneous sensor inputs by employing dynamic fusion weights and adaptive sampling. To handle increasing adversarial presence, hierarchical Byzantine-resilient aggregation, combined with reinforcement learning-based optimization, ensures that the system remains robust. Resource constraints are mitigated through bandwidth-aware compression and gradient sparsification, allowing for the framework to maintain efficiency even under communication limitations. Overall, these strategies enable the framework to scale effectively while preserving robustness, resilience, and performance in real-world ICV environments.

6. Conclusions

In this paper, we introduce a Byzantine-robust multimodal federated learning framework designed for ICVs. This framework addresses critical challenges in federated learning, particularly those related to data privacy, security, and robustness against adversarial attacks. By leveraging advanced techniques such as gradient compression and robust aggregation methods, the framework ensures efficient and secure training across multiple nodes, even in the presence of Byzantine adversaries. We highlighted the importance of multimodal data fusion by integrating diverse sensor data, including RGB images, LiDAR point clouds, radar data, and GPS/IMU measurements, to enhance the accuracy and reliability of object detection in autonomous driving. Through the use of benchmark datasets like KITTI, nuScenes, and Waymo Open Dataset, we demonstrated the framework’s capability to maintain high performance and robustness. Our evaluation metrics, including accuracy, precision, recall, robustness, communication overhead, and computational cost, provide a comprehensive assessment of the framework’s effectiveness. The proposed approach not only advances the state of federated learning in autonomous driving but also sets a foundation for future research on secure and resilient distributed machine learning systems.

Author Contributions

Methodology, J.T.; Software, J.L.; Formal analysis, W.C.; Data curation, F.Z.; Writing—original draft, N.W.; Writing—review & editing, J.X.; Supervision, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the Science and Technology Project of Guangxi Power Grid Co., Ltd. (GXKJXM20222206).

Data Availability Statement

Data available in a publicly accessible repository.

Conflicts of Interest

Authors Ning Wu, Jianbin Lu, Weidong Chen, and Jing Xiao were employed by the company Guangxi Power Grid Co., Ltd.; Xiaoming Lin, Fan Zhang, and Jianlin Tang were employed by the company Electric Power Research Institute of CSG. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The authors declare that this study received funding from the Science and Technology Project of Guangxi Power Grid Co., Ltd. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.

References

1. Liu, J.; Liu, J. Intelligent and connected vehicles: Current situation, future directions, and challenges. IEEE Commun. Stand. Mag. 2018, 2, 59–65.
2. Han, M.; Wan, A.; Zhang, F.; Ma, S. An attribute-isolated secure communication architecture for intelligent connected vehicles. IEEE Trans. Intell. Veh. 2020, 5, 545–555.
3. Uhlemann, E. Introducing connected vehicles [connected vehicles]. IEEE Veh. Technol. Mag. 2015, 10, 23–31.
4. Lu, N.; Cheng, N.; Zhang, N.; Shen, X.; Mark, J.W. Connected vehicles: Solutions and challenges. IEEE Internet Things J. 2014, 1, 289–299.
5. Kim, I.; Martins, R.J.; Jang, J.; Badloe, T.; Khadir, S.; Jung, H.-Y.; Kim, H.; Kim, J.; Genevet, P.; Rho, J. Nanophotonics for light detection and ranging technology. Nat. Nanotechnol. 2021, 16, 508–524.
6. Mead, J.B.; Pazmany, A.L.; Sekelsky, S.M.; McIntosh, R.E. Millimeter-wave radars for remotely sensing clouds and precipitation. Proc. IEEE 1994, 82, 1891–1906.
7. Duan, W.; Gu, J.; Wen, M.; Zhang, G.; Ji, Y.; Mumtaz, S. Emerging technologies for 5G-IoV networks: Applications, trends and opportunities. IEEE Netw. 2020, 34, 283–289.
8. Noor-A-Rahim, M.; Liu, Z.; Lee, H.; Khyam, M.O.; He, J.; Pesch, D.; Moessner, K.; Saad, W.; Poor, H.V. 6G for vehicle-to-everything (v2x) communications: Enabling technologies, challenges, and opportunities. Proc. IEEE 2022, 110, 712–734.
9. Chen, S.; Hu, J.; Shi, Y.; Peng, Y.; Fang, J.; Zhao, R.; Zhao, L. Vehicle-to-everything (v2x) services supported by lte-based systems and 5G. IEEE Commun. Stand. Mag. 2017, 1, 70–76.
10. Lu, R.; Zhang, L.; Ni, J.; Fang, Y. 5G vehicle-to-everything services: Gearing up for security and privacy. Proc. IEEE 2019, 108, 373–389.
11. Campolo, C.; Molinaro, A.; Iera, A.; Menichella, F. 5G network slicing for vehicle-to-everything services. IEEE Wirel. Commun. 2017, 24, 38–45.
12. Zavvos, E.; Gerding, E.H.; Yazdanpanah, V.; Maple, C.; Stein, S.; Schraefel, M.C. Privacy and trust in the internet of vehicles. IEEE Trans. Intell. Transp. Syst. 2021, 23, 10126–10141.
13. Liu, Y.; Wang, Y.; Chang, G. Efficient privacy-preserving dual authentication and key agreement scheme for secure v2v communications in an iov paradigm. IEEE Trans. Intell. Transp. Syst. 2017, 18, 2740–2749.
14. Liu, Y.; James, J.; Kang, J.; Niyato, D.; Zhang, S. Privacy-preserving traffic flow prediction: A federated learning approach. IEEE Internet Things J. 2020, 7, 7751–7763.
15. Mei, Q.; Xiong, H.; Chen, J.; Yang, M.; Kumari, S.; Khan, M.K. Efficient certificateless aggregate signature with conditional privacy preservation in iov. IEEE Syst. J. 2020, 15, 245–256.
16. Bao, Y.; Qiu, W.; Cheng, X.; Sun, J. Fine-grained data sharing with enhanced privacy protection and dynamic users group service for the iov. IEEE Trans. Intell. Transp. Syst. 2022, 24, 13035–13049.
17. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A.Y. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (PMLR), Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282.
18. Liu, Y.; Garg, S.; Nie, J.; Zhang, Y.; Xiong, Z.; Kang, J.; Hossain, M.S. Deep anomaly detection for time-series data in industrial iot: A communication-efficient on-device federated learning approach. IEEE Internet Things J. 2020, 8, 6348–6358.
19. Liu, Y.; Huang, A.; Luo, Y.; Huang, H.; Liu, Y.; Chen, Y.; Feng, L.; Chen, T.; Yu, H.; Yang, A.Q. Fedvision: An online visual object detection platform powered by federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 13172–13179.
20. Li, Y.; Tao, X.; Zhang, X.; Liu, J.; Xu, J. Privacy-preserved federated learning for autonomous driving. IEEE Trans. Intell. Transp. Syst. 2021, 23, 8423–8434.
21. Liu, B.; Wang, L.; Liu, M. Lifelong federated reinforcement learning: A learning architecture for navigation in cloud robotic systems. IEEE Robot. Autom. Lett. 2019, 4, 4555–4562.
22. Liu, Y.; Yuan, X.; Xiong, Z.; Kang, J.; Wang, X.; Niyato, D. Federated learning for 6g communications: Challenges, methods, and future directions. China Commun. 2020, 17, 105–118.
23. Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. Nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11621–11631.
24. Zhu, Y.; Ye, Y.; Liu, Y.; James, J. Cross-area travel time uncertainty estimation from trajectory data: A federated learning approach. IEEE Trans. Intell. Transp. Syst. 2022, 23, 24966–24978.
25. Du, Z.; Wu, C.; Yoshinaga, T.; Yau, K.-L.A.; Ji, Y.; Li, J. Federated learning for vehicular internet of things: Recent advances and open issues. IEEE Open J. Comput. Soc. 2020, 1, 45–61.
26. Fung, C.; Yoon, C.J.; Beschastnikh, I. The limitations of federated learning in sybil settings. In Proceedings of the 23rd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2020), San Sebastian, Spain, 14–15 October 2020; pp. 301–316.
27. Tolpegin, V.; Truex, S.; Gursoy, M.E.; Liu, L. Data poisoning attacks against federated learning systems. In Proceedings of the Computer Security – ESORICS 2020: 25th European Symposium on Research in Computer Security, Proceedings, Part I 25, Guildford, UK, 14–18 September 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 480–501.
28. Liu, Y.; Wang, C.; Yuan, X. BadSampler: Harnessing the Power of Catastrophic Forgetting to Poison Byzantine-robust Federated Learning. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD’24), Barcelona, Spain, 25–29 August 2024.
29. Ma, Z.; Ma, J.; Miao, Y.; Li, Y.; Deng, R.H. Shieldfl: Mitigating model poisoning attacks in privacy-preserving federated learning. IEEE Trans. Inf. Forensics Secur. 2022, 17, 1639–1654.
30. Taheri, R.; Shojafar, M.; Alazab, M.; Tafazolli, R. FED-IIoT: A robust federated malware detection architecture in industrial IoT. IEEE Trans. Ind. Inform. 2020, 17, 8442–8452.
31. Nabavirazavi, S.; Taheri, R.; Shojafar, M.; Iyengar, S.S. Impact of aggregation function randomization against model poisoning in federated learning. In Proceedings of the 22nd IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2023, Exeter, UK, 1–3 November 2023; pp. 165–172.
32. Cui, Y.; Liang, Y.; Luo, Q.; Shu, Z.; Huang, T. Resilient Consensus Control of Heterogeneous Multi-UAV Systems with Leader of Unknown Input Against Byzantine Attacks. IEEE Trans. Autom. Sci. Eng. 2024, 1–12.
33. Cui, Y.; Jia, Y.; Li, Y.; Shen, J.; Huang, T.; Gong, X. Byzantine resilient joint localization and target tracking of multi-vehicle systems. IEEE Trans. Intell. Veh. 2023, 8, 2899–2913.
34. Konečný, J.; McMahan, H.B.; Ramage, D.; Richtárik, P. Federated optimization: Distributed machine learning for on-device intelligence. arXiv 2016, arXiv:1610.02527.
35. Samarakoon, S.; Bennis, M.; Saad, W.; Debbah, M. Distributed federated learning for ultra-reliable low-latency vehicular communications. IEEE Trans. Commun. 2019, 68, 1146–1159.
36. Posner, J.; Tseng, L.; Aloqaily, M.; Jararweh, Y. Federated learning in vehicular networks: Opportunities and solutions. IEEE Netw. 2021, 35, 152–159.
37. Lu, Y.; Huang, X.; Zhang, K.; Maharjan, S.; Zhang, Y. Blockchain empowered asynchronous federated learning for secure data sharing in internet of vehicles. IEEE Trans. Veh. Technol. 2020, 69, 4298–4311.
38. Salehi, B.; Reus-Muns, G.; Roy, D.; Wang, Z.; Jian, T.; Dy, J.; Ioannidis, S.; Chowdhury, K. Deep learning on multimodal sensor data at the wireless edge for vehicular network. IEEE Trans. Veh. Technol. 2022, 71, 7639–7655.
39. Feng, D.; Haase-Schütz, C.; Rosenbaum, L.; Hertlein, H.; Glaeser, C.; Timm, F.; Wiesbeck, W.; Dietmayer, K. Deep multi-modal object detection and semantic segmentation for autonomous driving: Datasets, methods, and challenges. IEEE Trans. Intell. Transp. Syst. 2020, 22, 1341–1360.
40. Liu, X.; Gao, K.; Liu, B.; Pan, C.; Liang, K.; Yan, L.; Ma, J.; He, F.; Zhang, S.; Pan, S.; et al. Advances in deep learning-based medical image analysis. Health Data Sci. 2021, 2021, 8786793.
41. Rabe, M.; Milz, S.; Mader, P. Development methodologies for safety critical machine learning applications in the automotive domain: A survey. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 129–141.
42. Yin, D.; Chen, Y.; Kannan, R.; Bartlett, P. Byzantine-robust distributed learning: Towards optimal statistical rates. In Proceedings of the International Conference on Machine Learning (PMLR), Stockholm, Sweden, 10–15 July 2018; pp. 5650–5659.
43. Blanchard, P.; Mhamdi, E.M.E.; Guerraoui, R.; Stainer, J. Machine learning with adversaries: Byzantine tolerant gradient descent. Adv. Neural Inf. Process. Syst. 2017, 30, 118–128.
44. Hayat, S.; Yanmaz, E.; Muzaffar, R. Survey on unmanned aerial vehicle networks for civil applications: A communications viewpoint. IEEE Commun. Surv. Tutorials 2016, 18, 2624–2661.
45. Ye, D.; Yu, R.; Pan, M.; Han, Z. Federated learning in vehicular edge computing: A selective model aggregation approach. IEEE Access 2020, 8, 23920–23935.
46. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The kitti vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.
47. Hwang, S.; Park, J.; Kim, N.; Choi, Y.; Kweon, I.S. Multispectral pedestrian detection: Benchmark dataset and baseline. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1037–1045.
48. Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. Pointpillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 12697–12705.
49. Sindagi, V.A.; Zhou, Y.; Tuzel, O. Mvx-net: Multimodal voxelnet for 3D object detection. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 7276–7282.
50. Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S.L. Joint 3D proposal generation and object detection from view aggregation. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1–8.
51. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
52. Colosimo, F.; Rango, F.D. Median-krum: A joint distance-statistical based byzantine-robust algorithm in federated learning. In Proceedings of the Int’l ACM Symposium on Mobility Management and Wireless Access, Montreal, QC, Canada, 30 October–3 November 2023; pp. 61–68.
53. Wang, T.; Zheng, Z.; Lin, F. Federated Learning Framework Based on Trimmed Mean Aggregation Rules. 2022. Available online: https://www.ssrn.com/abstract=4181353 (accessed on 28 January 2022).
54. Data, D.; Diggavi, S. Byzantine-resilient sgd in high dimensions on heterogeneous data. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT), Melbourne, Australia, 12–20 July 2021; pp. 2310–2315.
55. Cao, X.; Fang, M.; Liu, J.; Gong, N.Z. FLTrust: Byzantine-robust Federated Learning via Trust Bootstrapping. In Proceedings of the ISOC Network and Distributed System Security Symposium (NDSS), Online, 21–25 February 2021.

Figure 1. Workflow overview of Byzantine-robust multimodal federated learning framework.

Table 1. Accuracy of the proposed framework and benchmarks on different datasets.

Method	KITTI	nuScenes	KAIST
FedAvg	65.4 ± 0.3	58.7 ± 0.2	67.8 ± 0.2
Krum	67.7 ± 0.1	62.4 ± 0.2	71.2 ± 0.1
Multi-Krum	68.9 ± 0.1	65.7 ± 0.3	74.1 ± 0.2
Trimmed Mean	66.8 ± 0.2	64.1 ± 0.2	72.6 ± 0.2
Mean	65.8 ± 0.1	59.7 ± 0.4	70.1 ± 0.2
BrSGD	71.2 ± 0.2	68.3 ± 0.3	75.4 ± 0.2
Ours	73.2 ± 0.2	72.5 ± 0.2	77.9 ± 0.2

Table 2. Accuracy of the proposed framework and benchmarks under different numbers of compromised clients.

Method	$b = 10$	$b = 15$	$b = 20$	$b = 25$
FedAvg	65.4 ± 0.1	63.8 ± 0.2	58.8 ± 0.3	52.4 ± 0.2
Krum	67.7 ± 0.1	65.6 ± 0.1	60.1 ± 0.2	54.8 ± 0.3
Multi-Krum	68.9 ± 0.2	67.1 ± 0.3	62.5 ± 0.2	58.7 ± 0.1
Trimmed Mean	66.8 ± 0.2	65.6 ± 0.4	63.7 ± 0.3	59.8 ± 0.1
Mean	65.8 ± 0.2	62.7 ± 0.1	58.9 ± 0.2	55.6 ± 0.1
BrSGD	71.2 ± 0.3	68.7 ± 0.2	66.5 ± 0.2	62.7 ± 0.1
FLTrust	72.2 ± 0.1	71.4 ± 0.2	68.7 ± 0.3	65.6 ± 0.1
Ours	73.2 ± 0.1	71.9 ± 0.2	69.7 ± 0.1	68.4 ± 0.2

Table 3. Accuracy of the proposed framework and benchmarks under different numbers of compromised clients for Gaussian attack.

Method	$b = 10$	$b = 15$	$b = 20$	$b = 25$
FedAvg	58.6 ± 0.2	54.2 ± 0.3	48.6 ± 0.3	42.7 ± 0.2
Krum	65.2 ± 0.2	63.2 ± 0.2	61.8 ± 0.3	56.7 ± 0.2
Multi-Krum	67.7 ± 0.2	65.4 ± 0.2	61.6 ± 0.2	56.5 ± 0.1
Trimmed Mean	64.6 ± 0.2	62.4 ± 0.3	58.7 ± 0.2	54.4 ± 0.1
Mean	62.7 ± 0.2	57.7 ± 0.1	55.4 ± 0.2	53.1 ± 0.1
BrSGD	68.7 ± 0.2	67.1 ± 0.2	64.8 ± 0.2	60.7 ± 0.1
FLTrust	71.7 ± 0.1	66.7 ± 0.2	65.4 ± 0.2	61.8 ± 0.1
Ours	74.2 ± 0.1	73.1 ± 0.2	70.8 ± 0.1	69.3 ± 0.2

Table 4. Communication overhead of the proposed framework and baselines with different numbers of clients.

Method	$K = 100$	$K = 120$	$K = 140$	$K = 150$
FedAvg	4896 MB	5432 MB	5831 MB	6123 MB
Krum	4984 MB	5641 MB	6023 MB	6457 MB
Multi-Krum	5014 MB	5425 MB	5987 MB	6398 MB
Trimmed Mean	5021 MB	5531 MB	6015 MB	6157 MB
Mean	4974 MB	5324 MB	6074 MB	6248 MB
BrSGD	3697 MB	4125 MB	4897 MB	5324 MB
Ours	49.64 MB	53.24 MB	57.41 MB	60.23 MB

Table 5. Communication overhead of the proposed framework and baselines under different gradient compression rates.

Method	$r = 100$	$r = 110$	$r = 120$	$r = 150$
FedAvg	4896 MB	5432 MB	5831 MB	6123 MB
Krum	4984 MB	5641 MB	6023 MB	6457 MB
Multi-Krum	5014 MB	5425 MB	5987 MB	6398 MB
Trimmed Mean	5021 MB	5531 MB	6015 MB	6157 MB
Mean	4974 MB	5324 MB	6074 MB	6248 MB
BrSGD	3697 MB	4125 MB	4897 MB	5324 MB
Ours	49.64 MB	48.23 MB	46.65 MB	41.25 MB

Table 6. Ablation study results.

Method	$K = 100$	$K = 120$	$K = 140$	$K = 150$
w/o Fusion	68.9	67.7	66.8	65.7
w/o Aggregation	66.1	65.4	64.6	62.8
Ours	73.2	72.5	71.1	70.9

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
