1. Introduction
With the rapid development of the Internet of Things (IoT) and edge computing, an increasing number of devices are being deployed, leading to the generation of vast amounts of data. These data, often generated in sensitive environments such as healthcare, smart homes, and industrial systems, demand high communication efficiency and strong privacy protection mechanisms [1]. Ensuring data privacy while maintaining system efficiency has become a critical challenge in IoT environments, where numerous devices are interconnected and generate continuous streams of information [2].
However, IoT devices are highly heterogeneous, exhibiting significant differences in computational capabilities, memory resources, and network connectivity. This heterogeneity poses substantial challenges for data privacy and efficient communication [3,4,5]. Devices with limited resources may struggle to participate effectively in a distributed system, resulting in imbalanced workloads, latency, and reduced system robustness [6]. The diversity of device capabilities and network conditions makes it difficult to design a one-size-fits-all solution for privacy-preserving data processing and communication efficiency.
Federated learning (FL), as exemplified by algorithms such as FedAvg, enables decentralized training by coordinating multiple devices to collaboratively learn a shared global model without exchanging raw data [7]. The FL process typically consists of several iterative communication rounds, where each round involves three main steps:
(1) initialization, where the server sends the global model to selected devices;
(2) local training, where devices train the model using their local data; and
(3) aggregation, where devices upload their model updates, and the server combines these updates to refine the global model.
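The three steps above can be condensed into a minimal synchronous round. The sketch below is an illustration of the standard FedAvg recipe, not code from this paper; `local_train` is a hypothetical routine that returns a device's updated weights and sample count.

```python
import numpy as np

def fedavg_round(global_model, devices, local_train):
    """One synchronous FedAvg round: distribute, train locally, aggregate.

    `devices` maps a device id to its local dataset; `local_train` is a
    placeholder returning (updated_weights, n_samples) for one device.
    """
    updates, sizes = [], []
    for dev_id, data in devices.items():
        # Steps (1)-(2): device receives the global model and trains locally.
        w_i, n_i = local_train(global_model.copy(), data)
        updates.append(w_i)
        sizes.append(n_i)
    # Step (3): sample-size weighted average of the local models.
    total = sum(sizes)
    return sum(n * w for n, w in zip(sizes, updates)) / total
```

In a heterogeneous deployment the `for` loop would run on the devices themselves; the server only performs the final weighted average.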
Despite its potential, traditional FL approaches face several limitations in heterogeneous IoT environments. A key challenge is the high round time, i.e., the total time needed to complete one iteration of the FL process. A round is composed of three parts: the server distributing the global model to selected devices, the local training performed on each device, and the subsequent uploading of locally updated models back to the server for aggregation. In heterogeneous IoT settings, slower devices (or stragglers) can significantly prolong the round time, thereby delaying convergence [8]. This delay not only affects overall efficiency but also increases energy consumption and resource usage for resource-constrained devices. Furthermore, the computational and communication overhead imposed by centralized FL methods remains burdensome, limiting their practicality in such environments [9].
To this end, hierarchical federated learning (HFL) frameworks, such as HierFAVG [12], have emerged as a promising approach to address these challenges [10,11]. By introducing intermediate aggregation layers (e.g., edge servers), HFL reduces communication overhead to the central cloud compared to FedAvg and enables localized aggregation, thereby improving scalability and resource utilization. Nevertheless, most existing HFL solutions focus on architectural optimization and ignore critical aspects such as security mechanisms, device heterogeneity, and asynchronous coordination [13]. These limitations leave gaps in ensuring both privacy and efficiency under highly dynamic and heterogeneous IoT conditions.
To strengthen the privacy mechanisms employed in FL, several advanced techniques have been proposed, such as homomorphic encryption, secure multi-party computation (SMPC), and differential privacy [14,15,16]. Although these methods provide strong security guarantees, they impose significant computational and storage requirements, making them impractical for IoT devices [17,18]. Furthermore, these techniques may degrade model accuracy due to the added computational complexity and data perturbation, which is particularly problematic for high-precision tasks like anomaly detection or medical monitoring in IoT-based healthcare [19].
To this end, this paper proposes a new framework that integrates hierarchical federated learning, asynchronous aggregation, and lightweight encryption to address key challenges effectively. The framework employs a hierarchical architecture based on an IoT-Edge-Cloud model, where data processing and aggregation occur at multiple levels to enhance both efficiency and scalability. In particular, the hierarchical structure facilitates local aggregation at the edge, which reduces the communication overhead on the central server and improves overall scalability. Moreover, asynchronous aggregation alleviates the issue of stragglers by allowing devices to submit updates independently without the need for synchronization with all other nodes. Additionally, we employ the lightweight Salsa20 encryption to secure data transmission [20], protecting against a range of potential threats, including man-in-the-middle attacks, eavesdropping, and data tampering, while maintaining low computational overhead suitable for resource-constrained devices. A detailed security analysis, including the algorithm's resilience to common attack vectors, is presented in Section 3.5.4.
Unlike traditional FL and existing hierarchical frameworks, the proposed framework uniquely integrates asynchronous aggregation and lightweight encryption into a hierarchical design. This integration not only addresses straggler issues by decoupling global synchronization but also ensures secure data transmission without imposing prohibitive computational overhead. In doing so, our approach effectively balances scalability, privacy, and efficiency, making it particularly suitable for IoT environments characterized by dynamic heterogeneity and limited resources. The key contributions of this paper are as follows:
Hierarchical Federated-Learning Framework: We design an HFL framework tailored for heterogeneous IoT environments, leveraging edge-level aggregation to reduce communication overhead and enhance privacy.
Asynchronous Aggregation: An asynchronous aggregation strategy is introduced to mitigate straggler effects, enabling efficient training by decoupling slow devices from global synchronization.
Lightweight Encryption: Salsa20 is employed as a lightweight encryption mechanism, ensuring secure data transmission with minimal computational overheads and addressing privacy concerns in resource-constrained IoT devices.
Experimental Validation: We validate the framework through extensive experiments in a real-world IoT scenario, demonstrating a 20% reduction in round time compared to traditional FL methods without compromising model accuracy.
The proposed framework is demonstrated through a human activity recognition (HAR) use case, showcasing its applicability in privacy-sensitive and accuracy-critical scenarios. Our results highlight the effectiveness of integrating hierarchical design, asynchronous updates, and lightweight encryption to overcome the scalability, efficiency, and privacy challenges in heterogeneous IoT environments.
Relationship with Existing FL Research: The proposed framework is designed to address the unique challenges of system heterogeneity, scalability, and privacy in IoT environments. Unlike existing techniques such as personalization [21], pruning [22], and masking [23], which focus on optimizing specific aspects of federated learning, our framework tackles systemic variations and potential stragglers in hierarchical IoT networks. These approaches are complementary to our framework and could be integrated into the hierarchical framework to further enhance model performance. However, exploring such integrations is beyond the scope of this paper. Instead, this work focuses on establishing the efficacy of hierarchical aggregation, asynchronous updates, and lightweight encryption in addressing the critical challenges of federated learning under highly heterogeneous and resource-constrained IoT conditions.
The rest of the paper is organized as follows.
Section 2 reviews related work in federated learning and its application to IoT environments.
Section 3 presents the proposed framework.
Section 4 describes the proposed algorithms.
Section 5 provides the simulation and evaluation results.
Section 6 discusses the implications of the experimental results and explores potential areas for improvement. Finally,
Section 7 concludes the paper with a summary of contributions and directions for future research.
3. Proposed Framework
In this section, we present our proposed asynchronous aggregation strategy within the hierarchical federated-learning framework, specifically tailored for heterogeneous IoT environments, as illustrated in
Figure 1. The framework operates in six steps: ❶ IoT devices obtain the initialized model from the cloud server; ❷ IoT devices train their models with the local dataset; ❸ IoT devices push the trained parameters to the edge server; ❹ During training, the edge server pre-aggregates parameters from IoT devices using a sliding time window strategy, avoiding straggler delays, and uploads to the cloud based on real-time needs; ❺ The cloud server aggregates the parameters uploaded by the edge servers; ❻ IoT devices pull the updated parameters from the edge server.
Our strategy aims to leverage edge nodes and the central server collaboratively to optimize the model aggregation process, improve timeliness, and handle short-term instabilities that arise from the heterogeneity and asynchronous nature of IoT devices.
3.1. System Architecture
The system architecture follows a three-layer structure comprising IoT devices, edge nodes, and a central server. In this architecture, IoT devices collect data and perform local model training, edge nodes serve as intermediate aggregators, and the central server is responsible for global model aggregation. By leveraging edge nodes for local aggregation, the hierarchical design helps reduce communication costs and enhances scalability. This localized aggregation effectively reduces the communication burden on the central server, making the architecture particularly suitable for IoT environments with constrained resources.
Let N denote the number of IoT devices, indexed by $i \in \{1, \ldots, N\}$, and let E denote the number of edge nodes, indexed by $e \in \{1, \ldots, E\}$. Each IoT device i collects local data $D_i$ and trains a local model $w_i$. The edge node e aggregates the local models received from a subset of IoT devices connected to it, and the central server aggregates the updates from all edge nodes. This approach not only reduces network congestion but also effectively utilizes the computational capabilities of edge nodes.
3.2. Hierarchical Federated Learning
In our hierarchical federated-learning approach, the aggregation process occurs at two levels: edge-level aggregation and global aggregation. Each IoT device i trains its local model using its data $D_i$, resulting in a model update $w_i^t$. The edge node e performs aggregation over the subset $S_e^t$ of IoT devices using the following formula:

$$w_e^t = \frac{1}{|S_e^t|} \sum_{i \in S_e^t} w_i^t,$$

where $w_e^t$ represents the aggregated model at the edge node e at time t, and $S_e^t$ is the set of IoT devices that have successfully uploaded their updates by time t. The asynchronous nature of this aggregation allows the system to maintain responsiveness even if some devices are slower to complete their local training.
After edge-level aggregation, the aggregated model $w_e^t$ is sent to the central server, which performs a global aggregation over all edge nodes. The central server uses an adaptive weighting mechanism to aggregate the models received from the edge nodes, where each edge node is assigned a weight $\pi_e$ based on its participation frequency $f_e$ and model quality metric $q_e$:

$$\pi_e = \beta f_e + (1 - \beta) q_e,$$

where $\beta$ is a tunable parameter controlling the influence of participation frequency versus model quality. The global aggregation is then given by:

$$w^{t+1} = \frac{\sum_{e=1}^{E} \pi_e w_e^t}{\sum_{e=1}^{E} \pi_e}.$$
This weighted aggregation ensures that edge nodes with higher reliability and better model quality contribute more significantly to the global model, leading to improved convergence and overall system performance.
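The two-level aggregation described in this subsection can be sketched as follows. This is an illustrative implementation under the assumption that the edge level computes a plain average over reporting devices and the cloud level uses the weight rule $\pi_e = \beta f_e + (1-\beta) q_e$; the function names are ours, not the paper's.

```python
import numpy as np

def edge_aggregate(updates):
    """Edge-level step: average the updates from devices in S_e^t."""
    return np.mean(updates, axis=0)

def global_aggregate(edge_models, freqs, quals, beta=0.5):
    """Cloud-level step: pi-weighted, normalized average of edge models.

    Each edge node e receives weight pi_e = beta*f_e + (1-beta)*q_e,
    combining participation frequency f_e and quality score q_e.
    """
    pis = np.array([beta * f + (1 - beta) * q for f, q in zip(freqs, quals)])
    weighted = sum(p * m for p, m in zip(pis, edge_models))
    return weighted / pis.sum()
```

With equal frequencies and quality scores the global step reduces to a plain mean, so the adaptive weighting only departs from FedAvg-style averaging when edge nodes actually differ in reliability.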
3.3. Asynchronous Aggregation with Edge and Cloud Interaction
The aggregation process in our framework takes place across both the edge and cloud layers, incorporating dynamic interactions to ensure efficient and timely model updates. Unlike traditional hierarchical federated learning, where the cloud server merely waits for edge nodes to complete aggregation, our approach allows for active cloud-level control and feedback, enhancing the overall efficiency and model performance.
To further enhance edge-level aggregation, we employ a sliding window control mechanism. The sliding window allows each edge node to dynamically adjust the aggregation interval based on the recent arrival rate of updates from IoT devices. Specifically, the window size is adjusted according to the device participation rate and training time, inspired by concepts from queueing theory. Let $W_t$ denote the window size, which is updated as follows:

$$W_{t+1} = W_t + \eta (\rho^* - \rho_t),$$

where $\eta$ is a hyperparameter, $\rho^*$ is the desired participation rate, and $\rho_t$ is the actual participation rate observed in the previous window. This mechanism ensures that the aggregation window is neither too short nor too long, thereby balancing responsiveness and stability.
Sliding window control is a versatile technique commonly used in networking and data transmission to improve efficiency and reliability, especially in dynamic environments. Adjusting the window size in real time helps prevent data loss, adapt to changes in device participation, and maintain a balance between accuracy and responsiveness. In the context of FL, sliding window control enables real-time adaptability to heterogeneous IoT environments, aligning aggregation intervals with actual participation to maintain system robustness.
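The window adjustment can be sketched as a one-line update rule. The clamp bounds below are illustrative assumptions (the paper does not specify limits), added so the window cannot drift to impractical values.

```python
def update_window(window, eta, rho_target, rho_observed,
                  w_min=1.0, w_max=30.0):
    """Adjust the edge node's aggregation window from the participation gap.

    If fewer devices reported than desired (rho_observed < rho_target),
    the window grows to collect more updates; if more reported, it shrinks.
    The [w_min, w_max] clamp is an assumed practical bound.
    """
    window = window + eta * (rho_target - rho_observed)
    return max(w_min, min(w_max, window))
```

For example, with a step size of 5 and a 20-point participation shortfall, a 10-unit window grows to 11 units on the next round.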
3.4. Cloud-Level Feedback and Adaptive Weight Adjustment
To further enhance the effectiveness of the aggregation process, the central server provides adaptive feedback to the edge nodes. This feedback includes updated weights and guidance on adjusting local aggregation intervals or the weighting of specific devices. For example, if the cloud server identifies an edge node with low participation frequency, it may recommend increasing the aggregation interval to collect more updates, thereby enhancing the model's robustness. Similarly, the central server may provide updated device weights to help edge nodes balance contributions from heterogeneous devices more effectively.
The adaptive weight adjustment at the edge nodes follows a simple rule based on the feedback from the central server. Let $\Delta w_i^t$ be the local model update from IoT device i, and let $\alpha_i^t$ be the updated weight provided by the central server at time t. The edge-level aggregation at time t can be expressed as:

$$w_e^t = \frac{\sum_{i \in S_e^t} \alpha_i^t \, \Delta w_i^t}{\sum_{i \in S_e^t} \alpha_i^t}.$$

This adaptive mechanism ensures that each device's contribution is proportional to its performance, thereby optimizing the local aggregation process and aligning it with the global learning objectives.
3.5. Mechanism for Enhancing Privacy and Security
Ensuring privacy protection and secure data transmission is critical in collaborative-learning scenarios, particularly in resource-constrained IoT environments where data often traverse insecure or unreliable routing paths. We utilize the Salsa20 stream cipher, a lightweight and cryptographically secure encryption method selected for its effective balance of robust security, computational efficiency, and low network overhead.
3.5.1. Principle and Mechanism of Salsa20 Encryption
Salsa20 is a stream cipher that operates through a series of highly optimized arithmetic and bitwise operations, collectively referred to as ARX (Addition, Rotation, XOR). Unlike traditional block ciphers such as AES, Salsa20 does not rely on substitution boxes (S-boxes) or complex key scheduling algorithms, making it more suitable for IoT devices with constrained resources.
The core of Salsa20 involves generating a pseudorandom keystream from a secret key and a nonce, which is then XORed with the plaintext to produce the ciphertext. Given a plaintext model update $w_i$ from IoT device i and a secret encryption key $k$, the encryption process can be expressed as:

$$c_i = w_i \oplus \mathrm{Salsa20}(k, n),$$

where $c_i$ represents the ciphertext and $n$ is the nonce. The keystream is generated by repeatedly applying the Salsa20 hash function.
The computational efficiency of Salsa20 stems from its reliance on ARX operations, which are highly efficient on modern hardware and can be parallelized effectively. This ensures that encryption and decryption incur minimal computational overhead, maintaining the timeliness required for federated-learning processes.
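The stream-cipher principle behind Salsa20 (XOR the plaintext with a keyed pseudorandom keystream, so that the same operation both encrypts and decrypts) can be demonstrated with a stand-in keystream. The SHA-256 counter-mode generator below is NOT Salsa20; it only mimics the interface so the XOR mechanics can be shown. A real deployment would call a vetted Salsa20 implementation.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in keystream (SHA-256 in counter mode) -- NOT Salsa20.

    Used here only to illustrate the stream-cipher interface:
    a deterministic byte stream derived from (key, nonce, counter).
    """
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + nonce + counter.to_bytes(8, "little")).digest()
        counter += 1
    return out[:length]

def xor_cipher(data: bytes, key: bytes, nonce: bytes) -> bytes:
    """Encrypt or decrypt: c = m XOR keystream (same call both ways)."""
    ks = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))
```

Applying `xor_cipher` twice with the same key and nonce recovers the plaintext, and the ciphertext has exactly the plaintext's length, which is the property Section 3.5.3 relies on.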
3.5.2. Comparison with Alternative Cryptographic Methods
To evaluate the suitability of Salsa20 for federated learning in resource-constrained IoT environments, we refer to prior works [43,44], which compare Salsa20 and AES in terms of power consumption, encryption latency, and encryption speed. Table 1 summarizes the key metrics.
As shown in Table 1, Salsa20 consistently demonstrates lower power consumption (2.82 μW vs. 4.01 μW) and higher encryption speeds across different data sizes compared to AES. For instance, when encrypting a 1 GB sequence, Salsa20 achieves a speed of 3624 Mbps, surpassing AES's 3290 Mbps. Salsa20 also excels at handling both small (40-byte) and large (1500-byte) data packets, maintaining faster encryption speeds with negligible latency differences (202 vs. 180 cycles).
These results demonstrate Salsa20's suitability for IoT environments, where energy efficiency and data transmission rates are crucial for system performance. By minimizing encryption overhead and ensuring high throughput, Salsa20 meets the scalability and efficiency needs of federated learning in heterogeneous networks. The data, adapted from prior works [43,44], support the selection of the encryption mechanism, building on established findings to justify Salsa20's inclusion in our framework without replicating the experiments.
3.5.3. Impact on Network Overhead
Unlike homomorphic encryption or other complex schemes that increase the size of transmitted data due to added metadata or ciphertext expansion, Salsa20 maintains the original size of the plaintext. This is a direct consequence of the stream cipher design, which XORs the plaintext with a pseudorandom keystream without introducing additional bytes. As a result, the total data transmitted per IoT device remains identical before and after encryption:

$$|c_i| = |w_i|,$$

where $|\cdot|$ represents the size of the data in bytes. This property ensures that the encryption process does not exacerbate communication bottlenecks, making Salsa20 highly suitable for IoT environments with limited bandwidth.
3.5.4. Security and Attack Complexity Analysis
Salsa20 has been extensively studied for its resilience against common cryptographic attacks, including brute-force, differential cryptanalysis, and side-channel attacks [45]. Its 256-bit key length ensures a security level that is computationally infeasible to compromise with current methods [44,46]. Compared to block ciphers like AES, Salsa20's ARX structure eliminates the need for S-boxes, which are often targets for side-channel attacks. This streamlined design enhances both security and efficiency in resource-constrained settings.
While Salsa20 ensures that data-in-transit remain confidential and tamper-resistant, this work primarily focuses on the encryption mechanism rather than the surrounding key management framework. Issues such as secure key generation, distribution, and revocation, as well as the implementation of secure key provisioning protocols, are beyond the scope of this paper. We assume the necessary keys are established through a trusted channel or initialization phase.
In the context of FL, adversaries may attempt to eavesdrop on or tamper with model updates during transmission. By encrypting updates with Salsa20, we ensure that the data remains confidential and tamper-resistant. Additionally, the low computational and energy requirements of Salsa20 align with the scalability objectives of our framework, allowing secure communication without sacrificing performance.
4. Algorithms
In this section, we present the proposed hierarchical federated-learning framework through a series of key algorithms, each addressing a specific aspect of the federated-learning process: local training and encryption at IoT devices, asynchronous aggregation and adaptation at edge nodes, and global aggregation with cloud-level feedback. Each algorithm is provided in pseudocode form, followed by a detailed explanation to ensure a clear understanding of their respective roles within the framework.
4.1. Local Training and Encryption at IoT Device
Each IoT device trains a local model using its personal dataset, $D_i$, through multiple epochs of gradient descent. The local model, $w_i$, is initialized with a global model, $w$, and iteratively updated using the computed gradients, $g_i$. Once training is complete, the updated model is encrypted with the Salsa20 encryption algorithm using an encryption key, $k_i$, to ensure data privacy before transmission. The encrypted model, $c_i$, is then sent to the corresponding edge node for further aggregation.
Subsequently, we analyze the computational complexity of Algorithm 1, covering the local training procedure and the subsequent encryption. During each local epoch, the device iterates over its local dataset $D_i$, computing gradients $g_i$ and updating the model parameters $w_i$. The time complexity of these operations depends on the number of training samples, input dimensionality, and gradient computation complexity. In practice, gradient evaluation often scales approximately linearly with the dataset size, and updating the model through gradient descent typically results in a time complexity on the order of $O(E \cdot N \cdot d)$, where E denotes the number of epochs, N the number of local samples, and d the feature dimension. For the space complexity, storing the model parameters and intermediate variables (e.g., gradients) typically incurs memory requirements that scale at least linearly with the number of parameters. As the complexity of the model increases (e.g., with deeper neural networks or higher-dimensional feature spaces), its memory footprint also grows. Subsequently, the encryption step using Salsa20, a well-established stream cipher, introduces only linear time complexity in the size of the data being encrypted. Since Salsa20 operates by applying a series of simple arithmetic operations (e.g., addition, XOR, and rotation) over fixed-size blocks, the complexity of encrypting the model parameters $w_i$ is effectively $O(L)$, where L is the length of the parameter vector.
Algorithm 1 Local Model Training and Encryption at IoT Device
1: Input: Local data $D_i$ at device i, initial model $w$, encryption key $k_i$
2: Output: Encrypted model update $c_i$
3: $w_i \leftarrow w$ {Initialize local model}
4: for each local epoch do
5:   $g_i \leftarrow \nabla F_i(w_i; D_i)$ {Compute gradient using local data}
6:   $w_i \leftarrow w_i - \eta g_i$ {Update local model using gradient descent}
7: end for
8: $c_i \leftarrow \mathrm{Salsa20}(w_i, k_i)$ {Encrypt updated model using Salsa20}
9: Send encrypted update $c_i$ to edge node
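Algorithm 1 can be sketched in Python. In this hedged illustration, a linear model trained with squared loss stands in for the device's local model, and the Salsa20 call is elided: the update is only serialized to bytes, with the encryption step marked where it would occur.

```python
import numpy as np

def local_train_and_package(X, y, w, lr=0.1, epochs=5):
    """Algorithm 1 sketch: local gradient-descent epochs, then packaging.

    A linear model with mean-squared-error loss is an assumed stand-in
    for the device's local model. The returned byte payload is what
    would be passed to Salsa20 encryption before upload (call elided).
    """
    w = w.copy()
    n = len(y)
    for _ in range(epochs):                    # E local epochs
        grad = (2.0 / n) * X.T @ (X @ w - y)   # gradient of the MSE loss
        w -= lr * grad                         # gradient-descent step
    payload = w.astype(np.float32).tobytes()   # serialize; encrypt next
    return w, payload
```

On a toy dataset where the true weight is 1.0, five epochs already bring the local model close to the optimum, after which the serialized payload would be encrypted and sent to the edge node.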
4.2. Edge Node Aggregation and Adaptation
In collaborative learning frameworks, edge nodes serve as intermediaries between a multitude of resource-constrained IoT devices and a central server, aggregating locally trained model updates to produce a refined global model. The asynchronous aggregation algorithm (Algorithm 2) leverages both historical context and current observations to stabilize learning despite heterogeneous and intermittent device participation. More specifically, the algorithm employs weighted averaging, wherein the new aggregated model is composed of a portion of the previously aggregated model $w_e^{t-1}$ and the current round's model updates received from a subset of devices $S_e^t$. The weight $\alpha$ governs the influence of the prior model state, allowing the system to maintain continuity and mitigate abrupt changes. Conversely, the weight $1-\alpha$ determines the relative impact of newly arrived device updates, enabling the algorithm to incorporate fresh information and adapt promptly to evolving conditions. These mechanisms help counteract the irregularities inherent in asynchronous settings, such as devices that upload updates at different times, varying data quality, or fluctuating connectivity conditions. By normalizing the final aggregation result with the total weight, the algorithm ensures a consistent reference scale, enhancing the stability and interpretability of the combined model.
Apart from the aggregation of model parameters, the algorithm integrates adaptive control strategies to sustain an optimal balance between update frequency and model quality. The sliding window variable $W_e$ is maintained to adjust the timing and breadth of aggregations based on the discrepancy between the desired and actual device participation rates, $\rho^*$ and $\rho_t$. This adjustment allows the edge node to dynamically adapt the aggregation interval, improving responsiveness to system changes and preventing inefficiencies such as excessive communication costs from too-frequent aggregations or outdated models from infrequent ones. Additionally, a quality score $q_e$ is iteratively updated to track the reliability of received updates over time. A smoothing parameter $\gamma$ is employed to implement exponential smoothing, assigning appropriate emphasis to recent measurements versus historical performance, thereby enhancing the system's ability to detect persistent trends or anomalies in device behavior.
Algorithm 2 Asynchronous Aggregation and Adaptation at Edge Node
1: Input: Encrypted models $c_i$ for $i \in S_e^t$, previous model $w_e^{t-1}$, feedback $\alpha_i^t$
2: Output: Aggregated model $w_e^t$, metadata $(f_e, q_e)$
3: Decrypt each $c_i$ to obtain $w_i^t$
4: $w_e^t \leftarrow \alpha \, w_e^{t-1}$
5: for each $i \in S_e^t$ do
6:   $w_e^t \leftarrow w_e^t + (1 - \alpha) \, \alpha_i^t w_i^t$
7: end for
8: Normalize $w_e^t$ by the total weight
9: $W_e \leftarrow W_e + \eta (\rho^* - \rho_t)$ {Adjust sliding window}
10: $q_e \leftarrow \gamma q_e + (1 - \gamma) q_{\mathrm{obs}}$ {Update quality score}
11: Send encrypted update $w_e^t$ and metadata $(f_e, q_e)$ to central server
From a computational complexity perspective, the dominant costs of asynchronous aggregation at the edge node arise from the cryptographic operations and the vectorized arithmetic. The decryption of each encrypted model typically takes time proportional to the model size, i.e., $O(L)$ for models with L parameters. Accumulating and normalizing updates from $|S_e^t|$ devices adds another $O(|S_e^t| \cdot L)$ term, as each parameter of each participating device model must be combined. Space complexity remains $O(L)$ for storing models and metadata, as the algorithm keeps track of a limited set of parameters and aggregated statistics rather than the entire historical sequence of updates. Overall, the computational overhead scales with both the number of participating devices and the model dimensionality, but does so in a manner that remains tractable for edge nodes equipped with moderate computational resources. Through careful parameter tuning and adaptive mechanisms, the described approach achieves a favorable trade-off between computational efficiency, adaptability, and model quality in asynchronous edge aggregation scenarios.
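The blending and smoothing steps of Algorithm 2 can be sketched compactly. This is an illustrative reading of the weighted-averaging rule (prior model weighted by alpha, fresh updates by 1-alpha) and the exponential quality smoothing; the decryption and sliding-window bookkeeping are omitted, and the function names are ours.

```python
import numpy as np

def async_edge_aggregate(prev_model, new_updates, alpha=0.5):
    """Algorithm 2 sketch: blend the previous edge model with fresh updates.

    alpha weights the prior aggregate (continuity); 1-alpha weights the
    mean of the updates that arrived within the current sliding window.
    """
    if not new_updates:
        return prev_model                  # nothing arrived: keep prior state
    fresh = np.mean(new_updates, axis=0)
    return alpha * prev_model + (1 - alpha) * fresh

def update_quality(q_prev, q_obs, gamma=0.8):
    """Exponential smoothing of the edge node's quality score q_e."""
    return gamma * q_prev + (1 - gamma) * q_obs
```

Because the previous aggregate always carries weight alpha, a round with only one or two late-arriving devices cannot swing the edge model abruptly, which is the stabilizing behavior the text describes.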
4.3. Global Aggregation and Cloud-Level Feedback
Algorithm 3 illustrates the global aggregation process, wherein the server receives aggregated models, denoted as $w_e^t$, along with metadata such as participation frequencies $f_e$ and quality scores $q_e$ from each edge node e. While previous stages focus on handling local and edge-level variability, the global aggregation stage further refines the model by emphasizing reliability and quality across the entire network of participating edge nodes.
The core of this process is the computation of a reliability score $\pi_e$ for each edge node. Drawing upon the established participation metrics $f_e$ and quality indicators $q_e$, the algorithm forms a weighted average in which $\beta$ controls the relative importance of participation frequency versus quality. Higher $\pi_e$ values correspond to edge nodes that consistently contribute updates of higher quality or maintain stable and frequent participation. By weighting each edge node's aggregated model $w_e^t$ by $\pi_e$, the global model $w^{t+1}$ becomes more robust to outliers and intermittent contributors, ultimately improving the convergence and generalization performance of the federated-learning system.
Algorithm 3 Global Aggregation and Cloud-Level Feedback
1: Input: Aggregated models $w_e^t$ and metadata $(f_e, q_e)$ from each edge node e
2: Output: Global model $w^{t+1}$, feedback for edge nodes
3: $w^{t+1} \leftarrow 0$
4: for each edge node e do
5:   $\pi_e \leftarrow \beta f_e + (1 - \beta) q_e$
6:   $w^{t+1} \leftarrow w^{t+1} + \pi_e w_e^t$
7: end for
8: $w^{t+1} \leftarrow w^{t+1} / \sum_{e} \pi_e$
9: Distribute $w^{t+1}$ to all edge nodes
10: Send feedback to each edge node e, including recommended adjustments for aggregation intervals and participation weights
Upon completing the aggregation, the global model is normalized by dividing by the sum of the weights $\sum_{e} \pi_e$. This normalization ensures a stable reference scale for subsequent training rounds, making the global model both statistically meaningful and numerically well-conditioned. The refined global model is then disseminated back to the edge nodes, providing them with a synchronized, high-quality baseline from which to initiate subsequent local training rounds.
In addition to redistributing $w^{t+1}$, the central server also returns feedback to each edge node. These feedback values, along with recommended adjustments for local parameters (such as aggregation intervals or participation weights), guide edge nodes in calibrating their local and asynchronous aggregation processes. This iterative feedback loop creates a positive cycle: as edge nodes refine their behavior and improve their contributions, the global model strengthens, providing better guidance and baselines for future local training. Over time, this dynamic process fosters a more stable, efficient, and high-performing federated-learning ecosystem.
The computational complexity at this global stage primarily reflects the cost of decrypting the received aggregated models and performing weighted summations. Decryption overhead scales linearly with the number of edge nodes E and the model size L, resulting in $O(E \cdot L)$ complexity. The aggregation step itself, applying weights and summing across E nodes, also incurs $O(E \cdot L)$ operations. Space complexity remains on the order of $O(E \cdot L)$, as the server temporarily stores each model and associated metadata before integrating them into the global model. Overall, this complexity profile remains tractable for cloud-level servers, which typically have more abundant computational and storage resources than edge nodes or IoT devices.
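The cloud-level feedback step can be sketched as a simple rule. The linear "widen the window when participation falls short" rule below, and its parameter names, are illustrative assumptions rather than the paper's exact mechanism.

```python
def cloud_feedback(freqs, target_freq=0.8, step=2.0):
    """Cloud-level feedback sketch: per-edge window-interval adjustments.

    Edge nodes whose participation frequency f_e falls below the target
    receive a positive delta (lengthen the aggregation interval to
    collect more updates); nodes above target receive a negative delta.
    The linear rule and its constants are assumed for illustration.
    """
    return {e: step * (target_freq - f) for e, f in enumerate(freqs)}
```

An edge node with 60% participation would be advised to lengthen its interval, while a fully participating node could safely shorten it, closing the feedback loop described above.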
6. Discussion
The HAR dataset exemplifies the heterogeneity commonly found in IoT environments, where client-level data often exhibit non-IID distributions. Despite this natural variability, as analyzed in
Section 5.1.2, our hierarchical framework demonstrated consistent performance, maintaining stable global accuracy across training rounds. This highlights the robustness of the localized edge-level aggregation and adaptive weighting mechanisms, which effectively mitigate the impact of distributional differences.
By leveraging these mechanisms, the framework achieves efficient global updates without bringing significant overhead or requiring extensive preprocessing. These results suggest that our framework is well-suited for diverse IoT scenarios, where client data may inherently deviate from idealized distributions.
The experimental results demonstrate that our asynchronous aggregation strategy, in conjunction with lightweight encryption, achieves model accuracy comparable to HierFAVG while providing significant benefits in communication efficiency and resilience to device heterogeneity. Specifically, the reduced round-time cost observed in
Figure 6 emphasizes the robustness of the proposed framework in handling network instability and hardware variability, ensuring that slower participants do not become bottlenecks.
The initial fluctuations in training accuracy during the early rounds, as highlighted in
Figure 3 and
Figure 5, suggest a trade-off between aggregation frequency and model stability. Nonetheless, by round 30, the convergence behavior of our framework aligns closely with HierFAVG, with a final accuracy difference of only 0.4%, indicating that asynchronous aggregation does not compromise the overall quality of the model. This is crucial in IoT environments, where timely and efficient updates are often prioritized over synchronized operations. The analysis of encryption, as presented in
Table 3, illustrates that the encryption overhead is minimal, averaging 0.8 s per round, and thus suitable for federated-learning systems involving resource-constrained IoT devices. The results demonstrate that our framework not only maintains data security but also operates within the practical computational limits of IoT devices.
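The per-round encryption overhead reported above can be measured with a pattern like the following. The framework uses SALSA, a Salsa20-family stream cipher; to keep this sketch self-contained, a toy SHA-256-based keystream stands in for the real cipher, and all names here are our own illustrative choices, not the paper's implementation.

```python
import hashlib
import time

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream from SHA-256 (stand-in for Salsa20)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(model_bytes: bytes, key: bytes, nonce: bytes) -> bytes:
    # Stream ciphers XOR the plaintext with the keystream, so the same
    # function both encrypts and decrypts.
    ks = keystream(key, nonce, len(model_bytes))
    return bytes(m ^ k for m, k in zip(model_bytes, ks))

model = bytes(range(256)) * 100          # stand-in serialized model update
t0 = time.perf_counter()
ct = encrypt(model, b"secret-key", b"round-7")
overhead = time.perf_counter() - t0      # per-round encryption cost in seconds
assert encrypt(ct, b"secret-key", b"round-7") == model   # XOR round-trips
```

Because stream encryption is a single linear pass over the serialized model, its cost grows only with the update size, which is consistent with the small, stable per-round overhead observed in Table 3.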
Our framework’s adaptability and performance suggest practical applications across a wide range of IoT scenarios, particularly those involving heterogeneous devices and dynamic environments. For instance, the HAR dataset used in our experiments exemplifies human activity recognition systems where data are collected from wearable devices via embedded sensors. These systems are inherently heterogeneous, with devices varying in computational capabilities and connectivity. Our framework’s asynchronous aggregation mechanism ensures that updates from such diverse devices are efficiently handled, maintaining timely and accurate predictions even under network instability. In healthcare monitoring, this advantage translates directly to secure and efficient communication of sensitive patient data from wearable devices to cloud servers, leveraging our lightweight encryption to protect privacy without overburdening resource-constrained devices. Similarly, in broader IoT applications such as industrial IoT and smart agriculture, the framework’s scalability enables it to manage data aggregation from numerous sensors monitoring machinery, environmental conditions, or crop health, optimizing communication overhead and ensuring robust system responsiveness. These examples highlight the framework’s versatility and its potential to enhance performance across diverse IoT domains.
The primary focus of this study is to enhance scalability and privacy in federated learning through hierarchical aggregation and lightweight encryption, addressing the critical challenges of device heterogeneity and secure communication in IoT environments. While runtime efficiency and performance comparisons with alternative methods are valuable, they are beyond the immediate scope of this work and remain important directions for further exploration. Due to hardware resource constraints, our experiments are currently limited to IoT networks with up to 20 clients, as larger-scale simulations lead to significant performance degradation on available hardware. Despite these limitations, we have conducted comprehensive trend analyses using client counts of [4, 8, 12, 16, 20], which provide valuable insights into the performance and scalability of the proposed framework. In future work, we plan to expand our evaluations by optimizing the simulation framework and leveraging more advanced hardware to support larger-scale IoT networks and more diverse device configurations. This will enable us to further explore the trade-offs and benefits of our asynchronous aggregation strategy, ensuring its robustness and applicability in even more dynamic and heterogeneous IoT environments.
Our experiments are designed around the HAR dataset, which inherently reflects user-specific data distribution, aligning with real-world IoT scenarios. While addressing model biases and accuracy under artificially skewed distributions is an important topic, it is beyond the scope of this study, as our primary focus is on improving scalability, efficiency, and privacy in hierarchical FL systems.
7. Conclusions
In summary, we proposed a hierarchical federated-learning framework designed to address the challenges posed by heterogeneous IoT environments, focusing on improving communication efficiency and ensuring data security through lightweight encryption. By employing hierarchical aggregation, asynchronous communication, and lightweight stream encryption, our framework achieved a significant reduction in communication cost compared to the traditional HierFAVG, with an average reduction of approximately 20% in round-time cost. Additionally, the use of SALSA encryption provided data security with minimal computational overhead, averaging 0.8 s per round, demonstrating its feasibility for resource-constrained devices. Despite the presence of hardware and network heterogeneity, our experimental results showed that the proposed framework achieved model accuracy comparable to HierFAVG, with a final accuracy difference of only 0.4%. These findings indicate that our framework provides an efficient, secure, and scalable solution for federated learning in dynamic IoT environments. Building on these results, our future work will focus on extending the evaluation to larger-scale IoT networks, incorporating hundreds or thousands of devices, to analyze latency and scalability in such environments. Furthermore, adaptive privacy-preserving mechanisms, such as secure aggregation protocols and dynamic encryption schemes, will be explored to address heightened privacy concerns in real-world deployments.