1. Introduction
With the rapid development of the Internet of Things (IoT) and edge computing, an increasing number of devices are being deployed, leading to the generation of vast amounts of data. These data, often generated in sensitive environments such as healthcare, smart homes, and industrial systems, demand high communication efficiency and strong privacy protection mechanisms [1]. Ensuring data privacy while maintaining system efficiency has become a critical challenge in IoT environments, where numerous devices are interconnected and generate continuous streams of information [2].
However, IoT devices are highly heterogeneous, exhibiting significant differences in computational capabilities, memory resources, and network connectivity. This heterogeneity poses substantial challenges for data privacy and efficient communication [3,4,5]. Devices with limited resources may struggle to participate effectively in a distributed system, resulting in imbalanced workloads, latency, and reduced system robustness [6]. The diversity of device capabilities and network conditions makes it difficult to design a one-size-fits-all solution for privacy-preserving data processing and communication efficiency.
Federated learning (FL), as exemplified by algorithms such as FedAvg, enables decentralized training by coordinating multiple devices to collaboratively learn a shared global model without exchanging raw data [7]. The FL process typically consists of several iterative communication rounds, where each round involves three main steps:
(1) initialization, where the server sends the global model to selected devices;
(2) local training, where devices train the model using their local data; and
(3) aggregation, where devices upload their model updates, and the server combines these updates to refine the global model.
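The three steps above can be condensed into a minimal synchronous round. The sketch below is an illustration of the standard FedAvg recipe, not code from this paper; `local_train` is a hypothetical routine that returns a device's updated weights and sample count.

```python
import numpy as np

def fedavg_round(global_model, devices, local_train):
    """One synchronous FedAvg round: distribute, train locally, aggregate.

    `devices` maps a device id to its local dataset; `local_train` is a
    placeholder returning (updated_weights, n_samples) for one device.
    """
    updates, sizes = [], []
    for dev_id, data in devices.items():
        # Steps (1)-(2): device receives the global model and trains locally.
        w_i, n_i = local_train(global_model.copy(), data)
        updates.append(w_i)
        sizes.append(n_i)
    # Step (3): sample-size weighted average of the local models.
    total = sum(sizes)
    return sum(n * w for n, w in zip(sizes, updates)) / total
```

In a heterogeneous deployment the `for` loop would run on the devices themselves; the server only performs the final weighted average.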
Despite its potential, traditional FL approaches face several limitations in heterogeneous IoT environments. A key challenge is the high round time, i.e., the total time needed to complete one iteration of the FL process. A round is composed of three parts: the server distributing the global model to selected devices, the local training performed on each device, and the subsequent uploading of locally updated models back to the server for aggregation. In heterogeneous IoT settings, slower devices (or stragglers) can significantly prolong the round time, thereby delaying convergence [8]. This delay not only affects overall efficiency but also increases energy consumption and resource usage for resource-constrained devices. Furthermore, the computational and communication overhead imposed by centralized FL methods remains burdensome, limiting their practicality in such environments [9].
To this end, hierarchical federated learning (HFL) frameworks, such as HierFAVG [12], have emerged as a promising approach to address these challenges [10,11]. By introducing intermediate aggregation layers (e.g., edge servers), HFL reduces communication overhead to the central cloud compared to FedAvg and enables localized aggregation, thereby improving scalability and resource utilization. Nevertheless, most existing HFL solutions focus on architectural optimization and ignore critical aspects such as security mechanisms, device heterogeneity, and asynchronous coordination [13]. These limitations leave gaps in ensuring both privacy and efficiency under highly dynamic and heterogeneous IoT conditions.
To strengthen the privacy mechanisms employed in FL, several advanced techniques have been proposed, such as homomorphic encryption, secure multi-party computation (SMPC), and differential privacy [14,15,16]. Although these methods provide strong security guarantees, they impose significant computational and storage requirements, making them impractical for IoT devices [17,18]. Furthermore, these techniques may degrade model accuracy due to the added computational complexity and data perturbation, which is particularly problematic for high-precision tasks like anomaly detection or medical monitoring in IoT-based healthcare [19].
To this end, this paper proposes a new framework that integrates hierarchical federated learning, asynchronous aggregation, and lightweight encryption to address key challenges effectively. The framework employs a hierarchical architecture based on an IoT-Edge-Cloud model, where data processing and aggregation occur at multiple levels to enhance both efficiency and scalability. In particular, the hierarchical structure facilitates local aggregation at the edge, which reduces the communication overhead on the central server and improves overall scalability. Moreover, asynchronous aggregation alleviates the issue of stragglers by allowing devices to submit updates independently without the need for synchronization with all other nodes. Additionally, we employ the lightweight Salsa20 encryption to secure data transmission [20], protecting against a range of potential threats, including man-in-the-middle attacks, eavesdropping, and data tampering, while maintaining low computational overhead suitable for resource-constrained devices. A detailed security analysis, including the algorithm's resilience to common attack vectors, is presented in Section 3.5.4.
Unlike traditional FL and existing hierarchical frameworks, the proposed framework uniquely integrates asynchronous aggregation and lightweight encryption into a hierarchical design. This integration not only addresses straggler issues by decoupling global synchronization but also ensures secure data transmission without imposing prohibitive computational overhead. In doing so, our approach effectively balances scalability, privacy, and efficiency, making it particularly suitable for IoT environments characterized by dynamic heterogeneity and limited resources. The key contributions of this paper are as follows:
Hierarchical Federated-Learning Framework: We design an HFL framework tailored for heterogeneous IoT environments, leveraging edge-level aggregation to reduce communication overhead and enhance privacy.
Asynchronous Aggregation: An asynchronous aggregation strategy is introduced to mitigate straggler effects, enabling efficient training by decoupling slow devices from global synchronization.
Lightweight Encryption: Salsa20 is employed as a lightweight encryption mechanism, ensuring secure data transmission with minimal computational overheads and addressing privacy concerns in resource-constrained IoT devices.
Experimental Validation: We validate the framework through extensive experiments in a real-world IoT scenario, demonstrating a 20% reduction in round time compared to traditional FL methods without compromising model accuracy.
The proposed framework is demonstrated through a human activity recognition (HAR) use case, showcasing its applicability in privacy-sensitive and accuracy-critical scenarios. Our results highlight the effectiveness of integrating hierarchical design, asynchronous updates, and lightweight encryption to overcome the scalability, efficiency, and privacy challenges in heterogeneous IoT environments.
Relationship with Existing FL Research: The proposed framework is designed to address the unique challenges of system heterogeneity, scalability, and privacy in IoT environments. Unlike existing techniques such as personalization [21], pruning [22], and masking [23], which focus on optimizing specific aspects of federated learning, our framework tackles systemic variations and potential stragglers in hierarchical IoT networks. These approaches are complementary to our framework and could be integrated into the hierarchical framework to further enhance model performance. However, exploring such integrations is beyond the scope of this paper. Instead, this work focuses on establishing the efficacy of hierarchical aggregation, asynchronous updates, and lightweight encryption in addressing the critical challenges of federated learning under highly heterogeneous and resource-constrained IoT conditions.
The rest of the paper is organized as follows.
Section 2 reviews related work in federated learning and its application to IoT environments.
Section 3 presents the proposed framework.
Section 4 describes the proposed algorithms.
Section 5 provides the simulation and evaluation results.
Section 6 discusses the implications of the experimental results and explores potential areas for improvement. Finally,
Section 7 concludes the paper with a summary of contributions and directions for future research.
3. Proposed Framework
In this section, we present our proposed asynchronous aggregation strategy within the hierarchical federated-learning framework, specifically tailored for heterogeneous IoT environments, as illustrated in
Figure 1. The framework operates in six steps: ❶ IoT devices obtain the initialized model from the cloud server; ❷ IoT devices train their models with the local dataset; ❸ IoT devices push the trained parameters to the edge server; ❹ During training, the edge server pre-aggregates parameters from IoT devices using a sliding time window strategy, avoiding straggler delays, and uploads to the cloud based on real-time needs; ❺ The cloud server aggregates the parameters uploaded by the edge servers; ❻ IoT devices pull the updated parameters from the edge server.
Our strategy aims to leverage edge nodes and the central server collaboratively to optimize the model aggregation process, improve timeliness, and handle short-term instabilities that arise from the heterogeneity and asynchronous nature of IoT devices.
3.1. System Architecture
The system architecture follows a three-layer structure comprising IoT devices, edge nodes, and a central server. In this architecture, IoT devices collect data and perform local model training, edge nodes serve as intermediate aggregators, and the central server is responsible for global model aggregation. By leveraging edge nodes for local aggregation, the hierarchical design helps reduce communication costs and enhances scalability. This localized aggregation effectively reduces the communication burden on the central server, making the architecture particularly suitable for IoT environments with constrained resources.
Let N denote the number of IoT devices, indexed by $i \in \{1, \ldots, N\}$, and let E denote the number of edge nodes, indexed by $e \in \{1, \ldots, E\}$. Each IoT device i collects local data $D_i$ and trains a local model $w_i$. The edge node e aggregates the local models received from a subset of IoT devices connected to it, and the central server aggregates the updates from all edge nodes. This approach not only reduces network congestion but also effectively utilizes the computational capabilities of edge nodes.
3.2. Hierarchical Federated Learning
In our hierarchical federated-learning approach, the aggregation process occurs at two levels: edge-level aggregation and global aggregation. Each IoT device i trains its local model using its data $D_i$, resulting in a model update $w_i^t$. The edge node e performs aggregation over the subset $S_e^t$ of IoT devices using the following formula:

$$w_e^t = \frac{1}{|S_e^t|} \sum_{i \in S_e^t} w_i^t,$$

where $w_e^t$ represents the aggregated model at the edge node e at time t, and $S_e^t$ is the set of IoT devices that have successfully uploaded their updates by time t. The asynchronous nature of this aggregation allows the system to maintain responsiveness even if some devices are slower to complete their local training.
After edge-level aggregation, the aggregated model $w_e^t$ is sent to the central server, which performs a global aggregation over all edge nodes. The central server uses an adaptive weighting mechanism to aggregate the models received from the edge nodes, where each edge node is assigned a weight $\pi_e$ based on its participation frequency $f_e$ and model quality metric $q_e$:

$$\pi_e = \beta f_e + (1 - \beta) q_e,$$

where $\beta$ is a tunable parameter controlling the influence of participation frequency versus model quality. The global aggregation is then given by:

$$w^{t+1} = \frac{\sum_{e=1}^{E} \pi_e w_e^t}{\sum_{e=1}^{E} \pi_e}.$$
This weighted aggregation ensures that edge nodes with higher reliability and better model quality contribute more significantly to the global model, leading to improved convergence and overall system performance.
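The two-level aggregation described in this subsection can be sketched as follows. This is an illustrative implementation under the assumption that the edge level computes a plain average over reporting devices and the cloud level uses the weight rule $\pi_e = \beta f_e + (1-\beta) q_e$; the function names are ours, not the paper's.

```python
import numpy as np

def edge_aggregate(updates):
    """Edge-level step: average the updates from devices in S_e^t."""
    return np.mean(updates, axis=0)

def global_aggregate(edge_models, freqs, quals, beta=0.5):
    """Cloud-level step: pi-weighted, normalized average of edge models.

    Each edge node e receives weight pi_e = beta*f_e + (1-beta)*q_e,
    combining participation frequency f_e and quality score q_e.
    """
    pis = np.array([beta * f + (1 - beta) * q for f, q in zip(freqs, quals)])
    weighted = sum(p * m for p, m in zip(pis, edge_models))
    return weighted / pis.sum()
```

With equal frequencies and quality scores the global step reduces to a plain mean, so the adaptive weighting only departs from FedAvg-style averaging when edge nodes actually differ in reliability.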
3.3. Asynchronous Aggregation with Edge and Cloud Interaction
The aggregation process in our framework takes place across both the edge and cloud layers, incorporating dynamic interactions to ensure efficient and timely model updates. Unlike traditional hierarchical federated learning, where the cloud server merely waits for edge nodes to complete aggregation, our approach allows for active cloud-level control and feedback, enhancing the overall efficiency and model performance.
To further enhance edge-level aggregation, we employ a sliding window control mechanism. The sliding window allows each edge node to dynamically adjust the aggregation interval based on the recent arrival rate of updates from IoT devices. Specifically, the window size is adjusted according to the device participation rate and training time, inspired by concepts from queueing theory. Let $W_t$ denote the window size, which is updated as follows:

$$W_{t+1} = W_t + \eta (\rho^* - \rho_t),$$

where $\eta$ is a hyperparameter, $\rho^*$ is the desired participation rate, and $\rho_t$ is the actual participation rate observed in the previous window. This mechanism ensures that the aggregation window is neither too short nor too long, thereby balancing responsiveness and stability.
Sliding window control is a versatile technique commonly used in networking and data transmission to improve efficiency and reliability, especially in dynamic environments. Adjusting the window size in real time helps prevent data loss, adapt to changes in device participation, and maintain a balance between accuracy and responsiveness. In the context of FL, sliding window control enables real-time adaptability to heterogeneous IoT environments, aligning aggregation intervals with actual participation to maintain system robustness.
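The window adjustment can be sketched as a one-line update rule. The clamp bounds below are illustrative assumptions (the paper does not specify limits), added so the window cannot drift to impractical values.

```python
def update_window(window, eta, rho_target, rho_observed,
                  w_min=1.0, w_max=30.0):
    """Adjust the edge node's aggregation window from the participation gap.

    If fewer devices reported than desired (rho_observed < rho_target),
    the window grows to collect more updates; if more reported, it shrinks.
    The [w_min, w_max] clamp is an assumed practical bound.
    """
    window = window + eta * (rho_target - rho_observed)
    return max(w_min, min(w_max, window))
```

For example, with a step size of 5 and a 20-point participation shortfall, a 10-unit window grows to 11 units on the next round.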
3.4. Cloud-Level Feedback and Adaptive Weight Adjustment
To further enhance the effectiveness of the aggregation process, the central server provides adaptive feedback to the edge nodes. This feedback includes updated weights and guidance on adjusting local aggregation intervals or the weighting of specific devices. For example, if the cloud server identifies an edge node with low participation frequency, it may recommend increasing the aggregation interval to collect more updates, thereby enhancing the model's robustness. Similarly, the central server may provide updated device weights to help edge nodes balance contributions from heterogeneous devices more effectively.
The adaptive weight adjustment at the edge nodes follows a simple rule based on the feedback from the central server. Let $\Delta w_i^t$ be the local model update from IoT device i, and let $\alpha_i^t$ be the updated weight provided by the central server at time t. The edge-level aggregation at time t can be expressed as:

$$w_e^t = \frac{\sum_{i \in S_e^t} \alpha_i^t \, \Delta w_i^t}{\sum_{i \in S_e^t} \alpha_i^t}.$$

This adaptive mechanism ensures that each device's contribution is proportional to its performance, thereby optimizing the local aggregation process and aligning it with the global learning objectives.
3.5. Mechanism for Enhancing Privacy and Security
Ensuring privacy protection and secure data transmission is critical in collaborative-learning scenarios, particularly in resource-constrained IoT environments where data often traverse insecure or unreliable routing paths. We utilize the Salsa20 stream cipher, a lightweight and cryptographically secure encryption method selected for its effective balance of robust security, computational efficiency, and low network overhead.
3.5.1. Principle and Mechanism of Salsa20 Encryption
Salsa20 is a stream cipher that operates through a series of highly optimized arithmetic and bitwise operations, collectively referred to as ARX (Addition, Rotation, XOR). Unlike traditional block ciphers such as AES, Salsa20 does not rely on substitution boxes (S-boxes) or complex key scheduling algorithms, making it more suitable for IoT devices with constrained resources.
The core of Salsa20 involves generating a pseudorandom keystream from a secret key and a nonce, which is then XORed with the plaintext to produce the ciphertext. Given a plaintext model update $w_i$ from IoT device i and a secret encryption key $k$, the encryption process can be expressed as:

$$c_i = w_i \oplus \mathrm{Salsa20}(k, n),$$

where $c_i$ represents the ciphertext and $n$ is the nonce. The keystream is generated by repeatedly applying the Salsa20 hash function.
The computational efficiency of Salsa20 stems from its reliance on ARX operations, which are highly efficient on modern hardware and can be parallelized effectively. This ensures that encryption and decryption incur minimal computational overhead, maintaining the timeliness required for federated-learning processes.
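The stream-cipher principle behind Salsa20 (XOR the plaintext with a keyed pseudorandom keystream, so that the same operation both encrypts and decrypts) can be demonstrated with a stand-in keystream. The SHA-256 counter-mode generator below is NOT Salsa20; it only mimics the interface so the XOR mechanics can be shown. A real deployment would call a vetted Salsa20 implementation.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in keystream (SHA-256 in counter mode) -- NOT Salsa20.

    Used here only to illustrate the stream-cipher interface:
    a deterministic byte stream derived from (key, nonce, counter).
    """
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + nonce + counter.to_bytes(8, "little")).digest()
        counter += 1
    return out[:length]

def xor_cipher(data: bytes, key: bytes, nonce: bytes) -> bytes:
    """Encrypt or decrypt: c = m XOR keystream (same call both ways)."""
    ks = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))
```

Applying `xor_cipher` twice with the same key and nonce recovers the plaintext, and the ciphertext has exactly the plaintext's length, which is the property Section 3.5.3 relies on.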
3.5.2. Comparison with Alternative Cryptographic Methods
To evaluate the suitability of Salsa20 for federated learning in resource-constrained IoT environments, we refer to prior works [43,44], which compare Salsa20 and AES in terms of power consumption, encryption latency, and encryption speed. Table 1 summarizes the key metrics.
As shown in Table 1, Salsa20 consistently demonstrates lower power consumption (2.82 μW vs. 4.01 μW) and higher encryption speeds across different data sizes compared to AES. For instance, when encrypting a 1 GB sequence, Salsa20 achieves a speed of 3624 Mbps, surpassing AES's 3290 Mbps. Salsa20 also excels at handling both small (40-byte) and large (1500-byte) data packets, maintaining faster encryption speeds with negligible latency differences (202 vs. 180 cycles).
These results demonstrate Salsa20's suitability for IoT environments, where energy efficiency and data transmission rates are crucial for system performance. By minimizing encryption overhead and ensuring high throughput, Salsa20 meets the scalability and efficiency needs of federated learning in heterogeneous networks. The data, adapted from prior works [43,44], support the selection of the encryption mechanism, building on established findings to justify Salsa20's inclusion in our framework without replicating the experiments.
3.5.3. Impact on Network Overhead
Unlike homomorphic encryption or other complex schemes that increase the size of transmitted data due to added metadata or ciphertext expansion, Salsa20 maintains the original size of the plaintext. This is a direct consequence of the stream cipher design, which XORs the plaintext with a pseudorandom keystream without introducing additional bytes. As a result, the total data transmitted per IoT device remains identical before and after encryption:

$$|c_i| = |w_i|,$$

where $|\cdot|$ represents the size of the data in bytes. This property ensures that the encryption process does not exacerbate communication bottlenecks, making Salsa20 highly suitable for IoT environments with limited bandwidth.
3.5.4. Security and Attack Complexity Analysis
Salsa20 has been extensively studied for its resilience against common cryptographic attacks, including brute-force, differential cryptanalysis, and side-channel attacks [45]. Its 256-bit key length ensures a security level that is computationally infeasible to compromise with current methods [44,46]. Compared to block ciphers like AES, Salsa20's ARX structure eliminates the need for S-boxes, which are often targets for side-channel attacks. This streamlined design enhances both security and efficiency in resource-constrained settings.
While Salsa20 ensures that data-in-transit remain confidential and tamper-resistant, this work primarily focuses on the encryption mechanism rather than the surrounding key management framework. Issues such as secure key generation, distribution, and revocation, as well as the implementation of secure key provisioning protocols, are beyond the scope of this paper. We assume the necessary keys are established through a trusted channel or initialization phase.
In the context of FL, adversaries may attempt to eavesdrop on or tamper with model updates during transmission. By encrypting updates with Salsa20, we ensure that the data remains confidential and tamper-resistant. Additionally, the low computational and energy requirements of Salsa20 align with the scalability objectives of our framework, allowing secure communication without sacrificing performance.
4. Algorithms
In this section, we present the proposed hierarchical federated-learning framework through a series of key algorithms, each addressing a specific aspect of the federated-learning process: local training and encryption at IoT devices, asynchronous aggregation and adaptation at edge nodes, and global aggregation with cloud-level feedback. Each algorithm is provided in pseudocode form, followed by a detailed explanation to ensure a clear understanding of their respective roles within the framework.
4.1. Local Training and Encryption at IoT Device
Each IoT device trains a local model using its personal dataset, $D_i$, through multiple epochs of gradient descent. The local model, $w_i$, is initialized with a global model, $w$, and iteratively updated using the computed gradients, $g_i$. Once training is complete, the updated model is encrypted with the Salsa20 encryption algorithm using an encryption key, $k_i$, to ensure data privacy before transmission. The encrypted model, $c_i$, is then sent to the corresponding edge node for further aggregation.
Subsequently, we analyze the computational complexity of Algorithm 1, covering the local training procedure and the subsequent encryption. During each local epoch, the device iterates over its local dataset $D_i$, computing gradients $g_i$ and updating the model parameters $w_i$. The time complexity of these operations depends on the number of training samples, input dimensionality, and gradient computation complexity. In practice, gradient evaluation often scales approximately linearly with the dataset size, and updating the model through gradient descent typically results in a time complexity on the order of $O(E \cdot N \cdot d)$, where E denotes the number of epochs, N the number of local samples, and d the feature dimension. For the space complexity, storing the model parameters and intermediate variables (e.g., gradients) typically incurs memory requirements that scale at least linearly with the number of parameters. As the complexity of the model increases (e.g., with deeper neural networks or higher-dimensional feature spaces), its memory footprint also grows. Subsequently, the encryption step using Salsa20, a well-established stream cipher, introduces only linear time complexity in the size of the data being encrypted. Since Salsa20 operates by applying a series of simple arithmetic operations (e.g., addition, XOR, and rotation) over fixed-size blocks, the complexity of encrypting the model parameters $w_i$ is effectively $O(L)$, where L is the length of the parameter vector.
Algorithm 1 Local Model Training and Encryption at IoT Device
1: Input: Local data $D_i$ at device i, initial model $w$, encryption key $k_i$
2: Output: Encrypted model update $c_i$
3: $w_i \leftarrow w$ {Initialize local model}
4: for each local epoch do
5:   $g_i \leftarrow \nabla F_i(w_i; D_i)$ {Compute gradient using local data}
6:   $w_i \leftarrow w_i - \eta g_i$ {Update local model using gradient descent}
7: end for
8: $c_i \leftarrow \mathrm{Salsa20}(w_i, k_i)$ {Encrypt updated model using Salsa20}
9: Send encrypted update $c_i$ to edge node
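Algorithm 1 can be sketched in Python. In this hedged illustration, a linear model trained with squared loss stands in for the device's local model, and the Salsa20 call is elided: the update is only serialized to bytes, with the encryption step marked where it would occur.

```python
import numpy as np

def local_train_and_package(X, y, w, lr=0.1, epochs=5):
    """Algorithm 1 sketch: local gradient-descent epochs, then packaging.

    A linear model with mean-squared-error loss is an assumed stand-in
    for the device's local model. The returned byte payload is what
    would be passed to Salsa20 encryption before upload (call elided).
    """
    w = w.copy()
    n = len(y)
    for _ in range(epochs):                    # E local epochs
        grad = (2.0 / n) * X.T @ (X @ w - y)   # gradient of the MSE loss
        w -= lr * grad                         # gradient-descent step
    payload = w.astype(np.float32).tobytes()   # serialize; encrypt next
    return w, payload
```

On a toy dataset where the true weight is 1.0, five epochs already bring the local model close to the optimum, after which the serialized payload would be encrypted and sent to the edge node.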
4.2. Edge Node Aggregation and Adaptation
In collaborative learning frameworks, edge nodes serve as intermediaries between a multitude of resource-constrained IoT devices and a central server, aggregating locally trained model updates to produce a refined global model. The asynchronous aggregation algorithm (Algorithm 2) leverages both historical context and current observations to stabilize learning despite heterogeneous and intermittent device participation. More specifically, the algorithm employs weighted averaging, wherein the new aggregated model is composed of a portion of the previously aggregated model $w_e^{t-1}$ and the current round's model updates received from a subset of devices $S_e^t$. The weight $\alpha$ governs the influence of the prior model state, allowing the system to maintain continuity and mitigate abrupt changes. Conversely, the weight $1-\alpha$ determines the relative impact of newly arrived device updates, enabling the algorithm to incorporate fresh information and adapt promptly to evolving conditions. These mechanisms help counteract the irregularities inherent in asynchronous settings, such as devices that upload updates at different times, varying data quality, or fluctuating connectivity conditions. By normalizing the final aggregation result with the total weight, the algorithm ensures a consistent reference scale, enhancing the stability and interpretability of the combined model.
Apart from the aggregation of model parameters, the algorithm integrates adaptive control strategies to sustain an optimal balance between update frequency and model quality. The sliding window variable $W_e$ is maintained to adjust the timing and breadth of aggregations based on the discrepancy between the desired and actual device participation rates, $\rho^*$ and $\rho_t$. This adjustment allows the edge node to dynamically adapt the aggregation interval, improving responsiveness to system changes and preventing inefficiencies such as excessive communication costs from too-frequent aggregations or outdated models from infrequent ones. Additionally, a quality score $q_e$ is iteratively updated to track the reliability of received updates over time. A smoothing parameter $\gamma$ is employed to implement exponential smoothing, assigning appropriate emphasis to recent measurements versus historical performance, thereby enhancing the system's ability to detect persistent trends or anomalies in device behavior.
Algorithm 2 Asynchronous Aggregation and Adaptation at Edge Node
1: Input: Encrypted models $c_i$ for $i \in S_e^t$, previous model $w_e^{t-1}$, feedback $\alpha_i^t$
2: Output: Aggregated model $w_e^t$, metadata $(f_e, q_e)$
3: Decrypt each $c_i$ to obtain $w_i^t$
4: $w_e^t \leftarrow \alpha \, w_e^{t-1}$
5: for each $i \in S_e^t$ do
6:   $w_e^t \leftarrow w_e^t + (1 - \alpha) \, \alpha_i^t w_i^t$
7: end for
8: Normalize $w_e^t$ by the total weight
9: $W_e \leftarrow W_e + \eta (\rho^* - \rho_t)$ {Adjust sliding window}
10: $q_e \leftarrow \gamma q_e + (1 - \gamma) q_{\mathrm{obs}}$ {Update quality score}
11: Send encrypted update $w_e^t$ and metadata $(f_e, q_e)$ to central server
From a computational complexity perspective, the dominant costs of asynchronous aggregation at the edge node arise from the cryptographic operations and the vectorized arithmetic. The decryption of each encrypted model typically takes time proportional to the model size, i.e., $O(L)$ for models with L parameters. Accumulating and normalizing updates from $|S_e^t|$ devices adds another $O(|S_e^t| \cdot L)$ term, as each parameter of each participating device model must be combined. Space complexity remains $O(L)$ for storing models and metadata, as the algorithm keeps track of a limited set of parameters and aggregated statistics rather than the entire historical sequence of updates. Overall, the computational overhead scales with both the number of participating devices and the model dimensionality, but does so in a manner that remains tractable for edge nodes equipped with moderate computational resources. Through careful parameter tuning and adaptive mechanisms, the described approach achieves a favorable trade-off between computational efficiency, adaptability, and model quality in asynchronous edge aggregation scenarios.
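The blending and smoothing steps of Algorithm 2 can be sketched compactly. This is an illustrative reading of the weighted-averaging rule (prior model weighted by alpha, fresh updates by 1-alpha) and the exponential quality smoothing; the decryption and sliding-window bookkeeping are omitted, and the function names are ours.

```python
import numpy as np

def async_edge_aggregate(prev_model, new_updates, alpha=0.5):
    """Algorithm 2 sketch: blend the previous edge model with fresh updates.

    alpha weights the prior aggregate (continuity); 1-alpha weights the
    mean of the updates that arrived within the current sliding window.
    """
    if not new_updates:
        return prev_model                  # nothing arrived: keep prior state
    fresh = np.mean(new_updates, axis=0)
    return alpha * prev_model + (1 - alpha) * fresh

def update_quality(q_prev, q_obs, gamma=0.8):
    """Exponential smoothing of the edge node's quality score q_e."""
    return gamma * q_prev + (1 - gamma) * q_obs
```

Because the previous aggregate always carries weight alpha, a round with only one or two late-arriving devices cannot swing the edge model abruptly, which is the stabilizing behavior the text describes.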
4.3. Global Aggregation and Cloud-Level Feedback
Algorithm 3 illustrates the global aggregation process, wherein the server receives aggregated models, denoted as $w_e^t$, along with metadata such as participation frequencies $f_e$ and quality scores $q_e$ from each edge node e. While previous stages focus on handling local and edge-level variability, the global aggregation stage further refines the model by emphasizing reliability and quality across the entire network of participating edge nodes.
The core of this process is the computation of a reliability score $\pi_e$ for each edge node. Drawing upon the established participation metrics $f_e$ and quality indicators $q_e$, the algorithm forms a weighted average in which $\beta$ controls the relative importance of participation frequency versus quality. Higher $\pi_e$ values correspond to edge nodes that consistently contribute updates of higher quality or maintain stable and frequent participation. By weighting each edge node's aggregated model $w_e^t$ by $\pi_e$, the global model $w^{t+1}$ becomes more robust to outliers and intermittent contributors, ultimately improving the convergence and generalization performance of the federated-learning system.
Algorithm 3 Global Aggregation and Cloud-Level Feedback
1: Input: Aggregated models $w_e^t$ and metadata $(f_e, q_e)$ from each edge node e
2: Output: Global model $w^{t+1}$, feedback for edge nodes
3: $w^{t+1} \leftarrow 0$
4: for each edge node e do
5:   $\pi_e \leftarrow \beta f_e + (1 - \beta) q_e$
6:   $w^{t+1} \leftarrow w^{t+1} + \pi_e w_e^t$
7: end for
8: $w^{t+1} \leftarrow w^{t+1} / \sum_{e} \pi_e$
9: Distribute $w^{t+1}$ to all edge nodes
10: Send feedback to each edge node e, including recommended adjustments for aggregation intervals and participation weights
Upon completing the aggregation, the global model is normalized by dividing by the sum of the weights $\sum_{e} \pi_e$. This normalization ensures a stable reference scale for subsequent training rounds, making the global model both statistically meaningful and numerically well-conditioned. The refined global model is then disseminated back to the edge nodes, providing them with a synchronized, high-quality baseline from which to initiate subsequent local training rounds.
In addition to redistributing $w^{t+1}$, the central server also returns feedback to each edge node. These feedback values, along with recommended adjustments for local parameters (such as aggregation intervals or participation weights), guide edge nodes in calibrating their local and asynchronous aggregation processes. This iterative feedback loop creates a positive cycle: as edge nodes refine their behavior and improve their contributions, the global model strengthens, providing better guidance and baselines for future local training. Over time, this dynamic process fosters a more stable, efficient, and high-performing federated-learning ecosystem.
The computational complexity at this global stage primarily reflects the cost of decrypting the received aggregated models and performing weighted summations. Decryption overhead scales linearly with the number of edge nodes E and the model size L, resulting in $O(E \cdot L)$ complexity. The aggregation step itself, applying weights and summing across E nodes, also incurs $O(E \cdot L)$ operations. Space complexity remains on the order of $O(E \cdot L)$, as the server temporarily stores each model and associated metadata before integrating them into the global model. Overall, this complexity profile remains tractable for cloud-level servers, which typically have more abundant computational and storage resources than edge nodes or IoT devices.
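The cloud-level feedback step can be sketched as a simple rule. The linear "widen the window when participation falls short" rule below, and its parameter names, are illustrative assumptions rather than the paper's exact mechanism.

```python
def cloud_feedback(freqs, target_freq=0.8, step=2.0):
    """Cloud-level feedback sketch: per-edge window-interval adjustments.

    Edge nodes whose participation frequency f_e falls below the target
    receive a positive delta (lengthen the aggregation interval to
    collect more updates); nodes above target receive a negative delta.
    The linear rule and its constants are assumed for illustration.
    """
    return {e: step * (target_freq - f) for e, f in enumerate(freqs)}
```

An edge node with 60% participation would be advised to lengthen its interval, while a fully participating node could safely shorten it, closing the feedback loop described above.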
6. Discussion
The HAR dataset exemplifies the heterogeneity commonly found in IoT environments, where client-level data often exhibit non-IID distributions. Despite this natural variability, as analyzed in
Section 5.1.2, our hierarchical framework demonstrated consistent performance, maintaining stable global accuracy across training rounds. This highlights the robustness of the localized edge-level aggregation and adaptive weighting mechanisms, which effectively mitigate the impact of distributional differences.
By leveraging these mechanisms, the framework achieves efficient global updates without bringing significant overhead or requiring extensive preprocessing. These results suggest that our framework is well-suited for diverse IoT scenarios, where client data may inherently deviate from idealized distributions.
The experimental results demonstrate that our asynchronous aggregation strategy, in conjunction with lightweight encryption, achieves model accuracy comparable to HierFAVG while providing significant benefits in communication efficiency and resilience to device heterogeneity. Specifically, the reduced round-time cost observed in
Figure 6 emphasizes the robustness of the proposed framework in handling network instability and hardware variability, ensuring that slower participants do not become bottlenecks.
The initial fluctuations in training accuracy during the early rounds, as highlighted in
Figure 3 and
Figure 5, suggest a trade-off between aggregation frequency and model stability. Nonetheless, by round 30, the convergence behavior of our framework aligns closely with HierFAVG, with a final accuracy difference of only 0.4%, indicating that asynchronous aggregation does not compromise the overall quality of the model. This is crucial in IoT environments, where timely and efficient updates are often prioritized over synchronized operations. The analysis of encryption, as presented in
Table 3, illustrates that the encryption overhead is minimal, averaging 0.8 s per round, and thus suitable for federated-learning systems involving resource-constrained IoT devices. The results demonstrate that our framework not only maintains data security but also operates within the practical computational limits of IoT devices.
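The per-round encryption overhead reported above can be measured with a pattern like the following. The framework uses SALSA, a Salsa20-family stream cipher; to keep this sketch self-contained, a toy SHA-256-based keystream stands in for the real cipher, and all names here are our own illustrative choices, not the paper's implementation.

```python
import hashlib
import time

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream from SHA-256 (stand-in for Salsa20)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(model_bytes: bytes, key: bytes, nonce: bytes) -> bytes:
    # Stream ciphers XOR the plaintext with the keystream, so the same
    # function both encrypts and decrypts.
    ks = keystream(key, nonce, len(model_bytes))
    return bytes(m ^ k for m, k in zip(model_bytes, ks))

model = bytes(range(256)) * 100          # stand-in serialized model update
t0 = time.perf_counter()
ct = encrypt(model, b"secret-key", b"round-7")
overhead = time.perf_counter() - t0      # per-round encryption cost in seconds
assert encrypt(ct, b"secret-key", b"round-7") == model   # XOR round-trips
```

Because stream encryption is a single linear pass over the serialized model, its cost grows only with the update size, which is consistent with the small, stable per-round overhead observed in Table 3.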
Our framework’s adaptability and performance suggest practical applications across a wide range of IoT scenarios, particularly those involving heterogeneous devices and dynamic environments. For instance, the HAR dataset used in our experiments exemplifies human activity recognition systems where data are collected from wearable devices via embedded sensors. These systems are inherently heterogeneous, with devices varying in computational capabilities and connectivity. Our framework’s asynchronous aggregation mechanism ensures that updates from such diverse devices are efficiently handled, maintaining timely and accurate predictions even under network instability. In healthcare monitoring, this advantage translates directly to secure and efficient communication of sensitive patient data from wearable devices to cloud servers, leveraging our lightweight encryption to protect privacy without overburdening resource-constrained devices. Similarly, in broader IoT applications such as industrial IoT and smart agriculture, the framework’s scalability enables it to manage data aggregation from numerous sensors monitoring machinery, environmental conditions, or crop health, optimizing communication overhead and ensuring robust system responsiveness. These examples highlight the framework’s versatility and its potential to enhance performance across diverse IoT domains.
The primary focus of this study is to enhance scalability and privacy in federated learning through hierarchical aggregation and lightweight encryption, addressing the critical challenges of device heterogeneity and secure communication in IoT environments. While runtime efficiency and performance comparisons with alternative methods are valuable, they are beyond the immediate scope of this work and remain important directions for further exploration. Due to hardware resource constraints, our experiments are currently limited to IoT networks with up to 20 clients, as larger-scale simulations lead to significant performance degradation on available hardware. Despite these limitations, we have conducted comprehensive trend analyses using client counts of [4, 8, 12, 16, 20], which provide valuable insights into the performance and scalability of the proposed framework. In future work, we plan to expand our evaluations by optimizing the simulation framework and leveraging more advanced hardware to support larger-scale IoT networks and more diverse device configurations. This will enable us to further explore the trade-offs and benefits of our asynchronous aggregation strategy, ensuring its robustness and applicability in even more dynamic and heterogeneous IoT environments.
Our experiments are designed around the HAR dataset, which inherently reflects user-specific data distribution, aligning with real-world IoT scenarios. While addressing model biases and accuracy under artificially skewed distributions is an important topic, it is beyond the scope of this study, as our primary focus is on improving scalability, efficiency, and privacy in hierarchical FL systems.
7. Conclusions
In summary, we proposed a hierarchical federated-learning framework designed to address the challenges posed by heterogeneous IoT environments, focusing on improving communication efficiency and ensuring data security through lightweight encryption. By employing hierarchical aggregation, asynchronous communication, and lightweight stream encryption, our framework achieved a significant reduction in communication cost compared to the traditional HierFAVG, with an average reduction of approximately 20% in round-time cost. Additionally, the use of SALSA encryption provided data security with minimal computational overhead, averaging 0.8 s per round, demonstrating its feasibility for resource-constrained devices. Despite the presence of hardware and network heterogeneity, our experimental results showed that the proposed framework achieved model accuracy comparable to HierFAVG, with a final accuracy difference of only 0.4%. These findings indicate that our framework provides an efficient, secure, and scalable solution for federated learning in dynamic IoT environments. Building on these results, our future work will focus on extending the evaluation to larger-scale IoT networks, incorporating hundreds or thousands of devices, to analyze latency and scalability in such environments. Furthermore, adaptive privacy-preserving mechanisms, such as secure aggregation protocols and dynamic encryption schemes, will be explored to address heightened privacy concerns in real-world deployments.