Article

Uncertainty-Aware Federated Reinforcement Learning for Optimizing Accuracy and Energy in Heterogeneous Industrial IoT

by A. S. M. Sharifuzzaman Sagar, Muhammad Zubair Islam *,†, Amir Haider and Hyung-Seok Kim *
Department of Artificial Intelligence and Robotics, Sejong University, Seoul 05006, Republic of Korea
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2024, 14(18), 8299; https://doi.org/10.3390/app14188299
Submission received: 2 August 2024 / Revised: 10 September 2024 / Accepted: 12 September 2024 / Published: 14 September 2024

Abstract

The Internet of Things (IoT) technology has revolutionized various industries by allowing data collection, analysis, and decision-making in real time through interconnected devices. However, challenges arise in implementing Federated Learning (FL) in heterogeneous industrial IoT environments, such as maintaining model accuracy with non-Independent and Identically Distributed (non-IID) datasets and straggler IoT devices, ensuring computation and communication efficiency, and addressing weight aggregation issues. In this study, we propose an Uncertainty-Aware Federated Reinforcement Learning (UA-FedRL) method that dynamically selects the number of local epochs for each client to effectively manage heterogeneous industrial IoT devices and improve accuracy, computation, and communication efficiency. Additionally, we introduce the Predictive Weighted Average Aggregation (PWA) method to tackle weight aggregation issues in heterogeneous industrial IoT scenarios by adjusting the weights of individual models based on their quality. The UA-FedRL addresses the inherent complexities and challenges of implementing FL in heterogeneous industrial IoT environments. Extensive simulations in complex IoT environments demonstrate the superior performance of UA-FedRL on both MNIST and CIFAR-10 datasets compared to other existing approaches in terms of accuracy, communication efficiency, and computation efficiency. The UA-FedRL algorithm attains an accuracy of 96.83% on the MNIST dataset and 62.75% on the CIFAR-10 dataset, despite the presence of 90% straggler IoT devices, attesting to its robust performance and adaptability across different datasets.

1. Introduction

The Internet of Things (IoT) is revolutionizing various sectors by enabling massive data sharing and robust decision-making through connected devices. As a result, IoT scenarios are anticipated to generate large quantities of data in the near future [1]. To harness the potential of these data, researchers are investigating the feasibility of storing and processing them in cloud-based environments using advanced artificial intelligence techniques, such as deep learning and machine learning algorithms [2]. However, cloud-based data processing raises concerns about privacy and communication costs, which could potentially limit the benefits of such solutions [3]. Furthermore, IoT systems demand distributed intelligent services that can adapt to dynamic environments in real time, posing challenges due to the inherent complexity and heterogeneity of IoT devices. Researchers have proposed a multi-objective optimization technique using the Glowworm Swarm Optimization (GSO) algorithm and a hybrid optimization approach combining the artificial bee colony method, genetic operators, and density correlation degree to enhance the performance of blockchain-based Industrial Internet of Things (IIoT) systems [4,5]. Federated learning (FL), a distributed machine learning framework that ensures privacy, reduces data transfer, and solves device heterogeneity while allowing real-time adaptation to various contexts, has thus emerged as a solution to these problems [6].
FL is a distributed machine learning technique that allows for the training of algorithms across several edge devices, each of which has local datasets that are not shared or exchanged [7]. By locally training models on IoT devices and aggregating the trained parameters on a central server, FL addresses data privacy concerns and adapts to the heterogeneity of IoT environments. FL has been successfully applied in various applications, such as mobile keyboard prediction [8], autonomous industry [9], remote health monitoring [10], and medical image processing [11], demonstrating its significant potential. By leveraging local computing capabilities and ensuring data security through local model training, FL not only improves training efficiency but also enhances privacy protection and model accuracy.
Recently, FL methods have been used in IoT-based systems to enable privacy-enhanced IoT systems by allowing multiple devices to coordinate with a central server for data training without sharing actual datasets [12]. In IoT networks, devices act as workers communicating with an aggregator, performing neural network training to enhance training quality and minimize user privacy leakage. The robustness and potential of FL frameworks make them fundamental building blocks for sophisticated IoT applications. Figure 1 shows the general design of FL in an IoT context, where IoT devices train their local models and send weights to the FL server. The FL server gathers all local model weights from IoT devices and updates its global model to acquire global model weights.
Although FL has considerable potential in IoT scenarios, several shortcomings must be addressed before it can be deployed in heterogeneous industrial IoT networks. Figure 2 illustrates a scenario of heterogeneous FL in an IoT environment. In heterogeneous scenarios, clients do not share the same computational and energy resources. The data are also distributed unevenly across devices, and each local dataset has distinct attributes and properties. Aggregating updates from inadequately trained clients degrades the performance of the global model. Therefore, performance optimization in such scenarios requires approaches that can handle variations in both computational resources and data distributions.
Despite these variations in scenarios, a majority of existing methods continue to employ identical training configurations (epochs) and model aggregation methods, irrespective of the scenario at hand. It is important to note that this approach could significantly affect the performance of the model. We discuss these challenges in the following.

1.1. Model Accuracy in Heterogeneous Industrial IoT with Non-IID Dataset

In heterogeneous industrial IoT environments with non-IID datasets, FL encounters the challenge of maintaining model accuracy. The non-IID data characterize situations in which the data distribution across IoT devices is uneven, unbalanced, or displays distinct features, a common occurrence in real-world IoT deployments. Additionally, IoT devices within extensive networks may possess varying resources for data processing, further contributing to heterogeneity. Consequently, these scenarios result in heterogeneity challenges stemming from diverse data collection rates, a variety of device types, and unique data patterns across devices, which can impact the performance of federated learning systems.
Specifically, the issue of model accuracy in heterogeneous industrial IoT environments arises from federated learning’s reliance on aggregating locally trained models to create a global model. In the presence of non-IID data, local models trained on individual devices may exhibit satisfactory performance on their respective local data but perform inadequately on data from other devices. As a result, when aggregating these local models, the ensuing global model may acquire lower accuracy or slow convergence due to discrepancies in data distributions and available computational resources.

1.2. Computation and Communication Efficiency

FL in heterogeneous industrial IoT scenarios also presents challenges in terms of computation and communication efficiency. Heterogeneous industrial IoT devices often possess varying computational resources, processing capabilities, and communication bandwidth, which can affect the overall performance and efficiency of a federated learning system.
In heterogeneous industrial IoT environments, devices with limited processing power may struggle to keep pace with more capable devices during the local training process. This discrepancy can lead to longer training times and slow convergence of the global model. Additionally, the energy constraints of battery-powered IoT devices may further exacerbate the problem, as these devices need to balance local computation with energy consumption.
Moreover, communication between IoT devices and the central server is critical for FL, as local model updates need to be transmitted and aggregated to form the global model. Heterogeneous industrial IoT devices often have varying communication capabilities and network conditions, which can lead to communication bottlenecks and increased latency.

1.3. Weight Aggregation Challenges in Heterogeneous Industrial IoT Scenarios

The problem of weight aggregation in heterogeneous industrial IoT devices for federated learning stems from inherent complexities and variations in device capabilities, data distribution, and available resources. Heterogeneous industrial IoT deployments include resource-constrained edge servers and devices with limited computing power and battery endurance, all of which can affect the weight aggregation process during federated learning. In heterogeneous industrial IoT environments, local model updates from individual devices may differ significantly due to the varying data distribution, device resources, and training characteristics. Consequently, the central server must carefully aggregate these diverse local model updates to create a robust and accurate global model. Moreover, non-IID data are common in heterogeneous industrial IoT environments, leading to an imbalanced data distribution across devices. The central server must account for this imbalance during weight aggregation to avoid biases and ensure an accurate representation of the underlying data. IoT devices with limited processing power may not perform as many training iterations on their local data as other IoT devices. Therefore, aggregating updates from these straggler IoT devices degrades the accuracy and convergence of the global model.
FL with a reinforcement learning approach has evolved in recent years as a potential way to train machine learning models in diverse and decentralized environments. However, the implementation of FL provides considerable hurdles due to the variability in data distribution, computational capacity, and resource availability among IoT devices. There is still potential for advancement in tackling the difficulties of implementing federated learning in dynamic IoT contexts, even though numerous studies on the topics of “federated learning” and “heterogeneous networks” have been published.
This paper proposes an Uncertainty-Aware Federated Reinforcement Learning (UA-FedRL) method, which dynamically selects local epochs to manage heterogeneous industrial IoT devices in federated learning networks more effectively. In combination with the Predictive Weighted Average Aggregation (PWA) method, UA-FedRL provides a comprehensive solution to address the complexities and challenges of implementing federated learning in heterogeneous industrial IoT environments and acquiring high accuracy with communication and computation efficiency. Our proposed methods demonstrate outstanding results in harsh-IoT-environment simulations, outperforming other federated learning approaches in terms of accuracy, communication, and computational efficiency. Moreover, Software-Defined Networking (SDN) technology is implemented in the core network of the IoT environment with a focus on achieving a lower communication latency.
The contributions of this study can be summarized as follows:
  • We propose a UA-FedRL method that addresses the challenges associated with the heterogeneity of IoT devices in non-IID data distribution scenarios. The UA-FedRL method dynamically selects local epochs, improving accuracy, computation, and communication efficiency while effectively managing heterogeneous industrial IoT devices in federated learning networks.
  • We introduce the PWA method to address weight aggregation issues commonly found in heterogeneous industrial IoT scenarios. This method calculates the weight quality of every local IoT device using the predictive log-likelihood of the validation accuracy of the local model. The weights of individual models are adjusted based on their quality to mitigate the impact of heterogeneity and non-IID issues during aggregation.
The rest of this paper is structured as follows. Section 2 presents a comprehensive literature overview that highlights key breakthroughs in FL for the IoT and identifies gaps in existing research techniques. Section 3 discusses the problem formulation of the study. Section 4 discusses in detail our proposed UA-FedRL technique and PWA approach, describing their conceptual basis. Section 5 discusses the parameters of our simulation environments and presents the findings of our comparative studies. We compare the performance of UA-FedRL with that of existing FL systems in terms of accuracy, communication efficiency, and computation efficiency. Finally, in Section 6, we conclude our paper by summarizing our results, discussing their significance for the discipline, and providing an outline of prospective future work in this area.

2. Related Work

In this section, we explore the existing methods for optimizing FL utilizing Deep Reinforcement Learning (DRL) methods. The constraints of these current approaches are then examined. In addition, we cover the study on the development of various aggregation algorithms in FL, as well as their limitations in heterogeneous industrial IoT contexts.

2.1. FL Optimization Using DRL

FL methods such as FedAvg [13] and FedProx [14] have been essential in training models on IoT networks. While FedAvg averages the gradients from client devices to update global models, FedProx addresses challenges from heterogeneous industrial IoT networks by introducing a proximal term, regularizing the FL model. However, both methods face difficulties with heterogeneous datasets and resource heterogeneity. This has led to heuristic algorithms, which sometimes yield suboptimal results.
With the optimization challenges in FL, DRL has been used recently. In particular, Wang et al. introduced FAVOR, which used deep Q-learning to select client devices, aiming to optimize validation accuracy and minimize communication rounds [15]. Zhang et al. used a DRL-based Mobile Edge Computing (MEC) system for FL to minimize energy consumption and training delay [16]. Zhang et al. also proposed an FL algorithm supported by DRL for efficient IIoT equipment selection, attaining high accuracy on the MNIST dataset [17]. Han et al. suggested a deep Q-learning-based mechanism for quantization bit allocation, enhancing the global FL model performance [18]. Rjoub et al.’s DDQN-Trust algorithm employed trust establishment calculations for client selection in IoT networks, showing better optimization with FedProx [19]. Zhang et al. showcased FedMarl, which used multi-agent reinforcement learning for runtime client selection, resulting in improved model accuracy and reduced latency [20]. Chen et al. designed TP-DDPG, a framework in an energy-harvesting hierarchical FL system, balancing learning delay and model accuracy [21].
It is evident that the existing methods mainly focus on dynamically selecting clients for model training without considering adaptive local epoch selections. Existing methods use a fixed number of local epochs for all clients, overlooking the variability in their computing costs and heterogeneity. By incorporating adaptive local epoch selection, clients with more substantial resources can perform a higher number of training epochs. On the contrary, clients with limited computing capabilities can participate in the training process with fewer epochs. This approach allows for a more efficient and tailored utilization of resources, ultimately enhancing the overall performance of FL in heterogeneous industrial IoT scenarios.

2.2. Weight Aggregation in FL

Weight aggregation in FL revolves around the idea of transmitting client weights to a server for global model updates. Given the diversity of IoT scenarios, numerous weight-aggregation techniques have surfaced to improve model accuracy. McMahan et al. introduced FL using average weight aggregation from clients, which reduced the number of communication rounds compared to synchronized Stochastic Gradient Descent (SGD) [13]. Park et al.’s FedPSO uses particle swarm optimization for weight aggregation, selecting optimal client weights to improve accuracy [22]. Chen et al.’s FedHQ focuses on heterogeneous quantization precision, allocating weights based on quantization heterogeneity [23]. Guo et al. addressed the challenge of divergent data distributions by employing DRL to evaluate client contributions for global model aggregation [24]. Jayaram et al. showcased AdaFed, a scalable FL aggregation architecture that leveraged cloud functions for resource efficiency and resilience [25]. Xu et al. presented FedLA, which reduces the aggregation frequency for efficient gradients in non-IID settings and uses the rate of change of weight divergence to time aggregation [26]. Li et al.’s γ-mean method, anchored in a minimum divergence estimation, curtails the influence of byzantine clients [27]. Nguyen et al. focused on the probability distribution of features, estimating local distributions for weight calculations in heterogeneous industrial IoT data [28]. Han et al. introduced DeFL, a decentralized aggregation method for cross-silo FL to tackle the central server’s vulnerabilities [29]. John et al.’s FedBuff blends synchronous and asynchronous FL benefits, proving superior or equal to FedAvgM under differential private training [11].
The aggregation of locally computed updates from clients with heterogeneous data distributions, communication constraints, and unstable network environments still poses challenges in terms of model accuracy and convergence. The FL algorithms proposed by various researchers use different methods for global model updates and weight aggregation to address these challenges. However, some of these methods have limitations, such as inefficient communication, vulnerability to central servers, sensitivity to byzantine clients, and reduced accuracy due to non-IID data distributions. To overcome these limitations, a novel aggregation method should be proposed to assign weights to each client based on the quality of their local model updates. This method can ensure that clients with more accurate and reliable updates receive more weight during global model aggregation.

3. Problem Formulation

FL is utilized to accommodate numerous devices that accumulate data and a central server that manages the global learning objective throughout the network of devices. Important notation used in this paper is indicated in Table 1.
We have $N$ IoT devices, each with a local dataset $D_i$ and a computational cost represented by $p_i$. The goal is to minimize a federated objective function $f(w)$ defined as
$$f(w) = \frac{1}{N} \sum_{i=1}^{N} p_i F_i(w),$$
where $F_i(w)$ is the local objective function of device $i$.
At each round $t$ of the FL process, device $i$ selects a random subset $S_i$ of its local data $D_i$ and performs local training on $S_i$ using the current model parameters $w^{t}$ to obtain a new set of parameters $w^{(t+1,i)}$. Then, device $i$ sends $w^{(t+1,i)}$ to a central server. The central server aggregates the received model updates to obtain a new global model parameter $w^{(t+1)}$:
$$w^{(t+1)} = \frac{1}{N} \sum_{i=1}^{N} w^{(t+1,i)},$$
where $w^{(t+1)}$ is the new global model parameter, $w^{(t+1,i)}$ is the local model parameter from device $i$, $N$ is the number of clients, and $t$ is the training round. The new global model parameter $w^{(t+1)}$ is sent back to all devices, and the process repeats until convergence.
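The aggregation step above can be illustrated with a minimal sketch. It assumes each local model is flattened into a plain list of floats; the function name `average_weights` and the toy values are illustrative and do not come from the paper.

```python
from typing import List


def average_weights(local_weights: List[List[float]]) -> List[float]:
    """Unweighted mean of the local parameter vectors w^(t+1,i)."""
    if not local_weights:
        raise ValueError("at least one local model is required")
    n = len(local_weights)
    dim = len(local_weights[0])
    if any(len(w) != dim for w in local_weights):
        raise ValueError("all parameter vectors must have the same length")
    # Coordinate-wise average over the N clients.
    return [sum(w[j] for w in local_weights) / n for j in range(dim)]


# Toy example with three clients and two parameters each (made-up values).
local_updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
global_weights = average_weights(local_updates)
```

Every client contributes equally here, regardless of how many epochs it trained or how good its update is, which is the limitation the later sections address.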
However, a significant challenge emerges when dealing with heterogeneous industrial IoT devices. These devices are characterized by their diverse computational capacities, available energy resources, and data sizes, which can vary widely from one device to another. This heterogeneity presents a formidable task when it comes to selecting the optimal number of epochs for each IoT device in the context of FL.
Current FL methods typically use the same number of epochs for every participating device without considering the heterogeneity of IoT devices. However, if the epoch count is set too low to reduce the usage of energy and computation resources, the model might underfit. In that case, the model fails to learn the underlying patterns in the data and performs poorly on real-world data because it has not adequately captured the complexity of the data.
On the other hand, if the epoch count is set too high, the model might overfit the local data on the device. Overfitting occurs when the model learns the training data too well, to the point that it captures not only the underlying patterns but also the noise or random fluctuations in the data. Moreover, a high epoch count increases the usage of energy and computational resources, which shortens the operating lifetime of IoT devices.
Therefore, selecting an optimal number of epochs for each device is a complex but crucial problem in FL on heterogeneous industrial IoT devices. An optimal number of epochs for each device should be selected to ensure that the model neither underfits nor overfits the data and also reduces the consumption of energy and computation resources.
The objective is to minimize the expected loss over all devices while accounting for the communication and computation costs,
$$\min_{w,\,E_i} = c_i \cdot \exp \sum_{i} L(f_w(E_i), x)\,\bigl(1 - acc(w, E_i)\bigr) + \alpha \cdot TC(E_i) + \beta \cdot R(E_i) - \gamma \cdot D_{nonIID}(E_i),$$
where $E_i$ is the number of epochs selected for device $i$, $acc(w, E_i)$ is the accuracy, $L(f_w(E_i), x)$ is the loss function of the model on dataset $x$ after $E_i$ epochs, $TC(E_i)$ is the total cost, which is the sum of the computation and communication costs, $R(E_i)$ quantifies the reduction in communication rounds, and $D_{nonIID}(E_i)$ represents the variability due to the distribution of non-IID data between devices. The coefficients $\alpha$, $\beta$, and $\gamma$ control the trade-offs among model quality, total cost, communication rounds, and data distribution. The total cost $TC(E_i)$ can be defined as
$$TC(E_i) = C(E_i) + p_i.$$
The communication cost can be defined as
$$C(E_1, E_2, \ldots, E_n) = \sum_{t=1}^{N} \left[ bits\left(w^{(t+1,i)}\right) + bits(ID_i) \right],$$
where $bits(w^{(t+1,i)})$ is the number of bits required to transmit the model parameters $w^{(t+1,i)}$ from device $i$ to the central server, and $bits(ID_i)$ is the number of bits required to transmit a unique ID from device $i$ to the central server, which is specific to our proposed method. The computational energy consumption can be defined as
$$p_i = D_i \, \epsilon_i \, E_i \, m_i \, f_i^{2} \, l_i,$$
where $D_i$ is the client’s data samples, $\epsilon_i$ is the central processing unit (CPU) cycles required to handle each data sample, $E_i$ is the number of epochs, $m_i$ is the size of the deep learning model implemented in each client, $f_i$ is the CPU frequency, and $l_i$ is the complexity of the learning task.
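To make the cost terms concrete, the following sketch evaluates the computational energy as the plain product of the six factors named above and adds the per-device communication bits to obtain a total cost. The function names, the reading of the energy term as a simple product, and all numeric values are assumptions for illustration; units are taken as given in the text.

```python
def computation_energy(samples: float, cycles_per_sample: float, epochs: int,
                       model_size: float, cpu_freq: float,
                       task_complexity: float) -> float:
    """Product D_i * eps_i * E_i * m_i * f_i^2 * l_i (assumed reading)."""
    return (samples * cycles_per_sample * epochs * model_size
            * cpu_freq ** 2 * task_complexity)


def communication_bits(weight_bits: int, id_bits: int) -> int:
    """Bits sent by one device in one round: model parameters plus its ID."""
    return weight_bits + id_bits


def total_cost(comm: float, comp: float) -> float:
    """Total cost as communication cost plus computation cost."""
    return comm + comp


# Made-up numbers for a single device.
energy = computation_energy(samples=100, cycles_per_sample=2.0, epochs=5,
                            model_size=1.0, cpu_freq=2.0, task_complexity=1.0)
bits = communication_bits(weight_bits=1000, id_bits=32)
cost = total_cost(bits, energy)
```

Because the epoch count enters the energy term linearly, halving the local epochs of a constrained device halves its computation cost in this model, which is the lever that adaptive epoch selection acts on.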
The challenge is to design an adaptive and efficient algorithm that can learn the optimal number of epochs for each device, based on its local characteristics and the current state of the model, while minimizing communication and computation costs. This can be formulated as UA-FedRL, where the agent learns a policy for each participating IoT device that maps the current state (e.g., the model’s performance on its local data) to the action (the number of epochs to select). The objective is to achieve the highest predicted cumulative reward, which is a function of the learned model’s quality while reducing communication and computing costs.

Weight Aggregation in Heterogeneous Industrial IoT Devices

Weight parameter aggregation also poses a great challenge to obtaining optimal global model performance in FL. A device with a larger local dataset and higher computational resources may take more epochs to converge than a device with a smaller dataset and lower computational resources. The optimal number of epochs for each device $i$ to train a model depends on numerous factors, such as the quantity and quality of the data, the computational resources, and the size of the model. Therefore, aggregating the global model from heterogeneous industrial IoT devices while achieving high global model accuracy is challenging.
Let $w_i(e_i)$ be the model parameters of device $i$ after $e_i$ training epochs, and let $p_i$ be the computational resources available on device $i$. The objective function for our system can be defined using (1)
$$f(w) = \frac{1}{N} \sum_{i=1}^{N} p_i F_i\bigl(w_i(e_i)\bigr).$$
To minimize the local objective function of device $i$, the optimal number of epochs can be defined as
$$e_i^{*} = \arg\min_{e_i} F_i\bigl(w_i(e_i)\bigr).$$
However, the system may choose a different number of epochs, $e_i$, instead of the optimal number $e_i^{*}$, which may degrade the performance of the global model. Moreover, because local models are trained with different numbers of epochs, the updates from different devices may have different magnitudes and directions, making it difficult to aggregate them directly. Therefore, an aggregation method that departs from traditional averaging is needed to accommodate variable epochs. This study proposes a PWA method to solve the heterogeneous industrial IoT device aggregation problem, which is discussed in Section 4.
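As a rough illustration of why a non-uniform aggregation rule helps, the sketch below averages local models with coefficients proportional to a per-client quality score. This is a generic quality-weighted average, not the PWA method itself; the function name, the score definition, and the numbers are hypothetical.

```python
from typing import List


def quality_weighted_average(local_weights: List[List[float]],
                             quality: List[float]) -> List[float]:
    """Average parameter vectors with coefficients proportional to `quality`.

    `quality` is a hypothetical non-negative score per client, for example
    derived from validation performance. It is NOT the paper's PWA score.
    """
    if len(local_weights) != len(quality):
        raise ValueError("one quality score per client is required")
    total = sum(quality)
    if total <= 0:
        raise ValueError("quality scores must sum to a positive value")
    coeffs = [q / total for q in quality]
    dim = len(local_weights[0])
    return [sum(c * w[j] for c, w in zip(coeffs, local_weights))
            for j in range(dim)]


# A poorly trained straggler (score 1.0) and a well-trained client (score 3.0).
aggregated = quality_weighted_average([[0.0, 0.0], [4.0, 8.0]], [1.0, 3.0])
```

With these scores the well-trained client receives a coefficient of 0.75, so the aggregate is pulled toward its parameters instead of sitting halfway between the two models.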

4. Construction of the UA-FedRL Method

This section is dedicated to the detailed structure of the UA-FedRL method to solve the issues found in heterogeneous industrial IoT device networks in FL, such as (1) optimizing FL to accommodate the participation of heterogeneous industrial IoT devices to obtain optimal global model accuracy and (2) introducing a novel aggregation method to mitigate heterogeneity in the IoT network.

4.1. Uncertainty-Aware Reinforcement Learning-Based Optimal Epoch Selection

4.1.1. Preliminaries

This section is dedicated to describing the preliminaries of the uncertainty-aware reinforcement learning (UA-RL) method, which is proposed in this paper.
We assume that the agent’s policy is represented by a neural network that takes the state and action as input and outputs a distribution over the number of epochs. Specifically, let $a_{i,t}$ be the action (number of epochs) selected by agent $i$ at time step $t$, and let $s_{i,t}$ be the state observed by agent $i$ at time step $t$. The agent’s policy is given by:
$$\pi_{\theta_\pi}(a_{i,t} \mid s_{i,t}) = \mathrm{Softmax}(z_{i,t}),$$
where $z_{i,t}$ is the output of a neural network with weights $\theta$, which takes $s_{i,t}$ as input and outputs a distribution over the number of epochs. We assume that the neural network has one hidden layer with hidden units and a Gaussian prior over the weights:
$$\theta_\pi \sim \mathcal{N}\left(\mu_\pi, \mathrm{diag}\left(\sigma_\pi^{2}\right)\right),$$
where $\mu_\pi$ and $\sigma_\pi^{2}$ are learnable parameters that are updated during training. Bayes by backprop with variational inference can be used to approximate the posterior distribution over the weights given the data. We assume a Gaussian distribution over the weights with a mean and variance that are parameterized by the neural network as follows,
$$q_\theta(\theta) = \mathcal{N}\left(\mu_\theta, \mathrm{diag}\left(\sigma_\theta^{2}\right)\right).$$
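The following sketch shows one way to draw weights from a diagonal Gaussian and turn the resulting logits into a softmax distribution over candidate epoch counts. For brevity, a single linear map stands in for the hidden-layer network; the function names, the state encoding, and all numbers are assumptions for illustration only.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_linear_policy(state: List[float], mu: List[List[float]],
                         sigma: List[List[float]],
                         rng: random.Random) -> List[float]:
    """Draw theta = mu + sigma * eps per weight, then return Softmax(theta @ state)."""
    logits = []
    for mu_row, sigma_row in zip(mu, sigma):
        # Reparameterized sample from N(mu, diag(sigma^2)) for this output row.
        theta_row = [m + s * rng.gauss(0.0, 1.0)
                     for m, s in zip(mu_row, sigma_row)]
        logits.append(sum(t * x for t, x in zip(theta_row, state)))
    return softmax(logits)


rng = random.Random(0)                          # fixed seed for repeatability
state = [0.5, 1.0]                              # hypothetical 2-feature device state
mu = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.0]]      # one row per candidate epoch count
sigma = [[0.05, 0.05]] * 3                      # per-weight standard deviations
probs = sample_linear_policy(state, mu, sigma, rng)
```

Each call draws a fresh set of weights, so repeated calls on the same state give slightly different distributions; that spread is the uncertainty the method later exploits.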
We aim to learn the parameters $\theta_n$ and the weights $\theta$ of the neural network with variational inference using the following loss equation:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T_i} \left[ \log \pi_{\theta_n}(a_{i,t} \mid s_{i,t}) - \frac{1}{\sigma_{i,t}} \left( \sigma_{i,t} - \mu_{i,t} \right)^{2} + \log \sigma_{i,t} \right] + \beta \, D_{KL}\bigl( q_\theta(\theta) \,\|\, p(\theta) \bigr).$$
The first term corresponds to the expected negative log-likelihood of the actions under the policy and the assumed Gaussian distribution over the number of epochs. The second term is the Kullback–Leibler (KL) divergence between the approximate posterior $q_\theta(\theta)$ and the prior $p(\theta)$. The hyperparameter $\beta$ controls the strength of the regularization, which can be defined as follows,
$$\beta = \min\left( \frac{\text{epoch}}{\text{num\_epochs} / 4},\; 1 \right).$$
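A short sketch of this schedule follows, assuming it is read as a linear warm-up of the KL weight over the first quarter of training followed by a constant value of 1. The function name `kl_weight` is illustrative.

```python
def kl_weight(epoch: int, num_epochs: int) -> float:
    """Linear warm-up of the KL weight over the first quarter of training."""
    warmup = num_epochs / 4
    if warmup <= 0:
        raise ValueError("num_epochs must be positive")
    return min(epoch / warmup, 1.0)


# Weight at a few points of a hypothetical 100-epoch run.
schedule = [kl_weight(e, 100) for e in (0, 10, 25, 80)]
```

Starting with a small weight lets the policy fit the data before the prior begins to pull the posterior back, which is a common motivation for annealing the KL term.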
The upper confidence bound (UCB) algorithm is used to select actions that maximize a combination of the expected reward and the uncertainty of the estimate of the value of the action. Uncertainty is captured by the variance of the approximate posterior distribution over the weights.
The uncertainty can be computed by sampling $\theta$ from the approximate posterior distribution $q_\theta(\theta)$ and computing the estimate of the action value $Q(s, a, \theta)$ for each action $a$ using the neural network with the sampled weights. The variance of the action value estimate is defined as
$$\mathrm{Var}_\theta\left[ Q(s, a, \theta) \right] = \mathrm{Var}_\theta\left[ z^{T}(s, a)\, \theta \right] = z^{T}(s, a)\, \mathrm{Var}_\theta[\theta]\, z(s, a),$$
where $z(s, a)$ is the feature vector for the state–action pair $(s, a)$. The UCB $U(s, a)$ for each action is then calculated as
$$U(s, a) = Q\left(s, a, \mu_\theta\right) + \beta \, \mathrm{Var}_\theta\left[ Q(s, a, \theta) \right],$$
where $\mu_\theta$ is the mean of the approximate posterior distribution $q_\theta(\theta)$, and $\beta$ is a hyperparameter that controls the balance between exploration and exploitation. The action with the highest upper confidence bound is selected as the next action to take.
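The selection rule can be sketched for the case the variance expression implies: an action value that is linear in the weights, with a diagonal posterior covariance. The feature vectors, posterior parameters, and function names below are made up for illustration.

```python
from typing import Dict, List


def q_variance(features: List[float], theta_var: List[float]) -> float:
    """z^T diag(theta_var) z for a diagonal posterior covariance."""
    return sum(z * z * v for z, v in zip(features, theta_var))


def ucb_scores(feature_map: Dict[int, List[float]], theta_mean: List[float],
               theta_var: List[float], beta: float) -> Dict[int, float]:
    """U(s, a) = Q(s, a, mu_theta) + beta * Var[Q(s, a, theta)] per action."""
    scores = {}
    for action, z in feature_map.items():
        q_mean = sum(zi * mi for zi, mi in zip(z, theta_mean))
        scores[action] = q_mean + beta * q_variance(z, theta_var)
    return scores


def select_action(scores: Dict[int, float]) -> int:
    """Return the action (epoch count) with the highest upper confidence bound."""
    return max(scores, key=scores.get)


# Two candidate epoch counts with hypothetical state-action features.
feature_map = {1: [1.0, 0.0], 5: [0.0, 1.0]}
theta_mean = [0.5, 0.4]     # posterior means
theta_var = [0.01, 0.5]     # posterior variances (diagonal)

greedy = select_action(ucb_scores(feature_map, theta_mean, theta_var, beta=0.0))
explore = select_action(ucb_scores(feature_map, theta_mean, theta_var, beta=1.0))
```

With `beta=0.0` the rule picks the action with the higher mean value (1 epoch here), while `beta=1.0` favors the action whose value is less certain (5 epochs), which illustrates the exploration and exploitation balance described above.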

4.1.2. Proposed Method

The optimum epoch selection can be represented as a Markov decision process (MDP) defined by the tuple $M = (S, A, P, R, \gamma)$. UA-RL is then used to explore the action space and select the optimal epoch for each IoT device to acquire the best accuracy. To achieve lower latency, SDN and MEC are employed in the core of heterogeneous industrial IoT networks. SDN facilitates the separation of the control plane from the data plane within the communication network. The control plane handles decision-making related to network traffic, while the data plane is responsible for the actual forwarding of this traffic. This study uses a multi-agent reinforcement learning approach, where each client is assigned a UA-RL agent that selects the epochs for that device. Figure 3 shows the detailed workflow of the UA-RL method in selecting local epochs for each client considering the status of each client. After monitoring the current state $S(t)$ of the assigned device at time step $t$, each agent performs an action $a(t)$. The central server calculates team rewards $R(t)$ considering the accuracy, available resources, and total cost of the training model. The aim of the UA-FedRL method is to acquire the maximum reward by exploring an optimal policy. The state space, action space, and reward can be defined as follows,
1.
State space: The state space of the UA-FedRL method at time step t consists of available computation resources G ( t ) = G 1 t , G 2 t , , G i t , model state, M ( t ) = M 1 t , M 2 t , , M i t , and epoch selection state, E ( t 1 ) = E 1 ( t 1 ) , E 2 ( t 1 ) , , E i ( t 1 ) . The state space of UA-FedRL can be defined as
S = e 1 , e 2 , , e N , m , g 1 , g 2 , , g N e i E for i = 1 , 2 , , N , m M , g i G for i = 1 , 2 , , N .
2. Action space: The action of the UA-FedRL method is to select epochs for each IoT device at each time step to find the optimal epoch for each device. The action space of the UA-FedRL model can be defined as
$$A = \{(a_1, a_2, \ldots, a_N) \mid a_i \in E,\ 1 \le a_i \le 30 \text{ for } i = 1, 2, \ldots, N\},$$
where $E$ is the set of possible epochs for each IoT device, and $a_i$ is the next epoch number for the $i$th device.
3. Reward space: The reward function is used by the agent to evaluate the action taken to find the optimal action, i.e., the optimal epoch number. The UA-FedRL model’s reward is defined as
R ( s , a , s ) = k i c i × T C i ,
where k is the model accuracy, c i is the loss value of the training model from the ith IoT device, and T C i is the total cost of training the model on the ith IoT device. The global reward can be defined as
$$R_t = \frac{1}{N} \sum_{i=1}^{N} R_i(s, a, s').$$
The detailed procedure used by the proposed UA-FedRL to select local training epochs is given in Algorithm 1. The algorithm starts by initializing the weights $\theta$ of the neural network with a Gaussian prior $p(\theta)$. Additionally, we initialize the hyperparameters $\alpha$, $\beta$, $\gamma$, and $N$, where $N$ represents the number of agents. During the training phase, the algorithm iterates over multiple episodes. Within each episode, the algorithm loops through several time steps $t$. At each time step, it observes the current state $s_t$ and iterates over the agents $i$. For each agent, we sample the weights $\theta_i$ from the approximate posterior distribution $q_{\theta_i}(\theta_i)$ and compute the action value estimates $Q_{i,t}(s_t, a; \theta_i)$ for all possible actions $a$. We then calculate the upper confidence bounds $U_{i,t}(s_t, a)$ for all actions and select the action $a_{i,t}$ that has the highest upper confidence bound. After determining the actions for all agents, we perform the composite action $a_t = [a_{1,t}, a_{2,t}, \ldots, a_{N,t}]$ and observe the reward $r_t$. The weights of the neural network $\theta$ are then updated using backpropagation with a loss function that incorporates the log-likelihood of the action, the difference between the observed reward and the estimated action value, and the KL divergence between the approximate posterior distribution and the prior distribution. The hyperparameter $\beta$ is updated using the calculation method given in Equation (14).
Algorithm 1 UA-FedRL with UCB-based epoch selection to maximize performance in heterogeneous IoT.
1: Initialize the neural network weights $\theta$ with Gaussian prior $p(\theta)$
2: Initialize the hyperparameters $\alpha$, $\beta$, $\gamma$, and $N$
3: for each episode do
4:     for each time step $t$ do
5:         Observe state $s_t$
6:         for each agent $i$ do
7:             Sample $\theta_i$ from the approximate posterior distribution $q_{\theta_i}(\theta_i)$
8:             Compute the action value estimates $Q_{i,t}(s_t, a; \theta_i)$ for all actions $a$
9:             Compute the upper confidence bounds $U_{i,t}(s_t, a)$ for all actions $a$
10:            Select action $a_{i,t} = \arg\max_a U_{i,t}(s_t, a)$
11:        end for
12:        Perform action $a_t = [a_{1,t}, a_{2,t}, \ldots, a_{N,t}]$
13:        Observe reward $r_t$
14:        Update the neural network weights $\theta$ using backpropagation with the loss equation:
L = 1 N i = 1 N t = 1 T i log π θ n ( a i , t s i , t ) 1 σ i , t σ i , t μ i , t 2 + log σ i , t + β D K L q θ ( θ ) p ( θ )
15:        Update the hyperparameter $\beta$
16:    end for
17: end for
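The per-agent selection in steps 7 to 10 of Algorithm 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the approximate posterior is replaced by a hypothetical list of sampled action-value vectors, the bound is assumed to be the sample mean plus $\beta$ times the sample standard deviation, and the names `ucb_select` and `select_epochs` are invented here.

```python
import math


def ucb_select(q_samples, beta):
    """Pick the epoch count with the highest upper confidence bound.

    q_samples: list of K lists; each inner list holds one sampled
               action-value estimate per candidate epoch count
               (an assumed stand-in for draws from the posterior).
    beta:      exploration weight.
    Returns a 1-based epoch count, matching the action range 1..30.
    """
    k = len(q_samples)
    n_actions = len(q_samples[0])
    best_action, best_bound = 0, -math.inf
    for a in range(n_actions):
        values = [sample[a] for sample in q_samples]
        mean = sum(values) / k
        var = sum((v - mean) ** 2 for v in values) / k
        bound = mean + beta * math.sqrt(var)  # assumed form of the bound
        if bound > best_bound:
            best_action, best_bound = a, bound
    return best_action + 1


def select_epochs(per_agent_samples, beta):
    """Composite action a_t = [a_1, ..., a_N]: one epoch count per agent."""
    return [ucb_select(samples, beta) for samples in per_agent_samples]
```

With $\beta = 0$ the rule reduces to greedy selection on the mean estimate; larger values of $\beta$ favor epoch counts whose estimates are still uncertain.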

4.2. Predictive Weighted Average Aggregation

PWA is designed to aggregate model updates from heterogeneous industrial IoT devices with different local epoch counts. The PWA method assigns a weight to each IoT device based on the quality of that device’s local model update and combines the weighted updates to form the global model. The PWA method first computes the weight quality $q_i$ for each device, which represents the degree of confidence in the accuracy of the model update from device $i$. We assume that the weight quality $q_i$ can be calculated using the predictive log-likelihood of the local model on the validation set. Let $D_i$ be the validation set from device $i$, and let $L(y_j, f(x_j; w))$ be the loss function that measures the discrepancy between the predicted output $f(x_j; w)$ and the true output $y_j$ for the $j$th data point in $D_i$. The total validation loss across all data points in the validation set can be expressed as follows,
$$\prod_{j \in D_i} L\left(y_j, f\left(x_j; w_i^{e_i}\right)\right).$$
The above expression represents the cumulative error of the model over all validation data points. Taking the logarithm turns the product of the losses into a sum, which is computationally more stable and easier to interpret:
$$\log \prod_{j \in D_i} L\left(y_j, f\left(x_j; w_i^{e_i}\right)\right).$$
In order to translate the validation performance of each model into a weight, we use the negative exponential function. The negative exponential of the total validation loss ensures that models with lower losses get higher weights. Additionally, we introduce a hyperparameter $\lambda$ to control the sensitivity of the weight to the validation performance. We can calculate the weight quality of device $i$ as
$$q_i = \exp\left(-\lambda \log \prod_{j \in D_i} L\left(y_j, f\left(x_j; w_i^{e_i}\right)\right)\right),$$
where $\lambda$ is a hyperparameter that controls the strength of the weighting function. A higher weight quality indicates that the local model update is more accurate and reliable, so it is given more weight during global model aggregation. The PWA method then combines the model updates from all devices by weighting them according to their quality measures. The updated global model parameter $w(t+1)$ is given by:
$$w(t+1) = \frac{1}{N} \sum_{i=1}^{N} q_i \times w(t+1, i).$$
Therefore, the model updates from devices with higher-quality measures contribute more to the global model than those with lower-quality measures, resulting in a more accurate and robust global model.
The weight adjustment in the PWA method is implemented to ensure the quality-based contribution and regularization with the hyperparameter λ . The weight q i reflects the predictive quality of the local model on device i, based on its validation performance, ensuring that models with a lower validation loss contribute more to the global update. This is especially important in non-IID environments, where data distributions vary significantly across devices. The parameter λ serves as a regularizer, controlling the sensitivity of the weight adjustment. A larger λ increases the impact of high-performing devices by accentuating differences in their predictive performance, while a smaller λ smooths out the contributions from all devices, resulting in a more uniform aggregation. This balance helps prevent the global model from overly relying on any single device, improving generalization and robustness.
The PWA method enhances overall model performance by improving convergence, reducing the heterogeneity gap, and increasing robustness. It improves convergence by prioritizing high-quality model updates and reducing the variance from poorly performing devices, leading to a faster convergence rate of $O(1/t^2)$ compared to the traditional $O(1/t)$ in FedAvg. The heterogeneity gap $H_i$, which measures the difference between local and global model updates, is also reduced, as devices with large gaps contribute less to the global model. This results in faster convergence and more accurate global models. Additionally, by assigning lower weights to outlier devices with skewed data distributions, PWA improves the robustness of the global model, making it more resilient to non-IID data. The regularization parameter $\lambda$ helps balance high- and low-quality contributions, preventing overfitting to any one device’s data.
The overall architecture of the PWA and the related pseudocode can be seen in Figure 4 and Algorithm 2, respectively. For further details, refer to the Supplementary Materials.
Algorithm 2 PWA for heterogeneous industrial IoT mitigation.
Require: $N$ devices with local datasets $D_i$ and local models $w_i$, quality measures $q_i$
Ensure: Global model parameter $w_{t+1}$
1: Set learning rate $\alpha$
2: for each round $t$ do
3:     for each device $i$ do
4:         Select a random subset $S_i$ of $D_i$
5:         Train local model $w_i$ on $S_i$ for $e_i$ epochs
6:         Compute quality measure: $q_i = \exp\left(-\lambda \log \prod_{j \in D_i} L\left(y_j, f\left(x_j; w_i^{e_i}\right)\right)\right)$
7:     end for
8:     Compute weighted average: $w(t+1) = \dfrac{\sum_{i=1}^{N} q_i \, w_i^{(e_i)}}{\sum_{j=1}^{N} q_j}$
9:     Update global model parameter: $w(t+1) = w_t - \alpha \nabla f(w_{t+1})$
10: end for
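Steps 6 and 8 of Algorithm 2 can be illustrated with a short Python sketch. It is a simplified reading rather than the authors' code: model weights are assumed to be flat lists of floats, per-sample validation losses are assumed to be strictly positive, and the function names `quality` and `pwa_aggregate` are hypothetical. The log of the product is computed as a sum of logs, which is mathematically equivalent and avoids underflow.

```python
import math


def quality(losses, lam):
    """q_i = exp(-lam * log(prod_j L_j)), evaluated as a sum of logs.

    losses: per-sample validation losses of one device (all > 0).
    lam:    sensitivity hyperparameter (lambda in the text).
    """
    log_prod = sum(math.log(loss) for loss in losses)
    return math.exp(-lam * log_prod)


def pwa_aggregate(local_weights, qualities):
    """Quality-weighted average of flat weight vectors (step 8)."""
    total = sum(qualities)
    dim = len(local_weights[0])
    return [
        sum(q * w[d] for q, w in zip(qualities, local_weights)) / total
        for d in range(dim)
    ]
```

When all devices have the same quality, `pwa_aggregate` reduces to a plain average of the local weights; a device with lower validation loss pulls the aggregate toward its own parameters.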

5. Experimental Evaluation

5.1. Experimental Setup

The experiments with the UA-FedRL model were carried out on an Ubuntu 18.04 desktop with an AMD Ryzen 5 3500X CPU @ 3.6 GHz, an Nvidia RTX 3080 GPU, and 32 GB of RAM. The code was written in Python 3.8. The communication network was built with the Mininet emulator, which is written in Python and is publicly available at the Mininet official website (http://mininet.org/, accessed on 1 August 2024). It uses the Linux kernel to create a real network with virtualized end-hosts, switches, routers, and links, and it provides built-in support for the SDN architecture. Two commonly used datasets, MNIST and CIFAR-10, were selected to train local IoT devices. The local datasets were distributed across IoT devices in a non-IID manner to create heterogeneous scenarios. A Convolutional Neural Network (CNN) model was designed and distributed to all participating IoT devices for training their local models. The architecture of this CNN comprised two convolutional layers, each with a kernel size of five. The network was optimized using the SGD method with a learning rate of 0.01. For the UA-FedRL algorithm, several parameter settings were chosen to investigate its performance under various conditions. The learning rate $\alpha$ of UA-FedRL was set to 0.1, 0.5, and 0.9 to study the algorithm's behavior under gradual, moderate, and rapid learning. Similarly, the discount factor $\gamma$ was set to 0.1, 0.5, and 0.9 to analyze the agent’s preference for immediate rewards as opposed to long-term gains. The episode counts ($T$) were chosen as 20 and 100 in order to discern the relationship between the number of learning iterations and the overall performance of the algorithm. The UA-FedRL model was benchmarked against several state-of-the-art FL models, including FedAVG, FedProx, FedShare, and FedSGD.
Default parameter values, simulation environment, and models’ hyperparameter settings used for the experiments are presented in Table 2.
The performance evaluation of our UA-FedRL method was conducted by simulating 100 client devices, each with varying resource capabilities. Five Raspberry Pi models were selected for this test due to their common use as IoT devices in real-world environments. These Raspberry Pi variants offered a range of CPU frequencies, from 700 MHz to 1.5 GHz, and varied battery capacities, from 60% to 100%. The Raspberry Pi 4 Model B was identified as the primary IoT device, while the remaining devices were categorized as stragglers. Detailed hardware specifications for each chosen device can be seen in Figure 5.
A communication simulator was also introduced to both client- and server-side training to simulate communication between server and client. The Mininet 2.3.0 Python simulator was used to configure every client as a host, which is similar to IoT devices in a real-world environment. The model parameters from each host were then converted to a byte stream prior to being sent to the server. The byte stream was then divided into packets according to the size of the packet. These packets were then sent to the server side to further process the model parameters. The overall data flow of the implemented communication simulator between the client and server can be seen in Figure 6.
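The client-to-server path described above (parameters to byte stream, byte stream to packets, packets back to parameters) can be sketched as follows. The encoding is an assumption made for illustration: the paper does not state the wire format, so little-endian 64-bit floats are used here, and `to_packets` and `from_packets` are hypothetical helper names rather than Mininet APIs.

```python
import struct


def to_packets(params, packet_size):
    """Client side: serialize floats and cut the stream into packets.

    params:      flat list of model parameters.
    packet_size: maximum payload size in bytes (assumed configurable).
    """
    stream = struct.pack(f"<{len(params)}d", *params)
    return [stream[i:i + packet_size]
            for i in range(0, len(stream), packet_size)]


def from_packets(packets):
    """Server side: join the packets and decode the float64 parameters."""
    stream = b"".join(packets)
    return list(struct.unpack(f"<{len(stream) // 8}d", stream))
```

Three parameters occupy 24 bytes, so a 10-byte packet size yields three packets, the last one carrying the remaining 4 bytes.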

5.2. Non-IID Data Distribution Method

The non-IID Data distribution was designed to distribute a dataset into subsets across multiple clients. This was achieved by first sorting the dataset based on labels, then dividing it into shards, and finally distributing these shards across clients. The method ensured that each client received a specific number of shards, but the data within those shards were not uniformly distributed across classes.
1. Sorting by labels: Given a dataset $D$ with $n$ samples, each associated with a label from a set $L$, the dataset is sorted on the basis of these labels. This results in a sequence $D'$ in which samples with the same label are grouped together.
2. Shard creation: The sorted dataset $D'$ is divided into $N$ shards, each containing $S$ samples. Thus, each shard $\text{Shard}_i$ is defined as:
$$\text{Shard}_i = \{d_{(i-1)S+1}, d_{(i-1)S+2}, \ldots, d_{iS}\}.$$
3. Data allocation to clients: For each client $c$, a specific number $X$ of shards are randomly selected without replacement from the set of all shards. The union of samples from these selected shards forms the non-IID dataset for client $c$. Mathematically, the dataset $D_c$ for client $c$ is:
$$D_c = \bigcup_{k=1}^{X} \text{Shard}_{i_k},$$
where $i_1, i_2, \ldots, i_X$ are the indices of the shards selected for client $c$.
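The three steps above can be sketched in a few lines of Python. This is an illustrative reading, not the authors' script: it assumes that shards are drawn without replacement across all clients (so no two clients share a shard), and the function name `shard_partition` and its arguments are hypothetical.

```python
import random


def shard_partition(labels, num_shards, shards_per_client, num_clients, seed=0):
    """Return, for each client, the list of sample indices it receives."""
    if num_clients * shards_per_client > num_shards:
        raise ValueError("not enough shards for the requested allocation")
    # Step 1: sort sample indices by label.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    # Step 2: cut the sorted sequence into shards of S samples each.
    size = len(order) // num_shards
    shards = [order[k * size:(k + 1) * size] for k in range(num_shards)]
    # Step 3: shuffle shard ids once, then hand X shards to each client.
    shard_ids = list(range(num_shards))
    random.Random(seed).shuffle(shard_ids)
    clients = []
    for c in range(num_clients):
        picked = shard_ids[c * shards_per_client:(c + 1) * shards_per_client]
        clients.append([i for s in picked for i in shards[s]])
    return clients
```

Because each shard comes from a label-sorted sequence, a client holding $X$ shards sees only a few classes; fewer shards per client gives a more skewed (higher non-IID) split.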
This study considered three non-IID data distribution scenarios, namely low (20%), medium (50%), and high (80%), to evaluate the performance of the UA-FedRL model. The parameters used to create these scenarios are listed in Table 3.

5.3. Hyperparameter Optimization

Figure 7 shows the reward values obtained by UA-FedRL over the course of 20 episodes for three different discount factors. The x-axis represents the episodes, while the y-axis represents the reward values. The learning rate was fixed at 0.1 and different discount factors were used to acquire rewards for epoch selection.
It can be seen that the rewards fluctuated during the initial episodes until they converged between 0.9 and 0.98. The discount factor therefore influences the rate and magnitude of the agent’s reward improvement over time. In this case, however, the different discount factors produced similar results, suggesting that the agent was able to learn effectively regardless of the specific discount factor chosen.
Figure 8 shows the reward values obtained by UA-FedRL over the course of 20 episodes for a learning rate of 0.5 and three different discount factors (0.1, 0.5, and 0.9). The x-axis represents the episodes, while the y-axis represents the reward values. Each discount factor is represented by a different line style in the plot.
For each discount factor, the agent’s reward value started at a relatively low value and increased over the course of the episodes, eventually becoming stable at a high value. The three discount factors produced different results, with the rewards for each discount factor being significantly different.
The acquired result demonstrates that the choice of learning rate has a significant impact on the rate and magnitude of the agent’s reward improvement over time. In that case, the choice of learning rate resulted in significant differences in the final reward values obtained by the agent.
Figure 9 shows the rewards acquired by UA-FedRL with a learning rate of 0.9 and different discount factors. UA-FedRL was run for 20 episodes, and it can be seen that the rewards increased over the episodes. However, the rewards with a learning rate of 0.9 were slightly lower than those obtained with the other learning rates.
The acquired result clearly indicates that the selection of the learning rate considerably influences the speed and extent of the agent’s reward enhancement over time. In that case, the chosen learning rate caused notable differences in the acquired reward values garnered by the agent.
The optimal parameters were selected after evaluating the accumulated rewards of the different combinations. The UA-FedRL model acquired the highest rewards when the learning rate $\alpha$ and the discount factor $\gamma$ were both set to 0.1. These values were therefore used in the remaining experiments to evaluate the performance of the UA-FedRL model.

5.4. Performance Comparison on MNIST Dataset

Figure 10 shows a comparison of the performance of the UA-FedRL method with a number of existing FL methods. This comparison assessed the precision achieved after 100 rounds of training, with 100 clients selected for all FL methods. All models were trained on non-IID data, which are commonly encountered in real-world scenarios.
As shown in Figure 10, the Fed_AVG, Fed_Prox, and UA-FedRL methods provided superior results compared to the Fed_SGD and Fed_Share approaches. The Fed_Prox method, specifically designed to address heterogeneity and non-IID data in IoT contexts, demonstrated its effectiveness in the graph. The UA-FedRL method, in contrast, employs Bayesian techniques to tackle heterogeneity in IoT devices and leverages predictive weighted aggregation to enhance robustness when learning from non-IID data.
By achieving an accuracy of 96.45%, the UA-FedRL method outperformed existing FL approaches for IoT devices. This result suggests that the UA-FedRL model is well suited for heterogeneous industrial IoT device scenarios, ensuring more robust learning capabilities in real-world applications. The detailed accuracy of all FL methods used in this experiment is shown in Table 4.
The three best FL methods from Figure 10 were further evaluated to demonstrate their effectiveness in handling straggling devices with different non-IID data distributions. In real-world scenarios, IoT devices in an FL network may possess varying computational capacities. Consequently, it is essential to conduct experiments simulating these conditions to identify suitable methods that accommodate heterogeneous industrial IoT devices.
Figure 11 describes the performance of three methods tested across 100 clients, 90% of which were identified as stragglers. These methods were trained for 100 rounds and assessed under various non-IID data distributions. Figure 11a provides an assessment of the Fed_Avg method under three distinct non-IID conditions: low, medium, and high. While Fed_Avg reached accuracy above 90% under low non-IID data conditions, its accuracy decreased notably at higher levels of non-IID data distribution. This pattern underlines Fed_Avg’s limited adaptability to higher heterogeneity in data and training environments.
In contrast, both the Fed_Prox and UA-FedRL methodologies presented robust performance metrics under varying non-IID conditions, as depicted in Figure 11b,c, respectively. Both methodologies surpassed the 90% accuracy benchmark on various non-IID data spectrums. The architectural premise of the Fed_Prox method is calibrated to accommodate the inherent heterogeneity of IoT devices, incorporating a proximal component to counterbalance disparities during global model updates.
The UA-FedRL algorithm adopts an innovative approach by integrating Bayes by backprop coupled with variational inference reinforcement learning. This fusion facilitates dynamic adjustments to local epochs, depending upon individual device computational capabilities. The introduction of the predictive weighted average aggregation method thereby enhances the cumulative accuracy coefficient of the UA-FedRL framework. UA-FedRL achieved an accuracy of 96.45%, surpassing its FL counterparts. The results acquired for all participating algorithms can be seen in Table 5. This result demonstrates the potential of the proposed method for the effective management of heterogeneous industrial IoT devices in FL networks.

5.5. Performance Comparison on CIFAR-10 Dataset

Figure 12 compares the performance of various FL methods based on their accuracy. The purpose of this comparison was to highlight the effectiveness of the UA-FedRL method in comparison to existing FL techniques. The Fed_Avg method, which is based on the standard federated averaging algorithm, obtained an accuracy of 50.95%. This approach involves a weighted averaging of local model updates from participating clients to update the global model. Despite its popularity, the relatively low accuracy rate of the Fed_Avg method for non-IID data distribution on the CIFAR10 dataset reveals potential limitations in its effectiveness for non-IID datasets. The Fed_Prox method, which is an extension of the Fed_Avg algorithm, introduces a proximity term to penalize local updates that deviate significantly from the global model. This method achieved an accuracy of 60.37%, indicating a significant improvement over the Fed_Avg method. The Fed_SGD method, which employs SGD in an FL environment, yielded an accuracy of 45.98%. Although this approach is simple to implement and has been extensively studied, the results demonstrate that it may not always be the most suitable choice to achieve the highest accuracy rates in FL. The Fed_Share method, a communication-efficient approach that reduces the quantity of data exchanged between clients and the central server by sharing selected model parameters, achieved an accuracy of 43.63%. The significantly lower accuracy rate compared to the other methods indicates that the increases in communication efficiency may come at the cost of reduced model performance.
However, the UA-FedRL method outperformed all other techniques by achieving an accuracy rate of 62.75%. This result demonstrates the prospective benefits of the UA-FedRL method to improve the overall performance of FL models. The increased accuracy rate demonstrates that the new approach can offer significant benefits over existing methods and pave the way for more effective applications of FL in heterogeneous industrial IoT scenarios. The acquired accuracy of the implemented models can be seen in Table 6.
We further evaluated the effectiveness of the top three FL methods from Figure 12 in handling straggling devices, a common challenge in real-world scenarios where IoT devices in an FL network exhibit varying computational capacities. Figure 13 presents a comparison of these three FL strategies, evaluating their accuracy on the CIFAR-10 dataset with 90% stragglers and different non-IID levels. The traditional Fed_Avg method, which employs weighted averaging of client models for global updates, reached a maximum accuracy of 50% under low non-IID data, but that accuracy decreased with higher data heterogeneity. Fed_Prox, an extension of federated averaging, achieved an accuracy of around 55%. Its accuracy was also affected by high non-IID data distributions, dropping to 44%.
In contrast, the UA-FedRL mechanism introduces Bayes by backprop and variational inference reinforcement learning, allowing adaptive local epoch adjustments based on individual device capabilities. Furthermore, the integration of the predictive weighted average aggregation technique enables UA-FedRL to consistently demonstrate superior accuracy, surpassing 60%. Unlike its counterparts, this method exhibited good stability, maintaining consistent performance at varying levels of non-IID data, as detailed in Table 7.
These findings demonstrate the potential of the UA-FedRL method to effectively manage heterogeneous industrial IoT devices in FL networks. The superior performance of the UA-FedRL method highlights its applicability in scenarios where IoT devices with varying computational capacities are prevalent, ensuring efficient and accurate FL outcomes.

5.6. Ablation Study

An ablation study was conducted to observe the individual and combined effects of the UA-RL and PWA modules in our UA-FedRL approach, which can be seen in Table 8. When the UA-RL and PWA modules were implemented independently, the accuracy of the model was increased to 93.34% and 92.54% in the MNIST dataset and 60.95% and 60.34% in the CIFAR-10 dataset, respectively. This highlights the effectiveness of the UA-RL to effectively select epochs in heterogeneous industrial IoT settings and the PWA’s ability to boost model accuracy by aggregating model weights in line with their quality.
However, the UA-FedRL approach achieved good accuracy when both submodules were combined together, achieving an accuracy of 96.87% and 32.35% on the MNIST and CIFAR-10 datasets, respectively. This outcome validates the dynamic combination between the UA-RL and PWA within the UA-FedRL method.
Table 9 shows the accuracy of UA-FedRL as the number of IoT devices increased on the MNIST and CIFAR-10 datasets. For MNIST, accuracy slightly decreased as the number of devices increased, starting from 96.87% with 100 devices to 96.04% with 250 devices. Similarly, on CIFAR-10, accuracy dropped from 62.73% with 100 devices to 61.98% with 250 devices. It suggests that as more devices participate, the challenge of handling diverse data distributions may slightly affect the overall accuracy.

5.7. Communication Efficiency

Figure 14 presents a comparative study of the normalized communication costs associated with different FL methods, Fed_Share, Fed_SGD, Fed_AVG, Fed_Prox, and UA-FedRL, utilizing two datasets, MNIST and CIFAR10. Each method is represented on the x-axis, while the y-axis quantifies the associated normalized communication cost. Two distinct bars represent each FL method, corresponding to the MNIST (light blue) and CIFAR-10 (light yellow) datasets. The experiment was designed to achieve a target accuracy of 90% for the MNIST dataset and 45% for the CIFAR-10 dataset. Different FL methods with 100 clients were implemented to calculate the normalized communication cost for each dataset. The detailed calculation description of normalized communication cost for a single FL model is given below. The communication cost for each device during a communication round was calculated using the following equation:
$$C_i(t) = \text{bits}\left(w(t+1, i)\right) + \text{bits}\left(ID_i\right).$$
The total communication cost for all devices at round $t$ was then calculated as follows:
$$C_T(t) = \sum_{i=1}^{N} C_i(t).$$
The cumulative communication cost across all rounds to achieve the target accuracy was defined as:
$$C_{\text{cumulative}} = \sum_{t=1}^{T} C_T(t).$$
Lastly, the normalized communication cost for MNIST and CIFAR-10, given the target accuracies of 90% and 45%, respectively, was calculated as:
$$C_{\text{normalized},\text{MNIST}} = \frac{C_{\text{cumulative}} - C_{\min,\text{MNIST}}}{C_{\max,\text{MNIST}} - C_{\min,\text{MNIST}}},$$
where $C_{\min,\text{MNIST}}$ is the minimum communication cost achieved to reach the target accuracy and $C_{\max,\text{MNIST}}$ is the maximum possible communication cost given a 90% accuracy target for MNIST.
$$C_{\text{normalized},\text{CIFAR-10}} = \frac{C_{\text{cumulative}} - C_{\min,\text{CIFAR-10}}}{C_{\max,\text{CIFAR-10}} - C_{\min,\text{CIFAR-10}}},$$
where $C_{\min,\text{CIFAR-10}}$ is the minimum communication cost achieved to reach the target accuracy and $C_{\max,\text{CIFAR-10}}$ is the maximum possible communication cost given a 45% accuracy target for the CIFAR-10 dataset.
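The cost bookkeeping above amounts to a per-round sum followed by min-max normalization, which can be sketched in Python as follows. The example assumes, for simplicity, that every device uploads the same number of parameter bits per round; the function names and the numeric values in the usage note are illustrative and do not come from the paper.

```python
def round_cost(param_bits, id_bits, num_devices):
    """C_T(t): each device sends its parameters plus its identifier."""
    return num_devices * (param_bits + id_bits)


def cumulative_cost(costs_per_round):
    """C_cumulative: sum of C_T(t) over the rounds needed to hit the target."""
    return sum(costs_per_round)


def normalize(c_cum, c_min, c_max):
    """Min-max normalization of the cumulative cost to the range [0, 1]."""
    return (c_cum - c_min) / (c_max - c_min)
```

For instance, 100 devices each sending 1000 parameter bits and a 32-bit identifier give a round cost of 103,200 bits; a method that reaches the target accuracy in fewer rounds has a lower cumulative cost and therefore a normalized cost closer to 0.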
The results represent the variation in efficiency among FL methods to reduce communication costs. UA-FedRL emerged as the most communication-efficient method for both datasets, demonstrating costs of 0.19 and 0.24 for MNIST and CIFAR-10, respectively. On the other hand, Fed_Share exhibited the highest communication costs for both datasets, peaking particularly for CIFAR-10 with a cost of 0.9. Therefore, the chart highlights the critical role of selecting the appropriate FL method for optimizing communication costs in heterogeneous industrial IoT scenarios.

5.8. Computation Efficiency

Figure 15 shows a comparative study of the energy consumption related to a range of FL methods, namely, Fed_Share, Fed_SGD, Fed_AVG, Fed_Prox, and UA-FedRL. These methods were tested on two distinct datasets, MNIST and CIFAR-10. The normalized energy consumption, presented on the y-axis, effectively illustrates the efficiency of each method in terms of energy utilization. Each method’s impact is separately depicted for both MNIST and CIFAR-10 datasets using sky-blue and salmon-colored bars, respectively. The experiment was designed to achieve a target accuracy of 90% for the MNIST dataset and 45% for the CIFAR-10 dataset. The energy consumption for each device during a communication round was calculated using the following equation:
$$E_i(t) = P_i.$$
The total energy consumption for all devices at round $t$ was then calculated as follows:
$$E_T(t) = \sum_{i=1}^{N} E_i(t).$$
The cumulative energy consumption across all rounds to achieve the target accuracy was defined as:
$$E_{\text{cumulative}} = \sum_{t=1}^{T} E_T(t).$$
Lastly, the normalized energy consumption for MNIST and CIFAR-10, given the target accuracies of 90% and 45%, respectively, was calculated as:
$$E_{\text{normalized},\text{MNIST}} = \frac{E_{\text{cumulative}} - E_{\min,\text{MNIST}}}{E_{\max,\text{MNIST}} - E_{\min,\text{MNIST}}},$$
where $E_{\min,\text{MNIST}}$ is the minimum energy consumption achieved to reach the target accuracy and $E_{\max,\text{MNIST}}$ is the maximum possible energy consumption given a 90% accuracy target for MNIST.
$$E_{\text{normalized},\text{CIFAR-10}} = \frac{E_{\text{cumulative}} - E_{\min,\text{CIFAR-10}}}{E_{\max,\text{CIFAR-10}} - E_{\min,\text{CIFAR-10}}},$$
where $E_{\min,\text{CIFAR-10}}$ is the minimum energy consumption achieved to reach the target accuracy and $E_{\max,\text{CIFAR-10}}$ is the maximum possible energy consumption given a 45% accuracy target for the CIFAR-10 dataset.
The UA-FedRL method emerged as the most energy-efficient model across both datasets, with the lowest normalized energy consumption values of 0.25 and 0.19 for MNIST and CIFAR-10, respectively. In contrast, Fed_SGD consumed the most energy on the CIFAR-10 dataset with a value of 0.78, while Fed_AVG showed the highest energy use on the MNIST dataset at 0.86. These results highlight how the energy efficiency of FL methods differs across datasets.
Table 10 presents a comprehensive comparison of different federated learning algorithms, including Fed_Share, Fed_SGD, Fed_AVG, Fed_Prox, and UA-FedRL, across three metrics: accuracy (with 95% confidence intervals), communication cost, and energy consumption on both the MNIST and CIFAR-10 datasets. UA-FedRL demonstrated the highest accuracy on both datasets (96.87% on MNIST and 62.73% on CIFAR-10), which reflects its effectiveness in handling non-IID data distributions. Additionally, UA-FedRL had the lowest communication cost (0.19 for MNIST and 0.24 for CIFAR-10) and energy consumption (0.25 for MNIST and 0.19 for CIFAR-10) which demonstrates its efficiency in resource-constrained environments. Other methods like Fed_AVG and Fed_Prox achieved competitive accuracy but with higher communication costs and energy consumption, which indicates a trade-off between model performance and resource utilization.

5.9. Uncertainty Estimation

Figure 16 illustrates the performance of an agent using UCB and uncertainty estimation in a reinforcement learning setting. The x-axis represents the number of episodes, while the y-axis represents the expected reward obtained by the agent. The blue dots represent the rewards obtained by the agent during each episode, while the solid red line represents the expected reward predicted by the UCB algorithm. The shaded pink area around the red line represents the uncertainty estimate of the algorithm.
Overall, the UCB algorithm performed better as it achieved a higher expected reward in fewer episodes. Moreover, the uncertainty estimate of the UCB algorithm was not large, indicating that the UCB algorithm was more certain about its predictions. Therefore, the UA-FedRL method can help guide the development of more effective reinforcement learning algorithms in the future for FL applications.

6. Conclusions

This study presented UA-FedRL, a method designed to address the challenges of implementing FL in heterogeneous industrial IoT environments. By dynamically selecting the number of local epochs for each client, UA-FedRL handled non-IID datasets and straggler IoT devices while improving accuracy, computation efficiency, and communication efficiency. The proposed PWA method further improved performance by weighting each local model according to its quality during aggregation. The method was evaluated extensively on two commonly used datasets, MNIST and CIFAR-10. UA-FedRL obtained an accuracy of 96.45% on MNIST and 62.75% on CIFAR-10 when 90% of the IoT devices were stragglers. Furthermore, the uncertainty estimation showed that the UCB algorithm performs well in decision-making tasks. These results demonstrate that UA-FedRL outperformed the benchmark methods in both convergence speed and training accuracy on both datasets, indicating its potential for enhancing FL in heterogeneous industrial IoT environments.
Although UA-FedRL performed well, it has limitations. First, it relies on stable network conditions: unstable connectivity between the devices and the central server can degrade overall performance. Second, its performance may vary with the degree of data heterogeneity, because highly variable data distributions might reduce the effectiveness of the quality-based weighting mechanism. To address these limitations, future research could explore adaptive communication strategies to optimize network performance and data normalization techniques to better handle extreme data variability. The PWA method could also be extended by integrating additional factors, such as temporal stability, into the aggregation. Finally, clustered aggregation could be investigated to improve robustness in highly heterogeneous environments.
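The idea of quality-based weighting can be illustrated with a short sketch. It assumes that each client i supplies a flat list of model weights and a non-negative quality score q_i, and that the server forms the normalized weighted average of the client weights. How PWA actually computes q_i is not reproduced here, and the function name `quality_weighted_average` is hypothetical.

```python
def quality_weighted_average(client_weights, quality_scores):
    """Aggregate flat weight vectors, weighting each client by its quality score.

    client_weights: list of equal-length lists of floats, one per client.
    quality_scores: list of non-negative floats q_i, one per client (assumed given).
    """
    if not client_weights or len(client_weights) != len(quality_scores):
        raise ValueError("need one quality score per client")
    total_quality = sum(quality_scores)
    if total_quality <= 0:
        raise ValueError("quality scores must sum to a positive value")

    dim = len(client_weights[0])
    aggregated = [0.0] * dim
    for weights, quality in zip(client_weights, quality_scores):
        if len(weights) != dim:
            raise ValueError("all clients must share the same model size")
        share = quality / total_quality  # normalized contribution of this client
        for j, value in enumerate(weights):
            aggregated[j] += share * value
    return aggregated
```

With equal scores the function reduces to the plain average used by Fed_AVG, so any gain in this sketch comes only from how well the scores reflect the quality of each local model.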

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/app14188299/s1.

Author Contributions

Methodology, A.S.M.S.S. and M.Z.I.; software and coding, M.Z.I. and A.S.M.S.S.; experimentation and formal analysis, M.Z.I. and A.S.M.S.S.; writing—original draft preparation, M.Z.I., and A.S.M.S.S.; writing—review and editing, M.Z.I., A.S.M.S.S., and A.H.; supervision, H.-S.K. and A.H.; funding acquisition, M.Z.I. and H.-S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) Grant by the Korean Government through MSIT under Grant 2022R1F1A1063662 and the Strengthening R&D Capability Program of Sejong University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, N.; Fang, X.; Wang, Y.; Wu, S.; Wu, H.; Kar, D.; Zhang, H. Physical-Layer Authentication for Internet of Things via WFRFT-Based Gaussian Tag Embedding. IEEE Internet Things J. 2020, 7, 9001–9010. [Google Scholar] [CrossRef]
  2. Elbir, A.M.; Coleri, S.; Papazafeiropoulos, A.K.; Kourtessis, P.; Chatzinotas, S. A Hybrid Architecture for Federated and Centralized Learning. IEEE Trans. Cogn. Commun. Netw. 2022, 8, 1529–1542. [Google Scholar] [CrossRef]
  3. Sheller, M.J.; Edwards, B.; Reina, G.A.; Martin, J.; Pati, S.; Kotrotsou, A.; Milchenko, M.; Xu, W.; Marcus, D.; Colen, R.R.; et al. Federated learning in medicine: Facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 2020, 10, 12598. [Google Scholar] [CrossRef] [PubMed]
  4. Zanbouri, K.; Darbandi, M.; Nassr, M.; Heidari, A.; Navimipour, N.J.; Yalcın, S. A GSO-based multi-objective technique for performance optimization of blockchain-based industrial Internet of things. Int. J. Commun. Syst. 2024, 37, e5886. [Google Scholar] [CrossRef]
  5. Heidari, A.; Shishehlou, H.; Darbandi, M.; Navimipour, N.J.; Yalcin, S. A reliable method for data aggregation on the industrial internet of things using a hybrid optimization algorithm and density correlation degree. Clust. Comput. 2024, 27, 7521–7539. [Google Scholar] [CrossRef]
  6. Banabilah, S.; Aloqaily, M.; Alsayed, E.; Malik, N.; Jararweh, Y. Federated learning review: Fundamentals, enabling technologies, and future applications. Inf. Process. Manag. 2022, 59, 103061. [Google Scholar] [CrossRef]
  7. Antunes, R.S.; André da Costa, C.; Küderle, A.; Yari, I.A.; Eskofier, B. Federated Learning for Healthcare: Systematic Review and Architecture Proposal. ACM Trans. Intell. Syst. Technol. 2022, 13, 54:1–54:23. [Google Scholar] [CrossRef]
  8. Hard, A.; Rao, K.; Mathews, R.; Beaufays, F.; Augenstein, S.; Eichner, H.; Kiddon, C.; Ramage, D. Federated Learning for Mobile Keyboard Prediction. arXiv 2018, arXiv:1811.03604. [Google Scholar]
  9. Xianjia, Y.; Queralta, J.P.; Heikkonen, J.; Westerlund, T. Federated Learning in Robotic and Autonomous Systems. Procedia Comput. Sci. 2021, 191, 135–142. [Google Scholar] [CrossRef]
  10. Ali, M.; Naeem, F.; Tariq, M.; Kaddoum, G. Federated Learning for Privacy Preservation in Smart Healthcare Systems: A Comprehensive Survey. IEEE J. Biomed. Health Informatics 2023, 27, 778–789. [Google Scholar] [CrossRef]
  11. Nguyen, D.C.; Pham, Q.V.; Pathirana, P.N.; Ding, M.; Seneviratne, A.; Lin, Z.; Dobre, O.; Hwang, W.J. Federated Learning for Smart Healthcare: A Survey. ACM Comput. Surv. 2022, 55, 60:1–60:37. [Google Scholar] [CrossRef]
  12. Yang, W.; Xiang, W.; Yang, Y.; Cheng, P. Optimizing Federated Learning with Deep Reinforcement Learning for Digital Twin Empowered Industrial IoT. IEEE Trans. Ind. Informatics 2023, 19, 1884–1893. [Google Scholar] [CrossRef]
  13. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Atatistics, PMLR, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282. [Google Scholar]
  14. Sahu, A.K.; Li, T.; Sanjabi, M.; Zaheer, M.; Talwalkar, A.; Smith, V. Federated Optimization in Heterogeneous Networks. arXiv 2018, arXiv:1812.06127. [Google Scholar]
  15. Wang, H.; Kaplan, Z.; Niu, D.; Li, B. Optimizing Federated Learning on Non-IID Data with Reinforcement Learning. In Proceedings of the IEEE INFOCOM 2020—IEEE Conference on Computer Communications, Toronto, ON, Canada, 6–9 July 2020; pp. 1698–1707. [Google Scholar] [CrossRef]
  16. Zhang, H.; Xie, Z.; Zarei, R.; Wu, T.; Chen, K. Adaptive Client Selection in Resource Constrained Federated Learning Systems: A Deep Reinforcement Learning Approach. IEEE Access 2021, 9, 98423–98432. [Google Scholar] [CrossRef]
  17. Zhang, P.; Wang, C.; Jiang, C.; Han, Z. Deep Reinforcement Learning Assisted Federated Learning Algorithm for Data Management of IIoT. IEEE Trans. Ind. Informatics 2021, 17, 8475–8484. [Google Scholar] [CrossRef]
  18. Han, M.; Sun, X.; Zheng, S.; Wang, X.; Tan, H. Resource Rationing for Federated Learning with Reinforcement Learning. In Proceedings of the 2021 Computing, Communications and IoT Applications (ComComAp), Shenzhen, China, 26–28 November 2021; pp. 150–155. [Google Scholar] [CrossRef]
  19. Rjoub, G.; Wahab, O.A.; Bentahar, J.; Bataineh, A. Trust-driven reinforcement selection strategy for federated learning on IoT devices. Computing 2022, 106, 1273–1295. [Google Scholar] [CrossRef]
  20. Zhang, S.Q.; Lin, J.; Zhang, Q. A Multi-Agent Reinforcement Learning Approach for Efficient Client Selection in Federated Learning. Proc. AAAI Conf. Artif. Intell. 2022, 36, 9091–9099. [Google Scholar] [CrossRef]
  21. Chen, X.; Li, Z.; Ni, W.; Wang, X.; Zhang, S.; Xu, S.; Pei, Q. Two-Phase Deep Reinforcement Learning of Dynamic Resource Allocation and Client Selection for Hierarchical Federated Learning. In Proceedings of the 2022 IEEE/CIC International Conference on Communications in China (ICCC), Foshan, China, 11–13 August 2022; pp. 518–523. [Google Scholar] [CrossRef]
  22. Park, S.; Suh, Y.; Lee, J. FedPSO: Federated Learning Using Particle Swarm Optimization to Reduce Communication Costs. Sensors 2021, 21, 600. [Google Scholar] [CrossRef]
  23. Chen, S.; Shen, C.; Zhang, L.; Tang, Y. Dynamic Aggregation for Heterogeneous Quantization in Federated Learning. IEEE Trans. Wirel. Commun. 2021, 20, 6804–6819. [Google Scholar] [CrossRef]
  24. Guo, E.; Wang, X.; Wu, W. Adaptive Aggregation Weight Assignment for Federated Learning: A Deep Reinforcement Learning Approach. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS ’22, Online, 9–13 May 2022; pp. 1610–1612. [Google Scholar]
  25. Jayaram, K.R.; Muthusamy, V.; Thomas, G.; Verma, A.; Purcell, M. Adaptive Aggregation For Federated Learning. In Proceedings of the 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17–20 December 2022; pp. 180–185. [Google Scholar] [CrossRef]
  26. Xu, G.; Kong, D.L.; Chen, X.B.; Liu, X. Lazy Aggregation for Heterogeneous Federated Learning. Appl. Sci. 2022, 12, 8515. [Google Scholar] [CrossRef]
  27. Li, C.J.; Huang, P.H.; Ma, Y.T.; Hung, H.; Huang, S.Y. Robust Aggregation for Federated Learning by Minimum γ-Divergence Estimation. Entropy 2022, 24, 686. [Google Scholar] [CrossRef] [PubMed]
  28. Nguyen, D.V.; Tran, A.K.; Zettsu, K. FedProb: An Aggregation Method Based on Feature Probability Distribution for Federated Learning on Non-IID Data. In Proceedings of the 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17–20 December 2022; pp. 2875–2881. [Google Scholar] [CrossRef]
  29. Han, J.; Han, Y.; Huang, G.; Ma, Y. DeFL: Decentralized Weight Aggregation for Cross-silo Federated Learning. arXiv 2022, arXiv:2208.00848. [Google Scholar] [CrossRef]
Figure 1. A generic architecture of the FL framework for IoT scenarios.
Figure 2. Scenario of heterogeneous FL in an IoT network environment. This study focuses on the heterogeneity in device specifications and the non-Independent and Identically Distributed (non-IID) nature of datasets among individual devices.
Figure 3. The overall workflow of UA-FedRL for adaptive epoch selection of heterogeneous industrial IoT devices.
Figure 4. The overall architecture of the PWA which employs weight quality measurement to compute the weighted average of all local models’ weight.
Figure 5. Hardware specifications of the selected IoT devices used in this study.
Figure 6. The illustration of the communication between server and client side on Mininet.
Figure 7. The accumulated rewards of the UA-FedRL for different gamma values when the learning rate was set to 0.1.
Figure 8. The accumulated rewards of the UA-FedRL for different gamma values when the learning rate was set to 0.5.
Figure 9. The accumulated rewards of the UA-FedRL for different gamma values when the learning rate was set to 0.9.
Figure 10. The accuracy comparison between UA-FedRL and different FL methods on the MNIST dataset.
Figure 11. The accuracy comparison between UA-FedRL, Fed_AVG, and Fed_Prox methods on the MNIST dataset with 90% straggler IoT devices.
Figure 12. The accuracy comparison between UA-FedRL and different FL methods on the CIFAR-10 dataset.
Figure 13. The accuracy comparison of UA-FedRL, Fed_AVG, and Fed_Prox methods with 90% straggler IoT devices on the CIFAR-10 dataset.
Figure 14. Comparative analysis of normalized communication cost across different federated learning methods on MNIST and CIFAR-10 datasets.
Figure 15. Comparative analysis of normalized energy consumption across different FL methods on MNIST and CIFAR-10 datasets.
Figure 16. The uncertainty estimation of the UA-FedRL taking each action in terms of reward.
Table 1. Important notation.
| Notation | Definition |
| --- | --- |
| $f(w)$ | Objective function of the FL |
| $p_i$ | Computational cost of training an epoch in a local IoT device |
| $C_{E_i}$ | Communication cost of training an epoch in a local IoT device |
| $TC_{E_i}$ | Total cost of training an epoch in a local IoT device |
| $L_f(w_{E_i}, x)$ | Loss function of a local IoT device in an FL scenario |
| $TC_{E_i}$ | Total cost of performing an epoch in a local IoT device |
| $\pi_{\theta_\pi}(a_{i,t} \mid s_{i,t})$ | Reinforcement learning agent policy |
| $Q(s, a, \theta)$ | Action value estimation |
| $U(s, a)$ | Upper confidence bound for each action |
| $q_i$ | Weight quality score for local IoT device's trained weights |
| $S, A, R$ | State space, action space, and immediate reward function |
| $s_t, a_t$ | A state and an action taken by the control agent at time step $t$ |
| $\gamma$ | Discount factor |
| $T$ | Number of time steps in an episode |
| $m$ | Deep learning model size of each local IoT device |
| $D$ | Set of local datasets |
Table 2. List of experimental parameters and simulation settings used in this study.
| Group | Parameter | Setting |
| --- | --- | --- |
| Simulation Environment | Operating system | Ubuntu 18.04 (64-bit Linux) |
| Simulation Environment | Programming language | Python 3.8 |
| Simulation Environment | Framework for model design | PyTorch 2.0.1 |
| Simulation Environment | Network design emulator | Mininet 3.6.5 |
| Network Emulator | Network topology | Mesh network of switches |
| Network Emulator | IP suite | Transmission control protocol (TCP) |
| Network Emulator | Software switch type | Open vSwitch 2.9.8 |
| Network Emulator | SDN-based controller | OVS-controller |
| Network Emulator | SDN controller protocol | OpenFlow |
| Network Emulator | Link delay | Shortest route 1.6 ms |
| Network Emulator | Link bandwidth | 20 to 100 Mbps |
| UA-FedRL | Learning rate ($\alpha$) | 0.1, 0.5, 0.9 |
| UA-FedRL | Discount factor ($\gamma$) | 0.1, 0.5, 0.9 |
| UA-FedRL | Episodes ($T$) | 20, 100 |
| CNN | Convolution layers | 2 |
| CNN | Kernel size | 5 |
| CNN | Optimizer | SGD |
| CNN | Learning rate | 0.01 |
Table 3. Different non-IID data distribution variance used in this study to evaluate the performance of UA-FedRL.
| Non-IID Data Distribution | Number of Shards | Shard Size | Shards per Client |
| --- | --- | --- | --- |
| Low (20%) | 6000 | 10 | 50 |
| Medium (50%) | 2000 | 30 | 20 |
| High (80%) | 600 | 100 | 5 |
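The shard settings in Table 3 can be read as parameters of a shard-based partitioning scheme. The sketch below assumes the common approach of sorting samples by label, cutting the sorted order into equally sized shards, and handing a fixed number of shards to each client. The source does not confirm that this exact procedure was used, and the name `shard_partition` is hypothetical.

```python
import random


def shard_partition(labels, num_shards, shard_size, shards_per_client, seed=0):
    """Split sample indices into label-sorted shards and assign them to clients.

    Returns a list with one entry per client; each entry is a list of sample indices.
    """
    if num_shards * shard_size > len(labels):
        raise ValueError("not enough samples for the requested shards")
    if num_shards % shards_per_client != 0:
        raise ValueError("shards must divide evenly among clients")

    # Sorting by label makes each shard cover very few classes.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    shards = [
        order[k * shard_size:(k + 1) * shard_size] for k in range(num_shards)
    ]

    # Shuffle the shards so each client receives a random subset of them.
    rng = random.Random(seed)
    rng.shuffle(shards)

    num_clients = num_shards // shards_per_client
    clients = []
    for c in range(num_clients):
        picked = shards[c * shards_per_client:(c + 1) * shards_per_client]
        clients.append([i for shard in picked for i in shard])
    return clients
```

Under this assumption, the "High" row would correspond to `shard_partition(labels, 600, 100, 5)` on 60,000 samples: fewer, larger shards per client mean each client sees fewer classes, which is what makes the split more strongly non-IID.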
Table 4. The acquired accuracy of UA-FedRL and different FL methods for the MNIST dataset.
| Method | Algorithm | Accuracy |
| --- | --- | --- |
| Fed_AVG | Average aggregation | 94.56% |
| Fed_Prox | Proximal term | 94.62% |
| Fed_SGD | Stochastic Gradient Descent | 81.88% |
| Fed_Share | Data share strategy | 83.81% |
| UA-FedRL | UA-RL + PWA | 96.45% |
Table 5. The acquired accuracy of UA-FedRL, Fed_AVG, and Fed_Prox on the MNIST dataset with 90% straggler IoT devices.
| Levels | Fed_AVG | Fed_Prox | UA-FedRL |
| --- | --- | --- | --- |
| Low | 93.57% | 94.48% | 96.45% |
| Medium | 83.98% | 93.19% | 95.76% |
| High | 80.87% | 91.69% | 93.75% |
Table 6. The acquired accuracy of UA-FedRL and different FL methods on the CIFAR-10 dataset.
| Method | Algorithm | Accuracy |
| --- | --- | --- |
| Fed_AVG | Average aggregation | 50.95% |
| Fed_Prox | Proximal term | 60.37% |
| Fed_SGD | Stochastic Gradient Descent | 45.98% |
| Fed_Share | Data share strategy | 43.63% |
| UA-FedRL | UA-RL + PWA | 62.75% |
Table 7. The acquired accuracy of UA-FedRL, Fed_AVG, and Fed_Prox methods with 90% straggler IoT devices on the CIFAR-10 dataset.
| Levels | Fed_AVG | Fed_Prox | UA-FedRL |
| --- | --- | --- | --- |
| Low | 50.95% | 60.37% | 62.75% |
| Medium | 50.34% | 49.76% | 60.87% |
| High | 35.93% | 44.78% | 59.60% |
Table 8. The ablation study of UA-FedRL with different combinations of UA-RL and PWA submodules.
| Module | Submodule: UA-RL | Submodule: PWA | Accuracy: MNIST | Accuracy: CIFAR-10 |
| --- | --- | --- | --- | --- |
| UA-FedRL | | | 93.34% | 60.95% |
| | | | 92.54% | 60.34% |
| | | | 96.87% | 62.73% |
Table 9. The acquired accuracy of UA-FedRL with an increasing number of IoT devices on the MNIST and CIFAR-10 datasets.
| Datasets | 100 Devices | 150 Devices | 200 Devices | 250 Devices |
| --- | --- | --- | --- | --- |
| MNIST | 96.87% | 96.62% | 96.14% | 96.04% |
| CIFAR-10 | 62.73% | 62.58% | 62.34% | 61.98% |
Table 10. Comparison of different existing federated learning algorithms for accuracy with 95% confidence interval, communication cost, and energy consumption.
| Method | Accuracy (MNIST) | Accuracy (CIFAR-10) | Communication Cost (MNIST) | Communication Cost (CIFAR-10) | Energy Consumption (MNIST) | Energy Consumption (CIFAR-10) |
| --- | --- | --- | --- | --- | --- | --- |
| Fed_Share | 83.81 ± 0.79% | 43.63 ± 0.81% | 0.80 | 0.90 | 0.72 | 0.68 |
| Fed_SGD | 81.88 ± 1.12% | 45.98 ± 1.09% | 0.71 | 0.38 | 0.75 | 0.79 |
| Fed_AVG | 94.56 ± 0.45% | 50.95 ± 0.98% | 0.32 | 0.61 | 0.86 | 0.81 |
| Fed_Prox | 94.62 ± 0.61% | 60.37 ± 0.85% | 0.27 | 0.32 | 0.31 | 0.24 |
| UA-FedRL | 96.87 ± 0.22% | 62.73 ± 0.57% | 0.19 | 0.24 | 0.25 | 0.19 |
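For readers who want to report accuracy in the same "mean ± half-width" form, the sketch below shows one common way to obtain a 95% confidence interval from repeated runs. It assumes a normal approximation with the factor 1.96 and the sample standard deviation. The source does not state how its intervals were computed, and the name `mean_ci95` and the example values are hypothetical.

```python
import math
import statistics


def mean_ci95(samples):
    """Return (mean, half_width) of a 95% confidence interval for the mean.

    Uses the normal approximation 1.96 * s / sqrt(n), which is an assumption;
    for very few runs a Student t quantile would give a wider interval.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs to estimate the spread")
    mean = statistics.fmean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width


# Hypothetical accuracies (in percent) from three repeated runs.
mean, half_width = mean_ci95([96.0, 97.0, 98.0])
print(f"{mean:.2f} ± {half_width:.2f}%")
```

A narrower interval, as reported for UA-FedRL in Table 10, indicates less run-to-run variation in the measured accuracy.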
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Sagar, A.S.M.S.; Islam, M.Z.; Haider, A.; Kim, H.-S. Uncertainty-Aware Federated Reinforcement Learning for Optimizing Accuracy and Energy in Heterogeneous Industrial IoT. Appl. Sci. 2024, 14, 8299. https://doi.org/10.3390/app14188299
