FL-MD3QN-Based IoT Intelligent Access Algorithm for Smart Construction Sites

Zong, Qiangwen; Xu, Jiaxiang; Li, Wenqiang; Pan, Feng; Wang, Wenting; Liao, Yang; Liao, Yong

doi:10.3390/electronics14071372


by Qiangwen Zong ¹, Jiaxiang Xu ¹, Wenqiang Li ¹, Feng Pan ¹, Wenting Wang ¹, Yang Liao ¹,* and Yong Liao ²,*

¹ Three Gorges High-Tech Information Technology Co., Ltd., Yichang 443002, China
² School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China
* Authors to whom correspondence should be addressed.

Electronics 2025, 14(7), 1372; https://doi.org/10.3390/electronics14071372

Submission received: 26 February 2025 / Revised: 26 March 2025 / Accepted: 27 March 2025 / Published: 29 March 2025


Abstract:

With the deployment of fifth-generation (5G) mobile communication technology and rapid advancements in artificial intelligence and edge computing, smart construction sites have emerged as a critical direction for the construction industry’s transformation and upgrading. However, existing intelligent Internet of Things (IoT) access algorithms often struggle to simultaneously meet practical requirements for high data transmission rates, low latency, and secure, privacy-aware access in the dynamic and complex environments of smart construction sites. To address this, this paper proposes an IoT access algorithm based on a federated learning Multi-Objective Dueling Double Deep Q-Network (FL-MD3QN) for multi-site, multi-modal, multi-user IoT systems under the same Base Station (BS). First, a three-objective optimization model is established. The optimization goals are maximizing data transmission rates, minimizing transmission delays, and maximizing reliability, subject to constraints on bandwidth, rate, bit error rate (BER), and security/privacy. Second, the FL-MD3QN algorithm is developed to solve this optimization problem. The algorithm adaptively adjusts the access strategy to cope with the complex and ever-changing communication needs of smart construction sites and, by introducing a federated learning mechanism, achieves collaborative optimization of multiple construction site IoT systems while protecting user privacy. Simulation results demonstrate significant advantages of the FL-MD3QN algorithm. For latency, it achieves markedly lower delays across multi-modal services than the benchmark algorithms, with the shortest training time. For transmission rate, FL-MD3QN delivers the highest average rates, particularly for video services. Under high signal-to-noise ratio conditions, FL-MD3QN achieves exceptionally low BER values. Additionally, it attains high average access success rates and average reward values, confirming its robust adaptability and optimization performance in complex smart construction environments.

Keywords:

smart construction site; federated learning; deep reinforcement learning; Internet of Things; multi-user access; edge intelligence

1. Introduction

With the deployment of fifth-generation (5G) mobile communication technology and rapid advancements in cutting-edge technologies like artificial intelligence and edge computing, smart construction sites have become pivotal for the construction industry’s transformation. These sites integrate Internet of Things (IoT), big data, and other technologies to achieve intelligent, automated, and visual management [1,2,3]. IoT serves as the backbone for data collection and transmission in smart construction sites. It uses sensors and devices to gather real-time environmental, equipment, and personnel data, transmitting them to management systems for decision making [4,5]. The IoT system in smart construction sites includes diverse sensors and devices, such as environmental monitors, video surveillance equipment, smart access controls, and personnel trackers. These generate multi-modal data (text, audio, images, and video), requiring high bandwidth, speed, low latency, and security [6,7].

Dynamic Spectrum Access (DSA) addresses bandwidth optimization challenges [8,9]. To improve DSA efficiency, various techniques have been proposed. Ref. [10] developed a channel state prediction model and a spectrum access strategy for secondary users using Markov decision processes. Ref. [11] introduced a stochastic channel access strategy with dual-channel spectrum sensing to enhance secondary user (SU) sensing and reduce channel collisions. Ref. [12] designed a low-complexity learning algorithm for multi-user, multi-channel cognitive radio networks. Ref. [13] used neural networks for dynamic relay selection in cognitive radio (CR) networks. While these models improve spectrum efficiency, they do not address the multi-objective optimization problem in complex spectrum environments and therefore fail to meet the access requirements of such environments.

Ref. [14] applied Q-learning to DSA, enabling SUs to update Q-functions from environmental feedback, reducing convergence iterations. Ref. [15] combined reinforcement learning (RL) and Bayesian algorithms to predict channel idle time, boosting throughput by minimizing sensing overhead. Ref. [16] deployed a Deep Q-Network (DQN) model in distributed DSA to enhance spectrum access. Ref. [17] optimized DQN for long-term secondary throughput, proposing an adjusted deep deterministic policy gradient. Traditional DQN suffers from overestimation bias due to greedy Q-value computation, limiting multi-objective optimization. Double Deep Q-Network (DDQN) [18] exhibits slow convergence in complex environments. D3QN [19], combining Double DQN and Dueling DQN, mitigates Q-value overestimation and improves policy accuracy. Dueling DQN separates Q-values into state-value and action-advantage functions, enabling independent environmental state and action evaluation, ideal for long-term optimization in smart construction scenarios.

Federated learning (FL) helps protect user privacy in wireless networks by keeping raw data local [20,21]. Ref. [22] proposed a hierarchical FL framework for heterogeneous cellular networks, reducing communication latency via resource allocation. Ref. [23] introduced a client-edge-cloud FL system to lower user-cloud communication costs. Ref. [24] leveraged personalized FL and edge computing to address heterogeneity, achieving low-latency IoT processing. Ref. [25] developed a Federated Deep Reinforcement Learning framework for DSA, safeguarding IoT user privacy. However, none of these works is designed for the multi-site collaborative optimization scenario in smart construction sites.

In summary, despite existing research applying reinforcement learning and federated learning in dynamic spectrum access, the following challenges remain in smart construction site scenarios: (1) a single neural network cannot meet the access requirements in complex spectrum environments; (2) traditional RL methods suffer from overestimation bias in spectrum allocation due to their greedy strategy; and (3) existing federated learning frameworks do not consider the efficiency trade-offs in multi-site collaborative optimization. To address these challenges in complex construction site scenarios, this paper proposes an FL-MD3QN-based intelligent access algorithm for the on-demand access problem of multi-site, multi-modal, and multi-user under the same BS, tailored to the new transmission requirements of IoT in smart construction sites. FL ensures privacy and enables collaborative optimization across sites. MD3QN adaptively adjusts access strategies for dynamic requirements. Key contributions include:

(1): This paper proposes an IoT intelligent access framework tailored for multi-site, multi-modal, and multi-user scenarios, innovatively constructing a three-objective optimization model aimed at maximizing data transmission rate, minimizing delay, and maximizing reliability. When aggregating model parameters using federated learning, a dynamic weighted aggregation mechanism is introduced, which adaptively adjusts weights based on data quality and communication stability at each site, ensuring privacy security while improving data aggregation efficiency.
(2): The federated learning mechanism boasts a unique design in addressing user privacy risks. By integrating heterogeneous data from multiple sites, it enhances model generalization capability. Local models are trained at each site and aggregated to form a global model, adapting to varying communication demands.
(3): A dynamic spectrum access strategy based on MD3QN is presented, featuring a shared base-layer mechanism in the MD3QN architecture that reduces parameter redundancy and improves computational efficiency. Meanwhile, a dual-network update strategy decouples action selection and evaluation, effectively mitigating the issue of overestimated Q-values. A business-adaptive reward function tailored to differentiated business characteristics optimizes spectrum access strategies for each scenario by assigning varying weights.
(4): The FL-MD3QN algorithm exhibits significant advantages in performance. By integrating federated learning and optimizing reward function design, FL-MD3QN surpasses traditional algorithms in key metrics such as delay, transmission rate, access success rate, and bit error rate. Furthermore, its lightweight deployment on the Jetson AGX Xavier edge node further validates its scalability and practicality in real-world smart building scenarios.

2. Problem Description

In smart construction sites, IoT technology is widely used for environmental monitoring, equipment supervision, personnel management, safety alerts, and more. Deployed IoT sensors and devices include environmental sensors (monitoring air quality, noise levels, temperature, and humidity), video surveillance equipment (real-time site monitoring and accident prevention), smart access control systems (personnel management and site security), and GPS/RFID-based personnel positioning systems (tracking locations and improving efficiency). These enable intelligent, automated, and visual site management. Figure 1 shows a multi-modal, multi-user IoT application scenario where multiple sites share a single BS. The BS communicates wirelessly with IoT devices (video, sensors and voice devices) across sites. These devices collect data for transmission to edge servers. Edge servers process and analyze data to enable real-time decision making and enhance construction efficiency and safety.

The multi-modal data generated by users include text, audio, images, and video. Each data type has distinct transmission rate and bandwidth requirements. The following notation is used. Site set: K = {1, 2, …, K}; user set of site k: M_k = {1, 2, …, M_k}; service modalities: N = {0 (text), 1 (audio), 2 (image), 3 (video)}; time cycle: t ∈ T; access indicator: x_{k,m,n}(t) ∈ {0, 1}, ∀k ∈ K, m ∈ M_k, n ∈ N, t ∈ T, where x_{k,m,n}(t) = 1 indicates that user m in site k transmits modality n at time t; occupation time: t_{k,m,n} (the time user m in site k spends transmitting modality n). A three-objective optimization model is formulated to:

(1): Maximize data transmission rate.

\sum_{k \in K} \sum_{m \in M_{k}} \sum_{n \in N} R_{k, m, n}(t) \cdot x_{k, m, n}(t)

(1)

where R_{k,m,n}(t) is the rate for user m transmitting modality n.

(2): Minimize data transmission delay.

\sum_{k, m, n} T_{k, m, n}^{\mathrm{total}}(t) \cdot x_{k, m, n}(t)

(2)

where T_{k,m,n}^{total}(t) is the end-to-end delay for modality n.

(3): Maximize reliability.

\sum_{k, m, n} (1 - \mathrm{BER}_{k, m, n}(t)) \cdot x_{k, m, n}(t)

(3)

where BER_{k,m,n}(t) is the BER.

Constraints:

(1): Bandwidth constraint:

\sum_{k, m, n} \frac{R_{k, m, n}(t)}{η_{k, m, n}(t)} \leq B_{\mathrm{total}}, \forall t

(4)

where B_total is the total bandwidth of the BS.

(2): Power constraint:

\sum_{n} P_{k, m, n}^{\mathrm{tx}}(t) \cdot x_{k, m, n}(t) \leq P_{k, m}^{\max}, \forall k, m, t

(5)

where P_{k,m}^{max} is the maximum transmit power of user m in site k.

(3): Processing capability constraint:

\sum_{n} Q_{k, m, n}(t) \cdot x_{k, m, n}(t) \leq C_{k, m}^{\max}

(6)

where Q_{k,m,n}(t) is the data volume of modality n, and C_{k,m}^{max} is the processing capability limit of the user device.

(4): Total access time constraint:

\sum_{k, m, n} t_{k, m, n} \leq T_{\max}

(7)

where T_max is the maximum total access time.

(5): BER constraint:

\mathrm{BER}_{k, m, n}(t) \cdot x_{k, m, n}(t) \leq \mathrm{BER}_{\max}, \forall k, m, n

(8)

where BER_max is the maximum tolerable BER.

(6): Transmission data association constraint:

t_{k, m, n} = \frac{Q_{k, m, n}}{R_{k, m, n}} \cdot x_{k, m, n}, \forall k, m, n

(9)

The three-objective optimization problem is formulated as:

\begin{array}{l} \text{opt.} \begin{cases} \max \sum_{k \in K} \sum_{m \in M_{k}} \sum_{n \in N} R_{k, m, n}(t) \cdot x_{k, m, n}(t) \\ \min \sum_{k, m, n} T_{k, m, n}^{\mathrm{total}}(t) \cdot x_{k, m, n}(t) \\ \max \sum_{k, m, n} (1 - \mathrm{BER}_{k, m, n}(t)) \cdot x_{k, m, n}(t) \end{cases} \\ \text{s.t.} \begin{cases} \sum_{k, m, n} \frac{R_{k, m, n}(t)}{η_{k, m, n}(t)} \leq B_{\mathrm{total}}, \forall t \\ \sum_{n} P_{k, m, n}^{\mathrm{tx}}(t) \cdot x_{k, m, n}(t) \leq P_{k, m}^{\max}, \forall k, m, t \\ \sum_{n} Q_{k, m, n}(t) \cdot x_{k, m, n}(t) \leq C_{k, m}^{\max} \\ \sum_{k, m, n} t_{k, m, n} \leq T_{\max} \\ \mathrm{BER}_{k, m, n}(t) \cdot x_{k, m, n}(t) \leq \mathrm{BER}_{\max}, \forall k, m, n \\ t_{k, m, n} = \frac{Q_{k, m, n}}{R_{k, m, n}} \cdot x_{k, m, n}, \forall k, m, n \end{cases} \end{array}

(10)

This model involves highly nonlinear, nonconvex multi-objective optimization with strict constraints. Diverse data types, trade-offs between objectives (rate, delay, and reliability), and interdependent constraints complicate the problem. Traditional methods struggle to find global optima. Thus, this paper proposes an FL and D3QN-based intelligent access method, combining machine learning and distributed computing to provide effective IoT access solutions for smart construction sites.
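As a rough illustration of the formulation above, the following Python sketch evaluates the three objective sums (1)-(3) and checks constraints (4)-(9) for one candidate access decision at a single time step. The `Link` container, its field names, and the use of a single power and processing limit for all users are hypothetical simplifications; reading η as a spectral efficiency in bit/s/Hz and masking the bandwidth sum by x are assumptions, not statements from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Link:
    """One (site k, user m, modality n) triple at a fixed time t."""
    k: int
    m: int
    n: int
    rate: float    # R_{k,m,n}(t) in bit/s
    delay: float   # T^total_{k,m,n}(t) in s
    ber: float     # BER_{k,m,n}(t)
    eff: float     # eta_{k,m,n}(t), read here as bit/s/Hz (assumption)
    power: float   # P^tx_{k,m,n}(t) in W
    volume: float  # Q_{k,m,n}(t) in bit
    x: int         # access indicator x_{k,m,n}(t), 0 or 1


def objectives(links):
    """Return the three sums in (1)-(3): rate, delay, reliability."""
    rate = sum(link.rate * link.x for link in links)
    delay = sum(link.delay * link.x for link in links)
    reliability = sum((1.0 - link.ber) * link.x for link in links)
    return rate, delay, reliability


def feasible(links, b_total, p_max, c_max, t_max, ber_max):
    """Check constraints (4)-(9) for the active links (x = 1)."""
    active = [link for link in links if link.x == 1]
    # (4): occupied bandwidth; masking by x is an assumption of this sketch
    if sum(link.rate / link.eff for link in active) > b_total:
        return False
    power = defaultdict(float)
    volume = defaultdict(float)
    for link in active:
        power[(link.k, link.m)] += link.power
        volume[(link.k, link.m)] += link.volume
    # (5) and (6): per-user limits, shared by all users for brevity
    if any(p > p_max for p in power.values()):
        return False
    if any(q > c_max for q in volume.values()):
        return False
    # (7) with (9): occupation time t = Q / R for each active link
    if sum(link.volume / link.rate for link in active) > t_max:
        return False
    # (8): every active link must respect the BER ceiling
    return all(link.ber <= ber_max for link in active)
```

A learning-based method is attractive here precisely because `feasible` and `objectives` are cheap to evaluate for one decision, while searching over all binary decisions x is combinatorial.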

3. FL-MD3QN-Based Intelligent Access

The intelligent access process is the core component for achieving efficient and secure access in IoT for smart construction sites. The proposed FL and MD3QN-based intelligent access framework comprises three major steps: data preprocessing, system model construction, and network training and deployment. In the data preprocessing stage, multi-modal data generated by users in smart construction sites are transformed into a format suitable for neural network input at the edge server through traffic segmentation, feature extraction, and CSV format conversion. In system model construction, an FL and MD3QN-based intelligent access system model is proposed. This model uses a hierarchical FL framework to achieve collaborative optimization of multi-site IoT systems, enhancing model generalization ability and data transmission efficiency. At the same time, the adaptive access strategy adjustment of the D3QN algorithm enables the system to respond flexibly to the complex and varied transmission demands of smart construction sites. A business-adaptive multi-objective reward function guides the agent to achieve dynamic trade-offs between multiple objectives such as spectral efficiency, delay constraints, and privacy protection. In the network training and deployment stage, the model is trained using collected IoT device traffic sample sets. After training, the FL-MD3QN model is compressed into a lightweight form and deployed on edge server nodes at construction sites, enabling real-time application of the intelligent access algorithm. This improves system response speed and provides technical support for intelligent management in smart construction sites.

3.1. Data Preprocessing

In smart construction sites, data preprocessing is performed at the BS. Input data include multi-modal and massive local data from users across sites, such as text, audio, images, and video. The goal is to convert raw traffic data into formats suitable for neural network inputs. The process involves three steps: traffic segmentation, feature extraction, and CSV format conversion. The SplitCap tool splits raw traffic by five-tuple information and filters nontarget device traffic via MAC addresses. Feature extraction considers transmission characteristics and bandwidth requirements of different data types. For example, video data require higher bandwidth than text. Features are extracted based on the method in [26], ensuring accurate reflection of transmission characteristics and device states. Extracted features (Table 1) serve as inputs to the MD3QN model at edge servers.

The raw traffic files are in hexadecimal format and cannot be directly used as neural network inputs. Data conversion is required. The process involves writing feature-extracted traffic data into CSV files using sliding windows, generating a CSV-formatted traffic sample set. Integrating these steps, this paper proposes a Sliding Window-Based Data Preprocessing (SW-Based DPP) algorithm. The pseudocode of SW-Based DPP is described in Algorithm 1. Let L denote the raw IoT device traffic (pcap format). After preprocessing, the unified initial sample set is represented as csvs.

Algorithm 1: SW-Based DPP algorithm

Input: L // Raw IoT traffic file in pcap format
Output: csvs // CSV-formatted traffic dataset containing device ID, service type, extended transmission features, and basic traffic features
t_1, t_2, …, t_I ← SplitCap(L) // Split raw traffic by five-tuple information and remove non-target IoT device traffic
for each t_i in {t_1, …, t_I} do // t_i: i-th traffic file of the target IoT devices
    lists ← rdpcap(t_i) // Parse raw traffic using the scapy library and save to lists
    File ← open("device_i.csv", "w") // Create one CSV file per traffic file to store traffic features
    for each list in lists do
        ARP, LLC, IP, … ← 0 // Initialize 26-dimensional feature values
        ARP, LLC, IP, … ← list[ARP], list[LLC], list[IP], … // Extract 26-dimensional features
        ARP, LLC, IP, … → File // Write features to CSV file
    end for
    File.close()
    csv_i ← File
end for
csvs ← csv_1 ∪ csv_2 ∪ … ∪ csv_I // Merge into initial IoT traffic dataset
return csvs
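The CSV conversion step can be sketched in Python as follows. This is a minimal illustration that works on already-parsed packets represented as dictionaries, so it does not depend on SplitCap or scapy. The three feature names stand in for the 26 features of Table 1, and the window size, stride, and per-window summation are assumptions, since the paper does not specify how the sliding window aggregates packets.

```python
import csv
import io

# Hypothetical subset of the 26 features; the real list is in Table 1.
FEATURES = ["ARP", "LLC", "IP"]


def sliding_windows(packets, size, step):
    """Yield consecutive windows of `size` packets, advancing by `step`."""
    for start in range(0, len(packets) - size + 1, step):
        yield packets[start:start + size]


def window_row(device_id, service, window):
    """Aggregate one window into a CSV row by summing each feature flag."""
    counts = [sum(pkt.get(name, 0) for pkt in window) for name in FEATURES]
    return [device_id, service] + counts


def write_csv(device_id, service, packets, size=4, step=2):
    """Write all windows of one device's traffic to an in-memory CSV string."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["device_id", "service"] + FEATURES)
    for window in sliding_windows(packets, size, step):
        writer.writerow(window_row(device_id, service, window))
    return buf.getvalue()
```

For example, `write_csv("cam-01", "video", packets)` returns a header line followed by one row per window; writing to a file instead of `io.StringIO` only requires passing an opened file object to `csv.writer`.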

3.2. System Model

The multi-site, multi-modal, multi-user IoT intelligent access framework is shown in Figure 2. First, different sites upload locally collected 26-dimensional IoT data to the BS. The BS processes these data using the SW-Based DPP algorithm to generate feature data. The feature data are then uploaded to edge servers. Edge servers train the MD3QN algorithm to generate base model parameters. The Federated Averaging (FAVG) module aggregates base model parameters from all sites to obtain global model parameters. These parameters are transmitted back via the BS to IoT users across sites for updates. The process repeats until the aggregated model converges to target parameters. Finally, a global strategy balancing efficiency and security is output and distributed to all users. This ensures data privacy while optimizing BS resource allocation.

3.2.1. MD3QN Algorithm

The MD3QN workflow is shown in Figure 3. During initialization, agents at the edge nodes of each site collect real-time site status information. The collected data are stored in an experience replay buffer. Agents select access points ap_j based on their current policies. After executing actions, they observe rewards r_t and new states s_{t+1}. To comply with FL privacy constraints, raw experience data remain stored locally at edge nodes, and knowledge sharing is achieved via differential privacy mechanisms. Each experience tuple (s_t, a_t, r_t, s_{t+1}) is assigned an initial priority p_t = 1, so that every new sample is eligible to be drawn. Each entry in the experience replay buffer corresponds to an index in a priority storage array, enabling efficient prioritized experience replay. During updates, the algorithm samples batches from the experience replay buffer according to these priorities. The priority of each data point is dynamically adjusted during network parameter optimization to reflect its contribution to learning. Q-values evaluate action policies, accelerating goal achievement through efficient learning. When the accumulated experiences reach a threshold, the target network adopts the latest parameters from the prediction network. This synchronized update stabilizes learning.
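The replay mechanism described above can be sketched as a buffer paired with a priority array. The following is a minimal illustration, not the authors' implementation: the class name, the ring-buffer eviction, and the rule that sets the new priority to |TD error| plus a small constant are assumptions, since the paper only states that priorities start at 1 and are adjusted during optimization.

```python
import random


class PrioritizedReplay:
    """Proportional prioritized replay backed by a plain priority array."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.data = []        # experience tuples (s, a, r, s_next)
        self.priority = []    # priority[i] belongs to data[i]
        self.pos = 0          # next slot to overwrite once the buffer is full
        self.rng = random.Random(seed)

    def add(self, transition):
        """Store a transition with initial priority 1, as in the text."""
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priority.append(1.0)
        else:
            self.data[self.pos] = transition
            self.priority[self.pos] = 1.0
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        """Draw indices with probability proportional to priority."""
        idx = self.rng.choices(range(len(self.data)),
                               weights=self.priority, k=batch_size)
        return idx, [self.data[i] for i in idx]

    def update(self, indices, td_errors, eps=1e-3):
        """Re-prioritize sampled transitions by |TD error| (assumed rule)."""
        for i, err in zip(indices, td_errors):
            self.priority[i] = abs(err) + eps
```

A plain array keeps the sketch short; larger buffers commonly use a sum-tree so that sampling costs O(log n) instead of O(n).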

In the complex, heterogeneous IoT environment of smart construction sites, the reward function must guide the agent to dynamically balance multi-objective trade-offs, including spectrum efficiency, latency constraints, and privacy protection. For differentiated services (e.g., video surveillance and environmental sensing), this paper designs a service-adaptive multi-objective reward function. The reward r_t for an action under state s_t is defined as:

r_{t} = a R_{\mathrm{sum}} - b T_{\mathrm{avg}} + c (1 - \mathrm{BER}_{\mathrm{avg}})

(11)

where R_sum is the total system transmission rate at the current time, T_avg is the average transmission delay at the current time, and BER_avg is the average BER at the current time. The weighting coefficients a, b, and c are configured in real time according to the service modality (N) using a lookup table (see Table 2).
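A minimal Python sketch of the reward in Equation (11) with a per-modality weight lookup is given below. The numeric weights are placeholders chosen only so that each triple sums to one; they are not the values of Table 2. The sketch also assumes that R_sum, T_avg, and BER_avg have already been scaled to comparable ranges, which the paper does not detail.

```python
# Hypothetical (a, b, c) weights per service modality; NOT the Table 2 values.
WEIGHTS = {
    0: (0.2, 0.3, 0.5),  # text
    1: (0.3, 0.4, 0.3),  # audio
    2: (0.4, 0.3, 0.3),  # image
    3: (0.6, 0.3, 0.1),  # video
}


def reward(modality, r_sum, t_avg, ber_avg):
    """Equation (11): a * R_sum - b * T_avg + c * (1 - BER_avg)."""
    a, b, c = WEIGHTS[modality]
    return a * r_sum - b * t_avg + c * (1.0 - ber_avg)
```

With these placeholder weights, `reward(3, 1.0, 0.5, 0.0)` evaluates to about 0.55 for a video service.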

Normalization constraint: a + b + c = 1. This ensures multi-objective balance. For video transmission services, the weights prioritize throughput. For text services, privacy protection is emphasized. This aligns with QoS requirements across different service scenarios. For agent decision making, the ϵ-greedy strategy is adopted. With probability ϵ, a random action is selected. With probability 1−ϵ, the action corresponding to the maximum Q-value computed by the Q-network is chosen, as shown in Equation (12):

a_{t} = \begin{cases} a_{i}, \; i \in \{0, 1, 2, \dots, 10, 11\}, & P = ε \\ \arg \max_{a} Q (s_{t}, a; θ_{t}), & P = 1 - ε \end{cases}

(12)
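The selection rule of Equation (12) can be sketched in a few lines of Python. The function name and the `rng` parameter are illustrative; `q_values` is assumed to hold one Q-value per action, for example 12 entries for the action indices 0 to 11.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy branch: index of the largest Q-value
    return max(range(len(q_values)), key=q_values.__getitem__)
```

In practice ε is usually decayed over training so that exploration dominates early and exploitation dominates later; the paper does not state its schedule.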

In the MD3QN algorithm, the neural network output Q(s, ·; θ) consists of the state value function V and the action advantage function A. This dueling (two-stream) architecture improves policy evaluation accuracy by distinguishing state value from action advantage. The advantage function A is calculated as follows (Equation (13)):

A (s, a; θ_{t}) = Q (s, a; θ_{t}) - V (s; θ_{t})

(13)

where θ_t represents the weight parameters of the Q-network. A(s, a; θ_t) denotes the advantage of action a relative to V(s; θ_t). A higher advantage value indicates a better action choice. The state value function V, which evaluates the overall expected return of probabilistic actions in future steps, is defined as (Equation (14)):

V (s; θ_{t}) = \max_{π} V (s; θ_{t})

(14)

The advantage function A(s, a; θ_t) reflects the superiority of a specific action. If all actions have identical advantages for state s, differences between actions are indistinguishable. To eliminate bias, the Q-value is adjusted by subtracting the average advantage (Equation (15)):

Q (s, a; θ_{t}) = V (s; θ_{t}) + \left(A (s, a; θ_{t}) - \frac{1}{| A |} \sum_{a'} A (s, a'; θ_{t})\right)

(15)
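The aggregation in Equation (15) can be illustrated with plain Python lists. The function name is hypothetical; `value` stands for V(s) and `advantages` for the vector A(s, ·) produced by the two output streams.

```python
def dueling_q(value, advantages):
    """Combine V(s) and A(s, .) into Q(s, .) with mean-advantage centering."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + (adv - mean_adv) for adv in advantages]
```

Because the mean is subtracted, adding a constant to every advantage leaves Q unchanged: `dueling_q(2.0, [1.0, 2.0, 3.0])` and `dueling_q(2.0, [11.0, 12.0, 13.0])` both give `[1.0, 2.0, 3.0]`. This is what makes the split between V and A identifiable.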

The MD3QN algorithm updates neural network weights using the loss function (Equation (16)):

L_{\mathrm{MD3QN}} = E \left[ \left( r_{t} + γ Q \left( s_{t + 1}, \arg \max_{a'} Q (s_{t + 1}, a'; θ_{t}); θ_{t}^{-} \right) - Q (s_{t}, a_{t}; θ_{t}) \right)^{2} \right]

(16)

where θ_t^- denotes the weight parameters of the target network, r_t is the reward, and γ is the discount factor. When γ = 0, the agent focuses solely on immediate rewards (short-term strategy). Higher γ values prioritize future state–action pair rewards (long-term strategy). The error between this target and the predicted Q-value Q(s_t, a_t; θ_t) is minimized as a mean squared error through backpropagation.
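The decoupling of action selection and action evaluation in Equation (16) can be sketched as follows. Function names are illustrative; `q_online_next` and `q_target_next` are assumed to hold the Q-values of the next state under θ_t and θ_t^-, respectively.

```python
def double_dqn_target(reward, gamma, q_online_next, q_target_next):
    """r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    # Select the action with the online (prediction) network ...
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ... but evaluate it with the target network.
    return reward + gamma * q_target_next[best]


def mse_loss(targets, predictions):
    """Mean squared error between targets and predicted Q-values."""
    pairs = list(zip(targets, predictions))
    return sum((y - q) ** 2 for y, q in pairs) / len(pairs)
```

For `q_online_next = [1.0, 3.0, 2.0]` and `q_target_next = [0.5, 1.5, 4.0]`, the online network selects action 1, so with `reward = 1.0` and `gamma = 0.5` the target is 1.75. Taking the maximum of the target network directly would give 3.0, which illustrates the overestimation that the double update avoids.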

3.2.2. FAVG Algorithm

After obtaining base model data from different sites via MD3QN, the edge server uses the FAVG algorithm to aggregate these base model parameters into a global model. The pseudocode is shown in Algorithm 2.

Algorithm 2: FAVG. B, E, η: Local minibatch size, epochs, and learning rate.

// FL execution on the edge server side
Initialize global model weights w_0
for each communication round t = 0, 1, …, T − 1 do
    Set user selection ratio C (0 < C ≤ 1) // C is the fraction of users selected per round
    m ← max(C · K, 1)
    S_t ← randomly select m users
    for each site k ∈ S_t in parallel do
        w_{t+1}^k ← ClientUpdate(k, w_t) // Local training at user k based on global model w_t
    end for
    // Aggregate local model updates (weighted average)
    w_{t+1} ← Σ_{k ∈ S_t} (n_k / n) · w_{t+1}^k // n_k: sample count for user k; n: total selected samples
end for

// Definition of client update function
Function ClientUpdate(k, w_global):
    Initialize local model w ← w_global
    Fetch local dataset P_k with size n_k
    Partition P_k into batches of size B
    for each local epoch i = 1, 2, …, E do
        for each batch b ∈ P_k do
            // Compute gradient and update parameters (gradient descent)
            ∇ ← gradient ∇l(w; b) for batch b
            w ← w − η · ∇ // η: learning rate
        end for
    end for
    Return updated model w to the edge server
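The two halves of Algorithm 2 can be sketched in Python with weight vectors stored as flat lists. This is a minimal illustration of standard federated averaging weighted by sample count. It does not implement the dynamic weighting by data quality and communication stability mentioned in the contributions, and the function names and the caller-supplied `grad_fn` are hypothetical.

```python
import random


def select_clients(num_clients, fraction, rng):
    """Pick m = max(C * K, 1) distinct clients for this round."""
    m = max(int(fraction * num_clients), 1)
    return rng.sample(range(num_clients), m)


def client_update(weights, batches, lr, epochs, grad_fn):
    """Minibatch gradient descent starting from the global weights."""
    w = list(weights)
    for _ in range(epochs):
        for batch in batches:
            grad = grad_fn(w, batch)
            w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def fed_avg(client_weights, client_sizes):
    """Average client weight vectors, weighted by local sample counts n_k."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(n * w[j] for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]
```

For instance, `fed_avg([[1.0, 2.0], [3.0, 6.0]], [1, 3])` returns `[2.5, 5.0]`: the second client holds three times as many samples and therefore contributes three quarters of the average.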

3.3. The Training and Deployment of the FL-MD3QN Network

Training data for the FL-MD3QN network originate from user traffic samples collected in smart construction site scenarios. By simulating access behaviors of 20 IoT device users across construction sites, an initial dataset containing 100,000 labeled samples is constructed. User types are categorized into four classes based on smart construction site configurations, as detailed in Table 3.

Each sample contains the device ID, 26-dimensional traffic features, service type labels, and physical layer parameters. The dataset is divided into training, validation, and test sets at a 7:2:1 ratio. The training set covers 270 users from 18 construction sites. The validation set includes 15 users from one site. The test set consists of 15 users from one site that is strictly held out during training.

Training is conducted on a distributed server cluster with 4 NVIDIA A100 GPUs (manufacturer: NVIDIA Corporation, headquartered in Santa Clara, CA, USA). Parallel training is implemented in PyTorch 1.12, with three fully connected hidden layers (512-256-128). During deployment, the trained FL-MD3QN model is optimized for lightweight deployment and embedded on construction site edge computing nodes using Jetson AGX Xavier platforms (manufacturer: NVIDIA Corporation, headquartered in Santa Clara, CA, USA).

4. Simulation and Analysis

4.1. Simulation Parameter Settings

To validate the FL-MD3QN algorithm in smart construction site scenarios, this section presents experimental results for dynamic spectrum access. Core algorithm parameters include channel count, learning rate, and discount factor. Privacy protection parameters include data encryption rate and data leakage risk threshold to meet security requirements. Experimental environment configurations ensure stable and efficient user access by setting 5G IoT key parameters: interface latency, mobility support, bandwidth, and operating frequency bands. The learning rate is set to 0.95 and the discount factor to 0.96 to accommodate the real-time decision-making requirements of dynamic spectrum access with a high learning rate, while the discount factor balances long-term rewards with the continuous nature of 5G services. The federated learning noise scale is set to 0.01 based on the differential privacy mechanism to balance model accuracy and privacy strength [24]. The data encryption rate is set to 100% and the data leakage risk threshold is set to 0.01%, complying with industrial IoT security standards [27]. The air interface delay is set to 5 ms, the bandwidth to 20 MHz, and the operating frequency band to 2.6 GHz, strictly adhering to the 3GPP 5G NR standard to accommodate dense device access and high-throughput requirements for video services in construction sites [28]. These parameters comprehensively evaluate the FL-MD3QN algorithm’s optimization capability for multi-modal, multi-user IoT access while ensuring privacy and data security. The proposed model parameter settings are listed in Table 4.
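One common way to apply a differential privacy noise scale to federated updates is to clip each local update and add Gaussian noise before upload. The sketch below illustrates that pattern with the noise scale of 0.01 quoted above. The choice of Gaussian noise, the clipping norm of 1.0, and the function names are assumptions; the paper does not specify its mechanism at this level of detail.

```python
import random


def clip(update, max_norm):
    """Scale the update so that its L2 norm does not exceed max_norm."""
    norm = sum(u * u for u in update) ** 0.5
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def privatize(update, max_norm=1.0, noise_scale=0.01, rng=None):
    """Clip the update, then add zero-mean Gaussian noise to each entry."""
    rng = rng or random.Random()
    clipped = clip(update, max_norm)
    sigma = noise_scale * max_norm  # assumed relation between scale and norm
    return [u + rng.gauss(0.0, sigma) for u in clipped]
```

A larger noise scale strengthens privacy but perturbs the aggregated model more, which is the accuracy and privacy trade-off the parameter choice is meant to balance.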

4.2. Simulation Result Analysis

As shown in Table 5, in the dynamic IoT access scenarios of smart construction sites, significant differences exist in computational complexity and performance across algorithms. DQN, as the baseline algorithm, employs a single Q-network to map high-dimensional states to actions, resulting in a complexity of O(d²). Its training time reaches 143 min due to iterative corrections for Q-value overestimation caused by greedy policies. MD3QN reduces parameters by 15% through sharing base layers in Dueling DQN and Double DQN, lowering complexity to O(0.85d²) and training time to 135 min. However, it lacks federated learning, failing to leverage multi-site data synergy. FL-DQN and FL-DDQN adopt distributed training to reduce single-node redundancy but retain structural flaws. FL-DQN maintains dual networks (local and global), increasing complexity to O(1.5d²). FL-DDQN further introduces target network updates, doubling parameters and synchronization frequency, leading to marginally longer training time (119 vs. 118 min). Both exhibit high federated communication overhead (FL-DQN: “High”; FL-DDQN: “Relatively High”) due to uncompressed parameters, where communication delays account for 15–20% of training time, conflicting with IoT low-latency goals. In contrast, FL-MD3QN achieves superior performance through three optimizations:

Structural Compression: combines Dueling DQN’s shared layers with pruning and quantization, reducing parameters by 30% and complexity to O(0.72d²).

Efficient Federated Learning: implements dynamic aggregation and lightweight model compression (30% communication load reduction) with differential privacy (15% fewer communication rounds).

Prioritized Experience Replay: filters high-value samples, reducing redundant computation by 20%.

Together, these optimizations cut training time to 110 min, 23% less than the 143 min required by DQN.

As shown in Table 6, the FL-MD3QN algorithm demonstrates significant low-latency advantages in IoT access scenarios for smart construction sites. For diverse service types (text, audio, image, and video), FL-MD3QN achieves not only lower average latency than other algorithms but also significantly smaller standard deviations (text: 0.03; video: 0.20), indicating high robustness across heterogeneous site environments. For text services, FL-MD3QN reduces latency to 0.14 ± 0.03 s, an 82.5% improvement over DQN (0.80 ± 0.12 s), with a 75% reduction in standard deviation. This highlights FL-MD3QN’s ability to dynamically aggregate multi-site data via federated learning, rapidly adapting to low-bandwidth demands and mitigating latency fluctuations caused by local channel contention. For audio, image, and video services, FL-MD3QN reduces latency by approximately 50–70% compared to baseline algorithms. Traditional algorithms like DQN and MD3QN exhibit larger standard deviations (e.g., video: DQN = 0.45 and MD3QN = 0.35), reflecting sensitivity to local channel load variations. While federated learning algorithms (FL-DQN and FL-DDQN) improve distributed training, their unoptimized model structures and communication mechanisms still result in higher latency deviations (FL-DQN: 0.30; FL-DDQN: 0.25). FL-MD3QN’s service-adaptive multi-objective reward function ensures minimal standard deviations across all service types, validating its stability in dynamic multi-objective optimization. This underscores its superior adaptability and precision in balancing spectrum efficiency, latency, and reliability under complex IoT access conditions.
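The relative improvements quoted for text services follow directly from the reported means and standard deviations, as the short check below shows (values taken from the paragraph above):

```latex
\frac{0.80 - 0.14}{0.80} = \frac{0.66}{0.80} = 0.825 \;(82.5\%),
\qquad
\frac{0.12 - 0.03}{0.12} = \frac{0.09}{0.12} = 0.75 \;(75\%)
```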

As shown in Figure 4, during initial iterations, all models exhibit low average access success rates. As iterations increase, the FL-MD3QN algorithm rapidly improves its success rate and reaches a high level within fewer steps. In contrast, DQN and MD3QN show slower growth in success rates and lower final convergence values. This highlights FL-MD3QN’s faster convergence speed and higher precision. After introducing FL, FL-DQN and FL-DDQN achieve improved success rates but still underperform compared to FL-MD3QN. By integrating MD3QN’s strengths with FL, FL-MD3QN enables collaborative optimization across multiple construction site IoT systems. Its efficient learning mechanisms further enhance data transmission access rates. The rapid convergence of FL-MD3QN’s success rate demonstrates its ability to dynamically adjust bandwidth allocation of APs within sites. This avoids transmission failures caused by channel competition, improving overall construction site management efficiency.
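The collaborative optimization across sites rests on aggregating locally trained models. A minimal sketch follows, assuming plain sample-weighted federated averaging with Gaussian noise added to each site's parameters before upload. The noise scale 0.01 is the FL noise scale from Table 4; the flattened parameter lists, the weighting by local sample count, and the function names are illustrative assumptions and do not reproduce the paper's dynamic aggregation rule.

```python
import random

def add_gaussian_noise(params, noise_scale=0.01, rng=None):
    """Perturb local parameters before upload (a simple privacy-motivated step)."""
    rng = rng or random.Random()
    return [p + rng.gauss(0.0, noise_scale) for p in params]

def federated_average(site_params, site_sizes):
    """Sample-weighted average of per-site parameter vectors."""
    total = sum(site_sizes)
    dim = len(site_params[0])
    return [
        sum(params[i] * n for params, n in zip(site_params, site_sizes)) / total
        for i in range(dim)
    ]

rng = random.Random(42)
# Hypothetical flattened Q-network parameters from three construction sites.
local_models = [[0.10, -0.20, 0.30], [0.12, -0.18, 0.33], [0.08, -0.25, 0.28]]
local_sizes = [100, 300, 200]  # hypothetical number of local samples per site

noisy = [add_gaussian_noise(m, noise_scale=0.01, rng=rng) for m in local_models]
global_model = federated_average(noisy, local_sizes)
clean_average = federated_average(local_models, local_sizes)
```

Only model parameters leave each site, never raw traffic data. Note that adding noise alone does not establish a formal differential-privacy guarantee; that would also require bounding each update and accounting for the privacy budget.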

The proposed framework achieves rapid convergence, indicating that edge servers enhance model convergence speed. As shown in Figure 5, initial average reward values across models are similar. However, as iterations increase, the FL-MD3QN algorithm rapidly improves its average reward and reaches a high level quickly. Specifically, by the 1000th iteration, FL-MD3QN achieves an average reward of 5.23, significantly outperforming DQN (2.87), MD3QN (4.55), FL-DQN (4.20), and FL-DDQN (4.31). The reward growth rate of FL-MD3QN is the fastest, reflecting its superior learning and optimization capabilities. In contrast, DQN shows slow reward growth, indicating limited adaptability in complex transmission environments. MD3QN improves growth speed but remains inferior to FL-MD3QN. FL-DQN and FL-DDQN, while incorporating FL, still have room for performance improvement. FL-MD3QN achieves high average rewards with fewer iterations and maintains stable growth. This demonstrates robust convergence and stability, enabling the algorithm to identify optimal access strategies efficiently. Such capability enhances data transmission efficiency and reliability in dynamic scenarios.

To validate the effectiveness of the reward function design, Table 7 compares four schemes: Scheme 1 is the proposed service-adaptive multi-objective reward function; Scheme 2 is a fixed-weight multi-objective reward (a = 0.3, b = 0.3, c = 0.4); Scheme 3 is a single-objective reward (maximizing transmission rate only); and Scheme 4 is an unconstrained reward (ignoring BER).
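To make the difference between Scheme 1 and Scheme 2 concrete, the sketch below selects the weight triple (a, b, c) by service mode N from Table 2 and contrasts it with the fixed weights of Scheme 2. The weighted-sum form, the mapping of a, b, and c to rate, latency, and reliability terms, and the normalized input scores are assumptions made for illustration; the exact reward expression is not reproduced here.

```python
# Weight triples (a, b, c) per service mode N, copied from Table 2.
SERVICE_WEIGHTS = {
    0: (0.2, 0.1, 0.7),
    1: (0.2, 0.2, 0.6),
    2: (0.3, 0.3, 0.4),
    3: (0.5, 0.3, 0.2),
}
FIXED_WEIGHTS = (0.3, 0.3, 0.4)  # Scheme 2

def weighted_reward(rate_score, latency_score, reliability_score, weights):
    """Weighted sum of three normalized scores in [0, 1] (higher is better).

    Assumption: a weights rate, b weights latency, c weights reliability.
    """
    a, b, c = weights
    return a * rate_score + b * latency_score + c * reliability_score

def adaptive_reward(service_mode, rate_score, latency_score, reliability_score):
    """Scheme 1 style: pick the weights according to the service mode."""
    return weighted_reward(rate_score, latency_score, reliability_score,
                           SERVICE_WEIGHTS[service_mode])

# Hypothetical normalized scores for one access decision.
scores = dict(rate_score=0.9, latency_score=0.6, reliability_score=0.5)
video_adaptive = adaptive_reward(3, **scores)  # a=0.5, b=0.3, c=0.2
video_fixed = weighted_reward(weights=FIXED_WEIGHTS, **scores)
```

Under these assumptions, the same high-rate decision scores higher for service mode 3 (video surveillance in Table 3) with adaptive weights than with fixed weights, so the agent is pushed harder toward whichever objective that service emphasizes.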

Experimental results demonstrate the superiority of Scheme 1 in three respects. Multi-objective optimization: compared to Scheme 2, Scheme 1 achieves a 26.9% increase in average reward value, a 28.6% reduction in video latency, and a 62.5% reduction in BER. Dynamic weighting necessity: Scheme 3 (single-objective) exhibits severe BER degradation (3.80 × 10⁻⁴) and the highest video latency (3.50 s) despite its higher video rate (4.1 Gbps), while Scheme 4 (unconstrained) shows intermediate performance with a higher BER (2.50 × 10⁻⁴). Theoretical validation: the proposed reward function guides policy convergence toward Pareto-optimal solutions, balancing spectrum efficiency, latency, and reliability. This analysis confirms that dynamic weight adjustment and BER constraints are critical for ensuring system reliability and efficiency in complex IoT environments.
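The Pareto argument can be checked directly against the reported numbers. The sketch below applies a standard dominance test to the three physical metrics in Table 7 (video latency and BER minimized, video rate maximized); the values are copied from the table, and the function names are introduced here for illustration only.

```python
# (video latency in s, video rate in Gbps, BER in units of 1e-4), copied from Table 7.
SCHEMES = {
    "Scheme 1": (1.50, 3.8, 0.45),
    "Scheme 2": (2.10, 3.2, 1.20),
    "Scheme 3": (3.50, 4.1, 3.80),
    "Scheme 4": (1.80, 3.5, 2.50),
}

def dominates(p, q):
    """True if p is no worse than q on every metric and strictly better on one.

    Latency and BER are minimized; rate is maximized.
    """
    lat_p, rate_p, ber_p = p
    lat_q, rate_q, ber_q = q
    no_worse = lat_p <= lat_q and rate_p >= rate_q and ber_p <= ber_q
    strictly_better = lat_p < lat_q or rate_p > rate_q or ber_p < ber_q
    return no_worse and strictly_better

def pareto_front(points):
    """Names of the points that no other point dominates."""
    return [
        name for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    ]

front = pareto_front(SCHEMES)
print(front)  # ['Scheme 1', 'Scheme 3']
```

On these three metrics, Scheme 1 dominates Schemes 2 and 4. Scheme 3 remains non-dominated only because of its higher video rate, which it obtains at the cost of the worst latency and BER among the four schemes.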

Table 8 compares the average transmission rates of different models across multi-modal service types. The FL-MD3QN algorithm achieves the highest rates for all services. For text services, FL-MD3QN reaches 5.7 Gbps, marking an 18.8% and 14% improvement over DQN (4.8 Gbps) and MD3QN (5.0 Gbps), respectively. For audio services, FL-MD3QN delivers 5.6 Gbps, outperforming DQN by 24.4% and MD3QN by 19.1%. In image and video services, FL-MD3QN maintains superior performance with rates of 4.4 Gbps and 3.8 Gbps, respectively. These results confirm FL-MD3QN’s ability to efficiently utilize spectrum resources and enhance data transmission efficiency in smart construction site IoT access scenarios. The findings validate the practicality and effectiveness of FL-MD3QN in complex, dynamic smart construction environments.

As shown in Figure 6, FL-MD3QN exhibits minimal BER differences compared to other algorithms under low SNR conditions. At moderate SNR levels, its BER becomes significantly lower than that of competing approaches. Notably, under high SNR scenarios, FL-MD3QN achieves an exceptionally low BER of 0.00045, an 85% reduction compared to the DQN algorithm, which performs poorly. The MD3QN algorithm improves on DQN through its dueling double Q-network structure but still underperforms the FL-based models. Although FL-DQN and FL-DDQN incorporate FL, they do not surpass FL-MD3QN across all conditions. These results confirm that FL-MD3QN ensures higher reliability and stability in data transmission within smart construction site communication environments.

5. Conclusions and Future Work

5.1. Conclusions

This study addresses the IoT intelligent access problem for multi-site, multi-modal, and multi-user scenarios under a shared BS. The problem is modeled as a triple-objective optimization task: maximizing data transmission rate, minimizing access latency, and minimizing BER under bandwidth, rate, and privacy–security constraints. The FL-MD3QN-based intelligent access algorithm is proposed. Simulation results demonstrate that FL-MD3QN outperforms other algorithms in training time, latency, throughput, BER, average access success rate, and reward value. FL-MD3QN reduces training time by 23% and 18.5% compared to DQN and MD3QN, respectively. For latency, FL-MD3QN achieves significant reductions across all service types, particularly 55.9% for video services. In throughput, FL-MD3QN reaches 3.8 Gbps for video transmission, marking improvements of 72.7% and 58.3% over DQN and MD3QN. Under high SNR, FL-MD3QN achieves an ultra-low BER of 0.00045. It also attains high levels in average access success rate and reward value. These results validate FL-MD3QN as an efficient and reliable IoT access solution for smart construction sites.

5.2. Future Work

Future research could evaluate FL-MD3QN in smart construction site scenarios of varying complexity and scale, and assess its robustness and scalability under diverse network environments. To address challenges such as communication latency, model synchronization, and security risks, the multi-objective functions in FL-MD3QN could be further optimized and refined to improve adaptability to increasingly complex transmission environments. Emphasis should be placed on enhancing the algorithm’s real-world applicability through dynamic interference modeling, anti-interference design, and validation on real edge computing platforms. Integrating FL-MD3QN with digital twin and blockchain technologies could strengthen anti-interference capabilities and privacy protection and may enhance the overall performance of IoT systems in smart construction sites, ultimately facilitating the transition from simulation to real-world deployment.

Author Contributions

Conceptualization, Q.Z. and Y.L. (Yong Liao); methodology, J.X. and Y.L. (Yong Liao); software, W.L. and F.P.; validation, Y.L. (Yong Liao); formal analysis, Y.L. (Yang Liao); investigation, W.W. and F.P.; data curation, F.P.; writing—original draft preparation, Y.L. (Yong Liao); writing—review and editing, Y.L. (Yong Liao); visualization, Y.L. (Yong Liao); supervision, W.L., Y.L. (Yang Liao), and Y.L. (Yong Liao). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the China Three Gorges Corporation—Research, Development, and Application of Data Acquisition and Secure Transmitter for Environmental Monitoring IoT Devices (NBZZ202400302).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Authors Qiangwen Zong, Jiaxiang Xu, Wenqiang Li, Feng Pan, Wenting Wang and Yang Liao were employed by the company Three Gorges High-Tech Information Technology Co., Ltd. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Márquez-Sánchez, S.; Campero-Jurado, I.; Robles-Camarillo, D.; Rodríguez, S.; Corchado-Rodríguez, J.M. BeSafe B2.0 Smart Multisensory Platform for Safety in Workplaces. Sensors 2021, 21, 3372.
2. Liu, H.; Tian, N.; Song, D.-A.; Zhang, L. Digital Twin-Enabled Multi-Service Task Offloading in Vehicular Edge Computing Using Soft Actor-Critic. Electronics 2025, 14, 686.
3. Shaukat, K.; Luo, S.; Varadharajan, V. A novel deep learning-based approach for malware detection. Eng. Appl. Artif. Intell. 2023, 122, 106030.
4. Attajer, A.; Mecheri, B. Framework for Modeling the Propagation of Disturbances in Smart Construction Sites. In Proceedings of the 21st International Conference on Smart Business Technologies, Dijon, France, 9–11 July 2024; pp. 80–87.
5. Manzoor, H.U.; Shabbir, A.; Chen, A.; Flynn, D.; Zoha, A. A Survey of Security Strategies in Federated Learning: Defending Models, Data, and Privacy. Future Internet 2024, 16, 374.
6. Sileikis, V.; Wang, W. Smart Contract for Relay Verification Collaboration Rewarding in NOMA Wireless Communication Networks. Electronics 2025, 14, 706.
7. Wu, J.; Qiu, Z.; Dai, M.; Bao, J.; Xu, X.; Cao, W. Distributed Sequential Detection for Cooperative Spectrum Sensing in Cognitive Internet of Things. Sensors 2024, 24, 688.
8. He, J.; Guo, S.; Pan, G.; Yang, Y.; Liu, D. Relay cooperation and outage analysis in cognitive radio networks with energy harvesting. IEEE Syst. J. 2016, 12, 2129–2140.
9. Li, H.; Zhao, X. Throughput maximization with energy harvesting in UAV-assisted cognitive mobile relay networks. IEEE Trans. Cogn. Commun. Netw. 2020, 7, 197–209.
10. Geirhofer, S.; Tong, L.; Sadler, B.M. Cognitive radios for dynamic spectrum access-dynamic spectrum access in the time domain: Modeling and exploiting white space. IEEE Commun. Mag. 2007, 45, 66–72.
11. Lai, J.; Dutkiewicz, E.; Liu, R.P.; Vesilo, R. Opportunistic spectrum access with two channel sensing in cognitive radio networks. IEEE Trans. Mob. Comput. 2013, 14, 126–138.
12. Kang, S.; Joo, C. Low-complexity learning for dynamic spectrum access in multi-user multi-channel networks. IEEE Trans. Mob. Comput. 2020, 20, 3267–3281.
13. Zhang, Z.; Lu, Y.; Huang, Y.; Zhang, P. Neural network-based relay selection in two-way SWIPT-enabled cognitive radio networks. IEEE Trans. Veh. Technol. 2020, 69, 6264–6274.
14. Lv, P.; Fu, M.; Zhuo, Y.; Zhao, H.; Zhang, J. A dynamic spectrum access method based on Q-learning. In Proceedings of the 2020 International Workshop on Electronic Communication and Artificial Intelligence, Shanghai, China, 12–14 June 2020; pp. 135–141.
15. Cao, H.; Tian, H.; Cai, J.; Alfa, A.S.; Huang, S. Dynamic load-balancing spectrum decision for heterogeneous services provisioning in multi-channel cognitive radio networks. IEEE Trans. Wirel. Commun. 2017, 16, 5911–5924.
16. Naparstek, O.; Cohen, K. Deep multi-user reinforcement learning for distributed dynamic spectrum access. IEEE Trans. Wirel. Commun. 2018, 18, 310–323.
17. Zheng, K.; Jia, X.; Chi, K.; Liu, X. DDPG-based joint time and energy management in ambient backscatter-assisted hybrid underlay CRNs. IEEE Trans. Commun. 2022, 71, 441–456.
18. Zhang, S.; Lam, K.-Y.; Shen, B.; Wang, L.; Li, F. Dynamic spectrum access for Internet-of-Things with hierarchical federated deep reinforcement learning. Ad Hoc Netw. 2023, 149, 103257.
19. Fu, Y.; Shen, Y.; Tang, L. A Dynamic Task Allocation Framework in Mobile Crowd Sensing with D3QN. Sensors 2023, 23, 6088.
20. Bai, Y.; Wang, D.; Huang, G.; Song, B. A deep-reinforcement-learning-based social-aware cooperative caching scheme in D2D communication networks. IEEE Internet Things J. 2023, 10, 9634–9645.
21. Zhuang, X.; Luo, C.; Xie, Z.; Li, Y.; Jiang, L. Age-Aware Scheduling for Federated Learning with Caching in Wireless Computing Power Networks. Electronics 2025, 14, 663.
22. Abad, M.S.H.; Ozfatura, E.; Gunduz, D.; Ercetin, O. Hierarchical federated learning across heterogeneous cellular networks. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, 4–8 May 2020; pp. 8866–8870.
23. Liu, L.; Zhang, J.; Song, S.H.; Letaief, K.B. Client-edge-cloud hierarchical federated learning. In Proceedings of the ICC 2020-2020 IEEE International Conference on Communications, Dublin, Ireland, 7–11 June 2020; pp. 1–6.
24. Wu, Q.; He, K.; Chen, X. Personalized federated learning for intelligent IoT applications: A cloud-edge based framework. IEEE Open J. Comput. Soc. 2020, 1, 35–44.
25. Li, F.; Shen, B.; Guo, J.; Lam, K.-Y.; Wei, G.; Wang, L. Dynamic spectrum access for internet-of-things based on federated deep reinforcement learning. IEEE Trans. Veh. Technol. 2022, 71, 7952–7956.
26. Kostas, K.; Just, M.; Lones, M.A. IoTDevID: A behavior-based device identification method for the IoT. IEEE Internet Things J. 2022, 9, 23741–23749.
27. Gebremichael, T.; Ledwaba, L.P.I.; Eldefrawy, M.H.; Hancke, G.P.; Pereira, N.; Gidlund, M.; Akerberg, J. Security and Privacy in the Industrial Internet of Things: Current Standards and Future Challenges. IEEE Access 2020, 8, 152351–152366.
28. Dao, N.N.; Tu, N.H.; Hoang, T.D.; Nguyen, T.H.; Nguyen, L.V.; Lee, K.; Park, L.; Na, W.; Cho, S. A review on new technologies in 3GPP standards for 5G access and beyond. Comput. Netw. 2024, 245, 110370.

Figure 1. Multi-modal multi-user access to a unified BS in IoT application scenarios for construction sites.

Figure 2. Multi-site, multi-modal, multi-user IoT intelligent access framework.

Figure 3. MD3QN processing workflow.

Figure 4. Average access success rate.

Figure 5. Average reward value.

Figure 6. BER analysis of different models.

Table 1. 26-Dimensional IoT traffic features.

Category	Features
Link-layer protocol	ARP/LLC
Network-layer protocol	IP/ICMP/ICMP6/EAPOL/payload_l
Transport-layer protocol	TCP/UDP/TCP_w_size
Application-layer protocol	HTTP/HTTPS/DHCP/BOOTP/SSDP/DNS/MDPS/NTP
IP selection	Padding/RouterAlert/count
Packet content	Size/Raw data
IP address	Destination IP counter
Port type	Source/Destination

Table 2. Weight coefficient allocation table.

N	a	b	c
0	0.2	0.1	0.7
1	0.2	0.2	0.6
2	0.3	0.3	0.4
3	0.5	0.3	0.2

Table 3. Smart construction site configuration.

User Type	Service Mode	Typical Devices
Environmental Monitoring Terminal	0	Temperature/Humidity/PM2.5 Sensors
Voice Communication Device	1	Smart Safety Helmets, Emergency Intercoms
Personnel Positioning Tag	2	UWB Positioning Modules, RFID Terminals
Video Surveillance Device	3	4K HD Cameras, Drone Inspection Systems

Table 4. Simulation parameter settings.

Parameter	Setting
Channel Count	80
Learning Rate	0.95
Discount Factor	0.96
Batch Size	50
ε	0.3
Optimizer	Adam
Multi-modal Service Distribution	Video 30%, Text 20%, Audio 10%, Image 40%
User Count	300
Construction Site Count	20
Initial Experience Pool Size	50,000
Priority Sampling Coefficient	0.6
Target Network Update Period	200
FL Noise Scale	0.01
Data Encryption Rate	100%
Data Leakage Risk Threshold	0.01%
Air Interface Latency	5 ms
Mobility Support	High Speed
Bandwidth	20 MHz
Operating Frequency Band	2.6 GHz

Table 5. Comparison of complexity, federated communication overhead, and training time across different algorithms.

Model	Complexity	Federated Communication Overhead	Training Time (min)
DQN [21]	O(d²)	None	143
MD3QN	O(0.85d²)	None	135
FL-DQN [22]	O(1.5d²)	High	118
FL-MD3QN	O(0.72d²)	Low	110
FL-DDQN [20]	O(1.5d²)	Relatively High	119

Table 6. Average latency comparison of different models across multi-modal service types.

Model	Text Delay (s)	Audio Delay (s)	Image Delay (s)	Video Delay (s)
DQN	0.80 (±0.12)	0.90 (±0.15)	1.30 (±0.20)	3.40 (±0.45)
MD3QN	0.60 (±0.10)	0.75 (±0.12)	1.00 (±0.18)	2.80 (±0.35)
FL-DQN	0.44 (±0.08)	0.49 (±0.07)	0.96 (±0.15)	2.20 (±0.30)
FL-MD3QN	0.14 (±0.03)	0.25 (±0.05)	0.55 (±0.10)	1.50 (±0.20)
FL-DDQN	0.22 (±0.04)	0.29 (±0.06)	0.78 (±0.12)	1.90 (±0.25)

Table 7. Performance analysis of different reward functions.

Scheme	Average Reward Value	Video Latency (s)	Video Rate (Gbps)	BER (×10⁻⁴)
Scheme 1	5.23	1.50	3.8	0.45
Scheme 2	4.12	2.10	3.2	1.20
Scheme 3	3.85	3.50	4.1	3.80
Scheme 4	4.75	1.80	3.5	2.50

Table 8. Average transmission rate comparison of different models across multi-modal service types.

Model	Text Rate (Gbps)	Audio Rate (Gbps)	Image Rate (Gbps)	Video Rate (Gbps)
DQN	4.8	4.5	2.5	2.2
MD3QN	5.0	4.7	3.3	2.4
FL-DQN	5.1	5.3	4.0	2.4
FL-MD3QN	5.7	5.6	4.4	3.8
FL-DDQN	5.5	5.4	4.1	3.1

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
