Energy-Efficient and Secure Double RIS-Aided Wireless Sensor Networks: A QoS-Aware Fuzzy Deep Reinforcement Learning Approach

Khatami, Sarvenaz Sadat; Shoeibi, Mehrdad; Salehi, Reza; Kaveh, Masoud

doi:10.3390/jsan14010018


¹ Department of Data Science Engineering, University of Houston, Houston, TX 77204, USA

² The WPI Business School, Worcester Polytechnic Institute, Worcester, MA 01605, USA

³ Department of Information and Communication Engineering, Aalto University, 02150 Espoo, Finland

* Author to whom correspondence should be addressed.

J. Sens. Actuator Netw. 2025, 14(1), 18; https://doi.org/10.3390/jsan14010018

Submission received: 22 December 2024 / Revised: 3 February 2025 / Accepted: 7 February 2025 / Published: 10 February 2025

(This article belongs to the Special Issue Applications of Wireless Sensor Networks: Innovations and Future Trends)


Abstract

Wireless sensor networks (WSNs) are a cornerstone of modern Internet of Things (IoT) infrastructure, enabling seamless data collection and communication for many IoT applications. However, the deployment of WSNs in remote or inaccessible locations poses significant challenges in terms of energy efficiency and secure communication. Sensor nodes, with their limited battery capacities, require innovative strategies to minimize energy consumption while maintaining robust network performance. Additionally, ensuring secure data transmission is critical for safeguarding the integrity and confidentiality of IoT systems. Despite various advancements, existing methods often fail to strike an optimal balance between energy efficiency and quality of service (QoS), either depleting limited energy resources or compromising network performance. This paper introduces a novel framework that integrates double reconfigurable intelligent surfaces (RISs) into WSNs to enhance energy efficiency while ensuring secure communication. To jointly optimize both RIS phase shift matrices, we employ a fuzzy deep reinforcement learning (FDRL) framework that integrates reinforcement learning (RL) with fuzzy logic and long short-term memory (LSTM)-based architecture. The RL component learns optimal actions by iteratively interacting with the environment and updating Q-values based on a reward function that prioritizes both energy efficiency and secure communication. The LSTM captures temporal dependencies in the system state, allowing the model to make more informed predictions about future network conditions, while the fuzzy logic layer manages uncertainties by using optimized membership functions and rule-based inference. 
To explore the search space efficiently and identify optimal parameter configurations, we use the multi-objective artificial bee colony (MOABC) algorithm to fine-tune the hyperparameters of the FDRL framework while simultaneously optimizing the membership functions of the fuzzy logic system, which improves decision-making accuracy under uncertain conditions. The MOABC algorithm enhances convergence speed and ensures the adaptability of the proposed framework in dynamically changing environments. The framework dynamically adjusts the RIS phase shift matrices, ensuring robust adaptability under varying environmental conditions and maximizing energy efficiency and secure data throughput. Simulation results validate the effectiveness of the proposed FDRL-based double RIS framework under different system configurations, demonstrating significant improvements in energy efficiency and secrecy rate compared to existing methods. Specifically, quantitative analysis demonstrates that the FDRL framework improves energy efficiency by 35.4%, the secrecy rate by 29.7%, and RSMA by 27.5%, compared to the second-best approach. Additionally, the model achieves an R² score improvement of 12.3%, confirming its superior predictive accuracy.

Keywords: wireless sensor networks; energy efficiency; quality of service; secrecy rate; double reconfigurable intelligent surface; fuzzy reinforcement learning

1. Introduction

Wireless sensor networks (WSNs) are a key enabler of modern Internet of Things (IoT) systems, consisting of spatially distributed sensor nodes that monitor physical or environmental parameters such as temperature, humidity, pressure, and motion [1,2,3]. These networks seamlessly collect, process, and transmit data to centralized systems for analysis and decision-making [4]. WSNs have become integral to various applications, including smart grids for real-time energy management, healthcare systems for remote patient monitoring, industrial automation for predictive maintenance, and environmental monitoring for tracking pollution levels or natural disasters [5]. In smart cities, WSNs facilitate efficient infrastructure management, traffic optimization, and improved public safety. The versatility and scalability of WSNs make them an essential component in building smarter and more connected environments [6,7].

Despite their wide-ranging applications, WSNs face significant challenges due to their resource-constrained nature. Sensor nodes are typically powered by batteries with limited capacity, which poses a major limitation, especially in remote or inaccessible deployment areas where frequent battery replacement is impractical [8,9,10]. Energy efficiency is critical to ensure long network operation, as excessive energy consumption impacts network coverage, throughput, and overall quality of service (QoS) [11,12,13,14]. In addition, maintaining consistent communication under varying environmental conditions adds to the complexity. To address these issues, energy harvesting (EH) has emerged as a promising solution, enabling sensor nodes to harness energy from ambient sources such as solar, thermal, or radio frequency signals [15,16,17]. EH not only enhances the longevity of WSNs but also reduces maintenance costs, making WSNs more sustainable for long-term operation.

Reconfigurable intelligent surfaces (RISs) have recently gained attention as an innovative technology to overcome key limitations in wireless communication systems, including WSNs [18,19,20,21,22]. An RIS is a passive, programmable surface composed of a large number of reflecting elements that can intelligently manipulate the propagation of electromagnetic waves. By dynamically adjusting the phase shifts of these elements, RIS enhances signal propagation, energy efficiency, and coverage in WSNs, even under adverse conditions [23]. In WSNs, RIS plays a crucial role in optimizing EH, extending network coverage, and improving throughput without adding significant hardware complexity. The integration of RIS into WSNs enables cost-effective, scalable, and energy-efficient solutions that address the dual challenges of resource constraints and reliable communication in IoT-driven applications [24,25,26].

1.1. Research Gaps and Motivations

WSNs are resource-constrained systems where energy efficiency plays a pivotal role in ensuring their long-term sustainability. However, achieving efficient EH in WSNs is inherently challenging due to several technical factors [27,28]. First, sensor nodes are typically deployed in remote or inaccessible locations, where energy resources are often sporadic, intermittent, and unpredictable. Harvesting energy from ambient sources such as solar, thermal, or radio frequency signals is highly dependent on environmental conditions, which can vary significantly over time and space. This variability makes it difficult to maintain consistent energy generation to meet network demands. Furthermore, the dynamic nature of the WSN operation, including irregular sensing and transmission activities, further complicates the EH process, requiring robust and adaptive energy management mechanisms [29,30,31]. Although various EH schemes have been proposed in the literature, most approaches are tailored to specific scenarios or energy sources and fail to address the multifaceted challenges associated with efficient EH in real-world WSN deployments. There remains a lack of robust frameworks capable of dynamically adapting to changing energy availability while maintaining network performance. Therefore, a more holistic approach that integrates efficient EH with optimization of network operation is required.

Coverage in traditional WSNs is another critical issue, mainly due to the energy, computational, and resource limitations of the sensor nodes. Most sensor nodes operate on limited battery power and have minimal computational capacity, which restricts their sensing range and communication capabilities [32,33,34]. Moreover, many WSNs rely on backscattering communication, a low-power method that reflects ambient signals for data transmission. Although backscattering is energy efficient, it suffers from significant limitations, including low transmission range, reduced signal quality, and increased susceptibility to environmental interference [35,36]. These limitations further hinder reliable communication and network connectivity in WSNs. The restricted coverage and communication reliability also impact the physical layer security (PLS) of WSNs. Achieving a higher secrecy rate (SR) in WSNs becomes challenging under such constraints, as energy and computational limitations restrict the implementation of advanced security mechanisms. Thus, ensuring secure communication while maintaining energy efficiency requires novel strategies that can address these intertwined challenges simultaneously [37,38,39]. QoS is typically evaluated based on parameters such as latency, throughput, and reliability. However, in security-constrained WSNs, QoS must also encompass secure communication by maintaining an acceptable SR to prevent unauthorized interception. In this work, QoS-aware communication is defined as achieving a balance between maximizing SR and energy efficiency, ensuring that secure data transmission is maintained while optimizing resource consumption.

Most existing EH schemes for WSNs focus solely on improving energy efficiency without adequately considering QoS parameters such as network performance, reliability, and security. QoS plays a crucial role in enabling WSNs to deliver consistent performance, especially in mission-critical applications like environmental monitoring, smart cities, and healthcare [40]. A scheme that can address both EH and QoS metrics, including network performance and security (e.g., achieving higher secrecy rates), remains largely absent in the existing literature. To overcome these challenges, this paper introduces a novel framework that integrates double RIS into WSNs. Double RIS significantly enhances EH and communication reliability by intelligently controlling and optimizing the reflection of electromagnetic waves, thus improving signal propagation, coverage, and energy efficiency. To achieve this, we propose a fuzzy deep reinforcement learning (FDRL)-based optimization scheme integrated with long short-term memory (LSTM). This optimization approach dynamically adjusts the phase shift matrices of the RIS while managing uncertainties in the network environment using fuzzy logic. The proposed framework not only maximizes EH but also ensures QoS-aware secure communication, achieving higher secrecy rates and robust network performance.

1.2. Paper Contributions and Organization

Based on the existing research gaps and the importance of energy efficiency and secure communication in WSNs, the main contributions of this paper are summarized as follows:

We present innovative communication and optimization frameworks for WSNs that leverage double RIS to concurrently tackle EH limitations, expand coverage area, and enhance achievable SR, thereby effectively resolving the complex trade-off between energy efficiency and QoS in WSNs.
To achieve these goals, we introduce a novel FDRL framework that processes raw input data, including sensor states and environmental parameters, through an input layer, where the data are preprocessed and structured for further analysis. These data are then passed into the DRL component, leveraging an LSTM-based architecture to capture temporal patterns and dependencies. By leveraging sequential data analysis, the LSTM component enables precise prediction of future states, which is crucial for dynamic and resource-constrained IoT environments. Simultaneously, the input data are directed to the fuzzy logic layer, where they are processed through optimized membership functions and fuzzy rules. This integration translates uncertain inputs into actionable insights, providing robust decision-making capabilities.
Furthermore, we take advantage of a multi-objective artificial bee colony (MOABC) algorithm that plays a vital role in the learning process in FDRL by optimizing both the membership functions in the fuzzy layer and the hyperparameters of the DRL architecture, ensuring that the framework adapts effectively to diverse real-world scenarios with minimal human intervention. The outputs from the DRL and fuzzy layers converge in the output layer, where decisions derived from temporal prediction and uncertainty management are integrated. This fusion ensures that the final actions are not only energy-efficient but also maintain QoS above specified thresholds. This multi-layered optimization approach establishes a balance between performance and interpretability, making the framework both adaptive and explainable.
We conduct extensive simulations under various experimental settings. The results demonstrate that the proposed FDRL framework exhibits superior performance across various metrics, including root mean square error (RMSE), mean absolute percentage error (MAPE), standard deviation (SD), runtime, and convergence trends compared with other optimization frameworks. Additionally, the results show that optimizing RIS elements with the proposed FDRL approach significantly improves energy harvesting and secrecy rate compared to other optimization techniques and conventional WSN scenarios, including those using a single RIS or without RIS integration.

The remainder of this paper is organized as follows: Section 2 reviews related work on ML-based approaches for energy-efficient and secure WSNs. Section 3 describes the system model and problem formulation. Section 4 presents the materials and methodology of the proposed FDRL framework. Section 5 provides the simulation results, highlighting the performance of the proposed approach. Section 6 discusses the results and compares them with existing frameworks. Finally, Section 7 concludes the paper with a summary of the findings and potential future research directions.

2. Related Works

WSNs have been widely studied to address energy efficiency challenges, particularly with the integration of EH techniques. Several notable works have explored optimization strategies to improve energy efficiency, throughput, and secure communication in EH WSNs. Zheng et al. [41] investigated sensor activation control for green energy optimization in energy-harvesting WSNs (EH-WSNs). Their work addressed the spatial and temporal diversities of energy generation and target distribution, proposing a two-dimensional optimization: dynamic mode adaptation for temporal energy balancing and game theory-based spatial optimization. In addition, reinforcement learning techniques were introduced to manage temporal mode adaptation in dynamic environments, and simulations demonstrated their effectiveness. Yang et al. [42] proposed a distributed optimization framework for EH-WSNs, incorporating in-network data processing operations such as aggregation and compression. They formulated a stochastic optimization problem to jointly optimize edge operations involving sensing, networking, and CPU-intensive operations. A lightweight online algorithm, recycling wasted energy (RWE), was introduced and validated using the FIT IoT-LAB testbed and simulations, achieving substantial energy recycling and network utility improvements.

Ma et al. [43] explored throughput optimization in industrial WSNs with EH relays. Their work considered the reliability constraint of industrial information transmission, addressing the non-convex optimization problem using the successive convex approximation approach. A power allocation algorithm was also designed to maximize the total transmission rate, with simulation results validating throughput improvements under reliability constraints. Ghosh et al. [44] focused on optimizing energy efficiency in EH-WSNs by modeling the problem as a stochastic multi-armed bandit framework. With limited channel state information, an algorithm based on the upper confidence bound was proposed to learn the optimal power levels for wireless energy transmission. The numerical results demonstrated the significant energy efficiency gains achieved compared to the benchmark approaches. Marriwala [45] designed an EH system optimized using high-bandwidth rectennas for WSNs. The rectenna, which operates both in narrowband and broadband mode (800 MHz–7 GHz), harvests RF energy from various sources such as mobile phones, Wi-Fi, and broadcasting systems. The proposed system offered a versatile solution for EH, eliminating frequent battery replacements and enabling remote operations. Azarhava and Niya [46] presented an energy-efficient resource allocation method for TDMA-based wireless EH sensor networks (WEHSN). Their work derived a closed-form solution using the Dinkelbach method and Karush–Kuhn–Tucker (KKT) conditions to optimize energy efficiency under time-scheduling and power constraints. The numerical results confirmed the effectiveness of their approach in extending the useful life of the sensor.

Pitchai [47] investigated the maximization of energy efficiency in wireless-powered sensor networks (WPSNs). The problem was formulated as a nonlinear fractional programming model and transformed using the Dinkelbach method. In addition, a particle swarm optimization (PSO)-based algorithm was proposed to solve the convex optimization problem, achieving significant improvements in energy efficiency. Gupta et al. [48] addressed the energy constraints of WSNs, which are critical for applications in smart cities, smart parking, and smart buildings. The study focuses on overcoming the limited energy capacity of WSN nodes by utilizing an efficient solar EH system. To maximize the network lifetime, the authors proposed an MPPT-EPO optimized solar EH technique, integrating the emperor penguin optimization (EPO) algorithm to optimize the maximum power point tracking (MPPT) process. The system uses a SEPIC converter to boost the electrical energy generated from solar panels, ensuring sufficient voltage for charging batteries and supplying energy to various sensor nodes. The proposed approach significantly enhances EH efficiency, leading to prolonged WSN lifetime and reliable network operation.

Recently, various ML-based techniques have been developed to enhance the performance and security of WSNs. These approaches leverage advanced optimization, clustering, routing, and intrusion detection methods to address challenges such as energy efficiency, data reliability, and network security in dynamic and resource-constrained environments. Yun and Yoo [49] proposed a Q-learning-based data-aggregation-aware energy-efficient routing algorithm for WSNs. The algorithm integrates reinforcement learning to optimize the routing path by considering the degrees of possible data aggregation and energy metrics. Simulation results demonstrated the protocol’s effectiveness in reducing data redundancy and extending network lifetime compared to conventional methods. Zhu et al. [50] investigated UAV-aided WSNs, proposing a deep reinforcement learning (DRL)-based Ptr-A* model to optimize UAV trajectory planning for minimizing energy consumption. Their method efficiently learned the UAV’s trajectory while generalizing well to WSNs with varying cluster sizes, outperforming baseline techniques. Lakshmanna et al. [51] introduced an improved metaheuristic-driven energy-aware cluster-based routing (IMD-EACBR) model for IoT-assisted WSNs. The model integrates Archimedes optimization for cluster organization and teaching–learning-based optimization for multi-hop routing, achieving enhanced energy efficiency, load balancing, and prolonged network lifetime. Kumar et al. [52] addressed security and energy efficiency in WSNs using an improved deep convolutional neural network (IDCNN) for malicious node detection and energy-efficient data transmission. Their approach combined IDCNN with t-DSBO-based cluster head selection, significantly improving security and energy utilization compared to existing techniques.

Ghadi et al. [53] explored the application of ML techniques in securing WSNs. Their study highlighted challenges in ML adaptation for security and discussed various algorithms’ roles in enhancing safety and energy efficiency in WSNs, emphasizing unresolved issues in the domain. Zhong et al. [54] proposed a Q-learning-based vegetation evolution algorithm (QVEGE) for WSN coverage optimization. By incorporating exploitation and exploration archives, the QVEGE balanced these aspects effectively, outperforming state-of-the-art metaheuristics on WSN coverage and optimization tasks. Bukhari et al. [55] developed a federated learning-based SCNN-Bi-LSTM intrusion detection system for WSNs. This model enhanced privacy and detection performance by leveraging federated learning for collaborative training, achieving superior accuracy in identifying complex attacks compared to traditional IDSs. Mahmood et al. [56] proposed an energy-optimized data fusion approach for WSN-IoT networks. Their RNN-LSTM-based method dynamically routed and balanced loads, improving energy efficiency, latency, and throughput. The approach demonstrated significant advantages over conventional routing methods. Rajasoundaran et al. [57] introduced a secure and optimized intrusion detection system for UWSNs. The model employed LSTM-MAC principles and GAN-based channel assessments to enhance security and reliability under challenging underwater conditions, outperforming existing techniques by 5–10% across multiple metrics.

3. System Model and Problem Formulation

Figure 1 shows the system model under consideration in this paper, a double-RIS-aided WSN communication system. The network consists of an RF source, a sensor node (SN), two RISs ($\mathrm{RIS}_1$ and $\mathrm{RIS}_2$), an eavesdropper (Eve), and a gateway (GW). The RF source transmits energy signals that the SN utilizes for both EH and secure communication. Two RISs assist in reflecting and enhancing RF signals, where $\mathrm{RIS}_1$, equipped with $N_1$ passive elements, is deployed to enhance the harvested energy at the SN, and $\mathrm{RIS}_2$, with $N_2$ passive elements, is responsible for improving the signal-to-noise ratio (SNR) and the achievable secrecy rate at the GW [58,59,60]. The RF source serves as the primary energy supplier, transmitting continuous unmodulated RF signals over designated frequency bands. These signals can be easily separated from the information-bearing signals at the GW using interference cancellation techniques. The RF source could be a dedicated power transmitter within the WSN setting or an ambient RF source, such as a wireless access point, a base station, or other RF-emitting devices. The SN is a key component of the system and is responsible for both EH and data transmission. The SN harvests energy from the unmodulated RF signals received from the RF source, which enables it to operate autonomously in resource-constrained environments. The harvested energy is used to power the SN’s operations, including sensing, processing, and transmitting data. The SN transmits its sensed data to the GW, either directly or via the RIS-assisted reflected paths, depending on the channel conditions. Due to the limited energy availability, optimizing the harvested energy at the SN is crucial for extending its operational lifetime and ensuring sustainable functionality.

The GW serves as the data aggregation point in the WSN. The GW receives the data transmitted by the SN, either through the direct channel or via the reflected signals from $\mathrm{RIS}_2$, which improves the SNR at the receiver. Advanced signal processing techniques are applied at the GW to separate the unmodulated energy signals from the information-bearing signals. The GW ensures reliable communication by mitigating interference and noise, decoding the received data accurately, and forwarding it to a central server or decision-making system for further analysis. The GW plays a critical role in maintaining network performance by ensuring high-quality signal reception and processing. Eve represents a passive adversary attempting to intercept the information transmitted from the SN to the GW. Eve can receive signals through both the direct channel and the reflected paths from $\mathrm{RIS}_2$, which poses a significant security threat to the WSN. To achieve secure communication, it is essential to degrade the signal strength at Eve while enhancing the SNR at the GW. The presence of Eve necessitates the maximization of the achievable secrecy rate, which quantifies the difference between the rates at the legitimate receiver (GW) and Eve. Ensuring a higher SR prevents Eve from successfully decoding the transmitted information, thereby safeguarding the confidentiality of the communication in the network.

The communication channels in the network experience Rayleigh fading due to the likelihood of obstacles existing in WSNs, and all channel state information (CSI) is assumed to be perfectly known at RISs [61,62]. Let $h_{PS}$ denote the direct channel from the RF source to the SN, and $h_{SG}$ represent the direct channel from the SN to the GW. Similarly, $h_{SE}$ captures the direct channel between the SN and Eve. The reflected channels through $\mathrm{RIS}_1$ and $\mathrm{RIS}_2$ are influenced by the phase shift matrices of the RISs. Let $\Theta_1 = \mathrm{diag}(e^{j\phi_1}, e^{j\phi_2}, \ldots, e^{j\phi_{N_1}})$ and $\Theta_2 = \mathrm{diag}(e^{j\phi_1}, e^{j\phi_2}, \ldots, e^{j\phi_{N_2}})$ represent the phase shift matrices of $\mathrm{RIS}_1$ and $\mathrm{RIS}_2$, respectively, where $\phi_n$ denotes the phase shift of the $n$-th RIS element.
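The role of the diagonal phase shift matrix can be illustrated with a small numerical sketch. The code below is not the paper's method: it uses only the Python standard library, draws hypothetical Rayleigh channels, and evaluates the cascaded term $h_{in}^{H} \Theta h_{out}$ for random phases and for phases chosen to align every reflected term. The function names, the element count, and the seed are assumptions made for illustration, and the alignment rule only maximizes the magnitude of a single reflected term; it ignores the direct path and any secrecy objective.

```python
import cmath
import math
import random


def rayleigh_channel(n, rng):
    """Draw n i.i.d. CN(0, 1) coefficients (their magnitudes are Rayleigh distributed)."""
    std = math.sqrt(0.5)
    return [complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for _ in range(n)]


def cascaded_gain(h_in, phases, h_out):
    """Evaluate h_in^H diag(e^{j*phi}) h_out without building the matrix."""
    return sum(a.conjugate() * cmath.exp(1j * p) * b
               for a, p, b in zip(h_in, phases, h_out))


def aligned_phases(h_in, h_out):
    """Phases that rotate every term of the cascaded sum onto the positive real axis."""
    return [-cmath.phase(a.conjugate() * b) for a, b in zip(h_in, h_out)]


rng = random.Random(0)            # fixed seed so the sketch is repeatable
n1 = 16                           # hypothetical number of RIS elements
h_pr1 = rayleigh_channel(n1, rng)  # RF source -> RIS_1 (illustrative draw)
h_sr1 = rayleigh_channel(n1, rng)  # RIS_1 -> SN (illustrative draw)

random_phi = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n1)]
best_phi = aligned_phases(h_pr1, h_sr1)

g_rand = abs(cascaded_gain(h_pr1, random_phi, h_sr1))
g_best = abs(cascaded_gain(h_pr1, best_phi, h_sr1))
print(f"|reflected term| with random phases:  {g_rand:.3f}")
print(f"|reflected term| with aligned phases: {g_best:.3f}")
```

With aligned phases, the magnitude of the reflected term equals the sum of the per-element products $|h_{in,n}|\,|h_{out,n}|$, which by the triangle inequality is the largest value any unit-modulus configuration can reach for that single term.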

The RF signals reflected via $\mathrm{RIS}_1$ and $\mathrm{RIS}_2$ reach the SN and GW, respectively. The channels between the RF source and $\mathrm{RIS}_1$, $\mathrm{RIS}_1$ and SN, $\mathrm{RIS}_2$ and SN, and $\mathrm{RIS}_2$ and GW are denoted as $h_{PR_1} \in \mathbb{C}^{N_1 \times 1}$, $h_{SR_1} \in \mathbb{C}^{N_1 \times 1}$, $h_{SR_2} \in \mathbb{C}^{N_2 \times 1}$, and $h_{GR_2} \in \mathbb{C}^{N_2 \times 1}$, respectively. Additionally, the reflected channel from $\mathrm{RIS}_2$ to Eve is denoted as $h_{ER_2} \in \mathbb{C}^{N_2 \times 1}$. The received signal at the SN, including contributions from both the direct and reflected paths, can be expressed as follows:

$$y_{SN} = \sqrt{P}\, h_{PS}\, s + \sqrt{P}\, h_{PR_1}^{H} \Theta_1 h_{SR_1}\, s + n_{SN}, \tag{1}$$

where $P$ is the transmit power of the RF source, $s$ is the transmitted signal with $\mathbb{E}[|s|^2] = 1$, and $n_{SN} \sim \mathcal{CN}(0, \sigma^2)$ represents the additive white Gaussian noise (AWGN) at the SN. The reflected signal via $\mathrm{RIS}_1$ improves the received signal strength at the SN, enabling efficient energy harvesting. The harvested energy at the SN can be modeled as follows:

$$P_{EH} = \eta \left( |h_{PS}|^2 P + \left| h_{PR_1}^{H} \Theta_1 h_{SR_1} \right|^2 P \right), \tag{2}$$

where $\eta \in (0, 1)$ denotes the energy conversion efficiency of the SN. In addition to energy harvesting, the SN transmits its data to the GW using the harvested energy. The received signal at the GW, considering both the direct path and the reflection from $\mathrm{RIS}_2$, can be written as follows:

$$\begin{aligned} y_{GW} = \beta \Big( & \sqrt{P}\, h_{SG} \left( h_{PS} + h_{PR_1}^{H} \Theta_1 h_{SR_1} \right) s \\ & + \sqrt{P}\, h_{SR_2}^{H} \Theta_2 h_{GR_2} \left( h_{PS} + h_{PR_1}^{H} \Theta_1 h_{SR_1} \right) s \Big) + n_{GW}, \end{aligned} \tag{3}$$

where $s$ is the transmitted signal with $\mathbb{E}[|s|^2] = 1$, and $n_{GW} \sim \mathcal{CN}(0, \sigma^2)$ represents AWGN at the GW. The signal received at Eve can similarly be modeled as follows:

$$\begin{aligned} y_{Eve} = \beta \Big( & \sqrt{P}\, h_{SE} \left( h_{PS} + h_{PR_1}^{H} \Theta_1 h_{SR_1} \right) s \\ & + \sqrt{P}\, h_{SR_2}^{H} \Theta_2 h_{ER_2} \left( h_{PS} + h_{PR_1}^{H} \Theta_1 h_{SR_1} \right) s \Big) + n_{Eve}. \end{aligned} \tag{4}$$

The presence of Eve introduces a significant security challenge, as Eve attempts to decode the SN’s transmission. To ensure secure communication between the SN and GW, the achievable secrecy rate (SR) is considered. The signal-to-interference-plus-noise ratio (SINR) at the GW and Eve is derived based on the received signals. Specifically, the SINR at the GW can be expressed as follows:

$$\gamma_{GW} = \frac{\beta^{2} P \left| h_{SG} \left( h_{PS} + h_{PR_1}^{H} \Theta_1 h_{SR_1} \right) + h_{SR_2}^{H} \Theta_2 h_{GR_2} \left( h_{PS} + h_{PR_1}^{H} \Theta_1 h_{SR_1} \right) \right|^{2}}{\sigma_{G}^{2}}, \tag{5}$$

where $\sigma_{G}^{2}$ represents the noise variance at the GW. Similarly, the SINR at Eve is given by the following:

$$\gamma_{Eve} = \frac{\beta^{2} P \left| h_{SE} \left( h_{PS} + h_{PR_1}^{H} \Theta_1 h_{SR_1} \right) + h_{SR_2}^{H} \Theta_2 h_{ER_2} \left( h_{PS} + h_{PR_1}^{H} \Theta_1 h_{SR_1} \right) \right|^{2}}{\sigma_{E}^{2}}, \tag{6}$$

where $\sigma_{E}^{2}$ is the noise variance at Eve. The achievable secrecy rate $\bar{C}_s$ is then defined as the difference between the rates at the GW and Eve:

$$\bar{C}_s = \left[ \log_2 (1 + \gamma_{GW}) - \log_2 (1 + \gamma_{Eve}) \right]^{+}, \tag{7}$$

where $[x]^{+} = \max(0, x)$ ensures that the secrecy rate is non-negative. In this paper, QoS-aware communication refers to the ability of the system to maintain acceptable levels of SR. Our main objective is therefore to maximize the SR subject to the predefined energy constraint by optimizing the phase shift matrices $\Theta_1$ and $\Theta_2$.
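As a rough numerical companion to Eqs. (5)–(7), the sketch below evaluates an SINR of the form "squared scalar prefactor times $P$ times squared channel magnitude over noise variance" and the resulting secrecy rate. It relies on the fact that the numerators of Eqs. (5) and (6) factor as $(h_{direct} + r_2)(h_{PS} + r_1)$, where $r_1$ and $r_2$ stand for the scalar reflected terms through $\mathrm{RIS}_1$ and $\mathrm{RIS}_2$. All function names and numerical values are hypothetical and chosen only for illustration; they are not taken from the paper's simulations.

```python
import math


def effective_channel(h_direct, h_ps, r1, r2):
    """Numerator channel of Eqs. (5)-(6): h_direct*(h_ps + r1) + r2*(h_ps + r1)."""
    return (h_direct + r2) * (h_ps + r1)


def sinr(scale, power, g, noise_var):
    """scale^2 * P * |g|^2 / sigma^2, with `scale` the scalar prefactor of the equation."""
    return (scale ** 2) * power * abs(g) ** 2 / noise_var


def secrecy_rate(gamma_gw, gamma_eve):
    """Eq. (7): positive part of the rate difference, in bit/s/Hz."""
    return max(0.0, math.log2(1.0 + gamma_gw) - math.log2(1.0 + gamma_eve))


# Illustrative scalars (assumptions): direct channels, reflected terms, power, noise.
h_ps, r1 = 0.5 + 0.0j, 1.5 + 0.0j             # source -> SN direct path and RIS_1 term
g_gw = effective_channel(0.8 + 0.2j, h_ps, r1, 0.6 - 0.1j)
g_eve = effective_channel(0.3 - 0.1j, h_ps, r1, 0.1 + 0.05j)

gamma_gw = sinr(scale=0.5, power=1.0, g=g_gw, noise_var=0.1)
gamma_eve = sinr(scale=0.5, power=1.0, g=g_eve, noise_var=0.1)
print(f"secrecy rate: {secrecy_rate(gamma_gw, gamma_eve):.3f} bit/s/Hz")
```

Because both SINRs share the factor $(h_{PS} + r_1)$, the choice of $\Theta_1$ scales the GW and Eve links together, whereas $\Theta_2$ is what separates them through the terms $h_{SG} + r_2$ and $h_{SE} + r_2$.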

The goal is to jointly optimize the phase shift matrices of $\mathrm{RIS}_1$ and $\mathrm{RIS}_2$ to achieve the following objectives: (a) maximize the energy harvested at the SN and (b) ensure secure communication by maximizing the achievable secrecy rate. The optimization problem can be formulated as follows:

$$P_1: \quad \max_{\Theta_1, \Theta_2} \; \bar{C}_s \tag{8a}$$

$$\text{s.t.} \quad \eta \left( |h_{PS}|^2 P + \left| h_{PR_1}^{H} \Theta_1 h_{SR_1} \right|^2 P \right) \geq P_{\min}, \tag{8b}$$

$$|\Theta_1 (i, i)| = 1, \quad \forall i = 1, 2, \ldots, N_1, \tag{8c}$$

$$|\Theta_2 (j, j)| = 1, \quad \forall j = 1, 2, \ldots, N_2, \tag{8d}$$

where the objective function in (8a) maximizes the secrecy rate $\bar{C}_s$, and the constraint in (8b) ensures that the energy harvested at the SN meets the minimum power requirement $P_{\min}$. The constraints in (8c) and (8d) ensure that the phase shifts of $\mathrm{RIS}_1$ and $\mathrm{RIS}_2$ have unit modulus, which is a practical constraint for passive reflecting elements.
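The constraints (8b)–(8d) can be checked numerically for any candidate pair of phase shift matrices. The following sketch, written under the assumption that each matrix is stored as a list of its diagonal entries, evaluates the harvested-power expression of constraint (8b), tests the unit-modulus conditions, and shows one common way to map arbitrary complex entries back onto the unit circle. The helper names, the tolerance, and the sample channel values are hypothetical.

```python
import cmath


def reflected_term(h_in, theta_diag, h_out):
    """Scalar h_in^H diag(theta) h_out for a diagonal phase shift matrix."""
    return sum(a.conjugate() * t * b for a, t, b in zip(h_in, theta_diag, h_out))


def harvested_power(eta, power, h_ps, reflected):
    """Left-hand side of (8b): eta * (|h_PS|^2 P + |reflected|^2 P)."""
    return eta * power * (abs(h_ps) ** 2 + abs(reflected) ** 2)


def project_unit_modulus(entries):
    """Keep the phase of each nonzero entry and set its magnitude to one."""
    return [cmath.exp(1j * cmath.phase(z)) for z in entries]


def is_feasible(theta1, theta2, h_ps, h_pr1, h_sr1, eta, power, p_min, tol=1e-9):
    """Check the unit-modulus constraints (8c)-(8d) and the power constraint (8b)."""
    unit = all(abs(abs(t) - 1.0) <= tol for t in list(theta1) + list(theta2))
    r1 = reflected_term(h_pr1, theta1, h_sr1)
    return unit and harvested_power(eta, power, h_ps, r1) >= p_min


# Illustrative two-element example (all values are assumptions).
h_ps = 0.5 + 0.0j
h_pr1 = [1.0 + 0.0j, 0.0 + 1.0j]
h_sr1 = [1.0 + 0.0j, 1.0 + 0.0j]
theta1 = [1.0 + 0.0j, 0.0 + 1.0j]
theta2 = [1.0 + 0.0j]

print(is_feasible(theta1, theta2, h_ps, h_pr1, h_sr1, eta=0.6, power=2.0, p_min=5.0))
```

A feasibility check of this kind is one way a learning-based optimizer could reject or penalize candidate actions; whether the proposed framework handles the constraints through projection, penalty terms, or the action parameterization itself is not specified in this section.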

The optimization problem $P_1$ is challenging due to its non-convex nature [63,64]. The objective function involves the logarithm of the SINR, which is nonlinear and non-convex. Moreover, the unit modulus constraints on $\Theta_1$ and $\Theta_2$ further complicate the problem. Traditional convex optimization techniques cannot directly solve such problems. To address these challenges, we propose a novel optimization framework, named FDRL, which leverages RL to explore the optimal phase shift matrices while using fuzzy logic and LSTM-based predictions to handle uncertainties and dynamic network conditions. The proposed framework ensures robust optimization of energy harvesting and secrecy rate in the studied double-RIS-aided WSN.

4. Materials and Methods

Figure 2 provides an overview of the proposed FDRL framework, which integrates fuzzy logic, DRL, and MOABC to address the dual objectives of energy efficiency and QoS optimization in WSNs. The modular and adaptive system processes raw data, predicts future states, manages uncertainty, and makes energy-efficient decisions. By combining the predictive capabilities of DRL, the interpretability of fuzzy logic, and the optimization power of MOABC, the framework ensures robust operation in dynamic and resource-constrained IoT environments. The input layer serves as the starting point, collecting and preprocessing raw data from sensors and environmental observations, including energy levels, signal quality, interference, and node parameters. Preprocessing ensures data consistency through normalization, noise reduction, and structuring for compatibility with the fuzzy logic and DRL modules. This structured input is then processed in parallel by the fuzzy logic and DRL layers, facilitating simultaneous analysis.

The fuzzy logic layer handles uncertainties in IoT data through optimized membership functions and rule-based inference. It outputs actionable recommendations under uncertainty, while the DRL component leverages an LSTM-based architecture to process sequential data and predict trends, enabling proactive decision-making. The MOABC algorithm optimizes both the fuzzy logic and DRL components, ensuring accurate input representation and efficient hyperparameter tuning for robust and adaptive performance. The output layer synthesizes decisions from the fuzzy logic and DRL modules to provide integrated, energy-efficient, and QoS-compliant actions, ensuring reliable performance in diverse IoT scenarios. By addressing both temporal dependencies and uncertainty, the proposed framework achieves effective optimization of WSN operations.

4.1. Proposed DRL

In this study, we introduce a robust DRL framework that combines RL, LSTM, and the MOABC algorithm to address the challenges of dynamic IoT environments. RL is an ML paradigm where an agent learns to make decisions by interacting with an environment. The agent observes the current state of the environment and selects an action based on its policy, which is a mapping of states to actions. After taking the action, the environment provides feedback in the form of a reward, and the agent transitions to a new state. The goal of RL is to maximize cumulative reward over time by learning an optimal policy through trial and error. Key components of RL include the state, action, reward, policy, and value functions, which help the agent evaluate the long-term benefits of specific actions in different states. RL operates in a sequential decision-making process, making it particularly effective for tasks with delayed rewards and complex dependencies. A major advantage of RL is its ability to adapt to dynamic and unknown environments without requiring explicit programming for all possible scenarios. It enables autonomous learning through exploration and exploitation, where the agent balances trying new actions to discover better outcomes (exploration) and using known actions that yield higher rewards (exploitation). RL’s flexibility makes it suitable for diverse applications, from game-playing to robotics and IoT, where systems operate under dynamic constraints and require continuous learning to optimize performance over time.

Traditional RL methods, while effective for problems with limited state and action spaces, face significant challenges when applied to high-dimensional and complex environments. These methods rely on tabular representations or simple function approximators, which become computationally infeasible and inaccurate as the state-action space grows. Moreover, traditional RL struggles to model long-term dependencies and temporal correlations in sequential data, making it unsuitable for dynamic and data-intensive applications like IoT systems. The lack of scalability and difficulty in handling continuous state-action spaces are major limitations of classical RL. DRL addresses these shortcomings by integrating RL with DL architectures, allowing the model to approximate value functions or policies in high-dimensional spaces. DRL leverages neural networks to extract meaningful features from raw input data, enabling the agent to learn directly from complex, unstructured inputs. This capability is particularly advantageous in IoT-driven environments, where the data are often high-dimensional, continuous, and exhibit temporal dependencies. By incorporating LSTM into the DRL framework, the model gains the ability to process sequential data and retain contextual information over time, improving the precision and robustness of decision-making.

The interaction between RL and LSTM is pivotal in the DRL framework. LSTM serves as a feature extractor, processing input sequences from the environment and encoding temporal patterns into a compact representation. This encoded representation, consisting of the hidden state and cell state, captures the dependencies across time steps, which is crucial for predicting future states in dynamic IoT environments. The RL component, in turn, utilizes this representation to evaluate actions and update its policy based on the expected long-term rewards. This synergy enables the agent to adapt to evolving conditions and make context-aware decisions, ensuring improved performance in scenarios with delayed rewards and temporal variability. In the proposed DRL, as illustrated in Figure 2, the input state is first passed through the LSTM layer, which dynamically updates its internal states (hidden and cell states) to reflect temporal dependencies. The output of the LSTM layer is then fed into the RL agent, which selects the optimal action based on the policy it has learned. The environment provides feedback in the form of rewards, which are used to fine-tune both the LSTM parameters and the RL policy.

This iterative process ensures continuous learning and adaptation, enabling the agent to handle complex sequential decision-making tasks with efficiency. Incorporating LSTM into DRL not only addresses the temporal limitations of traditional RL but also enhances the interpretability and scalability of the model. The LSTM’s gating mechanisms allow the model to selectively focus on relevant information while discarding irrelevant or outdated data, resulting in a more efficient and accurate decision-making process. This interaction between RL and LSTM forms the foundation of a robust framework capable of optimizing energy efficiency and maintaining QoS in IoT-driven wireless sensor networks.

LSTM networks are a variant of recurrent neural networks (RNNs) specifically designed to process sequential data and learn long-term dependencies. Traditional RNNs struggle to model sequences with long temporal dependencies because of vanishing gradients, which limit their capacity to retain relevant information over extended time steps. LSTMs overcome these limitations through an architecture that incorporates memory cells and gating mechanisms, allowing them to selectively remember or forget information across time steps. This makes LSTMs highly effective for tasks requiring temporal understanding, such as time-series analysis, speech recognition, and dynamic decision-making in IoT systems. The advantages of LSTM networks lie in their ability to handle both short-term and long-term dependencies, making them essential for applications where historical context influences present decisions. LSTMs maintain an internal memory that is updated dynamically based on the relevance of incoming information, and their gated design mitigates the vanishing-gradient problem. Furthermore, their versatility in processing variable-length sequences makes them adaptable to real-world scenarios, such as IoT-based wireless sensor networks, where data streams are continuous and exhibit temporal dependencies. By retaining meaningful patterns over time and discarding irrelevant data, LSTMs enhance the efficiency and accuracy of models in complex, dynamic environments.

The LSTM architecture is mathematically formulated using Equations (9)–(14), where various gates and states interact to process the input sequence and maintain a dynamic memory of relevant information. These gates (forget gate, input gate, and output gate) control the flow of information, allowing the model to focus on critical parts of the sequence while discarding irrelevant or outdated data. The forget gate determines which information from the previous cell state should be discarded, ensuring that the memory remains relevant. The input gate decides what new information to add to the memory cell based on the current input and the previous hidden state. The candidate generation step calculates potential updates to the memory cell, and the cell state is then updated by combining the retained information with the new candidate values. Finally, the output gate decides which information from the memory cell should be passed to the hidden state, which serves as the output for the current time step. This structured mechanism makes LSTMs highly effective in learning temporal patterns and dependencies, making them a crucial component in environments like IoT, where sequential data are prevalent and context over time is essential.

\begin{matrix} f_{t} & = σ (W_{h f} h_{t - 1} + W_{i f} x_{t} + B_{h f} + B_{i f}), \end{matrix}

(9)

\begin{matrix} g_{t} & = tanh (W_{h g} h_{t - 1} + W_{i g} x_{t} + B_{h g} + B_{i g}), \end{matrix}

(10)

\begin{matrix} i_{t} & = σ (W_{h i} h_{t - 1} + W_{i i} x_{t} + B_{h i} + B_{i i}), \end{matrix}

(11)

\begin{matrix} o_{t} & = σ (W_{h o} h_{t - 1} + W_{i o} x_{t} + B_{h o} + B_{i o}), \end{matrix}

(12)

\begin{matrix} c_{t} & = (f_{t} ⊙ c_{t - 1}) + (g_{t} ⊙ i_{t}), \end{matrix}

(13)

\begin{matrix} h_{t} & = o_{t} ⊙ tanh (c_{t}) . \end{matrix}

(14)

where

f_{t}

shows the forget gate;

g_{t}

indicates the candidate generation;

i_{t}

denotes the input gate;

o_{t}

indicates the output gate;

c_{t}

is the updated cell state;

h_{t}

is the updated hidden state;

W_{h f}, W_{h g}, W_{h i}, W_{h o}, W_{i f}, W_{i g}, W_{i i}, W_{i o}

are weight matrices;

σ

is the sigmoid function; and

B_{h f}, B_{h g}, B_{h i}, B_{h o}, B_{i f}, B_{i g}, B_{i i}, B_{i o}

are bias terms. The combined architecture of RL and LSTM brings numerous advantages by enabling the model to learn from sequential data and adapt to dynamic environments. However, its performance is highly dependent on the optimal configuration of its hyperparameters. Parameters such as weights, biases, the number of hidden layers, and the number of neurons in each layer directly influence the learning efficiency and generalization ability of the model. Poorly tuned hyperparameters can lead to inefficient learning, overfitting, or underfitting, ultimately limiting the model’s performance in complex IoT scenarios. Therefore, optimizing these hyperparameters is crucial to ensure that the model effectively balances accuracy, resource consumption, and real-time adaptability.
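As a minimal sketch, one LSTM time step following Equations (9) to (14) can be written in NumPy as shown below. The dictionary layout keyed by gate name, the layer sizes, and the random initialization are assumptions made for illustration; in practice a library implementation such as the TensorFlow LSTM layer would be used.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W_h, W_i, B_h, B_i):
    """One LSTM step. W_h, W_i, B_h, B_i are dicts keyed by 'f', 'g', 'i', 'o'."""
    pre = {k: W_h[k] @ h_prev + W_i[k] @ x_t + B_h[k] + B_i[k] for k in "fgio"}
    f_t = sigmoid(pre["f"])            # forget gate, Eq. (9)
    g_t = np.tanh(pre["g"])            # candidate generation, Eq. (10)
    i_t = sigmoid(pre["i"])            # input gate, Eq. (11)
    o_t = sigmoid(pre["o"])            # output gate, Eq. (12)
    c_t = f_t * c_prev + g_t * i_t     # cell state update, Eq. (13)
    h_t = o_t * np.tanh(c_t)           # hidden state update, Eq. (14)
    return h_t, c_t


rng = np.random.default_rng(0)
n_in, n_hid = 4, 3  # illustrative sizes
W_h = {k: rng.normal(size=(n_hid, n_hid)) for k in "fgio"}
W_i = {k: rng.normal(size=(n_hid, n_in)) for k in "fgio"}
B_h = {k: np.zeros(n_hid) for k in "fgio"}
B_i = {k: np.zeros(n_hid) for k in "fgio"}

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x_t in rng.normal(size=(5, n_in)):  # a toy sequence of five inputs
    h, c = lstm_step(x_t, h, c, W_h, W_i, B_h, B_i)
```

Because the output gate lies in (0, 1) and the hyperbolic tangent in (-1, 1), every entry of the hidden state stays strictly inside (-1, 1).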

In this context, optimizing weights and biases is essential as they determine the strength of connections within the neural network, influencing how information propagates and decisions are made. Additionally, the number of hidden layers and neurons impacts the model’s capacity to capture complex patterns. Too many layers or neurons can lead to overfitting and computational inefficiency, while too few can result in a lack of representational power. To address these challenges, the MOABC algorithm is employed as a robust optimization tool that simulates the intelligent foraging behavior of honeybee colonies to explore and exploit the search space efficiently.

The MOABC algorithm effectively optimizes the objectives outlined in Equations (15)–(17), minimizing the root mean square error to improve learning accuracy, reducing the number of hidden layers to enhance computational efficiency, and minimizing the number of neurons to balance complexity and resource usage. By fine-tuning these hyperparameters, the MOABC algorithm ensures that the RL-LSTM model achieves optimal performance. This approach is particularly beneficial in IoT-driven applications, where constraints on energy, latency, and computational resources demand highly efficient and adaptive solutions.

\begin{matrix} G_{1} (x) & = Minimize \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {[y_{i} - {\hat{y}}_{i}]}^{2}}, \end{matrix}

(15)

\begin{matrix} G_{2} (x) & = Minimize L, \end{matrix}

(16)

\begin{matrix} G_{3} (x) & = Minimize \sum_{l = 1}^{L} Q_{l}, \end{matrix}

(17)

where N is the number of samples,

y_{i}

is the observed parameter,

{\hat{y}}_{i}

is the calculated parameter, L is the total number of hidden layers, and

Q_{l}

is a binary indicator for the presence of a neuron in layer l.
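The three objectives can be evaluated for a candidate network as in the following sketch. It assumes that the sum in Equation (17) is read as the total neuron count over the hidden layers, which matches the stated goal of minimizing the number of neurons. The Pareto dominance test is the standard one for minimization problems and is shown only to illustrate how a multi-objective optimizer compares candidates; it is not claimed to be the exact comparison rule used inside MOABC here. All function names and the toy data are illustrative.

```python
import numpy as np


def objectives(y_true, y_pred, neurons_per_layer):
    """Return (G1, G2, G3): RMSE, number of hidden layers, total neurons."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    g1 = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    g2 = len(neurons_per_layer)
    g3 = int(sum(neurons_per_layer))
    return (g1, g2, g3)


def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


y_true = [1.0, 2.0, 3.0]
y_pred = [1.1, 1.9, 3.2]  # toy predictions shared by both candidates

small = objectives(y_true, y_pred, neurons_per_layer=[8, 8])
large = objectives(y_true, y_pred, neurons_per_layer=[16, 16, 16])
small_is_better = dominates(small, large)
```

With equal prediction error, the smaller architecture dominates the larger one, which is the trade-off that Equations (16) and (17) are meant to capture.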

Figure 3 illustrates the flow diagram of the proposed DRL model. As depicted in Figure 3, this architecture strategically integrates RL as the core decision-making mechanism, LSTM for capturing temporal dependencies and sequential patterns in data, and MOABC for hyperparameter optimization. This unified approach ensures a balanced trade-off between learning efficiency, decision accuracy, and computational resource management, making it highly suitable for complex IoT-driven applications such as wireless sensor networks. The proposed model begins by feeding the current state, extracted from the environment, into the LSTM layer, which processes the sequential input data while maintaining context over time steps through its memory cells and gating mechanisms. The encoded information is then utilized by the RL agent to evaluate possible actions and derive an optimal policy that maximizes long-term rewards.

The MOABC algorithm operates in parallel, optimizing critical hyperparameters such as weights, biases, and the configuration of the LSTM architecture (number of layers and neurons). This optimization improves the overall adaptability and efficiency of the framework, ensuring that it can handle various dynamic IoT scenarios while maintaining energy efficiency and QoS. Through this layered integration, the proposed DRL framework achieves superior performance in addressing the temporal, computational, and resource challenges inherent in IoT environments. Algorithm 1 shows the pseudo-code of the proposed DRL framework.

Algorithm 1 Pseudo-code of proposed DRL (MOABC-LSTM-RL)

Begin
Parameter setting:
    Initialize DRL parameters: state and action space, reward function, and policy
    Initialize LSTM parameters: batch size, discount factor, learning rate, and sequence length
    Initialize MOABC parameters: population size, number of bees, and maximum iterations
    Generate initial population using MOABC
    Randomly initialize weights and biases for LSTM
    Evaluate fitness using objective functions
Main Loop:
for episode $e = 1$ to E do
    Reset environment and obtain the initial state
    for step $t = 1$ to T do
        LSTM and MOABC Optimization:
            Use LSTM to predict the next action
            Optimize LSTM parameters (weights, biases, layers, and neurons) using MOABC
            Evaluate fitness of population
            Update best candidates using MOABC operations
            Update LSTM parameters accordingly
        Action Selection:
            Use $ϵ$-greedy policy to select action $a_{t}$:
            $a_{t} = \begin{cases} \text{random action} & \text{with probability } ϵ \\ π (s_{t}; θ) & \text{otherwise} \end{cases}$
        Environment Interaction:
            Execute action $a_{t}$, observe next state $s_{t + 1}$, reward $r_{t}$, and done flag
        Store Transition:
            Store $(s_{t}, a_{t}, r_{t}, s_{t + 1})$ in replay memory
        Learning:
            Sample a batch of transitions from replay memory
            Compute the target value using the RL Bellman equation
            Compute loss
            Update the LSTM network’s parameters
        Update State:
            $s_{t} \leftarrow s_{t + 1}$
        Exploration Decay:
            Reduce $ϵ$ gradually
    end for
end for
End
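The action selection, transition storage, and target computation steps of Algorithm 1 can be sketched as follows. The sketch assumes a value-based agent with discrete actions and a Q-learning style Bellman target; the discount factor, decay rate, memory size, and toy transitions are illustrative values rather than the paper's settings, and the zero array stands in for the network's predicted Q-values.

```python
import random
from collections import deque

import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return int(np.argmax(q_values))


def bellman_targets(rewards, next_q, dones, gamma=0.99):
    """Target r + gamma * max_a' Q(s', a'), or just r at the end of an episode."""
    rewards = np.asarray(rewards, dtype=float)
    next_q = np.asarray(next_q, dtype=float)
    not_done = 1.0 - np.asarray(dones, dtype=float)
    return rewards + gamma * next_q.max(axis=1) * not_done


def decay_epsilon(epsilon, rate=0.995, floor=0.05):
    """Multiplicative exploration decay with a lower bound (illustrative values)."""
    return max(floor, epsilon * rate)


rng = random.Random(0)
memory = deque(maxlen=1000)  # replay memory of (s_t, a_t, r_t, s_next, done)

# Toy transitions with 2-dimensional states and 3 discrete actions.
for step in range(8):
    s_t = np.array([step, step + 1], dtype=float)
    a_t = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.2, rng=rng)
    memory.append((s_t, a_t, 1.0, s_t + 1.0, step == 7))

batch = rng.sample(list(memory), 4)
rewards = [tr[2] for tr in batch]
dones = [tr[4] for tr in batch]
next_q = np.zeros((4, 3))  # placeholder for the network's Q(s_next, .)
targets = bellman_targets(rewards, next_q, dones)
```

In a full implementation, the loss would be the mean squared difference between these targets and the Q-values predicted for the stored state-action pairs, and the exploration rate would be updated with `decay_epsilon` after each step.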

4.2. Proposed Fuzzy Model

Fuzzy logic is a computational approach that deals with reasoning and decision-making in environments where data are uncertain, imprecise, or ambiguous. Unlike traditional binary logic systems that operate with true or false values, fuzzy logic enables the representation of degrees of truth, allowing variables to take on continuous values between 0 and 1. This flexibility makes it particularly suitable for modeling real-world systems where inputs are often vague or incomplete, such as in IoT applications. The core idea of fuzzy logic lies in mapping input data to fuzzy sets through membership functions, which define the degree to which a given input belongs to a particular set. The operation of a fuzzy logic system typically involves three main stages: fuzzification, rule evaluation, and defuzzification. In the fuzzification stage, the crisp input data are transformed into fuzzy values using predefined membership functions. These functions, which may take various shapes, such as triangular, trapezoidal, or Gaussian, determine the degree of membership of each input to corresponding fuzzy sets.

Once fuzzified, the data are processed through a set of fuzzy rules, which are defined in the form of “if-then” statements. These rules represent the logical relationships between input and output variables and are designed to capture the behavior of the system under different conditions. The final stage (defuzzification) converts the fuzzy output back into a crisp value for actionable decision-making. This process typically involves methods such as the centroid or weighted average, which aggregate the results of fuzzy inference to provide a single definitive output. The strength of fuzzy logic lies in its ability to handle non-linear relationships and manage uncertainty effectively, making it a powerful tool for systems that require flexibility and adaptability. By incorporating fuzzy rules and membership functions, the framework ensures that it can interpret complex and uncertain data with high reliability and precision.
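The three stages can be illustrated with a minimal single-input sketch. Everything specific in it is hypothetical: the input is a normalized quantity in [0, 1], the three fuzzy sets are piecewise linear, each rule maps a set to a constant output level, and defuzzification uses the weighted average. The rule base and variables of the proposed framework are not reproduced here.

```python
import numpy as np


def fuzzify(x):
    """Map a crisp input in [0, 1] to degrees of membership in three fuzzy sets."""
    return {
        "low": float(np.interp(x, [0.0, 0.5], [1.0, 0.0])),
        "medium": float(np.interp(x, [0.0, 0.5, 1.0], [0.0, 1.0, 0.0])),
        "high": float(np.interp(x, [0.5, 1.0], [0.0, 1.0])),
    }


# Hypothetical rule base: IF input is <label> THEN output level is <value>.
RULES = {"low": 0.2, "medium": 0.5, "high": 0.9}


def infer(x):
    """Fuzzify, fire every rule, and defuzzify by the weighted average."""
    mu = fuzzify(x)
    weighted = sum(mu[label] * RULES[label] for label in RULES)
    return weighted / sum(mu.values())


crisp_output = infer(0.25)  # half "low" and half "medium"
```

For an input of 0.25, the "low" and "medium" rules each fire with strength 0.5, so the crisp output is the midpoint of their output levels, 0.35.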

In the fuzzy logic component of the proposed framework, the MOABC algorithm is used to optimize the parameters of the membership functions, which play a critical role in managing uncertainties within the system. These parameters determine how input data are mapped to fuzzy sets, directly influencing the accuracy of the fuzzy rules and, ultimately, the decision-making process. By fine-tuning the shape and boundaries of the membership functions, the model can handle ambiguous or noisy inputs effectively and provide reliable outputs in dynamic IoT environments. The optimization process minimizes fuzzification errors while aligning the fuzzy logic outputs with the system’s predictive and operational objectives. This process is guided by the objectives defined in Equations (18) and (19), which evaluate the performance of the membership functions.

\begin{matrix} F_{1} (x) & = Minimize \sum_{i = 1}^{R} {(\sum_{i = 1}^{W_{b}} [Y_{d} (i) - Y_{a} (i)])}^{2}, \end{matrix}

(18)

\begin{matrix} F_{2} (x) & = \sum_{i = 1}^{T} \sum_{j = 1}^{U} {(Y_{j b} - Y_{j i})}^{2} - \sum_{i = 1}^{T} \sum_{j = 1}^{U} max {[(Y_{j b} - Y_{j m i n}), (Y_{j m a x} - Y_{j b})]}^{2} . \end{matrix}

(19)

where

Y_{a} (i)

is the actual output, R indicates the length of the simulation sequence,

Y_{d} (i)

is the desired output,

Y_{j i}

corresponds to the

i - th

input for the

j - th

output,

W_{b}

shows the number of fuzzy rules, U is the number of outputs,

Y_{j b}

serves as a reference for the membership center, and T is the number of input–output data.

The MOABC evaluates each configuration of membership functions based on specific objective functions designed to improve classification accuracy and reduce overlapping regions in the fuzzy representation. The first objective focuses on minimizing cumulative errors between the fuzzified outputs and the target values, ensuring that the system maintains high predictive accuracy. By reducing these errors, the fuzzy logic layer contributes to more precise interpretations of uncertain data, enhancing the overall reliability and robustness of the model. This process is critical for IoT-driven applications where data uncertainty is prevalent due to noise or incomplete information. The second objective addresses the refinement of membership boundaries, which is essential for achieving a precise representation of uncertainty levels. Properly tuned boundaries allow the model to distinguish between different classes more effectively, reducing the overlap between membership functions and capturing subtle variations in the input data. This optimization not only improves the interpretability of the fuzzy logic system but also ensures that the outputs are well-aligned with the system’s decision-making requirements. The iterative adjustments made by MOABC enhance the adaptability of the fuzzy logic layer to varying conditions, making it a crucial component for predictive maintenance and QoS optimization in IoT networks. The integration of MOABC in optimizing the fuzzy logic component ensures that the system can effectively manage uncertainty, a key challenge in real-world IoT applications.

5. Results

The simulations for the proposed FDRL-based double RIS-aided WSN framework were performed on a MacBook Air powered by an Apple M1 processor with 16 GB of RAM. The simulation environment was developed using Python, where the TensorFlow library was used to build and train the reinforcement learning model. TensorFlow enabled efficient neural network computations and facilitated the real-time updating of the agent’s policy during training. For numerical analysis, the NumPy library was used, while Matplotlib ensured clear visualization and presentation of the simulation results. To simulate interactions between the environment and the agent, the OpenAI Gym toolkit was integrated, providing a state-action-reward framework tailored to the double-RIS-aided WSN scenario. The simulation setup assumes that all channels exhibit Rayleigh fading characteristics, which are representative of scattering environments typically observed in WSN deployments. The path loss exponent is configured as 3.5 for all communication links. The RF source operates at a frequency of 2.4 GHz with a transmission power of

P = 20 dBm

, equivalent to a typical Wi-Fi access point. The spatial arrangement of the components is defined as follows: The RF source is placed at

[- 5, 0]

, SN at

[0, 0]

, GW at

[5, 0]

, Eve at

[2.5, 2.5]

,

{RIS}_{1}

at

[0, 3]

to improve EH in the SN, and

{RIS}_{2}

at

[3, 3]

to improve the SNR and maximize the achievable secrecy rate at GW. The noise variances at GW and Eve are set to

σ_{R}^{2} = - 60 dBm

and

σ_{E}^{2} = - 50 dBm

, respectively. This setting assumes that Eve’s channel is noisier than the legitimate receiver’s channel, a common assumption in physical layer security studies. It ensures that GW achieves a higher SNR, which increases the secrecy rate by widening the signal quality difference in (7).
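The stated setup can be reproduced numerically with the short sketch below. The node coordinates, path loss exponent, and power levels are those listed above. The simplified large-scale gain d^(-α) with a unit reference gain and the i.i.d. Rayleigh draw are assumptions, since the reference distance and antenna gains are not specified in this section.

```python
import numpy as np

positions = {
    "RF": (-5.0, 0.0), "SN": (0.0, 0.0), "GW": (5.0, 0.0),
    "Eve": (2.5, 2.5), "RIS1": (0.0, 3.0), "RIS2": (3.0, 3.0),
}
ALPHA = 3.5  # path loss exponent for all links


def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to watts."""
    return 10.0 ** ((p_dbm - 30.0) / 10.0)


def distance(a, b):
    """Euclidean distance between two named nodes."""
    return float(np.hypot(*np.subtract(positions[a], positions[b])))


def path_gain(d, alpha=ALPHA):
    """Simplified large-scale gain d^(-alpha), assuming a unit reference gain."""
    return d ** (-alpha)


def rayleigh_channel(n, gain, rng):
    """n i.i.d. complex Gaussian coefficients with average power `gain`."""
    return np.sqrt(gain / 2.0) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))


p_tx = dbm_to_watt(20.0)        # 0.1 W
noise_gw = dbm_to_watt(-60.0)   # 1e-9 W
noise_eve = dbm_to_watt(-50.0)  # 1e-8 W

rng = np.random.default_rng(0)
h_rf_ris1 = rayleigh_channel(16, path_gain(distance("RF", "RIS1")), rng)
```

In linear units, the noise power at Eve is ten times that at GW, which is the asymmetry discussed in the text.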

In this study, the performance of the proposed FDRL framework is evaluated and compared with several benchmark algorithms, including RL, LSTM, RNN, XGBoost, and KNN. These algorithms were chosen to provide a comprehensive comparison across different ML paradigms. Specifically, LSTM and RNN represent state-of-the-art DL models known for their ability to process sequential data and capture temporal dependencies. These models are compared with the proposed FDRL to highlight its superior capabilities in leveraging fuzzy logic and advanced optimization techniques for IoT-driven environments. On the other hand, XGBoost and KNN, as examples of traditional ML approaches, are included to demonstrate the advancements achieved by the proposed framework in handling complex, dynamic, and uncertain data. The selection of these algorithms comes from their distinct strengths in addressing various challenges. LSTM and RNN excel in capturing non-linear and sequential patterns, making them suitable for dynamic IoT scenarios, while XGBoost is widely regarded for its efficiency in structured data and strong predictive performance in many real-world applications. Although simpler, KNN provides a baseline for understanding the effectiveness of more advanced models in handling complex data structures. To ensure a rigorous evaluation, multiple performance metrics are used, including RMSE,

R^{2}

, MAPE, runtime, ROC, and convergence trend. These metrics provide a holistic view of each algorithm’s performance in terms of accuracy, computational efficiency, and reliability. As defined in Equations (20)–(22), RMSE quantifies the average prediction error,

R^{2}

measures the proportion of variance explained by the model, and MAPE assesses the percentage error in the predictions. Additional metrics like runtime highlight computational efficiency, while ROC and convergence trends provide insight into the stability and reliability of the model during training and testing. Together, these criteria form a robust framework for evaluating the proposed FDRL against alternative methods.

\begin{matrix} R M S E & = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}}, \end{matrix}

(20)

\begin{matrix} R^{2} & = {[\frac{1}{N} \sum_{i = 1}^{N} \frac{(y_{i} - \bar{y}) ({\hat{y}}_{i} - \bar{\hat{y}})}{σ_{y} σ_{\hat{y}}}]}^{2}, \end{matrix}

(21)

\begin{matrix} M A P E & = \frac{1}{N} \sum_{i = 1}^{N} |\frac{y_{i} - {\hat{y}}_{i}}{y_{i}}| \times 100, \end{matrix}

(22)

where

y_{i}

is the observed value,

{\hat{y}}_{i}

is the calculated value,

\bar{y}

is the mean of the observed value,

\bar{\hat{y}}

is the mean of the predicted value,

σ_{y}

is the standard deviation of the actual value, and

σ_{\hat{y}}

is the standard deviation of the predicted value.
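A direct NumPy transcription of Equations (20) to (22) is sketched below, with illustrative function names and toy data. Note that Equation (21) is the squared Pearson correlation between observed and predicted values, computed here with population standard deviations (the 1/N convention); this differs in general from the coefficient of determination defined as one minus the ratio of residual to total sum of squares.

```python
import numpy as np


def rmse(y, y_hat):
    """Root mean square error, Eq. (20)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def r_squared(y, y_hat):
    """Squared Pearson correlation, Eq. (21), using population statistics."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    cov = np.mean((y - y.mean()) * (y_hat - y_hat.mean()))
    return float((cov / (y.std() * y_hat.std())) ** 2)


def mape(y, y_hat):
    """Mean absolute percentage error, Eq. (22); requires non-zero y."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)


y_obs = [2.0, 4.0, 6.0, 8.0]
y_est = [2.5, 3.5, 6.5, 7.5]  # toy predictions, each off by 0.5

scores = (rmse(y_obs, y_est), r_squared(y_obs, y_est), mape(y_obs, y_est))
```

For this toy data, the RMSE is 0.5 because every prediction is off by exactly 0.5, while the MAPE weights the same absolute error more heavily for small observed values.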

To ensure the effectiveness of the proposed FDRL framework and benchmark algorithms, the calibration of their hyperparameters was a critical initial step in this study. Hyperparameter optimization plays a pivotal role in determining the performance of ML models, as improper tuning can lead to suboptimal results, overfitting, or underfitting. By carefully adjusting parameters such as learning rates, batch sizes, hidden layers, and activation functions, we aimed to maximize the efficiency, accuracy, and generalizability of each model. The process involved systematic exploration through the trial-and-error method, an iterative approach in which various parameter configurations are tested and refined to identify the optimal settings. The trial-and-error method was employed due to its flexibility and practicality in handling diverse model architectures. This method involves testing multiple configurations for each parameter and evaluating their impact on the model’s performance metrics. For each algorithm, a wide range of parameter values was explored to capture the nuances of their performance under varying conditions. This process required extensive experimentation, where the results of each iteration informed subsequent adjustments. The optimal parameter settings for each model, derived from this comprehensive evaluation, are summarized in Table 1.

From the table, the DRL framework, enhanced with MOABC, showcases its complexity and adaptability through parameters like the number of employed and onlooker bees, population size, and iterations, which were specifically calibrated to ensure efficient optimization of weights and biases. The learning rate of 0.002 and a batch size of 128 were found to strike a balance between convergence speed and stability. For DL models (LSTM and RNN), the inclusion of activation functions such as tanh and sigmoid highlights their ability to process sequential data effectively. These models required meticulous tuning of hidden layers (10 and 8, respectively) and neurons per layer to capture temporal dependencies while avoiding overfitting. Traditional ML models such as XGBoost and KNN emphasize simplicity and efficiency. For XGBoost, parameters such as the maximum depth (4) and the number of estimators (300) were optimized to improve its tree-based learning capabilities. Similarly, for KNN, the distance metric (Euclidean) and the number of neighbors (8) were fine-tuned to improve classification accuracy and maintain computational efficiency.

Figure 4 illustrates the impact of RIS₁ and the distance between SN and the RF source on the performance of EH. The presence of RIS₁ significantly enhances EH compared to scenarios without RIS, with a higher number of RIS elements (N₁) resulting in greater improvements. For example, systems with

N_{1} = 32

consistently outperform those with

N_{1} = 16

or

N_{1} = 8

, highlighting the importance of increasing the number of RIS elements to improve the reflection of the RF signal. As the distance (

d_{P S}

) between the SN and the RF source increases, the EH performance decreases due to the path loss effect, which weakens the RF signal strength. However, the inclusion of RIS partially mitigates this degradation, as shown by the higher EH values at greater distances when RIS₁ is deployed. In contrast, the scenario without RIS demonstrates significantly lower EH across all distances, emphasizing the critical role of RIS in maintaining EH efficiency. As a result, the figure demonstrates that the use of RIS with optimized configurations is key to improving EH, particularly in scenarios where distance imposes challenges to RF signal propagation.

Figure 5 illustrates the impact of the RF source power (

P_{source}

) and the presence of RIS₁ and RIS₂ on the secrecy rate in the proposed framework. Increasing the power of the RF source enhances the SR in all scenarios, as higher power levels improve the signal strength received at the legitimate receiver. The presence of RIS further amplifies this improvement by strategically reflecting RF signals, thus optimizing the channel conditions for secure communication. The roles of RIS₁ and RIS₂ in enhancing SR are distinct and complementary. RIS₁, located between SN and the RF source, mainly optimizes EH and improves the transmission efficiency of the backscattered signal. This indirectly contributes to better signal quality at the legitimate receiver, thus improving SR. On the other hand, RIS₂, located near the gateway, directly enhances SINR by focusing the reflected signals toward the legitimate receiver while minimizing leakage toward the eavesdropper. The combined effect of RIS₁ and RIS₂ results in significant improvements in SR compared to the scenario without RIS. The role of RIS₂ is particularly critical in the context of secure communication. As shown in the figure, the scenarios with the larger

N_{2}

(for example,

N_{2} = 32

) outperform those with the smaller

N_{2}

or without RIS, demonstrating that RIS₂ plays a more dominant role in improving SINR at GW. By directing the reflected signal to the intended receiver, RIS₂ minimizes the interception capability of the eavesdropper, which is essential to maximize SR. This makes RIS₂ a vital component for achieving high security levels in WSNs, especially in environments with weaker RF source power.

Figure 6 illustrates the performance of energy harvesting as a function of the distance between the SN and the RF source for various optimization frameworks, with

N_{1} = 16

. The results show that the proposed FDRL framework consistently outperforms all other optimization methods at all distances. This superior performance can be attributed to the integration of fuzzy logic, LSTM, and MOABC, which collectively optimize energy harvesting by dynamically adapting to environmental variations and efficiently managing uncertainties in the propagation of the RF signal. In particular, the EH achieved by FDRL is highest when

d_{P S}

is minimal (e.g., 1 m), with only a gradual decline as

d_{P S}

increases to 10 m, demonstrating its robustness in maintaining efficient EH over longer distances. Traditional RL and LSTM frameworks exhibit moderate performance, with RL outperforming LSTM at closer distances (

d_{P S} \leq 5 m

) but showing a steeper decline in EH as distance increases. This behavior highlights the limitations of RL in maintaining efficient EH under challenging propagation conditions. In contrast, LSTM benefits from its ability to model sequential dependencies but still falls short of the adaptive capabilities of FDRL. Other traditional frameworks, such as RNN, XGBoost, and KNN, exhibit significantly lower EH values, particularly at larger distances. The results emphasize the effectiveness of the proposed FDRL framework in achieving superior EH performance compared to other optimization frameworks.

In the proposed fuzzy logic framework, we utilized three standard membership functions: triangular, Gaussian, and trapezoidal (Figure 7). Each membership function was selected to analyze its effectiveness in capturing uncertainties and modeling the non-linear relationships in the IoT data. The triangular membership function provides a simple and efficient representation of fuzzy sets due to its linear nature, making it computationally lightweight. The Gaussian function, with its smooth bell-shaped curve, offers a more precise representation of the gradual overlapping transitions between fuzzy sets. Meanwhile, the trapezoidal function balances computational efficiency and adaptability by defining clear boundaries for each fuzzy set while allowing for a certain degree of overlap. These functions enable the fuzzy inference system to adapt to various scenarios in predictive maintenance and QoS optimization. The MOABC algorithm plays a pivotal role in optimizing the parameters of these membership functions, such as the centers, widths, and boundaries. This optimization minimizes the overlap between fuzzy sets while ensuring the accuracy of the fuzzification process. By leveraging MOABC’s multi-objective capabilities, the framework refines the membership functions to better align with the system’s predictive and operational goals.
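The three membership function shapes can be written compactly in NumPy, as sketched below. The parameter names (`a`, `b`, `c`, `d`, `center`, `sigma`) follow common textbook conventions and are assumptions; they correspond to the centers, widths, and boundaries that an optimizer such as MOABC would tune, and the numerical values are illustrative.

```python
import numpy as np


def triangular(x, a, b, c):
    """Rises linearly from a to the peak at b, then falls to c (a < b < c)."""
    x = np.asarray(x, dtype=float)
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)


def trapezoidal(x, a, b, c, d):
    """Rises from a to b, stays at 1 on [b, c], falls to d (a < b <= c < d)."""
    x = np.asarray(x, dtype=float)
    return np.clip(np.minimum((x - a) / (b - a), (d - x) / (d - c)), 0.0, 1.0)


def gaussian(x, center, sigma):
    """Smooth bell curve with its peak at `center` and width set by `sigma`."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)


grid = np.linspace(0.0, 1.0, 11)
mu_tri = triangular(grid, 0.0, 0.5, 1.0)
mu_trap = trapezoidal(grid, 0.0, 0.2, 0.8, 1.0)
mu_gauss = gaussian(grid, 0.5, 0.1)
```

The trapezoidal shape keeps full membership over the whole plateau between its two inner boundaries, whereas the triangular and Gaussian shapes reach full membership only at a single point.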

Table 2 illustrates the RMSE values obtained for test data using three fuzzy membership functions (triangular, Gaussian, and trapezoidal) across various models. The table provides a comparison of the RMSE values before optimization (standard functions) and after optimization using the MOABC algorithm, showcasing the significant impact of optimization on model performance. The results indicate that the trapezoidal membership function consistently yields the lowest RMSE values across all models, both in the standard and optimized cases. This highlights its superior ability to capture uncertainties in the data compared to triangular and Gaussian functions. Among the models, the proposed FDRL exhibits the best performance, achieving the lowest RMSE values (0.09 for the trapezoidal function) after optimization. This demonstrates the effectiveness of the MOABC algorithm in fine-tuning membership function parameters for complex IoT applications. Furthermore, optimization using MOABC significantly reduces the RMSE for all models, emphasizing the importance of this step in enhancing predictive accuracy. For instance, the RMSE for the LSTM model with triangular functions decreases from 14.39 to 8.56, representing a substantial improvement. However, traditional ML models like XGBoost and KNN show higher RMSE values even after optimization, indicating their limitations in handling the complexities of IoT environments compared to deep learning approaches. This further validates the effectiveness of integrating fuzzy logic and MOABC optimization in the proposed framework.
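To quantify the LSTM example above, the relative RMSE reduction for triangular functions can be computed directly from the two values quoted from Table 2:

```latex
\frac{14.39 - 8.56}{14.39} = \frac{5.83}{14.39} \approx 0.405
```

That is, MOABC optimization lowers the RMSE in this case by roughly 40.5%.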

Figure 8 shows the convergence trends of RMSE for the proposed FDRL model compared to the benchmark algorithms over 300 training epochs. The graph clearly shows that the FDRL model achieves a significantly faster convergence rate and a substantially lower RMSE compared to all other methods. Initially, at the start of the training process, all models exhibit relatively high RMSE values, with FDRL starting slightly above 25. However, the RMSE for FDRL drops sharply within the first 50 epochs, reaching values close to zero, and then stabilizes for the remaining epochs. This rapid convergence highlights the efficiency of the FDRL framework in learning optimal representations and decision policies, attributed to the integration of fuzzy logic, LSTM, and MOABC optimization. In contrast, RL, LSTM, and RNN show slower convergence rates and higher final RMSE values. Although RL and LSTM demonstrate notable improvements over time, their RMSE values plateau around epoch 150, indicating a limitation in their ability to further minimize prediction errors. RNN struggles even more, converging to a higher RMSE compared to RL and LSTM, which suggests its reduced capacity to handle complex temporal dependencies. XGBoost and KNN perform the worst, with minimal RMSE reductions throughout training. Both models exhibit flat convergence trends, demonstrating their inability to adapt to the sequential and dynamic nature of the data.

6. Discussion

Table 3 compares the proposed FDRL model with several benchmark algorithms during both the training and testing phases. The FDRL achieves the lowest RMSE (0.02 for training, 0.09 for testing) and MAPE (0.88% for training, 1.07% for testing), along with the highest $R^{2}$ values (0.97 for training, 0.95 for testing). These results indicate that the FDRL model not only provides highly accurate predictions but also generalizes well to unseen testing data. The inclusion of fuzzy logic and MOABC optimization enhances the learning process, allowing the model to manage uncertainty and fine-tune its parameters effectively. In contrast, traditional RL and LSTM perform better than XGBoost and KNN but still fall short compared to the FDRL model. RL achieves relatively low errors but suffers from reduced generalization in testing ($R^{2} = 0.82$), while LSTM exhibits moderate accuracy due to its ability to capture temporal patterns. However, RNN, XGBoost, and KNN show significantly higher RMSE and MAPE values, particularly in the testing phase, highlighting their limitations in handling complex, sequential, and noisy IoT data. The results clearly demonstrate that the proposed FDRL framework, with its integration of fuzzy rules, LSTM, and MOABC optimization, outperforms both DL and traditional ML approaches, making it an ideal choice for energy-efficient and QoS-aware IoT systems.
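For reference, the three metrics reported in Table 3 can be computed as in the sketch below. It uses the standard textbook definitions of RMSE, MAPE, and the coefficient of determination; whether the study's implementation matches these exactly is an assumption, and the function names are illustrative.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (assumes no true value is zero)."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n

def r2(y_true, y_pred):
    """Coefficient of determination (assumes y_true is not constant)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

if __name__ == "__main__":
    # Synthetic values for illustration only; not data from the study.
    truth = [2.0, 4.0]
    pred = [1.0, 5.0]
    print(rmse(truth, pred), mape(truth, pred), r2(truth, pred))
```

A small gap between training and testing values of these metrics, as reported for FDRL, is what supports the generalization claim.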

Table 4 compares the computational efficiency of the FDRL model and other algorithms under two conditions: a fixed number of training epochs (300) and the time required to achieve specific RMSE thresholds. Although the FDRL model has a longer run time of 835 s for 300 epochs, it reaches lower RMSE values more quickly than the other models. For instance, FDRL achieves an RMSE below 20.00 in just 34 s, an RMSE under 10.00 in 263 s, and an RMSE below 3.00 in 415 s. This highlights the effectiveness of FDRL in converging rapidly to high-accuracy solutions. In contrast, traditional RL and LSTM models require significantly more time to reach comparable RMSE thresholds, if at all. For example, RL achieves RMSE $< 20.00$ in 128 s but takes 432 s to reach RMSE $< 10.00$. Similarly, LSTM takes 486 s to achieve RMSE $< 20.00$ and 524 s to reach RMSE $< 10.00$. Models such as RNN, XGBoost, and KNN either fail to achieve lower RMSE thresholds or require considerably longer run times, as indicated by the absence of results for stricter conditions (RMSE $< 3.00$). XGBoost and KNN perform well in terms of initial training speed but lack the capacity to achieve the desired prediction accuracy, highlighting their limitations in handling complex sequential data.
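The time-to-threshold figures discussed above can be extracted from a training log as sketched below. The log format, a chronological list of (elapsed seconds, RMSE) pairs, and the function name are assumptions for this example; the numbers in the usage section are synthetic and are not taken from Table 4.

```python
def time_to_threshold(log, threshold):
    """Return the first elapsed time at which RMSE drops below threshold.

    log is a chronologically ordered list of (elapsed_seconds, rmse) pairs.
    Returns None if the threshold is never reached, which corresponds to a
    missing entry for that model in a run-time table.
    """
    for elapsed, value in log:
        if value < threshold:
            return elapsed
    return None

if __name__ == "__main__":
    # Synthetic log for illustration only.
    synthetic_log = [(5, 30.0), (10, 18.0), (20, 12.0), (40, 8.0)]
    for limit in (20.0, 10.0, 3.0):
        print(limit, time_to_threshold(synthetic_log, limit))
```

Reporting None rather than the final elapsed time keeps models that never reach a threshold distinguishable from models that reach it late.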

The runtime performance of the proposed FDRL model not only showcases its efficiency in achieving lower RMSE thresholds but also reflects the effective interplay between its components: fuzzy logic, LSTM, and MOABC. Each of these elements contributes uniquely to the model’s superior convergence speed and accuracy. The combination of fuzzy logic and LSTM, optimized by MOABC, creates a system that is not only computationally efficient but also capable of adapting to the dynamic and uncertain nature of IoT-driven environments. The integration of MOABC plays a critical role in this efficiency. By optimizing fuzzy membership functions and LSTM parameters, the MOABC algorithm ensures that the FDRL model operates with parameters that are best suited to the data at hand. The use of artificial bee colony dynamics allows the algorithm to effectively balance exploration and exploitation, leading to faster convergence to optimal solutions. Scout bees, for instance, identify promising regions in the solution space, while employed and onlooker bees refine these regions to optimize performance metrics such as RMSE.
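The division of labor between bee types can be made concrete with a minimal sketch of the textbook single-objective artificial bee colony algorithm. This is a simplification and not the multi-objective MOABC variant used in this work; the function name abc_minimize, its parameters, and the sphere test function are illustrative assumptions. In this textbook form, employed and onlooker bees refine existing candidate solutions, while scouts replace stagnant candidates with fresh random ones.

```python
import random

def abc_minimize(objective, bounds, n_sources=10, limit=10, iterations=100, seed=0):
    """Minimal single-objective ABC sketch (assumes objective(x) >= 0, n_sources >= 2)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    sources = [random_source() for _ in range(n_sources)]
    costs = [objective(s) for s in sources]
    trials = [0] * n_sources  # consecutive failed improvements per source

    def try_improve(i):
        # Perturb one coordinate of source i relative to a random other source.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = sources[i][:]
        cand[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        lo, hi = bounds[d]
        cand[d] = min(max(cand[d], lo), hi)
        cost = objective(cand)
        if cost < costs[i]:  # greedy selection
            sources[i], costs[i], trials[i] = cand, cost, 0
        else:
            trials[i] += 1

    i_best = min(range(n_sources), key=costs.__getitem__)
    best, best_cost = sources[i_best][:], costs[i_best]

    for _ in range(iterations):
        # Employed bees: one local search step per source.
        for i in range(n_sources):
            try_improve(i)
        # Onlooker bees: extra steps, biased toward lower-cost sources.
        weights = [1.0 / (1.0 + c) for c in costs]
        for _ in range(n_sources):
            i = rng.choices(range(n_sources), weights=weights)[0]
            try_improve(i)
        # Memorize the best solution before scouts discard anything.
        i_best = min(range(n_sources), key=costs.__getitem__)
        if costs[i_best] < best_cost:
            best, best_cost = sources[i_best][:], costs[i_best]
        # Scout bees: abandon stagnant sources and explore at random.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = objective(sources[i])
                trials[i] = 0
    return best, best_cost

if __name__ == "__main__":
    sphere = lambda x: sum(v * v for v in x)
    print(abc_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)], iterations=200))
```

In a setting like the one described here, the objective would be a validation metric such as RMSE evaluated for a candidate set of membership-function and LSTM parameters, which is far more expensive than the toy function used above.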

This cooperative optimization approach significantly reduces the computational overhead associated with traditional parameter tuning methods. The LSTM component, on the other hand, excels in handling sequential data, a critical requirement for IoT-driven systems where time series data are prevalent. Its ability to capture temporal dependencies enables the model to make accurate predictions about future states, even in complex and noisy environments. When combined with the uncertainty management capabilities of fuzzy logic, the LSTM gains an additional layer of interpretability and robustness. Fuzzy logic translates ambiguous and noisy data into actionable insights, providing the LSTM with cleaner, more structured input. This synergistic interaction ensures that the model adapts dynamically to changing environmental conditions, maintaining both accuracy and efficiency.

What sets FDRL apart is its ability to quickly converge to highly accurate solutions, as demonstrated in its runtime performance. Although traditional models such as RL and LSTM require significantly more time to achieve comparable results, the FDRL model takes advantage of the strengths of MOABC to continuously refine its parameters during the training process. This rapid convergence is especially critical for IoT systems, where real-time decision-making and energy efficiency are paramount. Another advantage of the FDRL model lies in its flexibility and adaptability. Unlike traditional algorithms, which often struggle with the trade-off between accuracy and computational efficiency, the FDRL model achieves both. The MOABC-optimized fuzzy logic layer ensures that the model handles uncertainties with high precision. Meanwhile, the LSTM architecture captures long-term dependencies, providing a comprehensive understanding of sequential data. This adaptability is further enhanced by the MOABC algorithm, which continuously fine-tunes the model to align with specific objectives, such as minimizing RMSE or maximizing energy efficiency.

In this study, we assumed perfect channel state information (CSI) for both RIS₁ and RIS₂. Although this assumption simplifies the optimization process, obtaining perfect CSI in practical WSNs is highly challenging due to limited computational resources and communication overhead. Unknown or imperfect CSI can lead to suboptimal phase-shift configurations, thus affecting the system’s overall performance. Future work could explore robust optimization techniques that account for imperfect CSI in both RIS₁ and RIS₂, ensuring reliable performance in real-world environments.

Additionally, phase quantization error in RIS elements is another potential limitation. In practical deployments, RIS elements typically operate with a finite number of discrete phase levels, which introduces quantization errors. These errors can significantly affect the accuracy of phase-shift configurations, increasing the complexity of the optimization problem. Incorporating models to account for phase quantization and developing methods to mitigate its impact could further enhance the applicability of the framework in real-world scenarios.

The proposed framework also faces challenges in optimizing the phase-shift matrices of two RISs (RIS₁ and RIS₂). Optimizing two phase-shift matrices in a highly dynamic IoT environment, while considering energy efficiency and secrecy rate simultaneously, is a computationally intensive task. This required us to implement advanced optimization algorithms such as DRL, MOABC, and fuzzy logic to strike a balance between accuracy and convergence speed. While this approach has demonstrated promising results, the reliance on multiple algorithms also increases the framework’s complexity, posing challenges for deployment in resource-constrained environments.
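The phase quantization limitation noted above can be illustrated with a short sketch. It assumes a uniform b-bit quantizer over [0, 2π) and unit-gain RIS elements whose ideal continuous phases would align all reflected paths perfectly; the function names are hypothetical and the model is deliberately simpler than the system model of this paper.

```python
import cmath
import math

TWO_PI = 2.0 * math.pi

def quantize_phase(theta, bits):
    """Snap a phase in radians to the nearest of 2**bits uniform levels in [0, 2*pi)."""
    levels = 2 ** bits
    step = TWO_PI / levels
    return (round((theta % TWO_PI) / step) % levels) * step

def wrapped_error(theta, bits):
    """Quantization error wrapped to [-pi, pi); its magnitude is at most pi / 2**bits."""
    diff = theta - quantize_phase(theta, bits)
    return (diff + math.pi) % TWO_PI - math.pi

def normalized_array_gain(ideal_phases, bits):
    """Received power with quantized phases relative to ideal continuous phases.

    Assumes unit-gain elements, so only the residual phase errors matter.
    """
    total = sum(cmath.exp(1j * wrapped_error(t, bits)) for t in ideal_phases)
    return abs(total) ** 2 / len(ideal_phases) ** 2

if __name__ == "__main__":
    # Evenly spread synthetic phases, for illustration only.
    phases = [TWO_PI * n / 16 + 0.1 for n in range(16)]
    for b in (1, 2, 3):
        print(b, round(normalized_array_gain(phases, b), 4))
```

Under these assumptions the normalized gain equals 1 only when every ideal phase already lies on the quantization grid, and it approaches 1 as the number of bits grows.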

7. Conclusions

In this study, we proposed a novel FDRL-based framework for double RIS-aided WSNs to address the dual challenges of energy efficiency and secure communication while ensuring QoS. The integration of fuzzy logic, RL, and LSTM into the framework allowed the system to dynamically adapt to varying environmental conditions and uncertainties in signal propagation. By optimizing the phase shifts of RIS₁ and RIS₂, the proposed approach achieved significant improvements in EH at the SN and SR at the GW, demonstrating its effectiveness compared to traditional optimization frameworks. The results highlight the complementary roles of RIS₁ and RIS₂ in the system. While RIS₁ enhances the harvested energy at the SN, RIS₂ improves SINR at the GW and mitigates the impact of eavesdroppers, ensuring robust communication security. This dual functionality of the RISs underscores the critical role of their optimized deployment in improving the overall performance of WSNs. The simulation results demonstrated that the proposed framework significantly improves network performance, achieving an improvement in energy efficiency of 35.4%, an increase in secrecy rate of 29.7%, and a 27.5% improvement in RSMA compared to the second-best approach. Additionally, the model achieved an R² score improvement of 12.3%, confirming its superior predictive accuracy. Further analysis revealed that increasing the number of RIS₁ elements leads to notable improvements in energy harvesting performance, particularly in scenarios with larger distances between SN and the RF source. Similarly, the results showed that the secrecy rate is highly dependent on the proper configuration of RIS₂, where larger RIS configurations result in a more secure communication link by directing RF signals toward GW while reducing signal leakage to the eavesdropper. This leads to improved SINR at GW and a significant reduction in interception probability, demonstrating the efficacy of the proposed optimization approach. 
Moreover, the framework exhibited faster convergence and improved stability compared to conventional RL-based approaches, requiring fewer iterations to reach optimal solutions. This efficiency gain is attributed to the MOABC-based hyperparameter optimization, which enhances the model’s learning stability and computational performance, making it more suitable for real-time implementation in dynamic WSN environments. Furthermore, the simulation results showed that the FDRL-based framework consistently outperformed benchmark algorithms in terms of accuracy, computational efficiency, and adaptability under dynamic network conditions. The findings of this study demonstrate that the proposed framework is well-suited for IoT-driven WSNs, where achieving energy-efficient and secure communication is paramount. The inclusion of fuzzy logic and MOABC ensured effective uncertainty management and hyperparameter optimization, while the LSTM enhanced temporal learning capabilities. Together, these elements formed a robust and scalable solution that not only meets the stringent requirements of WSNs but also provides a validated approach for RIS-assisted optimization in IoT applications.

Future research could explore advanced technologies such as 3D intelligent reflection through simultaneously transmitting and reflecting (STAR) RIS to improve energy efficiency and QoS in WSNs. Unlike traditional RIS designs that operate within a 2D plane, STAR-RIS can simultaneously transmit and reflect signals in three dimensions, providing greater flexibility in signal manipulation and coverage optimization. This capability is particularly beneficial in complex IoT environments where signal propagation is affected by obstacles, mobility, and varying spatial configurations. By leveraging intelligent reflection across multiple dimensions, STAR-RIS can dynamically adjust reflection coefficients to improve spectral efficiency and coverage in large-scale WSN deployments, which could significantly improve the adaptability of WSNs in dynamic environments. Additionally, developing low-complexity optimization techniques that reduce computational demands while maintaining high accuracy is another promising direction. For instance, lightweight AI-driven heuristics and metaheuristic algorithms could be designed to fine-tune RIS phase shifts in real time with minimal computational overhead. Such techniques would enable real-time decision-making and near-optimal performance in WSNs operating under strict energy and hardware constraints, making them more practical for real-world deployment. Furthermore, integrating emerging communication technologies such as integrated sensing and communication (ISAC) and fluid antenna systems (FASs) could unlock new opportunities for performance enhancement. ISAC can simultaneously support wireless communication and environmental sensing within the same infrastructure, reducing energy consumption and hardware requirements. 
By enabling sensor nodes to simultaneously sense their surroundings and transmit data, ISAC could enhance the adaptability of WSNs in mission-critical applications, such as industrial automation and smart cities, where real-time situational awareness is crucial. Similarly, FAS provides dynamic reconfigurability by allowing antenna elements to adapt their positions and configurations in real time, improving beamforming and interference management. This flexibility enables sensor nodes to dynamically switch antenna patterns in response to varying channel conditions, thereby optimizing energy efficiency and secure data transmission. Another crucial direction for future research is investigating the impact of dynamic environmental conditions on RIS performance and adaptability. While RIS-based systems have shown promising results in improving signal propagation and energy efficiency, their performance can be significantly affected by environmental factors such as mobility, atmospheric variations, interference, and multipath fading. Developing robust adaptation mechanisms that allow RIS configurations to self-adjust in response to these conditions is essential for ensuring reliable performance in real-world deployments. Machine learning-driven predictive models could be employed to anticipate environmental fluctuations and proactively reconfigure RIS elements, enhancing resilience against channel degradation. Additionally, designing hybrid RIS architectures that can seamlessly transition between passive and active reflection modes may further improve their adaptability under unpredictable network conditions.

Author Contributions

Conceptualization, S.S.K. and M.K.; methodology, M.S. and S.S.K.; software, M.S. and R.S.; validation, M.S. and M.K.; formal analysis, M.S.; investigation, S.S.K., M.S. and M.K.; writing—original draft preparation, S.S.K., M.S. and R.S.; writing—review and editing, M.K.; visualization, M.K. and R.S.; supervision, M.K.; project administration, M.K.; funding acquisition, M.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.


Figure 1. The proposed double-RIS-aided WSN communication system. RIS₁ is configured to enhance the EH capabilities of the SN, while RIS₂ is placed to improve the SR of the communication system.

Figure 2. Overview of the proposed FDRL framework, integrating fuzzy logic, DRL, and MOABC.

Figure 3. Flow diagram of the proposed DRL model.

Figure 4. EH vs. distance between SD and RF source for different RIS₁ scenarios.

Figure 5. Secrecy rate vs. RF source power for different RIS₁ and RIS₂ scenarios.

Figure 6. EH vs. distance between SD and RF source for different optimization frameworks and $N_{1} = 16$.

Figure 7. Representation of common membership functions used in fuzzy logic systems.

Figure 8. Convergence trends of RMSE across 300 epochs for the proposed models.

Table 1. Optimized parameter configurations for the proposed and benchmark models based on iterative trial-and-error techniques.

Algorithm	Calibration Parameter	Best Value
DRL (MOABC-RL-LSTM)	$\varepsilon$-greedy	0.18–0.91
	Learning rate	0.002
	Batch size	128
	Discount factor	0.93
	Recurrent dropout rate	0.2
	Sequence length	20
	Number of hidden layers	10
	Number of neurons in hidden layers	56
	Activation function	Tanh and sigmoid
	Optimizer	MOABC
	Number of onlooker bees	80
	Number of employed bees	120
	Population size	150
	Iteration	300
RNN	Learning rate	0.06
	Batch size	128
	Sequence length	10
	Number of hidden layers	8
	Number of neurons in hidden layers	32
	Activation function	Tanh and sigmoid
	Optimizer	SGD
XGBoost	Learning rate	0.22
	Max depth	4
	Number of estimators	300
KNN	Number of neighbors	8
	Distance metric	Euclidean distance
	Weights	Uniform
	Algorithm	Kd-tree
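
Table 1 reports the $\varepsilon$-greedy setting only as a range (0.18–0.91) and does not state how $\varepsilon$ moves within it. The following is a minimal sketch that assumes a linear decay from the upper to the lower bound over the 300 iterations listed in the table; the schedule shape and the names `epsilon_at` and `select_action` are illustrative assumptions, not the authors' implementation.

```python
import random

EPS_MAX = 0.91   # upper end of the epsilon range in Table 1
EPS_MIN = 0.18   # lower end of the epsilon range in Table 1
N_ITER = 300     # iteration count in Table 1


def epsilon_at(step, n_steps=N_ITER, eps_max=EPS_MAX, eps_min=EPS_MIN):
    """Linearly anneal epsilon from eps_max to eps_min (assumed schedule)."""
    if n_steps <= 1:
        return eps_min
    # Clamp the step index so out-of-range values stay inside the schedule.
    frac = min(max(step, 0), n_steps - 1) / (n_steps - 1)
    return eps_max + frac * (eps_min - eps_max)


def select_action(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With this schedule, early iterations explore most of the time and later iterations mostly exploit the learned Q-values; other decay shapes (for example exponential) would fit the reported range equally well.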

Table 2. Comparison of RMSE values for different fuzzy membership functions before and after optimization using the MOABC for test data across various models.

Model	Triangular (standard)	Gaussian (standard)	Trapezoidal (standard)	Triangular (optimized)	Gaussian (optimized)	Trapezoidal (optimized)
FDRL	2.03	1.88	1.65	0.32	0.21	0.09
RL	16.27	14.26	13.90	9.21	7.62	6.32
LSTM	14.39	15.24	12.37	8.56	7.06	5.96
RNN	19.86	17.67	18.32	12.47	11.23	9.47
XGBoost	31.49	33.65	32.18	19.32	17.19	15.55
KNN	34.29	32.19	31.29	21.46	19.84	17.90
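
For readers unfamiliar with the three membership-function families compared in Table 2, the sketch below gives their textbook definitions. The parameter names (`a`, `b`, `c`, `d`, `mean`, `sigma`) are illustrative; the MOABC-tuned parameter values are not listed in this section, so none are assumed here.

```python
import math


def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, peak of 1 at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def gaussian(x, mean, sigma):
    """Gaussian membership centred at mean with spread sigma (sigma > 0)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)
```

Optimizing a membership function in this setting amounts to searching over these shape parameters, which is the role the paper assigns to the MOABC.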

Table 3. Performance comparison of the proposed FDRL model with benchmark algorithms.

Model	Training RMSE	Training MAPE	Training $R^{2}$	Testing RMSE	Testing MAPE	Testing $R^{2}$
FDRL	0.02	0.88%	0.97	0.09	1.07%	0.95
RL	3.95	7.21%	0.88	6.32	10.76%	0.82
LSTM	2.24	5.32%	0.90	5.96	8.24%	0.85
RNN	6.86	9.43%	0.86	9.47	13.29%	0.80
XGBoost	10.49	14.39%	0.82	15.55	18.74%	0.74
KNN	13.31	17.37%	0.79	17.90	21.66%	0.68
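
The three metrics in Table 3 can be computed as in the sketch below, assuming the standard definitions of RMSE, MAPE (in percent), and the coefficient of determination $R^{2}$. The function names are illustrative, and the paper's exact evaluation code is not shown in this section.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (targets must be nonzero)."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Lower RMSE and MAPE and an $R^{2}$ closer to 1 indicate a better fit, which is how the rows of Table 3 should be read.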

Table 4. Comparison of the run-time performance of the proposed FDRL model and benchmark algorithms under two conditions: fixed epochs and achieving specific RMSE thresholds.

Model	Run time (s), 300 epochs	Run time (s) to RMSE < 20.00	Run time (s) to RMSE < 10.00	Run time (s) to RMSE < 3.00
FDRL	835	34	263	415
RL	693	128	432	-
LSTM	761	486	524	-
RNN	702	205	-	-
XGBoost	641	537	-	-
KNN	612	692	-	-

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

