Article

Inner External DQN LoRa SF Allocation Scheme for Complex Environments

College of Communication and Information Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(14), 2761; https://doi.org/10.3390/electronics13142761
Submission received: 23 June 2024 / Revised: 11 July 2024 / Accepted: 12 July 2024 / Published: 14 July 2024
(This article belongs to the Special Issue IoT-Enabled Smart Devices and Systems in Smart Environments)

Abstract:
In recent years, the development of Internet of Things (IoT) technology has driven growing demand for low-power wireless communication, giving rise to LoRa technology. A LoRa network mainly consists of terminal nodes, gateways, and LoRa network servers. Because LoRa networks often deploy large numbers of terminal nodes for environmental sensing, the limited resources of LoRa technology, the explosive growth in the number of nodes, and ever-changing complex environments pose unprecedented challenges for network performance. Although some research has addressed these challenges by allocating channels within the LoRa network, the impact of complex and changing environmental factors has yet to be considered. Reasonable channel allocation should be tailored to the situation: different environments and network distributions should be handled through continuous adaptive learning that yields a corresponding allocation strategy. Moreover, most current research focuses only on the channel adjustment of the LoRa node itself and does not consider the indirect impact of a node's allocation on the entire network. The Inner External DQN SF allocation method (IEDQN) proposed in this paper improves the packet reception rate of the whole system by using reinforcement learning for adaptive learning of the environment. Through nested reinforcement learning, it accounts for the impact of each node's parameter configuration on the entire network and performs a further round of optimization so that the performance of the whole network is optimized. Finally, this paper evaluates the performance of IEDQN through simulation. The experimental results show that IEDQN effectively optimizes network performance.

1. Introduction

Low-power wide area network (LPWAN) [1] technology has recently revolutionized Internet of Things access. Precursors of LPWAN began to emerge worldwide as early as the 1980s, and some of these technologies developed into large-scale networks; typical examples include the global low-rate data network DataTAC, Mobex, which originated in Europe, and AlarmNet, which was created for fire alarms. As time progressed, however, all three technologies declined along with the decline in commercial use of 2G networks [2]. In 2023, with global operators adopting 4G as the mainstream network and beginning to deploy 5G on a large scale, the retirement of 2G networks was implemented or placed on the agenda. As the Internet of Things (IoT) market advances, especially with the increasing demand for wireless connectivity in low-rate, low-power-consumption scenarios, LPWAN technologies represented by LoRa and NB-IoT [3] are filling this gap.
LPWAN technology features long-distance transmission, low power consumption, and low maintenance. Compared with existing technologies such as WiFi, Bluetooth, and ZigBee, LPWAN can achieve comprehensive area coverage and build out the Internet of Things at low cost. Many types of LPWAN technology are available on the market, the most prominent being LoRa (Long Range), SigFox, MIOTY, Weightless, and Ingenu [4]. Table 1 shows a comparison of performance indicators [5] for various LPWAN technologies.
LoRa technology [6] is a proprietary LPWAN technology based on chirp spread spectrum (CSS) [7] radio propagation; it is a low-power, wide-area wireless communication technology. LoRa uses spread spectrum techniques and forward error correction to operate at low signal strengths, providing long-distance, low-power wireless connectivity over an extensive range. The LoRa network operates in unlicensed frequency bands and can be deployed flexibly without relying on operator network equipment. Each gateway can connect thousands of nodes, which makes it suitable for large-scale IoT deployment [8]. LoRa technology was launched by Semtech, and what is generally referred to as LoRa mainly describes the physical layer. LoRaWAN, an open standard managed by the LoRa Alliance, is the MAC-layer communication protocol and defines how wireless IoT connections are implemented over LoRa. Because the LoRa physical-layer specification is open, users can design a communication protocol suited to their application scenarios based on specific requirements; in practice, however, the underlying chirp modulation that implements LoRa can only be obtained from Semtech [9]. Semtech exposes a set of configurable parameters, such as the spreading factor (SF), transmission power (TP), coding rate (CR), and bandwidth (BW), to adapt to different LoRa networks. Among these parameters, SF is without doubt the most critical. The core idea of LoRa modulation is to modulate the baseband signal with a chirp whose frequency changes over time; the rate of chirp change is determined by the SF. The larger the spreading factor, the farther the transmission distance, but at the cost of a reduced data rate, because a longer chirp is used to represent each symbol. In addition, different SF channels are orthogonal to each other, so packets transmitted simultaneously on different SFs do not collide. Although in practice there is some interference between different SFs, users can keep SF transmissions effectively orthogonal by adjusting the remaining parameters; in this study, it is therefore assumed that different SFs represent different channels with no collisions between them.
Regarding spreading factor allocation, ref. [10] proposed an SF allocation technique for LoRa networks based on reinforcement learning, aiming to maximize the throughput of the entire network. The method uses temporal-difference reinforcement learning to iteratively allocate channels to each node according to each channel's packet reception performance, and comparisons with uniform allocation and a greedy algorithm show that it performs best. Although this method allocates a channel to each node, it does not consider the indirect impact of a node's allocation on the entire network. The authors of [9] proposed a channel-occupancy-aware resource allocation method for LoRa networks focused on energy efficiency, using DQN with optimization assistance. The method first uses DQN to allocate SFs to the channels reported by the terminal nodes and then solves a convex optimization problem to distribute transmission power. Because its aim is to minimize the energy consumption of the entire system, the DQN's feedback signal is energy consumption, and network performance is not treated as the primary objective. Currently, most SF allocation strategies actively modify parameters on the node side, which improves the node's own parameters but does not consider whether the performance of the entire network improves; in other words, the optimal strategy for each node is not equivalent to the optimal strategy for the whole system. Moreover, the performance of a LoRa network depends not only on the parameters of each node but also on the complex external environment. Based on these observations, this paper starts from the complex environment and first establishes a model for optimizing network performance. It then proposes a random spreading factor allocation method (SF-random) that assigns each node's SF at random according to given empirical probabilities. On this basis, this paper proposes the Inner External DQN SF allocation method. By learning from the environment, the Inner DQN obtains a locally optimal solution for each node; this is then fed back to the External DQN, which considers the impact of the local optimum on the performance of the entire system and performs a second round of optimization so that the performance of the whole network is optimized. In addition, the External DQN continuously feeds the four-tuples obtained from interaction with the environment to the Target Q Network and Q Network, which learn to predict the next action, so that the performance of the entire system reaches the optimum through continuous iteration and learning. The main contributions of this research are as follows:
  • To optimize system performance, this paper proposes a method of allocating parameters to nodes through the gateway, as opposed to the traditional method of directly changing node parameters.
  • The method proposed in this paper first considers how to optimize each node; then it considers the impact of the optimal single node on the overall optimization and suggests the Inner External DQN SF allocation method.
  • Current network performance optimization methods do not take the impact of the environment into account. Starting from the node-optimal Inner DQN method, this paper shows that SF allocation can be carried out through continuous learning from the environment.
  • Using the LoRaSim simulation software, this paper compares the proposed method against the SF-random baseline and verifies its effectiveness.
  • With the proposed method, a dedicated node SF allocation scheme can be established for each fixed scene according to local conditions. For scenes where the environment changes, the parameters of all nodes can be updated intermittently through the gateway without manual intervention, effectively improving the performance of the entire network.
The rest of this paper is organized as follows. Section 2 presents the system model and problem description. In Section 3, this paper proposes the SF-random method for complex environments, and on this basis, the Inner External DQN SF allocation method is proposed. Section 4 evaluates the performance of the proposed Inner External DQN SF allocation method in multiple aspects. Finally, Section 5 summarizes the entire paper.

2. System Model and Problem Description

2.1. System Model

As shown in Figure 1, this paper investigates the LoRa network link model in complex environments. The network is considered a star network [11] comprising a LoRa network server, a gateway, and multiple end nodes. End nodes transmit data to the gateway via LoRa, and the gateway transmits data to the LoRa network server via Ethernet/4G/WiFi. This study considers $M$ LoRa nodes $N = \{ N(i) \mid 1 \le i \le M \}$ propagating over six channels, each with an orthogonal spreading factor (Equation (1)). As shown, nodes with the same channel assignment are marked with the same color.
$SF = \left\{ SF(i) \mid i \in \{1, 2, \ldots, 6\},\ SF(i) \in \{7, 8, \ldots, 12\} \right\}$    (1)
Considering the architecture of existing IoT networks, the gateway in this LoRa system model is primarily responsible for receiving packets sent by nodes and can modify the SF parameters of LoRa nodes via custom protocols. The spreading factor value assigned to node $n$ is denoted as $A_n = SF(i)$, and $SF_{\text{flag}}(i)$ indicates whether the spreading factor of node $i$ has been assigned. For uplink data transmission, nodes send fixed-size packets of $L = 10$ bytes to the gateway $G$ every $T = 10$ seconds on average, following a Poisson distribution [12]. The uplink transmission delay of each node is denoted as $D(i)$.
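To make this traffic model concrete, the sketch below (illustrative Python, not the authors' simulation code; the observation window is an assumed value) generates Poisson packet arrivals with mean interval T = 10 s for each of the M nodes.

```python
import random

# Sketch of the uplink traffic model described above: each of the M nodes
# generates 10-byte packets whose inter-arrival times are exponential with
# mean T = 10 s, i.e., a Poisson process with rate 1/T.
M = 100          # number of LoRa nodes (Table 3)
T = 10.0         # mean transmission interval [s]
L = 10           # payload size [bytes]
SIM_TIME = 3600  # observation window [s] -- assumed for illustration

def generate_arrivals(rate, horizon):
    """Return the packet transmission times of one node over the horizon."""
    t, times = 0.0, []
    while True:
        t += random.expovariate(rate)   # exponential inter-arrival time
        if t > horizon:
            return times
        times.append(t)

arrivals = {node: generate_arrivals(1.0 / T, SIM_TIME) for node in range(M)}
total_packets = sum(len(v) for v in arrivals.values())
print(f"{total_packets} packets offered by {M} nodes in {SIM_TIME} s")
```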
Under the interference of complex environments with LoRa link transmission, this paper categorizes the impacts of complex environments into path loss and shadowing effects [13]. Shadowing effects are caused by obstacles between the transmitter and receiver, which attenuate signal power through absorption, reflection, scattering, and diffraction, and may even block the signal entirely. Since these factors are generally unknown in practice, they must be characterized by statistical models. The most commonly used model is the log-normal shadowing model, which assumes the ratio of transmitted power to received power $\psi = P_t / P_r$ is a log-normally distributed random variable, yielding Equation (2), where $\xi = 10 / \ln 10$, $\mu_{\psi_{\mathrm{dB}}}$ is the mean of $\psi_{\mathrm{dB}}$ (the average path loss) in dB, and $\sigma_{\psi_{\mathrm{dB}}}$ is the standard deviation of $\psi_{\mathrm{dB}}$ in dB.
$p(\psi) = \dfrac{\xi}{\sqrt{2\pi}\,\sigma_{\psi_{\mathrm{dB}}}\,\psi} \exp\!\left[ -\dfrac{\left(10\log_{10}\psi - \mu_{\psi_{\mathrm{dB}}}\right)^2}{2\sigma_{\psi_{\mathrm{dB}}}^2} \right], \quad \psi > 0$    (2)
In terms of path loss, it is generally assumed that a signal experiences the same path loss over the same propagation distance under identical conditions. The large-scale characteristics of the channel follow a log-normal distribution and are commonly described by the log-normal shadowing path loss model shown in Equation (3), where $PL(d)$ is the path loss of the received signal at distance $d$ (in meters), $PL(d_0)$ is the path loss at the reference distance $d_0$, $n$ is the path loss exponent of the specific environment and describes the rate at which the loss increases with distance, and $X_\sigma$, expressed in dB, represents shadow fading and corresponds to a log-normally distributed random variable [14].
$PL(d) = PL(d_0) + 10\,n\,\log_{10}\!\left(\dfrac{d}{d_0}\right) + X_\sigma$    (3)
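The following sketch illustrates Equation (3) in Python. The path loss at the reference distance, PL(d_0), is an assumed value used only for illustration, while the exponent, reference distance, and shadowing standard deviation follow Table 3.

```python
import math
import random

def path_loss_db(d, d0=1.5e3, pl_d0=95.0, n=2.08, sigma_db=1.0):
    """Log-normal shadowing path loss of Equation (3), in dB.

    d        : distance between node and gateway [m]
    d0       : reference distance [m] (1.5 km, as in Table 3)
    pl_d0    : path loss measured at d0 [dB] -- assumed value for illustration
    n        : path loss exponent (2.08, as in Table 3)
    sigma_db : standard deviation of the shadowing term X_sigma [dB]
    """
    x_sigma = random.gauss(0.0, sigma_db)   # shadow fading, Gaussian in dB
    return pl_d0 + 10.0 * n * math.log10(d / d0) + x_sigma

# Received power of a 22 dBm transmission at 1.2 km (illustrative only)
p_tx_dbm = 22.0
p_rx_dbm = p_tx_dbm - path_loss_db(1.2e3)
print(f"received power ≈ {p_rx_dbm:.1f} dBm")
```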
Considering the impact of the aforementioned environmental factors on data transmission, this paper uses $\alpha = \{ \alpha_n \mid 1 \le n \le M \}$ to represent the packet arrival rate of each node. Therefore, the number of packets received by the gateway from the nodes within a time period $\Delta T$ is given by Equation (4).
$P_{\mathrm{receive}} = \sum_{n=1}^{M} \alpha_n \Delta T$    (4)
Since each node sends packets to the gateway every $T$ seconds according to a Poisson distribution, the theoretical number of packets received by the gateway within $\Delta T$ in the absence of path loss is given by Equation (5).
$P_{\mathrm{Theoretical}} = \dfrac{\Delta T}{T} M$    (5)
The proposed model aims to optimize system performance and introduces the packet reception rate (PRR) as a performance metric. The PRR, defined as the proportion of successfully received packets out of the total packets sent, is commonly used to measure the reliability of packet transmission; observing changes in the PRR therefore approximates changes in system performance. Given the actual number of received packets $P_{\mathrm{receive}}$ and the theoretical number $P_{\mathrm{Theoretical}}$, the PRR is calculated as shown in Equation (6).
$PRR = \dfrac{P_{\mathrm{receive}}}{P_{\mathrm{Theoretical}}}$    (6)
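Equations (4)–(6) translate directly into a short routine; the sketch below uses made-up reception counts purely to illustrate the computation.

```python
def packet_reception_rate(received_per_node, delta_t, t_interval, m_nodes):
    """PRR of Equations (4)-(6): actual receptions over the theoretical
    number of packets offered in the window delta_t (illustrative sketch)."""
    p_receive = sum(received_per_node)                 # Equation (4)
    p_theoretical = (delta_t / t_interval) * m_nodes   # Equation (5)
    return p_receive / p_theoretical                   # Equation (6)

# Example: 100 nodes, 10 s interval, 1 h window; the counts are made up
counts = [355] * 50 + [348] * 50
print(f"PRR = {packet_reception_rate(counts, 3600, 10, 100):.4f}")
```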

2.2. Problem Formulation

The objective of this paper is to maximize system performance [11], characterized by the packet reception rate [15], leading to the following formulation:
$\begin{aligned} \max_{G, N} \quad & PRR \\ \text{s.t.} \quad & D(i) < T \quad \forall i \\ & SF_{\text{flag}}(i) \in \{0, 1\} \quad \forall i \\ & \textstyle\sum_{i=1}^{M} SF_{\text{flag}}(i) = M \end{aligned}$    (7)
The first constraint in Equation (7) ensures that the uplink transmission delay of each node is less than the specified packet transmission interval. The second constraint specifies that each node can be assigned to only one channel. The third constraint ensures that every node is assigned to a channel.
This paper utilizes the PRR to evaluate the performance of the system. Based on this, the paper first introduces a reinforcement learning approach. It then proposes the SF-random method, which is based on random spreading factor allocation, and subsequently introduces the Inner DQN and External DQN methods to explain the proposed Inner External DQN SF allocation approach.

3. SF Allocation Model Based on Inner External DQN

3.1. Deep Reinforcement Learning

Reinforcement learning (RL) [16,17] is a domain of machine learning that deals with sequential decision making. The Markov decision process (MDP) [18,19] is a typical formulation of RL problems and consists of the tuple $(S, A, T, R, \varepsilon)$, where $S$ is the state space, $A$ is the action space, $T(s, a, s')$ denotes the probability of transitioning from state $s$ to state $s'$ after taking action $a$, $R(s, a, s')$ is the reward function and returns the reward obtained by taking action $a$ in state $s$ to reach state $s'$, and $\varepsilon \subset S$ is the set of terminal states that admit no further actions or rewards.
An important aspect of RL is that it enables the agent to learn optimal behaviors, gradually modifying or acquiring new behaviors and skills. Another crucial aspect is trial-and-error experience: the agent collects information through continuous interaction with the environment and learns to determine its next action based on this information. Generally, RL problems can be reduced to discrete-time stochastic control problems in which the agent interacts with the environment, as illustrated in Figure 2: starting from an initial state $s_0 \in S$, it collects an observation $\omega_0 \in \Omega$, and at each step $t$ the agent takes an action $a_t \in A$. This action has three consequences: (i) the agent receives a reward $r_t \in R$, (ii) the state transitions to $s_{t+1} \in S$, and (iii) the agent obtains the next observation $\omega_{t+1} \in \Omega$ [20].
When neural networks are used to approximate the value function $\hat{v}(s; \theta)$ or $\hat{q}(s, a; \theta)$ in RL, the approach is termed deep reinforcement learning (DRL) [21], where $\theta$ represents the weights of the deep neural network. DRL commonly employs gradient descent to update the weight parameters [22]. The Deep Q-Network proposed by Mnih et al. has become the benchmark and starting point for much deep reinforcement learning research.
The Q function is a concept associated with MDPs: $Q^{\pi}: S \times A \rightarrow \mathbb{R}$ defines the expected future discounted reward of taking action $a$ in state $s$ and then following policy $\pi$. According to the Bellman equation, the Q function under the optimal policy can be expressed recursively as:
$Q^{*}(s, a) = \sum_{s' \in S} T(s, a, s') \left[ R(s, a, s') + \gamma \max_{a'} Q^{*}(s', a') \right]$    (8)
Here, $0 \le \gamma \le 1$ is the discount factor and determines the value of immediate rewards relative to future rewards. Given $Q^{*}$, the optimal policy $\pi^{*}$ is obtained by greedily selecting the action with the highest Q-value in the current state: $\pi^{*}(s) = \arg\max_{a} Q^{*}(s, a)$. In Q-learning, the agent starts with an arbitrary estimate $Q_0$ and updates its Q function estimate by taking arbitrary actions in the environment, as shown in Equation (9); the agent observes the reward of each action and the subsequent state. The terms $s_t$, $a_t$, and $r_t$ represent the state, action, and reward at step $t$, and $\alpha_t$ is the step size parameter [23].
$Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha_t \left[ r_{t+1} + \gamma \max_{a} Q_t(s_{t+1}, a) - Q_t(s_t, a_t) \right]$    (9)
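For readers less familiar with Equation (9), the following minimal tabular Q-learning sketch (generic Python, not the paper's DQN) shows how a single transition updates the Q table; the SF-valued states and the reward figure are illustrative assumptions.

```python
from collections import defaultdict

# Minimal tabular Q-learning update of Equation (9): Q(s, a) moves toward the
# TD target r + gamma * max_a' Q(s', a').
Q = defaultdict(float)          # Q[(state, action)] -> value, defaults to 0
alpha, gamma = 0.1, 0.9         # step size and discount factor

def q_update(s, a, r, s_next, actions):
    """One Q-learning step for the transition (s, a, r, s_next)."""
    td_target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])

# Toy usage: states/actions are SF values, the reward is a made-up SNR figure
actions = range(7, 13)
q_update(s=7, a=9, r=-4.5, s_next=9, actions=actions)
print(Q[(7, 9)])
```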
The Deep Q-Network (DQN) [24] is a variant of the classical Q-learning algorithm that uses deep convolutional neural networks [25] to approximate the Q function [26]. The deep architecture provides a general mechanism for estimating Q values from historical experience. DQN maintains a large set of recent experiences, each a tuple $(s, a, s', r, T)$ representing the state $s$, the action $a$ taken to reach state $s'$, the reward $r$ received, and a flag $T$ indicating whether the state is terminal. After each step, the agent adds the experience to its memory and randomly samples mini-batches for Q function updates. Experience replay shuffles the data to break the correlations between consecutive experiences, making the samples approximately independently and identically distributed, thereby reducing the variance of parameter updates and improving convergence. DQN introduces a target network $Q(s, a; w^{-})$ and an evaluation network $Q(s, a; w)$ [27], which share the same structure but have different parameters. The evaluation network guides the agent's behavior, while the target network is used to compute the TD target. Only the evaluation network's weights are updated during training; after a certain number of steps, the target network's weights are copied from the evaluation network, which stabilizes learning by keeping the target value relatively constant over several updates [28].
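A minimal DQN skeleton with experience replay and a periodically synchronized target network, as described above, could look like the following sketch. PyTorch is assumed, the network size is arbitrary, and the hyper-parameters are taken from Table 4 where available; this is a generic illustration, not the authors' implementation.

```python
import random
from collections import deque

import torch
import torch.nn as nn


class QNet(nn.Module):
    """Small fully connected Q network (layer sizes are arbitrary choices)."""

    def __init__(self, n_states, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_states, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, x):
        return self.net(x)


class DQNAgent:
    def __init__(self, n_states, n_actions, gamma=0.9, lr=0.1, eps=0.7,
                 buffer_size=2000, batch_size=32, target_update=20):
        self.q_eval = QNet(n_states, n_actions)        # evaluation network
        self.q_target = QNet(n_states, n_actions)      # target network
        self.q_target.load_state_dict(self.q_eval.state_dict())
        self.memory = deque(maxlen=buffer_size)        # experience replay buffer
        self.optim = torch.optim.SGD(self.q_eval.parameters(), lr=lr)
        self.gamma, self.eps = gamma, eps
        self.batch_size, self.target_update = batch_size, target_update
        self.n_actions, self.learn_steps = n_actions, 0

    def choose_action(self, state):
        # epsilon-greedy: random action with probability eps, greedy otherwise
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            q = self.q_eval(torch.as_tensor(state, dtype=torch.float32).reshape(1, -1))
        return int(q.argmax().item())

    def store_transition(self, s, a, r, s_next, done):
        self.memory.append((s, a, r, s_next, done))

    def learn(self):
        if len(self.memory) < self.batch_size:
            return
        batch = random.sample(self.memory, self.batch_size)
        s, a, r, s_next, done = zip(*batch)
        s = torch.as_tensor(s, dtype=torch.float32).reshape(self.batch_size, -1)
        s_next = torch.as_tensor(s_next, dtype=torch.float32).reshape(self.batch_size, -1)
        a = torch.as_tensor(a, dtype=torch.int64).unsqueeze(1)
        r = torch.as_tensor(r, dtype=torch.float32)
        done = torch.as_tensor(done, dtype=torch.float32)

        q_sa = self.q_eval(s).gather(1, a).squeeze(1)       # Q(s, a) from eval net
        with torch.no_grad():                               # TD target from target net
            q_next = self.q_target(s_next).max(1).values
        target = r + self.gamma * q_next * (1.0 - done)
        loss = nn.functional.mse_loss(q_sa, target)

        self.optim.zero_grad()
        loss.backward()
        self.optim.step()

        self.learn_steps += 1
        if self.learn_steps % self.target_update == 0:      # periodic hard update
            self.q_target.load_state_dict(self.q_eval.state_dict())
```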

3.2. SF-Random

In traditional LoRa networks, each node's spreading factor (SF) is allocated using a uniform distribution. However, according to the Semtech SX1262 datasheet, when all SF channels have signal coverage, the on-air transmission time of a node is given by:
$\mathrm{ToA} = \dfrac{2^{SF}}{BW} \cdot N_{\mathrm{symbol}}$    (10)
where SF is the spreading factor, BW is the bandwidth, ToA is the on-air time of a LoRa transmission, and $N_{\mathrm{symbol}}$ is the number of symbols [29]. This shows that air time grows exponentially with SF: as air time increases, the probability of a node occupying the channel increases, leading to a higher likelihood of collisions within the same channel. Therefore, a uniform distribution is unsuitable for SF allocation under the SF channel coverage conditions considered in this paper. As the allocation probability of a single SF channel increases, the number of nodes assigned to that channel also increases; for low-SF channels, although the per-node data rate is higher, the collision probability also rises with the number of nodes. Algorithm 1 presents the steps for calculating the optimal SF allocation probability.
Algorithm 1 Calculation of the optimal SF allocation probabilities for SF-random
1: Input: The SF values (ranging from 7 to 12) and the transmission time interval D of each node.
2: Output: Statistical table of SF allocation probabilities.
3: Initialization: SF_flag(i) = 0, i ∈ {1, 2, …, M}; SF_count(i) = 0, i ∈ {7, 8, …, 12}; SF_probability(i) = 1/6, i ∈ {7, 8, …, 12}; A_i = 0, i ∈ {1, 2, …, M}; HashTable().
4: while 1 do
5:     for i = 1, 2, …, M do
6:         A_i = random(SF_probability)
7:     end for
8:     Calculate PRR
9:     HashTable.append(SF_probability(i), PRR)
10:    SF_probability(7) += 1.6%; SF_probability(8) += 0.2%; SF_probability(9) += 0%; SF_probability(10) −= 0.5%; SF_probability(11) −= 0.6%; SF_probability(12) −= 0.7%
11:    A_i = 0, i ∈ {1, 2, …, M}
12:    if SF_probability(7) > 52% then
13:        Output: max(HashTable)
14:    end if
15: end while
During the initialization phase, each node is given the same SF allocation probability, and SF allocation begins according to S F probability ( i ) . After each SF allocation, the corresponding counter increments by 1, and the allocation flag is set. When all nodes’ SF allocations are completed, the system’s PRR value is calculated and recorded in the hash table. By fine-tuning the SF allocation probability to meet the aforementioned distribution trend, this process is repeated until the allocation probability for SF7 reaches 52%, indicating the optimal probability calculation is complete. Reviewing the recorded hash table, the SF allocation probability for the maximum system PRR is obtained.
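The probability-tuning loop of Algorithm 1 can be sketched as follows; the PRR evaluation is only a placeholder here (in the paper it comes from the LoRaSim model), and the per-iteration adjustments follow the values listed in Algorithm 1.

```python
import random

# Sketch of the probability-tuning loop in Algorithm 1. `evaluate_prr` is a
# dummy placeholder; in the paper the PRR comes from the LoRaSim model.
M = 100
sfs = [7, 8, 9, 10, 11, 12]
prob = {sf: 1.0 / 6.0 for sf in sfs}                       # uniform initialisation
step = {7: +0.016, 8: +0.002, 9: 0.0, 10: -0.005, 11: -0.006, 12: -0.007}
history = []                                               # (probabilities, PRR) records

def evaluate_prr(assignment):
    """Placeholder PRR; replace with a LoRaSim-based evaluation."""
    return random.uniform(0.9, 1.0)

while True:
    weights = [prob[sf] for sf in sfs]
    assignment = random.choices(sfs, weights=weights, k=M)  # one SF per node
    history.append((dict(prob), evaluate_prr(assignment)))
    for sf in sfs:                                          # fine-tune the distribution
        prob[sf] += step[sf]
    if prob[7] > 0.52:                                      # stop condition of Algorithm 1
        best = max(history, key=lambda item: item[1])
        print("best SF probabilities:", best[0], "with PRR", best[1])
        break
```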
Figure 3 shows the variation of the PRR under different SF allocation probabilities. Observation reveals that the PRR peaks at 98.09% at the 17th iteration. Thereafter, as the SF allocation probability changes, collisions among nodes with the same SF increase, causing the PRR to gradually decline until the iterations stop. Table 2 shows the optimal SF allocation ratio for the SF-random method, which significantly optimizes the overall system performance compared to the initial uniform distribution.

3.3. Inner External DQN SF Allocation Scheme

3.3.1. Inner System DQN

Considering the SF allocation for a single node, this paper proposes the Inner System DQN method to optimize the SF allocation for individual nodes through continuous interaction and learning with the environment. The primary objective of the Inner System DQN model is to maximize the signal-to-noise ratio for a single node.
During the optimization process, due to potential multipath effects and co-channel interference among different nodes, it is impractical to optimize the SNR through manual parameter adjustments. Therefore, using the DQN method to obtain state inputs directly from the environment and output the corresponding Q-value estimates for each action allows efficient end-to-end learning, ultimately maximizing the SNR. To clarify the relationship between the Inner System DQN and the subsequent External System DQN, we define Inner_Ite as the number of one-step iterations performed by the Inner System DQN, after which the process transitions to the External System DQN. Figure 4 illustrates its detailed internal design. Next, we define the parameters of the Inner System DQN algorithm, including the state space, action space, and reward function.
State space: Given that the goal of the Inner System DQN is to maximize the node’s SNR, we represent the state space of the node as S ( i ) = S N R . A higher SNR indicates a higher power ratio of the signal to noise, resulting in stronger signals, less noise, and improved transmission quality and overall system performance.
Action space: The Inner System DQN optimizes the current node’s parameters by modifying the SF. Since the SF range is typically from 7 to 12, the action space is also restricted to this range to ensure compatibility.
Reward function: The objective of the Inner DQN is to optimize the node's SNR; thus, $r = \mathrm{SNR}$ is used as the feedback signal. Through this feedback, the Inner System DQN interacts with the environment to maximize the long-term cumulative reward.
$a_t = \begin{cases} \text{random action}, & \text{with probability } \epsilon \\ \arg\max_{a} Q(s_t, a), & \text{with probability } 1 - \epsilon \end{cases}$    (11)
In the Inner System DQN, the agent needs to learn the optimal SF allocation strategy in an unknown environment. During the exploration phase, the agent attempts unknown actions to discover more information about the environment; during the exploitation phase, it selects the best actions based on the knowledge it has acquired. The Inner System DQN uses the ϵ-greedy strategy to balance exploration and exploitation [30]. Specifically, the ϵ-greedy strategy selects a random action with probability ϵ and the current best action with probability 1 − ϵ, as shown in Equation (11). This allows the agent to try different actions during learning, avoiding local optima, and to gradually reduce the exploration frequency so as to exploit the learned knowledge more effectively.
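The sketch below illustrates one Inner System DQN decision step under the state/action/reward design above; the SNR measurement is a stand-in for the LoRaSim or real-link feedback, and the Q-values are assumed to come from the trained evaluation network.

```python
import random

ACTIONS = list(range(7, 13))     # the six SF values 7..12
EPSILON = 0.7                    # probability of a random (exploratory) action

def measure_snr(node_id, sf):
    """Stand-in for the SNR observed after assigning `sf` to the node."""
    return random.uniform(-20.0, 10.0)

def choose_sf(q_values):
    """Equation (11): random SF with probability EPSILON, greedy SF otherwise."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    best_index = max(range(len(ACTIONS)), key=lambda i: q_values[i])
    return ACTIONS[best_index]

def inner_step(node_id, q_values):
    sf = choose_sf(q_values)          # action: an SF value
    snr = measure_snr(node_id, sf)    # next state: the node's SNR
    reward = snr                      # reward r = SNR
    return sf, snr, reward

print(inner_step(node_id=3, q_values=[0.0] * 6))
```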
This paper presents the pseudo-code steps of the Inner System DQN in Algorithm 2. Since the environment for each node differs, it is necessary to first determine the ID of the operating node. For this node, the Inner System DQN model continuously learns through interaction with the environment by using the ϵ -greedy strategy. During the model generation process, each step first determines the next action based on the current model predictions and the ϵ -greedy strategy; then it obtains the corresponding reward signal from the environment and saves these experiences through the experience replay mechanism. Subsequently, part of the saved experience data are extracted to update the model parameters, gradually improving model performance. In this way, the model learns to make accurate actions and continually increases its rewards.
Algorithm 2 Inner System DQN SF allocation algorithm
1: Input: The SF values (7 to 12); the ID of the node for which SF optimization is performed; Inner_Ite value; Learning_Episodes value; Learning_EP_Steps value.
2: Output: The optimal SF allocation value.
3: Initialization: Step_count = 0; A_i = 0; Total = 0
4: for i = 1, 2, …, Learning_Episodes do
5:     for j = 1, 2, …, Learning_EP_Steps do
6:         Based on ϵ-greedy, select action: a = InnerDQN.choose_action()
7:         Obtain the state and the reward: s, r = InnerDQN.step(a)
8:         InnerDQN.store_transition(s, a, r, s_)
9:         Train the neural network by randomly drawing samples from the experience replay buffer: InnerDQN.learn()
10:        Restore the original state: InnerDQN.render()
11:    end for
12: end for
13: while Step_count < Inner_Ite do
14:    Based on ϵ-greedy, select action: a = InnerDQN.choose_action()
15:    Obtain the state and the reward: s, r = InnerDQN.step(a)
16:    A_i = a
17:    Total = Total + r
18:    Step_count = Step_count + 1
19: end while
The trained Inner System DQN model performs SF optimization iterations within the specified Inner_Ite: no longer training the convolutional neural network for Q-values but using the trained network for action selection. When Step_count reaches the preset Inner_Ite, the training ends and the optimal SF allocation is provided. Since the reward function for the Inner System DQN model is defined as the SNR, the neural network will eventually always choose the action that yields the maximum reward after multiple attempts, resulting in the Inner System DQN algorithm returning the optimal SF as the choice from the last action.

3.3.2. External System DQN

To maximize the overall system performance, this paper proposes the External System DQN method. This approach combines with the Inner System DQN to explore the impact of individual node parameter adjustments on the entire system, optimizing the system’s packet reception rate, which was the initial goal of this paper:
$\max \; PRR$    (12)
As shown in Figure 5, the internal block diagram of the External System DQN illustrates how this method optimizes action selection through interaction with the environment to maximize the future cumulative reward. Although DQN optimization has no theoretical convergence guarantee, the algorithm is considered to have converged when the action selection becomes stable within a certain runtime, reaching the termination condition. To this end, this paper introduces the External_Ite parameter, which specifies the number of consecutive identical actions required for termination and thus provides the exit condition for stopping the DQN. Next, we define the parameters of the External System DQN, including the state space, action space, and reward function.
State space: The goal is to maximize the system’s PRR, so the state space of the entire system is represented by s = P R R . Changes in the PRR reflect the system’s performance and interference conditions in the wireless network. A sudden drop in the PRR might indicate that the SF allocation by the Inner System DQN for a single node has disrupted the entire system.
Action space: The External System DQN balances the impacts of the Inner System DQN on the system. Actions are described based on which node to modify to improve the PRR, with the action space a = { 1 M } representing the node ID to be modified. Once an action is selected, it is implemented by passing the terminal node ID to the Inner System DQN to allocate the optimal SF.
$r = \begin{cases} 5, & \Delta PRR > \mu \\ -1, & -\mu \le \Delta PRR \le \mu \\ -2, & \Delta PRR < -\mu \end{cases}$    (13)
Reward function: To encourage the External System DQN to explore extensively for the optimal system-wide SF allocation, the reward function is defined as shown in Equation (13), which assigns different rewards according to the change in the PRR, where μ is an empirical reference threshold. A positive reward guides the agent toward actions that benefit the system, while the neutral and degrading cases receive negative rewards, indicating that such exploration is undesirable. Ultimately, the method converges to repeatedly allocating the same node, indicating that the optimum has been reached.
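Assuming the reconstruction of Equation (13) above, the reward computation reduces to a small function; the threshold value used here is only an assumed example.

```python
def external_reward(delta_prr, mu=0.01):
    """Reward of Equation (13); mu = 0.01 is only an assumed example value."""
    if delta_prr > mu:
        return 5       # the re-allocation clearly improved the system PRR
    if delta_prr < -mu:
        return -2      # the re-allocation degraded the system PRR
    return -1          # negligible change: mildly discourage this action

print(external_reward(0.03), external_reward(0.0), external_reward(-0.05))
```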
This paper presents the steps of the External System DQN in Algorithm 3. For a given network of M nodes, External_Ite is set as the final decision threshold of the system, meaning the system is considered optimal if the same node is chosen consecutively for External_Ite times. Learning_Episodes represents the number of learning cycles in the system model establishment process, and Learning_EP_Steps represents the number of steps in each learning cycle. During the system model establishment, actions obtained through the External System DQN neural network are input to the Inner System DQN, and the feedback results of the state and reward are observed and stored in the experience pool. Then, part of the experience data is used to train the neural network to establish the system model. If External_Ite is not reached, the same steps are repeated to obtain actions through the neural network, iterating until convergence and outputting the optimal per-node SF allocation statistics table.
Algorithm 3 External System DQN SF allocation algorithm
1: Input: The nodes in the range 1 … M; External_Ite value; Learning_Episodes value; Learning_EP_Steps value.
2: Output: The optimal per-node SF allocation statistics table.
3: Initialization: Action_count = 0; A_old = 0; Total = 0
4: for i = 1, 2, …, Learning_Episodes do
5:     for j = 1, 2, …, Learning_EP_Steps do
6:         Based on ϵ-greedy, select action: a = ExternalDQN.choose_action()
7:         Obtain the state and the reward via the Inner System DQN: s, r = ExternalDQN.step(a)
8:         ExternalDQN.store_transition(s, a, r, s_)
9:         Train the neural network by randomly drawing samples from the experience replay buffer: ExternalDQN.learn()
10:        Restore the original state: ExternalDQN.render()
11:    end for
12: end for
13: while Action_count < External_Ite do
14:    Based on ϵ-greedy, select action: a = ExternalDQN.choose_action()
15:    Obtain the state and the reward: s, r = ExternalDQN.step(a)
16:    Total = Total + r
17:    if a == A_old then
18:        Action_count = Action_count + 1
19:    end if
20:    if a != A_old then
21:        Action_count = 0
22:    end if
23:    A_old = a
24: end while
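The interaction between the two DQNs in Algorithm 3 can be summarized by the following schematic sketch; the network, Inner DQN, and External DQN objects and their methods are hypothetical stand-ins for the components described above, and external_reward refers to the sketch given after Equation (13).

```python
def run_external(external_dqn, inner_dqn, network, external_ite):
    """Schematic of Algorithm 3: the External DQN picks a node, the Inner DQN
    re-optimises that node's SF, and the change in system PRR drives the
    External reward. The three objects and their methods are hypothetical."""
    prr_old = network.measure_prr()
    same_action_count, a_old = 0, None
    while same_action_count < external_ite:
        node_id = external_dqn.choose_action(prr_old)    # which node to re-allocate
        best_sf = inner_dqn.optimise_sf(node_id)         # Algorithm 2
        network.assign_sf(node_id, best_sf)
        prr_new = network.measure_prr()
        r = external_reward(prr_new - prr_old)           # Equation (13) sketch above
        external_dqn.store_transition(prr_old, node_id, r, prr_new)
        external_dqn.learn()
        same_action_count = same_action_count + 1 if node_id == a_old else 0
        a_old, prr_old = node_id, prr_new
    return network.current_allocation()                  # per-node SF table
```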

4. Simulation Results

4.1. Simulation of LoRa Transmission Environment

The experimental verification of the Inner External DQN SF allocation method was conducted in the Jupyter Notebook interactive programming environment [31] using the Python programming language. To obtain realistic results, the study employed the widely used LoRa simulator LoRaSim [32], a discrete event simulator built on the SimPy package in Python. It determines whether two packets x and y collide by considering parameters such as their time overlap $O(x, y)$, carrier frequency $C_{\text{freq}}(x, y)$, spreading factor $C_{\text{sf}}(x, y)$, transmission power $C_{\text{pwr}}(x, y)$, and transmission delay $C_{\text{cs}}(x, y)$ [33].
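As a rough illustration of this collision test, the simplified sketch below (not LoRaSim's actual code) treats two packets as colliding when they overlap in time on the same carrier frequency and SF and neither dominates in received power.

```python
def collided(x, y, power_threshold_db=6.0):
    """Simplified pairwise collision test (illustrative, not LoRaSim's code).

    x, y: dicts with keys 'start', 'end', 'freq', 'sf', 'rssi'.
    """
    overlap_in_time = x["start"] < y["end"] and y["start"] < x["end"]
    same_channel = x["freq"] == y["freq"] and x["sf"] == y["sf"]
    if not (overlap_in_time and same_channel):
        return False
    # capture effect: if one packet is clearly stronger, it is assumed to survive
    if abs(x["rssi"] - y["rssi"]) >= power_threshold_db:
        return False
    return True

pkt_a = {"start": 0.0, "end": 0.4, "freq": 470e6, "sf": 9, "rssi": -95.0}
pkt_b = {"start": 0.3, "end": 0.7, "freq": 470e6, "sf": 9, "rssi": -97.0}
print(collided(pkt_a, pkt_b))   # True: overlapping, same channel, similar power
```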
Considering the impact of complex environments, this study modified the path loss computation in LoRaSim according to Equations (2) and (3). Table 3 lists the specific parameters of the LoRaSim simulation model, some of which reference the configuration of the ASR6500 LoRa chip used in real-world deployments. The deployment area in this study is 1.7 × 1.7 square kilometers, with all nodes set to a transmission power of 22 dBm, a bandwidth of 500 kHz, a coding rate of 4/5, and an SF range of 7–12. To validate the proposed SF allocation method, the number of nodes was set to 100 and the number of gateways to 1, reflecting a realistic IoT application scenario. Figure 6 shows the environment model constructed on the LoRaSim simulation platform, with red indicating the gateway location and blue indicating the node locations.

4.2. Inner System DQN SF Allocation Simulation

In this section, we conduct simulation verification of the Inner System DQN SF allocation method. To achieve more precise SF allocation, we set the number of training episodes to 500, with each episode consisting of 200 steps. After each training step, the evaluation network is trained, and the target network is periodically synchronized with it. Table 4 shows the detailed parameters of the Inner DQN.
To obtain the optimal SF allocation strategy, we first verify through simulation the impact of the ϵ value on action selection. As shown in Figure 7, the action selection changes step by step as ϵ varies from 0 to 1 in increments of 0.1 under the same environment. In ϵ-greedy, ϵ controls the balance between random actions and actions based on the current optimal policy during training. When ϵ is 0, the agent selects actions entirely according to the current optimal policy (the 'greedy policy'); when ϵ is 1, the agent selects actions completely at random (the 'random policy'). Between these extremes, the agent selects random actions with probability ϵ and optimal actions with probability 1 − ϵ. The results show that as ϵ decreases, the agent explores less, and when ϵ reaches 0, the Inner System DQN stops exploring and only follows the previously learned policy. To balance exploration and exploitation, we set ϵ to 0.7, meaning the agent selects random actions with 70% probability and optimal actions with 30% probability. This setting encourages the agent to explore the environment and discover more action choices, avoiding premature convergence to local optima and gradually converging to the global optimal policy.
We used the LoRaSim simulation platform to verify the feasibility of the Inner System DQN method. During the training process, we considered environmental interference using the model to ensure that the DQN’s learning process also accounted for environmental factors. The model was constructed through iterative training, and a random node was selected for simulation. The node’s SNR variation with steps is shown in Figure 8. The results indicate that the model ultimately converges to the optimal point with an SF of 12 during exploration.
As shown in Figure 9, the histogram of the action distribution during the Inner DQN process provides statistical data. Initially, the numbers of nodes allocated to each SF are SF7: 35, SF8: 36, SF9: 16, SF10: 9, SF11: 1, and SF12: 3, with the test node's initial SF set to 7. The goal of the Inner DQN is to maximize the SNR of a single node, and its action is to allocate the optimal SF to that node. The preliminary model learning and rewards show that action selection involves an exploration process that ultimately converges to SF12, further verifying the feasibility of the Inner DQN.

4.3. External System DQN SF Allocation Simulation

To maximize the PRR of the entire system, the External DQN uses the Inner DQN as its action and adjusts different actions based on the PRR feedback in order to achieve the optimal overall PRR. Table 5 presents the detailed simulation parameter configuration of the External DQN, which includes additional parameters for the External System DQN on top of those for the Inner System DQN. Considering the runtime during the model training phase, the training iterations for the Inner DQN model are set to 100, with 100 steps per episode, while the External DQN is trained for 100 iterations, with 50 steps per episode.
Figure 10 shows the line graph of action selection using the Inner External DQN method. Since there are 100 nodes, the range of action selection is set to [1–100]. Selecting a node as the action indicates that the Inner DQN is used to optimize that node's parameters. The simulation shows that, as the steps progress, the action selection finally converges to a fixed node, indicating that the system has converged to a stable value: selecting any of the remaining actions no longer increases the reward.
Figure 11 illustrates the line graph of PRR variation when using the Inner External DQN method. Since this study considers different SF allocation methods for nodes in complex environments, it is believed that node parameter allocation should be tailored to different environments. Figure 11a provides a detailed view of node allocation changes in a random environment. In a complex environment, the initial PRR of the system is relatively high: reaching convergence after 300 one-step iterations, indicating that the External DQN method gradually approaches optimality during iterations. Figure 11b shows the overall PRR variation line graph for the External DQN in a random environment. In a complex environment, the initial PRR is not high, but the overall trend is upward progression until convergence.
Because the proposed Inner External DQN method iterates using the SF-random initial allocation strategy, it significantly improves the overall system PRR compared to the strategy based only on the SF-random allocation method.

5. Conclusions

This paper addresses the issue of SF parameter allocation for LoRa network nodes in complex environments. Initially, it introduces a complex environmental model for LoRa networks that considers the impact of environmental factors on data transmission and proposes the research objective of maximizing system performance. Based on this, the paper presents two methods: the SF-random method, which allocates spreading factors randomly, and the Inner External DQN SF allocation method. The Inner External DQN SF method builds upon SF-random by employing inner and outer DQN allocation strategies, thereby addressing the issue of local optima that arises from traditional methods that focus only on individual node optimization. Due to the Inner DQN method’s ability to achieve node optimization through environmental exploration, the proposed method demonstrates good adaptability in complex environments.
The proposed model is simulated using the LoRaSim simulation platform; we include experimental validation of important parameters such as the selection of ϵ using the ϵ -greedy method. The results show significant performance improvements compared to methods that only consider node parameter optimization or ignore the impact of complex environments. This method provides a more optimal system channel parameter allocation scheme. Depending on the current environment, it can allocate the most suitable parameters to each node, providing good physical channel resources for subsequent operations. Future research can build on the current framework by modifying environmental impact factors to adapt to different application scenarios to target more detailed application environments.

Author Contributions

Conceptualization, D.K. and S.P.; methodology, D.K.; software, D.K.; validation, D.K., X.W. and R.P.; formal analysis, D.K.; investigation, D.L.; resources, D.K.; data curation, Z.Y.; writing—original draft preparation, D.K.; writing—review and editing, D.K.; visualization, H.W.; supervision, D.K.; project administration, D.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Key Industry Innovation Chain Project of Shaanxi Province (No. 2021ZDLGY07-10 and No. 2021ZDLNY03-08), the Science and Technology Plan Project of Shaanxi Province (No. 2022GY-045), the Key Research and Development Plan of Shaanxi Province (No. 2018ZDXM-GY-041), the Scientific Research Program funded by the Shaanxi Provincial Education Department (program No. 21JC030), the Science and Technology Plan Project of Xi’an (No. 22GXFW0124 and No. 2019GXYD17.3), the National Innovation and Entrepreneurship Training Program for College Students (No. 202311664001), and the Key Research and Development Project of Shaanxi Province (program No. 2024GX-YBXM-025).

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors would like to thank Xi’an University of Posts and Telecommunications in Shaanxi Province, China, and the Shaanxi Provincial Research Program for their valuable support in providing a real site and state-of-the-art research facilities for the successful implementation of this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Raza, U.; Kulkarni, P.; Sooriyabandara, M. Low Power Wide Area Networks: An Overview. IEEE Commun. Surv. Tutorials 2017, 19, 855–873. [Google Scholar] [CrossRef]
  2. Chettri, L.; Bera, R. A Comprehensive Survey on Internet of Things (IoT) Toward 5G Wireless Systems. IEEE Internet Things J. 2020, 7, 16–32. [Google Scholar] [CrossRef]
  3. Chen, M.; Miao, Y.; Hao, Y.; Hwang, K. Narrow Band Internet of Things. IEEE Access 2017, 5, 20557. [Google Scholar] [CrossRef]
  4. Mekki, K.; Bajic, E.; Chaxel, F.; Meyer, F. A comparative study of LPWAN technologies for large-scale IoT deployment. ICT Express 2019, 5, 1–7. [Google Scholar] [CrossRef]
  5. Sebastian, J.; Sikora, A.; Schappacher, M.; Amjad, Z. Test and Measurement of LPWAN and Cellular IoT Networks in a Unified Testbed. In Proceedings of the 2019 IEEE 17th International Conference on Industrial Informatics (INDIN), Helsinki, Finland, 22–25 July 2019; Volume 1, pp. 1521–1527. [Google Scholar] [CrossRef]
  6. Georgiou, O.; Raza, U. Low Power Wide Area Network Analysis: Can LoRa Scale? IEEE Wirel. Commun. Lett. 2017, 6, 162–165. [Google Scholar] [CrossRef]
  7. Vangelista, L. Frequency Shift Chirp Modulation: The LoRa Modulation. IEEE Signal Process. Lett. 2017, 24, 1818–1821. [Google Scholar] [CrossRef]
  8. Anonymous. Semtech: AN1200.22—LoRa Modulation Basics. 2015. Available online: https://connections-qj.org/article/semtech-an120022-lora-modulation-basics (accessed on 11 July 2024).
  9. Qin, Z.; Li, J.; Gu, B. Channel-Occupation-Aware Resource Allocation in LoRa Networks: A DQN-and-Optimization-Aided Approach. In Proceedings of the 2022 IEEE Wireless Communications and Networking Conference (WCNC), Austin, TX, USA, 10–13 April 2022; pp. 1455–1460. [Google Scholar] [CrossRef]
  10. Hong, S.; Yao, F.; Zhang, F.; Ding, Y.; Yang, S.H. Reinforcement learning approach for SF allocation in LoRa network. IEEE Internet Things 2023, 10, 18259–18272. [Google Scholar] [CrossRef]
  11. Lee, H.C.; Ke, K.H. Monitoring of Large-Area IoT Sensors Using a LoRa Wireless Mesh Network System: Design and Evaluation. IEEE Trans. Instrum. Meas. 2018, 67, 2177–2187. [Google Scholar] [CrossRef]
  12. Paxson, V.; Floyd, S. Wide area traffic: The failure of Poisson modeling. IEEE/ACM Trans. Netw. 1995, 3, 226–244. [Google Scholar] [CrossRef]
  13. Petäjäjärvi, J.; Mikhaylov, K.; Roivainen, A.; Hänninen, T.; Pettissalo, M. On the coverage of LPWANs: Range evaluation and channel attenuation model for LoRa technology. In Proceedings of the 2015 14th International Conference on ITS Telecommunications (ITST), Copenhagen, Denmark, 2–4 December 2015; pp. 55–59. [Google Scholar]
  14. Shang, F.; Su, W.; Wang, Q.; Gao, H.; Fu, Q. A Location Estimation Algorithm Based on RSSI Vector Similarity Degree. Int. J. Distrib. Sens. Netw. 2014, 10, 371350. [Google Scholar] [CrossRef]
  15. Sudarsanam, A.; Kallam, R.; Dasu, A. PRR-PRR Dynamic Relocation. IEEE Comput. Archit. Lett. 2009, 8, 44–47. [Google Scholar] [CrossRef]
  16. Xiao, L.; Wan, X.; Lu, X.; Zhang, Y.; Wu, D. IoT Security Techniques Based on Machine Learning: How Do IoT Devices Use AI to Enhance Security? IEEE Signal Process. Mag. 2018, 35, 41–49. [Google Scholar] [CrossRef]
  17. Zhang, R.; Xiong, K.; Lu, Y.; Fan, P.; Ng, D.W.K.; Letaief, K.B. Energy Efficiency Maximization in RIS-Assisted SWIPT Networks With RSMA: A PPO-Based Approach. IEEE J. Sel. Areas Commun. 2023, 41, 1413–1430. [Google Scholar] [CrossRef]
  18. Stevens-Navarro, E.; Lin, Y.; Wong, V.W.S. An MDP-Based Vertical Handoff Decision Algorithm for Heterogeneous Wireless Networks. IEEE Trans. Veh. Technol. 2008, 57, 1243–1254. [Google Scholar] [CrossRef]
  19. Wang, J.; Sun, Y.; Wang, B.; Ushio, T. Mission-Aware UAV Deployment for Post-Disaster Scenarios: A Worst-Case SAC-Based Approach. IEEE Trans. Veh. Technol. 2024, 73, 2712–2727. [Google Scholar] [CrossRef]
  20. François-Lavet, V.; Henderson, P.; Islam, R.; Bellemare, M.G.; Pineau, J. An Introduction to Deep Reinforcement Learning. Found. Trends Mach. Learn. 2018, 11, 219–354. [Google Scholar] [CrossRef]
  21. Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep Reinforcement Learning: A Brief Survey. IEEE Signal Process. Mag. 2017, 34, 26–38. [Google Scholar] [CrossRef]
  22. Nguyen, T.T.; Nguyen, N.D.; Nahavandi, S. Deep Reinforcement Learning for Multiagent Systems: A Review of Challenges, Solutions, and Applications. IEEE Trans. Cybern. 2020, 50, 3826–3839. [Google Scholar] [CrossRef] [PubMed]
  23. Li, Y. Deep Reinforcement Learning: An Overview. arXiv 2018, arXiv:1701.07274. [Google Scholar]
  24. Zhang, R.; Xiong, K.; Lu, Y.; Gao, B.; Fan, P.; Letaief, K.B. Joint Coordinated Beamforming and Power Splitting Ratio Optimization in MU-MISO SWIPT-Enabled HetNets: A Multi-Agent DDQN-Based Approach. IEEE J. Sel. Areas Commun. 2022, 40, 677–693. [Google Scholar] [CrossRef]
  25. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. arXiv 2018, arXiv:1608.06993. [Google Scholar]
  26. Li, Y.; Zhang, W.; Wang, C.X.; Sun, J.; Liu, Y. Deep Reinforcement Learning for Dynamic Spectrum Sensing and Aggregation in Multi-Channel Wireless Networks. IEEE Trans. Cogn. Commun. Netw. 2020, 6, 464–475. [Google Scholar] [CrossRef]
  27. Liang, W.; Huang, W.; Long, J.; Zhang, K.; Li, K.C.; Zhang, D. Deep Reinforcement Learning for Resource Protection and Real-Time Detection in IoT Environment. IEEE Internet Things J. 2020, 7, 6392–6401. [Google Scholar] [CrossRef]
  28. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.A.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef]
  29. Semtech Corporation. SX1261/2 Data Sheet. 2019. Available online: https://cdn.sparkfun.com/assets/6/b/5/1/4/SX1262_datasheet.pdf (accessed on 11 July 2024).
  30. Bubeck, S.; Cesa-Bianchi, N. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. arXiv 2012, arXiv:1204.5721. [Google Scholar]
  31. Wang, J.; Kuo, T.y.; Li, L.; Zeller, A. Restoring Reproducibility of Jupyter Notebooks. In Proceedings of the 2020 IEEE/ACM 42nd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), Seoul, Republic of Korea, 27 June–19 July 2020; pp. 288–289. [Google Scholar]
  32. Bor, M.C.; Roedig, U.; Voigt, T.; Alonso, J.M. Do LoRa Low-Power Wide-Area Networks Scale? In Proceedings of the 19th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, Valletta, Malta, 13–17 November 2016. [Google Scholar]
  33. Farooq, M.O.; Pesch, D. Poster: Extended LoRaSim to Simulate Multiple IoT Applications in a LoRaWAN. In Proceedings of the European Conference/Workshop on Wireless Sensor Networks, Madrid, Spain, 14–16 February 2018. [Google Scholar]
Figure 1. Diagram of LoRa network link model in complex environment.
Figure 2. Interaction of agent–environment in reinforcement learning.
Figure 3. PRRs under different SF allocation probabilities.
Figure 4. Inner System DQN diagram.
Figure 5. External System DQN diagram.
Figure 6. LoRaSim simulation environment model.
Figure 7. Action selection changes under different ϵ values.
Figure 8. Line graph of SNR variation with steps for Inner DQN.
Figure 9. Histogram of Inner DQN action distribution.
Figure 10. External DQN action distribution histogram.
Figure 11. Inner External DQN PRR change line graph. (a) Detailed change chart; (b) overall trend change chart.
Table 1. Key performance indicators for different LPWAN technologies [5].

Parameter             LoRa/LoRaWAN    NB-IoT       SigFox
Range                 <14 km          <22 km       <17 km
Frequency Spectrum    unlicensed      licensed     unlicensed
Signal Bandwidth      125 kHz         180 kHz      0.1 kHz
Data Rate             <10 kbps        200 kbps     10 Bps
Open Standard         yes             partial      partial
Deployment            widely          widely       widely

Table 2. Optimal SF-random SF assignment probabilities.

SF    7           8           9           10         11         12
P     42.2667%    19.8667%    16.6667%    8.6667%    7.0667%    5.4667%
Table 3. LoRaSim simulation model parameters.

Area size D × D                       1.7 × 1.7 [km²]
Transmission power P_tx               22 [dBm]
Path loss index γ                     2.08
Reference distance d_0                1.5 [km]
Frequency band                        470 [MHz]
Bandwidth W                           500 [kHz]
Standard deviation σ                  1 [dB]
Noise figure NF                       10–20 [dB]
Initial proportion of SF              SF7: 32.00%; SF8: 24.00%; SF9: 16.00%; SF10: 12.00%; SF11: 4.00%; SF12: 4.00%
Coding rate C                         4/5
Number of LoRa nodes M                100
Number of LoRa gateways G             1
Size of the payload Bpl               10 × 8 [bit]
Transmission cycle of LoRa nodes      10,000 [ms]
Maximum protocol data units           8
Table 4. Inner System DQN detailed parameters.

ϵ                                     0.7
Replay buffer size                    2000
Batch size                            32
Learning rate                         0.1
Target network update frequency       20
Reward discount γ                     0.9
Episodes                              500
Episode steps                         200
Table 5. External System DQN detailed parameters.

Inner ϵ                                    0.7
Inner replay buffer size                   2000
Inner batch size                           32
Inner learning rate                        0.1
Inner target network update frequency      20
Inner reward discount γ                    0.9
Inner episodes                             100
Inner episode steps                        100
External ϵ                                 0.5
External replay buffer size                2000
External batch size                        10
External learning rate                     0.1
External target network update frequency   20
External episodes                          100
External episode steps                     50
Allocation steps                           1000
