1. Introduction
The Low-Power Wide-Area Network (LPWAN) is a technology that enables low-power, long-distance communication for Internet of Things (IoT) applications [1]. The number of IoT devices using LPWAN communication protocols has been increasing rapidly in recent years [2]. Among the LPWAN protocols, Long-Range (LoRa) systems attract attention because they do not require a license, follow an open standard, and can be built at a low cost. As a result, the number of LoRa devices is projected to grow to 730 million by 2023 [3]. Since the spectrum resource is limited, it may be difficult to support the communication of this rapidly growing number of LoRa devices using traditional LoRa protocols. Hence, increasing the number of LoRa devices that a LoRa system can accommodate is a critical issue. To address this issue, the transmission parameters of the LoRa devices, such as the Spreading Factor (SF), channel, transmission power, bandwidth, and distance, must be adjusted to the surrounding communication environment to maximize the spectrum efficiency. In this paper, we considered the selection of the two main transmission parameters that may affect the spectrum efficiency, i.e., the SF and the channel. The impact of the transmission power, bandwidth, and distance on the communication capacity will be considered in our future work.
In LoRa systems, the Chirp Spread Spectrum (CSS) technique, which uses a chirp signal whose frequency increases linearly with time, is adopted, making the LoRa devices more resistant to interference [4]. The value of the SF determines the bit rate, the receiver sensitivity, and the Signal-to-Noise Ratio (SNR) threshold required to demodulate the signal correctly. A smaller SF allows communication at a higher bit rate, but requires a higher SNR. Hence, with a smaller SF, communication is generally feasible only over relatively short distances and on paths with few obstacles. Conversely, a larger SF suits communication over longer distances, but at lower data rates. In addition, the orthogonality of different SFs can avoid interference among them [5,6]. Moreover, the theoretical Frame Success Rate (FSR) for Pure ALOHA, which is commonly employed in LoRa systems, also depends on the SF, the number of LoRa devices, and the duty cycle [7]. In Pure ALOHA, the LoRa devices access the channel randomly [8], without the concept of time slots, which makes the protocol easy to implement. This simple protocol enables low-power communication and is suitable for IoT applications. For instance, assuming that there are 30 LoRa devices and that each LoRa device sends a 50-byte packet every 20 s, in a setup similar to the experiments conducted in this paper, the theoretical packet transmission success probability differs among the SFs, as indicated in Table 1, along with the bit rate, receiver sensitivity, and SNR threshold [9]. As described above, the LoRa devices need to transmit packets with an SF appropriate to their distance from the GateWay (GW) to achieve higher communication performance and scalability in LoRa systems. In addition, the SF must be optimized according to the requirements of the LoRa applications in practical scenarios, such as the bit rate and the communication distance [10]. Besides the SF, another communication parameter that significantly impacts communication performance in LoRa systems is the channel [11]. The Pure ALOHA protocol does not perform carrier sensing, i.e., it does not sense the communication channel before sending packets, but communicates by random access. Furthermore, many devices, including LoRa devices, communicate using the Industrial, Scientific, and Medical (ISM) band since it is an unlicensed band. Hence, collisions and interference are prevalent in the communication links between LoRa devices and GWs in IoT networks with massive numbers of IoT devices [12]. As the data traffic and load increase within the ISM band, network performance will degrade significantly without proper channel selection, which in turn limits the scalability of the LoRa systems. In addition, since packet collisions in LoRa systems occur when two or more LoRa devices transmit packets using the same SF and channel simultaneously, these parameters must be selected jointly and appropriately [13,14].
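To make the dependence of the bit rate and of the Pure ALOHA success probability on the SF more concrete, the following Python sketch may be helpful. It is not taken from this paper: it assumes the time-on-air formula from the Semtech SX127x datasheet, a coding rate of 4/5, an 8-symbol preamble, an explicit header with CRC, and the Poisson approximation exp(−2G) for Pure ALOHA. All function names and default values are illustrative, so the printed numbers are not expected to reproduce Table 1 exactly.

```python
import math


def lora_bit_rate(sf: int, bandwidth_hz: float = 125e3, cr_index: int = 1) -> float:
    """Nominal bit rate in bit/s: SF * (BW / 2**SF) * 4 / (4 + cr_index)."""
    return sf * bandwidth_hz / (2 ** sf) * 4 / (4 + cr_index)


def time_on_air(sf: int, payload_bytes: int, bandwidth_hz: float = 125e3,
                cr_index: int = 1, preamble_symbols: int = 8,
                explicit_header: bool = True, crc: bool = True) -> float:
    """Packet time on air in seconds (datasheet formula, assumed settings)."""
    t_sym = (2 ** sf) / bandwidth_hz
    # Assumption: low-data-rate optimization is on when a symbol exceeds 16 ms.
    low_dr_opt = 1 if t_sym > 0.016 else 0
    header = 0 if explicit_header else 1
    numerator = 8 * payload_bytes - 4 * sf + 28 + 16 * int(crc) - 20 * header
    payload_symbols = 8 + max(
        math.ceil(numerator / (4 * (sf - 2 * low_dr_opt))) * (cr_index + 4), 0)
    return (preamble_symbols + 4.25 + payload_symbols) * t_sym


def pure_aloha_success(num_devices: int, toa_s: float, interval_s: float) -> float:
    """Poisson approximation exp(-2G), with offered load G = N * ToA / interval."""
    offered_load = num_devices * toa_s / interval_s
    return math.exp(-2.0 * offered_load)


if __name__ == "__main__":
    # Scenario from the text: 30 devices, 50-byte packets, one packet per 20 s.
    for sf in range(7, 13):
        toa = time_on_air(sf, payload_bytes=50)
        print(sf, round(lora_bit_rate(sf)), round(toa, 3),
              round(pure_aloha_success(30, toa, 20.0), 3))
```

Under these assumptions, a larger SF lengthens the time on air, which raises the offered load and lowers the estimated success probability when all devices share one SF and one channel.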
Research on communication parameter management in LoRa systems can be divided into centralized and distributed approaches. Most of the existing research focuses on centralized approaches in which the network server allocates communication parameters to the LoRa devices [15]. The GW is responsible for receiving the LoRa packets of the nodes and forwarding them to the network server [16]. In the centralized approaches, the network server may allocate optimal transmission parameters. However, to determine the communication parameters for the LoRa devices, the GW needs much a priori information, such as the distance between the GW and the LoRa device, the packet length, the event probability, and the number of devices, which may increase the communication latency. Furthermore, the LoRa device needs to be awake to receive the transmission parameter instructions from the GW, which may increase the energy consumption of the IoT devices compared to decentralized parameter-selection methods. Moreover, the centralized approaches also consume additional communication resources to deliver these instructions. There are also some studies on improving the performance based on the standardized protocol of the LoRa systems. There are mainly three device classes specified for the LoRa systems, i.e., Class A, Class B, and Class C. Class A uses a Pure-ALOHA-type asynchronous multiple-access scheme in which a terminal transmits a short uplink burst at an arbitrary time, while terminal reception is limited to a very short period immediately after the uplink. In Class B, all GWs and terminals use beacons transmitted by the GWs to synchronize with the network. By accurately recognizing the time at which each terminal opens its reception window, the GW can promptly send a call when there is downlink information. In Class C, high-speed downlink communication is possible because terminals can always receive signals. Even though the parameter control methods for the different LoRa classes are not the same, the standardized protocols face the same issues as the other centralized methods.
As described above, considering the future proliferation of LoRa devices and the need to provide ultra-long battery life for LoRa devices, only resource allocation schemes that significantly reduce signaling to the access network are feasible. Therefore, a decentralized approach is required in which each LoRa device autonomously selects appropriate communication parameters without the help of the GW or the network server [17]. Compared to the centralized approaches, the decentralized approach allows parameter selection without prior information and without transmitting parameter instructions to the devices [18]. Hence, both the spectrum resources used for communication and the energy consumption of the LoRa devices can be reduced.
Several decentralized communication-parameter-selection methods based on the Multi-Armed Bandit (MAB) algorithm have been proposed in previous studies to improve the scalability of the LoRa systems. However, these previous studies were limited to the selection of only the SF or only the channel. Meanwhile, few papers have considered the implementation of the methods in practice. As IoT devices have low computational power, limited storage, and limited battery capacity, it is a great challenge to develop a joint SF- and channel-selection method for practical LoRa systems. To address this issue, we proposed a MAB-based joint channel and SF-selection method in our previous work and evaluated its performance in high-density static and dynamic practical environments [19]. The experimental results demonstrated that the FSR can be improved by selecting both the channel and the SF. However, the selections of the SF and the channel may be correlated, which was not considered in our previous work. In addition, the communication performance of a LoRa device strongly depends on selecting an SF suited to its location, which was also not considered. To take the correlation between the SF and the channel into account and further improve the performance of our LoRa systems, we formulated the SF–channel-selection problem as a combinatorial MAB problem and solved it using the MAB methods in this paper. Moreover, we evaluated the FSR of our LoRa systems with varied locations of the LoRa devices to show the relationship between the SF selection and the location of the LoRa devices. We consider that the proposed method may be a potential transmission-parameter-selection solution for the LoRa systems in the future. The main contributions of this paper are as follows:
We set the SF–channel-selection problem as a combinatorial MAB-based SF–channel problem and introduced the MAB algorithms, including the Tug of War dynamics (ToW), Upper Confidence Bound 1 (UCB1), and ε-greedy algorithms, to solve the formulated problem. In the MAB-based SF–channel-selection methods, the LoRa devices select the SF and channel using only the ACK information, so the methods can be applied without modifications to the LoRa protocol. In addition, since the operation of the MAB algorithms is not complex, the methods can be easily implemented in IoT devices with memory and computational power constraints.
We evaluated the proposed method in experiments with real-world LoRa devices in an environment where the LoRa devices were distributed in multiple indoor locations. First, we evaluated the performance of the FSR and the relationship between the selection rate of the SF and the locations of the LoRa devices when only selecting the SF. The results demonstrated that the appropriate SF depended on the distance from the GW. Besides, the superiority of the MAB-based SF-selection methods was demonstrated by comparing the methods with random access. Then, we evaluated the performance of the FSR and Fairness Index (FI) when considering a joint selection of the SF and channel. Specifically, to show the effectiveness of the proposed combinatorial MAB-based SF–channel method for the FSR and FI, we compared it with the independent MAB-based SF–channel method, where the SF and channel are selected independently. Next, we focused on the performance evaluation of the proposed MAB-based SF–channel-selection method. The performance of the FSR with varying numbers of LoRa devices, transmission intervals, and the locations of the LoRa devices was evaluated exhaustively.
The remainder of this paper is organized as follows. Section 2 provides an introduction to the related work. Section 3 describes the system model and the formulated problem. Section 4 describes the combinatorial MAB-based SF–channel-selection methods. Section 5 describes the implementation and performance evaluation of the proposed combinatorial MAB-based methods. Finally, Section 6 concludes this paper.
3. System Model and Problem Formulation
This paper considered the uplink transmission of a LoRa system with a star topology consisting of one GW and L LoRa devices, indexed by l = 1, …, L. Assume that the number of available channels for the LoRa devices is I. The public ISM band of Japan was used for the communications between the LoRa devices and the GW in this paper, where the bandwidth of each channel was 125 kHz and at most 15 channels can be used for communication. We considered a realistic wireless communication environment where LoRa devices are distributed in various locations at different distances from the GW, as shown in Figure 1. In Figure 1, the concentric circles are divided according to the distance from the GW. Different colors represent the different SFs assigned to LoRa devices, which may need to be assigned according to the distance between the LoRa device and the GW. Assume that the number of SFs is S. Each LoRa device selects one SF and one channel for each packet transmission. As described in Section 1, LoRa employs CSS modulation so that signals with different SFs (7–12) can be distinguished and successfully received even if they are transmitted simultaneously on the same channel. In addition, different SFs have different transmission speeds and different SNR thresholds for successful reception. Therefore, each LoRa device must select an appropriate SF considering its distance to the GW and the interference in the surrounding environment. Theoretically, the spreading codes for different SFs are orthogonal, so collisions only occur when two or more LoRa devices choose the same SF and channel. In practice, however, perfect orthogonality may not be guaranteed, and the interference between transmissions using different SFs on the same channel must be considered [15,19]. In addition to the channel and SF, the bandwidth B and the transmit power can also be selected to improve the communication performance. The bandwidth can be chosen from 62.5 kHz, 125 kHz, 250 kHz, and 500 kHz. The transmit power can be selected from −1 dBm to 13 dBm, depending on the application requirements and the communication environment of the LoRa devices. In this paper, the bandwidth of the channel and the transmission power for all LoRa devices were set to 125 kHz and the maximum transmit power, i.e., 13 dBm, respectively. We assumed that all LoRa devices transmit packets of the same length, M bytes, each time, and that the transmission interval is the same for all LoRa devices.
The process of packet transmission in the LoRa system is summarized as follows. The transmission parameters, including the SF and channel, are first selected by the LoRa devices using the distributed MAB-based reinforcement learning methods implemented on them. After the transmission parameters are determined, carrier sensing is performed to check the availability of the selected channel. If the selected channel is available, the LoRa device sends a packet to the GW using that channel. For a short period after the packet transmission, the LoRa device waits for the feedback from the GW, i.e., ACKnowledgement (ACK) or NACK information, which is used to update the MAB-based reinforcement learning methods. If the ACK information is received, the packet from the LoRa device was successfully transmitted, either because no packet collision occurred or because of the capture effect, as shown in the middle of Figure 2. On the other hand, if the NACK information is received, the packet transmission has failed. One possible reason is that other LoRa devices transmitted packets using the same channel and SF at the same time, causing packet collisions among them, as shown on the left side of Figure 2. Other possible reasons include interference from other IoT devices, or attenuation of the signal by shadowing while a low SF is used, resulting in an SNR below the threshold required for reception, as shown on the right side of Figure 2.
The FSR was used to evaluate the performance of the MAB-based joint SF- and channel-selection methods in this paper. The FSR in the LoRa system at the t-th decision is defined as the ratio of the number of successful transmissions to the total number of transmission attempts, where both counts are accumulated over all LoRa devices l up to time t. This paper aimed to maximize the FSR by the MAB-based decentralized learning methods, thereby improving the scalability of the overall LoRa application. The FSR maximization problem is therefore to find, for every LoRa device, the SF and channel selections that maximize this ratio.
To achieve this goal, an appropriate SF must be selected based on the distance from the GW and the surrounding environment. Meanwhile, a channel less affected by other LoRa devices must be well chosen. In the LoRa system, packet collisions occur when they are transmitted on the same channel and SF at the same time. Hence, the SF and channel must be co-selected, and their relationship must be jointly considered in the selection.
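The FSR definition and the collision condition described in this section can be sketched in Python as follows. This is only an illustration under simplifying assumptions: the `Transmission` record, its field names, and both function names are hypothetical, SFs are treated as perfectly orthogonal, and the capture effect and inter-SF interference mentioned above are ignored.

```python
from typing import NamedTuple, Sequence


class Transmission(NamedTuple):
    """Hypothetical record of one uplink packet."""
    sf: int
    channel: int
    start_s: float
    duration_s: float


def collides(a: Transmission, b: Transmission) -> bool:
    """True if both packets use the same SF and channel and overlap in time."""
    same_resource = a.sf == b.sf and a.channel == b.channel
    overlap = (a.start_s < b.start_s + b.duration_s
               and b.start_s < a.start_s + a.duration_s)
    return same_resource and overlap


def frame_success_rate(successes: Sequence[int], attempts: Sequence[int]) -> float:
    """System FSR: total successful transmissions over total attempts.

    successes[l] and attempts[l] are the counts of device l up to the
    current decision; 0.0 is returned before any attempt has been made.
    """
    total_attempts = sum(attempts)
    return sum(successes) / total_attempts if total_attempts else 0.0
```

For example, two devices with 8 and 5 successes out of 10 attempts each give a system FSR of 13/20 = 0.65.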
4. Channel and SF Selection Based on MAB Algorithms
As mentioned in the previous section, LoRa devices must select appropriate parameters, such as the SF and channel, according to the communication environments. To achieve this goal, the SF–channel-selection problem was formulated as the MAB problem in this paper and solved by the MAB-based algorithms. In this section, we first introduce the relationship between the SF–channel-selection problem and the MAB problem. Next, the SF–channel-selection problem is formulated as two MAB problems with different structures, i.e., a combinatorial MAB-based and an independent MAB-based channel–SF-selection problems. Finally, we present the MAB algorithm for solving the formulated SF–channel-selection problems.
4.1. MAB and Channel–SF-Selection Problems
The MAB problem, or bandit problem, is a general decision-making problem first discussed by Robbins in [35]. In the MAB problem, the player selects one slot machine to play among several slot machines, aiming to maximize the number of coins earned by playing repeatedly [36]. To find the slot machine that pays the most, the player needs to learn the reward distribution of each slot machine through repeated play. In other words, the player has to perform exploration, i.e., gather information by also playing slot machines other than the one that currently appears to be the best. On the other hand, if the player explores more than necessary, the number of coins won cannot be maximized. Hence, once a good slot machine can be estimated, the player must play that slot machine to maximize the reward. The MAB problem is thus a decision-making problem that considers the trade-off between “exploration” for searching for a good slot machine and “exploitation” for playing a good slot machine to increase the coins in a series of trials.
In the most straightforward formulation, the bandit problem has K slot machines, each with its own reward probability distribution and hence its own mean and variance. The player aims to find the probability distribution with the largest expected value and tries to obtain as many rewards as possible in a sequence of trials. At each trial t, the player selects one slot machine and wins a reward (i.e., a number of coins) drawn from its distribution. A bandit algorithm can be described as a decision-making strategy that determines the slot machine to be selected for each trial. Reward maximization is the most-used metric for evaluating the performance of a bandit algorithm. The bandit algorithms for solving the bandit problem will be described later in this section. The reward maximization problem is to maximize the total reward collected over all trials, where T is the total number of trials.
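As a hedged illustration, the reward maximization objective described above can be written as follows. The notation is assumed here for readability and may differ from the original typesetting: k(t) denotes the slot machine selected at trial t and r_{k(t)}(t) the reward obtained from it.

```latex
\max_{k(1),\dots,k(T)} \; \mathbb{E}\!\left[\sum_{t=1}^{T} r_{k(t)}(t)\right],
\qquad k(t) \in \{1,\dots,K\}
```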
As described in Section 3, we aimed to maximize the cumulative FSR by letting each device autonomously select the appropriate channel and SF using the ACK/NACK information. The problem of learning appropriate channels and SFs using only the ACK/NACK information can be transformed into the MAB problem: an IoT device (i.e., the player in the MAB problem) has S SFs and I channels (i.e., the slot machines in the MAB problem). The objective is to maximize the cumulative FSR (i.e., the cumulative rewards in the MAB problem). The relationship between the channel–SF-selection and the MAB problems is summarized in Table 3.
4.2. MAB-Based Channel–SF-Selection Problem
When the parameters to be selected are only SFs or only channels, i.e., when there is only one parameter to be selected, the MAB problem can be applied directly, as described in the previous subsection. However, to perform autonomous decentralized joint optimization of the channel- and SF-selection problem, we need to design the structure of the MAB-based channel–SF-selection method. In this subsection, we introduce two structures of the MAB-based channel–SF-selection problem, i.e., combinatorial and independent MAB-based channel–SF-selection problems.
4.2.1. Combinatorial MAB-Based Channel–SF-Selection Problem
We first describe the combinatorial MAB-based channel–SF-selection problem. In this problem, each combination of an SF and a channel is configured as one slot machine, as shown in Figure 3. Hence, the number of slot machines is S × I. The best slot machine among these combinations is selected using the MAB algorithms by maximizing the reward (i.e., the FSR). The main design idea of this structure is that the channel and SF must be optimized with their potential relationship in mind, since packets sent using the same channel and SF simultaneously will cause collisions in the LoRa system.
The process of the combinatorial MAB-based channel–SF selection can be summarized as follows. The channel–SF combination is first selected based on the strategy of the MAB algorithm implemented on each LoRa device. Then, a packet is sent using the selected SF and channel. The reward of the selected SF–channel combination is evaluated depending on whether the packet was successfully sent. For the next packet transmission, each LoRa device dynamically selects the SF–channel combination based on the updated evaluation and repeats this process until the time limit T is reached. The details of the combinatorial MAB-based channel–SF selection are summarized in Algorithm 1.
Algorithm 1 Combinatorial MAB-based channel–SF selection.
Initialize the parameters used in each MAB algorithm.
while the time limit T has not been reached do
    Select the channel–SF combination based on the MAB algorithm.
    Send a packet using the selected SF and channel.
    if the packet is transmitted and the ACK frame is received then
        Transmission successful.
    else
        Transmission failure.
    end if
    Update the corresponding parameters according to each MAB algorithm.
    Increment the trial counter t.
    Sleep for the transmission interval.
end while
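The loop of Algorithm 1 can be sketched in Python as below. This is a minimal sketch rather than the implementation used in this paper: the channel indices, the `send_packet` callback, and the simple sample-average greedy policy are hypothetical placeholders for the radio driver and for the MAB algorithms of Section 4.3, and the sleep between transmissions is omitted.

```python
import itertools

SFS = [7, 8, 9, 10, 11, 12]                     # SFs 7-12, as in the text
CHANNELS = [0, 1, 2]                            # hypothetical channel indices
ARMS = list(itertools.product(SFS, CHANNELS))   # one arm per (SF, channel) pair


def make_greedy_policy(num_arms):
    """Placeholder policy: try each arm once, then pick the best ACK ratio."""
    pulls = [0] * num_arms
    acks = [0] * num_arms

    def select():
        for arm in range(num_arms):
            if pulls[arm] == 0:
                return arm
        return max(range(num_arms), key=lambda arm: acks[arm] / pulls[arm])

    def update(arm, ack):
        pulls[arm] += 1
        acks[arm] += int(ack)

    return select, update


def run_combinatorial(select, update, send_packet, num_trials):
    """Run the select / send / update loop and return the achieved FSR."""
    successes = 0
    for _ in range(num_trials):
        arm = select()                          # choose one (SF, channel) arm
        sf, channel = ARMS[arm]
        ack = bool(send_packet(sf, channel))    # True if the ACK was received
        update(arm, ack)                        # feed the outcome back
        successes += ack
        # A real device would sleep for the transmission interval here.
    return successes / num_trials
```

With 6 SFs and 3 channels, the policy has to keep statistics for 18 arms, which illustrates how the number of slot machines grows as S × I in the combinatorial structure.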
4.2.2. Independent MAB-Based Channel–SF-Selection Problem
In the independent MAB-based channel–SF-selection structure, the channels and SFs are selected independently, aiming to optimize the channel and SF parameters separately. Two groups of machines are prepared; one group is used for SF selection, and the other group is used for channel selection. Hence, the numbers of the two types of machines are S and I, respectively. The number of machines for the independent MAB-based channel–SF-selection structure is S + I. Compared to the combinatorial MAB-based channel–SF-selection problem, the number of machines can be reduced to a great extent, which reduces the memory requirements. In addition, the efficiency of the search for the appropriate channel or SF may be increased. The schematic diagram of this structure is shown in Figure 4.
In the independent MAB-based channel–SF-selection problem, the SF is first selected based on the MAB algorithm implemented on the LoRa device. Similarly, a channel is selected. A packet is then sent using the independently chosen SF and channel. After that, the parameters related to the MAB algorithms are updated based on whether the packet was successfully transmitted. The process is repeated until the time limit T. The details of the independent MAB-based channel–SF selection are shown in Algorithm 2. The computational complexity of the independent MAB-based channel–SF method was analyzed in our previous work [19].
Algorithm 2 Independent MAB-based channel–SF selection.
Initialize the parameters of the MAB algorithm used for the SF and channel selection.
while the time limit T has not been reached do
    Select the SF among the SF slot machines using the MAB algorithm.
    Select the channel among the channel slot machines using the MAB algorithm.
    Send a packet using the selected SF and channel.
    if the packet is transmitted and the ACK frame is received then
        Transmission successful.
    else
        Transmission failure.
    end if
    Update the corresponding parameters for SF selection according to the policy of the MAB algorithm.
    Update the corresponding parameters for channel selection according to the policy of the MAB algorithm.
    Increment the trial counter t.
    Sleep for the transmission interval.
end while
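A corresponding Python sketch of Algorithm 2 is given below, again only as an illustration. The `GreedyBandit` class is a hypothetical stand-in for any of the MAB algorithms of Section 4.3, and `send_packet` is an assumed callback that returns whether the ACK was received. The point of the sketch is that two separate learners are kept and that both are updated with the same ACK outcome.

```python
class GreedyBandit:
    """Placeholder learner: try each arm once, then pick the best ACK ratio."""

    def __init__(self, num_arms):
        self.pulls = [0] * num_arms
        self.acks = [0] * num_arms

    def select(self):
        for arm, count in enumerate(self.pulls):
            if count == 0:
                return arm
        return max(range(len(self.pulls)),
                   key=lambda arm: self.acks[arm] / self.pulls[arm])

    def update(self, arm, ack):
        self.pulls[arm] += 1
        self.acks[arm] += int(ack)


def run_independent(sfs, channels, send_packet, num_trials):
    """Select the SF and the channel with two independent learners."""
    sf_bandit = GreedyBandit(len(sfs))          # S arms
    ch_bandit = GreedyBandit(len(channels))     # I arms
    successes = 0
    for _ in range(num_trials):
        i = sf_bandit.select()
        j = ch_bandit.select()
        ack = bool(send_packet(sfs[i], channels[j]))
        sf_bandit.update(i, ack)                # both learners see the same ACK
        ch_bandit.update(j, ack)
        successes += ack
        # A real device would sleep for the transmission interval here.
    return successes / num_trials, sf_bandit, ch_bandit
```

Only S + I sets of statistics are stored, but neither learner can tell whether a failure was caused by the SF or by the channel, which is the coupling that the combinatorial structure is designed to capture.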
4.3. MAB Algorithms
In this paper, we focused on three MAB algorithms for solving the channel–SF-selection problem, namely the ε-greedy, UCB1, and ToW dynamics algorithms. In the following subsections, we discuss these three MAB algorithms in detail.
4.3.1. ε-Greedy Algorithm
The ε-greedy algorithm is widely used for solving MAB problems because of its simplicity. In each trial, the slot machine with the highest reward probability determined by experience is selected and played with a probability of 1 − ε. Otherwise, with probability ε, a slot machine is selected at random and played.
The policy is applied per selection structure j. One structure corresponds to the joint channel and SF selection in the combinatorial MAB-based channel–SF-selection problem, and two further structures correspond to the channel selection and the SF selection in the independent MAB-based channel–SF-selection problem. The number of arms of the combinatorial structure is the number of SF and channel combinations, i.e., S × I. The numbers of arms of the two independent structures are equal to the number of SFs S and the number of channels I, respectively. Accordingly, the set of slot machines is the set of channel and SF combinations for the combinatorial MAB-based channel–SF-selection problem, and the set of channels and the set of SFs for the independent MAB-based channel–SF-selection problem. For the i-th slot machine in the set of structure j, the algorithm keeps the number of times that arm has been selected up to iteration t and the number of successful transmissions among those selections, i.e., the number of times the ACK information was received.
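The ε-greedy policy described above can be sketched in Python as follows. The class and attribute names are illustrative assumptions, not the notation of this paper; the empirical reward probability of an arm is taken as its ACK count divided by its selection count, and unplayed arms are treated as having probability zero.

```python
import random


class EpsilonGreedy:
    """Minimal epsilon-greedy learner over a fixed number of arms."""

    def __init__(self, num_arms, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.pulls = [0] * num_arms     # times each arm was selected
        self.acks = [0] * num_arms      # successful transmissions per arm
        self.rng = rng or random.Random()

    def select(self):
        # Explore: with probability epsilon, pick an arm uniformly at random.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.pulls))
        # Exploit: pick the arm with the highest empirical success ratio.
        means = [a / n if n else 0.0 for a, n in zip(self.acks, self.pulls)]
        return max(range(len(means)), key=means.__getitem__)

    def update(self, arm, ack):
        self.pulls[arm] += 1
        self.acks[arm] += int(ack)
```

The same class can serve the combinatorial structure (one instance with S × I arms) or the independent structure (one instance with S arms and one with I arms).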
4.3.2. Upper Confidence Bound 1
The family of Upper Confidence Bound (UCB) algorithms was proposed by Auer et al. in [37]. The UCB1 algorithm is the simplest one in the UCB family. It selects the slot machine based on the average reward and the number of times each slot machine has been played, considering the upper bound of the confidence interval of the reward estimate. After playing each slot machine once, the algorithm selects, in the t-th trial, the slot machine that maximizes the sum of its average reward and a confidence term that grows with the logarithm of the total number of plays and shrinks with the number of times that slot machine has been played.
Auer et al. also proposed the UCB1-Tuned algorithm, which considers not only the empirical mean of each slot machine, but also its empirical variance [37]. This algorithm was reported to perform better than UCB1 in practice. In the UCB1-Tuned algorithm, the confidence term of each slot machine is scaled by an upper bound on the variance of its obtained reward, which is computed from the estimated variance.
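The two selection rules can be sketched in Python as below. The sketch follows the standard textbook forms of UCB1 and UCB1-Tuned from the bandit literature, which are assumed here to match the equations of this paper; the class and attribute names are illustrative.

```python
import math


class UCB1:
    """UCB1 learner; set tuned=True for the UCB1-Tuned index."""

    def __init__(self, num_arms, tuned=False):
        self.tuned = tuned
        self.pulls = [0] * num_arms
        self.reward_sum = [0.0] * num_arms
        self.reward_sq_sum = [0.0] * num_arms

    def select(self):
        # Play every arm once before using the confidence bounds.
        for arm, count in enumerate(self.pulls):
            if count == 0:
                return arm
        total = sum(self.pulls)
        return max(range(len(self.pulls)), key=lambda arm: self._index(arm, total))

    def _index(self, arm, total):
        n = self.pulls[arm]
        mean = self.reward_sum[arm] / n
        if not self.tuned:
            # UCB1: mean + sqrt(2 ln t / n)
            return mean + math.sqrt(2.0 * math.log(total) / n)
        # UCB1-Tuned: the bonus is scaled by min(1/4, variance bound).
        variance = max(self.reward_sq_sum[arm] / n - mean ** 2, 0.0)
        bound = variance + math.sqrt(2.0 * math.log(total) / n)
        return mean + math.sqrt(math.log(total) / n * min(0.25, bound))

    def update(self, arm, ack):
        reward = 1.0 if ack else 0.0    # ACK -> reward 1, NACK -> reward 0
        self.pulls[arm] += 1
        self.reward_sum[arm] += reward
        self.reward_sq_sum[arm] += reward ** 2
```

The constant 1/4 is the largest possible variance of a reward bounded in [0, 1], which is why it caps the variance term in UCB1-Tuned.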
4.3.3. Tug of War Dynamics
The ToW is a simple method with low computational complexity. It has been analytically validated that the ToW dynamics is efficient in maximizing stochastic rewards in dynamic environments where the reward probabilities of the arms change frequently [38,39,40,41]. The essential element of the ToW dynamics is a volume-conserving physical object. It assumes that each slot machine is assigned to one of multiple branched cylinders filled with an incompressible fluid, as shown in Figure 5. The volume is then updated by pushing and pulling the corresponding cylinders depending on whether the slot machine is rewarded in the trial at time t. In addition, since the cylinders are connected, as shown in the figure, a volume increase in one part is immediately compensated by a volume decrease in another part. In the ToW dynamics, the arm with the highest cylinder interface value is selected. The interface value is computed from the estimated compensation of each arm and an added oscillation.
There are various possibilities for adding oscillations. References [41,42] studied the impact of oscillations on the efficiency of decision-making in detail, which is beyond the scope of this paper. In this paper, the incompressible fluid oscillates autonomously.
The estimated compensation of each arm is updated recursively: the previous estimate is multiplied by a discount factor, which lies strictly between 0 and 1, and the outcome of the current trial is then added. By introducing the discount factor, we can control the impact of the past learning experience on the present to adapt to realistic communication environments where the channel conditions may change dynamically. If the transmission is successful and the ACK is received, the Q value of the selected arm (parameter) gains “+1” as a reward. By this, the height of the fluid interface for that arm is increased. Conversely, when the transmission fails and an ACK is not received, the Q value of the corresponding arm (parameter) is updated with a punishment. By this, the interface value of the selected arm is decreased. Correspondingly, the interface values of the other arms increase. The punishment is computed from the reward probabilities of the arms with the highest and second-highest reward probabilities among all arms at time t, where the reward probability is given by Equation (4). In the ToW dynamics, the number of times an arm has been selected by time t and the number of successful transmissions using that arm by time t are both discounted by a forgetting rate, which is greater than 0 and at most 1.
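A possible Python sketch of the ToW dynamics is given below. Since the equations are not reproduced in this text, the sketch follows a formulation that is common in the ToW literature and should be read as an assumption: the interface value of an arm is its Q value minus the mean Q value of the other arms plus a cosine oscillation, and the punishment is ω = γ / (2 − γ), where γ is the sum of the two highest reward probabilities. The names `alpha` (discount factor), `beta` (forgetting rate), and `amplitude`, as well as their default values, are illustrative.

```python
import math


class TugOfWar:
    """Sketch of ToW dynamics for num_arms >= 2 (assumed formulation)."""

    def __init__(self, num_arms, alpha=0.9, beta=0.9, amplitude=0.5):
        self.k = num_arms
        self.alpha = alpha              # discount factor, 0 < alpha < 1
        self.beta = beta                # forgetting rate, 0 < beta <= 1
        self.amplitude = amplitude      # oscillation amplitude (assumed)
        self.q = [0.0] * num_arms       # estimated compensation per arm
        self.n = [0.0] * num_arms       # discounted selection counts
        self.r = [0.0] * num_arms       # discounted success counts
        self.t = 0

    def _oscillation(self, arm):
        phase = 2.0 * math.pi * (self.t + arm) / self.k
        return self.amplitude * math.cos(phase)

    def select(self):
        self.t += 1
        total_q = sum(self.q)

        def interface(arm):
            others_mean = (total_q - self.q[arm]) / (self.k - 1)
            return self.q[arm] - others_mean + self._oscillation(arm)

        return max(range(self.k), key=interface)

    def update(self, arm, ack):
        # Forget old observations, then count the current trial.
        self.n = [self.beta * v for v in self.n]
        self.r = [self.beta * v for v in self.r]
        self.n[arm] += 1.0
        if ack:
            self.r[arm] += 1.0
        # Punishment from the two highest empirical reward probabilities.
        probs = sorted((r / n if n > 0 else 0.0
                        for r, n in zip(self.r, self.n)), reverse=True)
        gamma = probs[0] + probs[1]
        omega = gamma / max(2.0 - gamma, 1e-9)
        # Discount all Q values, then reward (+1) or punish (-omega) the arm.
        self.q = [self.alpha * v for v in self.q]
        self.q[arm] += 1.0 if ack else -omega
```

Because only a few multiplications and additions per arm are needed in each trial, this kind of update is consistent with the low computational complexity attributed to the ToW dynamics above.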
6. Conclusions
In this paper, we implemented and evaluated lightweight autonomous distributed reinforcement learning methods for joint channel and SF selection in a practical larger-scale LoRa system. As a result, we were able to verify the necessity of dynamically selecting both the SF and channel. Specifically, the results showed that the channel–SF selection using the MAB-based methods was effective compared to random selection, especially in situations where the LoRa devices were distributed in various locations. When the difference between the FSR of the proposed ToW dynamics and that of random selection was the largest, the achieved FSRs of the ToW dynamics and random selection were 0.86919 and 0.59761, respectively. Hence, the FSR achieved by the ToW dynamics was about 145% of that achieved by random selection, i.e., approximately 45% higher. Besides, the ToW-dynamics-based method outperformed other MAB-based methods, such as UCB1, used in the recently published results, with both the combinatorial and the independent structure. In addition, the structures of the MAB-based methods and the other communication parameters also greatly affected the FSR and FI. Specifically, the combinatorial MAB-based methods could achieve a higher FSR and FI than the independent MAB-based methods considered in our previous research. Hence, the relationship between the channel and SF is a very important factor for the communication performance of larger-scale LoRa systems. Moreover, the FSR can be improved by jointly selecting the channel and SF compared to only selecting the SF. Furthermore, by increasing the transmission interval, the FSR can be improved to a great extent. In our future work, we will consider the joint channel and SF selection in outdoor, longer-distance environments, the optimization of other communication parameters, and the energy efficiency of the MAB-based methods.