**Dynamic Reconfiguration of Cluster-Tree Wireless Sensor Networks to Handle Communication Overloads in Disaster-Related Situations**

**Miguel Lino 1,2, Erico Leão 1, André Soares 1, Carlos Montez 2, Francisco Vasques 3,\* and Ricardo Moraes 4**


Received: 24 July 2020; Accepted: 18 August 2020; Published: 20 August 2020

**Abstract:** The development of flexible and efficient communication mechanisms is of paramount importance within the context of the Internet of Things (IoT) paradigm. IoT has been used for industrial, commercial, and residential applications, and the IEEE 802.15.4/ZigBee standard is one of the most suitable protocols for this purpose. This protocol is now frequently used to implement large-scale Wireless Sensor Networks (WSNs). In industrial settings, it is becoming increasingly common to deploy cluster-tree WSNs, a complex IEEE 802.15.4/ZigBee-based peer-to-peer network topology, to monitor and control critical processes such as those related to oil and gas, mining, or specific chemicals. The remote monitoring of critical events for hazard or disaster detection in large areas is a challenging issue, since the occurrence of events in the monitored environment may severely stress the regular operation of the network. This paper proposes the *Dynamic REconfiguration mechanism of cluster-Tree WSNs* (DyRET), which is able to dynamically reconfigure large-scale IEEE 802.15.4 cluster-tree WSNs, and to assign communication resources to the overloaded branches of the tree based on the accumulated network load generated by each of the sensor nodes. A complete simulation assessment demonstrates the proposed mechanism's efficiency, and the results show that it can guarantee the required quality of service level for the dynamic reconfiguration of cluster-tree networks.

**Keywords:** disaster; hazards; cluster-tree; remote sensing; industrial wireless sensor network

#### **1. Introduction**

Industrial plants are often constructed on large industrial sites, and involve multiple mechanical or chemical processes that are sometimes deployed in risk-prone outdoor areas. The risks posed by natural hazards can be extensive, and this implies a need for uninterrupted monitoring of environmental variables and specific dangerous events that may occur.

Real-time data collection and remote monitoring of events over large areas is a challenging issue, and is conventionally aided by satellite imaging applications that facilitate the development of disaster detection applications, such as landslide hazard monitoring and forest fire or post-fire detection [1]. The recent development of numerous forms of sensors and the advances in wireless communication and micro-nano electronic devices have leveraged the use of WSNs for these types of monitoring applications. A WSN can offer several advantages, such as in situ sensing close to the monitored phenomenon, online detection of events, and faster deployment of the monitoring infrastructure [2,3].

However, to ensure the success of this type of monitoring, several technical challenges need to be overcome. Large-scale monitoring applications generally require complex network topologies to achieve adequate spatial coverage while keeping packet losses and communication delays low. WSN nodes impose an additional constraint in the form of an energy-saving mode of operation. Due to the large scale of the monitored areas, and the possible existence of obstacles in both indoor and outdoor environments, the development of adequate communication mechanisms is a major research focus for this type of problem.

In the literature, several communication protocols and technologies have characteristics that make them candidates for large-scale monitoring applications, such as LoRa [4], Sigfox [5], IEEE 802.15.4 [6], and ZigBee [7]. The first two of these are Low-Power Wide-Area Networks (LPWANs), which are suitable for long-range communication at low bit rates. In turn, the IEEE 802.15.4/ZigBee set of standards is a Low-Rate Wireless Personal Area Network (LR-WPAN), which has become the de facto communication method for WSNs.

As IEEE 802.15.4 radios are not intended for communication over long ranges, the use of adequate peer-to-peer communication mechanisms is required in order to allow for coverage of large areas. The IEEE 802.15.4 and ZigBee protocols support a hierarchical peer-to-peer topology called a cluster-tree [8], where each cluster consists of a group of sensor nodes coordinated by a particular node called the Cluster-Head (CH). In a conventional periodic monitoring operation, sensor nodes monitor the environment and send the acquired data to their CHs, which gather all data from within the cluster and send them towards a Base Station (BS). The main CH of the entire network assumes the role of the BS—a sink node that collects and processes packets sent by all sensor nodes. This type of communication, from all nodes to a central node, is called convergecast communication.

The adequate configuration of beacon scheduling and of other network parameters, such as the buffer sizes, Superframe Duration (SD), and Beacon Interval (BI), is a critical issue. Underprovisioning of network resources can cause packet losses, while overprovisioning, i.e., the presence of slack in the schedule and buffers, tends to unnecessarily increase end-to-end communication delays. Among the network parameters that need to be considered in the beacon scheduling computation are the periodicity of data acquisition at the sensor nodes and the number of levels at each branch in the cluster-tree (e.g., the number of parent and child clusters of each CH). The resources are then statically allocated to the CHs by assuming the maximum values for each packet flow in each CH. However, the network behaviour may dynamically change over time, and this introduces several challenges that are not often addressed in existing proposals.

Disaster monitoring applications are inherently event-triggered; that is, the detection of measured values above a certain threshold can lead to the modification of the operational mode of the network in some regions of the network. For example, in an Industrial WSN (IWSN) fire risk detection application, the detection of high temperature values in conjunction with low humidity can trigger an increase in the monitoring rate of the nodes located in that critical region. This modification means that the entire tree branch will need to be reconfigured to prioritise these particular packet flows; otherwise, data conveying critical information will suffer longer delays and/or will be discarded throughout the network.

This paper aims to demonstrate that a dynamic reconfiguration of the network must be performed in such cases, since a static configuration implies the reservation of network resources for all CHs based on worst-case assessments. That is, maximum periodicities are assumed for all sensor nodes, and the maximum number of packets is assumed to traverse each cluster. As a consequence, beacon scheduling may become unfeasible or, at least, the network may be overprovisioned, which has severe consequences in terms of energy consumption. The reasoning behind this work is that the network must be dynamically rescheduled whenever a change in its operating mode has to be implemented. The proposed DyRET communication mechanism addresses this requirement, and enables dynamic mode changes in cluster-tree networks by reallocating CH communication resources according to the needs of the supported applications. The use of DyRET allows for an initial configuration of the network based on the nominal load imposed by regular monitoring activities, and for the reallocation of network resources on demand whenever necessary. For example, whenever a critical event occurs in the network, such as monitored data indicating the detection of a possible disaster situation, special attention needs to be paid to that region of the network, requiring its sensor nodes to increase their duty cycles. A reconfiguration of the operating parameters is then required in order to guarantee that this critical event will not congest a whole branch of the network.

#### *1.1. Objective and Contributions of This Paper*

IWSNs must be able to deal with typical impairments in communication related to signal interference and the requirements for long lifetimes and reliable network operation [9]. These types of requirements are usually important when the monitored area is large. Although a cluster-tree is generally a suitable topology for WSNs when dealing with the monitoring of large areas, several technical issues must be carefully handled, such as setting up the scheduling of active cluster periods [10], efficient allocation of resources according to performance limitations [11], prioritising different types of data traffic, and dynamic reconfiguration of the overall network. The DyRET mechanism specifically addresses this last issue. The main contributions of this paper can be summarised as follows:


#### *1.2. Outline of This Work*

The remainder of this paper is organised as follows. In Section 2, background on IEEE 802.15.4/ZigBee and cluster-tree features is discussed. Related work is summarised in Section 3. Section 4 presents the problem statement of this proposal. Section 5 introduces DyRET, a mechanism to dynamically reconfigure cluster-tree networks according to the occurrence of critical events in specific areas of the network. Section 6 presents the simulation assessment of the proposed reconfiguration mechanism and discusses the results. Finally, some conclusions and further considerations are presented in Section 7.

#### **2. IEEE 802.15.4 and ZigBee**

The digitalisation of industry gave rise to the smart industry concept, also known as Industry 4.0. One of the factors driving this digitalisation is the consolidation of technologies related to the IoT and Industrial IoT (IIoT) paradigms [12,13], in which wireless technology plays a fundamental role, providing appropriate support for applications and offering advantages over wired technology in terms of flexibility, fast deployment, scalability, distributed processing capacity, and mobility.

Within this context, the IEEE 802.15.4 and ZigBee set of standards is regarded as the most widely used protocol stack for implementing WSNs. While IEEE 802.15.4 specifies the PHYsical layer (PHY) and Medium Access Control (MAC) sublayer for LR-WPAN applications, ZigBee specifies the upper layers (Network, Application and Security).

The IEEE 802.15.4 standard defines two types of nodes: *Full Function Devices* (FFDs) and *Reduced Function Devices* (RFDs). FFDs can perform complex tasks, such as routing, coordinating neighbour nodes, aggregating, fusing or filtering data, and physical sensing. RFDs are responsible only for sensing and transmitting physical data.

Depending on the type of application, the IEEE 802.15.4/ZigBee standards support two basic network topologies: star and peer-to-peer. Unlike star WSNs, in which all sensor nodes are directly connected to the coordinator node (a centralised communication paradigm), peer-to-peer networks can implement more complex topological formations, such as grid, mesh, and cluster-tree networks.

The cluster-tree is a special peer-to-peer network topology and is regarded as one of the most suitable topologies for deploying large-scale WSNs [8]. In this topology, sensor nodes are grouped into neighbouring clusters, which are coordinated by CHs, as illustrated in Figure 1a. CHs are responsible for creating their own clusters and for synchronising the communication with their child nodes.

**Figure 1.** The cluster-tree WSN characteristics and the types of scheduling of its different traffic.

CH nodes are interconnected by parent-child relationships, forming a hierarchical structure that allows greater scalability than star networks. Cluster-tree routing is therefore deterministic, following the tree levels (depths). In cluster-tree networks, the BS is often the coordinator of the Personal Area Network (PAN), i.e., the first and main CH of the network, which is the root node. This node is responsible for network management. Each CH synchronises its communication period with that of the PAN coordinator via beacon frame exchanges, and the PAN coordinator is responsible for organising the beacon transmission schedule for the whole network.

The cluster-tree network operates in beacon-enabled mode, where beacon frames are used to synchronise the sensor nodes and define a communication structure called the superframe, illustrated in Figure 2. Superframes are delimited by beacon frames, which are periodically transmitted by all CHs (including the PAN coordinator).

**Figure 2.** The superframe structure.

Basically, the superframe is defined by two parameters: *macBeaconOrder* (BO) and *macSuperframeOrder* (SO). These parameters define the Beacon Interval (BI) and the Superframe Duration (SD), respectively. BI corresponds to the interval at which a cluster-head must periodically transmit its beacon frames. In turn, SD defines the duration of the active communication period of each cluster. BI and SD are defined as follows:

$$\begin{array}{ll} BI = aBaseSuperframeDuration \times 2^{BO}, & 0 \le BO \le 14\\ SD = aBaseSuperframeDuration \times 2^{SO}, & 0 \le SO \le BO \le 14 \end{array} \tag{1}$$

where BO = 15 indicates that the network is operating in non-beacon-enabled mode. The *aBaseSuperframeDuration* constant corresponds to the minimum duration of a superframe, obtained when SO = 0 (by default, this parameter is equal to 960 symbols, corresponding to a duration of 15.36 ms when considering a bit rate of 250 kbps in the 2.4 GHz frequency band, with one symbol corresponding to 4 bits).
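
As a concrete illustration of Equation (1), the short Python sketch below (ours, not part of the standard or of DyRET) computes BI and SD in seconds for the default 2.4 GHz PHY values quoted above.

```python
# Minimal sketch (ours, not from the paper): BI and SD in seconds for the
# 2.4 GHz PHY, assuming aBaseSuperframeDuration = 960 symbols and a symbol
# rate of 62,500 symbols/s (250 kbps with 4 bits per symbol).

A_BASE_SUPERFRAME_DURATION = 960   # symbols (SO = 0)
SYMBOL_RATE = 62_500               # symbols per second

def beacon_interval_s(bo: int) -> float:
    """Beacon Interval in seconds for a given macBeaconOrder (0 <= BO <= 14)."""
    assert 0 <= bo <= 14
    return A_BASE_SUPERFRAME_DURATION * (2 ** bo) / SYMBOL_RATE

def superframe_duration_s(so: int) -> float:
    """Superframe Duration in seconds for a given macSuperframeOrder (0 <= SO <= 14)."""
    assert 0 <= so <= 14
    return A_BASE_SUPERFRAME_DURATION * (2 ** so) / SYMBOL_RATE

print(superframe_duration_s(0))   # 0.01536 s, the minimum superframe (15.36 ms)
print(beacon_interval_s(10))      # ~15.73 s, the BI used in Section 6 (BO = 10)
```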

The beacon interval has an active part and, optionally, an inactive part. Thus, when BO is larger than SO, there is an inactive part and sensor nodes can enter a power-saving mode. When SO is equal to BO, there is no inactive part, i.e., the devices do not have additional time to save energy.

In the active part, the superframe starts immediately after the beacon frame, defining the period within which the nodes, both coordinators and sensors, can exchange messages. The active part is subdivided into two periods: the Contention Access Period (CAP) and the Contention Free Period (CFP). During the CAP, sensor nodes compete to access the wireless channel using the *Carrier Sense Multiple Access-Collision Avoidance* (CSMA-CA) algorithm. The CFP is optional and, if requested, allows the CH to reserve Guaranteed Time Slots (GTS), so that a specific associated node has dedicated channel access and can transmit messages contention-free.

From the point of view of the communication mode, after network formation, data packets can travel *upstream* or *downstream*. Upstream traffic is the typical monitoring traffic, consisting of messages generated by sensor nodes that are forwarded by successive parent CHs up to the PAN Coordinator. Conversely, downstream traffic corresponds to the traffic generated by the PAN Coordinator and forwarded to the descendant nodes.

There is no need for a clock synchronisation protocol to synchronise the sending of periodic beacons by neighbouring CHs, since the IEEE 802.15.4 MAC sublayer is responsible for this task. However, in order to avoid inter-cluster interference and collisions of beacons and data frames, the active periods of the clusters must be organised. This is achieved by applying beacon scheduling techniques, which order the transmission times of the CHs' beacon frames. Basically, there are two types of beacon scheduling [10]: bottom-up and top-down, which respectively prioritise upstream and downstream traffic.

As outlined in Figure 1b, with bottom-up scheduling, superframes are ordered in the bottom-up direction: the deepest clusters are scheduled first, depth by depth, until the PAN coordinator is reached. Conversely, with a top-down scheduling approach (Figure 1c), clusters are ordered starting from the PAN Coordinator, depth by depth, until the deepest clusters are reached.

#### **3. Related Works**

This section summarises the most relevant research works, addressing different issues: cluster scheduling [14,15], configuration of communication structures [10,11,16–19], data-load-based congestion control [20–27], and environmental monitoring network solutions for event-driven applications [28–30].

Regarding beacon scheduling approaches, Koubaa et al. [14] summarise the problem of overlapping sensor nodes and highlight the risk of improperly configuring communication structures. The authors present different approaches to address direct and indirect collision issues: in the first approach, the coordinator nodes transmit the beacon frames of all CHs early; in the other, SDs of the same duration are adjusted for simultaneous transmission.

In [15], a semi-dynamic scheduling scheme that allows non-coordinating nodes to act as CHs is proposed. These nodes can send data to the PAN Coordinator without waiting for the next actuation period. This is preceded by an algorithm that statically defines the beacon times and time slots for CH nodes and dynamically defines these features for the sensor nodes. In addition, each time slot is assigned to a sensor node based on its standard traffic and on the availability verified by its CH according to the node ID. These techniques are performed statically.

Regarding approaches for configuring communication structures, Severino et al. [11] propose a cluster-tree scheme designed to dynamically reorder CHs and reallocate their bandwidth. The reordering scheme (*Dynamic Cluster scheduling Reordering*—DCR) comprises an algorithm that performs scheduling based on the priority, number of cycles, neighbour set and depth of the CHs. In turn, the allocation scheme (*Dynamic Bandwidth Re-allocation*—DBR) increases the bandwidth of some CHs while reducing the bandwidth of others. However, this approach does not consider the load imposed by the sensor nodes.

Kim and Kim [16] propose an energy-efficient reconfiguration algorithm that periodically selects CHs according to shorter-distance routes and lower energy cost whenever a threshold is reached. In contrast, the work presented in [17] builds a non-threshold cluster-head rotation scheme considering different energy resources (aggregation, transmission, residual and regular operation energy). As in [31], it also considers the depth and the load processed by each node. The mechanism proposed by Choudhury et al. [17] was compared to methods [18] based on the LEACH protocol [19], yielding some gains in battery consumption and number of clusters, but it does not deal with network congestion or random network formation issues.

To improve the *Time Division Cluster Scheduling* (TDCS) algorithm [21], which deals with different directions of data flows, Ahmad and Hanzálek [22] propose a new heuristic method. In [21], a method is proposed where messages between clusters are sent every period, considering a single collision domain and based on Integer Linear Programming theory for instances of small size (fewer than one hundred nodes). Ahmad and Hanzálek [22] also propose TDCS-PCC (Period Crossing Constraint), which deals with multiple collision domains, allowing messages in different streams to flow through better-defined paths based on graph heuristics, tree depths, and consecutive cluster paths. Although these techniques contribute to extending the network lifetime, event-driven large-scale applications are not addressed.

In recent years, a large-scale IWSN grouped into clusters for monitoring areas of toxic gas leakage was proposed by Mukherjee et al. [28]. The main idea is to extend the lifetime of the network by activating the smallest number of nodes. In this approach, both the initial network formation and the selection of which nodes are activated are carried out using the *Connected K-Neighbourhood* (CKN) algorithm. The event itself is not considered, but the status of the zone is notified. In [29], a clustering and routing method for monitoring IWSNs in fire-prone environments is presented. A hybrid CH selection scheme is implemented to improve network energy efficiency. The routing phase is then adaptively configured as critical events are detected in the clusters. Events are reported using flags, but the data frame format is also changed.

Following the idea of event notification using data frames, the *Priority-based Congestion Control Protocol* (PCCP), proposed in [30], aims to prioritise upstream traffic flows according to three components: (1) *Intelligent Congestion Detection* (ICD); (2) *Implicit Congestion Notification* (ICN); and (3) hop-by-hop *Priority-based Rate Adjustment* (PRA), in order to obtain weighted transfer rates among sensor nodes. While the ICD technique infers the existence of congestion by counting the number of packets sent locally, the ICN component is an efficient way of transporting congestion information by piggybacking it in the header of data packets. In addition, the PRA method allocates bandwidth based on the sensor nodes' priority, although it does not define which policy is used to assign the priorities.

The *Fairness Aware Congestion Control* (FACC) protocol [20] implements a fair bandwidth allocation model for WSNs, using a mechanism that divides the network into two categories: the aggregation nodes located near the *sink* node and the locally acting nodes near the sensor nodes. FACC acts locally by regulating the rate of sensor nodes close to the coordinator node (*origin*) and acts globally by triggering reconfiguration messages from nodes near the *sink* node. When a packet is lost, the nodes near the *sink* send a Warning Message (WM) to nodes near the origin. After receiving WMs, nodes close to the origin send a Control Message (CM) to the sensor node. As a disadvantage, the model is not compliant with IEEE 802.15.4 and incurs a significant overhead due to the high number of message exchanges.

A priority-based method is proposed in [27] to allocate network resources while maintaining fairness in the communication between devices. Although this proposal acts centrally and does not address traffic differentiation, the BS operates an auction-driven online selection scheme to define priority access considering characteristics such as cost, precision, location, and the amount of data collected.

Leão et al. [10] propose mechanisms to proportionally configure the communication structures of cluster-tree WSNs. Among them, the *proportional Superframe Duration Allocation based on the message Load* (Load-SDA) scheme defines the superframe durations and beacon intervals of clusters based on the data load generated by their child nodes. Regarding load-based congestion control, the work proposed by Lino et al. [23] combines the Load-SDA scheme with a guided network formation algorithm similar to [24], providing reduced end-to-end communication delays and homogeneous branches for convergecast traffic. Also regarding convergecast systems, Yuan et al. [25] propose an algorithm to control the monitoring load received by the base station, which can be a mobile node, aiming to provide Quality of Service (QoS) and to save battery energy. Considering control messages, Jing et al. [26] propose two methods for congestion control through local actuation: the first is based on data collection and keeps a table of coordinator nodes, while the second is a local energy-based actuation designed to schedule sleep times for the control flow, in order to overcome the limitations of control traffic in WSNs.

Within this context, there is a clear lack of efficient approaches to dynamically reconfigure IEEE 802.15.4 cluster-tree networks in the presence of critical events that change the network data load. This paper proposes a mechanism able to dynamically reconfigure large-scale cluster-tree WSNs, in order to ensure Quality of Service for both monitoring and control traffic.

#### **4. Problem Statement**

This work assumes that sensor nodes are randomly deployed in a large-scale two-dimensional environment. These nodes are grouped into clusters according to the IEEE 802.15.4/ZigBee cluster-tree topology, with the clusters being formed through a random cluster formation process. The network may suffer from occasional load disturbances (critical events) generated by sensor nodes during the monitoring process, which may require reconfiguration of the cluster-tree parameters.

A critical event may sporadically occur in the monitoring process, implying that the data rate of one (or several) message streams must be modified. After deploying the network, each sensor node starts to collect monitoring data and establishes its default data acquisition rate. From the moment a critical phenomenon occurs in the environment, the default acquisition rate may be changed, indicating that a critical event has occurred. Thus, since new message periodicities are being imposed on the network, network overloads may occur.

In real-time IoT applications, critical events need to be reported as soon as they are detected, in order to trigger suitable protection mechanisms. In a real-world environment, temperature, humidity, pressure, and light sensors are commonly coupled to devices for large-scale control and monitoring applications. Figure 3 illustrates an example of this scenario.

This scenario involves four different types of sensor nodes: *node 1* (humidity), *node 2* (temperature), *node 3* (light) and *node 4* (pressure). In Figure 3a, message streams are highlighted to illustrate the path traveled by the data from the generator node towards the PAN coordinator. Each sensor node can be characterised in terms of the node depth, superframe duration, beacon interval, and operational load. Figure 3b shows the changes in data acquisition rates for each sensor node. For example, *node 1* identifies a change in the behaviour of its monitored physical variable, thus requiring an increase in the amount of information to be sent to the sink node.

As there is an increase in the flow of messages generated by the set of sensor nodes located within the region of the critical event, the current configuration of the cluster-tree may not be able to handle this additional load on the path along the network branches, and this may give rise to typical problems such as node overload, congestion, higher delays and packet losses. It is important to consider that since all data messages are being sent to the PAN coordinator (sink node), the problem will be more serious for CHs closer to the PAN coordinator, as they will have to deal with data accumulated from their child CHs.

**Figure 3.** Critical events at sensor nodes and the structure of the generated message streams.

Therefore, we identify a need to define efficient communication mechanisms for dynamically reconfiguring cluster-tree networks based on changes in the mode of the monitoring traffic, and the importance of performing this dynamic reconfiguration without affecting the current operation of the network.

#### **5. DyRET: Dynamic Reconfiguration Mechanism of Cluster-Tree Wireless Sensor Networks**

In this paper, we propose a new communication mechanism, called DyRET, to deal with the previously described problems. The steps of the DyRET mechanism are described in the following subsections: Section 5.1 defines the main assumptions made here; Section 5.2 describes the superframe allocation procedure; Section 5.3 presents the critical event (disturbance) detection process, which is used to notify the PAN coordinator of a requested mode change; and, finally, the reconfiguration and notification processes for clusters are explained in Sections 5.4 and 5.5.

#### *5.1. Assumptions*

Considering a cluster-tree WSN and its types of traffic, this work makes the following assumptions:


Please note that although this work assumes that the cluster-tree is formed randomly, any type of cluster-tree can be dynamically reconfigured by the proposed DyRET mechanism.

#### *5.2. Data Acquisition-Based Superframe Duration Allocation Process*

Figure 4 illustrates a random network formation process and the use of a proportional superframe duration allocation procedure to initially configure the cluster-tree network.

**Figure 4.** The cluster-tree network model: (**a**) an example of random cluster-tree formation process; (**b**) the allocation of superframe durations proportional to the data load imposed by sensor nodes (Load-SDA scheme).

After the random network formation process is finished (Figure 4a), DyRET considers that the superframe durations of the clusters are scheduled in order to avoid collisions between data and beacon frames, as described in [32]. Thus, each cluster has its own superframe duration, defined by the Load-SDA scheme proposed in [10]. This approach defines proportional superframe durations considering the data load imposed on each CH by its sensor nodes, as shown in Figure 4b. Then, as soon as all sensor nodes are associated with a specific CH, each node identifies its data acquisition rate, which is defined as its standard data rate and is taken into account by the Load-SDA scheme.

According to the Load-SDA scheme [10], each sensor node with a message stream *Si* periodically generates a message that is sent to the sink node (PAN coordinator) through the tree routing. Each message stream is characterised by the data message size and its generation periodicity, imposing a network utilisation factor. The beacon interval must be large enough to accommodate all superframe durations. At the same time, the BI should be as short as possible in order to reduce end-to-end communication delays. Thus, we have:

$$\sum_{j=1}^{N_{CH}} SD_j \le BI \le P_{min} \tag{2}$$

where *SDj* is the superframe duration allocated to *CHj*, *BI* is the beacon interval, *NCH* is the total number of cluster-heads generated in the cluster-tree network, and *Pmin* corresponds to the shortest data rate period within the set of message streams generated by the sensor nodes.

In addition, Kohvakka et al. [33] model the time *TTXD* required to transmit a single data frame, as follows:

$$T_{TXD} = T_{BACK} + T_{PKT} + T_{TX\_RADIO} + T_{ACK} \tag{3}$$

where *TBACK* is the total *backoff* period and *TPKT* is the packet transmission time, given by *LPKT*/*Rad* (where *LPKT* corresponds to the data frame size and *Rad* is the radio data rate). *TTX*\_*RADIO* corresponds to the time the radio takes to switch between different operation modes, and *TACK* corresponds to the acknowledgement transmission time, given by *LACK*/*Rad* (where *LACK* is the acknowledgement frame size).

Considering Equation (3), Leão et al. [10] estimate the number *X* of messages that can be transferred within a minimum superframe duration *SDmin* as follows:

$$X = \left\lfloor \left( \frac{SD_{min}}{T_{TXD}} \right) \times p_s \right\rfloor, \tag{4}$$

where *SDmin* corresponds to the superframe duration when *SO* = 0 (i.e., equal to *aBaseSuperframeDuration*) and *ps* is the probability of a successful transmission.

In this way, we can initially define the number of *SDmin* durations required by each cluster-head *CHj*, according to the data load imposed by the sensor nodes of its branch, by applying Equation (5):

$$SD_j = \left\lceil \frac{\sum_{i \in S_{below}} \frac{BI}{P_i}}{X} \right\rceil \times SD_{min} \tag{5}$$

where the summation term ∑*i*∈*Sbelow* *BI*/*Pi* corresponds to the maximum number of messages generated during a beacon interval (*BI*) by the set *Sbelow* of child nodes of cluster-head *CHj* (including the accumulated message traffic of child coordinators), each message stream *i* having data periodicity *Pi*.
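
To make Equations (3)–(5) concrete, the following Python sketch (our illustration, not the authors' implementation; the timing values, frame sizes, and success probability are assumptions chosen only for the example) estimates the transmission time of one frame, the number *X* of messages per minimum superframe, and the resulting superframe duration for one cluster-head.

```python
import math

SD_MIN = 0.01536  # s, minimum superframe duration (SO = 0)

def t_txd(t_back, l_pkt_bits, l_ack_bits, t_tx_radio, radio_bps=250_000):
    """Equation (3): time to transmit one data frame and receive its acknowledgement."""
    t_pkt = l_pkt_bits / radio_bps   # T_PKT = L_PKT / R_ad
    t_ack = l_ack_bits / radio_bps   # T_ACK = L_ACK / R_ad
    return t_back + t_pkt + t_tx_radio + t_ack

def messages_per_sd_min(t_txd_s, p_success):
    """Equation (4): number X of messages that fit in one minimum superframe."""
    return math.floor((SD_MIN / t_txd_s) * p_success)

def sd_allocation(periods_s, bi_s, x):
    """Equation (5): superframe duration allocated to a CH whose subtree routes
    message streams with the given periods (seconds) during one beacon interval."""
    msgs_per_bi = sum(bi_s / p for p in periods_s)
    return math.ceil(msgs_per_bi / x) * SD_MIN

# Hypothetical values: 1000-bit data frames, 88-bit ACKs, 2 ms of backoff,
# 0.4 ms of radio turnaround, and a 90% success probability.
ttxd = t_txd(t_back=0.002, l_pkt_bits=1000, l_ack_bits=88, t_tx_radio=0.0004)
x = messages_per_sd_min(ttxd, p_success=0.9)                      # -> 2 messages per SD_min
print(sd_allocation(periods_s=[20, 20, 5, 5], bi_s=15.72, x=x))   # -> 0.06144 s (4 x SD_min)
```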

#### *5.3. Implicit Notification Process of Critical Events Using the Data Frame Reserved Field*

After the superframe durations of the cluster-heads have been defined and the monitoring process has started, sensor nodes are responsible for identifying and notifying the PAN coordinator of any detected critical event. Event notification is reported between sensor nodes and the PAN coordinator using reserved bits in the data frame. The approach used in this work is known as ICN [30], where notification bits are transmitted using a *piggyback* technique in the MAC frame header—MHR (Figure 5)—in order to identify the change in the data acquisition rate of a particular node and to alert the PAN coordinator about a critical event.


**Figure 5.** Detail of the data frame MHR format, modified from the IEEE 802.15.4 standard [6].

In this work, three bits of the reserved field are used to notify a critical event. The most significant bit is used to identify a specific network reconfiguration round, in order to prevent more than one reconfiguration process from being triggered for the same critical event. In turn, the two least significant bits are used by sensor nodes to represent the multiplicity of their data acquisition rates ("00", "01", "10" or "11"). Table 1 shows the different sensor node behaviours used in this work.

As described in Table 1, a sensor node can operate at its default data rate, setting its bits to "00", or it can change its acquisition rate to twice the default load (least significant bits set to "01") or to four times the default load (bits "10"). In turn, a sensor node can decrease its acquisition rate by setting its bits to "11" when a critical event has finished. As previously described, the most significant bit 'X' is used to identify whether a given data packet belongs to the current reconfiguration process or corresponds to a new modification in the data rate of a sensor node. For example, when the network is fully deployed and monitoring starts, the default load operated by each device is set to "000" (where X = '0' identifies the current operation round and the multiplicity "00" indicates the default data rate). If a set of sensor nodes identifies a new critical event and doubles its default acquisition rate, their bits must be changed to "001". The PAN coordinator is then able to identify this mode change request and trigger a reconfiguration procedure (if needed). After the network reconfiguration is complete, the sensor nodes change their bit X to '1' (so that a new critical event can be identified) and reset the multiplicity value to "00".

**Table 1.** Notification bits and the corresponding degree of change in network behaviour.

| Multiplicity Bits | Sensor Node Behaviour |
|---|---|
| "00" | Default data acquisition rate |
| "01" | Acquisition rate doubled (2× the default load) |
| "10" | Acquisition rate quadrupled (4× the default load) |
| "11" | Acquisition rate decreased (critical event finished) |

It is important to highlight that the proposed notification mechanism does not require any modification of the data frame structure, thus maintaining compliance with the IEEE 802.15.4 standard. Upon receiving data packets with a mode change request (modified multiplicity bits), the PAN coordinator is able to start a new network reconfiguration procedure (if needed), which is detailed in the following subsections.
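
Purely as an illustration (the helper names and bit packing below are ours; the standard only reserves the bits), the sketch packs the round bit X and the two multiplicity bits of Table 1 into a 3-bit value and decodes them at the PAN coordinator.

```python
# Multiplicity codes described in Section 5.3 (see Table 1)
DEFAULT_RATE   = 0b00   # node operating at its default data rate
DOUBLE_RATE    = 0b01   # acquisition rate doubled
QUADRUPLE_RATE = 0b10   # acquisition rate multiplied by four
DECREASE_RATE  = 0b11   # rate decreased after the critical event ends

def encode_notification(round_bit: int, multiplicity: int) -> int:
    """Pack the reconfiguration-round bit X (most significant) and the
    multiplicity (two least significant bits) into the 3 reserved bits of the MHR."""
    assert round_bit in (0, 1) and 0 <= multiplicity <= 0b11
    return (round_bit << 2) | multiplicity

def decode_notification(bits: int) -> tuple[int, int]:
    """Return (round_bit, multiplicity) from the 3-bit reserved field."""
    return (bits >> 2) & 0b1, bits & 0b11

# A node that detects a critical event and doubles its rate during round X = 0:
assert encode_notification(0, DOUBLE_RATE) == 0b001
# After the reconfiguration completes, the node flips X and resets the multiplicity:
assert encode_notification(1, DEFAULT_RATE) == 0b100
assert decode_notification(0b001) == (0, DOUBLE_RATE)
```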

#### *5.4. Reconfiguration Analysis and Calculation*

The PAN coordinator is responsible for performing the necessary reconfiguration calculations for the cluster-tree network, according to the multiplicity bits received from the sensor nodes. The objective is to verify the need for recalculating the main communication structures of the CHs (SDs and BIs), in order to avoid possible network overload or congestion issues. Figure 6 illustrates this situation.

**Figure 6.** Network behaviour considering the details of critical event detection and reconfiguration.

Figure 6a,b illustrate the scenario where several sensor nodes detect and report a critical event in the monitored environment. Within this context, the PAN coordinator applies the Load-SDA algorithm again, in order to recalculate the BO and SO values for each of the involved CHs, this time considering the new load imposed by the sensor nodes affected by the critical event. The PAN coordinator must then analyse the impact on the current configuration and verify whether a new set of superframe durations is required and whether it is schedulable (according to Equation (2)).

On the one hand, if the new superframe configuration does not differ from the current one (the same superframe durations allocated to all CHs), the PAN coordinator only sends (reset) control messages to the sensor nodes with changed multiplicity bits, informing them that, from this moment on, their current data rates become their default data rates (green flow shown in Figure 6c).

On the other hand, if the new superframe configuration differs from the current configuration of the CHs and meets Equation (2), the PAN coordinator sends control messages to the CHs containing the new SO and BO values. Moreover, the PAN coordinator sends (reset) control messages to the sensor nodes with changed multiplicity bits, whose current rates then become their new default data rates. Notice that changing the SD of a given CH can cause the subsequent CHs to shift in the scheduling structure (as shown in Figure 7).

**Figure 7.** Communication structures after the reconfiguration analysis.

Furthermore, if the new superframe configuration does not meet Equation (2), the new set of generated superframes is not schedulable (either because it does not fit within the BI, or because the required BI would have to be longer than the minimum period). In this case, the reconfiguration scheme proposed in this work considers that the PAN coordinator can gradually decrease the data acquisition rates of all sensor nodes in the network that are not involved in the critical event. As a consequence, the total network load is reduced until it becomes schedulable (i.e., fits inside the BI). The PAN coordinator must then send control messages containing the new SO and BO values for the CHs, in addition to the value corresponding to the rate reduction for the non-event sensor nodes.
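
The decision logic of this subsection can be sketched as follows (our simplified Python illustration; the reduction step, the data structures and the stopping criterion are assumptions, since the paper does not prescribe them): the PAN coordinator re-runs Load-SDA with the new load, checks the capacity constraint of Equation (2), and, if the result is not schedulable, gradually slows down the non-event nodes and tries again.

```python
import math

SD_MIN = 0.01536  # s, minimum superframe duration (SO = 0)

def load_sda(periods_s, bi_s, x):
    """Per-CH superframe duration (Equation (5)) for the streams routed through it."""
    return math.ceil(sum(bi_s / p for p in periods_s) / x) * SD_MIN

def reconfigure(ch_streams, bi_s, x, reduction_factor=0.8, max_rounds=10):
    """ch_streams maps each CH to the periods (s) of the event and regular streams
    routed through it. Regular (non-event) nodes are gradually slowed down until
    the allocated superframes fit inside the beacon interval."""
    scale = 1.0   # > 1 means regular nodes acquire data less often (longer periods)
    for _ in range(max_rounds):
        sds = {ch: load_sda(s["event"] + [p * scale for p in s["regular"]], bi_s, x)
               for ch, s in ch_streams.items()}
        # Capacity part of Equation (2); the BI <= Pmin bound is fixed by the BO choice.
        if sum(sds.values()) <= bi_s:
            return sds, scale
        scale /= reduction_factor   # enlarge the regular periods and try again
    raise RuntimeError("no schedulable configuration found")

# Hypothetical two-CH branch: CH "A" routes two event streams (5 s) and one
# regular stream (20 s); CH "B" routes three regular streams (20 s).
streams = {"A": {"event": [5, 5], "regular": [20]},
           "B": {"event": [],     "regular": [20, 20, 20]}}
print(reconfigure(streams, bi_s=15.72, x=2))
```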

Finally, considering the superframe reconfiguration described in this subsection, the PAN coordinator is responsible for notifying all the involved nodes. For this, DyRET uses an opportunity window mechanism in order to quickly broadcast control messages (downstream traffic). This mechanism is described in the following subsection.

#### *5.5. Opportunity Window and Dissemination of Reconfiguration Control Messages*

To promote a self-adaptive system and to dynamically reconfigure the communication structures, DyRET considers an Opportunity Window (OW) mechanism. The OW implements a hybrid scheduling model that temporarily changes the current bottom-up scheduling to a top-down scheduling, in order to prioritise the control traffic. Moreover, this mechanism also promotes fast control message dissemination through an improved configuration of the CSMA-CA parameters, as described in [34]. Figure 8 illustrates the OW mechanism for a depth-4 cluster-tree network.

**Figure 8.** Rescheduling model used to prioritise the different traffic types.

Before sending the control messages with the new configuration during the top-down scheduling, the PAN coordinator is responsible for creating a set of warning messages (*WARN\_msg*) and forwarding them to all descendant CHs during the bottom-up scheduling. This mechanism is intended to individually notify each CH of the correct opening time instant of the Opportunity Window, thus avoiding temporal inconsistencies.

Each *WARN\_msg* is composed of a tuple *<#, D, R>*, where *#* corresponds to the sequence number of the warning message, *D* is the maximum depth of the cluster-tree network, and *R* corresponds to the redundancy value, representing the number of replicas of the warning message that the PAN coordinator will send. Upon receiving at least one of the warning messages, each CH can determine the number of remaining BIs until the opening time instant of the OW through Equation (6):

$$N_{BI} = (D - d_i) + (R - \#), \tag{6}$$

where *NBI* is the number of remaining beacon intervals for creating the OW and *di* is the depth of *CHi*.
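
For instance, assuming a maximum depth D = 4 and redundancy R = 3 (the helper below is our illustration of Equation (6), not code from the paper), a CH at depth 2 that receives the second warning replica still has to wait (4 − 2) + (3 − 2) = 3 beacon intervals before opening the OW:

```python
def remaining_bis(D: int, d_i: int, R: int, seq: int) -> int:
    """Equation (6): beacon intervals left before a CH at depth d_i opens the
    Opportunity Window, after receiving warning-message replica number 'seq' of R."""
    return (D - d_i) + (R - seq)

print(remaining_bis(D=4, d_i=2, R=3, seq=2))  # -> 3 beacon intervals
```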

Figure 9 illustrates the timeline of creating the OW for a cluster-tree network with a maximum depth D of 4 and a redundancy value R of 3.

**Figure 9.** Timeline of the process of creating an Opportunity Window.

Importantly, warning messages are sent to the CHs across the network through the indirect communication mechanism provided by the IEEE 802.15.4 standard. In indirect communication, a coordinator node indicates in the pending address field of its beacon that data are pending to be transferred. Each child node inspects the beacon frame to verify whether its address is listed as pending. If so, the node requests the data from the coordinator during the CAP. In turn, the coordinator receives this request and subsequently sends the pending data during the CAP, using the CSMA-CA algorithm. After receiving the data, the child node acknowledges its reception.

Considering the correct time instant to open the OW, each CH performs the change from bottom-up to top-down scheduling according to Equation (7):

$$TDSched_{CH_i} = 2 \times BI - 2 \times offset[CH_i] - SD[CH_i] \tag{7}$$

where *TDSchedCHi* is the new offset of *CHi* in the top-down scheduling, and *offset*[*CHi*] and *SD*[*CHi*] are, respectively, the initial offset and the superframe duration of cluster-head *CHi*.

Therefore, after the opportunity window has been defined, the PAN coordinator starts the dissemination of the reconfiguration control messages throughout the network, which are forwarded to all CHs through the indirect communication mechanism. To guarantee a higher probability of accessing the wireless channel, the sending of reconfiguration control messages among the coordinator nodes is carried out by changing the default values of the *macMinBE* and *macMaxBE* variables, according to the strategy proposed in [34].

After all CHs have received the reconfiguration control messages, the bottom-up scheduling is re-established and the monitoring traffic is prioritised again, until a new critical event is identified and the entire reconfiguration process is restarted. To re-establish the bottom-up scheduling, each CH calculates its new beacon sending time *ReconfSched* based on the received reconfiguration information, through Equation (8):

$$ReconfSched_{CH_i} = offset[CH_i] + SD[CH_i] + new\_offset[CH_i] \tag{8}$$

where *new\_offset*[*CHi*] is the new offset calculated during the reconfiguration for *CHi*.
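
A small sketch (ours; the numeric values are illustrative only) of the two offset computations a CH performs when switching to the top-down scheduling (Equation (7)) and when resuming the bottom-up scheduling (Equation (8)):

```python
def top_down_offset(bi: float, offset: float, sd: float) -> float:
    """Equation (7): new offset of a CH when the scheduling switches to top-down."""
    return 2 * bi - 2 * offset - sd

def reconf_sched(offset: float, sd: float, new_offset: float) -> float:
    """Equation (8): beacon sending time at which the CH resumes the bottom-up
    scheduling with its newly assigned offset."""
    return offset + sd + new_offset

# Illustrative values only: BI = 15.72 s, a CH with a 0.49 s superframe that
# originally started 1.2 s into the beacon interval and receives a new offset of 1.5 s.
print(top_down_offset(bi=15.72, offset=1.2, sd=0.49))     # -> 28.55 s
print(reconf_sched(offset=1.2, sd=0.49, new_offset=1.5))  # -> 3.19 s
```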

Algorithm 1 describes the proposed DyRET mechanism. Please note that the PAN coordinator is responsible for performing the main steps of DyRET. Although these operations may require higher processing power and energy consumption, the PAN coordinator is commonly a special node with more memory and computational power. Furthermore, the processing time for this type of equation is negligible.

**Algorithm 1:** DyRET Algorithm.

    /* The following code is executed by all cluster-heads */
    repeat
        if ( CurrentScheduling == BottomUp ) then
            if ( CH receives a WARN_msg <#, D, R> ) then
                // CH calculates the number of BIs N_BI for creating the OW
                N_BI = (D - d_i) + (R - #);
                // CH calculates its new offset in the top-down scheduling
                TDSched_CHi = 2 * BI - 2 * offset[CH_i] - SD[CH_i];
        else   // CurrentScheduling is Top-Down
            if ( CH receives a reconfiguration control message ) then
                CH updates its SD and offset for the bottom-up scheduling;
                // CH calculates the instant to reestablish the bottom-up scheduling
                ReconfSched_CHi = offset[CH_i] + SD[CH_i] + new_offset[CH_i];
    until the cluster-tree is not operational;

    /* The following code is executed by the PAN Coordinator */
    repeat
        PAN Coordinator receives the data frames generated by sensor nodes;
        if ( CurrentScheduling == BottomUp ) then
            if ( PAN Coordinator identifies a data frame with modified multiplicity bits ) then
                if ( the SDs and BI of the CHs need to be recalculated ) then
                    repeat
                        PAN Coordinator applies the Load-SDA based on the new load imposed by the nodes;
                        PAN Coordinator recalculates the BO and SO values for the involved CHs;
                        if ( the set of SDs is schedulable ) then
                            PAN Coordinator sends WARN_msg <#, D, R> to the CHs;
                        else
                            PAN Coordinator gradually decreases the data acquisition rate of all sensor nodes;
                    until the set of SDs is schedulable;
        else   // CurrentScheduling is Top-Down
            PAN Coordinator sends reconfiguration control messages to all involved nodes during the OW;
            repeat
                if ( all CHs received their reconfiguration messages ) then
                    // The bottom-up scheduling is reestablished
                    ReconfSched_CHi = offset[CH_i] + SD[CH_i] + new_offset[CH_i];
            until all CHs have received the reconfiguration control message;
    until the cluster-tree is not operational;

#### **6. Simulation Assessment**

This section details the simulation assessment of the event-triggered dynamic reconfiguration mechanism proposed in this work. This assessment compares the behaviour of a network that uses the DyRET mechanism with that of a network that does not use dynamic reconfiguration when critical events occur. The target of this assessment is to highlight how the DyRET communication mechanism is able to handle the efficient dissemination of both monitoring upstream messages and reconfiguration downstream control flow messages, avoiding typical cluster-tree network impairments, such as high end-to-end communication delays, network congestion, and high packet loss rates.

CT-Sim [35] has been used to assess the performance of the proposed mechanisms. CT-Sim is a set of simulation models based on *Castalia* [36], which implements the main features of cluster-tree networks.

#### *6.1. Simulation Scenarios*

For this simulation assessment, we consider three different communication scenarios (Figure 10), each with three different numbers of nodes (100, 150 and 200 nodes, plus the PAN coordinator). For the sake of convenience, the following terms are used to describe the different simulation scenarios:

- **Monitoring:** only the typical periodic monitoring traffic is generated, with no critical events;
- **Event-Only:** critical events occur, but no dynamic reconfiguration mechanism is used;
- **DyRET:** critical events occur and the proposed DyRET reconfiguration mechanism is used.

**Figure 10.** Different simulation approaches assessed.

The cluster-tree formation process is based on the IEEE 802.15.4 standard. The PAN coordinator is located at a corner of the environment (195 m × 195 m) and is responsible for starting the network formation process by associating sensor nodes, forming its own cluster, and selecting a set of child nodes to be cluster-heads. Each CH (including the PAN coordinator) can associate a maximum of 6 (six) child nodes and select a maximum of 3 (three) candidate child nodes to become CHs, which can then generate their own clusters.

Regarding the monitoring traffic, after the cluster-tree formation each sensor node generates 2000 data messages at a data rate of 0.05 pkts/s (a periodicity of 1 packet every 20 s), which are forwarded across the network to the PAN coordinator (sink node). Importantly, CHs do not perform any data aggregation or fusion procedure, which implies that all monitoring traffic is routed to the sink node. The superframe durations (defined by the SO parameters) are proportionally allocated to each CH according to the data load of its descendant nodes (implemented by the Load-SDA algorithm [10]). In turn, the BI is defined according to Equation (2). For this simulation assessment, as the shortest message periodicity *Pmin* is 1 packet every 20 s, the BO parameter was set to 10 (a BI of 15.72 s).
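
The chosen BO follows directly from the upper bound of Equation (2): the sketch below (ours, for illustration only) selects the largest BO whose beacon interval still fits within the shortest message period, which for *Pmin* = 20 s yields BO = 10 (BI ≈ 15.72 s).

```python
A_BASE = 960          # symbols (aBaseSuperframeDuration)
SYMBOL_RATE = 62_500  # symbols/s (2.4 GHz PHY)

def largest_bo_for(p_min_s: float) -> int:
    """Largest macBeaconOrder such that BI <= Pmin (upper bound of Equation (2))."""
    for bo in range(14, -1, -1):
        if A_BASE * (2 ** bo) / SYMBOL_RATE <= p_min_s:
            return bo
    raise ValueError("Pmin is shorter than the minimum beacon interval")

print(largest_bo_for(20.0))  # -> 10, i.e., a BI of about 15.72 s
```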

For the *Monitoring* scenario (Figure 10a), no critical events are generated, i.e., the existing traffic is just the typical monitoring traffic generated by sensor nodes. This scenario is used as the basis to assess the impact of inserting critical events upon cluster-tree networks and the benefits of using a dynamic reconfiguration mechanism.

In the *Event-Only* and *DyRET* scenarios, on the other hand, critical events occur (Figure 10b,c). This simulation study considers the occurrence of a single critical event, where the event area is defined as a rectangular region located at the corner opposite to the PAN coordinator, comprising the sensor nodes located within a range of 80 m × 50 m (about 10% of the sensor nodes). Furthermore, sensor nodes within the critical event area have their data acquisition rates changed with multiplicity 4 (*bits* "10"), which is equivalent to modifying their periodicity from 1 packet every 20 s to 1 packet every 5 s. The critical event was scheduled to take place at 1000 s of simulation time. From then on, the event sensor nodes send their data messages with this new periodicity. As the critical event lasts until the end of the simulation, each of these sensor nodes maintains the new periodicity for sending all of its data messages (set to 2000 packets).

Table 2 summarises the main configuration parameters used in the simulations. The *macMaxFrameRetries* parameter corresponds to the maximum number of packet transmission retries, and its value was set to 3 (the default value) [6]. In this simulation assessment, we adopted the IEEE 802.15.4-compliant CC2420 radio and the unit disc model as the propagation model. Moreover, we used an advanced wireless channel model based on empirically measured data, available in the Castalia simulator [36].


**Table 2.** Simulation parameter configuration.

#### *6.2. Results and Discussion*

Considering the proposed methodology and the described simulation scenarios, the following performance metrics will be considered:


• **End-to-end communication delay:** the average time taken by a data message to travel from its source sensor node to the PAN coordinator (sink node).

• **Packet loss rate:** the average percentage of data messages discarded along the network (e.g., due to buffer overflows) before reaching the PAN coordinator.

• **Network reconfiguration time:** the number of *beacon* intervals required to send all reconfiguration control messages and, thus, to reconfigure the overall network.

Firstly, the average end-to-end communication delay and the average packet loss rate for all sensor nodes were assessed, considering all simulation scenarios and the aforementioned approaches. Then, the same network metrics were evaluated considering only the sensor nodes located in the region of the critical event, in order to compare the results obtained with and without the reconfiguration scheme. Finally, the network reconfiguration time was analysed based on the number of BIs required to send all reconfiguration control messages.

The results and discussion are presented in the following subsections.

#### 6.2.1. Discussion of Results Considering All the Sensor Nodes of the Cluster-Tree Network

To demonstrate how critical events significantly affect the behaviour of cluster-tree networks, Figure 11 illustrates the average end-to-end communication delays for all simulation scenarios, considering all the sensor nodes of the network. The Monitoring approach represents the baseline scenario, against which all comparisons are made.

**Figure 11.** Average end-to-end communication delay for all sensor nodes.

The modification of the data acquisition rate of the sensor nodes located in the region of the critical event can seriously affect the end-to-end communication delays if no efficient action is taken. As expected, it can be observed that the end-to-end delays for the Event-only approach are remarkably higher (about 4 times higher) than for the base case with just Monitoring traffic. The effectiveness of the proposed DyRET communication mechanism in handling the dynamic reconfiguration of a cluster-tree network can also be observed. Using the DyRET mechanism, the end-to-end delays are significantly reduced, even in the presence of critical events, keeping the results comparable with those of the scenario without critical events (Monitoring scenario).

#### 6.2.2. Discussion of Results Considering Sensor Nodes Involved in the Critical Event

Another relevant result is the network behaviour of the data flows generated by the sensor nodes involved in the critical event. Figure 12 illustrates the average message discard rates for the data flows generated by sensor nodes located in the region of the critical event, considering all simulation scenarios and all analysed approaches.

**Figure 12.** Average packet loss rate for sensor nodes involved in the critical event, considering all simulation scenarios.

As can be observed in Figure 12, the Event-only approach presents a much greater number of discarded messages due to the critical event when compared to the DyRET approach. As the Event-only approach does not implement any mechanism to adequately reconfigure the communication network, the increase in the data acquisition rate induces a quicker buffer occupation, which causes a larger number of discarded messages due to buffer overflows. On the other hand, as the DyRET approach reconfigures the communication network in the presence of critical events, data messages are quickly disseminated along the network, alleviating the buffer overload and avoiding further message discards.

Moreover, Figure 13 illustrates the timeline of message discards for both the Event-only and DyRET approaches. It can be observed that, until the occurrence of the critical event (1000 s), the average packet loss rates present similar values for both approaches.

**Figure 13.** Timeline of packet losses (the evaluated range goes from 0 s until the instant at which the values become constant).

After the occurrence of the critical event, the DyRET mechanism quickly recovers from a peak of packet losses during the actuation period (about 60 to 90 s). During this period, control messages are concurrently sent to the sensor nodes for the reconfiguration of the network. Once the reconfiguration process is complete, the average packet loss rate is reduced until it remains constant, at a value similar to that of the Monitoring scenario. This is obviously not the case for the Event-only approach, in which the loss rate grows linearly until it reaches its maximum peak.

In addition, Figure 14 illustrates the average end-to-end communication delays for the sensor nodes located in the region of the critical event. It shows that the DyRET approach presents significantly smaller end-to-end communication delays (close to the Monitoring approach) for all simulation scenarios, even with the increased acquisition rate of the sensor nodes in the critical event region. In turn, as the Event-only approach does not implement any online reconfiguration mechanism, the higher message rate causes a cumulative effect upon the buffers of the cluster-heads belonging to the branch of the tree up to the PAN coordinator, which generates higher end-to-end communication delays and higher packet loss rates.

**Figure 14.** Average communication delay for sensor nodes involved in the critical event region.

Moreover, Figure 15 presents the timeline of the average end-to-end communication delay for both the Event-only and DyRET approaches. It is clear that, after the occurrence of the critical event (1000 s), the average end-to-end delay increases sharply for the Event-only approach, while the proposed DyRET approach keeps the delay almost constant. These results illustrate the relevance of using efficient network reconfiguration mechanisms when the behaviour of the data flows changes during the cluster-tree operation.

**Figure 15.** Timeline of delays (the evaluated range goes from 0 s until the instant at which the values become constant).

Importantly, the end-to-end communication delays and packet loss rates in the 150-node scenario are higher than in the 200-node scenario. As the network formation procedure is random, event nodes can be located in different branches and at different depths in each simulation scenario. In the 150-node scenario, the event nodes were located in deeper branches (average depth of 8) than in the 200-node scenario (average depth of 7).

#### 6.2.3. Discussion of Results About the Network Reconfiguration Time

Finally, we have also assessed the time spent to reconfigure the cluster-tree network, from the instant of the occurrence of a critical event until the network is completely reconfigured.

Figure 16a illustrates the relationship between the required number of beacon intervals (OW size) and the average maximum depth of the cluster-tree WSN. Considering that a beacon interval is approximately 15 s, a network with an average maximum depth of 7 requires an Opportunity Window size of 4 BIs (approximately 1 min) for the overall network reconfiguration (for the 100-node and 150-node scenarios). For the 200-node scenario, around 5 beacon intervals (≈84 s) are required to send the reconfiguration messages to the entire network. These values correspond to the reconfiguration times in seconds outlined in Figure 16b.

**Figure 16.** Reconfiguration time. (**a**) The ratio of the number of BIs in the Opportunity Window under the maximum network depth; (**b**) the respective time in seconds.
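To make the relation between OW size and reconfiguration time concrete, the sketch below (not part of the original assessment) computes the IEEE 802.15.4 beacon interval duration for the 2.4 GHz PHY, BI = aBaseSuperframeDuration × 2^BO symbols, and multiplies it by the OW size expressed in BIs; the beacon order BO = 10 is an assumption that yields a BI close to the approximately 15 s quoted above.

```cpp
#include <cstdio>

// Minimal sketch (not from the paper): IEEE 802.15.4 beacon interval duration
// for the 2.4 GHz PHY, and the resulting reconfiguration time for a given
// Opportunity Window size expressed in beacon intervals (BIs).
// Assumption: BO = 10, which yields a BI of roughly 15.7 s.
int main() {
    const double symbolTime = 16e-6;            // seconds per symbol (2.4 GHz PHY)
    const int aBaseSuperframeDuration = 960;    // symbols
    const int BO = 10;                          // beacon order (assumed)

    const double beaconInterval =
        aBaseSuperframeDuration * (1 << BO) * symbolTime;   // ~15.7 s

    const int owSizes[] = {4, 5};               // OW sizes reported for the scenarios
    for (int owSizeInBIs : owSizes) {
        double reconfigTime = owSizeInBIs * beaconInterval;
        std::printf("OW = %d BIs -> reconfiguration time ~ %.1f s\n",
                    owSizeInBIs, reconfigTime);
    }
    return 0;
}
```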

It is important to notice that, despite the significant increase in the density of the communication environment, the size of the OW remains small. This illustrates that the configuration of the CSMA-CA parameters during the Opportunity Window period, combined with the hybrid scheduling actuation model, is crucial for the efficient dissemination of control messages (downstream traffic).

Furthermore, the total actuation time is the sum of the reconfiguration time and the Opportunity Window configuration time. This OW configuration time comprises the period between the PAN coordinator identifying the first data packet with modified bits and the last *WARN\_msg* being received by the sensor nodes. Table 3 presents the average total actuation time for all simulation scenarios.


**Table 3.** The average total actuation time for all scenarios.

Finally, and for the sake of clarity, Figure 17a illustrates the average occupancy rate of the superframes for all simulation scenarios. The horizontal blue bars represent the sum of the active periods of the clusters before the occurrence of the critical event, and the red bars represent the increase (in seconds) caused by the critical event, i.e., the new sum of the cluster superframes after the reconfiguration process.

**Figure 17.** Ratio between load imposed and the behaviour of structures: (**a**) the load consumed and the free space within BI; (**b**) load increased percentage caused by event-nodes.

The simulation results have shown that the DyRET mechanism is able to improve the transmission of data messages. It is worth mentioning that whenever a critical event is triggered, i.e., there is evidence of a disaster situation, all the sensor nodes located in that region must increase their sensing data rate to send relevant information to a BS. As a consequence, the convergecast traffic increases across all branches that form the path of this information to the BS. The DyRET mechanism significantly increases the quality of service of data transmission when compared with a traditional approach, making it suitable for use in real-world disaster situations.

#### **7. Conclusions**

This paper proposes a mechanism called DyRET (*Dynamic REconfiguration of cluster-Tree wireless sensor networks*) based on the IEEE 802.15.4 standard. The communication mechanism in DyRET aims to increase the quality of service for the dynamic reconfiguration of cluster-tree networks, thus reducing end-to-end communication delays, network congestion and packet loss rates.

The main idea underlying DyRET is the detection of critical events that cause changes in the data acquisition rates of sensor nodes, allowing the PAN coordinator to efficiently reconfigure the cluster-tree network without impacting the typical monitoring traffic. To achieve this, we propose a set of communication mechanisms that identify critical events and notify the PAN coordinator, reconfigure the communication structures based on those events, and quickly disseminate the reconfiguration messages to the involved nodes, without impacting the monitoring traffic and while maintaining network synchronisation.

A simulation assessment was performed to evaluate the behaviour of the proposed DyRET mechanism in comparison to approaches that do not use a dynamic reconfiguration scheme. Through the use of implicit event notification and an opportunity window mechanism, we have shown that DyRET can reduce the packet loss rate and the end-to-end communication delays, even with increases in data rates resulting from the occurrence of critical events.

The simulation results illustrate that the DyRET scheme can reduce the end-to-end communication delays by a factor of up to 20 in environments where sensor nodes modify their data rates by an average factor of four compared to the default data load. In addition, the dissemination of control messages within the opportunity window allows all network nodes to be reconfigured within four or five beacon intervals.

The occurrence of a critical event may increase the data transmission rate and, consequently, trigger the construction of a new beacon schedule by the PAN coordinator. However, there are situations where this schedule is only feasible if some cluster-tree branches can reduce their sending rates, as discussed in this paper. As future work, we therefore intend to extend DyRET with mechanisms to better balance the network load, such as data fusion or aggregation, allowing parts of the network in stable situations to reduce their rates further. Moreover, we are planning to implement the DyRET mechanism in a real-world testbed, for example, in a fire detection region with a high-temperature critical event. Finally, we aim to integrate DyRET with guided cluster-tree formation procedures to obtain more balanced cluster-tree networks.

**Author Contributions:** E.L. and F.V. proposed the DyRET mechanism; M.L. designed the simulation models; E.L., C.M., F.V. and M.L. proposed the assessment framework; M.L. and A.S. performed the simulation assessments and analysed the resulting data; F.V., R.M. and C.M. provided guidance for writing and revised the paper. All the authors contributed to the writing of this document. All authors have read and agreed to the published version of the manuscript.

**Funding:** The authors would like to acknowledge the support from FAPEPI/MCT/CNPq/CT-INFRA n<sup>o</sup> 007/2018, CNPq/Brazil (Universal Project 443711/2018-6), CAPES/Brazil (PrInt CAPES-UFSC "Automação 4.0") and FCT/Portugal (Project UIDB/50022/2020) funding agencies.

**Conflicts of Interest:** The authors declare no conflict of interest.


#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **ISA 100.11a Networked Control System Based on Link Stability**

#### **Heitor Florencio 1,\*, Adrião Dória Neto <sup>2</sup> and Daniel Martins <sup>2</sup>**


Received: 30 July 2020; Accepted: 19 August 2020; Published: 21 September 2020

**Abstract:** Wireless networked control systems (WNCSs) must ensure that control systems are stable, robust and capable of minimizing the effects of disturbances. Due to the need for a stable and secure WNCS, critical wireless network variables must be taken into account in the design. As wireless networks are composed of several links, factors that indicate the performances of these links can be used to evaluate the communication system in the WNCS. This work presents a wireless network control system composed of ISA 100.11a sensors, a network manager, a controller and a wired actuator. The system controls the liquid level in the tank of the coupled tank system. In order to assess the influence of the sensor link failure on the control loop, the controller calculates the link stability and chooses an alternative link in case of instability in the current link. Preliminary tests of WNCS performance were performed to determine the minimum stability value of the link that generates an error in the control loop. Finally, the tests of the control system based on link stability obtained excellent results. Even with disturbances in the network links, the control system error remained below the threshold.

**Keywords:** industrial wireless sensor networks; ISA 100.11a; wireless networked control systems; link stability

#### **1. Introduction**

The implementation of new communication technologies in industrial automation allows for the integration of industrial processes with greater efficiency, availability and quality. The systematic association of control and monitoring systems with communication systems generates several benefits. Wireless networked control systems have seen significant technological advances with the rise of wireless networks, advanced control, embedded computing and cloud computing.

The wireless connection of spatially distributed devices for different purposes has driven a large increase in applications using wireless sensor networks (WSNs). However, the constraints of critical process control define some limits of WSN technology. Industrial wireless sensor networks (IWSNs) are a specific field of WSNs that takes into account the reliability constraints, timing deadlines and critical nature of industrial applications.

Industrial wireless sensor networks are used in various branches of industry: area monitoring, structural monitoring, disaster prevention and control systems. Many applications in different monitoring and control systems are safety related. In such systems, a failure may put people in danger, lead to environmental damage or result in economic losses [1]. Guarantees on packet reception and reliability must also be provided for feedback control systems to operate properly, and extensive measures must be implemented to counter the uncertainties of wireless means of communication [2]. System performance evaluation parameters are required to support these extensive measures.

The development of wireless networked control systems (WNCSs) is fundamental in Industry 4.0. The development of both intelligent manufacturing equipment and intelligent control systems is a priority in the machine tools sector. Additionally, the priorities in the area of IT include the Internet of Things (IoT) and its applications, including industrial control [3]. Since cyber-physical systems represent the integration of physical systems with computing and networking capabilities, WNCSs are an important class of cyber-physical systems in Industry 4.0, in which physical processes are controlled using wireless sensors, actuators and controllers [4].

A WNCS connects sensors and actuators of a plant to a controller via a wireless network, which has several critical communication channels. These links are classified as critical because they are part of several closed-loop control systems. Thus, the system design must also take into account an evaluation parameter for these links. Link stability is an appropriate factor for analyzing links that carry information from sensors to the controller.

In this paper, we present an ISA 100.11a wireless networked control system. The control system receives two level measurement values, but the choice of the level value used depends on the stability of these two links. Link stability assesses the performance of a link from samples of the received signal strength. Permanent monitoring of the link stability guarantees the regularity of the control system. The networked control system is implemented with ISA 100.11a wireless sensors, a network manager, a gateway, a controller and a wired actuator (pump).

The remainder of this paper is organized as follows: Section 2 discusses related work on WNCSs and link stability in wireless networks. In Section 3, the overall system architecture is described. Section 4 describes the software implemented in the system controller. All system tests and results are presented in Section 5. Finally, conclusions are stated in Section 6.

#### **2. Related Works**

#### *2.1. Wireless Networked Control Systems*

Industrial networks are part of the structure of automation systems. They allow the communication of all instruments in the system, including sensors, actuators, controllers and data acquisition stations. Wireless sensor networks have been used in many monitoring applications for various physical phenomena, such as temperature, flow, level, vibration, humidity and pump analysis [5]. With the advent of the Industrial Internet of Things (IIoT), the availability of fast, secure and reliable communication networks deployed within factories and connecting all the elements of industrial control systems became a requirement [6]. However, the deployment of wireless communication in control systems creates new obstacles to overcome, such as the need to design the control parameters jointly with the network parameters.

Wireless networked control systems allow all or some of the measurement and control signals to be transmitted over wireless channels. There are approaches in which both signals, the one from the sensors and the one sent to the actuators, travel over wireless links. However, other approaches combine wireless and wired signals. The data that flow between sensor nodes and controllers are not necessarily symmetric in WNCSs [7–9].

The objectives of the networked control system are to ensure that the closed-loop system has desirable dynamic and steady-state response characteristics, and that it is able to efficiently attenuate disturbances and handle network delays and losses [7]. The main communication problems are the delay and the packet loss rate, which directly influence the reliability of the system. The delay problem may greatly reduce the performance of the control system, narrowing its stability margin.

There are several industrial wireless communication protocols that allow different configurations of parameters and structures. Likewise, there are several control loop techniques with several variations. A proper model must consider the parameters of both control and communication. Efficient integration of communication and control has been identified as a high-impact challenge for the next generation of industrial automation systems [10]. A more integrated approach is necessary in order to design systems that systematically model parameters between the communication and control systems.

Determining the optimal parameters for minimum network cost while achieving feasibility is not trivial because of the complex interdependence of the control and communication systems. WNCSs require novel design mechanisms to address the interaction between control and wireless systems for maximum overall system performance and efficiency [7]. Several research efforts have been carried out to model, evaluate and validate wireless networked control systems [11].

Park et al. [7] presented and explained many critical system variables. There are critical variables both in the control system and in the wireless communication system, which are closely linked. For instance, the control system defines the sampling period and the communication protocols determine the retransmission mechanism in case of failure. Therefore, the maximum retransmission period (communication system) must be determined together with the sampling period (control system). The critical variables in the communication aspect are the packet delay and the packet loss rate. Additionally, in the control system aspect, the variables are the sampling period, message delay and message dropout [7].

The delay time is a parameter used by several researchers to model WNCSs with different approaches [5,10,12]. Shi et al. [5] modeled the network time delay in a multi-hop network caused by the S-MAC communication protocol; the model also considers the controller model in a WNCS. Araujo et al. [10] took into account the delay in the sensors' and actuators' links and modeled a solution to compensate for it. That paper categorizes two types of delays: delays in accessing the communication channel and delays due to the transmissions and computation at the controller. In [12], the model is based on the research area of delay-constrained wireless communication. This paper analyzed a WNCS with multiple control systems sharing a common wireless channel.

Many researchers have implemented WNCSs with simulators only, which means there is a lack of experimental tests with instruments from manufacturers in the market. By way of example, Horvath et al. [4] presented a simulation framework which includes a realistic model of the physical layer with multi-channel frequency-hopping mesh networks. The simulation framework is evaluated by implementing a WNCS based on WirelessHART.

Park et al. [7] and Araujo et al. [10] presented experimental tests with wireless instruments in the level control system of coupled tanks. Both instruments used in the tests are Telos nodes [13]. Ahlén et al. [14] presented an implementation on an industrial process at the Iggesund paper mill. All control loops were implemented using wireless sensors and actuators. Additionally, the ABB AC800M controller received and sent information to the instruments by communicating with the network gateway via Profinet. The results indicate that it is feasible to use wireless control for continuous production. Ahlén et al. [14] showed that it is possible to reach the desired availability with wireless instrumentation compared with wired instrumentation.

In this study, we implemented a wireless networked control system based on ISA 100.11a instruments from the manufacturer Yokogawa Electric. In addition to the delay time, the packet loss rate in the communication links is also an essential parameter in the evaluation of the communication system. As such, this study assessed the implementation of a WNCS based on the evaluation of the links that form the control loop. The stability of these links is the factor used in the system proposed in this work.

#### *2.2. Link Stability in IWSN*

A wireless sensor network contains several instruments located in different places, which generate different communication links. Communication between two devices can use more than one link. However, each link has its own characteristics and can be affected differently by interference. One way to evaluate these links is through the link stability factor. Link stability should not be confused with the stability of the control system; the two concepts are significantly different.

Some researchers claim that link stability indicates how stable the link is and how long it can support communications between two nodes [15]. For others, link stability means that the link will be sustained for a long time and will not break regularly [16]. Overall, link stability indicates the level of variation of the link with respect to the noise level and the packet loss rate.

In [17], we presented a study of link stability in IWSNs. Link stability is defined by the variation of the received signal strength (RSS) and the packet delivery rate (PDR). An unstable link presents a high variation of signal attenuation and a low packet delivery rate, whereas a stable link has no signal attenuation variation and a high packet delivery rate [17]. Table 1 presents the papers selected in [17] and the parameters they use to define stability.


**Table 1.** Papers that define link stability and its parameters.

Many papers use the distance between nodes and the link expiration time because they are applied to mobile networks, which have a high level of mobility. Models that use the link expiration time do not consider the instability during the period in which the links are active on the network. In IWSNs, several sources of interference commonly appear even when a device remains in the same position.

In industrial wireless sensor networks, the devices are typically distributed at fixed locations without mobility, and thus the links remain active throughout the period of operation. Therefore, the model of this work does not consider the link expiration time. Additionally, the distance between nodes parameter is used indirectly in the acquisition of the received signal strength parameter.

The link stability function is based on the variation of the received signal strength. It was necessary to create a factor that indicates the degradation of the signal in order to calculate this variation. This factor is generated from previous samples of the received signal strength and the current value. The entire process of generating the link stability is detailed in Section 4.1. A link with low stability directly impacts the performance of the control system.

In this study, the link stability is used in the controller program to choose the measurement value used in the control technique. However, the evaluation of the implementation of the networked control system is the main objective of this work. The next sections detail the implemented system.

#### **3. System Architecture Design**

The system architecture consists of a controller to receive data from the sensors and send the signal to the actuator. Data are collected through the ISA 100.11a network gateway, which interfaces with all measurement elements. However, the only actuation element in the system uses a wired signal. The systems in papers [4,32] also use the wired signal from the actuator.

The system implements a tank level control loop with wireless instrumentation. The system configuration allows the controller to choose the process variable (PV) value between the LD01 sensor route and the LD02 sensor route. Thus, the stability metric of these links is the parameter that defines the choice of PV. Next, the controller runs the program and sends the wired signal to the pump.

The controller provides two communication interfaces with the ISA 100.11a gateway to collect data from the links and measurement variables of the instruments. In addition, an interface with a supervisory system is provided to enable monitoring of variables. Figure 1 shows the system architecture.

**Figure 1.** ISA 100.11a network control system architecture.

It is possible to notice in Figure 1 that there are two differential pressure sensors to measure the tank level (process variable): LD01 and LD02. These sensors measure the same differential pressure value. That means the controller can receive the same process value from both routes to control the tank level.

#### *3.1. Level Control System*

The control system is composed of two coupled tanks, a water basin, two level sensors and a pump. The liquid in the lower tank flows to the water basin, and a pump is responsible for pumping water from the basin to the upper tank. The liquid in the upper tank flows to the lower tank. There is a sensor in each tank to measure its level.

The system in this project used only the upper tank. The tank shown in Figure 1 represents the upper tank of the coupled-tank system. The two sensors, LD01 and LD02, were placed at the same measuring point. The integration of these measurements with the controller was performed through the ISA 100.11a wireless network.

#### *3.2. ISA 100.11a Network*

Nowadays, industrial wireless networks are part of the structure of automation systems. They allow the communication of all field instruments, including sensors and some actuators, with controllers and data acquisition stations. Some protocols define the rules and techniques for wireless communication between the sensors, actuators and controller, such as IEC 62734 (ISA 100.11a) [33].

The ISA 100.11a architecture contains elements that perform information transmission, information routing and network management at different levels. Each network is formed of nodes, each composed of a processing unit, a radio, memory, a data acquisition board and a battery. Currently, regardless of topology, all networks need a central element to concentrate the information from all nodes: the network manager. The manager receives and sends data packets to the nodes, and manages all links formed in the network. In end-to-end communication, at least one of the elements is the manager, either receiving a packet from a measuring element or sending a packet to an actuation element.

The network manager is not responsible for authorizing a new device to join the network. The tasks of managing security keys, such as authenticating, generating and storing them, are the responsibility of the security manager. Physically, the two managers are integral logical parts of the gateway element, as shown in Figure 2a. In this work, the network manager, security manager and gateway were implemented in a single management station: the YFGW410 (Yokogawa Electric) [34]. The first device in Figure 2b is the gateway YFGW410.

**Figure 2.** Gateway, network manager, security manager and backbone router.


The second device in Figure 2b is the backbone router YFGW510 (Yokogawa Electric) [35]. The wireless devices communicate with the gateway, and therefore with the managers, through the backbone router, which operates as an access point for the wireless network.

The field devices on the ISA 100.11a network can be routers or non-routers. Router devices can transmit both their own data and their neighbors' data to the manager, thereby increasing the redundancy and availability of the network through the alternative paths generated.

The ISA 100.11a network of the Figure 1 architecture includes only three instruments: two differential pressure sensors (LD01 and LD02) and a temperature transmitter (TT05), which operates in router mode for the LD01 instrument. Unlike the LD01 sensor, the LD02 instrument has a direct link to the gateway. Figure 3 shows the number and nomenclature of the system links.

**Figure 3.** The system links.

The links created in the system configuration are:


The LD01 sensor route is composed of link0 and link1, and the LD02 sensor route is composed only of link2. The purpose of inserting the router TT05 in this work was to allow a forced attenuation of this instrument's antenna and then analyze its influence on the level control loop. Therefore, it is mandatory that the controller be able to collect the link stability information before executing the control logic and sending the signal to the pump.

#### *3.3. Controller Board*

The controller must be able to collect the measurement values of the sensors (LD01 and LD02), calculate the link stability levels, execute the level control program, control the pump and send the monitoring data to the supervision system.

The link stability metric is generated from the RSSI data of the links. Hence, the controller will collect these data from the links through a communication interface with the gateway. The controller must send requests to the gateway following the GSAP (gateway service access point) specification. The system collects the measurement values from the sensors through a Modbus TCP communication interface with the gateway. Finally, the sending of data to the supervision system also uses a Modbus TCP interface.

Due to the required communication and processing capacity and the need for a WiFi module to implement the Modbus TCP and GSAP commands, the ESP32 microcontroller from Espressif was chosen as the system controller. Figure 4 shows the ESP32 controller.

**Figure 4.** ESP32 controller.

ESP32 is a single chip designed with ultra low power technology that incorporates microprocessing, memory, peripherals and communication modules (WiFi and Bluetooth). The main features that distinguish it from other platforms used in embedded systems are: two processing cores, a 160 MHz clock, an integrated Bluetooth module, a flash memory expandable to 32 GB, 36 GPIO pins and 18 channels of analog-to-digital converters [36].

This controller has been used in several IoT applications due to its processing power and low power consumption. It is possible to define a completely wireless solution using the ESP32 module, which integrates the IEEE 802.11 network protocol with the IoT architecture [37,38].

Two communication modules between controller and gateway were implemented: Modbus TCP communication and GSAP communication. The Modbus TCP interface is responsible for acquiring the process variable data from the LD01 and LD02 sensors. The GSAP module requests the information from the links, focusing on the RSSI values used in the link stability function. Figure 5 shows the modules implemented in the ESP32 controller.

**Figure 5.** Controller modules.

The link stability module implements the link stability analytical model and defines which PV value will be used in the control logic. Finally, the pump receives a pulse-width modulation (PWM) signal resulting from the program that controls the tank level.

#### 3.3.1. Modbus TCP Communication

The Modbus protocol is an industrial communication protocol at the application layer that follows a master–slave topology in order to perform the communication between devices. Only one device, the master, can initiate request–response messages to other devices (slaves) by sending a query to an individual slave or sending a broadcast query to all slaves. In the case of Modbus TCP/IP, the slave address is identified by an IP address [39].

It is then possible to use Modbus over serial protocols (e.g., RS-485) or TCP/IP protocols on Ethernet. In any case, the message structure is always the same. The Internet community can access Modbus at a reserved system port 502 on the TCP/IP stack. Modbus TCP/IP is mostly used in the data sharing between the field device level (e.g., PLC, CAN J1939 to the Modbus Gateway) and the SCADA system level. Modbus TCP/IP as a protocol could support communication between field devices via TCP, i.e., between sensors, actuators and PLCs [39,40].

There are two Modbus TCP communication modules in the controller: communication with the ISA 100.11a gateway and communication with the supervision system (ScadaBR).

In the Modbus TCP communication module with the gateway, the communication master is the ESP32 controller, which makes the requests, and the communication slave is the gateway. The slave's Modbus memory mapping (gateway) contains the PV values of the two sensors, as shown in Table 2.


**Table 2.** ISA 100.11a gateway Modbus memory mapping.

The controller also provides a Modbus communication with the ScadaBR supervisory application to supervise the control system [41]. Only the main variables of the control loop and the network are monitored.

Unlike the other Modbus module, the ESP32 controller is the communication slave in the interface with the ScadaBR application (the communication master). The mapping of the ESP32 Modbus memory, shown in Table 3, contains only holding-register variables. Some variables occupy more than one memory position (offset) due to their representation as float data.


**Table 3.** Controller Modbus memory mapping.

ModePID, PV1, SP, MV and PV2 are variables of the control loop and the other variables represent the behavior of the network links. All of these variables that describe the network's performance were collected from the gateway through GSAP communication.
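For illustration, the sketch below shows, under stated assumptions, how such a holding-register read could be encoded and decoded: it builds the Modbus TCP request frame for the gateway's PV registers (offset 13, two registers, as used later in Algorithm 1) and converts the two returned 16-bit registers into an IEEE 754 float. It is not the paper's implementation; the unit identifier and the register word order are assumptions.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Build a Modbus TCP "Read Holding Registers" (function 0x03) request frame:
// MBAP header (transaction id, protocol id = 0, length, unit id) + PDU.
std::vector<uint8_t> buildReadHoldingRegisters(uint16_t transactionId,
                                               uint8_t unitId,
                                               uint16_t startAddress,
                                               uint16_t quantity) {
    std::vector<uint8_t> frame(12);
    frame[0] = transactionId >> 8;  frame[1] = transactionId & 0xFF;
    frame[2] = 0x00;                frame[3] = 0x00;   // protocol id = 0 (Modbus)
    frame[4] = 0x00;                frame[5] = 0x06;   // byte count that follows
    frame[6] = unitId;
    frame[7] = 0x03;                                   // read holding registers
    frame[8] = startAddress >> 8;   frame[9] = startAddress & 0xFF;
    frame[10] = quantity >> 8;      frame[11] = quantity & 0xFF;
    return frame;
}

// Combine two 16-bit registers into an IEEE 754 float; whether the high word
// comes first depends on the gateway configuration (assumed high-word-first).
float registersToFloat(uint16_t highWord, uint16_t lowWord) {
    uint32_t raw = (static_cast<uint32_t>(highWord) << 16) | lowWord;
    float value;
    std::memcpy(&value, &raw, sizeof(value));
    return value;
}

int main() {
    // Request PV1 (offset 13, 2 registers) from unit id 1, as in Algorithm 1.
    std::vector<uint8_t> request = buildReadHoldingRegisters(1, 1, 13, 2);
    // ... send over TCP port 502, read the response, then decode the payload:
    float pv1 = registersToFloat(0x41C8, 0x0000);  // example payload -> 25.0
    (void)request; (void)pv1;
    return 0;
}
```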

#### 3.3.2. GSAP Communication

The ISA 100.11a standard describes an access point interface to gateway services: GSAP (gateway service access point). This service is generic and should be used as a common interface above the application layer of the protocol [33].

GSAP is a specification that defines the support features that allow a communication interface between the ISA 100.11a network and an external network. The standard describes how to implement messages of the GSAP specification using the 15 objects and services described in the standard. The standard does not provide complete details, but rather a reference to help understand the commands [33].

The commands are implemented using objects from the protocol application layer. Each service accesses a specific type of network manager object. However, there are several commands that can manipulate these objects. Table 4 presents some GSAP services described in the ISA 100.11a standard.

In this work, only the commands G\_Session\_request and G\_Neighbor\_Health\_Report request were implemented. The first service (Session) performs the opening of the GSAP session with the gateway. The second service (Neighbor\_Health Report) is responsible for requesting data from neighbors on a field device on the network.

Each service has specific fields in the request and confirmation messages. The programmer must understand all fields in the package to be able to communicate with a gateway. In addition, it is only possible to request a *Neighbor\_Health Report* service if the session is already open.

In order to exemplify a GSAP service request, Table 5 presents the fields of the command *Neighbor\_Health\_request* with the respective example values.


**Table 4.** GSAP services.

**Table 5.** GSAP command fields: *Neighbor\_Health request*.


The command *Neighbor\_Health\_request* returns an object of type *NeighborHealthReport*, which stores all values about the neighbors of a given instrument on the network. The object returns a vector of elements of type *NeighborHealth*: *neighborHealthList[]*. The *NeighborHealth* structure stores all the data from the link between the transmitter, identified by the Network address field in Table 5, and the neighbor (receiver), identified by the *networkAddress* of the *NeighborHealth* structure.

The GSAP driver sends the session-opening command and the *Neighbor\_Health\_request* command for each link in the system of Figure 1, thereby obtaining all data from the network links.
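As a sketch of how the controller might hold the data returned by these requests, the structure below mirrors the per-neighbor fields that the controller code later uses (RSSI and the DPDU transmit/failure counters); the field names and the packet delivery ratio definition are illustrative assumptions, not the exact GSAP object layout.

```cpp
#include <cstdint>

// Illustrative per-neighbor data holder (not the exact GSAP object layout)
// for the values extracted from a Neighbor_Health_Report confirmation:
// neighbor address, RSSI sample and the DPDU transmit counters that the
// stability function consumes (RSSIx, DPDUTxX and DPDUTxFailX in the code).
struct NeighborHealth {
    uint16_t networkAddress;   // neighbor (receiver) address
    int16_t  rssi;             // received signal strength indicator, dBm
    uint32_t dpduTransmitted;  // DPDUs sent on the link
    uint32_t dpduTxFailed;     // DPDUs that failed or were not acknowledged
};

// Packet delivery ratio derived from the DPDU counters (assumed definition).
inline double packetDeliveryRatio(const NeighborHealth& n) {
    if (n.dpduTransmitted == 0) return 1.0;   // no traffic observed yet
    return 1.0 - static_cast<double>(n.dpduTxFailed) / n.dpduTransmitted;
}
```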

#### **4. Software Implementation**

The software includes all the modules shown in Figure 5. The stability function and the PID and PWM modules are part of the main logic of the controller. The implementation of the link stability metric uses the values of the received signal strength indicator (RSSI). The stability values of the links are used in the selection of the PV. Finally, the controller executes the PID control technique.

#### *4.1. Method of Link Stability*

The method provides an evaluation of the link stability of an IWSN. As mentioned in Section 2.2, in the context of an environment susceptible to different types of interference, link stability is essential to evaluate network performance.

The lower the stability of the link, the higher the variation of the attenuation in the reception signal, and consequently, the greater the instability in the delivery of packets [17].

The method is based on a function of the variation of the received signal strength (RSS) and of the packet delivery rate (PDR) within a set of samples. This metric is generated from previous RSS samples and the current values of RSS and PDR. Equation (1) presents the link stability factor.

$$LinkStability = PDR \cdot \left(1 - 0.04^{MME_{RSSRatio}}\right) \tag{1}$$

The variable *MME<sub>RSSRatio</sub>* is the exponential moving average of the standard deviation values of the attenuation ratio (*RSSRatio*). The purpose of using the moving average is to filter out outliers, and consequently, produce a smoother factor.

The moving average is based on a set of samples of the variable (*σ<sub>RSSRatio</sub>*), whose contents change whenever a new sample arrives. The number of samples in the data window can be defined by the user when implementing the method.

The first variable needed to generate Equation (1) is the attenuation ratio of the received signal, *RSSRatio*. This variable relates the current RSS value to the previous values; the purpose is to relate the current signal strength to the signal attenuation over the last samples. Equation (2) defines *RSSRatio*.

$$RSSRatio = \frac{RSS_i}{RSS_{max}} \tag{2}$$

The variable *RSS<sub>i</sub>* represents the current value of the RSS and *RSS<sub>max</sub>* represents the maximum value of the RSS sample set. The greater the variation of *RSSRatio*, the greater the instability of the received signal strength. Thus, the value used to calculate the moving average is the standard deviation of the *RSSRatio* values.

As shown in Equation (1), the exponential function with fixed base and exponent MME is used to smooth the factor that multiplies the PDR. This smoothing avoids the generation of very low values of stability when the MME variable becomes very low. Figure 6 shows the flowchart for the entire method.

**Figure 6.** Overview of the method for link stability evaluation.

Part of this method was detailed and published by Florencio and Neto [17]. The link stability metric is able to detect instabilities in the links, which can cause an increase in the packet loss rate, and consequently, considering a networked control system, take the system to an unstable region [17].
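A minimal sketch of this computation is given below, assuming a fixed sliding window of RSS samples and a fixed exponential-moving-average weight (both left to the implementer in the text); it applies Equations (1) and (2) as printed, taking the PDR as an externally supplied input (for instance, derived from the DPDU counters collected via GSAP).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <deque>

// Sketch of the link stability computation of Equations (1) and (2).
// WINDOW and ALPHA are assumptions; the paper leaves the sample-window size
// to the implementer. RSS samples are assumed to be nonzero dBm values.
class LinkStabilityEstimator {
public:
    // Feed one RSS sample and the current PDR; returns the LinkStability value.
    double update(double rss, double pdr) {
        rssWindow_.push_back(rss);
        if (rssWindow_.size() > WINDOW) rssWindow_.pop_front();

        // Equation (2): RSSRatio_i = RSS_i / RSS_max over the current sample set.
        double rssMax = *std::max_element(rssWindow_.begin(), rssWindow_.end());
        std::deque<double> ratios;
        for (double s : rssWindow_) ratios.push_back(s / rssMax);

        // Standard deviation of the RSSRatio values in the window.
        double mean = 0.0;
        for (double r : ratios) mean += r;
        mean /= ratios.size();
        double var = 0.0;
        for (double r : ratios) var += (r - mean) * (r - mean);
        double sigma = std::sqrt(var / ratios.size());

        // MME: exponential moving average of the standard deviations (outlier filter).
        mme_ = ALPHA * sigma + (1.0 - ALPHA) * mme_;

        // Equation (1): LinkStability = PDR * (1 - 0.04^MME).
        return pdr * (1.0 - std::pow(0.04, mme_));
    }

private:
    static constexpr std::size_t WINDOW = 10;  // assumed sample-window size
    static constexpr double ALPHA = 0.2;       // assumed EMA smoothing factor
    std::deque<double> rssWindow_;
    double mme_ = 0.0;
};
```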

#### *4.2. Control Program*

The control technique is a proportional–integral (PI) control, which operates with a sampling period of 100 ms and gains of KP = 1.6 and KI = 0.15. As the controller modeling and tuning processes are not part of this work, standard values from other works developed with the same tank system were used.
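As a hedged illustration of this step only, the sketch below implements a discrete PI controller with the quoted gains and sampling period; the positional form, the absence of anti-windup and the clamping of the output to the 0–255 PWM range are assumptions, not details taken from the paper.

```cpp
// Minimal discrete PI sketch: KP = 1.6, KI = 0.15, Ts = 100 ms (0.1 s).
class PIController {
public:
    PIController(double kp, double ki, double ts) : kp_(kp), ki_(ki), ts_(ts) {}

    // setpoint and pv in the same engineering unit (tank level); returns MV in [0, 255].
    double compute(double setpoint, double pv) {
        double error = setpoint - pv;
        integral_ += error * ts_;                 // simple rectangular integration
        double mv = kp_ * error + ki_ * integral_;
        if (mv > 255.0) mv = 255.0;               // clamp to the assumed PWM range
        if (mv < 0.0)   mv = 0.0;
        return mv;
    }

private:
    double kp_, ki_, ts_;
    double integral_ = 0.0;
};

// Usage sketch: PIController pidTank(1.6, 0.15, 0.1); double mv = pidTank.compute(SP, PV);
```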

Link0, shown in Figure 3, is the primary link to acquire the level value (PV1). However, if the stability level of link0 is equal to or below a threshold, the controller must select the PV value of the alternative route: link2 (PV2).

The control program must perform the steps listed below.

1. Read the value of PV1 (module: Modbus TCP communication with Gateway);
2. Read the value of PV2 (module: Modbus TCP communication with Gateway);
3. Read the setpoint (SP) value (module: Modbus TCP communication with ScadaBR);
4. Request the link data from the gateway (module: GSAP communication);
5. Calculate the stability of link0 and link2 (module: link stability);
6. Select the PV value (PV1 or PV2) according to the stability threshold;
7. Compute the PI control output MV (module: PID);
8. Send the PWM signal to the pump (module: PWM);
9. Update the data of the supervisory system (module: Modbus TCP communication with ScadaBR).

The pseudocode Algorithm 1 presents an overview of the lines of code implemented in the ESP32 controller for level control based on the link stability metric.

#### **Algorithm 1** Algorithm of the controller program.

1: PV1 = modbusGW\_request(1, 13, 2);
2: PV1 = (double) PV1;
3: PV2 = modbusGW\_request(1, 34, 2);
4: PV2 = (double) PV2;
5: SP = (int) modbus\_scada.Hreg(HREG\_SP);
6: gsap\_requestLinks();
7: stability0 = func\_stability0();
8: stability2 = func\_stability2();
9: **if** stability0 > thresholdStab **then**
10: PV = PV1;
11: **else**
12: PV = PV2;
13: **end if**
14: MV = pidTank.Compute(SP, PV);
15: PWMvalue = (int) MV;
16: analogWrite(PUMP, PWMvalue);
17: update\_scadabr();

Each line or group of lines of code performs a step from the control program. The modbusGW\_request(x, y, z) commands in lines 1 and 3 request the PV variables for each sensor, where x is the slave ID (master–slave communication), y is the address of the variable in the slave's Modbus memory mapping and z is the size of the variable (multiples of 16 bits). Hence, as the communication slave is the gateway, line 1 requests the PV value, mapped in position 13, with a size of 32 bits.

Unlike in the communication between the ESP32 controller and the gateway, the controller is the slave in the communication between ScadaBR and the controller. The modbus\_scada.Hreg(x) command reads the value contained in position x of the ESP32 Modbus memory. This memory location is written by the supervision application (ScadaBR), or rather, by the operator through Modbus write commands.

Line of code 6 collects the network link data used to generate the link stability factor. This command updates the following code variables: RSSI0, RSSI1, RSSI2, DPDUTx0, DPDUTxFail0, DPDUTx1, DPDUTxFail1, DPDUTx2 and DPDUTxFail2. Some of these variables are used by the stability function in lines 7 and 8.

After generating the stability values, code lines 9 to 13 ensure that the LD01 (PV1) sensor value will only be used by the PI control if the stability is greater than the minimum stability threshold. Otherwise, the value of the LD02 sensor will be used by the PI control.

Finally, the ESP32 performs the PI control on code line 14 and sends the signal to the pump (code line 16). At the end, all data from the control loop and the network are updated in the Modbus memory in order to be transferred to the ScadaBR supervisory (code line 17).


#### **5. System Performance Evaluation**

#### *5.1. Implementation*

A system has been developed to evaluate the performance of the wireless networked control system with ISA 100.11a devices. All elements of the architecture (Figure 1) are present in the system, as described below.


The tank and level sensors were placed on a test bench in the Industrial Network Laboratory, as shown in Figure 7a. The backbone router and the gateway were located at a distance of 3 m from the level sensors within the same laboratory. The TT05 router was the only instrument placed at a distance, in the external area, in order to perform the signal attenuation tests.

The instruments highlighted in Figure 7b are the backbone router and the gateway.

**Figure 7.** System in the industrial network laboratory: (**a**) tank and level sensors; (**b**) backbone router and gateway.

#### *5.2. Wireless Network Level Control System: Preliminary Tests*

Tests of the level control system without considering the link stability metric were performed in order to analyze the influence of stability on the control error with the system in steady state.

The methodology used in these first tests is described below.


The main objective of these tests is to cause a variation in the attenuation of the transmission signal of the LD01 sensor to verify its influence on the steady state of the control loop.

Step 7 of the methodology is performed only after the system reaches a steady state, considering an error of 1%. Thus, it is possible to infer that the errors that arise in the control loop are due to failures in the network link.

Seven tests were carried out with a minimum duration of 20 min, considering the time to start the controller, define the setpoint value and wait for the system to reach the steady state.

In order to present a better overview of the data in this paper, data from three tests are presented: test 01, test 02 and test 03.

#### 5.2.1. Test 01

The first graph of test 01 shows the variables of the control loop: error (difference between the process variable and the setpoint value) and MV (manipulated variable). The controller calculated the value of the output of the PI control (variable MV), in a range from 0 to 255, to send a pulse width modulation (PWM) signal to the system actuator: the pump. Figure 8 shows the data for these variables during the test.

**Figure 8.** Test 01: Variables of the control loop (error and MV).

It is possible to observe that, approximately in the time period between 14 h 30 min and 14 h 33 min, the absolute error in the steady state of the control system exceeded the 1% limit, reaching a value close to 9%. This change in the error naturally caused a change in the pump signal (MV), shown in the second graph of Figure 8. The red line indicates the 1% error limit.

An attenuation of the link is forced by changing the position of the instrument and by inserting structures that degrade the signal transmitted on the link. Figure 9 shows the relation between the link stability value and the controller error. The analysis of the influence of stability on the error is discussed in this section.

**Figure 9.** Test 01: Error (%) and Link0 stability.

It can be observed from Figure 9 that the variation of the link stability occurred before the variation of the control error. The vertical red dashed line indicates the moment when the forced attenuation of the link started. From these graphs, the previous influence of the link stability factor on the system error was analyzed.

#### 5.2.2. Test 02

Test 02 also shows a change in the controller after the link attenuation. It is important to remember that the control was already in the steady state. The values of the absolute error also exceeded the limit of 1% defined as a system requirement, as shown in Figure 10.

**Figure 10.** Test 02: variables of the control loop (error and MV).

The same behavior as in test 01 occurred in the second test. There is a variation in the link stability, shown in the first graph of Figure 10, and the control error then increased in the period following the reduction of the link stability value. The behaviors of these variables are presented in the graphs of Figure 11. The dashed vertical lines in the graphs indicate the moment at which the forced attenuation began.

An essential step in this analysis is the verification of the instantaneous packet delivery ratio (PDRi). The relation between the PDRi of the link and the control error is shown in the graphs of Figure 12. The variation of the control error occurred later, after the increase in dropped packets.

The data presented in Figures 11 and 12 show that the reduction of the link stability caused a variation in the packet delivery ratio, which, consequently, increased the absolute error of the control system.

#### 5.2.3. Test 03

The results obtained in test 03 support the same conclusion as the other tests, as shown in Figure 13. The link stability factor provides a prognosis or even a prediction of the behavior of the control system in the next few minutes.

**Figure 13.** Test 03: error (%) and Link0 stability.

#### 5.2.4. Preliminary Test Results

From an examination of the preliminary tests it becomes apparent that the link stability factor allows a prediction of the change in the control system error.

The second graph in Figure 13 (test 03) shows that, at approximately 15 h 36 min 40 s, the link stability decreased to around 0.92 (92%) and kept decreasing until it reached about 0.85 (85%). At approximately 15 h 41 min 40 s, 5 min after the decrease in the stability level, the error started to rise until it came close to 3%.

The values to which the stability factor drops, and the time between the stability variation and the increase in error, are essential for using link stability as a decisive factor in the control loop. Thus, a summary of the data from the tests performed is presented in Table 6.


**Table 6.** Preliminary test results.

The third column of Table 6 contains the median values of the reduction curve of the link stability factor. The median of these values was then calculated: the median of the stability values during the reduction is 89%.
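A small sketch of this threshold derivation is shown below; the sample values are hypothetical placeholders (the measured ones are in Table 6), and only the median procedure itself is illustrated.

```cpp
#include <algorithm>
#include <vector>

// Median of the stability values observed during the reduction periods of the
// preliminary tests; this median becomes the thresholdStab used in Algorithm 1.
double medianStability(std::vector<double> values) {
    std::sort(values.begin(), values.end());
    const auto n = values.size();
    return (n % 2 == 1) ? values[n / 2]
                        : 0.5 * (values[n / 2 - 1] + values[n / 2]);
}

// Example with hypothetical per-test values:
// medianStability({0.92, 0.89, 0.85, 0.90, 0.88, 0.89, 0.87}) -> 0.89
```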

#### *5.3. Wireless Network Level Control System Based on Link Stability*

In the preliminary tests, the controller received the PV value from the LD01 sensor and executed the control logic. However, failure periods were observed after a reduction of the link stability value to an average of 89%. Thus, this value of 0.89 is used as the link stability threshold in the control program: the variable thresholdStab in Algorithm 1.

Unlike the previous results and following the architecture of Figure 1, in this final test, the controller selected the PV value based on the stability of the links. The ESP32 controller received the RSSI values of the links, stored the values and calculated the stability values of these links in real time to detect whether a variable change was required (PV1 or PV2).

Figure 14 shows that the error (%) of the controller decreased after a period of time until the end of the test.

**Figure 14.** Result of the control system based on link stability: error (%) and MV.

The result shows that keeping the error below the maximum value, with low variation, ensures that the manipulated variable of the control loop remains regulated.

In this last test, the same attenuation procedures were performed for the link of the level measurement sensor. The error remained below the maximum limit due to the implementation of control logic based on link stability.

When it detects link stability values equal to or less than 89%, the controller changes its choice of the PV value. This behavior can be observed in Figure 15.

The dashed vertical red line in Figure 15 indicates the moment at which the controller detected a link stability value below 89% and changed the PV to the value from the second link (link2). This change kept the error below the threshold.

Finally, the ISA 100.11a network control system was implemented and the link stability metric was able to identify possible instabilities and prevent the failure of the system's control loop.

Few works have performed experiments with networked control systems using the WirelessHART and ISA 100.11a protocols. In addition, no work has implemented a WNCS based on the link stability parameter. Thus, a comparison between studies was not possible.

**Figure 15.** Results of the control system based on link stability: error (%) and link stability.

#### **6. Conclusions**

In this paper, we present the implementation of an ISA 100.11a networked control system. ISA 100.11a networks are widely used in monitoring applications. However, control applications require careful attention when designing communication and control systems. Thus, the first contribution of this work was the evaluation of networked control systems using the ISA 100.11a protocol.

The system controller uses link stability as a decisive factor in choosing the PV values. The link stability model is able to detect instabilities in the communication between the instruments, and consequently, to predict failures in the control loop. Some preliminary tests were performed to analyze the behavior of the control system under deliberately generated noise. The deliberate noise reduced the value of the link stability and then increased the error of the control loop. Thus, the proposed WNCS relies on link stability to avoid failures in the control system. When the controller took the link stability into account, the system tests showed satisfactory results. The controller detected low stability on the sensor link and changed the PV value to that of another link. The detection of link instability kept the control loop within the desirable limits. The second contribution was the use of the link stability model in a wireless networked control system.

The tests were performed with ISA 100.11a instruments from the manufacturer Yokogawa Electric. Additionally, the monitoring of the network and control system variables was done through the Modbus TCP interface and GSAP commands. Thus, the third main contribution of this work was to provide experimental tests with instruments from manufacturers in the market.

**Author Contributions:** Conceptualization, H.F., A.D.N. and D.M.; Data curation, H.F. and A.D.N.; Methodology, H.F., A.D.N. and D.M.; Software, H.F.; Writing—original draft, H.F.; Writing—review & editing, H.F., A.D.N. and D.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
