Article

Validation of High-Availability Model for Edge Devices and IIoT

by Peter Peniak, Emília Bubeníková and Alžbeta Kanáliková *
Department of Control and Information Systems, Faculty of Electrical Engineering and Information Technology, University of Zilina, 010 26 Zilina, Slovakia
* Author to whom correspondence should be addressed.
Sensors 2023, 23(10), 4871; https://doi.org/10.3390/s23104871
Submission received: 10 March 2023 / Revised: 14 May 2023 / Accepted: 15 May 2023 / Published: 18 May 2023
(This article belongs to the Section Industrial Sensors)

Abstract

Competitiveness in industry requires smooth, efficient, and high-quality operation. For some industrial applications or process control and monitoring applications, it is necessary to achieve high availability and reliability because, for example, the failure of availability in industrial production can have serious consequences for the operation and profitability of the company, as well as for the safety of employees and the surrounding environment. At present, many new technologies that use data obtained from various sensors for evaluation or decision-making require the minimization of data processing latency to meet the needs of real-time applications. Cloud/Fog and Edge computing technologies have been proposed to overcome latency issues and to increase computing power. However, industrial applications also require the high availability and reliability of devices and systems. The potential malfunction of Edge devices can cause a failure of applications, and the unavailability of Edge computing results can have a significant impact on manufacturing processes. Therefore, our article deals with the creation and validation of an enhanced Edge device model, which in contrast to the current solutions, is aimed not only at the integration of various sensors within manufacturing solutions, but also brings the required redundancy to enable the high availability of Edge devices. In the model, we use Edge computing, which performs the recording of sensed data from various types of sensors, synchronizes them, and makes them available for decision making by applications in the Cloud. We focus on creating a suitable Edge device model that works with the redundancy, by using either mirroring or duplexing via a secondary Edge device. This enables high Edge device availability and rapid system recovery in the event of a failure of the primary Edge device. 
The created high-availability model is based on the mirroring and duplexing of Edge devices, which support two protocols: OPC UA and MQTT. The models were implemented in the Node-RED software, tested, and subsequently validated and compared to confirm the required recovery time and 100% redundancy of the Edge device. In contrast to the currently available Edge solutions, our proposed extended model based on Edge mirroring is able to address most of the critical cases where fast recovery is required, and no adjustments are needed for critical applications. The maturity level of Edge high availability can be further extended by applying Edge duplexing for process control.

1. Introduction

Industrial applications require high availability and reliability of devices and systems. Any interruption in production can have serious consequences for the profitability and reputation of the company. Therefore, it is important for industrial applications to have adequate backup systems and disaster recovery plans for operational processes. High availability of equipment also means that planned maintenance work has minimal impact on productivity and production time. Finally, high availability also means minimal risk to employees and their safety when working with industrial applications. The deployment of Industry 4.0 [1,2], in contrast to the classical ISA-95 [3,4] with its strictly hierarchical communication of devices via control systems (PLCs) and information systems, brings completely new approaches and concepts for the overall design, implementation, and high-availability aspects. Smart devices based on Internet of Things and Industrial Internet of Things (IoT/IIoT) protocols can be integrated with applications directly and communicate in peer-to-peer mode, regardless of whether the applications run on-premises (local hosts) or via the Internet and Cloud (Software as a Service). However, adequate security, fail-safe features, and the overall orchestration of the communication between applications and IoT platforms still need to be enhanced.
The traditional approach to integrating smart devices (for example, sensors) with applications is shown in Figure 1a. Smart devices can support different application protocols (MQTT, CoAP, DDS, etc.) in order to communicate with applications via TCP/IP connections. The applications can run either on-premises (local servers) or in the Cloud via a Software as a Service (SaaS) deployment model. The applications evaluate the delivered data and can make a decision or process the data into information, based on the application logic. In this approach, decision-making in the application can be hampered by the need to integrate different protocols and different types of devices and to synchronize their data. Another approach is based on the usage of Edge devices, as shown in Figure 1b. The Edge acts as a generic integration gateway (middleware), capable of connecting various devices and protocols with the Cloud applications. Such a gateway can be installed as close as possible to the end devices, and is therefore called an Edge device (on the edge of the Cloud). The Edge device helps to integrate various systems with different application protocols into one connection with the selected protocol and can reduce the overall latency between sensors and applications. It also extends the capability of end devices with new features, an Industry 4.0 technique known as Digital Twins.
Although generic Edge devices can address most of these issues, there are still open tasks with respect to industrial communication, the synchronization of sensor inputs, and application requirements for high-availability features. Therefore, we focus on the creation of an extended Edge model with redundancy, as shown in Figure 1c, with two Edge devices that provide high-availability features even in the case of an Edge device failure. In addition, we address the integration of IIoT devices with manufacturing applications that expect OPC UA integration. Once the model is created, its high availability is validated on our numerical model and compared with the generic Edge model.

2. Related Work

Based on the concept of connecting IoT and IIoT devices with Cloud services, we surveyed the literature and summarize several reference works below.

2.1. Cloud Computing

Cloud computing refers to the provision of various services over the Internet, including data storage, servers, databases, networks, and software, as well as their retrieval on demand. Cloud computing offers three types of services: infrastructure, platform, and software (IaaS, PaaS, SaaS) [5]. A Cloud computing environment provides computing power and storage to offload on-premises systems. However, there are certain disadvantages to using Cloud computing; for example, data transfer to Cloud servers may require significant network bandwidth and may also increase service latency [5]. These disadvantages can be a sensitive issue for some applications. Researchers and companies have proposed two approaches to solve these problems: Edge computing and Fog computing [6].

2.2. Fog Computing

Fog computing is a decentralized computing infrastructure that stores data, computes data, and hosts applications. It sits between the data source and the Cloud. Similarly to Edge computing, Fog computing brings the advantages and power of the Cloud closer to where the data is created and acted upon.

2.3. Edge Computing

Edge computing is a distributed architecture in which raw data is processed at the edge of the network, as close as possible to the data source, and then the selected data or statistics are transferred to a Cloud server [7,8,9]. There is a growing interest in both academia and industry in applying Edge/Fog/Cloud computing to new applications and technologies, such as the IoT, artificial intelligence, machine learning, and process automation. Edge, Fog, and Cloud computing are powerful tools that enable the efficient management and processing of large amounts of data from these technologies and applications.
Edge computing works as an intermediate layer between IoT or IIoT devices and the Cloud. The main task of the Edge is to mediate the transfer of data from the IoT to the Cloud. It can offer small real-time computing and storage capabilities [9]. The current methods of implementing Internet of Things (IoT) solutions focus on directly connecting devices to the Cloud, where the data is processed, filtered, or aggregated to increase the business value. However, this approach is insufficient for the manufacturing industry, which requires Edge computing for on-site processing. Industrial devices typically generate significantly more data than standard IoT devices, resulting in delays and increased costs to transfer data to the Cloud. In the industrial context, it is necessary to minimize response times to critical events and to meet special security requirements. Therefore, moving computations to Edge devices in industrial plants can help mitigate these issues and improve the response time and bandwidth efficiency [10]. This means that some data processing and storage processes are moving from the Cloud to the Edge. Although the concept of Edge computing is gaining popularity, there is still no consensus on a standardized definition and architecture for Edge computing [11]. An example of a specific use of Edge in healthcare is the utilization of Edge devices to secure a system that monitors a patient’s health status in a hospital. In [12], the issue of data protection using homomorphic encryption is addressed, and the Edge is utilized to perform part of the analytical tasks, thereby increasing the performance of encrypted analysis and reducing the size of the data transmitted to the Cloud. One of the big challenges is deploying Edge in industry. An example of this is a concept that ensures industrial automation using Edge devices.
In [13], the Edge Powered Industrial Control concept is implemented on an industrial demonstrator using AWS Edge technology. IoT systems can benefit significantly from Edge computing technology, but there are still several challenges related to performance, efficiency, reliability, availability, scalability, security, and privacy [14]. In the following text, we present some interesting solutions that use Edge computing. It is important for Edge computing to remain reliable and fault tolerant when an IoT application is running on a set of Edge networks. Designing an efficient and fault-tolerant system for an Edge computing network is important because of the huge diversity of Edge devices, networks, and computing approaches, as discussed in [15]. In this work, the authors mainly emphasize the speed and accuracy of fault diagnosis, which ensures lower latency and higher availability. Fault tolerance and reliability in Edge computing are also addressed in [16], where a mobile agent is incorporated that moves the application to an alternative server in the event of a server failure. Technologies based on software-container virtualization have been proposed in [17] for fault tolerance. The work in [18] proposed fault tolerance and backups in Edge cluster networks with support for containers, Kubernetes, and Apache Kafka. Artificial intelligence methods were also applied in Edge implementation and within Edge devices. The mechanism is software-based, supports a software-defined architecture for the IoT, and is robust to various IoT failures and network failures [19]. Some works address the issue of high availability. In [20], a high-availability architecture is proposed in which a template-based Cloud architecture is designed to automatically configure fault detection and fault recovery methods depending on various service characteristics.
In work [21], the authors tackled the minimization of service interruptions and the assurance of the high availability of Edge services by implementing a scheme of real-time internal and external container migration to achieve cooperative processing, load balancing, data backup, and emergency service with switching using Docker technology. The paper [22] proposes a platform where devices in close proximity connect and form a network, called an Edge neighborhood. The platform allows participating devices to utilize the available resources by replicating the metadata from Edge to Edge. Several recent works have proposed the concept of interoperability between Edge/Fog/Cloud in the Internet of Things infrastructures to ensure various Quality of Services (QoS) measures, such as availability and reliability. In works [23,24,25,26,27,28,29], the authors focus on analytical modeling and the evaluation of the availability and reliability of Edge computing using tools such as Markov chains. Work [30] presents a systematic overview of the technologies and methods currently used in federated learning and Edge computing. Works [31,32,33,34] address the integration of blockchain technologies with Edge computing applications.

3. Generic Edge Device Model

Let us assume that our Edge device is mainly focused on the integration of various sensors’ data and publishing their results to the Cloud or for processing by on-premises applications, and that there are no specific features required with respect to the Edge computing, fast operation recovery, and high availability. In this case, the Edge device acts as a typical integration gateway.
In order to evaluate and compare our extended Edge device models, a generic model has to be created to provide a base for testing and comparison and to represent generic Edge devices. To create the generic model, let us assume that there is a group of IIoT devices $IIOT = \{IIOT_1, IIOT_2, \dots, IIOT_n\}$, where $n$ is the total number of distributed IIoT devices. These devices are connected to the Cloud via an Internet connection, as shown in Figure 2a. Each device is assigned a supported application protocol, $P = \{P_1, P_2, \dots, P_n\}$, for example, MQTT, CoAP, or DDS. Their exchanged data can be formalized as $x = \{x_1, x_2, \dots, x_n\}$ with the assigned timestamps $T = \{t_1, t_2, \dots, t_n\}$. The timestamp is given by the target application when receiving the data.
The application performs an evaluation of the delivered data and can make a decision or process the data into information based on the application logic $y$ (1). The earliest possible time at which the application is allowed to make a decision, $t_d$, is determined by the maximum timestamp (2).
The generic Edge-based approach covers all of the required application protocols and consolidates all of the values received from the sensors, as shown in Figure 2b. This is done by using one selected application protocol for Cloud-Edge communication ($P_y$). The Edge collects all of the sensors’ data $(P_i, t_i, x_i)$ under local network performance conditions (~ms), consolidates them into the data labeled $(x_i)$, and publishes them via the Internet according to defined time slots $t_p$. By consolidation, we mean the preparation of all the sensors’ data for publishing via the OPC UA server. The publishing time $t_p$ (3) is calculated from the cycle time $t_c$, which is the maximum of the received data timestamps from all sensors (4), extended by any additional overhead time $t_o$ that is needed for data processing by the Edge.
$$y: [(x_1, P_1, t_1), (x_2, P_2, t_2), \dots, (x_n, P_n, t_n)], \tag{1}$$
$$t_d > \max\{t_1, t_2, \dots, t_n\} \tag{2}$$
$$t_p = t_c + t_o \tag{3}$$
$$t_c = \max\{t_1, t_2, \dots, t_n\} \tag{4}$$
Therefore, the Edge can provide the data from all the connected sensors within the overall publishing time $t_p$, as illustrated in Figure 2b. Similar to the traditional approach, the publishing time must be equal to or lower than the required decision time (5), (6).
$$t_p \le t_d, \tag{5}$$
$$y: [(x_1, t_p), (x_2, t_p), \dots, (x_n, t_p), P_y]. \tag{6}$$
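The timing relations (2) to (5) can be checked numerically. The following is a minimal Python sketch, not the authors' implementation; the function names and the sample timestamps are hypothetical, and all times are assumed to be in seconds measured from the start of a cycle.

```python
from typing import Sequence


def publishing_time(timestamps: Sequence[float], overhead: float) -> float:
    """Return t_p = t_c + t_o, with t_c = max(t_1, ..., t_n) as in (3) and (4)."""
    if not timestamps:
        raise ValueError("at least one sensor timestamp is required")
    cycle_time = max(timestamps)   # t_c, Formula (4)
    return cycle_time + overhead   # t_p, Formula (3)


def meets_decision_time(timestamps: Sequence[float], overhead: float,
                        decision_time: float) -> bool:
    """Check condition (5): t_p <= t_d."""
    return publishing_time(timestamps, overhead) <= decision_time


# Hypothetical example: three sensors, 0.5 s Edge overhead, 10 s decision time.
t = [2.0, 4.5, 3.0]
t_p = publishing_time(t, overhead=0.5)
ok = meets_decision_time(t, overhead=0.5, decision_time=10.0)
```

In this example the slowest sensor (4.5 s) determines the cycle time, so the Edge can publish after 5.0 s, which satisfies the 10 s decision time.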
To simplify our basic Edge model, which is shown in Figure 3, let us assume that we only use MQTT as the application protocol for the IIoT devices. The MQTT protocol is typical for telemetry use-cases and sensors with the associated applications. The MQTT broker is an essential part of the basic Edge device model (MQTT-BP) and will be used by sensors or IIoT devices instead of brokers on the application side.
In contrast, manufacturing applications typically use the OPC UA protocol due to its wide vendor acceptance, advanced cybersecurity features, rich communication capabilities, provided services, and dataspace modeling.
The IIoT values (x1, x2, …, xn) from the MQTT broker items are replicated to the associated variables (x1, x2, …, xn) in the OPC UA address space. The OPC UA server (OPC UA-S) then retains all of the values until the next update during the publishing time. Applications have access to the published values via the OPC UA client (“READ” method) from the variables, according to Formula (7):
$$x_i(t_i) = x_i(t_p). \tag{7}$$
MQTT devices can publish and read the data via subscription, but if a connection is suddenly broken or the response time is much higher than usual, they can miss published values that the applications need for evaluation and decision-making. This can lead to issues with the application logic and the necessity to wait for the next publishing time, which can require additional handling and workaround coding. This can be avoided by keeping the last known value until the next publishing time, so that it can be used for the required updates without interrupting the application logic; this is a native feature of OPC UA server variables and objects (see Figure 4).
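The idea of retaining the last known value can be illustrated with a small Python sketch. It is a simplified stand-in for an OPC UA server variable, not the OPC UA API; the class names `VariableValue` and `LastValueStore` and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class VariableValue:
    value: float
    source_timestamp: float  # time at which the Edge received the sensor value


class LastValueStore:
    """Keeps the last known value per variable, like a server-side variable."""

    def __init__(self) -> None:
        self._values: Dict[str, VariableValue] = {}

    def write(self, name: str, value: float, timestamp: float) -> None:
        # Called whenever a sensor message arrives (e.g., from an MQTT topic).
        self._values[name] = VariableValue(value, timestamp)

    def read(self, name: str) -> Optional[VariableValue]:
        # Returns the last known value; None only if nothing was ever written.
        return self._values.get(name)


store = LastValueStore()
store.write("x1", 0.42, timestamp=5.0)

# The sensor misses its next publication; a read at t = 15 s still succeeds,
# and the timestamp tells the application how old the value is.
reading = store.read("x1")
assert reading is not None
age = 15.0 - reading.source_timestamp
```

The application logic is not interrupted by the missed message, and it can use the timestamp to decide whether a value is too old to act on.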

4. Extended Edge Device Model with High Availability

As presented in the Introduction, Edge devices are typically implemented as dedicated devices with a focus on device integration and low latency. In this case, high-availability features are not addressed, and in the case of an Edge failure, the device is simply replaced by a spare-part Edge. However, manufacturing applications often need to keep running even when the Edge device fails or is unavailable. For this purpose, we had to extend the Edge model to cover the required high-availability features. Let us consider the following variants, based on the generic Edge device with a spare part and on extended Edge device models, which can offer high availability for applications:
  • I. Generic Edge model with spare-part device (off-line Edge2)
  • II. Extended Edge model with mirrored Edge2
  • III. Extended Edge model with duplexing by Edge2.
I. Generic Edge model with spare-part Edge2 device
This model is virtually identical to the Edge device deployments commonly used at present. Automatic Edge device recovery in the case of failure is not assumed. In the event of an Edge device failure, a spare-part Edge is installed and activated. Applications in the Cloud layer are thus exposed to the unavailability of the Edge device and, as a consequence, to missing sensor data. This Edge concept is shown in Figure 5: Figure 5a shows the basic model, and Figure 5b shows the spare-part Edge in off-line mode.
The Edge recovery time for Variant I (TER-I) depends on the activation of the spare-part Edge. This procedure requires HW restart, and the activation of the IP address of the primary Edge, followed by reconnections of all IIoT devices and applications. As soon as Edge2 is activated and all devices and applications are reconnected, the application obtains all of the required inputs (sensors’ data) within the next publication time, as expressed by Formula (8). The explanation of the Edge recovery procedure is shown in Figure 6.
$$T_{ER\text{-}I} = t_i + t_b + t_a + \max\{t_{r1}, t_{r2}, \dots, t_{rn}, t_{ra}\} + t_p \tag{8}$$
where:
  • ti—Edge failure identification,
  • tb—Edge activation time (HW restart),
  • ta—activation of IP address of primary Edge on Edge2,
  • tri—reconnection time of IIoTi device to MQTT broker,
  • tra—reconnection time of application to OPC UA server,
  • tp—next publication time of OPC UA server (all sensors’ data collected).
As explained in Figure 6, during the unavailability of Edge1, all IIoT devices and the applications would lose connection, which is a state that is not suitable (not OK). After the identification of a failure, Edge2 is restarted and the IP address of the primary Edge is activated, triggering the reconnection of all IIoT devices and the application itself. As soon as all of the IIoT data are available for publishing, the application can restart the processing of the data. The basic Edge model is shown in Figure 7.
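Formula (8) can be evaluated directly once the individual phases are known. The sketch below is illustrative only: the function name `recovery_time_variant_1` and all numeric values are hypothetical and are not measurements from this work.

```python
from typing import Sequence


def recovery_time_variant_1(t_i: float, t_b: float, t_a: float,
                            t_r: Sequence[float], t_ra: float,
                            t_p: float) -> float:
    """Formula (8): T_ER-I = t_i + t_b + t_a + max(t_r1..t_rn, t_ra) + t_p."""
    # Devices and the application reconnect in parallel, so only the
    # slowest reconnection contributes to the recovery time.
    reconnect = max(list(t_r) + [t_ra])
    return t_i + t_b + t_a + reconnect + t_p


# Hypothetical values in seconds, chosen only for illustration.
t_er_1 = recovery_time_variant_1(t_i=20.0, t_b=60.0, t_a=5.0,
                                 t_r=[3.0, 8.0, 5.0], t_ra=6.0, t_p=10.0)
```

With these assumed values, the slowest reconnection is 8 s and the total recovery time is 103 s; the hardware restart dominates the result.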
II. Extended Edge model with mirrored Edge2 device
This model is based on the two active Edge devices. The primary Edge1 device is used by applications and IIoT devices, while the secondary Edge2 mirrors all of the sensors’ values from the primary Edge. The secondary Edge is not visible to the IIoT devices and applications, only internally to Edge1. Automatic recovery in cases of Edge1 failure is implemented and is based on the availability checking of Edge1.
IIoT devices are connected to the primary Edge1 (see Figure 8a). The IP address or host name of the primary Edge device is known by all of the connected IIoT devices. Figure 8 illustrates the concept of Edge1 mirroring and availability checking by Edge2. As mentioned previously, Edge2 replicates all of the sensors’ data from Edge1. If Edge1 fails (see Figure 8b) and is not active on the network, Edge2 will identify this event and take over its IP address.
The Edge recovery time for Variant II (TER-II) depends on the time required for the identification of an Edge1 failure, which is followed by the activation of the IP address on the secondary Edge2, while the mirrored sensor data are already available prior to the next publishing time, according to (9):
$$T_{ER\text{-}II} = t_i + t_a + \max\{t_{r1}, t_{r2}, \dots, t_{rn}, t_{ra}\}, \tag{9}$$
where:
  • ti—identification time of Edge1 failure,
  • ta—activation of IP address of primary Edge on Edge2,
  • tri—reconnection time of IIoTi device to MQTT broker,
  • tra—reconnection time of application to OPC UA server.
To simplify our model, let us again assume that we only use MQTT as the application protocol for IIoT devices. IIoT devices publish their sensor data (x1, x2, …, xn) to an embedded Edge broker (MQTT-Bp) via the defined items that are immediately replicated to the associated variables (x1, x2, …, xn) of the OPC UA server (OPC UA-Sp). The server then keeps all of the values in its object address space until the next update during publishing time. The application (SaaS) has access to the published values via the OPC UA client “READ” method from the variables. The secondary Edge2 has the same configuration, but its IP address/hostname is not known to the IIoT devices and applications. Edge2 replicates the IIoT values from the primary Edge by subscribing to the same MQTT-Bp topics, which are again replicated to the OPC UA server (OPC UA-Ss) variables (x1, x2, …, xn). Therefore, Edge2 is practically able to obtain all the values at the same time as the primary Edge1.
$$x_i(t_i) = x_i(t_p) = x_i(t_i). \tag{10}$$
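The mirroring principle, in which the secondary Edge subscribes to the same topics on the primary broker, can be sketched with an in-memory publish/subscribe model. This is a simplified illustration and not the Node-RED implementation: `Broker` stands in for the MQTT broker of the primary Edge, `EdgeVariables` for the OPC UA variables of one Edge device, and both names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[str, float], None]


class Broker:
    """Tiny in-memory stand-in for the MQTT broker on the primary Edge."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, value: float) -> None:
        # Deliver the value to every subscriber of the topic.
        for handler in self._subscribers[topic]:
            handler(topic, value)


class EdgeVariables:
    """Stand-in for the OPC UA variables of one Edge device."""

    def __init__(self, broker: Broker, topics: List[str]) -> None:
        self.values: Dict[str, float] = {}
        for topic in topics:
            broker.subscribe(topic, self._on_message)

    def _on_message(self, topic: str, value: float) -> None:
        self.values[topic] = value  # replicate the topic value into the variable


topics = ["X1", "X2", "X3"]
primary_broker = Broker()
edge1 = EdgeVariables(primary_broker, topics)  # primary Edge1
edge2 = EdgeVariables(primary_broker, topics)  # secondary Edge2 mirrors Edge1

for index, topic in enumerate(topics, start=1):
    primary_broker.publish(topic, 0.1 * index)
```

After the publications, both variable stores hold the same values, which is the property the secondary Edge needs in order to take over without waiting for new sensor data.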
In the case of a primary Edge1 failure, the connections of the applications and IIoT devices with Edge1 are broken. The secondary Edge2 identifies the absence of the special heart-beat signals (HB_In) from the primary Edge1, as we can see in Figure 9. Edge2 then takes over the IP address/hostname of the Edge1 device. IIoT devices and applications can recover their connection to the Edge and have access to MQTT-Bs and OPC UA-Ss with the replicated values. Formula (11) describes the comparison of the recovery times TER-I and TER-II in relation to the decision time td. We can see that the TER-II time is minimized using this approach. To maintain the stability of Edge2, the heart-beat signal (HB_OUT) prevents Edge1 from taking back the primary IP address after its recovery, so that flapping of the connection is avoided.
  T E R - I   T E R - I I   t d
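Formulas (8) and (9) share most of their terms, so the gain from mirroring can be made explicit. Assuming that the failure identification time $t_i$, the address activation time $t_a$, and the reconnection times are the same in both variants (an assumption made here for illustration, not a measured result), subtracting (9) from (8) gives:

```latex
T_{ER\text{-}I} - T_{ER\text{-}II} = t_b + t_p > 0
```

Under that assumption, mirroring saves the hardware restart time $t_b$ and the wait for the next publication $t_p$, because the secondary Edge is already running and already holds the mirrored values.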
III. Extended Edge model with Edge duplexing
This model applies a pair of Edge devices with independent but identical functions for applications and IIoT devices. In this case, both Edge devices are used equally by all devices and applications, maintaining the overall system redundancy. It means that there are two IP addresses used and two connections established by the applications and IIoT devices.
Then, in the case of an Edge1 or Edge2 failure, this solution sustains operations without any interruption. Figure 10 explains the concept of Edge duplexing in normal operation (Figure 10a) and in the case of a failure (Figure 10b), with the recovery time defined according to (12). The extended Edge model with duplexing is shown in Figure 11.
$$T_{ER\text{-}III} = 0. \tag{12}$$
In addition, the received values published by the sensors can have different time stamps (ti, tj), based on the network conditions, but are consolidated by both of the Edge device OPC servers in the same publishing time (tp), as expressed by (13–15).
$$x_i(t_i) = x_i(t_j) \tag{13}$$
$$x_i(t_i) = x_i(t_p) \tag{14}$$
$$x_i(t_j) = x_i(t_p) \tag{15}$$
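From the application's point of view, duplexing means that a read can be served by either Edge device. The following Python sketch illustrates this with two hypothetical reader functions; it does not use a real OPC UA client, and the names `read_duplexed`, `edge1_read`, and `edge2_read` as well as the sample values are assumptions.

```python
from typing import Callable, Optional, Sequence

Reader = Callable[[str], float]


def read_duplexed(variable: str, readers: Sequence[Reader]) -> Optional[float]:
    """Try each Edge connection in turn and return the first successful read."""
    for read in readers:
        try:
            return read(variable)
        except ConnectionError:
            continue  # this Edge is down; the other connection is still open
    return None  # both Edge devices failed


def edge1_read(variable: str) -> float:
    raise ConnectionError("Edge1 is unavailable")  # simulated Edge1 failure


def edge2_read(variable: str) -> float:
    return {"x1": 0.7, "x2": 1.3}[variable]  # hypothetical consolidated values


value = read_duplexed("x1", [edge1_read, edge2_read])
```

Because the second connection is already established, the failure of Edge1 costs no recovery time in this model, which corresponds to (12).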

5. The Experimental Workplace for Testing and Validation of Edge Device Models

The proposed models, which are shown in Figure 7 and Figure 9, have been verified in our laboratory. The implementation of the models is based on the Node-RED platform, which is one of the most frequently used products for IoT solutions and can support the creation of various data flows with a broad portfolio of specific nodes, application protocols (TCP, HTTP, MQTT, OPC UA), and supplementary services (such as JSON/XML parsers, file systems, database connectors, etc.).
The general setup of our simulation experiment is illustrated in Figure 12. All of the sensors are simulated by a Node-RED flow called “IIoT (Sensors)”, which generates the sensor outputs at various times and cycles. The IIoTi nodes send the created values to the MQTT client node. The MQTT client publishes the sensors’ values via the assigned topics (X1, X2, …, Xn) to the MQTT broker on the Edge device. Applications are represented by another flow, named “Applications (SaaS)”, which simulates the typical application processing of the sensors’ data based on the regular reading of the sensors’ values from the OPC server variables with a defined decision time (td). The primary and secondary Edge devices are implemented by dedicated Node-RED flows:
  • Edge1: IP address 192.168.111.101,
  • Edge2: IP address 192.168.111.102,
  • virtual Edge: IP address 192.168.111.100.
Let us explain the implemented workplace using a test case for the validation of the high-availability Edge device model (Variant 2). We will describe the sequence of steps of the model shown in Figure 12. After restart, the Edge devices have a default configuration (see Figure 12a). MQTT sensors publish their values to Edge1 via the MQTT broker with the virtual Edge address (1), which is active on the primary Edge1. Moreover, Edge2 is subscribed to the primary MQTT broker and replicates all of the published items (2). Both Edge devices replicate the MQTT topics (X1, X2, …, Xn) to the associated OPC UA server variables (X′1, X′2, …, X′n) (3,4). The application reads the values of the sensors from the OPC UA server items via the virtual Edge device IP address (5). In the case of an Edge1 failure (6), Edge2 takes over the virtual Edge address (7), as shown in Figure 12b. All sensors and OPC UA applications will reconnect or resume communication via the MQTT broker and the OPC UA server of Edge2 (8,9,10) based on the virtual Edge address. When Edge1 recovers and returns to normal mode, it checks whether the virtual Edge address is active. If it is, Edge1 does not take the address over until both Edge devices are restarted or the administrator resets them to the default setup.
Our experimental workplace with defined Node-RED flows is shown in Figure 13. The number of sensors is limited for the experimental validation to only three devices. All of the sensors’ data are published to the MQTT broker via the defined topics (X1, X2, X3). Edge subscribes to these topics, consolidates them, and writes them into the OPC UA server variables (X1, X2, X3). They can be read by OPC UA clients and the associated applications. OPC UA is capable of providing a timestamp and last valid value for the application, as shown in Figure 14.
The sensors are implemented in Node-RED using the standard “Inject” node, named IIoT1, IIoT2, and IIoT3. They regularly inject values, as shown in Figure 13 (debug window). Each sensor generates random values, which are multiplied by the sensor index for tracking, according to (16):
$$x_i = i \cdot \mathrm{RND}(0, 1). \tag{16}$$
The generation period Ti is configurable and could be set according to the test-cases, for example, according to (17):
$$T_i = 5i. \tag{17}$$
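The sensor simulation defined by (16) and (17) can be reproduced outside Node-RED. The Python sketch below is only an illustration of the two formulas; the function names, the fixed random seed, and the assumption that $T_i$ is expressed in seconds are not taken from the experimental setup.

```python
import random
from typing import List, Tuple


def sensor_value(i: int, rng: random.Random) -> float:
    """Formula (16): x_i = i * RND(0, 1)."""
    return i * rng.random()


def generation_period(i: int) -> int:
    """Formula (17): T_i = 5 * i (time unit assumed to be seconds)."""
    return 5 * i


def schedule(n_sensors: int, horizon: int,
             seed: int = 0) -> List[Tuple[int, int, float]]:
    """Return (time, sensor index, value) events up to `horizon`, sorted by time."""
    rng = random.Random(seed)
    events = []
    for i in range(1, n_sensors + 1):
        period = generation_period(i)
        for t in range(period, horizon + 1, period):
            events.append((t, i, sensor_value(i, rng)))
    return sorted(events)


events = schedule(n_sensors=3, horizon=30)
```

Because sensor $i$ only produces values in the range from 0 to $i$, the scaling makes it easier to tell the sources apart when tracing values through the broker and the OPC UA variables.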
Edge flows are created with three sections:
  • Edge engine: Aedes MQTT Broker for Node-RED and OPC UA Server/Client (add-ons), which provide the major services of the Edge system, with the option to initialize OPC UA variables.
  • IoT MQTT Topics-OPC Variables: subscribes to the MQTT broker of the virtual Edge and writes the received values to the OPC UA server variables.
  • HighAvailability/Fail-Over-Heart-Beat: monitoring of Edge1 by Edge2 based on the heart-beat signal, and takeover of all communication and the virtual Edge address in the case of an Edge1 failure.
To better explain the Edge engine part, the Node-RED console log can be used (see Figure 15). The embedded Node-RED Contrib OPC UA server and the Aedes MQTT broker are activated within the Edge flows during the initialization of Node-RED.
The second part, with IIoT topics and OPC UA variables, has already been described; therefore, let us explain the fail-over part of the experimental workplace. If Variant 2 is activated, Edge2 subscribes to the HB_IN topic of its internal MQTT broker. The heart-beat signal is regularly published to this topic by Edge1 (HB_IN = 2). If the heart-beat value is not published for longer than the configurable timeout (10 s), Edge2 evaluates this as a failure of Edge1 and initiates the Trigger function to take over the virtual Edge IP address (192.168.111.100), and the Edge status is updated by SwitchOver_On/Edge1_off, as shown in Figure 16.
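The heart-beat supervision described above can be sketched as a small watchdog. This is a minimal Python illustration of the timeout logic, not the Node-RED flow itself; the class name `HeartbeatMonitor` and the sample times are hypothetical, while the 10 s timeout follows the text.

```python
from typing import Optional


class HeartbeatMonitor:
    """Secondary-side watchdog: take over when the primary's heart-beat stops."""

    def __init__(self, timeout: float = 10.0) -> None:
        self.timeout = timeout
        self.last_beat: Optional[float] = None
        self.switched_over = False

    def on_heartbeat(self, now: float) -> None:
        # Called for every heart-beat message received from the primary Edge.
        self.last_beat = now

    def check(self, now: float) -> bool:
        """Return True exactly once, at the moment the takeover is triggered."""
        if self.switched_over or self.last_beat is None:
            return False
        if now - self.last_beat > self.timeout:
            self.switched_over = True  # latch: no automatic switch back
            return True
        return False


monitor = HeartbeatMonitor(timeout=10.0)
monitor.on_heartbeat(now=0.0)
monitor.on_heartbeat(now=5.0)  # last heart-beat before the failure
states = [monitor.check(now=t) for t in (8.0, 14.0, 16.0, 20.0)]
```

The takeover fires on the first check after the timeout has elapsed (here at 16 s) and then stays latched, which mirrors the rule that a recovered Edge1 does not reclaim the virtual address on its own.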

6. The Validation of Edge Device Models in Laboratory

Having implemented an experimental workplace, we were able to test, verify, and validate our Edge device models with respect to Edge high availability for IIoT devices and applications. In order to evaluate the benefits of the proposed high-availability Edge device models, a comparison with the current models is necessary. Therefore, our experimental testing and validation focus on Variant 1 (Edge basic model) and Variant 2 (Edge extended model with mirroring).
Let us focus on the basic Edge model, which does not have embedded high availability. The performed test shows that immediately after the Edge1 device failure, neither the application nor the IIoT devices recognize the failure (MQTT connected, OPC active reading). However, the OPC UA Expert client shows that the quality of the data is bad, with a missing refresh at 23:27 (see Figure 17).
After the initial failure-detection phase, the application and IIoT devices identified the connection failure with Edge1 (MQTT connecting, OPC invalid channel). While the sensors are still generating values, those values are not transferred to the application, so it cannot execute its logic on the sensors’ data after the Edge failure. This phase is shown in Figure 18. To solve this issue, a spare Edge2 is kept for Variant 1. Let us assume that Edge2 is already running (hot-standby mode), so that only the activation of the Edge1 IP address and the initialization of the Node-RED software are needed. As soon as Node-RED is reactivated with the IP address of Edge1 on the spare Edge2, the IIoT devices can reconnect to the MQTT broker and publish their values again, which Edge replicates to OPC UA variables; however, the application may remain disconnected until the next reading process is initiated (td). This restarting phase with a partially recovered system is shown in Figure 19.
Although the spare Edge2 helped to recover from the failure of the primary Edge1, the application suffered an overall outage of more than 3 min (see Figure 20).
The incident started at 23:27 and ended at 23:30 (see Figure 21), so the application did not process any data for more than 3 min. According to Formula (8), we can calculate the time for the Edge recovery:
TER-I ~ 180 s
where
ti = 20 s, ta = 5 s, tb + tr = 145 s, tp = 10 s.
The consequence of the Edge1 failure was that more than 18 calculations and sensor input data were missing during ~3 min of application outage. For a better evaluation, we have introduced the second critical parameter, the number of lost sensor messages xL:
xL = 18.
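The arithmetic behind these two values can be checked with a short Python sketch. It assumes that Formula (8) sums the listed time components, which agrees with the reported total, and that one sensor message is lost per publishing period tp during the outage. The function names are hypothetical.

```python
# Time components for Variant 1 in seconds, as reported in the text.
variant1 = {"t_i": 20, "t_a": 5, "t_b_plus_t_r": 145, "t_p": 10}


def recovery_time(components: dict) -> int:
    """Edge recovery time, assumed to be the plain sum of all components."""
    return sum(components.values())


def lost_messages(outage_s: int, publish_period_s: int) -> int:
    """Messages lost during the outage, assuming one message per publishing period."""
    return outage_s // publish_period_s


t_er_1 = recovery_time(variant1)
x_l = lost_messages(t_er_1, variant1["t_p"])
print(t_er_1)   # 180
print(x_l)      # 18
```

Under the same assumption, a shorter publishing period or more sensors would raise the number of lost messages proportionally for the same outage.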
Clearly, this approach cannot be used for critical applications. Examples of the data generated and processed by the application during the failure and Edge recovery by Variant 1 are shown in Figure 21.
To validate the Edge enhanced model with high availability (Variant 2), we again simulated a failure of the primary Edge1. Subsequently, the HB_IN signal could not be detected and Edge2 activated the virtual IP address by itself. As a result, the virtual IP address was available again for the sensors, which could reconnect, as shown in Figure 23. Finally, the system became stable and the OPC UA application could quickly reconnect.
The high-availability features of the proposed Edge enhanced model (Variant 2) were also confirmed in our experiment. After we simulated a failure of the primary Edge device, the secondary Edge2 took over the virtual IP address and the IIoT devices, and the application reconnected within 9 s. Taking into consideration the configured heart-beat detection time (10 s), the overall time for Edge recovery was ~10× lower than with Variant 1, according to Formula (9):
TER-II = 19 s,
where
ti = 10 s, ta = 5 s, tr = 4 s.
The OPC UA server maintained the replicated published values from Edge1, so there was no loss of data for the application after the OPC UA client was reconnected and the MQTT devices restarted publishing. Therefore, the second critical parameter, the number of lost sensor messages xL, is:
xL = 0.
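The Variant 2 figures and the "~10×" comparison can be reproduced in a few lines of Python. This assumes that Formula (9) sums the three listed components, which agrees with the reported 19 s. The Variant 1 value of 180 s is taken from the text, and the variable names are illustrative.

```python
# Time components for Variant 2 in seconds, as reported in the text.
variant2 = {"t_i": 10, "t_a": 5, "t_r": 4}

t_er_2 = sum(variant2.values())                  # assumed plain sum of components
reconnect = variant2["t_a"] + variant2["t_r"]    # time after heart-beat detection
t_er_1 = 180                                     # Variant 1 recovery time from the text
ratio = t_er_1 / t_er_2

print(t_er_2)            # 19
print(reconnect)         # 9
print(round(ratio, 2))   # 9.47
```

The ratio of about 9.5 matches the rounded "~10× lower" figure quoted for Variant 2.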
The enhanced Edge model with Edge duplexing (Variant 3) was not tested or validated. This model requires modified IIoT devices or sensors that are able to maintain parallel MQTT connections to two different MQTT brokers. In addition, the application has to be able to combine two OPC connections. This approach can be applicable for very critical and specific processes within manufacturing, especially if the sensor values can directly influence the process control systems with a high impact on the decisions. Therefore, to prepare future testing and validation models, we have prepared a Node-RED-based model of IIoT devices supporting the Variant 3 requirements, as illustrated by Figure 24.
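One way an application could combine two channels is sketched below in Python. This is not the Node-RED model from Figure 24: the `Reading` type, the `DuplexMerger` class, and in particular the per-sensor sequence number used to recognize duplicates are assumptions made only for illustration.

```python
from typing import List, NamedTuple, Set, Tuple


class Reading(NamedTuple):
    sensor_id: str
    seq: int        # hypothetical per-sensor sequence number
    value: float


class DuplexMerger:
    """Accepts readings from both Edge channels and drops duplicates."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, int]] = set()   # grows unbounded in this sketch
        self.accepted: List[Reading] = []

    def receive(self, reading: Reading) -> bool:
        """Returns True if the reading is new, False if it was already delivered."""
        key = (reading.sensor_id, reading.seq)
        if key in self._seen:
            return False
        self._seen.add(key)
        self.accepted.append(reading)
        return True


# Edge1 fails after the second reading; Edge2 keeps delivering all of them.
via_edge1 = [Reading("s1", 1, 20.5), Reading("s1", 2, 20.7)]
via_edge2 = [Reading("s1", 1, 20.5), Reading("s1", 2, 20.7),
             Reading("s1", 3, 20.9), Reading("s1", 4, 21.0)]

merger = DuplexMerger()
for reading in via_edge1 + via_edge2:
    merger.receive(reading)

print([r.seq for r in merger.accepted])   # [1, 2, 3, 4]
```

In this sketch, the failure of one channel needs no reconnection step: the readings that arrive over the surviving channel simply continue the sequence.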

7. Conclusions

Industrial applications require the high availability and reliability of devices and systems. For example, some manufacturing applications require reliable sensor data without any data transmission interruption and subsequent missing data values. The integration of sensors with applications can be realized directly via Internet connection to the Cloud (SaaS) or via an integration Edge device. Our article is focused on the Edge device approach because the potential malfunction of Edge devices can cause the failure of some critical applications, as well as the unavailability of Edge computing results, and can have a significant impact on manufacturing processes.
Our article deals with the design of a suitable Edge model that provides high availability for applications in an industrial environment with Industrial Internet of Things devices. High-availability features are not typically considered in generic Edge solutions, which act mainly as an integration gateway, providing device integration with low latency. In cases of Edge failure, there is an expectation that the device is simply replaced by spare-part hardware and activated when possible. However, manufacturing applications have much higher requirements for fast recovery and reliable operation.
Our main goal is to extend the generic Edge device to support redundancy and fast system recovery in cases of failure. Therefore, we created three variants of Edge models:
  • Variant I: Generic Edge model without high availability.
  • Variant II: Extended Edge model with mirroring.
  • Variant III: Extended Edge model with duplexing.
While Variant I represents the generic Edge model without any focus on high availability and is used only for comparison with the extended models, the proposed Variants II and III are based on system redundancy with an additional secondary Edge device. Variant II is based on Edge mirroring, which we designed to minimize the recovery time after an Edge device failure by mirroring the sensor data on the secondary Edge device. In cases of primary Edge device failure, the secondary one takes over the network configuration and provides the available (mirrored) data to the reconnected applications, while the sensors are also reconnected. Variant III represents the most advanced approach, providing Edge duplexing. In this case, each sensor and application is connected to both Edge devices, providing two channels to receive and process the sensors’ data. With duplexing, a device failure has no impact on critical applications, there is no need to reconnect, and there is no potential loss of the sensors’ data.
For each proposed variant, we defined a numerical model with two key performance indicators (KPIs), the Edge recovery time (TER-x) and the number of lost IIoT (sensor) messages xL, to evaluate the high-availability maturity level of the variant.
The implementation of the variants was based on the Node-RED platform, which is one of the most commonly used products for IoT solutions and can support the creation of various data flows with a broad portfolio of specific nodes, application protocols, and supplementary services. Our model uses the OPC UA standard, which is the most suitable integration platform for various control systems, manufacturing applications, and IIoT devices. The OPC UA protocol is combined with the more common IoT protocol—the MQTT protocol—to cover an extended product range without embedded OPC support.
Based on our validation and testing, we can state that the generic Edge model (Variant I), which represents the currently available Edge devices, cannot be used for critical manufacturing applications, where IIoT data are needed for decision-making processes in semi-real time. The Edge recovery time TER-I took several minutes (180 s), with 18 missing messages (xL = 18), even with the most proactive approach using the active spare Edge (which is not a typical case). With an increased number of sensors or much faster publishing of the sensors’ data, this impact would be even higher.
On the contrary, our proposed extended model based on Edge mirroring (Variant II) was able to address most of the critical cases, where fast recovery is required, and no adjustments are needed for the applications and IIoT devices. The recovery time TER-II was roughly 10× lower (19 s) in comparison to the generic Edge model, and the reconnection time tri was fast enough not to cause any interruptions or loss of IIoT messages with the sensors’ data (xL = 0).
In conclusion, we can confirm that our Variant II with Edge mirroring can be applied to a broad range of manufacturing applications, where the reconnection time and fast system recovery are sufficient for the application use-cases. There is a clear benefit in the operational aspects of Edge devices, as well as in the sensor data availability, in contrast to the Edge generic model and currently available Edge solutions. However, for process control and semi-real time communication, even Variant II does not address all of the challenges. To address this most challenging use-case, there is a need for our Variant III with Edge duplexing. This model works with two Edge devices so that a failure of either one will not be noticed by the IIoT devices or applications. To test and validate this model, we would need to adjust the IIoT devices and applications. This would require further extension and preparation work. In addition, we have limited our validation to only the MQTT protocol. It would be necessary to extend the model to different protocols, such as DDS and CoAP, to obtain validation of the model for all cases. We will need to extend our model to address those requirements in our follow-up articles.

Author Contributions

The work presented in this paper was carried out in collaboration with all authors. P.P. defined the research topic, guided the research goals, and performed the experiments. E.B. and A.K. created the Figures and conducted the formal analysis. E.B. and A.K. wrote the theoretical part and edited and reviewed the paper. All authors have read and agreed to the published version of the manuscript.

Funding

The publication was funded by the Cultural and Educational Grant Agency MŠVVaŠ SR, grant number 008ŽU-4/2021: Integrated Teaching for Artificial Intelligence Methods at the University of Žilina.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Standard Industry 4.0. Available online: https://www.standardsi40.sg (accessed on 2 March 2023).
  2. Peniak, P.; Holečko, P.; Bubeníková, E. Aplikácie informačných systémov v Priemysle 4.0 (Applications of Information Systems in Industry 4.0); EDIS UNIZA, University of Žilina: Žilina, Slovakia, 2023. [Google Scholar]
  3. The International Society of Automation. ISA95, Enterprise-Control System Integration. 2019. Available online: https://www.isa.org/isa95 (accessed on 2 March 2023).
  4. ANSI/ISA-95. Available online: https://searcherp.techtarget.com/definition/ANSI-ISA-95 (accessed on 8 March 2023).
  5. Pereira, P.; Araujo, J.; Maciel, P. A hybrid mechanism of horizontal auto-scaling based on thresholds and time series. In Proceedings of the IEEE International Conference on Systems. Man, and Cybernetics (SMC), Bari, Italy, 6–9 October 2019; pp. 2065–2070. [Google Scholar]
  6. Sunyaev, A. Fog and Edge Computing. In Internet Computing; Springer: Cham, Switzerland, 2019. [Google Scholar]
  7. Naha, R.K.; Garg, S.; Georgakopoulos, D.; Jayaraman, P.P.; Gao, L.; Xiang, Y.; Ranjan, R. Fog computing: Survey of trends, architectures, requirements, and research directions. IEEE Access 2018, 6, 47980–48009. [Google Scholar] [CrossRef]
  8. Jayashree, L.; Selvakumar, G. Edge computing in IoT. In Getting Started with Enterprise Internet of Things: Design Approaches and Software Architecture Models; Springer: Cham, Switzerland, 2020; pp. 49–69. Available online: https://link.springer.com/chapter/10.1007/978-3-030-30945-9_3 (accessed on 2 March 2023).
  9. Shi, W.; Cao, J.; Zhang, Q.; Li, Y.; Xu, L. Edge computing: Vision and challenges. IEEE Internet Things J. 2016, 3, 637–646. [Google Scholar] [CrossRef]
  10. Mahmud, R.; Kotagiri, R.; Buyya, R. Fog Computing: A Taxonomy, Survey and Future Directions. In Internet of Everything. Internet of Things; Springer: Singapore, 2018. [Google Scholar] [CrossRef]
  11. Mach, P.; Becvar, Z. Mobile edge computing: A survey on architecture and computation offloading. IEEE Commun. Surv. Tutor. 2017, 19, 1628–1656. [Google Scholar] [CrossRef]
  12. Alabdulatif, A.; Khalil, I.; Yi, X.; Guizani, M. Secure Edge of Things for Smart Healthcare Surveillance Framework. IEEE Access 2019, 7, 31010–31021. [Google Scholar] [CrossRef]
  13. Pallasch, C.; Wein, S.; Hoffmann, N.; Obdenbusch, M.; Buchner, T.; Waltl, J.; Brecher, C. Edge Powered Industrial Control: Concept for Combining Cloud and Automation Technologies. In Proceedings of the IEEE International Conference on Edge Computing (EDGE), San Francisco, CA, USA, 2–7 July 2018; pp. 130–134. [Google Scholar]
  14. Harjula, E.; Artemenko, A.; Forsström, S. Edge Computing for Industrial IoT: Challenges and Solutions. In Wireless Networks and Industrial IoT; Springer: Cham, Switzerland, 2020; pp. 225–240. [Google Scholar]
  15. Djenouri, Y.; Belhadi, A.; Srivastava, G.; Ghosh, U.; Chatterjee, P.; Lin, J.C.W. Fast and Accurate Deep Learning Framework for Secure Fault Diagnosis in the Industrial Internet of Things. IEEE Internet Things J. 2023, 10, 2802–2810. [Google Scholar] [CrossRef]
  16. Grover, J.; Garimella, R.M. Reliable and Fault-Tolerant IoT-Edge Architecture. In Proceedings of the IEEE SENSORS, New Delhi, India, 28–31 October 2018; pp. 1–4. [Google Scholar] [CrossRef]
  17. Morabito, R.; Kjallman, J.; Komu, M. Hypervisors vs. Lightweight Virtualization: A Performance Comparison. In Proceedings of the IEEE International Conference on Cloud Engineering, Tempe, AZ, USA, 9–13 March 2015; pp. 386–393. [Google Scholar]
  18. Javed, A.; Heljanko, K.; Buda, A.; Främling, K. CEFIoT: A Fault-Tolerant IoT Architecture for Edge and Cloud. In Proceedings of the 2018 IEEE 4th World Forum on Internet of Things (WF-IoT), Singapore, 5–8 February 2018; pp. 813–818. [Google Scholar]
  19. Kumar, S.; Ranjan, P.; Singh, P.; Tripathy, M.R. Design and Implementation of Fault Tolerance Technique for Internet of Things (IoT). In Proceedings of the 12th International Conference on Computational Intelligence and Communication Networks (CICN), Bhimtal, India, 25–26 September 2020. [Google Scholar] [CrossRef]
  20. Yang, H.; Kim, Y. Design and Implementation of High-Availability Architecture for IoT-Cloud Services. Sensors 2019, 19, 3276. [Google Scholar] [CrossRef] [PubMed]
  21. Sangolli, D.R.; Ravindrarao, N.M.; Patil, P.C.; Palissery, T.; Liu, K. Enabling High Availability Edge Computing Platform. In Proceedings of the 7th IEEE International Conference on Mobile Cloud Computing, Services, and Engineering (MobileCloud), Newark, CA, USA, 4–9 April 2019; pp. 85–92. [Google Scholar] [CrossRef]
  22. Murturi, I.; Avasalcai, C.; Tsigkanos, C.; Dustdar, S. Edge-to-Edge Resource Discovery using Metadata Replication. In Proceedings of the IEEE 3rd International Conference on Fog and Edge Computing (ICFEC), Larnaca, Cyprus, 14–17 May 2019. [Google Scholar] [CrossRef]
  23. Facchinetti, D.; Psaila, G.; Scandurra, P. Mobile cloud computing for indoor emergency response: The IPSOS assistant case study. J. Reliab. Intell. Environ. 2019, 5, 173–191. [Google Scholar] [CrossRef]
  24. Jia, C.; Lin, K.; Deng, J. A Multi-property Method to Evaluate Trust of Edge Computing Based on Data Driven Capsule Network. In Proceedings of the IEEE INFOCOM 2020—IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Toronto, ON, Canada, 6–9 July 2020; pp. 616–621. [Google Scholar] [CrossRef]
  25. Gamatié, A.; Devic, G.; Sassatelli, G.; Bernabovi, S.; Naudin, P.; Chapman, M. Towards energy-efficient heterogeneous multicore architectures for edge computing. IEEE Access 2019, 7, 49474–49491. [Google Scholar] [CrossRef]
  26. Maciel, P.; Dantas, J.; Melo, C.; Pereira, P.; Oliveira, F.; Araujo, J.; Matos, R. A survey on reliability and availability modeling of edge, fog, and cloud computing. J Reliab. Intell Env. 2022, 8, 227–245. [Google Scholar] [CrossRef]
  27. Pereira, P.; Araujo, J.; Melo, C.; Santos, V.; Maciel, P. Analytical models for availability evaluation of edge and fog computing nodes. J. Supercomput. 2021, 77, 9905–9933. [Google Scholar] [CrossRef]
  28. Battula, S.K.; O’Reilly, M.M.; Garg, S.; Montgomery, J. A generic stochastic model for resource availability in fog computing environments. IEEE Trans Parallel Distrib. Syst. 2020, 32, 960–974. [Google Scholar] [CrossRef]
  29. Kabashkin, I. Dependability of v2i services in the communication network of the intelligent transport systems. In Proceedings of the 6th International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS), Cracow, Poland, 5–7 June 2019; pp. 1–6. [Google Scholar]
  30. Brecko, A.; Kajati, E.; Koziorek, J.; Zolotova, I. Federated Learning for Edge Computing: A Survey. Appl. Sci. 2022, 12, 9124. [Google Scholar] [CrossRef]
  31. Javed, A.R.; Hassan, M.A.; Shahzad, F.; Ahmed, W.; Singh, S.; Baker, T.; Gadekallu, T.R. Integration of Blockchain Technology and Federated Learning in Vehicular (IoT) Networks: A Comprehensive Survey. Sensors 2022, 22, 4394. [Google Scholar] [CrossRef] [PubMed]
  32. Zhu, C.; Zhu, X.; Ren, J.; Qin, T. Blockchain-Enabled Federated Learning for UAV Edge Computing Network: Issues and Solutions. IEEE Access 2022, 10, 56591–56610. [Google Scholar] [CrossRef]
  33. Xue, H.; Chen, D.; Zhang, N.; Dai, H.N.; Yu, K. Integration of Blockchain and Edge Computing in Internet of Things: A Survey. Future Gener. Comput. Syst. 2023, 144, 307–326. [Google Scholar] [CrossRef]
  34. Lang, P.; Tian, D.; Duan, X.; Zhou, J.; Sheng, Z.; Leung, V.C. Cooperative Computation Offloading in Blockchain-Based Vehicular Edge Computing Networks. IEEE Trans. Intell. Veh. 2022, 7, 783–798. [Google Scholar] [CrossRef]
Figure 1. The traditional approach (a), Edge-based (b), and Edge with High-Availability approach (c).
Figure 2. The concept of traditional IIoT integration (a) and generic Edge based approach (b).
Figure 3. The generic Edge model.
Figure 4. The availability of sensors values x’i within publishing time tp by Edge basic model.
Figure 5. The concept of basic Edge model (a) with off-line Edge2 (b).
Figure 6. The Edge recovery model.
Figure 7. The basic Edge model with off-line Edge2.
Figure 8. The concept of the extended Edge model with mirroring by Edge2 during normal operation (a) and in case of Edge failure (b).
Figure 9. The extended Edge device model with mirroring.
Figure 10. The concept of extended Edge duplexing during normal operation (a) and in case of Edge failure (b).
Figure 11. The extended Edge device model with duplexing.
Figure 12. Experimental workplace for validation of high-availability Edge model with default configuration (a) and in case of Edge failure (b).
Figure 13. Node-RED flows for experimental validation of high-availability Edge models.
Figure 14. OPC Ua Expert connected to Edge1 OPC UA server with sensor values from Figure 13.
Figure 15. Node-RED console during initialization of Edge flows.
Figure 16. Edge Variant 2: High-availability/Fail-over with HeartBeat flow.
Figure 17. Edge basic model and simulation of Edge1 failure—phase 1: Detection of failure.
Figure 18. Edge basic model and simulation of Edge1 failure—phase 2: Application malfunction.
Figure 19. Edge basic model and simulation of Edge1 failure—phase 3: Edge restart.
Figure 20. Edge basic model and simulation of Edge1 failure—phase 4: Edge recovery.
Figure 21. Generated and processed data by application during Edge failure and recovery.
Figure 22. Edge enhanced model—Variant 2: Mirroring—Failure detection.
Figure 23. Fail-over process initiated by Edge2—Variant 2.
Figure 24. IIoT sensor model for Node-RED—Variant 3 and Edge duplexing.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Peniak, P.; Bubeníková, E.; Kanáliková, A. Validation of High-Availability Model for Edge Devices and IIoT. Sensors 2023, 23, 4871. https://doi.org/10.3390/s23104871
