3.1. The CPS Orchestration Model
In Figure 2, an inner feedback loop reveals the interactions between the cyber and physical spaces and how the CPS is monitored through an inbound surveillance process and an outbound actuation process. The figure also reveals: (1) a physical-to-cyber flow of information originating in the physical space and translated into services at higher levels in the cyber space and (2) a cyber-to-physical flow of information, initiated by the processing of information in the cyber space to achieve actuation, both within the sensor network and on physical objects in the physical space. In essence, information flowing from the physical to the cyber space includes “sensor data”, obtained from the sensing of physical objects, and “sensor network monitoring data”, extracted from sensor network surveillance. Information flowing from the cyber to the physical space, on the other hand, contains “control signals” for actuating physical devices within the environment and “control messages” required for re-optimisation and reconfiguration of the sensor network. In the proposed CPS management model, the key processes are iterated sequentially in a loop as follows:
Sensing: This is initiated in the physical space to sense the environment for further processing in the cyber space. It generates “sensing data”, which are routed over paths created by CTPs. These sensing data are used to analyse the diffusion of information in the CPS-IoT subsystem, for the purposes of performance monitoring and anomaly detection.
Communication: The communication process is initiated at the physical space where IoT protocols such as LoRa, ZigBee, and BLE are used to move data from sensors to sink nodes and finally to gateways at the edge layer. Communication processes can also extend to the cyber space where other protocols are used to move data from the gateways to the fog layer for further processing.
Inbound surveillance: This is the first process initiated in the cyber space and is used to assess the integrity and performance of the CPS’s IoT subsystem as well as detect anomalies before and during data processing.
Data + Control: This process filters data and control information and stores them in databases for batch or stream processing. In batch processing, data are kept in databases for a longer period of time, while streamed information is processed in real time to produce data analytics for decision making.
Data processing: Here various analytics and simulations are applied to the data (both batch and stream) to support decision systems.
Observation center: The observation center is an entity of the CPS management infrastructure where the results are observed by the operation specialists for decision-making purposes (storage or actuation).
Outbound actuation: The outbound actuation is the last process implemented at the cyber space. It involves adjusting the operating parameters of the IoT subsystem based on the inbound surveillance to restore the CPS’s optimal settings whenever necessary.
Actuation: Initiated in the physical space to either restore the network to its optimal operational level or perform actuation on physical objects in the environment as a result of the data analytics or decision(s) made at the observation center.
As described above, the CPS orchestration model includes an adaptive control loop similar to the MAPE-K structure [33,34,35,36,37] where (i) “Monitoring” is achieved through the sensing process, (ii) “Analysis” is done by the inbound surveillance, (iii) “Planning” is performed by the data processing process, (iv) “Execute” is achieved by the outbound actuation process, and (v) the “Knowledge” acquired in the observation center is used to support decision making.
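The ordering of the processes and their MAPE-K roles can be made explicit with a short sketch. The Python fragment below is an illustration only: the names `PIPELINE`, `MAPE_K_ROLES`, and `run_iteration`, and the idea of supplying one handler function per process, are assumptions introduced here and are not part of the proposed model.

```python
from typing import Any, Callable, Dict

# Processes in the order in which the text iterates them.
PIPELINE = [
    "Sensing",
    "Communication",
    "Inbound surveillance",
    "Data + Control",
    "Data processing",
    "Observation center",
    "Outbound actuation",
    "Actuation",
]

# MAPE-K role -> process, as stated in the paragraph above.
MAPE_K_ROLES = {
    "Monitoring": "Sensing",
    "Analysis": "Inbound surveillance",
    "Planning": "Data processing",
    "Execute": "Outbound actuation",
    "Knowledge": "Observation center",
}

Handler = Callable[[Any], Any]


def run_iteration(handlers: Dict[str, Handler], state: Any) -> Any:
    """Run one pass of the loop.

    Each (hypothetical) handler receives the current state and returns
    the state handed to the next process.
    """
    for stage in PIPELINE:
        state = handlers[stage](state)
    return state
```

Calling `run_iteration` repeatedly with the state returned by the previous call would mimic the closed loop, with the physical-space processes at both ends of each pass.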
3.2. Epidemic Modelling of the CPS-IoT
As illustrated in Figure 3a, the states of the nodes of the CPS-IoT can be mapped into epidemic states when considering: (i) the risks and levels of congestion associated with the network collection tree topology and connectivity and (ii) the energy that can be drawn from nodes as a result of such topology and connectivity. As IoT networking uses CTPs, the number of “children” carried by a node (node interference) can be translated into an epidemic state expressing the level of contamination of the node. In this paper, two interference thresholds, a lower one and an upper one, are used to map epidemic states into IoT networking states. Building on the SIR epidemic model, this paper considers the SAR model, which uses three epidemic states referred to as Susceptible (safe), Attacked, and Removed. These states are, respectively, the loose equivalents of the Susceptible, Infected, and Removed states of the SIR model. We define the considered states as follows:
Susceptible nodes: These are the least interfering or non-interfering nodes in a network. Their total number is denoted by S. Each susceptible node n is assumed to have a weight (level of interference) less than the lower threshold.
Attacked nodes: These are highly interfering but still operational nodes. The total number of attacked (infected) nodes in a network is denoted by A. An attacked node is assumed to have a weight at least equal to the lower threshold but less than the upper threshold.
Removed nodes: These are nodes that are no longer operational as a result of high levels of interference among them. These nodes are also referred to as depleted and their total number is denoted by R. A node is considered to be removed if its interference is at least equal to the upper threshold.
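The three definitions above amount to a two-threshold classification of each node by its weight. A minimal Python sketch follows; the parameter names `tau_low` and `tau_high` are hypothetical stand-ins for the two interference thresholds, and the weight is taken to be the number of children carried by the node, as suggested by the text.

```python
from collections import Counter
from enum import Enum
from typing import Dict


class State(Enum):
    SUSCEPTIBLE = "S"
    ATTACKED = "A"
    REMOVED = "R"


def classify(weight: float, tau_low: float, tau_high: float) -> State:
    """Map a node weight (level of interference) to an epidemic state.

    tau_low and tau_high are hypothetical names for the lower and
    upper interference thresholds.
    """
    if tau_low > tau_high:
        raise ValueError("tau_low must not exceed tau_high")
    if weight < tau_low:
        return State.SUSCEPTIBLE      # weight below the lower threshold
    if weight < tau_high:
        return State.ATTACKED         # between the two thresholds
    return State.REMOVED              # at or above the upper threshold


def census(weights: Dict[str, float], tau_low: float, tau_high: float) -> Dict[str, int]:
    """Count the populations S, A, and R for a mapping node id -> weight."""
    counts = Counter(classify(w, tau_low, tau_high).value for w in weights.values())
    return {"S": counts["S"], "A": counts["A"], "R": counts["R"]}
```

For example, with `tau_low=3` and `tau_high=5`, nodes carrying 0, 2, 3, 5, and 7 children would be counted as two susceptible, one attacked, and two removed nodes.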
Note that the transitions from Susceptible to Attacked (S-to-A) and from Attacked to Removed (A-to-R) may determine the IoT’s safety level. From a CPS-IoT safety perspective, high availability translates to a “healthy” or “uninfected” state, while reliability translates to the ability of a node to accurately and safely carry out its function(s). An analysis of dependability based on the epidemic model proposed in this paper may thus lead to the four potential cases depicted in Table 1. Higher migration rates from one state to another are represented by the binary value “1”, while lower migration rates are represented by the binary value “0”.
Higher transition rates from the susceptible to the attacked state are an indication of an IoT network with low availability. This is because nodes move faster from the susceptible (safe) state to the attacked state;
Lower transition rates from the susceptible to the attacked state are an indication of an IoT network with higher availability; that is, nodes stay longer in the susceptible state;
Higher transition rates from the attacked to the removed (depleted) state are an indication of a more reliable IoT network. This is because the infected (attacked) nodes are quickly removed to minimise potential risks to the IoT network;
Lower transition rates from the attacked to the removed (depleted) state are an indication of a less reliable IoT network. This is because nodes stay longer in the attacked state, hence increasing the potential risks to the CPS and its IoT network subsystem.
Note that, as reported in Table 1, the best performance in terms of dependability (safety) is achieved in Case 4. This applies to sensitive applications with hard real-time constraints, which should be deployed with a low migration rate from susceptible to attacked status and a high migration rate from attacked to removed status. Such applications will require higher surveillance in terms of anomaly detection and quick response/actuation. For instance, this can be implemented using interference-aware data collection algorithms, such as the Least Interference Beaconing Protocol (LIBP) [38], that can prevent IoT nodes from moving into the attacked state. Similarly, Table 1 reveals that average performance in terms of dependability is achieved in Case 2, when the IoT nodes stay longer in the susceptible and attacked states before being removed. Case 1 and Case 3 depict low-dependability situations, the worst being Case 3, where the nodes spend less time in both the susceptible and attacked states. Case 2 applies to non-sensitive applications with soft real-time constraints, which should be deployed with low migration rates from susceptible to attacked status and from attacked to removed status. Such applications will require less surveillance in terms of anomaly detection and tolerate slower response/actuation, for example by implementing cost-aware data collection algorithms such as the Routing Protocol for Low-Power and Lossy Networks (RPL) [6]. The analysis of the dependability/safety cases triggered by adjustments of the transition rates is presented in Section 5, which also compares the CTPs in terms of epidemic status levels and their relevance to CPS dependability.
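The four cases can be summarised as a lookup on the two binary migration-rate indicators. The sketch below is an assumption-laden reading of the discussion above, not a copy of Table 1: Cases 2, 3, and 4 are assigned from the prose, Case 1 is assigned by elimination, and the function name and the labels "best", "average", "low", and "worst" are introduced here for illustration. The assignment should be checked against Table 1.

```python
from typing import Dict, Tuple

# Key: (high S-to-A rate?, high A-to-R rate?), i.e. the binary values 1/0.
# Value: (case number, dependability label). Inferred from the prose only.
_CASES: Dict[Tuple[bool, bool], Tuple[int, str]] = {
    (True, False): (1, "low"),       # assigned by elimination
    (False, False): (2, "average"),  # nodes stay longer in S and in A
    (True, True): (3, "worst"),      # nodes spend less time in S and in A
    (False, True): (4, "best"),      # low S-to-A, high A-to-R
}


def dependability_case(high_s_to_a: bool, high_a_to_r: bool) -> Tuple[int, str]:
    """Return the (case number, label) for a pair of migration-rate levels."""
    return _CASES[(bool(high_s_to_a), bool(high_a_to_r))]
```

Under this reading, a deployment with a low S-to-A rate and a high A-to-R rate, such as the hard real-time scenario discussed above, maps to Case 4.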
In modelling these transitions epidemically, Figure 3b presents a finite state machine of the epidemic model upon which the dependability of the CPS relies. It depicts a network that has been partitioned into a number of diffusion sets, with nodes in the same diffusion set assumed to be infectiously similar to each other. The figure reveals the states of the diffusion sets and, for each state, its associated transitions as well as the actions that trigger such state transitions. Susceptible nodes in the diffusion set
may be attacked at the rate
, while the attacked nodes from
are removed at rate
. Susceptible nodes in the diffusion set
may experience high interference, enough to move them directly to the removed status without transiting through the attacked status. On the other hand, removed nodes may cause some of the susceptible or attacked nodes to leave the network because of the destruction of connection links. We consider
to be the rate at which susceptible nodes in
are removed. Susceptible nodes from diffusion set
migrate to diffusion set
with rate
, and attacked (or infected) nodes in
migrate to
at a rate of
. In contrast to the classic SIR epidemic model, our model allows susceptible nodes to move directly into the removed state. This reflects the case of nodes that are removed from the system for reasons other than the infection/attack, such as component failure or malfunction. The difference in equations resulting from these assumptions will be presented as constraints of the CPS monitoring model proposed in
Section 4. Using the Euler method, this system of differential equations can be solved to produce three graphs representing the evolution of the populations S(t), A(t), and R(t) over time, as presented in Figure 3c.
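To illustrate how the Euler method produces the three curves, the following Python sketch integrates a simplified SAR system for a single diffusion set. It is not the system of equations used in this paper: the SIR-style mass-action attack term, the omission of migration between diffusion sets, and the parameter names `beta` (S-to-A rate), `gamma` (A-to-R rate), and `delta` (direct S-to-R rate) are all assumptions made here for illustration.

```python
from typing import List, Tuple


def euler_sar(
    s0: float, a0: float, r0: float,
    beta: float, gamma: float, delta: float,
    dt: float, steps: int,
) -> Tuple[List[float], List[float], List[float]]:
    """Forward-Euler integration of an assumed single-set SAR model.

    Assumed dynamics (illustrative only):
        dS/dt = -beta * S * A / N - delta * S
        dA/dt =  beta * S * A / N - gamma * A
        dR/dt =  gamma * A + delta * S
    """
    n = s0 + a0 + r0
    if n <= 0:
        raise ValueError("total population must be positive")
    s, a, r = [float(s0)], [float(a0)], [float(r0)]
    for _ in range(steps):
        attacks = beta * s[-1] * a[-1] / n   # flow S -> A
        direct = delta * s[-1]               # flow S -> R (no attack phase)
        removals = gamma * a[-1]             # flow A -> R
        s.append(s[-1] - dt * (attacks + direct))
        a.append(a[-1] + dt * (attacks - removals))
        r.append(r[-1] + dt * (direct + removals))
    return s, a, r
```

Each returned list holds `steps + 1` samples spaced `dt` apart, so plotting them against time gives curves of the kind shown for S(t), A(t), and R(t). The three flows cancel in the sum, so the total population is conserved at every step, up to floating-point rounding.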
With regard to CPS safety, these cases may be translated into an optimisation problem whose objective consists of maximising or minimising the surface below the curves S(t), A(t), and R(t). This can be expressed by the integral of these curves over time and defined as:
where T is the study period of the epidemic model and ± expresses a maximisation or minimisation of the surface below the epidemic curve in relation to Table 1.
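When the curves are only available as samples, for instance from a numerical solution, the surface below a curve over the study period T can be approximated numerically. The Python sketch below uses the trapezoidal rule; the function names and the `maximise` flag standing in for the ± sign are assumptions introduced here, and turning a minimisation into a maximisation by negating the area is one possible convention rather than the formulation of this paper.

```python
from typing import Sequence


def area_under_curve(samples: Sequence[float], dt: float) -> float:
    """Trapezoidal approximation of the integral of a curve over [0, T].

    `samples` holds the curve values at t = 0, dt, 2*dt, ..., T.
    """
    if len(samples) < 2:
        return 0.0
    return dt * (sum(samples) - 0.5 * (samples[0] + samples[-1]))


def signed_objective(samples: Sequence[float], dt: float, maximise: bool) -> float:
    """Return +area if the surface is to be maximised, -area otherwise.

    The sign plays the role of the ± in the text (assumed convention).
    """
    area = area_under_curve(samples, dt)
    return area if maximise else -area
```

As a sanity check, a curve held constant at 2 over a study period of T = 5 (11 samples with `dt=0.5`) has an area of 10.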