1. Introduction
Cyber-Physical Systems (CPSs) are highly connected and massively networked systems of cyber (computation and communication) and physical (sensors and actuators) components that interact with each other in a feedback loop to achieve a common goal [
1,
2]. The System of Cyber-Physical Systems (SoCPS) is a complex, heterogeneous system comprising individual independent CPSs to achieve common goals that cannot be achieved by a single system [
3,
4]. SoCPSs have driven the adoption of many emerging intelligent systems in our daily lives, for instance, smart agriculture, smart grids, robotic systems, intelligent transportation, and avionic collision avoidance systems. These SoCPSs are often part of wider collaborative networks consisting of many other CPSs, and together they provide functionalities that individual systems cannot. A platooning system, for instance, allows vehicles to reduce inter-vehicle distance and hence save fuel because a common speed for all vehicles in the platoon is negotiated by all adaptive cruise control units involved. Such system networks are highly dynamic because individual systems can join and leave them at runtime.
SoCPS offer unprecedented opportunities to monitor and control the physical world through computation and control functionalities. However, these complex systems pose numerous safety-related challenges because any failure within a network of SoCPS may have profound effects on the whole system. Therefore, SoCPS require rigorous design rules and adherence to safety properties for all possible interactions so that the opportunities offered by these systems can be fully realized while ensuring safety. During operation, if one participant system fails, it impacts the final goal of the SoCPS. Therefore, a composite hazard analysis is required, in which the network of a SoCPS is analyzed at design time to identify each system’s potential failures and their impact on the other collaborating systems. Because SoCPS are networks of CPSs, hazard analysis for a single system cannot guarantee the safety of the SoCPS as a whole, owing to collaborative behavior. Another major challenge for a network of collaborative CPSs is the validation of behavior that emerges from the collaboration of networked CPSs. This emergent behavior can either be a desired property that enables some functionality in a SoCPS or an undesired event that can potentially lead to a dangerous state. Daun et al. [
5] contributed to the review of the emergent behavior in the networks of CPSs by proposing the automated generation of instance models that allow the assessment of different network configurations of SoCPS. The proposed approach depends on the automated generation of diagrams to validate different configurations of a SoCPS considering multiple instances of participating CPSs.
Automated safety analysis techniques can also aid in the validation of expected behaviors of a system [
6,
7]. To ensure safety for a network of a SoCPS, the individual participant system’s behavior for each function should be analyzed together with the collective behavior of other participant systems. Analyzing safety for each participant system in SoCPS cannot guarantee the safety of the whole SoCPS. It must be ensured that the network CPSs behave safely, which means that it is necessary to identify the safety faults that arise from the interplay between different CPSs in SoCPS. Hence, it is insufficient to ensure the correct behavior of each system, as the behavior stems from the interaction of various systems that cannot be attributed to an individual system and, therefore, cannot be specified for an individual participant system. SoCPS is an integrated set of systems that uses each system in a collaborative fashion to achieve a common mission that the individual systems in the network of SoCPS cannot achieve. Moreover, the SoCPS employ interdependencies that further complicate the system’s operations.
Traditional safety analysis techniques cannot cope with the complexity of SoCPS because each system in SoCPS is considered to be an independent system, and its safety analysis is also conducted independently. Therefore, new safety analysis techniques are required to analyze hazards for collaborative SoCPS. Furthermore, these new safety analysis techniques must handle a large network of SoCPS and produce meaningful results while remaining economically practical. Therefore, an automated composite hazard analysis would support the safety of the dynamically forming network of CPS. Our proposed automated composite safety analysis technique relies on individual CPSs’ available documentation (hazard analysis artifacts) and predefined constraints for collaboration among CPSs. To this end, we developed a tool called SoCPSTracer that takes hazard analysis artifacts for participant collaborative CPS as input and generates a Fault Propagation Graph (FPG). It is a directed graph that enables safety engineers to determine the flow of faults in the network of SoCPS. The FPG also gives information about inter-system and intra-system fault propagation and their impact on other systems in the network of SoCPS. The SoCPSTracer tool mainly contributes to the traceability strategy that defines a traceability information model, processes, and tooling in traceability fundamentals [
8,
9]. In summary, we make the following contributions in this article:
First, we define three relationships (i.e., the influence, countermeasure, and overlap relationships) among the hazard analysis artifacts of participating systems in SoCPS. The relationships among the faults, their causal factors, outcomes, and countermeasures provide information about the propagation of a fault within a single system and/or in the network of SoCPS. We assume that all participating CPSs in SoCPS are analyzed using FMEA, ETA, and FTA. We also assume that hazard artifacts for all participating systems in a network of SoCPS are available. If they are not, they can be produced and made available using our tool.
Second, to support safety engineers in analyzing and investigating the potential faults, we propose a diagrammatic representation of identified faults and their manifestations in the SoCPS network. The automatically generated diagrammatic representation shows fault propagation in the network of SoCPS, which we call FPG. Thus, the FPG gives information about the propagation of a fault in the network of SoCPS. Using this FPG, we can trace faults back to their source and apply preventive mechanisms to mitigate the potential faults.
Third, we developed a tool called SoCPSTracer to support our SafeSoCPS approach. The tool enables safety engineers to analyze safety for a network of CPSs and to produce an FPG that shows how faults propagate between systems and how they affect other components and systems.
The remainder of this paper is organized as follows. Section 2 discusses the background and related work. Section 3 describes the proposed approach. Section 4 presents a tool to support our proposed approach, and Section 5 presents a case study to validate it. Finally, the limitations of this study are presented in Section 6, followed by a concluding summary in Section 7.
3. SafeSoCPS Approach
Generally, safety analysis is often considered a combination of manual and automated techniques [
22,
23]. This section explains our approach, called SafeSoCPS, in which we follow a general four-step approach for safety analysis in CPSs. First, the safety requirements for the system under development are defined. Second, the potential safety hazards for participant systems and for SoCPS are identified using a composite hazard analysis. Though various safety analysis techniques exist, we assume that safety engineers conduct Failure Mode and Effect Analysis (FMEA), Event Tree Analysis (ETA), and Fault Tree Analysis (FTA) to identify faults, consequences, and sources of identified faults. Third, based on the safety analysis results obtained in the second step, the requirements are modified or new requirements are added (first step), and the second step is repeated. Fourth, the newly obtained knowledge about the occurrence and the manifestation of potential faults is considered to discover new faults, and the system is finally certified for safety. The SafeSoCPS approach contributes mainly to the second step of the general safety process that we follow.
In the following subsections, we explain composite safety analysis for SoCPS (SafeSoCPS).
3.1. Content Relationship among Hazard Analysis Artifacts
Content relationships among hazard analysis artifacts for multiple CPSs are defined to support the safety analysis for the network of a SoCPS. We first introduced content relationships between hazard analysis artifacts for FTA, ETA, and FMEA in [
24]. However, in this article, we present an improved version of those defined relationships. The description of these relationships is as follows:
Influence Relationship: A relationship in which a fault of one participating CPS affects one or more other participating systems in the SoCPS network.
Overlap Relationship: If a failure mode or a causal factor in FMEA and an initiating event in ETA lead to the same consequences, then the consequences (system effect in FMEA and outcomes in ETA, and vice versa) have an overlap relationship. Simply put, if two faults in the network of SoCPS result in the same consequences, then their consequences have an overlap relationship.
Countermeasure Relationship: A countermeasure relationship exists when the safety guard for a particular fault in one participating CPS is used to counter one or more faults in another participating CPS in the network of SoCPS.
Figure 1 shows example relationships among hazard analysis artifacts. For instance, the FMEA failure mode “Detection Failure” has an “Influence Relationship” with “Robot Collision”, an outcome in ETA, and with the “Robot Collision” event in FTA. Similarly, “Robot Collision”, a system effect in FMEA, has an “Overlap Relationship” with “Robot Collision”, which is an outcome of the fault “Obstacle Detection Failure”. This means that both faults, “Detection Failure” in the FMEA of the searching robot (a participating CPS in HRRS) and “Obstacle Detection Failure” in the ETA of the obstacle robot, lead to the same consequence, “Robot Collision”. The safety guard for the “Obstacle Detection Failure” fault in ETA is not known. However, the “Detection Failure” fault, whose consequence overlaps with that of “Obstacle Detection Failure”, has a safety guard, i.e., “Increase Sensor Capability”, to avoid robot collision. Therefore, the “Increase Sensor Capability” safety guard can be used as a countermeasure to mitigate the “Obstacle Detection Failure” fault in ETA.
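The Figure 1 example can be written down as a small set of typed links. The following is a minimal sketch for illustration only: the `Relationship` enum, the `Artifact` tuple, the `links_of` helper, and the role label given to “Obstacle Detection Failure” are assumptions introduced here and are not taken from SoCPSTracer.

```python
from enum import Enum
from typing import NamedTuple


class Relationship(Enum):
    INFLUENCE = "influence"
    OVERLAP = "overlap"
    COUNTERMEASURE = "countermeasure"


class Artifact(NamedTuple):
    technique: str    # "FMEA", "ETA", or "FTA"
    role: str         # e.g. "failure mode", "outcome"
    description: str


# Artifacts named in the Figure 1 discussion.
detection_failure = Artifact("FMEA", "failure mode", "Detection Failure")
fmea_collision = Artifact("FMEA", "system effect", "Robot Collision")
eta_collision = Artifact("ETA", "outcome", "Robot Collision")
fta_collision = Artifact("FTA", "event", "Robot Collision")
# Assumption: the ETA fault is modeled as an initiating event.
obstacle_failure = Artifact("ETA", "initiating event", "Obstacle Detection Failure")
sensor_guard = Artifact("FMEA", "recommended action", "Increase Sensor Capability")

# (source, relationship, target) triples described in the text.
links = [
    (detection_failure, Relationship.INFLUENCE, eta_collision),
    (detection_failure, Relationship.INFLUENCE, fta_collision),
    (fmea_collision, Relationship.OVERLAP, eta_collision),
    (sensor_guard, Relationship.COUNTERMEASURE, obstacle_failure),
]


def links_of(kind):
    """Return (source, target) descriptions for one relationship type."""
    return [(s.description, t.description) for s, k, t in links if k is kind]


print(links_of(Relationship.COUNTERMEASURE))
```

Listing the links this way makes the reasoning of the example explicit: the countermeasure link is derived from the overlap of consequences rather than stated in the ETA itself.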
3.2. Composite Safety Analysis Model
In order to realize the above defined relationships, we formalized the hazard analysis techniques and proposed a Composite Safety Analysis Model (CSAM). The definition of CSAM is as follows:
Definition 1 (Safety Analysis for SoCPS): Safety analysis for SoCPS is defined as a tuple SA = <ID, HAT, S, L>, where ID is a unique identification for the safety analysis, HAT is the hazard analysis technique applied to analyze the system S such that $S \in SoCPS$, and L is the relationship among the components of hazard analysis artifacts.
Definition 2 (FMEA): The FMEA model is defined as a tuple FMEA = <ID, I, FM, SE, CF, RA, L>, where ID is a unique identification of the FMEA set, I is a set of items/functions, {i1, i2, …}, FM is a set of failure modes, {fm1, fm2, …}, SE is the set of effects/hazards, {se1, se2, …, sen}, in FMEA, CF is the set of causal factors, {cf1, cf2, …, cfn}, RA is the set of recommended actions/safety guards, {g1, g2, …, gn}, provided for a particular fault in FMEA, and L is the established relationship link of a component C = (I, FM, SE, CF, RA) with other components in FTA and ETA such that $C_{FMEA} \rightarrow C_{FTA} \lor C_{ETA}$. That is, the components that belong to FMEA may have a relationship with the components of FTA or ETA.
Definition 3 (FTA): The FTA model is defined as a tuple FTA = <ID, G, E, TE, L>, where ID is the unique identification of the FTA set, G is the set of gates in a fault tree, E is the set of events, E = {e1, e2, …, en}, in the fault tree, TE is the set of top events, TE = {te1, te2, …, ten}, in a fault tree, and L is the established relationship link of a component C = (E, TE) with other components in ETA and FMEA such that $C_{FTA} \rightarrow C_{ETA} \lor C_{FMEA}$.
Definition 4 (ETA): The ETA model is defined as a tuple ETA = <ID, IE, PE, O, L>, where ID is a unique identification of the ETA set, IE is the set of initiating events, IE = {ie1, ie2, …, ien}, PE is the set of pivotal events, PE = {pe1, pe2, …, pen}, in ETA, O is the set of outcomes, {o1, o2, …, on}, of an initiating event in IE, and L is the established relationship link of a component C = (IE, PE, O) with other components in FTA and FMEA such that $C_{ETA} \rightarrow C_{FTA} \lor C_{FMEA}$.
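Definitions 1 to 4 map naturally onto record types. The sketch below is one possible encoding, not the data model of SoCPSTracer: the class layout, the field names, the `Link` triple, and the `add_link` helper are all assumptions made for illustration. It only enforces what the definitions state, namely that a link connects a component of one technique to a component of a different technique.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple, Union

# Assumed link encoding: (own component, other technique, other component).
Link = Tuple[str, str, str]


@dataclass
class FMEA:
    id: str
    items: Set[str] = field(default_factory=set)                # I
    failure_modes: Set[str] = field(default_factory=set)        # FM
    system_effects: Set[str] = field(default_factory=set)       # SE
    causal_factors: Set[str] = field(default_factory=set)       # CF
    recommended_actions: Set[str] = field(default_factory=set)  # RA
    links: Set[Link] = field(default_factory=set)               # L

    def components(self) -> Set[str]:
        return (self.items | self.failure_modes | self.system_effects
                | self.causal_factors | self.recommended_actions)


@dataclass
class FTA:
    id: str
    gates: Set[str] = field(default_factory=set)       # G
    events: Set[str] = field(default_factory=set)      # E
    top_events: Set[str] = field(default_factory=set)  # TE
    links: Set[Link] = field(default_factory=set)      # L

    def components(self) -> Set[str]:
        return self.events | self.top_events


@dataclass
class ETA:
    id: str
    initiating_events: Set[str] = field(default_factory=set)  # IE
    pivotal_events: Set[str] = field(default_factory=set)     # PE
    outcomes: Set[str] = field(default_factory=set)           # O
    links: Set[Link] = field(default_factory=set)             # L

    def components(self) -> Set[str]:
        return self.initiating_events | self.pivotal_events | self.outcomes


@dataclass
class SafetyAnalysis:
    id: str
    technique: Union[FMEA, FTA, ETA]  # HAT, which also carries L
    system: str                       # S


def add_link(source, component, target, target_component):
    """Record a link from a component of one technique to another technique."""
    if type(source) is type(target):
        raise ValueError("links connect components of different techniques")
    if component not in source.components():
        raise ValueError(f"unknown source component: {component}")
    if target_component not in target.components():
        raise ValueError(f"unknown target component: {target_component}")
    source.links.add((component, type(target).__name__, target_component))


fmea = FMEA("fmea-1", failure_modes={"Detection Failure"})
eta = ETA("eta-1", outcomes={"Robot Collision"})
add_link(fmea, "Detection Failure", eta, "Robot Collision")
analysis = SafetyAnalysis("sa-1", fmea, "searching robot")
print(analysis.technique.links)
```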
3.3. Diagrammatic Representation
The diagrammatic representation in SoCPS is critical because the information about fault propagation in the network of CPS can help safety engineers to mitigate a particular fault. This information may also help safety engineers to identify potential faults present in the participating CPSs and their propagation route. After identifying faults and their propagation routes, safety engineers can generate behavioral models for different CPS network configurations and provide them as input to the system verification techniques. The behavioral models of participating CPSs in SoCPS network configurations can be checked for the unwanted behaviors identified during a composite hazard analysis. Identifying the unwanted behavior during a specific configuration may help to correct the cause of the undesirable behavior in the SoCPS. However, in SoCPS, the unwanted behavior may occur due to the interplay of the systems, which cannot be corrected easily because the interaction among systems is dynamic in nature. The behavioral models for an overall network of SoCPS should be investigated to fix unwanted behaviors. Therefore, to support safety engineers, there is a need to trace unwanted behavior within a participating system and in the network of a SoCPS.
Figure 2 shows the traceability of a fault within a participating system to other participating CPSs in the network of SoCPS.
Highlighting the propagation of a fault in the original specification (
Figure 2) can help to investigate the manifestation of faults in the collaboration of different systems in the network of the SoCPS. However, since the SoCPS is a complex network of individual systems, manual analysis of such diagrams is time-consuming and not economically feasible. Therefore, we developed a tool called SoCPSTracer, which produces the fault propagation graph to show how faults propagate and what impact they have in the network of SoCPS.
The proposed FPG is a directed graph $G = (N, E)$, where $N$ is the set of nodes in the FPG and $E$ is the set of edges, where each $e \in E$ is specified by an ordered pair of nodes $n_1, n_2 \in N$. An edge $e = (n_1, n_2)$ denotes the relationship from node $n_1$ to node $n_2$, which can also be written as $n_1 \rightarrow n_2$. Further explanation of the FPG is given in Section 4.
4. SoCPSTracer
SoCPSTracer is a tool developed specifically to support the SafeSoCPS approach. It comprises three major components, i.e., the safety analysis manager, traceability analyzer, and traceability presenter, as shown in
Figure 3. SoCPSTracer is implemented in Java, and JavaFX was used to develop the user interface. For the experimental setup, the system was equipped with Core i7 processors and 32 GB of RAM. Additionally, an NVIDIA GeForce RTX 2060 GPU was added to the system to speed up computation and improve the visualization of data on the FPG. Data visualization on the FPG is supported by an open-source Java library called smartgraph (
https://github.com/brunomnsilva/JavaFXSmartGraph, accessed on 20 April 2022) that supports directed and undirected graph generation. We customized the smartgraph library to visualize the data on the FPG according to our requirements.
The major components of SoCPSTracer are described as follows:
Safety Analysis Manager: This component comprises four sub-components, i.e., FTA editor, ETA editor, FMEA editor, and importer, as shown in
Figure 4 (left, 1). The participant systems in the network of SoCPS can be analyzed using the respective hazard analysis techniques and produce hazard analysis artifacts for participating systems. The produced artifacts are then saved into the hazard analysis artifacts repository. The importer can be used to import already existing hazard analysis artifacts for participating systems in the network of SoCPS.
Traceability Analyzer: This component comprises a relation detector and traceability repository, as shown in
Figure 4 (top, 2). A relation detector is a component that identifies and connects the trace links (relationship) among hazard artifacts. Algorithm 1 is used to detect relationships among the hazard analysis artifacts. The traceability repository is used to store trace models.
Traceability Presenter: This component of SoCPSTracer consists of a traceability viewer sub-component and an impact analysis sub-component. The traceability viewer displays visual trace information among hazard analysis artifacts, which we call the FPG in SoCPSTracer, as shown in
Figure 4 (top, 3). FPG is the manifestation of the relationship among hazard analysis artifacts. The impact analysis in FPG helps to determine the impact of a fault on other participating systems.
The FPG is a digraph of vertices and edges. The edges connect the vertices or nodes in the FPG. The nodes represent faults or safety guards, whereas edges represent the relationship between two or more nodes. An edge is placed between a pair of nodes if they are related to each other in a certain way.
Formally, the FPG is expressed as a directed graph $G = (N, E)$, where $N = \{n_1, n_2, \ldots, n_k\}$ is a finite set of FPG nodes. Each node $n$ in the FPG carries information that describes it, i.e., $n = (\mathit{system}, \mathit{elementOf}, \mathit{description})$, where $\mathit{description}$ is the description of a fault or a safety guard, $\mathit{system}$ is the name of the system to which the fault or safety guard belongs, and $\mathit{elementOf}$ is the name of the hazard analysis technique used to analyze that particular node. $E$ is the set of edges, where each $e \in E$ is specified by an ordered pair of nodes $n_1, n_2 \in N$. An edge $e = (n_1, n_2)$ is directed from vertex $n_1$ to vertex $n_2$, which can also be written as $n_1 \rightarrow n_2$. To check the propagation of a specific fault in the FPG, we use a subgraph that considers only the influence relationships in the FPG and generates the propagation graph for that fault. To determine the fault propagation for a particular node in the FPG, it is necessary to find the in-degree and out-degree of that node.
Let $G_{out}(n)$ be the graph for the propagation of a specific node $n \in N$. The neighbors of $n$ are the set of nodes adjacent to $n$ through an influence relationship. Therefore, the out-degree of a fault in the FPG is used to draw $G_{out}(n)$.
$$N_{out}(n_1) = \{ n_2 \in N : \exists e \in E \, (e = \{n_1, n_2\} \lor (n_1 = n_2 \land e = \{n_2\})) \}$$
where $N_{out}(n_1)$ is the set of nodes $n_2 \in N$ for which there exists an edge $e \in E$ carrying an influence relationship such that $e = \{n_1, n_2\}$ holds.
Similarly, the in-degree of a specific fault in the FPG is used to recover the traceability of that fault, which is called $G_{in}(n)$. $G_{in}(n)$ is defined for a particular fault $n$ and shows which other faults may lead to $n$. The set of incoming edges of a vertex $n_2 \in N$ consists of all edges whose arrows point into $n_2$.
$$N_{in}(n_2) = \{ n_1 \in N : \exists e \in E \, (e = (n_1, n_2)) \}$$
Each $n$ has a unique ID. The relation $e$ gives the communication topology between nodes $n_1$ and $n_2$. An edge $e = (n_1, n_2)$ indicates a directed relationship from $n_1$ to $n_2$.
In the FPG, every node n ∊ N contains information that helps safety engineers understand the relationships of a specific fault that belongs to a particular system and is analyzed by a particular hazard analysis technique. Therefore, a node n = (system, elementOf, description) carries three kinds of information: system identifies the system to which the fault or safety guard belongs, elementOf identifies the hazard analysis technique to which the fault or safety guard belongs, and description is the definition of the fault or safety guard. This information helps to trace a fault across a network of SoCPS.
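The node structure and the in-degree/out-degree based tracing described above can be sketched as a small graph class. This is a minimal illustration under stated assumptions, not the SoCPSTracer implementation (which is written in Java): the names `Node`, `FPG`, `propagation`, and `trace_back` are hypothetical, the traversal is a plain breadth-first search over influence edges only, and the example edges are assumed for demonstration.

```python
from collections import deque
from typing import NamedTuple


class Node(NamedTuple):
    system: str       # system the fault or safety guard belongs to
    element_of: str   # hazard analysis technique (FMEA, ETA, FTA)
    description: str  # text of the fault or safety guard


class FPG:
    def __init__(self):
        self.nodes = set()
        self.edges = set()  # (source, relationship, target)

    def add_edge(self, source, target, relationship="influence"):
        self.nodes.update((source, target))
        self.edges.add((source, relationship, target))

    def out_neighbors(self, node):
        """Nodes reached from `node` by one influence edge."""
        return {t for s, r, t in self.edges if s == node and r == "influence"}

    def in_neighbors(self, node):
        """Nodes that reach `node` by one influence edge."""
        return {s for s, r, t in self.edges if t == node and r == "influence"}

    @staticmethod
    def _reach(start, step):
        # Breadth-first search; `step` yields the neighbors to follow.
        seen, queue = set(), deque([start])
        while queue:
            for nxt in step(queue.popleft()):
                if nxt != start and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def propagation(self, node):
        """All nodes a fault can propagate to (forward direction)."""
        return self._reach(node, self.out_neighbors)

    def trace_back(self, node):
        """All nodes that may lead to a fault (backward direction)."""
        return self._reach(node, self.in_neighbors)


detection = Node("searching robot", "FMEA", "Detection Failure")
collision = Node("obstacle robot", "ETA", "Robot Collision")
obstacle = Node("obstacle robot", "ETA", "Obstacle Detection Failure")
guard = Node("searching robot", "FMEA", "Increase Sensor Capability")

graph = FPG()
graph.add_edge(detection, collision)   # inter-system influence
graph.add_edge(obstacle, collision)    # assumed intra-system influence
graph.add_edge(guard, obstacle, "countermeasure")

print(graph.propagation(detection))
print(graph.trace_back(collision))
```

Because only influence edges are followed, the countermeasure edge from the safety guard does not appear in either the propagation or the trace-back result.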
Let X (faults from FTA, FMEA, and ETA) be the sets of faults whose relationships we are interested in discovering, i.e., the countermeasure relationship (R1), influence relationship (R2), and overlap relationship (R3). Let $Z$ be the disjoint union of the sets in $X$, i.e., $Z = \{x : x \in X\}$. Let $A = \{a_1, a_2, \ldots, a_m\}$ be the set of hazard analysis artifacts, i.e., failure modes, causal factors, system effects, and recommended actions obtained from FMEA; top events, intermediate events, and basic events obtained from FTA; and pivotal events, initiating events, and outcomes obtained from ETA. Let $\mathcal{A}$ be the disjoint union of $A$, i.e., $\mathcal{A} = \{a : a \in A\}$, with $\mathcal{A} \subset Z$.
The influence relationship (R2) can be established from three perspectives. Therefore, in Algorithm 1, each influence relationship is reflected separately. In the first case, n (failure mode) and m (causal factor) may have an influence relationship with a hazard artifact $a_m$ such that n and m lead to x (system effect), which belongs to FMEA, if and only if x is determined to be similar to $a_m$ (event) in FTA or $a_m$ (outcome) in ETA. The similarity between two contents of the hazard analysis techniques is calculated using the Jaccard similarity index, which is commonly used to compare two strings [25]. Equation (1) shows the Jaccard similarity index, where J(A, B) is the Jaccard similarity between string A and string B, each treated as a set:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \quad (1)$$
In the second case, y (initiating event) may have an influence relationship with a hazard artifact $a_m$ such that y leads to x, which belongs to ETA, if and only if x is determined to be similar to $a_m$ (event) in FTA or $a_m$ (system effect) in FMEA. In the third case, w (child events) may influence $a_m$ such that w leads to x, which belongs to FTA, if and only if x is determined to be similar to $a_m$ (system effect) in FMEA or $a_m$ (outcome) in ETA.
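The Jaccard similarity index used to compare artifact descriptions can be sketched in a few lines. The text does not say how strings are turned into sets or which cutoff counts as "similar", so the lowercase word tokenization and the `threshold` value of 0.5 below are assumptions chosen only for illustration.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard index of the lowercase word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    if not union:
        # Two empty strings are treated as identical (assumed convention).
        return 1.0
    return len(set_a & set_b) / len(union)


def is_similar(a: str, b: str, threshold: float = 0.5) -> bool:
    """Hypothetical similarity test; the cutoff is an assumed parameter."""
    return jaccard(a, b) >= threshold


print(jaccard("Robot Collision", "robot collision"))
print(jaccard("Detection Failure", "Obstacle Detection Failure"))
print(is_similar("Detection Failure", "Robot Collision"))
```

With word-level tokens, “Detection Failure” and “Obstacle Detection Failure” share two of three distinct words, so they count as similar under the assumed threshold, whereas “Detection Failure” and “Robot Collision” share none.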
Two possibilities exist for the countermeasure relationship (R1), as mentioned in Algorithm 1. In the first case, c (recommended action in FMEA) counters e (event in FTA) such that c is defined as the recommended action for x (failure mode) in FMEA if and only if x is determined to be similar to e in FTA. In the second case, c counters h (initiating event in ETA) such that c is defined as the recommended action for x (failure mode) in FMEA if and only if x is determined to be similar to h.
Algorithm 1 also shows that an overlap relationship (R3) may exist for x (system effect) in FMEA, where u (failure mode) leads to x, if and only if x is determined to be similar to $a_m$ (outcome) in ETA, and vice versa.
Algorithm 1: FPG generation for fault traceability |
| Input: R1, R2, R3 |
| Output: FPG |
1 | |
2 | |
3 | |
4 | |
5 | |
6 | | |
7 | | | |
8 | | | |
9 | | | |
10 | | | |
11 | | | |
12 | | | |
13 | | end |
14 | end |
15 | FPG ← R1((c, e) ∨ (c, h)) + R2(((n ∧ m), a_m) ∨ (y, a_m) ∨ (w, a_m)) + R3(x, a_m) |
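The relationship rules described for R1, R2, and R3 can be illustrated with a short sketch. It is written from the prose description only, not from the steps of Algorithm 1, so it should be read as one possible interpretation: the row dictionaries, their keys, the `detect_relations` function, the word-level Jaccard comparison, the 0.5 threshold, and the example causal factor and child event are all assumptions.

```python
def jaccard(a, b):
    """Jaccard index over lowercase word sets (assumed tokenization)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def detect_relations(fmea, eta, fta, threshold=0.5):
    """Return FPG edges as (source, relationship, target) triples.

    Nodes are (technique, description) pairs.
    """
    def similar(a, b):
        return jaccard(a, b) >= threshold

    edges = set()
    # Consequences of each technique, used as candidate targets a_m.
    effects = [("FMEA", r["system_effect"]) for r in fmea]
    outcomes = [("ETA", r["outcome"]) for r in eta]
    events = [("FTA", r["event"]) for r in fta]

    def influence(sources, consequence, targets):
        # R2: sources lead to `consequence`, which resembles a target.
        for target in targets:
            if similar(consequence, target[1]):
                for source in sources:
                    edges.add((source, "influence", target))

    for r in fmea:
        # R2, case 1: failure mode and causal factor lead to a system effect.
        influence([("FMEA", r["failure_mode"]), ("FMEA", r["causal_factor"])],
                  r["system_effect"], events + outcomes)
        # R1: the recommended action counters a similar FTA event
        # or a similar ETA initiating event.
        guard = ("FMEA", r["recommended_action"])
        faults = events + [("ETA", e["initiating_event"]) for e in eta]
        for fault in faults:
            if similar(r["failure_mode"], fault[1]):
                edges.add((guard, "countermeasure", fault))
        # R3: an FMEA system effect overlaps a similar ETA outcome.
        for outcome in outcomes:
            if similar(r["system_effect"], outcome[1]):
                edges.add((("FMEA", r["system_effect"]), "overlap", outcome))
    for r in eta:
        # R2, case 2: an initiating event leads to an outcome.
        influence([("ETA", r["initiating_event"])], r["outcome"],
                  events + effects)
    for r in fta:
        # R2, case 3: child events lead to an event.
        influence([("FTA", w) for w in r["children"]], r["event"],
                  effects + outcomes)
    return edges


fmea_rows = [{"failure_mode": "Detection Failure",
              "causal_factor": "Low Sensor Range",  # hypothetical value
              "system_effect": "Robot Collision",
              "recommended_action": "Increase Sensor Capability"}]
eta_rows = [{"initiating_event": "Obstacle Detection Failure",
             "outcome": "Robot Collision"}]
fta_rows = [{"event": "Robot Collision",
             "children": ["Obstacle Not Detected"]}]  # hypothetical value

fpg_edges = detect_relations(fmea_rows, eta_rows, fta_rows)
for edge in sorted(fpg_edges):
    print(edge)
```

On this input the sketch reproduces the conclusion of the Figure 1 discussion: the “Increase Sensor Capability” safety guard is linked as a countermeasure to “Obstacle Detection Failure” because the two failure descriptions are similar under the assumed threshold.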