System-of-Systems Resilience Analysis and Design Using Bayesian and Dynamic Bayesian Networks

Jiao, Tianci; Yuan, Hao; Wang, Jing; Ma, Jun; Li, Xiaoling; Luo, Aimin

doi:10.3390/math12162510


by Tianci Jiao ¹, Hao Yuan ², Jing Wang ¹, Jun Ma ¹, Xiaoling Li ¹,* and Aimin Luo ²,*

¹ College of Computer, National University of Defense Technology, Changsha 410075, China
² College of System Engineering, National University of Defense Technology, Changsha 410075, China
* Authors to whom correspondence should be addressed.

Mathematics 2024, 12(16), 2510; https://doi.org/10.3390/math12162510

Submission received: 3 June 2024 / Revised: 25 July 2024 / Accepted: 3 August 2024 / Published: 14 August 2024


Abstract:

A System-of-Systems (SoS) is characterized by both independence and inter-dependency. This inter-dependency, while allowing an SoS to achieve its objectives, also means that failures can cascade throughout the SoS. An SoS needs to be resilient to cope with complex internal and external environments. In this paper, we propose a resilience analysis method for an SoS based on a hierarchy structure. First, we establish a hierarchy structure, ranked from high to low as the capability level, the activity level and the system level. Then, Bayesian Networks (BNs) and Dynamic Bayesian Networks (DBNs) are used to analyze the resilience of the SoS. A resilience-based system importance metric is introduced and used in the budget allocation optimization problem in the development domain of an SoS. We also propose a mathematical programming model that optimizes SoS resilience by making optimal use of the budget assigned to each subsystem. The approach is demonstrated in a case study of a Next Generation Air Transportation setting. The results provide evidence that the proposed inter-dependency analysis based on Bayesian theory and the SoS resilience design approach can help SoS system engineers increase the expected SoS resilience in the development domain.

Keywords:

resilience analysis; resilience design; Bayesian Networks; Dynamic Bayesian Networks; budget allocation

MSC:

90-10

1. Introduction

With the rapid development of network and information technology, systems are becoming ever more closely interconnected and interoperable. Heterogeneous component systems, which are independent in operation and management, relate to, interact with and coordinate with each other, and together can provide capabilities that no single system has [1]. This kind of integrated large-scale system, called a System-of-Systems (SoS), is a current research hotspot in the field of systems engineering. SoSs can be found in various domains, such as communication networks, the national transportation system, the ballistic missile defense system, and the national airspace system [2,3,4]. Different fields define the SoS differently. For example, He et al. [5] defined the combat SoS as a higher level, larger scale, and more closely coordinated combat system composed of different subsystems that can independently perform a certain combat mission. Hynes et al. [6] defined the economic SoS as an enormous system composed of various links, levels and fields of economic activities, which affect and depend on each other and develop together. Shen et al. [7] described the comprehensive transportation SoS as a transportation network formed during the process of socialized transportation, which is characterized by division of labor, organic integration, connectivity, and reasonable layout. Shpak et al. [8] defined the enterprise management SoS as a set of systems that coordinate, control, and optimize various internal works of the enterprise in order to achieve the established goals. As a complex large-scale engineering system, an SoS has a long operation cycle and faces a complex and changeable external environment. In addition, the inter-dependency among the systems in the SoS is intricate. All of these factors can lead to development failures of the SoS. Therefore, it is necessary to analyze and evaluate the resilience of the SoS.

The word resilience originated from the Latin word “resiliere”, which means to “bounce back” [9]. Several definitions of resilience have been offered. Many of them are similar to one another, and many overlap with existing concepts such as robustness and recoverability. However, resilience differs from those concepts. Robustness refers to the ability of a system to survive in abnormal and dangerous situations, while recoverability focuses on the ability of the system to quickly recover to a normal state after being damaged [10,11]. Resilience can be considered as the combination of these two abilities: the ability of a system to reduce losses while quickly restoring a normal state after being attacked [12]. As a new subject, the SoS has a wide range of application prospects. As the scale of an SoS increases, its stable operation becomes gradually more difficult; resilience allows the SoS to keep operating in a degraded mode when capabilities are lost and to recover quickly from the impaired state.

Some scholars offer a general definition of resilience that spans multiple domains. For example, Pregenzer [13] defined resilience as “the ability of system to maintain its vital functions while absorbing persistent and unpredictable changes”. Allenby [14] defined resilience as “the ability of system to maintain its function and structure in the face of internal and external changes”. Haimes [15] defined resilience as “the ability of system to withstand significant damage within an acceptable range and recover within an acceptable range of time and cost”. The concept of resilience also differs between subjects and application fields, and scholars have given a variety of field-specific definitions. Sheffi [16] defined the resilience of an enterprise as the ability to maintain or restore a stable state, which enables the enterprise to continue to function normally after being damaged. From the perspective of ecosystem resilience, Adger [17] defined social resilience as “the ability of a group to cope with external pressures and disturbances caused by social, political, and environmental changes”. Rose et al. [18] analyzed the major impacts of natural disasters and man-made operations on the economy and described economic resilience as “the ability of enterprises and regions to avoid possible losses”. Wears et al. [19] gave a definition of resilience in engineering: the ability of a system to adjust its functional state in the face of interference or unpredictable changes. The military field also attaches great importance to the study of resilience. Lan et al. [20] described a resilient command information system as a system that can still adjust its own state to complete the specified mission in the face of external environment changes.

Resilience analysis is the basis of resilience design and optimization. Resilience analysis methods are mainly divided into three categories: qualitative, semi-quantitative, and quantitative. Qualitative analysis, also called non-data analysis, is mainly used to build a conceptual framework. Speranza et al. [21] developed a conceptual framework to analyze the resilience of human livelihoods. The framework proposed three dimensions of resilience: buffer capacity (the amount of change a system can withstand), self-organization capacity (the emergence of new structures from existing social structures), and learning (adaptability). Semi-quantitative resilience analysis methods usually consist of a series of questions that aim to assess characteristics of resilient systems, such as redundancy and resource satisfaction. For example, Cutter et al. [22] identified 36 resilience variables for communities coping with natural disasters, including redundancy, resource abundance and robustness. These 36 variables were further grouped into five categories: infrastructure, institutions, economy, society, and social capital. Quantitative analysis is mainly used to put forward specific indices and methods for calculating the resilience of a system (or SoS). Existing quantitative methods mainly fall into two categories. One measures the ability of the system (or the capabilities of the SoS). The other studies how the structure of a system (or SoS) affects its resilience. We broadly characterize these general measures as deterministic and stochastic, each of which has been used to describe static and dynamic system behavior. The difference between the deterministic and stochastic methods is whether uncertainty is incorporated into the measure. Bruneau et al. [23] proposed a typical deterministic method to analyze resilience. They defined four dimensions in the resilience triangle model: robustness, rapidity, resource abundance, and redundancy. Meanwhile, they proposed a static index for analyzing social resilience. The Bayesian Network (BN) also has a wide range of applications in resilience analysis. Yodo [24] developed a resilient supplier selection method based on the BN. Niamat et al. [25] used the BN to study and analyze the resilience of Washington’s power infrastructure.

Due to the inter-dependency between the constituent systems of an SoS, it is difficult for SoS system engineers to manage SoS capabilities in the operational and development domains. These complex inter-dependencies among the constituent systems can be attributed to four attributes: (1) the heterogeneity of constituent systems, (2) the distributed nature of these systems, (3) uncertainty about the future state of the SoS, and (4) SoS hierarchy [26]. To analyze the complex inter-dependencies of the SoS, it is necessary to understand how to model them. A hierarchy structure is a typical characteristic of an SoS that is designed to meet complex and advanced functional requirements [27,28]. The hierarchy structure can help SoS system engineers understand the relation between the SoS capabilities and the constituent systems. Therefore, three different levels are used to represent the structure of the SoS: the higher-level nodes describe the capabilities of the SoS, those in the middle level denote the activities needed to realize the capabilities, and the ones in the lower level designate the constituent systems needed to complete those activities. We use the BN to analyze the relationship between levels and establish a static analysis model for SoS capabilities. Furthermore, the DBN is used to obtain the evolution of SoS capabilities over time, and a resilience analysis index based on this evolution is proposed. In addition, we analyze the impact of different systems on the resilience of the SoS, which can be used as a basis for the allocation of budget in the development domain. A mathematical programming model that optimizes SoS resilience by making optimal use of the budget assigned to each subsystem is proposed. Finally, the Next Generation Air Transportation setting is used to demonstrate the resilience analysis and design approach proposed in this paper.

Continuously rising demand in the National Airspace System (NAS) is one of the major contributing factors that aggravate system-wide congestion and delays. Therefore, mitigating congestion in the NAS has become one of several priorities for the Federal Aviation Administration (FAA). To this end, the FAA proposed new operating concepts and technologies to achieve the Next Generation Air Transportation System (NextGen) [29]. The analysis and design of NextGen’s resilience are of great significance for improving overall navigation capacity.

The rest of the paper is organized as follows: Section 2 introduces the hierarchy structure of the SoS. Section 3 is dedicated to reviewing the basics of the BN and DBN. In Section 4, we propose a resilience metric and a resilience-based system importance metric. Section 5 presents a mathematical model aimed at optimizing SoS resilience. A Next Generation Air Transportation setting is exhibited in Section 6 to illustrate the resilience analysis and design approach. Finally, discussion and conclusions are given in Section 7 together with the proposed future work.

2. Hierarchy Structure of an SoS

2.1. Hierarchical Representation of an SoS

As introduced in the previous section, managing the development and evolution of SoS capabilities remains a challenge due to the complex inter-dependency between constituent systems. Effects that propagate uncontrolled through these inter-dependencies may lead to severe failures of dependent systems within the SoS. For example, in early 2001, the disruption of electric power in California affected oil and gas production, refinery operations, and the pipeline transport of gasoline [30,31].

The success of SoS development can be determined by the degree to which it meets the desired objectives. Therefore, the first phase of any SoS development activity is to define an appropriate objective. Once the SoS system engineers have defined the objectives of the SoS, the next step is to identify the activities required to meet them; the completion of these activities depends on the collaboration of one or more constituent systems within the SoS. During this process, SoS system engineers can put forward many alternatives for developing the SoS. Their main task is to choose the alternative that most effectively achieves the SoS capabilities and resilience under multi-party constraints [32,33,34,35].

One of the common features of different SoSs is their hierarchy [36,37,38]. Therefore, we represent the SoS hierarchically with three distinct levels: entities in the highest level represent the capabilities of the SoS, those in the middle level denote the activities to achieve those capabilities, and the ones in the lowest level designate the constituent systems within the SoS needed to complete those activities. The hierarchical representation also helps us to understand how the changes of physical systems influence the capabilities of the SoS. Figure 1 shows a hierarchical abstraction of the desired SoS capability, the activities that must be met to achieve this capability, and the physical systems needed to complete those activities. Color represents different levels: blue (SoS capability level), green (activity level) and orange (system level). The arrows represent the inter-dependency between nodes within the SoS. The ellipses around the collection of nodes and the lines from the ellipses to the node at the next higher level mean that the nodes in the ellipses are required to achieve the node at the next higher level.

The hierarchical representation of an SoS, as shown in Figure 1, relates the overall objectives of the SoS at the higher levels to the material features at the lower levels through “Why, What, How” relationships, which indicate how higher levels are achieved, what higher levels consist of, and why lower levels exist [39]. “Why” answers the question “Why should we do this?”; the highest level serves as the “Why” for the lower levels. The activity level answers “How” the capability level is accomplished, namely through activities, which can in turn be broken down into specific physical systems. The system level describes “What” we need to complete the activities.

The littoral combat ship (LCS) is used as an example to demonstrate the hierarchical representation process of an SoS [40]. The LCS is the first vessel in the new surface ship family of the US Navy. The LCS system is designed to counter growing potential threats in the littoral zone such as coastal mines, quiet diesel submarines, and terrorists on small, fast, and armed boats [41,42]. The LCS is an SoS composed of independent, heterogeneous and evolving subsystems nested in a hierarchy structure, so decomposing the LCS into a hierarchy helps to better understand it. Figure 2 shows the decomposition of the LCS. The first step is to determine the objective of the LCS, which is defined by the combat mission it is currently performing. We assume that the objective of the LCS is to find and destroy enemy targets. Then, by asking “How”, the capability of the LCS can be decomposed into three specific activities (mine warfare, anti-submarine warfare and anti-surface warfare). At the system level, the LCS SoS includes six different systems: Armed Helicopter for Surveillance and Attack Missions (MH-60R), Helicopter for Airborne Mine Counter-Measure Missions (MH-60S), Unmanned Air Vehicle (UAV), Unmanned Surface Vehicle (USV), Remote Mine Hunting System (RMS) and the LCS. A complete hierarchy of the LCS allows SoS system engineers to identify the relationships between capabilities, activities and constituent systems effectively.

2.2. Challenges of Inter-Dependency Analysis of the SoS under Uncertainty

One of the difficulties of inter-dependency analysis of the SoS is estimating the properties of the constituent systems (e.g., failure probabilities). Because little information is available about nascent systems, it is a challenge to analyze the structure and capabilities of an SoS using probabilistic methods. In this case, expert opinions can be used to estimate the properties of constituent systems. A probability that expresses the degree of belief in a proposition or, more generally, a lack of complete knowledge is called a subjective probability [43]. In contrast, a probability estimated from historical data is called a frequentist probability [44]. SoS system engineers may need to combine the two kinds of probability to estimate properties more accurately. Therefore, we need a method that supports the use of both subjective and frequentist probabilities.

In the past, significant progress has been made on the inter-dependency analysis of the SoS. A Markov chain where the transition probabilities are defined as the dependency strengths between systems was used to design an SoS with significant inter-dependency between constituent systems [45]. Functional Dependency Network Analysis was developed to evaluate the effect of topology and possible degraded functioning of one or more systems on the operability of each system in the network [46]. However, none of the above methods can effectively handle the challenges of applying expert opinions and dynamically updating the system over time. Therefore, a Bayesian Network method is proposed to provide a formal framework for tackling the challenges discussed above. There are four reasons to choose the BN for inter-dependency analysis:

It allows an intuitive and visual representation of inter-dependency in the SoS and provides a way for fusing field data, statistical analysis results, simulation outputs, analytical equations, and expert opinions in one summary model;
It takes into account the uncertain state of each constituent system, which also influences the completion of activities and the achievement of SoS capabilities;
It can fuse new observations with prior information to update the state of the system in real time;
Uncertainty of the SoS can be explicitly calculated through the system state values and by propagating the uncertainty through the network.

3. Bayesian and Dynamic Bayesian Networks

3.1. Bayesian Network

Bayesian Networks (BNs), also known as directed acyclic graph models or Bayesian belief networks, are probabilistic graphical models developed on the basis of Bayesian theory [47]. Bayesian theory can be expressed simply by the following formula:

P (A | B) = \frac{P (B | A) P (A)}{P (B)},

(1)

where $P(A | B)$ is the belief in hypothesis A upon observing evidence B, $P(B | A)$ is the probability that B is observed if A is true, and $P(B)$ is the probability that the evidence occurs. $P(A | B)$ is known as the posterior probability and $P(A)$ is called the prior probability [48].
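As a small numerical illustration of Equation (1), the following Python sketch computes a posterior from a prior and two likelihoods. The function name `posterior`, the expansion of $P(B)$ by the law of total probability, and all numbers are assumptions introduced here for illustration; they are not taken from the case study.

```python
def posterior(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A | B) from Equation (1).

    P(B) is expanded as P(B|A)P(A) + P(B|not A)P(not A).
    """
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    if evidence == 0.0:
        raise ValueError("P(B) is zero, so the posterior is undefined")
    return p_b_given_a * prior_a / evidence


# Hypothetical numbers: A = "system has failed", B = "alarm observed".
p_failed_given_alarm = posterior(prior_a=0.1, p_b_given_a=0.9, p_b_given_not_a=0.2)
```

With these assumed inputs, observing the alarm raises the belief in a failure from the prior of 0.1 to one third.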

A BN can investigate the properties of a conditional probability distribution of a set of random variables according to the topology of the probability graph. One of the advantages of the BN is the ability to represent the joint probability $P(Y_{1}, Y_{2}, \dots, Y_{n})$ using the conditional probabilities of all variables $(Y_{1}, Y_{2}, \dots, Y_{n})$:

P (Y_{1}, Y_{2}, \dots, Y_{n}) = P (Y_{1} | Y_{2}, \dots, Y_{n}) P (Y_{2} | Y_{3}, \dots, Y_{n}) \dots P (Y_{n - 1} | Y_{n}) P (Y_{n}) .

(2)
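Equation (2) is the general chain rule. In a BN, each factor only needs to condition on the parents of the node, which is what keeps the representation compact. The sketch below illustrates this for a hypothetical three-node chain (system S, activity A, capability C). The structure and all probability tables are assumed values chosen for illustration, not results from this paper.

```python
from itertools import product

# Hypothetical chain S -> A -> C with Boolean states (True = working/achieved).
p_s = {True: 0.9, False: 0.1}                      # P(S)
p_a_given_s = {True: {True: 0.95, False: 0.05},    # P(A | S=True)
               False: {True: 0.20, False: 0.80}}   # P(A | S=False)
p_c_given_a = {True: {True: 0.90, False: 0.10},    # P(C | A=True)
               False: {True: 0.10, False: 0.90}}   # P(C | A=False)


def joint(s: bool, a: bool, c: bool) -> float:
    """P(S, A, C) = P(C | A) P(A | S) P(S): each node conditions on its parent only."""
    return p_c_given_a[a][c] * p_a_given_s[s][a] * p_s[s]


states = [True, False]
# The factorized joint must sum to one over all assignments.
total = sum(joint(s, a, c) for s, a, c in product(states, repeat=3))
# Marginal probability that the capability is achieved.
p_capability = sum(joint(s, a, True) for s, a in product(states, repeat=2))
```

Enumerating all assignments is only practical for very small networks; it is used here because it makes the factorization explicit.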

In practical applications, nodes in the BN may be influenced not only by their parent nodes but also by certain latent factors. The NoisyOR function relaxes the assumption that a variable can be in the true state only if one of its parents is in the true state [49]. It introduces a new parent node named the “leak” node, which represents hidden causes that can make the child node true. For example, suppose that n factors $(A_{1}, A_{2}, \dots, A_{n})$ affect whether node Z is true, and that the probability of Z being true is $w_{i}$ when $A_{i}$ is the only factor that is true, i.e., $w_{i} = P (Z = \mathrm{true} \mid A_{i} = \mathrm{true}, A_{j} = \mathrm{false} \text{ for each } j \neq i)$. If the probability of Z being true when all of its parent nodes are false is l, the NoisyOR function of Z can be written as

\mathrm{NoisyOR} (A_{1}, w_{1}, A_{2}, w_{2}, \dots, A_{n}, w_{n}, l) .

(3)

The conditional probability distribution of Z can be calculated using the NoisyOR:

P (Z = \mathrm{false} | A_{1}, A_{2}, \dots, A_{n}) = (1 - l) \prod_{i = 1}^{n} [1 - w_{i}],

(4)

P (Z = \mathrm{true} | A_{1}, A_{2}, \dots, A_{n}) = 1 - P (Z = \mathrm{false} | A_{1}, A_{2}, \dots, A_{n}) .

(5)
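A minimal Python sketch of Equations (4) and (5) follows. It assumes the usual NoisyOR reading in which the product in Equation (4) runs only over the parents that are currently true, so that parents in the false state contribute a factor of 1; the equation as printed does not state this restriction. The function name `noisy_or_true` and the example weights are hypothetical.

```python
from math import prod
from typing import Sequence


def noisy_or_true(parent_states: Sequence[bool],
                  weights: Sequence[float],
                  leak: float) -> float:
    """Return P(Z = true | parents) under a NoisyOR with a leak term.

    Assumption: only parents in the true state contribute a factor (1 - w_i).
    """
    if len(parent_states) != len(weights):
        raise ValueError("one weight is required per parent")
    # Equation (4): probability that neither the leak nor any active parent triggers Z.
    p_false = (1.0 - leak) * prod(
        1.0 - w for active, w in zip(parent_states, weights) if active
    )
    # Equation (5): complement of the false probability.
    return 1.0 - p_false


# Hypothetical weights for two parents and a 5% leak.
p_none = noisy_or_true([False, False], [0.8, 0.6], leak=0.05)
p_first = noisy_or_true([True, False], [0.8, 0.6], leak=0.05)
p_both = noisy_or_true([True, True], [0.8, 0.6], leak=0.05)
```

When no parent is active, the result reduces to the leak probability l, which matches the definition of l given in the text.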

3.2. Dynamic Bayesian Network

A BN can be used to analyze a static system or SoS, but it does not consider the effects of time on the system state and the SoS structure, which is a limitation in a dynamic and continuously changing world. In general, the resilience of an SoS is a process rather than a state, which calls for a tool that can account for changes of the SoS state over time, such as Dynamic Bayesian Networks (DBNs). DBNs are BNs that connect variables across adjacent time slices, and they can be seen as a model that combines the BN with a dynamic state and structure [50]. A DBN considers the interrelationship between the external influencing factors and the internal factors of an SoS, reflects the probabilistic dependency between nodes, and describes the change in the nodes’ states over time. It extends the BN by adding the time dimension, so DBNs have a wider range of application than general BNs. A DBN is an acyclic graphical model for statistical processes. The conditional joint probability of the variables at time step t, given the earlier time steps, can be calculated using the following formula:

P (Z_{t} | Z_{t - 1}, Z_{t - 2}, Z_{t - 3}, \dots, Z_{1}) = \prod_{i = 1}^{N} P (Z_{t}^{i} | \mathrm{Pa} (Z_{t}^{i})),

(6)

where $Z_{t}^{i}$ is the $i^{th}$ node in the $t^{th}$ time step and $\mathrm{Pa}(Z_{t}^{i})$ are the parent nodes of $Z_{t}^{i}$, which can be in the same or the previous time step.

We assume that the structure of DBN will not change over time, so the variables in the DBN remain unchanged. Therefore, the probability distribution of variables over all time steps can be obtained by expanding the DBN as follows:

P (Z_{1 : T}) = \prod_{t = 1}^{T} \prod_{i = 1}^{N} P (Z_{t}^{i} | \mathrm{Pa} (Z_{t}^{i})) .

(7)
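To make the unrolling in Equation (7) concrete, the sketch below propagates the marginal state probability of a single two-state node (up or down) through successive time slices. It assumes a first-order DBN with one node whose only parent is its own copy in the previous slice, which reduces to a two-state Markov chain. The function name `unroll` and the transition probabilities are hypothetical.

```python
from typing import List


def unroll(p_up_initial: float, p_stay_up: float, p_repair: float, steps: int) -> List[float]:
    """Return P(Z_t = up) for t = 0..steps under a time-invariant transition model.

    p_stay_up: P(Z_t = up | Z_{t-1} = up)
    p_repair:  P(Z_t = up | Z_{t-1} = down)
    """
    trajectory = [p_up_initial]
    for _ in range(steps):
        previous = trajectory[-1]
        # Marginalize over the state in the previous time slice.
        trajectory.append(p_stay_up * previous + p_repair * (1.0 - previous))
    return trajectory


# Hypothetical scenario: the node is down at t = 0 and recovers over three steps.
trajectory = unroll(p_up_initial=0.0, p_stay_up=0.95, p_repair=0.5, steps=3)
```

Because the structure and the conditional probabilities are the same in every slice, one transition model is reused at each step, which mirrors the assumption stated before Equation (7).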

Hulst et al. [51] put forward an extension of the original DBN, which introduced nodes connected only to the first and last time step of the DBN. These nodes are called anchor nodes and terminal nodes.

Anchor node (A): a node outside the temporal plate of a DBN that only affects the nodes in the first time step;
Terminal node (T): a node outside the temporal plate of a DBN that only affects the nodes in the last time step.

Figure 3a depicts an unexpanded DBN with anchor and terminal nodes, where the nodes in the temporal plate (dashed box) represent variables whose states change over time, and nodes $A$, $C$, $T$ represent static variables. A solid line between nodes represents the causal relationship between two variables, and a dashed line shows the causal relationship between two nodes in different time steps, where the number “1” indicates that the node only affects its child node in the next time step. Figure 3b,c show a second-order DBN and an expanded DBN, in which the variables inside the temporal plate are connected with one another through temporal lines and appear in each time step. In this case, the joint probability distribution of the DBN across time steps can be expressed as follows:

P (A, U_{1 : T}, C, T) = P (A) P (C) \prod_{i = 1}^{N} {P (U_{1}^{i} | \mathrm{Pa} (U_{1}^{i}), A, C)} \prod_{t = 2}^{T} \prod_{i = 1}^{N} {P (U_{t}^{i} | \mathrm{Pa} (U_{t}^{i}), C)} P (T | \mathrm{Pa} (T)),

(8)

where $U^{i}$ are the variables in the temporal plate, $\mathrm{Pa}(U_{t}^{i})$ are the parent nodes of $U_{t}^{i}$, N is the number of variables in the temporal plate, and T represents the number of time steps.

As mentioned above, one main characteristic of a DBN is that variables are connected across time steps, which provides a basis for analyzing how SoS capabilities change. Figure 4 shows how to obtain the change process of a capability using DBNs, as well as the stages at which the absorption, adaptation, and restorability of an SoS take effect. In Figure 4, t = 0 represents the initial state of the SoS. The first time step (t = 1) is dedicated to observing the impact of the attack on the SoS capability. In the second time step, the absorption and adaptation of the SoS reduce the loss of SoS capability. At time n, the SoS capability drops to its minimum level and restorability immediately comes into play; the SoS capability begins to recover gradually and returns to the initial state at the $T^{th}$ time step. At this point, we have obtained the whole change process of the SoS capability, which helps us analyze the resilience of the SoS in the next step.

4. Resilience Metric and Resilience-Based System Importance

4.1. Resilience Analysis Metric

When designing the resilience of an SoS, it is necessary for SoS system engineers to quantitatively analyze the resilience state of different design schemes in order to make the best design decision. Therefore, the analysis of SoS resilience is the basic part of SoS resilience design.

According to the description of Figure 4, the change process of SoS capability can be divided into four stages: the initial stability stage, the decline stage, the recovery stage and a new stability stage. The role of resilience is mainly reflected in the decline and recovery stages. As a consequence, existing resilience metrics usually focus on these two stages. Bruneau et al. [23] defined resilience as the ability of an SoS to absorb the impact of an attack and recover quickly after a reduction in capability. As shown in Figure 5, Bruneau defined the value of resilience as the area of the shaded region between the actual SoS capability and the expected SoS capability:

RL_{s} = \int_{t_{1}}^{t_{3}} [C (t_{1}) - C (t)] d t,

(9)

where $t_{1}$ refers to the time when the attack occurs, $t_{3}$ is the time when the SoS capability is fully restored, $C(t)$ is the capability of the SoS at time t, and $C(t_{1})$ indicates the expected SoS capability. Bruneau assumed that the capability of the new stability stage was the same as in the initial stability stage, i.e., $C(t_{1}) = C(t_{3})$.

Based on the method of Bruneau, Cimellaro [52] proposed a measure of resilience using the integral of SoS capability across the change process:

R_{s} = \int_{t_{1}}^{t_{3}} C (t) d t .

(10)

However, the above methods do not consider how the length of time during which the SoS capability is below the expected state affects the resilience analysis. For example, consider two SoS design schemes (A and B) whose capability reduction and recovery are both linear processes. Schemes A and B have the same minimum capability after being attacked, but scheme A requires a longer recovery time. According to the definition of resilience, the resilience of scheme B is clearly better than that of scheme A, yet the method of Cimellaro assigns scheme A the larger value because its integral extends over a longer interval. We therefore introduce the recovery time into the method of Cimellaro, and the SoS resilience based on the change process of capability can be calculated as follows:

R_{\mathrm{SoS}} = \frac{\int_{t_{1}}^{t_{3}} C (t) d t}{t_{3} - t_{1}} .

(11)
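The effect of the normalization in Equation (11) can be checked numerically. The sketch below evaluates Equations (10) and (11) with the trapezoidal rule for two hypothetical piecewise-linear capability curves that share the same minimum capability but differ in recovery time, as in the scheme A and B example above. The function names, the sample times and the capability values are assumptions made for illustration.

```python
from typing import Sequence


def capability_integral(times: Sequence[float], capability: Sequence[float]) -> float:
    """Equation (10): integral of C(t) from t_1 to t_3 by the trapezoidal rule."""
    return sum(
        (t_next - t_prev) * (c_prev + c_next) / 2.0
        for t_prev, t_next, c_prev, c_next
        in zip(times, times[1:], capability, capability[1:])
    )


def sos_resilience(times: Sequence[float], capability: Sequence[float]) -> float:
    """Equation (11): the integral divided by the time span t_3 - t_1."""
    return capability_integral(times, capability) / (times[-1] - times[0])


# Hypothetical schemes: attack at t = 0, minimum capability 0.5 reached at t = 1.
scheme_a = ([0.0, 1.0, 5.0], [1.0, 0.5, 1.0])  # fully restored at t = 5
scheme_b = ([0.0, 1.0, 3.0], [1.0, 0.5, 1.0])  # fully restored at t = 3

integral_a = capability_integral(*scheme_a)
integral_b = capability_integral(*scheme_b)
resilience_a = sos_resilience(*scheme_a)
resilience_b = sos_resilience(*scheme_b)
```

With these assumed curves, the unnormalized integral of Equation (10) is larger for the slower scheme A (3.75 versus 2.25), whereas the time-averaged value of Equation (11) is 0.75 for both, so the longer interval no longer inflates the score.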

4.2. Resilience-Based System Importance

The reliability system importance metric measures the amount by which the failure of a system can affect the reliability of an SoS [53]. By analogy with the reliability context, the resilience-based system importance (RSI) is defined as the amount by which the resilience of an SoS is reduced by a system’s failure [54]. A prerequisite for calculating the RSI is a resilience metric, which has been proposed in Section 4.1.

Let $R^{+ i}$ be the resilience of the SoS when the $i^{th}$ system is available and $R^{- i}$ the resilience when it is not. Then, the RSI of the $i^{th}$ system is the difference between $R^{+ i}$ and $R^{- i}$, and it is a positive value. Algorithm 1 explains the steps to calculate the value of the RSI.

Algorithm 1 Resilience-based System Importance Algorithm.

Input: the $i^{th}$ system of the SoS
Output: $RSI_{i}$
1: Calculate the resilience of the SoS when the $i^{th}$ system is available ($R^{+ i}$)
2: Calculate the resilience of the SoS when the $i^{th}$ system is not available ($R^{- i}$)
3: Calculate the resilience-based system importance $RSI_{i}$ ($RSI_{i} = R^{+ i} - R^{- i}$)
4: Return $RSI_{i}$
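The steps of Algorithm 1 can be sketched in Python as follows. The sketch assumes that some evaluator is available that returns the SoS resilience of Equation (11) with a given system forced to be available or unavailable; in practice this would come from the DBN, but here it is replaced by a hypothetical lookup table. All names and numbers are illustrative.

```python
from typing import Callable, Dict, Iterable


def resilience_based_importance(
    systems: Iterable[str],
    sos_resilience: Callable[[str, bool], float],
) -> Dict[str, float]:
    """Return RSI_i = R^{+i} - R^{-i} for every system name in `systems`."""
    importance = {}
    for name in systems:
        r_plus = sos_resilience(name, True)    # step 1: system available
        r_minus = sos_resilience(name, False)  # step 2: system not available
        importance[name] = r_plus - r_minus    # step 3: difference
    return importance                          # step 4


# Hypothetical resilience values standing in for a DBN evaluation.
TABLE = {("S1", True): 0.90, ("S1", False): 0.70,
         ("S2", True): 0.90, ("S2", False): 0.85}

rsi = resilience_based_importance(["S1", "S2"], lambda name, up: TABLE[(name, up)])
```

Passing the evaluator as a function keeps the importance calculation independent of how the resilience itself is obtained.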

SoS system engineers want to maximize the resilience of a new SoS, but this goal conflicts with real-world constraints such as tight budgets. Before designing the resilience of the proposed SoS, SoS system engineers must understand the relationship between resilience optimization and resource allocation by conducting a trade-off analysis between the two. With continuing budget cuts, such choices are hard to make, but the RSI can help. The higher the RSI of a system, the more budget it should receive when designing a resilient SoS. Therefore, we design a budget allocation model based on the RSI:

B_{i} = \frac{\mathrm{RSI}_{i}}{\sum_{j = 1}^{N} \mathrm{RSI}_{j}} \cdot B, \quad B_{i} \leq B_{\mathrm{max}}

(12)

where B is the total budget limit for designing a resilient SoS, $B_{i}$ is the budget allocated to the $i^{th}$ system, $B_{\mathrm{max}}$ is the maximum budget needed to optimize the system, and N denotes the number of systems to be optimized in the SoS.
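A minimal sketch of the proportional allocation in Equation (12) is given below. Equation (12) does not say what happens to a share that exceeds $B_{max}$; this sketch simply clips the share at the cap and reports the clipped amount as unallocated, which is an assumption made here. The function name `allocate_budget` and the example values are hypothetical.

```python
from typing import Dict, Tuple


def allocate_budget(rsi: Dict[str, float],
                    total_budget: float,
                    b_max: float) -> Tuple[Dict[str, float], float]:
    """Split `total_budget` in proportion to RSI, capping each share at `b_max`.

    Returns the per-system budgets and the amount left unallocated by the cap.
    """
    total_rsi = sum(rsi.values())
    if total_rsi <= 0.0:
        raise ValueError("the RSI values must sum to a positive number")
    budgets = {
        name: min(total_budget * value / total_rsi, b_max)  # B_i <= B_max
        for name, value in rsi.items()
    }
    unallocated = total_budget - sum(budgets.values())
    return budgets, unallocated


# Hypothetical RSI values, total budget of 100 units and a cap of 45 units.
budgets, unallocated = allocate_budget(
    {"S1": 0.20, "S2": 0.05, "S3": 0.25}, total_budget=100.0, b_max=45.0
)
```

In this example the proportional shares are 40, 10 and 50 units; the third share is clipped to 45, which leaves 5 units unallocated under the assumed clipping rule.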

5. Design Resilience under Budgetary Constraint

In a complex SoS, each system has its own absorption and recovery functionalities in the face of a development failure. Therefore, we analyze the functionality curve of a single system, and on this basis, establish the system functionality optimization model.

5.1. Single System Functionality Curve

Systems are the constituent units of an SoS, and they are also the lowest-level representation of the SoS capabilities in a hierarchy structure. Therefore, it is of great significance to analyze the functionality curve of each system in order to understand the overall SoS capabilities. The functionality of a system is the level at which the system performs its assignment. For example, the functionality of a water transmission pipeline is the amount of flow that it carries. As with the capabilities of the SoS, two characteristics influence system functionality after an attack: absorption and rapid recovery (Figure 6a). Absorption reduces the negative effects of the attack, and rapid recovery reduces the time required for the system to recover. Keeping all other factors fixed, a system with a smaller loss and a shorter recovery time provides more resilience. In this section, we build a system optimization model under budgetary constraints, in which we study the outcomes of different optimization schemes on a single system. We will use the following notation:

$L_{i}$: The maximum loss in the functionality of the $i^{th}$ system;
$T_{i}$: Recovery time of the $i^{th}$ system;
$a_{i}$: Improvement in the absorption of the $i^{th}$ system (in percent);
$r_{i}$: Improvement in the recovery time of the $i^{th}$ system (in percent);
$B_{i}$: The total budget limit for the $i^{th}$ system.

The functionality of each system is normalized by dividing the real functionality $f_{i}$ by the expected functionality $E(f_{i})$, yielding a value between 0% and 100%. When the value of functionality is 100%, it indicates that the system operates normally or provides extremely high resilience. Different optimization schemes on a single system can result in different outcomes for improving absorption and rapid recovery. The smallest $a_{i}$ and $r_{i}$ are both 0, meaning that $L_{i}$ and $T_{i}$ remain unchanged.

For the convenience of calculation, we assume that the system functionality can be estimated by two line segments, $l_{1}$ and $l_{2}$ (Figure 6b). Several main elements are required to determine the calculation formula of the system functionality, such as $t_{1}$, $t_{2}$, $t_{3}$ and $L_{i}$. Then, at any time $t$, the functionality of the $i^{th}$ system can be calculated as

$$
f_{i,t} =
\begin{cases}
1 - \dfrac{L_{i}}{t_{i,2} - t_{i,1}}\, t + \dfrac{L_{i}\, t_{i,1}}{t_{i,2} - t_{i,1}}, & t_{i,1} \leq t \leq t_{i,2} \\[2ex]
\dfrac{L_{i}}{t_{i,3} - t_{i,2}}\, t + 1 - \dfrac{L_{i}\, t_{i,3}}{t_{i,3} - t_{i,2}}, & t_{i,2} \leq t \leq t_{i,3} \\[2ex]
1, & 0 \leq t \leq t_{i,1} \ \text{or} \ t \geq t_{i,3}
\end{cases}
\tag{13}
$$
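As a rough illustration of Equation (13), the following Python sketch evaluates the two-segment functionality curve and approximates its integral over the disruption window. The function names, the trapezoidal integration and the sample values ($t_{1} = 2$, $t_{2} = 4$, $t_{3} = 10$, $L = 0.6$) are assumptions made for this example and are not taken from the paper's data.

```python
def functionality(t, t1, t2, t3, loss):
    """Normalized functionality from Equation (13): drop on [t1, t2], recover on [t2, t3]."""
    if not (t1 < t2 < t3):
        raise ValueError("expected t1 < t2 < t3")
    if t1 <= t <= t2:
        # Same line as 1 - L/(t2 - t1) * t + L * t1/(t2 - t1).
        return 1.0 - loss * (t - t1) / (t2 - t1)
    if t2 < t <= t3:
        # Same line as L/(t3 - t2) * t + 1 - L * t3/(t3 - t2).
        return 1.0 - loss * (t3 - t) / (t3 - t2)
    return 1.0


def integrated_functionality(t1, t2, t3, loss, steps=10_000):
    """Trapezoidal approximation of the integral of the functionality over [t1, t3]."""
    h = (t3 - t1) / steps
    values = [functionality(t1 + k * h, t1, t2, t3, loss) for k in range(steps + 1)]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


# Hypothetical curve: functionality drops by 60% between t = 2 and t = 4,
# then recovers linearly until t = 10.
area = integrated_functionality(2.0, 4.0, 10.0, 0.6)
print(functionality(4.0, 2.0, 4.0, 10.0, 0.6), area)
```

Because the dip is a triangle of height $L_{i}$ over the base $t_{i,3} - t_{i,1}$, the integral can also be checked by hand as $(t_{i,3} - t_{i,1})(1 - L_{i}/2)$.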

5.2. System Functionality Optimization under Budgetary Constraint

According to Equation (12), the budget amount for the $i^{th}$ system can be determined by its RSI. For a certain linear functionality function and budget limit $B_{i}$ of the $i^{th}$ system, the new maximum of loss in the functionality $L_{i,new}$ and recovery time $T_{i,new}$ are as follows:

$$
L_{i,new} = L_{i}(1 - a_{i}), \quad T_{i,new} = T_{i}(1 - r_{i}). \tag{14}
$$

Inspired by the resilience index, we are of the opinion that the optimization goal is to allocate the budget to the systems and to determine the absorption and recovery enhancements in such a way that a minimum loss of functionality overall is achieved in the process. In this study, we consider that reducing the loss of functionality is as important as the recovery time. On this basis, we establish a general nonlinear optimization model of the system functionality under the following budgetary constraint:

$$
\begin{aligned}
\max \quad & F_{i} = \int_{t_{1}}^{t_{3}} f_{i}(t)\, dt, \\
\text{s.t.} \quad & \beta_{i,L}\, a_{i} + \beta_{i,T}\, r_{i} \leq B_{i}, \\
& L_{i}(1 - a_{i}) \leq L_{Accept}, \\
& T_{i}(1 - r_{i}) \leq T_{Accept}, \\
& i \in [1, N]; \quad T_{Accept}, L_{Accept} \geq 0; \quad 0 \leq a_{i}, r_{i} \leq 1,
\end{aligned}
\tag{15}
$$

where $N$ is the number of systems, while $\beta_{i,L}$ and $\beta_{i,T}$ are the budgets required to optimize the functionality loss and recovery time of the $i^{th}$ system, respectively. $L_{Accept}$ and $T_{Accept}$ are the acceptable limits on the functionality loss and recovery time: the improved loss and recovery time must not exceed them.

In order to solve the model more conveniently and expand its application, the optimization scheme and the amount of budget consumed are represented discretely. In the first step, for a system that needs to be optimized, there are multiple schemes $P_{i} = (s_{i,1}, s_{i,2}, s_{i,3}, \dots, s_{i,n}, T_{Accept}, L_{Accept})$, where each scheme is composed of three parameters: $a_{i,j}$, $r_{i,j}$ and $c_{i,j}$, i.e., $s_{i,j} = (a_{i,j}, r_{i,j}, c_{i,j})$. For all the systems, the optimization scheme $s_{i,j} = (0, 0, 0)$ is included, indicating that the system has not been optimized. The goal of the optimization problem is to select the optimal scheme in $P_{i}$. When the scheme and budget consumption are represented discretely, the system functionality optimization model under budgetary constraints is modeled as follows:

$$
\begin{aligned}
\max \quad & F_{i,j} = \int_{t_{1}}^{t_{3}} f_{i,j}(t)\, dt, \\
\text{s.t.} \quad & \sum_{j} (\beta_{i,L}\, a_{i,j} + \beta_{i,T}\, r_{i,j})\, x_{i,j} \leq B_{i}, \quad \sum_{j} x_{i,j} = 1, \\
& L_{i}(1 - a_{i,j})\, x_{i,j} \leq L_{Accept}, \\
& T_{i}(1 - r_{i,j})\, x_{i,j} \leq T_{Accept}, \\
& i \in [1, N], \ j \in [1, n]; \quad T_{Accept}, L_{Accept} \geq 0; \quad 0 \leq a_{i,j}, r_{i,j} \leq 1,
\end{aligned}
\tag{16}
$$

where $j$ represents the $j^{th}$ scheme in $P_{i}$, while $n$ is the number of available schemes for the $i^{th}$ system. $x_{i,j}$ is a binary variable; it is 1 if the $j^{th}$ scheme is selected. Next, we will introduce the whole process of SoS resilience analysis and design through a Next Generation Air Transportation System (NextGen) case.
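Because the schemes are discrete, the single-system model in Equation (16) can be solved by plain enumeration. The Python sketch below is one way to do that under stated assumptions, not the authors' implementation. The function `select_scheme`, the cost coefficients (50,000 and 40,000) and the fixed drop duration `drop_time` are hypothetical. The sketch also replaces the integral objective with the equivalent goal of minimizing the triangular lost area, $\tfrac{1}{2} L_{i,new} (t_{drop} + T_{i,new})$, which assumes that $T_{i}$ is the length of the recovery segment. The loss, recovery time, acceptance limits and budget are the $P_{6}$ values from the case study.

```python
from itertools import product


def select_scheme(loss, recovery_time, beta_l, beta_t, budget,
                  loss_accept, time_accept, drop_time=1.0, levels=None):
    """Enumerate discrete schemes (a, r); return (lost_area, cost, a, r) of the best feasible one."""
    if levels is None:
        levels = [k / 10 for k in range(11)]  # 0%, 10%, ..., 100%
    eps = 1e-9  # tolerance for floating-point comparisons
    best = None
    for a, r in product(levels, levels):
        new_loss = loss * (1 - a)            # Equation (14)
        new_time = recovery_time * (1 - r)   # Equation (14)
        cost = beta_l * a + beta_t * r       # budget consumed by this scheme
        if cost > budget + eps:
            continue
        if new_loss > loss_accept + eps or new_time > time_accept + eps:
            continue
        # Assumption: triangular dip, so lost area = 0.5 * depth * (drop + recovery duration).
        lost_area = 0.5 * new_loss * (drop_time + new_time)
        candidate = (lost_area, cost, a, r)
        if best is None or candidate < best:  # ties are broken by lower cost
            best = candidate
    return best  # None when no scheme is feasible


# P6 values from the case study (L = 0.6, T = 10, limits 40% and 7, budget $56,555);
# beta_l and beta_t are hypothetical because Table 3 is not reproduced here.
best = select_scheme(loss=0.6, recovery_time=10.0, beta_l=50_000.0, beta_t=40_000.0,
                     budget=56_555.0, loss_accept=0.4, time_accept=7.0)
print(best)
```

With these hypothetical costs the search settles on $a = 40\%$ and $r = 90\%$ at a cost of $56,000. Different cost coefficients or a different objective approximation would change the choice.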

6. Numerical Example and Results

The United States Federal Aviation Administration (FAA) controls the world’s largest air traffic control system. The system controls nearly 50,000 flights a day, covering an area of nearly $7.61 \times 10^{7}\ \mathrm{km}^{2}$, which is equivalent to 15% of the earth’s surface area. Continuously rising demand in the National Airspace System (NAS) is one of the major contributing factors that aggravate system-wide congestion and delays. To solve this, the FAA proposed new operating concepts and technologies to achieve the NextGen.

6.1. Hierarchical Representation of the NextGen

In order to solve the problem of traffic congestion and delay, the FAA proposed the development of the NextGen project in 2004. The NextGen is a near- and mid-term SoS developed by the FAA to increase the traffic capacity of the airspace system, improve flight efficiency, enhance system predictability, enhance navigation capacity, improve resilience and improve the safety of air transportation.

However, complex interdependencies between the new technologies and their constituent systems create challenges when assessing the impact of the technologies and systems on the NextGen. We use a hierarchical representation to understand how the activities and new technologies affect the capability of the NextGen. The hierarchy of the NextGen is decomposed into three levels: capability, activity and technology (composed of systems). In this paper, we focus on the reduction in flight delay. Thus, we define the NextGen capability as “provide service with lowest delay” and measure it by estimating the percentage of the ideal delay reduction obtained. The capability of the NextGen can be achieved by implementing three activities: departure delay reduction, airborne delay reduction and taxi-in delay reduction. Each activity is satisfied by a set of NextGen technologies (Figure 7). Each new NextGen technology can be achieved by completing a set of acquisition programs or optimizing existing systems (Table 1).

6.2. Using BN to Evaluate the Capability of NextGen

Assessing the impact of new technologies and systems on the capability of the NextGen requires a modeling and simulation tool. In this study, GeNIe3.0 software is used to model the BN of the NextGen. The advantage of GeNIe3.0 is that it displays the established BN intuitively and also supports DBN modeling. Two of the busier airports are selected for this simulation: Atlanta International Airport (AIA) and Chicago O’Hare International Airport (CIA). AIA and CIA are used as the departure and arrival airports, respectively. The proposed BN focuses on quantifying the impact of various factors on the arrival delay. The conditional probabilities in the BN can be obtained through the parameter learning function of GeNIe3.0, and the relationship between the new technologies and the constituent systems is represented by the NoisyOR function, as shown in Equation (17) [29,49].

$$
\begin{aligned}
P(T_{1} = \mathit{Unavailable}) &= \mathrm{NoisyOR}(P_{2}, 0.14, P_{11}, 0.18, P_{12}, 0.15, P_{15}, 0.21, P_{16}, 0.1, 0.1), \\
P(T_{2} = \mathit{Unavailable}) &= \mathrm{NoisyOR}(P_{5}, 0.05, P_{6}, 0.1, P_{7}, 0.07, P_{8}, 0.16, P_{10}, 0.12, P_{12}, 0.19, P_{16}, 0.11, P_{17}, 0.1, 0.07), \\
P(T_{3} = \mathit{Unavailable}) &= \mathrm{NoisyOR}(P_{3}, 0.17, P_{4}, 0.1, P_{9}, 0.13, P_{11}, 0.1, P_{12}, 0.15, P_{13}, 0.08, P_{14}, 0.13, P_{16}, 0.2, P_{18}, 0.1, 0.03), \\
P(T_{4} = \mathit{Unavailable}) &= \mathrm{NoisyOR}(P_{1}, 0.1, P_{3}, 0.15, P_{4}, 0.1, P_{11}, 0.08, P_{12}, 0.13, P_{13}, 0.11, P_{14}, 0.2, P_{16}, 0.1, P_{18}, 0.12, 0.05), \\
P(T_{5} = \mathit{Unavailable}) &= \mathrm{NoisyOR}(P_{4}, 0.15, P_{5}, 0.10, P_{8}, 0.20, P_{9}, 0.08, P_{10}, 0.07, P_{11}, 0.12, P_{12}, 0.13, P_{16}, 0.08, 0.06), \\
P(T_{6} = \mathit{Unavailable}) &= \mathrm{NoisyOR}(P_{3}, 0.08, P_{4}, 0.15, P_{9}, 0.17, P_{11}, 0.20, P_{12}, 0.05, P_{16}, 0.10, 0.10), \\
P(T_{7} = \mathit{Unavailable}) &= \mathrm{NoisyOR}(P_{2}, 0.12, P_{4}, 0.13, P_{6}, 0.05, P_{11}, 0.08, P_{12}, 0.19, P_{13}, 0.15, P_{16}, 0.13, P_{17}, 0.14, 0.08).
\end{aligned}
\tag{17}
$$
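To make the NoisyOR entries in Equation (17) more concrete, the sketch below evaluates one of them in Python. It assumes the usual leaky noisy-OR reading of the argument list: each pair $(P_{k}, v_{k})$ gives the probability that an unavailable $P_{k}$ alone makes the technology unavailable, and the trailing number is the leak probability that applies when every parent system is available. That reading, the function name `noisy_or` and the simplification to known parent states (rather than parent probabilities) are assumptions of this example.

```python
def noisy_or(link_probs, leak, unavailable):
    """P(child = Unavailable) under a leaky noisy-OR with known parent states.

    link_probs:  dict mapping a parent name to the probability that this parent,
                 when unavailable, makes the child unavailable on its own.
    leak:        probability that the child is unavailable although all parents are available.
    unavailable: set of parent names currently in the 'Unavailable' state.
    """
    p_child_available = 1.0 - leak
    for parent, p in link_probs.items():
        if parent in unavailable:
            p_child_available *= 1.0 - p
    return 1.0 - p_child_available


# Parameters of T_1 as listed in Equation (17).
T1_LINKS = {"P2": 0.14, "P11": 0.18, "P12": 0.15, "P15": 0.21, "P16": 0.1}
T1_LEAK = 0.1

p_all_available = noisy_or(T1_LINKS, T1_LEAK, set())   # only the leak term remains
p_p12_down = noisy_or(T1_LINKS, T1_LEAK, {"P12"})      # 1 - 0.9 * 0.85
print(p_all_available, p_p12_down)
```

Under this reading, $T_{1}$ is unavailable with probability 0.1 when all of its systems are available, and with probability $1 - 0.9 \times 0.85 = 0.235$ when only $P_{12}$ is unavailable.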

We set two states for the NextGen capability and three activities: on time or delay, and the punctuality rate of flights is used to represent the capability of the NextGen. Figure 8 shows the analysis result of the NextGen capability. As can be seen, when all systems are available, the punctuality probabilities of the three activities are 90%, 92%, and 91%, respectively, and the probability that a flight arrives on time is 84%.

In addition, we can use the detected evidence to update the probability distribution in the BN. For example, if it is known that technologies $T_{3}$ and $T_{4}$ are unavailable, the probability distribution of the NextGen capability is shown in Figure 9. Due to the unavailability of $T_{3}$ and $T_{4}$, the on-time probability of the airborne process decreases from 92% to 63%, and the NextGen capability decreases from 84% to 76%.

We also analyze the impact of different new technologies on the percentage of arrival delay reduction. As shown in Figure 10, the larger the value on the Y-axis, the more effective the new technology is at reducing the arrival delay. $T_{2}$ and $T_{7}$ have the greatest impact, so they should be prioritized for construction. In contrast, $T_{3}$ and $T_{4}$ have less impact and should be constructed last when budget and time are insufficient.

Using the method proposed in this study, we can analyze how the sequence of technology failures affects the NextGen capability, as shown in Figure 11. We first order the failures of technologies from the least pivotal to the most pivotal to determine the best-case capability decline pattern. Doing the converse yields the worst-case pattern. In the worst case, the NextGen capability declines quickly at first and then slowly, while the best case is the opposite. The general case lies between the worst and best cases.

6.3. Using DBN to Analyze the Resilience of NextGen

In this section, a resilience analysis model of the NextGen based on DBN is established. In order to analyze the resilience of the NextGen more specifically, we make the following assumptions:

(1): The development failures of systems $P_{11}$, $P_{12}$ and $P_{16}$ lead to the reduction in their functionalities at the same time step T in the temporal plate;
(2): The functionalities of systems $P_{11}$, $P_{12}$ and $P_{16}$ decline to the minimum and then recover slowly; their functionality functions are known.

Based on the above assumptions, Figure 12 presents the network structure built in GeNIe3.0. A color code is used to distinguish the variables in the DBN. Nodes that are outside the box are static systems, which means that their functionality remains unchanged. Nodes in the temporal plate are dynamic, and their functionality changes over time steps. The failure variable intervenes at the first time step. Using the DBN shown in Figure 12, we can obtain the dynamic change process of the NextGen capability (Figure 13). Based on the resilience analysis metric proposed in Section 4, the resilience result of the NextGen is 0.696.

In addition, we analyze the impact of different degrees of development failures on NextGen resilience; the results are shown in Figure 14 and Table 2. It can be seen that the higher the degree of system development failure, the worse the resilience of the NextGen. Therefore, reducing the capability loss and recovery time under various failure degrees can effectively improve the SoS resilience.

6.4. Resilience Design under Budget Constraint

We perform our resilience design method on the above NextGen case. By following the steps in Section 4.2, we calculate the $R^{+i}$ and $R^{-i}$ of each system. Algorithm 1 outputs the RSI as in Figure 15. An absence of system $P_{12}$ has the highest impact on the resilience of the NextGen, and the absence of system $P_{1}$ has the lowest effect.

SoS system engineers can determine the budget allocation through the $RSI_{i}$ of the system $P_{i}$. The allocated budget can then be used to optimize the system functionality, so as to achieve the purpose of designing the SoS resilience. For each system $P_{i}$, its $f_{i,j}(t)$, $\beta_{i,L}$, $\beta_{i,T}$, $L_{Accept}$ and $T_{Accept}$ are all known, as shown in Table 3. In order to discretize the optimization schemes and for the convenience of calculation, we assume that the values of $a_{i}$ and $r_{i}$ can only be taken from $(0, 10\%, 20\%, \dots, 90\%, 100\%)$.

Now, the parameters for the optimization problem under budgetary constraints are ready. Firstly, we apply these parameters and optimize the budget allocation for a budget limit of $800,000. We take system $P_{6}$ as an example to demonstrate the budget allocation process. The $RSI$ of $P_{6}$ is 0.103 and the sum of the $RSI_{i}$ of all systems is 1.457; then, the total budget that $P_{6}$ receives can be calculated as $B_{6} = \$800,000 \times 0.103 \div 1.457 = \$56,555$. The optimization model of $P_{6}$ under budgetary constraints is established as follows:

$$
\begin{aligned}
\max \quad & F_{6,j} = \int_{t_{1}}^{t_{3}} f_{6,j}(t)\, dt, \\
\text{s.t.} \quad & \sum_{j} (\beta_{6,L}\, a_{6,j} + \beta_{6,T}\, r_{6,j})\, x_{6,j} \leq B_{6}, \quad \sum_{j} x_{6,j} = 1, \\
& 0.6 \times (1 - a_{6,j})\, x_{6,j} \leq 40\%, \\
& 10 \times (1 - r_{6,j})\, x_{6,j} \leq 7, \\
& j \in [1, n]; \quad 0 \leq a_{6,j}, r_{6,j} \leq 1,
\end{aligned}
\tag{18}
$$
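As a quick check on Equation (18), the two acceptance constraints can be solved for the selected scheme (the one with $x_{6,j} = 1$). The rearrangement below is a worked illustration only; it assumes, as the model suggests, that the constraints hold trivially for unselected schemes.

```latex
\begin{aligned}
0.6\,(1 - a_{6,j}) \leq 0.4 \;&\Longrightarrow\; a_{6,j} \geq 1 - \frac{0.4}{0.6} = \frac{1}{3}, \\
10\,(1 - r_{6,j}) \leq 7 \;&\Longrightarrow\; r_{6,j} \geq 1 - \frac{7}{10} = 0.3 .
\end{aligned}
```

On the 10% grid assumed for $a_{i}$ and $r_{i}$, this leaves $a_{6,j} \in \{40\%, \dots, 100\%\}$ and $r_{6,j} \in \{30\%, \dots, 100\%\}$ as candidates, subject to the budget constraint.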

Other systems can establish their own optimization models according to this. Solving the optimization models yields the solution in Table 4. We calculate the initial resilience metric of the NextGen in the case of Table 3, and the result is 0.51. The optimal investment in Table 4 with a total cost of $743,372 improves the resilience level to 0.72.

To find out the effect of different budget limits on the resilience, further experiments are made on different budget scenarios. We choose the values of $400,000, $600,000, $800,000, $1,000,000, $1,200,000 and $1,400,000 for the budget limit. For all these budgets, we find the corresponding improvement in the NextGen resilience (Figure 16).

As can be seen from Figure 16, before the budget constraint reaches a certain threshold, the resilience of the NextGen increases with the budget. For the budgets of $1,200,000 and $1,400,000, the optimal scheme yields the same amount of investment ($1,101,480), and the optimal solution remains the same. Hence, the highest budget needed to enhance the resilience of the NextGen is $1,101,480.

7. Discussion and Conclusions

The interaction among the constituent systems within an SoS induces interdependencies that are critical to achieving the SoS capability. This system interdependency can also lead to cascading failures in the SoS. In order to better analyze the complex interdependency within an SoS, we formulate a general hierarchy structure to fuse information from the system level to the SoS capability level via the activity level. Then, the SoS capability at a certain moment is analyzed through the BN with certain conditions on constituent systems. A DBN extends the classical BN with a time dimension, which makes it capable of modeling influences over time. Based on this feature, we used a DBN to obtain the dynamic change process of the SoS capability. The resilience metric suggested by Cimellaro was adopted and optimized, and a metric was introduced to measure the resilience-based system importance, which was used in the budget allocation optimization problem. The mathematical programming formulation was introduced to allocate a budget to systems while maximizing the resilience of the SoS within the budget. We adopted a novel method to reduce computational complexity by discretizing the optimization schemes. The optimal scheme determined the optimal level of enhancement in the system’s absorption and recovery time.

We applied our resilience analysis and design method on a Next Generation Air Transportation (NextGen) case and discussed the effects of different budget limits on the resilience of the NextGen. In the NextGen SoS, SoS system engineers can use the proposed method to evaluate the impact of combined NextGen technologies on the performance of the air transportation system. This method also enables the identification of the criticality of a NextGen technology in terms of performance improvement and the determination of the development order of the NextGen technologies, which reduces the risk of mission failure. Cost–benefit analysis provides a feasible design space given the cost and resilience improvement constraints, which allows SoS system engineers to select the best SoS design scheme for the given conditions.

The purpose of the case study is to demonstrate the proposed approach in a useful context. We are not drawing any validated, novel conclusions about the real NextGen decision-making problem, because the required information is lacking and the research scope is very large. This does not mean that the case study of the NextGen is useless. On the contrary, the case study shows the benefits of using the proposed approach: to quantify propagating effects within an SoS, to analyze the capability and resilience level of the SoS, to identify the criticality of constituent systems, to analyze all possible resilience design schemes and select the optimal one, and to calculate the highest budget needed to enhance the resilience of an SoS. To analyze the case study, we use notional values for the input information and make many assumptions. Once the actual NextGen decision-makers know the required inputs and their assumptions, they can use the approach proposed in this paper on their problems. This research therefore provides guidance and a reference for SoS system engineers designing a resilient SoS.

There are a number of limitations in the proposed resilience analysis and design approach. Firstly, the development and analysis process of BNs and DBNs involves a lot of subjectivity, which is unavoidable because one of the main characteristics of BNs is to replace missing data with expert knowledge. Another limitation is that as the size of the analyzed system or SoS and the number of components increase, the computational complexity of the BN and DBN increases exponentially while also involving more subjectivity. In addition, a BN is a directed acyclic graph, which limits the possibility of modeling the interdependence between two variables in real life. Finally, when modeling the DBN, determining which variables will affect other variables at another time step may be challenging.

This research is data-driven, and the computational complexity increases exponentially with the increase in the scale of the SoS. Therefore, more research is needed to develop faster and more efficient algorithms and to reduce the computational complexity. Design space exploration may be one of the research directions to address this issue. Additionally, we defined resilience metrics based on the robustness and recoverability viewpoints. SoS resilience can have other definitions according to different viewpoints: flexibility, redundancy and resourcefulness. They can also be core attributes of the SoS resilience. Hence, more work is needed to quantify other SoS attributes and incorporate them into the resilience analysis framework, based on which more comprehensive resilience analysis metrics can be proposed. Finally, external environmental factors may lead to a sustained evolution of SoS capabilities/requirements. Thus, more research is needed to develop a means of considering SoS evolution issues.

Author Contributions

Conceptualization, T.J.; methodology, T.J.; software, H.Y.; formal analysis, J.W. and J.M.; investigation, X.L. and A.L.; resources, J.W.; data curation, J.M.; writing—original draft, T.J. and H.Y.; writing—review and editing, J.W. and J.M.; supervision, T.J. The first two authors contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This paper is supported by the National Natural Science Foundation of China (NSFC) under grant number 62202474.

Data Availability Statement

All the relevant data are already included in the main manuscript.

Acknowledgments

The authors are grateful to the anonymous referees for their constructive comments on the earlier versions of this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SoS	System-of-Systems
BN	Bayesian Network
DBN	Dynamic Bayesian Network
LCS	littoral combat ship
RSI	resilience system importance
NextGen	Next Generation Air Transportation

References

1. Zhou, H.; Mao, Y.; Guo, X. An Improved Multi-Objective Particle Swarm Optimization-Based Hybrid Intelligent Algorithm for Index Screening of Underwater Manned/Unmanned Cooperative System of Systems Architecture Evaluation. Mathematics 2023, 11, 4389.
2. Konur, D.; Farhangi, H.; Dagli, C.H. On the flexibility of systems in system of systems architecting. Procedia Comput. Sci. 2014, 36, 65–71.
3. Ed-daoui, I.; Itmi, M.; Hami, A.E.; Hmina, N. Systems-of-Systems and Regional Resilience Assessment. Complex Syst. Smart Territ. Mobil. 2021, 12, 127–144.
4. Koo, J.I.; Jeong, S.J. Improved Technology Readiness Assessment Framework for System-of-Systems from a System Integration Perspective. IEEE Access 2024, 12, 23827–23853.
5. He, H.; Wang, W.; Zhu, Y.; Li, X.; Wang, T. Function Chain-Based Mission Planning Method for Hybrid Combat SoS. IEEE Access 2019, 7, 100453–100466.
6. Hynes, W.; Trump, B.D.; Kirman, A.; Haldane, A.; Linkov, I. Systemic resilience in economics. Nat. Phys. 2022, 18, 381–384.
7. Shen, Z.; Ji, C.; Lu, S. Transportation network resilience response to the spatial feature of hazards. Transp. Res. Part D Transp. Environ. 2024, 128, 104121.
8. Shpak, N.; Dvulit, Z.; Maznyk, L.; Mykytiuk, O.; Sroka, W. Validation of ecologists in enterprise management system: A case study analysis. Pol. J. Manag. Stud. 2019, 19, 376–390.
9. Jackson, S. Overview of resilience and theme issue on the resilience of systems. Insight 2015, 18, 7–9.
10. Farhangi, H.; Konur, D.; Dagli, C.H. Combining Max-min and Max-max Approaches for Robust SoS Architecting. Procedia Comput. Sci. 2016, 95, 103–110.
11. Cuong, T.N.; Ngoc, L.L.; Kim, H.S.; You, S.S. Data analytics and throughput forecasting in port management systems against disruptions: A case study of Busan Port. Marit. Econ. Logist. 2023, 25, 61.
12. Hosseini, S.; Barker, K.; Ramirez-Marquez, J.E. A review of definitions and measures of system resilience. Reliab. Eng. Syst. Saf. 2016, 145, 47–61.
13. Pregenzer, A. Joseph A. Burton Forum Award Lecture: Managing Nuclear and Biological Risks: Building Resilience through International Cooperation. In Proceedings of the APS April Meeting Abstracts, Atlanta, GA, USA, 31 March–3 April 2012.
14. Allenby, B.; Fink, J. Toward inherently secure and resilient societies. Science 2005, 309, 1034–1036.
15. Haimes, Y.Y. On the definition of resilience in systems. Risk Anal. Int. J. 2009, 29, 498–501.
16. Sheffi, Y. The Power of Resilience: How the Best Companies Manage the Unexpected; MIT Press: Cambridge, MA, USA, 2015.
17. Adger, W.N. Social and ecological resilience: Are they related? Prog. Hum. Geogr. 2000, 24, 347–364.
18. Rose, A. Economic resilience to natural and man-made disasters: Multidisciplinary origins and contextual dimensions. Environ. Hazards 2007, 7, 383–398.
19. Wears, R. Resilience Engineering: Concepts and Precepts; Ashgate Publishing, Ltd.: Aldershot, UK, 2006.
20. Yushi, L. Construction Mechanism and Implementation of Resilient Command Information Systems. J. Command Control 2015, 1, 284–291.
21. Speranza, C.I.; Wiesmann, U.; Rist, S. An indicator framework for assessing livelihood resilience in the context of social–ecological dynamics. Glob. Environ. Chang. 2014, 28, 109–119.
22. Cutter, S.L.; Barnes, L.; Berry, M.; Burton, C.; Evans, E.; Tate, E.; Webb, J. A place-based model for understanding community resilience to natural disasters. Glob. Environ. Chang. 2008, 18, 598–606.
23. Bruneau, M.; Chang, S.E.; Eguchi, R.T.; Lee, G.C.; O’Rourke, T.D.; Reinhorn, A.M.; Shinozuka, M.; Tierney, K.; Wallace, W.A.; Von Winterfeldt, D. A framework to quantitatively assess and enhance the seismic resilience of communities. Earthq. Spectra 2003, 19, 733–752.
24. Yodo, N.; Wang, P. Resilience modeling and quantification for engineered systems using Bayesian networks. J. Mech. Des. 2016, 138, 031404.
25. Hossain, N.U.I.; Jaradat, R.; Hosseini, S.; Marufuzzaman, M.; Buchanan, R.K. A framework for modeling and assessing system resilience using a Bayesian network: A case study of an interdependent electrical infrastructure system. Int. J. Crit. Infrastruct. Prot. 2019, 25, 62–83.
26. DeLaurentis, D.; Callaway, R.K. A system-of-systems perspective for public policy decisions. Rev. Policy Res. 2004, 21, 829–837.
27. Groote, J.F.; Mousavi, M.R.; Reniers, M.A. A hierarchy of SOS rule formats. Electron. Notes Theor. Comput. Sci. 2006, 156, 3–25.
28. DeLaurentis, D.A.; Crossley, W.A.; Mane, M. Taxonomy to guide systems-of-systems decision-making in air transportation problems. J. Aircr. 2011, 48, 760–770.
29. Han, S.Y. System-of-Systems Architecture Analysis and Design Using Bayesian Networks. Ph.D. Thesis, Purdue University, West Lafayette, IN, USA, 2014.
30. Fletcher, S. Electric power interruptions curtail California oil and gas production. Oil Gas J. 2001, 99, 10–12.
31. de Rouffignac, A. Refineries could be subject to rolling blackouts. Oil Gas J. 2001.
32. Elhabbash, A.; Elkhatib, Y.; Nundloll, V.; Marco, V.S.; Blair, G.S. Principled and automated system of systems composition using an ontological architecture. Future Gener. Comput. Syst. 2024, 157, 499–515.
33. Derhamy, H.; Eliasson, J.; Delsing, J. System of system composition based on decentralized service-oriented architecture. IEEE Syst. J. 2019, 13, 3675–3686.
34. Sharkov, G.; Todorova, C.; Koykov, G.; Zahariev, G. A System-of-Systems Approach for the Creation of a Composite Cyber Range for Cyber/Hybrid Exercising. Inf. Secur. Int. J. 2021, 50, 129–148.
35. Mathlouthi, W.; Saoud, N.B.B. Flexible composition of system of systems on cloud federation. In Proceedings of the 2017 IEEE 5th International Conference on Future Internet of Things and Cloud (FiCloud), Prague, Czech Republic, 21–23 August 2017; pp. 358–365.
36. Phillis, Y.A.; Kouikoglou, V.S. System-of-Systems hierarchy of biodiversity conservation problems. Ecol. Model. 2012, 235, 36–48.
37. Tamazyan, H.A.; Chubaryan, A.A. A hierarchy of determinative sequent systems with different substitution rules. Pattern Recognit. Image Anal. 2024, 34, 20–30.
38. Matusik, J.G.; Mitchell, R.L.; Hays, N.A.; Fath, S.; Hollenbeck, J.R. The highs and lows of hierarchy in multiteam systems. Acad. Manag. J. 2022, 65, 1571–1592.
39. Manual, J. Manual for the Operation of the Joint Capabilities Integration and Development System; US Department of Defense: Washington, DC, USA, 2012.
40. O’Rourke, R. Navy Littoral Combat Ship (LCS) Program: Background and Issues for Congress; Congressional Research Service: Washington, DC, USA, 2014.
41. Work, R.O. The Littoral Combat Ship: How We Got Here, and Why; Undersecretary of the Navy: Arlington, VA, USA, 2014.
42. Russell, J.C. Littoral Combat Ship: Is the US Navy Assuming Too Much Risk? Ph.D. Thesis, US Army Command and General Staff College, Fort Leavenworth, KS, USA, 2006.
43. Oakes, M.W. Statistical Inference: A Commentary for the Social and Behavioural Sciences; Wiley: Hoboken, NJ, USA, 1986.
44. Shafer, G. A Mathematical Theory of Evidence; Princeton University Press: Princeton, NJ, USA, 1976; Volume 42.
45. Sun, T.; Liu, D.; Liu, D.; Zhang, L.; Li, M.; Khan, M.I.; Li, T.; Cui, S. A new method for flood disaster resilience evaluation: A hidden markov model based on Bayesian belief network optimization. J. Clean. Prod. 2023, 412, 137372.
46. Xu, P.C.; Lu, Q.C.; Xie, C.; Cheong, T. Modeling the resilience of interdependent networks: The role of function dependency in metro and bus systems. Transp. Res. Part A Policy Pract. 2024, 179, 103907.
47. Chen, Y.; Li, X.; Wang, J.; Liu, M.; Cai, C.; Shi, Y. Research on the Application of Fuzzy Bayesian Network in Risk Assessment of Catenary Construction. Mathematics 2023, 11, 1719.
48. Duttweiler, L.; Thurston, S.W.; Almudevar, A. Spectral Bayesian network theory. Linear Algebra Its Appl. 2023, 674, 282–303.
49. Hossain, N.U.I.; El Amrani, S.; Jaradat, R.; Marufuzzaman, M.; Buchanan, R.; Rinaudo, C.; Hamilton, M. Modeling and assessing interdependencies between critical infrastructures using Bayesian network: A case study of inland waterway port and surrounding supply chain network. Reliab. Eng. Syst. Saf. 2020, 198, 106898.
50. Shunqi, Y.; Ying, Z.; Xiang, L.; Yanfeng, L.; Hongzhong, H. Reliability analysis for wireless communication networks via dynamic Bayesian network. J. Syst. Eng. Electron. 2023, 34, 1368–1374.
51. Hulst, J. Modeling Physiological Processes with Dynamic Bayesian Networks. Master’s Thesis, Faculty of Electrical Engineering, Mathematics, and Computer Science, University of Pittsburgh, Pittsburgh, PA, USA, 2006.
52. Cimellaro, G.P.; Reinhorn, A.M.; Bruneau, M. Seismic resilience of a hospital system. Struct. Infrastruct. Eng. 2010, 6, 127–144.
53. Tabandeh, A.; Jia, G.; Gardoni, P. A review and assessment of importance sampling methods for reliability analysis. Struct. Saf. 2022, 97, 102216.
54. Uday, P.; Marais, K.B. Resilience-based system importance measures for system-of-systems. Procedia Comput. Sci. 2014, 28, 257–264.

Figure 1. A hierarchical representation of SoS.

Figure 2. Notional example of hierarchical representation of LCS.

Figure 3. The extension process of a DBN.

Figure 4. DBN is used to analyze the change process of SoS capability.

Figure 5. Change process of the SoS capability.

Figure 6. Single system functionality curve.

Figure 7. Hierarchical representation of the NextGen.

Figure 8. Capability analysis result of the NextGen.

Figure 9. Updated BN with known evidence.

Figure 10. Increase in NextGen capability with one new technology addition.

Figure 11. NextGen capability decline patterns for different sequences of technology failures.

Figure 12. Analysis results of the NextGen capability change process.

Figure 13. Change process of the NextGen capability.

Figure 14. Change process of the NextGen capability with different failures.

Figure 15. $RSI$ of systems in the NextGen.

Figure 16. Resilience improvement for different budget limits.

Table 1. Required systems to achieve the NextGen technologies.

System Symbol	System Name	Supported Technologies
$P_{1}$	Free flight traffic management	$T_{4}$
$P_{2}$	Airport surface detection equipment	$T_{1}$, $T_{7}$
$P_{3}$	En route automation modernization	$T_{3}$, $T_{4}$, $T_{6}$
$P_{4}$	Next Generation air-to-ground communication	$T_{3}$–$T_{7}$
$P_{5}$	Standard terminal automation replacement	$T_{2}$, $T_{5}$
$P_{6}$	Airport surveillance radar	$T_{2}$, $T_{7}$
$P_{7}$	Aviation surface weather observation network	$T_{2}$
$P_{8}$	Integrated terminal weather system	$T_{2}$, $T_{5}$
$P_{9}$	Instrument flight procedures automation	$T_{3}$, $T_{5}$, $T_{6}$
$P_{10}$	Terminal automation modernization replacement	$T_{2}$, $T_{5}$
$P_{11}$	Automatic dependent surveillance broadcast	$T_{1}$, $T_{3}$–$T_{7}$
$P_{12}$	Traffic flow management infrastructure	$T_{1}$–$T_{7}$
$P_{13}$	System-wide information management	$T_{3}$, $T_{4}$, $T_{7}$
$P_{14}$	En route control center system modernization	$T_{3}$, $T_{4}$
$P_{15}$	Airport movement area safety system	$T_{1}$
$P_{16}$	Traffic management advisor	$T_{1}$–$T_{7}$
$P_{17}$	Precision runway monitor	$T_{2}$, $T_{7}$
$P_{18}$	En route communication gateway	$T_{3}$, $T_{4}$
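
Table 1 lists technologies per system, whereas a dependency analysis usually needs the reverse view: the systems each technology relies on. The following is a minimal sketch of that inversion, not part of the original study. The names `SUPPORTS` and `systems_by_technology` are hypothetical, and ranges such as $T_{3}$–$T_{7}$ are assumed to be inclusive.

```python
from collections import defaultdict

# System index -> technology indices, transcribed from Table 1.
# Assumption: a range such as T3–T7 means T3, T4, T5, T6 and T7.
SUPPORTS = {
    1: [4],
    2: [1, 7],
    3: [3, 4, 6],
    4: [3, 4, 5, 6, 7],
    5: [2, 5],
    6: [2, 7],
    7: [2],
    8: [2, 5],
    9: [3, 5, 6],
    10: [2, 5],
    11: [1, 3, 4, 5, 6, 7],
    12: [1, 2, 3, 4, 5, 6, 7],
    13: [3, 4, 7],
    14: [3, 4],
    15: [1],
    16: [1, 2, 3, 4, 5, 6, 7],
    17: [2, 7],
    18: [3, 4],
}


def systems_by_technology(supports):
    """Invert a system -> technologies map into technology -> systems."""
    inverse = defaultdict(list)
    # Iterating systems in ascending order keeps each result list sorted.
    for system, technologies in sorted(supports.items()):
        for technology in technologies:
            inverse[technology].append(system)
    return dict(inverse)


REQUIRED = systems_by_technology(SUPPORTS)

if __name__ == "__main__":
    for technology in sorted(REQUIRED):
        systems = ", ".join(f"P{s}" for s in REQUIRED[technology])
        print(f"T{technology}: {systems}")
```

Read this way, the table shows that $P_{12}$ and $P_{16}$ appear under every technology, so a failure in either one touches all seven.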

Table 2. The impact of different degrees of development failure on the NextGen resilience.

Failure Degree	Capability Loss ($\int_{t_{1}}^{t_{3}} C(t)\,dt$)	Recovery Time ($t_{3} - t_{1}$)	Resilience
Nonserious failure	8.75	12	0.729
Moderate failure	9.74	14	0.696
Severe failure	10.16	16	0.635
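
The three rows of Table 2 can be cross-checked with simple arithmetic. The sketch below assumes that the resilience column equals the tabulated integral divided by the recovery time. That relation is inferred from the numbers in the table, not quoted from the metric's definition, and the names `TABLE_2` and `resilience` are hypothetical.

```python
# Rows of Table 2: (failure degree, integral of C(t) over [t1, t3],
# recovery time t3 - t1, reported resilience).
TABLE_2 = [
    ("Nonserious failure", 8.75, 12, 0.729),
    ("Moderate failure", 9.74, 14, 0.696),
    ("Severe failure", 10.16, 16, 0.635),
]


def resilience(capability_integral, recovery_time):
    """Assumed relation: time-averaged capability over the recovery window."""
    return capability_integral / recovery_time


# Recompute each row and keep the value next to the reported one.
RECOMPUTED = [
    (degree, resilience(integral, duration), reported)
    for degree, integral, duration, reported in TABLE_2
]

if __name__ == "__main__":
    for degree, computed, reported in RECOMPUTED:
        print(f"{degree}: computed {computed:.3f}, reported {reported:.3f}")
```

Under this assumption the recomputed values agree with the reported ones to three decimals, and resilience drops as the failure becomes more severe because the recovery window lengthens faster than the integral grows.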

Table 3. Known parameters of system $P_{i}$ in the NextGen.

System	$L_{i}$	$T_{i}$	$\beta_{i,L}$ ($1000)	$\beta_{i,T}$ ($1000)	$L_{Accept}$	$T_{Accept}$
$P_{1}$	40%	10	4.23	3.52	40%	7
$P_{2}$	60%	10	7.71	6.53	40%	7
$P_{3}$	50%	10	6.25	7.45	40%	7
$P_{4}$	70%	10	6.23	6.54	40%	7
$P_{5}$	60%	10	5.22	5.58	40%	7
$P_{6}$	60%	10	7.23	9.37	40%	7
$P_{7}$	50%	10	6.65	7.16	40%	7
$P_{8}$	50%	10	6.17	5.11	40%	7
$P_{9}$	50%	10	5.64	6.22	40%	7
$P_{10}$	50%	10	6.20	5.43	40%	7
$P_{11}$	70%	10	7.25	9.41	40%	7
$P_{12}$	70%	10	9.71	7.28	40%	7
$P_{13}$	60%	10	8.28	7.50	40%	7
$P_{14}$	40%	10	5.79	6.37	40%	7
$P_{15}$	40%	10	4.24	4.44	40%	7
$P_{16}$	70%	10	3.23	5.62	40%	7
$P_{17}$	60%	10	4.82	6.97	40%	7
$P_{18}$	40%	10	6.29	5.25	40%	7

Table 4. Optimal scheme for a budget limit of $800,000.

	$P_{1}$	$P_{2}$	$P_{3}$	$P_{4}$	$P_{5}$	$P_{6}$	$P_{7}$	$P_{8}$	$P_{9}$	$P_{10}$	$P_{11}$	$P_{12}$	$P_{13}$	$P_{14}$	$P_{15}$	$P_{16}$	$P_{17}$	$P_{18}$
${RSI}_{i}$	0.020	0.087	0.067	0.110	0.083	0.103	0.053	0.083	0.080	0.083	0.127	0.140	0.090	0.040	0.027	0.130	0.103	0.030
$B_{i} *$	10.98	47.60	36.61	60.41	45.77	56.56	29.30	45.77	43.94	45.77	69.57	76.89	49.43	21.97	14.65	71.40	56.75	16.48
$a_{ij}$	0	20%	20%	60%	50%	30%	10%	10%	40%	10%	50%	30%	20%	0	0	60%	50%	0
$r_{ij}$	30%	40%	30%	30%	30%	30%	30%	70%	30%	70%	30%	60%	40%	30%	30%	30%	30%	30%
$C_{ij} *$	10.56	41.54	34.85	57.00	42.84	49.80	28.13	41.94	41.22	44.21	64.48	72.81	46.56	19.11	13.32	66.24	53.01	15.75

* The units of $B_{i}$ and $C_{ij}$ are both $1000.
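
The figures in Table 4 can be checked for feasibility against the stated budget. The sketch below is only an arithmetic check on the published values, not the optimization model itself. It assumes that $B_{i}$ is the budget allocated to system $P_{i}$ and that $C_{ij}$ is the cost of the option selected for it. The names `BUDGET_LIMIT`, `B` and `C` are hypothetical.

```python
# Values transcribed from Table 4, in units of $1000, ordered P1..P18.
BUDGET_LIMIT = 800.0

# Assumed meaning: budget allocated to each system.
B = [10.98, 47.60, 36.61, 60.41, 45.77, 56.56, 29.30, 45.77, 43.94,
     45.77, 69.57, 76.89, 49.43, 21.97, 14.65, 71.40, 56.75, 16.48]

# Assumed meaning: cost of the option selected for each system.
C = [10.56, 41.54, 34.85, 57.00, 42.84, 49.80, 28.13, 41.94, 41.22,
     44.21, 64.48, 72.81, 46.56, 19.11, 13.32, 66.24, 53.01, 15.75]

TOTAL_ALLOCATED = sum(B)
TOTAL_COST = sum(C)

# Feasibility: every system's cost fits its allocation, and the
# allocations together fit the overall budget limit.
EACH_WITHIN_ALLOCATION = all(cost <= budget for cost, budget in zip(C, B))
ALLOCATION_WITHIN_LIMIT = TOTAL_ALLOCATED <= BUDGET_LIMIT

if __name__ == "__main__":
    print(f"Total allocated: {TOTAL_ALLOCATED:.2f}")
    print(f"Total cost:      {TOTAL_COST:.2f}")
    print(f"Each cost within its allocation: {EACH_WITHIN_ALLOCATION}")
    print(f"Allocation within budget limit:  {ALLOCATION_WITHIN_LIMIT}")
```

A check of this kind is useful when transcribing the table, because a single mistyped entry would typically break one of the two conditions.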

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
