1. Introduction
Shock models have been widely studied in engineering to describe the failure process of systems operating in a random environment. Shocks can arise from sudden changes in the operating environment, such as overload, vibration and friction, which may cause system failure or reduce the residual lifetime. In the past few decades, shock models have received extensive attention in reliability engineering [1,2,3,4]. Various shock models have been defined and studied, which can be divided into five categories [5]: the cumulative shock model [6], the extreme shock model [7,8], the run shock model [9], the δ-shock model [10] and the mixed shock model [11,12]. In cumulative shock models, a system breaks down when the cumulative effect of shocks exceeds a certain threshold. In extreme shock models, a system fails when the magnitude of an individual shock exceeds a critical level, while in run shock models, the system fails when the number of consecutive shocks exceeding a critical magnitude reaches a certain threshold. In δ-shock models, system failure occurs when the time interval between two consecutive shocks is smaller than a critical value δ. A mixed shock model combines two different types of shock models. Under a mixed shock model, for example, a system may fail when the cumulative magnitude of shocks exceeds a critical level or when a prescribed number of consecutive shocks arrive, whichever comes first. In recent years, mixed shock models have also been studied extensively [13,14,15].
Current studies on shock models mostly focus on binary-state systems with perfect functioning and complete failure. However, many practical systems may work in multiple states due to complicated system structures and complex working environments. Such systems have more than two states, ranging from perfect working to total malfunction [16,17,18,19,20,21,22]. Multi-state systems have been widely applied in engineering practice, and extensive research on multi-state system reliability modeling has been conducted in recent years [23,24,25,26,27]. Eryilmaz [28] first established the extreme shock model for multi-state systems, where transitions among states were triggered by external shocks with different magnitudes. The model was extended to the case with mutative failure patterns by combining extreme and cumulative shock models, with the failure pattern depending on the state of the system [29]. The work in [30] considered a general multi-state balanced system where the components and the whole system both have multiple states. According to the component operating states, the system states are determined by system balance degrees, formulated by a balance function. Levitin and Finkelstein [31,32,33] proposed mission abort policies for multi-state systems to improve system reliability.
Recently, reliability modeling of self-exciting phenomena in various engineering fields has been drawing increasing attention. Self-exciting processes were first proposed by Hawkes [34,35] in 1971 to capture the earthquake mechanism of clustering and self-excitation. In [34], the self-exciting process was used to characterize the phenomenon that the occurrence of certain events increases the probability of future events, as well as the interaction between different series of events. Since then, numerous models have been established based on the concept of “self-exciting”. In [36], a traffic accident model based on self-exciting processes was developed and used to analyze the reliability and safety of a traffic network in a certain traffic situation. A mean-field model for monetary reserves was considered in [37], where the reserves were subjected to self-exciting and cross-exciting shocks. In addition, self-exciting models are widely applied in the field of software reliability. Wang [38] developed a software reliability model based on both mixture and self-exciting properties. Chen and Singpurwalla [39] pointed out that existing software reliability models were special cases of self-exciting point processes, and that such processes unified very diverse reliability modeling approaches.
In this paper, we incorporate a self-exciting characteristic into mixed shock models and investigate the reliability of the system. For practical engineering systems, although a single invalid shock cannot directly change the state of the system, once the number of invalid shocks reaches a certain threshold, the accumulated shocks also harm the system and accelerate its degradation. Such a phenomenon is defined as a self-exciting mechanism in our model. Based on this self-exciting mechanism, a multi-state mixed shock model is proposed. The system is subject to a sequence of random shocks, which are divided into valid shocks and invalid shocks by shock magnitude: a valid shock transfers the system directly to an adjacent worse state, whereas a single invalid shock has no immediate effect on the system. Moreover, a self-exciting behavior might be triggered competitively by invalid shocks. In particular, when the number of cumulative invalid shocks or consecutive invalid shocks reaches its threshold, the self-exciting mechanism is triggered, which leads the system to a worse state.
Failures of safety-critical systems may lead to severe economic losses and significant safety hazards [40,41,42,43,44,45,46]. To reduce the risk of catastrophic consequences caused by system failures, some safety-critical systems operating in harsh environments are equipped with a protective device to alleviate damage from external shocks, which can significantly improve system reliability. Taking the hydraulic control system as an example, it is equipped with an accumulator buffer device to reduce the damage caused by random shocks [47]. Additionally, the damage degree and recovery state of the protective device have a significant effect on the damage process and failure behavior of the system. Shafieian and Khiadani [48] proposed a multipurpose cooling and air-conditioning system to protect the engine by recovering waste heat from both the exhaust fumes and the cooling water of submarine engines. Another practical example of a cooling system was given in [49]. When the engine temperature increased to a certain level, a mixture of water, ethylene and glycol was used as a coolant to control the liner temperature. The coolant can be regarded as a protective device that reduces the risk of system failure.
In spite of the extensive research on shock models established for multi-state systems, few studies focus on modeling the self-exciting mechanism of invalid shocks and the triggering mechanism of protective devices. To further advance the state of the art of reliability modeling for multi-state shock models, we investigate a multi-state system equipped with a protective device considering the self-exciting mechanism of invalid shocks. The protective device is activated when the system state exceeds a critical value, and the protective effect reduces the probability of valid shocks. System reliability is evaluated by using the finite Markov chain imbedding approach (FMCIA), which has been proven to be an efficient method to overcome the intractability in analyzing system reliability [50,51,52,53,54,55]. The triggering threshold of the protective device is optimized to minimize the sum of the operating cost of the protective device and the system failure cost.
In this paper, we choose the hydraulic valve system as a potential application field for the proposed multi-state shock models. The hydraulic valve is a control component in the hydraulic system, which is used to control the pressure level, the flow rate and the flow direction of liquid in the hydraulic system to meet the requirements of system operation, and the operating state of the hydraulic valve is critical to the entire hydraulic system. Excessive vibration, oil leakage and serious fluid fluctuations that occur during the operating process are regarded as external random shocks to the hydraulic valve system. Depending on the magnitude of the external influences, the shocks are divided into valid shocks and invalid shocks. Although an invalid shock cannot change the operating state of the hydraulic valve system, it will have a cumulative impact defined as a self-exciting mechanism. A protective device is commonly equipped to protect the system from external shocks and thus reduce the risk of system malfunction.
To summarize, the main contributions of this paper to the current literature are presented as follows:
Multi-state systems with a protective device operating in a shock environment are investigated;
The self-exciting mechanism in shock models with multiple triggering conditions is considered;
The reliability of the system and protective device is evaluated analytically;
An optimal state-based triggering policy of the protective device is designed and optimized.
The remainder of this paper is organized as follows. In Section 2, model assumptions and descriptions are given. Considering the self-exciting mechanism, the reliability analysis of the multi-state system with a protective device by FMCIA is presented in Section 3. In Section 4, an optimal triggering policy of the protective device is proposed, and a corresponding simulation method is constructed to determine the optimal value. Section 5 gives numerical examples of hydraulic valve systems and discusses the sensitivity analysis of pivotal parameters. Finally, conclusions and future research directions are given in Section 6.
2. Model Description
Consider multi-state systems performing missions in a random environment characterized by a sequence of external shocks. Let the random variable denote the time interval between the (i−1)th and the ith shock, which follows a continuous phase-type distribution. External shocks can be divided into valid shocks and invalid shocks by shock magnitude, and the probability that a shock is valid (or invalid) is (or ). System failure occurs upon the cumulative number of valid shocks reaching a threshold S. Accordingly, the state space of the system is denoted by structured in order of increasing deterioration levels, where 1 denotes the perfect functioning state and denotes the failure state.
Self-exciting behavior of invalid shocks. A valid shock can transfer the system to an adjacent worse state, whereas an invalid shock has no effect on the system. A self-exciting behavior might be triggered by invalid shocks. When the system suffers cumulative invalid shocks or consecutive invalid shocks, the self-exciting mechanism is triggered, which can also lead the system to an adjacent worse state. It should be noted that when an invalid shock occurs and the self-exciting mechanism is triggered, then all invalid shocks in the current runs will no longer be counted in the next shock runs. This means that the previous invalid shock runs will not influence the self-exciting behavior of the current shock runs. It can be assumed that after triggering the protective device, the and also change. For simplicity, we consider fixed and . System failure occurs when the system degrades to state .
Modeling triggering and protective mechanisms. To reduce the risk of system failure, a protective device is used to reduce the damage of external shocks. The protective device is triggered upon the system state exceeding a critical value , and implies that the protective device is triggered once the system starts working. Once the protective device is triggered, it immediately exerts protective effects and reduces the probability of valid shocks to . At the same time, the probability of invalid shocks becomes ().
Figure 1 presents two possible realizations of the system state evolution processes. The system fails in state 5 and the protective device is triggered when the system transfers to state 3. Assume that
and
, implying that in the current run of invalid shocks, if the number of cumulative invalid shocks reaches 3 or the number of consecutive invalid shocks reaches 2, the self-exciting mechanism will be triggered, and the system will transfer to an adjacent worse state. In
Figure 1a, the self-exciting mechanism is triggered after suffering two consecutive invalid shocks, and the system progresses to state 2. Due to the arrival of a valid shock, the system is moved to state 3, and the protective device is triggered immediately. The system fails when it transfers to state 5, and the mission is not completed at this time. In
Figure 1b, two valid shocks move the system to state 3, and the protective device is triggered. In the current run of invalid shocks, the number of cumulative invalid shocks then reaches 3, and the system moves to state 4 due to the self-exciting behavior. In this case, the mission is completed before system failure.
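The state evolution described for Figure 1 can be traced with a short script. The following is a minimal sketch, not the authors' implementation: the function name `apply_shocks`, the parameter names `k_cum` and `k_con`, and the encoding of shocks as "V" (valid) and "I" (invalid) are illustrative assumptions. It also assumes that a valid shock resets only the consecutive counter while the cumulative counter persists until the self-exciting mechanism fires, which is consistent with the realization described for Figure 1b.

```python
def apply_shocks(shocks, n_states, k_cum, k_con):
    """Return the system state after each shock in `shocks`.

    shocks   : iterable of "V" (valid) or "I" (invalid)   [assumed encoding]
    n_states : index of the failure state (states are 1..n_states)
    k_cum    : cumulative invalid-shock threshold          [assumed name]
    k_con    : consecutive invalid-shock threshold         [assumed name]
    """
    state, cum, con = 1, 0, 0
    path = []
    for shock in shocks:
        if state >= n_states:      # system already failed; stop processing
            break
        if shock == "V":
            state += 1             # valid shock: move to adjacent worse state
            con = 0                # assumption: only the consecutive run is broken
        else:
            cum += 1
            con += 1
            if cum >= k_cum or con >= k_con:
                state += 1         # self-exciting mechanism fires
                cum = con = 0      # counters restart for the next run
        path.append(state)
    return path


# Thresholds from the Figure 1 discussion: 3 cumulative or 2 consecutive.
print(apply_shocks("IIV", n_states=5, k_cum=3, k_con=2))
print(apply_shocks("IVIVI", n_states=5, k_cum=3, k_con=2))
```

In the first call, two consecutive invalid shocks trigger the self-exciting mechanism and a valid shock then moves the system to state 3. In the second, the invalid shocks are separated by valid shocks, so only the cumulative threshold is reached.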
3. Reliability Analysis
3.1. Construction of a Markov Chain
This section uses the finite Markov chain imbedding approach to analyze and derive some reliability indicators of the system with a protective device. FMCIA has been widely applied in the domain of reliability. We construct the Markov chain with corresponding state space and then calculate the one-step transition probability matrix.
We first define three random variables in a sequence of random shocks. denotes the current state of the system. and represent the number of cumulative invalid shocks and consecutive invalid shocks in the current shocks run, respectively. When an invalid shock occurs and the self-exciting mechanism is triggered, all invalid shocks in the current invalid shocks run will no longer be counted in the trailing run.
Next, a Markov chain
with
,
and
is defined as
The collections , and are mutually exclusive subsets that partition the state space . Collection contains the transient states in which the protective device is in the waiting phase. Collection consists of the transient states in which the protective device is in the operating phase. is the absorbing state in the state space, which represents system failure.
Let and denote the probability of a valid shock and an invalid shock, respectively. Once the protective device is triggered, the probability of a valid shock and an invalid shock become and ().
The transition probability matrix
of the proposed Markov chain
can be obtained via the following transition rules, as presented in
Table 1.
According to the transition rules presented in
Table 1, the one-step transition probability matrix
can be obtained easily.
is a square matrix with order
, where
and
is the cardinality of the state space
. Hence,
is partitioned into four elements using the Markov chain theory as follows.
where
denotes the transition probability matrix among all transient states.
is the transition probability matrix from transient states to absorbing states.
denotes a zero matrix representing the transition probability matrix from absorbing states to transient states, and
is the transition probability matrix among the absorbing states.
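As an illustration of how such a chain and its partition can be assembled numerically, the sketch below enumerates the reachable states and fills the one-step transition matrix. It is a hedged reconstruction under stated assumptions rather than a transcription of Table 1: each chain state is taken to be a triple (system state, cumulative invalid count, consecutive invalid count), all failure states are merged into one absorbing state, the protective device is assumed active whenever the system state is at least the threshold `d`, and a valid shock is assumed to reset only the consecutive counter. All function and parameter names are hypothetical.

```python
import numpy as np

FAIL = "F"  # single absorbing state representing system failure


def build_chain(n_states, d, k_cum, k_con, p, p_prot):
    """Enumerate reachable transient states and build the transition matrix P."""
    def collapse(s):
        return FAIL if s[0] >= n_states else s

    trans, stack = {}, [(1, 0, 0)]
    while stack:
        s = stack.pop()
        if s in trans:
            continue
        x, c, r = s
        pv = p_prot if x >= d else p            # valid-shock probability
        after_valid = collapse((x + 1, c, 0))   # assumption: only r resets
        if c + 1 >= k_cum or r + 1 >= k_con:    # self-exciting mechanism fires
            after_invalid = collapse((x + 1, 0, 0))
        else:
            after_invalid = (x, c + 1, r + 1)
        trans[s] = [(after_valid, pv), (after_invalid, 1.0 - pv)]
        stack.extend(t for t, _ in trans[s] if t != FAIL)

    transient = sorted(trans)
    index = {s: i for i, s in enumerate(transient)}
    index[FAIL] = len(transient)                # absorbing state goes last
    P = np.zeros((len(transient) + 1, len(transient) + 1))
    for s, moves in trans.items():
        for t, prob in moves:
            P[index[s], index[t]] += prob
    P[-1, -1] = 1.0
    return transient, P


def split_chain(P):
    """Return (Q, R): transient-to-transient and transient-to-absorbing blocks."""
    return P[:-1, :-1], P[:-1, -1:]


transient, P = build_chain(n_states=4, d=3, k_cum=3, k_con=2, p=0.4, p_prot=0.2)
Q, R = split_chain(P)
print(len(transient), Q.shape, R.shape)
```

The probabilities 0.4 and 0.2 are placeholder values chosen for the sketch. Ordering the absorbing state last gives the block form directly, so the transient block and the absorption column can be sliced out of `P`.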
The system operating process can be divided into two stages by the state of the protective device. In the first stage, the protective device is in the waiting phase. The latter stage contains the operating process from the protective device being triggered to system failure.
To obtain the distribution of the waiting time of the protective device, all states after the protective device is activated are defined as an absorbing state, and the new one-step transition probability matrix
can be expressed as
where
is the order of the matrix
.
is the transition probability matrix among all transient states.
stands for the transition probability matrix from transient states to absorbing states.
is the transition probability matrix from absorbing states to transient states, and
denotes the one-step transition probability matrix among the absorbing states.
Example. Consider a system with a protective device that has four states (
), where state 1 denotes the perfect functioning state and state 4 represents the failure state. The protective device can be triggered when the system state reaches 3 (
). In the current invalid shocks run, if the number of cumulative invalid shocks reaches 3 or the number of consecutive invalid shocks reaches 2 (
), the self-exciting mechanism will be triggered, and the system will transfer to an adjacent worse state. Therefore, the state space of the constructed Markov chain is
Then, the one-step transition probability matrix
can be obtained as
The corresponding matrices
and
are established as
Define all the states after the protective device is triggered as an absorbing state, and the matrix
can be written as
3.2. Expected Waiting Time of Protective Device
Define random variable as the total number of shocks that the system has suffered when the protective device is triggered, and follows a discrete PH distribution denoted as where . Based on the above analysis, some probabilistic indices related to the shock length can be derived by the following equations.
The distribution function of the shock length
is
The probability mass function of the shock length
is
The expected shock length
is
where
,
and
.
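For a discrete phase-type random variable with initial vector α over the transient states and transient block Q, the standard expressions are P(N ≤ n) = 1 − αQⁿ1, P(N = n) = αQⁿ⁻¹(I − Q)1 and E[N] = α(I − Q)⁻¹1. The sketch below evaluates them numerically. The names `alpha`, `Q` and the function names are assumptions made for illustration, and the code assumes that α places no mass on the absorbing state.

```python
import numpy as np


def dph_cdf(alpha, Q, n):
    """P(N <= n) for a discrete phase-type variable with representation (alpha, Q)."""
    ones = np.ones(len(alpha))
    return 1.0 - alpha @ np.linalg.matrix_power(Q, n) @ ones


def dph_pmf(alpha, Q, n):
    """P(N = n) for n >= 1."""
    ones = np.ones(len(alpha))
    exit_vec = ones - Q @ ones          # one-step absorption probabilities
    return alpha @ np.linalg.matrix_power(Q, n - 1) @ exit_vec


def dph_mean(alpha, Q):
    """E[N] = alpha (I - Q)^{-1} 1, computed with a linear solve."""
    ones = np.ones(len(alpha))
    return alpha @ np.linalg.solve(np.eye(len(alpha)) - Q, ones)


# Sanity check on a geometric distribution (one transient state).
alpha = np.array([1.0])
Q = np.array([[0.7]])
print(dph_pmf(alpha, Q, 1), dph_cdf(alpha, Q, 2), dph_mean(alpha, Q))
```

With a single transient state the distribution is geometric, which gives an easy check on the three functions before applying them to a larger chain.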
We assume that the inter-arrival time of two consecutive external shocks
follows a continuous phase-type distribution with a representation
. The random variable
denotes the time that the protective device is triggered, which can be derived as
In line with the closure properties of phase-type distributions, we use a matrix-based method to obtain the distributions of
, which is shown as follows
where
,
is an identity matrix and
is the Kronecker product.
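The closure property used here can be illustrated numerically. A well-known result states that if the shock count N is discrete phase-type with representation (α, Q), and the inter-arrival times are independent and identically distributed continuous phase-type variables with representation (β, S), independent of N, then the random sum of N inter-arrival times is continuous phase-type with initial vector α ⊗ β and sub-generator I ⊗ S + Q ⊗ (s⁰β), where s⁰ = −S1. The sketch below builds this representation and computes its mean. The symbol and function names are assumptions, and the code assumes α sums to one.

```python
import numpy as np


def compound_ph(alpha, Q, beta, S):
    """Representation (gamma, G) of the sum of N i.i.d. PH(beta, S) variables,
    where N is discrete phase-type with representation (alpha, Q)."""
    m, n = len(alpha), len(beta)
    exit_rates = -S @ np.ones(n)                   # s0 = -S 1
    gamma = np.kron(alpha, beta)
    G = np.kron(np.eye(m), S) + np.kron(Q, np.outer(exit_rates, beta))
    return gamma, G


def ph_mean(gamma, G):
    """Mean of a continuous phase-type variable: -gamma G^{-1} 1."""
    return -gamma @ np.linalg.solve(G, np.ones(len(gamma)))


# Illustrative values: two-phase shock count, exponential inter-arrival times
# with rate 2 (an exponential is a PH distribution with one phase).
alpha = np.array([1.0, 0.0])
Q = np.array([[0.5, 0.3], [0.0, 0.6]])
beta = np.array([1.0])
S = np.array([[-2.0]])

gamma, G = compound_ph(alpha, Q, beta, S)
print(ph_mean(gamma, G))
```

A useful consistency check is Wald's identity: the mean of the random sum should equal the expected number of shocks multiplied by the mean inter-arrival time.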
The cumulative distribution function of
can be derived as
The reliability function of
can be expressed as
The expected value of
is given by
3.3. Expected System Lifetime
To obtain the distribution of system lifetime, we define random variable as the total number of shocks until the system fails. Similarly, follows a discrete PH distribution denoted as where . Some probabilistic indices related to can be obtained as follows.
The distribution function of the shock length
is
The probability mass function of the shock length
is
The expected shock length
is
where
and
.
Similarly, we define the random variable
as the system lifetime, and it can be derived as
Due to
, we have
where
,
is an identity matrix and
is the Kronecker product.
The cumulative distribution function of
can be derived as
The reliability function of
can be expressed as
The expected value of
is given by
4. Optimal Triggering Policy of the Protective Device
In this section, we consider the optimal policy of the trigger threshold of the protective device. Assume that the multi-state system executes a mission that requires continuous operation for a duration of . If the system completes the mission before failure, the system can survive successfully. According to the operating process of the system, it can be divided into the following three situations.
Situation 1: The system completes the mission at
. At this time, the system has not degraded to state
, and the protective device has not been triggered. The probability of this situation is
Situation 2: The system completes the mission at
and survives successfully. Additionally, the protective device is triggered before mission completion. The probability of this situation is
Situation 3: The system fails before mission completion. According to model assumptions, the protective device is triggered during the system operating process. The probability of this situation is
On the one hand, in the actual system operating process, system failure often causes serious economic losses. To reduce the probability of system failure before mission completion as much as possible, the protective device should be activated as soon as possible to resist external shocks and reduce the risk of system failure. On the other hand, the protective device has an operating cost when it resists external shocks, and its operating cost is proportional to the operating time. Triggering the protective device too early will cause an unnecessary waste of operating costs. Therefore, we need to optimize the triggering threshold of the protective device to minimize the total cost of system operation.
In our model, two possible costs are considered.
represents the operating cost per unit time of the protective device, and
denotes the cost of system failure. Thus, the total cost is given as
The expected total cost is calculated by
The first part of Equation (22) indicates that the protective device has not been activated when the mission is complete, so no cost must be considered. The second part indicates that the system completes the mission successfully after the protective device is triggered. The total cost is the operating cost of the protective device with a duration of . The third part of this formula means that the system reaches the failure state before the mission is completed. In this situation, the total cost contains the operating cost of the protective device from being triggered to system failure time and the system failure cost.
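Written out, the three cases described above correspond to a piecewise cost. The notation in the following block is assumed for illustration and may differ from the paper's own symbols: τ is the mission duration, T_w the time at which the protective device is triggered, T_f the system lifetime (so T_w ≤ T_f), c_1 the operating cost per unit time of the protective device and c_2 the cost of system failure.

```latex
C =
\begin{cases}
0, & \tau < T_w, \\[4pt]
c_1\,(\tau - T_w), & T_w \le \tau < T_f, \\[4pt]
c_1\,(T_f - T_w) + c_2, & T_f \le \tau,
\end{cases}
\qquad
E[C] = c_1\,E\!\left[(\tau - T_w)\,\mathbf{1}\{T_w \le \tau < T_f\}\right]
     + E\!\left[\bigl(c_1\,(T_f - T_w) + c_2\bigr)\,\mathbf{1}\{T_f \le \tau\}\right].
```

Both expectations involve the joint behavior of T_w and T_f, which are dependent because the lifetime includes the waiting time of the protective device.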
It is intractable to derive the analytical form of
due to the dependence between random variables
and
. Thus, a simulation method is employed to evaluate the optimal triggering threshold for the protective device. The flowchart of the proposed simulation procedure is shown in
Figure 2. Here,
is the number of simulation runs, and
denotes the maximal number of shocks that may arrive during a simulation. The main simulation procedure is given below.
Step 1: Initialize the model parameters.
Step 2: Generate two random variables to simulate the external shock process.
Step 3: Check whether the system reaches the triggering threshold of the protective device and whether the mission is completed.
Step 4: Simulate the system operating process after the protective device is triggered.
Step 5: Check whether the mission is completed before the system fails and calculate the corresponding cost.
Step 6: Obtain the result of a round of simulation.
Step 7: Obtain the average operating cost by the simulation results.
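The steps above can be sketched as a small Monte Carlo routine. This is an illustrative approximation of the procedure, not the authors' code: it assumes exponential inter-arrival times with rate `lam`, assumes a valid shock resets only the consecutive counter, runs each replication until the mission ends or the system fails instead of capping the number of shocks, and uses hypothetical parameter names throughout.

```python
import random


def simulate_cost(d, n_states, k_cum, k_con, p, p_prot, lam, tau,
                  c_op, c_fail, runs=10000, seed=0):
    """Estimate the expected total cost for triggering threshold d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        t, state, cum, con = 0.0, 1, 0, 0
        t_trigger = 0.0 if d <= 1 else None      # device on from the start if d = 1
        cost = 0.0
        while True:
            t += rng.expovariate(lam)            # arrival time of the next shock
            if t >= tau:                         # mission completed first
                if t_trigger is not None:
                    cost = c_op * (tau - t_trigger)
                break
            pv = p_prot if t_trigger is not None else p
            if rng.random() < pv:                # valid shock
                state += 1
                con = 0
            else:                                # invalid shock
                cum += 1
                con += 1
                if cum >= k_cum or con >= k_con:
                    state += 1                   # self-exciting mechanism fires
                    cum = con = 0
            if t_trigger is None and state >= d:
                t_trigger = t                    # protective device triggered
            if state >= n_states:                # system failure before tau
                op_time = t - t_trigger if t_trigger is not None else 0.0
                cost = c_op * op_time + c_fail
                break
        total += cost
    return total / runs


# Placeholder parameter values chosen only to exercise the routine.
params = dict(n_states=10, k_cum=4, k_con=3, p=0.4, p_prot=0.2,
              lam=1.0, tau=20.0, c_op=1.0, c_fail=100.0, runs=2000)
costs = {d: simulate_cost(d=d, **params) for d in range(1, 10)}
print(min(costs, key=costs.get))
```

Fixing the seed makes the random shock stream identical across thresholds, which reduces noise when the estimated costs for different values of `d` are compared by enumeration.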
5. Illustrative Example
For an illustration of the proposed model and optimization policy in the previous sections, numerical examples are given considering hydraulic valve systems in this section. We consider that the hydraulic valve system executes a mission with a duration of . Excessive vibration, serious fluid fluctuations, and oil leakage that occur during the operating process are regarded as random shocks to the hydraulic valve system. According to the magnitude of external influences, shocks are divided into valid shocks and invalid shocks. Valid shocks can transfer the system to an adjacent worse state directly. The invalid shock has a self-exciting mechanism and will accelerate the degradation process of the system when its quantity reaches a certain level.
To mitigate the risk of hydraulic valve failure, a protective device is designed to protect the system from external shocks by reducing the probability of valid shocks. When the state of the hydraulic valve system degenerates to a preset level due to external shocks, the protective device is triggered and reduces the probability of valid shocks to .
Assume that the time interval between two successive shocks follows the exponential distribution with parameter (i.e., ). The probability that a shock is valid (or invalid) is (or ). We assume and , which means that when the system suffers 4 cumulative invalid shocks or 3 consecutive invalid shocks, the self-exciting mechanism is triggered, and the system transfers to an adjacent worse state. The system has 10 states, and state 10 denotes system failure. The protective device is activated in state 5 and reduces the probability of valid shocks to . The mission time is . The operating cost per unit time of the protective device is , and denotes the cost of system malfunction.
Figure 3 shows the sensitivity analysis of system reliability over the mission under different values of
. When the value of
increases, the system reliability decreases continuously. The trend shown in
Figure 3 agrees with intuition. An increase in
implies that shocks arrive more densely, so the system suffers more external shocks over the same period of time. Therefore, the system degrades faster, leading to lower reliability.
Changes in system reliability under different probabilities of valid shocks when the triggering threshold
increases are presented in
Figure 4. In general, the trends of the three curves are roughly the same. As the triggering threshold increases, the system reliability decreases continuously. The increase in
means that the protective device is triggered when the system is in a worse state, so the protective device has less time to resist external shocks. Naturally, the reliability of the system shows a downward trend. By comparing three curves with different
, it is found that when the probability of a valid shock is high, the system moves to a worse state more easily, and the reliability is relatively low at any triggering threshold.
As mentioned above, the increase in
means that the system reliability decreases. When
is small, the system has a high probability of completing the mission before the protective device is triggered. In this situation, the total operating cost is very low. With the increase in
, the operating cost of the system shows an obvious upward trend due to a higher risk of system failure. The above conclusion is shown intuitively in
Figure 5.
For the optimization model of the triggering threshold
, an enumeration method is adopted to find optimal
and related
. The curve in
Figure 6 is convex, and there exists an optimal
to minimize the total cost. As the triggering threshold increases, the minimal expected cost first decreases, because a later trigger shortens the operating time of the protective device and thus lowers its operating cost, and then increases, because the shorter protection period raises the risk of system failure. It can be observed that triggering the protective device in state 5 yields the minimum total operating cost
where
equals 2 and
takes value of 0.4.
To further demonstrate the applicability of the optimization policy, we conduct a sensitivity analysis of the minimum expected cost with different values of
and
. It can be observed from
Table 2 that when the value of
increases, the optimal triggering threshold decreases and the minimal expected cost increases. The optimal triggering threshold is state 5, and the corresponding expected total cost is 120.11 with
. Under the condition of
, we can obtain the minimal expected cost
in state 6. For the system, an increase in
means a worse operating environment. Therefore, the protective device should be triggered earlier to resist external shocks and reduce the expected total cost. When the probability of a valid shock is small, the system suffers fewer valid shocks and operates longer, which means that the failure process of the system is deferred. Therefore, the protective device tends to be activated later to save operating costs and obtain the minimum total cost.
From
Table 3, it is observed that when
remains unchanged, as
increases, the total operating cost increases and the optimal triggering threshold decreases. The increase in
means that the cost of system failure is higher, so it is necessary to trigger the protective device earlier to reduce the risk of system failure. Similarly, comparing the optimal triggering thresholds in the cases of
and
given that
, it can be observed that when the operating cost of the protective device increases, the protective device tends to be triggered at a later stage to reduce its operating time, thereby reducing the expected total cost. In addition, when
and
increase, the minimal expected total cost of the system also increases.
6. Conclusions
In this paper, multi-state systems are studied, considering the self-exciting mechanism of invalid shocks and the protective effect. In an invalid shock run, when the number of cumulative or consecutive invalid shocks reaches the corresponding threshold, the self-exciting mechanism will be triggered to accelerate the state transition of the system. When the system degrades to a certain level, the protective device is triggered to delay the process of system failure by resisting external shocks. The finite Markov chain imbedding approach is used to derive the expected waiting time of the protective device, expected system lifetime and system reliability. In addition, a simulation method is proposed to find the optimal triggering threshold of the protective device minimizing the expected total system operating cost. Finally, illustrative examples of hydraulic valve systems are given to verify the effectiveness of the model and optimization policies.
The current study can be extended in the following directions. First, the considered system is subject only to damage from external shocks; in future research, internal degradation could be incorporated into the reliability model. Second, the current study considers a single-component system; further studies could consider multi-component systems with multiple protective devices, together with the corresponding optimal triggering policy. Finally, the triggering policy in this study is state-based, and a multi-criteria triggering policy with both state and time thresholds could be considered.