Article

Safety-Oriented System Hardware Architecture Exploration in Compliance with ISO 26262

1
College of Electrical Engineering and Computer Science, National Taipei University, New Taipei City 237303, Taiwan
2
Department of Electrical Engineering, National Taipei University, New Taipei City 237303, Taiwan
*
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(11), 5456; https://doi.org/10.3390/app12115456
Submission received: 10 May 2022 / Revised: 23 May 2022 / Accepted: 25 May 2022 / Published: 27 May 2022
(This article belongs to the Special Issue Novel Methods and Technologies for Intelligent Vehicles)

Featured Application

The proposed safety-oriented system hardware architecture exploration can be applied to achieve an ISO-26262-compliant hardware architecture for the safety-critical automotive system.

Abstract

Safety-critical intelligent automotive systems require stringent dependability while the systems are in operation. Therefore, safety and reliability issues must be addressed in the development of such safety-critical systems. Nevertheless, incorporating safety/reliability requirements into the system raises the design complexity considerably. Furthermore, the international safety standards only provide guidelines and lack a concrete design methodology and flow. Therefore, developing an effective safety process to assist system engineers in tackling the complexity of system design and verification, while also satisfying the requirements of international safety standards, has become an important and valuable research topic. In this study, we propose a safety-oriented system hardware architecture exploration framework, which incorporates fault tree-based vulnerability analysis with safety-oriented system hardware architecture exploration to rapidly discover an efficient solution that complies with the ISO-26262 safety requirements and the hardware overhead constraint. A failure mode, effect, and diagnostic analysis (FMEDA) report is generated after the exploration is performed. The proposed framework can assist system engineers in designing, assessing, and enhancing the safety/robustness of a system in a cost-effective manner.

1. Introduction

Safety-critical intelligent automotive systems such as autonomous driving systems, advanced driver assistance systems, and drive-by-wire systems require stringent dependability while the systems are in operation. Therefore, safety and reliability issues must be addressed in the development of safety-critical systems. When the vehicle control functions are implemented by electronic control systems, functional safety issues become so critical that such systems should be developed with strict safety requirements. Furthermore, safety issues should be considered with the highest priority during the whole lifecycle of a safety-critical system. To carry out such safety-oriented system development, the functional safety standard, ISO-26262, was established [1].
ISO-26262 was first published in 2011, specific to the application sector of electrical and/or electronic (E/E) systems within road vehicles. The primary purpose of this standard is to conduct a safety life cycle for electronic systems. ISO-26262 has been accepted worldwide as the technical “state-of-the-art” for safety-critical automotive systems [2]. In ISO-26262, a safety life cycle includes the concept phase, product development phase, and the production and operation planning phase. During the safety life cycle, considered issues cover initialization of the product concept, specification establishment, product design, and pre-production tests. All these issues put emphasis on functional safety consideration. At the product development phase, the V-model [3] is adopted as the primary design, verification, and validation flow. This phase is further divided into three different levels: System level, hardware level, and software level. For these levels, functional safety requirements are verified and validated through failure mode and effect analysis (FMEA), fault tree analysis (FTA), and safety-related metrics.
The ISO-26262 standard adopts the ASILs (Automotive Safety Integrity Levels) to measure whether the developed systems have achieved the demanded safety level or not. There are four ASILs defined in the standard, from A to D, where ASIL A is the lowest and ASIL D is the highest safety level. Stricter requirements need to be fulfilled if a higher ASIL is specified.
In this study, based on the ISO-26262 functional safety standard, we propose a safety-oriented hardware architecture exploration framework for safety-critical automotive systems. The proposed framework integrates the hardware architecture exploration algorithms with the FTA-based weak-point analysis to quickly find an efficient system solution that complies with the ISO-26262 ASIL requirement as well as the hardware overhead constraint. We employ the autonomous emergency braking (AEB) system to demonstrate the effectiveness of the proposed design framework. The AEB system integrates the brake-by-wire system proposed in [4] with automotive radar sensors and a speed sensor to implement a front-vehicle collision avoidance system.
The paper is organized as follows. Section 2 discusses the related works and Section 3 introduces the ISO-26262 hardware architecture metrics. The safety-oriented system hardware exploration framework, including fault tree analysis with vulnerability identification, safety mechanism deployment, and hardware overhead constraint conformance, is proposed in Section 4. A case study demonstration is illustrated in Section 5. Conclusions and future works appear in Section 6.

2. Related Works

To implement the safety-oriented system architecture exploration framework, there are two infrastructural technical bases; one is the safety analysis and the other is the safety improvement. In this section, related works for these two technical bases are discussed and compared to our proposed framework.
In [5], an accurate and well-explained practical guide to the specific techniques appropriate for PMHF (probabilistic metric for hardware failure) calculation using FTA was proposed. This paper presented a structured and systematic quantitative FTA while showing various schemes for calculating the PMHF considering both single-point faults and dual-point latent faults. In [6], the author performed quantitative assessments of ISO-26262 hardware architecture metrics by means of a fault tree analysis. A custom coverage gate was firstly proposed to represent the diagnostic coverage related to a safety mechanism. The advantage of introducing the coverage gate is to reduce the complexity of fault tree construction. The author of [7] presented generalized formulas for the calculation of PMHF in non-redundant and redundant subsystems using observable parameters, such as the failure rate of a mission function and a safety mechanism, the diagnostic coverages of the primary and secondary safety mechanisms, and the diagnostic period of the secondary safety mechanisms to expand the scope of the application according to ISO 26262. A mixed model based on FTA and the Markov chain was proposed in [8] to evaluate random hardware failures of the whole-redundancy system in ISO 26262. The mixed model presented in [8] tried to solve the problem of calculating the PMHF in the whole-redundancy system, whose fault tree only contains several dynamic logic gates.
Another study [9] illustrated the application of hardware reliability calculation procedures according to the ISO 26262 standard. This paper described computational procedures with the derivation and explanation of mathematical formulas for various hardware architectures of electronic systems. The described formulas consider the impact of multiple failures and the impact of self-tests, but the formulas are relatively simple. The research in [10] aimed to provide a framework for quantitative FTAs, while considering periodic inspections and repairs, which are the key assumption of the standard. The framework is based on models of the Markov stochastic process and the PMHF equations derived from those models. Further research [11] considered the design-phase safety analysis of vehicle guidance systems. The proposed approach constructed dynamic fault trees (DFTs) to model a variety of safety concepts and E/E architectures for drive automation. Our previous work [12] proposed an FTA-based weak-point analysis methodology for the safety-critical automotive systems, but only the safety metric, PMHF, was considered.
On the other hand, previous literature [13,14,15,16,17,18,19,20,21,22,23] has demonstrated how to accomplish safety improvement through architecture exploration, which takes both the overall system safety and hardware overhead as the design metrics. Reference [13] is a survey paper that collected and compared the published literature addressing safety-oriented system architecture exploration in recent years. According to [13], combined with our survey results, we find that recent studies [14,15,16,17,18,19,20,21,22,23] have proposed feasible solutions for safety-aware hardware cost-optimization techniques. The main idea of [14,15,16,17] is to identify the removable hardware elements in an existing system hardware architecture so the system safety and operation performance requirements can still be met. For example, a processor core is removable if the tasks allocated to this processor can be ported to other processors without violating the safety and timing requirements. Therefore, the hardware overhead can be reduced after removing these identified hardware elements. However, the proposed techniques [14,15,16,17] can only meet the requirement of one single safety-related design metric. Thus, these techniques are not feasible for systems with more than one safety-related design metric to be fulfilled.
References [18,19,20,21,22,23] share a very similar core concept. The hardware architecture exploration frameworks proposed in these studies first try to satisfy the system safety requirements in compliance with the target ASIL and then reduce the hardware overhead through ASIL decomposition, which is a technique adopted in ISO-26262 for cost-effective considerations. After applying ASIL decomposition, hardware elements or subsystems with a higher ASIL can be decomposed into hardware elements with lower design complexity, and the required ASIL can be lowered accordingly. In such a way, the overall hardware overhead can be effectively reduced. Furthermore, verification and testing efforts can also be reduced due to the lowered ASIL. However, these papers only considered the ideal case, i.e., the case in which ASIL decomposition is feasible. For real automotive systems, ASIL decomposition may not be applicable when the hardware elements are provided by the suppliers in hard-core form. Thus, the methodology proposed in [18,19,20,21,22,23] is limited to systems that allow ASIL decomposition. To avoid such a limitation, the proposed hardware architecture exploration framework in this study does not adopt ASIL decomposition as the main scheme for hardware overhead reduction.
Compared to the previous works, the advantages of the proposed architecture exploration framework in this study primarily concern the following two aspects:
(1)
Complying with more safety-related design metrics.
The safety-aware architecture exploration methodologies proposed in the literature tend to only comply with an overall system reliability or PMHF safety metric requirement. However, PMHF is only one of the three safety metrics required in ISO-26262. In fact, two other specific safety-related design metrics, the single-point fault metric (SPFM) and latent-fault metric (LFM), also play important roles in safety-critical design, especially when ISO-26262 is the primary safety norm to be complied with. Thus, we should consider SPFM, LFM, and PMHF metrics holistically. Otherwise, the developed safety-critical system could still contain unknown safety vulnerability, which could lead to reliability and safety problems and violate the requirements of ISO-26262. For example, if multiple-point faults in a system are not considered, then latent faults could exist in such a system. Therefore, specific safety-related design metrics such as SPFM and LFM representing the fault-tolerant capabilities of single-point faults and multiple-point faults are required to be defined and fulfilled. In this study, the proposed safety-oriented system architecture exploration framework is highly customized for ISO-26262. We specify the three safety-related design metrics, SPFM, LFM, and PMHF, requested by ISO-26262 as the primary targets to be achieved for safety-critical automotive systems. Thus, the proposed framework can assure that the obtained system architecture can fulfill the safety requirements for single-point and multiple-point fault tolerance as well as overall system reliability and safety.
(2)
Satisfying safety-related design metrics and hardware overhead constraints simultaneously in a very limited number of design iterations.
In addition to the safety-related design metrics required by ISO-26262, the proposed system architecture exploration framework also takes hardware overhead into account. Thus, there are four design metrics to be satisfied, and the high design complexity encountered in previous works [13,14,15,16,17,18,19,20,21,22,23] has to be resolved. In this work, we show through a real safety-critical automotive system demonstration that the proposed hardware architecture exploration framework requires only a very limited number of design iterations to achieve a system hardware architecture that simultaneously satisfies the three safety-related design metrics and the hardware overhead constraint.

3. ISO-26262 Hardware Architecture Metrics

Failure mode, effect, and diagnostic analysis (FMEDA) is a systematic analysis technique to obtain subsystem/product level failure rates, failure modes, and diagnostic capability. The main purpose of FMEDA in ISO-26262 is to evaluate hardware architecture metrics and safety goal violations due to random hardware failures and provide sufficient information to improve safety gaps if the required hardware safety level is not fulfilled. The hardware architecture metrics include the single-point fault metric (SPFM), latent-fault metric (LFM), and probabilistic metric for hardware failure (PMHF). SPFM reflects the robustness of the item to single-point and residual faults by either coverage from safety mechanisms or design (primarily safe faults). A high SPFM implies that the proportion of single-point faults and residual faults in the hardware of the item is low. LFM reflects the robustness of the item to latent faults by either coverage of faults in safety mechanisms or by the driver recognizing that the fault exists before the violation of the safety goal, or by design (primarily safe faults). A high latent-fault metric implies that the proportion of latent faults in the hardware is low. Finally, PMHF is a probabilistic metric for evaluating the violation of the considered safety goal due to random hardware failure. Table 1 lists the target values for SPFM, LFM, and PMHF under different ASILs. It is worth noting that there is no target value for ASIL A.
To acquire PMHF, SPFM, and LFM, the failure rates λS, λSPF, λRF, and λDPF,L, which are associated with safe faults, single-point faults (SPF), residual faults (RF), and latent dual-point faults (DPF), respectively, have to be derived in advance. Figure 1 shows the failure rate calculation process according to ISO-26262, where λC(i) is the failure rate of the ith safety-related component C(i) and the system is assumed to have n safety-related elements.
For a safety-related hardware element, C(i), its faults consist of safe and non-safe faults. The safe faults will not cause a safety goal violation. Therefore, only the non-safe faults could cause a safety goal violation and contribute to the λSPF if there is no safety mechanism to protect the faults. If there is a safety mechanism to prevent the faults of C(i) from causing a safety goal violation, then the faults not covered by the safety mechanism are identified as residual faults of C(i), and the corresponding failure rate, $\lambda_{RF}$, can be derived from the following expression (1)
$$\lambda_{RF} = \lambda_{C(i)} \times \text{percentage of non-safe faults} \times \left(1 - DC_{RF}^{C(i)}\right) \tag{1}$$
where $\lambda_{C(i)}$ and $DC_{RF}^{C(i)}$ are the failure rate and the diagnostic coverage with respect to the residual faults of C(i).
In Figure 1, the faults of C(i) that have no potential to directly cause the system to violate the safety goals are identified as multiple-point faults. There are two sources of the multiple-point faults; the first one is attributed to the single-point faults covered by the safety mechanism, and the second one is from faults that can cause the system to violate the safety goals only when the fault in C(i) combines with one or more other independent faults. Because the probability of violation of the safety goal contributed to by three or more faults is low enough, we treat them as safe faults in this study. Thus, only the latent dual-point faults (DPF) are considered, which can be calculated by the following expression (2)
$$\lambda_{DPF,L} = \lambda_{C(i)} \times \text{percentage of non-safe faults} \times DC_{RF}^{C(i)} \times \left(1 - DC_{DPF,L}^{C(i)}\right) \tag{2}$$
where $DC_{DPF,L}^{C(i)}$ is the diagnostic coverage of C(i) with respect to latent dual-point faults.
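As an illustration of expressions (1) and (2), the following minimal Python sketch computes the residual and latent dual-point failure rates of a single hardware element. The function names and the numerical values (failure rate, non-safe fraction, diagnostic coverages) are hypothetical and chosen only for illustration; they are not taken from the case study.

```python
def residual_failure_rate(lam, nonsafe_fraction, dc_rf):
    """Expression (1): failure rate of the residual faults of one element."""
    return lam * nonsafe_fraction * (1.0 - dc_rf)


def latent_dpf_failure_rate(lam, nonsafe_fraction, dc_rf, dc_dpf_l):
    """Expression (2): failure rate of the latent dual-point faults of one element."""
    return lam * nonsafe_fraction * dc_rf * (1.0 - dc_dpf_l)


# Hypothetical element: 1e-7 failures per hour, 80% non-safe faults,
# 90% coverage of residual faults, 60% coverage of latent dual-point faults.
lam_c = 1e-7
lam_rf = residual_failure_rate(lam_c, 0.8, 0.90)
lam_dpf_l = latent_dpf_failure_rate(lam_c, 0.8, 0.90, 0.60)

print(f"lambda_RF    = {lam_rf:.3e} per hour")
print(f"lambda_DPF,L = {lam_dpf_l:.3e} per hour")
```

With these assumed inputs, raising the residual-fault coverage lowers $\lambda_{RF}$ but shifts more of the element's non-safe failure rate into the dual-point category, which is why the latent-fault coverage matters as well.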

4. Safety-Oriented System Hardware Architecture Exploration Framework

In this study, we propose a safety-oriented system hardware architecture exploration framework whose goal is to achieve a system hardware architecture that complies with the ISO-26262 safety metrics and the hardware overhead constraint simultaneously. Figure 2 exhibits the overall flow of the proposed framework.
First, the ISO-26262 safety metrics are calculated for the initial system hardware architecture provided by the system engineers and compared to the target values to determine whether the target ASIL is achieved or not. If the target ASIL cannot be achieved, the safety-oriented system architecture exploration is performed to apply appropriate safety mechanisms that reduce the whole system's failure rates. Such system architecture exploration is repeated until all the safety metrics are satisfied. After that, if the system hardware architecture meets the ASIL goal but fails to satisfy the hardware overhead constraint, it is used as the starting point to explore a final solution that satisfies the safety metrics and the hardware overhead constraint simultaneously. We consider the safety metrics and the hardware overhead constraint sequentially because of the complexity problem. The idea of our architecture exploration methodology is first to discover a feasible solution that meets the safety metrics, and then to adjust the hardware architecture of that solution so that the safety metrics and the hardware overhead constraint are satisfied simultaneously. By addressing the safety metrics and the hardware overhead constraint separately, our architecture exploration methodology keeps the complexity of the exploration manageable.
The safety-oriented system architecture exploration framework comprises the following three phases:
i.
FTA-based weak-point analysis.
In this phase, we apply the well-known and widely adopted safety analysis methodology, fault tree analysis, to identify all the hardware elements that cause the safety metrics to be unachievable. Furthermore, we can utilize the quantitative FTA to evaluate the failure probabilities for the MCS (Minimal Cut Set) and determine the MCSs, which are the weak points of the system, by comparing their failure probabilities to the target failure probability for the required ASIL. The details for this analysis are illustrated in Section 4.2.
ii.
ASIL-oriented hardware architecture exploration algorithm.
In the previous phase, the hardware elements identified as weak points have been listed. Thus, safety mechanisms need to be deployed to those hardware elements to reduce their failure rates. The issue for this phase is to propose an effective measure that evaluates the extent of the failure rate reduction required to achieve the target ASIL goal and then to recognize which safety mechanisms are sufficient for the identified hardware elements. The proposed ASIL-oriented hardware architecture exploration, which is introduced in Section 4.3.2, addresses this issue.
iii.
HO-oriented hardware architecture exploration algorithm.
In phase ii, the safety mechanisms are used to reach the target ASIL goal without considering the cost of hardware overhead induced from the deployed safety mechanisms. Thus, the additional hardware overhead could cause the overall system hardware cost to exceed the acceptable upper bound. To address this issue, we propose a hardware overhead (HO)-oriented hardware architecture exploration methodology. Through this methodology, the system hardware architecture with safety mechanism deployment derived from the previous phase is analyzed to recognize the bottleneck when the additional hardware overhead exceeds the constraint specified by the system engineers. Accordingly, the system hardware is adjusted to meet the safety metrics and hardware overhead constraint simultaneously. More details can be found in Section 4.3.3.
After the safety-oriented system architecture exploration is accomplished and both the safety metrics and the hardware overhead constraint are satisfied, the final system hardware architecture is obtained together with the additional hardware cost invested. In addition, the corresponding FMEDA report is generated to provide more detailed information for the designers. The effectiveness of the proposed framework is demonstrated with an autonomous emergency braking system design as described in Section 5.
In the following subsections, we will introduce the proposed safety-oriented system hardware architecture exploration framework and illustrate the details of the exploration process through a simple example.

4.1. Problem Formulation

Before the problem can be formalized, the following notations are defined first:
  • n: The number of safety-related hardware elements in the system;
  • C(i): The ith safety-related component, where 1 ≤ i ≤ n;
  • λC(i): Failure rate of the ith safety-related component;
  • λS, λSPF, λRF, and λDPF,L: The failure rates associated with a safe fault, single-point fault (SPF), residual fault (RF), and latent dual-point fault (DPF), respectively;
  • SPFMtar, LFMtar, and PMHFtar: The target values for SPFM, LFM, and PMHF in accordance with the target ASIL;
  • $PF_{SPFM_{tar}}(t)$, $PF_{LFM_{tar}}(t)$, and $PF_{PMHF_{tar}}(t)$: The target failure probabilities for achieving the SPFMtar, LFMtar, and PMHFtar safety metrics at mission time t;
  • SPFMite_d, LFMite_d, and PMHFite_d: The values of SPFM, LFM, and PMHF in the dth design iteration, where d ≥ 0.
With the notations defined above, the main goal of the safety-oriented system hardware architecture exploration framework can be formalized as:
  • Explore the system hardware design space to determine a hardware architecture such that the following requirements can be fulfilled:
    PMHFite_d < PMHFtar.
    SPFMite_d ≥ SPFMtar.
    LFMite_d ≥ LFMtar.
Based on Table 1, the target values of PMHFtar, SPFMtar, and LFMtar can be determined according to the target ASIL. For example, if the target ASIL B is selected, then PMHFtar, SPFMtar, and LFMtar are specified as $10^{-7}\ \mathrm{h}^{-1}$, 90%, and 60%, respectively. On the other hand, PMHFite_d, SPFMite_d, and LFMite_d can be derived from the following formulae provided by ISO-26262 for the hardware architecture metrics calculation.
PMHFite_d can be calculated by the following expression (3)
$$PMHF_{ite\_d} = \sum_{\text{Safety-Related hardware (HW) elements}} \left(\lambda_{SPF} + \lambda_{RF} + \lambda_{DPF,L}\right) \tag{3}$$
Next, SPFMite_d can be computed by the following expression (4)
$$SPFM_{ite\_d} = 1 - \frac{\sum_{\text{Safety-Related HW elements}} \left(\lambda_{SPF} + \lambda_{RF}\right)}{\sum_{\text{Safety-Related HW elements}} \lambda} \tag{4}$$
where $\sum_{\text{Safety-Related HW elements}} \lambda$ represents the total failure rate of all safety-related hardware elements and can be calculated by $\sum_{\text{Safety-Related HW elements}} \lambda = \sum_{\text{Safety-Related HW elements}} \left(\lambda_{SPF} + \lambda_{RF} + \lambda_{DPF} + \lambda_{S}\right)$ (assuming all failures are independent and follow the exponential distribution).
Moreover, the following expression (5) can be derived from the safety requirement SPFMite_d ≥ SPFMtar.
$$\sum_{\text{Safety-Related HW elements}} \left(\lambda_{SPF} + \lambda_{RF}\right) \le \left(1 - SPFM_{tar}\right) \times \sum_{\text{Safety-Related HW elements}} \lambda \tag{5}$$
Lastly, the calculation of LFMite_d is based on the following expression (6)
$$LFM_{ite\_d} = 1 - \frac{\sum_{\text{Safety-Related HW elements}} \lambda_{DPF,L}}{\sum_{\text{Safety-Related HW elements}} \left(\lambda - \lambda_{SPF} - \lambda_{RF}\right)} \tag{6}$$
Similarly, the following expression (7) can be derived from the safety requirement LFMite_d ≥ LFMtar.
$$\sum_{\text{Safety-Related HW elements}} \lambda_{DPF,L} \le \left(1 - LFM_{tar}\right) \times \sum_{\text{Safety-Related HW elements}} \left(\lambda - \lambda_{SPF} - \lambda_{RF}\right) \tag{7}$$
The failure rates λS, λSPF, λRF, and λDPF,L in expressions (3)–(7) can be derived from the process as exhibited in Figure 1 and expressions (1) and (2) as shown in Section 3.
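The metric calculation in this subsection can be sketched in a few lines of Python. This is only an illustrative sketch: the `Element` class, the function names, and the two example elements are hypothetical, and the ASIL B targets used as defaults are the ones quoted in the text above. Each element's total failure rate `lam` is assumed to be given and to already include its safe and dual-point contributions.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Failure rates of one safety-related hardware element, per hour."""
    lam: float        # total failure rate of the element (assumed given)
    lam_spf: float    # single-point faults
    lam_rf: float     # residual faults
    lam_dpf_l: float  # latent dual-point faults


def hardware_metrics(elements):
    """Return (PMHF, SPFM, LFM) summed over all safety-related elements."""
    total = sum(e.lam for e in elements)
    spf_rf = sum(e.lam_spf + e.lam_rf for e in elements)
    dpf_l = sum(e.lam_dpf_l for e in elements)
    pmhf = spf_rf + dpf_l                 # PMHF: sum of SPF, RF and latent DPF rates
    spfm = 1.0 - spf_rf / total           # SPFM
    lfm = 1.0 - dpf_l / (total - spf_rf)  # LFM
    return pmhf, spfm, lfm


def meets_targets(pmhf, spfm, lfm, pmhf_tar=1e-7, spfm_tar=0.90, lfm_tar=0.60):
    """Check the three requirements; defaults are the ASIL B targets."""
    return pmhf < pmhf_tar and spfm >= spfm_tar and lfm >= lfm_tar


# Two hypothetical elements, for illustration only.
elements = [
    Element(lam=100e-9, lam_spf=0.0, lam_rf=8e-9, lam_dpf_l=28.8e-9),
    Element(lam=50e-9, lam_spf=0.0, lam_rf=1e-9, lam_dpf_l=5e-9),
]
pmhf, spfm, lfm = hardware_metrics(elements)
ok = meets_targets(pmhf, spfm, lfm)
print(f"PMHF = {pmhf:.3e} per hour, SPFM = {spfm:.2%}, LFM = {lfm:.2%}, ASIL B met: {ok}")
```

In an exploration loop, a check of this kind would be repeated after every design iteration d, once the per-element failure rates have been updated for the newly deployed safety mechanisms.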

4.2. FTA-Based Weak-Point Analysis

After PMHFite_d, SPFMite_d, and LFMite_d are acquired, we can compare them with the target values to check whether the functional safety requirements can be fulfilled. When any requirement is violated, the current system hardware architecture should be analyzed to identify the main contributors to the safety goal violation. We term this the weak-point analysis. The weak-point analysis should provide a precise basis to guide the deployment of feasible safety mechanisms for the vulnerable components so that all target values of the hardware architecture metrics can be achieved in an efficient and cost-effective manner.
In this study, we propose an effective weak-point analysis methodology based on fault tree analysis (FTA). FTA has been widely adopted as the primary system-level reliability modeling technique for decades. In addition, ISO-26262 recommends FTA as the primary system-level safety analysis methodology. However, the standard does not disclose concrete measures. To address this gap, we illustrate how to locate the safety vulnerabilities in the system through the FTA approach. Before explaining the proposed FTA measures, the following notations need to be defined in advance:
  • FPMCS(t): The failure probability for an MCS at mission time t;
  • $FP_{RF}^{C(i)}(t)$: The failure probability associated with the residual faults for a hardware element C(i) at mission time t;
  • $FP_{DPF,L}^{C(i)}(t)$: The failure probability associated with the latent dual-point faults for a hardware element C(i) at mission time t;
  • $\lambda_{SPFM_{tar}}$: The target failure rate to be achieved for satisfying the SPFM requirement with respect to the target ASIL;
  • $\lambda_{LFM_{tar}}$: The target failure rate to be achieved for satisfying the LFM requirement with respect to the target ASIL;
  • $PF_{PMHF_{tar}}(t)$: The target failure probability at mission time t to be achieved for satisfying the PMHF requirement with respect to the target ASIL;
  • $PF_{SPFM_{tar}}(t)$: The target failure probability at mission time t to be achieved for satisfying the SPFM requirement with respect to the target ASIL;
  • $PF_{LFM_{tar}}(t)$: The target failure probability at mission time t to be achieved for satisfying the LFM requirement with respect to the target ASIL;
  • GapPMHF(MCSk): The quantified gap between the failure probability for the kth MCS in the system hardware and the $PF_{PMHF_{tar}}(t)$ at mission time t;
  • GapSPFM(MCSk): The quantified gap between the failure probability for the kth MCS in the system hardware and the $PF_{SPFM_{tar}}(t)$ at mission time t;
  • GapLFM(MCSk): The quantified gap between the failure probability for the kth MCS in the system hardware and the $PF_{LFM_{tar}}(t)$ at mission time t.
First, we take the system to be analyzed as the input and construct the corresponding fault tree according to the system hardware architecture. The process of constructing a fault tree is out of this paper’s scope but has been comprehensively illustrated in previous literature [5,6,7,8,9,10,11,12,24,25,26,27,28] either from the system preliminary hardware architecture or from the system-level simulation models. Thus, the details of fault tree construction are omitted in this study. After the fault tree is constructed, the FTA can be performed to list all the MCSs for the fault tree. Next, we classify all the listed MCSs into the following two types:
  • Single-point failure (SPF): MCS contains a single safety-related hardware element represented as {C(i)}.
    The failure probability of the MCS belonging to SPF is calculated by the following expression
    $$FP_{MCS}(t) = FP_{C(i)}(t) = 1 - e^{-\lambda_{C(i)} \times t} \tag{8}$$
    where $FP_{C(i)}(t)$ is the failure probability of the safety-related hardware element C(i) at mission time t.
  • Dual-point failure (DPF): MCS contains two hardware elements and is further classified into two kinds of constitution.
    MCS consists of the safety-related hardware element and the safety mechanism to protect this safety-related hardware element, represented as {C(i), SMC(i)}.
    The failure probability of such an MCS is computed by the following expression
    $$FP_{MCS}(t) = FP_{RF}^{C(i)}(t) + FP_{DPF,L}^{C(i)}(t) \tag{9}$$
    where
    $$FP_{RF}^{C(i)}(t) = 1 - e^{-\lambda_{C(i)} \times \left(1 - DC_{RF}^{C(i)}\right) \times t} \tag{10}$$
    and
    $$FP_{DPF,L}^{C(i)}(t) = 1 - e^{-\lambda_{C(i)} \times DC_{RF}^{C(i)} \times \left(1 - DC_{DPF,L}^{C(i)}\right) \times t} \tag{11}$$
    where $DC_{RF}^{C(i)}$ and $DC_{DPF,L}^{C(i)}$ represent the diagnostic coverage (DC) of safety mechanisms with regard to the residual faults and latent dual-point faults.
    MCS consists of two safety-related hardware elements that could lead to the safety goal violation only when both hardware elements fail at the same time, represented as {C(i), C(j)}.
    The failure probability of such an MCS is calculated by the following expression
    $$FP_{MCS}(t) = FP_{C(i)}(t) \times FP_{C(j)}(t) \tag{12}$$
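The three kinds of MCS failure probability described above can be sketched as follows. This is a minimal Python illustration with hypothetical function names and a hypothetical mission time; it uses `math.expm1` so that 1 − e^(−x) stays accurate for the very small exponents that are typical of hardware failure rates.

```python
import math


def fp_element(lam, t):
    """Failure probability of one element at mission time t: 1 - exp(-lam * t)."""
    return -math.expm1(-lam * t)


def fp_single_point_mcs(lam, t):
    """MCS {C(i)}: a single unprotected safety-related element."""
    return fp_element(lam, t)


def fp_protected_mcs(lam, dc_rf, dc_dpf_l, t):
    """MCS {C(i), SMC(i)}: an element together with its safety mechanism."""
    fp_rf = fp_element(lam * (1.0 - dc_rf), t)               # residual faults
    fp_dpf_l = fp_element(lam * dc_rf * (1.0 - dc_dpf_l), t)  # latent dual-point faults
    return fp_rf + fp_dpf_l


def fp_pair_mcs(lam_i, lam_j, t):
    """MCS {C(i), C(j)}: both elements must fail to violate the safety goal."""
    return fp_element(lam_i, t) * fp_element(lam_j, t)


# Hypothetical values: 1e-7 failures per hour and a 10,000-hour mission time.
t = 10_000.0
print(f"single-point MCS: {fp_single_point_mcs(1e-7, t):.4e}")
print(f"protected MCS   : {fp_protected_mcs(1e-7, 0.90, 0.60, t):.4e}")
print(f"element-pair MCS: {fp_pair_mcs(1e-7, 1e-7, t):.4e}")
```

Under these assumed values the element-pair MCS is several orders of magnitude less likely to fail than the single-point MCS, which is the intuition behind treating unprotected single elements as the most urgent weak points.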
With the above expressions (8)–(12), we can calculate the failure probability for each MCS. An MCS is marked as a safety vulnerability if its failure probability is greater than or equal to $PF_{PMHF_{tar}}(t)$, $PF_{SPFM_{tar}}(t)$, or $PF_{LFM_{tar}}(t)$. For such an MCS, a safety mechanism must be deployed, or upgraded if one already exists, to reduce the failure rate of the hardware element(s) in this MCS. Otherwise, the requirements for achieving the target ASIL goal could never be fulfilled. We call such an MCS an MBP (Must-Be-Protected) weak point. There are MBPPMHF, MBPSPFM, and MBPLFM weak points associated with the safety metrics of PMHF, SPFM, and LFM, respectively. In contrast, an MCS is termed a POD (Protected-On-Demand) weak point if its failure probability is lower than $PF_{PMHF_{tar}}(t)$, $PF_{SPFM_{tar}}(t)$, and $PF_{LFM_{tar}}(t)$. Similarly, an MCS could belong to PODPMHF, PODSPFM, or PODLFM. It is worth noting that the requirements for reaching the target ASIL may still not be achieved even though all the listed MCSs belong to the POD. Under these circumstances, we need to determine the most critical weak point and then deploy or upgrade the safety mechanism for the addressed hardware element so that the most effective failure rate reduction for the whole system can be assured.
For each MCS, the quantified gaps between $FP_{MCS}(t)$ and $PF_{PMHF_{tar}}(t)$, $PF_{SPFM_{tar}}(t)$, and $PF_{LFM_{tar}}(t)$ are calculated through the following steps:
a. Calculate $\lambda_{SPFM_{tar}}$ and $\lambda_{LFM_{tar}}$:
According to expression (5), we know that $SPFM_{tar}$ can be achieved only when the total failure rate associated with the single-point faults and residual faults is less than $(1 - SPFM_{tar}) \times \sum_{\text{Safety-Related HW elements}} \lambda$. Thus, $\lambda_{SPFM_{tar}}$ can be specified by
$$\lambda_{SPFM_{tar}} = (1 - SPFM_{tar}) \times \sum_{\text{Safety-Related HW elements}} \lambda \tag{13}$$
Similarly, according to expression (7), $\lambda_{LFM_{tar}}$ can be specified by
$$\lambda_{LFM_{tar}} = (1 - LFM_{tar}) \times \sum_{\text{Safety-Related HW elements}} (\lambda - \lambda_{SPF} - \lambda_{RF}) \tag{14}$$
b. Calculate $PF_{PMHF_{tar}}(t)$, $PF_{SPFM_{tar}}(t)$, and $PF_{LFM_{tar}}(t)$ with the following expressions:
$$PF_{PMHF_{tar}}(t) = 1 - e^{-PMHF_{tar} \times t} \tag{15}$$
$$PF_{SPFM_{tar}}(t) = 1 - e^{-\lambda_{SPFM_{tar}} \times t} \tag{16}$$
$$PF_{LFM_{tar}}(t) = 1 - e^{-\lambda_{LFM_{tar}} \times t} \tag{17}$$
c. Calculate $Gap_{PMHF}(MCS_k)$, $Gap_{SPFM}(MCS_k)$, and $Gap_{LFM}(MCS_k)$ for the $k$th MCS with the following expressions:
$$Gap_{PMHF}(MCS_k) = FP_{MCS_k}(t) / PF_{PMHF_{tar}}(t) \tag{18}$$
$$Gap_{SPFM}(MCS_k) = FP_{MCS_k}(t) / PF_{SPFM_{tar}}(t) \tag{19}$$
$$Gap_{LFM}(MCS_k) = FP_{MCS_k}(t) / PF_{LFM_{tar}}(t) \tag{20}$$
Subsequently, $MCS_k$ can be marked as MBP or POD by the following criteria, applied to each of $Gap_{PMHF}(MCS_k)$, $Gap_{SPFM}(MCS_k)$, and $Gap_{LFM}(MCS_k)$:
  • If $Gap_x(MCS_k) \geq 1$, $MCS_k$ is identified as an $MBP_x$ weak point for the corresponding safety metric $x$, where $x$ can be PMHF, SPFM, or LFM. For example, if $Gap_{PMHF}(MCS_k) \geq 1$ and $Gap_{SPFM}(MCS_k) \geq 1$, then $MCS_k$ belongs to $MBP_{PMHF}$ and $MBP_{SPFM}$.
  • If $Gap_x(MCS_k) < 1$, $MCS_k$ is identified as a $POD_x$ weak point for the corresponding safety metric $x$, where $x$ can be PMHF, SPFM, or LFM.
The process regarding the FTA-based weak-point analysis is exhibited in Figure 3. All the hardware elements in the MBP weak points are required to lower their failure rates by safety mechanism deployment until there are no hardware elements marked as MBP weak points. Therefore, we specify a set named PT_MBP, which is the union of MBPPMHF, MBPSPFM, and MBPLFM to contain all the MBP weak points. If PT_MBP is not an empty set, then all the hardware elements in PT_MBP are required to employ the appropriate safety mechanisms to diminish the failure rates. On the other hand, if the PT_MBP becomes an empty set, then all the MCSs belong to the POD. If the target ASIL is still not achieved, the MCS with the highest Gapx(MCSk), where x can be PMHF, SPFM, or LFM, is selected, and the hardware elements contained in this MCS are assigned to set PT for POD. The elements in set PT for POD are the targets to conduct the safety mechanism deployment or enhancement. For safety mechanism deployment or improvement, we develop an ASIL-oriented system hardware architecture exploration algorithm to effectively apply safety mechanisms to achieve the safety requirements for the target ASIL goal. This algorithm will be introduced in the next subsection.
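The weak-point analysis described above (target probabilities, gaps, MBP/POD marking, and selection of the protection targets) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names and the sample MCS table are hypothetical, and a metric whose target probability is not positive is simply skipped, because the text does not say how a zero target is handled.

```python
import math

def target_probabilities(pmhf_tar, spfm_tar, lfm_tar, lam_total, lam_spf, lam_rf, t):
    """Expressions (13)-(17): per-metric target failure probabilities at time t."""
    rates = {
        "PMHF": pmhf_tar,
        "SPFM": (1.0 - spfm_tar) * lam_total,                     # expression (13)
        "LFM": (1.0 - lfm_tar) * (lam_total - lam_spf - lam_rf),  # expression (14)
    }
    return {metric: 1.0 - math.exp(-rate * t) for metric, rate in rates.items()}

def gaps(fp_mcs, targets):
    """Expressions (18)-(20). Assumption: non-positive targets are skipped."""
    return {m: fp_mcs / pf for m, pf in targets.items() if pf > 0.0}

def protection_targets(mcs_table, targets):
    """mcs_table maps a frozenset of element names to that MCS's failure probability.

    Returns the union of all MBP elements (PT_MBP). If there is no MBP, returns the
    elements of the MCS with the largest gap; the caller is expected to use that
    fallback only while the target ASIL is still not achieved.
    """
    pt_mbp = set()
    worst = (0.0, frozenset())
    for elements, fp in mcs_table.items():
        g = gaps(fp, targets)
        if any(value >= 1.0 for value in g.values()):
            pt_mbp |= elements
        worst = max(worst, (max(g.values(), default=0.0), elements), key=lambda p: p[0])
    return pt_mbp if pt_mbp else set(worst[1])

# Hypothetical inputs: ASIL B targets, 5000 h mission time, made-up MCS probabilities.
targets = target_probabilities(1e-7, 0.90, 0.60, 1.43e-6, 1.43e-6, 0.0, 5000.0)
table = {frozenset({"C(1)"}): 3e-4, frozenset({"C(3)"}): 2e-3}
print(protection_targets(table, targets))
```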
Before we depict the FTA-based weak-point analysis methodology, a simple example is used to explain the idea of the methodology. In this example, we assume that there are five hardware elements in the system. Furthermore, all five hardware elements are not protected by any safety mechanism initially. Table 2 shows the original failure rates for these five elements and Figure 4 shows the constructed fault tree for this simple system.
We point out that the hardware element C(5) does not appear in the constructed fault tree in Figure 4 because, as shown in Table 2, none of C(5)’s faults can cause a safety goal violation. In this example, we assume that the target ASIL is B. Therefore, $SPFM_{tar}$ = 90%, $LFM_{tar}$ = 60%, and $PMHF_{tar}$ = $10^{-7}$ h$^{-1}$ according to Table 1.
First, the failure rates as seen below are computed following the process of Figure 1.
$$\sum_{\text{Safety-Related HW elements}} \lambda_{s} = 1.2 \times 10^{-8}$$
$$\sum_{\text{Safety-Related HW elements}} \lambda_{SPF} = \lambda_{C(1)} + \lambda_{C(2)} + \lambda_{C(3)} + \lambda_{C(4)} = 1.43 \times 10^{-6}$$
$$\sum_{\text{Safety-Related HW elements}} \lambda_{RF} = \sum_{\text{Safety-Related HW elements}} \lambda_{DPF,L} = 0$$
(no safety mechanism has been applied to any hardware element).
Then the target failure probabilities at a mission time of five thousand hours can be calculated according to expressions (13)–(17) as shown below.
$$PF_{PMHF_{tar}}(t) = 1 - e^{-PMHF_{tar} \times t} = 4.99875 \times 10^{-4}$$
$$\lambda_{SPFM_{tar}} = (1 - SPFM_{tar}) \times \sum_{\text{Safety-Related HW elements}} \lambda = (1 - 90\%) \times 1.44 \times 10^{-6} = 1.44 \times 10^{-7}$$
$$PF_{SPFM_{tar}}(t) = 1 - e^{-\lambda_{SPFM_{tar}} \times t} = 7.14744 \times 10^{-4}$$
$$\lambda_{LFM_{tar}} = (1 - LFM_{tar}) \times \sum_{\text{Safety-Related HW elements}} (\lambda - \lambda_{SPF} - \lambda_{RF}) \approx 0$$
$$PF_{LFM_{tar}}(t) = 1 - e^{-\lambda_{LFM_{tar}} \times t} = 0$$
All the MCSs can be identified according to the fault tree in Figure 4, and the failure probabilities for all MCSs can also be computed as shown in Table 3.
Consequently, we can mark each MCS as MBP or POD based on the calculated $Gap_{PMHF}(MCS_k)$, $Gap_{SPFM}(MCS_k)$, and $Gap_{LFM}(MCS_k)$ according to expressions (18)–(20), and the results are shown in Table 4.
The results in Table 4 show that MBP weak points exist in the system. Therefore, the protection target set PT is specified by the union of $MBP_{PMHF}$, $MBP_{SPFM}$, and $MBP_{LFM}$, i.e., PT_MBP = {C(3), C(4)}. In the next subsection, after describing the proposed ASIL-oriented system hardware architecture exploration algorithm, we will illustrate how to apply appropriate safety mechanisms to the hardware elements in PT.

4.3. System Hardware Architecture Exploration with Safety and Hardware Overhead Consideration

The proposed system hardware architecture exploration is performed for two aspects: Safety and hardware overhead. For the former, the system hardware architecture needs to contain sufficient safety mechanism protection so that the safety metrics for the target ASIL can be achieved; for the latter, the additionally increased hardware overhead attributed to safety mechanism deployment needs to comply with the constraint specified by the system engineers. To fulfill the safety and hardware overhead requirements, we firstly propose an effective ASIL-oriented system hardware architecture exploration algorithm to determine a solution that meets the safety requirements, and then use a Hardware Overhead (HO)-oriented hardware architecture exploration algorithm to adjust the hardware architecture solution derived from the ASIL-oriented system hardware architecture exploration algorithm to satisfy the safety metrics and hardware overhead constraint simultaneously. These two algorithms will be introduced in Section 4.3.2 and Section 4.3.3, respectively. Furthermore, we will demonstrate how to perform these two algorithms for a system with the simple example already shown in Section 4.2.
In this work, the deployed safety mechanisms are categorized into three different levels, which are “High”, “Medium”, and “Low” and represent the diagnostic coverage (DC) estimated to be at least 99%, 90%, and 60%, respectively. Such categorized levels are adopted in ISO-26262. System engineers can also specify their required DC percentages for these three levels. If a hardware element belongs to the MCSk marked as an MBP weak point and the GapPMHF(MCSk), GapSPFM(MCSk), and GapLFM(MCSk) cannot be reduced to be lower than 1, even when the safety mechanism (SM) with “High” DC is used to protect the element, then the target ASIL goal will never be fulfilled. Thus, system engineers should replace the hardware element with a superior one with a lower inherent failure rate or adjust the target ASIL.

4.3.1. Safety Mechanism Library

There are different types of hardware elements in a system, such as microcontroller units, storages, sensors, and actuators. The specific safety mechanisms for each type of hardware element should be deployed to assure the effectiveness in terms of the diagnostic coverage and corresponding hardware overhead. If there exists a database that collects all the feasible safety mechanisms with the information of DC and hardware overhead for each type of hardware element, then we can rapidly discover the most appropriate safety mechanism to be employed. In this study, we call such a database the safety mechanism library. For safety-related hardware elements, a specific safety mechanism library can be established. Moreover, the safety mechanism library is formalized and then can be used in the succeeding ASIL-oriented and HO-oriented system hardware architecture exploration algorithms.
It is worth noting that there are two feasible fault-tolerant design concepts to prevent the dual-point faults from becoming latent. The first one is to deploy a safety mechanism with a self-checking capability and the other is to adopt two-layered safety mechanisms, which means that there is a first-layer safety mechanism for the hardware element protection and a second-layer safety mechanism to detect the first-layer safety mechanism’s faults. In this study, we assume that all the deployed safety mechanisms are developed with self-checking features to monitor the safety mechanism itself, and therefore, no second-layer safety mechanism is required.
The notations for the formalized safety mechanism library are defined below.
  • $SM_{xyz}(PT(e))$: The deployed safety mechanism for the protection target PT(e), the $e$th element in the set PT, where x, y, z ∈ {L, M, H};
    x and y represent the levels of $DC_{RF}^{PT(e)}$ and $DC_{DPF,L}^{PT(e)}$ of the deployed SM for PT(e), respectively, where
    L means diagnostic coverage = 60% (Low diagnostic coverage).
    M means diagnostic coverage = 90% (Medium diagnostic coverage).
    H means diagnostic coverage = 99% (High diagnostic coverage).
    z is the level of the hardware overhead of PT(e) contributed by the deployed SM for PT(e).
    The percentages of hardware overhead for L (Low), M (Medium), and H (High) are assumed to be known and specified by the system engineers.
Then, the formalized safety mechanism library can be defined as follows:
  • SM_Lib(PT(e)): The safety mechanism library for PT(e). The safety mechanisms of an element are represented by a set that contains all feasible safety mechanisms, such as {$SM_{LLL}$, $SM_{MMM}$, $SM_{HHH}$}, {$SM_{LML}$, $SM_{MHH}$, $SM_{HHM}$}, or {$SM_{MHM}$, $SM_{HHH}$}, depending on the hardware element type. The overall library SM_Lib collects these sets for all hardware elements in PT.
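One possible way to encode the formalized library is sketched below in Python. The class name, the `parse` helper, and the dictionary layout are illustrative assumptions; only the three-letter `xyz` encoding and the 60%/90%/99% levels come from the definitions above.

```python
from dataclasses import dataclass

# DC percentages for the three levels (system engineers may override them).
DC_LEVELS = {"L": 0.60, "M": 0.90, "H": 0.99}

@dataclass(frozen=True)
class SafetyMechanism:
    rf: str        # x: level of DC for residual faults
    dpf_l: str     # y: level of DC for latent dual-point faults
    overhead: str  # z: level of hardware overhead

    @classmethod
    def parse(cls, code):
        """Build a safety mechanism from a three-letter code such as 'MHM'."""
        if len(code) != 3 or any(ch not in DC_LEVELS for ch in code):
            raise ValueError(f"invalid safety mechanism code: {code!r}")
        return cls(*code)

    @property
    def dc_rf(self):
        return DC_LEVELS[self.rf]

    @property
    def dc_dpf_l(self):
        return DC_LEVELS[self.dpf_l]

# Hypothetical library: each element maps to its feasible safety mechanisms.
SM_LIB = {
    "C(3)": [SafetyMechanism.parse(c) for c in ("LLL", "LML", "MMM", "MHM", "HHH")],
    "C(4)": [SafetyMechanism.parse(c) for c in ("MHM", "HHH")],
}

sm = SafetyMechanism.parse("MHM")
print(sm.dc_rf, sm.dc_dpf_l, sm.overhead)
```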

4.3.2. ASIL-Oriented Hardware Architecture Exploration Algorithm

In Section 4.2, we pointed out that $Gap_{PMHF}(MCS_k)$, $Gap_{SPFM}(MCS_k)$, and $Gap_{LFM}(MCS_k)$ must all be lower than 1 to eliminate the gaps between the current and target failure probabilities so that the three hardware architecture metrics can be achieved. As seen from Figure 3, Max_Gap(k) represents the maximum gap among $Gap_{PMHF}(MCS_k)$, $Gap_{SPFM}(MCS_k)$, and $Gap_{LFM}(MCS_k)$ for $MCS_k$. In the following, we assume that $MCS_k$ contains the hardware element C(i). Therefore, the reduced failure probability obtained from the safety mechanism deployment shall comply with the following condition:
$$\frac{FP_{MCS_k}(t)}{FP_{RF}^{MCS_k}(t)} > Max\_Gap(k) \tag{21}$$
where $FP_{MCS_k}(t)$ is the original failure probability of $MCS_k$ attributed to the hardware element C(i)’s failures without any safety mechanism protection, and $FP_{RF}^{MCS_k}(t)$ is the failure probability due to residual faults attributed to the hardware element C(i) under the safety mechanism protection. Then, $DC_{RF}^{MCS_k}$ can be obtained according to the linearity between $FP_{MCS_k}(t)$, $FP_{RF}^{MCS_k}(t)$, and Max_Gap(k), as described below.
If Max_Gap(k) = 2.5, expression (21) can be rewritten as $FP_{RF}^{MCS_k}(t) < 0.4 \times FP_{MCS_k}(t)$. Next, we let $FP_{MCS_k}(t) = FP^{C(i)}(t)$ and $FP_{RF}^{C(i)}(t) = 1 - e^{-\lambda_{C(i)} \times (1 - DC_{RF}^{C(i)}) \times t} \approx (1 - DC_{RF}^{C(i)}) \times FP^{C(i)}(t)$. Thus, we obtain the expression $(1 - DC_{RF}^{C(i)}) \times FP^{C(i)}(t) < 0.4 \times FP^{C(i)}(t)$, which means that $DC_{RF}^{C(i)}$ must be greater than 60%. Therefore, we can conclude that the “Low” safety mechanism is sufficient to eliminate the gaps if Max_Gap(k) < 2.5.
Similarly, if 2.5 ≤ Max_Gap(k) < 10, we can deduce that $DC_{RF}^{C(i)}$ = 90%, i.e., a “Medium” safety mechanism, is sufficient, whereas $DC_{RF}^{C(i)}$ = 99%, i.e., a “High” safety mechanism, must be deployed when Max_Gap(k) ≥ 10.
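The thresholds 2.5 and 10 follow from the same linear approximation: a gap of Max_Gap(k) is closed when $1 - DC_{RF}^{C(i)} < 1 / Max\_Gap(k)$. The short Python sketch below makes this explicit; the function names are illustrative assumptions. Under the same approximation, a “High” mechanism (99%) can only close gaps below 100.

```python
def required_dc(max_gap):
    """Diagnostic coverage that DC_RF must exceed to close the gap,
    using the linear approximation FP_RF ~ (1 - DC_RF) x FP."""
    return max(0.0, 1.0 - 1.0 / max_gap)

def dc_level(max_gap):
    """Level chosen by the thresholds in the text: <2.5 Low, <10 Medium, else High."""
    if max_gap >= 10:
        return "H"   # DC = 99%
    if max_gap >= 2.5:
        return "M"   # DC = 90%
    return "L"       # DC = 60%

for gap in (1.8, 2.5, 7.0, 10.0, 40.0):
    print(gap, round(required_dc(gap), 4), dc_level(gap))
```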
Once $DC_{RF}^{C(i)}$ is decided, $FP_{DPF,L}^{C(i)}(t)$ becomes nonzero. Therefore, the current $FP_{MCS_k}(t)$ can be expressed as $FP_{RF}^{C(i)}(t) + FP_{DPF,L}^{C(i)}(t)$, and $Gap_{PMHF}(MCS_k)$, $Gap_{SPFM}(MCS_k)$, and $Gap_{LFM}(MCS_k)$ all need to be updated to reflect this change. The updated gaps are used to determine $DC_{DPF,L}^{C(i)}$. Accordingly, $FP_{RF}^{C(i)}(t)$ should be excluded when updating the gaps so that the $DC_{DPF,L}^{C(i)}$ assignment can assure that the resulting $FP_{MCS_k}(t)$ is lower than $PF_{PMHF_{tar}}(t)$, $PF_{SPFM_{tar}}(t)$, and $PF_{LFM_{tar}}(t)$. The updated gaps are computed with the following expressions (22)–(24):
$$Gap_{PMHF}(MCS_k) = \frac{FP_{DPF,L}^{C(i)}(t)}{PF_{PMHF_{tar}}(t) - FP_{RF}^{C(i)}(t)} \tag{22}$$
$$Gap_{SPFM}(MCS_k) = \frac{FP_{DPF,L}^{C(i)}(t)}{PF_{SPFM_{tar}}(t) - FP_{RF}^{C(i)}(t)} \tag{23}$$
$$Gap_{LFM}(MCS_k) = \frac{FP_{DPF,L}^{C(i)}(t)}{PF_{LFM_{tar}}(t) - FP_{RF}^{C(i)}(t)} \tag{24}$$
where $FP_{DPF,L}^{C(i)}(t) = 1 - e^{-\lambda_{C(i)} \times DC_{RF}^{C(i)} \times (1 - DC_{DPF,L}^{C(i)}) \times t}$ with $DC_{DPF,L}^{C(i)} = 0$.
Next, we reassign Max_Gap(k) to the largest of the updated $Gap_{PMHF}(MCS_k)$, $Gap_{SPFM}(MCS_k)$, and $Gap_{LFM}(MCS_k)$. Consequently, $DC_{DPF,L}^{C(i)}$ can be assigned with the same derivation method used for deciding $DC_{RF}^{C(i)}$ above.
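A minimal Python sketch of the gap update in expressions (22)–(24) is given below. The function name, the sample failure rate, and the treatment of a non-positive denominator (reported as an infinite gap) are assumptions made for illustration; the text does not state how that case is handled.

```python
import math

def updated_gaps(lam, dc_rf, t, targets):
    """Expressions (22)-(24) for one element, evaluated with DC_DPF,L = 0.

    targets maps a metric name to its target failure probability PF_tar(t).
    """
    fp_rf = 1.0 - math.exp(-lam * (1.0 - dc_rf) * t)
    fp_dpf_l = 1.0 - math.exp(-lam * dc_rf * t)   # (1 - DC_DPF,L) = 1
    result = {}
    for metric, pf_tar in targets.items():
        budget = pf_tar - fp_rf                   # residual part is excluded
        # Assumption: no remaining budget is reported as an infinite gap.
        result[metric] = fp_dpf_l / budget if budget > 0.0 else math.inf
    return result

# Hypothetical element with lambda = 5e-7 per hour and "Medium" DC_RF.
new_gaps = updated_gaps(5e-7, 0.90, 5000.0, {"PMHF": 4.99875e-4, "SPFM": 7.14744e-4})
max_gap = max(new_gaps.values())                  # reassigned Max_Gap(k)
print(new_gaps, max_gap)
```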
The proposed ASIL-oriented system hardware architecture exploration algorithm is given in pseudocode as Algorithm 1 below.
In Algorithm 1, if an element was assigned safety mechanisms in previous iterations and is selected again as the target for improving its diagnostic coverage, then the following criteria guide the upgrade of the DC of its safety mechanism. The criteria consider both the effectiveness of the failure rate reduction and the increase in hardware overhead. We note that the protection of a hardware element against single-point and dual-point faults is based on the concept of a safety mechanism with self-checking. In general, the self-checking scheme that copes with the dual-point faults is considered first, because the safety mechanism for single-point faults does not need to be changed, which yields a lower hardware overhead increase as well as a higher failure rate reduction. The failure rate of residual faults decreases as the DC of the safety mechanism for single-point faults increases, but the failure rate of latent dual-point faults can increase at the same time, because more of the single-point faults covered by the safety mechanism can become latent dual-point faults. For these reasons, if a hardware element already has safety mechanisms deployed and is selected as the protection target again, the self-checking scheme used to protect the safety mechanism is enhanced first.
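The upgrade rule for an already protected element can be sketched in Python as follows. This is one reading of the criteria above, with hypothetical names: the latent dual-point coverage is raised first when it is not yet “High” and does not exceed the single-point coverage, otherwise the single-point coverage is raised, and nothing is returned when both are already “High”.

```python
LEVELS = ["L", "M", "H"]  # 60%, 90%, 99%

def next_upgrade(rf_level, dpf_l_level):
    """Return (fault type, new level) for the next upgrade, or None if exhausted."""
    rf = LEVELS.index(rf_level)
    dpf = LEVELS.index(dpf_l_level)
    top = len(LEVELS) - 1
    if dpf != top and dpf <= rf:
        return ("DPF,L", LEVELS[dpf + 1])   # enhance the self-checking scheme first
    if rf != top:
        return ("RF", LEVELS[rf + 1])       # otherwise raise single-point coverage
    return None                             # both already at "High"

print(next_upgrade("M", "L"))
print(next_upgrade("M", "H"))
print(next_upgrade("H", "H"))
```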
Algorithm 1: ASIL-oriented system hardware architecture exploration
Function SM_Deploy(PT)
  for e = 1 to n do // n is the number of elements in the set PT
  { flag ← 0; SM-status ← ‘false’;
    if (DC_RF(PT(e)) = 0 and DC_DPF,L(PT(e)) = 0) then // no protection for single-point faults and dual-point faults
    { Gap_SM_D ← Max(Gap_PMHF(PT(e)), Gap_SPFM(PT(e)), Gap_LFM(PT(e))); // apply the Max_Gap for PT(e)
      SM_Sel(PT(e), Gap_SM_D, “RF”);
      Update Gap_PMHF(PT(e)), Gap_SPFM(PT(e)), and Gap_LFM(PT(e)) with expressions (22)–(24);
      Gap_SM_D ← Max(Gap_PMHF(PT(e)), Gap_SPFM(PT(e)), Gap_LFM(PT(e)));
      SM_Sel(PT(e), Gap_SM_D, “DPF,L”); flag ← 1; }
    if (DC_RF(PT(e)) ≠ 0 and DC_DPF,L(PT(e)) = 0) then // there is protection for single-point faults but no protection for latent dual-point faults
    { Update Gap_PMHF(PT(e)), Gap_SPFM(PT(e)), and Gap_LFM(PT(e)) with expressions (22)–(24);
      Gap_SM_D ← Max(Gap_PMHF(PT(e)), Gap_SPFM(PT(e)), Gap_LFM(PT(e)));
      SM_Sel(PT(e), Gap_SM_D, “DPF,L”); }
    if (flag = 0 and DC_RF(PT(e)) ≠ 0 and DC_DPF,L(PT(e)) ≠ 0) then // the considered element was assigned safety mechanisms at previous iterations and is selected again as the target to improve its diagnostic coverage
    { if (DC_DPF,L(PT(e)) ≠ 99% and DC_DPF,L(PT(e)) ≤ DC_RF(PT(e))) then upgrade DC_DPF,L(PT(e)) to the next higher level of DC; SM-status ← ‘true’;
      // as mentioned before, the protection of a hardware element for single-point and dual-point faults is based on the concept of a safety mechanism with self-checking. Therefore, considering the effectiveness of failure rate reduction and the increased hardware overhead, when the DC of latent dual-point faults is not at the highest level and is less than or equal to the DC of single-point faults, DC_DPF,L(PT(e)) is upgraded to the next higher level of DC
      if (SM-status = ‘false’ and DC_RF(PT(e)) ≠ 99%) then upgrade DC_RF(PT(e)) to the next higher level of DC;
      if (DC_RF(PT(e)) = 99% and DC_DPF,L(PT(e)) = 99%) then
      { if PT(e) ∈ PT_MBP then return(failed); // MBP cannot be eliminated using the available SM_Lib; therefore, no solution is found
        else PT ← MCS with the second most critical Max_Gap among all POD; call SM_Deploy(PT); } }
  }
  if (PMHF_ite_d ≥ PMHF_tar or SPFM_ite_d < SPFM_tar or LFM_ite_d < LFM_tar) then
    if (all safety-related hardware elements have been applied with SM_HHz) then
      return(failed); // the target ASIL is not achieved although the highest DC of safety mechanism has been employed for all safety-related hardware elements; the algorithm fails to discover a feasible solution that satisfies the ASIL goal
End SM_Deploy;
Function SM_Sel(PT(e), Gap_SM_D, type)
  switch Gap_SM_D do
    case ≥ 10 do // DC: High
      if (type = “RF”) then DC_RF(PT(e)) ← 99%; else DC_DPF,L(PT(e)) ← 99%; // SM_Lib always provides a high DC of safety mechanism for each hardware element in the set PT
    case < 10 && ≥ 2.5 do // DC: Medium
      if (type = “RF”) then
        if (there exists SM_Myz in SM_Lib(PT(e))) then DC_RF(PT(e)) ← 90%; else DC_RF(PT(e)) ← 99%; // check whether a medium-DC safety mechanism for the target element is available in SM_Lib; if not, the high-DC safety mechanism is used instead
      if (type = “DPF,L”) then
        if (there exists SM_xMz in SM_Lib(PT(e))) then DC_DPF,L(PT(e)) ← 90%;
        else DC_DPF,L(PT(e)) ← 99%;
    case < 2.5 do // DC: Low
      if (type = “RF”) then
      { if (there exists SM_Lyz in SM_Lib(PT(e))) then DC_RF(PT(e)) ← 60%;
        else if (there exists SM_Myz in SM_Lib(PT(e))) then DC_RF(PT(e)) ← 90%;
        else DC_RF(PT(e)) ← 99%; }
      if (type = “DPF,L”) then
      { if (there exists SM_xLz in SM_Lib(PT(e))) then DC_DPF,L(PT(e)) ← 60%;
        else if (there exists SM_xMz in SM_Lib(PT(e))) then DC_DPF,L(PT(e)) ← 90%;
        else DC_DPF,L(PT(e)) ← 99%; }
End SM_Sel;
In this work, we assume that SM_Lib provides a safety mechanism with the ‘High’ level of DC for each hardware element in the set PT. When a safety mechanism with ‘Medium’ or ‘Low’ DC is to be selected, we need to determine whether the demanded safety mechanism exists in SM_Lib(PT(e)). If it cannot be found there, the safety mechanism with the next higher level of DC is deployed instead.
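The selection with fallback can be sketched in Python as follows. This is an illustrative reading of SM_Sel, not the authors' implementation: the library for one element is assumed to be a list of three-letter codes `xyz`, and the function name and return format are hypothetical.

```python
DC_VALUE = {"L": 0.60, "M": 0.90, "H": 0.99}
ORDER = "LMH"

def sm_sel(sm_codes, gap, fault_type):
    """Pick the DC level for one element from its library of 'xyz' codes.

    fault_type is "RF" (first letter) or "DPF,L" (second letter). If the level
    demanded by the gap is missing, the next higher level is tried; "H" is
    assumed to be always available, as stated in the text.
    """
    if fault_type not in ("RF", "DPF,L"):
        raise ValueError("fault_type must be 'RF' or 'DPF,L'")
    position = 0 if fault_type == "RF" else 1
    available = {code[position] for code in sm_codes}
    wanted = "H" if gap >= 10 else "M" if gap >= 2.5 else "L"
    for level in ORDER[ORDER.index(wanted):]:
        if level == "H" or level in available:
            return level, DC_VALUE[level]

library = ["LLL", "LML", "MMM", "MHM", "HHH"]
print(sm_sel(library, 3.0, "RF"))               # Medium is available
print(sm_sel(["MHM", "HHH"], 1.5, "DPF,L"))     # falls back to High
```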
Next, we demonstrate how to perform Algorithm 1 with the simple example presented in Section 4.2, where the set of protection targets is PT = {C(3), C(4)}. For the sake of simplicity, we use the same safety mechanism set {$SM_{LLL}$, $SM_{LML}$, $SM_{MMM}$, $SM_{MHM}$, $SM_{HHH}$} for all hardware elements in SM_Lib.
For the hardware elements C(3) and C(4), no safety mechanism is deployed, so $DC_{RF}^{PT(e)}$ = 0. In addition, the Max_Gap for C(3) and C(4) can be obtained from Table 4. Thus, the deployed safety mechanisms can be determined as shown in Table 5, where the deployed safety mechanism for hardware element C(3) is represented by $SM_{MyM}$, which means that $DC_{RF}^{PT(e)}$ = 90% with “Medium” hardware overhead and $DC_{DPF,L}^{PT(e)}$ is still unspecified. The meaning of $SM_{HyH}$ can be explained in a similar way. After the safety mechanism deployment against single-point faults, the $Gap_{SM\_D}$ for C(3) and C(4) needs to be updated with expressions (22)–(24) following the aforementioned process. Then the $DC_{DPF,L}^{PT(e)}$ for C(3) and C(4) can be decided according to the updated $Gap_{SM\_D}$, as illustrated in Table 6.
Then, the hardware architecture metrics can be updated in accordance with the deployed safety mechanisms as stated in the following:
$$\sum_{\text{Safety-Related HW elements}} \lambda_{s} = 1.2 \times 10^{-8}$$
$$\sum_{\text{Safety-Related HW elements}} \lambda_{SPF} = \lambda_{C(1)} + \lambda_{C(2)} = 1.2 \times 10^{-7}$$
$$\sum_{\text{Safety-Related HW elements}} \lambda_{RF} = \sum_{i=3}^{4} \lambda_{C(i)} \times (1 - DC_{RF}^{C(i)}) = 3.65 \times 10^{-8}$$
$$\sum_{\text{Safety-Related HW elements}} \lambda_{DPF,L} = \sum_{i=3}^{4} \lambda_{C(i)} \times DC_{RF}^{C(i)} \times (1 - DC_{DPF,L}^{C(i)}) = 3.38 \times 10^{-8}$$
$$SPFM_{ite\_d} = 1 - \frac{\sum_{\text{Safety-Related HW elements}} (\lambda_{SPF} + \lambda_{RF})}{\sum_{\text{Safety-Related HW elements}} \lambda} = 1 - \frac{1.57 \times 10^{-7}}{1.43 \times 10^{-6}} = 89.06\%$$
$$LFM_{ite\_d} = 1 - \frac{\sum_{\text{Safety-Related HW elements}} \lambda_{DPF,L}}{\sum_{\text{Safety-Related HW elements}} (\lambda - \lambda_{SPF} - \lambda_{RF})} = 1 - \frac{3.38 \times 10^{-8}}{1.274 \times 10^{-6}} = 99.0\%$$
$$PMHF_{ite\_d} = \sum \lambda_{SPF} + \sum \lambda_{RF} + \sum \lambda_{DPF,L} = 1.9 \times 10^{-7}$$
Comparing these values with $PMHF_{tar}$, $SPFM_{tar}$, and $LFM_{tar}$ for the target ASIL B, we can conclude that only $LFM_{tar}$ has been achieved, while $PMHF_{ite\_d}$ and $SPFM_{ite\_d}$ still violate the target values. Thus, another design iteration is required, in which the weak-point analysis and the exploration algorithm are performed again to deploy and/or upgrade the safety mechanisms. The results of the weak-point analysis for the updated hardware architecture are shown in Table 7.
From Table 7, we can observe that all the MBPs have been resolved. The results show that the proposed exploration algorithm improves the failure rates of the system hardware architecture efficiently. However, the target ASIL is still not achieved even though no MBP exists. Thus, the most critical weak point among all MCSs marked as POD should be identified. According to Table 7, {C(1)} is the most critical POD weak point. Then, Algorithm 1 is performed to assign an appropriate safety mechanism to C(1) according to its $Gap_{SM\_D}$. As a result, $DC_{RF}^{PT(e)}$ and $DC_{DPF,L}^{PT(e)}$ for C(1) are both set to 60%. The updated hardware architecture metrics are $SPFM_{ite\_d}$ = 92.24%, $LFM_{ite\_d}$ = 96.06%, and $PMHF_{ite\_d}$ = $1.63 \times 10^{-7}$. The results demonstrate that $PMHF_{ite\_d}$ is still greater than the target value and, hence, another design iteration is required. The process for the next design iterations is similar to iterations 1 and 2, and the changes of $DC_{RF}^{PT(e)}$ and $DC_{DPF,L}^{PT(e)}$ for each hardware element in the following design iterations are summarized in Table 8.
After six design iterations, the updated hardware architecture metrics are $SPFM_{ite\_d}$ = 95.69%, $LFM_{ite\_d}$ = 97.8%, and $PMHF_{ite\_d}$ = $9.18 \times 10^{-8}$. As a result, all the target values for achieving ASIL B have been satisfied. The deployed safety mechanisms for the hardware elements C(1)–C(4) are $SM_{MMM}$, $SM_{LLL}$, $SM_{MHM}$, and $SM_{HHH}$, respectively.
Because the increased hardware overhead of the deployed safety mechanisms is not examined in Algorithm 1, the overall hardware overhead of the whole system could violate the specified constraint. If the constraint is violated, the proposed HO-oriented hardware architecture exploration process should be activated. The corresponding details are described in the next subsection.

4.3.3. HO-Oriented Hardware Architecture Exploration Algorithm

As aforementioned, the HO-oriented hardware architecture exploration algorithm will be performed if the hardware overhead constraint is not met. The main purpose of this algorithm is to explore whether there are other safety mechanism deployment solutions with lower hardware overhead than the one assigned by the ASIL-oriented hardware architecture exploration algorithm. During such design space exploration, there will be four possible outcomes, which are:
a. Both the safety metrics and hardware overhead constraint are met.
b. The safety metrics are satisfied but the hardware overhead constraint is not.
c. The hardware overhead constraint is met but the safety metrics are not.
d. Neither the hardware overhead constraint nor the safety metrics are satisfied.
Outcome a indicates that a feasible system hardware architecture has been found, and the FMEDA report is then generated. For outcome b, a new round of system hardware architecture exploration is required to search for another possible solution with lower overall hardware overhead. To assure that the overall hardware overhead can be effectively reduced, our strategy is to replace the safety mechanism that contributes the highest hardware overhead among all deployed safety mechanisms with a safety mechanism with lower DC and lower hardware overhead. However, such a replacement is not allowed if it would cause the element to become MBP again. In that case, the element with the next highest hardware overhead in the overhead ranking is selected as the target to be adjusted. If outcome c occurs, the element whose safety mechanism has the lowest hardware overhead in the overhead ranking is chosen, and its safety mechanism is replaced by one with a higher level of DC. A safety mechanism with better DC leads to higher hardware overhead and can therefore turn the outcome into b or d. Consequently, the outcomes could alternate between b, c, and d until all the possible safety mechanism replacements have been examined. If so, no feasible system hardware architecture can meet the hardware overhead constraint and the safety requirements simultaneously, and the system engineers should review whether the specified constraint is reasonable for the target ASIL. For outcome d, we choose to meet the hardware overhead constraint first and then the safety metrics, because satisfying the hardware overhead constraint is the primary goal in the current design phase.
We organize the concepts described above into an algorithm, which is shown in Algorithm 2. The following notations and expressions are defined next:
  • PT_d: The set containing the hardware elements with safety mechanism deployment in the system hardware architecture derived from Algorithm 1.
  • no_d: The number of elements in the set PT_d.
  • ite_d: The number of search iterations performed.
  • $HO\_Max_{sys}$: The maximal allowable system hardware overhead in percentage.
  • $HO\_Total_{ite\_d}$: Total system hardware overhead due to safety mechanism deployment.
$$HO\_Total_{ite\_d}(\%) = \frac{\sum_{e=1}^{no\_d} HC(SM_{xyz}(PT\_d(e)))}{\sum_{i=1}^{n} HC(i)}$$
where
  • $HC(i)$: The hardware size in unit for the $i$th hardware element, where $n$ is the total number of hardware elements in the evaluated system.
  • $HC(SM_{xyz}(PT\_d(e)))$: The hardware overhead in unit for the safety mechanism of the $e$th element in set PT_d.
    $$HC(SM_{xyz}(PT\_d(e))) = HC(PT\_d(e)) \times HO(SM_{xyz}(PT\_d(e)))$$
    where $HO(SM_{xyz}(PT\_d(e)))$, the required hardware overhead in percentage for protecting the hardware element PT_d(e), is specified by the system engineers.
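The overhead bookkeeping defined above can be sketched in Python as follows. The function names are illustrative assumptions; the sample overhead units and element sizes are those of the running example in this section, and the ranking corresponds to the high-to-low order that Algorithm 2 walks through.

```python
def sm_overhead_units(element_size, overhead_fraction):
    """HC(SM_xyz(PT_d(e))) = HC(PT_d(e)) x HO(SM_xyz(PT_d(e)))."""
    return element_size * overhead_fraction

def ho_total(sm_overheads, element_sizes):
    """Total overhead as a fraction: SM overhead units over the size of all elements."""
    return sum(sm_overheads) / sum(element_sizes)

def rank_by_overhead(overhead_by_element):
    """Element names sorted from highest to lowest safety mechanism overhead."""
    return sorted(overhead_by_element, key=overhead_by_element.get, reverse=True)

# Overhead units per protected element and sizes of all five elements.
overheads = {"C(1)": 1.2, "C(2)": 0.32, "C(3)": 1.44, "C(4)": 2.0}
sizes = [6, 4, 8, 10, 3]

total = ho_total(overheads.values(), sizes)
print(f"{total:.1%}", rank_by_overhead(overheads))
```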
Algorithm 2: HO-oriented system hardware architecture exploration
PT_d ← set of all hardware elements with safety mechanism deployment in the system hardware architecture derived from Algorithm 1; ite_d ← 0;
Rank all the elements in PT_d by the hardware overhead in unit from high to low; calculate HO_Total_ite_d for PT_d;
p_down ← 1; p_up ← no_d; SM_HO(HO_Total_ite_d, down) // select the first hardware element in PT_d as the target to adjust
Function SM_HO(HO_Total_ite_d, strategy)
  while (p_up ≥ p_down) // stop the search when p_up < p_down; this means that the design space has been comprehensively explored and no solution can be found
  { if (strategy = down) and (downgrading SM(PT_d(p_down)) is allowable) then
      downgrade SM(PT_d(p_down)) to reduce HC(SM_xyz(PT_d(p_down)));
    else if (strategy = up) and (upgrading SM(PT_d(p_up)) is feasible) then
      upgrade SM(PT_d(p_up)) to improve DC_RF(PT_d(p_up)) or DC_DPF,L(PT_d(p_up)) or both;
    else // neither the SM downgrade nor the upgrade is allowable; in this case, another candidate is selected
    { if (strategy = down) then p_down ← p_down + 1; SM_HO(HO_Total_ite_d, down); // try the next element
      else p_up ← p_up − 1; SM_HO(HO_Total_ite_d, up) // try the previous element }
    ite_d ← ite_d + 1; update HO_Total_ite_d and ASIL metrics;
    if (HO_Total_ite_d ≤ HO_Max_sys) and (target ASIL has been achieved) then
    { return (a feasible solution has been discovered; all the adjusted elements with revised SM and the overall system hardware overhead); terminate Algorithm 2; // a cost-effective solution that meets the hardware overhead constraint and the ASIL safety goal is obtained }
    if (HO_Total_ite_d > HO_Max_sys) and (target ASIL has been achieved) then
    { if (downgrading SM(PT_d(p_down)) is feasible) then SM_HO(HO_Total_ite_d, down); // try the same element again
      else p_down ← p_down + 1; SM_HO(HO_Total_ite_d, down); // try the next element }
    if (HO_Total_ite_d ≤ HO_Max_sys) and (target ASIL is not achieved) then
    { if (upgrading SM(PT_d(p_up)) is feasible) then SM_HO(HO_Total_ite_d, up) // try the same element again
      else p_up ← p_up − 1; SM_HO(HO_Total_ite_d, up) // try the previous element }
    if (HO_Total_ite_d > HO_Max_sys) and (target ASIL is not achieved) then // both the hardware overhead and the ASIL metrics are violated; in this case, try to meet the hardware overhead constraint first
    { if (downgrading SM(PT_d(p_down)) is feasible) then SM_HO(HO_Total_ite_d, down); // try the same element again
      else p_down ← p_down + 1; SM_HO(HO_Total_ite_d, down); // try the next element }
  }
  return (failed); // p_up < p_down occurs
End function;
In the following, we illustrate how to perform Algorithm 2 with the example presented earlier. Before Algorithm 2 is performed, the hardware overhead of the existing safety mechanisms for each hardware element is provided by the system engineers, as shown in Table 9. Here, the hardware overhead constraint $HO\_Max_{sys}$ is set to 15%.
Next, $HC(SM_{xyz}(PT\_d(e)))$ for PT_d = {{C(1), $SM_{C(1)}$}, {C(2), $SM_{C(2)}$}, {C(3), $SM_{C(3)}$}, {C(4), $SM_{C(4)}$}} derived from Algorithm 1 can be computed, and the results are summarized in Table 10.
According to Table 9 and Table 10, the overall hardware overhead can be computed as follows:
HO _ Total i t e _ d ( % ) = e = 1 4 H C S M x y z P T _ d e i = 1 5 H C i = 1.2 + 0.32 + 1.44 + 2 6 + 4 + 8 + 10 + 3 = 4.96 31 = 16.0 %
As a result, the current HO_Totalite_d is greater than HO_Maxsys, and therefore, Algorithm 2 is activated to adjust the system hardware architecture acquired from Algorithm 1. First, the set PT_d is specified to contain all the safety-related hardware elements that are protected by safety mechanisms, and then all the elements in PT_d are sorted according to the increased hardware overhead due to the deployed safety mechanisms. It is evident that PT_d = {C(4), C(3), C(1), C(2)}. Then we declare two pointers, pdown and pup where pdown points to the hardware element with the highest H C S M x y z P T _ d e , i.e., C(4), and pup points to the hardware element with the lowest H C S M x y z P T _ d e , i.e., C(2).
Subsequently, the safety mechanism of C(4), SMHHH(C(4)), is selected as the candidate to be downgraded for the hardware overhead reduction. However, the downgraded safety mechanism could allow C(4) to become MBP again. Therefore, the downgrade of the safety mechanism for C(4) is not allowable and hence we need to let pdown point to the hardware element with the next highest H C S M x y z P T _ d e , i.e., C(3). Unfortunately, the downgrade of C(3)’s safety mechanism is also not allowable so the next candidate C(1) is selected. At this time, the downgrade of SMMMM(C(1)) is allowable and feasible because C(1) is kept as POD with the downgraded safety mechanism. Consequently, SMMMM(C(1)) is replaced by SMLML(C(1)). In accordance with this replacement, all the considered design metrics HO_Totalite_d, SPFMite_d, LFMite_dˆ, and PMHFite_d need to be updated.
$$HO\_Total_{ite\_d}(\%) = \frac{\sum_{e=1}^{4} HC(SM_{xyz}(PT\_d(e)))}{\sum_{i=1}^{5} HC(i)} = \frac{0.48 + 0.32 + 1.44 + 2}{6 + 4 + 8 + 10 + 3} = \frac{4.24}{31} = 13.68\%$$
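$$SPFM_{ite\_d} = 1 - \frac{\sum_{\text{Safety-Related HW elements}} (\lambda_{SPF} + \lambda_{RF})}{\sum_{\text{Safety-Related HW elements}} \lambda} = 1 - \frac{8.45 \times 10^{-8}}{1.43 \times 10^{-6}} = 94.09\%$$
$$LFM_{ite\_d} = 1 - \frac{\sum_{\text{Safety-Related HW elements}} \lambda_{MPF,L}}{\sum_{\text{Safety-Related HW elements}} (\lambda - \lambda_{SPF} - \lambda_{RF})} = 1 - \frac{2.79 \times 10^{-8}}{1.35 \times 10^{-6}} = 97.93\%$$
$$PMHF_{ite\_d} = \lambda_{SPF} + \lambda_{RF} + \lambda_{MPF,L} = 1.12 \times 10^{-7}$$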
Clearly, PMHFite_d exceeds PMHFtar although HO_Totalite_d has met the hardware overhead constraint. Therefore, another design iteration is activated to explore a potential solution.
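The three metric formulas can be checked numerically with a small helper. The sketch below assumes that the summed failure rates are already available; the function and parameter names are hypothetical, and the inputs are the aggregate values quoted in the equations for this iteration.

```python
def hardware_metrics(lambda_total, lambda_spf_rf, lambda_latent):
    """Return (SPFM, LFM, PMHF) from summed failure rates in 1/h.

    lambda_total  : sum of failure rates of all safety-related elements
    lambda_spf_rf : sum of single-point and residual fault rates
    lambda_latent : sum of latent multiple-point fault rates
    """
    spfm = 1 - lambda_spf_rf / lambda_total
    lfm = 1 - lambda_latent / (lambda_total - lambda_spf_rf)
    pmhf = lambda_spf_rf + lambda_latent
    return spfm, lfm, pmhf


# Aggregate values after replacing SMMMM(C(1)) by SMLML(C(1)).
spfm, lfm, pmhf = hardware_metrics(1.43e-6, 8.45e-8, 2.79e-8)
print(f"SPFM = {spfm:.2%}, LFM = {lfm:.2%}, PMHF = {pmhf:.3e} /h")
```

Because the LFM denominator is derived from the same inputs as the SPFM, a single function keeps the three metrics consistent with each other.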
In the next design iteration, the hardware element C(2), pointed to by pup, is selected as the candidate whose deployed safety mechanism is to be upgraded. The upgrade of SMLLL(C(2)) is feasible because a safety mechanism with a higher level of DC exists. Thus, SMLLL(C(2)) is replaced by SMMMM(C(2)). It is worth noting that replacing SMLLL(C(2)) with SMLML(C(2)) cannot resolve the PMHFtar violation, and for this reason SMLML(C(2)) is not applied. Again, we need to update the corresponding HO_Totalite_d, SPFMite_d, LFMite_d, and PMHFite_d.
$$HO\_Total_{ite\_d}(\%) = \frac{\sum_{e=1}^{4} HC(SM_{xyz}(PT\_d(e)))}{\sum_{i=1}^{5} HC(i)} = \frac{0.48 + 0.64 + 1.44 + 2}{6 + 4 + 8 + 10 + 3} = \frac{4.56}{31} = 14.71\%$$
$$SPFM_{ite\_d} = 1 - \frac{\sum_{\text{Safety-Related HW elements}} (\lambda_{SPF} + \lambda_{RF})}{\sum_{\text{Safety-Related HW elements}} \lambda} = 1 - \frac{7.13 \times 10^{-8}}{1.43 \times 10^{-6}} = 95.01\%$$
$$LFM_{ite\_d} = 1 - \frac{\sum_{\text{Safety-Related HW elements}} \lambda_{MPF,L}}{\sum_{\text{Safety-Related HW elements}} (\lambda - \lambda_{SPF} - \lambda_{RF})} = 1 - \frac{2.13 \times 10^{-8}}{1.36 \times 10^{-6}} = 98.44\%$$
$$PMHF_{ite\_d} = \lambda_{SPF} + \lambda_{RF} + \lambda_{MPF,L} = 9.26 \times 10^{-8}$$
The results show that all the hardware architecture metrics and the hardware overhead constraint have been satisfied. Thus, the fault tree can be updated with the results of the safety mechanism deployment as exhibited in Figure 5.
With this simple example, we have demonstrated that the proposed safety-oriented system hardware architecture exploration framework can simultaneously deal with four design metrics (three safety metrics and one hardware overhead constraint) with two exploration algorithms. The framework can deliver a system hardware architecture that conforms to the safety and hardware overhead requirements in a very limited number of design iterations. In the following section, to concretely demonstrate the effectiveness of the proposed framework, we employ a safety-critical AEB (Autonomous Emergency Braking) system adopted in the real automotive industry to exemplify how to apply the proposed framework to such a safety-related system design.

5. Case Study—An Autonomous Emergency Braking System

Figure 6 shows the functional block diagram of the AEB system [4]. The primary function of the AEB system is to warn drivers about emergent situations and autonomously brake vehicles to avoid a serious collision if drivers do not react to the warning signal.
To implement the warning and autonomous braking function, the radar continuously monitors the distance between the subject vehicle and the front vehicle and provides the distance information to the central AEB control node. Once the AEB control node determines that the current distance falls into the dangerous range and a collision could happen at the current relative vehicle speed, it first sends a warning message (a sound or a flashing light) to the driver. If the driver does not react to the warning message and the situation becomes more severe, the AEB control node instructs the electric braking units to immediately perform the braking action to avoid a serious collision.
Figure 7 shows the hardware architecture of the AEB system. The CAN bus is adopted as an in-vehicle communication backbone. According to the designer’s requirements, other advanced in-vehicle communication protocols such as FlexRay or automotive Ethernet can also be employed.
The braking function of the AEB system is implemented with fail-operational consideration. Once any one among four EBD nodes (whether the Brake ECU or the EBM, Electric Brake Module) is diagnosed as having failed, the AEB control node will stop sending the braking force to the failed EBD node. Under the circumstances, braking forces are redistributed to the remaining three working EBD nodes. Therefore, the AEB system can tolerate one failed EBD node with degraded braking performance.
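The redistribution behaviour can be illustrated with a short sketch. The paper does not state how the braking force is split among the remaining nodes, so the equal split below, the function name, and the node labels are all assumptions made for illustration only.

```python
def redistribute_braking(total_force, node_ok, min_healthy=3):
    """Split total_force equally across the healthy EBD nodes.

    node_ok maps a node name to True when the node is diagnosed healthy.
    Equal sharing is an assumption of this sketch, not the paper's rule.
    Raises RuntimeError when fewer than `min_healthy` nodes are healthy,
    since the text only claims tolerance of a single failed node.
    """
    healthy = [name for name, ok in node_ok.items() if ok]
    if len(healthy) < min_healthy:
        raise RuntimeError("too many failed EBD nodes for degraded braking")
    share = total_force / len(healthy)
    return {name: (share if ok else 0.0) for name, ok in node_ok.items()}


# Hypothetical status: EBD node 3 has been diagnosed as failed.
status = {"EBD1": True, "EBD2": True, "EBD3": False, "EBD4": True}
forces = redistribute_braking(1200.0, status)
print(forces)
```

The failed node receives no braking command, which mirrors the statement that the AEB control node stops sending braking force to it.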
Figure 8 illustrates the fault tree constructed from the hardware architecture shown in Figure 7. The K-out-of-N (K/N) gate, here a 3/4 (3-out-of-4) gate over the four EBD nodes, reflects the fail-operational design concept: the failure of one EBD node does not lead to AEB system failure.
The safety mechanism library presented in Section 4.3.1 was developed only to demonstrate the idea of the proposed safety framework. Real-world safety mechanisms and fault-tolerant techniques are more diverse than those defined in Section 4.3.1. Therefore, in addition to the safety mechanisms described in Section 4.3.1, we also employ other types of safety mechanisms in the case study to demonstrate our safety framework with more diverse safety mechanisms in the design of safety-critical automotive systems.
Two kinds of safety mechanisms used in the case study do not belong to the safety mechanism library described in Section 4.3.1. The first is the aforementioned K-out-of-N design, applied at the system level instead of the element level for the EBD nodes. The advantage of implementing element protection at the system level is that no additional safety mechanism is required for each individual hardware element. The drawback is that an error detection scheme must be developed to monitor the health status of the EBD nodes, which also raises the design and verification complexity. Here, we assume that error detection has been embedded in each of the EBD nodes. Such a design also raises the question of how to evaluate the failure rates of the hardware elements, i.e., the Brake ECUs and EBMs in this case study, under the protection of the K/N fault-tolerant design. In fact, the failure rate of each individual hardware element cannot be evaluated through the proposed expressions (1) and (2) because the effectiveness of the K/N fault-tolerant design cannot be directly represented by the diagnostic coverages DC_RF(PT(e)) and DC_DPF,L(PT(e)). Instead, only an overall failure rate for the whole K/N-formed subsystem constituted by the four EBD nodes can be evaluated. The detailed steps of the evaluation are as follows.
(1) Let λB_ECU, λEBM, and λEBD be the failure rates of the Brake ECU, the EBM, and an EBD node (any one of EBD nodes 1-4). Then λEBD = λB_ECU + λEBM because either a failed Brake ECU or a failed EBM leads to the failure of the EBD node.
(2) Let RK/N(t) be the reliability of the K/N-formed subsystem estimated at mission time t. RK/N(t) can be computed through expression (27) as shown below [29,30].
$$R_{K/N}(t) = \sum_{i=K}^{N} \frac{N!}{i!\,(N-i)!} \left(e^{-\lambda_{EBD} \times t}\right)^{i} \left(1 - e^{-\lambda_{EBD} \times t}\right)^{N-i} \tag{27}$$
(3) Let λK/N_Sub be the failure rate of the K/N-formed subsystem. λK/N_Sub can be acquired by the following expression (28).
$$\lambda_{K/N\_Sub} = \frac{-\ln\left(R_{K/N}(t)\right)}{t} \tag{28}$$
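The three steps above can be sketched in Python using the standard K-out-of-N reliability sum for identical, independent nodes with constant failure rates. The function names and the numerical inputs are hypothetical; the example does not attempt to reproduce the value reported for the AEB system.

```python
from math import comb, exp, log


def k_out_of_n_reliability(k, n, failure_rate, t):
    """Probability that at least k of n identical nodes survive until time t."""
    p = exp(-failure_rate * t)  # reliability of a single node
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def equivalent_failure_rate(k, n, failure_rate, t):
    """Constant failure rate that yields the same reliability at time t."""
    return -log(k_out_of_n_reliability(k, n, failure_rate, t)) / t


# Hypothetical inputs chosen only for illustration.
lam_node = 1.0e-6      # node failure rate in 1/h (sum of its parts' rates)
mission_time = 1000.0  # mission time in hours

r_sub = k_out_of_n_reliability(3, 4, lam_node, mission_time)
lam_sub = equivalent_failure_rate(3, 4, lam_node, mission_time)
print(f"R_3/4 = {r_sub:.9f}, equivalent rate = {lam_sub:.3e} /h")
```

Two limiting cases are useful sanity checks: with K = N the subsystem behaves like a series system whose rate is N times the node rate, and with K = N = 1 the node rate is returned unchanged.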
The second type of safety mechanism is hardware duplication. To protect a selected hardware element, a duplicate of the original element is added. The original and the duplicate form a pair, and a comparator checks their outputs for consistency. Any inconsistency means that at least one element in the pair is faulty. In this work, we assume that the duplicated hardware element is implemented with a diversity technique so that the probability of a common-cause failure is low enough to be ignored. Thus, the hardware element pair combined with the comparator causes a safety goal violation only when all three fail concurrently. Such a scenario is a three-point fault, which is classified as a safe fault, as mentioned in Figure 1. Consequently, the failure rate of a hardware element protected by hardware duplication is not counted when computing the three hardware architecture metrics. However, the hardware overhead required to implement hardware duplication is 100%.
In this demonstration, we specify ASIL D as the target to be achieved. Thus, PMHFtar, SPFMtar, and LFMtar are required to be <10^−8 h^−1, ≥99%, and ≥90%, respectively (Table 1). The hardware element failure rates used for the purpose of demonstration are listed in Table 11. System engineers may specify more realistic component failure rates by applying reliability data books such as SN-29500, IEC-62380/61709, and MIL-HDBK-217F, which are widely adopted in the related industry. The percentage of non-safe faults and the applied safety mechanism library for each hardware element can also be found in Table 11. In addition, the mission time t is set to five thousand hours, and the hardware overhead constraint HO_Maxsys is set to 40%.
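The target check implied by Table 1 can be written as a small lookup. The sketch below uses hypothetical names; the thresholds are taken from Table 1, and the two sets of metric values are the ones reported later in this case study for the first design iteration and for the final architecture.

```python
# Target values from Table 1 (SPFM and LFM as fractions, PMHF in 1/h).
ASIL_TARGETS = {
    "B": {"spfm": 0.90, "lfm": 0.60, "pmhf": 1e-7},
    "C": {"spfm": 0.97, "lfm": 0.80, "pmhf": 1e-7},
    "D": {"spfm": 0.99, "lfm": 0.90, "pmhf": 1e-8},
}


def meets_asil(asil, spfm, lfm, pmhf):
    """True when all three hardware architecture metrics meet the ASIL target."""
    target = ASIL_TARGETS[asil]
    return (
        spfm >= target["spfm"]
        and lfm >= target["lfm"]
        and pmhf < target["pmhf"]
    )


first_iteration = meets_asil("D", 0.9933, 0.9981, 3.215e-8)
final_design = meets_asil("D", 0.9983, 0.9952, 9.58e-9)
print(first_iteration, final_design)
```

Note that the PMHF comparison is strict, matching the "<" in Table 1, while the SPFM and LFM comparisons are inclusive.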
Now we can compute λK/N_Sub by applying the specified failure rates and mission time to expressions (27) and (28). The acquired λK/N_Sub is 4.18 × 10^−11, which is approximately three to four orders of magnitude smaller than the failure rates of the other hardware elements, so its effect on the overall system failure rate can be ignored. Thus, λK/N_Sub is not counted in the following hardware architecture metric calculation, and the K/N-formed subsystem, i.e., the four EBD nodes, is excluded from the FTA-based weak-point analysis. For the remaining hardware elements, the identified MCSs, their failure probabilities, the quantified gaps acquired through expressions (18)–(20), and the corresponding Max_Gap are summarized in Table 12.
According to Table 12, the PT is assigned as PT = PT_MBP = {CAN bus, AEB microcontroller, Brake pedal sensor, Brake pedal ECU, Radar, Radar ECU, Speed sensor, Speed Meter ECU}. Then the ASIL-oriented hardware architecture exploration algorithm is performed. For the sake of saving space, the AEB microcontroller, Brake pedal sensor, Brake pedal ECU, Radar ECU, Speed sensor, and Speed Meter ECU are abbreviated as AEB_MCU, B_SEN, B_PECU, R_ECU, S_SEN, and S_ECU, respectively, in the following demonstration.
Among all MBP weak points, it is evident that the Max_Gap of the CAN bus and AEB_MCU are much higher than others. Thus, to reduce the failure rates of these two hardware elements more efficiently, we apply the hardware duplication with diversity design to the CAN bus and AEB_MCU to let their faults become safe faults. Table 13 exhibits the results of the safety mechanism deployment in this design iteration.
After safety mechanism deployment, the hardware architecture metrics are calculated as shown below:
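$$SPFM_{ite\_d} = 1 - \frac{\sum_{\text{Safety-Related HW elements}} (\lambda_{SPF} + \lambda_{RF})}{\sum_{\text{Safety-Related HW elements}} \lambda} = 1 - \frac{2.51 \times 10^{-8}}{3.76 \times 10^{-6}} = 99.33\%$$
$$LFM_{ite\_d} = 1 - \frac{\sum_{\text{Safety-Related HW elements}} \lambda_{DPF,L}}{\sum_{\text{Safety-Related HW elements}} (\lambda - \lambda_{SPF} - \lambda_{RF})} = 1 - \frac{7.05 \times 10^{-9}}{3.73 \times 10^{-6}} = 99.81\%$$
$$PMHF_{ite\_d} = \lambda_{SPF} + \lambda_{RF} + \lambda_{DPF,L} = 3.215 \times 10^{-8}$$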
Consequently, only PMHFite_d still exceeds PMHFtar, and therefore a subsequent design iteration is required. The process of the FTA-based weak-point analysis and ASIL-oriented hardware architecture exploration in the subsequent design iterations is very similar to the previous iteration and is omitted here. The derived outcomes of safety mechanism deployment are SMCAN bus = SMHHH, SMAEB_MCU = SMHHH, SMB_SEN = SMHHH, SMB_PECU = SMMMM, SMRadar = SMHHH, SMR_ECU = SMMMM, SMS_SEN = SMHHH, SMS_ECU = SMMMM, and SMPS = SMLML (PS stands for the power supply). The corresponding hardware architecture metrics all comply with the target ASIL D requirements: SPFM = 99.87%, LFM = 99.89%, and PMHF = 8.995 × 10^−9. Ten design iterations are executed to reach these results. Next, the total hardware overhead needs to be calculated according to the hardware overhead data summarized in Table 14.
$$HO\_Total_{ite\_d}(\%) = \frac{\sum_{e=1}^{9} HC(SM_{xyz}(PT(e)))}{\sum_{i=1}^{17} HC(i)} = \frac{82.84}{204} = 40.61\% > HO\_Max_{sys}$$
Because the current system hardware overhead does not meet the constraint, the HO-oriented hardware architecture exploration algorithm is activated to solve the hardware overhead problem.
The process of performing the HO-oriented hardware architecture exploration is similar to the previous example presented in Section 4.3.3. Therefore, we do not repeat the process and directly provide the modified parts, which are summarized as follows:
SMS_SEN = SMHHH → SMMHM, SMS_ECU = SMMMM → SMMHM, and SMPS = SMLML → SMHHH.
$$HO\_Total_{ite\_d}(\%) = \frac{\sum_{e=1}^{9} HC(SM_{xyz}(PT(e)))}{\sum_{i=1}^{17} HC(i)} = \frac{81}{204} = 39.71\% < HO\_Max_{sys}$$
The hardware architecture metrics, which all comply with the target ASIL D requirements, are SPFM = 99.83%, LFM = 99.52%, and PMHF = 9.58 × 10^−9.
Thus, after a total of thirteen design iterations, including eleven iterations for the ASIL-oriented algorithm and two iterations for the HO-oriented algorithm, the resulting fault tree and the FMEDA report are obtained, as shown in Figure 9 and Table 15, respectively.
Through the AEB case study, we have illustrated the proposed safety-oriented system hardware architecture exploration framework for safety-critical automotive system design. The case study also shows that only a limited number of design iterations (thirteen in this case) is required to achieve a cost-effective and reliable hardware architecture that complies with the ASIL safety goal and the hardware overhead constraint simultaneously.

6. Conclusions and Future Works

In this study, we focus on the design of safety-critical automotive systems and consider the ASIL safety goal and the hardware overhead constraint together in the development process. A safety-oriented system hardware architecture exploration framework is proposed to tackle the complexity of safety-critical automotive system design. The core of the framework consists of three phases, namely FTA-based weak-point analysis, an ASIL-oriented hardware architecture exploration algorithm, and an HO-oriented hardware architecture exploration algorithm. The first main contribution of this work is the development of the ASIL-oriented and HO-oriented hardware architecture exploration algorithms, which rapidly discover a cost-effective and robust hardware architecture that satisfies both the target ASIL, represented by the three hardware architecture metrics SPFM, LFM, and PMHF defined in ISO-26262, and the system hardware overhead constraint. The second is the illustration of how to accomplish hardware architecture exploration for the AEB system so that it complies with the requirements of the ISO-26262 functional safety standard and the hardware overhead constraint. Through the proposed FTA-based weak-point analysis and safety-oriented fault-tolerant design methodologies, we have shown how to effectively apply safety mechanisms to the system design so that the required ASIL can be achieved within the hardware overhead constraint.
We have overcome the high design complexity of fault-tolerant hardware design and kept the number of design iterations low enough to make the approach feasible and effective in real cases. In addition, the proposed methodologies are well suited to implementation in an EDA (Electronic Design Automation) tool. Integrating our design framework into an automotive design tool chain is our next task.

Author Contributions

Conceptualization, K.-L.L.; methodology, K.-L.L. and Y.-Y.C.; validation, K.-L.L. and Y.-Y.C.; formal analysis, K.-L.L. and Y.-Y.C.; investigation, K.-L.L. and Y.-Y.C.; writing—original draft preparation, K.-L.L.; writing—review and editing, Y.-Y.C.; supervision, Y.-Y.C.; project administration, Y.-Y.C.; funding acquisition, Y.-Y.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the MOST academic research project under Contract Number 108-2221-E-305-004-MY2.

Informed Consent Statement

Not applicable.

Acknowledgments

The authors acknowledge the support of MOST academic research project under Contract Number 108-2221-E-305-004-MY2.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. ISO/DIS 26262; ISO-26262 Road Vehicles—Functional Safety. International Organization for Standardization: Geneva, Switzerland, 2018.
  2. Marchio, F.; Vittorelli, B.; Colombo, R. Automotive electronics: Application & technology megatrends. In Proceedings of the 40th ESSDERC, European Solid-State Circuits Conference, Venice Lido, Italy, 22–26 September 2014; pp. 23–29.
  3. Clark, J.O. System of Systems Engineering and Family of Systems Engineering From a Standards, V-Model, and Dual-V Model Perspective. In Proceedings of the 3rd IEEE Systems Conference, Vancouver, BC, Canada, 23–26 March 2009; pp. 381–387.
  4. Cheon, J.S.; Kim, J.; Jeon, J.; Lee, S.M. Brake by Wire Functional Safety Concept Design for ISO/DIS 26262; SAE Technical Paper; SAE International: Warrendale, PA, USA, 2011.
  5. Das, N.; Taylor, W. Quantified fault tree techniques for calculating hardware fault metrics according to ISO 26262. In Proceedings of the 2016 IEEE Symposium on Product Compliance Engineering (ISPCE), Anaheim, CA, USA, 16–18 May 2016; pp. 1–8.
  6. Cherfi, A. Toward an Efficient Generation of ISO 26262 Automotive Safety Analyses. Ph.D. Thesis, Polytechnic School, Pasadena, CA, USA, July 2015.
  7. Sakurai, A. Generalized formula for the calculation of a probabilistic metric for random hardware failures in redundant subsystems. In Proceedings of the 2017 IEEE Symposium on Product Compliance Engineering (ISPCE), San Jose, CA, USA, 8–10 May 2017; pp. 1–5.
  8. Wang, T.; Chen, X.; Cai, Z.; Mi, J.; Lian, X. A mixed model to evaluate random hardware failures of whole-redundancy system in ISO 26262 based on fault tree analysis and Markov chain. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 2018, 233, 890–904.
  9. Famfulik, J.; Richtar, M.; Rehak, R.; Smiraus, J.; Dresler, P.; Fusek, M.; Mikova, J. Application of hardware reliability calculation procedures according to ISO 26262 standard. Qual. Reliab. Eng. Int. 2020, 36, 1822–1836.
  10. Atsushi, S. A Framework for Performing Quantitative Fault Tree Analyses for Subsystems with Periodic Repairs. In Proceedings of the 2021 Annual Reliability and Maintainability Symposium (RAMS), Orlando, FL, USA, 24–27 May 2021; pp. 1–6.
  11. Ghadhab, M.; Junges, S.; Katoen, J.-P.; Kuntz, M.; Volk, M. Safety analysis for vehicle guidance systems with dynamic fault trees. Reliab. Eng. Syst. Saf. 2019, 186, 37–50.
  12. Lu, K.; Chen, Y. ISO 26262 ASIL-Oriented Hardware Design Framework for Safety-Critical Automotive Systems. In Proceedings of the 2019 IEEE International Conference on Connected Vehicles and Expo (ICCVE), Graz, Austria, 4–8 November 2019; pp. 1–6.
  13. Xie, G.; Li, Y.; Han, Y.; Xie, Y.; Zeng, G.; Li, R. Recent Advances and Future Trends for Automotive Functional Safety Design Methodologies. IEEE Trans. Ind. Inform. 2020, 16, 5629–5642.
  14. Xie, G.; Ma, W.; Peng, H.; Li, R.; Li, K. Price Performance-Driven Hardware Cost Optimization Under Functional Safety Requirement in Large-Scale Heterogeneous Distributed Embedded Systems. IEEE Trans. Ind. Electron. 2019, 68, 4485–4497.
  15. Xie, G.; Chen, Y.; Li, R.; Li, K. Hardware Cost Design Optimization for Functional Safety-Critical Parallel Applications on Heterogeneous Distributed Embedded Systems. IEEE Trans. Ind. Inform. 2017, 14, 2418–2431.
  16. Xie, G.; Chen, Y.; Liu, Y.; Li, R.; Li, K. Minimizing Development Cost with Reliability Goal for Automotive Functional Safety During Design Phase. IEEE Trans. Reliab. 2017, 67, 196–211.
  17. Xie, G.; Zeng, G.; Li, R. Safety Enhancement for Real-Time Parallel Applications in Distributed Automotive Embedded Systems: A Stable Stopping Approach. IEEE Trans. Parallel Distrib. Syst. 2020, 31, 2067–2080.
  18. Bandur, V.; Pantelic, V.; Tomashevskiy, T.; Lawford, M. A Safety Architecture for Centralized E/E Architectures. In Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), Taipei, Taiwan, 21–24 June 2021; pp. 67–70.
  19. Gheraibia, Y.; Kabir, S.; Djafri, K.; Krimou, H. An overview of the approaches for automotive safety integrity levels allocation. J. Fail. Anal. Prev. 2018, 18, 707–720.
  20. Ebner, C.; Gorelik, K.; Zimmermann, A. Model-Based Design Space Exploration for Fail-Operational Mechatronic Systems. In Proceedings of the IEEE International Symposium on Systems Engineering (ISSE), Vienna, Austria, 13 September–13 October 2021; pp. 1–8.
  21. Vermeulen, F.B.; Goossens, K.G.W. Automotive Architecture Topologies: Analysis for Safety-Critical Autonomous Vehicle Applications. IEEE Access 2021, 9, 62837–62846.
  22. Vermeulen, F.B.; Goossens, K. Component-Level ASIL Decomposition for Automotive Architectures. In Proceedings of the 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W), Portland, OR, USA, 24–27 June 2019; pp. 62–69.
  23. Hu, B.; Xu, S.; Cao, Z.; Zhou, M. Safety-Guaranteed and Development Cost-Minimized Scheduling of DAG Functionality in an Automotive System. IEEE Trans. Intell. Transp. Syst. 2020, 23, 3074–3086.
  24. Byun, S.; Yang, I.; Song, M.G.; Lee, D. Reliability Evaluation of Steering System Using Dynamic Fault Tree. In Proceedings of the 2013 IEEE Intelligent Vehicles Symposium (IV), Gold Coast, Australia, 23–26 June 2013; pp. 1416–1420.
  25. Papadopoulos, Y.; Maruhn, M. Model-based synthesis of fault trees from MATLAB-Simulink models. In Proceedings of the 2001 International Conference on Dependable Systems and Networks, Gothenburg, Sweden, 1–4 July 2001; pp. 77–82.
  26. Sharvia, S.; Papadopoulos, Y. Integrating model checking with HiP-HOPS in model-based safety analysis. Reliab. Eng. Syst. Saf. 2015, 135, 64–80.
  27. Huang, C.; Li, L. Architectural design and analysis of a steer-by-wire system in view of functional safety concept. Reliab. Eng. Syst. Saf. 2020, 198, 106822.
  28. Lu, K.-L.; Chen, Y.-Y. Model-based design, analysis and assessment framework for safety-critical systems. In Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks—Supplemental Volume (DSN-S), Taipei, Taiwan, 21–24 June 2021; pp. 25–26.
  29. k-out-of-n Systems—ReliaWiki. Available online: https://www.reliawiki.com/index.php?oldid=60226 (accessed on 8 May 2022).
  30. Romeu, J.L. Understanding series and parallel systems reliability. In Selected Topics in Assurance Related Technologies (START); Department of Defense Reliability Analysis Center (DoD RAC): Rome, Italy; New York, NY, USA, 2004; Volume 11, pp. 1–8.
Figure 1. ISO 26262 fault classification and failure rate calculation process.
Figure 2. Flowchart of the proposed safety-oriented system HW architecture exploration framework.
Figure 3. Execution flow of the proposed FTA-based weak-point analysis.
Figure 4. Constructed fault tree for the simple system.
Figure 5. Updated fault tree for the simple system with final safety mechanism deployment.
Figure 6. Functional block diagram of AEB system.
Figure 7. Hardware architecture of the AEB system.
Figure 8. Constructed fault tree of the illustrated AEB system.
Figure 9. Resulting fault tree for the AEB system hardware architecture in compliance with ASIL D and the hardware overhead constraint.
Table 1. Target values of SPFM, LFM, and PMHF for different ASILs.
| | ASIL B | ASIL C | ASIL D |
|---|---|---|---|
| SPFM | ≥90% | ≥97% | ≥99% |
| LFM | ≥60% | ≥80% | ≥90% |
| PMHF | <10^−7 h^−1 | <10^−7 h^−1 | <10^−8 h^−1 |
Table 2. The hardware elements and their failure rates.
| HW Element | HW Unit * | % Non-Safe Fault | Failure Rate λC(i) (/h) |
|---|---|---|---|
| C(1) | 6 | 100% | 7.6 × 10^−8 |
| C(2) | 4 | 100% | 4.4 × 10^−8 |
| C(3) | 8 | 100% | 2.6 × 10^−7 |
| C(4) | 10 | 100% | 1.05 × 10^−6 |
| C(5) | 3 | 0% | 1.2 × 10^−8 |
* 1 HW unit = 10 K gate counts.
Table 3. List of MCSs and their failure probabilities.
| MCS | SPF/DPF | FP(MCSk) (t = 5000 h) |
|---|---|---|
| {C(1)} | SPF | 3.79928 × 10^−4 |
| {C(2)} | SPF | 2.19976 × 10^−4 |
| {C(3)} | SPF | 1.29916 × 10^−3 |
| {C(4)} | SPF | 5.23624 × 10^−3 |
Table 4. List of MCSs and their quantified gaps to the target values and MBP/POD identifications.
| MCS | Gap_PMHF(MCSk) | Gap_SPFM(MCSk) | Gap_LFM(MCSk) | MBP_PMHF/POD_PMHF | MBP_SPFM/POD_SPFM | MBP_LFM/POD_LFM |
|---|---|---|---|---|---|---|
| {C(1)} | 7.6005 × 10^−1 | 5.3156 × 10^−1 | N/A | POD_PMHF | POD_SPFM | N/A |
| {C(2)} | 4.4006 × 10^−1 | 3.0777 × 10^−1 | N/A | POD_PMHF | POD_SPFM | N/A |
| {C(3)} | 2.5990 × 10^0 | 1.8177 × 10^0 | N/A | MBP_PMHF | MBP_SPFM | N/A |
| {C(4)} | 1.0475 × 10^1 | 7.3260 × 10^0 | N/A | MBP_PMHF | MBP_SPFM | N/A |
Table 5. Safety mechanism deployment for the PT according to DC_RF(PT(e)).
| MCS | Gap_SM_D | Deployed SM | DC_RF(PT(e)) |
|---|---|---|---|
| {C(3)} | 2.5990 × 10^0 | SMMyM | 90% |
| {C(4)} | 1.0475 × 10^1 | SMHyH | 99% |
Table 6. Safety mechanism deployment for the PT according to DC_DPF,L(PT(e)).
| MCS | Gap_SM_D | Deployed SM | DC_DPF,L(PT(e)) |
|---|---|---|---|
| {C(3), SMC(3)} | 3.1613 × 10^0 | SMMMM | 90% |
| {C(4), SMC(4)} | 1.1588 × 10^1 | SMHHH | 99% |
Table 7. List of MCSs and their failure probabilities with updated system hardware architecture.
| MCS | Gap_PMHF(MCSk) | Gap_SPFM(MCSk) | Gap_LFM(MCSk) | MBP_PMHF/POD_PMHF | MBP_SPFM/POD_SPFM | MBP_LFM/POD_LFM |
|---|---|---|---|---|---|---|
| {C(1)} | 7.6005 × 10^−1 | 5.3156 × 10^−1 | 1.49357 × 10^−1 | POD_PMHF | POD_SPFM | POD_LFM |
| {C(2)} | 4.4006 × 10^−1 | 3.0777 × 10^−1 | 8.64767 × 10^−2 | POD_PMHF | POD_SPFM | POD_LFM |
| {C(3), SMC(3)} | 4.9409 × 10^−1 | 3.4556 × 10^−1 | 9.70944 × 10^−2 | POD_PMHF | POD_SPFM | POD_LFM |
| {C(4), SMC(4)} | 2.0900 × 10^−1 | 1.4617 × 10^−1 | 4.10700 × 10^−2 | POD_PMHF | POD_SPFM | POD_LFM |
Table 8. The assigned DC_RF(PT(e)) and DC_DPF,L(PT(e)) in the design iteration 4–6.
| Hardware Element | Iteration 3 DC_RF | Iteration 3 DC_DPF,L | Iteration 4 DC_RF | Iteration 4 DC_DPF,L | Iteration 5 DC_RF | Iteration 5 DC_DPF,L | Iteration 6 DC_RF | Iteration 6 DC_DPF,L |
|---|---|---|---|---|---|---|---|---|
| C(1) | 60% | 60% | 60% | 90% | 60% | 90% | 90% | 90% |
| C(2) | -- | -- | -- | -- | 60% | 60% | 60% | 60% |
| C(3) | 90% | 99% | 90% | 99% | 90% | 99% | 90% | 99% |
| C(4) | 99% | 99% | 99% | 99% | 99% | 99% | 99% | 99% |
Table 9. The hardware overhead of safety mechanisms for the elements in the system. The SM columns give HO(SMxyz(PT(e))) in %.
| Hardware Element | Hardware Unit | SMLLL | SMLML | SMMMM | SMMHM | SMHHH |
|---|---|---|---|---|---|---|
| C(1) | 6 | 6 | 8 | 20 | 24 | 32 |
| C(2) | 4 | 8 | 10 | 16 | 20 | 40 |
| C(3) | 8 | 8 | 10 | 16 | 18 | 24 |
| C(4) | 10 | 6 | 8 | 12 | 16 | 20 |
| C(5) | 3 | 10 | 12 | 20 | 24 | 30 |
Table 10. List of PT_d(e) and their HC(SMxyz(PT_d(e))).
| PT_d | HC(i) | HO(SMxyz(PT_d(e))) (%) | HC(SMxyz(PT_d(e))) |
|---|---|---|---|
| {C(1), SMC(1)} | 6 | HO(SMMMM(C(1))) = 20% | 1.2 |
| {C(2), SMC(2)} | 4 | HO(SMLLL(C(2))) = 8% | 0.32 |
| {C(3), SMC(3)} | 8 | HO(SMMHM(C(3))) = 18% | 1.44 |
| {C(4), SMC(4)} | 10 | HO(SMHHH(C(4))) = 20% | 2.0 |
Table 11. Hardware element’s failure rates of the AEB system.
| Hardware Elements | Failure Rate λ (/h) | % Non-Safe Fault | SM_Lib(C(i)) |
|---|---|---|---|
| Brake ECU | 3.3 × 10^−7 | 100% | Not Necessary |
| EBM | 4.2 × 10^−7 | 100% | |
| CAN bus | 2.4 × 10^−7 | 100% | {SMLML, SMLHL, SMHHH} |
| AEB microcontroller | 3.8 × 10^−7 | 100% | {SMLML, SMLHL, SMMMM, SMMHM, SMHHH} |
| Brake pedal sensor | 2.6 × 10^−8 | 100% | |
| Brake pedal ECU | 1.05 × 10^−8 | 100% | |
| Radar | 1.3 × 10^−7 | 100% | |
| Radar ECU | 1.05 × 10^−8 | 100% | |
| Speed sensor | 2.6 × 10^−8 | 100% | |
| Speed Meter ECU | 1.05 × 10^−8 | 100% | |
| Power supply | 2 × 10^−8 | 100% | {SMLHL, SMMHM, SMHHH} |
Table 12. List of MCSs and their failure probabilities for the AEB system.
| MCS | FP(MCSk)(t) | Gap_PMHF(MCSk) | Gap_SPFM(MCSk) | Gap_LFM(MCSk) | Max_Gap |
|---|---|---|---|---|---|
| {CAN bus} | 1.19928 × 10^−3 | 2.3986 × 10^1 | 6.3856 × 10^0 | N/A | 2.3986 × 10^1 |
| {AEB microcontroller} | 1.89820 × 10^−3 | 3.7965 × 10^1 | 1.0107 × 10^1 | N/A | 3.7965 × 10^1 |
| {Brake pedal sensor} | 5.24986 × 10^−5 | 2.5999 × 10^0 | 6.9178 × 10^−1 | N/A | 2.5999 × 10^0 |
| {Brake pedal ECU} | 1.29992 × 10^−4 | 1.0500 × 10^0 | 2.7953 × 10^−1 | N/A | 1.0500 × 10^0 |
| {Radar} | 2.64965 × 10^−4 | 5.2994 × 10^0 | 1.4108 × 10^0 | N/A | 5.2994 × 10^0 |
| {Radar ECU} | 5.24986 × 10^−5 | 1.0500 × 10^0 | 2.7953 × 10^−1 | N/A | 1.0500 × 10^0 |
| {Speed sensor} | 1.29992 × 10^−4 | 2.5999 × 10^0 | 6.9178 × 10^−1 | N/A | 2.5999 × 10^0 |
| {Speed Meter ECU} | 5.24986 × 10^−5 | 1.0500 × 10^0 | 2.7953 × 10^−1 | N/A | 1.0500 × 10^0 |
| {Power supply} | 9.99995 × 10^−6 | 2.0000 × 10^−1 | 5.3245 × 10^−2 | N/A | 2.0000 × 10^−1 |
Table 13. Safety mechanism deployments and their DC_RF and DC_DPF,L for PT.
| PT(e) | Deployed SM | DC_RF(PT(e)) | DC_DPF,L(PT(e)) |
|---|---|---|---|
| {CAN bus, SMCAN bus} | Duplication | -- | -- |
| {AEB_MCU, SMAEB_MCU} | Duplication | -- | -- |
| {B_SEN, SMB_SEN} | SMMMM | 90% | 90% |
| {B_PECU, SMB_PECU} | SMLML | 60% | 90% |
| {Radar, SMRadar} | SMMHM | 90% | 99% |
| {R_ECU, SMR_ECU} | SMLML | 60% | 90% |
| {S_SEN, SMS_SEN} | SMMMM | 90% | 90% |
| {S_ECU, SMS_ECU} | SMLML | 60% | 90% |
Table 14. Hardware overhead of safety mechanism for each AEB hardware element. The SM columns give HO(SMxyz(PT(e))) in %.
| Hardware Element | Hardware Unit | SMLLL | SMLML | SMMMM | SMMHM | SMHHH |
|---|---|---|---|---|---|---|
| Brake ECU 1-4 | 8 | -- | -- | -- | -- | -- |
| EBM 1-4 | 4 | -- | -- | -- | -- | -- |
| CAN bus | 16 | 8 | 10 | -- | -- | 100 |
| AEB MCU | 20 | 20 | 25 | 50 | 55 | 100 |
| Brake pedal sensor | 8 | 24 | 28 | 45 | 50 | 75 |
| Brake pedal ECU | 14 | 25 | 30 | 60 | 64 | 80 |
| Radar | 12 | 28 | 30 | 48 | 54 | 75 |
| Radar ECU | 16 | 28 | 32 | 52 | 56 | 75 |
| Speed sensor | 10 | 20 | 24 | 45 | 48 | 75 |
| Speed Meter ECU | 14 | 20 | 25 | 48 | 52 | 75 |
| Power supply | 6 | 12 | 15 | -- | -- | 20 |
Table 15. FMEDA report for the AEB system with ASIL D safety goal.
| Hardware Element | Failure Rate | SR | Failure Mode (FM) | FD | V | SM | FMC | RF/SPF | VIL | SM | FMCL | LMPF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CAN Bus | 2.4 × 10^−7 | x | Module failure | 100% | | Duplication | 100% | 0 | x | Duplication | 100% | 0 |
| Brake ECU 1-4 | 3.3 × 10^−7 | x | Module failure | 100% | | K/N | 100% | 0 | x | K/N | 100% | 0 |
| EBM 1-4 | 4.2 × 10^−7 | x | Module failure | 100% | | K/N | 100% | 0 | x | K/N | 100% | 0 |
| AEB MCU | 3.8 × 10^−7 | x | Module failure | 100% | | Duplication | 100% | 0 | x | Duplication | 100% | 0 |
| Brake pedal ECU | 1.05 × 10^−8 | x | Module failure | 100% | x | SMB_PECU | 99% | 2.6 × 10^−10 | x | SMB_PECU | 90% | 2.57 × 10^−10 |
| Brake pedal sensor | 2.6 × 10^−8 | x | Module failure | 100% | x | SMB_SEN | 90% | 1.05 × 10^−9 | x | SMB_SEN | 99% | 9.45 × 10^−10 |
| Radar | 5.3 × 10^−8 | x | Module failure | 100% | x | SMRadar | 99% | 5.3 × 10^−10 | x | SMRadar | 99% | 5.25 × 10^−10 |
| Radar ECU | 1.05 × 10^−8 | x | Module failure | 100% | x | SMR_ECU | 90% | 1.05 × 10^−9 | x | SMR_ECU | 99% | 9.45 × 10^−10 |
| Speed sensor | 2.5 × 10^−8 | x | Module failure | 100% | x | SMS_SEN | 90% | 2.6 × 10^−9 | x | SMS_SEN | 99% | 2.34 × 10^−10 |
| Speed Meter ECU | 1.05 × 10^−8 | x | Module failure | 100% | x | SMS_ECU | 90% | 1.05 × 10^−9 | x | SMS_ECU | 99% | 9.45 × 10^−11 |
| Power supply | 2.0 × 10^−8 | x | Module failure | 100% | x | SMPS | 99% | 2.00 × 10^−11 | x | SMPS | 99% | 1.98 × 10^−11 |
| Total | 3.76 × 10^−6 | | | | | | | 6.56 × 10^−9 | | | | 3.02 × 10^−9 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Lu, K.-L.; Chen, Y.-Y. Safety-Oriented System Hardware Architecture Exploration in Compliance with ISO 26262. Appl. Sci. 2022, 12, 5456. https://doi.org/10.3390/app12115456
