1. Introduction
According to the British standard BS EN 13306:2017, “Maintenance is a combination of all technical, administrative, and managerial actions during the life cycle of an item intended to retain it in, or restore it to, a state in which it can perform the required function [
1].” Numerous maintenance types and techniques, such as condition-based maintenance (CBM) [
2,
3], predictive maintenance [
4], prescriptive maintenance [
5], remote maintenance [
6,
7], preventive maintenance [
8,
9], and e-maintenance [
10,
11] have been developed in recent decades. Owners of various pieces of equipment face three key maintenance challenges: lowering maintenance costs, increasing availability or operational reliability, and selecting the most effective techniques to improve operational characteristics. The estimated maintenance costs range from 15% to 70% of the cost of items sold [
12]. A significant part of the losses is due to unplanned downtime. According to [
13], industrial manufacturers incur an estimated USD 50 billion in annual losses owing to unplanned downtime and equipment failure accounts for 42% of this unplanned downtime. Therefore, it is crucial to assess the effectiveness of maintenance policies in addition to developing them.
In the following, we discuss the concept of a maintenance and repair system (MRS). We define MRS as a set of tools, maintenance, and repair documentation, and performers necessary to maintain and restore the object of maintenance (OM). The difference between maintenance and repair is that maintenance is conducted to prevent unexpected equipment downtime, whereas repair is conducted following downtime to restore the OM operable state and reduce losses. Under MRS effectiveness, we refer to a set of properties that characterize the ratio between the costs of resources (material, time, or labor) to maintain and restore the health of the OM and the effect achieved. The effectiveness of MRS depends on the reliability, maintainability, and durability of the OM as well as the trustworthiness of inspections or condition monitoring. The quantitative characteristics of various MRS properties are indicators of the reliability, maintainability, durability, and trustworthiness of the inspections. For example, indicators of inspection trustworthiness quantitatively characterize the properties of the inspection tool, which is a subsystem in the MRS, to objectively display the actual condition of the OM based on the inspection results.
Currently, technical and technical-economic indicators are used to quantify maintenance effectiveness. Technical indicators include the achieved availability, inherent availability, mission availability, and operational reliability. Technical-economic indicators include the long-run average profit per unit time and long-run average cost per unit time.
Choosing maintenance effectiveness indicators of a system requires identifying the key features that distinguish it from others. Therefore, for the proper selection of MRS effectiveness indicators, the development of a technical system classification is required according to signs that consider the unique features of their design, intended use, and operating conditions.
After selecting the MRS effectiveness indicators, a maintenance model should be developed or chosen for the calculation. This model should reflect the features of the structural construction of the system in terms of reliability, the presence of external manifestations of failure, and the main signs of maintenance operations. Since any MRS necessarily includes an inspection subsystem and a recovery subsystem, the maintenance model should consider the characteristics of operations, including inspection and repair. The efficiency of using a system for its intended purpose depends on the type of inspection. Therefore, the maintenance model should consider the type of inspection and the degree of system repair.
Based on the literature review in
Section 2, the following findings can be drawn:
- (1)
Performance indicators are considered at three levels of maintenance control: strategic, tactical, and operational.
- (2)
An analysis of the published studies revealed that diverse types of maintenance performance indicators exist in the literature for each level of maintenance control.
- (3)
At the operational stage, the most frequently used maintenance indicators are instantaneous availability, steady-state availability, average availability, inherent availability, mission availability, operational reliability, long-run average profit per unit time, long-run average cost per unit time, and average lifetime maintenance costs.
- (4)
To date, several classifications of maintenance performance indicators have been developed, including, for example, such categories of indicators as equipment-related, maintenance task-related, cost-related, and so on. However, the signs by which indicators should be selected in each group for systems of various purposes have not been indicated. Since a formalized approach to selecting the indicators has not been developed, users are forced to subjectively choose suitable indicators for their circumstances from a set of known indicators.
- (5)
The same effectiveness indicators were used for diverse types of maintenance, including preventive, corrective, condition-based, predictive, and prescriptive maintenance. Simultaneously, there is no formal classification of maintenance models that allows an appropriate model to be objectively chosen according to certain features.
- (6)
There is no formalized approach to the classification of systems for selecting maintenance effectiveness indicators and for the classification of the maintenance models necessary to calculate the selected indicators.
- (7)
In the existing preventive, corrective, condition-based, and predictive maintenance models, it is assumed that failures are detected using periodic (or sequential) inspections and/or continuous condition monitoring.
- (8)
Prescriptive maintenance uses condition monitoring and artificial intelligence to track a larger range of data and predict when maintenance is necessary in real-time.
In this study, we propose a classification of technical systems that allows us to select maintenance effectiveness indicators depending on signs such as system maintainability, failure consequences, an economic assessment of the failure consequences, and a temporary mode of system use. This classification makes it possible to identify 13 subgroups of indicators, including most of the known indices and some new ones.
The study also proposes a classification of maintenance models to calculate each maintenance effectiveness indicator. The developed classification allows the selection of a maintenance model depending on features such as the system structure in terms of reliability, type of inspection, degree of the system’s restoration, and external manifestations of failure. This classification makes it possible to identify one hundred and sixty-eight subgroups of the maintenance models.
We illustrate how to determine the maintenance effectiveness indicators for a subgroup of maintenance models, which assume that the system has a single-component structure, that the type of inspection is similar to CBM, that the perfect repair is used, and that only hidden failures occur in the system. We developed mathematical maintenance models for the case of an arbitrary distribution of time-to-failure and multiple imperfect inspections, where the probabilities of correct and incorrect decisions depend on the time of inspection and failure occurrence. A numerical example illustrates the determination of the optimal number of inspections for a particular stochastic degradation process in condition-based and corrective maintenance.
The remainder of this article is organized as follows:
Section 2 provides a review of the classification of maintenance effectiveness indicators and maintenance models. The classification of technical systems for selecting maintenance effectiveness indicators is discussed in
Section 3.
Section 4 presents the classification of the maintenance models.
Section 5 develops the condition-based maintenance model and provides formulas for calculating maintenance effectiveness indicators. In
Section 6, an example of deterioration process modeling is considered. The results are outlined in
Section 7, followed by a discussion in
Section 8.
Section 9 presents the conclusions and potential future work. Abbreviations and references are provided at the end of this article.
2. Literature Review
Most studies on the evaluation of maintenance indicators deal with performance metrics that fall into two groups: those that show how maintenance affects a system’s overall performance, and those that address a system’s operational reliability and availability. The long-run average profit per unit time of using the system is representative of the first group of indicators. Well-known representatives of the second group are achieved and inherent availability.
The literature on the classification of maintenance effectiveness or performance indicators is abundant. Reviews of the classification of maintenance performance indicators can be found in [
14,
15,
16]. In summary, the study [
14] concludes that successful maintenance performance indicators should concentrate on assessing total maintenance effectiveness [
17]. Seven categories of maintenance performance indicators that measure the overall maintenance effectiveness were included in this study [
18]. Cost-, equipment-, and maintenance-task-related indicators are the most crucial. The study [
15] examined the development of maintenance performance measures concerning the crucial maintenance organizational function and its resources, activities, and practices. According to a study [
16], lagging measures, such as maintenance cost and safety performance, dominate the measurement of maintenance performance. Leading indicators such as maintenance work processes are used less frequently. The findings revealed no connection between the indicators employed and the maintenance goals pursued. The study in [
19] used the original ELECTRE I, a multi-criteria decision-making method, to present a novel method for selecting maintenance key performance indicators. Following evaluations based on the key criteria, the suggested methodology generates a ranking of potential options. The study in [
20] considered three levels of maintenance control: strategic, tactical, and operational. A set of maintenance performance indicators can be considered at each maintenance level. This study concentrates on maintenance performance indicators that support decisions at the operational level. The balanced scorecard is a different and comprehensive approach to measuring maintenance performance, as suggested in this study [
21]. It is based on the idea that no single measure can accurately reflect the overall maintenance performance and that multiple measures must be used in conjunction.
Let us consider the published indicators of maintenance effectiveness used in the operational stage. The study in [
22] used instantaneous and steady-state availability as maintenance effectiveness indicators. The best inspection and imperfect maintenance policy that reduces the average long-term cost rate is then obtained using availability models. The study in [
23] considered an analytical model for the steady state and instantaneous availability of the system. The authors determined the best method to change the inspection interval to maximize the steady-state availability of the system. A study [
24] considered steady-state availability as a critical metric for telecommunication services in which network functions are handled using software. To determine the lower confidence limit of the availability of complex control systems, the study [
25] presented a new availability assessment approach based on the goal-oriented method. The study [
26] looked at a continuously tested digital electronic system that could have one of three failures: revealed, unrevealed, or intermittent. A new maintenance model for determining inherent availability was proposed. A mathematical model to describe mission availability for a system with bounded cumulative downtime was proposed in [
27]. The suggested approach simultaneously considers cumulative uptime and downtime as restrictions. The study [
28] considered a mathematical model to compute the operational reliability of an avionic line-replaceable unit (LRU), the probability of LRU recovery, and the maintenance cost. In [
29], the authors considered corrective maintenance policies for scheduled imperfect inspections. The maintenance effectiveness indicators are the average availability and long-run average cost per unit time. A study in [
30] suggested a new CBM model that uses an energy efficiency indicator. The suggested approach encourages CBM optimization to consider both useful output performance and maintenance costs. The study in [
31] focused on the analytical modeling of a condition-based inspection and replacement policy for a stochastically and continuously degrading single-unit system. The developed mathematical model allows for the evaluation of the effectiveness of the maintenance policy and minimizes the expected long-term maintenance cost per unit time. A CBM policy was developed in the study [
32] for a two-component system with stochastic and economic dependencies. The long-term expected maintenance cost per unit time is an indicator of maintenance effectiveness. In [
33], a mathematical model of preventive maintenance with imperfect continuous condition monitoring of wind turbine components was presented. The derived mathematical equations allow the calculation of both the average lifetime maintenance cost and expected maintenance cost per unit time. A study [
34] proposed a comprehensive approach for the reliability modeling and maintenance planning of parallel repairable systems that suffer from hidden failures. This study aimed to reduce the expected maintenance cost per unit time by jointly determining the optimal inspection interval and maintenance thresholds. A study [
35] established maintenance policies for a system under periodic inspections. Maintenance actions were performed for any failures that were found. Repair is imperfect. The expected variable cost rate is an indicator of maintenance effectiveness. A study [
36] examined the problem of determining the optimal aircraft equipment maintenance intervals while minimizing maintenance costs and assuming an Erlang distribution of time between failures. The study in [
37] focused on a technique for calculating the optimal threshold during the implementation of the condition-based maintenance of radio equipment. The minimum average operational costs served as the optimization criterion. The study in [
38] examined the cost-effectiveness of applying CBM in manufacturing companies. The findings indicate that two potentially important benefits of CBM include lowering the probability of maximum damage to manufacturing machinery and of production losses, particularly in high production volumes. The study in [
39] considered a sensory-updated predictive maintenance policy that forecasts and updates the residual life distribution of a simple manufacturing cell using degradation models and sensor information. The overall maintenance costs are then computed. By considering imperfect remaining useful life (RUL) prognostics based on condition monitoring data, [
40] proposed a dynamic, predictive maintenance scheduling framework for a fleet of aircraft. This framework reduced the maintenance costs associated with engine failures to only 7.4% of the overall maintenance costs. A study [
41] examined a production system that was regularly inspected. Manufacturing, inventory, lost sales, repair, inspection, and maintenance costs are all included in the expected cost function. The study [
42] considered a strategic queuing model to examine how a maintenance service provider should allocate capacity and set prices in the face of imperfect IoT-based diagnostics, which continuously monitor various pieces of equipment using sensors. The study in [
43] investigated two maintenance strategies for wind-turbine gearboxes using continuous temperature monitoring. The maintenance effectiveness indicator is the total expected cost per unit time. According to a numerical analysis of the presented models, the optimal imperfect preventive maintenance strategy was 46% more effective than the optimal renewal strategy. A study [
44] proposed a technique for managing the maintenance of wind turbines using artificial intelligence approaches to reduce the overall cost. According to the study, the electrical system, gearbox, generator, and blades account for more than 80% of the risk factor and related downtime; as a result, they should be monitored and inspected more frequently than the others. The study in [
45] created a big data analytics platform that lowers maintenance costs by optimizing the maintenance schedule through CBM optimization and increasing the accuracy of quantifying the RUL prediction uncertainty. The study [
46] proposed a wind farm predictive maintenance approach that considered the economic dependency among subassemblies and component-level major and minor repairs. A simulation method was developed to assess maintenance costs. The study in [
47] proposed a reinforcement learning method to explore optimal predictive maintenance policies that optimize production and maintenance costs. The study in [
48] determined an optimal preventive maintenance schedule that was predicted based on the industry’s maximum availability of critical part manufacturing systems. A study [
49] developed a highly accurate RUL prediction for machinery using sensor-monitoring data. The proposed approach provides a more accurate RUL prediction than existing data-driven prediction methods. The study in [
50] built a distributed system with artificial intelligence assistance for applications in manufacturing plant-wide predictive maintenance based on sensors. The study in [
51] considered a sensor deployment problem to minimize maintenance costs. The authors developed a maintenance cost model for IoT networks that considers thermal degradation and battery depletion. A study [
52] proposed a maintenance model for protection systems with imperfect inspections. The conditional probabilities of correct and incorrect inspections are assumed to be constant. The maintenance effectiveness indicator was the expected cost per regeneration cycle. Studies [
53,
54] have shown that the probabilities of imperfect inspections at condition-based and predictive maintenance are functions of the degradation model parameters and strongly depend on time. In the study [
55], the case of imperfect testing of the crack depth in a fighter wing was considered. The study showed that flight hours have a significant impact on the probabilities of a false positive and a false negative. The study in [
56] analyzed six main components of intelligent predictive maintenance: (1) sensor and data acquisition, (2) signal pre-processing and feature extraction, (3) maintenance decision-making, (4) key performance indicators, (5) maintenance scheduling optimization, and (6) feedback control and compensation. The study [
57] presented a new data-driven predictive maintenance approach that covers the full process, from using RUL prediction to making maintenance decisions. The maintenance cost per unit time was calculated. The study in [
58] proposed probabilistic models for assessing mission success probability, system survival probability, expected number of inspections during the mission, and total estimated losses for a system subject to imperfect inspections. In the developed models, the probabilities of false positives and false negatives were time-independent.
The studies [
59,
60,
61,
62,
63,
64] considered the application of preventive, condition-based, and predictive maintenance techniques in the context of Industry 4.0. Among the novel maintenance approaches and techniques, special attention is given to the data-driven approach, conventional machine learning approaches, and machine prognostics for RUL estimation. Studies [
65,
66,
67,
68,
69,
70] introduced prescriptive maintenance techniques that may not only predict the condition of a machine in the future but also suggest proactively timed decisions for certain maintenance activities such as inspection, repair, and replacement. Studies [
71,
72,
73,
74,
75,
76] considered the use of knowledge-based approaches in predictive, condition-based, prescriptive, and preventive maintenance.
3. Classification of Systems
As noted in
Section 1, the design of the system, its purpose, and its operating conditions affect the choice of MRS effectiveness indicators. To classify the systems, we characterized each of the listed generalized properties by using a minimum set of the most significant signs.
From a maintenance perspective, the design of a system is primarily characterized by its maintainability. According to [
1], “maintainability is the ability of an item under given conditions of use, to be retained in, or restored to, a state in which it can perform a required function.” Items that can be repaired are compared with those that cannot be repaired because of planned obsolescence. The division of systems into repairable and non-repairable systems is associated with the possibility of restoring an operable state through repair, which is ensured during the development and manufacture of the system.
We should also note that sometimes the term “repairability” is used to characterize the ability of a system to be repaired. Maintainability and repairability are similar. The only distinction is that while repairability is limited to active repair time, maintainability is based on total downtime (which includes administrative time, active repair time, and logistic time). In addition, the term “maintainability” is standard [
1], while the term “repairability” is not.
The concept of “purpose of the system” primarily relates to the functions it performs. However, this concept includes not only the list of functions performed but also the relative importance of these functions. These two circumstances can be characterized by qualitative and quantitative assessments of the events that occur owing to system failure, that is, the consequences of failure and the possibility of their quantitative assessment.
The evaluation of maintenance effectiveness is carried out using indicators that depend on the operating time of the system in a given mode of temporary use, which may be continuous or intermittent. That is why, as a sign of the classification that characterizes the mode of operation, we chose the temporary mode of using the system for its intended purpose.
Therefore, we classified systems according to the following criteria: maintainability, consequences of a failure, possibility of an economic assessment of the consequences of a failure, and temporary mode of system use.
To divide a set of system types into subsets according to the listed signs, it is necessary to select a classification system. Currently, the most common classification systems for information objects are hierarchical and faceted [
77].
A hierarchical system is used when a subordination relationship is established between the classification signs used at levels i and (i + 1).
With a faceted classification system, we subdivide the initial set of objects into subsets by combining the values of the independent signs (facets).
As shown below, for the four signs under consideration, a subordination relation exists between all features. Therefore, it is necessary to apply a hierarchical classification system with ordinal registration of individual sign values.
We determined the values of the classification signs, established a connection between them, and divided the initial set of systems into disjointed subsets as follows.
First classification sign. A sign of system maintainability i (i = 1: the system is repairable; i = 2: the system is non-repairable).
When using “repairable”, we refer to the systems for which the repair is provided in the technical documentation.
Non-repairable systems will include systems whose performance cannot be restored owing to design features or whose restoration is not economically feasible. However, for non-repairable systems, maintenance may be provided, including condition monitoring and removal if necessary.
Therefore, in the first stage of the classification, the initial set of systems
I0 is divided according to sign
i into the class of repairable systems
I1 and the class of non-repairable systems
I2, that is,
moreover
, where
,
, and
are symbols denoting the union and intersection of subsets, and an empty subset, respectively.
Figure 1a illustrates the classification of systems by sign
i.
Second classification sign. A sign of failure’s consequences j (j = 1: the fact that the system does not perform a safety-critical function assigned to it in a given volume, j = 2: the fact that the system does not perform a safety-critical function assigned to it in a given volume at an arbitrary moment of beginning the “operation mode,” j = 3: downtime of the system in a state of unrevealed failure when used as intended and downtime associated with scheduled maintenance and unscheduled repair; j = 4: downtime of the system in a state of unrevealed failure when used as intended, regardless of the downtime associated with scheduled maintenance; j = 5: downtime of the system in a state of unrevealed failure when used as intended and downtime associated with unscheduled system replacement).
When evaluating the consequences of a failure, the sign value of
j = 1 corresponds to the case in which the consequences affect safety. According to the European Standard ECSS-S-ST-00-01C––Glossary of Terms [
78], “a safety-critical function is the function that, if lost or degraded, or as a result of incorrect or inadvertent operation, can result in catastrophic or critical consequences.” When
j = 2, the system can be either in standby mode or operation mode, and when it is in operation mode, it does not perform a safety-critical function. In the standby mode, the system is accessible for inspection and repair. The system must operate without failure for a predetermined time in the operating mode, where failure affects safety. Maintenance and inspection were not performed in the operating mode.
The sign value j = 3 corresponds to the case when material damage due to a failure is caused by both the system’s downtime in a state of unrevealed failure during operation and by the downtime due to the restoration.
The sign value j = 4 corresponds to the case in which the material damage owing to a failure is much greater than the losses caused by the system downtime. In this case, as a rule, the system downtime owing to restoration work does not lead to substantial material loss. If it is impossible to estimate material losses due to system failure, the value j = 4 corresponds to the case in which either scheduled maintenance or restoration of the system, or both, can be carried out during time intervals when the system is not used.
The sign value j = 5 corresponds to the operation of a non-repairable system, the consequences of which do not affect safety.
The sign values j = 2, 3, and 4 correspond to the cases of operation of repairable systems, and when j = 1, the system can be both repairable and non-repairable.
Therefore, in the second stage of the classification, we divided class
I1 into four subclasses:
I11,
I12,
I13, and
I14, and class
I2 into two subclasses:
I21 and
I25, that is,
Figure 1b,c illustrates the classification of systems by sign
j.
Third classification sign. A sign of the economic evaluation of failure consequences k (k = 1: economic evaluation is impossible; k = 2: economic evaluation is possible).
When the consequences of a failure can be evaluated economically, it is necessary to use technical-economic indicators of the MRS in the form of the average operating profit per unit of time (AOPUT) or average maintenance cost per unit of time (AMCUT). The possibility of an economic evaluation of the consequences of failure exists only for systems that do not affect safety, that is, for subclasses
I13,
I14, and
I25. If, for some reason, material damage from a system failure that belongs to one of these subclasses cannot be quantified, then complex reliability indicators, namely inherent availability and achieved availability, should be used as MRS effectiveness indicators. If a system failure leads to the non-fulfillment of the assigned safety-critical functions, then the consequences of failure cannot be evaluated. Indeed, the failure of safety-critical systems in the nuclear, chemical, aviation, military, etc. industries may result in many casualties. Moreover, it is not known how many victims there will be. For example, two accidents at the nuclear power plants of Chornobyl and Fukushima killed tens of thousands of people [
79,
80]. The cost of the loss of human lives cannot be estimated in such cases. Therefore, when planning and optimizing the maintenance of safety-critical systems, criteria are used that do not include the cost of the failure consequences. In such cases, the dominant factor in evaluating the consequences of a failure is the inability to perform a task. As higher demands are placed on the reliability of safety-critical systems during their intended use, it is logical that meeting these demands comes at a price. Therefore, when evaluating the MRS effectiveness for subclasses
I11 and
I12, it is recommended to utilize the “reliability-costs” criterion, which demands the presence of two effectiveness indicators. The first identifies the operational reliability of the system, whereas the second characterizes maintenance costs.
As indicators of reliability, we used the a posteriori probability of failure-free operation (APFFO) for non-repairable systems and the operational probability of failure-free operation (OPFFO) for repairable systems.
The APFFO and OPFFO can be determined both for periodic and sequential inspections. The APFFO is the conditional probability of failure-free operation of the system on the interval (tk, t), provided that the system can be used for its intended purpose based on the inspection results at moments .
From the definition of APFFO, it follows that for non-repairable systems, the moments of inspections are assigned until the system is rejected. Once the system is replaced, the inspection schedule is restarted, beginning with the first inspection moment.
Under the OPFFO, we refer to the probability of failure-free operation of the system over the operating time interval (tk, t), considering that at moments , maintenance was carried out, including inspection and restoration of systems judged as inoperable.
From the definition of OPFFO, it follows that for repairable systems, inspection times are assigned considering the repair of rejected systems.
The definitions of the APFFO and OPFFO for systems with continuous condition monitoring are as follows:
The APFFO is the conditional probability of the system’s failure-free operation during the interval (0, τp), assuming that the system has not been rejected by the condition-monitoring results, where τp is the periodicity of maintenance.
The OPFFO is the probability of failure-free operation of the system in the interval (0, τp), considering that, at any moment, the system can be rejected as inoperable and then restored.
Maintenance costs are characterized by unit costs (UC), that is, the average costs per unit time of system use. Moreover, for non-repairable systems, UC includes inspection and replacement costs, whereas, for repairable systems, UC includes inspection and repair costs.
For subclass I12, that is, for systems operating in two modes, the dominant factor in evaluating the consequences of a failure is the fact that a safety-critical function is not performed at an arbitrary start of the “operation mode.” As a rule, for this system subclass, an economic assessment of the consequences of the failure is not possible. Therefore, mission availability should be used as an indicator of maintenance effectiveness. We define mission availability as the probability that the system will be in an operable state at any time (except for scheduled maintenance intervals, during which the system is not used for its intended purpose), and from that moment, will operate failure-free within a specified interval of time.
Therefore, in the third stage of the classification, depending on the value of sign
k, the subclasses of the systems are divided into the following nine groups:
At this stage of classification, maintenance effectiveness indicators can be uniquely defined for the six groups of systems. For group I111, the effectiveness indicators were OPFFO and UC; for group I121, mission availability; for groups I132, I142, and I252, AOPUT or AMCUT; and for group I211, APFFO and UC. The performance indicators of groups I131, I141, and I251 can either be inherent or achieved availability. The final choice of indicators is made after these groups are divided into subgroups depending on the temporary mode of the system’s use.
Figure 1b,c illustrates the classification of systems by sign
k.
Fourth classification sign. A sign of the temporary mode of system use l (l = 1: continuous mode; l = 2: intermittent mode).
In the continuous system use mode, equipment downtime is caused by failures and the need for scheduled maintenance. When a system is used intermittently, planned maintenance can be performed if it is not in use.
Each group of systems
I111,
I211,
I251, and
I252 on the fourth level of classification was divided into two subgroups:
because the values of
l do not contradict the previous signs in these groups.
For group I121, sign l = 2 because the system operates in two modes and switches to operating mode only after an alarm signal.
For group I131, sign l = 1 because the value of sign l = 2 contradicts the value of sign j = 3. Indeed, with the intermittent use of the system, scheduled maintenance can be carried out at intervals in which the use of the system is not planned. Therefore, there will be no downtime owing to the scheduled maintenance. For the same reason, for group I132, the sign l = 1.
For groups I141 and I142, the sign l = 2 because l = 1 is incompatible with j = 4. It takes time to repair a system when a failure is detected in a continuous mode of use.
Therefore, for groups
I131,
I132,
I141, and
I142 at the fourth stage of classification, we obtain
Figure 1b,c illustrates the classification of systems by sign
l.
During continuous use, it is necessary to consider the downtime of the system owing to scheduled maintenance. Therefore, the achieved availability is an effective indicator for subgroups I1311 and I2511. As mentioned previously, scheduled maintenance does not result in downtime in intermittent usage. Therefore, the effectiveness indicator for subgroups I1412 and I2512 is inherent availability.
Definitions of the achieved and inherent availability for repairable systems are available in many references, such as examples [
81,
82]. Since subgroups
I2511 and
I2512 include non-repairable systems, we define the achieved and inherent availability for them.
The achieved availability of a non-repairable system is the ratio of the mathematical expectation of the time the system is in an operable state for a certain period of operation to the sum of the mathematical expectations of the time the system is in an operable state and downtime owing to unrevealed failures, scheduled and unscheduled maintenance (checks and removals).
The inherent availability of a non-repairable system is formulated in the same manner as that of a repairable system. However, when calculating the inherent availability, because there is no repair, only the characteristics of the operable and unrevealed failure states and unscheduled removals are considered.
Figure 1 shows the hierarchical classification of the systems, which includes two classes, six subclasses, nine groups, and thirteen subgroups of systems.
The corresponding maintenance effectiveness indicators are listed in
Table 1.
Table 1 uses the following notations:
is the OPFFO,
is the UC for a repairable system,
is the mission availability,
is the achieved availability for a repairable system,
and
are the AOPUT and AMCUT for a repairable system, respectively,
is the inherent availability for a repairable system,
is the APFFO and
is the UC for a non-repairable system,
is the achieved availability for a non-repairable system,
is the inherent availability for a non-repairable system, and
and
are the AOPUT and AMCUT for a non-repairable system, respectively.
The index “1” in the designation of indicators , , , , , and indicates that the mode of the system use is continuous, and therefore, when calculating indicators, it is necessary to consider losses due to downtime during scheduled maintenance and unscheduled repairs (for repairable systems). Index “2” in the designation of these indicators specifies intermittent use of the system. In this case, losses owing to downtime during scheduled maintenance were not considered.
The maintenance effectiveness indicators listed in
Table 1 characterize the most significant signs of the design, purpose, and operating conditions of the system. Therefore, they should be considered as key indicators. When solving maintenance optimization problems, additional indicators that specify the specifics of the situation under consideration can be included in the number of indicators characterizing maintenance effectiveness.
Example 1. We determined the maintenance effectiveness indicators for an aircraft instrument landing system (ILS), which is the most common radio navigation instrument approach system in aviation.
Redundant electronic systems make up modern digital avionics [
83]. The onboard ILS is an avionics system. Any avionics system usually comprises two or three identical line-replaceable units (LRUs) [
84]. Avionics LRUs have a reputation for their excellent testability and maintainability standards. Shop-replaceable units (SRUs) are LRU interchangeable parts. Typically, an SRU is a printed circuit board assembly that may be changed or repaired in a workshop. ILS is therefore a repairable system (
i = 1). System failures during the descent and landing phases can lead to dangerous situations. Therefore, the ILS system was not included in the master minimum equipment list (MMEL) [
85], which listed onboard systems in an aircraft that can be classified as having little to no effect on the safety of the operation. Hence, the failure consequence is the nonfulfillment of the critical functions (
j = 1). Economic assessment of the consequences of failure is impossible (
k = 1). The ILS was operated in intermittent mode (
l = 2) because avionics operation is associated with the alternation of flights and stops of aircraft.
Thus, the ILS belongs to the subgroup
I1112. As shown in
Table 1, the maintenance effectiveness indicators were
and
.
Example 2. We determined the maintenance effectiveness indicators for an aircraft’s satellite communication (SATCOM) system.
The SATCOM system was repairable (
i = 1). The SATCOM system was included in the MMEL [
85], which means that the system has little to no effect on flight safety. Hence, the failure consequence is the downtime of the system in a state of unrevealed failure when used as intended, regardless of the downtime associated with scheduled maintenance (
j = 4), because scheduled maintenance can be carried out during time intervals when the system is not in flight. An economic assessment of the consequences of failure is impossible (
k = 1). The SATCOM was operated in intermittent mode (
l = 2).
Thus, the SATCOM system belongs to the subgroup
I1412. As shown in
Table 1, the maintenance effectiveness indicator was
.
4. Classification of Maintenance Models
This section uses hierarchical classification with the ordinal registration of individual sign values. The classification signs include system structure in terms of reliability, type of inspection during maintenance, degree of system restoration for the system rejected at inspection, and external manifestations of system failure. We chose a hierarchical classification because the first three signs have a subordination relationship.
We determined the values of classification signs and established a relationship between them, as shown below.
First classification sign. A sign of the system structure in terms of reliability (a = 1: single-component structure; a = 2: series structure; a = 3: parallel structure; a = 4: structure “h-out-of-q”; a = 5: series-parallel structure; a = 6: parallel-series structure; a = 7: complex structure).
As is well known, the maintenance models depend on the system structure concerning reliability [
86,
87,
88,
89,
90].
When a = 1, the system is considered a single unit; that is, in the case of the rejection of the system based on the inspection results, the entire system is subject to restoration.
When , the system consists of q > 1 units, and only the units rejected during the inspection are subject to restoration. We refer to systems with complex reliability structures as systems whose structural functions extend past the first six values of feature a. For instance, systems with a complex reliability structure may change the reliability structure in different modes of operation.
Therefore, in the first stage of the classification, we divided the set of maintenance models for inspected systems (
M0) based on sign
a into seven classes:
,
where
if
.
Figure 2a illustrates the classification of maintenance models by sign
a.
Second classification sign. A sign of the type of inspection (b = 1: periodic or sequential inspection at corrective maintenance (CM); b = 2: periodic or sequential inspection at CBM; b = 3: periodic or sequential inspection at predictive or prescriptive maintenance; b = 4: continuous condition monitoring at corrective, preventive, condition-based, predictive, or prescriptive maintenance).
A corrective maintenance inspection distinguishes only between the operable and inoperable states of a system. This type of inspection is the simplest, but it has a serious disadvantage, the essence of which lies in the fact that CM inspection allows for detecting defects that have accumulated in the system by the time it is carried out and does not provide any confidence in the system’s operability in the future. From the perspective of the system operator, the content of CM inspections falls short of fully achieving the goal of maintenance, which is to prevent failures from occurring while the system is being used for its intended purpose rather than simply identifying failures and fixing their effects.
Inspection at CBM is more effective because it employs both the functional failure threshold and the replacement threshold, making it possible to reject systems that have already failed, as well as those that may fail in the next operation interval.
An inspection at predictive maintenance includes a prognostication of the system’s future state by comparing the predicted parameter values with the functional failure thresholds. The use of prognosis-based inspection allows the rejection of systems that will fail in the upcoming operation interval. This type of inspection is in line with the potential goal of maintenance to allow only those systems that will not fail until the next maintenance time point to operate.
Prescriptive maintenance goes beyond predictive maintenance because it generates proactive decisions for equipment restoration based on predictive analytics. Prescriptive maintenance can show us how particular actions on an asset or system restoration affect the output, rather than only predicting when the equipment is likely to fail. Industrial businesses utilize prescriptive maintenance and analytics solutions to reduce unplanned downtimes, increase equipment reliability, and maximize profits.
According to ISO 13372 [
91], “condition monitoring is detection and collection of information and data that indicate the state of a machine.” Both intermittent and continuous condition monitoring is possible.
Periodic monitoring in production systems is performed with the use of portable indicators such as hand-held measurement equipment, acoustic emission units, and vibration pens at specific intervals [
38].
In real-time monitoring, a machine is continuously monitored, and at any time an error is found, a warning alarm is set off. Continuous condition monitoring is used to increase system availability or reduce maintenance costs for deteriorating systems. Sensors for condition monitoring may use vibration, ultrasonics, thermography, fiber optics, and other technologies. Maintenance based on continuous condition monitoring is a potential strategy to increase operational reliability and lower the operating costs of various deteriorating systems [
33,
39,
42,
43,
50].
Therefore, depending on the type of inspection, the class of models
M1 is divided into four subclasses
M11,
M12,
M13, and
M14 that is,
One of the four types of inspection can be utilized for every system unit when performing maintenance on multi-unit systems. To simplify the classification, we selected from the classes of models only those subclasses, Ma1, Ma2, Ma3, and Ma4 (), in which the same type of inspection was used for all units of the system.
Figure 2b illustrates the classification of maintenance models by sign
b.
Third classification sign. A sign of the degree of the system’s unit restoration (c = 1: perfect repair: after repair, the unit becomes as-good-as-new; c = 2: minimal repair: after repair, the unit is in the pre-failure condition; c = 3: imperfect repair: after repair, the unit is in the condition between “as-good-as-new” and its pre-failure condition).
Those units rejected due to inspection are shipped for restoration. These units can be restored with varying degrees of correspondence to the initial state. The degree of recovery is characterized by the reliability properties that the unit acquires after restoration.
The degree of recovery can be full, minimal, or partial. The first degree of recovery corresponds to perfect repair. In this case, the repaired units acquire the same reliability properties as the new units with a zero-operating time. Such restoration is a model of a practical situation in which the reliability of a unit is primarily determined by a non-repairable element. In the second case, the repaired unit had the same reliability characteristics as the unit that had been in use for the same amount of time without experiencing failure. This recovery model applies to real-world scenarios, in which a unit includes many components with similar reliability indicator values. When one failed element was replaced, the reliability indicator readings of such a unit were comparable to those of a block with no failures. In the third case, we describe imperfect repair as one that leaves the system in a condition halfway between its pre-failure “as-bad-as-old” state and an “as- good-as-new” state.
Let us assume that we use the failure rate
λ(
t) as a reliability indicator for unit elements. Then, the perfect repair model can be used to describe the quantitative characteristics of the reliability of units arriving for restoration if the unit contains an element with the number
j, for which
where
is the failure rate of the
i element and
g is the number of unit elements.
The minimal repair model can be used with the same failure rate of the elements in a multi-element unit.
When using the imperfect repair model, the failure rate of the unit after repair (
) is greater than that with perfect repair (
) but less than that with minimal repair (
), that is,
An imperfect repair model may be employed for units with several nonrepairable components; when one of these components is replaced by a new one, the unit’s failure rate drops but does not match that of a new one.
Therefore, in the third stage of classification, depending on the recovery model used, the subclasses of models
Ma1,
Ma2,
Ma3, and
Ma4 (
) were divided into the following groups:
Figure 2c illustrates the classification of maintenance models by sign
c.
Fourth classification sign. A sign of external manifestations of failure (d = 1: only hidden failures occur in the system; d = 2: both hidden and evident failures occur in the system).
The external manifestations of system failure directly affect the feasibility and efficiency of inspection. If there are only hidden failures, the inspection efficiency will be the largest because only by its results will it be possible to determine the technical condition of the system and thereby decide whether the system should be allowed to be used in the upcoming operating time interval. If only evident failures occur in the system, it is not advisable to inspect it, because it does not provide any additional information about the system’s condition. In this case, failure was evident to the technical staff. Therefore, maintenance includes detecting the failure location and repairing failed parts of the system. We did not consider this case separately.
When both hidden and evident failures occur in the system, the inspection efficiency depends on the ratio of the reliability characteristics of the system to these failures.
Sign
d does not depend on signs
a,
b, and
c. Therefore, each of the groups obtained at the third stage of the classification was subdivided at the fourth stage into two subgroups as follows:
Figure 2d illustrates the classification of maintenance models by sign
d.
Figure 2a–d shows the hierarchical classification of the maintenance models for the inspected systems, including seven classes, twenty-eight subclasses, eighty-four groups, and one hundred and sixty-eight subgroups.
The operation and maintenance processes of any system can be represented as a sequence of changes in its various states. Therefore, we define system behavior using a stochastic process
L(
t),
where t > 0, with a finite space of states:
where
m is the number of the system states.
Process L(t) changes only in jumps, where each jump is caused by a system transition to one of the possible states. The operable state is denoted by S1. State S2 corresponds to an inoperable state, that is, the presence of a failure in the system. States are related to different types of inspections and repairs (or removal).
For the convenience of the subsequent presentation, from one hundred and sixty-eight subgroups of maintenance models, we formed two classes of models that differed in the value of sign c.
At
c = 1, the unit rejected during the inspection at time
tk (
k = 1, 2, …) is replaced by a new operable unit with zero operating time. Since the unit is renewed with such a replacement, process
L(
t),
t > 0, is regenerative:
where
is the
ν random cycle of system regeneration.
Since the initial process
L1(
t) and the
i process
are stochastically equivalent, the regenerative process
L(
t) is synchronous [
92]. Therefore, the mathematical expectation of the regeneration cycle
is the same for each
ν ; we designate it as
E(
C0).
For the regenerative process
L(
t) of changing the system states, the fraction of time the system is in
Si is equal to the ratio of the mathematical expectation of the time the system is in state
Si during a regeneration cycle to the mathematical expectation of the regeneration cycle [
93]. We used this regenerative process property when developing models of maintenance effectiveness indicators at
c = 1.
When
c = 2 and 3, the system operation and maintenance are considered during the operating time
Tul > 0, where
Tul is the useful life of the system or the assigned useful life. The technical condition of the unit was not renewed when it was replaced at time
tk (
k = 1, 2…,
tk <
Tul). It is assumed, however, that once the operating time exceeds
Tul, the unit is replaced with a new unit with zero operating time. Therefore, it is also possible to construct a synchronous regenerative process
L(
t),
t > 0, with a deterministic regeneration cycle
Tul by setting the following:
Since at c = 2 and c = 3, the process L(t) is also synchronous, we can consider the models of maintenance effectiveness indicators during the regeneration cycle (0, Tul).
5. Example of Developing Maintenance Model
We determined the maintenance effectiveness indicators in
Table 1 for the subgroup of maintenance models
M1211, assuming that the system has a single-component structure (
a = 1), the type of inspection is similar to CBM (
b = 2), the perfect repair is used (
c = 1), and only hidden failures occur in the system (
d = 1).
We began by building a maintenance model by defining the stochastic process
L(
t), as follows [
94]:
We denote through
the amount of time the system spent in state
Si. Time
is a random variable with the expected mean time
. Since process
L(
t) is regenerative, the average duration of the regeneration cycle is
5.1. Correct and Incorrect Decisions at CBM Inspections
In this subsection, we describe events such as true positive, false positive, true negative, and false negative that may arise from imperfect inspection during CBM.
To determine the expected mean times
, we need to know the conditional probabilities of the correct and incorrect decisions made when conducting a CBM inspection. We suppose that a gradual failure of the system occurs at time
η, where
.
Figure 3 shows the location of inspection times
and the gradual failure at time
η in a finite interval (0,
T).
To determine the probabilities of the correct and incorrect decisions, we assume that the system state parameter
X(
t), which is a nonstationary stochastic process with continuous time, completely identifies the system condition. If the value of the system state parameter exceeds the functional failure threshold
FT, then the system enters a failed state. If there is a measurement error (or noise)
Y(
tn),
, then the measurement result
Z(
tn) is related to the true value
X(
tn) as follows:
where Φ(·) is a function of random variables
X and
Y.
The following decision rule is introduced when inspecting the system at time tn. If z(tn) < RTn, the system is judged to be operable over the time interval (tn, tn+1), and if z(tn) ≥ RTn, the system is judged to be inoperable over the time interval (tn, tn+1) and is excluded from the operation, where RTn (RTn < FT) is the replacement threshold at time tn. Since RTn < FT, this decision rule aims to reject systems that can fail over the time interval between the inspections.
Using the introduced decision rule, two repair or replacement strategies are possible. If RTn ≤ Z(tn) < FT, then the preventive repair or replacement of the system is performed at time tn. If Z(tn) ≥ FT, the corrective repair or replacement of the system is performed at time tn.
The decision rule above compares the parameter values that determine the system state with those of replacement thresholds. This rule does not allow one to associate the time to failure with the probabilities of the correct and incorrect decisions based on the inspection results because the measurement result is located on the space (vertical) axis but not on the time (horizontal) axis. Therefore, with this decision rule, it is impossible to properly introduce the conditional probabilities of correct and incorrect decisions into the mathematical model of maintenance.
We introduced three random variables associated with the functional failure threshold
FT and replacement threshold
RTn to consider the decision rule at CBM inspection on the time axis. We denote the random time to system failure by H with the probability density function (PDF) ω(
η). Let
be a random time of the system operation until it exceeds the replacement threshold
RTn by the parameter
X(
t), and let
denote a random assessment of
based on the results of the inspection at time
tn. The lowest roots of the following stochastic equations are used to define the random variables H,
, and
:
The definition of the random variable
involves the following:
Based on (22), the decision rule presented previously can now be expressed as follows: at time point tn, the system is judged to be operable over the time interval (tn, tn+1) if ; alternatively (if ), the system is judged to be inoperable over the time interval (tn, tn+1), where is the realization of the random variable for the system.
According to (18) and (21),
is implied to be a function of the random variables
X(
tn),
Y(
tn), and the replacement threshold
RTn. When
Y(
tn) is present in (21), the measurement error of the time to failure at inspection time
tn appears to be random in nature, and is defined as follows:
The random variables H (0 < H < ∞) and Δ
n (−∞ < Δ
n < ∞) exhibit an additive relationship. Consequently, the random variable
is specified to have a continuous range of values between −∞ and +∞. When inspecting the system at time
tn, a mismatch between the solutions of (19) and (21) causes one of the following mutually exclusive events to occur:
where
is the joint occurrence of the following events: the system is operable over the time interval
and is judged as operable at inspection times
;
is the joint occurrence of the following events: the system is operable over the time interval
judged as operable at inspection times
, and is judged as inoperable over the time interval
at inspection time
;
is the joint occurrence of the following events: the system is operable at inspection time
, fails within interval
, and is judged as operable at inspection times
;
is the joint occurrence of the following events: the system is operable at inspection time
, fails during interval
, judged as operable at inspection times
, and is judged as inoperable at inspection time
;
is the joint occurrence of the following events: the system has failed until inspection time
and has been judged as operable at inspection times
; and
is the joint occurrence of the following events: the system has failed until inspection time
, judged as operable at inspection times
and inoperable at time
.
As shown in (24)–(29), the system can a priori be in one of three states when inspecting the system operability over the time interval at time tn: operable with probability P(tn+1), operable at time tn but inoperable over the time interval with probability P(tn) − P(tn+1), and inoperable with probability 1 − P(tn), where P(t) is the system reliability function.
It is evident that the time axis is used to formulate events (24)–(29). We will utilize (24)–(29) in the future to assess operational reliability and maintenance effectiveness indicators because reliability indicators are usually developed in relation to events occurring on the time axis.
Equations (26) and (27) show that with regard to the system operability over the time interval , the event corresponds to an incorrect decision, whereas the event corresponds to the correct decision. When the event occurs, an inoperable system is erroneously allowed to be used over time interval . Note that in the case of CM inspection at time , events and match correct and incorrect decisions, respectively. This is the fundamental difference between the CBM and CM inspections. Therefore, CM inspection does not allow the rejection of potentially unreliable systems.
Furthermore, event
is called a “false negative” (false alarm) at time
tn, and events
and
are called “false positive 1” and “false positive 2”, respectively, at time
tn. The events
and
represent the correct decisions made during a CBM inspection at time
tn; we refer to these events as “true positive”, “true negative 1”, and “true negative 2,”, respectively.
Table 2 lists the actual system conditions and the decisions made when conducting CBM inspection at time
tn.
5.2. The Probabilities of Correct and Incorrect Decisions at CBM Inspections
In this subsection, we develop a general mathematical model to calculate the probabilities of correct and incorrect decisions at multiple CBM inspections and an arbitrary time to system failure. We also derive, on the time axis, general equations for calculating the conditional probabilities of true positive, false positive, true negative 1, false negative 1, true negative 2, and false negative 2 at multiple CBM inspections, which will be further incorporated into CBM mathematical maintenance models.
Calculating the probabilities of events (24)–(29) comes down to determining the probability that the random point will fall inside the (n + 1)-dimensional region produced by the limits of variation of each random variable and be equal to the (n + 1)-fold integral over this region.
We denote the joint PDF of the random variables as . Event corresponds to the (n + 1)-dimensional region with the following limits: and .
By integrating PDF
within the specified region, we determine the probability of event
.
Event
corresponds to the
-dimensional region with the limits:
,
, and
. Integrating PDF
within the limits, we obtain the probability of event
.
Event
corresponds to the
-dimensional region with the following limits:
and
. By integrating PDF
within the indicated limits, we obtain the probability of event
.
Event
corresponds to the
-dimensional region with the limits:
,
, and
. Integrating PDF
within the limits, we obtain the probability of event
.
Event
corresponds to the
-dimensional region with the following limits:
and
By integrating PDF
within the specified region, we determine the probability of event
.
Event
corresponds to the
-dimensional region with the limits:
,
, and
. Integrating PDF
within the limits, we obtain the probability of event
.
As we can observe from (30)–(35), the joint PDF
of random variables
must be known in order to determine the probabilities of correct and incorrect decisions made when conducting CBM inspections. We denote the conditional PDF of random variables
as
provided that
Following the multiplication theorem of PDFs, we represent PDF
as follows [
94]:
where
is the conditional PDF of random variables
, provided that
.
In the case of , the random variables are defined as
The following equality is true because of the additive relationship between random variables
and
:
We obtain the following expression for the multidimensional PDF by substituting (37) into (36).
It is feasible to simplify (30)–(35) using (38). Inputting (38) into (30) gives the following.
Considering that
in (39), we arrive at:
By changing the variables in (31) to (35), we obtain:
As can be seen from (40) to (45), it is necessary to know the PDF ω(η) and to calculate the probabilities of correct and incorrect decisions made when conducting CBM inspections. Another thing to keep in mind is that formulas (40)–(45) are generalized, meaning that they may be applied to any stochastic deterioration process X(t).
For brevity, we introduce the following notation for the probabilities
:
where TP, FN, FP, and TN represent the true positive, false negative, false positive, and true negative, respectively.
We use the time axis depicted in
Figure 3 to derive the conditional probabilities of correct and incorrect decisions.
When performing a CBM inspection at time
), the conditional probability of a “true positive” is defined as follows.
When running a CBM inspection at time
, the conditional probability of a “false negative” is defined as follows.
When doing a CBM inspection at time
, the conditional probability of a “false positive 1” event is defined as follows.
When conducting a CBM inspection at time
, the conditional probability of a “true negative 1” occurrence is described as follows.
When performing a CBM inspection at time
, the conditional probability of a “false positive 2” event can be defined as follows.
When running a CBM inspection at time
, the conditional probability of a “true negative 2” event can be defined as follows.
The calculation of each of the conditional probabilities (47)–(52) is equivalent to taking the n-fold integral over this region from the PDF which is equivalent to calculating the probability of hitting the random point in the n-dimensional region formed by the limits of variation of each random variable.
The conditional probability of the “true positive” at time
(
is given by
The conditional probability of the “false negative” at time
(
is determined as:
The conditional probability of the “false positive 1” at time
(
is set as:
The conditional probability of the “true negative 1” at time
(
is given by:
The conditional probability of the “false positive 2” at time
is determined as:
The conditional probability of the “true negative 2” at time
is given by:
5.3. Mean Times for the System to Stay in Different States
In this subsection, we derive equations for the expected mean times of the system staying in the states . Firstly, we determine the conditional mathematical expectations of the amount of time the system spends in each state. Then, by using the formula for the total mathematical expectation of a continuous random variable, we will find the mathematical expectation of how long the system will remain in state .
Given that H =
η, we can use
Figure 3 to calculate the conditional mathematical expectation of the amount of time the system spends in the state
S1.
According to the time-location of CBM inspections in
Figure 3, the conditional mathematical expectation of the amount of time the system will spend in the state
S2 provided that H =
η is as follows:
The conditional mathematical expectation of the time spent by the system in the state
S3 under the condition that H =
η is equal to:
where
tins is the average duration of a CBM inspection.
Based on
Figure 3, we estimate the conditional mathematical expectation of the amount of time the system will spend in state
S4 provided that H =
η.
where
tPR is the average length of time of a preventive repair.
According to the study of the time axis in
Figure 3, the conditional mathematical expectation of the length of time the system will spend in state
S5 provided that H =
η is as follows:
where
tCR is the average length of time of a corrective repair.
Using a modified version of the formula for the total mathematical expectation of a continuous random variable, we can calculate the mathematical expectation of how long the system will remain in state
.
When we apply (64) to (59)–(63), we obtain the following:
The mathematical expectation of time spent by the system in state
S1.
The mathematical expectation of time spent by the system in state
S2.
The mathematical expectation of time spent by the system in state
S3.
The mathematical expectation of time spent by the system in state
S4.
The mathematical expectation of time spent by the system in state
S5.
5.4. Maintenance Effectiveness Indicators
Let us determine the maintenance effectiveness indicators presented in
Table 1. When deriving formulas, we employ the property of the regenerative stochastic process
L(
t) of changing the system states according to which fraction of time the system is in state
Si is equal to the ratio of the time the system spends in state
Si during a regeneration cycle to the mathematical expectation of the regeneration cycle.
As shown in
Table 1, for the subgroup
I1111, the maintenance effectiveness indicators are
P0 and
. For the finite operating time interval (0,
T), we determined the OPFFO as follows:
where
is the probability of system repair at time
tj and
and
are the probabilities of preventive and corrective repair at time
tj, respectively.
Since a CBM inspection typically takes far less time than the interval between inspections, we neglect the CBM inspection duration in (70).
We begin with the proof of (71). The following events are introduced:
and
are the preventive and corrective system repair events, respectively, and
is the system repair event at time
tj after the
j inspection. If one of the events
or
occurs, system repair will occur at time
tj. Consequently,
Events and are mutually exclusive because they are based on incompatible events (25), (27), and (29). Therefore, by applying the addition theorem of probability to (74), we obtain (71).
We write the following probabilistic definitions of indicators
,
, and
to prove (70), (72), and (73), respectively:
Given that inspections are scheduled across a finite time horizon (0,
T), and considering that the system’s most recent repair occurs at time
tj, the random variable H exists in the range (0,
T −
tj) with the conditional PDF:
Let us prove (70). Assume that the most recent restoration of the system occurs at
tj and system failure occurs in the time interval from
η to
η +
dη. Consequently, the conditional probability of such an event occurring, provided that during previous inspections the system was correctly judged to be operable, is equal to:
The formulated event’s unconditional probability is as follows:
We calculate the probability of the event:
by integrating (80) over the range of the random variable H.
Equation (82) takes the following form when considering (78).
We determine the joint probability of system recovery at time
tj and event (81) using the probability multiplication theorem.
Since the system can be repaired at any of the moments , the system is as-good-as-new after repair, the events are independent, then the sum of the probabilities (84) with the change in j from 0 to k yields (70), where .
The proof of formulas (72) and (73) are similar.
The UC for subgroup
I1111 is determined as follows:
where
Cins is the average cost of CBM inspection per unit time,
Cdt is the average cost of downtime per unit time,
CPR is the average cost of preventive repair per unit time, and
CCR is the average cost of corrective repair per unit time.
As shown in
Table 1,
P0 and
were maintenance effectiveness indicators for the subgroup
I1112. OPFFO was determined in the same manner as that for subgroup
I1111. As a result of excluding
from the regeneration cycle and
Cdt = 0 for the cost component associated with scheduled CBM inspections, owing to the intermittent nature of the system’s functioning, (85) reduces to:
Since we use the probabilities of
and
when determining the OPFFO, indicators (85) and (86) can also be calculated using the following formulas:
where
,
,
, and
are the average cost of CBM inspection, the average cost of downtime, the average cost of preventive repair, and the average cost of corrective repair, respectively.
The mission availability Ama is the maintenance effectiveness indicator for subgroup I1212.
Let represent the instantaneous mission availability, that is, the probability that the system will be operable at time kτ + and operate without failure for a predetermined amount of time ρ beginning at moment kτ + , where and ρ represent the duration of the mission.
We assume that
is a random variable with a uniform distribution in the interval from
kτ to (
k + 1)
τ –
ρ with the PDF.
Concerning instantaneous mission availability, the following formula holds:
The instantaneous mission availability can be defined as the probability that the interval of a failure-free system operation
falls entirely within one of the intervals between CBM inspections
, considering that at any moment
the system can be recovered with probability
PR(
jτ), where
PR(0) = 1. Therefore, the system will operate failure-free in the interval
if the last system recovery occurs at the moment
, by the results of CBM inspections at instants
the system is judged as operable, and
, i.e.,
We determine the probability of the event (91) by applying the probability addition theorem and considering (89):
where
Substituting (93) into (92) gives (90).
We determined the steady-state mission availability in the case of an infinite maintenance planning time.
For the mission availability the following formula holds:
To prove (94), we express the probability
using the renewal density function and then proceed to the limit.
Since system recovery is only possible at discrete moments of time
kτ (
k = 0, 1, 2, …), we express the renewal density function through the
δ-function:
Using (96), we present (92) in the integral form.
Furthermore, because the function
is not negative, it has limited variation on the semi-axis (0, ∞) and satisfies the following inequality:
Subsequently, according to Smith’s theorem in the case of a lattice random variable [
95], we have:
where
F(
t) is the unreliability function and
α is the average time between system recoveries.
From (93), it follows that:
Since the system is used in intermittent mode (
l = 2):
Substituting (100) and (101) into (99), we obtain (94).
For subgroup
I1311, the achieved availability
is the maintenance effectiveness indicator. We determined the achieved availability using the well-known properties of regenerative processes:
For subgroup
I1321, the maintenance effectiveness indicator can be either AOPUT or AMCUT, which we determined as follows:
where
Cprof is the average profit from using the system per unit time and
Cuf is the average loss due to the system being in a hidden failure state per unit time.
For subgroup
I1412, we determined the inherent availability as follows:
For subgroup
I1422, the maintenance effectiveness indicators were AOPUT and AMCUT; however, because the mode of operation was intermittent,
Cdt = 0 for the cost component associated with scheduled CBM inspections.
For subgroup
I2111, the maintenance effectiveness indicators were APFFO (
PA) and UC for a non-repairable system
. As in
Section 3, under the APFFO, we understand the conditional probability of failure-free operation of the system on the interval (
tk,
t), provided that, according to the results of the CBM inspections at moments
, the system was judged as operable.
According to the given definition, we write the mathematical expression of the APFFO as follows:
The following equation holds for the APFFO:
Let us prove (109). Denote the following events:
Then, we can present the APFFO by the Bayes formula:
By integrating PDF (38) within appropriate limits, we determine the following probabilities:
Substituting (112) and (113) into (111) yields (109).
In the interval (tk, t), APFFO changes from the maximum value PA(tk, tk) at t = tk to the minimum value PA(tk, tk+1) at t = tk+1.
The UC for a non-repairable system is determined as follows:
where
Crep is the cost per unit time to replace the system judged as inoperable by CBM inspection.
It should be noted that when using formula (114), we should set tPR = tCR = trep in formulas (68) and (69), where trep is the replacement time of the system judged as inoperable at a CBM inspection.
For subgroup I2112, the maintenance effectiveness indicators were APFFO (PA) and UC for a non-repairable system operating in intermittent mode .
The APFFO is determined by (109), provided that the CBM inspection time is much less than the interval between inspections.
We determine the UC as follows.
Comparing (114) and (115), we observe that, in (115), the downtime cost Cdt = 0 for the cost component that is associated with scheduled CBM inspections owing to the intermittent mode of the system operation.
For subgroup I2511, the maintenance effectiveness indicator was the achieved availability of a non-repairable system . It is determined by (102); however, in (68) and (69), tPR = tCR = trep.
For subgroup I2512, the maintenance effectiveness indicator was the inherent availability of a non-repairable system . We determined using formula (105) if, in formulas (68) and (69), tPR = tCR = trep.
For subgroup
I2521, the maintenance effectiveness indicator can be either AOPUT or AMCUT for a non-repairable system if, in formulas (68) and (69),
tPR =
tCR =
trep.
For subgroup
I2522, the maintenance effectiveness indicator is also AOPUT or AMCUT; however, owing to the intermittent mode of operation,
Cdt = 0, for the cost component that is associated with scheduled CBM inspections, and the regeneration cycle does not include the time of CBM inspections; that is:
5.5. Optimal Inspection Schedule and Replacement Thresholds
Various criteria can be used to frame the challenge of determining the optimal inspection schedule and replacement thresholds throughout a finite period of the system operation. Some examples of optimization criteria are maximum availability, minimum possible average system operation costs over the interval (0, T), provision of the necessary level of OPFFO with the lowest possible average maintenance costs over the interval (0, T), and the maximum level of OPFFO while limiting the average maintenance costs over the interval (0, T).
The criterion of the maximum achieved availability is formulated as follows:
where
are the optimal inspection instants over the interval (0,
T) and
are the optimal replacement thresholds at instants
We present the criterion for the minimum AMCUT over the interval (0,
T) as follows:
where
l = 1 for the continuous mode and
l = 2 for the intermittent mode of operation.
Since the two indicators evaluate the effectiveness of safety-critical systems, we can formulate two optimization criteria for the subgroups of systems
I1111,
I1112,
I2111, and
I2112. For example, the average maintenance cost of
can be reduced by specifying the lowest permissible OPFFO
for subgroups
I1111 and
I1112. In this case, the optimization criterion has the following form:
If the maximum allowed average maintenance cost
is specified, the optimization criterion is as follows:
We can simplify criteria (120)–(123) for the case of a periodic inspection schedule and determine an optimal threshold
RTopt for all checking moments instead of the optimal replacement threshold for each inspection instant. For example, the optimization criterion (120) in this case can be expressed as:
where
τ is the periodicity of the CBM inspections and
τopt is the optimal periodicity of inspections.
7. Results
Radar systems emit electromagnetic waves, also known as radio waves. The waves are reflected by most objects, enabling the radar system to detect them. To achieve the best possible outcome, it is imperative to ensure that the radar transmitter generates a signal with precisely the right amount of power and specific characteristics. The radar transmitter power supply plays a critical role in its performance. To ensure that the radar system meets the necessary performance standards, it is vital to maintain a stable power supply.
Let us define a subgroup of systems to which the power supply of the radar transmitter belongs. The power supply unit can be repaired, indicating a system maintainability sign of
i = 1. The power supply is in continuous operation, displaying a temporary system use mode with the sign
l = 1. Failures in the power supply, scheduled maintenance, and unscheduled repairs can all result in downtime. Therefore, the sign of failure’s consequences is
j = 3. And, finally, it is not possible to make an economic evaluation of the failure’s consequences (
k = 1). Thus, we can associate the radar power supply with subgroup
I1311, as presented in
Table 1. According to this table, the achieved availability is the maintenance effectiveness metric.
In the next step, we need to identify a specific subgroup of maintenance models, which is essential for determining the achieved availability. If a power supply failure is detected, it is turned off and shipped for repair. Instead, a spare power supply is connected resulting in a value of the classification sign a = 1. According to regulations, CBM with regular inspections must be conducted, resulting in a value of the classification sign b = 2. During the repair process, the power supply unit will be fully restored in a repair shop, resulting in a value of the classification sign c = 1. Determining power supply failure requires an instrumental inspection, which leads to the value of the sign being d = 1. Thus, the maintenance model falls under subgroup M1211.
The output voltage of the power supply for the radar transmitter represents the system state parameter [
97]. If the output voltage is less than 25 kV, the power supply is operable, that is
FT = 25 kV. The parameters of the random process (125) are as follows:
a0 = 19.92 kV,
mA1 = 0.01 kV/h,
σA1 = 0.0043 kV/h, and
μ = 0.9. Assume that the random measurement error of the system state parameter has a normal distribution with a standard deviation of
σy = 0.5 kV and a zero mathematical expectation.
The following data were used to calculate the achieved availability: T =1000 h, tins = 3 h, tPR = 5 h, and tCR = 10 h.
Consider the case of a periodic inspection schedule with an optimal threshold for all inspection instants. In this case, the optimization criterion corresponds to (124).
When calculating mean times (65)–(69), we set tn = nτ, tk = kτ, tj = jτ, and tN = Nτ.
Figure 4 shows the dependence of the achieved availability on the number of inspections in the interval (0,
T). By solving the problem (124), we obtain the following solution:
, and
.
For CM inspections, that is, at RT = FT = 25 kV, the following solution is optimal: , , and . Thus, the use of the optimal replacement threshold substantially increases the achieved availability and reduces the number of inspections by more than half. It should be noted that the unavailability when using CBM inspections is 2.5-fold less than when using CM inspections.
A further increase in the achieved availability is possible using criterion (120), which assumes a sequential schedule of inspections with an optimal replacement threshold value for each inspection moment.