1. Introduction
Reliability analysis of complex, multicomponent systems is crucial in engineering, manufacturing, and operations research. It involves understanding and quantifying the capacity of a system to perform its intended function over a specified period under given conditions. Addressing the reliability of such systems requires a multifaceted approach that integrates structural analysis, probabilistic modeling, and practical maintenance strategies. These methodologies help engineers design more reliable systems, predict potential failures, and develop effective maintenance plans to mitigate risks.
However, as the number of components increases, the lack of knowledge about the structure of the system leads to an estimation problem with an overwhelming number of features [1]. This results in exponential growth in dimensionality and sparse available data. As a result, high-dimensional dependability analysis continues to be extremely difficult because the majority of current techniques are plagued by the curse of dimensionality [2].
The curse of dimensionality refers to the phenomenon where, in higher dimensions, a local neighborhood loses its traditional “local” properties. A neighborhood containing a fixed percentage of data points becomes disproportionately large, diverging from the intuitive concept of locality [3]. If a local neighborhood contains 10 data points along each axis, then there are 10^d data points in the corresponding d-dimensional neighborhood. As a consequence, much larger datasets are needed even when d is moderate, and such large datasets are often not available in practical situations. This makes smoothing techniques less effective in high-dimensional settings. In nonparametric regression, the curse of dimensionality severely degrades the convergence rate of the prediction error, so the required sample size n grows exponentially with d to maintain a low prediction error [4].
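As a quick numerical illustration of this growth, the following minimal Python snippet prints the neighborhood sizes implied by the 10-points-per-axis example for a few dimensions:

```python
# Illustrative only: size of a "local" neighborhood with 10 points per axis
# as the dimension d grows (the example discussed above).
for d in (1, 2, 3, 5, 10):
    print(f"d = {d:2d}: 10^d = {10**d:,} points in the neighborhood")
```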
To address this challenge, various strategies have been proposed, all of which incorporate a dimensionality reduction step [3]. In this context, we propose an initial step for nonparametric fitting that reduces the dimensionality of the problem using a factor analysis model. This approach aims to mitigate the effects of high dimensionality and improve the practical utility of smoothing techniques.
Machine learning (ML), a form of applied statistics, focuses on using computational power to estimate complex functions, unlike traditional statistics, which prioritize confidence intervals and rigorous proofs of uncertainty [5]. ML employs algorithms to detect patterns in data, predicting and assessing performance and failure risks in systems. By doing so, it enhances design, maintenance, and safety. ML is defined as a collection of techniques used to identify patterns in data to predict future outcomes or support decision-making under uncertainty [6]. Over recent decades, ML has revolutionized fields such as control systems, autonomous systems, and computer vision. Similarly, reliability engineering and safety analysis are expected to follow this trend [7].
ML offers several advantages over traditional methods, including advanced predictive capabilities, the ability to handle large, high-dimensional datasets, real-time monitoring, and automation, all of which contribute to improved system reliability. Common applications of ML in reliability engineering include tasks such as estimating remaining useful life, detecting anomalies and faults, monitoring system health, planning maintenance, and assessing degradation. Studies like [7] highlight the potential of ML to transform reliability and safety analysis. However, challenges such as data quality and model interpretability must be addressed for ML to fully realize its potential (see [1] and references therein). Interdisciplinary collaboration and hybrid approaches also show promise [1,8,9]. For instance, ref. [8] proposes a state monitoring algorithm that combines a convolutional neural network (CNN) with a random forest (RF) for scenarios with missing data: the CNN extracts distributed fault information from the available signals and acquires the state features of the system, and the random forest then processes these state features and judges the system state. Ref. [9] combines reliability analysis tools with ML to identify critical maintenance components and failure causes. Similarly, ref. [1] integrates supervised and unsupervised learning techniques to analyze system status.
When applying ML, the question of which model performs best often arises. In 1997, David Wolpert and William Macready addressed this issue with the “No Free Lunch” (NFL) theorem [10], which demonstrated that, without assumptions about the data, no model is universally superior. This implies that no model can consistently outperform others in all scenarios. It underscores the importance of inductive biases, suggesting that specialized algorithms are required for different types of data, as real-world problems often exhibit low complexity [11]. Recent explorations extend the NFL theorem to quantum learning protocols, revealing that quantum algorithms can achieve better sample complexity under certain conditions, thus showcasing the theorem’s relevance in advanced learning contexts [12]. Conversely, while the NFL theorem suggests limitations, it also opens avenues for developing metalearning strategies that leverage correlations among algorithms, potentially leading to improved performance in specific contexts [13].
The use of ML for reliability analysis in large-scale, complex systems is a key application within Industry 4.0. In this context, the interconnection of sensors and smart devices generates large volumes of data, which can be analyzed to enhance system efficiency and safety. In systems with thousands of sensors, traditional reliability models based on statistical distributions become inefficient. Machine learning techniques, such as deep learning, are better suited for handling high-dimensional data. Algorithms like Random Forest and Bayesian classification models can extract valuable information from the vast datasets produced by sensors [14]. Industry 4.0 now includes diverse data sources, such as temperature, vibration, and pressure sensors, and the application of ML algorithms enables the integration and simultaneous analysis of these variables, detecting correlations that might not be apparent with traditional techniques [15].
Reliability analysis is fundamental in the design and maintenance of complex systems. Traditionally, systems have been analyzed under the assumption that their components are independent, but this assumption becomes less valid as systems grow in complexity. In reality, system components are often interdependent; they are designed to work together to ensure the proper operation of the system. A complex system is characterized by the nonlinear interaction of its components, where the dynamics of the system cannot be understood simply as the sum of its parts. Such systems exhibit emergent properties and adaptability and may have hierarchical structures. Addressing these interdependencies is crucial for improving the accuracy of reliability predictions, and traditional methods that assume independent components are insufficient. This is where machine learning (ML) methods, such as neural networks, become valuable: they can model the complex interdependencies between the components of the system, providing more accurate predictions, especially in systems with numerous components that interact in ways that traditional methods may not easily capture. Until 2006, we lacked the knowledge to train neural networks to outperform traditional methods, aside from a few specific cases [16]. In a neural network, we do not instruct the computer on how to solve our problem; rather, it learns from observational data, discovering its own solutions. The breakthrough in 2006 came with the advent of learning techniques for deep neural networks [17].
Nevertheless, ML methods have to be adapted according to the area of application, and it is necessary to incorporate the knowledge of experts to improve the accuracy and reliability of the predictions [18]. Below are some of the ML methods that can be applied to predict the reliability of a complex system with multiple components and possible correlations between them.
This paper conducts a comprehensive comparative analysis between classical statistical methods and widely used ML techniques to predict system reliability in complex systems characterized by high dimensionality. In this context, high dimensionality refers to systems with a large number of components, often grouped into independent blocks of correlated units due to physical, functional, or operational dependencies, a common scenario in engineering systems. These systems pose significant challenges to traditional reliability analysis methods. The contributions of this paper are summarized as follows:
Efficient method for reliability analysis: we present an easy-to-run and computationally efficient method specifically designed for reliability analysis in high-dimensional complex systems.
Classical statistical methods for effective reliability: the proposed method, based on classical statistical approaches, ensures interpretability, scalability, and robustness, even in the presence of complex correlation structures between system components.
Performance comparison with machine learning: we compare the performance of the FA-LR-IS algorithm, designed by the authors, with commonly used ML methods, including artificial neural networks (ANNs), k-nearest neighbors (KNN), and random forests.
Practical relevance via extensive evaluation: the evaluation includes an extensive simulation study and two real-world datasets. This ensures practical relevance and provides insights into the strengths and limitations of each methodology.
The rest of this paper is organized as follows: Section 2 first defines the problem statement, then introduces the FA-LR-IS algorithm for estimating reliability in complex, high-dimensional systems and discusses machine learning solutions to this problem. Section 3 presents the traditional division of the data into training and test sets, together with the metrics for evaluating the models; in addition, the particular evaluation procedure on the test sample for the FA-LR-IS algorithm is included. Section 4 presents numerical results from a simulation study and two real applications for the study of reliability in systems composed of sensors. Section 5 provides a brief discussion on the topic, and Section 6 ends with the conclusions.
2. Approaches to Reliability Analysis
2.1. Problem Statement
Let us consider a system that can be found in one of two states: operative (1) and failure (0). We denote by Y the random variable representing the system state. The system is composed of p units, which interact and work together to ensure the system’s functioning. For j = 1, …, p, unit j can be found in a state represented by a random variable X_j taking values in [0, 1]. The states of the p units are collected in a random vector X = (X_1, …, X_p) that we call the state vector. Depending on the underlying logic of the system, some configurations of the state vector lead to the good performance of the system, while for others the system is out of work. Then, our objective is to determine a function that links the states of the units to the state of the system.
The reliability of the system is the probability that the system is operational for a particular configuration of component states. We consider the reliability of the system as a function of the state vector:

R(x) = P(Y = 1 | X = x), x ∈ [0, 1]^p.  (1)

That is, we assume that, given X = x, Y follows a Bernoulli distribution with event probability R(x), with R an unknown function for which we do not assume any particular functional form. The only restriction is that R meets the coherence conditions of the system, i.e., R(0) = 0, R(1) = 1, and R is monotone in each argument.
Our main objective is to predict whether the system is operational given the states of its components. This problem will be solved using different algorithms in the following subsections. Some of the algorithms only provide a classification of the system into one of two categories, functioning or failure, but in other cases an estimation of the probability that the system is working, in terms of the component states, is also provided. To train our algorithms, we need to observe a sample of systems. Let (X, y) be the observed data, where X is a matrix of dimension n × p. Each row of the matrix is a configuration of the component states of system i, with i = 1, …, n, and y is the n-dimensional vector of the states of all systems in the sample. In the following, we describe in detail the FA-LR-IS algorithm and the basic elements of the ML methods that we will apply in our numerical analysis in Section 4.
2.2. The FA-LR-IS Algorithm
As we have already mentioned, one of the primary goals in reliability analysis is to mathematically represent the logic underlying a system. Assessing system performance, even for simple structures, can be challenging, making it essential to develop effective methods for modeling the relationship between the state of the system and its components. Unlike some of the ML algorithms, the FA-LR-IS algorithm [1] not only classifies the system as operative or failed; it provides other useful information. Specifically, the main objective of the algorithm is to estimate the reliability function, that is, the probability that the system is functioning in terms of its unit states, as in (1). Finally, a ranking of components can be established according to the importance of every component for the system state, understanding the importance of a unit as the impact that a one-unit change in the component state has on the system behavior. In [19], we emphasized the importance of using nonparametric regression, which produces estimators with good asymptotic properties, although these estimators are not necessarily monotonic. In high-dimensional scenarios, a methodology is proposed that combines factor analysis, logistic regression, and isotonization. The combination of factor analysis and logistic regression was chosen for the following reasons. First, we consider factor analysis for dimensionality reduction because, in complex systems with a large number of components, the total number of variables in the regression problem we aim to solve is very high, and many of them are highly correlated [20,21,22]. In this context, factor analysis reduces dimensionality by extracting latent factors that summarize the original data. This prevents issues like multicollinearity and overfitting while maintaining the key relationships in the data. Second, in our practical application we are interested in predicting the probability that the system is either operative or in failure given a particular configuration of component states. As is well known, logistic regression is well suited for predicting binary outcomes; it is interpretable, efficient, and appropriate for modeling the nonlinear relationships underlying our problem. Finally, to ensure practical relevance and interpretability of the results, the model is expressed in terms of the original variables by applying an inverse transformation, which is possible given the mathematical formulation of the combination of these two models.
Here, we outline the algorithm described in [1] for constructing a statistical model that predicts system reliability in complex systems with many interdependent components.
Let (X, y) represent the observed data, where X is an n × p matrix. Each row of X, for i = 1, …, n, corresponds to a configuration of component states in a system with p components. The state of each component is modeled by a random variable that takes values within the interval [0, 1]. The components are not assumed to be independent. The vector y has dimension n, where, for each i = 1, …, n, y_i = 1 indicates that the system is operational and y_i = 0 otherwise.
2.2.1. Data Preprocessing
Center and scale the matrix X: for each j = 1, …, p, compute the sample mean x̄_j and the standard deviation s_j of the jth column of X. Collecting the standard deviations in a diagonal matrix whose jth element is s_j, define the standardized matrix Z with entries z_{ij} = (x_{ij} − x̄_j)/s_j, resulting in a matrix with dimensions n × p.
2.2.2. FA-LR-IS Algorithm
Apply factor analysis (FA): we apply an FA algorithm to the scaled dataset Z and transform the data into a reduced set of factor scores, denoted F, with dimensions n × r, where r < p. The scores are obtained from Z through the FA coefficient (loading) matrix of dimension p × r.
Fit a local-logistic model: a local-logistic model is fitted using the reduced dataset, with the leave-one-out cross-validation (LOOCV) criterion for bandwidth selection.
- (a)
Let h be the bandwidth, varying over a grid of candidate values.
- (b)
Construct the local-logistic model based on the reduced dataset and bandwidth h, and estimate the probability that the system works for each observation i = 1, …, n.
- (c)
For each i = 1, …, n, consider the leave-one-out dataset that excludes the ith observation. Define the model fitted using this reduced dataset and use it to estimate the probability for the excluded observation.
- (d)
The cross-validation score CV(h) is defined as the average discrepancy between the observed system states y_i and the corresponding leave-one-out predictions.
- (e)
Define the selected bandwidth h_CV as the value of h that minimizes CV(h).
Let x_0 represent a specific configuration of component states. We compute its representation f_0 in the factor space and construct a local-logistic model in a neighborhood of f_0 determined by the selected bandwidth h_CV; the fitted model provides the estimated probability that the system works at any point of that neighborhood, with a vector of local coefficients obtained using h_CV.
Because each factor score is a linear combination of the centered and scaled original component states, the fitted coefficients in the factor space can be transformed back into coefficients in the original state space. In this way, the local-logistic model is estimated within the state space of the components, where the back-transformed vector of estimated coefficients is obtained from the rows of the FA coefficient matrix and the scaling constants used in the preprocessing step.
Define the preliminary estimates r̂_i, for i = 1, …, n, obtained from the back-transformed local-logistic model. Apply the isotonization algorithm outlined in [19] to adjust these estimated values, thereby obtaining the estimated reliability for the ith configuration of component states, i = 1, …, n.
Let us denote by r̂ the vector with components r̂_i, i = 1, …, n. The problem is to find the vector r minimizing ‖r − r̂‖² subject to A r ≥ 0, where A is a matrix whose elements take values in {−1, 0, 1} and which encodes the monotonicity (coherence) restrictions between comparable configurations of component states. The feasible set {r : A r ≥ 0} is a polyhedral convex cone in R^n, so the solution is the projection of r̂ onto that cone. To solve this problem, we use the hinge algorithm presented in [23] (a simplified numerical sketch of this projection step is given below).
Analyze the labeled data using the ROC curve to calculate the optimal classification threshold, as described below in Section 3.2, and evaluate the model.
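As anticipated above, the following is a minimal sketch of the isotonization step viewed as a constrained least-squares projection onto the monotone cone. It uses a generic SLSQP solver as a stand-in for the hinge algorithm of [23]; the pair list, the [0, 1] bounds, and the toy data are illustrative assumptions, not the authors' implementation.

```python
# Projection of preliminary estimates onto a monotonicity cone (illustrative).
import numpy as np
from scipy.optimize import minimize

def isotonize(r_hat, pairs):
    """Project r_hat onto {r : r[j] >= r[i] for every (i, j) in `pairs`}.

    `pairs` lists index pairs (i, j) such that configuration j dominates
    configuration i componentwise, so coherence requires r[j] >= r[i].
    """
    n = len(r_hat)
    cons = [{"type": "ineq", "fun": (lambda r, i=i, j=j: r[j] - r[i])}
            for i, j in pairs]
    res = minimize(lambda r: np.sum((r - r_hat) ** 2),
                   x0=np.clip(r_hat, 0, 1),
                   bounds=[(0.0, 1.0)] * n,
                   constraints=cons, method="SLSQP")
    return res.x

# Toy example: three configurations ordered componentwise as 0 <= 1 <= 2,
# with raw estimates that violate monotonicity.
r_hat = np.array([0.40, 0.35, 0.80])
print(isotonize(r_hat, pairs=[(0, 1), (1, 2)]))  # approx. [0.375, 0.375, 0.80]
```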
Broadly speaking, the algorithm proceeds as follows: The goal of the FA-LR-IS algorithm is to estimate the probability of the system functioning based on its components’ performance levels. However, with a large number of inputs, overfitting and higher prediction errors may arise. To address this, once the data have been normalized, the algorithm first reduces dimensionality using factor analysis (FA), which groups correlated variables into factors that share common variance.
Next, a local-logistic model is built in the latent space rather than using the original component states as inputs. The local regression model is constructed on the scores matrix generated by the FA algorithm. Nonparametric regression ensures estimators with desirable properties like consistency and normality, but these estimators are not inherently monotonic. In a coherent system, however, the response variable of the regression model (the system state) must be monotonic relative to the original variables (component states). Since the features used in the local-logistic model lack clear physical interpretation, we cannot assume monotonicity. Therefore, we propose an isotonization step after back-transformation to ensure the model is expressed in terms of the original variables.
Finally, the estimated probabilities generated by the logistic regression model are translated into classes or categories using the classification obtained from the ROC curve.
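To fix ideas, the following is a simplified, illustrative sketch of this pipeline in Python. It makes deliberate simplifications that are our own assumptions rather than the authors' implementation: scikit-learn's FactorAnalysis replaces the exploratory maximum-likelihood factor analysis, a single global logistic regression replaces the local-logistic fit with cross-validated bandwidth, the isotonization step is omitted, and the classification threshold is taken from the ROC curve via the Youden index. The synthetic data are purely illustrative.

```python
# Simplified sketch of the FA-LR-IS workflow (standardize -> FA -> logistic
# model -> ROC-based classification); see the text above for the full method.
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p, r = 500, 15, 3                       # systems, components, latent factors
X = rng.uniform(size=(n, p))               # component states in [0, 1] (toy data)
y = (X[:, :5].max(axis=1) * X[:, 5:].mean(axis=1) > 0.45).astype(int)  # toy system state

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_tr)                 # Step 1: center and scale
Z_tr, Z_te = scaler.transform(X_tr), scaler.transform(X_te)

fa = FactorAnalysis(n_components=r, random_state=0).fit(Z_tr)   # Step 2: FA
F_tr, F_te = fa.transform(Z_tr), fa.transform(Z_te)

clf = LogisticRegression().fit(F_tr, y_tr)          # Step 3: logistic model
prob_tr = clf.predict_proba(F_tr)[:, 1]             # estimated reliabilities

fpr, tpr, thr = roc_curve(y_tr, prob_tr)            # Step 4: ROC-based cutoff
cutoff = thr[np.argmax(tpr - fpr)]                  # Youden index

prob_te = clf.predict_proba(F_te)[:, 1]
y_pred = (prob_te >= cutoff).astype(int)
print("test accuracy:", (y_pred == y_te).mean())
```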
2.3. Artificial Neural Networks
Artificial neural networks (ANNs) are ML algorithms that simulate the learning mechanisms of biological organisms [17], having demonstrated success in a variety of areas, such as natural language processing, speech recognition, and image recognition. The application of neural networks in reliability estimation is based on their ability to model complex relationships between input data (e.g., historical failure data or operating conditions) and output results (such as failure probability, remaining useful life, etc.). Neural networks significantly improve the ability to anticipate failures, optimize maintenance, and ensure reliability in complex systems, especially in industries where the cost of a failure can be high [24]. An example of this can be seen in [25], where the authors propose the use of neural networks to predict the useful life of machines and components, highlighting the predictive capacity of these models and illustrating the methodology with two studies: the repair of damaged units subjected to fatigue and a pump system in an industrial plant.
ANNs are made up of units called neurons, which are interconnected in a structure consisting of at least two layers: an input layer and an output layer, usually with one or more hidden layers in between. The input layer receives the data (relevant features such as operating conditions, runtime, sensor variables, etc.), while the hidden layers process the information in an intermediate manner. Each neuron in these layers performs a mathematical transformation based on the learned weights and biases. Finally, the output layer produces the reliability estimate, such as the probability of failure or the remaining lifetime. Each neuron takes a linear combination of the inputs and then applies a nonlinear activation function. This process can be described as follows:

z_j^(l) = Σ_{i=1}^{n} w_{ij}^(l) a_i^(l−1) + b_j^(l),

where the following is true:
z_j^(l) is the value of neuron j in layer l.
w_{ij}^(l) is the weight connecting neuron i in the previous layer to neuron j in the current layer l.
a_i^(l−1) is the output of neuron i in the previous layer.
b_j^(l) is the bias of neuron j in layer l.
n is the number of neurons in the previous layer.
Then, an activation function g is applied to introduce nonlinearity, so that the neuron output is a_j^(l) = g(z_j^(l)); for more details, see [17,26]. Depending on the type of study, selecting the appropriate activation function allows the network to learn more complex patterns and perform more sophisticated tasks. For reliability classification problems, activation functions such as the ReLU (Rectified Linear Unit), g(z) = max(0, z), or the sigmoid, g(z) = 1/(1 + e^(−z)), are commonly used. The sigmoid function is especially popular in the output layer for binary classification problems, as it generates an output in the form of a probability [27]. This process is repeated in all the hidden layers. In the last layer (the output layer), the output value Y, which represents the probability of failure at a given time, is generated using an activation function such as the sigmoid in the case of classification problems.
The ANN requires a learning process to adjust the weights and biases of the connections, which incurs a computational cost. Algorithms such as gradient descent (GD), stochastic gradient descent (SGD), adaptive gradient descent (AdaGrad), root mean square propagation (RMSprop), and ADAM are commonly used to minimize the loss function [17,28]. The ADAM algorithm updates the parameters θ of the neural network at each iteration t by moving them in the direction of a bias-corrected moving average of the gradient of the loss, scaled by a bias-corrected moving average of the squared gradient:

θ_t = θ_{t−1} − η · m̂_t / (√v̂_t + ε),

where L is the loss function whose gradient drives the updates, η is the learning rate, and m̂_t and v̂_t are the bias-corrected exponential moving averages of the gradient and of the squared gradient, respectively. The parameter ε is a smoothing term that avoids division by zero.
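A minimal NumPy sketch of the standard ADAM update is given below. This is generic textbook code, not taken from the paper; the toy quadratic loss and the hyperparameter values are illustrative assumptions.

```python
# One ADAM parameter update (standard formulation), applied to a toy problem.
import numpy as np

def adam_step(theta, grad, m, v, t, eta=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    g = grad(theta)
    m = beta1 * m + (1 - beta1) * g           # moving average of the gradient
    v = beta2 * v + (1 - beta2) * g ** 2      # moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize L(theta) = ||theta||^2, whose gradient is 2 * theta.
theta, m, v = np.ones(3), np.zeros(3), np.zeros(3)
for t in range(1, 501):
    theta, m, v = adam_step(theta, lambda th: 2 * th, m, v, t, eta=0.05)
print(theta)   # close to the minimizer (0, 0, 0)
```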
Loss functions are critical in the training and validation stages, as they quantify the difference between the system state predicted by the model and the actual state, allowing for the correct fit of the model parameters, i.e., weights and biases. For regression problems, the mean squared error is typically used, while binary cross-entropy is an appropriate cost function for binary classification problems. The binary cross-entropy loss is obtained by applying

L = −(1/n) Σ_{i=1}^{n} [ y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i) ],

where ŷ_i denotes the probability predicted by the network for observation i.
For our problem, we consider a three-layer feedforward neural network for classification. The input layer consists of p input neurons (X_1, …, X_p) carrying the sample information, the hidden layer contains H neurons, and the output layer contains one neuron, Y. The model generated by the network is expressed as follows:

Y = g_2( β_0 + Σ_{h=1}^{H} β_h · g_1( α_{0h} + Σ_{i=1}^{p} α_{ih} X_i ) ),

where α_{ih} are the connection weights of input i with neuron h of the hidden layer, β_h are the connection weights of the hidden layer with the output layer, α_{0h} and β_0 are the bias terms, and g_1 and g_2 are the activation functions in the hidden and output layers, respectively.
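A small NumPy sketch of the forward pass of this network is shown below. The random weights, the ReLU hidden activation, and the dimensions are illustrative assumptions; in practice the weights are learned by minimizing the binary cross-entropy with an optimizer such as ADAM.

```python
# Forward pass of a p-H-1 feedforward network with a sigmoid output neuron.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def forward(x, alpha0, alpha, beta0, beta, g1=relu, g2=sigmoid):
    """x: (p,) component states; alpha: (H, p) input-to-hidden weights;
    alpha0: (H,) hidden biases; beta: (H,) hidden-to-output weights;
    beta0: scalar output bias. Returns the probability produced by the output neuron."""
    hidden = g1(alpha0 + alpha @ x)       # hidden-layer activations
    return g2(beta0 + beta @ hidden)      # output-layer probability

rng = np.random.default_rng(1)
p, H = 9, 4
x = rng.uniform(size=p)                   # one configuration of component states
y_hat = forward(x,
                alpha0=rng.normal(size=H), alpha=rng.normal(size=(H, p)),
                beta0=rng.normal(), beta=rng.normal(size=H))
print(f"estimated probability produced by the output neuron: {y_hat:.3f}")
```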
2.4. K-Nearest Neighbors
The K-nearest neighbors (KNN) algorithm is a supervised learning technique commonly used in classification and regression problems. In the context of reliability, KNN can be applied to make predictions related to the probability of failure or lifetime of a system based on historical data of similar failures. An example would be the prediction of the remaining lifetime of electronic devices under accelerated stress conditions, where machine learning models such as KNN are used to estimate the reliability of electronic components with high levels of accuracy [29]. In [30], a methodology is presented that combines active learning and the KNN algorithm to evaluate the reliability of engineering structures based on the assessment of the fracture probability of cracked structures.
Using KNN, a system is classified based on data from other similar systems (neighbors). If a majority of the nearby neighbors have failed under similar conditions, the algorithm predicts that the system is also at risk of failure. The aim is to assign an unclassified point x_0, i.e., a given configuration of component states, to the system-state class represented by the majority of its K nearest neighbors. To do this, the distance d(x_0, x_i) between the point of interest and each point x_i in the dataset is calculated. The K nearest neighbors are then selected, that is, those points that minimize the distance to x_0. Finally, the majority class of the system state is determined among these neighbors; the predicted system state for point x_0 is the one that appears most frequently among the K neighbors.
The choice of the parameter K has a significant influence on the performance of the model [31,32,33]. To choose K correctly, there are several alternatives, such as opting for an odd value of K, which avoids possible ties in the proportions of membership in each class. The leave-one-out or k-fold cross-validation technique consists of dividing the dataset into k subsets, using k − 1 of these to train the model and the remaining subset to evaluate performance; this process is repeated k times, and the model performance is averaged for different candidate values of K. Another option is to try different values of K, apply the method to sample points whose classification is known, and select the value of K that minimizes the classification error. Empirically, one can highlight the square-root rule, which suggests that K should be approximately equal to the square root of the total size of the dataset.
The importance of normalizing the data is also highlighted, since the KNN algorithm is sensitive to the scale of the variables. This algorithm is simple to implement and easy to understand [34], but it can have a high computational cost when working with large datasets due to the calculation of distances. In KNN algorithms, the choice of the distance metric directly affects how the proximity between points in the feature space is calculated, which in turn influences classification decisions [35]. The Euclidean distance is the most common metric; however, the Manhattan distance can also be used. Finally, we can mention the Minkowski distance, which generalizes the two previous ones and whose expression is given by

d_q(x, x′) = ( Σ_{j=1}^{p} |x_j − x′_j|^q )^{1/q}.

When q = 2, it is equivalent to the Euclidean distance, and when q = 1, it is equivalent to the Manhattan distance.
For the problem we are considering, the method is formulated according to the following steps. The objective of the KNN algorithm is to classify a new observation x_0 into one of the classes using the information from its K nearest neighbors in the feature space:
Let (X, y) be a dataset with n observations, where each observation i has p input variables (features). In a reliability setting, the features could be related to different reliability metrics, such as time to failure, failure rate, system age, repair time, etc. We want to predict a class for each observation.
Compute the distance between x_0 and each observation in the dataset using the chosen distance metric.
Choose the K neighbors with the smallest distances to x_0.
The predicted class is the one with the majority of votes among the K nearest neighbors.
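The following scikit-learn sketch illustrates these steps, together with cross-validated selection of K and of the Minkowski exponent; the synthetic data, the candidate grids, and the pipeline choices are illustrative assumptions only.

```python
# Illustrative KNN classification of the system state with scikit-learn.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 10))                 # component states (toy data)
y = (X.mean(axis=1) > 0.5).astype(int)          # toy system state
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Normalization matters for KNN; K and the Minkowski exponent q (parameter `p`
# in scikit-learn) are tuned by cross-validation, as discussed above.
model = make_pipeline(StandardScaler(), KNeighborsClassifier())
grid = GridSearchCV(model,
                    {"kneighborsclassifier__n_neighbors": [3, 5, 7, 9, 11],
                     "kneighborsclassifier__p": [1, 2]},
                    cv=5)
grid.fit(X_tr, y_tr)
print("selected K and metric exponent:", grid.best_params_)
print("test accuracy:", grid.score(X_te, y_te))
```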
2.5. Random Forest
Random Forest (RF) is a powerful ML technique that has been applied in the field of reliability engineering to improve failure prediction, risk assessment, and maintenance decision-making. Its ability to handle complex, high-dimensional data, as well as providing insight into the most relevant factors affecting system reliability, makes it a particularly valuable tool in real-world applications.
Several scientific studies have explored the use of RF in reliability engineering: for example, to predict failures in semiconductor manufacturing equipment, demonstrating the high accuracy and robustness of the algorithm in handling complex operational data [36], and to estimate product failures and warranty costs, highlighting the effectiveness of the model in analyzing warranty claims data and in projecting future costs based on historical failure data [37]. In addition, Ref. [38] proposes a model that integrates several machine learning techniques, including random forest, to predict failures in secondary power distribution networks. In particular, the model leverages the ability of random forest to identify relevant patterns in meteorological data and historical fault records, critical factors for predicting failures in power distribution systems.
In the field of reliability, RF operates by creating multiple independent and uncorrelated decision trees [39]. Each tree is trained on a randomly selected portion of the data, and at each split within the tree, a random subset of component states is used. This process, known as bagging or bootstrap aggregating, helps reduce model variance and prevent overfitting. Once all decision trees are trained, the RF algorithm makes predictions by combining the individual predictions from each tree. In classification problems, a majority voting strategy is used: the system state that receives the most votes among all trees is selected as the final prediction. The RF predictor is defined as the average of the predictions from B independent decision trees, each trained on a randomly selected dataset:

R̂_RF(x) = (1/B) Σ_{b=1}^{B} T_b(x),

where B is the total number of trees in the forest, and T_b is the bth decision tree. That is, if regression analysis is used, the probability that the system works is the percentage of trees that have classified the system as working. If classification analysis is used, the estimate is determined by majority vote. In this way, a final prediction is made that depends on the predictions resulting from the set of trees produced. This approach makes RF particularly effective at handling large datasets with numerous features, providing efficient computational performance tailored to solving complex problems in ML [40].
For our problem, we consider the implementation of the following:
Bootstrap sampling: Training samples are created by randomly selecting, with replacement, a subset of component states for each tree. This procedure generates diverse datasets for building varied trees, introducing diversity into the model.
Node splitting: Within each tree, nodes are split using the best possible partition based on a random subset of features (or component states in this case). The goal is to minimize the impurity of the child nodes after each split. The most common measures for classification are the Gini impurity or the information gain based on entropy. The Gini impurity of a node is

G = 1 − Σ_{k=1}^{K} p_k²,

where p_k is the proportion of examples at the node that belong to class k, and K is the total number of classes. The splitting of each node is performed in such a way as to minimize impurity.
Tree growth: trees are expanded to the maximum allowed size, without pruning except for explicit constraints such as maximum depth or minimum number of samples per leaf.
Aggregation of results: Once all trees are built, their predictions are combined to produce the final estimate. For classification, each tree makes a prediction for a given example. The class with the highest number of votes (majority voting) is selected.
The component state importance is based on the sum of the Gini impurity reduction, weighted by the number of samples arriving at each node, and averaged across all trees.
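A short scikit-learn sketch of this workflow, including the Gini-based (MDI) importance of each component state, is given below; the synthetic data, the number of trees, and the other settings are illustrative assumptions.

```python
# Illustrative Random Forest classifier for the system state.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(600, 12))                               # component states (toy data)
y = (np.minimum(X[:, 0], X[:, 1]) * X[:, 2:].mean(axis=1) > 0.3).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

rf = RandomForestClassifier(n_estimators=500,     # B bootstrap-trained trees
                            max_features="sqrt",  # random feature subset per split
                            random_state=0)
rf.fit(X_tr, y_tr)

print("test accuracy:", rf.score(X_te, y_te))
print("estimated P(system works) for 5 test systems:",
      np.round(rf.predict_proba(X_te[:5])[:, 1], 3))
print("Gini (MDI) importance per component:",
      np.round(rf.feature_importances_, 3))
```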
4. Numerical Results
This section presents a simulation study of multiple systems, followed by an application to two real-world datasets. These analyses demonstrate the strong performance of the FA-LR-IS algorithm compared with the supervised learning methods mentioned above.
4.1. Simulations
To assess our method, we carried out a simulation study using systems with varying configurations. Many scientific fields involve static or dynamic systems composed of multiple components, which can be grouped into distinct, interacting blocks. We assume that the structural logic of the system can be represented using a block diagram.
Figure 4 presents a graphical representation of the four cases analyzed in this section.
The data for each case were generated as follows: we simulate M = 500 samples of size n. Let p represent the number of components in the system, with p taking values of 9, 10, 15, and 25 in our examples. The resulting data are organized into a matrix with p + 1 columns. The first p columns correspond to the states of the components, X_1, …, X_p, where each component state takes values in [0, 1]. The (p + 1)th column represents the system state, Y, which is 1 if the system is operational and 0 otherwise. The components are grouped into blocks, as illustrated, for example, in System 1. We assume that two components are correlated if they belong to the same block and are uncorrelated otherwise. This relationship is illustrated in Figure 4, where the blocks containing dependent components are marked with dashed lines.
The state of the system is simulated by incorporating a latent variable that is not directly observable, defined through the structure function evaluated at a specific configuration of the state vector. The information regarding the state of the system Y is then simulated from a binomial distribution whose event probability is determined by this latent variable.
The structure function of each model is given in the following:
System 1. We examine a series-parallel system consisting of p = 9 components, as depicted in Figure 4 (top plot). The system is organized into three blocks connected in series. The first two blocks are arranged in parallel configurations, containing three and four components, respectively. The third block is made up of two components connected in series. The structure function of this system follows directly from this block arrangement: the system works only if every block works, a parallel block works whenever at least one of its components works, and the series block works only if both of its components work; here, x_j denotes the state of the jth component, j = 1, …, 9.
System 2. We examine a series–parallel system with p = 10 components, as shown in Figure 4 (second plot). The system is arranged with four parallel blocks connected in series. The first two blocks each contain two components, while the last two blocks consist of three components each. The structure function for this system follows from this block arrangement: the system works only if every block works, and each parallel block works whenever at least one of its components works; here, x_j denotes the state of the jth component, j = 1, …, 10.
System 3. We analyze a bridge system consisting of p = 15 components, as shown in Figure 4 (third plot). The basic bridge structure has been modified to incorporate redundancy, with each component replaced by a block of three parallel-connected units. The structure function of this system is that of the basic five-node bridge evaluated at the block states, where each block state is determined by its three parallel units; here, x_j denotes the state of the jth component, j = 1, …, 15.
System 4. We consider a bridge structure with p = 25 components, as displayed in Figure 4 (bottom plot). Again, a simple bridge structure has been modified to introduce redundancy, but in this case each node has been replaced by a block consisting of a bridge structure with five components. The structure function for this system is that of the basic bridge evaluated at the block states, where each block state is itself given by a five-component bridge structure; here, x_k denotes the state of the kth component, k = 1, …, 25.
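To make the simulation design concrete, the following is a hedged sketch of how data for System 1 could be generated. The min/max form of the structure function, the independent uniform component states, and the use of the structure-function value itself as the Bernoulli event probability are illustrative assumptions; the paper's simulations additionally impose correlation within blocks and use the latent-variable construction described above.

```python
# Hedged sketch of a data-generating scheme for System 1 (9 components:
# two parallel blocks of 3 and 4 components plus a series block of 2,
# all connected in series).
import numpy as np

rng = np.random.default_rng(0)

def phi_system1(x):
    """Series-parallel structure: blocks {1-3} and {4-7} in parallel, {8-9} in series."""
    block1 = x[:, 0:3].max(axis=1)     # parallel block of 3 components
    block2 = x[:, 3:7].max(axis=1)     # parallel block of 4 components
    block3 = x[:, 7:9].min(axis=1)     # series block of 2 components
    return np.minimum.reduce([block1, block2, block3])

n, p = 500, 9
X = rng.uniform(size=(n, p))           # component states in [0, 1] (independence assumed here)
prob = phi_system1(X)                  # latent system performance level
Y = rng.binomial(1, prob)              # observed system state
print("proportion of operational systems:", Y.mean())
```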
Each sample (X, y) is split into a training set and a test set, as illustrated in Figure 1. The corresponding set of indices {1, …, n} is divided accordingly into two disjoint subsets, one containing the indices of the training set and the other the indices of the test set.
The results presented below were obtained using the estimated reliability for each system (S = 1, 2, 3, 4), keeping the sample split consistent across methods to facilitate reproducibility; this guarantees that all algorithms are applied to the same datasets.
Below, we present a series of tables and graphs with different measures to compare the FA-LR-IS algorithm with other ML methods.
First, the area under the ROC curve (AUC) was used to assess the goodness of fit; a higher value of this measure means that the model achieves better overall performance. Table 1 shows the mean and standard deviation (SD) of the AUC calculated over all repetitions in the experiment. All values are displayed in Figure 5; each system is color-coded for clarity. The FA-LR-IS algorithm, with the highest values in the table, consistently outperforms the other methods across all system configurations, demonstrating superior discriminatory power as a classification method.
Next, we consider the Mean Squared Error (MSE) to evaluate the accuracy of a model. For each system, given a particular test dataset of size n_test, the MSE was calculated using the formula

MSE_S = (1/n_test) Σ_{i ∈ test set} ( R_S(x_i) − R̂(x_i) )²,

where R_S represents the true reliability function for the structure S, and R̂ denotes the reliability estimated using the corresponding method. Table 2 presents the average values and standard deviations (SD) of the MSE across the M repetitions; a lower MSE indicates that the model predictions are very close to the actual values.
Table 2 shows that FA-LR-IS outperforms all other methods for Systems 3 and 4, achieving the lowest error, while ANN performs comparably for Systems 1 and 2. As can be seen in Figure 6, FA-LR-IS and ANN yield very similar MSE values for the four systems considered.
Table 3 provides the mean and SD of the accuracy of the estimator, measured in terms of predictive capacity as defined in Section 3.2, across repetitions. The FA-LR-IS algorithm outperforms the other methods for Systems 3 and 4, while ANN performs best for System 1 and KNN for System 2. That is, our algorithm shows greater predictive capacity for the more complex systems than for the simpler ones. All values obtained are displayed in Figure 7.
Finally, given that FA-LR-IS and ANN were the most competitive methods, we conducted direct bootstrap comparisons between them. For each measure (AUC, MSE, or accuracy), the following procedure was implemented (a numerical sketch is given after the steps):
Steps
For the 500 sample realizations of the measure obtained for each method (FA-LR-IS and ANN), calculate the difference between the means.
Combine all results from both methods and draw two samples with replacement from this combined data, each of size 500. Calculate the means of these samples and then compute the difference between them.
Repeat Steps 1 and 2 a total of 10,000 times to obtain an approximate bootstrap distribution for the difference in means.
Calculate the p-value as the proportion of bootstrap differences whose absolute value is greater than or equal to the absolute value of the observed difference in means from the initial 500 observations.
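The following NumPy sketch illustrates Steps 1–4; the two vectors of performance values are synthetic placeholders standing in for the 500 per-sample results of each method.

```python
# Bootstrap test for the difference in mean performance between two methods.
import numpy as np

rng = np.random.default_rng(0)
measure_a = rng.normal(0.95, 0.01, size=500)    # e.g. FA-LR-IS results (placeholder)
measure_b = rng.normal(0.94, 0.01, size=500)    # e.g. ANN results (placeholder)

obs_diff = measure_a.mean() - measure_b.mean()  # Step 1: observed mean difference
pooled = np.concatenate([measure_a, measure_b]) # Step 2: pool both methods

B = 10_000
boot_diffs = np.empty(B)
for b in range(B):                              # Steps 2-3: resample and repeat
    s1 = rng.choice(pooled, size=500, replace=True)
    s2 = rng.choice(pooled, size=500, replace=True)
    boot_diffs[b] = s1.mean() - s2.mean()

# Step 4: proportion of bootstrap differences at least as extreme as observed.
p_value = np.mean(np.abs(boot_diffs) >= np.abs(obs_diff))
print(f"observed difference = {obs_diff:.4f}, bootstrap p-value = {p_value:.4f}")
```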
Table 4 shows the p-values for hypothesis tests assessing the equality of the measurements obtained with each method. For AUC and MSE, FA-LR-IS significantly outperforms ANN (p-value < 0.0001). Although ANN achieved better results in accuracy for Systems 1 and 2, the differences were not statistically significant. In contrast, FA-LR-IS significantly outperformed ANN in Systems 3 and 4. Notably, when FA-LR-IS was superior, the difference was statistically significant, whereas the advantage of ANN was not significant in the cases where it performed better.
4.2. Scalability Analysis
We have calculated the execution time for sample sizes of 50, 100, 250, 500, and 1000, simulated from System 4 shown in Figure 4. On the one hand, the FA-LR-IS algorithm was run on a 3.60-GHz Intel Core i5-8600K processor; on the other hand, the simulations for the ML methods were performed on a 2.20-GHz Intel Xeon processor. The simulation results are shown in Table 5. We observe that the execution time of all the methods grows with the sample size, with the FA-LR-IS algorithm standing out: the cross-validation method used for bandwidth selection makes its execution time increase sharply. However, this algorithm yields the best goodness of fit, as can be seen from the AUC, MSE, and accuracy results for System 4 shown in Table 1, Table 2 and Table 3. As for the ML techniques, the ANN algorithm requires longer execution times than RF and KNN, the latter being the one that yields the smallest execution time and the worst results for the AUC, MSE, and accuracy measures. The KNN method is sensitive to data scaling; its low execution times are partly due to the variables being normalized beforehand. In this context, we can assert that, for System 4, the models that take the longest to train are those that yield the best goodness-of-fit results. However, the recorded execution times of the different methods are not directly comparable, as they were not obtained under the same conditions. The ML techniques were implemented in Python 3.10.12, with optimized functions that have been tested by a large team of professionals. In contrast, the FA-LR-IS method was implemented in R-4.4.2, using functions developed by the authors that are not fully optimized for computational efficiency; additionally, this software has more limited resources for performing complex calculations with large datasets. In this regard, we propose the creation of an R package in the short term, in which these functions will be optimized. The package will be made available on the CRAN repository and on our personal website, www.reliastat.com.
4.3. First Real Case Study: A Water Pump Sensor Monitoring System
We analyze a dataset concerning an industrial structure; in particular, we have performance measurements of a water pump installed in a small area. The dataset was sourced from the data platform www.kaggle.com (accessed on 1 February 2024), and a statistical analysis of these data is presented in [41]. The sample information was collected by a set of sensors monitoring various components of the water pump over time. Specifically, 50 sensors measured parameters such as temperature, pressure, vibration, load capacity, volume, and flow density, among others, every minute from April 1st, 2018, to August 31st, 2018. In total, there are 220,320 observations. Limited information is available regarding the behavior of the sensors, but according to the study of one of the experts who analyzed the data (available on the data website), the intermediate group of sensors (sensor16–sensor36) corresponds to the performance of two impellers, while the first 14 sensors monitor aspects related to the engine.
The data consist of a longitudinal follow-up of a single system, with observations taken at one-minute intervals between consecutive data points. To ensure that the systems in the sample are independent, we did not use all the available records; instead, we increased the time gap between the data points we analyze. In [1], we considered a small sample with all records taken in the first minute of every day and focused the study on building a ranking of components in terms of the effect that changes in each component had on the reliability of the system. In this case, we consider data sampled at one-hour intervals, resulting in a considerably smaller sample, and we applied the same algorithms used in the previous simulations. Specifically, we implemented the FA-LR-IS algorithm following this procedure:
We split the sample into training (80%) and test (20%) sets (Step 1).
To train the model, we proceeded as follows:
- (a)
Data normalization: since the scales of the sensor measurements vary, it was necessary to normalize the data to avoid the influence of variables with larger scales and ensure comparability.
- (b)
Factor analysis: this step involved:
- i.
Examine possible correlations. The correlation matrix with all variables is shown in Figure 8. Some sensor groups, such as the intermediate group, show high positive correlations within the group.
- ii.
Determine the appropriate number of factors. Using the R package psych [42], we determined the number of factors to extract, based on a scree plot (Figure 9).
- iii.
Conduct the factor analysis. We used the fa function from the psych package to perform an exploratory factor analysis of latent variables using maximum likelihood. The correlation matrix was decomposed into eigenvalues and eigenvectors, estimating the commonalities for each variable across the first five factors. Factor loadings and interfactor correlations were also obtained.
- (c)
Local-logistic estimation: We then fitted a local-logistic model in the space of the first five factors and back-transformed the results to the original feature space. The bandwidth parameter was estimated through cross-validation. We now had a model that predicts the probability that the machine is functioning (the reliability function) based on the sensor values.
- (d)
Classification: After estimating the probabilities with the logistic regression model, we translated these probabilities into classes or categories. For this, we used the classification obtained via the optimal cut-off point of the ROC curve; see (4).
Test the model:
- (a)
Normalize the data.
- (b)
Transform the data to a reduced set.
- (c)
Local-logistic estimation. We applied the same model used to predict the reliability function to the test dataset.
- (d)
Classification. We classified the observations in the test set using the same method as with the training set.
Metrics: All previous steps generated results to compute error metrics on the training dataset. We then calculated the same error metrics using the test dataset, fitting the local-logistic model with the bandwidth obtained in Step 2(c) from the training data. The test set was classified using the model obtained with the training set.
In addition to the proposed algorithm, we also executed the ANN, KNN, and RF algorithms using the same sample split and parameters as in the simulations.
Table 6 shows the error metrics calculated on the test dataset. We observe that the KNN algorithm yields the highest specificity, accuracy, TPV, and F1-Score, while ANN shows the highest sensitivity and the same accuracy and F1-Score as KNN. The proposed algorithm shows high and competitive values comparable to the other ML algorithms, except for specificity, where it underperforms relative to the other methods.
4.4. A Second Real Case Study: Condition Monitoring of Hydraulic Systems
We analyze a dataset containing information collected from a hydraulic system. The dataset is available on the website
archive.ics.uci.edu (accessed on 20 September 2024). The data contain useful information for developing predictive models to detect faults early, contributing to the safety and operational efficiency of these systems. These data have been analyzed in several works, exploring the use of advanced data analysis techniques, such as multivariate statistics, sensor fault compensation, and automatic feature extraction, to improve condition monitoring in complex hydraulic systems [43,44,45].
The data were collected using a hydraulic test rig. This rig features two circuits, a primary working circuit and a secondary cooling-filtration circuit, linked through an oil tank. The system performs cyclic load tests of 60 s, during which it records process parameters such as pressures, flow rates, and temperatures. Simultaneously, the conditions of four key hydraulic components (cooler, valve, pump, and accumulator) are varied in a controlled manner. A total of 2205 observations are recorded, involving 17 sensors. The target condition value recorded was the “stable flag”: 1 if conditions were stable and 0 if static conditions might not have been reached yet.
We have also applied to this dataset the ML algorithms considered previously, i.e., ANN, KNN, and RF, as well as our FA-LR-IS algorithm. The main results are summarized in the following:
We split the sample into training (80%) and test (20%) sets (Step 1).
To train the model, we proceeded as follows:
- (a)
Data normalization.
- (b)
Factor analysis: This step involved the following:
- i.
Examine possible correlations. The correlation that exists between the variables is remarkable, as can be seen in Figure 10.
- ii.
Determine the appropriate number of factors to extract, based on a scree plot (Figure 11).
- iii.
Conduct the factor analysis.
- (c)
Local-logistic estimation. We then fitted a local-logistic model in the space of the first three factors and back-transformed the results to the original feature space. The bandwidth parameter was estimated through cross-validation.
- (d)
Classification. After estimating the probabilities with the logistic regression model, we translated these probabilities into classes or categories via the optimal cut-off point of the ROC curve; see (4).
Test the model: do steps 3(a–d) as in the previous example.
Metrics: We calculated the error metrics using the test dataset, fitting the local-logistic model and applying the bandwidth obtained in Step 2(c) on the training data. The test set was classified using the model obtained with the training set.
Finally, we executed the ANN, KNN, and RF algorithms using the same sample split.
Table 7 shows the error metrics calculated on the test dataset. We observe that, in this case, the RF algorithm yields the highest values in almost all metrics, followed by the KNN algorithm. The FA-LR-IS algorithm shows high values comparable to the other ML algorithms, proving to be a strong competitor.
5. Discussion
One significant limitation of our study is the inability to train a deep learning model for comparison with our custom statistical model. While deep learning models typically outperform ANNs, they require substantial datasets to learn effectively [46]. Unfortunately, our datasets are insufficient in size to support the training of such a model. Additionally, the computational resources necessary for training deep learning models are beyond our current capabilities. The need for multiple data sources, such as labeled and unlabeled data in semi-supervised learning, adds another layer of complexity to determining the right dataset size. This requires careful consideration of how to balance different types of data [47]. Designers must also consider performance targets, collection costs, and penalties for failing to meet these targets, which complicates the decision-making process regarding dataset size. Training large models with very large datasets can take months of computational power [48], although several strategies have been proposed to curtail the trial-and-error time [49].
A key advantage of classical statistical models over ML models is interpretability. While ML models often operate as black boxes, classical statistical models are explicitly defined through mathematical equations, which are more objectively quantifiable.
Although some ML methods attempt to overcome this limitation using procedures like SHAP for all ML models or MDI for random forests, these methods require fine-tuning numerous hyperparameters to achieve optimal performance. In contrast, in our probabilistic approach, implemented through the FA-LR-IS algorithm, the only tuning parameter is the bandwidth or smoothing parameter, which is determined via cross-validation, making it entirely data-driven and not sensitive to user-selected options. Our statistical methodology is fully non-parametric, allowing it to be flexible enough to capture complex relationships among variables, such as non-linearity and interactions. That being said, we acknowledge the high predictive power of ML methods, which operate with lower computational complexity even with large amounts of data compared with our statistical model, as demonstrated in Table 5. For this reason, one of our future research directions is to propose a hybrid method that combines the computational power of ML methods with the objectivity and clear formulation of a statistical model.
Table 8 summarizes the strengths and weaknesses of each method.
An interesting avenue for future research could involve measuring the performance of FA-LR-IS against a deep learning model over time. Based on the simulations, we hypothesize that our statistical model will initially outperform deep learning models in scenarios where the system is new and data availability is limited, due to its lower dependency on large volumes of data. However, as the system matures and accumulates more data, the deep learning model may exhibit superior performance.
We aim to explore several questions in future work: At what point does this shift occur? How does it differ from system to system? Is the improvement worth the cost of deploying a deep learning model?
There are also models of ANNs that allow for feedback loops, known as recurrent neural networks (RNNs). While RNNs have been less influential than feedforward networks, partly because their learning algorithms are so far less powerful, they are still extremely interesting. RNNs more closely resemble the way our brains operate, and they may be capable of solving important problems that feedforward networks can only tackle with great difficulty. RNNs are more effective for large and complex datasets, while classical methods are better suited for smaller, univariate datasets. RNNs consistently outperform ARIMA and exponential smoothing in terms of accuracy, especially for seasonal time series, as demonstrated in various studies; they exhibit lower Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) compared with classical methods, indicating superior forecasting capabilities. RNNs are particularly advantageous for long-term forecasts, whereas ARIMA may perform better in short-term predictions [50,51].
6. Conclusions
In this paper, we refine a previously introduced probabilistic algorithm to align with the standard ML framework of training and testing. This refinement allows us to systematically evaluate its performance, showing that it outperforms leading ML methods such as RF and ANN in the following sense: unlike black-box ML models, our probabilistic approach is fully interpretable, providing meaningful insights into the system state and quantifying individual component importance. By combining the procedural strengths of ML with the transparency of probabilistic modeling, we present a robust, hybrid algorithm that is both powerful and practical for engineering applications. To apply ML techniques, it is essential to carry out a thorough analysis of the data beforehand and to reflect on which approach would be most appropriate, whether using ANN, KNN, or another algorithm. Additionally, we explore the use of ML methods like RF for reliability estimation, an area traditionally dominated by probabilistic parametric approaches, and conduct an extensive comparison of classical statistical methods versus ML techniques in the reliability context.
To illustrate the benefits of our method, we carried out a simulation study and applied it to two real datasets. In all instances, the FA-LR-IS algorithm yielded favorable results regarding model accuracy, making it a strong competitor to any ML method. The algorithm is even able to identify internal dependency structures in the system. In summary, the FA-LR-IS algorithm can be considered a versatile and effective option when there is uncertainty about which method to use, acting as a general solution in such situations.
In the simulations, KNN and RF have been shown to offer less accurate estimations of the reliability of the system, providing considerably poorer results than the FA-LR-IS algorithm and the ANN. This is because these estimates are only averages and are not formulated with a mathematical expression that depends on the state of the system.
This study suggests that the FA-LR-IS model is not intended to be a definitive solution but rather an intermediate step while better resources are being developed. This idea is reinforced by the conclusion that there is no single method to solve all reliability estimation problems. The FA-LR-IS is presented as a versatile and effective solution in scenarios with uncertainty, aligning with the need to use diverse tools depending on the context.
Future work will focus on investigating the flexibility of the FA-LR-IS algorithm in quantifying the localized effects of specific units on system performance, comparing these findings with the machine learning methodologies discussed in this paper, as well as other methods. It is also proposed to extend machine learning techniques, such as random forests (RFs), to address the problem of reliability estimation while ensuring that the coherence conditions of the system are met. In this context, the approach will focus on analyzing methods to optimize the selection of input samples for tree growth. Additionally, the study will explore more accurate methods for determining the optimal criteria for splitting the nodes of the regression trees used in building the random forest model. A measure will also be developed to quantify the impact of each individual component on the overall system performance, allowing for a ranking of components within the system structure. Managing the weakest areas would help prevent failures that result in significant economic losses. This approach would facilitate the design of preventive maintenance policies aimed at mitigating the probability of failures originating in the most vulnerable components.