This study aimed to establish a predictive maintenance policy by analyzing operational data in a real-world scenario involving power transformers operated by a Chilean energy distribution company. To this end, data were collected from four databases (transformer details, external tests, operational data, and intervention data), which were processed and merged into a single dataset for this case study.
As context, the company relied on its manual solution, requiring external input at every step. Their method initially generated a list of potential solutions with their respective Weibull parameters, covariate weights, and flags to indicate the feasibility of the solution. A user then selected the most appropriate solution from this list. The process was repeated if no feasible options were available.
The proposed framework aimed to streamline that process by integrating the covariate selection, weight estimation, and band estimation steps to obtain a feasible solution without requiring an initial user decision. Two analyses were conducted to test this approach’s effectiveness. The first evaluated the quality of the decision policy derived from a preliminary solution and from the optimal log-likelihood solution obtained in previous studies; a sensitivity analysis was also conducted to check the impacts on the resultant maintenance policy. The second focused on methods for selecting the optimal covariates, examining how they affected the decision policy and comparing them with the previous results.
4.2. Analysis 1: Covariate Weight Sensitivity Analysis
The first analysis aimed to assess the performance of the covariate weight and band estimation process relative to the preliminary solution shown in
Table 3, exploring the impact of these covariate weights on the decision policy. Additionally, it investigated whether the optimization parameters identified as optimal in Ref. [
3] could enhance the accuracy of the decision rule. Furthermore, a sensitivity analysis was conducted on the latter solution to examine whether minor modifications to the optimization parameters could lead to variations in overall reliability metrics.
A preliminary sensitivity analysis was conducted to determine the optimal GA hyper-parameters. By changing the values of the gene population (from 50 to 500 in steps of 50), crossover probability (from 0 to 1 in steps of 0.1), and mutation probability (from 0 to 1 in steps of 0.1), approximately 4000 cases were created. An extract of these cases is shown in
Table 4, listing the best and worst cases. The optimal results indicated a population of 500 genes, a crossover probability of one, an offspring gene distribution of 0.1, and a mutation probability of 0.7; each value in the offspring underwent a Gaussian additive mutation with a mean of zero, a standard deviation of one, and an independent mutation probability of 0.3. Finally, 1000 generations were taken into account.
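The Gaussian additive mutation operator described above can be sketched as follows. This is a minimal illustration (the function name and parent values are ours), not the custom GA implementation used in the study:

```python
import random

def gaussian_mutation(individual, mu=0.0, sigma=1.0, indpb=0.3):
    """Additive Gaussian mutation: each gene mutates independently with
    probability indpb by adding noise drawn from N(mu, sigma)."""
    return [gene + random.gauss(mu, sigma) if random.random() < indpb else gene
            for gene in individual]

parent = [0.5, 1.2, -0.3, 0.8]  # illustrative covariate-weight chromosome
child = gaussian_mutation(parent)
```

The operator preserves the chromosome length and perturbs, on average, 30% of the genes per application, matching the independent probability of 0.3 reported above.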
A total of eight cases were defined for the analysis. Case 1 corresponded to the original solution provided by the Chilean company, as detailed in
Table 3. Case 2 represented the optimal solution identified based on the partial log-likelihood score from [
3], utilizing the optimization parameters IPOPT, NS transformation, and IPCRidge. The remaining cases involved sensitivity analyses applied to the optimization parameters of case 2.
Table 5 outlines the optimization parameters, log-likelihood scores, Weibull parameters, and resulting covariate weights used for each solution.
In Table 5, the Solver column indicates the optimization methodology: fmincon for the original solver, IPOPT for the traditional solver, and GA for genetic algorithms. The Scaler column indicates the type of scaler: NS using the custom solution transformation and MinMax employing scaling with values between zero and one. The Bounds column specifies the technique for establishing covariate weight constraints: Fixed for bounds set in the custom software based on expert criteria and IPCR using the proposed method for bound determination. The LL column represents the partial log-likelihood score from Equation (3), where a lower value technically indicates a better solution. Lastly, the Weibull parameters and covariate weights are displayed in the remaining columns.
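Since Equation (3) is not reproduced here, the score can be sketched in its generic Cox-style form, i.e., a negative partial log-likelihood where lower values indicate a better fit. This is one common form; the paper’s exact expression may differ:

```python
import math

def neg_partial_loglik(times, events, risks):
    """Cox-style negative partial log-likelihood.
    risks[i] is the covariate risk factor of unit i (e.g., exp(gamma . z_i));
    events[i] is 1 for a failure and 0 for a censored observation.
    Lower values indicate a better fit."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    nll = 0.0
    for idx, i in enumerate(order):
        if events[i]:  # only failures contribute a term
            risk_set = sum(risks[j] for j in order[idx:])  # units still at risk
            nll -= math.log(risks[i] / risk_set)
    return nll
```

With equal risks and two failures among three units, the score reduces to log 3 + log 2, which is a quick sanity check for the risk-set logic.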
Here, case 1 stood out as having the worst LL score, while case 2 achieved the best score, although it remained to be seen whether this also translated into a good maintenance policy. Furthermore, the cases that utilized the IPOPT solver outperformed those that used the GA solver. This suggested that the solver method significantly impacted the quality of the solution, as evidenced by similar LL scores and Weibull parameter values.
One noteworthy finding was that varying the bound determination method produced nearly identical results in terms of covariate weights and LL score. This was evident in cases 3, 5, and 8, as well as cases 4 and 6. Consequently, the total number of cases was streamlined to five: cases 1, 2, and 7 were designated as M1, M2, and M3, respectively. Cases 3, 5, and 8 were consolidated as M4, while cases 4 and 6 were represented by M5.
Table 6 shows these modifications.
Before proceeding with the analysis, it is essential to highlight a specific modification applied exclusively to the cases using the MinMax scaler (M2 and M3). For these cases, the calculated covariate weight values were very small in magnitude, which resulted in centroids with matching magnitudes. Consequently, when assessing the risk using Equation (1), the exponential covariate term became close to one, implying that the conditional reliability was not penalized by the covariates and resulting in curves closely aligned with the reliability computed solely from the Weibull parameters.
An example illustrating this can be seen in
Figure 2. Here, the blue curve represents the estimated reliability using only the Weibull parameters (without any covariates). The conditional reliability curves are overlaid on this plot, indicating no penalty when considering the covariates. To address this issue, the data points were not scaled after obtaining the covariate weights.
Table 7 specifies where this criterion was applied.
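The effect just described can be illustrated with a minimal Weibull PHM reliability sketch. This is one common form of the model; the paper’s Equation (1), and all parameter and covariate values below, are illustrative assumptions:

```python
import math

def phm_reliability(t, beta, eta, gamma, z):
    """Weibull PHM conditional reliability in one common form:
    R(t | z) = exp(-(t / eta)^beta * exp(gamma . z))."""
    link = math.exp(sum(g * zi for g, zi in zip(gamma, z)))
    return math.exp(-((t / eta) ** beta) * link)

# With near-zero covariate weights, the link term exp(gamma . z) tends to one,
# so the conditional curve collapses onto the baseline Weibull reliability.
baseline = phm_reliability(10_000, beta=2.0, eta=30_000, gamma=[0.0, 0.0], z=[5.0, 3.0])
penalized = phm_reliability(10_000, beta=2.0, eta=30_000, gamma=[0.1, 0.2], z=[5.0, 3.0])
```

The gap between the two values is exactly the covariate penalty that vanishes when the estimated weights are nearly zero, reproducing the overlapping curves of Figure 2.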
4.2.1. Reliability Metrics
The results in the following figures are labeled using this format: Case MX: Solver Approach–Scaler Method–Bounds Determination. Figure 3 and Table 8 display the cluster boundaries and centroids produced for the five cases. It is evident that case M5 resembled the reference case, case M1. Conversely, case M2 displayed a noticeably different data segmentation due to its optimization parameters. The remaining cases exhibited comparable results to case M1.
Figure 4 shows considerable dispersion among the conditional reliability states in cases M2 and M3, with state 3 in case M2 incurring a more significant penalty. Two factors could contribute to this outcome. Firstly, a lower weight was assigned to one of the covariates in both cases, as seen in Table 5, while the other cases shared similar magnitudes. Secondly, the exclusive application of the MinMax scaler to these cases, followed by the non-scaling of the data points before entering them into the PHM, may also have affected these results.
Comparing the results from the GA and IPOPT solvers revealed significant differences in conditional reliability results. The GA tended to produce more optimistic reliability curves, leading to less similarity with the reference case. This discrepancy suggested a potential non-optimal fit in this analysis. Further investigation underscored the crucial role of the scaler in determining the state dispersion and curve shapes, as evidenced in cases M1, M4, and M5.
Overall, the dispersion of the reliability curves appeared to be influenced by the choice of the scaler, while the length of the conditional reliability seemed to depend on the optimization method used. Similar trends were observed for the RUL in
Figure 5.
Table 9 presents the optimal time values derived from the cost function, confirming these trends. The case most closely aligned with the reference case was M2, where states 2 and 3 displayed values that were very similar. This suggested that combining IPOPT with the NS transformation could result in policies similar to those of case M1. Additionally, all the generated cases tended to show times greater than those of case M1.
Finally,
Figure 6 shows the decision policies for all cases. In the reference case M1, power transformer 13 required immediate intervention at the 20,000 h mark, as it surpassed the worst conditional warning-limit graph. Similarly, transformers 14 and 15 were in the caution zone approximately at the 18,000 h and 20,000 h marks, indicating that preparations for their replacement should be made soon. The remaining transformers did not require intervention, as they were operating correctly. These insights allowed for predictive planning of interventions.
Case M2, despite being the best solution in terms of log-likelihood, did not generate a reasonable policy compared to case M1. It suggested that all transformers be replaced at around the 5000 h mark, resulting in a very aggressive policy. Similar outcomes were observed in case M5, where the policy suggested replacing all equipment after just 2000 h of use. Case M3 performed the worst, recommending immediate replacement as soon as the equipment was put into operation.
While most configurations failed to generate satisfactory policies, case M4 produced a feasible policy, proposing equipment intervention at the 22,000 h mark, which suggested that transformer 13 was nearing the point of needing intervention. Although case M4 did not perfectly match the reference case, it demonstrated that the IPOPT, NS, and IPCRidge configuration yielded the closest results. This showed the framework’s capability to produce feasible policies when the correct optimization parameters were applied.
4.2.2. Results Discussion
Considering all the results, it is evident that the anticipated similarities between cases M2 and M1 did not materialize. A key contributing factor was the lower weight assigned to one influential covariate, as shown in Table 5. This lower weight resulted in a significant divergence in the data composition, as observed in Figure 3, which was also reflected in case M3. These findings highlighted the significant influence of the covariate weights on the resulting policies.
Conversely, preliminary findings indicated that imposing constraints on covariate domains, whether fixed, boundless, or with IPCRidge boundaries, prior to optimizing the weights, had minimal impact on the results. This was evident in the original cases 3, 5, and 8, where identical Weibull parameters and covariate weights were obtained.
Furthermore, a sensitivity analysis conducted on case M2 provided crucial insights into the impact of scaling methods on reliability metrics. Specifically, cases using the NS transformation exhibited dispersion in state curves similar to case M1, whereas cases utilizing the MinMax scaler showed greater dispersion, leading to increased penalization for each state.
The results also indicated that the use of the GA did not yield satisfactory outcomes across all reliability metrics. This was surprising given that the Weibull parameters in Table 5 and the cluster results in Figure 3 closely resembled the reference case. However, the weights assigned to the most influential covariate were not close to those obtained in case M1, reinforcing its high significance to the PHM.
Additionally, policies employing IPOPT as the solver method and the NS transformation for data scaling most closely resembled the reference case, as demonstrated in case M4.
Overall, the sensitivity analysis was necessary to reveal that achieving a superior log-likelihood score does not guarantee a feasible maintenance strategy.
Finally, this analysis emphasized the importance of covariate weights in shaping the final decision policy and highlighted the need to accurately test and estimate covariate weights and state bands. The key question now is whether the selected covariates in this analysis were optimal or if superior alternatives existed. This is explored in subsequent analyses.
4.3. Analysis 2: Covariate Selection Sensitivity
In the second analysis, the objective was to evaluate the framework’s covariate selection process and assess how these selections impacted the final decision policy. Consequently, all the vital signals listed in
Table 1 were considered. Additionally, based on the results from the preceding analysis, the IPOPT optimization engine, the NS scaler, and the IPCRidge bound technique were employed in this case study.
To initiate the analysis, a model including all the covariates was constructed to determine their respective weights. However, to gain a deeper understanding of how the weights varied with different covariate combinations, their weights were calculated for every possible combination, and their median values were computed (this approach helped to mitigate biases from extreme weight values compared to using the average). The results are presented in
Table 10, offering a comprehensive analysis across various covariate combinations. For example, out of the 15 total covariates, when considering only 2 covariates, a total of C(15, 2) = 105 combinations was calculated. Then, the median values for each covariate were obtained from these cases.
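The combination sweep and per-covariate medians described above can be sketched as follows. The covariate labels and the weight-estimation callback are stand-ins, not the framework’s actual routine:

```python
from itertools import combinations
from statistics import median
import math

# hypothetical labels standing in for the 15 vital signals of Table 1
covariates = [f"cov{i}" for i in range(1, 16)]
assert math.comb(15, 2) == 105  # number of 2-covariate models

def median_weights(fit_weights, k):
    """Fit every k-covariate subset and return each covariate's median weight.
    fit_weights(subset) -> {covariate: weight} is a placeholder for the
    framework's weight-estimation step."""
    collected = {c: [] for c in covariates}
    for subset in combinations(covariates, k):
        for c, w in fit_weights(subset).items():
            collected[c].append(w)
    return {c: median(ws) for c, ws in collected.items() if ws}
```

Using the median rather than the mean, as the text notes, dampens the influence of extreme weight values from any single subset fit.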
As evident from
Table 10, the covariates Tint, DifT, C2H4, TGC, R.Die, and % Hum consistently demonstrated substantial weight values across all scenarios. This suggested that a more parsimonious model could potentially include only these covariates without compromising the log-likelihood score. In contrast, the covariates CH4, GTF, and TGC-CO initially exhibited high weight values; however, their weights decreased significantly when more than nine covariates were considered.
To evaluate the consistency of the covariates, a K-fold cross-validation analysis was applied. The underlying principle was to vary the dataset used for estimating the covariate weights in each fold. Thus, a covariate demonstrating a consistent and relatively low coefficient of variation (CV) compared to others was a stronger candidate for inclusion in the final model.
In this analysis, 5 and 10 folds were utilized, and the results, including the average, standard deviation, and coefficient of variation, are presented in
Table 11 and
Table 12. To enhance readability, the tables exclude the covariates CH4, GTF, and TGC-CO, which exhibited extremely high CV values.
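The consistency measure used here can be sketched directly; the per-fold weight values below are illustrative, not the study’s results:

```python
from statistics import mean, stdev

def coefficient_of_variation(fold_weights):
    """CV of one covariate's weight across K folds; a lower value means the
    weight estimate is more consistent and the covariate a better candidate."""
    return stdev(fold_weights) / abs(mean(fold_weights))

# illustrative per-fold estimates for a stable and an erratic covariate
stable = coefficient_of_variation([0.42, 0.40, 0.44, 0.41, 0.43])
erratic = coefficient_of_variation([0.42, 0.05, 0.90, 0.10, 0.60])
```

Dividing by the absolute mean makes the CV scale-free, so weights of different magnitudes can be compared on an equal footing across covariates.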
The covariates C2H4, R.Die, and % Hum demonstrated relatively lower CVs compared to the other covariates. This suggested that the weights assigned to C2H4 and R.Die were more consistent across all models. While % Hum had a higher CV than the aforementioned covariates, its CV remained significantly lower compared to the rest. Consequently, based on this analysis, C2H4, R.Die, and % Hum should be prioritized when selecting the final model from the framework. Moreover, these covariates also received substantial weights in
Table 10 when considering a set of 15 covariates. This may indicate that using the model with all covariates could allow us to assess the importance of each covariate, aiding decision-making in covariate selection.
To extend the analysis, a focused examination was conducted on the impact of each covariate on the log-likelihood score, as presented in
Table 13. Interestingly, C2H4 significantly influenced the log-likelihood score. Results for TGC and % Hum are not included, as there was no solution when considering each separately.
To check if these results were significant, three tests were taken into account to assess the significance of the covariates on the reduction in the log-likelihood score: the chi-squared test for the difference in the LL scores, the LRT, and Akaike values, as shown in
Table 14.
The principal findings of this test indicated that adding any covariates to the model led to a significant improvement in the reduction in the LL score. Furthermore, the chi-squared test revealed that the covariate producing significant changes in the model was C2H4 (p ≤ 0.05), a finding supported by the lowest Akaike value when this covariate was introduced in the model. Additionally, the LRT test indicated that % Hum could also introduce significant changes (p ≤ 0.05). Consequently, models considering only C2H4 and both C2H4 and % Hum were used for further testing.
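The likelihood-ratio test for a single added covariate can be sketched with the standard library alone; this assumes the reported LL score is a negative partial log-likelihood (lower is better), and the scores below are illustrative, not the paper’s values:

```python
import math

def lrt_pvalue_1df(nll_restricted, nll_full):
    """Likelihood-ratio test for adding one covariate (1 degree of freedom).
    D = 2 * (NLL_restricted - NLL_full); for df = 1 the chi-squared survival
    function reduces exactly to erfc(sqrt(D / 2))."""
    d = 2.0 * (nll_restricted - nll_full)
    return math.erfc(math.sqrt(d / 2.0))

# illustrative scores: adding the covariate lowers the NLL by 5
p = lrt_pvalue_1df(nll_restricted=120.0, nll_full=115.0)
significant = p <= 0.05
```

For more than one added covariate the general chi-squared survival function is needed (e.g., `scipy.stats.chi2.sf`), but the one-parameter case covers the per-covariate tests described here.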
Continuing with the covariate selection process, a correlation analysis was applied to identify and exclude correlated features that contributed minimally to the model.
Figure 7 presents the
p-values from the Pearson correlation test, where a significance level below 5% suggests correlations between features (
p-value ≤ 0.05). The results revealed that multiple covariates exhibited a high degree of correlation, which could be addressed by removing certain features. The key question was which features should be removed. This decision was crucial since various combinations of feature removal were possible. For instance, consider the temperature covariates: Tint and DifT were correlated because DifT represented the difference between Tint and Text. One possible solution could be to remove these two covariates and use only Text. However, based on the previous analysis presented in
Table 10, both Tint and DifT had higher covariate weights than Text, making this selection less advisable. Alternatively, should Tint or DifT be removed from the model? To address this and similar cases objectively, features with high CVs were removed until no correlation was detected; in other words, covariates with smaller CVs were prioritized to remain in the model. Using this criterion, DifT, C2H4, R.Die, and % Hum formed the combination of features without any correlation, as shown in
Figure 8, and were considered for the final model.
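The CV-prioritized pruning rule can be sketched as a greedy loop. As a simplification, correlation is flagged here with a threshold on |Pearson r| rather than the p-value test used in the paper, and the covariate names, data, and CV values are illustrative:

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def prune_correlated(data, cv, r_threshold=0.6):
    """Drop, from each correlated pair, the covariate with the higher CV,
    so low-CV covariates are prioritized to remain in the model."""
    keep = sorted(data, key=lambda c: cv[c])  # lowest CV first
    removed = set()
    for i, a in enumerate(keep):
        if a in removed:
            continue
        for b in keep[i + 1:]:
            if b not in removed and abs(pearson_r(data[a], data[b])) >= r_threshold:
                removed.add(b)  # b has the higher CV of the pair
    return [c for c in keep if c not in removed]
```

Processing covariates in ascending CV order guarantees that, within any correlated pair, the less stable covariate is always the one removed, which resolves the Tint-versus-DifT type of ambiguity objectively.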
Considering all the analyses presented thus far,
Table 15 displays all the models for comparison, sorted by the LL score.
Case 0 represents the model with no covariates. Case A includes only C2H4 as a covariate due to its significance in reducing the LL score, as indicated by the chi-squared test and Akaike value. Case B is based on covariate selection using expert criteria. Case C comprises covariates showing high significance according to the log-likelihood ratio test. Case D includes covariates with substantial weights from
Table 10. Case E represents covariates identified through the correlation and CV analysis. Lastly, case F is the model including all covariates (Dda, Text, IFS, CH4, GTF, CO2, Co, and TGC-CO are considered but not displayed for table readability).
The table shows that all the compared models (A–F) had much lower LL scores than case 0, meaning they fit the data better and could potentially make more accurate predictions. However, there was a trade-off: while adding extra features improved the LL score, it also tended to produce small shifts in the estimated Weibull parameter values.
Models A, C, and E were particularly interesting. They had the lowest AIC and BIC scores. These scores could be explained by AIC and BIC penalizing models for being too complex. It is important to note that even though model A had the absolute best AIC and BIC scores, its LL score was slightly higher than models C and E. This highlights the challenge of finding an accurate model that is not overly complex.
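The complexity penalty that separates AIC from BIC can be made concrete with their standard definitions, where NLL is the negative log-likelihood, k the parameter count, and n the sample size; all numbers below are illustrative, not the paper’s scores:

```python
import math

def aic(nll, k):
    """Akaike information criterion: 2k + 2 * NLL (lower is better)."""
    return 2 * k + 2 * nll

def bic(nll, k, n):
    """Bayesian information criterion: k * ln(n) + 2 * NLL; penalizes extra
    parameters more heavily than AIC once n exceeds e^2 (about 7.4)."""
    return k * math.log(n) + 2 * nll

# a simpler model with a slightly worse fit can still win on AIC
simple = aic(116.0, k=3)    # fewer covariates, higher NLL
complex_ = aic(115.5, k=6)  # more covariates, lower NLL
```

This is exactly the tension noted above: model A’s slightly worse LL score can coexist with the best AIC and BIC because the criteria trade fit against parameter count.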
4.3.1. Decision Rule Performance Comparison
The maintenance policies derived from the models outlined in
Table 15 are illustrated in
Figure 9. Model A, employing a single covariate, exhibited a notably aggressive maintenance policy, suggesting immediate intervention for all equipment surpassing 4500 h. Similarly, model C, incorporating two covariates, suggested intervention for all transformers. However, a notable enhancement was observed with the addition of more covariates, extending the intervention threshold to around 7000 h.
Models D, E, and F presented nearly identical policies, marked by significant improvements over previous models, suggesting intervention after 15,000 h of operation. The primary difference among these models lay in the number of selected covariates, indicating initially that as the model incorporated more covariates, it became less sensitive to variations induced by these factors. Furthermore, these models proposed a more conservative approach compared to model B, the top performer in the previous analysis.
Upon closer examination, the policies derived from models D, E, and F bore resemblance to the reference case depicted in
Figure 6a, with the primary difference being the dispersion of the obtained curves and a more cautious intervention policy. This shows the framework’s capability to generate effective policies even without expert criteria, proving its utility when dealing with data lacking such recommendations.
4.3.2. Results Discussion
Based on the analysis of both the covariate selection process and the resulting policies, several insightful findings emerge.
In examining the policies, the importance of covariate selection in shaping the decision rule for maintenance policy becomes evident. The number and choice of covariates significantly influence the approach to equipment intervention. For instance, in models A and C, a limited selection of covariates led to an overly aggressive policy, prompting unnecessary interventions. In contrast, selecting less appropriate covariates, as in model B, resulted in a more lenient policy, thereby increasing the risk of premature equipment failure.
In relation to the covariate selection procedure proposed by the framework, models D, E, and F emerged as interesting cases. Despite differences in their selection methods, these models yielded practically identical results. This could suggest that while the strategy for optimizing covariate weights in model E generated good results, it might not be necessary, as models with more covariates produced identical outcomes. However, a more detailed analysis revealed that models D and F included the covariates selected by model E. Therefore, it was inferred that these were the minimum covariates that must be considered to ensure adequate and feasible maintenance policies. Nevertheless, models D and F illustrated the substantial value provided by the covariate weight estimation process, as it appropriately assigned importance to the covariates to generate pertinent policies.
As observed in this analysis, a systematic approach to covariate selection can lead to building a more parsimonious model. In this instance, this was achieved through the combined use of cross-validation and the coefficient of variation for feature selection in the correlation analysis, yielding results similar to the reference case.
Furthermore, the results suggest that a valid strategy within the proposed framework is to select all covariates and let the optimization engine determine the relative importance of each. Notably, this approach demonstrated identical performance to model E and case 0.