1. Introduction
The role of wind turbine systems (WTSs) in decarbonizing the electrical grid has steadily increased in recent years [
1]. In 2020, the estimated global cumulative capacity of both onshore and offshore installations was over 743 GW. These numbers are expected to grow further in the quest to supply renewable and sustainable energy [
2]. A key factor to reduce the levelized cost of energy (LCOE) of wind power is to increase the performance and reliability of these systems [
3]. In this context, the implementation of preventive maintenance techniques for WTSs aims to reduce operating expenditures (OPEX) related to unexpected maintenance events, expected to be of critical importance in the case of offshore installations [
4]. These variable OPEX can account for 11–30% of the LCOE of onshore installations, up to 30% in offshore installations, and 20–25% of the total LCOE of wind power systems [
5].
The operation of WTSs depends on a multitude of elements, including external factors such as wind availability and grid stability. Hence, addressing health diagnostics and prognostics requirements in WTSs is a complex task that depends on system behavior, component degradation, and varying environmental conditions [
6]. Several issues may lead to system downtime, including mechanical, electrical, and connectivity failures. A breakdown of common WTS faults per component is presented by Liton Hossain et al. [
7]. In particular, pitch system failures may account for up to 20% of total turbine downtime [
8]. Determining the cause and identifying early signs of system degradation have proven to be key when developing adequate planning and scheduling of actions to maintain grid stability. Here, the development of prognostics and health management (PHM) frameworks designed for WTS operations can play a major role in deriving comprehensive maintenance policies and increasing system reliability [
9]. Complementing traditional reliability methods based on statistical failure event analysis, PHM leverages the collection and analysis of sensor monitoring data to provide diagnostics and prognostics models, aimed at detecting, localizing, and/or predicting future failures and faulty states.
Most WTSs use the supervisory control and data acquisition (SCADA) system for their monitoring data collection [
10,
11,
12]. SCADA is a computer-based system that gathers and processes monitoring data from multiple sensors. Even though the standard sampling rate of SCADA data is 1 s, the reported data correspond to average values over 10 min time intervals, essentially converting the data acquisition into low-frequency measurements [
13]. SCADA also records anomalous behavior or system failures detected by a built-in rule-based alarm system. However, due to the high number of false positive alerts, plus the noisy and intractable nature of the generated alarm logs, most researchers have focused on using either the low-frequency SCADA sensor measurements or additional component-specific local high-frequency measurements to develop data-driven diagnostics and prognostics models [
14,
15]. Indeed, few published works explicitly perform a joint analysis of sensor measurements, alarm logs, and maintenance records [
8,
14,
16]. This is exacerbated by the lack of standardized maintenance reporting procedures in the industry [
17,
18].
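As a minimal illustration of the averaging step described above (assuming a 1 Hz signal and non-overlapping 10 min windows; the function name is hypothetical):

```python
import numpy as np

def ten_minute_averages(signal_1hz):
    """Average a 1 Hz SCADA channel over non-overlapping 10 min
    (600-sample) windows, discarding any incomplete trailing window."""
    n_windows = len(signal_1hz) // 600
    return signal_1hz[: n_windows * 600].reshape(n_windows, 600).mean(axis=1)
```

This is the sense in which 1 s acquisition becomes a low-frequency measurement: each 10 min record summarizes 600 raw samples.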
The number of collected sensor signals varies from system to system depending on the WTS, manufacturer, and operators [
12,
18]. The available sensors and their data quality determine which anomaly detection and diagnostics models can be trained and how well they perform. As such, although most of these diagnostics models are based on data collected through the SCADA system, the data preprocessing used in different architectures is likely to converge to system-specific solutions. Hence, there is a need for a systematic preprocessing methodology that can be applied to different systems with no or only minor adjustments. In this regard, previous studies have addressed some of the global challenges presented by data collected from the SCADA system. For instance, [
15] analyzed highly imbalanced SCADA data with the purpose of designing a more accurate alarm system. Here, principal component analysis (PCA) is used to preprocess SCADA sensor data, oversampling techniques are implemented to address class imbalance, and time splits are employed to avoid class contamination.
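A minimal sketch of this kind of preprocessing pipeline is given below. It uses synthetic data and plain NumPy in place of the implementation in the cited work; the naive random oversampling stands in for the (unspecified) oversampling technique used there:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy SCADA-like data: 200 healthy rows, 10 faulty rows, 12 sensor channels.
X_h = rng.normal(0.0, 1.0, size=(200, 12))
X_f = rng.normal(1.5, 1.0, size=(10, 12))
X = np.vstack([X_h, X_f])
y = np.array([0] * 200 + [1] * 10)

# PCA via SVD on the standardized data (keep 4 principal components).
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:4].T                      # projected features

# Naive random oversampling of the minority (faulty) class.
idx_f = np.where(y == 1)[0]
extra = rng.choice(idx_f, size=(y == 0).sum() - len(idx_f), replace=True)
Z_bal = np.vstack([Z, Z[extra]])
y_bal = np.concatenate([y, y[extra]])
```

In practice, the PCA projection and the time-based splits must be fitted on training data only, so that no information leaks into the test set.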
In WTSs, fault detection and diagnostics tasks have mainly been addressed through three different approaches: model-based, signal-based, and knowledge-based methods [
14,
19]. In this case, model-based approaches refer to statistical or data-driven models (DDMs). Among these, machine learning (ML) and deep learning (DL) have become powerful alternatives to train diagnostic and prognostic models in the context of PHM frameworks. Popular ML models include support vector machines (SVM), k-nearest neighbors (k-NN), and random forest (RF) algorithms, and they are frequently used for diagnostics tasks in a variety of settings [
20,
21,
22,
23]. For instance, in [
15], a comparative analysis of fault diagnostics with k-NNs and SVM is presented, achieving F1 scores over 0.95 for both models after balancing the datasets for healthy and degraded classes. Furthermore, Stetco et al. [
23] presented a comprehensive review of ML models applied for both diagnostics and prognostics in WTSs. It was identified that, for diagnostics tasks, class imbalance and noisy features can hinder model performance, and significant attention must be given to the feature selection and reduction process. Deep learning models have also been implemented to take advantage of their hierarchical structure to extract abstract features from the available data. Chen et al. [
24] implemented an unsupervised anomaly detection model for WTSs based on long short-term memory (LSTM) autoencoders (AEs) using data from the SCADA system. The model is trained on data considered to correspond to normal operating states and is then evaluated on new, unseen data. Depending on the network’s reconstruction error, an adaptive threshold is defined to determine whether the system is operating under normal or anomalous conditions. Encalada-Davila et al. [
25] presented another anomaly detection method that predicts one of the sensor variables (i.e., the quantity of interest) from other selected variables, where a residual is defined as the difference between the sensor reading and the model’s predicted value. The model is trained on healthy data, defined as long operational periods where no failures were observed. A faulty state is then defined based on the model’s prediction error, where it is expected that faulty states will produce a higher prediction error than a healthy state.
Two important challenges are thus identified in the literature for WTS diagnostics models based on data from the SCADA system. On the one hand, most algorithms are focused on unsupervised or semi-supervised anomaly and fault detection models. This is due to the difficulties of acquiring robust and reliable labels from the system. It is observed that the number of failures is negligible with respect to the available nominal or healthy data (i.e., normal operation) [
8]. Therefore, new methodologies to acquire health state labels are required. On the other hand, the selection and implementation of ML and DL approaches have been shown to be difficult. SVM can be unstable when analyzing large multidimensional datasets, while RF tends to overfit over the training sets. DL models are highly complex and require large amounts of data to be trained. Determining the structure of a DL model is challenging given the high number of hyperparameters, and models tend to overfit. However, although ML models also tend to overfit, these may outperform more complex DL architectures under limited data regimes [
26]. In this regard, feature extraction and reduction techniques have proven key to analyzing smaller datasets.
In this context, quantum computing has been presented as a new computational paradigm, in which computations are performed based on two-state quantum systems, denoted as qubits, instead of traditional bits. Qubits allow quantum mechanical properties such as interference, entanglement, and superposition to be used in computation routines, in some cases obtaining exponential gains in terms of algorithmic complexity (the number of iterations required to perform a certain task). While quantum hardware is still in development, early quantum computers are becoming available to the general public through cloud computing services such as IBM’s Quantum Experience. Additionally, specialized software packages providing high-level APIs to develop quantum algorithms using traditional languages (e.g., Python) have also been released, such as Pennylane [
27], Qiskit [
28], or Cirq [
29]. With these two key developments, researchers and practitioners have been able to test algorithms designed when quantum computing was a theoretical field. A good example of this is Shor’s algorithm, proposed originally by Peter Shor in 1994 [
30], for efficiently computing the prime factors of integers, which has been recently further explored in the area of cryptography given its relevance as a way to break modern encryption techniques [
31].
Recently, the focus of quantum computing research has shifted toward three main areas. The first is the use of quantum computing to improve existing established algorithms, such as query search or decryption algorithms. Examples of this can be found in [
32,
33]. The second is the use of quantum properties to accelerate general optimization problems such as the quantum approximate optimization algorithm (QAOA), originally proposed in 2014 by Farhi et al. [
34] to solve combinatorial optimization problems. The third area is the use of quantum computing to either improve or accelerate ML models, where research has focused on two different topics: designing quantum circuits that can be identified as neural networks [
35] and developing quantum circuits that can be used as kernels for traditional algorithms such as SVMs [
36].
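To make the quantum kernel idea concrete: for simple feature maps such as amplitude encoding, the kernel value is the squared overlap between two encoded states, which can be simulated classically. The following is a minimal NumPy sketch of that idea, not the circuit-based implementation of the cited work:

```python
import numpy as np

def amplitude_kernel(x, z):
    """Fidelity kernel |<phi(x)|phi(z)>|^2 for amplitude encoding.

    Amplitude encoding maps a vector to the amplitudes of a quantum
    state (after normalization), so this kernel reduces to the squared
    normalized inner product and can be evaluated classically.
    """
    xn = x / np.linalg.norm(x)
    zn = z / np.linalg.norm(z)
    return float(np.dot(xn, zn) ** 2)

def gram_matrix(A, B):
    """Kernel (Gram) matrix between two collections of datapoints."""
    return np.array([[amplitude_kernel(a, b) for b in B] for a in A])
```

The resulting Gram matrix can then be passed to any kernel machine that accepts precomputed kernels, e.g., scikit-learn’s `SVC(kernel="precomputed")`. Richer feature maps entangle qubits and are not classically simulable at scale, which is where a potential quantum advantage is sought.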
Given this context, two research gaps are identified. Firstly, how to objectively select feature reduction techniques when training ML diagnostic and prognostic models based on SCADA data. Secondly, given the recent advances in quantum machine learning (QML) algorithms, how quantum kernels can be used for PHM purposes and how they compare to traditional ML techniques. This paper discusses the potential of quantum SVMs (Q-SVMs) for system prognostics through a WTS case study. Details on the preprocessing methodology for SCADA sensor data and alarm logs are presented to train a quantum-enabled prognostics model, which is compared with traditional ML algorithms. Here, special attention is given to the feature reduction process through PCA and deep AEs. Challenges, advantages, and prospects of using QML models in PHM are discussed.
The main contributions of this paper are the following:
Development and implementation of a quantum kernel-based fault prognostics model in WTSs;
A comparative analysis of PCA and AE as feature reduction tools;
A methodology to obtain health state labels based on SCADA alarm logs;
A comparison with traditional ML models used for classification tasks.
The remainder of the paper is structured as follows.
Section 2 presents a review of current research and challenges in the application of PHM to WTSs.
Section 3 presents a detailed introduction to QML and quantum kernels.
Section 4 describes in greater detail the WTS case study.
Section 5 presents the development of the proposed prognostics models employed in this study and the obtained results, comparing the performance between classical and quantum approaches. Finally,
Section 6 presents the main conclusions of this work.
2. Prognostics and Health Management in Wind Turbine Systems
Maintenance activities can be addressed through three different approaches: corrective, preventive, and condition-based. Corrective maintenance corresponds to a reactive approach, where components are repaired or replaced only after they have failed. Preventive maintenance is a more conservative strategy, where maintenance is performed before the component’s estimated failure time, frequently based on fixed schedules. Here, maintenance scheduling is based on a statistical study of the component’s failure behavior, ensuring that a certain percentage of failures are prevented. This approach significantly reduces the number of failures when compared to corrective maintenance; however, it is costly since it frequently results in unnecessary stoppages to perform maintenance on equipment that does not need it. In wind farms, such unprofitable stoppages are undesirable. In this regard, condition-based maintenance (CBM) uses information from monitoring data collected from sensor networks to infer the health state of the system. This is a dynamic and proactive approach that allows integrating the health state of the system into the optimization of maintenance policies. Integrating the CBM health assessment into the decision-making process is known as PHM.
Prognostics and health management is an approach derived from CBM, developed to aid the optimization of maintenance policies. PHM seeks to implement end-to-end frameworks that integrate sensor monitoring data into the decision-making processes, covering everything from data acquisition and preprocessing to the training of diagnostics and prognostics models. As shown in
Figure 1, most PHM frameworks are broadly divided into four different stages: data acquisition, data preprocessing, diagnostics and prognostics, and decision making [
37].
In the last decade, research works have focused on obtaining diagnostics and prognostics models to assess the system’s state of health. These models are traditionally physics-based models (PBMs), DDMs, or hybrid (i.e., a combination of PBM and DDM). On the one hand, PBMs are highly accurate and provide interpretability. However, they are rarely available to describe the degradation processes in complex systems. On the other hand, DDMs, such as ML and DL techniques, have gained interest since they do not require prior knowledge of the data or system under study and present great generalization capabilities. This comes at the cost of low interpretability, due to their black-box behavior, and lower prediction precision when compared to PBMs. ML applications can be adapted to study degradation processes at a local scale in components for which mathematical models of the physics of degradation are not available. These require highly precise and localized sensors, and such applications are common in additive manufacturing [
38]. Another approach considers the discovery of general degradation behavior from operational sensor measurements. This is more suitable for complex engineering systems (CESs), since knowledge of the degradation behavior is scarce and sensor networks are designed to monitor the operation of the asset rather than for diagnostic or prognostic purposes. In this case, extracting degradation data is a challenging task, since degradation can occur at any location in the system and not necessarily where the sensors are placed. Sensor networks are also designed to simultaneously monitor several components; thus, the resulting diagnostic models usually focus on system-level degradation rather than local phenomena. Furthermore, hardware development in the last decade has allowed the training of powerful models with millions of data points using graphics processing units (GPUs). As such, implementing ML and DL algorithms to obtain diagnostics and prognostics models has become the center of research in PHM. Examples of this are variational autoencoders (VAEs) for fault detection [
39], deep convolutional neural networks (CNNs) for damage detection and quantification [
40,
41], deep LSTM and recurrent neural networks (RNN) for quantity of interest prediction and anomaly detection [
42,
43], and physics-informed neural networks (PINNs) for remaining useful life (RUL) estimation [
44].
Multiple works have been published exploring different data-driven techniques employed for both diagnostics and prognostics in WTSs. Due to the lack of reliable labels and the difficulties presented when training prognostics algorithms, most of these DDM-PHM architectures in WTSs are used for anomaly detection and fault diagnostic tasks. These are mostly trained on data collected through the SCADA systems [
12]. Among these models, SVMs, RF, and neural networks (NNs) are the most popular [
45,
46,
47]. More complex DL architectures have also been implemented for anomaly detection [
48]. For instance, Wu et al. [
49] proposed a methodology to diagnose gearbox bearing and generator faults using SCADA sensor measurements based on a hybrid statistical–ML approach, combining LSTM and the Kullback–Leibler divergence. LSTM models have also been combined with AEs to develop an adaptive anomaly detection method, in which support vector regression is employed to define an adaptive threshold on a performance index [
24]. A hybrid approach was proposed in [
50], where NNs were combined with statistical proportional hazard models for real-time performance and stress condition assessment.
Given the model-agnostic nature of ML and DL algorithms, their performance heavily relies on data availability and quality. Data preprocessing has been identified as a fundamental stage in PHM frameworks [
37]. The importance of feature selection and outlier detection for diagnostic and prognostic models in WTSs has been studied thoroughly by Marti-Puig et al. [
12,
51]. The outlier detection process provides features with more representative domains, which in turn yields models with better generalization capabilities. ML techniques such as SVMs tend to perform better for small input dimensions; thus, feature extraction and selection play an important role in generating smaller and representative datasets. Hence, for these models, a lower-dimensional dataset results in shorter training and evaluation times while maintaining high performance. This is key to enabling the online deployment of these models for WTSs. In this regard, PCA and AEs have been implemented both to train diagnostic and prognostic DDMs and as feature reduction techniques [
37,
52]. Regarding WTS analysis, PCA has been implemented for different applications, including a data visualization tool [
53], feature selection and reduction [
54,
55], and fault detection methods [
56,
57]. However, the use and comparison of AEs and PCA as feature reduction tools have not been as widely studied in WTS settings.
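The PCA-based criterion used later in this work (a cumulative explained variance, CEV, threshold) can be computed directly from the data. A minimal sketch, using the SVD of the centered data matrix (the function name is illustrative):

```python
import numpy as np

def n_components_for_cev(X, threshold=0.90):
    """Smallest number of principal components whose cumulative
    explained variance (CEV) reaches the given threshold."""
    Xc = X - X.mean(axis=0)
    # Singular values squared are proportional to per-component variance.
    S = np.linalg.svd(Xc, full_matrices=False, compute_uv=False)
    var = S ** 2
    cev = np.cumsum(var) / var.sum()
    return int(np.searchsorted(cev, threshold) + 1)
```

The AE offers no equally interpretable criterion, which is one motivation for comparing the two techniques side by side.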
5. Quantum-Based Wind Turbines’ Pitch Fault Prognostics
This section describes the computational experimental setup and results for the quantum-enabled and classical diagnostic approaches. The data are split into healthy and degraded state classes. Based on the previously discussed PCA and AE feature reduction process, the ML model training and testing process is described. The results of the classification models and the effect of the feature reduction techniques employed are also discussed.
5.1. Computational Experimental Setup
In this work, the performance of all the ML models is compared based on datasets of reduced dimensionalities. This dimensionality reduction is performed with both the PCA and AE techniques, as described in
Section 4.2. A sensitivity analysis to assess the impact of the feature reduction on the models’ performance is presented. The tested dimensionalities are 4, 8, 16, 19, and 32 features. These dimensions are chosen based on two criteria. First, the dimensions 4, 8, 16, and 32 are chosen based on the encoding techniques available for quantum algorithms. Second, experiments with 19 features are included to represent the threshold at which the PCA reaches 90% explained variance and the AE’s reconstruction MSE starts to converge. The resulting dataset sizes are reported in
Table 5.
With respect to the quantum classification approach, both angle and amplitude encoding were utilized as the feature map function for the quantum kernel for the datasets including four and eight principal components. For the cases where 16 and 32 principal components were utilized, only amplitude encoding was tested due to limitations in quantum circuit simulation. Given the exponential increase in the number of possible states that a quantum model can represent as the number of qubits increases, simulating a system with over 12 qubits is generally not possible on modern classical computers running quantum simulators. For the special dataset including 19 principal components, zero-padding was used to augment the dimension of each datapoint to 32 features, the closest power of two (i.e., 2^5). As this operation is performed on each datapoint after the initial preprocessing and division into training and testing sets, it does not affect the balance or fairness of the experiments nor leak training data into the testing set.
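The zero-padding step can be sketched as follows (the function name is illustrative); since amplitude encoding normalizes the padded vector and the padded entries are zero, pairwise overlaps between datapoints are unchanged:

```python
import numpy as np

def pad_to_power_of_two(x):
    """Zero-pad a feature vector to the next power-of-two length,
    as required by amplitude encoding (2^k amplitudes for k qubits)."""
    n = len(x)
    k = int(np.ceil(np.log2(n)))
    out = np.zeros(2 ** k)
    out[:n] = x
    return out

x19 = np.arange(1.0, 20.0)       # a 19-feature datapoint
x32 = pad_to_power_of_two(x19)   # padded to 32 = 2^5 features
# Amplitude encoding then normalizes the padded vector; the zero
# amplitudes carry no information.
state = x32 / np.linalg.norm(x32)
```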
The reduced datasets are separated into balanced training and test sets [
15,
70]. That is, the training datasets present the same number of entries labeled as “healthy” and “faulty” states, with the purpose of reducing the model’s bias toward the most observed state (i.e., the healthy state). As shown in
Table 6, out of the original 251,164 temporal entries, only 779 correspond to faulty states. Hence, to create the balanced training and test sets, 779 entries labeled as healthy are randomly selected. The resulting 1558 entries are then divided into training and test sets, using a 20% test split to evaluate the models’ performance after training.
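A minimal sketch of the balanced undersampling and split described above, assuming binary labels (1 = faulty); note that, unlike a stratified split, the random split here preserves class balance only approximately:

```python
import numpy as np

rng = np.random.default_rng(42)

def balanced_split(y, test_frac=0.2):
    """Undersample the majority (healthy, label 0) class to match the
    minority (faulty, label 1) class, then randomly split the balanced
    indices into train and test index arrays."""
    idx_min = np.where(y == 1)[0]
    idx_maj = rng.choice(np.where(y == 0)[0], size=len(idx_min), replace=False)
    idx = rng.permutation(np.concatenate([idx_min, idx_maj]))
    n_test = int(round(len(idx) * test_frac))
    return idx[n_test:], idx[:n_test]   # train indices, test indices
```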
The quantum kernels are compared to traditional ML techniques, namely SVM with both linear and RBF kernels, RF, and k-NN. The models’ performances are compared based on the same dataset. Stratified 10-fold cross-validation is used for hyperparameter selection. The ML models are trained in Python 3.8 using the PyCaret library. Ten different models are independently trained for each classical algorithm, whereas five different models are independently trained for each quantum-based algorithm. Reported classification metrics include the average and standard deviation of accuracy, precision, recall, and F1 score. The utilized hardware consists of an NVIDIA RTX 3060 GPU, an 8-core AMD Ryzen 7 5800X CPU, and 32 GB of RAM.
5.2. Results and Discussion
This section is structured as follows. First, results regarding the classification task using a feature reduction strategy based on PCA and AE are presented in
Table 7. This table reports the models’ average accuracy and F1 score achieved for the training and test sets of five different reduced dimensionalities. Additional metrics, including averages of precision and recall, as well as the standard deviation for all metrics, are presented in
Appendix A (
Table A1,
Table A2,
Table A3,
Table A4 and
Table A5). Then, the performance of the PCA and AE as feature extractors is discussed, followed by a comparison of the classical ML models. The performance of the Q-SVM model is discussed for the two types of encoding presented. Finally, the performance between classical ML and Q-SVM is discussed.
In
Section 4, a sensitivity analysis regarding the optimal reduced feature dimensionality of the data was presented based on the CEV and MSE metrics. However, when implementing data-driven prognostics models, it is also desired to obtain the lowest possible dimensionality that does not hinder the model’s performance, so as to minimize the computational burden. In practice, this allows for simpler model selection and maintenance for effective online deployment, given the on-site hardware requirements. As such, a sensitivity analysis of the model’s performance is required to assess the optimal feature space dimensionality, which might not necessarily coincide with the number of features indicated by the CEV and MSE thresholds. In general,
Table 7 suggests that the prognostics models perform better when analyzing the data preprocessed with AE than with PCA. Additionally, model accuracies tend to increase with a higher dimensionality space, converging to values above 0.90. F1 scores follow a similar trend as accuracy, indicating an adequate balance between false negatives and false positives. It should be noted that most ML models achieve a peak test accuracy over 0.90 for a reduced dimensionality of 19 features. This is the case for most models trained on the AE’s latent space, although this behavior is only exhibited by the RF and k-NN models when trained on the PCA data. While it may be intuitive that more training features should increase the performance of the tested models, from an information point of view, this is not always the case. In this study, 19 features represent a threshold for which most of the information from the original dataset is encoded into the PCA or AE features, as shown in
Figure 7 and
Figure 8. Adding extra information in the form of additional features may become detrimental to the learning process, as they are likely to add more noise than useful information; nonetheless, the algorithms are forced to interpret them. Moreover, it is known that ML models such as SVM are often affected by the “curse of dimensionality”, where datasets containing a large number of features are not suitable for efficient training and thus require further feature extraction procedures to increase their prediction performance. In general, the performance of all trained models suffers the most when a lower number of features is used as input data, as expected from the loss of information for both the PCA and AE preprocessing approaches at lower dimensionalities. This behavior can be observed in
Figure 9, where the test accuracies for the models are compared based on the corresponding number of features used for the data reduction process. This is consistent with the dimensionality reduction analysis performed for both the PCA and AE (see
Figure 7 and
Figure 8). However, it should be noted that unless the AE’s performance is compared to that of an interpretable metric, such as the CEV obtained through PCA, there is no guarantee that a representative dataset is efficiently obtained. Results indicate that the latent space representation obtained from the PCA and AE are not directly comparable, which confirms the value of simultaneously analyzing these two feature reduction techniques.
The presented results indicate that the overall highest test accuracy is achieved by the SVM-RBF model, at 0.945 and 0.921 through AE and PCA reduction, respectively. It should be noted that, although both RF and k-NN present a comparable performance, these show noticeable differences between the accuracy reported for the training and test sets, which is an indication of overfitting. Still, k-NN generally performs better than RF using the same number of input features. It can also be observed that, for low dimensionalities, the RF and k-NN models with AE feature reduction outperform the rest, while for higher dimensionalities the models trained on PCA features show better results. This result is important when choosing which dimensionality reduction technique to use. Regardless of the input dimensionality, the standard deviation of the accuracy does not surpass 5% for all models, except for the SVM with linear kernel (see
Table A1,
Table A2,
Table A3,
Table A4 and
Table A5). These low standard deviation values are expected, given that the tested ML algorithms present a stable behavior during the training process. A small standard deviation also shows a consistent performance of the models, which can be related to the preprocessing methodology employed. Indeed, the average accuracy standard deviation obtained for data preprocessed with PCA and AE does not surpass 1.87% and 2.39%, respectively. This behavior can also be observed for the Q-SVM model, for which the average accuracy standard deviation obtained is 2.26% and 2.56% for datasets preprocessed with PCA and AE, respectively. Unfortunately, none of these algorithms allows one to quantify the prediction uncertainty, which remains an ongoing challenge in the PHM community.
One of the main limitations of this work lies in the simplified approach taken towards label generation. The detection time window of one hour prior to a pitch fault alarm is a naïve approach based on the distribution of alarm duration shown in
Figure 5. The main drawback of using a fixed time window is that it does not consider that overlapping alarms may be recorded. Hence, further work is required to produce more robust datasets from alarm logs, considering various alarms of interest. Other label generation methodologies defined from maintenance and operational logs for prognostics purposes based on time windows have been proposed by Cofre-Martel et al. [
37,
70]. The approaches outlined in these articles would enable the use of longer time windows, expanding the prediction horizons to more than 1 h into the future. Additionally, a sensitivity analysis is further required to obtain the optimal prediction horizon based on expert knowledge and model performance.
It can be seen from
Table 7 that angle encoding results in a better model performance than amplitude encoding for both PCA and AE (refer to the Q-SVM results for four and eight features). While both encoding techniques are lossless operations to translate information from a classical to a quantum setting, using more qubits results in more expressive kernels, as is the case for angle encoding. This expressiveness allows the algorithm to perform better in downstream tasks, such as classification. For the rest of the cases, quantum-based fault prognostics results are comparable between preprocessing approaches (PCA and AE), and no significant differences are observed between them in terms of final accuracy, indicating the stability of the quantum kernel-based approach against widely different feature extraction techniques. Note that the performance of the Q-SVM models is more sensitive to the number of features for the AE than for the PCA preprocessing.
Figure 9 shows that, for most cases, more features in the original dataset effectively result in better fault prognostics performance, as expected from an information point of view. Nevertheless, for the PCA feature reduction approach, a decay in performance is observed when the algorithm is trained using 32 principal components, indicating that the extra components contain a low amount of explained variance and are therefore detrimental to the classification task. On the other hand, for the AE feature reduction approach, no loss in performance is observed when using 19 or 32 features. This may be explained by the fundamental differences between the PCA and the AE feature reduction processes, the latter being a nonlinear function optimized based on a reconstruction metric, which allows the AE to encode useful information even past the threshold of 19 features. Note that no significant decay in performance is observed when the zero-padding technique is used to allow the application of amplitude encoding to the case of 19 features. This is important since it motivates further exploration of quantum kernel-based fault detection models without the limitation of having to match the number of features to the closest power of two. With respect to the implementation itself, while software libraries allow for a relatively straightforward interface to program simulations of quantum circuits, the execution time vastly surpasses the time necessary to train and evaluate the classical approaches tested in this paper. For example, the Q-SVM training time is on the order of four hours, while the training of traditional SVM approaches takes seconds on modern hardware. The increased training time of the Q-SVM models is likely due to the simulation process being performed on a classical computer, which is not specialized for quantum operations.
The real-time requirements for quantum algorithms will need to be further assessed by the research community once quantum hardware becomes readily accessible. This situation is comparable to the early days of other data-driven techniques, such as DL before the general availability of GPUs and specialized software (e.g., CUDA) to accelerate the execution of such models.
Comparing the performance of the Q-SVM with traditional approaches, it is evident that, while the results are within close range given the currently available quantum processors and quantum simulators, slightly better performance is obtained with some of the classical techniques in most cases, notably the RF models. An outlier in this trend is the case of the dataset with 32 features generated with an AE, where the Q-SVM performance surpasses the RF classifier. Nevertheless, the Q-SVM technique can achieve satisfactory pitch imbalance fault prognostics results. This indicates that the quantum kernel effectively transforms the original data into a higher-dimensional space that is rich enough to allow for the identification of pitch imbalance faults. In addition, the lower performance of the Q-SVM technique can be at least partially attributed to the current state of the art of quantum hardware and quantum simulators, which does not allow for the generation of large encoding circuits capable of leveraging the whole range of available feature information. Given the state of current quantum technology and algorithms, the fact that the Q-SVM presents performance comparable to its classical counterparts is an important indication of the potential benefits achievable in the near future and a key motivation to explore these algorithms as quantum hardware and software are further developed.
To formally assess the statistical difference between the models' performance and compare the ML models with the quantum kernels, a difference-of-means hypothesis test is performed for the best-performing data reduction configuration. For each model using 19 features processed with AEs, 10 different instances are trained and then tested to obtain the mean and standard deviation of the model's test accuracy. The premise is that the difference between the test accuracy distributions is not statistically significant; thus, the null hypothesis is that the mean test accuracy is the same for each pair of models. The null hypothesis $H_0$ and the alternative hypothesis $H_1$ are presented in Equations (18) and (19):

$$H_0:\ \bar{x}_1 = \bar{x}_2 \quad (18)$$

$$H_1:\ \bar{x}_1 \neq \bar{x}_2 \quad (19)$$

where $\bar{x}_1$ and $\bar{x}_2$ are the sample means for the first and second population samples, respectively. Equation (20) shows the standard error (SE):

$$\mathrm{SE} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \quad (20)$$

where $s_1$ and $s_2$ are the sample standard deviations corresponding to $\bar{x}_1$ and $\bar{x}_2$, respectively, and $n_1$ and $n_2$ are the corresponding sample sizes. The degrees of freedom (DOF) are then computed as shown in Equation (21):

$$\mathrm{DOF} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} \quad (21)$$

Then, the test statistic is given by Equation (22):

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\mathrm{SE}} \quad (22)$$

The null hypothesis is rejected if $p < \alpha$, where $\alpha$ is the significance level, which is normally set to a value between 0.05 and 0.10, and $p$ is the corresponding p-value.
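The test procedure in Equations (18)–(22) can be sketched as follows. The sample statistics used here are hypothetical values for illustration, not the accuracies reported in this paper:

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Difference-of-means test with unequal variances: returns the
    standard error, degrees of freedom, and t statistic."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)        # standard error, Eq. (20)
    dof = se**4 / ((s1**2 / n1)**2 / (n1 - 1)
                   + (s2**2 / n2)**2 / (n2 - 1))   # degrees of freedom, Eq. (21)
    t = (m1 - m2) / se                             # test statistic, Eq. (22)
    return se, dof, t

# Hypothetical mean/std accuracies for two models over 10 runs each.
se, dof, t = welch_t(0.93, 0.01, 10, 0.90, 0.02, 10)
print(round(t, 3), round(dof, 2))  # -> 4.243 13.24
```

The two-sided p-value then follows from the Student's t distribution with the computed degrees of freedom (e.g., via `scipy.stats.t.sf(abs(t), dof) * 2`), and the null hypothesis is rejected when it falls below the chosen significance level.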
Table 8 shows the mean and standard deviation values for each model.
Table 9 shows the obtained p-value for each pair of models.
Considering the chosen significance level $\alpha$, the obtained p-values indicate that the Q-SVM outperforms both the k-NN and RF models with a statistically significant advantage. Although further testing is required to confirm the robustness of the Q-SVM approach, this is an interesting result considering the widespread use of k-NN and RF for data-driven fault diagnostics and prognostics tasks. Furthermore, the null hypothesis cannot be rejected when comparing the Q-SVM and SVM-L models; thus, the performance difference between these two models is inconclusive, and they can be considered comparable. The SVM-RBF is the only approach that presents statistically significantly higher performance than the Q-SVM. This result, in conjunction with the fact that quantum kernel approaches have only recently begun to be expanded and tailored by the machine learning research community, encourages further exploration of the technique itself and of ways to improve it. Indeed, the presented Q-SVM approach uses a reduced number of qubits, limited by the current maturity of the available quantum hardware and simulators. Yet, the Q-SVM obtained comparable and competitive results in terms of performance, even outperforming popular algorithms such as RF and k-NN, particularly when tested with a reduced space of 19 features (see Figure 9 and Table 7). However, the computational complexity of the calculations performed to encode and process the data in the Q-SVM results in prohibitive model training and evaluation times compared to traditional ML models. Nevertheless, the results obtained for Q-SVMs are encouraging, considering that QML implementations are expected to improve and become more advantageous from a practical point of view as quantum hardware evolves, enabling the use of more qubits and therefore the exploration and construction of more complex and representative quantum states.
6. Conclusions
This paper presented a methodology to include SCADA alarm information in data-driven diagnostic tasks in WTSs, focused on detecting pitch faults. The number of features in the SCADA sensor data was reduced through two methods: PCA and AE. Following this, several data-driven diagnostics approaches were explored: traditional ML algorithms and quantum kernel ML algorithms. A sensitivity analysis of the models' performance was presented with respect to the reduced dimensionality of the dataset and the feature reduction method.
Overall, the highest performance was achieved with the SVM-RBF model (mean test accuracy of 0.945), while most models achieve over 0.9 accuracy when 19 features are used. It was also observed that using more than 19 features does not further improve the overall classification accuracy and in some cases decreases it. This is consistent with the CEV and MSE analyses presented for the PCA and AE methods, respectively, where it was shown that, at around 19 features, almost all the statistically significant information had been extracted from the original dataset. Hence, these results suggest that the optimal feature dimensionality obtained from the feature reduction analysis coincides with the dimensionality that yields the best prognostic model performance. In this regard, while the fault prognostics models tend to exhibit slightly higher performance when using AE-based data reduction, PCA provides an explainable metric (CEV) and is therefore an interesting point of comparison. These results highlight the importance of considering metrics other than model performance when selecting the appropriate feature reduction procedure. Ultimately, the chosen method will depend on the application at hand and the user's interpretability requirements.
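The CEV criterion mentioned above can be sketched as follows. This is an illustrative NumPy example on synthetic data, not the SCADA dataset used in this work, and the helper name and 0.99 threshold are assumptions for demonstration:

```python
import numpy as np

def n_components_for_cev(X, threshold=0.99):
    """Smallest number of principal components whose cumulative
    explained variance (CEV) reaches `threshold`."""
    Xc = X - X.mean(axis=0)                     # center the features
    s = np.linalg.svd(Xc, compute_uv=False)     # singular values, descending
    cev = np.cumsum(s**2) / np.sum(s**2)        # cumulative explained variance
    return int(np.searchsorted(cev, threshold) + 1)

# Synthetic data where most variance lies in the first few directions.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 32)) * np.linspace(3.0, 0.1, 32)
print(n_components_for_cev(X, 0.99))
```

A raised threshold retains more components; in the reported analysis, the CEV curve flattened at around 19 features, motivating that choice of reduced dimensionality.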
In general, the results obtained for the quantum kernel show performance levels comparable to those of the classical approaches. Indeed, when comparing the classical and quantum-based diagnostic methods using 19 features preprocessed with the AE, the Q-SVM presents a statistically significant advantage over the k-NN and RF models. Further, while the performance of the SVM-RBF models surpasses that of the Q-SVM, the results of the latter are comparable with those obtained with the SVM-L model. Regarding the practical implications, the proposed Q-SVM method achieved results comparable to established approaches. As quantum hardware evolves and becomes readily available, QML algorithms are expected to increase in complexity and representational capacity, possibly surpassing traditional ML models. QML has only recently begun to be explored outside the quantum computing research community, so its early testing in practical applications, such as the case study presented in this work, allows the PHM community to assess its potential and identify future research paths within the field. The authors believe that, based on the results obtained for this case study, quantum kernel-based fault prognostics algorithms merit further research in anticipation of the further development and general availability of quantum computers and quantum simulators, which is expected to occur during this decade.