1. Introduction
Partial discharge (PD) detection in transformer dielectric oils is justified by the need to reduce costly transformer breakdowns, extend their lifespan, optimize preventive maintenance, and eliminate network failures, all of which can have significant economic impacts. A steady and sustained rise in the number of publications on PD source classification using machine learning algorithms can be seen in the period from 2010 to 2023 [
1].
PD occurs when high voltage is applied to materials in any state, whether solid, liquid, or gaseous. It is a complex physical process that exhibits randomly distributed properties and produces phenomena such as light, sound, and high-frequency electromagnetic waves, releasing electrical charges [
2].
In situ experimental images of transformer oil spaces are extremely complex. In this work, the tests were performed in the laboratory with oil samples extracted from the transformer, since the main objective of this work is to explore the feasibility and potential of quantum machine learning models for the classification of electrical discharges in dielectric oils in a controlled laboratory environment.
Although our current work is not focused on in situ studies, it could lay the groundwork for future research in that direction. This laboratory setup offers the advantage of immunity to electromagnetic interference.
The machine learning algorithms most widely used to identify PDs in electrical transformers are based on support vector machines (SVMs) [
3], followed by artificial neural networks (ANNs) [
4] and convolutional neural networks (CNNs) [
5]. All these methods use classical computing. In ref. [
6], it is indicated that, before machine learning algorithms are tested on specific problems, there are no inherent or predefined differences that allow us to affirm that one algorithm is better than another. Following the current trend in the use of SVM techniques to search for patterns in difficult-to-classify environments, the so-called kernel trick is used: the data are implicitly mapped into a higher-dimensional feature space in which a separating hyperplane is easier to find. The aim is then to transfer this knowledge to quantum computing.
Below, we review the current state of quantum kernel models (QKMs), quantum variational models (QVMs), and the use of currently available quantum computers, together with their potential theoretical and experimental advantages and their limitations.
The QKM is an area of AI in which the advantage of quantum computing has been explored. According to ref. [
7], quantum kernels can be used for supervised learning, showing that a quantum computer can classify data in a high-dimensional feature space more efficiently than classical methods.
In ref. [
8], it is explained how QKMs can capture complex relationships and patterns in data that classical kernels might not be able to identify. In this way, an SVM using a quantum kernel can better classify new data and make more accurate predictions. This enables a hybrid approach, in which a quantum computer estimates the kernel and a classical computer then uses it to train and run the SVM.
In ref. [
9], it is noted that as the problem size increases, the differences between kernel values become smaller and smaller, and more measurements are required to distinguish between the elements of the kernel matrix.
In ref. [
10], the number of quantum circuit evaluations required to solve the dual problem is quantified, with an order of magnitude given by Equation (1), where M represents the size of the data set and ϵ is the accuracy of the solution with respect to the ideal result, which can only be obtained theoretically with exact values. That is, the time required to solve the dual problem using quantum circuits increases polynomially with the size of the data set M and is inversely proportional to the square of the accuracy ϵ.
The dependence on
M poses a major challenge for problems with large data sets. In ref. [
10], an improvement based on solving the primal problem with the kernel is shown, using a generalization of a classical algorithm known as Pegasos; the resulting smaller number of evaluations is given, in Landau notation, in Equation (2).
In ref. [
11], it is explained that the QKM approach is more natural and suitable for quantum theory compared to the attempt to adapt quantum theory to fit the structure of classical neural networks, which is a more popular but less natural approach.
Thus, whereas optimizing classical parameters with a QVM presents complex problems, such as the choice of ansatz and the appearance and treatment of barren plateaus, the QKM approach avoids these issues, although it requires evaluating the kernel between every pair of data points, which implies a high computational cost.
As shown by the results reported in ref. [
12], quantum algorithms can outperform classical algorithms in optimization problems, which are central to supervised learning. A review of several quantum optimization algorithms is also conducted in ref. [
12], and their potential to outperform classical methods in machine learning tasks is pointed out, highlighting certain practical applications and preliminary experiments.
In ref. [
13], it is shown how a quantum perceptron can be simulated on a quantum computer, suggesting that QVMs could be trained and run more efficiently than their classical counterparts in certain cases. In ref. [
14], it is suggested that a QVM with the ability to process classical and quantum data, trainable through supervised learning, could be run on an intermediate-scale quantum computer.
It is interesting to note that in ref. [
15], it is demonstrated that quantum computers can handle and process structured data more efficiently in some specific cases: the proposed quantum algorithm can, in theory, outperform classical methods in performing principal component analysis (PCA) on large data sets.
Quantum computing is a computing paradigm based on the probabilistic phenomena of quantum mechanics at the atomic scale, exploiting properties such as quantum superposition and entanglement. Its basic unit of information is the qubit, the quantum counterpart of the classical bit.
It is important to highlight the greater computational power of quantum computing: because qubits can exist simultaneously in multiple states, the state space available for computation grows exponentially with the number of qubits, whereas in classical computing it grows only linearly with the number of bits.
For certain problems, quantum algorithms can also execute faster than their classical counterparts, since superposition and entanglement give rise to a form of parallel computation. Because the results of a quantum computation are probabilistic, these algorithms are typically executed many times, and the aggregated measurements provide an acceptable answer in certain practical problems.
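To make these notions concrete, the following minimal Qiskit sketch (our illustration, not taken from the paper's notebooks) prepares a two-qubit entangled state: the statevector of n qubits contains 2^n complex amplitudes, and repeatedly sampling the circuit yields the kind of probabilistic answer described above.

```python
# Minimal illustration (assumes Qiskit is installed): superposition and
# entanglement on two qubits, and probabilistic sampling of the result.
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

qc = QuantumCircuit(2)
qc.h(0)      # Hadamard: puts qubit 0 into a superposition
qc.cx(0, 1)  # CX: entangles qubit 0 with qubit 1 (Bell state)

state = Statevector.from_instruction(qc)
print(state.dim)                        # 2**2 = 4 complex amplitudes
print(state.sample_counts(shots=1024))  # roughly half '00' and half '11'
```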
Among these algorithms, the best known is Shor’s algorithm [16], a reference algorithm for factoring an integer N into primes with a polynomial complexity of order O((log N)³), compared to the general number field sieve, the best known classical method, whose complexity is sub-exponential, exp(O((log N)^{1/3} (log log N)^{2/3})) [17].
Another algorithm of great interest in quantum computing is Grover’s algorithm [18], which has been shown to be fundamentally useful in searching for a given element in an unstructured database with a theoretical complexity of O(√N), compared to the O(N) required by a classical exhaustive search. The Long–Grover algorithm [19] is a variant of Grover’s quantum search algorithm that is able to handle situations where the exact number of solutions in the unstructured database is not known. This variant maintains the same theoretical efficiency O(√N) for a database of dimension N as Grover’s algorithm, but with a better ability to tolerate uncertainty in the proportion of solutions, making it more robust and practical for certain types of search problems.
In the analysis undertaken in ref. [
20], the question is raised as to whether and how quantum computing can actually boost machine learning using real-world classical data sets. The main technical limitations and challenges associated with noisy intermediate-scale quantum (NISQ) computers are then addressed.
An analysis of the quantum computing landscape is discussed in ref. [
21], where it is stated that current quantum computers are not perfect due to decoherence in qubits caused by environmental noise, but they can perform certain calculations or solve problems that are beyond the reach of the best classical computers available today. It is argued that the main limitation will be the ability to maintain precision in quantum operations as circuits become larger and more complex. In ref. [
22], it is explained how to experimentally extend the coherence time of logical qubits by almost an order of magnitude.
Another challenge that exists today is to maintain a large number of entangled qubits in a stable manner. Quantum computers that address this challenge include the IBM Kyoto [
23], IBM Brisbane [
24,
25], and Google’s Sycamore quantum computer [
26], among others.
To our knowledge, QKM and QVM techniques have not been applied to the analysis of real cases of PD image detection in transformer oils using optical sensors. In this paper, a comprehensive study of PDs originating from bubbles present in dielectric mineral oil is carried out. These discharges are precursors of arc breakdown, and their detection therefore provides a method for diagnosing the state of the mineral oil before such a breakdown occurs.
In this paper, images captured in a high-voltage laboratory are processed by selecting a number of significant features. For this purpose, the Scikit-image environment [
27] is used. Two quantum classifier models, QKM and QVM, are developed. These models are implemented in the Qiskit development environment [
28].
In this article, we focus on the use of quantum computing to address the problem of image classification in transformer dielectric oils. The images used in this article are classified into four categories: images with partial discharges, images without partial discharges, images with electric arc breaking, and images with gas bubbles after arc breaking.
The effectiveness of the trained QKM and QVM classifiers is evaluated with images not used during the training process.
These models were run on three physical quantum computers, each with a 127-qubit superconducting processor: IBM Osaka, IBM Brisbane, and IBM Kyoto. The measurements obtained on the quantum computers were then compared with the results of simulations performed on classical computers.
The main contribution of this paper is that for the first time, two quantum machine learning models, QVM and QKM, are applied and compared for the classification of electrical discharge images in dielectric oils, using real data obtained with a high-resolution optical sensor.
The novelty of this work can be summarized in the following points: (i) the work addresses a real problem, unlike previous studies that focus on simulated or theoretical data sets; (ii) the impact of the number of qubits in the QKM is studied, showing that increasing the number of qubits in this model significantly improves the accuracy in the classification of the four binary combinations of the classes; (iii) real quantum computers are used, with the models implemented and executed on three IBM quantum computers, demonstrating their operation on real quantum hardware and providing results comparable to classical simulations; and (iv) transparency and reproducibility are provided through a repository on Zenodo [29], with a detailed README, where the images of the electrical discharges used, the Jupyter Notebooks 7.0.8 for the extraction of the features, and the Jupyter Notebooks with the Python 3.12.4 programming of the QVM and QKM, with the respective figures, have been published, so that the scientific community can access and use them.
This article is divided into the following sections:
Section 2, Image Processing and Feature Extraction Method;
Section 3, Quantum Machine Learning with Variational Circuits (Quantum Variational Model, QVM);
Section 4, Support Vector Machine, SVM;
Section 5, Quantum Kernel Model, QKM;
Section 6, Overall Flowchart. Finally, in
Section 7, Conclusions, the main conclusions are presented.
2. Image Processing and Feature Extraction Method
In this work, an analysis of the PDs originating from bubbles present in dielectric mineral oils is performed. For this purpose, a high-resolution image sensor is used. The PDs detected with this sensor were validated with a standard electrical detection system based on a discharge capacitor, according to the IEC 60270 standard [
30]. All images used in this paper were previously obtained by the authors [
31]. These images were used to characterize and train the quantum circuits in
Section 3 and
Section 5.
Feature extraction obtains relevant values from the experimental images, speeding up the computation without losing information. This reduces the required memory and computing time and improves the accuracy of the model.
In ref. [
1], the main techniques used to date are summarized. Those based on statistical characteristics are highlighted, as well as the technique based on principal component analysis (PCA) due to its capacity to reduce dimensionality and identify key variables, among others. This is crucial when working with a limited number of qubits that correspond to the current limitations of quantum technology. The method used in this article is explained below.
Figure 1a presents four images, each corresponding to one of the four classes used: class 0 for partial discharge (PD), class 1 for no discharge (NOPD), class 2 for electric arc breaking (ARC), and class 3 for gas bubbles after arc breaking (BREAK).
Figure 1b shows the experimental image collection device. From these images, features are extracted that reduce the number of qubits needed to perform quantum analyses.
Features are basic properties that characterize and simplify experimental images of electrical discharges. The goal is to work with as few qubits as possible. For this reason, the extraction of features from images has been reduced to a maximum of thirteen.
To process the images captured in the high-voltage laboratory and select their features, the Scikit-image environment is used. An explanation of the entire Scikit-image environment, which supports images in multiple formats and provides tools for transforming, analyzing, and improving them, is provided in ref. [
27]. This includes filtering functions, geometric transformations, edge detection, segmentation, and color manipulation, among others.
To analyze the characteristic features of electrical discharges in images captured with a high-quality camera, a region of interest (ROI) selection and analysis process was first performed. Thus, the ROI was defined as a square centered on the image with a side of 100 pixels. To do this, the coordinates of the center of the image were calculated, and the vertices of the square were located (
Figure 2a). The ROI within this square was converted to greyscale to facilitate the analysis. The ROI is shown in a red frame.
Then, the mean and standard deviation of the pixel intensities within the ROI were calculated. Using these values, a threshold was set as the mean plus two times the standard deviation. Pixels whose intensity exceeded this threshold were identified and their coordinates determined. The mean of these coordinates was then calculated to obtain the centroid of the high-intensity region.
The pixels exceeding the threshold were then highlighted in red in the original image. In addition, the area of the highlighted region was calculated in terms of the number of pixels, and its centroid was determined. Finally, the coordinates of the centroid were adjusted with respect to the originally selected ROI (see
Figure 2a).
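The following sketch summarizes the ROI and thresholding procedure described above (our reconstruction with NumPy and scikit-image; the file name is hypothetical, and details may differ from the authors' notebooks).

```python
# A minimal sketch of the ROI, threshold, and centroid procedure.
import numpy as np
from skimage import io, color

img = io.imread("discharge.png")                 # hypothetical input image
cy, cx = img.shape[0] // 2, img.shape[1] // 2    # center of the image
side = 100                                       # side of the square ROI, in pixels
roi = img[cy - side // 2: cy + side // 2,
          cx - side // 2: cx + side // 2]
roi_gray = color.rgb2gray(roi) if roi.ndim == 3 else roi  # greyscale ROI

mean_i, std_i = roi_gray.mean(), roi_gray.std()
threshold = mean_i + 2.0 * std_i                 # mean plus two standard deviations
ys, xs = np.nonzero(roi_gray > threshold)        # coordinates of bright pixels

area_px = xs.size                                # area of the highlighted region
centroid_x_roi, centroid_y_roi = xs.mean(), ys.mean()
# Adjust the centroid back to the coordinate frame of the original image.
centroid_x = centroid_x_roi + (cx - side // 2)
centroid_y = centroid_y_roi + (cy - side // 2)
```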
Key features obtained in this image analysis include the area in pixels of the highlighted region, the centroid coordinates in both the ROI and the original image, and the intensity statistics of the ROI. All these features are normalized to the interval [0, 2π].
The thirteen features of the images and the class to which they belong are as follows: the area in pixels, the centroid coordinates (centroid_x, centroid_y), the adjusted centroid coordinates in the ROI (centroid_x_roi, centroid_y_roi), the means of the coordinates (mean_coords_x, mean_coords_y), the size of the side of the square (side_px), the dimensions of the image (image_width, image_height), the mean intensity (mean_intensity), the standard deviation of the intensity (std_intensity), the threshold (threshold), and finally the class to which it belongs.
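Since the paper states only the target interval [0, 2π], a plausible min–max scaling of each feature column is sketched below (the file name is hypothetical).

```python
# Scale each feature column to [0, 2*pi]; min-max scaling is an assumption,
# as the paper only specifies the target interval.
import numpy as np
import pandas as pd

df = pd.read_csv("feature_results.csv")          # hypothetical features file
feature_cols = [c for c in df.columns if c != "class"]
for c in feature_cols:
    col = df[c].astype(float)
    df[c] = 2 * np.pi * (col - col.min()) / (col.max() - col.min())
```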
Figure 2b and
Figure 3b show the preprocessing results with the acquisition of the features for the PD and BREAK classes, respectively.
From the images of each binary class combination, a graphical study of the relationships between all possible pairs of image features was performed, yielding a 13 × 13 grid of plots. This provides a preliminary analysis of the relationships between the features of each class and helps identify patterns and differences between the two classes of each binary combination.
Figure 4 presents a pairwise plot illustrating the relationships between five of the thirteen normalized image features, highlighting the two classes using colors for the PD_BREAK combination shown in
Figure 1a.
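A pair plot of this kind can be produced, for example, with seaborn (our illustrative sketch; the file and column names are assumptions based on the feature list in Section 2).

```python
# Pairwise plot of selected normalized features, colored by class.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("features_PD_BREAK.csv")        # hypothetical concatenated file
cols = ["area_pixels", "centroid_x", "centroid_y",
        "mean_intensity", "std_intensity"]        # five of the thirteen features
sns.pairplot(df, vars=cols, hue="class")          # colors distinguish the two classes
plt.show()
```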
Likewise, three other binary combinations were analyzed, PD_NOPD, PD_ARC, and BREAK_NOPD. Of all the possible binary combinations, these three are analyzed in this article because they are the most significant in the study of partial discharges.
Two machine learning methods were used, QVM and QKM. The first is considered in
Section 3 and uses a trained variational quantum circuit to distinguish each class for each of these binary combinations. The second is considered in
Section 5 and uses SVM by estimating the quantum kernel corresponding to each of these binary combinations.
5. Quantum Kernel Estimate (Quantum Kernel Model, QKM)
In this section, a quantum kernel estimate is made for two, three, and eight features, using a quantum computer with two, three, and eight qubits, respectively. Once the kernel has been estimated, Equations (20)–(25) are used on a classical computer to estimate the class membership of a new element x.
The quantum kernel estimation algorithm consists of mapping classical data vectors into quantum states [11]. This is achieved by a feature map that transforms a classical feature vector x into a quantum state |ϕ(x)⟩. In practice, a quantum circuit is parameterized with the features x, and the corresponding unitary U(x) over n qubits is applied to the ground state |0^n⟩, i.e., |ϕ(x)⟩ = U(x)|0^n⟩.
A quantum kernel uses quantum states to represent the data and computes the similarity between them in a quantum feature space, using quantum circuits that encode the data into quantum states. The similarity between two feature vectors x and y is calculated through the Hilbert–Schmidt inner product between the corresponding density matrices [8], giving the value of the quantum kernel k(x, y) in Equation (26):

k(x, y) = |⟨ϕ(x)|ϕ(y)⟩|² = |⟨0^n| U†(x) U(y) |0^n⟩|².  (26)
Each entry of the kernel matrix k(x, y) is therefore evaluated by running the quantum circuit U†(x)U(y) on the input state |0^n⟩ and estimating the probability of obtaining the outcome |0^n⟩.
Figure 11 shows the generic structure of the quantum circuit used to estimate the particularized kernel for three features. For this purpose, the ZZFeatureMap function obtained from the Qiskit library [
28] was used. This is a parameterized quantum circuit used to map classical data to a quantum feature space. This mapping is performed by applying quantum rotation gates on the qubits, followed by CX-type interactions between pairs of qubits, which allows the capture of nonlinear relationships in the data.
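The following sketch (ours, not the authors' notebook) builds the ZZFeatureMap and evaluates one kernel entry with the compute–uncompute circuit of Equation (26); the number of repetitions and the entanglement pattern are assumptions, and a statevector simulation stands in for hardware sampling.

```python
# Sketch of k(x, y) = |<0^n| U†(x) U(y) |0^n>|^2 with Qiskit's ZZFeatureMap.
from qiskit import QuantumCircuit
from qiskit.circuit.library import ZZFeatureMap
from qiskit.quantum_info import Statevector

n = 3  # one qubit per feature
feature_map = ZZFeatureMap(feature_dimension=n, reps=2, entanglement="linear")

def kernel_entry(x, y):
    """Probability of measuring |0...0> after running U†(x) U(y) on |0...0>."""
    qc = QuantumCircuit(n)
    qc.compose(feature_map.assign_parameters(y), inplace=True)            # U(y)
    qc.compose(feature_map.assign_parameters(x).inverse(), inplace=True)  # U†(x)
    probs = Statevector.from_instruction(qc).probabilities_dict()
    return probs.get("0" * n, 0.0)

print(kernel_entry([0.8, 2.1, 5.9], [0.7, 2.0, 6.0]))  # close vectors give k near 1
```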
Next, in
Section 5.1, the results obtained with the QKM method for the binary combinations PD_NOPD, PD_BREAK, PD_ARC, and BREAK_NOPD, with two features, are explained in order to make a comparison with the results obtained with the QVM method. Then, in
Section 5.2, the results obtained with the QKM method for three and eight features are presented, and the improvement in accuracy is analyzed.
5.1. Two-Feature Kernel Estimation
All simulations performed on classical computers and experiments performed on quantum computers were run following the basic ZZFeatureMap circuit structure, shown in
Figure 11 and defined in [
35]. In this section, the kernel is estimated for two features, [‘area-pixels’, ‘mean-coords-x’], described in
Section 2, and for the four class combinations PD_NOPD, PD_BREAK, PD_ARC, and BREAK_NOPD.
The exact kernel results for the PD_BREAK class combination are depicted in
Figure 12a. Since the matrix is symmetric, only its upper triangle needs to be computed. The kernel matrix obtained has 580 × 580 elements, as can be seen in the Data Index axes (X1) and (X2). Each entry corresponds to one evaluation of the quantum circuit in
Figure 11, given by Equation (26). This circuit was used to estimate the kernel on a quantum computer, performing 168,200 evaluations. Each kernel value is normalized between 0 and 1.
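The symmetric structure is exploited as sketched below (assuming the kernel_entry function from the previous sketch and an array X of 580 normalized feature vectors).

```python
# Fill only the upper triangle of the symmetric kernel matrix.
import numpy as np

M = len(X)                 # e.g., 580 samples
K = np.zeros((M, M))
for i in range(M):
    for j in range(i, M):  # upper triangle only, since k(x, y) = k(y, x)
        K[i, j] = kernel_entry(X[i], X[j])
        K[j, i] = K[i, j]  # mirror to the lower triangle
```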
Figure 12b shows a comparative study between the exact kernel value and the value obtained with the IBM Kyoto computer for row 40 and columns 0 to 24 (Pub 0 to Pub 24) of the matrix shown in
Figure 12a. The matrix was generated using the library Qiskit [
28], with the Jupyter Notebook QKM_verification_two_qubits.ipynb available in the repository [
29].
Due to time constraints on the available quantum computers, 140 values were randomly selected from the upper triangle of the matrix to estimate the kernel and were compared with those obtained in simulations on a classical computer. These values are shown in
Figure 13a, with the execution time on the quantum computer being 2 m 34 s. The results of the simulation and the execution on the IBM Kyoto computer are presented in
Figure 13b, with a mean absolute percentage error (MAPE) of 7.6% according to Equation (27):

MAPE = (100/k) Σ_{i=1}^{k} |(A_i − F_i)/A_i|,  (27)

where k = 140 is the total number of elements, A_i is the actual observed value, and F_i is the value predicted by the simulation.
Figure 12b details the comparison between the exact value and the results of 25 consecutive values from row 40 of the kernel matrix, obtained on the same quantum computer.
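As a reference, Equation (27) translates directly into a few lines of NumPy (our transcription, not the authors' code).

```python
# Mean absolute percentage error, as in Equation (27).
import numpy as np

def mape(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))
```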
Figure 14a shows another 140 random evaluations, different from the previous ones, performed on another quantum computer, IBM Osaka, for the PD_BREAK class combination, in order to verify the results in a different physical environment.
Figure 14b shows the comparison between the results obtained on the real IBM Osaka quantum computer and the simulation, with a runtime of 2 m 34 s on the IBM computer.
The results obtained on the three quantum computers were verified for the other three class combinations PD_NOPD, BREAK_NOPD, and PD_ARC, and are represented in
Figure 15,
Figure 16, and
Figure 17, respectively, along with the simulations of the kernel obtained with a classical computer.
The exact kernel results for the PD_NOPD, BREAK_NOPD, and PD_ARC class combinations are depicted in
Figure 15a,
Figure 16a, and
Figure 17a, respectively. As before, each kernel matrix is symmetric, so only the upper triangle of its 580 × 580 elements needs to be computed, requiring 168,200 evaluations of the circuit in
Figure 11 (Equation (26)), with each kernel value normalized between 0 and 1.
Figure 15b,
Figure 16b, and
Figure 17b show a comparative study between the exact kernel values and the values obtained with the IBM Osaka, IBM Brisbane, and IBM Kyoto computers, respectively, for row 40 and columns 0 to 24 (Pub 0 to Pub 24) of the matrices shown in
Figure 15a,
Figure 16a, and
Figure 17a, respectively. These matrices were generated using the library Qiskit [
28], with the Jupyter Notebook QKM_verification_two_qubits.ipynb available in the repository [
29].
As a final summary of the results obtained for two features,
Table 4 presents the accuracy of all the binary combinations using quantum kernel estimation with the two features x = [‘area-pixels’, ‘mean-coords-x’] encoded in two qubits, using a test image ratio of 20%. The accuracy values are 83% and 97% for the PD_ARC and PD_NOPD combinations, respectively. The execution times for the kernel calculation on the test items vary between 2587 s and 2639 s for these models.
Results for the QVM are presented in
Table 3 (see
Section 3). For the QVM, an accuracy of 92% was observed on the test set of 136 random images for the first binary combination, PD_NOPD, obtained consistently on the three quantum computers used, with a 1% error margin compared to the simulation. For the other three binary combinations, the test set accuracy was around 88%.
However, for QKM, using SVM, the test set accuracy for the PD_NOPD combination reached 97%, while for the other combinations, the average accuracy was 89% (
Table 4).
5.2. Kernel Estimation with Three and Eight Features
In this section, the improvement in accuracy when increasing the number of features is analyzed.
The features used for three qubits are represented in Equation (28). The test or validation set represents 80% of the data.
Table 5 shows the results of the four binary combinations, using an SVM with the quantum kernel to train and evaluate models with these three features and parameter
C1 = 1 in Equation (20). The results indicate that the accuracies on the test set for the binary combinations PD_NOPD, PD_BREAK, PD_ARC, and BREAK_NOPD are 99%, 85%, 84%, and 92%, respectively. The kernel matrix computation times for the training data vary between 157 s and 307 s, corresponding to a range of 129 to 173 evaluations. For the validation set, the times range between 1290 s and 3509 s, due to the higher number of evaluations.
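The classical SVM stage on top of the quantum kernel can be sketched with scikit-learn's precomputed-kernel interface (K_train and K_test denote the quantum kernel matrices; C corresponds to the parameter C1 = 1 mentioned above).

```python
# SVM trained on a precomputed (quantum) kernel matrix.
from sklearn.svm import SVC

svm = SVC(kernel="precomputed", C=1.0)
svm.fit(K_train, y_train)      # K_train: (n_train, n_train) kernel matrix
y_pred = svm.predict(K_test)   # K_test: (n_test, n_train) kernel matrix
```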
The features used for eight qubits are expressed in Equation (29). The test or validation set represents 80% of the data.
Table 6 shows the results of the four binary combinations, using an SVM with the quantum kernel to train and evaluate models with these eight features. The results indicate that the accuracies on the test set for the binary combinations PD_NOPD, PD_BREAK, PD_ARC, and BREAK_NOPD are 99%, 94%, 99%, and 98%, respectively. The kernel matrix computation times for the training data vary between 294 s and 531 s, corresponding to a range of 129 to 173 evaluations. For the validation set, the times range between 2352 s and 4229 s, due to the higher number of evaluations.
A comparison of
Table 5 and
Table 6 shows how increasing the number of features affects both the accuracy and the execution time of the SVM models with the kernel. When using more features (eight instead of three), an improvement in accuracy is observed, especially on the test set, suggesting that the model generalizes better with new data. However, this improvement leads to an increase in execution times as more features are added.
6. Overall Flowchart
This section presents an overall flowchart (Figure 18) that summarizes the main steps followed in this article to obtain the results shown in Section 2, Section 3, Section 4, and Section 5. We believe this makes the procedure easier to understand.
A detailed description of each step of the process is included in the README of the Zenodo repository [
29], along with links to the corresponding Jupyter Notebooks. To make the overall workflow easier to understand, a summary description of the flowchart in four steps is given below.
Flowchart Description:
The flowchart is generated by running the Binary_features_generation.ipynb file and can be viewed directly in the repository. It graphically represents the four main steps of this experimental process.
Step A: Feature Generation
The name of the Jupyter Notebook is Binary_features_generation.ipynb.
It presents the following subsections: image visualization, feature extraction, and binary concatenation of files in csv format. Folders /IMAGES/ and /FEATURE_RESULTS/ are referenced, and the feature_*.csv files are introduced.
Step B: Optimization of QVM Parameters
The name of the Jupyter Notebook is FIT_DP_NODP_CIRCUIT.ipynb.
It presents the following subsections: library import, function definition, data loading, normalization, quantum circuit definition, cost function, COBYLA optimization, and model evaluation. Reference is made to
Table 1 and to the optimal parameters stored in the variable named opt_var.
Step C: Verification on Quantum Hardware/Simulation
The name of the Jupyter Notebook is QVM_verification_two_qubits.ipynb.
It presents the following subsections: environment setup, backend selection, data and parameter loading, circuit definition, transpilation, circuit execution, and results analysis. Reference is made to
Table 3 and the results comparison graphs.
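The hardware-execution pattern of this step can be sketched as follows (qiskit-ibm-runtime API as of recent versions; the backend name, the number of shots, and the saved account are assumptions, and qc denotes a circuit ending in measure_all()).

```python
# Backend selection, transpilation, and execution on IBM quantum hardware.
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager
from qiskit_ibm_runtime import QiskitRuntimeService, SamplerV2 as Sampler

service = QiskitRuntimeService()                # uses the saved IBM account
backend = service.backend("ibm_brisbane")       # or ibm_osaka / ibm_kyoto
pm = generate_preset_pass_manager(optimization_level=1, backend=backend)
isa_circuit = pm.run(qc)                        # transpile to the device gate set
sampler = Sampler(mode=backend)
job = sampler.run([isa_circuit], shots=4096)
counts = job.result()[0].data.meas.get_counts() # 'meas' register from measure_all()
```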
Step D: Execution of QKM Model
The name of the Jupyter Notebook is QKM_verification_two_qubits.ipynb.
It presents the following subsections: execution of the QKM model on real quantum computers, the quantum kernel estimation algorithm, and SVM with QKM for two, three, and eight qubits.