1. Introduction
The rapid advancement of quantum computing has spurred significant interest in quantum machine learning (QML) as a promising approach to leveraging quantum algorithms for machine learning tasks [1]. Quantum computing models such as quantum support vector machines [2] and quantum kernel methods [3] have gained traction due to their solid theoretical foundations and potential computational advantages. While these models are often favored for their interpretability, their precision can be limited, particularly in the noisy intermediate-scale quantum era. In contrast, quantum neural networks (QNNs) have emerged as powerful tools capable of handling large-scale datasets and offering superior performance [4,5,6]. However, the increased complexity of QNNs presents a unique challenge in balancing interpretability with model performance.
QNN development has paralleled advancements in classical AI [4,7,8,9,10,11]. Despite the potential benefits QNNs offer, their internal processes remain obscure, raising questions about their learning mechanisms and their ability to assimilate knowledge for prediction generation. Unlike classical models, where interpretability can often be traced to factors such as linear weights or decision rules [12], quantum models are inherently more opaque. Quantum phenomena such as superposition and entanglement complicate interpretability, presenting barriers to their application in critical domains where transparency, reliability, and trust are essential.
The opacity of QNNs parallels concerns in classical machine learning, where deep neural networks (DNNs) have been criticized for their lack of transparency. DNNs, often labeled as “black boxes”, have raised concerns in sensitive fields such as healthcare, finance, and autonomous systems, where understanding model decisions is crucial [13,14,15,16]. Consequently, researchers have focused on developing explainable artificial intelligence (XAI) techniques to provide insights into the inner workings of DNNs, enhancing their interpretability, trustworthiness, and compliance with regulatory standards [17,18,19,20,21].
In remote sensing, the volume and complexity of data generated by modern sensors pose significant challenges for data analysis and interpretation. QNNs have the potential to process large-scale remote sensing data more efficiently than classical models, but their opacity hinders their adoption in critical applications where interpretability is essential [22,23,24,25]. For instance, in disaster response scenarios, understanding why a model predicts certain areas as high risk is crucial for decision-makers. Similarly, in environmental monitoring, transparent models are needed to justify actions based on detected changes in land use or vegetation. Beyond remote sensing, other domains also benefit from explainable quantum models. In healthcare, QNNs could potentially analyze complex genomic data to identify disease markers, but clinicians require interpretable models to trust and act upon such predictions. In finance, quantum models might detect subtle patterns in market data for investment strategies, yet explainability is necessary to comply with regulatory standards and gain user trust.
This paper advocates for the development of explainable quantum artificial intelligence (XQAI) to bridge the gap between quantum model performance and interpretability. XQAI aims to provide clear and interpretable insights into the predictions of QNNs, similar to the XAI techniques used for classical models. By enhancing the understanding of QNNs, XQAI can guide the future of quantum technologies and their application across a range of fields, from finance to healthcare.
QNNs, however, present distinct challenges. For example, they are prone to barren plateau problems [26], which make optimization difficult by flattening the loss landscape. Yet a careful selection of QNN architecture and cost functions [27,28,29] can mitigate such issues. Moreover, efforts to make QNNs more interpretable must account for the specific quantum characteristics of the model and the dataset being used.
This paper contributes to advancing XQAI by exploring a comprehensive set of explainability methods applied to QNNs, integrating both example-based and feature-based approaches. These methods provide interpretable insights into QNN predictions, offering new tools to enhance the trustworthiness of quantum models.
Figure 1 illustrates the fundamental concepts and differences between feature-based and instance-based interpretability methods. The trained quantum neural network is capable of processing various types of data inputs, including quantum data and classical data. The figure also summarizes the related methods and evaluation metrics of the explainable quantum artificial intelligence approaches proposed in this work. Among these, the methods in green font (occlusion analysis, deletion diagnostics) represent approaches that require modifying the model input or retraining the model. In contrast, the methods in blue font (gradient-based interpretability methods, influence functions) represent interpretability techniques that do not require changing the model input and rely solely on the model’s internal gradient information.
The primary contributions of this paper are as follows:
A comprehensive analysis of a diverse set of explainability methods is conducted, selectively applying appropriate techniques to elucidate QNN models. By integrating both example-based and feature-based approaches, this study broadens the range of explanation techniques available in quantum machine learning.
A quantitative evaluation of the proposed feature-based explanation techniques through the probability drop curve is provided. The findings reveal that QNNs are generally more challenging to explain than classical neural networks (NNs), highlighting the unique complexities posed by quantum models in explainable AI.
The study also demonstrates that QNNs exhibit higher sensitivity to data compared to classical models. The interdependence of features in QNNs, due to quantum phenomena such as entanglement, makes them more reliant on complete data, thus emphasizing the need for robust and interpretable explanation techniques to better understand how quantum models process information.
The remainder of this paper is structured as follows. In Section 2, we introduce key concepts related to QNNs, explanation methods, and deep neural networks. Appendix A reviews the relevant literature. In Section 3, we discuss suitable explanation methods for quantum machine learning models and detail the metrics used to assess them. In Section 4, we present experimental results, followed by concluding remarks and suggestions for future work in Section 5.
2. Preliminary
2.1. Quantum Neural Networks
Quantum neural networks are a class of quantum algorithms that leverage quantum computation to process information and perform machine learning tasks. QNNs are implemented as hybrid quantum–classical algorithms, utilizing both quantum and classical resources. These algorithms consist of a quantum subroutine that evaluates an objective function and a classical subroutine that optimizes the parameters based on the quantum output. Hybrid quantum–classical algorithms are more resilient to noise and limitations in current quantum devices, as they require fewer quantum resources and employ shorter-depth circuits compared to fully quantum algorithms.
Qubits can be physically realized using various systems, such as superconducting circuits [30] and trapped ions [31]. Superconducting qubits, like transmons, are implemented using Josephson junctions, where quantum states are manipulated using microwave pulses. Trapped-ion qubits use atomic ions confined by electromagnetic fields, with quantum operations performed using laser pulses. These physical systems support the quantum gates and operations used in QNNs, ensuring coherent quantum state evolution.
For a QNN with $n$ qubits, the input is a quantum state $|\psi(x)\rangle$, typically prepared by encoding classical data $x$ into a quantum state through an encoding circuit $E(x)$. The parameterized quantum circuit (PQC) $U(\boldsymbol{\theta})$ with $L$ layers is a sequence of quantum gates, typically consisting of single-qubit rotations and two-qubit entangling gates. In the hardware-efficient ansatz, $U(\boldsymbol{\theta})$ is constructed as a product of parameterized single-qubit gates and entangling operations as follows:
$$U(\boldsymbol{\theta}) = \prod_{i=1}^{L} \left[\, W \prod_{k=1}^{n} R_y\!\big(\theta^{y}_{i,k}\big)\, R_z\!\big(\theta^{z}_{i,k}\big) \right],$$
where $R_z(\theta^{z}_{i,k})$ and $R_y(\theta^{y}_{i,k})$ represent parameterized single-qubit rotations around the z-axis and y-axis, respectively, for the k-th qubit in the i-th layer. The entangling operations are represented by $W$, typically consisting of controlled-NOT (CNOT) gates applied between neighboring qubits or according to some predefined topology. This ansatz allows the QNN to learn complex quantum correlations with relatively shallow circuits, making it suitable for near-term quantum devices.
The output of the PQC is the transformed quantum state $|\psi_{\text{out}}\rangle = U(\boldsymbol{\theta})\,|\psi(x)\rangle$. After processing the input state, a quantum measurement is performed to extract classical information from the output state. The measurement yields the expectation value $\langle O \rangle = \langle \psi_{\text{out}} | O | \psi_{\text{out}} \rangle$, with $O$ being the observable, which varies depending on the label under consideration. This result is used to estimate the label $y$ associated with the input data.
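As a concrete illustration of such a circuit, the following is a minimal sketch written with the PennyLane library; the qubit count, layer count, ring-shaped CNOT entangler, and the particular $R_z$/$R_y$ rotation pattern are illustrative assumptions rather than the exact configuration used in the experiments.

```python
import pennylane as qml
import numpy as np

n_qubits, n_layers = 4, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def qnn(x, theta):
    # Gate encoding: one R_y rotation per qubit, angle given by an input feature.
    for k in range(n_qubits):
        qml.RY(x[k], wires=k)
    # Hardware-efficient ansatz: parameterized single-qubit rotations
    # followed by a ring of CNOT entanglers, repeated L times.
    for i in range(n_layers):
        for k in range(n_qubits):
            qml.RZ(theta[i, k, 0], wires=k)
            qml.RY(theta[i, k, 1], wires=k)
        for k in range(n_qubits):
            qml.CNOT(wires=[k, (k + 1) % n_qubits])
    # Measurement: probabilities over the first two qubits,
    # interpreted as scores for four classes.
    return qml.probs(wires=[0, 1])

x = np.random.uniform(0, 2 * np.pi, n_qubits)                 # encoded features
theta = np.random.uniform(0, 2 * np.pi, (n_layers, n_qubits, 2))
print(qnn(x, theta))   # probability distribution over 4 computational basis states
```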
Entanglement plays a fundamental role in quantum algorithms and has been crucial in achieving quantum speedups. For instance, the Deutsch–Jozsa algorithm [32], which provides exponential speedup over classical algorithms, relies on quantum superposition and entanglement to evaluate Boolean functions efficiently. Shor's algorithm for factoring large integers [33] and Grover's search algorithm [34] also demonstrate the power of quantum entanglement and non-local operations, where multiple quantum states can be processed simultaneously, leveraging quantum parallelism. Similarly, the Harrow–Hassidim–Lloyd (HHL) algorithm [35] uses quantum entanglement to solve linear systems exponentially faster, leading to advances in quantum machine learning applications such as quantum support vector machines (QSVMs) [2] and quantum principal component analysis (QPCA) [36].
In the context of QNNs, quantum entanglement and non-local operations also play a key role in the network's ability to capture and process complex data patterns. These quantum phenomena allow QNNs to create correlations between qubits that cannot be replicated by classical models, which is particularly beneficial for high-dimensional and quantum-native datasets. Although QNNs may not yet have the same rigorous theoretical guarantees of quantum advantage as algorithms like Shor's, quantum entanglement and non-local operations remain essential for increasing the expressiveness and efficiency of QNNs, enabling them to leverage quantum superposition and non-classical correlations to solve certain tasks more effectively than their classical counterparts [3,37,38].
2.2. Data Encoding and Two-Qubit Operations
To process classical data using a QNN, it must first be encoded into a quantum state. This can be accomplished using various encoding methods, each offering distinct advantages and limitations. We briefly describe three commonly used encoding methods in the experiments.
Amplitude encoding encodes classical data into the amplitudes of a quantum state, requiring a number of qubits logarithmic with respect to the input data size. This makes it highly efficient for high-dimensional data. However, it is challenging to implement in practice due to the computational cost of state preparation. Furthermore, calculating gradients with respect to the input data, as required by gradient-based methods, is non-trivial.
In gate encoding, classical data are encoded into the parameters of Pauli rotation gates applied to a fixed initial quantum state, such as the computational basis state. This method offers flexibility and expressiveness but may require a large number of gates to represent complex data structures.
Figure 2 illustrates data encoding on a three-qubit system using the $R_y$ rotation gate. For a vector of length $3N$, $N$ layers of interleaved rotation gates and two-qubit entanglement gates are required. The Pauli rotation gates $R_\sigma(\theta) = e^{-i\theta\sigma/2}$ have a period of $4\pi$, and their cyclic behavior ($R_\sigma(\theta + 2\pi) = -R_\sigma(\theta)$ for any Pauli operator $\sigma$) makes the expected value of an observable periodic with $2\pi$. Hence, input features should be normalized to the range $[0, 2\pi]$ using Min-Max scaling.
Data re-uploading is a technique that encodes classical data by applying a series of gate encodings multiple times with intermediate parameter updates [39]. As shown in Figure 2, data are encoded using the same mechanics as gate encoding, and the circuit is applied $R$ times with trainable parameters. Re-uploading the data multiple times increases the expressiveness of the quantum circuit, potentially improving classification performance. However, this approach introduces additional computational overhead due to the repeated encoding. A summary of these encoding methods, highlighting their advantages, disadvantages, and use cases, is provided in Table 1.
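A minimal sketch of the re-uploading structure is given below, again in PennyLane style; the choice of $R_y$ encoding, nearest-neighbor CNOT entanglers, and a single trainable rotation per qubit per repetition are simplifying assumptions for illustration.

```python
import pennylane as qml
import numpy as np

n_qubits, n_reps = 3, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def reuploading_qnn(x, theta):
    # Repeat (encode -> trainable block) R times; each repetition
    # re-introduces the classical features into the circuit.
    for r in range(n_reps):
        for k in range(n_qubits):
            qml.RY(x[k], wires=k)             # gate encoding of the data
        for k in range(n_qubits):
            qml.RY(theta[r, k], wires=k)      # trainable rotation
        for k in range(n_qubits - 1):
            qml.CNOT(wires=[k, k + 1])        # nearest-neighbor entangler
    return qml.expval(qml.PauliZ(0))

x = np.random.uniform(0, 2 * np.pi, n_qubits)
theta = np.random.uniform(0, 2 * np.pi, (n_reps, n_qubits))
print(reuploading_qnn(x, theta))
```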
This study compares three layouts of two-qubit operations: nearest-neighbor (NN), circuit-block (CB), and all-to-all (AA) layouts, as illustrated in Figure 3, following an approach similar to that proposed in [38].
2.3. Training and Optimization of QNNs
The objective of training a QNN is to optimize the parameters $\boldsymbol{\theta}$ in the PQC to minimize a loss function $\mathcal{L}$. The parameters $\boldsymbol{\theta}$ represent the trainable weights that control the quantum gate operations within the PQC. In hardware-efficient ansätze, $\boldsymbol{\theta}$ typically refers to the angles of rotation gates like $R_x$, $R_y$, and $R_z$, and these parameters are usually restricted to the range $[0, 2\pi)$ to reflect the periodicity of the quantum operations.
The loss function $\mathcal{L}$ measures the difference between the predicted outcome $\hat{y}$ and the true outcome $y$. For classification tasks, the cross-entropy loss is commonly used. It is defined as
$$\mathcal{L}_{\text{CE}} = -\sum_{i} y_i \log \hat{y}_i,$$
where $\hat{y}_i$ is the predicted probability of class $i$, and $y_i$ is the true label. For regression tasks, the mean squared error loss is typically used, while for quantum-specific tasks, such as unitary learning, the fidelity distance is often employed to measure the similarity between predicted and actual quantum states.
Training a QNN requires optimizing the parameters $\boldsymbol{\theta}$ to minimize the selected loss function. This is achieved through optimization algorithms like gradient-based or zero-order methods. For a dataset $\{(x_j, y_j)\}_{j=1}^{m}$, the objective is to minimize the empirical risk:
$$\min_{\boldsymbol{\theta}} \; \frac{1}{m} \sum_{j=1}^{m} \mathcal{L}\big(f_{\boldsymbol{\theta}}(x_j), y_j\big).$$
In gradient-based optimization, the gradient of the loss function with respect to the parameters is calculated using techniques such as the parameter-shift rule [41,42], which allows for estimating gradients of quantum expectation values using only a fixed number of quantum evaluations. The parameters are updated iteratively using optimizers like gradient descent or its variants (e.g., Adam [43]):
$$\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_{t} - \eta\, \nabla_{\boldsymbol{\theta}} \mathcal{L}(\boldsymbol{\theta}_{t}),$$
where $\eta$ is the learning rate and $t$ denotes the iteration step.
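The following sketch illustrates the parameter-shift rule and a plain gradient-descent step, assuming each parameter enters through a standard Pauli rotation (so the $\pm\pi/2$ shift applies) and that `expval_fn` is a callable returning the measured expectation value for a given parameter array; the function names are illustrative, not part of any library.

```python
import numpy as np

def parameter_shift_grad(expval_fn, theta, shift=np.pi / 2):
    """Estimate the gradient of <O>(theta) entry-by-entry via the parameter-shift rule:
    d<O>/d(theta_j) = [ <O>(theta_j + pi/2) - <O>(theta_j - pi/2) ] / 2."""
    flat = theta.ravel()
    grad = np.zeros(flat.size)
    for j in range(flat.size):
        plus, minus = flat.copy(), flat.copy()
        plus[j] += shift
        minus[j] -= shift
        grad[j] = 0.5 * (expval_fn(plus.reshape(theta.shape))
                         - expval_fn(minus.reshape(theta.shape)))
    return grad.reshape(theta.shape)

def gd_step(expval_fn, theta, eta=0.1):
    """One vanilla gradient-descent update theta <- theta - eta * grad."""
    return theta - eta * parameter_shift_grad(expval_fn, theta)
```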
Decoherence is a significant challenge in QNNs, as the loss of quantum coherence can reduce the fidelity of the quantum states, leading to suboptimal or incorrect outputs. Non-ideal unitary gate operations, which occur due to hardware noise, further exacerbate this issue by introducing errors during the execution of quantum circuits. These factors complicate the training of QNNs, requiring error mitigation techniques and the design of noise-resilient quantum algorithms to maintain model accuracy in noisy environments [44].
While we have briefly touched on the impact of quantum noise on interpretability in this work, we recognize that a more detailed exploration is necessary. Future research will focus on the interplay between quantum noise and explainability, along with the development of noise-mitigation strategies to ensure the robustness and transparency of QNN models in practical applications.
2.4. Deep Neural Networks
Deep neural networks are a class of artificial neural networks composed of multiple layers of interconnected neurons. A DNN can be mathematically represented as a composition of several non-linear functions, with each function corresponding to a layer in the network.
Consider a DNN with $L$ hidden layers. Each layer can be represented by a function $f^{(l)}$, where $l = 1, \dots, L$. The output of each layer is computed as the weighted sum of inputs from the previous layer, followed by the application of a non-linear activation function. Let $x$ be the input vector, $W^{(l)}$ the weight matrix, and $b^{(l)}$ the bias vector for layer $l$. The output $h^{(l)}$ of layer $l$ can be expressed as
$$h^{(l)} = \sigma^{(l)}\big(W^{(l)} h^{(l-1)} + b^{(l)}\big),$$
where $h^{(0)} = x$ is the input to the first layer, and $\sigma^{(l)}$ represents the activation function for layer $l$. Common activation functions include sigmoid, hyperbolic tangent (tanh), and rectified linear unit (ReLU).

The output of the final layer, $L$, is passed through an output activation function or decision function to generate the final prediction, $y$. In classification tasks, a softmax function is often used to produce class probabilities:
$$y = \mathrm{softmax}\big(g(h^{(L)})\big),$$
where $g$ represents the output function, and softmax normalizes the output into a probability distribution over the classes.
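A minimal NumPy sketch of this forward pass (ReLU hidden layer, softmax output) is shown below; the layer sizes are arbitrary and chosen only for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())            # subtract max for numerical stability
    return e / e.sum()

def mlp_forward(x, weights, biases):
    """Forward pass h^(l) = sigma(W^(l) h^(l-1) + b^(l)), softmax on the last layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)
    logits = weights[-1] @ h + biases[-1]
    return softmax(logits)

# Illustrative 4-16-3 network (e.g., Iris: 4 features, 3 classes).
rng = np.random.default_rng(0)
weights = [rng.normal(size=(16, 4)), rng.normal(size=(3, 16))]
biases = [np.zeros(16), np.zeros(3)]
print(mlp_forward(rng.normal(size=4), weights, biases))
```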
Training a DNN involves minimizing a loss function, $\mathcal{L}(y, y^{*})$, which quantifies the discrepancy between the predicted output $y$ and the true target $y^{*}$. The goal is to adjust the network's weights and biases using an optimization algorithm, such as gradient descent. During training, the gradients of the loss function with respect to the parameters ($W^{(l)}$ and $b^{(l)}$) are computed using backpropagation, which applies the chain rule to efficiently calculate the necessary gradients.
In summary, a DNN is a composition of multiple non-linear functions, where each function corresponds to a layer in the network. It is trained using a loss function and optimization algorithm to minimize the error between predictions and target outputs.
3. Methods
Explanation methods vary in their applicability to QNNs. Some methods, like deletion diagnostics and influence functions, are effective with both classical and quantum datasets, making them suitable for example-based approaches. However, for feature-based methods, the situation is more complex. Techniques tailored for specific architectures, such as class activation mapping (CAM) for convolutional neural networks (CNNs), are not applicable to QNNs. Additionally, methods requiring information about intermediate quantum states, such as layer-wise relevance propagation, are impractical for QNNs due to the difficulty of calculating quantum state gradients.
Table 2 provides a summary of the explanation methods we will discuss in the following sections, categorized into feature-based and example-based approaches. We propose occlusion analysis and gradient-based explanation methods for QNNs, with the choice of data type and encoding method being crucial factors. For classical datasets, a QNN with gate encoding can be effectively analyzed using both occlusion and gradient-based methods. In contrast, amplitude encoding presents challenges for gradient-based methods due to the difficulty in computing gradients. For occlusion analysis, the lack of physical meaning when perturbing individual computational basis amplitudes further complicates its implementation. Similar challenges arise when deleting or perturbing the amplitudes of quantum data.
3.1. Feature-Based Explanation Methods
This section introduces two key feature-based methods: occlusion analysis and gradient-based approaches. Both provide valuable insights into the inner workings of deep learning models, each with distinct advantages and computational requirements.
3.1.1. Occlusion Analysis
Occlusion analysis involves systematically masking parts of the input and observing the resulting changes in the model’s output. By analyzing how the occlusion affects the model’s predictions, we can identify which regions of the input are most important for the model’s decision-making process.
For example, consider a two-dimensional input $x$ with dimensions $H \times W$. We use an occlusion window (mask) $M$ with dimensions $h \times w$, applied to the input with a stride $s$. This generates a set of modified inputs, $x_{(u,v)}$, where $(u, v)$ index the position of the occlusion window. For each occluded input $x_{(u,v)}$, we calculate the QNN model's output, typically as the probability over the target class $c$, $p_c\big(x_{(u,v)}\big) = \langle \psi(x_{(u,v)}; \boldsymbol{\theta}) | O_c | \psi(x_{(u,v)}; \boldsymbol{\theta}) \rangle$, where $O_c$ is the observable for the target class, and $\boldsymbol{\theta}$ represents the model parameters.
The occlusion importance score $S(u, v)$ quantifies the impact of occlusion at position $(u, v)$ and is computed as the difference between the original output and the occluded output:
$$S(u, v) = p_c(x) - p_c\big(x_{(u,v)}\big).$$
Larger positive values of $S(u, v)$ indicate that the occluded region is more important for the model's decision-making process. Visualizing the occlusion importance scores as a heatmap provides a clearer understanding of which areas in the input contribute most to the model's prediction, enhancing interpretability.

The number of occluded inputs to be processed is determined by the size of the occlusion window and the stride. For an input of dimensions $H \times W$ and an occlusion window of size $h \times w$, the total number of occluded inputs is given by
$$N_{\text{occ}} = \left(\left\lfloor \frac{H - h}{s} \right\rfloor + 1\right)\left(\left\lfloor \frac{W - w}{s} \right\rfloor + 1\right).$$
Here, $s$ is the stride with which the occlusion window is shifted across the input. For example, if the stride is 1, the window is moved one pixel at a time, resulting in a higher number of occluded inputs. Increasing the stride reduces the number of occluded inputs but may lead to a less granular analysis of feature importance.
For each occluded input, the model's output must be computed. The complexity of each forward pass of the QNN is $O(C)$, where $C$ represents the number of measurements required to achieve a certain precision. Thus, the total complexity of occlusion analysis is $O(N_{\text{occ}} \cdot C)$. This complexity depends on the input dimensions, the size of the occlusion window, the stride, and the number of measurements needed by the QNN. Consequently, this approach can be computationally expensive, especially for large inputs. Parallelizing the process can help mitigate the computational cost. On the other hand, gradient-based methods typically require one or a few backward passes and generally offer better computational efficiency.
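A minimal sketch of occlusion analysis for a two-dimensional input is given below; `predict_fn` stands for any callable returning the target-class probability (a QNN measurement probability or a classical softmax output), and the zero masking baseline is an assumption made for illustration.

```python
import numpy as np

def occlusion_map(predict_fn, x, window=2, stride=1, baseline=0.0):
    """Occlusion importance scores S(u, v) = p_c(x) - p_c(x_(u,v)) for a 2-D input."""
    H, W = x.shape
    base_prob = predict_fn(x)
    n_rows = (H - window) // stride + 1
    n_cols = (W - window) // stride + 1
    scores = np.zeros((n_rows, n_cols))
    for u in range(n_rows):
        for v in range(n_cols):
            occluded = x.copy()
            # Replace the occluded patch by the baseline value.
            occluded[u * stride:u * stride + window,
                     v * stride:v * stride + window] = baseline
            scores[u, v] = base_prob - predict_fn(occluded)
    return scores   # larger positive values mark more important regions
```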
3.1.2. Gradient-Based Explanation Methods
Gradient-based explanation methods leverage the gradients of the model's output with respect to the input features to provide explanations. The saliency map method is introduced here, with other gradient-based methods (e.g., gradient input multiplication, integrated gradients, and SmoothGrad) discussed in Appendix C.

Given a QNN model $f$ mapping an input $x$ to a scalar output, the goal is to explain the model's prediction for a specific input $x$. The saliency map [45] is obtained by computing the gradients of the output with respect to the input features:
$$S(x) = \left| \frac{\partial f(x)}{\partial x} \right|.$$
The resulting saliency map is a matrix with the same dimensions as the input, representing the importance of each feature in the model's prediction.

By visualizing the saliency map as a heatmap superimposed on the input, one can intuitively understand the contribution of each feature to the model's decision. Regions with higher intensity in the saliency map correspond to features with a greater impact on the model's output, while lower-intensity areas indicate less important features.
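The sketch below approximates the saliency map with central finite differences, assuming `predict_fn` returns the scalar model output; for gate-encoded QNNs the same input gradients can in principle be obtained exactly via the parameter-shift rule applied to the encoding angles.

```python
import numpy as np

def saliency_map(predict_fn, x, eps=1e-4):
    """Numerical saliency map |d f / d x_j| for a scalar-output model."""
    grad = np.zeros_like(x, dtype=float)
    flat_x = x.ravel()
    flat_g = grad.ravel()
    for j in range(flat_x.size):
        x_plus, x_minus = flat_x.copy(), flat_x.copy()
        x_plus[j] += eps
        x_minus[j] -= eps
        # Central finite difference of the model output w.r.t. feature j.
        flat_g[j] = (predict_fn(x_plus.reshape(x.shape))
                     - predict_fn(x_minus.reshape(x.shape))) / (2 * eps)
    return np.abs(grad)   # same shape as the input; visualize as a heatmap
```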
3.2. Example-Based Explanation Methods
3.2.1. Deletion Diagnostics
To define the influence of a specific training data point for a QNN, we consider the loss function $\mathcal{L}(z, \boldsymbol{\theta})$, which quantifies the discrepancy between the true labels and the QNN's predictions on a data point $z = (x, y)$. A standard training scheme assigns equal weight to each training example, with the objective function given by
$$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(z_i, \boldsymbol{\theta}).$$
To study the influence of each training data point on test samples, we modify the objective by re-weighting it with a vector $w = (w_1, \dots, w_m)$:
$$\hat{\boldsymbol{\theta}}_{w} = \arg\min_{\boldsymbol{\theta}} \sum_{i=1}^{m} w_i\, \mathcal{L}(z_i, \boldsymbol{\theta}),$$
where $w_i$ is the weight assigned to each training data point, with $w_i = 1/m$ representing the standard training scheme. Denote $\hat{\boldsymbol{\theta}}_{w}$ as the model parameter retrained from scratch by minimizing the re-weighted objective.

Using deletion diagnostics, the influence of the training data point $z_i$ on a test data point $z_{\text{test}}$ is defined as
$$\mathcal{I}_{\text{del}}(z_i, z_{\text{test}}) = \mathcal{L}\big(z_{\text{test}}, \hat{\boldsymbol{\theta}}_{w^{(-i)}}\big) - \mathcal{L}\big(z_{\text{test}}, \hat{\boldsymbol{\theta}}_{w}\big),$$
where $w^{(-i)}$ assigns zero weight to the $i$-th data point. If $z_i$ has a positive influence on $z_{\text{test}}$, the loss under $\hat{\boldsymbol{\theta}}_{w^{(-i)}}$ is generally larger than that under $\hat{\boldsymbol{\theta}}_{w}$.
The process of retraining the model for each data point can be computationally prohibitive for large datasets, as it requires training multiple models from scratch.
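A minimal sketch of deletion diagnostics via leave-one-out retraining is shown below; `train_fn` and `loss_fn` are placeholders for the (quantum or classical) training routine and per-example loss, and `train_set` is assumed to be a Python list of data points.

```python
import numpy as np

def deletion_diagnostics(train_fn, loss_fn, train_set, test_point):
    """Influence of each training point via leave-one-out retraining.

    train_fn(dataset) -> fitted model parameters
    loss_fn(params, data_point) -> scalar loss
    Returns L(z_test, theta_{-i}) - L(z_test, theta_full) for every i.
    """
    theta_full = train_fn(train_set)
    base_loss = loss_fn(theta_full, test_point)
    influences = []
    for i in range(len(train_set)):
        reduced = train_set[:i] + train_set[i + 1:]   # drop the i-th point
        theta_minus_i = train_fn(reduced)             # retrain from scratch
        influences.append(loss_fn(theta_minus_i, test_point) - base_loss)
    return np.array(influences)   # positive => removing z_i hurts the test prediction
```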
3.2.2. Influence Function
The influence function, introduced by [46], provides an infinitesimal approximation to bypass the computational difficulties of deletion diagnostics. Assume that $\mathcal{L}$ is differentiable with respect to $\boldsymbol{\theta}$ at $\hat{\boldsymbol{\theta}}_{w}$, and that $\mathcal{L}$ is twice differentiable in $\boldsymbol{\theta}$. The influence of $z_i$ on the test data $z_{\text{test}}$ is defined as
$$\mathcal{I}(z_i, z_{\text{test}}) = \frac{\mathrm{d}}{\mathrm{d}\epsilon}\, \mathcal{L}\big(z_{\text{test}}, \hat{\boldsymbol{\theta}}_{w + \epsilon e_i}\big)\Big|_{\epsilon = 0},$$
where $\hat{\boldsymbol{\theta}}_{w + \epsilon e_i}$ is the minimizer of the objective re-weighted by $w + \epsilon e_i$, with $e_i$ being the $i$th standard basis vector in the space of $w$. Here, $\epsilon$ is a small perturbation, analyzed at zero for local behavior.

Applying the implicit function theorem yields the following:
$$\frac{\mathrm{d}\hat{\boldsymbol{\theta}}_{w + \epsilon e_i}}{\mathrm{d}\epsilon}\Big|_{\epsilon = 0} = -H_{\hat{\boldsymbol{\theta}}}^{-1}\, \nabla_{\boldsymbol{\theta}} \mathcal{L}\big(z_i, \hat{\boldsymbol{\theta}}\big),$$
where $H_{\hat{\boldsymbol{\theta}}} = \frac{1}{m}\sum_{j=1}^{m} \nabla_{\boldsymbol{\theta}}^{2} \mathcal{L}(z_j, \hat{\boldsymbol{\theta}})$ is the Hessian, or its quantum equivalent, and $\nabla_{\boldsymbol{\theta}} \mathcal{L}(z_i, \hat{\boldsymbol{\theta}})$ is the loss gradient at the $i$-th training point. Estimating $H_{\hat{\boldsymbol{\theta}}}^{-1}$ in the quantum context can be challenging, but this approach offers computational benefits over retraining. Thus, the influence function is simplified to
$$\mathcal{I}(z_i, z_{\text{test}}) = -\nabla_{\boldsymbol{\theta}} \mathcal{L}\big(z_{\text{test}}, \hat{\boldsymbol{\theta}}\big)^{\top} H_{\hat{\boldsymbol{\theta}}}^{-1}\, \nabla_{\boldsymbol{\theta}} \mathcal{L}\big(z_i, \hat{\boldsymbol{\theta}}\big).$$
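A minimal dense-matrix sketch of this simplified influence score is given below, assuming the parameter gradients and the (averaged) Hessian have already been estimated, e.g., via the parameter-shift rule; the damping term is a common practical regularization rather than part of the definition.

```python
import numpy as np

def influence(grad_test, grad_train_i, hessian, damping=1e-3):
    """Approximate influence  I(z_i, z_test) = -g_test^T H^{-1} g_i.

    grad_test, grad_train_i : loss gradients w.r.t. the parameters,
    hessian                 : averaged Hessian of the training loss,
    damping                 : small ridge term keeping H invertible, which also
                              helps when gradients/Hessians are estimated noisily.
    """
    H = hessian + damping * np.eye(hessian.shape[0])
    # Solve H v = g_i instead of forming H^{-1} explicitly.
    return -grad_test @ np.linalg.solve(H, grad_train_i)
```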
A limitation of influence functions is that they often identify outliers or mislabeled data as highly influential, making them suboptimal for explanations. To address this, the relative influence function distinguishes between global and local influence by assessing the local impact of an example on a prediction relative to its overall effect on the model [47]. The relative influence function is defined as
$$\mathcal{I}_{\text{rel}}(z_i, z_{\text{test}}) = \frac{\mathcal{I}(z_i, z_{\text{test}})}{\big\| H_{\hat{\boldsymbol{\theta}}}^{-1}\, \nabla_{\boldsymbol{\theta}} \mathcal{L}(z_i, \hat{\boldsymbol{\theta}}) \big\|}.$$
Liu et al. [48] discuss the relationship between the influence function and the quantum Fisher information matrix. For example, using the fidelity distance as the cost function, the quantum Hessian is equivalent to the quantum Fisher information matrix. The fidelity distance is defined as $d_F(|\psi\rangle, |\phi\rangle) = 1 - F(|\psi\rangle, |\phi\rangle)$, where $F(|\psi\rangle, |\phi\rangle) = |\langle \psi | \phi \rangle|^2$ represents the fidelity between two pure states. While approximating the quantum Hessian matrix $H_{\hat{\boldsymbol{\theta}}}$ is often difficult, techniques like those in [49] can be useful.
The method of encoding classical data into quantum states can also influence the impact of individual data points. Furthermore, the noisy nature of current quantum devices introduces potential errors in the computation of the influence function. Error mitigation techniques may be necessary to ensure accurate results. Despite these challenges, influence functions in QNNs can provide valuable insights into model behavior, helping to identify influential data points, detect anomalies, and improve the interpretability of quantum machine learning models.
While the influence function provides significant insights into the impact of individual data points, its computational complexity presents challenges, particularly as the number of qubits increases in larger QNN models. The core computational bottleneck lies in calculating the inverse of the Hessian matrix $H_{\hat{\boldsymbol{\theta}}}$, whose size grows quadratically with the number of parameters in the quantum circuit. For QNNs with many parameters, especially those with deep quantum circuits or multiple layers, computing $H_{\hat{\boldsymbol{\theta}}}^{-1}$ becomes prohibitively expensive in terms of both time and memory.
Moreover, the noisy nature of current quantum devices exacerbates the difficulty of accurately estimating gradients and Hessians. Noise introduces additional variance into the computation, further increasing the complexity of reliably calculating influence functions in practical quantum settings. To mitigate this, noise-resilient methods and error-mitigation strategies must be integrated into the calculation process. These may include techniques like Richardson extrapolation and error correction codes, which can reduce the impact of noise on the gradients and Hessians.
While influence functions offer a valuable method for explainability in QNNs, their applicability to larger models is limited by computational complexity and noise. Future research should focus on developing scalable and noise-tolerant techniques for calculating influence functions in large-scale quantum systems.
3.3. Metrics: Faithfulness
Achieving an unbiased assessment of an explanation's quality is critical in practice. However, evaluating explanations is often challenging due to the lack of universally accepted “ground truth” explanations. The concept of human explainability remains ambiguous and difficult to define [21]. Users' ability to comprehend explanations and discern underlying features may vary significantly depending on their expertise. For instance, a non-expert may prefer a simple visual representation, while an expert might seek a more detailed explanation with precise scientific terminology.
A key criterion for evaluating an explanation is how accurately and comprehensively it represents the local decision structure of the quantum neural network (QNN) model being analyzed. One practical approach to assess this is by examining how the removal of features identified by the explanation leads to a significant decrease in the model's predictive capabilities [50]. This method involves iteratively eliminating input features, starting with the most relevant, and monitoring changes in the model's output. The changes in prediction scores can be visualized as a score drop curve. This curve can be calculated for a single instance or averaged across an entire dataset to estimate the faithfulness of the explanation algorithm on a global scale.
In the case of QNNs, where the output is the probability of certain computational bases, the probability drop curve is used as a metric. Two key indicators on the curve warrant attention: (a) the steepness of the initial descent and (b) the minimum value reached by the curve. A steeper initial drop and a smaller minimum value suggest that the explanation method more faithfully reflects the network’s decision-making process.
An alternative evaluation metric involves taking the result of one explanation method as the ground truth and calculating the faithfulness of other methods relative to it. Since the result of occlusion analysis tends to be stable and can reflect true feature importance, we use it as the ground truth. We then calculate the rank correlation of feature order results from gradient-based methods. By comparing these rank correlations, we can evaluate the relative performance of various explanation methods and assess their ability to provide meaningful insights into the QNN’s decision-making process. This method helps researchers and practitioners select the most suitable explanation techniques for enhancing the interpretability and trustworthiness of quantum machine learning models.
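The sketch below computes a probability drop curve for a single instance and a Spearman rank correlation between two feature orderings (e.g., occlusion analysis as the reference versus a gradient-based method); the zero masking baseline and the function names are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

def probability_drop_curve(predict_fn, x, importance, baseline=0.0):
    """Mask features from most to least important and record the class probability."""
    order = np.argsort(importance.ravel())[::-1]      # most important first
    masked = x.astype(float).ravel().copy()
    curve = [predict_fn(masked.reshape(x.shape))]
    for idx in order:
        masked[idx] = baseline
        curve.append(predict_fn(masked.reshape(x.shape)))
    return np.array(curve)   # steep early drop => faithful explanation

def rank_agreement(reference_importance, candidate_importance):
    """Spearman rank correlation of two feature importance orderings."""
    rho, _ = spearmanr(reference_importance.ravel(), candidate_importance.ravel())
    return rho
```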
3.4. Interplay Between XQAI and Quantum Adversarial Machine Learning
XQAI methods and quantum adversarial examples [51] are closely linked in their shared goal of developing robust, transparent, and trustworthy quantum machine learning models. The connection between XQAI and adversarial examples lies in their respective roles in improving the understanding and resilience of quantum models.

XQAI methods seek to offer valuable insights into the complex mechanisms of quantum machine learning models. In contrast, adversarial examples are intentionally crafted inputs designed to mislead a machine learning model into making incorrect predictions or classifications. These examples exploit weaknesses in the model's decision boundaries, exposing vulnerabilities and demonstrating a lack of robustness. Adversarial examples have been shown to be effective against both quantum classifiers and DNNs [52,53,54,55,56].
3.4.1. Feature-Based Explanation Methods
As depicted in Figure 4, feature-based explanation methods can be used to generate adversarial examples. Experiments in Section 4 demonstrate that masking approximately 10% of pixels, selected based on feature-based explanation methods, results in a prediction probability smaller than a random guess. This suggests that adversarial examples can be created by targeting key features, as identified by explanation methods.
3.4.2. Example-Based Explanation Methods
While feature-based explanation methods can generate adversarial test examples that deceive models, adversarial training examples can also be crafted to manipulate predictions while remaining visually indistinguishable [57]. Influence functions can determine how to subtly perturb training data to maximize loss on targeted test examples, leading to flipped predictions.

For a target test data point $z_{\text{test}}$, an adversarial version of a training data point $z_i$ is initialized as $\tilde{z}_i = z_i$. Small perturbations are added iteratively, informed by the influence function $\mathcal{I}(\tilde{z}_i, z_{\text{test}})$, to maximize the loss on the test data while preserving its visual appearance:
$$\tilde{x}_i \leftarrow \tilde{x}_i + \alpha\, \mathrm{sign}\!\big(\nabla_{x} \mathcal{I}(\tilde{z}_i, z_{\text{test}})\big),$$
where $\alpha$ is a small step size and $\tilde{z}_i = (\tilde{x}_i, y_i)$ with $\nabla_{x} \mathcal{I}(\tilde{z}_i, z_{\text{test}}) = -\nabla_{\boldsymbol{\theta}} \mathcal{L}(z_{\text{test}}, \hat{\boldsymbol{\theta}})^{\top} H_{\hat{\boldsymbol{\theta}}}^{-1}\, \nabla_{x} \nabla_{\boldsymbol{\theta}} \mathcal{L}(\tilde{z}_i, \hat{\boldsymbol{\theta}})$. The model is retrained after each iteration. Though perturbations are small in pixel space, they have a significant impact in feature space.
This method is mathematically equivalent to previous gradient-based attacks on training sets [53,54] but requires fewer visually noticeable changes. It demonstrates how minimally perturbed adversarial training examples can be generated to manipulate model predictions on targeted test data. Models that rely heavily on a small number of highly influential data points are most vulnerable to such attacks. Evaluating the extent to which subtly perturbing training examples affects loss on targeted test examples helps gauge model robustness against data poisoning attacks.
The relationship between XQAI and adversarial examples highlights several critical aspects of quantum machine learning models. XQAI can expose weaknesses in a model’s decision-making process, revealing potential vulnerabilities that adversarial examples might exploit. This understanding of model weaknesses allows researchers to develop more robust models capable of resisting such attacks. Additionally, XQAI methods can be employed to detect and assess the impact of adversarial examples on model predictions by analyzing discrepancies in explanations generated for original versus adversarial inputs. These insights can guide the creation of effective defenses against adversarial attacks. Ultimately, ensuring the model’s resilience to adversarial examples while maintaining transparency in its decision-making process is essential for fostering trust in AI systems. XQAI contributes to this trust by offering interpretable insights into the internal workings of models, thereby supporting both robustness and transparency.
4. Experiment
In this section, we report an empirical investigation that uses XQAI to compare the performance and interpretability of QNNs and DNNs. Specifically, this study integrated explainability into QNNs for multi-class classification tasks using the widely recognized Iris and Digit datasets. The outcomes of the QNNs were compared with those of classical NNs. Models demonstrating exceptional performance were selected, and the results of applying both example-based and feature-based explanation methodologies are presented. To facilitate a quantitative comparison of feature-based explanation techniques, we propose using the probability drop curve and rank correlations as metrics to evaluate the faithfulness of the respective methods.
The QNN architecture used in this experiment comprises multiple layers of single-qubit rotation gates, parameterized by trainable weights, and entangling operators. Entanglement is achieved through two-qubit operations, specifically using CNOT gates. This study compares three layouts of two-qubit operations: nearest-neighbor (NN), circuit-block (CB), and all-to-all (AA) layouts, as illustrated in Figure 3. With CNOT gates fixed as the entangling operator, we investigated the performance of three Pauli rotation gates ($R_x$, $R_y$, and $R_z$). The results indicated that models employing $R_y$ gates outperformed those using $R_x$ gates, while models with $R_z$ gates failed to learn due to the commutation of $R_z$ with the CNOT gate. Consequently, $R_y$ was chosen as the default single-qubit rotation gate. The QNN's output is a probability distribution over the classes. The first few qubits are measured, with each category corresponding to a distinct computational basis state.
For comparison, we also implemented multilayer perceptron (MLP) and convolutional neural network (CNN) models. To ensure a fair comparison, classical feature selection techniques were not applied to reduce the number of features input into the QNN. Both QNNs and classical NNs were trained using the Adam optimizer, with a learning rate of 0.001 and a batch size of 4, for 100 epochs. Alternative numerical methods, such as genetic algorithms [58], can also be used for optimizing quantum systems.
4.1. Evaluating Explanation Methods on Classical Benchmarks
For the classical datasets, we used the following benchmark datasets:
Iris Dataset: This dataset contains 150 instances, each with four features representing the length and width of the sepal and petal of Iris flowers. There are three classes, each corresponding to a different species of Iris: setosa, versicolor, and virginica.
Digit Dataset: This dataset consists of 1797 instances, where each instance is an 8 × 8 grayscale image of a handwritten digit (0–9). For this experiment, we selected images of {0, 1, 2, 3} to perform a four-class classification task. The images are represented as 64-dimensional feature vectors.
The Iris and Digit datasets are publicly available and can be accessed through the UCI Machine Learning Repository and the Scikit-Learn library, respectively. Both datasets were divided into training (60%) and testing (40%) sets.
Table 3 and Table 4 provide comparative evaluations of model performance on the Iris and Digit datasets, ranked by test accuracy. The models evaluated include the Hardware Efficient Ansatz (HE), Re-uploading Hardware Efficient Ansatz (Re), MLP, and CNN. Data for the QNNs are gate-encoded using $R_y$ rotation gates. The Hardware Efficient Ansatz is applied to 6-qubit quantum circuits with $L$ layers. The Re-uploading Hardware Efficient Ansatz is applied to 6-qubit quantum circuits with $L$ layers and $R$ re-uploading repetitions. “Layout” refers to the layout of two-qubit operations, while “Cost” indicates the cost function employed: cross-entropy (CE) and fidelity distance (FD). “Train Acc” and “Test Acc” represent training and testing accuracies, respectively. Accuracy is calculated as the ratio of correct predictions to the total number of predictions, evaluated separately for the training and test datasets. This metric provides a straightforward measure of model performance across different architectures and datasets. For the QNN and classical NN models with the highest test accuracy on each dataset, we evaluate the quality of the explanations generated by our XQAI framework.
4.1.1. Example-Based Explanation Methods
In the analysis of the Digit dataset, as illustrated in
Figure 5, the relative influence function appears to provide more consistent and meaningful explanations compared to the standard influence function. This is particularly evident in the Re-uploading Hardware Efficient Ansatz, which outperforms the MLP in providing more reliable and interpretable top-5 examples. The superior performance of the relative influence function can be attributed to its ability to capture local influences in the context of the model’s decision-making process, distinguishing between the examples that truly impact the model’s predictions and those that might be misleading.
This result highlights the importance of selecting appropriate example-based methods, especially in quantum models like QNNs, where the decision boundaries may differ significantly from classical models. The ability of the Re-uploading Ansatz to better align its predictions with the test data through example-based methods suggests that quantum models might be more sensitive to specific influential data points, offering opportunities for more robust interpretations of quantum-based decisions.
4.1.2. Feature-Based Explanation Methods
The comparison of the five feature-based explanation methods, as shown in
Figure 6, reveals key differences in their ability to generate interpretable heatmaps. Occlusion analysis, gradient input multiplication, and integrated gradients produce clearer and more detailed heatmaps, successfully identifying the most important regions of the input that influence the model’s predictions. This indicates that these methods are better suited for capturing feature importance in both QNN and classical NN models.
In contrast, the saliency map and SmoothGrad methods generate less interpretable and more diffuse heatmaps, particularly on the Re-uploading Hardware Efficient Ansatz. This observation suggests that gradient-based methods like the saliency map, while computationally efficient, may struggle with noisy or less smooth decision boundaries in QNN models. The use of SmoothGrad, which averages gradients over multiple noisy versions of the input, does not seem to sufficiently address this issue, as the resulting heatmaps remain somewhat unclear.
4.1.3. Probability Drop Curves
The average probability drop curves, as presented in
Figure 7, provide a quantitative assessment of the five feature-based explanation methods. The sharp decline in probability when masking the most important features suggests that all five methods effectively identify influential features. Masking approximately 10% of the features leads to a probability drop that approaches random guessing, underscoring the high sensitivity of both QNNs and classical NNs to a small subset of critical features.
The minimum values of the curves, which are lower than the probability when all features are removed, offer further insights. This phenomenon can be attributed to the fact that the features are ranked by importance, with the most significant ones masked first. Masking these important features leads to a rapid decrease in probability. Conversely, removing the least significant features might sometimes slightly increase the probability, likely due to their negative contribution or noise-like influence. This finding highlights the importance of feature ranking in interpretability and suggests that not all features contribute positively to the model’s decision.
Furthermore, the results demonstrate that QNNs rely more heavily on complete data than MLPs, as indicated by the steeper probability drop when features are randomly removed. This steeper decline suggests that QNNs encode information more holistically, meaning the removal of even a few seemingly less significant features can have a more pronounced impact on the model’s output. The heightened sensitivity in QNNs can be explained by their more refined feature interactions. In QNNs, the relationships between features are often encoded in a complex, interdependent manner due to quantum phenomena like entanglement, which allows QNNs to capture more intricate patterns. While this refined feature interaction gives QNNs an advantage in leveraging all input features for prediction, it also makes them more vulnerable to the removal of any feature, as the intricate connections among features can be disrupted. This behavior suggests both a strength—through more sophisticated data encoding—and a potential limitation, as QNNs may be more susceptible to noise or missing data.
This analysis reinforces the need for robust interpretability methods, particularly for QNNs, where understanding the role of each feature is crucial to the model’s overall performance. Recognizing how QNNs utilize input features differently from classical models like MLPs can guide strategies to enhance their resilience and applicability in real-world scenarios.
5. Conclusions
In this paper, we present an XQAI framework that integrates explainability into QNNs for multi-class classification tasks. We evaluate the performance of QNNs and classical NNs using the Iris and Digit datasets. Our results demonstrate that certain QNN configurations achieve performance that is comparable to or even exceeds that of classical NNs. Through the application of example-based and feature-based explanation methods, we demonstrate the effectiveness of our XQAI framework in providing insights into both QNN and classical NN models. The probability drop curve, proposed as a metric for evaluating the faithfulness of feature-based explanation methods, offers a meaningful way to assess these techniques.
While this work demonstrates that adapting classical XAI techniques for QNNs provides useful insights, there are inherent limitations in applying these methods directly to quantum systems. Quantum-specific phenomena, such as entanglement and superposition, introduce complexities not encountered in classical models. This complicates the interpretation of certain features and requires further adaptation of classical techniques to account for the unique properties of quantum models. Future work should focus on developing XAI methods specifically designed for quantum neural networks, addressing the challenges posed by quantum phenomena.
Future research should explore additional datasets, model architectures, and explanation methods to further validate the efficacy of our XQAI framework. Moreover, the development of new metrics to evaluate the interpretability and faithfulness of explanation methods would advance the field of XQAI, both in quantum and classical machine learning.