1. Introduction
Quantum-inspired machine learning is a branch of machine learning that applies the mathematical formalism of quantum mechanics to devise novel algorithms. It has revealed how such algorithms can provide benefits even without the computational power of quantum computers equipped with many qubits. Some of these binary classifiers have been analyzed from a geometric perspective [1]. In this work, we implement some algorithms, based on quantum state discrimination, within a local approach in the feature space that takes into account only the elements close to the element to be classified. In particular, we perform multi-class classification directly (without resorting to binary classifiers) based on Helstrom discrimination, following an approach suggested by Blanzieri and Melgani [2], where an unlabeled data instance is classified by finding its $k$ nearest training elements and then running a support vector machine (SVM) over those $k$ elements. This local approach improves classification accuracy and motivates the integration with the quantum-inspired Helstrom classifier, since the latter can be interpreted as an SVM with linear kernel [3]. It has the potential to offer comparable performance at lower complexity because it uses few training points per test point.
The quantum-inspired classifiers require the encoding of the feature vectors into density operators and methods for estimating the distinguishability of quantum states, such as Helstrom state discrimination and the pretty-good measurement (PGM). Quantum-inspired machine learning has revealed how relevant benefits for machine learning problems can be obtained from quantum information theory even without employing quantum computers [4]. Moreover, as we show below, the PGM within our algorithms is more efficient than the one proposed in [4] in the case of multiple preparations of the same state, because it removes duplicates and null values in the encoding. Quantum-inspired methods are used in applications that solve industry-relevant problems related to finance, optimization and chemistry [5,6,7,8,9].
In the experimental part, we present a comparison of the performance of the local quantum-inspired classifiers against well-known classical algorithms in order to show that the local approach can be a valuable tool for increasing the performance of this kind of classifier.
In Section 2, we review the notion of quantum encoding of data vectors into density operators and quantum-inspired classification based on quantum state discrimination [10,11,12,13]. In Section 3, we use the $k$-nearest neighbors algorithm (kNN) as a procedure to restrict the training set to the nearest elements around the test elements, enabling the local execution of the quantum-inspired classifiers. In Section 4, we present and discuss some empirical results for evaluating the impact of locality in quantum-inspired classification, comparing the performance of the proposed algorithms to classical methods over benchmark datasets. Furthermore, we compare quantum-inspired classifiers with SVMs within the local approach. In Section 5, we give concluding remarks about the efficiency of local quantum-inspired classifiers.
2. Quantum-Inspired Classification
The first step of quantum-inspired classification is the quantum encoding, that is, any procedure to encode classical information into quantum states. In particular, we consider the encoding of data vectors into density matrices on a Hilbert space $\mathsf{H}$ whose dimension depends on the dimension of the input space. Density matrices are positive semidefinite operators $\rho$ such that $\operatorname{Tr}\rho=1$, and they are the mathematical objects used to describe the physical states of quantum systems. Pure states are all the density matrices of the form $\rho=|\psi\rangle\langle\psi|$, with $\|\psi\|=1$; they are the rank-1 projectors and can be directly identified with unit vectors up to a phase factor. Let $\rho$ be a density operator on a $d$-dimensional Hilbert space $\mathsf{H}$; it can be written in the following form:

$$\rho=\frac{1}{d}\left(I+\sqrt{\frac{d(d-1)}{2}}\,\mathbf{b}\cdot\boldsymbol{\sigma}\right),\qquad(1)$$

where $\boldsymbol{\sigma}=(\sigma_1,\dots,\sigma_{d^2-1})$ are the standard generators of the special unitary group $SU(d)$, also called generalized Pauli matrices, and $I$ is the $d\times d$ identity matrix. The vector $\mathbf{b}\in\mathbb{R}^{d^2-1}$, with $b_i=\sqrt{\frac{d}{2(d-1)}}\operatorname{Tr}(\rho\,\sigma_i)$, is the Bloch vector associated with $\rho$, which lies within the hypersphere of radius 1 in $\mathbb{R}^{d^2-1}$. For $d=2$, the qubit case, the density matrices are in bijective correspondence with the points of the Bloch sphere in $\mathbb{R}^3$, where the pure states are in one-to-one correspondence with the points of the spherical surface. For $d>2$, the points contained in the unit hypersphere of $\mathbb{R}^{d^2-1}$ are not in bijective correspondence with density matrices on $\mathsf{H}$, so the Bloch vectors do not form a ball but a complicated convex body. However, any vector within the sphere of radius $\frac{1}{d-1}$ gives rise to a density operator [14].
Complex vectors of dimension $n$ can be encoded into density matrices of an $(n+1)$-dimensional Hilbert space $\mathsf{H}$ in the following way:

$$\mathbf{x}\mapsto\rho_{\mathbf{x}}=|\mathbf{x}\rangle\langle\mathbf{x}|,\qquad |\mathbf{x}\rangle=\frac{1}{\sqrt{\|\mathbf{x}\|^2+1}}\left(\sum_{i=1}^{n}x_i|i\rangle+|n+1\rangle\right),\qquad(2)$$

where $\{|i\rangle\}_{i=1,\dots,n+1}$ is the computational basis of $\mathsf{H}$, identified with the standard basis of $\mathbb{C}^{n+1}$. The map defined in (2), called amplitude encoding, encodes $\mathbf{x}$ into the pure state $\rho_{\mathbf{x}}$, where the additional component of $|\mathbf{x}\rangle$ stores the norm of $\mathbf{x}$. Nevertheless, the quantum encoding can be realized in terms of the Bloch vectors, saving space resources. The improvement of memory occupation within the Bloch representation is evident when we take multiple tensor products $\rho_{\mathbf{x}}^{\otimes q}$ of a density matrix $\rho_{\mathbf{x}}$, constructing a feature map to enlarge the dimension of the representation space [1].
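As an illustration, the amplitude encoding (2) can be implemented in a few lines of numpy. This is a minimal sketch for real input vectors; the function name `amplitude_encode` is ours, not part of any library.

```python
import numpy as np

def amplitude_encode(x):
    """Map (2): append a component storing the norm information,
    normalize, and build the rank-1 density matrix rho_x = |x><x|."""
    x = np.asarray(x, dtype=float)
    psi = np.append(x, 1.0) / np.sqrt(x @ x + 1.0)  # unit vector in R^{n+1}
    return np.outer(psi, psi)

rho = amplitude_encode([0.5, -1.2, 3.0])
assert np.isclose(np.trace(rho), 1.0)      # valid density matrix
assert np.linalg.matrix_rank(rho) == 1     # pure state
```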
Quantum-inspired classifiers are based on quantum encoding of data vectors into density matrices, calculations of centroids and various criteria of quantum state distinguishability, such as Helstrom state discrimination, the pretty-good measurement [4,11] and the geometric construction of a minimum-error measurement [12]. Let us briefly recall the notion of quantum state discrimination. Given a set of arbitrary quantum states with respective a priori probabilities $R=\{(\rho_i,p_i)\}_{i=1,\dots,n}$, in general there is no measurement process that discriminates the states without errors, i.e., a collection $E=\{E_i\}_{i=1,\dots,n}$ of positive semidefinite operators such that $\sum_{i=1}^{n}E_i=I$, satisfying the following property: $\operatorname{Tr}(\rho_j E_i)=0$ when $i\neq j$ for all $i,j$. The probability of a successful state discrimination of the states in $R$ performing the measurement $E$ is:

$$P(E)=\sum_{i=1}^{n}p_i\operatorname{Tr}(\rho_i E_i).\qquad(3)$$

A complete characterization of the optimal measurement $E^{\ast}$ that maximizes the probability (3) for $n=2$ is due to Helstrom [10]. Let $\Lambda=p_1\rho_1-p_2\rho_2$ be the Helstrom observable, whose positive and negative eigenvalues are, respectively, collected in the sets $D_+$ and $D_-$. Consider the two orthogonal projectors:

$$P_{\pm}=\sum_{\lambda\in D_{\pm}}P_{\lambda},$$

where $P_{\lambda}$ projects onto the eigenspace of $\lambda$. The measurement $E^{\ast}=\{P_+,P_-\}$ maximizes the probability (3), which attains the Helstrom bound:

$$h_b=P(E^{\ast})=\frac{1}{2}\left(1+\operatorname{Tr}\left|p_1\rho_1-p_2\rho_2\right|\right).$$
Helstrom quantum state discrimination can be used to implement a quantum-inspired binary classifier with promising performances. Let $\{(\mathbf{x}_i,y_i)\}_{i=1,\dots,M}$ be a training set with $\mathbf{x}_i\in\mathbb{R}^n$, $y_i\in\{0,1\}$. Assume that, encoding the data points into quantum states by means of (2), one can construct the quantum centroids $\rho_0$ and $\rho_1$ of the two classes $C_0$ and $C_1$:

$$\rho_y=\frac{1}{|C_y|}\sum_{\mathbf{x}\in C_y}\rho_{\mathbf{x}},\qquad y=0,1.$$

Let $E^{\ast}=\{E_0,E_1\}$ be the Helstrom measurement defined by the set $R=\{(\rho_0,p_0),(\rho_1,p_1)\}$, where the probabilities attached to the centroids are $p_y=|C_y|/M$. The Helstrom classifier applies the optimal measurement for the discrimination of the two quantum centroids to assign the label $y$ to a new data instance $\mathbf{x}$, encoded into the state $\rho_{\mathbf{x}}$, as follows:

$$y=\arg\max_{y\in\{0,1\}}\operatorname{Tr}(\rho_{\mathbf{x}}E_y).$$
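A minimal numpy sketch of the Helstrom classifier described above, assuming the `amplitude_encode` routine defined earlier; the helper names are ours.

```python
import numpy as np

def quantum_centroid(states):
    """Mean of the encoded training states of one class."""
    return sum(states) / len(states)

def helstrom_classify(rho_x, rho0, rho1, p0, p1):
    """Spectral decomposition of the Helstrom observable p0*rho0 - p1*rho1;
    assign label 0 iff Tr(rho_x P+) >= Tr(rho_x P-)."""
    lam, V = np.linalg.eigh(p0 * rho0 - p1 * rho1)
    P_plus = V[:, lam > 0] @ V[:, lam > 0].conj().T
    P_minus = V[:, lam <= 0] @ V[:, lam <= 0].conj().T
    return 0 if np.trace(rho_x @ P_plus).real >= np.trace(rho_x @ P_minus).real else 1
```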
A strategy to increase the accuracy in classification is given by the construction of the tensor product of $q$ copies of the quantum centroids $\rho_0^{\otimes q}$ and $\rho_1^{\otimes q}$, enlarging the Hilbert space where data are encoded. The corresponding Helstrom measurement is $E^{\ast}_q$ and the Helstrom bound satisfies:

$$h_b(q+1)\geq h_b(q).$$

Increasing the dimension of the Hilbert space of the quantum encoding, one increases the Helstrom bound, obtaining a more accurate classifier. The corresponding computational cost is evident; however, in the case of real input vectors, the space can be enlarged saving time and space by means of encoding into Bloch vectors.
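The monotonicity of the Helstrom bound under copies can be checked numerically; a short sketch, where `copies` and `helstrom_bound` are our helper names:

```python
import numpy as np
from functools import reduce

def copies(rho, q):
    """Tensor product rho^{(x)q}, enlarging the encoding space."""
    return reduce(np.kron, [rho] * q)

def helstrom_bound(rho0, rho1, p0, p1):
    """h_b = (1 + Tr|p0*rho0 - p1*rho1|) / 2, where the trace norm of a
    Hermitian matrix is the sum of the absolute values of its eigenvalues."""
    return 0.5 * (1.0 + np.abs(np.linalg.eigvalsh(p0 * rho0 - p1 * rho1)).sum())

# helstrom_bound(copies(rho0, 2), copies(rho1, 2), p0, p1)
# is never smaller than helstrom_bound(rho0, rho1, p0, p1).
```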
Clearly, defining a quantum encoding is equivalent to selecting a feature map to represent feature vectors in a space of higher dimension. In the case of the considered quantum amplitude encoding (2), the nonlinear explicit injective function $\phi$ to encode data into Bloch vectors can be defined as follows:

$$\phi:\mathbb{R}^n\to\mathbb{R}^{(n+1)^2-1},\qquad \phi(\mathbf{x})=\mathbf{b}(\rho_{\mathbf{x}}),\qquad b_i(\rho_{\mathbf{x}})=\sqrt{\frac{d}{2(d-1)}}\operatorname{Tr}(\rho_{\mathbf{x}}\sigma_i),\quad d=n+1.$$

The mapped feature vectors are points on the surface of a hyper-hemisphere; the centroids of the classes, calculated as the means of these feature vectors, lie inside the hypersphere and can be rescaled to a Bloch vector as shown below.
In order to make the classification more accurate, one can increase the dimension of the representation space providing $k$ copies of the quantum states, in terms of a tensor product, encoding data instances and centroids into density matrices $\rho^{\otimes k}$. Bloch encoding allows an efficient implementation of feature maps: since $\rho_{\mathbf{x}}$ is a real symmetric matrix for real inputs, the Bloch components arising from the antisymmetric generators are null and the entries of $\rho_{\mathbf{x}}^{\otimes k}$ contain many repetitions, so by removing null and repeated values we obtain an injective function for data encoding. Therefore, the Bloch representation allows an efficient storage of the redundant elements of the density matrices $\rho^{\otimes k}$.
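For real inputs, one possible realization of this compressed encoding keeps only the independent entries of $\rho_{\mathbf{x}}$ (its upper triangle), since the matrix is real and symmetric. This is an illustrative sketch of the idea, not necessarily the exact map used in the experiments.

```python
import numpy as np

def compressed_features(x):
    """Encode x into rho_x and keep only its upper-triangular entries:
    for real inputs rho_x is real symmetric, so the remaining entries
    are duplicates and the antisymmetric Bloch components are null."""
    x = np.asarray(x, dtype=float)
    psi = np.append(x, 1.0) / np.sqrt(x @ x + 1.0)
    rho = np.outer(psi, psi)
    return rho[np.triu_indices(len(psi))]  # (n+1)(n+2)/2 values instead of (n+1)^2
```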
Let us consider a training set divided into the classes $C_1,\dots,C_n$; assume we have any training point $\mathbf{x}$ encoded into the Bloch vector $\mathbf{b}_{\mathbf{x}}$ of a pure state on $\mathsf{H}$. The calculation of the centroid of the class $C_i$, within this quantum encoding, must take into account that the mean of the Bloch vectors $\{\mathbf{b}_{\mathbf{x}}\}_{\mathbf{x}\in C_i}$ does not represent a density operator in general. In fact, for $d>2$ the points contained in the unit hypersphere of $\mathbb{R}^{d^2-1}$ are not in bijective correspondence with density matrices on $\mathsf{H}$. However, since any vector within the closed ball of radius $\frac{1}{d-1}$ gives rise to a density operator, a centroid can be defined in terms of a meaningful Bloch vector by a rescaling:

$$\mathbf{b}_{C_i}=\frac{1}{d-1}\,\frac{1}{|C_i|}\sum_{\mathbf{x}\in C_i}\mathbf{b}_{\mathbf{x}}.$$
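A sketch of the centroid construction in the Bloch representation; the division by $d-1$ is the rescaling assumed above, which guarantees that the resulting vector lies in the admissible ball.

```python
import numpy as np

def bloch_centroid(bloch_vectors, d):
    """Mean of the (unit) Bloch vectors of one class, rescaled into the
    ball of radius 1/(d-1): the mean has norm <= 1, so the result has
    norm <= 1/(d-1) and corresponds to a density operator."""
    return np.mean(bloch_vectors, axis=0) / (d - 1)
```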
A method of quantum state discrimination for distinguishing more than two states $R=\{(\rho_i,p_i)\}_{i=1,\dots,n}$ is the square-root measurement, also known as the pretty-good measurement, defined by:

$$E_i=p_i\,\rho^{-1/2}\rho_i\,\rho^{-1/2},\qquad \rho=\sum_{i=1}^{n}p_i\rho_i,$$

where the inverse square root is taken on the support of $\rho$; the PGM is the optimal minimum-error measurement when the states satisfy certain symmetry properties [11]. Clearly, to distinguish between $n$ centroids we need a measurement with at most $n$ outcomes. It is sometimes optimal to avoid measurement and simply guess that the state is the a priori most likely state.
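The PGM is straightforward to implement with an eigendecomposition, taking the inverse square root on the support of $\rho$; a minimal sketch with our own helper names:

```python
import numpy as np

def pretty_good_measurement(states, probs):
    """PGM operators E_i = p_i * rho^{-1/2} rho_i rho^{-1/2},
    with rho = sum_i p_i rho_i (inverse square root on the support)."""
    rho = sum(p * r for p, r in zip(probs, states))
    w, V = np.linalg.eigh(rho)
    inv_sqrt = V @ np.diag([1.0 / np.sqrt(v) if v > 1e-12 else 0.0 for v in w]) @ V.conj().T
    return [p * inv_sqrt @ r @ inv_sqrt for p, r in zip(probs, states)]

def pgm_classify(rho_x, states, probs):
    """Assign the label of the centroid i maximizing Tr(rho_x E_i)."""
    E = pretty_good_measurement(states, probs)
    return int(np.argmax([np.trace(rho_x @ Ei).real for Ei in E]))
```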
The optimal POVM $E=\{E_i\}_{i=1,\dots,n}$ for minimum-error state discrimination over $R$ satisfies the following necessary and sufficient Helstrom conditions [12]:

$$\Gamma-p_i\rho_i\geq 0\quad\text{for all }i,\qquad \Gamma=\Gamma^{\dagger},$$

where the Hermitian operator $\Gamma$, also known as the Lagrange operator, is defined by $\Gamma=\sum_{i=1}^{n}p_i\rho_i E_i$. It is also useful to consider the following properties, which can be obtained from the above conditions:

$$(\Gamma-p_i\rho_i)E_i=0\quad\text{for all }i.$$

For each $i$, the operator $\Gamma-p_i\rho_i$ can have two, one or no zero eigenvalues, corresponding to the zero operator, a rank-one operator, and a positive-definite operator, respectively. In the first case, we use the measurement $E_i=I$ for some $i$ where $\Gamma=p_i\rho_i$, i.e., the state is assigned to the a priori most likely class. In the second case, if $E_i\neq 0$, it is a weighted projector onto the corresponding zero-eigenvalue eigenstate. In the latter case, it follows that $E_i=0$ for every optimal measurement.
Given the following Bloch representations:

$$\rho_i=\frac{1}{2}\left(I+\mathbf{r}_i\cdot\boldsymbol{\sigma}\right),\qquad \Gamma=\frac{1}{2}\left(\gamma_0 I+\boldsymbol{\gamma}\cdot\boldsymbol{\sigma}\right),$$

in order to determine the Lagrange operator in the qubit case we need $4$ independent linear constraints: for every class with $E_i\neq 0$, the operator $\Gamma-p_i\rho_i$ must be singular, which in the Bloch representation reads

$$\gamma_0-p_i=\left\|\boldsymbol{\gamma}-p_i\mathbf{r}_i\right\|.$$

A measurement with more than $d^2$ outcomes can always be decomposed as a probabilistic mixture of measurements with at most $d^2$ outcomes. Therefore, if the number of classes is greater than or equal to $4$ and we get $4$ linearly independent equations, we construct the Lagrange operator and derive the optimal measurements. From the geometric point of view, we obtain the unit vectors $\hat{\mathbf{e}}_i=(p_i\mathbf{r}_i-\boldsymbol{\gamma})/\|p_i\mathbf{r}_i-\boldsymbol{\gamma}\|$ corresponding to the rank-1 projectors

$$P_i=\frac{1}{2}\left(I+\hat{\mathbf{e}}_i\cdot\boldsymbol{\sigma}\right),$$

giving the POVM of the measurement. It is also possible to further partition the classes in order to increase the number of centroids and of the corresponding equations. The classification is carried out in this way: an unlabeled point $\mathbf{x}$ is associated with the first label $y$ such that $\operatorname{Tr}(\rho_{\mathbf{x}}E_y)$ is maximal, where $\rho_{\mathbf{x}}$ is the quantum encoding of $\mathbf{x}$.
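The optimality conditions above are easy to verify numerically for a candidate POVM; a sketch (our helper names) checking that $\Gamma$ is Hermitian and that every $\Gamma-p_i\rho_i$ is positive semidefinite:

```python
import numpy as np

def lagrange_operator(states, probs, povm):
    """Gamma = sum_i p_i rho_i E_i (Hermitian for an optimal POVM)."""
    return sum(p * r @ E for p, r, E in zip(probs, states, povm))

def satisfies_helstrom_conditions(states, probs, povm, tol=1e-9):
    """Check Gamma = Gamma^dagger and Gamma - p_i rho_i >= 0 for all i."""
    G = lagrange_operator(states, probs, povm)
    if np.linalg.norm(G - G.conj().T) > tol:
        return False
    H = (G + G.conj().T) / 2  # symmetrize for a stable eigvalsh
    return all(np.linalg.eigvalsh(H - p * r).min() >= -tol
               for p, r in zip(probs, states))
```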
3. Local Quantum-Inspired Classifiers
In the implementation, we consider the execution of the classifiers described above after a selection of the $k$ training elements that are closest to the unclassified instance under consideration.
The $k$-nearest neighbors algorithm (kNN) is a simple classification algorithm which consists of the following steps:
The computation of the chosen distance metric between the test element and the training elements;
The extraction of the $k$ elements closest to the test instance;
The assignment of the class label through a majority vote over the labels of the $k$ nearest neighbors.
In the following, we apply the kNN for the extraction of the elements closest to the test element; then the classification is performed by a quantum-inspired algorithm instead of majority voting. On the one hand, given a test element, the kNN can be executed over the data vectors in the input space, e.g., considering the Euclidean distance; then the $k$ neighbors can be encoded into density matrices and used for a quantum-inspired classification. On the other hand, the entire dataset can be encoded into density matrices and the kNN selects the $k$ neighbors evaluating an operator distance among quantum states. In the latter case, we consider the Bures distance, which is a quantum generalization of the Fisher information, and a distance derived from the super-fidelity. The Bures distance is defined by:

$$d_B(\rho,\sigma)=\sqrt{2\left(1-\sqrt{F(\rho,\sigma)}\right)},$$

where the fidelity between density operators is given by $F(\rho,\sigma)=\left(\operatorname{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\right)^2$. Let us note that the fidelity reduces to $F(\rho,\sigma)=\operatorname{Tr}(\rho\sigma)$ when $\rho$ is pure. Therefore, the Bures distance between the pure state $\rho$ and the arbitrary state $\sigma$ can be expressed in terms of the Bloch representation as follows:

$$d_B(\rho,\sigma)=\sqrt{2\left(1-\sqrt{\tfrac{1}{d}+\tfrac{d-1}{d}\langle\mathbf{r},\mathbf{s}\rangle}\right)},\qquad(17)$$

where $\mathbf{r}$ and $\mathbf{s}$ are the Bloch vectors of $\rho$ and $\sigma$, respectively, and $d$ is the dimension of the Hilbert space of the quantum encoding. The special form (17) of the Bures distance, expressed in terms of Bloch vectors, is relevant for our purpose because data vectors can be encoded into pure states while, in general, quantum centroids are mixed states.
An alternative distance can be defined via the super-fidelity [15]:

$$d_G(\rho,\sigma)=\sqrt{2\left(1-G(\rho,\sigma)\right)},$$

where the super-fidelity between density operators is given by

$$G(\rho,\sigma)=\operatorname{Tr}(\rho\sigma)+\sqrt{\left(1-\operatorname{Tr}\rho^2\right)\left(1-\operatorname{Tr}\sigma^2\right)}.$$

Notice that the super-fidelity reduces to $G(\rho,\sigma)=\operatorname{Tr}(\rho\sigma)$ when $\rho$ is pure. This distance can be expressed in terms of the Bloch representation as follows:

$$d_G(\rho,\sigma)=\sqrt{2\left(1-\tfrac{1}{d}-\tfrac{d-1}{d}\langle\mathbf{r},\mathbf{s}\rangle\right)},$$

where $\mathbf{r}$ and $\mathbf{s}$ are the Bloch vectors of $\rho$ and $\sigma$, respectively, and $d$ is the dimension of the Hilbert space of the quantum encoding. The inner distance between the corresponding Bloch vectors represents the angle between the unit vectors $\hat{\mathbf{r}}$ and $\hat{\mathbf{s}}$, normalized so that its maximum value is 1:

$$d_I(\mathbf{r},\mathbf{s})=\frac{1}{\pi}\arccos\langle\hat{\mathbf{r}},\hat{\mathbf{s}}\rangle.$$

For pure states, the inner distance corresponds to the Fubini-Study distance.
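The three distances used by the kNN in the Bloch representation thus reduce to simple functions of the inner product $\langle\mathbf{r},\mathbf{s}\rangle$ when the first state is pure; a sketch under the normalization adopted above:

```python
import numpy as np

def bures_distance(r, s, d):
    """Eq. (17): Tr(rho sigma) = 1/d + (d-1)/d * <r, s> for rho pure."""
    fid = 1.0 / d + (d - 1.0) / d * np.dot(r, s)
    return np.sqrt(2.0 * (1.0 - np.sqrt(max(fid, 0.0))))

def super_fidelity_distance(r, s, d):
    """Same expression without the square root on the (super-)fidelity."""
    G = 1.0 / d + (d - 1.0) / d * np.dot(r, s)
    return np.sqrt(max(2.0 * (1.0 - G), 0.0))

def inner_distance(r, s):
    """Normalized angle between the unit vectors r/|r| and s/|s|."""
    c = np.dot(r, s) / (np.linalg.norm(r) * np.linalg.norm(s))
    return np.arccos(np.clip(c, -1.0, 1.0)) / np.pi
```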
In Algorithm 1, the locality is imposed by running the kNN on the input space to find the training vectors that are closest to the test element; then there is the quantum encoding into pure states, and a quantum-inspired classifier (Helstrom, PGM or geometric Helstrom) is locally executed over the restricted training set. In Algorithm 2, the test element and all the training elements are encoded into Bloch vectors of pure states; then a kNN is run w.r.t. the Bures distance to find the nearest neighbors in the space of the quantum representation, and a quantum-inspired classifier is executed with the training instances corresponding to the closest quantum states.
Algorithm 1 Local quantum-inspired classification based on kNN in the input space before the quantum encoding. The distance can be: Euclidean, Manhattan, Chessboard, Canberra or Bray–Curtis.
Require: Dataset $X$ of labeled instances, unlabeled point $\tilde{\mathbf{x}}$. Ensure: Label of $\tilde{\mathbf{x}}$
find the $k$ nearest neighbors $X_k$ to $\tilde{\mathbf{x}}$ in $X$ w.r.t. the chosen distance
encode $\tilde{\mathbf{x}}$ into a pure state $\rho_{\tilde{\mathbf{x}}}$
for $\mathbf{x}\in X_k$ do
encode $\mathbf{x}$ into a pure state $\rho_{\mathbf{x}}$
end for
run the quantum-inspired classifier with training points encoded into $\{\rho_{\mathbf{x}}\}_{\mathbf{x}\in X_k}$.
Algorithm 2 Local quantum-inspired classification based on kNN in the Bloch representation after quantum encoding. The distance can be: Bures, Super-Fidelity or Inner.
Require: Dataset $X$ of labeled instances, unlabeled point $\tilde{\mathbf{x}}$. Ensure: Label of $\tilde{\mathbf{x}}$
encode $\tilde{\mathbf{x}}$ into a Bloch vector $\mathbf{b}_{\tilde{\mathbf{x}}}$ of a pure state
for $\mathbf{x}\in X$ do
encode $\mathbf{x}$ into a Bloch vector $\mathbf{b}_{\mathbf{x}}$ of a pure state
end for
find the $k$ nearest neighbors to $\mathbf{b}_{\tilde{\mathbf{x}}}$ in $\{\mathbf{b}_{\mathbf{x}}\}_{\mathbf{x}\in X}$ w.r.t. the chosen distance
run the quantum-inspired classifier over the $k$ nearest neighbors.
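A sketch of the pipeline of Algorithm 1 using scikit-learn's NearestNeighbors for the local selection; `encode` and `classify` stand for any of the encodings and quantum-inspired classifiers above. The Algorithm 2 variant would instead build the neighbor search on the Bloch vectors with one of the quantum distances.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_quantum_classify(X_train, y_train, x_test, k, encode, classify,
                           metric="euclidean"):
    """Algorithm 1: kNN selection in the input space (metric can also be
    'manhattan', 'chebyshev', 'canberra' or 'braycurtis'), quantum
    encoding of the k neighbors, then a quantum-inspired classifier.
    X_train and y_train are numpy arrays."""
    nn = NearestNeighbors(n_neighbors=k, metric=metric).fit(X_train)
    idx = nn.kneighbors([x_test], return_distance=False)[0]
    states = [encode(X_train[i]) for i in idx]
    return classify(encode(x_test), states, y_train[idx])
```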
A local quantum-inspired classifier can also be defined without quantum state discrimination, by considering a nearest mean classification such as the following: after the quantum encoding, we perform a kNN selection and calculate the centroid of each class considering only the nearest neighbors of the test element; finally, we assign the label according to the nearest centroid, as schematized in Algorithm 3.
Algorithm 3 Local quantum-inspired nearest mean classifier.
Require: Training set $X$ divided into $n$ classes $C_1,\dots,C_n$, unlabeled point $\tilde{\mathbf{x}}$. Ensure: Label of $\tilde{\mathbf{x}}$
encode $\tilde{\mathbf{x}}$ into a Bloch vector $\mathbf{b}_{\tilde{\mathbf{x}}}$ of a pure state
for $\mathbf{x}\in X$ do
encode $\mathbf{x}$ into a Bloch vector $\mathbf{b}_{\mathbf{x}}$ of a pure state
end for
find the neighborhood $N_k$ of $\mathbf{b}_{\tilde{\mathbf{x}}}$ w.r.t. the chosen distance
for $i=1,\dots,n$ do
construct the centroid $\mathbf{b}_{C_i}$ of the vectors of $C_i$ in $N_k$, rescaled as in Section 2
end for
find the centroid $\mathbf{b}_{C_y}$ closest to $\mathbf{b}_{\tilde{\mathbf{x}}}$ w.r.t. the chosen distance
return label of the class $C_y$
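A compact sketch of Algorithm 3 over precomputed Bloch vectors, using the Euclidean distance as a placeholder for any of the distances above; the names are ours.

```python
import numpy as np

def nearest_mean_classify(B_train, y_train, b_test, k, d):
    """Algorithm 3: restrict to the k nearest Bloch vectors, build one
    rescaled centroid per class, return the label of the closest one.
    B_train and y_train are numpy arrays."""
    idx = np.argsort(np.linalg.norm(B_train - b_test, axis=1))[:k]
    Bk, yk = B_train[idx], y_train[idx]
    centroids = {c: Bk[yk == c].mean(axis=0) / (d - 1) for c in np.unique(yk)}
    return min(centroids, key=lambda c: np.linalg.norm(centroids[c] - b_test))
```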
4. Results and Discussion
In this section, we present some numerical results obtained from the implementation of the local quantum-inspired classifiers with several distances, compared to well-known classical algorithms. In particular, we consider the SVM with different kernels: linear, radial basis function and sigmoid. Then, we run a random forest, a naive Bayes classifier and the logistic regression. In order to compare the results with previous papers, we take into account the following benchmark datasets from the PMLB public repository [16]: analcatdata_aids, analcatdata_asbestos, analcatdata_bankruptcy, analcatdata_boxing1, analcatdata_cyyoung9302, analcatdata_dmft, analcatdata_happiness, analcatdata_japansolvent, analcatdata_lawsuit, appendicitis, biomed, breast_cancer, iris, labor, new_thyroid, phoneme, prnn_fglass, prnn_synth, tae and wine_recognition. For each dataset, we randomly select a portion of the data to create a training set and use the remainder for the evaluation. We repeated the same procedure 10 times and report the average accuracy in Table 1. Certainly, it is possible to compare the performances based on different statistical indices, including the Matthews correlation coefficient, the F-measure and Cohen's kappa parameter.
We observe that the performances of the local quantum-inspired classifiers turn out to be definitely more accurate, where the hyperparameter $k$ is set equal to the number of classes in the dataset; this value is reasonable for constructing the centroids of the classes. In particular, Algorithm 1 with the Euclidean distance is the most accurate classifier for the datasets analcatdata_boxing1, analcatdata_happiness, biomed, prnn_fglass and wine_recognition, while the Manhattan distance is best for analcatdata_aids, analcatdata_japansolvent, breast_cancer, iris and tae, the Chessboard distance is best for analcatdata_cyyoung9302 and analcatdata_lawsuit, and the Bray–Curtis distance is best for analcatdata_bankruptcy and appendicitis. Algorithm 2 with the Bures distance outperforms Algorithms 1 and 3 for analcatdata_dmft and produces the same accuracy for labor. Algorithm 3 with the Bures distance is the most accurate classifier for analcatdata_asbestos, new_thyroid, phoneme and prnn_synth. Algorithm 1 uses a k-d tree in the training set, while the other two use a k-d tree in the corresponding Bloch vector space. The time complexity to construct the k-d tree is usually $O(dn\log n)$, where $n$ is the cardinality of the training set and $d$ the length of each vector, while the space complexity is $O(dn)$. The query to find the $k$ nearest neighbors takes $O(k\log n)$ on average. The time complexity of the PGM is dominated by the eigendecomposition of the average state, and the classification of the $m$ elements of the test set in $c$ classes requires $c$ trace evaluations per test element. Our algorithm is more efficient than the one presented in [4] in the presence of multiple copies because it removes nulls and duplicates: in particular, we consider only 20 values instead of the 81 matrix elements of $\rho^{\otimes 2}$, 51 values instead of 729 for $\rho^{\otimes 3}$, and so on. In a future paper, we will analyze in detail the complexity of such algorithms in the average case and in the worst case. For instance, one can construct a ball tree for clustered data instead of the k-d tree and consider different search techniques.
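For reference, this is the kind of k-d tree usage underlying the neighbor searches (scikit-learn's KDTree; the data here are random placeholders):

```python
import numpy as np
from sklearn.neighbors import KDTree

X = np.random.rand(1000, 8)         # training set (or its Bloch vectors)
tree = KDTree(X)                    # built once: O(dn log n) time, O(dn) space
dist, idx = tree.query(X[:1], k=5)  # k nearest neighbors of one test point
```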
In Table 2, we show the methods that provided the best accuracy, with the respective execution times, compared with the classical methods. These experimental results are promising and show that the methods are efficient when run on classical computers. Algorithm 3 with the Bures distance is not efficient for phoneme, but Algorithm 1 with the Euclidean distance is, as shown by the execution time and average accuracy reported in Table 2. We will study in a future work how to also apply the local methods in implementations on quantum computers.
Let us focus on multi-class datasets for the comparison with the kNNSVM method proposed by Blanzieri and Melgani [2]. This method requires the choice of the hyperparameter $k$, and, as is well known from the standard kNN algorithm, there is no general strategy to choose $k$ a priori. In Table 3, the results obtained for some values of $k$ in the kNNSVM are shown. For analcatdata_dmft, kNNSVM presents an average accuracy that is only 2% lower than Algorithm 2 but requires 17 elements per test element instead of 6. For analcatdata_happiness, kNNSVM yields an average accuracy that is 10% lower than Algorithm 1 and requires 14 elements per test element instead of 3. However, kNNSVM outperforms the local quantum-inspired classifiers for iris and tae, but only for the latter does it require fewer elements, while for wine_recognition the two are comparable. For new_thyroid and prnn_fglass, the best kNNSVM results are obtained with the nearest neighbor method, but with lower accuracy than Algorithms 1 and 3, respectively.