Article

A New Method Based on Locally Optimal Step Length in Accelerated Gradient Descent for Quantum State Tomography

Mohammad Dolatabadi, Vincenzo Loia and Pierluigi Siano
Department of Management & Innovation Systems, University of Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano, SA, Italy
*
Author to whom correspondence should be addressed.
Sensors 2024, 24(17), 5464; https://doi.org/10.3390/s24175464
Submission received: 15 July 2024 / Revised: 19 August 2024 / Accepted: 21 August 2024 / Published: 23 August 2024

Abstract

Quantum state tomography (QST) is a key step in determining the state of a quantum system, which is essential for understanding and controlling it. Given statistical data from measurements and Positive Operator-Valued Measures (POVMs), the goal of QST is to find a density operator that best fits the measurement data. Several optimization-based methods have been proposed for QST, and one of the most successful approaches is based on Accelerated Gradient Descent (AGD) with a fixed step length. While AGD with a fixed step size is easy to implement, it becomes computationally inefficient when the time required to calculate the gradient is high. In this paper, we propose a new method for locally optimal step-length adaptation, which results in a much faster version of AGD for QST. Numerical results confirm that, thanks to the optimized step size, the proposed method is considerably more time-efficient than comparable methods.

1. Introduction

Quantum physics emerged from Albert Einstein’s efforts to explain the “Photoelectric effect”, which suggested that light can behave like a particle. Other scientists explored the alternative idea that particles such as electrons can behave like waves [1], and this wave-like behavior of particles was later mathematically formulated by Erwin Schrödinger. Schrödinger’s equation provides a theoretical foundation for quantum mechanics, but when it comes to making measurements and interpreting experimental data, statistical tools are essential. One of these tools is quantum state tomography (QST), which is briefly explained in the next section.
In quantum computing and quantum information theory, QST is a crucial step in determining the state of a quantum system, which in turn is essential for understanding and controlling it. QST uses many identical copies of the system, each measured in a slightly different way. By piecing together these measurements, one can reconstruct the original quantum state. In QST, projections are the results of measurements on the quantum system, expressed as probabilities and expectation values.
It is worth noting that QST is a general concept that can be applied to any quantum system, including digital quantum computers [2] and analog quantum simulators (computers) [3]. In [3], a tomography approach (described in terms of Positive Operator-Valued Measures (POVMs) formalism) that is implementable on analog quantum simulators, including ultra-cold atoms, is proposed. Additionally, since quantum sensing and imaging technologies have a lot of exciting applications in optical measurements, using entanglement with applications in quantum lithography [4], one of the applications of QST is in quantum sensing [5].
The QST problem can be formulated as a smooth optimization problem, and as a result, gradient-based methods [6] can be used to solve the problem. If we denote the value of the objective function of the optimization (minimization) problem at point $x$ by $f(x)$, gradient descent uses $x_{k+1} = x_k - \varepsilon \nabla f(x_k)$, where $\nabla f(x_k)$ is the gradient of $f$ at $x_k$, which is a vector consisting of the first-order partial derivatives of $f$ with respect to the decision variables, and $\varepsilon$ is a small number representing the step size. However, while gradient-based methods are easy to implement and highly scalable, they are very slow to converge.
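As a concrete illustration, the following minimal Python sketch runs fixed-step gradient descent on a toy quadratic; the objective, step size, and iteration count are illustrative choices, not the QST objective discussed later.

import numpy as np

def gradient_descent(grad_f, x0, eps=0.01, n_iter=1000):
    """Fixed-step gradient descent: x_{k+1} = x_k - eps * grad_f(x_k)."""
    x = x0
    for _ in range(n_iter):
        x = x - eps * grad_f(x)
    return x

# Toy example: minimize f(x) = ||x - 1||^2, whose gradient is 2 * (x - 1).
x_min = gradient_descent(lambda x: 2.0 * (x - np.ones(3)), x0=np.zeros(3))
print(x_min)  # approaches [1, 1, 1]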
Another approach is to generalize the iteration procedure by using $x_{k+1} = x_k - \varepsilon D \nabla f(x_k)$, where $D$ is a square matrix. Note that if we set $D$ equal to the identity matrix, we arrive at $x_{k+1} = x_k - \varepsilon \nabla f(x_k)$, as in gradient descent.
If we set $D = (\nabla^2 f(x))^{-1}$, we obtain the Newton method [6], which converges very fast in terms of the number of iterations, but $(\nabla^2 f(x))^{-1}$ is not only challenging and time-consuming to compute; it also requires a lot of memory (RAM). One solution is to use less idealistic but more practical choices of $D$. For instance, L-BFGS [7] tries to directly approximate the vector $(\nabla^2 f(x))^{-1}\nabla f(x)$ using a limited number of previous gradients stored in RAM. Another class of methods that are less demanding in terms of memory (RAM) usage are Accelerated Gradient Descent (AGD) methods [8,9,10], which are based on running two iterative procedures simultaneously. The advantages of AGD are its ease of implementation, relatively fast convergence, and lower memory footprint compared to L-BFGS.
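The idea of running two coupled iterative procedures can be sketched as follows; this is a generic momentum-type accelerated scheme shown for illustration (the fixed values of eps and mu are arbitrary example choices), not the specific QST solvers of [2] or [13].

import numpy as np

def accelerated_gd(grad_f, x0, eps=0.01, mu=0.9, n_iter=1000):
    """Generic AGD: a gradient step at the extrapolated point z, then a momentum step."""
    x_prev = x0
    z = x0
    for _ in range(n_iter):
        x_next = z - eps * grad_f(z)         # gradient step
        z = x_next + mu * (x_next - x_prev)  # extrapolation (momentum) step
        x_prev = x_next
    return x_prev

x_min = accelerated_gd(lambda x: 2.0 * (x - np.ones(3)), x0=np.zeros(3))
print(x_min)  # approaches [1, 1, 1]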
To solve the QST problem, a solver based on Accelerated Gradient Descent (AGD) followed by a singular value decomposition (SVD) projection step is proposed in [2]; a MATLAB implementation is available on GitHub [11]. In [12], compressed sensing (CS) is proposed for QST. In [13], AGD is applied again; the SVD projection is bypassed by introducing a non-convex programming formulation of the QST problem, and the Python code is made available [14]. A neural network-based method is presented in [15], with code available on GitHub [16]. In [17], a combination of CS and projected least squares (PLS) is proposed, with code available in [18]. In [19], attention mechanisms are used in neural networks on informationally complete POVMs. For digital quantum computers, the complexity of tomography scales exponentially with the number of qubits. To address this challenge, in [20], POVMs are approximated with a low-rank approximation (rank-1 projectors onto the K = 6 eigenstates of the Pauli matrices), which is similar to the approach that we use for generating simulated data to validate the proposed method. The authors in [5] performed QST for quantum sensing on a 2-qubit system using singular value decomposition (SVD) techniques to develop a robust method.
In this paper, we propose a new method based on a modification of AGD for the non-convex programming formulation. In our approach, one of the two step sizes is chosen adaptively, which increases the speed of convergence, as demonstrated in the numerical evaluations carried out. A comparison of the similarities and differences between the proposed method and the two most similar methods is presented in Table 1. The proposed method combines Momentum-inspired Factorized Gradient Descent (MiFGD) with an adaptive step length. The novel contribution is a closed-form solution for the locally optimal step length in AGD, which significantly enhances the convergence speed of the original MiFGD.

2. Introduction to Quantum Tomography

Quantum physics began with Albert Einstein’s attempts to explain the “Photoelectric effect”. Einstein’s theory suggested that the energy carried by each single energy packet (quantum) of light can be computed as follows:
$$E = h f$$
where f is the frequency of the light and h is Planck’s constant. In 1924, Louis de Broglie suggested that particles such as electrons can behave like waves [1], and the equation for such wavy behavior was then proposed by Erwin Schrödinger:
$$i\hbar\,\frac{\partial \Psi(x,t)}{\partial t} = \left(-\frac{\hbar^2}{2m}\,\frac{\partial^2}{\partial x^2} + V(x,t)\right)\Psi(x,t)$$
for matter particles with mass $m$, where $V(x,t)$ is the potential. Indeed, physical quantities that physicists are interested in measuring, such as position, energy, momentum, and spin, are represented by Hermitian operators (acting on a Hilbert space), which are called observables. Since these operators are Hermitian, their eigenvalues are real, corresponding to the possible measurement outcomes of the corresponding observable. For example, the eigenvalues of the Hamiltonian operator $\hat{H} = -\frac{\hbar^2}{2m}\frac{\partial^2}{\partial x^2} + V(x,t)$ are real numbers that correspond to the energy of the particle.
Indeed, it can be shown that if $E_n$ is an eigenvalue with corresponding eigenfunction $\psi_{E_n}(x)$, which is to say that $\hat{H}\,\psi_{E_n}(x) = E_n\,\psi_{E_n}(x)$, then $\Psi_n(x,t) = e^{-iE_n t/\hbar}\,\psi_{E_n}(x)$ is a solution of (2), which is called a wave function.
It can be shown that with some assumptions in Ket–Bra notation, we have the following:
$$\langle \Psi_n(x,t)\,|\,\Psi_m(x,t)\rangle = 0, \quad m \neq n$$
$$\langle \Psi_m(x,t)\,|\,\Psi_m(x,t)\rangle = 1$$
Also, the general solution of (2) is the following:
$$\Psi(x,t) = \sum_{n=0}^{\infty} C_n\,\Psi_n(x,t)$$
where $C_1, C_2, C_3, \ldots$ are complex numbers.
Now, it is easy to see that
$$\langle \Psi(x,t)\,|\,\Psi(x,t)\rangle = |C_1|^2 + |C_2|^2 + |C_3|^2 + \cdots$$
It is assumed that $|C_1|^2 + |C_2|^2 + |C_3|^2 + \cdots = 1$.
Max Born (1882–1970) suggested that the set of possible outcomes is exclusively restricted to $\Psi_1(x,t), \Psi_2(x,t), \Psi_3(x,t), \ldots$, but with different probabilities. These probabilities are proportional to $|C_1|^2, |C_2|^2, |C_3|^2, \ldots$
Now, if we define $P_m = |\Psi_m\rangle\langle\Psi_m|$, then it is easy to see that
$$P_m\,|\Psi\rangle = C_m\,|\Psi_m\rangle$$
Equation (7) means that $P_m$ projects $|\Psi\rangle$ onto $|\Psi_m\rangle$, and for this reason $P_m$ is called a projector.
Another property which immediately follows from (5) and (7) is the following:
$$\left(\sum_{n=0}^{\infty} P_n\right)|\Psi\rangle = \sum_{n=0}^{\infty} P_n\,|\Psi\rangle = \sum_{n=0}^{\infty} C_n\,|\Psi_n\rangle = |\Psi\rangle$$
which says that $\sum_{n=0}^{\infty} P_n = I$, where $I$ is the identity operator.
Also, note that $P_n$ is a positive semi-definite (PSD) matrix.
Now, if we define $\varrho = |\Psi\rangle\langle\Psi|$, then it is obvious that $\varrho$ is PSD and
$$\varrho = |\Psi\rangle\langle\Psi| = \left(\sum_{n=0}^{\infty} C_n\,|\Psi_n\rangle\right)\left(\sum_{m=0}^{\infty} C_m^*\,\langle\Psi_m|\right)$$
As a result, $\varrho_{ij} = C_i\,C_j^*$, and therefore $\mathrm{trace}(\varrho) = |C_1|^2 + |C_2|^2 + |C_3|^2 + \cdots = 1$.
Now, if $y_m$ denotes the probability of outcome $m$, then
$$y_m = C_m\,C_m^* = \mathrm{trace}(\varrho\,P_m)$$
For this reason, $\varrho$ is called a density operator. The projection operators $P_1, P_2, P_3, \ldots$ can be generalized to PSD matrices $M_1, M_2, M_3, \ldots$ such that $\sum_{n=0}^{\infty} M_n = I$. In this case, they are referred to as POVMs.
Now, given the statistical data $y_1, \ldots, y_m$ and the POVMs $M_1, M_2, M_3, \ldots$, the goal is to find a PSD density operator $\varrho$ that best fits the measurement data $y_1, \ldots, y_m$.
In other words, we want to solve the following optimization problem:
$$\min_{\varrho}\; \sum_i \left(y_i - \mathrm{trace}(\varrho\,M_i)\right)^2 \quad \text{s.t.} \quad \mathrm{trace}(\varrho) = 1 \;\text{ and }\; \varrho \text{ is PSD}$$
Optimization (11) is called quantum state tomography (QST) [2]. In quantum computing and quantum information theory, QST is one of the key steps in determining the state of a quantum system, which is essential for understanding and controlling it. The term “tomography” comes from the ancient Greek words “tomos” (“slice” or “section”) and “grapho” (“to draw” or “to write”) [21]. Regular X-ray tomography (such as CT scans in medicine) performs a 3D reconstruction of an object (which cannot be directly seen) by piecing together 2D X-ray images (measurements) taken from different angles or sections to build up a 3D picture of the original unknown object in the body. QST is, to a large extent, similar, but instead of 2D X-ray projections taken from different angles, QST uses multiple measurements on many identical copies of the same quantum system, each measured in a slightly different way (by projecting the state onto various bases). The outcomes of these measurements are gathered as statistical data. Just as complex algorithms are used in X-ray tomography to process 2D X-ray data and reconstruct a 3D image, in QST, mathematical optimization based on linear algebra, such as (11), is used to reconstruct the density matrix $\varrho$ (which fully describes the quantum state). Having gathered the statistical data $y_1, \ldots, y_m$, these optimization techniques (for instance, least-squares fitting) fit a model (the density matrix $\varrho$) that best matches the measurement data $y_1, \ldots, y_m$. Here, we must deal with statistical noise from quantum measurements and reconstruct the most probable quantum state despite measurement imperfections. The main difference is that while 3D reconstruction in X-ray tomography can be undertaken via a series of measurements on the same (classical) object, in the case of a single quantum particle, measurement perturbs its state, often making further investigation uninformative [21]. We therefore use a source that creates many identical particles in the same unknown quantum state; tomography measures each of them in a slightly different way, and by piecing together these measurements, one can reconstruct a picture of the original quantum state.
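As a simple illustration of the objective in (11), the following Python sketch evaluates the sum of squared residuals for a given density matrix; the single-qubit projectors and the Bloch component 0.3 used here are hypothetical example values.

import numpy as np

def qst_objective(rho, povms, y):
    """Sum of squared residuals sum_i (y_i - trace(rho M_i))^2, as in (11)."""
    residuals = [y_i - np.real(np.trace(rho @ M)) for M, y_i in zip(povms, y)]
    return float(np.sum(np.square(residuals)))

# Tiny single-qubit example.
I2 = np.eye(2)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
rho = 0.5 * (I2 + 0.3 * sx)                   # a valid density matrix (PSD, unit trace)
povms = [0.5 * (I2 + sx), 0.5 * (I2 - sx)]    # projectors onto the sigma_x eigenstates
y = [np.real(np.trace(rho @ M)) for M in povms]
print(qst_objective(rho, povms, y))           # ~0 for noiseless data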

3. Proposed Adaptive Method for QST

As problem (11) does not have a closed-form solution, we need to use an iterative approach to solve it.
In [13], problem (11) is formulated as follows:
$$\min_{U}\; \sum_i \left(y_i - \mathrm{trace}(U U^\dagger M_i)\right)^2 \quad \text{s.t.} \quad \|U\|^2 = 1, \quad U \in \mathbb{C}^{d\times r}$$
in which the matrix $U$ is assumed to be low-rank. Also, $\|U\|$ is the Frobenius norm of $U$, which means that
$$\|U\|^2 = \sum_j \sum_i |U_{ij}|^2 = \sum_j \sum_i U_{ij}\,U_{ij}^*$$
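A quick numerical check of (13), assuming NumPy and an arbitrarily chosen complex matrix:

import numpy as np

U = (np.random.randn(4, 2) + 1j * np.random.randn(4, 2)) / np.sqrt(8)
frob_sq = np.sum(np.abs(U) ** 2)   # sum_j sum_i |U_ij|^2, as in (13)
print(np.isclose(frob_sq, np.linalg.norm(U, 'fro') ** 2))  # True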
Then, in [13], an Accelerated Gradient Descent (AGD) method called MiFGD is proposed, as follows:
$$U_{k+1} = Z_k - \eta \sum_i \left(\mathrm{trace}(Z_k Z_k^\dagger M_i) - y_i\right) M_i\,Z_k$$
$$Z_{k+1} = U_{k+1} + \mu\,(U_{k+1} - U_k)$$
in which η and μ are two fixed step sizes.
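A minimal Python sketch of one iteration of (14) and (15) is given below; this is our simplified reading of the update, not the reference implementation [14], and the step sizes eta and mu are placeholder values.

import numpy as np

def mifgd_step(U, Z, povms, y, eta=0.05, mu=0.95):
    """One MiFGD-style iteration: gradient step (14) on Z_k, then momentum step (15)."""
    residuals = [np.real(np.trace(Z @ Z.conj().T @ M)) - y_i for M, y_i in zip(povms, y)]
    grad = sum(r * (M @ Z) for r, M in zip(residuals, povms))
    U_next = Z - eta * grad                  # Equation (14)
    Z_next = U_next + mu * (U_next - U)      # Equation (15)
    return U_next, Z_next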
In this paper, we present a method for finding a closed-form solution for the greedy optimal choice of η , which means choosing η in a way that results in the biggest decrease in the objective function in that iteration. A description of the proposed method is given below and in Figure 1.
Here, it is worth mentioning that we keep the other step size $\mu$ constant, set to $\mu = 0.95$ in our experiments.
For this reason, in two consecutive steps, there is a possibility that while η is chosen to result in a lower error in the next iteration, the step size μ will remain out of our control and may push the iterative approach away from what we hope to see for the next run. However, the method still performs much better than MiFGD overall. This is something that we will verify in our experiments (see Section 4).
Let us start with rewriting (15) as follows:
$$Z_{k+1} = U_{k+1} + \mu\,(U_{k+1} - U_k) = (1+\mu)\,U_{k+1} - \mu\,U_k$$
Now, if we plug in $U_{k+1}$ from (14), we arrive at the following:
$$Z_{k+1} = T_k + \eta\,S_k$$
in which
$$S_k = -(1+\mu)\sum_i \left(\mathrm{trace}(Z_k Z_k^\dagger M_i) - y_i\right) M_i\,Z_k$$
and
$$T_k = (1+\mu)\,Z_k - \mu\,U_k$$
As we want $Z_{k+1}$ to be normalized, we need to minimize
$$\sum_i \left(y_i - \mathrm{trace}\!\left(\frac{Z_{k+1} Z_{k+1}^\dagger}{\|Z_{k+1}\|^2}\,M_i\right)\right)^{\!2}$$
We consider that
$$\arg\min_\eta \sum_i \left(y_i - \mathrm{trace}\!\left(\frac{Z_{k+1} Z_{k+1}^\dagger}{\|Z_{k+1}\|^2}\,M_i\right)\right)^{\!2} = \arg\min_\eta \sum_i \left(y_i - \|Z_{k+1}\|^{-2}\,\mathrm{trace}(Z_{k+1} Z_{k+1}^\dagger M_i)\right)^{2}$$
It can be obtained that
$$\|Z_{k+1}\|^2 = \|T_k\|^2 + \eta^2\,\|S_k\|^2 + 2\eta\,\operatorname{Re}\sum_j \sum_i T_{ij,k}\, S_{ij,k}^*$$
Also, note that $\|T_k\|^2$, $\|S_k\|^2$ and $\operatorname{Re}\sum_j\sum_i T_{ij,k}\, S_{ij,k}^*$ are real numbers independent of $\eta$; therefore, if we set $a_k = \|T_k\|^2$, $b_k = \|S_k\|^2$, and $c_k = \operatorname{Re}\sum_j\sum_i T_{ij,k}\, S_{ij,k}^*$, then
$$\|Z_{k+1}\|^2 = a_k + b_k\,\eta^2 + 2 c_k\,\eta$$
Similarly, we have
$$\mathrm{trace}(Z_{k+1} Z_{k+1}^\dagger M_i) = \mathrm{trace}(T_k T_k^\dagger M_i) + \eta^2\,\mathrm{trace}(S_k S_k^\dagger M_i) + 2\eta\,\operatorname{Re}\,\mathrm{trace}(T_k S_k^\dagger M_i)$$
Now, if we define $A$, $B$, $C$ and $Y$ to be the vectors whose $i$'th entries are $\mathrm{trace}(T_k T_k^\dagger M_i)$, $\mathrm{trace}(S_k S_k^\dagger M_i)$, $\operatorname{Re}\,\mathrm{trace}(T_k S_k^\dagger M_i)$ and $y_i$, respectively, then from (20) and (23), we have
$$\arg\min_\eta \sum_i \left(y_i - \frac{\mathrm{trace}(Z_{k+1} Z_{k+1}^\dagger M_i)}{\|Z_{k+1}\|^2}\right)^{\!2} = \arg\min_\eta \left\|\,\|Z_{k+1}\|^2\,Y - A - \eta^2 B - 2\eta\,C\,\right\|^2 = \arg\min_\eta \left(\|Z_{k+1}\|^2\,Y - A - \eta^2 B - 2\eta C\right)^T\left(\|Z_{k+1}\|^2\,Y - A - \eta^2 B - 2\eta C\right)$$
where the quantity being minimized expands as
$$Y^TY\,\|Z_{k+1}\|^4 - 2\,Y^TA\,\|Z_{k+1}\|^2 - 2\eta^2\,Y^TB\,\|Z_{k+1}\|^2 - 4\eta\,Y^TC\,\|Z_{k+1}\|^2 + A^TA + 2\eta^2 A^TB + 4\eta\,A^TC + \eta^4 B^TB + 4\eta^3 B^TC + 4\eta^2 C^TC$$
It is worth noting that while $A$, $B$, $C$ and $Y$ are vectors, $Y^TY$, $Y^TA$, $Y^TB$, $Y^TC$, $A^TA$, $A^TB$, $A^TC$, $B^TB$, $B^TC$ and $C^TC$ are all real numbers independent of $\eta$.
Now, from (22) and (24), we can conclude that we have to minimize the following fourth-order polynomial with respect to $\eta$ (writing $a$, $b$, $c$ for $a_k$, $b_k$, $c_k$):
$$\left(Y^TY a^2 - 2\,Y^TA\,a + A^TA\right) + \eta\left(4\,Y^TY ac - 4\,Y^TC\,a - 4\,Y^TA\,c + 4\,A^TC\right) + \eta^2\left(4\,C^TC + 2\,A^TB - 2\,Y^TB\,a - 2\,Y^TA\,b - 8\,Y^TC\,c + 4\,Y^TY c^2 + 2\,Y^TY ab\right) + \eta^3\left(4\,Y^TY bc - 4\,Y^TC\,b - 4\,Y^TB\,c + 4\,B^TC\right) + \eta^4\left(Y^TY b^2 - 2\,Y^TB\,b + B^TB\right)$$
This attains a minimum where the derivative with respect to η is zero:
$$\left(4\,Y^TY ac - 4\,Y^TC\,a - 4\,Y^TA\,c + 4\,A^TC\right) + \eta\left(8\,C^TC + 4\,A^TB - 4\,Y^TB\,a - 4\,Y^TA\,b - 16\,Y^TC\,c + 8\,Y^TY c^2 + 4\,Y^TY ab\right) + \eta^2\left(12\,Y^TY bc - 12\,Y^TC\,b - 12\,Y^TB\,c + 12\,B^TC\right) + \eta^3\left(4\,Y^TY b^2 - 8\,Y^TB\,b + 4\,B^TB\right) = 0$$
Now, by solving (26), which is a simple cubic equation in $\eta$, the optimal step size for the $k$-th step is obtained.
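In practice, (26) can be solved numerically. The sketch below assumes that the scalars a, b, c and the real vectors Y, A, B, C have already been computed as defined around Equations (20)-(24); it builds the quartic (25), differentiates it to obtain the cubic (26), and returns the real critical point with the smallest value of (25). The fallback value used in the degenerate case is an arbitrary choice.

import numpy as np

def optimal_eta(a, b, c, Y, A, B, C):
    """Locally optimal step length: minimize the quartic (25) by solving the cubic (26)."""
    YY, YA, YB, YC = Y @ Y, Y @ A, Y @ B, Y @ C
    AA, AB, AC = A @ A, A @ B, A @ C
    BB, BC, CC = B @ B, B @ C, C @ C
    # Coefficients of the quartic (25), ordered from eta^4 down to the constant term.
    quartic = np.array([
        YY * b**2 - 2 * YB * b + BB,
        4 * YY * b * c - 4 * YC * b - 4 * YB * c + 4 * BC,
        4 * CC + 2 * AB - 2 * YB * a - 2 * YA * b - 8 * YC * c + 4 * YY * c**2 + 2 * YY * a * b,
        4 * YY * a * c - 4 * YC * a - 4 * YA * c + 4 * AC,
        YY * a**2 - 2 * YA * a + AA,
    ])
    cubic = np.polyder(quartic)                    # the derivative, i.e., Equation (26)
    roots = np.roots(cubic)
    real_roots = roots[np.abs(roots.imag) < 1e-10].real
    if real_roots.size == 0:                       # degenerate case: fall back to a small step
        return 1e-3
    # Among the real critical points, keep the one with the smallest quartic value.
    return min(real_roots, key=lambda eta: np.polyval(quartic, eta))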
Note that computing $A$, $B$, and $C$ at each step is the most time-consuming operation, which makes the proposed method almost three times slower per iteration than MiFGD; however, since we choose a much better step size, the approach is faster overall. The numerical results are presented in Section 4.

4. Numerical Results

While what we have explained in the previous section applies to general quantum systems, in this section, we apply it to multi-qubit spin-half systems.
Suppose that we have a particle that only has two states, and the state of the particle has two entries:
$$\begin{pmatrix}\psi(x_1)\\ \psi(x_2)\end{pmatrix} = \begin{pmatrix}\alpha\\ \beta\end{pmatrix}$$
where $\alpha$ and $\beta$ are two complex numbers and $|\alpha|^2$ and $|\beta|^2$ are the probabilities of being in each of the two states. The discretization of the Hamiltonian is a 2-by-2 matrix. On the other hand, since the Hamiltonian as an operator is expected to be Hermitian, which guarantees that its eigenvalues are real (and, as a result, correspond to the possible measurement outcomes of the corresponding observable), physicists investigate the space of all possible 2-by-2 Hamiltonians that satisfy the following equations:
$$H_{11}^* = H_{11}$$
$$H_{12}^* = H_{21}$$
$$H_{22}^* = H_{22}$$
As a result, both $H_{11}$ and $H_{22}$ must be real numbers, and $H_{21}$ must be the conjugate of $H_{12}$. After a straightforward calculation, physicists arrive at the following general formula for the Hamiltonian of the aforementioned two-state system:
$$H = \frac{1}{2}\left(I + r_x\,\sigma_1 + r_y\,\sigma_2 + r_z\,\sigma_3\right)$$
in which
$$\sigma_1 = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}, \quad \sigma_2 = \begin{pmatrix}0 & -i\\ i & 0\end{pmatrix}, \quad \sigma_3 = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}$$
which are called Pauli matrices.
Additionally, the following matrices can be defined:
$$S_x = \frac{\hbar}{2}\,\sigma_1, \quad S_y = \frac{\hbar}{2}\,\sigma_2, \quad S_z = \frac{\hbar}{2}\,\sigma_3$$
These satisfy the following properties:
$$S_x S_y - S_y S_x = i\hbar\, S_z$$
$$S_y S_z - S_z S_y = i\hbar\, S_x$$
$$S_z S_x - S_x S_z = i\hbar\, S_y$$
The above three equalities remind physicists of angular momentum; for this reason, they relate them to the spin system. They call the system “spin-1/2” because of the coefficient $\frac{\hbar}{2}$ (in Equation (33)), which is the component of angular momentum. These spin-1/2 systems are interesting since they have two states and are the building blocks of qubits (in digital quantum computers).
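The commutation relations (34)-(36) are straightforward to verify numerically; a small check, working in units where ℏ = 1, is sketched below.

import numpy as np

hbar = 1.0  # work in units where hbar = 1

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
Sx, Sy, Sz = (hbar / 2) * s1, (hbar / 2) * s2, (hbar / 2) * s3

# Numerical check of the commutation relations (34)-(36).
print(np.allclose(Sx @ Sy - Sy @ Sx, 1j * hbar * Sz))  # True
print(np.allclose(Sy @ Sz - Sz @ Sy, 1j * hbar * Sx))  # True
print(np.allclose(Sz @ Sx - Sx @ Sz, 1j * hbar * Sy))  # True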
Now, if we assume that, for example, the density operator in the case of a single qubit is
$$\varrho = \frac{1}{2}\left(I + r_x\,\sigma_1 + r_y\,\sigma_2 + r_z\,\sigma_3\right)$$
then
$$\mathrm{trace}(\varrho\,\sigma_1) = \frac{1}{2}\,\mathrm{trace}\!\left(\sigma_1 + r_x\,\sigma_1\sigma_1 + r_y\,\sigma_2\sigma_1 + r_z\,\sigma_3\sigma_1\right) = \frac{1}{2}\,\mathrm{trace}\!\left(r_x I\right) = r_x$$
and from (10) we can calculate $r_x$ from our measurement data. Similarly, we can calculate the other coefficients, $r_y$ and $r_z$, from our measurement data, which allows us to identify the density matrix $\varrho$. This is the simplest form of QST, but for more complicated cases (multi-qubit systems), more has to be done.
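As an illustration of this simplest form of QST, the following sketch builds a single-qubit density matrix of the form (37) with hypothetical Bloch components and recovers them through the traces in (38):

import numpy as np

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

rx, ry, rz = 0.3, -0.2, 0.5                              # hypothetical Bloch components
rho = 0.5 * (np.eye(2) + rx * s1 + ry * s2 + rz * s3)    # density matrix of the form (37)

# Recovering the coefficients from expectation values, as in (38).
print(np.real(np.trace(rho @ s1)))  # 0.3
print(np.real(np.trace(rho @ s2)))  # -0.2
print(np.real(np.trace(rho @ s3)))  # 0.5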
It is easy to see that
$$S_z \begin{pmatrix}1\\ 0\end{pmatrix} = \frac{\hbar}{2}\begin{pmatrix}1\\ 0\end{pmatrix}$$
$$S_z \begin{pmatrix}0\\ 1\end{pmatrix} = -\frac{\hbar}{2}\begin{pmatrix}0\\ 1\end{pmatrix}$$
Here, $\begin{pmatrix}1\\ 0\end{pmatrix}$ is called spin-up and is denoted by $|\!\uparrow\rangle$. Also, $\begin{pmatrix}0\\ 1\end{pmatrix}$ is called spin-down and is denoted by $|\!\downarrow\rangle$.
Therefore, Equations (39) and (40) are sometimes expressed as follows:
$$S_z\,|\!\uparrow\rangle = \frac{\hbar}{2}\,|\!\uparrow\rangle$$
$$S_z\,|\!\downarrow\rangle = -\frac{\hbar}{2}\,|\!\downarrow\rangle$$
which indicates that the eigenstates of $S_z$ (and hence of $\sigma_3$) are $|\!\uparrow\rangle$ and $|\!\downarrow\rangle$, with corresponding eigenvalues $\frac{\hbar}{2}$ and $-\frac{\hbar}{2}$, respectively. Also, it can be seen that the following two vectors are the eigenvectors of $\sigma_1$:
$$|x,\uparrow\rangle = \frac{1}{\sqrt{2}}\left(|\!\uparrow\rangle + |\!\downarrow\rangle\right)$$
$$|x,\downarrow\rangle = \frac{1}{\sqrt{2}}\left(|\!\uparrow\rangle - |\!\downarrow\rangle\right)$$
and the following two are the eigenvectors of $\sigma_2$:
$$|y,\uparrow\rangle = \frac{1}{\sqrt{2}}\left(|\!\uparrow\rangle + i\,|\!\downarrow\rangle\right)$$
$$|y,\downarrow\rangle = \frac{1}{\sqrt{2}}\left(|\!\uparrow\rangle - i\,|\!\downarrow\rangle\right)$$
Furthermore, the real three-dimensional space that we live in is related to the two-dimensional complex vector space in which a qubit lives by rewriting Equation (27) as follows. Suppose that
$$\psi = \alpha\,|\!\uparrow\rangle + \beta\,|\!\downarrow\rangle$$
where $|\alpha|^2 + |\beta|^2 = 1$. Using polar coordinates, we have
$$\psi = r_1 e^{i\phi_1}\,|\!\uparrow\rangle + r_2 e^{i\phi_2}\,|\!\downarrow\rangle = e^{i\phi_1}\left(r_1\,|\!\uparrow\rangle + r_2 e^{i(\phi_2-\phi_1)}\,|\!\downarrow\rangle\right) = e^{i\phi_1}\left(r_1\,|\!\uparrow\rangle + r_2 e^{i\phi}\,|\!\downarrow\rangle\right)$$
But since $r_1^2 + r_2^2 = 1$, we can write $r_1 = \cos\frac{\theta}{2}$, $r_2 = \sin\frac{\theta}{2}$, and therefore
$$\psi = e^{i\phi_1}\left(\cos\frac{\theta}{2}\,|\!\uparrow\rangle + \sin\frac{\theta}{2}\,e^{i\phi}\,|\!\downarrow\rangle\right)$$
And if we ignore the global phase $e^{i\phi_1}$, we end up with the following:
$$\psi = \cos\frac{\theta}{2}\,|\!\uparrow\rangle + \sin\frac{\theta}{2}\,e^{i\phi}\,|\!\downarrow\rangle$$
which is called the Bloch sphere representation of the quantum state.
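A small sketch of the Bloch sphere parametrization (50) follows; the angles below are chosen so that the output reproduces the eigenvectors in (43) and (45).

import numpy as np

def bloch_state(theta, phi):
    """State (50): cos(theta/2)|up> + e^{i phi} sin(theta/2)|down>, with |up> = (1,0), |down> = (0,1)."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

# theta = pi/2, phi = 0 gives |x,up> = (|up> + |down>)/sqrt(2), cf. (43).
print(bloch_state(np.pi / 2, 0.0))        # approx [0.707, 0.707]
# theta = pi/2, phi = pi/2 gives |y,up> = (|up> + i|down>)/sqrt(2), cf. (45).
print(bloch_state(np.pi / 2, np.pi / 2))  # approx [0.707, 0.707j]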
The Bloch sphere representation (50) is important as it lets us picture the state of a qubit as a point on the surface of a unit sphere in real 3D space rather than in a complex 2D space. In this coordinate system, the state “z” (when $\theta = 0$) is equivalent to $|\!\uparrow\rangle$ and “−z” (when $\theta = \pi$, $\phi = 0$) is equivalent to $|\!\downarrow\rangle$, which means that while $|\!\uparrow\rangle$ and $|\!\downarrow\rangle$ are orthogonal states in the complex vector space, once we represent them on the Bloch sphere they are antipodal states that point in opposite directions. Indeed, the angle $\theta$ is the angle between the unit vector representing the state of the qubit in 3D space and the z axis, and the angle $\phi$ is the angle between the projection of this unit vector onto the x-y plane and the x axis. Now, $\theta = \frac{\pi}{2}$ and $\phi = 0$ are equivalent to $|x,\uparrow\rangle = \frac{1}{\sqrt{2}}(|\!\uparrow\rangle + |\!\downarrow\rangle)$, and if $\theta = \frac{\pi}{2}$ and $\phi = \pi$, we arrive at $|x,\downarrow\rangle = \frac{1}{\sqrt{2}}(|\!\uparrow\rangle - |\!\downarrow\rangle)$; these two vectors lie on the x and −x axes, respectively. Also, $(\theta,\phi) = (\frac{\pi}{2}, \frac{\pi}{2})$ corresponds to $|y,\uparrow\rangle = \frac{1}{\sqrt{2}}(|\!\uparrow\rangle + i|\!\downarrow\rangle)$, which is in the direction of the y axis, and $(\theta,\phi) = (\frac{\pi}{2}, -\frac{\pi}{2})$ corresponds to $|y,\downarrow\rangle = \frac{1}{\sqrt{2}}(|\!\uparrow\rangle - i|\!\downarrow\rangle)$, which is in the direction of the −y axis. Moreover, as we have
$$|\!\uparrow\rangle\langle\uparrow\!| + |\!\downarrow\rangle\langle\downarrow\!| = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = I$$
$$|x,\uparrow\rangle\langle x,\uparrow\!| + |x,\downarrow\rangle\langle x,\downarrow\!| = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = I$$
$$|y,\uparrow\rangle\langle y,\uparrow\!| + |y,\downarrow\rangle\langle y,\downarrow\!| = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = I$$
in [13], the following three bases are used for QST:
$$B_1 = \{\,|x,\uparrow\rangle,\; |x,\downarrow\rangle\,\}$$
$$B_2 = \{\,|y,\uparrow\rangle,\; |y,\downarrow\rangle\,\}$$
$$B_3 = \{\,|\!\uparrow\rangle,\; |\!\downarrow\rangle\,\}$$
However, in [11], more complicated bases are introduced, as follows:
$$B_1 = \{\,W_1^T,\; W_2^T\,\}$$
$$B_2 = \{\,W_3^T,\; W_4^T\,\}$$
$$B_3 = \{\,W_5^T,\; W_6^T\,\}$$
where W 1 , , W 6 are the rows of the following matrix:
$$W = \begin{pmatrix} \cos(\theta/2) & -i\sin(\theta/2) \\ -i\sin(\theta/2) & \cos(\theta/2) \\ \cos(\theta/2) & \sin(\theta/2) \\ \sin(\theta/2) & -\cos(\theta/2) \\ \cos(\theta/2) - i\sin(\theta/2) & 0 \\ 0 & \cos(\theta/2) + i\sin(\theta/2) \end{pmatrix}$$
Now, again, similar to (51)–(53), we have
$$W_1^T W_1 + W_2^T W_2 = I$$
$$W_3^T W_3 + W_4^T W_4 = I$$
$$W_5^T W_5 + W_6^T W_6 = I$$
To obtain the above formulas, remember that, for example, what we mean by $W_1^T$ is the conjugate transpose $\begin{pmatrix}\cos(\theta/2) \\ i\sin(\theta/2)\end{pmatrix}$, not $\begin{pmatrix}\cos(\theta/2) \\ -i\sin(\theta/2)\end{pmatrix}$.
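The matrix W of (60) and the completeness relations (61)-(63) can be checked numerically. The sign conventions in the sketch below follow our reconstruction of (60) above, and make_W is a hypothetical helper name.

import numpy as np

def make_W(theta):
    """The 6x2 matrix W of Equation (60) (sign conventions as reconstructed in the text)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([
        [c, -1j * s],
        [-1j * s, c],
        [c, s],
        [s, -c],
        [c - 1j * s, 0],
        [0, c + 1j * s],
    ], dtype=complex)

W = make_W(np.pi / 3)
# Check the completeness relations (61)-(63): each pair of rows resolves the identity.
for i in (0, 2, 4):
    P = np.outer(W[i].conj(), W[i]) + np.outer(W[i + 1].conj(), W[i + 1])
    print(np.allclose(P, np.eye(2)))  # True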
We create our data according to the procedure explained in [11]. As mentioned in [2], the reason for choosing this kind of procedure is that it allows for applications in which the measurement matrices are ill conditioned [22,23,24].
To be more specific, the proposed procedure selects (with repetition) $n$ rows of the matrix $W$ in $6^n$ different ways, and for each of these choices we compute the tensor product of the selected rows. We thus obtain $s = 6^n$ different row vectors, $A_1, \ldots, A_s$, each with $d = 2^n$ entries. Now, for an $n$-qubit system, the POVMs are $M_1 = A_1^T A_1, M_2 = A_2^T A_2, \ldots, M_s = A_s^T A_s$. Also, a random PSD matrix $\varrho$ is created, and the simulated data are constructed by the relation $y_i = \mathrm{trace}(\varrho M_i) + \varepsilon_i$, where $\varepsilon_i$ is a Gaussian noise term. Now, given the POVMs $M_1, \ldots, M_s$ and the simulated measurement data $y_1, \ldots, y_s$, the QST algorithm should be able to find a PSD matrix $\varrho$ such that $y_i \approx \mathrm{trace}(\varrho M_i)$, for which we solve (12) using the proposed method. As our simulated data are constructed by $y_i = \mathrm{trace}(\varrho M_i) + \varepsilon_i$, we expect the optimal value of problem (12) to be close to $\sum_i \varepsilon_i^2$, which is around 0.003 in all our experiments. In this way, we can approximately measure the optimality gap.
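The data-generation procedure just described can be sketched as follows. The noise level, random seed, and the way the random PSD target state is drawn are illustrative choices rather than the exact settings of [11], and the W matrix is rebuilt inline with the sign conventions used above.

import numpy as np
from itertools import product

def simulate_qst_data(n, theta, noise_std=1e-4, seed=0):
    """Simulated measurement data: POVMs from tensor products of rows of W, plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    W = np.array([[c, -1j * s], [-1j * s, c], [c, s], [s, -c],
                  [c - 1j * s, 0], [0, c + 1j * s]], dtype=complex)
    d = 2 ** n
    # A random PSD density matrix with unit trace plays the role of the unknown state.
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    rho = G @ G.conj().T
    rho /= np.trace(rho).real
    A_rows, y = [], []
    for idx in product(range(6), repeat=n):      # the 6**n selections with repetition
        a = np.array([1.0 + 0j])
        for i in idx:
            a = np.kron(a, W[i])                 # tensor product of the selected rows
        A_rows.append(a)
        M = np.outer(a.conj(), a)                # POVM element M_i = A_i^T A_i (conjugate transpose)
        y.append(np.real(np.trace(rho @ M)) + noise_std * rng.standard_normal())
    return rho, A_rows, np.array(y)

rho, A_rows, y = simulate_qst_data(n=3, theta=np.pi / 2)
print(len(A_rows), y.shape)  # 216 (216,)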
As an illustrative example of how the POVMs are created, let us take $n = 3$ and suppose that $\theta = \frac{\pi}{2}$. The matrix $W$ is then as follows:
$$W = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -i \\ -i & 1 \\ 1 & 1 \\ 1 & -1 \\ 1-i & 0 \\ 0 & 1+i \end{pmatrix}$$
It is easy to check that (61)–(63) are verified:
$$W_1^T W_1 + W_2^T W_2 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\ i\end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix}1 & -i\end{pmatrix} + \frac{1}{\sqrt{2}}\begin{pmatrix}i\\ 1\end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix}-i & 1\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = I$$
$$W_3^T W_3 + W_4^T W_4 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\ 1\end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix}1 & 1\end{pmatrix} + \frac{1}{\sqrt{2}}\begin{pmatrix}1\\ -1\end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix}1 & -1\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = I$$
$$W_5^T W_5 + W_6^T W_6 = \frac{1}{\sqrt{2}}\begin{pmatrix}1+i\\ 0\end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix}1-i & 0\end{pmatrix} + \frac{1}{\sqrt{2}}\begin{pmatrix}0\\ 1-i\end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix}0 & 1+i\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = I$$
Now, we have $s = 6^3 = 216$ different ways to select (with repetition) 3 rows of the matrix $W$, which we denote by the following 216 index triples:
$$(1,1,1),\; (1,1,2),\; \ldots,\; (1,1,6),\; (1,2,1),\; \ldots,\; (1,2,6),\; \ldots,\; (1,6,1),\; \ldots,\; (1,6,6),\; \ldots,\; (6,6,1),\; (6,6,2),\; \ldots,\; (6,6,6)$$
Then, for example, for the 9th triple, which is $(1,2,3)$, we have to pick the corresponding rows of $W$, which are
$$W_1 = \frac{1}{\sqrt{2}}\begin{pmatrix}1 & -i\end{pmatrix}$$
$$W_2 = \frac{1}{\sqrt{2}}\begin{pmatrix}-i & 1\end{pmatrix}$$
$$W_3 = \frac{1}{\sqrt{2}}\begin{pmatrix}1 & 1\end{pmatrix}$$
Now, as the tensor product of two row vectors is defined as
$$a \otimes b = \begin{pmatrix}a_1 & \cdots & a_n\end{pmatrix} \otimes \begin{pmatrix}b_1 & \cdots & b_n\end{pmatrix} = \begin{pmatrix}a_1 b_1 & \cdots & a_1 b_n & \cdots & a_n b_1 & \cdots & a_n b_n\end{pmatrix}$$
we have
$$W_1 \otimes W_2 = \frac{1}{\sqrt{2}}\begin{pmatrix}1 & -i\end{pmatrix} \otimes \frac{1}{\sqrt{2}}\begin{pmatrix}-i & 1\end{pmatrix} = \frac{1}{2}\begin{pmatrix}-i & 1 & -1 & -i\end{pmatrix}$$
and therefore
$$A_9 = W_1 \otimes W_2 \otimes W_3 = \frac{1}{2\sqrt{2}}\begin{pmatrix}-i & 1 & -1 & -i\end{pmatrix} \otimes \begin{pmatrix}1 & 1\end{pmatrix} = \frac{1}{2\sqrt{2}}\begin{pmatrix}-i & -i & 1 & 1 & -1 & -1 & -i & -i\end{pmatrix}$$
In a similar way, $A_1, \ldots, A_{216}$ are constructed. Now, for a 3-qubit system, the POVMs are $M_1 = A_1^T A_1, M_2 = A_2^T A_2, \ldots, M_{216} = A_{216}^T A_{216}$, consisting of 216 matrices of dimension 8 by 8.
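As a quick check of the example above, A_9 and its POVM element can be computed with np.kron; the values below mirror the hand calculation.

import numpy as np

W1 = np.array([1, -1j]) / np.sqrt(2)
W2 = np.array([-1j, 1]) / np.sqrt(2)
W3 = np.array([1, 1]) / np.sqrt(2)

A9 = np.kron(np.kron(W1, W2), W3)   # tensor product of the three selected rows
M9 = np.outer(A9.conj(), A9)        # the corresponding 8-by-8 POVM element
print(A9 * 2 * np.sqrt(2))          # approx [-i, -i, 1, 1, -1, -1, -i, -i]
print(M9.shape)                     # (8, 8)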
We apply the proposed method to an $n$-qubit system, with $n = 6, 7, 8$, each with $\theta = \frac{\pi}{3}$ or $\theta = \frac{\pi}{2}$ and $\mathrm{rank}(U) = 10$ or $\mathrm{rank}(U) = d$.
As a result, we will report the comparison of our method with MiFGD for 12 different configurations.
For n = 6, we compare the performance of the two algorithms for four different configurations, $(\theta, \mathrm{rank}) = (\frac{\pi}{2}, d)$, $(\frac{\pi}{2}, 10)$, $(\frac{\pi}{3}, d)$, and $(\frac{\pi}{3}, 10)$, as shown in Figure 2, Figure 3, Figure 4 and Figure 5. As can be seen, our algorithm reaches the same accuracy as MiFGD approximately 25 times faster.
To be more specific, for instance, in the first configuration (as seen in Figure 2), after 3480 iterations and spending 318 s, MiFGD reaches a normalized error of 0.0082, while our method after just 61 iterations in 15 s reaches a normalized error of 0.0080.
As can be seen, the proposed method takes about 0.24 s per iteration (15 s / 61 iterations), while MiFGD takes about 0.09 s (318 s / 3480 iterations), which means that the proposed method is roughly 2.7 times slower per iteration but much faster overall, thanks to a well-optimized step length.
A summary of the results is presented in Table 2. As can be seen, although the elapsed time per iteration of the proposed method is higher, because the number of expensive operations (computing $\mathrm{trace}(T_k T_k^\dagger M_i)$, $\mathrm{trace}(S_k S_k^\dagger M_i)$ and $\mathrm{trace}(T_k S_k^\dagger M_i)$) is tripled compared to MiFGD (in which the only expensive operation is the calculation of $\mathrm{trace}(Z_k Z_k^\dagger M_i)$), both the number of iterations and the total elapsed time are lower for the proposed method.
In the case of n = 7, for the first configuration, MiFGD performed 3229 iterations in 3488 s, which is approximately 1 s per iteration, while the proposed method completed 74 iterations in 225 s, which is about 3 s per iteration. However, overall, it performed much better, as can be seen from Figure 6, Figure 7, Figure 8 and Figure 9. Indeed, in the proposed method, in each step, we have to perform three expensive operations to calculate $\mathrm{trace}(T_k T_k^\dagger M_i)$, $\mathrm{trace}(S_k S_k^\dagger M_i)$ and $\mathrm{trace}(T_k S_k^\dagger M_i)$ (as opposed to MiFGD, in which the only expensive operation is the calculation of $\mathrm{trace}(Z_k Z_k^\dagger M_i)$), yet the total elapsed time decreased. This is because once $\mathrm{trace}(T_k T_k^\dagger M_i)$, $\mathrm{trace}(S_k S_k^\dagger M_i)$ and $\mathrm{trace}(T_k S_k^\dagger M_i)$ are computed, due to the optimal choice of $\eta$, we obtain a better value for $\mathrm{trace}(Z_{k+1} Z_{k+1}^\dagger M_i)$, which can be calculated from Equation (23):
$$\mathrm{trace}(Z_{k+1} Z_{k+1}^\dagger M_i) = \mathrm{trace}(T_k T_k^\dagger M_i) + \eta^2\,\mathrm{trace}(S_k S_k^\dagger M_i) + 2\eta\,\operatorname{Re}\,\mathrm{trace}(T_k S_k^\dagger M_i)$$
which approximates the experimental measurement data $y_i$ (see Equation (12)).
A summary of the results is presented in Table 3. As can be seen from both the table and the figures, when we restrict the rank of the matrix U to 10, MiFGD struggles more noticeably to converge.
For n = 8, in the first configuration, MiFGD took 617.5 s per iteration while the proposed method took 1596 s per iteration. This means the proposed algorithm is approximately 2.6 times slower per iteration, as it performs additional calculations to choose a well-informed step, but it converges much faster overall, as can be seen from both Table 4 and Figure 10, Figure 11, Figure 12 and Figure 13.
Again, as can be seen from both the table and the figures, when we restrict the rank of the matrix U to 10, MiFGD struggles more noticeably to converge compared to the case in which rank( U ) = d.
Another interesting observation is that, although the proposed method is supposed to choose the optimal step size, we can see some noticeable oscillation at the beginning. The reason for the oscillation is that, in Equations (14) and (15), $U_{k+1} = Z_k - \eta \sum_i (\mathrm{trace}(Z_k Z_k^\dagger M_i) - y_i) M_i Z_k$ and $Z_{k+1} = U_{k+1} + \mu(U_{k+1} - U_k)$, two different step lengths, $\eta$ and $\mu$, are involved. What we have done so far in the proposed method is find the optimal $\eta$, while the other step length is fixed ($\mu = 0.95$). Indeed, the computation of $Z_{k+1}$ is influenced not only by $U_{k+1}$ but also by $U_k$ from the previous step. Intuitively, $\mu$ determines to what extent the method “remembers” the past iterations (for instance, $\mu = 0$ means no “memory” of past iterations). As a result, while the random starting point might be very far from optimal, it is reflected in the “memory” of the method, and it takes a while before it fades away and is overshadowed by the accumulation of new information. Consequently, we cannot expect a strictly decreasing error. This issue is addressed below.
In all the previous experiments, we considered $\mu = 0.95$. The main reason was that this keeps the number of expensive operations small. However, we could have merged (17) and (19) as $Z_{k+1} = (1+\mu) Z_k - \mu U_k + \eta S_k$. In this way, having computed $\mathrm{trace}(Z_k Z_k^\dagger M_i)$, $\mathrm{trace}(U_k U_k^\dagger M_i)$, $\mathrm{trace}(S_k S_k^\dagger M_i)$, $\mathrm{trace}(Z_k U_k^\dagger M_i)$, $\mathrm{trace}(Z_k S_k^\dagger M_i)$ and $\mathrm{trace}(U_k S_k^\dagger M_i)$ once, for any fixed candidate $\mu$ we can quickly (without much calculation) form $T_k = (1+\mu) Z_k - \mu U_k$ and obtain $\mathrm{trace}(T_k T_k^\dagger M_i)$, $\mathrm{trace}(S_k S_k^\dagger M_i)$ and $\mathrm{trace}(T_k S_k^\dagger M_i)$ as linear combinations of the six precomputed traces, plug them into (23), and find the optimal step length $\eta$ as before. This means that by increasing the number of expensive operations per iteration from 3 to 6, we are able to use a grid search (for instance, varying $\mu$ from 0.01 to 0.99 in increments of 0.01) to adaptively obtain a good choice of $\mu$ at every iteration, as sketched below. We then compare this new version with adaptively selected $\mu$ against the fixed $\mu$ ($\mu = 0.95$) for the case of n = 8.
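The following sketch outlines this adaptive-μ variant. It factors the μ-independent gradient direction G out of S_k (so that S_k = (1+μ)G), which lets the six expensive trace vectors be computed once and reused for every candidate μ through cheap linear combinations. Here, optimal_eta is the cubic-root routine sketched in Section 3, and the grid, variable names, and objective bookkeeping are our illustrative choices rather than the exact implementation.

import numpy as np

def best_mu_eta(Z, U, G, povms, y, mu_grid=np.arange(0.01, 1.0, 0.01)):
    """Grid search over mu with closed-form eta, reusing six precomputed trace vectors.

    G is the mu-independent gradient direction -sum_i (trace(Z Z^+ M_i) - y_i) M_i Z,
    so that S_k = (1 + mu) G and T_k = (1 + mu) Z - mu U.
    """
    # Expensive part: one pass over the POVMs for the six trace vectors.
    tZZ = np.array([np.real(np.trace(Z @ Z.conj().T @ M)) for M in povms])
    tUU = np.array([np.real(np.trace(U @ U.conj().T @ M)) for M in povms])
    tGG = np.array([np.real(np.trace(G @ G.conj().T @ M)) for M in povms])
    tZU = np.array([np.real(np.trace(Z @ U.conj().T @ M)) for M in povms])
    tZG = np.array([np.real(np.trace(Z @ G.conj().T @ M)) for M in povms])
    tUG = np.array([np.real(np.trace(U @ G.conj().T @ M)) for M in povms])
    nZ, nU, nG = np.sum(np.abs(Z) ** 2), np.sum(np.abs(U) ** 2), np.sum(np.abs(G) ** 2)
    pZU = np.real(np.sum(Z * U.conj()))
    pZG = np.real(np.sum(Z * G.conj()))
    pUG = np.real(np.sum(U * G.conj()))
    Y = np.asarray(y, dtype=float)
    best = None
    for mu in mu_grid:
        m = 1.0 + mu
        # Cheap linear combinations for T = (1+mu)Z - mu*U and S = (1+mu)G.
        A = m ** 2 * tZZ - 2 * mu * m * tZU + mu ** 2 * tUU
        B = m ** 2 * tGG
        C = m * (m * tZG - mu * tUG)
        a = m ** 2 * nZ - 2 * mu * m * pZU + mu ** 2 * nU
        b = m ** 2 * nG
        c = m * (m * pZG - mu * pUG)
        eta = optimal_eta(a, b, c, Y, A, B, C)   # cubic-root routine from the Section 3 sketch
        norm_sq = a + b * eta ** 2 + 2 * c * eta                             # ||Z_{k+1}||^2, Eq. (22)
        err = np.sum((Y - (A + eta ** 2 * B + 2 * eta * C) / norm_sq) ** 2)  # objective as in Eq. (21)
        if best is None or err < best[0]:
            best = (err, mu, eta)
    return best[1], best[2]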
As can be seen from Figure 14, Figure 15, Figure 16 and Figure 17, with this slight modification we no longer have the oscillatory behavior, and the convergence is slightly faster. Also, note that although the computational cost per iteration in the case of adaptive $\mu$ is twice the cost of the fixed-$\mu$ case, the overall convergence is still faster. However, it seems that $\eta$ plays the main role compared to $\mu$.

5. Conclusions

In this paper, we proposed a novel method for achieving a closed-form solution for optimal step length in an AGD approach to QST, leading to significantly improved performance compared to existing methods. Although the proposed method is three times slower per iteration, the numerical results demonstrate that it is more than ten times faster overall due to the educated step size that is chosen in each iteration. The key innovation of our method lies in the adaptive selection of the step length η, which ensures that each iteration leads to the maximum possible reduction in the objective function. This approach leverages the current state information to make more informed adjustments, thereby enhancing the overall efficiency of the algorithm. Looking ahead, this adaptive step-size strategy opens up several promising avenues for future research. One potential direction is to integrate similar adaptive step-size methods into more sophisticated gradient-based algorithms where the curvature information is better encoded in the iterative approach.

Author Contributions

Conceptualization, M.D., P.S. and V.L.; methodology, M.D.; software, M.D.; validation, M.D., P.S. and V.L.; formal analysis, M.D.; investigation, M.D. and P.S.; resources, M.D.; data curation, M.D.; writing—original draft preparation, M.D., P.S. and V.L.; writing—review and editing, M.D., P.S. and V.L.; visualization, M.D., P.S. and V.L.; supervision, P.S. and V.L.; project administration, M.D., P.S. and V.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Whittaker, E.D. A History of the Theories of Aether and Electricity: The Modern Theories; Nelson, T., Ed.; Courier Dover Publications: Mineola, NY, USA, 1951; Volume 2, pp. 1900–1926. [Google Scholar]
  2. Bolduc, E.; Knee, E.; Gauger, E.; Leah, J. Projected gradient descent algorithms for quantum state tomography. Npj Quantum Inf. 2017, 3, 44. [Google Scholar] [CrossRef]
  3. McGinley, M.; Fava, M. Shadow tomography from emergent state design in analog quantum simulators. Phys. Rev. Lett. 2023, 131, 160601. [Google Scholar] [CrossRef]
  4. Boto, A.N.; Kok, P.; Abrams, D.S.; Braunstein, S.L.; Williams, C.P.; Dowling, J.P. Quantum interferometric optical lithography: Exploiting entanglement to beat the diffraction limit. Phys. Rev. Lett. 2000, 13, 2733. [Google Scholar] [CrossRef]
  5. Farooq, A.; Khalid, U.; Ur Rehman, J.; Shin, H. Robust quantum state tomography method for quantum sensing. Sensors 2022, 22, 2669. [Google Scholar] [CrossRef]
  6. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: London, UK, 2004. [Google Scholar]
  7. Byrd, R.H.; Nocedal, J.; Schnabel, R.B. Representations of quasi-Newton matrices and their use in limited memory methods. Math. Program. 1994, 63, 129–156. [Google Scholar] [CrossRef]
  8. Li, B.; Shi, B.; Yuan, Y.X. Linear Convergence of Forward-Backward Accelerated Algorithms without Knowledge of the Modulus of Strong Convexity. SIAM J. Optim. 2024, 34, 2150–2168. [Google Scholar] [CrossRef]
  9. Bai, J.; Hager, W.W.; Zhang, H. An inexact accelerated stochastic ADMM for separable convex optimization. Comput. Optim. Appl. 2022, 81, 479–518. [Google Scholar] [CrossRef]
  10. Li, B.; Shi, B.; Yuan, Y.X. Proximal Subgradient Norm Minimization of ISTA and FISTA. arXiv 2022, arXiv:2211.01610. [Google Scholar] [CrossRef]
  11. Available online: https://github.com/eliotbo/PGDfullPackage (accessed on 15 June 2024).
  12. Kalev, A.; Kosut, R.; Deutsch, I. Quantum tomography protocols with positivity are compressed sensing protocols. NPJ Quantum Inf. 2015, 1, 15018. [Google Scholar] [CrossRef]
  13. Kim, J.L.; Kollias, G.; Kalev, A.; Wei, K.X.; Kyrillidis, A. Fast quantum state reconstruction via accelerated non-convex programming. Photonics 2023, 10, 116. [Google Scholar] [CrossRef]
  14. Available online: https://github.com/gidiko/MiFGD (accessed on 15 June 2024).
  15. Beach, M.J.; De Vlugt, I.; Golubeva, A.; Huembeli, P.; Kulchytskyy, B.; Luo, X.; Melko, R.; Merali, E.; Torlai, G. QuCumber: Wavefunction reconstruction with neural networks. SciPost Phys. 2019, 7, 9. [Google Scholar] [CrossRef]
  16. Available online: https://github.com/PIQuIL/QuCumber (accessed on 15 June 2024).
  17. Ahmed, S.; Quijandría, F.; Kockum, A.F. Gradient-descent quantum process tomography by learning Kraus operators. Phys. Rev. Lett. 2023, 130, 150402. [Google Scholar] [CrossRef] [PubMed]
  18. Available online: https://github.com/quantshah/gd-qpt (accessed on 15 June 2024).
  19. Cha, P.; Ginsparg, P.; Wu, F.; Carrasquilla, J.; McMahon, P.L.; Kim, E.A. Attention-based quantum tomography. Mach. Learn. Sci. Technol. 2021, 3, 01LT01. [Google Scholar] [CrossRef]
  20. Torlai, G.; Wood, C.J.; Acharya, A.; Carleo, G.; Carrasquilla, J.; Aolita, L. Quantum process tomography with unsupervised learning and tensor networks. Nat. Commun. 2023, 14, 2858. [Google Scholar] [CrossRef]
  21. Altepeter, J.B.; James, D.F.V.; Kwiat, P.G. 4 Qubit Quantum State Tomography. In Quantum State Estimation; Springer Science & Business Media: Berlin, Germany, 2004; pp. 113–145. [Google Scholar]
  22. Miranowicz, A.; Bartkiewicz, K.; Peřina, J., Jr.; Koashi, M.; Imoto, N.; Nori, F. Optimal two-qubit tomography based on local and global measurements: Maximal robustness against errors as described by condition numbers. Phys. Rev. A 2014, 90, 062123. [Google Scholar] [CrossRef]
  23. Feito, A.; Lundeen, J.S.; Coldenstrodt-Ronge, H.; Eisert, J.; Plenio, M.B.; Walmsley, I.A. Measuring measurement: Theory and practice. New J. Phys. 2009, 11, 093038. [Google Scholar] [CrossRef]
  24. Bianchetti, R.; Filipp, S.; Baur, M.; Fink, J.M.; Lang, C.; Steffen, L.; Boissonneault, M.; Blais, A.; Wallraff, A. Control and tomography of a three level superconducting artificial atom. Phys. Rev. Lett. 2010, 105, 223601. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The proposed method: AGD with optimal step length η.
Figure 2. Numerical result for n = 6, θ = π/2, and rank = d.
Figure 3. Numerical result for n = 6, θ = π/2, and rank = 10.
Figure 4. Numerical result for n = 6, θ = π/3, and rank = d.
Figure 5. Numerical result for n = 6, θ = π/3, and rank = 10.
Figure 6. Numerical result for n = 7, θ = π/2, and rank = d.
Figure 7. Numerical result for n = 7, θ = π/2, and rank = 10.
Figure 8. Numerical result for n = 7, θ = π/3, and rank = d.
Figure 9. Numerical result for n = 7, θ = π/3, and rank = 10.
Figure 10. Numerical result for n = 8, θ = π/2, and rank = d. Despite initial oscillation, our method outperforms the basic MiFGD.
Figure 11. Numerical result for n = 8, θ = π/2, and rank = 10. In the first iterations, the proposed method does not seem to be progressing, but as time passes it starts to outperform the basic MiFGD.
Figure 12. Numerical result for n = 8, θ = π/3, and rank = d. Despite initial volatility, the proposed method outperforms the basic MiFGD.
Figure 13. Numerical result for n = 8, θ = π/3, and rank = 10. Despite instability at the beginning, the method does a better job compared to the basic MiFGD.
Figure 14. Numerical comparison of the case where μ is fixed versus the case of adaptively choosing μ for n = 8, θ = π/3, and rank = d. As can be seen, the new version does not suffer from instability at the beginning.
Figure 15. Numerical comparison of the case where μ is fixed versus the case of adaptively choosing μ for n = 8, θ = π/3, and rank = 10. As can be seen, the new version does not oscillate, in contrast to the previous version.
Figure 16. Numerical comparison of the case where μ is fixed versus the case of adaptively choosing μ for n = 8, θ = π/2, and rank = d. As can be seen, the adaptivity of μ helps the new method not to oscillate.
Figure 17. Numerical comparison of the case where μ is fixed versus the case of adaptively choosing μ for n = 8, θ = π/2, and rank = 10. As can be seen, the adaptivity of μ results in a better performance.
Table 1. Comparison of AGD-based methods for QST.

Method       | PGDM [2] | MiFGD [13] | Proposed Method
Using SVD    | Yes      | No         | No
AGD          | Yes      | Yes        | Yes
Step size    | Fixed    | Fixed      | Variable
Unit trace   | Yes      | No         | No
Low rank     | No       | Yes        | Yes
# fcn eval   | 1        | 1          | 3
Table 2. Comparison of MiFGD and the proposed method for n = 6. Each entry gives (time [s], number of iterations, normalized error).

(θ, rank)  | MiFGD                | Proposed Method
(π/2, d)   | (318, 3480, 0.0082)  | (15, 61, 0.0080)
(π/2, 10)  | (364, 3960, 0.0081)  | (15, 59, 0.0079)
(π/3, d)   | (497, 5196, 0.0080)  | (21, 81, 0.0078)
(π/3, 10)  | (578, 6074, 0.0080)  | (20, 78, 0.0077)
Table 3. Comparison of MiFGD and the proposed method for n = 7. Each entry gives (time [s], normalized error).

(θ, rank)  | MiFGD           | Proposed Method
(π/2, d)   | (3488, 0.0081)  | (225, 0.0076)
(π/2, 10)  | (3973, 0.0082)  | (224, 0.0076)
(π/3, d)   | (4640, 0.0082)  | (244, 0.0076)
(π/3, 10)  | (5523, 0.0082)  | (241, 0.0082)
Table 4. Comparison of MiFGD and the proposed method for n = 8. Each entry gives (time [s], normalized error).

(θ, rank)  | MiFGD             | Proposed Method
(π/2, d)   | (15,439, 0.5384)  | (14,550, 0.1032)
(π/2, 10)  | (14,396, 0.5239)  | (13,915, 0.0890)
(π/3, d)   | (15,782, 0.4764)  | (14,992, 0.0630)
(π/3, 10)  | (15,236, 0.4073)  | (15,213, 0.0685)