1. Introduction, Motivation and Main Results
Quantum hardware and software are still in their early development days, so the design of quantum algorithms typically focuses on low-level operations. Although one should always keep in mind the hardware limitations, especially when describing possible near-term implementations of quantum algorithms, it is convenient to pursue higher levels of abstraction. Apart from its long-term and algorithmic interest, a more abstract and standardized approach serves practical purposes too, for example that of making the benchmarking of quantum computer performance a more solid and transparent process. In turn, this helps push forward research and development in quantum computation at all levels.
In the present paper, we describe a novel framework for the design of quantum algorithms on a more abstract plane. Although the isolated components of the framework that we present here might not be new, it is their combination that is novel. This paper is especially suited for researchers coming from other fields who see quantum computing as a potentially useful tool within their own subject. To this end, our first proposal consists of the definition of a
quantum matrix, namely a quantum state organized in two registers:

$$|M\rangle \,=\, \sum_{i=0}^{2^{n_i}-1} \sum_{j=0}^{2^{n_j}-1} a_{ij}\, |i\rangle |j\rangle\,, \qquad (1)$$

where $|i\rangle$ indicates a register composed of $n_i$ qubits corresponding to $2^{n_i}$ states, while $|j\rangle$ is a register composed of $n_j$ qubits corresponding to $2^{n_j}$ states. The overall state $|M\rangle$, as defined in (1), is manifestly presented with the structure of a matrix; specifically, we interpret $i$ as the index running over the rows and $j$ as the index running over the columns. The rightmost qubit within a register is associated with the least significant digit of the associated index, in binary notation (thus, we are adopting a little-endian convention). This way of storing the information has common ground with the Flexible Representation of Quantum Images (FRQI) and the Novel Enhanced Quantum Representation (NEQR) [1,2]. The main difference with FRQI and NEQR is that we encode the information of the $(i,j)$ entry of the matrix in the quantum amplitude $a_{ij}$. Intuitively, the matrix (1) is a bi-dimensional memory array where $a_{ij}$ encodes the information stored in the $(i,j)$ memory location (see Figure 1).
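To make the encoding concrete, the following minimal NumPy sketch (our own illustration; the function name, example matrix, and explicit normalization step are not part of the text) builds the state vector of a quantum matrix from a classical array, using the row-then-column index ordering described above:

```python
import numpy as np

def quantum_matrix(A):
    """Encode a 2^n_i x 2^n_j array A into the amplitudes of a state vector.

    The row index i and column index j are concatenated as |i>|j>, so the
    flattened position of entry (i, j) is i * 2^n_j + j. The array is
    normalized so that the amplitudes form a valid quantum state.
    """
    A = np.asarray(A, dtype=complex)
    psi = A.flatten()                  # |i>|j| ordering: index = i * n_cols + j
    return psi / np.linalg.norm(psi)

A = np.arange(1, 9).reshape(2, 4)      # a 2 x 4 matrix (1 row qubit, 2 column qubits)
psi = quantum_matrix(A)
# the amplitude of |i=1>|j=2> sits at flattened position 1 * 4 + 2 = 6
```

Note that, because of the direct embedding, the entry of the matrix appears in the amplitude itself (up to the global normalization), not in its squared modulus.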
Here we must address a key technical feature of how we encode the information into the quantum amplitudes: the so-called direct embedding [3]. Namely, the information to be stored into the quantum matrix is directly encoded into the complex amplitudes $a_{ij}$. This contrasts with the customary choice in many fields of application (see [4] for example) of loading the information into the probabilities $|a_{ij}|^2$. The reason behind the latter encoding is that most read-out algorithms are only capable of extracting the modulus of the complex amplitude. We make further comments on this point in Section 4. Furthermore, the direct embedding that we adopt has relevant implications in later stages of the quantum algorithms. Most importantly, the information stored into the quantum state is handled and combined more easily, because algebraic operations are not hampered by the presence of square roots. This allows us to define an "arithmetic library" composed of many fundamental arithmetic operations to handle arrays stored into the quantum matrix. Such a "general purpose" library provides a versatile framework for the implementation of wide classes of algorithms. In this work, we provide some simple example algorithms without aiming to be exhaustive. The possibility of implementing arithmetic operations within a quantum framework has been considered in the literature since the early days of quantum computation. Apart from the quantum implementation of logical circuits corresponding to basic operations, like the quantum adder [5,6], the manipulation of "continuous" numbers has also been studied. Let us mention some works which, at least in spirit, are closer to ours [7,8,9,10]. Unlike these previous approaches, we use a new embedding and organize the information into a matrix (1); these two aspects combined allow us to work in a transparent and simple manner.
The second and final proposal in this paper is to give a full overview of the complete pipeline, or overall structure, of the generic algorithm admitting implementation within this framework. The first step of every algorithm corresponds to loading some input data. In the quantum case, it is often convenient to split this step into two sub-steps:
loading a probability distribution p [3,11,12];
loading a bi-dimensional function f (possibly by means of methods that load information a line at a time).
In reference to the notation introduced for the quantum matrix (1), the amplitudes are given by the product of the distribution and function values, where the indexes are not summed over.
It is not strictly necessary to split the loading into two steps. Yet, we consider such splitting because –typically– we adopt different loading techniques for them: the probability distribution is loaded with a state preparation algorithm (e.g., a multiplexor binary tree), while the function is loaded by means of an auxiliary qubit meant to tell “good” and “bad” states apart. We describe the first step of the pipeline in
Section 2.
In
Section 3 we describe the second step of the pipeline corresponding to the implementation of various arithmetic operations, typically at the level of entire arrays or sub-arrays, and we refer to it as
quantum arithmetic. In
Section 4 we describe the last step of the pipeline, which corresponds to extracting the information that we have stored in the quantum state, namely the read-out of the state that encodes the result of the algorithm.
One of the advantages of organizing the pipeline as in
Figure 2 is that it enjoys a modular structure. Therefore, we can develop and analyze each of the steps independently, thus achieving a better understanding of the problems in each domain. The color coding corresponds to the efficiency of the single modules. It should be stressed that the efficiency of the single modules depicted in the diagram of
Figure 2 refers to the current state of the art, which can change in the future; that is, it does not necessarily represent a structural limitation. In particular, an efficient algorithm would correspond to an end-to-end green path from left to right across the diagram. In searching for possible implementations of a desired algorithm, the challenge is to improve the necessary blocks so as to follow a completely green path. This structure can be easily adapted to quantum neural networks, where the quantum arithmetic step is interpreted as a layer (or a collection of layers) within the quantum network.
Remark 1. Throughout the text we use the terms complexity and efficiency:
The complexity can be measured in different manners; specifically, we measure it as the number of gates in an algorithm. This is motivated by the fact that CNOTs are considerably more error-prone and require a longer execution time than single-qubit gates, as commented –for instance– in [13]. In particular, we call an algorithm "complex" if its complexity scales as an exponential function of n, n being the number of qubits in the circuit.
The efficiency is much more difficult to define. In the first two modules, data loading and quantum arithmetic, by efficiency we refer to the complexity of the quantum algorithm plus the associated cost of all the classical auxiliary steps (if present). Since there is no standard way of adding the quantum and classical computational costs, we prefer not to define efficiency quantitatively. In the third module, information extraction, we instead use efficiency to refer to the number of calls to the oracle. Note that this is not related to the previous definition of complexity. This said, an overall efficient algorithm would correspond to one that is efficient in all three modules at the same time.
2. Data Loading
Data loading is a generic step which is required essentially in any quantum algorithm. The actual data to be loaded can vary in nature and serve different purposes. The data can correspond –for example– to the discretization of a normalized general real function f defined on a two-dimensional domain. Another example could be the pixels of an image in grayscale. Yet, we can as well think of the loading of more general data corresponding to a complex matrix, as long as the normalization of the quantum state is respected.
The recipe described in the appendix works in a pointwise fashion, exploiting an auxiliary register to store the desired value into the quantum amplitude at each “memory address”, namely, to store it into the associated entry of the quantum matrix. It is important to underline from the outset that this pointwise approach is generically not efficient. In order to attain efficiency at the level of the full algorithm, we need to assume that the loading procedure can be implemented in an alternative and efficient way; in other words, we need to assume the existence of a suitable efficient oracle. Nonetheless, as we will show and stress later, a set of efficient manipulations for generic arrays is possible even when their loading is not efficient. This observation stems directly from the modular structure described in the pipeline of
Figure 2.
As already stated in Remark 1, there are two different aspects related to the efficiency of the state preparation: the quantum circuit complexity on the one side, and the complexity of the pre-processing algorithm (where needed) on the other. An example of a pre-processing algorithm would be the computation of the values of the angles in a tree-like loading (see for instance [11,14]). In such an example, although the quantum circuit has a low complexity, the cost of the classical pre-processing makes it inefficient. Here, we are going to discuss only the former, namely, the quantum circuit complexity.
Loading a generic real array is not a trivial problem. In Appendix A.1 we refer to a pointwise loading, without paying attention to its optimization (despite it not being optimal, we have adopted the pointwise method for its simplicity; we postpone the study of more optimized approaches to the future). In this regard, the state of the art is currently set by two alternative approaches [15]: one based on multiplexors [13,16] and the other on Schmidt decomposition [17,18]. Both approaches give essentially the same leading CNOT complexity, namely, a number of CNOTs which scales as $2^n$ for the preparation of the generic n-qubit state. Although this scaling might seem excessively large, we are loading $2^n$ points, so the complexity of the algorithm is just of order $\mathcal{O}(2^n)$.
Let us stress that, in the very specific case where we need to load a constant array, the procedure of Appendix A.2 requires (in the worst-case scenario) a set of X-gates, two Y-rotations and one multi-controlled NOT gate. Such numbers must be compared with their classical counterpart, where the loading of a constant array on a line of the matrix requires J operations (J being the line length), considering that the process of copying a single number from memory is an operation. Therefore, the loading of a constant function is more interesting from the quantum speed-up perspective than the pointwise loading of a generic function. Indeed, in principle, we need exponentially fewer operations on a quantum computer to load a constant array. Interestingly, the number of operations needed does not depend on the length of the constant array that we want to load, but it does depend on the number of rows of the matrix that we have to control. Here we can directly see the nature of quantum systems in practice: there is an "extra" cost associated to acting on a single element of the system without impacting the others. This makes operations on single elements inefficient and operations on the whole structure very efficient. A detailed analysis of the loading procedure will be explored in future works.
3. Quantum Arithmetic
In the present section we provide a collection of tools for the arithmetic handling of arrays encoded into a quantum matrix through direct embedding. These tools have been implemented and tested using QLM (by Atos). In this section we do not aim to address all the possible arithmetic operations implementable within this framework. Nor is it in the spirit of this work to make an exhaustive analysis of the performance of the different operations described here. We simply give some particular examples in order to better discuss the workings and possibilities of the framework.
Other operations potentially implementable within this framework are—for instance—those described in [
19], a Walsh-Hadamard transform [
20] or the replacement of the arithmetic unit by a neural-network processing unit.
3.1. Ordering
The first operations that we introduce are those which allow us to move elements within the quantum matrix. Manipulating single elements in the matrix has a much higher cost than performing operations on the whole structure. For this reason, we first introduce a global reversing operation and then we introduce generic permutations.
3.1.1. Reversing
By reversing, we mean the operation
$$\sum_{j=0}^{2^{n_j}-1} a_{ij}\,|i\rangle|j\rangle \;\longrightarrow\; \sum_{j=0}^{2^{n_j}-1} a_{i,\,2^{n_j}-1-j}\,|i\rangle|j\rangle\,,$$
where, for concreteness, we have addressed the reversing operation on the i-th row of the quantum matrix. Similarly, reversing the entire matrix corresponds to reading its entries from the lower-right corner, i.e., in the opposite direction to how matrix entries are usually read. Note that it is straightforward to perform the analogous reversing operation on a column.
We divide the process into three steps:
Mask the row. In this case we only need to mask the register corresponding to the row (i.e., the $|i\rangle$ register) and leave the column register untouched. For more information on this operation see Appendix A.1.
Apply controlled X-gates. The control qubits are those of the row register; the target qubits are those of the column register.
Undo step one, by applying again the same masking operation as before.
Following the steps above, we can perform a reversing operation on any row of the quantum matrix.
If we wanted instead to reverse the whole matrix, the operation would be more efficient than just reversing a row or a column. In that case, there is no need to control on any qubit, we just need to apply an X-gate to each register of the quantum matrix.
As an explicit example, let us think of a small quantum matrix whose row register consists of a single qubit, so that the matrix has two rows. Suppose that we have loaded a generic such quantum matrix. In order to reverse the first row, we start by applying an X-gate to the row register. Now, the row on which we are focusing has its row-register qubit set to one (in this case the row register is just one qubit). So, by means of operations controlled on that qubit, we act only on the selected row. Specifically, we apply X-gates controlled on the row register, acting on all the qubits of the column register; this reverses the selected row. Finally, we apply again an X-gate to the row register. This last step consists in undoing the mask.
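The steps above can be checked with a small NumPy emulation acting directly on the amplitude array (our own sketch; function names are ours, and the masked row reversal is emulated in one shot, since mask + controlled-X + unmask amounts, at the amplitude level, to flipping the column bits only where the row index matches):

```python
import numpy as np

def x_on_qubit(psi, n_qubits, k):
    """Apply an X gate to qubit k (k = 0 is the least significant)."""
    out = np.empty_like(psi)
    for idx in range(2 ** n_qubits):
        out[idx ^ (1 << k)] = psi[idx]
    return out

# A 2 x 4 quantum matrix: 1 row qubit + 2 column qubits = 3 qubits.
A = np.arange(1.0, 9.0)
psi = A / np.linalg.norm(A)

# Reversing the WHOLE matrix: an X gate on every qubit, no controls needed.
rev = psi.copy()
for k in range(3):
    rev = x_on_qubit(rev, 3, k)
reversed_matrix = rev.reshape(2, 4)   # entries now read from the lower-right corner

def reverse_row(psi, row):
    """Reverse one row: flip the column bits only on the selected row."""
    out = psi.copy()
    for idx in range(8):
        i, j = idx >> 2, idx & 3
        if i == row:
            out[(i << 2) | (3 - j)] = psi[idx]
    return out

first_row_reversed = reverse_row(psi, 0).reshape(2, 4)
```

Reversing the whole matrix indeed needs no controls, which is why it is cheaper than reversing a single row.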
3.1.2. Permutations
Permutations of two elements of an array, i.e., swaps of two entries, are demanding operations, as we have to manipulate individual elements instead of whole blocks of the quantum matrix. For simplicity, in what follows we discuss the algorithm referring to a quantum matrix given by a single row. Generalizing to larger matrices is straightforward. It is relevant to point out that the extension to arrays with higher dimensions, i.e., from bi-dimensional matrices to d-dimensional tensors, is also doable, although it requires additional controlled operations. Note that the additional controlled operations may result in additional complexity (see the definition of complexity in Remark 1). Specifically, consider the state:
where the amplitudes encode the array components. Moreover, let us write the state in (7) in the component notation of (8), which is more convenient to understand how the different gates act on the order of the components. The strategy presented here to perform a permutation of two arbitrary elements in (8) consists in using a pivot element. That is, we choose a fixed position k (the pivot) and implement the permutations of the component placed at position k and any other component in the array. Once this is done, the generic swap of two elements can be obtained by means of three operations, at most. For example, if we aim to permute the elements in positions i and j, we would need to perform the following three steps: First, we permute the positions (i, k), obtaining an intermediate state. Then, we consider the permutation of positions (j, k), corresponding to the permutation of the elements originally placed at positions i and j. Finally, we perform again step one, obtaining the desired permuted state.
Now, the key of the algorithm is to understand how to actually perform the permutations with the pivot in practice. They can be implemented through X-gates and controlled X-gates. Moreover, without loss of generality, we choose the last element of the register as the pivot. If we have n qubits, the single X-gates acting on state (8) have the effects described in Table 1. The symbol $\mathbb{1}$ represents the identity matrix of order 2.
From Table 1 we can see that the single X-gates perform swaps of blocks of contiguous memory positions. When we act on more significant qubits we affect bigger blocks, and each gate affects the whole state. In this algorithm we are only interested in the effect that the gate has on certain blocks of the array (the highlighted ones). Using multi-controlled X-gates where the controls are applied to all qubits (except the one where we apply the X-gate) and acting on state (8), we get the results reported in Table 2. In this case it is clear that the effect of the controlled operations is to permute the last elements of each highlighted block.
We need to combine both operations, X-gates and multi-controlled X-gates, to perform the permutation of any element with the pivot (the last element, according to our choice). The strategy can be implemented recursively in the following way:
Move the last element of the array to the block where the element we wish to permute is located. This is done through a suitable multi-controlled X-gate.
If at this point the two elements that we wanted to interchange have been actually swapped, then undo all previous operations (both X-gates and multi-controlled X-gates) except for the last one and finish. These operations are needed to bring back to their original position all the other elements except the pair that has been swapped. Otherwise continue.
Swap the blocks on which we have acted at step 1. This is done through a single X-gate and serves the purpose of moving to the right the block on which we need to focus.
Go back to step 1.
For the sake of clarity, let us give a simple explicit example. Consider an eight-component state and suppose we want to permute the first element (position 0) with the pivot element (position 7). Following the recursive strategy above, we alternate multi-controlled X-gates and single X-gates until the two elements are effectively swapped. Now that we have effectively swapped element 0 and element 7, we just have to relocate the rest of the elements by undoing the intermediate operations. In general, the number of multi-controlled X-gates that we need to apply for the permutation of the last element of the array with the first one (the worst-case scenario) is of order n, the length of the array being $2^n$.
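The recursion can be emulated classically at the level of the amplitude array (our own sketch; gate and function names are ours). A single X on qubit b permutes indices as idx → idx XOR 2^b, while the multi-controlled X targeting qubit b swaps only the two positions whose remaining bits are all 1, exactly as in Tables 1 and 2:

```python
import numpy as np

def x_gate(psi, n, b):
    """X on qubit b: permutes amplitudes as index -> index XOR 2^b."""
    out = np.empty_like(psi)
    for idx in range(2 ** n):
        out[idx ^ (1 << b)] = psi[idx]
    return out

def mcx_gate(psi, n, b):
    """Multi-controlled X targeting qubit b, controlled on all other qubits:
    it swaps only the two positions whose remaining bits are all 1."""
    piv = 2 ** n - 1
    out = psi.copy()
    out[piv], out[piv ^ (1 << b)] = psi[piv ^ (1 << b)], psi[piv]
    return out

def swap_with_pivot(psi, n, t):
    """Swap the element at position t with the pivot at position 2^n - 1,
    following the recursive strategy described in the text."""
    piv = 2 ** n - 1
    if t == piv:
        return psi
    applied = []                    # operations to undo, in order of application
    cur = t                         # current position of the target element
    for b in reversed(range(n)):
        if (cur >> b) & 1:
            continue                # already in the pivot's block at this level
        psi = mcx_gate(psi, n, b)           # step 1: move the pivot element
        if piv ^ (1 << b) == cur:           # step 2: swapped -> undo the rest
            for kind, bb in reversed(applied):
                psi = x_gate(psi, n, bb) if kind == 'x' else mcx_gate(psi, n, bb)
            return psi
        applied.append(('mcx', b))
        psi = x_gate(psi, n, b)             # step 3: swap the two blocks
        applied.append(('x', b))
        cur ^= 1 << b
    return psi

state = np.arange(8.0)                      # stand-in amplitudes (a_0, ..., a_7)
swapped = swap_with_pivot(state, 3, 0)      # exchange a_0 with the pivot a_7
```

For the worst case t = 0 the loop applies one multi-controlled X per qubit, matching the order-n gate count quoted above.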
3.1.3. Cyclic Permutations
Cyclic permutations correspond to the two transformations
$$(a_0, a_1, \ldots, a_{2^n-1}) \longrightarrow (a_{2^n-1}, a_0, \ldots, a_{2^n-2})\,, \qquad (a_0, a_1, \ldots, a_{2^n-1}) \longrightarrow (a_1, a_2, \ldots, a_{2^n-1}, a_0)\,,$$
where we follow the same notation adopted in Section 3.1.2. These operators have been discussed in depth in [21] and their implementation can be immediately extended to our framework, upon adding suitable controls.
3.2. Addition
In this subsection we discuss both the sum of whole arrays and the sum of their components (reduction).
3.2.1. Sum
Consider the state given in (A4), rewritten in the form (16) (in what follows, we adopt the notation ⊃ to consider just some specific terms that are relevant for our purposes within a bigger quantum state); there, we have omitted the auxiliary register for convenience. Applying a Hadamard gate on the first qubit of the row register, we get the sum and difference of the rows grouped in pairs, as displayed in (18).
In the first row of (18), we get the sum of the first and second row of (16). In the second row of (18), instead, we get the difference between the first and the second row of (16). In the third row of (18), we get the sum of the third and fourth row of (16), and the same structure continues on.
An analogous sum/difference operation can be performed on columns. Note that, in order to keep track of the correct number of $1/\sqrt{2}$ factors, we need to count the Hadamard gates that we apply. Eventually, to sum two rows that are not in the same pair, we can take advantage of the permutations described in Section 3.1.2.
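The pairwise sum/difference structure can be verified with a short NumPy emulation (our own sketch; we apply the Hadamard on the row qubit that pairs adjacent rows, which in the little-endian labeling assumed throughout is the least significant row qubit):

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)

# A 4 x 2 quantum matrix: 2 row qubits and 1 column qubit.
A = np.arange(1.0, 9.0).reshape(4, 2)
psi = A.flatten() / np.linalg.norm(A)

# Reshape so the middle axis is the row qubit pairing adjacent rows:
# axes are (upper row bit, pair bit, column bit).
M = psi.reshape(2, 2, 2)
M = np.einsum('ab,ibj->iaj', H, M)      # apply H on the pair bit only
out = M.reshape(4, 2) * np.linalg.norm(A)
# rows of `out`: (r0+r1)/sqrt(2), (r0-r1)/sqrt(2), (r2+r3)/sqrt(2), (r2-r3)/sqrt(2)
```

The single $1/\sqrt{2}$ factor comes from the one Hadamard gate applied, matching the counting rule mentioned above.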
3.2.2. Reductions
By reduction we mean the summation of all the elements of an array, where the result of the reduction is stored into the first entry of the array. Consider again the state defined in (16). In order to perform a reduction by rows (i.e., summing the elements of each row and storing the result in the first column), we just need to apply a Hadamard gate to every qubit of the column register. In the resulting state, the first column (which corresponds to $j=0$) contains the reduction of each row. In the rest of the columns, we get other reductions with different combinations of signs, as implemented by the Walsh-Hadamard operator in (21).
If we were to do a reduction by columns, instead of by rows, we would need to apply the Walsh-Hadamard gate to the row register instead of the column register. Correspondingly, we would get the reduction of the columns on the first row.
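A minimal NumPy check of the row reduction (our own sketch; the Walsh-Hadamard on the column register is built as the Kronecker product of single-qubit Hadamards, and the $1/\sqrt{2^{n_j}}$ normalization factor is undone explicitly at the end):

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)

# A 2 x 4 quantum matrix: 1 row qubit and 2 column qubits.
A = np.arange(1.0, 9.0).reshape(2, 4)
psi = A / np.linalg.norm(A)

# Walsh-Hadamard on the column register = H on each of the two column qubits.
W = np.kron(H, H)
M = psi @ W.T                                   # acts on the column index only
row_sums = M[:, 0] * np.linalg.norm(A) * 2.0    # undo the 1/sqrt(2^2) factor
# row_sums holds the reduction (sum of the elements) of each row of A
```

The remaining columns of M carry the other sign combinations produced by the Walsh-Hadamard operator, as noted in the text.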
3.3. Products
In this subsection we consider the product of a whole array by a constant and the component-wise product of two arrays. Eventually, the scalar product of two arrays can be obtained composing the product of two arrays and a reduction. Similarly, the squaring of an array can be obtained as the product of the array by itself.
3.3.1. Multiplication by a Constant
In order to multiply a row or a column by a constant, we need an extra auxiliary qubit. Consider the state defined in (A4), but this time supplemented with the extra qubit. The multiplication operation consists merely of a controlled rotation. The rotation is applied onto the auxiliary qubit and introduces a multiplicative factor, so we are initially restricted to multiplication by numbers between 0 and 1. This limitation can be circumvented by means of suitable manipulations of the normalization constants.
Depending on the controls that we apply, we can multiply a row, a column or a specific individual entry by the constant. For example, assume that we want to multiply the first row by a given constant. In order to act solely on the first row, we first have to mask it by applying an X-gate to the row register.
The next step is to perform the controlled Y-rotation (26), where we have indicated the controls of the controlled rotations with the symbol C. Thus, (26) is to be interpreted as a controlled Y-rotation acting on the auxiliary qubit and controlled by the row register. This multiplies the amplitudes of the masked row by the desired factor. Eventually, we have to unmask the state by applying again an X-gate to the row register. The relevant information is then marked by the state of the auxiliary qubit.
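The following NumPy sketch emulates the whole mask/rotate/unmask sequence at the amplitude level (our own illustration: we assume the rotation angle is chosen as θ = 2 arcsin(c), so that the branch with the auxiliary qubit set to |1⟩ carries the factor c; this particular convention is our assumption, not fixed by the text):

```python
import numpy as np

def multiply_first_row(psi_matrix, c):
    """Multiply the first row of a quantum matrix by a constant 0 <= c <= 1.

    Emulates: mask (X on the row register), controlled Ry on the auxiliary
    qubit, unmask. Returns shape (rows, cols, 2); the last axis is the
    auxiliary qubit."""
    rows, cols = psi_matrix.shape
    theta = 2.0 * np.arcsin(c)           # angle chosen so that sin(theta/2) = c
    out = np.zeros((rows, cols, 2))
    out[:, :, 0] = psi_matrix            # auxiliary qubit |0>: rows untouched
    out[0, :, 0] = np.cos(theta / 2.0) * psi_matrix[0]
    out[0, :, 1] = np.sin(theta / 2.0) * psi_matrix[0]   # = c * (first row)
    return out

A = np.arange(1.0, 9.0).reshape(2, 4)
psi = A / np.linalg.norm(A)
res = multiply_first_row(psi, 0.5)
# res[0, :, 1] holds 0.5 times the first row of psi; the state norm is preserved
```

Since the rotation is unitary, normalization is preserved: the "lost" weight of the multiplied row simply sits in the complementary branch of the auxiliary qubit.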
3.3.2. Array Multiplication
In the present section, we describe the theoretical proposal for a more advanced operation: the multiplication of arrays. Its (overall) efficiency is related to that of the loading process. Let us assume we have at our disposal two oracles which load the arrays f and g, respectively, each acting on an auxiliary qubit. Moreover, consider the swap operator S exchanging the two auxiliary qubits. In order to build the multiplication operator we start from a state with two auxiliary qubits: one to load f and one to load g. First we load f with its oracle. In the second step we swap the two auxiliary qubits by means of S. The third and last step consists in applying the oracle for g. The multiplication of the arrays f and g is then encoded in the branch marked by the auxiliary qubits (in order to return to the original ordering of the auxiliary qubits, one can consider an extra swap S). This procedure can be extended to the multiplication of more than two arrays. It is worth noticing that this method depends on the loading complexity, that is, the efficiency of the employed oracles.
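A NumPy emulation of the three steps (our own sketch, under the assumption that each loading oracle acts on the first auxiliary qubit as a controlled rotation |0⟩ → √(1−v_j²)|0⟩ + v_j|1⟩ on each column j; this oracle form, and the fact that the product lands in the branch with both auxiliary qubits set to |1⟩, are assumptions of the illustration):

```python
import numpy as np

def apply_oracle(state, vals):
    """Emulate a loading oracle as a controlled Ry on the first auxiliary
    qubit (axis 0): |0> -> sqrt(1 - v_j^2)|0> + v_j|1> on each column j.
    `state` has shape (2, 2, n): axes are (a1, a2, j)."""
    v = np.asarray(vals)
    c = np.sqrt(1.0 - v ** 2)
    new0 = c * state[0] - v * state[1]
    new1 = v * state[0] + c * state[1]
    return np.stack([new0, new1])

def swap_ancillas(state):
    """The swap operator S exchanging the two auxiliary qubits."""
    return state.transpose(1, 0, 2)

f = np.array([0.1, 0.2, 0.3, 0.4])
g = np.array([0.5, 0.6, 0.7, 0.8])
n = f.size

state = np.zeros((2, 2, n))
state[0, 0] = 1.0 / np.sqrt(n)      # uniform |0>_a1 |0>_a2 |j>

state = apply_oracle(state, f)      # step 1: load f on a1
state = swap_ancillas(state)        # step 2: move it onto a2
state = apply_oracle(state, g)      # step 3: load g on a1
product = state[1, 1] * np.sqrt(n)
# product == f * g, the component-wise multiplication of the two arrays
```

Thanks to the direct embedding, the product appears as f_j·g_j in the amplitudes, with no square roots to untangle.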
As a final comment, when in Section 1 we split the loading of the integrand into two parts, one associated to the distribution and one associated to the function, we were in fact performing the multiplication of the two arrays p and f.
3.3.3. Squaring and Scalar Product
The square of an array and the scalar product of two arrays can be obtained from operations that we have already defined above. The former is trivially just the multiplication of an array by itself. On the other hand, if we perform the reduction of the product of two arrays, we get their scalar product. As their construction depends on the steps described in Section 3.3.2, the efficiency of the square of an array and of the scalar product of two arrays is strongly dependent on the loading strategy for the arrays.
4. Information Extraction
With information extraction we loosely refer to any technique which allows us to read some information encoded in the quantum matrix. Within this broad category, we can identify two main groups of algorithms.
The first group includes those algorithms focusing on the estimation of the phase of a quantum state. The most well known example in the literature is the
Quantum Phase Estimation (QPE) algorithm [
22,
23]. However, the overall structure of the QPE algorithm does not fit directly within the structure that we are proposing here. The reason for this is that, in order to implement QPE, we would need additional qubit registers.
The second group includes algorithms designed to estimate probability amplitudes. The most well known example in the literature is the
Quantum Amplitude Estimation (QAE) algorithm [
24]. More recent approaches try to demand less computational resources by removing the implicit QPE present in the QAE algorithm. Examples of works in this direction include the
Quantum Amplitude Estimation Simplified (QAES) algorithm, the
Iterative Quantum Amplitude Estimation (IQAE) algorithm and the
Maximum Likelihood Amplitude Estimation (MLAE) algorithm [
25,
26,
27]. The QAE algorithm does not fit in our framework either; in fact, it depends on QPE. Nevertheless, QAES, IQAE and MLAE do not use QPE, and they can be naturally implemented in our framework. In these approaches, the required number N of oracle calls to obtain a precision $\epsilon$ in the estimated probability amplitude is of order $1/\epsilon$, while, with a naive (i.e., unamplified) sampling, the number of oracle calls would grow as $1/\epsilon^2$. We recall that, from Remark 1, the number of calls to an oracle (i.e., the efficiency of the information extraction module) is not related to the number of CNOT gates used in the first two modules (i.e., the complexity). To compute the total number of CNOT gates in the circuit, which represents the overall complexity of the quantum algorithm, we need to consider the number of CNOT gates used in the first two modules repeated as many times as the information extraction algorithm requires. This is the reason why we say that the three modules are quasi-independent.
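The gap between the two scalings can be made concrete with a back-of-the-envelope computation (our own illustration; the constant prefactor in the amplified count is a free parameter here, and the naive count uses the standard binomial error formula):

```python
import numpy as np

def calls_naive(p, eps):
    """Naive sampling: the standard error sqrt(p(1-p)/N) reaches eps
    when N ~ p(1-p)/eps^2."""
    return int(np.ceil(p * (1.0 - p) / eps ** 2))

def calls_amplified(eps, c=1.0):
    """Amplitude-estimation-style methods: N ~ c/eps oracle calls,
    with c an unspecified constant prefactor."""
    return int(np.ceil(c / eps))

p, eps = 0.25, 1e-3
n_naive = calls_naive(p, eps)       # quadratic in 1/eps
n_amplified = calls_amplified(eps)  # linear in 1/eps
```

For a target precision of 10^-3, the amplified approaches need on the order of a thousand oracle calls where naive sampling needs on the order of a hundred thousand.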
The information extraction strategies deserve an in-depth investigation on their own. Within the framework proposed here, an algorithm able to read both the probability amplitude and the phase at the same time would be of particular interest. We will explore this possibility in future works [28].
6. Discussion and Conclusions
The main goal of this work is to propose and describe a generic framework for the design of quantum algorithms based on direct embedding. Its modular structure, as depicted in
Figure 2, is appealing and handy in a number of ways. For example, under this framework the main components of a quantum algorithm, namely: data loading, arithmetic manipulations and read-out, can be studied and discussed separately. This holds true also for considerations related to efficiency, the current status of which is reflected by the color coding of
Figure 2; specifically, an end-to-end efficient pipeline would be represented by a left-to-right path within the diagram that encounters only green boxes. Thus, the modular structure of the pipeline for the generic quantum algorithm helps to organize the research effort, compare and interpret different algorithms, and identify possible bottlenecks. Furthermore, it is possible to combine this framework with other existing routines. For instance, it is possible to adopt one's favorite amplitude amplification and estimation technique for the information-extraction part.
On a more technical level, the direct embedding of information into the quantum amplitudes avoids having to deal with square roots and thereby it opens the way to easier arithmetic manipulations of the data stored in the quantum state. In particular, we defined the quantum matrix, a two-dimensional array which can be thought of in analogy to a memory register: the basis states correspond to the row and column addresses of the memory locations, while the entries of the matrix are the quantum amplitudes representing the loaded information. As it has been previously illustrated, this construction allows for the neat and flexible manipulation of arrays. We have also covered some basic arithmetic manipulations, for which we provided descriptions and implementation details. All in all, we set up a theoretical proposal for a package of arithmetic operations in a quantum framework. Its full potential and development requires further investigation and work in the three modules.
Quantum matrices can be naturally generalized to multi-dimensional arrays. All the proposed arithmetic manipulations, as well as the loading and read-out techniques, can be extended in a straightforward way to the higher-dimensional and more general tensor setting. However, this comes at the cost of potentially needing additional controlled operations for "masking" the array and acting only on a desired subset of entries. In other words, the cost of an operation is related to the co-dimension of the subset of entries to which it applies.
Finally, we also provided two specific examples of applications that are interesting on their own, beyond the discussions of the present work. Namely, the shift of a generic oracle by a constant and the shift by a step-wise approximate linear function. We note that their efficient implementation depends on the efficiency of the oracle to which the shift is applied. A constant shift for an oracle implements a vertical offset and it is useful –for example– in iterative algorithms where at each iteration an output oracle needs to be centered vertically, i.e., along the y axis.