**1. Introduction**

The field of quantum algorithms has been dramatically transformed in the last few years by the advent of a quantum-to-classical feedback loop: a fixed-depth quantum circuit is adjusted to minimise a cost function. This approach partially circumvents certain limitations, such as variability in pulse timing, and requires shorter-depth circuits at the cost of outer-loop training [1–6]. The most studied algorithm in this setting is the quantum approximate optimisation algorithm (QAOA) [7], which was developed to approximate solutions to combinatorial optimisation problem instances [8], e.g., MAX-k-SAT [9,10], MAX-Cut [7,11–16], and MAX-k-Colorable-Subgraph [17] instances. The algorithm has certain real-world applications, including finance [18], and might prove useful for general constraint optimisation [19].

The setting of QAOA is that of *n* qubits: states are represented as vectors in *V<sub>n</sub>* = [C<sup>2</sup>]<sup>⊗*n*</sup>. We are given a non-negative Hamiltonian *P* ∈ herm<sub>C</sub>(*V<sub>n</sub>*) and we seek the normalised ground vector |*t*⟩ ∈ arg min<sub>*φ*∈{0,1}<sup>*n*</sup></sub> ⟨*φ*|*P*|*φ*⟩.

QAOA might be viewed as a (time-variable, fixed-depth) quantum split-operator method. We let V(*γ*) be the propagator of *P* applied for time *γ*. We consider a second propagator U(*β*) generated by applying a yet-to-be-defined Hamiltonian *H<sub>x</sub>* for time *β*. We start off in the equal superposition state |+⟩<sup>⊗*n*</sup> = 2<sup>−*n*/2</sup>(|0⟩ + |1⟩)<sup>⊗*n*</sup> and form a *p*-depth U, V sequence:

$$\left|g_p(\gamma,\beta)\right|^2 = \left|\langle t|\prod_{k=1}^{p}\left[\mathcal{U}(\beta_k)\mathcal{V}(\gamma_k)\right]|+\rangle^{\otimes n}\right|^2.\tag{1}$$

**Citation:** Rabinovich, D.; Sengupta, R.; Campos, E.; Akshay, V.; Biamonte, J. Progress towards Analytically Optimal Angles in Quantum Approximate Optimisation. *Mathematics* **2022**, *10*, 2601. https://doi.org/10.3390/math10152601

Academic Editors: Fernando L. Pelayo and Mauro Mezzini

Received: 22 June 2022 Accepted: 23 July 2022 Published: 26 July 2022



The time of application of each propagator is varied to maximise preparation of the state |*t*⟩. Finding *γ*, *β* that maximise |*g<sub>p</sub>*(*γ*, *β*)| has proven to be cumbersome. Even lacking such solutions, much progress has been made.

Recent milestones include experimental demonstration of *p* = 3 depth QAOA (corresponding to six tunable parameters) using a twenty-three-qubit superconducting processor [1], universality results [20,21], as well as several results that aid and improve on the original implementation of the algorithm [11,12,17]. Towards practical realisation of QAOA, trapped-ion-based quantum computers have recently shown promising results, including demonstrations on up to forty qubits [2] and the potential to realise arbitrary combinatorial optimisation problems with all-to-all connectivity based on hardware-inspired modifications [22]. Although QAOA exhibits provable advantages, such as recovering a near-optimal query complexity in Grover's search [23], and offers a pathway towards quantum advantage [13], several limitations have been discovered for low-depth QAOA [9,24,25].

In the setting of maximum-constraint satisfiability (e.g., minimising a Hamiltonian representing a function of type *f* : {0, 1}<sup>*n*</sup> → R<sub>+</sub>), it has been shown that underparameterisation of QAOA sequences can be induced by increasing a problem instance's constraint-to-variable ratio [9]. This effect persists in graph minimisation problems [26]. While this effect is perhaps an expected limitation of the quantum algorithm, parameter concentrations and noise-assisted training add a degree of optimism. QAOA exhibits parameter concentrations, in which training on some fraction *w* < *n* of the qubits provides a training sequence for *n* qubits [27]. Moreover, although layerwise training of QAOA saturates, meaning the algorithm plateaus and fails to reach the target, local coherent noise recovers the robustness of layerwise training [28]. Both concentrations and noise-assisted training imply a reduction in the computational resources required in outer-loop optimisation.

Exact solutions for the optimal parameters of QAOA have only been possible in special cases including, e.g., fully connected graphs [14–16] and projectors [27]. A general analytical approach which would allow for (i) calculation of optimal parameters, (ii) estimation of the critical circuit depth and (iii) performance guarantees at fixed depth remains an open problem.

Here we prove that the optimal QAOA parameters for *p* = 1 are related as *γ*<sub>1</sub> = *π* − 2*β*<sub>1</sub> and, in the thermodynamic limit, we recover optimality as *nβ*<sub>1</sub> → *π* and *γ*<sub>1</sub> → *π*. We moreover demonstrate that the conditions for vanishing gradients of the overlap function share a similar form, which leads to a linear relation between circuit parameters, independent of the number of qubits. We hence devise an additional means to recover parameter concentrations [27] analytically. Finally, we present a list of numerical effects, observed for particular system sizes and circuit depths, which are yet to be explained analytically.

#### **2. State Preparation with QAOA**

We consider an *n*-qubit complex vector space *V<sub>n</sub>* = [C<sup>2</sup>]<sup>⊗*n*</sup> ≅ C<sup>2<sup>*n*</sup></sup> with fixed standard computational basis *B<sub>n</sub>* = {|0⟩, |1⟩}<sup>⊗*n*</sup>. For an arbitrary target state |*t*⟩ ∈ *B<sub>n</sub>* (equivalently |*t*⟩, *t* ∈ {0, 1}<sup>×*n*</sup>) we define propagators

$$\mathcal{U}(\beta) \equiv e^{-i\beta H\_x}, \; \mathcal{V}(\gamma) \equiv e^{-i\gamma P}, \tag{2}$$

where *P* = |*t*⟩⟨*t*| and *H<sub>x</sub>* = ∑<sub>*j*=1</sub><sup>*n*</sup> *X<sub>j</sub>* is the one-body mixer Hamiltonian, with *X<sub>j</sub>* the Pauli matrix acting non-trivially on the *j*-th qubit. Here we focus on state preparation, thus choosing the problem Hamiltonian to be a projector (*P*<sup>2</sup> = *P*) onto an arbitrary bit string |*t*⟩. We note that while the projector has only two energy levels, the effective Hamiltonian of the whole QAOA sequence has up to *n* + 1 distinct energy levels. In such settings, the propagator V(*γ*) acting on a superposition adds a phase −*γ* to the component |*t*⟩, while the propagator U(*β*) mixes the components' amplitudes.

A *p*-depth (*p* layer) QAOA circuit prepares a quantum state |*ψ* as:

$$\left|\psi_p(\gamma,\beta)\right\rangle = \prod_{k=1}^{p}\left[\mathcal{U}(\beta_k)\mathcal{V}(\gamma_k)\right]|+\rangle^{\otimes n},\tag{3}$$

where *γ<sub>k</sub>* ∈ [0, 2*π*), *β<sub>k</sub>* ∈ [0, *π*). The optimisation task is to determine the QAOA optimal parameters for which the state prepared in (3) achieves the maximum absolute value of the overlap *g<sub>p</sub>*(*γ*, *β*) = ⟨*t*|*ψ<sub>p</sub>*(*γ*, *β*)⟩ with the target |*t*⟩. In other words, we search for

$$(\gamma_{opt},\beta_{opt}) \in \arg\max_{\gamma,\beta}\left|g_p(\gamma,\beta)\right|.\tag{4}$$

Note that the problem is equivalent to the minimisation of the ground state energy of the Hamiltonian *P*<sup>⊥</sup> = 𝟙 − |*t*⟩⟨*t*|,

$$\min_{\gamma,\beta}\,\langle\psi_p(\gamma,\beta)|P^{\perp}|\psi_p(\gamma,\beta)\rangle = 1 - \max_{\gamma,\beta}\left|g_p(\gamma,\beta)\right|^2.\tag{5}$$
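To make the optimisation task concrete, the state preparation (3) and overlap (4) can be simulated directly for small *n*. The sketch below is an illustrative statevector simulation of ours, not code from the paper; it fixes the target to |0...0⟩ (justified later by Proposition 1), so that V(*γ*) is a phase on a single amplitude and U(*β*) is a tensor power of the one-qubit rotation e<sup>−*iβX*</sup>.

```python
from functools import reduce
import numpy as np

def qaoa_overlap(gammas, betas, n):
    """g_p of Eq. (4): <t|prod_k U(beta_k)V(gamma_k)|+>^n, target t = |0...0>."""
    x = np.array([[0, 1], [1, 0]])
    psi = np.full(2**n, 2**(-n/2), dtype=complex)    # |+>^{(x)n}
    for g, b in zip(gammas, betas):                  # layer k = 1 applied first
        psi[0] *= np.exp(-1j*g)                      # V(gamma): phase on |t> amplitude
        u1 = np.cos(b)*np.eye(2) - 1j*np.sin(b)*x    # one-qubit e^{-i beta X}
        psi = reduce(np.kron, [u1]*n) @ psi          # U(beta) = e^{-i beta H_x}
    return psi[0]                                    # <0...0|psi_p>
```

At *γ<sub>k</sub>* = *β<sub>k</sub>* = 0 the circuit is the identity and the overlap reduces to ⟨**0**|+⟩<sup>⊗*n*</sup> = 2<sup>−*n*/2</sup>, a quick consistency check on the sketch.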

**Remark 1** (Inversion symmetry)**.** *Under the affine transformation*

$$(\gamma, \beta) \to (2\pi - \gamma, \pi - \beta) \tag{6}$$

*the absolute value of the overlap remains invariant, as g<sub>p</sub>* → (−1)<sup>*n*</sup>*g*<sup>∗</sup><sub>*p*</sub>*. Therefore, this narrows the search space to γ<sub>k</sub>* ∈ [0, *π*)*, β<sub>k</sub>* ∈ [0, *π*)*, whereas maxima inside the restricted region determine maxima in the composite space via Equation* (6)*.*

**Proposition 1** (Overlap invariance)**.** *The overlap function g<sub>p</sub>*(*γ*, *β*) *is invariant with respect to the choice of* |*t*⟩ ∈ *B<sub>n</sub>.*

**Proof.** Each |*t*⟩ = |*t*<sub>1</sub>*t*<sub>2</sub> ... *t<sub>n</sub>*⟩ ∈ *B<sub>n</sub>* determines a unitary operator *U* = *U*<sup>†</sup> = ⊗<sub>*j*=1</sub><sup>*n*</sup> *X*<sub>*j*</sub><sup>*t<sub>j</sub>*</sup>. Hence, we have

$$\begin{split} g_p(\gamma,\beta) &= \langle\mathbf{0}|U^{\dagger}\prod_{k=1}^{p} e^{-i\beta_k H_x}\,e^{-i\gamma_k U|\mathbf{0}\rangle\langle\mathbf{0}|U^{\dagger}}|+\rangle^{\otimes n} \\ &= \langle\mathbf{0}|U^{\dagger}\prod_{k=1}^{p} e^{-i\beta_k H_x}\,U e^{-i\gamma_k|\mathbf{0}\rangle\langle\mathbf{0}|}U^{\dagger}|+\rangle^{\otimes n} \\ &= \langle\mathbf{0}|\prod_{k=1}^{p} e^{-i\beta_k H_x}\,e^{-i\gamma_k|\mathbf{0}\rangle\langle\mathbf{0}|}|+\rangle^{\otimes n}.\end{split}\tag{7}$$

The first equality follows from *U*|**0**⟩ = |*t*⟩, where |**0**⟩ = |0⟩<sup>⊗*n*</sup>. The second equality follows from the definition of the matrix exponential. The third equality follows since *U* commutes with *H<sub>x</sub>*, hence with any analytic function of *H<sub>x</sub>*, and *U*|+⟩<sup>⊗*n*</sup> = |+⟩<sup>⊗*n*</sup>. Thus, the overlap is seen to be independent of the target bit string |*t*⟩.

**Remark 2.** *Overlap invariance, introduced in Proposition 1, shows that the optimisation problems in Equations* (4) *and* (5) *do not depend on the target. Therefore, optimal parameters are the same for any target state. Thus, with no loss of generality, we limit our consideration to the target* |*t*⟩ = |**0**⟩*.*
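Proposition 1 can be verified numerically: simulating the full sequence for several basis targets, the complex overlap comes out identical. The following sketch is an illustrative check of ours (the target is specified by its computational-basis index, an arbitrary encoding choice):

```python
from functools import reduce
import numpy as np

def overlap_for_target(t, gammas, betas, n):
    """g_p = <t|prod_k U(beta_k)V(gamma_k)|+>^n with P = |t><t|, t a basis index."""
    x = np.array([[0, 1], [1, 0]])
    psi = np.full(2**n, 2**(-n/2), dtype=complex)    # |+>^{(x)n}
    for g, b in zip(gammas, betas):
        psi[t] *= np.exp(-1j*g)                      # V(gamma) phases the target amplitude
        u1 = np.cos(b)*np.eye(2) - 1j*np.sin(b)*x    # e^{-i beta X}
        psi = reduce(np.kron, [u1]*n) @ psi          # U(beta)
    return psi[t]

# the complex overlap is identical for every computational-basis target
vals = [overlap_for_target(t, [0.8, 0.4], [0.5, 0.9], 3) for t in (0, 3, 7)]
```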

Preparation of state (3) requires a strategy to assign 2*p* variational parameters by outer-loop optimisation.

**Remark 3** (Global optimisation)**.** *A strategy in which all* 2*p parameters are optimised simultaneously; this might provide the best approximation to prepare* |*t*⟩*.*

**Remark 4** (Layerwise training)**.** *Optimisation of parameters layer by layer. At each step of the algorithm, only one layer is optimised. After a layer is trained, a new layer is added and its parameters are optimised while keeping the parameters of the previous layers fixed.*

Global optimisation is evidently challenging for high-depth circuits. The optimisation can, in principle, be simplified by exploiting problem symmetries [29] and leveraging parameter concentrations [27,30]. Layerwise training might avoid barren plateaus [31] yet is known [28] to stagnate at some critical depth, past which additional layers (trained one at a time) do not improve the overlap. Local coherent noise was found to re-establish the robustness of layerwise training [28].
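The layerwise procedure of Remark 4 can be sketched for the projector problem above with a simple per-layer grid search. This is an illustration of ours, not the training protocol of [28]; the system size, depth and grid resolution are arbitrary choices.

```python
from functools import reduce
import numpy as np

def overlap_abs(gammas, betas, n):
    """|<0...0|psi_p>| for the projector problem with target |0...0>."""
    x = np.array([[0, 1], [1, 0]])
    psi = np.full(2**n, 2**(-n/2), dtype=complex)    # |+>^{(x)n}
    for g, b in zip(gammas, betas):
        psi[0] *= np.exp(-1j*g)                      # V(gamma)
        u1 = np.cos(b)*np.eye(2) - 1j*np.sin(b)*x    # e^{-i beta X}
        psi = reduce(np.kron, [u1]*n) @ psi          # U(beta)
    return abs(psi[0])

n, depth = 4, 6
grid = np.linspace(0, np.pi, 40)                     # coarse per-layer parameter grid
gs, bs, history = [], [], []
for _ in range(depth):                               # add and train one layer at a time,
    best = max((overlap_abs(gs + [g], bs + [b], n), g, b)
               for g in grid for b in grid)          # previous layers stay frozen
    gs.append(best[1]); bs.append(best[2])
    history.append(best[0])
```

Since the grid contains (0, 0), each added layer can at worst reproduce the previous circuit, so the recorded overlaps are non-decreasing in depth; whether and where they stagnate is the saturation question revisited in Section 4.3.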

#### **3.** *p* **= 1 QAOA**

For a single layer, the global and layerwise strategies are equivalent. Such a circuit was considered to establish parameter concentrations [27] analytically. The overlap was shown to be:

$$\left|g\_1(\gamma,\beta)\right|^2 = \frac{1}{2^n} \left[1 + 2\cos^n\beta(\cos\left(\gamma - n\beta\right) - \cos n\beta) + 2\cos^{2n}\beta(1-\cos\gamma)\right].\tag{8}$$

To find the extreme points of (8), the authors of [27] set the derivatives with respect to *γ* and *β* to zero. This approach leads to solutions which contain the maxima but also the minima of the overlap (8); these must be carefully separated. Moreover, this approach ignores the operator structure of the overlap as presented here. For brevity, the subscript *opt* in *γ<sub>opt</sub>* and *β<sub>opt</sub>* is omitted in what follows.
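As a numerical sanity check (ours, not from [27]), the closed form (8) can be compared against a direct statevector simulation of U(*β*)V(*γ*)|+⟩<sup>⊗*n*</sup>:

```python
from functools import reduce
import numpy as np

def g1_closed_form(gamma, beta, n):
    """|g_1|^2 from Eq. (8)."""
    c = np.cos(beta)
    return (1 + 2*c**n*(np.cos(gamma - n*beta) - np.cos(n*beta))
            + 2*c**(2*n)*(1 - np.cos(gamma))) / 2**n

def g1_direct(gamma, beta, n):
    """|g_1|^2 by simulating U(beta)V(gamma)|+>^n with target |0...0>."""
    x = np.array([[0, 1], [1, 0]])
    psi = np.full(2**n, 2**(-n/2), dtype=complex)    # |+>^{(x)n}
    psi[0] *= np.exp(-1j*gamma)                      # V(gamma)
    u1 = np.cos(beta)*np.eye(2) - 1j*np.sin(beta)*x  # e^{-i beta X}
    psi = reduce(np.kron, [u1]*n) @ psi              # U(beta)
    return abs(psi[0])**2
```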

**Theorem 1.** *Optimal p* = 1 *QAOA parameters relate as γ* = *π* − 2*β.*

**Proof.** To maximise the absolute value of the overlap

$$g \equiv g_1(\gamma,\beta) = \langle\mathbf{0}|e^{-i\beta H_x}e^{-i\gamma P}|+\rangle^{\otimes n},\tag{9}$$

with *P* = |**0**⟩⟨**0**| we use the standard conditions *∂*(*gg*<sup>∗</sup>)/*∂γ* = *∂*(*gg*<sup>∗</sup>)/*∂β* = 0. Setting the first derivative to zero we arrive at

$$\langle\mathbf{0}|e^{-i\beta H_x}e^{-i\gamma P}P|+\rangle^{\otimes n}\,g^{*} = \langle+|^{\otimes n}P\,e^{i\gamma P}e^{i\beta H_x}|\mathbf{0}\rangle\,g.\tag{10}$$

Using the explicit form of the projector and the fact that ⟨**0**|*e*<sup>−*iβH<sub>x</sub>*</sup>|**0**⟩ = cos<sup>*n*</sup> *β*, Equation (10) simplifies into

$$g = g^{*}e^{-2i\gamma} \;\Leftrightarrow\; g\,e^{i\gamma} = g^{*}e^{-i\gamma},\tag{11}$$

which is equivalent to

$$\arg g = -\gamma.\tag{12}$$

Then the derivative of expression (9) with respect to *β* is set to zero and we arrive at

$$\langle\mathbf{0}|e^{-i\beta H_x}H_x\,e^{-i\gamma P}|+\rangle^{\otimes n}\,g^{*} = \langle+|^{\otimes n}e^{i\gamma P}H_x\,e^{i\beta H_x}|\mathbf{0}\rangle\,g.\tag{13}$$

Moving *Hx* next to its eigenstate |+ ⊗*n* is compensated as follows:

$$
\begin{split} \langle\mathbf{0}|e^{-i\beta H_x}\left\{e^{-i\gamma P}H_x + (e^{-i\gamma}-1)[H_x,P]\right\}|+\rangle^{\otimes n}\,g^{*} \\ = \langle+|^{\otimes n}\left\{H_x e^{i\gamma P} + [P,H_x](e^{i\gamma}-1)\right\}e^{i\beta H_x}|\mathbf{0}\rangle\,g. \end{split} \tag{14}
$$

After simplification (see Remark 5) we arrive at

$$-gA = g^{*}A^{*}e^{-i\gamma},\tag{15}$$

where *A* = ⟨+|<sup>⊗*n*</sup>[*P*, *H<sub>x</sub>*]*e*<sup>*iβH<sub>x</sub>*</sup>|**0**⟩. Now *g*<sup>∗</sup> is substituted from Equation (11) to establish

$$-e^{-i\gamma}A = A^{*}.\tag{16}$$

Thus, similar to Equation (12) we arrive at

$$\arg A = \frac{\gamma + \pi}{2}.\tag{17}$$

*A* is calculated as

$$A\sqrt{2^{n}} = \langle \mathbf{0} | (H\_{\mathbf{x}} - n)e^{i\beta H\_{\mathbf{x}}} | \mathbf{0} \rangle = -n\cos^{n-1}\beta e^{-i\beta},\tag{18}$$

which shows that arg *A* = *π* − *β*. Thus, from Equation (17) we arrive at

$$
\pi - \beta = \frac{\gamma + \pi}{2},
\tag{19}
$$

which finally establishes *γ* = *π* − 2*β*.
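Theorem 1 can be checked numerically by locating the grid maximum of the closed-form overlap (8) inside the restricted region of Remark 1. The sketch below is an illustrative check of ours; the system size and grid resolution are arbitrary choices, and the tolerance reflects the grid spacing.

```python
import numpy as np

def g1_sq(gamma, beta, n):
    """|g_1|^2 from Eq. (8)."""
    c = np.cos(beta)
    return (1 + 2*c**n*(np.cos(gamma - n*beta) - np.cos(n*beta))
            + 2*c**(2*n)*(1 - np.cos(gamma))) / 2**n

n = 8
gs = np.linspace(0, np.pi, 1501)               # restricted region of Remark 1
bs = np.linspace(0, np.pi, 1501)
G, B = np.meshgrid(gs, bs, indexing="ij")
i, j = np.unravel_index(np.argmax(g1_sq(G, B, n)), G.shape)
gamma_opt, beta_opt = gs[i], bs[j]             # grid argmax of the overlap
```

Up to grid resolution, the located maximum lies on the line *γ* + 2*β* = *π*.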

**Remark 5** (Trivial solutions)**.** *Equation* (14) *has three pathological solutions which must be ruled out: (i)* sin(*γ*/2) = 0 *(which sets e<sup>iγ</sup>* − 1 = 0*), (ii)* cos *β* = 0 *(which sets A* = 0*), (iii) g*(*γ*, *β*) = 0*. All three cases imply* |*g*(*γ*, *β*)| ≤ *g*(0, 0)*.*

**Remark 6.** *The zero-derivative conditions result in* (11) *and* (15)*, which have a similar form, viz. x* = *x*<sup>∗</sup>*e<sup>iϕ</sup>. The first condition* (11) *can be obtained without differentiation [28], using the explicit form of the overlap in Equation* (9)*,*

$$g\sqrt{2^{n}} = e^{-i\gamma}\cos^{n}\beta + \left(e^{-i\beta n} - \cos^{n}\beta\right),\tag{20}$$

*and the fact that* max<sub>*γ*</sub> |*Ae*<sup>−*iγ*</sup> + *B*| = |*A*| + |*B*| *for any A*, *B* ∈ C*. Although the derivative with respect to β leads to the condition* (15)*, we find no way to recover this using elementary alignment arguments.*

**Remark 7.** *While the optimal angle relation γ* = *π* − 2*β has also been established in [27], here we demonstrate that it can result from a certain ansatz symmetry, manifested in the similar form of the zero-derivative conditions* (11) *and* (15)*. This can provide useful insight for understanding a similar optimal-angle dependency in deeper circuits (Section 4.2).*

To find the optimal parameters one needs to solve the zero-derivative conditions and then take the solutions that deliver a global maximum of the overlap. For convenience, we substitute *γ* = *π* − 2*β* into the overlap function (20), square it and, after simplification, arrive at

$$\left|g\right|^2 2^n = 1 + 4\cos^{n+1}\beta (\cos^{n+1}\beta - \cos(n+1)\beta),\tag{21}$$

which is used to prove the next theorem.

**Theorem 2.** *The optimal p* = 1 *QAOA parameters converge as βn* → *π and γ* → *π when n* → ∞*.*

**Proof.** Using the explicit form of the overlap (20), from Equation (11) one can establish

$$\operatorname{Im}\left[e^{i\gamma}(e^{-i\beta n} - \cos^n\beta)\right] = 0.\tag{22}$$

Substituting *γ* = *π* − 2*β* one arrives at

$$\operatorname{Im}\left[e^{-i(n+2)\beta} - e^{-2i\beta}\cos^{n}\beta\right] = 0,\tag{23}$$

which is equivalent to

$$
\sin(n+2)\beta = \sin 2\beta \cos^n \beta. \tag{24}
$$

We solve this equation in the limit *n* → ∞. In this limit sin 2*β* cos*<sup>n</sup> β* → 0 independent of the value of *β*. Thus, the left-hand side of Equation (24) tends to zero. This implies that the leading order solution scales as

$$\beta = \frac{k\pi}{n+2} + o(n^{-1})\tag{25}$$

where *k* < *n* is a positive integer (in principle, *n*-dependent). To recover the optimal constant *k*, we substitute Equation (25) into Equation (21) to obtain

$$|g|^{2}\,2^{n} = 1 + 4\cos^{n+2}\frac{k\pi}{n+2}\left(\cos^{n}\frac{k\pi}{n+2} - (-1)^{k}\right)\tag{26}$$

up to *o*(1) terms. Finally, as cosine is monotonically decreasing on the interval [0, *π*), it is evident that the overlap is maximised for the smallest odd constant *k* = 1. Therefore, the optimal parameter *β* is given by

$$\beta = \frac{\pi}{n+2} + o(n^{-1}) = \frac{\pi}{n} + o(n^{-1}),\tag{27}$$

which implies *nβ* → *π* and thus *γ* = *π* − 2*β* → *π* when *n* → ∞.
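The scaling in Theorem 2 can be probed numerically by maximising the single-variable form (21) along the line *γ* = *π* − 2*β*. This is an illustrative check of ours; the qubit numbers and grid resolution are arbitrary, and the tolerance accounts for the finite-*n* corrections of Remark 8.

```python
import numpy as np

def overlap_sq_2n(beta, n):
    """Eq. (21): |g|^2 * 2^n with gamma = pi - 2*beta substituted."""
    c = np.cos(beta)
    return 1 + 4*c**(n + 1)*(c**(n + 1) - np.cos((n + 1)*beta))

bs = np.linspace(1e-4, np.pi/2, 200001)
beta_opt = {n: bs[np.argmax(overlap_sq_2n(bs, n))] for n in (10, 20, 40)}
# deviation of (n+2)*beta_opt from pi shrinks as n grows
err = {n: abs((n + 2)*beta_opt[n] - np.pi) for n in beta_opt}
```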

**Remark 8.** *In Theorem 2 the leading-order solutions were found for the optimal parameters. Higher-order corrections in n are found from Equation* (24)*. For example, it is straightforward to show that*

$$
\beta = \frac{\pi}{n} - \frac{4\pi}{n^2} + O(n^{-3}),
\tag{28}
$$

$$
\gamma = \pi - \frac{2\pi}{n} + \frac{8\pi}{n^2} + O(n^{-3}).\tag{29}
$$

**Remark 9.** *Expressions* (28) *and* (29) *are used to demonstrate parameter concentrations [27], i.e., the effect when optimal parameters for n and n* + 1 *qubits are polynomially close.*
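Concentration can also be observed directly, without the expansions (28) and (29): maximising (21) numerically, the optimal *β* for *n* and *n* + 1 qubits approach each other as *n* grows. The following is an illustrative check of ours with arbitrary system sizes and grid resolution.

```python
import numpy as np

def overlap_sq_2n(beta, n):
    """Eq. (21): |g|^2 * 2^n along the optimal line gamma = pi - 2*beta."""
    c = np.cos(beta)
    return 1 + 4*c**(n + 1)*(c**(n + 1) - np.cos((n + 1)*beta))

bs = np.linspace(1e-4, np.pi/2, 400001)

def beta_opt(n):
    # 1-D grid maximisation of the restricted overlap
    return bs[np.argmax(overlap_sq_2n(bs, n))]

# distance between optimal beta for n and n+1 qubits shrinks with n
gaps = {n: abs(beta_opt(n) - beta_opt(n + 1)) for n in (10, 20, 40)}
```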

Theorems 1 and 2 provide state-of-the-art analytical results for state preparation with a *p* = 1 depth QAOA circuit. For deeper circuits and more general settings, the analysis becomes complicated and known results are mostly numerical. Therefore, below we provide a list of numerical effects for deeper circuits which lack analytical explanation.

#### **4. Empirical Findings Missing Analytical Theory**

*4.1. Parameter Concentration in p* ≥ 2 *QAOA*

From expression (3), overlaps for circuits of different depths are related recursively as

$$g_{p+1}(\gamma,\beta,\gamma_{p+1},\beta_{p+1}) = g_p(\gamma,\tilde{\beta}) + g_p(\gamma,\beta)\cos^{n}\beta_{p+1}\left(e^{-i\gamma_{p+1}} - 1\right),\tag{30}$$

where *β̃* = (*β*<sub>1</sub> + *β*<sub>*p*+1</sub>, ... , *β<sub>p</sub>* + *β*<sub>*p*+1</sub>). This recursion was used in [27] for *p* = 2, where it was shown that in the thermodynamic limit *n* → ∞ the zero-derivative conditions yield solutions for which *nβ* → *π* and *γ* → *π*. This establishes parameter concentrations. The effect was further confirmed numerically on up to *n* = 17 qubits and *p* = 5 layers. For arbitrary depth, parameter concentrations are conjectured, yet analytical confirmation remains open. The recursion (30) can be used in the suggested operator formalism to derive a system of equations to calculate optimal parameters for circuits of arbitrary depth. In the suggested formalism the zero-derivative conditions will contain expectations of the propagators used in the circuit, and the system can be solved in the thermodynamic limit, albeit with a growing number of equations to satisfy.
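The recursive structure (30) can be probed numerically for small instances. The sketch below is our own check, not code from [27], and it adopts one particular reading of the conventions: layer *k* = 1 acts first, the appended layer (*γ*<sub>*p*+1</sub>, *β*<sub>*p*+1</sub>) acts last (next to ⟨*t*|), and under that reading the appended mixer angle is absorbed only by the adjacent, outermost mixer angle, which is the *β̃* used below.

```python
from functools import reduce
import numpy as np

def overlap(gammas, betas, n):
    """<0...0| U(b_p)V(g_p) ... U(b_1)V(g_1) |+>^n  (layer 1 applied first)."""
    x = np.array([[0, 1], [1, 0]])
    psi = np.full(2**n, 2**(-n/2), dtype=complex)    # |+>^{(x)n}
    for g, b in zip(gammas, betas):
        psi[0] *= np.exp(-1j*g)                      # V(gamma)
        u1 = np.cos(b)*np.eye(2) - 1j*np.sin(b)*x    # e^{-i beta X}
        psi = reduce(np.kron, [u1]*n) @ psi          # U(beta)
    return psi[0]

n = 4
g, b = [0.5, 1.1], [0.3, 0.7]                        # p = 2 layers
g3, b3 = 0.9, 0.2                                    # appended layer p + 1
lhs = overlap(g + [g3], b + [b3], n)                 # depth-3 overlap directly
b_shift = [b[0], b[1] + b3]                          # beta-tilde in this reading
rhs = (overlap(g, b_shift, n)
       + overlap(g, b, n)*np.cos(b3)**n*(np.exp(-1j*g3) - 1))
```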

#### *4.2. Last Layer Behaviour*

Theorem 1 establishes a linear relation between optimal parameters, independent of the number of qubits *n*. Using a global training strategy for the same problem with *p* ≥ 2 depth circuits, it was numerically observed [27] that optimal parameters depend on the depth, yet usually can be approximately described by some linear relation. In the present work, we have observed that the last layer is distinctively characterised by the very same linear relation *γ<sub>p</sub>* + 2*β<sub>p</sub>* = *π* stated in Theorem 1. We numerically confirmed this up to *p* = 5 layers and *n* = 17 qubits, as shown in Figure 1. The effect remains unexplained analytically and could be the manifestation of some hidden ansatz symmetry.

**Figure 1.** Optimal angles of *p* = 5 depth circuit for *n* ∈ [6; <sup>17</sup>]. While the first layers can be approximately described by a linear relation, the last layer fits *γp* + <sup>2</sup>*βp* = *π*. Moreover, the values of the last layer's parameters are evidently distinct from the previous layers.

#### *4.3. Saturation in Layerwise Training at p* = *n*

It was demonstrated [28] that layerwise training *saturates*, meaning that past a critical depth *p*<sup>∗</sup> the overlap cannot be improved by further layer additions. Due to this effect, naive layerwise training performance falls below that of global training. Training saturation in layerwise optimisation was reported in [28] and confirmed up to *n* = 10 qubits. Most surprisingly, the saturation depth *p*<sup>∗</sup> was observed to be equal to the number of qubits *n*. Two effects remain unexplained analytically. First, why does *p*<sup>∗</sup> = *n*? Second, could one go beyond the necessary conditions in [28] to explain saturation?

#### *4.4. Removing Saturation in Layerwise Training*

Any modification of the layerwise training process that violates the necessary saturation conditions can remove the system from its original saturation points. This idea was exploited in [28], where two types of variation were introduced for system sizes up to *n* = 7: (i) undertraining the QAOA circuit at each iteration and (ii) training in the presence of random coherent phase noise. Both modifications (i) and (ii) removed saturation at *p* = *n*, yet the reason remains unexplained.

## **5. Conclusions**

We have proven a relationship between the optimal QAOA parameters for *p* = 1, and we recover the optimal angles in the thermodynamic limit. We demonstrated the effect of parameter concentrations for *p* = 1 QAOA circuits using an operator formalism. Compared to the explicit calculation in which the objective function gradients are set to zero, the operator approach exploits the ansatz symmetry in finding optimal parameters. The suggested approach can directly be adopted to find optimal parameters for *p* ≥ 2 QAOA circuits, with increasing complexity due to the larger number of parameters. Finally, we presented a list of numerical effects, observed for particular system sizes and circuit depths, which are yet to be explained analytically. These unexplained effects include both limitations and advantages of QAOA. While difficult, adding the missing theory for these subtle effects would improve our understanding of variational algorithms.

**Author Contributions:** Methodology, D.R., R.S., E.C., V.A. and J.B.; Writing—original draft, D.R., R.S., E.C., V.A. and J.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** The authors acknowledge support from the research project Leading Research Center on Quantum Computing (agreement No. 014/20).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.
