*Article* **Variational Autoencoder Reconstruction of Complex Many-Body Physics**

#### **Ilia A. Luchnikov 1,2, Alexander Ryzhov 1, Pieter-Jan Stas 3, Sergey N. Filippov 2,4,5 and Henni Ouerdane 1,\***


Received: 9 October 2019; Accepted: 6 November 2019; Published: 7 November 2019

**Abstract:** Thermodynamics is a theory of principles that permits a basic description of the macroscopic properties of a rich variety of complex systems from traditional ones, such as crystalline solids, gases, liquids, and thermal machines, to more intricate systems such as living organisms and black holes to name a few. Physical quantities of interest, or equilibrium state variables, are linked together in equations of state to give information on the studied system, including phase transitions, as energy in the forms of work and heat, and/or matter are exchanged with its environment, thus generating entropy. A more accurate description requires different frameworks, namely, statistical mechanics and quantum physics to explore in depth the microscopic properties of physical systems and relate them to their macroscopic properties. These frameworks also allow to go beyond equilibrium situations. Given the notably increasing complexity of mathematical models to study realistic systems, and their coupling to their environment that constrains their dynamics, both analytical approaches and numerical methods that build on these models show limitations in scope or applicability. On the other hand, machine learning, i.e., data-driven, methods prove to be increasingly efficient for the study of complex quantum systems. Deep neural networks, in particular, have been successfully applied to many-body quantum dynamics simulations and to quantum matter phase characterization. In the present work, we show how to use a variational autoencoder (VAE)—a state-of-the-art tool in the field of deep learning for the simulation of probability distributions of complex systems. More precisely, we transform a quantum mechanical problem of many-body state reconstruction into a statistical problem, suitable for VAE, by using informationally complete positive operator-valued measure. We show, with the paradigmatic quantum Ising model in a transverse magnetic field, that the ground-state physics, such as, e.g., magnetization and other mean values of observables, of a whole class of quantum many-body systems can be reconstructed by using VAE learning of tomographic data for different parameters of the Hamiltonian, and even if the system undergoes a quantum phase transition. We also discuss challenges related to our approach as entropy calculations pose particular difficulties.

**Keywords:** complex systems thermodynamics; machine learning; quantum phase transition; Ising model; variational autoencoder

## **1. Introduction**

The development of the dynamical theory of heat or classical equilibrium thermodynamics as we know it was possible only with empirical data collection, processing, and analysis, which led, through a phenomenological approach, to the definition of two fundamental physical concepts, the actual pillars of the theory: energy and entropy [1]. It is with these two concepts that the laws (or principles) of thermodynamics could be stated and the absolute temperature be given a first proper definition. Though energy remains as fully enigmatic as entropy from the ontological viewpoint, the latter concept is not completely understood from the physical viewpoint. This of course did not preclude the success of equilibrium thermodynamics as evidenced not only by the development of thermal sciences and engineering, but also because of its cognate fields that owe it, at least partly or as an indirect consequence, their birth, from quantum physics to information theory.

Early attempts to refine and give thermodynamics solid grounds started with the development of the kinetic theory of gases and of statistical physics, which in turn permitted studies of irreversible processes with the development of nonequilibrium thermodynamics [2–6] and later on finite-time thermodynamics [7–9], thus establishing closer ties between the concrete notion of irreversibility and the more abstract entropy, notably with Boltzmann's statistical definition [10] and Gibbs' ensemble theory [11]. Notwithstanding conceptual difficulties inherent to the foundations of statistical physics, such as, e.g., irreversibility and the ergodic hypothesis [12,13], entropy acquired a meaningful statistical character and the scope of its definitions could be extended beyond thermodynamics, thus paving the way to information theory, as information content became a physical quantity per se, i.e., something that can be measured [14]. Additionally, although quantum physics developed independently from thermodynamics, it extended the scope of statistical physics with the introduction of quantum statistics, led to the definition of the von Neumann entropy [15], and also introduced new problems related to small, i.e., mesoscopic and nanoscopic systems [16,17], down to nuclear matter [18], where the concepts of thermodynamic limit and ensuing standard definitions of thermodynamic quantities may be put at odds.

Quantum physics problems that overlap with thermodynamics are typically classified into different categories: ground state characterization [19], thermal state characterization at finite temperature [20], the so-called eigenstate thermalization hypothesis [21–25], calculation of the dynamics of either closed or open systems [26,27], state reconstruction from tomographic data [28], and quantum system control, which, given the complexity for its implementation, requires the development of new methods [29]. Among the rich variety of methods applicable to such problems, including, e.g., mean-field approach [30], slave particle approach [31], dynamical mean-field theory [32], nonperturbative methods based on functional integrals [33], we believe two large families of techniques are of particular interest for numerical studies of many-body systems when strong correlations must be accounted for: One is based on the quantum Monte Carlo (QMC) framework [34], which is powerful to overcome the curse of dimensionality by using the stochastic estimation of high-dimensional integrals; the other family encompasses methods that search solutions in the parametric set of functions, also called ansatz. The most used ansatzes are based on different tensor network architectures [35,36] as tensor network-based methods show state-of-the-art performance for the characterization of one-dimensional strongly correlated quantum systems. One can solve either the ground-state problem by using the variational matrix product state (MPS) ground state search [37] or a dynamical problem using a time-evolving block decimation (TEBD) algorithm [38]. Quantum criticality of one-dimensional systems also can be studied by using a more advanced architecture called multiscale entanglement renormalization ansatz (MERA) [39]. The application of tensor networks is not restricted to one-dimensional systems, and one can describe an open quantum dynamics [40], characterize the numerical complexity of an open quantum dynamics [41,42], perform tomography of non-Markovian quantum processes by using tensor networks [43,44], analyze properties of two dimensional quantum lattices by using projected entangled pair states (PEPS) [45], or solve classical statistical physics problems [46,47].

The cross-fertilization of quantum physics and thermodynamics has benefited much from the powerful quantum formalism and computational techniques; however, as thermodynamic concepts evolved from intuitive/phenomenological definitions to classical-mechanics constructs, extended with quantum physics and formalism when needed, thermodynamics, in spite of its undeniable theoretical and practical successes, never managed to fully mature into a genuine fundamental theory that firmly rests on strong basic postulates. On one hand, this led a growing number of physicists to consider thermodynamics as incomplete, and on the other, to think quantum theory as the underlying framework from which equilibrium and nonequilibrium thermodynamics emerge. Quantum thermodynamics [48,49] is a fairly recent field of play, where new ideas are tested while revisiting old problems related to cycles, engines, refrigerators, and entropy production, to name a few [50,51]. Further, quantum technology is a burgeoning field at the interface of physics and engineering, which seeks to develop devices able to harness quantum effects for computing and secure communication purposes [52,53]. The wide scale development of such a kind of systems, which irreversibly interact with an infinite environment, rests on the ability to properly simulate the open quantum dynamics of their many-body properties and analyze coherence and dissipation at the quantum level.

How fast quantum thermodynamics will progress is difficult to anticipate as there exist numerous unsolved problems, especially those related to the proper characterization of the physical processes, e.g., what qualifies as heat or work on ultrashort time and length scales, where averages become irrelevant is unclear, and how the laws of thermodynamics may be systematically adapted still may be debated. To mitigate risks of slow progress, one may resort to approaches that do not rely on models of systems, but rather on data, the idea being to gain actual knowledge and understanding from data irrespective of how complex the studied system is. Machine learning (ML) provides perfectly suited tools for that purpose [54]. ML has a rather long history that can be dated back with the works of Bayes (1763) on prior knowledge that can be used to calculate the probability of an event as formulated by Laplace (1812). Much later (1913), Markov chains were proposed as a tool to describe sequences of events, each being characterized by a probability of occurrence that depends on the actuality of the previous event only. The main milestone is in 1950, with Turing's machine that can learn [55], shortly followed in 1951 by the first neural network machine [56]. Thanks to the huge increase in computational power over the last two decades, ML is now used for a wide variety of problems [54], and quantum machine learning now shows extraordinary potential for faster and more efficient than ever treatment of complex quantum systems problems [57], one major challenge still residing in the development of the hardware capable to harness and transform this potentiality into actual tool.

With the recent success in the field of deep learning, tools other than those based on tensor networks work as well as an ansatz. Restricted Boltzmann machine has been successfully applied as an ansatz to a ground state search, dynamics calculation, and quantum tomography [58–60], as well as convolution neural network to the two-dimensional frustrated *J*1 − *J*2 model [61]. The deep autoregressive model was applied very efficiently and elegantly to a ground state search of many-body quantum system and to classical statistical physics as well [62,63]. It was also recently shown how ML can establish and classify with high accuracy the chaotic or regular behavior of quantum billiards models and XXZ spin chains [64]. Thus, it can be useful to transfer deep architectures from the field of deep learning to the area of many-body quantum systems. A variational autoencoder (VAE) was used for sampling from probability distributions of quantum states in [65]; in the present work, we show that state-of-the-art generative architecture called conditional VAE can be applied to describe the whole family of the ground states of a quantum many-body system. For that purpose, using quantum tomography (albeit in an approximate fashion as discussed below) and reconstruction tools developed in [66], we consider the paradigmatic Ising model in a transverse-field as an illustration of the usefulness and efficiency of our approach. The use of VAE in such a problem is justified by the simplicity of VAE training, as well as its expressibility [67].

The article is organized as follows. In Section 2, we give a brief recap of the physics of the Ising model in a transverse field. In Section 3, we develop our generative model in the framework of the tensor network. Section 4 is devoted to the variational autoencoder architecture. The results are shown and discussed in Section 5. The article ends with concluding remarks, followed a by a short series of appendices.

#### **2. Transverse-Field Ising Model**

Among the rich variety of condensed matter systems, magnetic materials are a source of many fruitful problems, whose studies and solutions inspired discussions and new models beyond their immediate scope. The Kondo effect (existence of a minimum of electrical resistivity at low temperature in metals due to the presence of magnetic impurities) is one such problem [68,69], as it provides an excellent basis for studies of quantum criticality and absolute zero-temperature phase transitions [70,71] and, also, on a more fundamental level, a concrete example of asymptotic freedom [69]. Assuming infinite on-site repulsion, the single-impurity Anderson model [68,72] was used to establish a correspondence between Hamiltonian language and path integral for the development of nonperturbative methods in quantum field theory [73,74]. One other important model is that of the Heisenberg Hamiltonian, defined for the study of ferromagnetic materials, and which, assuming a crystal subjected to an external magnetic field *B*, reads [75] as

$$H = -\sum\_{\langle i,j \rangle} I\_{\hat{i}\hat{j}} \hat{S}^i \hat{S}^j - \hbar \cdot \sum\_{\hat{j}} \hat{S}^j \tag{1}$$

where, for ease of notations, we introduced *h* = *gμBB*, with *g* being the Landé factor and *μB* = *eh*¯ /2*<sup>m</sup>*e being the Bohr magneton (*e*: elementary electric charge, and *m*e: electron mass); *Jij* is a parameter that characterizes the nearest-neighbors exchange interaction between electron spins on the crystal sites *i* and *j* (the quantum spins *S* ˆ*i* and *S*ˆ*j* are vector operators whose components are proportional to the Pauli matrices). For simplicity, one may consider *Jij* ≡ *J* constant. If *J* > 0, then the system is ferromagnetic and if *J* < 0 the system is antiferromagnetic. Hereafter, we fix the electron's magnetic moment *gμ*B = 1.

Although Equation (1) has a fairly simple form, the exact calculation of the partition function is

$$Z = \operatorname{Tr} \mathfrak{e}^{-\beta H} \tag{2}$$

where *β* = 1/*k*B*T* is the inverse thermal energy, which is possible on the analytical level with the mean-field approximation that simplifies the Hamiltonian (1), and also for one-dimensional systems, one difficulty of the Heisenberg Hamiltonian being that the three components of a spin vector operator do not commute. That said, Heisenberg's Hamiltonian is very useful to, e.g., study spin frustration [76], entanglement entropy [77], and also serve as a test case for density-matrix renormalization group algorithms [78]. Under zero field, Heisenberg's Hamiltonian is also a simplified form of the Hubbard model at half-filling, thus including ferromagnetism in the scope of strongly correlated systems studies.

A particular, but very important, approximation of Heisenberg's Hamiltonian, whose significance lies in physics, especially for the study of critical phenomena, cannot be underestimated: the so-called Ising model. In its initial formulation [79], Ising spins are *N* classical variables, which may take ±1 as values and form a one-dimensional (1D) system characterized by free or periodic boundary conditions. The classical partition function *Z* may be calculated analytically for the 1D Ising model, and quantities such as the average total magnetization are obtained directly [80]:

$$M = \frac{1}{\beta} \frac{\partial \ln Z}{\partial h} \tag{3}$$

In the present work, we consider a 1D quantum spin chain whose Hilbert space is given by H = '*Ni* C2. The system is described by the transverse-field Ising (TFI) Hamiltonian [81]:

$$H = -J\sum\_{\langle i,j\rangle} \sigma\_z^i \sigma\_z^{i+1} + h\_\mathbf{x} \sum\_{i=1}^N \sigma\_x^i. \tag{4}$$

where *σiα* (*α* ≡ *x*, *z*) is the Pauli matrix for the *α*-component of the *i*-th spin in the chain, and *hx* is the magnetic field applied in the transverse direction *x*. In this case, the spins are no longer the classical Ising ones and the two terms that compose the Hamiltonian *H* do not commute, therefore requiring a full quantum approach. An example of a real-world system that may be studied as a quantum Ising chain is cobalt niobate (CoNb2O6); in this case, the spins that undergo the phase transition as the transverse field varies are those of the Co2<sup>+</sup> ions [82]. The spin states are denoted |+*i* and |−*i* at ion site *i*. There are two possible ground states: when all *N* spins are in the state |+ or in the state |−, i.e., when they are all aligned, which defines the ferromagnetic phase.

The phase transition from the ferromagnetic phase to the paramagnetic phase that we speak of now is of a quantum nature, and not of a thermal nature, as here it is driven only by the external magnetic field. More precisely, when the transverse field *hx* is applied with sufficient strength, the spins align along the *x* direction, and the spin state at site *i* is given as the superposition (|+*i* + |−*i*) /√2, which is nothing else but the eigenstate of the *x*-component of the spin. Therefore, in this particular case, there is no need to raise the temperature of the system initially in the ferromagnetic phase beyond the Curie temperature to make it a paramagnet: the many-body system remains in its ground state, but its properties have changed. Further, note that unlike for the ferromagnetic phase, the quantum paramagnetic phase has spin-inversion symmetry. An insightful discussion on quantum criticality can be found in Reference [83].

Now, we briefly comment on the quantity *β* = 1/*k*B*T* in the context of quantum phase transitions, which, strictly speaking, can only occur at temperature *T* = 0 K. In fact, close to the absolute zero, where *β* → <sup>∞</sup>, their signatures can be observed as quantum fluctuations dominate thermal fluctuations in the criticality region, where the quantum critical point lies. The imaginary time formalism [84], where exp(−*βH*) is interpreted as an evolution operator, and the partition function *Z* as a path integral, provides a way to map a quantum problem onto a classical one with the introduction of the imaginary time *β* resulting from a Wick rotation in the complex plane, thus yielding one extra dimension to the model. In classical thermodynamics, to observe a phase transition in a system requires that its size (i.e., the number of constituents *N*) tends to infinity so that the order parameter is non-analytic at the transition point; so, for the quantum transition, the thermodynamic limit entails the limit *β* → ∞ also: the 1D TFI model is mapped onto an equivalent 2D classical Ising model [85]. The imaginary time formalism permits implementation of classical Monte Carlo simulations to study quantum systems. Further discussion, including the sign problem for the quantum spin-1/2 system, is available in Reference [4].

We have chosen the transverse-field Ising model as an illustrative case for our study for several reasons. First, as this system is 1-dimensional, we can apply an MPS variational ground state solver [37], and therefore obtain the ground state solution in MPS representation. We can then perform fast and exact sampling for generation of large data sets for the training of the VAE. Next, this model can be solved analytically, which allows us to adequately benchmark our results. Finally, this model shows a nontrivial behavior around the quantum phase transition point at *hx* = 1, and thus constitutes an interesting example to apply a VAE.

#### **3. Generative Model as a Quantum State**

Many-body quantum physics is rich in high-dimensional problems. Often, however, with increasing dimensionality, these become extremely difficult or impossible to solve. One solving method is through the reformulation of the quantum mechanical problem as a statistical problem, when possible. This way, machine learning can be used to effectively solve such a problem, as machine learning is a tool for the solving of high-dimensional statistical problems [86]. Probabilistic interpretation allows for using powerful sampling-based methods that work efficiently with high dimensional data.

An example of the reformulation of a quantum problem as a statistical problem is with informationally complete (IC) positive-operator valued measures (POVMs) [87]. POVMs describe the most general measurements of a quantum system. Each particular POVM is defined by a set of positive semidefinite operators *M<sup>α</sup>*, with the normalization condition ∑*α Mα* = 1, where 1 is the identity operator. The fact that the POVM is informationally complete means that using measurement outcomes one can reconstruct the state of a system with arbitrary accuracy.

The probability of measurement outcome for a quantum system with the density operator *ρ* is governed by Born's rule: *<sup>P</sup>*[*α*] = Tr(*M<sup>α</sup>*), where {*Mα*} is a particular POVM and *α* is an outcome result. In other words, any density matrix can be mapped on a mass function, although not all mass functions can be mapped on a density matrix [88,89]. Some mass functions lead to non-positive semidefinite "density matrices", which is not physically allowed. As such, quantum theory is a constrained version of probability theory. For a many-body system, these constraints can be very complicated, and direct consideration of quantum theory as a constrained probability theory is not fruitful. However, if one can access the samples of the IC POVM induced mass function, which is by definition physically allowed, this mass function can be reconstructed using generative modeling [66,67]. Samples can be obtained either by performing generalized measurements over the quantum system or by in silico simulation.

In the present work, we simulate measurements of the ground state of a spin chain with the TFI Hamiltonian, Equation (4). As a local (one spin) IC POVM, we use the so-called symmetric IC POVM for qubits (tetrahedral) POVM [90]:

$$\begin{aligned} M\_{\text{tera}}^{\mathfrak{a}} &= \frac{1}{4} \left( \mathbb{1} + \mathbf{s}^{\mathfrak{a}} \sigma \right), \; a \in (0, 1, 2, 3), \; \sigma = \left( \sigma\_{x}, \sigma\_{y}, \sigma\_{z} \right), \\\ s^{0} &= (0, 0, 1), \; s^{1} = \left( \frac{2\sqrt{2}}{3}, 0, -\frac{1}{3} \right), \; s^{2} = \left( -\frac{\sqrt{2}}{3}, \sqrt{\frac{2}{3}}, -\frac{1}{3} \right), \; s^{3} = \left( -\frac{\sqrt{2}}{3}, -\sqrt{\frac{2}{3}}, -\frac{1}{3} \right). \end{aligned} \tag{5}$$

Note that the many-spin generalization of local IC POVM can easily be obtained by considering the tensor product of local ones:

$$M\_{\text{tetra}}^{a\_1,\dots,a\_N} = M\_{\text{tetra}}^{a\_1} \odot M\_{\text{tetra}}^{a\_2} \odot \dots \odot M\_{\text{tetra}}^{a\_N} \,. \tag{6}$$

To simulate measurements outcome under the IC POVM described above, we implement the following numerical scheme: First, we run a variational MPS ground state solver to obtain the ground state of the TFI model in the MPS form:

$$
\Omega\_{i\_1, i\_2, \dots, i\_N} = \sum\_{\beta\_1, \beta\_2, \dots, \beta\_{N-1}} A\_{i\_1 \beta\_1}^1 A\_{\beta\_1 i\_2 \beta\_2}^2 \dots A\_{\beta\_{N-1} i\_N}^N \tag{7}
$$

where we use the tensor notation instead of the bra-ket notation for further simplicity, and we obtain the MPS representation of IC POVM induced mass function:

$$P[\mathfrak{a}\_1, \mathfrak{a}\_2, \dots, \mathfrak{a}\_N] = \sum\_{\delta\_1, \delta\_2, \dots, \delta\_{N-1}} \pi\_{\mathfrak{a}\_1 \delta\_1} \pi\_{\delta\_1 \mathfrak{a}\_2 \delta\_2} \dots \pi\_{\delta\_{N-1} \mathfrak{a}\_N}$$

$$\pi\_{\delta\_{n-1} \mathfrak{a}\_n \delta\_n} = \pi \underbrace{\int\_{\mathfrak{a}\_{n-1} \mathfrak{a}\_n \delta\_{n-1}}\_{\text{multi-index } \delta\_{n-1}} a\_n \underbrace{\int\_{\mathfrak{a}} \mathfrak{a}\_n \delta\_n'}\_{\text{multi-index } \delta\_n} = [M\_{\text{tetra}}]\_{ij}^{\mathfrak{a}\_n} A\_{\beta\_{n-1} j \beta\_n}^{\mathfrak{a}} [A^n]\_{\beta\_{n-1}' i \beta\_n'}^\* \tag{8}$$

whose diagrammatic representation [35] is shown in Figure 1. Next, we produce a set of samples of size *M*: {*αi*1, *αi*2, ... , *<sup>α</sup>iN*}*Mi*=<sup>1</sup>from the given probability. The sampling can be efficiently implemented

as shown in Appendix B. We call this set of samples (outcome measurements) a data set, which may then be used to train a generative model *p*[*<sup>α</sup>*1, *α*2, ... , *<sup>α</sup>N*|*θ*] to emulate the true mass function *<sup>P</sup>*[*<sup>α</sup>*1, *α*2, ... , *<sup>α</sup>N*]. Here, *θ* is the set of parameters of the generative model, which is trained by maximizing the logarithmic likelihood L(*θ*) = ∑ *M i*=1 log *p*[*α<sup>i</sup>* 1, *αi* 2, ... , *αi N*|*θ*] with respect to the parameters *θ* [91]. The trained generative model fully characterizes a quantum state. The density matrix is obtained by applying an inverse transformation to the mass function [92]:

$$\begin{aligned} \boldsymbol{\varrho} &= \sum\_{\boldsymbol{a}\_{1}, \boldsymbol{a}\_{2}, \dots, \boldsymbol{a}\_{N}} p[\boldsymbol{a}\_{1}, \boldsymbol{a}\_{2}, \dots, \boldsymbol{a}\_{N} | \boldsymbol{\theta}] [\boldsymbol{M}^{\boldsymbol{a}\_{1}}\_{\text{terra}}]^{-1} \otimes [\boldsymbol{M}^{\boldsymbol{a}\_{2}}\_{\text{terra}}]^{-1} \otimes \dots \otimes [\boldsymbol{M}^{\boldsymbol{a}\_{N}}\_{\text{terra}}]^{-1}, \\ \boldsymbol{\left[M^{\boldsymbol{a}}\_{\text{terra}}\right]} &= \sum\_{\boldsymbol{a}'} T^{-1}\_{\text{act}'} M^{\boldsymbol{a}'}\_{\text{terra}'} \\ \boldsymbol{T}\_{\text{act}'} &= \text{Tr}\left(M^{\boldsymbol{a}}\_{\text{terra}} M^{\boldsymbol{a}'}\_{\text{terra}}\right), \end{aligned} \tag{9}$$

the diagrammatic representation of which is given in Figure 2. Note that the summation included in the density matrix representation is numerically intractable, but we can estimate it using samplings from the generative model.

**Figure 1.** Tensor diagrams for (**a**) building blocks, (**b**) matrix product state (MPS) representation of measurement outcome probability, and (**c**) its subtensor.

**Figure 2.** Tensor diagrams for (**a**) building blocks and (**b**) inverse transformation from a mass function to a density matrix.

Our goal is to use a generative model as an effective representation of quantum states to calculate the mean values of observables such as, e.g., two-point and higher-order correlation functions. An explicit expression of the two-point correlation function obtained by sampling from the trained generative model is shown in Figure 3. To obtain the ground state of the TFI model, we use a

variational MPS ground state search, and we pick the bond dimension of MPS equal to 25 and perform 5 DMRG sweeps to ge<sup>t</sup> an approximate ground state in the MPS form. We use the variational MPS solver provided by the mpnum toolbox [93].

**Figure 3.** Tensor diagrams representing calculation of two-point correlation function.

#### **4. Variational Autoencoder Architecture**

In our work, we use a conditional VAE [94] to represent quantum states. A conditional VAE is a generative model expressed by the following probability distribution,

$$p[\mathbf{x}|\theta, h] = \int p[\mathbf{x}|z, \theta, h] p[z] dz,\tag{10}$$

where *x* is the data we want to simulate; *θ* represents the VAE parameters, which can be tuned to ge<sup>t</sup> the desired probability distribution over *x*; *h* is the condition; and *z* is a vector of latent variables. In our case, *x* is the quantum measurement outcome in one-hot notation. A collection of measurement outcomes is a matrix of size *N* × 4, where *N* is the number of particles in the chain and 4 is the number of possible outcomes of the tetrahedral IC POVM, which is either [1000], [0100], [0010], or [0001]. *h* is the external magnetic field. The probability distribution *p*[*x*|*<sup>z</sup>*, *θ*, *h*] can thus be written as

$$p[\mathbf{x}|z,\theta,h] = \prod\_{i=1}^{N} \prod\_{j=1}^{4} \pi\_{ij}(z\_{\prime}h\_{\prime}\theta)^{x\_{\overline{\imath}\overline{\jmath}}},\tag{11}$$

where *<sup>π</sup>ij*(*<sup>z</sup>*, *h*, *θ*) is the neural network in our architecture, and, more precisely, *<sup>π</sup>ij* is the probability of the *jth* outcome of the POVM for the *ith* spin with <sup>∑</sup>*Nj*=<sup>1</sup> *<sup>π</sup>ij* = 1 and *<sup>π</sup>ij* ≥ 0. The quantity *p*[*z*] is the prior distribution over latent variables, which is simply given by N (0, *I*) = 1 √<sup>2</sup>*π<sup>N</sup>* exp %−12 *<sup>z</sup>*T*z*&, with *I* being the identical covariance matrix. We take the number of latent variables equal to the number of spins, *N*. Essentially, we want to optimize our VAE so that its probability matches the probability of the quantum measurement outcomes as closely as possible. This can be done using the well-known maximum likelihood estimation:

$$\theta\_{MLE} = \underset{\theta}{\text{argmax}} \sum\_{i=1}^{M} \log(p[\mathbf{x}\_i | \theta, h]),\tag{12}$$

where {*xi*}*Mi*=<sup>1</sup> is the data set of outcome measurements. We cannot simply maximize this function using, for example, a gradient descent method, due to the presence of hidden variables in the structure of this function. However, we can overcome this problem by using the Evidence Lower Bound (ELBO) [95] and the reparametrization trick shown in [96]. The detailed description of the procedure is given in the Appendix A.

Once trained, the VAE is a simple and efficient way to produce new samples from its probability distribution. It can be done in three steps. First, we produce a sample from the prior distribution *p*[*z*] = N (0, *I*). Next, we feed this sample and the external magnetic field value into the neural network decoder *<sup>π</sup>ij*(*<sup>z</sup>*, *θ*, *h*), which returns the matrix of probabilities. Finally, we sample from the matrix of probability *<sup>π</sup>ij*(*<sup>z</sup>*, *θ*, *h*) to generate "fake" outcome measurements. A visual representation of the sampling method is shown in Figure 4.

**Figure 4.** Sampling scheme with the trained variational autoencoder (VAE).

In many problems, gradients of observables with respect to different model parameters yield quantities of interest. For example, one may consider the magnetic differential susceptibility tensor *χij* = *∂μi*/*<sup>∂</sup>hj*. It can be done efficiently by using backpropagation through the VAE architecture but, as samples from the VAE are discrete, a straightforward backpropagation is impossible. In recent papers [97–99], a method called the Gumbel-softmax was introduced to overcome this difficulty through continuous relaxation. The spirit, and therefore the physical meaning of the method, may be understood with a short discussion of the so-called simulated annealing technique, which is often used to solve discrete optimization problems. Broadly speaking, the simulated annealing rests on the introduction of a parameter that acts as an artificial "temperature", which varies continuously to modify the state of the system in search of a global optimum. Starting from a given state, for some values of the temperature, if the system mostly explores the neighboring states, moving among them and possibly in the vicinity of the "better" ones, i.e., with lower energy, it may ge<sup>t</sup> and remain close to a local optimum, or local energy minimum in the thermodynamic language; however, to avoid remaining in a locally optimal region, "bad" moves leading to worse (i.e., higher energy) states are useful to explore the temperature space more completely improving the chance to find a global optimum or at least to be near it. To each move an energy variation, Δ*E*, is associated; it is the continuous character of the fictitious temperature that makes the discrete problem continuous as the probability exp(−Δ*E*)/*k*B*<sup>T</sup>* of acceptance of a state is continuous. Although this approach has been known for a long time [100], it remains topical and under active development [101,102]. The method of continuous relaxation we use also exploits such an artificial temperature to make discrete samples continuous.

The Gumbel-softmax trick, consists of three steps:

1. We calculate the matrix of log probabilities, taking element-wise logarithm of decoder network 

$$\text{output: } \log \Pi = \begin{bmatrix} \log \pi\_{11} & \log \pi\_{12} \dots \log \pi\_{1N} \\ \log \pi\_{21} & \log \pi\_{22} \dots \log \pi\_{2N} \\ \log \pi\_{31} & \log \pi\_{32} \dots \log \pi\_{3N} \\ \log \pi\_{41} & \log \pi\_{42} \dots \log \pi\_{4N} \end{bmatrix}^{\prime}$$


The quantity *x*fake soft (*T*) has a number of remarkable properties: first, it becomes an exact one-hot sample when *T* → 0; second, we can backpropagate through soft samples for any T > 0. The method is validated in the next section.

Before we proceed to the presentation and discussion of our results, and to better see the added value of the VAE, it is instructive to compare MPS and VAE (NN) in terms of expressibility, i.e., "estimation of MPS states via incomplete local measurements" vs "VAE reconstruction". As the state of the system is assumed to be unknown, and some measurement outcomes are only known for different magnetic fields, these outcomes are too few for exact tomography. Further, it is known that for a given bond dimension *d*, the entangled entropy cannot be larger than log(*d*); in other words, the bond dimension of MPS places an upper bound on the entangled entropy. Thus, the MPS representation describes well only quantum states with low entangled entropy, i.e., quantum states which satisfy the area law [103,104]. The situation with neural network quantum states (NQS) is different: there is no such a restriction for NQS. Moreover, the existence of NQS with volume-law entanglement [105] shows a promising development of new, and possibly powerful, NN-based approaches to representing many-body quantum systems.

## **5. Results**

Here, we show that the VAE trained on a set of preliminary measurements is capable to describe the physics of the whole family of TFI models. We validate our results by comparing VAE-based calculations with numerically exact calculations performed by variational MPS algorithm [35]. Additionally, to assess the capabilities of the VAE, we consider a spin chain with 32 spins. We calculate the MPS representation of the ground state and extract information from it by performing measurements over the state. The external field in the *x*-direction is varied from 0 to 2 with a step of 0.1. The VAE is trained on a data set (TFI measurement outcomes) consisting of 10.5 million samples in total: 21 external fields *hx* with 500,000 samples per field.

To evaluate the VAE performance, we simply compare directly the numerically exact correlation functions with those reconstructed with our VAE. Those of *n* = 1, ... , 32, *σ*<sup>1</sup> *z σ<sup>n</sup> z* , and *σ*<sup>1</sup> *xσ<sup>n</sup> x* are shown in Figures 5 and 6, respectively, and we compare the numerically exact and the VAE-based average magnetizations along *x*, given by *σ<sup>n</sup> x* for each position of the spin along the chain, in Figure 7. We see that the VAE captures well the physics of the one- and two-point correlation functions. Figure 8 shows the total magnetizations, *μx* and *μ<sup>z</sup>*, in the *x* and *z* directions, respectively, with *μi* = 1 *N* ∑ *N <sup>j</sup>*=<sup>1</sup>*σ<sup>j</sup> i*, and we see that the VAE is a tool well-suited for the description of the quantum phase transition and also finite-size effects: whereas for the infinite TFI chain, i.e., in the thermodynamic limit, the phase transition is observed at *hx* = 1, and the finite size of the system yields a shift of the critical point at *hx* ≈ 0.9. Also note that in the *T* → 0 limit, the magnetization *M* defined in Equation (3) coincides exactly with the magnetization *μ* defined above.

A backpropagation algorithm combined with the Gumbel-softmax trick may be used to evaluate the derivative of an output over an input. We use this approach to calculate some elements of a magnetic differential susceptibility tensor *χij* = *∂μi*/*<sup>∂</sup>hj*, in particular, *χxx* and *χzx* shown in Figure 9. The backpropagation-based magnetic differential susceptibility agrees well with the numerically calculated one (central differences). The main advantage of the backpropagation-based calculation is its numerical efficiency. The VAE may thus be trained with an arbitrary set of external parameters, i.e., not only *hx*, but also *hy* and *hz*, and yield the full differential susceptibility tensor.

**Figure 5.** Two-point correlation function *σz*1*<sup>σ</sup>zn* for different values of external magnetic field *hx*.

**Figure 6.** Two-point correlation function *σx*1*σxn* for different values of external magnetic field *hx*.

**Figure 7.** Average magnetization per site along *x* for different values of external magnetic field *hx*.

**Figure 8.** Total magnetization along *x* and *z* axes for different values of external magnetic field *hx*. The location of the critical region is slightly shifted towards smaller values of *hx* due to the finite size of the chain.

**Figure 9.** Backpropagation-based and numerical-based (central differences) values of *χxx* and *χzx* for different values of external magnetic field *hx*. Both derivatives slightly fluctuate due to VAE error.

At this stage, we could conclude that the VAE is capable to describe the physics of one- and two-point correlation functions, and therefore the TFI physics. However, notwithstanding the ability of the VAE to yield correlation functions that fit well numerically-exact correlation functions, this is not ye<sup>t</sup> a full proof that it represents quantum states well. To address this point, we consider a small spin chain (five spins with TFI Hamiltonian and an external magnetic field *hx* = 0.9) for which we calculate both the exact mass function and that estimated from VAE samples. Figure 10 shows that the VAE result again fits the numerically exact mass function with high accuracy. Further, we calculate the Bhattacharyya coefficient [106]: BC(*p*vae, *p*exact) = ∑*α <sup>p</sup>*exact[*α*] *p*vae[*α*] *p*exact[*α*] as a function of the external magnetic field *hx*. Results reported in Figure 11 show that BC(*p*vae, *p*exact) > 0.99 over the whole *hx* range, which thus proves that the VAE represents a quantum state well, at least for small spin chains.

**Figure 10.** Comparison of two positive-operator valued measure (POVM)-induced mass functions (*P*[*α*] = Tr(*ρM<sup>α</sup>*)) for a chain of size 5: numerically exact mass function and reconstructed from VAE samples mass function. A sequence of indices *α* has been transformed into a single multi-index. Indices have been ordered to put numerically exact probability in descending order. A good agreemen<sup>t</sup> between the mass functions is observed.

**Figure 11.** Dependence of the classical fidelity on the external magnetic field. A high predictive accuracy is demonstrated for the whole set of fields.

The structure of the entanglement is an another interesting subject that we would like to validate. The essence of entanglement between two parts of the chain, which is split into *n* left spins and *N* − *n* right spins, can be described by the Réniy entropy of the left part of this chain: *Sα* = 1 1−*<sup>α</sup>* log Tr*ρ<sup>α</sup>*n, where *ρ*n is the density matrix of the first *n* spins in the chain. We estimate the Rényi entropy of order 2: *S*2 = − log(Tr*ρ*<sup>2</sup>), as it can be efficiently calculated from the matrix product representation of the density matrix and from the VAE samples. However, as sample-based estimation of the entangled entropy has a variance that grows exponentially with the number of spins, we consider a small spin chain of size 10. A direct comparison between the numerically exact and the VAE-based entangled entropies is shown for different values of *n* in Figure 12. For this particular case, the VAE clearly overestimates the entangled entropy. This undesirable effect is indeed observed for all sizes of spin chains, and even for the spin chain of size 5, for which we have an excellent agreemen<sup>t</sup> between the

numerically exact mass function and the VAE-based result. The entropy *S*2 is sensitive to small errors in the mass function, but it also appears that the primary method of state reconstruction used in the present work has the following shortcomings.

**Figure 12.** Comparison of the numerically exact Rényi entropy and that reconstructed from the VAE samples for different values of *n*.


**Figure 13.** Comparison of numerically exact spectra of density matrices and VAE-estimated spectra. The ground state spectra of the spin chain of size 5 with an external magnetic field *h* = 0.9 is shown on the right panel, and the spectra of the reduced density matrix (last 3 spins) are shown on the left panel.

These drawbacks hinder a robust description of the entanglement structure. In addition to the mismatch between the Rényi entropies (*S*2), the entropy of a reduced density matrix can be larger than the entropy of the whole density matrix, which is erroneous. This particular issue, now identified, may be resolved by introduction of a particular regularization term into the VAE loss. This is the object of future work.

Finally, it is also instructive to comment on the memory costs of the use of either MPS or VAE, which is somehow a tricky question, as it is unclear for any NN-based architecture what numbers of layers and neurons per layer are needed because there is no criterion for NN, whereas for the MPS and tensor networks, there is one. Thus, a direct comparison of NN architectures and tensor networks (MPS, etc.) is certainly a difficult task, and in our opinion, likely an impossible one. At this stage, we may say the following. For a given spin chain of size *N* and maximal entangled entropy between subchains *S* = −Tr*ρ* log *ρ*, the MPS requires to store approximately 2*N* exp (2*S*) complex numbers; this follows from the fact that one then considers *N* subtensors of size exp (*S*) × 2 × exp (*S*), where exp (*S*) is the typical (approximate) size of bond dimension. For a VAE, although it seems that there are no entropic restrictions, the proper quantitative characterization of the "neural network" complexity of a quantum state still is an open question (for tensor networks, it is the entangled entropy). A VAE contains two neural networks: encoder and decoder. To store a feed-forward neural network, one has to store ∑*i li*−<sup>1</sup> × *li* + *li* real numbers, with *li* being the number of neurons in the layer number *i*. In general, one may conclude that the MPS is preferable for low entangled states, and the VAE is preferable for highly entangled states.
