Article

Quantum Recurrent Neural Networks: Predicting the Dynamics of Oscillatory and Chaotic Systems

1 Computational and Data Science Program, Middle Tennessee State University, Murfreesboro, TN 37132, USA
2 Department of Mathematical Sciences, Middle Tennessee State University, Murfreesboro, TN 37132, USA
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Algorithms 2024, 17(4), 163; https://doi.org/10.3390/a17040163
Submission received: 24 March 2024 / Revised: 13 April 2024 / Accepted: 17 April 2024 / Published: 19 April 2024
(This article belongs to the Special Issue Quantum and Classical Artificial Intelligence)

Abstract:
In this study, we investigate Quantum Long Short-Term Memory and Quantum Gated Recurrent Unit models, integrated with Variational Quantum Circuits, for modeling complex dynamical systems, including the Van der Pol oscillator, coupled oscillators, and the Lorenz system. We implement these advanced quantum machine learning techniques and compare their performance with that of traditional Long Short-Term Memory and Gated Recurrent Unit models. The results reveal that the quantum-based models deliver superior precision and more stable loss metrics throughout 100 epochs for both the Van der Pol oscillator and the coupled harmonic oscillators, and 20 epochs for the Lorenz system. The Quantum Gated Recurrent Unit outperforms the competing models. For the Van der Pol oscillator, it reports MAE 0.0902 and RMSE 0.1031 for the variable x, and MAE 0.1500 and RMSE 0.1943 for y; for the coupled oscillators, Oscillator 1 shows MAE 0.2411 and RMSE 0.2701, and Oscillator 2 shows MAE 0.0482 and RMSE 0.0602; and for the Lorenz system, the results are MAE 0.4864 and RMSE 0.4971 for x, MAE 0.4723 and RMSE 0.4846 for y, and MAE 0.4555 and RMSE 0.4745 for z. These outcomes mark a significant advancement in the field of quantum machine learning.

1. Introduction

Ordinary differential equations (ODEs) are foundational mathematical models that play a critical role across diverse domains, such as physics [1], biology [2,3], and economics [4]. The numerical resolution of ODE systems is pivotal for understanding the dynamics of these varied processes. To this end, numerous methodologies have been devised, including linear multistep methods [5,6] and Runge–Kutta methods [7,8]. Nevertheless, the practical application of these methods frequently encounters the complexity of systems characterized by nonlinear and time-varying elements, alongside a pronounced sensitivity to initial conditions. Such complexities [9] significantly challenge the efficacy of traditional analytical and numerical strategies, especially in terms of achieving desired levels of precision, stability, and computational efficiency. Given these considerations, there is a pressing need for more sophisticated techniques capable of navigating the intricate landscape of ODE systems.
In recent years, recurrent neural networks (RNNs) have achieved significant breakthroughs in processing sequential data. RNNs have been used extensively in various domains, including but not limited to, time series data prediction [10,11,12], machine translation [13], and speech recognition [14]. Notably, RNNs have also played a critical role in scientific research [15,16], an area of particular interest to us. Within this framework, RNNs and their specific variants, such as Long Short-Term Memory (LSTM) [17] and Gated Recurrent Units (GRUs) [18], have proven to be efficacious in learning or forecasting complex, nonlinear challenges. The intrinsic nature of RNNs, which are capable of capturing temporal dependencies and learning from sequential data, makes them particularly suited for modeling dynamical systems. Unlike conventional approaches that require explicit numerical schemes for integration over time, RNNs can implicitly learn the underlying dynamics from data, offering a way to bypass the complexities of directly solving ordinary differential equations (ODEs). This capability allows for the modeling of complex systems with high degrees of freedom and nonlinearity, providing a flexible and potentially more accurate alternative to traditional methods. Consequently, the use of RNNs in solving ODEs, especially systems of ODEs, represents a significant shift towards data-driven approaches in the prediction of dynamical [19,20,21,22] and time series chaotic systems [23,24,25,26].
Concurrently, quantum computing has been advancing rapidly, spearheaded by technology giants such as IBM [27], Google [28], PennyLane [29], and D-Wave Systems [30]. These quantum systems offer theoretical advantages for specific computational tasks and simulations of intricate quantum phenomena. However, the practical utility of current Noisy Intermediate-Scale Quantum (NISQ) devices is constrained by limitations in quantum error correction [31,32,33]. This limitation poses a significant challenge, as it restricts the reliability and scalability of quantum computations, impeding their broader application in complex problems.
The advent of variational quantum algorithms and circuits, pioneered by Mitarai et al. [34], has been a landmark development in the integration of quantum computing with machine learning. These algorithms leverage quantum entanglement to address fundamental machine learning challenges, such as function approximation and classification [34,35]. Further studies on variational quantum algorithms and circuits can be found in [36]. This innovation has facilitated the development of hybrid quantum–classical algorithms, adaptable for use on existing Noisy Intermediate-Scale Quantum (NISQ) devices. Such hybrid approaches have shown promise across various domains, including classification [37,38], generative adversarial learning [39], and deep reinforcement learning [40]. Furthermore, the exploration of quantum machine learning [41,42,43,44,45,46] has opened a new path for processing and analyzing data, harnessing the unique computational capabilities of quantum systems.
Among these studies, the work of Chen et al. [47,48] advanced the quantum machine learning domain by introducing Quantum LSTM (QLSTM) and Quantum GRU (QGRU), integrating variational quantum circuits (VQCs) with recurrent neural network architectures. Designed for Noisy Intermediate-Scale Quantum (NISQ) devices, these quantum models utilize quantum entanglement to enhance learning efficiency and stability. Their research shows that QLSTM and QGRU outperform traditional LSTM models in terms of learning efficiency and convergence stability, marking a significant step in sequential data learning within the quantum computing sphere. However, the scenarios examined in Chen's studies [47,48] only involved single ODEs, which prompted us to explore the application of QRNNs to systems of ODEs.
In this research, we explore the application of Quantum Long Short-Term Memory (QLSTM) and Quantum Gated Recurrent Unit (QGRU) models, based on the Variational Quantum Circuits (VQCs) developed by Chen et al. [47,48], to complex dynamical systems described by ODEs. Our approach involves a nuanced modification to the VQC, thereby refining its compatibility with QLSTM and QGRU models for their application in this context. Our focus is on the Van der Pol oscillator, two coupled damped harmonic oscillators, and the Lorenz equations. We conduct a comparative analysis of these quantum models against their classical counterparts, LSTM and GRU, focusing on error metrics and loss evolution across epochs. This comparison aims to evaluate their effectiveness in modeling and predicting the dynamics of nonlinear and chaotic systems.
The structure of this paper is organized as follows: Section 2 presents an overview of fundamental concepts in quantum computing, including qubits, superposition, quantum gates, and entanglement. In Section 3, we introduce the computational models under investigation, namely variational quantum circuits, classical Long Short-Term Memory (LSTM), classical Gated Recurrent Unit (GRU), Quantum Long Short-Term Memory (QLSTM), and Quantum Gated Recurrent Unit (QGRU). Section 4 introduces three numerical experiments conducted to evaluate these models: the Van der Pol oscillator, two coupled damped harmonic oscillators, and the system of Lorenz equations. Section 5 presents the discussion, and Section 6 concludes the paper.

2. Fundamental Quantum Computing Concepts

2.1. Qubits

The core distinction between classical and quantum computing lies in their fundamental units of information: the classical bit and the quantum bit (qubit), respectively. In classical computing, a bit is a binary unit that can exist in one of two states, either 0 or 1. This binary nature underpins all classical computing processes, with data storage and computations executed through combinations of these binary states.
In contrast, the qubit, the fundamental unit of quantum computing, transcends this binary limitation. A qubit differs from a classical bit in its ability to exist in a superposition of states. Rather than being limited to a strict 0 or 1, a qubit can represent both 0 and 1 simultaneously, a phenomenon that is central to quantum computing’s potential for processing complexity.
This unique property of qubits arises from the principles of quantum mechanics. Unlike classical bits, whose state is definite and observable, a qubit's state is described probabilistically. The actual state of a qubit is not determined until it is measured. Before measurement, a qubit exists in a superposition of the states $|0\rangle$ and $|1\rangle$, where the probabilities of these states are determined by quantum amplitudes. These amplitudes, which are complex numbers, dictate the likelihood of the qubit collapsing into either the $|0\rangle$ or $|1\rangle$ state upon measurement.
The concept of superposition enables qubits to perform computations on a scale and at a speed unattainable by classical bits. This capability forms the basis for quantum computing’s potential to solve problems that are currently intractable for classical computers, which will be introduced in the following section.

2.2. Superpositions

Superposition in a quantum system allows a qubit to exist in multiple states simultaneously until it is measured. This characteristic differentiates a quantum bit from a classical bit, which can only be in one state at any given time. In a single-qubit system, the state of a qubit can be described as a linear combination of the basis states $|0\rangle$ and $|1\rangle$. Mathematically, this is represented as:
$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle,$$
where $\alpha$ and $\beta$ are complex numbers representing the probability amplitudes for the qubit being in states $|0\rangle$ and $|1\rangle$, respectively. According to Born's rule, the probabilities of finding the qubit in either state upon measurement are $|\alpha|^2$ and $|\beta|^2$, with the condition that $|\alpha|^2 + |\beta|^2 = 1$.
For example, if $\alpha = \sqrt{0.3}$ and $\beta = \sqrt{0.7}$, the qubit has a 30% probability of being measured in state $|0\rangle$ and a 70% probability of being measured in state $|1\rangle$.
Extending the concept of superposition to a two-qubit system, we find a more complex scenario. In a two-qubit system, the combined state can be represented as a superposition of all four possible states of the two qubits. The state vector for such a system is given by:
$$|\Psi\rangle = \alpha|00\rangle + \beta|01\rangle + \gamma|10\rangle + \delta|11\rangle,$$
where $\alpha$, $\beta$, $\gamma$, and $\delta$ are complex probability amplitudes corresponding to each of the four states. As per the normalization condition, the sum of the squares of the absolute values of these amplitudes must equal one:
$$|\alpha|^2 + |\beta|^2 + |\gamma|^2 + |\delta|^2 = 1.$$
For instance, if the amplitudes are $\alpha = \sqrt{0.1}$, $\beta = \sqrt{0.2}$, $\gamma = \sqrt{0.3}$, and $\delta = \sqrt{0.4}$, the probabilities of measuring the system in the states $|00\rangle$, $|01\rangle$, $|10\rangle$, and $|11\rangle$ are 10%, 20%, 30%, and 40%, respectively.
In both single and two-qubit systems, superposition allows quantum computers to process and encode information in ways that are fundamentally different from classical computers, enabling them to perform complex calculations more efficiently.
In traditional computing, gates are fundamental building blocks that process binary information, operating on bits that exist in one of two states: 0 or 1. Common logic gates include AND, OR, NOT, and XOR, each performing a specific logical operation for computational tasks.
Similarly, quantum computing has quantum gates, such as the Hadamard, Pauli-X, Pauli-Y, Pauli-Z, and Controlled-NOT (CNOT) gates, which manipulate qubits. These gates enable complex operations that can entangle qubits, creating correlations between them that are essential to the power of quantum computation. Quantum gates are unitary, meaning they are reversible, a property that contrasts with some irreversible classical gates.
Here, we introduce one of the fundamental gates in quantum computing, the Hadamard gate, often used to create a superposition of states. The Hadamard gate acts on a single qubit and transforms it into a superposition of its basis states.
The Hadamard gate is represented by the following matrix:
$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$
This matrix operates on a qubit state vector, transforming it from a definite state ($|0\rangle$ or $|1\rangle$) into a superposition. When the Hadamard gate acts on the state $|0\rangle$ (represented by the vector $\binom{1}{0}$), the resulting state is:
$$H|0\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \frac{|0\rangle + |1\rangle}{\sqrt{2}}.$$
Similarly, when it acts on the state $|1\rangle$ (represented by the vector $\binom{0}{1}$), the output is:
$$H|1\rangle = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \frac{|0\rangle - |1\rangle}{\sqrt{2}}.$$
The Hadamard gate, therefore, creates an equal superposition of the $|0\rangle$ and $|1\rangle$ states.
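To see this in simulation, the following minimal PennyLane sketch (illustrative only, not part of the paper's experimental code) applies a Hadamard gate to $|0\rangle$ and returns the measurement probabilities:

```python
import pennylane as qml

dev = qml.device("default.qubit", wires=1)

@qml.qnode(dev)
def hadamard_state():
    qml.Hadamard(wires=0)      # H|0> = (|0> + |1>) / sqrt(2)
    return qml.probs(wires=0)

print(hadamard_state())        # ~[0.5, 0.5]: equal superposition
```

Both basis states are observed with probability 1/2, matching the state $(|0\rangle + |1\rangle)/\sqrt{2}$ above.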

2.3. Quantum Entanglement

Quantum entanglement is a phenomenon in which multiple qubits become interconnected in such a way that the state of one qubit cannot be described independently of the state of the other qubits, even when the qubits are separated by large distances.
To explain the significance of entanglement in this study, consider an analogy involving the conventional method of information exchange between two individuals. The most traditional method involves one individual physically moving within audible distance to relay news to another person. This method, while effective, is time-consuming and limits the speed of information transfer. By contrast, incorporating quantum entanglement into our framework to enable instantaneous information exchange between two entangled parties significantly enhances the efficiency and speed of learning. This quantum advantage mirrors the leap from having to physically move toward someone to share information to an immediate and direct exchange of knowledge.
Entanglement can be achieved using certain quantum gates. For instance, the combination of a Hadamard gate followed by a Controlled-NOT (CNOT) gate is commonly used to entangle two qubits.
First, the Hadamard gate is applied to one of the qubits to create a superposition, as previously discussed. Then, the CNOT gate, which flips the state of the second qubit if the first qubit is in the $|1\rangle$ state, is applied. The operation of the CNOT gate can be described by the following matrix:
$$\mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$
This operation creates entanglement between the two qubits.
Consider two qubits, initially in the state $|00\rangle$. Applying a Hadamard gate to the first qubit creates the superposition $\frac{|0\rangle + |1\rangle}{\sqrt{2}}$ for that qubit, so the combined state of the two qubits is $\frac{|00\rangle + |10\rangle}{\sqrt{2}}$. Applying a CNOT gate afterward results in the entangled state $\frac{|00\rangle + |11\rangle}{\sqrt{2}}$, where the state of each qubit cannot be described independently of the other.
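A minimal sketch of this Hadamard-plus-CNOT construction in PennyLane, again illustrative rather than drawn from the study's code:

```python
import pennylane as qml

dev = qml.device("default.qubit", wires=2)

@qml.qnode(dev)
def bell_state():
    qml.Hadamard(wires=0)          # |00> -> (|00> + |10>) / sqrt(2)
    qml.CNOT(wires=[0, 1])         # -> (|00> + |11>) / sqrt(2)
    return qml.probs(wires=[0, 1])

print(bell_state())                # ~[0.5, 0, 0, 0.5]
```

Measurement yields $|00\rangle$ or $|11\rangle$ with probability 1/2 each and never $|01\rangle$ or $|10\rangle$, the signature of the entangled state above.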

3. Methodology

3.1. Variational Quantum Circuits (VQCs)

Variational Quantum Circuits (VQCs) are the unique part of the architecture of quantum recurrent neural networks. The specific VQC components used in this context are structured in three main parts: an encoding layer, a variational layer, and a quantum measurement layer. Figure 1 shows an example of a VQC.
The left-hand part of the figure is the encoding layer, which contains a Hadamard gate and two angle-embedding layers acting as $R_y$ and $R_z$ gates. The middle part is the variational (entanglement) layer. It is worth mentioning that the number of qubits (4 in this example) and the number of variational layers (2 in this example) can be increased to improve the capability to learn the data.

3.1.1. Encoding Layer

The encoding layer’s primary function is to map classical data values into quantum amplitudes. It starts with initializing the circuit in the ground state and applying Hadamard gates to create an unbiased initial state. The state of an N-qubit quantum system can be represented as:
$$|\psi\rangle = \sum_{(q_1, q_2, \dots, q_N) \in \{0,1\}^N} c_{q_1, q_2, \dots, q_N} \, |q_1\rangle \otimes |q_2\rangle \otimes \cdots \otimes |q_N\rangle,$$
where $c_{q_1, \dots, q_N} \in \mathbb{C}$ is the complex amplitude for each basis state and $\otimes$ stands for the tensor product. Again, by Born's rule, we know that:
$$\sum_{(q_1, q_2, \dots, q_N) \in \{0,1\}^N} |c_{q_1, \dots, q_N}|^2 = 1.$$
The encoding layer first transforms the input data into rotation angles for each single-qubit rotation. We apply Hadamard gates here to transform the initial state into a superposition, also called the unbiased state. After applying the Hadamard gate $H$ $N$ times (once to each qubit), we obtain:
$$(H|0\rangle)^{\otimes N} = \left( \frac{1}{\sqrt{2}} \left( |0\rangle + |1\rangle \right) \right)^{\otimes N}.$$
The resulting state after applying the Hadamard gate $N$ times is a uniform superposition of all possible states of the $N$ qubits, which can be expressed as:
$$\frac{1}{\sqrt{2^N}} \left( |0\rangle^{\otimes N} + \cdots + |1\rangle^{\otimes N} \right).$$
Hence,
$$\frac{1}{\sqrt{2^N}} \sum_{i=0}^{2^N - 1} |i\rangle.$$
Here, $i$ is used as an index to sum over all possible states, where $i$ represents the decimal equivalent of the binary string formed by the qubits. For example, for $N = 2$, the sum runs over $|00\rangle$, $|01\rangle$, $|10\rangle$, and $|11\rangle$, which correspond to decimal 0, 1, 2, and 3, respectively.
The encoding process uses two-angle encoding, where each data value is encoded into a qubit with a series of two gates, $R_y$ and $R_z$. In this study, the template qml.templates.AngleEmbedding is used to play the role of the $R_y$ and $R_z$ gates.
The qml.templates.AngleEmbedding template maps classical information onto quantum states by applying rotation gates to each qubit in the system. The template can be customized in several respects, including the axis of rotation ($R_x$, $R_y$, or $R_z$). This adaptability makes it well suited to a broad spectrum of applications within quantum neural networks and other machine learning models.
After transforming the initial states into unbiased states, we use $2N$ rotation angles computed from an $N$-dimensional input vector $\mathbf{v} = (x_1, x_2, \dots, x_N)$. For each component $x_i$ of $\mathbf{v}$, Chen's original paper calculates two angles: $\theta_{i,1} = \arctan(x_i)$ for the y-axis rotation and $\theta_{i,2} = \arctan(x_i^2)$ for the z-axis rotation, where the rotations are effected through the $R_y(\theta_{i,1})$ and $R_z(\theta_{i,2})$ gates, respectively.
Although the input data are normalized before the encoding layer, this work introduces a distinct methodology. Rather than employing arctan functions, we select the sine and cosine functions for the $R_y(\theta_{i,1})$ and $R_z(\theta_{i,2})$ gates, setting $\theta_{i,1} = \sin(x_i)$ for the $R_y$ gate and $\theta_{i,2} = \cos(x_i^2)$ for the $R_z$ gate, diverging from the conventional arctan encoding. This method represents a preliminary attempt and can be improved in future work, which is not covered within this paper. More studies related to this aspect can be found in Mitarai's paper [34].
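As a rough sketch of this modified encoding (the helper name two_angle_encoding is ours, not from the original code; it assumes NumPy inputs and is meant to be called inside a QNode):

```python
import numpy as np
import pennylane as qml

def two_angle_encoding(x, wires):
    """sin/cos two-angle encoding described above, one data value per qubit."""
    qml.templates.AngleEmbedding(np.sin(x), wires=wires, rotation="Y")       # theta_{i,1} = sin(x_i)
    qml.templates.AngleEmbedding(np.cos(x ** 2), wires=wires, rotation="Z")  # theta_{i,2} = cos(x_i^2)
```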

3.1.2. Variational Circuit

The variational circuit is the trainable part of the VQC, consisting of parameterized unitary transformations. This section of the circuit includes multiple CNOT gates for qubit entanglement and unitary rotation gates controlled by learnable parameters $\alpha$, $\beta$, and $\gamma$. It is worth mentioning that the variational circuit can be repeated more than once in practice to increase the number of parameters and the model's expressive capacity.

3.1.3. Quantum Measurement

Quantum measurement is used for extracting classical information from the quantum circuit. It involves measuring the qubits, which, due to the probabilistic nature of quantum systems, yield varying bit strings upon each measurement. The expectation value of an operator $\hat{O}$ for a state $|\psi\rangle$ is given by:
$$E[\hat{O}] = \langle \psi | \hat{O} | \psi \rangle.$$
The expectation values can be either calculated analytically in a quantum simulation or obtained through multiple samplings in practical quantum devices with specific noise models.
The VQC architecture described here is pivotal in the QLSTM and QGRU, serving as a quantum-enhanced approach to processing and learning from data.
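To make the three-part structure concrete, here is a minimal PennyLane sketch of a VQC in the spirit of Figure 1. The circuit layout (a ring of CNOTs followed by general rotations via qml.Rot, whose three angles play the role of $\alpha$, $\beta$, and $\gamma$) is our illustrative assumption, not a reproduction of the authors' exact circuit:

```python
import numpy as np
import pennylane as qml

n_qubits, n_layers = 4, 2                        # the Figure 1 example sizes
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def vqc(inputs, weights):
    # Encoding layer: Hadamards, then the sin/cos two-angle embedding of Section 3.1.1
    for i in range(n_qubits):
        qml.Hadamard(wires=i)
    qml.templates.AngleEmbedding(np.sin(inputs), wires=range(n_qubits), rotation="Y")
    qml.templates.AngleEmbedding(np.cos(inputs ** 2), wires=range(n_qubits), rotation="Z")
    # Variational layers: ring of CNOTs for entanglement, then trainable rotations
    for layer in range(n_layers):
        for i in range(n_qubits):
            qml.CNOT(wires=[i, (i + 1) % n_qubits])
        for i in range(n_qubits):
            qml.Rot(*weights[layer, i], wires=i)  # three learnable angles per qubit
    # Measurement layer: Pauli-Z expectation values, each in [-1, 1]
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

out = vqc(np.random.uniform(size=n_qubits),
          np.random.uniform(size=(n_layers, n_qubits, 3)))
```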

3.2. Long Short-Term Memory (LSTM)

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network (RNN) particularly adept at learning from sequences of data. LSTMs are designed to overcome the limitations of traditional RNNs, especially issues related to long-term dependencies in data sequences. Figure 2 shows the structure of a single LSTM unit.
An LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. These components work together to regulate the flow of information into and out of the cell, and to decide which information to store and which to discard.
The key equations governing the operations within an LSTM unit are as follows:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f),$$
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i),$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C),$$
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t,$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o),$$
$$h_t = o_t * \tanh(C_t).$$
Here, $f_t$, $i_t$, and $o_t$ represent the activations of the forget, input, and output gates, respectively, at time $t$; $\sigma$ denotes the sigmoid function, and $*$ represents element-wise multiplication. $C_t$ is the cell state at time $t$, and $h_t$ is the hidden state. $W$ and $b$ are the weights and biases associated with the respective gates and cell state updates.
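A direct PyTorch transcription of these equations makes the gating explicit; the weight containers W and b below are illustrative stand-ins for properly initialized parameters:

```python
import torch

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM step; W and b are illustrative dicts holding the four gate
    weight matrices and bias vectors of the equations above."""
    v = torch.cat([h_prev, x_t], dim=-1)         # [h_{t-1}, x_t]
    f_t = torch.sigmoid(v @ W["f"].T + b["f"])   # forget gate
    i_t = torch.sigmoid(v @ W["i"].T + b["i"])   # input gate
    c_tilde = torch.tanh(v @ W["C"].T + b["C"])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde           # new cell state
    o_t = torch.sigmoid(v @ W["o"].T + b["o"])   # output gate
    h_t = o_t * torch.tanh(c_t)                  # new hidden state
    return h_t, c_t
```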

3.3. Gated Recurrent Units (GRU)

Gated Recurrent Unit (GRU) is a variation of recurrent neural networks that aims to solve the vanishing gradient problem, similar to LSTM. GRUs simplify the architecture seen in LSTM by combining certain gates and states, which often results in more efficient training for certain types of problems. The structure of a single GRU is shown in Figure 3.
The GRU architecture is built around two gates: the update gate and the reset gate. These gates determine how much of the past information needs to be passed along to the future. The key equations defining a GRU are:
$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z),$$
$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r),$$
$$\tilde{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t] + b),$$
$$h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t.$$
In Equations (20)–(23), $z_t$ and $r_t$ represent the activations of the update and reset gates at time $t$, respectively. $h_t$ is the hidden state at time $t$, and $\tilde{h}_t$ is the candidate hidden state. $W$ and $b$ are the weights and biases associated with the respective gates and hidden state updates. The symbol $*$ denotes element-wise multiplication, and $\sigma$ represents the sigmoid activation function.
GRUs provide an efficient alternative to LSTM and are particularly useful in modeling sequences where LSTM’s complex structure may not be necessary.
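For comparison with the LSTM sketch above, here is a compact transcription of Equations (20)–(23), with the same kind of illustrative weight containers:

```python
import torch

def gru_cell(x_t, h_prev, W, b):
    """One GRU step following Equations (20)-(23)."""
    v = torch.cat([h_prev, x_t], dim=-1)
    z_t = torch.sigmoid(v @ W["z"].T + b["z"])     # update gate
    r_t = torch.sigmoid(v @ W["r"].T + b["r"])     # reset gate
    v_r = torch.cat([r_t * h_prev, x_t], dim=-1)   # [r_t * h_{t-1}, x_t]
    h_tilde = torch.tanh(v_r @ W["h"].T + b["h"])  # candidate hidden state
    return (1 - z_t) * h_prev + z_t * h_tilde      # new hidden state
```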
We also need to consider the hardware realization of the two activation functions used here, tanh and sigmoid. On classical hardware, the tanh and sigmoid functions are computed directly, introducing the nonlinearity in neural networks that is crucial for learning complex patterns:
$$\tanh(x) = \frac{e^{2x} - 1}{e^{2x} + 1},$$
$$\sigma(x) = \frac{1}{1 + e^{-x}}.$$
Classical computers execute these operations efficiently. Quantum hardware, however, operates linearly, making direct computation of the tanh and sigmoid functions non-trivial. The encoding and variational layers in the VQCs allow the quantum-based models to capture nonlinear trends across data inputs. Additionally, the measurement layer outputs values within the range $[-1, 1]$, which can then be processed classically to implement these nonlinear functions.

3.4. Quantum Long Short-Term Memory (QLSTM)

Quantum Long Short-Term Memory (QLSTM) is a quantum-enhanced version of the traditional LSTM networks. QLSTM integrates VQCs into the LSTM architecture, aiming to leverage the computational advantages of quantum mechanics. The structure of a single QLSTM unit is shown in Figure 4.
In a QLSTM, two key memory components are present: the hidden state $h_t$ and the cell (internal) state $c_t$. The functioning of a QLSTM cell can be mathematically described by the following equations:
$$f_t = \sigma(\mathrm{VQC}_1(v_t)),$$
$$i_t = \sigma(\mathrm{VQC}_2(v_t)),$$
$$\tilde{C}_t = \tanh(\mathrm{VQC}_3(v_t)),$$
$$c_t = f_t * c_{t-1} + i_t * \tilde{C}_t,$$
$$o_t = \sigma(\mathrm{VQC}_4(v_t)),$$
$$h_t = \mathrm{VQC}_5(o_t * \tanh(c_t)),$$
$$\tilde{y}_t = \mathrm{VQC}_6(o_t * \tanh(c_t)),$$
$$y_t = \mathrm{NN}(\tilde{y}_t).$$
In Equations (26)–(33), $\sigma$ represents the sigmoid function, and $*$ denotes element-wise multiplication. The input to the QLSTM cell at each time step is the concatenation $v_t$ of the previous hidden state $h_{t-1}$ and the current input vector $x_t$. The VQCs in the equations refer to Variational Quantum Circuits.
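A sketch of one QLSTM step following Equations (26)–(33). The six VQC-backed layers (for example, qml.qnn.TorchLayer objects wrapping circuits like the VQC sketch in Section 3.1) and the classical output layer are illustrative placeholders; the authors' full implementation is linked in Section 5:

```python
import torch

def qlstm_cell(x_t, h_prev, c_prev, vqc, out_net):
    """One QLSTM step; `vqc` is a dict of six VQC-backed layers and
    `out_net` a classical output layer (illustrative placeholders)."""
    v_t = torch.cat([h_prev, x_t], dim=-1)        # concatenate h_{t-1} and x_t
    f_t = torch.sigmoid(vqc[1](v_t))              # forget gate
    i_t = torch.sigmoid(vqc[2](v_t))              # input gate
    c_tilde = torch.tanh(vqc[3](v_t))             # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde            # new cell state
    o_t = torch.sigmoid(vqc[4](v_t))              # output gate
    h_t = vqc[5](o_t * torch.tanh(c_t))           # new hidden state
    y_t = out_net(vqc[6](o_t * torch.tanh(c_t)))  # output after classical layer NN
    return h_t, c_t, y_t
```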

3.5. Quantum Gated Recurrent Unit (QGRU)

Quantum Gated Recurrent Unit (QGRU) represents an evolution of traditional GRU networks, integrating with VQCs. The structure of a single QGRU unit is shown in Figure 5.
A QGRU cell operates based on the following equations:
$$r_t = \sigma(\mathrm{VQC}_1(v_t)),$$
$$z_t = \sigma(\mathrm{VQC}_2(v_t)),$$
$$o_t = \mathrm{cat}(x_t, r_t * H_{t-1}),$$
$$\tilde{H}_t = \tanh(\mathrm{VQC}_3(o_t)),$$
$$H_t = z_t * H_{t-1} + (1 - z_t) * \tilde{H}_t,$$
$$y_t = \mathrm{NN}(H_t).$$
In Equations (34)–(39), $r_t$ and $z_t$ represent the reset and update gates of the QGRU at time $t$, respectively. $H_t$ denotes the hidden state, and $\tilde{H}_t$ is the candidate hidden state. The input to the QGRU cell, $v_t$, is the concatenation of the previous hidden state $H_{t-1}$ and the current input vector $x_t$. The VQCs are used to process the quantum aspects of the data.
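Analogously, a sketch of one QGRU step following Equations (34)–(39), with the same caveats as the QLSTM sketch:

```python
import torch

def qgru_cell(x_t, h_prev, vqc, out_net):
    """One QGRU step; `vqc` holds three VQC-backed layers and `out_net`
    a classical output layer (illustrative placeholders)."""
    v_t = torch.cat([h_prev, x_t], dim=-1)        # concatenate H_{t-1} and x_t
    r_t = torch.sigmoid(vqc[1](v_t))              # reset gate
    z_t = torch.sigmoid(vqc[2](v_t))              # update gate
    o_t = torch.cat([x_t, r_t * h_prev], dim=-1)  # Equation (36)
    h_tilde = torch.tanh(vqc[3](o_t))             # candidate hidden state
    h_t = z_t * h_prev + (1 - z_t) * h_tilde      # Equation (38), as written in the paper
    return h_t, out_net(h_t)                      # new hidden state and output
```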

4. Numerical Experiments

4.1. Hyperparameter Configuration

For the numerical experiments conducted in this study, specific hyperparameters were chosen to optimize the performance of the quantum models. Table 1 outlines the hyperparameter configuration used in the Van der Pol oscillator simulation and the simulation of two coupled damped harmonic oscillators:
More specifically, we use a single-layer model with hidden size 4, and the models are evaluated over 100 epochs. We use the default.qubit backend provided by PennyLane [29].

4.2. Van der Pol Oscillator Simulation

In the first experiment, we consider the Van der Pol oscillator, a classical example of a non-conservative oscillator with nonlinear damping. The oscillator is modeled by the following second-order differential equation:
$$\frac{d^2 x}{dt^2} - \mu (1 - x^2) \frac{dx}{dt} + x = 0.$$
The parameter μ , representing the nonlinearity and strength of the damping, is set to 1.0 in our simulations. This choice of μ provides a balance between linear and nonlinear dynamical behaviors, also making the system ideal for predicting a wide range of oscillatory patterns.
The initial conditions for the oscillator are chosen as $x_0 = 2.0$ and $y_0 = 0.0$, where $x_0$ and $y_0$ represent the initial position and velocity, respectively.
To numerically solve the Van der Pol differential equation, we convert it into a system of first-order equations:
$$\frac{dx}{dt} = y, \qquad \frac{dy}{dt} = \mu (1 - x^2) y - x.$$
The numerical solution is obtained over a time span of 0 to 50 s, discretized into 250 time steps.
Following the numerical solution of the differential equations, the resultant time series data of x and y are normalized and prepared for analysis. The data are reshaped to fit the input requirements of RNN models, enabling us to predict the Van der Pol oscillator’s dynamics.
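A minimal sketch of this data-generation step using SciPy's solve_ivp; the min-max normalization shown is one plausible choice, as the text does not specify the exact scheme:

```python
import numpy as np
from scipy.integrate import solve_ivp

mu = 1.0                                             # damping/nonlinearity parameter

def van_der_pol(t, state):
    x, y = state
    return [y, mu * (1 - x**2) * y - x]

t_eval = np.linspace(0, 50, 250)                     # 0-50 s, 250 time steps
sol = solve_ivp(van_der_pol, (0, 50), [2.0, 0.0], t_eval=t_eval)
data = sol.y.T                                       # shape (250, 2): columns x and y
data = (data - data.min(0)) / (data.max(0) - data.min(0))  # min-max normalization (assumed)
```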

Simulation Results for the Van der Pol Oscillator

The behavior of the Van der Pol oscillator was examined through the LSTM, QLSTM, GRU, and QGRU models by evaluating their performance on the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) metrics.
Mean Absolute Error (MAE) is a measure of errors between paired observations. It is calculated as the average of the absolute differences between the predicted values and the actual values, disregarding the direction of the error. The MAE is given by the formula:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|.$$
Root Mean Square Error (RMSE) is a quadratic scoring rule that also measures the average magnitude of the error. It is the square root of the average of squared differences between prediction and actual observation. The RMSE is given by the formula:
$$\mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 },$$
where:
  • $n$ is the number of observations;
  • $y_i$ is the actual value of the $i$th observation;
  • $\hat{y}_i$ is the predicted value of the $i$th observation.
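Both metrics are straightforward to compute; for instance, a minimal NumPy version:

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean Absolute Error: average magnitude of the errors
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    # Root Mean Square Error: penalizes large errors quadratically
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```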
Data from the numerical solution of the Van der Pol differential equations formed the basis for training and testing these models. A comparative analysis of the models’ performance is summarized in Table 2 and Table 3.
Table 2 shows that the quantum-based models exhibit superior predictive performance on the x variable compared with the classical models. Among the models compared, the GRU models also outperform the LSTM models in accuracy, and the QGRU model gives the most accurate predictions.
Table 3 compares model performance on the y variable. Interestingly, all models produced larger errors when predicting the y values than the x values. Again, the GRU models surpass the LSTM models, with the QGRU model achieving the highest accuracy among the evaluated models.
The example predictive results over epochs by the different models are shown in Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10.
The Van der Pol oscillator exhibits periodic behavior, and all the models captured the patterns well. The comparison at epoch 5 shows that the QRNN models, in particular, learn significantly faster than the classical RNN models. Again, the QGRU model shows great capability in learning the dynamics.
Comparing the training and test losses over the epochs, the QRNN models show a more stable decrease than the classical RNN models, and the QGRU converges faster than the QLSTM.

4.3. Two Coupled Damped Harmonic Oscillators Simulation

In this experiment, we study the capability of the QRNN models to learn the patterns of a system of two coupled harmonic oscillators. It is worth pointing out that this experiment is an extension of the prediction of a single damped harmonic oscillator by Chen [47,48]. It extends our understanding from a single oscillator to physical systems with interacting oscillatory motions. Such systems are paramount in numerous fields, including physics and engineering. The governing differential equations for two coupled damped harmonic oscillators, representing an extended model of the single oscillator, are as follows:
$$\frac{d^2 \theta_1}{dt^2} = -\frac{b_1}{m_1} \frac{d\theta_1}{dt} - \frac{g}{l_1} \sin(\theta_1) + \frac{k_c}{m_1} (\theta_2 - \theta_1),$$
$$\frac{d^2 \theta_2}{dt^2} = -\frac{b_2}{m_2} \frac{d\theta_2}{dt} - \frac{g}{l_2} \sin(\theta_2) + \frac{k_c}{m_2} (\theta_1 - \theta_2).$$
Here, $\theta_1$ and $\theta_2$ represent the angular displacements of the first and second pendulums, respectively. The constants $g = 9.81$ m/s² (gravitational acceleration), $b_1 = b_2 = 0.15$ (damping factors), $l_1 = l_2 = 1.0$ m (lengths of the pendulums), $m_1 = m_2 = 1.0$ kg (masses of the pendulums), and $k_c = 0.05$ (coupling constant) define the system's characteristics. The initial conditions are set with angular displacements $\theta_1 = \theta_2 = 0$ and angular velocities $\dot{\theta}_1 = 3.0$ and $\dot{\theta}_2 = 0.0$ rad/s.
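A sketch of the corresponding data generation, rewriting the two second-order equations as four first-order ones; the time span and 250 time steps are assumed to match the first experiment (Table 1 lists 250 data points):

```python
import numpy as np
from scipy.integrate import solve_ivp

g, kc = 9.81, 0.05                      # gravity, coupling constant
b1 = b2 = 0.15                          # damping factors
l1 = l2 = 1.0                           # pendulum lengths (m)
m1 = m2 = 1.0                           # masses (kg)

def coupled_pendulums(t, s):
    th1, w1, th2, w2 = s                # angles and angular velocities
    a1 = -b1 / m1 * w1 - g / l1 * np.sin(th1) + kc / m1 * (th2 - th1)
    a2 = -b2 / m2 * w2 - g / l2 * np.sin(th2) + kc / m2 * (th1 - th2)
    return [w1, a1, w2, a2]

# theta_1 = theta_2 = 0; theta_1' = 3.0, theta_2' = 0.0 rad/s
sol = solve_ivp(coupled_pendulums, (0, 50), [0.0, 3.0, 0.0, 0.0],
                t_eval=np.linspace(0, 50, 250))
```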

Simulation Results for the Two Coupled Damped Harmonic Oscillators

As in the Van der Pol example, we present a comparative analysis of the models' performance in predicting the dynamic behavior of both oscillators (again, using MAE and RMSE), summarized in Table 4 and Table 5.
As in the first experiment, Table 4 and Table 5 show that all the models successfully learn the dynamics, but the QGRU delivers excellent predictive results, especially for Oscillator 2. We note, first, that the predictions for Oscillator 2 are more accurate than those for Oscillator 1, and second, that the GRU models generally outperform the LSTM models.
The example predictive results over the 100 epochs by the different models are shown in Figure 11, Figure 12, Figure 13, Figure 14 and Figure 15.
Similar to the Van der Pol case, the GRU models converge faster than the LSTM models, and the quantum-based models learn the pattern faster than the classical models (as seen from the results at epoch 5). The quantum-based RNNs also show a more stable decrease in loss. Here, we observe several points of interest. Firstly, there is a minor increase in the test loss of the LSTM models from epoch 80 to 100, potentially indicative of mild overfitting. Although adjusting the learning rate could mitigate this, we maintained a constant learning rate to allow a straightforward comparison between the models. Secondly, spikes are observed in the LSTM model's test loss. Furthermore, at epoch 5, the LSTM model undershoots, which is not seen in the results of the other models.
From the previous two numerical experiments, the QRNN models demonstrate exceptional capability in learning patterns produced by system dynamics. In the last experiment, we apply the models to a chaotic system using a different approach.

4.4. System of Lorenz Equations

The Lorenz equations, fundamental in chaos theory, model the dynamics of atmospheric convection and are characterized by their chaotic nature for certain parameter values. The system is described by the following set of differential equations:
$$\frac{dx}{dt} = \sigma (y - x), \qquad \frac{dy}{dt} = x (\rho - z) - y, \qquad \frac{dz}{dt} = xy - \beta z.$$
In Equation (45), $x$, $y$, and $z$ represent the system states, and the parameters $\sigma$, $\rho$, and $\beta$ are crucial for the system's behavior. The chosen values $\sigma = 10.0$, $\beta = 8/3$, and $\rho = 28.0$ are known to induce chaotic dynamics in the Lorenz system. These parameters represent the Prandtl number, the normalized Rayleigh number, and certain physical dimensions of the convective cells, respectively. Their specific values make the system a classic example of chaos and a compelling subject for studying the predictive capabilities of quantum-enhanced and classical neural networks. The utilization of classical neural networks for this task is discussed in [49].

Data Preparation and Hyperparameter Configuration

Unlike the previous two experiments, we generated the data differently when applying the RNN models to the Lorenz equations. We first randomly generated 10 datasets, using 67% of them for training and the remaining 33% for testing. The shapes of the training and test datasets are as follows (a windowing sketch is given after the list):
  • Training dataset shapes:
    Features: torch.Size([3120, 10, 3]);
    Labels: torch.Size([3120, 3]).
  • Testing dataset shapes:
    Features: torch.Size([780, 10, 3]);
    Labels: torch.Size([780, 3]).
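A sketch of the sliding-window construction that yields feature/label pairs of this shape; the trajectory lengths and the exact train/test split are not detailed in the text, so this shows only the general pattern:

```python
import numpy as np
import torch

def make_windows(trajectories, seq_len=10):
    """Slice each (T, 3) Lorenz trajectory into overlapping length-`seq_len`
    windows; the point immediately after each window is its label."""
    xs, ys = [], []
    for traj in trajectories:                # list of (T, 3) arrays
        for i in range(len(traj) - seq_len):
            xs.append(traj[i:i + seq_len])   # features: (seq_len, 3)
            ys.append(traj[i + seq_len])     # label: (3,)
    return (torch.tensor(np.array(xs), dtype=torch.float32),
            torch.tensor(np.array(ys), dtype=torch.float32))
```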
Regarding the hyperparameters, the previous two experiments demonstrate that the QRNN models rapidly capture the trend of the dataset. Consequently, we aim to further explore these models' proficiency in learning features with shorter sequence lengths, fewer data points, and fewer epochs.
We pick a sequence length of 10 for all the models, and all the models run for 20 epochs instead of the 100 epochs used in the previous two experiments.
Hyperparameter configurations for LSTM, QLSTM, GRU, and QGRU models are presented in Table 6:

4.5. Results Analysis

The performances of each model on the Lorenz system data are summarized in Table 7 and Table 8. These tables detail the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) for training and testing datasets across three dimensions.
According to the error metrics, the quantum-based RNNs exhibit a marked increase in accuracy, particularly the QGRU in comparison with the classical GRU. Moreover, the LSTM model performs notably worse than the other models, potentially because of its slower rate of convergence across epochs (again, only 20 epochs were used in this case).
To further present the models’ predictions, we visualize their predictive outcomes across three dimensions, the trajectories and the losses over the epochs, in Figure 16, Figure 17, Figure 18, Figure 19, Figure 20, Figure 21, Figure 22, Figure 23 and Figure 24.
Firstly, regarding the three-dimensional predictions, it is evident that all models correctly capture the dataset's trend. Nonetheless, a closer look at the figure scales reveals difficulties in predicting the spike at the figure's center for all the models (for LSTM in the y and z dimensions, QLSTM in z, GRU in y and z, and QGRU in z). We believe this challenge could be addressed by increasing the number of epochs, though the performance is satisfactory for the current study.
Furthermore, the QGRU outperforms other models in trajectory plotting, showing a consistent decline in both training and test losses across epochs. In contrast, the classical RNNs display slight fluctuations in their loss plots. Although a spike is observed in the QLSTM’s loss plot, quantum-based RNNs, in general, tend to show a smoother and more rapid reduction in losses than the classical models.

5. Discussion

This study explores the efficacy of quantum-enhanced models, specifically the quantum Long Short-Term Memory (QLSTM) and the Quantum Gated Recurrent Unit (QGRU), in predicting the dynamics of systems governed by ordinary differential equations. These systems include Van der Pol oscillators, two damped coupled oscillators, and the Lorenz system. We benchmarked the performance of these quantum models against traditional Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) models by evaluating their accuracy through MAE and RMSE, alongside observing their training and testing loss trajectories over epochs.
The results from the three numerical experiments showcase that quantum-based models, particularly the QGRU, significantly outperform their classical counterparts in terms of predictive accuracy. This discovery highlights the capability of quantum computing techniques to improve the predictive performance of recurrent neural networks, particularly in the complex dynamics of systems represented by ordinary differential equations. However, it is important to recognize that these quantum-based models require greater computational resources, primarily because the operations of the Variational Quantum Circuit (VQC) are simulated on classical computing hardware. The same challenge is observed across numerous studies in the QRNN field, as highlighted in [50,51]. To address these computational challenges and enhance efficiency, we implemented batch sizes of 64 in this study, which significantly accelerated the processing time. Additionally, a critical aim of our ongoing research is to test the performance of these models on actual quantum hardware. This will not only confirm their theoretical benefits but also evaluate their practical utility and efficiency within a real quantum computing context.
The example code for class QLSTM and class QGRU can be found at the following GitHub link: https://github.com/YuanChenmt/QRNN-Examples (accessed on 16 April 2024).

6. Conclusions

We have validated the superior performance of quantum-based recurrent neural networks (QRNNs), particularly the Quantum Gated Recurrent Unit (QGRU), over traditional RNN models in predicting the outcomes of systems described by ordinary differential equations in terms of model accuracy, efficiency, and stability. This study aims to contribute to the understanding of QRNNs’ predictive capabilities, suggesting potential pathways for their application in complex, real-world scenarios. It modestly seeks to add to the ongoing discourse in computational science, hoping to inspire further research and exploration in the field.

Author Contributions

Conceptualization, Y.C.; methodology, Y.C. and A.K.; software, Y.C.; validation, A.K.; formal analysis, Y.C. and A.K.; investigation, A.K.; resources, A.K.; data curation, Y.C.; writing—original draft preparation, Y.C.; writing—review and editing, A.K.; visualization, Y.C.; supervision, A.K.; project administration, A.K.; funding acquisition, A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

This study used manually generated data; no new data were created.

Acknowledgments

We would like to express our sincere gratitude to the reviewers for their insightful comments and constructive feedback, which significantly improved the quality and clarity of this manuscript. Their expertise and valuable insights have been invaluable in shaping the direction of our research and enhancing the overall impact of our work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zakwan, M.; Di Natale, L.; Svetozarevic, B.; Heer, P.; Jones, C.N.; Ferrari Trecate, G. Physically Consistent Neural ODEs for Learning Multi-Physics Systems. arXiv 2022, arXiv:2211.06130. [Google Scholar] [CrossRef]
  2. Städter, P.; Schälte, Y.; Schmiester, L.; Hasenauer, J.; Stapor, P.L. Benchmarking of numerical integration methods for ODE models of biological systems. Sci. Rep. 2021, 11, 2696. [Google Scholar] [CrossRef] [PubMed]
  3. Yazdani, A.; Lu, L.; Raissi, M.; Karniadakis, G.E. Systems biology informed deep learning for inferring parameters and hidden dynamics. PLoS Comput. Biol. 2020, 16, e1007575. [Google Scholar] [CrossRef] [PubMed]
  4. Jorge, M. An Application of Ordinary Differential Equations in Economics: Modeling Consumer’s Preferences Using Marginal Rates of Substitution. Math. Methods Sci. Mech. 2014, 33. [Google Scholar] [CrossRef]
  5. Bashforth, F.; Adams, J.C. An Attempt to Test the Theories of Capillary Action by Comparing the Theoretical and Measured Forms of Drops of Fluid; Cambridge University Press: Cambridge, UK, 1883; paperback reprint 2007. [Google Scholar]
  6. Dahlquist, G. A special stability problem for linear multistep methods. BIT 1963, 3, 27–43. [Google Scholar] [CrossRef]
  7. Runge, C. Über die numerische Auflösung von Differentialgleichungen. Math. Ann. 1895, 46, 167–178. [Google Scholar] [CrossRef]
  8. Wilhelm, K. Beitrag zur näherungsweisen Integration totaler Differentialgleichungen. Zeitschrift für Mathematik und Physik 1901, 46, 435–453. [Google Scholar]
  9. Sottas, G. Rational Runge-Kutta Methods Are Not Suitable for Stiff Systems of ODE’s; Report SFB 123, Number 215; University of Heidelberg: Heidelberg, Germany, 1983. [Google Scholar]
  10. Ahmed, D.M.; Hassan, M.M.; Mstafa, R.J. A Review on Deep Sequential Models for Forecasting Time Series Data. Appl. Comput. Intell. Soft Comput. 2022, 2022, 6596397. [Google Scholar] [CrossRef]
  11. Lindemann, B.; Müller, T.; Vietz, H.; Jazdi, N.; Weyrich, M. A survey on long short-term memory networks for time series prediction. Procedia CIRP 2021, 99. [Google Scholar] [CrossRef]
  12. Weerakody, P.B.; Wong, K.W.; Wang, G.; Ela, W. A review of irregular time series data handling with gated recurrent neural networks. Neurocomputing 2021, 441, 161–178. [Google Scholar] [CrossRef]
  13. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. In Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, Montreal, QC, Canada, 8–13 December 2014; pp. 3104–3112. [Google Scholar]
  14. Graves, A.; Jaitly, N.; Mohamed, A.-R. Hybrid speech recognition with deep bidirectional LSTM. In Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, Czech Republic, 8–12 December 2013; pp. 273–278. [Google Scholar]
  15. Flurin, E.; Martin, L.S.; Hacohen-Gourgy, S.; Siddiqi, I. Using a recurrent neural network to reconstruct quantum dynamics of a superconducting qubit from physical observations. Phys. Rev. X 2020, 10, 011006. [Google Scholar] [CrossRef]
  16. August, M.; Ni, X. Using recurrent neural networks to optimize dynamical decoupling for quantum memory. Phys. Rev. A 2017, 95, 012335. [Google Scholar] [CrossRef]
  17. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  18. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar]
  19. Gajamannage, K.; Jayathilake, D.I.; Park, Y.; Bollt, E.M. Recurrent neural networks for dynamical systems: Applications to ordinary differential equations, collective motion, and hydrological modeling. Chaos 2023, 33, 013109. [Google Scholar] [CrossRef] [PubMed]
  20. Bai, S.; Kolter, J.Z.; Koltun, V. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar]
  21. Fu, Y.; Saab, S., Jr.; Ray, A.; Hauser, M. A Dynamically Controlled Recurrent Neural Network for Modeling Dynamical Systems. arXiv 2019, arXiv:1911.00089. [Google Scholar]
  22. Niu, M.Y.; Horesh, L.; Chuang, I. Recurrent Neural Networks in the Eye of Differential Equations. arXiv 2019, arXiv:1904.12933. [Google Scholar]
  23. de la Fraga, L.G.; Ovilla-Martínez, B.; Tlelo-Cuautle, E. Echo state network implementation for chaotic time series prediction. Microprocess. Microsyst. 2023, 103, 104950. [Google Scholar] [CrossRef]
  24. Prokhorov, D. Echo state networks: Appeal and challenges. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Montreal, QC, Canada, 31 July–4 August 2005; Volume 3, pp. 1463–1466. [Google Scholar] [CrossRef]
  25. Vlachas, P.R.; Byeon, W.; Wan, Z.Y.; Sapsis, T.P.; Koumoutsakos, P. Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks. Proc. R. Soc. A 2018, 474, 20170844. [Google Scholar] [CrossRef] [PubMed]
  26. Meng, X.; Yang, T. Entanglement-Structured LSTM Boosts Chaotic Time Series Forecasting. Entropy 2021, 23, 1491. [Google Scholar] [CrossRef] [PubMed]
  27. Cross, A. The IBM Q experience and Qiskit open-source quantum computing software. APS Meet. Abstr. 2018, 2018, L58. [Google Scholar]
  28. Arute, F.; Arya, K.; Babbush, R.; Bacon, D.; Bardin, J.C.; Barends, R.; Biswas, R.; Boixo, S.; Brandao, F.G.S.L.; Buell, D.A.; et al. Quantum supremacy using a programmable superconducting processor. Nature 2019, 574, 505–510. [Google Scholar] [CrossRef] [PubMed]
  29. Bergholm, V.; Izaac, J.; Schuld, M.; Gogolin, C.; Ahmed, S.; Ajith, V.; Alam, M.S.; Alonso-Linaje, G.; Narayanan, B.A.; Asadi, A.; et al. PennyLane: Automatic differentiation of hybrid quantum-classical computations. arXiv 2022, arXiv:1811.04968. [Google Scholar]
  30. Lanting, T.; Przybysz, A.J.; Smirnov, A.Y.; Spedalieri, F.M.; Amin, M.H.; Berkley, A.J. Entanglement in a quantum annealing processor. Phys. Rev. X 2014, 4, 021041. [Google Scholar] [CrossRef]
  31. Preskill, J. Quantum computing in the NISQ era and beyond. Quantum 2018, 2, 79. [Google Scholar] [CrossRef]
  32. Gottesman, D. Stabilizer codes and quantum error correction. arXiv 1997, arXiv:quant-ph/9705052. [Google Scholar]
  33. Gottesman, D. Theory of fault-tolerant quantum computation. Phys. Rev. A 1998, 57, 127. [Google Scholar] [CrossRef]
  34. Mitarai, K.; Negoro, M.; Kitagawa, M.; Fujii, K. Quantum circuit learning. Phys. Rev. A 2018, 98, 032309. [Google Scholar] [CrossRef]
  35. Du, Y.; Hsieh, M.-H.; Liu, T.; Tao, D. The expressive power of parameterized quantum circuits. arXiv 2018, arXiv:1810.11922. [Google Scholar]
  36. Qi, J.; Yang, C.-H.; Chen, P.-Y. QTN-VQC: An end-to-end learning framework for quantum neural networks. Phys. Scr. 2024, 99, 015111. [Google Scholar] [CrossRef]
  37. Schuld, M.; Bocharov, A.; Svore, K.; Wiebe, N. Circuit-centric quantum classifiers. arXiv 2018, arXiv:1804.00633. [Google Scholar] [CrossRef]
  38. Havlíček, V.; Córcoles, A.D.; Temme, K.; Harrow, A.W.; Kandala, A.; Chow, J.M.; Gambetta, J.M. Supervised learning with quantum-enhanced feature spaces. Nature 2019, 567, 209–212. [Google Scholar] [CrossRef] [PubMed]
  39. Dallaire-Demers, P.-L.; Killoran, N. Quantum generative adversarial networks. Phys. Rev. A 2018, 98, 012324. [Google Scholar] [CrossRef]
  40. Chen, S.Y.-C.; Yang, C.-H.H.; Qi, J.; Chen, P.-Y.; Ma, X.; Goan, H.-S. Variational quantum circuits for deep reinforcement learning. IEEE Access 2020, 8, 141007–141024. [Google Scholar] [CrossRef]
  41. Schuld, M.; Petruccione, F. Supervised Learning with Quantum Computers; Springer: Berlin/Heidelberg, Germany, 2018; Volume 17. [Google Scholar]
  42. Chen, S.Y.-C.; Yoo, S. Federated Quantum Machine Learning. Entropy 2021, 23, 460. [Google Scholar] [CrossRef] [PubMed]
  43. Wu, S.L.; Chan, J.; Guan, W.; Sun, S.; Wang, A.; Zhou, C.; Livny, M.; Carminati, F.; Di Meglio, A.; Li, A.C.Y. Application of quantum machine learning using the quantum variational classifier method to high energy physics analysis at the LHC on IBM quantum computer simulator and hardware with 10 qubits. J. Phys. Nucl. Part. Phys. 2021, 48, 12. [Google Scholar] [CrossRef]
  44. Biamonte, J.; Wittek, P.; Pancotti, N.; Rebentrost, P.; Wiebe, N.; Lloyd, S. Quantum Machine Learning. Nature 2017, 549, 195–202. [Google Scholar] [CrossRef] [PubMed]
  45. Dunjko, V.; Briegel, H.J. Machine Learning & Artificial Intelligence in the Quantum Domain: A Review of Recent Progress. Rep. Prog. Phys. 2018, 81, 074001. [Google Scholar] [PubMed]
  46. Zaman, K.; Marchisio, A.; Hanif, M.A.; Shafique, M. A Survey on Quantum Machine Learning: Current Trends, Challenges, Opportunities, and the Road Ahead. arXiv 2023, arXiv:2310.10315. [Google Scholar]
  47. Chen, S.Y.-C.; Yoo, S.; Fang, Y.-L.L. Quantum Long Short-Term Memory; Computational Science Initiative, Brookhaven National Laboratory: Upton, NY, USA, 2020.
  48. Chen, S.Y.-C.; Fry, D.; Deshmukh, A.; Rastunkov, V.; Stefanski, C. Reservoir Computing via Quantum Recurrent Neural Networks. arXiv 2020, arXiv:2211.02612v1. [Google Scholar]
  49. Malvern, M.; Gibbons, T.E. Learning and Modeling Chaos Using LSTM Recurrent Neural Networks. 2018. Available online: https://api.semanticscholar.org/CorpusID:212631658 (accessed on 3 March 2024).
  50. Khan, S.Z.; Muzammil, N.; Zaidi, S.M.H.; Aljohani, A.J.; Khan, H.; Ghafoor, S. Quantum Long Short-Term Memory (QLSTM) vs Classical LSTM in Time Series Forecasting: A Comparative Study in Solar Power Forecasting. arXiv 2023, arXiv:2310.17032v2. [Google Scholar]
  51. Siemaszko, M.; Buraczewski, A.; Le Saux, B. Rapid training of quantum recurrent neural networks. Quantum Mach. Intell. 2023, 5, 31. [Google Scholar] [CrossRef]
Figure 1. An example of VQC architecture with two layers of entanglements for QLSTM and QGRU.
Figure 2. Structure of a single unit of classical LSTM.
Figure 3. Structure of a single unit of classical GRU.
Figure 4. Structure of a single unit of QLSTM.
Figure 5. Structure of a single unit of QGRU.
Figure 6. LSTM predictions over 100 epochs for the Van der Pol oscillator. Up: x; down: y; epoch: 5, 50, 70, 100; green dashed line: actual values; blue solid line: training predictions; red solid line: testing predictions.
Figure 7. QLSTM predictions over 100 epochs for the Van der Pol oscillator. Up: x; down: y; epoch: 5, 50, 70, 100; green dashed line: actual values; blue solid line: training predictions; red solid line: testing predictions.
Figure 8. GRU predictions over 100 epochs for the Van der Pol oscillator. Up: x; down: y; epoch: 5, 50, 70, 100; green dashed line: actual values; blue solid line: training predictions; red solid line: testing predictions.
Figure 9. QGRU predictions over 100 epochs for the Van der Pol oscillator. Up: x; down: y; epoch: 5, 50, 70, 100; green dashed line: actual values; blue solid line: training predictions; red solid line: testing predictions.
Figure 10. Experiment 1: train and test losses for the models over 100 epochs. From left to right: LSTM, QLSTM, GRU, QGRU.
Figure 11. LSTM predictions over 100 epochs. Up: Oscillator 1; down: Oscillator 2; epoch: 5, 50, 70, 100; green dashed line: actual values; blue solid line: training predictions; red solid line: testing predictions.
Figure 12. QLSTM predictions over 100 epochs. Up: Oscillator 1; down: Oscillator 2; epoch: 5, 50, 70, 100; green dashed line: actual values; blue solid line: training predictions; red solid line: testing predictions.
Figure 13. GRU predictions over 100 epochs. Up: Oscillator 1; down: Oscillator 2; epoch: 5, 50, 70, 100; green dashed line: actual values; blue solid line: training predictions; red solid line: testing predictions.
Figure 14. QGRU predictions over 100 epochs. Up: Oscillator 1; down: Oscillator 2; epoch: 5, 50, 70, 100; green dashed line: actual values; blue solid line: training predictions; red solid line: testing predictions.
Figure 15. Experiment 2: train and test losses for the models over 100 epochs. From left to right: LSTM, QLSTM, GRU, QGRU.
Figure 16. LSTM predictions for dimensions X (up), Y (middle), and Z (bottom).
Figure 17. QLSTM predictions for dimensions X (up), Y (middle), and Z (bottom).
Figure 18. GRU predictions for dimensions X (up), Y (middle), and Z (bottom).
Figure 19. QGRU predictions for dimensions X (up), Y (middle), and Z (bottom).
Figure 20. Predicted Lorenz system trajectory by the LSTM model.
Figure 21. Predicted Lorenz system trajectory by the QLSTM model.
Figure 22. Predicted Lorenz system trajectory by the GRU model.
Figure 23. Predicted Lorenz system trajectory by the QGRU model.
Figure 24. Experiment 3: train and test losses for the models over 20 epochs. From left to right: LSTM, QLSTM, GRU, QGRU.
Table 1. Hyperparameter configuration.

| Hyperparameter | QRNNs Value | RNNs Value |
|---|---|---|
| Optimizer | RMSprop | Adam |
| Loss Function | MSE | MSE |
| Backend | default.qubit | - |
| Number of Qubits | 4 | - |
| Layers of Entanglement | 4 | - |
| Number of Data Points | 250 | 250 |
| Percentage of Train Set | 67% | 67% |
| Percentage of Test Set | 33% | 33% |
| Learning Rate | 0.01 | 0.01 |

Table 2. Comparison of train and test MAE and RMSE for the state of x.

| Model | Train MAE | Test MAE | Train RMSE | Test RMSE |
|---|---|---|---|---|
| LSTM | 0.2601 | 0.2585 | 0.3002 | 0.2986 |
| QLSTM | 0.1411 | 0.1397 | 0.1648 | 0.1641 |
| GRU | 0.1597 | 0.1591 | 0.1845 | 0.1850 |
| QGRU | 0.0868 | 0.0902 | 0.1013 | 0.1031 |

Table 3. Comparison of train and test MAE and RMSE for the state of y.

| Model | Train MAE | Test MAE | Train RMSE | Test RMSE |
|---|---|---|---|---|
| LSTM | 0.3224 | 0.3294 | 0.3737 | 0.3828 |
| QLSTM | 0.1959 | 0.1851 | 0.2498 | 0.2384 |
| GRU | 0.3336 | 0.3375 | 0.4319 | 0.4384 |
| QGRU | 0.1473 | 0.1500 | 0.1931 | 0.1943 |

Table 4. Comparison of train and test MAE and RMSE for Oscillator 1.

| Model | Train MAE | Test MAE | Train RMSE | Test RMSE |
|---|---|---|---|---|
| LSTM | 0.5467 | 0.4371 | 0.7761 | 0.4922 |
| QLSTM | 0.5284 | 0.4763 | 0.6987 | 0.5374 |
| GRU | 0.3648 | 0.4396 | 0.4499 | 0.4941 |
| QGRU | 0.2585 | 0.2411 | 0.3587 | 0.2701 |

Table 5. Comparison of train and test MAE and RMSE for Oscillator 2.

| Model | Train MAE | Test MAE | Train RMSE | Test RMSE |
|---|---|---|---|---|
| LSTM | 0.2190 | 0.3597 | 0.2814 | 0.4248 |
| QLSTM | 0.1244 | 0.2591 | 0.1607 | 0.2885 |
| GRU | 0.1160 | 0.0858 | 0.1483 | 0.1006 |
| QGRU | 0.0794 | 0.0482 | 0.0949 | 0.0602 |

Table 6. Hyperparameter configuration for classic models and quantum models.

| Hyperparameter | Quantum | Classic |
|---|---|---|
| Optimizer | RMSprop | Adam |
| Loss Function | MSE | MSE |
| Backend | default.qubit | - |
| Number of Qubits | 4 | - |
| VQC Layers | 4 | - |
| Percentage of Train Set | 67% | 67% |
| Percentage of Test Set | 33% | 33% |
| Epochs | 20 | 20 |
| Learning Rate | 0.01 | 0.01 |

Table 7. Mean Absolute Error (MAE) for each model on the test set.

| Model | X | Y | Z |
|---|---|---|---|
| LSTM | 2.0361 | 2.1232 | 2.2119 |
| QLSTM | 0.8820 | 0.9473 | 1.0206 |
| GRU | 1.1684 | 1.1719 | 1.1710 |
| QGRU | 0.4864 | 0.4723 | 0.4555 |

Table 8. Root Mean Square Error (RMSE) for each model on the test set.

| Model | X | Y | Z |
|---|---|---|---|
| LSTM | 2.1736 | 2.2718 | 2.3714 |
| QLSTM | 1.2234 | 1.2723 | 1.3348 |
| GRU | 1.1783 | 1.1748 | 1.1740 |
| QGRU | 0.4971 | 0.4846 | 0.4745 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
