**1. Introduction**

Matrix inversion is extensively used in linear algebra (e.g., for solving systems of linear equations). Although matrix inversion was already mentioned in very ancient texts, it has received sustained scientific attention mainly since the 17th century. This interest has led to the development of various methods, concepts, and algorithms for solving linear equations [1]. Matrix inversion is very useful in engineering, physics, and other natural sciences [2]. Real-time/online matrix inversion belongs to both mathematics and control theory. It finds important applications in various areas such as traffic simulation and/or online control in the frame of intelligent transportation systems, communications [3], machine learning [4], smart/complex antennas (MIMO) [5–7], Field Programmable Gate Array (FPGA) implementations [8,9], signal processing [10], image processing [11,12], and robotics (e.g., for kinematics and inverse kinematics) [13–15].

Matrix inversion is also very useful in several decision-making algorithms. There are plenty of different heuristics and goal-programming algorithms that use linear relationships to solve problems, and they often do so via matrix inversion; this is the case, for example, in social media networks, where it can help to speed up the ranking amongst different categories [16,17].

Matrix inversion is further widely used and is of critical importance in various processing contexts in smart sensors. In some cases, the accuracy of certain measurements can be significantly improved by involving matrix inversion. For example, real-time matrix inversion is used to ensure high precision in multi-parameter sensing with so-called fiber Bragg grating (FBG)-based sensors [18]. In that approach, the sensor functionality is approximated by a linear system, and the problem is thus solved through a linear system of equations. This approximation has further advantages, such as the capability of reconstructing lost information, which is used, for example, in the context of so-called compressed sensing [19,20].

In general, the measurements of different physical quantities related to a dynamical system can be described as linear measurement equations or multivariate sensing processes [21]. The measurement of such quantities often requires real-time computation of matrix inversions. Further, the relationship between different measurements can also be used for calibrating the related sensors [22].

Finally, matrix inversion used in sensor-related processing also provides the capability to significantly reduce noise in measurements. For example, matrix inversion helps to discard noisy measurements in spatially distributed sensors [23].

To fulfil the expected role in these various applications, different methods have been developed to achieve both fast convergence and high accuracy of matrix inversion related calculations. Some of the best-known methods are the following: elimination of variables, Gaussian elimination (also known as row reduction), lower-upper (LU) decomposition, Newton's method, eigenvalue decomposition, and Cholesky decomposition, among many others.
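As a brief illustration of one of these approaches, the following sketch applies Newton's method for matrix inversion (the Newton–Schulz iteration) to a small example matrix; the matrix, tolerance, and iteration limit are illustrative choices only.

```python
import numpy as np

# Illustrative sketch of Newton's method for matrix inversion (the
# Newton-Schulz iteration): X_{k+1} = X_k (2I - M X_k).
# The starting guess X_0 = M^T / (||M||_1 * ||M||_inf) guarantees convergence
# for any nonsingular M; matrix, tolerance, and iteration cap are examples.
def newton_inverse(M, tol=1e-12, max_iter=100):
    n = M.shape[0]
    I = np.eye(n)
    X = M.T / (np.linalg.norm(M, 1) * np.linalg.norm(M, np.inf))
    for _ in range(max_iter):
        X = X @ (2 * I - M @ X)
        if np.linalg.norm(M @ X - I) < tol:
            break
    return X

M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(newton_inverse(M) @ M)  # close to the 2x2 identity
```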

Generally, one can categorize matrix inversion methods into two different groups: (a) the recursive (or iterative) methods; and (b) the direct methods [7,24–26].

The first group encompasses methods like Gauss–Seidel or gradient descent. An initial condition (or starting point) is provided, and each step uses the last value to calculate the new one. With each iteration, the approximation of the solution improves until the desired accuracy is reached [27,28].

On the other hand, direct methods like Cholesky or Gaussian elimination compute the solution in a finite number of steps. They can find the exact solution if there is no rounding error [29]. Those analytical methods normally have a minimum arithmetic complexity of *O*(*n*<sup>3</sup>) (provided those algorithms are implemented on a single CPU). For parallel systems, this arithmetic complexity is different, as the algorithm must be adapted to fit efficient processing on a parallel system architecture. Hereby, depending on the parallelizability potential of the algorithm, the available number of parallel cores will affect the final effective complexity. However, in real benchmarking implementations, most parallel implementations of such algorithms require the transfer of large amounts of data amongst processors, and this communication need leads to a loss of efficiency of the algorithm with respect to its speed-up capability in the presence of multiple cores. Therefore, a naive implementation of the same algorithm on a parallel system is very inefficient with respect to speed-up. Furthermore, in practice, noise is a serious problem, and the related denoising process is an expensive one, which can provoke a violation of the system's real-time property. The need for developing a new concept to address these concerns is therefore sufficiently justified.

To address the above-described parallelizability problem, several parallel concepts have been introduced, one good example being the so-called cellular neural network (CNN). It was introduced by Leon Chua in 1988 in the seminal paper entitled "Cellular Neural Networks: Theory" [30]. In 1993, a further article entitled "The CNN Universal Machine: An Analogic Array Computer" was published by Tamas Roska and Leon Chua, where the first analog CNN processor was presented [31]. Since then, many different articles have been published to show the applicability of CNN processors for various usages.

In this article, we use a more generic concept called the recurrent neural network (RNN), a more general neural network family to which CNN belongs. Essentially, RNNs are building elements of the so-called dynamic neural network (DNN) [32]. The parallel computing nature of RNN and its inherent fast processing speed while solving problems make it a very good building block of a novel concept for efficiently solving differential problems on multiple cores [33].

An RNN processor, similarly to its ancestors (artificial neural networks), is a set of cells which have mathematical relations (i.e., couplings) with their respective neighbors. The dynamic property of each cell is expressed by one differential equation. Furthermore, the dynamic property of the network can be customized for solving a given ODE (ordinary differential equation) by appropriate values of the ODE's parameter settings (or coefficients), expressed in the form of matrices called "templates".

Playing with templates provides the possibility to solve different kinds of problems without changing the physical properties and/or the architecture of the RNN machine or processor.

The inherent flexibility and the fast processing speed while solving problems with RNN provide two advantages compared to traditional or competing computational methods or concepts [34–36]. First, we can solve different kinds of problems by changing templates. Second, the problem solving can be significantly accelerated (i.e., sped up) without losing accuracy.

The above-mentioned good features of RNN (e.g., flexibility, speed-up in the presence of multiple cores, etc.) motivate the use of this promising paradigm for accelerated solving of linear algebraic equations on either one-core or multi-core platforms. This is not a simple conversion, as we face completely different hardware computing frameworks with different parameters to be taken care of. All RNN parameters should be adjusted such that the resulting new dynamical system converges to the target solution of the problem.

In this paper, we need to formulate the central problem (i.e., realizing a "matrix inversion") in a manner such that the final state of the RNN system is the solution of the target linear algebraic system of equations; see Equation (1).

$$M(t)\,X = I \tag{1}$$

In more detail, we define the linear algebraic equation system as Equation (1). Hereby, *M*(*t*) is an *n* × *n* matrix of smooth time-varying real numbers, *X* is an *n* × *n* matrix, and *I* is the identity matrix of size *n* × *n*. We would like to find *X* such that Equation (1) is satisfied. Our target is to find corresponding RNN templates such that, for any initial value problem (IVP), after a finite number of iterations, our dynamic system (see Equation (2)) converges to a solution *X*<sup>∗</sup>, which approximates *X*.

$$\begin{aligned} &\dot{X} = F(X) \\ &\forall X \in \mathbb{R}^{n \times n},\ \lim\_{t \to \infty} MX - I = 0 \end{aligned} \tag{2}$$

In Equation (2), the first line shows an ODE in a general form. The second line expresses our problem, which we wish to convert into the form of such an ODE.

There exist several published works in which similar problems have been tackled with dynamical systems like recurrent neural networks (RNN) [13,37–43] or artificial neural networks (ANN) [44–47]. Although most of those works are also related to RNN, our work and the novel concepts presented in this paper have more potential to reliably provide faster convergence towards the solution of Equation (1).

One of the main differences between our method and other related works, especially RNN concepts, is that the proposed model does not need the kind of training required by most RNN networks. Our model converges to the problem's solution if and only if a correct selection/setting of the internal dynamic parameters is made. This property makes our model unique amongst other types of RNN models, which are usually used in machine learning [48].

For training, most of the published related works use common traditional methods like gradient descent [39] to create dynamical systems and then customize them for an implementation on a target platform.

In this paper, we compare the performance of our novel method with that of already implemented traditional methods from the relevant literature. This provides the basis for a fair judgement of the advantages and disadvantages of each method, the old ones versus our new one.

The overall structure of this paper is as follows. Section 2 contains both background and a comprehensive overview of previous dynamic system models used for matrix inversion. Besides, related requirements with respect to the target platform for implementing the dynamic system (RNN, ANN ... ) are discussed. In Section 3, we formulate our problem (i.e., inverting a time-varying matrix) with the required restrictions related to RNN. The effectiveness of the developed model for solving the matrix inversion problem is demonstrated in Section 4. The main proof and key result are the ones showing the global (fast) convergence of the RNN-based dynamic system to the final solution of Equation (1).

In Section 5, the new method will be used to create a new dynamical system, i.e., an RNN model. The performance of this novel RNN will be compared to that of previous/original concepts.

In the last section (i.e., Section 7), some concluding remarks summarize the quintessence of the key achievements in this work.

#### **2. Related Works on Dynamical Neural Networks**

Solving Equation (1) using any dynamic method like the so-called dynamic neural network (DNN) requires creating a dynamic system with an attractor (if and only if Equation (1) has a real solution), which is a fixed point equal to the solution of Equation (1). Such systems can be implemented in different ways, but the most common way of creating such dynamic systems is to use either the gradient descent method, the Zhang dynamics, or the Chen dynamics. All three named methods essentially use the same concept, but the second and the third ones use smaller step sizes and therefore require more memory. This can improve/speed up the convergence of the algorithms. We will provide (in the next sections) a full explanation of the advantages of each method; afterwards, a generic new method for solving Equation (1) will be provided.

#### *2.1. The Gradient Method*

This method involves a first-order optimization technique for finding the minimum point of a function. The technique takes steps in the direction of the negative gradient. It is based on the observation that the multivariate function *F*(*X*) decreases fastest if we move in the direction of the negative gradient of *F*(*X*) at a point *a*: −∇*F*(*a*).

$$b = a - \gamma \nabla F(a), \; \forall \gamma \in \mathbb{R} \text{ and } \gamma > 0 \tag{3}$$

γ is the step size for updating the decision neurons. For sufficiently small γ, *F*(*a*) ≥ *F*(*b*); with this remark, we can extend the observation to Equation (4).

$$X\_{n+1} = X\_n - \gamma \nabla F(X\_n) \tag{4}$$

That means that, by iterating Equation (4), *F*(*X*) converges to the minimum of the function *F*.
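For concreteness, the following minimal sketch iterates Equation (4) on a simple two-variable quadratic; the function, the step size γ, and the iteration count are example choices, not values taken from the text.

```python
import numpy as np

# Minimal illustration of the fixed-step iteration of Equation (4),
# X_{n+1} = X_n - gamma * grad F(X_n), applied to a simple two-variable
# quadratic F(x, y) = (x - 1)^2 + 2*(y + 3)^2 (function, step size gamma,
# and iteration count are example choices only).
def grad_F(v):
    return np.array([2.0 * (v[0] - 1.0), 4.0 * (v[1] + 3.0)])

gamma = 0.1
x = np.array([0.0, 0.0])       # starting point X_0
for _ in range(200):
    x = x - gamma * grad_F(x)  # step against the gradient
print(x)                       # approaches the minimum point (1, -3)
```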

The gradient descent (see Equation (5)) can also be described as the "Euler method" for solving ordinary differential equations.

$$
\dot{X}(t) = -\gamma \nabla F(X(t))\tag{5}
$$

Gradient descent can be used to solve linear equations. In that case, the problem is reformulated as the minimization of a quadratic function (Equation (6)), which is then solved with the gradient descent method.

$$F(X) = \frac{1}{2} \|MX - I\|^2 \tag{6}$$

Then, based on Equation (6), we have:

$$
\nabla F(X) = M^T \left( MX - I \right) \tag{7}
$$

By substituting Equation (7) into Equation (5) the following dynamic model is obtained:

$$\dot{X}(t) = -\gamma M^T M X + \gamma M^T I \tag{8}$$

The exact analytical solution of Equation (8) is expressed as the following:

$$X(t) = \left(X(0) - M^{-1}\right)e^{-\gamma M^T M t} + M^{-1} \tag{9}$$

Whereby the first derivative of *X*(*t*), denoted by $\dot{X}(t)$, will be:

$$\dot{X}(t) = -\gamma \left( X(0) - M^{-1} \right) \left( M^T M + \dot{M}^T M t \right) e^{-\gamma M^T M t} \tag{10}$$

Equations (8) and (1) converge to the same solution, described by the point *M*<sup>−1</sup>.

$$\forall \gamma, a\_{i,j} \in \mathbb{R} \text{ and } \gamma > 0, \text{ then } \gamma MM^T > 0$$

This method is also called "gradient-based" dynamics. It can be designed using norm-based energy functions [45,49]. The advantage of this model is its ease of implementation; however, due to the convergence factor visible in Equation (9), it takes a long time to converge to the solution of the problem, and this convergence rate affects the noise sensitivity of the model, making it more sensitive to noise. Also, this model is not appropriate for real-time, time-varying matrices.
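To make the behavior of this model concrete, the following sketch integrates the gradient dynamics of Equation (8) with a simple forward Euler scheme for a constant example matrix; γ, the time step, and the number of steps are illustrative assumptions only.

```python
import numpy as np

# Forward-Euler integration of the gradient dynamics of Equation (8):
#   dX/dt = -gamma * M^T (M X - I).
# The constant example matrix M, gamma, the time step, and the number of
# steps are illustrative assumptions, not values prescribed by the text.
def gradient_dynamics_inverse(M, gamma=10.0, dt=1e-3, steps=20000):
    n = M.shape[0]
    I = np.eye(n)
    X = np.zeros((n, n))                          # initial state X(0)
    for _ in range(steps):
        X = X + dt * (-gamma * M.T @ (M @ X - I))
    return X

M = np.array([[4.0, 1.0],
              [2.0, 3.0]])
X = gradient_dynamics_inverse(M)
print(np.linalg.norm(M @ X - np.eye(2)))          # residual ||MX - I|| is small
```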

#### *2.2. Zhang Dynamics*

There exists another method [13,50] for creating a dynamic system that converges to the required solution. In this method, the error function is defined in the following way:

$$E(t) = M(t)X(t) - I\tag{11}$$

This means the error increases proportionally to the amount of divergence of *X* from its true solution: larger values of the difference create larger values of the error.

Equation (11) changes over time in a way such that the result will be the same as in Equation (2). This requires moving against the error to come closer to the solution. Therefore, one can define the derivative of this function as follows:

$$\dot{E}(t) = -\gamma E(t) \tag{12}$$

The solution of Equation (12) can be expressed as Equation (13). This equation shows how the error function decreases exponentially, driving the system in the direction opposite to the error. This forces the function to converge to the solution. This dynamic system helps to create a new dynamic system for our problem.

$$E(t) = C\, e^{-\gamma t} \tag{13}$$

By substituting *E*(*t*) and its derivative $\dot{E}(t)$ into Equation (12), we obtain a new dynamic system [11].

$$
\dot{M}X + M\dot{X} = -\gamma (MX - I) \tag{14}
$$

Whereby the solution of Equation (14) is expressed as follows:

$$X(t) = Ce^{-\gamma t} + M^{-1} \tag{15}$$

Taking the first derivative of Equation (15), we obtain:

$$
\dot{X}(t) = -C\gamma e^{-\gamma t} \tag{16}
$$

By combining Equations (15) and (16), we recover Equation (14). It can be observed again that, as *t* increases to infinity, our dynamic system converges to the solution of Equation (1). This model has a very good convergence rate, which also alleviates the noise sensitivity problem. On the other hand, it is a model that is much more difficult to implement.
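As a rough numerical illustration of these dynamics (not of a hardware RNN realization), the sketch below integrates Equation (14) with forward Euler for an example time-varying matrix *M*(*t*); at each step, the implicit relation is solved for the state derivative with a standard linear solver, and *M*(*t*), γ, and the step size are illustrative assumptions.

```python
import numpy as np

# Forward-Euler sketch of the Zhang dynamics of Equation (14) for a
# time-varying matrix M(t). At each step the implicit relation
#   M(t) dX/dt = -dM/dt X - gamma (M(t) X - I)
# is solved for dX/dt with a standard linear solver; a hardware RNN would
# realize these dynamics directly. M(t), gamma, and the step size are
# example choices only.
def M_of_t(t):
    return np.array([[3.0 + np.sin(t), 0.5],
                     [0.5, 3.0 + np.cos(t)]])

def dM_of_t(t, h=1e-6):                    # numerical time derivative of M(t)
    return (M_of_t(t + h) - M_of_t(t - h)) / (2.0 * h)

gamma, dt, steps = 50.0, 1e-4, 50000
I = np.eye(2)
X = np.zeros((2, 2))                       # initial state X(0)
t = 0.0
for _ in range(steps):
    M, dM = M_of_t(t), dM_of_t(t)
    X_dot = np.linalg.solve(M, -dM @ X - gamma * (M @ X - I))
    X, t = X + dt * X_dot, t + dt
print(np.linalg.norm(M_of_t(t) @ X - I))   # tracking residual stays small
```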
