**Identification, Knowledge Engineering and Digital Modeling for Adaptive and Intelligent Control**

Editors

**Natalia Bakhtadze Igor Yadykin Andrei Torgashov Nikolay Korgin**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors* Natalia Bakhtadze, Russian Academy of Sciences, Russia

Igor Yadykin, V.A. Trapeznikov Institute of Control Sciences, Russia

Andrei Torgashov, Institute of Automation and Control Processes FEB RAS, Russia

Nikolay Korgin, V.A. Trapeznikov Institute of Control Sciences, Russia

*Editorial Office* MDPI, St. Alban-Anlage 66, 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Mathematics* (ISSN 2227-7390) (available at: https://www.mdpi.com/journal/mathematics/special_issues/Identification_Knowledge_Engineering_Digital_Modeling_Adaptive_Intelligent_Control).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-8060-9 (Hbk) ISBN 978-3-0365-8061-6 (PDF)**

© 2023 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **Contents**



## **About the Editors**

#### **Natalia Bakhtadze**

Professor, Dr. Natalia Bakhtadze, Head of the Identification Laboratory, Institute of Control Sciences of the Russian Academy of Sciences, Moscow, Russia. Author of over 200 scientific publications. Areas of Interest: Identification of Control Systems; Estimation Theory; Adaptive Control; Model Predictive Control; Data Mining; Wavelet Analysis; Control of Technological Processes in Industry and Energy; Multi-Agent Systems. Member of the editorial boards of several peer-reviewed journals ("Automation and Remote Control", "Advances in Systems Science and Applications" (Associate Editor-in-Chief), "Information Technology and Computing Systems" (Executive Editor), etc.). Vice-chair of IFAC TC 5.2 Management and Control in Manufacturing and Logistics.

#### **Igor Yadykin**

Professor, Dr. Igor Yadykin, Institute of Control Sciences of the Russian Academy of Sciences, Moscow, Russia. Author of over 250 scientific publications. Areas of Interest: power systems analysis; power systems simulation; adaptive and optimal control; mechanical engineering. Vice-chair of IFAC TC 6.3 Power and Energy Systems.

#### **Andrei Torgashov**

Dr. Sci. Andrei Torgashov, Principal Researcher, Institute of Automation and Control Processes FEB RAS, Vladivostok, Russia. Author of over 200 scientific publications. Areas of Interest: Process Control, System Identification, Process Modeling, Process Optimization, Model Predictive Control, Applied Statistics, PID Control, Stability Analysis, Control Systems Engineering, Statistical Data Analysis, Advanced Control Theory, Optimal Control, Modeling and Simulation, System Modeling, Advanced Control Systems (APC).

#### **Nikolay Korgin**

Professor, Dr. Nikolay Korgin, Institute of Control Sciences of the Russian Academy of Sciences, Moscow, Russia. Author of over 200 scientific publications. Areas of Interest: Mechanism Design; Game Theory; Power Systems Analysis; Mechanical Engineering; Identification Problems; Organizational Behavior; Organizational Theory; Organizational Management; Control Systems Engineering; Microeconomics; Strategic Management; Optimal Control; Strategic Planning; Leadership; Robustness; Human Resource Management; Organizational Development; Organizational Culture.

## **Preface to the Special Issue on "Identification, Knowledge Engineering and Digital Modeling for Adaptive and Intelligent Control"—Special Issue Book**

**Natalia Bakhtadze**

Institute for Control Sciences, Russian Academy of Sciences, 117806 Moscow, Russia; sung7@yandex.ru

Starting our work on this Special Issue, we assumed that the research results presented here would reflect the solutions to various problems related to production management; however, the set of identified problems showed that their solutions could be useful for a wider range of applications. Therefore, we have presented 14 articles covering various aspects of the new trends in adaptive and intelligent control and identification.

The results of research on the theories and methodologies of identification are presented. New methods for solving the problems of parametric and non-parametric identification are proposed, and the possibilities of using data mining and knowledge engineering methods for identifying control systems and building digital models of dynamic processes in real time are studied. Various aspects of constructing intelligent control systems with an identifier and reinforcement learning are discussed and the possibilities of intelligent model predictive control and its application to control objects of various natures, as well as stability problems, are investigated. Approaches to building models of strategic decision making under informational control are also proposed.

A general complex model is presented in [1] for collective dynamical strategic decision making with explicitly interconnected factors reflecting both the psychic (internal state) and behavioral (external action, result of activity) components of agents' activity under specified environmental and control factors. This model unifies and generalizes the approaches of game theory, social psychology, and the theory of multi-agent systems and control in organizational systems through a simultaneous consideration of both the internal and external parameters of the agents. Article [2] carries out a comparative analysis of the known methods for the synthesis of various control laws ensuring the invariance of the output (controlled) variable with respect to external disturbances, under various assumptions about their type and channels of acting on the control plant. Synthesis methods are presented by the example of a third-order nonlinear system with a single input and single output (SISO-system). For the systems where the matching conditions are not satisfied, the paper draws a conclusion on the expediency of introducing smooth and bounded nonlinear local feedbacks. In Ref. [3], the stability of bilinear systems is investigated using spectral techniques such as selective modal analysis. Predictive models of bilinear systems based on inductive knowledge extracted by big data mining techniques are applied with associative search of statistical patterns. In Ref. [4], the intelligent computational algorithms of evolutionary computing paradigms (ECPs) are presented, which effectively solve complex nonlinear optimization problems. The maximum-likelihood-based adaptive differential evolution algorithm (ADEA) is investigated for the identification of nonlinear Hammerstein output error (HOE) systems that are widely used for modeling various nonlinear processes in engineering and applied sciences.

In Ref. [5], the stability of a bilinear system is investigated by the Gramian method. The paper shows that the state of a bilinear control system can be split uniquely into generalized modes corresponding to the eigenvalues of the dynamics matrix. The Gramians of the controllability and observability of a bilinear system can be divided into parts (sub-Gramians) that characterize the measure of these generalized modes and their interactions. In Ref. [6], the system identification properties of Dynamic Mode Decomposition (DMD) are studied. DMD is a popular data-driven framework for extracting linear dynamics from complex high-dimensional systems. In Ref. [7], a direct method for the synthesis of robust systems operating under parametric uncertainty in a control plant model is proposed. The developed robust control procedures are based on the assumption that the structural properties of the nominal system survive over the entire range of parameter changes. The authors in [8] show that for simulators providing vestibular stimulus, the automatic vestibular–ocular reflex (VOR) bodily function can objectively measure the accuracy of motion simulation. This requires a model of ocular response to enforced accelerations, which is offered in the paper. The model corresponds to a single-layer spiking differential neural network; its activation functions are based on the dynamic Izhikevich model of neuron dynamics.

**Citation:** Bakhtadze, N. Preface to the Special Issue on "Identification, Knowledge Engineering and Digital Modeling for Adaptive and Intelligent Control"—Special Issue Book. *Mathematics* **2023**, *11*, 1906. https://doi.org/10.3390/math11081906

Received: 3 April 2023 Accepted: 14 April 2023 Published: 18 April 2023

**Copyright:** © 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The authors in [9] discuss the analysis and optimization of stochastic systems based on canonical wavelet expansions. A wavelet model for the calibration of essentially nonstationary stochastic processes and parameters is developed. In Ref. [10], a new algorithm is proposed for constructing an integral model of an input–output-type nonlinear dynamic system in the form of a quadratic segment of the Volterra integro-power series (polynomial). It examines the nonparametric identification of models using physically realizable piecewise linear test signals in the time domain.

In Ref. [11], a multi-output soft sensor for the industrial reactive distillation process of methyl tert-butyl ether (MTBE) is developed. Unlike the existing approaches, the paper offers soft sensors with filters to predict model errors, which are further considered as corrections in the final output forecasts. The authors in [12] consider the mathematical aspects of the problem of the optimal interception of a mobile search vehicle moving along random tacks on a given route and searching for a target, which travels parallel to this route. The interception problem was formulated as an optimal stochastic control problem, which was transformed to a deterministic optimization one.

The article [13] is aimed at numerical studies of inverse problems of experiment processing (identification of unknown parameters of mathematical models from experimental data) based on balanced identification technology. This technology uses the cross-validation root-mean-square error to select the values of the regularization parameters. The authors in [14] discuss the identification of plasma equilibrium reconstruction in D-shaped tokamaks on the basis of external magnetic plasma measurements. Such identification methods are aimed at increasing the speed of response when plasma discharges are relatively short, such as in the spherical Globus-M2 tokamak.

As Guest Editor of this Special Issue, I am grateful to the authors of these articles for their quality contributions, to the reviewers for their valuable comments, and to the administrative staff of MDPI for the support to complete this Special Issue. Special thanks to the Section Managing Editor Ms. Krystal Wang for her excellent collaboration and valuable assistance.

**Funding:** This research was funded by the Russian Science Foundation, grant number 19-19-00673-P.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **Models of Strategic Decision-Making under Informational Control**

**Dmitry Novikov**

V.A. Trapeznikov Institute of Control Sciences, 117997 Moscow, Russia; novikov@ipu.ru; Tel.: +7-4953347569

**Abstract:** A general complex model is considered for collective dynamical strategic decision-making with explicitly interconnected factors reflecting both psychic (internal state) and behavioral (external action, result of activity) components of agents' activity under the given environmental and control factors. This model unifies and generalizes approaches of game theory, social psychology, theories of multi-agent systems, and control in organizational systems by simultaneous consideration of both internal and external parameters of the agents. Two special models (of informational control and informational confrontation) contain formal results on controllability and properties of equilibria. Interpretations of the general model are conformity (threshold behavior), consensus, cognitive dissonance, and other effects with applications to production systems, multi-agent systems, crowd behavior, online social networks, and voting in small and large groups.

**Keywords:** decision-making; psychic and behavioral components of activity; action; result of activity; equilibrium stability; consensus; threshold behavior; cognitive dissonance; conformity; informational control; informational confrontation

#### **1. Introduction**

What factors influence the decisions one makes? Each scientific domain gives its own answer, which is correct in the paradigm of that particular domain. For example, the *theory of individual decision-making* says that the main factor is the *utility* of the decision-maker. *Game theory* answers that it is the set of decisions made by others. *Psychology* says that it is a person's internal state (including their beliefs, attitudes, etc.). Table 1 contains factors of decision-making (columns), scientific domains (rows), and the author's subjective expert judgment on the degree (conventionally reflected by the number of plus signs in the corresponding cell) to which each domain takes these factors into account. Since all these domains are immense (but none of them explores a combination of more than two factors), references are given to several main books or representative survey papers.

In this paper, a model of strategic collective decision-making, which equally considers all of the factors listed in the columns of Table 1, is considered. The model includes explicit interconnected parameters, reflecting both psychic (state) and behavioral (action and activity result, see [1]) components of an *agent*'s activity. Following the methodology proposed in [2], we study the mutually influencing processes of the dynamics of the agent's internal states, actions, and activity results and the properties of the corresponding equilibria.

In decision-making, organizational systems control, and collective behavior, the traditional models of dynamics cover either *the behavioral components of activity* [1] (externally manifested, observable), the *actions* and (or) *activity results* of different agents [3], or *the psychic components of activity*, their "*internal states*" (opinions, beliefs, attitudes, etc.; see surveys in [4,5]), which are "internal" variables and are not always completely observable.

**Citation:** Novikov, D. Models of Strategic Decision-Making under Informational Control. *Mathematics* **2021**, *9*, 1889. https://doi.org/10.3390/math9161889

Academic Editor: Vassilis C. Gerogiannis

Received: 3 July 2021 Accepted: 7 August 2021 Published: 9 August 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


**Table 1.** Decision-making factors and related scientific domains.

In the general case, the strategic (goal-oriented) decisions of an agent can be affected by:


The first three groups of sources of informational influence are "passive." The fourth source of influence—*control*—is active, and there may exist several agents affecting a given agent; see the model of informational confrontation in Section 6 below.

In the following paper, we introduce a general complex model of collective decisionmaking and control with explicit interconnected factors, reflecting both the psychic and behavioral components of activity. Some practical interpretations are conformity effects [10,11] as well as applications to production systems [25,27], multi-agent systems [23], crowd behavior [28], online social networks [29], and voting in small and large groups [9].

The main results are:


**Figure 1.** Structure of decision-making process [2].

This paper is organized as follows: in Section 2, the general structure of the decision-making process is considered. In Section 3, the general model is introduced. In Section 4, the well-known particular models of informational control, consensus, conformity behavior, etc., are discussed. In Section 5, the simple majority voting model is used as an example to present the original results on the mutually influencing processes of the dynamics of the agents' states and actions (the psychic and behavioral components of activity) and the properties of the corresponding equilibria. Section 6 is devoted to the model of informational confrontation between two agents, each trying to influence the third one simultaneously in their own interests.

#### **2. Decision-Making Model**

Consider a set *N* = {1, 2, ... , *n*} of interacting *agents*. Each agent is assigned a number (subscript). Discrete time instants (periods) are indicated by superscripts. Assume that there is a single control authority (*principal*) purposefully affecting the activity of different agents by *control* {*ui* ∈ *Ui*}.

We introduce a parameter *ri* ∈ *Ri* (internal "*state*") of agent *i*, which reflects all his characteristics of interest, including his *personality structure* [1]. In applications, the agent's state can be interpreted as his *opinion*, *belief*, or *attitude* (e.g., his *assessment* of some object or agent), the effectiveness of his activity, the rate of his learning, the desired result of his activity, etc.

Let agent *i* choose *actions* from a set *Ai* of admissible actions; his action is denoted by *yi* (*yi* ∈ *Ai*). The agents choose their actions, and the *results* of their activity are realized accordingly; the result of agent *i* is denoted by *zi* ∈ *Azi*, where *Azi* is the set of admissible activity results of agent *i*. The agent's action and the result of his activity may mismatch due to *uncertainty factors*, including an *environment* with a state *ω* ∈ Ω or the actions of other agents; see Figure 1.

The connection between the agent's action and the result of his activity may have a complex nature described by probability distributions, fuzzy functions, etc. [26]. For the sake of simplicity, assume that the activity result *zi* of agent *i* is a given real-valued deterministic function *Ri*(*yi*, *y*−*i*, *ω*) that depends on his action, the vector $y_{-i} = (y_1, \dots, y_{i-1}, y_{i+1}, \dots, y_n)$ of actions of all other agents (the so-called *opponent's action profile* for agent *i*), and the environment's state *ω*. The function *Ri*(·) is called the *technological function* [27,30].

Suppose that each agent always knows his state, and his action is completely observable for him and all other agents.

Let agent *i* have *preferences* on the set *Azi* of activity results; in other words, agent *i* can compare different results of his activity. The agent's preferences are described by his *utility function* (goal function, or payoff function) $\Phi_i\colon A_{z_i} \times R_i \to \mathbb{R}^1$: under a fixed state, of two activity results, the agent prefers the one with the greater value of the utility function. The agent's behavior is rational in the sense of maximizing his utility.

When choosing an action, the agent is guided by his preferences and how the chosen action affects the result of his activity. Given his state, the environment's state, and the actions of other agents, agent *i* chooses an action *y*∗ *<sup>i</sup>* maximizing his utility:

$$y_i^*(y_{-i}^*, r_i, \omega) = \arg\max_{y_i \in A_i} \Phi_i\left(R_i(y_i, y_{-i}^*, \omega), r_i\right), \ i \in N. \tag{1}$$

The expression (1) defines a *Nash equilibrium* of the agents' normal form game [8], in which they choose their actions once, simultaneously, and independently under *common knowledge* about the technological functions, utility functions, the states of different agents, and the environment's state [26].
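To make the best response (1) concrete, here is a minimal numerical sketch assuming a quadratic single-peaked utility $\Phi_i(z, r_i) = -(z - r_i)^2$ and the mean of actions as the technological function; both choices, the grid search, and all parameter values are illustrative, not the paper's specification:

```python
import numpy as np

def best_response(i, y, r, R, grid=np.linspace(0, 1, 1001)):
    """Best response (1) of agent i: the action in [0, 1] maximizing the
    assumed single-peaked utility Phi_i(z, r_i) = -(z - r_i)**2."""
    y_try = np.tile(y, (len(grid), 1))
    y_try[:, i] = grid                           # vary only agent i's action
    z = np.array([R(row) for row in y_try])      # activity result per candidate
    return grid[np.argmax(-(z - r[i]) ** 2)]

# Technological function: the mean of actions (satisfies unanimity R(a,...,a) = a).
R = lambda y: y.mean()
r = np.array([0.2, 0.5, 0.9])                    # agents' states (peak points)
y = np.array([0.5, 0.5, 0.5])                    # current action profile
y_new = np.array([best_response(i, y, r, R) for i in range(3)])
```

Here the middle agent can reach his goal exactly (his best response is 0.5), while the extreme agents hit the boundary of the admissible set (0 and 1), since no feasible action drives the mean of the profile to their peak points.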

The structure in Figure 1 is very general and covers, as particular cases, the following processes and phenomena:


(Whenever several factors appear simultaneously in a process or phenomenon, the corresponding arrows in a sequence are conventionally separated by commas.)

Let us specify the decision-making model.

#### **3. General Model**

We introduce a series of assumptions. (Their practical interpretations are discussed below).

**Assumption 1.** *Ai* = *Azi* = *Ri* = *Ui* = [0, 1], *i* ∈ *N*.

**Assumption 2.** *Ri*(*yi*, *y*−*i*, *ω*) = *R*(*yi*, *y*−*i*), *i* ∈ *N*.

**Assumption 3.** *Under a fixed state ri of agent i, his utility function* $\Phi_i\colon [0, 1]^2 \to \mathbb{R}^1$ *is single-peaked with the peak point ri*, *i* ∈ *N [26].*

**Assumption 4.** *The function R(*·*) is continuous, strictly monotonically increasing in all variables, and satisfies the unanimity condition:* ∀*a* ∈ [0, 1]: *R*(*a*, ..., *a*) = *a*.

Assumption 1 is purely "technical": as seen in the subsequent presentation, many results remain valid for a more general case of convex and compact admissible sets.

Assumption 2 is more significant, as it declares the following. First, the activity result (*collective decision*) *z* = *R*(*yi*, *y*−*i*) is the same for all agents. Second, there is no uncertainty about the environment's state. The agent's state determines his *preferences*, i.e., his attitude towards the results of collective activity. The vector of individual results of the agents' activity, depending, among other factors, on the actions of other agents, can be considered by analogy; this line seems promising for future research. Since, by Assumption 2, there is no uncertainty, the dependence of the activity result (and the equilibrium actions of different agents) on the parameter *ω* is omitted.

According to Assumption 3, the agent's utility function, defined on the set of activity results, has a unique maximum achieved when the result coincides with the agent's state. In other words, the agent's state parameterizes his utility function, reflecting the goal of his activity. (Recall that a *goal* is a desired activity result [3].) Also, the agent's state can be interpreted as his *assessment*, *opinion*, or attitude [1] towards certain activity results; see the terminology of personality psychology in [1].

Assumption 4 is meaningfully transparent: if the goals of all agents coincide, then the corresponding result of their joint activity is achievable.

The expression (1) describes an agent's single decision (single choice of his action). To consider repetitive decision-making, we need to introduce additional assumptions. The decision-making dynamics studied below satisfy the following assumption.

**Assumption 5.** *The agent's action dynamics are described by the indicator behavior procedure [26]:*

$$y\_i^t = \left(1 - \gamma\_i^t\right) y\_i^{t-1} + \gamma\_i^t y\_i^\* \left(y\_{-i}^{t-1}, r\_i^t\right), \ t = 1, 2, \dots, \tag{2}$$

*with given initial values* $y_i^0$, $r_i^0$, *i* ∈ *N*, *where* $\gamma_i^t \in (0, 1]$ *are known constants. The action* $y_i^*\left(y_{-i}^{t-1}, r_i^t\right)$ *is called the local (current) position for the goal of agent i. In each period, the agent makes a "step" (proportional to* $\gamma_i^t$*) from his current position to his best response (1) to the action profile in the previous period.*

**Assumption 6.** *The agent's state dynamics are described by the procedure:*

$$r_i^t = \left[1 - b_i B_i\left(r_i^{t-1}, u_i^t\right) - c_i C_i\left(r_i^{t-1}, y_i^{t-1}\right) - d_i D_i\left(r_i^{t-1}, z^{t-1}\right) - e_i\right] r_i^{t-1} \\ + \, b_i B_i\left(r_i^{t-1}, u_i^t\right) u_i^t + c_i C_i\left(r_i^{t-1}, y_i^{t-1}\right) y_i^{t-1} + d_i D_i\left(r_i^{t-1}, z^{t-1}\right) z^{t-1} + e_i E_i\left(r_i^{t-1}, y_{-i}^{t-1}\right), \\ t = 1, 2, \dots, \ i \in N. \tag{3}$$

**Assumption 7.** *The nonnegative constant degrees of trust* (*bi*, *ci*, *di*,*ei*) *satisfy the constraints:*

$$b\_i + c\_i + d\_i + e\_i \le 1, \ i \in \mathcal{N}. \tag{4}$$

**Assumption 8.** *The trust functions Bi*(·)*, Ci*(·)*, Di*(·)*, and Ei*(·)*, i* ∈ *N, have the domains* [0, 1]*; in addition,* ∀*a* ∈ [0, 1] *Ei*(*a*,..., *a*) = *a, i* ∈ *N*.

**Assumption 9.** *The nonnegative constant degrees of trust* (*bi*, *ci*, *di*,*ei*) *and the trust functions Bi(*·*), Ci(*·*), and Di(*·*), i* ∈ *N , satisfy the condition:*

$$\forall \, x_1, x_2, x_3, x_4 \in [0, 1]\colon \ b_i B_i(x_1, x_2) + c_i C_i(x_1, x_3) + d_i D_i(x_1, x_4) + e_i \le 1, \ i \in N. \tag{5}$$

Assumptions 7–9 guarantee that the state of the dynamic system (2) and (3) stays within the admissible set.

The constant weights (*bi*, *ci*, *di*, *ei*) reflect the attitude (*trust*) of agent *i* to the corresponding *information sources*, whereas the functions *Bi*(·), *Ci*(·), *Di*(·), and *Ei*(·) reflect how this trust varies with his current state and the incoming information. The factor $1 - b_i B_i\left(r_i^{t-1}, u_i^t\right) - c_i C_i\left(r_i^{t-1}, y_i^{t-1}\right) - d_i D_i\left(r_i^{t-1}, z^{t-1}\right) - e_i$ (the coefficient of the first term on the right-hand side of the procedure (3)) conditionally reflects *the power of the agent's beliefs*.

Note that, for unitary values of the trust functions, the expression (3) also has a conditional probabilistic interpretation: with some probability, the agent does not change his state (opinion); with the probability *bi*, the state becomes equal to the control and with the probability *ci*, to his action, etc.

Let us present and discuss practical interpretations of the five terms on the right-hand side of the expression (3). According to (3), the state $r_i^t$ of agent *i* in period *t* is a linear combination of the following parameters:


V. the external impact (*control*) $u_i^t$ applied to him in period *t* (arrow no. 1 in Figure 1).

Thus, the model (2)–(3) embraces both *external* (explicit) and *internal* (implicit) informational control of decision-making.
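A minimal simulation of the coupled dynamics (2) and (3) can be sketched as follows, under assumed simplifications: unitary trust functions $B_i = C_i = D_i = 1$, $E_i$ taken as the average action of the other agents, a quadratic single-peaked utility, and the mean of actions as the collective decision; all weights and initial values are illustrative:

```python
import numpy as np

def step(y, r, u, R, gamma=0.5, b=0.1, c=0.1, d=0.1, e=0.1):
    """One period of the dynamics (2)-(3): state update (3), then a step
    of size gamma from the current action toward the best response (2)."""
    n = len(y)
    z = R(y)
    others = (np.full(n, y.sum()) - y) / (n - 1)          # average action of the others
    r_new = ((1 - b - c - d - e) * r                      # power of beliefs
             + b * u + c * y + d * z + e * others)        # trust-weighted sources
    # Best response for Phi_i = -(z - r_i)^2 and R = mean: drive the mean to r_i.
    y_star = np.clip(n * r_new - (y.sum() - y), 0.0, 1.0)
    y_new = (1 - gamma) * y + gamma * y_star
    return y_new, r_new

R = lambda y: y.mean()
y, r = np.array([0.1, 0.9]), np.array([0.3, 0.7])
u = np.full(2, 0.5)                                      # constant control toward 0.5
for _ in range(200):
    y, r = step(y, r, u, R)
```

In this run the states converge to 0.5 while the actions settle at the polarized profile (0, 1); the collective decision equals 0.5 and thus coincides with every agent's goal, showing that fixed points other than the unified one may arise in such dynamics.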

An example is the interaction of group members in an online social network. Based on their beliefs (states), they publicly express their opinions (assessments or actions) regarding some issue (phenomenon or process). In this case, the collective decision (opinion or assessment) may be, e.g., the average value of the expressed assessments (opinions). Some agents can apply informational control (without changing their states and actions); some honestly reveal their beliefs in assessments; some try to bring the collective assessment closer to their beliefs. The beliefs of some agents may "drift," depending on the current actions (both their own and other agents), control, and (or) collective assessment.

An equilibrium $y_i^*(a, \dots, a) = r_i^* = a \in [0, 1]$, *i* ∈ *N*, is called *unified*: the final decision and all states and actions of all agents are the same.

Under Assumptions 1–9, we have the following result:

**Proposition 1** ([2])**.** *Let Assumptions 1–9 hold, and let all constant degrees of trust and trust functions be strictly positive. Without any control (bi* = 0*, i* ∈ *N), a fixed point of the dynamic system (2) and (3) is the unified equilibrium.*

Indeed, substituting the unified equilibrium into the expressions (2) and (3), we obtain identities: the unified equilibrium satisfies (1) due to the properties of the utility function (see Assumption 3).

The unified equilibrium of the dynamic system (2) and (3) always exists, but its domain of attraction does not necessarily include all admissible initial states and actions. Moreover, it may be nonunique. Therefore, the properties of equilibria of the dynamic system (2) and (3) should be studied in detail, focusing on practically important particular cases.

#### **4. Particular Cases**

Several well-studied models represent particular cases of the dynamic model (2) and (3). Let us consider some of them; also, see the survey in [2].

#### *4.1. Models of Informational Control*

Models of informational control [29], in which the agent's opinions evolve under purposeful messages, e.g., from the *mass media*. In these models *ci* = *di* = *ei* = 0, *i* ∈ *N*:

$$r_i^t = \left(1 - b_i B_i\left(r_i^{t-1}, u_i^t\right)\right) r_i^{t-1} + b_i B_i\left(r_i^{t-1}, u_i^t\right) u_i^t, \ t = 1, 2, \dots, \ i \in N. \tag{6}$$

The agent's state dynamics model (6) was adopted in the book [29] to pose and solve informational control problems.

The dynamics of opinions, beliefs, and attitudes of a personality can be described by analogy; see a survey of the corresponding models of personality psychology in [1,21].
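Under a constant control message and a unitary trust function $B_i = 1$ (an assumed simplification), the update above reduces to geometric averaging toward the control value; the parameter values below are illustrative:

```python
# Pure informational control: c_i = d_i = e_i = 0 and B_i = 1 (assumed).
b = 0.3                      # constant degree of trust in the control source
u = 0.8                      # constant control message (e.g., from mass media)
r = 0.1                      # initial opinion (state)
for _ in range(100):
    r = (1 - b) * r + b * u  # state update under pure informational control
# The opinion converges geometrically (factor 1 - b) to the control value u.
```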

#### *4.2. Models of Consensus*

Models of *consensus* (see [29] and surveys in [23,31]). In this class of models, *bi* = *ci* = *di* = 0, and each agent averages his state with the states or actions of other agents:

$$E_i\left(r_i^{t-1}, y_{-i}^{t-1}\right) = \sum_{j \in N\backslash\{i\}} e_{ij}\, \Delta_i\left(r_i^{t-1}, y_j^{t-1}\right) y_j^{t-1}. \tag{7}$$

In other words, the expression (3) takes the form:

$$r_i^t = (1 - e_i)\, r_i^{t-1} + e_i \sum_{j \in N\backslash\{i\}} e_{ij}\, \Delta_i\left(r_i^{t-1}, y_j^{t-1}\right) y_j^{t-1}, \ t = 1, 2, \dots, \ i \in N, \tag{8}$$

where the elements *eij* of the link matrix (the links between different agents) satisfy the condition $\sum_{j \in N\backslash\{i\}} e_{ij} = 1$, *i* ∈ *N*.

The existence conditions of equilibria can be found in [23,29].
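For illustration, taking $\Delta_i \equiv 1$ and assuming each agent truthfully reveals his state ($y_j = r_j$), the consensus update reduces to DeGroot-style repeated averaging; the link weights and trust degree below are illustrative:

```python
import numpy as np

W = np.array([[0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7],
              [0.6, 0.4, 0.0]])   # link weights e_ij; each row sums to 1
e = 0.4                           # degree of trust e_i (taken equal for all agents)
r = np.array([0.0, 0.5, 1.0])     # initial states (opinions)
for _ in range(300):
    # r_i^t = (1 - e_i) r_i^{t-1} + e_i * sum_j e_ij * y_j^{t-1}, with y_j = r_j
    r = (1 - e) * r + e * (W @ r)
```

Because the update matrix $(1 - e)I + eW$ is row-stochastic and primitive here, the spread of opinions contracts geometrically and all states approach a common value in [0, 1].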

#### *4.3. Models of Conformity Behavior*

Models of conformity behavior (see [9,11] and a survey in [28]). In this class of models, *bi* = *ci* = *di* = 0, *ei* = 1 and each agent makes a binary choice between being active or passive (*Ai* = {0; 1}). Moreover, his action coincides with his state evolving as follows:

$$r_i^t = \begin{cases} 1, & \sum_{j \in N} e_{ij}\, y_j^{t-1} \ge \xi_i, \\ 0, & \sum_{j \in N} e_{ij}\, y_j^{t-1} < \xi_i, \end{cases} \quad t = 1, 2, \dots, \ i \in N, \tag{9}$$

where *ξi* ∈ [0, 1] is the agent's *threshold*. The agent demonstrates *conformity behavior* [9,11]: he begins to act when the weighted share of active agents reaches his threshold (the weights are the strengths of links between different agents). Otherwise, the agent remains passive. The dynamics of conformity behavior (9) were studied in the book [28].
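The threshold dynamics above can be sketched as a Granovetter-style cascade; the uniform weights $e_{ij} = 1/n$ and the particular thresholds are illustrative:

```python
import numpy as np

def conformity_step(y, xi, E):
    """One period of the threshold dynamics: agent i becomes (or stays)
    active iff the weighted share of active agents reaches his threshold."""
    return (E @ y >= xi).astype(int)

n = 5
E = np.full((n, n), 1.0 / n)              # uniform link weights e_ij = 1/n
xi = np.array([0.0, 0.2, 0.4, 0.6, 0.8])  # increasing thresholds
y = np.zeros(n, dtype=int)                # everyone passive at t = 0
for _ in range(n):
    y = conformity_step(y, xi, E)
```

Starting from an all-passive profile, the zero-threshold agent activates first, pushing the active share past the next threshold, and so on until all five agents are active.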

In the models of informational control, consensus, and conformity behavior, the main emphasis is on the agent's states: his actions are not considered, or the action is assumed to coincide with the state.

#### *4.4. Models of Social Influence*

Models of social influence (see a meaningful description of social influence effects and numerous examples in [13,16]). On the one hand, the models of informational control, consensus, and conformity behavior can undoubtedly be attributed to the models of *social influence*. On the other hand, the general model (3) reflects other social influence effects known in *social psychology*, including the dependence of beliefs, relationships, and attitudes on the previous experience of the agent's activity [20–22].

Similar effects occur under *cognitive dissonance*: an agent changes his opinions or beliefs that are in dissonance with the performed behavior, e.g., with the action he chooses (see arrow no. 6 in Figure 1). In this case, an adequate model has the form:

$$r_i^t = \left(1 - c_i\, C_i(r_i^{t-1}, y_i^{t-1})\right) r_i^{t-1} + c_i\, C_i(r_i^{t-1}, y_i^{t-1})\, y_i^{t-1}, \ t = 1, 2, \dots, \ i \in N,$$

($b_i = d_i = 0$, $e_{ij} = 0$). Within this model, the agent changes his state depending on the actions chosen.

Another example is *the hindsight effect* (explaining events by the retrospective view, "It figures"). This effect is the agent's inclination to perceive events that have already occurred or facts that have already been established, as obvious and predictable, despite insufficient initial information to predict them. In this case, an adequate model has the form:

$$r_i^t = \left(1 - d_i\, D_i(r_i^{t-1}, z^{t-1})\right) r_i^{t-1} + d_i\, D_i(r_i^{t-1}, z^{t-1})\, z^{t-1}, \ t = 1, 2, \dots, \ i \in N,$$

($b_i = c_i = 0$, $e_{ij} = 0$). Within this model, the agent changes his state depending on the activity result (see arrow no. 7 in Figure 1).

The two models mentioned were considered in detail in [2].

#### **5. Model of Voting**

Consider a decision-making procedure based on simple majority voting. Assume that the agents report their true opinions (actions) $y_i^t \in \{0; 1\}$: they either support a *decision* ($y_i^t = 1$) or not ($y_i^t = 0$). (Truth-telling means no strategic behavior.) The decision (the result of collective activity) is accepted ($z^t = 1$) if at least half of the agents voted for it; otherwise, the decision is rejected ($z^t = 0$): $z^t = I\left(\sum_{j \in N} y_j^t \ge \frac{n}{2}\right)$, where $I(\cdot)$ denotes the indicator function. Examples are: the election of some candidate or authority, the support of a resources or costs allocation variant, etc.

Agent $i$ has a type (opinion or belief) $r_i^t \in [0,1]$ reflecting his inclination to support the decision. Assume that the agent chooses his action depending on his type: $y_i^t = I\left(r_i^{t-1} \ge \frac{1}{2}\right)$, $i \in N$.

Let the dynamics of the agent's type be described by the procedure:

$$r_i^t = [1 - b_i - c_i - d_i]\, r_i^{t-1} + b_i\, u_i^t + c_i\, y_i^{t-1} + d_i\, z^{t-1}, \ t = 1, 2, \dots, \ i \in N,\tag{7}$$

where $u_i^t \in [0, 1]$ is the *control* (i.e., informational influence via mass media, social media, or personal communication), and the nonnegative *constant degrees of trust* ($b_i$, $c_i$, $d_i$) satisfy the constraints:

$$b_i + c_i + d_i \le 1, \ i \in N. \tag{8}$$

(Also, see the expression (3)).

Due to relations (8), the state of the dynamic system (7) stays within the admissible set $[0,1]^n$.
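A minimal simulation sketch of the dynamics (7) under simple-majority voting; the trust parameters are hypothetical values satisfying constraint (8):

```python
import numpy as np

def vote_step(r, u, b, c, d, z_prev):
    """One period of the type dynamics (7) under simple-majority voting."""
    y = (r >= 0.5).astype(int)           # actions from the previous types
    z = int(y.sum() >= len(y) / 2)       # majority decision z^t
    r_new = (1 - b - c - d) * r + b * u + c * y + d * z_prev
    return r_new, y, z

# Hypothetical trust parameters satisfying constraint (8): b + c + d <= 1.
r = np.array([0.3, 0.6, 0.4])
z = 0
for t in range(10):
    r, y, z = vote_step(r, u=np.ones(3), b=0.2, c=0.1, d=0.1, z_prev=z)
# Since each update is a convex combination of values in [0, 1], the types
# remain in the admissible set [0, 1]^n, as guaranteed by (8).
```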

According to the expression (7), the type $r_i^t$ of agent $i$ in period $t$ is a linear combination of the following quantities:

- his type $r_i^{t-1}$ in the previous period;
- the control $u_i^t$;
- his action $y_i^{t-1}$ chosen in the previous period;
- the result of collective activity $z^{t-1}$ in the previous period.
Within this model, an active system is controllable if the action of any agent can be changed to the opposite in finite time using admissible controls according to (7).

Let $\{r_i^0 \in [0, 1]\}$ be the given initial types of all agents. Consider different modifications of the model (7), as described in Table 2.


**Table 2.** Modifications of model (7).

| Modification | Constraints on the parameters |
|---|---|
| 1 | $b_i = c_i = d_i = 0$ |
| 2 | $c_i = d_i = 0$ |
| 3 | $b_i = d_i = 0$ |
| 4 | $b_i = c_i = 0$ |
| 5 | $d_i = 0$ |
| 6 | $c_i = 0$ |
| 7 | $b_i = 0$ |
| 8 | general case (7) |

Modification 1 corresponds to no influence on the types of any agents. In these conditions, the types are static: $r_i^t = r_i^0$, $t = 1, 2, \dots$, $i \in N$.

Modification 2. Here the expression (7) takes the form $r_i^t = [1 - b_i]\, r_i^{t-1} + b_i\, u_i^t$, $t = 1, 2, \dots$, $i \in N$.

**Proposition 2.** *In modification 2 with $b_i > 0$, $i \in N$, the system (7) is controllable. For $u_i^t \in \{0; 1\}$ and $b_i > \max\left\{\frac{1/2 - r_i^0}{1 - r_i^0};\ 1 - \frac{1}{2 r_i^0}\right\}$, $i \in N$, the action of any agent can be changed to the opposite in one period.*

The lower bounds for the constants $\{b_i\}$ in Propositions 2, 4, 5, and 6 characterize the minimal "strength" of informational control, or the minimal trust in the source of the control information, required for the system's controllability.

Modification 3. Here the expression (7) takes the form:

$$r\_i^t = \left[1 - c\_i\right] r\_i^{t-1} + c\_i \ y\_i^{t-1}, \ t = 1, 2, \dots, i \in N.$$

In this modification, the types of agents vary, but their actions and the activity result are *stationary*: $y_i^t = y_i^0$, $z^t = z^0$, $t = 1, 2, \dots$, $i \in N$. The agents become increasingly convinced of the correctness of their beliefs and initial action.

Modification 4. Here the expression (7) takes the form:

$$r_i^t = [1 - d_i]\, r_i^{t-1} + d_i\, z^{t-1}, \ t = 1, 2, \dots, \ i \in N. \tag{9}$$

In this modification, the types and actions of agents vary, but the activity result is *stationary*: $z^t = z^0$, $t = 1, 2, \dots$. The prior majority of agents do not change their actions and, affecting those who prefer another alternative, gradually draw the latter to their side.

**Proposition 3.** *In modification 4 with $d_i > 0$, $i \in N$, for any initial conditions $\{r_i^0 \in [0, 1]\}$ the system (9) has the unique equilibrium $z^0$.*
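Proposition 3 can be checked numerically for illustrative (hypothetical) parameter values: with a stationary activity result $z^0$, the update $r_i^t = (1 - d_i)\, r_i^{t-1} + d_i\, z^0$ contracts every type toward $z^0$ whenever $d_i > 0$:

```python
import numpy as np

# Numerical check of Proposition 3 for modification 4 (illustrative values):
# r_i^t = (1 - d_i) r_i^{t-1} + d_i z^0 contracts toward z^0 when d_i > 0.
r = np.array([0.1, 0.9, 0.5])     # arbitrary initial types
d = np.array([0.3, 0.2, 0.6])     # positive degrees of trust d_i
z0 = 1.0                          # stationary activity result
for _ in range(200):
    r = (1 - d) * r + d * z0      # geometric convergence, rate (1 - d_i)
```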

Modification 5. Here the expression (7) takes the form:

$$r_i^t = [1 - b_i - c_i]\, r_i^{t-1} + b_i\, u_i^t + c_i\, y_i^{t-1}, \ t = 1, 2, \dots, \ i \in N. \tag{10}$$

Writing the monotonicity condition for the agent's type depending on the control goal, we easily establish the following result.

**Proposition 4.** *In modification 5 with $b_i > c_i$, $i \in N$, the system (10) is controllable.*

Modification 6. Here the expression (7) takes the form:

$$r_i^t = [1 - b_i - d_i]\, r_i^{t-1} + b_i\, u_i^t + d_i\, z^{t-1}, \ t = 1, 2, \dots, \ i \in N. \tag{11}$$

Writing the monotonicity condition for the agent's type depending on the control goal, we easily establish the following result:

**Proposition 5.** *In modification 6 with $b_i > d_i$, $i \in N$, the system (11) is controllable.*

Modification 7. Here there is no control, and the expression (7) takes the form:

$$r\_i^t = \left[1 - c\_i - d\_i\right] r\_i^{t-1} + c\_i \, y\_i^{t-1} + d\_i \, z^{t-1}, \; t = 1, \; 2, \; \dots, \; i \in N.$$

In this modification, the types of agents and, generally speaking, their actions vary, but the activity result is *stationary*: $z^t = z^0$, $t = 1, 2, \dots$. The prior majority of agents do not change their actions and, affecting those who prefer another alternative, possibly draw the latter to their side gradually (depending on the relation between the parameters $c_i$ and $d_i$).

Modification 8. Here the type dynamics are described by the general expression (7). Writing the monotonicity condition for the agent's type depending on the control goal, we easily establish the following result:

**Proposition 6.** *In modification 8 with $b_i > 3(c_i + d_i)$, $i \in N$, the system (7) is controllable.*

Concluding this subsection, we also mention an interesting modification of the procedure (7): no control and *anti-conformists* (the agents choosing actions to obtain a result different from the majority's one):

$$r\_i^t = \left[1 - c\_i - d\_i\right] r\_i^{t-1} + c\_i \, y\_i^{t-1} + d\_i \left(1 - z^{t-1}\right), \; t = 1, \; 2, \; \dots, \; i \in N.$$

Example. Consider an illustrative example with three agents having the initial types $r_1^0 = 0.3$, $r_2^0 = 0.6$, and $r_3^0 = 0.4$. Assume that the cognitive dissonance effect is absent ($c_i = 0$, $i = 1, 2, 3$). The first agent does not change his type: $d_1 = 0$. The second and third agents are anti-conformists: $d_2 = 0.1$ and $d_3 = 0.1$. The dynamics of the agents' types (second and third agents) and the activity result (unstable!) are shown in Figure 2.

**Figure 2.** Dynamics of agents' types and activity result in the example.
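The example can be replayed with a short simulation. The timing convention below (the decision is recomputed from the updated types each period) is an assumption, but it reproduces the qualitative instability of the activity result seen in Figure 2:

```python
import numpy as np

# Replay of the example: c_i = 0 for all agents, d_1 = 0 (the first agent
# keeps his type), d_2 = d_3 = 0.1 (anti-conformists).
r = np.array([0.3, 0.6, 0.4])
d = np.array([0.0, 0.1, 0.1])
z = int((r >= 0.5).sum() >= 1.5)      # initial activity result
zs = []
for t in range(30):
    r = (1 - d) * r + d * (1 - z)     # anti-conformist type update
    z = int((r >= 0.5).sum() >= 1.5)  # simple-majority decision
    zs.append(z)
# The activity result keeps switching between 0 and 1, i.e., it is unstable.
```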

#### **6. Model of Informational Confrontation**

Consider three agents: the first and second agents perform informational control (choose controls as their actions), affecting (due to the *informational influence*) the type (internal state—opinion or belief) of the third agent. The common activity result for all agents is the state of the third agent by a terminal period *T*.

Let the opinion $r^t$ of the third agent in period $t$ be a linear combination of his opinion and the opinions of the first and second agents in the previous period: $r^t = [1 - b_1 - b_2]\, r^{t-1} + b_1 r_1^{t-1} + b_2 r_2^{t-1}$. (All opinions have the range $[0, 1]$.)

Assume that the goals of the first and second agents are opposite (the first one is interested in driving $r^t$ to the state "0", while the second one to the state "1") and that their states are invariable: $r_1^t \equiv 0$, $r_2^t \equiv 1$. The interpretations of the agents' states are the same as in Section 4 above.

If, in each period, the agents exchanged their opinions (true states), the opinion dynamics would be $r^t = [1 - b_1 - b_2]\, r^{t-1} + b_2$.

The controls of the first and second agents are to inform the third agent about their opinions in some periods. Therefore, we have:

$$r^t = \left[1 - b\_1 I\left(y\_1^t = 1\right) - b\_2 I\left(y\_2^t = 1\right)\right] r^{t-1} + b\_1 I\left(y\_1^t = 1\right) r\_1^{t-1} + b\_2 I\left(y\_2^t = 1\right) r\_2^{t-1}.$$

The sets of admissible actions have the form $y_i^t \in \{0; 1\}$, $i = 1, 2$ (such controls are called *binary*). Then $y_i^t = I\left(y_i^t = 1\right)$, $i = 1, 2$. Substituting $r_1^t \equiv 0$, $r_2^t \equiv 1$, we arrive at the following state dynamics of the third agent:

$$r^t = \left[1 - b_1 y_1^t - b_2 y_2^t\right] r^{t-1} + b_2 y_2^t, \ t = 1, 2, \dots, \tag{12}$$

where $b_1 + b_2 \le 1$ and $r^0$ is a given initial state. (Also, see the expressions (3) and (7) above.) Let the first agent be interested in minimizing the terminal state $r^T$, whereas the second one in maximizing it. Note that the consumption of resources and other costs are not included in the goal functions.

In a practical interpretation, the state of the third agent (his opinion, belief, or attitude towards some issue or phenomenon) is reduced by the first agent and increased by the second. There is an informational confrontation between the first and second agents, described by game theory. In the dynamic case considered below, we have a differential game; static models of informational confrontation and models of repeated games can be found in [28,29].
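The difference Equation (12) is easy to iterate for given binary control sequences; the parameter values below are hypothetical:

```python
def confrontation(r0, b1, b2, y1, y2):
    """Iterate the state dynamics (12) of the third agent for given
    binary control sequences y1 and y2 of the two influencing agents."""
    r = r0
    traj = [r]
    for a1, a2 in zip(y1, y2):
        r = (1 - b1 * a1 - b2 * a2) * r + b2 * a2
        traj.append(r)
    return traj

# Hypothetical scenario: both agents act in every period; the fixed point of
# r = (1 - b1 - b2) r + b2 is b2 / (b1 + b2) = 0.4 for b1 = 0.3, b2 = 0.2.
traj = confrontation(r0=0.5, b1=0.3, b2=0.2, y1=[1] * 50, y2=[1] * 50)
```

The trajectory converges geometrically (contraction factor $1 - b_1 - b_2$) to $b_2/(b_1+b_2)$, the discrete-time counterpart of the asymptotic value derived for the differential model below.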

According to (12), the combinations presented in Table 3 are possible in each period.

**Table 3.** The possible combinations in each period.

| $y_1^t$ | $y_2^t$ | State dynamics (12) |
|---|---|---|
| 0 | 0 | $r^t = r^{t-1}$ (state unchanged) |
| 1 | 0 | $r^t = (1 - b_1)\, r^{t-1}$ (state decreases) |
| 0 | 1 | $r^t = (1 - b_2)\, r^{t-1} + b_2$ (state increases) |
| 1 | 1 | $r^t = (1 - b_1 - b_2)\, r^{t-1} + b_2$ |


In the latter case, the state of the third agent has a nonnegative increment if $b_2 \ge b_1 \frac{r^{t-1}}{1 - r^{t-1}}$. A differential counterpart of the difference Equation (12) has the form:

$$
\dot{r}(t) = -\left[b\_1 y\_1(t) + b\_2 y\_2(t)\right] r(t) + b\_2 y\_2(t). \tag{13}
$$

Assume that the actions of the first and second agents are subject to the integral resource constraints (i.e., resources for customized publications in mass media or posts in social media, advertising costs, etc.):

$$\int_{0}^{T} y_i(t)\, dt \le C_i, \ i = 1, 2. \tag{14}$$

First, let us study several special cases.

Case 1 (control applied by the first agent only). Substituting $y_2^t \equiv 0$ or (and) $b_2 \equiv 0$ into (13), we obtain the differential equation $\dot{r}(t) = -b_1\, y_1(t)\, r(t)$. Due to the constraint (14), the solution $r(t) = r^0 \exp\left\{-b_1 \int_0^t y_1(\tau)\, d\tau\right\}$ yields the estimate $r(T) = r^0 \exp\{-b_1 C_1\}$ of the terminal state, which is independent of the trajectory $y_1(t)$.

Case 2 (control applied by the second agent only). Substituting $y_1^t \equiv 0$ or (and) $b_1 \equiv 0$ into (13), we obtain the differential equation $\dot{r}(t) = b_2\, y_2(t)\, (1 - r(t))$. Due to the constraint (14), the solution $r(t) = 1 - (1 - r^0) \exp\left\{-b_2 \int_0^t y_2(\tau)\, d\tau\right\}$ yields the estimate $r(T) = 1 - (1 - r^0) \exp\{-b_2 C_2\}$ of the terminal state, which is independent of the trajectory $y_2(t)$.
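The trajectory-independence claim of Case 1 can be verified numerically: two control schedules with the same consumed resource $C_1$ yield the same terminal state. The sketch below uses Euler discretization with illustrative values:

```python
import math

def terminal_state_case1(y1, r0, b1, dt):
    """Euler integration of dr/dt = -b1 * y1(t) * r(t) (Case 1)."""
    r = r0
    for a in y1:
        r += dt * (-b1 * a * r)
    return r

# Two schedules with the same consumed resource C1 = 1 on the horizon T = 2.
dt = 1e-4
early = [1.0] * 10000 + [0.0] * 10000   # spend everything at the start
late = [0.0] * 10000 + [1.0] * 10000    # spend everything at the end
rA = terminal_state_case1(early, r0=0.8, b1=0.5, dt=dt)
rB = terminal_state_case1(late, r0=0.8, b1=0.5, dt=dt)
# Both agree with the closed form r0 * exp(-b1 * C1) up to discretization error.
```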

Case 3 (unlimited resources; both agents choose the actions $y_1^t \equiv 1$, $y_2^t \equiv 1$ in all periods). In this case, Equation (13) takes the form:

$$
\dot{r}(t) = -\left[b\_1 + b\_2\right]r(t) + b\_2. \tag{15}
$$

The solution is given by:

$$r(t) = \frac{b\_2}{b\_1 + b\_2} - \left(\frac{b\_2}{b\_1 + b\_2} - r^0\right) e^{-(b\_1 + b\_2)t}.\tag{16}$$

*The characteristic time* is $\tau^0 \sim \frac{3}{b_1 + b_2}$, and *the asymptotic value* is $r^\infty = \frac{b_2}{b_1 + b_2}$.
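A quick numerical check, under illustrative parameter values, that the closed form (16) satisfies Equation (15) and that the state settles near $r^\infty$ after roughly three characteristic times:

```python
import math

def r_closed(t, r0, b1, b2):
    """Closed-form solution (16) of Equation (15)."""
    rinf = b2 / (b1 + b2)
    return rinf - (rinf - r0) * math.exp(-(b1 + b2) * t)

# Euler integration of dr/dt = -(b1 + b2) r + b2 with illustrative values.
b1, b2, r0, dt = 0.3, 0.2, 0.1, 1e-4
r = r0
for _ in range(50000):                  # integrate up to t = 5
    r += dt * (-(b1 + b2) * r + b2)
# r matches r_closed(5.0, ...); after ~3/(b1 + b2) time units the state is
# within a few percent of the asymptotic value b2 / (b1 + b2).
```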

Now, we return to the general case (13). Let $c_i(t) = \int_0^t y_i(\tau)\, d\tau \in [0; t]$, $c_i(T) \le C_i$, $i = 1, 2$, denote the resource consumption of agent $i$ by period $t$, a nondecreasing function of time. The choice of these functions by the first and second agents can be treated as their strategies.

The solution of Equation (13) is given by:

$$r(c\_1(\cdot), c\_2(\cdot), t) = \frac{r^0 + b\_2 \int\_0^t y\_2(\tau) \exp\{b\_1 c\_1(\tau) + b\_2 c\_2(\tau)\} d\tau}{\exp\{b\_1 c\_1(t) + b\_2 c\_2(t)\}}. \tag{17}$$

Consider the differential zero-sum two-person (antagonistic) game in normal form [32,33] played by the first two agents. At the initial time instant of this game, the first and second agents choose their open-loop strategies $y_1(t)|_{t=0}^{T}$ and $y_2(t)|_{t=0}^{T}$, respectively, once, simultaneously, and independently of one another.

Further analysis will be restricted to the class of strategies with a single switch. In this class, at the initial time instant, the first and second agents simultaneously and independently choose some instants *t*<sup>1</sup> and *t*2, respectively, when they start consuming their resource (apply controls) until complete exhaustion. Therefore, the open-loop strategies have the form:

$$y_i(t_i, C_i, t) = \begin{cases} 0, & t < t_i; \\ 1, & t \in [t_i, t_i + C_i]; \\ 0, & t > t_i + C_i. \end{cases} \tag{18}$$

The functional (17) monotonically decreases in $c_1(\cdot)$ and increases in $c_2(\cdot)$. Hence, the first and second agents benefit from consuming the entire resource, and consequently, $t_1 \le T - C_1$ and $t_2 \le T - C_2$.

There are four possible relations among the parameters *C*1, *C*2, and *T*.

The first relation: $T \le \min\{C_1; C_2\}$ (both agents have enough resources).

Here, the Nash equilibrium strategies are $y_i^t \equiv 1$ for all $t \in [0, T]$, $i = 1, 2$, due to the monotonicity mentioned above.

The second and third relations: for some $i = 1, 2$, $C_i \ge T$ and $C_{3-i} < T$.

Here, for agent $i$, the optimal strategy is $y_i^t \equiv 1$ for all $t \in [0, T]$. For agent $(3 - i)$, the optimal switching instant $t_{3-i}$ is the solution of a scalar optimization problem. The case $t_{3-i} = T - C_{3-i}$ is of practical interest. Note that the binary control is optimal under the constraints $y_i^t \in [0, 1]$, $i = 1, 2$, due to the linearity of (13) in the controls.

The fourth relation: $T > \max\{C_1; C_2\}$ (both agents lack resources).

Here the agents play a complete game. If $\tau^0 \ll \min\{C_1; C_2\}$, then the equilibrium of this game is $t_1^* = T - C_1$, $t_2^* = T - C_2$. Therefore, both agents start spending resources as late as possible, and the terminal value is $r(T) \approx r^\infty$. The same pair of strategies will be an equilibrium for $T \gg C_1 + C_2$ (when the quantities of resources are such that the controls are short-term on the scale of the period $T$). The practical interpretation is "save all reserves until the last decisive moment".

Hence, the results of this section give optimal strategies of the first two agents and characterize the equilibrium of their informational confrontation.

#### **7. Conclusions**

The main result is a general model (1)–(3) of the joint dynamics of agents' actions and internal states, depending both on previous actions and states and on the environment and the results of activity (see Figure 1). It allows combining the methods and approaches of various decision-making paradigms, game theory, and social psychology, covering both external and internal aspects of collective strategic decision-making.

Many known models and results of the above-mentioned scientific domains (reflecting the effects of consensus, threshold behavior, cognitive dissonance, informational influence, control, and confrontation) turn out to be particular cases of the general model.

Three main directions seem promising for future research. First, the analysis of the general models in order to derive maximally general yet analytical conditions for equilibrium existence and uniqueness, together with its comparative statics. Second, the development of new particular/applied models of collective activity, organizational behavior, and management, taking into account not only "economic" rationality but psychological aspects as well. The third direction is the identification and verification of the models to bring them closer to reality and practical applications.

**Funding:** This research received no external funding.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


## *Article* **Multi-Output Soft Sensor with a Multivariate Filter That Predicts Errors Applied to an Industrial Reactive Distillation Process**

**Vladimir Klimchenko <sup>1</sup>, Andrei Torgashov <sup>1,\*</sup>, Yuri A. W. Shardt <sup>2</sup> and Fan Yang <sup>3</sup>**


**Abstract:** The paper deals with the problem of developing a multi-output soft sensor for the industrial reactive distillation process of methyl tert-butyl ether production. Unlike the existing soft sensor approaches, this paper proposes using a soft sensor with filters to predict model errors, which are then taken into account as corrections in the final predictions of outputs. The decomposition of the problem of optimal estimation of time delays is proposed for each input of the soft sensor. Using the proposed approach to predict the concentrations of methyl sec-butyl ether, methanol, and the sum of dimers and trimers of isobutylene in the output product in a reactive distillation column was shown to improve the results by 32%, 67%, and 9.5%, respectively.

**Keywords:** soft sensing; multivariate filter; reactive distillation

#### **1. Introduction**

As the size and complexity of industrial systems increase, there is a need to accurately measure most process variables. Unfortunately, not all variables can be accurately measured using online hard sensors. For certain variables, such as concentration or density, the only accurate measurements can be obtained by manually taking samples and analyzing them in a laboratory. One solution to this problem is the development of soft sensors, which take the easy-to-measure variables and create models to predict the hard-to-measure variables [1].

All soft sensor systems consist of a process model that takes the easy-to-measure variables and provides an estimate of the hard-to-measure variables. These models can be constructed using methods ranging from linear regression to principal component analysis and support vector machines. Although the main focus has been on the development of the soft sensor models [2–5], advanced soft sensor systems also have a bias update term that can use any slowly sampled information to update the soft sensor prediction [1]. This bias update term is normally designed as some function of the difference between the predicted and measured values [6]. It should be noted that the measured values are often sampled very slowly and with considerable time delay. This means that at the points where there are no updates, the previously available bias value is used. When such a system is properly designed, it can provide good tracking of the process, i.e., the predicted and measured values are close to each other.

Recently, it has been suggested that instead of only using the available slowly sampled data for updating the bias term, it should be possible to also model the historical errors and use them to predict the future errors [7]. It has been shown that such an approach can improve the overall performance of the soft sensor system. However, there still remain issues with how best to model and implement this predictive bias update term.

**Citation:** Klimchenko, V.; Torgashov, A.; Shardt, Y.A.W.; Yang, F. Multi-Output Soft Sensor with a Multivariate Filter That Predicts Errors Applied to an Industrial Reactive Distillation Process. *Mathematics* **2021**, *9*, 1947. https://doi.org/10.3390/math9161947

Academic Editor: Michal Fečkan

Received: 20 June 2021 Accepted: 13 August 2021 Published: 15 August 2021



Furthermore, there are issues with incorporating time delays into this approach since they will greatly increase the size of the required search space.

Therefore, this paper will examine the development of a predictive bias update term for a nonlinear system using dimension reduction. The proposed approach will be tested using data from an industrial reactive distillation column that produces methyl tert-butyl ether (MTBE).

#### **2. Background**

Consider the soft sensor system shown in Figure 1, where $u_t$ is the input, $y_t$ is the measured (true) output, $\hat{y}_{m,t}$ is the predicted soft sensor value, $\hat{y}_{\alpha,t}$ and $\hat{y}_{\beta,t}$ are intermediate soft sensor values, $G_p$ is the true process, $\hat{G}_p$ is the soft sensor process model, and $G_B$ is the bias update term. It can be noted that the purpose of the bias update term is to take the information from the measured values and correct the output of the soft sensor system. This correction compensates primarily for the unknown disturbances and the inherent plant-model mismatch.

**Figure 1.** Soft sensor system of interest [1].

Another approach to this problem is to re-arrange the bias update term so that it contains a predictive model that can predict the errors between the measured and predicted values. This re-arrangement is shown in Figure 2, where the predicted value from the soft sensor is corrected based on the modeled errors of the system. The question becomes how to design this model so that the best predictions can be obtained.

For the prediction of time series, the Box-Jenkins methodology is traditionally used, according to which the time series model is sought in the class of autoregressive moving-average (ARMA) models, i.e., as a rational algebraic function of the backward shift operator. The flexibility of the ARMA class makes it possible to find parsimonious models, i.e., models whose adequacy is achieved with a small number of estimated parameters. Since this property is especially important for empirical models, the Box-Jenkins methodology is widely used to solve various practical problems. This approach is adopted in this paper.

In industrial processes, where it is desired to implement the model on programmable logic controller (PLC) units, the complexity of the model $\hat{G}_p$ can be an issue. Therefore, this paper will consider a simple model for $\hat{G}_p$ of the form

$$y_t = b_0 + \mathbf{b}\mathbf{x}_t + e_t \tag{1}$$

where $\mathbf{b}$ are the parameters to be estimated and $\mathbf{x}_t$ is the input(s). Model (1) can be improved by taking into account possible delays of the output variables relative to the inputs. Consider the following model for a multi-output soft sensor

$$y_{t,m} = \mathbf{b}_m\, \mathbf{u}_m(t, \boldsymbol{\tau}_m) + e_{t,m} \tag{2}$$

where $t = 1, 2, \dots, n$; $m = 1, 2, 3$ (the number of outputs $m$ is given by the industrial production team and reflects the key quality indices of the MTBE product); $\mathbf{b}_m = (b_{m,1}, b_{m,2}, \dots, b_{m,10})$ is a row vector of unknown coefficients; $\boldsymbol{\tau}_m = (\tau_{m,1}, \tau_{m,2}, \dots, \tau_{m,10})$ is a row vector of unknown time delays; $\mathbf{u}_m(t, \boldsymbol{\tau}_m) = (u_{t,m,1}, u_{t,m,2}, \dots, u_{t,m,10})^{\mathrm{T}}$; and $u_{t,m,k}$ is the measurement of the $x_k$ value at time $t - \tau_{m,k}$ with $k = 1, 2, \dots, 10$. Please note that it has been assumed here that the maximal time delay is 10 samples, which is justified from the viewpoint of the industrial process dynamics. However, the approach can easily be extended to arbitrary values.

Solving model (2) by minimizing the mean squared error (MSE) gives estimates of the unknown parameters $\hat{\mathbf{b}}_m$ and $\hat{\boldsymbol{\tau}}_m$. The MSE depends not only on the coefficients $\mathbf{b}_m$ but also on the delays $\boldsymbol{\tau}_m$, i.e.,

$$D_{em}(\mathbf{b}_m, \boldsymbol{\tau}_m) = \frac{1}{n} \sum_{t=1}^{n} \left\{ y_{t,m} - \mathbf{b}_m \mathbf{u}_m(t, \boldsymbol{\tau}_m) \right\}^2, \ m = 1, 2, 3. \tag{3}$$

Thus,

$$\left(\hat{\mathbf{b}}_m, \hat{\boldsymbol{\tau}}_m\right) = \arg\min_{\mathbf{b}_m, \boldsymbol{\tau}_m} D_{em}(\mathbf{b}_m, \boldsymbol{\tau}_m). \tag{4}$$

Please note that if $D_{em}(\mathbf{b}_m^*, \boldsymbol{\tau}_m^*) = \min_{\mathbf{b}_m, \boldsymbol{\tau}_m} D_{em}(\mathbf{b}_m, \boldsymbol{\tau}_m)$, then $D_{em}(\mathbf{b}_m^*, \boldsymbol{\tau}_m^*) = \min_{\mathbf{b}_m} D_{em}(\mathbf{b}_m, \boldsymbol{\tau}_m^*)$.

Consequently,

$$\min_{\mathbf{b}_m, \boldsymbol{\tau}_m} D_{em}(\mathbf{b}_m, \boldsymbol{\tau}_m) = \min_{\boldsymbol{\tau}_m} \left\{ \min_{\mathbf{b}_m} D_{em}(\mathbf{b}_m, \boldsymbol{\tau}_m) \right\} = \min_{\boldsymbol{\tau}_m} D_{em}(\hat{\mathbf{b}}_m, \boldsymbol{\tau}_m). \tag{5}$$

Furthermore, the estimates $\hat{\mathbf{b}}_m$ are found using standard regression analysis, which gives

$$\hat{\mathbf{b}}_m = \left\{ \left( \mathbf{U}_m^{\mathrm{T}} \mathbf{U}_m \right)^{-1} \mathbf{U}_m^{\mathrm{T}} \mathbf{Y}_m \right\}^{\mathrm{T}}, \ m = 1, 2, 3, \tag{6}$$

where $\mathbf{Y}_m$ is the $m$-th column of the matrix $\mathbf{Y}$ and $\mathbf{U}_m$ is a matrix of dimension $n \times 10$ whose $t$-th row is $\mathbf{u}_m(t, \boldsymbol{\tau}_m)^{\mathrm{T}}$.
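For fixed integer delays, the estimator (6) is ordinary least squares on the delay-shifted input matrix $\mathbf{U}_m$. A minimal sketch on synthetic data (the function name, data, and delay values are illustrative; the search over the delays themselves is not shown here):

```python
import numpy as np

def ols_for_delays(y, X, tau):
    """For fixed integer delays tau, build the matrix U whose t-th row is
    u_m(t, tau) and return the least-squares estimate of b_m, as in (6)."""
    n, K = X.shape
    tmax = max(tau)
    # Column k holds x_k(t - tau_k) for t = tmax, ..., n - 1.
    U = np.column_stack([X[tmax - tau[k]: n - tau[k], k] for k in range(K)])
    Y = y[tmax:]
    b_hat, *_ = np.linalg.lstsq(U, Y, rcond=None)
    return b_hat

# Synthetic check: y_t = 2*x1(t-1) - x2(t-3), noise-free (hypothetical data).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = np.zeros(200)
y[3:] = 2 * X[2:-1, 0] - X[:-3, 1]
b_hat = ols_for_delays(y, X, tau=[1, 3])
```

With the correct delays supplied, the estimator recovers the true coefficients exactly on noise-free data.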

Since all variables are measured at discrete moments in time, gradient descent methods cannot be directly applied to minimize the objective function $D_{em}(\hat{\mathbf{b}}_m, \boldsymbol{\tau}_m)$ with respect to the argument $\boldsymbol{\tau}_m$. However, this difficulty can be avoided by computing $D_{em}$ for arbitrary values of the elements of the vector $\boldsymbol{\tau}_m$ by interpolating between the nearby nodes of the discrete grid. Interpolation in a search space of large dimension is a difficult problem, so properties such as transparency and relative simplicity of the algorithm come to the fore. Therefore, in this situation, polynomial interpolation is the most preferable.

#### *2.1. Error Modeling*

If the error $e_{t,m}$ were known at time $t-1$, then using Equation (2), it would be possible to predict the variable $y_{t,m}$ with absolute accuracy. Unfortunately, the error $e_{t,m}$ is not known in advance, but it can be predicted using any statistical patterns found in the sequence $e_{1,m}, e_{2,m}, \dots$. This error prediction can be used as a correction to model (2), as shown in Figure 2, thereby improving the prediction accuracy of the output variable $y_{t,m}$. To evaluate a predictive model for the sequence $e_{1,m}, e_{2,m}, \dots$, let us consider the class of ARMA models. Let us represent the predicted process as the output of an invertible linear filter, called a shaping filter, driven by white noise, i.e., a process with a constant spectral density. In this case, the transfer function of the shaping filter is a rational algebraic function of the backward shift operator, i.e.,

$$e_t = \frac{\prod_{l=1}^{N_n} (1 - H_l q^{-1})}{\prod_{k=1}^{N_d} (1 - G_k q^{-1})}\, \varepsilon_t \tag{7}$$

where $\varepsilon_t$ and $e_t$ are the values of the input and output processes of the shaping filter at time $t$; $N_n$ is the order of the moving average; $N_d$ is the order of the autoregressive component; $H_l$ and $G_k$ are constants (generally speaking, complex-valued); and $q^{-1}$ is the backshift operator. The stationarity and invertibility conditions, which are necessary to predict the $e_t$ process, are [8]

$$|G\_k| < 1, \ k = 1, \dots, N\_d; \ |H\_l| < 1, \ l = 1, \dots, N\_n \tag{8}$$

The flexibility of the ARMA class provides the possibility of finding parsimonious models, i.e., the adequacy of the constructed model is achieved with a relatively small number of estimated parameters. Since this property is especially important for empirical models, the models with the structure given in Equation (7) and their variants are widely used for solving practical problems.
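The constants $H_l$ and $G_k$ of the factored form (7) are the roots of the numerator and denominator polynomials, so conditions (8) can be checked directly. A sketch for a hypothetical ARMA(1,1) shaping filter (the coefficient values 0.4 and 0.8 are illustrative):

```python
import numpy as np

def filter_constants(coeffs):
    """Return the constants of the factored form (7) for a polynomial
    1 - a1*q^-1 - ... - aN*q^-N (cf. Equation (9)): they are the roots
    of z^N - a1*z^(N-1) - ... - aN."""
    return np.roots([1.0] + [-a for a in coeffs])

# Hypothetical ARMA(1,1) filter: e_t = (1 - 0.4 q^-1)/(1 - 0.8 q^-1) eps_t.
H = filter_constants([0.4])   # moving-average constants H_l
G = filter_constants([0.8])   # autoregressive constants G_k
ok = bool(np.all(np.abs(G) < 1) and np.all(np.abs(H) < 1))  # conditions (8)

# Simulating the shaping filter driven by white noise eps_t:
rng = np.random.default_rng(1)
eps = rng.standard_normal(1000)
e = np.zeros(1000)
for t in range(1, 1000):
    e[t] = 0.8 * e[t - 1] + eps[t] - 0.4 * eps[t - 1]
```

Since all the constants lie strictly inside the unit circle, the simulated process is both stationary and invertible, as conditions (8) require.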

The filter for predicting the *et* process can be found using the prediction error method (PEM) [9]. Expanding the brackets in Equation (7) gives

$$e\_{t} = \frac{1 - \theta\_{1}q^{-1} - \dots - \theta\_{N\_n}q^{-N\_n}}{1 - \eta\_{1}q^{-1} - \dots - \eta\_{N\_d}q^{-N\_d}}\, \varepsilon\_{t} \tag{9}$$

where *θ<sup>l</sup>* and *η<sup>k</sup>* are the model parameters. It is assumed that the polynomials in the numerator and denominator have no common roots, since otherwise it would be possible to reduce the common multipliers in the numerator and denominator of Equation (7).
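The prediction-error idea behind PEM can be illustrated on a scalar ARMA(1,1) version of model (9): the one-step prediction errors are computed recursively and their mean square is minimized over a coarse grid restricted to the stability/invertibility region (8). The simulated data, orders, and grid below are hypothetical, and a real PEM routine would use gradient-based optimization rather than a grid search:

```python
import numpy as np

def pem_mse(eta, theta, e):
    """Mean square of one-step prediction errors for the ARMA(1,1) model
    (1 - eta q^-1) e_t = (1 - theta q^-1) eps_t, computed recursively."""
    eps = np.zeros_like(e)
    for t in range(1, len(e)):
        eps[t] = e[t] - eta * e[t - 1] + theta * eps[t - 1]
    return np.mean(eps[1:] ** 2)

rng = np.random.default_rng(0)
# simulate an ARMA(1,1) error sequence with eta = 0.5, theta = 0.2
n, eta0, theta0 = 2000, 0.5, 0.2
w = rng.standard_normal(n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = eta0 * e[t - 1] + w[t] - theta0 * w[t - 1]

# coarse prediction-error search over the region |eta| < 1, |theta| < 1
grid = np.linspace(-0.9, 0.9, 19)
eta_hat, theta_hat = min(((a, b) for a in grid for b in grid),
                         key=lambda p: pem_mse(p[0], p[1], e))
print(eta_hat, theta_hat)  # should land near (0.5, 0.2)
```

The recursion in `pem_mse` is exactly the inverse filter of the ARMA model, so the quantity being minimized is the empirical variance of the reconstructed innovations.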

The PEM function finds the parameter values that minimize the predictive MSE of the *e<sub>t</sub>* process for given polynomial orders (*N<sub>n</sub>*, *N<sub>d</sub>*) and initial estimates of the parameters *θ<sub>l</sub>* and *η<sub>k</sub>*. Suitable polynomial orders can be chosen from sample estimates of the spectral density of the considered process. Recall that the frequency response of the shaping filter is the value of Equation (7) on the circle of unit radius centered at the origin, and the spectral density *S*(*ω*) of the output process *e<sub>t</sub>* equals the product of the variance of the input process and the squared modulus of the frequency response, i.e., [10]

$$S(\omega) = \sigma\_\varepsilon^2 \, \frac{\prod\_{l=1}^{N\_n} (1 - H\_l e^{-j\omega})}{\prod\_{k=1}^{N\_d} (1 - G\_k e^{-j\omega})} \, \frac{\prod\_{l=1}^{N\_n} (1 - \overline{H}\_l e^{j\omega})}{\prod\_{k=1}^{N\_d} (1 - \overline{G}\_k e^{j\omega})} \tag{10}$$

where *σ*<sub>*ε*</sub><sup>2</sup> is the variance of the random process *ε<sub>t</sub>*, and *H̄<sub>l</sub>* and *Ḡ<sub>k</sub>* are the complex conjugates of the constants *H<sub>l</sub>* and *G<sub>k</sub>*. Furthermore, since we require the filter to be invertible, it follows that for the model

$$\varepsilon\_{t} = \frac{\prod\_{k=1}^{N\_d} (1 - G\_k q^{-1})}{\prod\_{l=1}^{N\_n} (1 - H\_l q^{-1})} \, e\_t \tag{11}$$

the *e<sub>t</sub>* process is invertible if the absolute values of all the *H<sub>l</sub>* constants are less than one. Similarly, if the absolute values of all the *G<sub>k</sub>* constants are less than one, then the *e<sub>t</sub>* process is stationary [8]. Thus, although multiple processes can have the same spectral density, only one of them is both stationary and invertible.

Once the general model has been obtained, we can rewrite it as an infinite impulse response model, i.e.,

$$e\_t = \varepsilon\_t + \sum\_{k=1}^{\infty} \psi\_k \varepsilon\_{t-k} \tag{12}$$

where the *ψ<sub>k</sub>* are impulse response coefficients. Since the general model is known to converge [8], only a finite number of terms in Equation (12) is needed. Furthermore, we note that

$$e\_{t-i} = \varepsilon\_{t-i} + \sum\_{k=1}^{\infty} \psi\_k \varepsilon\_{t-i-k} \tag{13}$$

which implies that for any positive *i* the random variables *ε<sub>t</sub>* and *e*<sub>*t*−*i*</sub> are uncorrelated (since the process *ε<sub>t</sub>* is white noise). Therefore, successively multiplying both sides of Equation (12) by the values of the corresponding process at delays *i* and taking expectations, we obtain equations for the initial parameter estimates that involve the covariances of the errors at different lags [10]. Obviously, since the true covariances are not known, they must be replaced by their sample estimates. This method of estimating the coefficients does not introduce excessive error as long as the absolute values of the parameters of model (7) are not too close to the boundary of the unit circle centered at the origin. Thus, it is possible to design the required filter.
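This moment-based initialization can be sketched in numpy for a first-order autoregressive model: the sample autocovariances are formed and the resulting linear (Yule–Walker-type) system is solved for the coefficients. The helper name and the synthetic AR(1) data are illustrative assumptions:

```python
import numpy as np

def moment_init(e, p):
    """Initial AR(p) estimates from sample autocovariances, obtained by
    multiplying the model by delayed values and taking expectations."""
    e = e - e.mean()
    n = len(e)
    gamma = np.array([(e[:n - i] * e[i:]).sum() / n for i in range(p + 1)])
    # Toeplitz system: gamma_i = sum_k eta_k gamma_{i-k}, i = 1..p
    R = np.array([[gamma[abs(i - k)] for k in range(p)] for i in range(p)])
    return np.linalg.solve(R, gamma[1:])

rng = np.random.default_rng(1)
n, a = 3000, 0.6
w = rng.standard_normal(n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = a * e[t - 1] + w[t]   # AR(1) with coefficient 0.6
eta_init = moment_init(e, 1)
print(eta_init)                  # estimate close to [0.6]
```

These moment estimates then serve only as starting values; the final parameters are refined by the prediction-error minimization described above.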

#### *2.2. Filter Design*

Let *e<sub>t</sub>* = (*e*<sub>*t*,1</sub>, *e*<sub>*t*,2</sub>, ... , *e*<sub>*t*,*N*</sub>)<sup>T</sup> be an *N*-dimensional stationary process of the soft sensor's errors whose shaping filter transfer matrix is *F*<sub>0</sub>(*q*<sup>−1</sup>), i.e.,

$$e\_t = F\_0(q^{-1})\varepsilon\_t \tag{14}$$

where *q*<sup>−1</sup> is the backshift operator; *ε<sub>t</sub>* = (*ε*<sub>*t*,1</sub>, *ε*<sub>*t*,2</sub>, ... , *ε*<sub>*t*,*N*</sub>)<sup>T</sup> is an *N*-dimensional vector of white noise; and *F*<sub>0</sub>(*q*<sup>−1</sup>) = [*f<sub>km</sub>*(*q*<sup>−1</sup>)] is an *N* × *N* matrix function whose entry *f<sub>km</sub>*(*q*<sup>−1</sup>) is the rational transfer function from *ε<sub>t,m</sub>* to *e<sub>t,k</sub>*. Thus, it is desired to construct the filter that predicts *e*<sub>*t*+1</sub> given the past values.

Let *P*(*q*<sup>−1</sup>) be the desired one-step-ahead predictor transfer matrix, *ê*<sub>*t*+1</sub> = *P*(*q*<sup>−1</sup>)*e<sub>t</sub>* the prediction of the vector *e*<sub>*t*+1</sub> at time *t*, and *ε*<sub>*t*+1</sub> = *e*<sub>*t*+1</sub> − *ê*<sub>*t*+1</sub> the error of the prediction obtained with the aid of the filter *P*(*q*<sup>−1</sup>). Then

$$\varepsilon\_{t} = e\_{t} - \hat{e}\_{t} = e\_{t} - q^{-1}\hat{e}\_{t+1} = e\_{t} - q^{-1}P\left(q^{-1}\right)e\_{t} = \left[I\_{N} - q^{-1}P\left(q^{-1}\right)\right]e\_{t} \tag{15}$$

where *I<sub>N</sub>* is the identity matrix of order *N*. Consequently, the filter in the square brackets transforms the initial series into the prediction error series. If the random vector *ε<sub>t</sub>* includes components correlated with those of the vector *ε*<sub>*t*−*j*</sub> for some *j* > 0, we can predict the errors *ε<sub>t</sub>* using the known previous errors; using those predictions as corrections to the obtained *e<sub>t</sub>*, we could further improve the accuracy of the predictions. Hence, in order to maximize the predictor accuracy, we must find a *P*(*q*<sup>−1</sup>) such that the errors *ε<sub>t</sub>* are uncorrelated with the errors *ε*<sub>*t*−*j*</sub> at any *j* > 0, with some nonzero correlation between the components of *ε<sub>t</sub>* (i.e., at *j* = 0) being admissible. In other words, the time series *ε<sub>t</sub>* must be *N*-dimensional white noise. Consequently, *I<sub>N</sub>* − *q*<sup>−1</sup>*P*(*q*<sup>−1</sup>) = *F*<sub>0</sub><sup>−1</sup>(*q*<sup>−1</sup>), from which it follows that *P*(*q*<sup>−1</sup>) = *q*[*I<sub>N</sub>* − *F*<sub>0</sub><sup>−1</sup>(*q*<sup>−1</sup>)].
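The relation *P*(*q*<sup>−1</sup>) = *q*[*I<sub>N</sub>* − *F*<sub>0</sub><sup>−1</sup>(*q*<sup>−1</sup>)] can be checked numerically in the scalar case. For an AR(1) shaping filter *F*<sub>0</sub> = 1/(1 − *aq*<sup>−1</sup>) we have *F*<sub>0</sub><sup>−1</sup> = 1 − *aq*<sup>−1</sup>, so *P*(*q*<sup>−1</sup>) = *a* and the predictor is simply *ê*<sub>*t*+1</sub> = *ae<sub>t</sub>*. The sketch below, with hypothetical simulated data, verifies that the resulting prediction errors are close to white noise:

```python
import numpy as np

# Scalar illustration of P(q^-1) = q [I - F0^{-1}(q^-1)] for an AR(1)
# shaping filter: the one-step predictor reduces to e_hat_{t+1} = a * e_t.
rng = np.random.default_rng(2)
n, a = 5000, 0.7
eps = rng.standard_normal(n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = a * e[t - 1] + eps[t]

pred = a * e[:-1]            # e_hat_{t+1} = a e_t
resid = e[1:] - pred         # prediction errors should be (near) white noise
lag1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
print(round(lag1, 3))        # lag-1 autocorrelation close to 0
```

Because the residual sequence coincides with the innovations `eps`, any residual autocorrelation here reflects only sampling noise.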

Thus, the predictor transfer matrix *P*(*q*<sup>−1</sup>) can be expressed through the transfer matrix of the shaping filter *F*<sub>0</sub>(*q*<sup>−1</sup>). The matrix *F*<sub>0</sub>(*q*<sup>−1</sup>) can be found from

$$G(q^{-1}) = F\_0(q^{-1})\, F\_0^{\mathrm{T}}(q) \tag{16}$$

where *G*(*q*<sup>−1</sup>) = [*g<sub>km</sub>*(*q*<sup>−1</sup>)] and *g<sub>km</sub>*(*q*<sup>−1</sup>) is the *q*-transform of the statistical estimate of the cross-covariance function of the time series *e<sub>t,k</sub>* and *e<sub>t,m</sub>* (in particular, when *m* = *k*, *g<sub>mm</sub>* is the *q*-transform of the sample covariance function, i.e., the autocovariance generating function (AGF) of the time series *e<sub>t,m</sub>*).

The algorithm for finding *F*<sub>0</sub>(*q*<sup>−1</sup>) is simplified by decomposing it into *N* stages. At the *k*th stage, a shaping filter *F<sub>k</sub>*(*q*<sup>−1</sup>) of the *k*-dimensional process (*e*<sub>*t*,1</sub>, *e*<sub>*t*,2</sub>, ... , *e<sub>t,k</sub>*)<sup>T</sup> is found. At this stage, the filter *F*<sub>*k*−1</sub>(*q*<sup>−1</sup>), found at the (*k*−1)th stage, is used to transform the matrix *G<sub>k</sub>*(*q*<sup>−1</sup>) = *F<sub>k</sub>*(*q*<sup>−1</sup>)*F<sub>k</sub>*<sup>T</sup>(*q*) so that the transformed matrix contains nonzero elements in only one row, one column, and on the main diagonal. This technique substantially simplifies the procedure of spectral factorization (finding the matrix function *F<sub>k</sub>*(*q*<sup>−1</sup>)) [11].

The proposed approach allows us to identify the transfer matrix of the vector time series without resorting to a complicated state-space representation. This advantage yields an adequate model with relatively few estimated parameters for the shaping filter *F*<sub>0</sub>(*q*<sup>−1</sup>) of the initial time series. Simultaneously, the model for the transfer matrix of the inverse filter *F*<sub>0</sub><sup>−1</sup>(*q*<sup>−1</sup>), which transforms the initial time series into white noise, is also found.

The algorithm for constructing both the shaping filter *F*<sub>0</sub>(*q*<sup>−1</sup>) and its inverse *F*<sub>0</sub><sup>−1</sup>(*q*<sup>−1</sup>) is described in [11]. Under this algorithm, the sequence of prediction errors *ε<sub>t</sub>* should be *N*-dimensional white noise. In practice, however, the true characteristics of the original process are unknown and only their estimates, containing inevitable statistical errors, are available, so the properties of the sequence *ε<sub>t</sub>* can differ significantly from those of white noise. Thus, to verify the optimality of the resulting predictive filter model *P*(*q*<sup>−1</sup>), a criterion is needed to test the hypothesis that the process *ε<sub>t</sub>* is *N*-dimensional white noise. To construct such a criterion, we can transform the process *ε<sub>t</sub>* so that its spectral density matrix is diagonal. Such a transformation is achieved by a rotation of axes in the *N*-dimensional variable space *ε*<sub>1</sub>, *ε*<sub>2</sub>, ... , *ε<sub>N</sub>* [12]. Since the variances of these variables can be equalized by normalization, we may suppose without loss of generality that the spectral density matrix of the noise *ε<sub>t</sub>* is the *N* × *N* identity matrix *I<sub>N</sub>*.

Consider a univariate sequence *ξ<sub>k</sub>* = *ε*<sub>*t*−*j*,*m*</sub>, where *k* = *jN* + *m*. Please note that each pair (*j*, *m*) determines one *k* and each *k* determines one pair (*j*, *m*). Consequently, *ε<sub>t</sub>* is multivariate white noise if and only if *ξ<sub>k</sub>* is univariate white noise. It is known that the spectral density of univariate white noise is constant [8,13]. Thus, testing the hypothesis that *ε<sub>t</sub>* is multivariate white noise reduces to testing the hypothesis that the spectral density of a univariate sequence is constant. This hypothesis can be tested using Kolmogorov's criterion [14].
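A common practical form of such a Kolmogorov-type check is the cumulative periodogram test: for white noise the normalized cumulative periodogram should stay close to the uniform line, so its maximum deviation serves as the test statistic. The numpy sketch below uses synthetic data and illustrates the criterion rather than reproducing the authors' exact implementation:

```python
import numpy as np

def cumulative_periodogram_stat(xi):
    """Kolmogorov-type statistic: maximum deviation of the normalized
    cumulative periodogram from the uniform line. For white noise the
    spectral density is constant, so the deviation should be small."""
    n = len(xi)
    f = np.fft.rfft(xi - xi.mean())
    p = (np.abs(f) ** 2)[1:n // 2]          # periodogram ordinates
    c = np.cumsum(p) / p.sum()              # normalized cumulative spectrum
    uniform = np.arange(1, len(p) + 1) / len(p)
    return np.max(np.abs(c - uniform))

rng = np.random.default_rng(3)
white = rng.standard_normal(4000)
colored = np.convolve(white, np.ones(5) / 5, mode="valid")  # smoothed, not white
print(cumulative_periodogram_stat(white) < cumulative_periodogram_stat(colored))
```

The statistic for the smoothed (colored) series is far larger because its power is concentrated at low frequencies, which is exactly the departure from constancy that the criterion detects.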

Please note that only a time series containing prediction errors is used as the initial information for constructing a predictor with the proposed approach. Information about the model with which the predictions were obtained is not used. Therefore, this approach is applicable to any predictive model that involves errors, regardless of the specific properties of the model used.

#### *2.3. Summary of the Proposed Approach*

Thus, the proposed procedure for developing the model can be summarized as follows:

Step 1: Create an initial sample *u<sub>t</sub>*, *y<sub>t</sub>*, *t* = 1, 2, ... , *K*. If the plant is already functioning, then the initial sample consists of the historical values of *u<sub>t</sub>*, *y<sub>t</sub>*. Otherwise, the initial sample is formed during the trial period of the plant. The initial sample is divided into training and testing datasets.

Step 2: Based on the data included in the training sample, the coefficients and delays of the model given by Equation (2) are estimated via solving optimization problem (4).

Step 3: Based on the data included in the training sample, the errors for the model and the corresponding sample spectrum of errors are calculated.

Step 4: Based on the sample spectrum, the order of the ARMA model is selected in order to predict the unknown future error given the known current and past errors.

Step 5: The least squares method is used to find the values of the ARMA model parameters.

Step 6: The ARMA model obtained is used as the predictive filter *F*(*q*<sup>−</sup>1) in the feedback loop of the compensator (bias update term) as shown in Figure 2.

Step 7: If the resulting soft sensor improves the accuracy of the prediction for the test sample then it can be recommended for practical use.
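The steps above can be sketched end-to-end on synthetic data: a scalar plant with one input, an AR(1) error model standing in for the general ARMA filter, and a least-squares fit throughout. All signals and coefficients here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)

# Step 1: synthetic plant data, split into training and testing datasets
n = 1200
u = rng.standard_normal(n)
e = np.zeros(n)
w = 0.3 * rng.standard_normal(n)
for t in range(1, n):
    e[t] = 0.6 * e[t - 1] + w[t]            # autocorrelated model error
y = 2.0 * u + e
u_tr, y_tr, u_te, y_te = u[:800], y[:800], u[800:], y[800:]

# Step 2: estimate the static model coefficient by least squares
b = np.dot(u_tr, y_tr) / np.dot(u_tr, u_tr)

# Steps 3-5: residuals on the training set and an AR(1) error model
r = y_tr - b * u_tr
eta = np.dot(r[1:], r[:-1]) / np.dot(r[:-1], r[:-1])

# Steps 6-7: bias update on the test set -- predict the next error from
# the previous one and compare accuracy with the uncorrected model
r_te = y_te - b * u_te
plain = np.mean(r_te[1:] ** 2)
corrected = np.mean((r_te[1:] - eta * r_te[:-1]) ** 2)
print(corrected < plain)                     # error filter improves MSE
```

On data of this kind, the corrected predictor's test MSE approaches the innovation variance of the error process, which is the best achievable by a one-step predictor.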

Please note that the obtained predictive filter model can be recommended for further use only for the same plant on whose data it was built. The approach itself will be successful whenever the sequence of plant errors is a stationary (or nearly stationary) process. In addition, the class of plants to which this approach applies can be extended to those whose errors admit an invertible transformation that brings the error sequence to a stationary process. The quality of the developed model should be checked on a test sample that was not used at the model training stage.

#### **3. Industrial Application of the Proposed Method**

Industrial methyl tert-butyl ether (MTBE) production occurs in a reactive distillation unit, as shown in Figure 3. The feed containing isobutylene and methanol (MeOH) enters the column. The distillate (D) is a lean butane-butylene fraction with a certain amount of MeOH. The raffinate is the heavy product MTBE that is withdrawn from the bottom part of the column. Table 1 shows the main process variables for the industrial unit. The goal is to develop a soft sensor for the prediction of the concentrations of methyl sec-butyl ether (MSBE), MeOH, and the sum of dimers and trimers of isobutylene (DIME) in the bottom product MTBE.

The measured values of the output *y<sub>m</sub>* and input *x<sub>k</sub>* variables at time *t* are denoted as *y<sub>t,m</sub>*, *x<sub>t,k</sub>*; *m* = 1, 2, 3; *k* = 1, 2, ... , 10; and *t* = 1, 2, ... , *n*. The existing measurements may be used for the development of a predictive model of the form

$$y\_t = b\_0 + b\mathbf{x}\_t + e\_t, \quad t = 1, 2, \dots, n \tag{17}$$

where *y<sub>t</sub>* = (*y*<sub>*t*,1</sub>, *y*<sub>*t*,2</sub>, *y*<sub>*t*,3</sub>)<sup>T</sup>; *x<sub>t</sub>* = (*x*<sub>*t*,1</sub>, *x*<sub>*t*,2</sub>, ... , *x*<sub>*t*,10</sub>)<sup>T</sup>; *b* is the matrix of model parameters [*b<sub>mk</sub>*] of dimension 3 × 10; *b*<sub>0</sub> = (*b*<sub>1</sub>, *b*<sub>2</sub>, *b*<sub>3</sub>)<sup>T</sup> is a vector of constant biases; *e<sub>t</sub>* = (*e*<sub>*t*,1</sub>, *e*<sub>*t*,2</sub>, *e*<sub>*t*,3</sub>)<sup>T</sup> is a vector of residuals; and the superscript T denotes the transpose. Since Equation (17) can be rewritten as

$$(y\_t - \overline{y}) = b(\mathbf{x}\_t - \overline{\mathbf{x}}) + e\_t \tag{18}$$

where $\overline{y} = \frac{1}{n}\sum\_{t=1}^{n} y\_t$ and $\overline{\mathbf{x}} = \frac{1}{n}\sum\_{t=1}^{n} \mathbf{x}\_t$, the expectations of all the elements of the vectors *y<sub>t</sub>*, *x<sub>t</sub>*, and *e<sub>t</sub>*, as well as of the bias vector *b*<sub>0</sub>, may be considered equal to zero without loss of generality.

Although the elements of matrix *b* are unknown, they are easily estimated using the ordinary least squares (OLS) method, which gives [10]

$$\hat{b} = \left[ \left( \mathbf{X}^{\mathrm{T}} \mathbf{X} \right)^{-1} \mathbf{X}^{\mathrm{T}} \mathbf{Y} \right]^{\mathrm{T}} \tag{19}$$

where **X** = [*xtk*]; **Y** = [*ytm*]; *m* = 1, 2, 3; *k* = 1, 2, . . . , 10; and *t* = 1, 2, . . . , *n*.
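Equation (19) is a standard multi-output least-squares estimate and can be reproduced directly in numpy. The dimensions below match the application (400 samples, 10 inputs, 3 outputs), but the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(5)
n, K, M = 400, 10, 3                        # samples, inputs, outputs
X = rng.standard_normal((n, K))
b_true = rng.standard_normal((M, K))
Y = X @ b_true.T + 0.01 * rng.standard_normal((n, M))

# Equation (19): b_hat = [(X^T X)^{-1} X^T Y]^T, one row per output variable
b_hat = (np.linalg.solve(X.T @ X, X.T @ Y)).T
print(np.allclose(b_hat, b_true, atol=0.01))  # recovers the true 3x10 matrix
```

Using `np.linalg.solve` on the normal equations avoids forming the explicit inverse and is numerically preferable for well-conditioned regressors.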

**Figure 3.** Reactive distillation unit of MTBE production.



For the training sample containing *n* = 400 measurements, the following estimates were obtained:

$$\overline{\mathbf{x}} = (51.8154\ \ 1.8747\ \ 52.1154\ \ 3.0859\ \ 51.9866\ \ 0.7580\ \ 60.7100\ \ 66.4516\ \ 136.3077\ \ 64.5725)^{\mathrm{T}}$$

$$\overline{y} = (0.5440\ \ 0.1461\ \ 0.0595)^{\mathrm{T}}$$

$$\hat{b} = \begin{pmatrix} -0.0151 & 0.2383 & -0.0342 & 0.1401 & 0.0476 & -1.7361 & 0.0430 & -0.0012 & -0.1019 & 0.0388 \\ -0.0173 & 0.0794 & 0.0281 & 0.1191 & -0.0171 & 2.9800 & -0.0093 & -0.0072 & -0.1333 & 0.0353 \\ -0.0080 & 0.1118 & -0.0061 & 0.0537 & 0.0134 & 0.3490 & 0.0215 & -0.0011 & -0.0467 & 0.0098 \end{pmatrix}$$

The estimated MSE vector for model (17) is (0.0094 0.0095 0.0021)<sup>T</sup>, while the vector of sample estimates of the variances of the output variables is (0.0321 0.0184 0.0047)<sup>T</sup>.

Let *R*<sup>2</sup> *<sup>m</sup>* be a sample estimate of the coefficient of determination, i.e., the estimate of a fraction of variance of the dependent variable *ym* explained by model (18), i.e.,

$$R\_m^2 = 1 - \frac{D\_{\varepsilon, m}}{D\_m} \tag{20}$$

where *D<sub>m</sub>* is a sample estimate of the variance of the output variable *y<sub>m</sub>*, *D<sub>e,m</sub>* is the mean squared value of the *e<sub>t,m</sub>* errors, and *m* = 1, 2, 3. This gives *R*<sup>2</sup><sub>1</sub> = 0.7061, *R*<sup>2</sup><sub>2</sub> = 0.4822, and *R*<sup>2</sup><sub>3</sub> = 0.5467.
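As a quick check, Equation (20) applied to the rounded MSEs and variances reported above reproduces the stated coefficients of determination up to the rounding of those inputs:

```python
import numpy as np

# R_m^2 = 1 - D_{e,m} / D_m, applied to the rounded sample figures above
D_e = np.array([0.0094, 0.0095, 0.0021])   # MSE of model (17)
D = np.array([0.0321, 0.0184, 0.0047])     # output variances
print(np.round(1 - D_e / D, 4))            # -> [0.7072 0.4837 0.5532]
```

The small discrepancies from the reported 0.7061, 0.4822, and 0.5467 come solely from the four-digit rounding of the inputs.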

Assuming a sampling time of one hour, the estimate of the delay vector *τ̂*<sub>1</sub> for predicting the output variable *y*<sub>1</sub> is

$$\hat{\tau}\_1 = (4.83\ \ 0\ \ 2.00\ \ 5.00\ \ 1.83\ \ 0\ \ 2.00\ \ 0.83\ \ 1.00\ \ 2.00)$$

and the estimate of the coefficient vector is equal to

$$\hat{b}\_1 = (0.0002\ \ 0.1341\ \ {-0.0360}\ \ 0.0064\ \ 0.0451\ \ {-2.3289}\ \ 0.0519\ \ {-0.0029}\ \ {-0.0819}\ \ 0.0442)$$

with *D*<sub>*e*,1</sub>(*b̂*<sub>1</sub>, *τ̂*<sub>1</sub>) = 0.0091.

Similarly, for variables *y*<sup>2</sup> and *y*3, we obtain

$$\hat{\tau}\_2 = (0.33\ \ 0.33\ \ 1.67\ \ 4.50\ \ 0.50\ \ 0.67\ \ 0.33\ \ 0.50\ \ 0.50\ \ 1.67)$$

$$\hat{b}\_2 = ({-0.0263}\ \ 0.1481\ \ 0.0315\ \ 0.1947\ \ {-0.0168}\ \ 3.4223\ \ {-0.0064}\ \ {-0.0092}\ \ {-0.1513}\ \ 0.0385); \quad D\_{e,2}(\hat{b}\_2, \hat{\tau}\_2) = 0.0088$$

$$\hat{\tau}\_3 = (4.17\ \ 0\ \ 0.83\ \ 4.33\ \ 0.83\ \ 0.50\ \ 2.00\ \ 0.67\ \ 0.83\ \ 1.00)$$

$$\hat{b}\_3 = ({-0.0021}\ \ 0.0811\ \ {-0.0070}\ \ {-0.0016}\ \ 0.0130\ \ 0.3795\ \ 0.0259\ \ {-0.0015}\ \ {-0.0455}\ \ 0.0098); \quad D\_{e,3}(\hat{b}\_3, \hat{\tau}\_3) = 0.0020$$

The sample estimate of the coefficient of determination for predicting the output variable *y<sub>m</sub>*, denoted by *R*<sup>2</sup><sub>*Lm*</sub>, is *R*<sup>2</sup><sub>*L*1</sub> = 0.7160; *R*<sup>2</sup><sub>*L*2</sub> = 0.5200; *R*<sup>2</sup><sub>*L*3</sub> = 0.5726.

The effect of delay accounting was evaluated on a test sample containing 167 measurements. As a result, the MSE of the predictions of the output variables *y*<sub>1</sub>, *y*<sub>2</sub>, and *y*<sub>3</sub> decreased by 23%, 10%, and 3%, respectively.

Now, let us consider modeling the error term. Since the sampling time is equal to 12 h, the frequency unit 1/(12 h) is used instead of Hz. From the spectral densities of the errors *e*<sub>*t*,1</sub> and *e*<sub>*t*,3</sub> shown in Figures 4 and 5, it can be seen that the maximum within the interval [0, 0.5] indicates the presence in the denominator of the spectral density function *S*(*ω*) of a factor (1 − *Ge*<sup>−*jω*</sup>) with a complex-valued constant *G*. However, for the practical application of the filter given by Equation (9), all the coefficients must be real [8]. Therefore, the denominator of the density *S*(*ω*) must contain the conjugate factor (1 − *Ḡe*<sup>−*jω*</sup>) along with the factor (1 − *Ge*<sup>−*jω*</sup>). If the frequency response models for the *e*<sub>*t*,1</sub> and *e*<sub>*t*,3</sub> processes are limited to these two factors (assuming the numerator is equal to one), then the corresponding spectral density of the second-order autoregressive process approximates well the sample estimates of the spectra of the *e*<sub>*t*,1</sub> and *e*<sub>*t*,3</sub> processes at different values of *G*. However, the insufficiently rapid decrease of the spectral density in the high-frequency region justifies including in the denominator of the model another factor with a real value of the constant *G*.

**Figure 4.** Sample spectrum of the process *et,* 1.

**Figure 5.** Sample spectrum of the process *et,* 3.

In Figure 6, which shows the spectral density of the *e*<sub>*t*,2</sub> errors, the sample spectrum of this time series resembles the spectrum of a first-order autoregressive process [15–17]. However, we note that a stochastic process is not uniquely determined by its spectral density [8]. Therefore, as previously mentioned, we need to include two additional constraints: the resulting model must be invertible and realizable. This ensures that the model is unique.

**Figure 6.** Sample spectrum of the process *et,* 2.

Based on the theoretical properties of the process, the error models are

$$\begin{aligned} e\_{t,1} - \eta\_{11}e\_{t-1,1} - \eta\_{21}e\_{t-2,1} - \eta\_{31}e\_{t-3,1} &= \varepsilon\_{t,1} \\ e\_{t,2} - \eta\_{12}e\_{t-1,2} &= \varepsilon\_{t,2} \\ e\_{t,3} - \eta\_{13}e\_{t-1,3} - \eta\_{23}e\_{t-2,3} - \eta\_{33}e\_{t-3,3} &= \varepsilon\_{t,3} \end{aligned} \tag{21}$$

where *η* are the parameters to be determined. These parameters can be found using the approach presented in Section 2.2 by multiplying the finite impulse response model by the delayed errors and taking the expectations. For example, for *e*1, this gives

$$\gamma\_i = \eta\_{11}\gamma\_{i-1} + \eta\_{21}\gamma\_{i-2} + \eta\_{31}\gamma\_{i-3}, \quad i = 1, 2, 3 \tag{22}$$

where *γ<sub>i</sub>* = cov(*e*<sub>*t*,1</sub>, *e*<sub>*t*−*i*,1</sub>) = *γ*<sub>−*i*</sub>.

For the process *et,* 1, the estimates of the coefficients *η*11, *η*<sup>21</sup> and *η*<sup>31</sup> are, respectively, equal to 0.4131, −0.0093, and −0.0528. These values were used as the initial guesses passed to the PEM function. As a result of calculations, the model parameters were found to be: *η*<sup>11</sup> = 0.4175, *η*<sup>21</sup> = 0.03234, *η*<sup>31</sup> = −0.07026. The initial value of coefficient *η*<sup>12</sup> is 0.3748 and its final value is *η*<sup>12</sup> = 0.3758.

Similarly, using Equation (22), the initial guesses were *η*<sup>13</sup> = 0.5142, *η*<sup>23</sup> = −0.0507, and *η*<sup>33</sup> = −0.0207 to give final values of *η*<sup>13</sup> = 0.5151, *η*<sup>23</sup> = −0.02676, and *η*<sup>33</sup> = −0.03246.

The performance of the predictive filter models obtained from the analysis of the training dataset is validated using the testing sample. Figures 7–9 compare the predictions against the true values, where the solid line shows the true *e<sub>t,m</sub>* errors and the dashed line their predicted values for *m* = 1, 2, and 3. At each time point *t* on the *x*-axis, the corresponding error *e<sub>t,m</sub>* and the predicted error *ê<sub>t,m</sub>* computed at *t* − 1 are shown.

**Figure 7.** Prediction of the process *et*, 1.

**Figure 8.** Prediction of the process *et*, 2.

**Figure 9.** Prediction of the process *et,* 3.

Figures 10–12 compare the performance of the soft sensors with the proposed filter for error prediction and a traditional method, in which an adaptive bias term is calculated based on the moving window (MW) approach [18]. It can be seen that the filter provides better tracking of the process values, thereby improving the accuracy of the overall soft sensor system and reducing the MSE of the output variables *y*<sub>1</sub>, *y*<sub>2</sub>, and *y*<sub>3</sub> by 32%, 67%, and 9.5%, respectively.

**Figure 10.** Estimation of *ym*1.

**Figure 11.** Estimation of *ym*2.

**Figure 12.** Estimation of *ym*3.

#### **4. Conclusions**

This paper proposed a new approach to handling the bias update term in a soft sensor system. Rather than relying purely on the most recent available samples, the new bias update term seeks to predict the future errors. Tests of this approach on a reactive distillation column show that it handles the errors well. However, the predictive filters used only work in regions without serious disturbances or outliers.

Therefore, it makes sense to consider more complex models for the predictive filters, including models with an additional component in the form of some flow of events (outliers), for example, a Poisson flow. If a flow of outliers is added to the process model, then the intensity of this flow needs to be estimated. In this case, the number of outliers in the training dataset should be sufficient to estimate the intensity of the outlier flow with acceptable accuracy.

**Author Contributions:** Conceptualization, all; methodology, Y.A.W.S., A.T., V.K.; software, A.T.; validation, A.T., Y.A.W.S.; formal analysis, all; resources, F.Y., V.K.; writing—original draft preparation, Y.A.W.S., A.T.; writing—review and editing, all; funding acquisition, F.Y., V.K., Y.A.W.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by RFBR and NSFC (grant numbers 21-57-53005 and 62111530057) and National Science and Technology Innovation 2030 Major Project (grant No.2018AAA0101604) of the Ministry of Science and Technology of China.

**Data Availability Statement:** Data can be obtained by contacting the authors.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Andrey A. Galyaev \*, Pavel V. Lysenko and Evgeny Y. Rubinovich**

Institute of Control Sciences of RAS, 117997 Moscow, Russia; pashlys@yandex.ru (P.V.L.); rubinvch@ipu.ru (E.Y.R.)

**\*** Correspondence: galaev@ipu.ru

**Abstract:** This article considers the mathematical aspects of the problem of the optimal interception of a mobile search vehicle moving along random tacks on a given route and searching for a target, which travels parallel to this route. Interception begins when the probability of the target being detected by the search vehicle exceeds a certain threshold value. Interception was carried out by a controlled vehicle (defender) protecting the target. An analytical estimation of this detection probability is proposed. The interception problem was formulated as an optimal stochastic control problem, which was transformed to a deterministic optimization problem. As a result, the optimal control law of the defender was found, and the optimal interception time was estimated. The deterministic problem is a simplified version of the problem whose optimal solution provides a suboptimal solution to the stochastic problem. The obtained control law was compared with classic guidance methods. All the results were obtained analytically and validated with a computer simulation.

**Keywords:** optimal stochastic control; path planning; 2D random search; interception

**Citation:** Galyaev, A.A.; Lysenko, P.V.; Rubinovich, E.Y. Optimal Stochastic Control in the Interception Problem of a Randomly Tacking Vehicle. *Mathematics* **2021**, *9*, 2386. https://doi.org/10.3390/math9192386

Academic Editors: Natalia Bakhtadze, Igor Yadykin, Andrei Torgashov and Nikolay Korgin

Received: 30 August 2021 Accepted: 22 September 2021 Published: 25 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

Search problems have become increasingly popular recently and have attracted a significant number of researchers [1–5]. The search process is considered to be that of exploring a certain area of a physical space in order to detect a searched object (SO) in this area with the search vehicle (SV) using various types of physical sensors. The basis for solving these problems is a symbiosis of models and methods from multiple branches of science, which allows establishing causal relationships among the search conditions, the physical characteristics of the SOs, and the search results.

Mathematical formulations of search problems can include various criteria [6,7] with the goal of the minimization or maximization of these criteria. All search problems can be divided into two groups according to the SO's type: it can be stationary or mobile. The problems of the first type (Chapter 2 of [1]) are easier to solve than the problems of mobile SOs (Chapter 3 of [1,5]), since the parameters of their movement may be unknown to the SV. The problems of the second type have become popular in recent years due to the development of unmanned vehicles such as unmanned aerial vehicles (UAVs) or unmanned underwater vehicles (UUVs), operating in a largely unpredictable and uncertain marine environment [1,8].

The practical applications of such autonomous vehicles and search problems can vary from environmental monitoring and geological exploration to combat and reconnaissance tasks. Therefore, the parameters of the mathematical models can vary greatly depending on the different characteristics of real-world objects and their operating conditions. The problem considered in this article can be applied to objects in the marine environment such as UUVs or autonomous surface vehicles (ASVs), which can serve as both the SO and SV in the model under discussion.

The search can be performed by one [3,5] or several SVs [9,10]. If the SV and SO are on conflicting sides and the search itself is undesirable for the SO [11,12], then we can talk about the so-called threat environment [13,14]. Several SVs can be connected in a network structure and form a dynamically changing threat map [10,15]. The task of the SO (UUV or UAV) in this case is to avoid these threats while moving. The trajectory planning problem can be formulated for the SO when the threat mapping is known. If the dynamics of the SO is also known, then these problems are classical problems of deterministic optimal control.

If the SV presents a danger to the SO, the problem of interception can be considered. There is a vast class of such problems with various formulations and models of the moving vehicles. These models may include restrictions on the maneuverability of the vehicles [16–18]. Moreover, the problem can be considered optimal if some criterion, for example, the interception time, must be minimized [19–21]. In most problems studied in the literature, the intercepted vehicle moves along a given programmed trajectory [22]. Meanwhile, real vehicles as a rule move stochastically, and this case is considered in the present article.

The article relates to various branches of mathematics, such as stochastic control, guidance, information processing and search, and optimization, and is devoted to the problem of the optimal interception of an SV that moves randomly on tacks along a given course and searches for a target SO. The interception is carried out by a controlled mobile vehicle protecting the target SO. The presence of an arbitrarily maneuvering search vehicle requires an adequate mathematical formalization in the form of a stochastic control problem. The maneuvering process can be conveniently formalized using a jump-like Markov process with a given state vector and a given matrix of the transition intensities between these states. Such a model allows us to describe the trajectory of the SV in the form of a linear stochastic differential equation, which makes it possible to obtain the equations of the evolution of the mathematical expectation and variance. These equations allow us to formulate the problem of SV interception by the controlled vehicle with the criterion of a predicted miss or with a given mathematical expectation of a miss at the final position of the SV [16–21]. The purpose of the article is to find an interception trajectory of the controlled defender vehicle as a result of solving the optimal stochastic control problem and comparing this trajectory with classical guidance algorithms such as the pursuit guidance method and the method of proportional navigation guidance [23–25].

The considered problem belongs to the "attacker–target–defender" type [26–28], the essence of which is a counteraction to the SV (attacker) from the SO (target), which can be a certain strategically important mobile vehicle, by using an autonomous attacking robotic complex (defender), for example an UAV or UUV.

In this article, by SV, we mean a vehicle moving programmatically or randomly on a plane equipped with a circular detection zone of a fixed radius. The goal of the SV is to detect the SO, i.e., to cover the point of the plane depicting the SO with its detection zone and maximize some functional that characterizes the reliability of detecting the SO in this zone. The reliability of the detection (probability of correct classification) of the SV may depend on various physical factors, in particular on the time spent by the SO in the detection zone, its current distance from the SV, the direction of the velocity vector of the SO, etc. [29].

We considered the SO to be able to observe the real trajectory of the SV and to evaluate the characteristics of its movement, i.e., the current coordinates and the components of the velocity vector. At some point in time, the SO releases a mobile defender, which moves autonomously and stealthily and has no communication channel with the SO. It was also assumed that the defender can evaluate the current motion characteristics of the SV using its passive onboard sensors. The stealthiness of the defender is ensured, in particular, by its low velocity.

The proposed work has the following structure. In Section 2, the model of the SV with a given detection zone is considered. Section 3 contains a statistical description of the detection probability of the SO moving along a straight-line trajectory. In Section 4, the interception problem is formulated as an optimal stochastic control problem. This problem is analytically solved in Section 5, and the obtained results are discussed and illustrated with simulation examples in Section 6. Section 7 concludes the article and suggests directions for future work.

#### **2. Model of the SV's Movement on Tacks**

The search system consists of one SV, which has a circle detection zone of radius *R*. The SV moves piecewise-rectilinearly on a plane, tacking randomly around the line of the general course. The origin *O* of the stationary Cartesian coordinate system *XOY* is situated in the initial position of the SV, as shown in Figure 1. This coordinate system is oriented in such a way that its *OX* axis coincides with the line of the general course of the SV.

**Figure 1.** The SV's trajectory.

The SV moves on tacks in accordance with the following law:

$$\begin{cases} \dot{x}_{SV} = v_x = v \cos \alpha, \\ \dot{y}_{SV} = v_y = \theta_t v \sin \alpha, \end{cases} \tag{1}$$

where *α* is the specified tacking angle, *v* is the SV's search speed, and *θ<sup>t</sup>* is a random jump-like Markov process. The component of the SV's velocity vector *v* along the line of the general course is constant:

*vx* = const.

Figure 2 shows a velocity diagram of the SV. As follows from (1), tacking was performed by periodically changing the velocity component *vy* according to a random Markov process *θ<sup>t</sup>* with a finite vector of states *J* = (*j*1, *j*2, ... , *jn*) and a given matrix of the transition intensities between these states Λ. This article discusses the case of processes with three states *J* = (−1, 0, 1). This means that the SV's velocity vector can coincide with the general course line (*θ<sup>t</sup>* = 0) or deviate from it by a constant angle equal to ±*α* (when *θ<sup>t</sup>* = ±1), as shown in Figure 2.

**Figure 2.** Velocity diagram of the SV.

We considered all transitions between the process states to be equally probable, with the transition intensity matrix:

$$
\Lambda = \lambda \begin{pmatrix} -2 & 1 & 1 \\ 1 & -2 & 1 \\ 1 & 1 & -2 \end{pmatrix} \tag{2}
$$

corresponding to the state vector *J*. The variable *λ* here is *λ* = 1/*τ*0, where *τ*<sup>0</sup> is the average time of the SV being on one tack. This model generates random trajectories that have the approximate shape shown in Figure 1.
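For illustration, the tacking model above can be sketched in Python (the language the authors later use for their discrete simulation). The function names are ours, and the per-step jump with probability 2*λ*Δ*t* is the simple discretization of the three-state process, not the authors' exact script:

```python
import math
import random

def simulate_tacks(lam, t_end, dt, seed=None):
    """Simulate the jump-like Markov process theta_t with states
    J = (-1, 0, 1) and the transition intensity matrix (2): each of
    the two transitions out of the current state has intensity lam,
    so over a small step dt the state changes with probability
    2*lam*dt and stays the same otherwise."""
    rng = random.Random(seed)
    theta, t = 0, 0.0          # start on the general course line
    path = [theta]
    while t < t_end:
        if rng.random() < 2.0 * lam * dt:
            theta = rng.choice([s for s in (-1, 0, 1) if s != theta])
        path.append(theta)
        t += dt
    return path

def sv_trajectory(path, v, tan_alpha, dt):
    """Integrate the motion law (1): x' = v*cos(alpha),
    y' = theta_t * v*sin(alpha)."""
    alpha = math.atan(tan_alpha)
    x = y = 0.0
    xs, ys = [x], [y]
    for theta in path[:-1]:
        x += v * math.cos(alpha) * dt
        y += theta * v * math.sin(alpha) * dt
        xs.append(x)
        ys.append(y)
    return xs, ys
```

With *λ* = 1/*τ*<sub>0</sub>, this generates random piecewise-rectilinear trajectories of the kind shown in Figure 1.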

For the mathematical formulation of the stochastic optimization problem, it is convenient to study the Gaussian Markov analog instead of the jump-like process *θt*. This diffusion process Θ*<sup>t</sup>* has the same mathematical expectation and correlation function as the process *θt*. It follows from the theory of jump-like Markov processes that Θ*<sup>t</sup>* allows the stochastic Ito differential [30]:

$$d\Theta_t = -D\Theta_t \, dt + \sigma \, dw_t, \tag{3}$$

where *w<sub>t</sub>* is a standard Wiener process and *D*, *σ* are constants related to the original Markov process *θ<sub>t</sub>*: *D* = 3*λ* and *σ* = 2 tan *α* √*λ*.
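A realization of the diffusion analog (3) can be generated with a standard Euler–Maruyama scheme; the sketch below is our own illustration, using the constants *D* = 3*λ* and *σ* = 2 tan *α* √*λ* defined above:

```python
import math
import random

def simulate_diffusion(theta0, lam, tan_alpha, t_end, dt, seed=None):
    """Euler-Maruyama discretization of the Ito equation (3):
    dTheta = -D*Theta*dt + sigma*dw, with D = 3*lam and
    sigma = 2*tan(alpha)*sqrt(lam)."""
    rng = random.Random(seed)
    D = 3.0 * lam
    sigma = 2.0 * tan_alpha * math.sqrt(lam)
    theta = theta0
    path = [theta]
    for _ in range(round(t_end / dt)):
        dw = rng.gauss(0.0, math.sqrt(dt))   # Wiener increment over dt
        theta += -D * theta * dt + sigma * dw
        path.append(theta)
    return path
```

The simulated process is mean-reverting to zero with stationary variance *σ*<sup>2</sup>/(2*D*), as expected for an Ornstein–Uhlenbeck-type equation.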

#### **3. Detection Probability of the SO Moving at a Constant Velocity**

Firstly, let us consider the task of detecting the target SO with the SV, whose dynamics were described in Section 2. The following model was investigated. The target moves at a constant speed parallel to the general course line of the SV at a distance *l* from it.

The initial distance between the vehicles along the general course is *L*, so the initial Cartesian distance is √(*L*<sup>2</sup> + *l*<sup>2</sup>). The SV is moving according to (1), where *θ<sub>t</sub>* is a random Markov process with the state vector *J* and the transition matrix Λ from (2). The target moves according to the law:

$$\begin{cases} \dot{x} = -u, \\ \dot{y} = 0, \end{cases} \tag{4}$$

where *u* is its constant velocity.

The target will be detected if the distance between it and the SV becomes less than *R*. To simplify the model, let us assume that the detection is successful when the target's and SV's *x*-coordinates become equal at some point in time: *xSV*(*ϑ*) = *x*(*ϑ*), and the inequality |*ySV*(*ϑ*) − *y*(*ϑ*)| ≤ *R* is satisfied for the *y* coordinates.

The rendezvous instant *ϑ* is defined as:

$$\vartheta = \frac{L}{v \cos \alpha + u}.\tag{5}$$

The probability of detection will be determined by including the *ySV* coordinate in the interval [*l* − *R*, *l* + *R*], namely:

$$\mathbf{P}_{\text{det}} = \mathbf{P}\{l - R \le y_{SV}(\vartheta) \le l + R\} = \mathbf{P}\left\{\frac{l - R}{v \sin \alpha} \le \int_0^{\vartheta} \theta_s \, ds \le \frac{l + R}{v \sin \alpha}\right\}. \tag{6}$$

As mentioned in (3), the random jump-like Markov process *θ<sup>t</sup>* can be replaced with its Gaussian Markov analog Θ*t*, which has the same mathematical expectation and correlation function as the process *θt*.

Further, instead of calculating the random integral (6), we estimated the target detection probability for the SV through the analytical approximation of probability histograms obtained in the numerical simulation. We assumed that at the instant *t*<sub>0</sub> = 0, the target is situated in the position *E*<sub>0</sub> = (*L*, *l*) with *L* ≫ 1 (as shown in Figure 3) and that the velocity of the target *u* < 1.

**Figure 3.** Relative positions of the SV and SO.

Due to the latter assumption, the SV's detection zone can be considered as a straight-line segment whose length equals the zone's diameter, instead of a circle. Thus, the detection probability can be estimated as the probability of the target meeting this segment.

The histograms of the distribution density of the *y<sub>SV</sub>* coordinate obtained in the interval [*l* − Δ*l*, *l* + Δ*l*] for some small Δ*l* are well approximated by the symmetric density of the Gaussian distribution. Figure 4 depicts the histogram of the probability of meeting between the target and SV and the corresponding Gaussian densities: 𝒩(0, *σ*<sub>1</sub><sup>2</sup>) with *σ*<sub>1</sub> = 0.705 for the case *L* = *L*<sub>1</sub> = 5 (Figure 4a) and 𝒩(0, *σ*<sub>2</sub><sup>2</sup>) with *σ*<sub>2</sub> = 0.993 for the case *L* = *L*<sub>2</sub> = 10 (Figure 4b). The histograms were constructed as a result of computer simulation of the movement of the target and SV for 10,000 realizations of the SV trajectory corresponding to *λ* = 5/3.

**Figure 4.** Histograms of the probability detection distribution density of the target moving at a constant velocity.

These graphs allowed us to estimate the SV's detection probability **P**det at its various initial positions. Now, Equation (6) may be approximated as:

$$\mathbf{P}_{\text{det}} = \mathbf{P}\{l - R \le y_{SV}(\vartheta) \le l + R\} = \frac{1}{\sqrt{2\pi}\,\sigma_i} \int_{l-R}^{l+R} \exp\left(-\frac{y^2}{2\sigma_i^2}\right) dy, \tag{7}$$

where *σ<sup>i</sup>* corresponds to various parameters (*Li*, *li*, *ui*). In particular, when *l* = *l*<sup>1</sup> = 1.5 and *l* = *l*<sup>2</sup> = 2.5 for *L*<sup>1</sup> and *L*2, respectively, these probabilities are presented in Table 1. In all cases, the velocity of the target is *u* = 0.3. All values are given in a normalized scale.

**Table 1.** The detection probability of the target **P**det at its various initial positions *E*<sup>0</sup> = (*L*, *l*).


Next, we introduced a certain threshold value (security threshold) *h* < 1 of the permissible detection probability of **P**det, for example *h* = 0.07. The situation with **P**det ≤ *h* is considered *safe*. In this case, the target continues to move in a straight line without changing its course and speed. If **P**det > *h*, then the situation is considered *dangerous*. It was assumed that in the case of a dangerous situation, the target (to prevent the negative consequences of possible detection) uses the mobile defender mentioned in the Introduction, whose task is to intercept the SV with a minimum standard error at a given point in the plane relative to the SV.
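The approximation (7) and the security-threshold test above reduce to evaluating a Gaussian integral, which can be done with the error function. A minimal sketch (function names are ours; the *σ* value in the example is taken from Figure 4):

```python
import math

def detection_probability(l, R, sigma):
    """Approximation (7): the probability that a N(0, sigma^2) variable
    falls into [l - R, l + R], evaluated via the error function."""
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / (sigma * math.sqrt(2.0))))
    return Phi(l + R) - Phi(l - R)

def is_dangerous(l, R, sigma, h=0.07):
    """Security check: the situation is dangerous if P_det > h."""
    return detection_probability(l, R, sigma) > h
```

For example, with *σ*<sub>1</sub> = 0.705, *R* = 1, and *l* = 1.5, the probability is well above the threshold *h* = 0.07, so the target would release the defender.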

The minimization of this miss is associated with the solution of the following optimal stochastic control problem.

#### **4. Optimal Stochastic Control Problem**

The problem was considered in a moving Cartesian coordinate system *X<sub>t</sub>O<sub>t</sub>Y<sub>t</sub>*, where the origin *O<sub>t</sub>* is associated with the current position *P<sub>t</sub>* of the SV and the axis *O<sub>t</sub>X<sub>t</sub>* is directed parallel to the SV's general course. The current position *E*<sup>*t*</sup><sub>2</sub> of the defender is given by a two-dimensional vector *Z*<sup>*t*</sup><sub>2</sub> directed from *O<sub>t</sub>* = *P<sub>t</sub>* to *E*<sup>*t*</sup><sub>2</sub>.

The terminal position *E*<sup>*ϑ*</sup><sub>2</sub> of the defender is defined by a given two-dimensional vector *d*, as shown in Figure 5. An auxiliary vector *η<sub>t</sub>* ≜ *Z*<sup>*t*</sup><sub>2</sub> − *d* was introduced for a more convenient formulation of the defender's optimal control problem.

**Figure 5.** Geometry of the problem.

In the selected coordinate system, the equations of the relative motion of the defender– SV system have the form:

$$\dot{Z}_2^t = u_t - \begin{pmatrix} 1 \\ \Theta_t \end{pmatrix}, \quad u_t = \begin{pmatrix} u_x^t \\ u_y^t \end{pmatrix}, \tag{8}$$

where Θ*<sup>t</sup>* is from (3) and the initial position of *Z*<sup>0</sup> <sup>2</sup> were set. The two-dimensional velocity vector *ut* of the defender plays the role of the control and is subject to the restrictions:

$$|u\_t| \le \beta < 1\tag{9}$$

with the specified constant *β*.

In terms of the auxiliary vector *η<sup>t</sup>* introduced above, the equations of motion (8) take the compact form:

$$
\dot{\eta}_t = u_t + A + B\Theta_t, \quad \eta_0 \triangleq Z_2^0, \tag{10}
$$

where:

$$A = \begin{pmatrix} -1 \\ 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 \\ -1 \end{pmatrix}. \tag{11}$$

At the terminal moment *ϑ*, the following condition must be met:

$$\mathbb{E}\eta\_{\vartheta} = 0,\tag{12}$$

where **E** is the sign of the mathematical expectation. As a criterion, we took the terminal functional:

$$\mathbb{E}\,G(\eta_{\vartheta}, \Theta_{\vartheta}) \to \min_{u_t}, \tag{13}$$

where:

$$G(\eta_{\vartheta}, \Theta_{\vartheta}) = \eta_{\vartheta}^2 + \gamma \Theta_{\vartheta}. \tag{14}$$

In (13) and (14), the summand *η*<sup>2</sup><sub>*ϑ*</sub> characterizes the squared deviation of the defender from the end of the vector *d* at the terminal moment *ϑ*. The term *γ***E**Θ*<sub>ϑ</sub>*, where *γ* is a given constant, plays the role of an additional terminal penalty for the "convenient" or "inconvenient" tack of the SV at the time *ϑ*. Here, the words "convenient" and "inconvenient" are used in the following sense. The tack of the SV at the time *ϑ* is considered "convenient" if Θ*<sub>ϑ</sub>* < 0, i.e., the component of the velocity of the SV along the *OY* axis is negative (the SV is moving away from the line of movement of the target *E*<sub>1</sub>). Otherwise, we considered the tack of the SV "inconvenient".

#### **5. Optimal Stochastic Control**

*5.1. Reduction of the Optimal Stochastic Control Problem to the Deterministic One*

It is known that solving stochastic optimization problems in real time is associated with certain difficulties [30]. For this reason, instead of the original stochastic problem (3), (9)–(14), we solved its deterministic analog. To construct this analog, we need the following auxiliary results.

The solution of Equation (3) has the form:

$$
\Theta\_t = e^{-Dt} \Theta\_0 + \sigma \int\_0^t e^{-D(t-s)} dw\_s. \tag{15}
$$

Integration (15) leads to the equation:

$$\int_0^t \Theta_s \, ds = \frac{\Theta_0}{D}\left(1 - e^{-Dt}\right) + \frac{\sigma}{D}\int_0^t \left(1 - e^{-D(t-s)}\right) dw_s. \tag{16}$$

Now, let us calculate the value of the criterion (13) with an arbitrary permissible program control *ut* and the parameter *ϑ* fixed at the moment *t*<sup>0</sup> = 0. To this end, we integrated the equations of motion (10) taking into account (16). We have:

$$\eta_{\vartheta} = \eta_0 + A\vartheta + B\frac{\Theta_0}{D}\left(1 - e^{-D\vartheta}\right) + B\frac{\sigma}{D}\int_0^{\vartheta}\left(1 - e^{-D(\vartheta - s)}\right) dw_s + \int_0^{\vartheta} u_s \, ds. \tag{17}$$

From (12) and (17) follows:

$$\mathbb{E}\eta_{\vartheta} = \eta_0 + A\vartheta + B\frac{\Theta_0}{D}\left(1 - e^{-D\vartheta}\right) + \int_0^{\vartheta} u_s \, ds = 0. \tag{18}$$

Finally, from (17) and (18), we obtain:

$$\mathbb{E}\,\eta_{\vartheta}^2 = \frac{\sigma^2}{D^2}\left[\vartheta - \frac{2}{D}\left(1 - e^{-D\vartheta}\right) + \frac{1}{2D}\left(1 - e^{-2D\vartheta}\right)\right]. \tag{19}$$

Thus, the (13) criterion takes the form:

$$\mathbb{E}G = \frac{\sigma^2}{D^2}\left[\vartheta - \frac{2}{D}\left(1 - e^{-D\vartheta}\right) + \frac{1}{2D}\left(1 - e^{-2D\vartheta}\right)\right] + \gamma e^{-D\vartheta}\Theta_0 \to \min_{u_t}. \tag{20}$$

Now, we transformed (18) by introducing a two-dimensional vector *ξ<sup>t</sup>* subordinate to the equation:

$$
\dot{\xi}_t = A + B\Theta_0 e^{-Dt} + u_t \tag{21}
$$

with boundary conditions:

$$
\xi_0 = \eta_0, \qquad \xi_{\vartheta} = 0. \tag{22}
$$

In terms of the vector *ξ<sub>t</sub>*, the desired deterministic analog is the following auxiliary problem of optimal (deterministic) control, which includes the equations of motion (21), the boundary conditions (22), the control constraint (9), and the terminal criterion *F*(*ϑ*) → min over *u<sub>t</sub>*, where *F*(*ϑ*) denotes the right-hand side of (20) with the additive constants −2*σ*<sup>2</sup>/*D*<sup>3</sup> and *σ*<sup>2</sup>/(2*D*<sup>3</sup>) excluded:

$$F(\vartheta) \triangleq \frac{\sigma^2}{D^2}\left[\vartheta + \frac{2}{D}e^{-D\vartheta} - \frac{1}{2D}e^{-2D\vartheta}\right] + \gamma e^{-D\vartheta}\Theta_0 \to \min_{u_t}. \tag{23}$$

#### *5.2. Pontryagin's Maximum Principle in the Auxiliary Optimal Problem (23)*

To solve the auxiliary problem, we used Pontryagin's maximum principle (PMP) [31]. Following the PMP procedure, we first constructed the Hamiltonian:

$$H = \lambda_{\xi} \cdot \left(A + B\Theta_0 e^{-Dt}\right) + \lambda_{\xi} \cdot u_t \to \max_{u_t}. \tag{24}$$

Here, the dot between the two-dimensional vectors means a scalar product, and *λξ* = *λξ* (*t*) is a conjugate variable corresponding to the phase variable *ξt*. From (24), we found the explicit form of the optimal control (here and further, the \* symbol indicates the optimal controls):

$$
u_t^* = \beta \frac{\lambda_{\xi}(t)}{|\lambda_{\xi}(t)|}. \tag{25}
$$

The conjugate variable satisfies [31]:

$$\dot{\lambda}\_{\xi}(t) = -\frac{\partial H}{\partial \xi}(t) = 0;\tag{26}$$

hence *λ<sub>ξ</sub>*(*t*) = *λ<sub>ξ</sub>* = const, which leads to *u*<sup>∗</sup><sub>*t*</sub> = *u*<sup>∗</sup> = const with |*u*<sup>∗</sup>| = *β*. In other words, the program motion of the controlled object is implemented in a straight line with the maximum possible speed. The transversality conditions at the instant *ϑ* are given by:

$$
\delta F(\vartheta) + \lambda_{\xi} \cdot \delta \xi_{\vartheta} - H \delta \vartheta = 0, \tag{27}
$$

where according to (23):

$$
\delta F(\vartheta) = \frac{\partial F(\vartheta)}{\partial \vartheta}\delta\vartheta = \frac{\sigma^2}{D^2}\left[1 - 2e^{-D\vartheta} + e^{-2D\vartheta}\right]\delta\vartheta - \gamma D e^{-D\vartheta}\Theta_0 \, \delta\vartheta. \tag{28}
$$

Following (27), (28):

$$H(\vartheta) = \frac{\sigma^2}{D^2}\left[1 - 2e^{-D\vartheta} + e^{-2D\vartheta}\right] - \gamma D e^{-D\vartheta}\Theta_0. \tag{29}$$

Integrating (21), taking into account (22), gives:

$$
\eta_0 + A\vartheta + B\frac{\Theta_0}{D}\left(1 - e^{-D\vartheta}\right) + u^*\vartheta = 0 \tag{30}
$$

that naturally coincides with (18) under *ut* = *u*∗.

Next, we put

$$\begin{cases} u^* \triangleq \beta(\cos \varphi, \sin \varphi), & \text{with} \quad \varphi = \text{const}, \\ \eta_0 \triangleq (x_0, y_0). \end{cases} \tag{31}$$

Then, from (30) and (31), we obtain in componentwise form the system of two equations with respect to *ϕ* and *ϑ*:

$$\begin{cases} x_0 - \vartheta + \beta\vartheta\cos\varphi = 0, \\ y_0 + \beta\vartheta\sin\varphi - \frac{\Theta_0}{D}\left(1 - e^{-D\vartheta}\right) = 0. \end{cases} \tag{32}$$

From (32) follows:

$$\begin{cases} \cos\varphi = (\vartheta - x_0)(\beta\vartheta)^{-1}, \\ \sin\varphi = \left[\frac{\Theta_0}{D}\left(1 - e^{-D\vartheta}\right) - y_0\right](\beta\vartheta)^{-1}, \end{cases} \tag{33}$$

where *ϑ* can be found as the least positive root of the equation that follows from the identity cos<sup>2</sup> *ϕ* + sin<sup>2</sup> *ϕ* = 1 applied to the right-hand sides of (33), namely:

$$(\vartheta - x_0)^2 + \left[\frac{\Theta_0}{D}\left(1 - e^{-D\vartheta}\right) - y_0\right]^2 = \beta^2\vartheta^2. \tag{34}$$

Formulas (33) and (34) allow us to find the velocity components of the controlled object and the time interval [0, *ϑ*] of its motion from the initial position to the end of the vector *d*.

If *Dϑ* in (34) is sufficiently large, then the term *e*<sup>−*Dϑ*</sup> is close to zero and can be omitted. In this case, (34) takes the form:

$$(\vartheta - x_0)^2 + \left(\frac{\Theta_0}{D} - y_0\right)^2 = \beta^2\vartheta^2. \tag{35}$$

Then, the instant *ϑ* can be found as the least root of the quadratic Equation (35):

$$\vartheta = \frac{x_0 - \sqrt{x_0^2 - (1 - \beta^2)\left(x_0^2 + \left(\frac{\Theta_0}{D} - y_0\right)^2\right)}}{1 - \beta^2}. \tag{36}$$

To construct a positional optimal control (feedback control) of the defender, the current moment *t* was taken as the initial *t*<sub>0</sub>, the current position (*x<sub>t</sub>*, *y<sub>t</sub>*) as the initial (*x*<sub>0</sub>, *y*<sub>0</sub>), and the current value of Θ*<sub>t</sub>* as the initial Θ<sub>0</sub>; after that, the instantaneous direction of the defender's velocity vector *u*<sup>∗</sup><sub>*t*</sub> was calculated using Formula (31) taking into account (33) and (36). Next, *u*<sup>∗</sup><sub>*t*</sub> was recalculated at the rate of updating the current information. Note that at a high update rate, it may be quite justified to use a piecewise program control of the defender, in which the control is recalculated only at certain moments, called correction moments, separated by intervals Δ*t<sub>u</sub>*. During these intervals, the defender moves programmatically according to the control *u*<sup>∗</sup><sub>*t*</sub> calculated at the previous step.
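The closed-form solution (31), (33), (36) is straightforward to evaluate numerically. The sketch below (our own naming) returns the interception instant and the constant control for one correction step, using the large-*Dϑ* approximation *e*<sup>−*Dϑ*</sup> ≈ 0:

```python
import math

def interception_control(x0, y0, Theta0, D, beta):
    """Solve the quadratic Equation (35) for its least root (36) and
    recover the control components from (31) and (33)."""
    c = x0 ** 2 + (Theta0 / D - y0) ** 2
    disc = x0 ** 2 - (1.0 - beta ** 2) * c
    if disc < 0.0:
        return None                          # no real interception instant
    theta = (x0 - math.sqrt(disc)) / (1.0 - beta ** 2)   # Formula (36)
    cos_phi = (theta - x0) / (beta * theta)              # from (33)
    sin_phi = (Theta0 / D - y0) / (beta * theta)
    return theta, (beta * cos_phi, beta * sin_phi)       # u* from (31)
```

For the initial relative position (10, 1) with *β* = 0.5 and *D* = 5, as in the first example of Section 6, this gives *ϑ* close to 7, and the returned control has magnitude exactly *β*, as required by (9).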

#### **6. Examples**

To demonstrate the effectiveness of the obtained optimal control, a numerical simulation was performed for two approaches to studying the interaction between the defender and the SV. These approaches differ in the mathematical description of the evolution of the *y*-component of the SV's velocity. In the first (discrete) approach, this component is piecewise constant, and its evolution is described as a jump-like Markov process *θ<sub>t</sub>* with three states (1, 0, −1) and the transition intensity matrix Λ from (2). The description of this process is given at the beginning of Section 2. In the second (continuous) approach, the evolution of the *y*-component of the SV's velocity vector is set by the Gaussian process Θ*<sub>t</sub>*, i.e., the continuous diffusion process (3).

In both approaches, the control of the defender was obtained through Equations (31), (33), and (36). In other words, the control of the defender is always calculated according to the continuous diffusive model (3) of the evolution of the *y*-component of the SV's velocity vector. Strictly speaking, as this control law is the result of the solution of the continuous problem, it need not always successfully solve the discrete problem simulated in the first approach. The idea of these experiments is to apply the solution of the continuous problem, which can be solved analytically, to the similar discrete practical model, which cannot be studied in the same convenient way. In all experiments, the vector *d* was considered to be zero, i.e., the defender has to intercept the SV itself.

Both approaches to the simulation are shown in further examples, which were devoted to two different applications of the studied interception problem.

The realization of the diffusive process Θ*<sub>t</sub>* was acquired in Maple with its package for stochastic equations. The approximate Formula (36) for *ϑ* was used together with the stochastic differential Equation (15). Thus, Maple allows integrating this equation numerically and obtaining the optimal trajectory of the defender, as well as the random trajectory of the SV corresponding to the process with the appropriate mathematical expectation and variance.

The more practical discrete jump-like process *θ<sub>t</sub>* was simulated in a Python script. The movement of the SV and the defender was computed with a very small discretization step Δ*t*, which determines the quality of the simulation. At each step, the SV, according to the model from Section 2, can change the direction of its *v<sub>y</sub>* velocity component with probability 2*λ*Δ*t* or keep it with probability (1 − 2*λ*Δ*t*). In practice, however, it is more convenient to simulate this process equivalently: the duration of each SV tack is sampled from an exponential distribution with mathematical expectation 1/*λ*, and the direction of the vertical velocity for the next tack is chosen from the two states different from the current one with probability 1/2 each. The defender, on the other hand, has its own parameter Δ*t<sub>u</sub>* and corrects its control law according to (36) every interval Δ*t<sub>u</sub>*, considering the current positions to be initial.
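The discrete simulation loop described above can be sketched as follows. This is our own illustrative reconstruction, not the authors' script: it uses the per-step jump probability 2*λ*Δ*t* rather than exponential sampling, and the current jump state is scaled by tan *α* to play the role of Θ*<sub>t</sub>* in the control formulas:

```python
import math
import random

def simulate_interception(x0, y0, lam, tan_alpha, beta, dt, dtu, seed=0):
    """Discrete interception simulation in the moving frame of the SV.
    The SV tacks as the jump process theta_t; the defender recomputes
    its constant control from (31), (33), (36) at correction moments
    separated by dtu and moves programmatically in between.
    Returns the terminal miss |Z| at the interception instant."""
    rng = random.Random(seed)
    D = 3.0 * lam
    theta = 0                       # current tack state of the SV
    x, y = x0, y0                   # relative position Z_t of the defender
    t = t_next = 0.0
    ux = uy = 0.0
    while t < 200.0:
        Theta = theta * tan_alpha   # diffusion-scale analog of theta_t
        if t >= t_next:             # correction moment
            c = x * x + (Theta / D - y) ** 2
            disc = x * x - (1.0 - beta ** 2) * c
            if disc < 0.0:
                break               # no real solution: stop and report miss
            th = (x - math.sqrt(disc)) / (1.0 - beta ** 2)
            if th <= dt:
                break               # interception instant reached
            ux = (th - x) / th      # beta*cos(phi), from (31) and (33)
            uy = (Theta / D - y) / th
            t_next += dtu
        # relative dynamics (8): Z' = u - (1, Theta)^T
        x += (ux - 1.0) * dt
        y += (uy - Theta) * dt
        if rng.random() < 2.0 * lam * dt:
            theta = rng.choice([s for s in (-1, 0, 1) if s != theta])
        t += dt
    return math.hypot(x, y)
```

Decreasing the correction interval `dtu`, as in Figure 8, reduces the terminal miss.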

#### *6.1. Intrusion in the Detection Zone*

The first application is the intrusion of the defender into the SV's detection zone to distract the SV from the target. In the normalized scale, the parameters are:

$$R = 1, \quad v_x = 1, \quad \tau_0 = 0.6.$$

Let tan *α* = 0.5. Then, the parameters of the Gaussian process Θ*<sub>t</sub>* are:

$$
\lambda = \frac{1}{\tau_0} \approx 1.67, \quad D = 3\lambda = 5, \quad \sigma = 2\tan\alpha\,\sqrt{\lambda} \approx 1.29.
$$

In the coordinate system associated with the initial position of the SV, the initial coordinates of the defender's position are (10, 1) in the normalized scale. The velocity of the defender was chosen as *β* = 0.5. The probability of the detection of the target following a parallel course from these coordinates equals **P**det = 0.5, which is higher than the accepted security threshold *h* = 0.07. Thus, according to the above-described security concept, the target must use the mobile defender.

The results of this experiment are shown in Figure 6. The red line depicts the trajectory of the defender, whereas the blue one, that of the SV. Figure 6a shows the evolution of the *y*-component of the SV's velocity according to Markov jump-like process *θt*. Figure 6b shows the trajectories of the vehicles for the diffusion approximation Θ*<sup>t</sup>* of the process *θt*. In Figure 6a, the black ellipse depicts the circular detection zone of radius *R*, which looks ellipsoidal due to the different scale of the *OX* and *OY* axes. In the case of the discrete model, the parameter Δ*tu* is equal to *τ*0. In the case of the continuous model, the calculation of the defender's optimal control is performed in time with the SV's information updating, i.e., almost continuously (Δ*tu* equals the simulation discretization step).

**Figure 6.** Intrusion of the SV's detection zone. (**a**) SV and defender trajectories corresponding to the path of *θt*; (**b**) SV and defender trajectories corresponding to the path of Θ*t*.

For the estimation of the time *ϑ*, Equation (36) was used. According to (36), the interception time is *ϑ* ≈ 7, which means *e*<sup>−*Dϑ*</sup> ≈ 0, i.e., 1 − *e*<sup>−*Dϑ*</sup> ≈ 1, so *u<sub>t</sub>* can be found from Equations (31), (33), and (36). One can see in Figure 6 that the trajectories of the defender for the discrete and continuous models of the SV's movement are quite close. The difference of the trajectories in the final sections is due to the significant duration of the interval Δ*t<sub>u</sub>* between the updates of the information about the SV and, thereby, between the corrections of the defender's program control in the discrete approach.

As one can see, the problem of interception was solved successfully, as the defender moving from the initial position with the found *u* control finally occurred in the close vicinity of the SV.

#### *6.2. Destruction of the SV*

The second application is the task of the destruction of the SV using the defender. To complete this task, the defender must come close enough to the SV. In the normalized scale:

$$
R = 1, \quad v_x = 1, \quad \tau_0 \approx 60.1.
$$

Let tan *α* = 0.5. Therefore:

$$
\lambda = 0.017, \quad D = 0.05, \quad \sigma = 0.13.
$$

In the coordinate system associated with the initial position of the SV, the initial coordinates of the defender are (300, 20) in the normalized scale. The velocity of the defender was chosen as *β* = 0.5. Since the target moves parallel to the general course of the SV, the detection probability equals **P**det = 0.37 > *h* = 0.07; thus, using the defender is justified.

The results of the modeling are presented in Figure 7. As in the first example, Figure 7a corresponds to the discrete approach to the simulation and the process *θt*, and Figure 7b relates to the continuous approach and the process Θ*t*.

**Figure 7.** Destruction of the SV. (**a**) SV and defender trajectories corresponding to the path of *θt*; (**b**) SV and defender trajectories corresponding to the path of Θ*t*.

The accuracy of the interception of the SV by the defender, or the so-called terminal miss, obviously depends on the parameter Δ*t<sub>u</sub>*, the time interval between corrections of the defender's control. Figure 8 presents the results of different simulations of the interception of the SV by the defender for the discrete approach. Figure 8a corresponds to the case Δ*t<sub>u</sub>* = *τ*<sub>0</sub>. The significant miss of the defender can be explained by the relatively long duration Δ*t<sub>u</sub>* of its movement without control correction and by an "inconvenient" realization of the tack, which, combined with the velocity advantage of the SV (*β* < 1), allowed the SV to avoid interception by the defender. However, decreasing the parameter Δ*t<sub>u</sub>* helped achieve more satisfactory results, as shown in Figure 8b. For two similar realizations of the process *θ<sub>t</sub>* (blue lines), the trajectories of the controlled defender (red lines) are clearly very different depending on the parameter Δ*t<sub>u</sub>* (*τ*<sub>0</sub> and *τ*<sub>0</sub>/10, respectively).

**Figure 8.** Interception trajectories with different values of Δ*tu*.

#### *6.3. Comparison with Classic Guidance Methods*

The optimal control law of the defender obtained here was compared with classic guidance methods, mentioned in the Introduction, such as the pursuit guidance method and parallel guidance, which is a specific case of the proportional navigation guidance method. On average, our method gave better results than the others. In Figure 9, a typical realization of different simulated guidance methods is presented. The orange line designates the trajectory of the defender, acting according to the pursuit guidance method; the red line denotes the trajectory generated by the parallel guidance algorithm; the blue graph shows the SV's movement. The defender, controlled according to Equations (31), (33) and (36), has a green trajectory. Dashed lines illustrate the distances on the *Y* axis between the SV and defender at instant *ϑ* when their *X*-coordinates coincide.

**Figure 9.** Comparison of different guidance methods.

As one can see, the green defender came closer to the SV than the others. Classic guidance methods are effective when the pursuer's velocity is higher than that of the evader. That is not the case in the current study, because the defender's velocity *β* is less than the velocity of the SV. Moreover, the classic guidance methods are not intended to be used for intercepting stochastic targets, unlike the control law obtained in this article as a solution of the stochastic optimal control problem.

#### **7. Conclusions**

The article considered one "attacker–target–defender"-type problem of the interaction on a plane between a search system, consisting of one search vehicle with a circular detection zone, and a mobile searched object. The search vehicle tacked randomly along a given general course towards the searched object, and its movement was described using a jump-like Markov process. The searched object had a mobile defender onboard, which can be used for the distraction and destruction of the search vehicle if it presents a danger to the searched object in the sense of its detection. A feature of this problem is that the defender has lower dynamic capabilities than the search vehicle being intercepted.

It was shown that the optimal control problem of the interception of a search vehicle, being stochastic in nature, can be transformed into a classic deterministic problem of optimal control in the class of piecewise-program controls. The optimal time of interception was estimated, and an optimal control law was found. Examples of numerical simulations for both the discrete and continuous (stochastic and deterministic) problems were presented to demonstrate the efficiency of the designed approach. Furthermore, a comparison with interception solutions based on classic guidance laws was presented.

In the future, it is planned to consider a similar problem statement with a group of search vehicles instead of one.

**Author Contributions:** Conceptualization, A.A.G. and E.Y.R.; methodology, A.A.G. and E.Y.R.; software, P.V.L.; validation, A.A.G., P.V.L. and E.Y.R.; formal analysis, A.A.G., P.V.L. and E.Y.R.; investigation, A.A.G. and E.Y.R.; writing—original draft preparation, A.A.G., P.V.L. and E.Y.R.; writing—review and editing, A.A.G., P.V.L. and E.Y.R.; visualization, P.V.L.; supervision, A.A.G. and E.Y.R.; project administration, A.A.G.; funding acquisition, A.A.G. All authors have read and agreed to the published version of the manuscript.

**Funding:** This paper was partially supported by the Program of Basic Research of RAS.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data sharing not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


## *Article* **Methods of Ensuring Invariance with Respect to External Disturbances: Overview and New Advances**

**Aleksey Antipov 1,\*, Svetlana Krasnova <sup>1</sup> and Victor Utkin <sup>2</sup>**


**Abstract:** In this paper, we carry out a demonstration and comparative analysis of known methods of the synthesis of various control laws ensuring the invariance of the output (controlled) variable with respect to external disturbances under various assumptions about their type and channels of acting on the control plant. Methods of the synthesis are presented on the example of a third-order nonlinear system with single input and single output (SISO-systems), dynamic feedback synthesis is presented at a descriptive level and the focus is on procedures of static feedback synthesis. For the systems in which the matching conditions are not satisfied, it is concluded that it is expedient to introduce smooth and bounded nonlinear local feedbacks. Within the framework of the block control principle, we developed an iterative procedure of synthesis of S-shaped sigmoid feedbacks for such systems. Nonlinear local feedbacks ensure stabilization of the output variable with the given accuracy and settling time as in a system with traditionally used linear local feedbacks with high gains. However, in contrast to it, sigmoid functions do not lead to a large overshoot of state variables and control actions.

**Keywords:** external disturbances; invariance; block control principle; decomposition; high-gain factors; sliding mode control; sigmoid function

#### **1. Introduction**

The basic issue of automatic control theory is the tracking problem, which consists in the convergence of the output variables to admissible reference signals with the given performance of the transient and steady-state processes. The main efforts of researchers are aimed at solving this problem for control plants operating under the action of external uncontrolled disturbances. The methods for the synthesis of invariant systems used at the present stage are quite diverse; however, their effectiveness and applicability depend on many factors. Our first goal is therefore to systematize the existing methods of disturbance suppression and compensation and to formalize the requirements on the degree of certainty of the control plant under which it is advisable to use one approach or another. We also present the methods for synthesizing the corresponding control laws. The results of the survey are presented in Sections 2 and 3. Our second goal, addressed in Section 4, is to propose a new, more universal approach to the synthesis of invariant systems with nonlinear feedback, which concentrates the advantages of the classical methods. Moreover, this approach gives an effective result in cases where the classical methods are not applicable.

To strengthen the methodological component of the presented material, we will consider all the stated approaches on the example of a single-channel nonlinear minimum-phase system of the third order operating under the action of external uncontrolled disturbances. The given synthesis procedures for a third-order system describe all the features of the presented methods in full, so the algorithms can be easily extended to similar systems of a higher order. In this sense, without loss of generality, the considered control plant model can be interpreted as one of the subsystems of the external dynamics equations of a multichannel system [1].

**Citation:** Antipov, A.; Krasnova, S.; Utkin, V. Methods of Ensuring Invariance with Respect to External Disturbances: Overview and New Advances. *Mathematics* **2021**, *9*, 3140. https://doi.org/10.3390/math9233140

Academic Editors: Natalia Bakhtadze, Igor Yadykin, Andrei Torgashov and Nikolay Korgin

Received: 10 November 2021; Accepted: 3 December 2021; Published: 6 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

For the sake of simplicity of presentation, let us suppose that the mathematical model of the control plant has a relative degree of three and is representable in the following canonical input–output form:

$$\begin{aligned} \dot{x}\_1 &= x\_2 + \eta\_1(t), \\ \dot{x}\_2 &= x\_3 + \eta\_2(t), \\ \dot{x}\_3 &= f(x) + b(x)u + \eta\_3(t), \end{aligned} \tag{1}$$

where $x = (x_1, x_2, x_3)^{\mathrm{T}} \in X \subset R^3$ is the measured state vector, $X$ is an open bounded region; $x_1 \in R$ is the controlled variable (output), $u \in R$ is the control action (input); $b(x) \neq 0$, $x \in X$ is the structural requirement needed for the controllability of the system. In system (1), $\eta_i(t)$ are unknown functions of time that depend on external deterministic disturbances and other uncertainties in the description of the control plant model, and are bounded in modulus by known constants:

$$|\eta\_i(t)| \le H\_i = \text{const} > 0, \ t \ge 0, \ i = \overline{1,3}.\tag{2}$$

The assumptions about the smoothness/non-smoothness of these functions, as well as the requirements on the definiteness of $f(x)$, $b(x)$, will be refined below.

Note that the output variable $x_1(t)$ can represent a tracking error, i.e., the residual between the controlled variable and the reference signal. We can assume that the analytical form of the reference signal is not known and that only its measured current values are available. Its derivative is then assumed to be an unknown bounded function that is additively included in $\eta_1(t)$.

It should be understood that in the presence of persistent disturbances $\eta_{1,2}(t)$, stabilization of all state variables of system (1) is not possible for any control law: in the closed-loop system, the variables $x_2(t)$ and $x_3(t)$ will have to describe the external actions $\eta_1(t)$ and $\eta_2(t)$, respectively. Therefore, for system (1), we pose the problem of synthesizing a feedback that ensures stabilization of only the output variable $x_1(t)$, which in the general case can be achieved with some accuracy,

$$|x_1(t)| \le \Delta_1, \ t \ge t_1. \tag{3}$$

Further, it is assumed that the value $\Delta_1 > 0$ is given. The settling time $t_1 > 0$ depends on the initial conditions. In addition, a requirement on a given settling time often leads to cumbersome constructions and conservative estimates in the selection of the regulator parameters. Therefore, in the review sections, we consider sufficient conditions for solving problem (3) without a given settling time. A complete solution of problem (3) will be given in the presentation of the authors' method.

Then, for the system (1) the known and new methods of solving the posed problem are considered under various assumptions. The article is structured as follows. In Section 2, we consider a particular case of system (1), when an external disturbance acts on the same channel as the control (matched disturbance). The methods of synthesis and the results of a comparative analysis of the following approaches of solving the problem (3) are presented:


In Section 3, we deal with the system (1) with unmatched disturbances. The main attention is paid to the case when external disturbances are not smooth. For the solution of the posed problem (3), a standard procedure of block synthesis of linear local feedbacks with high-gain factors is presented. The advantages and disadvantages of this method are described, and a conclusion about the advisability of introducing smooth and bounded nonlinear feedbacks in practical applications is made.

In Section 4 a new approach developed by the authors and implemented in practical applications is presented. Sufficient conditions of the posed problem (3) solution for the given settling time are formalized, and a constructive procedure of block synthesis of nonlinear sigmoid local feedbacks is developed. In the conclusion, the prospect for the further development of the results presented in Section 4 is indicated.

#### **2. Feedback Synthesis Methods in a System with a Matched Disturbance**

The most developed case in automatic control theory is the one in which the functions with parametric uncertainties and the external disturbances are affine and act in the control space. In this case, the disturbances are said to be matched with the control, and the matching conditions are satisfied. For example, for a linear system

$$\dot{x} = Ax + Bu + Q\eta(t)$$

the matching conditions have the form [1,2]

$$\mathrm{Im}\,Q \subset \mathrm{Im}\,B \Leftrightarrow \mathrm{rank}\,B = \mathrm{rank}(B \ \ Q).$$

This means that the columns of the matrix $Q$ are linear combinations of the columns of the matrix $B$; therefore, the original system can be represented as

$$\dot{x} = Ax + B(u + \Lambda\eta(t)), \quad Q = B\Lambda.$$

For system (1), the matching conditions take the form

$$\eta_i(t) \equiv 0, \ t \ge 0, \ i = 1, 2. \tag{4}$$

Thus, in this section, we consider the special case of system (1) under condition (4):

$$\begin{aligned} \dot{x}_1 &= x_2, \\ \dot{x}_2 &= x_3, \\ \dot{x}_3 &= f(x) + b(x)u + \eta_3(t), \end{aligned}$$

where the requirements on the smoothness of the functions *f*(*x*), *η*3(*t*) are generally not imposed.

Note that if *x*1(*t*) is a tracking error, then *x*2(*t*) and *x*3(*t*) are the first and second derivatives of the tracking error, which depend on the first and second derivatives of a given signal and are supposed to be known functions of time. Uncertainty is allowed only for the third derivative of the given signal, which is bounded and additively included in *η*3(*t*).

In contrast to the general case, in Systems (1) and (4) with matched disturbance it is possible to ensure the stabilization of all state variables using:


According to the first approach, firstly, complete definiteness of the factor $b(x)$ multiplying the control is required. Secondly, we need to estimate the unknown disturbance $\eta_3(t)$ by some method ensuring that the estimation error $\Delta\eta(t) = \eta_3(t) - \hat{\eta}_3(t)$ decreases asymptotically or converges rather quickly to some small vicinity of zero,

$$\lim_{t \to +\infty} \Delta\eta(t) = 0 \ \text{ or } \ |\Delta\eta(t)| \le \delta, \ t \ge t_0, \ 0 < t_0 < t_1.$$

The obtained estimate $\hat{\eta}_3(t)$ is used for the synthesis of the combined control

$$u = -\left(\varphi(x) + \hat{\eta}_3(t)\right)/b(x),$$

where $\varphi(x)$ is a stabilizing component. If the function $f(x)$ is completely defined, then we can linearize the closed-loop system by the feedback

$$u = -\frac{1}{b(x)}\left(f(x) + \hat{\eta}_3(t) + \sum_{i=1}^{3} c_i x_i\right),\tag{5}$$

where $c_i > 0$ are the coefficients of the stable polynomial $\lambda^3 + c_3\lambda^2 + c_2\lambda + c_1$. The closed-loop system (1), (4), (5) has the form

$$\begin{aligned} \dot{x}_1 &= x_2, \\ \dot{x}_2 &= x_3, \\ \dot{x}_3 &= -c_1 x_1 - c_2 x_2 - c_3 x_3 + \Delta\eta(t), \end{aligned}$$

where, in the general case, the given control accuracy (3) is ensured. In particular, when the estimation error decreases asymptotically, asymptotic stabilization of the whole state vector and, hence, of the output variable

$$\lim_{t \to +\infty} x_1(t) = 0,\tag{6}$$

occurs. In both cases, by the selection of $c_i > 0$ we can ensure the required characteristics of the transient process of the output variable.
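As a numerical sketch of the combined control (5), consider an assumed illustrative plant with $f(x) = x_1 x_2$, $b(x) = 1 + 0.1x_3^2$ and a matched disturbance $\eta_3(t) = 0.5\sin 2t$ whose estimate is taken to be exact (so $\Delta\eta \equiv 0$); none of these specific functions come from the paper.

```python
import numpy as np

# Assumed plant (illustrative, not from the paper); exact disturbance estimate.
def f(x): return x[0]*x[1]
def b(x): return 1.0 + 0.1*x[2]**2
def eta3(t): return 0.5*np.sin(2.0*t)

c = np.array([6.0, 11.0, 6.0])  # lambda^3+6l^2+11l+6 = (l+1)(l+2)(l+3), stable

def u(x, t):
    # combined control (5): cancels f, compensates the estimated disturbance
    # and assigns the poles of the closed-loop linear system
    return -(f(x) + eta3(t) + c @ x) / b(x)

dt, x, t = 1e-3, np.array([1.0, 0.0, 0.0]), 0.0
for _ in range(15_000):                       # 15 s of explicit Euler
    dx = np.array([x[1], x[2], f(x) + b(x)*u(x, t) + eta3(t)])
    x, t = x + dt*dx, t + dt

print(abs(x[0]) < 1e-3)                       # prints True: x1 -> 0 as in (6)
```

With exact cancellation the closed loop reduces to the linear system with poles $-1, -2, -3$, so the output decays asymptotically as in (6).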

The standard approach to obtaining an estimate $\hat{\eta}_3(t)$ of an external disturbance is to expand the state space with a dynamic model simulating the action of the external disturbance and to construct an extended observer [3–5]. In the case of parametric uncertainty of the control plant model, identification and adaptation algorithms are additionally used to estimate the unknown parameters [6–8].

However, the implementation of these approaches will lead to large estimation errors if the parameters and disturbances vary significantly during the operation of the control plant and the adopted model does not describe these changes adequately. On the other hand, taking into account all possible variations of the external disturbances will lead to an unacceptable expansion of the dynamic model, a significant complication of the controller, and an increase in the computing time of the control signal. An alternative to introducing a model of the external influences is the construction of an observer based on the model of the control plant, which makes it possible, under certain conditions, to obtain estimates of the unknown inputs without a dynamical model of them [9–12].

The second approach to ensuring invariance does not require estimation of the external disturbance and consists in suppressing it by discontinuous controls with the organization of sliding modes, or by continuous feedbacks with high-gain factors. As a rule, these are linear controls.

To organize a sliding mode in system (1), (4), it is necessary to specify the switching surface (plane)

$$s = c_1 x_1 + c_2 x_2 + x_3,$$

where $c_{1,2} > 0$ are the coefficients of the stable polynomial $\lambda^2 + c_2\lambda + c_1$, and to introduce the discontinuous control law

$$u = -M\,\mathrm{sign}(b(x))\,\mathrm{sign}(s), \quad \mathrm{sign}(b(x)) = \mathrm{const},$$

where *M* = const > 0 is the amplitude of discontinuous control, sign(*s*) is the sign function

$$\mathrm{sign}(s) = \begin{cases} -1, & s < 0, \\ +1, & s > 0, \end{cases}$$

whose value is undefined when $s = 0$ but is bounded on the interval $[-1; 1]$.

Within the framework of this method, complete certainty of $f(x)$, $b(x)$ is not required, but the bounds of their variation are assumed to be known:

$$\begin{array}{l} |f(x(t))| \le F, \\ 0 < b\_{\min} \le |b(x)| \le b\_{\max}, \ x \in X, \ t \ge 0 \end{array} \tag{7}$$

A sufficient condition for the occurrence of a sliding mode on the plane $s = 0$ has the form of the inequality $s\dot{s} < 0$ [12–14], where

$$\begin{cases} \dot{s} = c\_2 s - c\_2 c\_1 x\_1 + (c\_1 - c\_2^2) x\_2 + f(x) - M|b(x)|\text{sign}(s) + \eta\_3(t), \\\left| c\_2 s(t) - c\_2 c\_1 x\_1(t) + (c\_1 - c\_2^2) x\_2(t) \right| \le C, \ t \ge 0. \end{cases}$$

These conditions are satisfied when the amplitude is selected according to the inequality

$$M > (C + F + H_3)/b_{\min}.\tag{8}$$

When determining the upper estimate $C$ for the admissible region of initial conditions $|x(0)| \le X_0$, it is necessary to estimate the region of variation of the variables of the closed-loop system

$$\begin{aligned} \dot{x}_1 &= x_2, \\ \dot{x}_2 &= -c_1 x_1 - c_2 x_2 + s \end{aligned}$$

taking into account that $|s(t)| \le |s(0)|$, $t \ge 0$.

When (8) is valid, the requirement $s\dot{s} \le |s|(C + F + H_3 - Mb_{\min}) < 0$ is satisfied, and the sliding mode arises on the plane $s = 0$ in a finite time $t > t_0$, $0 < t_0 < t_1$. In the sliding mode, the dynamic order of the closed-loop system decreases:

$$\begin{aligned} \dot{x}_1 &= x_2, \\ \dot{x}_2 &= -c_1 x_1 - c_2 x_2, \quad s(t) = 0, \ t \ge t_0, \end{aligned}$$

and the stability of the accepted polynomial implies asymptotic stabilization of the output variable (6).
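A minimal numerical sketch of this sliding-mode design, using an assumed plant ($f(x) = \sin x_1$, $b \equiv 2$, $\eta_3(t) = \cos 3t$, so $F = H_3 = 1$) and explicit Euler integration; the amplitude $M$ is taken above the bound (8) for these values:

```python
import numpy as np

def f(x): return np.sin(x[0])           # assumed uncertainty, |f| <= F = 1
b = 2.0                                 # b_min = 2, sign(b) = +1
c1, c2 = 1.0, 2.0                       # lambda^2 + 2*lambda + 1 is stable
M = 5.0                                 # above the bound (8) for this run

dt, x, t = 1e-4, np.array([1.0, 0.0, 0.0]), 0.0
for _ in range(200_000):                # 20 s of simulated time
    s = c1*x[0] + c2*x[1] + x[2]
    u = -M*np.sign(s)                   # discontinuous control
    dx = np.array([x[1], x[2], f(x) + b*u + np.cos(3.0*t)])
    x, t = x + dt*dx, t + dt

# After the sliding mode arises on s = 0, x1 decays per the reduced dynamics.
print(abs(x[0]) < 1e-2)                 # prints True
```

Note that discretization makes $s$ chatter in a band of width $O(dt)$, which is why $x_1$ converges to a small neighborhood of zero rather than exactly to zero.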

Thus, according to this method, the synthesis problem is divided into two successively solved subproblems of lower dimension:


$$\dot{s} = c\_2 s - c\_2 c\_1 x\_1 + (c\_1 - c\_2^2) x\_2 + f(x) - M|b(x)|\text{sign}(s) + \eta\_3(t)$$

Note that the use of discontinuous controls is natural in the presence of inertia-free electrical actuators that function exclusively in the switching (relay) mode. In this case, the implementation of a constant amplitude is a standard technical solution. Now let us consider systems in which there are no such electrical actuators and only continuous control is permissible. Another method based on disturbance suppression is to use linear controls with high-gain factors [14–16]. For system (1), (4), we introduce a linear feedback instead of the discontinuous control

$$u = -k\,\mathrm{sign}(b(x))\,s, \quad \mathrm{sign}(b(x)) = \mathrm{const},$$

where $k = \mathrm{const} > 0$ is a high-gain factor inversely proportional to the desired accuracy of suppression of the matched disturbances and uncertainties: $|s(t)| \le \Delta$, $t \ge t_0$, $0 < t_0 < t_1$. Taking into account

$$\begin{aligned} \dot{s} &= c_2 s - c_2 c_1 x_1 + (c_1 - c_2^2)x_2 + f(x) - k|b(x)|s + \eta_3(t), \\ &\left|(c_1 - c_2^2)x_2(t) - c_2 c_1 x_1(t)\right| \le C_1, \ t \ge 0, \end{aligned}$$

the selection of the high-gain factor from inequality

$$k > \frac{C\_1 + F + H\_3}{\Delta b\_{\text{min}}} + \frac{c\_2}{b\_{\text{min}}} \tag{9}$$

will ensure that the sufficient condition $s\dot{s} \le |s|(C_1 + F + H_3 - (kb_{\min} - c_2)|s|) < 0$ is satisfied outside the region $|s(t)| \le \Delta$, to which the variable $s(t)$ converges in a finite time. When $t \ge t_0$, the closed-loop system can be represented in the form

$$\begin{aligned} \dot{x}_1 &= x_2, \\ \dot{x}_2 &= -c_1 x_1 - c_2 x_2 + s, \quad |s(t)| \le \Delta, \end{aligned} \tag{10}$$

which ensures the control goal (3), where $\Delta_1$ depends on $\Delta$ and the accepted $c_{1,2}$.
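The same assumed plant as before ($f(x) = \sin x_1$, $b \equiv 2$, $\eta_3(t) = \cos 3t$; illustrative values, not from the paper) can sketch suppression by the linear high-gain law: the residual $s$ converges to a band whose width shrinks as $k$ grows.

```python
import numpy as np

def f(x): return np.sin(x[0])           # assumed uncertainty, F = 1
b = 2.0
c1, c2 = 1.0, 2.0
k = 50.0                                # high gain; steady |s| ~ (C1+F+H3)/(k*b_min)

dt, x, t = 1e-4, np.array([1.0, 0.0, 0.0]), 0.0
for _ in range(200_000):                # 20 s of explicit Euler
    s = c1*x[0] + c2*x[1] + x[2]
    u = -k*np.sign(b)*s                 # continuous linear feedback in s
    dx = np.array([x[1], x[2], f(x) + b*u + np.cos(3.0*t)])
    x, t = x + dt*dx, t + dt

s = c1*x[0] + c2*x[1] + x[2]
print(abs(s) < 0.05)                    # prints True: s stays in a small band
```

Unlike the sliding-mode run, the control signal here is continuous, at the price of $\varepsilon$-accuracy instead of exact stabilization of $s$.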

Note that if we know $f(x)$, $b(x)$ exactly, combined control laws can be formed whose resources are spent only on the suppression of the external disturbance. The combined control law, selected on the basis of the virtual system $\dot{s} = c_1 x_2 + c_2 x_3 + f(x) + b(x)u + \eta_3(t)$,

$$u = -\left(c_1 x_2 + c_2 x_3 + f(x) + M\,\mathrm{sign}(s)\ [\text{or } ks]\right)/b(x)$$

leads to the closed system

$$\begin{aligned} \dot{x}_1 &= x_2, \\ \dot{x}_2 &= -c_1 x_1 - c_2 x_2 + s, \\ \dot{s} &= -M\,\mathrm{sign}(s)\ [\text{or } -ks] + \eta_3(t) \end{aligned}$$

and, when $M > H_3$ [or $k > H_3/\Delta$], ensures the fulfillment of (6) [or (3)].

The main restriction of the synthesis method based on high-gain factors is that it is unrealizable in practical applications. To satisfy the constraints on control actions, continuous piecewise-linear controls in the form of saturation functions are used [17,18], which are a hybrid of linear and discontinuous controls. These functions are bounded, and they tend to the sign function as the high-gain factor increases. Consequently, in the closed-loop system, saturation functions ensure properties similar to those of systems operating in a sliding mode, to within some accuracy.

For system (1), (4), let us consider a feedback in the form of a saturation function

$$u = -M\,\mathrm{sign}(b(x))\,\mathrm{sat}(\overline{k}s), \quad \mathrm{sign}(b(x)) = \mathrm{const},$$

where $M = \mathrm{const} > 0$ is the amplitude and $\overline{k} = \mathrm{const} > 0$ is the high-gain factor:

$$M\,\mathrm{sat}(\overline{k}s) = \begin{cases} M\,\mathrm{sign}(s), & |s| > 1/\overline{k}, \\ M\overline{k}s, & |s| \le 1/\overline{k}. \end{cases}$$

The amplitude is selected as in the system with discontinuous control (8), which ensures $|s(t)| \le 1/\overline{k} \le \Delta$, $t > t_0$, when $|s(0)| > 1/\overline{k}$. The selection $\overline{k} \ge 1/\Delta$ ensures the desired stabilization accuracy and, as a result, the fulfillment of (10) and (3).
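The piecewise-linear law $M\,\mathrm{sat}(\overline{k}s)$ is easy to express with a clipping helper (the numeric values below are purely illustrative):

```python
import numpy as np

# M*sat(kbar*s): linear with slope M*kbar inside the boundary layer
# |s| <= 1/kbar, saturated at +/-M outside it.
def m_sat(s, M, kbar):
    return M*np.clip(kbar*s, -1.0, 1.0)

print(m_sat(0.001, 5.0, 100.0))   # inside the layer:  0.5
print(m_sat(1.0,   5.0, 100.0))   # outside the layer: 5.0, i.e. M*sign(s)
```

Inside the boundary layer the law behaves as the linear high-gain feedback; outside it, as the bounded discontinuous control.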

Significantly, in contrast to a discontinuous control law with constant amplitude, whose value does not vary in modulus during the whole control process, the values of the saturation control automatically decrease in modulus in the steady-state mode (this fact is also valid for linear continuous control). This occurs due to the stabilization of the state variables: when $t > t_1$, the control signal describes only the external disturbance $\eta_3(t)$ with small non-decreasing components.

Thus, the combined control makes it possible to compensate for external matched disturbances, but for this it is necessary to obtain their estimates and to identify the unknown parameters of the system. In the case when the combined control cannot be realized, it remains to use controls aimed at suppressing the external disturbances and model uncertainties. The selection of the type of control depends on the properties of the system and on the existing design requirements on smoothness and boundedness.

#### **3. Block Synthesis of Linear Local Feedbacks in System with Unmatched Disturbances**

The most difficult are control plants with unmatched disturbances (when conditions (4) are not satisfied), which cannot be compensated or suppressed by the true control. In the tracking problem, these disturbances also include the derivatives of the reference signals. In addition, the problem of ensuring invariance with respect to disturbances is posed only for the controlled outputs (tracking errors), since the remaining variables have to describe the corresponding external influences. According to the classical approach to the synthesis of a tracking system under the assumption of smoothness of the external influences, the state space is expanded by means of generators of the reference and external influences, as well as the corresponding dynamic observers and parameter identifiers [1]. In this case, the dynamic order of the closed-loop system can increase five or more times (in comparison with the dimension of the control plant model) if the external disturbances (and the corresponding autonomous models) vary significantly during the control process. If it is possible to formulate a model that accurately describes the dynamics of the external disturbances, then asymptotic stabilization of the tracking errors is theoretically achieved by expanding the state space.

Another approach is to represent the model of the control plant in the canonical or block input-output form, with the differentiation of external signals. In the process of obtaining this form, mixed variables are generated, which are the functions of state variables with additive external influences and their derivatives [19,20]. For system (1) under the assumption of differentiability of external disturbances *η*1,2(*t*), the canonical system in mixed variables has the form

$$\begin{aligned} \dot{\overline{x}}_1 &= \overline{x}_2, \\ \dot{\overline{x}}_2 &= \overline{x}_3, \\ \dot{\overline{x}}_3 &= f(x) + b(x)u + \overline{\eta}_3(t), \end{aligned} \tag{11}$$

where

$$\overline{x}_1 = x_1, \quad \overline{x}_2 = x_2 + \eta_1(t), \quad \overline{x}_3 = x_3 + \eta_2(t) + \dot{\eta}_1(t), \quad \overline{\eta}_3(t) = \eta_3(t) + \dot{\eta}_2(t) + \ddot{\eta}_1(t).$$

In the last equation of the system (11), the initial variables *x* are left in the arguments of the functions *f*(*x*), *b*(*x*) for the convenience of synthesis. Structurally, system (11) repeats system (1), (4) with matched disturbances, since all uncertainties are concentrated in the control space and are subject to compensation or suppression using the control laws presented in Section 2.

The feature of this approach to ensuring invariance is that the problem of evaluating the external influences separately is not considered, and the autonomous models generating them are not introduced into the constructions. Assuming that the output variable $\overline{x}_1 = x_1$ is measured, an observer is constructed based on the transformed system (11) with an indefinite input. Due to the suppressing corrective action of the observer, it provides estimates of the mixed variables and uncertainties for forming the feedback, and it increases the dynamic order of the closed-loop system by no more than a factor of two. As a rule, in this case, $\varepsilon$-invariance of the output variable with respect to the external unmatched disturbances is achieved.

However, the mentioned approaches are not applicable in the case when external unmatched disturbances and other model uncertainties are not smooth enough and cannot be differentiated. An example is shock loads and dry friction forces when controlling mechanical objects, taking into account the dynamics of actuators [21–25]. In the particular case, when a non-smooth disturbance is separated from the true control by one integrator, it can be suppressed using "vortex" control with continuous and discontinuous components. The result is achieved due to the organization of an oscillatory transient process in the system, in which part of the state variables automatically compensates for the influence of unknown terms [26].

In the general case, when external unmatched disturbances, which cannot be estimated, compensated, or suppressed by the true control, act on the control plant, it remains to use the possibilities of disturbance suppression by means of local feedbacks. The methodological basis for the implementation of this approach is provided by decomposition methods and the block control principle [16,27]. According to this approach, using a non-degenerate change of variables, the equations of the external dynamics are reduced to a block input–output form with an affine occurrence of the fictitious and true controls. It consists of elementary blocks, in each of which the dimension of the controlled variables is equal to the rank of the matrix multiplying the fictitious controls, which are the variables of the next block. For the general case of a controllable minimum-phase nonlinear system of the $n$-th order with affine external influences $\eta$, the block form is the following [20]:

$$\begin{cases}
\dot{\mathbf{x}}\_{1} = f\_{1}(\mathbf{x}\_{1}) + B\_{1}(\mathbf{x}\_{1})\mathbf{x}\_{2} + Q\_{1}(\mathbf{x}\_{1})\eta; \\
\dot{\mathbf{x}}\_{i} = f\_{i}(\mathbf{x}\_{1}, \dots, \mathbf{x}\_{i}) + B\_{i}(\mathbf{x}\_{1}, \dots, \mathbf{x}\_{i})\mathbf{x}\_{i+1} + Q\_{i}(\mathbf{x}\_{1}, \dots, \mathbf{x}\_{i})\eta\_{i}; \ i = \overline{2, r-1}; \\
\dot{\mathbf{x}}\_{r} = f\_{r}(\mathbf{x}\_{1}, \dots, \mathbf{x}\_{r}) + B\_{r}(\mathbf{x}\_{1}, \dots, \mathbf{x}\_{r})\mathbf{u} + Q\_{r}(\mathbf{x}\_{1}, \dots, \mathbf{x}\_{r})\eta\_{r}
\end{cases}$$

where $B_i \in R^{p_i \times p_{i+1}}$, $i = \overline{1, r-1}$, $B_r \in R^{p_r \times p_r}$, $Q_i \in R^{p_i \times 1}$, $\dim x_i = \mathrm{rank}\,B_i = p_i$, $i = \overline{1, r}$, $p_1 + p_2 + \ldots + p_r = n$.

Stabilizing local feedbacks are formed sequentially (from top to bottom) in each block and are implemented by the selection of the true control. When the block form is obtained, the external influences are not differentiated and do not participate in the transformations, but with the block organization they become matched with the fictitious controls. Then, with an appropriate selection of the fictitious controls, it is possible to stabilize the output variables with some accuracy.

Let us explain the essence of the block control principle using the example of system (1), which, as we see, is a special case of the block form and consists of three elementary blocks of the first order. In the first and second equations, the variables *x*<sup>2</sup> and *x*3, respectively, are treated as fictitious controls, with which the bounded disturbances *η*<sup>1</sup> and *η*<sup>2</sup> are matched, respectively. The smoothness requirement is not imposed on external disturbances. The question arises about the selection of the form of stabilizing functions in fictitious and true controls, that would ensure the invariance of the output variable with respect to external disturbances by suppressing them.

As shown above, the classical methods of suppressing external and parametric bounded disturbances acting in the control space are: (1) continuous linear feedbacks with high-gain factors; (2) discontinuous controls bounded in modulus with the organization of sliding modes. Of these, only controls of the first type (due to their smoothness) can be used to form local feedbacks. We emphasize once again that with the help of linear local feedbacks in a system with unmatched disturbances, it is possible to ensure stabilization of the controlled variable only with a certain accuracy (3).

For system (1), let us consider the standard step-by-step procedure of block synthesis of linear local feedbacks with high-gain factors under the action of unmatched bounded disturbances [16]. It consists of the following stages: (1) introduction of local feedbacks (stabilizing fictitious controls) by non-degenerate change of variables of the original system (1) to residuals between real and adopted fictitious controls; (2) the selection of the control law; (3) setting the parameters of the feedback that meets the control goal. We represent the first stage in the form of the following procedure, which for system (1) consists of three steps and is similarly extended to systems of any order presented in the block form of controllability.

**Procedure 1**: Non-degenerate transformation with the introduction of linear local feedbacks.

Step 1. In the first equation of system (1), we introduce the linear local feedback $x_2^* = -k_1 x_1$, $k_1 = \mathrm{const} > 0$, and the residual between the actual and the selected fictitious control

$$e_2 = x_2 - x_2^* = x_2 + k_1 x_1. \tag{12}$$

Taking into account the notation $e_1 = x_1$ and (12), the first equation of system (1) takes the form

$$
\dot{e}\_1 = -k\_1 e\_1 + e\_2 + \eta\_1. \tag{13}
$$

Step 2. Let us write the differential equation for the residual (12) by (1) and (13)

$$\dot{e}_2 = x_3 + \eta_2 + k_1\dot{e}_1 = -k_1^2 e_1 + k_1 e_2 + x_3 + \eta_2 + k_1\eta_1,$$

where we form a combined fictitious control with a linear stabilizing component $x_3^* = k_1^2 e_1 - k_2 e_2$, $k_2 = \mathrm{const}$, $k_2 > k_1$, and a residual between the actual and the selected fictitious control

$$
e_3 = x_3 - x_3^* = x_3 - k_1^2 e_1 + k_2 e_2. \tag{14}
$$

Taking (14) into account, the second equation of system (1) takes the form

$$
\dot{e}\_2 = -(k\_2 - k\_1)e\_2 + e\_3 + \eta\_2 + k\_1 \eta\_1. \tag{15}
$$

Step 3. Let us write the differential equation for the residual (14) by (1) and (15)

$$\begin{aligned} \dot{e}_3 &= f(x) + b(x)u + \eta_3(t) - k_1^2\dot{e}_1 + k_2\dot{e}_2 \\ &= k_1^3 e_1 - \left(k_1^2 + k_2^2 - k_2 k_1\right)e_2 + k_2 e_3 + f(x) + b(x)u + \eta_3 + k_2\eta_2 + \left(k_2 k_1 - k_1^2\right)\eta_1. \end{aligned} \tag{16}$$

The procedure is over.

Thus, we have obtained system (13), (15), (16) by a non-degenerate linear transformation of system (1). The final transformation matrix is the product of the transformation matrices of the first (12) and second (14) steps of the procedure (in the indicated order):

$$P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -k_1^2 & k_2 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ k_1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ k_1 & 1 & 0 \\ k_1 k_2 - k_1^2 & k_2 & 1 \end{pmatrix}, \quad \det P \neq 0.$$
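The composition of the two steps and the non-degeneracy of $P$ can be checked numerically; the gains $k_1 = 2$, $k_2 = 5$ below are sample values chosen only for illustration.

```python
import numpy as np

k1, k2 = 2.0, 5.0   # sample gains with k2 > k1, purely illustrative
step2 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [-k1**2, k2, 1.0]])
step1 = np.array([[1.0, 0.0, 0.0], [k1, 1.0, 0.0], [0.0, 0.0, 1.0]])
P = step2 @ step1   # product in the indicated order

print(P[2])                  # last row: [k1*k2 - k1**2, k2, 1]
print(np.linalg.det(P))      # ~1: unit lower triangular, hence nondegenerate
```

Since both factors are unit lower triangular, so is $P$, and its determinant equals 1 for any choice of the gains.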

For simplicity of presentation, we will consider the case of complete definiteness of functions *f*(*x*), *b*(*x*), which allows us to accept a combined true control in the form

$$u = -\left(k\_1^3 e\_1 - \left(k\_1^2 + k\_2^2 - k\_2 k\_1\right) e\_2 + k\_2 e\_3 + f\left(\mathbf{x}\right) + \varphi(e\_3)\right) / b\left(\mathbf{x}\right). \tag{17}$$

The closed system (13) and (15)–(17) takes the form

$$\begin{aligned} \dot{e}_1 &= -k_1 e_1 + e_2 + \eta_1, \\ \dot{e}_2 &= -(k_2 - k_1)e_2 + e_3 + \eta_2 + k_1\eta_1, \\ \dot{e}_3 &= -\varphi(e_3) + \eta_3 + k_2\eta_2 + (k_2 k_1 - k_1^2)\eta_1. \end{aligned} \tag{18}$$

The stabilizing component $\varphi(e_3)$ of the control law (17) must ensure the suppression of the linear combination of disturbances $\eta_3 + k_2\eta_2 + (k_2 k_1 - k_1^2)\eta_1$ and the stabilization of the variable $e_3$. For this, either a discontinuous control, or a linear control with a high-gain factor, or their piecewise-linear continuous hybrid in the form of a saturation function is applied, namely:

$$\begin{aligned} &(1)\ \varphi(e_3) = M\,\mathrm{sign}(e_3); \\ &(2)\ \varphi(e_3) = k_3 e_3; \\ &(3)\ \varphi(e_3) = M\,\mathrm{sat}(\overline{k}_3 e_3), \end{aligned} \tag{19}$$

where *M*, *k*3, *k̄*3 = const > 0. Note that the control laws (19) are formed by the variable *e*3 (14), which is a linear combination of the measured state variables of the original system (1).

As shown above, in the first case of (19), when the amplitude is selected based on the inequality

$$M > H\_3 + k\_2 H\_2 + (k\_2 k\_1 - k\_1^2) H\_1 \tag{20}$$

the sufficient condition *e*3·*ė*3 < 0 is satisfied. The sliding mode arises on the plane *e*3 = 0 in a finite time *t* > *t*0 > 0, and the dynamical order of the system is reduced. In the second case of (19), the high-gain factor is selected taking into account the specified stabilization accuracy, similarly to (9), namely:

$$k\_3 > \frac{H\_3 + k\_2 H\_2 + (k\_2 k\_1 - k\_1^2) H\_1}{\Delta\_3}.\tag{21}$$

In the third case of (19), the lower bounds for selecting the parameters of the piecewise-linear control have the form (20) and *k̄*3 > 1/Δ3, Δ3 > 0. In both the second and third cases, the convergence of the variable to some neighborhood of zero is ensured:

$$|e\_3(t)| \le \Delta\_3, \ t > t\_0. \tag{22}$$
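The three stabilizing laws (19) can be compared on a minimal numerical sketch (not from the paper): a scalar error equation *ė*3 = −*φ*(*e*3) + *d*(*t*), where the assumed test disturbance *d*(*t*) and its bound *D* stand in for the linear combination *η*3 + *k*2*η*2 + (*k*2*k*1 − *k*1²)*η*1 and its bound in (20), (21).

```python
import math

# Illustrative scalar error dynamics e3' = -phi(e3) + d(t) under the three
# stabilizing laws (19); d(t) and the bound D are assumed test data.
def d(t):
    return 0.8 * math.sin(2 * t) + 0.3           # |d(t)| <= 1.1 < D

D, Delta3 = 1.2, 0.1
M = 1.5                                          # amplitude, M > D as in (20)
k3 = 15.0                                        # gain, k3 > D/Delta3 as in (21)
sat = lambda z: max(-1.0, min(1.0, z))

laws = {
    "sign":   lambda e: M * math.copysign(1.0, e),
    "linear": lambda e: k3 * e,
    "sat":    lambda e: M * sat(k3 * e),         # k3 also exceeds 1/Delta3
}

dt, T = 1e-3, 5.0
for name, phi in laws.items():
    e, t, tail = 0.5, 0.0, 0.0                   # tail = max |e3| over t >= 3
    while t < T:
        e += dt * (-phi(e) + d(t))               # explicit Euler step
        t += dt
        if t >= 3.0:
            tail = max(tail, abs(e))
    assert tail <= Delta3                        # accuracy (22) holds
```

In the discontinuous case the residual error is only the Euler chattering of order *M*·d*t*; in the linear and saturation cases it settles inside the designed neighborhood Δ3.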

Using (22), let us consider the procedure for selecting the high-gain factors *k*1 > 0, *k*2 > 0 based on the second Lyapunov method. We introduce a Lyapunov function candidate as the sum of two terms *V* = *V*1 + *V*2, *V<sub>i</sub>* = *e<sub>i</sub>*<sup>2</sup>/2, *i* = 1, 2, and estimate their derivatives by (2), (13) and (15):

$$\begin{array}{l} e\_1\dot{e}\_1 \le |e\_1|(|e\_2| + H\_1 - k\_1|e\_1|),\\ e\_2\dot{e}\_2 \le |e\_2|(|e\_3| + H\_2 + k\_1H\_1 - (k\_2 - k\_1)|e\_2|). \end{array} \tag{23}$$

It follows from inequalities (23) that the sufficient stability condition *V̇* < 0 is met if the high-gain factors satisfy the inequalities

$$k\_1 > \frac{H\_1 + \Delta\_2}{\Delta\_1}, \ |e\_3| \le \Delta\_3, \ |e\_2| \le \Delta\_2, \ |e\_1| > \Delta\_1; \quad k\_2 > \frac{H\_2 + k\_1 H\_1 + \Delta\_3}{\Delta\_2} + k\_1, \ |e\_3| \le \Delta\_3, \ |e\_2| > \Delta\_2. \tag{24}$$

Thus, first, we set the desired stabilization accuracy Δ*i*, *i* = 1, 3 of the virtual variables *e* = (*e*1, *e*2, *e*3)<sup>T</sup>. Then, with a sequential (top-down) selection of high-gain factors based on inequalities (24) and (21), the variables of the closed system (18) and (19) sequentially (bottom-up) converge into the given neighborhoods of zero

$$|e\_3(t)| \le \Delta\_3 \implies |e\_2(t)| \le \Delta\_2 \implies |e\_1(t)| \le \Delta\_1, \tag{25}$$

and the control goal (3) is achieved. When selecting the high-gain factors, one should take into account that as *k*1 increases, the accuracy (3) improves (in the limiting case, Δ1 → 0 as *k*1 → +∞) and the settling time decreases. However, due to the unboundedness of linear controls, this leads to the well-known problem of large overshoot [28]. On the other hand, in practical applications control resources are always bounded, so there is an upper bound of selection *k*1 ≤ *k*1max and a corresponding minimum achievable tracking error Δ1,min ≤ Δ1.

The bounded control in the form of a saturation function is not smooth. On the one hand, this is not an obstacle when such functions are used in the corrective actions of state observers for systems with disturbances [20,29]. On the other hand, it narrows the possibilities of applying them as fictitious controls in practical problems.

Summing up, we can conclude that for the universal formation of invariant local feedbacks and the practical realizability of control algorithms, it is advisable to use smooth analogs of the saturation function. These include transcendental S-shaped functions: the arctangent, the hyperbolic tangent, the logistic function, etc. The odd hyperbolic tangent th(*x*) = 1 − 2/(exp(2*x*) + 1) appears to be a constructive tool for the analysis and synthesis of nonlinear control. This bounded function is expressed through the exponential, and its derivatives are also bounded everywhere and are recursively expressed through the antiderivative.

In this paper, a modification of the hyperbolic tangent, which is more convenient for constructions, is used in the form of a sigmoid function *σ*(*x*) = −th(−*x*/2). Its properties and the corresponding synthesis procedure developed by the authors are presented in the next section and constitute the main result of this work.

#### **4. Block Synthesis of Nonlinear Local Feedbacks in Systems with Unmatched Disturbances**

Let us consider a smooth and bounded sigmoid function

$$\sigma(kx) = \frac{2}{1 + \exp(-kx)} - 1, \; k = \text{const} > 0,$$

which is defined on the whole number axis and has the following properties: *σ*(−*kx*) = −*σ*(*kx*); *σ*(*kx*) ≈ *kx*/2 as *x* → 0; *σ*(*kx*) → sign(*x*) as *k* → +∞. A factor *k* is specially introduced in its argument; in further constructions it plays the role of a high-gain factor in a small neighborhood of zero. The derivative of the sigmoid function has a recursive form:

$$
\sigma'(kx) = k(1 - \sigma^2(kx))/2 > 0, \; x \in \mathbb{R}, \quad \sigma'(-kx) = \sigma'(kx).
$$

To simplify the analysis of a nonlinear sigmoid function, let us establish its analogy with a piecewise linear saturation function. Consider some neighborhood of zero with radius Δ > 0. The following estimates

$$\begin{array}{ll} \sigma(k\Delta) < |\sigma(kx)| < 1, \ 0 < \sigma'(kx) < \sigma'(k\Delta), \ |x| > \Delta; \\ \frac{\sigma(k\Delta)|x|}{\Delta} \le |\sigma(kx)| \le \sigma(k\Delta), \ \sigma'(k\Delta) \le \sigma'(kx) \le \sigma'(0) = \frac{k}{2}, \ |x| \le \Delta \end{array} \tag{26}$$

are valid for the sigmoid function and its derivative on the indicated intervals. Inequalities (26) demonstrate that for |*x*| > Δ the sigmoid function is close to a constant, and for |*x*| ≤ Δ it is close to a linear function. To formalize the abscissa of this division, we introduce the parameter *c* = const > 0: |*x*| = Δ = *c*/*k*, which it is advisable to select from the interval

$$k\Delta = c \in [1.3; 3]. \tag{27}$$

where ±1.3 are the abscissas of the inflection points of the first derivative, *σ*‴(±1.3) = 0, with *σ*(±1.3) ≈ ±0.57 and *σ*′(±1.3) ≈ 0.34*k*; ±3 are the abscissas of the vertices of the sigmoid function, at which its curvature reaches its maximum, with *σ*(±3) ≈ ±0.9 and *σ*′(±3) ≈ 0.095*k* [19].

For the convenience of calculations, we take

$$
c = 2.2; \ \sigma(c) \approx 0.8; \ \frac{1}{\sigma(c)} \approx 1.25; \ \sigma'(c) \approx 0.18k. \tag{28}
$$

Using (28), estimates (26) take the following form:

$$\begin{array}{l} 0.8 < |\sigma(kx)| < 1, \ 0 < \sigma'(kx) < 0.18k, \ |x| > c/k, \ c = 2.2; \\ \frac{0.8k|x|}{2.2} \le |\sigma(kx)| \le 0.8, \ 0.18k \le \sigma'(kx) \le \sigma'(0) = \frac{k}{2}, \ |x| \le c/k. \end{array} \tag{29}$$
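The numerical landmarks in (27)–(29) are easy to verify directly from the definition of the sigmoid function:

```python
import math

# Numeric check of the sigmoid landmarks used in (27), (28) and (29).
def sigma(z):
    """sigma(kx) with the argument kx already multiplied out."""
    return 2.0 / (1.0 + math.exp(-z)) - 1.0

def dsigma(z):
    """Derivative per unit k: sigma'(kx)/k = (1 - sigma^2)/2."""
    return (1.0 - sigma(z) ** 2) / 2.0

assert abs(sigma(1.3) - 0.57) < 0.01      # sigma(+-1.3) ~ +-0.57
assert abs(dsigma(1.3) - 0.34) < 0.01     # sigma'(+-1.3) ~ 0.34 k
assert abs(sigma(3.0) - 0.9) < 0.01       # sigma(+-3) ~ +-0.9
assert abs(sigma(2.2) - 0.8) < 0.005      # (28): sigma(c) ~ 0.8 at c = 2.2
assert abs(1.0 / sigma(2.2) - 1.25) < 0.01
assert abs(dsigma(2.2) - 0.18) < 0.005    # (28): sigma'(c) ~ 0.18 k
assert dsigma(0.0) == 0.5                 # sigma'(0) = k/2
```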

Let us explain the idea of using sigmoid feedback and the selection of its parameters in the problem of ensuring invariance using the example of an elementary system with an external disturbance

$$
\dot{x} = \eta(t) + u, \tag{30}
$$

where *x* ∈ *R* is the state variable and *η*(*t*) is the external disturbance, described by a deterministic, unknown, but bounded function of time. No smoothness requirement is imposed on it; it is sufficient that it be piecewise continuous. The problem is posed of stabilizing system (30) with a given accuracy using the sigmoid control

$$
u = -m\sigma(kx) \tag{31}
$$

with a constant amplitude *m* = const > 0 and high-gain factor *k* = const > 0.

**Lemma 1.** *If in system (30), (31) the external disturbance is bounded by a known constant* |*η*(*t*)| ≤ *H* = const > 0, *t* ≥ 0, *then for any arbitrarily small* Δ > 0, *T* > 0 *and any initial values x*(0) *from some bounded domain X*0 ≥ |*x*(0)|, *there exist positive real numbers k̄ and m̄ such that for any k* ≥ *k̄*, *m* ≥ *m̄*, *the following inequality is valid*

$$|x(t)| \le \Delta, \ t \ge T. \tag{32}$$

**Proof.** Let us introduce the parametric dependence (27); then, with respect to (28) and (29), the lower estimates (33) are valid for control (31) on the indicated intervals. To analyze the stability of closed system (30) and (31), we use the second Lyapunov method. We introduce a Lyapunov function candidate *V* = *x*²/2 and estimate its derivative on the indicated intervals taking into account (33):

$$|u(x)| = |m\sigma(kx)| \ge \begin{cases} 0.8m, \ |x| > \Delta, \\ 0.8mk|x|/2.2, \ |x| \le \Delta. \end{cases} \tag{33}$$

$$\dot{V} = x(\eta(t) - m\sigma(kx)) \le \begin{cases} |x|(H - 0.8m), \ |x| > \Delta, \\ |x|(H - 0.8mk|x|/2.2), \ |x| \le \Delta. \end{cases} \tag{34}$$

It follows from (34) that the derivative of the Lyapunov function is negative if the feedback parameters satisfy the following conditions:

$$\begin{array}{l} 0.8m > H \Leftrightarrow m > 1.25H, \\ k > \frac{H}{0.8m} \cdot \frac{2.2}{\Delta}. \end{array} \tag{35}$$

The fulfillment of the first inequality (35) means that the state variable will converge into the region |*x*| ≤ Δ or will not leave it if it was there initially. In addition, the fulfillment of the second inequality guarantees stabilization with a given accuracy (32), namely:

$$|x| \le \frac{H}{0.8m} \Delta < \Delta.$$

Using 0 < *H*/(0.8*m*) < 1, it is possible to simplify the lower bound for selecting the high-gain factor in comparison with the second inequality of (35) and take

$$k \ge \overline{k} = 2.2/\Delta. \tag{36}$$

In the general case |*x*(0)| > Δ, to guarantee that the state variable reaches the given region in a given time *T* > 0, let us increase the lower bound for the selection of the amplitude. With respect to the estimate of the solution of system (30) and (31) on the interval *t* ∈ [0; *T*]

$$|x(t)| \le |x(0)| + (H - 0.8m)T \le \Delta. \tag{37}$$

we obtain

$$m \ge \overline{m} = 1.25 \left( \frac{X\_0 - \Delta}{T} + H \right), \ X\_0 > \Delta. \tag{38}$$

Thus, we have defined *k̄* (36) and *m̄* (38) such that for any *k* ≥ *k̄*, *m* ≥ *m̄*, the stabilization of the state variable with the given accuracy and in the given time (32) is ensured in the closed system (30) and (31). Lemma 1 is proved.
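Lemma 1 can be illustrated with a short simulation; the disturbance *η*(*t*) = sin(3*t*), accuracy Δ, time *T* and initial domain *X*0 below are assumed test data, with *k* and *m* taken from the bounds (36) and (38).

```python
import math

# Illustrative check of Lemma 1 on assumed data (not from the paper):
# x' = eta(t) + u with u = -m*sigma(k*x) and eta(t) = sin(3t), so H = 1.
sigma = lambda z: 2.0 / (1.0 + math.exp(-z)) - 1.0

H, Delta, T, X0 = 1.0, 0.05, 1.0, 2.0
k = 2.2 / Delta                        # (36): k >= 2.2/Delta
m = 1.25 * ((X0 - Delta) / T + H)      # (38): m >= 1.25((X0 - Delta)/T + H)
m = max(m, 5.0)                        # any m above the bound is admissible

x, t, dt = X0, 0.0, 1e-3
worst = 0.0                            # max |x| over t >= T
while t < 2.0:
    x += dt * (math.sin(3 * t) - m * sigma(k * x))   # explicit Euler step
    t += dt
    if t >= T:
        worst = max(worst, abs(x))

assert worst <= Delta                  # target accuracy (32) for t >= T
```

In the steady state the trajectory sits well inside the Δ-neighborhood, at roughly (2/*k*)·arth(*H*/*m*), as the quasi-static balance *mσ*(*kx*) = *η*(*t*) suggests.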

As can be seen, the sigmoid control, like the piecewise-linear saturation function, is bounded everywhere and contains two adjustable parameters. The selection of the amplitude provides the given time of convergence of the controlled variable to a certain neighborhood of zero, and the selection of the high-gain factor sets the radius of this neighborhood, i.e., the given stabilization accuracy. In a first-order system, the transient process is monotonic.

We use the results obtained in Lemma 1 to stabilize the output variable of system (1) taking into account (2) and (7) under the following assumptions: the requirements of smoothness of external disturbances are not imposed, the functions *f*(*x*), *b*(*x*) are not required to be completely defined, the sign of *b*(*x*) is constant and known. In further constructions, we will take into account the given settling time (3), which is guaranteed for all initial values of the variables from the bounded admissible region

$$|\mathbf{x}\_1(0)| \le X\_1, \ |\mathbf{x}\_2(0)| \le X\_2, \ |\mathbf{x}\_3(0)| \le X\_3. \tag{39}$$

As a methodological basis of the synthesis procedure, we use the block control principle, demonstrated in Section 3 for the synthesis of linear feedbacks. Let us emphasize that the idea of the approach proposed below is similar to backstepping [30]. The main differences of our approach are that it does not require smoothness of the functions *f*(*x*), *b*(*x*) and *ηi*(*t*), *i* = 1, 3; we use static feedback, do not expand the state space, and do not aim to obtain estimates of the existing uncertainties. To avoid the large overshoot typical of linear feedbacks with high-gain factors, we select the stabilizing fictitious controls in the form of smooth and bounded sigmoid functions

$$x\_i^\* = -m\_{i-1} \sigma(k\_{i-1} e\_{i-1}), \; k\_{i-1} = \text{const} > 0, \; m\_{i-1} = \text{const} > 0, \; i = 2, 3,$$

where *e*<sup>2</sup> and *e*<sup>3</sup> are the residuals between the variables *x*<sup>2</sup> and *x*3, respectively, and the selected fictitious controls

$$
e\_{i} = x\_{i} - x\_{i}^{\*} = x\_{i} + m\_{i-1}\sigma(k\_{i-1}e\_{i-1}), \; i = 2, 3; \quad e\_{1} = x\_{1}. \tag{40}
$$

For uniformity, true control is also accepted as a sigmoid function

$$
u = -\text{sign}(b)m\_3\sigma(k\_3e\_3), \, k\_3 = \text{const} > 0, \, m\_3 = \text{const} > 0. \tag{41}
$$

Note that to simplify the computational implementation, instead of (41), one can also use a continuous, bounded, but non-smooth saturation function or discontinuous control in systems with electric actuators as a true control.

Also note that, unlike Procedure 1 with linear transformations, in the changes of variables (40) and control law (41) we did not compensate the nonlinear components that do not depend on external disturbances, so as not to complicate the control function.

Let us rewrite closed system (1), (41) with respect to residuals (40)

$$\begin{aligned} \dot{e}\_1 &= -m\_1 \sigma(k\_1 e\_1) + e\_2 + \eta\_1, \\ \dot{e}\_2 &= -m\_2 \sigma(k\_2 e\_2) + e\_3 + \eta\_2 + \Lambda\_1, \\ \dot{e}\_3 &= -|b(\mathbf{x})| m\_3 \sigma(k\_3 e\_3) + f(\mathbf{x}) + \eta\_3 + \Lambda\_2. \end{aligned} \tag{42}$$

where terms

$$\Lambda\_i = m\_i \frac{k\_i(1 - \sigma^2(k\_i e\_i))}{2} \dot{e}\_i, \ i = 1, 2, \tag{43}$$

are the derivatives of the corresponding fictitious controls, which arise in the transition to the new coordinate basis (40).

There is no need to change the arguments of the functions *b*(*x*) and *f*(*x*) in the last equation of transformed system (42), since constraints (7) are specified in terms of the variables of the original system (1), and the specific form of these functions does not matter for the formation of control law (41).

We will perform feedback synthesis according to the block approach in terms of virtual system (42). The idea is that the sigmoid fictitious and true controls, introduced into each subsystem using the non-degenerate change of variables (40) and feedback (41), serve to suppress external uncontrolled disturbances. This will ensure the stabilization of the residuals *ei*, *i* = 1, 3 with any given accuracy. By virtue of the inverse change of variables (40), this means that in the closed system (1) with nonlinear control (41), which is realized in the form

$$u = -\text{sign}(b)m\_3\sigma(k\_3(\mathbf{x}\_3 + m\_2\sigma(k\_2(\mathbf{x}\_2 + m\_1\sigma(k\_1\mathbf{x}\_1))))),\tag{44}$$

in the steady state, the variables *x*2(*t*) and *x*3(*t*) describe the external disturbances *η*1(*t*) and *η*2(*t*), respectively. In addition, the stabilization accuracy of the output variables of both systems will be the same. Thus, the fulfillment of the objective condition in closed system (42) and (41)

$$|e\_1(t)| \le \Delta\_1, \ t \ge t\_1, \tag{45}$$

is equivalent to solving the problem (3).

As shown in Section 3, the block approach in multidimensional systems consists in sequentially solving elementary synthesis problems in subsystems (blocks) similar to (30). However, only the last subsystem is directly regulated by the true control, and in the rest, the variables of the next block act as fictitious controls. As a consequence, in the general case of nonzero initial conditions only in the last block, a monotonic transient process is guaranteed.

Sufficient conditions of the existence of feedback parameters *mi*, *ki*, *i* = 1, 2, 3 that ensure the fulfillment of objective condition (45) in the system (42) are formulated in Lemma 2. In the process of constructive proof, a step-by-step procedure of adjusting the amplitudes of sigmoid controls was formalized, in which the decomposition principle is implemented [31,32].

**Lemma 2.** *Let us consider closed system (1), (44), presented in the form (42) using non-degenerate changes of variables. If conditions (2) and (7) are satisfied for this system, then for any initial values of the variables from the bounded domain (39) and for any, arbitrarily small* Δ1 > 0, *t*1 > 0, *there exist real numbers k̄i* > 0, *i* = 1, 3, 0 < *m̄i* < *m̿i*, *i* = 1, 2, *m̄*3 > 0, *such that for any ki* ≥ *k̄i*, *mi* : *m̄i* < *mi* ≤ *m̿i*, *m*3 ≥ *m̄*3, *inequality (45) is satisfied*.

**Constructive Proof.** According to the ideology of the block approach, in the closed system (42) it is necessary to provide the following sequence of convergence of residuals:

$$|e\_3| \le \Delta\_3 \ (t \ge t\_3 > 0) \Rightarrow |e\_2| \le \Delta\_2 \ (t \ge t\_2 > t\_3) \Rightarrow |e\_1| \le \Delta\_1 \ (t \ge t\_1 > t\_2), \tag{46}$$

where Δ1 > 0, *t*1 > 0 are given by (45), and Δ2,3 > 0 are assigned arbitrarily. The dependences of *t*2,3 on the initial conditions and the accepted Δ2,3 > 0 are established in the course of the proof.

Lemma 1 demonstrates the existence of *k̄i* > 0, *i* = 1, 3 such that for any *ki* ≥ *k̄i*, *i* = 1, 3 the desired radii Δ1,2,3 > 0 (46) of the neighborhoods of zero are guaranteed, into which the residuals converge in the indicated times (46). With respect to (28) and similarly to (36), we fix the values of the high-gain factors based on the inequalities

$$k\_i^\* \ge \overline{k}\_i = 2.2 / \Delta\_i, \ i = \overline{1,3}. \tag{47}$$

In (47) and below, using the symbol ∗ in the superscript, we will denote specific accepted numerical values of the parameters.

Increasing the accepted values *k*∗*i* leads to a decrease in the stabilization errors of the residuals. The convergence of the residuals into the established areas in the specified time (46) is ensured by the selection of *mi*, *i* = 1, 3.

The stabilization of system (42) is carried out "from the bottom up" (46). Sufficient conditions of the selection of amplitudes, similar to the first inequality in (35), are valid when the indicated conditions are met:

$$\begin{array}{l} 0.8m\_1 > H\_1 + \Delta\_2, \ |e\_2| \le \Delta\_2, \\ 0.8m\_2 > H\_2 + |\Lambda\_1| + \Delta\_3, \ |e\_3| \le \Delta\_3, \\ 0.8b\_{\text{min}}m\_3 > F + H\_3 + |\Lambda\_2|. \end{array} \tag{48}$$

The fulfillment of (47) and (48) ensures the sequential stabilization of the residuals with a given accuracy without taking into account the convergence time, which depends on the initial conditions. In a particular case, the fulfillment of (47) and (48) will ensure |*ei*(*t*)| ≤ Δ*i*, *i* = 1, 3 at *t* ≥ 0, i.e., the goal of control (45) is achieved.

In the tracking system, the following variants of the initial conditions are also of interest: if |*e*3(0)| ≤ Δ3 and |*e*2(0)| > Δ2, then the transient process of *e*2(*t*) will be monotonic; if |*ei*(0)| ≤ Δ*i*, *i* = 2, 3 and |*e*1(0)| > Δ1, then the transient process of *e*1(*t*) will be without overshoot.

In the remaining particular cases, as well as in the general case |*ei*(0)| > Δ*i*, *i* = 1, 3, within the framework of these constructions a monotonic transient process is guaranteed only for the variable *e*3(*t*).

Until the variables of the lower blocks of system (42) reach the specified neighborhoods (46), the variables of the upper blocks grow in absolute value and reach their maximum value no later than at the following times:

$$|e\_3(t)| \le |e\_3(0)| = e\_{3,\text{max}}, \ |e\_2(t)| \le |e\_2(t\_3)| = e\_{2,\text{max}}, \ |e\_1(t)| \le |e\_1(t\_2)| = e\_{1,\text{max}}, \ t \ge 0. \tag{49}$$

By (39) and (40), we estimate the ranges of initial values of the variables of system (42)

$$|e\_1(0)| \le X\_1, \ |e\_i(0)| \le X\_i + m\_{i-1}, \ i = 2, 3. \tag{50}$$

Using (42), (48) and (50) and taking into account that the proper motions in closed system (42) are stable, we estimate the maximum values (49)

$$\begin{array}{l} e\_{1,\text{max}} \le X\_1 + (e\_{2,\text{max}} - \Delta\_2)t\_2, \\ e\_{2,\text{max}} \le X\_2 + m\_1 + (e\_{3,\text{max}} - \Delta\_3)t\_3, \\ e\_{3,\text{max}} \le X\_3 + m\_2. \end{array} \tag{51}$$

To ensure the given convergence time, it is necessary to increase the lower bounds of the selection of amplitudes (48). First, we give estimates of the derivatives of fictitious controls (43). They differ at different intervals and depend on the corresponding estimates of the derivatives of the sigmoid functions and the derivatives of the corresponding residuals (42). Using (48), for the derivatives of the residuals, the following estimates are valid:

$$\begin{aligned} t \in [0; t\_2) &: \ |\dot{e}\_1(t)| \le \underbrace{H\_1 + \Delta\_2}\_{< 0.8m\_1} + e\_{2,\text{max}} - \Delta\_2 + m\_1 < 2m\_1 + e\_{2,\text{max}} - \Delta\_2, \\ t \ge t\_2 &: \ |\dot{e}\_1(t)| \le H\_1 + \Delta\_2 + m\_1 < 2m\_1, \\ t \in [0; t\_3) &: \ |\dot{e}\_2(t)| \le \underbrace{H\_2 + |\Lambda\_1| + \Delta\_3}\_{< 0.8m\_2} + e\_{3,\text{max}} - \Delta\_3 + m\_2 < 2m\_2 + e\_{3,\text{max}} - \Delta\_3, \\ t \ge t\_3 &: \ |\dot{e}\_2(t)| \le H\_2 + |\Lambda\_1| + \Delta\_3 + m\_2 < 2m\_2. \end{aligned} \tag{52}$$

For the derivative of the sigmoid function, by (29) on the indicated intervals, we have

$$\begin{array}{l} |e\_{i}(t)| > 2.2/k\_{i}, \ t \in [0; t\_{i}): \ 0 < 0.5k\_{i}(1 - \sigma^{2}(k\_{i}e\_{i})) < 0.18k\_{i}, \\ |e\_{i}(t)| \le 2.2/k\_{i}, \ t \ge t\_{i}: \ 0.18k\_{i} \le 0.5k\_{i}(1 - \sigma^{2}(k\_{i}e\_{i})) \le 0.5k\_{i}, \ i = 1, 2. \end{array} \tag{53}$$

Combining (52) and (53), we obtain estimates of the derivatives of fictitious controls (43) on the indicated intervals

$$|\Lambda\_i| = m\_i \frac{k\_i(1 - \sigma^2(k\_i e\_i))}{2} |\dot{e}\_i| \le \begin{cases} 0.36 k\_i m\_i^2 + 0.18 k\_i m\_i (e\_{i+1, \text{max}} - \Delta\_{i+1}), \ t \in [0; t\_{i+1}); \\ 0.36 k\_i m\_i^2, \ t \in [t\_{i+1}; t\_i); \\ k\_i m\_i^2, \ t \ge t\_i; \ i = 1, 2. \end{cases}$$

In order to accept the uniform estimate

$$|\Lambda\_i| \le k\_i m\_i^2, \ t \ge 0; \ i = 1, 2,\tag{54}$$

we need to provide 0.18*ki**mi*(*e*<sub>*i*+1,max</sub> − Δ<sub>*i*+1</sub>) ≤ 0.64*ki**mi*² ⇒ *e*<sub>*i*+1,max</sub> − Δ<sub>*i*+1</sub> ≤ 3.5*mi*, *i* = 1, 2. For this, we introduce constraints on the peak values of the residuals, slightly lowering the limiting estimates for the convenience of calculations:

$$
e\_{i,\text{max}} \le 3m\_{i-1} + \Delta\_i, \ i = 2, 3. \tag{55}
$$

For consistency, a limitation on the overshoot of the output variable can also be introduced:

$$|e\_1(0)| \le X\_1 < e\_{1,\text{max}} \le E\_1. \tag{56}$$

In the particular case |*e*1(0)| < Δ1, choosing *e*1,max ≤ *E*1 = Δ1 provides |*e*1(*t*)| ≤ Δ1, *t* ≥ 0.

With respect to (55) and (56), inequalities (51) take the form

$$\begin{array}{l} e\_{1,\text{max}} \le X\_1 + 3m\_1 t\_2 \le E\_1, \\ e\_{2,\text{max}} \le X\_2 + m\_1 + 3m\_2 t\_3 \le 3m\_1 + \Delta\_2, \\ e\_{3,\text{max}} \le X\_3 + m\_2 \le 3m\_2 + \Delta\_3, \end{array} \tag{57}$$

whence additional conditions follow, which must be taken into account when selecting *t*2,3 (0 < *t*<sup>3</sup> < *t*<sup>2</sup> < *t*1) and amplitudes of fictitious controls:

$$0 < m\_1 \le \frac{E\_1 - X\_1}{3t\_2}, \; 0 < m\_2 \le \frac{2m\_1 + \Delta\_2 - X\_2}{3t\_3};\tag{58}$$

$$m\_1 > \frac{X\_2 - \Delta\_2}{2}, \ m\_2 > \frac{X\_3 - \Delta\_3}{2}. \tag{59}$$

Note that, according to constructions (48), *m*<sub>*i*−1</sub> > Δ*i*, *i* = 2, 3, while Δ*i* > 0, *i* = 2, 3 can be taken less than or greater than the values *Xi*; no smallness requirement is imposed on them. To simplify the calculations, one can initially fix Δ*i* = *Xi*, *i* = 2, 3, which removes the need to check conditions (59).

In the general case Δ*i* < *Xi*, *i* = 2, 3, the inequalities for the lower bound of the selection of the amplitudes *mi* contain two basic components. The first component *m*<sub>*i*1</sub>, as well as *m*3 similarly to (38), ensures the convergence of the residuals *e*1(*t*), *e*2(*t*), *e*3(*t*) on the intervals [*t*2; *t*1], [*t*3; *t*2], [0; *t*3], respectively, from the peak values (51), (57) into the given areas in the given time (46). The second component *m*<sub>*i*2</sub> provides the implementation of constraints (59). In addition, in contrast to the amplitude of the true control *m*3, which is selected only from a lower estimate, there are upper constraints (58) on the selection of the amplitudes of the fictitious controls.

Let us formalize a step-by-step procedure of sequential, "top-down" selection of the amplitudes of the sigmoid controls and the admissible times *t*2,3 for the given Δ1, *t*1, the assigned *E*1 (56) and Δ2,3 > 0, and the *k*∗*i*, *i* = 1, 3 adopted on their basis (47). During the procedure, variation of the free parameters is allowed.

**Procedure 2**. Selection of sigmoid feedback amplitudes

Step 1. Using (57), the first inequality (48) takes the form

$$0.8m\_1 \ge \frac{X\_1 + 3m\_1t\_2 - \Delta\_1}{t\_1 - t\_2} + H\_1 + \Delta\_2\\ \Rightarrow m\_{11} \ge \frac{X\_1 - \Delta\_1 + (H\_1 + \Delta\_2)(t\_1 - t\_2)}{0.8t\_1 - 3.8t\_2}.$$

whence the constraint on the selection 0 < *t*<sup>2</sup> < *t*<sup>1</sup> follows:

$$0.8t\_1 - 3.8t\_2 > 0 \Rightarrow t\_2 < 0.2t\_1. \tag{60}$$

Based on (60), we select *t*∗2 > 0 and substitute it into the double inequality

$$\overline{m}\_1 = \max\{m\_{11}; m\_{12}\} < \overline{\overline{m}}\_1, \tag{61}$$

$$m\_{11} = \frac{X\_1 - \Delta\_1 + (H\_1 + \Delta\_2)(t\_1 - t\_2^\*)}{0.8t\_1 - 3.8t\_2^\*}, \\ m\_{12} = \frac{X\_2 - \Delta\_2}{2}, \\ \overline{\overline{m}}\_1 = \frac{E\_1 - X\_1}{3t\_2^\*}. \tag{62}$$

If inequality (61) is satisfied, then we fix *t*∗2 and *m*∗1 ∈ (*m̄*1; *m̿*1] and go to the second step. If (61) is not satisfied, the arbitrary parameters should be varied. This can be performed in two ways.

First way. If it is required to ensure the accepted *E*1 (56), then we vary Δ2 and/or *t*2. If, with the initially accepted 0 < *t*∗2 < 0.2*t*1, the inequality *m*12 > *m*11 (62) is valid, then by increasing Δ2 (up to Δ2 = *X*2) it is necessary to ensure *m*11 > *m*12. If with the new Δ∗2 inequality (61) is still not valid, or initially *m*11 > *m*12, then we decrease *t*∗2. The critical value *t̃*2 > 0 : *m*11(*t̃*2) = *m̿*1(*t̃*2) exists and equals

$$\widetilde{t}\_2 = \frac{\sqrt{p\_2^2 - 4p\_1p\_3} - p\_2}{2p\_1},$$

$$\begin{array}{l}p\_1 = -3(H\_1 + \Delta\_2), \\ p\_2 = 0.8(E\_1 - X\_1) + 3(E\_1 - \Delta\_1 + (H\_1 + \Delta\_2)t\_1), \\ p\_3 = -0.8(E\_1 - X\_1)t\_1. \end{array}$$

From the limit relation

$$\lim\_{t\_2 \to +0} m\_{11}(t\_2) = \frac{X\_1 - \Delta\_1 + (H\_1 + \Delta\_2)t\_1}{0.8t\_1} = \text{const} < \lim\_{t\_2 \to +0} \frac{E\_1 - X\_1}{3t\_2} = +\infty, \tag{63}$$

it follows that *m̿*1 can be made arbitrarily large, and for any *t*∗2 > 0 : 0 < *t*∗2 < *t̃*2 inequality (61) will be satisfied.

Thus, by reducing *t*2, it is possible to provide any sufficiently small overshoot in the output variable (56). However, this can lead to a significant increase in the lower bounds of the selection of amplitudes in the following blocks.

Second way. If we abandon the accepted *E*1 (56) and increase its value

$$E\_1 > E = X\_1 + 3m\_1^\* t\_2^\*, \tag{64}$$

where *E* is the minimum possible overshoot of the output variable with the initially accepted value 0 < *t*∗2 < 0.2*t*1, then one can arbitrarily increase the upper bound *m̿*1 of the selection of the amplitude (61).

Step 2. The second inequality of (46) is ensured by the selection of *m*2. With respect to (54) and (57), the second inequality of (48) takes the form

$$0.8m\_2 \ge \frac{X\_2 + m\_1^\* + 3m\_2 t\_3 - \Delta\_2^\*}{t\_2^\* - t\_3} + H\_2 + k\_1^\*(m\_1^\*)^2 + \Delta\_3 \Rightarrow m\_{21} \ge \frac{X\_2 + m\_1^\* - \Delta\_2^\* + (H\_2 + k\_1^\*(m\_1^\*)^2 + \Delta\_3)(t\_2^\* - t\_3)}{0.8t\_2^\* - 3.8t\_3}, \tag{65}$$

whence follows a constraint on the selection of 0 < *t*3 < *t*∗2, similar to (60):

$$0.8t\_2^\*-3.8t\_3 > 0 \Rightarrow t\_3 < 0.2t\_2^\*.\tag{66}$$

Based on (66), we select *t*∗3 > 0 and substitute it into the double inequality

$$\overline{m}\_2 = \max\{m\_{21}; m\_{22}\} < \overline{\overline{m}}\_2, \tag{67}$$

where *m*21(*t*∗3) is given by (65), and

$$m\_{22} = \frac{X\_3 - \Delta\_3}{2}, \overline{\overline{m}}\_2 = \frac{2m\_1^\* + \Delta\_2^\* - X\_2}{3t\_3^\*}. \tag{68}$$

If (67) is satisfied, then we fix *t*∗3 and *m*∗2 ∈ (*m̄*2; *m̿*2] and go to the third step. If (67) is not fulfilled, the arbitrary parameters Δ3 and/or *t*3 should be varied. If initially *m*22 > *m*21, then by increasing Δ3 (up to Δ3 = *X*3) we need to ensure *m*21 > *m*22. If with the new Δ∗3 inequality (67) is still not satisfied, or initially *m*21 > *m*22, then we decrease *t*∗3. The critical value *t̃*3 > 0 : *m*21(*t̃*3) = *m̿*2(*t̃*3) exists and equals

$$\begin{aligned} \tilde{t}\_3 &= \frac{\sqrt{q\_2^2 - 4q\_1 q\_3} - q\_2}{2q\_1}, \\ q\_1 &= -3(H\_2 + k\_1^\*(m\_1^\*)^2 + \Delta\_3), \\ q\_2 &= 3(3m\_1^\* + (H\_2 + k\_1^\*(m\_1^\*)^2 + \Delta\_3)t\_2^\*) + 0.8(2m\_1^\* + \Delta\_2 - X\_2), \\ q\_3 &= -0.8(2m\_1^\* + \Delta\_2 - X\_2)t\_2^\*. \end{aligned} \tag{69}$$

From a limit relation similar to (63), namely

$$\lim\_{t\_3 \to +0} m\_{21}(t\_3) = \frac{X\_2 + m\_1^\* - \Delta\_2^\* + (H\_2 + k\_1^\*(m\_1^\*)^2 + \Delta\_3)t\_2^\*}{0.8t\_2^\*} = \text{const} < \lim\_{t\_3 \to +0} \frac{2m\_1^\* + \Delta\_2^\* - X\_2}{3t\_3} = +\infty$$

it follows that for any *t*∗3 > 0 : 0 < *t*∗3 < *t̃*3 inequality (67) is valid.

Note that at the second step (as opposed to the first), the fulfillment of (67) can be ensured only in the indicated way. Increasing the upper bound *m̿*2 by increasing *m*∗1 would also increase the lower bound *m̄*2 (*m*21), and at a faster rate.

The admissible values *t*∗3, *m*∗2, Δ∗3 and *k*∗3(Δ∗3) are fixed, and then we go to the third step.

Step 3. Using (54) and (57), the third inequality of (48) takes a form similar to (38):

$$m\_3 \ge \overline{m}\_3 = \frac{1.25}{b\_{\min}} \left( \frac{X\_3 + m\_2^\* - \Delta\_3^\*}{t\_3^\*} + F + H\_3 + k\_2^\* \left( m\_2^\* \right)^2 \right). \tag{70}$$

Based on (70), we fix *m*∗3. The amplitude adjustment procedure is complete.

Thus, there exist *k̄i* > 0, *i* = 1, 3 (47), 0 < *m̄i* < *m̿i*, *i* = 1, 2 (61), (62) and (67), and *m̄*3 > 0 (70), such that for all *ki* ≥ *k̄i*, *mi* : *m̄i* < *mi* ≤ *m̿i*, and *m*3 ≥ *m̄*3, the variables of the closed system (42) sequentially converge into the indicated regions within the specified time (46), which ensures the fulfillment of the target condition. Lemma 2 is proved.
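Procedure 2 can be sketched as a simple parameter calculator. The plant bounds (*Xi*, *Hi*, *F*, *b*min) and targets (Δ1, *t*1, *E*1) below are assumed demo numbers; following the simplification in the text, Δ2 = *X*2 and Δ3 = *X*3, so that conditions (59) drop out (*m*12 = *m*22 = 0).

```python
# Sketch of Procedure 2 on assumed plant bounds and targets.
X1, X2, X3 = 1.0, 1.0, 1.0
H1, H2, H3 = 0.5, 0.5, 0.5
F, b_min = 1.0, 1.0
Delta1, t1, E1 = 0.1, 5.0, 2.0
Delta2, Delta3 = X2, X3                # so that (59) is satisfied trivially

# (47): high-gain factors
k1, k2, k3 = 2.2 / Delta1, 2.2 / Delta2, 2.2 / Delta3

# Step 1: pick t2 under (60), then an amplitude m1 inside interval (61), (62)
t2 = 0.1
assert t2 < 0.2 * t1                                   # (60)
m11 = (X1 - Delta1 + (H1 + Delta2) * (t1 - t2)) / (0.8 * t1 - 3.8 * t2)
m1_upper = (E1 - X1) / (3 * t2)                        # (62)
m1 = 3.0
assert m11 < m1 <= m1_upper                            # (61) holds

# Step 2: pick t3 under (66), then m2 inside (67), using (65), (68)
t3 = 0.005
assert t3 < 0.2 * t2                                   # (66)
m21 = (X2 + m1 - Delta2 + (H2 + k1 * m1**2 + Delta3) * (t2 - t3)) \
      / (0.8 * t2 - 3.8 * t3)                          # (65)
m2_upper = (2 * m1 + Delta2 - X2) / (3 * t3)           # (68)
m2 = 380.0
assert m21 < m2 <= m2_upper                            # (67) holds

# Step 3: lower bound for the true-control amplitude (70)
m3 = 1.25 / b_min * ((X3 + m2 - Delta3) / t3 + F + H3 + k2 * m2**2)
print(f"k = ({k1:.1f}, {k2:.1f}, {k3:.1f}); m1 = {m1}, m2 = {m2}, m3 >= {m3:.0f}")
```

The resulting bound on *m*3 comes out very large, which illustrates the conservatism of the worst-case estimates that the next paragraph comments on.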

The theoretical significance of the obtained results is as follows. It is shown that it is fundamentally possible to ensure an arbitrarily small stabilization error of the output variable, with an arbitrarily small overshoot (56), in an arbitrarily small time, for any admissible initial conditions (39). However, it must be understood that tightening the target characteristics (45) will increase the controller parameters and the values of the fictitious and true controls during the transient process, which is undesirable in real automatic control systems.

We can easily extend the procedure presented in the proof of Lemma 2 to *n*-dimensional canonical systems with one input. Accordingly, without restrictions, this approach is applicable to MIMO systems with *m* outputs, in which: (i) the number of inputs is not less than the number of outputs; (ii) the system is representable in the form of *m* input-output subsystems with one input each, in which the matrix multiplying the controls has full rank; (iii) there is no internal dynamics subsystem, or its solutions are bounded (i.e., the system is minimum-phase). The more general case of MIMO systems requires additional research.

#### **5. Discussion**

The main result of this work is the use of S-shaped smooth sigmoid functions in the feedback loop as fictitious and true controls when unmatched non-smooth disturbances act on the system. The parameters of the nonlinear stabilizing controller are selected iteratively at the synthesis stage, based on inequalities obtained from the worst-case values of the plant parameters and the bounds on external influences. This approach does not require reconfiguring the controller when internal and external factors change within acceptable limits. Thus, it simplifies the structure of the controller and decreases the formation time of the control signal, since additional identification of unknown parameters, the compilation of models, and the use of an external disturbance observer are not required. In the process of regulation, the sigmoid fictitious and true controls converge in finite time to the unknown bounded external signals matched with them and repeat their shape with a predetermined accuracy. Thus, a mechanism for suppressing disturbances, including those that do not enter the space of the true control, is implemented automatically, which ensures the invariance of the output (controlled) variable.
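The suppression mechanism described above can be illustrated by a minimal numerical sketch (the scalar plant, the sigmoid shape and all parameter values below are illustrative assumptions, not the controller synthesized in the paper): a bounded sigmoid feedback $u = -m\,\sigma(kx)$ drives a disturbed first-order plant into a small neighborhood of zero while the control never leaves its amplitude bound.

```python
import numpy as np

# Illustrative sketch (not the controller synthesized in the paper): a scalar
# plant x' = x + d(t) + u with a bounded disturbance |d| <= D is stabilized by
# the bounded S-shaped feedback u = -m * sigma(k * x). The sigmoid shape,
# plant and all parameter values are assumptions for demonstration only.

def sigma(s):
    # Smooth sigmoid with values in (-1, 1)
    return 2.0 / (1.0 + np.exp(-s)) - 1.0

def simulate(m=5.0, k=50.0, D=1.0, x0=0.5, dt=1e-3, T=5.0):
    x, t = x0, 0.0
    xs, us = [], []
    while t < T:
        d = D * np.sin(3.0 * t)       # bounded external disturbance
        u = -m * sigma(k * x)         # sigmoid control, always |u| < m
        x += dt * (x + d + u)         # explicit Euler step of the plant
        xs.append(x)
        us.append(u)
        t += dt
    return np.array(xs), np.array(us)

xs, us = simulate()
```

With these illustrative values the state settles into a band whose width is roughly $D$ divided by the local loop gain, while the control stays strictly inside its amplitude bound $m$; tightening the band requires larger $k$ or $m$, which reproduces the trade-off noted above.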

The boundedness of sigmoid feedbacks is their undoubted advantage over the traditionally used linear feedbacks with high gains, which lead to large overshoot. The paper [33] presents the results of a comparative analysis and simulation of systems with linear and nonlinear local feedbacks operating under uncertainty. In [34,35], the results of modeling closed-loop systems with sigmoid local feedbacks as applied to various electromechanical control plants are presented. The disadvantages of the method include a more complex computational implementation compared to linear control. However, given the constantly increasing power of modern control microprocessors, this is not a serious obstacle to the use of nonlinear functions in the automatic control systems of modern and promising technical objects.

Due to the organization of local feedbacks, the state variables of the closed initial system will "track" bounded sigmoid signals, while the maximum deviations of fictitious controls from "reference influences" are bounded (51). This fact is a prerequisite for the creation of analytical methods of the synthesis of invariant systems, taking into account design constraints on the state and control variables. The solution of this problem is the subject of future research by the authors.

**Author Contributions:** Conceptualization, methodology, S.K. and V.U.; validation, investigation, formal analysis, A.A. and S.K.; writing—original draft preparation, S.K.; writing—review and editing, A.A. and V.U. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Analysis and Prediction of Electric Power System's Stability Based on Virtual State Estimators**

**Natalia Bakhtadze \* and Igor Yadikin**

V.A. Trapeznikov Institute of Control Sciences, 65 Profsoyuznaya, 117997 Moscow, Russia; jadikin1@mail.ru **\*** Correspondence: sung7@yandex.ru; Tel.: +7-916-544-2259

**Abstract:** The stability of bilinear systems is investigated using spectral techniques such as selective modal analysis. Predictive models of bilinear systems based on inductive knowledge extracted by big data mining techniques are applied with associative search of statistical patterns. A method and an algorithm for the elementwise solution of the generalized matrix Lyapunov equation are developed for discrete bilinear systems. The method is based on calculating the sequence of values of a fixed element of the solution matrix, which depends on the product of the eigenvalues of the dynamics matrix of the linear part and the elements of the nonlinearity matrixes. A sufficient condition for the convergence of all sequences is obtained, which is also a BIBO (bounded input bounded output) systems stability condition for the bilinear system.

**Keywords:** Gramian method; bilinear system process identification; generalized Lyapunov equation; knowledgebase; associative search models; wavelet analysis

#### **1. Introduction**

Stability estimators (soft sensors) are increasingly used in mode planning and supervisory control in present-day electric power systems (EPS) [1–19]. Due to the growing popularity of distributed generation and renewable energy sources (RES) in EPS, their operation modes may approach the stability limits. It is, therefore, necessary to improve both predictive modeling techniques and the methods of analysis and preventive control, not only for small-deviation modes but also for essentially nonlinear ones [14,20–23]. Among all nonlinear systems, bilinear systems are the closest to the essentially nonlinear class. Therefore, research methods for bilinear systems have been actively developed over recent decades [2–5,20,21,24–28]. Spectral methods of stability analysis, in particular selective modal analysis, are widely used in the design and operation of EPS [16,18,22,23].

This article presents the results of the development of these methods and their extension to the class of bilinear EPS models, which will expand their application area [7,8,19,21,28]. To create bilinear models of discrete stationary dynamical systems, digital identification methods and associative search algorithms are used, based on the intelligent analysis of big data obtained from system operation monitoring.

The work [13] develops the Poincaré normal form method for analyzing the stability of energy systems based on continuous dynamical systems with smooth nonlinearities. This approach can be considered an alternative to stability estimator development. The article [21] uses a virtual model of the energy system's inertia to control the frequency in a system with a high level of microgrid penetration, which shows the possibility of using stability estimators not only for stability monitoring but also for controlling the frequency of low-frequency oscillations.

In [26], Volterra equations are proposed for analyzing the stability of power systems with renewable energy sources. In contrast to [26], the authors use Volterra equations for developing digital twins of bilinear EPS models. The work [27] presents an effective method based on Lyapunov stability indices for studying the small-signal stability of EPS, which can be used to solve the problems discussed in our article.

**Citation:** Bakhtadze, N.; Yadikin, I. Analysis and Prediction of Electric Power System's Stability Based on Virtual State Estimators. *Mathematics* **2021**, *9*, 3194. https://doi.org/ 10.3390/math9243194

Academic Editor: Ioannis Dassios

Received: 5 October 2021 Accepted: 7 December 2021 Published: 10 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Separable spectral expansions of discrete Lyapunov equations are obtained for MISO LTI (multiple input single output, linear time invariant) discrete dynamical systems governed by state equations in controllability and observability forms. A method and an algorithm for the element-wise solution of the generalized matrix Lyapunov equation are developed for discrete bilinear systems. The method is based on calculating the sequence of values of a fixed element of the solution matrix, which depends on the product of the eigenvalues of the dynamics matrix of the linear part and the elements of the nonlinearity matrices. The new method is a spectral version of the iterative method used for solving this equation.

The new method changes the very computing paradigm: sequences of elements are now calculated instead of solution matrices. In addition to this paradigm change, the method changes the approach to studying the stability of the initial nonlinear model of the power system. The convergence of all the element sequences implies BIBO stability of the original system in a region wider than the region of small-signal stability of power systems, and it also makes it possible to identify new stability indicators in this region.

For developing bilinear models of discrete stationary dynamical systems of this class, the authors propose digital identification methods and associative search algorithms, based on the intelligent analysis of big data collected during system operation.

Computing speed is the key advantage of such identification techniques over the known methods of bilinear model development. The off-line training of the identification system is carried out in advance; further, in the course of real-time operation, the current values of the model parameters are obtained using the knowledge accumulated at the training phase [29]. This is done by choosing analogs from the appropriate cluster [18]. It should be noted that the associative search method creates a new linear model at each time step. Section 2 shows how such a model can be further used for digital bilinear model development. In Section 6, an example of obtaining the values of the parameters of linear associative models is given. In future studies, it is planned to develop a version of this method for the non-stationary case.

#### **2. Knowledge-Based Bilinear Models of Discrete Stationary Dynamic Systems**

The essence of the machine learning procedure is as follows [9]. For the current time instant $k$, a set of impacts $u_k$ ($u_k \in \mathbb{R}^m$) on the stationary system during the time interval $T = \{k - T, k - T + 1, \dots, k\}$ is divided into clusters (together with the corresponding values of the outputs $y_{k-i}$, $i = 0, \dots, T$). The clustering procedure is carried out with reference to the distance between the vectors. For the current vector $u_k$, a set of vectors $u_{k-i}$ and the corresponding outputs $y_{k-i}$ are collected within the corresponding cluster. Next, a system of linear equations is formed for the unknown coefficients and the output $y_k$. Unlike traditional regression models, this model does not contain all the prehistory, but rather specially selected vectors (the closest to the current input vector subject to a certain criterion) named "associations".

The least squares method provides a solution to this system of equations, which is optimal if the conditions of the Gauss-Markov theorem are met [30]. Statistical independence of the model variables is a condition of this theorem, which is not met for closed-loop systems. The transition to a system of simultaneous linear equations can be done in particular per the Moore-Penrose procedure [31,32]. As a result, a pseudo-solution of the original system of equations can be obtained such that the resulting linear model will have accuracy admissible for a wide range of applications.

It should be noted that the described identification algorithm generates point models, the best ones for the nonlinear system under investigation at a time instant. Therefore, unlike traditional identification algorithms, we do not improve a single model ad infinitum, rather we deal with a sequence of digital ad hoc models; each one is the best fit at the specific time instant subject to the chosen criterion.

Another feature of the models obtained by machine learning is the fact that if the corresponding model accuracy requirements are met, then the solution does not need to be found every time, it can be rather "found" in the cluster, which contains the current vector *uk*. This can be the "nearest neighbor", or a vector selected in some other way, in particular, the cluster's centroid.
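The selection-and-fit step described above can be sketched as follows (the toy plant, dimensions, cluster size and all names here are illustrative assumptions, not the paper's implementation): for the current input, the closest historical inputs are taken as "associations", and a point linear model is obtained through the Moore-Penrose pseudo-inverse.

```python
import numpy as np

# A minimal sketch of the associative-search identification step (the toy
# plant, dimensions, cluster size and all names are illustrative
# assumptions): for the current input u_k we select the closest historical
# inputs ("associations") and fit a point linear model by the Moore-Penrose
# pseudo-inverse.

rng = np.random.default_rng(0)

# History: inputs u in R^3 and outputs of a mildly nonlinear plant.
U = rng.uniform(-1.0, 1.0, size=(500, 3))
y = U @ np.array([1.0, -2.0, 0.5]) + 0.1 * U[:, 0] ** 2

def local_model(u_now, U, y, n_assoc=30):
    """Fit a point linear model y ~ w.u on the n_assoc nearest inputs."""
    dist = np.linalg.norm(U - u_now, axis=1)   # distances to the history
    idx = np.argsort(dist)[:n_assoc]           # the "associations"
    w = np.linalg.pinv(U[idx]) @ y[idx]        # pseudo-solution (least squares)
    return w

u_now = np.array([0.2, -0.1, 0.3])
w = local_model(u_now, U, y)
y_hat = float(w @ u_now)
```

Because each call fits only the current cluster, a new $w$ is produced at every step, matching the sequence-of-ad-hoc-models view described above.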

If it is nevertheless necessary to solve a system of linear equations, quantum algorithms may make this feasible in the near future.

Machine learning procedures are carried out off-line, at the training stage. Therefore, this identification algorithm demonstrates high speed.

There is a class of essentially nonlinear systems, for which the accuracy of point linear models may be insufficient. This class can include nonlinear systems described by the following equations:

$$\begin{array}{l} \dot{x}(t) = f(x, u, t) = f(x, t) + b(x)u(t), \; x(0) = 0, \\ y(t) = Cx(t), \end{array} \tag{1}$$

where $x(t)$ is the state vector of the system, $x(t) \in \mathbb{R}^n$, and $f$ and $b$ are nonlinear functions. Models of such systems of the form:

$$
\dot{x}(t) = ax(t) + Nx(t)u(t) + bu(t) \tag{2}
$$

where $ax(t) + bu(t)$ is the system's linear part, are called *bilinear models*.

If the matrix $B \in \mathbb{R}^{n \times m}$, the system can be represented as

$$\begin{array}{c} \dot{x}(t) = Ax + \sum\limits_{\gamma=1}^{m} N_{\gamma} x u_{\gamma} + Bu, \\ y = Cx, \end{array} \tag{3}$$

or, in the discrete case:

$$\begin{aligned} \mathbf{x}(k+1) &= A\mathbf{x}(k) + \sum\_{\gamma=1}^{m} N\_{\gamma} \mathbf{x}(k) u\_{\gamma}(k) + Bu(k), \; \mathbf{x}(0) = 0, \\ \mathbf{y}(k) &= \mathbf{C} \mathbf{x}(k), \end{aligned} \tag{4}$$

where $x \in \mathbb{R}^n$, $y \in \mathbb{R}^1$, $u \in \mathbb{R}^m$, and $A$, $B$, $C$, $N_\gamma$ are matrices of appropriate dimensions. Equation (4) can be rewritten as:

$$x(k+1) = \begin{bmatrix} A & \vdots & N_1 & \vdots & \cdots & \vdots & N_m \end{bmatrix} \tilde{x}(k) + Bu(k),\tag{5}$$

where

$$
\tilde{x}(k) = \begin{bmatrix}
x(k) \\
x(k)u_1(k) \\
\vdots \\
x(k)u_m(k)
\end{bmatrix} \in \mathbb{R}^{n(m+1)}.\tag{6}
$$

Thus, we get the representation:

$$\begin{aligned} x(k+1) &= \tilde{A}\tilde{x}(k) + Bu(k), \\ \tilde{A} &= \begin{bmatrix} A & \vdots & N_1 & \vdots & \cdots & \vdots & N_m \end{bmatrix}, \end{aligned} \tag{7}$$

where:

$$D^n = \begin{bmatrix} D^{0n} \\ \vdots \\ D^{nn} \end{bmatrix}, \quad D^{0n} = \begin{bmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{bmatrix}, \quad D^{in} = \begin{bmatrix} u_i(k) & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & u_i(k) \end{bmatrix}, \; i = 1, \dots, n.$$

For simplicity, we assume that $m = n$. Otherwise, the matrices $N_i$ are padded with zero rows so that the number of rows equals $n$.

Then Equation (4) can be represented as

$$x(k+1) = \tilde{A}D^{n} x(k) + Bu(k). \tag{8}$$

Furthermore, to identify the parameters of the system, we carry out the transition from a state-space model to a linear input-output model. This transition is a standard procedure described, e.g., in [33]. The identification is carried out using the associative search algorithm together with the Moore-Penrose procedure [31,32], which delivers a solution to a system of linear equations with statistically dependent components of the vector $x$.

Returning to the canonical form of the model results in an estimate of all parameters, i.e., the updated bilinear model. The ability to determine the system's state and output for various control impacts enables the usage of identification models for predicting the approach to stability boundaries in advance.
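The lifted representation (5)-(8) can be checked numerically with a small sketch (random matrices of illustrative dimensions): stepping the bilinear recursion (4) directly and stepping it through the stacked vector $\tilde{x}(k)$ give the same next state.

```python
import numpy as np

# Numerical check of the lifted form (5)-(8) with random matrices of
# illustrative size: stepping (4) directly and stepping through the stacked
# vector x~(k) from (6) must give the same next state.

n, m = 3, 2
rng = np.random.default_rng(1)
A = 0.3 * rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
N = [0.1 * rng.standard_normal((n, n)) for _ in range(m)]
A_tilde = np.hstack([A] + N)          # [A : N_1 : ... : N_m] from (7)

def step_bilinear(x, u):
    """Direct evaluation of the bilinear recursion (4)."""
    return A @ x + sum(u[g] * (N[g] @ x) for g in range(m)) + B @ u

def step_lifted(x, u):
    """Evaluation through the lifted vector x~(k) of (6)."""
    x_tilde = np.concatenate([x] + [x * u[g] for g in range(m)])
    return A_tilde @ x_tilde + B @ u

x = rng.standard_normal(n)
u = rng.standard_normal(m)
```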

#### **3. Controllability and Observability Gramians of Discrete Stationary Bilinear Systems**

Let the model (4) of a discrete stationary dynamical system be obtained as a result of identification using the algorithms described in Section 2. We will assume that it belongs to the class of MISO LTI systems.

Consider a MISO LTI discrete stationary dynamical system in the form:

$$\begin{array}{c} \text{x}(k+1) = A\text{x}(k) + Bu(k), \text{ x}(0) = 0, \\ y(k) = \text{Cx}(k), \end{array} \tag{9}$$

where $x \in \mathbb{R}^n$, $y \in \mathbb{R}^1$, $u \in \mathbb{R}^m$.

We will consider real matrices of the corresponding sizes *A*, *B*, *C*. Let us assume that the system (9) is stable, fully controllable and fully observable, all eigenvalues of matrix *A* are different. Consider discrete algebraic Lyapunov equations associated with Equation (9) in the form:

$$\begin{aligned} A P^c A^\* + B B^\* &= P^c \\ A^\* P^o A + C^\* C &= P^o \end{aligned}$$

Consider a bilinear discrete stationary dynamical system in the form:

$$\begin{aligned} x(k+1) &= Ax(k) + \sum_{\gamma=1}^{m} N_{\gamma}x(k)u_{\gamma}(k) + Bu(k), \; x(0) = 0,\\ y(k) &= Cx(k), \end{aligned} \tag{10}$$

where $x \in \mathbb{R}^n$, $y \in \mathbb{R}^1$, $u \in \mathbb{R}^m$, and $A$, $B$, $C$, $N_\gamma$ are matrices of appropriate dimensions. One of the most important properties of control systems is controllability. In [4,5], controllability and observability Gramians of discrete bilinear dynamical systems were introduced and iterative algorithms for their computation were proposed. Let us denote:

$$P_1(k_1) = A^{k_1}B,$$

$$P_i(k_1, \dots, k_i) = A^{k_i}\begin{bmatrix} N_1P_{i-1} & N_2P_{i-1} & \dots & N_mP_{i-1} \end{bmatrix}, \; i \ge 2.$$

The controllability Gramian of a bilinear system is defined as follows:

$$P = \sum_{i=1}^{\infty} \sum_{k_1=0}^{\infty} \dots \sum_{k_i=0}^{\infty} P_i P_i^T. \tag{11}$$

It was shown in [5] that if the spectrum of *A* belongs to the interior of the unit circle, then, under certain additional conditions, the solutions to the following two Lyapunov generalized matrix equations:

$$APA^* + BB^* + \sum_{j=1}^{m} N_j PN_j^* = P, \tag{12}$$

$$A^*QA + C^*C + \sum_{j=1}^m N_j^* Q N_j = Q, \tag{13}$$

are the controllability and observability Gramians, respectively. The controllability Gramian of a bilinear system is the limiting solution

$$P = \lim\_{i \to \infty} P\_i \tag{14}$$

obtained as a result of the implementation of the following iterative procedure

$$AP\_1A^\* - P\_1 = -BB^\*,$$

$$AP\_iA^\* - P\_i + \sum\_{j=1}^m N\_jP\_{i-1}N\_j^\* = 0, \; i = 2, \dots, \infty. \tag{15}$$

Similarly, the observability Gramian of a bilinear system is the limiting solution obtained by implementing a similar iterative procedure. The disadvantage of such procedures is that the resulting limiting solution is not always the corresponding Gramian of the bilinear system.
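A minimal sketch of this iteration (with the term $BB^*$ kept inside every iterate, so that the fixed point satisfies the generalized Equation (12); the matrices below are illustrative assumptions):

```python
import numpy as np

# A sketch of the iterative Gramian computation for a discrete bilinear
# system (matrices illustrative). Each iterate solves a standard discrete
# Lyapunov equation whose right-hand side absorbs the bilinear terms from
# the previous iterate; B B^T is kept inside every iterate, so the fixed
# point satisfies the generalized Equation (12):
#   A P A^T + B B^T + sum_j N_j P N_j^T = P.

def dlyap(A, Q):
    """Solve A P A^T - P + Q = 0 by vectorization (fine for small n)."""
    n = A.shape[0]
    vec_p = np.linalg.solve(np.eye(n * n) - np.kron(A, A), Q.flatten())
    return vec_p.reshape(n, n)

def bilinear_gramian(A, B, N_list, tol=1e-12, max_iter=1000):
    P = dlyap(A, B @ B.T)                       # first step of the iteration
    for _ in range(max_iter):
        Q = B @ B.T + sum(Nj @ P @ Nj.T for Nj in N_list)
        P_next = dlyap(A, Q)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    return P

A = np.array([[0.5, 0.1],
              [0.0, 0.3]])
B = np.array([[1.0],
              [1.0]])
N_list = [0.1 * np.eye(2)]
P = bilinear_gramian(A, B, N_list)
residual = A @ P @ A.T + B @ B.T + sum(Nj @ P @ Nj.T for Nj in N_list) - P
```

The iteration converges here because the spectrum of $A$ lies inside the unit circle and the bilinear terms are small; as noted above, for arbitrary data the limit need not be the Gramian of the bilinear system.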

Our goal is to create improved iterative algorithms for calculating the Gramians of bilinear systems and to develop, on their basis, a method and algorithms for calculating the stability indices of bilinear systems. To achieve this goal, it is proposed to change the computation paradigm by transferring the computations from the matrices to their elements in the course of the iterations.

The very idea of the element-wise computation of Gramians is not new: for example, the method of vectorizing the solution of generalized matrix Lyapunov equations is based on it [3,5]. However, calculating the sequences of numeric elements of the Gramian matrices reveals new patterns of sequence behavior, for example, the formation of geometric progressions of elements. This makes it possible to investigate the behavior of the sequences for small, medium and large matrices and to develop new approaches to approximate calculations. Another argument in favor of the element-wise approach is that this approach to calculating the spectral decompositions of Gramians of linear continuous systems was previously proposed in [7] and has shown its effectiveness.

#### **4. Iterative Methods for Calculating the Solutions of the Generalized Lyapunov Equations for Canonical State-Form Equations**

Consider further the spectral methods of calculating Gramians for discrete linear systems. These methods were studied in the early works [1–3,7,11]. Consider a MIMO LTI discrete system reduced, using a non-degenerate coordinate transformation, to the diagonal form of the dynamics matrix

$$\begin{array}{l} x = Tx_d, \; x_d(k+1) = \Lambda x_d(k) + B_d u(k), \; y_d(k) = C_d x_d(k),\\ \Lambda = T^{-1}AT, \; B_d = T^{-1}B, \; C_d = CT, \end{array} \tag{16}$$

or

$$A = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} \begin{bmatrix} z_1 & 0 & \cdots & 0 \\ 0 & z_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & z_n \end{bmatrix} \begin{bmatrix} v_1^* \\ v_2^* \\ \vdots \\ v_n^* \end{bmatrix} = T\Lambda T^{-1}, \; TV = VT = I, \tag{17}$$

where the matrix $T$ consists of the right eigenvectors $u_i$, and the matrix $V = T^{-1}$ consists of the left eigenvectors $v_i^*$ corresponding to the eigenvalues $z_i$. The last equality is the eigenvector normalization condition.
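The decomposition (16)-(17) is easy to verify numerically (the matrix below is an illustrative example): the right eigenvectors form $T$, the rows of $T^{-1}$ are the normalized left eigenvectors, and $T^{-1}AT = \Lambda$.

```python
import numpy as np

# Numerical check of the diagonalizing transformation (16)-(17) on an
# illustrative 2x2 matrix: the right eigenvectors u_i are the columns of T,
# the rows of V = T^{-1} are the normalized left eigenvectors v_i^*, and
# T^{-1} A T = Lambda = diag(z_1, ..., z_n).

A = np.array([[0.4, 0.2],
              [0.1, 0.3]])
z, T = np.linalg.eig(A)     # eigenvalues z_i and right eigenvectors
V = np.linalg.inv(T)        # rows are the left eigenvectors v_i^*
Lam = V @ A @ T             # diagonal matrix of eigenvalues
```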

In particular, in [8], the spectral decompositions of the controllability and observability Gramian matrices for LTI MIMO discrete stable systems with a simple spectrum are as follows:

$$P^c = \sum_{k=1}^n \sum_{\rho=1}^n P^c_{k\rho}, \quad P^c_{k\rho} = \sum_{\eta=0}^{n-1} \sum_{j=0}^{n-1} \frac{z_k^j z_\rho^\eta}{N'(z_k)N'(z_\rho)} \cdot \frac{1}{1 - z_\rho z_k} A_j BB^T A_\eta^T, \tag{18}$$

$$P^o = \sum_{k=1}^n \sum_{\rho=1}^n P^o_{k\rho}, \quad P^o_{k\rho} = \sum_{\eta=0}^{n-1} \sum_{j=0}^{n-1} \frac{z_k^j z_\rho^\eta}{N'(z_k)N'(z_\rho)} \cdot \frac{1}{1 - z_\rho z_k} A_\eta^T C^T C A_j, \tag{19}$$

where $z_k$, $z_\rho$ are the roots of the characteristic equation, and $A_j$, $A_\eta^T$ are the Faddeev matrices. For the Lyapunov equations of the same diagonalized systems of the form:

$$\begin{array}{c} \Lambda P_d^c \Lambda^* + B_d B_d^* = P_d^c, \; V_d = B_d B_d^*, \\ \Lambda^* P_d^o \Lambda + C_d^* C_d = P_d^o, \; W_d = C_d^* C_d, \end{array}$$

we have the following formulas for spectral decomposition:

$$P_{d,\rho k}^{c} = \frac{1}{1 - z_\rho z_k} R_k B_d B_d^* R_\rho^*, \; \forall z : |z| < 1, \tag{20}$$

$$P_{d,\rho k}^{o} = \frac{1}{1 - z_\rho z_k} R_k^* C_d^* C_d R_\rho, \; \forall z : |z| < 1, \tag{21}$$

where $R_k$ are the residues of the resolvent of the matrix $\Lambda$ at the eigenvalues $z_k$. The elements "$\rho k$" of the sub-Gramian matrices (20) and (21) satisfy the formulas:

$$p_{d\rho k}^c = \frac{1}{1 - z_\rho z_k}\, v_{d\rho k}, \quad p_{d\rho k}^o = \frac{1}{1 - z_\rho z_k}\, w_{d\rho k}. \tag{22}$$
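A short sketch of the element-wise formula (22) for the controllability case (the system matrices are illustrative assumptions): each entry of $P_d$ follows directly from $V_d$ and the eigenvalue products, and the assembled matrix satisfies the diagonal Lyapunov equation.

```python
import numpy as np

# Sketch of the element-wise formula (22) for the controllability Gramian
# of a diagonalized system (illustrative matrices with a real, stable,
# simple spectrum): every entry p_{rho k} equals v_{rho k} / (1 - z_rho z_k),
# and the assembled matrix satisfies Lambda P_d Lambda^* + V_d = P_d.

A = np.array([[0.6, 0.2],
              [0.1, 0.5]])
B = np.array([[1.0],
              [0.5]])
z, T = np.linalg.eig(A)
Bd = np.linalg.inv(T) @ B
Vd = Bd @ Bd.conj().T                     # V_d = B_d B_d^*

Pd = Vd / (1.0 - np.outer(z, z.conj()))  # element-wise formula (22)

Lam = np.diag(z)
residual = Lam @ Pd @ Lam.conj().T + Vd - Pd
```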

When transforming Equation (4) by decomposing the matrix *A* in its eigenvalues (16)–(17), we obtain the equations

$$\begin{aligned} x_d(k+1) &= \Lambda x_d(k) + \sum_{\gamma=1}^m N_{d\gamma} x_d(k) u_{\gamma}(k) + B_d u(k), \; x_d(0) = 0, \\ y(k) &= C_d x_d(k), \; k = 0, 1, 2, \dots, \\ N_{d\gamma} &= T^{-1} N_{\gamma} T. \end{aligned} \tag{23}$$

**Definition 1.** *Consider the following matrix and vector identities:*

$$A \equiv \sum_{i,j} a_{ij} \mathbf{1}_{ij}, \quad \{a\} \equiv \sum_{i} a_{i} \mathbf{1}_{i}, \quad \mathbf{1}_{ij} = e_{i} e_{j}^{T}, \; \mathbf{1}_{i} = e_{i},$$

*where the unit vector is as follows:*

$$e_i = \begin{bmatrix} 0 & \dots & 0 & 1 & 0 & \dots & 0 \end{bmatrix}^T.$$

We call the above decompositions of matrices and vectors *separable decompositions*. The separability property means a change in the very paradigm of computing the solution: the transition from the matrix-vector consideration to the element-wise one.

Let us derive a formula for spectral decomposition of Gramian for the dynamics matrix of the system, transformed into the canonical controllability form using a linear nondegenerate transformation of coordinates with the following matrix:

$$R_c^F, \quad x = R_c^F x_c.$$

We assume that the full-controllability and full-observability conditions are fulfilled. Furthermore, we consider the channel "$\gamma$" of the MISO LTI linear system in the canonical controllability form:

$$\begin{array}{ll} x_c(k+1) = A_c^F x_c(k) + b_\gamma^F u(k), \; x_c(0) = 0, \\ y(k) = c_\gamma^F x_c(k), \; k = 0, 1, 2, \dots \end{array} \tag{24}$$

$$A_c^F = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-1} \end{bmatrix}, \quad b_\gamma^F = \begin{bmatrix} 0 & \cdots & 0 & 1 \end{bmatrix}^T, \quad c_\gamma^F = \begin{bmatrix} \xi_0 & \xi_1 & \cdots & \xi_{n-1} \end{bmatrix}, \quad N_{c\gamma}^F = \left(R_c^F\right)^{-1} N_\gamma.$$

The following relations are valid:

$$R_c^F = \begin{bmatrix} B & AB & \cdots & A^{n-1}B \end{bmatrix} \begin{bmatrix} a_1 & a_2 & \cdots & a_{n-1} & 1 \\ a_2 & a_3 & \cdots & 1 & 0 \\ \vdots & \vdots & & & \vdots \\ a_{n-1} & 1 & & & 0 \\ 1 & 0 & \cdots & 0 & 0 \end{bmatrix},$$

$$\left(R\_{\mathfrak{c}}^{F}\right)^{-1} A R\_{\mathfrak{c}}^{F} = A^{F}, \left(R\_{\mathfrak{c}}^{F}\right)^{-1} B = B^{F}, \mathcal{C} R\_{\mathfrak{c}}^{F} = \mathcal{C}^{F}.$$
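The relations above can be sketched numerically for a single-input example (the matrices are illustrative assumptions): $R_c^F$ is formed from the controllability matrix and the Hankel matrix of characteristic-polynomial coefficients, after which $(R_c^F)^{-1}AR_c^F$ is the companion matrix and $(R_c^F)^{-1}b$ becomes the last unit vector.

```python
import numpy as np

# Sketch of the transformation to the canonical controllability form for a
# single-input example (matrices illustrative): R_c^F is the controllability
# matrix multiplied by the Hankel matrix of the characteristic-polynomial
# coefficients, and (R_c^F)^{-1} A R_c^F is the companion matrix.

A = np.array([[0.5, 0.2, 0.0],
              [0.1, 0.4, 0.1],
              [0.0, 0.2, 0.3]])
b = np.array([[0.0], [0.0], [1.0]])
n = 3

poly = np.poly(A)                 # [1, a_{n-1}, ..., a_0] in numpy's order
coeffs = poly[1:][::-1]           # coeffs[k] = a_k, k = 0, ..., n-1

K = np.hstack([np.linalg.matrix_power(A, i) @ b for i in range(n)])

H = np.zeros((n, n))              # Hankel matrix of (a_1, ..., a_{n-1}, 1)
for i in range(n):
    for j in range(n):
        s = i + j + 1
        if s == n:
            H[i, j] = 1.0
        elif s < n:
            H[i, j] = coeffs[s]

R = K @ H                         # the transformation matrix R_c^F
AF = np.linalg.inv(R) @ A @ R     # companion (canonical controllability) form
bF = np.linalg.inv(R) @ b         # becomes [0, ..., 0, 1]^T
```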

**Lemma 1.** *Consider a linear discrete MISO system of the form (9) represented by equations in the canonical controllability form (24). Let us further consider the decomposition of the resolvent of the dynamics matrix $A^F$ into a segment of the Faddeev series of the form*

$$\left(Iz - A^F\right)^{-1} = \sum\_{j=0}^{n-1} \frac{A\_j^F z^j}{N(z)}.$$

*where $N(z)$ is the characteristic polynomial and $A_j^F$ are the Faddeev matrices, $j = 0, 1, \dots, n-1$.*

The elements of the last column of the matrices $A_j^F$ satisfy the statements:

$$\left\{a\_{n-k,n}^F\right\}^T = e\_{n-k'}^T \; k = 0, 1, 2\dots, n-1. \tag{25}$$

**Comment.** *Note, first that the decomposition of the resolvent in the Faddeev series form does not require calculating the eigenvalues of the dynamics matrix AF. Second, the transfer function of the* "*γ*" *channel of the linear part is determined by the formula:*

$$V_\gamma^{F\,ln} = \begin{bmatrix} \xi_0 & \xi_1 & \cdots & \xi_{n-2} & \xi_{n-1} \end{bmatrix} \left(Iz - A^F\right)^{-1} b_\gamma^F, \quad b_\gamma^F = \begin{bmatrix} 0 & \cdots & 0 & 1 \end{bmatrix}^T,$$

*hence it follows that it is determined only by the elements of the last column of the matrix AF*.

**Proof.** Consider the expansion of the resolvent of the matrix *A<sup>F</sup>* in the form of a segment of the Faddeev series

$$\left(Iz - A^F\right)^{-1} = \frac{\sum\_{j=0}^{n-1} A^F\_j z^j}{N(z)}.$$

We will accept

$$N(z) = z^n + a_{n-1}z^{n-1} + \dots + a_1z + a_0, \quad R_j = A_{j-1}^F, \; j = 1, 2, \dots, n.$$

We apply the method of mathematical induction. An iterative algorithm for calculating the Faddeev matrixes and the coefficients of the characteristic equation has the form at the first step [17]:

$$a_n = 1, \; R_n = A_{cn-1}^F = I,$$

at the step "*k*":

$$a_{n-k} = -\frac{1}{k}\,\mathrm{tr}\left(A^F R_{n-k+1}\right), \quad R_{n-k} = a_{n-k}I + A^F R_{n-k+1}, \; k = 1, 2, \dots, n.$$

Consider the formation of the last column of matrixes *A<sup>F</sup> <sup>n</sup>*−*k*. The first step:

$$A_{cn-1}^F = I, \quad \left\{ a_{n-1,n}^F \right\} = \begin{bmatrix} 0 & 0 & \dots & 0 & 1 \end{bmatrix}^T.$$

The second step:

$$A_{cn-2}^F = \begin{bmatrix} a_{n-1} & 1 & 0 & \cdots & 0 \\ 0 & a_{n-1} & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{n-1} & 1 \\ -a_0 & -a_1 & \cdots & -a_{n-2} & 0 \end{bmatrix}, \quad \left\{ a_{cn-2,n}^F \right\} = \begin{bmatrix} 0 & \dots & 0 & 1 & 0 \end{bmatrix}^T.$$

Suppose that at step "$k-1$" the last column of the matrix $A_{cn-(k-1)}^F$ has the form:

$$\left\{ a_{cn-(k-1),n}^F \right\} = \begin{bmatrix} \underbrace{0}_{1} & \dots & \underbrace{1}_{n-(k-1)} & \dots & \underbrace{0}_{n} \end{bmatrix}^T.$$

We introduce the notation:

$$A_c^F A_{cn-(k-1)}^F = S, \quad S = \begin{bmatrix} \{s_1\} & \{s_2\} & \dots & \{s_n\} \end{bmatrix}.$$

The last column of the matrix is:

$$\{s_n\} = \begin{bmatrix} \underbrace{0}_1 & \dots & \underbrace{1}_{n-k} & \underbrace{0}_{n-(k-1)} & \dots & \underbrace{-a_{cn-(k-1),n}^F}_{n} \end{bmatrix}^T.$$

In accordance with the Faddeev-Le Verrier algorithm, we have:

$$\left\{ a_{cn-k,n}^F \right\} = \begin{bmatrix} \underbrace{0}_1 & \dots & \underbrace{1}_{n-k} & \underbrace{0}_{n-(k-1)} & \dots & \underbrace{0}_{n} \end{bmatrix}^T.$$
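The Faddeev-Le Verrier recursion used in the proof can be checked numerically on a small companion matrix (an illustrative example): the recursion returns the coefficients of $N(z)$ without computing eigenvalues, and $R_0$ vanishes by the Cayley-Hamilton theorem.

```python
import numpy as np

# Numerical check of the Faddeev-Le Verrier recursion quoted in the proof
# (the 2x2 companion matrix is an illustrative example): starting from
# a_n = 1, R_n = I, each step yields a_{n-k} = -tr(A R_{n-k+1}) / k and
# R_{n-k} = a_{n-k} I + A R_{n-k+1}; R_0 = 0 by the Cayley-Hamilton theorem.

def faddeev(A):
    n = A.shape[0]
    coeffs = {n: 1.0}           # a_n = 1
    R = {n: np.eye(n)}          # R_n = I
    for k in range(1, n + 1):
        coeffs[n - k] = -np.trace(A @ R[n - k + 1]) / k
        R[n - k] = coeffs[n - k] * np.eye(n) + A @ R[n - k + 1]
    return coeffs, R

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])    # companion matrix of N(z) = z^2 + 3z + 2
coeffs, R = faddeev(A)
```

No eigenvalue computation is needed here, which matches the remark in the Comment that the Faddeev expansion does not require the eigenvalues of $A^F$.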

**Corollary 1.** *Without loss of generality, we assume m = 1. The general formulas (18) of the spectral decompositions for the controllability Gramians of the linear system transformed into the canonical controllability form, taking into account Lemma 1, acquire a simpler form:*

$$P^{cF} = \sum_{k=1}^{n} \sum_{\rho=1}^{n} P_{k\rho}^{cF}, \quad P_{k\rho}^{cF} = \sum_{\eta=0}^{n-1} \sum_{j=0}^{n-1} \frac{z_k^j z_{\rho}^{\eta}}{N'(z_k)N'(z_{\rho})} \frac{1}{1 - z_{\rho} z_k}\, e_{j+1} e_{\eta+1}^T. \tag{26}$$

A similar approach can be used to derive the formula for the spectral decomposition of the observability Gramian of the MISO system transformed into the canonical observability form. In this case, the following formulas are valid [18]:

$$R_o^F = \left( \begin{bmatrix} a_1 & a_2 & \cdots & a_{n-1} & 1 \\ a_2 & a_3 & \cdots & 1 & 0 \\ \vdots & \vdots & & & \vdots \\ a_{n-1} & 1 & & & 0 \\ 1 & 0 & \cdots & 0 & 0 \end{bmatrix} \begin{bmatrix} c \\ cA \\ \vdots \\ cA^{n-1} \end{bmatrix} \right)^{-1}.$$

Let us use (16) and consider the formation of the expression $A\_{oj}^{FT} c^{FT} c^F A\_{oj}^F$. In accordance with the duality principle, due to the complete controllability and observability, the following formula is valid:

$$\boldsymbol{c}^{F}\boldsymbol{A}\_{\boldsymbol{o}\boldsymbol{j}}^{F} = \left(\boldsymbol{A}\_{\boldsymbol{c}\boldsymbol{j}}^{F}\right)^{T}\boldsymbol{b}^{F}.\tag{27}$$


Substituting (27) into (19), we obtain the expression:

$$P^{oF} = \sum\_{k=1}^{n} \sum\_{\rho=1}^{n} P^{oF}\_{k,\rho} = \sum\_{\eta=0}^{n-1} \sum\_{j=0}^{n-1} \frac{z\_k^j z\_\rho^\eta}{\dot{N}(z\_k)\dot{N}(z\_\rho)} \frac{1}{1 - z\_\rho z\_k} e\_{j+1} e\_{\eta+1}^T, \ P^{oF} = P^{cF}.\tag{28}$$

The Gramians of the original system are related to the Gramians of the systems transformed into canonical forms as follows

$$R\_c^F P^{cF} R\_c^{FT} = P^c,\ \left(R\_o^{FT}\right)^{-1} P^{oF} \left(R\_o^F\right)^{-1} = P^o.$$

Please note that in this case the expressions of the Gramians and sub-Gramians of controllability and observability depend only on the eigenvalues of the dynamics matrix. In addition, the proposed approach using canonical forms made it possible to simplify the general formulas significantly.
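As an illustrative numerical check (not part of the original derivation), decomposition (26) can be compared against a direct Lyapunov-equation solution for a small system in controllability canonical form; the eigenvalues below are arbitrary assumptions:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Assumed distinct, stable eigenvalues of the dynamics matrix
z = np.array([0.5, -0.3, 0.2])
n = len(z)
coeffs = np.poly(z)                  # monic characteristic polynomial N(z)

# Controllability canonical (companion) form with b = e_n
A = np.zeros((n, n))
A[:-1, 1:] = np.eye(n - 1)
A[-1, :] = -coeffs[:0:-1]
b = np.zeros((n, 1))
b[-1] = 1.0

# Reference Gramian: A P A^T - P + b b^T = 0
P_ref = solve_discrete_lyapunov(A, b @ b.T)

# Spectral decomposition (26): entry (j, eta) is summed over the pair spectrum
dN = np.polyval(np.polyder(coeffs), z)   # derivatives N'(z_k)
P = np.zeros((n, n), dtype=complex)
for j in range(n):
    for eta in range(n):
        for k in range(n):
            for rho in range(n):
                P[j, eta] += z[k] ** j * z[rho] ** eta / (
                    dN[k] * dN[rho] * (1.0 - z[k] * z[rho]))

assert np.allclose(P.real, P_ref, atol=1e-10)
```

The quadruple sum reproduces the Gramian to machine precision whenever the eigenvalues are distinct and lie inside the unit circle.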

#### **5. Separable Spectral Method and Algorithm for Solving the Generalized Lyapunov Equation**

**Theorem 1.** *Consider a MISO (multiple input single output) discrete bilinear stationary system in the form (4) [3–5,24].*


*(2) The input vector is bounded:*

$$\left\|u(k)\right\| = \sqrt{\sum\_{i=1}^{m} \left|u\_i(k)\right|^2} < M.$$

(3) There exist real numbers *ϰ*, *ρ* such that the following inequalities hold:

$$\begin{array}{c} \left\| A^i \right\| \le \varkappa \rho^i, \ i = 0, 1, 2, \dots, \\ \rho < 1, \ \varkappa > 0, \\ M < \sqrt{1 - \rho^2}\, \varkappa^{-1}. \end{array}$$

(4) Suppose, in addition, the following conditions are satisfied:

$$\left| \frac{p\_{dij}^{cbln(k+1)}}{p\_{dij}^{cbln(k)}} \right| \le \overline{N} L < 1, \quad \forall k, \nu, \mu, i, j, \gamma, \tag{29}$$

where

$$\overline{N} = \sup\_{\gamma,\nu,\mu,i,j} n^2 \left| n\_{d\nu i}^{\gamma} \right| \left| n\_{d\mu j}^{\gamma} \right|, \quad L = \left| \left( 1 - z\_{\nu \max} z\_{\nu \max}^{\*} \right)^{-1} \right|. \tag{30}$$

In (30), *zνmax* denotes the maximum eigenvalue of the dynamics matrix of the linear part of the system.

Then, there exists a unique separable iterative spectral solution to the generalized Lyapunov equation (12) for the diagonalized system (16):

$$p\_d^{(k)ij} = \left(\sum\_{\nu,\mu} p\_{d\nu\mu}^{(k-1)ij}\right) \left[\left(1 - z\_{\nu}z\_{\mu}\right)^{-1} n\_{d\nu i}^{\gamma} n\_{d\mu j}^{\gamma}\right], k = 2, 3, \dots \infty. \tag{31}$$

$$\forall \nu, \mu, i, j = 1, 2, \dots \ n; \gamma = 1, 2, \dots \, m.$$

Sequences of partial sums (31) converge uniformly and absolutely to the corresponding elements of the solution matrix of the generalized Lyapunov equation (12) if the conditions of the theorem are satisfied. The controllability Gramian of the original bilinear system *Pcbln* is related to the controllability Gramian of the diagonalized bilinear system *Pcbln <sup>d</sup>* as follows

$$T P\_d^{cbln} T^T = P^{cbln}.\tag{32}$$

**Proof.** Consider the iterative process that generates the solution (31).

Step 1. Let us consider the forming of the right-hand side of the generalized Lyapunov equation for the case *m* = 1. We do not need the matrix of the Lyapunov equation solution of the linear part; rather we need a separable spectral decomposition of this solution in the pair spectrum of the matrix [18]:

$$P\_d^{bln(1)} = \sum\_{\gamma=1}^m \sum\_{\nu=1}^n \sum\_{\mu=1}^n \frac{v\_{d,\gamma\nu\mu}}{1 - z\_\nu z\_\mu} \mathbf{1}\_{\nu\mu}.\tag{33}$$

Step 2. Consider the formation of the right-hand side of the generalized Lyapunov equation with the example of the matrix $N\_d^{\gamma} \mathbf{1}\_{ij} \left(N\_d^{\gamma}\right)^T$:

$$N\_d^\gamma \mathbf{1}\_{ij} \left(N\_d^\gamma \right)^T = \sum\_{\nu=1}^n \sum\_{\mu=1}^n n\_{d\nu i}^\gamma n\_{d\mu j}^\gamma \mathbf{1}\_{\nu \mu}. \tag{34}$$

The solution of the Lyapunov equation takes at Step 2 the form:

$$P\_d^{bln(2)} = \sum\_{\gamma=1}^m \sum\_{\nu=1}^n \sum\_{\mu=1}^n \frac{1}{1 - z\_{\nu} z\_{\mu}} \left( n\_{d\nu i}^{\gamma} n\_{d\mu j}^{\gamma} \right) p\_{\nu\mu}^{bln(1)ij\gamma} \mathbf{1}\_{\nu\mu}.$$

Proceeding in a similar way and taking into account the summation of sub-Gramians over the index "*γ*", we obtain a formula for calculating the matrix of the Gramian kernel of the order "*k*" at step "*k*".

$$P\_d^{bln(k)ij\gamma} = \sum\_{\nu,\mu} r^{(k)ij\gamma} p\_{\nu\mu}^{(k-1)ij\gamma} \mathbf{1}\_{\nu\mu}, \quad r^{(k)ij\gamma} = \left[ \left( 1 - z\_\nu z\_\mu \right)^{-1} n\_{d\nu i}^{\gamma} n\_{d\mu j}^{\gamma} \right] \tag{35}$$

$$p\_{d\nu\mu}^{(k)ij\gamma} = \left(\sum\_{\nu,\mu} p\_{d\nu\mu}^{(k-1)ij\gamma}\right) \left[\left(1 - z\_{\nu}z\_{\mu}\right)^{-1} n\_{d\nu i}^{\gamma} n\_{d\mu j}^{\gamma}\right], k = 2, 3, \dots \infty,$$

$$\forall \nu, \mu, i, j = 1, 2, \dots \ n; \gamma = 1, 2, \dots m. \tag{36}$$

This proves the theorem's statement about the iterative spectral decomposition of the solution in the case of solution convergence. Let us show that under the conditions of Theorem 1, the convergence of the sequences is absolute and uniform. To this end, we construct a majorizing sequence for the elements of the sub-Gramian matrixes. Suppose that conditions (29)–(30) are satisfied. For all elements "*ij*" of the converging sequences, the following conditions must be satisfied:

$$\left| p\_{d\nu\mu}^{(k)ij\gamma} \right| \le \left| p\_{d\nu\mu}^{(k-1)ij\gamma} \right| \le \dots \le \left| p\_{d\nu\mu}^{bln(1)ij\gamma} \right| \le \max\_{\nu\mu} \left| p\_{d\nu\mu}^{bln(1)ij\gamma} \right| = M\_{\max}^{ij\gamma}.$$

Let us introduce the notation $M\_{\max} = \max\_{ij\gamma} M\_{\max}^{ij\gamma}$. For the matrix $N\_d^{\gamma}$, the exact upper bound of the products exists:

$$n^2 \left| n\_{d\nu i}^\gamma \right| \left| n\_{d\mu j}^\gamma \right| \le \overline{N}, \ \overline{N} > 0, \ \forall \gamma, \nu, i, \mu, j = 1, 2, \dots, n.$$

In addition, due to the stability of the linear part, the exact upper bound exists for the functions:

$$L = \max\_{\nu\mu} \left| \left(1 - z\_{\nu} z\_{\mu}^{\*} \right)^{-1} \right| = \left| \left(1 - z\_{\nu\max} z\_{\nu\max}^{\*} \right)^{-1} \right| > 0,$$

where *zνmax* is the maximum eigenvalue of the dynamics matrix of the system's linear part. Therefore, the following inequality holds:

$$\left| \frac{p\_{dij}^{cbln(k+1)}}{p\_{dij}^{cbln(k)}} \right| \le \overline{N} L, \ \forall \gamma, \nu, i, \mu, j = 1, 2, \dots, n.$$

We choose a single majorant for all numerical sequences in the form:

$$S\_{k} = M\_{\max} \, n^{2} \left[ \left| \left( 1 - z\_{\nu\max} z\_{\nu\max}^{\*} \right)^{-1} \right| \max\_{\nu\mu ij\gamma} \left| n\_{d\nu i}^{\gamma} n\_{d\mu j}^{\gamma} \right| \right]^{k-1}, \quad k = 2, 3, \dots \infty.$$

Obviously, with such a choice, according to (36), the following inequality holds:

$$\left| p\_{d\nu\mu}^{(k)ij\gamma} \right| < S\_{k}, \ \forall k, i, j, \nu, \mu, \gamma. \tag{37}$$

It follows thereof that under conditions (29)–(30), the inequality

 

$$\left| \frac{p\_{dij}^{cbln(k+1)}}{p\_{dij}^{cbln(k)}} \right| \le \frac{S\_{k+1}}{S\_k} < 1,\tag{38}$$

is valid.

The majorizing sequence for all sub-Gramians of the bilinear system forms a geometric progression with positive terms. In accordance with the convergence criterion for geometric progressions, it converges if the following condition is satisfied:

$$\overline{N} L < 1.$$

In accordance with the Weierstrass test (37)–(38), the sequences of partial sums $p\_{dij}^{cbln(k)}$ converge uniformly and absolutely. The uniqueness of the iterative solution under conditions (1)–(3) was proved in [5].

The Gramians method can be used simultaneously for state monitoring and control of large-scale power systems, in particular, for static stability analysis, for developing stability estimators, detecting dangerous free and forced oscillations, and assessing the resonant interaction of dangerous oscillations [1,7–10].

The algorithm for the spectral iterative solution of the generalized Lyapunov equation of the form (12) is as follows:

Step 1. Calculate the spectrum of the dynamics matrix of the linear part; check the stability of the linear part and the absence of multiple roots of the characteristic equation. Find a nondegenerate coordinate transformation that transforms the dynamics matrix of the linear part into a diagonal matrix. Transform the equations of the bilinear system (9) to the diagonal form.

Step 2. Check the fulfillment of conditions (1)–(4) of Theorem 1.

Step 3. By analyzing conditions (4), we identify the numerical sequences of elements of the matrixes of the kernels of the spectral expansion of the matrixes of the solution of the generalized Lyapunov equation, which are critical from the point of view of convergence.

Step 4. Using algorithm (34), we compute the sequences of elements "ij" of the matrixes of the kernels of the Gramian expansion of the bilinear system at each step. We aggregate the elements of the sequences into the matrixes of the kernels of the decomposition of the bilinear system Gramian. We estimate the accuracy of the solution.

Step 5. Using Formula (32), we calculate the Gramian matrix of the original bilinear system.
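For comparison, the classical (non-spectral) fixed-point iteration for the generalized Lyapunov equation, of which the proposed method is a spectral, element-wise version, can be sketched as follows; the matrices `A`, `N`, `B` below are illustrative assumptions, not data from the paper:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def bilinear_gramian(A, N, B, tol=1e-12, max_iter=200):
    """Fixed-point iteration for A P A^T - P + N P N^T + B B^T = 0."""
    P = solve_discrete_lyapunov(A, B @ B.T)   # Gramian of the linear part
    for _ in range(max_iter):
        # Freeze the bilinear term at the previous iterate and re-solve
        P_next = solve_discrete_lyapunov(A, N @ P @ N.T + B @ B.T)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    raise RuntimeError("iteration did not converge")

A = np.diag([0.5, -0.4])          # stable, diagonalized linear part
N = 0.1 * np.ones((2, 2))         # weak bilinearity ensures convergence
B = np.array([[1.0], [1.0]])
P = bilinear_gramian(A, N, B)
res = A @ P @ A.T - P + N @ P @ N.T + B @ B.T   # residual should vanish
```

Each pass solves an ordinary discrete Lyapunov equation with the bilinear term frozen at the previous iterate; convergence requires the bilinearity to be sufficiently weak, in line with conditions (1)–(4) of Theorem 1.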

Comment. In [3,5,20,24], various versions of the generalized Lyapunov equation solutions are proposed using conditions (1)–(3) given in Theorem 1, but neither the similarity transformation of the dynamics matrix of the linear part to the diagonal form nor the separable spectral decompositions of the solutions of the Lyapunov equation of the linear part and of the generalized Lyapunov equation for a bilinear system are used. Such a technique allows one to switch from calculating solution matrixes at separate iterations to calculating sequences of their elements.

As is known [2], the necessary and sufficient condition for energy stability of the system in terms of the square of the *H*<sup>2</sup> norm of the linear system transfer function G(z) has the form:

$$\|G(z)\|\_2^2 = \mathrm{tr}\, C P^c C^{\mathrm{T}} = \mathrm{tr}\, B^{\mathrm{T}} P^o B < +\infty.$$
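This trace identity is easy to verify numerically for an arbitrary stable test system (the matrices below are assumptions for illustration only):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Hypothetical stable discrete system G(z) = C (zI - A)^{-1} B
A = np.array([[0.6, 0.1],
              [0.0, 0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 2.0]])

Pc = solve_discrete_lyapunov(A, B @ B.T)     # controllability Gramian
Po = solve_discrete_lyapunov(A.T, C.T @ C)   # observability Gramian

h2_sq_c = np.trace(C @ Pc @ C.T)   # tr(C P^c C^T)
h2_sq_o = np.trace(B.T @ Po @ B)   # tr(B^T P^o B)
```

Both traces equal the squared H<sup>2</sup> norm, i.e., the sum of the squared Markov parameters of the system.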

Therefore, we define the stability loss risk functional of a bilinear system as:

$$J(z\_1, z\_2, \dots, z\_n) = \mathrm{tr}\, C P^{cbln} C^{\mathrm{T}} = \mathrm{tr}\, B^{\mathrm{T}} P^{obln} B.\tag{39}$$

As the system approaches the stability threshold caused by the roots of the characteristic equation approaching the unit circle, the risk functional tends to infinity. Let us define the acceptable risk of stability loss of the bilinear system as:

$$J^{(\gamma)}(z\_1, z\_2, \dots, z\_n, \gamma) = M\_{\gamma\,\mathrm{perm}}, \ \gamma = 1, 2, \dots, m.$$

We will consider any system as conditionally *unstable* if all its roots lie inside the unit circle but the functional of the stability loss risk (39) exceeds the established acceptable risk value. Accordingly, we will consider the system conditionally stable if:

$$J^{(\gamma)}(z\_1, z\_2, \dots, z\_n, \gamma) < M\_{\gamma\,\mathrm{perm}}, \ \gamma = 1, 2, \dots, m. \tag{40}$$

The inequalities (40) define a set of energy functionals, the boundedness whereof guarantees the BIBO stability of the bilinear system. Conditions (1)–(4) of the theorem are sufficient conditions for the BIBO stability of the bilinear system and, at the same time, sufficient conditions for the boundedness of the energy functionals *J*(*γ*).

It is easy to see that inequalities (40) determine the stability conditions for a bilinear energy system over a wider range than the traditional selective modal analysis. The analysis of expressions (37), (38) shows that the elements of the numerical sequences for the Gramian of the bilinear system converge at different rates, the guaranteed estimate whereof is specified by expressions (29)–(30).

This estimate depends on the choice of the "*γ*" channel, on the values of the elements of the nonlinearity matrixes for a specific channel, and on the proximity of the product of two eigenvalues of the linear part to unity. Sufficient conditions for BIBO stability of a bilinear system were obtained earlier in [3,5,24]. Theorem 1 establishes additional sufficient conditions (29)–(30) that guarantee the existence of not only a matrix for the solution of the generalized Lyapunov equation, but also of complete controllability and observability properties for the bilinear system.

#### **6. Case Studies**

The increased requirements for speed, accuracy and control capabilities under conditions of uncertainty in the presence of various kinds of disturbances in the control systems of production processes in industry and the electric power industry have demonstrated the inadequacy of the capabilities of traditional approaches to the synthesis of automatic control systems. Methods of identification synthesis, in which models are developed on the basis of data mining and machine learning, are gaining more and more popularity [18].

The authors have developed an intelligent system designed to dynamically assess the state of facilities in the power system [34]. The system is underpinned by intelligent algorithms of grid dynamics identification with automatic on-line self-tuning based on the data from monitoring systems.

State estimation models for power facilities with on-line model tuning are based on data monitoring and application of a predictive method for state estimation—the associative search method.

The acquisition, storage, processing, displaying, analysis and documenting of the information are executed in real time based on the data from automated power generation, distribution and consumption systems and supervisory control, monitoring and accounting. Figure 1 demonstrates power dynamic estimation for a certain facility in the power system.

**Figure 1.** Power dynamic estimation.

Figure 1 shows how a more accurate estimate of the real process dynamics can be obtained using the associative search model, compared with the classical linear models.

#### **7. Conclusions**

Predictive bilinear models of discrete dynamical systems are obtained using the associative search algorithm. The method is based on the use of machine learning procedures and inductive knowledge (associative patterns) extraction from historical data. The method features high algorithmic speed, since the main computational load falls on the training stage.

According to the proposed scheme, we first obtain a bilinear model of a nonlinear dynamic object and then analyze its stability. The advantages of the scheme are the accuracy and speed of the identification algorithm. Section 6 demonstrates the operation of the associative search algorithm. It shows that for nonlinear systems the models obtained through this algorithm are more accurate than those obtained using traditional linear techniques.

Furthermore, according to our scheme, separable spectral expansions of discrete Lyapunov equations are obtained for MISO LTI discrete dynamical systems governed by state equations in controllability and observability forms. A method and an algorithm for the element-wise solution of the generalized matrix Lyapunov equation are developed for discrete bilinear systems. The new method is a spectral version of the well-known iterative method used for solving this equation.

A sequence of values is calculated for a fixed element of the solution matrix. The element depends on the eigenvalues product of the dynamics matrix of the linear part and the elements of the nonlinearity matrixes. A sufficient condition for the convergence of all sequences is obtained, which is also a BIBO stability condition for a bilinear system.

The article discusses MIMO, MISO, and SISO classes of bilinear systems of the form (10) but does not consider bilinear systems with distributed parameters. In the future, the authors intend to extend the new method to this class of systems. Time-variant systems will also be investigated.

**Author Contributions:** Conceptualization, N.B. and I.Y.; methodology, N.B. and I.Y.; formal analysis, N.B. and I.Y.; investigation, N.B. and I.Y.; writing—original draft preparation, N.B. and I.Y.; writing—review and editing, N.B. and I.Y.; visualization N.B. and I.Y.; supervision, N.B. and I.Y.; project administration, N.B. and I.Y.; funding acquisition, N.B. and I.Y. All authors have read and agreed to the published version of the manuscript.

**Funding:** The APC was funded by the Russian Science Foundation. This work was supported by the Russian Science Foundation project no. 19-19-00673 and by the Russian Foundation for Basic Research (RFBR), project number 21-57-53005.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Maximum-Likelihood-Based Adaptive and Intelligent Computing for Nonlinear System Identification**

**Hasnat Bin Tariq 1, Naveed Ishtiaq Chaudhary 1, Zeshan Aslam Khan 1, Muhammad Asif Zahoor Raja 2,\*, Khalid Mehmood Cheema <sup>3</sup> and Ahmad H. Milyani <sup>4</sup>**


**Abstract:** Most real-time systems are nonlinear in nature, and their optimization is very difficult due to inherent stiffness and complex system representation. The computational intelligent algorithms of the evolutionary computing paradigm (ECP) effectively solve various complex, nonlinear optimization problems. The differential evolution algorithm (DEA) is one of the most important approaches in ECP, which outperforms other standard approaches in terms of accuracy and convergence performance. In this study, a novel application of a recently proposed variant of the DEA, the so-called maximum-likelihood-based adaptive differential evolution algorithm (ADEA), is investigated for the identification of nonlinear Hammerstein output error (HOE) systems that are widely used to model different nonlinear processes of engineering and applied sciences. The performance of the ADEA is evaluated by taking polynomial- and sigmoidal-type nonlinearities in two case studies of HOE systems. Moreover, the robustness of the proposed scheme is examined for different noise levels. Reliability and consistent accuracy are assessed through multiple independent trials of the scheme. The convergence, accuracy, robustness and reliability of the ADEA are carefully examined for HOE identification in comparison with the standard counterpart of the DEA. The ADEA achieves fitness values of 1.43 × 10<sup>−8</sup> and 3.46 × 10<sup>−9</sup> for a population size of 80 and 100, respectively, in the HOE system identification problem of case study 1 for a 0.01 noise level, while the respective fitness values in the case of the DEA are 1.43 × 10<sup>−6</sup> and 3.46 × 10<sup>−7</sup>. The ADEA is more statistically consistent but more complex when compared to the DEA due to the extra operations involved in introducing the adaptiveness during the mutation and crossover. The current study may be considered a step further in developing ECP-based computational intelligence through effective nonlinear system identification.

**Keywords:** adaptive differential evolution; evolutionary computing; Hammerstein; nonlinear system identification

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

#### *1.1. Background and Motivation*

System identification or parameter estimation involves the approximation of unknown variables of the system, and this concept provides the foundation for solving different engineering, science and technology problems [1]. Most real-time systems are nonlinear and complex in nature. There are many applications for nonlinear systems in science and engineering, such as the inverted pendulum system [2], motion control of a motor driven robot [3], average dwell-time switching [4], tail-control missile system [5], and weather station systems [6].

**Citation:** Tariq, H.B.; Chaudhary, N.I.; Khan, Z.A.; Raja, M.A.Z.; Cheema, K.M.; Milyani, A.H. Maximum-Likelihood-Based Adaptive and Intelligent Computing for Nonlinear System Identification. *Mathematics* **2021**, *9*, 3199. https:// doi.org/10.3390/math9243199

Academic Editors: Natalia Bakhtadze, Igor Yadykin, Andrei Torgashov and Nikolay Korgin

Received: 11 November 2021 Accepted: 9 December 2021 Published: 11 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Nonlinear systems can be described through different nonlinear models, including the Volterra series [7], Wiener series [8], NARMAX model [9], Wiener model [10–12] and Hammerstein model [13], etc. Researchers revealed the strong relations between different nonlinear models and the Volterra series [14]. Sidorov et al. contributed significantly to the theory and applications of Volterra equations by proposing different methods [15–17] and exploring the applications in power system operations and energy storage systems [18]. The Volterra series can also represent the Hammerstein model; Kibangou et al. [19] described the Hammerstein model through a Volterra series representation and identified the coefficients of the Hammerstein model using the Volterra series. The Hammerstein model has a simpler structure with easier identification than the Volterra series [20]. Therefore, the Hammerstein model is often used to represent a wide class of nonlinear systems [21–24].

The Hammerstein structure, presented in Figure 1, belongs to a class of input nonlinear systems (INL) where a nonlinear block is cascaded with a linear block. Different local and global search algorithms were proposed for the identification of INL models. Local search algorithms are easy to implement but prone to becoming stuck in local minima. Local search algorithms include the key term separation technique for the parameter estimation of the Hammerstein-controlled autoregressive system [25]; impulse response, constrained, least-square support vector machine modeling for multiple-input and multiple-output Hammerstein system identification [26]; fractional calculus-based adaptive techniques [27,28]; and the parameter estimation problems of input-nonlinear-output error autoregressive systems, based on the key variable separation technique and the auxiliary model-based identification [29], whereas global search techniques effectively handle the local minima issues. The global search methods based on evolutionary and swarm optimization heuristics are effectively applied for the parameter estimation of different input nonlinear systems; for example, genetic algorithms are used for the parameter estimation of nonlinear, Hammerstein-controlled autoregressive systems [30]. Meta-heuristic computing techniques are used for the parameter estimation of Hammerstein-controlled auto-regressive, moving-average systems using differential evolution, genetic algorithms, pattern searches and simulated annealing algorithms [31]. Evolutionary computational heuristics are presented for the parameter estimation problem of nonlinear Hammerstein-controlled, auto-regressive systems through a global search competency of the backtracking search algorithm, differential evolution, and genetic algorithms [32]. The neural networks and fuzzy-logic-based, computational, intelligent approaches are also used to solve complex system identification problems [33–37].

**Figure 1.** Block Diagram of INL systems.

The DEA was also effectively applied to INL systems, and it showed better results than its standard counterparts [38]. Recently, a new variant of the DEA called the maximum-likelihood-based adaptive DEA (ADEA) was proposed [39] for linear systems. The ADEA showed an improved performance compared to the standard DEA in terms of convergence speed and accuracy. The increasing complexity of nonlinear systems requires a continuous search for more accurate and reliable computing algorithms. Thus, the enhanced performance of the DEA and ADEA inspired the authors to investigate the behavior of these algorithms for effective INL system identification.

#### *1.2. Objectives and Contribution*

In this study, the performance of the DEA and ADEA in terms of correctness, robustness, and convergence, is examined for different nonlinearities, as well as noise levels, in INL systems. The most important contributions of this study are as follows:


#### *1.3. Paper Outline*

The rest of the paper is organized as follows: the INL-based system model of the Hammerstein output error (HOE) structure is given in Section 2. The proposed differential-evolution-based schemes are presented in Section 3. The simulation results for two case studies of HOE systems are provided in Section 4. The main conclusions and some future research directions are listed in Section 5.

#### **2. Mathematical Model of HOE Systems**

Figure 2 shows the block diagram of the HOE model [40].

**Figure 2.** Mathematical structure of HOE system.

The input–output relation of HOE system in Figure 2 is represented as:

$$y(t) = u(t) + v(t) \tag{1}$$

where *y*(*t*) represents the systems' output, *v*(*t*) denotes the additive noise, and *u*(*t*) denotes noise-free output, defined as:

$$u(t) = \frac{C(z)}{D(z)} \, \overline{s}(t) \tag{2}$$

$\overline{s}(t)$ denotes the nonlinear block's output and is defined as a nonlinear function of the system input *s*(*t*) with a known basis: *γ*1, *γ*2,..., *γm*,

$$\overline{s}(t) = f(s(t)) = e\_1\gamma\_1(s(t)) + e\_2\gamma\_2(s(t)) + \dots + e\_m\gamma\_m(s(t))\tag{3}$$

or:

$$\overline{s}(t) = \sum\_{j=1}^{m} e\_j \gamma\_j(s(t)) \tag{4}$$

Substituting (3) in (2) yields:

$$u(t) = \frac{C(z)}{D(z)} \left[ e\_1 \gamma\_1(s(t)) + e\_2 \gamma\_2(s(t)) + \dots + e\_m \gamma\_m(s(t)) \right] \tag{5}$$

where *D*(*z*) and *C*(*z*) represent polynomials in the backward shift operator *z*<sup>−1</sup> [*z*<sup>−1</sup>*y*(*t*) = *y*(*t* − 1)]:

$$\begin{array}{c} D(z) = 1 + d\_1 z^{-1} + d\_2 z^{-2} + \dots + d\_n z^{-n} \\ C(z) = c\_1 z^{-1} + c\_2 z^{-2} + \dots + c\_n z^{-n} \end{array} \tag{6}$$
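A minimal simulation of the HOE response (1)–(6) can be sketched as follows, assuming *n* = 2, a polynomial basis *γj*(*s*) = *s*<sup>j</sup>, and hypothetical coefficient values chosen only for illustration:

```python
import numpy as np

d = np.array([0.4, -0.2])          # D(z) = 1 + d1 z^-1 + d2 z^-2 (assumed)
c = np.array([0.7, 0.3])           # C(z) = c1 z^-1 + c2 z^-2 (assumed)
e = np.array([1.0, 0.5])           # nonlinearity coefficients (assumed)

rng = np.random.default_rng(0)
s = rng.uniform(-1, 1, 200)        # input signal
s_bar = e[0] * s + e[1] * s ** 2   # static nonlinearity, Equation (3)

# Linear block u(t) = C(z)/D(z) s_bar(t), i.e.
# u(t) = -d1 u(t-1) - d2 u(t-2) + c1 s_bar(t-1) + c2 s_bar(t-2)
u = np.zeros_like(s)
for t in range(len(s)):
    for k, (dk, ck) in enumerate(zip(d, c), start=1):
        if t - k >= 0:
            u[t] += -dk * u[t - k] + ck * s_bar[t - k]

y = u + 0.01 * rng.standard_normal(len(s))   # noisy output, Equation (1)
```

The recursion is just Equation (2) rewritten as a difference equation; with the assumed *D*(*z*) the linear block is stable, so the simulated output stays bounded.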

The output of the HOE can be expressed in terms of information and parameter vectors, where the information vector containing the input and output delay terms is denoted by *w*(*t*) and the corresponding parameter vector of the HOE is defined as [40]:

$$\boldsymbol{\Theta} = [d, c, e]^{\mathsf{T}} \in \mathcal{R}^{n\_0}$$

where *n*<sup>0</sup> = 2*n* + *m* and the variables in the parameter vector are:

$$d = \begin{bmatrix} d\_1, \ d\_2, \dots, d\_n \end{bmatrix}^{\mathrm{T}} \in \mathcal{R}^{n}$$

$$c = \begin{bmatrix} c\_1, \ c\_2, \dots, c\_n \end{bmatrix}^{\mathrm{T}} \in \mathcal{R}^{n}$$

$$e = \begin{bmatrix} e\_1, \ e\_2, \dots, e\_m \end{bmatrix}^{\mathrm{T}} \in \mathcal{R}^{m}$$

The block diagram of the identification of the nonlinear system modelled through the block-oriented HOE structure shown in Figure 2, by using the proposed evolutionary algorithms, is shown in Figure 3. The objective is to minimize the error *z*(*t*) between the desired response *y*(*t*) and the estimated response *y*ˆ(*t*) by exploiting the proposed evolutionary computing approach, such that *y*ˆ(*t*) approaches *y*(*t*).

**Figure 3.** Identification model of nonlinear systems.

#### **3. Proposed Methodology**

In this section, the proposed methodology based on the DEA and ADEA with maximum-likelihood criteria is presented for the identification of the HOE system given in Section 2.

#### *3.1. Differential Evolution Algorithm (DEA)*

The DEA is one of the most broadly exploited algorithms in ECP, developed by Rainer Storn and Kenneth Price in 1995 [41]. This is a population-based algorithm which has the ability to solve global optimum problems. Due to its usefulness and efficiency, this algorithm is applied to various problems, such as the parameter estimation of Hammerstein control autoregressive systems [38], deep belief networks [42], effective long short-term memory for electricity price prediction [43], the parameter estimation of solar cells [44], and effective electricity energy consumption forecasting using an echo state network [45]. In this study, a recently introduced maximum-likelihood-based adaptive DEA is exploited for HOE identification, and the maximum-likelihood-based DEA is used for the purpose of comparison [39]. The flowchart describing the main steps of the DEA is presented in Figure 4.

**Figure 4.** Flowchart of the DEA.

#### *3.2. Adaptive Differential Evolution Algorithm (ADEA)*

Different variants of the DEA are proposed through introducing the adaptivity in the process of mutation and crossover. These adaptive DEA variants are effectively exploited to solve many nonlinear problems, such as those involving photovoltaic models and other optimization problems [46–54]. The main steps of the adaptive DEA are similar to the simple DEA that starts from population initialization, mutation, crossover, selection and then termination. The only aspect which differs is the adaptiveness factor of both the mutation and crossover processes. Recently, a maximum-likelihood-criterion-based adaptive DEA, i.e., ADEA, was proposed, where the fitness value is calculated by the maximum-likelihood-criterion function [39]. In the ADEA, the values of the mutation and crossover process change automatically according to the generation (T). The pseudocode of the ADEA is presented in Algorithm 1, whereas the stepwise mechanism involved in the learning of the ADEA is as follows:

Step 1. Initialization:

Set the generation t = 1 and the initial population *Pi*,0. Given: the population size Np, the mutation factor F and the maximum generation T.

Step 2. Data Collection:

Collect the calculated data {*wi* (1), *wi* (2), . . . , *wi* (N)} and {*yi* (1), *yi* (2), . . . , *yi* (N)}.

Step 3. Adaptive Mutation Operation:

Calculate the mutation vector *Vi*.*a*,*t* using L = exp(1 − *T*/(*T* + 1 − *t*)); AMF = F · 2<sup>L</sup>;

*Vi*.*a*,*t* = *pi*.*X*1,*t*−<sup>1</sup> + AMF · (*pi*.*X*2,*t*−<sup>1</sup> − *pi*.*X*3,*t*−1);

Step 4. Adaptive Crossover Operation:

Read *Vi*.*a*,*j*,*<sup>t</sup>* from mutation vector *Vi*.*a*,*<sup>t</sup>* = [*Vi*.*a*,1,*t*, *Vi*.*a*,2,*t*,..., *Vi*.*a*,*D*,*t*] *<sup>T</sup>*; and read *pi*.*a*,*j*,*t*−<sup>1</sup> from target vector *<sup>p</sup>i*.*a*,*t*−<sup>1</sup> <sup>=</sup> [*pi*.*a*,1,*t*−1, *pi*.*a*,2,*t*−1,..., *pi*.*a*,*D*,*t*−<sup>1</sup> ] *<sup>T</sup>*; to create the crossover vector *Ui*.*a*,*t*.

For t = 1 or t = 2*l* (i.e., even generations), the adaptive crossover probability is Pc = (1 + cos(*t*))/2; otherwise, it is Pc = (1 + cos(*t* − 1))/2.

Step 5. Selection Procedure:

Compute the maximum-likelihood criterion function of $U_{i.a,t}$ and $p_{i.a,t-1}$ using the equations:

$$J(U_{i.a,t}) = \frac{1}{N}\sum_{t=1}^{N}\left[y_i(t) - w_i^T(t)\,U_{i.a,t}\right]^2; \qquad J(p_{i.a,t-1}) = \frac{1}{N}\sum_{t=1}^{N}\left[y_i(t) - w_i^T(t)\,p_{i.a,t-1}\right]^2.$$

The vector with the smaller criterion value is retained as $p_{i.a,t}$ for the next generation, and the procedure repeats until the maximum generation $T$ is reached.
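The adaptive schedules in Steps 3 and 4 depend only on the generation index t and the maximum generation T, so they can be sketched in isolation. A minimal Python illustration (the base mutation factor F = 0.5 is an assumed value, not taken from the paper):

```python
import math

def adaptive_mutation_factor(t, T, F=0.5):
    """AMF = F * 2**L with L = exp(1 - T / (T + 1 - t)), cf. Step 3."""
    L = math.exp(1 - T / (T + 1 - t))
    return F * 2 ** L

def adaptive_crossover_probability(t):
    """Pc alternates between two cosine-based schedules, cf. Step 4."""
    if t == 1 or t % 2 == 0:          # t = 1 or t = 2l (even generation)
        return (1 + math.cos(t)) / 2
    return (1 + math.cos(t - 1)) / 2

T = 10
for t in range(1, T + 1):
    amf = adaptive_mutation_factor(t, T)
    pc = adaptive_crossover_probability(t)
    assert 0.0 <= pc <= 1.0           # Pc is always a valid probability
```

Note that at t = 1 the factor equals 2F and it decays toward F as t approaches T, so exploration is widest early in the run.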


#### **Algorithm 1 Pseudo-code of the ADEA**

**Input:** Collected data $\{w_i(1), w_i(2), \ldots, w_i(N)\}$ and $\{y_i(1), y_i(2), \ldots, y_i(N)\}$; the population size Np, the mutation factor F and the maximum generation T. Let the generation t = 1.

**Output:** $p_{i.best,t}$

```
(1)  for a = 1 : Np do
(2)      for j = 1 : D do
(3)          p_{i.a,j,0} = rand(0, 1)
(4)      end
(5)      p_{i.a,0} = [p_{i.a,1,0}, p_{i.a,2,0}, ..., p_{i.a,D,0}]^T
(6)  end
(7)  P_{i,0} = [p_{i.1,0}, p_{i.2,0}, ..., p_{i.Np,0}]^T
(8)  for t = 1 : T do
(9)      for a = 1 : Np do
(10)         X1 = randperm(Np, 1)
(11)         while X1 = a do
(12)             X1 = randperm(Np, 1)
(13)         end
(14)         X2 = randperm(Np, 1)
(15)         while X2 = a or X2 = X1 do
(16)             X2 = randperm(Np, 1)
(17)         end
(18)         X3 = randperm(Np, 1); while X3 = a or X3 = X1 or X3 = X2 do
(19)             X3 = randperm(Np, 1)
(20)         end
(21)         L = exp(1 - T/(T + 1 - t)); AMF = F * 2^L;
             V_{i.a,t} = p_{i.X1,t-1} + AMF * (p_{i.X2,t-1} - p_{i.X3,t-1})
(22)         if t = 1 or t = 2l then
(23)             Pc = (1 + cos(t)) / 2
(24)         else
(25)             Pc = (1 + cos(t - 1)) / 2
(26)         end
(27)         V_{i.a,t} = [V_{i.a,1,t}, ..., V_{i.a,D,t}]^T; p_{i.a,t-1} = [p_{i.a,1,t-1}, ..., p_{i.a,D,t-1}]^T
(28)         for j = 1 : D do
(29)             if rand(0, 1) <= Pc or j = randperm(D, 1) then
(30)                 U_{i.a,j,t} = V_{i.a,j,t}
(31)             else
(32)                 U_{i.a,j,t} = p_{i.a,j,t-1}
(33)             end
(34)         end
(35)         U_{i.a,t} = [U_{i.a,1,t}, ..., U_{i.a,D,t}]^T
(36)         J(U_{i.a,t}) = (1/N) * sum_{k=1..N} [y_i(k) - w_i^T(k) U_{i.a,t}]^2
(37)         J(p_{i.a,t-1}) = (1/N) * sum_{k=1..N} [y_i(k) - w_i^T(k) p_{i.a,t-1}]^2
(38)         if J(U_{i.a,t}) < J(p_{i.a,t-1}) then
(39)             p_{i.a,t} = U_{i.a,t}
(40)         else
(41)             p_{i.a,t} = p_{i.a,t-1}
(42)         end
(43)         J(p_{i.a,t}) = (1/N) * sum_{k=1..N} [y_i(k) - w_i^T(k) p_{i.a,t}]^2
         end
(44)     p_{i.best,t} = arg min_{a} J(p_{i.a,t})
(45) end
```
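As a concrete illustration of the loop above, the following Python sketch applies the same adaptive mutation, crossover and greedy selection steps to a linear-in-parameters model $y(k) \approx w^T(k)p$. The data set and the settings Np, F and T are illustrative choices, not the paper's case-study values:

```python
import numpy as np

rng = np.random.default_rng(0)

def adea(w, y, D, Np=50, F=0.5, T=200):
    """ADEA sketch. w: (N, D) regressor matrix, y: (N,) measured output."""
    J = lambda p: np.mean((y - w @ p) ** 2)          # likelihood-type criterion
    P = rng.uniform(0, 1, size=(Np, D))              # (1)-(7) initial population
    for t in range(1, T + 1):                        # (8)
        L = np.exp(1 - T / (T + 1 - t))              # (21) adaptive mutation factor
        AMF = F * 2 ** L
        if t == 1 or t % 2 == 0:                     # (22)-(26) adaptive crossover
            Pc = (1 + np.cos(t)) / 2
        else:
            Pc = (1 + np.cos(t - 1)) / 2
        for a in range(Np):                          # (9)
            X1, X2, X3 = rng.choice([i for i in range(Np) if i != a],
                                    size=3, replace=False)  # (10)-(20)
            V = P[X1] + AMF * (P[X2] - P[X3])        # (21) mutant vector
            jr = rng.integers(D)                     # forced crossover index
            mask = rng.uniform(size=D) <= Pc         # (28)-(34)
            mask[jr] = True
            U = np.where(mask, V, P[a])              # (35) trial vector
            if J(U) < J(P[a]):                       # (38)-(42) greedy selection
                P[a] = U
    return P[np.argmin([J(p) for p in P])]           # (44) best individual

# usage sketch: recover the parameters of y = w @ p_true + noise
N, D = 200, 3
w = rng.standard_normal((N, D))
p_true = np.array([0.6, -0.3, 0.9])
y = w @ p_true + 0.01 * rng.standard_normal(N)
p_hat = adea(w, y, D)
```

The initial-population bounds and the rule of retaining a trial vector only when it lowers the criterion J follow lines (1)–(7) and (38)–(42) of the pseudocode.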
#### **4. Simulation and Performance Analyses**

This section presents the simulation results of two case studies on HOE system identification using the DEA and ADEA. The simulations for both algorithms are performed in MATLAB. The identification of HOE systems is carried out for different noise levels, generation sizes and population sizes, and the results are reported through convergence graphs and statistical analyses. The input *s*(*t*) for this system is a zero-mean, unit-variance signal, while *v*(*t*) is additive noise. The performance of the algorithms in terms of convergence speed, accuracy, robustness and reliability is evaluated through the fitness function given below:

$$\text{Fitness} = \text{mean}\left[\left(\mathbf{y} - \mathbf{\hat{y}}\right)^2\right]$$

where $\mathbf{y}$ represents the desired response and $\hat{\mathbf{y}}$ is the estimated response obtained through the proposed evolutionary algorithms. The optimal parameter settings for the DEA and ADEA techniques are presented in Table 1.
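To make the fitness formulation concrete, the following sketch generates a desired Hammerstein output-error response and scores a candidate response against it. The first-order linear block and the cubic polynomial nonlinearity are illustrative assumptions; the case-study parameters in [40,55] are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)

def hoe_response(s, v, nl, b=0.8, a=0.5):
    """Hammerstein output-error sketch: static nonlinearity nl(.)
    followed by a first-order linear block y(k) = a*y(k-1) + b*x(k) + v(k)."""
    x = nl(s)
    y = np.zeros_like(s)
    for k in range(1, len(s)):
        y[k] = a * y[k - 1] + b * x[k] + v[k]
    return y

poly = lambda s: s + 0.5 * s**2 - 0.3 * s**3      # polynomial-type nonlinearity
s = rng.standard_normal(1000)                     # zero-mean, unit-variance input
v = np.sqrt(0.01) * rng.standard_normal(1000)     # additive noise, variance 0.01

y_desired = hoe_response(s, v, poly)              # noisy desired response
y_hat = hoe_response(s, np.zeros_like(v), poly)   # noise-free candidate response
fitness = np.mean((y_desired - y_hat) ** 2)       # Fitness = mean[(y - y_hat)^2]
```

With a perfect parameter estimate, the residual fitness is set only by the accumulated measurement noise, which is why lower noise variances yield smaller optimal fitness values in the tables below.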


**Table 1.** Parameter settings of DEA and ADEA.

#### *4.1. Case Study 1*

The desired response for case study 1 of the HOE system is obtained through the set of parameters taken from [40]. The performance of the ADEA is assessed by considering two types of nonlinearities and different noise levels in the HOE system. The performances of the DEA and ADEA in terms of fitness are first investigated for variable generation sizes (400, 600, 800) and population sizes (50, 100, 150). The detailed results with polynomial-type nonlinearity are presented in Table 2. The fitness values in Table 2 show that, for the given generation sizes (400, 600, 800), the fitness of both algorithms decreases as the population size increases. Furthermore, both methods achieve their minimum fitness values for the largest generation sizes. The ADEA shows an improved performance compared to the DEA for almost all generation and population sizes. The best fitness achieved by the ADEA for 800 generations with a population size of 150 is 7.08 × 10<sup>−15</sup>.


**Table 2.** Comparison of DEA and ADEA with respect to generations and population size for polynomial-type nonlinearity in case study 1.

The fitness-based learning curves for different generation and population sizes with polynomial-type nonlinearity are shown in Figure 5. Figure 5a–c represent the learning curves for the DEA, whereas Figure 5d–f show the learning curves for the ADEA. Figure 5a–c show that the DEA achieved fast and accurate convergence for large generation and population sizes, although a slight difference in convergence is observed for the DEA until 100 generations are reached with the different populations. Likewise, Figure 5d–f show that the ADEA also attains minimum fitness values for larger generation and population sizes.


**Figure 5.** Fitness plots of DEA vs. ADEA, with respect to generations and population size for polynomial-type nonlinearities in case study 1.

The performance of the DEA and ADEA is further examined for three noise variances (0.09, 0.05, 0.01) with fixed population sizes (50, 80, 100) and a varying number of generations. To analyze the methods in terms of optimal fitness, the results with polynomial-type nonlinearity are provided in Table 3 for the three noise levels and three populations. The fitness values in Table 3 show that both the DEA and ADEA achieve significant performance in terms of fitness for small noise variances across the different populations. However, neither method performs as well for higher noise variances. The optimal fitness achieved by the DEA and ADEA with a noise variance of 0.01 and a population size of 100 is 2.12 × 10<sup>−6</sup> and 8.69 × 10<sup>−10</sup>, respectively.


**Table 3.** Comparison of DEA and ADEA with respect to noise variance and fixed population size for polynomial-type nonlinearity in case study 1.

The learning curves for the fitness achieved with polynomial-type nonlinearity, three noise variances and three population variations are presented in Figure 6. The learning curves for the DEA and ADEA are shown in Figure 6a–c and Figure 6d–f, respectively. Figure 6a–c show that the convergence and steady-state performance of the DEA improve as the population size, noise variance and number of generations decrease, and vice versa. Similar behavior is shown by the ADEA for lower noise levels and smaller population sizes. Moreover, the ADEA requires roughly twice as many generations as the DEA to reach its optimal fitness.

**Figure 6.** Fitness plots of DEA vs. ADEA with respect to noise variance and population size for polynomial-type nonlinearities in case study 1.

The fitness with regard to the MSE for both the DEA and ADEA is also evaluated by introducing sigmoidal-type nonlinearity to the HOE system. The investigations are made for different populations (50, 100, 150) and generations (400, 600, 800). The comparison of fitness results between the DEA and ADEA for the HOE system with sigmoidal-type nonlinearity is shown in Table 4. The MSE results in Table 4 show that the performances of the DEA and ADEA improve with increasing population size for the various generations (400, 600, 800). The optimal fitness of both algorithms is achieved for the largest generation and population values. The relative performance, in terms of the minimum fitness achieved by both methods for particular generations and populations, is not consistent. The minimum fitness attained by the DEA and ADEA with the maximum number of generations and populations is 6.26 × 10<sup>−11</sup> and 3.24 × 10<sup>−9</sup>, respectively.

**Table 4.** Performance comparison of DEA and ADEA with regard to generations and population size for sigmoidal-type nonlinearity in case study 1.

| Generations (T) | Population Size (Np) | DEA Fitness | ADEA Fitness |
|---|---|---|---|
| 400 | 50 | 7.14 × 10<sup>−5</sup> | 1.77 × 10<sup>−4</sup> |
| 400 | 100 | 9.44 × 10<sup>−6</sup> | 1.37 × 10<sup>−5</sup> |
| 400 | 150 | 8.89 × 10<sup>−7</sup> | 1.41 × 10<sup>−4</sup> |
| 600 | 50 | 2.08 × 10<sup>−5</sup> | 1.18 × 10<sup>−4</sup> |
| 600 | 100 | 3.82 × 10<sup>−7</sup> | 3.40 × 10<sup>−8</sup> |
| 600 | 150 | 8.54 × 10<sup>−8</sup> | 8.18 × 10<sup>−9</sup> |
| 800 | 50 | 1.10 × 10<sup>−5</sup> | 1.21 × 10<sup>−7</sup> |
| 800 | 100 | 5.17 × 10<sup>−7</sup> | 4.63 × 10<sup>−8</sup> |
| 800 | 150 | 6.26 × 10<sup>−11</sup> | 3.24 × 10<sup>−9</sup> |

The learning plots representing fitness for sigmoidal-type nonlinearity with variations in generation and population are shown in Figure 7. The fitness curves for the DEA are shown in Figure 7a–c and those for the ADEA in Figure 7d–f. A similar trend in the performance of the DEA and ADEA is observed in Figure 7a–f regarding convergence speed and final estimation accuracy. Both methods exhibit fast convergence for smaller population and generation sizes; however, they achieve their optimal fitness for larger population and generation values.


**Figure 7.** Fitness curves of DEA vs. ADEA with regard to generations and population size for sigmoidal-type nonlinearities in case study 1.

The behavior of the DEA and ADEA in terms of the minimum fitness achieved with sigmoidal-type nonlinearity for the HOE system is also assessed by fixing three population sizes (50, 80, 100) against three noise variances (0.09, 0.05, 0.01) and different generation sets. The optimal fitness attained for the different noise levels and population sizes is presented in Table 5. Both the DEA and ADEA perform well for the smallest noise value, obtaining optimal fitness values of 1.67 × 10<sup>−7</sup> and 3.46 × 10<sup>−9</sup>, respectively.


**Table 5.** Performance comparison of DEA and ADEA with regard to noise variance and population size for sigmoidal-type nonlinearity in case study 1.

Figure 8 shows the fitness-based learning curves with sigmoidal-type nonlinearity for various noise levels, population sizes, and different generations. The learning curves for DEA, shown in Figure 8a–c, demonstrate that DEA performs effectively in terms of convergence rate for low noise variances with the maximum number of populations. DEA achieves a fast convergence by increasing the generation size up to 200, whereas the graphs in Figure 8d–f show a fast convergence rate for ADEA up to 600 generations with low noise levels, e.g., 0.01, and a small population size, e.g., 50.

#### *4.2. Case Study 2*

The desired response for case study 2 of the HOE system is obtained through the set of parameters taken from [55]. The performance of the ADEA is assessed by considering two types of nonlinearities and different noise levels in the HOE system.

**Figure 8.** Fitness curves of DEA vs. ADEA with regard to noise variance and population size for sigmoidal-type nonlinearities in case study 1.

In case study 2, the DEA and ADEA are assessed for various populations (50, 100, 150) and generations (400, 600, 800) using two types of nonlinearities: polynomial and sigmoidal. The performance outcomes of the DEA and ADEA for polynomial- and sigmoidal-type nonlinearities are shown in Tables 6 and 7, respectively. Tables 6 and 7 show that, for both types of nonlinearities, the performance of the DEA and ADEA at different numbers of generations improves with an increase in the population size. Moreover, the best performance of both methods is achieved for larger generation sizes. For polynomial-type nonlinearity, the minimum fitness values achieved by the DEA and ADEA are 6.32 × 10<sup>−19</sup> and 6.84 × 10<sup>−12</sup>, respectively, whereas the minimum fitness values accomplished by the DEA and ADEA for sigmoidal-type nonlinearity are 7.68 × 10<sup>−12</sup> and 3.78 × 10<sup>−12</sup>, respectively.


**Table 6.** Comparison of DEA and ADEA with respect to generations and population size for polynomial-type nonlinearity.

**Table 7.** Comparison of DEA and ADEA with respect to generations and population size for sigmoidal-type nonlinearity in case study 2.


Figures 9 and 10 show the fitness-based learning curves with different populations and generations for polynomial-type and sigmoidal-type nonlinearities, respectively. Figure 9 shows that both the DEA and ADEA converge quickly for smaller population and generation sizes, but both methods obtain a better steady-state performance for larger population and generation sizes. A similar performance trend is shown by the DEA and ADEA in Figure 10 for sigmoidal-type nonlinearity.

To establish the robustness of the DEA and ADEA, the performance of both techniques is evaluated for different noise variances (0.09, 0.05, 0.01), variable generation sizes and three population sizes (50, 80, 100). The optimal results achieved by the DEA and ADEA with polynomial- and sigmoidal-type nonlinearities for the three noise variances and populations are presented in Tables 8 and 9, respectively. The fitness values in Tables 8 and 9 show that both the DEA and ADEA obtain their optimal fitness values for the smallest noise level. Furthermore, the performance of both methods in terms of fitness improves with increasing population size for the different noise levels. The optimal fitness values achieved by the DEA and ADEA with polynomial-type nonlinearity and the smallest noise value (0.01) are 1.03 × 10<sup>−6</sup> and 5.02 × 10<sup>−10</sup>, respectively, while the minimum fitness values accomplished by the DEA and ADEA with sigmoidal-type nonlinearity are 1.09 × 10<sup>−9</sup> and 1.39 × 10<sup>−7</sup>, respectively.

**Figure 9.** Fitness plots of DEA vs. ADEA with respect to generations and population size for polynomial-type nonlinearities in case study 2.

The fitness-based learning curves for polynomial- and sigmoidal-type nonlinearities with three noise variances, three populations and varying generation sizes are shown in Figures 11 and 12, respectively. Figures 11a–c and 12a–c represent the performance-based learning curves of the DEA; Figures 11d–f and 12d–f show the plots for the ADEA with different settings. Figure 11 shows that the convergence rate of both the DEA and ADEA with polynomial-type nonlinearity increases as the population size increases and the noise level and generation size decrease, while both methods achieve their optimal steady-state performance for the smallest noise value, a larger population and a larger generation size. A similar performance is demonstrated by both methods for sigmoidal-type nonlinearity, as shown in Figure 12.

**Figure 10.** Fitness plots of DEA vs. ADEA with respect to generations and population size for sigmoidal-type nonlinearity in case study 2.


**Table 8.** Comparison of DEA and ADEA with respect to noise variance and fixed population size for polynomial-type nonlinearity.

**Table 9.** Comparison of DEA and ADEA with respect to noise variance and fixed population size for sigmoidal-type nonlinearities in case study 2.



**Figure 11.** Fitness plots of DEA vs. ADEA with respect to noise variance and population size for polynomial-type nonlinearities in case study 2.


**Figure 12.** Fitness plots of DEA vs. ADEA with respect to noise variance and population size for sigmoidal-type nonlinearities in case study 2.

#### *4.3. Statistical Study of DEA and ADEA*

The statistical investigations of the DEA and ADEA over multiple independent runs, with different noise variances, a fixed population size and a constant generation size, are shown in Figure 13. Figure 13a–c show that, for all noise variances, the ADEA converges more consistently than the DEA, and the optimal fitness achieved by the ADEA is considerably better than that of the DEA. It is also noticeable that the performance of both the DEA and ADEA degrades only slightly as the noise level increases.


**Figure 13.** Statistical analyses plots of DEA and ADEA for Np = 100, T = 500 and multiple noise variances.

The detailed results presented for the two case studies show that the proposed evolutionary algorithms can be effectively utilized for nonlinear system identification with polynomial- and sigmoidal-type nonlinearities. The proposed evolutionary algorithms identify the unknown HOE system by optimizing the fitness function, driving the difference between the desired and estimated responses toward zero. However, the optimal fitness value need not correspond to the same set of parameters used to generate the desired response since, in practical applications, only the desired response is available, rather than the parameter set.

#### **5. Conclusions**

The following are the conclusions drawn from the extensive simulation results presented in the last section:

- The evolutionary-computing-paradigm-based DEA and ADEA are effectively used for the nonlinear system identification of Hammerstein output-error structures.
- The DEA and ADEA are accurate and convergent for both polynomial- and sigmoidal-type nonlinearities.
- The robustness of the DEA and ADEA is established for different levels of external disturbance; however, the accuracy of both algorithms decreases as the noise level increases.
- The performance of both the DEA and ADEA improves with increasing population size and generation count, but at the cost of a higher computational budget.
- Reliable inferences regarding the performance of the DEA and ADEA are drawn through statistical analyses based on 20 independent executions of the algorithms.
- The convergence speed of the ADEA is slightly slower than that of the DEA due to the adaptiveness factor in the crossover and mutation steps. In exchange, the ADEA is more accurate and statistically consistent than the DEA, at the cost of slightly higher complexity from the extra operations that introduce adaptiveness during mutation and crossover.
- The presented study is a step forward in the domain of nonlinear system identification through intelligent computing based on evolutionary algorithms.

In future, the application of the proposed methodology can be investigated for solving nonlinear supply energy systems [56], industrial reactive distillation processes [57], power supply systems [58] and delivery systems [59]. Moreover, the other recently introduced evolutionary algorithms [60] and fuzzy predictive control [61–64] can be used for efficient nonlinear system identification.

**Author Contributions:** Conceptualization, N.I.C. and Z.A.K.; methodology, H.B.T., N.I.C. and M.A.Z.R.; software, H.B.T.; validation, M.A.Z.R. and Z.A.K.; resources, H.B.T., K.M.C. and A.H.M.; writing—original draft preparation, H.B.T.; writing—review and editing, N.I.C., Z.A.K. and M.A.Z.R.; supervision, N.I.C. and Z.A.K.; project administration, K.M.C. and A.H.M.; funding acquisition, K.M.C. and A.H.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **On Spectral Decomposition of States and Gramians of Bilinear Dynamical Systems**

**Alexey Iskakov \* and Igor Yadykin**

V.A. Trapeznikov Institute of Control Sciences of RAS, 117997 Moscow, Russia; jad@ipu.ru **\*** Correspondence: isk\_alex@mail.ru or iskalexey@gmail.com

**Abstract:** The article proves that the state of a bilinear control system can be split uniquely into generalized modes corresponding to the eigenvalues of the dynamics matrix. It is also shown that the controllability and observability Gramians of a bilinear system can be divided into parts (sub-Gramians) that characterize the measure of these generalized modes and their interactions. Furthermore, the properties of sub-Gramians are investigated in relation to modal controllability and observability. We also propose an algorithm for computing the Gramians and sub-Gramians based on the element-wise computation of the solution matrix. Based on this algorithm, a novel criterion for the existence of solutions to the generalized Lyapunov equation is proposed, which, in some cases, expands the domain of guaranteed existence of a solution of the bilinear equations. Examples are provided that illustrate the application and practical use of the considered spectral decompositions.

**Keywords:** bilinear systems; eigenmode decomposition; spectral expansions; generalized Lyapunov equation; Gramians; observability; controllability; small-signal analysis; numerical algorithm

#### **Citation:** Iskakov, A.; Yadykin, I. On Spectral Decomposition of States and Gramians of Bilinear Dynamical Systems. *Mathematics* **2021**, *9*, 3288. https://doi.org/10.3390/math9243288

Academic Editor: Jaume Giné

Received: 4 November 2021 Accepted: 10 December 2021 Published: 17 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

Monitoring the state of various technical, social and biological systems using nonlinear mathematical models and modern information technology is a widespread trend in the development of modern civilization. An example is state estimation and control in modern electric power systems. Renewable energy sources and distributed generation, electric vehicles and charging networks, and the increased use of power electronics pose new challenges for the monitoring and controlling of complex oscillations in energy systems [1]. New problems require the development of new methods for the analysis of non-linear dynamic systems, including computational methods for their solutions.

Bilinear control systems represent an important class of non-linear systems: they are linear in the input and linear in the state, but not jointly linear in both. Research in the field of non-linear and "weakly non-linear" control systems described by the Volterra series dates back more than half a century. In [2], a theory of realization was developed, and structural decompositions of the Gramians of bilinear systems were investigated; furthermore, explicit representations of the Gramian of a bilinear system were obtained in the form of a Volterra series, and the conditions for its convergence were investigated. In [3,4], the multivariate Laplace transform was used to construct a solution for systems with smooth non-linearities. In [5], an iterative solution of the generalized Lyapunov equation was obtained, which was first used to analyze the state of an electric power system. It was shown that a solution to this equation exists if the linear part of the bilinear system is stable, and the input signal and non-linearity matrices are bounded in norm. In [6], these results were generalized for multiple-input and multiple-output (MIMO) dynamical systems.

Research in the field of bilinear control systems is closely related to the problem of model order reduction (MOR) by constructing an approximating model of a lower dimension. Among the methods for solving this problem, we note balanced truncation, singular decomposition, the Krylov subspace method, optimal methods for the *H*2-norm of

Gramians, and hybrid methods. For most of the methods, iterative algorithms for their implementation have been developed, and conditions for the existence and uniqueness of the solution of the corresponding generalized Lyapunov equations have been established [6–9]. In these studies, the squared *H*2-norm of Gramians of the bilinear system was used, and its spectral expansions using singular values were obtained. To estimate the error between the full and reduced models, energy functionals were introduced, and the corresponding *H*2-norm optimal algorithms for the interpolation of bilinear systems were proposed.

Modal analysis and selective modal analysis are among the main methods for analyzing the stability of electric power systems with small deviations from the steady state. These methods involve identifying dominant weakly stable modes of the power system and are widely used in combination with other linear and non-linear analysis methods [1,10]. To assess bilinear effects in power systems analysis, the technique of normal forms [11], modal series methods [12], and bilinear approximation [13] are used. These methods consider the higher-order terms of the Taylor expansion in the system approximation and use normal Poincaré forms. In [14], a method was proposed for the fast computation of normal forms, considering the interaction of dominant modes. Ref. [15] proposes a hybrid method combining selective modal analysis and Koopman mode decomposition.

In contrast to these methods, in this study, we consider the spectral decomposition, not for the instantaneous dynamics of state variables, but for the Lyapunov functions, which characterize the *L*2-norms of variables or signals in the time domain. This approach allows us to consider the non-linear effects associated with the accumulation of influence over time. For linear dynamic systems, Lyapunov functions are usually associated with the controllability and observability Gramians, which characterize the integrated energy of the input and output signals. The concept of Gramians was further generalized and interpreted for deterministic bilinear systems using energy functionals [16]. For linear systems, ref. [7] obtained singular expansions for infinite Gramians of controllability and observability based on the diagonalization of the dynamics matrix. A more general form of the spectral decomposition of Gramians into components (sub-Gramians) corresponding to the individual eigenvalues of the system or their pairwise combinations was proposed in [17,18]. In [19], the spectral expansions for the Gramians of controllability and observability were generalized to the case of bilinear continuous systems.

The purpose of this study is to develop and provide a rationale for the application of the spectral expansions of the Gramians proposed in [19] for the analysis and monitoring of bilinear systems. As the state of a bilinear system is not the sum of eigenmodes as in the linear case, a number of important theoretical questions arise. How should eigenmodes be viewed and interpreted in a bilinear system? What interpretation can be given to the spectral expansions of the Gramians in [19]? What is their connection with the expansion of the Gramians in linear systems?

#### *Main Contribution*

As spectral expansions of states of bilinear systems are closely related to the corresponding expansions of states of linear systems, in Section 2, we first consider the concepts of modal controllability and observability for a linear dynamical system. The following new results were obtained: Criteria for modal controllability and observability are proposed (Propositions 3 and 5), and a relation is established between the eigenmodes of the linear system and sub-Gramians of controllability and observability (Propositions 7 and 9).

The main theoretical results are presented in Section 3. We show that the solution of a bilinear system under any control can be split uniquely into generalized modes corresponding to the eigenvalues of the dynamics matrix (Proposition 11). The definitions of sub-Gramians are proposed in a new form, and their relationship with the definitions in [19] is clarified (Property 4). The conditions for the existence of sub-Gramians (Property 1) and their consistency with the concept of sub-Gramians in linear theory (Property 3) are established.

In [19], expressions for sub-Gramians were proposed in the form of solutions to the modal Lyapunov equations. In this study, the same quantities are derived as the sums of squared convolution kernels arising in the Volterra series expansion of the state of the bilinear system. Moreover, it is proved (in Property 4) that if these quantities exist, then for a stable matrix of dynamics, they coincide with the definition in [19]. Although the new definition of sub-Gramians essentially coincides with the definition in [19], it allows us to establish a relation between sub-Gramians and the corresponding generalized modes of a bilinear system, namely, to prove that sub-Gramians characterize some measure of the corresponding generalized eigenmodes and their pairwise scalar products (Proposition 5) under the condition that controls are small enough. From a theoretical point of view, this result provides a conceptual justification for the concept of sub-Gramians for bilinear systems. From the point of view of applications, it allows one to make energy-based estimates of individual generalized modes and their pairwise interactions in the system. Such estimates, in turn, can become the basis for stability analysis and optimal control in bilinear dynamical systems.

Section 4 proposes an iterative algorithm for computing the Gramians and sub-Gramians based on the element-wise computation of the solution matrix on an eigenvector basis. This algorithm is similar to the algorithms in [20]. However, based on the proposed algorithm, a novel criterion for the existence of solutions to the generalized Lyapunov equation is formulated (Theorem 4), which, in some cases, allows the expansion of the domain of guaranteed existence of a solution of bilinear equations. At the end of Section 4, some examples that illustrate the application and practical use of the considered spectral decompositions are presented.

#### **2. Spectral Expansions of Gramians of Linear Systems**

#### *2.1. Eigenmode Decompositions of the Dynamics of a Linear System*

In this section, we consider the eigen-decomposition of the dynamics of a linear stationary system, which will be required for further presentation. Consider a linear dynamical system of the form

$$\begin{cases} \dot{x} = A x + B u \\ y = C x \end{cases} \tag{1}$$

where $x \in \mathbb{R}^n$ is the state vector, and $y \in \mathbb{R}^l$, $u \in \mathbb{R}^m$ are the output signal and control, respectively. $A$, $B$, $C$ are real matrices. Suppose that the dynamics matrix $A$ has a simple spectrum $\sigma(A) = \{\lambda_1, \lambda_2, \dots, \lambda_n\}$.

**Proposition 1.** *A matrix A with a simple spectrum can be represented as*

$$A = \lambda\_1 R\_1 + \lambda\_2 R\_2 + \dots + \lambda\_n R\_n \tag{2}$$

*where Ri are the matrices of residues in the decomposition of the resolvent of matrix A:*

$$(Is - A)^{-1} = \frac{R\_1}{s - \lambda\_1} + \frac{R\_2}{s - \lambda\_2} + \dots + \frac{R\_n}{s - \lambda\_n} \,. \tag{3}$$

**Proof.** When all eigenvalues are distinct, the residue matrices of the resolvent of matrix *A* can be calculated using the normalized right and left eigenvectors as $R_i = u_i v_i^T$ (see [21]). Then, representation (2) follows directly from the eigen-decomposition of matrix *A*.

From the representation of the residue matrices through the eigenvectors and the orthogonality of the eigenvectors, it follows that the residue matrices $R_i$ satisfy the following orthogonality property:

$$R\_i \, R\_j = R\_i \, \delta\_{ij} \, \, \, \, \tag{4}$$

where *δij* is the Kronecker delta. Thus, representation (2) of matrix *A* is *separable* in the sense that all terms in it are orthogonal to each other in accordance with (4). If the matrices *Ri* of residues are known, then using (2)–(4), one can easily find all the powers of the matrix *A*

$$A^k = \sum\_{i} \lambda\_i^k R\_i\,, \quad k = 0, \pm 1, \pm 2, \dots \tag{5}$$

where the summation index here and in the following runs from one to *n*. Substituting (5) into the Taylor expansion of the matrix exponential of *A*, we obtain

$$e^{At} = \sum\_{i} \mathcal{R}\_i \, e^{\lambda\_i t} \,. \tag{6}$$
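The spectral identities (2), (4), and (6) are easy to check numerically. The following sketch (NumPy/SciPy assumed; the matrix `A` is an illustrative example chosen here, not one from the text) builds the residue matrices $R_i = u_i v_i^T$ from the right and left eigenvectors and verifies the three identities:

```python
import numpy as np
from scipy.linalg import expm

# Illustrative matrix with simple spectrum {-1, -3} (not from the text).
A = np.array([[-1.0, 2.0], [0.0, -3.0]])

lam, U = np.linalg.eig(A)            # columns of U: right eigenvectors u_i
V = np.linalg.inv(U)                 # rows of V: left eigenvectors, v_i^T u_j = delta_ij
R = [np.outer(U[:, i], V[i, :]) for i in range(len(lam))]   # R_i = u_i v_i^T

# (2): A = lambda_1 R_1 + ... + lambda_n R_n
assert np.allclose(A, sum(l * Ri for l, Ri in zip(lam, R)))
# (4): R_i R_j = delta_ij R_i (separability)
assert np.allclose(R[0] @ R[0], R[0]) and np.allclose(R[0] @ R[1], 0.0)
# (6): e^{At} = sum_i R_i e^{lambda_i t}
t = 0.7
assert np.allclose(expm(A * t), sum(Ri * np.exp(l * t) for l, Ri in zip(lam, R)))
```

Normalizing the left eigenvectors through `np.linalg.inv(U)` guarantees the biorthogonality $v_i^T u_j = \delta_{ij}$ used in the proof of Proposition 1.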

**Proposition 2** (Eigenmode decomposition)**.** *Solution, control, and output signal of linear system (1) are separable with respect to the eigenmodes, i.e., there is a representation*

$$\mathbf{x}(t) = \sum\_{i} \mathbf{x}\_{i}(t) \; , \quad \mathbf{u}(t) = \sum\_{i} \boldsymbol{u}\_{i}(t) \; , \quad \mathbf{y}(t) = \sum\_{i} \mathbf{y}\_{i}(t) \; , \quad \text{where}$$

$$\mathbf{x}\_{i}(t) = R\_{i}\mathbf{x}(t) = R\_{i}e^{\lambda\_{i}t}\mathbf{x}\_{0} + Be^{\lambda\_{i}t} \int\_{t\_{0}}^{t} e^{-\lambda\_{i}\tau} \boldsymbol{u}\_{i}(\tau)d\tau \; , \tag{7}$$

$$u\_i(t) = B^{\#} R\_i B\, u(t)\,, \quad y\_i(t) = C x\_i(t) = C R\_i x(t)\,,$$

*where $x_0 = x(t_0)$ is the initial position of the system, and $B^{\#}$ denotes the Moore–Penrose inverse. The system (1) splits into separate subsystems*

$$\begin{cases} \dot{x}\_i(t) = \lambda\_i\, x\_i(t) + B\, u\_i(t) \\ y\_i(t) = C\, x\_i(t) \end{cases}, \quad i = 1, \dots, n. \tag{8}$$

Recall that the Moore–Penrose inverse matrix *B*# exists and is unique for any complex or real matrix *B* and it is defined by four conditions: (i) *BB*#*B* = *B*, (ii) *B*#*BB*# = *B*#, (iii) *BB*# is Hermitian, and (iv) *B*#*B* is Hermitian.

**Proof.** The expression (7) for $x_i(t) = R_i x(t)$ is obtained by multiplying the solution to (1):

$$x(t) = e^{At}x\_0 + e^{At} \int\_{t\_0}^t e^{-A\tau} B\, u(\tau)\, d\tau$$

on the left by $R_i$, taking into account property (4) and the identities $R_i e^{At} = R_i e^{\lambda_i t}$, $e^{-A\tau} = \sum_j R_j e^{-\lambda_j \tau}$, and $B u_i(\tau) = R_i B u(\tau)$. Differentiating (7), we obtain (8).

The expression (7) for $x_i(t) = R_i x(t)$ determines the dynamics of the *eigenmode* corresponding to the eigenvalue $\lambda_i$ in system (1). The corresponding mode in the output signal is determined by the expression $y_i(t) = C x_i(t)$.

#### *2.2. Modal Observability and Controllability of a Linear System*

In this section, by analogy with the classical definitions of an observable and controllable linear system, we introduce the corresponding concepts for individual eigenmodes. We also establish simple criteria for modal controllability and observability for a linear stationary system (1).

**Definition 1.** *The mode corresponding to the eigenvalue $\lambda_i$ is said to be observable in the linear system (1) at the moment $t_0$ if $y_i(t, t_0, x_0, u = 0) \equiv 0$ for $t \ge t_0$ holds if, and only if, $x_i(t_0) = 0$.*

According to (7), the observability of a mode in a stationary system (1) is entirely determined by the matrices *Ri* and *C*. Therefore, we can also discuss the *modal observability of a pair* {*C*, *Ri*}. For stationary systems, modal observability can be verified using the following simple criterion.

**Proposition 3.** *The mode corresponding to $\lambda_i$ in the linear system (1) is observable if, and only if, $CR_i \neq 0$.*

**Proof.** If the stationary pair {*C*, *R<sub>i</sub>*} is modally observable, then $CR_i x_0 \neq 0$ holds for any $R_i x_0 \neq 0$; that is, $CR_i x_0 \neq 0$ is fulfilled for some $x_0 \neq 0$, and therefore $CR_i \neq 0$. Conversely, if $CR_i \neq 0$, then there is some $x_0 \neq 0$ such that $CR_i x_0 \neq 0$. Let us now choose an arbitrary $R_i \tilde{x}_0 \neq 0$. It is easy to show that the vectors $R_i x_0$ and $R_i \tilde{x}_0$ are both eigenvectors of matrix *A* corresponding to the eigenvalue $\lambda_i$. Because, by assumption, the spectrum $\sigma(A)$ is simple, these vectors are proportional, that is, $R_i \tilde{x}_0 = \alpha R_i x_0$ with $\alpha \in \mathbb{C}$, $\alpha \neq 0$. Therefore, $CR_i \tilde{x}_0 = \alpha\, CR_i x_0 \neq 0$, that is, the pair {*C*, *R<sub>i</sub>*} is modally observable.

One can check the observability of the system by checking the observability of its individual modes.

**Proposition 4.** *The stationary system (1) is observable (identifiable) if, and only if, each mode is observable (identifiable).*

**Proof.** It follows from the definitions and equivalence of the following statements

$$\forall i: \; y\_i(t, t\_0, x\_0, u = 0) \equiv 0 \;\text{ at }\; t \ge t\_0 \iff y(t, t\_0, x\_0, u = 0) \equiv 0 \;\text{ at }\; t \ge t\_0\,;$$

$$\forall i: \; x\_i(t\_0) = 0 \iff x(t\_0) = 0\,.$$

However, individual modes can be observable when the dynamical system (1) as a whole is unobservable.

Similarly, one can consider the concept of modal controllability and obtain a criterion for modal controllability.

**Definition 2.** *The mode corresponding to the eigenvalue $\lambda_i$ in the linear system (1) is controllable if, for each event $(t_0, x_0 = x_i(t_0))$, there is a control $u(t)$ that brings the system to the zero state in a finite time.*

For stationary systems, modal controllability can be verified using the following simple criterion.

**Proposition 5.** *The mode corresponding to $\lambda_i$ in the linear system (1) is controllable if, and only if, $R_i B \neq 0$.*

**Proof.** If $R_i B = 0$, then it follows from (7) that the mode $x_i(t)$ is not controllable. If $R_i B \neq 0$, then $u(t)$ can always be chosen such that

$$\int\_{t\_0}^{t\_0+T} e^{-\lambda\_j \tau} u(\tau)\, d\tau = \begin{cases} u\_i^0, & j=i\\ 0, & j \neq i \end{cases}, \qquad R\_i x\_0 = -R\_i B u\_i^0, \qquad j=1,\dots,n\,.$$

Then, in a finite time *T*, the control *u*(*t*) brings the system from state *xi*(*t*0) = *Rix*<sup>0</sup> to the zero state, i.e., the eigen-mode corresponding to *λ<sup>i</sup>* is controllable.

According to Proposition 5, the controllability of a mode in a stationary system is entirely determined by the matrices *Ri* and *B*. Thus, we can discuss *the modal controllability of the stationary pair* {*Ri*, *B*}. The controllability of the system can be verified by checking the controllability of its individual modes.
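Propositions 3 and 5 reduce modal observability and controllability to the matrix products $CR_i$ and $R_i B$. A minimal numerical illustration (hypothetical $2 \times 2$ system, NumPy assumed) in which every mode is observable but only one mode is controllable, so that the system as a whole is uncontrollable:

```python
import numpy as np

# Hypothetical system: diagonal dynamics, input only in the first channel.
A = np.diag([-1.0, -2.0])
B = np.array([[1.0], [0.0]])
C = np.array([[1.0, 1.0]])

lam, U = np.linalg.eig(A)
V = np.linalg.inv(U)
R = [np.outer(U[:, i], V[i, :]) for i in range(len(lam))]

observable   = [not np.allclose(C @ Ri, 0) for Ri in R]   # Proposition 3: CR_i != 0
controllable = [not np.allclose(Ri @ B, 0) for Ri in R]   # Proposition 5: R_i B != 0

# Propositions 4 and 6: system-level properties follow from the modal ones.
assert all(observable)            # every mode observable -> system observable
assert not all(controllable)      # the mode with lambda = -2 never sees the input
# Cross-check with the classical Kalman controllability test:
assert np.linalg.matrix_rank(np.hstack([B, A @ B])) == 1
```

The Kalman rank test agrees with the modal criterion: the controllability matrix has rank one because exactly one mode satisfies $R_i B \neq 0$.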

**Proposition 6.** *A stationary linear system (1) is controllable if, and only if, each mode is controllable.*

**Proof.** If the system (1) is controllable, then each of its modes, by definition, is also controllable. Conversely, consider a system in which each mode is controllable, and suppose that at the moment $t_0$ it is in the state $x_0 \neq 0$. Let us choose modal control in the form

$$u(t) = \sum\_{i} u\_i(t) \; , \; u\_i(t) = \begin{cases} u\_i^0 f\_i(t) \; , \; t \in [t\_0, t\_0 + T] \\ 0 \; , \; t \notin [t\_0, t\_0 + T] \end{cases} \tag{9}$$

where the set of scalar functions *f*1, *f*2, ··· , *fn* satisfies the condition

$$\forall i,k=1,\cdots,n:\ \int\_{t\_0}^{t\_0+T} e^{-\lambda\_k t} f\_i(t)dt = \delta\_{ik} = \begin{cases} 0,\ i \neq k\\ 1,\ i=k \end{cases}.\tag{10}$$

As functions *fi*, for example, one can always choose piecewise constant functions on *n* sections of the interval *t* ∈ [*t*0, *t*<sup>0</sup> + *T*]. Substituting the control *u*(*t*) from (9) and (10) into the solution to (1),

$$x(t) = e^{At}x\_0 + e^{At} \int\_{t\_0}^t e^{-A\tau} B u(\tau)\, d\tau\,,$$

we obtain

$$x(t) = \sum\_{i} x\_{i}(t) = \sum\_{i} (R\_{i}x\_{0} + R\_{i}Bu\_{i}^{0})\,e^{\lambda\_{i}t}, \quad t \ge t\_{0} + T. \tag{11}$$

Because all eigenvalues $\lambda_i$ are simple, the vectors $R_i x_0$ and $R_i B u_i^0$ coincide up to a scalar factor with the corresponding right eigenvector of the system. In addition, according to Proposition 5, $R_i B \neq 0$ for all $i$. Therefore, it is always possible to choose vectors $u_i^0$ such that $x(t) \equiv 0$, $t \ge t_0 + T$ in (11). Thus, system (1) is controllable.

The choice of the control *u*(*t*) in the form (9–10) also proves the following property:

**Corollary 1.** *If an individual mode of system (1) is controllable, then there is a control ui*(*t*) *that allows one to change this eigenmode arbitrarily on any finite interval without changing other eigenmodes of the solution.*

Note that individual modes can be controllable even when the dynamical system as a whole is uncontrollable.

#### *2.3. Spectral Decompositions of Gramians of a Linear System*

In this section, we recall the basic facts about the observability and controllability Gramians of the linear system (1) and their spectral expansions, and also offer a meaningful interpretation of the corresponding spectral components in these expansions.

*The Gramians of controllability and observability* of a stable linear system (1) are, respectively, the quantities

$$P\_{\mathbb{C}} = \int\_{0}^{\infty} e^{At} B B^{T} e^{A^{T}t} dt, \quad P\_{\mathbb{O}} = \int\_{0}^{\infty} e^{A^{T}t} \mathbb{C}^{T} \mathbb{C} e^{At} dt \tag{12}$$

which are also solutions of the corresponding Lyapunov equations

$$AP\_{\mathbb{C}} + P\_{\mathbb{C}}A^T = -BB^T, \quad A^T P\_{\mathbb{O}} + P\_{\mathbb{O}}A = -\mathbb{C}^T \mathbb{C}. \tag{13}$$

If $x_0 = x(0)$ is the initial state of system (1), then the integral energy of the output signal under zero control is determined by the observability Gramian

$$\int\_{0}^{\infty} y^{T}(t)\, y(t)\, dt = x\_{0}^{T} P\_{\mathbb{O}}\, x\_{0}\,. \tag{14}$$

If the state $x_0$ is reachable, then the minimum energy for bringing the system from the zero state to $x_0$ and the corresponding optimal control $\hat{u}(t)$ are determined by the Moore–Penrose inverse of the controllability Gramian

$$\inf\_{x(-\infty)=0} \int\_{-\infty}^{0} \hat{u}^{T}(t)\,\hat{u}(t)\,dt = x\_{0}^{T} P\_{\mathbb{C}}^{\#} x\_{0}, \quad \hat{u}(t) = B^{T} e^{-A^{T}t} P\_{\mathbb{C}}^{\#} x\_{0}, \quad -\infty < t < 0,\tag{15}$$

where $P_{\mathbb{C}}^{\#}$ is the Moore–Penrose inverse of the controllability Gramian.
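The energy interpretation (14) can be verified directly: solve the Lyapunov equation (13) for the observability Gramian and compare the quadratic form with a quadrature of the output energy. A sketch for an illustrative stable system (NumPy/SciPy assumed; the matrices are examples chosen here):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, expm

# Illustrative stable system (not from the text).
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.0]])

# Lyapunov equations (13); scipy solves A X + X A^H = Q.
P_C = solve_continuous_lyapunov(A, -B @ B.T)      # controllability Gramian
P_O = solve_continuous_lyapunov(A.T, -C.T @ C)    # observability Gramian

# (14): output energy under zero control, checked by trapezoidal quadrature.
x0 = np.array([1.0, -1.0])
ts = np.linspace(0.0, 40.0, 4001)
y2 = np.array([((C @ expm(A * t) @ x0) ** 2).item() for t in ts])
energy = float(np.sum((y2[:-1] + y2[1:]) / 2) * (ts[1] - ts[0]))
assert np.isclose(energy, x0 @ P_O @ x0, rtol=1e-3)
```

The quadrature agrees with $x_0^T P_{\mathbb{O}} x_0$ up to discretization error, confirming that the Gramian summarizes the infinite-horizon output energy.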

In [17], spectral decompositions of the Gramians (12) were proposed. In [18], they were extended to the broader class of solutions of matrix Krein equations. The eigenterms of the expansions are represented using the residues of the resolvent of the matrix *A*. Let us formulate this result for Equation (13) in the following form:

**Theorem 1** ([18])**.** *If $\lambda_i^* + \lambda_j \neq 0$ for all $\lambda_i, \lambda_j \in \sigma(A)$, then for any matrices B and C, there is a unique solution of the Lyapunov Equation (13), and it is presented in the form*

$$P = \sum\_{i=1}^{n} \tilde{P}\_i = \sum\_{i,j=1}^{n} P\_{ij}\,, \quad \tilde{P}\_i = \sum\_{j=1}^{n} P\_{ij}\,, \tag{16}$$

*where the spectral components for the controllability and observability Gramians, respectively, are given by*

$$\tilde{P}\_{i}^{C} = -\left\{ R\_{i} B B^{T} (\lambda\_{i} I + A^{\*})^{-1} \right\}\_{Herm}, \quad P\_{ij}^{C} = \left\{ \frac{-1}{\lambda\_{i} + \lambda\_{j}^{\*}}\, R\_{i} B B^{T} R\_{j}^{\*} \right\}\_{Herm},\tag{17}$$

$$\tilde{P}\_i^{O} = -\left\{ R\_i^\* C^T C (\lambda\_i^\* I + A)^{-1} \right\}\_{Herm}, \quad P\_{ij}^{O} = \left\{ \frac{-1}{\lambda\_i^\* + \lambda\_j}\, R\_i^\* C^T C R\_j \right\}\_{Herm},\tag{18}$$

*where* {·}*Herm denotes the Hermitian part of the matrix, and Ri and Rj are the matrix residues (3) that correspond to the eigenvalues λ<sup>i</sup> and λj.*

The eigenterms $\tilde{P}_i$ and $P_{ij}$ in expressions (16) are called *the sub-Gramians and pairwise sub-Gramians*, respectively, in [17]. They characterize the contribution of the corresponding eigenmodes or their pairs to the energy variation of the system, determined by the corresponding Gramian over an infinite time interval. The following statement holds:

**Proposition 7** (Interpretation of observability sub-Gramians)**.** *For system (1) with zero control, the value $x_0^T \tilde{P}_i^{O} x_0$ is the cross-correlation between the output signal $y(t)$ and its i-th modal component at a lag of zero. The value $x_0^T P_{ij}^{O} x_0$ is the cross-correlation between the i-th and j-th modal components of the output signal at a lag of zero.*

**Proof.** Considering that $y(t) = C e^{At} x_0$ and $y_i(t) = C R_i e^{\lambda_i t} x_0$, we obtain

$$\frac{1}{2} \int\_0^\infty (y\_i^\* y + y^\* y\_i)\, dt = \frac{1}{2} x\_0^T \int\_0^\infty \left( e^{\lambda\_i^\* t} R\_i^\* C^T C e^{At} + e^{A^T t} C^T C R\_i e^{\lambda\_i t} \right) dt \; x\_0 = x\_0^T \tilde{P}\_i^{O} x\_0\,.$$

Similarly, we directly verify that $\frac{1}{2} \int_0^\infty (y_i^* y_j + y_j^* y_i)\, dt = x_0^T P_{ij}^{O} x_0$.

Just as the Lyapunov Equations (13) hold for the Gramians, the corresponding *modal Lyapunov equations* hold for the sub-Gramians.

**Proposition 8.** *Under the conditions of Theorem 1, the observability sub-Gramians $\tilde{P}_i^{O}$ and $P_{ij}^{O}$ in expansions (16) and (18) satisfy the following modal Lyapunov equations:*

$$A^T \bar{P}\_i^O + \bar{P}\_i^O A = -\frac{1}{2} \left( R\_i^\* \mathcal{C}^T \mathcal{C} + \mathcal{C}^T \mathcal{C} R\_i \right),\tag{19}$$

$$A^T P^O\_{\rm ij} + P^O\_{\rm ij} A = -\frac{1}{2} \left( R^\*\_i \mathcal{C}^T \mathcal{C} R\_j + R^\*\_j \mathcal{C}^T \mathcal{C} R\_i \right). \tag{20}$$

**Proof.** This is verified by the direct substitution of (18) into (19) and (20).
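Both the expansion (16) with the terms (18) and the modal Lyapunov equation (20) can be checked numerically. In the sketch below (NumPy/SciPy assumed; an illustrative matrix with a simple real spectrum is chosen so that conjugate eigenvalue pairs do not arise, and $R_i^*$ is interpreted as the Hermitian conjugate of $R_i$):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Illustrative stable matrix with simple real spectrum (not from the text).
A = np.array([[-1.0, 1.0], [0.0, -3.0]])
C = np.array([[1.0, 1.0]])
n = A.shape[0]

lam, U = np.linalg.eig(A)
V = np.linalg.inv(U)
R = [np.outer(U[:, i], V[i, :]) for i in range(n)]
herm = lambda M: (M + M.conj().T) / 2            # {.}_Herm

def P_ij(i, j):   # pairwise observability sub-Gramian, Eq. (18)
    return herm(-1.0 / (lam[i].conjugate() + lam[j]) * R[i].conj().T @ C.T @ C @ R[j])

# Expansion (16): pairwise sub-Gramians sum to the observability Gramian.
P_O = solve_continuous_lyapunov(A.T, -C.T @ C)
assert np.allclose(sum(P_ij(i, j) for i in range(n) for j in range(n)), P_O)

# Modal Lyapunov equation (20) for one pairwise sub-Gramian.
i, j = 0, 1
lhs = A.T @ P_ij(i, j) + P_ij(i, j) @ A
rhs = -0.5 * (R[i].conj().T @ C.T @ C @ R[j] + R[j].conj().T @ C.T @ C @ R[i])
assert np.allclose(lhs, rhs)
```

The residual of (20) vanishes to machine precision, which is exactly the "direct substitution" argument of Proposition 8 carried out numerically.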

Similar statements are proved for controllability sub-Gramians.

**Proposition 9** (Interpretation of controllability sub-Gramians)**.** *For system (1) and a reachable state $x_0$, consider problem (15) of finding the required control $\hat{u}(t)$ with the minimum energy. Then, the value $x_0^T (P_{\mathbb{C}}^{\#})^T \tilde{P}_i^{C} P_{\mathbb{C}}^{\#} x_0$ is the cross-correlation between the optimal control $\hat{u}(t)$ and its i-th modal component at a lag of zero. The value $x_0^T (P_{\mathbb{C}}^{\#})^T P_{ij}^{C} P_{\mathbb{C}}^{\#} x_0$ is the cross-correlation between the i-th and j-th modal components of the optimal control at a lag of zero.*

**Proposition 10.** *Under the conditions of Theorem 1, the controllability sub-Gramians $\tilde{P}_i^{C}$ and $P_{ij}^{C}$ in (16) and (17) satisfy the following modal Lyapunov equations:*

$$\begin{aligned} A\bar{P}\_i^{\mathbb{C}} + \bar{P}\_i^{\mathbb{C}} A^T &= -\frac{1}{2} \left( R\_i B B^T + B B^T R\_i^\* \right), \\ A P\_{i\bar{j}}^{\mathbb{C}} + P\_{i\bar{j}}^{\mathbb{C}} A^T &= -\frac{1}{2} \left( R\_i B B^T R\_{\bar{j}}^\* + R\_{\bar{j}} B B^T R\_{\bar{i}}^\* \right). \end{aligned}$$

#### **3. Spectral Decompositions of Gramians of a Bilinear Control System**

In this section, we extend the results obtained for linear systems to the case of bilinear control systems. In particular, we introduce the concept of *a generalized eigenmode* and prove that the state of the bilinear system can be uniquely split into generalized modes corresponding to the eigenvalues of the dynamics matrix. Further, we recall some known facts about the controllability and observability Gramians of bilinear systems and propose their *spectral decomposition* into parts (sub-Gramians) corresponding to the spectrum of the dynamics matrix. We prove that individual sub-Gramians characterize some measure of the corresponding generalized eigenmodes or their pairwise scalar products.

#### *3.1. Partitioning the Solution into Generalized Modes of the Matrix A*

Consider a bilinear control system of the form [5,6]

$$\dot{x}(t) = Ax(t) + \sum\_{j=1}^{m} N\_j x(t) u\_j(t) + Bu(t), \quad y(t) = C x(t)\,, \tag{21}$$

where $x(t) \in \mathbb{R}^n$, $u(t) \in \mathbb{R}^m$, $y(t) \in \mathbb{R}^l$ are the state, input, and output vectors, respectively, and $A$, $N_1, \cdots, N_m$, $B$, and $C$ are real matrices. Assume that the initial state is $x(0) = 0$ and the system input satisfies $u(t) = 0$, $t < 0$. Then, the solution of (21) can be considered as a solution to the following recursive system of linear equations:

$$\dot{\mathbf{x}}^{(1)}(t) = A\mathbf{x}^{(1)}(t) + B\,\boldsymbol{u}(t)\,,$$

$$\dot{\mathbf{x}}^{(k)}(t) = A\mathbf{x}^{(k)}(t) + \sum\_{j=1}^{m} N\_j \mathbf{x}^{(k-1)}(t)\boldsymbol{u}\_j(t) + B\,\boldsymbol{u}(t)\,\,,\,\,k = 2, 3, \cdots \tag{22}$$

Solving the systems (22) sequentially, we obtain

$$x^{(1)}(t) = \int\_{0}^{\infty} e^{A\tau\_{1}} B\, u(t - \tau\_{1})\, d\tau\_{1}\,,$$

$$x^{(2)}(t) = x^{(1)}(t) + \sum\_{j\_2=1}^{m} \int\_{0}^{\infty}\!\int\_{0}^{\infty} e^{A\tau\_{2}} N\_{j\_{2}} e^{A\tau\_{1}} B\, u(t - \tau\_{2} - \tau\_{1})\, u\_{j\_{2}}(t - \tau\_{2})\, d\tau\_{1} d\tau\_{2}\,, \; \cdots$$

$$x^{(k)}(t) = x^{(k-1)}(t) + \sum\_{j\_{2}, \dots, j\_{k}=1}^{m} \int\_{0}^{\infty} \cdots \int\_{0}^{\infty} e^{A\tau\_{k}} N\_{j\_{k}} \cdots e^{A\tau\_{2}} N\_{j\_{2}} e^{A\tau\_{1}} B\, u(t - \tau\_{1} - \dots - \tau\_{k})\, u\_{j\_{2}}(t - \tau\_{2} - \dots - \tau\_{k}) \cdots u\_{j\_{k}}(t - \tau\_{k})\, d\tau\_{1} d\tau\_{2} \cdots d\tau\_{k}\,, \quad k = 3, \dots \tag{23}$$

It was proved in [22] that if the sequence *x*(*k*)(*t*) in (23) converges (that is, the corresponding *Volterra series* of corrections converges), then it converges to the solution of (21), that is,

$$x(t) = \lim\_{k \to \infty} x^{(k)}(t)\,. \tag{24}$$

It was proved in [23] that this sequence always converges if the matrix *A* is stable, the input control is bounded, and all the matrices $N_j$ are sufficiently small in norm. In what follows, we assume that the corresponding Volterra series converges and the limit (24) exists.

From (23), it follows that the solution to the bilinear system (21) is constructed as the sum of (i) the solution $x^{(1)}(t)$ of its linear part, (ii) the bilinear correction $x^{(2)}(t) - x^{(1)}(t)$ generated by the linear part, (iii) the bilinear correction $x^{(3)}(t) - x^{(2)}(t)$ generated by the first correction, etc. Moreover, all non-linear corrections of the form $x^{(k)}(t) - x^{(k-1)}(t)$, $k = 2, 3, \cdots$ are integral transformations of the linear part $x^{(1)}(t)$ of order $k$ with respect to the control, that is,

$$x(t) = x^{(1)}(t) + \sum\_{k=1}^{\infty} F^k x^{(1)}(t)\,, \quad \text{where}$$

$$F \mathbf{x}(t) = \sum\_{j=1}^{m} \int\_{0}^{\infty} e^{A\tau} \mathbf{N}\_j \, \mathbf{x}(t-\tau) \, u\_j(t-\tau) \, d\tau \, . \tag{25}$$

Moreover, according to our assumption, the integral operator *F* in (25) is a contraction.

The solution $x^{(1)}(t)$ of the linear part of the system (21) can be divided into eigenmodes of the matrix *A*, in accordance with the definitions (7) in Section 2.1:

$$\mathbf{x}^{(1)}(t) = \sum\_{i} \mathbf{x}\_{i}^{(1)}(t) \quad \text{where}$$

$$\mathbf{x}\_{i}^{(1)}(t) = \mathbf{R}\_{i} \mathbf{x}^{(1)}(t) = \int\_{0}^{\infty} e^{\lambda\_{i}\tau} \mathbf{R}\_{i} \, \mathbf{B} \, \boldsymbol{u}(t-\tau) \, d\tau \, , \tag{26}$$

where *Ri* is the residue matrix in (3) corresponding to *λi*.

**Definition 3.** *The generalized mode of the bilinear system (21) corresponding to the eigenvalue $\lambda_i$ of the matrix A is the sum of the mode $x_i^{(1)}(t)$ of the linear part of the system and the non-linear corrections generated by this mode, obtained in the course of solving the recursive system (22), i.e.,*

$$\mathbf{x}\_i(t) = \mathbf{x}\_i^{(1)}(t) + \sum\_{k=1}^{\infty} F^k \mathbf{x}\_i^{(1)}(t) \, , \tag{27}$$

*where the integral operator F is defined in (25) and is assumed to be a contraction, and $x_i^{(1)}$ is defined in (26).*

The significance of Definition 3 is justified by the following statement.

**Proposition 11.** *Let the initial state of the bilinear system (21) be $x(0) = 0$, let the control satisfy $u(t) = 0$, $t < 0$, and let the Volterra series in (23) converge. Then, the solution of (21) is uniquely split into generalized modes (27), corresponding to the eigenvalues of matrix A:*

$$x(t) = \sum\_{i} x\_{i}(t)\,. \tag{28}$$

**Proof.** By constructing the sequence in (23),

$$x^{(k)}(t) = x^{(1)}(t) + \sum\_{j=1}^{k-1} F^j x^{(1)}(t)\,.$$

According to Proposition 2, the solution of the linear part *x*(1)(*t*) is uniquely decomposed into eigenmodes

$$\mathbf{x}^{(1)}(t) = \sum\_{i=1}^{n} \mathbf{x}\_i^{(1)}(t).$$

Since the integral operator *F* is linear, we obtain

$$\mathbf{x}^{(k)}(t) = \sum\_{i=1}^{n} \mathbf{x}\_i^{(1)}(t) + \sum\_{j=1}^{k-1} F^j \left( \sum\_{i=1}^{n} \mathbf{x}\_i^{(1)}(t) \right) = \sum\_{i=1}^{n} \left( \mathbf{x}\_i^{(1)}(t) + \sum\_{j=1}^{k-1} F^j \mathbf{x}\_i^{(1)}(t) \right) = \sum\_{i=1}^{n} \mathbf{x}\_i^{(k)}(t)$$

If the Volterra series $\sum_k (x^{(k)}(t) - x^{(k-1)}(t))$ in (23) converges, then according to [22], the sequence $\{x^{(k)}(t)\}$ converges to the solution of (21). Due to the convergence of the sequence $\{x^{(k)}(t)\}$, the sequences $\{x_i^{(k)}(t)\}$ for each $i$ also converge to $x_i(t)$ in (27), since they are obtained by multiplying $\{x^{(k)}(t)\}$ by the constant matrices $R_i$. Therefore, taking the limit $k \to \infty$ in the previous equation, we obtain the assertion of the proposition.
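The splitting (28) can be reproduced numerically by discretizing the recursion (22): integrate the linear part, project it onto each mode with $R_i$, and accumulate the corrections $F^k x_i^{(1)}$ as a Neumann series. A sketch on a uniform Euler grid (NumPy assumed; the system, control, and grid are illustrative choices, and the control is kept small so the discrete analogue of $F$ is a contraction):

```python
import numpy as np

# Illustrative bilinear system (not from the text): one bilinear term, scalar control.
A  = np.array([[-1.0, 0.5], [0.0, -2.0]])
N1 = np.array([[0.0, 0.3], [0.1, 0.0]])
B  = np.array([1.0, 0.5])
h, steps = 0.002, 2000                       # Euler grid on [0, 4]
u = 0.3 * np.cos(h * np.arange(steps))       # small control

lam, U = np.linalg.eig(A)
R = [np.outer(U[:, i], np.linalg.inv(U)[i, :]) for i in range(2)]

def euler(forcing):
    """Explicit Euler for z' = A z + forcing(t), z(0) = 0; rows of the result are z(t)."""
    z, traj = np.zeros(2), []
    for t in range(steps):
        traj.append(z)
        z = z + h * (A @ z + forcing[t])
    return np.array(traj)

def F(z):   # discrete analogue of the integral operator F in (25)
    return euler((z @ N1.T) * u[:, None])

# Direct Euler solution of the bilinear system (21).
x, traj = np.zeros(2), []
for t in range(steps):
    traj.append(x)
    x = x + h * (A @ x + N1 @ x * u[t] + B * u[t])
x_full = np.array(traj)

# Generalized modes (27): project the linear part, then sum the Neumann series in F.
x1 = euler(np.outer(u, B))                   # linear part x^(1)
modes = []
for Ri in R:
    z = x1 @ Ri.T                            # x_i^(1) = R_i x^(1)
    xi = z.copy()
    for _ in range(15):
        z = F(z)
        xi = xi + z
    modes.append(xi)

assert np.allclose(sum(modes), x_full, atol=1e-9)   # decomposition (28)
```

Because the same Euler scheme is used for the direct solution and for the recursion, the generalized modes sum to the full trajectory up to the truncation of the Neumann series.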

#### *3.2. Spectral Decompositions of Gramians*

The concept of controllability and observability Gramians for a bilinear system was studied in [2]. The controllability Gramian of system (21) is defined as

$$P\_{\mathbb{C}} = \sum\_{k=1}^{\infty} P^{(k)} = \sum\_{k=1}^{\infty} \int\_{0}^{\infty} \cdots \int\_{0}^{\infty} G\_k B B^T G\_k^T d\tau\_1 \cdots d\tau\_k, \quad \text{where}$$

$$G\_1 = e^{A\tau\_1}, \quad G\_k(\tau\_1 \cdots \tau\_k) = e^{A\tau\_k} \left[ N\_1 G\_{k-1} \cdots N\_m G\_{k-1} \right], \; k = 2, 3, \cdots \tag{29}$$

It characterizes the *input-to-state energy* of the system [16]. Additionally, the following statements hold:

**Theorem 2** ([6])**.** *The controllability (observability) Gramian exists if (i) A is stable, such that $\|e^{At}\| \le \beta e^{-\alpha t}$, $t \ge 0$, $\alpha, \beta > 0$; and (ii) $\|\sum_{\gamma=1}^{m} N_\gamma N_\gamma^T\| < 2\alpha/\beta^2$.*

**Theorem 3.** *If matrix A is stable and the controllability Gramian exists, then (i) system (21) is controllable if, and only if, $P_{\mathbb{C}} > 0$ [2], and (ii) the Gramian $P_{\mathbb{C}}$ satisfies the generalized Lyapunov equation [5]*

$$AP\_{\mathbb{C}} + P\_{\mathbb{C}}A^T + \sum\_{\gamma=1}^{m} N\_{\gamma} P\_{\mathbb{C}} N\_{\gamma}^T = -BB^T. \tag{30}$$

A study in [6] (Proposition 1) also showed that if the matrix *A* is stable, then the terms $P^{(k)}$ of the series in (29) can be found as successive solutions of the following recursive system of linear Lyapunov equations:

$$AP^{(1)} + P^{(1)}A^T + BB^T = 0$$

$$AP^{(k)} + P^{(k)}A^T + \sum\_{\gamma=1}^{m} N\_{\gamma}P^{(k-1)}N\_{\gamma}^T = 0, \; k = 2, 3, \cdots \tag{31}$$
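For small problems, the recursion can be compared against a direct solution of the generalized Lyapunov equation (30) obtained by vectorization, $(I \otimes A + A \otimes I + \sum_\gamma N_\gamma \otimes N_\gamma)\,\mathrm{vec}(P) = -\mathrm{vec}(BB^T)$. A sketch (NumPy/SciPy assumed; the matrices are illustrative, with $N$ small enough for Theorem 2 to apply):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Illustrative data (not from the text); N is kept small in norm.
A = np.array([[-2.0, 1.0], [0.0, -3.0]])
N = [np.array([[0.1, 0.0], [0.0, 0.2]])]
B = np.array([[1.0], [1.0]])
n = A.shape[0]

# Direct route: vectorized generalized Lyapunov equation (30).
L = np.kron(np.eye(n), A) + np.kron(A, np.eye(n)) + sum(np.kron(Ng, Ng) for Ng in N)
P_direct = np.linalg.solve(L, -(B @ B.T).reshape(-1, order="F")).reshape(n, n, order="F")

# Recursive route: P^(1) from the linear Lyapunov equation, then each
# correction P^(k) driven by the previous term through the N-matrices.
Pk = solve_continuous_lyapunov(A, -B @ B.T)
P_C = Pk.copy()
for _ in range(30):
    Pk = solve_continuous_lyapunov(A, -sum(Ng @ Pk @ Ng.T for Ng in N))
    P_C = P_C + Pk

assert np.allclose(P_C, P_direct)
```

The geometric decay of the corrections mirrors the contraction argument behind Theorem 2; a handful of iterations already matches the direct solution to machine precision.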

The following useful addition can be made to this statement.

**Proposition 12.** *The controllability Gramian (29) of the bilinear system (21) is the sum of the controllability Gramian $P^{(1)}$ of the linear part and the integrals of the Gram matrices formed by the convolution kernels that arise when calculating the non-linear corrections $x^{(k)}(t) - x^{(k-1)}(t)$, $k = 2, 3, \cdots$ in the recursive solution to system (22).*

**Proof.** According to (31), $P^{(1)}$ in (29) is the controllability Gramian of the linear part of the system (21), and the other terms $P^{(k)}$ are calculated in (29) as integrals of the Gram matrices:

$$P^{(k)} = \int\_0^\infty \cdots \int\_0^\infty G\_k B B^T G\_k^T\, d\tau\_1 \cdots d\tau\_k\,,$$

and it can be verified that these Gram matrices

$$G\_k B B^T G\_k^T = \sum\_{j\_k=1}^m e^{A \tau\_k} N\_{j\_k} G\_{k-1} B B^T G\_{k-1}^T N\_{j\_k}^T e^{A^T \tau\_k} = \dots = \sum\_{j\_2, \dots, j\_k=1}^m e^{A \tau\_k} N\_{j\_k} \cdots e^{A \tau\_2} N\_{j\_2} e^{A \tau\_1} B B^T e^{A^T \tau\_1} N\_{j\_2}^T e^{A^T \tau\_2} \cdots N\_{j\_k}^T e^{A^T \tau\_k}$$

are formed by the convolution kernels arising when calculating the corrections $x^{(k)}(t) - x^{(k-1)}(t)$, $k = 2, 3, \cdots$ in (23).

**Definition 4.** *Controllability sub-Gramians and pairwise sub-Gramians of the bilinear system (21) are, respectively, the matrices*

$$\tilde{P}\_i^{C} = \sum\_{k=1}^{\infty} \tilde{P}\_i^{(k)} = \frac{1}{2} \sum\_{k=1}^{\infty} \int\_0^{\infty} \cdots \int\_0^{\infty} G\_k \left( R\_i B B^T + B B^T R\_i^\* \right) G\_k^T\, d\tau\_1 \cdots d\tau\_k\,, \tag{32}$$

$$P\_{ij}^{C} = \sum\_{k=1}^{\infty} P\_{ij}^{(k)} = \frac{1}{2} \sum\_{k=1}^{\infty} \int\_{0}^{\infty} \cdots \int\_{0}^{\infty} G\_k\left(R\_i B B^T R\_j^\* + R\_j B B^T R\_i^\*\right) G\_k^T\, d\tau\_1 \cdots d\tau\_k\,, \tag{33}$$

*where Ri and Rj are the residue matrices in (3) corresponding to the eigenvalues λ<sup>i</sup> and λ<sup>j</sup> of matrix A, and the matrices Gk are defined in (29).*

We now establish some basic properties of sub-Gramians (32) and (33) in Definition 4.

**Property 1.** *Under the conditions of Theorem 2, sub-Gramians (32) and (33) exist.*

**Proof.** Under the conditions of Theorem 2, the series in (32) and (33) are formed using the same contracting operator *F* as the series (29) in the definition of the Gramian. Therefore, the sub-Gramians exist.

Suppose, further, that the matrix *A* is stable and controllability sub-Gramians (32) and (33) exist. Then the following properties are satisfied.

**Property 2.** *The sum of all sub-Gramians is the Gramian (29):*

$$P\_{\mathbb{C}} = \sum\_{i=1}^{n} \tilde{P}\_i^{C} = \sum\_{i,j=1}^{n} P\_{ij}^{C}\,, \quad \tilde{P}\_i^{C} = \sum\_{j=1}^{n} P\_{ij}^{C}\,. \tag{34}$$

**Proof.** This is verified by the direct summation of expressions (32) and (33), considering the uniform convergence of the series and integrals and the property $\sum_i R_i = I$ of the residue matrices.

**Property 3** (Consistency with linear theory)**.** *The sub-Gramians $\tilde{P}_i^{(1)}$ and $P_{ij}^{(1)}$ in (32) and (33) are the controllability sub-Gramians of the linear part of system (21), in accordance with the definitions (17) of Section 2.3.*

**Property 4.** *Controllability sub-Gramians of the bilinear system in (32) and (33) satisfy the corresponding generalized modal Lyapunov equations*

$$A\bar{P}\_i^C + \bar{P}\_i^C A^T + \sum\_{\gamma=1}^m N\_\gamma \bar{P}\_i^C N\_\gamma^T = -\frac{1}{2} \left( R\_i B B^T + B B^T R\_i^\* \right),\tag{35}$$

$$AP\_{ij}^C + P\_{ij}^C A^T + \sum\_{\gamma=1}^m N\_\gamma P\_{ij}^C N\_\gamma^T = -\frac{1}{2} \left( R\_i B B^T R\_j^\* + R\_j B B^T R\_i^\* \right). \tag{36}$$

**Proof.** We can directly verify that when *A* is stable, the terms $\tilde{P}_i^{(k)}$ in (32) can be obtained from the following Lyapunov equations:

$$\begin{aligned} A\bar{P}\_i^{(1)} + \bar{P}\_i^{(1)}A^T + \frac{1}{2} \left( R\_i B B^T + B B^T R\_i^\* \right) &= 0 \\ A\bar{P}\_i^{(k)} + \bar{P}\_i^{(k)}A^T + \sum\_{\gamma=1}^m N\_\gamma \bar{P}\_i^{(k-1)} N\_\gamma^T &= 0 \ , \ k = 2, 3, \cdots \end{aligned}$$

We sum the first *K* equations. Because the sub-Gramians are assumed to exist, that is, the series in (32) and (33) converge, the partial sums $\sum_{k=1}^{K} \tilde{P}_i^{(k)}$ converge uniformly as $K \to \infty$. Taking the limit $K \to \infty$, we obtain (35). Similarly, we obtain (36).

**Corollary 2.** *If Equation (30) has a unique solution and the sub-Gramians $\tilde{P}_i^{C}$ and $P_{ij}^{C}$ exist, then they are defined as the unique solutions to (35) and (36).*

**Proof.** According to Property 4, the sub-Gramians must satisfy (35) and (36). If (30) has a unique solution, then the operator on the left-hand side of (30) is non-singular. Therefore, the sub-Gramians $\tilde{P}_i^{C}$ and $P_{ij}^{C}$ are defined uniquely by (35) and (36) for any matrix on the right-hand side.
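Corollary 2 makes the sub-Gramians computable in practice: each $\tilde{P}_i^{C}$ can be obtained by solving (35) with the same generalized Lyapunov operator as in (30). A sketch verifying Property 2 this way (NumPy assumed; illustrative matrices with a simple real spectrum, so the $R_i$ are real):

```python
import numpy as np

# Illustrative system (not from the text) with a simple real spectrum.
A = np.array([[-2.0, 1.0], [0.0, -3.0]])
N = [np.array([[0.1, 0.0], [0.0, 0.2]])]
B = np.array([[1.0], [1.0]])
n = A.shape[0]

lam, U = np.linalg.eig(A)
V = np.linalg.inv(U)
R = [np.outer(U[:, i], V[i, :]) for i in range(n)]

# Vectorized generalized Lyapunov operator shared by (30) and (35).
L = np.kron(np.eye(n), A) + np.kron(A, np.eye(n)) + sum(np.kron(Ng, Ng) for Ng in N)
def solve_glyap(rhs):
    return np.linalg.solve(L, rhs.reshape(-1, order="F")).reshape(n, n, order="F")

P_C = solve_glyap(-(B @ B.T))                                    # Gramian, Eq. (30)
sub = [solve_glyap(-0.5 * (Ri @ B @ B.T + B @ B.T @ Ri.conj().T)) for Ri in R]

assert np.allclose(sum(sub), P_C)   # Property 2: sub-Gramians sum to the Gramian
```

The sum over modes reproduces the Gramian because the right-hand sides of (35) sum to $-BB^T$ by $\sum_i R_i = I$, and the operator is linear and non-singular.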

Choose the input control satisfying the conditions $u(t) = 0$, $t < 0$ and $\int_0^\infty |u(t)|^2 dt = M^2 < 1$. Consider the set of vector functions

$$\Omega\_{u} = \left\{ f(t) : f(t) = \sum\_{k=0}^{\infty} F^k f^{(1)}(t), \; f^{(1)}(t) = \int\_0^{\infty} G\_1(\tau) B\_f\, u(t - \tau)\, d\tau \right\}, \; t > 0\,,$$

where the operator *F* is defined in (25), $G_1(\tau) = e^{A\tau}$ as in (29), and $B_f$ is a matrix of appropriate dimensions. Then, for any $x, y \in \Omega_u$, we define the scalar product as

$$\Gamma(x,y)\_{\Omega} = \sum\_{k=1}^{\infty} M^{2k} \cdot \operatorname{Trace}\left(\int\_0^{\infty} \cdots \int\_0^{\infty} G\_k B\_x B\_y^\* G\_k^T \, d\tau\_1 \cdots d\tau\_k\right),\tag{37}$$

where *Gk* are defined as in (29). This definition satisfies the axioms of linearity, commutativity and positive definiteness. Then, the following analog of Proposition 7 holds for the sub-Gramians of the bilinear system.

**Property 5.** *Suppose that in the bilinear system (21), the initial state is $x(0) = 0$ and the control satisfies the condition $u(t) = 0$, $t < 0$. Then, for a sufficiently small control, $\int_0^\infty |u(t)|^2 dt = M^2 < 1$, the trace of the controllability sub-Gramian $\tilde{P}_i^{C}$ estimates from above the value of the dot product (37) of the solution vector $x(t)$ with the generalized mode $x_i(t)$ in (28), and the trace of the pairwise sub-Gramian $P_{ij}^{C}$ estimates from above the value of the dot product of the generalized mode $x_i(t)$ with the generalized mode $x_j(t)$:*

$$|(x, x_i)_\Omega| \le |\operatorname{Trace} \tilde{P}_i^C|, \quad |(x_i, x_j)_\Omega| \le |\operatorname{Trace} P_{ij}^C|. \tag{38}$$

The *observability Gramian* and *observability sub-Gramians* of system (21) are defined in a similar manner, and properties similar to Properties 1–5 hold for them. The *observability Gramian* is defined as

$$P_O = \sum_{k=1}^{\infty} P^{(k)} = \sum_{k=1}^{\infty} \int_0^{\infty} \cdots \int_0^{\infty} Q_k^T C^T C\, Q_k\, d\tau_1 \cdots d\tau_k, \quad\text{where}$$

$$Q_1 = e^{A\tau_1},\ Q_k(\tau_1, \ldots, \tau_k) = \left[N_1^T Q_{k-1}^T\ \cdots\ N_m^T Q_{k-1}^T\right]^T e^{A\tau_k},\ k = 2, 3, \ldots \tag{39}$$

**Definition 5.** *Observability sub-Gramians and pairwise sub-Gramians of the bilinear system (21) are, respectively, the matrices*

$$P_i^O = \sum_{k=1}^{\infty} P_i^{(k)} = \frac{1}{2} \sum_{k=1}^{\infty} \int_0^{\infty} \cdots \int_0^{\infty} Q_k^T (R_i^* C^T C + C^T C R_i) Q_k \, d\tau_1 \cdots d\tau_k, \tag{40}$$

$$P_{ij}^O = \sum_{k=1}^{\infty} P_{ij}^{(k)} = \frac{1}{2} \sum_{k=1}^{\infty} \int_0^{\infty} \cdots \int_0^{\infty} Q_k^T (R_i^* C^T C R_j + R_j^* C^T C R_i) Q_k\, d\tau_1 \cdots d\tau_k. \tag{41}$$

The observability sub-Gramians satisfy the following *modal Lyapunov equations*:

$$\begin{aligned} A^T P_i^O + P_i^O A + \sum_{\gamma=1}^m N_\gamma^T P_i^O N_\gamma &= -\frac{1}{2} \left( R_i^* C^T C + C^T C R_i \right), \\ A^T P_{ij}^O + P_{ij}^O A + \sum_{\gamma=1}^m N_\gamma^T P_{ij}^O N_\gamma &= -\frac{1}{2} \left( R_i^* C^T C R_j + R_j^* C^T C R_i \right). \end{aligned}$$

#### **4. Iterative Algorithms for Computing Gramians and Sub-Gramians**

In this section, we propose iterative algorithms for computing the Gramians and sub-Gramians for bilinear control systems based on the element-wise computation of the solution matrix on an eigenvector basis. Similar formulas for linear systems were proposed in [24]. Based on the proposed iterative procedure, we introduce a new criterion for the existence of solutions to generalized Lyapunov equations, which in some cases allows us to expand the region of guaranteed existence of solutions in comparison with the estimate of Theorem 2. The proposed criterion, however, uses more detailed information on the coefficients of matrices *Nγ* and eigenvalues of matrix *A*.

#### *4.1. Algorithm for the Element-Wise Computation of Gramian in the Eigenvector Basis*

Assume that the matrix $A$ in (21) has a simple spectrum $\sigma(A) = \{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ and admits the following eigenvalue decomposition:

$$A = U\Lambda V, \quad UV = VU = I,\tag{42}$$

where $\Lambda = \operatorname{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$. The columns of matrix $U$ are the normalized right eigenvectors of matrix $A$, and the rows of matrix $V$ are the normalized left eigenvectors. Then, the Lyapunov Equation (30) in the eigenbasis takes the form

$$\Lambda \tilde{P}_C + \tilde{P}_C \Lambda^* + \sum_{\gamma=1}^{m} \tilde{N}_\gamma \tilde{P}_C \tilde{N}_\gamma^* = -\tilde{Q}, \tag{43}$$

where $\tilde{P}_C = V P_C V^*$, $\tilde{Q} = V B B^T V^*$, $\tilde{N}_\gamma = V N_\gamma U$, and $(\cdot)^*$ denotes Hermitian conjugation. The iterative procedure (31) for solving Equation (43) in the eigenbasis of matrix $A$ takes the form

$$\Lambda \tilde{P}^{(1)} + \tilde{P}^{(1)} \Lambda^* = -\tilde{Q},$$

$$\Lambda \tilde{P}^{(k)} + \tilde{P}^{(k)} \Lambda^* = -\sum_{\gamma=1}^{m} \tilde{N}_\gamma \tilde{P}^{(k-1)} \tilde{N}_\gamma^*, \ k = 2, 3, \ldots,\tag{44}$$

$$\tilde{P}_C = \sum_{k=1}^{\infty} \tilde{P}^{(k)}, \quad P_C = U \tilde{P}_C U^*,$$

where $\tilde{P}^{(k)} = V P^{(k)} V^*$. Let $(\nu_i^\gamma)^T = e_i^T \tilde{N}_\gamma$ be the $i$-th row of the matrix $\tilde{N}_\gamma$, where $e_i$ is the $i$-th column of the identity matrix. Then, (44) can be written in terms of the matrix components as

$$\left(\tilde{P}^{(1)}\right)_{ij} = \frac{-1}{\lambda_i + \lambda_j^*} \left(\tilde{Q}\right)_{ij},$$

$$\forall k > 1: \quad \left(\tilde{P}^{(k)}\right)_{ij} = \sum_{\gamma=1}^m \frac{-\nu_i^{\gamma} \tilde{P}^{(k-1)} \left(\nu_j^{\gamma}\right)^T}{\lambda_i + \lambda_j^*}, \quad P_C = U \left(\sum_{k=1}^\infty \tilde{P}^{(k)}\right) U^*. \tag{45}$$
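The element-wise recursion (45) is straightforward to implement once the equation is written in the eigenbasis of $A$. A minimal Python sketch (function and variable names are illustrative, not from the paper; matrices are nested lists so the snippet is self-contained):

```python
# Element-wise iteration (45) for the controllability Gramian in the
# eigenbasis of A. Inputs: lam -- eigenvalues of A (A stable),
# Q -- the transformed matrix V B B^T V^*, Ns -- the transformed
# matrices V N_gamma U (one per gamma).

def gramian_eigenbasis(lam, Q, Ns, iters=60):
    n = len(lam)
    # First step: (P^(1))_ij = -Q_ij / (lam_i + conj(lam_j))
    Pk = [[-Q[i][j] / (lam[i] + lam[j].conjugate()) for j in range(n)]
          for i in range(n)]
    P = [row[:] for row in Pk]  # running sum of the series P^(1) + P^(2) + ...
    for _ in range(iters - 1):
        nxt = [[0j] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                # nu_i^g P^(k-1) (nu_j^g)^T, with nu_i^g the i-th row of N_g
                s = sum(N[i][a] * Pk[a][b] * N[j][b]
                        for N in Ns for a in range(n) for b in range(n))
                nxt[i][j] = -s / (lam[i] + lam[j].conjugate())
        P = [[P[i][j] + nxt[i][j] for j in range(n)] for i in range(n)]
        Pk = nxt
    return P
```

Applied to Equation (47) of Section 4.2 with $\epsilon = 0.5$ (there $A$ is already diagonal, so the eigenbasis transformation is trivial), the routine converges to the exact solution $P$ given in that example.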

#### *4.2. Novel Criterion for the Existence of Gramians*

The iterative procedure (45) requires an appropriate *criterion for the existence of the Gramian $P_C$*, based on the convergence of its elements in the iterative process.

**Theorem 4.** *The controllability Gramian PC in (29) exists if (i) the matrix A is stable, and (ii) the inequality holds*

$$\sqrt{\sum\_{i,j} q\_{ij}^2} < 1, \quad q\_{ij} = \sum\_{\gamma=1}^m \frac{|\nu\_i^{\gamma}| \cdot |\nu\_j^{\gamma}|}{|\lambda\_i + \lambda\_j^\*|}, \quad i, j = 1, \dots, n,\tag{46}$$

*where the vectors $\nu_i^\gamma = U^* N_\gamma^T V^* e_i$, the matrices $V$, $U$ are defined in (42), and $\lambda_i$, $\lambda_j$ are the eigenvalues of the matrix $A$. Under the above conditions, the Gramian $P_C$ can be obtained using the iterative algorithm (45).*

**Proof.** For the proof, we use the Frobenius norm $\|\cdot\|_F$. From expressions (45), it follows that

$$\left|\nu_i^{\gamma}\tilde{P}^{(k-1)}\left(\nu_j^{\gamma}\right)^T\right| \le |\nu_i^{\gamma}| \cdot |\nu_j^{\gamma}| \cdot \|\tilde{P}^{(k-1)}\|_F,$$

$$\left|\left(\tilde{P}^{(k)}\right)_{ij}\right| \le q_{ij} \|\tilde{P}^{(k-1)}\|_F, \quad \|\tilde{P}^{(k)}\|_F \le \sqrt{\sum_{i,j} q_{ij}^2} \cdot \|\tilde{P}^{(k-1)}\|_F.$$

Thus, under (46), the series $\sum_{k=1}^{\infty} \tilde{P}^{(k)}$ in (44) is bounded from above by a converging geometric progression, and therefore converges. Adding the $K$ equations in (44) and taking the limit $K \to \infty$, we obtain a solution to the generalized Lyapunov equation in the eigenvector basis (43). If the series $\sum_{k=1}^{\infty} \tilde{P}^{(k)}$ converges in the iterative procedure (44) on the eigenbasis, then the corresponding series $\sum_{k=1}^{\infty} P^{(k)}$ converges in procedure (31). According to [6] (Proposition 1), if the matrix $A$ is stable, then the terms of the series defining the Gramian $P_C$ in (29) are calculated using the terms $P^{(k)}$ obtained in the iterative procedure (31), that is, $P_C = \sum_{k=1}^{\infty} P^{(k)}$. Hence, the Gramian $P_C$ exists.

The conditions for the existence of a solution to the Lyapunov Equation (30), established in Theorem 2 [6], are based on the characteristics of the matrices as a whole, whereas Theorem 4 uses a convergence criterion based on more detailed information about the coefficients of the matrices $N_\gamma$ and the eigenvalues of the matrix $A$. Therefore, we can expect that the criterion of Theorem 4 will, in general, expand the domain of guaranteed existence of a solution in comparison with the criterion of Theorem 2. Let us compare them using an illustrative example.

**Example 1.** *Consider the following generalized Lyapunov equation with parameter $\epsilon$:*

$$
\begin{pmatrix} -1 & 0 \\ 0 & -2 \end{pmatrix} \cdot P + P \cdot \begin{pmatrix} -1 & 0 \\ 0 & -2 \end{pmatrix} + \epsilon^2 \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \cdot P \cdot \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} -3 & -3 \\ -3 & -3 \end{pmatrix} \tag{47}
$$

*In the notation of Theorem 2, we have*

$$\alpha = \beta = 1,\ \ N N^T = \epsilon^2 \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix},\ \ (N N^T)^2 = \epsilon^4 \begin{pmatrix} 5 & 3 \\ 3 & 2 \end{pmatrix},$$

$$||N N^T||\_F = \sqrt{trace((N N^T)^2)} = \epsilon^2 \sqrt{7}.$$

*The condition for the existence of a solution to Equation (47) established by Theorem 2 takes the form*

$$||NN^T|| \le 2\alpha/\beta^2, \quad \epsilon^2 < 2/\sqrt{7} \approx 0.756\dots$$

*In the notation of Theorem 4, we have*

$$\lambda\_1 = -1,\ \lambda\_2 = -2,\ |\nu\_1| = \sqrt{2},\ |\nu\_2| = 1,\ (q)\_{i\bar{j}} = \epsilon^2 \begin{pmatrix} 1 & \sqrt{2}/3\\ \sqrt{2}/3 & 1/4 \end{pmatrix}.$$

*The condition for the existence of a solution to Equation (47) established by Theorem 4 takes the form*

$$
\sqrt{\sum\_{i,j} q\_{ij}^2} = \epsilon^2 \sqrt{217/144} < 1, \quad \epsilon^2 < 12/\sqrt{217} \approx 0.815.
$$

*In this example, the criterion of Theorem 4 allows us to expand the domain of guaranteed existence of solutions (47) in comparison with the criterion of Theorem 2. However, the application of this criterion requires more detailed information about the system.*
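The quantities in Theorem 4 are cheap to evaluate. A small sketch computing the matrix $q$ and the bound in (46) (illustrative names; `Ns` holds the transformed matrices $\tilde{N}_\gamma$):

```python
import math

def row_norm(v):
    """Euclidean norm of a matrix row."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))

def existence_criterion(lam, Ns):
    """Return (q, bound) for criterion (46): q_ij = sum_g |nu_i^g| |nu_j^g| /
    |lam_i + conj(lam_j)|; the Gramian exists when A is stable and bound < 1."""
    n = len(lam)
    q = [[sum(row_norm(N[i]) * row_norm(N[j]) / abs(lam[i] + lam[j].conjugate())
              for N in Ns)
          for j in range(n)] for i in range(n)]
    bound = math.sqrt(sum(q[i][j] ** 2 for i in range(n) for j in range(n)))
    return q, bound
```

For Equation (47) without the factor $\epsilon$, this reproduces the matrix $q$ and the bound $\sqrt{217/144}$ of Example 1, so existence is guaranteed for $\epsilon^2 < 12/\sqrt{217}$.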

We calculate the solution to Equation (47) using the iterative algorithm (45) for $\epsilon = 0.5$. In this case, we obtain

$$\mathcal{P}^{(1)} = \begin{pmatrix} 1.5 & 1\\ 1 & 0.75 \end{pmatrix}, \quad \mathcal{P}^{(2)} = \begin{pmatrix} 0.5312 & 0.1458\\ 0.1458 & 0.04687 \end{pmatrix},$$

$$\mathcal{P}^{(3)} = \begin{pmatrix} 0.1089 & 0.01614\\ 0.01614 & 0.002930 \end{pmatrix}, \quad \mathcal{P}^{(4)} = \begin{pmatrix} 0.01599 & 0.001589\\ 0.001589 & 0.0001831 \end{pmatrix},$$

$$\mathcal{P} \approx \sum\_{k=1}^{4} \mathcal{P}^{(k)} = \begin{pmatrix} 2.15609 & 1.1635\\ 1.1635 & 0.79998 \end{pmatrix}. \tag{48}$$

The criterion of Theorem 2 guarantees convergence with the common ratio of the geometric progression $q = \epsilon^2\sqrt{7}/2 \approx 0.3307$, and the relative accuracy of the solution (48) after four iterations is not worse than $q^4/(1-q) \approx 0.0179$, that is, 1.79%.

The criterion of Theorem 4 guarantees convergence with the common ratio of the geometric progression $q = \epsilon^2\sqrt{217/144} \approx 0.3069$, and the relative accuracy of the solution (48) after four iterations is not worse than $q^4/(1-q) \approx 0.0128$, that is, 1.28%.
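Both accuracy estimates follow from the geometric-progression bound $q^4/(1-q)$; the arithmetic can be checked in a few lines (illustrative):

```python
# A priori relative accuracy after four iterations for eps = 0.5:
# common ratios from Theorem 2 and Theorem 4, then the bound q^4 / (1 - q).
eps2 = 0.25
q_thm2 = eps2 * 7 ** 0.5 / 2            # ~0.3307
q_thm4 = eps2 * (217 / 144) ** 0.5      # ~0.3069
bound_thm2 = q_thm2 ** 4 / (1 - q_thm2)  # ~0.0179, i.e., 1.79%
bound_thm4 = q_thm4 ** 4 / (1 - q_thm4)  # ~0.0128, i.e., 1.28%
```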

In this case, the exact solution to (47) and the actual error after four iterations are as follows:

$$P = \begin{pmatrix} 832/385 & 64/55 \\ 64/55 & 4/5 \end{pmatrix}, \quad \sum_{k=1}^{4} P^{(k)} - P = \begin{pmatrix} -0.0049 & -0.00014 \\ -0.00014 & 0.00002 \end{pmatrix}, \quad \left\|\sum_{k=1}^{4} P^{(k)} - P\right\|_F = 0.0049,$$

that is, for ||*P*||*<sup>F</sup>* = 2.83, the relative accuracy of the solution (48) is 0.17%.

#### *4.3. Iterative Algorithm for Computing Sub-Gramians*

Modal Lyapunov Equations (35) and (36) for the controllability sub-Gramians differ from Equation (30) for the Gramian $P_C$ only on the right-hand side. Therefore, to apply the iterative procedure (45) to compute the sub-Gramians $\tilde{P}_i^C$ and $P_{ij}^C$, the matrix $\tilde{Q} = VBB^TV^*$ in the first Equation (45) must be replaced with the matrices

$$\tilde{Q}_i = \frac{1}{2}V(R_i BB^T + BB^T R_i^*)V^* \quad \text{and} \quad \tilde{Q}_{ij} = \frac{1}{2}V(R_i BB^T R_j^* + R_j BB^T R_i^*)V^*,$$

respectively. The elements of these matrices in the eigenvector basis are calculated as

$$(\tilde{Q}_i)_{pr} = \frac{1}{2}(\delta_{ip} + \delta_{ir})(\tilde{Q})_{pr} \quad \text{and} \quad (\tilde{Q}_{ij})_{pr} = \frac{1}{2}(\delta_{ip}\delta_{jr} + \delta_{jp}\delta_{ir})(\tilde{Q})_{pr},$$

where $\delta_{ls}$ is the Kronecker delta. Substituting these expressions into the iterative procedure (45) instead of $(\tilde{Q})_{pr}$, we obtain the following iterative procedure for the *computation of the sub-Gramians $\tilde{P}_i^C$* in (35):

$$\left(\tilde{P}_i^{(1)}\right)_{pr} = -\frac{1}{2} \cdot \frac{1}{\lambda_p + \lambda_r^*} (\delta_{ip} + \delta_{ir}) \left(\tilde{Q}\right)_{pr},$$

$$\forall k > 1: \quad \left(\tilde{P}_i^{(k)}\right)_{pr} = \sum_{\gamma=1}^m \frac{-\nu_p^{\gamma} \tilde{P}_i^{(k-1)} \left(\nu_r^{\gamma}\right)^T}{\lambda_p + \lambda_r^*}, \tag{49}$$

$$P_i^C = U \left(\sum_{k=1}^\infty \tilde{P}_i^{(k)}\right) U^*,$$

and an iterative procedure for the *computation of pairwise sub-Gramians $P_{ij}^C$* in (36):

$$\left(\tilde{P}_{ij}^{(1)}\right)_{pr} = -\frac{1}{2} \cdot \frac{1}{\lambda_p + \lambda_r^*} (\delta_{ip}\delta_{jr} + \delta_{jp}\delta_{ir}) \left(\tilde{Q}\right)_{pr},$$

$$\forall k > 1: \quad \left(\tilde{P}_{ij}^{(k)}\right)_{pr} = \sum_{\gamma=1}^m \frac{-\nu_p^\gamma \tilde{P}_{ij}^{(k-1)} \left(\nu_r^\gamma\right)^T}{\lambda_p + \lambda_r^*}, \tag{50}$$

$$P_{ij}^C = U \left(\sum_{k=1}^\infty \tilde{P}_{ij}^{(k)}\right) U^*.$$

Sufficient conditions for the applicability of iterative procedures (49) and (50) are the same as those for the iterative procedure (44) established in Theorem 2 or in Theorem 4.

**Example 2.** *To illustrate the definition of sub-Gramians and the algorithms for their computation, we calculate the controllability sub-Gramians for Equation (47) with $\epsilon = 1/2$. As was established in Example 1, the Gramian $P$ exists, and according to Property 1, all sub-Gramians also exist. According to Property 2, the Gramian is split into sub-Gramians in the form*

$$P = P\_1 + P\_2 = \begin{pmatrix} 144/77 & 6/11 \\ 6/11 & 0 \end{pmatrix} + \begin{pmatrix} 112/385 & 34/55 \\ 34/55 & 4/5 \end{pmatrix} = \begin{pmatrix} 832/385 & 64/55 \\ 64/55 & 4/5 \end{pmatrix}.$$

$$P = P\_{11} + P\_{12} + P\_{21} + P\_{22} = \begin{pmatrix} 12/7 & 0 \\ 0 & 0 \end{pmatrix} +$$

$$\begin{pmatrix} 12/77 & 6/11 \\ 6/11 & 0 \end{pmatrix} + \begin{pmatrix} 12/77 & 6/11 \\ 6/11 & 0 \end{pmatrix} + \begin{pmatrix} 52/385 & 4/55 \\ 4/55 & 4/5 \end{pmatrix}.$$

*Moreover, the sub-Gramians themselves, according to Property 4, can be calculated from the corresponding modal Lyapunov Equations (35) and (36), respectively.*
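The split above can be reproduced numerically with the modified first step of (49); a Python sketch (names are illustrative; for Equation (47) the matrix $A$ is diagonal, so no eigenbasis transformation is needed, and the index `i` is 0-based):

```python
def sub_gramian(lam, Q, Ns, i, iters=80):
    """Iterate (49) for the sub-Gramian P_i^C: the first step uses
    (Q_i)_pr = (1/2)(delta_ip + delta_ir) Q_pr; later steps are as in (45)."""
    n = len(lam)
    Pk = [[-0.5 * ((p == i) + (r == i)) * Q[p][r] / (lam[p] + lam[r].conjugate())
           for r in range(n)] for p in range(n)]
    P = [row[:] for row in Pk]  # running sum of the series
    for _ in range(iters - 1):
        nxt = [[-sum(N[p][a] * Pk[a][b] * N[r][b]
                     for N in Ns for a in range(n) for b in range(n))
                / (lam[p] + lam[r].conjugate())
                for r in range(n)] for p in range(n)]
        P = [[P[p][r] + nxt[p][r] for r in range(n)] for p in range(n)]
        Pk = nxt
    return P
```

With `lam = [-1, -2]`, `Q = [[3, 3], [3, 3]]`, and `N` equal to $\epsilon\begin{pmatrix}1&1\\0&1\end{pmatrix}$, the routine returns the $P_1$ and $P_2$ above, and their sum recovers the Gramian $P$.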

**Example 3.** *For completeness, we present an example of using sub-Gramians to analyze a bilinear model of an electric power system from [20]. As a test bilinear model, the 17th-order model from [5] was used for two interconnected power systems, each area having one steam and one hydro unit. In the test experiment, the contribution of generalized eigenmodes (28) and their pair interactions to the small-signal perturbation energy of the system was estimated as a function of the coefficient $\alpha$, which characterizes the magnitude of all bilinear terms. To illustrate the process of selecting eigenmodes that are sensitive to bilinear effects, as well as the selection of the regions of linear and bilinear behavior of the system, consider Figure 1. One can see the Frobenius norm of the sub-Gramians $\tilde{P}_i$ for generalized eigenmodes as a function of the weighting coefficient $\alpha$. The behavior of the spectral components indicates the range of applicability of the linear model in general and reveals particular eigenmodes that are sensitive to bilinear effects. The arrowhead in Figure 1 indicates the threshold between the linear and bilinear behavior of the system at $\alpha \approx 4.17$. This threshold can be defined from the condition that the difference between the norms of the "linear" and full sub-Gramians corresponding to some eigenmode reaches a certain percentage. In this case, we can see in Figure 1 that the modes most sensitive to bilinear effects are S15 and S14: at $\alpha \approx 4.17$, the norms of their sub-Gramians have increased by 17% and 15%, respectively. The modes S1 and S4/S5 are also sensitive to bilinear effects; the norms of their sub-Gramians have increased by 6.6% and 4.6%, respectively. Other modes are less sensitive and can be considered in the linear approximation as long as the norms of their sub-Gramians remain below the chosen threshold value. The threshold after which the non-linear behavior of an eigenmode must be considered can be determined individually for each mode. This information can be used for small-signal or transient stability analyses. A detailed description of the model, the test experiment, and its results can be found in [5,20].*

**Figure 1.** The Frobenius norm of the sub-Gramians $\tilde{P}_i$ for generalized eigenmodes as a function of the weighting coefficient $\alpha$ in the test experiment in [20].

#### **5. Discussion**

In this study, we show that (i) the solution of a bilinear system can be split uniquely into generalized modes corresponding to the eigenvalues of the dynamics matrix, and (ii) the controllability and observability Gramians can be split into "sub-Gramians" that characterize the magnitude of these generalized modes and their pairwise interactions. This characterization, however, was proven only for small enough input control. A similar condition arises when establishing the relationship between the Gramians and the energy of states in the system in [16] and, apparently, it is typical for bilinear systems.

In contrast to the spectral expansions of the instantaneous dynamics of a bilinear system in [11–13], the spectral expansions of the $L_2$-norms of states and signals considered in this paper can be useful for analyzing the non-linear effects associated with the accumulation of the influence of disturbances over time. Therefore, the practical significance of the obtained results is that they allow the characterization of the contribution of generalized modes or their pairwise combinations to the asymptotic dynamics of the integrated perturbation energy in bilinear systems. In particular, the norm of the obtained sub-Gramians increases when the frequencies of the corresponding oscillating modes approach each other. Thus, the proposed decompositions may provide a new fundamental approach for quantifying resonant modal interactions in bilinear systems.

When the bilinear effects decrease, the proposed expansions allow a smooth transition to the linear case (see Property 3). This property can be useful in determining the range of applicability of a linear model and in identifying generalized eigenmodes that are sensitive to bilinear effects and require a "non-linear refinement" of their dynamics. It can be expected that in some large systems there will be only a few such modes, so a non-linear examination of their dynamics will not take much time when real-time state estimation is required. The first test experiment with a bilinear model of an electric power system in [20] showed that the proposed spectral decompositions allow one to determine the range of applicability of the linear model in general and to reveal particular generalized eigenmodes that are sensitive to bilinear effects.

Although this study focuses on continuous bilinear systems, the results obtained can be extended to different classes of systems. First, they can be extended to discrete dynamical systems. In the linear case, this was partially performed in [18]. Meanwhile, the generalized Lyapunov equations that we consider for deterministic bilinear systems can be naturally associated with stochastic linear control systems (see [8]). Therefore, the results of spectral decomposition of Gramians can immediately be carried over to this class of systems. In this case, the results must be interpreted in terms of probabilities. Finally, the equations considered in this study can describe a special class of linear parameter-varying systems that can be reformulated as bilinear dynamical systems [9]. In this case, the interpretation of the spectral decompositions must include the effect of parameter variation.

It should be noted that the main object of research in this study is matrix Lyapunov equations. An alternative approach is to apply the apparatus of linear matrix inequalities and semi-definite programming [25]; therefore, another possible area of research is the combination of these approaches. In terms of applications, the authors plan to apply the developed methods to study the stability of electric power systems using linear and non-linear graph models. Another emerging area is the analysis of the stability of neural networks, including the use of Lyapunov functions [26,27]. The dissipativity principle in the synchronization of neural networks is very similar to the synchronization of generators in power systems. Therefore, the application of the developed methods to the problem of synchronization of neural networks is another possible direction for future research.

**Author Contributions:** A.I. and I.Y. contributed equally to the development of the theory and the corresponding analysis. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Russian Science Foundation, grant number 19-19-00673.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **New Identification Approach and Methods for Plasma Equilibrium Reconstruction in D-Shaped Tokamaks**

**Yuri V. Mitrishkin 1,2,***∗***,†, Pavel S. Korenev 2,†, Artem E. Konkov 2,†, Valerii I. Kruzhkov 1,2,† and Nicolai E. Ovsiannikov 1,2,†**


**Abstract:** The paper deals with the identification of plasma equilibrium reconstruction in D-shaped tokamaks on the basis of external magnetic measurements of the plasma. The identification methods are aimed at increasing the speed of response when plasma discharges are relatively short, as in the spherical Globus-M2 tokamak (Ioffe Institute, St. Petersburg, Russia). The new approach is, first, to apply to the plasma discharge data the off-line equilibrium reconstruction algorithm based on Picard iterations and obtain the gaps between the plasma boundary and the first wall, and, second, to apply new identification methods to the gap values, producing plasma shape models operating in real time. The inputs for the on-line robust identification algorithms are the measurements of magnetic fluxes on magnetic loops, the plasma current, and the currents in the poloidal field coils measured by Rogowski loops. The novel on-line high-performance identification algorithms are designed on the basis of (i) a full-order observer synthesized by the linear matrix inequality (LMI) methodology, (ii) a static matrix obtained by the least squares technique, and (iii) a deep neural network. The robust observer is constructed on the basis of LPV plant models whose novelty is that the state vector contains the gaps, which are estimated by the observer using input and output signals. The results of the simulation of the identification systems on the basis of experimental data of the Globus-M2 tokamak are presented.

**Keywords:** tokamak; plasma equilibrium reconstruction; linear plasma models; identification; state observer; LMI; least square technique; deep neural network

#### **1. Introduction**

Tokamaks [1], toroidal vessels with magnetic coils (Figure 1), originated at the I.V. Kurchatov Institute of Atomic Energy in the USSR and spread around the world to solve the problem of controlled thermonuclear fusion: obtaining energy from the fusion of the nuclei of light elements. The most promising devices for solving this problem are vertically elongated tokamaks with increased gas-kinetic pressure (D-shaped tokamaks) (Figure 1). Plasma (the fourth state of matter) vertically elongated by an external magnetic field is unstable in the vertical direction, and it is necessary to use automatic feedback control systems to keep it near the first tokamak wall.

In our studies, we developed, modeled, and applied control systems of plasma position, current and shape for various tokamaks: ITER (International Thermonuclear Experimental Reactor, Cadarache, France) [2,3], T-15MD (tokamak created at NRC "Kurchatov Institute", Moscow, Russia, planned to be launched in the near future) [4–6], Tuman-3 (toroidal installation with adiabatic compression) [3,4], Globus-M2 (spherical tokamak) [4,7,8] (operating at Ioffe Physics and Technology Institute of RAS, St. Petersburg, Russia), T-11M (operating circular tokamak) [9], and IGNITOR (JSC "SSC RF TRINITI", Troitsk, Russia) [10].

**Citation:** Mitrishkin, Y.V.; Korenev, P.S.; Konkov, A.E.; Kruzhkov, V.I.; Ovsiannikov, N.E. New Identification Approach and Methods for Plasma Equilibrium Reconstruction in D-Shaped Tokamaks. *Mathematics* **2022**, *10*, 40. https://doi.org/ 10.3390/math10010040

Academic Editors: Igor Yadykin, Andrei Torgashov, Nikolay Korgin and Natalia Bakhtadze

Received: 21 November 2021 Accepted: 19 December 2021 Published: 23 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**Figure 1.** Vertically elongated tokamak without iron core: 1 is the VV; 2 is the toroidal field coil; 3 are the poloidal field inner and outer coils; 4 are plasma and helical magnetic lines (©ITER Project Center (Russia): https://www.iterrf.ru/index.php/istoriya-sozdaniya-proekta, accessed on 21 December 2021).

Since optical reconstruction codes such as OFIT [11] are not available in many tokamaks, the plasma boundary in D-shaped tokamaks is usually not measured directly but rather reconstructed from external measurements. There are a number of codes able to solve that problem, both off-line and on-line [12]. The most popular of them are EFIT (equilibrium fitting) [13], which uses Picard iterations [14] and is applied on-line on a set of tokamaks, such as DIII-D, NSTX (U.S.), EAST (China), and KSTAR (S. Korea), and RTLIUQE [15], used on TCV (Switzerland). These codes were adopted for ITER.

In this work, the new plasma equilibrium reconstruction algorithms are to be inserted into the plasma position, current, and shape feedback control system of the Globus-M2 tokamak. In Figure 2, one can see the digital model of that system [16], where the plasma equilibrium reconstruction algorithm is to be identified by the new methods proposed in the paper.

**Figure 2.** Structure scheme of the plasma position, current and shape control system of Globus-M2 tokamak.

#### **2. Reconstructing Plasma Equilibria from External Magnetic Measurements**

A tokamak is an axially symmetrical device, so the tokamak plasma equilibrium is described in the poloidal plane $(r, z)$, typically in terms of the poloidal magnetic flux distribution $\Psi(r, z)$, which is defined as the flux of the magnetic field vector $\vec{B}$ through a surface $S$ bounded by the line ($r = const$, $z = const$):

$$\Psi(r,z) = \frac{1}{2\pi} \iint\limits\_{\mathcal{S}} \vec{B}d\vec{\mathcal{S}}.\tag{1}$$

The magnetic field lines, along which plasma particles move, lie on the flux surfaces *Ψ*(*r*, *z*) = *const*; therefore, the boundary of the magnetically confined plasma can be found as the largest closed flux surface.

The toroidal current density *J<sup>ϕ</sup>* in the tokamak is connected with the poloidal flux through the linear second-order partial differential equation [1]:

$$-\mu_0^{-1} \left( \frac{\partial}{\partial r} \frac{1}{r} \frac{\partial}{\partial r} + \frac{1}{r} \frac{\partial^2}{\partial z^2} \right) \Psi = J_\varphi. \tag{2}$$

The boundary conditions for the equation are obtained from the definition and the physical meaning of the poloidal flux:

$$\left.\Psi\right|\_{r=0} = 0, \left.\Psi\right|\_{r=\infty} = 0.\tag{3}$$

When the right-hand side of the Equation (2) is known, it can be solved with the standard numerical methods, for example, using the corresponding Green's function *G* [17]:

$$\begin{aligned} \Psi(r,z) &= \iint J_\varphi(r',z')G(r,z,r',z')\,dr'dz',\\ G(r,z,r',z') &= \frac{\mu_0}{\pi} \sqrt{\frac{rr'}{k^2}} \left( \left(1 - \frac{k^2}{2}\right)K(k^2) - E(k^2) \right), \end{aligned}$$

where $K$ and $E$ are the complete elliptic integrals of the first and second kind, respectively, and

$$k^2 = \frac{4rr'}{(r+r')^2 + (z-z')^2}.$$
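The Green's function above can be evaluated directly once the complete elliptic integrals are available; a self-contained sketch using the arithmetic-geometric mean (AGM) iteration (names are illustrative, and the $\mu_0/\pi$ normalization follows the formula above):

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m

def ellip_ke(m, tol=1e-15):
    """Complete elliptic integrals K(m) and E(m), parameter m = k^2 in [0, 1),
    computed with the arithmetic-geometric mean iteration."""
    a, b, c = 1.0, math.sqrt(1.0 - m), math.sqrt(m)
    csum, pow2 = 0.5 * c * c, 1.0  # accumulates sum of 2^(n-1) c_n^2
    while abs(c) > tol:
        a, b, c = 0.5 * (a + b), math.sqrt(a * b), 0.5 * (a - b)
        pow2 *= 2.0
        csum += 0.5 * pow2 * c * c
    K = math.pi / (2.0 * a)
    return K, K * (1.0 - csum)

def green(r, z, rp, zp):
    """G(r, z, r', z') for Equation (2), per the formula above."""
    m = 4.0 * r * rp / ((r + rp) ** 2 + (z - zp) ** 2)  # m = k^2
    K, E = ellip_ke(m)
    return MU0 / math.pi * math.sqrt(r * rp / m) * ((1.0 - 0.5 * m) * K - E)
```

By construction, $G$ is symmetric in its argument pairs, $G(r, z, r', z') = G(r', z', r, z)$, which is a convenient sanity check.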

In practice, the plasma current distribution and the induced currents in the conductive Vacuum Vessel (VV) of the tokamak are often not available for real-time reconstruction and must be identified together with the poloidal flux distribution from the external magnetic measurements, which include the coil currents $I_1, \ldots, I_{N_c}$, the total plasma current $I_p$ measured by Rogowski coils, and the poloidal flux values $\Psi$ at a finite number of points $\{(r_1, z_1), \ldots, (r_{N_l}, z_{N_l})\}$ measured by magnetic loops outside the plasma.

Hence, the plasma equilibrium reconstruction problem is to find plasma area *Sp*, plasma current distribution *Jp* and induced current density *Jv* such that:

$$\chi^2 = \left(I_p - \iint_{S_p} J_p\, dS\right)^2 / \sigma_p^2 + \sum_{j=1}^{N_l} \left(\Psi_j - \Psi(r_j, z_j)\right)^2 / \sigma_j^2 \xrightarrow[J_p,\, J_v]{} \min,\tag{4}$$

where $\sigma_p$ and $\sigma_j$ are the uncertainties of the plasma current and of the poloidal flux at the $j$-th magnetic loop, and $\Psi(r, z)$ is the solution of Equation (2) with boundary conditions (3) and the right-hand side:

$$J_\varphi(r,z) = \begin{cases} J_p, & (r,z) \in S_p, \\ I_k N_k / S_k, & (r,z) \in S_k,\ k = 1, \ldots, N_c, \\ J_v, & (r,z) \in S_v, \end{cases}$$

$S_k$ and $S_v$ are the areas occupied by the $k$-th coil and the VV, respectively, and $N_k$ is the number of turns of the $k$-th coil.

Optionally, the coil current measurements may also be considered uncertain and accounted for in the functional (4) by the terms $\left(I_k - I_k^{\text{measured}}\right)^2/\sigma_{I_k}^2$, $k = 1, \ldots, N_c$. The functional may also include other measurements that can be expressed in terms of the currents and the magnetic flux. Finally, as the plasma equilibrium reconstruction problem is ill-posed in the sense of Hadamard, the functional may include a regularization term.

To find the plasma shape in the Globus-M2 tokamak (Figure 3), the flux-current distribution identification (FCDI) code was used [14]. The FCDI code applies the following expression for the plasma toroidal current density, obtained from the plasma force balance equations [1,14]:

$$J_p = rp'(\Psi) + \frac{1}{\mu_0 r} F(\Psi) F'(\Psi),$$

where $p$ is the plasma pressure and $F$ is the poloidal current, defined analogously to the poloidal flux (1):

$$F = \frac{\mu_0}{2\pi} \iint\limits_S \vec{J}\, d\vec{S}.$$

The Picard iteration method is used to find the poloidal flux distribution. Since *F* and *p* depend only on poloidal flux, on each iteration, the FCDI code approximates the plasma current density by polynomials of the poloidal flux from the previous iteration:

$$p'(\Psi) = \sum_{k=0}^{N_p} c_k^{(p)} \Psi^k, \quad F(\Psi)F'(\Psi) = \sum_{k=0}^{N_F} c_k^{(F)} \Psi^k.$$

Similarly, the VV currents are approximated as a linear combination of some basis functions, for example, orthogonal VV current modes [18]:

$$J_v = \sum_{k=0}^{N_v} c_k^{(v)} J_k.$$

The coefficients of the $J_p$ polynomials and of the $J_v$ basis function regression are then found by minimizing the error functional (4), which can be written in the matrix form:

$$\chi^2 = \|Ac - b\|^2.$$

Here, *c* is the *N* × 1 column vector of the coefficients *c*(*p*), *c*(*F*), *c*(*v*), with *N* = *Np* + *NF* + *Nv*; *A* is the *M* × *N* matrix, where *M* is the number of magnetic measurements used; and *b* is the *M* × 1 column vector. To regularize the problem, the SVD truncation method is used to minimize the quadratic functional [19]. After the coefficients are determined, the corresponding poloidal flux distribution is calculated and used to construct the polynomials on the next iteration. The iterations continue until the error *χ*<sup>2</sup> is sufficiently small or the maximal number of iterations is reached.
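As an illustration, the truncated-SVD least-squares step can be sketched in a few lines of numpy (a minimal sketch with synthetic *A* and *b*, not FCDI data; the threshold `rcond` is an illustrative choice):

```python
import numpy as np

def svd_truncated_lstsq(A, b, rcond=1e-3):
    """Minimize ||A c - b||^2, discarding singular values below rcond * s_max.

    Dropping near-zero singular values regularizes the ill-posed problem,
    in the spirit of the SVD truncation applied to functional (4).
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > rcond * s[0]
    # c = V diag(1/s) U^T b, restricted to the retained singular triplets
    return Vt[keep].T @ ((U[:, keep].T @ b) / s[keep])

# Synthetic example: M = 33 measurements, N = 10 coefficients
rng = np.random.default_rng(1)
A = rng.normal(size=(33, 10))
c_true = rng.normal(size=10)
b = A @ c_true
c = svd_truncated_lstsq(A, b)
```

For a well-conditioned *A*, no singular values are discarded and the exact least-squares solution is recovered; for an ill-conditioned *A*, the truncation trades a small bias for a large reduction in noise amplification.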

**Figure 3.** Globus-M2 tokamak (©Ioffe Physics and Technology Institute of RAS, St. Petersburg, Russia).

#### **3. Experimental Data**

The FCDI code was applied to 50 discharges of the Globus-M2 tokamak. For each discharge, magnetic measurements *y*(1), *y*(2), ... , *y*(50) are available. They include the currents in the 8 control coils (Horizontal Field Coil, Vertical Field Coil, Central Solenoid, Poloidal Field Coil 1, upper and lower sections of Poloidal Field Coil 2, Poloidal Field Coil 3, and Correcting Coil) (Figure 4); the poloidal magnetic flux from 21 loops; the vertical dipole magnetic flux (the difference between the magnetic flux above and below the plasma); the horizontal dipole magnetic flux (the difference between the magnetic flux to the left and to the right of the plasma); and the quadrupole magnetic flux (expressed as *ψ*(*L*1) − *ψ*(*L*2) + *ψ*(*L*3) − *ψ*(*L*4), with the locations of loops *L*1–*L*4 shown in Figure 4). Thus, $y^{(i)} \in \mathbb{R}^{33\times s\_i}$, $i \in [1; 50]$, where $s\_i = T\_i/\tau$, *Ti* is the duration of the discharge, and *τ* is the discretization step. Here, the discretization step is the time step between the reconstructed off-line equilibria; it is constrained only by the discretization time of the experimental measurements.

**Figure 4.** Poloidal system of the Globus-M2 tokamak and plasma boundary with strike points *g*1, *g*2 and gaps *g*3–*g*6.

From these data, the FCDI code obtains the plasma current distribution and plasma boundary coordinates for the divertor phases of the discharges. The calculated plasma shapes are represented by the positions of 2 strike points (*g*1, *g*2) on the VV and the values of 4 gaps (*g*3–*g*6) between the plasma and the VV (Figure 4): $g^{(1)}, g^{(2)}, \ldots, g^{(50)}$; $g^{(i)} \in \mathbb{R}^{6\times s\_i}$. The strike points are the intersection points of the poloidal flux isoline bounding the plasma with the VV. Their coordinates *g*1 and *g*2 are calculated as the distance from point *P*6 in Figure 4 along the VV. The gap *g*3 is the distance between point *P*3 on the VV outer wall and the plasma boundary along the horizontal line, *g*4 is the distance between *P*4 and the plasma boundary along the 45° line, *g*5 is the distance between *P*5 and the plasma boundary along the vertical line, and *g*6 is the distance between point *P*6 on the VV inner wall and the plasma along the horizontal line. The *g*1–*g*6 values describe the plasma shape in the LSND (lower single null divertor) configuration, typical for the Globus-M2 tokamak. Other configurations may require different sets of descriptors, but the identification methods described below remain applicable.

#### **4. Plasma Model**

The plasma dynamics is described by Faraday's law equations:

$$\frac{d}{dt}\Phi(J\_p, I) + RI = U,\tag{5}$$

and force balance equation

$$
\vec{F}(J\_p, I) = 0.\tag{6}
$$

The measured fluxes and the plasma shape are determined by currents in the tokamak:

$$\begin{aligned} \Psi &= \Psi(J\_p, I), \\ \mathbf{g} &= \mathbf{g}(J\_p, I). \end{aligned} \tag{7}$$

Here, $I = [I\_c^{\mathrm{T}}, I\_v^{\mathrm{T}}, I\_p]^{\mathrm{T}}$ is the column vector of currents; Φ, *R*, and *U* are, respectively, the column vector of magnetic fluxes, the diagonal matrix of electrical resistances, and the column vector of voltages applied to the control coils, VV, and plasma; *F* is the force acting on the plasma; *g* is the column vector of the strike point positions on the VV and the gaps between the plasma and VV; *Ψ* is the column vector of the fluxes measured by the tokamak diagnostics. The plasma mass is neglected.

The magnetic flux vector can be expressed as Φ(*Jp*, *I*) = *M*(*Jp*)*I*, where *M* is the inductance matrix. The dependence of the inductance matrix *M*, the force *F*, and the plasma shape *g* on the plasma current distribution *Jp* is nonlinear, but for small deviations from the reconstructed equilibrium, a linearized model is sufficient. Assuming that the plasma can rigidly move in the vertical and radial directions, the linearized Equations (5)–(7) take the form:

$$M\frac{d\,\delta I}{dt} + \frac{\partial \Phi}{\partial \vec{r}\_p} \frac{d\,\delta\vec{r}\_p}{dt} + R\,\delta I = \delta U,$$

$$\frac{\partial \vec{F}}{\partial I} \delta I + \frac{\partial \vec{F}}{\partial \vec{r}\_p} \delta \vec{r}\_p = 0,$$

$$\delta \Psi = \frac{\partial \Psi}{\partial I} \delta I + \frac{\partial \Psi}{\partial \vec{r}\_p} \delta \vec{r}\_p,$$

$$\delta g = \frac{\partial g}{\partial I} \delta I + \frac{\partial g}{\partial \vec{r}\_p} \delta \vec{r}\_p,$$

where $\vec{r}\_p = [r\_p, z\_p]^{\mathrm{T}}$ is the radius vector of the plasma center of mass, and *δ* denotes the deviation from the scenario value.

Introducing the state vector $x = \delta I = [\delta I\_c^{\mathrm{T}}, \delta I\_v^{\mathrm{T}}, \delta I\_p]^{\mathrm{T}}$, the input vector $u = \delta U$, and the output vector of plasma and coil current, gap, and flux deviations $y = [\delta I\_p, \delta I\_c^{\mathrm{T}}, \delta\Psi^{\mathrm{T}}, \delta g^{\mathrm{T}}]^{\mathrm{T}}$, the LPV (linear parameter varying) model takes the standard state-space form:

$$\begin{cases}
\dot{x}(t) = A\_m(J\_p, t)x(t) + B\_m(J\_p, t)u(t), \\
y(t) = C\_m(J\_p, t)x(t).
\end{cases} \tag{8}$$

The reconstructed plasma current distributions *Jp* are used to calculate a series of linear models {*A*, *B*, *C*}*nm* describing the plasma dynamics in each considered discharge. Here, index *n* denotes the time moment *tn* for which the model is obtained:

$$A\_{nm} = A\_m(J\_p, t\_n), \quad B\_{nm} = B\_m(J\_p, t\_n), \quad C\_{nm} = C\_m(J\_p, t\_n), \quad n = 1, \ldots, N\_m,$$

where *t*1, ... , *tNm* correspond to the time points of the divertor phase of the *m*th tokamak discharge with a time step of 1 ms, and index *m* denotes the serial number of the discharge. This represents the LPV model (8) as an array of LTI (linear time invariant) models. When modeling each discharge, linear interpolation is performed between the time points from *t*1 to *tNm*.

The models have 24 states, 8 inputs, and 39 outputs. Each obtained model has a single real positive pole.

Although the models include expressions for the gaps as outputs, the gaps are not directly measured on the tokamak, so it may be convenient to apply a state-space coordinate transformation, replacing any 6 currents with gaps in the state vector and removing the gaps from the outputs. Furthermore, the ZOH (zero-order hold) method is used for discretization with sample time *Ts* = 0.1 ms, such that

$$T\_s k \le t \le T\_s k + T\_s, \quad k \in \mathbb{Z},$$

$$A\_{nm}^d = \exp(A\_{nm}T\_s), \quad B\_{nm}^d = A\_{nm}^{-1}(A\_{nm}^d - I)B\_{nm}, \quad C\_{nm}^d = C\_{nm}.$$
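The ZOH discretization step can be illustrated as follows (a numpy sketch on a small stable model invented for the example; the eigendecomposition route assumes a diagonalizable, invertible *A*, while `scipy.linalg.expm` would cover the general case):

```python
import numpy as np

def zoh_discretize(A, B, Ts):
    """ZOH discretization: A_d = exp(A Ts), B_d = A^{-1} (A_d - I) B.

    exp(A Ts) is computed here via eigendecomposition, which assumes A is
    diagonalizable and invertible; scipy.linalg.expm handles the general case.
    """
    w, V = np.linalg.eig(A)
    Ad = (V @ np.diag(np.exp(w * Ts)) @ np.linalg.inv(V)).real
    Bd = np.linalg.solve(A, (Ad - np.eye(A.shape[0])) @ B)
    return Ad, Bd

# Illustrative stable 2-state continuous-time model (not a tokamak model)
A = np.array([[-1.0, 0.0], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
Ad, Bd = zoh_discretize(A, B, Ts=1e-4)  # Ts = 0.1 ms
```

The output matrix is unchanged by ZOH discretization, matching $C\_{nm}^d = C\_{nm}$ above.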

The final array of discrete-time models in state-space form is obtained:

$$\begin{cases} x(T\_s k + T\_s) = A\_{nm}^{d}x(T\_s k) + B\_{nm}^{d}u(T\_s k),\\ y(T\_s k) = C\_{nm}^{d}x(T\_s k). \end{cases} \tag{9}$$

The models have 8 inputs $u = \delta U$; 24 states $x = [\delta g^{\mathrm{T}}, \delta\hat{I}^{\mathrm{T}}]^{\mathrm{T}}$, consisting of the 6 gaps and the current vector $\hat{I}$ truncated to 18 elements; and 33 outputs $y = [\delta I\_p, \delta I\_c^{\mathrm{T}}, \delta\Psi^{\mathrm{T}}]^{\mathrm{T}}$ directly corresponding to the values measured by the diagnostics at the Globus-M2 tokamak: the plasma current, 8 currents in the control coils, the poloidal magnetic flux from 21 loops, the quadrupole magnetic flux, and the vertical and horizontal dipole magnetic fluxes. The inclusion of the gaps in the state vector is convenient for some applications, one of which is described in the next section.

#### **5. Plasma Shape Identification by Robust Observer Synthesized by LMI**

The idea of gap estimation with a robust discrete state observer is as follows. Using the FCDI code, a series of LPV models for a series of plasma discharges is computed. The gaps are included in the state vector of all linear models, and the output vector includes the signals measured by the magnetic diagnostics system of the tokamak. Then, using the LMI method, a unified state observer is synthesized, which provides minimal error between states and state estimates for a series of LPV models.

The synthesized observer can be used in a real experiment, with experimental signals connected to its input as shown in Figure 5.

The unified observer for an array of linear plant models ensures the minimum error between the state vectors and their estimates, and consequently between the gap values and their estimates, over the entire discharge duration. This further guarantees the robust behavior of the synthesized plasma shape control system.

**Figure 5.** Robust observer synthesized via LMIs for use in a real experiment. The red vector signal includes experimental signals obtained by the magnetic diagnostic system. The blue vector signal includes voltages on the poloidal coils. The yellow signal contains states estimation, which includes gaps estimation.

The state equation of the full-order discrete-time observer [20] is given as follows

$$
\tilde{x}(T\_s k + T\_s) = A^d \tilde{x}(T\_s k) + B^d u(T\_s k) + L \left( y(T\_s k) - C^d \tilde{x}(T\_s k) \right),
$$

where *x*˜ is the state estimate vector of the discrete-time state-space plant model {*Ad*, *Bd*, *Cd*}, *Ts* is the sample time, and *L* is the observer matrix.

Passing to the observer error equation yields

$$e(T\_s k + T\_s) = \left(A^d - LC^d\right)e(T\_s k),$$

where *e* = *x* − *x*˜ is the error between the states and state estimations.
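A minimal numpy sketch of the observer update and its error dynamics, on an illustrative 2-state system with a hand-picked gain *L* (a stand-in, not the LMI-synthesized observer for the 24-state models):

```python
import numpy as np

# Illustrative discrete-time plant {A^d, B^d, C^d} and observer gain L
# (a 2-state stand-in, not one of the tokamak models)
Ad = np.array([[0.9, 0.1], [0.0, 0.8]])
Bd = np.array([[0.0], [1.0]])
Cd = np.array([[1.0, 0.0]])
L = np.array([[0.9], [1.6]])           # places eig(Ad - L Cd) at 0.4 (double)

x = np.array([[1.0], [-1.0]])          # true plant state
xt = np.zeros((2, 1))                  # observer estimate x~
for k in range(60):
    u = np.array([[np.sin(0.1 * k)]])  # arbitrary input signal
    y = Cd @ x                         # measured output
    # observer update: x~(k+1) = Ad x~ + Bd u + L (y - Cd x~)
    xt = Ad @ xt + Bd @ u + L @ (y - Cd @ xt)
    x = Ad @ x + Bd @ u                # plant update

err = float(np.linalg.norm(x - xt))    # e(k) obeys e(k+1) = (Ad - L Cd) e(k)
```

Since the error obeys the homogeneous recursion above, the estimate converges to the true state at a rate set by the eigenvalues of $A^d - LC^d$, regardless of the input.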

The matrix inequalities systems for the observer synthesis are obtained using the generalized Lyapunov theorem [21]

$$\begin{cases} X \succ 0, \\ R(X, L) = L\_{\mathbb{D}} \otimes X + M\_{\mathbb{D}} \otimes \left( X(A^d - LC^d) \right) + M\_{\mathbb{D}}^{\mathrm{T}} \otimes \left( X(A^d - LC^d) \right)^{\mathrm{T}} \prec 0, \end{cases}$$

where the symbol "⊗" denotes the Kronecker product.

The poles of the observer are placed in the D-region formed by the disk with the characteristic function

$$F\_{\mathbb{D}}(s) = L\_{\mathbb{D}} + sM\_{\mathbb{D}} + \bar{s}M\_{\mathbb{D}}^{\mathrm{T}} \prec 0,$$

where

$$L\_{\mathbb{D}} = \begin{bmatrix} -0.5 & 0\\ 0 & -0.5 \end{bmatrix}, \quad M\_{\mathbb{D}} = \begin{bmatrix} 0 & 1\\ 0 & 0 \end{bmatrix}. \tag{10}$$

The choice of this D-region is driven, on the one hand, by the need to provide shorter transient times in the observer than in the plant model; on the other hand, the D-region should not be too small, since otherwise it would be impossible to find a solution of the LMI system for the array of plant models.

The synthesized observer should accurately estimate the states of each LTI model from (9), which is obtained from the LPV model (8) for the *m*th plasma discharge. In addition, the same observer should accurately estimate the states for several LPV models corresponding to several discharges. In this way, robust performance of the synthesized observer is achieved.

The LMI system for obtaining the observer matrix for the array of models in state-space form (9), with the substitution *V* = *XL*, is as follows

$$\begin{cases} X \succ 0, \\ R\_1(X, V) = L\_\mathbb{D} \otimes X + M\_\mathbb{D} \otimes \left( X A\_{11} \right) + M\_\mathbb{D}^\mathsf{T} \otimes \left( X A\_{11} \right)^\mathsf{T} \\ \qquad \quad - M\_\mathbb{D} \otimes \left( V \mathbb{C}\_{11} \right) - M\_\mathbb{D}^\mathsf{T} \otimes \left( V \mathbb{C}\_{11} \right)^\mathsf{T} \prec 0, \\ \qquad \quad \vdots \\ R\_r(X, V) = L\_\mathbb{D} \otimes X + M\_\mathbb{D} \otimes \left( X A\_{\text{nm}} \right) + M\_\mathbb{D}^\mathsf{T} \otimes \left( X A\_{\text{nm}} \right)^\mathsf{T} \\ \qquad \quad - M\_\mathbb{D} \otimes \left( V \mathbb{C}\_{\text{nm}} \right) - M\_\mathbb{D}^\mathsf{T} \otimes \left( V \mathbb{C}\_{\text{nm}} \right)^\mathsf{T} \prec 0, \end{cases} \tag{11}$$

where *n* = 1, . . . , *Nm*, *m* = 1, . . . , *M* and *r* = 1, . . . , *NmM*.

The LMI system (11) includes *NmM* + 1 LMIs, and it must be solved with respect to the two unknown matrices, *X* and *V*. The matrix of the observer is defined as

$$L = X^{-1}V.$$
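The structure of these inequalities can be checked numerically: with $L\_{\mathbb{D}}$ and $M\_{\mathbb{D}}$ from (10), each inequality in (11) collapses to a symmetric $2n \times 2n$ block matrix. The sketch below (pure numpy, an illustrative 2-state model and a hand-picked gain rather than an LMI solver) builds a feasible *X* from a Stein equation and verifies the inequality:

```python
import numpy as np

r = 0.5  # radius of the disk D-region defined by (10)

# Illustrative 2-state discrete-time model and a hand-picked observer gain
# (stand-ins, not a Globus-M2 model or an LMI-solver result)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.9], [1.6]])           # eig(A - L C) = 0.4 (double), inside the disk

Acl = A - L @ C
M = Acl / r                            # scaled closed loop, rho(M) < 1

# Build a feasible X from the Stein equation X - M^T X M = Q, Q > 0,
# via the convergent series X = sum_k (M^T)^k Q M^k
Q = np.eye(2)
X = np.zeros((2, 2))
T = Q.copy()
for _ in range(200):
    X += T
    T = M.T @ T @ M

# With L_D = -r*I and M_D = [[0, 1], [0, 0]] from (10), the Kronecker sum in
# (11) collapses to the symmetric block matrix [[-r X, Y], [Y^T, -r X]]
Y = X @ Acl
block = np.block([[-r * X, Y], [Y.T, -r * X]])
eigs = np.linalg.eigvalsh(block)       # all negative <=> the inequality holds
```

By the Schur complement, this block inequality is equivalent to $(A^d - LC^d)^{\mathrm{T}} X (A^d - LC^d) \prec r^2 X$, i.e., to the observer poles lying inside the disk of radius *r*.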

Finally, the gap estimation vector *δg*˜ is obtained as follows

$$
\delta \tilde{g} = S\_g \tilde{x},
$$

where *Sg* (Figure 5) is the gaps estimation selection matrix from the state vector estimation

$$S\_g = \begin{bmatrix} I\_6 & 0\_{6,18} \end{bmatrix},$$

where *I*6 is the identity matrix and 06,18 is the zero matrix of the appropriate size.

The comparison of the gap variations *δg* derived from the LPV model obtained with the FCDI code and the gap variation estimates *δg*˜ produced by the robust observer synthesized via LMIs for plasma discharge #37263 is shown in Figure 6.

**Figure 6.** Comparison of gaps variations *δg* derived by LPV model obtained from the FCDI code (blue line) and estimation of gap variations *δg*˜ obtained from robust observer synthesized by LMIs (red line). Globus-M2 discharge #37263.

#### **6. Static Matrix Plasma Shape Identification**

The idea of static matrix plasma shape estimation is that, at any time moment $j \in [1; s\_i]$ of any discharge $i \in [1; 50]$, the plasma shape estimate $\hat{g}\_j^{(i)} \in \mathbb{R}^6$ may be obtained by multiplying the measured signals at this time moment, $y\_j^{(i)} \in \mathbb{R}^{33}$, by a matrix $K \in \mathbb{R}^{6\times 33}$ and adding the base gap values $\tilde{g} \in \mathbb{R}^6$. As this takes place, the matrix $K$ and the vector $\tilde{g}$ are constant for all plasma discharges:

$$\hat{g}\_j^{(i)} = K y\_j^{(i)} + \tilde{g}, \quad K = \text{const}, \; \tilde{g} = \text{const}, \; \hat{g}\_j^{(i)} \in \mathbb{R}^6.\tag{12}$$

The matrix $K$ and vector $\tilde{g}$ are obtained by minimizing the sum of squared differences between the estimated and reconstructed values of all 6 gaps in all 50 discharges at all time moments:

$$E(K, \tilde{g}) = \sum\_{i=1}^{50} \sum\_{k=1}^{6} \sum\_{j=1}^{s\_i} \left(\hat{g}\_{jk}^{(i)} - g\_{jk}^{(i)}\right)^2 = \sum\_{i=1}^{50} \sum\_{k=1}^{6} \sum\_{j=1}^{s\_i} \left(\left(K y\_j^{(i)} + \tilde{g} - g\_j^{(i)}\right)\_k\right)^2 \xrightarrow[K, \tilde{g}]{} \min,\tag{13}$$

where $\hat{g}^{(i)} = [\hat{g}\_1^{(i)}, \hat{g}\_2^{(i)}, \ldots, \hat{g}\_{s\_i}^{(i)}] \in \mathbb{R}^{6\times s\_i}$ is the matrix of gap estimates for the $i$th discharge, and $g^{(i)} = [g\_1^{(i)}, g\_2^{(i)}, \ldots, g\_{s\_i}^{(i)}] \in \mathbb{R}^{6\times s\_i}$ is the matrix of reconstructed gap values for the $i$th discharge. If $y^{(i)} = [y\_1^{(i)}, y\_2^{(i)}, \ldots, y\_{s\_i}^{(i)}] \in \mathbb{R}^{33\times s\_i}$ is the matrix of measured signals of the $i$th discharge and $r^{(i)}(\tilde{g}) = [\tilde{g}, \tilde{g}, \ldots, \tilde{g}] \in \mathbb{R}^{6\times s\_i}$ is the matrix whose columns all equal $\tilde{g}$, then Equation (12) can be rewritten in matrix form,

$$\hat{g}^{(i)} = K y^{(i)} + r^{(i)}(\tilde{g}), \; \hat{g}^{(i)} \in \mathbb{R}^{6 \times s\_i}.\tag{14}$$

Equation (13) can be rewritten in matrix form as follows:

$$E(K, \tilde{g}) = \sum\_{i=1}^{50} \left\|\hat{g}^{(i)} - g^{(i)}\right\|^2 \xrightarrow[K, \tilde{g}]{} \min. \tag{15}$$

Let $Y = [y^{(1)}, y^{(2)}, \ldots, y^{(50)}] \in \mathbb{R}^{33\times S}$, where $S = \sum\_{i=1}^{50} s\_i$; $G = [g^{(1)}, g^{(2)}, \ldots, g^{(50)}] \in \mathbb{R}^{6\times S}$; and $\hat{G} = [\hat{g}^{(1)}, \hat{g}^{(2)}, \ldots, \hat{g}^{(50)}] \in \mathbb{R}^{6\times S}$. Matrix $Y$ contains all the measurements, and matrix $G$ all the reconstructed gaps. Since $\hat{G} = [\hat{g}^{(1)}, \hat{g}^{(2)}, \ldots, \hat{g}^{(50)}] = [Ky^{(1)} + r, Ky^{(2)} + r, \ldots, Ky^{(50)} + r]$, problems (14) and (15) are equivalent to

$$\begin{aligned} \hat{G}(K,\tilde{g}) &= KY + R(\tilde{g}), \; \hat{G} \in \mathbb{R}^{6 \times S}, \\ E(K,\tilde{g}) &= \| \hat{G}(K,\tilde{g}) - G \|^{2} \xrightarrow[K, \tilde{g}]{} \min \end{aligned} \implies \begin{aligned} K(\tilde{g}) &= (G - R(\tilde{g}))Y^{+}, \\ Y^{+} &= Y^{\mathrm{T}}(YY^{\mathrm{T}})^{-1}, \end{aligned} \tag{16}$$

where $R(\tilde{g}) = [\tilde{g}, \tilde{g}, \ldots, \tilde{g}] \in \mathbb{R}^{6\times S}$ is the matrix whose columns all equal $\tilde{g}$. If $\tilde{g}$ is known, the problem is an overdetermined system of linear equations that can be solved with the generalized inverse matrix: $K(\tilde{g}) = (G - R(\tilde{g}))Y^{+}$. Then $\hat{G}(\tilde{g}) = K(\tilde{g})Y + R(\tilde{g})$, and problem (16) is equivalent to:

$$E(\tilde{g}) = \left\|\hat{G}(\tilde{g}) - G\right\|^2 \xrightarrow[\tilde{g}]{} \min. \tag{17}$$

This problem can be solved by the iterative gradient method:

$$
\tilde{g}' = \tilde{g} - \gamma \nabla E(\tilde{g}).\tag{18}
$$
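Steps (16)–(18) can be sketched in numpy on synthetic stand-in data (random matrices in place of the real *Y* and *G*, which come from the FCDI reconstructions; the step size and iteration count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
S = 500                                    # total number of time samples

# Synthetic stand-ins for the FCDI data: Y (33 x S) measurements, G (6 x S) gaps
K_true = rng.normal(size=(6, 33))
g_true = rng.normal(size=(6, 1))
Y = rng.normal(size=(33, S))
G = K_true @ Y + g_true                    # noise-free for illustration

Yp = Y.T @ np.linalg.inv(Y @ Y.T)          # right pseudoinverse Y+
ones = np.ones((S, 1))

g = np.zeros((6, 1))                       # initial guess for the base gaps g~
gamma = 0.4 / (ones.T @ (ones - Yp @ (Y @ ones))).item()   # heuristic step size
for _ in range(50):
    R = g @ ones.T                         # R(g~): repeated columns g~
    K = (G - R) @ Yp                       # optimal K for the current g~, Eq. (16)
    res = K @ Y + R - G                    # residual between estimate and data
    grad = 2 * res @ ones                  # gradient of E(g~) in (17)
    g = g - gamma * grad                   # gradient step, Eq. (18)

K = (G - g @ ones.T) @ Yp                  # final static matrix
```

On this noise-free example the iteration recovers the base gap vector and the static matrix exactly; with real, noisy reconstructions the same scheme yields the least-squares fit.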

This procedure is used to obtain the matrix *K* and the base gap values from the data of 50 discharges of the Globus-M2 tokamak. The calculated base gap values are

$$\begin{aligned} \tilde{g}\_1 &= 0.5235 \text{ m}, \; \tilde{g}\_2 = 0.6264 \text{ m}, \; \tilde{g}\_3 = 0.0217 \text{ m}, \\ \tilde{g}\_4 &= 0.1058 \text{ m}, \; \tilde{g}\_5 = 0.1590 \text{ m}, \; \tilde{g}\_6 = 0.0278 \text{ m}. \end{aligned} \tag{19}$$

The obtained matrix *K* and base gap values are tested on discharge #37712, which was not used for identification (Figure 7). The mean squared error (MSE) of all gap estimates over all time moments is 1.5 × 10<sup>−5</sup> m<sup>2</sup>.

**Figure 7.** Estimation of the gap values in the discharge #37712 of the Globus-M2 spherical tokamak.

#### **7. Neural Network for Plasma Shape Identification**

In this section, an identification system based on an artificial neural network is proposed. It is assumed that the input and output data can be linked by some unknown function *f*. Neural networks are well known for their ability to approximate unknown functions [22]. Attempts to apply them to plasma research in tokamaks began as early as the 1990s. Several major results have been achieved, including plasma equilibrium reconstruction tasks [23–26]. However, the vast majority of studies use feed-forward neural networks with multiple hidden layers to approximate the unknown mapping function, and these have not generalized well to previously unseen discharges in this problem. To improve this ability, this paper proposes an approach using an encoder–decoder network structure [27].

Neural networks are based on the concept of artificial neurons. The first such concept, the perceptron, was proposed by Rosenblatt [28]. It receives inputs (*X*1, *X*2, ... , *Xn*) and forms their weighted sum with weights (*W*1, *W*2, ... , *Wn*). Then a special function, called the transfer function, is applied to this weighted sum; its result is the output of the neuron. The simplest neural network, the multilayer perceptron, consists of three layers of neurons: the first receives the input data, the second (hidden) layer processes these data, and the third is the output layer (Figure 8).

To approximate an unknown function *f*, a neural network needs to be trained on given data. A better approximation is achieved by adjusting the weights of the network's neurons to minimize the value of the loss function, which is computed between the network output and the ground-truth values.

In this work, the input and output data are represented as time sequences. Each point in time corresponds to a vector of features, so the data can be described by a matrix with dimensions of time intervals and parameters. The dimensionality in time is equal to 4110, i.e., there are 4110 data vectors per discharge. The time step is 6.38 × 10<sup>−5</sup> s, so the entire signal spans 0.262154 s. The variation in the absolute values of the various parameters, particularly the coil currents, is quite large, so normalized values are used. They are obtained by subtracting the average value over the entire time sequence from each value of a particular parameter and then dividing the result by the standard deviation.
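The normalization described above is a per-parameter z-score; a minimal numpy sketch (the time × parameters array layout is an assumption for the example):

```python
import numpy as np

def normalize(signal):
    """Z-score each parameter (column) over the entire time sequence:
    subtract the temporal mean, divide by the temporal standard deviation."""
    return (signal - signal.mean(axis=0)) / signal.std(axis=0)

# Toy array: 3 time points x 2 parameters with very different magnitudes
x = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
xn = normalize(x)   # each column now has zero mean and unit variance
```

This puts coil currents and magnetic fluxes on a common scale before they are fed to the network.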

However, the time dimension length is not fixed, as the plasma shape parameters are only determined during the divertor phase, when the strike points and the corresponding values *g*1 and *g*2 exist. The start time and duration of this phase are not known from the diagnostics available for real-time reconstruction (currents and magnetic fluxes) and require further determination.

**Figure 8.** Multilayer perceptron.

In general, the plasma shape identification problem is dynamic, i.e., the gap values at some point in time during the divertor phase depend not only on the values of the magnetic fluxes and coil currents at the same point in time but also on the values at previous time steps. Therefore, to determine the required parameters, it is advisable to use recurrent neural networks. However, it is not practical to feed the entire input signal into such a network, for several reasons. First, the longer the sequence of data fed to the recurrent network input, the longer the training and prediction processes, which are important factors in real-time identification. Second, the coordinate values are only significantly affected by data over a relatively small time range. Based on this, the task can be divided into two subtasks. The first is to determine whether a given moment in time belongs to the divertor phase. The second is the calculation of the required parameters during the already identified divertor phase. The first subtask can be solved with a simple feed-forward network without recurrence blocks because it is a classification problem, not a regression problem, unlike the second one. In addition, the first subtask is only needed to limit the length of the input signal to the recurrent network, simultaneously increasing system speed and improving positioning accuracy.

The values of the magnetic fluxes through the loops and the coil currents are fed into the network separately. Each input is processed by a densely connected layer, whose outputs are then concatenated. The merged result is fed into two densely connected layers with a dropout between them. This solution is designed to combat overfitting, which has a significant impact in this task because the signals provided for training have a similar structure. The output of the network is the probability that the current time moment belongs to the divertor phase (Figure 9).

**Figure 9.** FNN model.

After this, binary cross-entropy is used as the loss function to measure the difference between the network output and the training data

$$\text{BCE}(\Theta, y) = -\left(y \log(x) + (1 - y) \log(1 - x)\right),$$

where Θ denotes the neural network parameters, *x* is the network's output value, and *y* is the label. The sigmoid function is taken as the transfer function of the output neuron, and ReLU is used for the hidden-layer neurons

$$\text{sigm}(S) = \frac{1}{1 + e^{-S}}.$$
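The loss and the output transfer function can be written out in a few lines of numpy (a minimal sketch of the two formulas above; the clipping constant `eps` is an illustrative numerical guard, not part of the original formulation):

```python
import numpy as np

def sigmoid(s):
    """Transfer function of the output neuron."""
    return 1.0 / (1.0 + np.exp(-s))

def bce(x, y, eps=1e-12):
    """Binary cross-entropy between the network output x in (0, 1)
    and the divertor-phase label y in {0, 1}."""
    x = np.clip(x, eps, 1.0 - eps)   # guard the logarithms against log(0)
    return -(y * np.log(x) + (1.0 - y) * np.log(1.0 - x))
```

For a confident correct prediction the loss is small, e.g., `bce(sigmoid(4.0), 1.0)` ≈ 0.018, while a confident wrong prediction is penalized heavily.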

The Adam [29] optimization algorithm with learning rate *α* = 0.0001 is used to minimize the loss function. Training is performed on 50 discharges, and the remaining one is left for testing. To measure how often the output values match the ground-truth values, the binary accuracy function is utilized. The obtained accuracy of divertor phase identification over all time points equals 0.986 (Figure 10).

**Figure 10.** Divertor phase of the test discharge #37270.

The second subtask is to determine the required gap values during the divertor phase. As mentioned above, a recurrent neural network based on an encoder–decoder architecture [27] can handle this problem. This type of network allows capturing temporal dependencies in both the input and output data and building a mapping between them. The first major block, the encoder, creates a state describing the input signal, and the second block, the decoder, is responsible for mapping these data into an output sequence. Both the encoder and the decoder consist of GRU cells [30].

Figure 11 shows the schematic diagram of the encoder–decoder network. The network input is divided into two parts: the encoder input and the decoder input. The input signal is a sequence of vectors with the values of magnetic fluxes and currents, and it is applied to the encoder input. The decoder input can vary; therefore, it is best to set the decoder input to 0, which makes the decoder work only with the dependencies passed to it by the encoder.

**Figure 11.** Encoder–decoder model.

The MSE as the loss function and the linear function as the transfer function of the output neuron give the best performance for this regression problem:

$$\text{MSE}(\tilde{g}, g) = \frac{1}{N} \sum\_{i=1}^{N} (\tilde{g}\_i - g\_i)^2, \tag{20}$$

where *g*˜*i* are the network's estimates of the gaps and coordinates, *gi* are the ground-truth values, and *N* is the number of values.

This network is also trained on 50 signals and tested on the remaining one. Figure 12 shows the results for the required plasma parameters during the divertor phase of the discharge #37270.

The deviation is calculated for all values of each gap using the MSE. The results obtained are of the order of 10<sup>−5</sup>.

**Figure 12.** Neural network estimation of the gap values. Discharge #37270 of the Globus-M2 spherical tokamak.

#### **8. Real Time Simulation of Identification Systems**

To develop and implement plasma control systems for tokamaks, it is effective to apply so-called digital twins with real and digital control systems (Figure 13). This idea is used intensively in industry because it gives many advantages for the design, modeling, and application of control systems in real time. The digital twin is an interface between the digital and real worlds because it can link the physical and virtual worlds in real time, which provides a more realistic and holistic assessment of unforeseeable and unpredictable scenarios [31].

All signals from the magnetic diagnostics system of the tokamak are analog; they are digitized by passing through an analog-to-digital converter (ADC). We can simulate the signal digitization process on our real-time test bed (Figures 14 and 15).

**Figure 13.** Digital twin containing a real dynamical plant with real feedback controller and virtual dynamical plant with a real feedback controller closed by the feedback of information flows: real-time data and algorithms, commands, adaptations, and recommendations.

**Figure 14.** Real-time test bed for plasma control in tokamaks. The test bed consists of two Speedgoat performance real-time target machines that are connected in feedback: one computer plays the role of the controlled plant model and the other one is the MIMO controller (https://www.ipu.ru/presscenter/62866, accessed on 21 December 2021).

**Figure 15.** Scheme of digital twin of Globus-M2. The block with plasma equilibrium reconstruction algorithm is marked by red color.

In Figure 13, one can see the digital twin containing the real dynamical plant with the real feedback controller in the real space and the virtual dynamical plant with the real feedback controller in the virtual space. Between these spaces, there is a feedback of information flows that offers the opportunity to use the results obtained on the digital feedback system for the real control system, and vice versa. The data flows between an existing physical object and a digital object are fully integrated in both directions, which one might refer to as a digital twin [32]. The digital twin in this paper consists of the spherical Globus-M2 tokamak with a plasma feedback control system and a test bed with a digital controlled plant model and a feedback controller. The test bed was created by Lomonosov Moscow State University, the Trapeznikov Institute of Control Sciences (Moscow), and the Ioffe Institute (St. Petersburg). A photo of the test bed for Globus-M2 operating in real time is given in Figure 14. In Figure 15, one can see the test bed scheme in detail, consisting of the digital plant model and the digital controller with an internal plant model for controller tuning. The digital plant model contains the plasma model in the tokamak and a set of feedback loops for plasma horizontal and vertical position control and for current control in the poloidal field coils. The digital controller contains the plasma equilibrium reconstruction algorithm as well as the plasma current and shape controller.

This connection of the two real-time target machines is real and reliable. The real-time test bed is kept away from sources of powerful electromagnetic radiation, and all of its components are well protected by shielding and grounding.

The two identification algorithms described in this paper are run on the real-time test bed. Figure 16 shows the real-time operation of the robust observer synthesized via LMIs. Figure 17 shows the real-time operation of the identification algorithm with the static matrix. Real-time simulations are performed with a sample time of 0.1 ms.

These signals demonstrate the workability of the two new approaches to reconstructing the plasma equilibrium in real time on the test bed, which is the key value of the results shown in Figures 16 and 17.

**Figure 16.** Real-time simulation of gaps variations *δg* derived by LPV model obtained from the FCDI code (blue line) and estimation of gap variations *δg*˜ obtained from robust observer synthesized by LMIs (red line). Discharge #37263.

**Figure 17.** Real-time simulation of gaps *g* derived by LTI model obtained from the FCDI code (blue line) and estimation of gaps *g*˜ obtained from static matrix (red line). Discharge #37263.

#### **9. Comparison of Identification Algorithms**

Table 1 shows the comparison results of the different gap identification algorithms. For each gap *g* and an estimate of this gap *g*˜, the value of the MSE (20) is calculated. For each algorithm, the value of the TET (task execution time) in the real-time simulation is given.


**Table 1.** Comparison of identification results.

The FCDI code has an execution time of approximately 25 ms, which is too slow for real-time applications at the Globus-M2 tokamak. To apply a plasma shape identification algorithm in real time, the algorithm must have an execution time of less than 1 ms, preferably less than 0.1 ms. All algorithms in Table 1 satisfy this criterion.

The MSE of the robust observer is 100 times smaller than that of the other algorithms. This advantage is due to the fact that the observer is a dynamic model in state-space form with 24 states. It contains 24 integrators, with the help of which the error between the states of the plant model and the state estimates at the observer's output is quickly minimized. The time it takes to minimize the error is determined by the location of the observer's poles, which are defined by the D-region (10).
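The role of the observer's integrators and pole locations can be illustrated with a minimal Luenberger-observer error-dynamics sketch. The two-state matrices and the hand-picked gain below are purely illustrative (not the actual 24-state model, and the gain is not synthesized via LMIs as in the paper):

```python
import numpy as np

# Illustrative 2-state plant with one measurement (not the tokamak model)
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
C = np.array([[1.0, 0.0]])

# Hand-picked observer gain so that A - L C is stable; in the paper the
# poles are instead constrained to a D-region via LMIs.
L = np.array([[7.0], [9.0]])

# Error dynamics de/dt = (A - L C) e: the error decays at a rate set by
# the observer poles, here Re(lambda) = -5.
Ae = A - L @ C
e = np.array([1.0, -1.0])
dt = 1e-4  # 0.1 ms step, matching the real-time sample time in the paper
for _ in range(50000):  # 5 s of simulated time, explicit Euler
    e = e + dt * (Ae @ e)

assert np.all(np.real(np.linalg.eigvals(Ae)) < 0)  # observer is stable
assert np.linalg.norm(e) < 1e-6                    # error has converged
```

Faster observer poles (poles further left in the D-region) shorten the convergence time at the cost of larger gains.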

The disadvantages of using the observer include the fact that it requires scenario values of currents and fluxes, i.e., values relative to which the gap deviations are calculated. The other algorithms use the full values of experimental signals as inputs. The synthesis of the observer is possible only in the presence of linear models of the plasma in the tokamak such as (9), and linear plant models can be derived only for deviations of currents and fluxes from the scenario values.

The fastest of these algorithms for estimating the gaps between the plasma boundary and the first wall is the static matrix algorithm, with a TET of 6.3 μs, because it is the simplest and requires only a matrix-vector multiplication. The neural network algorithm is attractive because it can be adapted to a large number of discharges during experiments.
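The static-matrix approach reduces, in essence, to one offline least-squares fit over a discharge database and one online matrix-vector product. The sketch below uses hypothetical dimensions and synthetic data (the matrix sizes and fitting setup are illustrative, not the actual Globus-M2 configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 60 magnetic measurements (flux-loop fluxes, CS/PF
# coil currents, plasma current) mapped to 6 gaps.
n_meas, n_gaps = 60, 6

# Offline step (sketch): given measurements Y (n_meas x N samples) and the
# gaps G (n_gaps x N) produced by an off-line reconstruction code over a
# discharge database, fit the static matrix as a least-squares map G ~ M Y.
Y = rng.standard_normal((n_meas, 500))
M_true = rng.standard_normal((n_gaps, n_meas))  # "ground truth" for the demo
G = M_true @ Y
M = G @ np.linalg.pinv(Y)

# Real-time step: the entire computation is one matrix-vector product.
y = rng.standard_normal(n_meas)
g_est = M @ y

assert np.allclose(M, M_true)       # exact recovery in this noise-free demo
assert g_est.shape == (n_gaps,)
```

The online cost is a single `M @ y`, which is why the static matrix achieves the smallest TET among the compared algorithms.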

#### **10. Discussion**

In this work, the authors developed an original direction of plasma equilibrium reconstruction in D-shaped tokamaks using the magnetic measurements outside the hot plasma [33]. The basic criteria of this development are speed of response and accuracy. In practice, there is a set of such approaches, mentioned in the Introduction, which are used on working D-shaped tokamaks all over the world. Some of them use Picard iterations or current-filament methods. However, they rely only on the measurements outside the plasma, and most do not use the information from the database of previous plasma discharges. If one uses this information, it may be possible to increase the speed of plasma equilibrium reconstruction. Moreover, when the history of the plasma discharges is used, one can apply various reconstruction approaches, from very simple ones, such as approximation with static matrices, to complex ones, such as artificial neural networks, which can be adjusted by and learn from dynamic processes. This gives a chance not only to improve the process of plasma identification on-line, but also to understand the patterns of plasma processes from the experiment. These patterns cannot be deduced from the theory of high-temperature plasma physics because plasma in a magnetic field is an extremely complicated object. They represent the relationships between the gaps, which are the outputs of the plasma equilibrium reconstruction algorithm applied off-line to a set of plasma discharges, and the inputs of this algorithm. The input signals are the measured fluxes on the magnetic loops, the currents in the CS/PF coils, and the plasma current. One can then use these patterns to apply plasma reconstruction algorithms with the highest speed of response, e.g., state observers, static matrices, neural networks, and others. This activity is similar to machine learning techniques, where the search process is automated on big data [34]. In the future, we can use these patterns for effective plasma control system design with fast plasma equilibrium reconstruction algorithms in the feedback, using our new test bed for the installation of plasma control systems in real time on operating tokamaks [16].

#### **11. Conclusions**

Progress on the fusion problem is moving forward, but not very quickly, because the plasma in tokamaks is an extremely complicated plant. In spite of that, the technologies in this field have made great progress, and new technologies are appearing. One of these directions is plasma diagnostics, to which our research belongs. The plasma equilibrium reconstruction algorithms, such as the ones using a static matrix, a state observer, and an artificial neural network, can be included in the feedback of plasma shape control. The first real-time test of these algorithms was performed on the digital model of the plasma shape control system (Figures 2 and 15). After that, the control system can be used in a real experiment on the Globus-M2 tokamak by means of a controller based on the third machine of the digital complex shown in Figures 14 and 15. The third machine will be connected to the tokamak as the control unit of the real control system, as in Figure 13. The real control system will interact with the virtual control system shown in Figures 14 and 15, realizing the concept of the digital twin shown in Figure 13. This approach is in line with the digital twins applied in Industry 4.0 [35].

In any case, there is a critical point in this new identification approach: it greatly increases the response rate of plasma equilibrium reconstruction, but the estimation accuracy may not be as high as, for example, in the filament (current rings) approach. Thus, the designer of the magnetic plasma control system should choose what is more adequate for the specific control problem, since the plasma equilibrium reconstruction algorithm is included in the feedback (Figure 15).

#### **12. Patents**

The authors received RF patent #2702137, with priority from 28 April 2017, on the approach of modeling plasma magnetic control systems with a plasma equilibrium reconstruction algorithm in the feedback [36]. The next RF patent application, number 2021128495, was submitted for the structure and approach of the digital test bed.

**Author Contributions:** Conceptualization, Y.V.M.; methodology, Y.V.M. and P.S.K.; software, P.S.K., A.E.K., V.I.K. and N.E.O.; validation, P.S.K., A.E.K., V.I.K. and N.E.O.; formal analysis, P.S.K., A.E.K., V.I.K. and N.E.O.; investigation, P.S.K., A.E.K., V.I.K. and N.E.O.; resources, P.S.K., A.E.K., V.I.K. and N.E.O.; data curation, P.S.K., A.E.K., V.I.K. and N.E.O.; writing—original draft preparation, Y.V.M., P.S.K., A.E.K., V.I.K. and N.E.O.; writing—review and editing, Y.V.M. and P.S.K.; visualization, Y.V.M., P.S.K., A.E.K., V.I.K. and N.E.O.; supervision, Y.V.M.; project administration, Y.V.M.; funding acquisition, Y.V.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Russian Science Foundation (RSF) under grant number 21-79-20180.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Restrictions apply to the availability of these data. Data was obtained from Ioffe Physics and Technology Institute of RAS (St. Petersburg, Russia) and are available from: https://globus.rinno.ru (accessed on 21 December 2021) with the permission of Ioffe Physics and Technology Institute of RAS.

**Acknowledgments:** The authors are grateful to their colleagues from Ioffe Institute in St. Petersburg (Russia) for their help in supporting us with the experimental data of plasma behavior in the tokamak Globus-M/M2 in line with the grant of Russian Science Foundation #21-79-20180.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


### *Article* **Robust Stabilization via Super-Stable Systems Techniques**

**Svetlana A. Krasnova, Yulia G. Kokunko, Victor A. Utkin and Anton V. Utkin \***

V.A. Trapeznikov Institute of Control Sciences of RAS, 117997 Moscow, Russia; skrasnova@list.ru (S.A.K.); juliakokunko@gmail.com (Y.G.K.); viktorutkin013@gmail.com (V.A.U.)

**\*** Correspondence: utkin-av@rambler.ru; Tel.: +7-(495)-198-17-20 (ext. 1577)

**Abstract:** In this paper, we propose a direct method for the synthesis of robust systems operating under parametric uncertainty of the control plant model. The developed robust control procedures are based on the assumption that the structural properties of the nominal system are conserved over the entire range of parameter changes. The transformation of the initial model to a regular form, invariant to parametric uncertainties, makes it possible to use the concept of super-stable systems for the synthesis of a stabilizing feedback. It is essential that the synthesis of super-stable systems is carried out not on the basis of assigning eigenvalues to the matrix of the closed-loop system, but in terms of its elements. The proposed approach is applicable to a wide class of linear systems with parametric uncertainties and provides a given degree of stability.

**Keywords:** parametric uncertainty; robust control; super-stability; regular form; decomposition

#### **1. Introduction**

The problem of stabilizing the state variables of dynamic automatic control plants is a fundamental problem, the formulation and solution of which served as the basis for the formation and development of control theory. Classical methods of control theory, in particular modal control, are based on the assumption of an accurate description of the mathematical model of the control process and the environment of its operation. In reality, there is often parametric uncertainty in the mathematical model of control plants, in particular due to the discarding of residual terms of higher order in the linearized models. This leads to the need to consider a parametrically indeterminate model when synthesizing feedback and to set the robust control problem. Many researchers are currently paying increased attention to control problems in conditions of parametric uncertainty. The direct way to solve the stabilization problem is to obtain estimates of unknown parameters of the control plant model, either directly using the parametric identification theory [1,2], or indirectly, based on the adaptation theory [3,4]. After obtaining estimates of unknown parameters, it becomes possible to use well-developed modal control methods. Another trend in solving the problem of stabilization of parametrically uncertain systems refers to the currently actively developing theory of robust control, in which we can roughly define two main fields: problems of analysis and problems of synthesis. Classical methods for analyzing open-loop systems include results on interval stability of polynomials [5,6], robust frequency methods [7], the D-partition technique [8], *H*∞ optimization methods [9], and others. Direct and very effective methods of robust control include the use of sliding-mode technique [10] and deep feedback [11]. Note that both methods provide the independence of motions in the sliding mode (slow motions) only from the matching uncertainties. 
It should be noted that at the problem statement step of these approaches, usually no assumptions are made about the structural controllability properties of the system. These methods of robust theory allow us to establish only the fact of system stability and do not give a direct answer to the question of the nature of convergence, which reduces their practical value.

**Citation:** Krasnova, S.A.; Kokunko, Y.G.; Utkin, V.A.; Utkin, A.V. Robust Stabilization via Super-Stable Systems Techniques. *Mathematics* **2022**, *10*, 98. https://doi.org/ 10.3390/math10010098

Academic Editors: Natalia Bakhtadze, Igor Yadykin, Andrei Torgashov and Nikolay Korgin

Received: 18 November 2021 Accepted: 22 December 2021 Published: 28 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

This paper considers a different approach to robust stabilization, where a guaranteed stability margin for linear stationary systems with interval parameter uncertainty is achieved using linear state feedback. The methodological basis of the developed approach is the synthesis of super-stable closed-loop systems [12], while decomposition is based on the transformation of the control plant model to a regular form [13]. It is essential that in these approaches the results are expressed in terms of matrix elements rather than eigenvalues. This approach can be extended by using the block approach [14–16].

The paper has the following structure. Section 2 considers parametrically certain linear stationary systems. As a methodological basis for further discussion, the procedure of modal synthesis based on transformation to a regular form is presented. For the particular case of a regular form consisting of two elementary subsystems, we formalize a procedure for the synthesis of a stabilizing feedback that ensures super-stability of the closed-loop system in the new coordinate basis and a guaranteed stability margin in the initial system. Section 3 considers a practically significant class of linear stationary systems in which, for all values of uncertain parameters from intervals with known bounds, the structural controllability properties defined by the nominal system are conserved. For a class of systems with a controllability index equal to two, we formalize rank requirements on the structure of the uncertain matrices under which the uncertain system is reduced to a regular form regardless of the unknown parameters. Sufficient conditions for the feasibility of robust control are formalized, as is the procedure for synthesizing a stabilizing feedback. In this case, super-stability of the system is ensured in the coordinate basis of the regular form, and for the original system a given stability margin is provided over all intervals of uncertain parameters. Section 4 contains numerical examples illustrating the developed theoretical results.

#### **2. Parametrically Certain Systems**

*2.1. The Elementary Control Problem*

A mathematical model of a linear stationary control plant is considered

$$
\dot{x} = Ax + Bu, \tag{1}
$$

where $x \in R^n$ is the measurable state vector, $u = \text{col}(u\_1, \ldots, u\_m) \in R^m$ is the control vector; $A \in R^{n \times n}$, $B \in R^{n \times m}$ are constant known matrices, and the pair $(A, B)$ is controllable.

For system (1), there is a problem of stabilization by means of a linear static feedback

$$
u = Fx, \tag{2}
$$

resulting in a closed-loop system

$$
\dot{\mathbf{x}} = (A + BF)\mathbf{x} = A\_0 \mathbf{x}.\tag{3}
$$

Typical for a linear system is the modal control problem, in which the choice of the feedback matrix $F \in R^{m \times n}$ must assign a given spectrum $\sigma\_{\mathrm{d}}$ to the closed-loop matrix

$$\sigma\_{\mathbf{d}} = \sigma(A\_0) = \left\{ \lambda\_i \in \mathbb{C} : \det\left(\lambda\_i I\_n - A\_0\right) = 0, \, i = \overline{1, n} \right\}, \\ \text{Re}\lambda\_i(A\_0) < 0, \, i = \overline{1, n}, \quad \text{(4)}$$

which ensures asymptotic convergence of the state vector to the zero equilibrium position

$$\lim\_{t \to +\infty} x(t) = \vec{0}.$$

In Formula (4) and below, $I$ is the identity matrix of a given dimension.

In general, the following problems arise when solving the modal control problem:

(1) by assigning only eigenvalues in a closed system (3), it is not always possible to achieve the desired transients of the state variables;


The first two problems can be solved in some special cases of system (1). These include elementary systems with full-rank control.

**Definition 1.** *System* (1) *is called elementary if the number of controls in it is not less than the dimensionality of the state vector and the control matrix has a full rank:*

$$
\dim u = m \ge n = \dim x, \quad \text{rank} B\_{n \times m} = n.
$$

The synthesis problem in the elementary system is also called elementary, because the feedback matrix is directly found from the matrix equation *A* + *BF* = *A*0, such as

$$m > n : F = B^{+}(A\_0 - A) ; \; m = n : F = B^{-1}(A\_0 - A) , \tag{5}$$

where in the first expression $B^{+}$ is the pseudo-inverse of $B$, $BB^{+}B = B$. In the elementary system the rows of the matrix $B$ are linearly independent, hence $B\_{n \times m} B^{+}\_{m \times n} = I\_n$ and $B^{+} = B^T (BB^T)^{-1}$ [17].

Thus, in the elementary system, first, the synthesis problem (5) is solved in terms of matrix elements rather than eigenvalues. Second, one can easily provide the desired transients in all state variables by choosing a reference matrix of simple structure, in Jordan or diagonal form. In the latter case, the transient process of each state variable will be monotone with a given rate of convergence to zero, which is determined by the values of the diagonal elements of the reference matrix.
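For a concrete instance of (5) with $m = n$, the feedback matrix and the resulting closed-loop spectrum can be checked numerically (the matrices below are illustrative values, not taken from the paper):

```python
import numpy as np

# Elementary system: m = n = 2, B invertible, so F = B^{-1}(A0 - A), Eq. (5)
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[1.0, 0.0], [1.0, 1.0]])

# Diagonal reference matrix: monotone transients with rates 2 and 5
A0 = np.diag([-2.0, -5.0])

F = np.linalg.solve(B, A0 - A)   # solves B F = A0 - A, i.e. F = B^{-1}(A0 - A)

assert np.allclose(A + B @ F, A0)  # closed loop equals the reference matrix
assert np.allclose(sorted(np.linalg.eigvals(A + B @ F).real), [-5.0, -2.0])
```

With a diagonal reference matrix, each state converges monotonically at the rate set by its diagonal entry, exactly as the text describes.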

The advantages of systems with full control are obvious, but in practice, usually the control problem is not elementary. In the next subsection, the procedure of nonsingular linear transformations is given, which allows extracting an elementary subsystem with full control from the initial system of general form.

#### *2.2. Synthesis of Modal Control Based on a Regular Form*

We will consider the general case of system (1), where the number of controls is less than the dimension of the state vector and $0 < \text{rank} B\_{n \times m} = m\_0 \le m < n$, i.e., out of the $n$ rows of the matrix $B$ only $m\_0$ are basic. For such a system, there is an equivalent representation in a new coordinate basis, which is called a regular form (RF) with respect to the control [13,18]. In this form, the elementary subsystem with full control is singled out. The point of the corresponding linear nonsingular transformation is the grouping of basis rows and the zeroing of linearly dependent rows of the matrix $B$.

**Definition 2.** *A regular form with respect to the control vector is an equivalent representation of system* (1)*,* $\text{rank} B\_{n \times m} = m\_0 \le m < n$*, in the form of two subsystems*

$$\begin{array}{l} \dot{x}\_1 = A\_{11}x\_1 + A\_{12}x\_2, \\ \dot{x}\_2 = A\_{21}x\_1 + A\_{22}x\_2 + B\_2 u, \end{array} \tag{6}$$

*which are obtained as a result of nonsingular variable change*

$$Tx = \overline{x} = \left( \begin{array}{c} x\_1 \\ x\_2 \end{array} \right), \ \det T\_{(n \times n)} \neq 0, \ x\_1 \in R^{n-m\_0}, \ \text{rank} B = \text{rank} B\_2 = \dim x\_2 = m\_0$$

*and similarity transformation*

$$TAT^{-1} = \overline{A} = \begin{pmatrix} A\_{11(n-m\_0)\times(n-m\_0)} & A\_{12(n-m\_0)\times m\_0} \\ A\_{21(m\_0\times(n-m\_0))} & A\_{22(m\_0\times m\_0)} \end{pmatrix}, \\ TB = \overline{B} = \begin{pmatrix} O\_{(n-m\_0)\times m} \\ B\_{2(m\_0\times m)} \end{pmatrix}.$$

*Here and further in the text, O is the zero matrix of the corresponding dimension.*

The second subsystem of system (6) contains full-rank control, which is a condition for the solvability of the elementary control problem in this subsystem similarly to (5); the pair $(A\_{22}, B\_2)$ is obviously controllable. In the first subsystem of (6), which in the general case is not elementary, the vector $x\_2$ is considered a virtual control action. If in system (1) the pair $(A, B)$ is controllable, then due to the invariance of the controllability property to nonsingular linear transformations, the pair $(A\_{11}, A\_{12})$ in the first subsystem of (6) is also controllable.

Note that there can be several sets of basis rows in the matrix $B$, so in general there are several equivalent realizations of the regular form (6) for a particular system. They differ in the values of the elements of the matrices $\overline{A}$ and $B\_2$, but all have the same structure, in that the first subsystem has no control, and the second has dynamic order $m\_0$ and is elementary.

Based on the regular form, the problem of modal control synthesis is decomposed into two successively solvable subproblems of lesser dimensions than the original system. In the first subsystem of dimension $n - m\_0$ with virtual control $x\_2$, the problem of assigning a part of the given spectrum (4) is solved. The derived linear local feedback is introduced by a nonsingular linear transformation, and the assignment of the second part of the spectrum is provided by the real linear control $u$, meaning the elementary synthesis problem of dimension $m\_0$ is solved. As a result, a linear control law for the variables of the transformed system is obtained. Using the resulting transformation matrix, it should be expressed with respect to the state variables of the initial system in the form (2). By the invariance of the roots of the characteristic equation to nondegenerate linear transformations, the characteristic polynomials (and hence the spectra) of the closed-loop matrices of the initial and transformed systems are equal to each other. Let us present these transformations as a step-by-step description.

**Procedure 1.** Synthesis of modal control based on transition to a regular form.

1. Nonsingular transformation of system (1) to the regular form (6).

1.a. Grouping basis rows of the matrix *B* and forming matrix *B*2(*m*0×*m*).

If necessary, rearrange the rows of the matrix $B$ so that its last $m\_0$ rows are linearly independent, and perform the corresponding variable change, in which the transformation matrix is a permutation matrix $T\_{p(n \times n)}$, $\det T\_p \neq 0$:

$$T\_p B = \tilde{B} = \begin{pmatrix} \tilde{B}\_1 \\ B\_2 \end{pmatrix}, \quad T\_p x = \tilde{x} = \begin{pmatrix} \tilde{x}\_1 \\ x\_2 \end{pmatrix}, \quad \tilde{x}\_1 \in R^{n-m\_0}, \quad T\_p A T\_p^{-1} = \tilde{A} = \begin{pmatrix} \tilde{A}\_{11} & \tilde{A}\_{12} \\ \tilde{A}\_{21} & \tilde{A}\_{22} \end{pmatrix}, \tag{7}$$

$$\text{rank} B = \text{rank} B\_2 = \dim x\_2 = m\_0.$$

System (1) will be represented in the following equivalent form:

$$
\dot{\tilde{x}}\_1 = \tilde{A}\_{11}\tilde{x}\_1 + \tilde{A}\_{12}x\_2 + \tilde{B}\_1 u, \ \dot{x}\_2 = \tilde{A}\_{21}\tilde{x}\_1 + \tilde{A}\_{22}x\_2 + B\_2 u. \tag{8}
$$

If no permutations are required, then *Tp* = *I*, and to obtain the system (8), the appropriate notation is introduced.

1.b. Zeroing out the linearly dependent rows of a matrix *B*.

If in system (8) $\tilde{B}\_{1(n-m\_0) \times m} \neq O$, then the matrix $\tilde{B}\_1$, whose rows are linear combinations of the rows of the matrix $B\_2$, needs to be zeroed. This is achieved by the partial change of variables

$$x\_1 = \tilde{x}\_1 - B\_2^\* x\_2, \quad x\_1 \in R^{n-m\_0}. \tag{9}$$

In the new subsystem with respect to $x\_1$, the control must be absent, as follows:

$$\dot{x}\_1 = \dot{\tilde{x}}\_1 - B\_2^\* \dot{x}\_2 = (\tilde{A}\_{11} - B\_2^\* \tilde{A}\_{21}) \tilde{x}\_1 + (\tilde{A}\_{12} - B\_2^\* \tilde{A}\_{22}) x\_2 + (\tilde{B}\_1 - B\_2^\* B\_2) u \Rightarrow \tilde{B}\_1 - B\_2^\* B\_2 = O.$$

From the resulting matrix equation, we have

$$m\_0 < m : B\_2^\* = \tilde{B}\_1 B\_2^{+}, \ B\_{2(m \times m\_0)}^{+} = B\_2^T (B\_2 B\_2^T)^{-1}; \quad m\_0 = m : B\_2^\* = \tilde{B}\_1 B\_{2(m \times m)}^{-1}. \tag{10}$$

The corresponding transformation of partial variable change (9) has the form

$$T\_a \tilde{B} = T\_a \begin{pmatrix} \tilde{B}\_1 \\ B\_2 \end{pmatrix} = \overline{B} = \begin{pmatrix} O \\ B\_2 \end{pmatrix}, \ \det T\_{a(n \times n)} \neq 0, \ T\_a \tilde{x} = \overline{x} = \begin{pmatrix} x\_1 \\ x\_2 \end{pmatrix}, \ T\_a \tilde{A} T\_a^{-1} = \overline{A},$$

$$T\_a = \begin{pmatrix} I\_{n-m\_0} & -B\_{2(n-m\_0) \times m\_0}^\* \\ O\_{m\_0 \times (n-m\_0)} & I\_{m\_0} \end{pmatrix}, \quad T\_a^{-1} = \begin{pmatrix} I\_{n-m\_0} & B\_{2(n-m\_0) \times m\_0}^\* \\ O\_{m\_0 \times (n-m\_0)} & I\_{m\_0} \end{pmatrix} \tag{11}$$

and leads system (8) to the regular form (6). If in system (8) $\tilde{B}\_{1(n-m\_0) \times m} = O$, it corresponds exactly to the regular form (6) and $T\_a = I$.

The sequence of the above transformations of system (1) to the regular form (6) is

$$T\mathbf{x} = T\_a(T\_p\mathbf{x}) = \overline{\mathbf{x}}\text{, }T = T\_aT\_{p\text{.}}\tag{12}$$

where in some cases $T\_p = I$ and/or $T\_a = I$. Clearly, the equality $T\_p = T\_a = I$ occurs in mathematical models that are initially in the regular form (6), and this situation is typical of many practical applications.
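The zeroing step 1.b of Procedure 1 can be verified numerically on a small illustrative system (here $n = 3$, $m = m\_0 = 1$, no row permutation needed, so $T\_p = I$; the matrices are ours, chosen only to exercise Equations (9)–(11)):

```python
import numpy as np

# Illustrative system: all rows of B are multiples of its last row
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 2.0, 3.0]])
B = np.array([[0.5], [1.0], [2.0]])

n, m0 = 3, 1
B1t, B2 = B[: n - m0, :], B[n - m0 :, :]   # B~1 (2x1) and basic block B2 (1x1)
B2s = B1t @ np.linalg.inv(B2)              # B2* = B~1 B2^{-1}, Eq. (10), m0 = m

Ta = np.eye(n)                             # Ta from Eq. (11)
Ta[: n - m0, n - m0 :] = -B2s

Abar = Ta @ A @ np.linalg.inv(Ta)          # similarity transformation
Bbar = Ta @ B

assert np.allclose(Bbar[: n - m0], 0.0)    # control removed from first subsystem
assert np.allclose(Bbar[n - m0 :], B2)     # second subsystem keeps full-rank B2
```

After the transformation, the first subsystem is control-free and the system is in the regular form (6), as Definition 2 requires.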

**Procedure 2.** Decomposition synthesis of modal control based on RF.

2.a. Synthesis of fictitious control in the first RF subsystem.

We have to choose $n - m\_0$ values from the given spectrum $\sigma\_{\mathrm{d}}$ (4) so as not to split complex-conjugate pairs, if any. If odd $n - m\_0$ and/or $m\_0$ force a complex-conjugate pair to be split, then the decomposition has to be abandoned and a different synthesis method should be used; otherwise, this method would produce a complex feedback matrix (2), which is not acceptable in practical applications.

If the above choice is possible, in the first subsystem of system (6) we form a linear virtual control $x\_2 = F\_1 x\_1$ and obtain the local feedback matrix

$$F\_{1(m\_0 \times (n-m\_0))} : \ A\_1 = A\_{11} + A\_{12}F\_1, \quad \sigma(A\_1) \subset \sigma\_{\mathrm{d}}. \tag{13}$$

Due to the controllability of the pair $(A\_{11}, A\_{12})$, this problem has a solution. In the particular case $\text{rank} A\_{12} = \dim x\_1 = n - m\_0 \le m\_0$, when the first subsystem is also elementary, then similarly to (5) we can assign in it not only a given spectrum but a given reference matrix. In the general case, problem (13) is not elementary, but the dimensions of the desired matrix are smaller than when solving problem (3) in the original system (1), (2), where $\dim F = m \times n$.

**Remark 1.** *In many applications, the transition to the RF simplifies the synthesis procedure sufficiently, and it is possible simply to represent the initial system in the form of two subsystems. In the general case, for large-dimensional systems, one can continue the mentioned transformations and allocate in the first subsystem of* (6) *in a similar way an elementary subsystem with respect to the virtual control* $x\_2$*, etc. As a result, the first subsystem of system* (6) *will be represented as connected elementary subsystems (blocks) with full-rank virtual controls, which are the variables of the following block. The form in this case is called the block form of controllability, on the basis of which the synthesis problem is divided into consecutive elementary control problems* [14].

In order to implement the formed local relation of variables, we need to introduce a mismatch between the variable $x\_2$ and the selected virtual control by means of the partial variable change

$$x\_2 = F\_1 x\_1 : \quad e\_1 := x\_1, \ e\_2 := x\_2 - F\_1 x\_1, \ e\_2 \in R^{m\_0} \tag{14}$$

and the corresponding linear transformation

$$T\_e \overline{x} = e = \begin{pmatrix} e\_1 \\ e\_2 \end{pmatrix}, \ T\_e = \begin{pmatrix} I\_{n-m\_0} & O\_{(n-m\_0) \times m\_0} \\ -F\_{1(m\_0 \times (n-m\_0))} & I\_{m\_0} \end{pmatrix}, \ T\_e^{-1} = \begin{pmatrix} I\_{n-m\_0} & O\_{(n-m\_0) \times m\_0} \\ F\_{1(m\_0 \times (n-m\_0))} & I\_{m\_0} \end{pmatrix},$$

$$\det T\_{e(n \times n)} \neq 0, \ T\_e \overline{A} T\_e^{-1} = A\_e = \begin{pmatrix} A\_1 & A\_{12} \\ C\_{21} & C\_{22} \end{pmatrix}, \ T\_e \overline{B} = T\_e \begin{pmatrix} O \\ B\_2 \end{pmatrix} = \begin{pmatrix} O \\ B\_2 \end{pmatrix}. \tag{15}$$

As a result, the RF closed by the local feedback is obtained:

$$\begin{aligned} \dot{e}\_1 &= A\_1 e\_1 + A\_{12} e\_2, \\ \dot{e}\_2 &= C\_{21} e\_1 + C\_{22} e\_2 + B\_2 u. \end{aligned} \tag{16}$$

2.b. Synthesis of real control by variables of transformed systems.

Next, the local feedback generated in the first subsystem of (16) must be provided by the real control. For the second elementary subsystem of (16), we have to compose a reference matrix $A\_{2(m\_0 \times m\_0)}$ with $m\_0$ eigenvalues from the rest of the given spectrum, $\sigma(A\_1) \cup \sigma(A\_2) = \sigma\_{\mathrm{d}}$, and form a feedback on the variables of the transformed system:

$$\begin{array}{ll} m\_0 < m : & u = B\_2^{+} (-C\_{21}e\_1 - C\_{22}e\_2 + A\_2 e\_2) = Ke; \\ m\_0 = m : & u = B\_2^{-1} (-C\_{21}e\_1 - C\_{22}e\_2 + A\_2 e\_2) = Ke. \end{array} \tag{17}$$

System (16), with closed-loop by control (17), will take the form

$$
\dot{e}\_1 = A\_1 e\_1 + A\_{12} e\_2, \ \dot{e}\_2 = A\_2 e\_2. \tag{18}
$$

Its matrix has an upper triangular block structure

$$
\begin{pmatrix} A\_1 & A\_{12} \\ O & A\_2 \end{pmatrix}
$$

and is stable according to (4), and its eigenvalues satisfy the characteristic equation $\det(\lambda I - A\_1)\det(\lambda I - A\_2) = 0$.

2.c. A modal control law based on the state of the initial system.

Finally, based on (17), it is necessary to form a feedback on the variables of the original systems (1) and (2), since it is these variables that are measured. By substitutions of variables (7), (11), and (15), the resulting transformation matrix and the resulting modal control law (2) are as follows:

$$T\_e T\_a T\_p x = e, \ u = Ke = Fx, \ F\_{m \times n} = K T\_e T\_a T\_p, \tag{19}$$

which provides (3), (4), and a solution to the stabilization problem.

Modal control synthesis is complete.
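
The synthesis steps 2.b and 2.c above can be checked numerically. The following sketch (Python with NumPy; all matrices are illustrative and not taken from the paper) builds feedback (17) for the case *m*<sub>0</sub> = *m* for a system assumed to be already in RF coordinates (16), and verifies that the closed loop has the block-triangular structure (18) with spectrum *σ*(*A*<sub>1</sub>) ∪ *σ*(*A*<sub>2</sub>):

```python
import numpy as np

# Steps 2.b-2.c for a system assumed to be already in RF coordinates (16),
# case m0 = m of (17). All matrices are illustrative, not from the paper.
A1  = np.array([[-1.0, 0.5],
                [ 0.0, -2.0]])      # first-subsystem matrix (already stable)
A12 = np.eye(2)
C21 = np.array([[0.3, -0.4],
                [0.2,  0.1]])
C22 = np.array([[0.7, 0.2],
                [0.1, 0.9]])
B2  = np.diag([2.0, 1.0])           # square and invertible
A2  = np.diag([-3.0, -4.0])         # reference matrix for the second subsystem

# Feedback (17) on the transformed variables: u = Ke, K = B2^{-1}(-C21  A2 - C22)
K = np.linalg.inv(B2) @ np.hstack([-C21, A2 - C22])

Ae  = np.block([[A1, A12], [C21, C22]])
Be  = np.vstack([np.zeros((2, 2)), B2])
Acl = Ae + Be @ K                   # closed-loop matrix, must match (18)

eig    = np.sort(np.linalg.eigvals(Acl).real)
target = np.sort(np.concatenate([np.linalg.eigvals(A1),
                                 np.linalg.eigvals(A2)]).real)
print(np.allclose(eig, target))     # spectrum equals sigma(A1) U sigma(A2)
```

The lower-left block of the closed-loop matrix vanishes exactly, as in (18).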

As stated in Subsection 2.a, full parametric certainty of the matrices *A* and *B* is required to implement modal control, which limits its practical applicability, since models of real-world control plants often depend on unknown parameters.

In such cases, the requirements of the closed-loop system are relaxed, and the stability margin, which is one of the key quality indicators of the transition process, is considered as the target condition. The problem is to synthesize a linear feedback (2), which provides in the closed-loop system (3) a stability margin not less than a given *η*<sup>d</sup> > 0:

$$\min \{-\text{Re}\lambda\_i(A+BF)\}\_{i=\overline{1,n}} = \eta \ge \eta\_{\text{d}}.\tag{20}$$

As a methodological basis for problem (20), we will use the concept of super-stability. Super-stability is defined in terms of matrix elements using inequalities rather than the characteristic Equation (4), which makes this concept suitable for solving robust control problems in systems with uncertain parameters.

**Definition 3** ([12])**.** *Matrix A* = (*a<sub>ij</sub>*) ∈ *R*<sup>*n*×*n*</sup> *and, consequently, the system* ẋ = *Ax are called super-stable if A is a matrix with a dominant negative diagonal, i.e., all the elements of its main diagonal are negative, a<sub>ii</sub>* < 0, *i* = 1, *n, and each exceeds in absolute value the sum of the moduli of the off-diagonal elements in its row:*

$$\min \left\{ -a\_{ii} - \sum\_{j=1, j \neq i}^{n} |a\_{ij}| \right\}\_{i=\overline{1,n}} = \nu > 0,\tag{21}$$

*where ν has the meaning of a margin of super-stability.*

The statements in Lemma 1 below are rather obvious. However, we will present a rigorous proof of them, because they are important for further discussion.

**Lemma 1.** *Any super-stable matrix A* = (*a<sub>ij</sub>*) ∈ *R*<sup>*n*×*n*</sup> (21) *is Hurwitz, and its stability margin* min{−Re*λ<sub>i</sub>*(*A*)}<sub>*i*=1,*n*</sub> = *η* > 0 *is at least as large as its margin of super-stability* (21)*, i.e.,*

$$
\eta \ge \nu. \tag{22}
$$

**Proof 1.** According to Gershgorin's theorem [17], each of the eigenvalues *λ* of matrix *A* is always located in one of the circles of the complex plane centered at *a<sub>ii</sub>* with radius equal to the sum of the moduli of the off-diagonal elements of the *i*-th row. Indeed, each eigenvalue *λ* of matrix *A* corresponds to an eigenvector *h*: *Ah* = *λh*, i.e., ∑<sub>*j*</sub> *a<sub>ij</sub>h<sub>j</sub>* = *λh<sub>i</sub>*, *i* = 1, *n*. Let |*h<sub>k</sub>*| = max<sub>*i*</sub>|*h<sub>i</sub>*| > 0; then,

$$|a\_{kk} - \lambda||h\_k| = \left| \sum\_{j=1, j \neq k}^{n} a\_{kj} h\_j \right| \le |h\_k| \sum\_{j=1, j \neq k}^{n} \left| a\_{kj} \right| \ \text{ and } \ |a\_{kk} - \lambda| \le \sum\_{j=1, j \neq k}^{n} \left| a\_{kj} \right|.$$

It follows that if the matrix *A* is super-stable, so that −*a<sub>kk</sub>* > ∑<sub>*j*≠*k*</sub>|*a<sub>kj</sub>*|, then each of its eigenvalues lies in the left half-plane of the complex plane, i.e., matrix *A* is Hurwitz and its stability margin is defined as *η* = −Re*λ*<sub>0</sub> = min{−Re*λ<sub>i</sub>*(*A*)} > 0.

Let *λ*<sub>0</sub> be a real simple eigenvalue of the matrix *A*, to which corresponds the eigenvector *h*<sup>0</sup> = (*h*<sub>1</sub>, ..., *h<sub>n</sub>*)<sup>*T*</sup>, *λ*<sub>0</sub>*h*<sup>0</sup> = *Ah*<sup>0</sup>; for the *k*-th (*k* = 1, 2, ..., *n*) element we have *λ*<sub>0</sub>*h<sub>k</sub>* = ∑<sub>*j*</sub> *a<sub>kj</sub>h<sub>j</sub>*. Let *h<sub>k</sub>* be an element of *h*<sup>0</sup> with maximum modulus: |*h<sub>k</sub>*| = max{|*h<sub>i</sub>*|}<sub>*i*=1,*n*</sub>. Then, the following estimate holds:

$$|\lambda\_0||h\_k| \ge |a\_{kk}||h\_k| - \sum\_{j=1, j \neq k}^{n} \left| a\_{kj} \right| |h\_j| \ge |h\_k| \left( |a\_{kk}| - \sum\_{j=1, j \neq k}^{n} \left| a\_{kj} \right| \right) \ge |h\_k|\nu,$$

whence it follows that *η* = |*λ*<sub>0</sub>| ≥ *ν*, and inequality (22) is satisfied. A complex *λ*<sub>0</sub> corresponds to a pair of complex-conjugate eigenvalues, and the estimate becomes |Re*λ*<sub>0</sub>||*h<sub>k</sub>*| ≥ |*a<sub>kk</sub>*||*h<sub>k</sub>*| − ∑<sub>*j*≠*k*</sub>|*a<sub>kj</sub>*||*h<sub>j</sub>*| ≥ |*h<sub>k</sub>*|*ν*, so inequality (22) is again satisfied.

In the case of a multiple eigenvalue *λ*<sub>0</sub>, similar estimates hold for all linearly independent eigenvectors corresponding to the given eigenvalue. Lemma 1 is proved.
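
Definition 3 and Lemma 1 are easy to check numerically. The sketch below (Python with NumPy; the matrix is an illustrative example, not from the paper) computes the super-stability margin *ν* from (21) and confirms that the eigenvalue stability margin *η* is not smaller, as inequality (22) states:

```python
import numpy as np

def superstability_margin(A):
    """Margin nu from (21): min_i (-a_ii - sum_{j != i} |a_ij|)."""
    A = np.asarray(A, dtype=float)
    off = np.abs(A).sum(axis=1) - np.abs(np.diag(A))
    return float(np.min(-np.diag(A) - off))

# Illustrative super-stable matrix: negative, dominant diagonal
A = np.array([[-5.0,  1.0, -0.5],
              [ 0.3, -4.0,  1.2],
              [-0.2,  0.6, -3.0]])

nu  = superstability_margin(A)                   # super-stability margin (21)
eta = float(np.min(-np.linalg.eigvals(A).real))  # stability margin
print(nu > 0 and eta >= nu)                      # super-stable, and (22) holds
```

Here *ν* = 2.2 (attained in the third row), and the eigenvalue margin exceeds it.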

In a controllable linear system with certain parameters, it is always possible to achieve stability with state feedback, but super-stability is rarely achieved due to a lack of control actions. In this sense, the only obvious exceptions are elementary systems.

As shown in Subsection 2.a, it is possible to provide any reference matrix *A*<sub>0</sub>, including a super-stable one, in a closed-loop system using feedback (2) and (5), if the parameters of the elementary system are known. Note that a diagonal matrix with negative elements *A*<sub>0</sub> = diag{*a<sub>i</sub>*}, *a<sub>i</sub>* < 0, *i* = 1, *n*, is a special case of a super-stable matrix, for which min{|*a<sub>i</sub>*|} = min{−*λ<sub>i</sub>*(*A*<sub>0</sub>)} = *η* = *ν*.

Let us distinguish a class of non-elementary linear systems for whose stabilization with a given stability margin (20) we can jointly apply the concept of super-stability and decomposition synthesis based on the transition to the RF. This class includes the particular case of controllable systems (1) whose RF (6) consists of two elementary subsystems. The possibility of such a representation is encoded in the rank structure of the controllability matrix.

If system (1), where 0 < rank*B*<sub>*n*×*m*</sub> = *m*<sub>0</sub> ≤ *m* < *n*, is controllable, its controllability matrix is of full rank:

$$\text{rank}(B \ AB \ A^2 B \ \ldots \ A^{n-m\_0}B)\_{n \times m(n-m\_0+1)} = n. \tag{23}$$

The rank structure of the controllability matrix (23) is characterized by a controllability index and a controllability indicator [19]. If the rank of the controllability matrix (23) is increased according to the following scheme:

$$\begin{gathered} \text{rank}B = m\_0 \neq 0, \ \text{rank}(B \ AB) = m\_0 + m\_1, \ m\_0 \ge m\_1 \neq 0, \\ \text{rank}(B \ AB \ A^2B) = m\_0 + m\_1 + m\_2, \ m\_1 \ge m\_2 \neq 0, \ \ldots, \\ \text{rank}(B \ AB \ \ldots \ A^rB) = m\_0 + m\_1 + \ldots + m\_r, \ m\_{r-1} \ge m\_r \neq 0, \\ \text{rank}(B \ AB \ \ldots \ A^rB \ A^{r+1}B) = m\_0 + m\_1 + \ldots + m\_r + 0 \Rightarrow \\ \Rightarrow \text{rank}(B \ AB \ \ldots \ A^{r+1}B \ A^{r+2}B) = m\_0 + m\_1 + \ldots + m\_r + 0 + 0, \end{gathered} \tag{24}$$

then pair (*A*, *B*) corresponds to a specific set of natural numbers *m*0,..., *mr*:

$$\text{rank}(B \ AB \ \ldots \ A^r B) = m\_0 + m\_1 + \ldots + m\_r = n, \ m\_0 \ge m\_1 \ge \ldots \ge m\_r, \ r \le n - m\_0,\tag{25}$$

which are called the controllability indices of the pair (*A*, *B*). Here, *m<sub>i</sub>* ∈ *N*, *i* = 0, *r*, is the number of linearly independent columns of the matrix *A<sup>i</sup>B* that enter the basis of the controllability matrix compiled according to the specified scheme, and *r* + 1 is the controllability indicator of the pair (*A*, *B*), i.e., the number of its controllability indices.
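
The rank scheme (24)-(25) translates directly into a rank-increment computation. The following sketch (Python with NumPy; the pair (*A*, *B*) is illustrative, and the helper name `controllability_indices` is ours) returns the list *m*<sub>0</sub>, ..., *m<sub>r</sub>*:

```python
import numpy as np

def controllability_indices(A, B, tol=1e-9):
    """Indices m_0, m_1, ... from scheme (24): rank increments of (B AB A^2B ...)."""
    blocks, indices, prev_rank = B, [], 0
    while True:
        r = int(np.linalg.matrix_rank(blocks, tol=tol))
        if r == prev_rank:                  # no new independent columns: stop
            return indices                  # [m_0, m_1, ..., m_r]
        indices.append(r - prev_rank)
        prev_rank = r
        blocks = np.hstack([blocks, np.linalg.matrix_power(A, len(indices)) @ B])

# Illustrative pair with n = 3, m = 2: controllability indicator r + 1 = 2
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
B = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0]])

print(controllability_indices(A, B))        # m_0 = 2, m_1 = 1, n = 2 + 1 = 3
```

The rank tolerance `tol` is a numerical assumption; exact ranks are implied in the text.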

**Lemma 2.** *If the controllability matrix of a linear controlled system* (1) *has a controllability indicator equal to two,*

$$\text{rank}B\_{(n \times m)} = m\_0 \neq 0, \ \text{rank}(B \ AB)\_{n \times 2m} = m\_0 + m\_1 = n, \ m\_0 \ge (n - m\_0),\tag{26}$$

*then, using the nonsingular change of variables* (12)*, system* (1) *will be represented in RF* (6)*, in which not only the second but also the first subsystem is elementary with respect to the virtual control,*

$$\text{rank}A\_{12(n-m\_0)\times m\_0} = n - m\_0.\tag{27}$$

**Proof 2.** Let us rearrange the blocks of the controllability matrix (26) without rearranging columns inside the blocks: *W*<sub>(*n*×2*m*)</sub> = (*AB* *B*). For convenience, we denote *AB* = *P*. Let us multiply this matrix from the left by the transition-to-RF matrix (12). According to (7) and (12), the matrix obtained as a result of the multiplication can be represented in the following form:

$$TW = T\_d T\_p (P \ B) = T\_d \begin{pmatrix} \widetilde{P}\_1 & \widetilde{B}\_1 \\ \widetilde{P}\_2 & B\_2 \end{pmatrix} = \begin{pmatrix} P\_{1(n-m\_0) \times m} & O \\ P\_2 & B\_{2(m\_0 \times m)} \end{pmatrix} = \overline{W},$$

where *P*<sub>1</sub> = *P̃*<sub>1</sub> − *B̃*<sub>1</sub>*B*<sub>2</sub><sup>+</sup>*P̃*<sub>2</sub>. By construction, rank*W*<sub>(*n*×2*m*)</sub> = *n* and rank*B* = rank*B*<sub>2(*m*<sub>0</sub>×*m*)</sub> = *m*<sub>0</sub>. Multiplication by the nonsingular matrix *T*, det*T*<sub>(*n*×*n*)</sub> ≠ 0, does not change the rank, so rank*W̄*<sub>(*n*×2*m*)</sub> = *n*, which is why matrix *P*<sub>1</sub> is of full rank:

$$\text{rank}P\_{1(n-m\_0)\times m} = n - m\_0.\tag{28}$$

Considering that the matrices *W*<sub>*n*×2*m*</sub> and *W̄*<sub>*n*×2*m*</sub> = *TW* are of full rank and consist of linearly independent rows, pseudo-inverse matrices exist for them, and

$$W\_{2m \times n}^{+}: \ WW^{+} = I\_{n}, \ \overline{W}^{+} = (TW)^{+} = \begin{pmatrix} P\_{1(m \times (n - m\_{0}))}^{+} & O \\ \times & B\_{2(m \times m\_{0})}^{+} \end{pmatrix}. \tag{29}$$

In Formula (29) and further in the text, the symbol × denotes matrices, the type of which does not affect the structural properties.

Taking (29) into account, the similarity transformation of the matrix *A* to the RF, *TAT*<sup>−1</sup> = *Ā*, can be represented as

$$TAT^{-1} = TAWW^{+} \ T^{-1} = TAW(TW)^{+} = (TAW)\overline{W}^{+},$$

where *TAW* = *TA*(*AB* *B*) = *T*(*A*<sup>2</sup>*B* *AB*), *AB* = *P*. Then,

$$TAT^{-1} = (T(A^2B \cdot AB))\overline{W}^+ = \begin{pmatrix} \times & P\_1 \\ \times & \times \end{pmatrix} \begin{pmatrix} P\_1^+ & O \\ \times & B\_2^+ \end{pmatrix} = \begin{pmatrix} \times & P\_1B\_2^+ \\ \times & \times \end{pmatrix} = \begin{pmatrix} A\_{11} & A\_{12(n-m\_0)\times m\_0} \\ A\_{21} & A\_{22} \end{pmatrix},$$

where *A*<sub>12(*n*−*m*<sub>0</sub>)×*m*<sub>0</sub></sub> = (*P*<sub>1</sub>*B*<sub>2</sub><sup>+</sup>)<sub>(*n*−*m*<sub>0</sub>)×*m*<sub>0</sub></sub>, and, due to (28) and (29), rank(*P*<sub>1</sub>*B*<sub>2</sub><sup>+</sup>)<sub>(*n*−*m*<sub>0</sub>)×*m*<sub>0</sub></sub> = *n* − *m*<sub>0</sub>, so equality (27) is satisfied. Lemma 2 is proved.

Let us extend (without proof) the results of Lemma 2 to controlled systems of general form (25).

A consequence of Lemma 2 is as follows. If condition (25) is satisfied in system (1), then it can be represented in the block form of controllability, consisting of *r* + 1 elementary blocks of dimensions *m*<sub>0</sub>, *m*<sub>1</sub>, ..., *m<sub>r</sub>*, by a linear nonsingular transformation *Tx* = (*x̄*<sub>1</sub>, ..., *x̄*<sub>*r*+1</sub>), det*T* ≠ 0, *x̄*<sub>1</sub> ∈ *R*<sup>*m<sub>r</sub>*</sup>, *x̄*<sub>2</sub> ∈ *R*<sup>*m*<sub>*r*−1</sub></sup>, ..., *x̄*<sub>*r*+1</sub> ∈ *R*<sup>*m*<sub>0</sub></sup>. Matrix *T* can be found by transforming the matrix *W̄* to the lower-triangular block form with full-rank matrices on the main diagonal:

$$T\overline{W} = T(A^{r}B \ \ldots \ AB \ B) = \overline{\overline{W}} = \begin{pmatrix} P\_r & \ldots & O & O & O \\ \ldots & \ldots & \ldots & \ldots & \ldots \\ \times & \ldots & P\_2 & O & O \\ \times & \ldots & \times & P\_1 & O \\ \times & \ldots & \times & \times & P\_0 \end{pmatrix},\tag{30}$$

where rank*P*0(*m*0×*m*) <sup>=</sup> *<sup>m</sup>*0, rank*Pi*(*mi*×*mi*−1) <sup>=</sup> *mi*, *<sup>i</sup>* <sup>=</sup> 1, *<sup>r</sup>*.

Just as in the procedure of converting to the RF (6), the essence of the transformations is that, successively in each block *B*, *AB*, ..., *A*<sup>*r*−1</sup>*B*, one needs to group the *m<sub>i</sub>* basis rows of matrix *P<sub>i</sub>* by transpositions (similar to (7)) and zero out the upper, linearly dependent rows (similar to (11)). In this case, the leftmost block *A<sup>r</sup>B* can be discarded, since its elements do not participate in the formation of the matrix *T*.

For the selected class of systems (1), (26), it is possible to provide a guaranteed stability margin (20) by ensuring super-stability of the closed-loop system (18) in the new coordinate basis, where the reference matrices *A*<sub>1</sub>, *A*<sub>2</sub> can be assigned arbitrarily. Selecting these matrices diagonally,

$$A\_1 = \text{diag}\left\{a\_i^1\right\}\_{i=\overline{1,n-m\_0}}, \ A\_2 = \text{diag}\left\{a\_i^2\right\}\_{i=\overline{1,m\_0}}, \tag{31}$$

on the one hand, excludes the presence of complex-conjugate eigenvalues in the matrix of the closed system, but, on the other hand, simplifies the computational aspect of the synthesis. Then, for any parameters satisfying the non-strict inequalities

$$\nu \ge \eta\_{\rm d}, \ a\_i^1 \le -(\nu + \sum\_{j=1}^{m\_0} \left| a\_{ij}^{12} \right|), \ A\_{12} = (a\_{ij}^{12}), \ i = \overline{1, n-m\_0}; \ a\_i^2 \le -\nu < 0, \ i = \overline{1, m\_0}, \tag{32}$$

the closed-loop system (18) will be super-stable with a margin of super-stability *ν* ≥ *η*d.

As noted above, the property of super-stability is formulated in terms of matrix elements (21) rather than eigenvalues, so it is not invariant under linear transformations, and the original closed-loop system (1), (19), (31), (32) will, in general, not be super-stable. However, because of (22), stabilization with a stability margin not less than the given one (20) is guaranteed.
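
The design (31)-(32) can be illustrated numerically. The sketch below (Python with NumPy; *A*<sub>12</sub> and *η*<sub>d</sub> are made-up values) chooses diagonal reference matrices by (32) with *ν* = *η*<sub>d</sub> and checks that the closed-loop matrix (18) is super-stable with margin at least *η*<sub>d</sub>:

```python
import numpy as np

# Choice (31)-(32): diagonal A1, A2 making the closed loop (18) super-stable
# with margin nu = eta_d. A12 and eta_d are made-up values.
eta_d = 0.8
nu    = eta_d
A12   = np.array([[1.5, -0.7],
                  [0.4,  2.0]])
n_m0, m0 = A12.shape

a1 = -(nu + np.abs(A12).sum(axis=1))        # a_i^1 from (32)
a2 = -nu * np.ones(m0)                      # a_i^2 from (32)

Acl = np.block([[np.diag(a1), A12],
                [np.zeros((m0, n_m0)), np.diag(a2)]])

# Super-stability margin (21) of the closed-loop matrix
off   = np.abs(Acl).sum(axis=1) - np.abs(np.diag(Acl))
nu_cl = np.min(-np.diag(Acl) - off)
eta   = np.min(-np.linalg.eigvals(Acl).real)
print(nu_cl >= eta_d - 1e-9 and eta >= eta_d - 1e-9)
```

Both the super-stability margin (21) and the eigenvalue margin meet *η*<sub>d</sub>, in line with (22).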

In the next section, we consider the possibility of synthesis of robust control of parametrically uncertain systems in the context of the proposed approach.

#### **3. Parametrically Uncertain Systems**

#### *3.1. Elementary Control Problem*

This section considers the problem of stabilization of linear time-invariant systems operating under interval parameter uncertainty

$$\dot{x} = (A + \hat{A})x + (B + \hat{B})u, \ x \in \mathbb{R}^n, \ u \in \mathbb{R}^m, \ m < n,\tag{33}$$

where the elements of the matrices *A* = (*a<sub>ij</sub>*), *i*, *j* = 1, *n*, and *B* = (*b<sub>ij</sub>*), *i* = 1, *n*, *j* = 1, *m*, which define the nominal system (1), are known, and the pair (*A*, *B*) is controllable. The elements of the matrices *Â* = (*â<sub>ij</sub>*) and *B̂* = (*b̂<sub>ij</sub>*) are constant but unknown; their values belong to closed intervals with known boundaries:

$$a\_{ij\min} \le \hat{a}\_{ij} \le a\_{ij\max}, \ i,j = \overline{1, n}; \ b\_{ij\min} \le \hat{b}\_{ij} \le b\_{ij\max}, \ i = \overline{1, n}, \ j = \overline{1, m}.$$

In the following, to simplify the explanation, we will assume that the values of the uncertain elements lie in intervals symmetric with respect to zero:

$$-\stackrel{\frown}{a}\_{ij} \le \hat{a}\_{ij} \le \stackrel{\frown}{a}\_{ij}, \ i,j = \overline{1, n}; \ -\stackrel{\frown}{b}\_{ij} \le \hat{b}\_{ij} \le \stackrel{\frown}{b}\_{ij}, \ i = \overline{1, n}, \ j = \overline{1, m}. \tag{34}$$

Then, the values of the matrix elements of system (33) lie in closed intervals with known bounds, symmetric about the corresponding nominal values:

$$a\_{ij} - \stackrel{\frown}{a}\_{ij} \le a\_{ij} + \hat{a}\_{ij} \le a\_{ij} + \stackrel{\frown}{a}\_{ij}, \ i,j = \overline{1, n}; \ b\_{ij} - \stackrel{\frown}{b}\_{ij} \le b\_{ij} + \hat{b}\_{ij} \le b\_{ij} + \stackrel{\frown}{b}\_{ij}, \ i = \overline{1, n}, \ j = \overline{1, m}.$$

It is supposed that pair ((*A* + *A*ˆ), (*B* + *B*ˆ)) is controllable in all acceptable intervals of parameter uncertainty, and moreover, the rank structures of the controllability matrices of the nominal system (1) and the parametrically perturbed system (33) are the same. This requirement is due to practical considerations. The uncertain system model (33) describes the functioning of a real control plant, and, for example, the failure to meet the condition rank *B* = rank (*B* + *B*ˆ) indicates a "faulty" actuator or damaged communication with the control plant.

In a general case, the solution of the modal control problem with the assignment of a given spectrum (4) in the system (33) is not possible. We set the problem of synthesis of linear feedback (2), providing stabilization of the system (33) at all acceptable values of uncertain parameters (34) with stability margin not less than the given one *η*<sup>d</sup> > 0, i.e., providing in a closed-loop system

$$\min \left\{ -\text{Re}\lambda\_i[(A+\hat{A}) + (B+\hat{B})F] \right\}\_{i=\overline{1,n}} = \eta \ge \eta\_{\text{d}}.\tag{35}$$

We first investigate the possibility of solving problem (35) for parametrically uncertain elementary systems of two types. The first type comprises systems with a known control matrix,

$$\dot{x} = (A + \hat{A})x + Bu, \ \dim u = m \ge n = \dim x = \text{rank}B,\tag{36}$$

which are obviously controllable. No additional requirements are imposed on them. State variables with uncertain coefficients cannot be compensated for by feedback, so the control law can be formed in two ways:

$$u = Fx = B^+(K - A)x \ \text{ or } \ u = Fx = B^+Kx, \ K = \text{diag}\{k\_i\}\_{i=\overline{1,n}}.\tag{37}$$

In (37) and below, we consider the general case of a rectangular matrix *B*. In the special case *m* = *n*, matrix *B*<sup>−1</sup> should be used instead of *B*<sup>+</sup>. The corresponding closed-loop systems have the following form:

$$\dot{x} = (K + \hat{A})x \ \text{ or } \ \dot{x} = (K + A + \hat{A})x.\tag{38}$$

Obviously, the choice of matrix elements *K* can provide super-stability of systems (38) with any stability margin *ν* > 0. To achieve the control goal (35), let us assume *ν* ≥ *η*d. Then, for any *ki*, satisfying the inequalities

$$k\_i \le -(\nu + \sum\_{j=1}^n \stackrel{\frown}{a}\_{ij}) \ \text{ or } \ k\_i \le -(\nu + a\_{ii} + \stackrel{\frown}{a}\_{ii} + \sum\_{j=1, j \ne i}^n (|a\_{ij}| + \stackrel{\frown}{a}\_{ij})), \ i = \overline{1, n}, \tag{39}$$

matrices of systems (38) will be super-stable (21), which due to (22) and *ν* ≥ *η*<sup>d</sup> solves the problem (35).

In practical applications, in order to save control resources, the first method of feedback generation in (37) is recommended, and the calculated values of the super-stability margin and *k<sub>i</sub>* are taken on the basis of the equality *ν* = *η*<sub>d</sub> and inequalities (39).
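
A minimal numerical illustration of the law (37), (39) is given below (Python with NumPy; the nominal matrices and uncertainty bounds are made up). It computes the gains from the first inequality in (39) with *ν* = *η*<sub>d</sub> and confirms the stability margin by Monte Carlo sampling of admissible *Â*:

```python
import numpy as np

rng = np.random.default_rng(0)

# Elementary uncertain system (36) with B square: x' = (A + Ahat)x + Bu.
# Nominal A and the symmetric bounds abar on |ahat_ij| (34) are made up.
A     = np.array([[ 0.5, 1.0],
                  [-0.3, 0.2]])
B     = np.eye(2)
abar  = np.array([[0.2, 0.1],
                  [0.1, 0.3]])
eta_d = 1.0
nu    = eta_d

# First law of (37) with k_i from the first inequality in (39)
k = -(nu + abar.sum(axis=1))
F = np.linalg.pinv(B) @ (np.diag(k) - A)

# Monte Carlo over admissible Ahat: closed loop (38) is x' = (K + Ahat)x
ok = True
for _ in range(200):
    Ahat   = rng.uniform(-abar, abar)
    Acl    = A + Ahat + B @ F               # equals diag(k) + Ahat here
    margin = np.min(-np.linalg.eigvals(Acl).real)
    ok     = ok and margin >= eta_d - 1e-9
print(ok)
```

Every sampled closed loop is super-stable with margin *ν* = *η*<sub>d</sub>, so by (22) the stability margin requirement (35) holds.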

Consider the general case of parametrically uncertain elementary systems

$$\dot{x} = (A + \hat{A})x + (B + \hat{B})u, \ \dim u = m \ge n = \dim x = \text{rank}B,\tag{40}$$

where the elements of the undefined matrices satisfy (34), but additional constraints must be imposed on the matrix *B*ˆ so that the system remains controllable.

**Remark 2.** *In a first-order system* ẋ = (*a* + *â*)*x* + (*b* + *b̂*)*u, the condition b* ≠ 0 *is added to the basic requirement b* + *b̂* ≠ 0*. From a theoretical point of view, even the situation when* sign(*b*) ≠ sign(*b* + *b̂*) *is acceptable, and problem* (35) *has a solution. However, in models of real control plants the parameters have a certain physical meaning, so the following conditions are proposed:*

$$b \neq 0, \ b + \hat{b} \neq 0 \ \text{and} \ \text{sign}(b) = \text{sign}(b + \hat{b}) \Leftrightarrow |b| > \stackrel{\frown}{b} \Leftrightarrow 1 > \stackrel{\frown}{b} / |b|. \tag{41}$$

*The conditions* (41) *are characteristic of adequate models of parametrically uncertain control plants, in which the uncertainty intervals have "reasonable" bounds with respect to the nominal system parameters.*

Then, the control law

$$u = k\,\text{sign}(b+\hat{b})x\tag{42}$$

will result in the closed-loop system with matrix *a* + *â* + *k*|*b* + *b̂*|, and the choice of the gain on the basis of the inequality

$$k \le -(\eta\_{\rm d} + a + \stackrel{\frown}{a}) / (|b| - \stackrel{\frown}{b})$$

provides the given stability margin.
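
For the first-order case of Remark 2, the gain bound can be checked over the whole uncertainty box. The sketch below (Python with NumPy; all numbers are made up) verifies that the worst-case closed-loop coefficient does not exceed −*η*<sub>d</sub>:

```python
import numpy as np

# First-order system of Remark 2: x' = (a + ahat)x + (b + bhat)u with
# |ahat| <= abar, |bhat| <= bbar and |b| > bbar as in (41). Values are made up.
a, abar = 0.6, 0.4
b, bbar = 2.0, 0.5
eta_d   = 1.0

# Gain bound stated after (42): k <= -(eta_d + a + abar) / (|b| - bbar)
k = -(eta_d + a + abar) / (abs(b) - bbar)

# Closed loop x' = (a + ahat + k|b + bhat|)x over a grid of admissible pairs
worst = max(a + ahat + k * abs(b + bhat)
            for ahat in np.linspace(-abar, abar, 21)
            for bhat in np.linspace(-bbar, bbar, 21))
print(worst <= -eta_d + 1e-9)
```

The worst case is attained at *â* = +0.4 and *b̂* = −0.5, where the closed-loop coefficient equals −*η*<sub>d</sub> exactly.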

The condition under which the multidimensional system (40) is not only controllable, but also preserves the structural property of the nominal system, namely, it remains elementary, appears as

$$\text{rank}B = \text{rank}(B + \hat{B}) = n.\tag{43}$$

When imposing any requirements on the uncertain matrices, in (43) and below it is assumed by default that these requirements are met for all values of the uncertain elements from the allowable ranges (34).

However, as will be shown below, in the used approach the fulfillment of (43) is necessary but not sufficient to solve the problem (35).

Due to the parametric uncertainty of the control matrix in system (40), even state variables with certain coefficients cannot be compensated for by feedback, so we form a one-parameter control law in the form

$$u = Fx = kB^{+}Sx, \ k = \text{const}, \ S = \text{diag}\{\text{sign}(1+l\_{ii})\}\_{i=\overline{1,n}}, \ L\_{(n \times n)} = \hat{B}B^{+} = (l\_{ij}).\tag{44}$$

From the form of the matrix of the closed-loop system (40), (44),

$$
\dot{\mathbf{x}} = \left[ A + \hat{A} + k(I\_\mathbf{n} + L)S \right] \mathbf{x} \; , \tag{45}
$$

the choice of matrix *S* is clear: it contains the signs of the diagonal elements of the matrix *I* + *L*. The conditions under which these signs remain constant for all acceptable values of the uncertain parameters of matrix *B̂* are formulated in Lemma 3.

**Lemma 3.** *If, in system* (45)*, the matrix I<sub>n</sub>* + *L has a dominant diagonal,*

$$\min \left\{ |1 + l\_{ii}| - \sum\_{j=1, j \neq i}^{n} |l\_{ij}| \right\}\_{i = \overline{1, n}} = \mu > 0,\tag{46}$$

*then there is a real number k such that, for any admissible values of the elements of the matrices A, Â* (34) *and any ν* > 0*, the system* (45) *will be super-stable with a super-stability margin ν* > 0.

**Proof 3.** In each *i*-th matrix row of system (45), we substitute *k* = *ki*, *i* = 1, *n*. The resulting matrix will be super-stable with a margin of super-stability *ν* > 0, if

$$a\_{ii} + \stackrel{\frown}{a}\_{ii} + k\_i|1 + l\_{ii}| \le -\left(\nu + \sum\_{j=1, j \neq i}^n (|a\_{ij}| + \stackrel{\frown}{a}\_{ij}) + k\_i \,\text{sign}(k\_i) \sum\_{j=1, j \neq i}^n |l\_{ij}|\right), \ i = \overline{1, n}.$$

Due to (46),

$$|1 + l\_{ii}| + \text{sign}(k\_i) \sum\_{j=1, j \neq i}^{n} |l\_{ij}| > 0$$

for either sign of *k<sub>i</sub>*. Taking the "worst" case into account, we obtain autonomous upper estimates for the selection of *k<sub>i</sub>*:

$$k\_i \le \overline{k}\_i = -\frac{\nu + a\_{ii} + \stackrel{\frown}{a}\_{ii} + \sum\_{j=1, j \ne i}^{n} (|a\_{ij}| + \stackrel{\frown}{a}\_{ij})}{|1 + l\_{ii}| - \sum\_{j=1, j \ne i}^{n} |l\_{ij}|}, \ i = \overline{1, n}. \tag{47}$$

Obviously, the number sought is *k* ≤ min<sub>*i*=1,*n*</sub> *k̄<sub>i</sub>*, for which all inequalities (47) are fulfilled simultaneously, which ensures that system (45) is super-stable with any super-stability margin *ν* > 0. Lemma 3 is proved.

Using (46), we simplify the final inequality, obtaining a slightly larger (in modulus) estimate for the choice of the parameter *k*:

$$k \le \min\_{i=\overline{1,n}} \left\{ -\left(\nu + a\_{ii} + \stackrel{\frown}{a}\_{ii} + \sum\_{j=1, j \neq i}^{n} \left(|a\_{ij}| + \stackrel{\frown}{a}\_{ij}\right)\right) \right\} / \mu.\tag{48}$$

Thus, from the set of elementary parametrically uncertain systems (40), (34), a class of systems with the additional requirements (43), (46) is distinguished, for which the robust control law (44), (48) provides a solution to problem (35) if *ν* = *η*<sub>d</sub>. Notice that condition (41) is a special case of (46).
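
The one-parameter law (44) with the estimate (48) can be illustrated as follows (Python with NumPy; all values are made up, and one admissible realization of *B̂* is fixed for the check, although in practice only the interval bounds are known and (46) must hold over all of them):

```python
import numpy as np

# One-parameter law (44) with gain estimate (48). One admissible realization
# of Bhat is fixed for the check; all values are made up.
A    = np.array([[ 0.4, 0.8],
                 [-0.5, 0.1]])
abar = np.array([[0.1, 0.2],
                 [0.2, 0.1]])               # |ahat_ij| <= abar_ij, see (34)
B    = np.eye(2)
Bhat = np.array([[0.2, -0.1],
                 [0.1,  0.3]])

L  = Bhat @ np.linalg.pinv(B)               # L = Bhat B^+ = (l_ij) from (44)
IL = np.eye(2) + L
mu = np.min(np.abs(np.diag(IL)) - (np.abs(IL).sum(axis=1) - np.abs(np.diag(IL))))
assert mu > 0                               # diagonal dominance (46)

nu  = 1.0                                   # required margin, nu = eta_d
num = (nu + np.diag(A) + np.diag(abar)
       + (np.abs(A) + abar).sum(axis=1) - np.abs(np.diag(A)) - np.diag(abar))
k   = np.min(-num) / mu                     # gain estimate (48)
S   = np.diag(np.sign(np.diag(IL)))         # sign matrix from (44)

# Worst admissible Ahat for row dominance: +abar on the diagonal,
# sign-matched abar off the diagonal
Ahat = abar * np.sign(A)
np.fill_diagonal(Ahat, np.diag(abar))

Acl    = A + Ahat + k * IL @ S              # closed-loop matrix (45)
margin = np.min(-np.diag(Acl) - (np.abs(Acl).sum(axis=1) - np.abs(np.diag(Acl))))
print(margin >= nu - 1e-9)                  # super-stable with margin >= nu
```

The closed-loop matrix (45) is super-stable with margin at least *ν*, so by (22) the stability margin requirement is met for this realization.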

In the next subsection, a class of systems is extracted from the set of parametrically uncertain non-elementary systems whose nominal model satisfies conditions (28), for which a guaranteed stability margin can be provided by the proposed feedback approach.

#### *3.2. Formalisation of a Class of Acceptable Non-Elementary Systems*

Let us consider the possibility of combining the concepts of super-stability and the RF in the robust synthesis of the parametrically uncertain non-elementary system (33), (34), under the assumption that, in its nominal model (1), the pair (*A*, *B*) is controllable and has a controllability indicator equal to two (26). As proved in Lemma 2, in this case the RF of the nominal system consists of two elementary subsystems, which allows one to synthesize a super-stable closed-loop system in terms of the discrepancies and, as a consequence, to provide a guaranteed stability margin in the original closed-loop system.

In order to obtain the RF structure for system (33), it is necessary to impose additional constraints on the uncertain matrices *Â* and *B̂*. When they are fulfilled, system (33) will not only be controllable but will also retain the structural properties of the nominal system (26); more specifically, it will have the same placement of the basis columns of the controllability matrix and, hence, the same structural zeros in the RF. Thus, it is necessary to formalize the conditions under which, as a result of the nonsingular linear transformation *Tx* = *x̄* (12), which is determined by the matrices of the nominal system (1), the uncertain system (33) will be represented in a form similar to RF (6), (26), that is:

$$\begin{array}{l} \dot{x}\_{1} = (A\_{11} + \hat{A}\_{11})x\_{1} + (A\_{12} + \hat{A}\_{12})x\_{2}, \\ \dot{x}\_{2} = (A\_{21} + \hat{A}\_{21})x\_{1} + (A\_{22} + \hat{A}\_{22})x\_{2} + (B\_{2} + \hat{B}\_{2})u, \end{array} \tag{49}$$

where

$$\begin{array}{l} \text{rank}(B\_2 + \hat{B}\_2) = \text{rank}B\_2 = \text{rank}B = \dim x\_2 = m\_0, \\ \text{rank}(A\_{12} + \hat{A}\_{12}) = \text{rank}A\_{12} = \dim x\_1 = n - m\_0, \end{array} \tag{50}$$

where the matrices *A<sub>ij</sub>*, *B*<sub>2</sub> are known and coincide with the corresponding RF matrices (6) of the nominal system (1), (26); the elements of the matrices *Â<sub>ij</sub>*, *B̂*<sub>2</sub> are constant and unknown, and the bounds of the intervals to which their values belong are recalculated, with regard to (34), by the formulas

$$T(A+\hat{A})T^{-1} = \begin{pmatrix} A\_{11} + \hat{A}\_{11} & A\_{12} + \hat{A}\_{12} \\ A\_{21} + \hat{A}\_{21} & A\_{22} + \hat{A}\_{22} \end{pmatrix}, \ T(B+\hat{B}) = \begin{pmatrix} O \\ B\_2 + \hat{B}\_2 \end{pmatrix}.\tag{51}$$

**Lemma 4.** *Let the pair* (*A*, *B*) *in the nominal system* (1) *be controllable and characterized by the controllability indices* (26)*. If, in system* (33)*, in all uncertainty intervals* (34) *the following rank conditions for the pair* ((*A* + *Â*), (*B* + *B̂*)) *are met:*

$$\begin{array}{l} \text{rank}B = \text{rank}(B + \hat{B}) = \text{rank}(B \ (B + \hat{B})) = m\_0, \\ \text{rank}(B \ AB) = \text{rank}((B + \hat{B}) \ (A + \hat{A})(B + \hat{B})) \\ = \text{rank}(B \ AB \ (B + \hat{B}) \ (A + \hat{A})(B + \hat{B})) = m\_0 + m\_1 = n \\ \Leftrightarrow (\text{rank}(B \ AB) = \text{rank}(B \ AB \ (A + \hat{A})(B + \hat{B})) = m\_0 + m\_1 = n), \end{array} \tag{52}$$

*then by means of a non-singular change of the variables Tx* = *x* (12) *and transformations* (51)*, where T depends only on the matrices of the nominal system A*, *B, system* (33) *will be represented in the form of RF* (49)*, where conditions* (50) *are met.*

**Proof 4.** The first condition in (52), rank*B* = rank(*B* (*B* + *B̂*)) = *m*<sub>0</sub>, means that the columns of the matrix *B* + *B̂* are linear combinations of the columns of the matrix *B*; hence, the uncertain matrix can be represented as

$$B\_{n \times m} + \hat{B}\_{n \times m} = B\Lambda\_{0(m \times m)},\tag{53}$$

where Λ<sub>0</sub> is an uncertain matrix, *m*<sub>0</sub> ≤ rankΛ<sub>0</sub> ≤ *m*. The second condition in (52), rewritten with (53) as

$$\text{rank}(B \ AB) = \text{rank}(B\Lambda\_0 \ (A+\hat{A})B\Lambda\_0) = \text{rank}(B \ AB \ (A+\hat{A})B\Lambda\_0) = m\_0 + m\_1,$$

means that the columns of the matrix (*A* + *A*ˆ)*B*Λ<sup>0</sup> are linear combinations of the columns of the matrix (*B AB*) and are represented in the form of

$$(A + \hat{A})B\Lambda\_0 = (B \cdot AB)\Lambda\_1 = B\Lambda\_{10} + AB\Lambda\_{11}, \\ \Lambda\_1 = \begin{pmatrix} \Lambda\_{10(m \times m)} \\ \Lambda\_{11(m \times m)} \end{pmatrix}, \text{rank}\Lambda\_{11} \ge n - m\_0.$$

The columns of the matrix *B*Λ<sub>10</sub> are linear combinations of the columns of the matrix *B*Λ<sub>0</sub> = *B*<sub>*n*×*m*</sub> + *B̂*<sub>*n*×*m*</sub> and can be represented as *B*Λ<sub>10</sub> = *B*Λ<sub>0</sub>Λ<sub>00</sub>. Consequently, (*B*Λ<sub>0</sub> *AB*Λ<sub>11</sub>) ∼ (*B*Λ<sub>0</sub> *AB*Λ<sub>11</sub> *B*Λ<sub>0</sub>Λ<sub>00</sub>), and then rank(*B*Λ<sub>0</sub> *AB*Λ<sub>11</sub>) = *m*<sub>0</sub> + *m*<sub>1</sub> = *n*.

Thus, the controllability matrix of the pair ((*A* + *A*ˆ), (*B* + *B*ˆ)) when conditions (50) are fulfilled has a full rank and can be represented in the form

$$((B + \hat{B}) \ (A + \hat{A})(B + \hat{B})) = (B\Lambda\_0 \ \ B\Lambda\_0\Lambda\_{00} + AB\Lambda\_{11}),$$

where the elements of the matrices Λ<sub>0</sub>, Λ<sub>00</sub>, Λ<sub>11</sub> are unknown. Let us denote *AB* = *P*, swap the blocks of the controllability matrix, *Ŵ* = (*B*Λ<sub>0</sub>Λ<sub>00</sub> + *P*Λ<sub>11</sub> *B*Λ<sub>0</sub>), and multiply this matrix on the left by the transition-to-RF matrix (12), which depends only on the matrix elements of the nominal system:

$$\begin{aligned} T\hat{W} &= T\_d T\_p (B\Lambda\_0 \Lambda\_{00} + P\Lambda\_{11} \ \ B\Lambda\_0) = T\_d \left( \begin{pmatrix} \widetilde{B}\_1 \\ B\_2 \end{pmatrix} \Lambda\_0 \Lambda\_{00} + \begin{pmatrix} \widetilde{P}\_1 \\ \widetilde{P}\_2 \end{pmatrix} \Lambda\_{11} \ \ \begin{pmatrix} \widetilde{B}\_1 \\ B\_2 \end{pmatrix} \Lambda\_0 \right) \\ &= \left( \begin{pmatrix} O \\ B\_2 \end{pmatrix} \Lambda\_0 \Lambda\_{00} + \begin{pmatrix} P\_1 \\ P\_2 \end{pmatrix} \Lambda\_{11} \ \ \begin{pmatrix} O \\ B\_2 \end{pmatrix} \Lambda\_0 \right) = \overline{\hat{W}}. \end{aligned}$$

In the obtained matrix, the right-hand block corresponds to the transformation of the matrix (*B* + *B*ˆ). With (53), it follows that

$$T(B+\hat{B}) = \begin{pmatrix} O \\ B\_2 \end{pmatrix} \Lambda\_0 = \begin{pmatrix} O \\ B\_2 \Lambda\_0 \end{pmatrix} = \begin{pmatrix} O \\ B\_2 + \hat{B}\_2 \end{pmatrix}, \ \text{rank}(B\_2 + \hat{B}\_2) = \text{rank}B\_2 = m\_0,$$

i.e., the first condition in (50) is satisfied.

According to the scheme given in Lemma 2, let us perform a similarity transformation

$$T(A+\hat{A})T^{-1} = \left(T((A+\hat{A})^2(B+\hat{B}) \ \ (A+\hat{A})(B+\hat{B}))\right)\overline{\hat{W}}^{+} = \begin{pmatrix} A\_{11} + \hat{A}\_{11} & A\_{12} + \hat{A}\_{12} \\ A\_{21} + \hat{A}\_{21} & A\_{22} + \hat{A}\_{22} \end{pmatrix},$$

where *A*<sub>12</sub> + *Â*<sub>12</sub> = *P*<sub>1</sub>Λ<sub>11</sub>(*B*<sub>2</sub>Λ<sub>0</sub>)<sup>+</sup> ⇒ rank*A*<sub>12</sub> = rank(*A*<sub>12</sub> + *Â*<sub>12</sub>) = *n* − *m*<sub>0</sub>, i.e., the second condition in (50) is satisfied. Hence, system (33) is representable in the form (49), whose structure corresponds to the structure of the RF of the nominal system (1), (26). Lemma 4 is proved.

Thus, a class of systems (33), (52) is defined that can be reduced to the RF (49), consisting of two elementary blocks (50), in a way invariant to the unknown parameters (34). Let us adopt without proof the converse statement of Lemma 4, which defines a constructive way to check the rank conditions (52). Let the pair (*A*, *B*) in system (1) be characterized by the controllability indices (26). If the change of variables (12) leads system (33) to RF (49), (50), then in all uncertainty intervals (34) the rank conditions (52) for the pair ((*A* + *Â*), (*B* + *B̂*)) are fulfilled.

However, as is shown in the previous subsection, satisfaction of conditions (50) and RF (49), (50) are necessary but, in general, not sufficient for solving the problem (35) in the framework of the technique we used.

Let us first distinguish particular cases that do not require any additional constraints from the theoretical point of view.

If in (26) *n* = 2, *m*<sup>0</sup> = 1, then the RF will consist of two first-order subsystems, where

$$\begin{array}{l} \left(\mathsf{rank}\left(B\_{2} + \hat{B}\_{2}\right) = \mathsf{rank}B\_{2} = 1\right) \Leftrightarrow \left(b\_{2} \neq 0 \text{ and } b\_{2} + \hat{b}\_{2} \neq 0\right);\\ \left(\mathsf{rank}\left(A\_{12} + \hat{A}\_{12}\right) = \mathsf{rank}A\_{12} = 1\right) \Leftrightarrow \left(a\_{12} \neq 0 \text{ and } a\_{12} + \hat{a}\_{12} \neq 0\right). \end{array}$$

Then, similarly to (42), using virtual control and subsequent variable change,

$$x\_{2} = k\_{1}\text{sign}(a\_{12} + \hat{a}\_{12})x\_{1}, \ e\_{2} = x\_{2} - k\_{1}\text{sign}(a\_{12} + \hat{a}\_{12})x\_{1}, \tag{54}$$

the first subsystem of the RF is stabilized, and the second subsystem is stabilized with the real control $u = k\_2\text{sign}(b\_2 + \hat{b}\_2)e\_2$. In another particular case, for arbitrary $m\_0 \ge n - m\_0 > 1$ in system (49), $\hat{A}\_{12} \equiv O$ and $\hat{B}\_2 \equiv O$. Then, the virtual and real controls are chosen in a form similar to (37). The last particular case is a combination of the first two, where $m\_0 = n - 1 > 1$ and $\hat{B}\_2 \equiv O$.

Only for the systems with the mentioned properties is the fulfillment of conditions (52) necessary and sufficient to ensure the super-stability of the closed-loop system, with the help of the linear static feedback in terms of the discrepancies.

For the general case of systems from the considered class, sufficient conditions similar to (46) are formulated in terms of elements of the RF matrices (49). Let us first form in the system (49) the virtual and real control analogous to (44):

$$\begin{array}{l} x\_{2} = F\_{1}x\_{1} = k\_{1}A\_{12}^{+}S\_{1}x\_{1}, \ e\_{1} := x\_{1}, \ e\_{2} = x\_{2} - F\_{1}x\_{1}, \\ S\_{1} = \text{diag}\{\text{sign}(1 + l\_{ii}^{1})\}, \ L\_{1(n - m\_0) \times (n - m\_0)} = \hat{A}\_{12}A\_{12}^{+} = (l\_{ij}^{1}); \\ u = K\_{2}e\_{2} = k\_{2}B\_{2}^{+}S\_{2}e\_{2}, \ u = Ke, \ K = (O \quad K\_{2}), \ k\_{1,2} = \text{const}, \\ S\_{2} = \text{diag}\{\text{sign}(1 + l\_{ii}^{2})\}, \ L\_{2(m\_0 \times m\_0)} = \hat{B}\_{2}B\_{2}^{+} = (l\_{ij}^{2}); \end{array} \tag{55}$$

and let us change the variables (14), (15) and make a closed-loop RF of the uncertain system (49), (55) in discrepancies

$$\begin{array}{l}\dot{e}\_{1} = (A\_{11} + \hat{A}\_{11} + k\_{1}(I + L\_{1})S\_{1})e\_{1} + (A\_{12} + \hat{A}\_{12})e\_{2},\\ \dot{e}\_{2} = (C\_{21} + \hat{C}\_{21})e\_{1} + (C\_{22} + \hat{C}\_{22} + k\_{2}(I + L\_{2})S\_{2})e\_{2},\end{array} \tag{56}$$

where the ranges of elements of the unknown matrices are assumed to be symmetric and are calculated from (34), considering the performed transformations (12), (15), which depend only on the matrices of the nominal system (1) and the selected *k*1.

From Lemma 3, it follows that by successively selecting at first the parameter *k*<sup>1</sup> = const, and then *k*<sup>2</sup> = const, the system (56) can be made super-stable with a given margin of super-stability *ν* ≥ *η*d, if matrices *In*−*m*<sup>0</sup> + *L*1, *Im*<sup>0</sup> + *L*<sup>2</sup> (55) have dominant diagonals

$$\min\_{i = \overline{1, n-m\_0}} \{ |1 + l\_{ii}^1| - \sum\_{j=1, j \neq i}^{n-m\_0} \left| l\_{ij}^1 \right| \} = \mu\_1 > 0, \quad \min\_{i = \overline{1, m\_0}} \{ |1 + l\_{ii}^2| - \sum\_{j=1, j \neq i}^{m\_0} \left| l\_{ij}^2 \right| \} = \mu\_2 > 0. \tag{57}$$
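The dominance conditions (57) are straightforward to evaluate numerically. The helper below is our own illustrative sketch (not from the paper), assuming NumPy; condition (57) holds exactly when the returned margin is positive:

```python
import numpy as np

def dominance_margin(L):
    """mu = min_i (|1 + l_ii| - sum_{j != i} |l_ij|) for the matrix I + L."""
    L = np.atleast_2d(np.asarray(L, dtype=float))
    M = np.eye(L.shape[0]) + L
    diag = np.abs(np.diag(M))
    off_diag_sums = np.sum(np.abs(M), axis=1) - diag
    return float(np.min(diag - off_diag_sums))

# Worst case of Example 3 below: L1 = alpha and L2 = beta * I with
# alpha = beta = -0.1, which gives mu1 = mu2 = 0.9.
print(dominance_margin([[-0.1]]))          # 0.9
print(dominance_margin(-0.1 * np.eye(2)))  # 0.9
```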

Then, similarly to (47), a joint system of inequalities can be obtained, based on which the feedback parameters are successively specified in the form of (48). Taking into account the notations

$$\begin{split} &A\_{11(n-m\_0)\times(n-m\_0)} = (a\_{ij}^{11}), \ A\_{12(n-m\_0)\times m\_0} = (a\_{ij}^{12}), \ C\_{21(m\_0\times(n-m\_0))} = (c\_{ij}^{21}), \ C\_{22(m\_0\times m\_0)} = (c\_{ij}^{22}), \\ &\hat{A}\_{11} = (\hat{a}\_{ij}^{11}), \ \hat{A}\_{12} = (\hat{a}\_{ij}^{12}), \ \hat{C}\_{21} = (\hat{c}\_{ij}^{21}), \ \hat{C}\_{22} = (\hat{c}\_{ij}^{22}), \\ &\left| \hat{a}\_{ij}^{11} \right| \leq \overline{a}\_{ij}^{11}, \ \left| \hat{a}\_{ij}^{12} \right| \leq \overline{a}\_{ij}^{12}, \ \left| \hat{c}\_{ij}^{21} \right| \leq \overline{c}\_{ij}^{21}, \ \left| \hat{c}\_{ij}^{22} \right| \leq \overline{c}\_{ij}^{22}, \end{split}$$

we have

$$\begin{split} k\_{1} &\leq \min\_{i=\overline{1,n-m\_{0}}} \{-(\nu + a\_{ii}^{11} + \overline{a}\_{ii}^{11} + \sum\_{j=1,j\neq i}^{n-m\_{0}} ( \left| a\_{ij}^{11} \right| + \overline{a}\_{ij}^{11} ) + \sum\_{j=1}^{m\_{0}} ( \left| a\_{ij}^{12} \right| + \overline{a}\_{ij}^{12} ))\} / \mu\_{1}, \\ k\_{2} &\leq \min\_{i=\overline{1,m\_{0}}} \{-(\nu + c\_{ii}^{22} + \overline{c}\_{ii}^{22} + \sum\_{j=1,j\neq i}^{m\_{0}} ( \left| c\_{ij}^{22} \right| + \overline{c}\_{ij}^{22} ) + \sum\_{j=1}^{n-m\_{0}} ( \left| c\_{ij}^{21} \right| + \overline{c}\_{ij}^{21} ))\} / \mu\_{2}. \end{split} \tag{58}$$

The control law based on (55), expressed in the variables of the initial system (19), depends only on the matrices of the nominal system and the selected parameters (58), and it ensures stabilization of the initial parametrically uncertain system (33) with a guaranteed stability margin (35).

The theoretical statements presented in this subsection and the decomposition synthesis procedure for systems with a controllability indicator equal to two (26) can similarly be extended to non-elementary controllable systems of the general form (24).

#### **4. Simulations**

We consider a mathematical model of the control plant of the form

$$\dot{\mathbf{x}} = A\mathbf{x} + Bu, \quad A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 \\ 2 & 1 \\ 0 & 1 \end{pmatrix}, \quad \dim \mathbf{x} = n = 3, \quad \dim u = m = 2. \tag{59}$$

Let us investigate the rank structure of the controllability matrix of the system (59) according to scheme (24):

$$\text{rank}B = \text{rank}\begin{pmatrix} 1 & 0 \\ 2 & 1 \\ 0 & 1 \end{pmatrix} = 2 \neq 0,\\ \text{rank}(B \mid AB) = \text{rank}\begin{pmatrix} 1 & 0 & 3 & 1 \\ 2 & 1 & 2 & 1 \\ 0 & 1 & 1 & 1 \end{pmatrix} = 2 + 1 = 3.$$

Pair (*A*, *B*) is controllable and has a controllability indicator equal to 2. System (59) belongs to the valid class (26), and its RF will consist of two elementary subsystems of the first and second orders. Using system (59) as an example, let us demonstrate the decomposition procedures developed in Sections 2 and 3 for the synthesis of modal and robust control based on the transition to the RF.
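The rank structure above is easy to reproduce numerically; a minimal sketch (our code, with the matrices of (59), assuming NumPy):

```python
import numpy as np

A = np.array([[1., 1., 0.],
              [0., 1., 0.],
              [1., 0., 1.]])
B = np.array([[1., 0.],
              [2., 1.],
              [0., 1.]])

# Rank structure of the controllability matrix, following scheme (24)
r_B = np.linalg.matrix_rank(B)
r_BAB = np.linalg.matrix_rank(np.hstack([B, A @ B]))
print(r_B, r_BAB)  # 2 3 -> (A, B) controllable with controllability indicator 2
```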

**Example 1.** *For the system* (59)*, the goal is to synthesize a linear feedback that provides a given spectrum in a closed-loop system σ*<sup>d</sup> = {−1; −1 ± 3*j*}. *To solve the problem, we use the synthesis of modal control based on transition to RF (Procedure 1)*.

1.a. In the matrix *B*, the bottom two rows are linearly independent and form a basis. It is not necessary to rearrange the rows. We assume

$$B\_2 = \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix}, \ T\_p = I, \ T = T\_a, \ \overline{x} = x.$$

1.b. Using the second Formula (10), we find the cancellation matrix

$$B\_2^\* = \tilde{B}\_1 B\_2^{-1} = \begin{pmatrix} 1 & 0 \end{pmatrix} \frac{1}{2}\begin{pmatrix} 1 & -1 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 0.5 & -0.5 \end{pmatrix}$$

and after performing the transformation (11) to the matrix

$$T = \begin{pmatrix} 1 & -0.5 & 0.5 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, T^{-1} = \begin{pmatrix} 1 & 0.5 & -0.5 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{60}$$

we obtain an equivalent representation of system (59) in RF (6), which has the form

$$\begin{aligned} \dot{\mathbf{x}}\_1 &= 1.5\mathbf{x}\_1 + (1.25 \ -0.25)\mathbf{x}\_2, \\ \dot{\mathbf{x}}\_2 &= \begin{pmatrix} 0 \\ 1 \end{pmatrix} \mathbf{x}\_1 + \begin{pmatrix} 1 & 0 \\ 0.5 & 0.5 \end{pmatrix} \mathbf{x}\_2 + \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix} \mathbf{u}. \end{aligned} \tag{61}$$
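The transition to the RF can be verified directly: with the transformation matrix (60), the pair $(TAT^{-1}, TB)$ must reproduce the block structure of (61), the first row of $TB$ being annihilated. A sketch (our code, assuming NumPy):

```python
import numpy as np

A = np.array([[1., 1., 0.], [0., 1., 0.], [1., 0., 1.]])
B = np.array([[1., 0.], [2., 1.], [0., 1.]])
T = np.array([[1., -0.5, 0.5],
              [0., 1., 0.],
              [0., 0., 1.]])  # transformation matrix (60)

A_rf = T @ A @ np.linalg.inv(T)  # state matrix of the regular form (61)
B_rf = T @ B                     # input matrix: first row becomes zero

print(np.round(A_rf, 4))  # first row: [ 1.5  1.25 -0.25]
print(np.round(B_rf, 4))  # [[0 0], [2 1], [0 1]]
```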

2.a. (Procedure 2) In the first subsystem, we take the real eigenvalue from the given spectrum as the reference matrix: *A*<sup>1</sup> = −1. The local feedback *x*<sup>2</sup> = *F*<sup>1(2×1)</sup>*x*<sup>1</sup>, providing (13), has infinitely many realizations. A solution can be obtained similarly to the first equality (5):

$$A\_{11} + A\_{10}F\_1 = A\_1 \Rightarrow F\_1 = A\_{10}^+(A\_1 - A\_{11}) = \begin{pmatrix} f\_1 \\ f\_2 \end{pmatrix} = \begin{pmatrix} -25/13 \\ 5/13 \end{pmatrix}, \ A\_{10}^+ = \begin{pmatrix} 10/13 \\ -2/13 \end{pmatrix},$$

which is inconvenient for calculations. To determine *F*1, we use a direct method:

$$A\_{11} + A\_{10}F\_1 = A\_1 \Rightarrow 1.5 + \begin{pmatrix} 1.25 & -0.25 \end{pmatrix} \begin{pmatrix} f\_1 \\ f\_2 \end{pmatrix} = -1 \Leftrightarrow f\_2 = 10 + 5f\_1.$$

Let us assume, for example *F*<sup>1</sup> = (−1 5) *<sup>T</sup>*. After performing the transformation (15), we obtain the RF closed by the local relation (16), in the form

$$\dot{e}\_1 = -e\_1 + (1.25 \ -0.25)e\_2.$$

$$\dot{e}\_2 = \left(\begin{array}{cc} -2 \\ 8 \end{array}\right) e\_1 + \left(\begin{array}{cc} 2.25 & -0.25 \\ -5.75 & 1.75 \end{array}\right) e\_2 + \left(\begin{array}{cc} 2 & 1 \\ 0 & 1 \end{array}\right) \mu.$$

2.b. For the remaining complex-conjugate pair of the given spectrum *λ* = −1 ± 3*j*, we form a reference matrix, e.g., as the real Jordan block $A\_2 = \begin{pmatrix} -1 & 3 \\ -3 & -1 \end{pmatrix}$, and generate the feedback from the second Formula (17) in the form of

$$\begin{aligned} u &= \begin{pmatrix} 0.5 & -0.5 \\ 0 & 1 \end{pmatrix} \left( \begin{pmatrix} 2 \\ -8 \end{pmatrix} e\_1 + \begin{pmatrix} -2.25 & 0.25 \\ 5.75 & -1.75 \end{pmatrix} e\_2 + \begin{pmatrix} -1 & 3 \\ -3 & -1 \end{pmatrix} e\_2 \right), \\ u &= Ke = \begin{pmatrix} 5 & -3 & 3 \\ -8 & 2.75 & -2.75 \end{pmatrix} e, \end{aligned}$$

which leads to a closed system of discrepancies (18), that is,

$$
\dot{e}\_1 = -e\_1 + (1.25 \ -0.25)e\_2,\\
\dot{e}\_2 = \begin{pmatrix} -1 & 3 \\ -3 & -1 \end{pmatrix} e\_2.
$$

2.c. Considering the transformations performed, let us find the feedback matrix and form a modal state control law for the initial system in form (19)

$$F\_{2 \times 3} = KT\_e T = \begin{pmatrix} 5 & -3 & 3 \\ -8 & 2.75 & -2.75 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ -5 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & -0.5 & 0.5 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \tag{62}$$

$$u = Fx = \begin{pmatrix} -13 & 3.5 & -3.5 \\ 8.5 & -1.5 & 1.5 \end{pmatrix} x,$$

which provides a solution to the problem: *σ*(*A* + *BF*) = *σ<sup>d</sup>* = {−1; −1 ± 3*j*}.
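A numerical check (our code, assuming NumPy) confirms that the feedback (62) assigns the required spectrum:

```python
import numpy as np

A = np.array([[1., 1., 0.], [0., 1., 0.], [1., 0., 1.]])
B = np.array([[1., 0.], [2., 1.], [0., 1.]])
F = np.array([[-13., 3.5, -3.5],
              [8.5, -1.5, 1.5]])  # modal feedback (62)

w = np.linalg.eigvals(A + B @ F)
print(np.sort(w.real))  # ≈ [-1, -1, -1]
print(np.sort(w.imag))  # ≈ [-3, 0, 3]
```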

Figure 1 shows the behavior of the variables *x*(*t*) = (*x*1(*t*), *x*2(*t*), *x*3(*t*))<sup>T</sup> and controls *u*(*t*) = (*u*1(*t*), *u*2(*t*))<sup>T</sup> in the closed-loop system (59), (62) with *x*(0) = (0.5, 0.5, 0.5)<sup>T</sup>.

**Figure 1.** (**a**) Plots of *x*1(*t*), *x*2(*t*), *x*3(*t*); (**b**) Plots of *u*1(*t*), *u*2(*t*) in the closed-loop system (59), (62) with *x*(0) = (0.5, 0.5, 0.5)<sup>T</sup>.

**Example 2.** *For system* (59)*, the problem is to synthesize a linear feedback that provides a given margin of stability in the closed-loop system η* ≥ *η*<sup>d</sup> = 1. *To solve this problem, we use the procedure for synthesis of a super-stable closed-loop system* (18) *based on the transition to the RF* (61)*. We assign the numerical values of the super-stability margin and the elements of the reference matrices of the closed-loop system* (18) *on the basis of equalities* (32): *ν* = *η*<sup>d</sup> = 1, $A\_1 = -(\nu + |a\_{11}^{12}| + |a\_{12}^{12}|) = -(1 + 1.25 + 0.25) = -2.5$; $a\_1^2 = a\_2^2 = -\nu = -1$*, which will ensure that the closed-loop system is super-stable.*

$$
\dot{e}\_1 = -2.5e\_1 + (1.25 \ -0.25)e\_2,\\
\dot{e}\_2 = \begin{pmatrix} -1 & 0\\ 0 & -1 \end{pmatrix} e\_2.
$$

The matrix of this system has a spectrum of *σ* = {−1; −1; −2.5}. This spectrum, and hence a given stability margin, will be provided in the original closed-loop system (59) by the control (19). To determine the local feedback matrix *F*1, we also use the direct method:

$$A\_{11} + A\_{10}F\_1 = A\_1 \Rightarrow 1.5 + 1.25f\_1 - 0.25f\_2 = -2.5 \Leftrightarrow f\_2 = 16 + 5f\_1.$$

Let us take, for example, *F*<sup>1</sup> = (−2 6) *<sup>T</sup>*. After performing the transformation (15), we obtain the RF of the closed-loop system (16) in the form

$$\begin{aligned} \dot{e}\_1 &= -2.5e\_1 + (1.25 \ -0.25)e\_2, \\ \dot{e}\_2 &= \left( \begin{array}{cc} -7 \\ 18 \end{array} \right) e\_1 + \left( \begin{array}{cc} 3.5 & -0.5 \\ -7 & 2 \end{array} \right) e\_2 + \left( \begin{array}{cc} 2 & 1 \\ 0 & 1 \end{array} \right) \mu. \end{aligned}$$

The second subsystem of this system gives the control laws for the transformed (17) and initial variables (19) as

$$u = \begin{pmatrix} 0.5 & -0.5 \\ 0 & 1 \end{pmatrix} \left( \begin{pmatrix} 7 \\ -18 \end{pmatrix} e\_1 + \begin{pmatrix} -3.5 & 0.5 \\ 7 & -2 \end{pmatrix} e\_2 + \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} e\_2 \right),$$

$$u = Ke = \begin{pmatrix} 12.5 & -5.75 & 1.75 \\ -18 & 7 & -3 \end{pmatrix} e\_\prime$$

$$F = KT\_e T = \begin{pmatrix} 12.5 & -5.75 & 1.75 \\ -18 & 7 & -3 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -6 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & -0.5 & 0.5 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \tag{63}$$

$$u = Fx = \begin{pmatrix} -9.5 & -1 & -3 \\ 14 & 0 & 4 \end{pmatrix} x.$$

The matrix of a closed-loop system (59), (63), expressed as

$$A + BF = \begin{pmatrix} -8.5 & 0 & -3 \\ -5 & -1 & -2 \\ 15 & 0 & 5 \end{pmatrix}$$

is not super-stable, but the system has a given margin of stability:

$$\sigma(A+BF) = \{-1; -1; -2.5\}, \quad \min\_i\{-\mathrm{Re}\,\lambda\_i(A+BF)\} = 1 = \eta\_{\mathrm{d}}.$$
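This margin is easy to verify numerically (our code, assuming NumPy):

```python
import numpy as np

A = np.array([[1., 1., 0.], [0., 1., 0.], [1., 0., 1.]])
B = np.array([[1., 0.], [2., 1.], [0., 1.]])
F = np.array([[-9.5, -1., -3.],
              [14., 0., 4.]])  # feedback (63)

w = np.linalg.eigvals(A + B @ F)
eta = min(-w.real)  # stability margin of the closed-loop system
print(np.round(np.sort(w.real), 6))  # ≈ [-2.5, -1, -1]
print(round(eta, 6))                 # ≈ 1.0 = eta_d
```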

Figure 2 shows the behavior of the variables *x*(*t*) = (*x*1(*t*), *x*2(*t*), *x*3(*t*))<sup>T</sup> and controls *u*(*t*) = (*u*1(*t*), *u*2(*t*))<sup>T</sup> in the closed-loop system (59), (63) with *x*(0) = (0.5, 0.5, 0.5)<sup>T</sup>.

**Figure 2.** (**a**) Plots of *x*1(*t*), *x*2(*t*), *x*3(*t*); (**b**) Plots of *u*1(*t*), *u*2(*t*) in the closed-loop system (59), (63) with *x*(0) = (0.5, 0.5, 0.5)<sup>T</sup>.

Compared to system (59), (62) (see Figure 1), the transients of the closed-loop system (59), (63) with the real spectrum are not oscillatory but aperiodic; however, the range of variation in all the variables has increased by about 2.5 times, while the regulation time has not changed significantly.

**Example 3.** *With the nominal system* (59) *we will consider a parametrically indeterminate system* (33)*, where A*ˆ = *αA, B*ˆ = *βB. Parameters α*, *β are constant and unknown, their values belong to closed symmetric intervals with known boundaries*:

$$|\alpha| \le \overline{\alpha} = 0.1, \ |\beta| \le \overline{\beta} = 0.1. \tag{64}$$

The problem is to synthesize a linear feedback that provides a guaranteed margin of stability in a closed-loop system *η* ≥ *η*<sup>d</sup> = 1 in all uncertainty intervals. In this system,

$$A + \hat{A} = A(1+\alpha), \; B + \hat{B} = B(1+\beta), \; \alpha \neq -1, \; \beta \neq -1, \tag{65}$$

conditions (53), (52) of Lemma 4 are met. The uncertain system is controllable in all uncertainty intervals and keeps the structural controllability properties of the nominal system (59). Hence, the uncertain system is representable in the form of RF (49)–(50) by transformation (12), (51) with matrix (60), where due to (65), *A*ˆ*ij* = *αAij*, *i*, *j* = 1, 2, *B*ˆ2 = *βB*2.

Let us check that the condition (57) is met:

$$\begin{aligned} L\_1 &= \hat{A}\_{12} A\_{12}^+ = \alpha\begin{pmatrix} 5/4 & -1/4 \end{pmatrix} \begin{pmatrix} 10/13 \\ -2/13 \end{pmatrix} = \alpha = l\_{11}^1; \\ L\_2 &= \hat{B}\_2 B\_2^{-1} = \beta \begin{pmatrix} 2 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0.5 & -0.5 \\ 0 & 1 \end{pmatrix} = \beta I, \ l\_{11}^2 = l\_{22}^2 = \beta, \ l\_{12}^2 = l\_{21}^2 = 0. \end{aligned} \tag{66}$$

Due to (64), 1 + *α* > 0 and 1 + *β* > 0, so the sufficient condition (57) is fulfilled with *μ*1 = *μ*2 = 0.9; hence, in the RF of the uncertain system, super-stability can be provided by means of the feedback (55), where

$$S\_1 = \text{sign}(1+\mathfrak{a}) = 1,\ S\_2 = \text{diag}\{\text{sign}(1+\beta)\} = I.$$

In the first subsystem of the uncertain RF, $\dot{x}\_1 = (6/4)(1+\alpha)x\_1 + \begin{pmatrix} 5/4 & -1/4 \end{pmatrix}(1+\alpha)x\_2$, let us form the virtual control in the form of (55),

$$\mathbf{x}\_{2} = F\_{1}\mathbf{x}\_{1} = k\_{1}A\_{12}^{+}\mathbf{S}\_{1}\mathbf{x}\_{1} = k\_{1}\begin{pmatrix} 10/13\\ -2/13 \end{pmatrix}\mathbf{x}\_{1}.\tag{67}$$

With variable changes *e*<sup>1</sup> := *x*1, *e*<sup>2</sup> = *x*<sup>2</sup> − *F*1*x*<sup>1</sup> we obtain

$$
\dot{e}\_1 = (1+\alpha)((A\_{11}+k\_1)e\_1 + A\_{12}e\_2) = (1+\alpha)\Big((1.5+k\_1)e\_1 + \begin{pmatrix} 1.25 & -0.25 \end{pmatrix}e\_2\Big).
$$

As can be seen, in this subsystem the choice of gain *k*<sup>1</sup> does not depend on undefined parameters. Let us assume *ν* = *η*<sup>d</sup> = 1; then, similarly to (21), we have

$$-(1.5 + k\_1) - (1.25 + 0.25) \ge 1 \Rightarrow k\_1 \le -4.1$$

For the convenience of the calculation (67), let us assume *k*<sup>1</sup> = −13/2 = −6.5, then *F*<sup>1</sup> = (−5 1) *<sup>T</sup>*. Let us perform transformations (15) taking into account (61), (65)–(67), forming the control law in the form (55), that is,

$$
u = K\_2 e\_2 = k\_2 B\_2^{-1} S\_2 e\_2 = k\_2 \begin{pmatrix} 0.5 & -0.5 \\ 0 & 1 \end{pmatrix} e\_2, \tag{68}
$$

and we obtain a closed RF of the uncertain system in terms of discrepancies in the form (56), namely,

$$\begin{aligned} \dot{e}\_1 &= (1+\alpha)(-5e\_1 + \begin{pmatrix} 1.25 & -0.25 \end{pmatrix}e\_2), \\ \dot{e}\_2 &= (1+\alpha)\begin{pmatrix} -30 \\ 3 \end{pmatrix} e\_1 + \left( (1+\alpha) \begin{pmatrix} 7.25 & -1.25 \\ -0.75 & 0.75 \end{pmatrix} + k\_2 (1+\beta) \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right) e\_2. \end{aligned}$$

From the second inequality (58), we find the second gain

$$\begin{aligned} \overline{k}\_{21} &\le -(1 + (1 + \overline{\alpha})(30 + 7.25 + 1.25))/(1 - \overline{\beta}) \approx -48.2, \\ \overline{k}\_{22} &\le -(1 + (1 + \overline{\alpha})(3 + 0.75 + 0.75))/(1 - \overline{\beta}) \approx -6.62, \\ k\_2 &\le \min\{-48.2; -6.62\} = -48.2. \end{aligned}$$

Let us take *k*<sup>2</sup> = −50. Then due to (68), (19), we get

$$u = Ke, \ K = \begin{pmatrix} O & K\_2 \\ \end{pmatrix} = \begin{pmatrix} 0 & -25 & 25 \\ 0 & 0 & -50 \\ \end{pmatrix},$$

$$F = KT\_e T = \begin{pmatrix} 0 & -25 & 25 \\ 0 & 0 & -50 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 5 & 1 & 0 \\ -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & -0.5 & 0.5 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Control law

$$u = Fx = \begin{pmatrix} -150 & 50 & -50 \\ 50 & -25 & -25 \end{pmatrix} x \tag{69}$$

provides in the initial uncertain system a guaranteed margin of stability *η* ≥ *ν* = *η*<sup>d</sup> = 1 in all uncertainty intervals, which solves the problem. For example, for the nominal system (59) and for the uncertain system with different boundary values of the parameters *α* = ±0.1, *β* = ±0.1, we obtain

$$\begin{aligned} \sigma(A+BF) &= \{-6.0531; -41.5559; -49.3910\}, \eta = 6.0531; \\ \sigma(1.1A + 0.9BF) &= \{-7.0628; -35.3071; -44.33\}, \eta = 7.0628; \\ \sigma(1.1A + 1.1BF) &= \{-6.6585; -45.7114; -54.3301\}, \eta = 6.6585; \\ \sigma(0.9A + 1.1BF) &= \{-5.2231; -47.625; -54.4519\}, \eta = 5.2231; \\ \sigma(0.9A + 0.9BF) &= \{-5.4478; -37.4003; -44.4519\}, \eta = 5.4478. \end{aligned}$$
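These margins can be reproduced with a few lines (our code, assuming NumPy); every listed corner case stays well above the guaranteed level η_d = 1:

```python
import numpy as np

A = np.array([[1., 1., 0.], [0., 1., 0.], [1., 0., 1.]])
B = np.array([[1., 0.], [2., 1.], [0., 1.]])
F = np.array([[-150., 50., -50.],
              [50., -25., -25.]])  # robust feedback (69)

margins = []
for a, b in [(0., 0.), (0.1, -0.1), (0.1, 0.1), (-0.1, 0.1), (-0.1, -0.1)]:
    w = np.linalg.eigvals((1 + a) * A + (1 + b) * B @ F)
    margins.append(min(-w.real))
print([round(m, 4) for m in margins])  # every margin exceeds eta_d = 1
```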

Figure 3 shows the behavior of the state variables *x*(*t*) = (*x*1(*t*), *x*2(*t*), *x*3(*t*))<sup>T</sup> and controls *u*(*t*) = (*u*1(*t*), *u*2(*t*))<sup>T</sup> in the closed-loop system *x*˙ = (1.1*A* + 0.9*BF*)*x*, (59), (69) with *x*(0) = (0.5, 0.5, 0.5)<sup>T</sup>.

**Figure 3.** (**a**) Plots of *x*1(*t*), *x*2(*t*), *x*3(*t*); (**b**) Plots of *u*1(*t*), *u*2(*t*) in the closed-loop system *x*˙ = (1.1*A* + 0.9*BF*)*x*, (59), (69) with *x*(0) = (0.5, 0.5, 0.5)<sup>T</sup>.

In comparison with system (59), (63) (see Figure 2), the solution norm of the closed-loop system is practically the same, but the regulation time has been reduced by about 6 times. In addition, the value of *u*(0) has increased by about 10 times.

It should be noted that the control spikes at the beginning of the transient can be limited by piecewise linear control with saturation

$$
\overline{u} = \begin{pmatrix} 10\,\text{sat}(u\_1) & 10\,\text{sat}(u\_2) \end{pmatrix}^T. \tag{70}
$$

Corresponding graphs for the closed-loop system *x*˙ = (1.1*A* + 0.9*BF*)*x*, (59), (69), (70) are shown in Figure 4. As can be seen from Figures 3a and 4a, the control constraint (70) had no effect on the state variable transients.

**Figure 4.** (**a**) Plots of *x*1(*t*), *x*2(*t*), *x*3(*t*); (**b**) Plots of *u*1(*t*), *u*2(*t*) in the closed-loop system *x*˙ = (1.1*A* + 0.9*BF*)*x*, (59), (69), (70).

Figure 5 shows the behavior of the variables *x*(*t*) = (*x*1(*t*), *x*2(*t*), *x*3(*t*))<sup>T</sup> and control vector *u*(*t*) = (*u*1(*t*), *u*2(*t*))<sup>T</sup> in the closed-loop system *x*˙ = ((1 + *α*)*A* + (1 + *β*)*BF*)*x*, (59), (69), (70), *x*(0) = (0.5, 0.5, 0.5)<sup>T</sup>, where the unknown parameters vary smoothly within the specified ranges (64): *α* = 0.1 sin 4*t*, *β* = 0.1 sin 2*t*. As we can see, with variable parameters the nature of the transients is practically unchanged, a fact that opens perspectives for applying the developed approach to parametrically uncertain non-stationary control systems.

**Figure 5.** (**a**) Plots of *x*1(*t*), *x*2(*t*), *x*3(*t*); (**b**) Plots of *u*1(*t*), *u*2(*t*) in the closed-loop system *x*˙ = ((1 + *α*)*A* + (1 + *β*)*BF*)*x*, (59), (69), (70), *x*(0) = (0.5, 0.5, 0.5)<sup>T</sup>.
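The time-varying experiment of Figure 5 can be sketched with a plain explicit-Euler integration (the method, step size, and time horizon are our choices; the saturation (70) is omitted for brevity):

```python
import numpy as np

A = np.array([[1., 1., 0.], [0., 1., 0.], [1., 0., 1.]])
B = np.array([[1., 0.], [2., 1.], [0., 1.]])
F = np.array([[-150., 50., -50.],
              [50., -25., -25.]])  # robust feedback (69)

h, t = 1e-4, 0.0
x = np.array([0.5, 0.5, 0.5])
for _ in range(int(2.0 / h)):  # integrate over t in [0, 2]
    a, b = 0.1 * np.sin(4 * t), 0.1 * np.sin(2 * t)
    x = x + h * (((1 + a) * A + (1 + b) * B @ F) @ x)
    t += h
print(np.linalg.norm(x))  # residual norm after 2 s: tiny compared with ||x(0)||
```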

#### **5. Discussion**

In this paper, we propose a new approach to the synthesis of robust control for a practically significant class of linear stationary parametrically uncertain systems, in which the structural controllability properties of the nominal system do not change with parameter variation within acceptable limits. For the special case of systems with a controllability indicator equal to two, the procedures for the synthesis of a stabilizing feedback are formalized in detail, using the concepts of regular form and super-stability. The possibility of extending this approach to a general form of controllable systems is shown theoretically.

It should be noted that the tuning of the feedback coefficients, which guarantee a given margin of stability in the closed-loop system in all uncertainty intervals, is done on the basis of inequalities in terms of matrix elements rather than their eigenvalues. On the one hand, this is what allows synthesizing of a robust system. However, on the other hand, these conditions are only sufficient, and the resulting estimates are conservative. As a result, there may be spikes in the start of transients of state variables and controls that are not acceptable in practical applications.

Numerical examples show the fundamental possibility of limiting the control actions, as well as the performance of the proposed method for non-stationary systems. However, further research is needed to formalize these problems rigorously.

**Author Contributions:** Conceptualization, methodology, S.A.K. and V.A.U.; validation, investigation, formal analysis, Y.G.K., A.V.U., and S.A.K.; writing—original draft preparation, S.A.K. and V.A.U.; writing—review and editing, Y.G.K. and A.V.U. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Identification of Linear Time-Invariant Systems with Dynamic Mode Decomposition**

**Jan Heiland 1,†,‡ and Benjamin Unger 2,\*,‡**


**Abstract:** Dynamic mode decomposition (DMD) is a popular data-driven framework to extract linear dynamics from complex high-dimensional systems. In this work, we study the system identification properties of DMD. We first show that DMD is invariant under linear transformations in the image of the data matrix. If, in addition, the data are constructed from a linear time-invariant system, then we prove that DMD can recover the original dynamics under mild conditions. If the linear dynamics are discretized with a Runge–Kutta method, then we further classify the error of the DMD approximation and detail that, for one-stage Runge–Kutta methods, even the continuous dynamics can be recovered with DMD. A numerical example illustrates the theoretical findings.

**Keywords:** dynamic mode decomposition; system identification; Runge–Kutta method

#### **1. Introduction**

Dynamical systems play a fundamental role in many modern modeling approaches of physical and chemical phenomena. The need for high fidelity models often results in large-scale dynamical systems, which are computationally demanding to solve, analyze, and optimize. Thus the last three decades have seen significant efforts to replace the so-called full-order model, which is considered the *truth model*, with a computationally cheaper surrogate model. In the context of model order reduction, we refer the interested reader to the monographs [1–5]. Often, the surrogate model is constructed by projecting the dynamical system onto a low-dimensional manifold, thus requiring a state-space description of the differential equation.

If a mathematical model is not available or not suited for modification, data-driven methods, such as the *Loewner framework* [6,7], *vector fitting* [8–10], *operator inference* [11], or *dynamic mode decomposition* (DMD) [12] may be used to create a low-dimensional realization directly from the measurement or simulation data of the system. Suppose the dynamical system that creates the data is linear. In that case, the Loewner framework and vector fitting are—under some technical assumptions—able to recover the original dynamical system and hence serve as system identification tools. Despite the popularity of DMD, a similar analysis seems to be missing, and this paper aims to close this gap.

Since DMD creates a discrete, linear time-invariant dynamical system from data, we are interested in answering the following questions:


**Citation:** Heiland, J.; Unger, B. Identification of Linear Time-Invariant Systems with Dynamic Mode Decomposition. *Mathematics* **2022**, *10*, 418. https:// doi.org/10.3390/math10030418

Academic Editor: Ioannis G. Stratis

Received: 14 September 2021 Accepted: 24 December 2021 Published: 28 January 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

It is essential to know how the data for the construction of the DMD model are generated to answer these questions. Assuming exact measurements of the solution may be valid from a theoretical perspective only. Instead, we take the view of a numerical analyst and assume that the data are obtained via time integration of the dynamics with a general *Runge–Kutta method* (RKM) with known order of convergence. We emphasize that for linear time-invariant systems, an RKM may not be the method of choice; see, for instance, [13]. Nevertheless, RKMs are a common numerical technique to solve general differential equations, which is our main reason to consider RKMs in the following.

We can summarize the questions graphically as in Figure 1, where the dashed lines represent the questions that we aim to answer in this paper.

**Figure 1.** Problem setup.

Our main results are the following:


To render the manuscript self-contained, we recall important definitions and results for RKM and DMD in the upcoming Sections 2.1 and 2.2, respectively, before we present our analysis in Section 3. We conclude with a numerical example to confirm the theoretical findings.

#### *Notation*

As is standard, N, R, and R[*t*] denote the positive integers, the real numbers, and the polynomials with real coefficients, respectively. For any *<sup>n</sup>*, *<sup>m</sup>* <sup>∈</sup> <sup>N</sup>, we denote with <sup>R</sup>*n*×*<sup>m</sup>* the set of *<sup>n</sup>* <sup>×</sup> *<sup>m</sup>* matrices with real entries. The set of nonsingular matrices of size *<sup>n</sup>* <sup>×</sup> *<sup>n</sup>* is denoted with GL*n*(R). Let *<sup>A</sup>* = [*aij*] <sup>∈</sup> <sup>R</sup>*n*×*m*, *<sup>B</sup>* <sup>∈</sup> <sup>R</sup>*p*×*q*, and *xi* <sup>∈</sup> <sup>R</sup>*<sup>n</sup>* (*<sup>i</sup>* <sup>=</sup> 1, ... , *<sup>k</sup>*). The transpose and the Moore–Penrose pseudoinverse of *A* are denoted with *A<sup>T</sup>* and *A*†, respectively. The Kronecker product ⊗ is defined as

$$A \otimes B := \begin{bmatrix} a\_{11}B & \cdots & a\_{1m}B \\ \vdots & & \vdots \\ a\_{n1}B & \cdots & a\_{nm}B \end{bmatrix} \in \mathbb{R}^{np \times mq}.$$

We use span{*x*1, ... , *xk*} to denote the linear span of the vectors *x*1, ... , *xk* and also casually write span{*X*} = span{*x*1, ... , *xk*} for the column space of the matrix *X* with {*x*1, ... , *xk*} as its columns. For *<sup>A</sup>* <sup>∈</sup> <sup>R</sup>*n*×*<sup>n</sup>* and a vector *<sup>x</sup>*<sup>0</sup> <sup>∈</sup> <sup>R</sup>*n*, we denote the reachable space as <sup>C</sup>(*x*0, *<sup>A</sup>*) = span{*x*0, *Ax*0, ... , *<sup>A</sup>n*−1*x*0}. The Stiefel manifold of *<sup>n</sup>* <sup>×</sup> *<sup>r</sup>* dimensional matrices with real entries is denoted by

$$\text{St}(n, r) := \left\{ \mathcal{U} \in \mathbb{R}^{n \times r} \mid \mathcal{U}^T \mathcal{U} = I\_r \right\}, \tag{1}$$

where *Ir* denotes the *r* × *r* identity matrix. For a continuously differentiable function *<sup>x</sup>* : <sup>I</sup> <sup>→</sup> <sup>R</sup>*<sup>n</sup>* from the interval <sup>I</sup> <sup>⊆</sup> <sup>R</sup> to the vector space <sup>R</sup>*n*, we use the notation *<sup>x</sup>*˙ :<sup>=</sup> <sup>d</sup> <sup>d</sup>*<sup>t</sup> x* to denote the derivative with respect to the independent variable *t*, which we refer to as the time.

#### **2. Preliminaries**

As outlined in the introduction, DMD creates a finite-dimensional linear model to approximate the original dynamics. Thus, with a view toward possibly exact system identification, we need to assume that the data fed to the DMD algorithm are obtained from a linear ODE, which in the sequel is denoted by

$$
\dot{\mathfrak{x}}(t) = F\mathfrak{x}(t) \tag{2a}
$$

with a given matrix $F \in \mathbb{R}^{n \times n}$. To fix a solution of (2a), we prescribe the initial condition

$$\mathfrak{x}(\mathbf{0}) = \mathfrak{x}\_0 \in \mathbb{R}^n,\tag{2b}$$

and denote the solution of the *initial value problem* (IVP) as $x(t; x_0) := \exp(Ft)x_0$. For the analysis of DMD, we assume that the matrix $F$ is not available. Instead, the question is to what extent DMD is able to recover the matrix $F$ solely from measurements of the state variable $x$.

**Remark 1.** *While a* DMD *approximation, despite its linearity, may well reproduce trajectories of nonlinear systems (see, for example, [14]), the question of* DMD *being able to recover the full dynamics has to focus on linear systems. Here, the key observation is that a* DMD *approximation is a finite-dimensional linear map. In contrast, the encoding of nonlinear systems via a linear operator necessarily needs an infinite-dimensional mapping.*

#### *2.1. Runge–Kutta Methods*

To solve the IVP (2) numerically, we employ an RKM, which is a common one-step method to approximate ordinary and differential-algebraic equations [15,16]. More precisely, given a step size $h > 0$, the solution of the IVP (2) is approximated via the sequence $x_i \approx x(t_0 + ih)$ given by

$$x_{i+1} = x_i + h \sum_{j=1}^{s} \beta_j k_j, \tag{3a}$$

with the so-called *internal stages* $k_j \in \mathbb{R}^n$ (implicitly) defined via

$$k\_j = F\mathbf{x}\_i + h \sum\_{\ell=1}^s \alpha\_{j,\ell} Fk\_\ell \qquad \text{for } j = 1, \dots, s,\tag{3b}$$

where $s \in \mathbb{N}$ denotes the number of stages of the RKM. Using the matrix notation $\mathcal{A} = [\alpha_{j,\ell}] \in \mathbb{R}^{s \times s}$ and $\beta = [\beta_j] \in \mathbb{R}^s$, the $s$-stage RKM defined via (3) is conveniently summarized by the pair $(\mathcal{A}, \beta)$. Note that we restrict our presentation to linear time-invariant dynamics and hence do not require the full Butcher tableau.

Since the ODE (2a) is linear, we can rewrite the internal stages as

$$
\begin{bmatrix}
I_n - h\alpha_{1,1}F & -h\alpha_{1,2}F & \cdots & -h\alpha_{1,s}F \\
\vdots & \ddots & & \vdots \\
-h\alpha_{s,1}F & \cdots & & I_n - h\alpha_{s,s}F
\end{bmatrix}
\begin{bmatrix}k_1 \\ k_2 \\ \vdots \\ k_s\end{bmatrix} = \begin{bmatrix}Fx_i \\ Fx_i \\ \vdots \\ Fx_i\end{bmatrix}. \tag{4}
$$

Setting $k := \begin{bmatrix} k_1^\top & \cdots & k_s^\top \end{bmatrix}^\top \in \mathbb{R}^{sn}$ and $e := \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix}^\top \in \mathbb{R}^s$, the linear system in (4) can be written as

$$(I_s \otimes I_n - h\mathcal{A} \otimes F)k = (e \otimes F)x_i, \tag{5}$$

where $\otimes$ denotes the Kronecker product. If $h$ is small enough, the matrix $I_s \otimes I_n - h\mathcal{A} \otimes F$ is invertible, and thus we obtain the discrete linear system

$$\begin{aligned} x_{i+1} &= x_i + h \sum_{j=1}^s \beta_j k_j = x_i + h(\beta^\top \otimes I_n)k \\ &= x_i + h(\beta^\top \otimes I_n)(I_s \otimes I_n - h\mathcal{A} \otimes F)^{-1}(e \otimes F)x_i = A_h x_i, \end{aligned}$$

with (using the identity $I_s \otimes I_n = I_{sn}$)

$$A_h := I_n + h(\beta^\top \otimes I_n)\left(I_{sn} - h\mathcal{A} \otimes F\right)^{-1}(e \otimes F). \tag{6}$$

**Example 1.** *The explicit (or forward) Euler method is given by $(\mathcal{A}, \beta) = (0, 1)$, and according to* (6) *we obtain the well-known formula $A_h = I_n + hF$. For the implicit (or backward) Euler method $(\mathcal{A}, \beta) = (1, 1)$, the discrete system matrix is given by*

$$A\_h = I\_n + h(I\_n - hF)^{-1}F = (I\_n - hF)^{-1}(I\_n - hF + hF) = (I\_n - hF)^{-1}.$$
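Formula (6) and both Euler variants from Example 1 can be verified numerically. The sketch below (our own illustration; the matrix `F` and the step size are arbitrary choices) assembles $A_h$ directly from a Butcher pair $(\mathcal{A}, \beta)$:

```python
import numpy as np

def discrete_matrix(F, h, A_rk, beta):
    """Assemble A_h from Equation (6) for an s-stage RKM (A_rk, beta)."""
    n, s = F.shape[0], beta.shape[0]
    e = np.ones(s)
    M = np.eye(s * n) - h * np.kron(A_rk, F)        # I_sn - h A ⊗ F
    K = np.linalg.solve(M, np.kron(e[:, None], F))  # stacked internal stages
    return np.eye(n) + h * np.kron(beta[None, :], np.eye(n)) @ K

F = np.array([[0.0, 1.0], [-1.0, 0.0]])             # arbitrary test dynamics
h = 0.1
# explicit Euler (A, β) = (0, 1):  A_h = I + hF
Ah_exp = discrete_matrix(F, h, np.zeros((1, 1)), np.ones(1))
assert np.allclose(Ah_exp, np.eye(2) + h * F)
# implicit Euler (A, β) = (1, 1):  A_h = (I - hF)^{-1}
Ah_imp = discrete_matrix(F, h, np.ones((1, 1)), np.ones(1))
assert np.allclose(Ah_imp, np.linalg.inv(np.eye(2) - h * F))
```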

To guarantee that the representation (6) is valid, we make the following assumption throughout the manuscript.

**Assumption 1.** *For any $s$-stage* RKM $(\mathcal{A}, \beta)$ *and any dynamical system matrix $F \in \mathbb{R}^{n \times n}$, we assume that the step size $h$ is chosen such that the matrix $I_{sn} - h\mathcal{A} \otimes F$ is nonsingular.*

**Remark 2.** *By Assumption 1, the matrix $I_{sn} - h\mathcal{A} \otimes F$ is nonsingular, and thus there exists a polynomial $p = \sum_{k=0}^{sn-1} p_k t^k \in \mathbb{R}[t]$ of degree at most $sn - 1$, depending on the step size $h$, such that*

$$\begin{aligned} \left(I_{sn} - h\mathcal{A} \otimes F\right)^{-1} &= p(I_{sn} - h\mathcal{A} \otimes F) = \sum_{k=0}^{sn-1} p_k (I_{sn} - h\mathcal{A} \otimes F)^k \\ &= \sum_{k=0}^{sn-1} p_k \sum_{\rho=0}^k \binom{k}{\rho} (-1)^{\rho} h^{\rho} (\mathcal{A}^{\rho} \otimes F^{\rho}), \end{aligned}$$

*where the last equality follows from the binomial theorem. Consequently, we have*

$$A_h = I_n + \sum_{k=0}^{sn-1} p_k \sum_{\rho=0}^k \binom{k}{\rho} (-1)^{\rho} h^{\rho+1} \left(\beta^\top \mathcal{A}^{\rho} e\right) F^{\rho+1}. \tag{7}$$

*Rearranging the terms together with the Cayley–Hamilton theorem implies the existence of a polynomial $\tilde{p} \in \mathbb{R}[t]$ of degree at most $n$ such that $A_h = \tilde{p}(F)$. As a direct consequence, we see that any eigenvector of $F$ is an eigenvector of $A_h$, and thus $A_h$ is diagonalizable if $F$ is diagonalizable.*

Having computed the matrix $A_h$, the remaining question concerns the quality of the approximation $x(ih; x_0) - x_i$, which leads to the following well-known definition (cf. [15]).

**Definition 1.** *An* RKM $(\mathcal{A}, \beta)$ *has* order *$p$ if there exists a constant $C \ge 0$ (independent of $h$) such that*

$$\|x(h; x_0) - x_1\| \le Ch^{p+1} \tag{8}$$

*holds, where $x_1 = A_h x_0$ with $A_h$ defined as in* (6)*.*

For one-step methods, it is well known that the local errors, as estimated in (8) for the initial time step, essentially accumulate in the global error, so that the following estimate holds:

$$\|x(Nh; x_0) - x_N\| \le Ch^p;$$

see, e.g., ([15], Thm. II.3.6).
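The global error estimate can be observed empirically. The following sketch (our own illustration, not part of the article) integrates a rotation system with the explicit Euler method, an RKM of order $p = 1$, and checks that halving the step size roughly halves the global error:

```python
import numpy as np

# Empirical check of the global error bound ||x(Nh; x0) - x_N|| <= C h^p for
# the explicit Euler method (order p = 1) on x' = Fx; the rotation matrix F
# and the horizon T = 1 are arbitrary choices.
F = np.array([[0.0, 1.0], [-1.0, 0.0]])
x0 = np.array([1.0, 0.0])
T = 1.0

def euler_error(h):
    N = int(round(T / h))
    x = x0.copy()
    for _ in range(N):
        x = x + h * (F @ x)                      # one explicit Euler step
    exact = np.array([np.cos(T), -np.sin(T)])    # exp(TF) x0 for this F
    return np.linalg.norm(x - exact)

e1, e2 = euler_error(0.01), euler_error(0.005)
rate = np.log2(e1 / e2)      # observed convergence order, should be ≈ p = 1
assert 0.9 < rate < 1.1
```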

#### *2.2. Dynamic Mode Decomposition*

For $i = 0, \ldots, m$, assume data points $x_i \in \mathbb{R}^n$ are available. If not explicitly stated otherwise, we do not make any assumption on $m$. The idea of DMD is to determine a linear time-invariant relation between the data, i.e., to find a matrix $A_{\mathrm{DMD}} \in \mathbb{R}^{n \times n}$ such that the data approximately satisfy

$$
x_{i+1} \approx A_{\mathrm{DMD}} x_i \qquad \text{for } i = 0, 1, \ldots, m - 1.
$$

Following [17], we introduce

$$X := \begin{bmatrix} \mathbf{x}\_0 & \dots & \mathbf{x}\_{m-1} \end{bmatrix} \in \mathbb{R}^{n \times m} \qquad \text{and} \qquad Z := \begin{bmatrix} \mathbf{x}\_1 & \dots & \mathbf{x}\_m \end{bmatrix} \in \mathbb{R}^{n \times m}.\tag{9}$$

Then, the DMD approximation matrix is defined as the minimum-norm solution of

$$\min_{M \in \mathbb{R}^{n \times n}} \|Z - MX\|_F, \tag{10}$$

where $\|\cdot\|_F$ denotes the Frobenius norm. It is easy to show that the minimum-norm solution is given by $A_{\mathrm{DMD}} = ZX^\dagger$ [12], where $X^\dagger$ denotes the Moore–Penrose pseudoinverse of $X$. This motivates the following definition.

**Definition 2.** *Consider the data $x_i \in \mathbb{R}^n$ for $i = 0, 1, \ldots, m$ and associated data matrices $X$ and $Z$ defined in* (9)*. Then the matrix $A_{\mathrm{DMD}} := ZX^\dagger$ is called the* DMD matrix *for $(x_i)_{i=0}^m$. If the eigendecomposition of $A_{\mathrm{DMD}}$ exists, then the eigenvalues and eigenvectors of $A_{\mathrm{DMD}}$ are called* DMD eigenvalues *and* DMD modes *of $(x_i)_{i=0}^m$, respectively.*

The Moore–Penrose pseudoinverse and, thus, also the DMD matrix can be computed via the *singular value decomposition* (SVD); see, for example, ([18], Ch. 5.5.4). Let

$$
\begin{bmatrix} U & \mathcal{U} \end{bmatrix} \begin{bmatrix} \Sigma & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V^\top \\ \mathcal{V}^\top \end{bmatrix} = X
$$

denote the SVD of $X$, with $r := \operatorname{rank}(X)$, $U \in \operatorname{St}(n, r)$, $\Sigma \in \mathbb{R}^{r \times r}$ with $\operatorname{rank}(\Sigma) = r$, and $V \in \operatorname{St}(m, r)$, where we use the Stiefel manifold as defined in (1). Then

$$X^\dagger = \begin{bmatrix} V & \mathcal{V} \end{bmatrix} \begin{bmatrix} \Sigma^{-1} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} U^\top \\ \mathcal{U}^\top \end{bmatrix} = V\Sigma^{-1}U^\top \tag{11}$$

and, thus,

$$A\_{\rm DMD} = ZV\Sigma^{-1}U^{T}.\tag{12}$$

For later reference, we call $U\Sigma V^\top = X$ the *trimmed* SVD of $X$.
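Definition 2 together with Equations (11) and (12) translates directly into code. The sketch below (our own illustration with hypothetical random data) computes $A_{\mathrm{DMD}} = ZV\Sigma^{-1}U^\top$ from the trimmed SVD and checks it against $ZX^\dagger$:

```python
import numpy as np

def dmd_matrix(X, Z):
    """DMD matrix A_DMD = Z V Σ^{-1} U^T via the trimmed SVD, cf. (11)-(12)."""
    U, sv, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(sv > 1e-12 * sv[0]))          # numerical rank r = rank(X)
    U, sv, Vt = U[:, :r], sv[:r], Vt[:r, :]
    return Z @ Vt.T @ np.diag(1.0 / sv) @ U.T

# Hypothetical snapshot data generated by x_{i+1} = A x_i.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
snaps = [rng.standard_normal(3)]
for _ in range(5):
    snaps.append(A @ snaps[-1])
X = np.column_stack(snaps[:-1])
Z = np.column_stack(snaps[1:])

A_dmd = dmd_matrix(X, Z)
assert np.allclose(A_dmd, Z @ np.linalg.pinv(X))   # agrees with Z X^†
```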

#### **3. System Identification and Error Analysis**

In this section, we present our main results. Before discussing system identification for discrete-time (cf. Section 3.2) and continuous-time (cf. Section 3.3) dynamical systems via DMD, we study the impact of transformations of the data on DMD in Section 3.1.

#### *3.1. Data Scaling and Invariance of the DMD Approximation*

Scaling and more general transformations of the data are often used to improve the performance of methods that operate on the data. Since DMD is inherently related to the Moore–Penrose inverse, we first study the impact of a nonsingular matrix $T \in \operatorname{GL}_n(\mathbb{R})$ on the generalized inverse. To this purpose, consider a matrix $X \in \mathbb{R}^{n \times m}$ with $r := \operatorname{rank}(X)$. Let $X = U\Sigma V^\top$ denote the trimmed SVD of $X$ with $U \in \operatorname{St}(n, r)$, $\Sigma \in \operatorname{GL}_r(\mathbb{R})$, and $V \in \operatorname{St}(m, r)$. Let $TU = QR$ denote the QR decomposition of $TU$ with $Q \in \mathbb{R}^{n \times n}$ and $R \in \mathbb{R}^{n \times r}$. We immediately obtain $\operatorname{rank}(R\Sigma) = r$. Let $R\Sigma = \widehat{U}\widehat{\Sigma}\widehat{V}^\top$ denote the trimmed SVD of $R\Sigma$ with $\widehat{U} \in \operatorname{St}(n, r)$, $\widehat{\Sigma} \in \operatorname{GL}_r(\mathbb{R})$, and $\widehat{V} \in \operatorname{St}(r, r)$. We immediately infer

$$
\widehat{V}\widehat{V}^\top = I_r. \tag{13}
$$

It is easy to see that the matrices $U_T := Q\widehat{U} \in \mathbb{R}^{n \times r}$ and $V_T := V\widehat{V} \in \mathbb{R}^{m \times r}$ satisfy $U_T^\top U_T = I_r = V_T^\top V_T$, i.e., $U_T \in \operatorname{St}(n, r)$ and $V_T \in \operatorname{St}(m, r)$. The trimmed SVD of $TX$ is thus given by

$$TX = TU\Sigma V^\top = QR\Sigma V^\top = Q\widehat{U}\widehat{\Sigma}\widehat{V}^\top V^\top = U_T\widehat{\Sigma}V_T^\top.$$

We conclude

$$(TX)^\dagger TX = V_T V_T^\top = V\widehat{V}\widehat{V}^\top V^\top = VV^\top = X^\dagger X,$$

where we used the identity (13). We have thus shown the following result.

**Proposition 1.** *Let $X \in \mathbb{R}^{n \times m}$ and $T \in \operatorname{GL}_n(\mathbb{R})$. Then $(TX)^\dagger(TX) = X^\dagger X$.*
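Proposition 1 is easy to check numerically; in the sketch below (our own illustration), $X$ is built deliberately rank-deficient so that the statement is tested beyond the trivial full-rank case:

```python
import numpy as np

# Numerical check of Proposition 1: (TX)^†(TX) = X^†X for nonsingular T.
# X has rank 2 inside R^{4×6}; T is generically nonsingular.
rng = np.random.default_rng(1)
X = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 6))
T = rng.standard_normal((4, 4))

lhs = np.linalg.pinv(T @ X) @ (T @ X)
rhs = np.linalg.pinv(X) @ X     # orthogonal projector onto the row space of X
assert np.allclose(lhs, rhs)
```

The identity holds because $X^\dagger X$ is the orthogonal projector onto the row space of $X$, which is unchanged under left multiplication by a nonsingular $T$.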

With these preparations, we can now show that the DMD approximation is partially invariant to general regular transformations applied to the training data. More precisely, a data transformation only affects the part of the DMD approximation that is not in the image of the data.

**Theorem 1.** *For given data $(x_i)_{i=0}^m$, consider the matrices $X$ and $Z$ as defined in* (9) *and the corresponding* DMD *matrix $A_{\mathrm{DMD}} \in \mathbb{R}^{n \times n}$. Consider $T \in \operatorname{GL}_n(\mathbb{R})$ and let*

$$\widetilde{X} := TX \quad \text{and} \quad \widetilde{Z} := TZ$$

*be the matrices of the transformed data. Let $\widetilde{A}_{\mathrm{DMD}} := \widetilde{Z}\widetilde{X}^\dagger$ denote the* DMD *matrix for the transformed data. Then the* DMD *matrix is invariant under the transformation in the image of $X$, i.e.,*

$$A_{\mathrm{DMD}}X = T^{-1}\widetilde{A}_{\mathrm{DMD}}TX = T^{-1}\widetilde{A}_{\mathrm{DMD}}\widetilde{X}.$$

*Moreover, if $T$ is unitary or $\operatorname{rank}(X) = n$, then*

$$A_{\mathrm{DMD}} = T^{-1}\widetilde{A}_{\mathrm{DMD}}T. \tag{14}$$

**Proof.** Using Proposition 1, we obtain

$$T^{-1}\widetilde{A}_{\mathrm{DMD}}TX = T^{-1}TZ(TX)^\dagger TX = ZX^\dagger X = A_{\mathrm{DMD}}X.$$

If $T$ is unitary or $\operatorname{rank}(X) = n$, then we immediately obtain $(TX)^\dagger = X^\dagger T^{-1}$, and thus

$$T^{-1}\widetilde{A}_{\mathrm{DMD}}T = T^{-1}TZ(TX)^\dagger T = ZX^\dagger T^{-1}T = A_{\mathrm{DMD}},$$

which concludes the proof.

While Theorem 1 states that DMD is invariant under transformations in the image of the data matrix, the invariance in the orthogonal complement of the image of the data matrix, i.e., equality (14), is, in general, not satisfied. We illustrate this observation in the numerical simulations in Section 4 and in the following analytical example.

**Example 2.** *Consider the data vectors $x_i := [i+1, 0]^\top$ for $i = 0, 1, 2$ and $T := \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. Then,*

$$X = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}, \quad Z = \begin{bmatrix} 2 & 3 \\ 0 & 0 \end{bmatrix}, \quad X^\dagger = \frac{1}{5}\begin{bmatrix} 1 & 0 \\ 2 & 0 \end{bmatrix}, \quad TX = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}, \quad (TX)^\dagger = \frac{1}{10}\begin{bmatrix} 1 & 1 \\ 2 & 2 \end{bmatrix}.$$

*We thus obtain*

$$A_{\mathrm{DMD}} = \frac{1}{5}\begin{bmatrix} 8 & 0 \\ 0 & 0 \end{bmatrix}, \qquad \widetilde{A}_{\mathrm{DMD}} = \frac{1}{5}\begin{bmatrix} 4 & 4 \\ 4 & 4 \end{bmatrix}, \qquad \text{and} \qquad T^{-1}\widetilde{A}_{\mathrm{DMD}}T = \frac{1}{5}\begin{bmatrix} 8 & 4 \\ 0 & 0 \end{bmatrix},$$

*confirming that* DMD *is invariant under transformations in the image of the data, but not in the orthogonal complement.*
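Example 2 can be reproduced in a few lines (our own numerical check of the stated matrices):

```python
import numpy as np

# Reproducing the matrices of Example 2.
X = np.array([[1.0, 2.0], [0.0, 0.0]])
Z = np.array([[2.0, 3.0], [0.0, 0.0]])
T = np.array([[1.0, 0.0], [1.0, 1.0]])

A_dmd = Z @ np.linalg.pinv(X)
A_dmd_t = (T @ Z) @ np.linalg.pinv(T @ X)
back = np.linalg.inv(T) @ A_dmd_t @ T

assert np.allclose(A_dmd, [[1.6, 0.0], [0.0, 0.0]])      # (1/5)[[8,0],[0,0]]
assert np.allclose(A_dmd_t, [[0.8, 0.8], [0.8, 0.8]])    # (1/5)[[4,4],[4,4]]
assert np.allclose(back, [[1.6, 0.8], [0.0, 0.0]])       # (1/5)[[8,4],[0,0]]
# invariant on the image of X, but not on its orthogonal complement:
assert np.allclose(A_dmd @ X, back @ X) and not np.allclose(A_dmd, back)
```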

**Remark 3.** *One can show that in the setting of Theorem 1, the matrix $\widehat{M} := TA_{\mathrm{DMD}}T^{-1}$ is a minimizer (not necessarily the minimum-norm solution) of*

$$\min_{M \in \mathbb{R}^{n \times n}} \left\| \widetilde{Z} - M\widetilde{X} \right\|_F.$$

#### *3.2. Discrete-Time Dynamics*

In this subsection, we focus on the identification of discrete-time dynamics, which are exemplified by the discrete-time system

$$x_{i+1} = Ax_i \tag{15}$$

with initial value $x_0 \in \mathbb{R}^n$ and system matrix $A \in \mathbb{R}^{n \times n}$. The question that we want to answer is to what extent DMD is able to recover the matrix $A$ solely from data.

**Proposition 2.** *Consider data $(x_i)_{i=0}^m$ generated by* (15)*, associated data matrices $X$, $Z$ as defined in* (9)*, and the corresponding* DMD *matrix $A_{\mathrm{DMD}}$. Moreover, let $U\Sigma V^\top = X$ with $U \in \operatorname{St}(n, r)$, $\Sigma \in \operatorname{GL}_r(\mathbb{R})$, $V \in \operatorname{St}(m, r)$, and $r := \operatorname{rank}(X)$ denote the trimmed* SVD *of $X$. Then*

$$A_{\mathrm{DMD}} = AUU^\top. \tag{16}$$

**Proof.** By assumption, we have $X = \begin{bmatrix} x_0 & Ax_0 & \cdots & A^{m-1}x_0 \end{bmatrix}$ and $Z = AX = AU\Sigma V^\top$. We conclude

$$A_{\mathrm{DMD}} = ZX^\dagger = AU\Sigma V^\top V\Sigma^{-1}U^\top = AUU^\top. \quad \Box$$

**Remark 4.** *We immediately conclude that* DMD *recovers the true dynamics, i.e., $A_{\mathrm{DMD}} = A$, whenever $\operatorname{rank}(X) = n$. This is the case if and only if $(A, x_0)$ is controllable, i.e., $\mathcal{C}(A, x_0)$ has dimension $n$, and the data set is sufficiently rich, i.e., $m \ge n$.*
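Remark 4 can be illustrated numerically: for a generic random pair $(A, x_0)$, which is controllable almost surely, and $m \ge n$ snapshots, DMD recovers $A$ exactly up to round-off (a sketch, not part of the article's supplement):

```python
import numpy as np

# Remark 4 in code: controllable (A, x0) and m >= n snapshots give A_DMD = A.
rng = np.random.default_rng(2)
n, m = 4, 6
A = rng.standard_normal((n, n))
snaps = [rng.standard_normal(n)]
for _ in range(m):
    snaps.append(A @ snaps[-1])
X = np.column_stack(snaps[:-1])
Z = np.column_stack(snaps[1:])

assert np.linalg.matrix_rank(X) == n       # (A, x0) controllable and m >= n
A_dmd = Z @ np.linalg.pinv(X)
assert np.allclose(A_dmd, A)               # exact recovery up to round-off
```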

Our next theorem identifies the part of the dynamics that is exactly recovered in the case $\operatorname{rank}(X) < n$, which occurs if $(A, x_0)$ is not controllable or $m < n$.

**Theorem 2.** *Consider the setting of Proposition 2. If $\operatorname{span}\{U\}$ is $A_{\mathrm{DMD}}$ invariant, then the* DMD *approximation is exact in the image of $U$, i.e.,*

$$(A^i - A_{\mathrm{DMD}}^i)x_0 = 0 \qquad \text{for all } i \ge 0 \text{ and } x_0 \in \operatorname{span}\{U\}. \tag{17}$$

*If, in addition, $\ker(A) \cap \operatorname{span}\{U\}^\perp = \{0\}$, then the converse direction also holds.*

**Proof.** Let $x_0 \in \operatorname{span}\{U\}$. Since $\operatorname{span}\{U\}$ is $A_{\mathrm{DMD}}$ invariant, we conclude $A_{\mathrm{DMD}}^i x_0 \in \operatorname{span}\{U\}$ for $i \ge 0$, i.e., there exists $y_i \in \mathbb{R}^r$ such that $A_{\mathrm{DMD}}^i x_0 = Uy_i$. Using Proposition 2 we conclude

$$A_{\mathrm{DMD}}^{i+1}x_0 = A_{\mathrm{DMD}}A_{\mathrm{DMD}}^i x_0 = A_{\mathrm{DMD}}Uy_i = AUU^\top Uy_i = AUy_i = A A^i x_0 = A^{i+1}x_0.$$

The proof of (17) follows via induction over $i$. For the converse direction, let $x = x_U + x_U^\perp$ with $x_U \in \operatorname{span}\{U\}$ and $0 \neq x_U^\perp \in \operatorname{span}\{U\}^\perp$. Proposition 2 and (17) imply

$$(A - A_{\mathrm{DMD}})x = Ax_U^\perp \neq 0,$$

which completes the proof.

**Remark 5.** *The proof of Theorem 2 details that $\operatorname{span}\{U\}$ is $A_{\mathrm{DMD}}$-invariant if and only if $\operatorname{span}\{U\}$ is $A$-invariant. Moreover, $\operatorname{span}\{U\} = \operatorname{span}\{X\}$ implies that this condition can be checked easily during the data-generation process. If we further assume that the data are generated via* (15)*, then this is the case whenever*

$$\text{rank}(\begin{bmatrix} \boldsymbol{x}\_0 & \cdots & \boldsymbol{x}\_i \end{bmatrix}) = \text{rank}(\begin{bmatrix} \boldsymbol{x}\_0 & \cdots & \boldsymbol{x}\_{i+1} \end{bmatrix}),$$

*for some i* ≥ 0*.*

#### *3.3. Continuous-Time Dynamics and RK Approximation*

Suppose now that the data $(x_i)_{i=0}^m$ are generated by a continuous process, i.e., via the dynamical system (2). In this case, we are interested in recovering the continuous dynamics from the DMD approximation. As a consequence of Theorem 2, we immediately obtain the following result for exact sampling.

**Corollary 1.** *Let $A_{\mathrm{DMD}}$ be the* DMD *matrix for the sequence $x_i = \exp(iFh)x_0 \in \mathbb{R}^n$ for $i = 0, \ldots, m$ with $m \ge n$. Then*

$$x(ih; \tilde{x}_0) = A_{\mathrm{DMD}}^i \tilde{x}_0$$

*if and only if $\tilde{x}_0 \in \operatorname{span}\{x_0, \ldots, x_m\}$, where $x(t; \tilde{x}_0)$ denotes the solution of the* IVP (2) *with initial value $\tilde{x}_0$.*

**Proof.** The assertion follows immediately from Proposition 2 with the observation that exp(*iFh*) is nonsingular.

We conclude that we can recover the continuous dynamics with the matrix logarithm (see [19] for further details), whenever rank(*X*) = *n*. In practical applications, an exact evaluation of the flow map is typically not possible. Instead, a numerical time-integration method is used to approximate the continuous dynamics.
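The recovery of the continuous dynamics via the matrix logarithm can be sketched as follows (our own illustration; the eigendecomposition-based helper functions assume a diagonalizable matrix whose scaled eigenvalues $h\lambda$ stay in the strip $|\operatorname{Im}| < \pi$, so that the principal logarithm inverts the exponential):

```python
import numpy as np

# Recovering F from exactly sampled data via the matrix logarithm.
def expm_diag(M):
    w, V = np.linalg.eig(M)
    return (V @ np.diag(np.exp(w)) @ np.linalg.inv(V)).real

def logm_diag(M):
    w, V = np.linalg.eig(M)
    return (V @ np.diag(np.log(w)) @ np.linalg.inv(V)).real

F = np.array([[0.0, 1.0], [-1.0, -0.5]])   # arbitrary test dynamics
h = 0.1
snaps = [np.array([1.0, 2.0])]
for _ in range(4):
    snaps.append(expm_diag(h * F) @ snaps[-1])  # exact samples x_i = exp(ihF)x_0
X = np.column_stack(snaps[:-1])
Z = np.column_stack(snaps[1:])

A_dmd = Z @ np.linalg.pinv(X)              # equals exp(hF) since rank(X) = n
F_rec = logm_diag(A_dmd) / h
assert np.allclose(F_rec, F, atol=1e-8)
```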

Suppose we use an RKM with constant step size $h > 0$ to obtain a numerical approximation $(x_i)_{i=0}^m \subseteq \mathbb{R}^n$ of the IVP (2) and use these data to construct the DMD matrix $A_{\mathrm{DMD}} \in \mathbb{R}^{n \times n}$ as in Definition 2. If we now want to use the DMD matrix to obtain an approximation for a different initial condition, say $x(0) = \tilde{x}_0$, we are interested in quantifying the error

$$\|x(ih; \tilde{x}_0) - A_{\mathrm{DMD}}^i \tilde{x}_0\|.$$

**Theorem 3.** *Suppose that the sequence $(x_i)_{i=0}^m$, with $x_i \in \mathbb{R}^n$ for $i = 0, \ldots, m$, is generated from the linear* IVP (2) *via an* RKM *of order $p$ with step size $h > 0$ and satisfies*

$$\operatorname{span}\{x_0, \ldots, x_{m-1}\} = \operatorname{span}\{x_0, \ldots, x_m\}.$$

*Let $A_{\mathrm{DMD}} \in \mathbb{R}^{n \times n}$ denote the associated* DMD *matrix. Then there exists a constant $C \ge 0$ such that*

$$\left\|x(ih; \tilde{x}_0) - A_{\mathrm{DMD}}^i \tilde{x}_0\right\| \le Ch^p \tag{18}$$

*holds for any $\tilde{x}_0 \in \operatorname{span}\{x_0, \ldots, x_{m-1}\}$.*

**Proof.** Since the data $(x_i)_{i=0}^m$ are generated from an RKM, there exists a matrix $A_h \in \mathbb{R}^{n \times n}$ such that $x_{i+1} = A_h x_i$ for $i = 0, \ldots, m-1$. Let $\tilde{x}_0 \in \operatorname{span}\{x_0, \ldots, x_{m-1}\}$. Then, Theorem 2 implies $A_h^i \tilde{x}_0 = A_{\mathrm{DMD}}^i \tilde{x}_0$ for any $i \ge 0$. Thus, the result follows from the classical error estimates for RKMs (see, for example, [15], Thm. II.3.6) and from the equality

$$\|x(ih; \tilde{x}_0) - A_{\mathrm{DMD}}^i \tilde{x}_0\| = \|x(ih; \tilde{x}_0) - A_h^i \tilde{x}_0\| \le Ch^p$$

for some *C* ≥ 0 since the RKM is of order *p*.

The proof details that, due to Proposition 2, we are essentially able to recover the discrete dynamics $A_h$ obtained from the RKM via DMD, provided that $\operatorname{rank}(X) = n$. As laid out in Remark 4, this condition is equivalent to $(A_h, x_0)$ being controllable, for which the controllability of $(F, x_0)$ is a necessary condition.

The question that remains is whether it is possible to recover the continuous dynamics matrix $F$ from the discrete dynamics $A_{\mathrm{DMD}}$ (respectively, $A_h$), provided that the Runge–Kutta scheme used to discretize the continuous dynamics is known. For any 1-stage Runge–Kutta method $(\alpha, \beta)$, i.e., $s = 1$ in (3), this is indeed the case, since then (6) simplifies to

$$A_h = I_n + h\beta\left(I_n - h\alpha F\right)^{-1}F,$$

which yields

$$F = -\frac{1}{h}(I\_n - A\_h)\left(\alpha A\_h + (\beta - \alpha)I\_n\right)^{-1}.$$

Combining this formula with Proposition 2 yields the following result.

**Lemma 1.** *Suppose that the sequence $(x_i)_{i=0}^m \subseteq \mathbb{R}^n$ is generated from the linear* IVP (2) *via the 1-stage Runge–Kutta method $(\alpha, \beta)$ with step size $h > 0$. Let $A_{\mathrm{DMD}} \in \mathbb{R}^{n \times n}$ denote the associated* DMD *matrix. If $\operatorname{rank}(\begin{bmatrix} x_0 & \cdots & x_{m-1} \end{bmatrix}) = n$, then*

$$F = -\frac{1}{h}(I_n - A_{\mathrm{DMD}})\left(\alpha A_{\mathrm{DMD}} + (\beta - \alpha)I_n\right)^{-1}, \tag{19}$$

*provided that the inverse exists.*

If the assumption of Lemma 1 holds, then we can recover the continuous dynamics matrix from the DMD approximation. The corresponding formulas for popular 1-stage methods are presented in Table 1.

**Table 1.** Identification of continuous-time systems via DMD with 1-stage Runge–Kutta methods.


In this scenario, let us emphasize that we can compute the discrete dynamics from the DMD approximation for any time step.
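Formula (19) can be verified for the implicit Euler method, i.e., the 1-stage pair $\alpha = \beta = 1$ (a sketch with an arbitrarily chosen $F$ and step size; not part of the article's supplement):

```python
import numpy as np

# Equation (19) in code for the implicit Euler method, (alpha, beta) = (1, 1).
alpha, beta, h = 1.0, 1.0, 0.05
F = np.array([[0.0, 1.0], [-2.0, -0.3]])
n = F.shape[0]
Ah = np.linalg.inv(np.eye(n) - h * F)      # implicit Euler: A_h = (I - hF)^{-1}

snaps = [np.array([1.0, -1.0])]
for _ in range(3):
    snaps.append(Ah @ snaps[-1])
X = np.column_stack(snaps[:-1])
Z = np.column_stack(snaps[1:])
A_dmd = Z @ np.linalg.pinv(X)              # equals A_h since rank(X) = n

F_rec = -(np.eye(n) - A_dmd) @ np.linalg.inv(alpha * A_dmd + (beta - alpha) * np.eye(n)) / h
assert np.allclose(F_rec, F, atol=1e-8)
```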

The situation is different for *s* ≥ 2, as we illustrate with the following example.

**Example 3.** *For given $h > 0$, consider $F_1 := 0$ and $F_2 := -\frac{2}{h}$. Then, for Heun's method, i.e., $\mathcal{A} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}$ and $\beta = \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \end{bmatrix}$, we obtain $A_h = p(F)$ with $p(x) = 1 + hx + \frac{h^2}{2}x^2$, and thus $p(F_1) = p(F_2)$. In particular, we cannot distinguish the continuous-time dynamics in this specific scenario.*
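The indistinguishability claimed in Example 3 is immediate to check numerically (our own short verification of $p(F_1) = p(F_2) = 1$):

```python
# Example 3 in code: for Heun's method, A_h = p(F) with p(x) = 1 + hx + (h^2/2)x^2,
# so the scalar dynamics F1 = 0 and F2 = -2/h yield the same discrete map.
h = 0.1
p = lambda x: 1.0 + h * x + 0.5 * h**2 * x**2
F1, F2 = 0.0, -2.0 / h
assert abs(p(F1) - p(F2)) < 1e-12 and p(F1) == 1.0
```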

#### **4. Numerical Examples**

To illustrate our analytical findings, we constructed a dynamical system that exhibits some fast dynamics, is stable but not exponentially stable, and has a nontrivial but exactly computable flow map. In this way, we can check the approximation both qualitatively and quantitatively. In addition, the system can be scaled to arbitrary state-space dimensions. Most importantly for our purposes, the system is designed such that for any initial value, the space not reached by the system is at least as large as the reachable space. The complete code of our numerical examples can be found in the supplementary material.

With $N \in \mathbb{N}$ and $\Delta := \operatorname{diag}(0, 1, \ldots, N-1)$, we consider the continuous-time dynamics (2) with

$$F := \begin{bmatrix} 0 & 2\Delta \\ 0 & -\frac{1}{2}\Delta \end{bmatrix} \qquad \text{and} \qquad \exp(tF) = \begin{bmatrix} I & 4(I - \exp(-\frac{t}{2}\Delta)) \\ 0 & \exp(-\frac{t}{2}\Delta) \end{bmatrix}.$$

Starting with an initial value $x_0 \in \mathbb{R}^{2N}$, we can thus generate exact snapshots of the solution via $x(t) = \exp(tF)x_0$, as well as the controllability space

$$\mathcal{C}(F, x_0) = \operatorname{span}\left\{ x_0, \begin{bmatrix} 0 & 2\Delta \\ 0 & -\frac{1}{2}\Delta \end{bmatrix} x_0, \begin{bmatrix} 0 & 2\Delta \\ 0 & -\frac{1}{2}\Delta \end{bmatrix}^2 x_0, \ldots, \begin{bmatrix} 0 & 2\Delta \\ 0 & -\frac{1}{2}\Delta \end{bmatrix}^{2N-1} x_0 \right\}.$$

One can confirm that $\dim(\mathcal{C}(F, x_0)) \le N$, with equality if, for example, the initial state

$$\mathbf{x}\_0 = \begin{bmatrix} \mathbf{x}\_{0,1} \\ \mathbf{x}\_{0,2} \end{bmatrix}$$

has no zero entries in its lower part $x_{0,2} \in \mathbb{R}^N$. Due to (7), we immediately infer

$$\dim(\mathcal{C}(A_h, x_0)) \le N$$

for any *Ah* obtained by a Runge–Kutta method. We conclude that DMD is at most capable of reproducing solutions that evolve in C(*F*, *x*0). Indeed, as outlined in Proposition 2, all components of any other initial value *x*˜0 that are in the orthogonal complement of C(*F*, *x*0) are set to zero in the first DMD iteration.

For our numerical experiments, we set $N := 5$, $x_0 := [1, 2, \ldots, 10]^\top$, and consider the time grid $t_i := ih$ for $i = 0, 1, \ldots, 100$ with uniform step size $h = 0.1$. An SVD of the exactly sampled data

$$
\begin{bmatrix} \mathcal{U}\_1 & \mathcal{U}\_2 \end{bmatrix} \begin{bmatrix} \Sigma\_1 & 0 \\ 0 & 0 \end{bmatrix} V^T = \begin{bmatrix} \mathbf{x}\_0 & \mathbf{x}(h; \mathbf{x}\_0) & \mathbf{x}(2h; \mathbf{x}\_0) & \cdots & \mathbf{x}(10; \mathbf{x}\_0) \end{bmatrix} \tag{20}
$$

of the matrix of snapshots of the solution *x*(*t*; *x*0) reveals that the solution space is indeed of dimension *N* = 5 and defines the bases *U*1, *U*<sup>2</sup> ∈ St(10, 5) of C(*F*, *x*0) and its orthogonal complement, respectively.
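The stated dimension can be reproduced directly from the definition of $F$; the sketch below (our own illustration) rebuilds the system matrix for $N = 5$ and computes the rank of the controllability (Krylov) matrix:

```python
import numpy as np

# Rebuilding the benchmark system for N = 5: the controllability matrix of
# (F, x0) has rank N = 5 inside R^{2N}.
N = 5
D = np.diag(np.arange(N, dtype=float))             # Δ = diag(0, 1, ..., N-1)
F = np.block([[np.zeros((N, N)), 2.0 * D],
              [np.zeros((N, N)), -0.5 * D]])
x0 = np.arange(1.0, 2 * N + 1.0)                   # x0 = [1, 2, ..., 10]

cols = [x0]
for _ in range(2 * N - 1):
    cols.append(F @ cols[-1])                      # [x0, F x0, ..., F^{2N-1} x0]
K = np.column_stack(cols)
assert np.linalg.matrix_rank(K) == N
```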

For our numerical experiment, whose results are depicted in Figure 2, we choose the initial values

$$\tilde{x}_0 := U_1 e \in \operatorname{span}(U_1) \qquad \text{and} \qquad \hat{x}_0 := U_2 e \in \operatorname{span}(U_2) = \operatorname{span}(U_1)^\perp,$$

with $e = [1, 1, 1, 1, 1]^\top$. The exact solution for both initial values is presented in Figure 2a,b, respectively. Our simulations confirm the predictions of our analysis: if we transform the data with the matrix

$$T = \begin{bmatrix} 1 & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & 1 \end{bmatrix} \in \operatorname{GL}_{2N}(\mathbb{R}),$$

then compute the DMD approximation, and then transform the results back, the DMD approximation for $\tilde{x}_0$ remains unchanged (see Figure 2e), confirming the invariance established in Theorem 1. In contrast, the prediction of the dynamics for $\hat{x}_0$ changes (see Figure 2f), highlighting that DMD is not invariant under state-space transformations in the orthogonal complement of the data.

The presented numerical example was chosen to illustrate the importance of the reachable space. Computing a subspace numerically is a delicate task, in particular if, as in our example, the ratio of the largest and the smallest entry in the controllability matrix is of size $\frac{(1/2)^{2N-3}(N-1)^{2N}}{(1/2)^{2N-1}} = 4(N-1)^{2N}$, which leads to huge rounding errors already for moderate $N$. This mainly concerns the separation of the reachable and the unreachable subspace, which, however, can be monitored in a general implementation. Since in standard SVD implementations the dominant directions (and thus the Moore–Penrose inverse) are computed with high accuracy, these numerical issues are less severe for quantitative approximations using DMD.


**Figure 2.** Comparison of the exact solution, the DMD approximation, and the DMD approximation based on transformed data for initial values inside the reachable subspace, i.e., $\tilde{x}_0 \in \mathcal{C}(F, x_0)$, and outside the reachable subspace, i.e., $\hat{x}_0 \in \mathcal{C}(F, x_0)^\perp$. (**a**) Exact solution with initial value $\tilde{x}_0$. (**b**) Exact solution with initial value $\hat{x}_0$. (**c**) DMD approximation with initial value $\tilde{x}_0$. (**d**) DMD approximation with initial value $\hat{x}_0$. (**e**) DMD with transformed data with initial value $\tilde{x}_0$. (**f**) DMD with transformed data with initial value $\hat{x}_0$.

#### **5. Conclusions**

This work highlighted fundamental properties of the DMD approach when applied to linear problems in both continuous and discrete time. Depending on how the initial data relate to the reachable space, DMD can recover the exact discrete-time dynamics. If, in addition, the discrete-time data are generated from a continuous-time system via time discretization with a Runge–Kutta scheme, then the error of the DMD approximation is of the same order as that of the time-integration method. As a by-product of our analysis, we made explicit a relation between the Moore–Penrose inverse and regular transformations that has not been stated before. Although the findings mainly confirm what should be expected, the basic principles, such as controllability, should generalize well to nonlinear problems.

**Supplementary Materials:** The following are available at https://www.mdpi.com/article/10.3390/ math10030418/s1. Python script to reproduce the numerical results.

**Author Contributions:** All authors have contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

**Funding:** B. Unger acknowledges funding from the DFG under Germany's Excellence Strategy–EXC 2075–390740016 and is thankful for support by the Stuttgart Center for Simulation Science (SimTech).

**Data Availability Statement:** The code to produce the numerical example is attached to this manuscript as Supplementary Material.

**Acknowledgments:** We thank Robert Altmann for inviting us to the Sion workshop, where we started this work.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


## *Article* **Differential Neural Network-Based Nonparametric Identification of Eye Response to Enforced Head Motion**

**Isaac Chairez 1,2, Arthur Mukhamedov 3, Vladislav Prud 3, Olga Andrianova 4,\* and Viktor Chertopolokhov <sup>5</sup>**


**Abstract:** Dynamic motion simulators cannot provide the same stimulation of sensory systems as real motion; hence, only a subset of human senses should be targeted. For simulators providing vestibular stimulus, the automatic bodily function of the vestibular–ocular reflex (VOR) can objectively measure how accurate the motion simulation is. This requires a model of ocular response to enforced accelerations, an attempt at creating which is presented in this paper. The proposed model corresponds to a single-layer spiking differential neural network whose activation functions are based on the dynamic Izhikevich model of neuron dynamics. An experiment is proposed to collect training data corresponding to controlled accelerated motions that produce VOR, measured using an eye-tracking system. The effectiveness of the proposed identification is demonstrated by comparing its performance with that of a traditional sigmoidal identifier. The proposed model, based on dynamic representations of activation functions, produces a more accurate approximation of foveal motion, as the estimation of the mean square error confirms.

**Keywords:** nonparametric model; artificial neural network; Izhikevich artificial neuron; vestibular–ocular reflex; control Lyapunov function

**MSC:** 93B30; 93-10; 93D30; 93C10; 94C30

#### **1. Introduction**

Currently, a significant multidisciplinary effort deals with developing technologies for training in simulated environments. Such training can be used in different scenarios, from studying drivers' behavior to improve road safety, to pilot training, the latter of which has been one of the leading forces behind the development of these systems since their early years. These technologies require an understanding of human sensory systems and their influence, studied with the proper instrumentation and modeling tools.

During simulator training, body movements cannot precisely match what is shown on screen, causing a mismatch in sensory information and leading to simulator sickness, as described in [1,2]. This discrepancy is caused by several factors, such as delays due to tracking and rendering of the output image and the physical limitations of the movement range of training systems. Consequently, attempts to overcome this problem cover several research directions, including but not limited to dynamic motion systems, movement forecasting, and galvanic vestibular stimulation [3]. However, the problem can also be reversed, so that the body's reaction is used to estimate the accuracy of the simulated motion.

**Citation:** Chairez, I.; Mukhamedov, A.; Prud, V.; Andrianova, O.; Chertopolokhov, V. Differential Neural Network-Based Nonparametric Identification of Eye Response to Enforced Head Motion. *Mathematics* **2022**, *10*, 855. https:// doi.org/10.3390/math10060855

Academic Editors: Natalia Bakhtadze, Igor Yadykin, Andrei Torgashov and Nikolay Korgin

Received: 10 February 2022 Accepted: 2 March 2022 Published: 8 March 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

One such indicator is the ocular response to accelerations enforced by an external system or device, such as a flight simulator.

Due to the size and position of the fovea, the part of the human retina with a high density of light-sensitive photoreceptors, clear vision is achieved when the object of interest moves slower than 4°/s. A unique mechanism ensures that the region of interest in the acquired image stays on the retina as the body moves. It is called the vestibular–ocular reflex (VOR), and it is one of the interaction processes between the human body and the surrounding environment. It operates via a neural path between the vestibular and oculomotor systems: the eyes compensate for head rotations by rotating in the opposite direction [4].

Incorrect functioning of the VOR leads to disruptions of clear vision, such as the inability to compensate for micromovements of the head. However, since the connection between external accelerations and angular velocities and the vestibular response is not entirely understood, the VOR cannot be estimated directly. A natural way to study the VOR is to observe it using immersive technologies (such as virtual or mixed reality) and produce reliable and accurate mathematical models of the VOR with human motion as input and an electrophysiological response as output. This response could be electroencephalographic signals, oculographic information, or eye motion data, among others. Despite the importance of such mathematical models, the number and complexity of the physiological aspects involved make it difficult to generate specific models for given motion cues that use a reasonably small number of parameters [5].

An alternative way to represent VOR dynamics is to use nonparametric models that reproduce the aforementioned input–output relationship while maintaining tractable numerical complexity. Several methodologies propose nonparametric models, including adaptive autoregressive systems, polynomial approximations, swarm optimization techniques, and artificial neural networks. Nevertheless, the dynamic nature of the VOR limits the applicability of such models under a wide variety of working scenarios. Dynamic approximate models can also be considered for systems describing VOR dynamics. In particular, differential neural networks (DNNs) have long been used as efficient modeling strategies for dynamic systems with uncertain mathematical models that are affected by perturbations and modeling inaccuracies. Notice that DNN-based models could be well suited to represent the VOR dynamics [6,7]. Still, the selection of activation functions remains open to discussion, considering that sigmoidal or other monotonic functions may not capture the complex electrophysiological VOR response.

The Izhikevich model of neuron activity [8] is a bioinspired characterization belonging to the class of electrophysiology-based approximate mathematical models. Izhikevich artificial neurons have proven to be efficient models of diverse neuron responses [9]. An aggregation of several Izhikevich artificial neurons is therefore called an electrophysiology-inspired approximate DNN, or spiking DNN [10,11].

Because of the modeling abilities of DNN using Izhikevich neuron dynamics, this paper proposes a method to approximate oculomotor response using the described spiking DNN model. The main contributions of this study can be summarized as follows:


This manuscript is organized as follows. In Section 2, we provide a general description of the vestibular–ocular response. In Section 3, we introduce the uncertain model of ocular response, which is then formulated as a spiking-differential-neural-network-based nonparametric identifier in Section 4. In Section 5, we describe the general modeling strategy as well as the process of collecting experimental data. In Section 6, we cover the processing of the obtained data and assess the performance of the proposed model. The conclusions and final remarks of Section 7 close the study.

#### **2. Description of Vestibular–Ocular Connection**

As jet aviation and then crewed spaceflight progressed, they brought attention to a physiological phenomenon: the vestibular–ocular reflex. Its disruption was stated to lead to the deterioration of human performance in the pioneering work by A.L. Yarbus [12]. Possible causes of the disorder include biological prerequisites such as vestibular neuronitis [13] or congenital predisposition [14], as well as environmental change. Crewed spaceflight provided an essential context for studying the activity of the vestibular system and its connection to the rest of the body. The papers by I. Kozlovskaya and L. Kornilova (Institute of Biomedical Problems, Moscow, Russia) [15,16] examine vestibular–sensory disorders in a weightless environment and a methodology for diagnosing VOR functioning.

A general approach for detecting dysfunctions is to compare actual data with a reference. For vestibular–sensory disorders, the latter takes the form of a VOR model. The most common method of creating such models is to describe the system as a dynamic one formed by differential and difference equations. One such example is [17], which uses a bilateral model of an eye. It describes ocular dynamics based on the activity of the extraocular muscles connected to the right and left sides of an eye. These muscles respond more strongly to a positive difference in activity than to a negative one [18]. The downside of this model is that the muscle behavior is described by a large number of parameters, requiring the application of genetic algorithms to improve the model accuracy [19].

An alternative method was proposed in [20]. It uses statistical methods to approximate the actual dynamics of optokinetic–vestibular–cervical and vestibular nystagmus. Typical dynamics of the nystagmus' slow phase drive the values of the five model parameters. With known dynamics of head rotations and depending on supporting visual information, this model generates both phases of nystagmus. However, such modeling approaches do not provide enough flexibility and require vast processing power to solve the underlying optimization problem.

#### **3. Modeling Ocular Response to Enforced Acceleration**

This study focuses on developing a nonparametric model based on a single-layer DNN able to characterize ocular response. The network uses artificial neurons implemented as Izhikevich models, so it operates as a spiking DNN, or SDNN for short. The proposed model produces a vector of two angular coordinates of ocular rotation based on the linear accelerations and angular velocities sensed by the vestibular system, which serve as the input. The training input data come from a tracking system, and the ground-truth output comes from a bidimensional eye tracker. The two signals were resampled to a common rate so that they carry equally sampled information.
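The resampling step can be sketched with linear interpolation onto a common time grid; the sampling rates and placeholder signals below are assumptions for illustration (the eye tracker's 120 Hz maximum is the only rate stated in the paper).

```python
import numpy as np

# Two signals sampled at different (assumed) rates
t_track = np.arange(0.0, 1.0, 1 / 90)   # e.g., 90 Hz head tracking
t_eye = np.arange(0.0, 1.0, 1 / 120)    # 120 Hz eye tracker
head = np.sin(2 * np.pi * 2 * t_track)  # placeholder head signal
eye = np.cos(2 * np.pi * 2 * t_eye)     # placeholder eye signal

# Resample both onto a common 120 Hz grid by linear interpolation
t_common = np.arange(0.0, 1.0, 1 / 120)
head_rs = np.interp(t_common, t_track, head)
eye_rs = np.interp(t_common, t_eye, eye)
assert head_rs.shape == eye_rs.shape   # aligned sample-for-sample
```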

Let *ζ* = [*x<sub>eye</sub>*, *y<sub>eye</sub>*]<sup>⊤</sup> be the coordinate vector of the eye movement. Its evolution over time is forced by information from the vestibular system: the linear acceleration *a* = [*a<sub>x</sub>*; *a<sub>y</sub>*; *a<sub>z</sub>*] and the angular velocity *ω* = [*ω<sub>x</sub>*; *ω<sub>y</sub>*; *ω<sub>z</sub>*]. These values are obtained with respect to the body motion.

The electrophysiological system relating inertial information with ocular movement operates using the physiological process of VOR. The continuous dynamics of *ζ* as the system state vector, coupled with input vector *u* = [*a*; *ω*] justifies that a model of this relation has uncertain dynamics defined by the following differential equation:

$$\frac{d}{dt}\zeta(t) = f(\zeta(t), u(t)) + \eta(t). \tag{1}$$

Here *ζ* = *ζ*(*t*) is the state vector, *u* ∈ R<sup>6</sup> is the input vector that drives the uncertain dynamics described by the vector function *f* : R<sup>2</sup> × R<sup>6</sup> → R<sup>2</sup>. *f* is Lipschitz with respect to its first argument with a positive constant *L<sub>f</sub>* > 0. *η* ∈ R<sup>2</sup> is the vector of external perturbations to the system not involved in the modeling process. These perturbations belong to the set Σ = {*η* | ‖*η*‖<sup>2</sup> ≤ *η*<sub>0</sub>, *η*<sub>0</sub> > 0}. Such a class is admissible considering the nature of the inputs and signals that affect the VOR dynamics.

#### **4. Formulation of Spiking-Differential-Neural-Network-Based Model**

For the vestibular–ocular system with an uncertain mathematical model (1), the SDNN formulation assumes the following form:

$$\frac{d}{dt}\zeta(t) = A\zeta(t) + W\_1^o \phi\_1(\zeta(t)) + W\_2^o \phi\_2(\zeta(t))u(t) + \tilde{f}\_e(\zeta(t), t) + \eta(t), \quad \zeta(0) = \zeta\_0 \in \mathbb{R}^2. \tag{2}$$

The vector *ζ* ∈ R<sup>2</sup> defines the SDNN state. The matrix *A* ∈ R<sup>2×2</sup> describes the linear component of the network dynamics. This matrix is selected to be Hurwitz to ensure the boundedness of the state *ζ*. The two following components form the approximation of the uncertain system with a traditional SDNN: *W*<sup>o</sup><sub>1</sub> ∈ R<sup>2×*p*<sub>1</sub></sup> and *W*<sup>o</sup><sub>2</sub> ∈ R<sup>2×*p*<sub>2</sub></sup> are the weight matrices, and *φ*<sub>1</sub> : R<sup>2</sup> → R<sup>*p*<sub>1</sub></sup> and *φ*<sub>2</sub> : R<sup>2</sup> → R<sup>*p*<sub>2</sub>×6</sup> are the vector and matrix of activation functions, respectively. The choice of the exact values of *p*<sub>1</sub> and *p*<sub>2</sub> is left to the SDNN designer, depending on the expected approximation error and on the methodologies for selecting the size of each layer of general artificial neural networks.

The dynamic nature of real biological neural networks inspired the proposal in this study to use activation functions based on neuron evolution. Thus, each component of *φ*<sub>1</sub> and *φ*<sub>2</sub> is described as the output of the Izhikevich model of a neuron [8]:

$$\frac{d}{dt}\varrho\_i(t) = f\_0(\varrho\_i(t), \zeta(t)), \qquad f\_0(\varrho\_i, \zeta) = \begin{bmatrix} 0.04v\_i^2 + 5v\_i - u\_i + 140 + Z\_i^\top \zeta \\ a\_i(b\_i v\_i - u\_i) \end{bmatrix}, \qquad \varrho\_i = \begin{bmatrix} v\_i \\ u\_i \end{bmatrix}, \tag{3}$$

$$\text{if } v\_i \ge 30 \text{ mV, then } \begin{cases} v\_i := c\_i \\ u\_i := u\_i + d\_i. \end{cases} \tag{4}$$

Here *a<sub>i</sub>*, *b<sub>i</sub>*, *c<sub>i</sub>*, and *d<sub>i</sub>* are the scalar parameters of the Izhikevich model. The component *φ<sub>j,i</sub>* = [1, 0]*ϱ<sub>i</sub>* characterizes the artificial neuron response and is used as the model output in (2). *Z<sub>i</sub>* ∈ R<sup>2</sup> is a vector of input weights.
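For illustration, a minimal Euler simulation of one Izhikevich neuron following (3)–(4); the regular-spiking parameters (a = 0.02, b = 0.2, c = −65, d = 8), the step size, and the constant drive standing in for *Z<sub>i</sub>*<sup>⊤</sup>*ζ* are assumptions, not values from the experiments.

```python
# One Izhikevich neuron, Euler-integrated, with the reset rule of (4)
a, b, c, d = 0.02, 0.2, -65.0, 8.0   # assumed regular-spiking parameters
v, u = c, b * c                      # resting initial state
dt, T = 0.25, 1000.0                 # step and horizon in ms (assumed)
drive = 10.0                         # constant input standing in for Z_i^T zeta

spikes = 0
for _ in range(int(T / dt)):
    dv = 0.04 * v**2 + 5 * v + 140 - u + drive   # first row of f_0 in (3)
    du = a * (b * v - u)                          # second row of f_0 in (3)
    v += dt * dv
    u += dt * du
    if v >= 30.0:                                 # spike: reset rule (4)
        v, u = c, u + d
        spikes += 1
print(spikes > 0)
```

Under a constant positive drive, this parameter set produces repeated spikes; the spiking (rather than monotonic) response is what distinguishes these activation functions from sigmoids.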

The function *f̃<sub>e</sub>*(*ζ*(*t*), *t*) : R<sup>2</sup> × R → R<sup>2</sup> in (2) represents the approximation error due to the selection of a finite number of Izhikevich neurons in the proposed SDNN design. Based on the SDNN modeling characteristics, this error belongs to the following set: Ω = {*f̃<sub>e</sub>* | ‖*f̃<sub>e</sub>*‖<sup>2</sup> ≤ *f̃*<sub>0</sub>, *f̃*<sub>0</sub> > 0}. This result is a consequence of the dynamics of the Izhikevich artificial neuron.

The term *η* ∈ R<sup>2</sup> in (2) characterizes external perturbations, i.e., elements affecting the VOR system dynamics while being independent of the state values. This term belongs to the set Σ = {*η* | ‖*η*‖<sup>2</sup> ≤ *η*<sub>0</sub>} with *η*<sub>0</sub> being a positive scalar. Together, the two terms *f̃<sub>e</sub>* and *η* represent the degree of vagueness of the underlying electrophysiological system when describing the dynamic activation functions of the SDNN representation.

Based on the described approximate dynamical model, this study considers a model for uncertain dynamics of the VOR based on the design of an adaptive SDNN. The proposed approximate adaptive model can be described as follows:

$$\frac{d}{dt}\hat{\zeta}(t) = A\hat{\zeta}(t) + W\_1(t)\phi\_1(\hat{\zeta}(t)) + W\_2(t)\phi\_2(\hat{\zeta}(t))u(t), \quad \hat{\zeta}(0) = \hat{\zeta}\_0 \in \mathbb{R}^2. \tag{5}$$

The vector *ζ̂* defines the approximated dynamics of the two eye coordinates. The right-hand side of the VOR dynamics consists of spiking neurons and satisfies the model structure described in (2). The parameters *W*<sub>1</sub> and *W*<sub>2</sub> in (5) must be adjusted by a set of learning laws. These laws must be derived so that the proposed identifier, operating under them with identical input, can reproduce the state trajectories of (1). This allows formulating the following problem corresponding to the modeling process based on the application of Izhikevich artificial neurons.

Problem statement for the nonparametric modeling with SDNN.

The problem considered in this study is to design a nonlinear algorithm adjusting the weights *W* = [*W*<sub>1</sub>, *W*<sub>2</sub>] in a way that ensures the identification error Δ = *ζ* − *ζ̂* has a stable equilibrium point at the origin:

$$\limsup\_{T \to \infty} \left\{ \sup\_{\eta \in \Sigma,\; \tilde{f}\_e \in \Omega} \|\Delta(T)\|\_P^2 \right\} \le \gamma \tag{6}$$

where *γ* > 0 defines the quality of approximation of the proposed SDNN and *P* ∈ R<sup>2×2</sup> is a positive definite matrix that adjusts the influence of the different components of the modeling error vector on the overall approximation quality.

This problem can be solved using Lyapunov stability theory by deriving the dynamics of *W*<sub>1</sub> and *W*<sub>2</sub> from the identification error Δ. To develop the stability study, note that the dynamics of Δ admits the following ordinary differential equation:

$$\begin{split} \frac{d}{dt}\Delta(t) = A\Delta(t) + W\_1^o \tilde{\phi}\_1\big(\hat{\zeta}(t)\big) + W\_2^o \tilde{\phi}\_2\big(\hat{\zeta}(t)\big)u(t) + {} \\ \tilde{W}\_1(t)\phi\_1\big(\hat{\zeta}(t)\big) + \tilde{W}\_2(t)\phi\_2\big(\hat{\zeta}(t)\big)u(t) + \tilde{f}\_e(t) + \eta(t). \end{split} \tag{7}$$

Applying Lyapunov-based stability analysis confirms that the identification error has an ultimate upper bound [21,22]. The suggested Lyapunov function has a quadratic form that depends on the identification error and the SDNN weights. The dynamics of these weights must be selected so that the identification error has an ultimate bound. The following theorem demonstrates that such a bound exists.

**Theorem 1.** *If there exist positive definite matrices* Λ<sub>1</sub> > 0 *and* Λ<sub>2</sub> > 0 *and a positive and bounded scalar α* > 0 *such that the matrix inequality Ric*(*P*, *α*) < 0 *with*

$$\begin{aligned} Ric(P,\alpha) &:= P\left(A + \frac{\alpha}{2}I\_{2\times 2}\right) + \left(A + \frac{\alpha}{2}I\_{2\times 2}\right)^\top P + PRP + Q, \\ R &:= \sum\_{j=1}^{2} W\_j^{+} \Lambda\_j^{-1}, \qquad Q := 2I\_{2\times 2} + \sum\_{j=1}^{2} L\_j \Lambda\_j, \end{aligned} \tag{8}$$

*has at least one positive definite solution P* ∈ R<sup>2×2</sup>*, P* = *P*<sup>⊤</sup> > 0*, then the learning laws described by*

$$\begin{cases} \frac{d}{dt}W\_j(t) = -k\_j^{-1}\Omega\_j(t) + \alpha \tilde{W}\_j(t), \\ \Omega\_j(t) = P\Delta(t)\phi\_j^{\top}(\hat{\zeta}(t)), \end{cases} \tag{9}$$

$$W\_1(0) = W\_{1,0}, \quad W\_2(0) = W\_{2,0}, \quad j \in \{1, 2\},$$

*with scalars k*<sub>1</sub>, *k*<sub>2</sub> > 0 *and W̃<sub>j</sub>* = *W*<sup>tr</sup><sub>*j*</sub> − *W<sub>j</sub>, where W*<sup>tr</sup><sub>*j*</sub> *is any matrix satisfying* ‖*W*<sup>tr</sup><sub>*j*</sub> − *W*<sup>o</sup><sub>*j*</sub>‖<sub>F</sub> ≤ *W*<sup>+</sup><sub>*j*</sub>*, ensure that the identification error* Δ *converges to a ball with its center at the origin and an ultimate bound given by*

$$
\gamma \le \frac{\eta\_0 + \tilde{f}\_0}{\alpha}.\tag{10}
$$
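A single Euler step of the learning laws (9) can be sketched as follows; all numerical values (*P*, *k<sub>j</sub>*, *α*, the activations, the error, and the nominal weights *W*<sup>tr</sup><sub>*j*</sub>) are placeholders, and the update follows the form of Equation (9), with the weight deviation taken as W_tr − W.

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.eye(2)                        # Lyapunov matrix solving (8) (placeholder)
k, alpha = 2.0, 0.1                  # learning-law scalars (placeholders)
W = rng.standard_normal((2, 3))      # current weights W_j(t)
W_tr = np.zeros((2, 3))              # nominal weights W_j^tr (placeholder)

delta = np.array([0.2, -0.1])        # identification error Delta(t)
phi = np.array([0.5, -0.3, 0.8])     # activations phi_j(zeta_hat(t))

dt = 1e-3                            # Euler step
Omega = P @ np.outer(delta, phi)     # Omega_j = P * Delta * phi_j^T
dW = -Omega / k + alpha * (W_tr - W) # right-hand side of (9)
W = W + dt * dW                      # one Euler step of the weight dynamics
```

In a full identifier run, this step is repeated alongside the integration of (5), with Δ recomputed from the measured state at each step.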

**Proof of Theorem 1.** Taking into consideration the dynamics of the identification error Δ presented in (7), one may propose an energetic function depending on the deviation between the states *ζ* and *ζ̂* as well as the deviation between the weights estimated by the identifier and the actual values of the approximation.

For the particular case of the SDNN considered in this study, the aforementioned energetic function is given by:

$$E\left(\Delta, \tilde{W}\_1, \tilde{W}\_2\right) = \|\Delta\|\_{2,P}^2 + k\_1 \|\tilde{W}\_1\|\_F^2 + k\_2 \|\tilde{W}\_2\|\_F^2. \tag{11}$$

Here Δ is the identification error whose dynamics has been defined in (7). The symbol ‖·‖<sup>2</sup><sub>2,*P*</sub> represents the weighted *l*<sub>2</sub> norm of finite-dimensional vectors with the positive definite and symmetric matrix *P* ∈ R<sup>2×2</sup>. Additionally, the terms ‖*W̃<sub>j</sub>*‖<sup>2</sup><sub>F</sub>, *j* = 1, 2, are matrix norms of the weight deviations *W̃<sub>j</sub>*. For this study, the trace operator is selected as the matrix norm for the weight deviations. Hence, the energetic function is

$$E\left(\Delta, \tilde{W}\_1, \tilde{W}\_2\right) = \Delta^\top P \Delta + k\_1\, tr\left\{\tilde{W}\_1^\top \tilde{W}\_1\right\} + k\_2\, tr\left\{\tilde{W}\_2^\top \tilde{W}\_2\right\}. \tag{12}$$

Notice that the function *E* operates as a Lyapunov-like function: it is positive definite, vanishes when its three arguments vanish, and is radially unbounded. Now, the full time derivative of *E* corresponds to

$$\frac{d}{dt}E(t) = 2\Delta^\top(t)P\frac{d}{dt}\Delta(t) + 2k\_1tr\left\{\tilde{W}\_1^\top\frac{d}{dt}W\_1\right\} + 2k\_2tr\left\{\tilde{W}\_2^\top\frac{d}{dt}W\_2\right\} \tag{13}$$

where *E*(*t*) := *E*(Δ(*t*), *W̃*<sub>1</sub>(*t*), *W̃*<sub>2</sub>(*t*)). The term 2Δ<sup>⊤</sup>(*t*)*P*(*d*/*dt*)Δ(*t*) admits the following upper bound:

$$2\Delta^\top(t)P\frac{d}{dt}\Delta(t) \le \left\|\Delta(t)\right\|\_{2,LM(\mathcal{P})}^2 + \gamma + 2k\_1tr\left\{\tilde{W}\_1^\top\Omega\_{W,1}(t)\right\} + 2k\_2tr\left\{\tilde{W}\_2^\top\Omega\_{W,2}(t)\right\} \tag{14}$$

where *LM*(*P*) = *PA* + *A*<sup>⊤</sup>*P* + *PRP* + *Q*, while the values of Ω<sub>*W*,1</sub>(*t*) and Ω<sub>*W*,2</sub>(*t*) have been presented in the learning laws for the proposed identifier.

The transition in (14) was obtained by repeatedly applying Young's inequality [21], *Y*<sup>⊤</sup>*Z* + *Z*<sup>⊤</sup>*Y* ≤ *Y*<sup>⊤</sup>Λ*Y* + *Z*<sup>⊤</sup>Λ<sup>−1</sup>*Z*, which is valid for any *Y* ∈ R<sup>*r*×*s*</sup>, *Z* ∈ R<sup>*r*×*s*</sup> and any positive definite and symmetric matrix Λ ∈ R<sup>*r*×*r*</sup>. Substituting the result in (14) into the right-hand side of (*d*/*dt*)*E*(*t*) leads to

$$\begin{split} \frac{d}{dt} E(t) \le & \|\Delta(t)\|\_{LM(P)}^2 + \gamma + 2k\_1 tr \left\{ \tilde{W}\_1^\top \Omega\_{W,1}(t) \right\} + 2k\_2 tr \left\{ \tilde{W}\_2^\top \Omega\_{W,2}(t) \right\} + {} \\ & 2k\_1 tr \left\{ \tilde{W}\_1^\top \frac{d}{dt} W\_1 \right\} + 2k\_2 tr \left\{ \tilde{W}\_2^\top \frac{d}{dt} W\_2 \right\}. \end{split} \tag{15}$$

With the addition and subtraction of the terms *α*‖Δ(*t*)‖<sup>2</sup><sub>*P*</sub>, *α* *tr*{*W̃*<sub>1</sub><sup>⊤</sup>*W̃*<sub>1</sub>}, and *α* *tr*{*W̃*<sub>2</sub><sup>⊤</sup>*W̃*<sub>2</sub>}, the following right-hand side holds for the time derivative of *E*(*t*):

$$\begin{array}{c} \frac{d}{dt}E(t) \le \|\Delta(t)\|\_{Ric(P,\alpha)}^{2} + \gamma - \alpha\|\Delta(t)\|\_{P}^{2} + \\ 2k\_{1}tr\left\{\tilde{W}\_{1}^{\top}\Omega\_{W,1}(t)\right\} + 2k\_{2}tr\left\{\tilde{W}\_{2}^{\top}\Omega\_{W,2}(t)\right\} + \\ tr\left\{\tilde{W}\_{1}^{\top}\left(2k\_{1}\frac{d}{dt}W\_{1} + \alpha k\_{1}\tilde{W}\_{1}\right)\right\} + tr\left\{\tilde{W}\_{2}^{\top}\left(2k\_{2}\frac{d}{dt}W\_{2} + \alpha k\_{2}\tilde{W}\_{2}\right)\right\} - \\ \alpha k\_{1}tr\left\{\tilde{W}\_{1}^{\top}\tilde{W}\_{1}\right\} - \alpha k\_{2}tr\left\{\tilde{W}\_{2}^{\top}\tilde{W}\_{2}\right\}. \end{array} \tag{16}$$

Using the learning laws (9) and the matrix inequality (8) presented in the theorem statement, transforms the right-hand side of the derivative of *E* into

$$\frac{d}{dt}E(t) \le \gamma - \alpha \|\Delta(t)\|\_P^2 - \alpha\, tr\left\{k\_1 \tilde{W}\_1^\top \tilde{W}\_1\right\} - \alpha\, tr\left\{k\_2 \tilde{W}\_2^\top \tilde{W}\_2\right\}.\tag{17}$$

Using the definition of the Lyapunov function yields the following outcome:

$$\frac{d}{dt}E(t) \le \gamma - \alpha E(t). \tag{18}$$

Integrating this last differential inequality and following the convergence-to-an-invariant-set scheme presented in [21] proves the ultimate boundedness of the identification error as well as of the weights.
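The step from (18) to an ultimate bound is the standard comparison-lemma (Grönwall-type) argument; in the notation above it reads:

```latex
\frac{d}{dt}E(t) \le \gamma - \alpha E(t)
\;\Longrightarrow\;
E(t) \le E(0)\,e^{-\alpha t} + \frac{\gamma}{\alpha}\left(1 - e^{-\alpha t}\right)
\;\Longrightarrow\;
\limsup_{t \to \infty} E(t) \le \frac{\gamma}{\alpha},
```

so the trajectories of the error and weight deviations converge exponentially to a ball whose radius is set by the perturbation and approximation-error bounds, matching the form of (10).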

The obtained values of *W*<sup>1</sup> and *W*<sup>2</sup> that minimize the expression (6) may be fixed and used further for solving the prediction problem. The scheme of the whole process (identification and prediction) is shown in Figure 1.

**Figure 1.** Identification and prediction workflow.

#### **5. Modeling Process and Experimental Validation**

The proposed approximate model was tested in an experiment that collected data from a volunteer using an instrumented controlled-acceleration motion device. The data were recorded at a predefined frequency and then fed (offline) to the proposed SDNN-based identifier. This section details all aspects of the experiment.

A rotating dynamic platform was used to enforce controlled rotational movements on a test subject. This experiment used an XD-motion platform with 4 degrees of freedom produced by the Vympel corporation. The data collection system is based on the HTC Vive Pro Eye virtual reality headset. The headset's position and orientation quaternion in a fixed coordinate system were obtained from the SteamVR tracking system. The SRanipal software gathered the data provided by the built-in eye-tracking system and output the view origin and direction vectors for each eye at a maximum frequency of 120 Hz. The whole experimental setup is shown in Figure 2. The resulting ocular movements and head dynamics were recorded and later processed to be modeled by the proposed SDNN.

The experimental process is as follows. First, the test subject puts on the headset and adjusts its belts so that it stays firmly fixed on the head throughout the whole experiment. Then, the eye tracker is calibrated according to the SRanipal documentation and guidelines. After the calibration procedure is finished, any adjustment of the headset by the test subject leads to a reset of the experiment, in accordance with the SRanipal guidelines. The test subject is then seated upright on the dynamic platform. The platform performs rotational movements around the vertical axis, alternating clockwise and counterclockwise. The movement frequency and amplitude remain constant for 30 s, after which a 20-s break takes place and new movement parameters are loaded. The order of these parameter sets is randomized, and the test subject is not given any indication of the parameters. Visual and audio cues of motion are further reduced by the headset screen showing solid black and headphones playing static during the experiment.

The choice of the movement pattern is based on several factors. First, the horizontal semicircular canals are stimulated more than the other two for this kind of movement, so the ocular response is also primarily horizontal, allowing the analysis to focus on a single axis. Second, the platform has the greatest reach on this rotational axis, which allows for more diverse movement patterns. Additionally, pitch and roll rotations on this platform are performed by adjusting the length of the legs. However, this adjustment happens even in an idle state when no rotation is being performed, leading to additional platform vibrations that introduce a parasitic ocular response.

**Figure 2.** Experimental setup for collecting the ocular response to the controlled accelerated movements.

During the processing phase, each movement pattern is handled individually. The leading and trailing 3 s of each recording are trimmed. The view direction vector is converted from the headset coordinate system into angles of eye rotation in the horizontal and vertical planes. The head coordinate data were sampled at a lower frequency than the eye-tracking data, so the former were smoothed using a Gaussian filter. The head orientation quaternion was converted into Euler angles. After keeping only the data corresponding to horizontal angles, the angular velocity and linear acceleration were calculated.
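The processing steps above can be sketched as follows; the sampling rate, the synthetic yaw profile, the quaternion convention (w, x, y, z), and the Gaussian kernel width are all assumptions made for illustration.

```python
import numpy as np

fs = 90.0                                  # assumed head-tracking rate, Hz
t = np.arange(0.0, 30.0, 1 / fs)
yaw_true = np.deg2rad(25.0) * np.sin(2 * np.pi * 0.3 * t)  # synthetic yaw

# Quaternion (w, x, y, z) for rotation about the vertical (z) axis
quat = np.stack([np.cos(yaw_true / 2), np.zeros_like(t),
                 np.zeros_like(t), np.sin(yaw_true / 2)], axis=1)

keep = (t >= 3.0) & (t <= t[-1] - 3.0)     # trim leading/trailing 3 s
q = quat[keep]

# Yaw Euler angle from quaternion: atan2(2(wz + xy), 1 - 2(y^2 + z^2))
yaw = np.arctan2(2 * (q[:, 0] * q[:, 3] + q[:, 1] * q[:, 2]),
                 1 - 2 * (q[:, 2] ** 2 + q[:, 3] ** 2))

# Gaussian smoothing with a small normalized kernel (assumed sigma = 2 samples)
x = np.arange(-8, 9)
kernel = np.exp(-x**2 / (2 * 2.0**2))
kernel /= kernel.sum()
yaw_s = np.convolve(yaw, kernel, mode="same")

omega = np.gradient(yaw_s, 1 / fs)         # angular velocity, rad/s
```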

#### **6. Numerical Simulation**

The data collected from two motion patterns were used to test the proposed SDNN model. These two patterns are 25-degree rotations at 18 cycles per minute and 50-degree rotations at 4.8 cycles per minute; they are referred to below as high- and low-frequency movements. As described earlier, the linear accelerations and angular velocities formed the system input *u*, while the eye rotation angles were used as the reference state *ζ*. Figures 3–6 compare the dynamics of the proposed SDNN identifier with Izhikevich and sigmoidal activation functions on the obtained data. Figures 3a and 5a show the recorded head rotation profile. Figures 3b and 5b show the evolution of the identification error (shown as the mean square error) of the proposed identifier. In both cases, the origin is shown to be a practically stable equilibrium point for the analyzed modeling error. A direct comparison between the recorded and modeled data is shown in Figures 3c and 5c. Finally, Figures 3d and 5d show the evolution of the weights from their initial conditions. The highlighted dashed line in both figures illustrates the operation of the VOR. The correspondence between the ground-truth eye-tracking data and the identifier state shows the validity of the proposed identifier.

The identification performance of the proposed spiking identifier was compared against a traditional sigmoidal DNN-based identifier, shown in Figures 4 and 6. These figures are structured identically to Figures 3 and 5. Note the different *y*-axis scales between the figures on the weights dynamics plots. The parameter values for both identifiers are presented in Table 1. The performance of the two approaches is compared numerically in Table 2 using the mean square error (MSE), mean absolute error (MAE), and standardized mean absolute error (sMAE).
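For reference, the three metrics can be computed as below; the definition of sMAE as the MAE normalized by the reference signal's standard deviation is an assumption, since the paper does not spell it out here.

```python
import numpy as np

def mse(ref, est):
    """Mean square error between reference and estimated signals."""
    return float(np.mean((ref - est) ** 2))

def mae(ref, est):
    """Mean absolute error."""
    return float(np.mean(np.abs(ref - est)))

def smae(ref, est):
    """MAE normalized by the reference std (assumed definition of sMAE)."""
    return mae(ref, est) / float(np.std(ref))

# Tiny worked example with placeholder signals
ref = np.array([1.0, 2.0, 3.0, 4.0])
est = np.array([1.1, 1.9, 3.2, 3.8])
print(round(mse(ref, est), 4))  # 0.025
```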

**Figure 3.** Identification with Izhikevich activation function for high-frequency rotations: (**a**)—recorded head rotation; (**b**)—identification error; (**c**)—recorded data and identification results comparison; (**d**)—evolution of weights.

**Figure 4.** Identification with sigmoidal activation function for high-frequency rotations: (**a**)—recorded head rotation; (**b**)—identification error; (**c**)—recorded data and identification results comparison; (**d**)—evolution of weights.

**Figure 5.** Identification with Izhikevich activation function for low-frequency rotations: (**a**)—recorded head rotation; (**b**)—identification error; (**c**)—recorded data and identification results comparison; (**d**)—evolution of weights.

**Figure 6.** Identification with sigmoidal activation function for low-frequency rotations: (**a**)—recorded head rotation; (**b**)—identification error; (**c**)—recorded data and identification results comparison; (**d**)—evolution of weights.

**Table 1.** Parameters of the compared identifiers.



**Table 2.** Comparison of identification performance.

Overall, the correspondence between modeled behavior and ground-truth data shows the applicability of the proposed system under different patterns of rotational movements. Additionally, for both patterns, the SDNN with Izhikevich activation functions demonstrates over 50% better performance in modeling the ocular response than the DNN implementing sigmoidal activation functions. This shows that the SDNN can be used as a generalized approximation class for ocular response dynamics.

#### **7. Conclusions**

This study examines the modeling of physiological VOR systems using an SDNN. The proposed nonparametric model implements an arrangement of artificial neurons described by Izhikevich dynamics with fixed parameters to follow eye movements caused by known head accelerations. Learning laws have been derived for the proposed SDNN to ensure convergence of the identification error to the origin. An experimental setup is proposed and used to obtain data and confirm the validity of the proposed SDNN-based nonparametric model. A comparison of the proposed modeling strategy with a traditional identifier using sigmoidal activation functions was performed for different experimental conditions and demonstrated the efficacy of the proposed approach. One potential use of this study is estimating the accuracy of motion cue simulation. Suppose the ground truth of the ocular motion is acquired using a model of the vestibular–ocular response. In that case, it can be compared with experimental data on a dynamic platform to assess how accurate the movement was in terms of the vestibular system's reaction. Despite the additional computational complexity introduced by the application of Izhikevich models, the identification quality improves significantly compared to the traditional sigmoidal (algebraic) activation functions. This fact justifies the approximated model proposed in this study and opens novel options for creating representations of complex biological systems with multirate dynamics.

#### **8. Patents**

A derivative of this work is currently undergoing the software registration process.

**Author Contributions:** Conceptualization, I.C., O.A. and V.C.; methodology, I.C. and O.A.; software, V.P. and A.M.; validation, A.M. and V.P.; formal analysis, O.A. and I.C.; investigation, I.C.; resources, V.C.; data curation, A.M.; writing—original draft preparation, A.M. and V.P.; writing—review and editing, I.C., O.A. and V.C.; visualization, V.P.; supervision, I.C.; project administration, V.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Ministry of Science and Higher Education of the Russian Federation grant number 075-15-2020-923 "Supersonic".

**Institutional Review Board Statement:** Ethical review and approval were waived for this study because it only involved the evaluation of motion cues with volunteers under normal conditions.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** Publicly available datasets were analyzed in this study. This data can be found here: https://github.com/cut4cut/spikennet/tree/main/data accessed on 1 February 2022.

**Acknowledgments:** The authors thank Alexander Poznyak and Vladimir Alexandrov for fruitful discussions and helpful suggestions, and Ernest Sleptsov for valuable advice concerning the literature review.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


## *Article* **Bayes Synthesis of Linear Nonstationary Stochastic Systems by Wavelet Canonical Expansions**

**Igor Sinitsyn 1,2, Vladimir Sinitsyn 1,2, Eduard Korepanov <sup>1</sup> and Tatyana Konashenkova 1,\***


**\*** Correspondence: tkonashenkova64@mail.ru

**Abstract:** This article is devoted to analysis and optimization problems of stochastic systems based on wavelet canonical expansions. Basic new results: (i) for general Bayes criteria, methodological support and a software tool for the synthesis of nonstationary normal (Gaussian) linear observable stochastic systems by Haar wavelet canonical expansions are presented; (ii) a method of synthesis of a linear optimal observable system for the criterion of the maximal probability that a signal will not exceed a particular value in absolute magnitude is given. Applications: wavelet model building of essentially nonstationary stochastic processes and parameter calibration.

**Keywords:** Bayes criterion; Haar wavelets; loss function; mean risk; observable stochastic systems (OStS); stochastic process (StP); wavelet canonical expansion (WLCE)

**MSC:** 62C10; 65T60

**Citation:** Sinitsyn, I.; Sinitsyn, V.; Korepanov, E.; Konashenkova, T. Bayes Synthesis of Linear Nonstationary Stochastic Systems by Wavelet Canonical Expansions. *Mathematics* **2022**, *10*, 1517. https:// doi.org/10.3390/math10091517

Academic Editors: Natalia Bakhtadze, Igor Yadykin, Andrei Torgashov and Nikolay Korgin

Received: 25 March 2022 Accepted: 27 April 2022 Published: 2 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

Nowadays, for research into stochastic systems functioning, e.g., under essentially nonstationary disturbances of complex structure, we need analytical modeling technologies for accurate analysis and synthesis. Methods of analysis and synthesis based on canonical expansions are very suitable for quick analytical modeling using the first two probabilistic moments. Wavelet canonical expansions essentially increase the flexibility and accuracy of the corresponding technologies.

It is known [1–3] that canonical expansions (CE) of stochastic processes (StP) are widely used to solve problems of analysis, modeling and synthesis of linear nonstationary stochastic systems (StS). For StS with high availability, corresponding software tools based on CE were worked out in [4–8]. In [4], we gave a brief review of the known algorithmic and software tools. In [5,6], the issues of instrumental software for analytical modeling of nonstationary scalar and vector random functions by means of wavelet CE (WLCE) are considered. The parameters of WLCE are expressed in terms of the coefficients of the expansion of the covariance matrix of a random function over two-dimensional Daubechies wavelets. Article [7] continues the thematic cycle dedicated to analytical modeling of linear nonstationary StS based on wavelet and wavelet canonical expansions. The article describes wavelet algorithms for analytical modeling of the mathematical expectation, a covariance matrix and a matrix of covariance functions, as well as wavelet algorithms for spectral and correlation-analytical express modeling.

Article [8] continues the thematic cycle devoted to software tools for analytical modeling of linear StS with parametric (Gaussian and non-Gaussian) interference based on nonlinear correlation theory (the method of normal approximation and the method of canonical expansions). The analytical methods are based on orthogonal decomposition of covariance matrix elements using two-dimensional Daubechies wavelets with compact support and Galerkin–Petrov wavelet methods.

In [5], a wavelet CE (WLCE) was proposed for essentially nonstationary StP. Nowadays, deterministic wavelet methods are intensively applied to problems of numerical analysis and modeling. A broad class of numerical methods based on Haar wavelets has achieved great success [9]. These methods are simple in the sense of versatility and flexibility and possess a lower computational cost for accuracy analysis problems. The theory and practice of wavelets attained their modern growth due to the mathematical analysis of the wavelet in [10–12]. The concept of multiresolution analysis was given in [13]. In [14,15], a method to construct wavelets with compact support and a scaling function was developed. Among the wavelet families described by an analytical expression, the Haar wavelets deserve special attention. Haar wavelets, in combination with the Galerkin method, are very effective and popular for solving different classes of deterministic equations [16–25]. The application of wavelets to CE of StP and to stochastic differential and integrodifferential equations was given in [7,8,26].

In [27,28], design problems for linear mean square (MS) optimal filters are considered on the basis of WLCE. Explicit formulae for calculating the MS optimal estimate of the signal and the MS optimal estimate of the quality of the constructed linear MS optimal operator are derived. Articles [29,30] are devoted to the synthesis of wavelets in accordance with complex statistical criteria (CsC). The basic definitions of CsC and approaches are given. Methodological support is based on Haar wavelets. The main wavelet equations, algorithms, software tools and examples are given. Some particular aspects of the StS wavelet synthesis under nonstationary (for example, shock) perturbations are presented in [31].

The developed wavelet algorithms have a fairly high degree of versatility and can be used in various applied fields of science. Such complex StS describe organizational–technical–economical systems functioning in the presence of internal and external noises and stochastic factors. The developed wavelet algorithms are used for data analysis and information processing in high-availability stochastic systems, in complex data storage systems, and in model building and calibration.

Let us state the general problem of the Bayes synthesis of linear nonstationary normal observable StS (OStS) by means of WLCE. Special attention will be paid to the synthesis of a linear optimal system for the criterion of the maximum probability that the signal will not exceed a particular value in absolute magnitude. As an example, the results of computer experiments are presented and discussed.

#### **2. Bayes Criteria**

In practice [1,2], the choice of criterion for comparing alternative systems for the same purpose, like any question regarding the choice of criteria, is largely a matter of common sense, which can often be approached from consideration of operating conditions and purpose of any particular system.

The criterion of the maximum probability that the signal will not exceed a particular value in absolute magnitude can be represented as

$$E[l(W, W^*)] = \min. \tag{1}$$

If we take the function *l* as the characteristic function of the corresponding set of values of the error, the following formula is valid:

$$l(W, W^*) = \begin{cases} 1 & \text{at } |W^* - W| > \varepsilon, \\ 0 & \text{at } |W^* - W| \le \varepsilon. \end{cases} \tag{2}$$

In applications connected with damage accumulation, criterion (1) needs to be employed with the function *l* in the form:

$$l(W, W^*) = 1 - e^{-k^2(W^* - W)^2}. \tag{3}$$

Thus, we get the following general principle for estimating the quality of a system and selecting the criterion of optimality. The quality of the solution of the problem in each actual case is estimated by a function *l*(*W*, *W*∗), the value of which is determined by the actual realizations of the signal *W* and its estimator *W*∗. It is expedient to call this the loss function. The quality of the solution of the problem on average for a given realization of the signal *W* with all possible realizations of the estimator *W*∗ corresponding to particular realization of the signal *W* is estimated by the conditional mathematical expectation of the loss function for the given realization of the signal:

$$\rho(A|W) = E[l(W, W^*)\,|\,W]. \tag{4}$$

This quantity is called conditional risk. The conditional risk depends on the operator *A* for the estimator *W*∗ and on the realization of signal *W*. Finally, the average quality of the solution for all possible realization of *W* and its estimator *W*∗ is characterized by the mathematical expectation of the conditional risk

$$R(A) = E[\rho(A|W)] = E[l(W, W^*)]. \tag{5}$$

This quantity is called the mean risk.

All minimum-mean-risk criteria corresponding to the possible loss functions (or functionals), which may contain undetermined parameters, are known as Bayes criteria.
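The loss functions (2) and (3) and the mean risk (5) can be sketched numerically as follows; the threshold `eps`, the gain `k`, and the Gaussian error sample are illustrative assumptions, not values from the article.

```python
import numpy as np

def loss_indicator(w, w_est, eps):
    """Loss (2): 1 if the error exceeds eps in absolute value, 0 otherwise."""
    return np.where(np.abs(np.asarray(w_est) - np.asarray(w)) > eps, 1.0, 0.0)

def loss_damage(w, w_est, k):
    """Loss (3): 1 - exp(-k^2 (w_est - w)^2), used for damage accumulation."""
    return 1.0 - np.exp(-k**2 * (np.asarray(w_est) - np.asarray(w))**2)

# Mean risk (5) estimated as a sample average of the loss: minimizing the
# mean of loss_indicator maximizes the probability that |W* - W| <= eps.
errors = np.random.default_rng(0).normal(0.0, 0.5, 10000)
print(np.mean(loss_indicator(0.0, errors, eps=1.0)))  # estimates P(|W* - W| > 1)
```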

#### **3. Basic Formulae for Optimal Bayes Synthesis of Linear Systems**

Let us consider scalar linear OStS with real StP *Z*(*τ*) (*τ* ∈ [*t* − *T*, *t*]), which is the sum of the useful signal and the additive normal noise *X*(*τ*):

$$Z(\tau) = \sum_{r=1}^{N} U_r \xi_r(\tau) + X(\tau). \tag{6}$$

The useful signal is the linear combination of given random parameters *Ur* (*r* = 1, *N*). We need to get StP *W*(*t*) in the following form:

$$W(t) = \sum_{r=1}^{N} U_r \zeta_r(t) + Y(t). \tag{7}$$

Here, *ξ*1(*τ*), ... , *ξN*(*τ*), *ζ*1(*τ*), ... , *ζN*(*τ*) are known structural functions; *U*1, ... , *UN* are given random variables (RV) which do not depend on noises *X*(*τ*), *Y*(*τ*) (*EX*(*τ*) = 0, *EY*(*τ*) = 0).
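To make the observation model concrete, here is a minimal simulation sketch of (6) and (7). The structural functions, noise intensities, and parameter distribution are illustrative assumptions, since the article leaves them general; in particular, white Gaussian samples stand in for the (generally correlated) noises.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative structural functions (the article leaves xi_r, zeta_r general).
def xi(r, tau):
    return np.cos(2.0 * np.pi * r * tau)

def zeta(r, tau):
    return np.sin(2.0 * np.pi * r * tau)

N = 3
U = rng.normal(0.0, 1.0, N)          # random parameters U_1, ..., U_N
tau = np.linspace(0.0, 1.0, 500)     # observation interval (normalized)

X = rng.normal(0.0, 0.2, tau.size)   # additive noise X(tau) in the observation
Y = rng.normal(0.0, 0.1, tau.size)   # noise Y(tau) in the signal to reproduce

# Observed StP (6) and the StP to be reproduced (7)
Z = sum(U[r] * xi(r + 1, tau) for r in range(N)) + X
W = sum(U[r] * zeta(r + 1, tau) for r in range(N)) + Y
print(Z.shape, W.shape)
```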

We aim to construct an optimal system with operator *A* whose output StP:

$$W^*(t) = AZ, \tag{8}$$

based on observation of StP *Z*(*τ*) on the time interval [*t* − *T*, *t*], reproduces the given output signal *W*(*t*) with maximal accuracy in the sense of criterion (1).

It is known [1–3] that the solution of this problem through CE is based on two-stage procedures based on Formulae (4) and (5).

The vector CE of $[X(\tau)\ \ Y(\tau)]^T$ presents the linear combination of uncorrelated RV with deterministic coordinate functions:

$$X(\tau) = \sum_{\nu} V_{\nu} x_{\nu}(\tau), \quad Y(\tau) = \sum_{\nu} V_{\nu} y_{\nu}(\tau). \tag{9}$$

According to [1,2] for *Vν* we have

$$V\_{\nu} = \int\_{t-T}^{t} a\_{\nu}(\tau)X(\tau)d\tau + \int\_{t-T}^{t} a\_{\nu}(\tau)Y(\tau)d\tau \tag{10}$$

Then, coordinate functions are calculated by the following formulae:

$$\mathbf{x}\_{\nu}(\tau) = \frac{1}{D\_{\nu}} \int\_{t-T}^{t} a\_{\nu}(\theta) K\_{X}(\tau, \theta) d\theta + \frac{1}{D\_{\nu}} \int\_{t-T}^{t} a\_{\nu}(\theta) K\_{XY}(\tau, \theta) d\theta,\tag{11}$$

$$y_{\nu}(\tau) = \frac{1}{D_{\nu}} \int_{t-T}^{t} a_{\nu}(\theta) K_{XY}(\theta, \tau) d\theta + \frac{1}{D_{\nu}} \int_{t-T}^{t} a_{\nu}(\theta) K_{Y}(\tau, \theta) d\theta. \tag{12}$$

Here, $E[V_\nu] = 0$, $D_\nu = D[V_\nu]$, $K_X(\tau, \theta) = E[X(\tau) X(\theta)]$, $K_{XY}(\tau, \theta) = E[X(\tau) Y(\theta)]$, $K_Y(\tau, \theta) = E[Y(\tau) Y(\theta)]$; $a_\nu(\tau)$ is a given set of deterministic functions satisfying the biorthogonality conditions:

$$\int_{t-T}^{t} a_{\nu}(\tau) x_{\mu}(\tau) d\tau + \int_{t-T}^{t} a_{\nu}(\tau) y_{\mu}(\tau) d\tau = \delta_{\nu\mu}. \tag{13}$$

Let us consider RV

$$Z\_{\nu} = \int\_{t-T}^{t} a\_{\nu}(\tau) Z(\tau) d\tau,\tag{14}$$

and its presentation

$$Z_{\nu} = \sum_{r=1}^{N} \alpha_{\nu r} U_r + V_{\nu}, \tag{15}$$

where

$$\alpha_{\nu r} = \int_{t-T}^{t} a_{\nu}(\tau) \xi_{r}(\tau) d\tau. \tag{16}$$

The sum of the RV *Zν* multiplied by *xν*(*τ*) gives the CE of StP *Z*(*τ*) (*τ* ∈ [*t* − *T*, *t*]):

$$Z(\tau) = \sum_{\nu} Z_{\nu} x_{\nu}(\tau). \tag{17}$$

To find the conditional mathematical expectation of the loss function for StP *Z*(*τ*) (*τ* ∈ [*t* − *T*, *t*]), it is necessary to find the conditional probability density of the output StP relative to the input StP *Z*(*τ*). According to (7), StP *W*(*t*) depends upon the given random parameters *Ur* (*r* = 1, *N*) and random noise *Y*(*t*). So, we get

$$Y(t) = \sum_{\nu} V_{\nu} y_{\nu}(t) = \sum_{\nu} \left( Z_{\nu} - \sum_{r=1}^{N} \alpha_{\nu r} U_r \right) y_{\nu}(t) = \sum_{\nu} Z_{\nu} y_{\nu}(t) - \sum_{r=1}^{N} U_r \sum_{\nu} \alpha_{\nu r} y_{\nu}(t). \tag{18}$$

Hence,

$$W(t) = \sum_{r=1}^{N} U_r \zeta_r(t) + \sum_{\nu} Z_{\nu} y_{\nu}(t) - \sum_{r=1}^{N} U_r \sum_{\nu} \alpha_{\nu r} y_{\nu}(t). \tag{19}$$

The last formula shows that StP *W*(*t*) depends upon random parameters *Ur* (*r* = 1, *N*) and the set of *Zν*.

Let us introduce the vector of RV $U = [U_1\ U_2\ \dots\ U_N]^T$. The conditional distribution of $U$ relative to StP $Z(\tau)$ coincides with its conditional distribution relative to the set of RV $Z_\nu$. The conditional density $f_1(u|z_1, z_2, \dots)$ is defined by the known formula:

$$f_1(u|z_1, z_2, \dots) = \frac{f(u) f_2(z_1, z_2, \dots |u)}{\int_{-\infty}^{\infty} f(u) f_2(z_1, z_2, \dots |u)\, du}. \tag{20}$$

Here, $f(u)$ is a given a priori density of RV $U = [U_1\ U_2\ \dots\ U_N]^T$; $f_2(z_1, z_2, \dots |u)$ is the conditional density of the RV $Z_\nu$ relative to $U$.

Taking into account that the vector random noise is normal, $V_\nu$ is a linear transform of the vector $[X(\tau)\ \ Y(\tau)]^T$. We conclude that the RV $V_\nu$ are not only uncorrelated, but also independent. The joint density of the $V_\nu$ with zero mathematical expectations and variances $D_\nu$ is expressed by the formula

$$f\_V(v\_1, v\_2, \dots) = \frac{1}{\sqrt{(2\pi)^L D\_1 \cdot D\_2 \cdot \dots}} \exp\left\{-\frac{1}{2} \sum\_{\nu} \frac{v\_{\nu}^2}{D\_{\nu}}\right\}.\tag{21}$$

In (7), let us replace RV *U*1, ... , *UN* with their realizations *u*1, ... , *uN*; then, *Z<sup>ν</sup>* is the linear function of RV *V<sup>ν</sup>* with known joint density. Expressing *V<sup>ν</sup>* by *Z<sup>ν</sup>* and using Formula (21), we get:

$$f_2(z_1, z_2, \dots |u) = \frac{1}{\sqrt{(2\pi)^L D_1 \cdot D_2 \cdot \ldots}} \exp\left\{-\frac{1}{2} \sum_{\nu} \frac{1}{D_\nu} \left(z_\nu - \sum_{r=1}^N \alpha_{\nu r} u_r\right)^2\right\}, \tag{22}$$

where $\alpha_\nu(u) = \sum_{r=1}^{N} \alpha_{\nu r} u_r$.

After substituting Formula (22) into (20), we get the formula for the a posteriori density $f_1(u|z_1, z_2, \dots)$ of $U = [U_1\ U_2\ \dots\ U_N]^T$ for input StP $Z(\tau)$ $(\tau \in [t-T, t])$:

$$f_1(u|z_1, z_2, \dots) = \chi(z) f(u) \exp\left\{ \sum_{\nu} \frac{z_{\nu} \alpha_{\nu}(u)}{D_{\nu}} - \frac{1}{2} \sum_{\nu} \frac{\alpha_{\nu}^2(u)}{D_{\nu}} \right\}, \tag{23}$$

$$\chi(z) = \left[ \int_{-\infty}^{+\infty} f(u) \exp\left\{ \sum_{\nu} \frac{z_{\nu} \alpha_{\nu}(u)}{D_{\nu}} - \frac{1}{2} \sum_{\nu} \frac{\alpha_{\nu}^{2}(u)}{D_{\nu}} \right\} du \right]^{-1}. \tag{24}$$

This formula may be used after observation when realization *Z*(*τ*) is available.
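As a numeric sketch of the posterior (23) and its normalization (24), consider the scalar case $N = 1$ with a standard normal prior. The coefficients $\alpha_\nu$, the variances $D_\nu$, and the true parameter value are illustrative assumptions. For $N = 1$, $\alpha_\nu(u) = \alpha_\nu u$, so the exponent in (23) reduces to $s_1 u - \frac{1}{2} s_2 u^2$ with $s_1 = \sum_\nu z_\nu \alpha_\nu / D_\nu$ and $s_2 = \sum_\nu \alpha_\nu^2 / D_\nu$.

```python
import numpy as np

alpha = np.array([1.0, 0.5, -0.8])   # expansion coefficients alpha_nu (assumed)
D = np.array([0.1, 0.2, 0.05])       # variances D_nu (assumed)

rng = np.random.default_rng(2)
u_true = 0.7
z = alpha * u_true + rng.normal(0.0, np.sqrt(D))    # simulated observations Z_nu

u = np.linspace(-3.0, 3.0, 2001)
du = u[1] - u[0]
prior = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)  # f(u): standard normal prior

s1 = float(np.sum(z * alpha / D))    # sum_nu z_nu alpha_nu / D_nu
s2 = float(np.sum(alpha**2 / D))     # sum_nu alpha_nu^2 / D_nu
post_unnorm = prior * np.exp(s1 * u - 0.5 * s2 * u**2)
post = post_unnorm / (post_unnorm.sum() * du)       # numeric chi(z), cf. (24)

u_map = u[np.argmax(post)]           # posterior mode
print(u_map)
```

With a Gaussian prior the mode can be checked in closed form: the log-posterior is quadratic with maximum at $s_1 / (1 + s_2)$, which the grid search recovers up to the grid spacing.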

A posteriori mathematical expectation of loss function *l*(*W*, *W*∗) is called conditional risk, and is denoted as *ρ*(*A*|*W*):

$$\begin{split} \rho(A|W) &= E[l(W, W^*)|Z] = \chi(z) \int_{-\infty}^{+\infty} l(W, W^*) f(u) \\ &\times \exp\left\{ \sum_{\nu} \frac{z_{\nu} \alpha_{\nu}(u)}{D_{\nu}} - \frac{1}{2} \sum_{\nu} \frac{\alpha_{\nu}^2(u)}{D_{\nu}} \right\} du. \end{split} \tag{25}$$

In order to solve the stated problem, it is necessary to calculate the optimal output StP *W*∗(*t*) for every *t* from the condition of the minimum of integral (25).

Let us consider this integral as a function of *P<sup>W</sup>* = *W*∗(*t*) at fixed values of parameters

$$\eta_0 = \eta_0(z_1, z_2, \dots) = \sum_{\nu} z_{\nu} y_{\nu}(t), \quad \eta_r = \eta_r(z_1, z_2, \dots) = \sum_{\nu} \frac{\alpha_{\nu r} z_{\nu}}{D_{\nu}} \ (r = \overline{1, N}), \tag{26}$$

and time *t*:

$$\begin{split} I(P^W, \eta_0, \eta_1, \dots, \eta_N, t) &= \int_{-\infty}^{+\infty} \dots \int_{-\infty}^{+\infty} l\left(\sum_{r=1}^N u_r(\zeta_r(t) - b_{r0}) + \eta_0, P^W\right) f(u_1, \dots, u_N) \\ &\times \exp\left\{\sum_{r=1}^N \eta_r u_r - \frac{1}{2} \sum_{p,q=1}^N b_{pq} u_p u_q\right\} du_1 \dots du_N. \end{split} \tag{27}$$

Here,

$$b_{p0} = \sum_{\nu} \alpha_{\nu p} y_{\nu}(t), \quad b_{pq} = \sum_{\nu} \frac{1}{D_{\nu}} \alpha_{\nu p} \alpha_{\nu q} \ (p, q = \overline{1, N}). \tag{28}$$

The value of the parameter $P^W = P^W_0(t, \eta_0, \eta_1, \dots, \eta_N)$ at which integral (27) reaches its minimum defines the Bayes optimal operator for criterion (1). Replacing in $P^W_0(t, \eta_0, \dots, \eta_N)$ the variables $\eta_0, \eta_1, \dots, \eta_N$ and $z_1, z_2, \dots$ with the corresponding RV $\mathrm{H}_0, \dots, \mathrm{H}_N$ and $Z_1, Z_2, \dots$, we get the required optimal operator:

$$W^*(t) = AZ = P_0^{W}(t, \mathrm{H}_0, \dots, \mathrm{H}_N), \tag{29}$$

where

$$\mathrm{H}_0 = \sum_{\nu} Z_{\nu} y_{\nu}(t), \quad \mathrm{H}_r = \mathrm{H}_r(Z_1, Z_2, \dots) = \sum_{\nu} \frac{\alpha_{\nu r} Z_{\nu}}{D_{\nu}} \ (r = \overline{1, N}). \tag{30}$$

The quality of the optimal operator is estimated by the mean risk [1,2]

$$\begin{split} R(A) &= E[\rho(A|W)] = E[l(W, W^*)] \\ &= \int_{-\infty}^{+\infty} \dots \int_{-\infty}^{+\infty} l\left(\sum_{r=1}^N u_r(\zeta_r(t) - b_{r0}) + \eta_0, P_0^{W}\right) f_2(z_1, z_2, \dots |u) f(u)\, dz_1 dz_2 \dots du. \end{split} \tag{31}$$

So, we get the basic Formulae (23)–(31) necessary for the wavelet canonical expansion method.

#### **4. Wavelet Canonical Expansions Method**

Let us construct an operator for an optimal linear system using the Haar wavelet CE method WLCE [5,6]:

$$\left\{\varphi_{00}(\overline{\tau}),\ \psi_{jk}(\overline{\tau})\right\}, \tag{32}$$

where

$$\varphi_{00}(\overline{\tau}) = \varphi(\overline{\tau}) = \begin{cases} 1, & \overline{\tau} \in [0, 1), \\ 0, & \overline{\tau} \notin [0, 1) \end{cases} \quad \text{is a scaling function,} \tag{33}$$

$$\psi_{00}(\overline{\tau}) = \psi(\overline{\tau}) = \begin{cases} 1, & \overline{\tau} \in \left[0, \frac{1}{2}\right), \\ -1, & \overline{\tau} \in \left[\frac{1}{2}, 1\right), \\ 0, & \overline{\tau} \notin [0, 1) \end{cases} \quad \text{is a mother wavelet,} \tag{34}$$

$\psi_{jk}(\overline{\tau}) = \sqrt{2^{j}}\, \psi(2^{j}\overline{\tau} - k)$ are wavelets of level $j$ for $j = 1, 2, \dots, J$; $k = 0, 1, \dots, 2^{j} - 1$; $J$ is the maximal resolution level defined by the required accuracy of approximation, equal to $2^{-J/2}$, of any function $f(\overline{\tau}) \in L^2[0,1]$ by a finite linear combination of Haar wavelets. Then, let us present the one-dimensional wavelet basis (32) as:

$$g_1(\overline{\tau}) = \varphi_{00}(\overline{\tau}), \quad g_2(\overline{\tau}) = \psi_{00}(\overline{\tau}), \quad g_{\nu}(\overline{\tau}) = \psi_{jk}(\overline{\tau}), \quad j = 1, 2, \dots, J;\ k = 0, 1, \dots, 2^{j} - 1;\ \nu = 2^{j} + k + 1;\ \nu = \overline{3, L}. \tag{35}$$
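A minimal sketch of the Haar system (33)–(35) follows; the helper `g` inverts the enumeration $\nu = 2^j + k + 1$, and the dyadic midpoint grid is an implementation choice for checking orthonormality.

```python
import numpy as np

def phi(t):
    """Haar scaling function (33): 1 on [0, 1), 0 elsewhere."""
    t = np.asarray(t, dtype=float)
    return np.where((t >= 0.0) & (t < 1.0), 1.0, 0.0)

def psi(t):
    """Haar mother wavelet (34): +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    t = np.asarray(t, dtype=float)
    return (np.where((t >= 0.0) & (t < 0.5), 1.0, 0.0)
            - np.where((t >= 0.5) & (t < 1.0), 1.0, 0.0))

def psi_jk(t, j, k):
    """Level-j Haar wavelet: sqrt(2^j) * psi(2^j t - k)."""
    return np.sqrt(2.0**j) * psi(2.0**j * np.asarray(t, dtype=float) - k)

def g(nu, t):
    """Enumerated basis (35): g_1 = phi_00, g_2 = psi_00, g_{2^j+k+1} = psi_jk."""
    if nu == 1:
        return phi(t)
    j = int(np.floor(np.log2(nu - 1)))
    k = nu - 1 - 2**j
    return psi_jk(t, j, k)

# Orthonormality check on a dyadic midpoint grid
t = (np.arange(1024) + 0.5) / 1024.0
G = np.array([g(nu, t) for nu in range(1, 9)])
print(np.allclose(G @ G.T / 1024.0, np.eye(8)))  # expected: True
```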

For construction of the Haar WLCE for the vector $[X(\tau)\ \ Y(\tau)]^T$ at $\tau \in [t-T, t]$, we pass to the new time variable $\overline{\tau} \in [0, 1]$, $\overline{\tau} = \frac{\tau-(t-T)}{T}$, and assume

$$\begin{cases} K\_X(\tau\_1, \tau\_2) \in L^2([t - T, t] \times [t - T, t]), \; K\_{XY}(\tau\_1, \tau\_2) \in L^2([t - T, t] \times [t - T, t]),\\ K\_Y(\tau\_1, \tau\_2) \in L^2([t - T, t] \times [t - T, t]), \end{cases} \tag{36}$$

$$\begin{array}{l} \overline{K}_X(\overline{\tau}_1, \overline{\tau}_2) \in L^2([0,1] \times [0,1]),\ \overline{K}_{XY}(\overline{\tau}_1, \overline{\tau}_2) \in L^2([0,1] \times [0,1]),\\ \overline{K}_Y(\overline{\tau}_1, \overline{\tau}_2) \in L^2([0,1] \times [0,1]). \end{array} \tag{37}$$

Additionally, for presentation of the given covariance functions in the form of a two-dimensional wavelet expansion, it is necessary to define the two-dimensional orthogonal basis through the tensor composition of the one-dimensional bases (32), where scaling is performed simultaneously for the two variables

$$\begin{array}{ll}\Phi^{A}(\overline{\tau}_{1},\overline{\tau}_{2}) = \varphi_{00}(\overline{\tau}_{1})\varphi_{00}(\overline{\tau}_{2}), & \Psi^{H}(\overline{\tau}_{1},\overline{\tau}_{2}) = \varphi_{00}(\overline{\tau}_{1})\psi_{00}(\overline{\tau}_{2}),\\ \Psi^{B}(\overline{\tau}_{1},\overline{\tau}_{2}) = \psi_{00}(\overline{\tau}_{1})\varphi_{00}(\overline{\tau}_{2}), & \Psi^{D}_{jkn}(\overline{\tau}_{1},\overline{\tau}_{2}) = \psi_{jk}(\overline{\tau}_{1})\psi_{jn}(\overline{\tau}_{2}), \end{array} \tag{38}$$

where $j = 0, 1, \dots, J$; $k, n = 0, 1, \dots, 2^{j} - 1$.

So, the two-dimensional wavelet expansion of given covariance functions takes the form

$$\overline{K}_{X}(\overline{\tau}_{1},\overline{\tau}_{2}) = a^{x}\Phi^{A}(\overline{\tau}_{1},\overline{\tau}_{2}) + h^{x}\Psi^{H}(\overline{\tau}_{1},\overline{\tau}_{2}) + b^{x}\Psi^{B}(\overline{\tau}_{1},\overline{\tau}_{2}) + \sum_{j=0}^{J} \sum_{k=0}^{2^{j}-1} \sum_{n=0}^{2^{j}-1} d^{x}_{jkn} \Psi^{D}_{jkn}(\overline{\tau}_{1},\overline{\tau}_{2}), \tag{39}$$

where

$$\begin{aligned} a^{x} &= \int_{0}^{1}\int_{0}^{1} \overline{K}_{X}(\overline{\tau}_{1}, \overline{\tau}_{2}) \Phi^{A}(\overline{\tau}_{1}, \overline{\tau}_{2}) d\overline{\tau}_{1} d\overline{\tau}_{2}, \quad h^{x} = \int_{0}^{1}\int_{0}^{1} \overline{K}_{X}(\overline{\tau}_{1}, \overline{\tau}_{2}) \Psi^{H}(\overline{\tau}_{1}, \overline{\tau}_{2}) d\overline{\tau}_{1} d\overline{\tau}_{2}, \\ b^{x} &= \int_{0}^{1}\int_{0}^{1} \overline{K}_{X}(\overline{\tau}_{1}, \overline{\tau}_{2}) \Psi^{B}(\overline{\tau}_{1}, \overline{\tau}_{2}) d\overline{\tau}_{1} d\overline{\tau}_{2}, \quad d^{x}_{jkn} = \int_{0}^{1}\int_{0}^{1} \overline{K}_{X}(\overline{\tau}_{1}, \overline{\tau}_{2}) \Psi^{D}_{jkn}(\overline{\tau}_{1}, \overline{\tau}_{2}) d\overline{\tau}_{1} d\overline{\tau}_{2}. \end{aligned} \tag{40}$$

$$\overline{K}_{XY}(\overline{\tau}_{1},\overline{\tau}_{2}) = a^{xy}\Phi^{A}(\overline{\tau}_{1},\overline{\tau}_{2}) + h^{xy}\Psi^{H}(\overline{\tau}_{1},\overline{\tau}_{2}) + b^{xy}\Psi^{B}(\overline{\tau}_{1},\overline{\tau}_{2}) + \sum_{j=0}^{J} \sum_{k=0}^{2^{j}-1} \sum_{n=0}^{2^{j}-1} d^{xy}_{jkn} \Psi^{D}_{jkn}(\overline{\tau}_{1},\overline{\tau}_{2}), \tag{41}$$

where

$$\begin{aligned} a^{xy} &= \int_{0}^{1}\int_{0}^{1} \overline{K}_{XY}(\overline{\tau}_{1}, \overline{\tau}_{2}) \Phi^{A}(\overline{\tau}_{1}, \overline{\tau}_{2}) d\overline{\tau}_{1} d\overline{\tau}_{2}, \quad h^{xy} = \int_{0}^{1}\int_{0}^{1} \overline{K}_{XY}(\overline{\tau}_{1}, \overline{\tau}_{2}) \Psi^{H}(\overline{\tau}_{1}, \overline{\tau}_{2}) d\overline{\tau}_{1} d\overline{\tau}_{2}, \\ b^{xy} &= \int_{0}^{1}\int_{0}^{1} \overline{K}_{XY}(\overline{\tau}_{1}, \overline{\tau}_{2}) \Psi^{B}(\overline{\tau}_{1}, \overline{\tau}_{2}) d\overline{\tau}_{1} d\overline{\tau}_{2}, \quad d^{xy}_{jkn} = \int_{0}^{1}\int_{0}^{1} \overline{K}_{XY}(\overline{\tau}_{1}, \overline{\tau}_{2}) \Psi^{D}_{jkn}(\overline{\tau}_{1}, \overline{\tau}_{2}) d\overline{\tau}_{1} d\overline{\tau}_{2}, \end{aligned} \tag{42}$$

$$\overline{K}_{Y}(\overline{\tau}_{1},\overline{\tau}_{2}) = a^{y}\Phi^{A}(\overline{\tau}_{1},\overline{\tau}_{2}) + h^{y}\Psi^{H}(\overline{\tau}_{1},\overline{\tau}_{2}) + b^{y}\Psi^{B}(\overline{\tau}_{1},\overline{\tau}_{2}) + \sum_{j=0}^{J} \sum_{k=0}^{2^{j}-1} \sum_{n=0}^{2^{j}-1} d^{y}_{jkn} \Psi^{D}_{jkn}(\overline{\tau}_{1},\overline{\tau}_{2}), \tag{43}$$

where

$$\begin{aligned} a^{y} &= \int_{0}^{1}\int_{0}^{1} \overline{K}_{Y}(\overline{\tau}_{1}, \overline{\tau}_{2}) \Phi^{A}(\overline{\tau}_{1}, \overline{\tau}_{2}) d\overline{\tau}_{1} d\overline{\tau}_{2}, \quad h^{y} = \int_{0}^{1}\int_{0}^{1} \overline{K}_{Y}(\overline{\tau}_{1}, \overline{\tau}_{2}) \Psi^{H}(\overline{\tau}_{1}, \overline{\tau}_{2}) d\overline{\tau}_{1} d\overline{\tau}_{2}, \\ b^{y} &= \int_{0}^{1}\int_{0}^{1} \overline{K}_{Y}(\overline{\tau}_{1}, \overline{\tau}_{2}) \Psi^{B}(\overline{\tau}_{1}, \overline{\tau}_{2}) d\overline{\tau}_{1} d\overline{\tau}_{2}, \quad d^{y}_{jkn} = \int_{0}^{1}\int_{0}^{1} \overline{K}_{Y}(\overline{\tau}_{1}, \overline{\tau}_{2}) \Psi^{D}_{jkn}(\overline{\tau}_{1}, \overline{\tau}_{2}) d\overline{\tau}_{1} d\overline{\tau}_{2}. \end{aligned} \tag{44}$$
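The coefficient integrals above can be approximated numerically. The sketch below is a simplification: it uses the full tensor products of the enumerated one-dimensional basis (35) rather than the A/H/B/D grouping, takes $K(t_1, t_2) = \min(t_1, t_2)$ as an illustrative covariance, and replaces the double integrals with midpoint sums on a dyadic grid.

```python
import numpy as np

def haar_basis(nu, t):
    """Enumerated 1D Haar basis (35): g_1 = phi_00, g_2 = psi_00, g_{2^j+k+1} = psi_jk."""
    t = np.asarray(t, dtype=float)
    if nu == 1:
        return np.where((t >= 0.0) & (t < 1.0), 1.0, 0.0)
    j = int(np.floor(np.log2(nu - 1)))
    k = nu - 1 - 2**j
    s = 2.0**j * t - k
    return np.sqrt(2.0**j) * (np.where((s >= 0.0) & (s < 0.5), 1.0, 0.0)
                              - np.where((s >= 0.5) & (s < 1.0), 1.0, 0.0))

L = 16                                # 2^(J+1) basis functions with J = 3
n = 512
t = (np.arange(n) + 0.5) / n          # midpoint grid on [0, 1]
dt = 1.0 / n
K = np.minimum.outer(t, t)            # illustrative covariance K(t1,t2) = min(t1,t2)

G = np.array([haar_basis(nu, t) for nu in range(1, L + 1)])
coef = G @ K @ G.T * dt**2            # double integrals of K * g_mu * g_nu
K_hat = G.T @ coef @ G                # truncated two-dimensional expansion

print(np.max(np.abs(K - K_hat)))      # shrinks as more levels are included
```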

After the transition to the time variable $\overline{\tau} \in [0, 1]$, $\overline{\tau} = \frac{\tau-(t-T)}{T}$, at $\tau = \tau(\overline{\tau}) = T\overline{\tau} + (t-T)$, expression (6) takes the form

$$Z(\tau) = Z(\tau(\overline{\tau})) = \overline{Z}(\overline{\tau}) = \sum_{r=1}^{N} U_r \overline{\xi}_r(\overline{\tau}) + \overline{X}(\overline{\tau}). \tag{45}$$

Analogously, we have

$$V_{\nu} = T \cdot \overline{V}_{\nu}; \quad \overline{V}_{\nu} = \int_{0}^{1} a_{\nu}(\overline{\tau}) \overline{X}(\overline{\tau}) d\overline{\tau} + \int_{0}^{1} a_{\nu}(\overline{\tau}) \overline{Y}(\overline{\tau}) d\overline{\tau}; \quad D_{\nu} = T^{2} \overline{D}_{\nu}, \ \overline{D}_{\nu} = D\left[\overline{V}_{\nu}\right]. \tag{46}$$

According to [3,5], the functions $a_\nu(\overline{\tau})$ may be expressed through the basis functions $g_\lambda(\overline{\tau})$:

$$a\_1(\overline{\tau}) = \mathcal{g}\_1(\overline{\tau}), \; a\_\nu(\overline{\tau}) = \sum\_{\lambda=1}^{\nu-1} c\_{\nu\lambda} \mathcal{g}\_\lambda(\overline{\tau}) + \mathcal{g}\_\nu(\overline{\tau}) \; (\nu = \overline{2, L}).\tag{47}$$

Using notations:

$$\overline{\mathbf{x}}\_{\nu}(\overline{\tau}) = \frac{1}{\overline{D}\_{\nu}} \int\_{0}^{1} a\_{\nu}(\overline{\theta}) \overline{K}\_{X}(\overline{\tau}, \overline{\theta}) d\overline{\theta} + \frac{1}{\overline{D}\_{\nu}} \int\_{0}^{1} a\_{\nu}(\overline{\theta}) \overline{K}\_{XY}(\overline{\tau}, \overline{\theta}) d\overline{\theta},\tag{48}$$

$$\overline{y}_{\nu}(\overline{\tau}) = \frac{1}{\overline{D}_{\nu}} \int_{0}^{1} a_{\nu}(\overline{\theta}) \overline{K}_{XY}(\overline{\theta}, \overline{\tau}) d\overline{\theta} + \frac{1}{\overline{D}_{\nu}} \int_{0}^{1} a_{\nu}(\overline{\theta}) \overline{K}_{Y}(\overline{\tau}, \overline{\theta}) d\overline{\theta}, \tag{49}$$

we get the following formulae:

$$x_{\nu}(\tau) = x_{\nu}(\tau(\overline{\tau})) = \frac{1}{T}\overline{x}_{\nu}(\overline{\tau}), \quad y_{\nu}(\tau) = y_{\nu}(\tau(\overline{\tau})) = \frac{1}{T}\overline{y}_{\nu}(\overline{\tau}), \tag{50}$$

$$X(\tau(\overline{\tau})) = \sum\_{\nu=1}^{L} V\_{\nu} x\_{\nu}(\tau(\overline{\tau})) = \sum\_{\nu=1}^{L} T \overline{V}\_{\nu} \frac{1}{T} \overline{x}\_{\nu}(\overline{\tau}) = \sum\_{\nu=1}^{L} \overline{V}\_{\nu} \overline{x}\_{\nu}(\overline{\tau}),\tag{51}$$

$$Y(\tau(\overline{\tau})) = \sum\_{\nu=1}^{L} V\_{\nu} y\_{\nu}(\tau(\overline{\tau})) = \sum\_{\nu=1}^{L} T \overline{V}\_{\nu} \frac{1}{T} \overline{y}\_{\nu}(\overline{\tau}) = \sum\_{\nu=1}^{L} \overline{V}\_{\nu} \overline{y}\_{\nu}(\overline{\tau}).\tag{52}$$

Here, the RVs *Vν* have zero mathematical expectations and variances *Dν*; the coordinate functions *xν*(*τ*) and *yν*(*τ*) are successively defined by the following formulae:

$$\overline{x}\_{1}(\overline{\tau}) = \frac{1}{\overline{D}\_{1}} h\_{1}^{x}(\overline{\tau}); \ \overline{x}\_{\nu}(\overline{\tau}) = \frac{1}{\overline{D}\_{\nu}} \left( \sum\_{\lambda=1}^{\nu-1} d\_{\nu\lambda} h\_{\lambda}^{x}(\overline{\tau}) + h\_{\nu}^{x}(\overline{\tau}) \right); \tag{53}$$

$$\overline{y}\_{1}(\overline{\tau}) = \frac{1}{\overline{D}\_{1}} h\_{1}^{y}(\overline{\tau}); \ \overline{y}\_{\nu}(\overline{\tau}) = \frac{1}{\overline{D}\_{\nu}} \left( \sum\_{\lambda=1}^{\nu-1} d\_{\nu\lambda} h\_{\lambda}^{y}(\overline{\tau}) + h\_{\nu}^{y}(\overline{\tau}) \right); \tag{54}$$

where

$$d\_{\nu\lambda} = c\_{\nu\lambda} + \sum\_{\mu=\lambda+1}^{\nu-1} c\_{\nu\mu} d\_{\mu\lambda} \left(\lambda = \overline{1, \nu-2}\right); \ d\_{\nu,\nu-1} = c\_{\nu,\nu-1}; \ \nu = \overline{2, L};\tag{55}$$

$$\begin{split} c\_{\nu 1} &= -\frac{k\_{\nu 1}}{\overline{D}\_{1}} \left( \nu = \overline{2, L} \right); \ c\_{\nu\mu} = -\frac{1}{\overline{D}\_{\mu}} \left( k\_{\nu\mu} - \sum\_{\lambda=1}^{\mu-1} \overline{D}\_{\lambda} c\_{\mu\lambda} c\_{\nu\lambda} \right) \left( \mu = \overline{2, \nu-1}; \nu = \overline{3, L} \right); \\ \overline{D}\_{1} &= k\_{11}; \ \overline{D}\_{\nu} = k\_{\nu\nu} - \sum\_{\lambda=1}^{\nu-1} \overline{D}\_{\lambda} \left| c\_{\nu\lambda} \right|^{2} \left( \nu = \overline{2, L} \right). \end{split} \tag{56}$$

Parameters *kνμ* are expressed through the coefficients of the two-dimensional wavelet expansions of the covariance functions *KX*(*τ*1, *τ*2), *KXY*(*τ*1, *τ*2), and *KY*(*τ*1, *τ*2):

$$\begin{array}{l}k\_{11} = a^x + 2a^{xy} + a^y, \; k\_{12} = h^x + 2h^{xy} + h^y, \; k\_{21} = b^x + 2b^{xy} + b^y, \\ k\_{22} = d\_{000}^x + 2d\_{000}^{xy} + d\_{000}^y, \; k\_{\nu \mu} = d\_{jkn}^x + 2d\_{jkn}^{xy} + d\_{jkn}^y \\ (\nu = 2^j + k + 1; \; \mu = 2^j + n + 1; \; j = \overline{1, J}; \ k, n = 0, 1, \ldots, 2^j - 1). \end{array} \tag{57}$$

The other *kνμ* = 0.
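The triangular recursion (55)–(56) is straightforward to implement. The sketch below is an illustration only (not the authors' software; the function and variable names are ours): it computes the coefficients *cνλ*, *dνλ* and variances *Dν* from a given symmetric matrix of parameters *kνμ*, using 0-based indices (the paper's index *ν* corresponds to array index *ν* − 1).

```python
import numpy as np

def canonical_recursion(k):
    """Recursions (55)-(56): coefficients c, d and variances Dbar
    from a symmetric parameter matrix k (0-based indices)."""
    L = k.shape[0]
    c = np.zeros((L, L))
    d = np.zeros((L, L))
    Dbar = np.zeros(L)
    Dbar[0] = k[0, 0]                      # Dbar_1 = k_11
    for nu in range(1, L):
        # c_{nu,1} = -k_{nu,1}/Dbar_1; general c_{nu,mu} per (56)
        c[nu, 0] = -k[nu, 0] / Dbar[0]
        for mu in range(1, nu):
            s = sum(Dbar[lam] * c[mu, lam] * c[nu, lam] for lam in range(mu))
            c[nu, mu] = -(k[nu, mu] - s) / Dbar[mu]
        # Dbar_nu = k_{nu,nu} - sum Dbar_lam |c_{nu,lam}|^2
        Dbar[nu] = k[nu, nu] - sum(Dbar[lam] * c[nu, lam] ** 2 for lam in range(nu))
        # (55): d_{nu,nu-1} = c_{nu,nu-1}, then fill d_{nu,lam} downward
        d[nu, nu - 1] = c[nu, nu - 1]
        for lam in range(nu - 2, -1, -1):
            d[nu, lam] = c[nu, lam] + sum(c[nu, mu] * d[mu, lam]
                                          for mu in range(lam + 1, nu))
    return c, d, Dbar
```

The recursion only touches entries below the diagonal, so the cost is *O*(*L*³), dominated by the inner sums.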

Auxiliary functions $h\_{\nu}^{x}(\overline{\tau})$, $h\_{\nu}^{y}(\overline{\tau})$ are expressed through the basic wavelet functions (38) and the coefficients of the wavelet expansions of the covariance functions *KX*(*τ*1, *τ*2), *KXY*(*τ*1, *τ*2), *KY*(*τ*1, *τ*2):

$$\begin{aligned} h\_1^x(\overline{\tau}) &= (a^x + a^{xy})\varphi\_{00}(\overline{\tau}) + (b^x + b^{xy})\psi\_{00}(\overline{\tau}), \; h\_1^y(\overline{\tau}) = (a^{xy} + a^y)\varphi\_{00}(\overline{\tau}) + (b^{xy} + b^y)\psi\_{00}(\overline{\tau}),\\ h\_2^x(\overline{\tau}) &= (h^x + h^{xy})\varphi\_{00}(\overline{\tau}) + \left(d\_{000}^x + d\_{000}^{xy}\right)\psi\_{00}(\overline{\tau}), \; h\_2^y(\overline{\tau}) = (h^{xy} + h^y)\varphi\_{00}(\overline{\tau}) + \left(d\_{000}^{xy} + d\_{000}^y\right)\psi\_{00}(\overline{\tau}),\\ h\_\nu^x(\overline{\tau}) &= \sum\_{k=0}^{2^j - 1} \left(d\_{jkn}^x + d\_{jkn}^{xy}\right)\psi\_{jk}(\overline{\tau}) \ (\nu = \overline{3, L}; \ \nu = 2^j + n + 1; \; n = 0, 1, \ldots, 2^j - 1). \end{aligned} \tag{58}$$

Considering (45) and (46), we get

$$Z\_{\nu} = T \overline{Z}\_{\nu}, \quad \overline{Z}\_{\nu} = \sum\_{r=1}^{N} \overline{a}\_{\nu r} U\_{r} + \overline{V}\_{\nu}, \tag{59}$$

$$a\_{\nu r} = T \overline{a}\_{\nu r}, \ \ \overline{a}\_{\nu r} = \int\_{0}^{1} a\_{\nu}(\overline{\tau}) \overline{\xi}\_{r}(\overline{\tau}) d\overline{\tau}.\tag{60}$$

If functions $\xi\_1(\tau), \ldots, \xi\_N(\tau) \in L\_2[t-T, t]$, then $\overline{\xi}\_1(\overline{\tau}), \ldots, \overline{\xi}\_N(\overline{\tau}) \in L\_2[0, 1]$ and have the wavelet expansions

$$\overline{\xi}\_r(\overline{\tau}) = a\_r^{\overline{\xi}} \varphi\_{00}(\overline{\tau}) + \sum\_{j=0}^{J} \sum\_{k=0}^{2^j - 1} d\_{rjk}^{\overline{\xi}} \psi\_{jk}(\overline{\tau}) \ (r = 1, \dots, N), \tag{61}$$

$$a\_r^{\overline{\xi}} = \int\_0^1 \overline{\xi}\_r(\overline{\tau}) \, \varphi\_{00}(\overline{\tau}) \, d\overline{\tau}, \ d\_{rjk}^{\overline{\xi}} = \int\_0^1 \overline{\xi}\_r(\overline{\tau}) \, \psi\_{jk}(\overline{\tau}) \, d\overline{\tau}. \tag{62}$$

Using notation (38), we get from (61) and (62)

$$\overline{\xi}\_r(\overline{\tau}) = c\_{r1}^{\overline{\xi}} g\_1(\overline{\tau}) + \sum\_{\nu=2}^{L} c\_{r\nu}^{\overline{\xi}} g\_\nu(\overline{\tau}) \ (r = 1, \dots, N), \tag{63}$$

$$c\_{r1}^{\overline{\xi}} = a\_r^{\overline{\xi}}, \ c\_{r\nu}^{\overline{\xi}} = d\_{rjk}^{\overline{\xi}} \ (\nu = 2^j + k + 1; \ j = \overline{0, J}; \ k = 0, 1, \dots, 2^j - 1). \tag{64}$$
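The coefficients (62) can be approximated by quadrature. The sketch below assumes that the basis (38), which lies outside this excerpt, is the standard Haar system on [0, 1] (scaling function *φ*00 ≡ 1 and wavelets *ψjk*); the function name and the midpoint-rule resolution are our illustration choices.

```python
import numpy as np

def haar_coeffs(f, J, m=4096):
    """Approximate the coefficients (62) of f on [0,1] for the Haar basis:
    a = integral of f*phi00, d[(j,k)] = integral of f*psi_jk, j = 0..J.
    Midpoint rule with m sample points."""
    t = (np.arange(m) + 0.5) / m
    ft = f(t)
    a = ft.mean()                            # phi00 is identically 1 on [0,1]
    d = {}
    for j in range(J + 1):
        for k in range(2 ** j):
            # Haar wavelet: +2^(j/2) on the left half of its support, -2^(j/2) on the right
            psi = np.zeros(m)
            lo, mid, hi = k / 2**j, (k + 0.5) / 2**j, (k + 1) / 2**j
            psi[(t >= lo) & (t < mid)] = 2 ** (j / 2)
            psi[(t >= mid) & (t < hi)] = -(2 ** (j / 2))
            d[(j, k)] = (ft * psi).mean()
    return a, d
```

For example, for *f*(*t*) = *t* the approximation gives *a* ≈ 1/2 and *d*00 ≈ −1/4, matching the exact integrals.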

From (60), (62), (64), we have

$$\overline{a}\_{1r} = c\_{r1}^{\overline{\xi}}; \ \overline{a}\_{\nu r} = \sum\_{\lambda=1}^{\nu-1} c\_{\nu\lambda} c\_{r\lambda}^{\overline{\xi}} + c\_{r\nu}^{\overline{\xi}} \ (\nu = \overline{2, L}).\tag{65}$$

Finally, using formulae

$$\sum\_{\nu=1}^{L} Z\_{\nu} z\_{\nu}(\tau) = \sum\_{\nu=1}^{L} \left( T \overline{Z}\_{\nu} \right) \left( \frac{1}{T} \overline{z}\_{\nu}(\overline{\tau}) \right) = \sum\_{\nu=1}^{L} \overline{Z}\_{\nu} \overline{z}\_{\nu}(\overline{\tau}) \tag{66}$$

we get the required WLCE for StP *Z*(*τ*) (*τ* ∈ [*t* − *T*, *t*]):

$$Z(\tau) = Z(\tau(\overline{\tau})) = \overline{Z}(\overline{\tau}) = \sum\_{\nu=1}^{L} \overline{Z}\_{\nu} \overline{z}\_{\nu}(\overline{\tau}).\tag{67}$$

In basic Formulae (23)–(31), the parameters are expressed as follows:

$$\eta\_0 = \sum\_{\nu=1}^{L} z\_{\nu} y\_{\nu}(\tau) \ = \sum\_{\nu=1}^{L} (T\overline{z}\_{\nu}) \left(\frac{1}{T}\overline{y}\_{\nu}(\overline{\tau})\right) = \sum\_{\nu=1}^{L} \overline{z}\_{\nu} \overline{y}\_{\nu}(\overline{\tau}) \,. \tag{68}$$

$$\eta\_{r} = \sum\_{\nu=1}^{L} \frac{a\_{\nu r} z\_{\nu}}{D\_{\nu}} = \sum\_{\nu=1}^{L} \frac{(T\overline{a}\_{\nu r})(T\overline{z}\_{\nu})}{T^{2}\overline{D}\_{\nu}} = \sum\_{\nu=1}^{L} \frac{\overline{a}\_{\nu r} \overline{z}\_{\nu}}{\overline{D}\_{\nu}} \left(r = \overline{1, N}\right),\tag{69}$$

$$b\_{p0} = \sum\_{\nu=1}^{L} a\_{\nu p} y\_{\nu}(\tau) = \sum\_{\nu=1}^{L} \left( T \overline{a}\_{\nu p} \right) \left( \frac{1}{T} \overline{y}\_{\nu}(\overline{\tau}) \right) = \sum\_{\nu=1}^{L} \overline{a}\_{\nu p} \overline{y}\_{\nu}(\overline{\tau}),\tag{70}$$

$$b\_{pq} = \sum\_{\nu=1}^{L} \frac{1}{D\_{\nu}} a\_{\nu p} a\_{\nu q} = \sum\_{\nu=1}^{L} \frac{1}{T^2 \overline{D}\_{\nu}} \left( T \overline{a}\_{\nu p} \right) \left( T \overline{a}\_{\nu q} \right) = \sum\_{\nu=1}^{L} \frac{1}{\overline{D}\_{\nu}} \overline{a}\_{\nu p} \overline{a}\_{\nu q}.\tag{71}$$

Note that the expression $P\_0^W(t, \eta\_0, \ldots, \eta\_N)$ depends on the fixed values *z*1, ... , *zL* of *Z*1, *Z*2, ... , *ZL*. So, the WLCE method is defined by Formulae (67)–(71) under conditions (61)–(65).

#### **5. Synthesis of a Linear Optimal System for the Criterion of the Maximum Probability That the Signal Will Not Exceed a Particular Value in Absolute Magnitude**

In case (2), the conditional risk *ρ*(*A*|*W*) equals the probability that the error exits from the interval:

$$\rho(A|\mathcal{W}) = E[l(\mathcal{W}, \mathcal{W}^\*)|\mathcal{W}] = P(|\mathcal{W}^\* - \mathcal{W}| \ge w(t)) = 1 - P(|\mathcal{W}^\* - \mathcal{W}| < w(t)).\tag{72}$$

The a priori density *f*(*u*) = *f*(*u*1, ... , *uN*) of the RV *U* = [*U*<sup>1</sup> *U*<sup>2</sup> ... *UN*] *<sup>T</sup>* is defined by the formula

$$f(\boldsymbol{\mu}\_{1},\ldots,\boldsymbol{\mu}\_{N}) = \left[ (2\pi)^{N} |\boldsymbol{K}| \right]^{-\frac{1}{2}} \exp\left\{ -\frac{1}{2} \sum\_{p,q=1}^{N} c\_{pq} \boldsymbol{\mu}\_{p} \boldsymbol{\mu}\_{q} \right\} \tag{73}$$

where *K* is the covariance matrix of *U*, and *cpq* (*p*, *q* = 1, *N*) are the elements of *K*−1. Let us find the minimum of the integral

$$\mathcal{I}(P^W, \eta\_0, \dots, \eta\_N, t) = \left[ (2\pi)^N |K| \right]^{-\frac{1}{2}} \int\limits\_{\left|\sum\_{r=1}^N u\_r(\zeta\_r(t) - b\_{r0}) + \eta\_0 - P^W\right| \ge w(t)} \exp\left\{ \sum\_{r=1}^N \eta\_r u\_r - \frac{1}{2} \sum\_{p,q=1}^N \left( c\_{pq} + b\_{pq} \right) u\_p u\_q \right\} du\_1 \dots du\_N. \tag{74}$$

Integral (74) is proportional to the probability that the normal random point (*U*1, *U*2, ... , *UN*) does not fall into the subspace defined by the inequality $\left| \sum\_{r=1}^{N} u\_r(\zeta\_r(t) - b\_{r0}) + \eta\_0 - P^W \right| < w(t)$. This probability is minimal if the mathematical expectation lies on the line $\sum\_{r=1}^{N} u\_r(\zeta\_r(t) - b\_{r0}) + \eta\_0 - P^W = 0$. The normal density attains its maximum at the mathematical expectation, so to determine it, it suffices to equate the partial derivatives of (74) with respect to *u*1, *u*2, ... , *uN* to zero. The value $P\_0^W(t, \eta\_0, \ldots, \eta\_N)$ minimizing (74) is equal to:

$$P\_0^W = \sum\_{r=1}^N \lambda\_r(t)(\zeta\_r(t) - b\_{r0}) + \eta\_0. \tag{75}$$

To find the functions *λ*1(*t*), *λ*2(*t*), ... , *λN*(*t*), it is necessary to solve the system of linear algebraic equations:

$$\sum\_{p=1}^{N} \lambda\_p(t) \left(c\_{pq} + b\_{pq}\right) = \eta\_q(t) \left(q = \overline{1, N}\right). \tag{76}$$

In matrix form, Equation (76) is as follows:

$$\mathbf{C}\_{1} \cdot \boldsymbol{\Lambda} = \boldsymbol{A}\_{1}^{T} \cdot \boldsymbol{Z}\_{1} \tag{77}$$

where

$$C\_{1} = \left(c\_{ij} + b\_{ij}\right)\_{i,j=1}^{N}, \; A\_{1} = \left(\frac{\overline{a}\_{ij}}{\overline{D}\_{i}}\right)\_{i,j=1}^{L,N}, \; Z\_{1} = \left[\overline{z}\_{1}, \overline{z}\_{2}, \dots, \overline{z}\_{L}\right]^{T}, \; \Lambda = \left[\lambda\_{1}(t), \dots, \lambda\_{N}(t)\right]^{T}. \tag{78}$$

Hence,

$$
\Lambda = \mathbb{C}\_1^{-1} \cdot A\_1^T \cdot Z\_1. \tag{79}
$$

Using notations

$$B\_1 = \begin{pmatrix} \zeta\_1(t) - b\_{10} \\ \dots \\ \zeta\_N(t) - b\_{N0} \end{pmatrix}, \ Y\_1 = \begin{pmatrix} \overline{y}\_1(t) \\ \dots \\ \overline{y}\_L(t) \end{pmatrix} \tag{80}$$

we get the Bayes optimal operator in matrix form:

$$A = B\_1^T \cdot \mathbb{C}\_1^{-1} \cdot A\_1^T + Y\_1^T. \tag{81}$$

The Bayes optimal estimate of output StP is defined by

$$W^\*(t) = A \cdot Z\_1. \tag{82}$$
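In code, (77)–(82) reduce to one linear solve and a few matrix products. A minimal numpy sketch (the names are ours; `np.linalg.solve` replaces the explicit inverse in (79) for numerical stability):

```python
import numpy as np

def bayes_optimal_estimate(C1, A1, B1, Y1, z1):
    """(77)-(82): solve C1*Lam = A1^T*Z1 for Lam, then W* = A*Z1 with
    A = B1^T * C1^{-1} * A1^T + Y1^T.
    Shapes: C1 (N,N), A1 (L,N), B1 (N,), Y1 (L,), z1 (L,)."""
    Lam = np.linalg.solve(C1, A1.T @ z1)       # (79)
    A = B1 @ np.linalg.solve(C1, A1.T) + Y1    # (81), a row operator of length L
    return Lam, float(A @ z1)                  # (82)
```

By construction, the estimate returned equals $B\_1^T \Lambda + Y\_1^T Z\_1$, which is exactly the right-hand side of (75) written through (80).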

The mean risk is

$$\begin{split} R(A) &= \left[ (2\pi)^{N+L} D\_1 \cdot \ldots \cdot D\_L \cdot |K| \right]^{-\frac{1}{2}} \int\limits\_{\left| \sum\_{r=1}^{N} u\_r(\zeta\_r(t) - b\_{r0}) + \eta\_0 - P\_0^W \right| \ge w(t)} \exp\left\{ -\frac{1}{2} \sum\_{\nu=1}^{L} \frac{z\_\nu^2}{D\_\nu} - \sum\_{\nu=1}^{L} \sum\_{r=1}^{N} \frac{\alpha\_{\nu r}}{D\_\nu} z\_\nu u\_r - \frac{1}{2} \sum\_{p,q=1}^{N} \left( c\_{pq} + b\_{pq} \right) u\_p u\_q \right\} du\_1 \ldots du\_N dz\_1 \ldots dz\_L \\ &= 1 - \left[ (2\pi)^{N+L} D\_1 \cdot \ldots \cdot D\_L \cdot |K| \right]^{-\frac{1}{2}} \int\limits\_{\left| \sum\_{r=1}^{N} u\_r(\zeta\_r(t) - b\_{r0}) + \eta\_0 - P\_0^W \right| < w(t)} \exp\left\{ -\frac{1}{2} \sum\_{\nu=1}^{L} \frac{z\_\nu^2}{D\_\nu} - \sum\_{\nu=1}^{L} \sum\_{r=1}^{N} \frac{\alpha\_{\nu r}}{D\_\nu} z\_\nu u\_r - \frac{1}{2} \sum\_{p,q=1}^{N} \left( c\_{pq} + b\_{pq} \right) u\_p u\_q \right\} du\_1 \ldots du\_N dz\_1 \ldots dz\_L. \end{split} \tag{83}$$

Equations (75)–(83) define the method of synthesis of a linear system for the criterion of the maximum probability that the signal will not exceed a particular value in absolute magnitude. The new results generalize particular results [27–31] for different Bayes criteria in OStS.


#### **6. Example**

The software tools designed on the basis of the results of Section 5 make it possible to compare mathematical models of different classes of linear OStS and their optimal instrumental potential accuracy in the presence of stochastic factors and noises.

Let us consider the extrapolator for a radar-location device described by the following equations:

$$Z(\tau) = U\_1 + U\_2\tau + X(\tau),\ W(t) = U\_1 + U\_2(t + \Delta),\ \tau \in [t - T, t] \tag{84}$$

Here, *U*<sup>1</sup> and *U*<sup>2</sup> are random calibration parameters of the calibration device, and *X* is colored noise. For the criterion of the maximum probability that the signal will not exceed a particular value *a* in absolute magnitude, we use algorithm (82).

Suppose that:


$$f(u\_1, u\_2) = \frac{\sqrt{c\_{11}c\_{22} - c\_{12}^2}}{2\pi} \exp\left\{-\frac{1}{2} \sum\_{p,q=1}^2 c\_{pq}u\_p u\_q\right\} \tag{85}$$

(*cpq* are elements of the inverse covariance matrix *K*−1);

– Input data: *t* ∈ [9; 18], *T* = 8, Δ = 1, *D* = 1, *α* = 1, $K = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, *ξ*1(*τ*) = 1, *ξ*2(*τ*) = *τ*; *ζ*1(*t*) = 1, *ζ*2(*t*) = *t* + Δ, *J* = 2, *L* = 8.

The typical realization in Figure 1 demonstrates the high accuracy of the method. For quick calibration of typical devices in practice, algorithms simpler than (82) were developed, computed and compared. This information is necessary for passport documentation.

**Figure 1.** Graphs of: (**a**) signal extrapolation *W* and estimate extrapolation *W*∗; (**b**) module |*W*<sup>∗</sup> − *W*|.

The extrapolator takes values from −38.6099 to 11.9854. At the same time, the extrapolator error modulus does not exceed 0.7568 (Figure 1).

#### **7. Conclusions**

This article is devoted to problems of optimizing observable stochastic systems based on wavelet canonical expansions. Section 2 is devoted to different Bayes criteria in terms of risk theory. Following [1,2], in Section 3, basic formulae for optimal Bayes synthesis based on canonical expansions are given. Section 4 is dedicated to the solution of a general optimization problem using wavelet canonical expansions in the case of complex nonstationary linear systems. In Section 5, a basic algorithm is given for the criterion of the maximum probability that the signal will not exceed a particular value in absolute magnitude. An example of a radar-location extrapolator device is discussed.

The developed optimization methodology "quick probabilistic analytical numerical optimization" does not use statistical Monte Carlo methods.

Directions of future generalizations and implementations:


**Acknowledgments:** The research was carried out using the infrastructure of the Shared Research Facilities "High Performance Computing and Big Data" (CKP "Informatics") of FRC CSC RAS (Moscow).

**Author Contributions:** Conceptualization, I.S.; methodology, I.S., V.S., T.K.; software, E.K., T.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** The research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**


#### **References**


## *Article* **Identification of Quadratic Volterra Polynomials in the "Input–Output" Models of Nonlinear Systems**

**Yury Voscoboynikov 1,2, Svetlana Solodusha 3,\*, Evgeniia Markova 3, Ekaterina Antipina 3,4 and Vasilisa Boeva <sup>1</sup>**


**Abstract:** In this paper, we propose a new algorithm for constructing an integral model of a nonlinear dynamic system of the "input–output" type in the form of a quadratic segment of the Volterra integropower series (polynomial). We consider nonparametric identification of models using physically realizable piecewise linear test signals in the time domain. The advantage of the presented approach is to obtain explicit formulas for calculating the transient responses (Volterra kernels), which determine the unique solution of the Volterra integral equations of the first kind with two variable integration limits. The numerical method proposed in the paper for solving the corresponding equations includes the use of smoothing splines. An important result is that the constructed identification algorithm has a low methodological error.

**Keywords:** nonparametric identification; dynamic system; integral model; Volterra equations; smoothing cubic splines; selection of the smoothing option

**MSC:** 45D05

#### **1. Introduction**

The development of the theory of dynamical systems, taking into account the specifics of applied problems, aims to create new mathematical methods. This paper is devoted to the development of mathematical tools for studying inverse problems in the theory of dynamical systems. The work aims to develop a methodology and algorithms for identifying Volterra polynomials (finite segments of Volterra series) [1]:

$$y(t) = \sum\_{n=1}^{N} \int\_{0}^{t} \dots \int\_{0}^{t} K\_{n}(t, s\_{1}, \dots, s\_{n}) \prod\_{k=1}^{n} x(s\_{k}) ds\_{k}, \ t \in [0, T]. \tag{1}$$

The Volterra integro-power series is well known in the theory of mathematical modeling of nonlinear dynamic systems of the "input–output" type. However, modern and classical studies in this area do not provide a universal mathematical apparatus for studying problems with restrictions on the dynamic characteristics of systems.

Reference [2] contains an extensive list of references on methods for identifying nonlinear objects using Volterra integral equations. References [3–7] are devoted to methods for constructing dynamic models using Volterra polynomials. Models based on the Volterra theory are used to describe stochastic systems [8], as well as for the structural identification of nonlinear dynamic systems [9]. A systematic approach to modeling nonlinear dynamic systems by formalizing the relationship between input *x*(*t*) and output *y*(*t*) was

**Citation:** Voscoboynikov, Y.; Solodusha, S.; Markova, E.; Antipina, E.; Boeva, V. Identification of Quadratic Volterra Polynomials in the "Input–Output" Models of Nonlinear Systems. *Mathematics* **2022**, *10*, 1836. https://doi.org/10.3390/ math10111836

Academic Editors: Natalia Bakhtadze, Igor Yadykin, Andrei Torgashov and Nikolay Korgin

Received: 12 April 2022 Accepted: 24 May 2022 Published: 26 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

first implemented by Norbert Wiener [10]. He applied the Volterra series in the analysis of nonlinear electronic circuits. He developed efficient identification algorithms for the case of an input signal in the form of Gaussian white noise. Wiener's research was continued in the works of Marmarelis, Schetzen, Rugh, and other researchers (see, for example, the reviews in [11,12]). The system responses to test signals in the form of ideal white noise are used to identify the Wiener kernels. In practice, the implementation of such input actions is carried out with inevitable errors, which are compensated by choosing the optimal range in test disturbances [13]. When solving inverse quantum mechanical problems, researchers use wave functions [14] to construct Volterra integral models. The identification of Volterra kernels is based on minimizing the root-mean-square error from the response of the dynamic system tested. This approach is associated with the extreme complexity of practical implementation [15].

In this regard, researchers strive to simplify these methods (see, for example, [16–19]). In particular, the authors of [18] implemented the case where the Volterra kernels are assumed to be separable,

$$K\_i(s\_1, \ldots, s\_i) = \prod\_{n=1}^i g(s\_n), i = \overline{1,3}, \tag{2}$$

as well as the satisfiability of a priori conditions,

$$K\_n(s\_1, \ldots, s\_n) = 0, \ n > 3. \tag{3}$$

Reference [16] considered a modified discrete analog of the cubic Volterra polynomial:

$$\begin{aligned} y(t\_i) &= \sum\_{m\_1=0}^{N\_1-1} K\_1(t\_{m\_1}) \mathbf{x}(t\_{i-m\_1}) + \sum\_{m\_1=0}^{N\_2-1} \sum\_{m\_2=m\_1}^{N\_2-1} K\_2(t\_{m\_1}, t\_{m\_2}) \mathbf{x}(t\_{i-m\_1}) \mathbf{x}(t\_{i-m\_2}) + \\ &+ \sum\_{m\_1=0}^{N\_3-1} \sum\_{m\_2=m\_1}^{N\_3-1} \sum\_{m\_3=m\_2}^{N\_3-1} K\_3(t\_{m\_1}, t\_{m\_2}, t\_{m\_3}) \mathbf{x}(t\_{i-m\_1}) \mathbf{x}(t\_{i-m\_2}) \mathbf{x}(t\_{i-m\_3}), \end{aligned} \tag{4}$$

where the symmetric kernels *K*<sup>2</sup> and *K*<sup>3</sup> are defined only on one of the subdomains 0 ≤ *m*<sup>1</sup> ≤ *m*<sup>2</sup> ≤ *N*<sup>2</sup> − 1 and 0 ≤ *m*<sup>1</sup> ≤ *m*<sup>2</sup> ≤ *m*<sup>3</sup> ≤ *N*<sup>3</sup> − 1, respectively. To reduce computational costs, the authors of [16] proposed a transition from (4) to relations

$$y(t\_i) = \sum\_{m\_1=0}^{N\_1-1} K\_1(t\_{\mathfrak{m}\_1}) \mathbf{x}(t\_{i-m\_1}) + \sum\_{m\_1=0}^{N\_2-1} \sum\_{m\_2=m\_1}^{N\_2-1} K\_2(t\_{\mathfrak{m}\_1}, t\_{\mathfrak{m}\_2}) \mathbf{x}(t\_{i-m\_1}) \mathbf{x}(t\_{i-m\_2}) + \sum\_{m=0}^{N\_3-1} \tilde{K}\_3(t\_{\mathfrak{m}}) \mathbf{x}^3(t\_{i-m}) \tag{5}$$

or

$$y(t\_i) = \sum\_{m\_1=0}^{N\_1-1} K\_1(t\_{m\_1}) x(t\_{i-m\_1}) + \sum\_{m=0}^{N\_2-1} \bar{K}\_2(t\_m) x^2(t\_{i-m}) + \sum\_{m=0}^{N\_3-1} \bar{K}\_3(t\_m) x^3(t\_{i-m}).\tag{6}$$
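The simplified model (6) is a sum of three discrete convolutions, one per power of the input. A sketch (zero initial conditions are assumed, i.e., *x* = 0 before the start of the record; the function and argument names are ours):

```python
import numpy as np

def diagonal_volterra(K1, K2b, K3b, x):
    """Evaluate (6): y_i = sum_m K1[m] x_{i-m}
                         + sum_m K2b[m] x_{i-m}^2
                         + sum_m K3b[m] x_{i-m}^3,
    with x_{i-m} taken as 0 for i - m < 0."""
    n = len(x)
    y = np.zeros(n)
    for p, K in ((1, K1), (2, K2b), (3, K3b)):
        # np.convolve gives exactly sum_m K[m] * (x**p)[i-m]; truncate to n samples
        y += np.convolve(K, x ** p)[:n]
    return y
```

This costs three 1-D convolutions instead of the double and triple sums of (4), which is the computational saving the transition to (5) and (6) is meant to deliver.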

The choice between (5) and (6) depends on the statistical properties of the input signals. In this case, in (5) and (6), one solves the problem of restoring functions of one variable instead of the problem of determining in (4) the functions *Kn*, *n* = 2, 3, of many variables. Moreover, instead of searching for *Kn*(*t*,*s*1,...,*sn*) on the entire domain of definition 0 ≤ *s*1, ... ,*sn* ≤ *t* ≤ *T*, researchers confine themselves to the values of the function at fixed values *s*<sup>1</sup> = *s*<sup>2</sup> = ... = *sn* = *t*, *t* ∈ [0, *T*]. In particular, this approach was applied in [20] (p. 1387) and [21] (p. 1078). The critical review of [22] (pp. 178–179) explained the difference between these problems in detail using the approaches described in [23,24] as an example.

As noted in [25], "for the presentation of information in the time domain, the expediency of using pulsed and stepped test signals is obvious". A method based on the use of *δ*-functions was proposed in [26] and developed later in [27]. It suggests using the (*n* − 1)-parametric family,

$$\chi\_{\omega\_1,\dots,\omega\_{n-1}}(t) = \sum\_{j=0}^{n-1} \delta\left(t - \omega\_j\right), \omega\_0 = 0, \omega\_j \ge 0, \sum\_{j=0}^{n-1} \omega\_j \le t \le T,\tag{7}$$

where *δ*(*s*) is the Dirac *δ*-function,

$$\delta(s) = \begin{cases} 0, \text{s} \neq 0, \\ \infty, \text{s} = 0, \end{cases}$$

as test actions for identifying the *Kn*(*s*1,...,*sn*).

A discrete analog of this approach is the numerical algorithm proposed in [28]. Note that the technique based on (6) has a limited scope. An explanation for this can be found in [29] (p. 142): " ... this simple idea is impulse-response analysis. Its basic weakness is that many physical processes do not allow pulse inputs ... Moreover, such input could make the system exhibit nonlinear effect that would disturb the linearized behavior we have set out to model". Readers can find a detailed review of identification methods based on impulse disturbances in [27,30].

Let us now turn to methods based on the application of Heaviside functions *e*(*t*). Reference [31] considered an approach related to approximating on [0, *T*] a periodic test signal by a discretely given stepwise one with a constant quantization step. It is assumed that the initial continuous input signal has a constant period *T*. This technique was further developed in [32,33], in which

$$\alpha\_{\omega\_1,\ldots,\omega\_{n-1}}(t) = \sum\_{j=1}^n C\_{\omega\_j} \alpha e\left(t - \omega\_j\right), \ \omega\_j \ge 0, \ \sum\_{j=1}^n \omega\_j \le t \le T,$$

was used as the test signal for identifying *Kn*, *n* ≥ 2, where *α* is the signal amplitude (height), and *Cω<sup>j</sup>* is a logical variable equal to zero if *ωj* = 0.

In [34], a modification was made for a dynamical system with two inputs. Here, the identification process included a heuristic algorithm for dividing the system response *y*(*t*) into components due to the influence of a separate integral term of the quadratic Volterra model.

In this paper, we consider dynamic systems, the transient characteristics of which are presented in the time domain. The possibility of scaling in time makes it possible to study fast processes that are typical for many technical (energy) systems. The method of finding the transient characteristics of the system is deterministic. Fewer data are required to formalize the mathematical model in comparison with the probabilistic method. The collection of initial data occurs during the execution of an active experiment, which implies the possibility of influencing the system with test input signals. In comparison with a passive experiment (observation), this method allows one to reduce the time for collecting initial data and specify the type of test signal.

Reference [3] presented a method for identifying Volterra kernels using a combination of Heaviside functions with a deviating argument as test signals. Its advantage lies in the transition from the original problem to the solution of such special multidimensional Volterra equations of the first kind with variable upper and lower integration limits, which have explicit inversion formulas. The scope of this technique for modeling the dynamics of real-life technical objects is limited by the complexity of the formation of piecewise constant test signals. Reference [35] considered the possibility of using test signals of a piecewise linear form,

$$\mathbf{x}(t) \equiv \mathbf{x}\_{\nu}(t) = \begin{cases} 0, & t \le 0, \\ \frac{t}{\nu}, & 0 < t \le \nu, \\ 1, & t > \nu \end{cases} \tag{8}$$

in the problem of identifying a two-dimensional continuum of unknowns from a linear Volterra equation of the first kind with a nonstationary kernel. Figure 1 shows the form of the input signal (8).

**Figure 1.** The form of the input signal (8).
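The piecewise linear signal (8) is a ramp saturating at 1, which is one line of numpy (a sketch; the function name is ours):

```python
import numpy as np

def x_nu(t, nu):
    """Test signal (8): 0 for t <= 0, t/nu on (0, nu], 1 for t > nu."""
    # clip implements all three branches at once: t/nu < 0 -> 0, > 1 -> 1
    return np.clip(np.asarray(t, dtype=float) / nu, 0.0, 1.0)
```

Letting the slope parameter *ν* → 0 recovers the Heaviside step *e*(*t*), so (10) is the limiting case of this family.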

The chosen modification of the input signals simplifies their formation in practice, and the distinguished Volterra integral equations of the first kind, as before, have a unique solution in the class of continuous functions.

The identification method was developed for further application to the automatic simulation of the nonlinear dynamics of heat and electric power industry objects based on Volterra polynomials with a vector input.

The purpose of this work is, firstly, to exploit the reserve for increasing the accuracy of constructing an integral model, presented as a modified quadratic Volterra polynomial, through the use of piecewise linear signals close to those of real-life dynamic systems, and secondly, to develop measurement noise-resistant algorithms for identifying functions of two variables.

The paper is organized as follows: Section 2 describes the technique for building an integral model using piecewise linear test signals. It also presents an example illustrating the effect of increasing the accuracy of modeling the linear term by applying piecewise linear signals. Section 3 contains a numerical algorithm for identifying the quadratic term of the Volterra series based on smoothing cubic splines. Section 4 considers the implementation of the numerical solution algorithm using the quadrature method. Section 5 suggests directions for future work. Section 6 contains the main results.

#### **2. Method for Constructing a Quadratic Volterra Polynomial**

Let us consider a quadratic model containing a linear nonstationary component,

$$y(t) = \int\_0^t K\_1(t, s)x(s)ds + \int\_0^t \int\_0^t K\_2(s\_1, s\_2)x(t - s\_1)x(t - s\_2)ds\_1ds\_2, t \in [0, T]. \tag{9}$$
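For reference, the forward problem, evaluating (9) for given kernels and input, can be sketched with a rectangle rule on a uniform grid (an illustration only; the callables and names are ours):

```python
import numpy as np

def quadratic_model(K1, K2, x, t):
    """Rectangle-rule evaluation of (9) on a uniform grid t.
    K1(t, s), K2(s1, s2) and x(s) are callables accepting numpy arrays."""
    h = t[1] - t[0]
    y = np.zeros(len(t))
    for i in range(len(t)):
        s = t[: i + 1]
        xr = x(t[i] - s)                                   # x(t - s) on the grid
        y[i] = h * np.sum(K1(t[i], s) * x(s)) \
             + h * h * xr @ K2(s[:, None], s[None, :]) @ xr
    return y
```

With *K*1 ≡ 1, *K*2 ≡ 1/2 and *x* ≡ 1 this reproduces *y*(*t*) = *t* + *t*²/2 up to quadrature error, which is a convenient smoke test for identification experiments.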

To identify the Volterra kernels *K*1(*t*,*s*), 0 ≤ *s* ≤ *t* ≤ *T*, *K*2(*s*1,*s*2), 0 ≤ *s*1,*s*<sup>2</sup> ≤ *t* ≤ *T*, the authors of [36] used test signals

$$x(t) \equiv x\_{\nu}^{\alpha\_{1,2}}(t) = \alpha\_{1,2}(e(t) - e(t-\nu)), \ 0 \le \nu \le t \le T,\tag{10}$$

where *α*<sup>1</sup> ≠ *α*2. Figure 2 shows the form of the input signal (10) when the signal amplitude is equal to 1.

**Figure 2.** The form of the input signal (10).

Substituting (10) in (9) leads to the following system:

$$\begin{aligned} \alpha\_1 \int\_0^\nu K\_1(t, s) ds + \alpha\_1^2 \int\_{t-\nu}^t \int\_{t-\nu}^t K\_2(s\_1, s\_2) ds\_1 ds\_2 &= y^{\alpha\_1}(t, \nu), \\ \alpha\_2 \int\_0^\nu K\_1(t, s) ds + \alpha\_2^2 \int\_{t-\nu}^t \int\_{t-\nu}^t K\_2(s\_1, s\_2) ds\_1 ds\_2 &= y^{\alpha\_2}(t, \nu), \end{aligned} \tag{11}$$

where *α*<sup>1</sup> ≠ *α*2, 0 ≤ *ν* ≤ *t* ≤ *T*, which implies that

$$K\_1(t, \nu) = f\_{1\nu}'(t, \nu),\tag{12}$$

$$K\_2(t, t - \nu) = \frac{1}{2} \left( f\_{2t\nu}''(t, \nu) + f\_{2\nu^2}''(t, \nu) \right),\tag{13}$$

where

$$f\_1(t, \nu) = \frac{\alpha\_2^2 y^{\alpha\_1}(t, \nu) - \alpha\_1^2 y^{\alpha\_2}(t, \nu)}{\alpha\_1 \alpha\_2 (\alpha\_2 - \alpha\_1)},\tag{14}$$

$$f\_2(t, \nu) = \frac{\alpha\_1 y^{\alpha\_2}(t, \nu) - \alpha\_2 y^{\alpha\_1}(t, \nu)}{\alpha\_1 \alpha\_2 (\alpha\_2 - \alpha\_1)}.\tag{15}$$
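Once *f*1 has been formed from the two measured responses via (14), the derivative in (12) can be approximated numerically; a central-difference sketch (the callable `f1` and the step `h` are our illustration choices, not the smoothing-spline scheme developed later in the paper):

```python
def identify_K1(f1, t, nu, h=1e-4):
    """Central-difference approximation of (12): K1(t, nu) = df1/dnu at (t, nu)."""
    return (f1(t, nu + h) - f1(t, nu - h)) / (2.0 * h)
```

Plain finite differences amplify measurement noise, which is exactly why Section 3 replaces them with differentiation of smoothing cubic splines.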

Let us carry out the procedure for identifying the Volterra kernel *K*2(*s*1,*s*2) symmetric in variables *s*1,*s*2, using Equations (13) and (15). Then the problem of identifying *K*1(*t*,*s*) from (9) reduces to solving

$$\begin{aligned} \int\_0^t K\_1(t, s) x(s) ds &= q(t),\\ q(t) &= y(t) - \int\_0^t \int\_0^t K\_2(s\_1, s\_2) x(t - s\_1) x(t - s\_2) ds\_1 ds\_2, \end{aligned} \tag{16}$$

where *K*2(*s*1,*s*2) is known. Applying test signals (8) in addition to (10), we obtain Equation (16), where

$$q(t) \equiv q\_\nu(t) = \begin{cases} 0, & t = 0, \,\,\nu = 0, \\ g(t, \nu), & 0 < \nu \le t, \end{cases}$$

which can be represented in the form

$$\int\_{0}^{\nu} \mathcal{K}\_{1}(t,s) \frac{s}{\nu} ds + \int\_{\nu}^{t} \mathcal{K}\_{1}(t,s) ds = q(t,\nu), \tag{17}$$

$$\begin{split} q(t,\nu) &= g(t,\nu) - \int\_{t-\nu}^{t} \int\_{t-\nu}^{t} K\_{2}(s\_{1},s\_{2}) \frac{t-s\_{1}}{\nu} \frac{t-s\_{2}}{\nu} ds\_{1} ds\_{2} -\\ &\quad - 2 \int\_{t-\nu}^{t} ds\_{1} \int\_{0}^{t-\nu} K\_{2}(s\_{1},s\_{2}) \frac{t-s\_{1}}{\nu} ds\_{2} - \int\_{0}^{t-\nu} \int\_{0}^{t-\nu} K\_{2}(s\_{1},s\_{2}) ds\_{1} ds\_{2}. \end{split}$$

Here, *g*(*t*, *ν*) is the response of a dynamic object to signal (8) at 0 ≤ *ν* ≤ *t* ≤ *T*. Following [35,37], the inversion formula for (17) has the form

$$K\_1(t, \nu) = -\left(2g\_{\nu}'(t, \nu) + \nu g\_{\nu^2}''(t, \nu)\right). \tag{18}$$

Let us compare the effect of using test signals (8) and (10) when building an integral model (9).

The example below demonstrates the gain in simulation accuracy when test signals of the form (8) are used. Let the "reference" dynamical system be represented by a cubic Volterra polynomial with kernels $K_1 = 1$, $K_2 = \frac{1}{2}$, $K_3 = \frac{1}{3!}$, so that

$$y_{et}(t) = \int_0^t x(s)ds + \frac{1}{2} \left(\int_0^t x(s)ds\right)^2 + \frac{1}{3!} \left(\int_0^t x(s)ds\right)^3. \tag{19}$$

The technique for constructing quadratic and cubic Volterra polynomials, based on the use of piecewise constant test signals of type (10), has been successfully tested on dynamic systems of various physical nature, including a mathematical model of type (19), as well as in modeling the dynamics of a heat exchanger element and wind power plant [38]. Note that (19) is a partial sum of the series for the function

$$e^{\int_{0}^{t} x(s)ds} - 1.$$

This function has proven itself well in the study of the areas of applicability of identification algorithms for quadratic and cubic Volterra polynomials [38,39]. We apply the procedure for identifying kernels by using test signals (10) with amplitudes *α*<sup>1</sup> = −*α*<sup>2</sup> = *α* > 0 and, instead of (9), obtain

$$y_1(t) = \int_0^t \left(1 + \frac{\alpha^2}{2}s^2\right) x(s)\,ds + \frac{1}{2} \left(\int_0^t x(t-s)\,ds\right)^2, \tag{20}$$

where the Volterra kernels were restored using Equations (12) and (13), respectively.

The combined model (9), in which test signals (8) with amplitude *α* are used in addition to (10) to identify *K*1(*t*,*s*), has the form

$$y_2(t) = \int_0^t \left(1 + \alpha^2 \left(\frac{1}{4}s^2 - \frac{3}{4}ts + \frac{1}{2}t^2\right)\right) x(s)\, ds + \frac{1}{2} \left(\int_0^t x(t-s)\, ds\right)^2,\tag{21}$$

where the kernel identification was performed using Equations (18) and (13), respectively. On the signals $x_\beta(t) = \frac{t}{\beta}$, $\beta = k \cdot \alpha \cdot 0.01$, $k = \overline{1, B}$, model (20) gives the residual

$$m_1(t) = y_{et}^{\beta}(t) - y_1^{\beta}(t) = \frac{t^6}{48\beta^3} - \frac{\alpha^2 t^4}{8\beta},$$

and model (21) gives residual

$$m_2(t) = y_{et}^{\beta}(t) - y_2^{\beta}(t) = \frac{t^6}{48\beta^3} - \frac{\alpha^2 t^4}{16\beta},$$

where $y_{et}^{\beta}$ is the response (19) to the signal $x_\beta(t)$.
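These residual formulas can be cross-checked numerically: evaluate (19)–(21) on $x_\beta(t) = t/\beta$ by quadrature and compare the differences with the closed forms. A minimal Python sketch (the parameter values and grid size are arbitrary illustrative choices):

```python
import numpy as np

def trap(y, x):
    # composite trapezoidal rule
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def residuals(t, alpha, beta, n=20001):
    s = np.linspace(0.0, t, n)
    x = s / beta                              # test signal x_beta(s) = s / beta
    u = trap(x, s)                            # int_0^t x(s) ds = t^2 / (2 beta)
    y_et = u + u**2 / 2 + u**3 / 6            # reference response (19)
    # model (20): linear kernel 1 + (alpha^2/2) s^2; note int_0^t x(t-s) ds = u
    y1 = trap((1 + 0.5 * alpha**2 * s**2) * x, s) + 0.5 * u**2
    # model (21): linear kernel 1 + alpha^2 (s^2/4 - 3 t s/4 + t^2/2)
    y2 = trap((1 + alpha**2 * (s**2 / 4 - 3 * t * s / 4 + t**2 / 2)) * x, s) + 0.5 * u**2
    return y_et - y1, y_et - y2

t, alpha, beta = 1.5, 0.3, 2.0
m1, m2 = residuals(t, alpha, beta)
m1_closed = t**6 / (48 * beta**3) - alpha**2 * t**4 / (8 * beta)
m2_closed = t**6 / (48 * beta**3) - alpha**2 * t**4 / (16 * beta)
```

The quadrature values agree with the closed forms, confirming that model (21) halves the linear-term error relative to model (20).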

Let us present an algorithm for constructing the polynomial (9) for modeling the response of the dynamic system represented in the form (19).

*Step 1*. Calculation of the values of $y_{et}^{\alpha}(t, \nu)$ and $y_{et}^{-\alpha}(t, \nu)$ by substituting (10) with amplitudes $\alpha_1 = -\alpha_2 = \alpha > 0$ into the right-hand side of (19).

*Step 2*. Calculation by (15) of the values of the right-hand side of the integral equation,

$$\int\_{t-\nu}^{t} \int\_{t-\nu}^{t} K\_2(s\_1, s\_2) ds\_1 ds\_2 = f\_2(t, \nu), \\ 0 \le \nu \le t \le T.$$

*Step 3*. Application of Equation (13) for identifying *K*2(*s*1,*s*2), 0 ≤ *s*1,*s*<sup>2</sup> ≤ *T*.

*Step 4*. Calculation of the values of $y_{et}^{\alpha}(t, \nu)$ by substituting (8) with amplitude *α* into the right-hand side of (19).

*Step 5*. Calculation of the right-hand side $q(t, \nu)$ of (17), where $K_2(s_1,s_2)$ and $g(t, \nu) \equiv y_{et}^{\alpha}(t, \nu)$ were obtained in the previous Steps 3 and 4, respectively.

*Step 6*. Application of Equation (18) for identifying *K*1(*t*, *ν*), 0 ≤ *ν* ≤ *t* ≤ *T*.

*Step 7*. Substitution of kernels *K*2(*s*1,*s*2) and *K*1(*t*, *ν*) obtained in steps 3 and 6, respectively, into the right-hand side of (9). This leads to (21).

The modeling accuracy of the response *y*1(*t*) was compared with that of *y*2(*t*). The mean absolute error was chosen as the accuracy criterion:

$$MAE_r(t) = \frac{1}{B} \sum_{k=1}^{B} |m_r(t)|, \quad r = 1, 2, \; t \in [0, 15].$$

In Figure 3, black color shows the areas of fulfillment of the inequality *MAE*2(*t*) < *MAE*1(*t*) for *B* = 10, 25, 40 with an accuracy of *δ* = 10−2.

**Figure 3.** Areas of fulfillment of the inequality *MAE*2(*t*) < *MAE*1(*t*) for (**a**) *B* = 10, (**b**) *B* = 25, and (**c**) *B* = 40.

The computational experiment showed that the areas of efficiency of the integral models (20) and (21) depend on the length of the segment *T*, the amplitude of the test signals *α* used to identify the Volterra kernels, and the accuracy of the calculations *δ*.

Note that we assumed the quadratic term, the two-dimensional kernel *K*2(*t*, *ν*), in Equation (18) to be known. Therefore, in the next section, we consider an algorithm for identifying this term using Equation (13).

#### **3. Identification Algorithm for Quadratic Term**

Unfortunately, the practical implementation of the obtained inversion Equation (13) faces a fundamental difficulty: differentiation is an ill-posed operation [40]. One manifestation of ill-posedness is large errors in the computed derivative, even for very small errors in the differentiated function. Note that the subtraction in (15) of two functions registered with errors increases the variance of the total error in the function *f*2(*t*, *ν*). Thus, stable differentiation of noisy data becomes an urgent problem for the practical implementation of Equation (13).

Reference [41] constructed a stable identification algorithm on the basis of Equation (12) (a stable identification algorithm is one in which the relative identification error is comparable to the relative error of the initial data). There, a smoothing cubic spline (SCS) with defect one was used for a stable calculation of the first derivative. The smoothing parameter was chosen from the condition of the minimum root-mean-square smoothing error. The use of smoothing splines becomes much more complicated in the case of identifying the quadratic kernel *K*2(*τ*,*s*). First, to calculate the second-order mixed derivative $f''_{2t\nu}(t, \nu)$, we need to build a smoothing bicubic spline (SBS), which is a function of the two variables *t*, *ν*. Secondly, the boundary conditions are now given not at the two extreme points of the SCS construction interval, but on four straight lines, which are the boundaries of the rectangular area of the SBS construction. Thirdly, due to the different "smoothness" of the function *f*2(*t*, *ν*) in different variables, we now have to choose two smoothing parameters from the condition for the minimum smoothing error. These difficulties caused the main problems that were not solved in the corresponding scientific publications and which are addressed in this section.

Suppose that the values of the function *f*2(*t*, *ν*) are given at the nodes of a rectangular grid. To account for possible measurement errors (noise), the following representation of the noisy measurements $\tilde f_2(t_i, \nu_j)$ is adopted:

$$\tilde f_2(t_i, \nu_j) = f_2(t_i, \nu_j) + \eta_{i,j}, \; i = 1, \dots, N_t, \; j = 1, \dots, N_\nu,$$

where $\eta_{i,j}$ is random measurement noise with zero mean and variance $\sigma_\eta^2$ (equally accurate measurements). Note that the nodes $t_i$ and $\nu_j$ need not be equally spaced or have equal steps. It is required to calculate the values of the derivatives $f''_{2t\nu}(t, \nu)$, $f''_{2\nu^2}(t, \nu)$ at the given nodes from the initial data $\tilde f_2(t_i, \nu_j)$.

For a stable calculation of these derivatives, we turn to the SCS [42], widely used in the processing of experimental data [43,44]. Suppose we have $N_\nu$ nodes $V_1 = \nu_1 < \nu_2 < \dots < \nu_{N_\nu} = V_2$ on some interval $[V_1, V_2]$. At these nodes, the values of the function (signal) $f(\nu)$ are measured as follows:

$$\tilde f_j = f(\nu_j) + \eta_j, \; j = 1, \dots, N_\nu, \tag{22}$$

where $\eta_j$ is random measurement noise with zero mean and variance $\sigma_\eta^2$ (equally accurate measurements). The smoothing cubic spline $S_{N_\nu,\alpha}(\nu)$ with defect one can be represented on each segment $[\nu_j, \nu_{j+1}]$ by a cubic polynomial of the following form [42]:

$$S_{N_\nu,\alpha}(\nu) = a_j + b_j \cdot (\nu - \nu_j) + c_j \cdot (\nu - \nu_j)^2 + d_j \cdot (\nu - \nu_j)^3. \tag{23}$$

Moreover, the function $S_{N_\nu,\alpha}(\nu)$ must be twice continuously differentiable on the entire interval $[V_1, V_2]$ of its definition. Note that, in contrast to the interpolation spline (which passes through the points $(\nu_j, \tilde f_j)$), the smoothing cubic spline $S_{N_\nu,\alpha}(\nu)$ generally does not pass through these points, but passes more "smoothly" in some neighborhoods of them (depending on the smoothing parameter *α*), thereby providing smoothing (filtering) of the measurement noise.

To uniquely calculate the spline coefficients *aj*, *bj*, *cj*, *dj*, boundary conditions are set at the nodes *ν*1, *νN<sup>ν</sup>* . The following conditions are most often used [42,44]:

• conditions on zero second derivatives of the spline (natural boundary conditions),

$$S''_{N_\nu,\alpha}(\nu_1) = 0; \; S''_{N_\nu,\alpha}(\nu_{N_\nu}) = 0, \tag{24}$$

• conditions on the first derivatives of the spline,

$$S'_{N_\nu,\alpha}(\nu_1) = s'_1; \; S'_{N_\nu,\alpha}(\nu_{N_\nu}) = s'_{N_\nu}, \tag{25}$$

as well as a combination of these conditions (for example, condition (25) on the left and condition (24) on the right). It was shown in [42] that the SCS constructed under these conditions provides a minimum of the functional

$$F_\alpha(S) = \alpha \cdot \int_{\nu_1}^{\nu_{N_\nu}} |S''(\nu)|^2 d\nu + \sum_{j=1}^{N_\nu} p_j^{-1} \cdot \left(\tilde f_j - S(\nu_j)\right)^2,\tag{26}$$

where $p_j$ denotes the weight factors reflecting the accuracy of the *j*-th measurement $\tilde f_j$ (taken equal in the case of equally accurate measurements).

To calculate the spline coefficients (for a given smoothing parameter), it is necessary to compose a system of linear algebraic equations with a five-diagonal matrix with respect to some vector (as a rule, the values of the second derivative of the spline at the nodes $\{\nu_j\}$), through which all the spline coefficients are then found (for details, see [42,44]).

The smoothing parameter *α* "controls" the smoothness of the spline, and the smoothing error (as well as the differentiation error) depends significantly on the value of this parameter [44,45]. There is a parameter value (let us call it optimal) for which the smoothing error (in the accepted norm) is minimal [45]. Let us temporarily assume that we have found an acceptable (in terms of the minimum smoothing error) value of the smoothing parameter (the choice of the parameter is discussed in the next section).
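For illustration, the effect of the smoothing parameter on numerical differentiation can be reproduced with SciPy's `UnivariateSpline`, which parameterizes smoothing by a residual bound `s` rather than the penalty weight *α* in (26); the signal, noise level, and value of `s` below are illustrative assumptions:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
nu = np.linspace(0.0, 2.0, 81)
f_true = np.sin(2.0 * nu)
f_noisy = f_true + 0.01 * rng.standard_normal(nu.size)

# smoothing cubic spline; s bounds the sum of squared residuals,
# set here to the expected noise energy N * sigma^2
spl = UnivariateSpline(nu, f_noisy, k=3, s=nu.size * 0.01**2)
d_spline = spl.derivative()(nu)          # stable derivative estimate
d_naive = np.gradient(f_noisy, nu)       # unsmoothed finite differences
d_true = 2.0 * np.cos(2.0 * nu)

err_spline = np.sqrt(np.mean((d_spline - d_true) ** 2))
err_naive = np.sqrt(np.mean((d_naive - d_true) ** 2))
```

With these settings the smoothed derivative is markedly more accurate than raw finite differences of the noisy data, illustrating why the choice of the smoothing parameter matters.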

**Remark 1.** *It follows from the form of the integrals (11) that the function $f_2(t, \nu)$ takes nonzero values for arguments satisfying the condition $\nu \le t$. For other values of $\nu$, $t$, the function equals zero due to the physical realizability condition of the system for negative values of the arguments, i.e., $K_2(t, \nu) \equiv 0$ if $\nu < 0$ or $t < 0$.*

To eliminate the discontinuity of the first kind at $\nu = t$ when constructing a smoothing spline, we propose to extend the values of the function $f_2(t, \nu)$ to $\nu > t$ according to the following rule:

$$f_2(t, t + \Delta \nu) = \begin{cases} 2f_2(t, t) - f_2(t, t - \Delta \nu), & 0 < \Delta \nu \le t; \\ 2f_2(t, t), & t < \Delta \nu \le T - t. \end{cases}$$

We denote the function extended in this way by $f_2^*(t, \nu)$.
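On a uniform grid this extension rule can be applied directly to the matrix of function values; a minimal sketch, where `F[i, j]` holds $f_2(t_i, \nu_j)$ and the test function and grid are illustrative assumptions:

```python
import numpy as np

def extend_f2(F):
    """Extend f2 beyond the diagonal (nu > t) by the reflection rule above.

    F[i, j] = f2(t_i, nu_j) on a uniform grid, assumed valid for j <= i.
    """
    N = F.shape[0]
    G = F.astype(float).copy()
    for i in range(N):
        for j in range(i + 1, N):
            d = j - i                      # Delta nu in grid steps
            if d <= i:                     # 0 < Delta nu <= t
                G[i, j] = 2.0 * F[i, i] - F[i, i - d]
            else:                          # t < Delta nu <= T - t
                G[i, j] = 2.0 * F[i, i]
    return G

# illustrative data: f2(t, nu) = t * nu on an 8 x 8 index grid,
# given only on and below the diagonal (nu <= t)
idx = np.arange(8.0)
F = np.tril(np.outer(idx, idx))
G = extend_f2(F)
```

For this linear-in-ν test function the reflection continues each row linearly across the diagonal, so no jump is introduced at ν = t.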

We focus first on the algorithm for calculating the values of the derivative $f''_{2\nu^2}(t, \nu)$. It can be represented by the following steps:

*Step 1*. We set the boundary conditions, whose combination at the extreme points $\nu_1$, $\nu_{N_\nu}$ of the construction interval is determined on the basis of available a priori information about the function $f_2^*(t, \nu)$. If no reliable information is available, one should turn to the natural boundary conditions (24).

*Step 2*. For each *i* = 1, . . . , *Nt*, we form a dataset

$$\left\{ \nu_j, \; \widetilde{f1}_j^{(i)} = \tilde f_2^*(t_i, \nu_j), \; j = 1, \dots, N_\nu \right\},$$

select the smoothing parameter $\alpha 1^{(i)}$, and build the SCS $S1^{(i)}_{N_\nu,\alpha 1^{(i)}}(\nu)$, from which we then calculate the first derivative $\hat f'_{2\nu}(t_i, \nu_j) = \frac{d}{d\nu} S1^{(i)}_{N_\nu,\alpha 1^{(i)}}(\nu)\big|_{\nu=\nu_j} = b1^{(i)}_j$ (an estimate of the derivative $f'_{2\nu}(t_i, \nu_j)$), where $b1^{(i)}_j$ is the coefficient of the spline $S1^{(i)}_{N_\nu,\alpha 1^{(i)}}(\nu)$ in representation (23).

*Step 3*. For each $i = 1, \dots, N_t$, we again form the dataset

$$\left\{ \nu_j, \; \widetilde{f2}_j^{(i)} = \hat f'_{2\nu}(t_i, \nu_j), \; j = 1, \dots, N_\nu \right\},$$

select the smoothing parameter $\alpha 2^{(i)}$, and build the SCS $S2^{(i)}_{N_\nu,\alpha 2^{(i)}}(\nu)$, whose first derivative $\hat f''_{2\nu^2}(t_i, \nu_j) = \frac{d}{d\nu} S2^{(i)}_{N_\nu,\alpha 2^{(i)}}(\nu)\big|_{\nu=\nu_j} = b2^{(i)}_j$ is the estimate of the second derivative $f''_{2\nu^2}(t_i, \nu_j)$, where $b2^{(i)}_j$ is the coefficient of the spline $S2^{(i)}_{N_\nu,\alpha 2^{(i)}}(\nu)$ in representation (23).

Thus, we calculate estimates of the second derivative $f''_{2\nu^2}(t_i, \nu_j)$ for $t_i$, $i = 1, \dots, N_t$.

Let us proceed to the construction (following the technique of [46]) of a bicubic smoothing spline for calculating the mixed derivative $f''_{2t\nu}(t_i, \nu_j)$. We use the following algorithm:

*Step 1*. For each $j = 1, \dots, N_\nu$ (fixing the value of $\nu_j$), we again form a dataset

$$\left\{ t_i, \; \widetilde{f3}_i^{(j)} = \tilde f_2^*(t_i, \nu_j), \; i = 1, \dots, N_t \right\},$$

select the smoothing parameter $\alpha 3^{(j)}$, and build the SCS $S3^{(j)}_{N_t,\alpha 3^{(j)}}(t)$, from which we then calculate the first derivative $\hat f'_{2t}(t_i, \nu_j) = \frac{d}{dt} S3^{(j)}_{N_t,\alpha 3^{(j)}}(t)\big|_{t=t_i} = b3^{(j)}_i$ (an estimate of the derivative $f'_{2t}(t_i, \nu_j)$), where $b3^{(j)}_i$ is the coefficient of the spline $S3^{(j)}_{N_t,\alpha 3^{(j)}}(t)$ in representation (23).

*Step 2*. For each $i = 1, \dots, N_t$, we form the dataset

$$\left\{ \nu_j, \; \widetilde{f4}_j^{(i)} = \hat f'_{2t}(t_i, \nu_j), \; j = 1, \dots, N_\nu \right\},$$

select the smoothing parameter $\alpha 4^{(i)}$, and build the SCS $S4^{(i)}_{N_\nu,\alpha 4^{(i)}}(\nu)$, whose first derivative $\hat f''_{2t\nu}(t_i, \nu_j) = \frac{d}{d\nu} S4^{(i)}_{N_\nu,\alpha 4^{(i)}}(\nu)\big|_{\nu=\nu_j} = b4^{(i)}_j$ is the estimate of the mixed derivative $f''_{2t\nu}(t_i, \nu_j)$, where $b4^{(i)}_j$ is the coefficient of the spline $S4^{(i)}_{N_\nu,\alpha 4^{(i)}}(\nu)$ in representation (23).

Thus, we repeat Step 1 for $\nu_j$, $j = 1, \dots, N_\nu$, and Step 2 for $t_i$, $i = 1, \dots, N_t$. After calculating the estimates $\hat f''_{2\nu^2}(t_i, \nu_j)$, $\hat f''_{2t\nu}(t_i, \nu_j)$, we find, using Equation (13), the estimate $\hat K_2(t_i - \nu_j, t_i)$ for the values $\nu_j \le t_i$.
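The two-pass scheme (splines in *t*, then in *ν*) can be sketched with SciPy splines; on noise-free data we use interpolating splines (`s=0`), and the test surface sin(t)cos(ν) is an illustrative assumption, not the paper's kernel:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

t = np.linspace(0.0, 1.0, 41)
nu = np.linspace(0.0, 1.0, 41)
F = np.sin(t)[:, None] * np.cos(nu)[None, :]   # f2(t, nu) = sin(t) cos(nu)

# pass 1: spline each column in t, differentiate -> estimate of d f2 / d t
D1 = np.column_stack([UnivariateSpline(t, F[:, j], k=3, s=0).derivative()(t)
                      for j in range(nu.size)])
# pass 2: spline each row of D1 in nu, differentiate -> mixed derivative
D2 = np.vstack([UnivariateSpline(nu, D1[i, :], k=3, s=0).derivative()(nu)
                for i in range(t.size)])

exact = -np.cos(t)[:, None] * np.sin(nu)[None, :]  # d^2/(dt dnu) sin(t)cos(nu)
err = np.max(np.abs(D2 - exact))
```

With noisy data, the interpolating splines would be replaced by smoothing splines with parameters chosen per row and column, as in the steps above.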

**Remark 2.** *The inversion Equation (13) determines the value of the quadratic kernel K*2(*t*, *ν*) *for the arguments* 0 ≤ *ν* ≤ *t* ≤ *T, i.e., for the values of the argument ν* ≤ *t. The line ν* = *t is the axis of symmetry of the kernel K*2(*t*, *ν*) *(follows from the one-dimensionality of the input signal); therefore, to determine the values of the kernel for ν* = *t* + Δ*ν* > *t, where* Δ*ν* > 0*, we propose a symmetrical supplement of the kernel values according to the formula K*2(*t*, *t* + Δ*ν*) = *K*2(*t* + Δ*ν*, *t*).

**Remark 3.** *Since the construction of an SCS in the variable $\nu$ requires approximately $C_{oper} \cdot N_\nu$ arithmetic operations, where $C_{oper} \approx 30$ [42], the proposed algorithm for calculating the derivatives requires approximately $4 C_{oper} \cdot N_\nu \cdot N_t$ operations. Therefore, the proposed algorithms for calculating the derivatives have a high computational efficiency even for a large dimension of the grid $\{t_i, \nu_j\}$.*

Previously, the values of the smoothing parameters $\alpha 1^{(i)}$, $\alpha 2^{(i)}$, $\alpha 3^{(j)}$, $\alpha 4^{(i)}$ were assumed to be given. Therefore, the question arises of how to choose these parameters, since they significantly affect the error of smoothing and differentiation. If the variance $\sigma_\eta^2$ of the measurement noise (see (22)) were reliably known (at least with an accuracy of 5–8%), then a selection algorithm based on checking the optimality criterion of the linear filtering algorithm would allow estimating, with acceptable accuracy (5–8%), the value of the optimal smoothing parameter that minimizes the root-mean-square smoothing error (see [44] (pp. 60–67), [45]). Obviously, the situation with unknown noise variance is most characteristic of practical identification problems. Therefore, to choose a parameter in this case, we turn to the L-curve method used to choose the regularization parameter in algorithms for solving linear ill-posed problems (for example, [47,48]). In [49], a modification of the L-curve method was proposed for choosing the smoothing parameter.

Let us briefly describe the essence of this selection algorithm. We introduce the following functionals (see [49]):

$$\rho(\alpha) = \sum_{j=1}^{N_\nu} p_j^{-1} \cdot \left(\tilde f_j - S_{N_\nu,\alpha}(\nu_j)\right)^2, \quad \gamma(\alpha) = \int_{\nu_1}^{\nu_{N_\nu}} \left| S''_{N_\nu,\alpha}(\nu) \right|^2 d\nu.$$

Then, an L-curve (whose shape resembles the outline of the Latin letter L) is a parametric curve with coordinates (*ρ*(*α*), *γ*(*α*)). It can be shown that the curvature of an L-curve is given by the following formula:

$$k_L(\alpha) = 2 \cdot \frac{\hat\rho'(\alpha) \cdot \hat\gamma''(\alpha) - \hat\rho''(\alpha) \cdot \hat\gamma'(\alpha)}{\left[\left(\hat\rho'(\alpha)\right)^2 + \left(\hat\gamma'(\alpha)\right)^2\right]^{\frac{3}{2}}},\tag{27}$$

where *ρ*ˆ(*α*) = ln *ρ*(*α*), *γ*ˆ(*α*) = ln *γ*(*α*). The smoothing parameter is the value *α<sup>L</sup>* for which the curvature *kL*(*α*) takes on the maximum value. To effectively calculate the value of the functional *γ*(*α*), the following formula is proposed:

$$\gamma(\alpha) = \sum_{i=1}^{n-1} \left( 4c_i^2 \cdot h_i + 12c_i \cdot d_i \cdot h_i^2 + 12d_i^2 \cdot h_i^3 \right),$$

where $h_i = t_{i+1} - t_i$, $i = 1, \dots, n-1$, and $c_i$, $d_i$ are the SCS coefficients in representation (23), calculated for a given parameter $\alpha$. To calculate the curvature value using Equation (27), an approach is proposed that uses cubic interpolation splines to approximate the dependencies $\hat\rho(\alpha)$, $\hat\gamma(\alpha)$ (for details, see [49]). An extensive computational experiment was also carried out there to answer the following question: Is the loss due to the smoothing error large when $\alpha_L$ is used instead of the optimal $\alpha_{opt}$ (which can be determined only in a computational experiment)? The experiment was carried out with functions that are "typical" output signals of a dynamic system when step signals are applied to the input. The analysis of the results showed that the selection algorithm based on the L-curve method estimates the optimal value of the smoothing parameter quite well. The increase in the smoothing error when using the parameter $\alpha_L$ does not exceed 5–15% on average compared to $\alpha_{opt}$, whose calculation is impossible in practice. Therefore, to calculate the smoothing parameters $\alpha 1^{(i)}$, $\alpha 2^{(i)}$, $\alpha 3^{(j)}$, and $\alpha 4^{(i)}$, it is proposed to use the described L-curve-based selection algorithm.
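The parameter sweep behind the L-curve choice can be sketched as follows; here SciPy's `UnivariateSpline` residual bound `s` is swept as a stand-in for the penalty weight *α*, and the signal, grid, and sweep range are illustrative assumptions:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def trap(y, x):
    # composite trapezoidal rule
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 60)
y = np.exp(-3.0 * x) + 0.02 * rng.standard_normal(x.size)

alphas = np.logspace(-5, 0, 40)                  # candidate smoothing levels
xx = np.linspace(0.0, 1.0, 2000)
rho, gam = [], []
for a in alphas:
    spl = UnivariateSpline(x, y, k=3, s=a)
    rho.append(max(np.sum((y - spl(x)) ** 2), 1e-300))  # data misfit rho(a)
    d2 = spl.derivative(2)(xx)
    gam.append(max(trap(d2 ** 2, xx), 1e-300))          # roughness gamma(a)

lr, lg, lt = np.log(rho), np.log(gam), np.log(alphas)
r1, g1 = np.gradient(lr, lt), np.gradient(lg, lt)
r2, g2 = np.gradient(r1, lt), np.gradient(g1, lt)
kL = 2.0 * (r1 * g2 - r2 * g1) / (r1**2 + g1**2) ** 1.5  # curvature, Eq. (27)
alpha_L = alphas[np.nanargmax(kL)]               # corner of the L-curve
```

The point of maximum curvature of the (log ρ, log γ) curve marks the corner of the "L", balancing data misfit against roughness.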

To test the proposed algorithm for identifying the quadratic kernel, a numerical experiment was carried out, some results of which are presented in this paper. The test quadratic kernel $K_2(\tau, s)$ is a function used to describe the dynamics of some types of heat exchangers [50]. Figure 4a shows the surface of this function, and Figure 4b shows its isolines. The time interval boundary was $T = 1$, while the numbers of nodes were $N_t = 80$, $N_\nu = 80$.

First, we determine the methodological error of the identification algorithm. To do this, we calculated the values of the function (15) at the nodes $t_i$, $i = 1, \dots, N_t$, $\nu_j$, $j = 1, \dots, N_\nu$, which were interpreted as the exact values of the function $f_2(t_i, \nu_j)$. These data, presented as an $80 \times 80$ matrix $F$ with elements $F_{i,j} = f_2(t_i, \nu_j)$, served as the initial data for the proposed identification algorithm. Since these initial data were taken as exact, instead of the SCS we built interpolating cubic splines (including the bicubic spline) with boundary conditions (24). We calculated estimates of the derivatives $\hat f''_{2\nu^2}(t_i, \nu_j)$ and $\hat f''_{2t\nu}(t_i, \nu_j)$ on the basis of these splines and then constructed an estimate of the quadratic kernel using Equation (13) (see Remark 2). Figure 5 shows the isolines of this estimate, having a relative identification error $\delta_K = \frac{\|K_2 - \hat K_2\|}{\|K_2\|} = 0.011$, where $K_2$, $\hat K_2$ are matrices composed of the values of the exact kernel $K_2(t_i, \nu_j)$ and its estimate $\hat K_2(t_i, \nu_j)$, respectively, and $\|\cdot\|$ is the Euclidean norm of a matrix. Approximately the same error was observed for other grid sizes in $t$, $\nu$. Therefore, we can conclude that the proposed identification algorithm has a low methodological error.

**Figure 4.** Test quadratic kernel: (**a**) the surface of *K*2(*τ*,*s*); (**b**) isolines.

**Figure 5.** Estimation of the kernel *K*ˆ2(*τ*,*s*), built on exact data.

Let us consider the influence of the measurement noise in the function $f_2(t, \nu)$ on the accuracy of identification. To do this, we distorted all elements of the "exact" matrix $F$ with normally distributed noise with a relative level $\delta_F = \frac{\|F - \tilde F\|}{\|F\|}$, where $\tilde F$ is the matrix with "noisy" elements. The matrix $\tilde F$ thus formed was used as the initial data for the identification algorithm described above. The smoothing parameter at all steps of calculating the derivatives was chosen using the L-curve method described above. Figure 6 shows the isolines of the estimate $\hat K_2(t_i, \nu_j)$ built at a noise level of 0.02. The relative identification error was $\delta_K = 0.044$, which indicates the acceptable accuracy of quadratic term identification by the proposed algorithm.

**Figure 6.** Estimation of the kernel *K*ˆ2(*τ*,*s*), built on noisy data.

#### **4. Difference Scheme for Finding a Linear Nonstationary Kernel Using the Quadrature Method**

It often happens in practice that the responses of the system (the right-hand sides of the equations) are given not analytically but as a series of numbers. In this case, we have to turn to a numerical solution. The procedure for the numerical identification of the Volterra polynomial (9) using piecewise constant test signals (10) was considered in detail earlier in [36]. This approach to constructing a quadratic polynomial was tested in applications for thermal power objects [51]. As shown in the previous section, using signals of a new type with a rising edge of the form (8) makes it possible to improve the accuracy of modeling, even if they are used to identify only one of the kernels of the polynomial (9). Therefore, in this section, we restrict ourselves to the procedure for the numerical identification of the nonstationary linear term of (9) based on test signals of the form (8).

As shown in Section 2, if we assume that the kernel $K_2(s_1, s_2)$ in the quadratic term of the polynomial (9) has already been identified in one way or another, then the substitution of (8) into (16) leads to (17). We present a difference scheme for finding the linear nonstationary kernel from (17) with a known right-hand side. To do this, we introduce on the interval $[0, T]$ a uniform grid $t_i = ih$, $i = \overline{0, N}$, and a subgrid $t_{i-1/2} = (i - 1/2)h$, $i = \overline{1, N}$, and denote by $K^h_{i,j}$ the grid approximation of the kernel $K_1(t_i, t_j)$. To approximate the integrals in (17), we use the middle rectangle rule, taking into account $\nu \le t$:

$$h\sum_{k=1}^{j} K^h_{i,\,k-1/2} \frac{t_{k-1/2}}{t_j} + h\sum_{k=j+1}^{i} K^h_{i,\,k-1/2} = q\left(t_i, t_j\right), \quad i = \overline{1, N}, \; j = \overline{1, i}. \tag{28}$$

At each step $i = \overline{1, N}$, one has to solve a system of linear algebraic equations of dimension $i \times i$ with respect to $K^h_{i,\,k-1/2}$, $k = \overline{1, i}$.

Consider the application of the difference scheme (28) with the help of a test example. Let the right-hand side of (17) have the form

$$q(t, \nu) = t - \frac{\nu}{2} + \frac{5t^3}{24} - \frac{\nu^3}{48} + \frac{t\nu^2}{8} - \frac{t^2\nu}{4}. \tag{29}$$

This right-hand side corresponds to the kernel $K_1(t, \nu)$ from example (21). Table 1 shows the results of numerical calculations obtained using the difference scheme (28). Here,

$$\varepsilon = \max_{1 \le j \le i \le N} \left| K_1\left(t_i, t_{j-1/2}\right) - K^h_{i,j-1/2} \right|$$

denotes the error of the numerical solution. The last column of the table shows the node at which the maximum error is achieved. The table shows that the proposed algorithm has a linear order of convergence.

**Table 1.** The error of the numerical solution to (17) with the right-hand side (29).

| *h* | *ε* | **Node Number (*i*, *j*)** |
|---|---|---|
| 1/8 | 0.00553385 | (8, 2) |
| 1/16 | 0.00268555 | (16, 4) |
| 1/32 | 0.00132243 | (32, 8) |
| 1/64 | 0.00065613 | (64, 6) |
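The scheme (28) for the test right-hand side (29) can be implemented directly; a minimal sketch, where the exact kernel $K_1(t,\nu) = 1 + \nu^2/4 - 3t\nu/4 + t^2/2$ is our reading of example (21) with $\alpha = 1$:

```python
import numpy as np

def q_rhs(t, nu):
    # right-hand side (29)
    return t - nu/2 + 5*t**3/24 - nu**3/48 + t*nu**2/8 - t**2*nu/4

def K1_exact(t, nu):
    # kernel of model (21) with alpha = 1 (assumed)
    return 1.0 + nu**2/4 - 3*t*nu/4 + t**2/2

def solve_scheme(N, T=1.0):
    """Difference scheme (28): for each i, solve an i x i linear system."""
    h = T / N
    t = h * np.arange(N + 1)                   # nodes t_i = i h
    tm = h * (np.arange(1, N + 1) - 0.5)       # midpoints t_{k-1/2}
    eps = 0.0
    for i in range(1, N + 1):
        A = np.full((i, i), h)                 # entries h for k > j
        for j in range(1, i + 1):
            A[j - 1, :j] = h * tm[:j] / t[j]   # entries h * t_{k-1/2}/t_j, k <= j
        b = q_rhs(t[i], t[1:i + 1])            # q(t_i, t_j), j = 1..i
        Kh = np.linalg.solve(A, b)
        eps = max(eps, float(np.max(np.abs(K1_exact(t[i], tm[:i]) - Kh))))
    return eps

e8, e16, e32 = solve_scheme(8), solve_scheme(16), solve_scheme(32)
```

Halving the step repeatedly should reduce the maximum error roughly in proportion to *h*, consistent with the linear convergence reported in Table 1.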

Thus, the numerical construction of the quadratic Volterra polynomial using the quadrature of the middle rectangles can be implemented by the formula

$$h\sum_{j=1}^{i} K_1^h\left(t_i, t_{j-1/2}\right) x\left(t_j\right) + h^2 \sum_{k=1}^{i} \sum_{l=1}^{i} K_2^h(t_{k-1/2}, t_{l-1/2})\, x\left(t_i - t_{k-1/2}\right) x\left(t_i - t_{l-1/2}\right) = g\left(t_i\right), \quad i = \overline{1, N},$$

where the values of the kernel $K^h_1\left(t_i, t_{j-1/2}\right)$ are obtained using the difference Equation (28).

#### **5. Future Research**

This section is devoted to adapting the identification method for the nonsymmetric kernel $K_1(t, s)$ presented in Section 2 to the reconstruction problem for the symmetric function $K_2(s_1, s_2)$. For this, we introduce the system of integral equations (9), where the functions $x(t)$ and $y(t)$ have the form

$$x(t) \equiv x_\nu^{\alpha_{1,2}}(t) = \begin{cases} 0, & t \le 0, \\ \alpha_{1,2}\frac{t}{\nu}, & 0 < t \le \nu, \\ \alpha_{1,2}, & t > \nu, \end{cases} \tag{30}$$

$$y(t) \equiv y\_{\nu}^{a\_{1,2}}(t) = \begin{cases} 0, & t = 0, \nu = 0, \\ g^{a\_{1,2}}(t, \nu), & 0 < \nu \le t, \end{cases} \tag{31}$$

where $\alpha_1 \neq \alpha_2$, and $g^{\alpha_{1,2}}(t, \nu)$ is a sufficiently smooth function. Assume that in (9) the kernel $K_2(s_1, s_2) = \varphi(s_1)\varphi(s_2)$ is a separable function such that $K_2(s_1, s_2) \in C_\Omega$, where $C_\Omega$ is the space of continuous functions symmetric on the square $\Omega = \{s_1, s_2 : 0 \le s_1, s_2 \le T\}$; then, system (9) can be transformed into the form

$$\int\_0^t K\_1(t,s)x(s)ds + \left(\int\_0^t \varphi(s)x(t-s)ds\right)^2 = y(t),$$

or, taking into account (30) and (31), into the system

$$\alpha_{1,2} \left( \int_0^\nu K_1(t,s) \frac{s}{\nu} ds + \int_\nu^t K_1(t,s) ds \right) + \alpha_{1,2}^2 \left( \int_0^\nu \varphi(t-s) \frac{s}{\nu} ds + \int_\nu^t \varphi(t-s) ds \right)^2 = g^{\alpha_{1,2}}(t,\nu). \tag{32}$$

We introduce the following functions *f*1(*t*, *ν*) , *f*2(*t*, *ν*):

$$f\_1(t, \nu) = \int\_0^\nu K\_1(t, s) \frac{s}{\nu} ds + \int\_\nu^t K\_1(t, s) ds,\tag{33}$$

$$f\_2(t, \nu) = \int\_0^\nu \varphi(t - s) \frac{s}{\nu} ds + \int\_\nu^t \varphi(t - s) ds. \tag{34}$$

The system of linear functional equations of the form (32), presented with the designations (33) and (34),

$$\begin{cases} \alpha\_1 f\_1(t, \nu) + \alpha\_1^2 f\_2^2(t, \nu) = \mathcal{g}^{\alpha\_1}(t, \nu), \\ \alpha\_2 f\_1(t, \nu) + \alpha\_2^2 f\_2^2(t, \nu) = \mathcal{g}^{\alpha\_2}(t, \nu), \end{cases}$$

where $\alpha_1 \neq \alpha_2$, has a unique solution

$$f_1(t, \nu) = \frac{\alpha_2^2 g^{\alpha_1}(t, \nu) - \alpha_1^2 g^{\alpha_2}(t, \nu)}{\alpha_1 \alpha_2^2 - \alpha_1^2 \alpha_2},\tag{35}$$

$$f_2^2(t, \nu) = \frac{\alpha_1 g^{\alpha_2}(t, \nu) - \alpha_2 g^{\alpha_1}(t, \nu)}{\alpha_1 \alpha_2^2 - \alpha_1^2 \alpha_2}. \tag{36}$$

According to [35], the inversion formula for (33) has the form

$$K_1(t, \nu) = -2\frac{\partial f_1(t, \nu)}{\partial \nu} - \nu \frac{\partial^2 f_1(t, \nu)}{\partial \nu^2},$$

or, introducing the differentiation operator $D_2 = 2\frac{\partial}{\partial \nu} + \nu\frac{\partial^2}{\partial \nu^2}$,

$$K\_1(t, \nu) = -D\_2(f\_1(t, \nu)).$$

Similarly, for (34) we have

$$
\varphi(t-\nu) = -D\_2(f\_2(t,\nu)).
$$

Here, the functions *f*1(*t*, *ν*) and *f*2(*t*, *ν*) are determined by (35) and (36), respectively.

#### **6. Conclusions**

This paper generalized the experience of using piecewise-specified test signals to identify nonlinear dynamic systems of the input–output type, represented as quadratic Volterra polynomials, taking into account the nonstationary properties of the object. The development of this direction is associated with the introduction of test signals with a rising edge, which are characteristic of input actions that occur in practice. The type of test signals introduced in this paper can be used to identify the Volterra kernels included in the quadratic Volterra polynomial.

The new approach to constructing a quadratic Volterra polynomial in the time domain is based on the use of physically realizable test signals, which is very promising for applications. The Volterra integral equations of the first kind, to which the problem of identifying the Volterra kernels is reduced, have explicit inversion formulas, which ensures the construction of high-speed computational procedures. These formulas include mixed partial derivatives. A new method is proposed for choosing the smoothing parameter of a cubic spline for a stable numerical calculation of the derivatives included in the constructed inversion formula. This choice of parameter provides effective filtering of measurement noise. The results of the computational experiment showed that the relative identification error is comparable to the relative error of the initial data; at a noise level of 2% in the initial data, the identification error of the Volterra kernel was 4.4%.

**Author Contributions:** Conceptualization, Y.V. and S.S.; methodology, Y.V. and S.S.; software, E.M., E.A. and V.B.; validation, Y.V., S.S., E.M., E.A. and V.B.; formal analysis, Y.V., S.S., E.M., E.A. and V.B.; investigation, Y.V. and S.S.; resources, Y.V., S.S., E.M., E.A. and V.B.; data curation, E.M. and V.B.; writing—original draft preparation, Y.V., S.S., E.M., E.A. and V.B.; writing—review and editing, Y.V., S.S., E.M., E.A. and V.B.; visualization, Y.V., S.S., E.M., E.A. and V.B.; supervision, Y.V., S.S., E.M., E.A. and V.B.; project administration, Y.V. and S.S.; funding acquisition, Y.V. and S.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Russian Science Foundation, grant number 22-21-00409.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Choice of Regularization Methods in Experiment Processing: Solving Inverse Problems of Thermal Conductivity**

**Alexander Sokolov 1,\* and Irina Nikulina <sup>2</sup>**


**Abstract:** This work is aimed at numerical studies of inverse problems of experiment processing (identification of unknown parameters of mathematical models from experimental data) based on the balanced identification technology. Such problems are inverse in their nature and often turn out to be ill-posed. To solve them, various regularization methods are used, which differ in their regularizing additions and in the methods for choosing the values of the regularization parameters. The balanced identification technology uses the cross-validation root-mean-square error to select the values of the regularization parameters. Its minimization leads to an optimally balanced solution, and the obtained value is used as a quantitative criterion for the correspondence of the model and the regularization method to the data. The approach is illustrated by the problem of identifying the dependence of the heat-conduction coefficient on temperature. A mixed one-dimensional nonlinear heat conduction problem was chosen as a model; the one-dimensional problem was chosen for the convenience of the graphical presentation of the results. The experimental data are synthetic data obtained on the basis of a known exact solution with added random errors. In total, nine problems (some of them original) were considered, differing in data sets and in the criteria for choosing solutions. This is the first time such a comprehensive study with error analysis has been carried out. Various estimates of the modeling errors are given and show good agreement with the characteristics of the synthetic data errors. The effectiveness of the technology is confirmed by comparing numerical solutions with exact ones.

**Keywords:** modeling; regularization; inverse problems; balanced identification; error analysis; one-dimensional heat equation

**MSC:** 93B30

#### **1. Introduction**

The experiment preparation and processing of the results involve an extensive use of mathematical models of the objects under study. To save costs, they must be carefully planned: one should determine what, when, where and with what accuracy is to be measured to estimate the sought parameters with the given accuracy. These questions can be answered by "rehearsing" the experiment and its processing on a mathematical model simulating the behavior of the object.

Usually, the purpose of an experiment is to evaluate some of the object's parameters. In the case of an indirect experiment, some parameters are measured, while others are to be evaluated. The relationship between the parameters can be described by complex mathematical models. The formalization of this approach leads to identification problems that are by their nature inverse. Those problems often turn out to be ill-posed, and specific approaches using regularization methods are required for the solution [1]. One of the problems with regularization methods is the choice of regularization weights (penalties): weights that are too large lead to unreasonable simplification (and distortion) of the model,

**Citation:** Sokolov, A.; Nikulina, I. Choice of Regularization Methods in Experiment Processing: Solving Inverse Problems of Thermal Conductivity. *Mathematics* **2022**, *10*, 4221. https://doi.org/10.3390/ math10224221

Academic Editor: Dimplekumar N. Chalishajar

Received: 15 September 2022 Accepted: 8 November 2022 Published: 11 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

and those that are too small lead to overtraining, an excessive fitting of the model's trajectory to the experimental data. In the balanced identification method [2], the regularization weights are chosen by minimizing the cross-validation error. This makes it possible to find a balanced solution that implements the optimal (in the sense of minimizing the cross-validation error) compromise between the proximity of the model to the data and the simplicity of the model [3], formalized in a regularizing addition.

Usually, for each specific identification problem (see examples of modeling pollutants moving in the river corridor [4], parameter identification in nonlinear mechanical systems [5], identification of conductivity coefficient in heat equation [6–8]), a separate special study is carried out, including goal setting, mathematical formalization of the problem, its study, creating a numerical model, preparing a computer program, solving a numerical problem and studying the results, including error estimation, etc.

However, such problems have much in common: the mathematical model description, assignment of operators linking measurements with model variables, formalization of the solution selection criterion, program preparation, error estimation, etc. Additionally, the abundance of similar tasks invariably necessitates a technology that summarizes the accumulated experience.

Balanced Identification Technology or SvF (Simplicity versus Fitting) technology is a step in this direction.

Here is the general "human–computer" scheme of the SvF technology, which implements the balanced identification method (a more detailed description of the technical issues of the technology implementation and the corresponding flowchart can be found in [2]). At the user level, an expert (with knowledge about the object under study) prepares data files and a task file. The data files contain tables with experimental data (as plain text or in MS Excel or MS Access formats). The task file usually contains the data file names, a mathematical description of the object (a formalization of the model in a notation close to mathematical, see Appendix A), including a list of unknown parameters, as well as specifications of the cross-validation (CV) procedure. These files are transferred to the client program, which replaces the variational problems with discrete ones, creates various (training and testing) sets for the CV procedure, formulates a number of NLP (nonlinear mathematical programming) problems and writes (formalizes) them in the language of the Pyomo package [9]. The constructed data structures are transferred to a two-level optimization routine that implements an iterative numerical search for the unknown model parameters and regularization coefficients to minimize the cross-validation error. This subroutine can use the parallel solution of mathematical programming problems in a distributed environment of Everest optimization services [10], namely SSOP applications [11]. The Pyomo package converts the NLP description into so-called NL files, which are processed at the server level by dedicated Ipopt solvers [12]. The solutions are then collected, sent back to the client level and subsequently analyzed (for example, the conditions for completing the iterative process are checked).
If the iterative process is completed, the program prepares the results (calculates errors, creates solution files, draws graphs of the functions found) and presents them to the researcher (who may not know about the long chain of the tasks preceding the result).

The experts then utilize the results (especially the values of the modeling errors, the root-mean-square cross-validation errors) when choosing a new (or modified) model or deciding to cease calculations.

The software package together with examples (including some examples of this article) is freely available online (file SvF-2021-11.zip in the Git repository https://github.com/distcomp/SvF, accessed on 1 September 2022).

SvF technology has been successfully applied in various scientific fields (mechanics, plasma physics, biology, plant physiology, epidemiology, meteorology, atmospheric pollution transfer, etc., and a more detailed enumeration can be found in [2]) as an inverse problem solving method. In these studies, the main attention was paid to the construction of object models using specific regularization methods. This article, in contrast, focuses on

the study of the regularization methods themselves, and the problem of heat conduction is chosen as a convenient example.

The problem of thermal conductivity is chosen to illustrate the technology. It is a classic problem of mathematical physics: well studied, and its one-dimensionality allows the results to be presented as graphs. Literature reviews can be found in [7,8]. The main task is to find the dependence of the thermal conductivity coefficient on temperature from an array of experimental data. In total, nine problems were considered, differing in data sets and criteria for choosing solutions. Some of them are original. This is the first time such a comprehensive study with error analysis has been carried out. Various estimates of the modeling errors are given and turn out to be in good agreement with the characteristics of the synthetic data errors.

#### **2. Mixed One-Dimensional Thermal Conductivity Problem**

Let us denote *M* = 0 a set of mathematical statements defining the investigated model of thermal conductivity:

$$M = 0: \begin{cases} x \in [0, 2],\ t \in [0, 5] \\ \frac{\partial T}{\partial t} = \frac{\partial}{\partial x}\left( K(T)\frac{\partial T}{\partial x} \right) \\ T(x, 0) = \varphi(x) \\ T(0, t) = l(t) \\ T(2, t) = r(t) \end{cases} \tag{1}$$

where *x* and *t* are the spatial and temporal coordinates, *T*(*x,t*) is the temperature, *K*(*T*) is the (temperature-dependent) thermal conductivity coefficient, *ϕ*(*x*) is the initial condition, and *l*(*t*) and *r*(*t*) are the left and right boundary conditions.

In what follows, all functions in various (non-difference) statements are considered twice continuously differentiable.

Remark. The formulas in (1) actually coincide with the records (descriptions of the model) in the text of the task file (a set of instructions for obtaining a numerical solution) given in Appendix A.

When conducting numerical experiments, the exact solution of the mathematical model (1)

$$\begin{aligned} Ts(x,t) &= \frac{200(t+1)}{(x+1)^2 + (t+1)^2} \\ Ks(T) &= \frac{100}{T} \\ \varphi s(x) &= \frac{200}{(x+1)^2 + 1} \\ ls(t) &= \frac{200(t+1)}{1 + (t+1)^2} \\ rs(t) &= \frac{200(t+1)}{9 + (t+1)^2} \end{aligned} \tag{2}$$

is used for the generation of pseudo-experimental data sets (observations) and for comparison with the numerical solution (calculation of errors).

In the notation of the functions of the exact solution, '*s*' is used (short for solution). The functions of the exact solution are shown in Figure 1.

**Figure 1.** Functions of the exact solution: (**T**) contour lines of *Ts*(*x,t*); (**T6**) 6 time slices of *Ts*(*x,t*): *Ts*(*x*,0), *Ts*(*x*,1), ..., *Ts*(*x*,5); (**K**) thermal conductivity *Ks(T)*; (ϕ) initial condition *ϕs*(*x*); (**l&r**) left *ls*(*t*) and right *rs*(*t*) boundary conditions.
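Since the exact solution (2) serves as the ground truth throughout the paper, it is worth checking numerically that it does satisfy Equation (1). The following sketch (plain Python; the step size and sample points are illustrative choices, not from the article) evaluates the residual of the heat equation with central differences:

```python
# Sketch: check that the exact solution (2) satisfies the heat equation (1).
# Ts and Ks follow the article's notation; h is an illustrative step size.

def Ts(x, t):
    return 200.0 * (t + 1) / ((x + 1) ** 2 + (t + 1) ** 2)

def Ks(T):
    return 100.0 / T

def residual(x, t, h=1e-4):
    """Central-difference residual of dT/dt - d/dx(K(T) * dT/dx)."""
    dT_dt = (Ts(x, t + h) - Ts(x, t - h)) / (2 * h)

    def flux(xx):  # K(T) * dT/dx evaluated at position xx
        dT_dx = (Ts(xx + h, t) - Ts(xx - h, t)) / (2 * h)
        return Ks(Ts(xx, t)) * dT_dx

    d_flux_dx = (flux(x + h) - flux(x - h)) / (2 * h)
    return dT_dt - d_flux_dx

# The residual is at the level of the finite-difference truncation error.
print(abs(residual(1.0, 2.5)))
```

The residual vanishes up to the truncation error of the differencing, confirming that (2) is consistent with (1).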

#### **3. Data Sets**

Formalizing the concept of a data set (observations or measurements set):

$$D: \quad \{x_i, t_i, T_i\},\ i \in I,\ I = 0..i_{\max},$$

where *Ti* is the temperature measurement at point *xi* at time *ti*.

For vectors of dimension *|D|*, introduce the notation

$$\|a_i\|_D = \|a\|_D = \sqrt{\frac{1}{|D|}\sum_{i \in I} a_i^2}.$$

Below, for the numerical experiments, pseudo-experimental data are used, prepared on the basis of the exact solution (2) using pseudo-random number generators. The four prepared data sets described below were chosen as the most illustrative.

A basic data set was generated on a regular 11 × 11 grid (11 points in space 0, 0.2, 0.4, ..., 2 and 11 points in time 0, 0.5, 1, ..., 5):

$$D\_reg11x11: \quad \{x_i = n \cdot 0.2,\ t_i = j \cdot 0.5,\ T_i = Ts(x_i, t_i) + \varepsilon_i\},\quad i = 11j + n,\ n = 0..10,\ j = 0..10,$$

where $Ts(x_i, t_i)$ are the values of the exact solution and $\varepsilon_i$ is a random error with standard deviation

$$\sigma_d = \|\varepsilon\|_D.$$

To generate *εi*, a normal distribution random number generator (gauss(0, 2)) with zero mean and standard deviation equal to 2 (degrees) was used. As a result, a distribution of *εi* was obtained with average *md* = −0.10 (degrees) and standard deviation *σd* = 2.06 (degrees). These error characteristics are not used in the calculations but are taken into account when considering the results.
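The generation of *D\_reg11x11* can be sketched in a few lines of plain Python. The seed is an arbitrary choice for reproducibility, so the empirical mean and standard deviation will differ slightly from the *md* = −0.10 and *σd* = 2.06 reported above:

```python
import random

# Sketch of the synthetic data generation described above; the seed is an
# arbitrary choice for reproducibility, not taken from the article.
random.seed(0)

def Ts(x, t):
    return 200.0 * (t + 1) / ((x + 1) ** 2 + (t + 1) ** 2)

# Regular 11 x 11 grid: x = 0, 0.2, ..., 2 and t = 0, 0.5, ..., 5,
# with N(0, 2) measurement noise added to the exact solution.
data = []
for j in range(11):
    for n in range(11):
        x, t = n * 0.2, j * 0.5
        data.append((x, t, Ts(x, t) + random.gauss(0.0, 2.0)))

errors = [T - Ts(x, t) for (x, t, T) in data]
md = sum(errors) / len(errors)                          # empirical mean
sd = (sum(e * e for e in errors) / len(errors)) ** 0.5  # empirical sigma_d
print(len(data), round(md, 2), round(sd, 2))
```

The other data sets are produced the same way, with `random.uniform` supplying the measurement locations.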

By analogy, we introduce a data set of exact measurements:

$$D\_reg11x11(\varepsilon = 0)$$

with zero errors *ε<sup>i</sup>* = 0.

Let us define a data set containing 121 points randomly distributed on the *x,t* plane:

$$D\_rnd121: \quad \{x_i = uniform(0, 2),\ t_i = uniform(0, 5),\ T_i = Ts(x_i, t_i) + \varepsilon_i\},\ i = 0..120.$$

Here, *uniform*(*a*, *b*) is a generator of random numbers uniformly distributed over the interval (*a*, *b*). The obtained characteristics of the normal distribution of the temperature measurement errors are *md* = −0.19 (degrees) and *σd* = 2.14 (degrees).

Finally, let us define a data set containing 1000 points, distributed in a random way:

$$D\_rnd1000: \quad \{x_i = uniform(0, 2),\ t_i = uniform(0, 5),\ T_i = Ts(x_i, t_i) + \varepsilon_i\},\ i = 0..999,$$

with the characteristics of the normal distribution of the temperature measurement errors *md* = −0.02 (degrees) and *σd* = 2.01 (degrees).

The location of the measurement points of the *D\_reg11x11, D\_rnd121* and *D\_rnd1000* sets on the *x*, *t* plane can be seen in Figure 2.

**Figure 2.** Solutions with different weights of regularization (penalties): (**A**) too big a penalty (undertrained solution); (**B**) optimally balanced SvF solution; (**C**) too small a penalty (overtrained solution).

The data set files can be found in file SvF-2021-11.zip in the Git repository https://github.com/distcomp/SvF (accessed on 1 September 2022).

#### **4. Method of Balanced Identification**

The general problem is finding a function *T*(*x,t*) (and other functions of model (1)) that approximates the data set *D* and, possibly, satisfies additional conditions (for example, the heat equation). To formalize it, we define an objective function (or selection criterion), which is a weighted sum of two terms: one formalizing the concept of the proximity of the model trajectory to the corresponding observations, the other formalizing the concept of the complexity of the model, expressed in this case through the measure of curvature included in the statement of functions.

Let us introduce a measure of the proximity of the trajectory of the model to measurements (data set *D*) or the approximation error:

$$MSD(D, T) = \frac{1}{|D|}\sum_{i \in I}\left(T_i - T(x_i, t_i)\right)^2 = \|T_i - T(x_i, t_i)\|_D^2,$$

where |*D*| is the number of elements of the set *D*,

and a measure of curvature (complexity) of functions of one variable

$$Curv(f(x), \alpha) = \alpha\int_a^b \left(f''(x)\right)^2 dx,$$

where [*a, b*] is the domain of the function *f(x)*, and two variables

$$Curv(f(x,y), \alpha_x, \alpha_y) = \int_{x_{\min}}^{x_{\max}}\int_{y_{\min}}^{y_{\max}}\left(\alpha_x^2\left(f''_{xx}\right)^2 + 2\alpha_x\alpha_y\left(f''_{xy}\right)^2 + \alpha_y^2\left(f''_{yy}\right)^2\right)dx\,dy.$$

The objective function is a combination of the measures introduced above. Let us give, as an example, the objective function

$$Obj(T, D, \alpha_x, \alpha_t) = MSD(D, T) + Curv(T(x, t), \alpha_x, \alpha_t).$$

The second term is the regularizing addition that makes the problem (of the search for a continuous function) correct. The choice of its value determines the quality of the solution. Figure 2 shows two unsuccessful options (A—weights that are too large, C—too small) and one successful (B—optimal weights chosen to minimize the cross-validation error).
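To illustrate the effect shown in Figure 2, the following self-contained sketch (plain Python, not the article's Pyomo-based solver) fits noisy samples of *ϕs* from (2) by minimizing a discrete analog of *MSD + Curv* with a second-difference penalty, for three regularization weights. As the weight grows, the fitted curve is "straightened" and its deviation from the data grows:

```python
import math, random

# Illustrative sketch: minimize MSD + alpha * sum of squared second
# differences for three alpha values, reproducing the effect of Figure 2.
random.seed(1)
n = 41
xs = [i / (n - 1) * 2.0 for i in range(n)]
truth = [200.0 / ((x + 1) ** 2 + 1) for x in xs]   # phi_s from (2)
ys = [v + random.gauss(0.0, 2.0) for v in truth]

def solve(A, b):
    """Gaussian elimination with partial pivoting (dense, small systems)."""
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x

def smooth(alpha):
    # Normal equations of (1/n)||f - y||^2 + alpha * ||P f||^2,
    # where P is the second-difference operator: (I + n*alpha*P^T P) f = y.
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = 1.0
    for i in range(1, n - 1):
        row = {i - 1: 1.0, i: -2.0, i + 1: 1.0}
        for a, ca in row.items():
            for b_, cb in row.items():
                A[a][b_] += n * alpha * ca * cb
    return solve(A, ys[:])

def rmsd(f):
    return math.sqrt(sum((fi - yi) ** 2 for fi, yi in zip(f, ys)) / n)

for alpha in (1e3, 1e-1, 1e-7):   # too big / moderate / too small penalty
    print(alpha, round(rmsd(smooth(alpha)), 2))
```

A very large weight drives the solution toward a straight line (large *rmsd*, as in Figure 2A), while a vanishing weight nearly interpolates the noise (near-zero *rmsd*, as in Figure 2C).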

Hereinafter, the following designations are used:

*rmsd* = $\|T_i - T(x_i, t_i)\|_D$ – the standard deviation of the solution from the measurements; *rmsd\** – the standard deviation of the balanced solution from the measurements; *Err*(*x,t*) = *T*(*x,t*) − *Ts*(*x,t*) – the deviation of the solution from the exact solution; Δ = $\|Err(x_i, t_i)\|_D$ – the standard deviation of the SvF solution from the exact solution; Δ\* – an estimate of Δ;

$\sigma_{cv} = \|T_i - T_\alpha^i(x_i, t_i)\|_D$ – the cross-validation (root-mean-square) error, where $T_\alpha^i(x_i, t_i)$ is the solution obtained by minimizing the objective functional for a given *α* on the set *D* without the point (*xi*,*ti*). A more detailed (and more general) description of the cross-validation procedure can be found in [2].

An optimally balanced SvF solution is obtained by minimizing the cross-validation error by regularization coefficients (*α*):

$$\sigma_{cv}^* = \min_{\alpha}\left\|T_i - T_\alpha^i(x_i, t_i)\right\|_D.$$

As a justification for using the minimization of *σcv* to choose a model (regularization weights), we present the following reasoning (here (·*i*) stands for (*xi,ti*)):

$$\sigma_{cv}^2 = \frac{1}{|D|}\sum_{i \in I}\left(T_i - T_\alpha^i(\cdot_i)\right)^2 = \frac{1}{|D|}\sum_{i \in I}\left(T_i - Ts(\cdot_i) - \left(T_\alpha^i(\cdot_i) - Ts(\cdot_i)\right)\right)^2$$

$$\sigma_{cv}^2 = \frac{1}{|D|}\sum_{i \in I}\varepsilon_i^2 - \frac{2}{|D|}\sum_{i \in I}\varepsilon_i\left(T_\alpha^i(\cdot_i) - Ts(\cdot_i)\right) + \frac{1}{|D|}\sum_{i \in I}\left(T_\alpha^i(\cdot_i) - Ts(\cdot_i)\right)^2$$

The second term represents the sum of the products of the random variables *ε<sup>i</sup>* and the expressions in parentheses, in which the value of *ε<sup>i</sup>* is excluded from the calculation (point *i* was removed from the data set). It is expected to tend to zero as the number of observations increases. Similarly, as the number of observations increases (everywhere dense in the (*x,t*) domain), the third term tends to Δ², since $T_\alpha^i(\cdot_i) \to T_\alpha(\cdot_i)$. As a result, we obtain the estimate

$$\sigma_{cv}^2 \approx \sigma_d^2 + \Delta^2.$$

Thus, minimizing the cross-validation error leads (as the number of observations goes to infinity) to minimizing the deviation of the found solution from the (unknown) exact solution. To assess this deviation, we introduce the notation:

$$\Delta^* = \sqrt{\sigma_{cv}^{*2} - \sigma_d^2}. \tag{3}$$
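The selection of the regularization strength by leave-one-out cross-validation, together with the error estimate (3), can be demonstrated with a deliberately simplified stand-in for the SvF machinery: here a Gaussian kernel smoother replaces the article's spline formulation, and its bandwidth plays the role of the regularization weight *α*. The data set, candidate bandwidths and seed are all illustrative assumptions:

```python
import math, random

# Minimal sketch of the balanced-identification idea: choose the smoothing
# strength (bandwidth, standing in for alpha) by minimizing the leave-one-out
# cross-validation error, then estimate Delta via formula (3).
random.seed(2)
n = 121
xs = [random.uniform(0.0, 2.0) for _ in range(n)]
truth = [200.0 / ((x + 1) ** 2 + 1) for x in xs]   # phi_s from (2)
sigma_d = 2.0
ys = [v + random.gauss(0.0, sigma_d) for v in truth]

def predict(x0, exclude=None, bw=0.1):
    """Kernel-weighted average of the observations (LOO if exclude is set)."""
    num = den = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        if j == exclude:
            continue
        w = math.exp(-0.5 * ((x0 - xj) / bw) ** 2)
        num += w * yj
        den += w
    return num / den

def sigma_cv(bw):
    return math.sqrt(sum((ys[i] - predict(xs[i], exclude=i, bw=bw)) ** 2
                         for i in range(n)) / n)

best_bw = min((0.02, 0.05, 0.1, 0.2, 0.4, 0.8), key=sigma_cv)
scv = sigma_cv(best_bw)
delta = math.sqrt(sum((predict(x, bw=best_bw) - t) ** 2
                      for x, t in zip(xs, truth)) / n)
delta_star = math.sqrt(max(scv ** 2 - sigma_d ** 2, 0.0))  # formula (3)
print(best_bw, round(scv, 2), round(delta, 2), round(delta_star, 2))
```

In this toy setting the relation $\sigma_{cv}^2 \approx \sigma_d^2 + \Delta^2$ is visible directly: the estimate Δ\* computed from (3) tracks the true deviation Δ, which here can be measured because the exact solution is known.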

Remark. The price paid for regularizing the problem is, as a rule, a distortion of the solution. Moreover, the greater the regularization weight, the greater the distortion. In the case under consideration, the distortion consists in "straightening" the solution. The extreme case of such "straightening" is shown in Figure 2A.

#### **5. Various Identification Problems and Their Numerical Solution**

Nine different identification tasks are discussed below. They differ in choices of data sets, minimization criteria (various regularizing additives) and additional conditions. For example, in Problem 5.1 *MSD*(*D\_reg11x11*) *+ Curv*(*T*):*M* = 0, the minimization criterion is used:

$$(T, K, \varphi, l, r) = \underset{T, K, \varphi, l, r}{\operatorname{argmin}}\{MSD(D\_reg11x11, T) + Curv(T, \alpha_x, \alpha_t) : M = 0\},$$

which means: for given regularization weights *αx*, *αt* and a given data set *D\_reg11x11*, find the set of functions (*T*, *K*, *ϕ*, *l*, *r*) that minimizes the functional *MSD(D\_reg11x11,T) + Curv(T,αx,αt)*, with the sought functions satisfying the equations of the model *M* = 0. This criterion is used in minimizing the cross-validation error, which makes it possible to find the regularization weights and the corresponding balanced SvF solution (*T*, *K*, *ϕ*, *l*, *r*).

To reduce the size of the formulas, a more compact notation for the selection criterion is used:

$$MSD(D\_reg11x11, T) + Curv(T, \alpha_x, \alpha_y) \to \min : (M = 0).$$

The same notation will be used for the other problems.

The mathematical study of the variational problems is not the subject of this article. Note that even the original inverse problems of this type can have a non-unique solution; in particular, different heat conductivity coefficients can lead to the same solution *T*(*x,t*) [7,8]. Only Problem 5.0 (a spline approximation problem) is known to have a unique solution under rather simple conditions [13].

To find approximate solutions, we will use numerical models, which are obtained from analytical ones by replacing arbitrary mathematical functions with functions specified on the grid or polynomials (only for *K*(*T*)), derivatives with their difference analogs, integrals with the sums. Note that the grid used for the numerical model (41 points in *x* with a step equal to 0.05 and 21 points in *t* with a step equal to 0.25) is not tied to the measurement points in any way. For simplicity (and stability of calculations), an implicit four-point scheme was chosen [14]. The choice of scheme requires a separate study and is not carried out here. However, apparently, the optimization algorithm used for solving the problem as a whole (residual minimization) makes it possible to avoid a number of problems associated with the stability of calculations.
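An implicit scheme of the kind mentioned above can be sketched as follows. This is a hedged illustration, not the article's exact four-point scheme: one backward-Euler step with the conductivity *K* lagged at the previous time level, so that each step reduces to a linear tridiagonal system solved by the Thomas algorithm. Boundary and initial values are taken from the exact solution (2); the 41 × 21 grid matches the one described in the text:

```python
# Illustrative implicit (backward-Euler) march for the nonlinear heat
# equation (1), with K lagged at the previous time level so each step is a
# linear tridiagonal solve. Details are a sketch, not the article's scheme.

def Ts(x, t):
    return 200.0 * (t + 1) / ((x + 1) ** 2 + (t + 1) ** 2)

def K(T):
    return 100.0 / T            # exact coefficient Ks from (2)

nx, nt = 41, 21                 # grid from the article: 41 x, 21 t points
h, tau = 2.0 / (nx - 1), 5.0 / (nt - 1)
xs = [i * h for i in range(nx)]
T = [Ts(x, 0.0) for x in xs]    # initial condition phi_s

def thomas(a, b, c, d):
    """Solve a tridiagonal system (a: sub-, b: main, c: super-diagonal)."""
    m = len(d)
    cp, dp = [0.0] * m, [0.0] * m
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, m):
        den = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / den
        dp[i] = (d[i] - a[i] * dp[i - 1]) / den
    x = [0.0] * m
    x[-1] = dp[-1]
    for i in range(m - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

for step in range(1, nt):
    t_new = step * tau
    r = tau / h ** 2
    a = [0.0] * nx; b = [1.0] * nx; c = [0.0] * nx; d = T[:]
    d[0], d[-1] = Ts(0.0, t_new), Ts(2.0, t_new)   # Dirichlet: l_s, r_s
    for i in range(1, nx - 1):
        kp = 0.5 * (K(T[i]) + K(T[i + 1]))  # K at i+1/2, lagged in time
        km = 0.5 * (K(T[i]) + K(T[i - 1]))
        a[i], b[i], c[i] = -r * km, 1.0 + r * (km + kp), -r * kp
    T = thomas(a, b, c, d)

err = max(abs(Ti - Ts(x, 5.0)) for Ti, x in zip(T, xs))
print(round(err, 3))  # deviation from the exact solution at t = 5
```

The final-time error of this simplified scheme is dominated by the first-order time discretization and the lagged coefficient; being implicit, the step is unconditionally stable, consistent with the remark above that the scheme was chosen for stability.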

For the graphs of the exact solution, blue lines will be used, and for the SvF solution, red.

*5.0. Problem MSD(D\_reg11x11) + Curv(T)*

Generally speaking, this simplest problem has nothing to do with the heat equation (therefore, its number is 0). It consists of finding a compromise between the proximity of the surface *T*(*x*,*t*) to observations and its complexity (expressed in terms of the curvature *T*(*x*,*t*)) based on the minimization functional:

$$MSD(D\_reg11x11, T) + Curv(T, \alpha_x, \alpha_y) \to \min \tag{4}$$

The results of the numerical solution of the identification problem are shown in Figure 3. The estimates obtained (resulting errors)

$$\sigma_{cv}^* = 2.38,\quad rmsd^* = 1.44,\quad \Delta^* = 1.19$$

are benchmarks for assessing the errors of further problems.

**Figure 3.** SvF solution of Problem 5.0: (**T**) contour lines of *T*(*x,t*); (**T6**) 6 slices of *T*(*x,t*); (**Err**) *Err*(*x,t*) *= T*(*x,t*) − *Ts*(*x,t*) – deviation of the SvF solution from the exact solution; (ϕ) is the initial condition; (**l&r**) left and right boundary conditions.

*5.1. Problem MSD(D\_reg11x11) + Curv(T):M=0*

Now, the identification problem is related to the heat conduction equation. It consists of minimizing the cross-validation error, provided that the solution sought satisfies the thermal conductivity equation (*M=0*), based on the criterion:

*MSD(D\_reg11x11,T) + Curv(T,αx,αy)* → *min:(M = 0)*

The results are shown in Figure 4.


**Figure 4.** SvF solution of Problem 5.1: (**T**) contour lines of *T(x,t)*; (**T6**) 6 slices of *T(x,t)*; (**Err**) *Err(x,t) = T(x,t) − Ts(x,t)*; (ϕ) the initial condition; (**l&r**) boundary conditions; (**K**) the thermal conductivity coefficient *K(T)*.

Errors: $\sigma_{cv}^*$ = 2.24, *rmsd\** = 1.58, Δ\* = 0.86.

*5.2. Problem MSD(D\_reg11x11) + Curv(T): M = 0, l = ls, r = rs*

Two additional conditions *l = ls, r = rs* mean that the SvF solution must coincide with the exact one on the boundaries:

*MSD(D\_reg11x11,T) + Curv(T,αx,αy)* → *min:(M = 0, l = ls, r = rs)*

Here and below, the figures show not the entire set of functions, but only the essential ones (the rest do not change much). The results are shown in Figure 5.

**Figure 5.** SvF solution of Problem 5.2: (**T**) contour lines of *T(x,t)*; (ϕ) the initial condition; (**K**) the thermal conductivity coefficient *K(T)*.

Errors: $\sigma_{cv}^*$ = 2.15, *rmsd\** = 1.86, Δ\* = 0.61.

*5.3. Problem MSD(D\_reg11x11) + Curv(T): M = 0, l = ls, r = rs, ϕ = ϕs*

Suppose that the initial condition is also known:

$$MSD(D\_reg11x11, T) + Curv(T, \alpha_x, \alpha_y) \to \min : (M = 0,\ l = ls,\ r = rs,\ \varphi = \varphi s).$$

Some results are shown in Figure 6.

Errors: $\sigma_{cv}^*$ = 2.06, *rmsd\** = 2.01, Δ\* = 0.49.

*5.4. Problem MSD(D\_reg11x11) + Curv(ϕ) + Curv(l) + Curv(r) + Curv(K):M = 0*

The problem differs from Problem 5.1 in that penalties on the four functions *ϕ*, *l*, *r* and *K* that determine the solution replace the penalty on the curvature of the solution *T(x,t)*:

*MSD(D\_reg11x11,T) + Curv(ϕ,α1) + Curv(l,α2) + Curv(r,α3) + Curv(K,α4)* → *min:(M = 0).*

The formulation seems to be more consistent with the physics of the phenomenon: regularization occurs at the level of the functions that determine the solution, and not at the solution itself.

**Figure 6.** SvF solution of Problem 5.3: (**Err**) *Err(x,t) = T(x,t) − Ts(x,t)*; (**K**) the thermal conductivity coefficient *K(T)*.

Errors: $\sigma_{cv}^*$ = 2.22, *rmsd\** = 1.82, Δ\* = 0.83.

Attention should be paid to the incorrect behavior of the thermal conductivity coefficient near the right border of the graph in Figure 7K.

**Figure 7.** SvF solution of Problem 5.4: (ϕ) the initial condition; (**l&r**) boundary conditions; (**K**) the thermal conductivity coefficient.

*5.5. Problem MSD(D\_reg11x11) + Curv(ϕ) + Curv(l) + Curv(r) + Curv(K): M = 0, dK/dT <= 0*

Let it be additionally known that the thermal conductivity does not increase with increasing temperature (*dK/dT* <= 0):

*MSD(D\_reg11x11,T) + Curv(ϕ,α1) + Curv(l,α2) + Curv(r,α3) + Curv(K,α4)* → *min:(M =0, dK/dT <= 0)*

This is an attempt to correct the solution by adding to the formulation of the minimization problem an additional condition formalizing a priori knowledge of the behavior of the coefficient *K(T)* (see Figures 7K and 8K).

**Figure 8.** SvF solution of Problem 5.5: (**Err**) *Err(x,t) = T(x,t)-Ts(x,t)*; (**K**) the thermal conductivity coefficient.

Errors: $\sigma_{cv}^*$ = 2.23, *rmsd\** = 1.80, Δ\* = 0.85.

*5.6. Problem MSD(D\_rnd121) + Curv(T): M = 0, l = ls, r = rs, ϕ = ϕs*

The problem is similar to Problem 5.3, except the data set consists of 121 points on an irregular grid:

*MSD(D\_rnd121,T) + Curv(T,αx,αy)* → *min:(M = 0, l = ls, r = rs, ϕ = ϕs)*

Some results are shown in Figure 9.

**Figure 9.** SvF solution of Problem 5.6: (**Err**) *Err(x,t) = T(x,t)-Ts(x,t)*; (**K**) the thermal conductivity coefficient.

Errors: $\sigma_{cv}^*$ = 2.13, *rmsd\** = 2.05, Δ\* = 0.39.

*5.7. Problem MSD(D\_rnd1000) + Curv(T): M = 0, l = ls, r = rs , ϕ = ϕs*

The problem is similar to problem 5.6, except the data set consists of 1000 points:

*MSD(D\_rnd1000,T) + Curv(T,αx,αy)* → *min:(M = 0, l = ls, r = rs, ϕ = ϕs)*

The results are shown in Figure 10.

**Figure 10.** SvF solution of Problem 5.7: (**Err**) *Err(x,t) = T(x,t)-Ts(x,t)*; (**K**) the thermal conductivity coefficient.

Errors: $\sigma_{cv}^*$ = 2.02, *rmsd\** = 2.01, Δ\* = 0.15.

*5.8. Problem MSD(D\_reg11x11(ε = 0)) + Curv(ϕ) + Curv(l) + Curv(r) + Curv(K): M = 0*

The problem is similar to Problem 5.4, but with a set of exact measurements (*εi* = 0):

*MSD(D\_reg11x11(ε = 0),T) + Curv(ϕ,α1) + Curv(l,α2) + Curv(r,α3) + Curv(K,α4)* → *min:(M = 0).*

Some results are shown in Figure 11.

Errors: $\sigma_{cv}^*$ = 0.06, *rmsd\** = 0.004, Δ\* = 0.

The graphs of the boundary and initial conditions are not shown, since the SvF solutions actually coincide with the exact one.

**Figure 11.** SvF solution of Problem 5.8: (**T**) contour lines of *T(x,t)*; (**K**) the thermal conductivity coefficient.

#### **6. Discussion**

The errors obtained during problem solving are summarized in Table 1. Analyzing the table allowed us to identify some of the patterns that appeared during problem modification.

**Table 1.** Errors: $\sigma_{cv}^*$ – the cross-validation error, the main indicator of the "quality" of the constructed model; *rmsd\** – the standard deviation of the SvF solution from the observations; *σd* – the data error; Δ – the standard deviation of the SvF solution from the exact solution; Δ\* – the estimate of Δ determined by Formula (3).


Lines 0–3. Lines 0–3 of Table 1 show some patterns of successive model modifications. As expected, adding "correct" additional conditions leads to a more accurate modification of the model (see column Δ). These conditions reduce the feasible set of the optimization problem, with "correct" conditions cutting off its unnecessary (nonessential) parts. In the technology used, this leads to a decrease in the cross-validation error $\sigma_{cv}^*$.

The growth of the *rmsd\** error seems paradoxical: the more accurate the model, the greater its root mean square deviation from observations. However, it is easy to explain. First of all, *rmsd\** is within the error limits of the initial data *σd*. Second, the better the model, the closer it is to the exact solution, and for the exact solution *rmsd = σd*. Of course, if regularization penalties that are too large are chosen, the solution will be distorted so that *rmsd* will be greater than *σd*. This situation is shown in Figure 2A.

During modification, every subsequent model (from 0 to 3) is a refinement of the previous one. Previously found solutions are used as initial approximations, which allows solutions to be found faster and poorly interpretable solutions to be avoided.

Lines 4–5. The problems considered differ from Problem 5.1 by the selection criterion: instead of the solution *T*, the functions *ϕ*, *l*, *r*, and *K* (defining the solution) are used for regularization. This formulation seems to be more consistent with the physics of the

phenomenon—a penalty imposed on the original functions determining the dynamics of the process, and not on their consequence (solution). The estimates of the cross-validation error (*σcv*) obtained are similar to Problem 5.1 but with smaller deviation from the exact solution Δ. The decrease in deviation may be associated with a special case of generated errors. The issue requires further research.

In Problem 5.4, the obtained solution for the thermal conductivity coefficient *K(T)* (see Figure 7K) rises sharply toward the right border. Suppose it is known in advance that the coefficient must not increase. This knowledge can easily be added to the model as an additional condition (*dK/dT* ≤ 0). As a result (Problem 5.5), *K(T)* changed (see Figure 8K). At the same time, the accuracy indicators (line 5) remained practically unchanged, which indicates that this additional condition does not contradict the model and the observations.
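On a grid, the condition *dK/dT* ≤ 0 restricts the search to non-increasing sequences of *K* values. As a hedged illustration (pure Python, not the SvF implementation), the least-squares projection of grid values onto the non-increasing cone can be computed with the pool-adjacent-violators algorithm; the invented values below mimic a spurious rise at the right border:

```python
def project_nonincreasing(y):
    """Least-squares projection of y onto non-increasing sequences
    (pool-adjacent-violators on the reversed, non-decreasing problem)."""
    rev = list(reversed(y))
    blocks = []  # list of [block mean, block weight]
    for v in rev:
        blocks.append([v, 1])
        # Merge adjacent blocks while non-decreasing order is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    out = []
    for m, w in blocks:
        out.extend([m] * w)
    return list(reversed(out))

# Invented grid values of K(T) that rise spuriously at the right border:
K = [2.0, 1.8, 1.7, 1.9, 2.3]
K_mon = project_nonincreasing(K)
print(K_mon)  # [2.0, 1.925, 1.925, 1.925, 1.925]
```

The spurious rise is flattened into a constant tail while the rest of the curve is left intact, which mirrors how an extra monotonicity condition reshapes *K(T)* without degrading the fit.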

Line 6. This problem is similar to Problem 5.3 but uses a data set with a random arrangement of observations in space and time. The same number of observations leads to the same error estimates, but the deviation from the exact solution is noticeably smaller. The use of such data sets deserves careful consideration.

Line 7. Increasing the number of observations to 1000 significantly improves the accuracy of the solution.

Line 8. Using a data set with precise measurements allows us to get a close-to-exact solution.

General notes. The Δ*\** estimate generally describes Δ (the standard deviation of the SvF solution from the exact one) well enough. Note that the data error *σd* (usually unknown) is used in this calculation.

Figures 4Err, 6Err, 8Err, 9T and 10Err show how the regularization distorts the solution. As expected, distortions are mainly observed in regions with high curvature (large values of the squares of the second derivatives).

It is easy to see that for almost all problems (except Problem 5.8) the following inequalities hold:

$$
\sigma_{cv}^{*} \ge \sigma_{d} \ge \mathrm{rmsd}^{*}.
$$

This inequality appears to hold when the model used, the regularization method, and the chosen cross-validation procedure are consistent with the data and with the physics of the phenomenon. At least, if the wrong model is chosen to describe the data (an incorrect mathematical description or too severe a regularization penalty), then the right-hand inequality is violated. If the data errors are not random (for example, correlated with spatial position) or the cross-validation procedure is chosen incorrectly, the left-hand inequality is violated. Thus, a violation of the inequality above is a sign that something has gone wrong.
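This diagnostic reading of the inequality can be wrapped in a small sanity check (a Python sketch; the function name and the warning messages are ours, not part of the article's software):

```python
def check_consistency(sigma_cv, sigma_d, rmsd):
    """Flag violations of sigma_cv* >= sigma_d >= rmsd*.

    Each violated side of the inequality suggests a distinct failure mode,
    following the diagnostic reading in the text above.
    """
    issues = []
    if sigma_cv < sigma_d:
        issues.append("sigma_cv* < sigma_d: non-random data errors "
                      "or an ill-chosen cross-validation procedure?")
    if sigma_d < rmsd:
        issues.append("sigma_d < rmsd*: wrong model "
                      "or too severe a regularization penalty?")
    return issues

print(check_consistency(0.12, 0.10, 0.08))  # [] -- consistent
print(check_consistency(0.09, 0.10, 0.12))  # two warnings
```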

#### **7. Conclusions**

The problems (and their solutions) considered in the article illustrate the effectiveness of regularization methods and, in particular, of the balanced identification technology.

The results above confirm the thesis: the more data, the higher the accuracy; and the more knowledge about the object, the more complex and accurate the models that can be constructed. The technology used allows us to organize an evolutionary process of building models, from simple to complex. Here, the indicator determining "the winner in the competitive struggle of models" is the cross-validation error: a reduction in this error is a strong argument in favor of a model.

In addition, this gradual (evolutionary) modification is highly desirable because the formulations under consideration are complex two-level (possibly multi-extremal) optimization problems whose solution requires significant resources. Finding a solution without a "plausible" initial approximation would require excessive computational resources and, moreover, one could not be sure that the solution found (one of the local minima of the optimization problem) would have a subject-matter interpretation satisfying the researcher.

This step-by-step complication of the problem, together with specific techniques such as doubling the number of grid nodes, can save computational resources significantly. All of this work's results were obtained on a modern laptop (Core i5 processor) within a reasonable time (up to 1 h). Most of the resources are consumed by the two-level optimization problem, which in this case allows parallelization. Tools for solving more complex, resource-intensive tasks on high-performance multiprocessor complexes exist [10,11].

As for computing resources, SvF technology is resource intensive. This is justified as it is aimed at saving the researcher's time.

Appendix A contains a listing of the task file. The notation used is close to mathematical notation: the formal description of the model for calculations practically coincides with the formulas of model (1). This allows for easy model modification (no "manual" rewriting of program code). For example, to take into account the heat flux at the border, a corresponding condition defining the derivative at the border has to be added to the task file.

Let us take a look at unsolved problems and possible solutions.

One problem is the possibility of local minima. However, there are special solvers designed to search for global extrema, for example, SCIP [15] (source code available), which implements branch-and-bound algorithms, including for global optimization problems with continuous variables. Perhaps, if a previously found solution is used as an initial approximation, confirmation that the found minimum is global might be obtained in a reasonable time.
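The value of a plausible initial approximation can be seen even in a toy multistart experiment (pure Python; the objective function and the crude local search below are invented for illustration and have nothing to do with SCIP's branch-and-bound): different starting points land in different local minima, and only some of them reach the global one.

```python
import math
import random

def f(x):
    # Invented multi-extremal objective: an oscillation plus a quadratic tilt.
    return math.sin(3 * x) + 0.1 * (x - 2) ** 2

def local_descent(x, step=0.1, tol=1e-6):
    """A crude local search: greedy steps, halving the step on failure."""
    while step > tol:
        if f(x + step) < f(x):
            x += step
        elif f(x - step) < f(x):
            x -= step
        else:
            step /= 2
    return x

random.seed(1)
starts = [random.uniform(-2, 6) for _ in range(20)]
minima = [local_descent(x0) for x0 in starts]
best = min(minima, key=f)
print(best, f(best))  # starts in different basins converge to different minima
```

A start already near the global basin converges to the global minimum quickly, which is the practical argument for reusing previously found solutions as initial approximations.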

Finally, the paper considers error estimates only for the solution *T(x,t)*, not for the identification accuracy of the other functions. Evaluating the accuracy of the determined thermal conductivity coefficient is particularly interesting. Another open problem is the formalization of the errors that arise when a real physical object is replaced by a mathematical model and real observations by a measurement-error model. These issues should be researched in the future.

**Author Contributions:** Conceptualization, A.S.; methodology, A.S.; software, A.S. and I.N.; validation, A.S. and I.N.; formal analysis, A.S. and I.N.; investigation, A.S. and I.N.; resources, A.S. and I.N.; data curation, A.S. and I.N.; writing—original draft preparation, A.S. and I.N.; writing—review and editing, A.S.; visualization, A.S. and I.N.; supervision, A.S. and I.N.; project administration, A.S. and I.N.; funding acquisition, A.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Russian Science Foundation under grant no. 22-11-00317, https://rscf.ru/project/22-11-00317/, accessed on 1 November 2022. This work has been carried out using computing resources of the federal collective usage center Complex for Simulation and Data Processing for Megascience Facilities at NRC "Kurchatov Institute".

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The software package together with a task file (MSD(D\_reg11x11) + Curv(T):M = 0.odt) is freely available online in the Git repository https://github.com/distcomp/SvF, accessed on 1 November 2022 (file SvF-2021-11.zip).

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **Appendix A. Task File Sample**

The software package together with the considered task file (MSD(D\_reg11x11) + Curv(T):M = 0.odt) is freely available online in the Git repository https://github.com/distcomp/SvF, accessed on 1 November 2022 (file SvF-2021-11.zip).

Format: .odt (Open/LibreOffice).

The file contains a complete formal description of Problem 5.1 (identification of the unknown functions of the mathematical model *MSD(D\_reg11x11) + Curv(T):M* = 0) and a number of service instructions required for a numerical solution based on the balanced identification technology.

The first line (see Figure A1) specifies the maximum number of iterations, the second the difference scheme, the third the data source (data set), and the fourth the cross-validation procedure parameters. The mathematical model is then described: *Set:* defines the sets; *Var:* defines the unknown variables, i.e., the functions to be identified; *EQ:* the equations of the mathematical model; *Obj:* the objective function (selection criterion). Note that the first equation was entered in the formula editor (TeX notation). A different, less visual encoding of formulas (the commented-out line, marked with a # symbol) can be used instead.

**Figure A1.** Listing of the example task file.

#### **References**

