**4. The AM-MIFSG Algorithm**

This section derives an AM-MIFSG algorithm to improve the parameter estimation performance of the above AM-MISG identification algorithm.

In (7), the first-order gradient is used to update the parameter vector. In contrast to the integer-order case, the fractional-order derivative of the quadratic objective function near a point is not determined by local information alone, so its essential property is nonlocality. This property enables the fractional-order gradient method to jump out of local optima and reach the global minimum more quickly. Here, we propose to add a fractional-order gradient in addition to the first-order gradient, and the final update relation is written as:

$$
\hat{\theta}_k = \hat{\theta}_{k-1} - \mu_1 \frac{\partial f(\theta)}{\partial \theta} - \mu_\alpha \frac{\partial^\alpha f(\theta)}{\partial \theta^\alpha},
\tag{25}
$$

where *μα* is the step size for the fractional-order derivative of order *α*. According to the Caputo and Riemann–Liouville definitions [46,47], the fractional derivative of a power function *f*(*t*) = *t*<sup>*n*</sup> (*n* > −1) is defined as:

$$D_t^{\alpha} t^n = \frac{\Gamma(n+1)}{\Gamma(n+1-\alpha)}\,t^{n-\alpha},\tag{26}$$

where *D*<sub>*t*</sub><sup>*α*</sup> is the fractional derivative operator of order *α* and Γ is the gamma function, which satisfies Γ(*n*) = (*n* − 1)! for positive integers *n*.
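As a quick sanity check on (26), the power-function rule can be evaluated directly with the gamma function. The following Python snippet is purely illustrative (it is not part of the algorithm); it confirms that *α* = 1 recovers the ordinary derivative:

```python
from math import gamma

def frac_deriv_power(n: float, alpha: float, t: float) -> float:
    """Fractional derivative of t**n by Eq. (26):
    D_t^alpha t**n = Gamma(n+1) / Gamma(n+1-alpha) * t**(n-alpha)."""
    return gamma(n + 1) / gamma(n + 1 - alpha) * t ** (n - alpha)

# alpha = 1 recovers the classical derivative: d/dt t**3 = 3*t**2, i.e. 12 at t = 2
print(frac_deriv_power(3, 1.0, 2.0))
# a genuinely fractional order, e.g. D^{1/2} t at t = 1: Gamma(2)/Gamma(1.5)
print(frac_deriv_power(1, 0.5, 1.0))
```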

According to (26), the fractional-order gradient in Equation (25) can be written as follows:

$$\frac{\partial^{\alpha} f(\theta)}{\partial \theta^{\alpha}} = -\varepsilon_k\,\mathrm{diag}(\varphi_k)\left(\frac{\partial^{\alpha}\theta}{\partial\theta^{\alpha}}\right) = -\varepsilon_k\,\mathrm{diag}(\varphi_k)\left(\frac{\Gamma(2)}{\Gamma(2-\alpha)}\,\theta^{1-\alpha}\right),\tag{27}$$

where Γ(2) = 1. Then Equation (25) can be approximated as follows:

$$
\hat{\theta}_k = \hat{\theta}_{k-1} + \frac{\varphi_k}{s_k}\,\varepsilon_k + \frac{\Psi_k}{s_{\alpha,k}}\,\varepsilon_k, \quad 0 < \alpha < 1,\tag{28}
$$

$$s_{\alpha,k} = s_{\alpha,k-1} + \|\Psi_k\|^2, \quad s_{\alpha,0} = 1,\tag{29}$$

$$\Psi_k = \frac{\mathrm{diag}(\varphi_k)\,|\hat{\theta}_{k-1}|^{1-\alpha}}{\Gamma(2-\alpha)}.\tag{30}$$

Please note that the absolute value of *θ* is used to avoid complex values; this is a common way of dealing with fractional-order gradients [38]. The introduction of the fractional-order parameter *α* provides additional degrees of freedom and increases the flexibility of the parameter estimation.
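To make the update (28)–(30) concrete, here is a minimal Python sketch of one fractional-gradient step for a scalar-output regression *y<sub>k</sub>* = *φ<sub>k</sub>*<sup>⊤</sup>*θ* + *v<sub>k</sub>*. The function name `fsg_step`, the toy data, and the use of NumPy are our illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from math import gamma

def fsg_step(theta, phi, y, s, s_a, alpha):
    """One fractional stochastic-gradient update following Eqs. (28)-(30)."""
    eps = y - phi @ theta                                   # innovation
    # Eq. (30): Psi_k = diag(phi_k) |theta_{k-1}|^(1-alpha) / Gamma(2-alpha)
    psi = phi * np.abs(theta) ** (1 - alpha) / gamma(2 - alpha)
    s = s + phi @ phi                                       # first-order step-size sum
    s_a = s_a + psi @ psi                                   # Eq. (29)
    theta = theta + (phi / s) * eps + (psi / s_a) * eps     # Eq. (28)
    return theta, s, s_a

# toy example: track theta_true = [0.8, -0.5] from noisy regressions
rng = np.random.default_rng(0)
theta_true = np.array([0.8, -0.5])
theta, s, s_a = np.zeros(2), 1.0, 1.0
for _ in range(2000):
    phi = rng.standard_normal(2)
    y = phi @ theta_true + 0.01 * rng.standard_normal()
    theta, s, s_a = fsg_step(theta, phi, y, s, s_a, alpha=0.5)
print(theta)  # moves toward [0.8, -0.5]
```

Note that with `theta` initialized at zero the fractional term vanishes on the first step, so the first update is a pure first-order step.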

Similar to the AM-MISG algorithm in Section 3, expanding the information vector *Ψ<sub>k</sub>* to the information matrix

$$\Psi_{p,k} = [\Psi_k, \Psi_{k-1}, \ldots, \Psi_{k-p+1}],$$

and applying the auxiliary model identification idea, we can obtain the following AM-MIFSG algorithm:


$$
\hat{\theta}_k = \hat{\theta}_{k-1} + \left(\frac{\hat{\Phi}_{p,k}}{s_k} + \frac{\hat{\Psi}_{p,k}}{s_{\alpha,k}}\right) E_{p,k},\tag{31}
$$

$$E_{p,k} = Y_{p,k} - \hat{\Phi}_{p,k}^{\top}\,\hat{\theta}_{k-1},\tag{32}$$

$$s_k = s_{k-1} + \|\hat{\varphi}_k\|^2, \quad s_0 = 1,\tag{33}$$

$$s_{\alpha,k} = s_{\alpha,k-1} + \|\hat{\Psi}_k\|^2, \quad s_{\alpha,0} = 1,\tag{34}$$

$$Y_{p,k} = [y_k, y_{k-1}, \ldots, y_{k-p+1}]^{\top},\tag{35}$$

$$\hat{\Phi}_{p,k} = [\hat{\varphi}_k, \hat{\varphi}_{k-1}, \ldots, \hat{\varphi}_{k-p+1}],\tag{36}$$

$$\hat{\Psi}_{p,k} = [\hat{\Psi}_k, \hat{\Psi}_{k-1}, \ldots, \hat{\Psi}_{k-p+1}],\tag{37}$$

$$\hat{\Psi}_j = \frac{\mathrm{diag}(\hat{\varphi}_j)\,|\hat{\theta}_{k-1}|^{1-\alpha}}{\Gamma(2-\alpha)}, \quad j = k, k-1, \ldots, k-p+1,\tag{38}$$

$$
\hat{\bar{u}}_k = f^{\top}(u_k)\,\hat{c}_k,\tag{39}
$$

$$
\hat{x}_k = \hat{\varphi}_{s,k}^{\top}\,\hat{\theta}_{s,k},\tag{40}
$$

$$
\hat{v}_k = y_k - \hat{\varphi}_k^{\top}\,\hat{\theta}_k,\tag{41}
$$

$$f(u_k) = [f_1(u_k), f_2(u_k), \ldots, f_m(u_k)]^{\top},\tag{42}$$

$$\hat{\varphi}_k = \begin{bmatrix} \hat{\varphi}_{s,k} \\ \hat{\varphi}_{n,k} \end{bmatrix},\tag{43}$$

$$\hat{\varphi}_{s,k} = [-\hat{x}_{k-1}, -\hat{x}_{k-2}, \ldots, -\hat{x}_{k-n_a}, \hat{\bar{u}}_{k-1}, \hat{\bar{u}}_{k-2}, \ldots, \hat{\bar{u}}_{k-m_b}, f^{\top}(u_k)]^{\top},\tag{44}$$

$$\hat{\varphi}_{n,k} = [\hat{v}_{k-1}, \hat{v}_{k-2}, \ldots, \hat{v}_{k-n_d}]^{\top},\tag{45}$$

$$
\hat{\theta}_k = \begin{bmatrix} \hat{\theta}_{s,k} \\ \hat{d}_k \end{bmatrix},\tag{46}
$$

$$
\hat{\theta}_{s,k} = [\hat{a}_k^{\top}, \hat{b}_k^{\top}, \hat{c}_k^{\top}]^{\top}.\tag{47}
$$

When *p* = 1, the above AM-MIFSG algorithm reduces to the auxiliary model-based fractional stochastic gradient (AM-FSG) algorithm.

**Remark 1.** *In general, as the innovation length p increases, the collected data are used more fully, and the estimation accuracy therefore gradually improves. However, the computational load increases at the same time. How to choose the optimal innovation length p is an open problem. In practice, we often choose p* < *n.*

**Remark 2.** *The differential order α is chosen in the range* (0, 1). *The order may show different characteristics for different systems, and can be adjusted during the procedure as needed.*
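One way to see the effect of *α* in Remark 2 is to tabulate the per-entry weight that the fractional term (30) applies to each regressor entry, |*θ*|<sup>1−*α*</sup>/Γ(2 − *α*). This small Python snippet is purely illustrative: as *α* → 1 the weight tends to 1 (first-order-like scaling), while smaller *α* lets the parameter magnitude shape the step:

```python
from math import gamma

def frac_weight(theta_abs: float, alpha: float) -> float:
    """Entry-wise weight from Eq. (30): |theta|**(1-alpha) / Gamma(2-alpha)."""
    return theta_abs ** (1 - alpha) / gamma(2 - alpha)

# compare a small and a large parameter magnitude across several orders
for alpha in (0.1, 0.5, 0.9, 1.0):
    print(alpha, frac_weight(0.2, alpha), frac_weight(2.0, alpha))
```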

The implementation of the AM-MIFSG algorithm is listed as follows.

1. Choose *p*, *α* and initialize: let *k* = 1, *θ*ˆ<sub>0</sub> = [*θ*ˆ<sub>s,0</sub>; *d*ˆ<sub>0</sub>] = **1**<sub>*n*</sub>/*p*<sub>0</sub>, *s*<sub>0</sub> = 1, *s*<sub>*α*,0</sub> = 1; set *x*ˆ<sub>*i*</sub> = 1/*p*<sub>0</sub>, *u*¯ˆ<sub>*i*</sub> = 1/*p*<sub>0</sub> and *v*ˆ<sub>*i*</sub> = 1/*p*<sub>0</sub> for *i* ≤ 0, with *p*<sub>0</sub> = 10<sup>6</sup>, and give the basis functions *f<sub>i</sub>*(·).


4. Compute the innovation vector *E*<sub>*p*,*k*</sub> by (32), *s*<sub>*k*</sub> by (33) and *s*<sub>*α*,*k*</sub> by (34).
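The steps above can be sketched in Python for the simplified case of a purely linear regression, with the auxiliary-model recursions (39)–(41) omitted; the function `mifsg`, the stacking scheme, and the toy data below are our illustrative assumptions, not the paper's code:

```python
import numpy as np
from math import gamma

def mifsg(Y, Phi_all, p=3, alpha=0.5, p0=1e6):
    """Simplified multi-innovation fractional SG loop, Eqs. (31)-(38),
    for y_k = phi_k^T theta + noise (auxiliary-model parts omitted)."""
    n = Phi_all.shape[1]
    theta = np.full(n, 1.0 / p0)                 # theta_0 = 1_n / p_0
    s, s_a = 1.0, 1.0
    for k in range(len(Y)):
        lo = max(0, k - p + 1)
        Phi_p = Phi_all[lo:k + 1].T              # Eq. (36): stacked regressors
        Y_p = Y[lo:k + 1]                        # Eq. (35): stacked outputs
        w = np.abs(theta) ** (1 - alpha) / gamma(2 - alpha)
        Psi_p = Phi_p * w[:, None]               # Eqs. (37)-(38)
        s += Phi_all[k] @ Phi_all[k]             # Eq. (33)
        s_a += Psi_p[:, -1] @ Psi_p[:, -1]       # Eq. (34), using Psi_k
        E_p = Y_p - Phi_p.T @ theta              # Eq. (32): innovation vector
        theta = theta + (Phi_p / s + Psi_p / s_a) @ E_p  # Eq. (31)
    return theta

rng = np.random.default_rng(1)
theta_true = np.array([0.6, -0.4, 0.3])
Phi_all = rng.standard_normal((3000, 3))
Y = Phi_all @ theta_true + 0.01 * rng.standard_normal(3000)
theta_hat = mifsg(Y, Phi_all)
print(theta_hat)  # close to theta_true
```

Increasing `p` uses more past data per step (Remark 1), at the cost of forming larger stacked matrices in each iteration.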


The algorithm obtained above, combined with the methods in [48–53], can cope with linear and nonlinear systems subject to different disturbances. Furthermore, prediction models or soft-sensor models can be obtained with the assistance of other parameter estimation algorithms [54–59] and can be applied to process control and other fields [60–65].
