**3. The AM-MISG Algorithm**


In this section, we briefly introduce the auxiliary model and multi-innovation identification theories and derive the AM-MISG algorithm for the Hammerstein OEMA system.

Let $\hat{\boldsymbol{\theta}}\_k$ denote the estimate of $\boldsymbol{\theta}$ at time $k$. Following the negative gradient search principle, we define and minimize the cost function

$$J(\boldsymbol{\theta}) := \frac{1}{2} [y\_k - \boldsymbol{\varphi}\_k^{\top} \boldsymbol{\theta}]^2,$$

the following SG algorithm can be obtained for estimating the parameter vector *θ*:

$$
\hat{\boldsymbol{\theta}}\_k = \hat{\boldsymbol{\theta}}\_{k-1} - \mu\_1 \left. \frac{\partial J(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \right|\_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}\_{k-1}} = \hat{\boldsymbol{\theta}}\_{k-1} + \frac{\boldsymbol{\varphi}\_k}{s\_k} \varepsilon\_k, \tag{7}
$$

$$\varepsilon\_k = y\_k - \boldsymbol{\varphi}\_k^{\top} \hat{\boldsymbol{\theta}}\_{k-1}, \tag{8}$$

$$s\_k = s\_{k-1} + \|\boldsymbol{\varphi}\_k\|^2, \tag{9}$$

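where $\mu\_1$ is the step size of the SG algorithm, taken as $\mu\_1 = 1/s\_k$, with $s\_0 = 1$. Since $\partial J(\boldsymbol{\theta}) / \partial \boldsymbol{\theta} = -\boldsymbol{\varphi}\_k [y\_k - \boldsymbol{\varphi}\_k^{\top} \boldsymbol{\theta}]$, this step size yields the right-hand side of (7).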
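As an illustration, one step of (7)–(9) can be sketched in plain Python as follows. This assumes the information vector $\boldsymbol{\varphi}\_k$ is fully known, and the function name `sg_update` is hypothetical; lists stand in for vectors.

```python
from typing import List, Sequence, Tuple


def sg_update(theta: Sequence[float], s_prev: float,
              phi: Sequence[float], y: float) -> Tuple[List[float], float]:
    """One step of the SG recursion (7)-(9), assuming phi_k is known."""
    # (9): s_k = s_{k-1} + ||phi_k||^2, so the step size mu_1 = 1/s_k shrinks over time
    s_k = s_prev + sum(p * p for p in phi)
    # (8): scalar innovation eps_k = y_k - phi_k^T theta_{k-1}
    eps = y - sum(p * t for p, t in zip(phi, theta))
    # (7): theta_k = theta_{k-1} + phi_k * eps_k / s_k
    theta_k = [t + p * eps / s_k for t, p in zip(theta, phi)]
    return theta_k, s_k
```

For example, `sg_update([0.0, 0.0], 1.0, [1.0, 2.0], 3.0)` gives $s\_k = 6$ and $\hat{\boldsymbol{\theta}}\_k = [0.5, 1.0]^{\top}$. Because $\|\boldsymbol{\varphi}\_k\|^2 \le s\_k$, the correction never overshoots along $\boldsymbol{\varphi}\_k$.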

However, the variables $x\_{k-i}$, $\bar{u}\_{k-i}$ and $v\_{k-i}$ in $\boldsymbol{\varphi}\_k$ are unknown, so the algorithm in (7)–(9) cannot be implemented directly. The remedy is to apply the auxiliary model idea and build the following auxiliary models based on the parameter estimate $\hat{\boldsymbol{\theta}}\_k$:

$$\begin{aligned} \hat{x}\_{k} &= \hat{\boldsymbol{\varphi}}\_{\mathrm{s},k}^{\top} \hat{\boldsymbol{\theta}}\_{\mathrm{s},k},\\ \hat{\bar{u}}\_{k} &= \hat{c}\_{1,k} f\_{1}(u\_{k}) + \hat{c}\_{2,k} f\_{2}(u\_{k}) + \cdots + \hat{c}\_{m,k} f\_{m}(u\_{k}),\\ \hat{v}\_{k} &= y\_{k} - \hat{\boldsymbol{\varphi}}\_{k}^{\top} \hat{\boldsymbol{\theta}}\_{k}, \end{aligned}$$

and use the outputs $\hat{x}\_{k-i}$, $\hat{\bar{u}}\_{k-i}$ and $\hat{v}\_{k-i}$ of the auxiliary models in place of the unknown variables $x\_{k-i}$, $\bar{u}\_{k-i}$ and $v\_{k-i}$ to construct the estimates of the information vectors:

$$\begin{aligned}\hat{\boldsymbol{\varphi}}\_{k} &= \begin{bmatrix} \hat{\boldsymbol{\varphi}}\_{\mathrm{s},k} \\ \hat{\boldsymbol{\varphi}}\_{\mathrm{n},k} \end{bmatrix}, \\ \hat{\boldsymbol{\varphi}}\_{\mathrm{s},k} &= [-\hat{x}\_{k-1}, -\hat{x}\_{k-2}, \cdots, -\hat{x}\_{k-n\_{a}}, \hat{\bar{u}}\_{k-1}, \hat{\bar{u}}\_{k-2}, \cdots, \hat{\bar{u}}\_{k-n\_{b}}, f(u\_{k})]^{\top}, \\ \hat{\boldsymbol{\varphi}}\_{\mathrm{n},k} &= [\hat{v}\_{k-1}, \hat{v}\_{k-2}, \cdots, \hat{v}\_{k-n\_{d}}]^{\top}. \end{aligned}$$
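The auxiliary models and the information-vector construction above can be sketched in Python as below. This is a hypothetical illustration: `lag`, `build_phi_hat` and `aux_outputs` are made-up names, values before start-up are assumed to be zero, and $\hat{\boldsymbol{\theta}}$ is assumed to be ordered as $[\hat{\boldsymbol{a}}^{\top}, \hat{\boldsymbol{b}}^{\top}, \hat{\boldsymbol{c}}^{\top}, \hat{\boldsymbol{d}}^{\top}]^{\top}$ with assumed lengths $n\_a$, $n\_b$, $m$ and $n\_d$.

```python
from typing import List, Sequence, Tuple


def lag(hist: Sequence[float], i: int) -> float:
    """Value i steps back (hist[-1] is time k-1); zero before start-up (assumption)."""
    return hist[-i] if i <= len(hist) else 0.0


def build_phi_hat(x_hat: Sequence[float], ub_hat: Sequence[float],
                  v_hat: Sequence[float], f_u: Sequence[float],
                  n_a: int, n_b: int, n_d: int) -> List[float]:
    """Stack phi_hat_k = [phi_hat_s,k; phi_hat_n,k] from auxiliary-model histories."""
    # phi_hat_s,k = [-x_hat_{k-1..k-n_a}, ubar_hat_{k-1..k-n_b}, f(u_k)]
    phi_s = [-lag(x_hat, i) for i in range(1, n_a + 1)]
    phi_s += [lag(ub_hat, i) for i in range(1, n_b + 1)]
    phi_s += list(f_u)
    # phi_hat_n,k = [v_hat_{k-1..k-n_d}]
    phi_n = [lag(v_hat, i) for i in range(1, n_d + 1)]
    return phi_s + phi_n


def aux_outputs(theta_hat: Sequence[float], phi_hat: Sequence[float],
                f_u: Sequence[float], n_a: int, n_b: int, m: int,
                y_k: float) -> Tuple[float, float, float]:
    """Return (x_hat_k, ubar_hat_k, v_hat_k) from the three auxiliary models."""
    n_s = n_a + n_b + m                              # length of theta_s = [a; b; c]
    c_hat = theta_hat[n_a + n_b:n_s]
    ub = sum(f * c for f, c in zip(f_u, c_hat))       # ubar_hat_k = f(u_k) c_hat_k
    x = sum(a * b for a, b in zip(phi_hat[:n_s], theta_hat[:n_s]))  # phi_s^T theta_s
    v = y_k - sum(a * b for a, b in zip(phi_hat, theta_hat))        # y_k - phi^T theta
    return x, ub, v
```

Keeping the histories newest-last makes `lag(hist, i)` read directly as "the value at time $k-i$".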

The SG algorithm updates the parameter estimate using only the current data, so its computational cost is low but its estimation accuracy is limited. Based on multi-innovation identification theory [44,45], a sliding window of length $p$ (the innovation length), containing the data from time $k-p+1$ to the current time $k$, is introduced to improve the estimation performance of the SG algorithm; that is, the scalar innovation $\varepsilon\_k$ is expanded into the innovation vector

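$$\boldsymbol{E}\_{p,k} = \left[y\_k - \hat{\boldsymbol{\varphi}}\_k^{\top}\hat{\boldsymbol{\theta}}\_{k-1},\, y\_{k-1} - \hat{\boldsymbol{\varphi}}\_{k-1}^{\top}\hat{\boldsymbol{\theta}}\_{k-2},\, \cdots,\, y\_{k-p+1} - \hat{\boldsymbol{\varphi}}\_{k-p+1}^{\top}\hat{\boldsymbol{\theta}}\_{k-p}\right]^\top \in \mathbb{R}^p. \tag{10}$$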
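To make the sliding window concrete, the sketch below evaluates (10) literally, pairing each past sample with the estimate that was current at that time. The function name and the history layout (`theta_hist[j]` holding $\hat{\boldsymbol{\theta}}\_j$) are hypothetical; note that this literal form has to keep the last $p$ parameter estimates in memory.

```python
from typing import List, Sequence


def innovation_vector_exact(y_hist: Sequence[float],
                            phi_hist: Sequence[Sequence[float]],
                            theta_hist: Sequence[Sequence[float]],
                            p: int) -> List[float]:
    """E_{p,k} as in (10), with k = len(y_hist) - 1.

    Hypothetical layout: y_hist[j] = y_j, phi_hist[j] = phi_hat_j and
    theta_hist[j] = theta_hat_j for j = 0, ..., k - 1; requires k >= p.
    """
    k = len(y_hist) - 1
    if k < p:
        raise ValueError("need k >= p so that theta_hat_{k-p} exists")
    E = []
    for i in range(p):
        j = k - i
        # entry i pairs the data at time j with the older estimate theta_hat_{j-1}
        pred = sum(a * b for a, b in zip(phi_hist[j], theta_hist[j - 1]))
        E.append(y_hist[j] - pred)
    return E
```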

Define the stacked output vector $\boldsymbol{Y}\_{p,k}$ and the stacked information matrix $\hat{\boldsymbol{\Phi}}\_{p,k}$ as

$$\begin{aligned}\boldsymbol{Y}\_{p,k} &:= [y\_{k}, y\_{k-1}, \cdots, y\_{k-p+1}]^\top \in \mathbb{R}^p, \\ \hat{\boldsymbol{\Phi}}\_{p,k} &:= [\hat{\boldsymbol{\varphi}}\_{k}, \hat{\boldsymbol{\varphi}}\_{k-1}, \cdots, \hat{\boldsymbol{\varphi}}\_{k-p+1}] \in \mathbb{R}^{n \times p}.\end{aligned}$$

In principle, the estimate $\hat{\boldsymbol{\theta}}\_{k-1}$ is closer to the true value $\boldsymbol{\theta}$ than $\hat{\boldsymbol{\theta}}\_{k-i}$ for $i = 2, \cdots, p$, so Equation (10) can be approximated by

$$\boldsymbol{E}\_{p,k} = \boldsymbol{Y}\_{p,k} - \hat{\boldsymbol{\Phi}}\_{p,k}^{\top} \hat{\boldsymbol{\theta}}\_{k-1}.$$
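With this approximation only the last $p$ data pairs need to be stored, not the last $p$ estimates. The hypothetical `InnovationWindow` class below keeps $\boldsymbol{Y}\_{p,k}$ and the columns of $\hat{\boldsymbol{\Phi}}\_{p,k}$ in bounded `collections.deque` objects. Before the window fills ($k < p$) it simply holds fewer columns. This gives the same update as zero initial data, since a zero column with a zero output adds nothing to $\hat{\boldsymbol{\Phi}}\_{p,k} \boldsymbol{E}\_{p,k}$.

```python
from collections import deque
from typing import Deque, List, Sequence


class InnovationWindow:
    """Hypothetical helper keeping the p most recent (y_j, phi_hat_j) pairs."""

    def __init__(self, p: int) -> None:
        self.y: Deque[float] = deque(maxlen=p)
        self.phi: Deque[List[float]] = deque(maxlen=p)

    def push(self, y_k: float, phi_k: Sequence[float]) -> None:
        # appendleft keeps the newest sample first (column order of Y_{p,k}, Phi_{p,k});
        # once the deque is full, the oldest sample drops off the right end
        self.y.appendleft(y_k)
        self.phi.appendleft(list(phi_k))

    def innovation(self, theta_prev: Sequence[float]) -> List[float]:
        # E_{p,k} = Y_{p,k} - Phi_{p,k}^T theta_{k-1}: every entry uses the same estimate
        return [yj - sum(a * b for a, b in zip(pj, theta_prev))
                for yj, pj in zip(self.y, self.phi)]
```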

In summary, the AM-MISG algorithm is as follows:

$$\hat{\boldsymbol{\theta}}\_{k} = \hat{\boldsymbol{\theta}}\_{k-1} + \frac{\hat{\boldsymbol{\Phi}}\_{p,k}}{s\_k} \boldsymbol{E}\_{p,k}, \tag{11}$$

$$\boldsymbol{E}\_{p,k} = \boldsymbol{Y}\_{p,k} - \hat{\boldsymbol{\Phi}}\_{p,k}^{\top} \hat{\boldsymbol{\theta}}\_{k-1}, \tag{12}$$

$$s\_k = s\_{k-1} + \|\hat{\boldsymbol{\varphi}}\_k\|^2, \quad s\_0 = 1, \tag{13}$$

$$\boldsymbol{Y}\_{p,k} = [y\_k, y\_{k-1}, \cdots, y\_{k-p+1}]^\top, \tag{14}$$

$$\hat{\boldsymbol{\Phi}}\_{p,k} = [\hat{\boldsymbol{\varphi}}\_k, \hat{\boldsymbol{\varphi}}\_{k-1}, \cdots, \hat{\boldsymbol{\varphi}}\_{k-p+1}], \tag{15}$$

$$\hat{\bar{u}}\_{k} = f(u\_{k}) \hat{\boldsymbol{c}}\_{k}, \tag{16}$$

$$\hat{x}\_{k} = \hat{\boldsymbol{\varphi}}\_{\mathrm{s},k}^{\top} \hat{\boldsymbol{\theta}}\_{\mathrm{s},k}, \tag{17}$$

$$\hat{v}\_k = y\_k - \hat{\boldsymbol{\varphi}}\_k^{\top} \hat{\boldsymbol{\theta}}\_k, \tag{18}$$

$$f(u\_k) = [f\_1(u\_k), f\_2(u\_k), \cdots, f\_m(u\_k)], \tag{19}$$

$$\hat{\boldsymbol{\varphi}}\_k = \begin{bmatrix} \hat{\boldsymbol{\varphi}}\_{\mathrm{s},k} \\ \hat{\boldsymbol{\varphi}}\_{\mathrm{n},k} \end{bmatrix}, \tag{20}$$

$$\hat{\boldsymbol{\varphi}}\_{\mathrm{s},k} = [-\hat{x}\_{k-1}, -\hat{x}\_{k-2}, \cdots, -\hat{x}\_{k-n\_a}, \hat{\bar{u}}\_{k-1}, \hat{\bar{u}}\_{k-2}, \cdots, \hat{\bar{u}}\_{k-n\_b}, f(u\_k)]^\top, \tag{21}$$

$$\hat{\boldsymbol{\varphi}}\_{\mathrm{n},k} = [\hat{v}\_{k-1}, \hat{v}\_{k-2}, \cdots, \hat{v}\_{k-n\_d}]^\top, \tag{22}$$

$$\hat{\boldsymbol{\theta}}\_k = \begin{bmatrix} \hat{\boldsymbol{\theta}}\_{\mathrm{s},k} \\ \hat{\boldsymbol{d}}\_k \end{bmatrix}, \tag{23}$$

$$\hat{\boldsymbol{\theta}}\_{\mathrm{s},k} = [\hat{\boldsymbol{a}}\_k^{\top}, \hat{\boldsymbol{b}}\_k^{\top}, \hat{\boldsymbol{c}}\_k^{\top}]^{\top}. \tag{24}$$

Note that the AM-MISG algorithm reduces to the auxiliary model-based stochastic gradient (AM-SG) algorithm when $p = 1$.
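Putting the recursion together, the following is a minimal end-to-end sketch of the AM-MISG algorithm in pure Python. Everything beyond the equations is an assumption for illustration only: the names `am_misg`, `simulate` and `poly_basis`, the polynomial basis $f(u) = [u, u^2]$, the small constant initial estimate, zero initial conditions, and the first-order test system with its parameter values. Calling it with `p=1` runs the AM-SG algorithm.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Optional, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def am_misg(u: Sequence[float], y: Sequence[float], n_a: int, n_b: int, n_d: int,
            basis: Callable[[float], List[float]], p: int,
            theta0: Optional[List[float]] = None) -> List[float]:
    """Sketch of recursion (11)-(24); theta ordered as [a; b; c; d]."""
    m = len(basis(0.0))
    n_s = n_a + n_b + m                                  # length of theta_s = [a; b; c]
    theta = list(theta0) if theta0 is not None else [1e-6] * (n_s + n_d)
    s = 1.0                                              # s_0 = 1
    x_hat: List[float] = []                              # auxiliary-model outputs, newest last
    ub_hat: List[float] = []
    v_hat: List[float] = []
    Y: Deque[float] = deque(maxlen=p)                    # newest sample first, as in (14)
    Phi: Deque[List[float]] = deque(maxlen=p)            # newest column first, as in (15)

    def lag(hist: List[float], i: int) -> float:
        return hist[-i] if i <= len(hist) else 0.0       # zero initial conditions (assumption)

    for k in range(len(u)):
        f_u = basis(u[k])                                # (19)
        phi = ([-lag(x_hat, i) for i in range(1, n_a + 1)]
               + [lag(ub_hat, i) for i in range(1, n_b + 1)]
               + f_u                                     # (21)
               + [lag(v_hat, i) for i in range(1, n_d + 1)])  # (22), stacked as (20)
        Y.appendleft(y[k])
        Phi.appendleft(phi)
        s += dot(phi, phi)                               # (13)
        E = [yj - dot(pj, theta) for yj, pj in zip(Y, Phi)]  # (12), uses theta_{k-1}
        theta = [t + sum(pj[r] * e for pj, e in zip(Phi, E)) / s  # (11)
                 for r, t in enumerate(theta)]
        # auxiliary models (16)-(18), evaluated with the new estimate theta_k
        ub_hat.append(dot(f_u, theta[n_a + n_b:n_s]))
        x_hat.append(dot(phi[:n_s], theta[:n_s]))
        v_hat.append(y[k] - dot(phi, theta))
    return theta


def poly_basis(v: float) -> List[float]:
    return [v, v ** 2]                                   # hypothetical f(u) = [u, u^2]


def simulate(n: int, a1: float, b1: float, c: Tuple[float, float], d1: float,
             sigma: float, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Hypothetical first-order Hammerstein OEMA system, for illustration only."""
    rng = random.Random(seed)
    u: List[float] = []
    y: List[float] = []
    x_prev = ub_prev = v_prev = 0.0
    for _ in range(n):
        uk = rng.gauss(0.0, 1.0)
        ub = c[0] * uk + c[1] * uk ** 2                  # static nonlinearity f(u_k) c
        x = -a1 * x_prev + ub + b1 * ub_prev             # linear block
        v = rng.gauss(0.0, sigma)
        y.append(x + v + d1 * v_prev)                    # moving-average noise
        u.append(uk)
        x_prev, ub_prev, v_prev = x, ub, v
    return u, y


if __name__ == "__main__":
    u, y = simulate(3000, a1=0.5, b1=0.3, c=(1.0, 0.5), d1=0.2, sigma=0.1)
    for p in (1, 5):                                     # p = 1 is AM-SG
        print(p, [round(t, 3) for t in am_misg(u, y, 1, 1, 1, poly_basis, p)])
```

In this setup the true vector is $[0.5, 0.3, 1.0, 0.5, 0.2]^{\top}$, so the printed estimates for $p = 1$ and $p = 5$ can be compared against it. The sliding window adds about $p$ inner products per step over AM-SG.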
