*3.4. Training*

The training process defines the parameters of an evolving fuzzy neural network. In the model proposed in this article, this training takes place in two stages. The first, offline stage is performed using the concept of the Extreme Learning Machine [10], and in the evolving phase of the model a recursive weighted least squares (RWLS) approach [52] is used. The first process defines the weights based on the initial architecture, and the second incrementally updates these weights as new fuzzy rules are created and samples carrying essential information are evaluated. The initial definition of the third-layer weights carried out in the offline training stage can be expressed by [38]:

$$
\vec{v}\_k = Z^+ \vec{y}\_k \quad \forall k = 1, \dots, C \tag{42}
$$

where *C* is the number of classes, *y*<sub>k</sub> is the indicator vector of class *k*, and *Z*<sup>+</sup> = (*Z*<sup>T</sup>*Z*)<sup>−1</sup>*Z*<sup>T</sup> is the Moore–Penrose pseudo-inverse [53] of *Z* (the output of the second layer). This procedure is commonly applied in the training of artificial neural networks and sets the weights in a single step, making the training fast and accurate.
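As a minimal illustration of this one-shot estimation (not the authors' implementation; the matrix shapes and the helper name `offline_output_weights` are assumptions made for the sketch), the output weights of all classes can be obtained at once with a numerical pseudo-inverse:

```python
import numpy as np

def offline_output_weights(Z, Y):
    """ELM-style one-shot estimate of the third-layer weights, Eq. (42).

    Z : (K, L) activation matrix of the second layer (one row per sample).
    Y : (K, C) indicator matrix whose k-th column is the 0/1 target
        vector of class k.
    Returns V : (L, C) matrix whose k-th column is v_k = Z^+ y_k.
    """
    return np.linalg.pinv(Z) @ Y   # Moore-Penrose pseudo-inverse applied to all classes

# Hypothetical usage: 100 samples, L = 8 second-layer neurons, C = 3 classes
Z = np.random.rand(100, 8)
Y = np.eye(3)[np.random.randint(0, 3, size=100)]   # one-hot indicator vectors
V = offline_output_weights(Z, Y)                    # shape (8, 3)
```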

In the evolving phase of the model (new incoming online stream samples), the recursive weighted least squares (RWLS) approach [52] is used, with formulas for updating the weights *v*<sub>k</sub> that can be expressed as follows:

$$\eta = \vec{z}^t Q^{t-1} \left( \psi + \vec{z}^t Q^{t-1} (\vec{z}^t)^T \right)^{-1} \tag{43}$$

$$Q^t = \left( I\_{L^t} - \eta^T \vec{z}^t \right) \psi^{-1} Q^{t-1} \tag{44}$$

$$
\vec{v}\_k^t = \vec{v}\_k^{t-1} + \eta^T \left( y\_k^t - \vec{z}^t \vec{v}\_k^{t-1} \right) \tag{45}
$$

where the index *k* again denotes the class index, *k* = 1, . . . , *C*; *z*<sup>t</sup> denotes the regressor vector of the current sample; *η* is the current Kalman gain vector; *I*<sub>*L*<sup>t</sup></sub> is the identity matrix of size *L*<sup>t</sup> × *L*<sup>t</sup>, with *L*<sup>t</sup> the current number of neurons in the second layer; and *ψ* ∈ ]0, 1] denotes the forgetting factor (1 by default). *Q* denotes the inverse Hessian matrix *Q* = (*Z*<sub>sel</sub><sup>T</sup>*Z*<sub>sel</sub>)<sup>−1</sup> and is initially set to *ωI*<sub>*L*<sup>t</sup></sub>, where *ω* = 1000 [54].

This matrix is updated directly and incrementally by Equation (44), without requiring a (time-consuming and possibly unpredictable) matrix re-inversion. This procedure tracks changes in the data stream and updates the weight values without losing the previous reference. Thus, the concept of memory is applied to the weights, keeping the model's training consistent with the premises of evolving fuzzy systems.
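A minimal per-sample sketch of this recursive update follows, assuming the regressor *z*<sup>t</sup> is handled as a row vector and reusing the defaults stated above (*ψ* = 1, *Q* initialized as *ωI* with *ω* = 1000); the function name `rwls_update` and the array shapes are illustrative assumptions, not the original code:

```python
import numpy as np

def rwls_update(Q, V, z, y, psi=1.0):
    """One RWLS step for all output weight vectors, following Eqs. (43)-(45).

    Q   : (L, L) inverse Hessian matrix Q^{t-1}.
    V   : (L, C) output weights; column k holds v_k^{t-1}.
    z   : (L,)  regressor vector z^t of the current sample.
    y   : (C,)  class indicator vector (y_k^t = 1 for the true class, else 0).
    psi : forgetting factor in ]0, 1] (1 by default).
    """
    z = z.reshape(1, -1)                                    # row vector (1, L)
    eta = (z @ Q) / (psi + (z @ Q @ z.T).item())            # Eq. (43): Kalman gain
    Q_new = (np.eye(Q.shape[0]) - eta.T @ z) @ Q / psi      # Eq. (44): no re-inversion
    V_new = V + eta.T @ (y.reshape(1, -1) - z @ V)          # Eq. (45): weight update
    return Q_new, V_new

# Hypothetical usage: L = 8 neurons, C = 3 classes
L, C = 8, 3
Q = 1000.0 * np.eye(L)          # initial inverse Hessian, omega * I with omega = 1000
V = np.zeros((L, C))
Q, V = rwls_update(Q, V, np.random.rand(L), np.array([1.0, 0.0, 0.0]))
```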

The eFNN-SODA can be synthesized as represented in Algorithm 1; a code-level sketch of its two phases is given after the listing. It has one parameter during the initial batch learning phase:

- Granularity of the cloud results (grid size), *ϑ*.

#### **Algorithm 1** eFNN-SODA Training and Update Algorithm

Initial Batch Learning Phase (Input: data matrix *X*):

(1) Define granularity of the cloud, *ϑ*.

(2) Extract *L* clouds in the first layer using the SODA approach (based on *ϑ*).

(3) Construct *L* fuzzy neurons with Gaussian membership functions whose *c* and *σ* values are derived from SODA.

(4) Calculate the combination (feature) weights *w* for neuron construction using Equation (34).

(5) Construct *L* logic neurons on the second layer of the network by welding the *L* fuzzy neurons of the first layer, using or-neurons (Equation (37)) and the centers *c* and widths *σ*.

(6) **for** *i* = 1, . . . , *K* **do**

(6.1) Calculate the regression vector *z*(*x*<sub>i</sub>).

(6.2) Store it as one row entry into the activation level matrix *z*.

**end for**

(7) Extract activation level matrix *z* according to the *L* neurons.

(8) Estimate the weights of the output layer for all classes *k* = 1, . . . , *C* by Equation (42), using *z* and the indicator vectors *y*<sub>k</sub>.

Update Phase (Input: single data sample *x*<sup>t</sup>):

(11) Update the *L* clouds and evolve new ones on demand (based on Stages 5, 6, and 7 of the SODA approach).

(12) Update the feature weights *w* by updating the within- and between-class scatter matrices and recalculating Equation (34).

(13) Perform Steps (3) and (5).

(14) Calculate the interpretability criteria (Section 3.5).

(15) Calculate the regression vector *z*(*x*<sup>t</sup>).

(16) Update the output layer weights by Equation (45).
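
Purely as a structural sketch of Algorithm 1, the two phases could be organized as below; the SODA cloud handling, the feature weighting of Equation (34), the neuron construction, and the interpretability criteria are hidden behind hypothetical placeholder helpers (they are not part of the published implementation), and the `rwls_update` function from the sketch above is reused for step (16).

```python
import numpy as np

class EFNNSODA:
    """Control-flow skeleton of Algorithm 1 (not the published implementation)."""

    def __init__(self, granularity):
        self.granularity = granularity      # step (1): cloud granularity, theta
        self.Q = None                       # inverse Hessian used by RWLS
        self.V = None                       # output-layer weights

    # Placeholders standing in for the SODA / neuron-construction machinery.
    def _soda_extract(self, X): raise NotImplementedError        # step (2)
    def _soda_update(self, x): raise NotImplementedError         # step (11)
    def _feature_weights(self, X, Y): raise NotImplementedError  # steps (4), (12), Eq. (34)
    def _build_neurons(self): raise NotImplementedError          # steps (3), (5), (13)
    def _interpretability(self): raise NotImplementedError       # step (14)
    def _regressor(self, x): raise NotImplementedError           # steps (6.1), (15)

    # Initial batch learning phase (input: data matrix X, indicator matrix Y).
    def batch_fit(self, X, Y):
        self._soda_extract(X)                               # step (2)
        self._feature_weights(X, Y)                         # step (4)
        self._build_neurons()                               # steps (3), (5)
        Z = np.vstack([self._regressor(x) for x in X])      # steps (6), (7)
        self.V = np.linalg.pinv(Z) @ Y                      # step (8), Eq. (42)
        self.Q = 1000.0 * np.eye(Z.shape[1])                # omega * I, omega = 1000

    # Update phase (input: single sample x_t with class indicator y_t).
    def update(self, x_t, y_t):
        self._soda_update(x_t)                              # step (11)
        self._feature_weights(x_t, y_t)                     # step (12)
        self._build_neurons()                               # step (13)
        self._interpretability()                            # step (14)
        z_t = self._regressor(x_t)                          # step (15)
        self.Q, self.V = rwls_update(self.Q, self.V, z_t, y_t)  # step (16), Eq. (45)
```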

The computational complexity of eFNN-SODA comprises the number of flops required to process a single sample through the update algorithm (second part of Algorithm 1), because this determines the online speed of the algorithm. In this sense, the main steps involved are listed below:

