#### *3.1. Independent Component Analysis-ICA*

ICA is a methodology for multivariate signal processing based on the statistical independence property. ICA techniques seek to uncover the independent source signals from a set of observations composed of linear mixtures of the underlying sources. The sources are the data projected onto new axes that must be discovered. Accordingly, this process is known as blind source separation (BSS), a category of algorithms that try to decompose mixed signals into their original sources. A classical example of separating a mixed signal is the cocktail party at which a band is playing [31]. The guests do not listen to each instrument of the band separately, but to the combination of all the instruments, voices, and noises of the environment. Is it possible to separate each sound source captured by the microphones? BSS algorithms address this question by trying to isolate each source.

Let us consider *N* time series each consisting of *M* samples (measured points). The aim is to find a transformation of these time series into a new representation in which independent components are identified and separated.

Formally, we represent the *N* measured time series

$$\mathbf{X}\_{i} = (x\_{i1}, x\_{i2}, \dots, x\_{iM})^T, \quad i = 1, \dots, N \tag{1}$$

compactly by a matrix **X** whose rows are the transposed time series

$$\mathbf{X} = \begin{pmatrix} x\_{11} & x\_{12} & \cdots & x\_{1M} \\ \vdots & \vdots & \ddots & \vdots \\ x\_{N1} & x\_{N2} & \cdots & x\_{NM} \end{pmatrix} . \tag{2}$$

This *N* × *M* matrix is assumed to be a linear mixture of the original signals, which can also be represented by another *N* × *M* matrix **S** with a structure similar to **X**, i.e., the rows of **S** are the transposed original time series **S**<sub>*i*</sub> = (*s*<sub>*i*1</sub>, *s*<sub>*i*2</sub>, ..., *s*<sub>*iM*</sub>)<sup>*T*</sup>. The linear combination may be expressed by

$$\mathbf{X} = A \mathbf{S}, \tag{3}$$

where *A*, the so-called mixing matrix, represents the linear transformation. Keeping the analogy of the cocktail party, **X** corresponds to the sounds heard by the guests and **S** to the original sounds. The main objective of ICA is to determine the mixing matrix *A* and the original sources **S**. This task is formulated as a two-step inverse problem: first, a demixing matrix *W* must be found, and then, based on this matrix, the sources are calculated by
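As a minimal numerical sketch of the mixing model (3), assuming a hypothetical 2 × 2 mixing matrix and two synthetic sources (all names and values below are illustrative, not from the paper):

```python
import numpy as np

# Two original sources S (N = 2 time series, M = 500 samples each):
# a sinusoid and a sawtooth, standing in for the band's instruments.
t = np.linspace(0, 8, 500)
S = np.vstack([np.sin(2 * np.pi * t),   # source 1
               2 * (t % 1.0) - 1.0])    # source 2

# A hypothetical 2x2 mixing matrix A (unknown in a real application).
A = np.array([[1.0, 0.5],
              [0.7, 2.0]])

# Observations X = A S: each row is what one "microphone" records.
X = A @ S

# If A were known, the sources could be recovered exactly via W = A^{-1};
# ICA is needed precisely because A is not known.
W = np.linalg.inv(A)
S_rec = W @ X
print(np.allclose(S_rec, S))  # True
```

This also illustrates why the problem is called blind: only `X` is observed, while both `A` and `S` must be estimated.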

$$\mathbf{S} = W\mathbf{X}. \tag{4}$$

Since the problem is highly underdetermined, the direct calculation of *W* or *A* is not possible. Instead, an estimate **Y** = **Ŝ** ≈ **S** of the sources is obtained by calculating a demixing matrix *W* that acts on **X** such that

$$\mathbf{Y} = W\mathbf{X} = \hat{\mathbf{S}}, \tag{5}$$

with *W* ≈ *A*<sup>−1</sup>.

To perform this approximation, ICA algorithms use some factorization of the observed data (mainly singular value decomposition) and higher-order statistics (such as the fourth moment, kurtosis) to measure the separation between signal and noise. From a statistical point of view, the separated signals must be independent, and the independent components must have non-Gaussian distributions [32]. Based on this non-Gaussianity, most ICA methods calculate *W* by estimating the inverse of *A*, which then allows the calculation of the sources. The trick behind this process is to find the *A*<sup>−1</sup> that maximizes the non-Gaussianity of the independent components. Usually, this is done by maximum-likelihood estimation, maximization of the output entropy, or minimization of the mutual information of the output [33].

In this paper, non-Gaussianity is measured by the concept of negentropy, as presented and discussed by [32] in the algorithm called fastICA. The idea behind negentropy comes from information theory: among all random variables with the same covariance, the Gaussian one has the largest entropy *H*. Negentropy *J* is therefore non-negative and equals zero only for Gaussian-distributed data. It is calculated as:

$$J(\mathbf{x}) = H(\mathbf{x\_{gauss}}) - H(\mathbf{x}) \, , \tag{6}$$

where **x**<sub>gauss</sub> is a Gaussian random variable with the same covariance as **x**.
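In practice, (6) is rarely evaluated directly, since it would require estimating densities; [32] instead proposes contrast-function approximations of the form *J*(*y*) ∝ (E[*G*(*y*)] − E[*G*(*ν*)])² with, e.g., *G*(*u*) = log cosh *u* and *ν* a standard Gaussian. A minimal sketch of this approximation (sample sizes and seed are arbitrary choices of this sketch):

```python
import numpy as np

rng = np.random.default_rng(1)

def negentropy_approx(y, n_ref=200_000):
    # Contrast-function approximation J(y) ~ (E[G(y)] - E[G(nu)])^2,
    # with G(u) = log cosh(u) and nu a standard Gaussian reference.
    # y is standardized first, matching the covariance condition in (6).
    y = (np.asarray(y) - np.mean(y)) / np.std(y)
    nu = rng.standard_normal(n_ref)
    G = lambda u: np.log(np.cosh(u))
    return (np.mean(G(y)) - np.mean(G(nu))) ** 2

gauss = rng.standard_normal(50_000)    # near-zero negentropy
uniform = rng.uniform(-1, 1, 50_000)   # sub-Gaussian: positive negentropy

print(negentropy_approx(uniform) > negentropy_approx(gauss))  # True
```

The Gaussian sample scores close to zero while the uniform one does not, which is exactly the property fastICA maximizes.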

The fastICA algorithm is based on a fixed-point scheme for finding *W* ≈ *A*<sup>−1</sup> through maximization of the negentropy. Based on that matrix, the sources can then be approximately rebuilt as written in (5).
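The fixed-point scheme is available in off-the-shelf libraries; the sketch below uses scikit-learn's `FastICA` (one possible implementation, not necessarily the one used by the authors; the `whiten="unit-variance"` keyword assumes scikit-learn ≥ 1.1) on a synthetic two-source mixture:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)

# Synthetic sources: a periodic signal and a non-periodic one carrying
# an abrupt mean shift, mimicking the attack-bearing component.
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * np.pi * t)
s2 = 0.3 * rng.standard_normal(2000)
s2[1200:] += 3.0                        # abrupt shift (the "attack")
S = np.c_[s1, s2]                       # (M samples) x (N sources)

A = np.array([[1.0, 0.6],
              [0.4, 1.3]])              # hypothetical mixing matrix
X = S @ A.T                             # observed mixtures

ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
Y = ica.fit_transform(X)                # estimated sources: Y matches S
                                        # up to permutation, sign, scale
W = ica.components_                     # estimated demixing matrix
```

Note that ICA recovers the sources only up to permutation, sign, and scale, which is why the next step must first identify which recovered component is the one of interest.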

#### *3.2. Abrupt Change Point Detection-ACPD*

After source separation by fastICA, it is expected that one of the sources will be affected by the cyber-attack. To detect this change, an abrupt change point detection (ACPD) algorithm is applied. ACPD evaluates one or more statistical parameters of the time series, so-called control variables.

For a formal definition, following the ACPD algorithm proposed by [34], let us first identify, among the separated signals provided by fastICA, the one that best represents the kind of signal we are interested in. In our case, we must identify the series that mainly represents non-periodic behavior. Let **Y**<sub>(1)</sub> = (*y*<sub>11</sub>, *y*<sub>12</sub>, ..., *y*<sub>1*M*</sub>)<sup>*T*</sup>, one of the signals obtained by (5), be our series of interest, where *M* is the size of the time series. The algorithm tries to identify the, say *m*, change points in this time series, positioned at indexes *τ*<sub>1</sub>, ..., *τ*<sub>*m*</sub>. Each position *τ*<sub>*i*</sub> is an integer between 1 and *M* − 1 and splits the time series into intervals [*τ*<sub>*i*</sub>, *τ*<sub>*i*+1</sub>].
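The text above does not specify how the non-periodic component is identified among the fastICA outputs. One simple heuristic (an assumption of this sketch, not the criterion of the original paper) is to rank components by how much of their spectral power sits in a single frequency bin:

```python
import numpy as np

def periodicity_score(y):
    # Share of spectral power concentrated in the single strongest
    # frequency bin; close to 1 for a purely periodic component.
    # NOTE: this selection heuristic is an assumption of this sketch,
    # not the criterion used by the paper's authors.
    p = np.abs(np.fft.rfft(y - np.mean(y))) ** 2
    p = p[1:]                     # drop the DC bin
    return p.max() / p.sum()

rng = np.random.default_rng(4)
t = np.arange(1024) / 128.0       # 8 s at 128 Hz: a 4 Hz tone fits exactly
periodic = np.sin(2 * np.pi * 4 * t)
nonperiodic = rng.standard_normal(1024)

# The component with the LOWER score is taken as the non-periodic
# series of interest, Y_(1).
print(periodicity_score(periodic) > periodicity_score(nonperiodic))  # True
```

Any criterion that discriminates periodic from non-periodic behavior (autocorrelation, spectral flatness, etc.) could play the same role.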

A common approach to estimating *τ* = (*τ*<sub>1</sub>, ..., *τ*<sub>*m*</sub>) is to minimize the objective function:

$$\sum\_{i=0}^{m} f\left(\tau\_{i}, \tau\_{i+1}\right) + \beta p(m), \tag{7}$$

where *f*(*τ*<sub>*i*</sub>, *τ*<sub>*i*+1</sub>) is a cost function for the time series over the interval [*τ*<sub>*i*</sub>, *τ*<sub>*i*+1</sub>], with the conventions *τ*<sub>0</sub> = 0 and *τ*<sub>*m*+1</sub> = *M*. Several cost functions have been proposed in the literature, such as the log-likelihood [35], the quadratic loss, or cumulative sums [36]. Moreover, *βp*(*m*) is a penalty term that avoids overfitting. The most common choice, according to [34], is the linear penalty *p*(*m*) = *m*, so that *βp*(*m*) = *βm*. This constraint allows the method to estimate a vector *τ* as a trade-off between the minimization of the cost function (favored by many change points) and the minimization of the penalty (favored by few change points) [37].
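The penalized objective (7) can be minimized exactly by dynamic programming over candidate change points ("optimal partitioning"). The sketch below uses the quadratic loss as *f* and the linear penalty *βm*; it is an illustrative O(*M*²) implementation, not the pruned, faster algorithm of [34]:

```python
import numpy as np

def detect_changes(y, beta):
    # Exactly minimize  sum_i f(tau_i, tau_{i+1}) + beta * m  with f the
    # quadratic loss (sum of squared deviations from the segment mean).
    M = len(y)
    # Prefix sums let each segment cost be evaluated in O(1).
    c1 = np.concatenate([[0.0], np.cumsum(y)])
    c2 = np.concatenate([[0.0], np.cumsum(np.asarray(y) ** 2)])

    def cost(a, b):               # quadratic loss on y[a:b]
        s = c1[b] - c1[a]
        return (c2[b] - c2[a]) - s * s / (b - a)

    F = np.full(M + 1, np.inf)    # F[t]: best penalized cost of y[:t]
    F[0] = -beta                  # cancels the first segment's penalty
    prev = np.zeros(M + 1, dtype=int)
    for t in range(1, M + 1):
        cands = [F[s] + cost(s, t) + beta for s in range(t)]
        s_best = int(np.argmin(cands))
        F[t], prev[t] = cands[s_best], s_best

    taus, t = [], M               # backtrack tau_1 < ... < tau_m
    while prev[t] > 0:
        taus.append(int(prev[t]))
        t = prev[t]
    return sorted(taus)

# A step signal with mean shifts at indices 100 and 180.
rng = np.random.default_rng(3)
y = np.concatenate([np.zeros(100), 4 * np.ones(80), np.ones(70)])
y += 0.3 * rng.standard_normal(y.size)
print(detect_changes(y, beta=10.0))   # change points near 100 and 180
```

Increasing `beta` suppresses spurious change points at the risk of missing real ones, which is exactly the trade-off described above.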

The entire process can be summarized as follows:

1. Apply fastICA to the observed mixtures **X** to obtain the estimated sources **Y** as in (5).
2. Identify the component of **Y** of interest, i.e., the one mainly representing non-periodic behavior.
3. Minimize the penalized cost (7) over the candidate change points.
4. Return the estimated vector *τ*.
The result of this process is exactly the set of components of *τ*. In this work, each component of the source signal **Y** obtained by fastICA through (5) is evaluated by the ACPD algorithm, and the vector *τ* corresponds to the start and end times of an attack.
