*2.4. Pearson Correlation*

The primary goal of this association method is to identify variables that are correlated with one another [45].

The Pearson correlation coefficient measures the strength and direction of the linear relationship between two variables. Let *X* and *Y* be these variables, with measurements $\{x\_1, x\_2, \ldots, x\_n\}$ and $\{y\_1, y\_2, \ldots, y\_n\}$ and means $\overline{x}$ and $\overline{y}$, respectively. The Pearson coefficient is then given by Equation (3) [42].

$$\rho(X,Y) = \frac{\sum\_{i=1}^{n} \left(x\_i - \overline{x}\right) \left(y\_i - \overline{y}\right)}{\left[\sum\_{i=1}^{n} \left(x\_i - \overline{x}\right)^2 \sum\_{i=1}^{n} \left(y\_i - \overline{y}\right)^2\right]^{\frac{1}{2}}} \tag{3}$$
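Equation (3) translates directly into a few lines of code. The following is a minimal sketch in plain Python (standard library only); the function name `pearson` and the sample data are our own illustrative choices, not part of the cited method. It centres both samples on their means, sums the cross-products for the numerator, and takes the square root of the product of the two sums of squares for the denominator.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient, computed term by term as in Equation (3)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length n >= 2")
    n = len(x)
    x_bar = sum(x) / n  # mean of X
    y_bar = sum(y) / n  # mean of Y
    dx = [xi - x_bar for xi in x]  # deviations (x_i - x_bar)
    dy = [yi - y_bar for yi in y]  # deviations (y_i - y_bar)
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        # A constant variable has zero variance, so rho is undefined
        raise ValueError("correlation is undefined when a variable is constant")
    return numerator / denominator


# Hypothetical example data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(f"rho = {pearson(x, y):.4f}")
```

For these data the numerator is 6 and the two sums of squares are 10 and 6, so the sketch prints `rho = 0.7746` (that is, $6/\sqrt{60}$), a moderately strong direct correlation.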

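The Pearson coefficient lies in the range *ρ* ∈ [−1, 1], and its magnitude represents the strength of the linear correlation. A positive *ρ* indicates a direct correlation (the variables tend to increase together), a negative *ρ* indicates an inverse correlation (one tends to decrease as the other increases), and *ρ* = 0 indicates no linear correlation [42].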
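As a worked illustration of the endpoints of this range (our own derivation from Equation (3), not taken from [42]), suppose *Y* is an exact linear function of *X*, $y\_i = a x\_i + b$ with $a \neq 0$, and *X* is not constant. Then $\overline{y} = a\overline{x} + b$, and substituting into Equation (3) gives

$$
\begin{aligned}
y\_i - \overline{y} &= a\left(x\_i - \overline{x}\right), \\
\rho(X,Y) &= \frac{a \sum\_{i=1}^{n} \left(x\_i - \overline{x}\right)^2}{\left[\sum\_{i=1}^{n} \left(x\_i - \overline{x}\right)^2 \cdot a^2 \sum\_{i=1}^{n} \left(x\_i - \overline{x}\right)^2\right]^{\frac{1}{2}}}
= \frac{a \sum\_{i=1}^{n} \left(x\_i - \overline{x}\right)^2}{|a| \sum\_{i=1}^{n} \left(x\_i - \overline{x}\right)^2}
= \frac{a}{|a|} = \pm 1.
\end{aligned}
$$

Thus *ρ* = +1 for a perfect direct linear relation (*a* > 0) and *ρ* = −1 for a perfect inverse one (*a* < 0); any scatter of the points around a single straight line reduces |*ρ*| below 1.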

When two variables are highly correlated, one of them may be redundant. However, the Pearson correlation captures only linear relationships, so it can misrepresent the dependence between variables in nonlinear systems. In classification problems with binary outputs, encoding the class as 0 or 1 allows the Pearson coefficient to measure how strongly each attribute correlates with the target class [42].
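Both points in the paragraph above can be illustrated with a short sketch in plain Python. The data are hypothetical and chosen by us for illustration: in the first case *y* = *x*² is completely determined by *x* on a symmetric interval, yet the linear coefficient is exactly zero; in the second case the binary class is encoded as 0/1 (an assumption about the encoding) and correlated with a single attribute. The `pearson` helper repeats Equation (3) so the block runs on its own.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient as in Equation (3)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    dx = [xi - x_bar for xi in x]
    dy = [yi - y_bar for yi in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


# Nonlinear case: y depends entirely on x (y = x^2), yet the linear rho is 0
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [xi ** 2 for xi in x]
rho_nonlinear = pearson(x, y)

# Binary target: hypothetical attribute values and a class encoded as 0/1
attribute = [1.0, 1.5, 2.0, 3.5, 4.0, 4.5]
target = [0, 0, 0, 1, 1, 1]
rho_target = pearson(attribute, target)

print(f"rho(x, x^2)           = {rho_nonlinear:.3f}")
print(f"rho(attribute, class) = {rho_target:.3f}")
```

The first coefficient is 0 even though the dependence is perfect, which is why a near-zero *ρ* should not be read as independence. The second is about 0.95, suggesting that this attribute separates the two classes well along a single direction.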

Additionally, the statistical significance of a correlation can be tested through its *p*-value, which is the probability of obtaining a coefficient at least as extreme as the observed one if the variables were in fact uncorrelated (the null hypothesis *ρ* = 0). By a common convention in the literature, a correlation with *p*-value > 0.05 is considered unreliable, that is, not statistically significant. Statistical tests that can support this determination include the *t*-test (based on the *t*-value), analysis of variance (ANOVA), and one-tailed or two-tailed tests [48].
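A common way to obtain this *p*-value is the *t*-test for a correlation coefficient: under the null hypothesis *ρ* = 0, the statistic $t = r\sqrt{(n-2)/(1-r^2)}$ follows a Student's *t* distribution with *n* − 2 degrees of freedom, where *r* is the sample coefficient from *n* pairs. The sketch below is our own standard-library illustration of a two-tailed version, not the procedure of [48]; it evaluates the *t* distribution with a closed-form finite series that holds for integer degrees of freedom. The function names are hypothetical. In practice, `scipy.stats.pearsonr` returns the coefficient together with a two-sided *p*-value.

```python
import math


def t_two_tailed_p(t: float, df: int) -> float:
    """Two-tailed p-value P(|T| >= |t|) for Student's t with integer df >= 1.

    Uses the finite series for A(t|df) = P(|T| < |t|) in terms of
    theta = atan(|t| / sqrt(df)); valid only for integer degrees of freedom.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c2 = math.sin(theta), math.cos(theta) ** 2
    odd = df % 2 == 1
    # Odd df: cos(theta) + (2/3)cos^3(theta) + ...; even df: 1 + (1/2)cos^2(theta) + ...
    k = 1 if odd else 0                     # power of cos(theta) in the current term
    term = math.cos(theta) if odd else 1.0  # current series term
    total = 0.0
    while k <= df - 2:
        total += term
        term *= c2 * (k + 1) / (k + 2)      # ratio between consecutive terms
        k += 2
    a = (2.0 / math.pi) * (theta + s * total) if odd else s * total
    return max(0.0, 1.0 - a)


def pearson_p_value(r: float, n: int) -> float:
    """Two-tailed p-value for H0: rho = 0, given a sample coefficient r from n pairs."""
    if n < 3:
        raise ValueError("at least 3 pairs are needed")
    if abs(r) >= 1.0:
        return 0.0  # perfect linear relation: t is unbounded
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t_two_tailed_p(t, df)


# r = 6 / sqrt(60) from five hypothetical pairs
p = pearson_p_value(6 / math.sqrt(60), 5)
print(f"p-value = {p:.3f}", "(significant)" if p <= 0.05 else "(not significant at 0.05)")
```

With only five pairs, *r* ≈ 0.77 yields a *p*-value of about 0.12. Under the 0.05 convention, that correlation would therefore be judged unreliable, despite its fairly large magnitude.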
