2.3.3. Simple Linear Regression

In this study, the predictive power and the relevant statistical relationships between each pair of variables are assessed by building several SLR models. Each SLR model applies a data transform (method) to the independent variable (i.e., the RHS variable) prior to parameter estimation, so the model remains linear in its parameters. A typical SLR model is defined as follows:
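The transform-then-fit idea can be sketched as follows. This is a minimal illustration with synthetic data, assuming a logarithmic transform as an example; the variable names and the transform choice are placeholders, not those of the study.

```python
import numpy as np

# Hypothetical illustration: apply a data transform g (here, log) to the
# independent variable, then fit Y = alpha + beta * g(X). The model remains
# linear in the parameters alpha and beta even though g is nonlinear.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 100.0, size=50)
Y = 2.0 + 3.0 * np.log(X) + rng.normal(0.0, 0.1, size=50)  # synthetic data

gX = np.log(X)  # transform applied prior to parameter estimation
A = np.column_stack([np.ones_like(gX), gX])  # design matrix [1, g(X)]
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(A, Y, rcond=None)
```

Because the transform is fixed before estimation, ordinary least squares still applies unchanged; only the regressor values differ between the models.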

$$Y_i = \alpha + \beta X_i + \varepsilon_i \tag{4}$$

where *α* and *β* are the intercept and the regression coefficient associated with the sole independent variable *X* in the model, respectively, both of which are to be estimated; given the *i*-th data tuple (*X<sub>i</sub>*, *Y<sub>i</sub>*) in the dataset, *ε<sub>i</sub>* is the residual of this data tuple with respect to the fitted line *Y* = *α* + *βX*; other symbols are as defined previously.
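The parameters in Eq. (4) have a closed-form ordinary least squares solution, sketched below on a small hypothetical dataset (the values are illustrative only, not from the study):

```python
import numpy as np

# Illustrative data tuples (X_i, Y_i); not the study's data.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form OLS estimates of beta and alpha in Eq. (4).
beta_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
alpha_hat = Y.mean() - beta_hat * X.mean()

# The residuals epsilon_i of Eq. (4); by construction they sum to zero.
residuals = Y - (alpha_hat + beta_hat * X)
```

Note that the residuals are defined with respect to the fitted line, so their sum (and their sample mean) is exactly zero whenever an intercept is estimated.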

The SLR model *Y* = *α* + *βX* can then be plotted as the best-fit line (i.e., the 'AB-line') through the data points (*X<sub>i</sub>*, *Y<sub>i</sub>*), ∀*i* ∈ {1, 2, . . . , *n*} in the data space. This model also satisfies the requirement of identifying the causal relationship and predictive power between two variables 'pair-wisely', i.e., the PCA used in the overall analysis (a requirement that also applies to the correlation analysis and the cosine similarity analysis).

In our analysis, every time two variables are paired, one variable becomes the dependent variable *Y* and the other becomes the independent variable *X*. This yields a total of *C*<sub>2</sub><sup>8</sup> × 2! = 56 'base models' of (*X*, *Y*), since in each case the two paired variables are interchangeable (each can appear on the RHS or LHS of the model, hence the factor 2!).
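The count of base models can be verified by enumerating the ordered pairs directly; the variable names below are placeholders for the study's eight variables:

```python
from itertools import permutations

# With 8 variables, each ordered pair (X, Y) defines one base model,
# so the count is C(8, 2) * 2! = 8 * 7 = 56.
variables = [f"v{k}" for k in range(1, 9)]  # placeholder variable names
base_models = list(permutations(variables, 2))  # ordered pairs (X, Y)
print(len(base_models))  # 56
```

Using ordered pairs (permutations rather than combinations) captures the interchangeability of the two variables in one step.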
