### *3.5. Correlation Analysis*

The goal in this step was not only to determine whether there was a linear correlation between image features and the power ramp events, but also to identify the time intervals at which that correlation was strongest. Because acquisition was triggered per ramp event, the data are not contiguous in their entirety; however, during several periods the variations occurred close enough together for the data to overlap and form an almost continuous set of data points.

To determine without bias which time interval would be most adequate for modeling the ramp events, several different values were analyzed. The power variation between points was calculated for each possible pair of data points that fits within the different intervals. Because of the disconnection between data points, fewer pairs fit within an interval as it grows larger, so the maximum interval used was 90 s. The corresponding variable for the correlation analysis is obtained directly from the images: the image associated with each of the two data points used to calculate the power difference was subtracted from the other. After subtraction of each digital channel (RGB), the energy (image energy is calculated by summing the individual pixel values in an image or ROI) was calculated for a circular Region of Interest (ROI) around the sun. Different ROI radii (distance in pixels) were used to account for cloud movement (speed) over a given interval (time).
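As an illustration of this computation, the sketch below subtracts two RGB frames channel by channel and sums the pixel values of the difference inside a circular ROI, following the energy definition above. It assumes the frames are available as NumPy arrays and that the sun's pixel position is known; the function name and array layout are illustrative, not the implementation used in this work.

```python
import numpy as np

def roi_energy_difference(img_t0, img_t1, center, radius):
    """Energy of the per-channel difference image inside a circular ROI.

    img_t0, img_t1 : HxWx3 uint8 arrays (RGB frames at the two instants)
    center         : (row, col) of the sun's position in pixels
    radius         : ROI radius in pixels
    Returns one energy value per color channel.
    """
    # Subtract each digital channel; cast to int to avoid uint8 wrap-around.
    diff = img_t0.astype(np.int32) - img_t1.astype(np.int32)

    # Boolean mask selecting pixels within `radius` of the sun's center.
    rows, cols = img_t0.shape[:2]
    yy, xx = np.ogrid[:rows, :cols]
    mask = (yy - center[0]) ** 2 + (xx - center[1]) ** 2 <= radius ** 2

    # Image energy: sum of the pixel values inside the ROI, per channel.
    return [diff[..., c][mask].sum() for c in range(3)]

# Example usage with two synthetic 480x640 RGB frames (placeholder data):
rng = np.random.default_rng(0)
f0 = rng.integers(0, 256, (480, 640, 3), dtype=np.uint8)
f1 = rng.integers(0, 256, (480, 640, 3), dtype=np.uint8)
print(roi_energy_difference(f0, f1, center=(240, 320), radius=100))
```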

Aside from the power difference and the subtracted-image energy, the instantaneous power measurements and the temperature measurement were also analyzed. In total, 84 combinations of time intervals Δ*t* = {1; 2; 5; 8; 10; 15; 20; 30; 45; 60; 75; 90} s and ROI radii *r* = {25; 50; 75; 100; 150; 200; 250} px were analyzed in this step. To present the results concisely, each combination of interval and radius was assigned an index that is used to identify it throughout this work. Table 4 contains the keys to identify the combinations from their respective indexes.


**Table 4.** Indexes used to identify the combinations of Δ*t* and ROI radius.
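For illustration only, the sketch below enumerates the 84 (Δ*t*, *r*) combinations under a hypothetical sequential indexing scheme; the actual index assignment is the one defined in Table 4.

```python
# Hypothetical sequential indexing of the 84 (Δt, r) combinations; the
# mapping actually used in this work is the one given in Table 4.
intervals = [1, 2, 5, 8, 10, 15, 20, 30, 45, 60, 75, 90]   # Δt in seconds
radii = [25, 50, 75, 100, 150, 200, 250]                   # ROI radius in pixels

combinations = {idx: (dt, r)
                for idx, (dt, r) in enumerate(
                    (dt, r) for dt in intervals for r in radii)}

print(len(combinations))   # 84
print(combinations[0])     # (1, 25) under this sequential scheme
```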

Correlation coefficients were calculated for each combination of the target variables (power at *t*<sub>0</sub>, *P*<sub>0</sub>; and power difference between *t*<sub>0</sub> and *t*<sub>0</sub>−Δ*t*, Δ*P*) and the aforementioned variables (power at *t*<sub>0</sub>−Δ*t*, *P*<sub>−1</sub>; temperature at *t*<sub>0</sub>, *T*<sub>0</sub>; and ROI energy differences between *t*<sub>0</sub> and *t*<sub>0</sub>−Δ*t*). Correlation coefficients measure the linear proportionality between data pairs, which shows whether a linear regression model would suffice for this problem.
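A minimal sketch of this computation for one variable pair, using `scipy.stats.pearsonr`; the variable names and synthetic values below are placeholders standing in for the measured series:

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder series for one (Δt, r) combination; names are illustrative.
rng = np.random.default_rng(0)
p_prev = rng.uniform(0.0, 5.0, 200)           # P-1: power at t0 - Δt
de_blue = rng.normal(size=200)                # blue-channel ROI energy difference
p0 = 0.95 * p_prev + 0.05 * de_blue \
     + rng.normal(scale=0.05, size=200)       # P0: power at t0 (synthetic)

# Pearson coefficient for the pair (P0, P-1).
r, p_value = pearsonr(p0, p_prev)
print(f"r = {r:.3f} (p = {p_value:.2g})")
```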

### *3.6. Neural Network Modeling*

To validate the obtained data, a baseline regression performance was first defined by performing a multivariate linear regression to model *P*<sub>0</sub> and Δ*P* as functions of *P*<sub>−1</sub>, *T*<sub>0</sub> and the previously introduced image attribute of the blue channel. Only one color channel was used to prevent collinearity from adversely affecting the regression. To evaluate the regression performance, the coefficient of determination (R<sup>2</sup>) was employed, as it measures how well the model represents the data used for regression.
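A minimal sketch of such a baseline, assuming scikit-learn is available and using synthetic placeholder arrays in place of the measured series:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Placeholder data standing in for the measured variables.
rng = np.random.default_rng(0)
n = 500
p_prev = rng.uniform(0.0, 5.0, n)       # P-1 (power at t0 - Δt)
t0_temp = rng.uniform(15.0, 40.0, n)    # T0 (temperature at t0)
de_blue = rng.normal(size=n)            # blue-channel ROI energy difference
p0 = p_prev + 0.05 * de_blue + rng.normal(scale=0.05, size=n)  # synthetic P0

# Multivariate linear regression of P0 on (P-1, T0, blue-channel attribute),
# evaluated with the coefficient of determination.
X = np.column_stack([p_prev, t0_temp, de_blue])
model = LinearRegression().fit(X, p0)
print(f"R^2 = {r2_score(p0, model.predict(X)):.3f}")
```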

All attempted linear regressions presented low R<sup>2</sup> despite showing low error, most likely due to the extremely low variation rates in the data. This aligns with the information obtained from the correlation analysis, where *P*<sub>0</sub> and *P*<sub>−1</sub> showed high correlation coefficients for shorter time intervals; that fact alone, however, does not suffice to produce a good regression model. The other variables were statistically insignificant to the model, despite being relevant in theory. This pointed to the possible suitability of a nonlinear model, and for that step a regression neural network was chosen.

Artificial neural networks aim to mimic a brain's neuronal structure by assigning weights to the individual interconnections between neurons, and are thus capable of solving complex, nonlinear problems [30]. Although the correlation analysis only looked into linear correlation between pairs of variables, there are most likely more complex relationships among these variables, and by increasing the size and complexity of a neural network, it should be able to model them.

A multilayer perceptron (MLP) network was used to validate the acquired data and the selected image features. The network used in this work had fully connected neurons to map underlying relationships between the selected variables. If a certain connection does not prove relevant to the problem, the learning process assigns it a low synaptic weight. The feed-forward network was trained with the error-backpropagation algorithm [30].

In it, the function signals resulting from the response of the activation functions move forward through the interconnected neurons, weighted by the synaptic weights, until they reach the output layer. The result is compared to a previously known value, the error values are propagated backwards through the network, and the synaptic weights are adjusted to minimize those errors. This process may take several iterations depending on the complexity of the model and the network [30].
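The toy sketch below makes this forward/backward cycle concrete for a single hidden layer trained by gradient descent on synthetic data; it is a didactic example of the mechanism, not the network used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonlinear regression problem (placeholder data).
X = rng.uniform(-1, 1, (200, 2))
y = np.sin(X[:, :1]) * X[:, 1:]          # target, shape (200, 1)

# One hidden layer of 8 tanh neurons, linear output neuron.
W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
lr = 0.1

for epoch in range(2000):
    # Forward pass: signals flow through the weighted connections.
    h = np.tanh(X @ W1 + b1)
    y_hat = h @ W2 + b2

    # Error at the output layer (mean-squared-error gradient).
    err = y_hat - y

    # Backward pass: propagate the error and adjust the synaptic weights.
    grad_W2 = h.T @ err / len(X)
    grad_b2 = err.mean(axis=0)
    err_h = (err @ W2.T) * (1 - h ** 2)  # tanh derivative
    grad_W1 = X.T @ err_h / len(X)
    grad_b1 = err_h.mean(axis=0)

    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2

print("final MSE:", float((err ** 2).mean()))
```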

This process has the potential to overfit the model to the presented data, rendering it unsuitable for interpreting new data. To avoid this, the data provided need to be of sufficient size and pertinent to the problem, a suitable network architecture and size must be used, the problem must not be complex beyond what the model can handle, and the training process must be stopped before the model overfits the training data.

For the first issue, in the context of this work, the data-acquisition procedure and feature selection were tailored to the problem at hand, so the representativeness of the dataset should be sufficient. As for sample size, the system acquired data for as long as it could, until the camera failed, most likely due to humidity damage to the circuitry or ultraviolet (UV) damage to the camera sensor.

Regarding the second issue, the MLP network was tested with several sizes and architectures to produce the highest accuracy and generalization possible. As for the complexity of the problem, that cannot be changed, but the representativeness of the variables used should provide the network with enough valuable information. Again, that is also a result of the tailoring of the data-acquisition procedures to the very short-term forecast problem.

Finally, regarding overfitting by overtraining, a cross-validation approach [30] was applied to the backpropagation learning. The training sample was split into two subsets: one to perform the actual learning, with error backpropagation and synaptic-weight adjustment, and the other to evaluate the error on a fresh set of data to which the model could not have been overfitted. By comparing the network's performance on both subsets, the onset of overfitting becomes clear: while the error on the training set keeps decreasing, the error on the validation set starts to increase, meaning the model has overfitted the training set and is losing generalization capability.
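As a sketch of this stopping criterion, scikit-learn's `MLPRegressor` can hold out a fraction of the training sample as a validation subset and halt training once the validation score stops improving; the architecture, hyperparameters and data below are placeholders, not the configuration used in this work.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Placeholder features and target standing in for the measured series.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (1000, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] * X[:, 2]

# early_stopping=True splits off a validation subset; training stops once
# the validation score has not improved for n_iter_no_change iterations,
# i.e., before the network overfits the training subset.
mlp = MLPRegressor(hidden_layer_sizes=(20, 10),
                   early_stopping=True,
                   validation_fraction=0.2,
                   n_iter_no_change=20,
                   max_iter=2000,
                   random_state=0).fit(X, y)
print("stopped after", mlp.n_iter_, "iterations")
```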
