#### *4.2. Neural Network Architecture*

Networks with different architectures were trained for each combination of Δ*t* and ROI radius to determine the best architecture for this model. Unlike the default random split employed by the Matlab neural network training tool, an interleaved division algorithm was used for this application to ensure that data from every day were available for both training and validation, thus ensuring maximum representativeness. Training used 70% of the set, and the remaining 30% was used for validation. Other splits were tested in preliminary runs; however, this proportion showed the least variation among results when run multiple times. Normalization is an important step in neural network training, framing all values between 0 and 1 so that the gradients applied to the synaptic weight updates are always decreasing [30].
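
As an illustration, the division and normalization steps described above can be reproduced in Matlab roughly as follows. This is a minimal sketch: the variable names and the placeholder data are ours, not from the original code.

```matlab
% Minimal sketch of the interleaved 70/30 split and [0, 1] normalization.
% X and T are placeholders for the real input/target matrices
% (one column per sample), not the data used in this work.
X = rand(5, 1000);                 % 5 input variables, 1000 samples
T = rand(1, 1000);                 % one target variable (P0 or dP)

Xn = mapminmax(X, 0, 1);           % frame every input row between 0 and 1
Tn = mapminmax(T, 0, 1);           % same scaling for the target

net = fitnet([15 10]);             % two hidden layers with 15 and 10 neurons
                                   % (the architecture chosen later in the text)
net.divideFcn = 'divideint';       % interleaved division, not 'dividerand'
net.divideParam.trainRatio = 0.70; % 70% of the samples for training
net.divideParam.valRatio   = 0.30; % 30% for validation
net.divideParam.testRatio  = 0;    % no separate test subset at this stage
```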

The Matlab neural network training tool is highly customizable, but some of the default values for data-fitting problems such as this one were left unchanged: the Levenberg–Marquardt backpropagation algorithm, the mean-squared-error performance metric and the hyperbolic tangent sigmoid (tansig) transfer function for the neurons. These defaults were kept because they yielded solid results, and tuning them was beyond the machine learning scope of this work.
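
In Matlab these settings can be made explicit as follows; since they are already the fitnet defaults for data fitting, this sketch (continuing from the one above) is purely illustrative.

```matlab
% The defaults mentioned above, written out explicitly.
net.trainFcn   = 'trainlm';            % Levenberg-Marquardt backpropagation
net.performFcn = 'mse';                % mean-squared-error performance metric
net.layers{1}.transferFcn = 'tansig';  % hyperbolic tangent sigmoid neurons
net.layers{2}.transferFcn = 'tansig';  % (both hidden layers)

[net, tr] = train(net, Xn, Tn);        % tr records the train/val indices
```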

Training was performed for both target variables, *P*<sup>0</sup> and Δ*P*, since the two had very different behaviors and neither was successfully represented by linear models. A diagram of the relationships between the inputs, neurons and modeled variables is presented in Figure 10.

**Figure 10.** Diagram of inputs, outputs and layers in the tested networks.

First, the *P*<sup>0</sup> coefficient of determination is presented in Figure 11 for the different architectures and combinations of Δ*t* and ROI radius.

The model was trained with all five input variables previously used for the correlation analysis and linear regression (*P*<sup>−1</sup>, *T*<sup>0</sup> and the image attributes from all three channels). Each line on the plot represents a different network architecture, with either one or two hidden layers and the layer sizes listed in the legend. Thicker lines represent networks with two hidden layers.
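
A sketch of this sweep, continuing from the earlier snippets, is shown below. The layer-size list is an assumption: the text only explicitly confirms five-neuron first layers and the [15 10] network, so the exact legend of Figure 11 is not reproduced here.

```matlab
% Hypothetical architecture sweep for one Delta-t / ROI-radius combination;
% the architecture list is illustrative, not the exact legend of Figure 11.
architectures = {5, 10, 15, [5 5], [15 10], [20 10]};
R2 = zeros(1, numel(architectures));
for k = 1:numel(architectures)
    net = fitnet(architectures{k});
    net.divideFcn = 'divideint';
    net.divideParam.trainRatio = 0.70;
    net.divideParam.valRatio   = 0.30;
    net.divideParam.testRatio  = 0;
    [net, tr] = train(net, Xn, Tn);
    Y = net(Xn(:, tr.valInd));          % estimates on the validation subset
    r = corrcoef(Y, Tn(tr.valInd));
    R2(k) = r(1, 2)^2;                  % coefficient of determination
end
```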

For the first two Δ*t* values, all plot lines are indistinguishably close and show good coefficients of determination, which is consistent with the results from the correlation analysis and linear regression. After this point, there is a dip in regression performance consistent with the linear evaluations; performance then starts improving again, reaching even higher R<sup>2</sup> values than in the initial Δ*t* range.

**Figure 11.** R<sup>2</sup> values for neural network regression models for *P*<sup>0</sup>.

Networks with five neurons in the first hidden layer yield the worst results. The other architectures vary, and no single architecture seems significantly better than the rest. That said, networks with two hidden layers behave very similarly to one another in most cases and vary less in amplitude than networks with a single hidden layer. Performance starts decreasing again for the last two Δ*t* values, which may be due to a less relevant relationship between input and output or to the reduced availability of training samples. This occurs because the data are not contiguous; therefore, with larger time intervals, the number of data points that can be related decreases. These results show that *P*<sup>0</sup> is more accurately modeled with a nonlinear method, such as neural networks.

The same method, variables and architectures were used for modeling Δ*P*, and the R<sup>2</sup> values for this step are depicted in Figure 12.

This result showed that, for the first four Δ*t* values, neither a linear nor a nonlinear method was capable of properly fitting these data. From the fifth Δ*t* value onward, the neural network model starts presenting good R<sup>2</sup>, around 0.9. Similar to the previous plot, networks with five neurons in the first hidden layer are clearly inferior to the other tested architectures for most data points. After the fifth Δ*t* value, the R<sup>2</sup> behaves similarly in both plots, reaching the highest coefficient of determination for Δ*t* = 60 s, closely followed by Δ*t* = 15 s. For both intervals, there are small peaks around ROI radii of 75 and 200 pixels.

One significant difference between the two is the absence, in the Δ*P* models, of the four drastically lower coefficient-of-determination points, but that may be due to the lack of outlier analysis prior to model training. Since these regressions were made with all input variables, it was necessary to verify whether all variables were relevant for representing the target data. For this reason, the same training processes were performed for both target variables while varying the inputs. The chosen architecture had two hidden layers with 15 and 10 neurons, respectively, which showed some of the highest R<sup>2</sup> values and varied less than the others. The inputs used for each model are displayed in Table 5. Each row represents one model, and each column represents one of the five variables; an X marks when a variable is used.

**Figure 12.** R<sup>2</sup> values for neural network regression models for Δ*P*.


**Table 5.** Variables used in the second step of NN modeling.

The results from this training with different input variables for target variable Δ*P* are shown in Figure 13. Each line represents one model on the plot, and, to save room in the legend, the variables *P*<sup>−1</sup> (power at instant 0 − Δ*t*) and *T*<sup>0</sup> (temperature at instant zero) were shortened to *P* and *T*, respectively. The red, green and blue channel attributes are represented by *R*, *G* and *B*, respectively. In order to reduce some of the randomness attributed to the initialization of the network weights and the data division, each network was trained five times, and the best result was selected.
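
A rough sketch of this ablation loop is given below, under the assumption that the input rows are ordered [P; T; R; G; B]; the subset list shows only a few of the rows of Table 5.

```matlab
% Input-variable ablation with five repeats per subset, keeping the best
% validation R2. The rows of `subsets` are examples; the full list of
% combinations is the one in Table 5.
subsets = {1:5, 2:5, [1 3 4 5], [1 2], [1 2 3 5]};  % all; no P; no T; no image; P,T,R,B
bestR2 = zeros(1, numel(subsets));
for s = 1:numel(subsets)
    for rep = 1:5                          % five runs to tame random initialization
        net = fitnet([15 10]);
        net.divideFcn = 'divideint';
        net.divideParam.trainRatio = 0.70;
        net.divideParam.valRatio   = 0.30;
        net.divideParam.testRatio  = 0;
        [net, tr] = train(net, Xn(subsets{s}, :), Tn);
        Y = net(Xn(subsets{s}, tr.valInd));
        r = corrcoef(Y, Tn(tr.valInd));
        bestR2(s) = max(bestR2(s), r(1, 2)^2);  % keep the best of the five runs
    end
end
```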

**Figure 13.** R<sup>2</sup> values for neural network regression models for Δ*P* with varying input variables.

The first thing that stands out in this plot is the three lower-performing models, each missing either the power, the temperature or the image attributes (*P*, *T* or [*R*, *G*, *B*]). The best result was considered to be the one with all variables, which reached the highest R<sup>2</sup> value (>0.98) and was the best for several points. This result was achieved for Δ*t* = 60 s and ROI radius = 250 pixels.

The same methodology was applied to training with *P*<sup>0</sup> as the target, with the same combinations of variables; the results are shown in Figure 14.

**Figure 14.** R<sup>2</sup> values for neural network regression models for *P*<sup>0</sup> with varying input variables.

Similar to Figure 13, the three worst variable selections have either power, temperature or the image attributes missing, but in this case power has a bigger impact. However, as Δ*t* increases, the importance of *P*<sup>−1</sup> decreases, and not just linearly as previously thought. Moreover, for *P*<sup>0</sup> the difference between using two or three image attributes is smaller than for Δ*P*; models with all input variables seem to fare slightly better overall, with those using just the red and blue channels close behind. The highest R<sup>2</sup> (≈0.97) is obtained with just red and blue (*P*, *T*, *R*, *B*) at Δ*t* = 60 s and ROI radius = 250 pixels.

It is safe to say that both variables were successfully modeled using neural networks, especially compared with linear models. In both cases, previous power, temperature and the image attributes from image subtraction proved to be important for modeling the targets.

#### *4.3. Best Neural Network Results*

Finally, the selected architecture of two hidden layers with 15 and 10 neurons, respectively, was trained several times, using all five input variables with data from the Δ*t* = 60 s step and ROI radius = 250 pixels, to provide further insight into its performance and to finish the validation step.

First, the network was tested modeling *P*<sup>0</sup>, using all five input variables. The coefficient of determination obtained for the validation process was R<sup>2</sup> = 0.94. This means that, when presented with data that had not been used to train the network, the model was still capable of estimating the output close to the real measured value.
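
This validation figure can be obtained from the training record, evaluating the network only on the samples that Matlab held out for validation; the sketch below continues from the earlier ones.

```matlab
% Validation R2 for the final [15 10] model (dt = 60 s, ROI = 250 px).
Yval  = net(Xn(:, tr.valInd));        % estimates for samples unseen in training
r     = corrcoef(Yval, Tn(tr.valInd));
R2val = r(1, 2)^2;                    % reported as 0.94 for P0
plotregression(Tn(tr.valInd), Yval);  % regression plot as in Figure 15
```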

Figure 15 presents the regression plot from this model; the blue line represents the model, and the points are pairs of estimated versus real values for each input sample.

**Figure 15.** Regression plot for the validation of an NN model of *P*<sup>0</sup> with Δ*t* = 60 s step and ROI radius = 250 pixels.

In a perfect model, all points would lie on the line; this result shows a very close representation of the relationship between the input variables and the target variable. Next, the same process was applied to modeling Δ*P*, and the results are presented in Figure 16.

**Figure 16.** Regression plot for the validation of an NN model of Δ*P* with Δ*t* = 60 s step and ROI radius = 250 pixels.

In this case, the validation R<sup>2</sup> = 0.93, and the regression plot again shows how well the model represents the relationship between the input and target variables. It is safe to claim that neural networks are well suited to modeling this type of data.
