*2.4. ANN and Machine Learning Analysis*

In a computational environment, we randomly split the data into 80% for training the ANN algorithm and 20% for testing it. The number of spectral wavelengths (n = 360) was the same for each measurement day. The spectral wavelengths were used as inputs for the ANN, and one hidden layer with n neurons was considered. A linear activation function was applied in the output layer. We adopted the Adam optimizer with regularization of α = 0.0001 and used the open-source version of the RapidMiner v. 9.4 software.
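As a minimal sketch of this setup, the snippet below reproduces the split and the network configuration in Python with scikit-learn rather than RapidMiner; the arrays `X` and `y` are synthetic placeholders for the spectral data, and the logistic output of `MLPClassifier` stands in for the linear output layer described above.

```python
# Minimal sketch of the ANN setup in scikit-learn (the study used RapidMiner).
# X and y are synthetic placeholders for the spectral data and class labels.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
X = rng.random((200, 360))           # 360 spectral wavelengths per sample
y = rng.integers(0, 2, size=200)     # hypothetical binary class labels

# Random 80/20 train/test split, as described in the text
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# One hidden layer, Adam optimizer, regularization alpha = 0.0001;
# 100 neurons and 200 iterations follow the tuned values reported below
ann = MLPClassifier(hidden_layer_sizes=(100,), solver="adam",
                    alpha=1e-4, max_iter=200, random_state=42)
ann.fit(X_train, y_train)
print("Test accuracy:", ann.score(X_test, y_test))
```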

To define the best hyperparameters, we performed cross-validation by separating the dataset into 10 stratified folds, using only the training data (80%). In this approach, one fold is used to validate the algorithm's performance while the remaining folds are used to train the model; the procedure is repeated until each of the 10 folds has served once as validation data. An example of the training curve fitted to the absorbance data from the 1st measurement day is plotted below (Figure 2).

**Figure 2.** Example of the training curve with the difference in accuracy for the artificial neural network (ANN) model.
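In scikit-learn terms, the stratified 10-fold procedure might look as follows; `ann`, `X_train`, and `y_train` are assumed from the sketch above.

```python
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stratified 10-fold cross-validation on the training split only (80%);
# each fold serves exactly once as the validation set
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(ann, X_train, y_train, cv=skf, scoring="accuracy")
print("Fold accuracies:", scores.round(3))
print("Mean CV accuracy:", scores.mean().round(3))
```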

We applied a hyperparameter evaluation and found that 100 neurons in the hidden layer and a maximum of 200 iterations provided the ideal configuration without overfitting the model in most of the tests. Finally, we plotted a receiver operating characteristic (ROC) curve to compare the classifications and built a confusion matrix of the ANN results. We also evaluated the gain ratio and the F-score for each individual wavelength.
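A sketch of this evaluation step, assuming a binary classification task and the objects from the earlier sketches:

```python
from sklearn.metrics import roc_curve, auc, confusion_matrix

# Scores for the positive class (assumes a binary problem)
y_scores = ann.predict_proba(X_test)[:, 1]
fpr, tpr, _ = roc_curve(y_test, y_scores)
print("AUC:", auc(fpr, tpr))

# Confusion matrix of the ANN predictions on the held-out test set
print(confusion_matrix(y_test, ann.predict(X_test)))
```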

To test the robustness of the ANN, we compared it with other traditional machine learning algorithms: decision tree, support vector machine (SVM), random forest (RF), naïve Bayes, and logistic regression. The training and testing proportions remained the same. We also performed hyperparameter tuning for these algorithms, stopping once further tuning returned no practical gain in classification accuracy (%). For this, we considered the individual characteristics of each classifier, such as the number of trees, nodes, and leaves, the number of iterations, the degree of the kernel function, and others.
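The baseline comparison could be sketched as below, again in scikit-learn terms; the hyperparameters shown are library defaults standing in for the tuned values, which are not reported here.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# Same 80/20 split as the ANN; defaults stand in for the tuned values
baselines = {
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "SVM": SVC(probability=True, random_state=42),
    "Random forest": RandomForestClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
    "Logistic regression": LogisticRegression(max_iter=1000),
}
for name, clf in baselines.items():
    clf.fit(X_train, y_train)
    print(f"{name}: test accuracy = {clf.score(X_test, y_test):.3f}")
```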

The decision tree model classifies samples through a hierarchy of splitting rules, and the random forest model builds on the idea that overall accuracy improves by combining the predictions of many independent trees [34]. SVM finds an optimal separating hyperplane between classes and can be applied in many cases where there is a distinct margin of separation. Naïve Bayes is a probabilistic classifier that applies Bayes' theorem under the assumption that features are independent. Lastly, logistic regression is a regression approach that uses a sigmoid function to model the predicted classes [35].
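For reference, logistic regression models the probability of the positive class through the sigmoid function, in the standard formulation

$$p(y = 1 \mid \mathbf{x}) = \sigma\left(\mathbf{w}^{\top}\mathbf{x} + b\right), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}.$$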

The metrics used for evaluating the performance of each algorithm were the AUC (area under the ROC curve), overall accuracy, F1-score, precision, and recall. We compared the algorithms at the four stages of the spectral response measurement: the 14th, 19th, 24th, and 29th days. Both the reflectance and the absorbance values were used separately as input features. The results are presented in the following section.
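These metrics can be computed directly, as in the sketch below for the ANN (binary labels and the objects from the earlier sketches assumed).

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Performance metrics for the ANN on the held-out test set
y_pred = ann.predict(X_test)
print("AUC:      ", roc_auc_score(y_test, ann.predict_proba(X_test)[:, 1]))
print("Accuracy: ", accuracy_score(y_test, y_pred))
print("F1-score: ", f1_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:   ", recall_score(y_test, y_pred))
```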
