*2.1. Study Area and Data Acquisition*

As a study area, an open field of citrus trees, located on private property in the municipality of Ubirajara, São Paulo state, Brazil, was selected. The trees analyzed were all Valencia orange (*Citrus sinensis* "Valencia"), planted on a Citrumelo–Swingle rootstock. During the evaluation, the trees were in their vegetative phase and at adult size, measuring nearly 3 m in height above ground, with crown areas of around 5.5 m². At the time of the survey, the trees were at their maturing stage, five years after their initial planting. The area contains 752 trees per hectare, planted at approximately 7 × 1.9 m spacing. Each plantation field was 250 m × 250 m in size, with some fields varying to compensate for others according to their location. The plantation fields were selected randomly inside this property and represented the different treatment conditions. Before the analysis, the soil was fertilized with 250 kg/ha of N in the form of urea, 125 kg/ha of phosphorus, expressed as P2O5, and 167 kg/ha of potassium oxide (K2O). The area is predominantly composed of red-yellow podzolic soil and lies in a subtropical climate classified as Köppen Cwa [54].

This paper evaluated leaf samples from multiple orange trees scattered across different planting fields in an experimental portion of the orchard. In each field, the number of trees was selected according to the size of the planting field and the number of trees planted per area. Selected leaves were both measured for their hyperspectral response and collected for laboratory analysis. A total of 320 samples, each with a spectral curve and a subsequently determined nutrient content, were gathered during this survey. The sampling method followed standard recommended agronomic procedures, guided by a field specialist. To represent the proper conditions of a citrus tree, only leaves at a medium canopy height that were visually healthy, with no signs of disease or damage, were evaluated. A lift platform was used to raise the operator and the equipment. Since chemical analysis of leaf tissue recommends sampling the 3rd or 4th leaf of a fruiting branch, the spectroradiometer was directed as close as possible to leaves matching this description.

After measuring the spectral radiance, the leaves were removed from their respective branches, separated, and labeled in plastic bags for submission to the laboratory. The leaf samples consisted of the same leaves whose spectral radiance had been measured. They were kept at an appropriate temperature and transported accordingly. In the laboratory, the leaves were washed with a neutral detergent to remove any impurities. They were then dried in an oven for 48 h at 60–65 °C and crushed. From the crushed material, 100 mg was used for the N analysis. For that, the Kjeldahl titration method [55], divided into three phases, was followed: (1) digestion; (2) distillation in a nitrogen distiller; and (3) titration with sulfuric acid (H2SO4). The remaining material was separated and used for the analysis of the other macronutrients (P, K, Ca, Mg, and S) and the micronutrients (copper (Cu), Fe, Mn, and zinc (Zn)), following standard laboratory procedures for chemical analysis of leaf tissue [56].

#### *2.2. Hyperspectral Measurement Processing*

The spectral radiance of the Valencia-orange leaves was measured with an ASD FieldSpec® HandHeld 2 spectroradiometer. To record each target, the equipment was positioned at a 45° angle relative to the tree canopy. For that, a lift platform was used to ensure the correct height. This equipment operates over a spectral range of 325 nm to 1075 nm. In this study, a 10° aperture lens was adopted, and 10 readings were taken on each leaf to produce one mean spectral signature. This procedure was important to reduce noise and variance for the same target. Before each spectral measurement, a Lambertian surface (Spectralon® plate) was recorded. This Lambertian plate was used to calibrate the equipment and convert the digital numbers to a physical signal. As mentioned, 320 in-field leaf spectral measurements were recorded for this experiment.

The measured spectral curves consist of the radiance of the target (i.e., the leaf samples) across the electromagnetic spectrum. To produce the reflectance value (i.e., the reflectance factor), the Hemispherical Conical Reflectance Factor (HCRF) was calculated as shown in Equation (1) [57]:

$$\text{HCRF}(\omega_i, \omega_r) = \frac{dL\,(\theta_r, \Phi_r)_{\text{target}}}{dL\,(\theta_r, \Phi_r)_{\text{reference}}}\, K(\theta_i, \Phi_i, \theta_r, \Phi_r) \tag{1}$$

where dL is the radiance; ω is the solid angle; θ and Φ are the zenith and azimuth angles, respectively; and the subscripts i and r denote the incident and reflected energy fluxes, respectively. The K value is the calibration coefficient (i.e., the correction factor specified for the equipment). The target corresponds to the radiance of the leaf and the reference to the radiance of the Lambertian surface plate. The HCRF represents the spectral signature of the recorded target.
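Per band, the conversion above reduces to the ratio of target to reference radiance scaled by the calibration coefficient K. A minimal sketch in Python (NumPy stands in for the processing environment; the function name and toy values are hypothetical):

```python
import numpy as np

def hcrf(target_radiance, reference_radiance, k=1.0):
    """Hemispherical Conical Reflectance Factor (Equation (1)):
    ratio of target radiance to Spectralon reference radiance,
    scaled by the equipment calibration coefficient K.
    Inputs are per-wavelength radiance arrays."""
    target = np.asarray(target_radiance, dtype=float)
    reference = np.asarray(reference_radiance, dtype=float)
    return k * target / reference

# Toy example: a target reflecting half as much as the reference
# at every band yields an HCRF of 0.5 throughout.
reflectance = hcrf(np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0]))
```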

After obtaining the reflectance factor of each leaf, wavelengths under 380 nm and above 1020 nm were excluded to remove regions with a low signal-to-noise ratio. After this, the first derivative of all the HCRFs (n = 320) was calculated. The first-derivative calculation is a traditional method for modeling spectral data, and many approaches have discussed it. For this study, a linear least mean-squared smoothing filter [58] was first applied to reduce the random noise that may vary with wavelength and affect the derivative function. In most cases, the noise can be assumed to be stationary with constant variance. A noise-free spectrum s(λ) can then be estimated from the current values of the observed data. Once the true spectral signal s(λ) at a given wavelength is known, the derivative can be approximated by a suitable finite-difference scheme over the band resolution Δλ. Thus, the first derivative was calculated according to [58]:

$$\left| \frac{ds}{d\lambda} \right| = \frac{s(\lambda_i) - s(\lambda_j)}{\Delta\lambda} \tag{2}$$

where Δλ is calculated as |λj − λi|, assuming that the interval between the bands is constant. Additional tests involving higher-order derivatives (second, third, and fourth) were also made in the experimental phase of this study. However, during the machine learning analysis, there was no indication of an improvement over the first derivative for the dataset used. For this reason, the proposed framework was limited to the first derivative, but future research processing additional derivatives on different leaf data is encouraged.
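The smoothing-then-differencing procedure can be sketched as follows, assuming a Savitzky–Golay filter as the linear least-squares smoother (the study's exact filter parameters are not specified, so the window length and polynomial order below are illustrative):

```python
import numpy as np
from scipy.signal import savgol_filter

def first_derivative(spectrum, delta_lambda=1.0, window=11, polyorder=2):
    """Smooth the spectrum with a linear least-squares (Savitzky-Golay)
    filter to suppress random noise, then approximate the first
    derivative by finite differences over the band interval delta_lambda
    (Equation (2))."""
    smoothed = savgol_filter(np.asarray(spectrum, dtype=float),
                             window_length=window, polyorder=polyorder)
    # Finite difference between consecutive bands divided by the
    # (constant) band interval.
    return np.diff(smoothed) / delta_lambda
```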

From the total leaf measurements (n = 320), 10% (n = 32) were randomly separated and designated to compose the testing dataset (Figure 2). Wavelengths ranging from 380 to 1020 nm were used in the software as columns, while the leaf measurements (320) were used as rows. The 32 measurements were configured as an independent dataset, belonging to Valencia-orange trees located in different plantation fields and never before seen by the algorithms. The other 288 measurements composed the dependent dataset and belonged to trees whose conditions or characteristics, as observed during the field campaign, were distinct from one another. To characterize this, a descriptive statistical analysis was conducted on the nutrient concentrations from the chemical analysis of the leaf tissue, and the following parameters were calculated: minimum, maximum, mean, standard deviation, median, and coefficient of variation. These were important to demonstrate the variability of the dependent (calibration/training) data and how representative it could be of the nutritional conditions of Valencia-orange leaf tissue in the analyzed period.
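The random 90/10 holdout described above can be reproduced schematically (the seed and index bookkeeping below are illustrative, not the study's actual partition):

```python
import numpy as np

# 320 leaf measurements; 10% (32) randomly held out as an
# independent testing dataset, the rest (288) used for training.
rng = np.random.default_rng(42)   # the seed is an arbitrary choice
n_samples = 320
n_test = int(0.10 * n_samples)    # 32 independent test measurements

indices = rng.permutation(n_samples)
test_idx = indices[:n_test]
train_idx = indices[n_test:]      # 288 calibration/training measurements
```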

## *2.3. Machine Learning Analysis and Hyperspectral Mapping*

In a computational environment, the nutrients were individually selected as the target variables. As input parameters, the reflectance and the first derivative were used, and the performance of the algorithms in predicting these nutrients was evaluated. As stated, the curves were separated into three sets. The training dataset was used to tune the hyperparameters of the chosen algorithms. For that, the Random Search approach [59] was used. The same combination of training/validation data was adopted for all algorithms. This process was repeated, with fine-tuning, until further reductions in the mean absolute error (MAE) offered no practical gain, since changes to the parameters also impacted the processing time. Once the hyperparameters of each algorithm were defined, the testing dataset was used to verify its actual performance.
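The Random Search tuning step can be sketched with scikit-learn's `RandomizedSearchCV`, using MAE as the guiding score (the study used RapidMiner; the estimator, parameter ranges, and synthetic data below are placeholders):

```python
import numpy as np
from scipy.stats import randint
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the training spectra (shapes are illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X[:, 0] * 3.0 + rng.normal(scale=0.1, size=100)

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={"n_estimators": randint(10, 50),
                         "max_depth": randint(2, 10)},
    n_iter=5,                            # few random draws for the sketch
    scoring="neg_mean_absolute_error",   # MAE guided the fine-tuning
    cv=3,
    random_state=0,
)
search.fit(X, y)
best_mae = -search.best_score_           # MAE of the best parameter draw
```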

To configure and run the algorithms, the open-source computer program RapidMiner 9.5 was used, which is based on a particular Python library [60] while still permitting the development and implementation of different code. The workstation for this task was equipped with an Intel(R) Core(TM) i7-8550U CPU at 4.00 GHz, an Nvidia GeForce MX-150 GPU (4 GB GDDR5, 64-bit, 6008 MHz), and 8 GB of DDR4 2400 MHz RAM. The algorithms for the proposed framework were as follows: k-Nearest Neighbors (kNN), Lasso Regression, Ridge Regression, Support Vector Machine (SVM), Artificial Neural Network (ANN), Decision Tree (DT), and Random Forest (RF). The prediction metrics used to evaluate these algorithms were the coefficient of determination (R2), mean absolute error (MAE), and root-mean-squared error (RMSE). To ascertain the relationship between the measured data and the predicted data, the overall best models were evaluated in a regression plot.
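The three evaluation metrics can be computed as follows (the nutrient values are hypothetical, and scikit-learn is assumed in place of RapidMiner's built-in operators):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical measured vs. predicted nutrient concentrations (g/kg).
y_true = np.array([25.1, 27.3, 24.8, 26.5, 23.9])
y_pred = np.array([25.0, 27.0, 25.2, 26.3, 24.1])

r2 = r2_score(y_true, y_pred)                     # coefficient of determination
mae = mean_absolute_error(y_true, y_pred)         # mean absolute error
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # root-mean-squared error
```

RMSE penalizes large individual errors more heavily than MAE, which is why both are reported alongside R2.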

**Figure 2.** Spectral wavelengths used for testing the machine learning algorithms' performance. Spectral reflectance is shown in green, and the respective first derivatives in dark red.

Regarding the configuration of each algorithm, the parameters of the methods used were set to the library default values, except for those described in Table 1. For both the DT and RF algorithms, an Extreme Gradient Boosting (XGBoost) model was used to increase their performance. This model adopts a forward-learning ensemble method [61], which obtains predictive results through gradually improved estimations. The machine learning architecture, with its data inputs and outputs in the proposed analysis, is illustrated in Figure 3.


**Table 1.** Detailed information regarding the algorithms adopted in the proposed framework.

**Figure 3.** Structure of the machine learning architecture of the proposed framework.

It is also important to note that, although hyperspectral data are relatively easy to obtain, leaf tissue analysis can be a limiting factor. This is mostly because chemical analysis can be highly costly considering the amount of data required to train machine learning algorithms. Therefore, the appropriate number of samples is something to be assessed in each case. In this study, the amount of data used to train and validate/test the algorithms should be sufficient based on the literature. One study [62] compared different learners, such as RF, SVM, kNN, and others, in diverse settings. Among these settings, the authors evaluated the number of classes per problem (from 2 to 50) and the number of samples per class (from 5 to 100), resulting in datasets of 10 to 5000 samples. Their study demonstrated that these algorithms could model data from a small to a large number of samples and still achieve appropriate results. In comparison with the proposed approach, other machine learning frameworks have also adopted similar sample sizes: 324 leaf measurements were used to model the water-stress response of lettuce [23], 189 hyperspectral observations were used to model grapevine water status [63], and 266 observations were used as training data to predict nitrogen content in rice fields [64].

As a discussion example, a recent paper collected 500 samples per class with 540 spectral bands and adopted a cross-validation method with 200 samples per validation to demonstrate the importance of feature selection methods [65]. Regardless, hyperspectral data have a characteristic distinct from most data, namely the high number of bands/wavelengths available to model a given problem. The dataset used here was composed of 320 leaf measurements (of which 32 were separated for testing) and 640 spectral bands (380–1020 nm). This gives a total of 204,800 combinations to work with, which should be enough to configure a training/testing dataset. Although this high dimensionality could pose problems for hyperspectral data processing [65], studies have suggested that maintaining the original data can also outperform feature-selected subsets [66,67].

As mentioned, though the aforementioned studies used similar sample sizes to train, validate, and test their learners, little information relating hyperspectral wavelengths to machine learning sample size could be found in the literature [65]. Since there is no research focused on evaluating the impact of the training set when modeling spectral data, a preliminary comparison of two well-known sampling methods was performed. The first is cross-validation, which is suitable for the most common tasks in machine learning data curation [43]. Cross-validation was performed with k = 10 folds. This method separates the data into 10 (k) parts, using nine of them to train the algorithm and one to validate. The process runs sequentially, constantly changing the fold used for validation, so the chosen algorithm is always validated on data not used during the training phase. The second method was the Leave-One-Out approach. This method is similar to cross-validation, but instead takes only one data instance for validation each time. The method is considered very time-intensive and is only recommended for smaller datasets [43]. After applying the Random Search approach [59] to perform fine-tuning, the results of both training methods were compared (Table 2).

In the cross-validation method, of the 288 samples, 90% were used to train and 10% to validate, repeated 10 times with random folds. In the Leave-One-Out method, 287 samples were used to train while one sample was used to validate, repeated until all instances had been used. The small difference between the MAE predictions for each nutrient under both methods indicates that, even when adopting the training approach more suitable for smaller datasets (Leave-One-Out), the training results are similar. Still, while the Leave-One-Out method is approximately unbiased, it can result in high variance. Normally, the variance in fitting a model tends to be higher for small datasets, since the model is more sensitive to noise and artifacts in the training sample. Because of that, a cross-validation method would also show signs of high variance, as well as high bias, given a limited amount of data. This was not the case here, since both methods returned high predictions and similar metrics, indicating that, whatever the training method, the amount of data (204,800 combinations) was sufficient to model the given problem. Nonetheless, the Leave-One-Out method required a higher computational cost, which should be considered when evaluating the amount of processed data. On the workstation, the average Leave-One-Out processing time across all algorithms was 7.5 times longer than that of cross-validation. Because of that, the cross-validation method was adopted in this study, but future research should consider both methods according to the respective dataset size and characteristics.
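The two sampling schemes can be contrasted in a short sketch (synthetic data and a kNN regressor stand in for the study's setup; the sizes are illustrative):

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the training samples (sizes are illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = X[:, 0] * 2.0 + rng.normal(scale=0.3, size=60)
model = KNeighborsRegressor(n_neighbors=5)

# 10-fold cross-validation: 90% train / 10% validate, rotated 10 times.
mae_cv = -cross_val_score(model, X, y,
                          cv=KFold(n_splits=10, shuffle=True, random_state=0),
                          scoring="neg_mean_absolute_error").mean()

# Leave-One-Out: one held-out sample per iteration, over all samples.
mae_loo = -cross_val_score(model, X, y, cv=LeaveOneOut(),
                           scoring="neg_mean_absolute_error").mean()
```

Note that Leave-One-Out fits one model per sample, which is where its higher computational cost comes from.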

Lastly, the contribution of each spectral wavelength to the performance of the algorithms was computed by displaying its Relief-F value. Relief-F uses kNN-based scoring to address noisy data while handling incomplete data [68]. It is considered a reliable metric for scoring features and can be applied to rank the top-scoring ones. Here, the Relief-F values were used to map the hyperspectral response of each nutrient in terms of the contribution of individual wavelengths to the performance of the evaluated algorithms. Aside from that, to help ascertain the hyperspectral relationship with the evaluated nutrient dataset, each nutrient was submitted to a Shapiro–Wilk normality test at a 95% confidence interval. As the normality test returned p-values under 0.05, a non-parametric Spearman's correlation test in a pairwise comparison was executed to verify the association between the nutrients.
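The normality check and the fallback to Spearman's correlation can be sketched with SciPy (the nutrient vectors below are synthetic stand-ins for the laboratory data):

```python
import numpy as np
from scipy.stats import shapiro, spearmanr

# Hypothetical skewed nutrient concentrations standing in for the
# laboratory values from the leaf tissue analysis.
rng = np.random.default_rng(0)
nitrogen = rng.exponential(scale=25.0, size=288)
potassium = 0.4 * nitrogen + rng.normal(scale=5.0, size=288)

# Shapiro-Wilk at the 95% confidence level: p < 0.05 rejects normality,
# which motivates the non-parametric Spearman test used in the study.
_, p_normal = shapiro(nitrogen)
rho, p_rho = spearmanr(nitrogen, potassium)
```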


**Table 2.** MAE returned from both training methods: Cross-Validation and Leave-One-Out.

#### *Remote Sens.* **2020**, *12*, 906

#### **3. Results**

The chemical analysis of the leaf tissue returned heterogeneous and non-parametric results for the nutrient content of the analyzed leaves (Table 3). The analysis showed that the majority of the nutrients presented high variability and a uniform distribution. This behavior was most noticeable in nutrients such as Ca, Fe, Mn, and Zn. Regardless, this condition is important for demonstrating the applicability of the proposed framework, since machine learning algorithms are advantageous for modeling such heterogeneous data.


**Table 3.** Descriptive data from the chemical analysis of the Valencia-orange leaves.

All of the nutrients returned a *p*-value under 0.05 for the Shapiro–Wilk normality test at a 95% confidence interval.

The correlation between nutrients (Figure 4) is important information to characterize a dataset. The correlation coefficients indicated that, although significant, most nutrients have a low correlation value among themselves. Still, the pairwise comparison returned an expected behavior. Macronutrients, such as N, P, and K, showed positive correlations with each other while presenting negative correlation coefficients (N and P) with the other nutrients. The correlation coefficients between the nutrients varied around 0.5 or below, with the highest reaching 0.59 and lowest reaching −0.60. The low correlation value is also favorable for the proposed framework, as it helps to isolate the effects of the nutrient on the evaluated wavelengths.


**Figure 4.** Correlation between the measured nutrients for the Valencia-orange leaf samples.

For the machine learning algorithms used, the results were separated between the two datasets: reflectance (Table 4) and first derivative (Table 5). The algorithms returned good performances (R2 > 0.80) for the macronutrients with the spectral reflectance as predictors. When the first derivative was used, the algorithms performed well on both macro- and micronutrients (some R2 > 0.80), and all performances improved for the micronutrients. This is an important finding, as it highlights the importance of first-derivative measurements and their relationship with micronutrients in the Valencia-orange leaf tissue. In both datasets, algorithms like RF, ANN, and kNN returned better predictions than most linear ones, such as Lasso and Ridge Regression and SVM. The MAE values returned here are similar to those resulting from the training phase, which indicates how well-adjusted the sampling method was.


**Table 4.** The machine learning algorithms' accuracy performance for the reflectance data.

Bolded values in the table represent the overall best performance for each nutrient.


**Table 5.** The machine learning algorithms' accuracy performance for first-derivative data.

Bolded values in the table represent the overall best performance for each nutrient.

To ascertain the relationship between each nutrient's predictions, their regression values were plotted (Figure 5a,b). An analysis of the best-predicted values versus the laboratory-measured values demonstrates how the performance of the algorithms varied with increasing nutrient concentrations. Nutrients such as P and Ca did not resemble the 1:1 relationship (dashed line, Figure 5a,b) as closely as the other nutrients' predictions, even lower-performing ones such as Fe. Regardless, most predictions agreed well with the laboratory data, and on-site measurements of nutrients like N, K, Mg, S, Cu, Mn, and Zn may benefit from the approach presented here.

**Figure 5.** (**a**) Macronutrient prediction comparison against laboratory measurements for the best algorithms' results. (**b**) Micronutrient prediction comparison against laboratory measurements for the best algorithms' results.

The calculated Relief-F values showed the contribution of each wavelength to the algorithms' performance (Figure 6a,b). This contribution is important for isolating the specific spectral regions and wavelengths of the electromagnetic spectrum most closely related to each nutrient. This relationship, however, is limited to the evaluated algorithm and its performance. Still, since most performances were relatively high for most nutrients (Tables 4 and 5), this metric is an interesting parameter, as it sheds some light on the spectral mapping of Valencia-orange leaf nutrients, about whose spectral behavior little is known.

As mentioned, the Relief-F value calculated for each wavelength indicated important contributions in different ranges for each nutrient (Figure 6a,b). Because of that, certain bands of the electromagnetic spectrum contributed more than others. To summarize the information obtained from the proposed framework, a table (Table 6) was created indicating each nutrient and its class (macro or micro), the machine learning method most suitable to model it, its coefficient of determination (R2), the spectral data from which its prediction was calculated, and, from the Relief-F values, the wavelengths or spectral regions that contributed most to modeling the measured nutrient. These results demonstrate the potential of applying different machine learning algorithms for this task. To date, this is the first approach of its kind applied to nutrient content in leaf tissue analysis.


**Figure 6.** (**a**) The individual contribution of wavelengths for each macronutrients' prediction. (**b**) The individual contribution of wavelengths for each micronutrients' prediction.


**Table 6.** Summarized information on the results obtained by the proposed framework.

\* These wavelengths and regions were obtained by sorting the highest Relief-F values of each prediction.

#### **4. Discussion**

In the proposed framework, both reflectance data and their first derivatives were used to predict macro- and micronutrients. This approach used a robust technique (machine learning) to model the hyperspectral data, which helped to reveal several findings. The results demonstrated compelling performance in predicting most of the nutrients (Tables 4 and 5). Nutrients like N, Mg, Cu, Mn, and Zn were predicted with an R2 of 0.912, 0.832, 0.861, 0.898, and 0.855, respectively. Other nutrients like P, K, and S presented inferior performance, with an R2 of 0.771, 0.763, and 0.727, respectively. The worst performances were obtained for Ca and Fe, with R2 values of 0.624 and 0.612, respectively. In comparison with the literature, most of these performances, specifically those related to macronutrients, were similar or superior to those reported for other plant types and methods. For N, predictions using visible-to-infrared data returned accuracies between 0.73 and 0.87 (R2) [2,30,40]. For K, a three-band combination index predicted the nutrient with an R2 of 0.74 [37]. For nutrients like Mg, S, P, Ca, and others, the predictions (R2) varied from values as low as 0.27 up to 0.98, depending on the method applied and the plant evaluated [24,26,30,32,38,39,50].

One important finding from this research is the relationship between nutrients, algorithms, and leaf spectral curves. For the macronutrients, the performance of the algorithms was superior when adopting the surface reflectance data. For the micronutrients, the first derivative of the spectral reflectance returned better algorithm performance (Figure 5a,b and Table 6). This finding can be related to information reported in the literature [39–41]. Since first-derivative spectra highlight absorption features of the original spectra, they could potentially be linked to components not easily observable in spectral reflectance data alone. The approach here demonstrated a better relationship for all micronutrients when the algorithms were linked with the first derivative, which could offer a possible explanation. As previously mentioned in Section 2.2, other derivatives of the dataset were evaluated, but no significant difference over the first derivative was found. Still, further research should continue to explore the association between first-, second-, and third-derivative spectra and micronutrients.

Another contribution of the proposed framework is that, although limited by the accuracy of the algorithms, it is possible to identify the wavelengths and spectral regions that contributed most to predicting each nutrient (Figure 6a,b and Table 6). While some nutrients show contributions from the same wavelengths, these contributions vary in value (Relief-F). Even so, most of the nutrients showcase particular wavelengths that could potentially be isolated, or used in combination with others, to ascertain their relationship with the prediction (Table 6). This finding could help map the Valencia-orange leaf spectral behavior related to both macro- and micronutrients and promote the investigation of simpler mathematical models or spectral indices capable of modeling these nutrients by focusing on these wavelengths.

Machine learning algorithms have the advantage of modeling data in a non-linear and non-parametric manner. Unlike many traditional statistical methods, these algorithms are built to deal with noisy, complex, and heterogeneous data [16,23,50–52]. These characteristics proved to be an advantage for this study, as the data used had high variance, were non-normal (Table 3), and, while statistically significant, were weakly correlated in pairwise comparisons (Figure 4). Previous research aimed to model nutrient content in data with similar characteristics by using multiple mathematical methods in the analysis of plant hyperspectral data, but did not return the same accuracies [15,24,32,40,42]. Nonetheless, since machine learning methods can deal with most data inconsistencies, both in hyperspectral measurements and in nutrient content analysis, the proposed framework should be more appropriate for combining these features, requiring no data modification while still returning good performance.

Finally, the different performances returned by the algorithms should be discussed. It is clear that regression models like Lasso, Ridge, and SVM were inferior to the others (Tables 4 and 5) in both scenarios (reflectance/first derivative). Although SVM is known to handle high-dimensional data and to do well with a limited training dataset [45], it performed poorly on the dataset used here in comparison with the rest. The DT algorithm, though not as inferior in performance as the aforementioned algorithms, achieved middling results in comparison with the remaining methods. For the DT, the XGBoost model was adopted to improve the prediction performance, as it was for the RF base model. During the experimental phase, this boosting model helped enhance the performance of both algorithms. Still, DT did not return predictions as accurate as those of the RF algorithm.

The highest performances were obtained by the RF, ANN, and kNN algorithms, for both macro- and micronutrients. While RF was better in almost all predictions, ANN and kNN performed well only in particular cases, especially for K (reflectance data) and Mn (first-derivative data), respectively. kNN is a simpler method than ANN and RF, and is relatively faster; however, across the different nutrients, RF and ANN showed better consistency. The ANN was constructed in a manner that could predict most of the nutrients, but its performance was limited. The amount of data available for training the algorithms could also be a potential hindrance for deep learning networks. While the ANN method benefited from a multilayer perceptron, with two hidden layers and a high number of neurons and iterations, the RF algorithm was boosted with the XGBoost model, which returned consistent performance for the reflectance and first-derivative datasets. Regardless, as no machine learning algorithm is considered universally appropriate for every task, a framework like the one proposed here, which makes use of multiple algorithms, is recommended, since different data could potentially impact each algorithm's performance.
