UAV Hyperspectral Characterization of Vegetation Using Entropy-Based Active Sampling for Partial Least Square Regression Models

Amitrano, Donato; Cicala, Luca; De Mizio, Marco; Tufano, Francesco

doi:10.3390/app13084812


by Donato Amitrano *, Luca Cicala, Marco De Mizio and Francesco Tufano

Italian Aerospace Research Centre, Via Maiorise snc, 81043 Capua, Italy

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(8), 4812; https://doi.org/10.3390/app13084812

Submission received: 1 March 2023 / Revised: 27 March 2023 / Accepted: 10 April 2023 / Published: 11 April 2023

(This article belongs to the Special Issue Agriculture 4.0 – the Future of Farming Technology)


Abstract:

Optimization of agricultural practices is key for facing the challenges of modern agri-food systems, which are expected to satisfy a growing demand for food production in a landscape characterized by a reduction in cultivable lands and an increasing awareness of sustainability issues. In this work, an operational methodology for the characterization of vegetation biomass and nitrogen content based on close-range hyperspectral remote sensing is introduced. It is based on an unsupervised active learning technique suitable for the calibration of a partial least squares regression. The proposed technique relies on an innovative usage of Shannon’s entropy and allows for the set-up of an incremental monitoring framework from scratch, aiming at minimizing field sampling activities. Experimental results concerning the estimation of grassland biomass and nitrogen content returned RMSE values of 2.05 t/ha and 4.68 kg/ha, respectively. These values are comparable with those in the literature, which mostly relies on supervised frameworks, and confirm the suitability of the proposed methodology for operational environments.

Keywords:

UAV; precision agriculture; hyperspectral imagery; active sampling; vegetation biomass; vegetation nitrogen content; partial least squares regression

1. Introduction

Nowadays, adequate crop management is crucial to ensure high and sustainable food production. The increasing world population needs higher productivity, but at the same time, reduces the availability of agricultural lands [1]. Moreover, the increased awareness of the environmental impact of fertilizers and water supplies is modifying farming practices and regulations with the objective of minimizing waste of resources [2]. As a result, according to the second and the twelfth Sustainable Development Goals, i.e., zero hunger and responsible consumption and production, agri-food systems are asked to meet the challenge of increasing productivity in a sustainable manner [1].

In this context, automation and digitalization are key, because they allow for optimizing all the phases of the agricultural cycle, from sowing to harvesting. However, while automation, i.e., the exploitation of machines able to help and outperform humans in production tasks, has been widespread in agricultural practice for several years [1], digitalization, i.e., the creation and exploitation of digital twins of farmlands [3], is still limited.

Digital twins are virtual equivalents of physical objects [4] with which they are connected in real time. Their intrinsic dynamic nature includes the representation of the current behavior of real objects, as well as the prediction of their future state [5]. In the case of agricultural systems, the parameters to be collected for building such a representation are related to both the environment (such as air temperature, humidity, precipitation, etc.) and the vegetation. These are traditionally retrieved through field measurements that, despite their precision, become unsustainable in the presence of distributed targets. Therefore, the exploitation of remote [6] and close-range sensing technologies is essential [7,8,9]. Indeed, resolution issues and the flexibility of close-range remote sensing platforms, with particular reference to the possibility of carrying different sensors, make drones the preferred tools for precision agriculture. In particular, hyperspectral imagery is mostly exploited because of the spectral resolution and detail it can offer [10].

The approaches developed to retrieve vegetation biophysical parameters from remote sensing data can be classified as physical-based or empirical [11]. In the first case, a model based on radiative transfer theory is inverted. This requires significant knowledge of the scene, which has to be adequately parametrized at both canopy structure and atmospheric levels, since the solution of the problem is to find the best match between the simulated signal and the measured one. In turn, empirical methods rely on the direct calibration of a relationship between the measured signal and the biophysical variable of interest. These methods are constrained by how representative the calibration data are of the behavior of the object to be modeled [11]. However, if significant training samples are provided, empirical methods, such as the partial least squares regression (PLSR) [12], are able to deliver accurate predictions of many variables of agricultural interest. Although some studies have raised concerns about the ability of such methodologies to model nonlinear relations between remote sensing data and, as an example, the biomass [13], the literature has widely exploited PLSR to estimate quantities such as grapevine yield and berry weight [8], grassland biomass [14], nitrogen [9] and phosphorus [15] content, and carotenoid content in cotton crops [16].

Using empirical methods, the selection of significant samples is fundamental for model calibration. In the literature, this concept is usually expressed as active sample selection or active learning [17]. Its purpose is to extend the forecasting capability of an existing training dataset by adding a limited number of samples from newly available acquisitions [9,18] in order to allow new predictions with limited field work.

As stated in [18], active learning techniques suitable to solve regression problems can be categorized as based on uncertainty [19] or diversity criteria [20]. In the first case, available samples are ranked according to their uncertainty. The higher the uncertainty, the better their rank. In this family of methods, those based on a variance-based pool of regressors (PAL) are probably the most interesting. The initial step in the PAL algorithm is the generation of n random subsets of the available training set. Each subset is used to train a regressor, which delivers a prediction for the samples stored in the test set. This way, the samples in the test set are coupled with n predictions, each one having its own variance with respect to the original training set. The higher the variance, the higher the uncertainty associated with a specific sample, which is therefore added to the training set [18]. This methodology is thoroughly discussed in [21] and adopted, as an example, in [9].

Active learning techniques based on diversity criteria select samples based on the dissimilarities they introduce in the training datasets [18]. In this context, several metrics can be used to assess such dissimilarity like the Euclidean distance [22] or the cosine angle distance [23].

While the past literature mostly focused on the analysis of vegetation parameters, specific crops or growing stages, possibly not yet investigated using a given sensor [24,25,26], the purpose of this paper is the introduction of a crop monitoring framework, based on a PLSR model, iteratively updated as new measurements become available. The proposed approach is characterized by the use of a novel unsupervised and completely data-driven selection criterion for the choice of sampling areas for model calibration. It is assumed that no prior knowledge about the field and no models to draw upon for active sample selection are available. In other words, monitoring activities are started from scratch and initialized by the first observation. Calibration data for PLSR are selected based on an innovative technique relying on the concept of entropy as defined by Shannon [27].

According to the active learning paradigm, the proposed methodology allows for (i) reducing field sampling, (ii) including newly acquired data within the model to improve its prediction capability and (iii) setting-up a continuous monitoring framework.

The work is organized as follows. Exploited data and the adopted methodology are introduced in Section 2. Section 3 is dedicated to the presentation of the obtained experimental results, which are discussed in Section 4. Conclusions are drawn at the end of the work.

2. Materials and Methods

2.1. Data

Data used in this study have been collected in two different campaigns using, as described by Franceschini et al. [9], the WageningenUR Hyperspectral Mapping System (HYMSY) [28]. It is a complex multisensory imaging system including, among other things, a hyperspectral sensor able to acquire data in 101 reflectance bands included in the range 450–950 nm with 5 nm interval. As declared in [9], data are provided in georeferenced (i.e., geometrically corrected) calibrated reflectance units.

The test site is a crop field cultivated with ryegrass (Lolium perenne) [29]. The study area was divided into 60 rectangular plots (see Figure 1) measuring 1.5 × 8 m. A total of 15 different fertilization treatments, with different nitrogen (N) amounts, were applied to the plots. In this way, each treatment was applied to 4 plots. In Figure 1, each treatment is associated with a different color.

Ground measurements were carried out with destructive methods, concurrently with drone acquisitions, on 15 May 2014 (average dry mass: 33.5 t/ha, average nitrogen content: 43.4 kg/ha), 14 October 2014 (average dry mass: 16.2 t/ha, average nitrogen content: 37.2 kg/ha), 9 May 2017 (average dry mass: 11.0 t/ha, average nitrogen content: 23.8 kg/ha), 29 August 2017 (average dry mass: 10.2 t/ha, average nitrogen content: 29.8 kg/ha) and 26 October 2017 (average dry mass: 9.5 t/ha, average nitrogen content: 23.8 kg/ha) [9].

The objective of the study is the estimation of grassland biomass and nitrogen content. Ground data for both quantities are available for each plot. However, as reported in [9], only 49 plots were considered for the estimation, as some of them are partially obscured by a metallic structure altering their reflectance, as shown in Figure 1.

2.2. Methodology

The overall methodology is presented in Figure 2. Its rationale is to set up a monitoring framework using no prior knowledge about the field and no models available to draw upon for active sample selection. A completely data-driven active sampling procedure, allowing a prediction starting from scratch, is proposed for the retrieval of the calibration data necessary for PLSR.

The proposed active sampling method is based on the concept of entropy H as defined by Shannon [27]

H = - \sum_{n = 1}^{N} P_{n} \log_{2} P_{n},

(1)

where $P_{n}$ is the normalized probability of the n-th histogram quantization level and N is the total number of bins. Entropy is a measure of the quantity of information carried by a signal. The higher the entropy, the higher its information content.
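As a minimal sketch of Equation (1), the following Python function computes the entropy of the histogram of a list of feature values. The text does not specify the number of bins or the quantization range, so `n_bins` and `value_range` are illustrative assumptions, as is the function name.

```python
import math


def histogram_entropy(values, n_bins=16, value_range=(0.0, 1.0)):
    """Shannon entropy (in bits) of the histogram of `values`, Equation (1).

    n_bins and value_range are assumptions made for illustration only.
    """
    lo, hi = value_range
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp the bin index so out-of-range values fall in the edge bins.
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    entropy = 0.0
    for c in counts:
        if c > 0:  # empty bins contribute nothing (0 * log 0 is taken as 0)
            p = c / total  # normalized probability P_n of the bin
            entropy -= p * math.log2(p)
    return entropy
```

With four values spread evenly over four bins the entropy is 2 bits, while a vector of identical values gives an entropy of zero.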

In the literature, entropy is usually exploited as selection criterion for query by bagging (EQB). In this technique, the samples are selected based on the maximum disagreement among a committee of classifiers obtained by bagging. First, different training sets are defined by sampling with replacement from the original training data. Then, each training set is used to train the selected classifier to predict labels for each unlabeled sample. Finally, the entropy of the distribution of the different labels associated with each sample is calculated in order to evaluate the disagreement among the classifiers. The samples showing the maximum entropy, i.e., those with maximum disagreement among the classifiers, are added to the current training set [30]. For the purposes of this paper, entropy is of particular interest because its calculation is very fast, even for large datasets. Moreover, being a histogram shape parameter, it is suitable to be used with vectors composed of heterogeneous data, such as spectral measurements and vegetation indices, without projection into an auxiliary homogeneous feature space.

According to the diversity paradigm of active learning techniques, the entropy is exploited as a parameter to discriminate whether the sample under consideration adds information to the set already collected. The hypothesis behind the method is that, within a given hypercube, reflectance differences between different areas of the image are due to variations of the biophysical parameter under investigation. These areas are represented by the plots, in which, according to [9], the spectral response was averaged in order to extract a single feature vector for each of them (see Figure 3). The histogram was calculated using all the available spectral features, starting from a randomly selected plot. At the time of the first acquisition, all the components of the feature vector were considered for the calculation of the histogram.

The histogram entropy was calculated according to Equation (1). A new plot was added by appending its spectral content to the vector constituted by the one previously considered. In other words, the vectors representing the spectral response of the plots were concatenated to form a longer one of $k \times n$ elements, where k is the number of available spectral features and n the number of plots considered. The entropy of the new histogram was calculated and, if higher than the previous value, the plot was marked as selected for field sampling. The procedure was repeated up to the end of available plots.

Marked plots are ideally sampled to retrieve ground data. These data can be used to tune future sampling campaigns. In particular, the Pearson linear correlation coefficient [31] between the average plot response and ground data is computed in order to guide the subsequent active sampling procedure. When a new acquisition becomes available, the histogram is computed considering only the spectral features showing a significant correlation with the ground truth.
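The feature filtering described here can be sketched in plain Python as follows. The function names are hypothetical, the default threshold of 0.4 follows the value suggested in the procedure summary, and the use of the absolute value of the coefficient (so that strongly negative correlations are also retained) is an assumption not stated in the text.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no linear relation
    return sxy / math.sqrt(sxx * syy)


def correlated_features(plot_spectra, ground_data, threshold=0.4):
    """Indices of the features whose |r| with ground data exceeds threshold.

    plot_spectra: one feature vector (average plot response) per sampled plot.
    ground_data:  one ground measurement per sampled plot, in the same order.
    Taking the absolute value of r is an assumption of this sketch.
    """
    n_features = len(plot_spectra[0])
    keep = []
    for j in range(n_features):
        column = [row[j] for row in plot_spectra]
        if abs(pearson(column, ground_data)) > threshold:
            keep.append(j)
    return keep
```

Only the features returned by `correlated_features` would then enter the histogram computed for the next acquisition.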

In summary, the proposed active sampling procedure consists of the following steps.

1. At time zero, i.e., when just one image is available:
   a. Consider each plot of the field as a cluster and compute the spatial average for each spectral band;
   b. Calculate the histogram considering all the elements of the average spectral response, starting from one randomly selected plot. Calculate the entropy $H_{1}$ of the histogram according to Equation (1);
   c. Add another plot to the dataset by appending its (average) spectral response to the vector constituted by the one of the first plot. Calculate the new histogram and its entropy $H_{2}$;
   d. If $H_{2} > H_{1}$, mark the plot as informative and continue by adding new plots to the dataset without deleting those marked as not informative. The plots marked as informative will be sampled to retrieve model calibration data.
2. Use ground data collected with the above procedure to calculate the linear correlation with each spectral band.
3. When a new acquisition becomes available, repeat the procedure described at Point 1, this time using only the spectral features showing significant correlation with ground data (see Point 2) for histogram calculation. A reasonable threshold to select features showing at least moderate correlation can be set to 0.4 [32]. All the available data should be used for sample selection. In other words, the new clusters should be appended to those already available from time zero in order to set up an incremental sampling framework. This growing vector is the one used for entropy calculation at each iteration.
4. As the number of flights increases, the linear correlation analysis between the different spectral features and ground data continues to modulate the active sampling procedure. In this case, the computation of histograms will consider only the features showing, on average, a linear correlation coefficient higher than 0.4.
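The selection loop at Point 1 can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the bin count and value range are not given in the text, all function and parameter names are hypothetical, and the starting plot is left unmarked because the text does not say whether it is sampled.

```python
import math
import random


def histogram_entropy(values, n_bins=16, value_range=(0.0, 1.0)):
    """Shannon entropy (bits) of the histogram of `values`, Equation (1)."""
    lo, hi = value_range
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def select_informative_plots(plot_spectra, order=None, pooled=None,
                             n_bins=16, value_range=(0.0, 1.0), seed=0):
    """Mark the plots whose spectra raise the entropy of the pooled vector.

    plot_spectra: dict mapping plot id -> average feature vector of the plot.
    order:        visiting order; shuffled at random when not provided.
    pooled:       growing vector carried over from earlier acquisitions.
    Returns (ids marked as informative, updated pooled vector).
    """
    if order is None:
        order = list(plot_spectra)
        random.Random(seed).shuffle(order)  # random starting plot
    pooled = list(pooled) if pooled else []
    previous = histogram_entropy(pooled, n_bins, value_range) if pooled else None
    selected = []
    for plot_id in order:
        # Plots are never removed, even when they are not informative.
        pooled.extend(plot_spectra[plot_id])
        current = histogram_entropy(pooled, n_bins, value_range)
        if previous is not None and current > previous:
            selected.append(plot_id)
        previous = current
    return selected, pooled
```

For a later acquisition, the `pooled` vector returned by the previous call would be passed back in, so that the new plots are compared against everything collected since time zero; restricting the feature vectors to the correlated features (Point 3) is left to the caller.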

Following the diagram depicted in Figure 2, active sampling feeds the PLSR to estimate the value of a specific biophysical parameter within the entire area of interest. The proposed model initially exploits a total of 141 variables constituted by the entire hyperspectral data-cube (101 reflectance bands) plus a selection of 40 vegetation indices, as suggested, among others, in [8,33]. For the ease of the reader, the list of the indices used in this study is reported in Table 1, which has a twofold purpose. The first is to collect all the formulas in one place, as it could be difficult to retrieve each of them from the literature. The second is to suggest a possible set of variables to exploit in similar studies.

A tentative regression step was run using 10 latent variables (projected components). Its principal objective is to calculate the variable importance in projection (VIP) score. It is defined, for each variable j, through the sum, over latent variables f, of its squared PLS-weight value $w_{fj}$ weighted by the percentage of explained variance of the specific latent variable $SSY_{f}$, according to the relation [52]

VIP_{j} = \sqrt{\frac{J \times \sum_{f = 1}^{F} w_{fj}^{2} \times SSY_{f}}{\sum_{f = 1}^{F} SSY_{f}}},

(2)

where F is the total number of latent variables and J the total number of variables (in our case 141). The relation for the calculation of $SSY_{f}$ is given by [53]

SSY_{f} = b_{f}^{2} t_{f}^{T} t_{f},

(3)

where $b_{f}$ and $t_{f}$ are the PLS inner relation coefficient and the score vector relevant to the f-th latent variable, respectively.

Since the sum of the squared VIP scores equals the total number of variables, the mean squared VIP score is one. Therefore, it is common practice in the literature to retain as informative the variables with a VIP score larger than one. This means that those variables have an above-average influence on the building of the model explaining the observations [53].
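Equations (2) and (3) and the retention rule can be sketched in plain Python as follows. The function names are hypothetical, and the weights, inner coefficients and scores are assumed to come from an already fitted PLS model with unit-norm weight vectors, which is the condition under which the squared VIP scores sum to J.

```python
import math


def explained_sum_of_squares(b, scores):
    """SSY_f = b_f^2 * t_f^T t_f for each latent variable f, Equation (3).

    b:      inner relation coefficients, one per latent variable.
    scores: score vectors t_f, one list per latent variable.
    """
    return [b_f ** 2 * sum(t * t for t in t_f) for b_f, t_f in zip(b, scores)]


def vip_scores(weights, ssy):
    """VIP score of each variable, Equation (2).

    weights[f][j]: weight of variable j on latent variable f
                   (each weight vector is assumed to have unit norm).
    ssy[f]:        explained sum of squares of latent variable f.
    """
    n_latent = len(weights)
    n_vars = len(weights[0])  # J in Equation (2)
    total = sum(ssy)
    vip = []
    for j in range(n_vars):
        weighted = sum(weights[f][j] ** 2 * ssy[f] for f in range(n_latent))
        vip.append(math.sqrt(n_vars * weighted / total))
    return vip


def informative_variables(vip, threshold=1.0):
    """Indices of the variables whose VIP score is larger than the threshold."""
    return [j for j, score in enumerate(vip) if score > threshold]
```

For a toy model with three variables and two latent variables, `vip_scores([[0.6, 0.8, 0.0], [0.0, 0.6, 0.8]], [3.0, 1.0])` gives squared scores that sum to three, and only the second variable exceeds the threshold of one.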

The variables selected through VIP scoring are used for the final regression. In order to choose the optimal number of components [54], a leave-one-out cross-validation has been implemented [55]. This procedure consists of building a model while leaving out of the calibration set each sample in turn. In other words, if 20 samples are available, the model is built each time using 19 samples, and the one left out changes at each iteration. The model built in this way is evaluated by using the left-out sample as the test set, and the RMSE with respect to the corresponding measurement is calculated. The process is repeated using a different number of components in PLSR. For each attempt, the cumulative RMSE is calculated. Finally, the optimal number of components is selected as the one corresponding to the lowest cumulative RMSE.
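The leave-one-out loop can be sketched independently of the regression model. In the Python sketch below, the model is passed in as a caller-supplied function `fit_predict`, which is a hypothetical interface; the nearest-neighbour regressor used in the example is only a stand-in to keep the block self-contained and is not PLSR. Note that the RMSE computed on a single left-out sample reduces to the absolute error, so the cumulative RMSE is a sum of absolute errors.

```python
def loo_cumulative_error(fit_predict, X, y, n_components):
    """Cumulative leave-one-out error for a given number of components.

    fit_predict(train_X, train_y, test_x, n_components) must return the
    prediction for test_x of a model fitted on the training samples.
    """
    total = 0.0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]  # leave sample i out
        train_y = y[:i] + y[i + 1:]
        prediction = fit_predict(train_X, train_y, X[i], n_components)
        total += abs(prediction - y[i])  # RMSE of one sample = absolute error
    return total


def best_n_components(fit_predict, X, y, candidates):
    """Candidate with the lowest cumulative error, plus all the errors."""
    errors = {k: loo_cumulative_error(fit_predict, X, y, k) for k in candidates}
    return min(errors, key=errors.get), errors


def knn_stand_in(train_X, train_y, test_x, k):
    """Stand-in regressor (NOT PLSR): mean response of the k nearest samples."""
    ranked = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], test_x)),
    )
    nearest = ranked[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [0.0, 1.0, 2.0, 3.0, 4.0]
best, errors = best_n_components(knn_stand_in, X, y, candidates=[1, 2, 4])
```

In an actual analysis, `fit_predict` would wrap the PLSR fit on the VIP-selected variables, and `candidates` would be the numbers of components to compare.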

3. Results

The proposed active sampling approach has been validated separately for dry mass and nitrogen content estimation purposes. Although the ground truth is available on all the plots, an operative scenario has been simulated with the objective of collecting the minimum possible number of field samples. The proposed active sampling approach suggests how many and which areas should be considered for field sampling. The purpose of the following experiments is to assess the regression performance of the iteratively calibrated model, showing the added value of the ground sampling at each calibration step.

The results obtained by applying the proposed methodology to the available data are reported in Table 2 and Figure 4 for the dry mass experiment and in Table 3 and Figure 5 for the nitrogen one. The variables exploited for each regression, determined via VIP scoring, are listed in Table 4.

The figures show the scatter plots of the predicted dry mass and nitrogen content against the corresponding ground data. In both cases, the last one, i.e., the graph (f), refers to the whole dataset.

Table 2 and Table 3 report the results obtained using the proposed methodology and two literature approaches, namely greedy sampling (GSx) [56] and random sampling. Acquisition dates are denoted by the symbol $D_{i}$. The first five columns report, for each date, the number of samples exploited for PLSR model calibration. The red-framed cell indicates the prediction date. The sixth column expresses the total number of samples used for model calibration. Quantitative data are provided in terms of the root mean squared error (RMSE). The values reported in the last column of the tables ($RMSE_{p}$) refer to predictions for acquisition $D_{i}$ made using a model calibrated using data collected up to the date $D_{i-1}$. Data concerning the random sampling approach refer to the maximum and average RMSEs obtained over 1000 experiments.

As for the dry mass experiment (see Table 2), the RMSE with respect to ground truth ranged between 0.14 t/ha, obtained on 15 May 2014, and 4.97 t/ha, registered on 14 October 2014. The average RMSE was 2.05 t/ha. The number of samples selected for PLSR ranged from 48 for the acquisition made on 15 May 2014 to 21 for the flight made on 26 October 2017, with an average of about 30 samples per date.

The values of $RMSE_{p}$ are useful to assess the forecasting capability of the fitted model. They have been calculated by applying the model fitted at time $D_{i-1}$ to the data acquired at time $D_{i}$. From the table, it emerges that the model failed the prediction at times $D_{2}$ and $D_{3}$. Starting from time $D_{4}$, predictions show a reasonable RMSE, although its value is much higher than the one obtained using active learning.

The random sampling approach has been evaluated by computing the maximum and the average RMSE. In the first case, the range was between 2.83 t/ha and 9.93 t/ha. In the second, it was between 2.26 t/ha and 3.77 t/ha. The GSx approach returned an average RMSE of 1.96 t/ha.

As for the nitrogen content experiment (see Table 3), the obtained RMSE ranged between 1.10 kg/ha, obtained on 15 May 2014, and 8.84 kg/ha, registered on 14 October 2014. The average value was 5.20 kg/ha. The number of samples selected for PLSR ranged from 47 for the acquisition made on 15 May 2014 to 20 for the flight made on 26 October 2017. For this experiment, further data about the RMSE calculated by averaging data based on the fertilization treatment (see Figure 1) have been reported in analogy with the results discussed in [9]. In this case, the obtained $RMSE^{*}$ ranged between 2.96 kg/ha, obtained on 9 May 2017, and 7.22 kg/ha, registered on 14 October 2014, with an average of 4.68 kg/ha. As for forecasting, the model calibrated with data collected up to time $D_{i-1}$ fails in the prediction of the nitrogen content at $D_{2}$ and $D_{3}$. Starting from $D_{4}$, it was possible to make reasonable predictions, although with a significantly higher error than that obtained using active learning.

The random sampling test has been evaluated through the calculation of the maximum and the average RMSE. In the first case, the range was between 7.67 kg/ha and 13.4 kg/ha. In the second case, it was between 4.71 kg/ha and 7.93 kg/ha. The GSx approach returned an average RMSE of 5.42 kg/ha.

Figure 6 reports the trends of the Pearson coefficient with respect to the available biomass (Figure 6a) and nitrogen (Figure 6b) data. It is remarkable that the acquisition for which the highest RMSE was registered, i.e., the one made on 14 October 2014 (see the orange line in the graphs), shows the lowest correlation in the vegetation indices area.

Table 2 and Table 3 also report the number of samples selected by the proposed active learning technique. Clearly, in an operational environment, the samples marked as informative for the estimation of all the biophysical parameters under investigation are collected simultaneously. Figure 7 gives a more precise idea of the amount of sampling necessary to implement the proposed framework.

In this graphic, each rectangle represents a plot. Each of them is divided into five parts, each one representing an available acquisition. Sub-parts colored in green stand for selected samples. Figure 7a represents the sampling scheme for biomass estimation, while Figure 7b depicts the one for nitrogen content investigation. By counting the green parts relevant to each acquisition, it emerges that the number of necessary samples is 48, 29, 32, 30 and 24, for a total of 163, which corresponds to approximately 66% of all the available samples.
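The sampling fraction can be checked with a few lines of Python. The total number of available samples is not stated explicitly; the sketch assumes it equals the 49 usable plots reported in Section 2.1 times the five acquisitions.

```python
# Samples selected at each of the five acquisitions (from the text).
samples_per_date = [48, 29, 32, 30, 24]

# Assumption: every one of the 49 usable plots is available at each date.
usable_plots = 49

total_selected = sum(samples_per_date)
total_available = usable_plots * len(samples_per_date)
fraction = total_selected / total_available

print(total_selected, total_available, round(100 * fraction, 1))  # 163 245 66.5
```

Under this assumption the ratio is 163/245, which is consistent with the figure of about 66% quoted in the text.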

4. Discussion

Optimization of agricultural practice is key for facing the challenges of modern food production systems, which are expected to satisfy a growing demand, both in terms of quality and quantity, in a landscape characterized by a reduction of cultivable lands and an increasing awareness concerning sustainability issues.

Grassland can be accurately characterized by using UAV hyperspectral imagery. To this end, PLSR calibrated with an innovative active learning methodology has been exploited in this work.

The results presented here (using PLSR) revealed that predictions can be made with an average RMSE of 2.05 t/ha and 4.68 kg/ha for dry mass and N-uptake estimation, respectively. As shown in the previous section, they are mostly insensitive to the validation sets used, as the results obtained using all the samples and only those not selected by the active sampling technique are quite similar.

As shown by the data reported in Table 4, the variables used for regression mainly belong to NIR frequencies (as already observed in [9]) and to the family of chlorophyll absorption indices (CARI). From the curves depicted in Figure 6 and the linear correlation values in Table 4, it emerges that these regressors are, on average, correlated with the biophysical variable under investigation. This contributes to reliable regressions in almost all the experiments, as shown in Figure 4 and Figure 5. The only one which can be considered a failure is the $D_{2}$ dry mass regression. This experiment had a poor agreement with ground data, as shown in Figure 4b.

As a general comment, the acquisition $D_{2}$ is the one showing the lowest correlation with ground data, especially in the vegetation indices area (see the orange line in Figure 6a). Moreover, among all the regressions, the one at $D_{2}$ was built using the lowest number of VIP variables (five), which also showed poor correlation with ground data. This means that the amount and the quality of the information selected as valuable through VIP scoring were lower than in the other cases, and this could have had an impact on the estimate. Moreover, looking at the average biomass values on the scene reported in Table 2, a significant decrease is observed between $D_{1}$ and $D_{2}$. This suggests the presence of nonlinearity in the data, which might not be appropriately modeled by PLSR.

Overall, the obtained estimates for vegetation biomass and nitrogen content are comparable to those reported in the literature by studies using similar equipment. As an example, reference [57] claimed an RMSE of 3.75 t/ha in the estimation of maize biomass. Oliveira et al. [58] reported an RMSE of 16.99 kg/ha in the estimation of nitrogen content of grasslands. Reference [9] reported an RMSE of 3.25 t/ha and 6.50 kg/ha for biomass and N-uptake estimations of grassland, respectively, with the N-uptake result referring to an average made at fertilization treatment scale.

The results obtained using the proposed methodology have been compared against those returned by PLSR calibrated with GSx and random sampling, which is a common sampling strategy [59,60,61], especially when an extended ground truth is available [9].

From the comparison, it emerges that random sampling is, on average, less effective than the proposed methodology, as the average RMSE values tend to be higher than those reported in Section 3. This behavior is even more evident looking at the maximum RMSE values, which are consistently higher than the ones returned using the proposed methodology. These results suggest that active sampling is beneficial, especially when the dimensionality of the problem is increased by several acquisitions, as it is able to include in the calibration set the samples most representative of the variability of the data.

As for GSx, it can be argued that its performance is equivalent to that provided by the proposed entropy-based approach. However, GSx has one important parameter that should be set by the operator, i.e., the number of samples to be selected by active learning [56]. The experiments presented here have been implemented using the same number of samples output by the proposed approach, but this number is in principle unknown. As the parameter affects the results of the calibration, this could be a serious drawback, especially in the presence of extended targets. Moreover, GSx requires the projection of data into a homogeneous feature space for the computation of the distance between samples, and this increases the computational complexity of the approach [56].

PLSR is a well-established linear statistical approach suitable for the analysis of multi-collinear spectral datasets, thus making full use of redundant information [62]. This technique has been widely exploited to estimate crop biophysical variables from hyperspectral remote sensing data [24,25,26]. However, the literature also revealed that the large amount of data usually exploited in PLSR may contain irrelevant information, which could reduce the performance of the technique [62]. To cope with this, the VIP score calculated following a tentative regression has been used as discriminator, and only the variables exhibiting a VIP value higher than one have been used for the final analysis [52].

Reference [9] allows for an analogy with the results reported here, as its authors made available the data exploited for this study. However, significant differences in the methodology are present. In reference [9], the active sampling procedure relies on a technique developed in [21], suitably reworked in order to improve the selection of the most representative samples. Two important aspects to be remarked about this framework are the necessity of supervision in active learning and the fact that it is based on predictive models already available, which are retrieved via bootstrapping [63]. In other words, the prediction model constituting the base for active sampling is built by splitting the calibration set into two parts. The first part is used to fit the prediction, while the second is used for validation. This process is repeated several times, changing the composition of the two subsets. The final model is the one delivering the lowest RMSE.

The main innovation introduced in this paper is the active learning technique which, exploiting Shannon’s entropy as diversity criterion, allows for the set-up of an unsupervised methodology suitable for the implementation of an incremental monitoring framework starting from scratch. The obtained results are fully comparable with the literature, which mainly relies on supervised techniques needing an already available model for active sampling implementation. This makes the proposed methodology very well suited for operational environments, in which predictive models are usually not available for specific crop fields and automation is highly requested.

In this context, a crucial aspect is represented by the amount of sampling necessary for the successful calibration of PLSR, as this could be a bottleneck for real-world implementation of monitoring activities. As reported in Section 3, the proposed methodology requires an initial massive sampling, as the entropy tends to increase when few calibration points are considered. However, as reported in Table 2, Table 3 and Figure 6, this behavior is strongly mitigated beginning with the second acquisition, in which the number of samples selected for calibration is almost halved, and the trend is decreasing as new data are acquired. As reported in Section 3, if the whole dataset is considered, the amount of calibration data necessary to fit the predictions is about 66% of the total.

In the literature, the amount of sampling required for the calibration of machine learning techniques for predicting vegetation traits is not always explicitly stated. However, assuming that the area of interest has been divided beforehand into sub-regions considered homogeneous with respect to the parameter to be predicted (see, as an example, Figure 1), it is reasonable to collect ground data for about 70% of them [9,64] in order to implement an effective bootstrapping. This number is comparable with the one found by implementing the proposed methodology.

The last comment concerns the prediction capability of the models, which is probably the most interesting aspect for end-users as it could remove the need for field sampling. In the previous section, the prediction at date D_i was determined using the data collected up to date D_{i−1}. The obtained results differ between dry mass and nitrogen content. In the first case, the RMSE values suggest that after a few sampling campaigns it is possible to calibrate a model able to deliver reasonable predictions, although a longer time series should be studied in order to confirm this.
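
The evaluation scheme behind the RMSE_p values, in which the model scored at one date has only seen data from earlier dates, can be sketched as below. This is an illustration under assumptions: a one-variable least-squares line replaces the PLSR model, and the three campaigns are made-up numbers rather than data from this study.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (a stand-in for PLSR)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope


def rolling_prediction_rmse(campaigns):
    """For each date i >= 1, fit on campaigns[0..i-1] and score on campaigns[i]."""
    results = []
    for i in range(1, len(campaigns)):
        past_x = [x for xs, _ in campaigns[:i] for x in xs]
        past_y = [y for _, ys in campaigns[:i] for y in ys]
        intercept, slope = fit_line(past_x, past_y)
        xs, ys = campaigns[i]
        squared = [(intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)]
        results.append(math.sqrt(sum(squared) / len(squared)))
    return results


# Hypothetical campaigns: (predictor values, measured trait) for each date.
campaigns = [
    ([0.2, 0.4, 0.6, 0.8], [1.0, 2.0, 3.0, 4.0]),
    ([0.3, 0.5, 0.7], [1.5, 2.5, 3.5]),
    ([0.1, 0.9], [0.7, 4.6]),
]
rmse_p = rolling_prediction_rmse(campaigns)
```

No value is produced for the first date, since no earlier data exist to calibrate a model; this corresponds to the "na" entries in the first rows of Table 2 and Table 3.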

As for nitrogen content estimation, less stable RMSE values have been reported. This suggests that more calibration data are needed in order to build a model able to deliver reliable predictions. As a general comment, the diagrams reported in Figure 4 and Figure 5 suggest that, following two seasons of sparse sampling using the proposed active learning methodology, it is possible to retrieve a model that fits the ground truth well. This confirms the findings of reference [9].

5. Conclusions

In this work, the characterization of vegetation biophysical parameters using unmanned aerial vehicles equipped with hyperspectral sensors has been discussed. The main innovation introduced is the active sampling technique for calibration of partial least squares regression models which, exploiting Shannon’s entropy as a diversity criterion, allows for setting up an unsupervised incremental monitoring framework starting from scratch.

The proposed methodology has been tested by exploiting a dataset involving five flights made by a remotely piloted platform equipped with a hyperspectral sensor. The obtained results concerning the estimation of the ryegrass biomass (RMSE = 2.05 t/ha) and of the nitrogen content (RMSE = 4.68 kg/ha) revealed that the delivered prediction models are suitable for the purpose, although the relatively small dataset exploited for this study should be extended in order to draw conclusions about their generalization potential. Possible instabilities can be due to low correlation between measurements and ground data and/or to the presence of nonlinearities that are difficult to model using partial least squares regression.

The amount of sampling needed is comparable with the numbers provided by the literature and therefore compatible with operational environments. In this regard, the obtained data also suggest that sampling activities can be drastically reduced as soon as the monitoring is enriched with new acquisitions, thus making the proposed framework suitable for interoperability with agricultural digital twins.

Author Contributions

Conceptualization, D.A. and F.T.; methodology, D.A.; software, D.A.; validation, D.A.; formal analysis, D.A. and L.C.; investigation, D.A. and L.C.; resources, L.C., F.T. and M.D.M.; data curation, D.A.; writing—original draft preparation, D.A., L.C. and F.T.; writing—review and editing, L.C. and M.D.M.; supervision, M.D.M. and F.T.; project administration, F.T.; funding acquisition, M.D.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been funded by the Italian Ministry of Economic Development under the aegis of the project MONICAP (“Monitoraggio di colture agricole in persistenza”).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. FAO. The State of Food and Agriculture 2022; FAO: Rome, Italy, 2022.
2. Cisternas, I.; Velásquez, I.; Caro, A.; Rodríguez, A. Systematic literature review of implementations of precision agriculture. Comput. Electron. Agric. 2020, 176, 105626.
3. Verdouw, C.; Tekinerdogan, B.; Beulens, A.; Wolfert, S. Digital twins in smart farming. Agric. Syst. 2021, 189, 103046.
4. Fuller, A.; Fan, Z.; Day, C.; Barlow, C. Digital Twin: Enabling Technologies, Challenges and Open Research. IEEE Access 2020, 8, 108952–108971.
5. Verdouw, C.; Beulens, A.; Reijers, H.; van der Vorst, J. A control model for object virtualization in supply chain management. Comput. Ind. 2015, 68, 116–131.
6. Ranghetti, M.; Boschetti, M.; Ranghetti, L.; Tagliabue, G.; Panigada, C.; Gianinetto, M.; Verrelst, J.; Candiani, G. Assessment of maize nitrogen uptake from PRISMA hyperspectral data through hybrid modelling. Eur. J. Remote. Sens. 2022.
7. Sousa, J.J.; Toscano, P.; Matese, A.; Di Gennaro, S.F.; Berton, A.; Gatti, M.; Poni, S.; Pádua, L.; Hruška, J.; Morais, R.; et al. UAV-Based Hyperspectral Monitoring Using Push-Broom and Snapshot Sensors: A Multisite Assessment for Precision Viticulture Applications. Sensors 2022, 22, 6574.
8. Matese, A.; Di Gennaro, S.F.; Orlandi, G.; Gatti, M.; Poni, S. Assessing Grapevine Biophysical Parameters From Unmanned Aerial Vehicles Hyperspectral Imagery. Front. Plant Sci. 2022, 13, 898722.
9. Franceschini, M.H.D.; Becker, R.; Wichern, F.; Kooistra, L. Quantification of Grassland Biomass and Nitrogen Content through UAV Hyperspectral Imagery—Active Sample Selection for Model Transfer. Drones 2022, 6, 73.
10. Adão, T.; Hruška, J.; Pádua, L.; Bessa, J.; Peres, E.; Morais, R.; Sousa, J.J. Hyperspectral imaging: A review on UAV-based sensors, data processing and applications for agriculture and forestry. Remote Sens. 2017, 9, 1110.
11. Weiss, M.; Jacob, F.; Duveiller, G. Remote sensing for agricultural applications: A meta-review. Remote Sens. Environ. 2020, 236, 111402.
12. De Jong, S. SIMPLS: An alternative approach to partial least squares regression. Chemom. Intell. Lab. Syst. 1993, 18, 251–263.
13. Li, Y.; Li, M.; Li, C.; Liu, Z. Forest aboveground biomass estimation using Landsat 8 and Sentinel-1A data with machine learning algorithms. Sci. Rep. 2020, 10, 7155.
14. Cho, M.A.; Skidmore, A.; Corsi, F.; van Wieren, S.E.; Sobhan, I. Estimation of green grass/herb biomass from airborne hyperspectral imagery using spectral indices and partial least squares regression. Int. J. Appl. Earth Obs. Geoinf. 2007, 9, 414–424.
15. Ramoelo, A.; Skidmore, A.; Cho, M.; Mathieu, R.; Heitkönig, I.; Dudeni-Tlhone, N.; Schlerf, M.; Prins, H. Non-linear partial least square regression increases the estimation accuracy of grass nitrogen and phosphorus using in situ hyperspectral and environmental data. ISPRS J. Photogramm. Remote Sens. 2013, 82, 27–40.
16. Yi, Q.; Jiapaer, G.; Chen, J.; Bao, A.; Wang, F. Different units of measurement of carotenoids estimation in cotton using hyperspectral indices and partial least square regression. ISPRS J. Photogramm. Remote Sens. 2014, 91, 72–84.
17. Tuia, D.; Persello, C.; Bruzzone, L. Domain Adaptation for the Classification of Remote Sensing Data: An Overview of Recent Advances. IEEE Geosci. Remote Sens. Mag. 2016, 4, 41–57.
18. Berger, K.; Caicedo, J.R.; Martino, L.; Wocher, M.; Hank, T.; Verrelst, J. A survey of active learning for quantifying vegetation traits from terrestrial earth observation data. Remote Sens. 2021, 13, 287.
19. He, T.; Zhang, S.; Xin, J.; Zhao, P.; Wu, J.; Xian, X.; Li, C.; Cui, Z. An Active Learning Approach with Uncertainty, Representativeness, and Diversity. Sci. World J. 2014, 2014, 827586.
20. Lu, X.; Zhang, J.; Li, T.; Zhang, Y. Incorporating Diversity into Self-Learning for Synergetic Classification of Hyperspectral and Panchromatic Images. Remote Sens. 2016, 8, 804.
21. Douak, F.; Melgani, F.; Benoudjit, N. Kernel ridge regression with active learning for wind speed prediction. Appl. Energy 2013, 103, 328–340.
22. Yuan, B.; Wu, Z.; Zhang, K.; Li, D.; Ma, Q. Application of Active Learning in Carbonate Lithologic Identification. In Proceedings of the 4th International Conference on Artificial Intelligence and Big Data, Chengdu, China, 28–31 May 2021; pp. 404–408.
23. Demir, B.; Persello, C.; Bruzzone, L. Batch-Mode Active-Learning Methods for the Interactive Classification of Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2011, 49, 1014–1031.
24. Kira, O.; Linker, R.; Gitelson, A. Non-destructive estimation of foliar chlorophyll and carotenoid contents: Focus on informative spectral bands. Int. J. Appl. Earth Obs. Geoinf. 2015, 38, 251–260.
25. Pan, W.-J.; Wang, X.; Deng, Y.-R.; Li, J.-H.; Chen, W.; Chiang, J.Y.; Yang, J.-B.; Zheng, L. Nondestructive and intuitive determination of circadian chlorophyll rhythms in soybean leaves using multispectral imaging. Sci. Rep. 2015, 5, 11108.
26. Yu, K.-Q.; Zhao, Y.-R.; Zhu, F.-L.; Li, X.-L.; He, Y. Mapping of Chlorophyll and SPAD Distribution in Pepper Leaves During Leaf Senescence Using Visible and Near-Infrared Hyperspectral Imaging. Trans. ASABE 2016, 59, 13–24.
27. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 623–656.
28. Suomalainen, J.; Anders, N.; Iqbal, S.; Roerink, G.; Franke, J.; Wenting, P.; Hünniger, D.; Bartholomeus, H.; Becker, R.; Kooistra, L. A Lightweight Hyperspectral Mapping System and Photogrammetric Processing Chain for Unmanned Aerial Vehicles. Remote Sens. 2014, 6, 11013–11030.
29. Capolupo, A.; Kooistra, L.; Berendonk, C.; Boccia, L.; Suomalainen, J. Estimating plant traits of grasslands from uav-acquired hyperspectral images: A comparison of statistical approaches. ISPRS Int. J. Geo-Inf. 2015, 4, 2792–2820.
30. Patra, S.; Bruzzone, L. A Fast Cluster-Assumption Based Active-Learning Technique for Classification of Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2011, 49, 1617–1626.
31. Neter, J.; Wasserman, W.; Whitmore, G.A. Applied Statistics, 4th ed.; Allyn & Bacon: Boston, MA, USA, 2000.
32. Profillidis, V.; Botzoris, G. Statistical Methods for Transport Demand Modeling. In Modeling of Transport Demand; Elsevier: Amsterdam, The Netherlands, 2019; pp. 163–224.
33. Clevers, J.G.P.W.; Kooistra, L. Using hyperspectral remote sensing data for retrieving canopy chlorophyll and nitrogen content. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 5, 574–583.
34. Roujean, J.-L.; Breon, F.-M. Estimating PAR absorbed by vegetation from bidirectional reflectance measurements. Remote Sens. Environ. 1995, 51, 375–384.
35. Guyot, G.; Baret, F. Utilisation de la Haute Resolution Spectrale pour Suivre L’etat des Couverts Vegetaux. In Spectral Signatures of Objects in Remote Sensing; European Space Agency: Paris, France, 1988; 279p.
36. Curran, P.J.; Windham, W.R.; Gholz, H.L. Exploring the relationship between reflectance red edge and chlorophyll concentration in slash pine leaves. Tree Physiol. 1995, 15, 203–206.
37. Dash, J.; Curran, P. Evaluation of the MERIS terrestrial chlorophyll index (MTCI). Adv. Space Res. 2007, 39, 100–104.
38. Daughtry, C.S.T.; Walthall, C.L.; Kim, M.S.; De Colstoun, E.B.; McMurtrey, J.E., III. Estimating Corn Leaf Chlorophyll Concentration from Leaf and Canopy Reflectance. Remote Sens. Environ. 2000, 74, 229–239.
39. Wu, C.; Niu, Z.; Tang, Q.; Huang, W. Estimating chlorophyll content from hyperspectral vegetation indices: Modeling and validation. Agric. For. Meteorol. 2008, 148, 1230–1241.
40. Haboudane, D.; Miller, J.R.; Pattey, E.; Zarco-Tejada, P.J.; Strachan, I.B. Hyperspectral vegetation indices and novel algorithms for predicting green LAI of crop canopies: Modeling and validation in the context of precision agriculture. Remote Sens. Environ. 2004, 90, 337–352.
41. Jurgens, C. The modified normalized difference vegetation index (mNDVI) a new index to determine frost damages in agriculture based on Landsat TM data. Int. J. Remote Sens. 1997, 18, 3583–3594.
42. Huete, A.R. A soil-adjusted vegetation index (SAVI). Remote Sens. Environ. 1988, 25, 295–309.
43. Huete, A.; Didan, K.; Miura, T.; Rodriguez, E.P.; Gao, X.; Ferreira, L.G. Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sens. Environ. 2002, 83, 195–213.
44. Broge, N.H.; Leblanc, E. Comparing prediction power and stability of broadband and hyperspectral vegetation indices for estimation of green leaf area index and canopy chlorophyll density. Remote Sens. Environ. 2001, 76, 156.
45. Datt, B. Remote Sensing of Water Content in Eucalyptus Leaves. Aust. J. Bot. 1999, 47, 909–923.
46. Huang, W.; Yang, Q.; Pu, R.; Yang, S. Estimation of Nitrogen Vertical Distribution by Bi-Directional Canopy Reflectance in Winter Wheat. Sensors 2014, 14, 20347–20359.
47. Gamon, J.A.; Peñuelas, J.; Field, C.B. A narrow-waveband spectral index that tracks diurnal changes in photosynthetic efficiency. Remote Sens. Environ. 1992, 41, 35–44.
48. Vincini, M.; Frazzi, E.; Alessio, P. Angular Dependence of Maize and Sugar Beet VIs from Directional CHRIS/Proba Data. In Proceedings of the 4th ESA CHRIS PROBA Workshop, online, 19–21 September 2006; pp. 19–21.
49. Main, R.; Cho, M.A.; Mathieu, R.; O’Kennedy, M.M.; Ramoelo, A.; Koch, S. An investigation into robust spectral indices for leaf chlorophyll estimation. ISPRS J. Photogramm. Remote Sens. 2011, 66, 751–761.
50. Gonenc, A.; Ozerdem, M.S.; Acar, E. Comparison of NDVI and RVI Vegetation Indices Using Satellite Images. In Proceedings of the 2019 8th International Conference on Agro-Geoinformatics (Agro-Geoinformatics), Istanbul, Turkey, 16–19 July 2019; pp. 1–4.
51. Vogelmann, J.E.; Rock, B.N.; Moss, D.M. Red edge spectral measurements from sugar maple leaves. Int. J. Remote Sens. 1993, 14, 1563–1575.
52. Farrés, M.; Platikanov, S.; Tsakovski, S.; Tauler, R. Comparison of the variable importance in projection (VIP) and of the selectivity ratio (SR) methods for variable selection and interpretation. J. Chemom. 2015, 29, 528–536.
53. Cocchi, M.; Biancolillo, A.; Marini, F. Chemometric Methods for Classification and Feature Selection. Compr. Anal. Chem. 2018, 82, 265–299.
54. Wold, S.; Sjostrom, M.; Eriksson, L. PLS-Regression: A Basic Tool of Chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109–130.
55. Molinaro, A.M.; Simon, R.; Pfeiffer, R.M. Prediction error estimation: A comparison of resampling methods. Bioinformatics 2005, 21, 3301–3307.
56. Wu, D.; Lin, C.-T.; Huang, J. Active learning for regression using greedy sampling. Inf. Sci. 2019, 474, 90–105.
57. Wang, C.; Nie, S.; Xi, X.; Luo, S.; Sun, X. Estimating the Biomass of Maize with Hyperspectral and LiDAR Data. Remote Sens. 2016, 9, 11.
58. Oliveira, R.A.; Näsi, R.; Niemeläinen, O.; Nyholm, L.; Alhonoja, K.; Kaivosoja, J.; Jauhiainen, L.; Viljanen, N.; Nezami, S.; Markelin, L.; et al. Machine learning estimators for the quantity and quality of grass swards used for silage production using drone-based imaging spectrometry and photogrammetry. Remote Sens. Environ. 2020, 246, 111830.
59. Tian, Q.; Gong, P.; Zhao, C.; Guo, X. A feasibility study on diagnosing wheat water status using spectral reflectance. Chin. Sci. Bull. 2001, 46, 666–669.
60. Zhao, H.-S.; Zhu, X.-C.; Li, C.; Wei, Y.; Zhao, G.-X.; Jiang, Y.-M. Improving the Accuracy of the Hyperspectral Model for Apple Canopy Water Content Prediction using the Equidistant Sampling Method. Sci. Rep. 2017, 7, 11192.
61. Verrelst, J.; Berger, K.; Rivera-Caicedo, J.P. Intelligent Sampling for Vegetation Nitrogen Mapping Based on Hybrid Machine Learning Algorithms. IEEE Geosci. Remote Sens. Lett. 2020, 18, 2038–2042.
62. Jin, J.; Wang, Q. Evaluation of informative bands used in different pls regressions for estimating leaf biochemical contents from hyperspectral reflectance. Remote Sens. 2019, 11, 197.
63. Afanador, N.; Tran, T.; Buydens, L. An assessment of the jackknife and bootstrap procedures on uncertainty estimation in the variable importance in the projection metric. Chemom. Intell. Lab. Syst. 2014, 137, 162–172.
64. Schucknecht, A.; Seo, B.; Krämer, A.; Asam, S.; Atzberger, C.; Kiese, R. Estimating dry biomass and plant nitrogen concentration in pre-Alpine grasslands with low-cost UAS-borne multispectral data—A comparison of sensors, algorithms, and predictor sets. Biogeosciences 2022, 19, 2699–2727.

Figure 1. An orthomosaic of the study area. Colored rectangles indicate the plots. Each color refers to a fertilization treatment.

Figure 2. Proposed workflow. The active sampling procedure is fed by any observation available. Regression is implemented using all the data available at time $t_{i}$ to generate the prediction of the biophysical variable of interest.

Figure 3. Schematic representation of the proposed active sampling methodology.

Figure 4. Scatter plots of the predicted dry mass against ground data. (a) D₁ (R² = 0.831), (b) D₂ (R² = 0.002), (c) D₃ (R² = 0.350), (d) D₄ (R² = 0.485), (e) D₅ (R² = 0.267), (f) all data (R² = 0.923).

Figure 5. Scatter plots of the predicted nitrogen content against ground data. (a) D₁ (R² = 0.876), (b) D₂ (R² = 0.307), (c) D₃ (R² = 0.601), (d) D₄ (R² = 0.605), (e) D₅ (R² = 0.252), (f) all data (R² = 0.884).

Figure 6. Linear correlation coefficient trends against the available ground data for (a) dry mass and (b) nitrogen content. Indices on the x-axis from 1 to 101 refer to spectral bands. Those ranging from 102 to 141 correspond to the vegetation indices listed in Table 1.

Figure 7. Schematic representation of the study area highlighting the samples selected by the proposed active learning technique. Each rectangle represents a plot. It is divided in 5 parts representing an acquisition date. If the part is colored in green, the sample is selected for regression. (a) Sampling schema for dry mass estimation. (b) Sampling schema for N-uptake estimation.

Table 1. Hyperspectral vegetation indices used in this work as regression variables. In the formulas, R_n denotes the reflectance of the band centered at wavelength n, expressed in nanometers. Indexing starts from 102, as band position indices from 1 to 101 are reserved for hyperspectral measurements.

Index	Full Name	Formula	Reference
102	Green normalized difference vegetation index	${GNDVI}_{1} = \frac{R_{850} - R_{540}}{R_{850} + R_{540}}$	[8]
103	Green normalized difference vegetation index	${GNDVI}_{2} = \frac{R_{780} - R_{550}}{R_{780} + R_{550}}$	[8]
104	Renormalized difference vegetation index	$RDVI = \frac{R_{800} - R_{670}}{{(R_{800} + R_{670})}^{0.5}}$	[34]
105	Red edge Position Index	${REP}_{670, 700} = 700 + 40 \frac{\frac{R_{670} + R_{780}}{2} - R_{700}}{R_{740} - R_{700}}$	[35]
106	Red edge Position Index	${REP}_{670, 850} = 700 + 45 \frac{\frac{R_{670} + R_{778}}{2} - R_{850}}{R_{735} - R_{695}}$	[36]
107	MERIS terrestrial chlorophyll index	$MTCI = \frac{R_{754} - R_{709}}{R_{709} + R_{681}}$	[37]
108	Modified chlorophyll absorption ratio index	${MCARI}_{700, 670} = [(R_{700} - R_{670}) - 0.2 (R_{700} - R_{550})] \frac{R_{700}}{R_{670}}$	[38]
109	Modified Chlorophyll Absorption Reflectance Index/Optimized Soil Adjusted Vegetation Index	$\frac{{MCARI}_{700, 670}}{{OSAVI}_{800, 670}} = \frac{[(R_{700} - R_{670}) - 0.2 * (R_{700} - R_{550})] \frac{R_{700}}{R_{670}}}{\frac{(1 + 0.6) (R_{800} - R_{670})}{R_{800} + R_{670} + 0.16}}$	[33]
110	Transformed Chlorophyll Absorption Ratio Optimized Soil Adjusted Vegetation Index	$\frac{{TCARI}_{700, 670}}{{OSAVI}_{800, 670}} = \frac{3 ((R_{700} - R_{670}) - 0.2 (R_{700} - R_{550}) \frac{R_{700}}{R_{670}})}{\frac{(1 + 0.6) (R_{800} - R_{670})}{R_{800} + R_{670} + 0.16}}$	[33]
111	Transformed Chlorophyll Absorption Ratio	${TCARI}_{700, 670} = 3 ((R_{700} - R_{670}) - 0.2 (R_{700} - R_{550}) \frac{R_{700}}{R_{670}})$	[38]
112	Modified Chlorophyll Absorption Ratio Index	${MCARI}_{750, 705} = (((R_{750} - R_{705}) - 0.2 * (R_{750} - R_{550})) \frac{R_{750}}{R_{705}})$	[39]
113	Modified Chlorophyll Absorption Reflectance Index/Optimized Soil Adjusted Vegetation Index	$\frac{{MCARI}_{750, 705}}{{OSAVI}_{750, 705}} = \frac{(((R_{750} - R_{705}) - 0.2 * (R_{750} - R_{550})) \frac{R_{750}}{R_{705}})}{\frac{(1 + 0.6) * (R_{750} - R_{705})}{R_{750} + R_{705} + 0.16}}$	[39]
114	Transformed Chlorophyll Absorption Ratio	${TCARI}_{750, 705} = 3 ((R_{750} - R_{705}) - 0.2 (R_{700} - R_{550}) \frac{R_{750}}{R_{705}})$	[40]
115	Transformed Chlorophyll Absorption Ratio/Optimized Soil Adjusted Vegetation Index	$\frac{{TCARI}_{750, 705}}{{OSAVI}_{750, 705}} = \frac{3 ((R_{750} - R_{705}) - 0.2 (R_{700} - R_{550}) (\frac{R_{750}}{R_{705}}))}{\frac{(1 + 0.6) (R_{750} - R_{705})}{R_{750} + R_{705} + 0.16}}$	[39]
116	Chlorophyll index red edge	${CI}_{RE} = (\frac{R_{780}}{R_{710}}) - 1$	[33]
117	Chlorophyll index green	${CI}_{G} = (\frac{R_{780}}{R_{550}}) - 1$	[33]
118	Normalized Difference Vegetation Index	${NDVI}_{850, 660} = \frac{R_{850} - R_{660}}{R_{850} + R_{660}}$	[7]
119	Normalized Difference Vegetation Index	${NDVI}_{835, 660} = \frac{R_{835} - R_{660}}{R_{835} + R_{660}}$	[7]
120	Normalized Difference Vegetation Index	${NDVI}_{775, 670} = \frac{R_{775} - R_{670}}{R_{775} + R_{670}}$	[7]
121	Modified Normalized Difference Vegetation Index	$mNDVI = \frac{R_{775} + R_{670}}{R_{775} + R_{670}}$	[41]
122	Soil Adjusted Vegetation Index	$SAVI = \frac{(1 + 0.5) (R_{802} - R_{660})}{(R_{802} + R_{660} + 0.5)}$	[42]
123	Renormalized Difference Vegetation Index	$REDVI = \frac{R_{800} - R_{670}}{\sqrt{R_{800} + R_{670}}}$	[40]
124	Normalized Difference Red Edge Index	${NDRE}_{720, 795} = \frac{R_{795} - R_{720}}{R_{795} + R_{720}}$	[43]
125	Normalized Difference Red Edge Index	${NDRE}_{750, 770} = \frac{R_{770} - R_{750}}{R_{770} + R_{750}}$	[8]
126	NIR—red edge—red normalized difference vegetation index	$NRER = \frac{R_{850} - R_{695}}{R_{695} + R_{660}}$	[8]
127	Transformed Vegetation Index	$TVI = 0.5 (120 (R_{750} - R_{550}) - 200 (R_{670} - R_{550}))$	[44]
128	Modified triangular vegetation index	$MTVI = 1.2 [1.2 (R_{800} - R_{450}) - 2.5 (R_{600} - R_{540})]$	[8]
129	Enhanced Vegetation Index	${EVI}_{850, 660} = 2.5 \frac{R_{850} - R_{660}}{R_{850} + 6 R_{660} - 7.5 R_{505} + 1}$	[43]
130	Enhanced Vegetation Index	${EVI}_{800, 670} = 2.5 \frac{R_{800} - R_{670}}{R_{800} + 6 R_{670} - 7.5 R_{508} + 1}$	[8]
131	Leaf Chlorophyll Index	$LCI = \frac{R_{850} - R_{710}}{R_{850} + R_{680}}$	[45]
132	Modified Normalized Difference Vegetation Index	${MTCI}_{var} = \frac{R_{850} - R_{680}}{R_{680} + R_{660}}$	[8]
133	Nitrogen Reflectance Index	$NRI = \frac{R_{555} - R_{550}}{R_{555} + R_{550}}$	[46]
134	Photochemical Reflectance Index	$PRI = \frac{R_{570} - R_{530}}{R_{570} + R_{530}}$	[47]
135	Spectral Polygon Vegetation Index	$SPVI = 0.4 (3.7 (R_{800} - R_{670}) - 1.2 (R_{530} - R_{670}))$	[48]
136	Simple Ratio	${SR}_{710} = \frac{R_{750}}{R_{710}}$	[49]
137	Simple Ratio	${SR}_{680} = \frac{R_{800}}{R_{680}}$	[49]
138	Ratio Vegetation Index	$RVI = \frac{R_{810}}{R_{660}}$	[50]
139	Vogelmann Index	$VOG = \frac{R_{745}}{R_{720}}$	[51]
140	Gitelson and Merzlyak index	$GM = \frac{R_{750}}{R_{550}}$	[8]
141	Modified normalized difference	$MND = \frac{R_{750} - R_{705}}{R_{750} + R_{705} + 2 R_{508}}$	[8]
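
As an illustration of how entries of Table 1 translate into computations, the sketch below evaluates three of the listed indices (NDVI_{850, 660}, SAVI and SR_{680}) from a reflectance spectrum. The dictionary layout, the function names and the reflectance values are assumptions made for this example; with a real sensor, the band closest to each nominal wavelength would have to be selected first.

```python
def ndvi(reflectance, nir=850, red=660):
    """NDVI_{850,660} from Table 1: (R_nir - R_red) / (R_nir + R_red)."""
    return (reflectance[nir] - reflectance[red]) / (reflectance[nir] + reflectance[red])


def savi(reflectance, nir=802, red=660, soil_factor=0.5):
    """SAVI from Table 1, with the 0.5 soil adjustment factor."""
    numerator = (1 + soil_factor) * (reflectance[nir] - reflectance[red])
    return numerator / (reflectance[nir] + reflectance[red] + soil_factor)


def simple_ratio(reflectance, numerator_band=800, denominator_band=680):
    """SR_680 from Table 1: R_800 / R_680."""
    return reflectance[numerator_band] / reflectance[denominator_band]


# Hypothetical canopy reflectances keyed by band center wavelength (nm).
spectrum = {660: 0.05, 680: 0.04, 800: 0.44, 802: 0.45, 850: 0.45}

indices = {
    "NDVI_850_660": ndvi(spectrum),
    "SAVI": savi(spectrum),
    "SR_680": simple_ratio(spectrum),
}
```

Each index computed this way becomes one additional regression variable alongside the 101 spectral bands.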

Table 2. Dry mass estimation results using the proposed methodology, the greedy sampling (GSx) and the random sampling approaches. The first five columns express, for each date, the number of samples exploited for PLSR model calibration, with the red-framed cell indicating the prediction date. Results are expressed in terms of the root mean squared error RMSE. The values RMSE_p refer to predictions made using a model calibrated with data acquired up to D_{i−1}. The last column of the table refers to the average dry mass for each date. D₁: 15 May 2014, D₂: 14 October 2014, D₃: 9 May 2017, D₄: 29 August 2017, D₅: 26 October 2017.

						Proposed		Random Sampling		GSx
D₁	D₂	D₃	D₄	D₅	Total	RMSE (t/ha)	RMSE_p (t/ha)	RMSE_max (t/ha)	RMSE_mean (t/ha)	RMSE (t/ha)	M_mean (t/ha)
48					48	0.14	na	9.93	2.70	0.14	33.5
48	27				75	4.97	62.3	5.55	3.77	3.91	16.2
48	27	28			103	2.07	196	4.70	2.38	2.39	11.0
48	27	28	24		127	1.38	20.9	2.83	2.26	1.35	10.2
48	27	28	24	21	148	1.69	11.7	3.55	2.70	2.03	9.50

Table 3. Nitrogen content estimation results using the proposed methodology, the greedy sampling (GSx) and the random sampling approaches. The first five columns express, for each date, the number of samples exploited for PLSR model calibration, with the red-framed cell indicating the prediction date. Results are expressed in terms of the root mean squared error RMSE. The values RMSE_p refer to predictions made using a model calibrated with data acquired up to D_{i−1}. The column named RMSE* reports values calculated on data averaged per treatment, in analogy with the study implemented in [9]. The last column of the table refers to the average nitrogen content for each date. D₁: 15 May 2014, D₂: 14 October 2014, D₃: 9 May 2017, D₄: 29 August 2017, D₅: 26 October 2017.

						Proposed			Random Sampling		GSx
D₁	D₂	D₃	D₄	D₅	Total	RMSE (kg/ha)	RMSE* (kg/ha)	RMSE_p (kg/ha)	RMSE_max (kg/ha)	RMSE_mean (kg/ha)	RMSE (kg/ha)	N_mean (kg/ha)
47					47	1.10	2.96	na	13.4	4.96	1.70	43.4
47	27				74	8.84	7.22	93.1	9.71	7.03	8.74	37.2
47	27	28			102	4.63	2.65	265	7.67	4.71	3.87	23.8
47	27	28	26		128	6.87	6.16	48.8	10.8	7.93	5.82	29.8
47	27	28	26	20	148	4.56	4.43	56.5	10.5	7.34	6.99	23.8

Table 4. Variables used for regression determined via VIP scoring. N_VIP stands for the number of VIP variables.

	Dry Mass		Nitrogen
	N_VIP	VIP	N_VIP	VIP
D₁	7	B112 (0.780), B113 (0.742), B114 (0.633), B115 (0.439), B127 (0.439), B128 (0.474), B135 (0.591)	8	B100 (0.732), B101 (0.754), B109 (−0.293), B112 (0.626), B113 (0.667), B114 (0.901), B115 (0.802), B127 (0.610)
D₂	5	B101 (0.158), B112 (0.065), B113 (0.115), B115 (0.266), B127 (0.333)	9	B65 (0.685), B99 (0.343), B100 (0.130), B101 (0.182), B112 (0.135), B113 (0.263), B114 (0.253), B115 (0.345), B127 (0.519)
D₃	9	B63 (0.774), B64 (0.699), B108 (0.028), B109 (−0.134), B112 (0.432), B113 (0.358), B114 (−0.543), B115 (−0.427), B127 (0.183)	11	B63 (0.780), B64 (0.767), B69 (0.719), B100 (0.255), B108 (−0.080), B109 (−0.242), B112 (0.526), B113 (0.441), B114 (−0.381), B115 (−0.243), B127 (0.232)
D₄	9	B63 (0.545), B64 (0.588), B108 (0.249), B109 (0.192), B112 (0.728), B113 (0.726), B114 (0.134), B115 (0.085), B127 (0.658)	10	B63 (0.435), B64 (0.477), B69 (0.527), B108 (0.136), B109 (0.080), B112 (0.669), B113 (0.661), B114 (0.233), B115 (0.187), B127 (0.579)
D₅	9	B63 (0.474), B64 (0.469), B108 (0.189), B109 (0.124), B112 (0.287), B113 (0.259), B114 (−0.168), B115 (−0.232), B127 (0.215)	11	B57 (0.278), B58 (0.420), B63 (0.547), B64 (0.538), B108 (0.253), B109 (0.190), B112 (0.340), B113 (0.317), B114 (−0.211), B115 (−0.280), B127 (0.280)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
