1. Introduction
The importance of robust retrieval of crop traits in the agricultural sector has been endorsed by the Global Climate Observing System and reported in IPCC Sixth Evaluation [
1]. Demand for food will increase in the future in order to feed a constantly growing human population. Currently, the total global population is around 7.86 billion and the Food and Agriculture Organization of the United Nations (FAO) estimates that it will increase to 11.2 billion by the end of the 21st century. To meet estimated world nutrition needs, it is necessary to boost crop production. However, climatic change is likely to cause a reduction in crop productivity in terms of quality and quantity, resulting in a progressive burden on the ecosystem [
2]. Thus, intervention in agriculture is necessary in order to obtain not only high quantity yields but also high-quality yields while remaining economically perspective [
3].
To maintain crop productivity, croplands should be monitored during the growth cycle, noting the changes in crop status that are manifested in biophysical (e.g., leaf area index (LAI), fractional vegetation cover (fCover)) and biochemical (e.g., chlorophyll content (CC) at leaf (LCC) and canopy (CCC) levels) variables. Where an abatement in yield can be anticipated, for example, due to climate change, feasible measures must be undertaken to avoid nutrient deficiencies in the population [
4]. A precise management approach needs to be adopted by providing accurate supplies of fertilizers (organic or non-organic), pesticides, and irrigation taking into account inter- and intra-field variability in crops [
5]. This method is known as precision agriculture (PA), and endeavors to reduce and optimize the cost of inputs by using multiple sources of data informing about spatial and temporal variability of the crops and soil sites [
6].
For providing such data in a continuous and non-destructive way, remote sensing (RS) technology provides an affordable and environmentally friendly tool to assess crop area, type, and condition (e.g., diseases, plant pests, and plant stress states) in near-real time along the season [
7]. In this context, the continuous growth of space-based platforms, such as airborne, unmanned aerial vehicles (UAV), and satellite remote sensing, along with in situ observation, creates the potential to offer reliable and repeatable coverage of Earth observation datasets. These will support and help farmers as well as national and international ministries of agriculture for management and decision-making. In optical RS, the visible near-infrared spectrum (VNIR) and short-wave infrared (SWIR) are often used to record information about the state and dynamics of vegetation. This information is acquired by the sensors that are implemented, making it possible for multispectral and hyperspectral imaging [
8]. Each of these images has different spatial, spectral, and temporal resolutions. Multispectral imagery (e.g., Landsat, Sentinel 2, and SPOT) contains a limited number of spectral bands. On the other hand, hyperspectral images (e.g., PROBA-1, HyspIRI, Hyperion, and CASI) provide multiple bands that help detect subtle variations of ground covers and their changes over time [
9]. Therefore, the performance of hyperspectral data is outperformed in monitoring crop traits and has attracted researchers, e.g., estimating LAI [
10], fCover [
11,
12,
13], and CC [
14,
15].
Since the advent of RS science, a diversity of retrieval methods has been employed to link spectral reflectance with biophysical and biochemical traits. The earliest approaches were (i) variable data-driven, i.e., empirical statistical approaches from a practical experiment in 1970, and (ii) radiometric data-driven, also referred to as physically-based approaches or radiative transfer modeling (RTM), appearing in the 1980s [
16,
17]. During the last two decades, hybrid retrieval approaches have paved the road to make the use of both fundamental approaches (variable and radiometric data-driven) in a synergistic way. Verrelst et al. [
18] and Verrelst et al. [
19] updated the actual taxonomy of the retrieval methods into four main groups: (1) parametric regression methods follow an empirical statistical approach, which postulates the explicit relationship between the spectra bands as predictors and the interested canopy variable as the dependent variable. (2) Nonparametric regression methods are preferred for use due to fewer limitations in using the number of spectrum bands and in the type of data distribution as compared to the previous method [
20]. The non-explicit relationship between the feature spectrum and the target parameters is assumed, which means that the relationship forum is not predetermined. This method is divided into two classes: linear and nonlinear regression models. (3) Physically-based canopy trait retrieval method is grounded by radiative transfer theory, which is a mathematical model that describes the interaction between solar radiation and vegetation canopies. Over the past four decades, a diversity of vegetation radiative transfer models (RTM) have been developed, simulating the optical properties and radiative interactions of leaves, canopies, and soil (e.g., see [
21] for a comparison). (4) Hybrid methods are the combination of physically-based canopy trait retrieval methods with data-driven models (parametric and nonparametric).
A retrieval approach is used to model Earth observation (EO) data. With the advancement of remote sensing tools, the acquisition of EO data has increased the data archive beyond dozens of 65 petabytes [
22]. It increases the dimensions of the data structure in terms of spatial, spectral, and temporal resolution. Thus, these big data require computational power to analyze, process, and create large-scale crop mapping [
23,
24]. The developed method of hybrid retrieval has the ability to speed up the processing chain and find complex relationships between the canopy reflectance and the variable of interest to obtain useful information about crop traits [
22]. In addition, thanks to the synergistic use of both mechanical (RTM) and data-driven methods (either parametric or nonparametric), the ground in situ data are not urgently required in the simulation process but can mainly be used for validation. Different studies have proved the efficiency of using such a method in terms of the accuracy of estimates and mapping crop variables at local and global scales [
25,
26,
27,
28,
29,
30,
31].
Indeed, the terminology of a hybrid approach is widely denominated when combining at least two methods, as claimed in the study of [
32]. For instance, the hybrid method can be a combination of a crop growth model (CGM) with a canopy RTM [
33,
34], a geometric and a turbid medium model to represent the canopy in three-dimensional (3D) space RTM [
35], a geostatistical method with machine learning (ML) [
36], or blending two methods of MLs [
37]. Verrelst et al. [
18] and Berger et al. [
32] exemplified the conceptual framework of retrieval strategies, including the hybrid method based on a combination of RTMs with MLs for estimating vegetation traits. However, some studies used vegetation indices instead of ML to retrieve the targeted variable(s) based on the simulated spectra of the RTM [
38,
39]. Hence, there is no consensus about the exact terminology of a hybrid retrieval method, i.e., the types of methods that may contribute to quantifying the vegetation properties of interest.
Therefore, there is still a need to clarify the conceptual framework for the hybrid retrieval methods, including a description of the basic idea of both approaches, parametric and nonparametric, based on radiative transfer model simulation. To date, no review paper has been devoted to depicting this optical retrieval method in a detailed manner, although a variety of retrieval approaches (such as RTM and empirical statistical model using them as an independent unit) are available to determine the essential vegetation characteristics in the agriculture application. The review paper aims to provide a comprehensive overview of the application of hybrid retrieval methods in the field of quantitative RS, exemplarily, for retrieving LAI, fCover, and CC. Our paper is structured into several main aspects as follows: In
Section 2, the state-of-the-art hybrid approaches are explained by providing the conceptual framework.
Section 3 describes techniques that handle the simulated spectra obtained from the radiative transfer model. The scientific literature over the past two decades (from 2000 to 2022) was analyzed, as illustrated in
Section 4.
Section 5 and
Section 6 are devoted to presenting the results in the form of bar and pie charts and drawing the conclusion and future perspective.
2. The Conceptual Frameworks of Hybrid Retrieval Methods
The foundation of optical remote sensing is radiative transfer models (RTMs), which describe the interaction of matter and electromagnetic radiation [
40]. RTM (deterministic models) is an effective tool for precise retrievals of Earth attributes from satellite data and is used in a variety of contexts, including the calibration of radiometric sensors, atmospheric correction, and the modeling radiation processes in vegetation canopies [
22]. In canopy radiative transfer models, the link between leaf and canopy parameters and reflectance, absorbance, and scattering mechanisms has been investigated [
40,
41,
42,
43]. According to the complex structure of plants, various types of canopy RTMs have been proposed, starting from the simple turbid medium model (1D) to the advanced Monte Carlo model, which allows a clear representation of complex 3D canopy structures [
21,
44,
45,
46]. The most established method of canopy modeling is to calculate the reflectance from the top of the canopy (TOC) by coupling the leaf optical properties model and the canopy reflectance model to the soil reflectance model (e.g., PROSAIL [
47]). This aids to investigate how other disturbing elements, such as soil background, nonphotosynthetic materials, and observation geometry, affect canopy reflectance [
48]. When combining the canopy reflectance models with the atmosphere, the radiance from the top of the atmosphere (TOA), which is detected by the sensor, can be computed [
41,
49]. To estimate leaf traits and canopy properties either from TOC reflectance or TOA radiance data, the retrieval method as the core of a retrieval system is needed [
16].
The hybrid retrieval method, as the main focus of this research, is to link both models of canopy radiative transfer with the data-driven model. Analytically, a hybrid retrieval workflow consists of two parts: first, establishing the lookup table (LUT database) for simulating canopy spectra based on the predefined input parameters of RTM, and second, applying a data-driven model (statistical methods) to simulations to estimate crop traits. These simulations are determined according to spectral configuration for the particular sensor. To search for the optimal simulated spectra closed to the measured spectra, two methods are available to use, either the parametric or nonparametric regression method. Selecting either of them depends on the types of sensors in remote sensing used. Many studies prefer to use a nonparametric method with hyperspectral data due to its ability to handle the high dimensions of the spectra and resistance against noise and spectrum uncertainty [
50]. This method is distinguished by its outstanding predictive power and can be efficiently applied over the entire satellite images at a global or local scale to map the functional traits of plants. This is not the case for the parametric method when using multispectral data. Any of the retrieval methods, whether parametric or nonparametric, can apply to these data, and this depends on the objective of the study.
The benefit of hybrid retrieval methods is that they mimic a wide range of land cover scenarios (up to hundreds of thousands), resulting in a data collection far larger than what can be obtained during a field study [
20]. Hence, a large amount of in situ data is not required; only a few samples are needed to validate the estimations. Keep in mind that the developed retrieval method does not mitigate the nature of the ill-posed problem, which often renders unstable results and uncertainties of results. To alleviate this problem, prior information related to the correlation and distribution of the canopy variables [
51,
52] and/or use additional information of spatial, temporal [
53,
54,
55], both of them, or multi-angular observation data [
56] were integrated with LUT approach in the RTM.
Figure 1 shows an overview of the hybrid retrieval methodology, including regression models. They are described in the following
Section 2.1 and
Section 2.2.
2.1. Hybrid Modeling Based on Parametric Regression Methods
Over the last two decades, parametric models based on physical models have been pronounced in the science of vegetation analysis to obtain universal indices applicable under different environmental conditions. Given the various scenarios of synthetic canopy spectra and their corresponding canopy characteristics, a hybrid model can be utilized to create a new index or optimize and evaluate the robustness of vegetation indices (VIs), shape indices, and spectral transformation techniques. In general, these techniques create a regression model in which a few spectral bands with high sensitivity are selected for the variable of interest. In the hybrid retrieval method, the generated regression model is applied to simulation and experimental data. Then, the cost function is used to reduce the discrepancy between the observed indicator and the simulation. To select the best VIs for predicting canopy characteristics, the curve-fitting models are used to construct the relationship between the targeted variable and index. These models can be linear, exponential, power, logarithmic, or polynomial regression. Finally, the implementation of a validation procedure using empirical measurements is a necessary and final step to verify the accuracy of the estimation of the variable of interest.
Table 1 describes the advantages and limitations attached to a parametric method that researchers need to be aware of. The procedure of the hybrid method based on the parametric method is illustrated in
Figure 2 below, along with an overview of the most popular parametric regression techniques.
2.1.1. Vegetation Indices
Vegetation index (VI), which was developed in the 1970s, is a mathematical combination of surface reflectance from multiple bands to refine information about canopy attributes, reducing susceptibility to confound influences such as soil background, illumination geometry, and atmospheric conditions [
57,
58,
59]. The observations from several spectral bands are transformed to provide a single value of VI. These numerical transformations, which are semi-analytical measurements of vegetative activity, have been shown to differ considerably not just with seasonal variation in green foliage but also throughout space, making them valuable for identifying within-field spatial variability (i.e., in precision farming application) [
60].
VIs can be calculated from either broadband spectra (more than 50 nm intervals) or narrowband spectra (5–10 nm intervals) [
61,
62]. In broadband VIs computed from multispectral data, VIs intend to study the spectral properties of vegetation in both the visible and near-infrared regions of the spectrum. The spectral response of vegetation within the red domain is powerfully related to the amount and concentration of photosynthetic pigments such as chlorophyll concentration, whereas the spectral response within the near-infrared region is controlled by leaf structural characteristics (e.g., LAI and fCover). Since hyperspectral narrow band data can separate and characterize the canopy, narrow-band vegetation indices are recommended for use by finding the most effective combination between spectral bands. Many VIs have been developed moving from two bands (e.g., Normalized Difference Vegetation Index (NDVI)) toward four spectral bands (e.g., the transformed chlorophyll absorption in reflectance index/the optimized soil adjusted vegetation index (TCARI/OSAVI)) [
63]. The accuracy of estimates using VIs as model inputs can be affected if the study does not identify an appropriate index through model inversion [
64]. Further information about this technique presenting the general concept of VIs can be found in these studies: [
65,
66,
67] for LAI, ref. [
68] for fCover, and [
69] for CC.
2.1.2. Shape Indices
As an alternative to classical VIs, shape indices have been investigated to enhance absorption features present in vegetation spectra since the advent of hyperspectral data. One of the most common calculations used in the category of shape indices, the red edge position (REP), is a significant feature for detecting the variations of crop variables [
70,
71,
72]. It is defined as the maximum first derivative of the spectrum between the red and NIR domains [
73,
74]. This region was often used to infer crop characteristics such as LAI and CC [
75,
76]. Recently, it was discovered that the indices created by the red edge (RE) bands (680–780 nm) are useful to enhance the precision of the estimates [
76,
77,
78,
79]. The extraction of REP parameters from various sources of spectral data has resulted in the development of a number of techniques, such as maximum first derivative (MFD) [
80,
81], the polynomial fitting (PF) technique [
82], the inverted Gaussian (IG) technique [
83], and the linear extrapolation (LE) technique [
84]. Cui et al. [
85] succeeded in increasing the accuracy of predicated LCC by proposing a new VI called red edge chlorophyll absorption index (RECAI) and integrated it with classical VI (TVI).
Besides REP, there are other methods of calculation indices that depend on the derivative-based VIs (D_VI) and integration (I_VI) [
86]. Both methods convert the original spectrum band for any spectral region, including the red edge band, into an index. In the D_VI, the slope and first and second derivative curves of spectral reflectance are determined instead of using reflectance values, while in the I_VI, the integration of the spectral regions at the visible wavelengths and the red edge is used to normalize the vegetation index. Further details can be found in [
72,
87,
88,
89]. Indices based on derivative spectra have been demonstrated to be more successful than reflectance-based indices because they principally reduce background signals and separate overlapping spectra using a variety of differentiation techniques [
61]. Some studies performed a systematic evaluation between conventional VI and derivative-based indices, and the results confirmed that it is not necessary to see such improvement when using a derivative-based index [
86,
90]. In the study of [
91], the double-peak nitrogen index (DCNI I) was the best for estimating chlorophyll content, resulting in the ability to assess nitrogen content. For LAI, Qiu et al. [
92] constructed new derivative parameters of NDVI to improve the estimation accuracy. Compared to the derivative-based index approach, the integration-based indices are utilized for retrieving leaf chlorophyll (LCC) [
89,
93,
94].
2.1.3. Spectral Transformations
In addition to the previous techniques, continuum removal and wavelet transform methods are developed for airborne or satellite hyperspectral imaging instruments. Kokaly and Clark [
95] tested the potential of continuum removal (CR), a technique that is frequently employed in geology, using dried leaf specimens, where broad absorption characteristics in dry leaf spectra were subjected to CR, and absorption-band depths in relation to the continuum were computed. Each absorption feature’s band depths were normalized. The depth at the feature’s center and the region under the band depth curve were used to examine normalization. In other words, CR normalizes the reflectance spectrum by comparing different absorption properties related to vegetation characteristics with a common baseline [
96,
97,
98]. For instance, in the study of [
99], the Chlorophyll Absorption Continuum Index (CACI) was developed and calculated, based on computing the area under the spectral curve between 550 and 730 nm. Other studies also used this technique for enhancing the accuracy of crop traits (LAI, nitrogen, and chlorophyll) [
97,
100,
101]. The wavelet transform (WT) method is a viable method for analyzing the spectrum that converts the original reflectance spectrum into coefficients resolving at high scales (e.g., small narrow bandwidth absorption features) and low scales (e.g., broad absorption features). The discrete wavelet transform (DWT) and the continuous wavelet transform (CWT) are methods utilized to extract spectral features, whereby using one of these methods, the optimum number of wavelet coefficients associated with a particular type of spectral feature is determined [
102]. Different studies focus on the WT method for estimating LAI [
103,
104], chlorophyll content [
105,
106], fCover [
107], and nitrogen [
108].
2.2. Hybrid Approach Based on Nonparametric Methods
The nonparametric methods have recently gained prominence in the era of free Earth observation (EO) data streams. In practice, the LUT databases, including pairs of simulation and canopy parameters, are used to fit linear or nonlinear nonparametric regression formulas, and the fitted equation is then utilized for estimating land surface parameters. This is performed after the training data and testing data have been prepared. This is a critical step for developing generic and robust hybrid models and is a typical application of the supervised learning model. The learning process successfully lies in the ability to minimize the error of the training sets and improve the accuracy over iterations [
109]. To assess the generalizability of the regression model, the model should be tested based on independent (unseen) datasets to ensure full interpretation of the spectral variance in the optical remote sensing image that is reflected in the accuracy of the plant characteristics of interest. Lastly, the results (estimated canopy characteristics of interest) from the successful trained model should be validated with the ground data (
Figure 3).
Table 2 displays the advantages and limitations and caveats of the nonparametric method.
2.2.1. Linear Nonparametric Regression Methods
Linear regression applied to optical data deals with more than one single explanatory variable, called a regressor (X), for a regression model to determine the response variable (Y) while keeping the assumption of linearity. Stepwise multiple linear regression (SMLR), principal component analysis (PCA), and partial least square error (PLSR) are the most popularized methods used in the 1980s, as compared to ridge regression (RR) and least absolute shrinkage and selection operator (LASSO). These methods have been adopted from simple linear regression.
Table 3 presents the comparison of pros and cons of different model representations.
Stepwise multiple linear regression
Stepwise multiple linear regression (SMLR) is a way to select the most significant explanatory variable from a set of independent variables that has the highest correlation with the response variable (Y) [
110]. The SMLR method is conducted in two phases: forward and backward stepwise selections. The model starts with no variable (spectral bands) and adds variables one by one, which is the most significant part. Then, a backward elimination procedure starts with all spectral bands and removes the bands one-by-one, obtaining the least statistically significant. Typically, the range of the
p-value for entering and removing the variables is set between 0.01–0.02 [
58]. In addition, to quantify the severity of multicollinearity between explanatory variables, the variance inflation factor (VIF) is an index to measure how much variance there is of the estimated regression coefficient. A rule of thumb is that if VIF is more than 10, then the data have high collinearity [
111], otherwise no collinearity between independent variables is found.
Principal component regression
Principal component regression (PCR) is based on a combination of principal component analysis (PCA) and linear regression model [
112]. The main idea is to convert the original variables into a new set of synthetic variables, which are independent of each other. By using a linear transformation, the data are transformed into a new orthogonal coordinate system where the data with the largest variance are displayed on the first axis (referred to as the first PC), the data with the second-largest variance on the second axis (referred to as the second PC), and so on [
98]. As a result, the orthogonal PCs are ordered from the highest to lowest variance data information of spectral features.
Partial least square regression
Following a similar idea to the above method, partial least square regression (PLSR) relies on two methods, which are PCR and canonical correlation analysis (CCA). A large number of correlated variables of the spectral data is reduced to a few non-correlated variables, with high variability. For the case of PCR, the projection space of PCA depends only on the independent data (X); however, in partial least squares (PLS), the projection space of X is explicative of both X and Y. The original variables X and Y are transformed into their respective latent variables (X1 and Y1), and then PLS seeks the most probable linear correlation between latent variables (the idea of CCA).
Ridge regression
Ridge regression (RR) is a method for estimating the coefficients of multiple-regression models in scenarios with highly correlated linearly independent variables. A new trendline is introduced to fit the training data by adding a certain amount of bias in the regression estimates to obtain reliable approximations of the population values. The bias called lambda () plays a role to control the trade-off of bias variance and the user tries to find the best value of lambda that has low variance using cross-validation. With increasing lambda value, the important parameters may shrink to be zero, and fewer stay at high values.
Least absolute shrinkage and selection operator
This approach, abbreviated as LASSO, uses variable selection and regularization to improve the statistical model’s prediction accuracy and interpretability. This method allows forcing the most and least important parameters to be close to zero or absolute zero, as compared to RR.
Table 3.
Pros and cons of the linear nonparametric methods.
Table 3.
Pros and cons of the linear nonparametric methods.
Methods | Pros | Cons |
---|
SMLR | (1) Simple, fast, and easy to use. | (1) Suffers from multicollinearity when applied to canopy hyperspectral data. |
(2) Screens a large number of potential predictors to obtain the best one. | (2) The selected wavelength is often not related to the absorption characteristics of the compounds of interest [113,114]. |
PCR | (1) Mitigates multicollinearity and avoids overfitting problem. | (1) Does not consider the response variable (Y) when deciding which principal components are dropped and relies only on the magnitude of the variance of components. |
(2) Improves the predictive performance and provides stable result in regression coefficient. | (2) Does not perform feature selection. |
| (3) Issue of interpretability. |
PLSR | (1) Handles multiple inputs and outputs, data noise, and missing data. | (1) Relies on the cross-product relations with the response variables and is not based on the (co)variances between independent variables. |
| (2) Has difficulty explaining. |
| (3) Response distribution unknown. |
RR | (1) Solves the problem of overfitting. | (1) Low in-model interpretability. |
(2) Adds bias to estimators to reduce the standard error. | (2) Unimplemented the feature selection. |
(3) Uses all the predictors in the final model. | (3) Trades the variance for bias. |
LASSO | (1) Performs feature selection. | (1) Arbitrarily selection. |
(2) Fast in terms of inference and fitting. | (2) Difficult to justify which predictor needs to select. |
(3) Avoids overfitting. | (3) Uses a small bias in the model since the prediction is too dependent upon the particular variable. |
| (4) Lower prediction performance than RR. |
2.2.2. Nonlinear–Nonparametric Methods: Machine Learning
As a part of the retrieval methods used in the hybrid model, machine learning (ML) does not rely on any particular form of the regression function to characterize the connection between the dependent (variable of interest) and explanatory variables (in this case, a spectral reflectance image). In addition, ML not only provides a powerful and flexible framework of the data-driven method for making a decision, but it also allows for the incorporation of expert knowledge into a learning system. For this reason, ML is becoming increasingly popular and important in the field of agricultural monitoring studies. Below is a brief description of ML methods, with their pros and cons summarized in
Table 4.
Artificial Neural Networks
An artificial neural network (ANN) is a collection of connected artificial neurons, and each artificial neuron or node connects to another, linking with weight, and nonlinear equations are specified by the activation function (e.g., rectified linear unit or sigmoid functions). Through a nonlinear function of the sum of its inputs, the output of each neuron is calculated. When exceeding a certain value of the threshold/activation function of the output node, then the node is activated and data are sent to the next layer (having a set of neurons or nodes) of the neural network, known as the hidden layer [
115]. This leads us to identify the design or structure of ANN starting from simple to the complex one, depending on the number of hidden layers, the number of artificial neurons, the directional flows (uni or multi), the type of activation function used, and how many inputs and outputs are used in the model. An example of simple architecture is a feed-forward neural network (FFANN). It was often used in remote sensing for mapping vegetation properties in the mid-1990s. This is a unidirectional flow, where the information from the input nodes is transferred to the output nodes.
An back-propagation neural network (BPANN) is built based on using multi-directional forward and backward mode and the error rate obtained from the output layer and distributed back through the network layers [
116]. As an alternative to the aforementioned methods, radial basis function (RBFANN) [
117], recurrent neural network (RANN) [
118], and Bayesian regularized ANN (BRANN) are advanced models that deal with a large quantity of remotely sensed data [
119].
Deep neural networks (DNNs), which emerged in 2015, have achieved excellent results in classification tasks. Nevertheless, DNN is still under investigation for regression in experimental and operational hybrid settings [
24]. It uses many hidden layers and relatively few neurons per layer, as compared to the simple structure of NNs [
115]. Ultimately, the success of NN performance relies on how the user adjusts the hyperparameters, such as the number of hidden layers and neurons in the layer, to minimize the difference between the model prediction and the desired outcome, respecting a good trade-off between the computational time, stability, and accuracy [
34].
Ensemble learning
Ensemble learning (EL) uses multiple learners that are trained to solve the same problem. The EL approach mixes numerous decision trees to generate higher predictive power, instead of using a single decision tree. Bagging and boosting are the main families of ensemble methods. An ensemble is made up of a group of learners known as base learners. An ensemble’s generalization ability is usually much higher than that of base learners.
- The bagging technique is the short form for bootstrap aggregating, in which the independent multiple sub-groups of features are randomly created with iterative replacement from original training datasets. Their decision trees are trained with each group of data and aggregated to average (reducing the variance of the decision tree) to obtain the final prediction [
120].
Random forest regression (RFR) is an extension over bagging where a subset of features is randomly selected from the total and the best split feature from the feature subsets is used to split each node in a tree and all features are examined for splitting at a node [
121].
A canonical correlation forest (CCF) is a collection of decision trees that are constructed by several canonical correlation trees (CCTs). They are trained by using canonical correlation analysis (CCA) to determine feature projections providing the maximum correlation between features and then picking the optimal splits in this projected space. The results from individual CCTs combine to make a final prediction for unknown samples [
122]. Contrary to RF, CCF uses full training datasets in selecting split points at each tree. Since the bagging approach works based on the combination of multiple weak learners to obtain a stable result, it is the preferred method to be used for any study. However, the result can be biased if the model is properly adapted and thus may result in underfitting.
- Boosting is a dependent framework, based on generating several weaker learners in a very adaptable manner and sequentially to make a strong learner. At every step, a new model is built upon the previous one to boost the training instances by weighing previously mislabeled examples with higher weight. The best example of a dependent framework is gradient boosting regression tree (GBRT), introduced by [
123], which aims to reduce the bias rather than variance. On the other hand, random forests reduce the variance of the regression predictions without changing the bias.
Kernel machines
A kernel machine uses a kernel to perform calculations in a higher-dimensional space without explicitly doing so. Kernel methods transform data from their original location (known as input space) to a higher-dimensional space (known as feature space). Then, in the feature space, these approaches look for linear decision functions that become nonlinear decision functions in the input space [
124]. Kernel methods replace the inner product of the observations with a chosen Kernel function. There are various classes of kernel functions, including the linear kernel, radial basis function, polynomial, and sigmoid functions. They should be continuous, symmetric, and have a positive definite value.
- Support vector regression (SVR) was introduced in the late 1990s to early 2000s by [
125,
126] SVR enables the extraction of the complex nonlinear relationships between the feature vector (X) containing spectral information and the variable of interest (Y) using the kernel trick. This approach determines how much error is acceptable in the model and finds an appropriate line (or hyperplane) to split the data spatially in high-dimension space. Ultimately, the performance of SVR depends on which kernel function is used in the model and how the user tuned their hyperparameters (epsilon-insensitive zone (
) and regularization (C) parameters). The parameter (
) controls the width of the epsilon-insensitive zone for the training data, whereas regularization (C) controls the trade-off between the minimization of errors and the regularization term [
127].
- Gaussian processes regression (GPR) follows the Bayesian theorem by using the probability distribution across all admissible functions that fit the data [
128]. After specifying the prior on the function space, the posterior distribution is computed based on the prior distribution for the successor retrieval procedures [
129]. Since GPR can describe the properties of functions, the mean of a (Gaussian) posterior distribution and variance are predicted. To increase the efficiency of the GPR model, the kernel’s hyperparameters (mean and covariance function) need to be tuned efficiently for maximizing the log-marginal likelihood in the training data [
130].
- Kernel ridge regression
Kernel ridge regression (KRR) combines the kernel trick with ridge regression [
19]. The key idea is that nonlinear map data can be transformed to high-dimensional feature space and linear regression embedded in feature space using a weight penalty. As a result, it learns a linear function in the space caused by the kernel and data. This relates to a nonlinear function in the original space for nonlinear kernels [
131]. The model learned by KRR has the same form as support vector regression (SVR). The loss function of SVR is based on
-insensitive loss with ridge regression, but KRR uses the square error loss function to solve a convex quadratic programming problem for classical SVMs [
132].
4. Systematic Reviews
In this section of the reviewed articles, each of the three variables of interest is classified into two categories; one for the parametric method and the other one for the nonparametric method. To screen the literature review for such an objective, Scopus, Google Scholar, ScienceDirect, PubMed, Web of Science, and MDPI were used as search engines. In addition, the “hybrid retrieval method” was used in conjunction with each of the following keywords: “machine learning, vegetation indices, radiative transfer model, LAI, fCover or vegetation cover, and LCC and CCC”. A total of 102 publications were found in total for 2000–2022. On the other hand, publications in languages other than English, conference papers, chapters, reviews, and master’s and doctoral theses were excluded after reviewing the aggregated data for the published papers.
Finally, 73 of the total published papers, which include 46 and 27 papers applied to nonparametric and parametric methods, respectively, were identified under this research.
Figure 5 shows the general trend of published papers over a period of 22 years, indicating the greater use of the hybrid retrieval approach in the journal Remote Sensing rather than the journal Remote Sensing in Environment. Moreover, the upper part of
Figure 5 shows that there is a larger number of publications applying the nonparametric method for training the LUT database (64%) than the parametric method (36%).
4.1. Estimated Canopy Traits from Hybrid Models Based on Parametric Methods
4.1.1. Leaf Area Index
Two studies were devoted to analyzing the wheat crop. For example, the study of [
190] explored the effect of using prior knowledge relating to the distribution in the LUT-based inversion. Moreover, by using fifteen vegetation indices along with the reflectance bands, the accuracy of leaf area index (LAI) winter wheat retrieval with different phenological stages was improved. In the other study [
191], the authors also investigated the performance of reflectance-based LUT and vegetation index (VI)-based LUT over six experimental plots from 2018 to 2019 for wheat LAI retrieval.
For mixed crops including wheat, the red-edge-based VI was assessed from multitemporal RapidEye images and compared with VI-visible reflectance using synthetic spectrum [
76]. The authors of [
192] tried to find the optimal VI from nine tested VIs, using the curve fitting and backward feature elimination method (BFE) integrated with RFR. Then, three regression models, including curve fitting, k-nearest neighbor (KNN), and RFR, were determined to find the optimal algorithm for building the relationship between LAI and VIs. The aim of [
193]’s study was close to the idea of other studies by finding the suitable LAI-VI that can be resistant to chlorophyll content and atmospheric and soil brightness effects. Concerning the property of generalization, the uncertainty measures were also considered through the analysis, which was mainly focused on the crop reflectance model (e.g., PROSAIL). As sources of propagation of uncertainty in LAI estimation, the influences of changing the solar zenith angle and atmospheric perturbations were tested over multiple years (1999 to 2006) and on a regional scale [
194]. These authors focused on four indices (NDVI, OSAVI, EVI, and MTVI2) to show the spectral resolution under these conditions. Finally, Broge and Leblanc [
99] carried out a systematic and rigorous evaluation between broad-band and narrow-band VIs to find out which of them could increase the accuracy of the estimation.
The authors of [
195] dedicated their study to evaluating the performance of 43 hyperspectral VIs to find the optimal one based on two datasets of PROSAIL simulations. It also relied on prior knowledge of one from literature and the other from ground data. To build the relationship between LAI and simulated VIs, the simple (curve fitting) and advanced regression (RFR and ANN) models were employed. The same authors extended this study by comparing the results obtained from using 26 VIs and PLS dimension reduction with the use of appropriate principal components as the input variables for modeling inversion strategy [
64]. Houborg et al. [
196] evaluated the performance of a hybrid model based on VIs for mapping LAI over time and space. Under different spatial resolutions (250–500 m) for 8 days, the MODIS data were used in the coupling of PROSPECT and the two-layer Markov chain canopy reflectance (ACRM) model inversion. Moreover, in this study, a hybrid inversion scenario was investigated based on the combination of the measurements from the field and physical model. The target property (LAI) and explanatory variables of vegetation indices using Landsat 8, which were classified into five groups, were trained by using random forest and cubist regression approaches.
Table A1 summarizes the above papers and presents the main result for the LAI hybrid parametric method.
4.1.2. Fractional Vegetation Cover
In this category, few articles are reported to belong to hybrid spectral indices. The studies of [
13,
197] are mainly focused on improving such a variable of interest. In particular, the authors of [
197] developed the physical model by considering multi-angle reflectance and LAI products to quantify the Normalized Difference Vegetation Index of highly dense vegetation (NDVIv) and bare soil (NDVIs) at coarse resolution (e.g., 1 km) for estimating fCover. The other study [
13] proposed a method called the “fan-shaped method” (FSM) to mitigate the effect of CCC variation in the pixel dichotomy model (PDM)-based FVC estimation. For fCover estimation, an FSM method, which creates a two-dimensional scatter map with three vertices, represents high and low levels of CCC values, and bare soil using a CCC spectral index (SI). It relied on spectra simulated on PROSAIL and spectra measurements delivered from UAV. Lastly, the studies of [
198,
199] evaluated the impact of soil background and leaf angle distribution (LAD) on fCover estimation by using a set of different vegetation indices.
Table A2 summarizes the above papers for the fCover hybrid parametric method.
4.1.3. Chlorophyll Content at Leaf and Canopy Levels
For corn, Haboudane et al. [
69] integrated a new index by combining two indices as a ratio TCARI/OSAVI. It has the potential to predict CCC and minimize the background and LAI effects. Another study by [
200] studied the effects of different nitrogen fertilization with eight levels on CC estimation for corn. Two spectral indices (MCARI and OSAVI) were combined with spectral bands, such as OSAVI and NIR/red and MCARI and NIR/green, to define which indices can minimize the background and are sensitive to LCC. In the last study [
201], two cultivars of corn were planted in the field experiment; spectral features based on vegetation indices, wavelet coefficient (WC), and spectral reflectance were assessed for estimating the LCC.
Three articles are reported for potatoes. In the study of [
202], a systematic evaluation between sets of VIs was determined to define the suitable VI for estimation of canopy chlorophyll content (CCC). The authors of [
38] hypothesized that using the ratio of vegetation indices based on LAI normalization can accurately estimate leaf chlorophyll content (LCC) by mitigating the other external factors (e.g., soil background properties, changing leaf orientation, or changing solar zenith angle). The simulated spectra were evaluated with field measurements for five consecutive years between 2010 and 2014. The last article [
36] studied fifty hyperspectral vegetation indices for potatoes, where indices were tested to retrieve LCC and CCC. To verify the inversion result, observed data, including auxiliary data obtained from fieldwork and CHRIS image data, were utilized.
As shown from the presented systematic reviews (
Table A3), two articles [
201,
203] devoted their analysis to wheat. The authors [
201] suggested a new strategy to improve LCC estimation by building a matrix-based VI combination for minimizing the influence of LAI. Single VI (e.g., MCARI and OSAVI) and the ratio of VIs (e.g., red edge relative index) were used to build a matrix of two VIs (VI1–VI2) space and each cell of the matrix was assigned to an LCC value using simulated data. For the study of [
203], the extracted wavelengths of LCC were selected by the amplitude- and shape-enhanced 2D correlation spectrum based on using PROSAIL. Deep learning was then utilized for training the PROSAIL database to the inversion tasks of field-measured LCC.
Several studies cultivated different crops in the field, such as wheat, corn, and soybean [
204,
205,
206,
207]. These studies tried to increase the sensitivity of VI to chlorophyll content variations and resistance to LAI and other permutation factors (such as soil background). Particularly, ref. [
204] concluded that the type of crop, the type of data obtained from model simulations or/and from field measurements, spectral range, and model type can influence the predictions of variables.
Table A3 summarizes the results from the aforementioned papers.
4.2. Estimated Canopy Traits from Hybrid Models Based on Nonparametric Methods
4.2.1. Leaf Area Index
The study of [
208] proposed a new approach to alleviating the ill-posed problem that relies on the use of the object signature for a specific crop. The synthetic database was built based on the spectrum signature obtained from a neighboring pixel of interest using a neural network. The authors proved that the suggested method can reduce the uncertainties in estimations and does not require the use of prior knowledge for constraining the boundary of input parameters or identifying the crop type. Another study [
209] attempted to solve the inversion problem by introducing SVR-based kernel regularization to reduce the number of simulations, leading to reduced computational time, rather than using NN, which requires a large number of datasets for training. The recent article [
52] introduces the variable correlation through the generation of LUT to produce a realistic simulation from accurate representative combinations of the input parameter. The regularized LUT (LUTreg) was trained by GPR-based kernel since it performs well due to the unnecessity of using a large size of datasets with robustness in the estimation along with providing the uncertainty of estimates. In the article [
210], the authors explored the utility of active learning (AL) with a GPR method to train simulated datasets for reducing the sample size and redundant information. To study its performance, the outcome was compared with the results of non-kernel methods for a specific crop (wheat). By applying the hybrid NN model for the same crop at different growing seasons, the study of [
211] intended to optimize this approach by decreasing the uncertainty of LAI estimation, especially when values of LAI or green leaf area index (GLAI) are high due to the saturation effect.
Other studies for such a corn crop [
187,
192] were devoted to finding the optimal method based on a comparison of different retrieval methods. The same objective was applied to the study of [
212,
213]. With the availability of multiple data sources from multiresolution satellite data, a hybrid model can help to create a generic model, transferable and independent from in situ data, as shown in the study of [
214]. Additionally, other studies [
20,
195] used Landsat 8 and SPOT 5 to confirm the robustness and consistency of the retrieval chain for monitoring the real spatiotemporal changes in crop development.
Table A4 summarizes the above papers for a hybrid nonparametric method sub-category.
4.2.2. Fractional Vegetation Cover
Two studies [
215,
216] suggested the combination of the two models, RTM and crop growth model, for time series fCover estimation using a dynamic Bayesian network (DBN). It was generated from coarse-resolution remote sensing data and validated with a fine temporal and high spatial resolution. As shown from their findings, the proposed method gave reliable results and was visible for use at a large scale with various types of vegetation. Extending the previous two studies, a study of [
217] utilized the proposed method based on GLASS FVC data from MODIS with temporal dependencies for each Landsat 7 ETM+ pixel to constrain the dynamic vegetation growth model. They concluded that the computational power of the proposed method was improved and feasible for real-time fCover estimation. A comparison between different nonparametric approaches was performed, and GPR was found to be the best algorithm using Sentinel-2 [
210]. However, in the study of [
52], RF was the best retrieval for fCover using UAV-based hyperspectral data.
Three studies [
216,
218,
219] utilized a hybrid retrieval method for their study area with corn and wheat fields. In these studies, the authors were interested in quantifying the spatiotemporal fCover products from different scales of remote sensing. That needs, first, temporal consistency between remote sensing products to have a time series of fCover. Then, after training the simulation data by ML, the spatiotemporal fusion algorithm is used to make spatial consistency between RS data for improving the accuracy of fCover estimates. The last article [
219] developed the hybrid framework using ML to retrieve the variable of interest from the bottom of the atmosphere (BOA) and the top of the atmosphere (TOA). The rest of the studies in
Table A5 applied a hybrid retrieval model for the mixed plants.
Table A5 summarizes the above papers and presents the main result of the hybrid nonparametric method.
4.2.3. Chlorophyll Content at Leaf and Canopy Levels
Several researchers studied wheat as one of the most common crops [
210,
220,
221,
222,
223]. Two of the researchers were interested in studying AL techniques with GPR. Respecting the use of different sensors in these studies [
210,
222], entropy query by bagging (EQB) and Euclidean distance-based diversity (EBD) was the most efficient technique in terms of accuracy and computational demand. Indeed, the study of [
210] intended to compare different regression methods for estimating LCC and CCC at two different locations and found that RFR and PLSR performed better than GPR + AL.
These findings are in agreement with other studies [
52,
224] when comparing GPR with other MLs. However, some studies, such as [
206,
219,
225,
226], preferred to place their attention on one method of MLs for increasing the efficiency of training RTM-based inversion to improve the accuracy of estimations. Another study [
227] was dedicated to improving the sampling strategy in a simulated dataset to decrease the problem of ill-posed inversion. Different types of distributed datasets of simulations and variable relations were applied to reflect the real situation in the field. Other studies [
201,
221] compared NN and Bayesian network (BN) within a hybrid retrieval framework with LUT-based inversion, aiming to improve the accuracy of estimates. Finally, two studies [
220,
228] discussed the impact of different spatial resolutions on the vegetation variables and tried to decrease the uncertainties obtained from the model when applied to heterogeneous pixels.
Table A6 summarizes the outcomes from the above papers for the CCC using the hybrid nonparametric method.
5. Results, Meta-Analysis, and Discussion
Based on analyzing the articles contributed in this arena, it was found that several researchers placed more attention on methods of hybrid-model-based nonparametric, specifically nonlinear, than parametric methods (
Figure 6 and
Figure 7). In a parametric method, most of the researchers often used vegetation indices for LAI, fCover, and CC, especially the Normalized Difference Vegetation Index (NDVI), which is extensively applied as compared to other indices such as the enhanced vegetation index (EVI), the modified triangular vegetation index (MTVI2 and MTVI1), the (optimized) soil adjusted vegetation index (SAVI and OSAVI), the chlorophyll index CIgreen or the red edge, the Transformed Chlorophyll Absorption Reflectance Index, and the Transformed Chlorophyll Index (TCARI and TCI) using satellite. There are a few researchers who, in their studies, used shape indices and shape transformation (i.e., red edge and waveform analysis). The results accuracy of estimates fall within the range R
2 = 0.2–0.93 and RMSE = 0.05–0.94 m
2/m
2 for LAI, R
2 = 0.54–0.90 and RMSE = 0.05–0.22 for fCover, R
2 = 0.61–0.85 and RMSE = 3.24–11.90 (
g cm
) for LCC, and 0.61–0.85 and RMSE = 9.28–77.10 (g m
) for CCC.
Within the various types of nonparametric methods, the hybrid model based on machine learning using ANN excels in improving the accuracy of estimates (
Figure 7). Several studies applied machine learning more than the linear nonparametric methods (e.g., PLSR and LSLR), which only applied in two studies [
133,
210]. The accuracy of LAI ranges from 0.63–0.83 for R
2 and 0.32–3.89 m
2/m
2 for RMSE. fCover’s accuracy ranges from 0.70–0.98 for R
2 and 0.05–0.10 for RMSE. The accuracy of LCC and CCC falls within range for R
2 = 0.38–0.93 and RMSE = 6.5–57.51 (
g cm
), R
2 = 0.55–0.78, and RMSE = 0.35–111.90 (g m
), respectively. When comparing the range of accuracy from two approaches, it was shown that the nonparametric approach was successful to obtain the best result for fCover rather than the parametric approach. In particular, nonparametric nonlinear methods are powerful in extracting information from subtle differences in reflection by supporting covariance between biochemical and biophysical variables [
18].
Another remark is that after ANN, GPR is becoming more popular and applicable in the retrieval process, since Verrelst et al. [
130] found that GPR had the best performance using Sentinel-2 and -3 and provides retrieval uncertainties (
Figure 7). Nowadays, deep learning (DL), as extending machine learning, is starting to be explored for crop monitoring using hyperspectral images [
34,
229]. DL has the advantage of handling a large data size of training samples to possibly improve the targeted variable.
In the hybrid model context, the wheat crop was mostly analyzed by researchers, followed by corn, potato, soybean, and rice, as shown in
Figure 8. Other crop types comprise grapes, barley, alfalfa, sugar beet, oil-seed rape, cotton, pea, sunflower, garlic, and onion.
From synthesizing the reviewed studies, PROSAIL, which is an integration of the leaf level PROSPECT model and canopy-level SAIL model, seems to be more favorably used with the methods of vegetation indices and machine learning rather than other types of RTM (SLC [
41], SCOPE [
230], and DART [
231]) (
Figure 9). This is due to its simplification in terms of model parameterization and that it is computationally inexpensive and free for users in various computer languages [
212]. Nevertheless, radiative transfer models were used less often in the literature for investigating agriculture features as compared to the pure regression models [
9]. While regression models can only estimate one variable at a time, RTM can infer a wide range of vegetation features in a single model.
Therefore, the next development of physical models should be simple and capable of generating a realistic simulation in the spatial and temporal dimensions for agricultural purposes. Analyzing such a large amount of remote sensing data necessitates a computationally efficient retrieval algorithm. Recently, some studies have tried to solve this issue by introducing emulation where a technique is used for estimating model simulations, such as RTM, to accelerate the inversion procedure and the speed of vegetation mapping [
156,
232].
Among the papers reviewed in this study, nonparametric approaches with multispectral data were employed in a hybrid model more than other platforms due to their accessibility (
Figure 10). Nonetheless, hyperspectral data gathered from airborne platforms or drones have the potential to provide more precise spectral information regarding variables of interest, particularly in the red edge, NIR, and SWIR regions. Applying multisource remote sensing data, such as multispatial, multitemporal, and multiangular, in the framework of crop monitoring and management increases the estimation accuracy, as proved in these studies [
20,
54,
130,
233]. Limited access to high spectral resolution using a multisensor approach to regions of land cover heterogeneity at the pixel scale may cause the problem of scale effect. More studies need to explore approaches to eliminate the effect of size since many crops can grow together in one plot.
6. Conclusions and Future Directions
In this review paper, we provided the conceptual framework of hybrid retrieval models and processing chains for retrieving biophysical and biochemical variables using parametric and nonparametric methods. In view of the increasing popularity of hybrid strategies, including machine learning, these methods may become a cornerstone in the context of precision agriculture applications and, in particular, for hyperspectral data processing.This popularity can be explained by the synergistic use of two complementary methods (data-driven and physical-based retrieval), which perfectly combines their advantages. The simplicity, flexibility, and computational efficiency of statistical methods are combined with the generalization capabilities of the physical-based method. Additionally, the need for collecting in-situ training data is reduced and used only for validating the targeted trait.
Upon the meta-analysis, we note that the NDVI-VI and NN algorithms have been extensively applied to Landsat and Sentinel-2 data, which are among the most popular sources of remote sensing data used for crop trait estimates. The high-frequency Earth observation at different scales requires a model that can process big data with high speed in the calculations. This typically applies to the use of machine learning algorithms. As shown from the publications, researchers often utilize nonparametric (machine learning) with a radiative transfer model rather than a parametric regression approach. An important drawback of the latter approach, such as VIs or other indices, is the saturation problem, a lack of uncertainty estimates with difficulty in selecting an optimal vegetation index from a wide range of VIs that correspond to the spectral ranges in optical remote sensing data. In contrast, the nonparametric approaches can provide estimates of uncertainty and the use of the complete optical spectrum information. Developers of ML attempt to modify them in such a way that the model can reduce the erroneous values in the training data and the outliers with fast computation in the training and good candidate for the operational mapping application. In general, there is a clear gap to define an optimal generalized hybrid method (either parametric or nonparametric) coupled with a radiative transfer model that can be applied to another crop or other sites. The final result found by analyzing the articles, merging numerous sensor data from diverse spatial, spectral, and temporal ranges into a single model (e.g., hybrid method), was improved accuracy in monitoring intra-field variations of crop attributes, particularly from mid- to late growth stages and improving the level of agricultural monitoring operation.
From the perspective of the research trends, further development is needed to increase the robustness of the hybrid model in terms of model output stability while improving model performance with consensus on a single globally applicable model. The developed model can also mitigate the ill-posed problem associated with the inversion of the physical model. Besides estimating basic characteristics of crop traits, the hybrid approach with active learning techniques has recently been successfully applied in some studies for estimating nitrogen content at the canopy level. Despite the success of these studies, the techniques used for selecting the spectral feature and the informative sample to increase the quality of training data and reduce the computational burden of model generation are still in their infancy. Therefore, in the foreseeable future, additional studies should be conducted on this exciting topic to allow the hybrid method to be portable and independent from field measurement.