1. Introduction
The quantitative estimation of vegetation characteristics from hyperspectral remote sensing is required for various applications in agriculture, ecology, plant physiology and meteorology [1, 2]. In the agricultural context, among many vegetation characteristics, leaf area index (LAI), chlorophyll content and fractional vegetation cover (fCover) are of prime importance for crop production, crop phenotyping and precision farming [1, 3, 4]. LAI, a key attribute of vegetation structure, is related to gas exchange processes and the final production of crops [5, 6], while fCover is used to monitor vegetation growth [7]. Chlorophyll content is an essential variable for photosynthesis and plant functioning and can serve as an indicator of plant nitrogen content [8]. Chlorophyll content can be defined either at the leaf level (leaf chlorophyll content, LCC) or at the canopy level (canopy chlorophyll content, CCC, that is, the product of LAI and LCC) [9].
To derive vegetation attributes from hyperspectral remote sensing data, empirical (statistical) [4, 10, 11], physically based (using radiative transfer models (RTMs)) [12, 13, 14, 15] or hybrid approaches can be used [16, 17]. In the physically based approach, the interaction between solar radiation and vegetation is represented by RTMs based on physical laws [16, 18]. RTMs can be utilized in either the forward or inverse (backward) mode. Using RTMs in the forward mode is an effective way to understand the model’s behavior and the impact of parameterizing the model variables on the simulated canopy reflectance [19]. The inverse mode is applied to retrieve the biophysical and biochemical variables associated with a certain canopy reflectance spectrum [20]. RTMs must be well suited to the crop type under study. For homogeneous and continuous canopies, a large number of studies have successfully used PROSAIL (a combination of the PROSPECT leaf model and the SAIL canopy model) [21]. For row crops (heterogeneous and discontinuous canopies) such as potato, the SLC (Soil–Leaf–Canopy) model [22], which is an extended version of the PROSAIL model, seems to be more suitable because it can accommodate both homogeneous and heterogeneous canopy structures. Moreover, compared with more complex 3D models, the SLC model is simpler and requires less information on canopy structure [23]; it can directly retrieve the fCover variable from two basic input variables, LAI and vertical crown cover (Cv) [24].
Researchers have suggested many different inversion strategies, including look-up tables [15, 20]; numerical optimization [25, 26]; and machine learning-based inversion methods, such as artificial neural networks [13, 27, 28, 29], genetic algorithms [30], support vector machines [31] and Bayesian methods [24, 32, 33]. Among inversion strategies, the look-up table (LUT) constitutes a commonly used and robust approach for inverting an RTM [34]. The aim of the present study is to improve this method for more accurate retrievals of plant properties.
One general limitation of RTM inversion is that the problem is “ill-posed”: several different combinations of variables can produce similar spectra that match the actual remote sensing observations almost equally well [20, 35]. Moreover, the measurement and model uncertainties may also induce high inaccuracy in the simulated reflectance spectra [26, 36, 37]. Several regularization schemes have been proposed to mitigate this problem and obtain more stable and reliable solutions (estimated parameters) [20]: using a priori information [38]; increasing data dimensionality (i.e., multi-angular, multi-temporal) [39]; the selection of proper cost functions [34, 40, 41]; wavelength selection [12, 13, 34, 42, 43]; the use of multiple solutions [12]; and the addition of Gaussian noise [34, 44, 45, 46].
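The LUT inversion and two of the regularization options listed above (multiple solutions and added Gaussian noise) can be sketched as follows. This is only an illustration under stated assumptions: `toy_forward` is a hypothetical three-band stand-in and not the SLC model, all coefficients in it are made up, the cost function is a plain RMSE, and taking the mean of the best entries is one possible way of combining multiple solutions.

```python
import math
import random

def toy_forward(lai, cv):
    """Hypothetical stand-in for an RTM (NOT the SLC model), illustration only."""
    soil = [0.10, 0.15, 0.20]    # made-up soil reflectance per band
    dense = [0.04, 0.05, 0.50]   # made-up dense-canopy reflectance per band
    k = [0.9, 0.7, 0.4]          # made-up extinction coefficients per band
    canopy = [d + (s - d) * math.exp(-kb * lai) for s, d, kb in zip(soil, dense, k)]
    # Linear mixing of canopy and soil according to crown cover.
    return [cv * c + (1.0 - cv) * s for c, s in zip(canopy, soil)]

def build_lut(n, noise_sd=0.0, seed=0):
    """Sample inputs independently and store (parameters, spectrum) pairs."""
    rng = random.Random(seed)
    lut = []
    for _ in range(n):
        lai, cv = rng.uniform(0.0, 6.0), rng.uniform(0.0, 1.0)
        spec = [r + rng.gauss(0.0, noise_sd) for r in toy_forward(lai, cv)]
        lut.append(((lai, cv), spec))
    return lut

def rmse(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def invert(lut, observed, n_best=1):
    """Mean parameters of the n_best LUT entries with the lowest RMSE cost."""
    ranked = sorted(lut, key=lambda entry: rmse(entry[1], observed))[:n_best]
    dim = len(ranked[0][0])
    return tuple(sum(p[i] for p, _ in ranked) / len(ranked) for i in range(dim))

lut = build_lut(20000, noise_sd=0.0)
observed = toy_forward(2.5, 0.8)          # synthetic "measurement"
lai_hat, cv_hat = invert(lut, observed, n_best=20)
print(round(lai_hat, 2), round(cv_hat, 2))
```

Because the inputs in `build_lut` are drawn independently, the table also contains combinations that may never occur together in the field, which is the situation the ill-posedness discussion refers to.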
In this study, a new regularization method for the LUT inversion approach is proposed in which the variance–covariance structure among variables is considered in the simulation. Generally, previous studies have explicitly treated the input parameters of RTMs independently to generate a range of simulated canopy reflectances [47]. However, in reality, certain variables are intercorrelated (e.g., LAI and fCover). Considering the correlation between model variables, which is an additional source of a priori information, may prevent a mismatch during the sampling of the LUT input space and thus result in realistic variable combinations. This, in turn, may prevent the generation of meaningless canopy spectra and thereby improve the retrieval process [47].
There are different methods that can be applied to transform independent random variables to correlated ones. However, compared with other methods, the Cholesky decomposition algorithm is simpler, computationally faster, and insensitive to the sampling error in a small sample size, and it works for nonlinear data distributions [48, 49, 50, 51]. As shown in previous studies, the Cholesky algorithm with the nonparametric regression approach can be employed to derive biophysical and biochemical parameters [52, 53]. Nevertheless, this algorithm has not yet been applied directly to physical models (RTMs) to retrieve the corresponding parameters [34].
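As a minimal sketch of the general idea, independent standard-normal draws can be given a target correlation by multiplying them with the lower-triangular Cholesky factor of the desired correlation matrix. The target correlation of 0.8, the sample size and the function names below are illustrative assumptions, not values or code from this study; mapping the correlated draws onto physical ranges of LAI and Cv is a separate step that is not shown.

```python
import math
import random

def cholesky(a):
    """Lower-triangular factor low with low * low^T == a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def correlated_normals(n, corr, seed=0):
    """Draw n vectors whose components follow the correlation matrix corr."""
    rng = random.Random(seed)
    low = cholesky(corr)
    dim = len(corr)
    samples = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # independent draws
        samples.append([sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(dim)])
    return samples

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical target correlation between two LUT inputs.
target = [[1.0, 0.8],
          [0.8, 1.0]]
samples = correlated_normals(20000, target)
r = pearson([s[0] for s in samples], [s[1] for s in samples])
print(round(r, 2))
```

The sample correlation `r` should lie close to the target value, with the gap shrinking as the number of draws grows.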
The present study addresses the following research question: To what extent does integrating the correlation structure of selected variables into the LUT inversion approach using the SLC model improve the retrieval of LAI and fCover? We hypothesized that adding a known relation between variables to the LUT (denoted as regularized LUT (LUTreg)) would increase the accuracy of LAI and fCover retrievals compared with the standard LUT (LUTstd). Besides addressing the relation between variables, an existing regularization technique (multiple solutions) was examined as an additional means of stabilizing the LUT retrieval approach. Retrievals from both types of LUTs (LUTreg and LUTstd) were validated against in situ measurements and systematically compared to highlight the added value of using the variable relation in the generation of the LUT.
4. Discussion
This study extends previous studies that suggested several regularization schemes to mitigate the ill-posedness of radiative transfer model inversion. We proposed a new way to regularize the LUT inversion approach by introducing the correlation between variables to improve the variable retrieval. The results obtained from LUTreg confirm that the accuracy of LAI retrieval was improved significantly compared with the retrieval using LUTstd. However, fCover retrieval did not show any improvement in LUTreg. Quan et al. [47] used a correlation matrix of five input model variables in a Bayesian network-based inversion approach (without using the Cholesky algorithm) in order to improve the accuracy of LAI and canopy water content (CWC) retrievals. In their study, the estimated LAI values were improved (R² = 0.77 and RMSE = 0.51) compared with those obtained by using independent input variables (R² = 0.69 and RMSE = 0.55). These results are comparable to ours. However, we used fewer variables (only two, LAI and Cv) to improve the accuracy of the parameters of interest (LAI and fCover) using the Cholesky decomposition.
As another regularization measure, we used a priori knowledge about the crop species to specify the leaf inclination distribution function parameters (LIDFa and LIDFb). We compared the simulated spectra of four types of LIDFs (planophile, spherical, transitional-1 and transitional-2) with the measured reflectance data and selected the best type of LIDF. LIDF type 4 (“transitional-2”), which represents a transition between planophile and spherical, was determined to be the best match. This result (Table 4) is consistent with the results of References [26, 59], which found that a planophile canopy structure matches a sugar beet canopy better than it does a potato canopy. By contrast, other studies have stated that potato leaves mainly tend to be of the planophile type in fertilized plots [8, 60].
In addition, with multiple solutions, using more than 100 solutions from both LUTs improved the RMSE and NRMSE of the estimations, which is comparable with the result of Reference [66]. In this study, we found that the 300 best entries were the optimal number of solutions for an LUT of 17,280 entries; beyond 300 solutions, the accuracy stabilized and then decreased slightly. By contrast, other studies found that the optimal number of solutions for LUTs of 100,000 and 280,000 entries was between 10 and 250 [12, 20, 66, 67]. The accuracy of LAI estimations increased when using LUTreg (R² = 0.74, NRMSE = 24.45%) compared with LUTstd (R² = 0.71, NRMSE = 25.57%). Especially for low and medium LAI values, LUTreg decreased the underestimation of LAI values compared with LUTstd. Several studies have also observed the underestimation of LAI values in their predictions and attributed these underestimations to the effects of soil background, the heterogeneity of vegetation cover and shadow [37, 39, 59]. However, in the studied experimental plots, the reflectance contribution from the background soil was less than the contribution from green plants. Furthermore, the influence of row-induced shadow effects was minimal because the reflectance measurements were acquired from the nadir position. As shown in Figure 5, LUTreg reduced the underestimation compared with LUTstd. On the other hand, for the plots in which the canopy largely covered the soil, LUTreg still overestimated high LAI values (above 3), which are poorly estimated by a 1D turbid medium RTM (i.e., SLC) as a result of the saturation effect [20, 59]. This result is consistent with other studies that have used the PROSAIL model and overestimated high LAI values (above 3.5) [37, 39, 68].
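For readers who want to reproduce this kind of comparison, the accuracy measures can be computed along the following lines. The definitions are assumptions, since conventions differ between studies: here R² is taken as the squared Pearson correlation between measured and estimated values, and NRMSE is the RMSE divided by the range of the measured values. The numbers in the example are made up and are not data from this study.

```python
import math

def rmse(measured, estimated):
    """Root mean square error between paired values."""
    return math.sqrt(sum((m - e) ** 2 for m, e in zip(measured, estimated)) / len(measured))

def nrmse_percent(measured, estimated):
    """RMSE normalized by the range of the measured values (assumed convention)."""
    return 100.0 * rmse(measured, estimated) / (max(measured) - min(measured))

def r_squared(measured, estimated):
    """Squared Pearson correlation between measured and estimated values."""
    n = len(measured)
    mm, me = sum(measured) / n, sum(estimated) / n
    sxy = sum((m - mm) * (e - me) for m, e in zip(measured, estimated))
    sxx = sum((m - mm) ** 2 for m in measured)
    syy = sum((e - me) ** 2 for e in estimated)
    return sxy ** 2 / (sxx * syy)

# Made-up values, used only to exercise the functions.
measured = [1.0, 2.0, 3.0, 4.0]
estimated = [1.5, 2.0, 2.5, 4.0]
print(r_squared(measured, estimated), rmse(measured, estimated), nrmse_percent(measured, estimated))
```

If NRMSE is instead normalized by the mean of the measured values, only the denominator in `nrmse_percent` changes.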
Both types of LUTs failed to accurately estimate the LCC variable using the SLC model (see Appendix A, Table A1). Other studies that have used the SLC model [33] or the PROSAIL model [12, 13, 24, 36, 39, 59] have reported similar failures to accurately derive LCC. This issue might be caused by the limited range of LCC values in the LUT and the uncertainty of using SPAD measurements [39]. For fCover, LUTreg could not improve the accuracy compared with LUTstd (R² = 0.69, NRMSE = 18.60% for LUTreg; R² = 0.70, NRMSE = 17.85% for LUTstd). However, the results of both scenarios revealed that the accuracy increased at high values of retrieved fCover (>0.4), where the scattered points were distributed around the 1:1 line, comparable to Reference [69]. This result suggests that high values of fCover indicate that the potato crop is homogeneous (fully covered by vegetation) in the horizontal direction. This observation agrees with the turbid medium assumption of canopy RTMs. By contrast, lower values of fCover (<0.4) were overestimated. This overestimation might be explained by the fact that non-leaf elements (i.e., stems, shoots and branches), which greatly affect the canopy reflectance, are not represented in the SLC model. In Reference [70], when the vegetation cover was at a low level, the PROSAIL model overestimated the fCover for a corn crop, whose planting pattern is similar to that of the potato crop (discontinuous crops).
Calculating the variance also confirmed the improvement of the LAI estimate derived from LUTreg. This implies that the proposed method (Cholesky algorithm) alleviated the ill-posedness of the inverse problem by decreasing the uncertainties of the modelled spectra generated from the LUT for each plot. However, compared with LUTstd, the reduction in variance for the Cv and fCover retrievals from LUTreg (50% for both Cv and fCover; Figure 8 and Figure 9) was less pronounced than in the case of LAI (60%; Figure 7). The errors mainly came from five plots of retrieved Cv from LUTreg, because these plots were not fully covered by the canopy (heterogeneous). In addition, the Cv parameter caused a nearly linear mixing of canopy and soil spectra, while LAI mainly modified the near-infrared spectrum, with deeper dips at high LAI. This likely induced problems related to the compensation effect between variables (LAI and Cv), since the fCover variable is derived from basic inputs (LAI, Cv, and the LIDF variables). Different combinations of Cv and LAI can produce the same value of fCover [33]. For instance, plots with high LAI and low Cv, or vice versa, increase the uncertainty in retrievals of fCover. Additionally, the plots that had low LAI and Cv were characterized by a high uncertainty in their corresponding estimations because of the contributions of the soil signature. Nevertheless, in plots with high values of LAI and Cv, the model produced more accurate estimations of fCover (corresponding to high values), indicating well-developed canopies.
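The percentages quoted in this paragraph can be related to the variability values reported in the Conclusions (12.10 and 5.10 for LAI, 0.20 and 0.10 for Cv, 0.04 and 0.02 for fCover, for LUTstd and LUTreg, respectively). The short check below assumes that each percentage is the relative reduction from LUTstd to LUTreg; that reading is an interpretation, and the helper name is hypothetical.

```python
def relative_reduction(before, after):
    """Percentage decrease from `before` to `after`."""
    return 100.0 * (before - after) / before

# (LUTstd, LUTreg) variability values as reported in the Conclusions.
variability = {"LAI": (12.10, 5.10), "Cv": (0.20, 0.10), "fCover": (0.04, 0.02)}

reductions = {name: relative_reduction(std, reg) for name, (std, reg) in variability.items()}
for name, value in reductions.items():
    print(name, round(value, 1))
```

Under this reading, the LAI reduction comes out at roughly 58%, consistent with the rounded 60% quoted above, and the Cv and fCover reductions are both 50%.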
Although LAI retrieval was improved by considering the correlation between LAI and fCover, there is still a need to examine the full correlation matrix among model variables to enhance fCover and CCC retrieval as well. Accounting for other correlations between free model variables (such as LAI–Cw/Cm/SM/LIDF and Cm–N/Cv) in the LUT, using measured correlations, will likely further reduce the number of unrealistic simulated reflectance spectra produced by unrealistic combinations of input model variables. It is recommended that future research apply the presented method to different RTMs (e.g., PROSAIL, INFORM, or SCOPE) and to different areas at different scales in order to investigate the retrievals for various types of crops from different remote sensing products. To apply the proposed method to different observation dates or areas in a simple way, an average correlation coefficient (within the range 0.65–0.90) can be used instead of the correlation for a single date or area. However, representing the actual situation for each observation date or area with the Cholesky method is not easily feasible in terms of computation time. Moreover, it is suggested to examine the given strategy with other inversion methods, such as OPT, Bayesian, and ANN.
5. Conclusions
This study demonstrated the feasibility of using the Cholesky algorithm in the LUT-based inversion approach to improve predictions of the SLC model. To quantify vegetation attributes for 27 potato plots, the LUT inversion approach was used as a robust and simple method with hyperspectral remote sensing data. Besides regularization techniques to optimize the LUT inversion method, known variable correlations were incorporated as an additional source of a priori information to avoid the generation of meaningless canopy spectra. The proposed method (Cholesky algorithm) utilizes the correlation between LAI and fCover that naturally exists in the study field to correlate the independent model variables LAI and Cv. Retrievals from the regularized LUT (LUTreg), which was modified by using the known variable correlation, were compared with those from the standard LUT (LUTstd).
The results revealed that LUTreg improved the accuracy of retrieved LAI and CCC compared with LUTstd in terms of R² and NRMSE, because it reduced the probability of unrealistic combinations of model parameters. However, the estimated fCover was not improved by LUTreg, owing to the error of the estimated Cv. In terms of the variability of model predictions over the 27 plots, the variability of the estimated LAI decreased markedly in LUTreg (5.10) compared with LUTstd (12.10), while that of the estimated Cv and fCover decreased slightly in LUTreg (0.10 and 0.02, respectively) compared with LUTstd (0.20 and 0.04, respectively).
For further studies, two main issues could be addressed. First, measuring the LIDF parameters and soil spectra in the field should be considered to increase the accuracy of fCover retrieval. Second, provided that sufficient ground data are available, a full variance–covariance or correlation matrix of all involved RTM variables should be considered to implement the Cholesky algorithm efficiently and decrease the uncertainties of retrievals, especially for Cv, which in turn would improve the accuracy of fCover retrieval.