*3.2. Empirical Relationships*

Linear regression is a widely used method for analyzing hyperspectral imagery and retrieving target information (e.g., crop and soil properties). Both spectral reflectance and vegetation indices can be used as predictor variables in establishing a linear relationship. For instance, using spectral bands, Finn et al. [108] built linear regressions between field-measured soil moisture and the spectral reflectance of the collected hyperspectral imagery and identified the bands that correlate more strongly with soil moisture. More studies have used vegetation indices in the regression to obtain better performance, as some indices enhance the signal of the targeted features and minimize background noise. Selected previous studies are listed in Table 6.
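The band-wise approach described above can be illustrated with a minimal sketch. It is not the procedure of any cited study: the data are synthetic, the dependence of soil moisture on band 30 is an assumption made purely for illustration, and the helper name `bandwise_correlation` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for real data: 60 field samples x 50 spectral bands.
n_samples, n_bands = 60, 50
reflectance = rng.uniform(0.05, 0.6, size=(n_samples, n_bands))

# Assumption for illustration only: band 30 responds to soil moisture.
soil_moisture = 40.0 - 50.0 * reflectance[:, 30] + rng.normal(0.0, 1.0, n_samples)

def bandwise_correlation(X, y):
    """Pearson correlation between each band (column of X) and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

# Identify the band with the strongest (absolute) correlation.
r = bandwise_correlation(reflectance, soil_moisture)
best_band = int(np.argmax(np.abs(r)))

# Simple linear regression y = slope * x + intercept on the selected band.
slope, intercept = np.polyfit(reflectance[:, best_band], soil_moisture, deg=1)
predicted = slope * reflectance[:, best_band] + intercept

# Coefficient of determination of the fitted line.
ss_res = np.sum((soil_moisture - predicted) ** 2)
ss_tot = np.sum((soil_moisture - soil_moisture.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot
```

With real imagery, `reflectance` would hold the pixel spectra extracted at the field sampling locations, and the same steps could be applied to a vegetation index instead of a single band.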


**Table 6.** Selected previous studies that used linear regression and hyperspectral vegetation indices to investigate agricultural features.


Overall, linear regression has been commonly used for estimating a wide range of crop and soil properties. It is easy to establish, and most index-based regressions have produced satisfactory accuracies. However, this approach has several potential issues: the large number of available indices makes it unclear which one performs best, the regression may be very sensitive to data size and quality, and indices are subject to saturation [36,165]. It is thus critical to consider these issues and adopt appropriate solutions when establishing linear regressions with hyperspectral data. For instance, selecting vegetation indices appropriate to the targeted crop or soil variable is recommended. Researchers have evaluated a wide range of hyperspectral vegetation indices for different research purposes. Haboudane et al. [166] examined 11 hyperspectral vegetation indices for estimating crop chlorophyll content. Main et al. [167] investigated 73 vegetation indices for estimating chlorophyll content in crop and savanna tree species. Peng and Gitelson [168] tested 10 multispectral indices and 4 hyperspectral indices for quantifying crop gross primary productivity. Croft et al. [169] analyzed 47 hyperspectral indices for estimating the leaf chlorophyll content of different tree species. Zhou et al. [170] evaluated eight hyperspectral indices for estimating canopy-level wheat nitrogen content. Tong and He [165] evaluated 21 multispectral and 123 hyperspectral vegetation indices for calculating grass chlorophyll content at both the leaf and canopy scales. Yue et al. [171] examined 54 hyperspectral vegetation indices for estimating winter wheat biomass. Indices performed differently across these studies; it is therefore advisable to evaluate the top-performing indices from these studies and select the one that yields the highest accuracy.
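One way to compare candidate indices is to fit a separate linear regression for each and rank them by cross-validated error. The sketch below is an illustration under stated assumptions, not a recommended index set: the spectra are synthetic, the chlorophyll response at 710 nm is imposed by construction, and the band pairs and the helper names `band`, `normalized_difference`, and `cv_rmse` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic canopy spectra: 80 samples, bands every 10 nm from 400 to 1000 nm.
wavelengths = np.arange(400, 1001, 10)
n_samples = 80
chlorophyll = rng.uniform(20.0, 60.0, n_samples)  # hypothetical units

spectra = 0.4 + rng.normal(0.0, 0.02, size=(n_samples, wavelengths.size))
# Assumption for illustration: reflectance at 710 nm falls as chlorophyll rises.
red_edge = int(np.argmin(np.abs(wavelengths - 710)))
spectra[:, red_edge] = 0.45 - 0.005 * chlorophyll + rng.normal(0.0, 0.01, n_samples)

def band(nm):
    """Reflectance of the band closest to the requested wavelength."""
    return spectra[:, int(np.argmin(np.abs(wavelengths - nm)))]

def normalized_difference(a, b):
    return (a - b) / (a + b)

# Candidate indices (arbitrary band pairs chosen for this example).
candidates = {
    "ND(800, 670)": normalized_difference(band(800), band(670)),
    "ND(750, 710)": normalized_difference(band(750), band(710)),
    "ND(550, 450)": normalized_difference(band(550), band(450)),
}

def cv_rmse(x, y, n_folds=5):
    """k-fold cross-validated RMSE of the simple regression y ~ x."""
    idx = np.arange(y.size)
    errors = []
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        slope, intercept = np.polyfit(x[train], y[train], 1)
        errors.append(y[test] - (slope * x[test] + intercept))
    return float(np.sqrt(np.mean(np.concatenate(errors) ** 2)))

# Rank indices from lowest to highest cross-validated error.
scores = {name: cv_rmse(x, chlorophyll) for name, x in candidates.items()}
ranking = sorted(scores, key=scores.get)
```

Ranking on held-out error rather than on the fit to all samples reduces the chance of favoring an index that only fits the calibration data well, which matters given the sensitivity to data size noted above.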

To address the limitations of linear regression, more advanced regression methods, such as MLR and PLSR, have also been commonly used in previous research for estimating crop and soil properties [172,173]. Compared with linear regression, these models mostly use multiple predictor variables to achieve a higher accuracy. PLSR is one of the most widely used models for investigating crop properties using hyperspectral images. For example, Ryu et al. [35], Jarmer [99], Siegmann et al. [73], and Yue et al. [124] used PLSR and hyperspectral images for estimating different crop biophysical and biochemical variables (e.g., LAI, biomass, chlorophyll content, fresh matter, and nitrogen content). Thomas et al. [100] examined PLSR for retrieving the biogas potential from hyperspectral images and evaluated the influence of imaging time on retrieval accuracy. Regarding soil features, Gomez et al. [49], Van Wesemael et al. [107], Hbirkou et al. [102], and Castaldi et al. [110] built PLSR models for estimating the SOC content using hyperspectral images. Zhang et al. [50] used PLSR for estimating a wide range of soil properties (e.g., soil moisture, soil organic matter, clay, total carbon, phosphorus, and nitrogen content) from hyperspectral imagery and identified factors that may affect the model accuracy (e.g., low signal-to-noise ratio, spectral overlap of different soil features). Casa et al. [59] used the PLSR model with different hyperspectral imagery for investigating soil textural features and evaluated various factors (e.g., spectral range and resolution, soil moisture, geolocation error) influencing the model performance.

The PLSR model is implemented in Python and R [174,175] and is widely used in many research areas, including forests [176], grasslands [177], and waters [178]. This model has performed well in different studies owing to its strengths in dealing with a large number of inter-correlated predictor variables (i.e., by converting them to a few non-correlated latent variables), addressing the data noise challenge, and tackling the over-fitting problem [171,179]. Several techniques have also been shown to improve the accuracy of the PLSR model, such as incorporating different types of predictor variables (e.g., spectral bands, indices, textural variables), using the predicted residual error sum of squares (PRESS) statistic to determine the optimal number of latent variables, and applying feature evaluation to select the more important predictor variables [36]. It is thus critical to examine these techniques carefully to achieve the optimal model accuracy.
