4.1. The PC Analysis
Applying PCA to the hyperspectral images shows how the data dimensionality and spectral features are distributed across components. The scree plot (Figure 3) illustrates that the first two principal components are the most significant, explaining most of the variance within the dataset. This initial finding provided a rationale for focusing on these components for further model training and analysis. However, we decided to consider the first six PCs in the model training process, as minor spectral variations, corresponding to water stress and structural variation in the crops, appeared in the higher components of the PCA transformation. Figure 4 (loadings of spectral bands in each PC) supports this assumption: PC3, PC4, PC5, and PC6 capture minor spectral variation at specific wavelengths, as indicated by their loading coefficients.
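The quantities behind a scree plot and a loading plot can be sketched in a few lines. The example below is a minimal illustration rather than the processing chain used in this study: it computes PCA through a singular value decomposition in NumPy, and the synthetic `synthetic` array (200 pixels by 50 bands dominated by a shared brightness term) and the helper name `pca_summary` are assumptions introduced only for demonstration.

```python
import numpy as np

def pca_summary(spectra, n_components=6):
    """Explained-variance ratios, loadings, and scores of the leading PCs."""
    X = np.asarray(spectra, dtype=float)
    Xc = X - X.mean(axis=0)                      # center every band
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    variance = s**2 / (X.shape[0] - 1)           # eigenvalues of the covariance matrix
    ratio = variance / variance.sum()            # values plotted in a scree plot
    loadings = Vt[:n_components]                 # rows: PCs, columns: bands
    scores = Xc @ loadings.T                     # sample coordinates on each PC
    return ratio[:n_components], loadings, scores

# Synthetic stand-in (not study data): a shared brightness term plus small noise.
rng = np.random.default_rng(0)
brightness = rng.normal(size=(200, 1))
synthetic = 0.5 + 0.1 * brightness + 0.01 * rng.normal(size=(200, 50))

ratio, loadings, scores = pca_summary(synthetic)
print(np.round(ratio, 4))
```

In this synthetic case the first component absorbs almost all of the variance and loads on every band with the same sign, which is the pattern usually read as overall brightness.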
To analyze the relationship between the first six PCs and soil moisture levels, we created scatterplots, one for each pairwise combination of PCs, to see how the moisture samples are distributed and discriminated according to their moisture levels. The goal of this analysis was to identify any meaningful spectral characteristics that could be linked to the canopy’s spectral properties, influenced by both the plants’ physiological state and the underlying soil moisture.
Figure 5 and
Figure 6 show the distribution of soil moisture samples at two depths: 10 cm and 30 cm, respectively, in a 2D scatterplot created based on the first six PCs.
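Six PCs give 15 pairwise combinations. The short sketch below enumerates them and attaches panel letters; the lettering (A for PC1 versus PC2, continuing in order) is an assumption, chosen because it agrees with the text's references to plots A through E for PC1 and F through I for PC2.

```python
from itertools import combinations
from string import ascii_uppercase

pcs = [f"PC{i}" for i in range(1, 7)]

# Assumed lettering: panels follow the lexicographic order of the PC pairs.
panels = dict(zip(ascii_uppercase, combinations(pcs, 2)))

for letter, (x_axis, y_axis) in panels.items():
    print(f"Plot {letter}: {x_axis} vs {y_axis}")
```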
Before moving forward to the scatterplots, each PC should be explained in terms of the wavelengths contributed, given the loading plots presented in
Figure 4. In
Figure 4, PC1 represents the overall brightness of the reflectance, which, in the case of plant canopies, is influenced by the general health and vigor of the vegetation. While this component might be less directly related to soil moisture, healthier plants generally correspond to better-watered conditions, which might indirectly affect the overall reflectance.
Some wavelengths have a significant contribution to PC2. PC2’s alternating loading patterns are indicative of the pigmentation differences within the plant canopy (given the significant contribution of wavelengths in the ranges of 645–730 nm and 800–950 nm [28,29]), which are related to variations in chlorophyll content. Many studies have reported that healthier plants with more available water display different reflectance characteristics than those under water stress [30,31,32]. This variation may have a relationship with soil moisture at the root level, so we expect PC2 to be a significant variable in our model.
In PC3 and PC4, single spectral bands or narrow wavelength regions (900 nm in PC3 and 950–1000 nm in PC4) make the dominant contributions. These wavelengths from the NIR region of hyperspectral images were already used by previous studies for generating spectral indices related to plant structure [33]. Therefore, PC3 and PC4 might capture more useful information related to the vegetation’s structure, such as leaf area index (LAI) or canopy density, both of which can be correlated with soil moisture. This finding was highlighted in ref. [34], which reported that spectral indices related to plant structure vary with the level of water stress. Changes in leaf water content lead to subtle shifts in reflectance in the NIR region, which could be picked up by these components.
PC5 and PC6, with their fine-scale oscillatory patterns, might detect very specific responses of the plant canopy to variations in soil moisture, including stress responses that only affect certain narrow bands of the spectrum. For example, these PCs might highlight the slight spectral shifts that occur when plants begin to experience water stress before more visible symptoms appear. Because of the existence of these specific wavelengths in PC5 and PC6, we decided to maintain these PCs in our model training process even though these PCs contain less than 1% of the variation.
As mentioned, to analyze the relationship between the soil moisture levels and spectral information, we created not only the 2D scatterplots of PCs but also 2D scatterplots of soil moisture levels against some significant wavelengths with higher contributions in the PCs.
The distribution of soil moisture at a 10 cm depth (Figure 5), as interpreted through the PCA of plant canopy hyperspectral reflectance data, clarifies the relationship between spectral characteristics and soil moisture levels. PC1 exhibits a pronounced gradient when plotted against the other PCs: it discriminates the moisture samples into three clusters (low, mid-level, and high moisture content), which suggests that it captures general reflectance related to canopy characteristics potentially influenced by soil moisture content. This relationship is particularly evident in the scatterplots of PC1 versus PCs 2 and 3, where soil moisture content varied systematically along the PC1 axis. The loading plots support this observation, as PC1’s uniformly high loading across all wavelengths corresponds to general spectral intensity, which, in the context of canopy reflectance, could be influenced by the moisture conditions of the surface soil.
Conversely, PC2, when compared with PCs 3 through 6, presents a more heterogeneous relationship with soil moisture content, indicating that PC2 might be capturing contrast variations in the canopy related to differential water stress levels. This aligns with the oscillatory loadings observed in PC2 in the loading plots, reflecting the contrast in reflectance across different wavelengths. Plots F through I of PC2 against the higher PCs show a complex and less direct correspondence with soil moisture levels, suggesting the influence of additional canopy stress factors or other environmental variables. The inter-relationships among PCs 3 through 6 (Plots J, K, L, M, N) display a diffuse distribution of soil moisture content, potentially capturing more subtle or specific canopy and soil interactions that are not linearly related to soil moisture at the investigated depth. The connection between these higher-order PCs and soil moisture content is less apparent in the scatterplots, which may imply that, while they capture fine spectral details as indicated by the loading plots, these details might not be directly correlated with the soil moisture at the depth of 10 cm.
At a depth of 30 cm, the PC scatterplots (
Figure 6) exhibit a meaningful relationship with soil moisture levels, indicative of the intricate interactions between canopy reflectance and subsurface water content. PC1’s influence appears attenuated in plots A through E, with soil moisture distribution showing a gradient along PC1’s axis, though less distinct than at 10 cm depth. This suggests a dilution of the direct linkage between surface reflectance and soil moisture as the depth increases, hinting at the canopy’s integration of long-term soil moisture availability rather than immediate variations at this greater soil depth.
In the PC2 scatterplots (
Figure 6F–I), the distribution of soil moisture is notably scattered, lacking the clearer associations observed at shallower depths. Such a pattern may reflect the plant canopy’s composite response to environmental stressors, which includes but is not directly indicative of the moisture at a 30 cm depth. The lack of a strong, direct correlation might also point toward the canopy’s physiological adaptations and structural characteristics, which can obscure the spectral signals of deeper soil moisture contents. Comparing these results with the 10 cm depth plots, it becomes clear that as we probe deeper into the soil profile, the direct relationships captured by the hyperspectral canopy reflectance data become increasingly unclear. This trend underscores the complexity of the subsurface soil moisture dynamics and their expression through the plant canopy at various depths.
4.2. Crop Canopy Spectrum Analysis
In this section, we present the analysis of the interplay between select wavelengths, represented by key spectral bands, and the soil moisture at various depths within the cornfield. The wavelengths chosen for this investigation are those identified through PCA as having a significant influence (or causing higher discrimination of soil moisture levels) on the dataset, as indicated by their contribution to the loading plots. There are two rationales behind this aspect of our analysis. Firstly, we aimed to analyze how the relationship between soil moisture and canopy reflectance fluctuates across different soil depths as the plants progress through their growth season. This comparison helps us to understand how the depth of root water uptake influences the spectral signature of the canopy. Secondly, by examining the data from multiple dates, we sought to determine when and at what depth the correlation between soil moisture levels and canopy reflectance is strongest. Identifying these key times and depths allows us to pinpoint the most informative spectral bands for monitoring soil moisture, which can be critical for precision agriculture practices and water management strategies. Hence, given the previous studies conducted on corn’s growth stages, which are assigned the designations of VE to R6, from seeding to harvesting, we could assign each dataset to a specific growth stage. 15 June was placed in the V6 to V10 stages of plant growth and the data collection on 30 June was conducted at the V10 stage. The data collected on 11 July and 20 July included the stage of VT and R1, and the data collection conducted on 2 August and 14 August were at the stage of R1. Regarding the changes in root density and distribution over these stages of plant growth, we decided to compare the changes in the relationship between the soil moisture samples and canopy reflectance over the growth period to examine the effects of root structure changes on the relationship.
We plotted the soil moisture data collected on various dates against the corresponding reflectance values to create a series of 2D scatterplots.
Figure 7 and
Figure 8 display these relationships for soil moisture depths of 10 cm and 30 cm, respectively.
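The correlation value attached to each scatterplot can be reproduced with a per-band Pearson coefficient. The sketch below is illustrative only: the five band centers are taken from the wavelengths named in this paper, whereas the moisture values, the `slopes` sensitivities, and the helper name `band_correlations` are hypothetical and generated purely for demonstration.

```python
import numpy as np

def band_correlations(reflectance, moisture):
    """Pearson r between soil moisture and reflectance, one value per band."""
    R = np.asarray(reflectance, dtype=float)     # shape: (samples, bands)
    m = np.asarray(moisture, dtype=float)        # shape: (samples,)
    Rc = R - R.mean(axis=0)
    mc = m - m.mean()
    return (Rc.T @ mc) / (np.linalg.norm(Rc, axis=0) * np.linalg.norm(mc))

bands_nm = [453, 557, 677, 814, 997]

# Hypothetical data for one date and one depth (not measurements from the study).
rng = np.random.default_rng(1)
moisture = rng.uniform(10, 35, size=40)                    # volumetric moisture, %
slopes = np.array([-0.004, -0.002, -0.005, 0.006, 0.003])  # assumed sensitivities
reflectance = 0.3 + moisture[:, None] * slopes + 0.01 * rng.normal(size=(40, 5))

r = band_correlations(reflectance, moisture)
for band, value in zip(bands_nm, r):
    print(f"{band} nm: r = {value:+.2f}")
```

Running the same function separately for each date and depth would give the kind of correlation table that the scatterplots summarize.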
From the figures, different depths show distinct correlation profiles with the spectral data. Higher correlations at specific depths point to where the crop’s roots are most active in water absorption. For example, where shallower depths (e.g., 10 cm) show higher correlations early in the season, this is likely due to shallower root systems (Figure 7). There are two other reasons for the higher correlation in the moisture samples collected on 15 June. The first reason is that, on this day, there was a very large difference between the moisture samples over the irrigated and non-irrigated plots, where the non-irrigated plots were under high water stress. At this stage of corn growth (V6 to V10), a critical stage for corn, any variation in available moisture can lead to significant physiological changes. These changes are readily detectable as alterations in the canopy reflectance, explaining the higher correlation when plants are undergoing such rapid development. This high water stress over the non-irrigated plots significantly affects the canopy spectrum and the crop’s leaf greenness, to the point that the canopy color difference was visible in simple RGB images. The correlation reached 0.95 at 453 nm and 677 nm (the blue and red bands of the sensor). This finding was also highlighted by Tucker in 1979, who used a red band for vegetation monitoring [35].
There are two other findings from the correlation values in Figure 7. The first is that there is a high correlation between soil moisture measured in irrigated and non-irrigated plots at a depth of 10 cm and spectral information in the visible bands. The higher correlation at 10 cm in the moisture samples collected on 15 June indicates that the root system is highly active in the upper soil layers during the V6 to V10 vegetative stages, with vigorous water uptake to support rapid growth. This high activity in the root zone means that changes in soil moisture at this depth can significantly impact the plant’s physiological status, including its canopy reflectance, especially in the bands sensitive to chlorophyll and water content. This finding was also highlighted by Jackson et al. [36]. Canopy responsiveness is another reason for the higher correlation between samples at a depth of 10 cm and canopy reflectance values. The canopy is particularly responsive to water status during early growth stages (V6 to V10, 15 June). Soil moisture at a 10 cm depth directly influences the health and vigor of the canopy, affecting its spectral reflectance properties. This sensitivity is especially evident in bands that capture the chlorophyll content: in Figure 7, the NIR and red-edge wavelengths show a stronger relationship with the soil moisture samples at 10 cm. This is consistent with Gitelson et al. (2003), who related leaf chlorophyll content to spectral reflectance in the red-edge and near-infrared regions to create an algorithm for non-destructive chlorophyll assessment in higher plant leaves [37].
The correlation values between moisture samples and the crop’s canopy spectrum information decreased for moisture samples measured on 30 June. On this date, although there was a reasonable difference in soil moisture content between the irrigated and non-irrigated plots, the crop canopies were similar, at their highest levels of greenness and chlorophyll content. This date can be considered the mid-vegetative stage of plant growth, or the middle of the growing season, when the plant usually has higher levels of chlorophyll [38] as well as a higher resilience to environmental stress (high air temperature or low water availability) [39]. Therefore, the larger difference in soil moisture between the irrigated and non-irrigated plots did not translate into a significant variation in the canopy spectrum, and the drop in the correlation values across all the spectral variables is expected.
As the crop matures and roots grow deeper, we expect to see an increase in correlation at greater depths (30 cm), specifically over the non-irrigated plots where the root starts going down to access water [
40]. As displayed in
Figure 8, the higher correlations at the 30 cm depth occurred on 11 July and 20 July, which fall within the VT and R1 stages. Over these two data collection dates, there were no rainy days, the crops in non-irrigated plots were under high water stress, and irrigated plots were irrigated regularly. As a result, not only did the soil moisture samples differ strongly between the irrigated and non-irrigated plots, but the crop’s canopy spectrum was also significantly different between them. The higher correlation between soil moisture samples and the crop’s canopy spectrum can also be associated with the corn’s root functioning at this stage. This stage is also called the tasseling stage (VT); just before and during tasseling, the corn plant is still typically at its peak greenness, indicating high chlorophyll concentration as the plant prepares to reproduce [41,42]. At this stage, the crop root performs critical functions such as water uptake: the roots actively take up water to support the high transpiration demands of the large leaf area. Adequate water is crucial at this stage, as it directly impacts tassel development and the success of pollination. Thus, since there was not adequate water in the non-irrigated plots, we expected a high spectral variation due to the root–canopy response and, eventually, a stronger relationship between soil moisture levels and canopy spectrum. The non-irrigated plot moisture levels revealed a high correlation between moisture levels at 30 cm and the canopy spectrum (the scatterplots of samples collected on 11 July and 20 July in Figure 8).
Root growth and soil exploration are two other factors that we considered while analyzing the moisture samples measured on 11 July and 20 July. On these two dates, because of the higher water stress of the plant and low levels of soil moisture, the corn roots began exploring the soil and going deeper to access water [
43,
44], potentially making the root system more extensive at a 30 cm depth, causing higher correlation values at this depth.
During the R1 to R6 stages of corn growth, which encompass the reproductive phase from silking to physiological maturity [
45], we collected data twice, on 2 August and 14 August. The functioning of the roots in irrigated and non-irrigated plots can reveal several insights. The first noticeable result is that the correlation between the measured soil moisture at 30 cm and reflectance in visible wavelengths, which was between 0.60 and 0.65 during July, remained unchanged or increased slightly to between 0.65 and 0.73 in the samples collected on 2 August and 14 August (Figure 8). This finding suggests that, in August, the plant’s roots at the lower depth (30 cm) become more functionally active, so that moisture variation causes higher variation in the canopy spectrum.
Another finding from the samples measured over this period is a stronger relationship between canopy reflectance and the moisture samples of non-irrigated plots at a depth of 10 cm than in the samples measured during July. This is due to the non-irrigated plants’ heightened sensitivity to changes in water availability, where a slight increase in moisture (resulting from rainfall) leads to a significant spectral response. This sensitivity is a result of the plants being more attuned to water stress, making their canopies more responsive to fluctuations in water supply.
4.3. Evaluation of Learning Tools
For the SVR, we explored several kernel types including RBF, linear, and polynomial to identify the best fit for our data characteristics. The regularization parameter C was varied across a broad spectrum (0, 10, 50, 100, 200) to determine the optimal balance between model complexity and training accuracy. The epsilon values (0.1, 0.2, 0.5), controlling the width of the margin of tolerance where no penalty is given to errors, and gamma values (0.01, 0.001, 0.002), which define the influence of a single training example, were carefully adjusted to fine-tune the model’s sensitivity to the training data.
For the RF, adjustments were made to the maximum depth of the trees (3, 5, 7, 9, 20) to prevent overfitting while ensuring sufficient model depth for capturing data complexities. The number of estimators was also varied (20, 50, 100, 200) to find an optimal number that provides the best generalization performance without becoming computationally prohibitive.
In the case of the GBM, we tested different configurations by altering the number of estimators (100, 200, 300) and the learning rate (0.02, 0.05, 0.1, 0.2) to control the training speed and overfitting potential. The subsample parameter was set at 0.5 and 0.6 to introduce stochasticity in the learning process, thereby improving the robustness of the model against noise in the training data.
For the Artificial Neural Network (ANN), the architecture was optimized by varying the number of neurons per layer (32, 40, 50, 64) and experimenting with different optimizers (Adam, SGD) along with a range of learning rates (0.1, 0.01, 0.002, 0.0005). The batch sizes (32, 64, 100, 256) were adjusted to optimize computational efficiency and convergence speed during training. This allowed the ANN to effectively learn complex patterns and interactions within the dataset. The process of model training was conducted with 70% of the total samples (548 samples), and the remaining samples were used for evaluation of the models.
The SVR model was configured with a radial basis function (RBF) kernel, regularization parameter C = 100, loss tolerance ε = 0.2, and kernel coefficient γ = 0.01 to manage the trade-off between model complexity and learning accuracy from the data. For the RF regressor, we selected an ensemble of 50 trees with a maximum depth of 9, employing bootstrap samples to ensure robust random sampling and limit overfitting.
Further, our GBM was tuned with parameters set for optimal depth and node creation using a learning rate of 0.2 and 100 estimators. The ANN, designed using the TensorFlow framework, comprised multiple dense layers with 40 neurons each, ReLU activation, and dropout of 0.3 to prevent overfitting, which was crucial for learning non-linear relationships in the data. An Adam optimizer with a learning rate of 0.0005 was utilized to minimize the mean squared error across 400 epochs, demonstrating the model’s efficiency in learning from the training data while validating against the test set.
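For readers who want to reproduce a comparable setup, the configuration reported above can be written down as follows. This is a sketch under stated assumptions, not the code used in this study: the use of scikit-learn is assumed (only TensorFlow is named in the text, for the ANN), the reported loss tolerance is mapped to the `epsilon` argument, the GBM `subsample` value is left at its default because the selected value is not reported, and the data are synthetic stand-ins for six PC scores. The ANN is omitted here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for six PC scores and a soil moisture target (not study data).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))
y = 25 + 3 * X[:, 0] - 2 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=200)

# 70% of the samples for training, the remainder for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    # RBF kernel, C = 100, epsilon = 0.2, gamma = 0.01
    "SVR": SVR(kernel="rbf", C=100, epsilon=0.2, gamma=0.01),
    # 50 trees, maximum depth 9, bootstrap sampling
    "RF": RandomForestRegressor(
        n_estimators=50, max_depth=9, bootstrap=True, random_state=0
    ),
    # 100 estimators, learning rate 0.2 (subsample left at the library default)
    "GBM": GradientBoostingRegressor(
        n_estimators=100, learning_rate=0.2, random_state=0
    ),
}

# R^2 on the held-out 30% for each model.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
for name, r2 in scores.items():
    print(f"{name}: R^2 = {r2:.2f}")
```

Note that scikit-learn requires a strictly positive C for SVR, so a grid search written this way could not include a value of 0.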
The assessment of machine learning models for estimating soil moisture at a depth of 10 cm revealed significant differences in performance. The models evaluated, presented in Table 2, included RF, GBM, SVM, and ANN. The ANN emerged as the most accurate model, recording an R² of 0.72, indicating that it could explain 72% of the variance in soil moisture at this depth. The performance of the ANN is attributed to its capacity to capture the complex nonlinear relationships inherent in the soil moisture variables compared to other algorithms [
46].
Additionally, it achieved the lowest RMSE of 2.30%. The SVM followed with an R² of 0.68, while GBM and RF had lower but still substantial R² values of 0.67 and 0.63, respectively. The progression from RF to ANN highlighted a clear trend: as the complexity of the models increased, so did their ability to capture and predict the nuances of soil moisture variability. In terms of Pbias, the ANN model had the best performance at a 10 cm depth, with a Pbias of 0.055%, indicating the least amount of bias and highly accurate predictions with minimal over- or underestimation. The SVM model followed closely, with a Pbias of −0.32%, demonstrating only a slight tendency toward underestimation. The GBM model had a Pbias of 2.41%, reflecting a moderate level of bias but still maintaining a good balance between prediction accuracy and systematic error. The RF model exhibited the highest Pbias at 2.66%, suggesting a greater degree of bias and a higher likelihood of over- or underpredicting soil moisture.
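The metrics discussed above can be computed as in the following sketch. The formulas are assumptions, since this section does not define them: R² is taken as the coefficient of determination, and Pbias as 100 × Σ(predicted − observed) / Σ observed, a sign convention chosen so that negative values indicate underestimation, as in the interpretation given above. The function name and the four sample values are hypothetical.

```python
import numpy as np

def regression_metrics(observed, predicted):
    """R^2, RMSE, MAE, and percent bias for paired observations and predictions."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    resid = obs - pred
    ss_res = np.sum(resid**2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": float(np.sqrt(np.mean(resid**2))),
        "mae": float(np.mean(np.abs(resid))),
        # Assumed convention: positive = overestimation, negative = underestimation.
        "pbias": 100.0 * np.sum(pred - obs) / np.sum(obs),
    }

# Hypothetical soil moisture values in percent (not study data).
observed = [20.0, 25.0, 30.0, 35.0]
predicted = [21.0, 24.0, 31.0, 36.0]
metrics = regression_metrics(observed, predicted)
print({name: round(float(value), 3) for name, value in metrics.items()})
```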
Further statistical analysis supports the superior performance of the ANN model (Table 3). The ANN achieved an F-statistic value of 432.7, indicating a highly significant model fit with a practically negligible probability of the F-statistic (p < 6.06 × 10⁻⁴⁶). This was complemented by the lowest BIC score of 677.5 among the models, suggesting that despite its complexity, the ANN model offers a desirable balance of fit and simplicity. The t-tests conducted on the slopes and intercepts of the regression models reinforce this conclusion, with the ANN model showing a statistically significant t-test/slope value of 20.87 and a near-zero t-test/intercept value, suggesting minimal deviation from the origin. These statistical validations support the ANN model’s status as a robust and reliable method for soil moisture estimation, making it a preferable choice for applications that require high accuracy, such as precision agriculture and water resource management.
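The following sketch shows one way such regression diagnostics can be obtained. It is illustrative only and rests on assumptions not stated in the text: the statistics are taken from an ordinary least-squares line fitted to predicted versus observed values, and the BIC uses the Gaussian form n·ln(RSS/n) + k·ln(n) with k = 2, which is one of several common definitions. The function name and the data are hypothetical.

```python
import numpy as np

def fit_diagnostics(observed, predicted):
    """OLS of predicted on observed: t-values, F-statistic, and a Gaussian BIC."""
    x = np.asarray(observed, dtype=float)
    y = np.asarray(predicted, dtype=float)
    n = x.size
    A = np.column_stack([np.ones(n), x])           # design matrix: intercept, slope
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    rss = float(np.sum((y - fitted) ** 2))         # residual sum of squares
    ess = float(np.sum((fitted - y.mean()) ** 2))  # explained sum of squares
    sigma2 = rss / (n - 2)                         # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    t_intercept, t_slope = beta / se
    return {
        "intercept": float(beta[0]),
        "slope": float(beta[1]),
        "t_intercept": float(t_intercept),
        "t_slope": float(t_slope),
        "f_stat": ess / sigma2,                    # one predictor: ESS / (RSS / (n - 2))
        "bic": n * np.log(rss / n) + 2 * np.log(n),  # assumed BIC form, k = 2
    }

# Hypothetical observed and predicted soil moisture (not study data).
rng = np.random.default_rng(7)
observed = rng.uniform(10, 35, size=100)
predicted = 2.0 + 0.9 * observed + rng.normal(scale=1.0, size=100)
diag = fit_diagnostics(observed, predicted)
print({name: round(value, 3) for name, value in diag.items()})
```

With a single predictor, the F-statistic equals the square of the slope's t-value, so the two columns of such a table carry the same information about the slope.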
Figure 9 shows the 1:1-line scatterplots and the distribution of residuals resulting from the machine learning algorithms. Figure 9a through Figure 9d depict 1:1-line scatterplots of observed versus predicted soil moisture for each algorithm. Consistent with the previously reported statistical metrics, the ANN model (Figure 9d) demonstrated the closest adherence to the 1:1 line, reflected in an R² value of 0.72, indicating a strong positive correlation between predicted and observed values. This is corroborated by the narrow spread around the line of unity, suggesting accurate predictions with minimal bias. The SVM model (Figure 9c) also showed tight clustering near the 1:1 line, albeit with slightly more dispersion than the ANN, corresponding to its R² of 0.68. The RF and GBM models (
Figure 9a,b), while still showing a positive correlation with R² values of 0.63 and 0.67, respectively, exhibit a broader scatter of points, implying less accurate prediction of soil moisture levels.
Figure 9e–h provide insight into the distribution of residuals for each model’s moisture predictions. A residual is the difference between the observed value and the model’s prediction, and an ideal model would have a residual distribution that is narrowly centered around zero. The ANN model (
Figure 9h) achieves a nearly symmetrical distribution of residuals with a sharp peak around zero, indicating that most of its predictions were very close to the actual values. The SVM model (
Figure 9g) displays a similar pattern, with a slightly wider spread, which aligns with its marginally lower R² value. In contrast, the RF and GBM models (
Figure 9e,f) show wider distributions, reflecting the greater variability in their prediction accuracy. The histograms reaffirm the superior performance of the ANN and SVM models over the RF and GBM models for soil moisture estimation at this depth.
At a depth of 30 cm, all of the models demonstrated enhanced performance, with the ANN leading (R² = 0.79, Table 4) and the GBM not far behind (R² = 0.74, Table 4). The SVM also displayed a commendable R² of 0.70, showcasing its robustness in modeling at this depth. The scatterplots corroborate these findings, with a tighter cluster of points around the 1:1 line for GBM and SVM (Figure 10b,c), and the residual plots revealed a relatively narrow spread for these models (Figure 10f,g), suggesting a higher consistency in predictions. In terms of bias, the ANN model showed the best performance, with a Pbias of 1.19%, indicating minimal bias and highly reliable predictions without significant over- or underestimation. The GBM model also performed well, with a Pbias of 2.12%, reflecting a reasonable balance between accuracy and bias. The Random Forest (RF) model had a slightly higher Pbias of 2.34%, indicating a modest increase in systematic bias. In contrast, the SVM model had the highest Pbias at 2.74%, making it the most prone to over- or underpredicting soil moisture.
The further statistical tests presented in Table 5 also support the better performance of the ANN model. In this table, the ANN model shows the highest F-statistic of 640.1, indicating the strongest relationship between predicted and observed values and the greatest statistical significance. This is supported by the probability (Prob) of the F-statistic, where the ANN model has the lowest value at 2.01 × 10⁻⁵⁵, which indicates a virtually zero chance that the model’s predictive capabilities are due to random variation. The t-test/slope values are all significant for the models, with the ANN model exhibiting the highest value of 25.29, suggesting a strong linear relationship between the predicted and observed soil moisture levels. The ANN’s t-test/intercept is closest to zero at 0.98, showing minimal bias in its predictions. Conversely, the RF model exhibits a negative intercept, suggesting a consistent underestimation of the soil moisture content.
Figure 11 presents the error breakdown of soil moisture estimates at 10 cm and 30 cm soil depths, determined using the ANN model over several dates.
Figure 11 shows that, generally, non-irrigated plots yielded more accurate soil moisture estimates than irrigated ones across most dates (less than 2% at 10 cm and less than 3% at 30 cm). As previously examined, the early growth stage on 15 June is characterized by minimal root activity at 30 cm, which may lead to a lesser correspondence with canopy spectral data and, consequently, increased estimation errors. Additionally, the substantial estimation errors across various depths on 25 June can be attributed to the preceding rainfall, which likely led to homogenized soil moisture due to saturation, hampering the spectral differentiation typically seen in the canopy, a phenomenon referred to as spectral saturation. This effect aligns with the observations in the scatterplots (
Figure 7 and
Figure 8) and is consistent with the spectral responses expected during the early growth stages, where even minor shifts in water content can significantly alter canopy spectra.
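An error breakdown of this kind amounts to grouping the prediction errors by date and plot type before summarizing them. The sketch below assumes RMSE as the error measure, which may differ from the measure used in Figure 11; the function name and the records (date, plot type, observed, predicted) are hypothetical values, not measurements from this study.

```python
from collections import defaultdict

import numpy as np

def rmse_by_group(records):
    """RMSE per (date, plot type) from (date, plot, observed, predicted) records."""
    squared_errors = defaultdict(list)
    for date, plot, observed, predicted in records:
        squared_errors[(date, plot)].append((observed - predicted) ** 2)
    return {key: float(np.sqrt(np.mean(values))) for key, values in squared_errors.items()}

# Hypothetical records: soil moisture in percent.
records = [
    ("15 June", "irrigated", 30.0, 33.0),
    ("15 June", "irrigated", 32.0, 28.0),
    ("15 June", "non-irrigated", 18.0, 19.0),
    ("15 June", "non-irrigated", 20.0, 19.0),
    ("11 July", "irrigated", 28.0, 30.0),
    ("11 July", "irrigated", 30.0, 28.0),
    ("11 July", "non-irrigated", 15.0, 15.5),
    ("11 July", "non-irrigated", 16.0, 15.5),
]

breakdown = rmse_by_group(records)
for (date, plot), rmse in breakdown.items():
    print(f"{date:8s} {plot:14s} RMSE = {rmse:.2f}%")
```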
Figure 11b indicates that the smallest relative errors for the non-irrigated plots occurred in the 30 cm depth samples collected on 11 and 20 July. During these dates, the plants in non-irrigated plots experienced moisture stress at shallower levels, prompting root systems to extend deeper in search of water. This stress response is reflected in the higher correlation (
Figure 8), which corresponds to a lower RMSE, indicating enhanced model accuracy for non-irrigated samples in July.
The error breakdown in Figure 11b also highlights intriguing results for the data collected on 2 and 14 August, corresponding to the R1 growth stage. Here, the correlation between soil moisture at a depth of 30 cm in non-irrigated plots and canopy spectral data was stronger than at shallower depths, a relationship likely influenced by the moisture-responsive root system at these lower depths, leading to improved estimation accuracy.
4.4. Comparison with Previous Studies
When comparing our findings with previous research, our study highlights the promising accuracy of advanced machine learning models, particularly ANNs, for soil moisture estimation across multiple depths. Notably, the ANN achieved an R² of 0.72 at 10 cm, aligning closely with the high performance benchmarks set in other studies on soil moisture estimation in agricultural contexts. This reinforces the assertion that complex models like ANNs, capable of capturing nonlinear interactions, outperform simpler algorithms. For instance, as noted by Ding et al. [47] and Nawar and Mouazen [48], RF consistently delivers robust predictions of soil properties, particularly in arid, heterogeneous soils. However, our findings indicate that while RF is robust (R² of 0.63), the ANN’s accuracy surpasses it, suggesting that ANNs are better suited for capturing soil moisture variability in complex environments, as also observed by Chen et al. [49] and Lindner et al. [50].
The selection of critical spectral wavelengths in our study, particularly 453 nm, 557 nm, 677 nm, 814 nm, and 997 nm, directly contributes to the high accuracy of soil moisture models, especially for ANN and SVM. This is in line with Ge et al. [
10], who found that pretreatment of spectral indices, particularly those emphasizing green, red, and red-edge wavelengths, enhanced model performance significantly. Also, Guan et al. highlighted the importance of reflectance data from red, green, blue, and NIR bands, which showed correlations with SWC measurements that were as strong as or stronger than many vegetation indices, suggesting that these bands should be incorporated into machine learning models [
51]. Similarly, our study found that targeted spectral data enhanced the predictive capability of the ANN, achieving a low MAE of 1.83% at 10 cm, reflecting a robust accuracy consistent with other studies that leveraged pretreatment for SMC predictions. These findings suggest that carefully selected wavelengths amplify model precision, particularly when applied with advanced machine learning techniques.
At greater soil depths, our results mirror previous studies in highlighting the challenge of maintaining prediction accuracy as depth increases. Specifically, Zhu et al. [
52] reported that models exhibit a drop in R² with increasing depth due to reduced spectral signal quality from subsurface layers and increased canopy interference. In our study, model performance at a 30 cm depth showed a similar trend for all models except the ANN, consistent with the observations by Ge et al. and Zhu et al. of diminishing prediction accuracy for soil moisture in deeper layers. Additionally, Chen et al. [
49] and Thevs et al. [
53] indicated that the root-zone depth and vegetation canopy significantly impact accuracy, with models demonstrating heightened sensitivity to shallow soil moisture levels. Our results support these findings, showing that the ANN and SVM achieved better predictive performance at a 10 cm depth, where spectral data strongly correspond to moisture content due to shallow root uptake.
Interestingly, our study also echoes the insights from Chen et al. [
49] regarding the anti-interference capabilities of RF in handling noise and outliers, which allows it to perform reliably under variable moisture conditions. While the RF in our study performed reasonably well, particularly at a 10 cm depth, the ANN’s superior generalization to complex soil interactions resulted in a higher accuracy, as reflected by the ANN’s F-statistic of 432.7. Moreover, Chen et al. [
49] found that SVM generally performs well with complex datasets, though it is sensitive to noisy data. Similarly, our SVM model achieved a strong R² of 0.68 at 10 cm, indicating its efficacy in scenarios with limited noise, but showed less resilience at 30 cm, where outliers impacted its accuracy more noticeably.
In line with Shidan Zhu et al. [
54], who highlighted that the effectiveness of ML models in soil moisture estimation varies across root depths, our study confirms that the ANN consistently performs best across all depths, with the SVM and GBM following closely at shallower levels. Specifically, at 30 cm, the ANN reached an impressive R² of 0.79, underscoring its capacity to manage soil moisture prediction even as depth increases, a challenge previously noted by Chen et al. [
49] for models applied to varying soil depths. This result indicates that while depth inherently complicates moisture estimation, the ANN provides a reliable solution for applications where detailed, depth-sensitive moisture data are essential.
Overall, our findings agree with previous research, reaffirming that machine learning, and particularly ANNs, plays a crucial role in achieving high soil moisture prediction accuracy across variable depths. The comparable performance metrics, such as high R² values and low error rates, indicate that advanced ML models, tailored with critical spectral information, are capable of capturing soil moisture intricacies across layers. This consistency with prior studies underscores the ANN’s potential as a primary tool for precision agriculture and water management applications, especially when model selection is aligned with the target depth of moisture estimation.