1. Introduction
The rapid development of deep learning, particularly the successful application of Convolutional Neural Networks (CNNs) in image recognition and processing, has significantly expanded its potential applications in soil science. Among various deep learning architectures, ResNet has been widely used not only for soil classification and fertility assessment but also for plant leaf disease detection [1]. Soil is an important natural resource on the Earth’s surface, and information on its physical and chemical properties, moisture, and type is of great significance for agricultural production, ecological protection, and land use planning. The diversity and complexity of soil give its data pronounced spatial heterogeneity [2] and temporal dynamics [3], which pose several challenges for traditional soil analysis methods. For instance, conventional methods often rely on sampling and experimental analysis, which can be inefficient and inadequate for handling large-scale data. Additionally, gathering soil information usually depends on manual ground observations, which not only consume considerable time and resources but also make it difficult to accurately reflect the distribution characteristics of soil across varying spatial scales [4]. Therefore, how to use efficient and accurate techniques for automatic analysis and large-scale monitoring of soil data has emerged as a crucial topic in soil science research.
ResNet effectively addresses the vanishing gradient problem that can occur during the training of deep networks. By introducing a residual learning mechanism, it allows the network to maintain high accuracy when processing complex and high-dimensional data. Compared to traditional convolutional neural networks, the ResNet architecture can extract feature information from images more deeply and accurately. This results in significant advantages for tasks such as soil type classification and soil health assessment. ResNet’s advanced feature learning capability makes it well-suited for handling complex data in soil research, particularly in the analysis of remote sensing images and the evaluation of soil health [5].
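The residual learning mechanism can be illustrated with a minimal sketch. It assumes a toy block built from two small fully connected layers rather than the convolutional layers used in ResNet, and the function names are illustrative only. The block computes y = ReLU(F(x) + x), so when the learned mapping F contributes nothing, the input still passes through the shortcut unchanged.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(weights, v):
    """Multiply a weight matrix (a list of rows) by the vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(v, w1, w2):
    """y = relu(F(v) + v), where F(v) = W2 * relu(W1 * v)."""
    f = linear(w2, relu(linear(w1, v)))
    return relu([fi + xi for fi, xi in zip(f, v)])


def plain_block(v, w1, w2):
    """The same two layers without the shortcut connection."""
    return relu(linear(w2, relu(linear(w1, v))))


# With all-zero weights the residual block reduces to the identity
# (for non-negative inputs), while the plain block loses the signal.
zeros = [[0.0] * 3 for _ in range(3)]
x = [0.5, 1.0, 2.0]
residual_out = residual_block(x, zeros, zeros)
plain_out = plain_block(x, zeros, zeros)
```

The shortcut is a plain addition, so in this sketch it introduces no additional weights beyond those of the two layers.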
Additionally, ResNet has become extensively utilized in the fields of soil species identification and soil health assessment. By integrating remote sensing image data, ResNet not only aids researchers in accurately identifying soil types and analyzing soil health status but also offers more scientifically sound and precise decision support for soil protection and management. Regarding dataset construction, the ongoing advancements in remote sensing technology are making it increasingly feasible to obtain high-quality soil data. Specifically, remote sensing image data serves as a valuable new resource for large-scale soil monitoring. When combined with ResNet and other deep learning models, it allows for the extraction of meaningful information from vast and complex datasets, significantly enhancing the efficiency and accuracy of soil research.
This paper provides a comprehensive review of ResNet’s application in soil science, focusing on its advancements in soil type classification and soil health assessment. It explores key aspects such as dataset acquisition, preprocessing techniques, and feature extraction performance, highlighting ResNet’s advantages over traditional models. Additionally, the study critically examines challenges and limitations, including data heterogeneity, computational demands, and model interpretability, which may impact its practical implementation. Looking ahead, the paper discusses future research directions, emphasizing multi-source data fusion, model optimization, and real-time monitoring applications. By summarizing these focal points, this review aims to offer a structured perspective on the role of ResNet in soil research and provide valuable insights for advancing deep learning applications in the field.
2. Data Acquisition
The data acquisition technologies used in soil image analysis primarily include soil microscopic observation, remote sensing, and high-definition image acquisition. Soil microscopic observation allows the physical and chemical properties of soil particle structures, as well as microbial activity, to be examined in detail. This provides essential data for assessing soil quality and studying biodiversity. Remote sensing, using the extensive coverage of satellites and unmanned aerial vehicles (UAVs), facilitates the rapid monitoring of parameters such as soil moisture, temperature, and vegetation coverage, thereby offering valuable insights for agricultural production and ecological conservation. Meanwhile, high-definition image acquisition enhances the accuracy of soil surface morphology analysis, allowing for improved capture of fine details such as color and texture. The integration of these technologies has significantly advanced soil science and provides multi-dimensional support for the sustainable management of soil resources.
ResNet’s feature extraction process varies across different soil imaging modalities, leveraging its deep hierarchical structure to capture texture, color, shape, and structural properties. In scanning electron microscopy (SEM) images, ResNet extracts fine-grained surface details, identifying soil particle morphology, porosity, and mineral aggregation through its convolutional layers. For optical microscopy, it processes color variations, geometric shapes, and spectral properties, enabling accurate differentiation of sandy, loamy, and clay soil while also assessing organic matter content. When applied to micro-CT scanner data, ResNet, particularly in 3D adaptations, learns volumetric representations of soil porosity, compaction, and structural connectivity, providing insights into water retention and permeability. By integrating these diverse data sources, ResNet effectively captures multi-scale and multi-modal features, enhancing soil type classification and health assessment with greater accuracy and reliability.
2.1. Collection of Soil Microscopic Images
When applying the ResNet model to soil microscopic images, the data acquisition process is crucial because of the rich detail and high resolution of these images, which can reveal the microstructure and particle composition of the soil. Acquiring such high-quality images necessitates specialized equipment and precise operational protocols to ensure the data’s suitability for training and validating deep learning models, such as ResNet, for tasks like automated soil classification and feature extraction. Typically, the acquisition of microscopic images requires high-precision microscopy equipment capable of magnifying soil details and producing high-resolution images for subsequent model training. Commonly used equipment includes scanning electron microscopes (SEM) [6,7,8], optical microscopes [9,10,11,12], and micro-CT scanners [13,14,15], with a comparative overview provided in Table 1.
The image acquisition process is a critical step in microscopy, aiming to capture clear, accurate, and representative high-resolution images for subsequent analysis [16]. This process involves several essential steps, each of which significantly influences the final image quality. To ensure comprehensive and high-quality data acquisition, techniques such as multi-view acquisition [17], multi-magnification imaging [18], and multi-illumination and contrast adjustments [19] are often employed.
To maintain the clarity and consistency of microscopic images, soil samples undergo a specific preparation process [20], as illustrated in Figure 1. Initially, soil samples are collected and transported to the laboratory, where they are subjected to drying procedures such as air drying, low-temperature drying, or freeze-drying. Subsequently, the samples are fixed using immersion solutions, embedded in resin, and coated with a thin metal layer. The samples are then sliced and polished to achieve the desired texture and structure. Finally, additional treatments, such as staining, freezing, or vacuum processing, may be applied based on experimental requirements.
After completing the aforementioned preparation steps, the soil sample is prepared for microscopic image acquisition. High-quality sample preparation not only enhances the clarity and detail of the microscopic images but also establishes a solid foundation for subsequent feature extraction. During the image acquisition process, it is essential to fine-tune the light source, magnification, and shooting angle of the microscope to ensure that the acquired image accurately reflects the structural characteristics and physical properties of the soil samples. These optimized images provide valuable support for further feature extraction, thereby improving the accuracy and reliability of the analysis.
Feature extraction involves transforming the visual features of soil images into numerical forms that can be processed by computational models. Key methods of feature extraction include color, texture, morphological, and deep learning-based features. Color feature extraction typically relies on color histograms and color space transformations (such as RGB to HSV) to identify soil types [
21]. Texture features, which describe the size and distribution of soil particles, are often extracted using the gray-level co-occurrence matrix (GLCM) and local binary patterns (LBP). Morphological features, related to the shape and arrangement of soil particles, are derived using image dilation and erosion operations. In recent years, deep learning techniques, such as ResNet, have become essential tools for automatic feature extraction. These high-dimensional features offer superior expressive power compared to traditional manually designed features. The integration of these methods provides a solid foundation for soil analysis and classification [
22].
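As an illustration of the hand-crafted color and texture descriptors mentioned above, the following sketch computes a hue histogram after an RGB-to-HSV conversion and a contrast value from a gray-level co-occurrence matrix (GLCM). It uses only the Python standard library. The function names, the number of hue bins, and the tiny 4 x 4 test patch are assumptions made for illustration and are not taken from the reviewed studies.

```python
import colorsys


def hue_histogram(pixels, bins=6):
    """Normalised hue histogram for a list of (r, g, b) tuples in 0..255."""
    hist = [0] * bins
    for r, g, b in pixels:
        h, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hist[min(int(h * bins), bins - 1)] += 1
    return [count / len(pixels) for count in hist]


def glcm(image, levels, dx=1, dy=0):
    """Normalised co-occurrence matrix of gray levels for the offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                counts[image[y][x]][image[ny][nx]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def glcm_contrast(p):
    """GLCM contrast: co-occurrence probabilities weighted by (i - j)^2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# Hypothetical 4-level gray patch and three brownish soil pixels.
patch = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
contrast = glcm_contrast(glcm(patch, levels=4))
hues = hue_histogram([(150, 100, 50)] * 3)
```

A coarse texture with many abrupt gray-level transitions gives a larger contrast value than a smooth one, which is why such descriptors have been used to characterize the size and distribution of soil particles.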
2.2. Collection of Soil Remote Sensing Images
The acquisition of soil remote sensing images is a vital method for studying soil properties, monitoring changes, and assessing soil health. Remote sensing technology enables researchers to gather extensive information about soil, which is crucial for agricultural management, environmental protection, and land use planning. The primary acquisition technologies include satellite remote sensing [23] and UAV remote sensing [24]. Satellite remote sensing, such as Landsat [25] and Sentinel-2 [26], offers wide-area coverage and provides multispectral or hyperspectral images, making it suitable for large-scale soil monitoring. In contrast, UAV remote sensing captures high-resolution images over smaller areas, making it ideal for detailed soil analysis and local monitoring.
The process of satellite remote sensing acquisition [
27] begins with target determination and demand analysis, followed by the definition of the research area and the selection of appropriate satellites and sensors. The next step involves planning the optimal acquisition time to avoid meteorological interference, collecting data via satellite sensors, and transmitting the raw data to ground stations for storage and preliminary quality assessment. Subsequently, radiometric, atmospheric, and geometric corrections are applied to enhance image quality. After preprocessing, the remote sensing images are ready for analysis, including the extraction of soil characteristics and land change monitoring. The final results are typically presented in reports or visualizations to support decision-making and resource management. This comprehensive process ensures the accuracy and effectiveness of remote sensing data, contributing to more scientific and efficient environmental monitoring and resource management. The entire procedure is illustrated in
Figure 2.
In the acquisition process, selecting the appropriate sensor and optimal shooting time is crucial. Depending on the specific requirements, sensors such as optical, infrared, or LiDAR [
28] can be chosen to capture soil information across different spectral bands. The selection of the shooting time should take into account seasonal variations and meteorological conditions to ensure the accuracy and consistency of the data. Following data acquisition, several preprocessing steps are generally required, including radiometric correction, atmospheric correction, and geometric correction, all of which serve to enhance image quality. After these corrections, researchers can extract important soil characteristics such as spectral features and texture information, providing a strong foundation for subsequent analyses.
During the preprocessing stage of satellite remote sensing data, radiometric correction [29], atmospheric correction [30], and geometric correction [31] are critical steps. Radiometric correction aims to eliminate sensor errors, convert the received digital signals into physical radiation, and ensure the consistency of images obtained from different sensors and at different times. Atmospheric correction accounts for the effects of gases and aerosols in the atmosphere, removing scattering and absorption effects to enhance the accuracy of surface information. Geometric correction addresses positional discrepancies in the image, aligning it with the geographic coordinate system and compensating for geometric distortions caused by satellite orbits and the Earth’s curvature. These three steps collectively ensure that remote sensing images are suitable for high-precision analysis.
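The radiometric step can be sketched as a linear conversion from digital numbers (DN) to radiance, followed by the widely used conversion to top-of-atmosphere reflectance. The gain, offset, solar irradiance, and sun elevation values below are hypothetical placeholders; in practice they come from the metadata of the specific sensor. Dark-object subtraction is included only as a deliberately crude stand-in for atmospheric correction and is not the method of any study cited here.

```python
import math


def dn_to_radiance(dn, gain, offset):
    """Linear sensor calibration: radiance = gain * DN + offset."""
    return gain * dn + offset


def radiance_to_toa_reflectance(radiance, esun, sun_elevation_deg,
                                earth_sun_dist_au=1.0):
    """Top-of-atmosphere reflectance: pi * L * d^2 / (ESUN * cos(zenith))."""
    zenith = math.radians(90.0 - sun_elevation_deg)
    return (math.pi * radiance * earth_sun_dist_au ** 2
            / (esun * math.cos(zenith)))


def dark_object_subtraction(band):
    """Crude haze removal: subtract the darkest value found in the band."""
    dark = min(min(row) for row in band)
    return [[value - dark for value in row] for row in band]


# Hypothetical calibration constants, for illustration only.
radiance = dn_to_radiance(dn=100, gain=0.5, offset=1.0)
reflectance = radiance_to_toa_reflectance(radiance, esun=1500.0,
                                          sun_elevation_deg=60.0)
corrected = dark_object_subtraction([[5, 7], [6, 9]])
```

Geometric correction is not shown because it depends on ground control points and a map projection, which require a geospatial library rather than a few lines of arithmetic.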
UAV remote sensing technology offers an efficient and comprehensive solution for soil image acquisition, covering multiple dimensions such as multispectral/hyperspectral data, RGB images, thermal infrared data, digital elevation models, and time series data. The multispectral or hyperspectral sensors mounted on UAVs can capture the physical and chemical properties of soil based on reflected light from different spectral bands. For example, visible and near-infrared bands can be used to analyze soil types, while shortwave infrared bands are valuable for assessing soil health. Additionally, high-resolution images obtained from standard RGB cameras provide clear details of the soil surface, including color, texture, and cracks, which are essential for soil classification and health assessment.
The process of UAV remote sensing soil image acquisition involves several key steps, including mission planning, UAV preparation, sensor calibration, flight execution, data acquisition, image preprocessing, and data analysis. Initially, the target area and flight route are determined during the mission planning phase. Subsequently, UAV equipment is prepared, and sensors are calibrated to ensure the accuracy of the data. The UAV then follows the designated flight route, collecting various soil data, including multispectral, thermal infrared, and RGB images. The acquired images undergo preprocessing, which includes noise removal and contrast enhancement. Finally, the soil characteristics are extracted through analysis for purposes such as classification, humidity evaluation, and other related research.
In comparison to UAV remote sensing, satellite remote sensing is more suited for large-scale and macro-level soil monitoring. It offers advantages such as global coverage and long-term observation, making it useful for applications like drought monitoring, land degradation assessment, and regional soil dynamic change analysis. However, satellite remote sensing typically provides lower resolution and has limited flexibility. On the other hand, UAV remote sensing excels in high resolution and operational flexibility. It enables fine-scale monitoring of local soil conditions, capturing details such as cracks and particle distribution, and allows for flexible adjustment of flight time and coverage area to meet specific needs. These two methods are highly complementary. While satellite remote sensing provides macro-level background and trend analysis, UAV remote sensing offers in-depth, detailed observations. The combination of both techniques allows for comprehensive soil monitoring, spanning from macro-level trends to micro-level details. A comparison between satellite remote sensing and UAV remote sensing in soil image acquisition is presented in
Table 2.
2.3. Acquisition of Soil HD Images
The acquisition of high-definition soil images is primarily carried out through photographic techniques aimed at capturing the fine details and characteristics of the soil surface. Common methods for acquiring these images include handheld camera shooting, UAV aerial photography, and the use of specialized professional photography equipment. These technologies provide valuable visual information that aids researchers in analyzing soil conditions, identifying soil types, and monitoring environmental changes.
In the image acquisition process, selecting the appropriate equipment and methods is crucial, which depends on the target area and specific objectives. Handheld cameras are well-suited for focused observations in small areas, enabling the capture of detailed soil surface features such as texture, color, and moisture variations. These cameras require stable lighting and appropriate shooting angles, making them ideal for fine-scale analysis of flat terrain and localized regions. In contrast, UAV aerial photography is more appropriate for large-scale soil monitoring. UAVs can cover broader geographical areas and capture high-resolution images, offering the advantage of quickly obtaining extensive soil data. This is particularly valuable in complex terrains or areas that are difficult to access, providing an efficient means of monitoring. Additionally, selecting the optimal shooting time is essential. Factors such as lighting conditions, weather changes, and soil moisture levels must be considered to ensure clear and usable images. The process of acquiring high-definition soil images is illustrated in
Figure 3.
After acquisition, high-definition soil images typically undergo post-processing procedures, such as color correction, denoising, and contrast enhancement, to enhance their visual quality. These processed images are invaluable for analyzing soil characteristics, monitoring plant growth, and assessing soil health. Not only do they provide crucial data support for soil research, but they also offer a scientific foundation for decision-making in agricultural management and environmental protection. With advancements in imaging technologies and the continuous development of image processing software, the collection and application of high-definition soil images will become increasingly efficient and precise.
Color correction aims to address distortions caused by variations in light sources or equipment discrepancies, ensuring that soil features are represented in their true colors. Common techniques for color correction include white balance adjustment, color space conversion, and the use of standard reference materials. These methods improve the authenticity and usability of the image, ensuring that the representation of soil characteristics is as accurate as possible.
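One simple form of white balance adjustment is the gray-world heuristic, sketched below. It assumes that the average color of the scene should be neutral gray and rescales each channel accordingly. This is an illustrative choice rather than a method prescribed by the studies reviewed here, and the assumption can fail for images dominated by strongly colored soil, which is one reason standard reference materials are also used.

```python
def gray_world_balance(pixels):
    """Rescale each RGB channel so that all channel means become equal.

    `pixels` is a list of (r, g, b) tuples with values in 0..255.
    """
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3
    # A channel with zero mean is left untouched to avoid division by zero.
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [tuple(min(255, round(p[c] * gains[c])) for c in range(3))
            for p in pixels]


# Two pixels with a reddish cast become neutral after balancing.
balanced = gray_world_balance([(200, 100, 100), (100, 50, 50)])
```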
Denoising is crucial to eliminate noise introduced during image acquisition [38], thereby enhancing the clarity and readability of the image. Standard denoising techniques include spatial filtering, frequency domain filtering, and adaptive denoising. These methods effectively reduce random noise while preserving essential edges and details, making soil features more prominent and easier to analyze.
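As an example of spatial filtering, the sketch below applies a median filter to a grayscale image stored as a list of rows. The median filter is chosen here for illustration because it removes isolated outlier pixels while blurring edges less than a mean filter would. The border handling (shrinking the window at the image edge) is an assumption of this sketch.

```python
from statistics import median


def median_filter(image, radius=1):
    """Replace each pixel by the median of its (2 * radius + 1)^2 window.

    Windows are clipped at the image border, so edge pixels use fewer
    neighbours. For an even-sized window, `median` averages the two
    middle values and may therefore return a float.
    """
    rows, cols = len(image), len(image[0])
    result = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(rows, y + radius + 1))
                      for i in range(max(0, x - radius), min(cols, x + radius + 1))]
            out_row.append(median(window))
        result.append(out_row)
    return result


# A single bright outlier in a uniform patch is removed.
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
denoised = median_filter(noisy)
```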
Contrast enhancement improves the visualization of different elements in the image, facilitating the differentiation between soil and other components. Techniques such as histogram equalization [39], local contrast enhancement [40], and gamma correction [41] are employed to heighten the visual impact of the image and accentuate key features. These post-processing steps ensure that high-definition soil images are of the highest quality, providing reliable and accurate data for subsequent analysis and research.
Contrast enhancement techniques such as histogram equalization, local contrast enhancement, and gamma correction play a crucial role in improving soil image quality for analysis. Histogram equalization (HE) redistributes pixel intensity values to enhance contrast, making fine soil textures and variations more distinguishable. Local contrast enhancement, including methods like adaptive histogram equalization (AHE) and contrast-limited adaptive histogram equalization (CLAHE), adjusts contrast in smaller image regions, effectively handling uneven lighting and improving microstructure visibility. Gamma correction modifies image brightness non-linearly, preserving subtle variations in color and texture to ensure accurate soil composition analysis. By applying these techniques, soil images achieve better clarity, facilitating more precise classification, health assessments, and environmental monitoring.
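Two of these techniques, global histogram equalization and gamma correction, can be sketched in a few lines for an 8-bit grayscale image stored as a list of rows. The function names are illustrative. CLAHE is not shown because it additionally requires tiling, clipping of the histogram, and interpolation between tiles.

```python
def equalize_histogram(image, levels=256):
    """Global histogram equalization based on the cumulative distribution."""
    flat = [v for row in image for v in row]
    hist = [0] * levels
    for v in flat:
        hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in image]
    # Lookup table; entries below the darkest used level are never read.
    lut = [round((c - cdf_min) / (n - cdf_min) * (levels - 1)) for c in cdf]
    return [[lut[v] for v in row] for row in image]


def gamma_correct(image, gamma, levels=256):
    """Non-linear brightness mapping: out = top * (in / top) ** gamma."""
    top = levels - 1
    return [[round(top * (v / top) ** gamma) for v in row] for row in image]


# A low-contrast patch is stretched to the full 0..255 range.
equalized = equalize_histogram([[100, 101], [102, 103]])
# gamma < 1 brightens mid-tones and leaves black and white fixed.
brightened = gamma_correct([[0, 64, 255]], gamma=0.5)
```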
3. Learning Model
The ResNet model (Residual Network) demonstrates exceptional performance in soil image analysis, particularly when compared to other deep learning models such as VGG, GoogLeNet, PReLU Net, and plain networks. The depth and stability of ResNet provide significant advantages in a variety of critical tasks. A longitudinal comparison of VGG, GoogLeNet, PReLU Net, and plain models is shown in
Table 3. This table provides both horizontal comparisons of the error (ERR) at iteration 1 (epoch 1) and iteration 5 (epoch 5), as well as vertical comparisons among the different models. ResNet-50, ResNet-101, and ResNet-152 are variants of the ResNet architecture, with the numbers indicating the number of layers in each network.
To quantitatively evaluate the performance of different deep learning models, training error metrics were recorded at specific training cycles, focusing on epochs 1 and 5, to assess both the initial learning efficiency and final accuracy of each model. The training error, representing the deviation between predicted and actual values, served as the primary metric to evaluate model performance. A rapid decrease in training error between epochs indicates efficient learning, where the model effectively optimizes parameters and extracts meaningful features. Conversely, models with slower convergence require more iterations to achieve comparable accuracy, potentially indicating limitations in feature extraction. Comparing error reduction trends across different architectures further reveals the impact of network depth and optimization strategies. Deeper models, such as ResNet, leverage residual connections to enhance gradient flow and improve long-term learning stability, leading to progressive error reduction and superior accuracy. Meanwhile, shallower architectures like VGG-16 and GoogLeNet often exhibit faster initial learning, but their final performance may be constrained by limited depth and feature extraction capabilities. This quantification approach highlights the advantages of deeper architectures in achieving lower final errors, better convergence, and overall improved accuracy.
From the comparison in Table 3, the following conclusions can be drawn:
- (a) The error in the first training cycle (epoch 1) for VGG-16 and GoogLeNet is relatively high (VGG-16 at 28.07 and GoogLeNet at 24.27). However, by the fifth training cycle (epoch 5), both models show significant error reduction, with GoogLeNet achieving a lower error (7.38) compared to VGG-16 (9.33). This suggests that GoogLeNet converges faster during the learning process.
- (b) PReLU Net also demonstrated good performance, with an initial error of 9.15 in the first training cycle, and a decrease to 7.38 in the fifth cycle, indicating a stable improvement in performance over time.
- (c) In the ResNet series, performance varies with the increase in network depth. The errors for ResNet-34 A and ResNet-34 B at the fifth training cycle are relatively low (7.40 and 7.46, respectively), suggesting good convergence for the ResNet-34 series. As the network depth increases further, the error for ResNet-50 decreases to 6.71, for ResNet-101 it is 6.05, and for ResNet-152, the error is 5.71, indicating progressively better performance. Notably, ResNet-152 outperforms the other models, achieving the lowest error of 5.71 in the fifth cycle. While VGG-16 and GoogLeNet perform well in early training stages, their final error rates are comparatively higher. In contrast, the ResNet series, particularly the deeper models, shows a clear advantage in enhancing accuracy.
The comparison of different depth models is shown in Table 4.
In the task of soil image recognition, the training performance of ResNet was compared with its plain network counterpart, which lacks residual connections. Notably, the introduction of residual connections in ResNet does not increase the number of parameters, thereby preserving the relative simplicity of the network architecture.
Figure 4 illustrates the performance of ResNet and plain networks at different depths (18 and 34 layers).
Figure 4a illustrates the training loss and validation accuracy of plain networks with 18 and 34 layers. The blue solid line represents the training loss of Plain-18, the red solid line represents the validation accuracy of Plain-18, the purple dotted line represents the validation accuracy of Plain-34, and the green dotted line represents the training loss of Plain-34, with the x-axis denoting the number of epochs. As the network depth increases from 18 to 34 layers, both the training error and the validation error rise. This suggests that deeper plain networks are susceptible to challenges such as vanishing gradients, leading to performance degradation.
Figure 4b presents the training loss and validation accuracy of ResNet with 18 and 34 layers. The blue solid line represents the training loss of ResNet-18, the red solid line represents the validation accuracy of ResNet-18, the purple dotted line represents the validation accuracy of ResNet-34, and the green dotted line represents the training loss of ResNet-34, with the x-axis representing the number of epochs. The inclusion of residual connections in ResNet ensures low training and validation errors even at a depth of 34 layers, demonstrating the stability and effectiveness of residual connections in deep network architectures. Comparatively, ResNet networks exhibit lower training and validation errors than their plain network counterparts at the same depth, underscoring their superior feature learning and generalization capabilities in soil image recognition tasks.
Furthermore, compared to VGG-16, GoogLeNet, and PReLU Net, ResNet addresses the vanishing gradient problem effectively through the use of residual connections [41]. This enhancement allows ResNet to achieve greater feature learning and generalization capabilities in deeper structures without increasing the number of parameters. By contrast, VGG-16 employs a relatively simple structure but requires a large number of parameters. GoogLeNet improves multi-scale feature extraction through its inception modules, while PReLU Net enhances non-linear expression via its adaptive activation functions.
Figure 5 illustrates the training performance comparison among ResNet, VGG-16, GoogLeNet, and PReLU Net.
Figure 5a illustrates the training loss of the ResNet network compared to VGG-16, GoogLeNet, and PReLU Net. The blue solid line represents the training loss of ResNet, the green dotted line corresponds to VGG-16, the yellow dotted line represents GoogLeNet, and the purple dotted line indicates PReLU Net. The x-axis denotes the number of epochs.
Figure 5b compares the accuracy of these models, with the colors corresponding to those used in Figure 5a.
The comparison reveals that ResNet, owing to its residual learning mechanism, effectively mitigates the vanishing gradient problem. It consistently achieves the lowest training loss and converges rapidly, reaching an accuracy of over 97%. In contrast, while VGG-16 demonstrates strong performance across various tasks, its high parameter complexity and depth result in a higher training loss, and its accuracy is typically around 95%. GoogLeNet, leveraging its inception modules, trains stably and generally reaches an accuracy of around 93%, which is lower than that of VGG-16. PReLU Net, with its learnable activation function, exhibits favorable convergence speed and accuracy, and its training loss sometimes falls below that of VGG-16, which makes it suitable for specific tasks. Among these models, ResNet stands out for both the lowest training loss and the highest accuracy.
When compared to traditional machine learning models, the ResNet deep learning model differs mainly in its feature extraction capabilities and data requirements. ResNet utilizes a deep convolutional neural network to automatically learn complex features, necessitating a substantial amount of labeled data for training to achieve high accuracy and generalization. Traditional machine learning models, such as decision trees and support vector machines, rely on manual feature extraction and can perform effectively with smaller datasets. They feature simpler structures, are easier to implement and debug, and often offer better interpretability. However, their performance may be limited in handling complex tasks. Deep learning models, while achieving higher accuracy in such tasks, are often considered “black boxes” due to the difficulty in intuitively understanding their decision-making processes.
ResNet demonstrates remarkable stability across varying data volumes, making it a robust deep learning model for diverse applications. Its residual learning framework mitigates the vanishing gradient problem, allowing for effective training even with limited data, where traditional deep networks often struggle with overfitting or degraded performance. Unlike architectures such as VGG, which tend to be highly sensitive to data volume, ResNet maintains strong generalization by efficiently propagating gradients and leveraging hierarchical feature extraction. As dataset size increases, ResNet scales effectively, with deeper versions like ResNet-50, ResNet-101, and ResNet-152 showing continuous improvements in accuracy while avoiding significant performance fluctuations. Additionally, batch normalization further enhances stability by normalizing feature distributions and ensuring consistent convergence across different dataset sizes. While the model benefits from larger data volumes, diminishing returns may occur beyond a certain point, emphasizing the importance of data augmentation and preprocessing.
Table 5 provides a comparative overview of ResNet deep learning models and traditional machine learning models.
4. Data Evaluation
The evaluation of soil images is critical in advancing soil research. In recent years, the adoption of deep learning models, particularly Residual Neural Networks (ResNet) and other Convolutional Neural Networks (CNNs), has become increasingly prevalent in tasks such as soil image classification, feature extraction, and quantitative analysis. The implementation of these models has substantially enhanced the automation and accuracy of soil type classification and feature extraction, addressing key challenges in traditional soil analysis.
In soil image analysis, robust data evaluation methodologies play a pivotal role. Common practices include dataset augmentation, advanced model evaluation techniques, and the application of standardized evaluation metrics. Dataset augmentation involves techniques such as rotation, scaling, and flipping to artificially expand the training dataset, mitigating overfitting and improving model generalization. Model evaluation technologies assess the performance of deep learning models through methods such as cross-validation, confusion matrix analysis, and statistical significance testing, ensuring robustness and reliability. Commonly used evaluation indicators include accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC), which collectively provide a comprehensive understanding of model performance in soil image analysis.
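The evaluation indicators listed above follow directly from the confusion matrix. The sketch below computes overall accuracy and per-class precision, recall, and F1-score for a three-class problem. The class labels and the predictions are invented for illustration and do not come from any dataset discussed in this review.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(map(sum, matrix))


def per_class_metrics(matrix, k):
    """Precision, recall, and F1-score for the class with index k."""
    tp = matrix[k][k]
    fp = sum(row[k] for row in matrix) - tp
    fn = sum(matrix[k]) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels and predictions for six soil images.
labels = ["sand", "loam", "clay"]
y_true = ["sand", "sand", "loam", "loam", "clay", "clay"]
y_pred = ["sand", "loam", "loam", "loam", "clay", "sand"]
cm = confusion_matrix(y_true, y_pred, labels)
overall = accuracy(cm)
loam_precision, loam_recall, loam_f1 = per_class_metrics(cm, labels.index("loam"))
```

Reporting the per-class values alongside accuracy is useful when soil classes are imbalanced, since a high overall accuracy can hide poor recall on a rare class.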
Integrating diverse soil information such as moisture, temperature, and nutrient content with ResNet presents challenges due to data heterogeneity, varying spatial resolutions, and inconsistencies across different sources. Differences in data formats, which range from soil images to sensor readings and spectral data, require pre-processing to ensure compatibility. Normalization and standardization bring numerical values onto comparable scales, while spatial interpolation and data resampling align multi-resolution datasets. To address missing or inconsistent data, interpolation techniques such as kriging [68] for geospatial data and deep learning-based inpainting for images can be applied. Additionally, dimensionality reduction methods like Principal Component Analysis (PCA) [69] can extract essential features from high-dimensional data. For effective integration, multi-branch CNN architectures can process each modality separately and fuse the resulting features at deeper layers, so that distinct data characteristics are preserved. With these pre-processing and fusion techniques, ResNet-based models can handle multi-source soil data more effectively, supporting soil type classification and health assessment in precision agriculture and environmental monitoring.
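The standardization and feature-fusion steps described above can be sketched in a few lines. This is a minimal illustration only: the sensor readings and image embeddings are hypothetical values, the function names `standardize` and `fuse` are introduced here for illustration, and fusion is reduced to simple concatenation rather than a trained multi-branch network.

```python
from statistics import mean, pstdev

def standardize(column):
    """Z-score a list of numbers: subtract the mean, divide by the standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant column carries no information
    return [(v - mu) / sigma for v in column]

def fuse(image_features, sensor_features):
    """Fusion by concatenation: one combined feature vector per sample."""
    return [img + sen for img, sen in zip(image_features, sensor_features)]

# Hypothetical sensor readings for three samples: moisture (%) and temperature (deg C)
moisture = [12.0, 18.0, 24.0]
temperature = [15.0, 20.0, 25.0]
sensor = [list(row) for row in zip(standardize(moisture), standardize(temperature))]

# Hypothetical image-branch embeddings (for example, the output of a ResNet backbone)
image = [[0.1, 0.9], [0.4, 0.6], [0.8, 0.2]]
fused = fuse(image, sensor)  # three samples, four features each
```

Standardizing each sensor channel before concatenation keeps a channel with large raw values, such as moisture in percent, from dominating the fused vector.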
4.1. Dataset Enhancements
The construction of soil image datasets forms the foundation for deep learning tasks in soil analysis. Typical soil image datasets consist of diverse soil samples captured using microscopes, remote sensing technologies, or high-definition photography. These datasets encapsulate key attributes such as color, shape, texture, and other essential characteristics of soil particles. Commonly represented soil types include sandy soil, loam, and clay. The high resolution of these datasets allows for detailed visualization of the microscopic features of soil particles.
Among the various attributes analyzed in past studies, certain key features have proven particularly useful for determining soil particle size distribution (PSD). The most widely used attributes include particle shape descriptors (e.g., circularity, aspect ratio, and roundness), texture features (e.g., surface roughness and grain arrangement), and color-based properties (e.g., hue and saturation, which may indicate mineral composition). Additionally, fractal dimension analysis and Fourier descriptors have been employed to capture finer details of particle boundaries. While numerous variables have been explored in PSD studies, emphasizing the most reliable and commonly used attributes ensures more accurate and efficient soil classification and analysis [70,71].
One of the challenges in creating soil image datasets is the prevalence of class imbalance, where certain soil types are overrepresented compared to others. To address this issue, data augmentation techniques are often employed to balance class distributions and enrich the dataset. Traditional augmentation methods include operations such as rotation, translation, scaling, flipping, cropping, grayscale adjustment, adding noise, blurring, and radiometric transformations. These techniques are effective in increasing dataset diversity without introducing significant computational complexity.
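As a rough sketch of how traditional augmentation can also rebalance classes, the example below applies flips and a 90-degree rotation to images stored as nested lists and oversamples minority classes until every class matches the largest one. The function names, the tiny 2×2 "images", and the class labels are all hypothetical; a real pipeline would operate on image arrays and use a wider set of transformations.

```python
import random

def flip_horizontal(img):
    return [row[::-1] for row in img]

def flip_vertical(img):
    return img[::-1]

def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def oversample(dataset, seed=0):
    """Add augmented copies of minority-class images until all classes are equal in size."""
    rng = random.Random(seed)
    ops = [flip_horizontal, flip_vertical, rotate90]
    by_class = {}
    for img, label in dataset:
        by_class.setdefault(label, []).append(img)
    target = max(len(imgs) for imgs in by_class.values())
    balanced = list(dataset)
    for label, imgs in by_class.items():
        for _ in range(target - len(imgs)):
            op = rng.choice(ops)  # pick one transformation at random
            balanced.append((op(rng.choice(imgs)), label))
    return balanced

# Hypothetical imbalanced dataset: three clay images, one sand image
img = [[1, 2], [3, 4]]
dataset = [(img, "clay"), (img, "clay"), (img, "clay"), (img, "sand")]
balanced = oversample(dataset)
```

Augmented copies should be added to the training set only; augmenting before the train/test split would leak near-duplicates into the test set.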
In addition to traditional approaches, non-traditional data augmentation methods have gained prominence. These include active learning-based enhancement and model-based generation techniques. Among model-based methods, Generative Adversarial Networks (GANs) [67] are widely recognized. GANs consist of two components: a generator and a discriminator. The generator produces synthetic data, while the discriminator evaluates the authenticity of the data. Through an adversarial training process, both components iteratively optimize, resulting in synthetic data that closely resembles real data. GANs have demonstrated significant utility in image generation, style transfer, and data augmentation; however, they may encounter challenges such as mode collapse and training instability.
Variational Autoencoders (VAEs) [72] represent another advanced data generation method. Unlike traditional autoencoders, VAEs introduce probabilistic modeling in the latent space, which enhances the diversity and continuity of generated data. VAEs comprise two components: an encoder and a decoder. The encoder maps input data to a probability distribution in the latent space, while the decoder samples from this distribution to reconstruct the data. This probabilistic framework allows VAEs to excel in applications such as image generation, dimensionality reduction, and anomaly detection.
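For reference, the training objectives behind the two generative approaches described above are commonly written as follows. The notation is the standard one from the general literature rather than from this review: G and D denote the generator and discriminator, q_φ(z|x) the encoder, p_θ(x|z) the decoder, and p(z) the latent prior.

```latex
% GAN: the discriminator D maximizes, and the generator G minimizes, the same value function
\min_{G}\max_{D}\; V(D,G)
  = \mathbb{E}_{x\sim p_{\mathrm{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z\sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]

% VAE: evidence lower bound (ELBO), maximized with respect to \theta and \phi
\mathcal{L}(\theta,\phi;x)
  = \mathbb{E}_{q_{\phi}(z\mid x)}\big[\log p_{\theta}(x\mid z)\big]
  - D_{\mathrm{KL}}\big(q_{\phi}(z\mid x)\,\|\,p(z)\big)
```

The first ELBO term rewards accurate reconstruction, while the KL term keeps the encoder distribution close to the prior, which is what gives the latent space the continuity mentioned above.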
Figure 6 illustrates the primary methods employed for data augmentation, highlighting both traditional and non-traditional approaches.
4.2. Data Evaluation Method
To accurately assess the performance of deep learning models, commonly used evaluation strategies include splitting datasets into training and test sets, typically using an 8:2 or 7:3 ratio. Because a single split can give an optimistic or pessimistic estimate and may hide overfitting, cross-validation techniques such as k-fold cross-validation are frequently employed. The k-fold cross-validation method [73,74,75,76,77,78] involves partitioning the dataset into k subsets of (approximately) equal size. Each subset is used in turn as the validation set, while the remaining k−1 subsets serve as the training set, so that every subset is used exactly once for validation. The final evaluation metric is obtained by averaging the performance metrics across all k iterations. A typical choice for k is 10, as it provides a reasonable balance between computational cost and the reliability of the estimate. By testing the model on various combinations of training and validation sets, this method effectively evaluates the model’s generalization capability [79].
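The partitioning scheme described above can be sketched without any machine learning library. In this minimal example, `k_fold_indices` and `cross_validate` are hypothetical helper names, and `score_fn` stands in for whatever training-and-scoring routine is actually used.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is used for validation exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one when n_samples is not divisible by k
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

def cross_validate(n_samples, k, score_fn):
    """Average a user-supplied score over the k train/validation splits."""
    scores = [score_fn(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)

# Example: 10 samples, 5 folds; the placeholder score is just the validation fraction
splits = list(k_fold_indices(10, 5))
mean_score = cross_validate(10, 5, lambda train, val: len(val) / 10)
```

For imbalanced soil datasets, a stratified variant that shuffles and partitions each class separately would keep class proportions similar across folds.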
Hyperparameter optimization, such as tuning the learning rate, network depth, and batch size, is another critical strategy for enhancing model performance [80]. These optimizations can significantly affect convergence speed, accuracy, and overall model robustness.
For soil image datasets, alternative methods such as the holdout method and bootstrap method are also valuable. The holdout method plays a pivotal role in modeling and evaluating soil datasets, especially for image data. This approach involves splitting the dataset into three mutually exclusive parts: training, validation, and test sets. For soil image datasets, the training set is utilized to learn soil features (e.g., particle structure and color distribution), the validation set is used for hyperparameter tuning, and the test set evaluates the model’s generalization performance on unseen data. A common allocation is 60–80% for the training set, and 10–20% each for the validation and test sets.
This method is particularly suitable for handling large-scale soil image datasets, such as those derived from remote sensing or microscopic imaging. Its simplicity and computational efficiency make it ideal for preliminary model evaluations. However, class imbalance issues often observed in soil image datasets—where certain soil types are underrepresented—can compromise the representativeness of the training or test sets. This, in turn, may reduce the reliability of the evaluation outcomes. Addressing this challenge often requires stratified sampling or synthetic data augmentation techniques.
The complete workflow of the holdout method for soil image data processing is depicted in Figure 7, which illustrates the division of datasets and the corresponding steps in model training, validation, and evaluation.
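A stratified version of the three-way holdout split, which addresses the class imbalance concern raised above, might look like the following sketch. The 70/15/15 allocation, the function name `stratified_holdout`, and the labels are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

def stratified_holdout(labels, train=0.7, val=0.15, seed=0):
    """Split sample indices into train/val/test while preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    split = {"train": [], "val": [], "test": []}
    for idxs in by_class.values():
        rng.shuffle(idxs)  # shuffle within each class before cutting
        n_train = round(len(idxs) * train)
        n_val = round(len(idxs) * val)
        split["train"] += idxs[:n_train]
        split["val"] += idxs[n_train:n_train + n_val]
        split["test"] += idxs[n_train + n_val:]  # remainder goes to the test set
    return split

# Hypothetical labels: 20 images each of clay, loam, and sand
labels = ["clay"] * 20 + ["loam"] * 20 + ["sand"] * 20
parts = stratified_holdout(labels)
```

Because the split is performed per class, a rare soil type with only a handful of images still appears in all three subsets as long as it has enough samples to divide.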
4.3. Evaluation Index
The bootstrap method, a resampling technique, is widely applied in model evaluation and statistical analysis. It draws samples at random, with replacement, from the original dataset to generate multiple bootstrap sample sets, each typically the same size as the original dataset. Because sampling is done with replacement, certain data points may be selected multiple times, while others may not be included at all [81]. This characteristic lets the bootstrap method evaluate a model on many different resampled versions of the data.
The primary advantage of the bootstrap method is its capacity to maximize the utility of limited samples, particularly when the dataset is small or acquiring additional data is challenging. By repeatedly resampling, the method facilitates training and validating models on various sample combinations, yielding a more comprehensive assessment of model performance. Furthermore, because the evaluation is repeated over many resamples, the resulting estimate depends less on any single split, which makes overfitting easier to detect and enhances the reliability of the results.
Beyond model evaluation, the bootstrap method is also employed to estimate the distribution of statistical metrics and construct confidence intervals. By generating a large number of bootstrap replicates, it becomes possible to quantify the uncertainty associated with model predictions or statistical estimates. This flexibility and versatility make the bootstrap method a valuable tool in machine learning and statistical applications.
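As an illustration of the confidence-interval use of the bootstrap, the sketch below computes a percentile interval for classification accuracy. The per-image results are hypothetical, `bootstrap_ci` is a name introduced here, and the percentile method is only one of several ways to build a bootstrap interval.

```python
import random

def bootstrap_ci(values, stat, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(values)."""
    rng = random.Random(seed)
    n = len(values)
    estimates = []
    for _ in range(n_boot):
        # Draw n items WITH replacement: some repeat, others are left out
        sample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(stat(sample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def accuracy(outcomes):
    return sum(outcomes) / len(outcomes)

# Hypothetical per-image outcomes on 50 test images (1 = correct, 0 = wrong)
correct = [1] * 42 + [0] * 8
low, high = bootstrap_ci(correct, accuracy)
```

With only 50 test images the interval around the observed accuracy of 0.84 is wide, which is exactly the kind of uncertainty a single accuracy figure hides.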
Table 6 lists common model performance indicators in machine learning, each of which plays a role in evaluating ResNet-based soil classification. Accuracy, while intuitive, is less reliable for imbalanced soil datasets, where ResNet may classify dominant soil types well but struggle with rare categories. Precision measures how many of the samples assigned to a given soil type, such as saline-alkali soil, truly belong to it; high precision means few false positives, but it says nothing about samples of that type that were missed. Recall measures how many samples of a critical soil type, such as cultivated land, are actually detected, though pushing recall higher may introduce false positives. To balance these trade-offs, the F1-score, the harmonic mean of precision and recall, is particularly useful for imbalanced classification. The AUC-ROC is defined most naturally for binary classification, such as distinguishing cultivated from non-cultivated land; multi-class problems require extensions such as one-vs-rest averaging, which adds complexity. The confusion matrix helps analyze misclassification patterns, showing which soil types ResNet confuses and guiding model optimization. A combination of F1-score, AUC-ROC, and the confusion matrix therefore gives a more comprehensive evaluation of soil classification across diverse datasets than any single metric. The quantitative analysis indicators for soil identification based on ResNet are shown in Figure 8.
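To make the relationship between these indicators concrete, the following sketch derives accuracy and per-class precision, recall, and F1 from a confusion matrix. The matrix values are hypothetical and describe an imbalanced two-class test set; the helper names are illustrative.

```python
def per_class_metrics(cm, labels):
    """Precision, recall and F1 per class.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(labels)
    report = {}
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, but another class
        fn = sum(cm[k]) - tp                       # class k, but predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[labels[k]] = {"precision": precision, "recall": recall, "f1": f1}
    return report

def accuracy(cm):
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

# Hypothetical imbalanced test set: 90 loam images, 10 saline-alkali images
labels = ["loam", "saline-alkali"]
cm = [[88, 2],
      [6, 4]]
report = per_class_metrics(cm, labels)
overall = accuracy(cm)
```

Here the overall accuracy is 0.92, yet the recall for the minority saline-alkali class is only 0.4 and its F1-score 0.5, which illustrates why accuracy alone can be misleading on imbalanced soil datasets.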
5. Application Scenarios
The application of ResNet (Residual Network) in soil science has emerged as a pivotal tool for addressing challenges related to soil monitoring, analysis, and prediction. Leveraging its deep network architecture and residual learning mechanism, ResNet demonstrates exceptional capability in processing complex soil data. Its applications span diverse domains, including soil classification, soil moisture monitoring, pollution detection, and soil quality assessment.
Through deep learning, ResNet can autonomously extract features from soil images and sensor data, enabling the integration and effective fusion of multi-source datasets. This capability enhances the precision and reliability of predictions and analytical outcomes. The adaptability and robustness of ResNet make it highly suitable for precision agriculture, environmental monitoring, and ecological restoration.
By facilitating intelligent and automated approaches to soil research, ResNet is driving significant advancements in the field, paving the way for innovative solutions to critical environmental and agricultural challenges.
5.1. Classification of Soil Types
The principle of soil type classification using ResNet primarily relies on the deep residual network’s robust feature extraction capabilities. ResNet addresses challenges such as gradient vanishing and degradation, commonly encountered in deep networks, through the “skip connection” structure within residual blocks. This mechanism allows the network to effectively learn subtle and intricate features from soil images at deeper levels.
Specifically, when a soil image is input into the network, the initial layers focus on extracting basic features such as edges and textures. As the data progresses through deeper residual blocks, the network captures increasingly complex and abstract features. These features are subsequently integrated and analyzed in the classification layer, enabling the model to accurately differentiate between various soil types.
The operational principle of soil type classification with ResNet is visually illustrated in Figure 9.
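The skip connection at the heart of a residual block can be shown with a deliberately simplified sketch. Small fully connected layers stand in here for the convolution and batch normalization layers of an actual ResNet, and all function names and numbers are illustrative assumptions.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def linear(v, weights, bias):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def residual_block(x, w1, b1, w2, b2):
    """Compute y = ReLU(F(x) + x), where F is two layers with a ReLU in between."""
    fx = linear(relu(linear(x, w1, b1)), w2, b2)
    # Skip connection: the input is added back to the learned residual F(x)
    return relu([f + xi for f, xi in zip(fx, x)])

x = [1.0, 2.0, 3.0]
zero_w = [[0.0] * 3 for _ in range(3)]
zero_b = [0.0] * 3
# With all-zero weights F(x) = 0, so the block reduces to the identity mapping
y = residual_block(x, zero_w, zero_b, zero_w, zero_b)
```

Because the block only has to learn the residual F(x), an added layer that is not useful can fall back to the identity, which is the intuition behind ResNet's resistance to the degradation problem in very deep networks.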
Traditional soil classification usually depends on physical and chemical laboratory analysis, such as the determination of particle size distribution, soil organic matter content, pH value, salinity, and nutrients such as nitrogen, phosphorus, and potassium. These methods require extensive field sampling, experimental work, and data analysis; they are time-consuming and costly, and their results are easily affected by subjective human factors. For example, the World Reference Base for Soil Resources (WRB) [95] and the United States soil classification system (USDA Soil Taxonomy) [96] classify soils through a series of standardized features. Compared with traditional methods, image-based soil classification has the following advantages: (A) Non-invasive and wide coverage: remote sensing images or UAV images capture soil surface features non-invasively over large areas, which is especially valuable in large-scale agricultural production. (B) Fast and efficient: compared with manual analysis and laboratory testing, image-based methods can significantly improve the efficiency of data processing and reduce human interference. (C) Diversity of data sources: remote sensing images, UAV images, and ground sensor data can be used together to provide multi-dimensional, multi-level information and improve the accuracy of soil type classification. With the rapid development of deep learning, especially the success of ResNet in image recognition, the potential of soil image classification using these technologies is gradually emerging. The comparison between traditional soil classification methods and ResNet soil image classification methods is shown in Table 7.
The soil image dataset for classification is sourced from Kaggle [98]. The dataset is divided into two subsets: a training set and a test set. Data collection is a crucial step in any data-related task, requiring careful and organized gathering of relevant information from various sources. This process involves determining the specific information needed and identifying the appropriate sources for obtaining it [99]. The process of soil type classification is illustrated in Figure 10.
In research on soil classification, scholars worldwide have extensively explored artificial intelligence and deep learning technologies as replacements for traditional manual methods. Outside China, research has primarily focused on developing universal and cross-regional soil classification algorithms [100]. Researchers have employed models such as ResNet and improved their generalization ability by training on large volumes of annotated data. ResNet and related models have shown superiority in extracting fine features from high-resolution remote sensing [101] and satellite images [102]. Additionally, ResNet has been integrated with other convolutional neural networks (CNN), generative adversarial networks (GAN), and further technologies to develop frameworks that address the loss of detail in complex soil images. In Europe and the United States, ResNet is also used to monitor soil pollution and to detect organic matter and heavy metal content in soil from multispectral data, thus supporting soil environmental monitoring efforts. For example, soil type recognition tests based on subgraph selection have demonstrated that making the decision according to the highest subgraph recognition probability yields stable classification results. This approach has been shown to significantly enhance the accuracy of soil type recognition in machine vision applications, ensuring more reliable and consistent identification outcomes [103].
In contrast, China focuses more on localization and practicality in applying deep learning to soil classification. Domestic researchers widely use ResNet to automatically identify soil types and optimize the model for complex terrains and diverse soil types. For instance, some domestic scholars have enhanced ResNet by adding convolution layers and feature extraction modules to improve the model’s adaptability to regional soil characteristics. Additionally, there is an increasing use of ResNet for high-resolution remote sensing and UAV images, which boosts its practical value in farmland classification, land resource management, and other applications. In precision agriculture, ResNet aids in farmland soil classification, providing intelligent support for agricultural planting and water resource management.
Validation methods for large-scale soil type classification are essential to ensure model accuracy and generalizability across diverse regions. Cross-validation, particularly k-fold cross-validation, is commonly used to assess model robustness by training and testing on different dataset partitions. Independent test set evaluation further validates performance by using unseen soil data to measure real-world applicability. Field validation, involving on-site soil sampling and expert analysis, provides ground-truth verification by comparing predicted classifications with actual soil properties. Remote sensing validation, using satellite or UAV imagery, helps assess classification accuracy over extensive areas. Additionally, statistical metrics such as confusion matrix, F1-score, AUC-ROC, and kappa coefficient quantify classification performance and highlight potential misclassifications. A combination of these methods ensures comprehensive validation, supporting reliable large-scale soil type classification [
104].
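Of the metrics listed above, the kappa coefficient is the one that corrects for agreement expected by chance. A minimal sketch of Cohen's kappa computed from a confusion matrix is given below; the matrix values are hypothetical and the function name is illustrative.

```python
def cohen_kappa(cm):
    """Cohen's kappa from a confusion matrix (rows: reference, columns: predicted)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(n)) / total
    # Chance agreement: product of row and column marginals, summed over classes
    expected = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(n)) for i in range(n)
    ) / (total * total)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when only one class occurs
    return (observed - expected) / (1 - expected)

# Hypothetical validation result: 90 loam and 10 saline-alkali reference samples
cm = [[88, 2],
      [6, 4]]
kappa = cohen_kappa(cm)
```

For this matrix the observed agreement is 0.92 but the chance agreement is already 0.852, so kappa is considerably lower than the raw accuracy suggests.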
After validation, a ResNet model is expected to show satisfactory classification performance, that is, high accuracy, strong generalization, and effective feature extraction across diverse soil datasets. Its performance under cross-validation, independent test set evaluation, and field validation can be measured with precision, recall, F1-score, and AUC-ROC. A well-validated ResNet model should consistently keep misclassification errors low, distinguish accurately between soil types, and remain stable even on complex or imbalanced datasets. Furthermore, successful validation against ground-truth soil samples and remote sensing data would confirm its reliability for large-scale soil classification and its practical applicability in real-world scenarios.
In China, the integration of deep learning with traditional soil analysis methods is also emphasized, with the optimization of classification models being supported by soil physical and chemical properties. For example, by combining physical analysis with the ResNet model, soil chemical composition or physical properties are used as auxiliary input data to improve model classification accuracy. These practices highlight the different emphases in soil classification research and application between China and abroad, with China focusing more on the localization, adaptability, and practical use of models, while foreign research tends to prioritize universality and cross-regional optimization.
5.2. Soil Health Assessment
Soil health assessment based on ResNet analyzes the health status of soil with deep learning models, typically using remote sensing images or soil sample images. ResNet’s residual connections address the gradient vanishing problem in deep networks, enabling effective extraction of key characteristics such as soil type, structural changes, pollution levels, and other factors that affect soil health. This capability allows ResNet to perform well on large-scale, complex soil image data. Once trained, a ResNet model can identify and classify physical, chemical, and biological characteristics of soil, including organic matter content, nutrient distribution, microbial activity, and potential pollution, providing a scientific basis for soil health assessments. Because it processes complex image data efficiently, the approach is especially suited to remote sensing image analysis and large-scale soil monitoring, which supports dynamic monitoring and management decisions for soil quality. The training process follows the procedure outlined in the previous section; soil health assessment differs mainly in requiring a more in-depth interpretation of the images.
ResNet can also be applied to the assessment of soil fertility, utilizing its deep feature extraction capabilities to analyze key soil properties such as texture, organic matter content, moisture levels, and nutrient distribution from high-resolution soil images. By training on large-scale soil datasets, ResNet can effectively classify fertility levels based on spectral, structural, and morphological characteristics. Through multi-modal data integration, including remote sensing imagery, microscopic soil images, and laboratory-tested soil parameters, ResNet enhances the accuracy of fertility assessments. Additionally, quantitative validation using ground-truth measurements ensures the model’s reliability in predicting soil productivity. The ability of ResNet to learn complex patterns in soil composition makes it a valuable tool for precision agriculture, land management, and sustainable farming practices, supporting efficient decision-making for soil improvement strategies [
105].
Compared with traditional soil health assessment methods, deep learning-based methods such as ResNet offer significant advantages. Traditional methods rely heavily on physical, chemical, and biological analysis of soil samples, which are labor-intensive, localized, time-consuming, and prone to human error. In contrast, ResNet and similar deep learning models can quickly process a vast amount of soil data through remote sensing or soil sample images, automatically extract critical features such as soil type, pollution levels, and nutrient distribution, thus improving both the accuracy and efficiency of the evaluation. The residual structure of ResNet effectively mitigates the gradient vanishing issue in deep networks, ensuring excellent performance in processing complex and large-scale data. This makes it highly suitable for large-scale soil health monitoring, providing real-time, scientific decision support for soil management. The comparison table is shown in
Table 8.
The research on soil health assessment, both domestically and internationally, has made rapid progress, particularly with the application of remote sensing technology, deep learning, and data fusion methods. Remote sensing, especially high-resolution satellite and UAV images, combined with machine learning and deep learning techniques, is widely used in foreign research to assess soil health. For instance, studies in the United States and Europe utilize remote sensing images to extract the physical and chemical characteristics of soil and apply methods such as support vector machines and random forests for analysis. In recent years, the accuracy and efficiency of soil health assessments have been significantly improved through the use of deep learning, particularly convolutional neural networks (CNN) and the ResNet model. Additionally, multidimensional data fusion has become a key research direction abroad, combining remote sensing data, climate data, soil sampling data, and GIS to enhance the comprehensiveness of soil health assessments [
112].
In China, research on soil health assessment primarily focuses on agricultural production and ecological environmental protection. With the growing demand for precision agriculture, remote sensing technology and geographic information systems (GIS) are gradually being incorporated into domestic research for monitoring and evaluating soil quality, particularly in arid and saline-alkali regions. The application of deep learning techniques for soil health assessment has gradually emerged in China, with researchers applying methods like CNN and long short-term memory networks (LSTM) to improve the accuracy of assessments using UAV and ground sampling images. More recently, domestic research has been exploring the use of ResNet and other deep neural network models to analyze remote sensing images, thereby extracting multidimensional soil characteristics.
By leveraging ResNet and other deep learning models to analyze soil images, key features can be effectively identified and evaluated, providing a scientific basis for soil health evaluation and management. Soil health can be assessed through classification (e.g., healthy versus degraded) or regression (e.g., organic matter content) using trained models, optimized with appropriate loss functions and optimizers. The model’s effectiveness in soil health analysis is verified through evaluation on an independent test set, offering valuable insights for agricultural management and sustainable development. The general process is shown in Figure 11.
6. Conclusions and Prospects
With the continuous advancement of deep learning technologies, particularly the application of the residual network (ResNet) in soil science, significant progress has been made. ResNet, through its unique residual learning mechanism, effectively addresses the gradient vanishing problem encountered during the training of traditional deep neural networks, allowing the model to perform deeper learning and extract complex features from soil data. This mechanism ensures that information flow is maintained in deeper network structures, preventing the loss of important features as the network depth increases, making ResNet highly effective in processing soil image data.
In soil science, the application potential of ResNet is primarily evident in soil type classification and soil health assessment. In soil type classification, ResNet extracts soil features from images, enabling the rapid and accurate differentiation of soil types, thus providing a scientific basis for agricultural management and land use planning. In soil health assessment, when combined with remote sensing images, soil sensor data, and other multi-source data, ResNet can more accurately evaluate soil health, detect pollution, nutrient deficiencies, and other issues, offering valuable data support for land governance and ecological restoration.
Additionally, with the continuous progress in remote sensing and soil sensor technologies, ResNet can integrate more diverse soil information, including moisture, temperature, and nutrient content. This enables more accurate, real-time decision support for agricultural production, precision irrigation, environmental protection, and other sectors. By fusing multimodal data, ResNet not only enhances the accuracy of soil information extraction but also plays a critical role in improving the sustainable management of land resources. These technological advancements not only drive the in-depth study of soil science but also provide a robust technical foundation for practical applications.
While ResNet demonstrates significant advantages in soil science applications, it is equally important to address the challenges and limitations associated with its implementation. One notable challenge is data heterogeneity, as soil data come from various sources, including remote sensing images, field sensors, and laboratory analyses, each with different resolutions, formats, and noise levels. The integration of such diverse datasets requires extensive preprocessing techniques to ensure consistency and compatibility. Additionally, high computational demands pose another limitation, particularly in real-time monitoring and edge computing applications. Deep ResNet architectures require substantial processing power and memory, making deployment on low-resource devices challenging. Furthermore, interpretability and model transparency remain concerns, as deep learning models, including ResNet, often function as “black boxes,” making it difficult to understand their decision-making process. This can hinder trust and adoption in scientific and agricultural communities. Lastly, domain-specific limitations, such as the scarcity of high-quality labeled soil datasets, can affect the model’s generalization ability. Addressing these challenges through data standardization, model optimization, and explainability methods will be crucial for advancing the practical applications of ResNet in soil science.
Looking ahead, the application of ResNet in soil science will expand further with advances in remote sensing, sensor technology, and big data analysis. Multi-source data fusion has become crucial for improving the accuracy of soil information extraction. By integrating multi-modal data such as remote sensing images, soil sensor readings, and meteorological data, ResNet will offer more precise soil analysis across various dimensions and scales. As computational demands increase, model optimization and lightweight techniques become essential research directions for enhancing ResNet’s efficiency in soil science applications [113]. Pruning reduces model complexity by removing redundant neurons and filters, enabling faster inference while largely maintaining accuracy, which is particularly beneficial for real-time soil classification and field-based analysis. Quantization further improves efficiency by reducing the numerical precision of model weights and activations, making ResNet more suitable for low-power edge computing and remote sensing applications. Additionally, knowledge distillation transfers knowledge from a larger, more complex ResNet model to a smaller, more efficient one, so that high-level features relevant to soil texture, moisture content, and fertility assessment are retained as far as possible. By integrating these optimization techniques, ResNet can achieve higher processing speed, lower energy consumption, and greater adaptability, making it a practical tool for large-scale soil monitoring and precision agriculture. Furthermore, real-time monitoring and prediction are poised to be key areas of research. Leveraging deep learning and real-time data, ResNet will enable the real-time prediction of critical indicators such as soil moisture and health status, providing vital support for agricultural and environmental management decisions.
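Two of the lightweight techniques mentioned above, pruning and quantization, can be illustrated on a flat list of weights. This is a toy sketch under stated assumptions: the weights are hypothetical, pruning is unstructured magnitude pruning rather than the filter-level pruning used for convolutional layers, and quantization is a simple symmetric 8-bit scheme.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest absolute value."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_prune])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def quantize_int8(weights):
    """Symmetric linear quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [v * scale for v in q]

# Hypothetical weights of a single layer
weights = [0.02, -0.75, 0.31, -0.01, 1.27, 0.08]
pruned = magnitude_prune(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
```

The rounding error introduced by quantization is bounded by half the scale factor per weight; in practice both techniques are usually followed by fine-tuning or calibration so that classification accuracy is preserved.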