1. Introduction
Rice, a vital staple crop cultivated across 155 million hectares globally, serves as the primary food source for approximately 3 billion people. By 2050, the global population is projected to reach 9 billion, necessitating a 70% to 100% increase in food production to meet escalating demand [1,2]. Rapid classification and diagnosis of nutrient deficiencies in field-grown rice, followed by timely management interventions, are critical for ensuring healthy crop growth, accelerating the breeding of superior varieties, and safeguarding food security.
To address the need for high-throughput, efficient, and low-cost field detection, unmanned aerial vehicles (UAVs) have emerged as transformative tools in modern agriculture [3]. Equipped with advanced sensors such as RGB cameras, multispectral imagers, and thermal devices, UAVs enable real-time, high-resolution, and non-invasive monitoring of crop health and field conditions [4,5,6]. Over the past decade, UAVs have been widely adopted in precision agriculture [7], facilitating applications such as pest detection [8], crop health assessment [9], yield prediction [10], and nutrient management [11,12]. The integration of UAVs with remote sensing technologies [13], particularly multispectral and hyperspectral imaging, has revolutionized agricultural data collection and analysis [14,15,16]. For instance, vegetation indices (VIs) derived from multispectral data, such as the Normalized Difference Vegetation Index (NDVI), the Green Normalized Difference Vegetation Index (GNDVI), and the Enhanced Vegetation Index (EVI), have been extensively utilized to assess crop health, nutrient conditions, and biomass [17,18]. These indices leverage the spectral reflectance properties of plants to provide insights into their physiological conditions, enabling early detection of nutrient deficiencies, water stress, and disease outbreaks [19,20]. UAVs have also been employed for precision spraying of fertilizers and pesticides, reducing chemical usage and minimizing environmental impact [22,23]. Despite these advancements, challenges such as limited flight time, regulatory restrictions, and data processing complexities remain significant barriers to the widespread adoption of UAVs in agriculture [21,24,25].
Machine learning (ML) algorithms have further enhanced the utility of VIs by enabling the development of predictive models for crop classification, yield estimation, and nutrient diagnosis [26,27]. For instance, random forest (RF) and support vector machine (SVM) methods have been successfully applied in terrain classification, crop trait detection, and the prediction of nitrogen content in crops such as rice, wheat, maize, and cotton [28,29]. The combination of VIs and ML has proven particularly effective in addressing the challenges of spatial and temporal variability in agricultural fields, providing a robust framework for precision nutrient management [30]. Moreover, integrating multiple types of vegetation indices has further enhanced performance in these applications [31,32]. In recent years, deep learning (DL) techniques have gained significant traction in agricultural remote sensing owing to their ability to automatically extract complex features from high-dimensional data. Convolutional neural networks (CNNs) and other deep architectures have been employed for tasks such as crop classification, field segmentation, and disease detection [33]. For example, U-Net, a popular deep learning model, has been used for precise field boundary detection and crop type mapping, while Vision Transformers (ViTs) have shown promise in handling multispectral and multi-temporal data for yield prediction [34,35]. Deep learning models excel at capturing spatial and spectral patterns in UAV imagery, making them well suited to applications such as nutrient deficiency classification and topdressing management [36,37].
However, field fertilization practices do not exhibit an ideal uniform distribution [38]; rather, they vary with application methods and crop growth stages [39]. Additionally, excessive or indiscriminate fertilizer application can adversely affect land productivity [40,41]. Therefore, this study proposes a novel framework for classifying nutrient deficiencies and formulating fertilization management strategies in field-grown rice, aiming to accurately reflect the actual nutrient deficiency conditions during rice growth. The framework integrates visible light VIs, multispectral VIs, and image features extracted by deep neural networks, leveraging the complementary strengths of these feature types to achieve accurate and robust nutrient deficiency classification. By utilizing UAV platforms for high-throughput, regionalized field detection, the framework generates real-time fertilization prescription maps based on actual nutrient deficiency conditions, enabling intelligent and precise management of rice growth. This approach not only enhances the efficiency and accuracy of nutrient diagnosis but also aligns more closely with practical production needs, offering a scalable solution for optimizing rice cultivation and ensuring food security.
2. Materials and Methods
The overall experimental design and technical approach of this study are illustrated in Figure 1. The controlled field experiment was conducted on high-quality ratoon rice under nutrient deficiency conditions. Multispectral imagery of the rice field was captured using an unmanned aerial vehicle (UAV) remote sensing platform, and data were extracted from the imagery using ENVI version 5.3 software (HARRIS Geospatial, Wokingham, UK). Subsequently, visible light VIs and spectral VIs were calculated from the multispectral imagery, while deep features were extracted from the visible light imagery. The VI features, after significance screening, were then fused with the deep features and used as input features for the machine learning models. Machine learning models for classifying nutrient deficiency in the field were constructed using XGBoost, random forest (RF), and support vector machine (SVM) classifiers. Finally, the optimal classification model was employed to identify actual nutrient deficiency conditions in the field, and corresponding fertilization strategies for the ratoon season were formulated based on the deficiency conditions.
2.1. Study Area and Field Experimental Design
As illustrated in Figure 2A, this study was conducted in 2024 at the rice cultivation base of Huazhong Agricultural University in Wuhan, Hubei Province, China (30.474852° N, 114.356769° E). The study focused on a controlled nutrient deficiency experiment for high-quality ratoon rice. The study area is characterized by a subtropical monsoon climate, with favorable environmental conditions, distinct seasons, abundant sunlight, and ample rainfall; the average annual rainfall is 1269 mm, and the average annual temperature ranges between 15.8 °C and 17.5 °C. In this study, the main growing season of the ratoon rice spanned from 30 April 2024 (transplanting date) to 3 August 2024 (main season harvest date), while the ratoon season continued until 10 October 2024 (ratoon season harvest date). Field management practices followed local standards, including sufficient irrigation, necessary herbicide and pesticide applications, and other standard agronomic practices.
In this study, base fertilizer was applied before transplanting, with additional fertilizers (tillering, panicle initiation, and spikelet protection) applied during the main season. To capture nutrient deficiency variations, four fertilization levels (25%, 50%, 75%, and 100% of the standard rate, labeled N1–N4) were implemented. In the ratoon season, bud-promoting and tillering fertilizers were applied at levels consistent with the main season. Detailed fertilization rates are provided in Table 1. The experiment involved six ratoon rice varieties (HHZ, LY6326, ZDQY1610, LLY68812, YY4949, and WLY6312), labeled V1–V6, with the field experimental design illustrated in Figure 2B.
2.2. UAV Platform for Field Imagery Acquisition, Processing, and Analysis
2.2.1. UAV Multispectral Imagery Acquisition
In this study, multispectral imagery of the field was captured one week before the application windows for the spikelet protection fertilizer in the main season (18 June 2024) and the bud-promoting fertilizer in the ratoon season (20 July 2024). These two acquisitions, labeled S1 and S2, guided the corresponding fertilization strategies. The UAV platform used was the DJI M3M RTK quadcopter (SZ DJI Technology Co., Ltd., Shenzhen, China), equipped with a multispectral camera capturing reflectance in five bands: red (R), green (G), and blue (B) visible light, near-infrared (NIR), and red edge (RE). The camera's specific parameters are shown in Figure 3A, with an image resolution of 2592 × 1944 pixels, a field of view of 84°, and an equivalent focal length of 24 mm.
During acquisition, the UAV flew at an altitude of 15 m and a speed of 1.5 m/s, with 75% forward and side overlap. Each flight captured approximately 800 multispectral images, covering the entire experimental field. To ensure accuracy, imagery was acquired under clear, cloudless conditions at around 10:00 a.m. The discrete images were stitched using DJI Terra software version 4.1.0 (SZ DJI Technology Co., Ltd., Shenzhen, China) to generate a complete multispectral image of the field, saved in high-resolution TIFF format, as shown in Figure 3B.
2.2.2. Vegetation Index Features and ROI Extraction
Vegetation indices (VIs) were calculated based on the multispectral imagery captured by the UAV. These indices are linear or nonlinear combinations of multiple spectral bands, and they were used to replace single-band reflectance values. Depending on the types of bands involved in the calculation, VIs were categorized into visible light VIs (RGB-VIs) and spectral VIs (S-VIs). RGB-VIs were derived from the reflectance values of the R, G, and B bands, while S-VIs were calculated using all five bands: R, G, B, NIR, and RE. Typically, RGB-VIs provided a more intuitive representation of the surface color distribution of crops in the field, whereas S-VIs offered more detailed insights into the nutritional conditions of crops. These two types of VI features complemented each other.
In this study, 13 S-VIs and 10 RGB-VIs related to rice growth conditions were calculated from the field multispectral imagery and used as features. The values of each feature, comprising the 23 VIs and the RGB bands, were treated as separate bands and merged to create a new TIFF image. This process transformed the original multispectral imagery into a new image containing 26 band values, which was then used for subsequent image extraction and dataset creation. The specific VIs and their calculation formulas are listed in Table 2.
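As a minimal illustration, the band arithmetic behind a few of the indices named in this paper can be sketched as follows (a non-authoritative example assuming the five reflectance bands are available as NumPy arrays; function and variable names are hypothetical, and only indices with well-known standard formulas are shown):

```python
import numpy as np

def compute_example_vis(r, g, b, nir, re):
    """r, g, b, nir, re: 2-D reflectance arrays from the stitched multispectral TIFF."""
    eps = 1e-8  # guard against division by zero in unvegetated pixels
    # Spectral VIs (S-VIs) combine visible bands with the NIR/RE bands.
    ndvi = (nir - r) / (nir + r + eps)    # Normalized Difference Vegetation Index
    gndvi = (nir - g) / (nir + g + eps)   # Green NDVI
    cire = nir / (re + eps) - 1.0         # Chlorophyll Index Red Edge
    # RGB-VIs use only the visible bands.
    ngrdi = (g - r) / (g + r + eps)       # Normalized Green-Red Difference Index
    return {"NDVI": ndvi, "GNDVI": gndvi, "CIRE": cire, "NGRDI": ngrdi}
```

Each index map computed this way can then be stacked with the original bands (e.g., with np.stack) to form the 26-band image described above.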
ROIs (regions of interest) were manually selected to delineate the areas to be analyzed. In this study, two ROIs were extracted for each variety within each fertilization treatment area, ensuring that the ROIs covered the entire rice planting region. After extraction, the boundaries of the rice planting area and the field pathways within each ROI were removed. Following this procedure, 48 valid ROIs were extracted from the field multispectral imagery during each of the S1 and S2 periods, covering the 6 varieties and 4 fertilization treatment areas. Each ROI image was then randomly cropped to obtain 125 sub-images, with each sub-image containing the 26 features mentioned earlier. A dataset containing 1000 sub-images was created for each variety and used for subsequent training. This process is illustrated in Figure 3C.
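The random-cropping step can be sketched as follows (a minimal example assuming each ROI is stored as a 26-band NumPy array; the crop size is a placeholder, since the sub-image pixel dimensions are not stated here):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crops(roi, n_crops=125, size=64):
    """roi: (26, H, W) array for one ROI; returns n_crops sub-images of shape (26, size, size)."""
    _, h, w = roi.shape
    crops = []
    for _ in range(n_crops):
        top = int(rng.integers(0, h - size + 1))
        left = int(rng.integers(0, w - size + 1))
        crops.append(roi[:, top:top + size, left:left + size])
    return crops

# Per variety and period: 8 ROIs (4 treatments x 2 ROIs) x 125 crops = 1000 sub-images.
```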
2.2.3. Significance Analysis of Vegetation Index Features
A large number of VI features could hinder the accurate and efficient construction of classification models. To identify more relevant VI feature combinations, reduce data dimensionality, minimize interference from irrelevant data, and enhance the sensitivity of the UAV platform to imagery of different nutrient-deficient areas in the field, this study employed the Pearson correlation coefficient method. Formula (1) was used to calculate the correlation coefficient between each VI feature and the nutrient deficiency categories of field rice. Statistical significance was then tested using p-values, with p < 0.05 considered statistically significant. The correlation coefficients and p-values of each VI feature were statistically analyzed, and VI features with |r| < 0.2 were excluded from the significant VI feature combinations. Finally, by combining data from both the S1 and S2 periods, the intersection of highly correlated VI features was taken to obtain a VI feature combination applicable to both periods.

$$ r = \frac{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}\sqrt{\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2}} \quad (1) $$

where r represents the correlation coefficient between variables X and Y: if r > 0, X and Y are positively correlated; if r < 0, X and Y are negatively correlated; and if r = 0, the two are uncorrelated. X_i is the i-th measured value of variable X, and $\bar{X}$ is the mean value of X, which in this study corresponds to the feature values of the VIs. Y_i is the measured value at the i-th position corresponding to X_i, and $\bar{Y}$ is the mean value of Y, which in this study corresponds to the label values of the different nutrient-deficient areas. n is the number of samples of X and Y, which in this study is the number of experimental plots.
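This screening procedure can be summarized in code roughly as follows (a sketch assuming the plot-level VI values and deficiency labels are held in NumPy arrays; variable names are hypothetical):

```python
import numpy as np
from scipy.stats import pearsonr

def significant_vis(X, y, names, r_min=0.2, alpha=0.05):
    """X: (n_plots, n_features) VI values; y: (n_plots,) deficiency labels."""
    kept = set()
    for j, name in enumerate(names):
        r, p = pearsonr(X[:, j], y)
        if p < alpha and abs(r) >= r_min:  # keep significant, non-weak features
            kept.add(name)
    return kept

# Features retained for both periods: the intersection of the S1 and S2 screens.
# selected = significant_vis(X_s1, y_s1, vi_names) & significant_vis(X_s2, y_s2, vi_names)
```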
2.3. Deep Feature Extraction from Imagery Based on VGG16
VGG16, a classic convolutional neural network (CNN) architecture, was proposed by the Visual Geometry Group at the University of Oxford in 2014. Using VGG16 as the backbone for feature extraction in image classification tasks has demonstrated strong performance in land cover classification, multi-scene classification, and phenological change detection, indicating its applicability to agricultural remote sensing imagery classification. Additionally, this study focused on classifying nutrient deficiency in different areas of field rice based on imagery captured by a UAV platform, which aligns with the application scenarios of VGG16. Therefore, VGG16 was employed to train the dataset, and deep features were extracted using the optimal model obtained from the training process.
As shown in Figure 4a, the original architecture of the VGG16 network passed input imagery through multiple convolutional layers for feature extraction. The feature vectors were then flattened and fed into two fully connected layers, each containing 4096 neurons (FC-4096), followed by a final output layer for classification into 1000 categories. In this study, since the final classification task involved only 4 categories, using the original FC-4096 fully connected structure would lead to overfitting due to its excessive parameters. This was evidenced by classification accuracies exceeding 99% during training for nutrient deficiency in each rice variety across both the S1 and S2 periods. In addition to overfitting, the original fully connected structure also failed to reduce the image features to the required dimensionality. Therefore, dimensionality reduction was applied to the fully connected part of the original network. The mathematical description of the fully connected layers in the original VGG16 network is shown in Equation (2):

$$ y_{4096} = W_1 x_{25088} + b_1, \quad y'_{4096} = W_2 y_{4096} + b_2, \quad y_{1000} = W_3 y'_{4096} + b_3 \quad (2) $$

where x is the input vector and y is the output vector, with their subscripts representing the number of elements in the vectors; W is the weight matrix and b is the bias vector, with their subscripts representing the parameters of the different layers.
Max pooling selects the maximum value within the pooling window as the output, making it suitable for applications that require highlighting prominent features. Average pooling, in contrast, outputs the mean of all pixel values within the pooling window, which smooths the feature map. Since the target scenario of this study leaned toward reflecting the overall characteristics of a specific area of the field, the max pooling layers in the VGG16 network were replaced with average pooling layers. For an input feature map x with a pooling window size of k × k, the max pooling and average pooling operations are given by Equations (3) and (4), respectively:

$$ y(i, j) = \max_{1 \le m,\, n \le k} x(i + m - 1,\; j + n - 1) \quad (3) $$

$$ y(i, j) = \frac{1}{k^2} \sum_{m=1}^{k} \sum_{n=1}^{k} x(i + m - 1,\; j + n - 1) \quad (4) $$

where x(i + m − 1, j + n − 1) represents a pixel value within the pooling window of the input feature map, and y(i, j) represents the pixel value of the output feature map.
As shown in Figure 4b–d, each network variant included a fully connected layer with 15 neurons (FC-15) before the final 4-class classification layer, designed to reduce the image features to an appropriate dimensionality. The value of 15 was chosen because it is higher than the number of visible light VI features (10) and multispectral VI features (13) mentioned earlier but smaller than their combined total (23). Before the FC-15 layer, three configurations were implemented: Figure 4b retains only one of the two original FC-4096 layers, followed immediately by the FC-15 layer for feature dimensionality reduction; Figure 4c retains both original FC-4096 layers, followed immediately by the FC-15 layer; and Figure 4d retains both original FC-4096 layers, followed by an FC-50 layer for preliminary dimensionality reduction and then the FC-15 layer to further reduce the features to the specified number. A sketch of one of these variants is given below.
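The following is a minimal PyTorch sketch of the Figure 4b variant (an interpretation of the description rather than the authors' code; details such as the use of ReLU and Dropout around the retained FC-4096 layer are assumptions):

```python
import torch.nn as nn
from torchvision import models

def modified_vgg16(num_classes=4, feat_dim=15):
    net = models.vgg16(weights=None)
    # Swap every max pooling layer for average pooling (Equations (3) and (4)).
    for i, layer in enumerate(net.features):
        if isinstance(layer, nn.MaxPool2d):
            net.features[i] = nn.AvgPool2d(kernel_size=2, stride=2)
    # Figure 4b: one retained FC-4096 layer, then FC-15, then the 4-class output.
    net.classifier = nn.Sequential(
        nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(),
        nn.Linear(4096, feat_dim),          # FC-15: the deep-feature layer
        nn.Linear(feat_dim, num_classes),   # final 4-class classification layer
    )
    return net
```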
Finally, the dataset for each variety was divided into training and testing sets in a 1:1 ratio. The modified network was then trained to obtain the optimal classification model, which served as the deep image feature extractor for the corresponding variety. In this study, model performance was evaluated using classification accuracy and average precision, with their calculation formulas provided in Equations (5) and (6). Subsequently, the trained model was loaded, the final classification layer of the network was removed, and imagery was input for feature extraction, yielding 15-dimensional deep image features denoted as Features-Deep; a minimal sketch of this extraction step follows the metric definitions below.
$$ Acc. = \frac{\sum_{i=1}^{C} TP_i}{N} \quad (5) $$

Here, Acc. represents the value of classification accuracy, C is the total number of categories (in this study, C = 4), TP_i is the number of correctly predicted samples for the i-th category, and N is the total number of samples in the test set.

$$ Precision = \frac{1}{C} \sum_{i=1}^{C} \frac{TP_i}{TP_i + FP_i} \quad (6) $$

Here, Precision represents the value of average precision, and FP_i is the number of samples from other categories that are incorrectly predicted as the i-th category.
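The extraction of Features-Deep from the trained model can be sketched as follows (assuming the hypothetical modified_vgg16 helper above; dropping the final classification layer leaves the 15-dimensional FC-15 output as the feature vector):

```python
import torch

def extract_features_deep(model, images):
    """images: (B, 3, 224, 224) tensor of sub-images; returns (B, 15) Features-Deep."""
    model.eval()
    # All classifier layers except the final 4-class layer.
    head = torch.nn.Sequential(*list(model.classifier.children())[:-1])
    with torch.no_grad():
        x = model.features(images)   # convolutional feature extraction
        x = model.avgpool(x)
        x = torch.flatten(x, 1)
        return head(x)               # 15-dimensional deep image features
```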
2.4. Machine Learning Classification Based on Fusion Features
As described earlier, this study extracted three types of features: RGB-VIs, S-VIs, and Features-Deep, denoted in subsequent usage as R, S, and F, respectively. The mathematical expressions for these three feature vectors are provided in Equation (7):

$$ R = (r_1, r_2, \ldots, r_{10}), \quad S = (s_1, s_2, \ldots, s_{13}), \quad F = (f_1, f_2, \ldots, f_{15}) \quad (7) $$

Here, r_1–r_10 represent the feature values of the 10 RGB-VIs, s_1–s_13 represent the feature values of the 13 S-VIs, and f_1–f_15 represent the values of the 15-dimensional Features-Deep.
Using the trained Modified-VGG16 model, the forward propagation function was employed to extract the output of the final fully connected layer as the deep image features (represented as a set of one-dimensional vectors). The VI features were then fused with the deep image features into a unified set of one-dimensional feature vectors, followed by normalization to complete the feature fusion. By combining these feature vectors, four additional feature vector combinations were obtained: RS, FR, FS, and FRS. The feature combinations that did not include F (i.e., R, S, and RS) were referred to as original features, while those that included F (i.e., FR, FS, and FRS) were referred to as fusion features. The mathematical expression for the fusion feature vectors is given by Equation (8):

$$ F_{fusion} = \left[ F,\; F_{ori} \right] \quad (8) $$

Here, F_fusion represents the fused feature vector, F_ori represents the feature vector composed of original features, and the square brackets denote vector concatenation.
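A minimal sketch of this fusion step (interpreting Equation (8) as concatenation followed by per-feature min-max normalization; names are hypothetical):

```python
import numpy as np

def fuse_features(F_deep, F_ori):
    """F_deep: (n, 15) Features-Deep; F_ori: (n, d) original features (R, S, or RS)."""
    fused = np.concatenate([F_deep, F_ori], axis=1)  # e.g., FRS: 15 + 23 = 38 dimensions
    lo, hi = fused.min(axis=0), fused.max(axis=0)
    return (fused - lo) / np.where(hi > lo, hi - lo, 1.0)  # min-max normalization
```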
In this study, three classifiers—support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost)—were used to construct machine learning models for classifying nutrient deficiency in field rice. SVM is a classification algorithm based on margin maximization. It handles nonlinear data through kernel functions, performs well on high-dimensional data, and has strong generalization capabilities. However, SVM is sensitive to the choice of hyperparameters and kernel functions, requiring multiple optimizations during model construction. RF is an ensemble learning algorithm based on decision trees. It improves performance by constructing multiple trees and voting or averaging their results. RF is robust, insensitive to noise and outliers, interpretable, and fast to train, making it suitable for large-scale data. However, RF models are complex, with high storage and inference costs, and they generally perform poorly on high-dimensional sparse data. XGBoost is an efficient ensemble learning algorithm based on gradient-boosted trees. It iteratively constructs decision trees to optimize the loss function, achieving high performance on structured data. XGBoost supports customizable loss functions and regularization, and it accelerates training through parallel computing and sparse data processing. However, it is sensitive to hyperparameters, requires complex tuning, and demands significant training time. To better evaluate the machine learning model’s sensitivity and specificity, this study calculated the classification model’s recall rate and F1-score, as shown in Equations (9) and (10), respectively.
$$ Recall = \frac{1}{C} \sum_{i=1}^{C} \frac{TP_i}{TP_i + FN_i} \quad (9) $$

$$ F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \quad (10) $$

Here, Recall represents the model's recall rate, FN_i indicates the number of false negatives (samples that are actually positive but were predicted as negative) for the i-th category, and F1 is the model's F1-score.
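All four evaluation metrics in Equations (5), (6), (9), and (10) can be computed from a single C × C confusion matrix, as in the following sketch (macro-averaging over categories is an interpretation consistent with the definitions above):

```python
import numpy as np

def evaluation_metrics(M):
    """M: C x C confusion matrix, M[i, j] = samples of true class i predicted as class j."""
    tp = np.diag(M).astype(float)   # TP_i
    fp = M.sum(axis=0) - tp         # FP_i: other classes predicted as class i
    fn = M.sum(axis=1) - tp         # FN_i: class i predicted as other classes
    acc = tp.sum() / M.sum()                             # Equation (5)
    precision = np.mean(tp / (tp + fp))                  # Equation (6)
    recall = np.mean(tp / (tp + fn))                     # Equation (9)
    f1 = 2 * precision * recall / (precision + recall)   # Equation (10)
    return acc, precision, recall, f1
```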
The three original feature vectors (R, S, and RS) and the three fused feature vectors (FR, FS, and FRS) were used as input features for constructing machine learning classification models. These six feature vectors were input into the SVM, RF, and XGBoost classifiers to compare the performance differences between original and fused feature combinations in classifying nutrient deficiency in field rice. The best classification model was determined based on classification accuracy.
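This comparison can be organized roughly as follows (a sketch with placeholder hyperparameters, since the tuned settings are not reported here; scikit-learn and the xgboost package are assumed):

```python
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

CLASSIFIERS = {
    "SVM": SVC(kernel="rbf"),
    "RF": RandomForestClassifier(n_estimators=200),
    "XGBoost": XGBClassifier(n_estimators=200),
}

def compare_feature_sets(feature_sets, y_train, y_test):
    """feature_sets: dict mapping 'R', 'S', 'RS', 'FR', 'FS', 'FRS' to (X_train, X_test)."""
    for feat_name, (X_train, X_test) in feature_sets.items():
        for clf_name, clf in CLASSIFIERS.items():
            clf.fit(X_train, y_train)  # refit the classifier on each feature set
            acc = accuracy_score(y_test, clf.predict(X_test))
            print(f"{feat_name} + {clf_name}: accuracy = {acc:.4f}")
```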
To further verify the effectiveness of the proposed classification approach, nutrient deficiency classification based on the six feature vectors was conducted during both the S1 and S2 periods. The classification model constructed for the S2 period was used to guide the topdressing strategies for the ratoon season.
2.5. Classification of Nutrient Deficiency and Topdressing Strategies in the Field
In this study, six varieties (V1–V6) were tested. Three of these varieties (V3, V4, and V6) were randomly selected, and their feature data were used to train and construct machine learning classification models. The remaining three varieties (V1, V2, and V5) were used to evaluate the actual nutrient deficiency in the field by applying the trained classification model and formulating corresponding topdressing strategies.
The feature vector data from the S2 period (before ratoon season fertilization) were used. The datasets for V3, V4, and V6 were combined and split into training and testing sets in a 7:3 ratio for model training and construction. The optimal model was then determined. Next, the multispectral imagery of the field areas for V1, V2, and V5 was evenly divided. Each variety in each fertilization treatment area was divided into four subplots, resulting in 16 subplots per variety across four treatment areas. The fused feature data for each subplot were extracted and input into the optimal classification model to determine the actual nutrient deficiency distribution. The deficiency levels were categorized as 75%, 50%, 25%, or no deficiency.
After determining the actual nutrient deficiency in the treatment areas for V1, V2, and V5, additional topdressing was applied before the ratoon season tillering fertilizer application. Specifically, 25%, 50%, and 75% incremental topdressing was applied to areas with corresponding deficiency levels to compensate for the nutrient shortage in the main season. Then, the standard 100% fertilization rate was applied to the N1, N2, and N4 treatment areas during the ratoon season, while the N3 treatment area received only 75% of the standard rate without incremental topdressing. This was done to create a control group for evaluating the effectiveness of the topdressing strategy.
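The mapping from predicted deficiency class to supplementary fertilization can be expressed as a simple lookup (a sketch following the rates described in this subsection; class coding and names are hypothetical):

```python
# Predicted deficiency level -> supplementary fraction of the main season rate.
TOPDRESS_FRACTION = {"75%": 0.75, "50%": 0.50, "25%": 0.25, "none": 0.0}

def grid_prescription(grid_classes, standard_rate, grids_per_zone=4):
    """grid_classes: dict grid_id -> predicted deficiency level.
    Each grid receives its fraction of 1/4 of the standard main season rate."""
    per_grid_rate = standard_rate / grids_per_zone
    return {g: TOPDRESS_FRACTION[c] * per_grid_rate for g, c in grid_classes.items()}
```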
Finally, the effectiveness of the topdressing strategy was evaluated based on four traits that reflect rice nutrient utilization: tiller number (growth dimension), thousand-grain weight and seed setting rate (yield dimension), and grain area (quality dimension). Before harvesting in both the main and ratoon seasons, 20 plants were randomly sampled from each variety in each treatment area. The specific values of these traits were measured through manual evaluation. The changes in these traits between the main and ratoon seasons were analyzed, and the improvement in the four traits during the ratoon season was used as the evaluation metric for the topdressing strategy’s effectiveness.
3. Results and Discussion
3.1. Significant VI Features and Corresponding Field Multispectral Imagery
The results of the correlation analysis between VI features and nutrient deficiency conditions are illustrated in Figure 5. Specifically, Figure 5a displays the corresponding heatmap for the RGB-VIs, and Figure 5b presents the heatmap for the S-VIs, both during the two distinct periods S1 and S2. Additionally, based on the p-value method, the 10 RGB-VIs and 13 S-VIs selected in this study were all found to be significant factors (p < 0.05) in evaluating nutrient deficiency in field rice.
Therefore, to reduce feature dimensionality and minimize the interference of weakly correlated features in the construction of the classification model, this study screened the VI features based on the absolute correlation coefficient values (|r|) between the features and nutrient deficiency conditions. Features with weak correlations (|r| < 0.2) were excluded, and only strongly correlated features were retained. As shown in Figure 5a, among the 10 RGB-VI features, two features (NGRDI and VEG) had |r| values below 0.2 during either S1 or S2, while the |r| values of the remaining features were all greater than 0.2. Similarly, as shown in Figure 5b, among the 13 S-VI features, two features (EVI and CIRE) had |r| values below 0.2 during either S1 or S2, while the |r| values of the remaining features were all greater than 0.2. The results of the VI feature screening are shown in Table 3. Ultimately, 8 of the 10 RGB-VI features and 11 of the 13 S-VI features were selected as the feature combinations for this study.
The selected VI feature combinations were used in the subsequent construction of the classification model. Since the S2 period fell within the fertilization window for the ratoon season, its classification results were used to guide specific topdressing measures. The intuitive color distributions of the field VI features during the S2 period are shown in Figure 6, where Figure 6a illustrates the distribution of the visible light VIs, and Figure 6b illustrates the distribution of the multispectral VIs.
3.2. Results of Deep Learning Training and Deep Feature Extraction from Imagery
To validate the effectiveness of the three network structures described in Section 2.3 for deep feature extraction, 100 rounds of model pre-training were conducted using the same dataset for each structure. The average classification accuracy on the rice variety imagery was used as the evaluation metric, and the results are shown in Table 4. The classification accuracy of the network structure shown in Figure 4d was significantly lower than that of the other two structures, indicating that information was lost during feature dimensionality reduction and that two rounds of reduction compounded this loss, decreasing classification accuracy. The network structures shown in Figure 4b,c showed no significant difference in classification accuracy, indicating that the parameter size of a single FC-4096 structure was fully sufficient for the four-class classification task and did not require additional stacking. In conclusion, considering both the effectiveness of the feature extraction model and the parameter size, this study selected the network structure shown in Figure 4b for training the feature extraction model.
During training, the dataset and hyperparameters constructed earlier were used for 150 rounds of formal training. The training results for each variety during the two periods are shown in Figure 7, and the confusion matrices for the average precision of nutrient deficiency classification in each region are shown in Figure 8. During the S1 period, the average classification accuracy for nutrient deficiency across the six rice varieties was 88.78%, while the optimal classification result using a single VI feature (either R or S alone) during the same period was 87.50%, obtained with the RF classifier. During the S2 period, the average classification accuracy across the six varieties was 84.56%, while the optimal single-VI-feature result during the same period was 90.28%, again with the RF classifier. Compared to machine learning classification based on single VI features, image classification based on the deep learning model achieved similarly high accuracy on the same dataset, indicating that both feature types had already demonstrated good performance in classifying rice nutrient deficiency.

However, VI features and deep image features represent significantly different crop physiological mechanisms and belong to distinct feature types. This is evidenced in Figure 8, where the deep learning approach demonstrated superior performance in both the low-fertilization (N1) and high-fertilization (N4) zones, an advantage attributable to the more pronounced manifestation of color features under extreme fertilization conditions. In contrast, greater classification errors were observed in the medium-fertilization zones (N2 and N3), particularly in the N3 experimental area. Therefore, this study proposed that combining the two feature types could theoretically achieve improved classification accuracy. Using the pre-trained deep neural network described earlier, the parameters computed after the FC-15 layer were saved as a one-dimensional vector containing 15 elements, serving as the deep image features.
3.3. Classification Results Based on VI Features and Fusion Features
During the two periods S1 and S2, the VI features (R, S, and RS) and the fused features (FR, FS, and FRS) were used as feature vectors and input into the machine learning classifiers SVM, RF, and XGBoost to construct classification models. The effectiveness of the fused features relative to the original features was evaluated based on the changes in classification accuracy for nutrient deficiency in the field. The classification accuracy of each classifier is shown in Table 5. The results indicated that the fused features improved classification accuracy to some extent in the models constructed by all three classifiers, although the overall performance of the models varied significantly.
3.3.1. Performance of the SVM, RF, and XGBoost Classifiers
The SVM-based model achieved the lowest accuracy among the three models for most feature combinations during both periods, except for the FR feature combination in S1. Its highest accuracy was 95.83% (S1) and 88.89% (S2) with FR, though these did not surpass the study’s overall best accuracy. The SVM model exhibited lower sensitivity to multispectral VI features, as shown by (1) higher accuracy on R than S in S1, with no improvement from RS; (2) only a 3.61% accuracy increase on S compared to R in S2; and (3) substantial accuracy improvements only with FR (11.39% in S1 and 13.89% in S2), while FS and FRS showed minimal gains (approximately 1%).
The RF model consistently achieved the highest accuracy across most feature combinations, except for FR, where it slightly underperformed (94.72% in S1 and 90.72% in S2). Its sensitivity was aligned with crop growth mechanisms, as follows: (1) In S1, high accuracy on R (86.11%) and FR (94.72%) reflected visible light color differences during vigorous growth, while the model outperformed SVM on multispectral VI features (S and RS). (2) In S2, as visible light color characteristics weakened and multispectral VI features strengthened, the RF model's accuracy on R and S was 78.33% and 90.72%, respectively, showing a significant gap compared to SVM. (3) Deep image fusion features (FR, FS, and FRS) significantly improved accuracy, with the improvements most significant for FR (8.61% in S1 and 12.39% in S2), followed by FS (8.89% in S1 and 3.61% in S2) and FRS (6.67% in S1 and 4.34% in S2).
The XGBoost model performed similarly to RF with fused features, achieving notable accuracy improvements in S1 (FR: 8.33%, FS: 14.58%, FRS: 11.66%) and S2 (FR: 17.08%, FS: 6.5%, FRS: 5.42%). However, its accuracy on non-fused features was lower than RF's in both S1 (by 3.19% on R, 5.83% on S, and 5.41% on RS) and S2 (by 3.33% on R, 4.70% on S, and 3.47% on RS). With fused features, XGBoost nearly matched RF, with FS and FRS accuracies within approximately 2% during S2.
The F1-score analysis demonstrated that the RF model consistently outperformed SVM and XGBoost across most feature combinations, achieving the highest scores (F1-score: 0.808–0.976). Although XGBoost exhibited comparable performance (F1-score: 0.732–0.974), it underperformed with simpler features and showed greater performance variability. SVM achieved the least favorable results overall (F1-score: 0.708–0.949), demonstrating significant limitations when handling complex feature sets. Consequently, RF exhibited superior robustness in managing diverse feature combinations, establishing it as the optimal classification model for this study.
3.3.2. Comprehensive Comparison of the Three Machine Learning Classifiers and Determination of the Optimal Model
The results are visually represented in Figure 9, illustrating the performance differences among the three machine learning models across the various feature combinations. As shown in the bar charts (Figure 9a,c), incorporating deep image features improved classification accuracy for nutrient deficiency, with an average increase of 7.52% across both periods; excluding the SVM model, the average improvement with fused features rose to 9.01%. This trend is further reflected in the radar charts (Figure 9b,d), where the points representing fused feature accuracy clustered closer to the best classification accuracy (blue border) than those of the non-fused features.
The RF and XGBoost models exhibited similar overall performance, with RF slightly outperforming XGBoost and both surpassing SVM. This is evident in the radar charts, where the areas enclosed by RF and XGBoost accuracy points were larger and closer to the best accuracy region than those of SVM.
The proposed method, combining VI and deep image features, achieved optimal performance during both S1 and S2 periods. The highest classification accuracy was observed on the FRS feature, reaching 97.50% (RF, S1) and 96.56% (RF, S2). Consequently, the RF model was selected for subsequent field nutrient deficiency identification and topdressing strategy formulation during the ratoon season.
3.4. Field Nutrient Deficiency Diagnosis and Topdressing Strategies Based on Optimal Classification Model
In the previous sections, the dataset and classification models were constructed using ratoon rice varieties V3, V4, and V6, as shown in the white dashed box in Figure 10a. The optimal RF model, trained in Section 3.3, was applied to classify the nutrient deficiency conditions of varieties V1, V2, and V5. Although the experimental design divided the field into four fertilization zones (N1–N4), the actual nutrient deficiency distribution deviated from this design due to factors such as uneven light exposure, elevation differences, and fertilizer mixing. The classification results therefore provided a critical basis for precise topdressing strategies.
The nutrient deficiency classification results for V1, V2, and V5 are shown in Figure 10a, with colors representing deficiency levels: red (75%), orange (50%), yellow (25%), and blue (no deficiency). Variety V5 aligned well with the original fertilization gradients, demonstrating superior fertilizer utilization efficiency by outperforming the original nutrient conditions in multiple grid areas. In contrast, V1 and V2 showed less alignment with the experimental design but still exhibited a trend of decreasing deficiency from N1 to N4; both varieties displayed poorer fertilizer utilization efficiency, falling behind the original nutrient conditions in more grid areas.
The topdressing results for V1, V2, and V5 are shown in Figure 10b, with fertilization amounts (75%, 50%, and 25% of the main season rate) corresponding to the deficiency levels. To emphasize the topdressing effects, no additional fertilization was applied to the N3 zone. Each treatment zone was divided into four grid areas, with supplementary fertilization calculated as 1/4 of the standard main season rate. The ratoon season bud-promoting fertilizer was applied at the standard rate to the N1, N2, and N4 zones, while N3 received 75% of the standard rate, consistent with the main season strategy. The supplementary amounts were mixed with the bud-promoting fertilizer and applied accordingly (Table 6).
3.5. Evaluation of Topdressing Effects
After applying the topdressing strategy described above to the rice varieties in the N1, N2, and N4 treatment zones, field sampling was conducted before the ratoon season harvest to measure four traits: tiller number, seed setting rate, thousand-grain weight, and grain area. These values were then compared with the trait values measured at the main season harvest. The specific changes in trait values for each variety across the two seasons are shown in Figure 11a–d; within each panel, from left to right, the plots display the relevant information for varieties V1, V2, and V5, with solid lines representing the main season and dash-dot lines representing the ratoon season.
From the main season perspective, the four traits (tiller number, grain area, seed setting rate, and thousand-grain weight) exhibited consistent sensitivity to nutrient deficiency across all varieties. Tiller number showed the highest sensitivity, with significant variation across the N1–N4 zones, followed by grain area, which displayed a clear but less-pronounced trend. Seed setting rate and thousand-grain weight were the least sensitive, with no significant differences across zones. Despite these variations, all traits generally increased from N1 to N4, indicating that greater nutrient deficiency corresponded to poorer growth.
The topdressing strategy demonstrated significant effectiveness for variety V5. The N1 and N2 zones, which received additional fertilization, showed substantial trait improvements compared to the non-fertilized N3 zone, as detailed in Table 7. The N4 zone, which received standard fertilization without additional topdressing, achieved the highest values in all four traits, reflecting its efficient fertilizer utilization. Variety V5 in N4 also showed more significant improvement relative to the main season than in N3.
Variety V2 exhibited topdressing effectiveness in three traits: tiller number, thousand-grain weight, and grain area, while seed setting rate remained unchanged. Owing to its lower fertilizer utilization efficiency compared to V5, V2 showed significant improvements in tiller number and grain area under limited fertility, aligning with the main season sensitivity trend; however, only thousand-grain weight improved among the yield-related traits. The N4 zone did not achieve the highest values in all traits, as insufficient original fertility prevented optimal growth and it lacked the additional fertilization applied to N1 and N2. Nevertheless, it still outperformed N3 relative to the main season.
Variety V1 showed topdressing effectiveness only in grain area, with N1 and N2 underperforming compared to the non-fertilized N3 in the other three traits. V1’s poor fertilizer utilization efficiency indicated that conventional topdressing could not effectively correct nutrient deficiency under extremely insufficient fertility, as field conditions could not be restored quickly through additional fertilization.
Additionally, the improvement rates of varieties V1, V2, and V5 in the four traits during the main and ratoon seasons relative to the non-fertilized N3 zone are listed in Table 7, including the average improvement rates for the N1–N2 zones and the improvement rate for the N4 zone relative to N3. An improvement rate in the ratoon season higher than that in the main season indicated the effectiveness of the topdressing strategy. Among the 24 comparison groups, 16 showed higher improvement rates in the ratoon season than in the main season. Excluding variety V1, 13 of the 16 remaining comparison groups showed higher improvement rates in the ratoon season; among these 13 groups, 12 showed improvements of more than 5%, 9 of more than 10%, and 6 of more than 20%.