1. Introduction
As an essential global economic crop, cotton is pivotal to the international textile industry owing to its abundant yield and low production costs [1]. China, one of the world's largest cotton producers, accounts for about 30% of global output, and Xinjiang cotton is renowned worldwide for its soft, long fibers, good glossiness, and high strength; Xinjiang alone accounts for 92.4% of national cotton production. However, cotton diseases pose a severe challenge to the cotton industry: they not only threaten crop health but may also cause significant economic losses [2]. Traditional disease identification and inspection rely on the manual work of experienced farmers, which is time-consuming and labor-intensive, limited in accuracy, and difficult to scale to the precision demands of modern agricultural production [3].
With the vigorous development of artificial intelligence technology, its application in crop disease recognition has opened up new horizons [4]. Traditional manual detection methods are gradually being replaced by intelligent recognition technologies, which offer significant advantages in automation and accuracy, and intelligent recognition technology for cotton diseases is developing rapidly [5]. Because cotton is an important economic crop, early diagnosis of its diseases is crucial for ensuring yield and quality. Leaves show the most intuitive pathological features of cotton diseases and therefore provide rich information for deep learning and computer vision methods. By feeding images of diseased cotton leaves into a deep learning model, disease types can be analyzed and recognized automatically, which markedly improves both recognition accuracy and the efficiency of the recognition process [6]. This artificial intelligence-based approach to cotton disease recognition reduces labor costs in large-scale farm management while steadily improving the accuracy and response speed of disease recognition, making it the preferred method in the current cotton disease recognition field [7,8].
Pechuho et al. [9] developed a cross-platform recognition model using machine learning methods on a cotton dataset, achieving an accuracy of up to 90%. Sujatha et al. [10] experimentally compared machine learning algorithms (random forest, support vector machine, and stochastic gradient descent) with deep learning algorithms on a cotton disease dataset; their results show that VGG-16 outperforms the other comparative models, with an accuracy of 89.5%. Caldeira et al. [11] collected a dataset of 60,000 cotton leaf images divided into two categories, healthy and diseased; in experiments with both machine learning and deep learning methods, all models exceeded 70% recognition accuracy, with SVM performing best at 80%. Suriya et al. [12] proposed a cotton disease feature extraction method based on a Convolutional Neural Network (CNN) that combines convolutional and max pooling layers and developed a corresponding classification model; however, its accuracy in practical applications still leaves room for improvement and struggles to meet the high standards of production practice. To address this issue, Zambare et al. [13] further optimized the model structure by introducing dense layers and a flattening operation into the CNN, significantly improving performance; the model achieves a recognition accuracy of 99% on public datasets. Zekiwos et al. [14] developed an efficient CNN-based cotton pest and disease detection mechanism and tested it on a cotton dataset of 2400 images divided into four categories (leaf miner, spider mite, healthy, and bacterial blight leaves), achieving a recognition accuracy of 98%. Jajja et al. [15] proposed a network model based on the Compact Convolutional Transformer (CCT) and created a cotton disease dataset of 5135 images, on which they achieved a high accuracy of 97.25%. However, the studies above are mainly based on stable environments and may not achieve ideal recognition results in actual production. Rai et al. [16] therefore improved deep convolutional neural networks and studied different segmentation ratios and pooling layers, achieving a high recognition accuracy of 97.98% on a cotton disease dataset with complex backgrounds.
Pankaj et al. [17] found that combining IoT technology with CNNs can effectively support cotton disease identification, and developed a predictive model that helps farmers identify diseases accurately and obtain treatment suggestions through mobile devices. Shao et al. [18] found that a bilinear coordinate attention mechanism helps the model focus on the diseased area while reducing the loss of feature information and the interference of redundant information; based on this idea, they integrated the mechanism into a CNN and proposed a deep learning model that accurately identifies cotton diseases in natural environments. Although these studies have made significant progress in cotton disease identification, the models' high demand for computing resources has limited their adoption in practical production. To address this challenge, Gao et al. [19] combined Transformer technology with a knowledge graph to design a cotton disease detection model with fast response speed and high recognition accuracy that meets the needs of practical applications.
The studies above give a brief overview of how artificial intelligence technology has developed in cotton disease identification in recent years. Research on other crops can also inform the development of intelligent cotton disease recognition technology.
Xu et al. [20] analyzed the limitations of existing grading methods for identifying crop diseases and proposed a new two-stage approach, which first classifies potato leaves and then accurately segments the leaves, diseased regions, and background in the resulting images; it achieved a high recognition accuracy of 97.86% on publicly available potato datasets. The self-attention mechanism is the core of the Transformer architecture. Guo et al. [21] proposed a Convolutional Swin Transformer (CST) method based on the Swin Transformer, which effectively addresses the low accuracy of crop disease recognition in noisy environments. Zeng et al. [22] used a self-attention mechanism to focus on the features of important image regions and developed a convolutional neural network that integrates self-attention; the network achieved high recognition accuracies of 95.3% and 98% on the AES-CD9214 and MK-D2 datasets, respectively. Lee et al. [23] found that CNNs can extract features from diseased areas while also exploiting image backgrounds and healthy areas for classification; based on this observation, they proposed a model built on a recurrent neural network (RNN), which their experimental results show to be more robust and better at generalizing than traditional CNN methods. Wang et al. [24] significantly improved crop disease recognition accuracy, especially for fine-grained classification, by integrating the Convolutional Block Attention Module (CBAM) into convolutional neural networks.
Although existing research has made progress in stable environments, such conditions differ significantly from the complex and changing environments of actual agricultural production [25]. In practical applications, the diversity of cotton diseases and their interaction with environmental factors place higher demands on disease identification [26]. For example, the yellow or yellow-brown spots of cotton leaf blight are easily confused with soil colors in natural environments, a source of confusion that does not arise in stable-environment experiments and that challenges the model's recognition ability. Although some recent studies have addressed cotton disease identification under complex background conditions, problems remain. First, the recognition accuracy reported by most studies cannot meet the high requirements of everyday production. Second, although Rai et al. effectively identified cotton diseases against complex backgrounds, they relied on a large dataset and a model with a large number of parameters, which imposes high hardware requirements for deployment and prevents widespread use among ordinary farmers. Developing a lightweight cotton disease recognition model that maintains high recognition accuracy against complex backgrounds is therefore particularly urgent.
To address the above challenges, this study proposes an innovative solution, the CANnet model, to improve the accuracy and efficiency of cotton disease identification in complex backgrounds. The CANnet model is built on the ResNet [27] architecture and integrates the Reception Field Space Channel (RFSC) module, the Precise Coordinate Attention (PCA) module, and the Kolmogorov Arnold Networks-s (KANs) mechanism. The RFSC module addresses the parameter-sharing problem of traditional convolution by fusing dynamic receptive-field features with conventional convolution features, and it introduces a channel attention mechanism to combine spatial and channel attention effectively. The PCA module further uses the Spatial Redundancy Unit (SRU) and Channel Redundancy Unit (CRU) to optimize the spatial-channel attention produced by the RFSC module, effectively reducing redundancy in the feature representation. In this study, we apply the Kolmogorov Arnold Networks (KAN) [28] concept to crop disease identification for the first time and make lightweight improvements to it, forming the KANs mechanism. The KANs mechanism replaces fixed activation functions with learnable ones, largely eliminating the dependence on linear weight matrices and improving the model's recognition and classification capabilities in complex backgrounds. As a result, CANnet demonstrates significant advantages in identifying cotton diseases in complex backgrounds while remaining lightweight (a minimal structural sketch of the pipeline follows the contribution list). The main contributions of this study are as follows:
We innovatively propose a network model called CANnet, which achieves high recognition accuracy on two cotton disease datasets (composed mainly of diseases, with a small number of pest categories).
We have designed an innovative PCA module that optimizes the performance of spatial channel attention through spatial channel redundancy reduction technology.
We apply the KAN concept to crop disease identification for the first time and make lightweight improvements based on it, resulting in KANs. KANs help the classifier part of the network adapt better to images with complex backgrounds and achieve better classification results.
We independently collected and created a cotton dataset in Xinjiang, which includes 792 images and covers various diseases.
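As a rough illustration of the pipeline described above, the following PyTorch sketch assembles a ResNet-18 backbone with the RFSC, PCA, and KANs components in the order they are applied. The module bodies (RFSCBlock, PCAModule, KANHead) are simplified, hypothetical stand-ins written for illustration only; they are not the actual CANnet implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18


class RFSCBlock(nn.Module):
    """Hypothetical stand-in for the RFSC module: a 3x3 convolution followed by a
    squeeze-and-excitation-style channel gate, marking where dynamic receptive-field
    features and channel attention would be fused."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 16, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 16, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.conv(x)
        return x * self.gate(x)


class PCAModule(nn.Identity):
    """Hypothetical stand-in for the PCA (spatial/channel redundancy reduction) module."""


class KANHead(nn.Module):
    """Stand-in for the KANs classifier head: a plain MLP is used here, whereas the
    real KANs head replaces fixed activations with learnable ones."""

    def __init__(self, dims):
        super().__init__()
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(d_in, d_out), nn.SiLU()]
        self.net = nn.Sequential(*layers[:-1])  # no activation after the logits

    def forward(self, x):
        return self.net(x)


class CANnetSketch(nn.Module):
    """Illustrative data flow only: ResNet-18 features -> RFSC -> PCA ->
    global average pooling -> KAN-style classifier head."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        backbone = resnet18(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # 512-channel maps
        self.rfsc = RFSCBlock(512)
        self.pca = PCAModule()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = KANHead((512, 128, num_classes))  # (512, 128, 3), as in Section 3.4.2

    def forward(self, x):
        x = self.pca(self.rfsc(self.features(x)))
        return self.head(self.pool(x).flatten(1))


if __name__ == "__main__":
    model = CANnetSketch(num_classes=3)
    logits = model(torch.randn(1, 3, 224, 224))
    print(logits.shape)  # torch.Size([1, 3])
```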
3. Results and Discussion
3.1. Experimental Environment
The model’s construction, training, and testing were carried out on a server running the Linux operating system. The server is equipped with an NVIDIA A40 graphics card, NVIDIA-SMI version 460.106.00, driver version 460.106.00, CUDA version 11.2, Python version 3.7, PyTorch version 1.13.1, and Torchvision version 0.14.1.
3.2. Experimental Settings
We selected three key performance indicators for a comprehensive comparative evaluation of the model: Top1 Acc, Precision, and F1 score. True positive (TP) denotes the number of diseased leaves correctly identified as diseased; true negative (TN) denotes the number of healthy leaves correctly identified as healthy; false positive (FP) denotes the number of healthy leaves incorrectly identified as diseased; false negative (FN) denotes the number of diseased leaves incorrectly identified as healthy.
Top1 Acc, as a direct measure of model prediction accuracy, reflects the proportion of samples whose predicted label matches the actual label and is applicable to binary as well as multi-class problems. In its standard form based on the counts above, Top1 Acc is calculated as follows:
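$$\mathrm{Top1\ Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$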
Precision measures the proportion of samples predicted as positive that actually belong to the positive category. On datasets with an uneven distribution of positive and negative classes, this indicator reflects the model's ability to recognize minority classes. In its standard form, Precision is defined as:
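$$\mathrm{Precision} = \frac{TP}{TP + FP}$$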
In classification problems, the F1 score, the harmonic mean of Precision and Recall, is particularly valuable when handling imbalanced datasets or when precision and recall must be considered simultaneously. With Recall defined in the standard way as TP/(TP + FN), the F1 score is expressed as:
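$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$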
Through detailed analysis of these indicators, this study aims to deeply understand the model's performance in cotton disease leaf recognition tasks and provide a scientific basis for further model optimization and application. The hyperparameters used in the experiments in this study are shown in Table 2.
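For reference, these metrics can be computed directly with scikit-learn, as in the short example below. The toy label arrays are illustrative only, and macro averaging is assumed for Precision and F1, since the averaging scheme is not stated in the text.

```python
from sklearn.metrics import accuracy_score, precision_score, f1_score

# Toy example for a 3-class problem
# (e.g., 0 = Bacterial blight, 1 = Healthy, 2 = Target spot).
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 2, 0, 1]

top1_acc = accuracy_score(y_true, y_pred)                     # Top1 Acc
precision = precision_score(y_true, y_pred, average="macro")  # macro-averaged Precision
f1 = f1_score(y_true, y_pred, average="macro")                # macro-averaged F1 score

print(f"Top1 Acc:  {top1_acc:.3f}")
print(f"Precision: {precision:.3f}")
print(f"F1 score:  {f1:.3f}")
```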
3.3. Model Comparison Experiment
We validate the effectiveness of our proposed method on cotton datasets with complex backgrounds by comparing CANnet with other advanced models. The comparison models include AlexNet [32], SqueezeNet [33], MobileNetV2 [34], and MobileNetV3 [35], which are based on convolutional neural networks; Pooling-based Vision Transformer (PiT) [36], CF-ViT [37], Transformer in Transformer (TNT) [38], LeViT [39], CoAtNet [40], Vision Transformer (ViT) [41], and Swin Transformer [42], which are based on the Transformer; and CvT [43] and Mobile-Former [44], which combine CNN and Transformer architectures. The experimental results are shown in Table 3 and Table 4.
The CANnet model performs well on both independent datasets, surpassing existing deep convolutional neural networks (CNNs). Specifically, on the self-built dataset with limited training samples, CANnet achieved 0.963, 0.971, and 0.961 on the three key evaluation metrics Top1 Acc, Precision, and F1 score, respectively, improvements of 4.8%, 4.3%, and 4.2% over the best-performing comparison model, MobileNetV2. This result confirms that CANnet is superior to traditional convolutional neural networks in utilizing spatial self-attention and channel attention and that it trains and tests more robustly on small datasets. On the public dataset, CANnet's performance is even better, with the three metrics reaching 0.986, 0.988, and 0.986, improvements of 6.1%, 6%, and 6.4% over MobileNetV2, respectively. These figures indicate that CANnet's recognition advantage is even more pronounced on the larger dataset, with higher accuracy and relatively larger gains. Compared with the traditional convolutional neural networks with weaker recognition performance (AlexNet, SqueezeNet, and MobileNet), CANnet achieved maximum improvements of 8.6% and 7.5% on the two datasets, respectively. This substantial improvement demonstrates the strength of CANnet's attention mechanism and KANs classification technique in feature extraction and classification: these techniques not only improve recognition accuracy over traditional CNNs but also enhance the model's ability to capture essential information. The experimental results further validate the efficiency and reliability of CANnet for everyday production needs.
Compared with advanced Transformer-based models, CANnet's performance on the self-built dataset is particularly remarkable: relative to the CF-ViT model, it improves the evaluation metrics by 9.8%, 9.8%, and 9.6%, respectively, a larger margin than that over the CNN models. On the public cotton disease dataset, CANnet improves on the LeViT model by 6.3%, 5.7%, and 6.2%, respectively. These results show intuitively that the spatial-channel attention used by CANnet suits agricultural datasets better than a Transformer architecture with self-attention. We attribute the Transformer architectures' weaker performance on the two datasets to their large parameter requirements and possible attention saturation.
A recent research trend is to integrate CNN and Transformer architectures for better performance. Comparing CANnet with Mobile-Former, the best-performing fusion model, we found that CANnet improves performance by 5.2%, 4.8%, and 4.9% on the self-built dataset, and by 5.5%, 4.5%, and 5% on the public dataset, respectively. Our analysis suggests that fusion architectures fall short of CANnet on small-scale agricultural datasets, possibly because they fail to learn sufficiently effective features from the data, resulting in lower recognition accuracy.
Figure 6 presents the performance of CANnet on two independent datasets in the form of a confusion matrix.
Figure 7 compares the clustering of the high-dimensional feature spaces of CANnet, the backbone network, and the current best comparison model. From these plots we can observe CANnet's advantage in maintaining intra-class compactness and inter-class separation. CANnet's design integrates advanced feature learning (low-redundancy spatial-channel attention) and classification techniques (KANs), which improves the model's generalization ability and ensures stable performance in complex environments. These results offer a new perspective for the field of agricultural disease identification.
3.4. Ablation Experiment
We conducted ablation experiments on the two cotton disease datasets using consistent random seeds and the same experimental configuration, and we also ran ablation experiments on the hidden layer of KANs. In this section, we mainly analyze the results through the Top1 Acc indicator; ‘w/o’ denotes the removal of a module.
3.4.1. Dataset Ablation Experiment
Table 5 shows the results of the ablation experiment on the self-built dataset, and Table 6 shows the results on the public dataset. The specific analysis is as follows:
Specifically, ‘w/o PCA + KANs’ means that the PCA and KANs modules have been removed from the CANnet architecture, leaving only the RFSC module. Under this configuration, the accuracy on the two datasets reached 0.938 and 0.944, respectively, 3.7% and 3.3% higher than the baseline model. This result indicates that the RFSC module effectively improves the model's disease recognition ability by integrating traditional convolution with dynamic receptive-field kernels and incorporating a channel attention mechanism.
In the experimental setup of ‘w/o RFSC + KANs’, the model only uses the PCA module and does not include the RFSC and KANs modules. At this point, the accuracy on both datasets improved to 0.911 and 0.937, achieving performance gains of 1% and 2.6%, respectively, compared to the baseline model. This result reveals the performance of the PCA module when used independently. In the ‘w/o KANs’ experiment, the KANs module was removed, while the RFSC and PCA modules were retained. In this case, the accuracies on the two datasets were 0.952 and 0.968, respectively, which improved by 5.1% and 5.7% compared to the baseline model and by 1.4% and 2.4% compared to using only the RFSC module. This further confirms the effectiveness of the PCA module in spatial channel redundancy reduction and its essential role in improving the feature extraction of the RFSC module.
The ‘w/o RFSC + PCA’ experiment verifies the KANs module's independent contribution by removing the RFSC and PCA modules. The results show accuracy improvements of 1.5% and 2.2% on the self-built and public datasets, respectively. This fully demonstrates that, in crop disease recognition, adapting the learnable activation function concept of KAN within ResNet can effectively help the model recognize and classify better. The ‘w/o PCA’ and ‘w/o RFSC’ experiments removed the PCA module and the RFSC module, respectively, to evaluate the KANs module under different configurations. The results show accuracies of 0.947/0.962 and 0.949/0.951 on the two datasets, performance improvements of 4.6%/5.1% and 4.8%/4%, respectively. These results indicate that the KANs module improves recognition accuracy on its own and maintains excellent performance when combined with the PCA and RFSC modules.
3.4.2. KANs Ablation Experiment
We applied the KAN concept to crop disease identification for the first time and achieved good results. The hidden layers in KANs enhance the expressive power of the model while controlling its complexity. To select the most suitable parameter configuration for this study, we conducted a series of ablation experiments on the hidden layer, adjusting its parameters one by one to evaluate the specific impact of each configuration. The experimental results are shown in Table 7 and Table 8.
From Table 7, it can be seen that when the hidden layer parameters are set to (512,3) and (512,256,3), the Top1 accuracy reaches 0.958 and 0.957, respectively. Compared with the optimal configuration (512,128,3), performance drops by only 0.5% and 0.6%, and the accuracy remains higher than that of the model without the KANs module. However, when the hidden layer parameters are further expanded to (512,256,128,3), the Top1 accuracy falls to 0.935, 2.8% below the optimal configuration. This indicates that an overly large configuration can suppress the model's performance. Combined with the results on the public dataset, we speculate that the accuracy drop caused by excessive hidden layer parameters is related to the small size of the self-built dataset, which makes the model prone to overfitting.
On the public dataset (Table 8), the Top1 accuracy with the (512,3) and (512,256,3) configurations was 0.975 and 0.971, respectively, a degradation of 1.1% and 1.5% relative to the optimal configuration. This decrease is larger than on the self-built dataset, which further supports the conclusion that KANs perform better on larger datasets. With the hidden layer parameters set to (512,256,128,3), the Top1 accuracy is 0.954, which is 1.4% lower than the model without the KANs mechanism.
The above experimental results show that (512,128,3) is the most effective hidden layer configuration. The number of hidden layer parameters is closely related to the dataset's sample size: a smaller configuration suits small datasets, while for large datasets the number of parameters can be increased appropriately to take full advantage of KANs' modeling capacity.
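To make the hidden-layer notation concrete, the sketch below shows how a tuple such as (512, 3), (512, 128, 3), or (512, 256, 128, 3) can be expanded into a stack of layers and how the parameter count grows with the configuration. A plain nn.Linear + SiLU pair is used as a stand-in for a KAN layer with learnable activations; the actual KANs layer is not reproduced here.

```python
import torch.nn as nn

def build_head(dims):
    # Expand a hidden-layer tuple, e.g. (512, 128, 3), into a stack of layers;
    # nn.Linear + SiLU stands in for a KAN layer with learnable activations.
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        layers += [nn.Linear(d_in, d_out), nn.SiLU()]
    return nn.Sequential(*layers[:-1])  # no activation after the final logits

# Parameter counts for the four configurations compared in Tables 7 and 8:
# larger tuples add capacity, which a small dataset may not be able to support.
for cfg in [(512, 3), (512, 128, 3), (512, 256, 3), (512, 256, 128, 3)]:
    n_params = sum(p.numel() for p in build_head(cfg).parameters())
    print(cfg, n_params)
```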
3.5. Qualitative Analysis
This section analyzes how effectively CANnet, the backbone network, and the best-performing comparison model identify specific disease types on the two datasets.
Based on the experimental results in Figure 8, we observed that CANnet performs excellently in all category recognition tasks. Specifically, CANnet's accuracy on the Bacterial blight and Healthy categories is 0.94 and 0.96, respectively, and its recognition accuracy on the Target spot category reaches 1. These results highlight CANnet's high accuracy and strong performance on the self-built dataset. In contrast, ResNet18 achieved recognition accuracies of 0.91, 0.93, and 0.86 on the same dataset. Although ResNet18 performs well on the Bacterial blight and Healthy categories, its accuracy on the Target spot category is 14% lower than CANnet's, a gap that may be attributed to ResNet18's limitations in identifying small targets. MobileNetV2, as a lightweight model, also achieves a high accuracy of 0.94 on the Bacterial blight category, but its accuracy on the Healthy and Target spot categories is 7% and 10% lower than CANnet's, respectively, indicating insufficient recognition ability for these categories. From these results we conclude that ResNet18 and MobileNetV2 differ significantly from CANnet in recognition accuracy. We speculate that this difference is partly due to the rich background information in the dataset, which interferes with feature extraction and weakens the models' recognition ability.
As shown in Figure 9, our analysis demonstrates CANnet's outstanding performance on the public dataset. Specifically, CANnet achieved a recognition accuracy of 1 on the Army worm, Healthy, and Powdery mildew categories and of 0.94 or higher on the other three categories, demonstrating extremely high classification ability. Compared with the self-built dataset, the public dataset has a larger sample size, which provides richer learning material for the innovative spatial-channel attention mechanism integrated into CANnet; the model can therefore capture features from many samples more effectively and achieve higher recognition accuracy. Although CANnet exhibits high accuracy across all categories, ResNet18 and Mobile-Former also perform well in specific categories: for Army worm, Bacterial blight, Healthy, and Powdery mildew, both models exceed 0.9, comparable to CANnet. However, ResNet18's accuracy drops to 0.87 on the Aphids category, and on the Target spot category ResNet18 and Mobile-Former reach only 0.87 and 0.88, respectively. Together with the results on our self-built dataset, where ResNet18 and MobileNetV2 also showed lower accuracy on the Target spot category, we speculate that the subtle visual differences between Target spot and the other categories increase the difficulty of feature extraction for these models and reduce their recognition accuracy. In summary, CANnet's performance on the public dataset further confirms its superiority in cotton disease identification tasks.
3.6. Visualization Analysis
We used the Grad-CAM [45] method to visualize the attention regions of the proposed CANnet model and compared them with those of the baseline and the best-performing comparison models.
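For readers who want to reproduce this type of visualization, the sketch below is a minimal, from-scratch Grad-CAM implementation using PyTorch hooks; it is not the exact pipeline used in this study. The target_layer argument is assumed to be a convolutional stage of the inspected model (for ResNet-18, for example, layer4).

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx=None):
    """Minimal Grad-CAM: weight the target layer's activations by the spatially
    averaged gradients of the chosen class score, then apply ReLU and normalize."""
    activations, gradients = {}, {}

    def fwd_hook(module, inputs, output):
        activations["value"] = output

    def bwd_hook(module, grad_input, grad_output):
        gradients["value"] = grad_output[0]

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)
    try:
        model.eval()
        logits = model(image)                       # image: (1, 3, H, W)
        if class_idx is None:
            class_idx = logits.argmax(dim=1).item()
        model.zero_grad()
        logits[0, class_idx].backward()

        acts = activations["value"]                 # (1, C, h, w)
        grads = gradients["value"]                  # (1, C, h, w)
        weights = grads.mean(dim=(2, 3), keepdim=True)      # channel importance
        cam = F.relu((weights * acts).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear",
                            align_corners=False)
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
        return cam.squeeze().detach()               # (H, W) heatmap in [0, 1]
    finally:
        h1.remove()
        h2.remove()
```

The resulting heatmap can then be overlaid on the input image to produce visualizations in the style of Figure 10 and Figure 11.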
As shown in Figure 10, comparing the attention regions of the three models on the self-built dataset reveals that the ResNet model attends to too few regions and extracts features insufficiently, while complex backgrounds disrupt MobileNetV2's attention so that it fails to focus effectively on diseased areas. Compared with these two models, CANnet performs excellently in both the accuracy of its attention regions and its handling of complex backgrounds. For both disease categories, CANnet focuses accurately on the affected area rather than on complex backgrounds or healthy tissue; in the Healthy category, CANnet focuses entirely on the leaf, forming a clear boundary with the background.
Figure 11 shows that CANnet has significant advantages over the other two models in terms of the size and depth of its attention regions. ResNet's attention was not focused on the relevant regions in the Army worm, Healthy, and Powdery mildew categories, and its performance on the other three disease types was also insufficient. Although Mobile-Former outperforms ResNet in some respects, a significant gap remains compared with CANnet, especially in attention to diseased areas.
Based on the above analysis, CANnet exhibits better attention concentration in complex backgrounds than the other two models, effectively overcoming background interference while extracting features from diseased areas. We have ample reason to believe that the innovative spatial-channel attention mechanism in CANnet is the key factor in solving these problems.