Article

ICS-ResNet: A Lightweight Network for Maize Leaf Disease Classification

1 The College of Information Science and Technology, Gansu Agricultural University, Lanzhou 730070, China
2 The School of Cyber Science and Engineering, Ningbo University of Technology, Ningbo 315211, China
* Authors to whom correspondence should be addressed.
Agronomy 2024, 14(7), 1587; https://doi.org/10.3390/agronomy14071587
Submission received: 3 July 2024 / Revised: 17 July 2024 / Accepted: 18 July 2024 / Published: 21 July 2024
(This article belongs to the Section Pest and Disease Management)

Abstract
The accurate identification of corn leaf diseases is crucial for preventing disease spread and improving corn yield. Plant leaf images are often affected by factors such as complex backgrounds, climate, light, and imbalanced sample data. To address these issues, we propose a lightweight convolutional neural network, ICS-ResNet, based on ResNet50. This network incorporates improved spatial and channel attention modules as well as a depth-separable residual structure to enhance recognition accuracy. (1) The residual connections in the ResNet backbone prevent vanishing gradients during deep network training. (2) The improved channel attention (ICA) and spatial attention (ISA) modules fully utilize semantic information from different feature layers to accurately localize the network's key features. (3) To reduce the number of parameters and lower computational costs, we replace traditional convolutions with a depth-separable residual structure. (4) We also employ cosine annealing to dynamically adjust the learning rate, improving training stability and model convergence and preventing convergence to local optima. Experiments on the corn dataset in Plant Village compare the proposed ICS-ResNet with eight popular networks: CSPNet, InceptionNet_v3, EfficientNet, ShuffleNet, MobileNet, ResNet50, ResNet101 and ResNet152. ICS-ResNet achieves an accuracy of 98.87%, which is 5.03%, 3.18%, 1.13%, 1.81%, 1.13%, 0.68%, 0.44% and 0.60% higher than the respective networks. Furthermore, the number of parameters and the computations are reduced by 69.21% and 54.88%, respectively, compared to the original ResNet50 network, significantly improving the efficiency of corn leaf disease classification. The study provides strong technical support for sustainable agriculture and the promotion of agricultural science and technology innovation.

1. Introduction

Food security is crucial to national economic and social development and forms an essential basis for national security [1]. Since 2012, maize has surpassed rice as the largest grain crop in China, accounting for 40% of the total grain output for many years [2]. As a vital grain, energy feed, and industrial raw material, maize plays a key role in ensuring “basic self-sufficiency in grain” and is closely linked to national food security. Therefore, preventing and treating maize leaf pests and diseases is critical for improving maize yield and quality. The accurate identification and characterization of pests and diseases enable timely pesticide application, reducing economic losses and enhancing yields.
Traditional methods for identifying and detecting crop pests and diseases rely on manual observation, which may be influenced by the subjective judgment of farmers or agricultural experts, leading to inconsistent results [3]. In addition, laboratory tests are commonly used for pest and disease diagnosis, offering more accurate analytical results. However, these methods often require expensive equipment, complex operational procedures, and significant time, limiting their widespread use in large-scale or real-time testing. With advancements in computer performance and computational power, computer vision and deep learning have rapidly developed, and convolutional neural networks (CNNs) have become essential tools for crop identification [4]. These neural network models can mimic the human visual system to achieve high-precision crop recognition by automatically learning and extracting features from images. For example, Huang Zhiquan et al. [5] used the MobileNetv3 network model to identify plant pests and diseases, achieving over 90% accuracy on the validation set and deploying it to a WeChat mini-program. Xiong Mengyuan et al. [6,7,8] introduced the CBAM [9] attention mechanism, modified the convolutional kernel size, used the SeLU activation function with Alpha Dropout, and made other improvements based on the ResNet network structure, achieving recognition accuracies of 97.5%, 98.3%, and 94.2% on the corn leaf disease dataset, respectively. Liu Yutong [10] proposed a fine-grained recognition method based on Transformer [11] to address the difficulty of recognizing different diseases among similar plants. Zhao Yue et al. [12] proposed a deep learning-based potato leaf disease detection method using the Faster R-CNN [13] network model, which achieved good recognition results. Xiang Xiaodong et al. [14] developed a plant pest and disease recognition method based on the Xception-CEMs neural network, which used multi-scale deep convolution with channel-assigned weights and group convolution to enhance spatial and channel feature extraction, achieving 91.9% accuracy. Wu Kui et al. [15] utilized EfficientNetV2 [16] as a feature extraction network, introduced the DeepViT [17] algorithm to modify network channels, and incorporated a feature fusion network for maize pest and disease recognition, achieving an average recognition accuracy of 97.72%. Pan Chenlu et al. [18] proposed a rice image recognition model, GE-DenseNet (G-ECA DenseNet), that combines the ECA (Efficient Channel Attention) attention mechanism with DenseNet201 and adopts the Focal Loss function to address class imbalance, achieving 83.52% accuracy. Zhang Guozhong et al. [19] developed a lotus leaf pest and disease recognition model based on improved DenseNet [20] and transfer learning, incorporating a branching structure to enhance shallow feature extraction, the Squeeze and Excitation attention mechanism, and sharpened cosine convolution within the Dense Block and Transition Layer, achieving 91.34% accuracy on the Plant Village dataset. These examples demonstrate that deep learning has achieved significant results in plant pest and disease recognition tasks and holds substantial advantages in agriculture.
However, the methods mentioned above for maize leaf pest and disease recognition have several shortcomings: First, most network models, despite some improvements, do not achieve high recognition accuracy. Second, traditional convolutional operations increase the number of parameters as the network layers increase, leading to higher time and computational costs, which affect the model’s deployment and performance. Third, using a fixed learning rate may cause the model to fall into a local optimum, affecting the recognition effectiveness.
To address these problems, we propose the ICS-ResNet lightweight network, which improves the ResNet50 structure by incorporating the improved channel attention module (ICA) and spatial attention module (ISA). These additions enhance the feature distinguishing ability and key information extraction. Traditional convolution operations in each Bottleneck are replaced with a depth-separable residual approach to reduce model complexity. Furthermore, we use cosine annealing for dynamic learning rate adjustment to prevent the model from falling into local optima. Experiments on the corn dataset in Plant Village [21] demonstrate that the improved network model not only enhances the recognition accuracy of maize leaf diseases but also significantly reduces the number of network parameters and computational costs. This provides a valuable reference for future research on plant pest and disease recognition and detection in practical applications.

2. Materials and Methods

2.1. ICS-ResNet Overall Network Architecture

The overall structure of ICS-ResNet, a lightweight network proposed for maize leaf disease classification, is shown in Figure 1. ResNet50 is used as the backbone network and is divided into five stages. STAGE0 preprocesses the input feature map (INPUT) using a convolution operation with a kernel size of 7 × 7 and a stride of 2, batch normalization (BN), and a ReLU activation function, resulting in an output feature map with 64 channels and a spatial size reduced by half. This is followed by the I_CBAM block, which comprises the improved channel attention module (ICA) and the improved spatial attention module (ISA). The MAXPOOL layer then halves the size again using a 3 × 3 max-pooling window with a stride of 2, producing an output feature map with 64 channels, which is passed to the next stage.
STAGE1 to STAGE4 consist of 3, 4, 6, and 3 Bottleneck units, respectively. To reduce network parameters and computational cost, the conventional convolution after the first convolution unit in each Bottleneck is replaced with a depth-separable residual structure, resulting in DSR_BTNK1 and DSR_BTNK2. Here, C denotes the number of input feature map channels, W the dimension size, C1 the number of channels generated during execution, and S the stride. Depthwise convolution (Dw_CONV) performs channel-by-channel convolution, while pointwise convolution (PW_CONV) performs point-by-point convolution.
The improved channel attention module (ICA) and spatial attention module (ISA) are introduced after the first convolution unit in STAGE0 and the last DSR_BTNK2 in STAGE4. This allows the network to fully utilize detailed information from different feature layers, focusing on local key features and suppressing irrelevant regions. The operation mechanisms of the various modules in the network structure are described in detail below.

2.2. ResNet50 Network Architecture

Residual Neural Network (ResNet) is a convolutional neural network with a residual structure [22]. As the number of network layers increases, the network becomes increasingly complex, leading to the “degradation phenomenon”, where training error gradually increases. To address this issue, researchers introduced the residual structure, as illustrated in Figure 2. The core idea is to add a skip connection to the convolutional layers of a traditional convolutional neural network. This allows the nonlinear stack of layers F(X) to approximate the residual of the target function H(X), so that F(X) + X approximates H(X) through the skip connection. The identity mapping added in each residual module lets the network easily propagate shallow inputs into deep layers, preserving the informational integrity of the input data and effectively mitigating the vanishing gradient problem. ResNet50, a deeper residual network, uses Bottleneck residual blocks, as shown in Figure 3.
Figure 2 illustrates the standard residual block (BasicBlock) of the ResNet structure, which contains two convolutional layers and a skip connection and is intended for relatively shallow networks. First, the input feature map X passes through a convolutional layer with a 3 × 3 kernel to extract features. The output feature map is then batch normalized to stabilize training, and the ReLU activation function is applied to introduce nonlinearity. Next, the features pass through a second convolutional layer with a 3 × 3 kernel, again followed by batch normalization and activation. Finally, the skip connection adds the output of the second convolutional layer to the initial input to obtain the output of the BasicBlock. The BasicBlock is suitable for building shallower ResNet models, such as ResNet18 and ResNet34.
Figure 3 shows BottleBlock, another residual block for ResNet, intended for building deeper ResNet models. It contains three convolutional layers and a smaller intermediate dimension to reduce computational complexity. First, the input feature map X passes through a convolutional layer with a 1 × 1 kernel to reduce the dimensionality of the input, lowering computational complexity and improving training and inference efficiency. It then goes through a convolution operation with a 3 × 3 kernel, where the convolved features are batch normalized and activated. Next, another 1 × 1 convolutional layer restores the dimensionality of the feature map to its original size. Finally, like BasicBlock, BottleBlock includes a skip connection that adds the features output from the third convolutional layer to the initial input features. BottleBlock is suitable for building deep ResNet models such as ResNet50, ResNet101, and ResNet152. Each BottleBlock contains three convolutional layers, enabling the more efficient training of deep models by reducing computational complexity, removing redundant information, and lowering the risk of overfitting. In this paper, ResNet50 is used as the backbone network, and its network structure parameters are shown in Table 1.
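To make the structure concrete, the following PyTorch sketch implements a BottleBlock of the kind described above. It is a minimal reconstruction following the standard torchvision convention (the `expansion = 4` factor and the optional `downsample` projection on the skip path), not the authors' released code.

```python
import torch.nn as nn

class BottleBlock(nn.Module):
    """Minimal sketch of a ResNet Bottleneck: 1x1 reduce, 3x3, 1x1 restore, skip add."""
    expansion = 4  # the final 1x1 conv expands channels to 4x the bottleneck width

    def __init__(self, in_ch, width, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, width, kernel_size=1, bias=False)   # 1x1: reduce
        self.bn1 = nn.BatchNorm2d(width)
        self.conv2 = nn.Conv2d(width, width, kernel_size=3, stride=stride,
                               padding=1, bias=False)                     # 3x3: extract
        self.bn2 = nn.BatchNorm2d(width)
        self.conv3 = nn.Conv2d(width, width * self.expansion,
                               kernel_size=1, bias=False)                 # 1x1: restore
        self.bn3 = nn.BatchNorm2d(width * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample  # 1x1 projection when the skip path needs reshaping

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + identity)  # skip connection: F(X) + X
```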

2.3. Improvement of the Spatial Attention Module

The high resolution of the input image provides rich details and semantic information. However, in the original ResNet50, the image is preprocessed in the STAGE0 stage, where a convolution operation with a 7 × 7 kernel is followed by max pooling, which generates the input feature layer for the next stage (STAGE1). This downsampling operation results in the loss of many detailed image features. To address this issue, we propose an improved spatial attention module (ISA), as shown in Figure 4. By using ISA to preprocess the input image, the feature representation of regions of interest is enhanced, and the loss of detailed features after max pooling with downsampling is reduced.
In ISA, the input feature size is 64 × 112 × 112, where 112 × 112 is the spatial size and 64 the number of channels of the feature map produced by the preceding convolutional layer (a 7 × 7 kernel, 64 convolutional kernels, and a stride of 2) followed by ReLU activation. Global maximum pooling and global average pooling are then applied along the channel dimension, each compressing the input to 1 × 112 × 112. The two compressed features are concatenated to obtain a feature map of size 2 × 112 × 112. This map undergoes convolution operations with three kernels of sizes 3 × 3, 1 × 1, and 3 × 3, respectively, yielding a spatial attention map of size 1 × 112 × 112 that encodes spatial feature information. Finally, this map is activated by the sigmoid function and multiplied element-wise with the initial input features to produce output features of size 64 × 112 × 112, as shown in Equation (1).
$$O = \sigma\left(C_{3\times 3}\left(C_{1\times 1}\left(C_{3\times 3}\left([M_c(I);\, A_c(I)]\right)\right)\right)\right) \times I \tag{1}$$

where $O$ denotes the output features, $\sigma$ the sigmoid activation function, $C_{3\times 3}$ and $C_{1\times 1}$ the 3 × 3 and 1 × 1 convolutions, $[\cdot\,;\cdot]$ concatenation along the channel dimension, $M_c$ the global maximum pooling over the channel dimension, $A_c$ the global average pooling over the channel dimension, × element-wise multiplication, and $I$ the input features.
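A compact PyTorch sketch of the ISA module is given below, assuming the layer layout described above (channel-wise max and average pooling, concatenation, then 3 × 3, 1 × 1, and 3 × 3 convolutions and a sigmoid gate); choices such as omitting bias terms are assumptions rather than details taken from the paper.

```python
import torch
import torch.nn as nn

class ISA(nn.Module):
    """Improved spatial attention (Equation (1)): pool along channels,
    concatenate, apply 3x3 -> 1x1 -> 3x3 convs, gate the input with a sigmoid map."""
    def __init__(self):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=3, padding=1, bias=False),  # 3x3 on the 2-channel map
            nn.Conv2d(1, 1, kernel_size=1, bias=False),             # 1x1
            nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False),  # 3x3 -> 1xHxW attention map
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                                 # x: (B, 64, 112, 112)
        max_map, _ = torch.max(x, dim=1, keepdim=True)    # channel-wise max -> (B, 1, H, W)
        avg_map = torch.mean(x, dim=1, keepdim=True)      # channel-wise average
        attn = self.sigmoid(self.convs(torch.cat([max_map, avg_map], dim=1)))
        return x * attn                                   # broadcast over all 64 channels
```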

2.4. Improvement of the Channel Attention Module

As depicted in Figure 1, after multiple convolution and pooling operations, the feature map at the output of STAGE4 reaches a size of 2048 × 7 × 7, with the number of channels increased to 2048. In the original ResNet50, STAGE4 is followed by global average pooling and a fully connected layer that produces the classification output. However, this dimensionality reduction results in the loss of deep semantic information. To address this issue, this paper proposes an improved channel attention module (ICA), whose structure is illustrated in Figure 5. Prior to the dimensionality reduction in the final layer of the network, ICA applies an attention mechanism to the output features. This ensures that deeper feature information receives more emphasis, mitigating the loss of features caused by channel reduction.
In ICA, the input image feature size is 2048 × 7 × 7, which is spatially compressed to 2048 × 1 × 1 through global maximum pooling and global average pooling. Subsequently, convolution operations with a kernel size of 1 × 1 are applied independently. This yields global maximum channel attention and global average channel attention maps of size 2048 × 1 × 1. These maps are then activated using the sigmoid activation function and multiplied element-wise with the initial input feature image, respectively. Finally, the results are summed to produce output features of size 2048 × 7 × 7, as shown in Equation (2).
$$O = \sigma\left(C_{1\times 1}(M_s(I))\right) \times I + \sigma\left(C_{1\times 1}(A_s(I))\right) \times I \tag{2}$$

where $O$ denotes the output features, $\sigma$ the sigmoid activation function, $C_{1\times 1}$ the 1 × 1 convolution, $M_s$ the global maximum pooling over the spatial dimensions, $A_s$ the global average pooling over the spatial dimensions, × element-wise multiplication, and $I$ the input features.
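The following PyTorch sketch mirrors Equation (2): two pooled channel descriptors pass through independent 1 × 1 convolutions, each gates the input through a sigmoid, and the two gated results are summed. It is an illustrative reconstruction; the absence of bias terms and of any channel reduction inside the 1 × 1 convolutions are assumptions.

```python
import torch
import torch.nn as nn

class ICA(nn.Module):
    """Improved channel attention (Equation (2)): separate max- and average-pooled
    branches with independent 1x1 convs, sigmoid gates, and an element-wise sum."""
    def __init__(self, channels=2048):
        super().__init__()
        self.max_conv = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.avg_conv = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                               # x: (B, 2048, 7, 7)
        m = torch.amax(x, dim=(2, 3), keepdim=True)     # global max pool -> (B, C, 1, 1)
        a = torch.mean(x, dim=(2, 3), keepdim=True)     # global average pool -> (B, C, 1, 1)
        return x * self.sigmoid(self.max_conv(m)) + x * self.sigmoid(self.avg_conv(a))
```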

2.5. Depth-Separable Residual Structures

As network layers increase, the model’s parameter count rises, leading to higher computational costs and longer runtimes. To address this, we introduce a depth-separable residual structure in the Bottleneck blocks of STAGE1 to STAGE4, taking DSR_BTNK1 in STAGE1 as an illustrative example (Figure 6). Depth-separable convolution drastically reduces the number of network parameters and computational demands, while a residual connection fuses the input features with the features output by the point-by-point convolution, enhancing the feature expression capability.
In Figure 6, the feature map with input size C × H × W undergoes a convolution with a 1 × 1 kernel and a stride of S to produce a feature map with C1 channels. Subsequently, a channel-wise (depthwise) convolution with a 3 × 3 kernel, a stride of 1, and a padding of 1 is applied. This operation reduces computational complexity while keeping the spatial dimensions constant; the padding of 1 ensures that the feature map dimensions remain consistent for the subsequent residual connection. The channel-wise convolution is defined by Equation (3).
$$Y_{i,j,c} = \sum_{d_1=1}^{D} \sum_{d_2=1}^{D} X_{i+d_1-1,\; j+d_2-1,\; c}\, K_{d_1, d_2, c} \tag{3}$$

Let $X$ represent the input feature map with dimensions H × W × C, where H is the height, W is the width, and C is the number of channels. The convolution kernel $K$ has dimensions D × D × C, where D is the kernel size and C is the number of channels. The element $Y_{i,j,c}$ denotes the output of the channel-by-channel convolution, $X_{i+d_1-1,\, j+d_2-1,\, c}$ an element of the input feature map, and $K_{d_1, d_2, c}$ the convolution kernel parameters. Here, $i$ and $j$ index the spatial locations of the output feature map, while $c$ is the channel index, shared by input and output since each channel is convolved independently.
The features generated by the channel-by-channel convolution are then subjected to point-by-point convolution using a 1 × 1 convolution kernel with a stride of 1; the computation is given in Equation (4). Finally, the resulting feature map is summed with the initial input feature map via the residual connection and convolved with a 1 × 1 kernel to produce the output features of the depth-separable residual structure with 4C1 channels.
$$Z_{i,j,k} = \sum_{c=1}^{C_1} Y_{i,j,c}\, P_{c,k} \tag{4}$$

where $Z_{i,j,k}$ denotes an element of the output feature map of the point-wise convolution, $Y_{i,j,c}$ an element of the output feature map of the channel-wise convolution, $C_1$ the number of channels of the feature map after the channel-wise convolution, and $P_{c,k}$ the 1 × 1 point-wise convolution kernel connecting input channel $c$ to output channel $k$.
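Putting Equations (3) and (4) together, the sketch below shows one way to realize the DSR_BTNK unit in PyTorch: a 1 × 1 reducing convolution, a depthwise 3 × 3 convolution (`groups=c1`), a pointwise 1 × 1 convolution, an inner residual sum, and a final 1 × 1 expansion to 4·C1 channels. The placement of batch normalization and ReLU is an assumption, as the paper does not spell it out.

```python
import torch.nn as nn

class DSRBlock(nn.Module):
    """Sketch of a depth-separable residual unit (DSR_BTNK) as described in Section 2.5."""
    def __init__(self, in_ch, c1, stride=1):
        super().__init__()
        self.reduce = nn.Sequential(                       # 1x1, stride S: C -> C1 channels
            nn.Conv2d(in_ch, c1, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(c1), nn.ReLU(inplace=True))
        self.dw = nn.Sequential(                           # channel-by-channel conv, Eq. (3)
            nn.Conv2d(c1, c1, kernel_size=3, padding=1, groups=c1, bias=False),
            nn.BatchNorm2d(c1), nn.ReLU(inplace=True))
        self.pw = nn.Sequential(                           # point-by-point conv, Eq. (4)
            nn.Conv2d(c1, c1, kernel_size=1, bias=False),
            nn.BatchNorm2d(c1))
        self.expand = nn.Sequential(                       # final 1x1: C1 -> 4*C1 channels
            nn.Conv2d(c1, 4 * c1, kernel_size=1, bias=False),
            nn.BatchNorm2d(4 * c1), nn.ReLU(inplace=True))

    def forward(self, x):
        x = self.reduce(x)               # (B, C1, H/S, W/S)
        out = self.pw(self.dw(x)) + x    # residual fusion around the separable convs
        return self.expand(out)          # (B, 4*C1, H/S, W/S)
```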

2.6. Cosine Annealing Learning Rate

In traditional training schemes, the learning rate is typically kept constant, which may no longer optimize the model effectively once training progresses beyond a certain number of iterations, potentially missing the optimal solution. This paper adopts a cosine annealing strategy to dynamically adjust the learning rate. This approach decreases the loss, prevents the model from getting stuck in local optima, and speeds up convergence towards the optimal solution, as illustrated in Figure 7.
The initial learning rate in Figure 7 is set to 0.01. Initially, the first five epochs employ LinearLR for warm-up, gradually increasing the learning rate. Subsequently, the main learning rate scheduler switches to cosine annealing. Here, T_max is set to 45 epochs, defining the maximum period for cosine annealing. The scheduler spans from epoch 5 to epoch 50, encompassing both the warm-up and annealing phases. This approach starts with a conservative learning rate, gradually ramping it up during warm-up to aid in model convergence. The cosine annealing scheduler then fine-tunes the learning rate, dynamically adjusting it to prevent the model from converging prematurely to a suboptimal solution, thus enhancing the overall optimization results. The mathematical formula governing this process is given in Equation (5).
$$\eta_t = \eta_{\min}^{i} + \frac{1}{2}\left(\eta_{\max}^{i} - \eta_{\min}^{i}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_i}\pi\right)\right) \tag{5}$$

where $i$ denotes the index of the annealing run; $\eta_{\min}^{i}$ and $\eta_{\max}^{i}$ denote the minimum and maximum learning rates, respectively; $T_{cur}$ denotes the number of epochs completed since the last restart; and $T_i$ denotes the total number of epochs in the $i$-th annealing period.
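The warm-up-plus-annealing schedule described above can be expressed compactly with PyTorch’s built-in schedulers. The sketch below uses LinearLR, CosineAnnealingLR, and SequentialLR, which require PyTorch ≥ 1.10 (the experiments in this paper used PyTorch 1.7.1, where the same schedule would have to be written by hand); the placeholder model and the start_factor value are assumptions for illustration.

```python
import torch
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

model = torch.nn.Linear(10, 5)  # placeholder model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Five epochs of linear warm-up, then cosine annealing with T_max = 45 (Equation (5)).
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=0.1, total_iters=5),  # warm-up: ramp LR up
        CosineAnnealingLR(optimizer, T_max=45, eta_min=0.0),   # cosine decay afterwards
    ],
    milestones=[5],  # switch from warm-up to annealing after epoch 5
)

for epoch in range(50):
    optimizer.step()   # stand-in for one epoch of training
    scheduler.step()   # advance the learning-rate schedule once per epoch
```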

2.7. Experimental Environment and Parameter Settings

The experimental environment was based on a 64-bit Windows 11 operating system. The hardware configuration included an Intel(R) Core(TM) i9-10885H processor and a GeForce RTX 3090 GPU with 24 GB of video memory. The deep learning framework was PyTorch 1.7.1 with CUDA 11.1, and programming was conducted in Python 3.8.3. The dataset was split into training and test sets in a 7:3 ratio. The training settings included a batch size of 16, an input image size of 224 × 224 pixels, and 50 training epochs. Optimization used the SGD optimizer with an initial learning rate of 0.001, a momentum of 0.9, a weight decay (weight_decay) of 0.0005, and a 5-epoch warm-up strategy.
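As a point of reference, the reported hyperparameters translate into the following single training step; the stand-in model and random batch are hypothetical, used only to make the snippet self-contained.

```python
import torch
import torch.nn as nn

# Stand-ins: `model` would be ICS-ResNet and the batch would come from the
# 7:3-split training DataLoader described above.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 5))
optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9, weight_decay=0.0005)
criterion = nn.CrossEntropyLoss()

images = torch.randn(16, 3, 224, 224)     # one batch at the reported settings
labels = torch.randint(0, 5, (16,))       # five disease categories
optimizer.zero_grad()
loss = criterion(model(images), labels)   # forward pass and loss
loss.backward()                           # backward pass
optimizer.step()                          # SGD update
```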

2.8. Evaluation Indicators

To evaluate the performance of the proposed network model objectively and realistically, this experiment uses accuracy, precision, recall, and F1 score. The formulas for these indicators are given in Equations (6)–(9).
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{7}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{8}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{9}$$
where T P , T N , F P and F N denote true positive, true negative, false positive and false negative, respectively. The larger the value of the above four indicators, the higher the classification accuracy and the better the effect.
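These four indicators follow directly from the confusion-matrix counts, as the short helper below illustrates (the example counts are made up).

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Equations (6)-(9) from raw confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts: 95 TP, 880 TN, 10 FP, 15 FN
acc, p, r, f1 = classification_metrics(95, 880, 10, 15)
print(acc, p, r, f1)  # 0.975 0.9047... 0.8636... 0.8837...
```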

3. Data Construction

3.1. Dataset

In this study, we utilized two datasets: the corn dataset from Plant Village and the crop pest and disease dataset available on the Kaggle platform. Together they encompass five distinct categories of corn leaf images: Cercospora leaf spot, common rust, northern leaf blight, maize streak virus, and healthy specimens, totaling 8280 images. For our experiments, we partitioned the combined dataset into a training set and a test set at a ratio of 7:3. The dataset parameter information is presented in Table 2.

3.2. Data Preprocessing

To enhance the model’s robustness and generalization capabilities while mitigating overfitting, constructing a high-quality and adequately large dataset is crucial. Therefore, we implemented several data augmentation techniques, including uniform cropping of the image center to 224 × 224 pixels, horizontal flipping, random rotation, brightness adjustment, noise addition, and others, as depicted in Figure 8.
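One possible torchvision realization of these augmentations is sketched below; the specific parameter values (flip probability, rotation range, brightness factor, noise scale) are assumptions, since the paper does not report them.

```python
import torch
from torchvision import transforms

# Sketch of the augmentation pipeline in Section 3.2 (parameter values assumed).
train_transform = transforms.Compose([
    transforms.CenterCrop(224),                 # uniform crop of the image center
    transforms.RandomHorizontalFlip(p=0.5),     # horizontal flipping
    transforms.RandomRotation(degrees=15),      # random rotation
    transforms.ColorJitter(brightness=0.2),     # brightness adjustment
    transforms.ToTensor(),
    # torchvision has no built-in noise transform; add Gaussian noise by hand:
    transforms.Lambda(lambda t: (t + 0.01 * torch.randn_like(t)).clamp(0.0, 1.0)),
])
```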

4. Experiments and Analysis of Results

4.1. Ablation Experiment

In order to thoroughly analyze and evaluate the effectiveness of the proposed model, we conducted ablation experiments targeting the various improvement points. The evaluation criteria included accuracy (Acc), mean average precision (mAP) across all categories, mean average recall (mAR) across all categories, F1 score, computational complexity (Flops), and parameter count (Pa). To ensure experimental fairness, all conditions were kept consistent except for the modules under test. The results are presented in Table 3.
As shown in Table 3, the model achieves a 2.64% accuracy improvement with the introduction of the improved channel attention module (ICA) and spatial attention module (ISA), accompanied by slight increases in computation and parameters. Replacing traditional convolution with the depth-separable residual structure (DSR) then reduces accuracy by 1.87% relative to the attention-augmented model, but it cuts computational cost by roughly 60% and parameters by roughly 72%. This trade-off indicates that adopting DSR sacrifices some accuracy for faster inference and reduced model complexity.
Dynamically adjusting the learning rate with cosine annealing leaves the parameter and computation counts unchanged while improving all other metrics, raising accuracy by 1.8%. Fine-tuning through transfer learning improves accuracy by 2.8% and slightly reduces parameters and computations. Ultimately, combining the attention mechanisms, DSR, the cosine annealing learning rate, and transfer learning enhances the proposed model’s Acc, mAP, mAR, and F1 scores on the corn disease dataset by 3.45%, 4.92%, 4.90%, and 4.93%, respectively, while reducing parameters by 16.27 M and computations by 2.25 G compared to the original ResNet50, significantly enhancing classification efficiency.

4.2. Comparison Experiment

To comprehensively validate the effectiveness of the proposed model, we compared it with eight popular network models: CSPNet [23], InceptionNet_v3 [24], EfficientNet [25], ShuffleNet [26], MobileNet [27], ResNet50, ResNet101 and ResNet152. During training, the models were evaluated every other epoch to capture the various metrics. To ensure fairness, all experimental parameters were kept consistent; the results are summarized in Table 4.
From Table 4, it is evident that the network model proposed in this paper outperforms the other networks across all metrics. Specifically, its accuracy surpasses theirs by 5.03%, 3.18%, 1.13%, 1.81%, 1.13%, 0.68%, 0.44%, and 0.60%, respectively. The ResNet series outperforms all models other than the proposed network, indicating that residual connections help mitigate gradient vanishing, making the network easier to train and improving model performance. Additionally, the recognition accuracies of ResNet101 and ResNet152 are higher than that of ResNet50 by 0.24% and 0.08%, respectively, suggesting that deeper networks can learn more complex feature representations and thereby enhance recognition ability, albeit at the cost of increased computational resources and training time. Figure 9 illustrates the performance curves of the different network models. ShuffleNet exhibits the largest fluctuations in its metrics, indicating poor convergence and stability. CSPNet and InceptionNet_v3 also show significant early-stage fluctuations, gradually stabilizing in later stages with improved convergence. In contrast, the proposed ICS-ResNet exhibits the smallest fluctuations and the highest stability, converging rapidly after the 20th training epoch. This underscores that the proposed network model achieves high accuracy and fast convergence, outperforming the other networks overall.
Table 5 presents the per-class results for the five categories, Cercospora leaf spot, common rust, healthy specimens, maize streak virus, and northern leaf blight, across the various network models. The results indicate that the model proposed in this paper generally outperforms the other networks in all categories, with the exception of the average recall (AR) for maize streak virus, which is 1.03% lower than that of the ResNet50 network.

4.3. Visualization

4.3.1. Grad-CAM

Figure 10 shows the Grad-CAM class activation maps for the different network models, where the intensity of the colors corresponds to the degree of influence on the model’s decisions. Darker colors, such as red, indicate high-activation regions that contribute positively to the predicted category, while lighter colors, such as green, indicate low-activation regions. The method proposed in this paper clearly and accurately locates the regions of interest in all disease categories and attends to a wider range of relevant regions.

4.3.2. Confusion Matrix

To provide a comprehensive understanding of the classification results, a confusion matrix diagram was generated for the network model proposed in this paper, depicted in Figure 11. In the diagram, the horizontal axis represents the predicted labels, while the vertical axis represents the true labels. The labels for various types of diseases are positioned horizontally from left to right and vertically from top to bottom. An analysis of Figure 11 reveals that Cercospora leaf spot was frequently misidentified as northern leaf blight, and vice versa. This indicates similarity in characteristics between these two diseases, posing challenges for the network’s classification accuracy. Conversely, no misidentifications were observed for common rust and maize streak virus, achieving 100% accuracy for both. This underscores the robust performance of the network in classifying maize diseases within the corn disease dataset.

4.3.3. Effectiveness of Identification of Different Corn Diseases

To provide a more intuitive reflection of this paper’s method performance, the validation results for five different types of corn diseases are presented in Figure 12. The figure illustrates that the ICS-ResNet model effectively identifies various types of diseases with high accuracy. This reaffirms the strong performance of the method proposed in this paper for corn disease recognition tasks.

5. Discussion and Conclusions

This study introduces ICS-ResNet, a lightweight network designed to address the challenges of slow recognition speeds and low accuracy in classifying maize leaf diseases under complex environmental conditions. By leveraging ResNet50 as the backbone and incorporating improved spatial attention (ISA) and improved channel attention (ICA) mechanisms, our model emphasizes key information extraction at STAGE0 and STAGE4. These enhancements are complemented by the replacement of traditional convolution with depth-separable residual structures in each Bottleneck module, reducing network parameters and computations while enhancing feature expression through residual connections. Furthermore, the model employs a cosine annealing strategy to dynamically adjust the learning rate, facilitating rapid convergence and avoiding local optima. The experimental results show that (1) our proposed ICS-ResNet achieves 98.87% accuracy in corn leaf disease classification, surpassing the compared networks CSPNet, InceptionNet_v3, EfficientNet, ShuffleNet, MobileNet, ResNet50, ResNet101, and ResNet152 by 5.03%, 3.18%, 1.13%, 1.81%, 1.13%, 0.68%, 0.44%, and 0.60%, respectively. (2) Compared to the original ResNet50, ICS-ResNet reduces parameters and computation by 69.21% and 54.88%, respectively, within a lightweight framework, underscoring its efficiency. Therefore, this study provides strong technical support for the management of corn diseases in practical production, reduces the cost of agricultural production, and improves the efficiency of disease management. Consequently, it increases overall crop yield and promotes the sustainable development of agriculture.
Although the method presented in this paper has achieved good results in the classification of corn diseases, it is limited to five types due to the unavailability of data for other categories. Additionally, the study employed a supervised learning approach, which requires a large amount of labeled data. In future research, we will enrich the sample data with more types of corn diseases and combine semi-supervised learning methods to leverage unlabeled data, further enhancing the model’s recognition performance and generalization ability.

Author Contributions

Conceptualization and methodology, Z.J. and S.B.; validation and data curation, Z.J.; writing—original draft preparation, Z.J.; review S.B., M.C. and L.W.; funding acquisition, S.B. and L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Lanzhou Municipal Talent Innovation and Entrepreneurship Project (2021-RC-47), the Ministry of Science and Technology National Foreign Expertise Project (No. G2022042005L), the Gansu Higher Education Institutions Industrial Support Project (No. 2023CYZC-54), the Gansu Key R&D Program (No. 23YFWA0013), the Gansu Agricultural University Aesthetic and Labor Education Teaching Reform Project (No. 2023-09), the Open Research Fund of the National Mobile Communications Research Laboratory (No. 2023D15) and the Natural Science Foundation of Zhejiang, China (LGF22H120009).

Data Availability Statement

Data were derived from the following resources available in the public domain: https://github.com/spMohanty/PlantVillage-Dataset (accessed on 2 July 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, Q.; Li, Y. Mechanism and economic effect of agricultural equipment digital intelligence to enhance food security. J. Northwest Agric. For. Univ. 2023, 23, 76–83. [Google Scholar]
  2. Wang, S.; Zhao, Y.; Xu, Y.; Hu, X.; Li, X. Characterization of the coupled spatial and temporal evolution of corn supply and demand in China and response strategies. Southwest J. Agric. 2023, 36, 1625–1635. [Google Scholar]
  3. Zhang, M.; Qian, R.; Zhu, J.; Zhang, L.; Li, R.; Dong, W. Progress and Prospect of Crop Pest Image Recognition Research. Anhui Agric. Sci. 2018, 46, 11–12+15. [Google Scholar]
  4. Shi, B.; Li, J.; Zhang, L.; Li, J. CNN-based image recognition model for crop pests and diseases. Comput. Syst. Appl. 2020, 29, 89–96. [Google Scholar]
  5. Huang, Z.; Xu, Y.; Li, C.; Sun, L. Neural network based pest recognition system. Netw. Secur. Technol. Appl. 2023, 5, 46–48. [Google Scholar]
  6. Xiong, M.; Zhan, W.; Gui, L.; Liu, H.; Wang, P.; Han, T.; Li, W.; Sun, Y. Detection and identification of maize leaf diseases based on ResNet model. Jiangsu Agric. Sci. 2023, 51, 164–170. [Google Scholar]
  7. Huang, Y.L.; Ai, X. Improved residual network for classification of corn leaf disease images. Comput. Eng. Appl. 2021, 57, 178–184. [Google Scholar]
  8. Zeng, P.; Sa, J.; Liu, J. Corn leaf disease recognition method based on improved ResNet18. J. Inn. Mong. Agric. Univ. 2024, 6, 1–10. [Google Scholar]
  9. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  10. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  11. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar] [CrossRef]
  12. Zhao, Y.; Zhao, H.; Jian, Y.; Ren, D.; Li, Y.; Wei, Y. A deep learning-based method for potato leaf disease detection. Chin. J. Agric. Mech. Chem. 2022, 43, 183–189. [Google Scholar]
  13. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  14. Xiang, S.D.; Zhai, W.; Huang, Y.T.; Liu, W. Plant disease recognition based on Xception-CEMs neural network. Chin. J. Agric. Mech. Chem. 2021, 42, 177–186. [Google Scholar]
  15. Wu, K.; Gao, B. Research on maize pest and disease recognition and grading with improved EfficientNetV2. Mod. Electron. Technol. 2023, 46, 68–74. [Google Scholar]
  16. Tan, M.; Le, Q. Efficientnetv2: Smaller models and faster training. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 10096–10106. [Google Scholar]
  17. Zhou, D.; Kang, B.; Jin, X.; Yang, L.; Lian, X.; Jiang, Z.; Hou, Q.; Feng, J. Deepvit: Towards deeper vision transformer. arXiv 2021, arXiv:2103.11886. [Google Scholar]
  18. Pan, C.; Zhang, Z.; Gui, W.; Ma, J.; Yan, C.; Zhang, X. A rice pest and disease recognition method integrating ECA mechanism and DenseNet201. Intell. Agric. 2023, 5, 45–55. [Google Scholar]
  19. Zhang, G.; Lv, Z.; Liu, H.; Long, C.; Huang, C. A lotus leaf pest and disease recognition model based on improved DenseNet and migration learning. J. Agric. Eng. 2023, 39, 188–196. [Google Scholar]
  20. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  21. Ma, X.; Xing, X.; Wu, Q. Improved ConvNext-based classification of maize leaf diseases in complex context. Jiangsu Agric. Sci. 2023, 51, 190–197. [Google Scholar]
  22. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  23. Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 390–391. [Google Scholar]
  24. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
  25. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; pp. 6105–6114. [Google Scholar]
  26. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar]
  27. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
Figure 1. ICS-ResNet network structure diagram.
Figure 2. BasicBlock diagram.
Figure 3. BottleBlock diagram.
Figure 4. ISA module diagram.
Figure 5. ICA module diagram.
Figure 6. Depth-separable residual structure diagram.
Figure 7. Cosine annealing learning rate.
Figure 8. Data augmentation example diagram.
Figure 9. Change curves of various indicators of different network models.
Figure 10. Grad-CAM class activation maps.
Figure 11. Confusion matrix diagram.
Figure 12. Identification effect diagram.
Table 1. ResNet50 network parameters.

| Layer Name | Output Size | 50-Layer |
| --- | --- | --- |
| conv1 | 112 × 112 | 7 × 7, 64, stride 2 |
| conv2_x | 56 × 56 | 3 × 3 max pool, stride 2; [1 × 1, 64; 3 × 3, 64; 1 × 1, 256] × 3 |
| conv3_x | 28 × 28 | [1 × 1, 128; 3 × 3, 128; 1 × 1, 512] × 4 |
| conv4_x | 14 × 14 | [1 × 1, 256; 3 × 3, 256; 1 × 1, 1024] × 6 |
| conv5_x | 7 × 7 | [1 × 1, 512; 3 × 3, 512; 1 × 1, 2048] × 3 |
| — | 1 × 1 | average pool, fc, softmax |
Table 2. The dataset parameter information (example images omitted).

| Disease Category | Total No. of Samples | No. of Samples in Training Set | No. of Samples in Test Set |
| --- | --- | --- | --- |
| Cercospora leaf spot | 1642 | 1149 | 493 |
| Common rust | 1907 | 1334 | 573 |
| Northern leaf blight | 1908 | 1335 | 573 |
| Maize streak virus | 964 | 674 | 290 |
| Healthy | 1859 | 1301 | 558 |
Table 3. The influence of different modules on classification results (✓ indicates the component is enabled, as described in the text).

| Backbone | ICA + ISA | DSR | CLR | Transfer Learning | Acc/% | mAP/% | mAR/% | F1/% | Flops/G | Pa/M |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet50 | | | | | 95.42 | 94.03 | 93.91 | 93.95 | 4.10 | 23.51 |
| ResNet50 | ✓ | | | | 98.06 | 97.37 | 97.17 | 97.27 | 4.25 | 23.62 |
| ResNet50 | ✓ | ✓ | | | 96.19 | 95.28 | 94.58 | 94.88 | 1.64 | 6.73 |
| ResNet50 | | | ✓ | | 97.22 | 95.97 | 96.80 | 96.29 | 4.10 | 23.51 |
| ResNet50 | | | | ✓ | 98.19 | 97.89 | 98.27 | 98.07 | 3.74 | 20.63 |
| Ours | ✓ | ✓ | ✓ | ✓ | 98.87 | 98.95 | 98.81 | 98.88 | 1.85 | 7.24 |
Table 4. Comparative experimental results of different network models.

| Model | Acc/% | mAP/% | mAR/% | F1 Score/% |
| --- | --- | --- | --- | --- |
| CSPNet | 93.84 | 93.81 | 92.76 | 93.21 |
| InceptionNet_v3 | 95.69 | 95.89 | 95.08 | 95.44 |
| EfficientNet | 97.74 | 97.79 | 97.73 | 97.76 |
| ShuffleNet | 97.06 | 97.26 | 97.06 | 97.10 |
| MobileNet | 97.74 | 97.86 | 97.69 | 97.75 |
| ResNet50 | 98.19 | 97.89 | 98.27 | 98.07 |
| ResNet152 | 98.27 | 98.33 | 98.42 | 98.35 |
| ResNet101 | 98.43 | 98.49 | 98.45 | 98.46 |
| ICS-ResNet | 98.87 | 98.95 | 98.81 | 98.88 |
Table 5. Experimental results for the five disease categories across different network models (each cell: AP/% / AR/% / F1/%).

| Model | Cercospora Leaf Spot | Common Rust | Healthy | Maize Streak Virus | Northern Leaf Blight |
| --- | --- | --- | --- | --- | --- |
| CSPNet | 88.63 / 87.01 / 87.81 | 97.92 / 98.95 / 98.43 | 96.52 / 99.46 / 97.97 | 94.96 / 84.48 / 89.41 | 91.03 / 93.89 / 92.43 |
| InceptionNet_v3 | 91.51 / 89.65 / 90.57 | 99.82 / 98.60 / 99.20 | 98.75 / 99.46 / 99.10 | 98.14 / 91.03 / 94.45 | 91.26 / 96.68 / 93.89 |
| EfficientNet | 94.92 / 94.92 / 94.92 | 99.30 / 99.82 / 99.56 | 99.64 / 99.46 / 99.55 | 98.95 / 98.27 / 98.61 | 96.16 / 96.16 / 96.16 |
| ShuffleNet | 97.11 / 88.64 / 92.68 | 99.47 / 99.65 / 99.56 | 99.64 / 99.82 / 99.73 | 98.29 / 99.31 / 98.79 | 91.81 / 97.90 / 94.76 |
| MobileNet | 97.43 / 92.29 / 94.79 | 99.65 / 99.82 / 99.73 | 99.82 / 99.64 / 99.73 | 98.62 / 98.62 / 98.62 | 93.82 / 98.08 / 95.90 |
| ResNet50 | 97.53 / 96.14 / 96.83 | 100.00 / 99.82 / 99.91 | 99.81 / 97.84 / 98.82 | 95.06 / 99.65 / 97.30 | 97.05 / 97.90 / 97.48 |
| ResNet101 | 97.89 / 94.52 / 96.18 | 100.00 / 99.30 / 99.64 | 100.00 / 99.82 / 99.91 | 98.72 / 99.65 / 99.31 | 95.61 / 98.95 / 97.25 |
| ResNet152 | 93.46 / 98.58 / 95.95 | 100.00 / 99.47 / 99.73 | 100.00 / 99.82 / 99.91 | 99.65 / 99.65 / 99.65 | 98.54 / 94.58 / 96.52 |
| Ours | 97.36 / 97.56 / 97.46 | 100.00 / 99.82 / 99.91 | 99.82 / 100.00 / 99.91 | 100.00 / 98.62 / 99.30 | 97.56 / 98.08 / 97.82 |

