Article

Semantic Segmentation of Cucumber Leaf Disease Spots Based on ECA-SegFormer

College of Information Science and Engineering, Shanxi Agricultural University, Jinzhong 030801, China
* Author to whom correspondence should be addressed.
Agriculture 2023, 13(8), 1513; https://doi.org/10.3390/agriculture13081513
Submission received: 27 June 2023 / Revised: 23 July 2023 / Accepted: 25 July 2023 / Published: 28 July 2023
(This article belongs to the Section Digital Agriculture)

Abstract
Accurate semantic segmentation of disease spots is critical in the evaluation and treatment of cucumber leaf damage. To solve the problem of poor segmentation accuracy caused by the imbalanced feature fusion of SegFormer, the Efficient Channel Attention SegFormer (ECA-SegFormer) is proposed to handle the semantic segmentation of cucumber leaf disease spots under natural acquisition conditions. First, the decoder of SegFormer is modified by inserting the Efficient Channel Attention module and adopting the Feature Pyramid Network to increase the scale robustness of the feature representation. Then, a cucumber leaf disease dataset is built with 1558 images collected from an outdoor experimental vegetable base, covering downy mildew, powdery mildew, target leaf spot, and angular leaf spot. Tested on this dataset, ECA-SegFormer achieves a mean Intersection over Union of 38.03% and a Mean Pixel Accuracy of 60.86%, which are 1.47% and 14.55% higher than those of SegFormer, respectively. These findings demonstrate the superiority of ECA-SegFormer over the original SegFormer and its suitability for precise segmentation of cucumber leaf disease spots in the natural environment.

1. Introduction

The cucumber, recognized as the third most widely consumed vegetable crop globally [1], is susceptible to debilitating leaf diseases that reduce yield and cause substantial economic losses to the agricultural economy. Estimating disease severity as the proportion of lesion area to total leaf area supports the diagnosis and treatment of cucumber diseases. Nevertheless, the conventional methodologies for such assessments, which rely on labor-intensive manual estimation [2], show notable deficiencies in disease management and treatment efficacy. Therefore, a semantic segmentation method is highly needed to estimate cucumber leaf disease severity efficiently and minimize the labor cost of agricultural experts.
High-efficiency segmentation of leaf disease spots is an active research topic. Various computer vision segmentation methods have been employed to segment targets based on image characteristics such as color, texture, shape, and space [3,4,5,6,7]. However, these traditional approaches have limited applicability and are time-consuming. The advent of deep neural networks has spurred rapid development in image segmentation technology. Minaee et al. [8] compared numerous deep-learning-based segmentation methods introduced before 2019. Convolutional neural networks (CNNs) have found extensive application in agricultural disease segmentation tasks, facilitating more accurate identification of disease spots and broadening the scope of their utilization [9,10,11,12]. In 2017, SegNet [13], a deep fully convolutional encoder-decoder architecture, was proposed and later applied to segment crop-disease images [14,15]. Wang et al. [16] used the DeepLab v3+ and U-Net methods to segment three kinds of disease spots from cucumber leaves and calculated their damage levels with a Dice accuracy of 69.14%. Similarly, many studies have applied the U-Net model to segment leaf disease spots of various crops and achieved high accuracies, such as 89.18% for persimmon leaf disease spots [17], 94.58% for wheat stripe rust disease spots [18], and 98.24% for strawberry gray mold disease spots [19].
Compared with the convolution operation in CNNs, which has a limited receptive field, the self-attention mechanism inherent in Transformers [20] can capture long-distance information and dynamically adapt the receptive field to the image content. Consequently, Transformers are considered more flexible and powerful than CNNs and hold promise for advancing visual recognition [21]. Several Transformer-based vision networks have been proposed, including the Detection Transformer (DETR) [22], Vision Transformer (ViT) [23], Swin Transformer (SwinT) [24], Segmentation Transformer (SETR) [25], and SegFormer [26]. Wang et al. [27] enhanced the SwinT network and applied it to data augmentation and identification of real cucumber leaf diseases. Wu et al. [28] efficiently segmented tomato leaf disease spots through several improvements to DETR, achieving a disease classification accuracy of 96.40%. Reedha et al. [29] employed ViT to classify weed and crop images captured by Unmanned Aerial Vehicles, surpassing CNN performance with an F1 score of 99.28%. Li et al. [30] proposed a lightweight network based on copy-paste and SegFormer for accurate disease-region segmentation and severity assessment, achieving an mIoU of 85.38%. Zhang et al. [31] proposed a customized segmentation architecture, the Cross-Resolution Transformer, for grape leaf disease in the field. SegFormer is a simple, efficient, and robust semantic segmentation framework that unifies Transformers with lightweight multi-layer perceptron decoders. However, SegFormer cannot select the features that contribute most to the result and does not reuse low-level features, making it difficult to segment cucumber spots that occupy small pixel ratios. In recent years, the combined use of the attention mechanism and the Feature Pyramid Network (FPN) has been extensively explored [32,33,34]. The attention mechanism can be used for discriminative feature selection: when extracting information, it strengthens the weights of region features relevant to the task and improves task performance, and it is plug-and-play and easy to embed in various task models. The FPN hierarchically fuses deep and shallow information through skip connections, which can be used to address imbalanced feature fusion.
Inspired by these works, in order to address the aforementioned limitations and enhance the efficacy of cucumber disease spot segmentation, we present the Efficient Channel Attention SegFormer (ECA-SegFormer) approach. The main contributions of our work are as follows:
  • A new dataset, including four cucumber leaf diseases, is collected under natural conditions to demonstrate the effectiveness of the proposed method.
  • The Efficient Channel Attention (ECA) module is added to the decoder of SegFormer. ECA can improve the representational power of the model by extracting attention from the channel dimensions of the feature map and focusing on the most salient components of the information.
  • The FPN module is used in the decoder of SegFormer to fuse features from different layers and use multi-scale feature maps for prediction, improving the representation of image information.
  • The segmentation results for different types of disease, different numbers of disease types, different clarity of spots, different levels of shading, and different levels of sparseness are visualized.

2. Materials and Methods

2.1. Dataset

2.1.1. Image Data Acquisition

We acquired images of diseased cucumber leaves at an outdoor experimental vegetable base located in Taigu District, Jinzhong City, Shanxi Province, China. Data collection spanned from 30 April to 5 May 2022 (temperature: 26.7 °C, humidity: 41.4%, cloudy to sunny). Images were captured with iPhone 13 devices at a resolution of 4032 × 3024 pixels. The dataset consists of 1558 images of diseased cucumber leaves covering four classes: downy mildew, powdery mildew, target leaf spot, and angular leaf spot. All images were rigorously annotated and segmented by expert technicians using Labelme 4.5.13 (an image annotation tool developed at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory (CSAIL), https://github.com/wkentaro/labelme, accessed on 19 May 2022). These expert technicians are all professors of plant protection with extensive experience in disease diagnosis. Figure 1 illustrates representative examples of diseased-cucumber leaf images for each category, along with their corresponding annotations.

2.1.2. Dataset Preprocessing

The collected diseased-cucumber leaf images cover different scenarios. To be able to obtain suitable model input, the dataset was preprocessed. The entire data process is shown in Figure 2.
  • Before the experimental training, the 1558 annotated diseased-cucumber leaf images were divided in a ratio of 8:2. A total of 1245 images were selected as the training and validating sets, and 313 were chosen as the testing set. During the network training process, the 1245 images were divided into training and validating sets with a ratio of 8:2. The training set was used for the learning of the weight parameters of the model; the validating set was used to optimize the structure of the model while reducing the complexity of the model, and the testing set was used to evaluate the model.
  • First, each image was resized to 2048 × 512 pixels with a scaling-ratio range of 0.5–2.0, and blank areas were filled with black pixels. Then, the image was cropped to 512 × 512 pixels, with the constraint that no single disease-spot category occupied more than 0.75 of the cropped image.
  • To increase the diversity of the dataset and the robustness of the model, in this study, we performed data augmentation in both the training and validating sets. Each augmentation operation was performed for each image with a probability of 0.5. The data augmentation operations and the corresponding values are shown in Table 1.
  • Before entering the model, the per-channel mean values (123.675, 116.28, 103.53) and standard deviations (58.395, 57.12, 57.375) were used to normalize the image values and speed up model convergence (an illustrative code sketch follows this list).
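For illustration only, the following Python sketch shows one way the augmentation and normalization steps above could be implemented with torchvision. The function names are ours, the contrast and saturation ranges are interpreted as multiplicative factors, the brightness and hue deltas are converted to torchvision's factor conventions, and the exact sampling logic of the original training pipeline may differ.

```python
import random
import numpy as np
import torchvision.transforms.functional as TF
from PIL import Image

# Channel-wise mean / standard deviation used for normalization (0-255 scale, RGB order)
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)

def augment(img: Image.Image, mask: Image.Image, p: float = 0.5):
    """Apply each augmentation from Table 1 with probability p (illustrative)."""
    if random.random() < p:                          # horizontal flip (image and mask together)
        img, mask = TF.hflip(img), TF.hflip(mask)
    if random.random() < p:                          # brightness delta in [-32, 32] on a 0-255 scale
        img = TF.adjust_brightness(img, 1.0 + random.uniform(-32, 32) / 255.0)
    if random.random() < p:                          # contrast factor, reading Table 1 as [0.5, 1.5]
        img = TF.adjust_contrast(img, random.uniform(0.5, 1.5))
    if random.random() < p:                          # saturation factor, reading Table 1 as [0.5, 1.5]
        img = TF.adjust_saturation(img, random.uniform(0.5, 1.5))
    if random.random() < p:                          # hue delta of +-18 degrees, rescaled to torchvision's range
        img = TF.adjust_hue(img, random.uniform(-18, 18) / 360.0)
    return img, mask

def normalize(img: Image.Image) -> np.ndarray:
    """Subtract the channel means and divide by the channel standard deviations."""
    return (np.asarray(img, dtype=np.float32) - MEAN) / STD
```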

2.2. Semantic Segmentation Based on ECA-SegFormer

2.2.1. SegFormer

As shown in Figure 3, SegFormer consists of an encoder and a decoder. The Transformer blocks inside the encoder use Overlap Patch Embeddings (OPEs) to extract features from the input image and down-sample them, and then feed the resulting features into the Efficient Self-Attention (ESA) and Mix Feed-Forward Network (Mix-FFN) layers. The OPE is computed with standard convolutional layers; the 2D features are spatially reshaped into 1D token sequences before entering the ESA layer for self-attention computation and feature enhancement. To replace the positional encoding of the standard Transformer, a 3 × 3 convolution is inserted between the two linear layers of the Feed-Forward Network (FFN) to fuse spatial position information. The linear layers in the encoder are followed by Layer Normalization (LN), and the activation function is the Gaussian Error Linear Unit (GELU). Each Transformer block stacks multiple ESA and Mix-FFN layers to deepen the network and extract rich detail and semantic features. Self-attention is computed in the ESA at each scale; compared with previous CNN-based networks that perform self-attention only after integrating information across all scales, the per-scale self-attention is purer. The encoder in this paper uses MiT-B0, whose primary hyperparameters are shown in Table 2.
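As a concrete illustration of the two encoder components described above, the PyTorch sketch below implements an Overlap Patch Embedding layer (a strided convolution followed by token flattening) and a Mix-FFN layer (two linear layers with a 3 × 3 depth-wise convolution in between). The class names and default sizes are ours; this is a simplified sketch, not the official SegFormer implementation.

```python
import torch
import torch.nn as nn

class OverlapPatchEmbed(nn.Module):
    """Overlapping patch embedding: a strided convolution that extracts patches
    and down-samples (e.g., patch_size=7, stride=4 for the first stage)."""
    def __init__(self, in_ch: int, embed_dim: int, patch_size: int, stride: int):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, embed_dim, kernel_size=patch_size,
                              stride=stride, padding=patch_size // 2)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        x = self.proj(x)                           # (B, C, H', W')
        h, w = x.shape[2:]
        x = x.flatten(2).transpose(1, 2)           # reshape the 2D map into a 1D token sequence
        return self.norm(x), h, w

class MixFFN(nn.Module):
    """Mix-FFN: linear -> 3x3 depth-wise conv (implicit positional information) -> GELU -> linear."""
    def __init__(self, dim: int, expansion: int = 4):
        super().__init__()
        hidden = dim * expansion
        self.fc1 = nn.Linear(dim, hidden)
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x, h, w):
        x = self.fc1(x)
        b, n, c = x.shape
        x = x.transpose(1, 2).reshape(b, c, h, w)  # back to a 2D map for the convolution
        x = self.dwconv(x).flatten(2).transpose(1, 2)
        return self.fc2(self.act(x))
```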
SegFormer's decoder consists of four main steps. First, the feature maps from the four stages of the encoder are fed into an MLP layer that adjusts the channel dimension to 256. Second, the features are up-sampled to 1/4 of the original resolution and concatenated into a single feature map. Third, an MLP layer fuses the concatenated features and restores the channel dimension to 256. Finally, another MLP layer takes the fused features and predicts the semantic segmentation map. To enhance the model's generalization performance, Batch Normalization (BN) is used after the MLPs in the decoder, and the activation function is the Rectified Linear Unit (ReLU).
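The four decoder steps can be summarized in a short PyTorch sketch. The channel counts follow MiT-B0 (Table 2), and five output classes (four diseases plus background) are assumed; the per-stage "MLP" layers are written as 1 × 1 convolutions, which act on each pixel exactly like a linear layer. This is an illustrative sketch, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegFormerDecodeHead(nn.Module):
    """All-MLP decoder: project each stage to 256 channels, up-sample every map to
    1/4 resolution, concatenate, fuse, and predict per-pixel class scores."""
    def __init__(self, in_channels=(32, 64, 160, 256), embed_dim=256, num_classes=5):
        super().__init__()
        self.proj = nn.ModuleList([nn.Conv2d(c, embed_dim, 1) for c in in_channels])
        self.fuse = nn.Sequential(                 # fusion MLP with BN and ReLU, as described above
            nn.Conv2d(embed_dim * len(in_channels), embed_dim, 1),
            nn.BatchNorm2d(embed_dim),
            nn.ReLU(inplace=True),
        )
        self.classifier = nn.Conv2d(embed_dim, num_classes, 1)

    def forward(self, feats):                      # feats: stages 1-4 with strides 4, 8, 16, 32
        target = feats[0].shape[-2:]               # spatial size of the 1/4-resolution map
        outs = [F.interpolate(p(f), size=target, mode="bilinear", align_corners=False)
                for p, f in zip(self.proj, feats)]
        return self.classifier(self.fuse(torch.cat(outs, dim=1)))
```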

2.2.2. ECA-SegFormer Network Structure

To address the challenge of imbalanced feature fusion in SegFormer, we propose the ECA-SegFormer network, which effectively improves segmentation performance. The SegFormer decoder is improved by integrating two modules, and the complete architecture of the ECA-SegFormer model is depicted in Figure 4. To reduce information redundancy within the semantic features and further strengthen feature expression, the Efficient Channel Attention (ECA) module [35] processes the feature output of each of the encoder's four stages. In addition, the Feature Pyramid Network (FPN) module [36] applies multi-scale feature fusion, which markedly improves the scale robustness of the feature representation.

2.2.3. Efficient Channel Attention Module

The ECA module was obtained by improving the Squeeze-and-Excitation (SE) network [37] and is more efficient. By controlling the magnitude of the channel weights, high weights enhance important information and low weights suppress irrelevant information, so the important information can be selected even in different situations. The SE block first applies Global Average Pooling (GAP) to each input feature channel independently. Two fully connected (FC) layers with a non-linearity, followed by a Sigmoid function, then generate the channel weights. The two FC layers are designed to capture non-linear cross-channel interaction and involve dimensionality reduction to control model complexity. However, this dimensionality reduction destroys the direct correspondence between channels and their weights. Compared with SE, the ECA module learns effective channel attention by avoiding channel dimensionality reduction while capturing cross-channel interaction in an extremely lightweight way. Its implementation is shown in Figure 5.
After aggregating the convolutional features using GAP without dimensionality reduction, the ECA module determines the kernel size k (k = 3 in this work), where k indicates how many neighboring channels participate in the attention prediction for a given channel. A Sigmoid function then learns the channel attention, yielding a new feature map of size H × W × C. In this method, multi-level features pass through the ECA module for effective channel attention learning, which aims to improve the segmentation performance of the model.
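A minimal PyTorch sketch of the ECA block as described above (GAP, a 1-D convolution with kernel size k = 3 across the channel axis, and a Sigmoid gate). The class name is ours, and the code follows the published ECA-Net design rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class ECABlock(nn.Module):
    """Efficient Channel Attention: no dimensionality reduction, only a 1-D
    convolution over neighboring channels followed by a Sigmoid gate."""
    def __init__(self, k: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                    # global average pooling per channel
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                                      # x: (B, C, H, W)
        y = self.pool(x)                                       # (B, C, 1, 1) channel descriptor
        y = self.conv(y.squeeze(-1).transpose(1, 2))           # 1-D conv across the channel axis
        y = self.sigmoid(y.transpose(1, 2).unsqueeze(-1))      # per-channel weights in (0, 1)
        return x * y                                           # re-weight each input channel
```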

2.2.4. Feature Pyramid Networks Module

Feature context information is essential for segmenting cucumber leaf disease spots. Some cucumber leaf disease spots occupy a relatively small proportion of pixels in the image. SegFormer misses the reuse of the low-level features that are important for detecting small objects, resulting in unbalanced feature fusion and making it difficult to segment cucumber spots with smaller pixel ratios. The expressiveness of the features varies across the levels of the feature hierarchy: low-level features reflect details such as light and dark and edges, while high-level features reflect a richer overall structure. The feature maps generated by Block4, Block3, Block2, and Block1 in the pyramidal feature hierarchy are fused to form the FPN, whose structure is shown in Figure 6. The feature map sizes of Block4, Block3, Block2, and Block1 are 1/32, 1/16, 1/8, and 1/4 of the input, and the numbers of channels are 256, 160, 64, and 32, respectively. In the channel dimension, a 1 × 1 convolution layer compresses the depth of the high-level feature map to match that of the low-level feature map. In the spatial dimensions, the high-level feature map is up-sampled to the width and height of the low-level feature map. The up-sampled high-level map and the original low-level map are then added element-wise to fuse the high-level and low-level features, and the fused map of each layer is output as a new feature map with a depth of 256. Each layer of the constructed FPN thus integrates high-level and low-level information, carries richer semantic and spatial information, and improves the segmentation performance of the SegFormer network.
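The top-down fusion described above can be sketched in PyTorch as follows. The module name and the bilinear up-sampling choice are assumptions, and the sketch covers only the lateral-projection and element-wise-addition path of the FPN.

```python
import torch.nn as nn
import torch.nn.functional as F

class FPNFusion(nn.Module):
    """FPN-style fusion over the four encoder stages (channels 32, 64, 160, 256;
    strides 4, 8, 16, 32): align depth with 1x1 convolutions, then up-sample
    deeper maps and add them element-wise to the shallower ones."""
    def __init__(self, in_channels=(32, 64, 160, 256), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])

    def forward(self, feats):                      # feats[0] is the shallowest (1/4), feats[3] the deepest (1/32)
        maps = [conv(f) for conv, f in zip(self.lateral, feats)]
        for i in range(len(maps) - 1, 0, -1):      # propagate high-level semantics downwards
            maps[i - 1] = maps[i - 1] + F.interpolate(
                maps[i], size=maps[i - 1].shape[-2:], mode="bilinear", align_corners=False)
        return maps                                # each fused map now has a depth of 256
```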

3. Experiments and Results

3.1. Implementation Details and Evaluation Metrics

3.1.1. Implementation Details

The encoder was pre-trained on the ImageNet-1K dataset, and the decoder was randomly initialized. The hardware and software configuration used for training and testing is shown in Table 3, and the training hyperparameters are listed in Table 4.
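As a rough guide to reproducing the optimization setup in Table 4, the fragment below sketches AdamW with a poly learning-rate schedule in PyTorch. The stand-in model, the poly power, and the iteration budget are assumptions (the paper does not state them), so treat this as a schematic rather than the actual training script.

```python
import torch
import torch.nn as nn

def poly_lr(base_lr, cur_iter, max_iter, power=0.9, min_lr=0.0):
    """'Poly' schedule from Table 4: decay from base_lr toward min_lr."""
    return (base_lr - min_lr) * (1.0 - cur_iter / max_iter) ** power + min_lr

model = nn.Conv2d(3, 5, 1)   # stand-in for ECA-SegFormer (MiT-B0 encoder, modified decoder)
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-5,
                              betas=(0.9, 0.999), weight_decay=0.01)

max_iter = 40_000            # assumed iteration budget; not stated in the paper
for it in range(max_iter):
    for group in optimizer.param_groups:
        group["lr"] = poly_lr(6e-5, it, max_iter)
    # forward pass, cross-entropy loss, loss.backward(), and optimizer.step() go here
```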

3.1.2. Evaluation Metrics

To evaluate the method, we used the metrics commonly used in the field of semantic image segmentation: Intersection over Union (IoU), mean Intersection over Union (mIoU), Pixel Accuracy (PA), and Mean Pixel Accuracy (MPA). The complexity of the model was quantified using the number of parameters (Params) and Floating-point Operations (FLOPs). Let the correct category-labeled image be G and the predicted category-labeled image be P, and let H and W denote the height and width of the label image, respectively. $P_{ij}$ denotes the number of pixels whose actual label is category i and whose predicted label is category j; it is calculated as follows:
$$P_{ij} = \sum_{y=1}^{H} \sum_{x=1}^{W} I\bigl(G(x,y),\, i\bigr) \cdot I\bigl(P(x,y),\, j\bigr)$$
where G(x, y) and P(x, y) denote the pixel values at (x, y) in the correct category-labeled image and the predicted category-labeled image, respectively, and $I(\cdot,\cdot)$ is an indicator function: $I(G(x,y), i)$ equals one if the pixel value of G at (x, y) is category i, and zero otherwise.
IoU is the ratio of the intersection to the union of the true and predicted labels for a specific category, and mIoU is the mean of the IoU over all classes. The mathematical expressions for IoU and mIoU are:
$$\mathrm{IoU}_i = \frac{P_{ii}}{\sum_{j=0}^{k} P_{ij} + \sum_{j=0}^{k} P_{ji} - P_{ii}}$$
$$\mathrm{mIoU} = \frac{1}{k+1} \sum_{i=0}^{k} \mathrm{IoU}_i$$
where k is the largest number representing a valid class label, and k+1 is the total number of classes.
PA is the ratio of the number of pixels correctly predicted to the total number for a specific category. MPA represents the mean of PA for each class. The mathematical expressions for PA and MPA are:
$$\mathrm{PA}_i = \frac{P_{ii}}{\sum_{j=0}^{k} P_{ij}}$$
$$\mathrm{MPA} = \frac{1}{k+1} \sum_{i=0}^{k} \mathrm{PA}_i$$
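These metrics can be computed from a pixel-level confusion matrix whose entry P[i, j] matches the definition of $P_{ij}$ above. The NumPy sketch below is a generic implementation for illustration, not the authors' evaluation code.

```python
import numpy as np

def confusion_matrix(gt: np.ndarray, pred: np.ndarray, num_classes: int) -> np.ndarray:
    """P[i, j] = number of pixels whose true label is i and predicted label is j."""
    valid = (gt >= 0) & (gt < num_classes)
    idx = num_classes * gt[valid].astype(int) + pred[valid].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def segmentation_metrics(P: np.ndarray):
    """Per-class IoU/PA and their means (mIoU, MPA) from the confusion matrix."""
    tp = np.diag(P).astype(float)
    iou = tp / (P.sum(axis=1) + P.sum(axis=0) - tp)     # IoU_i
    pa = tp / P.sum(axis=1)                             # PA_i
    return iou, np.nanmean(iou), pa, np.nanmean(pa)     # IoU, mIoU, PA, MPA

# Example with 5 classes (four diseases plus background) on random labels
gt = np.random.randint(0, 5, size=(512, 512))
pred = np.random.randint(0, 5, size=(512, 512))
iou, miou, pa, mpa = segmentation_metrics(confusion_matrix(gt, pred, num_classes=5))
```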

3.2. Experimental Results and Analysis

3.2.1. Comparison of Different Pyramid Modules

To explore the effect of different pyramid modules on SegFormer's segmentation performance, we inserted two kinds of pyramid modules: the Single Prediction Pyramid (SPP) and the FPN. The SPP is a top-down architecture with skip connections in which predictions are made only on the finest level; the FPN has a similar structure but makes predictions independently at all levels. The comparison results are shown in Table 5. With the SPP, mIoU and MPA decreased by 13.04% and 20.32%, respectively. With the FPN, MPA improved by 2.67%, and the PA of downy mildew, powdery mildew, and angular leaf spot improved significantly, by 4.82%, 5.79%, and 14.31%, respectively. The FPN is therefore more likely to improve the performance of SegFormer in the cucumber leaf disease spot segmentation task, mainly because it extracts richer semantic and contextual information.

3.2.2. Comparison of Different Attention Modules

After choosing the appropriate pyramid module, to further explore the effects of different attention modules, we added eleven attention modules to the FPN-equipped SegFormer: CBAM [38], CoT [39], ECA, ParNet [40], SE, PSA [41], SGE [42], SA [43], SimAM [44], SK [45], and TripA [46]; the corresponding results are listed in Table 6. As seen from Table 6, adding the SE, SGE, SimAM, TripA, or ECA block to SegFormer improves the model's performance to a certain extent, and adding ECA provides the most significant improvement. Specifically, mIoU and MPA increased by 1.47% and 14.55%, respectively. For the four diseases, downy mildew, powdery mildew, target leaf spot, and angular leaf spot, PA increased by 10.36%, 15.49%, 21.8%, and 10.52%, respectively. The main reason is that the ECA block assigns larger weights to the channels that favor disease spot segmentation on cucumber leaves.

3.2.3. Comparison of Different Positions of ECA

To investigate the effect of the ECA insertion position on SegFormer's segmentation performance, we compared three scenarios with ECA inserted at different positions. As shown in Table 7, when ECA was added at either a or c, the mIoU and MPA of network segmentation increased to varying degrees; when ECA was added at b, the mIoU increased while the MPA decreased slightly. The experimental results indicate that ECA can improve the model's performance when placed at the right location. The greatest improvement was obtained with ECA at a, with a 1.47% increase in mIoU and a 14.55% increase in MPA. Compared with b and c, adding the ECA module at a better improves the performance of SegFormer in the cucumber leaf disease spot segmentation task.

3.2.4. Comparison of Different Stages of Using FPN

We compared seven scenarios that use FPN at different stages to explore the impact on SegFormer's segmentation performance. As seen in Table 8, the simultaneous use of FPN in stages 1–2, 2–3, and 3–4 resulted in a significant performance improvement, with a 14.55% increase in MPA and a 1.47% increase in mIoU, reflecting the feature-extraction ability of the FPN.

3.2.5. Comparison of ECA-SegFormer Using Different Hyperparameters

We performed hyperparameter tuning and sensitivity analyses to ensure ECA-SegFormer's robustness and generalizability, and compared the impact of different hyperparameters on its performance to obtain suitable values. As seen in Table 9, when the initial learning rate, dropout ratio, and kernel size were 0.00006, 0.1, and 3, respectively, ECA-SegFormer performed best, with an mIoU of 38.03% and an MPA of 60.86%.

3.2.6. Comparison with Other Segmentation Models

To evaluate the superiority of ECA-SegFormer, we trained and fine-tuned several representative segmentation models on the training and validating sets, and then compared them with ECA-SegFormer on the testing set. The segmentation results, the number of parameters, and the computational cost of each model for cucumber leaf disease spots are shown in Table 10. The mIoU of ECA-SegFormer reached 38.03%, which is 5.93%, 0.53%, 9.52%, 7.01%, 16.63%, and 1.47% higher than those of DeepLabV3+ [47], U-Net [48], PSPNet [49], HRNet [50], SETR, and SegFormer, respectively. The MPA of ECA-SegFormer reached 60.86%, which is 9.61%, 14.82%, 18.83%, 7.11%, 37.08%, and 14.55% higher than those of DeepLabV3+, U-Net, PSPNet, HRNet, SETR, and SegFormer, respectively. ECA-SegFormer therefore had the highest segmentation accuracy. In terms of the model's lightweight property, the Params of ECA-SegFormer were only 4.04 M, which is 50.67 M, 20.85 M, 42.67 M, 25.5 M, and 92.95 M lower than those of DeepLabV3+, U-Net, PSPNet, HRNet, and SETR, respectively. In addition, the FLOPs of ECA-SegFormer were 10.64 G, which is 156.23 G, 441.13 G, 107.79 G, 69.32 G, and 112.77 G lower than those of DeepLabV3+, U-Net, PSPNet, HRNet, and SETR, respectively. Although the Params and FLOPs were slightly higher than those of the original SegFormer, ECA-SegFormer far surpassed SegFormer in segmentation accuracy. The comparison results in Table 10 show that ECA-SegFormer achieves better cucumber leaf disease spot segmentation with low computational cost.

3.2.7. Visualization of Segmentation for Different Scenarios

To further validate the effectiveness of ECA-SegFormer, the testing set was divided into different subsets from different perspectives. The main division criteria were the type of disease, the number of disease types, the clarity of the disease spots, the presence of shelter, and the sparseness of the disease spots. For different disease types, four leaves with different diseases were selected as research objects and the semantic segmentation results of the disease spots were visualized, as shown in Figure 7a. According to the number of disease types, the samples were classified into one, two, and three disease types, and four samples were selected for visualization; the corresponding results are shown in Figure 7b. For the clarity of the disease spots, this study distinguished clear and blurred scenarios and selected two samples to visualize the results. For the presence of shelter, two scenarios, with and without shelter, were considered and two examples were visualized. For the sparseness of the disease spots, the sparse and dense scenarios were distinguished and two samples were selected for visualization, as shown in Figure 7c. The blue markers in Figure 7 indicate disease spots in the original image that were not identified and segmented by the original SegFormer but were correctly segmented by ECA-SegFormer. The yellow markers indicate disease spots in the original image that ECA-SegFormer correctly identified and segmented even though they had not been manually labeled.
The qualitative assessment rendered the following conclusions:
  • ECA-SegFormer can improve the disease spot segmentation effect of cucumber leaves with four disease types in each scenario.
  • For cucumber leaves with different numbers of disease types, the ECA-SegFormer can correctly segment and identify disease spots.
  • In particular, for the dense scenario (the last row in Figure 7c), the SegFormer model cannot accurately segment cucumber leaf disease spots due to the large number and dense adhesion of the disease spots. The ECA-SegFormer had higher accuracy and robustness for segmenting densely connected disease spots, focusing on the most significant components of the information and realizing multi-scale feature fusion.
  • Furthermore, Figure 7 shows that while ECA-SegFormer can segment the spots at the same location as SegFormer, the ECA-SegFormer is more accurate, and the segmented spots overlap more with the actual spots.
  • ECA-SegFormer correctly segments and identifies some disease spots in the original image that are not manually labeled, demonstrating that ECA-SegFormer can reduce the subjective errors caused by manual labeling.

4. Conclusions

An ECA-SegFormer model is proposed to segment cucumber leaf disease spots in the natural environment, which balances the fusion of multi-level features. First, to verify the performance of ECA-SegFormer, a new image dataset of diseased cucumber leaves was constructed. Second, the performance of each module was further verified and analyzed using the ablation experiment. Then, the ECA-SegFormer was compared with the representative segmentation models. Finally, the segmentation results of different scenes were visualized. The experimental results show that the ECA-SegFormer model performs better than SegFormer. The method provides an effective tool for the evaluation of cucumber leaf damage. In future work, we plan to conduct further research to provide more accurate and real-time disease diagnosis in natural environments.

Author Contributions

Conceptualization, R.Y. and H.Y.; methodology, R.Y., Z.H. and R.G.; software, R.Y., Z.H. and R.G.; validation, R.Y., Z.H. and R.G.; formal analysis, Y.G.; investigation, Y.G.; resources, R.Y.; data curation, R.Y.; writing—original draft preparation, R.Y.; writing—review and editing, R.Y.; visualization, R.Y.; supervision, R.Y.; project administration, R.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shanxi Province Basic Research Program Project (Free Exploration) (Grant Nos. 20210302123408, 20210302124523) and the Science and Technology Innovation Fund of Shanxi Agricultural University (Grant No. 2016ZZ11).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

Thanks to the Editor and the anonymous Reviewers for their valuable suggestions to improve the quality of this paper.

Conflicts of Interest

The authors declare that they have no known conflicting financial interest or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Atallah, O.O.; Osman, A.; Ali, M.A.; Sitohy, M. Soybean β-conglycinin and catfish cutaneous mucous p22 glycoproteins deteriorate sporangial cell walls of Pseudoperonospora cubensis and suppress cucumber downy mildew. Pest Manag. Sci. 2021, 77, 3313–3324. [Google Scholar] [CrossRef] [PubMed]
  2. Martinelli, F.; Scalenghe, R.; Davino, S.; Panno, S.; Scuderi, G.; Ruisi, P.; Villa, P.; Stroppiana, D.; Boschetti, M.; Goulart, L.; et al. Advanced methods of plant disease detection. A review. Agron. Sustain. Dev. 2015, 35, 1–25. [Google Scholar] [CrossRef] [Green Version]
  3. Deenan, S.; Janakiraman, S.; Nagachandrabose, S. Image segmentation algorithms for Banana leaf disease diagnosis. J. Inst. Eng. Ser. C 2020, 101, 807–820. [Google Scholar] [CrossRef]
  4. Pugoy, R.A.; Mariano, V. Automated rice leaf disease detection using color image analysis. In Third International Conference on Digital Image Processing; SPIE: Bellingham, WA, USA, 2011; Volume 8009. [Google Scholar] [CrossRef]
  5. Revathi, P.; Hemalatha, M. Classification of cotton leaf spot diseases using image processing edge detection techniques. In Proceedings of the 2012 International Conference on Emerging Trends in Science, Engineering and Technology (INCOSET), Tiruchirappalli, India, 13–14 December 2012; pp. 169–173. [Google Scholar]
  6. Wang, Z.; Wang, K.y.; Pan, S.; Han, Y.y. Segmentation of Crop Disease Images with an Improved K-means Clustering Algorithm. Appl. Eng. Agric. 2018, 34, 277–289. [Google Scholar] [CrossRef]
  7. Zhao, J.; Fang, Y.; Chu, G.; Yan, H.; Hu, L.; Huang, L. Identification of Leaf-Scale Wheat Powdery Mildew (Blumeria graminis f. sp. Tritici) Combining Hyperspectral Imaging and an SVM Classifier. Plants 2020, 9, 936. [Google Scholar] [CrossRef]
  8. Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image Segmentation Using Deep Learning: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 3523–3542. [Google Scholar] [CrossRef] [PubMed]
  9. Jiang, F.; Lu, Y.; Chen, Y.; Cai, D.; Li, G. Image recognition of four rice leaf diseases based on deep learning and support vector machine. Comput. Electron. Agric. 2020, 179, 105824. [Google Scholar] [CrossRef]
  10. Yao, N.; Ni, F.; Wu, M.; Wang, H.; Li, G.; Sung, W.K. Deep Learning-Based Segmentation of Peach Diseases Using Convolutional Neural Network. Front. Plant Sci. 2022, 13, 876357. [Google Scholar] [CrossRef] [PubMed]
  11. Craze, H.A.; Pillay, N.; Joubert, F.; Berger, D.K. Deep Learning Diagnostics of Gray Leaf Spot in Maize under Mixed Disease Field Conditions. Plants 2022, 11, 1942. [Google Scholar] [CrossRef]
  12. Yong, L.Z.; Khairunniza-Bejo, S.; Jahari, M.; Muharam, F.M. Automatic Disease Detection of Basal Stem Rot Using Deep Learning and Hyperspectral Imaging. Agriculture 2023, 13, 69. [Google Scholar] [CrossRef]
  13. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef]
  14. Agarwal, M.; Gupta, S.K.; Biswas, K. A compressed and accelerated SegNet for plant leaf disease segmentation: A differential evolution based approach. In Proceedings of the Advances in Knowledge Discovery and Data Mining: 25th Pacific-Asia Conference, PAKDD 2021, Virtual Event, 11–14 May 2021; pp. 272–284. [Google Scholar]
  15. Yue, Y.; Li, X.; Zhao, H.; Wang, H. Image segmentation method of crop diseases based on improved SegNet neural network. In Proceedings of the 2020 IEEE International Conference on Mechatronics and Automation (ICMA), Beijing, China, 13–16 October 2020; pp. 1986–1991. [Google Scholar]
  16. Wang, C.; Du, P.; Wu, H.; Li, J.; Zhao, C.; Zhu, H. A cucumber leaf disease severity classification method based on the fusion of DeepLabV3+ and U-Net. Comput. Electron. Agric. 2021, 189, 106373. [Google Scholar] [CrossRef]
  17. Jia, Z.; Shi, A.; Xie, G.; Mu, S. Image segmentation of persimmon leaf diseases based on UNet. In Proceedings of the 2022 7th International Conference on Intelligent Computing and Signal Processing (ICSP), Xi’an, China, 15–17 April 2022; pp. 2036–2039. [Google Scholar]
  18. Li, Y.; Qiao, T.; Leng, W.; Jiao, W.; Luo, J.; Lv, Y.; Tong, Y.; Mei, X.; Li, H.; Hu, Q. Semantic Segmentation of Wheat Stripe Rust Images Using Deep Learning. Agronomy 2022, 12, 2933. [Google Scholar] [CrossRef]
  19. Bhujel, A.; Khan, F.; Basak, J.K.; Jaihuni, M.; Sihalath, T.; Moon, B.E.; Park, J.; Kim, H.T. Detection of gray mold disease and its severity on strawberry using deep learning networks. J. Plant Dis. Prot. 2022, 129, 579–592. [Google Scholar] [CrossRef]
  20. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.u.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  21. Duong, L.T.; Le, N.H.; Tran, T.B.; Ngo, V.M.; Nguyen, P.T. Detection of tuberculosis from chest X-ray images: Boosting the performance with vision transformer and transfer learning. Expert Syst. Appl. 2021, 184, 115519. [Google Scholar] [CrossRef]
  22. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 213–229. [Google Scholar]
  23. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  24. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022. [Google Scholar]
  25. Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6881–6890. [Google Scholar]
  26. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090. [Google Scholar]
  27. Wang, F.; Rao, Y.; Luo, Q.; Jin, X.; Jiang, Z.; Zhang, W.; Li, S. Practical cucumber leaf disease recognition using improved Swin Transformer and small sample size. Comput. Electron. Agric. 2022, 199, 107163. [Google Scholar] [CrossRef]
  28. Wu, J.; Wen, C.; Chen, H.; Ma, Z.; Zhang, T.; Su, H.; Yang, C. DS-DETR: A Model for Tomato Leaf Disease Segmentation and Damage Evaluation. Agronomy 2022, 12, 2023. [Google Scholar] [CrossRef]
  29. Reedha, R.; Dericquebourg, E.; Canals, R.; Hafiane, A. Transformer neural network for weed and crop classification of high resolution UAV images. Remote Sens. 2022, 14, 592. [Google Scholar] [CrossRef]
  30. Li, Z.; Chen, P.; Shuai, L.; Wang, M.; Zhang, L.; Wang, Y.; Mu, J. A Copy Paste and Semantic Segmentation-Based Approach for the Classification and Assessment of Significant Rice Diseases. Plants 2022, 11, 3174. [Google Scholar] [CrossRef] [PubMed]
  31. Zhang, X.; Cen, C.; Li, F.; Liu, M.; Mu, W. CRFormer: Cross-Resolution Transformer for segmentation of grape leaf diseases with context mining. Expert Syst. Appl. 2023, 229, 120324. [Google Scholar] [CrossRef]
  32. Hu, Z.; Yang, H.; Lou, T. Dual attention-guided feature pyramid network for instance segmentation of group pigs. Comput. Electron. Agric. 2021, 186, 106140. [Google Scholar] [CrossRef]
  33. Hu, Z.; Yan, H.; Lou, T. Parallel channel and position attention-guided feature pyramid for pig face posture detection. Int. J. Agric. Biol. Eng. 2022, 15, 222–234. [Google Scholar] [CrossRef]
  34. Hu, Z.; Yang, H.; Yan, H. Attention-Guided Instance Segmentation for Group-Raised Pigs. Animals 2023, 13, 2181. [Google Scholar] [CrossRef] [PubMed]
  35. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542. [Google Scholar]
  36. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  37. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023. [Google Scholar] [CrossRef]
  38. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8 September 2018; pp. 3–19. [Google Scholar]
  39. Li, Y.; Yao, T.; Pan, Y.; Mei, T. Contextual transformer networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1489–1500. [Google Scholar] [CrossRef]
  40. Fan, D.P.; Ji, G.P.; Zhou, T.; Chen, G.; Fu, H.; Shen, J.; Shao, L. Pranet: Parallel reverse attention network for polyp segmentation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, 4–8 October 2020; pp. 263–273. [Google Scholar]
  41. Liu, H.; Liu, F.; Fan, X.; Huang, D. Polarized self-attention: Towards high-quality pixel-wise regression. arXiv 2021, arXiv:2107.00782. [Google Scholar] [CrossRef]
  42. Li, X.; Hu, X.; Yang, J. Spatial group-wise enhance: Improving semantic feature learning in convolutional networks. arXiv 2019, arXiv:1905.09646. [Google Scholar] [CrossRef]
  43. Zhang, Q.L.; Yang, Y.B. Sa-net: Shuffle attention for deep convolutional neural networks. In Proceedings of the ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 2235–2239. [Google Scholar]
  44. Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. Simam: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 11863–11874. [Google Scholar]
  45. Li, X.; Wang, W.; Hu, X.; Yang, J. Selective kernel networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 510–519. [Google Scholar]
  46. Zhou, H.; Li, J.; Peng, J.; Zhang, S.; Zhang, S. Triplet Attention: Rethinking the Similarity in Transformers. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Virtual Event, Singapore, 14–18 August 2021; pp. 2378–2388. [Google Scholar]
  47. Chen, L.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
  48. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
  49. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
  50. Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X.; et al. Deep high-resolution representation learning for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3349–3364. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. The disease image and annotated sample. The first row indicates typical disease images, and the second row indicates annotated examples. The red, green, yellow, blue, and black represent downy mildew, powdery mildew, target leaf spot, angular leaf spot, and background, respectively.
Figure 2. Data preprocessing process of disease images in this study. The first row represents the original image, and the second row denotes the images after reducing the resolution size and data-enhancement processing. The first column indicates the images after reducing the resolution size, flipping, and adjusting contrast and saturation; the second column denotes the images after reducing the resolution size, flipping, and adjusting contrast and hue; the third column represents the images after reducing the resolution size, flipping, and adjusting brightness and contrast; and the fourth column denotes the images after reducing the resolution size, adjusting brightness, contrast, and hue.
Figure 3. The SegFormer framework. FFN indicates a Feed-Forward Network. H and W represent the height and width of the original image, respectively. The Transformer Block is the basic structure of the SegFormer backbone network.
Figure 4. ECA-SegFormer network structure.
Figure 5. Diagram of the Efficient Channel Attention module.
Figure 6. The architecture of Feature Pyramid Network. Briefly, 2× represents the up-sampling operation; c i , m i , and p i represent the corresponding feature map of the ith component block of Bottom-up, Top-down, and Stage-output, respectively.
Figure 7. Visualization of segmentation for different scenarios. (a) Different types of leaf disease, (b) Different numbers of disease type, and (c) The clarity of disease spots, the presence of shelter, and the sparseness of disease spots. Original represents the original image. Labeled represents the manually labeled and segmented image.
Table 1. The data-augmentation operations and the corresponding values.
Operation | Value
flip | horizontal flip
brightness | [−32, 32]
contrast | [−0.5, 1.5]
saturation | [−0.5, 1.5]
hue | [−18, 18]
Table 2. MiT-B0 hyperparameters.
Parameter | Value
Channel number | [32, 64, 160, 256]
Num layer | [2, 2, 2, 2]
Num head | [1, 2, 5, 8]
Patch size | [7, 3, 3, 3]
Stride | [4, 2, 2, 2]
Sr ratio | [8, 4, 2, 1]
Expansion ratio | [8, 8, 4, 4]
The channel number is the channel number of the output of each stage; the num layer is the number of encoder layers in each stage; the num head is the head number of the ESA in each stage; the patch size is the patch size of the OPE in each stage; the stride is the stride of the OPE in each stage; the sr ratio is the reduction ratio of the ESA in each stage; the expansion ratio is the expansion ratio of the feed-forward layer in each stage.
Table 3. Hardware and software parameters.
Environment | Item | Value
Hardware environment | CPU | i5-9300H
Hardware environment | GPU | NVIDIA GeForce GTX 1650
Hardware environment | Video memory | 4 GB
Hardware environment | Code base | mmsegmentation
Software environment | OS | Windows 10
Software environment | Python | 3.8
Software environment | PyTorch | 1.8.1
Software environment | CUDA | 10.2
Table 4. Hyperparameters.
Item | Value
Optimizer | AdamW
Initial learning rate | 0.00006
Minimum learning rate | 0.0
Weight decay | 0.01
Beta1 | 0.9
Beta2 | 0.999
Learning strategy | poly
Dropout ratio | 0.1
Kernel | 3
Table 5. Comparison results of different pyramid modules (%).
Module | mIoU | MPA | DPA | PPA | TPA | APA
NONE | 36.56 | 46.31 | 68.72 | 21.28 | 47.95 | 47.32
SPP | 23.52 | 25.99 | 56.08 | 17.70 | 10.52 | 19.67
FPN | 35.75 | 48.98 | 73.54 | 27.07 | 33.71 | 61.63
DPA denotes downy mildew pixel accuracy, PPA denotes powdery mildew pixel accuracy, TPA denotes target leaf spot pixel accuracy, and APA denotes angular leaf spot pixel accuracy. Bold indicates the corresponding optimal value. NONE means we did not add a pyramid module. SPP represents Single Prediction Pyramid, and FPN represents Feature Pyramid Network.
Table 6. Comparative results of the different attention modules (%).
Attention | mIoU | MPA | DPA | PPA | TPA | APA
NONE | 36.56 | 46.31 | 68.72 | 21.28 | 47.95 | 47.32
CBAM | 35.97 | 46.19 | 75.63 | 30.12 | 38.16 | 40.87
CoT | 35.03 | 44.92 | 72.32 | 23.57 | 29.41 | 54.40
ECA | 38.03 | 60.86 | 79.08 | 36.77 | 69.75 | 57.84
ParNet | 29.81 | 36.61 | 71.31 | 26.20 | 14.99 | 33.97
SE | 37.82 | 57.84 | 75.79 | 39.69 | 55.42 | 60.48
PSA | 35.31 | 49.44 | 65.10 | 33.50 | 67.75 | 31.44
SGE | 36.73 | 48.36 | 75.02 | 30.49 | 43.94 | 44.01
SA | 36.37 | 50.37 | 77.19 | 29.34 | 43.38 | 51.57
SimAM | 37.50 | 55.64 | 79.57 | 30.19 | 61.83 | 50.97
SK | 34.64 | 41.66 | 69.03 | 23.03 | 29.87 | 44.74
TripA | 37.55 | 51.64 | 72.09 | 29.32 | 56.01 | 49.14
CBAM denotes Convolutional Block Attention Module; CoT denotes Contextual Transformer Networks; ECA denotes Efficient Channel Attention; ParNet denotes Parallel Reverse Attention Net; SE denotes Squeeze-and-Excitation Network; PSA denotes Polarized Self-Attention; SGE denotes Spatial Group-wise Enhance; SA denotes Shuffle Attention; SimAM denotes A Simple, Parameter-Free Attention Module; SK represents Selective Kernel Network; TripA denotes Triplet Attention. NONE means we did not add an attention module.
Table 7. Comparative results of the different ECA positions (%).
Position | mIoU | MPA | DPA | PPA | TPA | APA
NONE | 36.56 | 46.31 | 68.72 | 21.28 | 47.95 | 47.32
a | 38.03 | 60.86 | 79.08 | 36.77 | 69.75 | 57.84
b | 35.42 | 48.08 | 76.88 | 28.76 | 48.67 | 38.03
c | 36.87 | 52.73 | 76.43 | 30.64 | 56.98 | 46.89
a: at the beginning of the decoder; b: inside the FPN; c: at the end of the decoder. NONE means we did not add an attention module.
Table 8. Comparative results of different stages of using FPN (%).
Stage | mIoU | MPA | DPA | PPA | TPA | APA
NONE | 36.56 | 46.31 | 68.72 | 21.28 | 47.95 | 47.32
3–4 | 36.15 | 44.49 | 64.86 | 25.09 | 47.60 | 40.42
2–3 | 35.63 | 44.15 | 68.29 | 24.36 | 46.35 | 37.60
1–2 | 37.35 | 47.16 | 69.90 | 27.49 | 44.99 | 46.29
3–4 + 2–3 | 36.24 | 53.61 | 73.46 | 28.08 | 51.98 | 60.95
2–3 + 1–2 | 36.13 | 48.75 | 67.36 | 24.00 | 64.60 | 39.05
3–4 + 1–2 | 28.24 | 34.99 | 68.20 | 22.70 | 14.98 | 34.11
1–2 + 2–3 + 3–4 | 38.03 | 60.86 | 79.08 | 36.77 | 69.75 | 57.84
3–4: feature four and feature three; 2–3: feature three and feature two; 1–2: feature two and feature one. Take 1–2 + 2–3 + 3–4 as an example; 1–2 + 2–3 + 3–4 represents stages 3–4, 2–3, and 1–2 using FPN simultaneously. NONE means we did not add a pyramid module.
Table 9. Comparison of ECA-SegFormer using different hyperparameters (%).
Item | Value | mIoU | MPA
Initial learning rate | 0.00003 | 35.80 | 48.73
Initial learning rate | 0.00006 | 38.03 | 60.86
Initial learning rate | 0.00009 | 36.97 | 52.47
Initial learning rate | 0.000006 | 20.16 | 25.58
Initial learning rate | 0.0006 | 17.52 | 20.40
Dropout ratio | 0.1 | 38.03 | 60.86
Dropout ratio | 0.3 | 37.43 | 55.39
Dropout ratio | 0.5 | 35.57 | 48.87
Dropout ratio | 0.7 | 34.21 | 48.70
Kernel | 1 | 35.96 | 47.63
Kernel | 3 | 38.03 | 60.86
Kernel | 5 | 36.10 | 46.84
Kernel | 7 | 36.26 | 53.07
Table 10. Comparison with other segmentation models.
Model | Backbone | mIoU (%) | MPA (%) | Params (M) | FLOPs (G)
DeepLabV3+ | Xception | 32.10 | 51.25 | 54.71 | 166.87
U-Net | Vgg16 | 37.50 | 46.04 | 24.89 | 451.77
PSPNet | Resnet50 | 28.51 | 42.03 | 46.71 | 118.43
HRNet | - | 31.02 | 53.75 | 29.54 | 79.96
SETR | - | 21.40 | 23.78 | 96.99 | 123.41
SegFormer | - | 36.56 | 46.31 | 1.22 | 3.72
ECA-SegFormer | - | 38.03 | 60.86 | 4.04 | 10.64
