Article

RMP-UNet: An Efficient and Lightweight Model for Apple Leaf Disease Segmentation

1 College of Information Engineering, Tarim University, Alaer 843300, China
2 Key Laboratory of Tarim Oasis Agriculture, Tarim University, Ministry of Education, Alaer 843300, China
3 National-Local Joint Engineering Laboratory of High Efficiency and Superior-Quality Cultivation and Fruit Deep Processing Technology on Characteristic Fruit Trees, Technology Innovation Center for Characteristic Forest Fruits in Southern Xinjiang, Alar 843300, China
* Authors to whom correspondence should be addressed.
Agronomy 2025, 15(4), 770; https://doi.org/10.3390/agronomy15040770
Submission received: 4 February 2025 / Revised: 10 March 2025 / Accepted: 20 March 2025 / Published: 21 March 2025
(This article belongs to the Section Pest and Disease Management)

Abstract

Apple is an important and nutrient-rich economic crop that is significantly threatened by leaf diseases, which severely impact yield; timely and accurate diagnosis and segmentation of these diseases are therefore crucial. Traditional segmentation models face challenges such as low segmentation accuracy and excessive model size, limiting their applicability on resource-constrained devices. To address these issues, this study proposes RMP-UNet, an efficient and lightweight model for apple leaf disease segmentation. Based on the traditional UNet architecture, RMP-UNet incorporates an efficient multi-scale attention mechanism (EMA) along with innovative lightweight reparameterization modules (RepECA) and multi-scale feature fusion dynamic upsampling modules (PagDy), optimizing the feature extraction and fusion processes to improve segmentation accuracy while reducing model complexity. The experimental results demonstrate that RMP-UNet outperforms mainstream models across multiple metrics, achieving a mean Intersection over Union (mIoU) of 83.27%, a mean pixel accuracy of 89.84%, a model size of 9.26 M parameters, and a computational complexity of 21.55 G FLOPs. It is therefore suitable for deployment in resource-constrained environments and provides an efficient solution for real-time apple leaf disease diagnosis.

1. Introduction

Apple, as a delicious and nutrient-rich fruit, contains various trace elements and vitamins [1] and is one of the most important economic crops globally [2]. However, during the growth of apple trees, various diseases often occur, posing a serious threat to apple yield. In particular, diseases affecting the leaves [3] require early detection and effective prevention and control measures to minimize losses.
Traditional disease diagnosis methods primarily rely on the experience and visual inspection of farmers or agricultural technicians. Although intuitive, these methods are inefficient and prone to inaccuracies due to the limitations of individual expertise [4]. In recent years, advancements in technology and computer science have provided new solutions for disease segmentation. Current crop image segmentation techniques can be broadly categorized into two types: traditional methods and deep learning-based methods. Traditional segmentation methods include thresholding [5], region-based [6], edge-based [7], and clustering methods [8]. These methods have been widely used in image segmentation, particularly in early crop disease segmentation, and are characterized by high computational efficiency, ease of implementation, and good performance in specific scenarios. However, they also have significant limitations, such as reliance on handcrafted features, limited segmentation accuracy, and sensitivity to noise.
With the rapid development of deep learning, image segmentation methods based on deep learning have demonstrated remarkable advantages in crop disease detection. These methods can automatically learn and extract features from large datasets, significantly improving the accuracy and robustness of disease segmentation. For instance, Sandesh Bhagat et al. [9] proposed Eff-UNet++, a novel method for leaf segmentation and counting, which redesigned skip connections and residual blocks in the decoder to address information degradation, integrating low-level and high-level features to enhance segmentation performance. Ruotong Yang et al. [10] introduced ECA-SegFormer to address the issue of imbalanced feature fusion in SegFormer, improving the semantic segmentation of cucumber leaf spots by incorporating an efficient channel attention mechanism and a feature pyramid network, resulting in a 14.55% and 1.47% improvement in mean pixel accuracy and mean intersection over union (mIoU), respectively. Zhang S et al. [11] addressed the challenges of complex backgrounds and irregular shapes in plant disease leaf images by proposing an improved U-Net (MU-Net) with residual blocks and residual paths to mitigate gradient vanishing and explosion issues. Experiments demonstrated that the model effectively segmented leaves and disease spots with improved accuracy and efficiency. Yuan H et al. [12] proposed an improved DeepLabv3+ network for the more accurate segmentation of black rot spots on grape leaves by replacing the original backbone network with ResNet101 and adding channel attention and feature fusion branches. The method demonstrated higher segmentation performance than the original DeepLabv3+ on two test sets.
In response to the multi-target segmentation demands in complex agricultural scenarios, researchers have further proposed a two-stage segmentation architecture. Chunshan Wang et al. [13] proposed a two-stage model, DUNet, combining DeepLabV3+ and U-Net to classify the severity of cucumber leaf diseases. DeepLabV3+ was used to segment leaves from complex backgrounds, followed by U-Net for further disease spot segmentation. Zhu S et al. [14] proposed a two-stage adaptive loss DeepLabv3+ model to improve apple leaf image segmentation in complex environments. The first stage, Leaf-DeepLabv3+, enhanced the receptive field and reverse attention modules to identify leaf size and edges, while the second stage, Disease-DeepLabv3+, adjusted atrous convolution and channel attention modules to separate disease spots. Experiments showed that the model achieved IoU values of 98.70% and 86.56% for leaf segmentation and disease spot extraction, respectively. Liu B Y et al. [15] proposed a two-stage convolutional neural network for diagnosing apple leaf black spot disease. The first stage used deep learning algorithms with different backbones to segment apple leaves from complex backgrounds, followed by disease area identification on the segmented leaves. Results showed that the PSPNet model with MobileNetV2 as the backbone performed best in leaf segmentation, while the U-Net model with VGG as the backbone performed best in disease area prediction.
Despite the significant achievements in plant disease segmentation, which have greatly improved the accuracy and efficiency of disease diagnosis, these models often come with large sizes and complex computational structures. While such designs facilitate high-precision segmentation on high-performance computing platforms, they also impose high requirements on deployment environments. In particular, on mobile or resource-constrained devices, these large models may fail to operate effectively due to insufficient computational resources or memory limitations. Therefore, to address the issue of excessively large and cumbersome models, this study focuses on exploring lightweight improvements. By optimizing network structures, reducing parameter counts, and adopting more efficient feature extraction methods, we aim to develop an apple leaf disease segmentation model that maintains high accuracy while being lightweight. Such a model would be more suitable for deployment on mobile devices, providing a new solution for real-time and efficient agricultural disease diagnosis.

2. Materials and Methods

2.1. Dataset Construction

The data used in this study were obtained from two sources. The first part was derived from the publicly available PlantVillage dataset [16], which included three common fungal infections affecting apple leaves: apple black rot, apple scab, and apple cedar rust. The second part was collected from the horticultural experimental station of Tarim University, captured using a Xiaomi 14 smartphone (Xiaomi Corporation, Beijing, China), and included apple mosaic disease and apple Alternaria leaf spot. For each disease, 224 images were collected, yielding a dataset comprising 1120 images. To facilitate training, the initial images were resized to 256 × 256 pixels.

2.2. Data Annotation and Augmentation

In this study, the LabelMe software (version 5.5.0) was used to perform pixel-level annotations of the diseased regions on apple leaves. The annotations were saved as JSON files and subsequently converted into 8-bit depth PNG format grayscale images, which served as the annotation files for the original images. The annotated results are shown in Figure 1.
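To make the conversion step concrete, the following is a minimal sketch, not the exact pipeline used in this study, of rasterizing one LabelMe JSON file into an 8-bit grayscale PNG mask; the file names and the class-to-index mapping are illustrative assumptions.

```python
import json
from PIL import Image, ImageDraw

# Hypothetical class-to-index mapping; the real label set follows the dataset.
LABEL_TO_INDEX = {"background": 0, "leaf": 1, "disease": 2}

def json_to_mask(json_path, out_path):
    """Rasterize LabelMe polygon annotations into an 8-bit grayscale PNG mask."""
    with open(json_path, "r", encoding="utf-8") as f:
        ann = json.load(f)
    mask = Image.new("L", (ann["imageWidth"], ann["imageHeight"]), 0)  # "L" = 8-bit grayscale
    draw = ImageDraw.Draw(mask)
    for shape in ann["shapes"]:
        value = LABEL_TO_INDEX.get(shape["label"], 0)
        draw.polygon([tuple(pt) for pt in shape["points"]], fill=value)
    mask.save(out_path)  # one class index per pixel

json_to_mask("leaf_0001.json", "leaf_0001.png")  # illustrative file names
```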
To further enhance the generalization ability and robustness of the model, this study applied various data augmentation techniques to the collected image data. These techniques simulate different shooting conditions and environmental variations, thereby improving the adaptability and stability of the model in practical applications. Specific methods included vertical flipping, horizontal flipping, random brightness adjustment, and noise addition. Through these data augmentation techniques, we successfully generated 4480 augmented images. In the final dataset, the total number of original and augmented images reached 5600. All data were stored in the PASCAL VOC2007 standard format to facilitate subsequent data processing and training. Figure 2 illustrates the original images and their augmented counterparts.
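The four operations can be sketched as follows; the brightness range and noise level are illustrative assumptions rather than the settings used in this study. Applying all four variants to each of the 1120 originals yields exactly the 4480 augmented images reported above.

```python
import numpy as np
from PIL import Image, ImageEnhance

def augment(image):
    """Return the four augmented variants described in the text."""
    variants = [
        image.transpose(Image.FLIP_TOP_BOTTOM),   # vertical flip
        image.transpose(Image.FLIP_LEFT_RIGHT),   # horizontal flip
        # Random brightness adjustment (factor range is an assumption).
        ImageEnhance.Brightness(image).enhance(np.random.uniform(0.7, 1.3)),
    ]
    # Additive Gaussian noise (sigma is an assumption).
    arr = np.asarray(image).astype(np.float32)
    noisy = arr + np.random.normal(0.0, 10.0, arr.shape)
    variants.append(Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8)))
    return variants
```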

2.3. Methods

2.3.1. Improved UNet Network

In this research, the UNet network [17] was employed as the foundational architecture for segmenting apple leaf diseases. UNet is a convolutional neural network framework tailored for pixel-level image analysis, distinguished by its distinctive U-shaped design. The network comprises two primary modules: a feature extraction module and a reconstruction module. The feature extraction module captures hierarchical features from the input image via a sequence of convolutional and downsampling layers, progressively reducing the spatial dimensions. Conversely, the reconstruction module restores the original image size through upsampling operations and cross-layer connections, ultimately producing the segmentation output. Cross-layer connections, a hallmark of UNet, facilitate the direct transfer of feature maps from the extraction module to the corresponding reconstruction layers, enhancing spatial detail retention and boosting segmentation precision. Owing to its straightforward design, robust performance, and adaptability to limited data scenarios, UNet has found extensive applications in areas such as medical imaging analysis, satellite imagery interpretation, and scene understanding.
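For orientation, the sketch below condenses this design to two levels: a downsampling encoder, an upsampling decoder, and the cross-layer concatenation. The channel widths, depth, and class count (here, background plus five diseases) are illustrative; the actual network is deeper.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Two-level UNet: encoder, decoder, and a skip (cross-layer) connection."""
    def __init__(self, n_classes=6):
        super().__init__()
        self.enc1, self.enc2 = conv_block(3, 64), conv_block(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = conv_block(128, 64)          # 128 = 64 upsampled + 64 skipped
        self.head = nn.Conv2d(64, n_classes, 1)  # per-pixel class logits

    def forward(self, x):
        s1 = self.enc1(x)                        # full-resolution features
        s2 = self.enc2(self.pool(s1))            # downsampled, more abstract features
        d1 = self.up(s2)                         # restore spatial size
        d1 = self.dec1(torch.cat([d1, s1], 1))   # cross-layer connection preserves detail
        return self.head(d1)
```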
Despite the notable strengths of UNet regarding architectural design and performance, it still has some limitations. First, the substantial parameter count in UNet demands significant computational resources, hindering its deployment in resource-limited environments, such as mobile or embedded systems. Second, conventional convolutional operations struggle to model long-range dependencies, a critical requirement in disease segmentation tasks where subtle leaf variations must be accurately captured. Lastly, the feature fusion mechanism in UNet relies on simple concatenation, which may result in the loss of essential information and suboptimal segmentation outcomes. To overcome these challenges, this research introduces an enhanced model, RMP-UNet, which incorporates targeted optimizations in both the feature extraction and fusion stages. These advancements not only elevate segmentation accuracy but also maximize model efficiency for apple leaf disease segmentation tasks. The structure of RMP-UNet is depicted in Figure 3.

2.3.2. RepECA

Within the domain of deep learning, the 3 × 3 convolution is a widely adopted operation, particularly in image data processing, due to its effectiveness in extracting image features. However, in resource-constrained environments and complex segmentation tasks, the substantial parameter count and computational demands of stacked 3 × 3 convolutional kernels limit their practicality. To address these limitations, we innovatively propose the RepECA module, which combines RepVit [18] and ECA [19] to optimize this issue.
RepVit is a network architecture that combines the advantages of convolutional neural networks and vision transformers [20]. It substantially decreases the parameter count and computational demands through reparameterization techniques while maintaining strong feature extraction capabilities. However, we observed that RepVit employs the Squeeze-and-Excitation (SE) [21] channel attention mechanism. Although SE enhances model performance, its fully connected layers introduce a substantial number of parameters, especially when the channel count is high, inflating the model's parameter count and reducing efficiency. To address this challenge, we replaced the SE mechanism in the RepVit module with the efficient channel attention (ECA) mechanism. Compared to SE, ECA removes the two fully connected layers and generates channel weights by adaptively selecting the size of 1D convolutional kernels, as illustrated in Figure 4, dramatically lowering both the parameter count and computational requirements. This improvement not only preserves the model's ability to finely calibrate channel features, but also effectively reduces the model's parameter scale and enhances its operational efficiency. The structure of the improved RepECA module is illustrated in Figure 5.
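A minimal sketch of the ECA block that replaces SE inside the RepViT block is given below; the surrounding reparameterized depthwise/pointwise structure of RepViT is omitted. The adaptive kernel-size rule follows Wang et al. [19].

```python
import math
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention: a single 1D convolution over the pooled
    channel descriptor replaces SE's two fully connected layers."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        k = int(abs((math.log2(channels) + b) / gamma))
        k = k if k % 2 else k + 1                 # kernel size must be odd
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        y = self.pool(x)                                      # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(-1, -2))        # 1D conv across channels
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))   # per-channel weights
        return x * y                                          # recalibrate, no FC layers
```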

2.3.3. EMA

To address the insufficient utilization of spatial information during feature extraction in the encoder stage of traditional UNet, we introduced the EMA (efficient multi-scale attention) module [22]. The EMA module is designed to capture multi-scale features and enhance the utilization of spatial information. It divides the input feature map into multiple groups along the channel dimension and performs global average pooling in both horizontal and vertical directions to extract global features. These features are then processed through 1 × 1 and 3 × 3 convolutional branches to generate spatial weight maps. Through a cross-space learning mechanism, the outputs of the 1 × 1 and 3 × 3 branches mutually calibrate each other, further enhancing the utilization of spatial information. Finally, the spatial weight maps generated by the sigmoid function are used to recalibrate the input feature maps, improving the robustness and precision of feature representation. Additionally, the EMA mechanism is lightweight, effectively improving segmentation performance without significantly increasing the number of model parameters. The structure of the EMA module is illustrated in Figure 6.
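The sketch below follows the structure of the published EMA reference implementation [22]; the number of channel groups (factor) is a configurable assumption.

```python
import torch
import torch.nn as nn

class EMA(nn.Module):
    """Grouped multi-scale attention with cross-spatial learning."""
    def __init__(self, channels, factor=8):
        super().__init__()
        self.groups = factor
        self.softmax = nn.Softmax(-1)
        self.agp = nn.AdaptiveAvgPool2d((1, 1))
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # pool along width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # pool along height
        self.gn = nn.GroupNorm(channels // factor, channels // factor)
        self.conv1x1 = nn.Conv2d(channels // factor, channels // factor, 1)
        self.conv3x3 = nn.Conv2d(channels // factor, channels // factor, 3, padding=1)

    def forward(self, x):
        b, c, h, w = x.size()
        g = x.reshape(b * self.groups, -1, h, w)        # split channels into groups
        x_h = self.pool_h(g)                            # (BG, C/G, H, 1)
        x_w = self.pool_w(g).permute(0, 1, 3, 2)        # (BG, C/G, W, 1)
        hw = self.conv1x1(torch.cat([x_h, x_w], dim=2)) # shared 1x1 branch
        x_h, x_w = torch.split(hw, [h, w], dim=2)
        x1 = self.gn(g * x_h.sigmoid() * x_w.permute(0, 1, 3, 2).sigmoid())
        x2 = self.conv3x3(g)                            # 3x3 branch
        # Cross-space learning: each branch calibrates the other.
        a1 = self.softmax(self.agp(x1).reshape(b * self.groups, -1, 1).permute(0, 2, 1))
        a2 = self.softmax(self.agp(x2).reshape(b * self.groups, -1, 1).permute(0, 2, 1))
        f1 = x2.reshape(b * self.groups, c // self.groups, -1)
        f2 = x1.reshape(b * self.groups, c // self.groups, -1)
        weights = (a1 @ f1 + a2 @ f2).reshape(b * self.groups, 1, h, w)
        return (g * weights.sigmoid()).reshape(b, c, h, w)   # recalibrated features
```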

2.3.4. PagDy

In traditional UNet models, the feature fusion stage typically employs direct concatenation. However, this approach not only results in large feature maps, increasing the complexity of subsequent processing, but also tends to cause high-resolution detail features to be overshadowed by low-frequency contextual information, thereby affecting the accuracy of segmentation results. To address this challenge, we propose the PagDy module for more efficient feature fusion. The PagDy module consists of two components: Pag [23] and DySample [24].
The Pag module enables the feature fusion of feature maps with different resolutions by selectively learning high-level semantic information, effectively integrating multi-resolution feature maps and avoiding the issue of high-resolution details being overwhelmed by low-frequency context. The module first applies convolution to two input features to generate feature maps. The low-resolution feature map is then upsampled to match the size of the high-resolution feature map using traditional bilinear interpolation. Subsequently, the feature maps are multiplied through a sigmoid function to obtain weight information, which is then used to weight the original features. Finally, the weighted features are summed to produce the output.
However, we observed that the Pag module uses traditional bilinear interpolation for image restoration, which often fails to precisely preserve edge details in disease regions, reducing segmentation accuracy. To address this issue, we replaced traditional upsampling with the DySample module. DySample is an efficient dynamic upsampling module that dynamically adjusts interpolation weights by learning the characteristics of disease regions, effectively enhancing image resolution while preserving edge details. By adaptively adjusting interpolation weights based on disease region features, DySample better retains edge details during image upsampling. Compared to traditional bilinear interpolation, this strategy significantly improves image resolution while reducing edge blurring, ensuring the precise segmentation of disease regions. Additionally, the use of the DySample module not only improves segmentation accuracy but also reduces the computational complexity of the model. The structure of the PagDy module is illustrated in Figure 7.
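A simplified sketch of the PagDy idea follows: a Pag-style pixel gate fuses the two streams, and a DySample-style upsampler predicts per-pixel sampling offsets instead of using fixed bilinear weights. The offset range, channel handling, and the assumption that both streams share one channel width are illustrative simplifications, not the exact published modules [23,24].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleDySample(nn.Module):
    """Dynamic upsampling sketch: a 1x1 conv predicts small per-pixel offsets,
    and the output is gathered from the input with grid_sample."""
    def __init__(self, channels, scale=2):
        super().__init__()
        self.scale = scale
        self.offset = nn.Conv2d(channels, 2 * scale * scale, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        s = self.scale
        off = self.offset(x).sigmoid() * 0.5 - 0.25          # small offsets (assumed range)
        off = F.pixel_shuffle(off, s)                        # (B, 2, H*s, W*s)
        ys = torch.linspace(-1, 1, h * s, device=x.device)
        xs = torch.linspace(-1, 1, w * s, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack([gx, gy], -1).unsqueeze(0).expand(b, -1, -1, -1)
        grid = grid + off.permute(0, 2, 3, 1) * torch.tensor([2 / w, 2 / h], device=x.device)
        return F.grid_sample(x, grid, align_corners=True)    # content-aware resampling

class PagDy(nn.Module):
    """Pag-style fusion: a per-pixel sigmoid gate blends the dynamically
    upsampled semantic stream with the high-resolution detail stream."""
    def __init__(self, channels, mid):
        super().__init__()
        self.f_high = nn.Conv2d(channels, mid, 1)
        self.f_low = nn.Conv2d(channels, mid, 1)
        self.up = SimpleDySample(channels)

    def forward(self, high, low):
        low_up = self.up(low)                                # dynamic upsampling
        sim = (self.f_high(high) * self.f_low(low_up)).sum(1, keepdim=True)
        gate = torch.sigmoid(sim)                            # per-pixel weight in [0, 1]
        return gate * low_up + (1 - gate) * high             # weighted sum
```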

3. Results

3.1. Experimental Environment Configuration

To ensure the fairness of the experiments, all experiments in this study were conducted under consistent hardware and software conditions. The hardware and software configurations used in the experiments are listed in Table 1.

3.2. Training Settings

In this study, the Adam optimizer was used for model training, with an initial learning rate of 1 × 10⁻⁴. The learning rate was dynamically adjusted using a cosine annealing strategy to optimize the training process and enhance model performance. For the loss function, Cross Entropy Loss was employed to measure the difference between model predictions and ground truth labels. The batch size was set to 8, and the number of epochs was set to 200. The dataset was divided into training, validation, and test sets in a ratio of 6:2:2, corresponding to 4480, 1120, and 1120 images, respectively. This ensures that the model is thoroughly and independently evaluated during the training, validation, and testing phases.
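These settings map onto a standard PyTorch training loop along the following lines; `model` (RMP-UNet) and `train_set` are assumed to be defined, and using the full 200 epochs as the cosine period (T_max) is an assumption.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)                         # assumed: RMP-UNet instance
loader = DataLoader(train_set, batch_size=8, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
criterion = nn.CrossEntropyLoss()

for epoch in range(200):
    model.train()
    for images, masks in loader:
        images, masks = images.to(device), masks.to(device).long()
        optimizer.zero_grad()
        loss = criterion(model(images), masks)   # logits (B, C, H, W) vs labels (B, H, W)
        loss.backward()
        optimizer.step()
    scheduler.step()                             # cosine-annealed learning rate
```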

3.3. Evaluation Metrics

To comprehensively evaluate the performance of the model, we selected four key metrics: Total Parameters, Floating Point Operations (FLOPs), mean Intersection over Union (mIoU), and mean pixel accuracy (mPA). These metrics quantify the model's complexity, computational cost, segmentation accuracy, and pixel-level classification accuracy, respectively.
Total Parameters is a critical metric for assessing model complexity, representing the overall count of trainable parameters in the model. A reduced parameter count implies faster inference speed and lower resource consumption. For a given model, the overall parameter count P is calculated by summing the parameters across all layers, including convolutional layers, fully connected layers, and others, as follows:
P = \sum_{i=1}^{L} \left( W_i \times H_i \times C_i + B_i \right)
where L is the number of layers in the model, W_i and H_i are the spatial dimensions of the weight kernel in the i-th layer, C_i is the number of input channels of the i-th layer, and B_i is the number of bias parameters in the i-th layer.
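In PyTorch, this total can be read off directly rather than computed layer by layer:

```python
def count_parameters(model):
    """Total trainable parameters, in millions as reported in Tables 2 and 3."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6
```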
FLOPs (Floating Point Operations) is a widely used metric to evaluate the computational complexity of deep learning models, indicating the overall count of floating point operations needed for one forward pass. Lower FLOPs indicate higher computational efficiency, which is of great significance for resource-constrained devices. The formula for FLOPs is as follows:
\mathrm{FLOPs} = \sum \left( K^2 \times C_{\mathrm{in}} \times C_{\mathrm{out}} \times H \times W \right)
In the formula, K represents the convolution kernel size, C_in denotes the number of input channels, C_out the number of output channels, and H × W the spatial dimensions of the output feature map; the summation runs over all convolutional layers.
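In practice, FLOPs are usually measured with a profiler rather than summed by hand; the paper does not state which tool was used. One common option is thop (note that it reports multiply-accumulate counts, and conventions differ on whether these are doubled to obtain FLOPs):

```python
import torch
from thop import profile  # pip install thop

dummy = torch.randn(1, 3, 256, 256)               # matches the 256 x 256 input size
macs, params = profile(model, inputs=(dummy,))    # model: the network under test
print(f"MACs: {macs / 1e9:.2f} G, Params: {params / 1e6:.2f} M")
```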
Mean Intersection over Union (mIoU) is one of the most widely used performance metrics in semantic segmentation tasks, measuring the degree of overlap between predicted results and ground truth labels. Here, TP (True Positive) represents the number of pixels correctly predicted as the positive class, FP (False Positive) represents the number of pixels incorrectly predicted as the positive class, and FN (False Negative) represents the number of pixels incorrectly predicted as the negative class. The formula for mIoU is as follows:
\mathrm{mIoU} = \frac{1}{N} \sum_{c=1}^{N} \frac{TP_c}{TP_c + FP_c + FN_c}
Mean pixel accuracy (mPA) is another key metric for evaluating pixel-level classification accuracy in semantic segmentation tasks. It measures the prediction accuracy of the model at the pixel level. The mPA metric takes into account the model’s prediction accuracy for pixels of different classes and calculates the average pixel accuracy across all classes. The formula for mPA is as follows:
\mathrm{mPA} = \frac{1}{N} \sum_{c=1}^{N} \frac{TP_c}{TP_c + FN_c}
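Both metrics follow directly from a per-class confusion matrix, as in this short sketch:

```python
import numpy as np

def metrics_from_confusion(cm):
    """cm[i, j] = number of pixels of true class i predicted as class j.
    Returns (mIoU, mPA) averaged over the N classes."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp            # predicted as class c, but actually another class
    fn = cm.sum(axis=1) - tp            # actually class c, but predicted otherwise
    miou = np.mean(tp / (tp + fp + fn))
    mpa = np.mean(tp / (tp + fn))       # per-class pixel accuracy, then averaged
    return miou, mpa
```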
Through the aforementioned four key metrics, we can comprehensively evaluate the model’s performance from four aspects: model complexity, computational efficiency, segmentation accuracy, and pixel-level classification accuracy. This provides robust support for model evaluation and optimization.

3.4. Results and Analysis

3.4.1. Ablation Experiments

To validate the effectiveness of each module in the improved UNet model, we conducted ablation experiments. First, we performed experiments based on the baseline UNet model and recorded its performance. Subsequently, we sequentially added the RepECA module, EMA module, and PagDy module and finally trained the RMP-UNet model. After each addition, the model was retrained and its performance was evaluated. As shown in Table 2, the ablation results indicate that after adding the RepECA, EMA, and PagDy modules, we obtained three intermediate models: RepECA-UNet, EMA-UNet, and PagDy-UNet. Compared to the baseline model, RepECA-UNet significantly reduced the total number of parameters to 11.34 M and FLOPs to 28.42 G, while improving the mIoU and mPA to 82.73% and 89.80%, respectively. This demonstrates that the RepECA module not only reduces model complexity but also enhances segmentation accuracy and pixel-level classification performance. EMA-UNet maintains nearly identical parameters and comparable computational costs to the baseline, yet achieves improved performance, with an 82.30% mIoU and an 89.33% mPA, demonstrating the EMA module’s effectiveness in enhancing model generalization without substantial computational overhead. PagDy-UNet has a parameter count between the baseline model and RepECA-UNet, with slight improvements in mIoU and mPA, suggesting its positive role in optimizing segmentation results. Finally, the RMP-UNet model, which combines all modules, achieves the most efficient configuration with 9.26 M parameters and 21.55 G FLOPs while attaining the highest mIoU of 83.27% and mPA of 89.84%. These results fully validate the rationality and effectiveness of the RMP-UNet model design.
To visually demonstrate the improvement in the segmentation performance of the proposed RMP-UNet model compared to the baseline UNet, we calculated and plotted the confusion matrices for both models, as shown in Figure 8. The confusion matrices provide a detailed record of the model’s predictions for each class, including True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). Through comparative analysis, we observed that RMP-UNet significantly increased the TP values for most classes while reducing the FP and FN values accordingly. This indicates that RMP-UNet can more accurately identify and segment apple leaf disease regions while minimizing misclassification errors.

3.4.2. Comparative Experiments

To validate the superiority of the RMP-UNet model in the task of apple leaf disease segmentation, we conducted comparative experiments with several mainstream segmentation models. These models include FCN [25], baseline UNet, UNet++ [26], PSPNet [27], SegNet [28], and RMP-UNet, totaling six models. Under the same experimental settings and evaluation metrics, we trained and tested these models and recorded their parameter counts, mIoU, and mPA.
The experimental results demonstrate that the RMP-UNet model significantly outperforms the other models in terms of mIoU and mPA. We plotted the mIoU and mPA curves of the six models during the training process, as shown in Figure 9. From the figure, it is clear that RMP-UNet consistently outperforms the other models in terms of training performance, maintaining a leading position throughout the training process.
The comparative analysis reveals that the RMP-UNet architecture achieves substantial parameter reduction and computational efficiency improvements over conventional models. The optimized design not only enhances segmentation accuracy but also reduces hardware resource demands, demonstrating strong deployment potential in agricultural environments with limited computational resources.
To provide a clearer comparison of the performance of each model, we compiled detailed experimental data, as shown in Table 3.
To visually demonstrate the performance differences among the models in the apple leaf disease segmentation task, we provided segmentation comparison diagrams for the six models (see Figure 10). For clarity, red boxes were used to mark the regions of interest. From the figure, it is evident that the RMP-UNet model excels in segmentation accuracy and edge preservation, with the segmentation results closely matching the ground truth labels. In contrast, the other models exhibit varying degrees of over-segmentation, under-segmentation, or edge blurring issues.

4. Discussion

Apple leaf diseases pose a significant threat to orchard yields, and modern precision agriculture technologies have gradually replaced traditional disease diagnosis methods. In this study, the proposed RMP-UNet model demonstrated exceptional performance in the task of apple leaf disease segmentation. Through ablation experiments, we validated the effectiveness of the RepECA, EMA, and PagDy modules, which played crucial roles in reducing model complexity, improving generalization ability, and enhancing segmentation accuracy. When these modules were integrated into RMP-UNet, the overall performance of the model was significantly improved.
Furthermore, the comparative experimental results confirmed the superiority of the RMP-UNet model. Compared to mainstream segmentation models such as FCN, UNet++, PSPNet, and SegNet, RMP-UNet not only achieved the highest segmentation accuracy (mIoU and mPA) but also had significantly fewer parameters and lower computational complexity (FLOPs), which is particularly important for disease diagnosis in resource-constrained environments.
Although the RMP-UNet model has demonstrated significant achievements in performance and lightweight design, its deployment in real-world natural environments still faces notable challenges. The current training data are primarily constructed in laboratory settings, which significantly differs from the complexity and diversity of field scenarios. For instance, the data fail to adequately capture the dynamic variations in natural lighting and multi-angle characteristics, and the fixed-perspective data collection method is mismatched with the multi-angle, multi-pose shooting modes of mobile devices. Additionally, the lack of real-world background complexities, such as foliage occlusion and multi-object interference, may lead to issues like the mis-segmentation of high-reflectance areas, confusion of similar textures, and missed detection of tiny lesions in field applications, thereby reducing the model’s generalization capability and practical reliability. To address these challenges, future research could focus on constructing datasets that more closely resemble actual field conditions, incorporating diverse lighting conditions and multi-angle differences. Employing data augmentation techniques, such as random lighting variations, perspective transformations, and diffusion model generation, can further enrich the diversity of training data and enhance the model’s adaptability. Lastly, the continuous optimization of lightweight design will reduce computational resource demands, making the model more suitable for field devices. Through these improvements, the RMP-UNet model is expected to achieve more efficient and accurate disease segmentation in real-world environments, providing robust technical support for the intelligent development of precision agriculture.

5. Conclusions

This study proposes RMP-UNet, a lightweight improved UNet model, for the task of apple leaf disease segmentation. By designing the RepECA and PagDy modules and incorporating the EMA module, we effectively addressed the limitations of the traditional UNet model in terms of parameter size and segmentation accuracy. Experimental results demonstrate that the RMP-UNet model excels in total parameter count, mean intersection over union (mIoU), and mean pixel accuracy (mPA). Notably, it delivers superior segmentation accuracy and edge preservation compared to mainstream models while remaining lightweight, with only 9.26 M parameters and 21.55 G FLOPs, and achieves 83.27% mIoU and 89.84% mPA. These results not only validate the effectiveness and superiority of the RMP-UNet model but also highlight its potential for deployment in resource-constrained environments.
Although the RMP-UNet model has achieved remarkable success in terms of performance and lightweight design, there is still room for improvement in segmentation performance under complex background conditions. Future research could focus on further optimizing the model architecture and enhancing its robustness to complex backgrounds to further improve segmentation accuracy and generalization capabilities. In summary, the RMP-UNet model provides a novel solution for the efficient and real-time diagnosis of apple leaf diseases, demonstrating significant application prospects.

Author Contributions

Conceptualization, X.L., W.Z. and C.W.; methodology, X.L., W.Z. and C.W.; validation, X.L., W.Z. and C.W.; formal analysis, X.L., W.Z. and L.H.; investigation, W.Z., L.H., Q.W. and H.W.; resources, X.L., W.Z., J.W. and C.W.; data curation, W.Z., L.H. and Q.W.; writing—original draft preparation, W.Z.; writing—review and editing, X.L. and C.W.; visualization, W.Z.; supervision, X.L. and C.W.; funding acquisition, X.L. and C.W. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Open Project of the Oasis Ecological Agriculture Corps Key Laboratory 202002, the Corps Science and Technology Program 2021CB041, 2021BB023, and 2021DB001, the Innovation Team Project of Tarim University TDZKCX202306 and TDZKCX202102, the National Natural Science Foundation of China 61563046, and the China Agricultural University-Tarim University Joint Scientific Research Fund ZNLH202402.

Data Availability Statement

The data involved in the study can be obtained by contacting the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Patocka, J.; Bhardwaj, K.; Klimova, B.; Nepovimova, E.; Wu, Q.; Landi, M.; Kuca, K.; Valis, M.; Wu, W. Malus domestica: A review on nutritional features, chemical composition, traditional and medicinal value. Plants 2020, 9, 1408.
2. O'Rourke, D. Economic importance of the world apple industry. In The Apple Genome; Springer: Cham, Switzerland, 2021; pp. 1–18.
3. Jiang, P.; Chen, Y.; Liu, B.; He, D.; Liang, C. Real-time detection of apple leaf diseases using deep learning approach based on improved convolutional neural networks. IEEE Access 2019, 7, 59069–59080.
4. Bharate, A.A.; Shirdhonkar, M.S. A review on plant disease detection using image processing. In Proceedings of the 2017 International Conference on Intelligent Sustainable Systems (ICISS), Palladam, India, 7–8 December 2017; IEEE: New York, NY, USA, 2017; pp. 103–109.
5. Kuchipudi, D.P.; Babu, T.R. A review on segmentation of plant maladies and pathological parts from the leaf images in agriculture crop. In Proceedings of the 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT), Kannur, India, 5–6 July 2019; IEEE: New York, NY, USA, 2019; Volume 1, pp. 927–934.
6. Jin, Z.; Han, F. An automatic hierarchical leaf venation segmentation based on region growing. In Proceedings of the 2020 IEEE 5th International Conference on Signal and Image Processing (ICSIP), Nanjing, China, 23–25 October 2020; IEEE: New York, NY, USA, 2020; pp. 124–128.
7. Lomte, S.S.; Janwale, A.P. Plant leaves image segmentation techniques: A review. Artic. Int. J. Comput. Sci. Eng. 2017, 5, 147–150.
8. Tian, K.; Li, J.; Zeng, J.; Evans, A.; Zhang, L. Segmentation of tomato leaf images based on adaptive clustering number of K-means algorithm. Comput. Electron. Agric. 2019, 165, 104962.
9. Bhagat, S.; Kokare, M.; Haswani, V.; Hambarde, P.; Kamble, R. Eff-UNet++: A novel architecture for plant leaf segmentation and counting. Ecol. Inform. 2022, 68, 101583.
10. Yang, R.; Guo, Y.; Hu, Z.; Gao, R.; Yang, H. Semantic segmentation of cucumber leaf disease spots based on ECA-SegFormer. Agriculture 2023, 13, 1513.
11. Zhang, S.; Zhang, C. Modified U-Net for plant diseased leaf image segmentation. Comput. Electron. Agric. 2023, 204, 107511.
12. Yuan, H.; Zhu, J.; Wang, Q.; Cheng, M.; Cai, Z. An improved DeepLab v3+ deep learning network applied to the segmentation of grape leaf black rot spots. Front. Plant Sci. 2022, 13, 795410.
13. Wang, C.; Du, P.; Wu, H.; Li, J.; Zhao, C.; Zhu, H. A cucumber leaf disease severity classification method based on the fusion of DeepLabV3+ and U-Net. Comput. Electron. Agric. 2021, 189, 106373.
14. Zhu, S.; Ma, W.; Lu, J.; Ren, B.; Wang, C.; Wang, J. A novel approach for apple leaf disease image segmentation in complex scenes based on two-stage DeepLabv3+ with adaptive loss. Comput. Electron. Agric. 2023, 204, 107539.
15. Liu, B.Y.; Fan, K.J.; Su, W.H.; Peng, Y. Two-stage convolutional neural networks for diagnosing the severity of Alternaria leaf blotch disease of the apple tree. Remote Sens. 2022, 14, 2519.
16. Mohanty, S.P.; Hughes, D.P.; Salathé, M. Using deep learning for image-based plant disease detection. Front. Plant Sci. 2016, 7, 215232.
17. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015; pp. 234–241.
18. Wang, A.; Chen, H.; Lin, Z.; Han, J.; Ding, G. RepViT: Revisiting mobile CNN from ViT perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 15909–15920.
19. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542.
20. Dosovitskiy, A. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
21. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
22. Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient multi-scale attention module with cross-spatial learning. In Proceedings of the ICASSP 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; IEEE: New York, NY, USA, 2023; pp. 1–5.
23. Xu, J.; Xiong, Z.; Bhattacharyya, S.P. PIDNet: A real-time semantic segmentation network inspired by PID controllers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 19529–19539.
24. Liu, W.; Lu, H.; Fu, H.; Cao, Z. Learning to upsample by learning to sample. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 6027–6037.
25. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
26. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. UNet++: A nested U-Net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support (DLMIA/ML-CDS 2018), Granada, Spain, 20 September 2018; Springer: Cham, Switzerland, 2018; pp. 3–11.
27. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
28. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
Figure 1. Image annotation results.
Figure 2. Original images and augmented images.
Figure 3. Architecture of RMP-UNet.
Figure 4. Architecture of the ECA module.
Figure 5. Architecture of the RepECA module. DW: depthwise convolution.
Figure 6. Architecture of the EMA module.
Figure 7. Architecture of the PagDy module.
Figure 8. Confusion matrices of UNet and RMP-UNet.
Figure 9. mIoU and mPA curves during training.
Figure 10. Segmentation results of different models. (The red box highlights the differences in segmentation results.)
Table 1. Experimental environment configuration.

Hardware Environment
  CPU: Intel Core i9-9900KF (USA)
  GPU: NVIDIA RTX 4070 Ti SUPER (USA)
  RAM: 32 GB
Software Environment
  OS: Windows 10
  CUDA: V12.3
  cuDNN: V8.9.7
  Python: 3.8
  torch: 2.1.0
  torchvision: 0.16.0
Table 2. Ablation study results.

Model         Total Parameters (M)   FLOPs (G)   mIoU (%)   mPA (%)
UNet          17.26                  40.21       81.12      88.64
RepECA-UNet   11.34                  28.42       82.73      89.80
EMA-UNet      17.27                  40.56       82.30      89.33
PagDy-UNet    15.18                  32.98       82.08      89.08
RMP-UNet      9.26                   21.55       83.27      89.84
Table 3. Comparative experimental results.

Model         Total Parameters (M)   FLOPs (G)   mIoU (%)   mPA (%)
UNet          17.26                  40.21       81.12      88.64
FCN           20.11                  22.90       81.40      88.10
UNet++        36.62                  138.68      80.71      87.22
SegNet        29.45                  40.20       76.85      84.93
PSPNet        46.72                  46.23       78.09      86.14
RMP-UNet      9.26                   21.55       83.27      89.84