Article

A Multi-Scale Feature Focus and Dynamic Sampling-Based Model for Hemerocallis fulva Leaf Disease Detection

1 College of Computer and Mathematics, Central South University of Forestry & Technology, Changsha 410004, China
2 College of Information Engineering, Hunan University of Applied Technology, Changde 415500, China
* Author to whom correspondence should be addressed.
Agriculture 2025, 15(3), 262; https://doi.org/10.3390/agriculture15030262
Submission received: 1 January 2025 / Revised: 20 January 2025 / Accepted: 22 January 2025 / Published: 25 January 2025
(This article belongs to the Section Digital Agriculture)

Abstract

Hemerocallis fulva, essential to urban ecosystems and landscape design, faces challenges in disease detection due to limited data and reduced accuracy in complex backgrounds. To address these issues, the Hemerocallis fulva leaf disease dataset (HFLD-Dataset) is introduced, alongside the Hemerocallis fulva Multi-Scale and Enhanced Network (HF-MSENet), an efficient model designed to improve multi-scale disease detection accuracy and reduce misdetections. The Channel–Spatial Multi-Scale Module (CSMSM) enhances the localization and capture of critical features, overcoming limitations in multi-scale feature extraction caused by inadequate attention to disease characteristics. The C3_EMSCP module improves multi-scale feature fusion by combining multi-scale convolutional kernels and group convolution, increasing fusion adaptability and interaction across scales. To address interpolation errors and boundary blurring in upsampling, the DySample module adapts sampling positions using a dynamic offset learning mechanism. This, combined with pixel reordering and grid sampling techniques, reduces interpolation errors and preserves edge details. Experimental results show that HF-MSENet achieves mAP@50 and mAP@50–95 scores of 94.9% and 80.3%, respectively, outperforming the baseline model by 1.8% and 6.5%. Compared to other models, HF-MSENet demonstrates significant advantages in efficiency and robustness, offering reliable support for precise disease detection in Hemerocallis fulva.

1. Introduction

Hemerocallis, a perennial herbaceous species in the Asphodelaceae family [1], is predominantly distributed in temperate and subtropical regions of Asia [2]. Known as the “king of perennials”, Hemerocallis plays a crucial role in ecological protection due to its well-developed root system, drought resistance, and ability to stabilize soil and prevent sand erosion [3]. Additionally, its extracts contain flavonoid compounds with sedative and antidepressant effects [4,5,6], and amid growing concerns over depression and insomnia in the post-pandemic era, these compounds have garnered significant attention [7].
Hemerocallis fulva, valued for its vibrant flowers and adaptability to various environments, is widely used in landscaping and ecological restoration [8]. However, large-scale cultivation has intensified leaf disease issues, damaging plants and threatening other vegetation [9]. This diminishes its ecological roles, such as soil stabilization and water conservation, while impacting urban landscaping and related industries [10]. The exchange of scientific knowledge, such as the integration of Western and Chinese medicinal practices in Pedro de la Piñuela’s Bencao Bu, exemplifies how diverse knowledge systems contribute to modern research [11]. Effective research and detection methods are essential to protect its ecological and economic value and support sustainable development.
Early research on leaf disease detection primarily focused on combining image feature extraction with traditional classifiers, leading to significant advancements in disease diagnosis [12,13]. For instance, Mahmud et al. [14] proposed a strawberry powdery mildew detection method using image texture features along with support vector machine (SVM) and k-nearest neighbor (kNN) classifiers, achieving up to 98.33% accuracy across various validation methods. Xie and He combined hyperspectral images with texture features using the gray-level co-occurrence matrix (GLCM), employing kNN and AdaBoost models for the early classification of eggplant ulcer disease, reaching 88.46% accuracy [15]. Nashrullah et al. [16] introduced a texture feature extraction method based on Gabor filters with an SVM classifier, achieving 98.83% specificity and 94.60% AUC. Ahmad et al. [17] integrated GLCM texture features with SVM classification, attaining 98.79% accuracy in cross-validation. Wang et al. [18] proposed a K-means clustering and Lab color space-based algorithm for wheat leaf disease detection, achieving 90% accuracy. Wu et al. [19] applied hyperspectral imaging, vegetation indices, and texture features combined with machine learning for the early detection of strawberry gray mold. Singh and Kaur used a multi-class SVM classifier for potato leaf disease detection, reaching 95.99% accuracy [20]. Zhao et al. [21] employed hyperspectral imaging and principal component analysis (PCA) with an SVM classifier for diagnosing Schisandra chinensis black spot disease, achieving 92.77% accuracy. Suganya Devi et al. [22] proposed the H2K method, integrating Harris corner detection, HOG features, and kNNs, achieving 97.67% accuracy. Jinling Zhao et al. [23] utilized hyperspectral imaging and SVM for wheat rust identification, achieving 93.33% accuracy after PCA dimensionality reduction.
Despite progress in feature extraction and classifier selection, early methods struggled with complex backgrounds and subtle disease features. Relying on manually designed features, they often missed critical details in overlapping diseases, leading to false positives or missed detections. While effective in simpler tasks, these methods need improvement in robustness and precision for fine-grained disease detection.
Recent advancements in deep learning technologies have significantly advanced leaf disease detection methods. Compared to traditional approaches, deep learning models automatically extract deep features from large-scale annotated data, improving detection accuracy and adaptability in complex environments. Maheswaran et al. [24] utilized convolutional neural networks (CNNs) for paddy leaf disease detection, applying background exclusion based on hue values for pre-processing and extracting disease features through CNNs, achieving high classification accuracy. As a result, deep learning-based object detection has become mainstream, with methods primarily categorized into two types: two-stage and one-stage algorithms. Compared to two-stage models, single-stage algorithms, such as SSD and YOLO, use end-to-end direct prediction, offering faster speeds and lower computational complexity, making them more suitable for real-time detection tasks [25,26,27,28,29,30,31,32,33,34]. For example, Tian et al. [35] proposed the VMF-SSD model, which optimizes multi-scale feature extraction to improve detection performance for small spot diseases, achieving an mAP of 83.19% on the test set. Deari Sabri et al. [36] introduced a hybrid multi-stage model combining YOLO with an enhanced inception network, optimizing disease localization and capture, achieving a detection accuracy of 96.67% on a public rice leaf disease dataset. Jianlong Wang et al. [37] proposed the LCGSC-YOLO method, incorporating the GSConv module to enhance feature fusion adaptability and interaction capability, achieving an mAP of 95.5% on a mixed dataset. Senthil Kumar V. et al. [38] introduced the Bi-FAPN model, using a bidirectional feature attention pyramid network to optimize multi-scale feature extraction, achieving an mAP of 82.8% on the RID dataset. Yuelong He et al. [39] presented the KTD-YOLOv8 model, which integrates a Triplet Attention mechanism to improve multi-scale feature extraction, resulting in a 2.8% increase in accuracy over YOLOv8 and enhanced strawberry leaf disease detection. Zhedong Xie et al. [40] proposed the YOLOv5s-BiPCNeXt model, introducing a multi-scale cross-spatial attention mechanism (EMA) and CARAFE operations to improve multi-scale feature extraction and the capture of detailed information. Zhu Shisong et al. [41] introduced the EADD-YOLO model, which optimizes feature extraction and fusion using ShuffleNet and a coordinate attention module, achieving an mAP of 95.5% on an apple leaf disease dataset. Yan Chunman et al. [42] proposed the FSM-YOLO model, integrating an Adaptive Feature Enhancement module (AFEM) and a Spatial Context-Aware Attention module (SCAA), improving mAP@50 by 2.7%. Akram Abdulah et al. [43] employed the YOLOv8s framework to optimize feature extraction, achieving an mAP of 92.5% and improving tomato leaf disease detection accuracy. Xu Weishi et al. [44] introduced the ALAD-YOLO model, incorporating MobileNet-V3s and a coordination attention mechanism, achieving a 7.9% improvement in accuracy over YOLOv5s for apple leaf disease detection. Bandi et al. [45] proposed a leaf disease detection and stage classification method using YOLOv5 and Vision Transformer (ViT), achieving an F1 score of 0.908 while enhancing detail capture and reconstruction precision through background removal.
Despite these advancements, existing methods still face limitations in complex background and multi-scale disease detection tasks. On the one hand, most research has focused on disease detection in major crops, such as apples and rice, with relatively few studies on horticultural plants, such as Hemerocallis fulva. On the other hand, the stability of current models in handling complex backgrounds and noise interference remains inadequate, particularly when detecting small lesions, where models struggle to accurately capture subtle features. Furthermore, although multi-scale feature extraction and detailed information reconstruction have improved, precise lesion boundary localization and feature fusion still require further optimization.
After analyzing prior research, YOLO has proven to be an effective model for leaf disease detection. YOLOv8 stands out with its speed and accuracy, making it a mainstream choice in this field. Inspired by its performance, we propose a high-precision object detection model for Hemerocallis fulva leaf disease in complex backgrounds to address these issues, with the following contributions:
  • To address the scarcity of Hemerocallis fulva leaf disease data, the Hemerocallis fulva leaf disease dataset (HFLD-Dataset) was created, covering four disease categories collected from the middle reaches of the Yangtze River plain in southern China (April–August 2024), providing comprehensive data for model validation.
  • An improved object detection model, the Hemerocallis fulva Multi-Scale and Enhanced Network (HF-MSENet), was developed to enhance disease detection accuracy under varying lighting conditions, angles, and growth stages. Experimental results confirm its superiority over traditional methods.
  • A Channel–Spatial Multi-Scale Module (CSMSM) is introduced to enhance the model’s ability to focus on and extract features from target regions. By employing a channel–spatial dual attention mechanism and multi-scale feature extraction, it significantly improves the capture of fine-grained information and target region detection.
  • Traditional multi-scale feature fusion methods are limited by poor information interaction, making them ineffective for variations in target size and shape. Upsampling stages often introduce interpolation errors, reducing edge detail and detection accuracy. To address these issues, the C3_EMSCP module enhances multi-scale feature fusion through joint multi-scale and group convolutions. Paired with the DySample module, which adjusts sampling positions using dynamic offsets, this approach improves detail reconstruction, reduces interpolation errors, and enhances edge clarity through pixel reordering and grid sampling.

2. Materials and Methods

2.1. Dataset Construction and Pre-Processing

2.1.1. Description of Study Area and Data Collection

The Hemerocallis fulva leaf disease dataset was collected from the middle reaches of the Yangtze River plain in southern China, a region characterized by a subtropical monsoon climate. This area features a warm and humid climate, flat terrain, and fertile soil, making it ideal for Hemerocallis fulva growth. The abundance of water resources and diverse microclimates provided optimal conditions for collecting a variety of disease samples. Data were collected from three locations in Hunan Province, China: the plantation of the Central South University of Forestry and Technology (28.132 N, 112.994 E), with 636 images; Bafang Park (28.239 N, 112.944 E), with 1380 images; and the Hunan Provincial Botanical Garden (28.103 N, 113.032 E), with 339 images, totaling 2355 images. Data collection took place from April to August 2024, during the rapid growth phase of Hemerocallis fulva, when the leaves were fully expanded and disease symptoms were most prominent. This period ensured the data’s representativeness and provided a scientific basis for disease prevention and control. The geographical locations and environment of the dataset collection sites are shown in Figure 1.
To enhance the diversity and representativeness of the dataset, four smartphones and four cameras were used to capture images from multiple angles, at a distance of 10–20 cm from the leaves. This approach ensured clear visibility of disease features and captured a wide range of image qualities and scenes. Specific device pixel parameters and image quantities are listed in Table 1.

2.1.2. Dataset Annotation

The disease characteristics of Hemerocallis fulva leaves are shown in Figure 2, with all disease instances accurately annotated using LabelImg (version 1.8.0). The annotation data reveal that the instances of Rust, Anthracnose, Leaf Spot, and Leaf Blight number 391, 332, 932, and 1050, respectively. Among these, small targets are most densely distributed, followed by medium and large targets. This distribution reflects the real-world occurrence of disease targets, highlighting the importance of enhancing small-scale feature extraction. Additionally, the distribution of medium and large targets aids in balancing the weight of different scale targets in the training dataset, thereby improving the model’s adaptability and generalization capability. To ensure model generalization, the dataset was divided into training, testing, and validation sets in a 7:2:1 ratio [46,47]. The organized dataset is named “HFLD-Dataset”, and the target box size information is provided in Figure 3.
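As a concrete illustration of the split, a short script of the following form reproduces the 7:2:1 partition; the function name and the fixed seed are illustrative, not part of the published pipeline.

```python
import random

def split_hfld(image_paths, seed=42):
    """Shuffle image paths and split them 7:2:1 into
    training, testing, and validation subsets."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    paths = list(image_paths)
    rng.shuffle(paths)
    n_train = int(0.7 * len(paths))
    n_test = int(0.2 * len(paths))
    train = paths[:n_train]
    test = paths[n_train:n_train + n_test]
    val = paths[n_train + n_test:]       # remaining ~10%
    return train, test, val
```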

2.1.3. Image Enhancement

To enhance the model’s generalization ability and prevent overfitting, various data augmentation strategies were employed to enrich the dataset and address the scarcity of image samples in different environments. These strategies include horizontal flipping to simulate changes in leaf orientation and angle, perspective transformations to replicate morphological variations due to changes in shooting angles or viewpoints, noise addition to simulate the impact of sensor or environmental interference, brightness adjustment to mimic performance under varying lighting conditions, and rotation transformations to simulate variations in leaf growth or camera angles. Each image was augmented five times, with at least one enhancement method applied per instance, ensuring that no single method was used more than once. The effects of these augmentations are illustrated in Figure 4.
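The sampling rule above (five augmented copies per image, each applying at least one method and no method more than once) can be sketched as follows. The torchvision transforms and their parameter values are assumptions chosen for illustration; in a detection pipeline, the bounding-box annotations would need to be transformed alongside the images.

```python
import random
import torch
from torchvision import transforms

# Candidate augmentation methods; parameter values are illustrative.
AUGS = {
    "hflip": transforms.RandomHorizontalFlip(p=1.0),
    "perspective": transforms.RandomPerspective(distortion_scale=0.3, p=1.0),
    "rotate": transforms.RandomRotation(degrees=15),
    "brightness": transforms.ColorJitter(brightness=0.4),
    "noise": transforms.Compose([        # sensor-noise simulation on tensors
        transforms.ToTensor(),
        transforms.Lambda(lambda t: (t + 0.02 * torch.randn_like(t)).clamp(0, 1)),
        transforms.ToPILImage(),
    ]),
}

def augment_five(img):
    """Return five augmented copies of a PIL image; each copy applies a
    random subset of methods (at least one), none repeated within a copy."""
    copies = []
    for _ in range(5):
        out = img
        k = random.randint(1, len(AUGS))             # at least one method
        for name in random.sample(sorted(AUGS), k):  # no method used twice
            out = AUGS[name](out)
        copies.append(out)
    return copies
```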
Data augmentation enhanced dataset diversity, ensuring a representation of various environmental conditions and variations during training. Each augmentation expanded the dataset, balancing sample distribution, mitigating overfitting, and improving model adaptability to complex scenarios. While the instance distribution of diseases is slightly imbalanced due to factors such as the climate, geography, vegetation density, pesticide frequency, and the presence of multiple small disease targets in a single image, the overall scale and diversity of the dataset meet the requirements for model training and validation. The dataset composition is shown in Table 2.

2.2. HF-MSENet Model

One-stage object detection algorithms, such as YOLO, directly predict bounding boxes and class scores, eliminating the additional overhead of two-stage models, which enhances real-time processing efficiency. YOLO outperforms SSD in both speed and accuracy, making it more suitable for real-time detection tasks. YOLOv8, through continuous optimization, further improves performance, with YOLOv8n offering high accuracy, fast detection, and low memory usage, making it ideal for mobile deployment [48]. Its architecture integrates the C2f network, SPPF, and convolutional modules for feature extraction, while PANet is employed for feature fusion. With its efficient design, YOLOv8n excels in both performance and accuracy, providing an effective solution for disease detection in agriculture.
Despite YOLOv8n’s strengths in fast detection and multi-object labeling, it presents notable limitations in the detection of Hemerocallis fulva leaf diseases. First, interference from complex backgrounds often obscures disease features, reducing the model’s ability to focus on critical regions. Second, its multi-scale feature extraction capacity is limited, making it less adaptable to the significant variations in disease region size and shape, particularly in small-scale disease detection. Furthermore, its ability to reconstruct features and capture fine details is insufficient for accurately detecting subtle disease characteristics. To address these issues, several key improvements have been made in this study, leading to the proposal of the enhanced HF-MSENet model. The HF-MSENet model’s architecture is shown in Figure 5.
  • To enhance the model’s ability to focus on target regions and improve multi-scale feature extraction, the CSMSM module is introduced at the backend of the backbone network in ①. This module serves as a critical component in the detection pipeline, optimizing the model’s ability to prioritize and extract features from key regions where disease symptoms are most prominent. This module strengthens the model’s attention on key regions through a channel–spatial dual attention mechanism, while the incorporation of multi-scale feature extraction strategies comprehensively improves the extraction of disease region features.
  • To address the low information interaction efficiency and insufficient cross-scale collaboration in multi-scale feature fusion, the C3_EMSCP module is introduced and optimized at three critical junctions between the backbone and network layers in ②. This strategic placement allows the module to bridge different feature layers, improving the model’s ability to fuse information from diverse resolutions. Using multi-scale convolutional kernels, this module effectively adapts and refines features of different resolutions, while the integration of group convolution structures further enhances computational efficiency and feature fusion depth.
  • To solve the problem of detail loss and blurred object boundaries often caused by traditional upsampling methods, the DySample module is introduced during the upsampling phase in ③. The DySample module is designed to specifically address the challenges posed by fine-grained features and boundary precision in disease detection. This module employs a dynamic offset learning mechanism to adaptively adjust sampling positions, avoiding detail loss due to interpolation errors. Additionally, by using pixel reordering and grid sampling techniques, it optimizes detail retention and improves edge clarity.
The collaborative optimization of feature extraction, multi-scale fusion, and detail reconstruction significantly enhances the model’s ability to focus on target regions, interact across scales, and restore fine details, providing robust support for the accurate detection of Hemerocallis fulva leaf diseases.

2.2.1. CSMSM Module

In the task of Hemerocallis fulva leaf disease detection, disease features often appear in various scales and are accompanied by complex background interference. Existing object detection models exhibit limitations in focusing on key regions and extracting multi-scale features. To address these challenges, this paper proposes the CSMSM (Channel–Spatial Multi-Scale Module), which combines a channel–spatial dual attention mechanism with a multi-scale feature extraction strategy. This approach effectively suppresses background noise and enhances the model’s ability to focus on and extract features from key disease regions.
The design of CSMSM is inspired by improvements in multi-stage modules, incorporating two core components: channel–spatial dual attention and the Multi-Scale Feature Fusion Strategy. The channel–spatial dual attention module, building upon the collaboration of channel and spatial attention mechanisms (such as CBAM and GAM) [49,50], further strengthens feature representation in critical regions while effectively suppressing background noise. The Multi-Scale Feature Fusion Strategy integrates ideas from the SPPF and SPPFCSPC modules, particularly the cross-stage network (CSP) design [51,52]. By employing feature sharing and separation mechanisms [53], it enhances the model’s ability to extract features at multiple scales. The detailed design is illustrated in Figure 6.
  • Channel–Spatial Dual Attention
One of the key components of CSMSM is the channel–spatial dual attention, designed to optimize feature representation, enhance the model’s focus on critical disease regions, and suppress background noise interference. The structures of channel attention and spatial attention are shown in Figure 7.
Specifically, the input feature map first passes through the channel attention module. This module aims to optimize the channel weight distribution of the input feature map, highlighting disease-related features while suppressing irrelevant information. The design of channel attention is inspired by the dynamic allocation of global information. By analyzing the global features of each channel, adaptive weights are generated. In the implementation process, the input feature map is first rearranged along the channel dimension. Then, two fully connected layers (MLPs) compress and restore the feature dimensions, with the ReLU activation function applied for nonlinear mapping. Subsequently, the generated weights are normalized using a Sigmoid function, ensuring that the weights of disease-relevant channels are enhanced, while the weights of redundant background channels are suppressed. The optimized channel-weighted feature map is element-wise multiplied with the original feature map to enhance features along the channel dimension.
Next, the optimized feature map is passed to the spatial attention module. This module focuses on the two-dimensional spatial distribution of the feature map to capture pixel-level local information. In the implementation, spatial attention first uses a 7 × 7 convolution kernel to extract local features, followed by feature optimization through batch normalization (BatchNorm) and the ReLU activation function. Another convolution layer restores the features to their original dimensions, and a Sigmoid function generates the spatial weight map, which significantly enhances the key disease regions in the spatial dimension. This design effectively avoids information loss that may result from traditional pooling operations, performing well in handling irregular disease shapes and fine lesions. Finally, the channel-optimized feature map is element-wise multiplied with the spatial weight map, resulting in a feature map that combines both channel and spatial attention mechanisms.
Throughout this process, channel attention dynamically allocates channel weights to emphasize disease features while suppressing redundant information. Spatial attention strengthens local details in the spatial dimension, improving the focus on target disease regions amidst complex backgrounds. The final combined feature map exhibits an improved hierarchical structure and expressive power.
The overall computation process is outlined in Equations (1)–(3).
$$F_2 = M_c(F_1) \otimes F_1, \quad M_c(F_1) = \mathrm{sigmoid}\left[ K_1 \, \mathrm{ReLU}(w_2 y + b_2) \right]^{T}, \tag{1}$$
$$y = w_1 K_1^{T} + b_1, \tag{2}$$
$$F_3 = M_s(F_2) \otimes F_2, \quad M_s(F_2) = \mathrm{sigmoid}\left[ \mathrm{ConvBN}\left( \mathrm{ConvReLU}(K_2) \right) \right] \tag{3}$$
Here, F1 represents the input feature map, while F2 denotes the output feature map of the channel attention sub-module. w1, w2 and b1, b2 represent the initial weights and biases of the multilayer perceptron (MLP). Mc refers to the channel attention function, F3 is the output feature map, and Ms represents the spatial attention function.
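A minimal PyTorch sketch of this dual attention follows, assuming a GAM-style layout consistent with the description above; the reduction ratio and exact layer shapes are illustrative.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Sketch of channel-spatial dual attention (CBAM/GAM-style)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        # Channel attention, Eqs. (1)-(2): MLP compresses then restores
        # the channel dimension of the rearranged feature map.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention, Eq. (3): 7x7 conv -> BN -> ReLU -> 7x7 conv.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 7, padding=3),
            nn.BatchNorm2d(channels // reduction),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 7, padding=3),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # Rearrange along the channel dimension, apply the MLP, normalize.
        perm = x.permute(0, 2, 3, 1).reshape(b, -1, c)
        w_c = torch.sigmoid(self.mlp(perm)).reshape(b, h, w, c)
        f2 = x * w_c.permute(0, 3, 1, 2)      # F2 = Mc(F1) (x) F1
        # Spatial weight map enhances key regions pixel-wise.
        w_s = torch.sigmoid(self.spatial(f2))
        return f2 * w_s                        # F3 = Ms(F2) (x) F2
```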
  • Multi-Scale Feature Fusion Strategy
The Multi-Scale Feature Fusion Strategy is another core design of CSMSM, aimed at further enhancing the model’s ability to capture multi-scale disease features after attention optimization, while maintaining efficient feature extraction. The overall design of the Multi-Scale Feature Fusion Strategy is shown in Figure 8.
Specifically, the input feature map is divided into two processing branches: one extracts features through traditional convolution operations, while the other gradually expands the receptive field by stacking fixed-size pooling kernels (e.g., 5 × 5), simulating the effect of larger pooling kernels (e.g., 9 × 9 and 13 × 13). This layer-wise stacking approach avoids redundant computations caused by parallel multi-kernel pooling, enabling efficient multi-scale feature extraction. The pooled feature maps are then processed by 1 × 1 convolutions for both dimensionality reduction and expansion, preserving detailed information while reducing computational complexity. Additionally, by combining pooling with residual connections, this strategy ensures continuous and complete feature flow. Ultimately, the Multi-Scale Feature Fusion Strategy integrates pooling and residual mechanisms to capture disease features at different scales while maintaining efficient computational performance.
The calculations for the pooling section are given in Equations (4)–(7).
$$S_1(R) = \mathrm{MaxPool}_{(k=5,\,p=2)}(R), \tag{4}$$
$$S_2(R) = \mathrm{MaxPool}_{(k=5,\,p=2)}\big(S_1(R)\big), \tag{5}$$
$$S_3(R) = \mathrm{MaxPool}_{(k=5,\,p=2)}\big(S_2(R)\big), \tag{6}$$
$$S_4 = S_1 \otimes S_2 \otimes S_3 \tag{7}$$
Here, R represents the input feature layer, while S1, S2, and S3 denote the results of the three stacked pooling stages, whose effective receptive fields correspond to the small, medium, and large pooling kernels, respectively. S4 refers to the final output, k indicates the pooling kernel size, and p denotes the padding. The symbol ⊗ represents tensor concatenation.
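Under these definitions, the pooling branch can be sketched as follows. Following the SPPF convention, the reduced input is concatenated together with the three pooled maps before the 1 × 1 expansion; the channel sizes are illustrative, and the parallel convolution branch and residual connection described above are omitted for brevity.

```python
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    """Sketch of the stacked-pooling branch, Eqs. (4)-(7): three chained
    5x5 max-pools emulate 9x9 and 13x13 receptive fields (SPPF-style)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.reduce = nn.Conv2d(c_in, c_in // 2, kernel_size=1)   # 1x1 reduction
        self.pool = nn.MaxPool2d(kernel_size=5, stride=1, padding=2)
        self.expand = nn.Conv2d(c_in // 2 * 4, c_out, kernel_size=1)

    def forward(self, r):
        r = self.reduce(r)
        s1 = self.pool(r)        # effective 5x5 receptive field
        s2 = self.pool(s1)       # effective 9x9
        s3 = self.pool(s2)       # effective 13x13
        s4 = torch.cat((r, s1, s2, s3), dim=1)   # concatenation, Eq. (7)
        return self.expand(s4)   # 1x1 expansion preserves detail cheaply
```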
Through the stepwise optimization and element-wise multiplication of channel attention and spatial attention, CSMSM enhances the representation of information related to key disease regions while suppressing background noise, thereby improving the prominence of disease areas. Combined with the Multi-Scale Feature Fusion Strategy’s efficient feature sharing and separation mechanism, the hierarchical expression of features is progressively optimized, enabling the efficient extraction of information across different scales. Overall, CSMSM significantly improves the depth and breadth of feature representation, providing high-quality feature support for Hemerocallis fulva leaf disease detection in complex backgrounds.

2.2.2. C3_EMSCP Module

In the task of detecting Hemerocallis fulva leaf diseases, the diversity and irregularity of disease areas present significant challenges to the model’s ability to fuse and interact with multi-scale features. The traditional C2f module, as a lightweight feature extraction unit, expresses features through stacked Bottleneck sub-modules and improves computational efficiency with chunk operations. However, the C2f module is limited in its ability to interact and fuse multi-scale information, making it difficult to fully coordinate the information flow between features of different scales. This limitation restricts the model’s overall understanding and adaptability to complex disease areas. The structure of the C2f module is shown in Figure 9.
The C3_EMSCP module is introduced and optimized to enhance multi-scale information interaction and fusion. By combining multi-scale convolutional kernels with group convolution techniques, it improves feature flow [54,55]. The structure of the C3_EMSCP module is shown in Figure 10.
Specifically, the C3_EMSCP module adopts a dual-path design with shallow and deep branches to preserve local details and extract deep features:
  • Shallow branch: a 1 × 1 convolution quickly compresses the input features and extracts key information, preserving the integrity of the detail features and laying the foundation for subsequent feature fusion.
  • The deep branch, utilizing the stacked Bottleneck_EMSCP structure, enhances feature fusion depth and multi-scale information interaction. In Bottleneck_EMSCP, 1 × 1 convolutions reduce channels and integrate information. The EMSConvP layer applies multi-scale convolution kernels (1 × 1, 3 × 3, 5 × 5, and 7 × 7) and group convolutions, facilitating stepwise convolutions of the grouped input feature map. This enables deep interaction between cross-scale features while minimizing computational redundancy. The Bottleneck_EMSCP structure is illustrated in Figure 11.
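A minimal sketch of the EMSConvP idea follows, assuming an even four-way channel split across the 1 × 1, 3 × 3, 5 × 5, and 7 × 7 kernels; the exact grouping and kernel assignment in the published module may differ.

```python
import torch
import torch.nn as nn

class EMSConvP(nn.Module):
    """Sketch of multi-scale grouped convolution: channels are split into
    groups, each convolved with a different kernel size, then re-fused."""
    def __init__(self, channels, kernel_sizes=(1, 3, 5, 7)):
        super().__init__()
        assert channels % len(kernel_sizes) == 0
        cg = channels // len(kernel_sizes)          # channels per group
        self.branches = nn.ModuleList(
            nn.Conv2d(cg, cg, k, padding=k // 2) for k in kernel_sizes
        )
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        chunks = torch.chunk(x, len(self.branches), dim=1)   # group the input
        out = [conv(c) for conv, c in zip(self.branches, chunks)]
        return self.fuse(torch.cat(out, dim=1))              # cross-scale fusion
```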
In terms of the fusion mechanism, the C3_EMSCP module enhances the interaction and integration of shallow and deep features through the collaborative design of multi-scale convolutional kernels and group convolution techniques. The shallow branch focuses on preserving local detail information from the input features, while the deep branch captures global features from disease areas of different scales using multi-scale convolution kernels and group convolutions. Ultimately, the feature maps from both branches are fused along the channel dimension and further optimized through 1 × 1 convolutions, significantly improving the comprehensive expression of both local and global information.
The core innovation of the C3_EMSCP module lies in the synergistic design of multi-scale convolution kernels and group convolutions. This design effectively addresses the shortcomings of traditional modules in extracting multi-scale disease features, enhancing the multi-scale adaptability and information interaction of features. It not only improves the model’s ability to comprehensively understand disease areas at various scales but also effectively balances computational efficiency and detection performance, providing efficient and stable feature support for the precise detection of complex disease areas.

2.2.3. DySample Module

In the task of detecting Hemerocallis fulva leaf diseases, to address the issues of detail loss and blurred boundaries during feature reconstruction, a DySample dynamic upsampling module was introduced and optimized in the HF-MSENet model. The core of DySample lies in dynamically generating sampling positions based on the content of the input feature map, enabling a more precise restoration of high-resolution details and structural information [56]. By generating offset values through convolutional layers, the module dynamically adjusts the sampling range for each pixel, effectively optimizing the spatial distribution of the feature map. This mechanism significantly enhances the model’s ability to capture subtle disease features, particularly in complex backgrounds, thereby improving detection accuracy and robustness.
Compared to traditional upsampling methods, DySample not only uses point sampling to reduce interpolation errors but also incorporates pixel rearrangement and grid sampling to avoid boundary blurring. This approach generates content-aware upsampling results without requiring additional high-resolution feature inputs, thereby reducing model complexity and computational costs while maintaining high performance. The sampling process is illustrated in Figure 12.
Specifically, given an input feature map X (of size C × H × W) and a point sampling set S (of size 2g × sH × sW, where the 2g channels store the x and y coordinates), the Grid Sample function re-samples X at the positions specified by S to produce the upsampled feature map X′. The computation formula is shown in Equation (8).
$$X' = \mathrm{grid\_sample}(X, S) \tag{8}$$
The offset range of the sampling set can be determined by the dynamic and static scope factors, respectively. The sampling processes of both are illustrated in Figure 13.
  • Dynamic Scope Factor
To improve the upsampling quality, DySample dynamically adjusts sampling positions based on the content and context of the input feature map. By utilizing learnable offsets, DySample enables fine-grained sampling across spatial scales, overcoming the limitations of traditional interpolation methods. The offset learning mechanism allows for the more effective capture of detailed information, enhancing performance in high-resolution image reconstruction and multi-scale object detection tasks. In detecting Hemerocallis fulva leaf diseases, the module’s precise upsampling mechanism effectively captures lesion features at various scales, thereby improving overall model performance. The computational formulas are provided in Equations (9) and (10).
$$O_1 = 0.5\,\sigma\big(\mathrm{linear}_1(X)\big) \times \mathrm{linear}_2(X), \tag{9}$$
$$S_1 = G + O_1 \tag{10}$$
  • Static Scope Factor
The input feature map X (of size C × H × W) is first passed through a linear layer (input channels C; output channels 2gs²), and the resulting tensor is reorganized using a pixel reorganization method to produce an offset tensor O2 (of size 2g × sH × sW). The offset O2 is then added to the original sampling grid G to generate the sampling set S2. The computation process is defined by Equations (11) and (12).
$$O_2 = 0.25\,\mathrm{linear}_3(X), \tag{11}$$
$$S_2 = G + O_2 \tag{12}$$
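Putting Equations (8), (11), and (12) together, a simplified single-group (g = 1) static-scope variant of the sampler might look like this; the pixel-to-normalized offset conversion and border handling are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DySampleLite(nn.Module):
    """Minimal sketch of DySample-style dynamic upsampling
    (static scope factor, g = 1), for scale factor s."""
    def __init__(self, channels, scale=2):
        super().__init__()
        self.scale = scale
        # linear_3 in Eq. (11): predicts 2*s^2 offset channels (x, y per sub-pixel).
        self.offset = nn.Conv2d(channels, 2 * scale * scale, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        s = self.scale
        o = 0.25 * self.offset(x)            # Eq. (11): static scope factor
        o = F.pixel_shuffle(o, s)            # pixel reorganization -> (B, 2, sH, sW)
        # Base sampling grid G in normalized [-1, 1] coordinates, (x, y) order.
        ys = torch.linspace(-1, 1, s * h, device=x.device)
        xs = torch.linspace(-1, 1, s * w, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack((gx, gy), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
        # Eq. (12): S = G + O, with offsets converted to normalized units.
        off = o.permute(0, 2, 3, 1)          # (B, sH, sW, 2)
        norm = torch.tensor([w, h], device=x.device, dtype=x.dtype)
        samp = grid + 2.0 * off / norm
        # Eq. (8): X' = grid_sample(X, S).
        return F.grid_sample(x, samp, mode="bilinear",
                             align_corners=False, padding_mode="border")
```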
The dynamic upsampling mechanism of the DySample module enhances the model’s ability to restore details and capture multi-scale features, while maintaining computational efficiency. It significantly improves detection accuracy and robustness. In high-resolution feature reconstruction and multi-scale object detection tasks, the design of DySample enables HF-MSENet to perform exceptionally well in complex backgrounds.

2.3. Training Environment and Parameter Settings

The experimental hardware specifications, runtime environment, CUDA and cuDNN configurations, as well as the relevant libraries used in constructing the proposed HF-MSENet model for Hemerocallis fulva leaf disease detection, are thoroughly detailed in the following sections, as summarized in Table 3.
The baseline model is defined as the YOLOv8n architecture, which operates without any additional modules or modifications, ensuring a standard framework. Specific training parameters and configurations are comprehensively listed in Table 4.
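Since the baseline is YOLOv8n, a training run of this kind is typically launched through the Ultralytics API. The file names and the image-size/batch settings below are hypothetical placeholders standing in for the modified model definition and the configurations summarized in Tables 3 and 4; the 300-epoch horizon matches the training curves in Section 3.

```python
from ultralytics import YOLO

# "hf_msenet.yaml" and "hfld.yaml" are placeholders for the modified
# architecture and the HFLD-Dataset config, which are not published here.
model = YOLO("hf_msenet.yaml")
results = model.train(
    data="hfld.yaml",   # dataset paths and the four disease class names
    epochs=300,         # training horizon used in the experiments
    imgsz=640,          # assumed input resolution
    batch=16,           # assumed batch size; see Table 4
)
```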

2.4. Performance Evaluation Metrics

The Hemerocallis fulva leaf disease detection model was evaluated using key metrics: precision (P), recall (R), F1 score (F1), mAP@50, and mAP@50–95.
  • P measures prediction accuracy, emphasizing a reduction in false positives.
  • R quantifies target detection, emphasizing a minimization of false negatives.
  • F1 is the harmonic mean of P and R, balancing detection accuracy and sensitivity, especially in imbalanced classes.
  • AP evaluates the model’s detection performance for a single category, reflecting its average precision for that specific class.
  • mAP@50 measures mean detection precision at an IoU threshold of 0.5, offering an overall model assessment.
  • mAP@50–95 calculates mean precision at IoU thresholds of 0.5 to 0.95, evaluating model robustness under varying overlap conditions.
The formulas for these metrics are provided in Equations (13)–(18).
$$\mathrm{Precision} = \frac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalsePositive}} \times 100\% \tag{13}$$
$$\mathrm{Recall} = \frac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalseNegative}} \times 100\% \tag{14}$$
$$AP = \int_{0}^{1} \mathrm{Precision}(\mathrm{Recall}) \, d\mathrm{Recall} \tag{15}$$
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{16}$$
$$mAP@50 = \frac{1}{N} \sum_{i=1}^{N} AP_i \tag{17}$$
$$mAP@50\text{–}95 = \frac{1}{10} \sum_{t \in \{0.50,\,0.55,\,\ldots,\,0.95\}} mAP@t \tag{18}$$
Here, N denotes the number of disease categories, AP_i is the average precision of the i-th category, and mAP@t is the mean average precision computed at IoU threshold t.
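As a small numerical illustration of Equations (17) and (18), given per-class AP values evaluated at each IoU threshold:

```python
import numpy as np

def mean_ap(ap_table):
    """ap_table[i, k]: AP of class i at IoU threshold 0.50 + 0.05*k,
    for k = 0..9. Returns (mAP@50, mAP@50-95)."""
    ap = np.asarray(ap_table, dtype=float)
    map50 = ap[:, 0].mean()    # Eq. (17): average AP_i at IoU = 0.50
    map50_95 = ap.mean()       # Eq. (18): average over classes and thresholds
    return map50, map50_95

# e.g. four classes, ten thresholds each:
# map50, map50_95 = mean_ap(np.random.rand(4, 10))
```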

3. Results

3.1. Experimental Results and Discussion

3.1.1. Ablation Experiment

To comprehensively evaluate the contributions of individual modules to the performance optimization of HF-MSENet, particularly for disease detection in complex backgrounds, a series of ablation experiments was conducted. Modules were sequentially added or removed to assess their independent and synergistic effects on feature extraction, fusion, and reconstruction. The experiments, divided into eight groups, included the baseline model and various combinations of improved modules. The findings provide theoretical support for the design of HF-MSENet. The results of the ablation study are summarized in Table 5.
The experimental results demonstrate that the baseline model exhibited low performance across all metrics, with a particularly low mAP@50–95 of 0.738, indicating its limited ability to detect diseases in complex backgrounds. The introduction of the CSMSM module alone improved P from 0.920 to 0.936 and mAP@50–95 to 0.793, suggesting that this module effectively enhanced target feature representation and suppressed background interference, thereby improving the overall detection performance. When the C3_EMSCP module was introduced, P and F1 scores improved slightly, but R and mAP@50–95 showed limited gains, highlighting its primary role in multi-scale feature fusion with less impact on fine detail optimization. Conversely, the inclusion of the DySample module significantly improved R and mAP@50–95, validating its effectiveness in restoring fine-grained details and optimizing target boundaries.
The combination of CSMSM and DySample modules resulted in synergistic improvements, achieving the highest mAP@50–95, highlighting their complementary strengths in feature extraction and detail recovery. Integrating all modules further enhanced performance, with mAP@50 and mAP@50–95 reaching 0.949 and 0.803, respectively, confirming the independent and synergistic contributions of each module.

3.1.2. Discussion of Ablation Experiment

Heatmap visualizations were used to illustrate the impact of each module on model performance. The heatmaps highlight detection accuracy and model attention in disease regions, with darker colors indicating higher attention. Figure 14 shows the progressive changes as CSMSM, C3_EMSCP, and DySample were introduced sequentially.
  • The baseline model (Figure 14a) shows vague disease localization and misses subtle features due to general extraction techniques, resulting in high background noise and missed detections. Enhancements are needed for improved performance.
  • The CSMSM module (Figure 14b) enhances focus on key disease regions through a channel–spatial dual attention mechanism that prioritizes disease-related features. However, limited feature extraction can lead to missed fine details and slight shifts in detection boundaries as the model emphasizes larger areas.
  • The C3_EMSCP module (Figure 14c) enhances multi-scale feature fusion, allowing the model to capture features at various resolutions. However, due to incomplete fine-grained feature extraction, small disease lesions may be inaccurately located, causing minor shifts in detection positions.
  • The DySample module (Figure 14d) enhances detail recovery and reduces boundary blurring during upsampling. By dynamically adjusting sampling positions, it refines the detection of fine-grained features. Nevertheless, false detections may still occur in complex backgrounds, causing slight shifts in detection locations, particularly in regions with overlapping or subtle disease symptoms.
  • The combination of CSMSM and C3_EMSCP (Figure 14e) enhances feature attention and multi-scale fusion, improving the coverage of disease regions and focus on critical areas. However, smaller targets may still be missed as the model prioritizes larger features, causing variations in detection locations for small lesions.
  • Combining CSMSM and DySample (Figure 14f) improves regional focus and detail recovery, enhancing detection accuracy across target sizes. Although background noise is reduced and boundary clarity is enhanced, minor shifts in detection locations may still occur due to small misdetections in complex backgrounds.
  • The integration of C3_EMSCP and DySample (Figure 14g) demonstrates strong multi-scale fusion and boundary optimization. However, background noise complexity slightly reduces detection accuracy, leading to minor shifts in detection locations, especially for small or irregularly shaped disease lesions.
  • Finally, the integration of CSMSM, C3_EMSCP, and DySample in HF-MSENet (Figure 14h) achieves optimal disease region localization. This approach improves detection accuracy across various scales, significantly reducing false and missed detections. The synergistic effect of these modules enhances detection precision and ensures consistent localization, as demonstrated in Figure 14h.
As shown in Figure 15 and Table 6, the HF-MSENet model outperforms baseline models across all disease categories, demonstrating the synergistic optimization effects of its three modules. For Leaf Blight, mAP@50 increased from 0.964 to 0.972, primarily due to the CSMSM module’s improved focus on key region features, enhancing detection accuracy. For Anthracnose, mAP@50 improved from 0.912 to 0.934, with HF-MSENet generating detection boxes that more closely match the actual disease regions, providing clearer boundaries and more precise coverage. This improvement reflects the enhanced feature extraction of the CSMSM module and the DySample module’s role in boundary optimization and detail recovery. For Leaf Spot, mAP@50 rose from 0.918 to 0.929, reflecting the improved detection of small lesions, thanks to the C3_EMSCP module’s multi-scale feature fusion and interaction, enhancing adaptability to diverse disease regions. For Rust, mAP@50 increased from 0.931 to 0.962, with detection boxes closely matching true disease regions and suppressing background noise. This demonstrates the model’s advantage in key region localization and validates the C3_EMSCP module’s multi-scale adaptation and the DySample module’s effectiveness in detail reconstruction. Overall, mAP@50 for HF-MSENet increased from 0.931 to 0.949, and mAP@50–95 improved from 0.738 to 0.803, reflecting gains of 1.8% and 6.5%, respectively. These results highlight the synergistic effects of the CSMSM, C3_EMSCP, and DySample modules in feature extraction, fusion, and detail recovery, enhancing the accuracy and robustness of Hemerocallis fulva leaf disease detection.
To further validate the performance of HF-MSENet, training curves across different metrics were compared. Figure 16 displays the model’s training curves, with panels (a–d) showing the original model’s curves for F1 Confidence, P Confidence, P-R, and R Confidence and panels (e–h) representing the corresponding curves for HF-MSENet. The F1-Confidence curve shows that HF-MSENet achieves an F1 score of 0.93 at a confidence threshold of 0.261, surpassing the original model’s score of 0.91 at 0.268, indicating a better balance between P and R. In the Precision–Confidence curve, HF-MSENet reaches a precision value of 1.00 at a confidence of 0.921, while the original model reaches the same value at a lower confidence of 0.905, highlighting HF-MSENet’s advantage in reducing false positives. The Precision–Recall curve further demonstrates that HF-MSENet maintains a higher precision at various recall rates. In the R-Confidence curve, HF-MSENet exhibits a higher recall in the low-confidence region, achieving a recall of 0.96 at a confidence of 0.000, compared to the original model’s recall of 0.95. These results confirm the robustness and superior performance of HF-MSENet in complex backgrounds and multi-disease detection.
Figure 17a illustrates the accuracy changes of different models across training epochs. The baseline model shows slow accuracy improvements with lower values at each epoch. In contrast, models integrating improved modules exhibit varying levels of enhancement. The CSMSM module leads to rapid early-stage growth, enhancing the focus on key regions and feature extraction. The C3_EMSCP module improves performance in the middle and late stages by optimizing multi-scale feature fusion. The DySample module accelerates growth in the later stages, reducing detail loss. When combining modules, rapid early-stage growth and higher final accuracy are observed, suggesting synergistic improvements. HF-MSENet, integrating all modules, achieves the highest accuracy at stabilization. Figure 17b presents a comparison of the mAP@50 values across different models. Similarly, the values increase with training epochs, showing rapid early growth followed by slower increases. The baseline model has lower mAP@50 values with slower growth, while models incorporating CSMSM enhance target detection accuracy. C3_EMSCP improves growth in the middle and late stages, optimizing multi-scale feature fusion, while DySample accelerates growth in the later stages by improving boundary and detail handling. Combinations of two modules lead to significant performance improvements, with HF-MSENet, which integrates all module advantages, achieving the highest mAP@50 value at stabilization. Both figures demonstrate that the introduction of specific modules and their combinations improves model performance, with HF-MSENet showing a clear overall advantage.
A comparison of confusion matrices shows that HF-MSENet significantly outperforms the baseline model in disease classification. The false positive rate for Leaf_Blight decreased, and the detection accuracy for Anthracnose and Leaf_Spot improved from 0.87 and 0.89 to 0.93 and 0.92, respectively. The false positive rate in the background notably reduced. Additionally, the detection accuracy for Rust increased from 0.93 to 0.94, with the background false positive rate remaining stable. These results demonstrate that optimization through the CSMSM, C3_EMSCP, and DySample modules enhanced the model’s robustness, reduced background noise, and strengthened disease classification and localization in complex backgrounds. The training confusion matrix comparison is shown in Figure 18.
In summary, HF-MSENet demonstrates exceptional performance across multiple evaluation metrics. Ablation studies confirmed the critical role of the CSMSM, C3_EMSCP, and DySample modules in feature extraction, fusion, and detail reconstruction. Heatmap visualizations illustrated the model’s precise focus on disease regions and its ability to suppress background interference. Detection accuracy and metric curve analysis highlighted its excellent learning efficiency and high detection precision. Furthermore, the confusion matrix validated the model’s robustness and low false positive rate in classification and localization tasks. Overall, HF-MSENet excels in all aspects of disease detection, fully validating its effectiveness and practicality.

3.1.3. Comparative Experiment

To comprehensively evaluate the performance of the HF-MSENet model in detecting Hemerocallis fulva leaf diseases, several widely validated object detection models were selected for comparison. These models served as reference benchmarks, offering a reliable basis for assessing HF-MSENet’s performance and highlighting its advantages in complex backgrounds and multi-scale disease detection. The detection results for each model on the validation set are shown in Table 7.
Experimental results show that Faster R-CNN exhibited relatively low detection performance, primarily due to interference from background noise during feature extraction and region proposal generation. This limited its effectiveness in detecting subtle diseases and handling multi-scale tasks. The SSD model showed acceptable performance in terms of precision and mAP@50, but its mAP@50–95 was only 0.473, highlighting its limitations in detecting subtle diseases and achieving high precision. Among the YOLO series, YOLOv5n and YOLOv6n provided a better balance between precision and recall. However, YOLOv7n showed a decrease in accuracy, with an mAP@50 of only 0.853, indicating difficulties in detecting disease regions. YOLOv8n performed the best, with an mAP@50 of 0.931, but its mAP@50–95 was only 0.738, revealing limitations in the reconstruction of subtle disease features. YOLOv9t, YOLOv10n, and YOLOv11n demonstrated limited improvements in high-precision detection, with only slight increases in mAP@50–95, maintaining a moderate overall performance. In contrast, the proposed HF-MSENet outperformed all models in key metrics, achieving the highest performance, with a P of 0.936, an F1 score of 0.931, an mAP@50 of 0.949, and a breakthrough mAP@50–95 of 0.803. These results clearly demonstrate the significant synergistic optimization of HF-MSENet’s modules, confirming its superior performance in detecting Hemerocallis fulva leaf diseases.

3.1.4. Discussion of Comparative Experiment

To evaluate model performance during training, mAP@50 variation curves were plotted at 10-epoch intervals to highlight convergence speed and final accuracy (Figure 19). Radar charts based on P, R, F1 score, mAP@50, and mAP@50–95 were also generated to illustrate the overall performance distribution (Figure 20).
Figure 19 shows that HF-MSENet consistently outperforms all comparison models. During the early and mid-training stages (epochs 10 to 150), its mAP@50 curve increases rapidly, demonstrating high learning efficiency and effective feature extraction for Hemerocallis fulva leaf diseases. By epoch 50, HF-MSENet achieves an mAP@50 of 0.835, surpassing YOLOv8n (0.794), YOLOv5n (0.711), and YOLOv6n (0.739), highlighting the effectiveness of the CSMSM and C3_EMSCP modules in feature extraction and multi-scale fusion. In contrast, Faster R-CNN shows the weakest initial performance, with an mAP@50 of only 0.260 at epoch 10 and significant fluctuations during early training, indicating slower convergence and a limited learning capacity. While SSD outperforms Faster R-CNN slightly, its overall performance remains inferior to the YOLO models, particularly in the mid-training phase (epochs 50 to 150), where stagnation is observed. YOLOv8n achieves a higher initial mAP@50 of 0.470 at epoch 10, but its growth slows, reaching 0.917 at epoch 300, which still lags behind HF-MSENet. YOLOv5n and YOLOv6n show similar trends, with final mAP@50 values of 0.888 and 0.884, respectively, both lower than HF-MSENet.
Throughout training, HF-MSENet not only converges more rapidly but also achieves greater stability in later stages (post-150 epochs). By epoch 300, its mAP@50 reaches 0.934, significantly outperforming all other models, underscoring its robustness and superior performance under complex conditions.
As shown in the radar chart in Figure 20, HF-MSENet occupies the largest area, forming a highly balanced pentagonal structure. This indicates outstanding performance across all metrics, particularly in mAP@50 and mAP@50–95, where it achieves 0.949 and 0.803, respectively, significantly outperforming other models. This demonstrates its strong disease detection capability and robustness.
In contrast, the radar chart for Faster R-CNN shows a clear asymmetry, with relatively low P and F1 values of 0.314 and 0.433, respectively. This reflects its vulnerability to noise interference in complex backgrounds, making it difficult to accurately detect disease regions. SSD’s radar chart exhibits a “one-sided shift” structure, where P reaches 0.862 but mAP@50–95 is only 0.473, exposing its limitations in high-precision tasks.
Among the YOLO series models, YOLOv5n, YOLOv6n, and YOLOv8n have radar charts that are relatively close to the shape of HF-MSENet, with high P, R, and mAP@50 values. For example, YOLOv8n achieves an mAP@50 of 0.931, which is close to HF-MSENet, but its mAP@50–95 is only 0.738, indicating that its performance in high-precision tasks remains slightly inferior. Later versions, such as YOLOv9t, YOLOv10n, and YOLOv11n, show improvements in certain metrics but exhibit noticeable fluctuations in the overall radar chart shape, failing to surpass HF-MSENet.
HF-MSENet’s radar chart is highly balanced with no significant contraction. It achieves the highest values in P, R, and F1, reaching 0.936, 0.926, and 0.931, respectively. This further validates its exceptional performance in precise localization and high detection accuracy. The significant advantage in mAP@50 and mAP@50–95 highlights the model’s ability in feature extraction and multi-scale fusion in complex backgrounds. Overall, the balanced radar chart and leading metrics confirm HF-MSENet’s practical value in complex disease detection.

3.2. Discussion on Limitations and Future Directions

Despite the significant success of the HF-MSENet model in detecting Hemerocallis fulva leaf diseases, several limitations remain. Firstly, the current dataset, HFLD-Dataset, does not fully encompass all variations of Hemerocallis fulva leaf diseases, restricting the model’s generalization ability, particularly in new scenarios with unseen disease samples or novel disease variations. Secondly, although the model performs excellently, its complex network architecture increases computational requirements, limiting its deployment and real-time application in mobile and embedded devices.
In comparison with other related research, most existing studies have primarily focused on major crops such as apples and rice, while our research targets horticultural plants like Hemerocallis fulva. There is a lack of direct comparison in terms of model performance on different crop types. For example, in the detection of diseases in wheat, maize, and soybeans, other models might have different levels of performance and face distinct challenges. This indicates that there is a need for further exploration of the applicability of our model to a wider range of crops.
Future research should explore the following directions. First, expanding the dataset in terms of size and diversity to include samples from various climates, regions, and new disease variations would enhance the model’s adaptability and accuracy in diverse environments. Second, investigating model optimization techniques such as pruning and knowledge distillation could improve detection efficiency and enable real-time applications. Additionally, it would be valuable to test the HF-MSENet model on other important crops like wheat, maize, and soybeans to evaluate its performance and potential for generalization. This would provide a more comprehensive understanding of the model’s capabilities and its potential to contribute to the field of automated disease recognition across different agricultural plants.
In conclusion, HF-MSENet demonstrates clear advantages in disease region focus, feature fusion, and detection accuracy, providing an effective solution for Hemerocallis fulva disease detection. Future efforts to expand the dataset, implement lightweight designs, and incorporate multi-modal fusion are expected to drive its broader deployment in practical applications, offering valuable technical support for Hemerocallis fulva health management and the development of smart agriculture.

4. Conclusions

To address the challenges of detecting Hemerocallis fulva leaf diseases in complex backgrounds and multi-scale scenarios, this study proposed HF-MSENet, a high-precision object detection model. Using the high-quality HFLD-Dataset and innovative modules (CSMSM, C3_EMSCP, and DySample), the model enhances attention to critical regions, multi-scale feature fusion, and detailed reconstruction, greatly improving accuracy and robustness.
Experimental results show that HF-MSENet achieved accuracy, recall, and mAP@50 values of 93.6%, 92.6%, and 94.9% on the HFLD-Dataset, surpassing the baseline by 1.6%, 2.5%, and 1.8%, respectively. The model excelled in mAP@50–95, reaching 80.3%, a 6.5% improvement. Heatmap visualizations validated the CSMSM module’s focus on key disease regions, the C3_EMSCP module’s effectiveness in multi-scale feature fusion, and the DySample module’s enhancement of fine-grained feature reconstruction. Additionally, the model improved disease localization and detection accuracy, reducing false positives and missed detections across disease types.
Compared to mainstream object detection models, such as YOLOv8n and Faster R-CNN, HF-MSENet achieved superior performance in terms of detection precision, recall, and high IoU threshold metrics, demonstrating its efficiency and applicability.
Overall, HF-MSENet maintains high detection accuracy while optimizing computational efficiency through modular design, making it highly applicable. This model provides a reliable solution for the precise detection and classification of Hemerocallis fulva leaf diseases and shows great potential for real-world disease monitoring and plant protection systems, driving advancements in horticultural health management and intelligent agricultural technologies.

Author Contributions

Conceptualization, T.W.; methodology, T.W. and J.L. (Jianjun Li); software, H.X.; validation, T.W., H.X. and J.L. (Jianjun Li); formal analysis, J.X.; investigation, T.W. and J.X.; resources, J.L. (Jianjun Li); data curation, J.X.; writing—original draft preparation, T.W.; writing—review and editing, J.L. (Jianjun Li); visualization, T.W.; supervision, J.L. (Junwan Liu); project administration, J.L. (Junwan Liu); funding acquisition, J.L. (Junwan Liu). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Hunan Provincial Natural Science Foundation under the project "Dongting Lake Forest Structure and Ecosystem Services Scale Relationship and Optimization Model", grant number 2022JJ31000.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The dataset is available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hirota, S.K.; Yasumoto, A.A.; Nitta, K.; Tagane, M.; Miki, N.; Suyama, Y.; Yahara, T. Evolutionary history of Hemerocallis in Japan inferred from chloroplast and nuclear phylogenies and levels of interspecific gene flow. Mol. Phylogenetics Evol. 2021, 164, 107264. [Google Scholar] [CrossRef] [PubMed]
  2. Li, S.; Ji, F.; Hou, F.; Shi, Q.; Xing, G.; Chen, H.; Weng, Y.; Kang, X. Morphological, palynological and molecular assessment of Hemerocallis core collection. Sci. Hortic. 2021, 285, 110181. [Google Scholar] [CrossRef]
  3. Bortolini, L.; Zanin, G. Hydrological behaviour of rain gardens and plant suitability: A study in the Veneto plain (north-eastern Italy) conditions. Urban For. Urban Green. 2018, 34, 121–133. [Google Scholar] [CrossRef]
  4. Szewczyk, K.; Kalemba, D.; Miazga-Karska, M.; Krzemińska, B.; Dąbrowska, A.; Nowak, R. The essential oil composition of selected Hemerocallis cultivars and their biological activity. Open Chem. 2019, 17, 1412–1422. [Google Scholar] [CrossRef]
  5. Liang, Y.; Huang, R.; Chen, Y.; Zhong, J.; Deng, J.; Wang, Z.; Wu, Z.; Li, M.; Wang, H.; Sun, Y. Study on the sleep-improvement effects of Hemerocallis citrina Baroni in Drosophila melanogaster and targeted screening to identify its active components and mechanism. Foods 2021, 10, 883. [Google Scholar] [CrossRef]
  6. Li, X.; Jiang, S.; Cui, J.; Qin, X.; Zhang, G. Progress of genus Hemerocallis in traditional uses, phytochemistry, and pharmacology. J. Hortic. Sci. Biotechnol. 2022, 97, 298–314. [Google Scholar] [CrossRef]
  7. Sandri, E.; Werner, L.U.; Bernalte Martí, V. Lifestyle Habits and Nutritional Profile of the Spanish Population: A Comparison Between the Period During and After the COVID-19 Pandemic. Foods 2024, 13, 3962. [Google Scholar] [CrossRef] [PubMed]
  8. Li, L.; Qu, Y.-t.; Han, H.; Tang, H.-w.; Chen, F.; Xiong, Y. Effects of Plant Growth Regulators on Adventitious Bud Induction and Proliferation of Hemerocallis fulva; Northeast Forestry University: Harbin, China, 2021. [Google Scholar]
  9. Yu, Y.; Hu, J.; Wa, J.; Zhang, Z. The control effect of combination of fertilizer and medicine on daylily leaf streak of Hemerocallis fulva. J. Technol. 2023, 23, 177–181. [Google Scholar]
  10. Zhao, T.-R.; Xu, Z.-H.; Zhang, C.-H.; Wang, J.-J.; Guo, F.-Q.; Ye, Q.-M. Evaluation on Waterlogging Tolerance of Hemerocallis Fulva in Field; Jiangxi Academy of Agricultural Sciences: Nanchang, China, 2021. [Google Scholar]
  11. Ye, J. Pedro de la Piñuela’s Bencao Bu and the Cultural Exchanges between China and the West. Religions 2024, 15, 343. [Google Scholar] [CrossRef]
  12. Dhingra, G.; Kumar, V.; Joshi, H. Study of digital image processing techniques for leaf disease detection and classification. Multimed. Tools Appl. 2018, 77, 19951–20000. [Google Scholar] [CrossRef]
  13. Keivani, M.; Mazloum, J.; Sedaghatfar, E.; Tavakoli, M. Automated analysis of leaf shape, texture, and color features for plant classification. Trait. Du Signal 2020, 37, 17–28. [Google Scholar] [CrossRef]
  14. Mahmud, M.S.; Chang, Y.K.; Zaman, Q.U.; Esau, T.J. Detection of strawberry powdery mildew disease in leaf using image texture and supervised classifiers. In Proceedings of the CSBE/SCGAB 2018 Annual Conference, Guelph, ON, Canada, 22–25 July 2018; pp. 22–25. [Google Scholar]
  15. Xie, C.; He, Y. Spectrum and image texture features analysis for early blight disease detection on eggplant leaves. Sensors 2016, 16, 676. [Google Scholar] [CrossRef]
  16. Nashrullah, F.H.; Suryani, E.; Salamah, U.; Prakisya, N.P.; Setyawan, S. Texture-Based Feature Extraction Using Gabor Filters to Detect Diseases of Tomato Leaves. Rev. D’intelligence Artif. 2021, 35, 331. [Google Scholar]
  17. Ahmad, N.; Asif, H.M.S.; Saleem, G.; Younus, M.U.; Anwar, S.; Anjum, M.R. Leaf image-based plant disease identification using color and texture features. Wirel. Pers. Commun. 2021, 121, 1139–1168. [Google Scholar] [CrossRef]
  18. Wang, M.; Guo, S.; Niu, X. Detection of Wheat Leaf Disease. Int. J. Appl. Res. 2015, 6, 1669–1675. [Google Scholar]
  19. Gangshan, W.; Yinlong, F.; Qiyou, J.; Ming, C.; Na, L.; Yunmeng, O.; Zhihua, D.; Baohua, Z. Early identification of strawberry leaves disease utilizing hyperspectral imaging combing with spectral features, multiple vegetation indices and textural features. Comput. Electron. Agric. 2023, 204, 107553. [Google Scholar]
  20. Aditi, S.; Harjeet, K. Potato Plant Leaves Disease Detection and Classification using Machine Learning Methodologies. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1022, 012121. [Google Scholar]
  21. Sen, Z.; Yun, F.; Jiang-nan, C.; Ye, L.; Xu-dong, D.; Yong-liang, L. Application of hyperspectral imaging in the diagnosis of acanthopanax senticosus black spot disease. Spectrosc. Spectr. Anal. 2021, 41, 1898–1904. [Google Scholar]
  22. Devi, K.S.; Srinivasan, P.; Bandhopadhyay, S. H2K–A robust and optimum approach for detection and classification of groundnut leaf diseases. Comput. Electron. Agric. 2020, 178, 105749. [Google Scholar] [CrossRef]
  23. Zhao, J.; Fang, Y.; Chu, G.; Yan, H.; Hu, L.; Huang, L. Identification of leaf-scale wheat powdery mildew (Blumeria graminis f. sp. Tritici) combining hyperspectral imaging and an SVM classifier. Plants 2020, 9, 936. [Google Scholar] [CrossRef] [PubMed]
  24. Maheswaran, S.; Sathesh, S.; Rithika, P.; Shafiq, I.M.; Nandita, S.; Gomathi, R. Detection and classification of paddy leaf diseases using deep learning (CNN). In Proceedings of the International Conference on Computer, Communication, and Signal Processing, Chennai, India, 24–25 February 2022; pp. 60–74. [Google Scholar]
  25. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. pp. 21–37. [Google Scholar]
  26. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  27. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
  28. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  29. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  30. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO Series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
  31. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
  32. Wang, C.-Y.; Yeh, I.-H.; Mark Liao, H.-Y. Yolov9: Learning what you want to learn using programmable gradient information. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 1–21. [Google Scholar]
  33. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. Yolov10: Real-time end-to-end object detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]
  34. Khanam, R.; Hussain, M.J. Yolov11: An overview of the key architectural enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar]
  35. Tian, L.; Zhang, H.; Liu, B.; Zhang, J.; Duan, N.; Yuan, A.; Huo, Y. VMF-SSD: A Novel V-Space Based Multi-Scale Feature Fusion SSD for Apple Leaf Disease Detection. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022, 20, 2016–2028. [Google Scholar]
  36. Deari, S.; Ulukaya, S. A hybrid multistage model based on YOLO and modified inception network for rice leaf disease analysis. Arab. J. Sci. Eng. 2024, 49, 6715–6723. [Google Scholar] [CrossRef]
  37. Wang, J.; Qin, C.; Hou, B.; Yuan, Y.; Zhang, Y.; Feng, W. LCGSC-YOLO: A lightweight apple leaf diseases detection method based on LCNet and GSConv module under YOLO framework. Front. Plant Sci. 2024, 15, 1398277. [Google Scholar] [CrossRef] [PubMed]
  38. Kumar, V.S.; Jaganathan, M.; Viswanathan, A.; Umamaheswari, M.; Vignesh, J.J. Rice leaf disease detection based on bidirectional feature attention pyramid network with YOLO v5 model. Environ. Res. Commun. 2023, 5, 065014. [Google Scholar] [CrossRef]
  39. He, Y.; Peng, Y.; Wei, C.; Zheng, Y.; Yang, C.; Zou, T. Automatic Disease Detection from Strawberry Leaf Based on Improved YOLOv8. Plants 2024, 13, 2556. [Google Scholar] [CrossRef]
  40. Xie, Z.; Li, C.; Yang, Z.; Zhang, Z.; Jiang, J.; Guo, H. YOLOv5s-BiPCNeXt, a Lightweight Model for Detecting Disease in Eggplant Leaves. Plants 2024, 13, 2303. [Google Scholar] [CrossRef]
  41. Zhu, S.; Ma, W.; Wang, J.; Yang, M.; Wang, Y.; Wang, C. EADD-YOLO: An efficient and accurate disease detector for apple leaf using improved lightweight YOLOv5. Front. Plant Sci. 2023, 14, 1120724. [Google Scholar] [CrossRef] [PubMed]
  42. Yan, C.; Yang, K. FSM-YOLO: Apple leaf disease detection network based on adaptive feature capture and spatial context awareness. Digit. Signal Process. 2024, 155, 104770. [Google Scholar] [CrossRef]
  43. Abdullah, A.; Amran, G.A.; Tahmid, S.A.; Alabrah, A.; AL-Bakhrani, A.A.; Ali, A. A deep-learning-based model for the detection of diseased tomato leaves. Agronomy 2024, 14, 1593. [Google Scholar] [CrossRef]
  44. Xu, W.; Wang, R. ALAD-YOLO: An lightweight and accurate detector for apple leaf diseases. Front. Plant Sci. 2023, 14, 1204569. [Google Scholar]
  45. Bandi, R.; Swamy, S.; Arvind, C.S. Leaf disease severity classification with explainable artificial intelligence using transformer networks. Int. J. Adv. Technol. Eng. Explor. 2023, 10, 278. [Google Scholar]
  46. Brownlee, J. Deep Learning for Computer Vision: Image Classification, Object Detection, and Face Recognition in Python; Machine Learning Mastery: Melbourne, Australia, 2019. [Google Scholar]
  47. Wang, R.; Wang, Z.; Xu, Z.; Wang, C.; Li, Q.; Zhang, Y.; Li, H. A Real-Time Object Detector for Autonomous Vehicles Based on YOLOv4. Comput. Intell. Neurosci. 2021, 2021, 9218137. [Google Scholar] [CrossRef] [PubMed]
  48. Sohan, M.; Sai Ram, T.; Reddy, R.; Venkata, C. A review on yolov8 and its advancements. In Proceedings of the International Conference on Data Intelligence and Cognitive Informatics, Tirunelveli, India, 18–20 November 2024; pp. 529–545. [Google Scholar]
  49. Wang, S.-H.; Fernandes, S.L.; Zhu, Z.; Zhang, Y.-D. Attention-based VGG-style network for COVID-19 diagnosis by CBAM. IEEE Sens. J. 2021, 22, 17431–17438. [Google Scholar] [CrossRef] [PubMed]
  50. Liu, Y.; Shao, Z.; Hoffmann, N. Global attention mechanism: Retain information to enhance channel-spatial interactions. arXiv 2021, arXiv:2112.05561. [Google Scholar]
  51. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 18–22 June 2023; pp. 7464–7475. [Google Scholar]
  52. Wang, C.-Y.; Liao, H.-Y.M.; Wu, Y.-H.; Chen, P.-Y.; Hsieh, J.-W.; Yeh, I.-H. CSPNet: A New Backbone that can Enhance Learning Capability of CNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 390–391. [Google Scholar]
  53. Li, X.; Song, D.; Dong, Y. Hierarchical feature fusion network for salient object detection. IEEE Trans. Image Process. 2020, 29, 9165–9175. [Google Scholar] [CrossRef]
  54. Yan, C.; Xu, E. ECM-YOLO: A real-time detection method of steel surface defects based on multiscale convolution. J. Opt. Soc. Am. A 2024, 41, 1905–1914. [Google Scholar] [CrossRef]
  55. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500. [Google Scholar]
  56. Liu, W.; Lu, H.; Fu, H.; Cao, Z. Learning to upsample by learning to sample. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 4–6 October 2023; pp. 6027–6037. [Google Scholar]
Figure 1. Data collection locations for Hemerocallis fulva leaf disease. (a) Map of southern Asia; (b) environment of the collection site in Hunan Province, China.
Figure 2. Hemerocallis fulva leaf disease dataset (HFLD-Dataset) samples. (a) Leaf Blight; (b) Anthracnose; (c) Leaf Spot; (d) Rust.
Figure 3. Visualization of label distribution.
Figure 4. Data augmentation effects for Hemerocallis fulva leaf disease. (a) Flipping; (b) perspective transformation; (c) brightness adjustment; (d) noise injection; (e) rotation.
Figure 5. Structure of the Hemerocallis fulva Multi-Scale and Enhanced Network (HF-MSENet) model.
Figure 6. Structure of the CSMSM module.
Figure 7. Structure of channel–spatial dual attention.
Figure 8. Structure of the Multi-Scale Feature Fusion Strategy.
Figure 9. Structure of the C2f module.
Figure 10. Structure of the C3_EMSCP module.
Figure 11. Structure of the Bottleneck_EMSCP module.
Figure 12. Sampling-based dynamic upsampling.
Figure 13. Sampling point generator in DySample.
Figure 14. Heatmap visualization of detection results with different module combinations. (a) Baseline; (b) Baseline + CSMSM; (c) Baseline + C3_EMSCP; (d) Baseline + DySample; (e) Baseline + CSMSM + C3_EMSCP; (f) Baseline + CSMSM + DySample; (g) Baseline + C3_EMSCP + DySample; (h) HF-MSENet.
Figure 15. Detection performance comparison of four typical disease samples. (a) Leaf Blight; (b) Anthracnose; (c) Leaf Spot; (d) Rust.
Figure 16. Comparison of detection results across different models and metrics. (a) Original model's F1–confidence curve; (b) original model's P–confidence curve; (c) original model's P–R curve; (d) original model's R–confidence curve; (e) HF-MSENet's F1–confidence curve; (f) HF-MSENet's P–confidence curve; (g) HF-MSENet's P–R curve; (h) HF-MSENet's R–confidence curve.
Figure 17. Comparison of precision and mAP@50 metrics across various models. (a) Precision of each model over training epochs; (b) mAP@50 of each model over training epochs.
Figure 18. Comparison of confusion matrices across different models. (a) Confusion matrix of the original model; (b) confusion matrix of the HF-MSENet model.
Figure 19. Comparison of mAP@50 variation with training epochs for different models.
Figure 20. Radar chart comparison of detection results across different models and metrics.
Table 1. Collection equipment and quantities for Hemerocallis fulva leaf disease data.
Collection Equipment | Pixel Resolution (MP) | Images Collected
iPhone 13 Pro | 12.00 | 199
iPhone 14 Pro | 48.00 | 295
realme GT Neo5 | 50.00 | 116
Redmi K60 | 64.00 | 260
Canon 600D | 18.00 | 336
Canon 750D | 24.20 | 483
Canon 200D | 24.20 | 210
Nikon D5300 | 24.16 | 456
Table 2. Dataset composition.
Subset | Number of Images | Total Number of Labels
Training set | 9891 | 11,365
Validation set | 1413 | 1638
Test set | 2826 | 3227
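The row counts in Table 2 imply an exact 7:1:2 train/validation/test split; the quick, hypothetical check below uses only the numbers from the table.

# Verify the 7:1:2 split implied by Table 2.
splits = {"train": 9891, "val": 1413, "test": 2826}
total = sum(splits.values())  # 14,130 images overall
for name, n in splits.items():
    # Prints train: 70%, val: 10%, test: 20%
    print(f"{name}: {n} images ({n / total:.0%})")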
Table 3. Experimental environment parameters.
System | CPU | GPU | CUDA | cuDNN | PyTorch | Python
Windows 11 | Intel(R) Core(TM) i5-13400 @ 2.50 GHz | NVIDIA GeForce RTX 3060 12G | 11.8 | 8.7.0 | 2.0.0 | 3.11.0
Table 4. Training parameters.
Input Image | Batch Size | Epochs | Lr0 | Momentum | Weight Decay
640 × 640 | 16 | 300 | 0.01 | 0.937 | 0.0005
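Since HF-MSENet builds on a YOLOv8 baseline, the settings in Table 4 map directly onto an Ultralytics-style training call. The sketch below is illustrative only: the model and dataset YAML file names are hypothetical placeholders, not released artifacts.

from ultralytics import YOLO

# Hypothetical config files; the HF-MSENet model definition and the
# HFLD-Dataset YAML are not publicly released.
model = YOLO("hf-msenet.yaml")
model.train(
    data="hfld-dataset.yaml",  # dataset config (placeholder)
    imgsz=640,                 # input image size (Table 4)
    batch=16,                  # batch size
    epochs=300,                # training epochs
    lr0=0.01,                  # initial learning rate
    momentum=0.937,            # SGD momentum
    weight_decay=0.0005,       # weight decay
)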
Table 5. Ablation study results of the improved model.
No. | CSMSM | C3_EMSCP | DySample | P | R | F1 | mAP@50 | mAP@50–95
1 | - | - | - | 0.920 | 0.901 | 0.910 | 0.931 | 0.738
2 | √ | - | - | 0.936 | 0.911 | 0.923 | 0.940 | 0.793
3 | - | √ | - | 0.925 | 0.889 | 0.906 | 0.924 | 0.728
4 | - | - | √ | 0.917 | 0.910 | 0.914 | 0.937 | 0.747
5 | √ | √ | - | 0.937 | 0.916 | 0.926 | 0.943 | 0.789
6 | √ | - | √ | 0.936 | 0.916 | 0.926 | 0.944 | 0.804
7 | - | √ | √ | 0.920 | 0.897 | 0.908 | 0.930 | 0.720
8 | √ | √ | √ | 0.936 | 0.926 | 0.931 | 0.949 | 0.803
Note: "√" indicates inclusion of the module, while "-" indicates exclusion.
Table 6. Comparison of detection accuracy across different models for various diseases.
Model | mAP@50: Leaf_Blight | Anthracnose | Leaf_Spot | Rust | Overall mAP@50 | mAP@50–95
Baseline | 0.964 | 0.912 | 0.918 | 0.931 | 0.931 | 0.738
HF-MSENet | 0.972 | 0.934 | 0.929 | 0.962 | 0.949 | 0.803
Table 7. Comparison of detection results of different models.
Model | P | R | F1 | mAP@50 | mAP@50–95
Faster R-CNN | 0.314 | 0.706 | 0.433 | 0.552 | 0.291
SSD | 0.862 | 0.708 | 0.775 | 0.782 | 0.473
YOLOv5n | 0.906 | 0.857 | 0.881 | 0.906 | 0.678
YOLOv6n | 0.919 | 0.861 | 0.889 | 0.910 | 0.712
YOLOv7n | 0.864 | 0.828 | 0.850 | 0.853 | 0.642
YOLOv8n | 0.920 | 0.901 | 0.910 | 0.931 | 0.738
YOLOv9t | 0.889 | 0.861 | 0.875 | 0.904 | 0.684
YOLOv10n | 0.884 | 0.849 | 0.866 | 0.900 | 0.687
YOLOv11n | 0.915 | 0.901 | 0.908 | 0.927 | 0.739
HF-MSENet | 0.936 | 0.926 | 0.931 | 0.949 | 0.803
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
