1. Introduction
Maize is cultivated over a land area second only to those of rice and wheat. It serves as a significant food source for humans and a crucial feed for livestock, and it holds substantial value as a raw material for light industry and medicine [
1]. Leaf diseases are a primary cause of reduced maize yields. Rapid and precise identification of these diseases, enabling early detection, allows cultivators, breeders, and researchers to apply suitable preventive measures and alleviate their impact. However, acquiring the specialized knowledge needed to identify maize leaf diseases is a significant challenge for the average cultivator, and the resulting misdiagnoses can have substantial negative economic impacts. The advancement of machine vision and deep learning has facilitated the automated identification and diagnosis of plant pests and diseases [
2].
In recent years, there has been a notable emphasis among scholars on the application of machine learning and image processing techniques for plant disease identification. Support vector machines (SVM) were proposed by Cortes and Vapnik [
3] in 1995, and Yang et al. [
4] further improved its classification ability with the multiple birth SVM (MBSVM). Bhange and Hingoliwala [
5] applied an SVM model to identify pomegranate leaf disease, extracting features such as color, morphology, and color coherence vectors (CCV) and clustering them with a k-means algorithm. Thomas et al. [
6] used an SVM classifier to classify and identify potato late blight in hyperspectral images. The K-nearest neighbor (KNN) algorithm was proposed in [
7] in 1975. Like SVM, this method can be applied to multi-class problems. Ref. [
8] used this model to classify and identify deadly fungal (Ganoderma) diseases in oil palm plantations. Zhang et al. [
9] segmented maize leaf patches and extracted disease feature vectors, and used KNN to identify five different diseases of maize leaves. Devi et al. [
10] used a histogram of oriented gradients (HOG) descriptor with a KNN classifier for accurate detection and classification of peanut foliar diseases. However, conventional machine vision algorithms for the automated diagnosis of plant leaf diseases are complex, error-prone, and reliant on manual feature extraction, limitations that hinder the effectiveness and practicality of disease detection.
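As a concrete illustration of this classical pipeline, the sketch below pairs hand-crafted HOG features with a KNN classifier, in the spirit of the approaches cited above. The data-loading step is omitted, and all parameter values (image size, HOG cell layout, k) are illustrative assumptions rather than the cited authors' settings.

```python
# Classical leaf disease classification: hand-crafted HOG features + KNN.
# A minimal sketch; loading labeled leaf images is assumed to happen elsewhere.
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def extract_hog(image, size=(128, 128)):
    """Resize a grayscale leaf image and compute its HOG descriptor."""
    image = resize(image, size, anti_aliasing=True)
    return hog(image, orientations=9, pixels_per_cell=(16, 16),
               cells_per_block=(2, 2), feature_vector=True)

def train_knn(images, labels, k=5):
    """images: list of 2-D grayscale arrays; labels: disease class per image."""
    X = np.stack([extract_hog(im) for im in images])
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels,
                                              test_size=0.2, random_state=0)
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))
    return clf
```

The manual choice of descriptor and neighborhood size in such a pipeline underlines why these methods are brittle compared with learned features.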
Researchers observed that convolutional neural networks (CNNs) can learn feature representations that are both robust and expressive, a realization prompted by the exceptional performance of CNN-based classification networks in the 2012 ILSVRC image classification competition [
11]. Consequently, certain scholars have employed deep learning techniques in the context of identifying and detecting diseases in crop leaves [
2,
12]. Richey et al. [
13] proposed a ResNet50-based model, deployed as a mobile phone application, for detecting northern maize leaf blight. Zhang et al. [
14] used an improved GoogleNet to identify diseases in the leaves of maize plants. Wu et al. [
15] achieved better results by capturing images of maize planting sites using UAVs and using convolutional neural networks for classification. Panigrahi et al. [
16] presented improved CNN models, trained and tested on four classes of maize leaf images, that incorporate rectified linear unit activation functions and the Adam optimizer. Despite their good accuracy, CNN-based network models are not very efficient, which has motivated several researchers to focus on making them lighter. Howard et al. [
17] produced MobileNetV1, a lightweight CNN built on depthwise separable convolutions. Since then, MobileNetV2 [
18] and MobileNetV3 [
19] have been proposed, to improve the effectiveness of MobileNet. In addition, there are other lightweight networks with better performance, such as ShuffleNet [
20], ShuffleNetV2 [
21], and GhostNet [
22], proposed by Huawei’s Noah’s Ark Lab. By introducing an ECA [
23] attention mechanism and using cross-entropy loss, Bi et al. [
24] proposed an improved MobileNet network to identify maize leaf diseases, with good results. In their study, Gulzar [
25] introduced a classification method that utilizes an enhanced version of the MobileNetV2 architecture for fruit recognition. Dhiman et al. [
26] thoroughly examined the image capture, preprocessing, and classification techniques employed for detecting citrus fruit diseases, encompassing both machine learning and deep learning methodologies, with reported accuracy rates reaching 99%. Chen et al. [
27] proposed an improved ShuffleNetV2 network to identify apple leaf diseases; it is lighter but sacrifices some accuracy. In a separate domain of research, Aggarwal et al. [
28] used ensemble methodologies to enhance the efficacy of prevailing convolutional neural networks, employing pre-existing models in a deep learning framework centered on stacked ensembling, thus yielding classifiers that are more dependable and robust. In practical applications, image classification yields relatively little information for identifying crop leaf diseases; object detection techniques, by contrast, can precisely locate lesions, though they pose greater challenges than image classification methods.
Girshick et al. [
29] proposed an R-CNN model with CNN features, to facilitate a combination of deep learning and object detection. The Fast R-CNN [
30] adds a region of interest (RoI) pooling layer to improve speed and accuracy but still relies on an external algorithm to extract object candidate boxes. The Faster R-CNN [
31] model introduces a region proposal network (RPN) that discards manual candidate box extraction and merges proposal generation into a single deep network. Du et al. [
32] proposed an improved Faster R-CNN for maize pest detection. Kumar et al. [
33] proposed a Faster R-CNN approach to identify 93 maize diseases. He et al. [
34] used MFaster R-CNN for maize leaf disease detection. Because such candidate-region-based detectors require two steps, they are collectively referred to as two-stage object detection algorithms; they offer good detection accuracy but lack real-time performance. One-stage object detection algorithms are a good solution to this lack of real-time capability, especially the YOLO (You Only Look Once) [
35,
36,
37,
38,
39,
40,
41,
42,
43] series, which is set to become the mainstream of object detection. Shill and Rahman [
44] used YOLOv3 and YOLOv4 as expert systems to detect plant diseases. Mamat et al. [
45] employed an enhanced YOLOv5 algorithm to accurately detect a set of 100 photos depicting Olea europaea, achieving a mean average precision (mAP) of 98.7% on this detection task. Liu et al. [
46] proposed a YOLOX-based algorithm for tomato leaf disease detection, which replaces the YOLOX backbone with MobileNetV3 for lightweight feature extraction and introduces a CBAM [
47] attention mechanism. While real-time operation is achievable with one-stage object detection algorithms, implementing them on mobile and edge devices remains challenging. To address this concern, various researchers have proposed lightweight object detection methods, such as YOLOv3-tiny [
48], YOLOv4-tiny [
49], and YOLOv5-lite [
50]. Li et al. [
51] proposed a YOLOv4-tiny-based algorithm for broken corn kernel detection.
The YOLO family has been extensively employed across various domains, including agriculture (for diagnosing crop pests and diseases), industry, and medicine. The YOLOv5s model, known for its compact size, is a suitable choice for environments with restricted computational resources. Despite having a smaller model size and lower computational complexity than other lightweight detection networks, YOLOv5s demonstrates commendable accuracy and robustness on various object detection benchmark datasets. This compelling performance motivated us to employ YOLOv5s as the baseline model for identifying maize leaf diseases. To further advance our research, we considered the following knowledge gaps:
1. Contemporary research on maize leaf disease identification predominantly focuses on image classification, which offers only a restricted amount of information. Object detection techniques, by contrast, can yield a more comprehensive set of data, facilitating more effective disease treatment strategies;
2. The absence of well-defined evaluation measures or benchmarks specifically designed for maize leaf disease identification makes it difficult to assess the performance of different object detection algorithms for this task;
3. Few maize leaf disease detection models simultaneously address detection accuracy and speed, which hinders the practical implementation of these methods.
Li et al. [
52] presented an algorithm for identifying vegetable diseases. The researchers employed the YOLOv5s architecture as a foundation but substituted the CSP module with cross stage partial modules integrated with a transformer encoder, and further enhanced the inception module within the neck network. The resulting algorithm demonstrated an mAP(0.5) on the curated vegetable dataset surpassing that of the original YOLOv5s by 4.8%. Nevertheless, although the modifications reduced the parameter count by 1.3 MB, the frames per second (FPS) rate suffered. In their study, Zhang et al. [
53] employed the YOLOv5-lite model to accurately detect tea leaves suitable for harvesting by an automated picking robot. YOLOv5-lite substitutes an enhanced ShuffleNet network for the backbone used in the original YOLOv5 model, notably improving detection speed, but at the expense of a considerable reduction in detection accuracy. Run-Hua et al. [
54] added an attention mechanism and ghost convolution to improve the performance of the YOLOv5s network, yielding a 2.6% increase in mean average precision (mAP) on the VOC dataset over the initial YOLOv5s network, though at the cost of a 0.75 MB increase in model file size. The research above makes it evident that replacing modules of the original YOLOv5s with lightweight counterparts reduces accuracy to some degree, whereas adding accuracy-enhancing modules compromises detection speed. One notable distinction between this study and the aforementioned enhancements lies in the substantial reduction in model parameters achieved through an enhanced lightweight convolutional module. While this reduction diminishes detection accuracy to a certain degree, integrating an improved lightweight upsampling algorithm and CoordConv within the neck network augments the network's perceptual capabilities and its sensitivity to object location, mitigating the loss in detection accuracy. Furthermore, a channel-wise knowledge distillation approach enhances the detection accuracy of the network without increasing the number of model parameters.
In conjunction with the aforementioned studies, it is evident that current algorithms for detecting maize leaf diseases fail to strike a balance between detection accuracy and speed. The primary objective of this study was to maximize accuracy within a comparatively lightweight detection algorithm, enhancing the applicability of maize leaf disease detection over a wider range of settings. To accomplish this, this study employs a lightweight detection algorithm that achieves enhanced accuracy by incorporating lighter modules, improving the upsampling technique, and employing knowledge distillation. First, the CSP module of YOLOv5s was substituted with a more lightweight variant, Faster-C3, reducing model parameters and computational complexity by minimizing the number of convolution calculations, at a slight cost in accuracy. Furthermore, the neck architecture incorporates CoordConv and an enhanced lightweight CARAFE upsampling module, enriching the semantic information in the feature fusion process and improving detection accuracy while leaving the model parameters essentially unchanged. Finally, the proposed algorithm absorbs knowledge from YOLOv5m through a channel-wise knowledge distillation method during training, enhancing detection accuracy without adding model parameters. The main contributions and innovations of this work are summarized below:
- (1)
In this study, we propose an enhanced and more efficient lightweight detection algorithm that builds upon the YOLOv5s framework. Our approach replaces the original CSP module with the Faster-C3 module, significantly reducing the total number of model parameters. The proposed enhancements in the neck network integrate the CARAFE and CoordConv modules to strengthen the extraction of semantic information, with a particular focus on improving the accuracy of detected object locations (an illustrative sketch of the CoordConv idea follows this list);
- (2)
Additional enhancements to the initial model are achieved by implementing the channel-wise knowledge distillation technique during training and adopting the WIoU metric as a loss function. These modifications effectively enhance the accuracy of maize leaf disease detection, without introducing any additional model parameters or computational complexity;
- (3)
A series of comparative experiments were carried out on a maize leaf disease detection dataset. The findings indicate that our proposed algorithm exhibits superior comprehensive performance compared to other lightweight algorithms. The algorithm can support precise planting, visual management, and intelligent decision-making for maize crops.
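As referenced in contribution (1), the sketch below illustrates the general CoordConv idea in PyTorch: two normalized coordinate channels are concatenated to the feature map before a standard convolution, making the layer explicitly position-aware. This is a minimal illustration of the technique under our own choice of kernel size and normalization, not the exact layer configuration used in our network.

```python
import torch
import torch.nn as nn

class CoordConv(nn.Module):
    """Convolution over features augmented with x/y coordinate channels."""
    def __init__(self, in_ch, out_ch, kernel_size=1, stride=1):
        super().__init__()
        # +2 input channels for the appended x- and y-coordinate maps
        self.conv = nn.Conv2d(in_ch + 2, out_ch, kernel_size, stride,
                              padding=kernel_size // 2)

    def forward(self, x):
        b, _, h, w = x.shape
        # coordinate grids normalized to [-1, 1], broadcast over the batch
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))

# e.g., a 256-channel neck feature map of spatial size 40x40:
# out = CoordConv(256, 256)(torch.randn(1, 256, 40, 40))
```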
3. Results and Discussion
3.1. Ablation Experiment Results
Table 3 shows the results of the ablation experiments, where each enhancement was incorporated independently into the baseline YOLOv5s model. With only the Faster-C3 module, precision, recall, mAP(0.5), and mAP(0.5:0.95) fell by 0.5%, 0.7%, 0.3%, and 1.6%, respectively, compared to YOLOv5s, while the number of parameters decreased by 17.5% and FLOPs by 3.1 G. Enhancing only the neck network increased recall by 1.1%, and mAP(0.5) and mAP(0.5:0.95) by 0.5% and 0.4%, respectively, compared to YOLOv5s, with minimal change in parameters and FLOPs. With knowledge distillation alone, the model parameters and FLOPs were unchanged, while precision, recall, mAP(0.5), and mAP(0.5:0.95) improved by 2.6%, 3.0%, 2.5%, and 2.2%, respectively. The cumulative enhancements yielded a 2.4% increase in precision, a 2.7% increase in recall, a 3.8% increase in mAP(0.5), and a 1.5% increase in mAP(0.5:0.95), together with a 15.5% reduction in model parameters and a 2.9 G decrease in FLOPs compared to YOLOv5s.
These outcomes indicate that substituting the Faster-C3 module alone produced a notable drop in model parameters and FLOPs with only a minor decline in accuracy, confirming that the modification reduces the overall complexity of the model. The substantial improvement in recall when only the neck network was enhanced provides compelling evidence that this modification strengthens the network's ability to perceive spatial information while preserving a model size close to the original. Knowledge distillation alone improved the overall accuracy of the model while keeping the number of parameters and floating point operations (FLOPs) constant. After integrating all enhancement strategies, every assessment metric of YOLOv5s improved, yielding good outcomes. In summary, the efficacy of the enhancement technique described in this study has been demonstrated.
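To make the parameter savings of the Faster-C3 substitution concrete, the sketch below shows a partial convolution of the kind used in FasterNet-style blocks, which we assume underlies Faster-C3: only a fraction of the channels are convolved, and the remainder pass through untouched. The ratio value and class layout are illustrative assumptions, not the exact module configuration.

```python
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    """Convolve only 1/ratio of the input channels; pass the rest through."""
    def __init__(self, channels, ratio=4, kernel_size=3):
        super().__init__()
        self.conv_ch = channels // ratio  # channels actually convolved
        self.conv = nn.Conv2d(self.conv_ch, self.conv_ch, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.conv_ch, x.size(1) - self.conv_ch], dim=1)
        # untouched channels (x2) are concatenated back unchanged
        return torch.cat([self.conv(x1), x2], dim=1)

# With ratio=4, the 3x3 conv touches only a quarter of the channels,
# so its parameters and FLOPs shrink to roughly 1/16 of a full 3x3 conv.
```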
3.2. Comparative Results of Different Knowledge Distillation Methods
The channel-wise knowledge distillation (cwd) method was employed in this study for the purpose of distillation. In order to conduct a comparative analysis of the efficacy of various knowledge distillation techniques, the YOLOv5s model was subjected to distillation using three distinct methods: mgd [
68], mimic [
69], and cwd. These methods were applied within a consistent experimental framework. The experimental results are depicted in
Figure 9. The figure illustrates that the mimic and mgd methods exhibited comparable training accuracy, whereas the cwd method demonstrated superior training accuracy and improved faster.
While both the mgd and mimic methods demonstrated efficacy in enhancing accuracy, their effectiveness mostly hinges on soft labels derived from the model outputs or decisions of the teacher model, which may not fully capture the nuanced information inherent in the teacher. Channel-wise distillation (CWD) instead focuses on transferring knowledge channel by channel: it incrementally adjusts the channel activations between the teacher and student networks, so that the student effectively captures the significant features and information of the teacher model across all channels and the knowledge inside each channel is fully exploited. As a result, cwd distillation greatly enhances the precision of the algorithm and accelerates the model's learning.
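For illustration, the sketch below implements the usual channel-wise distillation loss formulation in PyTorch: each channel's activation map is converted into a spatial probability distribution via a temperature-scaled softmax, and the student is trained to match the teacher's per-channel distributions with a KL divergence. It assumes the student and teacher feature maps already have matching shapes (in practice a 1x1 convolution is often used to align channel counts); the temperature value is an illustrative choice.

```python
import torch
import torch.nn.functional as F

def cwd_loss(feat_s, feat_t, tau=4.0):
    """Channel-wise distillation loss between student and teacher features.

    feat_s, feat_t: (N, C, H, W) feature maps with matching shapes.
    """
    n, c, h, w = feat_s.shape
    # flatten spatial dims so the softmax normalizes each channel map
    s = feat_s.view(n * c, h * w)
    t = feat_t.view(n * c, h * w)
    log_p_s = F.log_softmax(s / tau, dim=1)   # student log-distributions
    p_t = F.softmax(t / tau, dim=1)           # teacher distributions
    # KL(teacher || student) averaged over all channels; tau**2 restores scale
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (tau ** 2)

# usage during training (teacher frozen):
# loss = detection_loss + alpha * cwd_loss(student_feat, teacher_feat.detach())
```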
3.3. Comparison of Different Versions of YOLOv5
This research paper presents an enhanced and optimized lightweight detection algorithm utilizing the YOLOv5s framework. To assess the efficacy of the proposed algorithm against multiple versions of YOLOv5, experiments were conducted independently for each version under an identical experimental configuration. The results for the various versions of YOLOv5 are presented in
Table 4. YOLOv5n has significantly fewer parameters and FLOPs than the proposed algorithm. Nevertheless, the proposed algorithm exhibited increases of 5.1%, 12.5%, 16.9%, and 11.6% in precision, recall, mAP(0.5), and mAP(0.5:0.95), respectively, compared to YOLOv5n. Conversely, the proposed algorithm has far fewer model parameters than YOLOv5m and YOLOv5l, whose parameter counts are 3.5- and 7.8-times higher, respectively; their mAP(0.5) and mAP(0.5:0.95) were approximately 3% and 10% higher, respectively, than those of the proposed algorithm.
In addition to these findings, it is worth noting that while YOLOv5m and YOLOv5l exhibit commendable detection accuracy, their practical applicability is limited by the substantial equipment demands stemming from their much larger parameter counts and FLOPs. The YOLOv5n model is notably efficient in terms of model parameters and FLOPs, but this sacrifices a substantial portion of its detection accuracy. Although YOLOv5m and YOLOv5l were about 5% higher than our proposed algorithm across the four metrics of precision, recall, mAP(0.5), and mAP(0.5:0.95), their parameter counts were 3.5- and 7.8-times higher, and their FLOPs 3.7- and 8.3-times higher, respectively. In other words, this roughly 5% accuracy advantage came at the expense of significantly larger and more costly models. It is evident that our algorithm offers a more favorable balance between accuracy and performance.
3.4. Statistical Significance Test (t-Test)
In this section, statistical significance tests (specifically t-tests) are employed to evaluate whether the performance metrics of the proposed model differ significantly from those of the compared models. The t-test is a widely employed statistical hypothesis test that evaluates whether there is a statistically significant difference between two groups or samples, via the computation of the t-statistic and its associated p-value. The t-statistic quantifies the disparity between the means of two groups relative to the variability within each group; it expresses the difference in means in units of its standard error. The p-value represents the likelihood of observing a result as extreme as the obtained t-statistic, assuming the null hypothesis is true; the null hypothesis posits that there is no statistically significant difference between the means of the two sets. In statistical analysis, it is customary to employ a significance level of 0.05 (or 5%) as the critical value for establishing significance: when the p-value is smaller than 0.05, the null hypothesis is rejected, and a statistically significant disparity between the means of the two sets is inferred.
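As a small worked example of this procedure, the snippet below applies scipy's paired t-test to two hypothetical sets of per-run mAP(0.5) scores; the values shown are placeholders, not the measurements reported in this paper.

```python
from scipy import stats

# hypothetical per-run mAP(0.5) values for two models (placeholders)
ours     = [0.873, 0.869, 0.875, 0.871, 0.874]
baseline = [0.842, 0.838, 0.845, 0.840, 0.843]

# paired t-test: the same runs evaluated for both models
t_stat, p_value = stats.ttest_rel(ours, baseline)
print(f"t = {t_stat:.3f}, p = {p_value:.5f}")

# reject the null hypothesis of equal means when p < 0.05
if p_value < 0.05:
    print("statistically significant difference between the two models")
```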
This study utilized a paired
t-test to assess the efficacy of the proposed methodology against multiple versions of YOLOv5. The assessment used two metrics: mean average precision (mAP) at a threshold of 0.5, and the parameter count. As the performance of two models becomes closer, the absolute value of the
t-statistic decreases. Based on the results displayed in
Table 5, our model’s
t-statistic for the comparison with YOLOv5n is positive and much greater in magnitude than for the other models. Furthermore, the obtained
p-value (0.0227) falls below the commonly accepted threshold of 0.05. Therefore, a statistically significant difference existed between the means of the two samples: the proposed model's mAP(0.5) was significantly higher than that of YOLOv5n. The proposed model was also compared against YOLOv5s, YOLOv5m, and YOLOv5l. The
t-statistic indicated that, in terms of mean average precision (mAP) at the 0.5 threshold, the proposed model most closely resembles YOLOv5m. The
p-values for YOLOv5s, YOLOv5m, and YOLOv5l were all above 0.05. This implies that the proposed model did not differ in a statistically significant way from YOLOv5s, YOLOv5m, and YOLOv5l with respect to the accuracy metric mAP(0.5). As indicated in
Table 6, the
t-statistic values reveal that the parameter count of the proposed model closely approximates that of YOLOv5s, while being slightly smaller. Additionally, the parameter count of the proposed model exceeds that of YOLOv5n but is significantly smaller than those of YOLOv5m and YOLOv5l. Given that all
p-values were less than 0.05, it is evident that a statistically significant difference exists between the proposed model and the alternative models in terms of parameter count.
Taken together, these findings show that the proposed model exhibited no substantial disparity in the accuracy metric mAP(0.5) when compared to YOLOv5s, YOLOv5m, and YOLOv5l, while differing notably from them in parameter count. The results demonstrate that our model achieved accuracy comparable to the larger models while maintaining a significantly reduced model volume, indicating a favorable trade-off between detection accuracy and computational efficiency.
3.5. Comparison Results of Different Lightweight Detection Models
To assess the efficacy of our suggested approach, we performed a comparative analysis between our method and contemporary lightweight detection algorithms, employing identical experimental conditions. The experimental findings are presented in
Table 7. At the mAP(0.5) threshold, our proposed approach outperformed YOLOv3-tiny, YOLOv4-tiny, YOLOv5-lite-g, and YOLOv5-lite-e by 18.2%, 17.6%, 25.1%, and 51.4%, respectively. For mAP over the range 0.5 to 0.95, it outperformed the same models by 11.3%, 10.1%, 15.9%, and 27.6%, respectively. The YOLOv3-tiny, YOLOv4-tiny, and YOLOv5-lite-g models have parameter counts and FLOPs comparable to our proposed algorithm, whereas YOLOv5-lite-e has notably fewer parameters and FLOPs. YOLOv7-tiny and YOLOv8n are likewise similar to the proposed method in parameter count and FLOPs, but demonstrated lower precision, recall, mAP(0.5), and mAP(0.5:0.95). These results show that our model delivers exceptional detection accuracy compared with models of similar parameter counts and FLOPs. The number of parameters and FLOPs in YOLOv8s is approximately twice that of the proposed model; while YOLOv8s exhibited a 4.4% higher mAP(0.5:0.95), its precision, recall, and mAP(0.5) were 3.7%, 5.6%, and 5.2% lower, respectively. This indicates that the proposed model achieved a better trade-off between precision and recall in target detection, along with a higher average accuracy at an IoU threshold of 0.5; YOLOv8s performed satisfactorily across the broader range of IoU thresholds (0.5 to 0.95) but was comparatively weaker at the 0.5 threshold.
In order to illustrate the applicability of our suggested method, we conducted a comparative analysis of many lightweight detection algorithms using the publicly available PascalVOC dataset. The outcomes of our experiments are presented in
Table 8. YOLOv5m exhibited higher precision, recall, mAP(0.5), and mAP(0.5:0.95) than the proposed model, by 6.2%, 1.6%, 3.8%, and 6.9%, respectively. However, YOLOv5m required approximately 3.5-times the model parameters and 3.7-times the FLOPs of the proposed model; its metric improvements were thus accompanied by a far more intricate architecture, which must be taken into consideration. Compared to YOLOv5s and YOLOv7-tiny, the proposed model achieved a favorable trade-off between model complexity and performance, with a minor advantage in the performance measures. The YOLOv8s model has approximately 2.5-times the parameter count and FLOPs of the proposed model; nevertheless, the proposed model achieved improvements of 1.4% in precision, 2.1% in recall, 1.7% in mAP(0.5), and 6.1% in mAP(0.5:0.95). Although these gains were not statistically significant, the proposed model's markedly smaller parameter count and FLOPs give it a clear advantage over YOLOv8s. Significantly, the performance of the proposed model on the PascalVOC public dataset was highly consistent with its performance on the maize leaf disease dataset, providing evidence of the generalizability of the proposed strategy.
Figure 10a,b shows the single-image inference time and FPS of the relevant lightweight detection algorithms on edge computing devices. To give a fair indication of the time performance of the detection algorithms, none of the tested algorithms used any detection acceleration techniques. From
Figure 10a, it can be seen that YOLOv5m required the longest time to infer a single image, YOLOv8s the second longest, and our proposed algorithm the shortest. This indicates that the proposed algorithm performs inference faster on edge computing devices. From
Figure 10b, our proposed algorithm, YOLOv4-tiny, YOLOv5m, and YOLOv8s achieved 5.03 FPS, 4.18 FPS, 2.09 FPS, and 3.73 FPS, respectively, on edge computing devices. The proposed algorithm thus achieved the highest frame rate, i.e., the highest processing speed, on this device. In summary, the proposed algorithm shows a shorter inference time and a higher frame rate on edge computing devices than the comparative lightweight algorithms, suggesting that it may be more suitable for edge computing scenarios in practical applications.
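For reference, the sketch below shows one way such single-image inference time and FPS figures can be measured with PyTorch, following the protocol described above (no acceleration techniques such as quantization or TensorRT). The model-loading line is a placeholder; any of the compared detectors could be substituted, and the warm-up and run counts are illustrative choices.

```python
import time
import torch

# placeholder model; substitute the detector under test
model = torch.hub.load("ultralytics/yolov5", "yolov5s")
model.eval()

img = torch.randn(1, 3, 640, 640)  # one dummy input image
with torch.no_grad():
    for _ in range(10):      # warm-up runs, excluded from timing
        model(img)
    runs = 100
    t0 = time.perf_counter()
    for _ in range(runs):
        model(img)
    avg = (time.perf_counter() - t0) / runs

print(f"avg single-image inference: {avg * 1000:.1f} ms, FPS: {1.0 / avg:.2f}")
```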
Based on the aforementioned findings, our proposed methodology demonstrated superior detection accuracy compared to the existing lightweight detection methods, while maintaining fewer model parameters and FLOPs. Evaluating the algorithm's dependability, robustness, and adaptability makes it evident that our method is lightweight without compromising its excellent performance.