The model training experiments in this study were conducted on a Linux system equipped with an NVIDIA GeForce RTX 3080 GPU, a 12 vCPU Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50 GHz, 40 GB of RAM, and 80 GB of memory, which meets the requirements of this experiment.
Through the lightweight optimization of YOLOv7, the efficiency of weed identification is further improved while limiting the impact on the model’s classification precision. Large convolutional neural networks are difficult to train. Therefore, we propose a method for identifying weeds in lily fields that combines pruning and distillation. Knowledge distillation lets a large model transfer the knowledge it has learned to guide the training of a smaller model, significantly reducing the number of parameters and achieving model compression and acceleration. Model pruning removes components that have minimal impact on the results, using techniques such as reducing the number of heads and sharing parameters. In combination, knowledge distillation preserves the compact model’s accuracy while reducing its latency and network parameters, and dynamic pruning reduces the parameters further, ultimately yielding a lightweight YOLOv7 network model.
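The pruning idea described above can be illustrated with a minimal sketch. It ranks convolution filters by their L1 norm and keeps the largest ones. This is a generic magnitude-based criterion swapped in for illustration, not necessarily the dynamic pruning rule used in this study, and the names `l1_norms`, `prune_filters`, and `keep_ratio` are hypothetical.

```python
def l1_norms(filters):
    """Return the L1 norm of each filter, given as a flat list of weights."""
    return [sum(abs(w) for w in f) for f in filters]


def prune_filters(filters, keep_ratio):
    """Return the indices of the filters to keep, in their original order.

    Assumption: filters with a small L1 norm contribute least to the output.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, round(len(filters) * keep_ratio))
    norms = l1_norms(filters)
    # Rank filter indices from largest to smallest norm.
    ranked = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy layer with four two-weight filters; half of them are kept.
filters = [[0.9, -0.8], [0.01, 0.02], [0.5, 0.4], [-0.03, 0.0]]
kept = prune_filters(filters, keep_ratio=0.5)
```

In practice the pruned network is retrained or fine-tuned afterwards, as the text notes, so that the remaining filters can compensate for the removed ones.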
5.1. Comparative Results and Discussion of CHHO
The chaotic system [29] is a classical optimization strategy known for its diverse types and dynamic characteristics, which enable robust global search capabilities. Different chaotic systems exhibit varying performance in global exploration, local search, and space traversal. In this study, we selected 21 representative chaotic systems for comparative experiments to evaluate their optimization performance.
The experimental results indicate that the Tent chaotic system significantly outperforms other systems, particularly in terms of diverse search capabilities and approximating the global optimum. Consequently, we selected Tent as the core mechanism for chaotic initialization and search.
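As an illustration of chaotic initialization, the following minimal sketch generates a population from a Tent map sequence. The breakpoint `a = 0.7`, the seed value `x0 = 0.37`, and the search bounds are assumptions made for this example (the classical breakpoint 0.5 degenerates to zero under binary floating-point arithmetic); they are not the settings reported by this study.

```python
def tent_map(x, a=0.7):
    """One step of the Tent map on [0, 1] with breakpoint a (assumed value)."""
    nxt = x / a if x < a else (1.0 - x) / (1.0 - a)
    # Clamp to guard against floating-point rounding slightly outside [0, 1].
    return min(1.0, max(0.0, nxt))


def tent_population(n, d, lower, upper, x0=0.37):
    """Build n individuals of dimension d by scaling a Tent sequence to [lower, upper]."""
    x = x0
    population = []
    for _ in range(n):
        individual = []
        for _ in range(d):
            x = tent_map(x)
            individual.append(lower + x * (upper - lower))
        population.append(individual)
    return population


# Population size and dimensionality follow the experimental settings of this section.
population = tent_population(n=50, d=30, lower=-100.0, upper=100.0)
```

Compared with independent uniform sampling, consecutive values of the chaotic sequence are deterministic yet spread across the interval, which is the traversal property the comparison in this section evaluates.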
To further validate the superiority of Tent, we selected three types of classical test functions [30,31]: unimodal functions, multimodal functions, and fixed-dimensional functions. From these, six classical functions (see Table 5) with distinct characteristics were chosen for comparative experiments, covering simple convex optimization problems and complex nonlinear high-dimensional optimization challenges.
In Table 5, the comparison metric is the final objective (fitness) value achieved on each test function, with results expressed in scientific notation. The test functions are denoted as F1, F3, F10, F12, F15, and F21, which represent standard benchmark functions commonly used in the optimization literature to evaluate algorithm performance (e.g., unimodal, multimodal, and non-convex functions; see references [30,31]). For instance, F1 might correspond to a simple unimodal function like the Sphere function, while the other functions are designed to assess different aspects such as landscape ruggedness and scalability. Lower (or in some cases higher) final objective values indicate better optimization performance depending on the formulation of each test function.
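To make the two function families concrete, the sketch below implements one unimodal and one multimodal benchmark. Pairing the Sphere function with F1 follows the example given above; using the Ackley function as the multimodal representative is an assumption for illustration, since the exact definitions of F1 to F21 are given in the cited references rather than here.

```python
import math


def sphere(x):
    """Unimodal benchmark: sum of squares, global minimum 0 at the origin."""
    return sum(v * v for v in x)


def ackley(x):
    """Multimodal benchmark: many local minima, global minimum 0 at the origin."""
    d = len(x)
    mean_sq = sum(v * v for v in x) / d
    mean_cos = sum(math.cos(2.0 * math.pi * v) for v in x) / d
    return (-20.0 * math.exp(-0.2 * math.sqrt(mean_sq))
            - math.exp(mean_cos) + 20.0 + math.e)
```

For both functions a lower final value is better, so an algorithm that ends closer to zero has approximated the global optimum more accurately.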
Table 6 also compares various chaotic mapping algorithms employed to generate initial populations or chaotic sequences, including methods such as Random, Chebyshev, Circle, Gauss, Iterative, Logistic, Piecewise, Sine, Singer, Sinusoidal, Kent, Fuch, SPM, ICMIC, Tent–Logistic–Cosine, Sine–Tent–Cosine, Logistic–Sine–Cosine, Henon, Cubic, Logistic–Tent, Bernoulli, and Tent. These algorithms are referenced in the literature (see references [29,32,33]) and serve as alternative approaches to facilitate global exploration in optimization problems.
Experimental results in Table 6 demonstrate that the Tent system [34,35,36] performs exceptionally well across the selected test functions. Among 23 classical test functions, Tent exhibits outstanding performance on both unimodal and multimodal functions. It not only converges rapidly to the global optimal solution but also shows higher convergence stability and stronger resistance to interference.
To present the experimental results more intuitively, the following figures were plotted:
Figure 6 illustrates the chaotic curve of the Tent system [32,33], representing its traversal performance across the search space. The curve indicates that Tent effectively covers the search space, which helps it avoid local optima.
Figure 7 shows the performance histogram, displaying the fitness value distributions of different chaotic systems on the test functions. The histogram highlights Tent’s stable performance across various functions.
Figure 8 is a chaotic scatter plot, representing Tent’s exploration and exploitation distribution in the high-dimensional search space. The plot showcases Tent’s efficient exploration and robust global search characteristics.
Figure 9 compares the iteration curves of the 21 chaotic systems, from which the optimal system was selected.
To ensure the fairness and objectivity of the experiments, identical experimental parameters were applied across all algorithms: a population size of N = 50, so that each chaotic system searches with the same number of individuals; a dimensionality of D = 30 (excluding fixed-dimensional functions), which standardizes the scale of the search space; and a maximum of 50 iterations, which balances algorithm runtime and keeps the results comparable.
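The shared settings can be captured in a small harness such as the sketch below, which runs any initializer under the same budget. The update rule is a deliberately simple greedy perturbation that stands in for the optimizer (it is not HHO or CHHO), and the bounds, `Settings`, `random_init`, and `run_trial` are hypothetical names introduced for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    population_size: int = 50   # N, as stated in the text
    dimensions: int = 30        # D, as stated in the text
    max_iterations: int = 50    # iteration budget, as stated in the text
    lower: float = -100.0       # assumed search bound
    upper: float = 100.0        # assumed search bound


def sphere(x):
    return sum(v * v for v in x)


def random_init(cfg, rng):
    """Baseline initializer: independent uniform samples."""
    return [[rng.uniform(cfg.lower, cfg.upper) for _ in range(cfg.dimensions)]
            for _ in range(cfg.population_size)]


def run_trial(init_fn, objective, cfg, seed):
    """Run one trial and return the best fitness after each iteration."""
    rng = random.Random(seed)
    population = init_fn(cfg, rng)
    fitness = [objective(ind) for ind in population]
    history = [min(fitness)]
    for _ in range(cfg.max_iterations):
        for i, ind in enumerate(population):
            # Stand-in update: Gaussian perturbation, accepted only if it improves.
            candidate = [min(cfg.upper, max(cfg.lower, v + rng.gauss(0.0, 1.0)))
                         for v in ind]
            f = objective(candidate)
            if f < fitness[i]:
                population[i], fitness[i] = candidate, f
        history.append(min(fitness))
    return history
```

Passing a different `init_fn` (for example, one built from a chaotic map) while holding `Settings` and the seed fixed isolates the effect of the initialization strategy, which is the purpose of using identical parameters.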
Overall, the experimental findings confirm that the Tent chaotic system consistently outperforms others across various optimization problems, particularly in global search capability and convergence accuracy. This discovery provides strong support for the practical application of chaotic optimization strategies.
For the lily field dataset, using the improved Chaotic Harris Hawks Optimization (CHHO) algorithm to optimize the YOLOv7-tiny algorithm significantly enhanced its performance in detecting lily field weeds. As shown in subsequent sections, the YOLOv7-tiny parameters optimized by CHHO exhibit stronger competitiveness in detection tasks.
Through iterative optimization with the CHHO algorithm, the optimal activation functions and hyperparameter combinations for the lily field dataset were identified. During the optimization process, weights trained using 11 activation functions for the YOLOv7-tiny model were accumulated into a weight pool, from which one weight was randomly selected as the starting point for optimization. The optimization was performed in the form of fine-tuning, requiring at least 20 epochs to ensure effectiveness. In this process, besides optimizing the activation functions, hyperparameters can also be adjusted, or both activation functions and hyperparameters can be optimized simultaneously.
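One way to let a continuous optimizer search over both an activation function and numeric hyperparameters is to decode each candidate position into a configuration, as in the sketch below. The encoding, the placeholder activation names, the value ranges in `BOUNDS`, and the helper names are all assumptions for illustration; the study does not specify its encoding or list the 11 activation functions in this section.

```python
import random

# Placeholder names for the 11 activation functions (not listed in this section).
ACTIVATIONS = [f"activation_{i}" for i in range(11)]

# Hypothetical search ranges for the tuned hyperparameters.
BOUNDS = {
    "lr0": (1e-4, 1e-1),
    "momentum": (0.6, 0.98),
    "weight_decay": (0.0, 1e-3),
}


def decode(position):
    """Map a position in [0, 1]^4 to an activation choice and hyperparameters."""
    clipped = [min(1.0, max(0.0, u)) for u in position]
    index = min(int(clipped[0] * len(ACTIVATIONS)), len(ACTIVATIONS) - 1)
    config = {"activation": ACTIVATIONS[index]}
    for u, (name, (low, high)) in zip(clipped[1:], BOUNDS.items()):
        config[name] = low + u * (high - low)
    return config


def pick_start_weights(weight_pool, rng):
    """Randomly select one pre-trained weight file as the fine-tuning start point."""
    return rng.choice(weight_pool)
```

Each decoded configuration would then be scored by a short fine-tuning run (at least 20 epochs, per the text), and that score serves as the fitness the optimizer tries to improve.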
This study employed a simultaneous optimization approach, leveraging ample computational resources and time. Table 7 compares the default hyperparameters and activation functions of the YOLOv7-tiny algorithm with the optimized values obtained through different optimization algorithms, including CHHO, HHO, SCA [37], WOA [38], and PSO [39]. The table highlights CHHO’s superior tuning capabilities, demonstrating its effectiveness in refining key parameters such as activation functions, momentum, weight decay, and learning rate. This CHHO-based optimization method not only enhances model performance but also provides a comparative analysis of various metaheuristic approaches, offering practical insights for improving YOLO parameter tuning.
The computational complexity of the CHHO algorithm can be analyzed based on its four main components: population initialization, exploration phase, exploitation phase, and ranking mechanism. The initialization process, including the Tent chaotic mapping strategy, has a complexity of O(n × d), where n is the population size and d is the dimensionality of the search space. The exploration phase, responsible for global search, follows the Harris Hawks Optimization (HHO) framework and has a complexity of O(L × n × d), where L represents the maximum number of iterations. The exploitation phase integrates the Differential Evolution mechanism to refine the solution, contributing an additional complexity of O(L × n × d). Finally, the ranking mechanism employs a fast sorting method, leading to a worst-case complexity of O(n^2) per iteration. Combining these components, the overall complexity of CHHO can be expressed as O(n × d + L × (2 × n × d + n^2)), which simplifies to O(L × n × (d + n)), demonstrating that CHHO maintains a reasonable computational burden while significantly improving convergence efficiency through the combination of chaotic mapping and differential evolution mechanisms.
5.2. Comparative Results and Discussion of YOLOv7-Tiny
YOLOv7-tiny Model Detection Results Analysis:
(1) Impact of Different Network Models on Weed Detection in Lanzhou Lily Fields.
In this study, we compared different network models [40,41,42,43,44,45] through experiments to identify the model with the highest accuracy for weed detection in Lanzhou lily fields. To address the long inference time of the larger YOLOv7 model, we employed a dynamic pruning strategy to reduce the model’s complexity and subsequently retrained the model. The results showed a notable improvement in accuracy across all models. The accuracy of the Lanzhou lily-weed detection model based on different network models is summarized in Table 8.
As shown in Table 8, selecting the appropriate model is crucial for object detection tasks. Because the models differ in structure, parameters, and optimization methods, they exhibit distinct performance during training. Although the accuracy improvement varies significantly across models, the YOLOv7 model consistently demonstrates superior overall metrics, with a substantial increase in accuracy, an average precision of 0.85, and an F1 score of 0.88. Therefore, YOLOv7 was selected for subsequent experiments in this study.
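For reference, precision, recall, and the F1 score follow from the counts of true positives, false positives, and false negatives. The sketch below uses the standard definitions; the function name and the example counts are illustrative and are not taken from Table 8.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative counts only: 8 correct detections, 2 false alarms, 2 misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

Because F1 is the harmonic mean of precision and recall, it is high only when both are high, which makes it a useful single figure for comparing detectors.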
(2) Impact of Different Parameter Counts on Weed Detection in Lanzhou Lily Fields.
To enhance the model’s recognition capabilities without compromising accuracy, knowledge distillation was introduced to improve the model’s learning speed. The experiment results indicate that knowledge distillation has effectively lightened the model to some extent while maintaining accuracy. Knowledge distillation allows knowledge to be transferred from a larger, more complex model (referred to as the “teacher model”) to a smaller, simpler model (referred to as the “student model”).
By transferring the knowledge from the teacher model to the student model, the results show that this method contributes to model lightweighting without significantly compromising accuracy. Although accuracy may decline during the distillation process, careful selection of the teacher and student models, as well as optimization of the distillation process, allows this accuracy loss to be controlled within an acceptable range.
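The teacher-to-student transfer can be sketched with the classical soft-target loss for classification logits. This is a generic formulation given for illustration; the distillation loss actually used for the detection heads in this study may differ, and `temperature`, `alpha`, and the function names are assumed parameters.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_class,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of the soft-target KL term and the hard-label cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0.0)
    # Standard cross-entropy of the student against the ground-truth class.
    ce = -math.log(softmax(student_logits)[true_class])
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce
```

The factor `temperature ** 2` keeps the gradient magnitude of the soft-target term comparable to the hard-label term, and `alpha` controls how strongly the student follows the teacher rather than the labels.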
Table 9 compares the parameter counts of the different YOLOv7-based model versions in this study. Among the YOLOv7 series models, YOLOv7-tiny performed best, with an accuracy of 92.53%, an improvement of 6.7% over the previous model.
As shown in Table 9, when the YOLOv7 network models are compared by accuracy, the YOLOv7-tiny model achieves the highest accuracy, 92.53%, with the fewest parameters. It is also the most computationally efficient, with the lowest GFLOPs and the fastest inference (ms/FPS), making it well suited for real-time applications. Therefore, this study uses the YOLOv7-tiny model for the experiments, as it strikes the best balance between performance and efficiency.
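Two of the efficiency figures discussed here have simple closed forms, sketched below: the parameter count of a single convolution layer, and the conversion from per-image latency in milliseconds to frames per second. The function names and example values are illustrative and do not reproduce entries of Table 9.

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Number of learnable parameters in a square-kernel 2D convolution."""
    weights = in_channels * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def fps_from_latency_ms(latency_ms):
    """Convert per-image inference latency in milliseconds to frames per second."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms
```

Since the parameter count grows with the product of input and output channels, removing channels through pruning reduces parameters in both the pruned layer and the layer that consumes its output.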
(3) Detection Performance of Weeds in the Lily Field under Different Occlusion Conditions.
To investigate the impact of occlusion on the model’s performance, three occlusion conditions were tested: no occlusion, light occlusion, and severe occlusion. As shown in Figure 10, when the images of Lanzhou lily and weeds are either unobstructed or lightly occluded, the model maintains a high recognition accuracy. In cases where the features are clear, the model neither misses detections nor produces false positives. However, when severe occlusion occurs, the irregular growth of weeds and lilies creates multiple gaps that expose limited features, providing insufficient information for the model to make accurate judgments. As a result, more missed detections are observed, though false detections are rare. Additionally, weak lighting caused by occlusion can further contribute to misclassification.
(4) The Impact of Different Dataset Sizes on Weed Detection in Lanzhou Lily Fields.
Using YOLOv7-tiny as the base model, this study trains the Lanzhou lily-weed detection model, which incorporates pruning and distillation techniques, on datasets of various sizes. The aim is to explore how the proposed method performs under different dataset sizes. The comparison of the recognition accuracy of the lightweight YOLOv7-tiny model across datasets of varying sizes is shown in Figure 11.
As shown in Figure 11, among the network models compared, YOLOv7-tiny exhibits the highest recognition accuracy and is the least affected by dataset size. When the sample size is 500 images, YOLOv7-tiny achieves the highest recognition accuracy, and when the number of samples per class increases to 1500 images, the recognition accuracy of every model reaches its peak. As the sample size decreases, the recognition accuracy of all models gradually decreases, but YOLOv7-tiny shows minimal accuracy loss, indicating that the proposed method maintains high classification accuracy even with fewer samples. This demonstrates that the method can handle situations with limited Lanzhou lily-weed samples.
However, when the dataset is extremely small, convolutional neural networks (CNNs) struggle to classify the data accurately. The experiment highlights the reduced dependence on large datasets by the proposed method.
The YOLOv7-tiny model, implemented for Lanzhou lily-weed detection, addresses issues related to low accuracy in CNN-based models for identifying common weeds in lily fields, as well as the high data requirements. By analyzing the impact of different network models, parameter sizes, dataset sizes, and occlusion conditions on model performance, the experiment validates the effectiveness of the proposed approach.
Figure 12 shows the detection results for the YOLOv7-tiny network model with dynamic pruning and knowledge distillation on the lily-weed dataset. Comparing image (b) with image (a), it is evident that the enhanced model accurately detects weeds and lilies with improved precision. This indicates that even after model compression, the network is capable of learning the similarities within the same category and distinguishing between different categories in a short time, leading to accurate identification of weeds and lilies.
As shown in Figure 13, the model’s precision (P), recall (R), and P-R curve were recorded during the training process. The plot in Figure 13a shows that precision increases rapidly during the initial training stages. Ultimately, the model achieves a precision of 89.2%, a recall of 81.5%, and a mean Average Precision (mAP) of 88.4% at an overlap (IoU) threshold of 0.5. The variations in P, R, mAP, and loss values over iterations align with the experimental expectations. The series of network models proposed in this study, considering both parameter count and accuracy, offer significant advantages for practical applications.
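The overlap threshold of 0.5 refers to the intersection over union (IoU) between a predicted box and a ground-truth box. The sketch below shows the IoU computation and a simple greedy matching that yields the true-positive, false-positive, and false-negative counts from which precision and recall are derived. It is a simplified single-class illustration with hypothetical names, not the evaluation code used in this study.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def match_detections(predictions, truths, threshold=0.5):
    """Greedily match (confidence, box) predictions to ground-truth boxes.

    Returns (true_positives, false_positives, false_negatives).
    """
    unmatched = list(truths)
    tp = fp = 0
    # Higher-confidence predictions claim ground-truth boxes first.
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= threshold:
            unmatched.remove(best)
            tp += 1
        else:
            fp += 1
    return tp, fp, len(unmatched)
```

Sweeping the confidence cut-off over the ranked predictions and recomputing these counts at each step traces out the P-R curve, and the area under that curve gives the average precision for a class.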
To address the issue of insufficient data, which can negatively impact the performance of the network model and even prevent it from learning effective feature representations, experiments were conducted using the YOLOv7-tiny model. The results demonstrated its outstanding overall performance. YOLOv7-tiny achieved the highest recognition accuracy and showed stability in performance across different datasets, reaching 93.53%. This result highlights the model’s robustness and generalization capability when handling limited data. Additionally, the lightweight nature of YOLOv7-tiny enhances its flexibility and efficiency in practical applications. It can be deployed in resource-constrained environments, such as mobile devices or embedded systems, enabling real-time object detection and recognition. This makes it particularly valuable for precision weeding applications.