Article

Research and Experiment on a Chickweed Identification Model Based on Improved YOLOv5s

1 College of Agricultural Engineering, Jiangsu Agri-Animal Husbandry Vocational College, Taizhou 225300, China
2 School of Mechanical Engineering, Yangzhou University, Yangzhou 225127, China
3 College of Intelligent Manufacturing, Taizhou Polytechnic College, Taizhou 225300, China
4 School of Mechanical and Electrical Engineering, Hainan University, Haikou 570228, China
* Author to whom correspondence should be addressed.
Agronomy 2024, 14(9), 2141; https://doi.org/10.3390/agronomy14092141
Submission received: 18 August 2024 / Revised: 16 September 2024 / Accepted: 18 September 2024 / Published: 20 September 2024
(This article belongs to the Special Issue AI, Sensors and Robotics for Smart Agriculture—2nd Edition)

Abstract
Currently, multi-layer deep convolutional networks are mostly used for field weed recognition to extract and identify target features. However, in practical application scenarios, they still face challenges such as insufficient recognition accuracy, large numbers of model parameters, and slow detection speed. In response to these problems, a weed identification model based on an improved YOLOv5s was proposed, using chickweed as the identification object. Firstly, the Squeeze-and-Excitation (SE) module and the Convolutional Block Attention Module (CBAM) were added to the model’s feature extraction network to improve recognition accuracy; secondly, the Ghost convolution lightweight feature fusion network was introduced to effectively reduce the volume, parameter count, and computation of the model, making it lightweight; finally, the original bounding-box loss function was replaced with the Efficient Intersection over Union (EIoU) loss function to further improve the detection performance of the improved YOLOv5s model. After testing, the precision of the improved YOLOv5s model was 96.80%, the recall was 94.00%, the mean average precision was 93.20%, and the frame rate was 14.01 fps, improvements of 6.6%, 4.4%, 1.0%, and 6.1%, respectively, over the original YOLOv5s model. The model volume was 9.6 MB, the computation was 13.6 GB, and the parameter size was 5.9 MB, decreases of 29.4%, 14.5%, and 13.2%, respectively, compared with the original YOLOv5s model. The model can effectively distinguish chickweed from crops. This research can provide theoretical and technical support for the efficient identification of weeds in complex field environments.

1. Introduction

Weeds usually grow in farmland or on ridges. They are tenacious, reproduce easily, and grow quickly. They compete with crops for nutrients and living space and spread pests and diseases, affecting the growth and development of crops. If prevention and control are not timely, crop yield and quality will be reduced, bringing heavy losses to agricultural production [1,2,3,4,5,6,7]. According to statistics, China’s annual crop yield reduction caused by weeds is about 10%, and in serious cases can reach 50–70%. In order to reduce damage from weeds, artificial, chemical, physical, mechanical, biological, and other methods of weed control are commonly used, with herbicide weed control being the most widely used [8]. Excessive use of herbicides over a long period of time will contaminate the soil and destroy its structure, leading to soil crusting, soil quality decline, and ultimately crop yield reduction [9,10,11,12,13,14].
In order to reduce the use of herbicides and achieve precise herbicide spraying, researchers have carried out a large amount of research on farmland weed identification technology, among which deep learning detection algorithms are a typical technology [15,16]. Deng et al. [17], in their research on weed recognition against the complex background of rice fields, studied how to select the number of hidden layer nodes in the Deep Belief Network (DBN) model and concluded that a single hidden layer network structure has a higher recognition rate than a double hidden layer network structure, reaching 91.13%. Peng et al. [18] constructed a paddy field weed dataset (PFMW) and compared the recognition accuracy of various deep convolutional neural networks; the VGG-16 model achieved the highest accuracy, and the recognition accuracy for various types of weeds reached 90%. Zhang et al. [19] applied the Faster R-CNN deep network model to rape and weed identification and compared the recognition results of the VGG-16, ResNet-50, and ResNet-101 feature extraction networks, concluding that the recognition accuracy of the VGG-16 model can reach 83.9%. Sun et al. [20] compared the recognition performance of the YOLOX model and the Deformable DETR model for identifying green vegetable seedlings and weeds, and concluded that the YOLOX model performed better, with an average accuracy of 98.1%. Sun et al. [21] used depth-separable convolution blocks to reduce the number of parameters and calculations of a convolutional neural network model, achieving rapid and accurate identification of sugar beets and weeds; the average identification accuracy was 87.58%, and the speed reached 42.064 fps. Meng et al. [22] used depth-separable convolution and Squeeze-and-Excitation Network (SENet) modules, combined with a feature layer fusion mechanism, to improve the detection accuracy and speed of weeds in corn fields; the average accuracy was 88.27% and the detection speed was 32.26 fps. Xu et al. [23] built a lightweight recognition model based on the Xception convolutional network and applied it to field weed identification at the corn seedling stage; the average accuracy reached 98.63% and the model volume was 83.5 MB. Kang et al. [24] proposed a model compression method based on the SENet attention mechanism and dynamic sparsity constraints to identify weeds in sugar beet fields, which reduced the number of model parameters to 43.97% and the amount of calculation to 82.94%. In existing research on deep learning weed identification algorithms, multi-layer deep convolutional networks are often used for target feature extraction and identification. These methods have problems such as large numbers of model parameters, low recognition accuracy, and slow detection speed. How to achieve lightweight models while striking a reasonable balance between recognition accuracy and recognition speed requires further research [25,26,27].
In order to solve the above problems, this paper uses chickweed as the identification object and designs a weed identification model based on an improved YOLOv5s. The SE and CBAM attention mechanism modules are added to the model to enhance target feature extraction and improve recognition accuracy by suppressing unimportant feature information. The Ghost lightweight feature fusion network is introduced to simplify the convolution operation and reduce the number of parameters, making the model lightweight and easier to deploy on terminal devices. The EIoU loss function replaces the model’s original loss function to improve the model’s anti-interference ability and stability.

2. Materials and Methods

2.1. Image Acquisition

In January 2024, a camera was used to collect image data of chickweed. A total of 320 valid images were collected with a maximum resolution of 1920 pixels × 1080 pixels. The collected images covered a variety of conditions, including different illumination levels, single-plant and multiple-plant scenes, and scenes containing impurities. Photos collected under the different conditions are shown in Figure 1.

2.2. Image Enhancement and Data Segmentation

In order to increase the number and diversity of sample data and prevent model overfitting during training, the dataset was expanded to 1600 images through rotation, mirroring, cropping, brightness adjustment, and other image enhancement methods. The expanded image data were randomly divided, with 80% used as the training set and 20% as the test set, to ensure the effectiveness and generalization ability of the model. The data were trained using the five-fold cross-validation method.
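For reference, the following Python sketch illustrates the kinds of augmentation operations described above using the Pillow library. The specific rotation angle, crop window, and brightness factor are illustrative assumptions rather than the exact settings used in this study, and bounding-box labels must be transformed consistently with each geometric operation.

```python
from PIL import Image, ImageEnhance, ImageOps

def augment(img: Image.Image) -> list:
    """Produce augmented variants of one image (illustrative parameters only)."""
    w, h = img.size
    return [
        img.rotate(90, expand=True),                          # rotation
        ImageOps.mirror(img),                                 # horizontal mirroring
        img.crop((w // 8, h // 8, w * 7 // 8, h * 7 // 8)),   # central cropping
        ImageEnhance.Brightness(img).enhance(1.3),            # brightness adjustment
    ]
```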

2.3. Image Data Annotation

Labeling software was used to manually annotate the chickweed in the dataset. The labeling category was weed and the label name was grass, as shown in Figure 2. The annotation information was saved in TXT label files.
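For reference, YOLO-style TXT labels store one object per line as a class index followed by the normalized box center and size. The helper below is a minimal sketch (the function name and the example coordinates are illustrative) showing how a pixel-space bounding box could be converted to this format.

```python
def to_yolo_line(cls_id: int, x1: float, y1: float, x2: float, y2: float,
                 img_w: int, img_h: int) -> str:
    """Convert a pixel-space box to one YOLO TXT line:
    'class x_center y_center width height', all normalized to [0, 1]."""
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# e.g. a chickweed box on a 1920 x 1080 image (values illustrative)
print(to_yolo_line(0, 800, 400, 1100, 650, 1920, 1080))
```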

2.4. YOLOv5s Model’s Structure

The YOLOv5s model’s structure consists of four main parts, namely: the input terminal (Input) responsible for processing the original image data; the backbone network (Backbone) responsible for extracting image features; the feature fusion network (Neck) that fuses the feature maps extracted by the backbone network (Backbone); and the prediction end (Head) that performs target detection and prediction on the fused feature map, as shown in Figure 3. Efficient and accurate target detection can be achieved through the above four parts [28,29].

2.5. Improvement Strategy: SE Module

In order to strengthen the YOLOv5s model’s attention to channel information, the SE (Squeeze-and-Excitation) attention mechanism module is added to the backbone network. It consists of two operations, Squeeze and Excitation, and weights features according to the importance of each channel while suppressing unimportant features. The advantage of this module is that it can be added to an existing network and improve its effectiveness without changing the original network structure. The SE module was originally used in image classification tasks; because it can enhance a network’s attention to informative features, it has been widely used in various visual tasks, including object detection and image classification. The SE module’s structure is shown in Figure 4 [30,31,32].
First, the spatial dimension is compressed through the Squeeze module (compression operation), and each feature map is globally average pooled and compressed into a real number, thereby extending the receptive field to the global scope. This type of real number can be obtained by Formula (1):
$$Z_k = \frac{1}{W \times H}\sum_{i=1}^{W}\sum_{j=1}^{H} u_k(i,j), \quad k = 1, 2, \ldots, C \tag{1}$$
where $Z_k$ is the compressed value representing the importance of the kth channel over the entire feature map; $W \times H$ represents the spatial dimensions of $u$; $u$ represents the convolved feature map; $u_k(i,j)$ represents the value at position $(i,j)$ on the kth channel of the feature map; and $C$ represents the number of channels of $u$.
Next, the Excitation module (excitation operation) takes the real numbers obtained in the previous step and passes them through two fully connected layers to increase the nonlinearity of the module. The first fully connected layer reduces the dimension; after ReLU activation, the second fully connected layer restores the dimension, and the sigmoid function is then used for activation. The process is as follows:
$$s = \sigma\left(W_2\,\delta\left(W_1 z\right)\right) \tag{2}$$
where $\sigma$ represents the sigmoid activation function; $W_1$ and $W_2$ represent the parameters of the two fully connected layers; $\delta$ represents the nonlinear activation function ReLU; and $z$ represents the global feature descriptor obtained by the squeeze operation.
Finally, channel weight multiplication is used to multiply the weight values calculated by the SE module with the original feature channels to obtain the newly calibrated features, as shown in Formula (3):
$$X_k = s_k \cdot u_k, \quad k = 1, 2, \ldots, C \tag{3}$$
where $X_k$ represents the weighted feature map of the kth channel, and $s_k$ represents the weight value of the kth channel.
The SE attention module is introduced based on the backbone network of the original network model, and its location is shown in Figure 5. Feature mapping is constructed based on channels to increase the proportion of useful information in the feature layer, thereby improving the recognition accuracy of chickweed and reducing the probability of false detection during model detection.
In the context of weed identification, the SE module helps the model concentrate on features that are most critical for distinguishing among different types of vegetation while ignoring noise or irrelevant patterns. The SE module is a lightweight yet powerful addition to the architecture. It dynamically adjusts channel-wise feature responses based on global contextual information, allowing the model to prioritize the most important features for detection. This significantly improves the performance of YOLOv5s in tasks such as weed identification, where distinguishing among different vegetation types is crucial. The combination of squeeze and excitation ensures that both spatial and channel-wise dependencies are leveraged to enhance the model’s robustness.
The SE module improves the model’s perception of important features through adaptive weighting between channels. Its introduction increases the number of parameters and the computation of the model: the fully connected layers it adds after the selected convolutional layers contribute extra parameters, but these are relatively few and have only a small impact on the total parameter count and computation, while the additional calculation steps, such as global average pooling and the fully connected layers, increase the computation to a certain extent.
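To make the squeeze, excitation, and reweighting steps of Formulas (1)–(3) concrete, the following is a minimal PyTorch sketch of an SE block; the class name and the reduction ratio of 16 are illustrative assumptions, not the exact implementation used in this study.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Minimal Squeeze-and-Excitation block (sketch; reduction ratio assumed)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)             # global average pooling per channel
        self.excitation = nn.Sequential(
            nn.Linear(channels, channels // reduction),    # first FC layer reduces dimension
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),    # second FC layer restores dimension
            nn.Sigmoid(),                                  # channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        z = self.squeeze(x).view(b, c)           # Formula (1): one value z_k per channel
        s = self.excitation(z).view(b, c, 1, 1)  # Formula (2): s = sigmoid(W2 * ReLU(W1 * z))
        return x * s                             # Formula (3): channel-wise recalibration
```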

2.6. Improvement Strategy: CBAM Module

Adding the CBAM (Convolutional Block Attention Module) attention mechanism to the backbone network of YOLOv5s can effectively suppress unimportant information and improve the neural network’s attention to chickweed targets, thereby improving detection accuracy. The CBAM module is added to the backbone network, and its location is shown in Figure 6.
The overall process of the CBAM attention mechanism is shown in Figure 7. It consists of the channel attention module (CAM) and the spatial attention module (SAM), which focus on the key information in the image. In this task, a higher weight is assigned to chickweed, which improves the network’s ability to learn and express features. Because of these advantages, the module has been widely used in feature extraction tasks that require finer detail [33].
The structure of the channel attention module (CAM) is shown in Figure 8. The module aggregates the input feature map $F$ through average pooling ($F_{avg}^{c}$) and maximum pooling ($F_{max}^{c}$) to extract feature information; the pooled descriptors are processed by a shared Multi-Layer Perceptron (MLP) and passed through the activation function $\sigma$ to obtain the channel attention map $M_c$. Denoting the parameters of the two MLP layers by $\alpha_1$ and $\alpha_2$, the attention expression in Formula (4) is obtained:
$$M_c(F) = \sigma\left(\alpha_2\left(\alpha_1\left(F_{avg}^{c}\right)\right) + \alpha_2\left(\alpha_1\left(F_{max}^{c}\right)\right)\right) \tag{4}$$
where $M_c(F)$ represents the channel attention weight of the feature map $F$; $\alpha_1$ and $\alpha_2$ represent the parameters of the two layers of the multi-layer perceptron; $F_{avg}^{c}$ represents the average-pooled descriptor; and $F_{max}^{c}$ represents the max-pooled descriptor.
The structure of the spatial attention module (SAM) is shown in Figure 9. Its input is the output of the channel attention module (CAM). This module performs average pooling ($F_{avg}^{s}$) and maximum pooling ($F_{max}^{s}$) along the channel axis of the input feature map; the aggregated channel information is concatenated into a single feature map, and a 7 × 7 convolution is used to generate a two-dimensional spatial attention map, as shown in Formula (5):
$$M_s(F) = \sigma\left(f^{7\times 7}\left(\left[\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)\right]\right)\right) = \sigma\left(f^{7\times 7}\left(\left[F_{avg}^{s}; F_{max}^{s}\right]\right)\right) \tag{5}$$
where $M_s(F)$ represents the spatial attention weight of the feature map $F$; $f^{7\times 7}$ represents a convolution operation with a 7 × 7 filter; $\mathrm{AvgPool}(F)$ represents the output of average pooling; $\mathrm{MaxPool}(F)$ represents the output of maximum pooling; and $[F_{avg}^{s}; F_{max}^{s}]$ represents the two $H \times W \times 1$ descriptors concatenated along the channel dimension.
Average pooling and maximum pooling are used simultaneously in the CAM and SAM modules to capture the information in the feature map more comprehensively. Average pooling calculates the mean value of the feature map at each channel or spatial position and extracts global context information, which helps the network attend to the distribution of overall features and identify regions of the feature map with higher average responses. Maximum pooling focuses on the most prominent feature at each channel or spatial location, i.e., it extracts the most salient local information; through maximum pooling, the network can focus on the most responsive parts of the feature map and identify the most critical target features. Combining average pooling and maximum pooling captures important global and local information in the feature map over a wider scale, thus improving the expressive ability of the attention mechanism and the recognition accuracy of the network. This combination enhances the model’s ability to detect diverse targets in complex scenarios, especially in distinguishing different types of targets and reducing false detections.
The CBAM module further enhances the feature extraction capability of the model by focusing on important information in the “channel” and “space” dimensions, respectively. Its introduction will also affect the model parameters and calculation to a certain extent. The CBAM module realizes spatial attention and channel attention by introducing a convolutional layer, and the introduced convolutional layer has a small number of parameters. At the same time, the computational complexity of the CBAM module is relatively high, especially when applied to high-dimensional feature maps, which will significantly increase the computational complexity of the model, including global average pooling, maximum pooling, and convolution operations.
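The following PyTorch sketch illustrates how the channel attention of Formula (4) and the spatial attention of Formula (5) can be composed into a CBAM block; the class names, the reduction ratio, and the use of 1 × 1 convolutions as the shared MLP are illustrative assumptions rather than the exact implementation used here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Channel attention (CAM): shared MLP over average- and max-pooled descriptors, Formula (4)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(  # shared MLP implemented with 1x1 convolutions
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))   # MLP(F_avg^c)
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))    # MLP(F_max^c)
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    """Spatial attention (SAM): 7x7 convolution over channel-wise mean and max maps, Formula (5)."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = torch.mean(x, dim=1, keepdim=True)      # F_avg^s
        mx, _ = torch.max(x, dim=1, keepdim=True)     # F_max^s
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Channel attention followed by spatial attention."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.ca(x)     # reweight channels
        return x * self.sa(x)  # reweight spatial positions
```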

2.7. Ghost Convolution

The introduction of the improvement strategies increased the model’s recognition accuracy, but it also caused the model volume to increase. In order to achieve a lightweight model, Ghost convolution was introduced to replace the ordinary convolution in the Conv module for feature extraction. Ghost convolution consists of two main parts. First, a small number of intrinsic feature maps are generated by ordinary convolution; the intrinsic feature maps are then linearly transformed to obtain additional feature maps, and the two sets of feature maps are spliced to form a new feature map. Compared with ordinary convolution, Ghost convolution maintains a high degree of similarity in the output features while significantly reducing the number of model parameters, making it easier to deploy the model on terminal devices, simplifying the convolution operation, and achieving a lightweight model. At present, Ghost convolution is mainly used to design lightweight neural networks, especially in applications that require high computational efficiency [34]. The convolution formation process of the Ghost module is shown in Figure 10.
Ghost convolution uses fewer parameters to recognize target features; while it makes the model lightweight, it can lead to a decrease in detection accuracy, so the replacement position of Ghost convolution needs to be verified. After pre-testing, the Ghost convolution module was used to replace the ordinary convolutions in all Conv modules of the YOLOv5s backbone and neck networks, which not only reduced the model size substantially but also preserved recognition accuracy.
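As an illustration of the two-stage process in Figure 10, the sketch below generates a few intrinsic feature maps with an ordinary convolution and the remaining "ghost" maps with a cheap depthwise convolution before splicing them; the class name, the ratio of 2, and the activation choice are assumptions and not the exact module used in this study.

```python
import math
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution sketch: intrinsic maps from an ordinary convolution plus
    ghost maps from a cheap depthwise transformation, spliced together."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 1, s: int = 1, ratio: int = 2):
        super().__init__()
        init_ch = math.ceil(out_ch / ratio)        # intrinsic feature maps
        cheap_ch = init_ch * (ratio - 1)           # ghost feature maps
        self.primary = nn.Sequential(              # ordinary convolution
            nn.Conv2d(in_ch, init_ch, k, s, k // 2, bias=False),
            nn.BatchNorm2d(init_ch),
            nn.SiLU(inplace=True),
        )
        self.cheap = nn.Sequential(                # cheap linear transformation (depthwise conv)
            nn.Conv2d(init_ch, cheap_ch, 3, 1, 1, groups=init_ch, bias=False),
            nn.BatchNorm2d(cheap_ch),
            nn.SiLU(inplace=True),
        )
        self.out_ch = out_ch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.primary(x)
        z = self.cheap(y)
        return torch.cat([y, z], dim=1)[:, : self.out_ch]  # splice intrinsic + ghost maps
```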

2.8. EIoU Loss Function

The loss function of the YOLOv5s model consists of a bounding box regression loss function, a classification loss function, and a confidence loss function. The bounding box regression loss function is not sufficiently sensitive to the similarity between boxes and is susceptible to interference from different scales and angles; in complex environments, the classification loss function suffers from problems such as training difficulty and model instability. The original loss function therefore limits accuracy. In order to further improve the accuracy of the model, the original loss function is replaced by the EIoU loss function. The EIoU loss function combines the intersection over union (IoU) with the difference in center coordinates and introduces correction factors. Its expression is given in Formula (6):
$$L_{EIoU} = 1 - IoU + \frac{\rho^{2}\left(b, b^{gt}\right)}{C^{2}} + \frac{\rho^{2}\left(w, w^{gt}\right)}{C_w^{2}} + \frac{\rho^{2}\left(h, h^{gt}\right)}{C_h^{2}} \tag{6}$$
where $b$ represents the center point of the prediction box; $b^{gt}$ represents the center point of the target box; $\rho$ represents the Euclidean distance between the two center points; $C$ represents the diagonal length of the smallest enclosing box covering the prediction box and the target box; $C_w$ and $C_h$ represent the width and height of the smallest enclosing box; $w$ and $h$ represent the width and height of the prediction box; and $w^{gt}$ and $h^{gt}$ represent the width and height of the target box.
The EIoU loss function can not only better measure the similarity between target frames, but also help the model better handle complex situations in the target detection task, thus improving the performance of the target detection model.
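A minimal PyTorch sketch of Formula (6) is given below for boxes in corner (x1, y1, x2, y2) format; the function name and the epsilon used for numerical stability are illustrative assumptions.

```python
import torch

def eiou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """EIoU loss sketch for (N, 4) box tensors in corner format, per Formula (6)."""
    # widths, heights and centers of both boxes
    w1, h1 = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    w2, h2 = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    cx1, cy1 = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx2, cy2 = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2

    # IoU term
    inter_w = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(0)
    inter_h = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(0)
    inter = inter_w * inter_h
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union

    # smallest enclosing box
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps                       # squared diagonal of enclosing box

    # center-distance, width, and height penalty terms
    rho2 = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
    return 1 - iou + rho2 / c2 + (w1 - w2) ** 2 / (cw ** 2 + eps) + (h1 - h2) ** 2 / (ch ** 2 + eps)
```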

2.9. Test Environment

The test environment used in this study is shown in Table 1.
The batch size was set to 8; the learning rate was 0.01; the number of training epochs was 150; the momentum parameter was 0.95; and the weight decay coefficient was 0.001.

2.10. Confusion Matrix

The confusion matrix of this experiment is shown in Table 2.
True Positive ( T P ): The model correctly predicted that the sample that was actually chickweed was chickweed; False Negative ( F N ): The model incorrectly predicted that a sample that was actually chickweed was non-chickweed; False Positive ( F P ): The model incorrectly predicted that samples that were actually non-chickweed were chickweed; True Negative ( T N ): The model correctly predicted that the sample that was actually non-chickweed was non-chickweed.

2.11. Evaluation Indicators

Model performance was evaluated using precision (P), recall (R), mean average precision (mAP), and model volume. Precision P refers to the proportion of samples predicted as positive that are actually positive. Recall R refers to the proportion of actual positive samples that are predicted as positive. Compared with the fluctuation of P and R, mAP better reflects global performance. P, R, and mAP are given in Formulas (7)–(9):
$$P = \frac{TP}{TP + FP} \times 100\% \tag{7}$$
$$R = \frac{TP}{TP + FN} \times 100\% \tag{8}$$
$$mAP = \int_{0}^{1} P(R)\,\mathrm{d}R \times 100\% \tag{9}$$
Model volume, computation, and parameter size were used to measure the complexity of the model; frame rate was used to evaluate the real-time detection performance of the model.
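As a quick sanity check of Formulas (7) and (8), the snippet below computes precision and recall from confusion-matrix counts; applying it to the first fold reported in Table 4 (TP = 152, FP = 5, FN = 8) reproduces the listed 96.82% precision and 95.00% recall.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple:
    """Precision and recall (in percent) from confusion-matrix counts, Formulas (7) and (8)."""
    precision = tp / (tp + fp) * 100
    recall = tp / (tp + fn) * 100
    return precision, recall

# First fold in Table 4: TP = 152, FP = 5, FN = 8
print(precision_recall(152, 5, 8))   # approximately (96.82, 95.00)
```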

3. Results

3.1. Ablation Test

In order to verify the effect of adding each module to the original YOLOv5s model and to find a model more suitable for the chickweed identification task, modifications were made on the basis of the original YOLOv5s model: the three modules (SE, CBAM, and Ghost) were installed and tested individually, in pairs, and all together, and the various evaluation indicators were calculated. The test results are shown in Table 3.
Table 3. Ablation test results.
Model | Precision/% | Recall/% | Mean Average Precision/% | Model Volume/MB | Computation/GB | Parameter Size/MB | Frame Rate/fps
YOLOv5s | 90.20 | 89.60 | 92.20 | 13.6 | 15.90 | 6.80 | 13.21
YOLOv5s + SE | 93.10 | 91.30 | 92.50 | 14.2 | 16.30 | 7.10 | 13.51
YOLOv5s + CBAM | 93.90 | 92.50 | 92.90 | 13.9 | 16.00 | 7.00 | 13.61
YOLOv5s + Ghost | 89.70 | 89.80 | 92.10 | 7.5 | 11.30 | 5.10 | 13.20
YOLOv5s + SE + CBAM | 95.90 | 93.80 | 93.30 | 15.2 | 17.60 | 7.50 | 13.95
YOLOv5s + SE + Ghost | 92.60 | 91.10 | 92.60 | 8.2 | 12.30 | 5.70 | 13.77
YOLOv5s + CBAM + Ghost | 93.00 | 92.30 | 92.70 | 8.5 | 13.10 | 5.40 | 13.62
YOLOv5s + SE + CBAM + Ghost | 95.40 | 93.60 | 93.00 | 9.6 | 13.60 | 5.90 | 13.83
YOLOv5s + SE + CBAM + Ghost + EIoU | 96.80 | 94.00 | 93.20 | 9.6 | 13.60 | 5.90 | 14.01
Note: Five-fold cross-validation was used in this experiment, and the individual experiments and datasets are comparable, i.e., consistent in design and execution. The precision and recall values reported here are therefore the averages of the five folds. The fold-by-fold results of the YOLOv5s + SE + CBAM + Ghost + EIoU model are shown in Table 4.
Table 4. Results of the five-fold cross-validation method.
Test Number | TP | FN | FP | TN | Precision/% | Recall/%
1 | 152 | 8 | 5 | 155 | 96.82 | 95.00
2 | 151 | 9 | 4 | 156 | 97.42 | 94.38
3 | 149 | 11 | 3 | 158 | 98.03 | 93.13
4 | 150 | 10 | 6 | 154 | 96.15 | 93.75
5 | 150 | 10 | 7 | 153 | 95.54 | 93.75
Average | / | / | / | / | 96.80 | 94.00
The confusion matrix results of the first group of experiments with the YOLOv5s + SE + CBAM + Ghost + EIoU model are shown in Table 5.
It can be concluded from the ablation test results that the SE module and CBAM module greatly improve the accuracy of the model. These two modules can effectively suppress unimportant information, increase the neural network’s attention to chickweed, and improve the feature extraction capability of the model, thereby improving detection accuracy. The Ghost module can achieve lightweighting of the model without excessively reducing the accuracy of the model. At the same time, after replacing the original function with the EIoU loss function, the detection accuracy is further improved and the detection speed of the model is accelerated.
Compared with the original YOLOv5s model, the precision, recall, mean average precision, and frame rate of the improved YOLOv5s model increased by 6.6%, 4.4%, 1.0%, and 6.1%, respectively, while the model volume, computation, and parameter size decreased by 29.4%, 14.5%, and 13.2%, respectively. The performance of the improved YOLOv5s model is thus significantly better.
During the experiments, it was found that when other weeds appear in a chickweed image, the original YOLOv5s model misdetects the other weeds as chickweed, as shown in Figure 11a, whereas the improved YOLOv5s model can locate the chickweed more accurately, as shown in Figure 11b. The improved model’s recognition of chickweed among other weeds is significantly better than that of the original YOLOv5s model. This is attributed to the SE and CBAM modules strengthening the extraction of important chickweed features and to the SE module extending the receptive field to the global scope.
In summary, the YOLOv5s + SE + CBAM + Ghost + EIoU model was selected to efficiently complete the identification of chickweed.

3.2. Comparative Tests

The improved YOLOv5s model was compared with other mainstream models to verify its superiority. The comparative test results are shown in Table 6. The comparative experiments show that the YOLOv4, SSD, and Faster-RCNN models are larger than the improved YOLOv5s model, are not suitable for lightweight deployment, and have lower recognition accuracy. Although the YOLOv7s and YOLOv8s models performed well, the improved YOLOv5s model performed even better in the detection task, especially in terms of precision and recall, which reached 96.80% and 94.00%, respectively, while the mean average precision (mAP) reached 93.20%. Although YOLOv7s and YOLOv8s have their own advantages in frame rate and other performance indicators, they are inferior to the improved YOLOv5s model when resource occupancy and detection accuracy are considered together. Specifically, the improved YOLOv5s model achieves excellent detection accuracy with the smallest computation (13.60 GB) and the smallest model volume (9.6 MB), which makes it particularly suitable for deployment in environments with limited computing resources without sacrificing detection performance. Although its frame rate is relatively low (14.01 fps), it meets the detection rate requirements of this task. Therefore, given the need for low resource occupancy and high detection accuracy in this recognition task, the improved YOLOv5s model was selected as the final detection model, as it best met the requirements of the detection task.

3.3. Experimental Effects

In order to verify the effect of the improved YOLOv5s model on the recognition of chickweed in agricultural production, an image-splicing method was used to splice images of chickweed into images of broad bean seedlings and rapeseed seedlings, and experiments were conducted. The test results are shown in Figure 12 and Figure 13. Compared with the YOLOv5s model, the recognition ability of the improved YOLOv5s model in agricultural production is significantly better, which is reflected in higher confidence, more accurate frame selection of chickweed, and the ability to effectively distinguish chickweed from the surrounding crops.

4. Discussion

In this study, we proposed an improved YOLOv5s model to identify chickweed. The main objective is to provide theoretical and technical support for the effective identification of chickweed in complex field conditions and to enhance the robustness of the model in detecting chickweed in different fields. A dataset of images under different conditions was built, augmented, and used for training. The improved model shows strong detection capability and can accurately identify chickweed in field images.
One of the key improvements is the integration of the SE module into the original YOLOv5s model. The SE module enhances the model’s ability to recalibrate the importance of feature channels so that features can be extracted more efficiently and irrelevant channels can be suppressed [32]. By embedding the SE module into the backbone network, we have observed a significant improvement in overall object recognition performance, which is consistent with previous studies emphasizing the advantages of attention mechanisms in feature enhancement [35].
As well as the SE module, the CBAM module has been introduced to further improve the feature extraction and recognition capabilities. CBAM combines channel and spatial attention, enabling the model to focus more effectively on the most relevant features in the image [33]. By embedding the CBAM attention mechanism into the backbone network, the recognition performance for chickweed detection was greatly improved, confirming the conclusions drawn from previous studies [36,37]. This confirms the effectiveness of the CBAM module in enhancing object recognition.
In addition, Ghost convolution has been introduced into the Conv module in place of ordinary convolution for feature extraction, which further optimizes the model. Ghost convolution reduces the computational complexity of the model by generating more feature maps with fewer convolution operations [34]. Replacing ordinary convolution with Ghost convolution simplifies the convolution operation and improves the overall efficiency of the model, which is consistent with previous studies [38].
Furthermore, we replaced the original loss function with the EIoU loss function, which takes into account the IoU difference and the central coordinates. This modification is expected to improve the accuracy of the model by improving the bounding box regression [39], which is very critical for the accurate detection of chickweed.
Although the model has improved in terms of model size and detection speed, there are still areas for further improvement; in particular, some network details can be refined, which may lead to greater gains in detection accuracy and robustness.
Future studies should not be limited to the detection of chickweed but should consider a wider range of weed species to further verify the generality of this model. Extending the dataset to include other types of weeds, and applying the model to a wider range of crops, will provide a more comprehensive assessment of its effectiveness in real-world agricultural applications.

5. Conclusions

Adding the attention mechanism SE module and CBAM module to the backbone network was found to effectively suppress unimportant information, improve the neural network’s attention to the chickweed target, improve the feature extraction capability of the model, and thereby improve the detection accuracy. Using Ghost convolution to replace the ordinary convolution in the Conv module during extraction achieved lightweighting of the model without excessively reducing the accuracy of the model; at the same time, replacing the original loss function with the EIoU loss function further improved the model’s performance.
An improved model based on the YOLOv5s model was proposed. The YOLOv5s + SE + CBAM + Ghost + EIoU model achieves a balance between recognition accuracy and model volume. In the chickweed recognition test with a small dataset, the precision of the model was 96.80%, the recall was 94.00%, the mean average precision was 93.20%, and the frame rate was 14.01 fps, improvements of 6.6%, 4.4%, 1.0%, and 6.1%, respectively, over the original YOLOv5s model. The model volume was 9.6 MB, the computation was 13.6 GB, and the parameter size was 5.9 MB, reductions of 29.4%, 14.5%, and 13.2%, respectively, compared with the original YOLOv5s model. The performance improvements were significant.
We conducted comparative experiments with the YOLOv5s, YOLOv4, SSD, Faster-RCNN, YOLOv7s, and YOLOv8s models. Compared with the improved YOLOv5s model, the YOLOv5s, YOLOv4, SSD, and Faster-RCNN models are larger in size, are not suitable for lightweight deployment, and have lower recognition accuracy. YOLOv7s and YOLOv8s have their own advantages in frame rate and other performance indicators, but they are inferior to the improved YOLOv5s model when resource occupancy and detection accuracy are considered together. The improved YOLOv5s model achieves high accuracy in identifying chickweed with a small dataset, helps improve the identification accuracy of chickweed in the field, and promotes the development of weed control and precision agriculture.

Author Contributions

Conceptualization, H.Y. and Y.L.; methodology, Y.L. and J.Z.; software, J.Z. and X.X.; validation, H.Y. and Y.L.; Data curation, J.Z. and Y.Z.; writing—original draft preparation, H.Y.; writing—review and editing, J.Z., Y.L. and Y.Z.; project administration, H.Y., Y.L. and X.X.; funding acquisition, H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Jiangsu Modern Agricultural Machinery Equipment and Technology Demonstration Project (NJ2023-22), the Basic science (Natural science) research project in universities of Jiangsu (24KJB510045), the 2022 Taizhou “Fengcheng Talent Program” Young science and technology talent Lifting Project (Taizhou Association for Science and Technology Document (2022) No. 64), and the Jiangsu Agri-animal Husbandry Vocational College Research Project (NSF2023ZR12).

Data Availability Statement

Data is contained within the article or supplementary material.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Westwood, J.H.; Charudattan, R.; Duke, S.O.; Fennimore, S.A.; Marrone, P.; Slaughter, D.C.; Swanton, C.; Zollinger, R. Weed Management in 2050: Perspectives on the Future of Weed Science. Weed Sci. 2018, 66, 275–285. [Google Scholar] [CrossRef]
  2. Raja, R.; Nguyen, T.T.; Slaughter, D.C.; Fennimore, S.A. Real-time weed-crop classification and localisation technique for robotic weed control in lettuce. Biosyst. Eng. 2020, 192, 257–274. [Google Scholar] [CrossRef]
  3. Fu, H.; Zhao, X.; Zhai, C.; Zheng, K.; Zheng, S.; Wang, X. Research progress on weed recognition method based on deep learning technology. J. Chin. Agric. Mech. 2023, 44, 198–207. [Google Scholar]
  4. Yuan, H.; Zhao, N.; Cheng, M. Review of weeds recognition based on image processing. Trans. Chin. Soc. Agric. Mach. 2020, 51, 323–334. [Google Scholar]
  5. Gharde, Y.; Singh, P.K.; Dubey, R.P.; Gupta, P.K. Assessment of yield and economic losses in agriculture due to weeds in India. Crop Prot. 2018, 107, 12–18. [Google Scholar] [CrossRef]
  6. Hussain, Z.; Marwat, K.B.; Cardina, J.; Khan, I.A. Xanthium stramonium L. Impact on corn yield and yield components. Turk. J. Agric. For. 2014, 38, 39–46. [Google Scholar] [CrossRef]
  7. Tursun, N.; Datta, A.; Sakinmaz, M.S.; Kantarci, Z.; Knezevic, S.Z.; Chauhan, B.S. The critical period for weed control in three corn (Zea mays L.) types. Crop Prot. 2016, 90, 59–65. [Google Scholar] [CrossRef]
  8. Zhu, H.; Zhang, Y.; Mu, D.; Bai, L.; Zhuang, H.; Li, H. YOLOX-based blue laser weeding robot in corn field. Front. Plant Sci. 2022, 13, 1017803. [Google Scholar] [CrossRef]
  9. Liu, B.; Bruch, R. Weed detection for selective spraying: A review. Curr. Robot. Rep. 2020, 1, 19–26. [Google Scholar] [CrossRef]
  10. Wang, Z.; Guo, J.; Zhang, S. Lightweight convolution neural network based on multi-scale parallel fusion for weed identification. Int. J. Pattern Recognit. Artif. Intell. 2022, 36, 2250028. [Google Scholar] [CrossRef]
  11. Heap, I.; Duke, S.O. Overview of glyphosate-resistant weeds worldwide. Pest. Manag. Sci. 2018, 74, 1040–1049. [Google Scholar] [CrossRef]
  12. Gould, F.; Brown, Z.S.; Kuzma, J. Wicked evolution: Can we address the sociobiological dilemma of pesticide resistance? Science 2018, 360, 728–732. [Google Scholar] [CrossRef]
  13. Peterson, M.A.; Collavo, A.; Ovejero, R.; Shivrain, V.; Walsh, M.J. The challenge of herbicide resistance around the world: A current summary. Pest. Manag. Sci. 2018, 74, 2246–2259. [Google Scholar] [CrossRef]
  14. Mennan, H.; Jabran, K.; Zandstra, B.H.; Pala, F. Non-chemical weed management in vegetables by using cover crops: A review. Agronomy 2020, 10, 257. [Google Scholar] [CrossRef]
  15. Wu, Z.; Chen, Y.; Zhao, B.; Kang, X.; Ding, Y. Review of weed detection methods based on computer vision. Sensors 2021, 21, 3647. [Google Scholar] [CrossRef]
  16. Dhanya, V.G.; Subeesh, A.; Kushwaha, N.L.; Vishwakarma, D.K.; Kumar, T.N.; Ritika, G.; Singh, A.N. Deep learning based computer vision approaches for smart agricultural applications. Artif. Intell. Agric. 2022, 6, 211–229. [Google Scholar] [CrossRef]
  17. Deng, X.; Qi, L.; Ma, X.; Jiang, Y.; Chen, X.; Liu, H.; Chen, W. Recognition of weeds at seedling stage in paddy fields using multi-feature fusion and deep belief networks. Trans. Chin. Soc. Agric. Eng. 2018, 34, 165–172. [Google Scholar]
  18. Peng, W.; Lan, Y.; Yue, X.; Cheng, Z.; Wang, L.; Cen, Z.; Lu, Y.; Hong, J. Research on paddy weed recognition based on deep convolutional neural network. J. S. Chin. Agric. Univ. 2020, 41, 75–81. [Google Scholar]
  19. Zhang, L.; Jin, X.; Fu, L.; Li, S. Recognition Method for Weeds in Rapeseed Field Based on Faster R-CNN Deep Network. Laser Optoelectron. Prog. 2020, 57, 304–312. [Google Scholar] [CrossRef]
  20. Sun, Y.; Chen, Y.; Jin, X.; Yu, J.; Chen, Y. Differentiation of Bok choy seedlings from weeds. Fujian J. Agric. Sci. 2021, 36, 1484–1490. [Google Scholar]
  21. Sun, J.; Tan, W.; Wu, X.; Shen, J.; Lu, B.; Dai, C. Real-time recognition of sugar beet and weeds in complex backgrounds using multi-channel depth-wise separable convolution model. Trans. Chin. Soc. Agric. Eng. 2019, 35, 184–190. [Google Scholar]
  22. Meng, Q.; Zhang, M.; Yang, X.; Liu, Y.; Zhang, Z. Recognition of maize seedling and weed based on light weight convolution and feature fusion. Trans. Chin. Soc. Agric. Mach. 2020, 51, 238–245, 303. [Google Scholar]
  23. Xu, Y.; He, R.; Zhai, Y.; Zhao, B.; Li, C. Weed identification method based on deep transfer learning in field natural Environment. J. Jilin Univ. (Eng. Technol.) 2021, 51, 2304–2312. [Google Scholar]
  24. Kang, J.; Liu, G.; Wang, Q.; Xia, Y.; Guo, G.; Liu, W. Weed detection algorithm based on dynamic pruning neural network. Trans. Chin. Soc. Agric. Mach. 2023, 54, 269–277. [Google Scholar]
  25. Bakhshipour, A.; Jafari, A. Evaluation of support vector machine and artificial neural networks in weed detection using shape features. Comput. Electron. Agric. 2018, 145, 153–160. [Google Scholar] [CrossRef]
  26. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; IEEE: New York, NY, USA, 2014; pp. 580–587. [Google Scholar]
  27. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; IEEE: New York, NY, USA, 2015; pp. 1440–1448. [Google Scholar]
  28. García-Navarrete, O.L.; Santamaria, O.; Martín-Ramos, P.; Valenzuela-Mahecha, M.Á.; Navas-Gracia, L.M. Development of a Detection System for Types of Weeds in Maize (Zea mays L.) under Greenhouse Conditions Using the YOLOv5 v7.0 Model. Agriculture 2024, 14, 286. [Google Scholar] [CrossRef]
  29. Ajayi, O.G.; Ashi, J.; Guda, B. Performance Evaluation of YOLO v5 Model for Automatic Crop and Weed Classification on UAV Images. Smart Agric. Technol. 2023, 5, 100231. [Google Scholar] [CrossRef]
  30. Wang, Y.; Ma, T.; Chen, G. Weeds detection in farmland based on a modified YOLOv5 algorithm. J. Chin. Agric. Mech. 2023, 44, 167–173. [Google Scholar]
  31. Hui, Q.; Ma, W.; Bian, C. Agricultural weeds recognition method based on enhanced attention mechanism. J. Chin. Agric. Mech. 2023, 44, 195–201. [Google Scholar]
  32. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  33. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition(CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  34. Paoletti, M.E.; Haut, J.M.; Pereira, N.S.; Plaza, J.; Plaza, A. Ghostnet for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 10378–10393. [Google Scholar] [CrossRef]
  35. Chen, Z.; Wu, R.; Lin, Y.; Li, C.; Chen, S.; Yuan, Z.; Chen, S.; Zou, X. Plant disease recognition model based on improved YOLOv5. Agronomy. 2022, 12, 365. [Google Scholar] [CrossRef]
  36. Zhao, J.; Xi, X.; Shi, Y.; Zhang, B.; Qu, J.; Zhang, Y.; Zhu, Z.; Zhang, R. An Online Method for Detecting Seeding Performance Based on Improved YOLOv5s Model. Agronomy. 2023, 13, 2391. [Google Scholar] [CrossRef]
  37. Yuan, C.; Liu, T.; Gao, F.; Zhang, R. YOLOv5s-CBAM-DMLHead: A lightweight identification algorithm for weedy rice (Oryza sativa f. spontanea) based on improved YOLOv5. Crop Prot. 2023, 172, 106342. [Google Scholar] [CrossRef]
  38. Li, L.; Wang, Z.; Zhang, T. Gbh-yolov5: Ghost convolution with bottleneckcsp and tiny target prediction head incorporating yolov5 for pv panel defect detection. Electronics 2023, 12, 561. [Google Scholar] [CrossRef]
  39. Zhou, L.; Wei, D.; Ran, Y.; Liu, C.; Fu, S.; Ren, Z. Reclining Public Chair Behavior Detection Based on Improved YOLOv5. J. Adv. Comput. Intell. 2023, 27, 1175–1182. [Google Scholar] [CrossRef]
Figure 1. Image data of chickweed in different states. (a) Strong light. (b) Weak light. (c) Multiple plants. (d) Containing impurities.
Figure 2. Labeling marked chickweed.
Figure 3. YOLOv5s model structure diagram.
Figure 4. Diagram of the structure of the SE module.
Figure 5. Location of SE module in the backbone network.
Figure 6. Adding CBAM to the backbone network.
Figure 7. The whole process of the CBAM attention mechanism.
Figure 8. Diagram of the structure of the channel attention module (CAM).
Figure 9. Diagram of the structure of the spatial attention module (SAM).
Figure 10. Convolution formation process of the Ghost module.
Figure 11. Experimental results of different models for detecting other weeds in chickweed images. (a) YOLOv5s model inspection. (b) Improved YOLOv5s model inspection.
Figure 12. Comparison of image recognition effects for broad bean seedlings. (a) YOLOv5s recognition effect. (b) Improved YOLOv5s recognition effect.
Figure 13. Comparison of image recognition effects for rape seedlings. (a) YOLOv5s recognition effect. (b) Improved YOLOv5s recognition effect.
Table 1. Test environment.
Name | Experimental Environment
Deep Learning Framework | PyTorch 1.10.1
CPU | 12th Gen Intel(R) Core(TM) i5-12400F 2.50 GHz
GPU | NVIDIA GeForce RTX 3060
Video memory | 12 GB
Software version | CUDA 11.3, cuDNN 8.2.1
Table 2. Confusion matrix.
 | Predicted Chickweed | Predicted Non-Chickweed
Actually chickweed | True Positive (TP) | False Negative (FN)
Actually non-chickweed | False Positive (FP) | True Negative (TN)
Table 5. Confusion matrix results.
 | Predicted Chickweed | Predicted Non-Chickweed
Actually chickweed | 152 | 8
Actually non-chickweed | 5 | 155
Table 6. Comparative test results.
Model | Precision/% | Recall/% | Mean Average Precision/% | Computation/GB | Model Volume/MB | Frame Rate/fps
YOLOv5s | 90.20 | 89.60 | 92.20 | 15.90 | 13.6 | 13.21
YOLOv4 | 85.10 | 80.60 | 84.70 | 60.70 | 256.0 | 9.60
SSD | 87.20 | 85.30 | 86.30 | 105.00 | 92.0 | 55.30
Faster-RCNN | 79.60 | 78.30 | 82.40 | 370.20 | 110.0 | 9.20
YOLOv7s | 93.70 | 92.50 | 92.80 | 104.00 | 91.3 | 28.00
YOLOv8s | 89.75 | 88.60 | 90.30 | 28.70 | 21.4 | 49.00
YOLOv5s + SE + CBAM + Ghost + EIoU | 96.80 | 94.00 | 93.20 | 13.60 | 9.6 | 14.01
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
