Research and Experiment on Miss-Seeding Detection of Potato Planter Based on Improved YOLOv5s

Li, Hongling; Liu, Xiaolong; Zhang, Hua; Li, Hui; Jia, Shangyun; Sun, Wei; Wang, Guanping; Feng, Quan; Yang, Sen; Xing, Wei

doi:10.3390/agriculture14111905

by Hongling Li, Xiaolong Liu *, Hua Zhang, Hui Li, Shangyun Jia, Wei Sun, Guanping Wang, Quan Feng, Sen Yang and Wei Xing

College of Mechanical and Electrical Engineering, Gansu Agricultural University, Lanzhou 730070, China

* Author to whom correspondence should be addressed.

Agriculture 2024, 14(11), 1905; https://doi.org/10.3390/agriculture14111905

Submission received: 24 September 2024 / Revised: 23 October 2024 / Accepted: 25 October 2024 / Published: 27 October 2024

(This article belongs to the Section Digital Agriculture)

Abstract

To improve the performance of potato planters, reduce miss-seeding rates, enhance the overall quality of the seeding operation, and ultimately increase potato yield, effective technical means are needed to monitor and identify miss-seeding during the seeding process. Existing miss-seeding detection technologies commonly rely on sensors, but these are easily affected by factors such as heavy dust and strong vibrations, resulting in poor interference resistance and adaptability. This study therefore applies deep learning algorithms to achieve real-time monitoring of miss-seeding in potato planters during planting. Considering both the need for a lightweight miss-seeding detection model and its practical deployment, the YOLOv5s algorithm is selected and adapted. Firstly, an attention mechanism is integrated into the backbone network to suppress background interference and improve detection accuracy. Secondly, the non-maximum suppression algorithm is improved by replacing the original IoU-NMS with the Soft-NMS algorithm, which improves the filtering of candidate bounding boxes and reduces missed detections of potato seeds caused by background overlap or occlusion. Experimental results show that the precision of the improved algorithm in detecting miss-seeding increased from 96.02% to 98.30%, the recall increased from 96.31% to 99.40%, and the mean average precision (mAP) improved from 99.12% to 99.40%. The improved model reduces missed and false detections, provides more precise target localization, and is suitable for miss-seeding detection for potato planters in natural environments, providing technical and theoretical support for subsequent intelligent reseeding.

Keywords:

potato; spoon chain-type seed metering device; YOLOv5s; miss-seeding detection; CBAM

1. Introduction

Potatoes are the fourth-largest food crop in the world [1], and China ranks first globally in both potato planting area and production, playing a dominant role in the global potato industry [2]. As such, potato yield is crucial to China’s food security. In agricultural planting processes, the seeding step plays a vital role, and its efficiency and quality directly determine the final crop yield [3]. Particularly for potatoes, to ensure stable and increased yields, potato planting machinery must exhibit excellent performance. However, due to the irregular shape of potato seeds, their poor flowability, the high randomness of the seed-picking process, and the uneven surface and strong mechanical vibrations at planting sites, the miss-seeding rate tends to be high, which reduces planting quality [4]. Therefore, this study focuses on addressing the key technical issue of miss-seeding detection in the current spoon chain-type seed metering device.

In the field of potato planting machinery, there are various types of seed metering devices, including needle-type, spoon chain-type, pneumatic, and clamp-type [5]. Among them, the spoon chain-type seed metering device has been widely used globally due to its simple design, wide applicability, adjustable planting spacing, and cost-effectiveness [6]. Many studies have addressed miss-seeding detection for spoon chain-type seed metering devices. For example, Sun Wei [7] and Li Ping et al. [8] designed a detection scheme consisting of Hall sensor positioning and infrared transmitting and receiving devices. Guan Hongmin et al. [9] adopted a multi-sensor fusion detection method, integrating Hall, photoelectric, and piezoelectric sensors to perform multi-level detection of potato seeds, effectively improving the accuracy of miss-seeding detection. Wang Guanping [10] and Zhu Liang et al. [11] proposed a potato miss-seeding detection scheme based on spatial capacitance for spoon chain-type seed metering devices. In addition, Liu Shufeng et al. [12] used two pairs of opposed laser sensors and contact travel switch sensors, respectively, to detect miss-seeding. Qiu et al. [13] developed a miss-seeding detection and compensation system based on visual monitoring, which detected and compensated for miss-seeding in real time with an accuracy of over 98.5%. Existing miss-seeding detection technologies generally use photoelectric and capacitive sensors to monitor the seed pickup status of the seed spoon. Although these detection technologies have low cost and high automation, they are easily affected by factors such as heavy dust and strong vibrations, resulting in poor interference resistance and adaptability.

In recent years, with the rapid development of machine vision and image processing technology, deep learning-based techniques have shown significant advantages in feature extraction. These technologies can automatically learn more complex and abstract representations from datasets and exhibit high adaptability, robustness, and excellent feature expression capabilities, leading to their extensive use in the agricultural field [14]. In the field of image object detection, the mainstream methods are currently divided into two categories. The first category includes two-stage detection algorithms that use a region proposal network (RPN), such as Faster R-CNN [15] and R-CNN [16]. These methods first generate a series of potential regions of interest (ROI) in the image and then classify these regions and refine their bounding boxes. Zhang et al. [17] proposed a weed recognition method based on an optimized Faster R-CNN algorithm for soybean seedlings, achieving an average recognition speed of 336 ms per image and an average recognition accuracy of 99.16%. Li et al. [18] designed an intelligent recognition and counting model for strawberries in natural environments based on R-CNN. The second category consists of one-stage detection algorithms that directly use regression, such as SSD [19] and YOLO [20]. These algorithms avoid the candidate region generation step and directly predict the class probabilities and bounding box coordinates of objects. Dai et al. [21] proposed an improved sprouted potato detection model based on YOLOv5 for detecting and grading sprouted potatoes in complex scenes. Tian et al. [22] used 3000 × 3000 resolution apple images and improved the YOLOv3 model to identify different growth stages of apples. This algorithm demonstrated high efficiency when handling complex and dense backgrounds, with an average detection time of only 0.304 s. All of the above studies adopted deep learning technology, with high recognition accuracy and speed.

Although the YOLO model has demonstrated excellent performance in target detection tasks, its application in agricultural scenarios remains challenging, particularly in complex environments characterized by strong light, dust interference, and occlusions. Under these conditions, the model’s robustness and detection accuracy are often suboptimal. To address these limitations, more advanced models, such as YOLOv7 and YOLOv8, have been developed. These models exhibit improved performance in handling complex backgrounds and mitigating the impact of occlusions, thereby enhancing detection accuracy [23]. Additionally, transformer-based architectures, such as the Swin transformer, have shown the ability to extract richer features in complex scenes, further improving the robustness of object detection models [24]. Furthermore, the incorporation of attention mechanisms, such as the convolutional block attention module (CBAM), has significantly improved feature extraction capabilities, effectively reducing instances of missed and false detections in challenging environments [25]. Despite these advancements, deploying such models on agricultural equipment presents another challenge: the limited computational resources of these devices. Many deep learning models, while highly accurate, are computationally intensive and possess a large number of parameters, making them unsuitable for real-time applications on resource-constrained devices. In contrast, YOLOv5s, with its lightweight architecture, offers a more practical solution for agricultural scenarios. Its design allows for a significant reduction in computational overhead while maintaining high detection accuracy, making it more suitable for deployment on resource-limited agricultural equipment. In this study, we propose two key improvements to the YOLOv5s algorithm to address the specific challenges of missed potato seeding detection. 
First, we integrate the CBAM attention mechanism to enhance the model’s detection accuracy in complex environments, thereby reducing the likelihood of missed or false detections. Second, we replace the original IoU-based non-maximum suppression (NMS) with Soft-NMS, which retains detections of overlapping or occluded seed potatoes that would otherwise be suppressed. These improvements aim to enhance the model’s adaptability to agricultural settings, especially in resource-constrained environments, enabling more efficient real-time detection of missed seedings.

2. Materials and Methods

2.1. Data Collection

A high-resolution CCD industrial camera (WP-UT320/M, 120 frames/s, Huagu Power, Shenzhen, China) was installed on the spoon chain-type seed metering device (Gansu Agricultural University, Lanzhou, China) to capture all seed potato image data. The image resolution was 2048 × 1536. The camera was fixed on a camera mounting plate (see Figure 1), with the shortest distance of 15 cm from the seed spoon. The collected video of the seed spoons in motion was decomposed into a series of images, and after data cleaning, 1550 clear and non-blurred images were selected, as shown in Figure 2.

2.2. Dataset Construction and Division

To enhance the diversity of the image dataset, facilitate more effective feature extraction, and improve the model’s generalization ability, this study employed several image augmentation techniques to expand the image dataset for miss-seeding detection: horizontal and vertical flipping, brightness adjustment (both enhancement and reduction), motion blur simulation, and contrast adjustment [26]. Through these methods, the dataset was expanded to a total of 5000 images. The specific strategies are as follows. Horizontal and vertical flipping: images are flipped horizontally and vertically to increase the diversity of the dataset. Brightness adjustment: the brightness of the images is randomly scaled by a factor of 0.8 to 1.2 to simulate different lighting conditions. Motion blur: motion blur is randomly introduced at an angle between −45° and 45° to simulate dynamic scenes during capture. Contrast adjustment: the contrast of the images is randomly scaled by a factor of 0.8 to 1.2 to improve the model’s generalization ability under different contrast conditions.
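
The four augmentation strategies above can be sketched with NumPy as follows. This is a minimal illustration rather than the pipeline used in the study: the function names, the 9 × 9 blur kernel size, and the choice to apply a single random augmentation per call are assumptions introduced here; only the 0.8 to 1.2 scaling range and the −45° to 45° angle range come from the text.

```python
import numpy as np

def flip(image, horizontal=True):
    """Flip an H x W x C image left-right (horizontal) or up-down (vertical)."""
    return image[:, ::-1] if horizontal else image[::-1]

def adjust_brightness(image, factor):
    """Scale pixel intensities by `factor` (0.8 to 1.2 in the text)."""
    scaled = image.astype(np.float32) * factor
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)

def adjust_contrast(image, factor):
    """Scale deviations from the mean intensity by `factor`."""
    img = image.astype(np.float32)
    mean = img.mean()
    return np.clip(np.rint((img - mean) * factor + mean), 0, 255).astype(np.uint8)

def motion_blur_kernel(size, angle_deg):
    """Build a normalized line kernel oriented at `angle_deg` degrees."""
    kernel = np.zeros((size, size), dtype=np.float32)
    c = (size - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    for t in np.linspace(-c, c, 2 * size):
        row = int(round(c - t * np.sin(theta)))
        col = int(round(c + t * np.cos(theta)))
        kernel[row, col] = 1.0
    return kernel / kernel.sum()

def motion_blur(image, size=9, angle_deg=0.0):
    """Blur an H x W x C image along a line; `size` is assumed to be odd."""
    kernel = motion_blur_kernel(size, angle_deg)
    pad = size // 2
    img = image.astype(np.float32)
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    # Accumulate shifted copies of the image weighted by the kernel entries.
    for dr in range(size):
        for dc in range(size):
            if kernel[dr, dc] > 0:
                out += kernel[dr, dc] * padded[dr:dr + h, dc:dc + w]
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

def random_augment(image, rng):
    """Apply one randomly chosen augmentation using the ranges from the text."""
    choice = rng.integers(4)
    if choice == 0:
        return flip(image, horizontal=bool(rng.integers(2)))
    if choice == 1:
        return adjust_brightness(image, rng.uniform(0.8, 1.2))
    if choice == 2:
        return adjust_contrast(image, rng.uniform(0.8, 1.2))
    return motion_blur(image, size=9, angle_deg=rng.uniform(-45, 45))

# Usage with a synthetic image standing in for a captured frame.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
augmented = random_augment(frame, rng)
```

Note that flipping changes the box coordinates, so in a detection dataset the annotation files would need the same transformation applied to each labeled rectangle.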

For labeling the targets, the colabeler software (v2.0.4) was used to annotate the rectangular boxes in the images where the potato seed spoon is located, with labels set as “Potato seed” for the potato seeds and “Miss seeding” for the missed planting, resulting in XML format label files [27].

To ensure the reliability of the dataset and the robustness of the model, the dataset was first randomly shuffled and then split into training, validation, and test sets in an 8:1:1 ratio, with no image appearing in more than one subset. The resulting image dataset for potato miss-seeding detection contains 4000 training images, 500 validation images, and 500 test images.
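
A shuffle followed by an 8:1:1 split can be sketched as below. The helper name `split_dataset`, the fixed seed, and the file names are hypothetical and serve only to illustrate the procedure.

```python
import random

def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle `items` and split them into train/val/test by integer ratios."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

# Hypothetical file names standing in for the augmented images.
paths = [f"img_{i:04d}.jpg" for i in range(5000)]
train, val, test = split_dataset(paths)
```

Because the three subsets are slices of one shuffled list, no image can appear in more than one of them.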

2.3. Comparison of Attention Mechanisms

In deep learning research and applications, attention mechanisms [28] are divided into three main categories: spatial attention, channel attention, and convolutional attention. Representative attention mechanisms include squeeze-and-excitation networks (SENet), efficient channel attention networks (ECANet), and the convolutional block attention module (CBAM). Each of these mechanisms enhances the model’s sensitivity to key features through a distinct architectural design, thereby improving the model’s performance.

2.3.1. Principle of SENet Module

The channel attention mechanism, especially SENet [29], has been widely used in convolutional neural networks. SENet evaluates the importance of each channel through global average pooling and two fully connected layers, adjusting the feature maps accordingly to enhance important features and suppress unimportant ones. This process first compresses the feature maps to capture global spatial information, then learns the dependencies between channels through two fully connected layers (with a ReLU activation function in between) and outputs the weights for each channel. These weights are normalized to a range of 0 to 1 using the sigmoid function and are then multiplied by the original feature maps to obtain the weighted feature maps. SENet can be inserted as a module into any layer of a convolutional neural network, improving the model’s performance and accuracy while enhancing its ability to understand complex data. The structure of SENet is shown in Figure 3.
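
The squeeze, excitation, and reweighting steps described above can be written as a short NumPy forward pass. This is a sketch under stated assumptions: the weights are random rather than trained, biases are omitted, and the reduction ratio `r = 4` is chosen only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(x, w1, w2):
    """Squeeze-and-excitation on a C x H x W feature map.

    w1: (C // r, C) weights of the first fully connected layer.
    w2: (C, C // r) weights of the second fully connected layer.
    """
    squeezed = x.mean(axis=(1, 2))           # global average pooling -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)  # first FC layer + ReLU
    weights = sigmoid(w2 @ hidden)           # second FC layer + sigmoid, in (0, 1)
    return x * weights[:, None, None], weights

# Usage with random (untrained) weights; C and r are illustrative choices.
rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.normal(size=(C, 16, 16))
w1 = rng.normal(size=(C // r, C)) * 0.1
w2 = rng.normal(size=(C, C // r)) * 0.1
y, channel_weights = se_block(x, w1, w2)
```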

2.3.2. Principle of ECANet Module

ECANet [30] is an efficient implementation of the channel attention mechanism that reduces the complexity and computational cost of the model by removing the fully connected layers found in SENet and replacing them with 1D convolutions. The core of ECANet is to apply 1D convolution to the feature maps after global average pooling, enabling local interactions between channels instead of the global channel dependencies in SENet. This design reduces the number of parameters while maintaining performance. The size of the convolution kernel in the 1D convolution determines the number of channels considered when calculating the weight for each channel, thereby influencing the breadth of inter-channel interactions [31]. The structure of ECANet is shown in Figure 4.
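
A minimal NumPy sketch of the ECA idea follows: the pooled channel descriptor is passed through a 1D convolution instead of fully connected layers. The kernel weights here are arbitrary, and `eca_kernel_size` follows the adaptive kernel-size rule from the ECA-Net paper (with its default `gamma = 2`, `b = 1`), which is an addition not described in this article.

```python
import math
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive odd kernel size as a function of the channel count."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1

def eca_block(x, kernel):
    """Efficient channel attention on a C x H x W feature map.

    kernel: 1D convolution weights of odd length k, shared by all channels.
    """
    kernel = np.asarray(kernel, dtype=np.float64)
    k = len(kernel)
    pooled = x.mean(axis=(1, 2))       # global average pooling -> (C,)
    padded = np.pad(pooled, k // 2)    # zero padding keeps the length at C
    # Each channel weight depends only on its k neighboring channels.
    mixed = np.array([padded[i:i + k] @ kernel for i in range(len(pooled))])
    weights = sigmoid(mixed)
    return x * weights[:, None, None], weights

# Usage: only k parameters are needed, regardless of the channel count.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8, 8))
k = eca_kernel_size(64)
y, channel_weights = eca_block(x, rng.normal(size=k))
```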

2.3.3. Principle of CBAM Module

CBAM [32] combines a channel attention module (CAM) (Figure 5a) and a spatial attention module (SAM) (Figure 5b); its network structure is illustrated in Figure 5. The two modules are applied in sequence, and the channel attention branch can be connected through either an MLP or a 1D convolution. Although CBAM is similar to SENet, it achieves better detection performance [33]. In the channel attention phase, the feature maps undergo average pooling and max pooling, followed by fully connected layers and sigmoid activation to obtain channel weights. In the spatial attention phase, the results of max pooling and average pooling are stacked along the channel dimension, and spatial weights are obtained using a 1 × 1 convolution layer and sigmoid activation. This combined mechanism allows CBAM to capture both inter-channel dependencies and local spatial features, thereby improving the model’s detection performance and generalization ability. CBAM can be integrated as a module into various deep learning models to enhance their understanding and classification of complex data.
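
The channel and spatial phases can be sketched in NumPy as shown below. This is an illustrative forward pass, not the authors' implementation: weights are random, biases are omitted, the channel branch uses a shared two-layer MLP, and the spatial kernel size is left as a parameter (the text above describes a 1 × 1 layer, while the original CBAM paper uses a 7 × 7 kernel).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Channel weights from average- and max-pooled descriptors of a C x H x W map."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # shared two-layer MLP with ReLU
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))
    return sigmoid(mlp(avg) + mlp(mx))       # shape (C,)

def spatial_attention(x, kernel):
    """Spatial weights from channel-wise mean and max maps.

    kernel: (2, k, k) convolution weights with odd k.
    """
    stacked = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros((h, w))
    # Plain sliding-window convolution over the two stacked maps.
    for i in range(k):
        for j in range(k):
            window = padded[:, i:i + h, j:j + w]
            out += (kernel[:, i, j, None, None] * window).sum(axis=0)
    return sigmoid(out)                                   # shape (H, W)

def cbam(x, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    x = x * channel_attention(x, w1, w2)[:, None, None]
    return x * spatial_attention(x, kernel)[None]

# Usage with random (untrained) weights and an assumed 7 x 7 spatial kernel.
rng = np.random.default_rng(0)
C, r = 8, 4
x = rng.normal(size=(C, 16, 16))
y = cbam(x,
         rng.normal(size=(C // r, C)) * 0.1,
         rng.normal(size=(C, C // r)) * 0.1,
         rng.normal(size=(2, 7, 7)) * 0.1)
```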

3. Algorithm Improvement

3.1. Overview of YOLOv5s Network

To adapt to the working scenarios of agricultural machinery, this study selects the YOLOv5s algorithm, which combines high accuracy, versatility, and a lightweight design. The network structure is illustrated in Figure 6. YOLOv5s is mainly composed of three parts: backbone, neck, and head. The backbone network consists of modules such as Focus, CSP, and SPP [34]. Compared to YOLOv3 and YOLOv4, YOLOv5s introduces a Focus structure, which is centered on a slicing operation: the raw input image is first sliced, and the result is then convolved with 32 convolution kernels to generate feature maps. The images of the seed-taking spoon to be detected are resized to the model’s input size and then forwarded through the backbone network. YOLOv5s uses CSPDarknet53 as its backbone to extract features from the images. The neck module employs an FPN + PAN structure, where the FPN layer conveys semantic information from the top down and the PAN layer conveys localization information from the bottom up, ultimately generating multi-scale feature maps that are passed to the head module [35]. The head module adopts a lightweight and efficient design, combining multi-scale convolutions and upsampling operations to output prediction results for each grid cell. Each grid cell is responsible for predicting a fixed number of bounding boxes and the corresponding class confidence for these bounding boxes. To reduce redundant detection boxes, non-maximum suppression (NMS) is used to select the boxes with the highest confidence while eliminating other boxes whose intersection over union (IoU) with them exceeds a certain threshold [36].
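
The slicing step of the Focus structure can be illustrated with NumPy as below. The sketch covers only the slicing, which moves every second pixel into the channel dimension; the subsequent convolution with 32 kernels is omitted, and the 640 × 640 input size is an assumption chosen for the example.

```python
import numpy as np

def focus_slice(x):
    """Rearrange a C x H x W array into 4C x H/2 x W/2 without losing pixels.

    H and W are assumed to be even.
    """
    return np.concatenate(
        [x[:, ::2, ::2],    # even rows, even columns
         x[:, 1::2, ::2],   # odd rows, even columns
         x[:, ::2, 1::2],   # even rows, odd columns
         x[:, 1::2, 1::2]], # odd rows, odd columns
        axis=0,
    )

# Usage: a 3-channel image becomes a 12-channel map at half the resolution.
image = np.zeros((3, 640, 640), dtype=np.float32)
sliced = focus_slice(image)
```

The design trades spatial resolution for channel depth, so the first convolution sees a smaller map while all original pixel values remain available.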

3.2. YOLOv5s Network Improvement

3.2.1. Introduction of Attention Mechanism

To overcome the potential impact of complex backgrounds and strong dust environments on potato miss-seeding detection performance in field operations and to enhance the model’s adaptability to complex scenes, this study integrates attention mechanisms into the FPN structure of YOLOv5s, aiming to improve the model’s feature expression capability. By introducing attention modules at different levels of the FPN, the model’s ability to detect targets of varying sizes can be enhanced. To evaluate the impact of different attention mechanisms on the performance of the YOLOv5s model, this study embeds SE, ECA, and CBAM attention mechanism modules into the neck structure of the YOLOv5s model, with specific locations described in detail below.

YOLOv5s-SENet Network Design

In the neck section of the YOLOv5s model, this study introduces the SENet module to construct a new network structure named YOLOv5s-SENet. Specifically, the SENet module is inserted at three key positions in the neck section: between the CSP2_1 module and CBL module during the first upsampling and the second upsampling, as well as between the CSP2_1 module and CBL module immediately following two downsampling operations. A schematic diagram of the YOLOv5s-SENet network architecture is shown in Figure 7.

YOLOv5s-ECANet Network Design

This study integrates the ECANet module into the neck section of the YOLOv5s model, creating a new network variant named YOLOv5s-ECANet. The specific implementation method involves inserting the ECANet module between the CSP2_1 module and CBL module during the first upsampling and the second upsampling in the neck section, as well as between the CSP2_1 module and CBL module immediately following two downsampling operations. This modification aims to enhance the channel attention of the feature maps in the model to improve detection performance. A schematic diagram of the YOLOv5s-ECANet network architecture is shown in Figure 8.

YOLOv5s-CBAM Network Design

In the neck section of the YOLOv5s model, this study integrates the CBAM module, resulting in a new network variant named YOLOv5s-CBAM. The specific operation involves inserting the CBAM module at three key positions in the neck section: between the CSP2_1 module and CBL module during the first upsampling and the second upsampling, as well as between the CSP2_1 module and CBL module immediately following two downsampling operations. This modification aims to enhance the model’s fine-grained understanding of the feature maps by combining channel attention and spatial attention, thereby improving detection performance. A schematic diagram of the YOLOv5s-CBAM network structure is shown in Figure 9.

3.2.2. Improved Non-Maximum Suppression

In the YOLOv5s object detection framework, the default algorithm is standard IoU-based non-maximum suppression (NMS), represented by Equation (1). This algorithm filters candidate bounding boxes based on the intersection over union (IoU). The specific steps are as follows: firstly, all bounding boxes of the same category are sorted in descending order of score; secondly, the bounding box with the highest score is selected; finally, the IoU between this bounding box and each remaining bounding box is calculated, and those bounding boxes whose IoU exceeds a preset threshold are removed. In potato miss-seeding detection, because seed potatoes have similar shapes and sizes, the detection algorithm may generate overlapping detection boxes for multiple seed potatoes, leading to high IoU values. As a result, standard NMS may retain only a single bounding box while discarding the boxes of other overlapping seed potatoes, causing detection errors.

s_{i} = \begin{cases} s_{i}, & \mathrm{iou}(M, b_{i}) < N_{t} \\ 0, & \mathrm{iou}(M, b_{i}) \geq N_{t} \end{cases} \qquad (1)

In this context, s_i is the score of the i-th detection box, M represents the detection box with the highest score, b_i is the i-th detection box, and N_t is the preset IoU threshold.
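
Equation (1) and the three steps described above correspond to the following plain-Python sketch. Boxes are assumed to be given as `(x1, y1, x2, y2)` corner coordinates, and the threshold of 0.5 and the example boxes are illustrative values, not settings taken from the study.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept by hard NMS, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)  # box M with the highest remaining score
        keep.append(m)
        # Equation (1): boxes with iou(M, b_i) >= N_t get score 0, i.e. are removed.
        order = [i for i in order if iou(boxes[m], boxes[i]) < iou_threshold]
    return keep

# Two heavily overlapping boxes and one separate box (made-up coordinates).
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In this example the second box overlaps the first with an IoU of about 0.82, so it is discarded even if it covers a different seed potato.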

NMS is sensitive to the preset IoU threshold N_t; if the threshold is too high or too low, it can lead to false positives or missed detections. Therefore, this paper improves the NMS in the original network using Soft-NMS to avoid the problem of lost detection targets due to threshold issues. The calculation formula for Soft-NMS is shown in Equation (2).

s_{i} = \begin{cases} s_{i}, & \mathrm{iou}(M, b_{i}) < N_{t} \\ s_{i} (1 - \mathrm{iou}(M, b_{i})), & \mathrm{iou}(M, b_{i}) \geq N_{t} \end{cases} \qquad (2)

Soft-NMS handles overlapping detection boxes more flexibly: instead of directly removing boxes whose overlap with the highest-scoring box exceeds the threshold, it gradually reduces their scores, thus improving detection performance [37].
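
The linear score decay of Equation (2) can be sketched in plain Python as follows. The final `score_threshold`, below which decayed boxes are dropped, is an assumption added here so that the procedure terminates with a finite list; the text does not specify such a value, and the example boxes are made up.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, iou_threshold=0.5, score_threshold=0.001):
    """Linear Soft-NMS: return kept indices and the decayed scores."""
    scores = list(scores)
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        m = max(remaining, key=lambda i: scores[i])  # current best box M
        remaining.remove(m)
        if scores[m] < score_threshold:              # assumed cut-off, see lead-in
            break
        keep.append(m)
        for i in remaining:
            overlap = iou(boxes[m], boxes[i])
            if overlap >= iou_threshold:
                scores[i] *= 1.0 - overlap           # Equation (2), second case
    return keep, scores

# Same made-up boxes as a hard-NMS example would use.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept, new_scores = soft_nms(boxes, scores)
```

Here the overlapping second box is kept with a reduced score instead of being removed, which is the behavior that helps when two adjacent seed potatoes produce overlapping boxes.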

4. Experimental Results and Analysis

4.1. Experimental Platform

In this study, experiments were conducted on an Ubuntu 16.04 operating system using PyCharm 1.13.0 as the development environment, based on Python 3.9. The computer used for the experiments was configured with an Intel Core i7-8700 CPU, an NVIDIA GeForce RTX 3070 Ti graphics card, and 32 GB of RAM.

4.2. Model Training Parameters

The model training parameters were as follows: the input image size for potato miss-seeding detection was 2048 × 1536; the SGD optimizer was used with a momentum of 0.9, an initial learning rate of 0.01, a final learning rate of 0.01, and a weight decay of 0.0005; the batch size was 8; and training ran for 150 epochs.

4.3. Evaluation Metrics

To validate the effectiveness of the detection model, precision (P), recall (R), and mean average precision (mAP) are used as evaluation metrics for the model, with specific calculation formulas provided in Equations (3)–(5). Precision (P) indicates the accuracy of predicting normal and empty seed potato scoops, recall (R) represents the completeness rate of predicting normal and empty seed potato scoops, and mean average precision (mAP) serves as a comprehensive evaluation metric of the model’s precision. The closer these three metrics are to 1, the better the model’s detection performance.

P = \frac{TP}{TP + FP} \qquad (3)

R = \frac{TP}{TP + FN} \qquad (4)

\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} AP_{i} \qquad (5)

In the formula, TP represents true positives (the number of positive samples correctly detected by the model); FP represents false positives (the number of negative samples incorrectly classified as positive by the model); FN represents false negatives (the number of positive samples that the model failed to detect); AP is the area under the P–R curve for each category; and N is the number of detection categories.
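
Equations (3)–(5) can be computed as in the sketch below. The counts and curve points are made-up values for illustration, and the AP routine uses all-point interpolation of the P–R curve, which is one common convention and an assumption here, since the text does not state which integration scheme was used.

```python
def precision(tp, fp):
    """Equation (3): fraction of predicted positives that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Equation (4): fraction of actual positives that are detected."""
    return tp / (tp + fn) if tp + fn else 0.0

def average_precision(recalls, precisions):
    """Area under the P-R curve; `recalls` must be sorted in ascending order."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))

def mean_average_precision(ap_per_class):
    """Equation (5): mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)

# Hypothetical counts for one class: 90 TP, 10 FP, 5 FN.
p = precision(90, 10)
r = recall(90, 5)

# Hypothetical two-point P-R curves for the two classes.
ap_seed = average_precision([0.5, 1.0], [1.0, 0.5])
ap_miss = average_precision([0.5, 1.0], [1.0, 1.0])
m_ap = mean_average_precision([ap_seed, ap_miss])
```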

4.4. Experimental Results and Analysis of Different Attention Mechanisms

Using the constructed potato miss-seeding detection test dataset, comparative experiments were conducted between the YOLOv5s-SENet model, YOLOv5s-ECANet model, YOLOv5s-CBAM model, and the original YOLOv5s model as proposed in Section 3.2.1. The experimental process was recorded, including precision P, recall R, and mean average precision (mAP). In addition, to assess the performance differences between models in terms of mAP, we conducted statistical analyses using the Friedman test. This rank-based, nonparametric test is well suited for handling multiple correlated samples, particularly when the data do not meet the assumption of a normal distribution. Since we are comparing the performance of multiple models on the same dataset under identical experimental conditions, the data can be considered as correlated samples, making the Friedman test an appropriate choice. A significance level of p < 0.05 was used for this analysis. The experimental results are shown in Table 1.
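
For reference, the Friedman statistic can be computed as in the sketch below. The article does not describe what formed the repeated measurements, so the rows here are treated as hypothetical repeated runs and the scores are invented purely to exercise the formula; the sketch also assumes no ties within a row and reports only the statistic, which would be compared against a chi-square distribution with k − 1 degrees of freedom.

```python
def friedman_statistic(scores):
    """Friedman chi-square statistic for n blocks (rows) and k treatments (columns).

    Assumes there are no tied values within a row.
    """
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):  # rank 1 = lowest score
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)

# Invented scores: 5 hypothetical runs (rows) x 4 models (columns).
runs = [
    [0.10, 0.20, 0.30, 0.40],
    [0.12, 0.25, 0.31, 0.45],
    [0.09, 0.18, 0.33, 0.41],
    [0.11, 0.22, 0.29, 0.39],
    [0.13, 0.21, 0.35, 0.44],
]
stat = friedman_statistic(runs)
# With k = 4 models there are 3 degrees of freedom; the 5% critical value is 7.815.
significant = stat > 7.815
```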

According to the experimental results, the YOLOv5s-SENet model achieved an increase in precision of 0.10%, an increase in recall of 0.06%, and an increase in mean average precision (mAP) of 0.02% compared to YOLOv5s. After adding SENet, the model demonstrated improved accuracy and performance, although the probability of false detection increased compared to the original model. The YOLOv5s-ECANet model showed an increase in precision of 0.49%, an increase in recall of 0.11%, and an increase in mAP of 0.05% compared to YOLOv5s. After integrating ECANet, the model exhibited a decrease in both missed detections and false detections, resulting in enhanced accuracy and performance. The YOLOv5s-CBAM model recorded an increase in precision of 0.88%, an increase in recall of 0.19%, and an increase in mAP of 0.08% compared to YOLOv5s. With the addition of CBAM, the model demonstrated a reduction in both missed detections and false detections, leading to improved performance. We conducted the Friedman test on the mAP values of the YOLOv5s, YOLOv5s-SENet, YOLOv5s-ECANet, and YOLOv5s-CBAM models to evaluate the performance differences between them (Table 1). In the table, asterisks (*) denote statistically significant differences in mAP values. Specifically, the YOLOv5s-SENet, YOLOv5s-ECANet, and YOLOv5s-CBAM models showed significant improvements in mAP compared to the original YOLOv5s model (p < 0.05), with the YOLOv5s-CBAM model exhibiting the most significant improvement (p < 0.01). In summary, incorporating the CBAM attention mechanism into the YOLOv5s network structure yielded better optimization results than the YOLOv5s-SENet and YOLOv5s-ECANet models.

4.5. Ablation Experiments

Based on the original YOLOv5s algorithm, the CBAM attention mechanism and improved non-maximum suppression were respectively introduced. Ablation experiments were designed on the self-made dataset, labeled as I, II, III, and IV, with the results shown in Table 2.

Experiment I is the original YOLOv5s algorithm, which achieved precision (P), recall (R), and mean average precision (mAP) of 96.02%, 96.31%, and 99.12%, respectively, for potato missed-seeding detection. Compared to Experiment I, Experiment II improved precision (P), recall (R), and mAP by 0.88%, 0.19%, and 0.08%, respectively. In comparison to Experiment I, Experiment III improved precision (P) and mAP by 2.28% and 0.08%, respectively, but the recall (R) decreased by 0.21%. In contrast to Experiment I, Experiment IV showed improvements in precision (P), recall (R), and mAP, increasing by 2.28%, 3.09%, and 0.28%, respectively.

Based on the combined ablation experiments, the following conclusions can be drawn: (1) The YOLOv5s model with the CBAM attention mechanism performed well in suppressing background interference. (2) The detection accuracy of the YOLOv5s algorithm was also enhanced after improving the non-maximum suppression algorithm. (3) When both the attention mechanism and the improved non-maximum suppression algorithm were integrated, precision (P), recall (R), and mean average precision (mAP) all increased, with their values approaching 1, indicating a clear improvement in the model’s detection performance.

4.6. Comparison Experiments of Different Models

To test the relative effectiveness of the improved algorithm, this study compared the enhanced YOLOv5s model with other lightweight YOLO series algorithms, including YOLOv3, YOLOv4, YOLOv5m, YOLOv6s, YOLOv7, and YOLOv7-tiny. The experimental environment and dataset remained consistent, with the same testing duration, confidence thresholds, and initial learning rates. The testing results are shown in Table 3.

According to the data shown in Table 3, YOLOv3, YOLOv4, YOLOv5m, and YOLOv6s have relatively large parameter counts, computational loads, and model sizes, yet achieve lower mAP values. Therefore, these models are not well suited to potato miss-seeding detection. Although YOLOv7 has the highest mAP value, its parameter count, computational load, and model size are also relatively large, making it unsuitable for deployment on lightweight devices. The YOLOv7-tiny algorithm shows test results relatively close to those of the improved YOLOv5s algorithm. However, the improved YOLOv5s algorithm outperforms YOLOv7-tiny by 0.05% in the key metric of mAP, indicating that it is better suited for deployment on lightweight devices.

4.7. Experiment Result Presentation

The loss curve of the improved network shows that the loss value decreases rapidly within the first 50 iterations and tends to converge between the 50th and 100th iterations, indicating that the model converges within relatively few training cycles. After 100 iterations, the network with the added CBAM attention mechanism and improved non-maximum suppression reaches a more stable training accuracy than the original YOLOv5s algorithm, as shown in Figure 10.

Figure 11 presents a selection of detection images. In Figure 11a–c, potatoes are detected in the seed scoop, while Figure 11d–g indicates that there are no potatoes in the seed scoop, resulting in a determination of potato planter missed sowing. In Figure 11h, the upper detection box incorrectly identifies the presence of seed potatoes as missed sowing, while Figure 11i misidentifies missed sowing as the presence of seed potatoes. The primary reason for these errors in the two images is that the position of the potato seed scoop is slightly above the expected area, with part of the seed scoop extending beyond the image frame. Additionally, the strong background lighting around the seed scoop contributes to these misdetections.

4.8. Confidence Comparison Experiment

Three groups of images containing potato seeds and three groups without potato seeds were randomly selected to test the original YOLOv5s and the improved YOLOv5s algorithms. The confidence comparison of the two algorithms is shown in Figure 12.

Comparing Figure 12a–f and Figure 12g–l, both algorithms correctly detected the targets in all six images. For the original YOLOv5s algorithm, the confidence scores ranged from 0.87 to 0.90, while those of the improved YOLOv5s algorithm ranged from 0.92 to 0.94. The improved YOLOv5s algorithm therefore raises the confidence level when detecting potato miss-seeding targets. The fusion of deep and shallow feature maps improves detection effectiveness and capability. This meets the requirements of potato miss-seeding detection in complex working backgrounds.

4.9. Discussion of Practical Applications

While the lightweight YOLOv5s model proposed in this study shows promise for potato missed seeding detection, several challenges remain in applying it to real agricultural environments. Although the model demonstrates good performance in detecting missed seedings, its adaptability may vary across different agricultural scenarios. Factors such as varying potato varieties with different sizes, shapes, and colors may affect detection accuracy, and environmental conditions such as lighting, soil composition, and climate may also interfere with the model’s effectiveness. Therefore, additional experiments are needed to verify the model’s robustness across diverse conditions. Although the lightweight design reduces computational resource requirements, hardware selection and processing power remain critical considerations for real-world deployment, particularly in resource-constrained environments such as small-scale agricultural equipment. While the model’s real-time performance meets the demands of agricultural applications, some cases may require high-performance GPUs or embedded systems, which could increase deployment costs. Future research could aim to further optimize the model to operate efficiently on low-power devices, thereby reducing costs. To address these challenges, future work should focus on improving the model’s adaptability through techniques such as data augmentation and transfer learning, as well as optimizing the network architecture to reduce computational load. Additionally, integrating other sensors, such as RGB-D cameras or laser sensors, could further enhance detection accuracy and robustness in complex agricultural environments.

5. Conclusions

This study presents a novel and lightweight approach for potato miss-seeding detection, based on YOLOv5s and enhanced with advanced attention mechanisms to address the challenges posed by high miss-seeding rates in spoon chain-type potato planters. The main contributions and conclusions of this study are as follows:

(1) Model innovation and lightweight design: YOLOv5s was used as the baseline model, and three attention mechanisms (SENet, ECANet, and CBAM) were each incorporated into the network’s backbone. Among these, the CBAM attention mechanism showed the most significant improvement in detection performance. This innovation not only reduced the model’s parameter count and computational load but also minimized memory usage, leading to a more lightweight and efficient model. The improved model with CBAM achieved precision, recall, and mAP values of 96.90%, 96.50%, and 99.20%, respectively, representing a clear improvement over the baseline YOLOv5s model. These improvements demonstrate that the model maintains high detection accuracy while being more computationally efficient, making it suitable for real-time applications in resource-constrained agricultural environments.

(2) Enhanced robustness and detection accuracy: By incorporating the CBAM attention mechanism and improving the non-maximum suppression (NMS) algorithm, the model effectively addressed the issue of overlapping bounding boxes, which often occurs due to the similarity in shape and size of potato seeds. This enhancement significantly reduced missed detections and improved the model’s robustness in field conditions, particularly in complex environments with heavy background interference.

(3) Comparison with existing methods: The proposed lightweight model was compared with the original YOLOv5s model and other mainstream object detection models (YOLOv3, YOLOv4, YOLOv5, YOLOv6s, YOLOv7, and YOLOv7-tiny). Compared to the YOLOv5s model, precision increased by 2.28 percentage points, recall by 3.09 percentage points, and mAP by 0.28 percentage points. On the same dataset, its mAP of 99.40% exceeded that of every compared model except YOLOv7 (99.43%), which requires far more parameters and computation. These results support the proposed model’s balance of accuracy and computational efficiency, highlighting its potential for deployment in automated agricultural systems.
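As a quick arithmetic check, the gains quoted in conclusion (3) can be reproduced from the Test I and Test IV rows of Table 2. The sketch below is only illustrative; the helper name `point_gains` is hypothetical, and the differences it returns are absolute differences between percentages, that is, percentage points.

```python
# (precision, recall, mAP) in percent, copied from Table 2.
baseline = (96.02, 96.31, 99.12)  # Test I: original YOLOv5s
improved = (98.30, 99.40, 99.40)  # Test IV: CBAM + Soft-NMS


def point_gains(before, after):
    """Absolute differences in percentage points, rounded to two decimals."""
    return tuple(round(a - b, 2) for b, a in zip(before, after))


gains = point_gains(baseline, improved)
print(gains)
```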

In conclusion, the innovations introduced in this study—namely the integration of attention mechanisms and the optimization of detection algorithms—provide a highly accurate, efficient, and practical solution for miss-seeding detection in agricultural settings. The proposed model’s ability to deliver high performance in real time while remaining lightweight positions it as a viable solution for future agricultural automation applications.

Author Contributions

Conceptualization, H.L. (Hongling Li) and X.L.; software, H.L. (Hongling Li), S.J., G.W., S.Y. and W.X.; investigation, H.L. (Hongling Li), H.Z., H.L. (Hui Li) and S.J.; resources, X.L., Q.F., S.Y. and H.Z.; writing (original draft preparation), H.L. (Hongling Li); writing (review and editing), X.L., H.Z., H.L. (Hui Li) and W.X.; supervision, Q.F., W.S. and G.W.; project administration, X.L. and W.S.; funding acquisition, X.L. and W.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by several programs, including the National Natural Science Foundation of China (NSFC grant 52165028), the National Key Research and Development Program (2022YFD2002005), the Key Scientific and Technological Program of Gansu Province (22ZD6NA046), the Gansu Provincial University Industry Support Plan (2022CYZC-42, 2024CYZC-32, 2023CYZC-42), and the Gansu Province Agricultural Machinery Equipment R&D Key Project (njyf2024-03-1). Additional funding was provided by the Horizontal Project of Gansu Agricultural University (GSAU-JSZR-2024-004) and the Gansu Agricultural University Self-Listed Project (701-0722045).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.


Figure 1. Structural scheme of the spoon-chain metering device and the miss-seeding detection and compensation system: 1. camera mounting box; 2. seed scoop; 3. seed metering chain; 4. seed box; 5. fill light; 6. CCD camera; 7. reseeding device.

Figure 2. Sample images of potato miss-seeding detection.

Figure 3. SENet structural diagram. The colors in the figure are used to differentiate between various processing steps.

Figure 4. ECANet structural diagram. The colors in the figure are used to differentiate between various processing stages. The arrows indicate the flow of data between different operations in the ECANet module.

Figure 5. CBAM network structure. (a) Channel attention module; (b) spatial attention module; (c) convolutional block attention module.

Figure 6. YOLOv5s network structure.

Figure 7. YOLOv5s-SENet network architecture.

Figure 8. YOLOv5s-ECANet network architecture.

Figure 9. YOLOv5s-CBAM network structure.

Figure 10. The training result of the improved model.

Figure 11. Results of potato miss-seeding detection: (a–c) potato seed; (d–g) miss-seeding; (h,i) detection errors.

Figure 12. Confidence comparison of the YOLOv5s and improved YOLOv5s algorithms: (a–f) original YOLOv5s algorithm; (g–l) improved YOLOv5s algorithm.

Table 1. Comparison of experimental results with different attention modules.

Model	P/%	R/%	mAP/%
YOLOv5s	96.02	96.31	99.12
YOLOv5s-SENet	96.12	96.37	99.14 *
YOLOv5s-ECANet	96.51	96.42	99.17 *
YOLOv5s-CBAM	96.90	96.50	99.20 **

Note: * p < 0.05, ** p < 0.01. Significance was assessed using the Friedman test.

Table 2. Results of ablation test.

Test No.	CBAM	Soft-NMS	P/%	R/%	mAP/%
I	-	-	96.02	96.31	99.12
II	√	-	96.90	96.50	99.20
III	-	√	98.30	96.10	99.20
IV	√	√	98.30	99.40	99.40

Note: “√” indicates that the module is added; “-” indicates that the module is not added.
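The Soft-NMS column in Table 2 refers to replacing hard IoU-based suppression with score decay. A minimal pure-Python sketch of the idea follows, assuming the Gaussian decay variant; the variant, the `sigma` and `score_thresh` values, and the function names are assumptions made for illustration and are not taken from the study.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay the scores of overlapping boxes instead of deleting them.

    Returns a list of (box_index, final_score), highest final score first.
    sigma and score_thresh are illustrative defaults, not values from the study.
    """
    remaining = list(range(len(boxes)))
    scores = list(scores)  # work on a copy so the caller's list is untouched
    kept = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        kept.append((best, scores[best]))
        survivors = []
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            # A larger overlap with the selected box gives a stronger decay.
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
            if scores[i] >= score_thresh:
                survivors.append(i)
        remaining = survivors
    return kept


# Two heavily overlapping boxes and one isolated box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
result = soft_nms(boxes, scores)
print(result)
```

In this toy input the second box overlaps the first heavily, so hard NMS with a typical IoU threshold would discard it. Here it survives with a reduced score and is ranked below the isolated box, which is the behaviour that helps retain partially occluded targets.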

Table 3. Comparison experimental results of different models.

Model	Parameters/×10⁶	Computation/GFLOPs	Model Size/M	mAP/%
YOLOv3	61.53	193.89	120.5	97.35
YOLOv4	52.5	119.83	100.64	96.12
YOLOv5	20.9	48.0	42.2	98.24
YOLOv6s	17.19	44.12	36.3	98.25
YOLOv7	36.49	103.5	74.8	99.43
YOLOv7-tiny	6.01	13.1	12.3	99.35
Improved YOLOv5s	7.02	10.8	13.4	99.40


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
