Article

DCS-YOLOv5s: A Lightweight Algorithm for Multi-Target Recognition of Potato Seed Potatoes Based on YOLOv5s

1 College of Agricultural Equipment Engineering, Henan University of Science and Technology, Luoyang 471000, China
2 Science and Technology Innovation Center for Completed Set Equipment, Longmen Laboratory, Luoyang 471003, China
* Authors to whom correspondence should be addressed.
Agronomy 2024, 14(11), 2558; https://doi.org/10.3390/agronomy14112558
Submission received: 8 August 2024 / Revised: 29 October 2024 / Accepted: 30 October 2024 / Published: 31 October 2024
(This article belongs to the Special Issue Advances in Data, Models, and Their Applications in Agriculture)

Abstract

The quality inspection of potato seed tubers is pivotal for their effective segregation and a critical step in the cultivation process of potatoes. Given the dearth of research on intelligent tuber-cutting machinery in China, particularly concerning the identification of bud eyes and defect detection, this study has developed a multi-target recognition approach for potato seed tubers utilizing deep learning techniques. By refining the YOLOv5s algorithm, a novel, lightweight model termed DCS-YOLOv5s has been introduced for the simultaneous identification of tuber buds and defects. The study begins with data augmentation of the seed tuber images obtained via the image acquisition system, employing strategies such as translation, noise injection, luminance modulation, cropping, mirroring, and the Cutout technique to amplify the dataset and fortify the model’s resilience. Subsequently, the original YOLOv5s model undergoes a series of enhancements, including the substitution of the conventional convolutional modules in the backbone network with the depth-wise separable convolution DP_Conv module to curtail the model’s parameter count and computational load; the replacement of the original C3 module’s Bottleneck with the GhostBottleneck to render the model more compact; and the integration of the SimAM attention mechanism module to augment the model’s proficiency in capturing features of potato tuber buds and defects, culminating in the DCS-YOLOv5s lightweight model. The research findings indicate that the DCS-YOLOv5s model outperforms the YOLOv5s model in detection precision and speed, exhibiting superior detection efficacy and model compactness. The model’s detection metrics, namely Precision, Recall, mean Average Precision at an Intersection over Union threshold of 0.5 (mAP1), and mean Average Precision over the IOU range of 0.5–0.95 (mAP2), have improved to 95.8%, 93.2%, 97.1%, and 66.2%, respectively, signifying increments of 4.2%, 5.7%, 5.4%, and 9.8%. The detection speed has also been augmented by 12.07%, reaching 65 FPS. By attaining model compactness, the DCS-YOLOv5s target detection model has substantially heightened detection precision, presenting a beneficial reference for dynamic sample target detection in the context of potato-cutting machinery.

1. Introduction

Potatoes are the world’s most important non-grain food crop, ranking fourth overall after rice, wheat, and corn [1,2]. China plays a pivotal role in the global potato industry [3]. With a significant share of approximately one-fourth of the total global potato cultivation area [4], China’s potato farming is a substantial sector. The propagation of potatoes in China predominantly utilizes cut tubers as seed potatoes, whose quality is crucial for the germination rate and the health of the seedlings. Thus, the quality inspection of potato seed tubers prior to planting is of utmost importance. Traditional manual identification methods are inefficient, costly, and error-prone. To bolster the precision of potato seed tuber quality inspection, the development of reliable detection and recognition methods is indispensable [5]. The evolving integration of machine vision technology in agriculture [6] is accelerating the transformation of modern agriculture towards a more intelligent, information-driven, and scalable industry.
Machine vision allows for enhanced environmental perception by machines [7]. Known for its high efficiency, swift operation, and contactless nature, this technology has become extensively utilized in assessing the quality of agricultural products [8,9,10]. Yang, et al. [11] leveraged the Canny edge detector on grayscale images to generate segmentation masks, achieving an 89.28% accuracy rate in detecting potato buds. Ji and Sun [12] applied the K-means clustering technique for segmenting potatoes, extracting data from sprouted specimens, and attained an 84.62% accuracy with their model. Li, et al. [13] introduced a recognition method for potato bud eyes based on three-dimensional color saturation geometry, which achieved a 91.48% recognition rate. Lopez-Juarez, et al. [14] successfully integrated HSI technology with a 3 × 3 balanced filter and threshold segmentation to create an innovative model that combines machine vision with hyperspectral imaging, demonstrating notable effectiveness in identifying potato surface defects. Xi, et al. [15] optimized the K-means algorithm for the rapid segmentation of potato seed tuber buds using a chaotic system, which achieved a 98% segmentation accuracy and an average detection time of 1.11 s per frame under normal conditions. Barnes, et al. [16] employed an adaptive enhancement algorithm to automatically extract optimal features from segmented regions, accurately pinpointing spot defects in potatoes across various breeds and levels of freshness. While these studies successfully detect potato buds and defects, they are limited by an insufficient extraction of high-dimensional features, which restricts their identification capabilities in stochastic environments.
The relentless innovation in deep learning technology has marked significant breakthroughs and found profound application across various sectors [17,18]. Its impact on agriculture has been particularly transformative [19]. This technology is primarily divided into two categories: single-stage and two-stage detection methods. Notable two-stage detection algorithms include R-CNN, Fast R-CNN, and Faster R-CNN [20,21,22]. Lee and Shin [23] utilized the Mask R-CNN to discern the irregular shapes of potatoes against a similarly hued soil backdrop, extracting the dimensions of the detected areas with 90.8% accuracy. Li, et al. [24] crafted an innovative method for detecting bitter gourd leaf diseases in the field, building upon an enhanced Faster R-CNN algorithm, which proved highly robust and precise under natural conditions. Chen, et al. [25] refined the Faster R-CNN to pinpoint cotton apical buds, achieving a detection precision of 98.1% at a rate of 10.3 frames per second. Xi, et al. [26] employed convolutional neural networks with the Faster R-CNN algorithm for the detection of potato buds. Liang, et al. [27] advanced the Mask R-CNN to identify tomato lateral branch pruning points, attaining an 82.9% accuracy with a detection time of 0.319 s. Single-stage detection algorithms, in contrast to their two-stage counterparts, offer a more streamlined and efficient approach by bypassing the candidate region generation and directly forecasting the object’s class probabilities and spatial coordinates [28]. Prominent single-stage detection algorithms include the SSD series [29] and the YOLO series [30,31]. YOLO’s popularity stems from its swift performance, commendable accuracy, and compact model size. Wang, et al. [32] introduced a novel, lightweight detection framework under the YOLO umbrella, targeting deadwood with an 89.11% accuracy and a model weight of a mere 7.6MB. Dang, et al. [33] leveraged the YOLO detection algorithm for identifying a multitude of cotton weeds, surpassing a 95% accuracy threshold. Shi, et al. [34] implemented the YOLOv3 algorithm to address the detection of potatoes with various occlusions, mechanical damage, buds, and impurities. The efficacy of object detection algorithms in agricultural contexts is influenced by a multitude of factors, prompting researchers to refine the YOLO model to accommodate the unique attributes of diverse agricultural products, thereby enhancing accuracy, velocity, and robustness. Zeng, et al. [35] proposed an improved, lightweight, real-time tomato detection approach using YOLO, integrating the MobileNetV3 bottleneck module to reconstruct YOLOv5’s backbone and applying channel pruning to the neck layers, yielding an average frame rate of 26.5 fps and a 93% detection accuracy. Wang, et al. [36] optimized the YOLOv4 model for the identification of soil clumps and stones in potatoes, employing channel pruning to achieve a detection speed of 78.49 fps. Yi, et al. [37] developed the FR-YOLOv4 model, capitalizing on feature recursive fusion within the YOLOv4 network, for the real-time detection and enumeration of densely packed small targets such as citrus fruits in natural settings. The current YOLO series of single-stage detection algorithms have demonstrated superior performance in feature extraction and object detection, addressing challenges like low recognition accuracy and limited generalizability while simultaneously accelerating the speed of detection. 
Consequently, the popularity of YOLO-based single-stage detection algorithms is on the rise.
This research introduces a streamlined DCS-YOLOv5s model, derived from the YOLOv5s framework, with the objective of refining the quality assessment of potato seed tubers and establishing a theoretical basis for the quality detection of agricultural produce. The primary focuses of this study are outlined below: (1) The development of a lightweight detection model for the multi-target identification of potato seed tubers, crafted to boost the precision and velocity of quality inspection for potato seed tuber blocks. (2) A systematic substitution of the original backbone network’s convolutional modules with depth-wise separable convolution DP_Conv modules, replacement of the Bottleneck in the original C3 module with GhostBottleneck, and the incorporation of the SimAM three-dimensional attention module to enhance precision while curtailing model parameters. (3) A comparative analysis against other sophisticated deep learning algorithms, coupled with heatmap analysis, to illustrate that the refined model’s network architecture is more efficient and exhibits superior stability.

2. Materials and Methods

2.1. Dataset Assembly

The apparatus for acquiring images of seed tubers comprises a computer, supplementary lighting, and a CCD industrial camera, positioned at a height of about 35 cm above the table surface. The specific industrial camera model used in this study is the Dahua 3000 series, a 12-megapixel A3A20MG8. The imagery corpus includes snapshots of potatoes post-dormancy and during the sprouting phase, capturing four distinct stages of bud eye development, as well as images depicting mechanical damage and the presence of wormholes on the tuber surface. Given the morphological resemblance between the potato’s tail and the bud eyes in early sprouting, images of the tuber’s tail were also provided. To enhance the model’s robustness and its capacity for generalization, image augmentation techniques were applied to broaden the dataset. Six distinct data augmentation strategies were implemented in the processing of the potato seed tuber images: translation, noise injection, luminance adjustment, cropping, mirroring, and the Cutout technique. Drawing from empirical data and recognized approaches to determining the sample size in the development of machine learning models, we opted for a dataset of 8400 images to adequately support our model through training, testing, and validation phases. This decision was made to ensure that the model would be robust and generalize well across various scenarios, which is crucial for its practical application and reliability. Figure 1 illustrates a selection of enhanced images, where cropping is utilized to emphasize the critical features within the images. Translation is employed to foster the model’s ability to learn and maintain robustness in the face of changes in object position. Brightness adjustments are incorporated to replicate the diverse lighting conditions that are typical in natural environments. The synergy of these varied processing strategies not only augments the diversity of the data but also plays a pivotal role in strengthening the model’s capacity for generalization. Subsequently, the images were divided into training, validation, and test subsets at an 8:1:1 ratio. The LabelImg utility was engaged for the annotation and archival of the images, categorizing them into four classes: bud, mechanical injury, wormhole, and tail.
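To make the augmentation step concrete, the sketch below applies the six listed strategies to a single image. It is a minimal illustration only: the shift range, noise level, brightness factors, crop ratio, and Cutout patch size are assumptions rather than the parameters used in this study, and a production pipeline would also transform the bounding-box annotations accordingly.

```python
import numpy as np

def augment(img, rng=np.random.default_rng(0)):
    """Generate six augmented variants of one HxWx3 uint8 image (illustrative parameters)."""
    h, w = img.shape[:2]
    out = []

    # Translation (approximated here by a circular shift; a real pipeline would pad the border).
    dy, dx = int(rng.integers(-h // 10, h // 10)), int(rng.integers(-w // 10, w // 10))
    out.append(np.roll(img, (dy, dx), axis=(0, 1)))

    # Noise injection: additive Gaussian noise.
    out.append(np.clip(img + rng.normal(0, 10, img.shape), 0, 255).astype(np.uint8))

    # Luminance modulation: random brightness scaling.
    out.append(np.clip(img * rng.uniform(0.6, 1.4), 0, 255).astype(np.uint8))

    # Cropping: keep a random 80% window to emphasize local features.
    y0, x0 = int(rng.integers(0, h // 5)), int(rng.integers(0, w // 5))
    out.append(img[y0:y0 + int(0.8 * h), x0:x0 + int(0.8 * w)])

    # Mirroring: horizontal flip.
    out.append(img[:, ::-1])

    # Cutout: zero out a random square patch.
    cut = img.copy()
    cy, cx, s = int(rng.integers(0, h)), int(rng.integers(0, w)), h // 8
    cut[max(0, cy - s):cy + s, max(0, cx - s):cx + s] = 0
    out.append(cut)
    return out
```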

2.2. Enhancements to the YOLOv5s Model

2.2.1. Underlying Framework of the YOLOv5s Network

The YOLOv5s detection framework is segmented into three core components: the backbone network, the neck network, and the detection head, as depicted in Figure 2. The latest iteration has substituted the Focus layer with a 6 × 6 Conv convolutional module and has transitioned from a parallel to a serial SPPF structure, thereby diminishing computational demands and accelerating detection without compromising accuracy. The backbone of YOLOv5s is underpinned by a CSPDarknet53 architecture that integrates deep convolutions and residual connections. The Conv module, foundational in CNNs [38], applies convolutions to distill critical spatial information and employs a BN layer for stabilizing feature distribution through normalization. An activation function is subsequently integrated to confer the network’s capacity for non-linear transformations. Within the YOLOv5s neck network, the C3 module is pivotal, tasked with deepening the network to amplify the receptive field, which in turn substantially boosts its feature extraction prowess. YOLOv5s also incorporates a feature pyramid mechanism, designed to harness multi-level features for the detection of objects in diverse sizes. Through meticulous upscaling and downscaling techniques, YOLOv5s adeptly merges features across various strata, culminating in a tiered feature pyramid. The detection head module generates the bounding boxes, confidence metrics, and categorical probabilities of targets. It initiates this process by downsizing the channel dimensions and scaling the feature maps from the backbone’s output through a Conv module. Subsequently, it amalgamates feature maps from multiple hierarchies to enrich the feature information. YOLOv5s then applies the Sigmoid function to confine output values within the 0–1 interval and utilizes non-maximum suppression to refine the detection outcomes.
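For reference, the Conv module described above follows the convolution, batch normalization, and activation pattern; the sketch below gives a minimal PyTorch rendering of it. Recent YOLOv5 releases use the SiLU activation, which is assumed here; details such as automatic padding and BN fusion are omitted.

```python
import torch.nn as nn

class Conv(nn.Module):
    """Conv module sketch: convolution, batch normalization, then a non-linear activation."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)   # stabilizes the feature distribution
        self.act = nn.SiLU()              # provides the non-linear transformation

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```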

2.2.2. Streamlining the Backbone Network

The convolutional modules within the YOLOv5 backbone, coupled with a plethora of convolutional operations, lead to less-than-ideal forward inference efficiency for the model. In light of the network’s inherent characteristics, this study replaces the original convolutional modules in the backbone with Depth-wise Separable Convolution (DP_Conv) modules. This strategic substitution is designed to curtail the model’s parameter count and computational demands without a loss in accuracy, which in turn, expedites the detection process.
Depth-wise Separable Convolution, depicted in Figure 3, is distinctly different from standard convolution in the manner of feature extraction. Standard convolution usually extracts features from multi-channel feature maps in one operation, while Depth-wise Separable Convolution divides the traditional convolution process into two distinct phases. The first stage is Depth-wise Convolution, where each feature map is convolved independently with its corresponding kernel, preserving the separation of feature information across channels without any cross-channel blending, and keeping the channel count constant. The subsequent stage entails pointwise convolution, which facilitates the intermixing and consolidation of features across various channels within a convolutional neural network. Within layers of convolution that possess a multitude of channels, pointwise convolution operates by deploying a convolution kernel on a per-channel basis. It linearly integrates information from disparate channels to forge new channels, thereby enabling the network to acquire more complex and nuanced feature representations. It mirrors the role of standard convolutional operations but is specifically geared towards synthesizing feature information from each channel following a depth-wise convolution. This approach fosters interaction and integration of information across channels and allows for the adjustment of the channel count in the output feature maps according to requirements. Notably, in the Depth-wise Convolution stage, each kernel acts on a single-channel feature map only. This separated convolution approach significantly reduces the parameter count and computational complexity during feature extraction, with almost no compromise in the expressive power of features compared to traditional convolution. Such convolutional operations are of vital importance in the design of lightweight network architectures, enabling a decrease in model complexity and computational costs while sustaining performance levels.
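A minimal sketch of the DP_Conv idea is shown below: a depth-wise convolution processes each channel independently, and a 1 × 1 point-wise convolution then mixes the channels. The kernel size, normalization, and activation choices are illustrative assumptions rather than the authors' exact module definition.

```python
import torch.nn as nn

class DPConv(nn.Module):
    """Depth-wise separable convolution: per-channel conv followed by a 1x1 point-wise conv."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        # Depth-wise stage: groups=c_in, so each kernel sees only its own input channel.
        self.dw = nn.Conv2d(c_in, c_in, k, s, padding=k // 2, groups=c_in, bias=False)
        self.bn1 = nn.BatchNorm2d(c_in)
        # Point-wise stage: 1x1 conv linearly combines channels and sets the output width.
        self.pw = nn.Conv2d(c_in, c_out, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn2(self.pw(self.act(self.bn1(self.dw(x))))))
```

For a k × k convolution mapping c_in to c_out channels, the standard layer needs roughly k² · c_in · c_out weights, whereas the separable form needs only k² · c_in + c_in · c_out, which is the source of the parameter and FLOPs savings reported later.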

2.2.3. Streamlining the Neck Network

In an effort to curtail the proliferation of similar feature maps that arise during feature extraction, this study replaces the Bottleneck architecture within the C3 module with the more streamlined GhostBottleneck. This involves replacing the conventional convolutional Conv module of the Bottleneck with a GhostConv convolutional module [39]. With the same number of input and output channels, the C3Ghost module boasts a reduced parameter count.
The Ghost module operates by partitioning the convolutional kernel into multiple smaller groups. It employs a modest set of standard convolutional kernels to initially extract features from the input data, yielding a collection of foundational feature maps. Subsequently, these undergo a series of elementary linear transformations, resulting in a vast array of novel feature maps. All feature maps are then efficiently concatenated to produce the Ghost convolution’s ultimate output. Figure 4 depicts the architecture and operational mechanism of the Ghost module, which is centered around the idea of generating “ghost features” to diminish computational requirements. The process starts with the extraction of primary features through standard convolution, which is then followed by light-weight convolutional operations to generate feature maps that closely mirror the original set. These analogous feature maps are not simple duplicates; instead, they serve to expand the original features’ dimensions in an efficient manner, leading to an enrichment of the information they carry. By employing this method, the Ghost module is capable of markedly reducing both the parameter count and computational complexity while preserving the network’s performance. This results in a network that is more efficient and well-suited for deployment in environments with limited resources.
As depicted in Figure 5, the GhostConv module initiates the process by employing a 1 × 1 convolution to the input feature maps for the purpose of dimensionality reduction, converting the input maps with c channels to have c/2 channels. Following this, a convolutional operation is executed on the feature maps now possessing c/2 channels, and the resulting output feature maps are concatenated with the original feature maps prior to convolution. By applying convolution to only half of the initial feature maps, this method effectively minimizes the proliferation of redundant feature maps that typically arise during feature extraction. This not only economizes on hardware storage space but also accelerates the computational process.
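The GhostConv behavior described above can be sketched as follows: a 1 × 1 convolution produces half of the output channels, a cheap depth-wise convolution generates the remaining "ghost" maps from them, and the two halves are concatenated. The 5 × 5 kernel for the cheap operation is an assumption borrowed from common Ghost-module implementations, not a value stated in this paper.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """GhostConv sketch: primary 1x1 conv plus cheap depth-wise conv, outputs concatenated."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, s, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        # Cheap linear transformation: depth-wise conv applied to the primary feature maps.
        self.cheap = nn.Sequential(
            nn.Conv2d(c_half, c_half, 5, 1, padding=2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)                      # c_out/2 primary maps
        return torch.cat([y, self.cheap(y)], 1)  # concatenate primary and ghost maps
```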

2.2.4. Incorporation of the Attention Mechanism

The attention mechanism has become a staple in machine learning and natural language processing, designed to direct models to concentrate on the most critical aspects of the input data. Conventional convolutional layers may not prioritize feature importance, potentially diminishing model efficacy. Leveraging the local self-similarity present in feature maps, the SimAM attention mechanism module selectively modulates the weight of each pixel. It achieves this by dynamically computing the degree of similarity between each pixel and the pixels in its vicinity within the feature map. Consequently, this leads to the amplification of salient features and the dampening of irrelevant features. In this research, we introduce the SimAM [40] attention mechanism module to bolster the convolutional neural network’s capacity to concentrate on salient features, with the goal of elevating the precision of the network in detecting potato seed tuber buds and defects.
SimAM represents a three-dimensional attention module that employs an energy function to calculate attention weights efficiently, offering a streamlined and robust attention mechanism for CNNs. Through an adeptly crafted energy function, SimAM markedly amplifies the feature representation capabilities of neural networks. This module seamlessly integrates spatial and channel attention, which can be dynamically applied to feature maps in either a parallel or serial fashion, deducing three-dimensional attention weights from feature maps without the addition of extra parameters. By refining the energy function, SimAM is capable of uncovering the significance of each neuron and leverages a cohesive weight attention module to enhance the energy function, yielding an analytical solution. This approach not only expedites the computation of attention weights but also renders SimAM a lightweight attention module, rendering it highly appropriate for applications with constrained computational resources. A depiction of its architecture is presented in Figure 6.
In the realm of neuroscience, informational neurons typically display activation patterns distinct from those of other neurons and have the capacity to suppress adjacent neurons. Neurons with spatial inhibitory properties are, therefore, of significant importance. To effectively identify these pivotal neurons, methods that assess the linear separability among neurons can be utilized, representing an approach that is both perceptive and practical. Consequently, the following energy function has been formulated for assessment as shown in Equation (1) [41].
e_t(w_t, b_t, y, x_i) = (y_t - \hat{t})^2 + \frac{1}{M-1}\sum_{i=1}^{M-1}(y_o - \hat{x}_i)^2
Throughout the training process, efforts are dedicated to enhancing the linear separability between neuron t and the other neurons within the same channel, where the linear transforms \hat{t} and \hat{x}_i of the target neuron t and the other neurons x_i are given in Equations (2) and (3).
\hat{t} = w_t t + b_t
\hat{x}_i = w_t x_i + b_t
Binary labels are utilized, and regularization terms are incorporated to bolster the model’s capacity for generalization. The resultant energy function is defined as shown in Equation (4).
e_t(w_t, b_t, y, x_i) = \frac{1}{M-1}\sum_{i=1}^{M-1}\left(-1 - (w_t x_i + b_t)\right)^2 + \left(1 - (w_t t + b_t)\right)^2 + \lambda w_t^2
Each channel theoretically encompasses M = H × W separate energy functions. From the aforementioned mathematical expression, an analytical solution can be deduced as shown in Equations (5)–(8).
w_t = -\frac{2(t - \mu_t)}{(t - \mu_t)^2 + 2\sigma_t^2 + 2\lambda}
b_t = -\frac{1}{2}(t + \mu_t) w_t
\mu_t = \frac{1}{M-1}\sum_{i=1}^{M-1} x_i
\sigma_t^2 = \frac{1}{M-1}\sum_{i=1}^{M-1}(x_i - \mu_t)^2
Here, \mu_t and \sigma_t^2 denote the mean and the variance, respectively, of all neurons in that channel excluding neuron t. Thus, the formula for the minimized energy e_t is presented in Equation (9), where 1/e_t signifies the weight associated with the neuron.
e_t = \frac{4(\hat{\sigma}^2 + \lambda)}{(t - \hat{\mu})^2 + 2\hat{\sigma}^2 + 2\lambda}
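Because the minimized energy in Equation (9) has a closed form, SimAM can be implemented in a few lines without any learnable parameters. The sketch below follows the publicly released SimAM reference implementation, in which each activation is scaled by a sigmoid of the inverse energy; the regularization value λ = 1e-4 is the commonly used default and is not a value reported in this paper.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention: scales each activation by sigmoid(1/e_t) from Eq. (9)."""
    def __init__(self, e_lambda=1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w - 1                                          # M - 1 neurons per channel
        d = (x - x.mean(dim=[2, 3], keepdim=True)).pow(2)      # (t - mu_hat)^2 at each position
        v = d.sum(dim=[2, 3], keepdim=True) / n                # channel variance sigma_hat^2
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5            # proportional to 1 / e_t
        return x * torch.sigmoid(e_inv)                        # amplify informative neurons
```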

2.2.5. Improved YOLOv5s-Based Network Architecture

This research replaces the conventional convolutional module within the base network architecture with a Depth-wise Separable Convolution (DP_Conv) module. This modification enhances model efficiency and facilitates a lightweight design by removing superfluous parameters, without markedly affecting the model’s recognition accuracy. In the neck network, the standard C3 module has been upgraded to the C3Ghost module, where the traditional Bottleneck component is substituted by the GhostBottleneck, resulting in a reduced model size during training. Furthermore, the inclusion of the SimAM attention mechanism module introduces an efficient and robust attention mechanism for convolutional neural networks, bolstering their capacity to concentrate on critical features. Consequently, the YOLOv5s model developed in this study is designated as DCS-YOLOv5s, and its architecture is depicted in Figure 7.

2.3. Experimental Apparatus

The enhanced model, based on YOLOv5 and presented in this paper, is developed using the open-source framework PyTorch. Experiments are conducted on a Windows 10 computer equipped with an NVIDIA graphics card. The hardware specifications include an NVIDIA Tesla V100 GPU with 16 GB of dedicated video memory, an Intel(R) Xeon(R) Platinum 8160T CPU @ 2.10 GHz, and 64 GB of RAM.
During the model development, the CUDA and cuDNN environments are configured to harness the GPU’s capabilities for accelerating data computation and training procedures, thereby effectively facilitating the research on multi-object recognition for potato seed tubers. For this experimental setup, the dimensions of the input images are configured to be 640 pixels by 640 pixels.
Precision (P) is defined as the ratio of the true positive predictions to the total number of predictions made as positive. Recall (R) is the ratio of the true positive predictions to the total actual number of positive instances. These are illustrated in Equations (10) and (11), where TP indicates true positives, FP indicates false positives, and FN indicates false negatives.
P = \frac{TP}{TP + FP}
R = \frac{TP}{TP + FN}
The mean Average Precision (mAP) represents the average of the Average Precision (AP) values for all target categories, and it stands as the most critical metric for assessing the performance of object detectors. The formulas are shown in Equations (12) and (13).
mAP = \frac{1}{n}\sum_{i=1}^{n}\int_0^1 P(R)\,dR
AP = \int_0^1 P(R)\,dR
In which, n represents the quantity of all target classes, specifically set at n = 4 for this research. To determine the mean Average Precision (mAP), one must first calculate the foundational evaluation metric known as Intersection over Union (IOU) [39]. The IOU indicates the proportion of the overlap between the predicted bounding box and the actual bounding box, which is a metric for assessing the degree of overlap between the areas enclosed by two bounding boxes. As depicted in Figure 8, the target manually labeled with a red rectangle signifies the true value, whereas the target labeled with a blue rectangle denotes the inferred prediction from the model.
The formula for calculating the Intersection over Union (IOU) value is presented in Equation (14).
IOU = \frac{A \cap B}{A \cup B}
A∩B refers to the area of overlap between the predicted region and the actual region, while A∪B denotes the total area covered by both the predicted region and the actual region. In tasks of object detection and segmentation, an IOU threshold is established to ascertain the correctness of the model’s inference. Typical choices are a single threshold of 0.5 or a range of thresholds from 0.5 to 0.95. Within the scope of this research, mAP1 indicates mAP at an IOU threshold of 0.5, while mAP2 refers to mAP averaged across IOU thresholds from 0.5 to 0.95.
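As a concrete illustration of Equations (13) and (14), the sketch below computes the IOU of two boxes and the AP as the area under an interpolated precision-recall curve; mAP is then simply the mean of the per-class AP values. This is a generic VOC-style computation, not the exact evaluation script used in this study.

```python
import numpy as np

def iou(box_a, box_b):
    """IOU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def average_precision(recall, precision):
    """AP as the area under the precision-recall curve (points sorted by descending confidence)."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]      # interpolate: make precision non-increasing
    idx = np.where(r[1:] != r[:-1])[0]            # recall change points
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```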
FLOPs stands for Floating Point Operations, a measure of the computational intensity of an algorithm or model, focusing on the count of operations executed on floating-point numbers as the algorithm or model runs.
Frames Per Second (FPS) is a critical metric for evaluating the speed performance of an algorithm when dealing with images, signifying the quantity of image frames the algorithm can recognize and process within a second.

3. Results

3.1. Comparative Experimental Results with Other Models

The research presents the DCS-YOLOv5s model, developed for the quality assessment of potato seed tubers, utilizing the YOLOv5s single-stage object detection methodology. In an effort to appraise the efficacy of the advanced DCS-YOLOv5s model for multi-object detection within this domain, a comparative analysis was conducted with two established object detection algorithms, namely Faster R-CNN and SSD. The outcomes of the various models’ training processes are depicted in Figure 9, while the performance indicators are detailed in Table 1 and Table 2.
The curve of the DCS-YOLOv5s model in Figure 9 demonstrates a marked upward trajectory, indicating swift gains in precision. As iterations progress, the acceleration in accuracy enhancement for the DCS-YOLOv5s model slows, the curve levels off, and the model’s performance settles at an elevated detection precision plateau. During the advanced phases of model training, the DCS-YOLOv5s model’s curve manifests a pronounced advantage over other models, underscoring its excellence in the domain of quality detection for potato seed tubers. Conversely, while the curves for Faster R-CNN and YOLOv5s also ascend with increasing iterations, their overall performance does not match that of DCS-YOLOv5s. Additionally, when juxtaposed with DCS-YOLOv5s, the SSD exhibits the most pronounced disparity in overall performance.
Table 1 and Table 2 showcase a comparative analysis of the DCS-YOLOv5s model’s performance in contrast with models like SSD, YOLOv5s, and Faster R-CNN across a spectrum of evaluative metrics. The DCS-YOLOv5s model manifests enhanced performance across all evaluated indicators, achieving a precision rate of 95.8% and a recall rate of 93.2% in its detection outcomes. Furthermore, it attains mAP1 and mAP2 scores of 97.1% and 66.2%, respectively, demonstrating its superiority in multi-class detection assignments. The model is also notably compact, with a parameter count of 4.68 × 10^6, 10.7 G FLOPs, and a compact weight size of 9.2 MB, facilitating its deployment and practical application. Comparatively, the SSD model records the lowest precision, and while Faster R-CNN marginally surpasses YOLOv5s in precision, it falls behind in the remaining three metrics, most notably in recall and mAP. Faster R-CNN is also encumbered by a substantial parameter count and extensive computational requirements, translating to a mere processing rate of 9 FPS, inadequate for the exigencies of real-time detection tasks. YOLOv5s, despite its modest parameter count, necessitates a greater investment in parameters and computational resources when juxtaposed with DCS-YOLOv5s. Although SSD and YOLOv5s boast relatively swift detection velocities, their holistic performance is eclipsed by DCS-YOLOv5s, substantiating the latter’s dominance in real-time detection scenarios that necessitate stringent accuracy and expeditious responsiveness.

3.2. Ablation Study

This research employs YOLOv5s as the foundation for a multi-object detection model designed for potato seed tubers. An ablation study is performed on the experimental dataset to dissect and confirm the precision of the DP_Conv module, C3Ghost module, and the SimAM attention mechanism. The outcomes of this ablation study are depicted in Figure 10 and detailed in Table 3.
Figure 10 and Table 3 illustrate that following the replacement of the original backbone network’s convolutional modules with Depth-wise Separable Convolution (DP_Conv) modules, enhancements were observed across all four precision metrics. Specifically, the Precision (P), Recall (R), mean Average Precision at 0.5 (mAP1), and mean Average Precision at 0.5 to 0.95 (mAP2) rose to 93.7%, 88.3%, 93.0%, and 59.8%, respectively. Concurrently, there was a reduction in the parameter count, Floating Point Operations (FLOPs), and model weight size to 5.64 × 10^6, 12.6 G, and 11.0 MB, respectively, along with an increase in detection speed, reaching 63 frames per second (FPS). This indicates that in the study of potato seed-tuber bud and defect detection, certain modules within the backbone network can be optimized for efficiency by eliminating redundant parameters without significantly impacting the model’s recognition accuracy. Subsequently, substituting the original C3 module’s Bottleneck with the GhostBottleneck module resulted in further improvements, with P, R, mAP1, and mAP2 increasing to 93.4%, 90.2%, 94.7%, and 62.5%, respectively. The parameter count, FLOPs, and weight size were further reduced to 4.68 × 10^6, 10.7 G, and 9.2 MB, respectively, and the detection speed was enhanced to 67 FPS. Ultimately, the incorporation of the SimAM attention mechanism module led to the most significant improvements, with P, R, mAP1, and mAP2 reaching 95.8%, 93.2%, 97.1%, and 66.2%, respectively. These enhancements highlight the benefits of each module in synergistically boosting the model’s detection precision and velocity.
Upon examining the comprehensive model’s precision metrics, parameter count, and FLOPs, it is evident that in the study of multi-object detection models for potato seed tubers, the DCS-YOLOv5s model has achieved a notable enhancement in both detection accuracy and velocity. Figure 11 depicts the accuracy of the various target categories detected by the DCS-YOLOv5s model, with an overall mAP1 of 97.1% and per-category values of 95.3%, 96.1%, 98.2%, and 98.7%, each exceeding 95%. Apart from the Recall (R) value for wormhole detection being 89.7%, the Precision (P) and R values for all other categories have surpassed 90%. The performance data from the refined model indicate that the DCS-YOLOv5s detection model is capable of fulfilling the demands for real-time multi-object detection of potato seed tubers.

3.3. Post-Improvement Model Result Analysis through Multi-Stage Enhancements

In order to provide a more comprehensive assessment of the enhanced DCS-YOLOv5s model’s capabilities in multi-object detection for seed potatoes, a visualization comparison has been conducted using Grad-CAM heatmaps to evaluate the model’s performance before and after the refinements. The Grad-CAM heatmap offers a direct visual representation of the model’s focus on input features, comparing the level of attention before and after the improvements. Specifically, the Grad-CAM analysis was applied to the second detection head of the original YOLOv5s model, which is the prediction layer at layer index 20, and compared with the second detection head of the improved model at layer index 21. The resulting heatmap for the multi-object detection of seed potatoes is depicted in Figure 12. The enhanced model demonstrates increased activity in the heatmap for areas of interest in seed potato detection and reduced activity in non-target areas. This indicates that the network’s model has improved in terms of both the extent of coverage and the degree of focus on the detection areas of interest for seed potatoes.
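For readers unfamiliar with Grad-CAM, the sketch below shows its core mechanics using forward and backward hooks: a chosen layer’s activations are weighted by the spatial mean of their gradients with respect to one output score, summed, rectified, and normalized into a heatmap. It assumes a model whose forward pass returns a score tensor indexable by class; for a detection head such as YOLOv5’s, the confidence of a selected predicted box would serve as the score instead.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, layer, image, class_idx):
    """Minimal Grad-CAM sketch: heatmap of the given layer's contribution to one output score."""
    feats, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

    score = model(image)[0, class_idx]       # scalar score whose gradients we trace
    model.zero_grad()
    score.backward()
    h1.remove()
    h2.remove()

    w = grads["a"].mean(dim=(2, 3), keepdim=True)             # channel weights from gradients
    cam = F.relu((w * feats["a"]).sum(dim=1, keepdim=True))   # weighted, rectified activation map
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
```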
Figure 13 corresponds to the annotation files of the dataset, and Figure 14 and Figure 15 illustrate sample recognition effects on an identical test set for both the YOLOv5s and DCS-YOLOv5s models, respectively. A comparison between the recognition images and the original annotations reveals that the YOLOv5s model has issues with missed and false detections for the identification of potato seed-tuber buds. The small size of the buds makes them easily confusable with surface defects, and their detection is further complicated by factors such as angle and lighting conditions, leading to a higher likelihood of missing small buds on the tuber’s edges. In Figure 14, missed targets are highlighted with green circles, and false detections are marked with purple circles. Figure 15 demonstrates that the enhanced DCS-YOLOv5s model has substantially decreased the rate of missed detections in the multi-object identification process of seed tubers, yielding more precise outcomes. While there are occasional misidentifications, the overall recognition performance is notably better than that of the YOLOv5s model.

4. Discussion

The DCS-YOLOv5s model, developed utilizing deep learning techniques in this study, holds substantial significance for elevating the mechanization of seed potato preparation processes and for hastening the advancement of China’s potato cultivation industry, particularly in the realm of multi-object recognition for potato seed tubers. Trials conducted on the seed potato image dataset emphasized three key areas: the streamlining of the YOLOv5s backbone network, the streamlining of the neck network, and the incorporation of an attention mechanism to augment model accuracy. The DP_Conv module achieved an optimal equilibrium between precision and velocity, while the C3Ghost module, which supplanted the original C3 module in the model, substantiated the model’s efficacy. Ultimately, the introduction of the SimAM attention mechanism prior to the SPPF in the backbone network indicated that the multi-stage refined DCS-YOLOv5s model has bolstered the extent of coverage and the level of attention to the target regions of seed potatoes.
The lightweight DCS-YOLOv5s model designed in this research is capable of functioning on cost-effective hardware, offering a reference for agricultural applications with constrained resources. The model has achieved significant improvements in its Precision (P), Recall (R), mean Average Precision at 0.5 (mAP1), and mean Average Precision at 0.5 to 0.95 (mAP2), with the values reaching 95.8%, 93.2%, 97.1%, and 66.2%, respectively. In comparison with state-of-the-art detectors such as YOLOv5n, YOLOX, YOLOv7, and Improved-YOLOv7, the DCS-YOLOv5s model surpasses them in terms of average precision for bud detection by margins of 18.93%, 17.4%, 5.9%, and 1.7%, respectively [42]. Moreover, in defect detection tasks, it outperforms YOLOv7 and YOLOv7-LSA by 5.1% and 5.0% in average precision, respectively, indicating a strengthened capability in feature extraction [43]. These advancements highlight the DCS-YOLOv5s model’s adaptability and effectiveness in the quality assessment of potato seed tubers.
Subsequent studies may concentrate on diversifying the dataset by incorporating a broader range of seed potato varieties, which would fortify the model’s robustness and generalize its applicability, allowing it to conduct detection tasks across different seed potato varieties in tandem. Furthermore, as deep learning continues to evolve, adopting superior network architectures to refine the model could lead to additional improvements in its precision for detecting small targets in dynamic detection environments.
In conclusion, the DCS-YOLOv5s model, which is an enhancement of the YOLOv5s model, has demonstrated its capability in effectively categorizing potato seed tubers through multi-object detection. Both the detection velocity and precision of the model have been substantiated by experimental outcomes, offering a robust solution to augment the accuracy and efficiency of intelligent agricultural tasks. The study’s outcomes affirm the model’s suitability for real-world applications and establish a foundation for the future development of analogous models. With the ongoing advancement of the agricultural industry, the necessity for innovative approaches such as the DCS-YOLOv5s model is anticipated to grow, driving the sector towards enhanced sustainability and productivity.

5. Conclusions

The study of methods for detecting the quality of potato seed tubers is crucial for advancing the mechanization of seed tuber preparation and for hastening the growth of China’s potato cultivation industry. This paper introduces an enhanced detection algorithm based on YOLOv5s, known as DCS-YOLOv5s, which is adept at recognizing multiple targets on potato seed tubers, thereby enabling their effective segregation. The core of this research entails the deployment of the Depth-Wise Separable Convolution DP_Conv module to streamline the backbone network. Additionally, GhostConv convolution is harnessed to minimize the proliferation of similar feature maps during feature extraction, rendering the model more lightweight. The research is capped with the incorporation of the SimAM attention mechanism module, which bolsters the convolutional neural network’s capacity to concentrate on pivotal features. The enhanced model boasts mAP1 and mAP2 values of 97.1% and 66.2%, respectively, marking an improvement of 5.4% and 9.8%. The detection speed has also been elevated to 65 frames per second (FPS), an augmentation of 12.07%. The findings of this research demonstrate that the DCS-YOLOv5s model has augmented the coverage and attentiveness to the target regions of seed tubers. This investigation has focused on detecting features such as potato seed tuber buds, mechanical damages, wormholes, and tails. Ongoing research will concentrate on expanding the dataset’s diversity to more effectively tackle the detection challenges presented by different potato varieties. Our plan is to introduce an increased number of samples, encompassing images from various growth phases and environmental settings, thereby boosting the model’s ability to generalize across different conditions. Additionally, we are committed to exploring state-of-the-art network architectures with the goal of further optimizing the model’s performance, especially in terms of detecting smaller targets, and increasing its accuracy and dependability. Such enhancements will bolster the model’s efficacy in real-world applications, enabling it to adapt more readily to intricate scenarios and a wide array of detection demands.

Author Contributions

Writing—review and editing, formal analysis, supervision, Z.Q.; conceptualization, writing—original draft, methodology, W.W.; funding acquisition, X.J.; investigation, writing—review and editing, F.W.; supervision, Z.H.; project administration, J.J.; resources, data curation, S.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program Sub-project (No. 2022YFD2001205C) and Henan Province Major Science and Technology Special Project (No. 231100110200).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

We would like to extend our thanks to the Henan Province Major Science and Technology Special Project, “Development and Industrialization of High-Performance Machinery for the Cultivation, Planting, and Harvesting of Major Food Crops,” for their sponsorship of this study.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Devaux, A.; Goffart, J.-P.; Kromann, P.; Andrade-Piedra, J.; Polar, V.; Hareau, G. The potato of the future: Opportunities and challenges in sustainable agri-food systems. Potato Res. 2021, 64, 681–720. [Google Scholar] [CrossRef] [PubMed]
  2. He, Z.; Larkin, R.; Honeycutt, W. Sustainable Potato Production: Global Case Studies; Springer Science & Business Media: Berlin, Germany, 2012. [Google Scholar]
  3. Xu, N.; Zhang, H.; Zhang, R.; Xu, Y. Current situation and prospect of potato planting in China. Chin. Potato J. 2021, 35, 81–96. [Google Scholar]
  4. Li, Z.; Wen, X.; Lv, J.; Li, J.; Yi, S.; Qiao, D. Analysis and prospect of research progress on key technologies and equipments of mechanization of potato planting. Trans. Chin. Soc. Agric. Mach. 2019, 50, 1–16. [Google Scholar]
  5. Yang, Z.; Sun, W.; Liu, F.; Zhang, Y.; Chen, X.; Wei, Z.; Li, X. Field collaborative recognition method and experiment for thermal infrared imaging of damaged potatoes. Comput. Electron. Agric. 2024, 223, 109096. [Google Scholar] [CrossRef]
  6. Lv, X.; Zhang, X.; Gao, H.; He, T.; Lv, Z.; Zhangzhong, L. When Crops meet Machine Vision: A review and development framework for a low-cost nondestructive online monitoring technology in agricultural production. Agric. Commun. 2024, 2, 100029. [Google Scholar] [CrossRef]
  7. Xu, J.; Lu, Y. Prototyping and evaluation of a novel machine vision system for real-time, automated quality grading of sweetpotatoes. Comput. Electron. Agric. 2024, 219, 108826. [Google Scholar] [CrossRef]
  8. Bi, S.; Gao, F.; Chen, J.; Zhang, L. Detection method of citrus based on deep convolution neural network. Trans. Chin. Soc. Agric. Mach. 2019, 50, 181–186. [Google Scholar]
  9. Xie, W.; Ding, W.; Wang, F.; Wei, S.; Yang, D. Integrity recognition of camellia oleifera seeds based on convolutional neural network. Trans. Chin. Soc. Agric. Mach. 2020, 51, 13–21. [Google Scholar]
  10. Mao, S.; Liu, Z.; Luo, Y. A deep learning-based method for estimating the main stem length of sweet potato seedlings. Measurement 2024, 238, 115388. [Google Scholar] [CrossRef]
  11. Yang, Y.; Zhao, X.; Huang, M.; Wang, X.; Zhu, Q. Multispectral image based germination detection of potato by using supervised multiple threshold segmentation model and Canny edge detector. Comput. Electron. Agric. 2021, 182, 106041. [Google Scholar] [CrossRef]
  12. Ji, Y.; Sun, L. Nondestructive Classification of Potatoes Based on HSI and Clustering. In Proceedings of the 2019 4th International Conference on Measurement, Information and Control (ICMIC), Harbin, China, 23–25 August 2019; pp. 73–77. [Google Scholar]
  13. Li, Y.; Li, T.; Niu, Z.; Wu, Y.; Zhang, Z.; Hou, J. Potato bud eyes recognition based on three-dimensional geometric features of color saturation. Trans. CSAE 2019, 34, 158–164. [Google Scholar]
  14. Lopez-Juarez, I.; Rios-Cabrera, R.; Hsieh, S.; Howarth, M. A hybrid non-invasive method for internal/external quality assessment of potatoes. Eur. Food Res. Technol. 2018, 244, 161–174. [Google Scholar] [CrossRef]
  15. Xi, R.; Hou, J.; Li, L. Fast segmentation on potato buds with chaos optimization-based K-means algorithm. Trans. Chin. Soc. Agric. Eng. 2019, 35, 190. [Google Scholar]
  16. Barnes, M.; Duckett, T.; Cielniak, G.; Stroud, G.; Harper, G. Visual detection of blemishes in potatoes using minimalist boosted classifiers. J. Food Eng. 2010, 98, 339–346. [Google Scholar] [CrossRef]
  17. Liu, D.; Li, S.; Cao, Z. State-of-the-art on deep learning and its application in image object classification and detection. Comput. Sci. 2016, 43, 13–23. [Google Scholar]
  18. Cummins, N.; Baird, A.; Schuller, B.W. Speech analysis for health: Current state-of-the-art and the increasing impact of deep learning. Methods 2018, 151, 41–54. [Google Scholar] [CrossRef]
  19. Lan, Y.; Zhao, D.; Zhang, Y.; Zhu, J. Exploration and development prospect of eco-unmanned farm modes. Trans. Chin. Soc. Agric. Eng. (Trans. CSAE) 2021, 37, 312–327. [Google Scholar]
  20. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  21. Ouf, N.S. Leguminous seeds detection based on convolutional neural networks: Comparison of faster R-CNN and YOLOv4 on a small custom dataset. Artif. Intell. Agric. 2023, 8, 30–45. [Google Scholar] [CrossRef]
  22. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef]
  23. Lee, H.-S.; Shin, B.-S. Potato detection and segmentation based on mask R-CNN. J. Biosyst. Eng. 2020, 45, 233–238. [Google Scholar] [CrossRef]
  24. Li, J.; Lin, L.; Tian, K.; Alaa, A.A. Detection of leaf diseases of balsam pear in the field based on improved Faster R-CNN. Trans. Chin. Soc. Agric. Eng. 2020, 36, 179–185. [Google Scholar]
  25. Chen, K.; Zhu, L.; Song, P.; Tian, X.; Huang, C.; Nie, X.; Xiao, A.; He, L. Recognition of cotton terminal bud in field using improved Faster R-CNN by integrating dynamic mechanism. Trans. CSAE 2021, 37, 161–168. [Google Scholar]
  26. Xi, R.; Jiang, K.; Zhang, W.; Lv, Z.; Hou, J. Recognition method for potato buds based on improved faster R-CNN. Trans. Chin. Soc. Agric. Mach. 2020, 51, 216–223. [Google Scholar]
  27. Liang, X.; Zhang, X.; Wang, Y. Recognition method for the pruning points of tomato lateral branches using improved Mask R-CNN. Trans. Chin. Soc. Agric. Eng. (Trans. CSAE) 2022, 38, 112–121. [Google Scholar]
  28. Shi, Y.; Qing, S.; Zhao, L.; Wang, F.; Yuwen, X.; Qu, M. YOLO-Peach: A High-Performance Lightweight YOLOv8s-Based Model for Accurate Recognition and Enumeration of Peach Seedling Fruits. Agronomy 2024, 14, 1628. [Google Scholar] [CrossRef]
  29. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. 2016; pp. 21–37. [Google Scholar]
  30. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  31. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  32. Wang, X.; Zhao, Q.; Jiang, P.; Zheng, Y.; Yuan, L.; Yuan, P. LDS-YOLO: A lightweight small object detection method for dead trees from shelter forest. Comput. Electron. Agric. 2022, 198, 107035. [Google Scholar] [CrossRef]
  33. Dang, F.; Chen, D.; Lu, Y.; Li, Z. YOLOWeeds: A novel benchmark of YOLO object detectors for multi-class weed detection in cotton production systems. Comput. Electron. Agric. 2023, 205, 107655. [Google Scholar] [CrossRef]
  34. Shi, F.; Wang, H.; Huang, H. Research on potato buds detection and recognition based on convolutional neural network. J. Chin. Agric. Mech. 2022, 43, 159. [Google Scholar]
  35. Zeng, T.; Li, S.; Song, Q.; Zhong, F.; Wei, X. Lightweight tomato real-time detection method based on improved YOLO and mobile deployment. Comput. Electron. Agric. 2023, 205, 107625. [Google Scholar] [CrossRef]
  36. Wang, X.; Zhu, S.; Li, X. Design and experiment of directional arrangement vertical and horizontal cutting of seed potato cutter. Trans. Chin. Soc. Agric. Mach. 2020, 51, 334–345. [Google Scholar]
  37. Yi, S.; Li, J.; Zhang, P. Detecting and counting of spring-see citrus using YOLOv4 network model and recursive fusion of features. Trans. Chin. Soc. Agric. Eng. 2021, 37, 161–169. [Google Scholar]
  38. Kaur, G.; Sivia, J.S. Development of deep and machine learning convolutional networks of variable spatial resolution for automatic detection of leaf blast disease of rice. Comput. Electron. Agric. 2024, 224, 109210. [Google Scholar] [CrossRef]
  39. Zhang, J.; Tian, M.; Yang, Z.; Li, J.; Zhao, L. An improved target detection method based on YOLOv5 in natural orchard environments. Comput. Electron. Agric. 2024, 219, 108780. [Google Scholar] [CrossRef]
  40. Yang, L.; Zhang, R.-Y.; Li, L.; Xie, X. Simam: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Online, 18–24 July 2021; pp. 11863–11874. [Google Scholar]
  41. Qin, X.; Li, N.; Weng, C.; Su, D.; Li, M. Simple attention module based speaker verification with iterative noisy label detection. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 22–27 May 2022; pp. 6722–6726. [Google Scholar]
  42. Zhang, W.; Zhang, H.; Liu, S.; Zeng, X.; Mu, G. Detection of potato seed buds based on an improved YOLOv7 model. Trans. Chin. Soc. Agric. Eng. (Trans. CSAE) 2023, 39, 148–158. [Google Scholar]
  43. Luo, T. Research on Potato Defect Detection Based on Improved YOLOv7. Master’s Thesis, Ningxia University, Yinchuan, China, 2023. [Google Scholar]
Figure 1. Selected Augmented Images.
Figure 2. Schematic of the YOLOv5 Network Architecture.
Figure 3. Illustration of Depth-wise Separable Convolution.
Figure 4. Illustration of the Ghost Module.
Figure 5. Illustration of the GhostConv Module.
Figure 6. Illustration of the SimAM Attention Mechanism.
Figure 7. Architecture of the DCS-YOLOv5s Model.
Figure 8. Illustration of Intersection over Union (IOU).
Figure 9. Accuracy Metric Trend Lines for Various Detection Models on the Validation Dataset.
Figure 10. Ablation Study Outcome Graph.
Figure 11. Category Precision Comparison in DCS-YOLOv5s Model Detection.
Figure 12. Multi-Object Detection Heatmaps for Seed Potatoes: (a) 20th Layer Prediction Heatmap of the YOLOv5s Model, (b) 21st Layer Prediction Heatmap of the DCS-YOLOv5s Model.
Figure 13. Seed Potato Test Image Annotation Files. (The □ symbol in the image represents the detection target, and the labels in the image represent detection categories).
Figure 14. YOLOv5s Model Detection Illustrations. (The □ symbol in the image represents the detection target, the labels in the image represent detection categories and confidence values, the green circle represents a missed target, and the purple circle represents a false detection).
Figure 15. DCS-YOLOv5s Model Detection Illustrations. (The □ symbol in the image represents the detection target, the labels in the image represent detection categories and confidence values, and the purple circle represents a false detection).
Table 1. Precision Performance Indicators for Various Detection Models on the Test Set.

Model        | P (%) | R (%) | mAP1 (%) | mAP2 (%)
Faster RCNN  | 92.0  | 83.3  | 90.5     | 55.4
SSD          | 88.0  | 82.8  | 88.4     | 52.6
YOLOv5s      | 91.6  | 87.5  | 91.7     | 56.4
DCS-YOLOv5s  | 95.8  | 93.2  | 97.1     | 66.2
Table 2. Parameter Performance Indicators for Various Detection Models.

Model        | Parameter Volume | FLOPs (G) | Weight Size (MB) | FPS
Faster RCNN  | 59.26 × 10^6     | 132.4     | 105.5            | 9
SSD          | 24.28 × 10^6     | 85.6      | 78.9             | 44
YOLOv5s      | 7.03 × 10^6      | 16.0      | 11.8             | 58
DCS-YOLOv5s  | 4.68 × 10^6      | 10.7      | 9.2              | 65
Table 3. Ablation Study Outcomes. (√ indicates the module is included; – indicates it is not.)

Number | DP_Conv | C3Ghost | SimAM | P (%) | R (%) | mAP1 (%) | mAP2 (%)
1      | –       | –       | –     | 91.6  | 87.5  | 91.7     | 56.4
2      | √       | –       | –     | 93.7  | 88.3  | 93.0     | 59.8
3      | √       | √       | –     | 93.4  | 90.2  | 94.7     | 62.5
4      | √       | √       | √     | 95.8  | 93.2  | 97.1     | 66.2
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

