Article

An Improved YOLOv8n-Based Method for Detecting Rice Shelling Rate and Brown Rice Breakage Rate

1 School of Mechanical and Electrical Engineering, Henan University of Technology, Zhengzhou 450001, China
2 Shandong Alesmart Intelligent Technology Co., Ltd., Jinan 250014, China
* Author to whom correspondence should be addressed.
Agriculture 2025, 15(15), 1595; https://doi.org/10.3390/agriculture15151595
Submission received: 18 June 2025 / Revised: 15 July 2025 / Accepted: 22 July 2025 / Published: 24 July 2025
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

Abstract

Accurate, real-time detection of rice shelling rate (SR) and brown rice breakage rate (BR) is crucial for intelligent hulling and sorting but remains challenging: small grain size, dense adhesion, and uneven illumination cause missed detections and blurred boundaries in the baseline YOLOv8n. This paper proposes a high-precision, lightweight solution based on an enhanced YOLOv8n with improvements in network architecture, feature fusion, and attention mechanisms. The backbone’s C2f module is replaced with C2f-Faster-CGLU, integrating partial convolution (PConv) and convolutional gated linear unit (CGLU) gating to reduce computational redundancy via sparse interaction and enhance small-target feature extraction. A bidirectional feature pyramid network (BiFPN) weights multiscale feature fusion to improve edge positioning accuracy of dense grains. An attention for fine-grained categorization (AFGC) module is embedded to focus on texture and damage details, enhancing adaptability to illumination fluctuations. The Detect_Rice lightweight head compresses parameters via group normalization and dynamic convolution sharing, optimizing small-target response. The improved model achieved 96.8% precision and 96.2% mAP. Combined with a quantity–mass conversion model, SR and BR detection errors were reduced to 1.11% and 1.24%, respectively, meeting national standard (GB/T 29898-2013) requirements and providing an effective real-time solution for intelligent hulling and sorting.

1. Introduction

As a traditional staple food, rice plays a vital role in feeding the global population. With projected global population growth, rice demand is expected to increase by 30% by 2050 [1]. The rice processing chain—comprising cleaning, hulling, milling, finishing, and packaging—relies critically on the hulling stage. This key process determines both processing efficiency and final product quality [2]. During this stage, a rubber roller huller removes paddy grain husks through the friction generated between its rubber rollers [3]. The shelling rate [4] and brown rice breakage rate [5] directly impact both product quality and economic value. A low shelling rate indicates ineffective recycling and reprocessing of unshelled rice grains, leading to raw material waste. Conversely, a high brown rice breakage rate increases the quantity of broken grains during subsequent milling, thereby reducing the market value of the finished rice [6]. Consequently, the shelling rate and brown rice breakage rate serve as critical performance indicators for rubber roller hullers. However, current industry practice predominantly relies on manual sampling for monitoring these two indicators. This method suffers from low efficiency, poor timeliness, and susceptibility to sampling inaccuracies. As a result, operators struggle to capture huller operating status accurately and promptly, ultimately compromising quality control and production efficiency at this critical processing stage [7].
Driven by rapid advances in deep learning and continuous improvements in computational hardware, object detection methodologies have transitioned from traditional feature-engineering approaches to contemporary deep learning-based frameworks [8]. These methodologies are primarily categorized into two paradigms. The first consists of two-stage algorithms, exemplified by Fast R-CNN and Faster R-CNN [9], which first generate region proposals and then perform classification and localization. For instance, in agricultural grain inspection applications, Wu et al. [10] developed a specialized detection framework based on Faster R-CNN, achieving effective wheat grain detection and counting under complex backgrounds, multiangle perspectives, and varying degrees of grain density. Nevertheless, despite its robust detection performance, the method’s computationally intensive architecture and processing complexity hinder real-time implementation, limiting its practical deployment in high-throughput agricultural scenarios. In contrast, single-stage detection frameworks, exemplified by YOLO (You Only Look Once), achieve a favorable balance between speed and accuracy, enabling real-time object detection with minimal computational overhead. This efficiency makes them particularly suitable for time-sensitive grain inspection tasks, where rapid and accurate analysis of paddy grains, wheat kernels, or other cereals is critical for quality control and yield estimation. As a result, YOLO-based architectures have been widely adopted in grain inspection applications in recent years. For instance, Liu et al. [11] proposed an enhanced YOLOv5 model for broken rice detection. Experimental results demonstrated that the enhanced model achieved a mean average precision (mAP) of 98.9%, representing a 0.3% improvement over the baseline YOLOv5, while reducing both model parameters and computational complexity by over 85%. Addressing the challenges of similar-sized corn kernels and impurities, high object density, and small target dimensions, Zhang et al. [12] developed an enhanced YOLOv8 architecture. Their method achieved 95.33% precision for broken kernel detection and 96.15% for impurity detection. To develop a rapid and nondestructive method for rice mold detection, Sun et al. [13] proposed a detection framework based on YOLOv5 that identifies mold-infected regions and quantifies their coverage areas in rice samples. The model achieved a precision of 82.1% and a recall of 86.5% on the validation set, demonstrating its effectiveness in automated mold assessment under controlled conditions. Recent years have witnessed lightweight network architectures emerging as a key research focus to enhance model deployability in real-world applications. Exemplifying this trend, Li et al. [14] enhanced YOLOv7-tiny through a lightweight feature extraction backbone and a decoupled head structure, simultaneously improving detection precision and inference efficiency. To address issues of insufficient detection accuracy, slow inference speed, and high computational resource consumption, Wang et al. [15] substituted the backbone structure of YOLOv8 with MobileNetV3-small. This modification reduced model complexity while maintaining accuracy. Furthermore, the Ghost convolution (GsConv) module was integrated to enhance the feature extraction network, optimizing particle identification and localization. Experimental results demonstrated that the enhanced YOLOv8 achieved 97.4% accuracy, with a 66% reduction in parameters and a 70.7% reduction in computational complexity (FLOPs). To achieve accurate in situ detection and identification of maize pests and diseases, Shi et al. [16] employed EfficientViT as the backbone network to reduce computational complexity. Additionally, they introduced spatial channel reconstruction convolution (SCConv) into the C2f module to enhance feature extraction performance. The improved model, termed YOLOv8-EGCW, was evaluated on a custom-built maize pest and disease dataset, achieving a mean average precision (mAP) of 93.4%.
Despite significant progress in improving grain detection accuracy and model lightweighting, a critical gap remains in the development of real-time, lightweight methods specifically tailored for the accurate quantification of rice hulling performance metrics—namely, the shelling rate (SR) and brown rice breakage rate (BR). Current approaches still face persistent challenges such as missed detections and misclassification when processing high-density rice grain images. For instance, Zou et al. [17] demonstrated that in high-density paddy imaging scenarios, traditional detection algorithms struggle to distinguish individual rice grains because of morphological homogeneity and occlusion effects, particularly under partial occlusion conditions. Sun et al. [18] further revealed that severe occlusion in certain panicle categories can result in nearly half of the grains remaining undetected, which significantly compromises counting accuracy. These issues reflect four major limitations of existing methods: (1) data scarcity: high-quality annotated datasets specifically designed for SR/BR detection tasks are largely lacking, limiting model training and generalization capabilities; (2) morphology blindness: conventional models fail to fully exploit critical morphological features of rice grains, such as geometric shape and husk texture, leading to misclassification between intact and broken grains; (3) density-induced errors: in densely packed and overlapping scenarios, high miss rates severely impact counting accuracy, making reliable detection difficult; (4) accuracy–efficiency tradeoff: lightweight models often sacrifice detection accuracy for speed, while high-accuracy models tend to be computationally intensive, posing challenges for real-time deployment. These limitations collectively hinder the practical implementation of automated, real-time SR/BR monitoring systems, which are essential for optimizing rice milling efficiency and quality control.
To address the aforementioned challenges in real-time quantification of rice hulling performance metrics, this study proposes a lightweight rice grain detection framework based on an improved YOLOv8n architecture. The key contributions include: (1) the construction of a high-quality image dataset covering representative rice varieties under diverse illumination conditions and key grain states relevant to hulling quality assessment; (2) the design and implementation of a novel, lightweight detection model incorporating architectural enhancements—including a C2f-Faster-CGLU backbone, BiFPN-based multiscale feature fusion, AFGC fine-grained attention mechanism, and a customized lightweight Detect_Rice detection head—to improve detection accuracy and efficiency for dense, small-target rice grains; and (3) the development of a robust mass–count conversion model that enables accurate quantification of both shelling rate (SR) and brown rice breakage rate (BR), meeting the requirements specified in the national standard GB/T 29898-2013 [19]. This work lays a solid technical foundation for intelligent rice hulling and processing systems.

2. Materials and Methods

2.1. Operating Principle of Grain Husking and Breakage in Rubber Roller Paddy Husker and Analysis of Performance Metrics

2.1.1. Operating Principle of Paddy Husking and Grain Breakage in Rubber Roller Hullers

The rubber roller husker achieves paddy husking through the compressive, shearing, and frictional forces generated by the differential rotation of its dual rollers (Figure 1). The core mechanism is as follows. When paddy grains enter the roller gap—which is narrower than the minor axis length of the paddy grain kernels—they undergo compression and friction. Simultaneously, the linear speed differential between the rollers applies a tearing force to the husk, causing its separation from the endosperm, thus completing the husking process [20].
However, when the linear speed differential between the rubber rollers is insufficient, the generated shear stress fails to exceed the husk’s tensile strength threshold, resulting in unhusked grains. Conversely, excessive speed differential or elevated roller hardness causes compressive stress to surpass the crushing strength of brown rice kernels. This induces endosperm rupture and increases the risk of kernel breakage. Therefore, precise detection and real-time monitoring of husking yield and breakage rate are essential to ensure optimal operational efficiency of rubber roller huskers.

2.1.2. Performance Metrics Analysis for Rubber Roller Huskers

In rubber roller husking operations, shelling rate and brown rice breakage rate serve as the core metrics for evaluating processing quality. Shelling rate is defined as the mass proportion of intact brown rice kernels (mFull) and broken brown rice kernels (mBroken) to the total input paddy mass (mTotal) after a single pass through the husker, calculated as:
$$\eta = \frac{m_{\text{Full}} + m_{\text{Broken}}}{m_{\text{Total}}} \times 100\% \tag{1}$$
where mFull is the mass of intact brown rice kernels, mBroken is the mass of broken brown rice kernels, and mTotal is the total mass of input paddy, which includes mFull, mBroken, and mUnshelled.
According to China’s National Standard GB/T 29898-2013, the minimum acceptable shelling rate for qualified husking operations is ≥85% for japonica rice and ≥78% for indica rice varieties. An insufficient shelling rate directly results in raw material loss due to unprocessed grains.
Brown rice breakage rate reflects the mechanical damage sustained by brown rice during processing. It is defined as the mass fraction of broken brown rice kernels relative to the total brown rice mass (including both whole and broken kernels) in the husked rice mixture after hulling. The calculation formula is:
$$\zeta = \frac{m_{\text{Broken}}}{m_{\text{Full}} + m_{\text{Broken}}} \times 100\% \tag{2}$$
According to China’s National Standard GB/T 29898-2013, the maximum allowable brown rice breakage rate is ≤2% for japonica rice and ≤5% for indica rice. An elevated brown rice breakage rate not only degrades the quality classification of finished rice but also exacerbates broken kernel generation in subsequent milling operations.
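To make the two metrics concrete, the following minimal Python sketch evaluates Equations (1) and (2) directly from the three measured masses. The sample masses in the usage lines are illustrative values, not experimental data from this study.

```python
def shelling_rate(m_full: float, m_broken: float, m_unshelled: float) -> float:
    """Shelling rate (Eq. 1): hulled mass share of the total input paddy mass."""
    m_total = m_full + m_broken + m_unshelled
    return (m_full + m_broken) / m_total * 100.0


def breakage_rate(m_full: float, m_broken: float) -> float:
    """Brown rice breakage rate (Eq. 2): broken share of all brown rice mass."""
    return m_broken / (m_full + m_broken) * 100.0


# Illustrative masses (g) after one pass through the husker
print(f"SR = {shelling_rate(80.0, 3.0, 10.0):.1f}%")  # ~89.2%, passes the japonica limit (>=85%)
print(f"BR = {breakage_rate(80.0, 3.0):.1f}%")        # ~3.6%, exceeds the japonica limit (<=2%)
```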

2.2. Image-Based Quantification Methodology for Paddy Quality Metrics

National standards evaluate paddy quality using mass-based metrics such as shelling rate and brown rice breakage rate, whereas automated image-based inspection can directly quantify only kernel counts. To effectively align the detection results with national standards, a mapping relationship between the counts of unhulled grains, broken grains, and total grains in paddy rice and their actual mass must be established. This study selected two representative varieties: long-grain indica rice and round-grain japonica rice. For each variety, we conducted weighing experiments using a precision electronic balance (accuracy: 0.001 g) on 50 randomly sampled hulled kernels, 50 randomly sampled broken kernels, and 50 randomly sampled intact unhulled kernels (150 kernels per variety, 300 kernels in total), characterizing the single-kernel mass distribution for each kernel category.
Prior to experimentation, grain moisture content was calibrated at 15.0% using an LDS-1G Grain Moisture Tester. The study was conducted in Zhengzhou, Henan Province, China, during October 2024, using the cultivars Zhongxiangyou 1 (long-grain indica) and Jingdao 8 (round-grain japonica). Employing stratified random sampling across three kernel categories (hulled, broken, and intact unhulled), five independent sample plots were established per category (15 plots total) with ten kernels uniformly selected from each plot, yielding 150 sample kernels.
Based on experimental data, a linear regression model was constructed to achieve mass-to-count conversion. The independent variables were defined as kernel counts (NHulled, NBroken, NUnhulled), with corresponding masses (MHulled, MBroken, MUnhulled) as dependent variables. Following established mass–count mapping methodologies in agricultural engineering (as referenced in [12]), the model is expressed as:
$$M_k = a_k N_k + b_k, \quad k \in \{\text{Hulled}, \text{Broken}, \text{Unhulled}\} \tag{3}$$
where Mk denotes the total mass (g) of kernel category k, Nk represents the kernel count for category k, ak signifies the mean single-kernel mass (g/kernel) of category k, and bk is the model intercept term. Parameters ak and bk were fitted using least squares regression, with model goodness-of-fit evaluated through the coefficient of determination (R²). The resulting regression curves and model parameters for both rice varieties are presented in Figure 2 and Table 1.
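As a minimal illustration of this fitting procedure, the NumPy sketch below estimates ak and bk by least squares and computes R²; the count–mass pairs are hypothetical stand-ins for the weighing data in Table 1.

```python
import numpy as np

def fit_mass_count_model(counts, masses):
    """Least-squares fit of M_k = a_k * N_k + b_k (Eq. 3), returning (a_k, b_k, R^2)."""
    counts = np.asarray(counts, dtype=float)
    masses = np.asarray(masses, dtype=float)
    a_k, b_k = np.polyfit(counts, masses, deg=1)   # slope ~ mean single-kernel mass
    residuals = masses - (a_k * counts + b_k)
    r2 = 1.0 - np.sum(residuals**2) / np.sum((masses - masses.mean())**2)
    return a_k, b_k, r2

# Hypothetical cumulative weighings of 10, 20, ..., 50 hulled kernels (grams)
print(fit_mass_count_model([10, 20, 30, 40, 50], [0.24, 0.47, 0.71, 0.95, 1.18]))
```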
The fitting results demonstrated that the coefficients of determination (R²) for all grain types—unhulled japonica rice, hulled japonica rice, broken japonica rice, unhulled indica rice, hulled indica rice, and broken indica rice—exceeded 0.99. This validated the significance of the linear relationship between grain quantity and mass. Consequently, the methodology of predicting material mass based on bounding box counts for different grain varieties was substantiated. For the specific rice varieties at 15% moisture content, the predictive model relating object count to mass is expressed as:
$$\begin{aligned} M_{\text{Unhulled}1} &= 27.563\,N_{\text{Unshelled}1} + 3.644 \\ M_{\text{Hulled}1} &= 23.714\,N_{\text{Full}1} - 5.787 \\ M_{\text{Broken}1} &= 11.441\,N_{\text{Broken}1} + 7.324 \\ M_{\text{Unhulled}2} &= 22.656\,N_{\text{Unshelled}2} + 4.650 \\ M_{\text{Hulled}2} &= 18.619\,N_{\text{Full}2} - 0.494 \\ M_{\text{Broken}2} &= 9.460\,N_{\text{Broken}2} + 3.157 \end{aligned} \tag{4}$$
where subscript “1” denotes japonica rice, subscript “2” denotes indica rice, “Unhulled” represents paddy rice with husk, “Hulled” represents intact hulled brown rice, “Broken” represents fragmented brown rice, M denotes mass, and N denotes grain count.
By incorporating the aforementioned quantity–mass predictive model into Equations (1) and (2), we derived image detection-based computational models for shelling rate and brown rice breakage rate in rice processing.
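Putting the pieces together, the sketch below maps detected bounding-box counts to masses using the coefficients of Equation (4) (intercept signs as reconstructed above) and substitutes the results into Equations (1) and (2); the function name and example counts are illustrative.

```python
# Slope/intercept pairs from Eq. (4); subscript 1 = japonica, 2 = indica
COEFFS = {
    "japonica": {"Unshelled": (27.563, 3.644), "Full": (23.714, -5.787), "Broken": (11.441, 7.324)},
    "indica":   {"Unshelled": (22.656, 4.650), "Full": (18.619, -0.494), "Broken": (9.460, 3.157)},
}

def sr_br_from_counts(variety: str, n_full: int, n_broken: int, n_unshelled: int):
    """Convert detected grain counts to masses (Eq. 4), then compute SR/BR (Eqs. 1-2)."""
    def mass(label: str, n: int) -> float:
        a, b = COEFFS[variety][label]
        return a * n + b

    m_full = mass("Full", n_full)
    m_broken = mass("Broken", n_broken)
    m_unshelled = mass("Unshelled", n_unshelled)
    sr = (m_full + m_broken) / (m_full + m_broken + m_unshelled) * 100.0
    br = m_broken / (m_full + m_broken) * 100.0
    return sr, br

print(sr_br_from_counts("japonica", n_full=42, n_broken=3, n_unshelled=5))
```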

2.3. Image Acquisition

Research utilizing deep learning integrated with computer vision fundamentally relies on extensive, high-quality datasets [21]. To conduct effective studies, it is imperative to possess abundant and precisely annotated data. For complex agricultural problems, the required data volume increases correspondingly. In agricultural applications, datasets for training deep learning models primarily originate from three sources: (1) researcher-captured image datasets, (2) synthetically generated datasets through data augmentation techniques, and (3) publicly available image repositories.
In paddy grain detection tasks, datasets primarily serve to train deep learning models for practical deployment. Specifically, these datasets comprise extensively annotated paddy grain images and associated parameters, which are utilized to train deep learning algorithms for identifying and classifying grains under varying states or types. However, publicly available datasets for detecting shelling rate and brown rice breakage rate are still relatively scarce. Although several research datasets focus on milled rice, they are often limited in scale, typically containing hundreds to thousands of raw images, which may constrain model training in complex scenarios. Moreover, dedicated datasets specifically targeting brown rice remain virtually nonexistent. To address this gap, the present study constructed a specialized dataset and validated it through multiple deep neural network architectures.

2.3.1. Data Image Acquisition

This study employed a custom-built high-precision image acquisition system, as illustrated in Figure 3. The system comprised an industrial camera (MV-GE502GC-T; Hikrobot, Hangzhou, China), an LED ring light source, a light-isolated enclosure, and a camera mount. Samples included two rice varieties: long-grain indica rice and round-grain japonica rice. For each variety, four distinct states were captured: hulled grains, unhulled grains, broken brown rice, and grain husks. During image acquisition, the equipment remained stationary. Randomly selected grain samples were positioned on a black stage for imaging. By modulating the light source intensity, images were captured under different illumination conditions, including low-light, normal-light, and high-light scenarios. This process resulted in an image dataset containing 400 valid samples, with all images uniformly resized to 960 × 960 pixels. Representative sample images are shown in Figure 4.

2.3.2. Dataset Construction

Different grain categories were defined within the rice images and annotated using LabelImg (version 1.8.6), an open-source graphical image annotation tool. The annotation labels were assigned as follows: shelled grains were labeled “Full”, unshelled grains were labeled “Unshelled”, broken brown rice kernels were labeled “Broken”, and grain husks were labeled “Other”. Annotation results were saved in XML format. Data augmentation techniques—including rotation, flipping, cropping, and brightness adjustment—were then applied to expand the dataset and enhance model generalization capabilities. This process increased the dataset size to 1600 images. Following augmentation, conversion scripts transformed the XML-format annotations into TXT format compatible with YOLO models. For experimental purposes, the dataset was randomly partitioned into training, validation, and test sets at a ratio of 7:1.5:1.5.
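The conversion scripts themselves are not published with the paper; a minimal sketch of the VOC-XML-to-YOLO-TXT step, assuming LabelImg’s standard XML layout and the four class names listed above, might look as follows.

```python
import xml.etree.ElementTree as ET

CLASSES = ["Full", "Unshelled", "Broken", "Other"]  # assumed class index order

def voc_xml_to_yolo_txt(xml_path: str, txt_path: str) -> None:
    """Convert one LabelImg XML annotation file to normalized YOLO format."""
    root = ET.parse(xml_path).getroot()
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls_id = CLASSES.index(obj.find("name").text)
        box = obj.find("bndbox")
        x1, y1 = float(box.find("xmin").text), float(box.find("ymin").text)
        x2, y2 = float(box.find("xmax").text), float(box.find("ymax").text)
        # YOLO format: class x_center y_center width height, normalized to [0, 1]
        lines.append(f"{cls_id} {(x1 + x2) / 2 / img_w:.6f} {(y1 + y2) / 2 / img_h:.6f} "
                     f"{(x2 - x1) / img_w:.6f} {(y2 - y1) / img_h:.6f}")
    with open(txt_path, "w") as f:
        f.write("\n".join(lines))
```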

2.4. YOLOv8n Network Structures

The official YOLOv8 [22] implementation comprises five variants: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x, with sequentially increasing model size and parameter count. YOLOv8n is frequently employed in object detection tasks because of its advantages of low parameter volume, rapid detection speed, and high accuracy. The overall network architecture of YOLOv8n is depicted in Figure 5.
While YOLOv8n offers lightweight design and computational efficiency, it exhibits limitations when processing dense small-grain targets such as rice kernels, potentially leading to missed detections and false identifications. The present study addresses these deficiencies through targeted modifications.

2.5. Enhanced YOLOv8n Model

To optimize the YOLOv8n model for the constructed dataset, four key modifications were implemented. Backbone Enhancement: The original C2f module was replaced with a more lightweight C2f-Faster-CGLU structure, reducing computational complexity and parameter count while maintaining robust feature extraction capabilities. Multiscale Feature Fusion: A hierarchical feature integration strategy was incorporated into the feature fusion module, enhancing multiscale object perception through cross-layer feature aggregation. Attention Mechanism Integration: An attention for fine-grained categorization (AFGC) module was embedded after the SPPF block to focus on local details, significantly improving discrimination of subtle target variations. Detection Head Optimization: The detection head was refined into a lightweight Detect_Rice structure, reducing computational costs while enhancing recognition of rice-specific fine-grained features. The architecture of the enhanced model is illustrated in Figure 6.

2.5.1. Lightweight Backbone Modification: C2f-Faster-CGLU

In standard YOLOv8n, the C2f module employs stacked Bottleneck structures to leverage multiscale features. As shown in Figure 7, the original C2f configuration incurs increased parameter volume and model size, leading to higher training and inference time costs.
To enhance detection speed and accuracy, FasterBlock [23] was introduced to refine the YOLOv8n architecture. As the core component of FasterNet, FasterBlock comprises a partial convolution (PConv) layer [24] followed by two 1 × 1 pointwise convolutions, as illustrated in Figure 8. The PConv structure in FasterNet keeps some channels unchanged while applying standard convolution to the remaining channels for feature extraction. This design significantly reduces memory access and computational redundancy, achieving higher FLOPs efficiency with lower computational overhead. The computational complexity of PConv is given by h × w × k² × cp², where h and w represent the feature map dimensions, k denotes the kernel size, and cp indicates the number of channels involved in the convolution. Typically, cp is set to one-quarter of the conventional convolution channel count. When cp = c/4, PConv requires only 1/16 of the FLOPs of standard convolution. This approach optimally leverages hardware capabilities while playing a pivotal role in spatial feature extraction.
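A minimal PyTorch sketch of the PConv idea follows: only the first cp = c/ratio channels are convolved, while the remaining channels are concatenated back unchanged. This is a simplified rendering of the FasterNet design, not the authors’ exact implementation.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: convolve cp = channels // ratio channels, pass the rest through."""
    def __init__(self, channels: int, ratio: int = 4, kernel_size: int = 3):
        super().__init__()
        self.cp = channels // ratio
        self.conv = nn.Conv2d(self.cp, self.cp, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        head, tail = x[:, :self.cp], x[:, self.cp:]
        return torch.cat((self.conv(head), tail), dim=1)

x = torch.randn(1, 64, 80, 80)
print(PConv(64)(x).shape)  # torch.Size([1, 64, 80, 80]), at ~1/16 the conv FLOPs
```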
Subsequently, convolutional GLU (CGLU) [25] was incorporated. By integrating depthwise convolution with parallel computing mechanisms, CGLU substantially overcomes the limitations of traditional GLU [26]. Unlike classical GLU, which relies solely on linear projections for gating, CGLU innovatively employs 3 × 3 depthwise convolutional layers. This preserves spatial feature perception while dramatically accelerating inference through parallel computation. The design reduces both model parameters and computational complexity while enhancing feature extraction efficiency via depthwise convolution’s local receptive fields. This effectively resolves the insufficient spatial sensitivity of pure linear gating mechanisms, achieving superior feature representation within a lightweight framework. The computational complexity of CGLU is as follows:
$$2RHWC^2 + \frac{2}{3}RHWCk^2$$
where R is the expansion ratio (a hyperparameter); H, W, and C denote the feature map height, width, and channel count, respectively; and k is the kernel size of the depthwise convolution.
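The gating idea can be sketched in PyTorch as below: a 3 × 3 depthwise convolution feeds the gate branch, which modulates the value branch element-wise before the output projection. The expansion ratio and activation are assumptions made for illustration; TransNeXt’s reference implementation differs in such details.

```python
import torch
import torch.nn as nn

class CGLU(nn.Module):
    """Convolutional GLU sketch: depthwise-conv-based gate times a linear value branch."""
    def __init__(self, channels: int, expansion: float = 2.0):
        super().__init__()
        hidden = int(channels * expansion)
        self.fc_value = nn.Conv2d(channels, hidden, 1)
        self.fc_gate = nn.Conv2d(channels, hidden, 1)
        self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)  # 3x3 depthwise
        self.act = nn.GELU()
        self.fc_out = nn.Conv2d(hidden, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = self.act(self.dwconv(self.fc_gate(x)))  # spatially aware gating signal
        return self.fc_out(self.fc_value(x) * gate)

x = torch.randn(1, 64, 40, 40)
print(CGLU(64)(x).shape)  # torch.Size([1, 64, 40, 40])
```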
Integrated Improvement: The original C2f module was modified in this study by introducing CGLU after PConv in FasterBlock to form the Faster-CGLU composite module. Bottleneck components in C2f were replaced with this new module to construct the final C2f-Faster-CGLU architecture. This refinement reduced computational redundancy while strengthening feature representation capabilities, significantly boosting inference speed without compromising lightweight design. The enhanced architecture enables rapid detection performance, as depicted in Figure 9.

2.5.2. Enhanced Feature Fusion Architecture: BiFPN

YOLOv8n employs an FPN + PAN structure for feature fusion, incorporating cross-stage partial (CSP) networks within the PAN component. As shown in Figure 10a, the feature pyramid network (FPN) operates in a top-down manner, integrating high-level semantic information into lower layers. Conversely, the path aggregation network (PAN) functions bottom-up to fuse positional information upward (Figure 10b). The combined FPN–PAN architecture in YOLOv8n strengthens the integration of semantic and positional information, ensuring precise detection of multiscale targets. To address diverse target scales in the dataset, we introduced the more efficient bidirectional feature pyramid network (BiFPN) [27]. This implementation involves the following features. Learnable Weighting Mechanism: Assigns trainable scalar weights wi to each input feature level Pi, constrained by ReLU activation (wi ≥ 0) to adaptively prioritize feature scales. Fast Normalized Fusion: Computes fused output as
$$P_{\text{out}} = \frac{\sum_{i} w_i \cdot P_i}{\sum_{i} w_i + a}$$
where a = 10⁻⁴ is a small constant introduced to prevent numerical instability during computation. Bidirectional Cross-Scale Connections: Establishes lateral (same-level) and vertical (cross-level) feature interaction paths. Layer Repetition: The BiFPN block is applied repeatedly to form stacked feature network layers. The resulting architecture is illustrated in Figure 10c.
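A compact PyTorch sketch of the fast normalized fusion step, assuming the input feature maps have already been resampled to a common shape (a full BiFPN additionally applies per-node convolutions, omitted here):

```python
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    """BiFPN-style fusion: ReLU-constrained learnable weights normalized by their sum + a."""
    def __init__(self, num_inputs: int, a: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.a = a

    def forward(self, features):
        w = torch.relu(self.weights)        # enforce w_i >= 0
        w = w / (w.sum() + self.a)          # fast normalized fusion
        return sum(wi * f for wi, f in zip(w, features))

p3 = torch.randn(1, 64, 40, 40)
p4_up = torch.randn(1, 64, 40, 40)          # P4 already upsampled to P3's resolution
fused = FastNormalizedFusion(num_inputs=2)([p3, p4_up])
```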

2.5.3. Attention for Fine-Grained Categorization (AFGC)

Attention for fine-grained categorization (AFGC) [28] is an efficient attention mechanism specifically designed for detecting subtle feature differences. Its core strength lies in accurately capturing and distinguishing object features with minute variations through localized detail focusing.
In paddy grain detection tasks, AFGC demonstrates significant advantages in two key aspects. First, by analyzing grayscale distribution, edge contours, and local texture features, AFGC precisely identifies surface differences between unhulled grains (rough husk surfaces) and intact grains (smooth endosperm textures). It simultaneously distinguishes grain types (e.g., long-grain vs. round-grain varieties) and size variations (e.g., large vs. small kernels). This capability provides intelligent sorting systems with critical data to automatically select target grain types based on variety, intended use, and processing requirements, significantly enhancing sorting efficiency and product quality. Second, AFGC effectively enhances morphological differentiation between fragmented particles and impurities. For instance, broken grains typically exhibit irregular fracture edges and localized grayscale discontinuities, while impurities maintain more stable geometric shapes. This quantifiable differentiation in boundary characteristics reduces misclassification rates of fragmented grains, thereby providing highly reliable data support for impurity assessment and graded sorting of paddy grains.

2.5.4. Lightweight Detection Head: Detect_Rice

The YOLOv8n architecture employs a decoupled head structure, where classification and regression tasks are processed by separate branches. This detection head design consists of two 3 × 3 convolutional layers (Conv3 × 3) and one 1 × 1 convolutional layer (Conv1 × 1). While this task decoupling enhances feature representation, it introduces parameter redundancy, reduces computational efficiency, and exhibits insufficient sensitivity for small target detection. Given the high real-time demands of the paddy grain hulling process, this study introduced a novel Detect_Rice head, with its architecture shown in Figure 11.
To address limitations of the CBS (Conv-BN-SiLU) module in YOLOv8n, this study proposes the Conv_GN module (Figure 12). This modification replaces batch normalization (BN) [29] with group normalization (GN) [30], enhancing localization and classification performance. Additionally, the detection head employs parameter-shared convolutional operations: outputs from the 1 × 1 Conv_GN module feed into the 3 × 3 Conv_GN module, using identical kernels across all positions for feature extraction. This approach reduces model complexity while improving feature representation. Finally, a scale layer is incorporated to adaptively resize features according to target scale variations across detection heads.
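The sketch below illustrates the two ingredients just described: a Conv_GN block and a prediction stack shared across detection scales with a per-scale learnable factor. Channel counts, group numbers, and the single shared output are simplifying assumptions; the actual Detect_Rice head (Figure 11) keeps separate classification and regression branches.

```python
import torch
import torch.nn as nn

class ConvGN(nn.Module):
    """Conv_GN sketch: convolution + group normalization + SiLU (GN replaces the BN in CBS)."""
    def __init__(self, c_in: int, c_out: int, k: int = 3, groups: int = 16):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False)
        self.gn = nn.GroupNorm(groups, c_out)  # c_out must be divisible by groups
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.gn(self.conv(x)))

class SharedHead(nn.Module):
    """One Conv_GN stack reused by every detection scale; a learnable per-scale
    scalar compensates for the differing target sizes at each level."""
    def __init__(self, c: int, num_outputs: int, num_levels: int = 3):
        super().__init__()
        self.stem = nn.Sequential(ConvGN(c, c, 1), ConvGN(c, c, 3))
        self.pred = nn.Conv2d(c, num_outputs, 1)
        self.scales = nn.Parameter(torch.ones(num_levels))

    def forward(self, feats):  # feats: one tensor per scale, each with c channels
        return [self.pred(self.stem(f)) * self.scales[i] for i, f in enumerate(feats)]

feats = [torch.randn(1, 64, s, s) for s in (80, 40, 20)]
outs = SharedHead(c=64, num_outputs=8)(feats)
```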

3. Results

3.1. Parameter Configuration

Experiments were conducted on a 64-bit Windows 10 system within an Anaconda environment. A PyTorch virtual environment was established using Python 3.11.4 and PyTorch version 2.1.0, with GPU acceleration enabled via CUDA 11.8. Detailed hardware and software specifications are presented in Table 2.
The training hyperparameters were configured as follows: dataset partitioning ratio, 70% training, 15% validation, 15% testing; input image dimensions, 640 × 640 pixels; number of target classes, 4; optimizer, AdamW with an initial learning rate of 0.001; label smoothing coefficient, 0; momentum, 0.9; batch size, 16; training epochs, 100. Upon completion, the optimal model weights were saved and subsequently used for validation on the paddy grain test data.
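Expressed through the Ultralytics training API, this configuration would look roughly like the sketch below; the two YAML file names are hypothetical placeholders for the modified architecture definition and the dataset configuration.

```python
from ultralytics import YOLO

model = YOLO("yolov8n-rice.yaml")  # hypothetical YAML describing the modified network
model.train(
    data="paddy_grains.yaml",      # hypothetical dataset config: 4 classes, 70/15/15 split
    imgsz=640,
    epochs=100,
    batch=16,
    optimizer="AdamW",
    lr0=0.001,                     # initial learning rate
    momentum=0.9,
    label_smoothing=0.0,
)
```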

3.2. Evaluation Metrics

Model performance was assessed using five key metrics: precision (P), the accuracy of positive predictions; recall (R), the proportion of actual positives correctly identified; mAP@50, the mean average precision at an IoU threshold of 50%; GFLOPs, giga floating-point operations, a measure of computational complexity; and FPS, the number of frames processed per second. Precision and recall are calculated as:
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
where TP represents the number of samples correctly predicted as positive (true positives); FP represents the number of samples incorrectly predicted as positive but actually negative (false positives); and FN represents the number of samples incorrectly predicted as negative but actually positive (false negatives).
mAP@50 refers to the mean average precision (mAP) calculated at an intersection over union (IoU) threshold of 50% across all classes, serving as a core evaluation metric in object detection. It comprehensively reflects the performance of four key metrics: precision (P), recall (R), IoU, and average precision (AP). All subsequent references to “average precision” in this study denote this metric. GFLOPs quantifies computational complexity, while FPS measures real-time processing capability—higher values indicate faster target detection.
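As a worked example of the two formulas (counts invented for illustration):

```python
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

# E.g., 290 grains detected correctly, 10 spurious boxes, 12 grains missed
print(f"P = {precision(290, 10):.3f}")  # 0.967
print(f"R = {recall(290, 12):.3f}")     # 0.960
```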

3.3. Network Model Training Results

The improved model was trained and evaluated on the paddy grain dataset using 640 × 640 pixel input images. Training results (Figure 13) revealed two key observations. In regard to classification loss dynamics, during the initial 20 epochs, the classification loss (cls_loss) exhibited a rapid decline. By approximately epoch 40, both training and validation loss curves plateaued with minimal divergence (Figure 13a), indicating no overfitting occurred during training. In regard to mAP@50 progression, the mean average precision (mAP@50) demonstrated its most significant increase within the first 20 epochs. Beyond epoch 40, the growth rate substantially decreased, stabilizing at approximately 97% (Figure 13b). This high-performance convergence validates the effectiveness of the proposed model enhancements.

3.4. Comparative Experiments

To validate the effectiveness of the constructed dataset and evaluate the performance advantages of the improved model in shelling rate and brown rice breakage rate detection tasks, comparative experiments were conducted using multiple state-of-the-art object detection models. Building upon this foundation, lightweight design optimizations were implemented within the YOLOv8n framework to develop an enhanced model achieving a superior accuracy–efficiency balance. This ultimately yielded a lightweight YOLOv8n detection method tailored for rice quality inspection.

3.4.1. Effectiveness of C2f-Faster-CGLU Module

To verify the improvement effects of the C2f-Faster-CGLU module in different network components, three comparative experiments were designed (Table 3), with optimal values highlighted in bold. Experimental results demonstrated that replacing the C2f module in the backbone network of the baseline model with C2f-Faster-CGLU increased precision (P) by 0.3% and mean average precision by 0.7%, both achieving optimal performance. Concurrently, computational load (GFLOPs) decreased by 16.0% while parameters were compressed by 14.5%, confirming the critical role of feature extraction layer optimization in balancing accuracy and efficiency. Consequently, the study adopted C2f-Faster-CGLU replacement in the backbone network.

3.4.2. Comparative Experiment on Improved AFGC Attention Mechanism

To validate the effectiveness of the proposed AFGC attention mechanism, comparative experiments were conducted against mainstream attention methods including efficient channel attention (ECA) [31], parameter-free attention SimAM [32], and the hybrid spatial-channel attention mechanisms CBAM [33] and GAM [34]. All experiments were performed under identical hardware/software configurations with equivalent training iterations. Results are presented in Table 4, with optimal values highlighted in bold.
As evidenced in Table 4, the AFGC mechanism achieved 95.5% mAP—a 0.8% improvement over baseline—significantly outperforming ECA (+0.5%), SimAM (+0.4%), and CBAM (+0.2%). Although precision (93.3%) showed a marginal decrease compared with baseline (94.0%), AFGC demonstrated superior balance between recall and precision metrics. Crucially, AFGC exhibited exceptionally lightweight characteristics: while matching GAM’s 95.5% mAP, it maintained significantly lower parameter counts (3.0 M vs. 4.6 M) and computational load (7.7 GFLOPs vs. 9.4 GFLOPs). This confirms AFGC’s optimal accuracy–efficiency tradeoff, particularly suitable for applications demanding strict model size and computational constraints.

3.4.3. Ablation Experiments

The experimental results indicated that the model structure achieved the highest accuracy when the C2f-Faster-CGLU module replaced the corresponding part in the YOLOv8n backbone network and further achieved optimal model accuracy with the addition of the AFGC attention mechanism. To analyze the impact of each module in the algorithm on performance, a series of ablation experiments were designed to validate their contributions to the final performance. The performance on the test set after training on the selected dataset is shown in Table 5, with the optimal values highlighted in bold.
As shown in Table 5, replacing the C2f module in the backbone network with the C2f-Faster-CGLU module increased the model’s mean average precision (mAP) by 0.7 percentage points. This enhancement significantly strengthened the model’s ability to extract rice grain features while simultaneously reducing computational cost by 16.0% and compressing the number of parameters by 14.5%. Building upon this, integrating the BiFPN module significantly boosted the recall rate (R) from 93.6% to 95.5% (+1.9 percentage points) through its multiscale feature fusion strategy, further increasing the mAP to 95.8%. This demonstrates the effectiveness of multiscale feature fusion in enhancing the robustness of target localization, all while maintaining the computational cost at 6.8 GFLOPs. Subsequently, introducing the AFGC attention mechanism, which focuses on paddy grain texture details, raised the mAP by an additional 0.5 percentage points, highlighting the adaptability of the attention mechanism to complex scenes. Finally, with the complete module combination of C2f-Faster-CGLU + BiFPN + AFGC + Detect_Rice, the model achieved a significantly lighter weight (2.02 M parameters, a 32.6% reduction) and more efficient inference (5.4 GFLOPs, a 33.3% reduction). While maintaining a high mAP of 96.2%, this configuration delivered the optimal overall performance.
Table 6 compares the detection performance of the improved model against the original model across different paddy grain categories (full, unshelled, broken, other), with optimal values highlighted in bold. It is evident that the improved model exhibited superior recognition capability for all four categories, achieving higher accuracy rates and effectively demonstrating the efficacy of the proposed model.

3.4.4. Visualization and Analysis of Baseline Comparison

To visually demonstrate the performance of the proposed improved algorithm in paddy grain detection, two samples each of long-grain indica rice and round-grain japonica rice were randomly selected from the test set. A comparison was conducted using the original YOLOv8n algorithm and the proposed improved algorithm on these samples. The left side of the visualization shows the detection results from the original algorithm, while the right side shows those from the improved algorithm. The detection results are presented in Figure 14, where yellow circles indicate false positives and blue circles highlight overlapping detections. The first two rows depict loosely and densely arranged round-grain japonica rice samples, respectively, while the last two rows show loosely and densely arranged long-grain indica rice samples, respectively.
Analysis of the detection results in Figure 14 reveals that the original YOLOv8n algorithm exhibited varying degrees of false positives in the first, second, and third row samples because of small target sizes and lighting conditions. Particularly in the second row sample, the dense arrangement and small scale of the targets led to feature confusion and overlap, resulting in multiple detection boxes redundantly annotating the same target. In contrast, the proposed improved model significantly reduced both the false positive rate and redundant annotations. It also demonstrated superior detection precision compared with the original model, indicating that the new method offers significant advantages in enhancing detection accuracy and stability.

3.4.5. Visualization and Analysis of Multi-Model Comparison

To comprehensively validate the precision advantages of our algorithm in paddy grain detection tasks, comparative experiments were conducted using identical experimental environments and datasets against state-of-the-art detection algorithms. Selected comparison models included YOLOv3-tiny [35], YOLOv5n [22], YOLOv6n [22], YOLOv8n, YOLOv9t [36], YOLOv10n [37], and YOLO11n [38], with results detailed in Table 7 (optimal values in bold).
As evidenced in Table 7, the proposed algorithm demonstrates superior performance in model parameter scale, size, and computational complexity compared with benchmark models while achieving significantly higher mean average precision. These results indicate enhanced generalization capability and robustness for multiscale object detection tasks, with particularly outstanding comprehensive detection accuracy. Furthermore, real-time performance testing on a GTX 1650 GPU platform achieved 89.94 FPS, fully satisfying the high-speed detection requirements of industrial applications.
Visual comparisons of detection results between the improved model and other YOLO series models on the custom dataset are presented in Figure 15. Yellow circles indicate false detections, while blue circles denote overlapping detections. The first two rows display detection results for long-grain indica rice under low-light and high-light conditions, respectively, followed by round-grain japonica rice under corresponding illumination scenarios. Overall, the proposed model exhibited significantly higher robustness and detection integrity than other methods, which consistently showed misidentifications or missed detections. Particularly in challenging environments, the study algorithm demonstrated superior adaptability to low-light conditions, strong reflections, and complex backgrounds.

3.5. Validation Experiments for Shelling Rate and Brown Rice Breakage Rate

3.5.1. Verification of Quantity–Mass Conversion Relationship Validity

To validate the accuracy of converting grain counts to mass parameters, a dual-method comparative experiment was designed. National Standard Weighing Method: Following GB/T 29898-2013, the masses of hulled, unhulled, and broken grains were directly measured to calculate SR (shelling rate) and BR (brown rice breakage rate). Formula Verification Method: The grains of each category were manually counted, and the counts were substituted into Equation (4) to derive the corresponding SR and BR. Five samples each of long-grain indica rice (Samples 1–5) and round-grain japonica rice (Samples 6–10) were tested, with 50 grains per sample. Table 8 presents the calculation results and absolute errors (AE), validating the reliability of the quantity–mass conversion model.
As evidenced in Table 8, the absolute errors (AE) for both SR and BR measurements across rice varieties remained below the national standard tolerance threshold (≤2.5%). This confirms that the quantity–mass prediction model based on linear regression can effectively substitute for traditional weighing methods, providing a theoretical foundation for subsequent automated detection systems.

3.5.2. SR/BR Detection Comparative Experiment Based on Different Models

To validate the practicality of the improved YOLOv8n integrated with the quantity–mass conversion model and evaluate the performance differences among lightweight object detection models in rice shelling rate (SR) and brown rice breakage rate (BR) detection, a comparative experiment was designed, and the 10 sample groups from Section 3.5.1 were expanded to 20 sample groups per illumination condition. Three representative illumination conditions simulating field scenarios were tested: Normal Illumination, Low Illumination, and High Illumination. For each sample group under each illumination condition, the grain counts detected by each model were substituted into Formula (4) to derive SR and BR values, which were then compared with results from the national standard weighing method (GB/T 29898–2013). Absolute errors (AE) were subsequently calculated. Table 9 presents the average detection errors of the seven models across the 20 sample groups under each illumination condition (unit: %).
Experimental results demonstrated that the proposed improved model, combined with a quantity–mass prediction approach, achieved excellent and stable performance in detecting rice shelling rate and brown rice breakage rate under varying illumination conditions. Under normal lighting, the detection errors (Avg. AE_SR: 1.11%, AE_BR: 1.24%) consistently remained below the national standard threshold (≤2.5%). Notably, the model also exhibited strong robustness under challenging lighting conditions, with low errors observed under both low illumination (Avg. AE_SR: 1.35%, AE_BR: 1.41%) and high illumination (Avg. AE_SR: 1.28%, AE_BR: 1.33%). The model maintained a high real-time processing speed of 89.94 FPS under normal illumination, which not only supports real-time detection but provides sufficient computational headroom to accommodate fluctuations in operational parameters such as conveyor belt speed. Although YOLOv11n and YOLOv8n showed relatively better adaptability to illumination changes among baseline models (low-light AE_SR/AE_BR: 1.88/1.95 and 2.15/1.68, respectively), their performance degraded more significantly under nonideal lighting compared with the proposed model. Thus, they remain viable alternatives under stable lighting conditions, offering advantages in lightweight deployment and precision–speed tradeoff, respectively.

4. Discussion

In this study, we selected YOLOv8n as the base model because of its inherent advantages in deployability and lightweight design, as detailed in Section 2.4. Building upon this foundation, we developed an enhanced YOLOv8n-based model specifically optimized for detecting rice shelling rate (SR) and brown rice breakage rate (BR). Key improvements include: (1) replacing the backbone’s C2f module with the computationally efficient C2f-Faster-CGLU structure, integrating partial convolution (PConv) and convolutional gated linear unit (CGLU) to enhance small-target feature extraction; (2) implementing BiFPN for weighted multiscale feature fusion to improve boundary localization of densely packed grains; (3) embedding the AFGC attention mechanism to focus on critical texture and damage details; and (4) designing the lightweight Detect_Rice head using group normalization and dynamic convolution sharing. Ablation studies (Table 5) systematically demonstrated the contribution of each module, with the full model achieving 96.8% precision, 95.3% recall, and 96.2% mAP—significantly outperforming the baseline while reducing parameters by 32.6% (2.02 M) and computational load by 33.3% (5.4 GFLOPs). Visualization comparisons (Figure 14 and Figure 15) further confirmed the model’s superior robustness against challenges like dense adhesion, uneven illumination, and scale variation.
The proposed model achieved state-of-the-art performance among lightweight detectors (Table 7), outperforming YOLOv3-tiny (+2.3% mAP), YOLOv5n (+1.4% mAP), and even recent variants such as YOLOv10n (+2.0% mAP) and YOLOv11n (+0.6% mAP) while maintaining real-time capability (89.94 FPS). When integrated with the quantity–mass conversion model, the method achieved SR and BR detection errors of 1.11% and 1.24%, respectively (Table 9), well below the national standard thresholds (indica BR ≤ 5%, japonica BR ≤ 2%). These results demonstrate its strong potential for deployment in resource-constrained rice husking equipment.
Limitations and Future Work: While the proposed method demonstrated robustness under the tested conditions, performance may degrade under extreme lighting or severe grain occlusion. Future efforts will include: (1) near-infrared (NIR) spectral fusion—integration of a 900–1700 nm hyperspectral imaging system with a cross-modal attention mechanism to fuse visible and NIR features, constructing illumination-invariant representations; (2) transformer-enhanced occlusion handling—development of a cascaded deformable transformer module leveraging deformable attention to dynamically focus on contextual information in overlapping regions, combined with graph neural networks for modeling intergrain topological relationships; (3) edge deployment optimization—implementation of hardware-aware neural architecture search (NAS) to automatically generate compressed models deployable on edge devices (e.g., Jetson Orin); (4) cross-crop generalization: adoption of a meta-learning strategy within the transfer learning framework, enabling few-shot fine-tuning on granular crops (e.g., wheat, sorghum) to validate cross-species adaptability.

5. Conclusions

This study presents a high-precision, lightweight detection method for quantifying rice shelling rate (SR) and brown rice breakage rate (BR) based on an enhanced YOLOv8n architecture. The key innovations and findings are summarized as follows. Architecture Optimization: The backbone network was strengthened by replacing the standard C2f module with C2f-Faster-CGLU, integrating PConv for computational efficiency and CGLU for enhanced feature representation. This reduced GFLOPs by 16.0% while improving mAP by 0.7%. Feature Fusion and Attention: BiFPN was implemented for efficient multiscale feature fusion, boosting recall by 1.9%. The AFGC attention mechanism was embedded to focus on critical texture and damage details, further increasing mAP by 0.5% and enhancing robustness against illumination variations. Lightweight Detection Head: The novel Detect_Rice head, utilizing group normalization and parameter-shared convolution, compressed model size by 33.3% (to 4.0 MB) and parameters by 32.6% (to 2.02 M) while maintaining high accuracy. Integrated Detection Framework: Combining the improved YOLOv8n model with a rigorously validated linear quantity–mass conversion model enabled direct calculation of SR and BR aligned with national standards (GB/T 29898–2013). The final system achieved exceptional performance metrics: 96.8% precision, 95.3% recall, 96.2% mAP, and real-time inference at 89.94 FPS.
Superior Practical Performance: Validation experiments confirmed SR and BR detection errors of only 1.11% and 1.24%, respectively, with stable performance maintained across varying illumination conditions. The model significantly outperformed existing YOLO variants (from YOLOv3-tiny to YOLOv11n) and met stringent industrial requirements for rice husking quality control.
This research provides a high-precision, real-time solution for intelligent rice husking sorting. The proposed framework effectively addresses the challenges of small particle size, dense adhesion, and uneven illumination, paving the way for automated, nondestructive quality monitoring in rice processing lines. Future work will focus on enhancing robustness under extreme conditions and extending the methodology to other granular agricultural products.

Author Contributions

Conceptualization, Z.W. and Z.Z.; data curation, X.H.; formal analysis, L.L. and Y.Z. (Yehao Zhang); methodology, Z.W.; software, Y.Z. (Yehao Zhang); validation, F.S. and X.H.; visualization, Y.Z. (Yehao Zhang); writing—original draft, Y.Z. (Yehao Zhang) and H.Z.; writing—review and editing, Z.Z. and Y.Z. (Yufei Zhou). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Department of Science and Technology of Henan Province (Grant No. 232103810085, 242102220029).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available because of ongoing study.

Acknowledgments

Thanks to all of the authors cited in this article and the referees for their helpful comments and suggestions.

Conflicts of Interest

Author Fasheng Shen was employed by the company Shandong Alesmart Intelligent Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

All abbreviations in this article are explained as follows:
YOLO: You Only Look Once
C2f: Faster implementation of CSP bottleneck with two convolutions
AFGC: Attention for fine-grained categorization
SPPF: Spatial pyramid pooling with feature concatenation
PConv: Partial convolution
FasterNet: Fast and efficient neural network
FasterBlock: Faster network block
GLU: Gated linear unit
CGLU: Convolutional gated linear unit
BiFPN: Bidirectional feature pyramid network
FPN: Feature pyramid network
PAN: Path aggregation network
CSP: Cross-stage partial network
BN: Batch normalization
GN: Group normalization
CBS: Conv-BN-SiLU
IoU: Intersection over union
GFLOPs: Giga floating-point operations
FPS: Frames per second
mAP@50: Mean average precision (IoU = 50%)
ECA: Efficient channel attention
SimAM: Simple parameter-free attention module
CBAM: Convolutional block attention module
GAM: Global attention mechanism

References

1. Cheng, K.; Yan, M.; Nayak, D.; Pan, G.X.; Smith, P.; Zheng, J.F.; Zheng, J.W. Carbon Footprint of Crop Production in China: An Analysis of National Statistics Data. J. Agric. Sci. 2015, 153, 422–431.
2. Ren, J.X. Intelligent Rice Huller Technical Effect Online Evaluation and Process Parameter Optimization. Master’s Thesis, Henan University of Technology, Zhengzhou, China, 2024.
3. Cui, B.; Yang, L.; Fan, Y.C.; Wu, J.F.; Xu, Z.L.; Song, S.Y.; Zhang, Y.L. Simulation Analysis of the Structure and Husking Process of Rice Husk. Wuhan Polytech. Univ. 2023, 42, 25–30.
4. Cheng, M.; Zhou, S.H.; Wang, M.X.; Zhang, C.; Xu, X.M. Design and implementation of auxiliary calculation system for huller rotor system structural parameters. Food Mach. 2024, 40, 90–96.
5. Wang, R.; Li, J.H.; Xiao, C.Y.; Wang, C.; Xie, J.B.; Pang, X.B.; Peng, Z.H. Structural optimization design and experiment of polyploid rice processing equipment. Hubei Agric. Sci. 2021, 60, 191–193.
6. Zhang, C.Y. Problems and treatment measures of rice storage in southern shallow circular silos. Grain Oil Storage Technol. Commun. 2024, 40, 29–31.
7. Liu, Y.Q.; Ren, R.L.; Gu, R.G.; Shang, W.; Zhang, S.X.; Ruan, J.L. Research and practice of rubber roller huller based on dual-frequency motor drive. Grain Process. 2021, 46, 55–58.
8. Jiang, X.S.; Ji, K.H.; Jiang, H.Z.; Zhou, H.P. Research progress of deep learning in non-destructive detection of fruit quality. Trans. Chin. Soc. Agric. Eng. 2024, 40, 1–16.
9. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149.
10. Wu, W.; Rui, L.; Chen, C.; Tao, L.; Kai, Z.; Chengming, S.; Sun, C.; Guo, W. Detection and enumeration of wheat grains based on a deep learning method under various scenarios and scales. J. Integr. Agric. 2020, 19, 1998–2008.
11. Liu, S.T.; Mou, Y.; Chen, W.Z. Broken rice detection dataset based on improved YOLOv5. J. Chin. Cereals Oils Assoc. 2024, 39, 140–148.
12. Zhang, W.R.; Du, Y.F.; Li, X.Y.; Liu, L.; Wang, L.Z.; Wu, Z.K. Online detection method of corn kernel harvest quality based on FSLYOLO v8n. Trans. Chin. Soc. Agric. Mach. 2024, 55, 253–265.
13. Sun, K.; Zhang, Y.J.; Tong, S.Y.; Tang, M.D.; Wang, C.B. Study on rice grain mildewed region recognition based on microscopic computer vision and YOLO-v5 model. Foods 2022, 11, 4031.
14. Li, J.; Liu, Y.; Li, C.; Luo, Q.; Lu, J. Pineapple Detection with YOLOv7-Tiny Network Model Improved via Pruning and a Lightweight Backbone Sub-Network. Remote Sens. 2024, 16, 2805.
15. Wang, B.; Liu, P.; Tian, H.; Ren, H.; Cao, Y.; Li, S.; Wei, R. A lightweight particle detection algorithm based on an improved YOLOv8. J. Phys. Conf. Ser. 2024, 2816, 012093.
16. Shi, J.; Xiong, K.X.; Li, Z.; Chen, L.C.; Tang, X.Y.; Yang, L.L. Lightweight improved YOLOv8 model and edge computing-based corn disease detection system. Jiangsu J. Agric. Sci. 2025, 41, 313–322.
17. Zou, Y.; Tian, Z.; Cao, J.; Ren, Y.; Zhang, Y.; Liu, L.; Zhang, P.; Ni, J. Rice grain detection and counting method based on TCLE–YOLO model. Sensors 2023, 23, 9129.
18. Sun, J.; Jia, H.; Ren, Z.; Cui, J.; Yang, W.; Song, P. Accurate rice grain counting in natural morphology: A method based on image classification and object detection. Comput. Electron. Agric. 2024, 227, 109490.
19. GB/T 29898-2013; Grain and Oil Machinery—Rubber Roll Husker. Standards Press of China: Beijing, China, 2013.
20. Wang, Z.X.; Wang, W.P.; Song, S.Y. Simulation analysis of parameters affecting dehulling efficiency of rubber roller huller based on ADAMS. Food Mach. 2023, 39, 83–87.
21. Fan, F.J.; Shi, Y. Effects of data quality and quantity on deep learning for protein–ligand binding affinity prediction. Bioorg. Med. Chem. 2022, 72, 117003.
22. Terven, J.; Córdova-Esparza, D.M.; Romero-González, J.A. A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716.
23. Chen, J.; Kao, S.H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.H.; Chan, S.H.G. Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 12021–12031.
24. Liu, G.; Reda, F.A.; Shih, K.J.; Wang, T.C.; Tao, A.; Catanzaro, B. Image Inpainting for Irregular Holes Using Partial Convolutions. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 85–100.
25. Shi, D. TransNext: Robust Foveal Visual Perception for Vision Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 17–21 June 2024; pp. 17773–17783.
26. Dauphin, Y.N.; Fan, A.; Auli, M.; Grangier, D. Language Modeling with Gated Convolutional Networks. In Proceedings of the International Conference on Machine Learning (ICML), Sydney, Australia, 6–11 August 2017; pp. 933–941.
27. Zhu, L.; Deng, Z.; Hu, X.; Fu, C.W.; Xu, X.; Qin, J.; Heng, P.A. Bidirectional Feature Pyramid Network with Recurrent Attention Residual Modules for Shadow Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 121–136.
28. Sermanet, P.; Frome, A.; Real, E. Attention for Fine-Grained Categorization. arXiv 2015, arXiv:1412.7054.
29. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015; pp. 448–456.
30. Wu, Y.X.; He, K.M. Group normalization. Int. J. Comput. Vis. 2020, 128, 742–755.
31. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542.
32. Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. SIMAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning (ICML), Virtual, 18–24 July 2021; pp. 11863–11874.
33. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
34. Liu, Y.; Shao, Z.; Hoffmann, N. Global attention mechanism: Retain information to enhance channel-spatial interactions. arXiv 2021, arXiv:2112.05561.
  35. Adarsh, P.; Rathi, P.; Kumar, M. YOLOv3-Tiny: Object Detection and Recognition Using One Stage Improved Model. In Proceedings of the 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 6–7 March 2020; pp. 687–694. [Google Scholar]
  36. Wang, C.Y.; Yeh, I.H.; Mark Liao, H.Y. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 7–13 September 2024; Part VIII, LNCS 14808. Springer: Cham, Switzerland, 2024; pp. 1–21. [Google Scholar]
  37. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J. YOLOv10: Real-Time End-to-End Object Detection. Adv. Neural Inf. Process. Syst. 2024, 37, 107984–108011. [Google Scholar]
  38. Khanam, R.; Hussain, M. YOLOv11: An overview of the key architectural enhancements. arXiv 2024, arXiv:2410.17725. [Google Scholar] [CrossRef]
Figure 1. Schematic diagram of the husking principle in a rubber roller husker.
Figure 2. Linear fitting results of grain mass versus grain count for japonica and indica rice.
Figure 3. Image acquisition system.
Figure 4. Representative paddy grain images acquired by the system: (a) japonica rice (round-grain) under low-light conditions, (b) japonica rice (round-grain) under normal lighting, (c) japonica rice (round-grain) under high-intensity lighting; (d) indica rice (long-grain) under low-light conditions, (e) indica rice (long-grain) under normal lighting, (f) indica rice (long-grain) under high-intensity lighting.
Figure 5. Baseline network architecture of YOLOv8n, consisting of a backbone (C2f modules), a neck (PANet-like feature pyramid), and a detection head.
Figure 6. Network architecture of the enhanced YOLOv8n model tailored for rice shelling rate (SR) and brown rice breakage rate (BR) detection. The improved model incorporates four key modifications: (1) a lightweight C2f-Faster-CGLU backbone, (2) hierarchical multiscale feature fusion, (3) an AFGC attention module for fine-grained categorization, and (4) a customized Detect_Rice detection head. These enhancements improve efficiency and accuracy in detecting small, densely clustered rice grains.
Figure 7. Schematic of C2f structure.
Figure 8. Architecture of FasterBlock.
Figure 9. Architecture of C2f-Faster-CGLU.
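For readers implementing the FasterBlock of Figures 8 and 9, the core idea of partial convolution (PConv) [23] is compact: only a fixed fraction of the channels is convolved, and the rest bypass the operation, cutting FLOPs and memory access. The PyTorch sketch below illustrates this under the FasterNet default split ratio of 1/4; the class name and default ratio are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial-convolution sketch in the FasterNet style [23]: only the
    first 1/n_div of the channels pass through a 3x3 convolution, while
    the remaining channels are forwarded untouched. The split ratio of
    1/4 is the FasterNet default, assumed here."""

    def __init__(self, channels: int, n_div: int = 4):
        super().__init__()
        self.conv_ch = channels // n_div  # channels that are actually convolved
        self.conv = nn.Conv2d(self.conv_ch, self.conv_ch, 3, 1, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.conv_ch, x.size(1) - self.conv_ch], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)

# Shape is preserved while only 16 of 64 channels are convolved.
y = PConv(64)(torch.randn(1, 64, 80, 80))
assert y.shape == (1, 64, 80, 80)
```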
Figure 10. Comparison of three different feature fusion architectures evaluated. The BiFPN introduces four key improvements: (1) a learnable weighting mechanism that adaptively prioritizes multiscale features, (2) fast normalized fusion to ensure numerical stability, (3) bidirectional cross-scale connections for enhanced feature interaction, and (4) repeated application of the BiFPN layer to refine feature representation. These enhancements improve detection accuracy for multiscale targets in complex rice grain scenes.
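The fast normalized fusion named in Figure 10 can be written as O = Σᵢ (wᵢ / (ε + Σⱼ wⱼ)) · Iᵢ, with learnable scalar weights wᵢ kept non-negative. A minimal PyTorch sketch follows, assuming the EfficientDet formulation (ReLU-clamped weights, ε = 1e-4); it is illustrative rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    """Fast normalized fusion sketch: O = sum_i (w_i / (eps + sum_j w_j)) * I_i.
    Clamping the weights with a ReLU and normalizing by their sum avoids a
    softmax while keeping the fusion weights in [0, 1]. eps = 1e-4 follows
    the EfficientDet formulation and is an assumption here."""

    def __init__(self, n_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, inputs):
        w = torch.relu(self.w)        # keep fusion weights non-negative
        w = w / (self.eps + w.sum())  # fast normalization, no exponentials
        return sum(wi * x for wi, x in zip(w, inputs))

# Fusing two feature maps already resampled to a common resolution.
fuse = FastNormalizedFusion(2)
out = fuse([torch.randn(1, 64, 40, 40), torch.randn(1, 64, 40, 40)])
```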
Figure 11. Lightweight Detect_Rice head. The main improvements include: (1) replacing the CBS (Convolution-BatchNorm-SiLU) module with a Conv_GN module to enhance localization and classification performance; (2) introducing parameter-shared convolutional operations between the 1 × 1 and 3 × 3 Conv_GN layers to reduce model complexity; and (3) incorporating a scale layer for adaptive feature resizing in response to target scale variations. These modifications aim to improve real-time inference efficiency and sensitivity to small-grain targets.
Figure 12. Structure of Conv_GN module.
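A minimal sketch of the Conv_GN module of Figure 12 and the scale layer of Figure 11: GroupNorm [30] replaces BatchNorm so that normalization statistics no longer depend on batch size, and a learnable per-level scalar lets a parameter-shared head re-calibrate its output for each feature-map resolution. The group count (16) and the retained SiLU activation are assumptions not specified here.

```python
import torch
import torch.nn as nn

class ConvGN(nn.Module):
    """Conv_GN sketch: convolution + GroupNorm [30] + SiLU. GroupNorm
    computes statistics over channel groups instead of the batch, so
    accuracy holds at the small batch sizes typical of online inference.
    c_out must be divisible by the group count."""

    def __init__(self, c_in: int, c_out: int, k: int = 3, s: int = 1, groups: int = 16):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.gn = nn.GroupNorm(groups, c_out)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.gn(self.conv(x)))

class Scale(nn.Module):
    """Learnable per-level scalar (cf. Figure 11): convolutions shared
    across pyramid levels can still adapt their output per resolution."""

    def __init__(self, init: float = 1.0):
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.scale

# One shared ConvGN applied to two pyramid levels, each with its own Scale.
shared = ConvGN(64, 64)
s3, s4 = Scale(), Scale()
p3, p4 = torch.randn(1, 64, 80, 80), torch.randn(1, 64, 40, 40)
o3, o4 = s3(shared(p3)), s4(shared(p4))
```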
Figure 13. Training process metrics.
Figure 14. Comparative visualization of paddy grain detection results: (a) YOLOv8n detection results, (b) improved algorithm detection results.
Figure 15. Visual comparison of detection results across models on custom dataset: (a) YOLOv3-tiny, (b) YOLOv5n, (c) YOLOv6n, (d) YOLOv9t, (e) YOLOv10n, (f) YOLOv11n, (g) YOLOv8n, (h) improved algorithm.
Table 1. Regression model parameters.

| Grain Type | Slope aₖ (mg/grain) | Intercept bₖ (mg) | R² |
| --- | --- | --- | --- |
| Japonica rice, unhulled | 27.563 | +3.644 | 0.999928 |
| Japonica rice, hulled | 23.714 | −5.787 | 0.999859 |
| Japonica rice, broken | 11.441 | +7.324 | 0.998006 |
| Indica rice, unhulled | 22.656 | +4.650 | 0.999588 |
| Indica rice, hulled | 18.619 | −0.494 | 0.999777 |
| Indica rice, broken | 9.460 | +3.157 | 0.998559 |
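Table 1's fits let detected grain counts be converted to mass as mₖ = aₖ·nₖ + bₖ for class k, after which SR and BR follow from mass ratios. The sketch below applies the japonica coefficients; the SR/BR expressions used here are the conventional mass-fraction definitions and are assumptions, since the paper's exact formulas are given in an earlier section.

```python
# Minimal sketch of the quantity-mass conversion using Table 1's japonica
# coefficients: mass_k = a_k * n_k + b_k (mg), where n_k is the detected
# grain count for class k.

COEFFS_JAPONICA = {  # class -> (slope a_k in mg/grain, intercept b_k in mg)
    "unhulled": (27.563, 3.644),
    "hulled": (23.714, -5.787),
    "broken": (11.441, 7.324),
}

def mass_mg(grain_class: str, count: int) -> float:
    a, b = COEFFS_JAPONICA[grain_class]
    return a * count + b

def shelling_rate(n_unhulled: int, n_hulled: int, n_broken: int) -> float:
    """Assumed SR: brown-rice mass share of the total grain mass (%)."""
    m_brown = mass_mg("hulled", n_hulled) + mass_mg("broken", n_broken)
    return 100.0 * m_brown / (m_brown + mass_mg("unhulled", n_unhulled))

def breakage_rate(n_hulled: int, n_broken: int) -> float:
    """Assumed BR: broken-kernel mass share of the brown-rice mass (%)."""
    m_broken = mass_mg("broken", n_broken)
    return 100.0 * m_broken / (m_broken + mass_mg("hulled", n_hulled))

print(f"SR = {shelling_rate(50, 400, 30):.1f}%, BR = {breakage_rate(400, 30):.1f}%")
```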
Table 2. Model training parameters and environment configuration.

| Configuration | Parameter |
| --- | --- |
| GPU | NVIDIA GeForce GTX 1650 |
| Framework version | PyTorch 2.1.0 + cuDNN 8.7.0 + CUDA 11.8 |
Table 3. Performance comparison of C2f-Faster-CGLU replacements at different positions.

| Model/Position | P/% | R/% | mAP/% | Model Size/MB | Parameters | GFLOPs |
| --- | --- | --- | --- | --- | --- | --- |
| YOLOv8n (Baseline) | 94.0 | 94.4 | 94.7 | 6.0 | 3,006,428 | 8.1 |
| Backbone Replacement | 94.3 | 93.6 | 95.4 | 5.1 | 2,569,252 | 6.8 |
| Neck Replacement | 93.3 | 94.9 | 95.3 | 5.0 | 2,510,212 | 6.8 |
| Backbone + Neck Replacement | 93.8 | 94.7 | 95.3 | 4.4 | 2,175,268 | 6.0 |
Table 4. Performance comparison of various attention mechanisms.

| Attention Mechanism | P/% | R/% | mAP/% | Model Size/MB | Parameters | GFLOPs |
| --- | --- | --- | --- | --- | --- | --- |
| YOLOv8n (Baseline) | 94.0 | 94.4 | 94.7 | 6.0 | 3,006,428 | 8.1 |
| YOLOv8n-ECA | 93.4 | 92.8 | 95.0 | 6.0 | 3,006,431 | 8.1 |
| YOLOv8n-SimAM | 93.6 | 95.8 | 95.1 | 6.0 | 3,006,428 | 8.1 |
| YOLOv8n-GAM | 93.7 | 96.3 | 95.5 | 9.3 | 4,646,108 | 9.4 |
| YOLOv8n-CBAM | 92.4 | 95.9 | 95.3 | 6.1 | 3,072,318 | 8.1 |
| YOLOv8n-AFGC | 93.3 | 96.4 | 95.5 | 6.0 | 3,002,402 | 7.7 |
Table 5. Results of ablation experiments. “√” indicates the inclusion of the corresponding module/component; “×” indicates its exclusion.

| Baseline Network | C2f-Faster-CGLU | BiFPN | AFGC | Detect_Rice | P/% | R/% | mAP/% | Model Size (MB) | Parameters | GFLOPs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv8n | × | × | × | × | 94.0 | 94.4 | 94.7 | 6.0 | 3,006,428 | 8.1 |
| √ | √ | × | × | × | 94.3 | 93.6 | 95.4 | 5.1 | 2,569,252 | 6.8 |
| √ | √ | √ | × | × | 94.3 | 95.5 | 95.8 | 5.1 | 2,569,260 | 6.8 |
| √ | √ | √ | √ | × | 94.7 | 96.2 | 96.3 | 5.3 | 2,667,826 | 6.9 |
| √ | √ | √ | √ | √ | 96.8 | 95.3 | 96.2 | 4.0 | 2,023,981 | 5.4 |
Table 6. Performance comparison of improved and original YOLOv8n models across rice grain categories.

| Category | P/% (Improved) | P/% (Original) | R/% (Improved) | R/% (Original) | mAP/% (Improved) | mAP/% (Original) |
| --- | --- | --- | --- | --- | --- | --- |
| full | 96.5 | 96.5 | 98.0 | 97.8 | 97.1 | 96.9 |
| unshelled | 97.3 | 97.3 | 98.3 | 96.2 | 97.3 | 97.9 |
| broken | 97.9 | 97.3 | 96.4 | 97.9 | 98.9 | 98.3 |
| other | 95.0 | 85.0 | 89.1 | 86.6 | 91.7 | 85.9 |
Table 7. Comprehensive model comparison.

| Model | mAP/% (Full) | mAP/% (Unshelled) | mAP/% (Broken) | mAP/% (Other) | mAP/% (All) | Parameters | Model Size/MB | GFLOPs | FPS |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YOLOv3-tiny | 97.2 | 96.8 | 98.6 | 83.1 | 93.9 | 12,129,720 | 23.2 | 18.9 | 75.94 |
| YOLOv5n | 96.4 | 97.8 | 98.2 | 86.8 | 94.8 | 2,503,724 | 5.0 | 7.1 | 94.10 |
| YOLOv6n | 96.0 | 96.9 | 98.3 | 85.6 | 94.2 | 4,234,140 | 8.3 | 11.8 | 86.87 |
| YOLOv8n | 96.9 | 97.9 | 98.3 | 85.9 | 94.7 | 3,006,428 | 6.0 | 8.1 | 85.54 |
| YOLOv9t | 96.7 | 96.9 | 98.8 | 90.6 | 95.8 | 1,730,604 | 4.0 | 6.4 | 73.40 |
| YOLOv10n | 96.0 | 97.3 | 98.8 | 84.9 | 94.2 | 2,695,976 | 5.5 | 8.2 | 91.43 |
| YOLOv11n | 96.7 | 97.3 | 98.7 | 89.8 | 95.6 | 2,582,932 | 5.2 | 6.3 | 87.75 |
| Ours | 97.1 | 97.3 | 98.9 | 91.7 | 96.2 | 2,023,981 | 4.0 | 5.4 | 89.94 |
Table 8. Validation results for quantity–mass conversion model (%). “×” indicates not applicable or not available.

| Sample | Type | Standard SR | Formula SR | AE_SR | Standard BR | Formula BR | AE_BR |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Long-grain indica | 87.6 | 86.9 | 0.7 | 6.7 | 7.6 | 0.9 |
| 2 | Long-grain indica | 92.1 | 91.9 | 0.2 | 9.5 | 8.3 | 1.2 |
| 3 | Long-grain indica | 92.2 | 92.1 | 0.1 | 3.3 | 3.7 | 0.4 |
| 4 | Long-grain indica | 87.4 | 87.0 | 0.6 | 6.5 | 6.3 | 0.2 |
| 5 | Long-grain indica | 87.5 | 88.7 | 0.8 | 5.4 | 5.5 | 0.1 |
| 6 | Round-grain japonica | 88.0 | 87.7 | 0.3 | 7.4 | 6.4 | 1.0 |
| 7 | Round-grain japonica | 91.0 | 90.2 | 0.8 | 6.2 | 5.1 | 1.1 |
| 8 | Round-grain japonica | 96.0 | 95.0 | 1.0 | 3.4 | 2.7 | 0.7 |
| 9 | Round-grain japonica | 93.4 | 92.6 | 0.8 | 4.8 | 3.9 | 0.9 |
| 10 | Round-grain japonica | 93.0 | 92.3 | 0.7 | 5.0 | 4.0 | 1.0 |
| Mean | Long-grain indica | × | × | 0.48 | × | × | 0.56 |
| Mean | Round-grain japonica | × | × | 0.72 | × | × | 0.94 |
| Overall Mean | × | × | × | 0.60 | × | × | 0.75 |
Table 9. Performance comparison of different models in SR/BR detection under varying illumination conditions.

| Model | Normal Illumination (Avg. AE_SR/Avg. AE_BR) | Low Illumination (Avg. AE_SR/Avg. AE_BR) | High Illumination (Avg. AE_SR/Avg. AE_BR) | Parameters | FPS |
| --- | --- | --- | --- | --- | --- |
| YOLOv3-tiny | 1.89/2.17 | 2.35/2.78 | 2.18/2.41 | 12,129,720 | 75.94 |
| YOLOv5n | 2.32/1.98 | 2.78/2.25 | 2.55/2.12 | 2,503,724 | 94.10 |
| YOLOv6n | 2.02/1.95 | 2.43/2.38 | 2.31/2.15 | 4,234,140 | 86.87 |
| YOLOv8n | 1.87/1.24 | 2.15/1.68 | 2.11/1.42 | 3,006,428 | 85.54 |
| YOLOv9t | 1.80/2.27 | 2.32/2.65 | 2.10/2.48 | 1,730,604 | 73.40 |
| YOLOv10n | 2.23/1.54 | 2.67/1.92 | 2.45/1.71 | 2,695,976 | 91.43 |
| YOLOv11n | 1.42/1.58 | 1.88/1.95 | 1.65/1.73 | 2,582,932 | 87.75 |
| Ours | 1.11/1.24 | 1.35/1.41 | 1.28/1.33 | 2,023,981 | 89.94 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
