1. Introduction
Maize, as one of the world’s three major food crops, nourishes approximately one-fifth of the global population and serves as a staple food for hundreds of millions of people. Its industrial significance is unmatched among cereal crops: it is used as a raw material for over 3000 products across sectors including sweeteners, cosmetics, textiles, gum, alcoholic beverages, films, packaging, and paper [1]. Its yield and utilization profoundly influence global food security, economic development, and even the energy landscape. Maize kernels, the primary storage site of nutrients in the plant, have demonstrated preventive and therapeutic effects against conditions such as coronary heart disease, atherosclerosis, hyperlipidemia, and hypertension [2], and can also be used directly as feed for poultry and other livestock [3]. The number of maize kernels is a reliable indicator of crop growth and yield potential, providing scientific evidence and decision-making support for farmers and agricultural managers. It also enables breeders to identify differences in kernel number among varieties, thereby providing a foundation for breeding improvement.
Radar technology has traditionally seen significant development and application in smart agriculture [4,5]. Numerous researchers have conducted extensive studies on maize using various engineering techniques, ranging from plant phenotypic measurement [6] and remote sensing imaging [7] to agricultural by-product processing [8]. Liu et al. developed an accurate and effective method for determining maize leaf azimuth and plant spacing using Light Detection and Ranging (LiDAR) technology. They collected three-dimensional point cloud data of maize plants and achieved effective 3D morphological reconstruction through multi-frame stitching, with R² values of 0.87 and 0.83 for leaf azimuth and interplant spacing detection, respectively [9]. Gu et al. utilized UAV-based LiDAR to quantitatively analyze the impact of different growth stages and lodging severities on the self-recovery ability of maize plants. They validated the accuracy of UAV-LiDAR point cloud data in predicting the plant height and lodging angle of lodged maize and further examined how the self-recovery capacity of maize plants varies across growth stages and levels of lodging severity [10]. Yadav et al. detected volunteer cotton (VC) plants in maize fields using RGB images collected by unmanned aerial vehicles (UAVs), aiming to maximize the true positive detection of VC plants while minimizing infestation by boll weevil pests [11]. Su et al. combined ground-based radar with a custom automatic extraction algorithm to identify points belonging to maize leaves in large, unstructured LiDAR point clouds, reaching a final accuracy of 94.1% [12]. Most of these studies adopted radar technology, using radar equipment to collect data and construct models of maize and ultimately to analyze maize traits. However, the application of LiDAR in agriculture also has drawbacks: the point cloud data it generates are massive and require high-performance computing platforms for processing, and LiDAR has difficulty classifying crop types and characteristics, so it must be assisted by visible-light imaging.
In recent years, computer vision technology has developed rapidly in the field of smart agriculture [13,14]. Deep learning, as an advanced artificial intelligence technology, has already found extensive applications in agricultural engineering [15,16], achieving significant progress in areas such as weed identification and management [17], crop yield prediction [18], and pest and disease monitoring [19]. Numerous researchers have combined deep learning techniques with various types of data to monitor crop growth conditions and classify food products more accurately, thereby providing better decision-making support for professionals in agricultural engineering [20]. In the field of maize research, Kitano et al. employed a low-cost unmanned aerial vehicle (UAV) platform to capture images of maize fields and applied deep learning techniques to count maize plants, ultimately automating this process and reducing the need for manual labor [21].
Yang et al. proposed a maize variety recognition model based on a convolutional neural network (LeNet-5) combined with near-infrared (NIR) spectroscopy, enabling efficient and rapid identification of maize varieties. The model achieved an accuracy of 99.20%, offering a new approach to maize variety classification [22]. Amin et al. developed an end-to-end deep learning model to distinguish between healthy and unhealthy maize leaves. The model leverages two pre-trained convolutional neural networks (CNNs), EfficientNet-B0 and DenseNet-121, to extract deep features from maize plant images, achieving a classification accuracy of 98.56% [23]. Divyanth et al. developed a novel two-stage deep learning approach, employing the SegNet, U-Net, and DeepLabV3+ architectures to train three semantic segmentation models for each stage of maize disease. This method lays the foundation for developing field-ready disease management systems [24]. Xiao et al. mounted RGB and MicaSense multispectral cameras on UAVs to collect images of maize fields and used YOLOv5 to count maize plants. They demonstrated the feasibility of using the Otsu thresholding method to automatically extract plant height, NDVI, and NDRE values. By analyzing different maize field management practices, they verified that these variations significantly affect the emergence rate and concluded that fertilizing near the seeds is the most effective way to achieve higher emergence rates in experimental fields [25].
However, despite the significant potential demonstrated by deep learning techniques in maize-related research, specialized studies focusing on maize kernels remain scarce. On one hand, existing work is mostly concentrated on macroscopic aspects such as maize plant phenotype analysis, variety identification, or disease detection, with few in-depth studies centered specifically on maize kernels. On the other hand, even when kernel-related content is involved, it is rarely integrated with the actual working conditions of industrial food production. On industrial assembly lines, maize kernels are often in a state of dynamic falling. During their movement, random flipping and rapid changes in spatial position (such as mutual occlusion and trajectory crossing) greatly interfere with the accuracy of traditional recognition algorithms. These factors also lead to target loss and repeated counting during kernel tracking, seriously affecting counting efficiency and result reliability.
This technological gap poses numerous challenges in real-world production scenarios. For food processing enterprises, relying on manual counting is not only time-consuming and labor-intensive but also prone to errors caused by operator fatigue, leading to high costs and low reliability. Moreover, existing counting methods developed for static or semi-static conditions are poorly suited to the dynamic scenario of high-speed falling kernels, making it difficult to meet the dual requirements of counting speed and accuracy in industrial production. Therefore, the development of a technical solution capable of accurately identifying the diverse morphological states of maize kernels during free fall—such as side-rolling, overlapping, and rotation—while continuously tracking their dynamic trajectories and performing precise counting, is crucial to addressing the challenge of quantitative kernel detection in the food industry.
Against this background, this study proposes a purpose-built, integrated counting hardware system that combines the dynamic capture capability of a high-speed camera with deep learning techniques. The YOLOv8 algorithm is employed to achieve real-time detection of kernels in high-speed motion, while the ByteTrack algorithm is introduced to continuously track kernel motion trajectories, ensuring that kernels are efficiently identified and accurately counted throughout the entire falling process within the field of view of the high-speed camera. This technical solution not only fills the research gap in batch counting of maize kernels under dynamic conditions; its core concept and technical framework can also be extended to detection scenarios involving other grain kernels such as wheat, rice, and beans. It provides a reusable technical paradigm for quantitative analysis of kernels throughout food production and has important practical significance for improving the level of automated detection in the food industry, reducing production costs, and ensuring product quality.
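As a concrete illustration of how such a detection-and-tracking pipeline can be wired together, the following is a minimal sketch using the Ultralytics YOLOv8 API with its bundled ByteTrack configuration. The weight file and video path are hypothetical placeholders, and the sketch stands in for, rather than reproduces, the system implemented in this study.

```python
from ultralytics import YOLO

# Hypothetical fine-tuned kernel detector; the weight path is a placeholder.
model = YOLO("yolov8n_kernels.pt")

# Stream frames from a recorded high-speed video and track with ByteTrack.
# persist=True keeps track IDs consistent across consecutive frames.
for result in model.track(
    source="falling_kernels.mp4",  # placeholder video path
    tracker="bytetrack.yaml",      # Ultralytics' bundled ByteTrack configuration
    conf=0.25,
    persist=True,
    stream=True,
):
    if result.boxes.id is None:
        continue
    ids = result.boxes.id.int().tolist()      # per-kernel track IDs
    boxes = result.boxes.xyxy.cpu().tolist()  # bounding boxes (x1, y1, x2, y2)
    # Track IDs and box centers would then feed a line-crossing counter.
```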
4. Conclusions
The maize kernel batch counting system developed in this study, based on the YOLOv8-ByteTrack framework, achieved a counting accuracy exceeding 99% in dynamic falling scenarios. This result not only validates the effectiveness of the proposed technical approach but also highlights the synergistic advantages of deep learning and multi-object tracking (MOT) technologies in agricultural dynamic counting applications. From a technical perspective, the C2f module in YOLOv8 employs a multi-branch feature fusion architecture that enhances the model’s ability to extract discriminative features from kernels exhibiting complex deformations and rotations during high-speed motion, thereby ensuring robust detection even under rapid movement conditions. Meanwhile, ByteTrack introduces an innovative strategy that leverages low-confidence detection bounding boxes for secondary association, effectively mitigating trajectory fragmentation issues commonly caused by occlusions or motion blur in conventional tracking algorithms.
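To make the secondary-association idea concrete, the sketch below illustrates ByteTrack's two-stage matching in a deliberately simplified form: detections are split by confidence, and tracks left unmatched after the high-confidence round get a second chance against low-confidence boxes (e.g., motion-blurred kernels). It omits the Kalman-filter motion prediction and track lifecycle management of the full algorithm, and the dictionary-based track and detection structures are assumptions made for illustration only.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def match(tracks, dets, iou_thresh):
    """Hungarian matching on IoU cost.
    Returns matched (track_idx, det_idx) pairs and unmatched track indices."""
    if not tracks or not dets:
        return [], list(range(len(tracks)))
    cost = np.array([[1.0 - iou(t["box"], d["box"]) for d in dets] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    matches, unmatched = [], set(range(len(tracks)))
    for r, c in zip(rows, cols):
        if 1.0 - cost[r, c] >= iou_thresh:
            matches.append((r, c))
            unmatched.discard(r)
    return matches, sorted(unmatched)


def byte_associate(tracks, detections, high_thresh=0.5, iou_thresh=0.3):
    """Two-stage association in the spirit of ByteTrack: match high-confidence
    boxes first, then rescue remaining tracks with low-confidence boxes."""
    high = [d for d in detections if d["score"] >= high_thresh]
    low = [d for d in detections if d["score"] < high_thresh]

    # Stage 1: associate existing tracks with high-confidence detections.
    first_matches, leftover_idx = match(tracks, high, iou_thresh)

    # Stage 2: give the still-unmatched tracks a second chance with
    # low-confidence detections, which often correspond to blurred kernels.
    leftover_tracks = [tracks[i] for i in leftover_idx]
    second_matches, _ = match(leftover_tracks, low, iou_thresh)

    for t_idx, d_idx in first_matches:
        tracks[t_idx]["box"] = high[d_idx]["box"]
    for t_idx, d_idx in second_matches:
        tracks[leftover_idx[t_idx]]["box"] = low[d_idx]["box"]
    return tracks
```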
A critical innovation of this study lies in the proposed line-crossing counting method, which incorporates spatiotemporal coordinate constraints—specifically, requiring that the same object ID be detected on opposite sides of the counting line at different time instances. This design fundamentally prevents duplicate counting caused by ID switching, keeping the cumulative counting error below 0.7%. This performance represents a significant improvement over traditional methods that rely solely on ID-based statistics and offers a novel logical validation mechanism for dynamic object counting tasks.
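A minimal sketch of this line-crossing logic is given below. It assumes a horizontal counting line and per-frame access to track IDs and box centers; the class name and structure are illustrative rather than the exact implementation used in this study.

```python
class LineCrossingCounter:
    """Counts a track ID once when it is observed on opposite sides of a
    horizontal counting line at two different time instances."""

    def __init__(self, line_y: float):
        self.line_y = line_y
        self.last_side = {}    # track ID -> -1 (above the line) or +1 (below)
        self.counted = set()   # IDs that have already been counted
        self.total = 0

    def update(self, track_id: int, center_y: float) -> None:
        side = -1 if center_y < self.line_y else 1
        prev = self.last_side.get(track_id)
        # Count only when the same ID appears on the opposite side of the line
        # at a later time instance and has not been counted before.
        if prev == -1 and side == 1 and track_id not in self.counted:
            self.total += 1
            self.counted.add(track_id)
        self.last_side[track_id] = side


# Example per-frame observations of one falling kernel as (track_id, center_y):
counter = LineCrossingCounter(line_y=540.0)
for tid, cy in [(7, 180.0), (7, 620.0), (7, 900.0)]:
    counter.update(tid, cy)
print(counter.total)  # 1: ID 7 is counted exactly once
```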
Compared with existing studies, the proposed system demonstrates significant innovation in both technical approach and application scenarios. In contrast to the ground-based radar method adopted by Su et al. [12], which achieved a leaf extraction accuracy of 94.1%, the high-speed vision approach in this study requires only 1/20 of the data volume of point cloud methods and does not rely on high-performance computing platforms, making it more suitable for real-time demands on production lines. Compared with the static plant counting method based on unmanned aerial vehicles (UAVs) proposed by Kitano et al. [21], the present system targets the dynamic process of kernel free fall and addresses motion blur through high-speed cameras operating at 1000 fps, thereby filling a technological gap in dynamic kernel counting during food processing. While the maize variety recognition model developed by Yang et al. [22] achieved a high accuracy of 99.20% in static classification, it lacks the capacity for continuous tracking of kernels in dynamic bulk conditions. This study integrates detection, tracking, and counting into a unified pipeline, thereby extending the application of deep learning in agriculture from single-object recognition to dynamic quantitative analysis for the first time. Regarding the counting error reported by Xiao et al. [25] in maize plant counting, the main issue lies in the absence of a trajectory validation mechanism for moving targets; this deficiency is effectively addressed in the present study through the proposed line-crossing algorithm.
The outcomes of this study hold significant value in both academic and industrial domains. Academically, it establishes a comprehensive technical framework for dynamic kernel counting and validates the applicability of multi-object tracking (MOT) algorithms within agricultural engineering. Notably, the line-crossing counting method proposed herein further expands the scope of MOT applications. From an industrial perspective, the system can be directly integrated into maize processing lines to replace conventional manual sampling methods, achieving a substantial enhancement in counting efficiency. It also provides precise quantification of kernel numbers per plant, which is crucial for breeding research. Furthermore, the proposed technical solution is readily extendable to the counting of other grain types such as soybeans and wheat, offering a valuable reference for automation upgrades in agricultural product processing and feed production sectors.
Nevertheless, this study has certain limitations. First, the dataset includes only a single maize kernel variety and does not encompass kernels from different maize types with varying sizes and shapes (e.g., sweet maize and field maize), which may affect the generalization capability of the model. To address this, future work should expand the dataset to include multiple maize varieties with diverse morphological characteristics, including sweet maize with larger and plumper kernels, field maize with smaller and more compact kernels, and kernels with natural variations such as broken or misshapen ones. This expansion will enable the model to learn more comprehensive feature representations, enhancing its adaptability to different production scenarios. Second, the kernel falling density in the experiments was controlled at 5–10 kernels per frame (a low-density scenario), where the system achieved a counting accuracy of over 99%, benefiting from YOLOv8’s robust detection of overlapping targets (up to 30% occlusion) and ByteTrack’s effective trajectory association. However, counting performance under high-density conditions (>20 kernels per frame) was not evaluated; in such scenarios, severe mutual occlusion (occlusion rates exceeding 50%) and trajectory crossing may increase errors, with counting errors mainly caused by detection failures in frames blurred by high-speed kernel movement and tracking losses due to severe overlapping of multiple kernels. The current model may struggle to distinguish individual kernels in highly cluttered motion states. Future research will optimize ByteTrack’s association strategy by incorporating appearance features (e.g., texture and color) to enhance target discrimination under heavy occlusion, and will further improve YOLOv8’s detection of overlapping kernels through enhanced multi-scale feature fusion, with the aim of maintaining accuracy above 95% in high-density scenarios. Third, the high cost of the Chronos 2.1-HD high-speed camera hinders large-scale deployment of the system. Lastly, the current system achieves a processing speed of 30 FPS on a standard PC, which may not meet the demands of ultra-high-speed production lines. Future work can proceed in the following directions: expanding the dataset to include multiple kernel varieties and operational conditions to enhance model robustness; optimizing ByteTrack’s association strategy (e.g., incorporating appearance-based matching) to handle high-density occlusion; exploring frame-rate compensation algorithms with standard cameras to reduce hardware costs; and increasing processing speed to over 100 FPS through model lightweighting (e.g., YOLOv8-nano) and GPU-based parallel acceleration.
From the perspective of technological application, deploying this system can reduce manual intervention in food processing and alleviate the visual fatigue caused by prolonged counting tasks, aligning with the human-centered development trend in intelligent manufacturing. From the viewpoint of food security, accurate kernel counting provides micro-level data support for yield estimation, contributing to the optimization of planting strategies and supply chain management. At the methodological level, the problem-driven, technology-integrated, and scenario-adapted approach validated in this study offers a valuable reference for addressing other agricultural engineering challenges, promoting the practical implementation of interdisciplinary technologies in real-world production environments.