1. Introduction
Chicken meat is considered one of the most environmentally friendly and economical sources of high-quality protein [1], and the growing demand for it has led to an increasing prevalence of industrial, intensive farming. However, as public requirements for animal welfare and food quality become increasingly stringent, non-caged farming models are gradually becoming mainstream in commercial farming [2]. Compared with intensive farming models, non-caged models support the normal behavioral expression of broiler chickens [3]. Natural, non-stress behaviors such as eating, walking, and standing can reflect the welfare status of broiler chickens, and chickens in good physical condition exhibit regular natural-behavior expression. Monitoring natural behaviors to infer the physiological state of animals has already become a common approach [4,5,6,7]. Kilpinen et al. [8] utilized the correlation between preening behavior and skin condition to determine whether chickens were infested by red mites. Mendoza et al. [9] likewise assessed the degree to which ultraviolet light stimulated hens by observing behaviors such as wing-flapping and standing. Therefore, rapid and accurate behavior recognition of free-range broiler chickens is of great significance for broiler welfare management and for optimizing the farming process in commercial production.
In recent years, computer-vision (CV)-based behavior recognition technology has been developing steadily. As a key aspect of CV, behavior detection, being both accurate and non-contact, has also begun to be applied to animal behavior recognition. For instance, Sozzi et al. [10] developed a visual deep learning model for detecting the comfort behaviors of free-range hens at different ages. Li et al. [11] used Faster R-CNN to detect the stretching behavior of broiler chickens in images at two ages, achieving an accuracy of over 92%. To detect the behaviors of breeding hens in cages, Wang et al. [12] utilized the YOLOv3 model and obtained high behavior detection accuracies: mating (94.72%), standing (94.57%), feeding (93.10%), spreading (92.02%), fighting (88.67%), and drinking (86.88%). Yang et al. [13] also classified six different behaviors of laying hens (standing, sitting, sleeping, preening, scratching, and pecking) using object detection, with an identification accuracy of 95.3%.
Most studies have employed behavior detection techniques to achieve high-precision recognition of broiler chicken behaviors. However, recognizing behavior in instantaneous images alone cannot fully reflect the welfare status of broiler chickens. In commercial farming, breeders are more concerned with the continuous behavioral changes of a particular chicken over an extended period. Continuously recognizing and recording each chicken’s behavior to create a unique “behavioral profile” is crucial for ensuring animal welfare, and this process relies on individual tracking.
In the context of automated farming systems, the lack of individual tracking of broilers can lead to difficulties in early disease detection [14], inefficient feeding [15], and inadequate animal welfare monitoring [16]. Wearable devices are a viable option for tracking broiler chickens. For instance, Yang et al. [17] used accelerometers to record the movements of nine seven-week-old broiler chickens, allowing for the classification of specific behaviors. Various wearable devices have been employed for broiler chicken behavior tracking, such as RFID [18], IMU [19], and UWB [20]. Visual-based tracking methods have seen rapid development recently. Li et al. [21] and Siriani et al. [22] used Kalman filtering to track the movement of chickens in videos. Similarly, Tan et al. [23] proposed SY-Track, a tool for high-precision tracking of broiler chickens in videos and analysis of their restlessness index. Sensor-based and visual-based tracking methods may perform similarly for the same task. For example, when detecting lameness in broilers, de Alencar Nääs et al. [24] used pressure-sensitive pads combined with machine learning to predict broiler lameness, achieving 91% accuracy. Nasiri et al. [25] developed a posture estimation model that automatically identifies broiler lameness by analyzing videos, achieving 95% accuracy. While both methods are accurate, visual-based tracking is more practical for real-world breeding environments, as it enables non-invasive, continuous monitoring of the entire flock. Sensor-based tracking is better suited for detailed studies of individual behavior, but its large-scale application is costly, and installation is complex.
However, most current methods either detect behaviors or track individuals without integrating both aspects, which hinders the monitoring of continuous behavioral patterns. This is because most research in the multi-object tracking (MOT) field focuses solely on objects of a single category, whereas behavior recognition yields multiple behavior categories. In an image, changes between behaviors not only affect behavior recognition but can also alter the behavioral characteristics and appearance models of broiler chickens, potentially leading to tracking failure [26]. To prevent the detector’s multi-behavior-category output from degrading broiler chicken tracking in the MOT task, some current studies employ multi-step approaches to associate specific behaviors with broiler chicken identities. For instance, Nasiri et al. [27] performed two-step detection, concurrently detecting broiler chickens and their drinking behavior. They used the single-category results from broiler chicken detection for tracking, and finally matched the tracking results with the drinking behavior detections to estimate the broiler chickens’ drinking time. However, excessive detection and matching procedures demand substantial computational resources, making such approaches difficult to implement in commercial farming. The cumbersome procedure may also allow errors to be propagated and amplified, resulting in poor stability.
Therefore, this study focused on recognizing and continuously tracking the behavior of individual free-range broilers, verifying that the proposed method exhibits good adaptability. The purposes of this study are to (1) train a YOLOv8-BeCS model to accurately identify the natural behavior of broilers from an overhead perspective, (2) use visual tracking technology to track individual broilers, (3) design a connector structure to integrate behavioral information into the tracking process of individual broilers, and (4) conduct fine-tuning experiments in unfamiliar scenarios to verify the adaptability of the proposed method.
3. Results
3.1. Results of Different Improvement Strategies
3.1.1. Comparison of SimAM Modules
The C2f module in YOLOv8 has limited efficacy in extracting the salient features needed to distinguish broiler chicken behaviors. These relatively small-scale key features directly impact the model’s recognition performance.
Figure 10 shows that the C2f module overemphasizes feature extraction from the image’s background and irrelevant pixels. This hinders the model from focusing on the target’s detailed features, consequently affecting the model’s overall accuracy. With the introduction of the SimAM attention module, the key features in the image details are accentuated, and the model’s overall classification accuracy improves significantly, with P increasing by 3.5% (
Table 2). Compared with other common attention mechanisms, such as CA, ECA, SE, and CBAM, SimAM can better focus on key features.
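For reference, the parameter-free SimAM weighting can be sketched in a few lines of PyTorch. The energy formulation below follows the published SimAM module; its exact insertion point within the C2f blocks of YOLOv8-BeCS is an assumption, not a detail taken from this study.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free SimAM attention (energy-based weighting), as commonly
    implemented; where it sits inside the YOLOv8-BeCS backbone is assumed."""

    def __init__(self, e_lambda: float = 1e-4):
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        n = h * w - 1
        # Squared deviation of every activation from its channel mean.
        d = (x - x.mean(dim=[2, 3], keepdim=True)).pow(2)
        # Channel-wise variance estimate over spatial positions.
        v = d.sum(dim=[2, 3], keepdim=True) / n
        # Inverse energy: activations that stand out from their channel
        # receive larger weights, emphasizing small-scale key features.
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)
```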
3.1.2. IOU Function
The current mainstream IOU functions were tested, and the test results are presented in
Table 3. Compared with CIOU, DIOU, and GIOU, all three versions of WIOU performed well across the comparisons. Because WIOU does not need to calculate the aspect ratio, its lower computational complexity increases the overall inference speed of the model by 6.25% compared to the basic model.
In terms of AP95, the WIOU v3 used in this study exhibited a 2.7% improvement over the original CIOU, and precision (P) rose by 5.01%. Although the size of the model weights remained unchanged, the reduction in Boxloss indicates that the model converged more thoroughly. The effectiveness of the focusing mechanism manifested in the middle and late training stages, i.e., as the model neared convergence; during this period, WIOU assigned small gradient gains to low-quality anchor boxes, reducing harmful gradients and further lowering Boxloss. The effectiveness of this strategy was also confirmed in the study by Zhao et al. [
44].
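As a reference for the focusing mechanism discussed above, a hedged sketch of a WIOU v3-style loss is given below. It follows the published Wise-IoU formulation (an enclosing-box distance penalty without an aspect-ratio term, plus a non-monotonic gradient gain driven by a running mean of the IoU loss); the variable names, hyper-parameters, and running-mean update are illustrative assumptions rather than settings confirmed by this study.

```python
import torch

def wiou_v3_loss(pred, target, iou_mean, alpha=1.9, delta=3.0, momentum=0.01):
    """Sketch of a Wise-IoU v3 loss for boxes in (x1, y1, x2, y2) format.
    iou_mean is a running mean of the IoU loss carried across batches;
    alpha/delta/momentum are illustrative defaults, not values from this study."""
    # IoU between predicted and target boxes.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter).clamp(min=1e-7)
    loss_iou = 1.0 - iou

    # Distance penalty over the smallest enclosing box (no aspect-ratio term).
    c_lt = torch.min(pred[:, :2], target[:, :2])
    c_rb = torch.max(pred[:, 2:], target[:, 2:])
    cw, ch = (c_rb - c_lt)[:, 0], (c_rb - c_lt)[:, 1]
    px = (pred[:, 0] + pred[:, 2]) / 2 - (target[:, 0] + target[:, 2]) / 2
    py = (pred[:, 1] + pred[:, 3]) / 2 - (target[:, 1] + target[:, 3]) / 2
    r_wiou = torch.exp((px**2 + py**2) / (cw**2 + ch**2).detach().clamp(min=1e-7))

    # Non-monotonic focusing: low-quality (outlier) anchors get small gains,
    # which suppresses harmful gradients late in training.
    beta = loss_iou.detach() / iou_mean
    gain = beta / (delta * alpha ** (beta - delta))
    new_mean = (1 - momentum) * iou_mean + momentum * loss_iou.mean().item()
    return (gain * r_wiou * loss_iou).mean(), new_mean
```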
3.1.3. Performance of DIOU-NMS
As can be seen from
Table 2, after using DIOU-NMS in the post-processing stage, the problem of inaccurate recognition caused by dense occlusion was alleviated to a certain extent, with AP50 increasing by 2.3%. This is because, when two detection boxes are adjacent or overlapping, DIOU-NMS considers the distance between their centers in addition to their overlap before deciding which box to suppress, so the better detection box is retained without wrongly discarding a neighboring target. To verify the effectiveness of NMS, images with clustered broiler chickens were randomly selected for comparison.
Figure 11 presents the comparison results in the same scene, demonstrating that NMS addition reduces the model’s missed detection probability when detected targets are dense. To further assess the performance of the NMS strategy in handling occlusion, a manual evaluation experiment was conducted. First, 30 images with mild occlusion and 30 images with severe occlusion were selected. Mild occlusion was defined as no broiler in the image being obscured by more than 50% of its area, while severe occlusion was defined as one or more broilers being obscured by more than 50% of their area. The evaluation metrics included the missed detection rate (the proportion of broilers that should have been detected but were not) and the false detection rate (the proportion of broilers incorrectly identified as other broilers). The experimental results are presented in
Table 4. The results indicate that after incorporating NMS, the missed detection rate of the YOLOv8-BeCS model in severe occlusion scenarios decreased by approximately 13.4%. Additionally, in mild occlusion scenarios, the improved method reduced both the missed detection and false detection rates by about 4%. These findings demonstrate that the DIOU-NMS effectively addresses the occlusion problem. However, due to the need to calculate the IOU multiple times as the basis for the scores, the overall inference speed decreased by 0.3 ms.
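The suppression rule described above can be summarized in a short sketch: a candidate is suppressed only when its IoU with a higher-scoring box, minus a normalized center-distance penalty, exceeds the threshold, so overlapping boxes of adjacent broilers with clearly separated centers are kept. The threshold value below is illustrative.

```python
import torch

def diou_nms(boxes, scores, iou_thres=0.5):
    """Minimal DIoU-NMS sketch. boxes: (N, 4) in x1y1x2y2; scores: (N,).
    Returns the indices of the kept boxes."""
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0]
        keep.append(i.item())
        if order.numel() == 1:
            break
        rest = order[1:]
        # Plain IoU between the best box and the remaining candidates.
        lt = torch.max(boxes[i, :2], boxes[rest, :2])
        rb = torch.min(boxes[i, 2:], boxes[rest, 2:])
        wh = (rb - lt).clamp(min=0)
        inter = wh[:, 0] * wh[:, 1]
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter).clamp(min=1e-7)
        # Normalized squared center distance over the enclosing-box diagonal.
        c_lt = torch.min(boxes[i, :2], boxes[rest, :2])
        c_rb = torch.max(boxes[i, 2:], boxes[rest, 2:])
        diag2 = ((c_rb - c_lt) ** 2).sum(dim=1)
        ci = (boxes[i, :2] + boxes[i, 2:]) / 2
        cr = (boxes[rest, :2] + boxes[rest, 2:]) / 2
        dist2 = ((ci - cr) ** 2).sum(dim=1)
        diou = iou - dist2 / diag2.clamp(min=1e-7)
        # Keep candidates whose DIoU with the best box stays below the threshold.
        order = rest[diou <= iou_thres]
    return keep
```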
When DIOU-NMS and the SimAM module work in tandem, they largely satisfy the model requirements set out earlier, which arise from the particular characteristics of the dataset. SimAM lays a solid precision foundation for the candidate-box construction stage. The combination of NMS and SimAM led to increases of 3% and 5.7% in the AP and P metrics, respectively, compared to the basic model, meaning that the probability of a behavior being detected incorrectly was reduced by about 5.7%. However, it also results in a loss of approximately 6% in inference speed.
3.1.4. YOLOv8-BeCS
The newly designed YOLOv8-BeCS model achieves an AP50 of 84.9%, marking a 3.1% increase over the unadjusted YOLOv8m. Additionally, P and AP95 increased by 6.3% and 3.4%, respectively.
Table 5’s ablation experiment results show that the effective coupling of multiple modules accounts for the excellent performance of YOLOv8-BeCS. The new strategy combination proposed in this study takes into account both detection accuracy and detection speed. Although the inference time increased slightly, an AP accuracy of over 70% is sufficient to meet the requirements of practical applications.
Table 6 presents the results of YOLOv8-BeCS in recognizing the different behaviors within the dataset. The proposed algorithm performs very well in four of the five behavior categories, with an average P value greater than 88.1%; only the lying behavior shows relatively low precision. Nevertheless, its AP50 of over 75% still reaches an applicable level in practical scenarios.
To further validate the effectiveness of the algorithm, several representative object detection algorithms, including YOLOv7 [45], YOLOv6 [46], YOLOv5 [47], SSD [48], Faster R-CNN [49], PP-YOLO [50], and DETR [51], were included in the comparative experiments.
Table 7 shows the performance of each model. Compared with the two-stage algorithm Faster R-CNN, which has a complex procedure, YOLOv8-BeCS has an extremely short detection time, taking only 17.82% of the time required by Faster R-CNN, while its AP50 is 15.1% higher. Compared with other one-stage detection algorithms, such as SSD, YOLOv8-BeCS has a significant advantage in accuracy owing to the new bounding-box regression method introduced by WIOU. As state-of-the-art (SOTA) models in object detection, YOLOv5, YOLOv6, YOLOv7, and PP-YOLO show clear accuracy advantages over the other models on the broiler chicken behavior dataset, but they remain slightly inferior to YOLOv8-BeCS.
Another advantage of YOLOv8-BeCS lies in its model size. As seen in Table 7, even after incorporating multiple modules, YOLOv8-BeCS maintains the same model weight size as the basic YOLOv8m model.
Several existing broiler behavior detection models designed for agricultural settings were compared in this study. The results are presented in
Table 8. The YOLOv8-BeCS model outperformed the others in a real free-range environment, likely due to its ability to address the complex background and severe occlusion prevalent in such settings. By enhancing feature extraction and incorporating NMS, YOLOv8-BeCS effectively overcomes these challenges, improving its adaptability to diverse free-range broiler rearing environments. Its suitability is further validated in the subsequent section.
Table 9 shows that the accuracy remains largely consistent across different computing platforms, whereas speed is significantly constrained by computational power; the CPU platform lacks hardware acceleration for inference, resulting in slower processing. This experiment further demonstrates the strong hardware adaptability of the proposed method.
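As a usage note, comparing platforms only requires changing the inference device; a minimal sketch with the standard Ultralytics prediction call is shown below. The weight and video file names are placeholders, and loading the modified YOLOv8-BeCS through this interface assumes its custom modules are registered with the framework.

```python
from ultralytics import YOLO

# Hypothetical weight file for the trained behavior detector.
model = YOLO("yolov8_becs.pt")

# Same weights, different compute platforms: only the device argument changes.
results_gpu = model.predict(source="broiler_video.mp4", device=0)      # CUDA GPU
results_cpu = model.predict(source="broiler_video.mp4", device="cpu")  # CPU only
```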
3.2. Performance of Connector and Trackers
Experiments were conducted to investigate the impact of the presence or absence of the Connector on tracking performance. The comparative demonstration video can be found in the
Supplementary Materials.
Figure 12 shows the different tracking results for the same original video segment. The Tracker without the Connector loses tracking continuity after several frames, with multiple target ID switches and tracking losses occurring. This indicates that, in the absence of an effective Connector, the Tracker has difficulty maintaining stable associations of the target objects across consecutive frames.
In contrast, the ODBO Tracker using the complete strategy significantly outperforms the version without the Connector in terms of tracking performance. This study has successfully reduced the interference of changes in behavior classes and target shapes on the tracking process, significantly decreasing the occurrences of ID switches and tracking losses. The data presented in
Table 10 are consistent with the observed results. For the Tracker equipped with the Connector, the number of ID switches decreased by 66.7%, while HOTA, MOTA, and IDF1 increased by 30.17%, 44.22%, and 30.59%, respectively. Despite the substantial improvement in tracking accuracy, the additional secondary computation of the detection head in the Connector and the associated data processing reduced the tracking frame rate of ODBO from 31 to 23 frames per second.
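Since the internal design of the Connector is not reproduced here, the sketch below only illustrates the role it plays as described above: behavior-class information is withheld from the identity-association step, so that changes in behavior class or appearance cannot break matching, and the per-frame behavior label is then re-attached to the persistent track ID. This is an assumption-laden illustration, not the actual ODBO implementation, and it omits the reuse of detection-head computations mentioned in the text.

```python
import numpy as np

def connector(detections, tracker):
    """Hedged sketch of the Connector's role for one frame.

    detections: list of (x1, y1, x2, y2, score, behavior_label) tuples.
    tracker:    a Sort-family tracker exposing update(dets) -> rows of
                (x1, y1, x2, y2, track_id), as in common SORT implementations.
    Returns a list of (track_id, box, behavior_label) records.
    """
    if not detections:
        return []

    dets = np.array([d[:5] for d in detections], dtype=float)  # class-agnostic boxes
    tracks = tracker.update(dets)

    profile = []
    for x1, y1, x2, y2, track_id in tracks:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        # Re-associate each track with the behavior label of the detection
        # whose center lies closest to the track's center in this frame.
        dists = [((d[0] + d[2]) / 2 - cx) ** 2 + ((d[1] + d[3]) / 2 - cy) ** 2
                 for d in detections]
        behavior = detections[int(np.argmin(dists))][5]
        profile.append((int(track_id), (x1, y1, x2, y2), behavior))
    return profile
```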
To comprehensively demonstrate the performance of ODBO, mainstream Sort-series algorithms (Sort, DeepSort, OCSort, StrongSort, and BoTSort) were integrated into ODBO. All trackers used the same YOLOv8-BeCS weights and the same Connector, and the results are presented in
Table 11.
At the cost of a partial reduction in the detection frame rate, all the Trackers demonstrated improvements in tracking accuracy and stability. The average increases in HOTA, MOTA, and IDF1 were 27.66%, 28%, and 27.96%, respectively.
Sort significantly outperformed the other algorithms in terms of FPS, owing to its concise matching process. For DeepSort, the HOTA, MOTA, IDF1, and IDS were 71.56%, 76.54%, 81.36%, and 7, respectively. Despite its slightly lower accuracy than some other methods, DeepSort ran at a much higher frame rate than OCSort, StrongSort, and BoTSort. In the comparison of the IDF1 metric, OCSort achieved the best performance at 83.72%, which is attributed to its effective remedy for the limitations of the Kalman filter in the Sort algorithm. StrongSort led in HOTA by virtue of its stronger detection, motion, and appearance embedding models. Regarding IDS, except for Sort, which performed significantly worse, the other four algorithms showed similar performance.
Across all metrics except FPS, BoTSort had the best overall performance, owing to its improved Kalman filter and its use of camera motion compensation. However, its tracking speed of only seven frames per second may not meet the requirements of practical applications, and OCSort and StrongSort also had poor speed performance. Therefore, considering accuracy, stability, and tracking speed together, DeepSort is the most practical choice of tracker for ODBO.
3.3. Fine-Tuned YOLOv8-BeCS in Multi-Broiler Scenarios
In commercial farming settings, detectors designed for a single data scenario obviously lack generalization to other scenarios; thus, fine-tuning the model using data from the application scenario is essential [
54]. This study performed fine-tuning experiments on the proposed algorithm with the Scenario 2 dataset. Scenario 2 is an overhead shot of a commercial farming scenario. Notably, the number of broiler chickens in Scenario 2 substantially exceeds that in Scenario 1, yet the amount of training data falls far short of the conventional training level, posing greater challenges.
First, the optimal weights of YOLOv8-BeCS trained on the data from Scenario 1 were used to perform behavior recognition on the dataset of Scenario 2 (the results are shown in
Figure 13). Even though it was trained only on Scenario 1 data, YOLOv8-BeCS still achieved a precision of 34.7% and a recall of 47.2% on Scenario 2, demonstrating good adaptability to unfamiliar scenarios. Its precision and recall were 20.3% and 17.2% higher, respectively, than those of the original model, and its advantages in AP50 and AP95 were 21.7% and 13.9%, respectively.
The optimal weights of YOLOv8-BeCS trained on Scenario 1 were then used as the starting point for fine-tuning, and the experimental results are presented in
Table 12. The fine-tuned YOLOv8-BeCS significantly outperformed YOLOv8m when the number of training rounds was relatively small (30 and 50 rounds). Specifically, after 30 rounds of training, YOLOv8-BeCS led by 11.8%, 13.6%, and 11.5% in the P, AP50, and AP95 metrics, respectively. In the Auto Stop comparison, YOLOv8-BeCS showed better precision performance in the complex Scenario 2, with a 3.9% lead in the P metric.
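For context, transferring the Scenario 1 weights to Scenario 2 amounts to resuming training from the best checkpoint on the new data. A minimal sketch using the standard Ultralytics training interface is shown below; the file names are placeholders, the 30-epoch schedule simply mirrors the shortest setting reported in Table 12, and loading the modified YOLOv8-BeCS this way assumes its custom modules are registered with the framework.

```python
from ultralytics import YOLO

# Hypothetical checkpoint from Scenario 1 training.
model = YOLO("runs/scenario1/best.pt")

# Fine-tune on the small Scenario 2 dataset for a short schedule.
model.train(data="scenario2.yaml", epochs=30, imgsz=640)

# Evaluate the fine-tuned weights on the Scenario 2 validation split.
metrics = model.val()
print(metrics.box.map50)  # AP50
```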
Figure 14 shows the precision and loss changes of the fine-tuned YOLOv8m and YOLOv8-BeCS models during 500 training rounds. Initially, YOLOv8-BeCS quickly enhanced its precision thanks to prior information, while YOLOv8m lagged in progress due to weight matching difficulties. After 100 training rounds, the precision gap between the two models narrowed. At the 300-round mark, the precision of both models peaked but subsequently declined, possibly due to overfitting impacting the testing performance.
Fine-tuning experiments were also performed on the Scenario 2 data using the full ODBO pipeline, and the detailed performance is shown in
Figure 15. Due to the limitation of the camera coverage, cross-camera tracking could not be carried out. However, the tracking performance for broiler chickens that move within the camera range for an extended period is acceptable (such as ID-4, ID-6, ID-12, and ID-14 in
Figure 15).
4. Discussion
A key consideration is the extensibility of the ODBO system in agricultural settings, particularly regarding camera systems, all-day behavior tracking, and applicability across diverse farms. First, the limited field of view of a single camera system restricts its ability to cover the entire broiler activity area, regardless of the angle used, thereby hindering the large-scale deployment of ODBO. Two solutions address this limitation: a multi-scale detection model and a cross-camera system. The former adjusts the camera angle to capture a broader scene and employs a multi-scale detection model to identify smaller, distant targets [
54]. The latter establishes a cross-camera system to enable continuous tracking of the same target across multiple cameras. This approach has been explored in agricultural contexts; for instance, Han et al. [
55] implemented multi-camera tracking of cattle. However, such technology demands extensive scene-specific datasets and cannot be readily deployed with simple adjustments. Consequently, the selection of an appropriate camera system will significantly influence the future scalability of behavior tracking applications.
As object tracking technology advances, the tracking of different animals’ behaviors has steadily matured. Tu et al. [
56] designed a pig behavior tracking method based on YOLOv5-Byte and showed that tracking performance is better when behavior classes are excluded from the association step than when they are included, which supports this study’s choice to downplay behavior categories during tracking. Currently, research on broiler chicken behavior tracking remains scarce. Nasiri et al. [
27] used multiple detections to combine broiler chicken behavior recognition with tracking. This dual-detection-branch structure requires repeated detection passes and substantial computing power, and its accuracy is limited. By contrast, the ODBO process proposed in this paper achieves a balance between accuracy and speed. In selecting a behavior tracking approach, this study adopted the tracking-by-detection (TBD) paradigm. Incorporating spatio-temporal information for time-series modeling is another widely used behavior tracking technique, but the two approaches differ fundamentally in their technical foundations. TBD excels at recognizing static behaviors (e.g., “standing” or “lying down”), whereas time-series modeling offers greater accuracy for continuous actions (e.g., “walking” or “eating”) and represents a key direction for future behavior tracking. Although the ODBO system is computationally efficient in detecting and tracking static behaviors, it would require the integration of spatio-temporal information to accurately capture continuous behaviors; consequently, ODBO has notable limitations in long-term continuous behavior tracking. Recent advances, such as spatio-temporal dual-stream networks and transformer-based temporal attention models, have been developed for continuous behavior detection. Moving forward, this study could use the current single-frame model to extract high-confidence behavior data and integrate lightweight temporal networks (e.g., temporal convolutional networks, TCNs) to model continuous behaviors, as sketched below.
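As an illustration of this direction, a lightweight temporal head could consume the per-frame behavior probabilities that ODBO already produces for each tracked broiler. The sketch below is a generic TCN-style classifier with placeholder layer sizes; it is not part of the ODBO system.

```python
import torch
import torch.nn as nn

class TemporalBehaviorHead(nn.Module):
    """Illustrative TCN-style head: dilated 1-D convolutions over a window of
    per-frame behavior probabilities for one tracked broiler, producing a
    single behavior label for the whole window. Layer sizes are placeholders."""

    def __init__(self, num_behaviors: int = 5, hidden: int = 32):
        super().__init__()
        self.tcn = nn.Sequential(
            nn.Conv1d(num_behaviors, hidden, kernel_size=3, padding=1, dilation=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=4, dilation=4),
            nn.ReLU(),
        )
        self.classify = nn.Linear(hidden, num_behaviors)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_behaviors, window_length) per-frame probabilities.
        features = self.tcn(x).mean(dim=2)   # average over the time dimension
        return self.classify(features)       # window-level behavior logits
```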
At the application level, across diverse farms, fine-tuning experiments have confirmed the method’s robust performance under varying flock densities, ground backgrounds, and lighting conditions. However, this study did not evaluate the method’s performance under extreme environmental conditions (e.g., temperature or lighting extremes), which impose significant demands on the system’s perception and stability. Current behavior tracking systems in related research predominantly rely on either pure visual or pure sensor-based methods. Visual systems may lose stability in extreme weather, while sensor-based systems face challenges related to cost and energy consumption. Future research could explore an integrated approach combining vision and sensor data to enhance adaptability across diverse scenarios and extreme environments.
The YOLOv8-BeCS designed in this study has an AP50 improvement of 3.1% compared to the original model, with P and AP95 increasing by 6.3% and 3.4%, respectively. This is attributed to taking into account the particularities of commercial farming scenarios. According to previous research, integrating SimAM with the backbone network can enhance the image feature extraction ability [
57]. The studies by Tan et al. [
58] and Liang et al. [
33] also demonstrated that NMS and changing the IOU can improve the detection performance. The effective coupling of the three strategies has led to an improvement in detection accuracy. The designed Connector has enhanced the tracking performance. Similarly, the research by Zheng and Qin [
59] also mentioned that the introduction of behavior recognition contributes to improving the tracking performance. In their cow behavior tracking experiment, they achieved the highest HOTA (72.4%), MOTP (86.1%), and IDF1 (80.3%). Both the buffer proposed by them and the tracker designed in this study reuse the data in the tracking process to achieve animal behavior recognition and tracking.
5. Conclusions
Only Detect Broilers Once (ODBO) is a visual-based method for correlating broiler behavior with individual identity information. ODBO consists of a high-precision broiler behavior detector (YOLOv8-BeCS), a Tracker, and a Connector between them. YOLOv8-BeCS is based on YOLOv8m; the integration of the SimAM attention module, WIOU, and DIOU-NMS enhanced the detection accuracy of five frequent natural behaviors of broilers: eating, standing, lying, preening, and stretching. The comparative findings reveal that the average detection precision of the model for each behavior increases from 77.8% to 84.1%, and the AP50 reaches 84.9%, which is superior to similar models. YOLOv8-BeCS is connected to the Tracker through a purpose-designed Connector, and ODBO performed very well in the tracking stage. The average accuracy of the Sort-series trackers with the Connector was 71.31% (HOTA), 73.52% (MOTA), and 81.47% (IDF1); compared to tracking without the Connector, performance improved by 27.66%, 28%, and 27.96%, respectively. The results demonstrate that ODBO offers good video processing speed and tracking stability. Additionally, the fine-tuning studies demonstrate the object detection model’s capacity to generalize across multiple commercial environments. ODBO’s detect-and-track technique employs only one detection process to gather broiler behavioral data and integrate it with each broiler’s identity. This method is valuable for accurately managing livestock, safeguarding animal welfare, and promoting smart agriculture.