1. Introduction
Sheep (Ovis aries), native to central and southwestern Asia and Europe, are now widely raised in northern China [1]. They are a high-value livestock species characterized by strong adaptability, tender meat, and fine wool, offering multiple economic benefits including meat, wool, and leather. In recent years, with the rapid development of sheep farming within the livestock industry, stocking density has increased rapidly [2,3]. High stocking density limits the space available for activity, which weakens sheep's immunity and compromises their health. Studies have shown that the behaviors sheep exhibit during activity are an important indicator of their adaptability and health condition [4,5,6]. Healthy sheep typically display common behaviors such as lying, walking, eating, and drinking. When their living environment or conditions change, abnormal behaviors such as excessive lying, reduced activity, loss of appetite, lameness, and frequent vocalization may occur. Accurately identifying the behavioral activities of sheep is therefore a significant method for monitoring their health status and an essential aspect of health management and control [7,8,9].
Traditional livestock monitoring, which mainly relies on manually observing animal behavior through installed surveillance cameras, suffers from a high workload, subjectivity, and poor real-time performance. To overcome these limitations, researchers have identified and monitored common livestock behaviors using wearable devices such as triaxial accelerometers and triaxial gyroscopes [10,11,12,13,14,15]. Yin et al. [16] installed wireless sensor nodes on the necks of dairy cows and used the K-means clustering algorithm to accurately monitor parameters such as respiration rate and activity acceleration to assess the cows' health status. Alvarenga et al. [17] proposed a method for recognizing sheep's eating behavior based on a triaxial accelerometer; by mounting the accelerometer on the sheep's jaw and applying a decision tree algorithm, the method effectively distinguishes between biting and chewing. Zhang et al. [18] designed a wireless data acquisition system based on a triaxial accelerometer and applied deep learning models to achieve high-precision recognition of the eating, chewing, and rumination behaviors of grazing sheep. Nasirahmadi et al. [19] used object detectors to recognize pigs' standing, lateral, and prone postures. Lee et al. [20] utilized a Kinect sensor to collect depth information and applied support vector machines to detect aggressive behavior in pigs. Yan et al. [21] combined an MPU6050 sensor with a Bluetooth transmission module into an integrated behavior data acquisition unit to classify and recognize the standing, lateral, and tilted postures of sows. Although these methods achieve good behavior recognition results, contact-based devices may restrict livestock movement and disturb their daily life. In addition, the devices must be placed precisely on specific body parts, which limits their general applicability.
Deep learning, owing to its end-to-end nature and freedom from manual feature extraction, delivers high performance [22]. In the field of livestock monitoring, it offers a low-cost, non-contact approach to livestock behavior recognition [23,24,25,26,27]. Wang et al. [28] proposed GSCW-YOLO, a lightweight behavior recognition model for dairy goats based on YOLOv8n; by integrating Gaussian context transformation and content-aware reassembly of features, the model improves behavior feature recognition accuracy and small-target detection, enabling automatic identification of abnormal behaviors in dairy goats. To achieve real-time online recognition of Liaoning cashmere goat behaviors, Chen et al. [29] developed a high-precision, efficient behavior recognition model based on the lightweight YOLOv8n object detection algorithm, using data augmentation, the CBAM attention mechanism, and Alpha-CIoU to improve recognition performance. Hao et al. [30] proposed YOLOv5-EMA, which introduces an efficient multi-scale attention module that significantly improves the detection accuracy of cattle bodies and key body parts, especially for small targets and under occlusion. Yu et al. [31] introduced the Res-Dense YOLO model for recognizing daily behavior in dairy cows, based on the YOLOv5 framework; it incorporates multi-scale detection heads, the CoordAtt attention mechanism, and the SIoU loss function to improve the recognition accuracy of behaviors such as drinking, feeding, lying, and standing. Wang et al. [32] optimized a sheep behavior recognition model based on YOLOv8s, improving small-object detection and model lightweighting to accurately recognize behaviors such as standing, walking, eating, drinking, and lying. Song et al. [33] proposed the ECA-YOLOv5s behavior recognition model, based on the YOLOv5s network and a channel-wise attention module, which enhances the accuracy and stability of beef cattle behavior recognition under complex occlusion and varying lighting conditions. Yang et al. [34] adopted Dense Block and SPPCSPC modules on the YOLOv6 framework to improve the recognition accuracy of abnormal pecking and pecked behaviors in chickens, facilitating the intelligent detection of abnormal behaviors in laying hens. Duan et al. [35] employed a lightweight network structure and attention modules to develop SNSS-YOLOv7, a behavior recognition method for beef cattle that reduces computational load while accurately identifying common cattle behaviors. Gao et al. [36] proposed a multi-scale behavior recognition method for dairy cows based on an improved YOLOv5s network, which enhances the recognition accuracy of daily behaviors including standing, drinking, walking, and lying. Li et al. [37] introduced a Mask R-CNN-based algorithm that automatically detects mounting behavior in pigs. Ding et al. [38] achieved precise detection of suckling piglets by quantizing and optimizing the YOLOv5 network and efficiently deployed the model on the Jetson Nano platform.
Comprehensive analysis shows that, compared with contact-based devices for livestock behavior recognition, non-contact methods are non-invasive, stress-free, cost-effective, and less affected by environmental factors. However, such research is still in its early stages, and current techniques mainly target simple backgrounds; their effectiveness and accuracy in high-density, complex-background scenes need further improvement. Therefore, this paper focuses on three common sheep behaviors (activity, eating, and lying) and proposes an improved YOLOv8n-based model for sheep behavior recognition, called Fess-YOLOv8n. Firstly, the C2f module in the YOLOv8n backbone network is replaced with C2f-Faster to mitigate the computational load and reduce the model's parameter size. Secondly, to address weak feature extraction caused by occlusion of the sheep or external environmental factors, an efficient multi-scale attention (EMA) module is introduced. In addition, a spatial-channel synergistic attention (SCSA) mechanism is implemented to assign appropriate weights to the model's spatial and channel features, thereby enhancing its ability to fuse and detect targets across different scales. Finally, selective channel down-sampling (SCDown) is incorporated into the model, using pointwise convolutions to adjust channel dimensions and depthwise convolutions to reduce spatial resolution, making the model lightweight while enhancing detection accuracy. The main contributions of this paper are as follows:
(1) Constructing a sheep behavior dataset and proposing the Fess-YOLOv8n model for sheep behavior recognition, which strikes a balance between lightweight design and high-precision recognition. Specifically, the C2f-Faster and SCDown modules contribute to the lightweight design by reducing computational complexity and parameter size, while the integration of EMA and SCSA improves recognition accuracy by enhancing feature extraction.
(2) Investigating the effects of different IoU thresholds, optimizers, and learning rates on Fess-YOLOv8n model training performance and behavior recognition effectiveness.
(3) Benchmarking the proposed model’s performance against other classical deep learning models on sheep behavior recognition tasks.
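The lightweighting argument behind SCDown (a pointwise convolution to change channels, then a depthwise convolution to halve spatial resolution) can be illustrated with a back-of-envelope parameter count. The sketch below is ours, not the authors' code, and the channel sizes are illustrative:

```python
# Compare weight-parameter counts (bias omitted) for down-sampling a feature
# map from c_in to c_out channels: one standard 3x3 strided convolution versus
# an SCDown-style pair (1x1 pointwise conv + 3x3 depthwise strided conv).
def conv_params(c_in: int, c_out: int, k: int, groups: int = 1) -> int:
    """Weights of a conv layer: (c_in / groups) * c_out * k * k."""
    return (c_in // groups) * c_out * k * k

c_in, c_out, k = 64, 128, 3  # illustrative channel sizes, not from the paper

standard = conv_params(c_in, c_out, k)                   # 3x3 conv does both jobs
scdown = (conv_params(c_in, c_out, 1)                    # pointwise: channel change
          + conv_params(c_out, c_out, k, groups=c_out))  # depthwise: spatial stride

print(standard, scdown)  # 73728 9344 -- roughly 8x fewer parameters
```

The same factorization also cuts multiply-accumulate operations by a similar ratio, which is where the reduced computational load comes from.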
4. Discussion
In this study, the Fess-YOLOv8n model significantly improved the accuracy of sheep behavior recognition, particularly in detecting dynamic activity behaviors. By modifying the YOLOv8n network, Fess-YOLOv8n demonstrated strong adaptability and accuracy in complex environments. This model provides livestock producers with an efficient and reliable tool to monitor sheep behavior in real time and assess their health status.
The design concept of the Fess-YOLOv8n model is mainly reflected in the following aspects. First, the C2f-Faster network structure is adopted, utilizing the FasterNet Block to optimize computational efficiency and achieve a lightweight design, which meets the real-time requirements of sheep behavior recognition. Second, to further enhance the model’s feature extraction capability, the EMA attention mechanism is integrated into the model. This design effectively addresses the limitations of traditional object detection methods in handling environmental interference and the randomness of sheep behavior in large-scale farming environments. Next, the SCSA module is introduced. This module combines channel and spatial dual attention mechanisms, allowing the model to more effectively extract key information from multiple dimensions, thereby improving the accuracy of behavior detection. Finally, to further optimize model performance, SCDown is introduced. This method reduces redundant calculations and parameters, lowering the model’s computational load while ensuring detection accuracy.
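The efficiency of the FasterNet Block used in C2f-Faster comes from partial convolution, which applies the spatial convolution to only a fraction of the channels and passes the rest through untouched. A minimal PyTorch-style sketch of that idea follows; the class name, split ratio, and layer choices are our illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution (FasterNet-style): convolve only the first
    c_in // n_div channels; the remaining channels are passed through as-is."""

    def __init__(self, c_in: int, n_div: int = 4):
        super().__init__()
        self.c_conv = c_in // n_div       # channels that get the 3x3 conv
        self.c_pass = c_in - self.c_conv  # channels passed through untouched
        self.conv = nn.Conv2d(self.c_conv, self.c_conv, 3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.c_conv, self.c_pass], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)

x = torch.randn(1, 64, 40, 40)
y = PConv(64)(x)
print(tuple(y.shape))  # (1, 64, 40, 40): shape preserved, ~1/16 the conv cost
```

With n_div = 4, only a quarter of the channels are convolved, so the 3x3 convolution costs about 1/16 of a full convolution in both parameters and FLOPs, which is consistent with the lightweight design goal described above.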
In addition, Fess-YOLOv8n has undergone detailed adjustments in model performance optimization. By adjusting the IoU threshold, the model can reduce false positive and false negative rates, further enhancing recognition accuracy. Meanwhile, the adjustment of the learning rate allows the model to converge more quickly during training while maintaining high accuracy.
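The effect of the IoU threshold on false positives can be sketched with a minimal non-maximum suppression (NMS) routine: a strict threshold suppresses near-duplicate boxes, while a loose one lets them through. The boxes and scores below are illustrative, not from the paper's experiments:

```python
# Minimal IoU + greedy NMS sketch showing how the IoU threshold decides
# which overlapping detections survive. Boxes are (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thr):
    """Keep the highest-scoring box, drop any later box overlapping a kept
    box with IoU above thr."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thr for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, 0.5))  # [0, 2]: near-duplicate box 1 suppressed
print(nms(boxes, scores, 0.9))  # [0, 1, 2]: loose threshold keeps the duplicate
```

Too low a threshold risks merging two sheep standing close together into one detection (false negatives); too high a threshold leaves duplicate boxes on one sheep (false positives), which is why the threshold is tuned empirically.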
While we have successfully developed the Fess-YOLOv8n model for sheep behavior detection, it still has some limitations. First, the dataset used in the current study is primarily sourced from artificial farming environments, and the sample size is relatively limited. Its generalizability and robustness in natural environments still need further validation. To enhance the model’s adaptability and generalization, future research could focus on expanding the dataset and including samples from various environmental conditions, such as different lighting conditions throughout the day and night. This would improve the model’s performance in a wider range of environments and further enhance its stability and accuracy in real-world applications. In addition, single-modal visual data may not fully capture the behavior characteristics of sheep. Future work could explore integrating multimodal data (such as sound and environmental monitoring data) for more precise behavior recognition.
5. Conclusions
With the rapid development of precision livestock farming, artificial intelligence, and deep learning technologies, livestock behavior monitoring has become increasingly important in animal husbandry. Efficient and accurate behavior recognition plays a crucial role in assessing the physiological health of livestock, while also offering a solid foundation for the scientific management of large-scale, automated farming systems. For sheep behavior recognition, the proposed Fess-YOLOv8n detection model achieves an effective balance between lightweight design and high accuracy. Through the improvements and comparative analysis, the following conclusions are drawn:
1. The Fess-YOLOv8n model utilizes the EMA structure, which significantly enhances the model's ability to extract key information. The SCSA module improves the model's feature extraction capabilities for sheep behavior, further enhancing its recognition accuracy. The C2f-Faster and SCDown modules notably reduce the model's computational complexity and parameter count, achieving a lightweight design and improving detection speed. Experimental results show that the Fess-YOLOv8n model effectively recognizes sheep behavior, achieving a mAP@0.5 of 91.4% with a weight file of only 5.13 MB.
2. Experimental results indicate that with an IoU threshold between 0.45 and 0.7 and a learning rate of 0.1, the mAP@0.5 of Fess-YOLOv8n reaches a peak of 91.6%, with the lowest false negative and false positive rates.
In summary, the Fess-YOLOv8n model is capable of quickly and accurately recognizing three distinct behaviors of sheep while maintaining a low false negative rate and false positive rate. Its efficient and precise characteristics not only provide crucial technical support for sheep behavior analysis and health management but also offer a solid foundation for the scientific management of sheep farming. In future work, we will expand the dataset by increasing samples from different environmental conditions, including those recorded in low-light or night-time settings, to better reflect real-world scenarios. Additionally, we will explore the integration of multimodal data to further enrich the dataset, improve model performance, and enhance its stability and effectiveness in real-world applications. This will enable the model to be applied to continuous, round-the-clock monitoring, ensuring its adaptability and robustness in varied environmental conditions.