Detection of Feeding Behavior in Lactating Sows Based on Improved You Only Look Once v5s and Image Segmentation

Liu, Luo; Xu, Shanpeng; Chen, Jinxin; Wang, Haotian; Zheng, Xiang; Shen, Mingxia; Liu, Longshen

doi:10.3390/agriculture14081402

Open AccessArticle

Detection of Feeding Behavior in Lactating Sows Based on Improved You Only Look Once v5s and Image Segmentation

by

Luo Liu

^1,2

,

Shanpeng Xu

^1,3,

Jinxin Chen

^1,4,

Haotian Wang

^1,3

,

Xiang Zheng

^1,3,

Mingxia Shen

^1,3 and

Longshen Liu

^1,3,*

¹

Key Laboratory of Livestock Farming Equipment, Ministry of Agriculture and Rural Affairs, Nanjing 210031, China

²

College of Veterinary Medicine, Nanjing Agricultural University, Nanjing 210095, China

³

College of Artificial Intelligence, Nanjing Agricultural University, Nanjing 210031, China

⁴

College of Engineering, Nanjing Agricultural University, Nanjing 210031, China

^*

Author to whom correspondence should be addressed.

Agriculture 2024, 14(8), 1402; https://doi.org/10.3390/agriculture14081402

Submission received: 14 July 2024 / Revised: 17 August 2024 / Accepted: 18 August 2024 / Published: 19 August 2024

(This article belongs to the Section Farm Animal Production)

Download

Browse Figures

Versions Notes

Abstract

:

The production management of lactating sows is a crucial aspect of pig farm operations, as their health directly impacts the farm’s production efficiency. The feeding behavior of lactating sows can reflect their health and welfare status, and monitoring this behavior is essential for precise feeding and management. To address the issues of time-consuming and labor-intensive manual inspection of lactating sows’ feeding behavior and the reliance on breeders’ experience, we propose a method based on the improved YOLO (You Only Look Once) v5s algorithm and image segmentation for detecting the feeding behavior of lactating sows. Based on the YOLOv5s algorithm, the SE (Squeeze-and-Excitation) attention module was added to enhance the algorithm’s performance and reduce the probability of incorrect detection. Additionally, the loss function was replaced by WIoU (Weighted Intersection over Union) to accelerate the model’s convergence speed and improve detection accuracy. The improved YOLOv5s-C3SE-WIoU model is designed to recognize pre-feeding postures and feed trough conditions by detecting images of lactating sows. Compared to the original YOLOv5s, the improved model achieves an 8.9% increase in [email protected] and a 4.7% increase in [email protected] to 0.95. This improvement satisfies the requirements for excellent detection performance, making it suitable for deployment in large-scale pig farms. From the model detection results, the trough remnant image within the detection rectangle was extracted. This image was further processed using image processing techniques to achieve trough remnant image segmentation and infer the remnant amount. Based on the detection model and residue inference method, video data of lactating sows’ feeding behavior were processed to derive the relationship between feeding behavior, standing time, and residue amount. Using a standing duration of 2 s and a leftover-feed proportion threshold of 2% achieves the highest accuracy, enabling the identification of abnormal feeding behavior. We analyzed the pre-feeding postures and residual feed amounts of abnormal and normal groups of lactating sows. Our findings indicated that standing time was significantly lower and residual feed amount was higher in the abnormal groups compared to the normal groups. By combining standing time and residual feed amount information, accurate detection of the feeding status of lactating sows can be realized. This approach facilitates the accurate detection of abnormal feeding behaviors of lactating sows in large-scale pig farm environments.

Keywords:

lactating sows; feeding behavior; improved YOLO v5s; image processing

1. Introduction

Monitoring pig behavior allows for the timely acquisition of information related to their growth, health, and welfare [1,2,3,4]. Among various behaviors, feeding behavior is crucial in determining their production performance [5]. A reduction in food intake is often associated with environmental changes or early-stage diseases [6]. By observing their feeding behavior and analyzing feeding patterns, it is possible to identify pigs with abnormal feeding habits promptly and accurately. This can effectively reduce feed waste and enhance breeding efficiency [7]. In modern large-scale pig farms, managing lactating sows is crucial to farm operations, as their health directly impacts productivity. The feed intake of lactating sows significantly affects their milk production, piglet survival rate, and weaning weight [8,9,10]. Therefore, monitoring the feeding behavior of lactating sows is essential for improving breeding efficiency.

In recent years, deep learning and image-processing technologies have garnered widespread attention in the research into pig feeding-behavior recognition [11]. Some studies have employed RFID (Radio Frequency Identification) technology and a series of multiplexers to record and investigate pig feeding behavior [12]. By combining feeding- and foraging-duration data, these studies aim to identify sick pigs and improve farming strategies. Additionally, some researchers have proposed using Faster R-CNNs (Regions with Convolutional Neural Networks) to locate and identify individual pigs and pig heads in group housing [13]. They designed an algorithm to associate each pig’s head with its body and developed a behavior recognition algorithm that measures feeding behavior by analyzing the occupancy rate of the feeding area. To improve detection speed, researchers have proposed a deep learning-based method to detect and recognize piglets in real time [14]. This technique combines the tasks of detecting pigs and classifying them into different behavioral postures in a single process, focusing solely on detecting feeding behavior. Li et al. [15] developed a target pig-tracking system based on a particle filtering algorithm, which uses real-time area monitoring to achieve multi-feature fusion particle filter tracking. This system statistically monitors feeding time and frequency, as well as the behavior of pigs. Luo et al. [16] utilized the shape characteristics of feed troughs and the ratio of corner points to the image width and height to establish the corner coordinates of the feeding area. They used YOLOv5 to detect pig heads and identified and quantified feeding behavior by calculating the proportion of the head within the located feeding area.

The studies primarily focus on identifying and tracking the feeding behavior of fattening pigs and calculating feeding time, without proposing a simple and efficient method for determining feeding behavior that would allow stockpersons to promptly address pigs with abnormal feeding patterns. Furthermore, research on the feeding behavior of lactating sows is relatively scarce. The feeding environments for fattening pigs and lactating sows in large-scale pig farms differ significantly, necessitating adjustments in the corresponding feeding-behavior detection methods.

Currently, the feeding behavior of lactating sows relies primarily on manual inspections, which are time-consuming, labor-intensive, and heavily dependent on the experience of the inspectors, making it difficult to meet the needs of large-scale farming. To address these issues, this paper builds on the YOLOv5s algorithm by adding SE attention modules to enhance performance and reduce the probability of missed and false detections. The loss function is replaced with WIoU to accelerate model convergence and improve detection accuracy. The improved YOLOv5-C3SE-WIoU model is developed to recognize feeding behavior by detecting images of lactating sows. From the model detection results, images of leftover feed within the bounding boxes are extracted, and image processing techniques are used to segment these images and infer the quantity of leftover feed. Finally, based on the detection model and feed inference methods, the video data of lactating sows’ feeding behavior are processed to determine the relationship between feeding behavior, standing duration, and leftover-feed quantity. This approach aims to achieve precise detection of feeding behavior in lactating sows within large-scale pig farming environments.

2. Materials and Methods

To assess the performance of the YOLOv5s-C3SE-WIoU model proposed in this paper in real-world applications, we designed and conducted a series of experiments. These experiments encompassed the selection of experimental animals, the establishment of feeding conditions, and the collection and processing of data, ensuring the scientific rigor and reproducibility of the results.

2.1. Animals, Housing, and Data Collection

The quality of experimental data directly impacts the accuracy of the model. This section will provide a detailed overview of the animals used in the experiments, their feeding environment, and the specific methods of data collection, ensuring the representativeness and reliability of the data.

2.1.1. Experimental Animals, Site, and Time

The video data for this study were collected at Jiaze Farm, located in Changzhou City, Jiangsu Province, from 14 August to 3 September 2023. The farrowing house, measuring 24 m in length and 8 m in width, contained eighteen farrowing crates arranged in two rows, each measuring 2.2 m by 1.9 m. Seventeen Landrace × Yorkshire sows, which successfully farrowed between 4 and 5 August 2023, were housed in this facility. During the data collection period, the environmental control systems maintained temperatures between 25 °C and 28 °C, with relative humidity ranging from 70% to 82%, ensuring optimal conditions for the lactating sows. Notably, the experimental site lacked automatic feeding lines, necessitating manual feeding practices.

2.1.2. Data Acquisition

In the experiment, a Hikvision camera DS2CD3325-I (25 fps) (Hangzhou, China) was employed, mounted on a bracket above the inspection robot, with the total height set at 2.1 m. The camera was integrated into the same local area network as the wireless router. The bracket’s position was adjusted via the video feed to achieve optimal data collection. The robot’s track was positioned 0.4 m from the farrowing crates. The schematic diagram of the collection equipment is shown in Figure 1. Video recording sessions were conducted daily from 05:40 to 06:10, 10:40 to 11:10, and 17:40 to 18:10, aligning with the 30 min preceding each feeding time. The inspection robot and camera were synchronized to commence data collection simultaneously, were run at the desired location, and collected the data during the required duration of the video.

2.1.3. Data Preprocessing

The recorded videos of lactating sows feeding were saved onto a hard disk recorder. A Python script based on Python 3.9 and OpenCV 4.7 was developed to extract frames from the videos at 1 s intervals, with each image having a resolution of 1920 × 1080 pixels and being saved in JPG (Joint Photographic Experts Group) format. Images were meticulously screened to ensure clear and unobstructed depictions of the sows’ feeding behavior. Blurry or shaky frames, as well as highly similar images, were manually discarded. However, images capturing piglets playing or feeding in the trough were retained to enhance the model’s robustness in real-world conditions. Ultimately, 1325 valid images were obtained.

2.1.4. Data Set Realization

Open-source annotation software LabelImg (Version:1.8.5) was used for manual annotation of the images. Before annotation, two annotators established a consistent annotation standard. One annotator performed the annotations, while the other was responsible for reviewing all the annotations to ensure consistency. The annotation information and number of labels are shown in Table 1. Lactating sows exhibited four main postures: standing, sitting, lateral recumbency, and sternal recumbency. For dataset annotation, standing and sitting were labelled as “stand”, while lateral recumbency and sternal recumbency were labelled as “lie”. Feed troughs with residual fodder were labelled as “surplus”, and empty feed troughs were labelled as “empty”. Rectangular boxes were used to select the detection targets, and the annotated images were exported in PASCAL VOC format, generating annotation files with the same filenames but with an XML suffix. These XML files contained information on the positions, categories, and coordinates of the bounding boxes for each detection target. The annotated images were randomly divided into training, validation, and testing sets in a ratio of 8:1:1, corresponding to 1069, 128, and 128 images, respectively. Considering the varying positions of farrowing crates and the changes in lighting in interactive areas, Mosaic data augmentation was employed to expand the model’s application scenarios and enhance its robustness [17]. This approach ensures that the model is thoroughly validated and tested during training, guaranteeing stable and reliable performance in practical applications.

2.2. Algorithm Design

To achieve precise detection of sow feeding behavior, this study adopted the YOLO v5-based object detection algorithm. The following sections will elaborate on the design principles of the algorithm and its application in this research.

2.2.1. Building the Object Detection Model

YOLO v5 is a single-stage object detection algorithm that builds on the YOLO series, incorporating several innovative designs to enhance performance. Compared to other models in the YOLO series and two-stage object detection models such as Faster R-CNN, YOLO v5 offers faster inference speed and lower computational resource requirements. YOLO v5 utilizes a lightweight network architecture while introducing deeper and wider network structures along with several improvements, enabling it to better handle small and dense objects while maintaining fast inference speed. Its efficient detection capabilities make YOLO v5 particularly suitable for large-scale pig farms [18,19]. Therefore, this study adopts the YOLO v5 algorithm to construct a model for detecting the feeding behavior of lactating sows.

Currently, YOLO v5 offers four main object detection networks: YOLO v5s, YOLO v5m, YOLO v5l, and YOLO v5x, primarily differentiated by their network depth. YOLO v5s is a variant of the YOLO v5 object detection algorithm and is a relatively smaller model within the YOLO v5 series. The network architecture of YOLO v5s is illustrated in Figure 2. The red-framed section illustrates the schematic diagram of the main network structure, while the orange-framed section depicts the schematic diagram of the model components. YOLO v5s adopts the overall network architecture of the YOLO v5 series, comprising the Backbone, Neck, and Head networks. With its smaller network depth, YOLO v5s is suitable for resource-constrained environments. Considering the practical production limitations in pig farms, and to facilitate subsequent software and hardware deployment, the lightweight model YOLO v5s was chosen. The optimized version of this model will serve as the detection model for the feeding behavior of lactating sows.

Due to the presence of piglets entering the trough, causing occlusion and affecting feed and water conditions, the probability of missed and incorrect detections has increased, to some extent. Therefore, this study proposes improvements to the YOLO v5s model. The specific improvements are as follows: (1) adding an SE (Squeeze-and-Excitation) attention mechanism module, and (2) replacing the loss function with WIoU (Weighted Intersection over Union). The SE attention mechanism enhances the model’s focus on important features, improving its generalization ability and accuracy across various tasks, particularly in resource-constrained environments like pig farms. Combined with the WIoU loss function, which optimizes bounding box regression and reduces detection errors, the model can more accurately identify and localize targets in complex backgrounds, thereby boosting overall performance in terms of detection precision and localization accuracy.

Adding the SE attention mechanism module.

The SE attention mechanism enhances the model’s representational capacity by explicitly modelling the interdependencies between channels. It achieves this through the following two steps: Squeeze: first, a global average pooling operation compresses the spatial information of each channel’s features, converting the feature maps into a global descriptor vector. Excitation: next, a fully connected layer applies a nonlinear transformation to the global descriptor vector, generating weights for each channel. These weights enable the model to dynamically adjust the importance of different channels, thereby capturing key features more effectively [20,21]. By enhancing the model’s focus on important features, particularly in resource-constrained environments, the SE attention mechanism improves the model’s generalization ability and accuracy across various tasks. In this study, the integration of the SE mechanism allows the model to more effectively identify features that are highly relevant to the target task, leading to improved detection or classification accuracy. The SE attention module is shown in Figure 3a and is fused with the C3 module structure into an improved C3SE module, which is shown in Figure 3b.

Given that the backbone network of YOLO v5 is responsible for feature extraction and the neck network handles feature fusion, integrating the SE attention module into the backbone can facilitate the extraction of more pertinent features, thus enhancing the detection model’s accuracy. The impact of inserting the SE attention module into different structural parts of the network varies in terms of performance and detection speed. In this study, two methods are employed: first, adding the SE attention module to the last layer of the original YOLO v5s model’s backbone, and second, replacing all C3 modules in the backbone with SE modules. The network models derived from these methods are designated as YOLO v5s-SE_Backbone and YOLO v5s-C3SE, respectively. The network structures of these models are depicted in Figure 3c,d.

2.: Replacing the loss function with WIoU.

Since training data inevitably contains low-quality samples, geometric factors such as distance and aspect ratio exacerbate the penalty on these low-quality examples, thereby reducing the model’s generalization performance. A good loss function mitigates the impact of geometric factors when the anchor box and the target box overlap well, and minimal training intervention will result in better generalization capabilities of the model. An anchor box, also known as a prior box or default box, is a predefined bounding box of a specific aspect ratio and scale used in object detection algorithms like YOLO. These boxes serve as reference points for the model to predict the location and size of objects within an image. During training, the model adjusts these anchor boxes to better fit the ground truth (target) boxes, which represent the actual location of objects in the image. A target box, also known as a ground truth box, is the bounding box that defines the precise location and dimensions of an object within an image. It is manually labelled in the dataset and serves as the standard against which the model’s predictions (adjusted anchor boxes) are compared during training. The goal of the model is to predict bounding boxes that closely match these target boxes.

Focal-EIoU (Extended Intersection over Union) v1 was proposed to address the BBR (Bounding Box Regression) balance issue arising from varying sample quality. BBR is a process used in object detection algorithms to refine the predicted bounding boxes to more accurately match the ground truth boxes (target boxes) in an image. During this process, the model predicts adjustments (offsets) to the coordinates of anchor boxes or prior boxes to better align them with the actual location and size of the object in the image. The goal of bounding box regression is to minimize the difference between the predicted boxes and the ground truth boxes, improving the precision of object localization. This process typically involves predicting the center coordinates, width, and height of the bounding boxes.

However, due to its static FM (Focusing Mechanism), the potential of non-monotonic FM was not fully utilized. WIoU was proposed, based on this concept, introducing a dynamic non-monotonic FM. The attention-based loss WIoUv1 for BBR has demonstrated lower regression error in simulation tests compared to the state-of-the-art SIoU (Scalable Intersection over Union). Additionally, WIoU includes WIoUv2 with a monotonic FM and WIoUv3 with a dynamic non-monotonic FM. Detailed studies on the impact of WIoUv3 on low-quality samples have demonstrated the effectiveness and efficiency of the dynamic non-monotonic FM [22]. The dynamic non-monotonic FM uses the outlier degree instead of IoU to evaluate the quality of anchor boxes and provides a wise gradient gain allocation strategy. This strategy reduces the competitiveness of high-quality anchor boxes while also reducing the harmful gradient generated by low-quality examples. This allows WIoU to focus on ordinary-quality anchor boxes and improve the detector’s overall performance.

In the WIoUv3 version, the outlier measure- and loss-function formulas are shown below as Equations (1) and (2):

β = \frac{L_{I o U}^{*}}{\bar{L_{I o U}}} \in [0, + \infty), r = \frac{β}{δ α^{β - δ}}

(1)

L_{W I o U v 3} = r e x p (\frac{{(x - x_{g t})}^{2} + {(y - y_{g t})}^{2}}{{(W_{g}^{2} + H_{g}^{2})}^{*}}) L_{I o U}

(2)

where

L_{I o U}

represents the original bounding box loss, (

x

, y) denotes the center point of the predicted box, (

x_{g t}, y_{g t}

) denotes the center point of the ground truth box,

W_{g}

and

H_{g}

are the width and height of the smallest enclosing box that contains both the ground truth and predicted boxes, β is the outlier measure used to evaluate the quality of the anchor box—the smaller the β, the higher the quality of the box—and r is the non-monotonic focusing coefficient. α and δ are hyperparameters that can be adjusted according to different models and datasets. To prevent

R_{W i o U}

from generating gradients that hinder convergence,

W_{g}

and

H_{g}

are detached from the computation graph (indicated by the superscript *). This effective elimination of factors that hinder convergence ensures that no new metrics, such as aspect ratio, are introduced.

In resource-constrained environments, such as applications in pig farms, the WIoU loss function aids the model in more accurately localizing targets, even in complex backgrounds. By optimizing bounding box regression, WIoU reduces detection errors, thereby enhancing the overall performance of the model, particularly in terms of detection precision and localization accuracy.

2.2.2. Experimental Setup

The experiments of this research were conducted using an Intel i7 8700K CPU (manufactured by Intel, Santa Clara, CA, USA). The system had 16 GB of RAM and an NVIDIA GeForce GTX 1080Ti GPU with 11 GB of memory (manufactured by Nvidia, Santa Clara, CA, USA). The chosen model framework was PyTorch 2.1.2, with CUDA 12.0 and Python 3.9 for implementation. Table 2 lists some parameters used in the experiments; the early stopping mechanism of the training process was not disabled.

In this study, P (Precision), R (Recall), and mAP (mean Average Precision) are used as the metrics for evaluating the model’s performance. The metrics are defined in Equations (3) and (4), to validate the effectiveness of the model’s lightweight design [23].

P r e c i s i o n = \frac{T P}{T P + F P}, S e n s i t i v i t y = R e c a l l = \frac{T P}{T P + F N}

(3)

A P = \frac{\sum P r e c i s i o n}{N}, m A P = \frac{\sum A P}{C}

(4)

where TP is the number of true positives, FP is the number of false positives, and FN is the number of false negatives. C represents the number of target categories, with C = 4 in this study. N represents the number of images in the test set that contain targets of these C categories.

P evaluates the model’s ability to accurately recognize the feeding behavior of lactating sows. R assesses the model’s coverage in identifying the feeding behavior of lactating sows. [email protected] represents the mean average precision of all target categories (four categories in total) when the threshold is 0.50. [email protected]:0.95 represents the mean average precision of all target categories when the threshold ranges from 0.50 to 0.95 [19].

2.3. Quantification of Feed Residue in Troughs Based on Image Processing

The study on the quantification of feed residue in troughs is primarily based on the output results of the lactating sow feeding-behavior detection model, which further examines the amount of residue. According to the definition of feeding behavior, if a lactating sow is in a standing or sitting posture and the trough is empty, it can be preliminarily judged that the feeding behavior is normal. Conversely, if the sow is lying on its side or chest and the trough contains feed, the feeding behavior is abnormal. However, in practical production processes, there are situations where a sow might be standing with feed in the trough or lying down with no feed, making direct classification and judgment challenging. Therefore, this study proposes a quantification method for feed residue in troughs. By quantifying the residue and analyzing the standing-time statistics, this method aims to achieve precise detection of the feeding status of lactating sows in large-scale pig farm environments. The research roadmap for the quantification of feed residue is illustrated in Figure 4. Frames of different colors indicate distinct processes: the pink frame represents the target detection module, the blue frame corresponds to the calculation of residual feed in the trough, and the yellow frame denotes the data analysis module.

The output results of the lactating sow feeding-behavior detection model are divided into several categories: standing with feed, standing without feed, lying with feed, and lying without feed. The feed residue data are retained for further processing. Images containing feed residue are selected, and, based on the detection results, the coordinates of the trough detection box are obtained. The image within the coordinates is cropped to obtain an image containing only the trough. In this study, the pixel distribution difference between feed residue and the trough is significant. However, there are cases such as water and feed mixtures, where distinguishing the image features and edges is challenging. Since the method ultimately adopted should occupy minimal computational resources to be suitable for large-scale pig farms, using the traditional image segmentation method of threshold segmentation is more appropriate for this study.

Before performing threshold segmentation, median filtering is first applied to reduce noise-induced errors in subsequent processing [24,25]. Median filtering is a non-linear digital filtering technique commonly used in image processing to remove noise from an image while preserving edges. It works by sliding a window (often a square or rectangular kernel) over the image, where each pixel within the window is sorted by intensity, and the median value of the sorted list is then used to replace the central pixel of the window. This process helps to eliminate outliers, such as salt-and-pepper noise, without blurring sharp edges, making it particularly effective for enhancing image quality in noisy environments. Due to the color difference between the trough and the feed, the feed residue is segmented by adjusting the RGB threshold values, resulting in a complete contour map of the residue. The area ratio is calculated, with a larger ratio indicating more residue. By continuously adjusting the threshold through experiments to select the optimal threshold, the residue can be segmented as completely as possible, while reducing background interference [26]. The final segmentation thresholds are determined as follows: 86 ≤ R ≤ 130, 30 ≤ G ≤ 85, and 40 ≤ B ≤ 110.

After threshold segmentation, the pixel distribution of the trough image becomes more dispersed, necessitating the use of edge detection algorithms to locate the edges in the image [27]. The Sobel edge detection algorithm, a common image processing technique, is used to detect edges and contours in the image [28]. It works by applying the Sobel operator, which consists of two 3 × 3 convolution kernels—one for detecting horizontal edges and the other for detecting vertical edges. These kernels compute the gradient of the image intensity at each pixel, effectively measuring the rate of change in intensity along the x and y axes. The Sobel operator emphasizes regions of high spatial frequency, such as edges, by calculating the gradient magnitude. The result is an image that highlights areas where there is a significant intensity change, which typically corresponds to the boundaries of objects within the image. Sobel edge detection is particularly useful for applications requiring the identification of object contours or the separation of different regions within an image. By applying Sobel edge detection to the binarized image and adjusting the parameters of the ‘cv2.Sobel’ function, the processing effect can be optimized. Through experimentation, the parameter value for ‘ksize’ is determined to be 3, yielding the best results. After applying the Sobel edge detection algorithm, the residue portions in the image are mostly clustered. To connect the adjacent black pixels within the residue portions without expanding the external pixels, a strategy of erosion followed by dilation is employed. This helps to connect the internal adjacent black pixels while keeping the representation of the feed residue in the image closer to the actual value. In the processed image, only black residue areas and white background areas remain. The area ratio is obtained by calculating the ratio of the number of black pixels to the total number of pixels. The calculation method is shown in Equation (5):

S = \frac{N_{B}}{N_{W} + N_{B}}

(5)

where S represents the area ratio, N_B represents the number of black pixels, and N_W represents the number of white pixels. The larger the area ratio S, the more feed residue remains.

3. Results

In this study, we conducted multiple experiments to validate the effectiveness of the proposed object detection model and behavior analysis methods. This section is divided into two parts: first, we discuss the model’s enhancements and their impact on detection accuracy; then, we analyze the model’s effectiveness in detecting sow feeding behavior in practical applications. These results provide a crucial foundation for further optimizing the model and applying it in real production environments.

3.1. Target-Detection Model Improvements

To enhance the overall performance of the model, we implemented several improvements to the foundational architecture. This section will focus on the specific implementation of these enhancements and their impact on the effectiveness of object detection.

3.1.1. Comparison and Analysis of Different Attention Mechanisms

To validate the impact of attention mechanisms on the model, this study selected several common attention modules—CBAM (Convolutional Block Attention Module), SE, CA (Coordinate Attention), and ECA (Efficient Channel Attention)—and integrated them into the model. CBAM is a lightweight attention mechanism that enhances a model’s ability to focus on important features. It does this by sequentially applying two types of attention: Channel Attention and Spatial Attention. Channel Attention focuses on ‘what’ features are important, while Spatial Attention focuses on ‘where’ these important features are located. By applying these two attentions in sequence, CBAM helps the model better capture relevant information in an image. CA is an attention mechanism designed to encode both channel and positional information efficiently. Unlike traditional attention mechanisms that treat each channel separately, CA divides channel attention into two parallel streams, focusing on the horizontal and vertical directions independently. This allows the model to capture long-range dependencies more effectively while maintaining the positional information, making it particularly useful for tasks where spatial structure is important. ECA is a simplified version of channel attention that aims to improve efficiency by avoiding the need for complex operations like fully connected layers. Instead, ECA uses a 1D convolution with a small kernel size to capture local cross-channel interactions. This lightweight approach makes ECA particularly suitable for scenarios where computational resources are limited, as it enhances the model’s ability to focus on important channels with minimal overhead. Two methods were used for integration: first, adding the attention module to the last layer of the original YOLO v5s model’s backbone, and second, replacing all C3 modules in the backbone with attention modules. The experimental results are shown in Table 3.

As seen from Table 3, integrating attention modules into different parts of the model results in varying changes in detection performance. Compared to the original YOLO v5s, YOLO v5s-SE_Backbone shows significant improvements in both precision and mean average precision. The [email protected] reached 91.6%, an increase of 8.3% over the original model, and [email protected]:0.95 reached 66.8%, demonstrating excellent detection performance. The YOLO v5s-CA_Backbone achieves the highest [email protected]:0.95 among all models at 68.2%, which is a 7.1% improvement over the original model. The YOLO v5s-C3SE had significantly improved [email protected] and [email protected]:0.95 compared to the original model. The [email protected] increased by 7.7%, and [email protected]:0.95 increased by 4.3%. Introducing attention mechanisms into a model typically increases computational overhead and model complexity, which can lead to slower inference speeds, particularly in resource-constrained environments. However, these mechanisms can also significantly enhance the model’s performance, especially when dealing with complex image or video data. YOLO v5s-C3SE strikes a balance between performance and effective detection, making it suitable for large-scale applications, such as detecting feeding behaviors in lactating sows on large-scale pig farms.

3.1.2. Comparison and Analysis of Different Loss Functions

During training, the presence of low-quality samples of lactating sows’ feeding behavior and trough feed residue can cause incorrect classifications, reducing the model’s generalization capability. To address this, WIoU was used to replace the original loss function. By reducing the weight of positional information when the anchor box and target box overlap, this mechanism minimizes pre-training interference, thereby enhancing the model’s generalization ability. The model with the updated loss function is named YOLO v5s-WIoU. To further evaluate the impact of different loss functions on the performance of the object-detection model, various loss functions including SIoU, EIoU, Focal_EIoU, AlphaIoU, Alpha_CIoU, and WIoU were integrated into the YOLO v5s model. The results of these comparisons are shown in Table 4, and the bounding-box-regression loss curves are illustrated in Figure 5.

As seen from Table 4 and Figure 5, compared to YOLO v5s, the model with the WIoU loss function shows a significant improvement in mAP values. The [email protected] increased by 7.3%, and the [email protected]:0.95 increased by 3.3%, ensuring high detection accuracy. Additionally, the WIoU model’s box_loss converges much faster than with other loss functions, demonstrating that the dynamic non-monotonic focusing mechanism of the WIoU loss function can reasonably allocate gradient gains for anchor boxes, effectively balancing high- and low-quality samples. The introduction of the WIoU loss function not only improves detection accuracy but also accelerates the convergence process, demonstrating its effectiveness in handling varying sample qualities. This makes the YOLO v5s-WIoU model particularly suitable for practical applications in large-scale pig farms, where robustness and efficiency are crucial.

3.1.3. Ablation Study

In this experiment, the C3 modules in the Backbone were replaced with C3SE attention mechanism modules. This modification allows the network to focus more on low-brightness targets in the feeding-behavior images of lactating sows, enhancing the network’s ability to perceive critical information and improving detection accuracy. Additionally, this change reduces the model’s parameters and computational load, achieving a lightweight model with good detection performance. The WIoU loss function was also integrated to accelerate convergence. Four sets of experiments were conducted, each incrementally adding one of the improvement schemes to ensure the feasibility of the optimization plan. The mean average precision (mAP) and parameter count were used as evaluation metrics. A “√” in the table indicates that the method was used in the improvement based on the YOLO v5s model, while an “×” indicates it was not used. The results are shown in Table 5.

As seen from Table 5, the YOLO v5s model achieved a [email protected] of 83.3%. Replacing the C3 module in the backbone with the SE attention mechanism increased [email protected] by 7.7% and [email protected]:0.95 by 4.3%. Substituting the WIoU loss function further improved [email protected] by 1.2% and [email protected]:0.95 by 0.4%. This improvement resulted in achieving the highest detection accuracy in this experiment.

3.1.4. Recognition Results of Different Labels

Comparative experiments were conducted on the dataset using both the YOLOv5s model and the improved YOLOv5s-C3SE-WIoU model. Compared to the original YOLOv5s model, the improved YOLOv5s-C3SE-WIoU model shows enhancements in both detection accuracy and model convergence speed. It significantly improves overall category detection performance, particularly in discerning leftover feed in troughs, thereby reducing misjudgments such as water or piglets entering the feed trough. The improved model demonstrates effective detection capabilities. The training results for each category of the improved and original models are shown in Table 6.

The training comparison graph indicates that the original YOLOv5s model had issues with lower accuracy, mAP values, and slower convergence speed during training. In contrast, the improved model exhibits higher accuracy, faster convergence, and improved mAP values compared to the original model. These experimental results validate the significant enhancements of the YOLOv5s-C3SE-WIoU model in detecting feeding behaviors and leftover feed during lactation in sows. The comparison graphs before and after the improvement of YOLOv5s are shown in Figure 6.

To further validate the algorithm’s performance, this study also compares YOLOv5s-C3SE-WIoU with other mainstream object detection algorithms: the two-stage algorithm Faster R-CNN, and other variants of YOLOv5: YOLOv5l, YOLOv5m, YOLOv5x, and YOLOv5s. The results are presented in Table 7. YOLOv5s-C3SE-WIoU achieved improvements of 8.9, 14.4, 11.3, 16.4, and 3.4 percentage points over YOLOv5s, YOLOv5x, YOLOv5m, YOLOv5l, and Faster R-CNN, respectively, in [email protected]. In terms of [email protected]:0.95, it showed improvements of 4.7, 8.3, 8.2, 10.3, and 1.2 percentage points over the same algorithms, respectively.

Considering potential challenges like piglet target occlusion, feeding-pig-head occlusion, and water and feed situations in large-scale pig farms, a test set was established to visually verify the efficiency of the improved model and its adaptability to complex environments. Comparative detection results between the traditional model and the improved model are depicted in Figure 7. Above the figure are the object detection images of the YOLOv5s model before improvement, and below are the object detection images of the improved YOLOv5s-C3SE-WIoU model. Specifically, Figure 7a shows the detection comparison when piglets enter the feed trough. Figure 7b shows the detection comparison with a small amount of feed. Figure 7c illustrates the detection comparison with a large amount of feed. Figure 7d demonstrates the detection comparison in situations where pig-head targets are occluded.

From Figure 7, it is evident that the YOLOv5s model exhibits instances of missed detections and false positives in complex environments. In Figure 7a,d, there are false positives in the presence of feed trough obstruction. In Figure 7b, there are false detections where piglets in the feed trough are misclassified as lactating sow troughs. Figure 7b,c also show errors in detecting water and feed, which are easily confused. The comparison highlights the fact that the YOLOv5s model’s accuracy falls short of requirements, affecting subsequent processing tasks. In contrast, the improved YOLOv5s-C3SE-WIoU model shows enhanced detection performance, reducing instances of missed detections and false positives. It has improved detection accuracy and demonstrates better adaptability to complex environments.

3.2. Feeding-Behavior Analysis

To define feeding behavior: when lactating sows are in a standing or sitting posture with no leftover feed in the trough, their feeding behavior can be preliminarily considered normal. Conversely, if lactating sows are still in a side-lying or chest-lying posture with leftover feed in the trough, their feeding behavior is deemed abnormal. However, in practical production, situations like standing with feed or side-lying without feed pose challenges for direct classification and judgment. Therefore, it is necessary to quantify leftover feed, using the method mentioned to objectively measure the amount of leftover feed in the trough. Based on the feeding posture of sows and the condition of the trough as detected by the model, combined with statistics on standing time and leftover feed quantity, the relationship between feeding behavior, standing duration, and leftover feed quantity can be analyzed. This approach aims to achieve precise detection of feeding conditions of lactating sows in large-scale pig farming environments.

First, randomly select 20 videos of 30 s each where lactating sows interact with the feeder before feeding behavior. Use the feeding-behavior detection model for lactating sows to detect the videos and record the standing duration. Output the detection results to quantify the leftover feed in the trough and record the leftover feed results. The detection results are shown in Figure 8 and Figure 9.

The feeding behavior of sows is assessed by three stockpersons through daily inspections, food-lure tests, and other comprehensive methods to determine their normalcy. This serves as the basis for analyzing the relationship between feeding behavior, standing duration, and leftover feed quantity. The results of the feeding-behavior detection are shown in Table 8.

From Table 8, it is evident that when the sow’s feeding behavior is abnormal, the average standing duration during interaction with the feeder before feeding decreases by 26.1 s, and the proportion of leftover feed increases by 9.71%. Subsequent experiments will further investigate the feeding-behavior detection results.

Sixty-one video clips, each 30 s long, were selected, comprising 30 sets of normal feeding-behavior data and 31 sets of abnormal feeding-behavior data in lactating sows. The model performs detections every second on the behavior data. The statistical results of standing duration during interaction with the feeder before feeding for each sow in the videos are shown in Figure 10, and the statistical results of leftover-feed proportion are shown in Figure 11.

According to the statistical distribution results in Figure 10 and Figure 11, there is a significant difference in standing-time distribution, with the highest accuracy achieved using 2 s as the threshold for determining standing duration. The distribution of leftover-feed proportion is relatively uniform; therefore, using thresholds of 1%, 2%, 3%, 4%, and 5% to determine feeding abnormalities, the recognition results were calculated using error rate, sensitivity, specificity, and accuracy as evaluation metrics. The error rate estimates the proportion of sows with abnormal feeding behavior incorrectly identified as normal, with lower values indicating fewer false alarms. Sensitivity refers to the proportion of normal feeding sows correctly identified from normal feeding samples, with higher values indicating better detection performance. Specificity refers to the proportion of abnormal feeding sows correctly identified from abnormal feeding samples, with higher values indicating better detection performance.

As shown in Table 9 using 2% as the detection threshold for leftover-feed proportion results in a feeding detection error rate of 3.85%, a sensitivity of 83.33%, a specificity of 96.77%, and an accuracy of 90.16%. The analysis of detection results at different thresholds is shown in Table 9.

To further test the effectiveness of this threshold in detecting feeding behavior, segments of feeding-behavior sequences and detection results from some videos are shown in Table 10. In this table, the standing state is denoted as 0, the lying state as 1, no feed as 2, and with feed as 3. The results are compared with manual testing outcomes. The test results indicate that using 2 s of standing duration and 2% leftover-feed proportion as thresholds for detecting feeding behavior, the model’s results most closely match the manual detection results. The study demonstrates that these thresholds provide optimal performance for feeding-behavior detection.

4. Discussion

Current research has established a strong foundation for studying pig feeding behavior using deep learning techniques. However, most studies primarily focus on identifying feeding behavior and calculating feeding duration, lacking precise methods to determine the normalcy of feeding behavior in lactating sows. This gap hinders early detection of abnormal feeding behaviors, delaying timely diagnosis and treatment crucial for animal welfare and farming efficiency. In this study, we propose a novel method to assess whether sow feeding behavior is normal. Our main innovation lies in collecting interaction- behavior data between lactating sows and inspection personnel before feeding and constructing a deep-learning detection model. This approach addresses the time-consuming and labor-intensive nature of manual inspections and their lack of continuity. To quantify leftover feed in the trough, we process the model detection results to obtain standardized images of the feed-trough leftovers, using traditional image processing methods to calculate the proportion of leftover feed. By analyzing the feeding posture of sows and the condition of the feed trough before feeding, we can determine the standing time and leftover-feed quantity. Combining these factors allows precise detection of the feeding status of lactating sows.

We propose the YOLOv5s-C3SE-WIoU model, which demonstrates significant improvements and innovations in several key aspects, particularly in the introduction of the SE attention mechanism and the optimization of the WIoU loss function. The proposed YOLOv5s-C3SE-WIoU model achieved a [email protected] of 92.2% and a [email protected]:0.95 of 65.8% in detecting the feeding behavior of lactating sows, demonstrating superior performance compared to the original YOLOv5s model. The SE module adaptively adjusts channel weights, thereby enhancing the model’s focus on important features, which significantly improves detection accuracy, especially in scenarios with complex backgrounds or occlusions. Meanwhile, the WIoU loss function incorporates a dynamic non-monotonic focusing mechanism that effectively mitigates the negative impact of low-quality samples during training, offering more precise bounding-box regression and enhancing overall performance. While the SE module enhances detection accuracy, it also increases computational complexity, potentially limiting model deployment in resource-constrained environments. Future research could explore optimizing the SE module or adopting more lightweight attention mechanisms to reduce computational demands while maintaining high detection accuracy. Furthermore, while WIoU performs well on general datasets, it may face optimization challenges on highly imbalanced datasets, particularly when there is a severe imbalance between positive and negative samples. Future work could investigate specialized optimization strategies or loss-function modifications for imbalanced datasets to further enhance the model’s robustness.

Additionally, by analyzing the standing duration and the proportion of leftover feed, we created scatter plots of these indicators and compared them with manual assessments. We found that using a standing duration of 2 s and a leftover-feed proportion of 2% as the standards for feeding-behavior detection yielded optimal results. Accurate monitoring of sow feeding behavior enables the early detection of abnormal behaviors, such as unusually short feeding times or irregular feed consumption. These anomalies are often early indicators of health issues in sows, such as digestive problems, stress responses, or potential diseases. By identifying these issues promptly, farm staff can take swift action, such as adjusting feeding-management strategies or administering necessary medical interventions, to prevent further deterioration of the sow’s health. Additionally, an automated behavior detection system reduces reliance on manual monitoring and provides more continuous and detailed data. This continuous monitoring offers a better understanding of the sow’s daily behavior and health status, aiding in the optimization of feeding environments and management decisions. For example, if the system detects a significant reduction in a sow’s feed intake over a certain period, it can automatically trigger an alert, prompting staff to check feed quality or environmental conditions, to ensure the sow’s nutritional needs are met. Furthermore, by improving the accuracy and real-time nature of detection, the method proposed in this study can also reduce feed waste and optimize feed utilization, thereby lowering costs while meeting the nutritional needs of the sows. This not only positively impacts the farm’s economic efficiency, but also contributes to reducing environmental burdens and promoting sustainable development.

In summary, the detection method proposed in this paper successfully applies deep learning and image-processing technologies to achieve automated detection of sow feeding behavior. However, this study has limitations that need to be addressed in future research. First, the animal experiments were conducted from the third to the twenty-fourth day of the sows’ lactation period, and all sows were in their third parity, limiting the dataset’s diversity. This restriction makes it difficult to validate the model’s performance across sows with varying parities during lactation. Second, the presence of water and feed in the troughs during actual production challenges the accuracy of trough-leftover image segmentation using metrics such as pixel accuracy and intersection over union. Most of the time, the analysis relies on human judgment. Therefore, employing convolutional neural networks for image segmentation to further improve the accuracy of leftover-feed segmentation is a key focus of our future work.

5. Conclusions

The improved YOLOv5s model increased the [email protected] value by 8.9% and the [email protected]:0.95 value by 4.7%, achieving a balance between lightweight design and good detection performance, making it suitable for deployment in large-scale pig farms. The proposed method analyzes the standing duration and leftover-feed quantity based on the feeding posture of sows and the condition of the trough before feeding, as output by the detection model. Using a standing duration of 2 s as the threshold achieves the highest accuracy. With a remaining-feed-area proportion threshold of 2%, the detection error rate is 3.85%, sensitivity is 83.33%, specificity is 96.77%, and accuracy is 90.16%. This approach ensures high sensitivity while reducing the error rate, thereby enabling the identification of abnormal feeding behavior in lactating sows.

Author Contributions

Conceptualization, L.L. (Luo Liu) and S.X.; methodology, L.L. (Luo Liu), S.X. and J.C.; software, L.L. (Luo Liu) and S.X.; validation, L.L. (Luo Liu), S.X. and J.C.; formal analysis, L.L. (Luo Liu); investigation, S.X.; resources, L.L. (Longshen Liu); data curation, S.X. and X.Z.; writing—original draft preparation, L.L. (Luo Liu) and S.X.; writing—review and editing, L.L. (Luo Liu); visualization, L.L. (Luo Liu), S.X. and H.W.; supervision, M.S. and L.L. (Longshen Liu); project administration, M.S. and L.L. (Longshen Liu); funding acquisition, M.S. and L.L. (Longshen Liu). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (Grant No: 2021YFD2000805) and the National Natural Science Foundation of China (Grant No: 32272929).

Institutional Review Board Statement

This study involved only observational data and did not involve any handling of animals; therefore, ethical approval was not required.

Data Availability Statement

The data underlying the results presented in this paper are not publicly available at this time, but may be obtained from the authors upon reasonable request.

Acknowledgments

The authors would like to acknowledge the support from the Jiangsu Lihua Animal Husbandry Co., Ltd (Changzhou, China) for the use of their animals and facilities.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Chen, C.; Zhu, W.; Steibel, J.; Siegford, J.; Han, J.; Norton, T. Recognition of Feeding Behaviour of Pigs and Determination of Feeding Time of Each Pig by a Video-Based Deep Learning Method. Comput. Electron. Agric. 2020, 176, 105642. [Google Scholar] [CrossRef]
Kittawornrat, A.; Zimmerman, J.J. Toward a Better Understanding of Pig Behavior and Pig Welfare. Anim. Health Res. Rev. 2011, 12, 25–32. [Google Scholar] [CrossRef]
Li, D.; Chen, Y.; Zhang, K.; Li, Z. Mounting Behaviour Recognition for Pigs Based on Deep Learning. Sensors 2019, 19, 4924. [Google Scholar] [CrossRef] [PubMed]
Yang, Q.; Xiao, D. A Review of Video-Based Pig Behavior Recognition. Appl. Anim. Behav. Sci. 2020, 233, 105146. [Google Scholar] [CrossRef]
Gomes, B.C.; Andretta, I.; Valk, M.; Pomar, C.; Hauschild, L.; Fraga, A.Z.; Kipper, M.; Trevizan, L.; Remus, A. Prandial Correlations and Structure of the Ingestive Behavior of Pigs in Precision Feeding Programs. Animals 2021, 11, 2998. [Google Scholar] [CrossRef]
Matthews, S.G.; Miller, A.L.; Clapp, J.; Plötz, T.; Kyriazakis, I. Early Detection of Health and Welfare Compromises through Automated Detection of Behavioural Changes in Pigs. Vet. J. 2016, 217, 43–51. [Google Scholar] [CrossRef] [PubMed]
Alameer, A.; Kyriazakis, I.; Bacardit, J. Automated Recognition of Postures and Drinking Behaviour for the Detection of Compromised Health in Pigs. Sci. Rep. 2020, 10, 13615–13665. [Google Scholar] [CrossRef] [PubMed]
Craig, A.; Gordon, A.; Magowan, E. Understanding the Drivers of Improved Pig Weaning Weight by Investigation of Colostrum Intake, Sow Lactation Feed Intake, or Lactation Diet Specification. J. Anim. Sci. 2017, 95, 4499–4509. [Google Scholar] [CrossRef] [PubMed]
De Bettio, S.; Maiorka, A.; Barrilli, L.N.E.; Bergsma, R.; Silva, B.A.N. Impact of Feed Restriction on the Performance of Highly Prolific Lactating Sows and Its Effect on the Subsequent Lactation. Animal 2016, 10, 396–402. [Google Scholar] [CrossRef]
Pedersen, T.F.; Chang, C.Y.; Trottier, N.L.; Bruun, T.S.; Theil, P.K. Effect of Dietary Protein Intake on Energy Utilization and Feed Efficiency of Lactating Sows. J. Anim. Sci. 2019, 97, 779–793. [Google Scholar] [CrossRef]
Gu, X.; Song, H.; Chen, J.; Wang, Z. A Review of Research on Pig Behavior Recognition Based on Image Processing. Int. Core J. Eng. 2020, 6, 249–254. [Google Scholar]
Brown-Brandl, T.M.; Rohrer, G.A.; Eigenberg, R.A. Analysis of Feeding Behavior of Group Housed Growing–Finishing Pigs. Comput. Electron. Agric. 2013, 96, 246–252. [Google Scholar] [CrossRef]
Yang, Q.; Xiao, D.; Lin, S. Feeding Behavior Recognition for Group-Housed Pigs with the Faster R-CNN. Comput. Electron. Agric. 2018, 155, 453–460. [Google Scholar] [CrossRef]
Kim, M.; Choi, Y.; Lee, J.; Sa, S.; Cho, H. A Deep Learning-Based Approach for Feeding Behavior Recognition of Weanling Pigs. J. Anim. Sci. Technol. 2021, 63, 1453. [Google Scholar] [CrossRef] [PubMed]
Li, Y.; Sun, L.; Sun, X. Automatic Tracking of Pig Feeding Behavior Based on Particle Filter with Multi-Feature Fusion. Trans. Chin. Soc. Agric. Eng. 2017, 33, 246–252. [Google Scholar]
Luo, Y.; Xia, J.; Lu, H.; Luo, H.; Lv, E.; Zeng, Z.; Li, B.; Meng, F.; Yang, A. Automatic Recognition and Quantification Feeding Behaviors of Nursery Pigs Using Improved YOLOV5 and Feeding Functional Area Proposals. Animals 2024, 14, 569. [Google Scholar] [CrossRef]
Zhao, Y.; Shi, Y.; Wang, Z. The Improved YOLOV5 Algorithm and Its Application in Small Target Detection. In Intelligent Robotics and Applications; Liu, H., Yin, Z., Liu, L., Jiang, L., Gu, G., Wu, X., Ren, W., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2022; Volume 13458, pp. 679–688. ISBN 978-3-031-13840-9. [Google Scholar]
Li, R.; Wu, Y. Improved YOLO v5 Wheat Ear Detection Algorithm Based on Attention Mechanism. Electronics 2022, 11, 1673. [Google Scholar] [CrossRef]
Yao, J.; Qi, J.; Zhang, J.; Shao, H.; Yang, J.; Li, X. A Real-Time Detection Algorithm for Kiwifruit Defects Based on YOLOv5. Electronics 2021, 10, 1711. [Google Scholar] [CrossRef]
Jiang, Y. Surface Defect Detection of Steel Based on Improved YOLOv5 Algorithm. Math. Biosci. Eng. 2023, 20, 19858–19870. [Google Scholar] [CrossRef]
Qi, J.; Liu, X.; Liu, K.; Xu, F.; Guo, H.; Tian, X.; Li, M.; Bao, Z.; Li, Y. An Improved YOLOv5 Model Based on Visual Attention Mechanism: Application to Recognition of Tomato Virus Disease. Comput. Electron. Agric. 2022, 194, 106780. [Google Scholar] [CrossRef]
Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism. arXiv 2023, arXiv:2301.10051. [Google Scholar]
Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-Captured Scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 2778–2788. [Google Scholar]
Bhargavi, K.; Jyothi, S. A Survey on Threshold Based Segmentation Technique in Image Processing. Int. J. Innov. Res. Dev. 2014, 3, 234–239. [Google Scholar]
Pare, S.; Kumar, A.; Singh, G.K.; Bajaj, V. Image Segmentation Using Multilevel Thresholding: A Research Review. Iran. J. Sci. Technol. Trans. Electr. Eng. 2020, 44, 1–29. [Google Scholar] [CrossRef]
Gupta, G. Algorithm for Image Processing Using Improved Median Filter and Comparison of Mean, Median and Improved Median Filter. Int. J. Soft Comput. Eng. (IJSCE) 2011, 1, 304–311. [Google Scholar]
Chudasama, D.; Patel, T.; Joshi, S.; Prajapati, G.I. Image Segmentation Using Morphological Operations. Int. J. Comput. Appl. 2015, 117, 16–19. [Google Scholar] [CrossRef]
Gao, W.; Zhang, X.; Yang, L.; Liu, H. An Improved Sobel Edge Detection. In Proceedings of the 2010 3rd International Conference on Computer Science and Information Technology, Chengdu, China, 9–11 July 2010; IEEE: Piscataway, NJ, USA, 2010; Volume 5, pp. 67–71. [Google Scholar]

Figure 1. Schematic diagram of data collection equipment.

Figure 2. YOLO v5s network structure diagram.

Figure 3. Original SE module and improved C3SE module architecture.

Figure 4. Roadmap for quantitative research on residual feed in the trough.

Figure 5. Train box-loss comparison of different loss functions.

Figure 6. Comparison of YOLO v5s and YOLO v5s-C3SE-WIoU train curve.

Figure 7. Complex environment detection effect comparison.

Figure 8. Feeding-behavior test results of lactating sows.

Figure 9. Feed-trough test results.

Figure 10. Statistical-result chart of standing time.

Figure 11. Statistical-result chart of remaining material ratio.

Table 1. Labellmg label information and the number of labels.

Labels	Explanation	Number of Trains Images	Number of Valid Images	Number of Tests Images
stand	Standing or sitting posture of lactating sows	742	48	108
lie	Lateral recumbency or sternal recumbency posture of lactating sows	327	80	20
empty	Empty feed troughs	775	74	81
surplus	Feed troughs with residual fodder	294	54	47

Table 2. Experimental parameter configurations.

Parameters	Value
Workers	8
Epochs	200
Learning rates	0.01
Batch size	s, l, m = 16, x = 12
Weight decay	0.0005
Momentum	0.937

Table 3. Model training results of different attention mechanisms.

Model	P	R	[email protected]	[email protected]:0.95
YOLO v5s	0.741	0.936	0.833	0.611
YOLO v5s-C3CBAM	0.774	0.906	0.87	0.64
YOLO v5s-CBAM_Backbone	0.839	0.875	0.902	0.667
YOLO v5s-C3SE	0.833	0.897	0.91	0.654
YOLO v5s-SE_Backbone	0.813	0.951	0.916	0.668
YOLO v5s-C3CA	0.753	0.965	0.885	0.655
YOLO v5s-CA_Backbone	0.788	0.986	0.914	0.682
YOLO v5s-C3ECA	0.759	0.995	0.898	0.667
YOLO v5s-ECA_Backbone	0.784	0.968	0.905	0.668

Table 4. Model training results of different loss functions.

Model	P	R	[email protected]	[email protected]:0.95
YOLO v5s	0.741	0.936	0.833	0.611
YOLO v5s-SIoU	0.761	0.962	0.895	0.655
YOLO v5s-EIoU	0.861	0.821	0.885	0.657
YOLO v5s-Focal_EIoU	0.777	0.935	0.884	0.651
YOLO v5s-AlphaIoU	0.875	0.856	0.895	0.655
YOLO v5s-Alpha_CIoU	0.798	0.941	0.902	0.66
YOLO v5s-WIoU	0.808	0.993	0.906	0.644

Table 5. Improve the ablation experiment of YOLO v5s.

C3SE	WIoU	[email protected]	[email protected]:0.95
×	×	0.833	0.611
√	×	0.91	0.654
×	√	0.906	0.644
√	√	0.922	0.658

Table 6. Comparison of YOLO v5s and YOLO v5s-C3SE-WIoU recognition results.

Model	Labels	P	R	[email protected]	mAP @0.5:0.95
YOLO v5s	All	0.741	0.936	0.833	0.611
	Stand	0.871	1	0.995	0.812
	Lie	0.99	1	0.995	0.745
	Empty	0.649	0.997	0.736	0.482
	Surplus	0.455	0.766	0.606	0.407
YOLO v5s-C3SE-WIoU	All	0.8	0.922	0.922	0.658
	Stand	0.931	1	0.995	0.765
	Lie	0.975	1	0.995	0.728
	Empty	0.772	0.689	0.825	0.531
	Surplus	0.524	1	0.871	0.61

Table 7. Comparison results of the different detection models.

Model	[email protected]	[email protected]:0.95
YOLO v5s	0.833	0.611
YOLO v5x	0.778	0.575
YOLO v5m	0.809	0.576
YOLO v5l	0.758	0.555
YOLO v5s-C3SE-WIoU	0.922	0.658
Faster-RCNN	0.888	0.646

Table 8. Feeding-behavior test results.

Sample ID	Standing Duration (Seconds)	S (%)	Feeding Behavior
1	0	13.88	abnormal
2	0	16.4	abnormal
3	29	0	normal
4	30	0	normal
5	2	8.15	abnormal
6	0	4.01	abnormal
7	27	0	normal
8	17	2.54	abnormal
9	22	5.39	normal
10	1	2.89	abnormal
11	30	2.26	normal
12	28	0	normal
13	0	27.01	abnormal
14	30	0	normal
15	0	3.52	abnormal
16	0	14.01	abnormal
17	25	0.47	normal
18	30	0	normal
19	0	12.81	abnormal
20	30	0	normal

Table 9. Analysis table of different threshold measurement results.

S	Error Rate	Sensitivity	Specificity	Accuracy
1%	4%	80%	96.77%	88.52%
2%	3.85%	83.33%	96.77%	90.16%
3%	12.9%	90%	87.1%	88.52%
4%	20%	93.33%	77.42%	85.25%
5%	23.68%	96.67%	70.97%	83.61%

Table 10. Analysis of feeding-behavior test results.

Video ID	Feeding-Behavior Sequence Recognition	Standing Duration (Seconds)	S (%)	Feeding Behavior
1	030302020202…	30	0.19	normal
27	131313030202…	27	2.26	normal
30	131313131313…	0	6.45	abnormal
45	131313131313…	0	16.41	abnormal
56	030313131313…	2	3.08	abnormal

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Liu, L.; Xu, S.; Chen, J.; Wang, H.; Zheng, X.; Shen, M.; Liu, L. Detection of Feeding Behavior in Lactating Sows Based on Improved You Only Look Once v5s and Image Segmentation. Agriculture 2024, 14, 1402. https://doi.org/10.3390/agriculture14081402

AMA Style

Liu L, Xu S, Chen J, Wang H, Zheng X, Shen M, Liu L. Detection of Feeding Behavior in Lactating Sows Based on Improved You Only Look Once v5s and Image Segmentation. Agriculture. 2024; 14(8):1402. https://doi.org/10.3390/agriculture14081402

Chicago/Turabian Style

Liu, Luo, Shanpeng Xu, Jinxin Chen, Haotian Wang, Xiang Zheng, Mingxia Shen, and Longshen Liu. 2024. "Detection of Feeding Behavior in Lactating Sows Based on Improved You Only Look Once v5s and Image Segmentation" Agriculture 14, no. 8: 1402. https://doi.org/10.3390/agriculture14081402

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Detection of Feeding Behavior in Lactating Sows Based on Improved You Only Look Once v5s and Image Segmentation

Abstract

1. Introduction

2. Materials and Methods

2.1. Animals, Housing, and Data Collection

2.1.1. Experimental Animals, Site, and Time

2.1.2. Data Acquisition

2.1.3. Data Preprocessing

2.1.4. Data Set Realization

2.2. Algorithm Design

2.2.1. Building the Object Detection Model

2.2.2. Experimental Setup

2.3. Quantification of Feed Residue in Troughs Based on Image Processing

3. Results

3.1. Target-Detection Model Improvements

3.1.1. Comparison and Analysis of Different Attention Mechanisms

3.1.2. Comparison and Analysis of Different Loss Functions

3.1.3. Ablation Study

3.1.4. Recognition Results of Different Labels

3.2. Feeding-Behavior Analysis

4. Discussion

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI