1. Introduction
The dairy industry increasingly faces grand challenges due to climate change [1,2], with heat stress being one of the most significant environmental factors affecting dairy cattle [3]. Projected climatic trends indicate a troubling forecast for dairy production in the United States, with heat-stress-related milk production losses anticipated to become substantial by 2080 [4]. Despite current cooling efforts, these losses are juxtaposed against the increasing need to identify judicious uses of natural resources, including water [5].
The adverse effects of heat stress on cattle include diminished milk production, decreased reproductive capability, heightened susceptibility to disease, and potentially increased mortality [6]. These consequences reduce productivity and translate into considerable annual economic losses, estimated in the billions of dollars. The ability of dairy cows to withstand heat, termed thermotolerance, is shaped by a combination of physiological and behavioral traits. These traits are significantly heritable and vary considerably across individual cows, complicating the challenge of effectively managing heat stress within dairy herds [4].
Interpreting cow behavior in relation to heat stress is complicated by the genetic diversity that influences each animal's capacity to manage thermal stress [7]: some cows are genetically better suited to cope with heat, whereas others are behaviorally flexible in dealing with thermal challenges [8]. In particular, drinking and environmental enrichment use are challenging behaviors to quantify, yet they may be the most informative for characterizing how individuals cope with heat stress. The intricate nature of thermal adaptation necessitates integrating sophisticated, non-invasive measures into the genetic selection process to enhance the thermotolerance of dairy herds.
Integrating automated, non-invasive phenotypic indicators of thermotolerance into genetic selection decisions is therefore necessary [4]. However, existing methods for monitoring heat stress are labor-intensive and often fail to provide timely data [9]. Furthermore, growing dairy sizes are juxtaposed against the need to monitor individual animals with fewer employees. Because of this paradigm, there is a persistent need to develop new strategies and technologies for monitoring individual animals in large groups [1].
Cattle have a variety of inherent traits that can be used to identify unique individuals, including coat patterns, iris patterns, retinal patterns, facial features, and muzzle patterns [8]. Holstein cattle, a common dairy breed in the US, are easily recognizable by their distinctive black-and-white patterns. Each cow's pattern of spots is unique, making this morphological feature a useful biometric for individual identification [10].
This paper presents an approach to address these challenges by developing a system that provides real-time, automated monitoring of dairy cows milked in a robotic milking system using artificial intelligence (AI) and computer vision technologies [11]. First, we present an imagery data collection and processing approach that automatically detects and quantifies the drinking and brush use behavior of Holstein dairy cows using their coat patterns. Second, the presented approach performs the fundamental research needed to enable the characterization and development of non-invasive behavioral phenotypes indicative of a cow's ability to withstand heat stress. These behaviors (i.e., brush use and drinking) are integral to maintaining homeostasis, particularly during heat stress [4]. Monitoring an animal's use of these resources provides insight into its inherent water efficiency (e.g., drinking behavior), temperament (e.g., resource use frequency, circadian pattern, and plasticity to environmental conditions), and motivation to engage in pleasurable behaviors (e.g., brush use) that ultimately promote animal welfare [3].
To validate the applicability of the proposed approach, we captured a video dataset consisting of 3421 videos with a total duration of 24 h of continuous recording of dairy cows housed at the T&K Dairy, a commercial dairy partner in Snyder, Texas.
Figure 1 shows examples from the collected video dataset. As shown in the video snapshots, the cows are housed in a single free-stall barn that is divided into six pens (n = 180 cows/pen). Each pen provides cattle with access to four water troughs evenly placed throughout the barn. Near three of the water troughs within the pen, cattle have access to an automatic rotating cattle brush mounted to the barn.
The individual cows that appear in this video dataset were identified using clustering algorithms (e.g., K-means [12]) to assign unique identifiers to individual cows, converting raw video streams into structured, analyzable records stored in a relational database. We then utilized ML-based object detection models (e.g., YOLOv8 [13]) to accurately recognize individual cows by their coat patterns within the complex farm environment. A Convolutional Neural Network (CNN) model [14] is trained on the extracted cow objects to classify each cow into a particular cluster in our database; it is used in conjunction with the DeepSORT algorithm [15] to track cow activities and provide accurate quantification of drinking and brush use behaviors. Finally, a user-friendly GUI is developed to enable system users to operate the system conveniently.
This paper makes the following contributions. First, we present a machine learning approach that can automatically capture, process, and visualize massive video datasets to characterize behavioral phenotypes relevant to thermotolerance in dairy cows. Second, a novel object-tracking module is proposed to detect moving cows' behavior in real-time CCTV footage. Third, this paper presents a web-based GUI built on top of a pipeline of ML models and computer vision algorithms (i.e., K-means, YOLOv8, CNN, and DeepSORT) that allows ranchers to interact with the developed system conveniently.
2. Related Work
Machine learning (ML) coupled with computer vision [14,16,17,18,19] has already enabled game-changing capabilities in robotic milking systems, enhancing dairy cow health management by automating the detection and analysis of heat stress behaviors from CCTV footage [7]. ML and computer vision have been used in the literature for a wide variety of tasks in the dairy cattle domain, including the identification of individual animals [18], the analysis of cow behaviors such as feeding [20] and standing and lying [7], and the detection of health indicators such as lameness [21] and body condition score [17].
Fuentes et al. [11] studied the use of ML and computer vision to identify the age of cattle based on their facial features. The face location was detected in still frames isolated from recorded video using YOLOv5. The authors used MobileNetV2 to extract a 128-dimensional feature vector of the face and aligned it with ResNet18. The extracted feature vector is then fed into an ML model to predict the animal's age. Despite the similarity in scope, our approach differs methodologically in utilizing a pipeline of ML and clustering algorithms to identify individual cows based on side-angle images of their coat patterns.
In [7], the authors used computer vision techniques to detect the lying behavior of dairy cows in a freestall barn. Similar to our work, the authors used a combination of YOLOv5x and DeepSORT to identify and track cows using individual bounding boxes for each cow. Changes in the properties of the bounding boxes were used to identify the start and end of positional change events (i.e., lying down and standing up from a lying position). However, no attempt was made to identify cattle based on their biometrics, and behaviors are detected from the changing properties of a single box, unlike our presented approach, which relies on the overlap of two bounding boxes.
Gupta et al. [22] used the YOLOv4 model to identify cattle by breed. The YOLOv4 model was trained on a custom dataset of eight cattle breeds. The authors evaluated the model using the intersection-over-union metric, precision-recall curves, a confusion matrix, an overall accuracy equation, and Cohen's kappa. The model was experimentally shown to be more effective on smaller, high-resolution images. When compared to other models used for breed detection (e.g., Faster R-CNN, SSD, and YOLOv3), YOLOv4 outperformed all three.
Another work presented in [18] developed a cattle identification method based on coat patterns. Videos were captured from a top angle, resulting in top-down images of the cows' backs. A Mask R-CNN model was used to identify the patterned region of the cow and extract pattern features from the video frames, after which a Support Vector Machine (SVM) was used to identify cows based on those pattern features. The resulting system had an accuracy of 98.67%. While both that project and ours focus on identifying cattle by coat pattern, the methodologies vary significantly: the previous work uses a top-down view of the cow, in contrast to our side view.
Wang et al. [14] used a 3D CNN-based algorithm (E3D) to classify five cow behaviors in video clips: standing, walking, drinking, feeding, and lying down. Videos captured from cattle pens were split into short segments, each containing one of the behaviors of interest. The E3D algorithm comprises several modular parts: a 3D convolution module, a SandGlass-3D module, and an ECA module. The 3D convolution module extracted features from the still video frames, which were then passed through the SandGlass-3D module to identify spatial and temporal properties. Background information from the videos was screened out by the ECA module. A 3D pooling layer and the Softmax function were used as the final processing steps to compress the behavioral features and perform the behavioral classification, respectively. The proposed model achieved high accuracy in detecting and classifying cow behaviors. Although that project and ours are both based on CNN models and both identify multiple behaviors with a single algorithm, that work focuses solely on behavior detection and does not attempt to identify cattle as individuals.
Another study presented in [23] achieved acceptable results in detecting the behaviors of dairy cows. The authors developed a deep learning model called Res-DenseYOLO, an improvement of YOLOv5 that incorporates DenseNet and residual network structures to enhance feature extraction, for the automatic recognition of dairy cow behaviors, specifically standing, lying, eating, and drinking. However, this work did not implement unique identification of individual cows or continuous tracking of behavior durations.
Despite the previous success of identifying cattle by their coat patterns using computer vision, the existing work has notable limitations [17,24]. The imagery datasets used have a small field of view, often captured where cattle walk through narrow passages with limited ability to turn [8,10,25]. Additionally, the lighting is constant, and there may be only one or a few cattle in the frame at a time, all of which simplify the task of identifying cattle by computer vision but limit the potential applications in a busy barn. In contrast, our approach is designed to identify cattle at a distance and in an open space within a broad view frame.
In summary, existing work on the automatic characterization of behavioral phenotypes for dairy cows has used different approaches for cattle identification [11,17,18] and behavioral monitoring [7,11,22] via computer vision and ML. However, none has successfully combined these two objectives into a single platform with a convenient, user-friendly GUI. To the best of our knowledge, our approach represents the first step toward a system that automatically identifies dairy cattle based on biometric features and monitors their behaviors of interest based on interactions with other objects in the barn (i.e., water troughs and brush stations).
3. Design
3.1. Dataset Collection
Figure 2 illustrates the camera placement in the barn at a low angle to capture the side of the cows. As shown in the schematic figure, each barn at the T&K Dairy is fitted with Safevant, Safesky, and 1080P Isotect wireless security cameras that continuously capture individual cow behavior at the waterers and the brushes throughout the 45-day observation period. Cows are milked twice daily using a Lely Robotic Milking System equipped with 18 robots (3 per pen).
Cows are cooled using multiple strategies: the barn is equipped with fans, sprinklers, and foggers. The sprinklers run in one-minute bursts in a round-robin pattern across all pens whenever the air temperature in the barn exceeds 74 °F. Thus, each pen has its sprinklers turned on for one minute at least ten times per hour until the temperature falls below 74 °F. When the temperature in the barn exceeds 80 °F, the fogger system begins operating and continues until the temperature drops below 80 °F.
Several variables are collected by the robotic system and recorded in the Lely herd management software, Time for Cows (T4C). Of specific interest to this project are milk production, yield, maximum milk speed, dead milking time, and robot behavior (i.e., visit, rejection, and fetch frequency). A subset of focal cows (n = 96; 16 cows/pen) that were 45–90 days in milk (DIM) was monitored for a 45-day period.
We captured a video dataset on 12–13 March 2023, consisting of 3421 videos with a total duration of 24 h of continuous recording of the cows. These videos were recorded in DAT format, and we converted them to MP4 format using the FFmpeg conversion tool [26]. This preprocessing step was necessary because the MP4 format offers high compression and compatibility with numerous multimedia applications, making it the preferred choice for seamless playback and processing.
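As an illustration, this batch conversion can be scripted with Python's subprocess module around the FFmpeg CLI; the directory names and codec settings below are assumptions for the sketch, not the exact script used.

```python
import subprocess
from pathlib import Path

SRC_DIR = Path("recordings_dat")   # hypothetical folder of raw .dat recordings
DST_DIR = Path("recordings_mp4")   # hypothetical output folder
DST_DIR.mkdir(exist_ok=True)

for src in sorted(SRC_DIR.glob("*.dat")):
    dst = DST_DIR / (src.stem + ".mp4")
    # Re-encode each camera file to H.264/MP4 for broad playback compatibility.
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(src), "-c:v", "libx264", "-preset", "fast", str(dst)],
        check=True,
    )
```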
These video recordings were used to quantify drinking behavior and brush use behavior. When individual identification was required, each dairy cow's unique coat color spotting pattern was used for identification. During the time individuals were fitted with pedometers, their drinking and brush use behavior (frequency, duration, circadian pattern, and displacements) was decoded from the video recordings. While this is possible using manual decoding methods, the development of automatic ML-based methods can expedite data collection, knowledge creation, and the implementation of results.
3.2. Methodology
The practical side of the proposed approach is to build machine vision and ML methods to support the automatic acquisition and processing of imagery data needed to develop behavioral phenotypes for dairy cows relevant to thermotolerance. The foundational work aims to understand the principles underlying such systems and inform the design and implementation decisions about them.
Figure 3 shows the system architecture of the proposed approach, which is divided into four layers. Layer 1 shows the video preprocessing phase, which involves slicing the collected video dataset (i.e., 3421 videos) into individual frame images at predefined intervals using a Python 3.12.4 script leveraging the FFmpeg framework [26]. We then extracted 1961 cow objects to train the cow clustering algorithm and CNN model. The Roboflow tool [27] was utilized to annotate the cow, brushing tool, and waterer objects for training the detection and segmentation models.
Layer 2 shows the cow detection and segmentation module using the YOLOv8 model, the cow clustering module using the K-means model, and the cow identification module using the CNN and SENet models [28]. Layer 3 describes the application layer, implemented using the Python Flask framework [29] and the SQLite 3.46 database engine [30] as a web-based GUI app that allows the system users, shown in Layer 4, to use the system conveniently.
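To make Layer 3 concrete, here is a minimal sketch of a Flask endpoint backed by SQLite; the route, database file, and table schema are hypothetical, chosen only to illustrate how logged behaviors could be served to the GUI.

```python
import sqlite3
from flask import Flask, jsonify

app = Flask(__name__)
DB_PATH = "cow_behavior.db"  # hypothetical SQLite database of logged activities

@app.route("/cows/<int:cow_id>/activities")
def cow_activities(cow_id):
    # Fetch the drinking/brush-use events recorded for one cow.
    con = sqlite3.connect(DB_PATH)
    rows = con.execute(
        "SELECT activity, start_time, duration_s FROM activities WHERE cow_id = ?",
        (cow_id,),
    ).fetchall()
    con.close()
    return jsonify(
        [{"activity": a, "start": t, "duration_s": d} for (a, t, d) in rows]
    )

if __name__ == "__main__":
    app.run(debug=True)
```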
Using unsupervised and supervised ML models alongside algorithmic tracking and behavior analysis [24], we adopted a pipeline approach for processing the video dataset, in which data flows from one layer to the next.
Figure 4 shows the different phases of cow detection, clustering, identification, and tracking behaviors of interest.
3.2.1. Cow Detection and Segmentation Using YOLOv8
The YOLOv8 model is trained on a custom imagery dataset to accurately detect and segment the cow objects in the video frames. YOLOv8 is a deep learning model chosen for its high-performance, real-time detection of objects within video streams. Upon receiving video input, YOLOv8 processes the frames to identify and locate the cow, water tank, and brushing tool objects, assigning bounding boxes around them.
After detecting the objects of interest (i.e., the cow, water tank, and brushing tool objects), we used a cropping tool to extract the regions inside the bounding boxes generated by YOLOv8 from the frames. This extraction step is vital to isolate the objects of interest from their background, providing cleaner input to the subsequent clustering phase. This process is summarized in Algorithm 1, and a short code sketch follows the listing.
Algorithm 1 Cow detection and segmentation using YOLOv8
1: Import the necessary libraries and define the SAMMaskGenerator class
2: Initialize the model with its type, checkpoint, and device
3: function generate_and_save_mask(...)
4:    Load the image and convert it to RGB
5:    Generate masks and sort them by area
6:    Save the masks and, optionally, the RGBA images
7: end function
8: function process_images(...)
9:    for each image in the directory do
10:     Generate and save a mask for the image
11:     Detect objects using the YOLO model
12:     Save the images with detected objects
13:   end for
14: end function
15: procedure main(...)
16:   Initialize the mask generator and YOLO model
17:   Process the images in the specified folder
18: end procedure
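For reference, the detection-and-crop step can be sketched with the Ultralytics YOLOv8 API as follows; the weights path, input frame name, and confidence threshold are assumptions for illustration.

```python
from pathlib import Path

import cv2
from ultralytics import YOLO

model = YOLO("best.pt")                    # assumed path to custom-trained weights
out_dir = Path("crops")
out_dir.mkdir(exist_ok=True)

frame = cv2.imread("frame_0001.jpg")       # one sliced video frame
results = model(frame, conf=0.5)[0]        # detect cow/water tank/brush objects

for i, box in enumerate(results.boxes):
    x1, y1, x2, y2 = map(int, box.xyxy[0])     # bounding box in pixel coordinates
    label = results.names[int(box.cls[0])]     # class name, e.g., "cow"
    crop = frame[y1:y2, x1:x2]                 # isolate the object from its background
    cv2.imwrite(str(out_dir / f"{label}_{i}.jpg"), crop)
```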
We used the TaskAlignedAssigner class to improve the model's performance by effectively matching the predicted bounding boxes with ground truth boxes. In particular, it calculates an alignment score $t$ for each predicted box, as follows:

$$t = s^{m} \times u^{n}$$

where $s$ is the prediction score corresponding to the ground truth category, $u$ is the IoU of the prediction bounding box and the ground truth bounding box, and $m$ and $n$ are hyperparameters that weight the importance of the classification score and the IoU score, respectively. TaskAlignedAssigner ensures that only predictions that are confident in their class and accurate in their localization are selected as positive samples. This dual consideration helps the model learn more effectively from the classification and localization tasks, leading to improved overall performance in object detection.
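A one-function sketch of this score makes the trade-off explicit; the exponent values here are illustrative, not the defaults used in training.

```python
def task_aligned_score(cls_score: float, iou: float, m: float = 1.0, n: float = 6.0) -> float:
    """Alignment score t = s^m * u^n used to rank candidate boxes.

    cls_score: prediction score for the ground truth category (s).
    iou: IoU between the predicted and ground truth boxes (u).
    m, n: weighting hyperparameters (illustrative values).
    """
    return (cls_score ** m) * (iou ** n)

# A confident, well-localized box outranks an equally confident but poorly
# localized one, so only well-aligned predictions become positive samples:
print(task_aligned_score(0.9, 0.8))  # ~0.236
print(task_aligned_score(0.9, 0.3))  # ~0.00066
```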
3.2.2. Cow Clustering Using K-Means
The extracted cow objects are then fed into a K-means clustering algorithm, an unsupervised learning algorithm that groups the cow objects into clusters based on their visual similarities. The K-means algorithm iteratively assigns each cow object to one of K predefined clusters based on feature similarities, minimizing variance within the clusters and maximizing variance between them.
Algorithm 2 shows the steps of the cow clustering phase, which is divided into the following processes: (i) texture feature extraction using the Local Binary Pattern (LBP) method and (ii) creating color histograms by capturing and analyzing the color distribution in the cow images. We also used the Principal Component Analysis (PCA) method to reduce the feature dimensionality and focus on the most important features of input images.
As shown in the algorithm, we select $K$ initial cluster centroids randomly from the data points. We then assign each data point to the nearest cluster centroid. For a given data point $x_i$ and centroid $\mu_j$, the assignment is performed as follows:

$$c_i = \arg\min_{j} \left\lVert x_i - \mu_j \right\rVert^2$$

where $c_i$ is the cluster assignment for data point $x_i$, and $\lVert x_i - \mu_j \rVert^2$ is the squared Euclidean distance between $x_i$ and $\mu_j$.

The K-means algorithm updates the centroids after each iteration by calculating the mean of the data points assigned to each cluster, as follows:

$$\mu_j = \frac{1}{\lvert S_j \rvert} \sum_{x_i \in S_j} x_i$$

where $\mu_j$ is the new centroid of cluster $j$, and $S_j$ is the set of data points assigned to cluster $j$.

K-means minimizes the Within-Cluster Sum of Squares (WCSS) inertia objective function, defined as:

$$J = \sum_{j=1}^{K} \sum_{x_i \in S_j} \left\lVert x_i - \mu_j \right\rVert^2$$

The algorithm alternates between the assignment and update steps until convergence, typically when the cluster assignments no longer change or the change in the objective function falls below a given threshold.
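Under the assumptions above (LBP texture features, per-channel color histograms, PCA, then K-means), a minimal scikit-learn/scikit-image sketch of the clustering phase could look like this; the glob pattern and cluster count are placeholders.

```python
import glob

import cv2
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def extract_features(image_path: str) -> np.ndarray:
    """LBP texture histogram + per-channel color histograms, concatenated."""
    img = cv2.imread(image_path)
    gray = cv2.equalizeHist(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    color_hists = [
        cv2.calcHist([img], [c], None, [32], [0, 256]).flatten() for c in range(3)
    ]
    feat = np.concatenate([lbp_hist, *color_hists])
    return feat / (np.linalg.norm(feat) + 1e-8)   # normalize the combined vector

paths = sorted(glob.glob("crops/cow_*.jpg"))       # hypothetical cropped cow images
X = np.stack([extract_features(p) for p in paths])
X = StandardScaler().fit_transform(X)
X = PCA(n_components=0.95).fit_transform(X)        # keep 95% of the variance
labels = KMeans(n_clusters=96, n_init=10).fit_predict(X)  # e.g., one cluster per focal cow
```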
3.2.3. Cow Identification Using a CNN and SENet Model
We trained a Convolutional Neural Network (CNN) model enhanced with Squeeze-and-Excitation Network (SENet) layers [28] on the cow clusters generated in the clustering phase to identify cow objects based on their features. The training process allows the CNN to learn the nuanced differences between clusters by calculating a similarity score for each cow against the cluster centroids. If the score exceeds a predefined threshold, the cow is assigned the ID of that cluster; otherwise, the cow is flagged as potentially new or as not belonging to any existing cluster. Algorithm 3 shows the steps of the cow identification phase using the CNN and SENet model.
Algorithm 2 Cow clustering using K-means
1: function extract_features(...)
2:    Read the image from the given path
3:    if the image is not read correctly then
4:      Print an error and return an empty list
5:    end if
6:    Convert the image to grayscale and apply histogram equalization
7:    Calculate the LBP and generate a histogram
8:    Calculate color histograms for each channel
9:    Combine and normalize the histograms; return the combined histogram as a feature vector
10: end function
11: function organize_images(...)
12:   for each cluster and image path do
13:     Create or verify the existence of a directory for the cluster
14:     Copy the image to the corresponding cluster directory
15:   end for
16: end function
17: function clear_directory(...)
18:   if the directory exists then
19:     Remove the directory and its contents
20:   end if
21:   Create the directory
22: end function
23: function visualize_pca_variance(...)
24:   Plot the PCA explained variance
25: end function
26: procedure main
27:   Define the image directory and output directory
28:   Clear the output directory
29:   if the image directory does not exist then
30:     Print an error and return
31:   end if
32:   List all image paths in the directory
33:   Extract features from each image
34:   Remove empty feature lists
35:   if there are no valid features then
36:     Print an error and return
37:   end if
38:   Standardize the features
39:   Apply PCA to reduce the feature dimensionality
40:   Optional: visualize the PCA variance
41:   Cluster the features using K-means
42:   Organize the images into clusters based on their assigned cluster labels
43: end procedure
The SENet block enhances the feature extraction and representation of the trained cow images by dynamically recalibrating channel-wise feature responses.
First, SENet applies a convolution operation $\mathbf{F}_{tr}$ to the input feature map $I$, as follows:

$$X = \mathbf{F}_{tr}(I)$$

where $\mathbf{F}_{tr}$ represents the convolution operation and $X$ is the output feature map with dimensions $H \times W \times C$.
Algorithm 3 Cow identification using CNN and SENet models
1: Define image transformations for data augmentation
2: Load the training, validation, and testing datasets
3: function imshow(...)
4:    Convert the tensor to an image
5:    Display the image with its title
6: end function
7: Define the architecture of the SENetBlock
8: Define the architecture of the CowIdentificationModel
9: Instantiate the model and transfer it to the computing device
10: Define the loss function and optimizer
11: Apply weight initialization to the model
12: function evaluate(...)
13:   Evaluate the model with data from the loader
14:   Calculate and return the accuracy
15: end function
16: function train_model(...)
17:   for each epoch do
18:     Train the model for one epoch
19:     Calculate the validation accuracy
20:     if the validation accuracy improved then
21:       Save the model state
22:     end if
23:   end for
24:   Print the best validation accuracy
25: end function
26: Train the model
27: Load the best model and evaluate it on the test set
28: function accuracies_per_cluster(...)
29:   Calculate and return the accuracy per cluster
30: end function
31: Print the accuracy per cluster for the validation and test datasets
32: Visualize predictions on training images
33: function compute_cluster_centroids(...)
34:   Compute and return the centroid of each cluster in the dataset
35: end function
36: function predict_image(...)
37:   Predict the label of an image using the model and cluster centroids; return the predicted label and similarity
38: end function
39: Define additional transformations for the test images
40: Compute the centroids for the training dataset
41: Predict the label for a test image and check the similarity
42: function calculate_metrics_manual(...)
43:   Calculate and return the precision, recall, and F1 score
44: end function
45: Calculate and print the precision, recall, and F1 score for the validation and test sets
Then, the squeeze operation performs global average pooling on $X$ to generate a channel descriptor $z \in \mathbb{R}^{C}$, which is defined as:

$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_c(i, j)$$

where $z_c$ is the $c$th element of the descriptor $z$.

The excitation operation models the channel-wise dependencies using two fully connected layers with ReLU and sigmoid activations, as follows:

$$s = \sigma\left(W_2 \, \delta(W_1 z)\right)$$

where $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$ are the weight matrices, $r$ is the reduction ratio, $\delta$ is the ReLU activation, and $\sigma$ is the sigmoid function.

Finally, the recalibration step scales the original feature map $X$ by the channel-wise weights $s$, as follows:

$$\tilde{X}_c = s_c \cdot X_c$$

where $\tilde{X}$ is the recalibrated feature map.
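These three operations map directly onto a few lines of PyTorch. The following SE block is a minimal sketch (the layer sizes and reduction ratio are illustrative; the full CowIdentificationModel architecture is not reproduced here).

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global pooling, two FC layers, channel scaling."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)       # z: global average pool per channel
        self.excite = nn.Sequential(                 # s = sigmoid(W2 * relu(W1 * z))
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        z = self.squeeze(x).view(b, c)               # (B, C) channel descriptor
        s = self.excite(z).view(b, c, 1, 1)          # channel-wise weights
        return x * s                                 # recalibrated feature map

# Example: recalibrate a 64-channel feature map from a convolutional layer.
feat = torch.randn(8, 64, 32, 32)
print(SEBlock(64)(feat).shape)  # torch.Size([8, 64, 32, 32])
```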
3.2.4. Tracking Cow Behaviors of Interest Using DeepSORT
The DeepSORT algorithm is used to track the cows' behaviors of interest (i.e., drinking and brushing). DeepSORT extends the SORT (Simple Online and Real-Time Tracking) algorithm by incorporating deep appearance features for more accurate tracking in crowded and complex environments; it can track multiple objects in a video stream while handling challenges such as occlusion and reappearance.
DeepSORT uses the cluster IDs assigned in the previous CNN phase across the video frames and associates the recognized behaviors of interest with individual cows throughout the recorded videos. Algorithm 4 summarizes this process, and a usage sketch follows the listing.
Algorithm 4 Tracking cow behaviors of interest using DeepSORT
1: Input: video_path, output_video_path, model_path
2: Output: annotated video, activity videos, cow images
3: procedure Inference(...)
4:    Initialize the YOLO model with model_path
5:    Initialize the DeepSORT object tracker
6:    Create directories for cow activity videos and images
7:    Load the cow identification model
8:    Define the image transformations
9:    Open the input video and prepare the output video writer
10:   while the video has frames do
11:     Read a frame from the video
12:     Detect objects using the YOLO model
13:     Update the tracks with the DeepSORT tracker
14:     for each detected cow do
15:       Capture the cow image
16:       Transform and classify the cow image to predict its cluster ID
17:       Save the cow image and update the database with the cluster ID
18:       if the cow is performing an activity then
19:         Record the activity duration
20:         Generate and save the activity video
21:         Update the database with the activity information
22:       end if
23:     end for
24:     Write the annotated frame to the output video
25:   end while
26:   Close the video writer and release resources
27: end procedure
28: procedure predict_cluster_id(...)
29:   Transform the image to a tensor
30:   Predict the cluster ID using the cow identification model
31:   return the predicted cluster ID and its probability
32: end procedure
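As a concrete reference, detection and tracking can be wired together with the open-source deep-sort-realtime package; this is a sketch under that assumption, not necessarily the exact library or parameters used in our implementation.

```python
import cv2
from deep_sort_realtime.deepsort_tracker import DeepSort
from ultralytics import YOLO

model = YOLO("best.pt")                  # assumed custom-trained detector
tracker = DeepSort(max_age=30)           # drop tracks unseen for 30 frames

cap = cv2.VideoCapture("barn_cam.mp4")   # hypothetical input video
while True:
    ok, frame = cap.read()
    if not ok:
        break
    detections = []
    for box in model(frame)[0].boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        # DeepSORT expects ([left, top, width, height], confidence, class).
        detections.append(
            ([x1, y1, x2 - x1, y2 - y1], float(box.conf[0]), int(box.cls[0]))
        )
    for track in tracker.update_tracks(detections, frame=frame):
        if track.is_confirmed():
            l, t, r, b = track.to_ltrb()   # track_id persists across frames
            print(track.track_id, l, t, r, b)
cap.release()
```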
We used the Kalman filter to enhance the motion prediction. The Kalman filter predicts the current state $x_t$ of an object based on its previous state $x_{t-1}$, as follows:

$$x_t = F x_{t-1} + B u_t + w_t$$

where $F$ is the state transition matrix, $B$ is the control input matrix, $u_t$ is the control vector, and $w_t$ is the process noise.

The observation model updates the state with new measurements $z_t$, as follows:

$$z_t = H x_t + v_t$$

where $H$ is the observation matrix and $v_t$ is the measurement noise.

The cost matrix $C = [c_{i,j}]$ combines the motion and appearance information to match detections to tracks for better data association:

$$c_{i,j} = \lambda \, d^{(1)}(i, j) + (1 - \lambda) \, d^{(2)}(i, j)$$

where $d^{(1)}(i, j)$ is the Mahalanobis distance between the predicted state of track $i$ and detection $j$, $d^{(2)}(i, j)$ is the cosine distance between their deep appearance feature vectors, and $\lambda$ is a weight parameter to balance the motion and appearance costs.
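The association step then reduces to combining two precomputed distance matrices; a minimal sketch follows (the value of lambda is illustrative).

```python
import numpy as np

def association_cost(d_motion: np.ndarray, d_appearance: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Combined cost matrix c = lam * Mahalanobis + (1 - lam) * cosine distance.

    d_motion: (num_tracks, num_detections) Mahalanobis distances from the Kalman prediction.
    d_appearance: (num_tracks, num_detections) cosine distances between appearance embeddings.
    """
    return lam * d_motion + (1.0 - lam) * d_appearance

# Toy example with 2 tracks and 3 detections; low entries mark likely matches.
dm = np.array([[0.2, 1.5, 3.0], [2.0, 0.4, 1.1]])
da = np.array([[0.1, 0.9, 0.8], [0.7, 0.2, 0.6]])
print(association_cost(dm, da))
```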
3.2.5. Cow Behavior Analysis with Overlap Detection Algorithm
The final phase in our pipeline is quantifying the cow behaviors of interest using an overlap detection algorithm. In particular, we developed an algorithm that calculates the duration each cow spends drinking water or using the brush tool by measuring the overlap area between the bounding boxes of the cow and the water tank or brushing tool. This duration is then logged in the database, which is used to monitor the cows' health indicators over time.
In Algorithm 5, the procedure Calculate the Coordinates of the Bounding Box computes the coordinates of the bounding box of an object of interest from the detector's center-format output $(x, y, w, h)$ and converts the result to a top-left corner format $(x_1, y_1, x_2, y_2)$.
The procedure Check the Existence of Overlap between the Bounding Boxes checks whether the bounding box of a cow object overlaps that of a water tank or brushing tool object. The coordinates of the two boxes, $B_1$ and $B_2$, are fed into the function, which returns True if there is an overlap between $B_1$ and $B_2$; otherwise, it returns False.
In the procedure Calculate the Overlap Area between Two Bounding Boxes, we calculate the overlap area between the input bounding boxes ($B_1$ and $B_2$) as the product of the widths of their horizontal and vertical intersections. Each box's centroid (i.e., the center coordinates of the bounding box) is computed as $((x_1 + x_2)/2, (y_1 + y_2)/2)$, and we then calculate the Euclidean distance between the centroids of the bounding boxes, as described in the procedure Calculate the Euclidean Distance between the Centroids of Bounding Boxes.
We then check the proximity of a target bounding box to the other boxes in a video frame by comparing the Euclidean distance between the target bounding box and every identified box in the scene against a predefined threshold (see the code sketch after Algorithm 5).
Algorithm 5 Cow behavior analysis using the overlap detection algorithm
1: procedure Calculate the Coordinates of the Bounding Box
2:    Require: center coordinates (x, y), width w, and height h
3:    x1 ← x − w/2
4:    y1 ← y − h/2
5:    x2 ← x + w/2
6:    y2 ← y + h/2
7:    Return (x1, y1, x2, y2)
8: end procedure
9: procedure Check the Existence of Overlap between the Bounding Boxes
10:   Require: boxes B1 = (x1, y1, x2, y2) and B2 = (x1′, y1′, x2′, y2′)
11:   if x2 < x1′ OR x2′ < x1 OR y2 < y1′ OR y2′ < y1 then
12:     Return False
13:   else
14:     Return True
15:   end if
16: end procedure
17: procedure Calculate the Overlap Area between Two Bounding Boxes
18:   Require: boxes B1 and B2 in top-left corner format
19:   w_overlap ← max(0, min(x2, x2′) − max(x1, x1′))
20:   h_overlap ← max(0, min(y2, y2′) − max(y1, y1′))
21:   area ← w_overlap × h_overlap
22:   Return area
23: end procedure
24: procedure Calculate the Euclidean Distance between the Centroids of Bounding Boxes
25:   Require: boxes B1 and B2 in top-left corner format
26:   (cx1, cy1) ← ((x1 + x2)/2, (y1 + y2)/2)
27:   (cx2, cy2) ← ((x1′ + x2′)/2, (y1′ + y2′)/2)
28:   d ← sqrt((cx1 − cx2)² + (cy1 − cy2)²)
29:   Return d
30: end procedure
31: procedure Check the Close Proximity of a Target Bounding Box to Other Boxes in a Video Frame
32:   Require: target box Bt, the set of boxes in the frame, and a distance threshold τ
33:   for each box B in the frame do
34:     d ← Euclidean distance between the centroids of Bt and B
35:     if d < τ then
36:       Record B as being in close proximity to Bt
37:       Return True
38:     end if
39:   end for
40:   Return False
41: end procedure
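A compact Python rendering of these procedures, consistent with Algorithm 5 (a sketch, not the verbatim implementation):

```python
import math

def to_corners(x, y, w, h):
    """Convert a center-format box (x, y, w, h) to corner format (x1, y1, x2, y2)."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)

def boxes_overlap(b1, b2):
    """True if two corner-format boxes intersect."""
    return not (b1[2] < b2[0] or b2[2] < b1[0] or b1[3] < b2[1] or b2[3] < b1[1])

def overlap_area(b1, b2):
    """Area of the intersection of two corner-format boxes (0 if disjoint)."""
    w = max(0.0, min(b1[2], b2[2]) - max(b1[0], b2[0]))
    h = max(0.0, min(b1[3], b2[3]) - max(b1[1], b2[1]))
    return w * h

def centroid_distance(b1, b2):
    """Euclidean distance between the centroids of two corner-format boxes."""
    c1 = ((b1[0] + b1[2]) / 2, (b1[1] + b1[3]) / 2)
    c2 = ((b2[0] + b2[2]) / 2, (b2[1] + b2[3]) / 2)
    return math.hypot(c1[0] - c2[0], c1[1] - c2[1])

# Example: a cow box overlapping a trough box counts as one drinking frame;
# accumulated frames divided by the video frame rate give the drinking duration.
cow = to_corners(120, 80, 60, 40)
trough = to_corners(150, 90, 40, 30)
if boxes_overlap(cow, trough):
    print(overlap_area(cow, trough), centroid_distance(cow, trough))
```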
5. Evaluation
We experimentally evaluated our prototype implementation in terms of classification accuracy and performance. For classification accuracy, we observed that our system delivers good results in natural conditions, even when images are captured at varying distances from the camera, orientations, and illumination conditions.
Figure 14 shows an example of successful inference of cow identifiers and their behaviors of interest, along with the duration spent in each activity.
The precision-recall curve, shown in Figure 15, summarizes the trade-off between the true positive rate and the positive predictive value for our YOLOv8 model across different probability thresholds. In other words, it indicates the model's ability to accurately identify the cow objects while maintaining a balance between false positives and false negatives. The curve demonstrates that the model achieves high precision and recall across a wide range of thresholds, attesting to its effectiveness in detecting cows regardless of the sensitivity level and supporting reliable deployment in real-world scenarios.
Precision represents the positive predictive value of our model, while recall measures how many true positives are identified correctly. As shown in the figure, the precision-recall curve bows toward 1.0, meaning that our YOLOv8 model achieves high precision while minimizing the number of false negatives.
The precision ratio describes the performance of our model at predicting the positive class. It is calculated by dividing the number of true positives (TPs) by the sum of TPs and false positives (FPs), as follows:

$$\text{Precision} = \frac{TP}{TP + FP}$$

The recall ratio is calculated as the number of true positives divided by the sum of TPs and false negatives (FNs), as follows:

$$\text{Recall} = \frac{TP}{TP + FN}$$

The overall classification accuracy of our model is calculated as the ratio of correctly predicted observations (i.e., the sum of TPs and true negatives (TNs)) to the total observations (i.e., the sum of TPs, FPs, FNs, and TNs), using this equation:

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$
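These three ratios follow directly from the four confusion-matrix counts; the counts below are toy values for illustration only, not the paper's evaluation data.

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, and accuracy from raw confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

print(detection_metrics(tp=930, fp=40, fn=30, tn=0))
# {'precision': 0.958..., 'recall': 0.968..., 'accuracy': 0.93}
```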
The YOLOv8 model achieved an overall average classification accuracy of 93%, 97.9%, and 99.5% for the cow, water tub, and brush tool objects, respectively. The CNN model for cow identification achieved an overall average classification accuracy, recall, and F1-score of 96%, 97%, and 97%, respectively.
6. Conclusions and Future Work
As intensive dairies grow, the need for automatic cattle monitoring becomes more pressing. Manual observation can be practical on a small scale but quickly becomes infeasible when dairies host hundreds or thousands of cows. Further, there is an increasing need to use modern technologies, including computer vision and AI, to track behavioral changes to alert the farmer of the herd’s health status. This paper presented the design and implementation of an ML-powered approach for automatically characterizing behavioral phenotypes for dairy cows relevant to thermotolerance.
We collected a dataset consisting of 3421 videos spanning 24 h of continuous recording of hundreds of cows at the T&K Dairy in Snyder, Texas. The developed system used computer vision and ML models to monitor two cow behaviors of interest: the drinking and brush use of dairy cows in a robotic milking system. In particular, we utilized the YOLOv8 model to detect and segment the cow, water tub, and brushing tool objects. The K-means algorithm is used to group the cows into clusters, which serve as input to a CNN model that identifies the cows in the videos. We used the DeepSORT algorithm to track cow activities in the barn and, finally, quantified the behaviors of interest using the developed overlap detection algorithm. A user-friendly interface was created on top of the ML models, allowing ranchers to interact with the system conveniently.
We tested our system on a dataset of varied cow videos covering crowded backgrounds, low contrast, and diverse illumination conditions. Our system achieved high precision in object detection and behavior recognition, corroborated by its ability to accurately track and analyze the cow behaviors of interest within a dynamic farm environment. Most notably, the YOLOv8 and CNN models achieved accuracies of 93% and 96% in detecting the objects of interest and identifying the cow IDs, respectively.
In ongoing work, we are looking into opportunities for generalizing our approach to detect a broader range of changes in behaviors or health indicators under various farm conditions [6]. For example, increased mounting or standing behavior can indicate that a cow is going into estrus, while changes in walking and lying behavior can indicate lameness before it is evident enough to be noticed by manual inspection [21]. Another avenue of further improvement is incorporating IoT sensors into the barn to automate data collection and action initiation, such as adjusting environmental conditions in response to detected behaviors, thereby enhancing the system's responsiveness. We expect the developed system to inform genetic selection decisions and to improve dairy cow welfare and water use efficiency.