1. Introduction
The pig farming and pork industry is a crucial component of the global food supply chain, with profound impacts on the economy, society, and the environment [
1]. Pig weight measurement is essential in livestock management, as it reflects growth conditions, helps optimize feeding strategies [
2], and is closely related to feeding costs and feed conversion efficiency, thereby aiding in cost reduction. Weight data support scientific market-out planning, ensuring optimal market weight to maximize economic benefits [
3]. Additionally, weight measurement helps in the timely identification of potential health risks, facilitating effective preventive measures. The application of intelligent weight measurement systems allows farms to record weight data in real time, improving decision-making accuracy and farming efficiency [
4]. Timely and accurate monitoring of pig weight throughout the farming process is essential for effective farm management.
Traditional pig weight monitoring methods typically employ mechanical scales or weighing cages [
5], which involve direct contact and can easily induce stress responses. The main types of stress include group stress, behavioral stress, and physiological stress. These stresses not only affect measurement accuracy but can also harm pig health, production performance, and economic returns [
6]. By contrast, modern weight estimation techniques in animal husbandry have advanced significantly through a range of non-contact technologies [
7,
8]. Compared to traditional contact-based weighing methods, these non-contact technologies not only reduce stress responses but also avoid high costs and operational inconvenience. For example, Pezzuolo et al. [
9] used the Kinect v1 depth camera for non-contact pig weight estimation and validated the effectiveness of the Kinect sensor under various lighting conditions; Hansen et al. [
10] utilized 3D imaging technology to capture cattle images and extracted weight and body condition data from unconstrained animals; Zhang et al. [
11] achieved rapid and accurate weight estimation by extracting features of pig height, body shape, and contour using a regression-based CNN model, with R² values ranging from 0.9879 to 0.9973. Furthermore, He et al. [
12] proposed a method for predicting pig weight based on depth images, using an improved BotNet regression network and a series of preprocessing algorithms, achieving an MAE value of 6.37 kg on 5326 test images. Although deep learning methods perform well in terms of accuracy, they often require high-performance computational resources. Lightweight machine learning models that can run on edge devices are therefore a promising approach to weight estimation.
RGB images, as a low-cost and easily accessible data source, can provide rich information for tasks such as object detection, pose estimation, and 3D Gaussian generation. For instance, Liu et al. [
13] proposed an image segmentation-based point cloud generation method that can produce high-quality point cloud data from RGB images, suitable for the 3D modeling of complex objects such as animals. Shi et al. [
14,
15] developed a 3D surface reconstruction and body size measurement system based on multi-view RGB-D cameras. Liu et al. [
16] introduced a partial convolution-based image completion method that can handle irregular missing regions, making it applicable for local repairs of animal body images. Current systems often face challenges such as immobile equipment, high costs, and a lack of open datasets, making it difficult for research to scale. Within livestock monitoring, Reza et al. [
17] used RGB images for pig posture monitoring and early detection of welfare issues, optimizing farm management practices. Jorquera-Chavez et al. [
18] applied computer vision techniques for continuous animal monitoring and rapid detection of physiological changes associated with market-weight pigs, providing crucial support for the development of intelligent farming systems. Hou et al. [
19] employed the PointNet++ deep learning model to estimate cattle weight using LiDAR-acquired 3D point cloud data, achieving 95.1% accuracy with an RMSE of 10.2 kg. While highly precise, the method’s high computational cost and complex data processing limit its practical application. This highlights the necessity of lightweight, low-cost machine learning approaches for weight estimation. Although these studies are valuable, they often rely on complex setups that are difficult to adapt to small-scale operations. Moreover, there has been limited focus on integrating image feature extraction methods to improve the accuracy of weight estimation. Therefore, to overcome the limitations of existing technologies regarding immobile equipment and high costs, we propose a self-developed fixed-view lifting acquisition trolley that is not only lightweight and portable but also effectively integrates with weight estimation software, addressing the issue of fixed, immobile devices. This innovative solution enables real-time and flexible pig weight monitoring, offering greater practicality and scalability.
This study utilizes features extracted from the segmented mask images of pig RGB images (such as the relative projection area (SR), contour length (LC), body length (BL), body width (BW), and eccentricity (E)) to predict live weight. Compared to existing technologies, the machine learning methods employed not only improve the accuracy of weight estimation but also overcome the limitations of deep learning methods, which require high computational resources. Through this study, we have identified optimal machine learning models for estimating pig weight in a free-moving state. Additionally, we share a dataset of RGB images of pigs in a free-moving state captured at a fixed height, along with corresponding weight information.
Main Contributions:
A machine learning-based method for weight prediction using image feature extraction, integrated into a lightweight and portable acquisition system, that significantly improves prediction accuracy through hyperparameter optimization and model configuration selection. This innovative approach enables real-time, on-site weight estimation without the need for large-scale, fixed equipment, providing a more flexible and cost-effective solution for practical farm management.
This research provides a publicly available RGB image and weight dataset for researchers in the fields of computer vision and precision livestock farming. The pig images in the dataset were captured in a natural standing position and daily living conditions, ensuring data authenticity and representativeness [
20].
2. Materials and Methods
The overall weight estimation process of the method is illustrated in
Figure 1. Using collected pig body measurements, weight, and back image data from the farm, weight estimation tests were conducted. The collected data were preprocessed and then divided to create the required dataset. Next, manually segmented image data were used for transfer learning [
21], where the SAM2 [
22] model, an instance segmentation algorithm based on the Segment Anything Model (SAM), generates a large number of mask images for weight estimation. SAM2 enables accurate object segmentation without requiring extensive labeled training data, making it well-suited for our pig weight estimation task. After removing image portions that could affect the experimental results, image feature extraction was performed, followed by weight estimation testing using machine learning regression models. Finally, the model was evaluated.
The specific information regarding data collection is shown in
Table 1.
The specific process of data collection is as follows: First, the pigs are driven out of the group pens and guided into a cage scale for weight measurement and recording. Then, a tape measure is used to collect and record the body measurements. Next, the pigs are driven into an empty pen where the image collection equipment is located, and the collection vehicle is adjusted to the appropriate position, ensuring that the camera is perpendicular to the pig’s back before starting data collection. During the collection process, the pigs are followed as they move freely, and data are continuously collected until the target number is reached. Finally, the pigs are driven back to the group pen.
2.1. Dataset Construction
2.1.1. Data Acquisition
Body Scale Data Acquisition
The main body measurement parameters for pigs include body length (Length), shoulder width (Shoulder), withers height (Height), chest girth (Chest), abdominal girth (Abdominal), hip girth (Hip), and cannon bone circumference (Cannon Bone), as detailed in
Table 2. Measuring personnel used a tape measure to collect data on the pigs, with the measurement schematic shown in
Figure 2. Each pig was measured twice for body dimensions, and the average value was recorded.
Image Data Acquisition
The image collection equipment used is a self-developed, fixed-angle, suspended collection cart designed by the team (
Figure 3). It carries a main controller and a power supply for the depth camera, both of which are fixed to the cart for data collection and weight estimation. The device is constructed from aluminum profiles and features a crane-like arm for camera installation. The camera is an Orbbec Femto Bolt, and the terminal is a Microsoft Surface Book 2 with an Intel Core i7-8650U CPU. The mobile power supply can power the camera and the terminal simultaneously. The camera is mounted with its lens parallel to the ground, and the distance between the lens and the ground is kept at a set height. This equipment makes data collection more comprehensive and convenient, reduces pig stress, and offers better space efficiency and operational flexibility. It allows pigs to be followed during free movement from a safe distance, which enriches the diversity of pig morphology in the dataset and minimizes the negative impact of stress on the pigs’ growth.
2.1.2. Data Pre-Processing
The camera operates in the WFOV 2X2BINNED mode, which effectively enhances the sensor’s signal strength, improves imaging quality under low-light conditions, and reduces image noise (
Figure 4). It has a wide field of view of 120° × 120°, capable of effective measurements within a range of 0.25 m to 2.88 m. The system captures images at a rate of 30 frames per second, with the RGB camera resolution set to 1920 × 1080 pixels. Throughout the experiment, we collected a comprehensive dataset of back images from 73 pigs with varying body weights.
The next step is data cleaning. Images in which the pig is lying down, images in which the pig’s body is incompletely captured and therefore unsuitable for weight estimation, and overexposed or underexposed images are deleted. Only images in which the pig is standing, with complete body features and appropriate lighting, are retained. The corresponding depth images are then obtained based on the retained RGB images.
2.1.3. Description of the Dataset
After data cleaning, the final dataset retained images where the pigs were standing with complete body features and appropriate lighting. These images meet the requirements for subsequent weight estimation and provide a high-quality foundation for dataset construction. Since no human intervention was made in the pigs’ behavior during the collection process (except for lying down), the images reflect a rich variety of pig morphology. Additionally, the collection pens were spacious and closely simulated the pigs’ daily living environment, allowing them to freely engage in activities such as eating and drinking. Therefore, the collected images accurately reflect the pigs’ natural morphology and behaviors, aligning with their daily life habits. The behavior states of the pigs included in the dataset are shown in
Figure 5.
Detailed information on the dataset structure is provided in
Appendix A: Dataset Structure.
2.1.4. Value of the Dataset
Advancing Computer Vision Algorithm Development: This dataset provides a rich set of training samples for the computer vision field, particularly in tasks like animal behavior analysis and weight estimation. It promotes the optimization and application of machine learning algorithms, fostering innovations in unsupervised learning, object detection, and instance segmentation technologies.
Accurate Weight Estimation and Health Monitoring: Through non-contact weight estimation of pigs using RGB images and combining behavior analysis, this method offers an efficient, low-cost solution for pig management.
This dataset can be used for training and validating machine learning models, particularly for 3D Gaussian-based applications [
23]. It facilitates the generation of 3D models of pigs, enabling accurate weight estimation and further exploration of animal behavior in three-dimensional space.
The PIGRGB-Weight dataset information is shown in
Table 3.
2.1.5. Dataset Classification
This study employed a five-fold cross-validation method to evaluate the performance of the live pig weight prediction model, with a training-to-test set ratio of 4:1 (
Figure 6). To ensure that the weight distribution in both the training and test sets was representative, the weights and related feature values of all pigs were sorted in ascending order. The purpose of this approach was to ensure that each fold in the dataset presented a uniform increase in weight, allowing each fold to cover pigs from different weight ranges and ensuring that the model could effectively predict data across various weight ranges.
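One way to realize such a weight-ordered split is sketched below. It is a minimal illustration rather than the authors’ code: the helper names, the round-robin assignment, and the sample weights are assumptions introduced here. Samples are sorted by weight and dealt to the five folds in turn, so every fold spans the full weight range, and each fold then serves once as the test set, which gives the 4:1 ratio.

```python
def weight_stratified_folds(weights, k=5):
    """Assign sample indices to k folds so each fold spans the weight range.

    Hypothetical helper: indices are sorted by weight and dealt round-robin.
    """
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    folds = [[] for _ in range(k)]
    for rank, idx in enumerate(order):
        folds[rank % k].append(idx)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Illustrative weights in kg (made-up values, not taken from the dataset).
example_weights = [62.0, 118.5, 95.2, 71.3, 104.8, 88.1, 110.0, 79.6, 66.4, 99.9]
example_folds = weight_stratified_folds(example_weights, k=5)
```

When several images belong to the same pig, assigning folds per animal rather than per image would keep one pig from appearing in both the training and test sets; the text does not state which unit was used.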
2.2. Segmentation of Pig Images in the Environment
To achieve precise segmentation of individual target pigs from the images, we first manually annotated the dataset using the EISeg tool [
24], ensuring the quality of the training data. The manually annotated dataset provided a solid foundation for training the segmentation model, ensuring that the model could accurately learn the morphological features of the pigs. Subsequently, we employed the pre-trained SAM2 segmentation model, which performs excellently in image segmentation tasks in complex environments and can effectively handle target segmentation under various poses and backgrounds [
25]. We chose to directly use this model for segmenting individual pigs, making minor adjustments to create SAM2-Pig, so as to fully leverage the general features learned by the model from large-scale datasets.
All input images were standardized before segmentation to ensure image consistency and the stability of model input. When using the SAM2 model for segmentation, the model could accurately identify and segment the pig regions from the images in a short time, especially maintaining high segmentation accuracy even under complex backgrounds or when the pigs’ postures varied significantly. The segmented images were primarily used for the subsequent weight estimation task.
2.3. Image Feature Extraction
After image segmentation, to improve the accuracy of weight estimation, a continuous morphological opening operation method with adaptive kernel size was used for image processing. This method is typically employed to exclude unnecessary parts of the image, particularly to eliminate noise or irrelevant areas in the background of the target object. In this experiment, it was used to remove irrelevant parts such as the ears, tail, and legs from the pig mask images [
26]. These areas occupy a large portion of the image but are not directly related to the actual weight of the pig, so they needed to be excluded from the mask to avoid interference with the subsequent analysis. The result after segmentation is shown in
Figure 7. Additionally, to confirm the reliability of this approach, in the Discussion section, we will perform feature extraction on images without removing these body parts and conduct weight estimation tests under the same experimental conditions, except for the data.
Regarding the selection of feature values for image feature extraction, the relative projection area of the pig’s back (SR), contour length of the pig (LC), body length (BL), and body width (BW) were chosen. Since the images were collected while the pigs were moving, their postures were relatively free and could be either straight or curved, so eccentricity (E) was introduced as a feature to correct for this variation [
5].
2.3.1. Relative Projection Area (SR)
The relative projection area is the ratio of the pig’s back area in the mask image to the total area of the entire binary image (1):
$$S_R = \frac{N_{255}}{N_{\text{total}}} \tag{1}$$
where $N_{255}$ refers to the number of pixels in the image with a value of 255 (white) and $N_{\text{total}}$ refers to the total number of pixels in the image.
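The ratio in (1) amounts to counting white pixels. The NumPy sketch below shows this on a synthetic mask; the function name and the toy mask are assumptions made for illustration.

```python
import numpy as np


def relative_projection_area(mask):
    """Return the fraction of pixels in a binary mask that equal 255."""
    mask = np.asarray(mask)
    return float(np.count_nonzero(mask == 255)) / mask.size


# Synthetic 100 x 200 mask with a 50 x 80 white block (illustrative only).
toy_mask = np.zeros((100, 200), dtype=np.uint8)
toy_mask[25:75, 60:140] = 255
toy_sr = relative_projection_area(toy_mask)
```

Because the value is relative to the whole frame, it is comparable between images only when the camera height and resolution are fixed, as they are in this acquisition setup.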
2.3.2. Contour Perimeter (LC)
Contour length is the boundary length of the pig’s back in the mask image, which refers to the total number of pixels along the contour of the pig’s body (2):
$$L_C = \text{cv2.arcLength}(\text{contours}, \text{True}) \tag{2}$$
where “contours” are obtained using cv2.findContours and “True” indicates that the contour is closed [27].
2.3.3. Body Length (BL) and Body Width (BW)
Body length is the length of the longer side of the minimum bounding rectangle enclosing the pig’s back in the mask image, and body width is the length of the shorter side of the same rectangle (3) and (4):
$$BL = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \tag{3}$$
$$BW = \sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2} \tag{4}$$
where box[i] = $(x_i, y_i)$ and box consists of the four corner coordinates (box[0], box[1], box[2], box[3]) of the minimum bounding rectangle returned by cv2.minAreaRect(contours), as given by cv2.boxPoints. The values box[1] and box[2] correspond to the longer side of the rectangle, and box[0] and box[1] correspond to the shorter side. By explicitly using the Euclidean distance formula, we ensure an accurate computation of body length and body width.
2.3.4. Eccentricity (E)
The contour of the pig’s back in the mask image is extracted, and an ellipse is fitted to describe the contour shape using the least squares method. The eccentricity of the pig’s body is then calculated as the square root of one minus the squared ratio of the ellipse’s short axis to its long axis. This method effectively describes the degree of deviation in the pig’s body shape and provides an important geometric feature for weight estimation and other analyses (5),
$$E = \sqrt{1 - \left(\frac{b}{a}\right)^2} \tag{5}$$
where $a$ is the major axis of the ellipse, and $b$ is the minor axis of the ellipse.
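A small helper for this feature is sketched below, assuming the standard definition of ellipse eccentricity, $\sqrt{1 - (b/a)^2}$; the exact expression used in the study may differ. The axis lengths could come from an ellipse fit such as `cv2.fitEllipse`, which returns the two full axis lengths; since only their ratio matters, full or semi-axes give the same result. The function name is hypothetical.

```python
import math


def eccentricity(major_axis, minor_axis):
    """Eccentricity of an ellipse from its axis lengths (0 for a circle)."""
    if major_axis <= 0 or minor_axis <= 0:
        raise ValueError("axis lengths must be positive")
    # Order the axes so the function does not depend on argument order.
    a, b = max(major_axis, minor_axis), min(major_axis, minor_axis)
    return math.sqrt(1.0 - (b / a) ** 2)


# A 5:3 ellipse has eccentricity 0.8 under this definition.
example_e = eccentricity(5.0, 3.0)
```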
2.4. Regression Models
This study uses multiple regression algorithms for feature training, including Ordinary Least Squares (OLS), Support Vector Regression (SVR), Backpropagation Neural Networks (BPNNs, including trainbr, trainlm, trainscg, and traincgb), AdaBoost, CatBoost, XGBoost, and Random Forest (RF). OLS regression is a classic linear regression method aimed at fitting the data by minimizing the sum of squared errors, suitable for scenarios with a strong linear relationship and good interpretability [
28]. SVR performs regression using the Support Vector Machine algorithm, effectively handling nonlinear relationships and utilizing the kernel trick to find the optimal regression hyperplane in high-dimensional space, thereby enhancing the robustness of the model [
29]. The BPNN is a multilayer feedforward neural network that uses the backpropagation algorithm to update network weights and can handle complex nonlinear data [
30]. In the BPNN, trainlm (Levenberg–Marquardt algorithm) and trainbr (Bayesian regularization) are two commonly used optimization variants that accelerate convergence and prevent overfitting. In addition, the trainscg (Scaled Conjugate Gradient) optimization algorithm is used for training large-scale datasets, providing a more stable training process by extending the conjugate gradient method and effectively avoiding convergence problems in traditional gradient descent methods [
31]. In contrast, traincgb (conjugate gradient backpropagation with Powell–Beale restarts) is another variant of the conjugate gradient method, which periodically resets the search direction to the negative gradient when successive gradients are no longer sufficiently orthogonal [
32]. AdaBoost is an ensemble learning method that iteratively trains multiple weak regressors, increasing the weight of samples with large prediction errors in each iteration to improve model accuracy [
33]. CatBoost is based on gradient boosting trees and has strong capabilities in handling categorical features, improving model performance through loss function optimization and feature ranking [
34]. XGBoost improves traditional gradient boosting trees by introducing regularization terms and second-order Taylor expansions, significantly improving fitting accuracy and computational efficiency, with strong generalization ability [
35,
36]. Random Forest (RF) integrates multiple decision trees and averages their predictions to improve prediction performance, effectively reducing the variance of a single decision tree and enhancing the model’s robustness and stability [
37]. By using these regression algorithms in combination, this study can select the most suitable model for weight estimation in different data types and task scenarios, maximizing the model’s predictive performance.
In all models, we consistently used the following parameter settings to ensure the stability and efficiency of the training process: we employed an Early Stopping strategy to prevent overfitting. The batch size was set to 100, and the specific training parameters were configured as follows: the learning rate (net.trainParam.lr) was set to 0.01, the goal error (net.trainParam.goal) was set to 1 × 10⁻⁶, and the minimum gradient (net.trainParam.min_grad) was set to 1 × 10⁻⁷. These settings ensured consistency across all models during the training process, helping the models achieve optimal performance and effectively prevent overfitting.
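We used the coefficient of determination (R²) (6), the mean absolute error (MAE) (7), the mean squared error (MSE) (8), and the root mean square error (RMSE) (9) as measures to evaluate quality. They are defined as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{6}$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{7}$$
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \tag{8}$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{9}$$
where $n$ is the sample size, $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the actual values.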
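The four metrics follow directly from their standard definitions. The sketch below computes them in plain Python on a handful of made-up weights; the function name and the sample values are assumptions for illustration only.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE, and RMSE for paired actual and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_true) ** 2 for y in y_true)          # total sum of squares
    mae = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n
    mse = ss_res / n
    return {"R2": 1.0 - ss_res / ss_tot, "MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Made-up weights in kg, for illustration only.
actual = [80.0, 90.0, 100.0, 110.0]
predicted = [82.0, 89.0, 97.0, 112.0]
scores = regression_metrics(actual, predicted)
```

MAE and RMSE are in kilograms, while MSE is in squared kilograms, so RMSE is usually the easier of the two squared-error metrics to compare against MAE.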
3. Results
3.1. Image Segmentation Results
Based on 1309 manually annotated samples, we applied the pre-trained SAM2 segmentation model to segment the single-target pigs. The model successfully extracted the pig regions from the input images, even in complex backgrounds and diverse postures, demonstrating good segmentation performance. In total, the model produced 9579 mask images (
Figure 8), achieving precise segmentation results, which shows its good adaptability and stability in various scenes and pig postures. With these segmentation results, we provided accurate image data for the subsequent weight estimation task, ensuring the reliability of the analysis. Overall, the SAM2 segmentation model was still able to effectively complete the target pig segmentation task with a relatively small dataset, providing high-quality input data for the subsequent weight estimation and verifying its feasibility and effectiveness in practical applications (
Figure 9).
3.2. Feature Extraction Results
After removing features such as the ears, tail, and legs, which occupy a large area in the back image but contribute little to the actual weight, the extraction and calculation of various features from the mask images resulted in a total of 9579 data entries for 73 pigs.
3.3. Weight Prediction Results
After the feature values were extracted, they were standardized using the z-score, which rescales each feature to zero mean and unit standard deviation, in order to eliminate the influence of inconsistent units between different features during model training:
$$z = \frac{x - \mu}{\sigma}$$
where $x$ is the original value of the data point to be standardized, $\mu$ is the mean of the entire dataset, calculated as the average of all data points, and $\sigma$ is the standard deviation of the dataset, which measures the spread or dispersion of the data points.
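A per-feature standardization step might look like the following NumPy sketch. The function names and the sample values are assumptions; estimating the mean and standard deviation on the training rows only and reusing them for the test rows is a common precaution against leakage, whereas the text above describes statistics computed over the entire dataset.

```python
import numpy as np


def fit_standardizer(train_features):
    """Return per-feature mean and standard deviation from the training rows."""
    train_features = np.asarray(train_features, dtype=float)
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return mean, std


def standardize(features, mean, std):
    """Apply z = (x - mean) / std column-wise."""
    return (np.asarray(features, dtype=float) - mean) / std


# Two made-up features on very different scales (an area ratio and a length in px).
train = np.array([[0.10, 400.0], [0.12, 450.0], [0.14, 500.0]])
mu, sigma = fit_standardizer(train)
train_z = standardize(train, mu, sigma)
```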
In terms of feature selection, this study used the relative projection area, contour length, body width, body length, and eccentricity as input variables for the regression algorithms. These features help the model better capture the relationship between body shape and weight, thereby improving prediction accuracy. Next, the experiments were conducted in the Python 3.12.3 and PyTorch 2.5.1 environments, with a system supporting NVIDIA GPUs, specifically with NVIDIA driver version 475.14 and CUDA version 11.4. In this environment, we used several regression models, including OLS, SVR, AdaBoost, CatBoost, XGBoost, and RF, for the weight prediction task. The Backpropagation Neural Network (BPNN), including trainbr, trainlm, trainscg, and traincgb, was trained and tested on the MATLAB platform (MATLAB R2024a). By comparing the performance of these algorithms on different training sets, the effectiveness and robustness of each model in predicting live pig weight were evaluated.
To identify the most suitable BPNN configuration, a series of experiments were conducted to fine-tune the network structure, training algorithms, number of hidden neurons, and other key parameters. Various hidden layer configurations were tested, including single-layer and dual-layer combinations. Specifically, configurations such as [10], [20], and [50] for single hidden layers and [10, 10], [20, 10], [50, 30], [100, 50] for dual hidden layers were evaluated. Through experimentation, it was found that smaller single-layer networks (e.g., [10], [20]) are suitable for faster training and fewer computations, whereas dual-layer networks (e.g., [20, 10], [50, 30]) showed improved accuracy and generalization performance. Ultimately, by employing optimization algorithms such as trainbr, trainlm, trainscg, and traincgb and adjusting hyperparameters like the learning rate and training epochs, the optimal configuration for the BPNN network model was determined for this dataset (
Table 4).
The BPNN with the trainlm [20, 10] configuration performed the best, achieving the lowest MSE, RMSE, and MAE, and the highest R². This indicates its superior prediction accuracy and fitting performance on this dataset. Following that, CatBoost and XGBoost also showed strong performance, particularly in terms of R² and error metrics, demonstrating strong predictive power. OLS also performed relatively stably, making it suitable for cases with strong linear relationships, and it provided good interpretability. AdaBoost and SVR, however, performed weaker with larger errors, likely due to their inability to fully capture the nonlinear characteristics of the data.
Overall, these results suggest that the model can maintain high accuracy across this dataset. As shown in
Figure 10 and
Figure 11, each model demonstrated strong prediction performance. The carefully designed dataset facilitates optimal predictive results across various regression algorithms, demonstrating excellent adaptability and efficiency.
Figure 10 displays the scatter plots of the predicted values versus the actual values for each model. The models in the figure are arranged in descending order of their R² values, with the highest-R² model placed at the top. This arrangement allows for an intuitive comparison of the prediction performance of each model, highlighting their alignment with the actual observed values.
Figure 11 shows the scatter plot of the optimal model, Trainlm. In this plot, the predicted weight values are tightly clustered around the ideal line (1:1 relationship), indicating the model’s excellent prediction accuracy. The closeness between the predicted values and actual values demonstrates that this model has high precision in estimating pig weight, with minimal deviation.
3.4. Supplementary Analysis and Further Research Directions
In this section, we discuss potential areas for improvement and optimization in the experiment. For instance, we will explore the possibility of directly using the collected pig body measurement data to develop the weight estimation model and investigate the relationship between body measurements and weight. Additionally, we will explore the use of the Stacking Regressor ensemble learning method, which could enhance model prediction accuracy by identifying and emphasizing the features most influential for weight prediction (such as chest girth and abdominal girth). Furthermore, we will assess the effect of changing the dataset partitioning method to determine whether the dataset demonstrates strong reliability and generalization ability in practical applications. We will also investigate whether removing or retaining areas such as the pig’s ears and tail in the mask significantly impacts weight estimation accuracy. Finally, we will apply the Pearson correlation coefficient to analyze the potential relationships between image features and body measurement data, offering a theoretical basis for effectively utilizing image features in weight estimation.
3.4.1. Analysis of Manual Measurement Data
We recorded manually collected body size data covering body measurement characteristics such as body length, shoulder width, and withers height. These body measurements will be used as input characteristics to provide a practical basis for further research on the significance and impact of body measurement information on pig weight estimation.
Multiple Linear Regression Model
In the present experimental framework, manually measured body measurement data of pigs collected during image acquisition were utilized to construct a weight estimation model. The use of multiple linear regression (MLR) for weight estimation is a classic and effective statistical methodology. By analyzing the linear relationship between various body measurement variables and body weight, a regression equation is derived that enables accurate weight estimation. The multiple linear regression model can be expressed through the following weight estimation formula:
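$$W = \beta_0 + \beta_1 A + \beta_2 B + \beta_3 C + \beta_4 D + \beta_5 E + \beta_6 F$$
where $W$ is the estimated body weight, $\beta_0, \dots, \beta_6$ are the fitted regression coefficients, A represents Shoulder, B represents Height, C represents Hip, D represents Chest, E represents Abdominal, and F represents Cannon Bone.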
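Fitting such an equation is an ordinary least-squares problem. The sketch below uses `numpy.linalg.lstsq` on synthetic data; the function names, the generated measurements, and the generating coefficients are all made up for illustration and are not the coefficients obtained in this study.

```python
import numpy as np


def fit_mlr(X, y):
    """Fit y = b0 + X @ b by ordinary least squares; returns [b0, b1, ..., bp]."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict_mlr(coef, X):
    """Evaluate the fitted equation on new measurement rows."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]


# Synthetic data only: 30 "pigs", 6 measurements, made-up generating coefficients.
rng = np.random.default_rng(0)
X_demo = rng.uniform(20.0, 120.0, size=(30, 6))
true_coef = np.array([-50.0, 0.2, 0.1, 0.3, 0.6, 0.5, 0.4])
y_demo = true_coef[0] + X_demo @ true_coef[1:]
fitted = fit_mlr(X_demo, y_demo)
```

On noise-free synthetic data the fit recovers the generating coefficients; with real measurements the residuals would reflect both measurement error and any nonlinearity in the relationship.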
The weight estimation model was developed using manually measured body dimensions of pigs obtained during image acquisition, employing multiple linear regression techniques. By constructing a multiple linear regression equation, the model provides an effective method for predicting the weight of pigs based on their body measurements. This approach transforms the raw body measurement data into a robust mathematical framework, which not only enhances the accuracy of weight estimation but also offers practical applicability for further research.
Feature Importance Score
The Feature Importance Score for pig weight prediction was derived from a Stacking Regressor [
38] built with Random Forest Regressor and Gradient Boosting Regressor as base learners and Linear Regression as the final regressor. Each row in the resulting table represents a feature and its corresponding contribution value. The magnitude of the value reflects the degree to which each feature influences the prediction outcome: a higher contribution value indicates a greater impact of that feature on the prediction result, as illustrated in
Figure 12.
The evaluation results demonstrate that chest girth and abdominal girth are the most influential features in predicting pig weight, accounting for the majority of the model’s explanatory variance. This observation aligns closely with the growth characteristics of pigs, where these dimensions directly correlate with body weight, thereby reinforcing the central role of these features in weight prediction. In contrast, features such as withers height, body length, hip girth, cannon bone circumference, and shoulder width contribute relatively less and may serve as auxiliary features. These less impactful features can be selectively retained during the model optimization process, depending on practical requirements. This analysis not only reveals the relative importance of each feature in weight prediction but also offers valuable insights for feature engineering, providing a clear strategy for optimizing model performance and improving prediction accuracy in real-world applications.
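A stacked model of this shape can be assembled with scikit-learn, as sketched below. The text does not say how the importance score was extracted from the stacked model, so permutation importance is used here as a stand-in technique; the synthetic data, the tree counts, and the variable names are likewise assumptions.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the body measurements: feature 0 dominates by construction.
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(200, 4))
y_syn = 5.0 * X_syn[:, 0] + 1.0 * X_syn[:, 1] + rng.normal(scale=0.1, size=200)

stack = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingRegressor(n_estimators=50, random_state=0)),
    ],
    final_estimator=LinearRegression(),
)
stack.fit(X_syn, y_syn)

# Permutation importance: drop in score when one feature column is shuffled.
result = permutation_importance(stack, X_syn, y_syn, n_repeats=5, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]  # most important feature first
```

Permutation importance is model-agnostic, which makes it usable on a stacked ensemble that has no single built-in importance attribute; impurity-based importances from the base learners would be an alternative.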
3.4.2. Train–Test Split
In the data partitioning stage, we compared the Train–Test Split and K-Fold Cross-Validation methods to evaluate the model’s performance under different partitioning strategies. A single split is quick to run and is suitable for large datasets or limited computational resources. However, its results can fluctuate considerably, especially when the dataset is small or unevenly distributed, which can lead to biased evaluation results, as shown in Table 5. In contrast, K-Fold Cross-Validation repeatedly partitions the dataset and validates on each fold in turn, providing a more stable and comprehensive evaluation that is particularly suitable for small datasets or scenarios requiring high-precision evaluation.
In our experiments, K-Fold Cross-Validation gave the more reliable evaluation. For the RF and XGBoost models in particular, the K-Fold method produced more consistent performance on the test set, with smaller errors and better stability. The Train–Test Split method, owing to the randomness of a single partition, produced larger test set errors and higher variability in the evaluation results. This comparison shows how strongly the partitioning strategy affects model evaluation and supports the reliability of the dataset across strategies. K-Fold Cross-Validation proved to be the more robust evaluation method in this study because it reduces the random effects of data partitioning. For future related research, especially with small datasets or where high-precision evaluation is required, K-Fold Cross-Validation should be the preferred method.
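The difference between the two partitioning strategies can be sketched as follows. This is a minimal standard-library example, assuming a one-feature linear model and synthetic data; the sample size, the generating rule, the noise level, and the choice of five folds are hypothetical and do not reproduce the settings behind Table 5.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def train_test_indices(n_samples, test_ratio=0.2, seed=0):
    """Single random split into (train, test) index lists."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n_samples * test_ratio))
    return idx[n_test:], idx[:n_test]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def mae_on_split(xs, ys, train, test):
    """Fit on the training indices and return the MAE on the test indices."""
    slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return sum(abs(ys[i] - (slope * xs[i] + intercept)) for i in test) / len(test)


# Synthetic data: hypothetical chest girth (cm) and weight (kg) with noise
rng = random.Random(42)
xs = [rng.uniform(80, 130) for _ in range(30)]
ys = [1.5 * x - 60 + rng.gauss(0, 3) for x in xs]

fold_maes = [mae_on_split(xs, ys, tr, te) for tr, te in k_fold_indices(len(xs), k=5, seed=1)]
single_mae = mae_on_split(xs, ys, *train_test_indices(len(xs), test_ratio=0.2, seed=1))

print("K-fold MAE per fold:", [round(m, 2) for m in fold_maes])
print("K-fold mean MAE:", round(sum(fold_maes) / len(fold_maes), 2))
print("single-split MAE:", round(single_mae, 2))
```

The single-split score depends on which samples happen to land in the test set, whereas the K-fold mean averages over every sample being tested once, which is why it tends to vary less from one random seed to the next.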
3.4.3. Effect of Retaining Large but Low-Weight Features on Weight Prediction
Xie et al. [39] observed that the ears and tail of pigs occupy a significant portion of the image area but contribute minimally to weight prediction. Based on this observation, we removed the ear, tail, and leg regions from the images in our experiments. To test this choice, we conducted a comparative experiment using the original, unprocessed image data, with none of these parts removed. The weight prediction performance obtained with the unprocessed images is presented in Table 6.
The experimental results indicate that removing the regions of the pig images that occupy a large area but contribute relatively little to weight (such as the ears, tail, and legs) significantly improves the model’s prediction accuracy. This preprocessing strategy makes the model results more interpretable and helps the model focus on the primary features that are highly correlated with weight. The finding highlights the role of data preprocessing in model training and supports removing irrelevant parts during the data processing phase. It also offers a reference for designing data preprocessing strategies in similar tasks.
3.4.4. Pearson Correlation Analysis
The Pearson correlation coefficient quantifies the strength of the linear relationship between each feature and the actual weight, which makes it a useful basis for feature selection in weight prediction models. It also reveals relationships among the features themselves, deepening the understanding of the data and exposing redundancy. Selecting the most valuable features on this basis reduces the interference of redundant features in model training and helps improve the accuracy of weight estimation.
Manual Body Measurements with Weight Measurements
According to the Pearson correlation coefficient matrix (Figure 13), weight correlates most strongly with body length, chest girth, and abdominal girth, with coefficients of 0.91, 0.96, and 0.94, respectively, indicating a strong linear association between these features and weight. The correlations of weight with shoulder width and withers height are also strong (0.82 and 0.89) but lower than those of chest girth and abdominal girth. The correlations of weight with hip girth (0.20) and cannon bone circumference (0.58) are relatively weak, so these features may be considered for exclusion during feature selection. Overall, chest girth, abdominal girth, and body length are key features for weight prediction, while shoulder width and withers height can serve as auxiliary features.
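The correlation-based screening described above can be sketched in a few lines. The function `pearson` implements the standard Pearson formula, and `reported_r` copies the coefficients quoted in the text for Figure 13. The cut-off of 0.6 and the small measurement lists used to exercise the function are assumptions made for illustration, not values used in this study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def select_features(correlations, threshold):
    """Keep features whose absolute correlation with weight reaches the threshold."""
    return [name for name, r in correlations.items() if abs(r) >= threshold]


# Correlations with weight as quoted in the text (Figure 13)
reported_r = {
    "body length": 0.91,
    "chest girth": 0.96,
    "abdominal girth": 0.94,
    "shoulder width": 0.82,
    "withers height": 0.89,
    "hip girth": 0.20,
    "cannon bone circumference": 0.58,
}
selected = select_features(reported_r, threshold=0.6)  # 0.6 is a hypothetical cut-off
print("retained features:", selected)

# Hypothetical measurements, only to show how a matrix would be computed
demo = {
    "weight": [77.0, 98.0, 113.0, 88.0, 134.0, 102.0],
    "chest girth": [90.0, 100.0, 110.0, 95.0, 120.0, 105.0],
    "hip girth": [88.0, 97.0, 91.0, 99.0, 94.0, 90.0],
}
matrix = correlation_matrix(demo)
print("r(weight, chest girth) =", round(matrix["weight"]["chest girth"], 3))
```

With the quoted coefficients, any cut-off between 0.58 and 0.82 excludes hip girth and cannon bone circumference and retains the other five features, which matches the selection suggested in the text.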
3.4.5. Applicability in Different Test Scenarios
Our experimental acquisition device has low requirements for the data collection environment and can adapt to various farming conditions, such as large-scale commercial farms, smallholder operations, and different flooring materials and lighting conditions. This flexibility ensures that the proposed method can effectively capture data and estimate weight across different farming scenarios. However, since our dataset primarily consists of specific pig breeds, variations in body shape, back contour features, and coat color among different breeds may affect the model’s generalization ability. For instance, more compact or elongated breeds may exhibit different relationships between key features—such as relative projection area, body length, and body width—and weight. Additionally, dark-colored or patterned pigs may pose challenges for RGB-based feature extraction, potentially impacting the model’s prediction accuracy.
The dataset collection method may also introduce bias. In particular, the fixed camera height and the limited number of pig breeds in the dataset may reduce the robustness and generalizability of the model’s predictions. Because differences in body shape and coat color influence feature extraction, the model’s applicability to a wider range of pigs and environments may be limited.
To improve the applicability of our approach, we plan to expand the dataset in future studies by incorporating more pig breeds and varying environmental conditions. This will enhance the model’s robustness and generalization, ensuring better performance across diverse breeding environments and a broader spectrum of pig breeds.
4. Conclusions
The image data used in this study were collected on a pig farm from pigs in a free-standing state, including activity scenes such as drinking, feeding, and running. The data were captured using a self-developed suspended fixed-angle acquisition cart, ensuring the stability and consistency of the image data. Using SAM2-Pig for instance segmentation, mask images of the pigs were extracted, and features such as relative projection area, contour length, body length, body width, and eccentricity were then obtained. Based on these feature values, machine learning regression models were built to estimate the pigs’ body weight. The experimental results showed that the BPNN with the Trainlm configuration performed best across all evaluation metrics, with the smallest MSE, RMSE, and MAE and the highest R² value, demonstrating the superior prediction accuracy of this model on the dataset. This study thus identified a suitable machine learning model for estimating pig body weight in a free-roaming state and verified the stability and reliability of the dataset under different models and partitioning strategies. Additionally, the RGB image dataset used in this study has been made publicly available for use by researchers in related fields.