1. Introduction
China has a high rate of pork consumption, and as its population grows, the country’s demand for pork continues to increase. In 2021, the Ministry of Agriculture and Rural Affairs issued the “14th Five-Year Plan” National Animal Husbandry and Veterinary Industry Development Plan, which predicts that by 2025, the output value of the national hog farming industry will reach CNY 1.5 trillion. However, China is not yet a strong breeding nation, and its hog farming faces many problems, such as high costs, a low degree of automation, and the weak breeding capacity of sows [
1]. In particular, the weak breeding capacity of sows is the key problem restricting large-scale breeding in the hog industry, and improving it is an urgent task. However, sow fertility is affected by many factors, of which sow backfat is significant. Sow backfat is the soft fatty tissue on the back of the sow, and its thickness affects the animal’s service life, culling rate, reproductive performance, the performance of its piglets, and intestinal health [
2,
3]. Different backfat thicknesses have different effects on sow reproductive performance [
4]. The feeding program in production is based on the backfat thickness of sows. Therefore, dynamic detection of sow backfat thickness is an important means to improve sow reproductive performance and production capacity, and it is a powerful guarantee for the economic efficiency of pig farms and the realization of large-scale breeding.
Visual inspection is the original method of sow backfat detection, supplemented by pressing, which is subjective, causes stress in sows, and makes it difficult to achieve accurate and real-time measurement [
5]. In addition to the visual method, there is also the carcass method, which uses vernier calipers to measure the backfat of the carcass. The measurement results are accurate, but they do not apply to the detection of live pigs, and the workload is large and cumbersome [
6,
7]. In 1945, Hazel invented the probe method, which opened a new chapter in live measurement and improved accuracy compared to the visual method. However, the probe method requires piercing the skin of the pig, causing a strong stress reaction that is not conducive to pig production. In the 1950s, Wild applied ultrasonic technology for the first time to the detection of biological tissues, and the ultrasonic method has since become the best method for measuring the backfat thickness of live pigs without mechanical damage. Currently, ultrasound is the main method for measuring backfat thickness in live pigs; it requires professionals to mark the backfat position on an ultrasound image according to specialized theory in order to calculate the backfat thickness. This reliance on operators reduces the accuracy of the results, is time-consuming and relatively inefficient, is constrained by the equipment and the measuring personnel, and cannot achieve real-time, low-cost contactless measurement of backfat thickness.
With the development of deep learning, deep learning-based computer vision technology has been very successful in detecting the behavior, individual weight, and body size of pigs [
8,
9,
10,
11,
12]. Computer vision offers detection advantages such as being contactless, real-time, automatic, and objective, making it possible to establish a model for the contactless measurement of sow backfat thickness. Zhang Lijuan et al. applied fully convolutional networks (FCNs) to determine pig backfat thickness in ultrasound images. The results showed that the correlation coefficient between the segmentation results and the actual measurement of backfat thickness reached 0.92, and the determination of backfat thickness in FCN-segmented ultrasound images of pigs had high accuracy [
13]. Basak et al. [
14] used ultrasound to measure the backfat thickness of growing–finishing pigs and established a machine learning model relating their fat mass to backfat thickness. However, the ultrasound method not only causes a stress response in sows but is also limited by the skill of the professionals and the available equipment. Li Qing et al. [
15] used image processing technology instead of manual methods to achieve accurate measurements of pork backfat thickness, which improved efficiency but is limited to carcasses. Fernandes et al. explored the correlation between top-view 3D images and the backfat thickness of fattening pigs using deep learning and manually extracted image features, respectively. The experimental results showed that the deep learning model achieved an R² of 0.45 between predicted and measured backfat thickness, with a mean absolute scaled error (MASE) of 13.56 mm, and the best backfat thickness detection model was obtained by deep learning [
16]. Thus, the deep learning algorithm can be utilized to construct a contactless detection model of pig backfat thickness.
The essence of deep learning is to train neural networks. In the field of intelligent pig breeding research, generally, only end-to-end research of neural networks is conducted, focusing only on the detection accuracy of the network [
17]. This approach lacks an understanding of the network’s internal working mechanism, so the image regions that drive the neural network’s decision making cannot be determined, the phenomenon of high detection accuracy cannot be explained, and the credibility of applying neural network models cannot be improved. By visualizing the image features extracted by the neural network, the image regions, both of the research object and outside it, that affect the network’s decision making can be studied. The former reveals which feature maps correspond to image regions that influence the network’s decisions, providing ideas for understanding the network’s detection performance. The latter indicates which image regions are not part of the research object and should be separated out, yielding new image data and suggesting data-processing methods for improving the model’s test accuracy.
The residual network (ResNet) is a pioneering deep learning architecture for image classification. It solves the degradation and training problems of very deep networks and improves the feature extraction ability of neural networks [
18,
19], and it is widely used as the feature extraction backbone in computer vision tasks, including research on livestock and poultry breeding [
20,
21,
22]. Therefore, to address the problem that real-time and low-cost contactless detection of sow backfat thickness cannot be realized, and to explain the backfat thickness detection model, this paper explores the relationship between sow back images and backfat thickness based on residual networks to construct a contactless detection model for sow backfat thickness.
2. Data Collection and Dataset Construction
The data in this study were collected by the research group from a pig farm in Guangxi Province, with a total of 48 gestating sows, including those in pre-gestation and mid-gestation. The sows were kept in single pens, and sows of different gestation periods were kept in different buildings. The video data of each sow were captured by an Azure Kinect camera. The camera was set up on a homemade adjustable mobile cart to capture video data of the sow back while standing at an overhead angle. The camera captured videos at 30 frames/second for 3 min, and the videos were parsed to obtain RGB images, which were used as the object of this study. The backfat thicknesses of 48 sows were measured using a Renco (LEAN-METER) backfat meter with a measuring range of 4–35 mm and an error of ±1 mm. The backfat measuring point was chosen as the P2 measuring point commonly used in the international pig industry.
The video data of the sow back were parsed to obtain image data; taking into account the similarity of neighboring images, one frame was kept every two seconds. Each sow’s video thus yielded 90 images with a size of 1280 × 720, and a total of 4320 RGB images were obtained from the videos of the 48 sows. Some of the images are shown in
Figure 1, and the captured frames contain background elements: the sow in the limiting pen, the sows in the neighboring pens, and the drainpipe above. To reduce the influence of the background and to speed up the forward inference and backpropagation of the model, each sow back image was cropped and grayscaled to retain as much information as possible about the sow in the middle pen while reducing the information about the sows in the neighboring pens. The resolution of the image was thereby reduced from 1280 × 720 to 1080 × 450. Part of the image is shown in
Figure 1. The cropped image was combined with the backfat thickness of the sow to construct the dataset.
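As a sketch of the preprocessing described above, the frame sampling and crop-plus-grayscale steps could look like the following (the crop window and the luma weights are illustrative assumptions; the paper does not give its exact crop coordinates):

```python
import numpy as np

def sample_indices(n_frames, fps=30.0, every_s=2.0):
    """Indices of the frames to keep: one frame every `every_s` seconds.
    A 3 min video at 30 frames/s yields 90 sampled frames."""
    step = max(1, round(fps * every_s))
    return list(range(0, n_frames, step))

def preprocess(frame, crop=(135, 585, 100, 1180)):
    """Crop a 720 x 1280 RGB frame to 450 x 1080 and convert to grayscale.
    The crop window (y0, y1, x0, x1) is a hypothetical choice that keeps
    the sow in the middle pen and trims the neighboring pens."""
    y0, y1, x0, x1 = crop
    patch = frame[y0:y1, x0:x1].astype(np.float32)
    # ITU-R BT.601 luma weights for RGB -> grayscale conversion
    gray = patch @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return gray.astype(np.uint8)
```

In practice, the sampled indices would be used while reading the video stream (e.g., with OpenCV), and each kept frame would be passed through `preprocess` before being paired with the sow’s measured backfat thickness.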
The body condition scores of 48 sows are shown in
Table 1. The actual measured backfat thicknesses of the sows were all within the backfat thickness intervals corresponding to body condition scores 2, 3, and 4. The dataset was randomly divided into a training set, a validation set, and a test set at a ratio of 8:1:1, with 37, 6, and 5 sows, respectively. To validate the stability of the model’s generalization performance across different datasets, this division was repeated five times with five non-overlapping test sets; the samples of each division are shown in
Table 2.
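A minimal sketch of this repeated division might look like the following (the sow IDs and the seed are placeholders; the paper’s actual random assignment is not specified):

```python
import random

def five_fold_splits(sow_ids, seed=0):
    """Five 37/6/5 train/val/test divisions of 48 sows in which no sow
    appears in more than one test set (5 x 5 = 25 distinct test sows)."""
    rng = random.Random(seed)
    ids = list(sow_ids)
    rng.shuffle(ids)
    splits = []
    for k in range(5):
        test = ids[k * 5:(k + 1) * 5]            # disjoint across the 5 folds
        rest = [i for i in ids if i not in test]  # remaining 43 sows
        rng.shuffle(rest)
        val, train = rest[:6], rest[6:]           # 6 validation, 37 training
        splits.append((train, val, test))
    return splits
```

Splitting by sow (rather than by image) ensures that all 90 images of a given sow fall into the same subset, so the test sets measure generalization to unseen animals.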
4. Research on Backfat Thickness Detection Model Based on Sow Contour Images
4.1. Analysis of the Back Image Regions Corresponding to the Features Extracted by the Residual Network
The backfat thickness detection model achieves high accuracy based on the residual network. To explain this phenomenon, the feature maps extracted at each layer of the network were analyzed using feature visualization. The saved model was loaded into the residual network, and the back image data of the 48 sows were input. The input image is shown in
Figure 4, and the actual measurement position of sow backfat is shown in
Figure 5. A visualization program was written to display the channel feature maps of each convolutional and residual layer. The process of extracting the back image features is shown in
Figure 6,
Figure 7,
Figure 8 and
Figure 9.
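The channel feature maps of each layer can be captured with forward hooks. The following PyTorch sketch uses a toy two-layer stand-in for the paper’s residual network (the input size 240 × 100 follows the text, but the architecture and layer names here are only illustrative):

```python
import torch
import torch.nn as nn

# Minimal stand-in for the residual network; any nn.Module works the same way.
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
)

feature_maps = {}

def save_maps(name):
    def hook(module, inputs, output):
        feature_maps[name] = output.detach()  # C channel feature maps per image
    return hook

# Register one forward hook per layer of interest.
for idx, layer in enumerate(model):
    if isinstance(layer, nn.Conv2d):
        layer.register_forward_hook(save_maps(f"conv{idx}"))

x = torch.randn(1, 1, 100, 240)  # one grayscale sow back image (H=100, W=240)
model(x)                         # forward pass fills feature_maps

# Min-max normalize one channel map to [0, 1]; bright pixels mark the image
# regions this layer responds to most strongly.
fmap = feature_maps["conv0"][0, 0]
heat = (fmap - fmap.min()) / (fmap.max() - fmap.min() + 1e-8)
```

The normalized map `heat` can then be upsampled to the input resolution and overlaid on the back image to locate the highlighted regions discussed below.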
Point A in
Figure 5 indicates the area near the P2 point of sow backfat, which is the location where the actual measurement of sow backfat was made [
2]. The area around point B represents the area near the high point of the sow body, where Xiong Yuanzhu et al. measured the backfat thickness. They referred to this location as the thickest part of the shoulder [
6].
A highlighted region in a feature map indicates that the corresponding part carries more weight in the map: the network pays more attention to the features of these image regions, and they play a greater role in the neural network’s decision making [
24,
25]. By visualizing the features extracted by the residual network from the sow back image, it is possible to understand how the different regions of the sow back image affect the network decision making and, thus, analyze the effectiveness of the extracted features in predicting the sow backfat [
26].
A neural network’s extracted image features are divided into relevant features and irrelevant features. Relevant features are features in the target area of the image, and irrelevant features are features outside the target area [
27]. In this study, the relevant features are the image features of the target region, the sow in the middle pen of the back image, and the irrelevant features are the image features outside this region.
In analyzing the channel feature maps in
Figure 6,
Figure 7,
Figure 8 and
Figure 9, combined with
Figure 4 and
Figure 5, it was found that, for the images of all 48 sows, the irrelevant features of the extracted channel feature maps corresponded to image regions such as the drainpipe and the feeding pen, while the relevant features were the rump edges, the region near the P2 measurement point, the region near the body height point, and the flank contour edges. The analysis indicates that different layers of the residual network extract different features: in addition to the features of the sow body itself, irrelevant features such as the restriction bar and the drainpipe are also extracted. Finally, these features are combined in the fully connected layer to obtain the output of the network. Since the irrelevant features affect the network’s decision making and may reduce its accuracy, subsequent work should isolate the image regions corresponding to the irrelevant features and explore whether they affect the accuracy of the results.
4.2. Interpretive Analysis of Backfat Thickness Detection Models
The kinds of relevant features in the back images of 48 sows were counted, and it was found that the features in four sow image regions, namely, near the P2 point, near the body height point, the rump edge, and the flank profile edge of each sow, were extracted by the residual network. The following analyzes the effectiveness of the feature maps extracted by the residual network from the back image of one sow in predicting the backfat thickness of the sows to explain the phenomenon of the high accuracy of the model.
The highlighted areas in the output feature maps of the second layer of the residual module and the first convolutional layer were roughly similar, so only the output feature maps of the first convolutional layer were analyzed. In the channel feature map of the first convolutional layer, shown in the first and third panels in
Figure 6, the residual network extracted features of the sow’s hindquarters in the red-circled area. These features acted through the convolutional layers and the fully connected layer to output the backfat thickness. They reflect information related to the sow backfat thickness, consistent with the location of the actual measurement of backfat thickness (P2). The highlighted area contains the location on the sow’s back where backfat is actually measured, which indicates that the features extracted by the network from the hindquarters region are correlated with the sow backfat thickness and that extracting features from this region is effective for predicting backfat thickness.
The channel feature map of the third convolutional layer and that of the fourth residual module layer had roughly the same highlighted areas on the sow’s back. The first and third panels of the channel feature maps of the fourth residual module layer show that the residual network extracted features of the edge of the sow’s rump, and these features passed through the fully connected layer to output the sow backfat thickness. They reflect information related to the characteristics of the sow rump and had a greater impact on the network’s decision making, which is in line with previous research by Teng Guanghui et al. [
28], who found a correlation between the backfat thickness and the height-to-width ratio of the sow rump, the rump area, and the radius of curvature. These three quantities also reflect information related to the characteristics of the sow rump, and the correlation between rump information and sow backfat thickness suggests that the rump edge features extracted by the residual network are effective in predicting backfat thickness.
In the first and third panels of the channel feature maps of the fourth residual module layer, the residual network extracted features in the area near the sow’s body height point and output the backfat thickness after the fully connected layer, influencing the decision of the residual network. This area coincides with a location where sow backfat thickness is actually measured, the thickest part of the shoulder [
25]. Xiong Yuanzhu et al. measured sow backfat thickness at this location, which indicates that the features extracted by the network near the sow’s body height point are correlated with the sow backfat thickness and that the features extracted by the residual network from this region are effective in predicting it.
In the fourth panel of the channel feature map of the fourth residual module layer and the third panel of the third convolutional layer, the residual network extracted the left and right half-contour edge features of the sow’s back and output the backfat thickness after the fully connected layer. These half-contour edge features reflect information on the sow’s lateral contour and contributed to the prediction of backfat thickness. However, no study has yet shown a statistical correlation between the side contour and backfat. The side contour features of all 48 sows were extracted by the network, which indicates that these features had a greater impact on the network’s decision and are correlated with the sow backfat thickness, making them effective for the prediction of sow backfat thickness.
In summary, the relevant features of the 48 sows extracted by the residual network were corroborated by previous studies on sow backfat thickness detection. Combined with those studies, the extracted features of the area near the P2 point, the area near the body height point, and the rump edges were all shown to be effective in predicting sow backfat thickness, which demonstrates that the method used in this experiment is sound as a backfat thickness detection model. This explains the phenomenon of the model’s high accuracy and adds credibility to its practical application. Meanwhile, the left and right half-contour edge features, which reflect the edge information of the lateral contour, were also found to be effective in predicting backfat thickness.
4.3. UNet-Based Sow Contour Segmentation
The analysis of the sow back image features extracted by the residual network explains the phenomenon of the model’s high accuracy and increases the credibility of its practical application; at the same time, irrelevant features that may affect the accuracy of the backfat thickness detection model were found in the feature maps. To explore their influence on the accuracy of the model, the image regions corresponding to the irrelevant features needed to be separated out. To realize efficient and accurate segmentation, a UNet network [
29] was constructed to segment the data of 4320 sow back images of 48 sows to obtain new image data. The process of segmentation is shown in
Figure 10. Combined with the backfat thicknesses of 48 sows, a new dataset was constructed. The dataset was divided according to the dataset division method in
Table 2, and each dataset had the same individual sow samples.
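Once the UNet produces a sow mask, separating the irrelevant background regions reduces to masking each grayscale image. The sketch below assumes a sigmoid probability map from the segmentation network (the threshold value and the mask itself are assumptions; the trained UNet is not reproduced here):

```python
import numpy as np

def apply_mask(image, mask, thresh=0.5):
    """Zero out background pixels using a segmentation probability map.

    image: (H, W) grayscale sow back image.
    mask:  (H, W) per-pixel sow probabilities in [0, 1], e.g., the sigmoid
           output of a UNet (hypothetical here).
    Pixels with probability below `thresh` are treated as background
    (pen bars, drainpipe, neighboring sows) and set to zero.
    """
    keep = (mask >= thresh).astype(image.dtype)
    return image * keep
```

Applying `apply_mask` to all 4320 images, paired with the measured backfat thicknesses, would yield the new segmented dataset used in the before/after comparison below.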
4.4. Model Parameter Setting and Training
The architecture of the constructed residual network is shown in
Figure 2. The input image was converted into a grayscale image with a size of 240 × 100. In this experiment, the Adam optimization algorithm and a learning rate decay strategy were used. The number of training rounds was set to 50, the mean squared error (MSE) was used as the loss function, and the batch size was set to 4. The learning rate decay strategy was as follows: the initial learning rate was 0.01 and was held constant for the first 10 rounds; from rounds 10 to 40 it was decayed to 0.1 of the initial learning rate, and from rounds 40 to 50 to 0.01 of the initial learning rate. The hyperparameter space settings for the training of the residual network model are shown in
Table 4. The hyperparameter space in
Table 4 was searched using the automated search framework Optuna to determine the optimal parameter combinations within the search space, and five models with different parameters were obtained.
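The training schedule described above maps directly onto PyTorch’s Adam optimizer and MultiStepLR scheduler. This sketch uses a stand-in linear model and only records the learning rate to show the decay at rounds 10 and 40 (the actual ResNet, data loading, and the Optuna search over Table 4 are omitted):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # stand-in for the residual network in Figure 2
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# Multiply the lr by 0.1 at round 10 and again at round 40, matching the
# decay schedule described in the text (0.01 -> 0.001 -> 0.0001).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[10, 40], gamma=0.1)
loss_fn = nn.MSELoss()  # MSE training loss

lrs = []
for epoch in range(50):
    # ... one pass over the training set with batch size 4 goes here,
    # computing loss_fn(model(x), y), loss.backward(), optimizer.step() ...
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()
```

A hypothetical Optuna objective would wrap this loop, sample hyperparameters from the Table 4 search space with `trial.suggest_*`, and return the validation MSE.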
4.5. Test Results and Analysis
The five models obtained were tested on the five test sets, and the predicted versus actual results for the five test sets are shown in
Figure 11. The MAE and R² values for the five test sets are shown in
Table 4, with BF representing sow backfat thickness.
In
Figure 11, the five discrete points of the horizontal coordinates of the five test sets represent five pigs, and the yellow and green colors represent the actual and predicted backfat thickness of each pig, respectively. The results of the test set evaluation indexes are shown in
Table 4.
In
Table 4, the test set numbers in the first column, in order from smallest to largest, respectively, represent the five test sets in
Figure 11. For each pig, the median of the model’s predictions over its 90 images was taken as the final predicted backfat thickness (a simple denoising step); combined with the measured backfat thickness, this yielded the MAE and R² of the five test sets shown in
Table 4. The R² values of the five test sets were all greater than 0.91, with a mean of 0.94, and the MAE values were all below 0.65 mm, with a mean of 0.44 mm. These results show that the backfat thickness estimated by the residual network model and the measured values had a good linear relationship and that the model’s prediction accuracy was high.
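The median aggregation and the two evaluation metrics can be written compactly. A sketch in plain Python (the function name is my own):

```python
import statistics

def evaluate(per_image_preds, true_bf):
    """Aggregate the per-image predictions for each pig by their median
    (a simple denoising step), then compute MAE and R2 against the
    measured backfat thicknesses.

    per_image_preds: one list of predictions per pig (90 values each here).
    true_bf:         measured backfat thickness per pig, in mm.
    """
    preds = [statistics.median(p) for p in per_image_preds]
    n = len(true_bf)
    mae = sum(abs(p - t) for p, t in zip(preds, true_bf)) / n
    mean_t = sum(true_bf) / n
    ss_res = sum((t - p) ** 2 for p, t in zip(preds, true_bf))
    ss_tot = sum((t - mean_t) ** 2 for t in true_bf)
    r2 = 1 - ss_res / ss_tot
    return preds, mae, r2
```

Taking the median rather than the mean makes the per-pig estimate robust to occasional outlier frames, such as those where the sow was partly out of view.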
4.6. Comparison of the Performance of Different Models in the Test Set
Table 5 shows the average prediction results of the three models on the five test sets. As shown in the table, the ResNet model in this study had the highest R² and the lowest MAE. VGG16 has strong performance in the classification field [
30], but it did not perform well in this study, with an R² of 0.66 and an MAE of 0.66 mm, the lowest and the highest values, respectively. Compared to VGG16 and the model by YU M et al. [
31], the R² of the model in this study increased by 42.4% and 23.6%, respectively, and the MAE decreased by 68.3% and 63.3%. Therefore, the performance of the model in this study was better than that of VGG16 and the model proposed by YU M et al. [
31].
4.7. Comparison of the Performance of the Backfat Thickness Detection Model before and after Segmentation Analysis
In
Table 6, the test set numbers in the first column, in order from smallest to largest, represent the five test sets in
Figure 3 and
Figure 11. We compared the results of the five test sets obtained before and after segmentation for the backfat thickness detection model. After segmentation, the MAE of each test set was smaller than before segmentation, except for the second group, which was unchanged; the R² of each test set was larger after segmentation, except for the first group, which was unchanged. Overall, the average R² of the model after segmentation increased by 3.3%, and the average MAE decreased by 18.5%. These results show that irrelevant features reduced the detection accuracy of the model and that feature visualization can provide a reference method for improving model accuracy.
5. Conclusions
In order to solve the problems of time consumption, relative inefficiency, equipment constraints, and the influence of measuring personnel, this study proposes a contactless method for detecting sow backfat thickness based on the feature visualization of a residual network. Compared to the VGG16 model and that proposed by YU M et al. [
31], the model in this study achieved an R² of 0.94 on the test set, higher than that of the other two models, and an MAE of 0.44 mm, lower than that of the other two models, indicating better performance. Secondly, compared to the model before segmentation, the R² increased by 3.3% and the MAE decreased by 18.5%, which verifies that removing irrelevant features is feasible and improves the accuracy of the sow backfat thickness detection model.
Feature visualization not only provides a method to improve the accuracy of the model but also validates the conclusions of previous research on sow backfat thickness. We also found a sow contour area feature that is strongly correlated with sow backfat thickness. Whether a model built on the feature information of these areas to determine sow backfat thickness can reduce the model size and further improve its accuracy needs to be studied.