Article

A Measurement Method for Body Parameters of Mongolian Horses Based on Deep Learning and Machine Vision

Lide Su, Minghuang Li, Yong Zhang and Zheying Zong
1 College of Mechanical and Electrical Engineering, Inner Mongolia Agricultural University, Hohhot 010018, China
2 Inner Mongolia Engineering Research Center of Intelligent Equipment for the Entire Process of Forage and Feed Production, Hohhot 010018, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(13), 5655; https://doi.org/10.3390/app14135655
Submission received: 29 May 2024 / Revised: 22 June 2024 / Accepted: 26 June 2024 / Published: 28 June 2024

Abstract
The traditional manual methods for measuring Mongolian horse body parameters are not very safe, have low levels of automation, and cannot effectively ensure animal welfare. This research proposes a method for extracting the body parameters of target Mongolian horses based on deep learning and machine vision technology. Firstly, Swin Transformer is used as the backbone feature extraction network of the Mask R-CNN model, and a CNN-based differentiated feature clustering model is added to minimize the loss of similarity and spatial continuity between pixels, thereby improving the robustness of the model while reducing error pixels and refining the rough mask boundary output. Secondly, an improved Harris algorithm and a polynomial fitting method based on contour curves are applied to determine the positions of the measurement points on the horse mask and calculate the body parameters. The accuracy of the proposed method was tested on 20 Mongolian horses. The experimental results show that, compared with the original Mask R-CNN network, the PA (pixel accuracy) and MIoU (mean intersection over union) of the optimized model increased from 91.46% and 84.72% to 98.72% and 95.36%, respectively. The average relative errors of shoulder height, withers height, chest depth, body length, croup height, shoulder angle, and croup angle were 4.01%, 2.98%, 4.86%, 2.97%, 3.06%, 4.91%, and 5.21%, respectively. The research results can provide technical support for assessing body parameters related to the performance of horses under natural conditions, which is of great significance for improving the refinement and welfare of Mongolian horse breeding techniques.

1. Introduction

In recent years, with the continuous evolution of the modern horse industry, a trend of diversified development has emerged within the horse industry [1,2,3]. The modern horse industry in Inner Mongolia represents a novel tertiary sector that amalgamates leisure, economy, sports, and culture [4,5,6]. While horses are no longer primarily utilized for transportation or labor, their athletic prowess in sports, entertainment, and horse racing is pivotal [7,8,9]. Phenotypic parameters hold significant importance in equestrian sports, where variations in body parameters can directly impact kinetic characteristics and dynamic parameters, thereby influencing athletic performance [10,11,12,13]. For instance, hip joint height can affect the amplitude of upward movement of a horse’s left and right hip joints when a hoof touches the ground. This amplitude is deemed a robust evaluation index for horse movement performance. Horses with differing shoulder heights exhibit varied degrees of vertical movement during limb motion, while horses with distinct movement performances exhibit diverse pelvic and sacral movement patterns [14,15]. Concurrently, body parameters serve as crucial quantitative traits for horses, playing a vital role in breeding and directly reflecting their growth, development, and breed enhancement effects [16]. As such, precise measurement of body parameters is essential for the evaluation of Mongolian horse performance and the development of genetic breeding programs.
There are several ways to measure the body parameters of a horse. One is the manual measurement method, wherein the evaluator uses measuring equipment such as measuring sticks and tape measures. Another approach is to use machine vision technology to measure body parameters. The manual measurement method suffers from high time consumption, a heavy workload, and low efficiency and has a high propensity to provoke a stress reaction; horses may become anxious during manual measurement, which entails a certain risk [17,18]. Body measurement based on machine vision technology can reduce stress on animals and reduce the risk of injury to horses and assessors. Methods based on machine vision technology can not only make measurements quickly but also store visual data for future analysis. Researchers capture a horse’s movement by means of video or image analysis [18], using multiple cameras in addition to tagging the horse, and then analyze the footage frame-by-frame using appropriate software [13]. Mariz et al. [17] measured the body parameters of horses by collecting two-dimensional images and applying machine vision. The results showed that the precision of this image-processing method is very close to that of the manual measurement method, indicating that machine vision-based methods can be used reliably for horse body measurement. Another study has confirmed this notion [18]. At present, many studies have simplified the detection of horses’ key points by attaching markers to the anatomical points of horses and then studying the measurements of horses’ motion and body parameters [19,20,21]. Freitag et al. [20] placed markers on Quarter horses in the United States and employed ImageJ 1.51r image-processing software to manually mark key points for body parameter measurements, with an average relative error of only 1.46% across the body parameters. Recently, progress has been made in using stereoscopic imaging technology and three-dimensional information to obtain animal phenotypic parameters [22,23]. Pérez-Ruiz et al. [19] recorded side-view point cloud information of Andalusian horses using LiDAR sensors and manually measured the reconstructed side-view point clouds in RViz 1.14.1 software through manual scaling. The results showed a correlation coefficient of only 0.55 between the 17 body parameters reconstructed in three dimensions and the manually measured values. Matsuura et al. [24] used four scanners to scan horses simultaneously from four directions and combined the four scans into one complete image of each horse. The height at the withers, back, and croup; chest depth; width of the chest, croup, and waist; girth circumference; cannon circumference; and body length were measured. Compared with manual measurement, the relative error range of this system was −1.89% to 7.05%. These advances help measure multiple scale parameters of horses from a single image and enable non-contact measurements.
Animal body parameter measurement and behavior recognition using deep learning and machine vision technology have become a research hotspot [25,26,27,28]. Although body measurements based on reflective markers attached to the body require contact with an animal, computer vision and deep learning techniques have evolved to a point where markerless motion capture is possible [29,30,31]. Chu et al. [32] proposed a method for keyframe extraction of the trunk, excluding the head and neck, from overhead walking videos of cows, using the maximum curvature points of the trunk curve as measurement points to obtain the body parameters of the cows. The accuracy rates of keyframe extraction and head and neck removal were 97.36% and 94.04%, respectively, and the average relative error of the body parameter measurements remained within 3.3%. Yang et al. [23] reconstructed the trunks of cows in three dimensions using Structure from Motion (SfM), annotated measurement points based on the morphological features of cows, and calculated body parameters from the positional relationships between the measurement points. Geng et al. [33] proposed a semantic segmentation model combining attention modules to segment point clouds of pigs into different parts and measured body parameters for the segmented parts, with measurement errors for the various parameters controlled within 5 cm. Li et al. [34] used U-Net to segment goat trunk images from complex background environments and employed the Hourglass key point detection model to locate body dimension measurement points in the segmented goat images. The experimental results showed that the average relative errors for shoulder height, hip height, body diagonal length, and chest depth were 0.65%, 1.06%, 1.77%, and 1.99%, respectively, demonstrating that CNN-based key point localization models can also yield high measurement accuracy in livestock body dimension measurement tasks. Song et al. [35] proposed the SimCC-ShuffleNetV2 lightweight key point detection model for cows to address the high network complexity of existing deep learning models; this model has a computational cost of 0.15 GFLOPs and 1.31 × 10^6 parameters and achieves a detection speed of 10.87 frames per second, providing technical support for tasks such as cow body dimension measurement, behavior recognition, and weight estimation. Although manual marking can significantly improve the accuracy of subsequent work, it still has the disadvantages of a heavy workload and a low degree of automation, and body measurement based on a single image saves more time than using multi-angle images.
In summary, both domestic and international researchers have made significant advancements in employing computer vision technology to measure livestock body parameters in a non-contact manner. Despite these advancements, critical challenges remain, particularly in the precise segmentation of horse body parts. The complexity of accurately identifying and segmenting the various parts of a horse’s body poses a significant obstacle. For Mongolian horses, which are known for their sensitivity and dynamic behavior, existing methods fall short of providing accurate body parameter measurements in a natural, non-contact manner. The evaluation of motor performance and the measurement of body parameters based on the key points of horse anatomy have been proven feasible [13]. Mask R-CNN is a prevalent deep learning model utilized for object detection and instance segmentation; while it excels at extracting local information, it often lacks the capability to integrate global context information effectively. This paper proposes an innovative method for measuring Mongolian horse body parameters without physical contact, leveraging the combined power of machine vision and deep learning technologies to overcome the limitations of current techniques. Firstly, Swin Transformer is used as the backbone feature extraction network of the Mask R-CNN model, and a CNN-based differentiated feature clustering model is added to complete the optimization of the Mask R-CNN model. Then, the improved Harris algorithm and the polynomial fitting method based on contour curves are used to determine the locations of the horse body measurement points. These measurement points are utilized to calculate the body parameters that may affect the athletic performance of Mongolian horses, encompassing withers height, shoulder height, chest depth, body length, croup height, shoulder angle, and croup angle. By integrating these advanced technologies, the proposed method is designed to achieve more accurate and reliable measurements of horses’ body parameters while they are in their natural state.

2. Materials and Methods

Mongolian horse image data were collected from the Horse Breeding Technology Research Base of Inner Mongolia Agricultural University in Hohhot, Inner Mongolia Autonomous Region. In April 2022, an ORDRO V17 camera (BOYA Technology Company, Shenzhen, China) was used to film Mongolian horses in an actual scene, from the time they left the barn for free activity during the day until the end of their free activity in the evening. A total of 30 sets of Mongolian horse videos with a total duration of about 2 h were captured, with a resolution of 1920 × 1080 pixels and a frame rate of 30 fps. The video footage was processed by a self-programmed Python 3.7 program; considering that redundant images would appear between consecutive frames of the Mongolian horse videos, one frame was extracted and stored every 120 frames, and invalid images were screened out manually. The pseudo-code of the self-programmed program is shown below (Algorithm 1).
Algorithm 1. Frame extraction from Mongolian horse videos
Input: A total of 30 Mongolian horse videos
Output: Images stored from the videos every 120 frames
1. SET number_of_videos = 30
2. SET total_duration = 2 h
3. SET frame_rate = 30 fps
4. SET frame_interval = 120 frames
5. FOR each video_index from 1 to number_of_videos:
6.    OPEN video file corresponding to video_index
7.    SET frame_counter = 0
8.    WHILE video has more frames:
9.       READ current_frame
10.      IF frame_counter % frame_interval == 0:
11.         STORE current_frame as an image
12.      INCREMENT frame_counter by 1
13.   CLOSE video file
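The extraction step in Algorithm 1 maps directly onto OpenCV; the following is a minimal Python sketch of the same logic (the original script is not published, so the file naming scheme here is hypothetical):

import cv2  # OpenCV for video decoding

FRAME_INTERVAL = 120  # store one frame every 120 frames, as in Algorithm 1

def extract_frames(video_path, out_prefix, interval=FRAME_INTERVAL):
    cap = cv2.VideoCapture(video_path)
    frame_counter = 0
    saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # no more frames in the video
            break
        if frame_counter % interval == 0:
            cv2.imwrite(f"{out_prefix}_{saved:04d}.jpg", frame)
            saved += 1
        frame_counter += 1
    cap.release()
    return saved

# Hypothetical file layout: one call per captured video
for i in range(1, 31):
    extract_frames(f"video_{i:02d}.mp4", f"horse_{i:02d}")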

2.1. Mongolian Horse Training Dataset Construction

The image dataset included images of Mongolian horses at different growth stages (juvenile and adult), in different postures, and under different lighting conditions. To ensure the robustness of the model and to account for the variations encountered during actual shooting, such as differences in angle and hue, the original image dataset was expanded via geometric transformations (rotation and mirroring), color transformations (hue and saturation), the addition of noise, and other operations, yielding 2760 training images in the end. The data augmentation effect is shown in Figure 1, and the number of images obtained after each augmentation operation is shown in Table 1. The labeling software Labelme 3.16.7 [36] was used to label the Mongolian horse training set. The resulting annotations, in JSON format, were converted to the COCO format of the original dataset to improve the universality and uniformity of the dataset and to facilitate subsequent training of the convolutional neural network model and evaluation of the segmentation results.
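As a sketch of the augmentation operations listed in Table 1, assuming OpenCV and NumPy (the paper does not publish its augmentation script, so the rotation angle, brightness shift, and noise level below are illustrative):

import cv2
import numpy as np

def augment(image):
    # Horizontal mirror
    mirrored = cv2.flip(image, 1)
    # Rotation about the image center (illustrative 10 degrees)
    h, w = image.shape[:2]
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), 10, 1.0)
    rotated = cv2.warpAffine(image, rot, (w, h))
    # Brightness shift on the V channel in HSV space (illustrative +20)
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV).astype(np.int16)
    hsv[..., 2] = np.clip(hsv[..., 2] + 20, 0, 255)
    brightened = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
    # Additive Gaussian noise (illustrative sigma = 10)
    noise = np.random.normal(0, 10, image.shape)
    noisy = np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    return mirrored, rotated, brightened, noisy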

2.2. Flow Chart of the Experiment

The specific experimental procedure for the measurement of Mongolian horse body condition parameters is shown in Figure 2, consisting of the following steps:
(1) Mongolian horse videos were shot, processed, and filtered by frames, and horse images were labeled using Labelme software, yielding the training set;
(2) Walking videos of individual Mongolian horses were used, and key frames were extracted as a test set;
(3) The Mongolian horse image segmentation model was trained, and the test set was segmented using the trained model;
(4) The center of mass of the segmented image was determined, and the contour was divided into intervals;
(5) The point of maximum curvature in each contour interval was detected and marked using the improved Harris algorithm and the polynomial fitting method based on the edge contour, so as to determine the specific locations of the measurement points;
(6) The scale parameters were calculated and related to the pixel distances in the image to obtain the true values of the Mongolian horses’ body parameters.

2.3. Improved Mask R-CNN

Mask R-CNN [37] is a classical two-stage instance segmentation model. Owing to its simple structure and stable performance, it has been widely used in the field of image processing. Its main structure is as follows: In the first stage, the feature information of an image is extracted by a convolutional neural network backbone (the feature extraction backbone network), such as a VGG [38] or ResNet [39] series network. The feature maps produced by the bottom-up and top-down pathways of the Feature Pyramid Network (FPN) are scanned by a Region Proposal Network (RPN), which proposes regions that may contain objects. Anchor points are then used to bind the features to locations in the original image, and the sizes of the feature-mapped regions and their preselected borders are calculated. In the second stage, because extensive convolution changes the size of the original image, the preselected borders and the corresponding feature regions of different scales are cropped and scaled uniformly by the RoIAlign (Region of Interest Align) module to generate fixed-size Region of Interest (RoI) target regions. In contrast to the original RoI pooling module of Faster R-CNN, the RoIAlign module avoids the information loss incurred during RoI pooling. Finally, the output flows through two branches: one branch enters the fully connected layers for object classification and bounding box prediction, and the other enters the Fully Convolutional Network (FCN) [40] for pixel-level image segmentation. Examples of the results yielded by the model are shown in Figure 3, where the first row shows the original images and the second row shows the corresponding mask maps.
As the original Mask R-CNN model has a low feature space resolution and a small boundary pixel ratio, its segmentation results for the target horse are not precise enough: the contour boundary is rough, and the shape of the segmented contour deviates considerably from the shape of the real target [41]. Therefore, a CNN-based differentiated feature clustering model was added to the Mask R-CNN model as a mask post-processing module for the preliminary results yielded by the original Mask R-CNN when detecting the location of the target horse and outputting the mask. The network architecture of the improved model is shown in Figure 4b. Given the location information and mask information of the target horse generated by the original Mask R-CNN, this module performs cluster extraction for pixel clusters that are continuous and have similar features and applies a linear classifier to assign the image pixels to different clusters based on the pixel features of the input image. Due to insufficient light and other factors, individual horses and boundary lines in an image may appear blurred, leading to lower inter-pixel dependencies. While the Mask R-CNN convolutional neural network excels at extracting local information, it may fall short in integrating global information. The Swin Transformer, a visual Transformer model tailored for image-processing tasks, is adept at addressing these shortcomings. Therefore, Swin Transformer was used as the backbone feature extraction network for the Mask R-CNN model [42]. The RPN recommends candidate RoIs that may contain objects, and these RoIs are mapped to fixed-size feature maps using RoIAlign. The feature maps are then passed through the fully connected layers of the detection branch for classification and bounding box regression, while the segmentation branch uses an FCN for upsampling to generate the segmentation map. Swin Transformer offers significant advantages over ResNet-50 as a backbone feature extraction network for Mask R-CNN: its hierarchical structure effectively captures multi-scale features, and its self-attention mechanisms enhance the ability to capture long-range dependencies, which is crucial for object detection and segmentation tasks. The Shifted Window Attention mechanism reduces computational complexity by shifting non-overlapping windows between layers while maintaining the ability to model long-range interactions. The feature extraction process of the Swin Transformer model is divided into four stages using a hierarchical self-attention mechanism; information can flow between the different levels, which helps the model capture deep features at scales ranging from local to global. Each Swin Transformer module first normalizes the input features, then computes window-based multi-head self-attention (W-MSA), adds the result to the input features through a residual connection, and normalizes again. The shifted window multi-head self-attention layer (SW-MSA) achieves information exchange between windows through window shifting and incorporates relative position encoding [43]. The attention calculation is restricted to within each window, thereby reducing computational cost. The specific network structure is shown in Figure 4c.
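The cyclic-shift trick behind W-MSA/SW-MSA can be illustrated in a few lines of PyTorch. This is a simplified sketch of window partitioning and shifting only (the attention computation and attention masking are omitted), not the authors’ implementation:

import torch

def window_partition(x, window_size):
    # Split a feature map (B, H, W, C) into non-overlapping windows of
    # shape (num_windows * B, window_size, window_size, C).
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)

# Toy feature map: batch 1, 8 x 8 spatial grid, 96 channels, window size 4
x = torch.randn(1, 8, 8, 96)
windows = window_partition(x, 4)  # W-MSA attends within each of these windows
# SW-MSA: cyclically shift the map by half a window before partitioning,
# so that information can cross the previous window boundaries
shifted = torch.roll(x, shifts=(-2, -2), dims=(1, 2))
shifted_windows = window_partition(shifted, 4)
print(windows.shape, shifted_windows.shape)  # torch.Size([4, 4, 4, 96]) for both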
In addition, the integrity and continuity of the extracted silhouette of the target Mongolian horse are ensured by using the Simple Linear Iterative Clustering (SLIC) algorithm to extract superpixels [44]. The predicted cluster labels of all pixels within each superpixel are counted, and the most frequent label is assigned to every pixel in that superpixel; i.e., all pixels in a superpixel are forced to share the same cluster label. Since only a single target Mongolian horse was segmented in each image in this study, a fixed number of labels was set to ensure model convergence. Finally, the loss between the network output and the clustered labels is calculated, and back-propagation transmits the error signal to update the parameters of the convolutional filters and the classifier, so that the model alternates between the forward superpixel refinement stage and the backward gradient descent stage. This module improves the mask generation accuracy for the target Mongolian horse to a certain extent and prepares the mask for the subsequent extraction of measurement points.
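A sketch of this majority-vote refinement step, assuming scikit-image’s slic function for superpixel extraction (the paper does not name its SLIC implementation, and n_segments is illustrative):

import numpy as np
from skimage.segmentation import slic

def refine_labels(image, pixel_labels, n_segments=200):
    # Force every pixel inside a SLIC superpixel to the cluster label
    # that occurs most often within that superpixel.
    superpixels = slic(image, n_segments=n_segments, compactness=10)
    refined = pixel_labels.copy()
    for sp in np.unique(superpixels):
        mask = superpixels == sp
        values, counts = np.unique(pixel_labels[mask], return_counts=True)
        refined[mask] = values[np.argmax(counts)]  # majority vote
    return refined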

2.4. Measurement of Physical Parameters of Mongolian Horses

2.4.1. Body Measurement Point Extraction Based on Improved Harris Algorithm

Because Mongolian horses have numerous physical parameters, seven parameters that are closely related to athletic performance and have high heritability coefficients were analyzed: shoulder height, chest depth, withers height, body length, croup height, shoulder angle, and croup angle [45]. Shoulder height refers to the vertical distance from the anterior border of the sternum to the ground; chest depth refers to the vertical distance from the withers to the elbow end point; withers height refers to the vertical distance from the withers to the ground; body length refers to the distance from the anterior border of the sternum to the caudal border of the ischial tuberosity; croup height refers to the vertical distance from the highest point of the croup to the ground; shoulder angle refers to the angle formed by the withers, the anterior border of the sternum, and the elbow end point; and croup angle refers to the angle formed by the highest point of the croup, the caudal border of the ischial tuberosity, and the stifle joint of the hind limb, as shown in Figure 5.
By analyzing the locations of the Mongolian horse measurement points, it was found that all six measurement points correspond to points of maximum curvature on the contour edge. The principle of the Harris corner detection algorithm is to move a small window over a local area of the image and compute the change in gray scale within the window; if the change is greater than a threshold, the center pixel of the window is determined to be a corner point. This principle can be represented using an autocorrelation matrix calculation as
M(p) = \sum_{(x,y) \in W} \begin{bmatrix} I_x^2(p) & I_{xy}(p) \\ I_{xy}(p) & I_y^2(p) \end{bmatrix} \omega(p) = \begin{bmatrix} A & C \\ C & B \end{bmatrix}   (1)
where $I_x(p)$ denotes the gradient in the x-direction at position p in image I; $I_y(p)$ denotes the gradient in the y-direction at position p in image I; and $\omega(p)$ denotes the Gaussian filter.
After calculating the response function R, if R is a local maximum and greater than the threshold, then the pixel location is a corner point.
R = \det(M) - k \cdot \mathrm{trace}(M)^2   (2)
where $\det(M)$ and $\mathrm{trace}(M)$ are the determinant and trace, respectively, of the autocorrelation matrix in Equation (1), and k is an empirical constant that usually lies in the interval [0.04, 0.06]. Setting a higher threshold in the traditional Harris algorithm leads to the detection of fewer corner points, while a lower threshold tends to produce overly dense corner points. Therefore, in this study, we used an adaptive iterative thresholding method to select a suitable threshold for the Harris algorithm according to each horse contour map, reducing cases of overly dense or insufficient feature points and thereby improving detection accuracy [46]. The algorithm is given below (Algorithm 2), and its effect is shown in Figure 6, where the red symbols mark the extracted contour feature points.
Algorithm 2. Adaptive iterative thresholding method
Input: Harris response values R_matrix at the contour points of the Mongolian horse (contour extracted using Canny edge detection)
Output: Threshold T used for contour feature point extraction
   1. Set initial parameters:
      K = 1                      (convergence tolerance)
      Rmax = max(R_matrix)
      Rmin = min(R_matrix)
      T = (Rmax + Rmin)/2
   2. Partition the elements of R_matrix by T:
      G1 = [element for element in R_matrix if element > T]
      G2 = [element for element in R_matrix if element ≤ T]
   3. Calculate the arithmetic means of G1 and G2:
      µ1 = sum(G1)/len(G1)
      µ2 = sum(G2)/len(G2)
   4. Compute the candidate threshold:
      T_new = (µ1 + µ2)/2
   5. Iterate until the threshold converges:
      WHILE abs(T_new − T) > K:
         T = T_new
         G1 = [element for element in R_matrix if element > T]
         G2 = [element for element in R_matrix if element ≤ T]
         µ1 = sum(G1)/len(G1)
         µ2 = sum(G2)/len(G2)
         T_new = (µ1 + µ2)/2
      END WHILE
   6. Set the final threshold: T = T_new
   7. Use T for contour feature point extraction
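A compact Python version of Algorithm 2 applied to a Harris response map, assuming OpenCV’s cornerHarris and Canny functions (the file name and detector parameters are illustrative):

import cv2
import numpy as np

def adaptive_threshold(R, K=1.0):
    # Iterative threshold selection over the Harris responses R (Algorithm 2)
    T = (R.max() + R.min()) / 2.0
    while True:
        g1, g2 = R[R > T], R[R <= T]
        T_new = (g1.mean() + g2.mean()) / 2.0
        if abs(T_new - T) <= K:
            return T_new
        T = T_new

gray = cv2.imread("horse_mask.png", cv2.IMREAD_GRAYSCALE)  # hypothetical mask image
edges = cv2.Canny(gray, 100, 200)                   # contour of the segmented horse
R = cv2.cornerHarris(np.float32(gray), 2, 3, 0.04)  # Harris response, k = 0.04
R_contour = R[edges > 0]                            # responses on contour pixels only
T = adaptive_threshold(R_contour)
corners = (R > T) & (edges > 0)                     # feature points on the contour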

2.4.2. Body Measurement Point Extraction Based on Edge Contours

Although the improved Harris algorithm can reliably detect the coordinates of obvious inflection points on the contour, such as the hooves, the elbow end point, and the stifle joints of the hind limbs, the remaining measurement points on irregular contour segments are still not adequately detected by the Harris algorithm alone. Therefore, taking advantage of the particular shape of the horse’s body, the center of mass is used as the origin to divide the contour into intervals. The formula for the center of mass is as follows:
\sum_{i=1}^{n} p_i (x_i - \bar{x}) = 0, \qquad \sum_{i=1}^{n} p_i (y_i - \bar{y}) = 0   (3)
In a two-dimensional image, let $x_i$ and $y_i$ denote the coordinates of each pixel in the x and y directions, $p_i$ the corresponding pixel value, and $\bar{x}$ and $\bar{y}$ the coordinates of the center of mass in the x and y directions. Equation (3) can then be converted to
\bar{x} = \frac{\sum_{i=1}^{n} p_i x_i}{\sum_{i=1}^{n} p_i}, \qquad \bar{y} = \frac{\sum_{i=1}^{n} p_i y_i}{\sum_{i=1}^{n} p_i}   (4)
The center-of-mass coordinates $(\bar{x}, \bar{y})$ of a given horse were calculated using Equation (4) and set as the origin of the coordinate system. In this coordinate system, the target horse in an image is divided into four zones, in which the withers falls in zone I; the highest point of the croup and the caudal border of the ischial tuberosity fall in zone II; and the anterior border of the sternum falls in zone III. Interval one extends from the point where the x-axis intersects the left side of the contour to the point where the y-axis intersects the upper contour line; interval two extends from the point where the y-axis intersects the upper contour line to the point where the horizontal line through the origin intersects the right side of the contour; interval three extends from that intersection on the right lateral contour line down along the x-axis; and interval four extends from the intersection of the x-axis with the left lateral contour to the point of maximum curvature on the left side of the forelimb. Since the edge contour is an irregular curve, the polynomial fitting method was used to fit the contour in segments, and the point of maximum curvature of each segmented curve was then calculated. The fitting results are shown in Figure 7, where the blue points are the contour pixel points, the red curves are the curves fitted to those points, and the x and y axes indicate the distribution of the pixels in the image. The calculated points of maximum curvature of the fitted curves are labeled in Figure 8. The polynomial curve fitting formula is
y(x, w) = \sum_{i=0}^{M} w_i x^i   (5)
The specific locations of the measurement points are determined by the curvature of the curve, calculated as shown in Equation (6):
k = \frac{\left| y''(x, w) \right|}{\left( 1 + y'(x, w)^2 \right)^{3/2}}   (6)
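The centroid, segment fitting, and curvature steps map directly onto NumPy; the following is a sketch under the assumption that the mask is a binary image and each contour interval is fitted with a fifth-order polynomial (the paper does not state the polynomial order):

import numpy as np

def centroid(mask):
    # Center of mass of a binary mask (Equation (4)); for a binary mask all
    # foreground pixels carry equal weight p_i.
    ys, xs = np.nonzero(mask)
    return xs.mean(), ys.mean()

def max_curvature_point(xs, ys, degree=5):
    # Fit y(x) = sum_i w_i x^i to one contour interval (Equation (5)) and return
    # the point with maximum curvature k = |y''| / (1 + y'^2)^(3/2) (Equation (6)).
    w = np.polyfit(xs, ys, degree)
    d1, d2 = np.polyder(w), np.polyder(w, 2)
    grid = np.linspace(xs.min(), xs.max(), 500)
    k = np.abs(np.polyval(d2, grid)) / (1 + np.polyval(d1, grid) ** 2) ** 1.5
    best = np.argmax(k)
    return grid[best], np.polyval(w, grid[best])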

2.4.3. Calculation of Body Parameters

The real body parameters of a Mongolian horse can be calculated from its pixel measurements using a scale parameter that relates pixel distance to physical length. For this purpose, before the start of the test, a height reference rod with a length of 100 cm was placed at the midpoint of each horse’s position (Figure 8), a camera was used to capture the scale information in the images, and the rod’s pixel distance was calculated. The scale parameter Lscale is computed as shown below, where Lpixel is the rod’s pixel distance and Ltruth is its true length:
L_{scale} = \frac{L_{truth}}{L_{pixel}}   (7)
Using the straight line formed by the lowest points of the anterior and posterior hooves detected by the improved Harris algorithm as the ground reference line, the extracted measurement points were matched in sequence, and the corresponding three linear body parameters, namely, withers height, shoulder height, and croup height, were calculated according to Equation (8):
d = \frac{\left| A x_0 + B y_0 + C \right|}{\sqrt{A^2 + B^2}} \times L_{scale}   (8)
Two linear body parameters, body length and chest depth, were calculated according to Equation (9):
l = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \times L_{scale}   (9)
The shoulder angle was calculated from the slopes k1 and k2 of the lines connecting the anterior border of the sternum to the withers and to the elbow end point, respectively; the croup angle was calculated analogously from the lines connecting the caudal border of the ischial tuberosity to the highest point of the croup and to the stifle joint of the hind limb. The calculation formula is as follows:
\theta = \arctan \left| \frac{k_1 - k_2}{1 + k_1 k_2} \right|   (10)
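These geometric formulas translate into short Python helpers; a sketch assuming the measurement points are given as (x, y) pixel coordinates and scale is the Lscale value from Equation (7):

import math

def point_to_line(p, a, b, scale):
    # Real distance (Equation (8)) from point p to the line through a and b,
    # e.g., from the withers to the ground line through the hoof points.
    (x0, y0), (x1, y1), (x2, y2) = p, a, b
    A, B, C = y2 - y1, x1 - x2, x2 * y1 - x1 * y2  # line Ax + By + C = 0
    return abs(A * x0 + B * y0 + C) / math.hypot(A, B) * scale

def point_to_point(p1, p2, scale):
    # Real distance between two measurement points (Equation (9)).
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1]) * scale

def angle_between(vertex, p1, p2):
    # Angle in degrees at `vertex` between the lines to p1 and p2 (Equation (10));
    # assumes neither connecting line is vertical.
    k1 = (p1[1] - vertex[1]) / (p1[0] - vertex[0])
    k2 = (p2[1] - vertex[1]) / (p2[0] - vertex[0])
    return math.degrees(math.atan(abs((k1 - k2) / (1 + k1 * k2))))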

3. Experiments and Results

3.1. Experimental Design

The test was conducted at the Equine Research Technology Base of Inner Mongolia Agricultural University, Hohhot, Inner Mongolia Autonomous Region. The test subjects were 20 adult Mongolian horses, each of which was numbered. First, the horses to be tested were restrained in the enclosure, and their body parameters were measured manually using measuring sticks, tape measures, and other measuring tools; each parameter was measured three times to determine the average value. Secondly, the breeder used a halter to lead each horse from right to left, walking it at a constant speed past the camera to capture a complete side view; the corresponding shooting schematic is shown in Figure 9. The average length of each captured video of a horse walking is 10 s, so one image was intercepted from each video every 150 frames to ensure the integrity of the Mongolian horse outline in the image, and the intercepted images were used as the test set for the experimental model. After each test-set image was input into the convolutional neural network model to obtain the horse contour, the measurement key points were automatically extracted, and the scale parameter was applied to obtain the real body parameter values, which were compared with the data obtained from manual measurement to verify the effectiveness of the method.

3.2. Comparison of Segmentation Results for the Mongolian Horse Test Set

To further evaluate the performance of the proposed model in horse contour segmentation and extraction, its segmentation effect on the Mongolian horse test set was examined. Pixel accuracy (PA) and mean intersection over union (MIoU) were used as evaluation indexes of image segmentation accuracy, and the proposed model was compared with the representative models developed by Chu et al. [32] and Li et al. [47], as well as the original Mask R-CNN [37], YOLACT [48], and SOLO [49], to validate and analyze its performance in horse silhouette extraction. The results on the same set of test images are shown in Table 2.
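For reference, PA and MIoU for the horse-vs-background segmentation can be computed from the predicted and ground-truth label maps as follows; a sketch assuming NumPy arrays with class values {0, 1}:

import numpy as np

def pa_miou(pred, gt, n_classes=2):
    # Pixel accuracy: fraction of pixels whose predicted class matches the label
    pa = np.mean(pred == gt)
    # Mean IoU: intersection over union averaged over the classes present
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return pa, float(np.mean(ious))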
As shown in Table 2, the proposed model segments the target Mongolian horse silhouette relatively accurately, with a pixel accuracy of 98.72% and an MIoU of 95.36%. Compared with Mask R-CNN, YOLACT, Chu et al.’s model [32], SOLO, and Li et al.’s model [47], its pixel accuracy was 7.26, 6.39, 8.57, 5.28, and 0.46 percentage points higher, and its MIoU was 10.64, 9.85, 9.99, 8.90, and 0.39 percentage points higher, respectively. Although the proposed method requires more processing time for a single image, Figure 10 shows that it yields higher pixel accuracy and MIoU values, achieving more accurate extraction of the boundary contours of Mongolian horse images. In particular, the segmentation accuracy of the target horse’s contour edges exceeds that of the original Mask R-CNN model.

3.3. Validity Analysis of Body Parameter Measurements

The detection results for each key point and the heatmap representation results obtained after inputting the horse images to be measured into the algorithmic model proposed in this study are shown in Figure 11, from which it is evident that the algorithm proposed in this study can effectively locate and classify the location and category of each measurement key point. Finally, the automated calculation of individual body measurements is predicated on the coordinate data associated with keypoints across distinct categories.
As shown in Figure 11, comparing the automatically acquired values of each physical parameter with the manually measured values, the following results were obtained for the 20 Mongolian horses: the average relative error of shoulder height was 4.01%; withers height, 2.98%; chest depth, 4.86%; body length, 2.97%; croup height, 3.06%; shoulder angle, 4.91%; and croup angle, 5.21%. The errors of the shoulder angle and croup angle are noticeably larger than those of the linear parameters. This difference may be due to pronounced changes in the muscles of the chest and croup regions when individual horses walk at slow speeds, which differ to some degree from the resting state in which the manual reference values were measured, leading to bias in the measurement results.

4. Conclusions and Discussion

In this research, we propose a method for measuring the body parameters of Mongolian horses based on deep learning and image processing, and the main conclusions are as follows:
(1) Our method can acquire the body parameters related to a Mongolian horse’s athletic performance while the horse is in its natural state. It not only obtains the horses’ growth information in a contactless way but can also be used in selection and breeding programs aimed at improving the Mongolian horse’s athletic performance.
(2) Utilizing the improved Mask R-CNN model for contour extraction from images of horses in a natural walking state effectively solves the problem of rough edge contours produced by the original Mask R-CNN model, which has a lower feature space resolution and a smaller proportion of boundary pixels, thus improving the segmentation accuracy for the target Mongolian horse contour. Compared with the Mask R-CNN model, the YOLACT model, Chu et al.’s model [32], the SOLO model, and Li et al.’s model [47], the PA yielded by our model was 7.26, 6.39, 8.57, 5.28, and 0.46 percentage points higher, and the MIoU was 10.64, 9.85, 9.99, 8.90, and 0.39 percentage points higher, respectively.
(3) Using the improved Harris-based algorithm, the optimal threshold was obtained through iteration, and the feature points in Mongolian horse contour images were then detected according to this threshold, effectively solving the inaccurate feature point detection caused by the irrational threshold setting of the original Harris algorithm; the obtained feature points were used as measurement point locations. The contours were divided into intervals based on the center of mass of the horse in an image, the curve in each interval was fitted, and the point of maximum curvature of each curve was taken as a measurement point position. Combined with the scale parameter, the average relative errors of the calculated linear and angular parameters are 3.58% and 5.06%, respectively, allowing for a more accurate estimation of body parameters.
(4) The method proposed in this paper is computationally intensive and relies on multiple algorithms. Although the approach emphasizes 2D body size measurements, critical values such as withers height, shoulder height, chest depth, body length, croup height, shoulder angle, and croup angle can be effectively gauged using just a single RGB camera. Despite a marginally longer processing time compared with that reported in other studies, equipment expenditure was minimized because only a single RGB camera was used. Future research will focus on making this algorithm simpler and more integrated.

Author Contributions

Conceptualization, methodology, writing—original draft preparation, and investigation, L.S.; validation, formal analysis, and writing—review and editing, M.L. and Z.Z.; and writing—review and editing, supervision, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (32360856), the Natural Science Foundation of Inner Mongolia Autonomous Region (2022QN03019), the Scientific Research Program of Higher Education Institutions in Inner Mongolia Autonomous Region (NJZY22516), and the Innovation Team of Higher Education Institutions in Inner Mongolia Autonomous Region (NMGIRT2312).

Institutional Review Board Statement

Ethical review and approval were waived for this study because it caused no stress to the horses: there was no contact with the horses during the data collection process, and the collection did not affect their normal activities.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Gibson, M.J.; Legg, K.A.; Gee, E.K.; Chin, Y.Y.; Rogers, C.W. Career profile and pattern of racing for Thoroughbred jumps-racing horses in New Zealand. Anim. Prod. Sci. 2024, 64, AN23422. [Google Scholar] [CrossRef]
  2. Luke, K.L.; Rawluk, A.; McAdie, T.; Smith, B.P.; Warren-Smith, A.K. How equestrians conceptualise horse welfare: Does it facilitate or hinder change? Anim. Welf. 2023, 32, e59. [Google Scholar] [CrossRef]
  3. Li, X.H.; Zhu, X.Y.; Liu, R.J. Inner Mongolia Horse Industry Development Path Analysis. Mod. Anim. Husb. Sci. Technol. 2022, 12, 126–128. [Google Scholar]
  4. Singh, A.; Pal, Y.; Kumar, R.; Kumar, S.; Bhardwaj, A.; Rani, K.; Ana, R. Equine husbandry based agri-entrepreneurship-an overview. J. Community Mobilization Sustain. Dev. 2022, 3, 697–704. [Google Scholar]
  5. Mang, L.; Bai, D.Y. Analysis of the current situation of the horse industry in Inner Mongolia autonomous region. North. Econ. 2019, 11, 20–25. [Google Scholar]
  6. Zhu, Y. Research on the Integrated Development of Horse Tourism Industry in Inner Mongolia. J. Jining Norm. Univ. 2024, 46, 18–22. [Google Scholar]
  7. Sun, P.; Li, Y.; Zeng, H. Multidimensional Value and Path Exploration of Digital Transformation of China’s Horse Industry in the Context of Digital Economy. In Proceedings of the First Hubei Sports Science Conference, Huanggang, China, 9 December 2023; Volume 2. [Google Scholar]
  8. Tanaka, A.; Ohshima, H.; Tanezaki, T.; Muko, R.; Oikawa, M.; Matsuda, H. Development of a New IoT Device to Acquire and Analyze Gait Information of Horses in Training. Preprints 2024, 2024060507. [Google Scholar] [CrossRef]
  9. Ghezelsoflou, H.; Hamidi, P.; Gharahveysi, S. Study of factors affecting the body conformation traits of Iranian Turkoman horses. J. Equine Sci. 2018, 29, 91–96. [Google Scholar] [CrossRef]
  10. Paksoy, Y.; Ünal, N. Multivariate analysis of morphometry effect on race performance in Thoroughbred horses. Rev. Bras. Zootec. 2019, 48, e20180030. [Google Scholar] [CrossRef]
  11. Janczarek, I.; Wilk, I.; Strzelec, K. Correlations between body dimensions of young trotters and motion parameters and racing performance. Pferdeheilkunde 2017, 33, 139–145. [Google Scholar] [CrossRef]
  12. Yildirim, F. The differences in arabian horse body measurements used in different horse sports (racing and jereed). AgroLife Sci. J. 2023, 12, 248–252. [Google Scholar] [CrossRef]
  13. Li, C.; Mellbin, Y.; Krogager, J.; Polikovsky, S.; Holmberg, M.; Ghorbani, N.; Black, M.J.; Kjellström, H.; Zuffi, S.; Hernlund, E. The Poses for Equine Research Dataset (PFERD). Sci. Data 2024, 11, 497. [Google Scholar] [CrossRef]
  14. Rhodin, M.; Smit, I.H.; Persson-Sjodin, E. Timing of Vertical Head, Withers and Pelvis Movements Relative to the Footfalls in Different Equine Gaits and Breeds. Animals 2022, 12, 3053. [Google Scholar] [CrossRef]
  15. Starke, S.D.; May, S.A. Robustness of five different visual assessment methods for the evaluation of hindlimb lameness based on tubera coxarum movement in horses at the trot on a straight line. Equine Vet. J. 2022, 54, 1103–1113. [Google Scholar] [CrossRef]
  16. Todd, E.T.; Fromentier, A.; Sutcliffe, R.; Collin, Y.R.H.; Perdereau, A.; Aury, J.-M.; Èche, C.; Bouchez, O.; Donnadieu, C.; Wincker, P.; et al. Imputed genomes of historical horses provide insights into modern breeding. iScience 2023, 26, 107104. [Google Scholar] [CrossRef]
  17. Mariz, T.M.A.; Santos, W.K.; Mota, L.F.M.; Martins, R.B.; Lima, C.B.; Escodro, P.B.; Lima, D.M., Jr.; Oliveira, L.P.; Sousa, M.F.; Ribeiro, J.S. Evaluation morphostructural measures in horses of the Quarter horse Breed using image analysis. Acta Vet. Bras. 2015, 9, 362–370. [Google Scholar]
  18. Santos, M.R.; Freiberger, G.; Bottin, F.; Chiocca, M.; Zampar, A.; Cucco, D.C. Evaluation of methodologies for equine biometry. Livest. Sci. 2017, 206, 24–31. [Google Scholar] [CrossRef]
  19. Pérez-Ruiz, M.; Tarrat-Martín, D.; Sánchez-Guerrero, M.J.; Valera, M. Advances in horse morphometric measurements using LiDAR. Comput. Electron. Agric. 2020, 174, 105510. [Google Scholar] [CrossRef]
  20. Freitag, G.P.; De Lima, L.G.F.; Jacomini, J.A.; Kozicki, L.E.; Ribeiro, L.B. An accurate image analysis method for estimating body measurements in horses. J. Equine Vet. Sci. 2021, 101, 103418. [Google Scholar] [CrossRef]
  21. Rhodin, M.; Persson-Sjodin, E.; Egenvall, A. Vertical movement symmetry of the withers in horses with induced forelimb and hindlimb lameness at trot. Equine Vet. J. 2018, 50, 818–824. [Google Scholar] [CrossRef]
  22. Zhao, Y.; Xiao, Q.; Li, J.; Tian, K.; Yang, L.; Shan, P.; Lv, X.; Li, L.; Zhan, Z. Review on image-based animals weight weighing. Comput. Electron. Agric. 2023, 215, 108456. [Google Scholar] [CrossRef]
  23. Yang, G.; Xu, X.; Song, L.; Zhang, Q.; Duan, Y.; Song, H. Automated measurement of dairy cows body size via 3D point cloud data analysis. Comput. Electron. Agric. 2022, 200, 107218. [Google Scholar] [CrossRef]
  24. Matsuura, A.; Torii, S.; Ojima, Y.; Kiku, Y. 3D imaging and body measurement of riding horses using four scanners simultaneously. J. Jining Norm. Univ. 2024, 35, 1–7. [Google Scholar] [CrossRef]
  25. Le, C.Y.; Allain, C.; Caillot, A.; Delouard, J.; Delattre, L.; Luginbuhl, T.; Faverdin, P. High-precision scanning system for complete 3D cow body shape imaging and analysis of morphological traits. Comput. Electron. Agric. 2019, 157, 447–453. [Google Scholar]
  26. Du, A.; Guo, H.; Lu, J. Automatic livestock body measurement based on keypoint detection with multiple depth cameras. Comput. Electron. Agric. 2022, 198, 107059. [Google Scholar] [CrossRef]
  27. Martin-Cirera, A.; Nowak, M.; Norton, T.; Auer, U.; Oczak, M. Comparison of Transformers with LSTM for classification of the behavioural time budget in horses based on video data. Biosyst. Eng. 2024, 242, 154–168. [Google Scholar] [CrossRef]
  28. Li, X.; Gao, R.; Li, Q.; Wang, R.; Liu, S.; Huang, W.; Yang, L.; Zhuo, Z. Multi-Target Feeding-Behavior Recognition Method for Cows Based on Improved RefineMask. Sensors 2024, 24, 2975. [Google Scholar] [CrossRef]
  29. Patel, M.; Gu, Y.; Carstensen, L.C.; Hasselmo, M.E.; Betke, M. Animal pose tracking: 3d multimodal dataset and token-based pose optimization. Int. J. Comput. Vis. 2022, 131, 514–530. [Google Scholar] [CrossRef]
  30. Gosztolai, A.; Günel, S.; Lobato-Ríos, V.; Pietro Abrate, M.; Daniel, M.; Helge, R.; Pascal, F.; Pavan, R. Liftpose3d, a deep learning-based approach for transforming two-dimensional to three-dimensional poses in laboratory animals. Nat. Methods 2021, 18, 975–981. [Google Scholar] [CrossRef]
  31. Joska, D.; Clark, L.; Muramatsu, N. Acinoset: A 3d pose estimation dataset and baseline models for cheetahs in the wild. In Proceedings of the IEEE International Conference on Robotics and Automation, Xi’an, China, 30 May–5 June 2021; pp. 13901–13908. [Google Scholar]
  32. Chu, M.Y.; Li, M.F.; Li, Q.; Liu, G. Method of Cows Body Size Measurement Based on Key Frame Extraction and Head and Neck Removal. Trans. Chin. Soc. Agric. Mach. 2022, 53, 226–233+259. [Google Scholar]
  33. Geng, Y.L.; Ji, Y.K.; Yue, X.D.; Fu, Y.F. Pigs Body Size Measurement Based on Point Cloud Semantic Segmentation. Trans. Chin. Soc. Agric. Mach. 2023, 54, 332–338+380. [Google Scholar]
  34. Li, K.; Teng, G. Study on body size measurement method of goat and cattle under different background based on deep learning. Electronics 2022, 11, 993. [Google Scholar] [CrossRef]
  35. Song, H.B.; Hua, Z.X.; Ma, B.L.; Wen, Y.C.; Kong, X.F.; Xu, X.S. Lightweight Keypoint Detection Method of Dairy Cow Based on SimCC-ShuffleNetV2. Trans. Chin. Soc. Agric. Mach. 2023, 54, 275–281+363. [Google Scholar]
  36. Russell, B.C.; Torralba, A.; Murphy, K.P.; Freeman, W.T. LabelMe: A database and web-based tool for image annotation. Int. J. Comput. Vis. 2008, 77, 157–173. [Google Scholar] [CrossRef]
  37. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  38. Vedaldi, A.; Zisserman, A. Vgg Convolutional Neural Networks Practical; Department of Engineering Science, University of Oxford: Oxford, UK, 2016; Volume 66. [Google Scholar]
  39. Targ, S.; Almeida, D.; Lyman, K. Resnet in resnet: Generalizing residual architectures. arXiv 2016, arXiv:1603.08029. [Google Scholar]
  40. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  41. Anantharaman, R.; Velazquez, M.; Lee, Y. Utilizing mask R-CNN for detection and segmentation of oral diseases. In Proceedings of the 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Madrid, Spain, 3–6 December 2018; pp. 2197–2204. [Google Scholar]
  42. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
  43. Wu, C.; Zheng, J.; Feng, Z.; Zhang, H.; Zhang, L.; Cao, J.; Yan, H. Fuzzy SLIC: Fuzzy simple linear iterative clustering. In Proceedings of the 2024 34th International Conference Radioelektronika (RADIOELEKTRONIKA), Zilina, Slovakia, 17–18 April 2024; pp. 1–5. [Google Scholar]
  44. Gibril, M.; Ruzouq, R.; Bolcek, J.; Shanableh, A.; Jena, R. Building Extraction from Satellite Images Using Mask R-CNN and Swin Transformer. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 2114–2124. [Google Scholar]
  45. Meira, C.T.; Curi, R.A.; Silva, J.A.I.V.; Corrêa, M.J.M.; de Oliveira, H.N.; da Mota, M.D.S. Morphological and genomic differences between cutting and racing lines of Quarter Horses. J. Equine Vet. Sci. 2013, 33, 244–249. [Google Scholar] [CrossRef]
  46. Li, Y.; Li, J. Harris corner detection algorithm based on improved contourlet transform. Procedia Eng. 2011, 15, 2239–2243. [Google Scholar]
  47. Li, M.; Su, L.; Zhang, Y.; Zong, Z.; Zhang, S. Automatic Measurement of Mongolian Horse Body Based on Improved YOLOv8n-pose and Point Cloud Analysis. Smart Agric. 2024. [Google Scholar] [CrossRef]
  48. Zhang, J.; Cheng, Y. Design of Horse Size Measurement System Based on Image Segmentation. Comput. Technol. Dev. 2020, 30, 177–180. [Google Scholar]
  49. Bello, R.W.; Ikeremo, E.S.; Otobo, F.N.; Olubummo, D.A.; Enuma, O.C. Cattle Segmentation and Contour Detection Based on Solo for Precision Livestock Husbandry. J. Appl. Sci. Environ. Manag. 2022, 26, 1713–1720. [Google Scholar] [CrossRef]
Figure 1. Examples of data enhancements.
Figure 2. Flow chart of Mongolian horse body parameter extraction method.
Figure 3. Sample examples.
Figure 4. The overall structure of the optimized Mask R-CNN.
Figure 5. a. Tuberosity measurement point of the sternum; b. withers; c. the highest point of the croup; d. caudal portion measurement point of the ischial tuberosity; e. elbow end point; f. hind limb stifle joint; 1. shoulder height; 2. chest depth; 3. withers height; 4. body length; 5. croup height; α. shoulder angle; β. croup angle.
Figure 6. Result yielded by Harris algorithm based on iterative threshold.
Figure 7. Curve fitting in different regions.
Figure 8. Result of regional division and placement of measurement point markers in each region.
Figure 9. Layout of shooting environment.
Figure 10. Comparison of Mongolian horse segmentation performance for Mask R-CNN and our algorithm.
Figure 11. Relative error box plot of body measurements.
Table 1. Image quantity distribution for Mongolian horse dataset.

Origin    Mirror    Rotate    Brightness    Noise
1380      345       345       345           345
Table 2. Comparison of segmentation performance for Mongolian horses yielded by different segmentation algorithms.

Network Model        PA (%)    MIoU (%)    Average Detection Time (s)
Mask R-CNN [37]      91.46     84.72       0.76
YOLACT               92.33     85.51       0.57
SOLO                 93.44     86.46       0.48
Chu et al. [32]      90.15     85.37       -
Li et al. [47]       98.26     94.97       0.96
Our research         98.72     95.36       2.97