1. Introduction
Sheep identification enables health monitoring and large-scale, high-precision breeding management, which is of great significance for improving livestock production conditions, reducing labor costs, raising breeding efficiency, and promoting the digital and intelligent development of the livestock industry [1].
Traditional methods of individual livestock identification include manual observation and invasive equipment techniques. Manual observation relies heavily on the observer's experience and memory; it is highly subjective, has a low accuracy rate, and is labor intensive. Invasive equipment techniques mainly use ear tags and radio frequency identification (RFID) tags, which suffer from high cost, injury to the animal, difficulty of replacement, and low identification efficiency. For instance, Sun Yukun et al. [2] found that only 21% of individual buffaloes were still identifiable after wearing ear tags for 2 years. Because these methods rely on devices rather than on the animals themselves, individual identification is less reliable.
Currently, an increasing number of scholars both domestically and internationally are applying machine vision technology to agriculture and animal husbandry. The development of deep learning has led to the outstanding performance of neural network-based methods in image recognition tasks. However, as deep learning advances, model structures are becoming increasingly complex and training datasets continue to grow, leading to escalating demands on hardware and computational power [3]. The evolution of deep learning has therefore driven the development of lightweight neural networks, which maintain model accuracy while offering a smaller model size and faster inference, enabling efficient image recognition even in scenarios with limited hardware resources. Various convolutional neural networks (CNNs) have been developed for facial recognition tasks, including the identification of animals such as pigs, cows, and sheep. However, research on lightweight models in this context started relatively late, and the literature on the subject is limited [4,5,6].
In 2019, Yi Shi et al. [7] addressed the challenges posed by the nocturnal activity, small target size, high speed, and complex environments of wild rabbits by combining infrared thermal imaging with YOLOv3 (IR-YOLO). The experimental results indicated that IR-YOLO achieved a detection rate of 75% on infrared thermal imaging videos captured in complex environments, with an average detection speed of 51 frames per second, representing a 15% improvement in the detection rate and a 5 frames per second increase in the detection speed over the original YOLOv3.
In 2019, Yan Hongwen [8] used the basic principles of convolutional neural networks to construct three network structures, AlexNet, Mini-AlexNet, and Attention-AlexNet, all applied to the facial recognition of pigs. The accuracy rates achieved were 97.48%, 96.66%, and 98.11%, respectively, and the lightweight model, Mini-AlexNet, exhibited faster processing speeds.
In 2020, Feng Mingqiang et al. [9] used the ResNet50 neural network model for training and prediction on photos and developed a pig facial recognition app based on the MUI and Django frameworks. The experimental results indicated that, after optimizing the ResNet50 model, the accuracy rate reached around 92%.
In 2020, Yan Hongwen et al. [10] addressed multi-target detection of individual pigs by combining Feature Pyramid Attention (FPA) with Tiny-YOLO, using FPA modules of depth 3 to obtain the model FPA3-Tiny-YOLO. After incorporating the FPA3 module, Tiny-YOLO's recall, F1, and mAP increased by 6.73, 4.34, and 7.33 percentage points, respectively.
In 2021, Hu Zhiwei et al. [11] addressed the challenges posed by pig adhesion and pigpen obstruction in the multi-object instance detection of individual pigs in complex environments. They introduced a Dual Attention Unit (DAU) and incorporated it into the Feature Pyramid Network structure, concatenating DAU units to construct different spatial attention modules. As a result, the accuracy reached 92.8%.
In 2022, Wang et al. [12] designed a lightweight pig facial recognition model based on a deep convolutional neural network. For pig facial classification, they proposed an improved method based on the triplet margin loss function. This enhancement resulted in a 28% increase in average precision, yielding a mean average precision (mAP) of 94.04%.
In 2022, Li et al. [13] designed a lightweight neural network for cattle face recognition on an embedded system. They employed batch normalization to normalize the neural network inputs and applied Dropout after the ReLU activation function to enhance accuracy. As a result, the model size was significantly reduced while achieving an accuracy of 98.37%.
In 2022, Yang Jialin [14] constructed a sheep face detection model based on the RetinaFace model, choosing MobileNet as the feature extraction network and Complete Intersection over Union (CIoU) as the loss function. After the improvement, the average precision of the sheep face detection model reached 97.12%, with the computational load reduced to 64.7% of the original.
In 2022, Zhou Lixiang [15] introduced the GhostNet network into the RetinaFace detection model and constructed a lightweight sheep face detection model named G-RetinaFace. The experimental results indicate that the G-RetinaFace model operates at a faster speed with a reduced size of 107.1 MB. These studies are summarized in chronological order in Table 1.
In summary, through the collective efforts of researchers worldwide, deep learning has been widely applied to facial recognition in animals. Pig and cattle face recognition are well developed, whereas sheep face recognition applications remain relatively limited, especially in the domain of lightweight neural networks [16,17,18]. Furthermore, the availability of sheep face datasets is currently limited, with no large-scale publicly accessible datasets for reference [19]. To address the challenge of individual sheep identification in complex scenarios while meeting the requirements of a small model size and fast detection speed, this study focuses on sheep facial recognition and constructs a lightweight neural network model. A dataset was created by collecting 5371 clear facial photos of 114 sheep from various angles. Sheep face detection and algorithm improvements were carried out with the SSD algorithm as the foundation, followed by validation.
3. Results
3.1. Comparison of the Test Results after the SSD Improvement
To validate the optimization of sheep face detection under different improvement strategies, the effectiveness of the lightweight neural network (SqueezeNet), the ECA mechanism, and the BalancedL1 loss function was studied with SSD as the base network. The experimental results are presented in Table 5.
Replacing the backbone network VGG of the original SSD model with SqueezeNet, the improved network model exhibits a significant reduction in volume, with the detection speed increasing by 4.26 frames/s; however, the mean average precision decreases by 3.62%. These changes demonstrate that the lightweight neural network SqueezeNet effectively reduces the model's volume at the cost of a certain degree of average precision. After incorporating the ECA mechanism at the bottleneck front end of the feature extraction network with a parameter count of 32 × 1024, the mean average precision of the network model increases by 1.91%, the model volume remains essentially unchanged, and the detection speed improves by 0.56 frames/s. This demonstrates that the ECA mechanism can effectively enhance the global perception of small-scale sheep face features. Replacing the original network's smooth L1 loss function with the BalancedL1 loss function, the mean average precision of the network model increases by 1.47%, and the detection speed improves by 3.27 frames/s. This demonstrates that the BalancedL1 loss function, by boosting the gradient of inliers, promotes better matching of the predicted boxes to the real target boxes, thereby improving both detection accuracy and speed.
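For readers unfamiliar with the loss swap, the BalancedL1 loss can be sketched in a few lines. This is a minimal sketch using the default α = 0.5 and γ = 1.5 from the Libra R-CNN paper that introduced the loss; the hyperparameter values actually used in this study are not stated here, so they are an assumption.

```python
import math

def balanced_l1(x, alpha=0.5, gamma=1.5):
    """Balanced L1 loss (Libra R-CNN) applied to a box-regression residual x.

    `b` is fixed so that the inlier gradient alpha * ln(b*|x| + 1)
    equals gamma exactly at |x| = 1, and `C` keeps the two branches
    continuous at that point.
    """
    b = math.exp(gamma / alpha) - 1
    ax = abs(x)
    if ax < 1:
        # Inlier branch: larger gradients than smooth L1 near zero.
        return (alpha / b) * (b * ax + 1) * math.log(b * ax + 1) - alpha * ax
    # Outlier branch: linear, like L1, with a continuity constant.
    C = (alpha / b) * (b + 1) * math.log(b + 1) - alpha - gamma
    return gamma * ax + C
```

The key design choice is that `b` and `C` are not free parameters: they are determined by requiring the loss and its gradient to be continuous at |x| = 1, which is what lets the inlier gradient be promoted without destabilizing training on outliers.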
Considering that attention mechanisms can be easily influenced by the network structure, we investigated the strengths and weaknesses of different attention mechanisms using SSD + Sq as the base network. We separately employed the CA, SE, CBAM, and ECA modules, incorporating each into the feature extraction network at both the front end and the back end of the average pooling layer, with the parameters set at 12 × 1000. The model sizes under the four attention mechanisms remain essentially the same. The detection speeds of the SE1, CBAM2, ECA1, and ECA2 modules improved, with respective increases of 2.79 frames/s, 0.94 frames/s, 3.03 frames/s, and 1.19 frames/s, while those of CA1, CA2, SE2, and CBAM1 decreased, with respective reductions of 8.7 frames/s, 6.46 frames/s, 0.6 frames/s, and 2.36 frames/s. The CA2, SE1, and SE2 modules reduced the model's detection accuracy, with the mean average precision decreasing by 0.78%, 0.13%, and 0.22%, respectively, which is evidently unsuitable for the task of detecting sheep facial features. The CA1, CBAM1, CBAM2, ECA1, and ECA2 modules improved the network's mean average precision by 2.15%, 0.68%, 0.36%, 2.51%, and 1.9%, respectively. Therefore, the ECA1 attention mechanism introduced in this paper demonstrates a superior performance for this model.
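The ECA module that performs best here is structurally very small: global average pooling, a 1-D convolution across neighbouring channels with an adaptively chosen odd kernel size, and a sigmoid gate. The NumPy sketch below follows the ECA paper's kernel-size rule (γ = 2, b = 1); the uniform convolution weights stand in for weights that would be learned during training, so the numerical outputs are illustrative only, not this paper's trained module.

```python
import numpy as np

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1-D kernel size from the ECA paper:
    k = |log2(C)/gamma + b/gamma|, rounded to the nearest odd number."""
    t = int(abs(np.log2(channels) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1

def eca(feature_map):
    """Efficient Channel Attention over a (C, H, W) feature map."""
    c = feature_map.shape[0]
    k = eca_kernel_size(c)
    # Channel descriptor via global average pooling -> shape (C,)
    y = feature_map.mean(axis=(1, 2))
    # 1-D convolution across neighbouring channels; a uniform kernel
    # here, learned weights in a real network.
    w = np.full(k, 1.0 / k)
    y_padded = np.pad(y, k // 2, mode="edge")
    conv = np.convolve(y_padded, w, mode="valid")   # shape (C,)
    attn = 1.0 / (1.0 + np.exp(-conv))              # sigmoid gate
    # Rescale each channel by its attention weight.
    return feature_map * attn[:, None, None]
```

Because each channel's weight depends only on k neighbouring channels (k = 5 for a 1024-channel map under this rule), the module adds almost no parameters, which is consistent with the essentially unchanged model sizes observed in the comparison above.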
3.2. Analysis of the Ablation Experiment Results
The three improvement methods proposed in this study are SqueezeNet, ECA, and BalancedL1. On the self-constructed sheep face dataset, the following ablation experiments were designed: (1) starting from the original, unimproved SSD network, each of the three enhancement methods is incorporated separately to assess its optimization effect on the original algorithm; (2) starting from the fully enhanced SSD algorithm, each of the three improvement methods is individually removed to evaluate its impact on the final algorithm. The results are shown in Table 6. From Table 6, it can be observed that, compared to the original SSD algorithm, the ECA mechanism module exhibits the most pronounced improvement in accuracy: its average precision increased by 1.91%, the model size remained largely unchanged, and the detection speed increased by 0.56 frames per second. The introduction of the lightweight neural network SqueezeNet showed the most significant improvement in detection speed: the detection speed increased by 5.14 frames per second and the model size decreased by 96.4 MB, but the average precision decreased by 3.01%. Compared to the improved SSD-Sq-ECA1-B algorithm, removing the ECA1 attention mechanism module had the most significant impact on accuracy, resulting in a 2.43% decrease in average precision. Removing the lightweight neural network SqueezeNet had the most significant impact on model size and speed: the model size increased by 96.2 MB and the detection speed decreased by 3.95 frames per second. Compared to the original SSD algorithm, the SSD-Sq-ECA1-B algorithm proposed in this paper improved the average precision on the sheep face dataset by 2.17%, reduced the model size by 96.2 MB, and increased the detection speed by 7.13 frames per second. The SSD-Sq-ECA1-B algorithm thus enhances detection accuracy while ensuring detection efficiency.
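The large size reduction from SqueezeNet comes from its Fire modules, which replace a wide 3 × 3 convolution with a narrow 1 × 1 "squeeze" layer feeding parallel 1 × 1 and 3 × 3 "expand" layers. A back-of-the-envelope parameter count illustrates the effect; the channel widths below are illustrative examples, not the actual layer configuration of this paper's network.

```python
def conv_params(c_in, c_out, k):
    """Weights + biases of a single k x k convolution layer."""
    return c_in * c_out * k * k + c_out

def fire_params(c_in, squeeze, expand):
    """Parameters of a SqueezeNet Fire module: a 1x1 squeeze layer
    followed by parallel 1x1 and 3x3 expand layers, whose outputs are
    concatenated to 2 * expand channels."""
    return (conv_params(c_in, squeeze, 1)       # 1x1 squeeze
            + conv_params(squeeze, expand, 1)   # 1x1 expand
            + conv_params(squeeze, expand, 3))  # 3x3 expand

# Illustrative widths: 256 input channels, 512 output channels.
plain = conv_params(256, 512, 3)   # plain 3x3 convolution: 1,180,160
fire = fire_params(256, 32, 256)   # Fire module, same output width: 90,656
```

With these example widths the Fire module needs roughly one-thirteenth of the parameters of the plain convolution, which is the mechanism behind the large model-size reduction observed when SqueezeNet replaces VGG, even though the exact layer widths differ.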
3.3. Comparison of Experimental Results of Different Networks
On the self-constructed sheep face dataset, the algorithm proposed in this paper was compared with three mainstream network models, SSD, Faster R-CNN, and RetinaNet, in the same environment. The results are presented in Table 7. Compared to the other models, the proposed algorithm exhibits significant improvements in both detection speed and accuracy. The SSD-Sq-ECA1-B algorithm achieves an average precision of 82.39% and a detection speed of 66.11 frames per second, and the model size has decreased significantly from 132 MB to 35.8 MB. Compared to SSD, Faster R-CNN, and RetinaNet, the average precision of this algorithm has improved by 2.17%, 3.63%, and 1.3%, respectively, and the detection speed has increased by 7.13, 56.13, and 50.68 frames per second, respectively. In summary, the SSD-Sq-ECA1-B algorithm proposed in this paper demonstrates significant advantages in both detection accuracy and efficiency.
3.4. Comparison with State-of-the-Art Models
To investigate the performance of the proposed SSD-Sq-ECA1-B, we compared it with sheep face recognition models from prior research, which employed YOLOv3 and YOLOv4. The comparison results are presented in Table 8. As can be seen in Table 8, SSD-Sq-ECA1-B achieved the best model size and FPS performance: its model size was smaller by 85.85% and 41.79%, respectively, and its FPS was higher by 72.77% and 86.84%, respectively, making it more mobile friendly. The comparison results show that SSD-Sq-ECA1-B has significant advantages in terms of model volume and detection speed.
4. Discussion
This study constructed a lightweight sheep face recognition model, SSD-Sq-ECA1-B, for the identification of sheep faces. The model exhibits significant advantages in detection speed. The experimental results on the self-constructed sheep face dataset indicate that the SSD-Sq-ECA1-B model achieves an mAP of 82.39%, with a substantial reduction in size from 132 MB to 35.8 MB, and a detection speed of 66.11 frames per second. This makes the model suitable for deployment on mobile devices for video stream processing.
Because the captured sheep images cover only one breed, the Small-tailed Han sheep from a single region, future experiments will continue to expand the sheep face dataset by incorporating facial images of various breeds, thereby enhancing the diversity of the dataset.
From a long-term perspective, to facilitate the daily operations of herders, there is an urgent need to develop an embedded device at the current stage. It is therefore essential to minimize the model size as much as possible to facilitate deployment on embedded devices. Our future research direction is to provide herders with an accurate and efficient recognition device.
5. Conclusions
This study proposes a lightweight sheep face detection algorithm, SSD-Sq-ECA1-B, based on an improved SSD, offering a new approach to individual sheep identification that enhances both detection speed and accuracy. The experimental results indicate that the proposed method can effectively detect the facial identity information of sheep, achieving an average precision of 82.39%. Compared to models such as SSD, Faster R-CNN, and RetinaNet, the average precision improved by 2.17%, 3.63%, and 1.3%, respectively, and the detection speed increased by 7.13, 56.13, and 50.68 frames per second, respectively. Applied to sheep face identification, the algorithm not only enhances detection accuracy but also significantly reduces the model size and improves detection efficiency. It can serve as a reference for subsequent sheep face identity recognition tasks.