Article

Exploring Attributions in Convolutional Neural Networks for Cow Identification

1 Faculty of Veterinary Medicine, Trakia University, 6000 Stara Zagora, Bulgaria
2 Department of Bioinformatics and Mathematical Modelling, Institute of Biophysics and Biomedical Engineering, Bulgarian Academy of Sciences, 1000 Sofia, Bulgaria
3 Veterinary Legislation Unit, Faculty of Veterinary Medicine, Trakia University, 6000 Stara Zagora, Bulgaria
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(7), 3622; https://doi.org/10.3390/app15073622
Submission received: 16 February 2025 / Revised: 18 March 2025 / Accepted: 18 March 2025 / Published: 26 March 2025

Abstract

Face recognition and identification is a method that is well established in traffic monitoring, security, human biodata analysis, etc. Given the current development and implementation of digitalization in all spheres of public life, new approaches are being sought to use the opportunities of high-technology advancements in animal husbandry to enhance the sector’s sustainability. Using machine learning, the present study aims to investigate the possibilities for the creation of a model for visual face recognition of farm animals (cows) that could be used in future applications to manage the health, welfare, and productivity of the animals at the herd and individual levels in real time. This study provides preliminary results from an ongoing research project, which employs attribution methods to identify which parts of a facial image contribute most to cow identification using a triplet loss network. A new dataset for identifying cows in farm environments was therefore created by taking digital images of cows at animal holdings with intensive breeding systems. After normalization, the images were segmented into cow and background regions. Several methods were then explored for analyzing attributions and examining whether the cow or background regions have a greater influence on the network’s performance in identifying the animal.

1. Introduction

Cattle identification and the ability to precisely pinpoint an individual among many other animals in the herd are critical to modern livestock management. They enable effective tracking of animal movement, the traceability of products of animal origin, and health monitoring, including surveillance of contagious disease outbreaks. They also enhance the optimization of production and performance and improve animal welfare. Traditional identification methods such as ear tags and RFID transponders are susceptible to loss, tagging errors, or fraud, necessitating automated, non-invasive solutions [1]. In response, computer vision and deep learning have emerged as promising tools for animal identification, offering high accuracy and scalability. With the advances in the field of computer vision, animal recognition has undergone new development through visual biometrics [2,3]. With the use of machine learning and a set of algorithms for image processing and recognition, researchers have successfully identified several vertebrate species [4,5], including wildlife [6,7,8,9] and farm animals like sheep [10], pigs [11,12], and cattle [13,14,15]. Particular success was achieved with the cattle recognition process using deep learning and CNNs [13,16,17]. Researchers went further in the identification of face images and cow face parts [18,19] and even in distinguishing particular cattle [20,21,22].
Convolutional neural networks (CNNs) are widely used in image classification [23], object detection [24,25], and identification [26], because they can automatically learn complex patterns in data. Recent advancements in CNNs have demonstrated strong performance in biometric recognition across multiple domains, including human face recognition, wildlife monitoring, and livestock identification [27]. However, while CNNs excel at learning distinguishing patterns, their decision-making processes remain largely opaque, raising concerns about their robustness and generalization in real-world farm conditions [28].
These networks operate largely as “black boxes”, and it is often unclear which parts of an image influence the network’s predictions [29]. In particular, CNN-based animal recognition models may rely not solely on biometric features but also on environmental cues, leading to potential overfitting to background elements rather than the animal itself. This lack of interpretability becomes a significant limitation, especially when the purpose is to assess the ability of the model to generalize with limited data available.
It is important to emphasize that this study explores computer vision and deep learning for the task of identification. Identification involves a search process, where the goal is to pinpoint the exact match from a database. The images in the database are, in general, not available during training. For example, when identifying a cow on a farm, the system first creates a database with all cows on the farm. Then, for a new image of a cow, it checks the database to find the closest match.
In contrast, the task of recognition compares the input against predefined classes or categories. In general, the model is trained with images that represent these classes. For example, in the task of recognizing a cow in a farm, each cow represents one class, but there can be many images for one cow. The model has seen all classes during training and directly outputs the predicted class. In livestock farming, the benefits of individual recognition include health monitoring, improving welfare, and breeding [30].
The task of re-identification is recognizing an individual across different instances or contexts. It is similar to identification, but does not compare an instance to a database [31].
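The identification task described above, embedding a new image and searching the gallery for the closest match, can be sketched in a few lines. This is a minimal illustration, not the authors' code; the function and cow names are hypothetical and the embeddings are placeholder 2-D vectors:

```python
import numpy as np

def identify(query_emb, gallery_embs, gallery_ids):
    """Return the id whose gallery embedding is closest to the query.

    gallery_embs: (N, D) array of database embeddings;
    gallery_ids: list of N cow identifiers (names are hypothetical).
    """
    dists = np.linalg.norm(gallery_embs - query_emb, axis=1)
    return gallery_ids[int(np.argmin(dists))]

gallery = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
ids = ["cow_A", "cow_B", "cow_C"]
print(identify(np.array([0.9, 1.1]), gallery, ids))  # prints "cow_B"
```

In a deployed system the gallery would hold one or more embeddings per enrolled cow, and new cows could be added without retraining the network.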
One of the main challenges in CNN-based animal identification is ensuring that models focus on biometric features (e.g., facial patterns, body structure) rather than on background features [32]. Previous studies have shown that animal recognition models trained in one environment often fail under changed conditions, suggesting that CNNs learn dataset-specific artifacts rather than biologically relevant animal recognition features [33]. This issue necessitates the use of explanation techniques when training the model to understand which image regions contribute most to classification decisions.
In order to help the model generalize better, images of the animals should be taken in different environments. However, this is a challenging task due to the need to keep track of the individual cows, and it is also time-consuming. Hence, it would be useful to apply attribution methods to detect a lack of generalization caused by the background in the images. Models that do not generalize well perform poorly on unseen objects or environments, and identification systems with poor performance are not useful on real farms.
Data Shapley values are defined in [34] as a method to reliably assess the contribution of data records to the performance of the model.
Based on the previous research in the application of machine learning in the field of cattle face recognition [18,35,36], the present paper focused on cow face identification with the aim of exploring the effect of the background on the generalization of an identification model trained on dairy cows from the Holstein breed. For this purpose, a procedure that quantifies the influence of the background in relation to the performance of the identification model was proposed. Some attribution methods were investigated in order to determine their capabilities to quantify the influence of the background without the need for intensive segmentation. The existence of such an attribution method would allow the builders of identification systems to assess the model’s ability to generalize to different backgrounds by inspecting the attribution output. Attribution methods in image tasks aim to explain model predictions by identifying the importance of input pixels or regions that contribute to the output. Techniques like saliency maps, occlusion, and gradient-based approaches visualize these contributions, offering insights into how models process visual data. However, the research cited focuses either on classification tasks and/or evaluates the results only visually. The present paper proposes a test that can numerically compare the results of a model with the output of an attribution map.
Within the current initial stage of the study, two related questions were posed: (i) Does the cow (foreground) or the surrounding background contribute more to the results of the identification task? (ii) Is there an attribution method that can quantify correctly the contribution of the foreground vs. the background for the performance of the model?

2. Materials and Methods

The present study, which is the first phase of an ongoing two-year project on cow face visual identification, included the creation of datasets, model training, and the exploration of different methods to evaluate the influence of the background in the process of cow identification.
An experimental dataset was created and introduced, containing images of 168 dairy cows of the Holstein breed, consisting of 760 images in total. The images were taken at several farms situated in Southern Bulgaria in the period from 1 April to 30 June 2024. The photos were manually captured on-site in daylight with a Panasonic Lumix DMC-GX8 camera (Kadoma City, Japan) with a Panasonic Leica 15 mm f/1.7 lens. Each cow has at least three images. Near-duplicate images were removed by visual inspection to make sure the test results would not be compromised. It should be noted that in some of the images more than one animal was visible; a record was kept for the correct identification of the individual cows. The image resolution was 908 pixels (width) by 1210 pixels (height) with a horizontal and vertical resolution of 72 dpi.
The images from the dataset were afterwards annotated with the identity of the particular animal. In this step of the dataset preparation, we used Roboflow [37,38], a cloud-based platform that streamlines image annotation and preprocessing. The images were uploaded and segmented using built-in tools, ensuring a standardized dataset for training and evaluation. During the procedure, the cow regions were manually annotated, ensuring precise bounding lines for the segmentation between cow and background. Using standardized annotation techniques with Roboflow enhanced the model reliability, as previous studies have shown that consistent dataset preprocessing significantly improves classification accuracy [39].
The segmentation was completed for all 760 images, clearly defining foreground and background regions. Then, to prepare the foreground images, the background was replaced with white noise; for the background images, the foreground was replaced instead.
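Given a binary segmentation mask, the noise-replacement step can be sketched as follows. This is a minimal illustration under assumed conventions (uint8 RGB images, a boolean mask with True marking cow pixels), not the authors' code:

```python
import numpy as np

def replace_region(image, mask, keep="foreground", rng=None):
    """Replace everything outside the kept region with white noise.

    image: (H, W, 3) uint8 array; mask: (H, W) bool array, True = cow.
    keep="foreground" noises the background; keep="background" noises the cow.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    noise = rng.integers(0, 256, size=image.shape, dtype=np.uint8)
    keep_mask = mask if keep == "foreground" else ~mask
    return np.where(keep_mask[..., None], image, noise)
```

Using independent uniform noise rather than a constant fill avoids giving the replaced region a single color the network could latch onto.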
The prepared data were afterwards split into training, validation, and test datasets with approximate ratios of 34%, 33% and 33% of the cows. When a cow was randomly assigned to one of the splits, all its images became part of the split. This ensured that the cows in the validation and the test set were not seen during training.
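The cow-level split described above, where every image of a cow follows that cow into its split, can be sketched as follows (the function name and record format are hypothetical; the ratios are taken from the text):

```python
import random
from collections import defaultdict

def split_by_cow(records, ratios=(0.34, 0.33, 0.33), seed=42):
    """Split (cow_id, image_path) records so each cow's images stay together."""
    by_cow = defaultdict(list)
    for cow_id, image_path in records:
        by_cow[cow_id].append(image_path)
    cows = sorted(by_cow)
    random.Random(seed).shuffle(cows)           # randomize cow assignment
    n_train = round(ratios[0] * len(cows))
    n_val = round(ratios[1] * len(cows))
    groups = {"train": cows[:n_train],
              "val": cows[n_train:n_train + n_val],
              "test": cows[n_train + n_val:]}
    # expand each cow group back into its image paths
    return {name: [p for c in group for p in by_cow[c]]
            for name, group in groups.items()}
```

Splitting at the cow level rather than the image level is what guarantees that validation and test cows are never seen during training.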
For the training process, the last layer of a ResNet50 [35] model was fine-tuned on an Nvidia RTX A4500 (Nvidia, Santa Clara, CA, USA) graphical processing unit with 20 GB of VRAM on a cloud-based platform. The use of a pre-trained model allowed the efficient fine-tuning of the network on high-performance hardware, significantly reducing the computational time. During training, the following random augmentations were applied: resized crop, horizontal flip, rotation, color jitter, affine transform, erasing, and Gaussian noise.
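A configuration of this kind could look roughly like the following PyTorch/torchvision sketch. The embedding size, augmentation magnitudes, and weight variant are illustrative assumptions, not the paper's exact settings:

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Freeze the pre-trained backbone and fine-tune only the final layer,
# replaced here to output the embedding used by the triplet loss
# (the 128-dimensional size is an assumption, not stated in the paper).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 128)

# The random augmentations listed above, with illustrative magnitudes.
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    transforms.ToTensor(),
    transforms.RandomErasing(),
    transforms.Lambda(lambda t: t + 0.01 * torch.randn_like(t)),  # Gaussian noise
])
```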
The loss function used was triplet loss [26], which is designed for identification. The model learns by pulling similar items closer together in an embedding space while pushing dissimilar items further apart. The loss function operates on triplets of samples, each consisting of an anchor (a reference sample), a positive sample (similar to the anchor), and a negative sample (dissimilar to the anchor).
The triplet loss function can be mathematically expressed as follows:
L(a, p, n) = max(d(a, p) − d(a, n) + α, 0)
where d(a, p) is the distance between the anchor a and the positive sample p; d(a, n) is the distance between the anchor and the negative sample n; and α is a positive number that ensures a gap between the distances of positive and negative pairs.
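The triplet loss above can be computed directly; a minimal sketch with Euclidean distance on placeholder 2-D embeddings:

```python
import numpy as np

def triplet_loss(a, p, n, alpha=0.2):
    """L(a, p, n) = max(d(a, p) - d(a, n) + alpha, 0), with Euclidean d."""
    d_ap = np.linalg.norm(a - p)
    d_an = np.linalg.norm(a - n)
    return max(d_ap - d_an + alpha, 0.0)

a = np.array([0.0, 0.0])   # anchor embedding
p = np.array([0.1, 0.0])   # positive: close to the anchor
n = np.array([2.0, 0.0])   # negative: far from the anchor
print(triplet_loss(a, p, n))  # 0.0 -- the margin is already satisfied
```

When the negative is already farther than the positive by more than the margin α, the loss is zero and the triplet contributes no gradient.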
Triplet loss is widely used in various fields, including face recognition, where it was introduced in the FaceNet paper [26]. For animal re-identification, it has been used in [31].
The following evaluation of attribution methods was proposed:
  • Compute the Shapley value of the accuracy of the model for the foreground and the background;
  • Compute the average contribution for a pixel in the foreground and in the background;
  • Compare the previous two values.
The hypothesis was that if the average attribution was close to the Shapley value, then it could be assumed that the attribution method correctly allocated the importance according to the model.
In the definition of the Shapley value in Equation (2), i is either foreground or background; the set of players N has two elements: foreground and background; the value or the payoff v is the accuracy of the identification model.
ϕ_i = Σ_{S ⊆ N∖{i}} [ |S|! (n − |S| − 1)! / n! ] · (v(S ∪ {i}) − v(S))
More specifically, the definition of v(S) for the different values of S is as follows:
  • Foreground: The foreground was left and the background was replaced with white noise for both the test and reference images; v(S) is the accuracy of the model over these images.
  • Background: The process was executed as previously mentioned, but the background was left and the foreground was replaced with white noise.
  • Empty set: v(∅) is set to the expected probability of being correct if the individual is guessed at random.
  • Foreground and background: v(S) is the accuracy of the model on the original images.
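With only two players (foreground and background), the Shapley value reduces to averaging each player's marginal contribution over the two possible join orders. The following sketch uses the accuracies reported in the Results section; the empty-set payoff is a hypothetical random-guess baseline of 1/56, since the exact test-set size is not stated in the text:

```python
def shapley_two_player(v):
    """Exact Shapley values for the two-player game {fg, bg}.

    v maps frozensets of players to the coalition payoff
    (here: the accuracy of the fixed identification model).
    """
    phi = {}
    for i, other in (("fg", "bg"), ("bg", "fg")):
        # average of i's marginal contribution over both join orders
        phi[i] = 0.5 * ((v[frozenset({i})] - v[frozenset()])
                        + (v[frozenset({"fg", "bg"})] - v[frozenset({other})]))
    return phi

# Accuracies reported in the Results section; the empty-set payoff is an
# assumed random-guess baseline (1/56), not a value stated in the paper.
v = {frozenset(): 1 / 56,
     frozenset({"fg"}): 0.402,
     frozenset({"bg"}): 0.500,
     frozenset({"fg", "bg"}): 0.922}
phi = shapley_two_player(v)
print(round(phi["fg"] / phi["bg"], 3))  # foreground-to-background ratio, ~0.804
```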
It should be noted that the model is kept fixed during the evaluation of the accuracy, and it is trained on the original images. The foreground and the background images are presented in Figure 1.
The attribution methods that were chosen for identifying the regions of the cow face images that contribute the most to the final output were gradient-based approaches [40]. Several gradient methods are in use in computer vision and machine learning, but relatively few have been proposed for triplet loss networks. In the present study, we used adaptations of Vanilla gradient, Grad-CAM [41], and EmbeddingCAM [42] in order to highlight the parts of the cattle face images that appear to be most important for the decision made by the CNN in the identification process.
The Shapley value for the foreground and background images was computed to measure the relative contributions, similar to research where the Shapley value assesses the importance of dataset images [43]. In experimental research [40], Grad-CAM showed the highest accuracy among other explainable AI (XAI) methods. Ref. [44] suggested that particular modifications of the Grad-CAM method perform even better. However, the evaluation of the goodness of explanations in most of the research still relies on review by a human expert. In the current experiment, the use of Shapley values aims to give an objective measure of the attribution of a region in the image with regard to the trained model.
Vanilla gradient [45] is a technique used to interpret neural network predictions by examining the gradients of the output with respect to the input features:
∂f_w(a; p, n) / ∂a
where f_w(a; p, n) = MSE(a, p) − MSE(a, n) is the difference between the mean squared errors (MSE) with respect to the positive and negative images; a is the anchor image, and w are the weights of the model. This approach helps identify which input features are most influential in the network’s decision-making process by highlighting areas where slight changes in the input lead to significant changes in the output. By computing the gradients, the influence of each input pixel (in the case of images) on the final prediction can be traced, providing insights into which aspects of the data the model deems important.
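For a toy case where the “embedding” is the image itself, the vanilla gradient has a closed form that a short sketch can verify; in the real model the same quantity is obtained by backpropagating through the network to the input pixels:

```python
import numpy as np

def vanilla_gradient(a, p, n):
    """Gradient of f(a) = MSE(a, p) - MSE(a, n) with respect to the anchor.

    For a toy identity embedding, the gradient has the closed form
    (2/N) * ((a - p) - (a - n)); a real network would require autograd.
    """
    return 2.0 / a.size * ((a - p) - (a - n))

a = np.array([1.0, 2.0])
p = np.array([1.0, 1.0])
n = np.array([3.0, 2.0])
print(vanilla_gradient(a, p, n))  # per-pixel saliency: [2. 1.]
```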
This method has several limitations that have been addressed in new methods [46]. Gradients are often noisy, meaning they can highlight irrelevant or uninformative pixels alongside meaningful features. This noise can make it difficult to interpret the results and may obscure the true areas of interest in the image. In Figure 2 this can be clearly seen over one of the test images.
Grad-CAM combines gradients with feature maps to produce more spatially coherent, class-discriminative visualizations [47].
The Grad-CAM for an image a is defined as follows:
L_a^Grad-CAM = ReLU( Σ_k α_k A^k )
where A k is the k-th feature map of the last convolutional layer and α k is the importance weight computed as follows:
α_k = (1/Z) Σ_i Σ_j ∂l / ∂A^k_{ij}
In the experiments carried out, l is defined as the average mean squared error for a over the positives minus the average mean squared error over the negatives; Z is a normalization constant that is canceled during comparison and visualization; and ∂l/∂A^k_{ij} is the gradient of l with respect to the feature map activation. An example of Grad-CAM can be seen in Figure 3.
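Given the feature maps and the gradients of l from the last convolutional layer, the Grad-CAM computation above takes only a few lines. The array shapes are illustrative, and the 1/Z factor is absorbed into the mean, which cancels during comparison as noted above:

```python
import numpy as np

def grad_cam(feature_maps, grads):
    """Grad-CAM heatmap: ReLU(sum_k alpha_k * A^k).

    feature_maps: (K, H, W) activations A^k of the last conv layer;
    grads: (K, H, W) gradients dl/dA^k_ij of the triplet-style loss l.
    """
    alphas = grads.mean(axis=(1, 2))                  # alpha_k, up to the 1/Z factor
    cam = np.tensordot(alphas, feature_maps, axes=1)  # sum_k alpha_k A^k
    return np.maximum(cam, 0.0)                       # ReLU
```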
Ref. [41] adapts Grad-CAM for embedding networks by using gradient weights computed from multiple training examples. During testing, the method identifies the closest training example to the test image and uses that example’s gradient weight to generate a heatmap for the test image. This approach, called Average Grad-CAM, requires multiple training images and an index to efficiently retrieve the nearest training sample for generating heatmaps for new inputs (Figure 4).
A more recent development in [42] proposes a modification to Grad-CAM suitable for embedding networks—EmbeddingCAM. It uses a proxy point p c in the embedding space, where the index c is analogous to the class in classification. In the present experiments c ranges over the individual cows in the test dataset.
For the proxy point, the average embedding was taken for each cow c.
p_c = ( (1/n) Σ_{j∈c} y_j ) / ‖ (1/n) Σ_{j∈c} y_j ‖
The loss is defined as the dot product between the normalized embedding of the anchor, y, and p_c:
L c ( y ) = y · p c
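The proxy point and the EmbeddingCAM loss above can be sketched directly; the embeddings below are placeholder arrays, not model outputs:

```python
import numpy as np

def proxy_point(embeddings):
    """Normalized mean embedding p_c over one cow's images (rows)."""
    mean = embeddings.mean(axis=0)
    return mean / np.linalg.norm(mean)

def embedding_cam_loss(y, p_c):
    """L_c(y) = y . p_c for the normalized anchor embedding y."""
    y = y / np.linalg.norm(y)
    return float(np.dot(y, p_c))
```

Gradients of this scalar loss with respect to the feature maps then drive the Grad-CAM-style weighting, exactly as in the classification setting.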
This method is visualized in Figure 5.

3. Results

The accuracy of the model over the original images is 92.2%; using only the foreground, it drops to 40.2%, while using only the background gives a slightly better accuracy of 50.0% (Table 1). This suggests that the model uses both the foreground and the background to make the embedding of the images, with the background having a little more importance.
In Table 2, the ratio of the foreground to the background was compared for the Shapley values and the three gradient methods: Vanilla gradient, Average Grad-CAM, and EmbeddingCAM. For each method, the mean weight was computed for the regions defined as foreground and background. A method that has a ratio of the foreground to the background close to the Shapley value ratio would be more desirable.

4. Discussion

Deep learning and convolutional neural networks (CNNs) are successfully used in object and pattern recognition through the processing of massive data [48,49]. The recognition system is based on the learning of specific features that are extracted from a dataset of images [50,51]. While the recent application of CNNs in veterinary medicine is focused on disease detection and prevention [36,52], the present study focused on cow face identification. In this process, a dataset of images was created and annotated afterwards, as described by [53], an approach also applied by [49]. The subsequent segmentation of the images into foreground and background, completed for all 760 images, was found to accurately define the features needed for the analysis [54,55]. Ref. [56] argued that the segmentation technique could increase the accuracy of the recognition experiment.
For the segmentation of the original images, the chosen method included the application of white noise for replacement of the background in the preparation of foreground images, and the reverse technique was applied for the background.
The analyses of [57,58,59] were in favor of such a multi-center approach.
The attribution methods chosen in the study for determining the regions of the cow face images most important to the final output were gradient-based approaches. Several gradient methods are in use in computer vision and machine learning [60]. Among those described in the scientific literature, the current experiment consecutively applied Vanilla gradient [61], Grad-CAM [62], and EmbeddingCAM in order to highlight the parts of the cattle face images that appear to be most important for the decision made by the CNN in the recognition process. The Shapley value was computed to measure the relative contributions to the tested model, as described by [63], tracking the influence of the foreground and background images in the present study. In experimental research [64], Grad-CAM showed the highest accuracy among other explainable AI (XAI) methods.
With heatmaps of test images, ref. [65] suggested that particular modifications of the Grad-CAM method could yield even more interpretable results.
While gradient methods are derived using intuitive reasoning, they do not have provable guarantees of their correctness. Shapley-value-based attribution methods, on the other hand, have game-theoretic guarantees, but they are prohibitively slow to compute exactly for most applications. The popular Python 3 package SHAP [66] offers practical modifications of the Shapley-based attribution algorithms. In the current experiment, the process covered only two players (foreground and background), hence the exponential complexity of the exact computation was not a problem.
During the analysis, the experimental model showed an accuracy of 92.2% over the test cow images. These results are comparable to other research studies on image classification with artificial neural networks: [67] reported an accuracy of 92.9%, and ref. [59] reached an accuracy of 91.66%. These findings show that the triplet loss architecture easily outperforms earlier architectures such as Fisherfaces [11], which achieved 77% accuracy on the recognition task over 10 pigs, despite the much larger training set.
When segmented and using only the foreground (cow), the model’s accuracy dropped significantly (40.2%). Using only the background showed a slightly higher accuracy (50.0%), indicating that the model relies on the background as well, which is not desirable for cattle identification in a real environment. With a Shapley foreground-to-background ratio of 80.4%, the Average Grad-CAM ratio of 98.8% was the closest to this target, while EmbeddingCAM favored the foreground influence (117.7%).
From a practical perspective, improving individual cow identification accuracy could have significant implications for precision livestock farming, including for the Holstein breed, which represents the predominant share of dairy cows in Bulgaria. A robust AI-based identification system would enable automated health monitoring with correctly recorded cattle medical history reports, thus reducing the risk of misdiagnosed treatments [68]. Further benefits would include the tracking of individual movement patterns to help identify early signs of illness, stress, or abnormal behavior, allowing farmers to take proactive measures to ensure high standards of animal welfare [69].

5. Conclusions

The present study, which applied a triplet loss function to identify dairy cattle based on biometric features, revealed that none of the evaluated gradient attribution methods adequately captured the Shapley value in estimating the contribution of foreground and background regions to model accuracy. The model relied more on background features for identification, while the tested methods—Vanilla gradients, Grad-CAM, and EmbeddingCAM—could not confirm this conclusively. This finding raises a problem for real-world applications of the method, as a working identification system must prioritize biometric features over environmental cues to ensure robustness and generalizability across farms.
The findings presented, while offering a novel methodology in the field of animal identification, emphasize the need for bias mitigation techniques in CNN-based livestock identification. They provide a basis for further exploration of whether a new gradient method for triplet loss networks, or a modification of the presented methods, can more accurately reflect the model’s focus on biometric features. Such advances will facilitate the implementation of AI systems in precision livestock farming, improving farm management, breeding, genetic selection, animal health, and welfare management by increasing the accuracy of animal detection and identification.

Author Contributions

Conceptualization, D.T. and A.M.; methodology, D.T. and A.M.; coding and machine learning, A.M.; validation, A.M., D.T. and I.L.; formal analysis, A.M.; investigation, A.M. and G.B.; resources, D.T. and I.L.; data curation, D.T. and A.M.; writing—original draft preparation, A.M. and G.B.; writing—review and editing, A.M., G.B., D.T. and R.R.; visualization, A.M.; supervision, A.M., and G.B.; project administration, G.B.; funding acquisition, G.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Education and Science in Bulgaria within the framework of the Bulgarian National Recovery and Resilience Plan, Component “Innovative Bulgaria”, Project No. BG-RRP-2.004-0006-C02 “Development of research and innovation at Trakia University in the service of health and sustainable well-being”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to legal reasons.

Acknowledgments

The authors express their gratitude to all farm owners who willingly cooperated during the process of data collection and permitted the capture of digital images of their cattle and their subsequent use for the research.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Meng, Y.; Yoon, S.; Han, S.; Fuentes, A.; Park, J.; Jeong, Y.; Park, D.S. Improving Known-Unknown Cattle’s Face Recognition for Smart Livestock Farm Management. Animals 2023, 13, 3588. [Google Scholar] [CrossRef] [PubMed]
  2. Kühl, H.S.; Burghardt, T. Animal Biometrics: Quantifying and Detecting Phenotypic Appearance. Trends Ecol. Evol. 2013, 28, 432–441. [Google Scholar] [CrossRef] [PubMed]
  3. Kumar, S.; Singh, S.K. Cattle Recognition: A New Frontier in Visual Animal Biometrics Research. Proc. Natl. Acad. Sci. India Sect. A Phys. Sci. 2019, 90, 689–708. [Google Scholar] [CrossRef]
  4. El Abbadi, N.K.; Alsaadi, E.M.T.A. An Automated Vertebrate Animals Classification Using Deep Convolution Neural Networks. In Proceedings of the 2020 International Conference on Computer Science and Software Engineering (CSASE), Duhok, Iraq, 16–18 April 2020; pp. 72–77. [Google Scholar] [CrossRef]
  5. Samabth, M.; Lella, G.; Arulalan, K.; Rathinavel, M.; Aravindhar, D.J.; Ravi, S. Automated Face Authentication and Alert System Using AI. Alinteri J. Agric. Sci. 2021, 36, 181–186. [Google Scholar] [CrossRef]
  6. Hiby, L.; Lovell, P. Computer-Aided Matching of Natural Markings: A Prototype System for Grey Seals. In Proceedings of the Report of the International Whaling Commission, Washington, DC, USA, 2 December 1990; pp. 57–61. [Google Scholar]
  7. Chen, G.; Han, T.X.; He, Z.; Kays, R.; Forrester, T. Deep convolutional neural network based species recognition for wild animal monitoring. In Proceedings of the 2014 IEEE International Conference on Image Processing (ICIP), Paris, France, 27–30 October 2014; pp. 858–862. [Google Scholar] [CrossRef]
  8. Islam, S.B.; Valles, D. Identification of Wild Species in Texas from Camera-trap Images using Deep Neural Network for Conservation Monitoring. In Proceedings of the 2020 10th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 6–8 January 2020; pp. 0537–0542. [Google Scholar] [CrossRef]
  9. Ráduly, Z.; Sulyok, C.; Vadászi, Z.; Zölde, A. Dog Breed Identification Using Deep Learning. In Proceedings of the 2018 IEEE 16th International Symposium on Intelligent Systems and Informatics (SISY), Subotica, Serbia, 13–15 September 2018; pp. 000271–000276. [Google Scholar] [CrossRef]
  10. Abu Jwade, S.; Guzzomi, A.; Mian, A. On-Farm Automatic Sheep Breed Classification Using Deep Learning. Comput. Electron. Agric. 2019, 167, 105055. [Google Scholar] [CrossRef]
  11. Hansen, M.F.; Smith, M.L.; Smith, L.N.; Salter, M.G.; Baxter, E.M.; Farish, M.; Grieve, B.D. Towards On-Farm Pig Face Recognition Using Convolutional Neural Networks. Comput. Ind. 2018, 98, 145–152. [Google Scholar] [CrossRef]
Figure 1. The original image together with the corresponding foreground and background images. The model is trained on the original images; the test and reference images can be any of the three.
Figure 2. Vanilla gradient contributions superimposed over the original image. Some small patterns are visible, but overall the attributions are noisy.
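For reference, vanilla gradient attribution simply differentiates the model's output with respect to each input pixel and takes the magnitude as saliency. A minimal sketch under stated assumptions: the two-pixel linear "embedding" W, input x, and reference embedding e_ref below are hypothetical stand-ins, not the paper's network; a CNN obtains the same quantity via backpropagation rather than the closed form used here.

```python
# Vanilla gradient attribution sketch for an embedding model.
# For d(x) = ||W x - e_ref||^2 with a linear map W, the gradient has the
# closed form 2 * W^T (W x - e_ref); a CNN computes this by backprop.

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vanilla_gradient(W, x, e_ref):
    r = [a - b for a, b in zip(matvec(W, x), e_ref)]  # W x - e_ref
    Wt = [list(col) for col in zip(*W)]               # transpose of W
    return [2 * g for g in matvec(Wt, r)]             # 2 * W^T r

W = [[1.0, 0.0], [0.0, 2.0]]   # hypothetical embedding weights
x = [0.5, 0.5]                 # a two-pixel "image"
e_ref = [0.0, 0.0]             # hypothetical reference embedding
saliency = [abs(g) for g in vanilla_gradient(W, x, e_ref)]
print(saliency)  # [1.0, 4.0]: the second pixel dominates the attribution
```

In a real saliency map each pixel's magnitude is rendered as brightness, which is what Figure 2 superimposes over the original image.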
Figure 3. Grad-CAM contributions superimposed over the original image. Lighter regions indicate higher attribution.
Figure 4. Average Grad-CAM contributions superimposed over the original image. Lighter regions indicate higher attribution.
Figure 5. EmbeddingCAM contributions superimposed over the original image. Lighter regions indicate higher attribution.
Table 1. The accuracy over test and reference images and the corresponding Shapley value.
                 Original   Background   Foreground   Empty
Model accuracy   92.2%      50.0%        40.2%        2.0%
Shapley value    92.2%      86.3%        76.5%        2.0%
Table 2. The ratio of the Shapley values for foreground to background compared to the gradient attribution methods.
Method             Foreground:Background
Shapley value      80.4%
Vanilla gradient   100.0%
Average Grad-CAM   98.8%
EmbeddingCAM       117.7%
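For a game with only two players, the exact Shapley value is the average of a player's marginal contribution over both join orders. A minimal sketch, assuming the model accuracies reported in Table 1 serve as the coalition value function v over the players {foreground, background} (treating the regions as players is our reading; the numbers are taken directly from Table 1):

```python
# Exact two-player Shapley values over {foreground, background}, using
# the Table 1 model accuracies as the coalition value function v.
v = {
    frozenset(): 0.020,                 # empty image
    frozenset({"fg"}): 0.402,           # foreground only
    frozenset({"bg"}): 0.500,           # background only
    frozenset({"fg", "bg"}): 0.922,     # original image
}

def shapley(player):
    # Average the player's marginal contribution over both join orders.
    other = frozenset({"fg", "bg"}) - {player}
    joins_first = v[frozenset({player})] - v[frozenset()]
    joins_second = v[frozenset({"fg", "bg"})] - v[other]
    return 0.5 * (joins_first + joins_second)

phi_fg = shapley("fg")   # ~0.402
phi_bg = shapley("bg")   # ~0.500
ratio = phi_fg / phi_bg  # ~0.804, i.e. the 80.4% in Table 2
```

The efficiency property of the Shapley value holds as a sanity check: phi_fg + phi_bg = v(original) − v(empty) = 0.902, and the foreground:background ratio of 80.4% reported in Table 2 falls out as phi_fg / phi_bg.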

Tanchev, D.; Marazov, A.; Balieva, G.; Lazarova, I.; Rankova, R. Exploring Attributions in Convolutional Neural Networks for Cow Identification. Appl. Sci. 2025, 15, 3622. https://doi.org/10.3390/app15073622
