Article

Machine Learning to Predict Pregnancy in Dairy Cows: An Approach Integrating Automated Activity Monitoring and On-Farm Data

by Thaisa Campos Marques 1,2, Letícia Ribeiro Marques 1, Patrick Bezerra Fernandes 1, Fabio Soares de Lima 2, Tiago do Prado Paim 1 and Karen Martins Leão 1,*
1 Departamento de Zootecnia, Instituto Federal Goiano, Rio Verde 75901-970, Brazil
2 Department of Population Health and Reproduction, University of California, Davis, CA 95616, USA
* Author to whom correspondence should be addressed.
Animals 2024, 14(11), 1567; https://doi.org/10.3390/ani14111567
Submission received: 22 April 2024 / Revised: 21 May 2024 / Accepted: 22 May 2024 / Published: 25 May 2024


Simple Summary

Scientists have developed a way to more accurately predict when dairy cows are most likely to become pregnant using automated activity monitoring (AAM) systems to track their activity. These systems track the cow's movement and behavior in real time, which is crucial for determining the best time for artificial insemination (AI). This study used data from over a thousand Holstein cows to create a mathematical model that predicts pregnancy chances at the time of AI, considering not just the cow's activity data but also individual health, the environment, and even the specific bull used for insemination. This study found that combining on-farm data (like health and environmental conditions) with the AAM data gives a clearer picture of a cow's pregnancy chances compared to using AAM data alone. The random forest model, one of the mathematical methods used, was particularly good at reducing false positive predictions. This research suggests that merging detailed farm data with automated monitoring can greatly improve the predictions of pregnancy at the time of AI, which is beneficial for managing dairy cow reproduction efficiently.

Abstract

Automated activity monitoring (AAM) systems are critical in the dairy industry for detecting estrus and optimizing the timing of artificial insemination (AI), thus enhancing pregnancy success rates in cows. This study developed a predictive model to improve pregnancy success by integrating AAM data with cow-specific and environmental factors. Using data from 1054 cows, this study compared pregnancy outcomes between two AI timings: 8 or 10 h after the AAM alarm. Variables such as age, parity, body condition, locomotion, and vaginal discharge scores, peripartum diseases, the breeding program, the bull used for AI, milk production at the time of AI, and environmental conditions (season, relative humidity, and temperature–humidity index) were considered alongside the AAM data on rumination, activity, and estrus intensity. Six models were assessed for their efficacy in predicting pregnancy success: logistic regression, Bagged AdaBoost, linear discriminant analysis, random forest, support vector machine, and Bagged Classification Tree. Integrating the on-farm data with AAM data significantly enhanced pregnancy prediction accuracy at AI compared to using AAM data alone. The random forest models showed superior performance, with the highest Kappa statistic and the lowest false positive rates. The linear discriminant and logistic regression models demonstrated the best accuracy, the fewest false negatives, and the highest area under the curve. These findings suggest that combining on-farm and AAM data can significantly improve reproductive management in the dairy industry.

1. Introduction

Reproductive performance is key to the profitability of dairy farms. Several factors, encompassing cow health, management, and environmental conditions, can influence whether a cow becomes pregnant after artificial insemination (AI). These include the body condition score [1], retained placenta and periparturient diseases [2], parity [3], nutrition, heat stress, the month of AI [4], and environmental factors [5]. In this context, effective estrus detection is a determining factor in reproductive performance in intensive dairy production systems, as cows in these systems show lower peak activity and shorter estrus [6,7]. Automated activity monitoring (AAM) systems improve reproductive performance through accurate estrus detection, with consequent increases in service rates [8] and a reduction in the days to first AI [9]. By tracking increases in animal activity, AAM detects 15 to 35% more cows in estrus than visual observation, with detection rates above 80% [10]. Moreover, our group observed that including cow-specific variables and environmental factors in the statistical model can make estrus detection in dairy cows more precise [11]. In addition, although commercially available AAM systems are effective at detecting estrus, they currently do not estimate pregnancy probability, which could improve semen investment management. Integrating statistical models to predict pregnancy probabilities would allow for more strategic use of high-value semen in cows with higher chances of conception, thereby optimizing genetic gains and improving returns on investment.
Statistical models have been used to predict lifetime production [12], fertility [13,14], and health [15], and in genomic selection [16]. Machine learning models are an alternative to classical statistical models for developing predictive models from large datasets, such as those in livestock studies [17], allowing for proactive management decisions and the customization of approaches to specific farm conditions [18]. In cows, machine learning has been used to predict pregnancy success [19] and health disorders such as clinical mastitis, subclinical ketosis, lameness, and metritis [20].
In summary, while it is evident that the use of AAM can improve estrus detection, studies including cow-level variables and environmental factors associated with fertility in AAM machine learning models are still lacking. We hypothesized that incorporating on-farm factors like parity, peripartum health history, and environmental conditions in AAM models to predict pregnancy improves the models’ accuracy, sensitivity, and specificity using different machine learning algorithms. Therefore, the objective of this study was to compare the effect of including cow-level variables and environmental factors associated with fertility in predicting pregnancy at the time of AI in dairy cows using AAM through machine learning models.

2. Materials and Methods

2.1. Data and Animals

Our dataset came from a previous study comparing two AI times after the AAM alarm to predict pregnancy in dairy cows [11], where the methodology is described in detail. Briefly, this retrospective observational case–control study used data collected from a commercial dairy farm in southwestern Goiás, Brazil. The data, recorded in the farm's software, covered the period from January 2018 to December 2020. The study included 1054 Holstein cows producing an average of 11,154 kg of milk per cow per lactation (305 days). The cows were housed in a free-stall barn equipped with fans above the stalls and sprinklers along the feed line, and were milked three times daily. Farm personnel recorded cow-level data, namely age, parity, body condition score (BCS; 1 = thin to 5 = fat, according to Ferguson et al. [21]), locomotion score (LS; 1 = walked normally to 5 = lame, according to Sprecher et al. [22]), days in milk (DIM), milk production, somatic cell count (SCC), and retained fetal membranes (RFMs), as well as peripartum diseases (hypocalcemia, ketosis, displaced abomasum, laminitis, foot disorders, pneumonia, and clinical mastitis). Vaginal discharge was collected with the Metricheck® device (SimcroTech, Hamilton, New Zealand) at 11 ± 4 days postpartum and classified according to Sheldon et al. [23]. Cows with more than 50% pus or a fetid, watery, reddish-brown discharge were diagnosed with metritis and treated with a single dose of 20 mg ceftiofur hydrochloride (Lactofur®, Ouro Fino, Cravinhos, Brazil).
The daily time (in minutes) of activity and rumination was measured using an AAM system (SCR®, Netanya, Israel). The system also generated an estrus intensity score, used as an indicator of fertility, on a scale from 0 to 100, where 0 represents no fertility and 100 represents high fertility.
Cows were synchronized for first AI with either prostaglandin (PGF; 0.5 mg cloprostenol) or an estradiol- and progesterone-based program (EP), as described by Pereira et al. [24]. In the EP program, cows received two intravaginal progesterone implants (CIDR devices, Zoetis, Florham Park, NJ, USA) and 2 mg of estradiol benzoate on Day 0. On Day 7, they were treated with PGF (0.5 mg cloprostenol). On Day 9, the CIDR devices were removed and the cows received 1.0 mg of estradiol cypionate. AI was performed at 60 ± 7 DIM, either 8 h (n = 536) or 10 h (n = 518) after an AAM alarm with estrus intensity greater than 30. Pregnancy was diagnosed 30 days after AI by the veterinarian overseeing the farm.
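The synchronization calendar and the alarm-to-AI rule above can be written as a small scheduling sketch. It assumes that alarms arrive as timestamps together with the estrus intensity reported by the AAM system. The function names (`ep_schedule`, `planned_ai_time`) are hypothetical and are not part of the SCR software or the farm's workflow.

```python
from datetime import datetime, timedelta

# Constants taken from the text; everything else here is illustrative.
MIN_ESTRUS_INTENSITY = 30   # alarms must exceed this intensity to trigger AI
AI_DELAYS_H = (8, 10)       # the two AI times compared in the study

def ep_schedule(day0: datetime) -> dict:
    """Calendar of the estradiol/progesterone (EP) program from Day 0 to Day 9."""
    return {
        "two CIDR inserts + 2 mg estradiol benzoate": day0,
        "PGF (0.5 mg cloprostenol)": day0 + timedelta(days=7),
        "CIDR removal + 1.0 mg estradiol cypionate": day0 + timedelta(days=9),
    }

def planned_ai_time(alarm: datetime, intensity: float, delay_h: int):
    """Planned AI time for a qualifying alarm, or None if the alarm is too weak."""
    if delay_h not in AI_DELAYS_H:
        raise ValueError("delay must be 8 or 10 h")
    if intensity <= MIN_ESTRUS_INTENSITY:
        return None  # intensity must be strictly greater than 30
    return alarm + timedelta(hours=delay_h)

# An alarm at 22:00 with intensity 45 and the 8 h rule gives AI at 06:00 the next day.
print(planned_ai_time(datetime(2019, 5, 3, 22, 0), 45, 8))
```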
Seasons were defined for the Southern Hemisphere as spring (21 September to 20 December), summer (21 December to 20 March), fall (21 March to 20 June), and winter (21 June to 20 September). Air temperature and relative humidity (RH) were obtained from the weather station on the farm (ADAMA Clima®, Adama Brasil, Londrina, Brazil). To assess the level of heat stress experienced by the animals, the temperature–humidity index (THI) was calculated on the day of AI using the equation of Mader et al. [25]: THI = (0.8 × T) + [(RH/100) × (T − 14.4)] + 46.4, where T is the ambient temperature (°C) and RH is the relative humidity (%).
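The THI equation and the season boundaries above translate directly into code. The sketch assumes that weather readings arrive as (temperature in °C, RH in %) pairs. Taking the daily maximum mirrors the "maximum THI on the day of AI" variable mentioned in the Results, but the reading format and the function names are hypothetical.

```python
from datetime import date

def thi(temp_c: float, rh_pct: float) -> float:
    """Temperature-humidity index: 0.8*T + (RH/100)*(T - 14.4) + 46.4."""
    return 0.8 * temp_c + (rh_pct / 100.0) * (temp_c - 14.4) + 46.4

def daily_max_thi(readings):
    """Maximum THI over one day's (temperature_C, RH_percent) readings."""
    return max(thi(t, rh) for t, rh in readings)

def southern_season(d: date) -> str:
    """Season of a date using the Southern Hemisphere boundaries given in the text."""
    md = (d.month, d.day)
    if (3, 21) <= md <= (6, 20):
        return "fall"
    if (6, 21) <= md <= (9, 20):
        return "winter"
    if (9, 21) <= md <= (12, 20):
        return "spring"
    return "summer"  # 21 December to 20 March wraps across the new year

print(round(thi(30.0, 60.0), 2))  # 79.76
```

For example, a reading of 30 °C at 60% RH gives a THI of about 79.8.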

2.2. Building Prediction Models

First, the non-continuous variables described above were categorized for each cow as follows: parity (1 = primiparous, 2 = multiparous), RFM (yes vs. no), BCS (1 = thin to 5 = fat), LS (1 = normal, 2 = mild lameness, 3 = moderate lameness, 4 = lameness, 5 = severe lameness), peripartum diseases (yes vs. no), season at AI (spring, summer, fall, and winter), vaginal discharge at 11 ± 4 d postpartum (1 = clear or translucent, 2 = little purulent material, 3 = mucopurulent, 4 = 50% or more pus, 5 = fetid watery reddish-brown), reproductive program (PGF vs. EP), AI time (8 h vs. 10 h), bull (1 to 10), and pregnancy (yes vs. no).
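Before model fitting, categorical variables like these are usually converted to numeric indicator (dummy) columns. The sketch below shows reference-level coding for a few of the variables. The field names and level labels are hypothetical. The text does not state whether the ordinal scores (BCS, LS, vaginal discharge) were treated as ordered or nominal, so they are left out here.

```python
# Hypothetical level labels; the first level of each field is the reference.
CATEGORIES = {
    "parity": ["primiparous", "multiparous"],
    "season": ["spring", "summer", "fall", "winter"],
    "program": ["PGF", "EP"],
    "ai_time": ["8h", "10h"],
}

def one_hot(record: dict) -> list:
    """Dummy-code each categorical field, dropping the reference level."""
    row = []
    for field, levels in CATEGORIES.items():
        value = record[field]
        if value not in levels:
            raise ValueError(f"unknown level {value!r} for {field}")
        # One indicator column per non-reference level.
        row.extend(1 if value == lvl else 0 for lvl in levels[1:])
    return row

cow = {"parity": "multiparous", "season": "fall", "program": "PGF", "ai_time": "10h"}
print(one_hot(cow))  # [1, 0, 1, 0, 0, 1]
```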
Second, we selected six models from the classification models available in the caret package [26] in R software v.4.2.2 [27], choosing them to maximize the Jaccard dissimilarity among the selected models. The selected models were logistic regression (L), random forest (RF), linear discriminant analysis (LDA), support vector machines with a linear kernel (SVM), Bagged AdaBoost (ADABAG), and Bagged Classification Tree (TREEBAG).
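The Jaccard dissimilarity between two sets A and B is 1 − |A ∩ B|/|A ∪ B|. Suppose each candidate model is described by a set of descriptive tags. A greedy max-min procedure can then pick models that are as different from one another as possible. The tag sets below are hypothetical placeholders, not caret's actual model metadata. The greedy rule is one possible way to "maximize dissimilarity", not necessarily the authors' exact procedure.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |A & B| / |A | B|: 0 for identical sets, 1 for disjoint sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def max_min_select(tags: dict, k: int, start: str) -> list:
    """Greedily add the model whose nearest already-chosen model is farthest away."""
    chosen = [start]
    while len(chosen) < k:
        remaining = [m for m in tags if m not in chosen]
        best = max(remaining,
                   key=lambda m: min(jaccard_distance(tags[m], tags[c]) for c in chosen))
        chosen.append(best)
    return chosen

# Hypothetical tag sets for illustration only.
TAGS = {
    "glm": {"linear", "logistic", "two class only"},
    "lda": {"linear", "discriminant"},
    "rf": {"tree", "ensemble", "bagging", "random forest"},
    "treebag": {"tree", "ensemble", "bagging"},
    "svmLinear": {"kernel", "linear", "svm"},
    "AdaBag": {"tree", "ensemble", "boosting", "bagging"},
}
print(max_min_select(TAGS, 3, "glm"))  # ['glm', 'rf', 'svmLinear']
```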
Collinearity among the predictors was assessed with the variance inflation factor (VIF), and variables with a VIF higher than 10 were removed. This step removed the air temperature measures, which were highly collinear with the temperature–humidity index (THI). We then evaluated the models with five different sets of variables (FULL, AAM, no-AAM, GA, and STEP), as explained next (Table 1).
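The VIF of predictor j is 1/(1 - R²_j), where R²_j comes from regressing that predictor on all the others. A minimal NumPy sketch follows. It drops the predictor with the largest VIF one at a time until all remaining VIFs are at or below 10. That is a common convention assumed here, not a procedure stated in the text. The synthetic columns stand in for collinear measures such as air temperature and THI.

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the others plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

def drop_high_vif(X, names, threshold=10.0):
    """Drop the predictor with the largest VIF until every VIF is <= threshold."""
    X, names = np.asarray(X, dtype=float), list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)  # nearly a copy of x1
x3 = rng.normal(size=500)                  # independent predictor
print(drop_high_vif(np.column_stack([x1, x2, x3]), ["x1", "x2", "x3"])[1])
```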
We evaluated the models with three predefined sets of variables. The FULL set contained all available variables. The AAM set contained only the three variables provided by the monitoring system (activity, rumination, and estrus score). The no-AAM set contained all other variables, excluding those from the monitoring system. The GA set of variables was selected with genetic algorithms using the gafs function of the caret package. This method is recognized as effective for feature selection, a crucial step in data mining that removes irrelevant and/or redundant features from a dataset [28]. Finally, we applied stepwise feature selection based on the Akaike Information Criterion (AIC) in logistic regression and linear discriminant analysis to select the best set of variables for building the prediction model (STEP). Fitting the 6 methods to the 4 variable sets FULL, AAM, no-AAM, and GA gave 24 models. Adding the L and LDA methods fitted with the STEP set gave a total of 26 models to be evaluated.
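A genetic algorithm represents each candidate variable subset as a binary chromosome and evolves a population of subsets toward higher fitness. The minimal Python sketch below illustrates that idea. It is not caret's `gafs` implementation, whose operators and defaults may differ. The fitness function is a placeholder. In practice it would be a resampled performance estimate (for example, cross-validated AUC) of a model fitted on the selected variables. Here, a hypothetical toy fitness rewards three "informative" features and penalizes the rest.

```python
import random

def ga_select(n_features, fitness, pop_size=30, generations=60, p_mut=0.05, seed=0):
    """Binary-chromosome GA with tournament selection, one-point crossover,
    bit-flip mutation, and elitism. Returns the best mask found."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament():
        a, b = rng.sample(pop, 2)  # size-2 tournament
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry over the best subset so far
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        challenger = max(pop, key=fitness)
        if fitness(challenger) > fitness(best):
            best = challenger
    return best

INFORMATIVE = {0, 3, 5}  # hypothetical "useful" features for the toy fitness

def toy_fitness(mask):
    chosen = {i for i, g in enumerate(mask) if g}
    return len(chosen & INFORMATIVE) - 0.5 * len(chosen - INFORMATIVE)

best = ga_select(10, toy_fitness)
print(sorted(i for i, g in enumerate(best) if g))
```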
The dataset was split into a training set (80% of the observations) and a test set (20% of the observations). The predictors were centered and scaled using the preProcess function of the caret package. The 26 models were then fitted using the train function of the caret package, with family set to binomial, as the outcome was pregnancy status (yes or no). Three repeats of 10-fold cross-validation were performed on the training data. For the TREEBAG model, nbagg was set to 50 and the area under the curve (AUC) was used as the optimization metric. The RF model was also optimized on the AUC.
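One way to reproduce this data handling is sketched in NumPy below, under several assumptions. The 80/20 split is stratified by pregnancy status, although the text does not say whether it was. Centering and scaling use statistics from the training set only. The repeated 10-fold cross-validation folds are drawn within the training set. The function names are hypothetical; in the study these steps were done with caret.

```python
import numpy as np

def stratified_split(y, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx), sampling test_frac of each outcome class."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    test = []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        test.extend(idx[: int(round(test_frac * len(idx)))])
    test = np.sort(np.array(test, dtype=int))
    train = np.setdiff1d(np.arange(len(y)), test)
    return train, test

def center_scale(X_train, X_test):
    """Center and scale both sets with the training mean and SD (n - 1 divisor)."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)
    return (X_train - mu) / sd, (X_test - mu) / sd

def repeated_kfold(n, k=10, repeats=3, seed=0):
    """Yield (fit_idx, holdout_idx) pairs for `repeats` independent k-fold splits."""
    rng = np.random.default_rng(seed)
    for _ in range(repeats):
        perm = rng.permutation(n)
        for fold in np.array_split(perm, k):
            yield np.setdiff1d(perm, fold), np.sort(fold)

y = np.array([1] * 500 + [0] * 554)  # synthetic outcome vector
train, test = stratified_split(y, 0.2, seed=42)
print(len(train), len(test))  # 843 211
```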
Accuracy, false positive rate, and false negative rate were calculated from a confusion matrix of predicted versus observed pregnancy status in the test data. The AUC of each model was calculated from the test-set predictions obtained with the predict function. Goodness of fit was also reported as sensitivity (Se), specificity (Sp), positive predictive value (PPV), and negative predictive value (NPV).
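The metrics above, plus the Kappa statistic reported in the Results, can all be derived from the four confusion-matrix counts. The text does not define its false positive and false negative rates. This sketch assumes the common definitions FP/(FP + TN) and FN/(FN + TP). It computes the AUC with the rank (Mann-Whitney) formulation, which is one standard way to obtain it from predicted probabilities.

```python
def confusion_metrics(y_true, y_pred):
    """Confusion-matrix summaries for a binary outcome coded 1 = pregnant, 0 = open."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    # Agreement expected by chance from the marginal totals (used by Cohen's kappa).
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "fp_rate": fp / (fp + tn),  # assumed definition
        "fn_rate": fn / (fn + tp),  # assumed definition
        "kappa": (accuracy - p_chance) / (1 - p_chance),
    }

def auc(y_true, scores):
    """P(a random positive scores above a random negative); ties count as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

print(confusion_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])["kappa"])
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```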

3. Results

The evaluation of the machine learning models for predicting pregnancy in high-producing cows revealed distinct performance metrics across models. The specificity of most models exceeded 0.80, except for some of those using the no_AAM variables. Sensitivity exceeded 0.60 in only eight models, all from the FULL, STEP, and GA sets. Additionally, accuracy surpassed 80% in nine models, all from the FULL, STEP, and GA sets. The highest AUC values, up to 0.87, were observed in the GA (logistic) and FULL (rf) setups. These results underscore the importance of including cow-level and environmental variables to enhance the models' predictive accuracy.
We evaluated the performance of each machine learning algorithm according to its accuracy, sensitivity, specificity, PPV, NPV, false positive (FP) and false negative (FN) rates, and AUC (Table 2).
The SVM model using the no_AAM variables did not converge properly. The specificity of the models exceeded 0.80, except for the no_AAM models fitted with AdaBag and treebag. By contrast, only 8 of the 26 models had a sensitivity greater than 0.60: FULL (lda, logistic, and svm), STEP (logistic), and GA (lda, logistic, svm, and treebag). Only nine models had an accuracy higher than 80%: FULL (lda, logistic, rf, and svm), STEP (logistic), and GA (lda, logistic, svm, and AdaBag). The AUC reached 0.87 in GA (logistic) and 0.86 in FULL (rf), STEP (logistic), and GA (lda, svm, and rf). The models using only the no_AAM variables did not perform well on any goodness-of-fit measure. Overall, the machine learning models for predicting pregnancy at the first postpartum AI in high-producing cows monitored by AAM showed that including cow-level variables and environmental factors associated with fertility improves the prediction of pregnancy success.
Most models using the FULL, STEP, and GA sets had higher accuracy, sensitivity, and specificity in predicting pregnancy than the models that excluded AAM variables or used only AAM variables. Positive and negative predictive values were also higher in most models than in the no_AAM models, which is reflected in their higher AUC. Moreover, adding the on-farm variables to the AAM model made it 3–4% more accurate in predicting pregnancy than using the AAM parameters alone.
The machine learning analysis showed that adding farm-available parameters to the AAM-based models was necessary to predict whether a cow would become pregnant with an accuracy of 84–85% (Figure 1). The most important of these parameters were parturition-related disorders, the season on the AI day, and the maximum THI on the AI day. Moreover, which model performed best depended on the metric that matters for the intended use. For predicting positive pregnancy outcomes, the random forest model with all the data (FULL) provided the best results. In contrast, for a balance between accuracy, AUC, and negative predictive value, the logistic (STEP and GA), support vector machine (GA), and linear discriminant (GA) methods were the best options. Thus, this preliminary study demonstrated the potential benefits of using machine learning models for predictive classification.

4. Discussion

The best models in terms of AUC were based on the logistic regression, linear discriminant, and support vector machine methods. The variable set used in these models (GA) combined the AAM data (activity on the day of estrus and estrus score) with cow health data (retained fetal membranes and vaginal discharge score) and environmental variables (season and maximum THI on the day of AI). Therefore, adding only these four on-farm variables to the AAM data resulted in a better prediction of pregnancy probability. It is worth noting that, currently, without any additional information, the expected pregnancy probability is 50% for all cows [29]. In addition, Hansen [30] noted that the application of machine learning to identify morphological features could be enhanced by incorporating information about pregnancy outcomes after embryo transfer.
The current study highlights the importance of continuously monitoring and recording cow health data. Monitoring retained fetal membranes only requires keeping a record of the animals identified with this disorder. The disorder reduces fertility and milk output, leading to economic losses [31,32]. Furthermore, it necessitates additional breeding attempts, prolongs the calving interval, and may lead to health complications, including slow uterine recovery, ovarian cysts, and metritis [33]. The vaginal discharge score, in contrast, depends on systematic examination early postpartum (at 11 ± 4 days in this study) using the appropriate technique. This management helps in the early detection of metritis, a common infectious disease in dairy cattle [34]. Early detection allows timely intervention to protect production efficiency, since cows with metritis exhibit reduced milk production and fertility and are more likely to be culled [35]. In summary, keeping detailed records of these two health issues can help farmers plan AI at the most suitable time and choose bulls with a higher probability of successful pregnancies.
The maximum THI on the day of AI and the season were the two main environmental variables entering the models. The environment strongly influences pregnancy-related traits in production systems, in both beef and dairy animals. Previous studies [36,37] estimated low heritabilities for the probability of pregnancy at first natural mating in Murrah buffalo heifers (0.122 to 0.154) and for the probability of calving from first insemination in Angus heifers (0.025 to 0.048). These values indicate that most of the variation in these traits is non-genetic.
Among the 26 models tested, the logistic regression model with stepwise feature selection performed notably well. It ranked first for accuracy and negative predictive value, second for the Kappa statistic and sensitivity, fifth for positive predictive value, and twelfth for specificity. Logistic regression is suited to data in which the log-odds of the outcome are approximately linear in the predictors. Moreover, logistic models are easy to implement and explain, efficient with small to moderate datasets, have a low risk of overfitting, and provide a probabilistic output. Consequently, they are widely used to predict binary outcomes such as pregnancy [38,39]. Logistic regression models perform poorly when the variables entering the model are not selected according to prior knowledge and collinearity between variables is not addressed [40]. In the present study, we controlled these aspects, which may explain why the simple logistic regression model outperformed some of the machine learning methods. Thurmond et al. [41] used a Bayesian hierarchical logistic survival model to incorporate longitudinal data from multiple pregnancies of a single cow. They found that the predicted probability of abortion increased with the cow's age at conception, with the number of previous abortions, and when the previous pregnancy was aborted after 60 days of gestation.
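To make the stepwise procedure concrete, the sketch below fits a logistic regression by Newton-Raphson. It then removes predictors one at a time while AIC = 2k - 2 log L decreases. This is a simplified, backward-only search, whereas R's `step()` can also consider re-adding variables. The synthetic data and the variable names are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson; returns (beta, loglik)."""
    A = np.column_stack([np.ones(len(y)), X])  # intercept in column 0
    beta = np.zeros(A.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))
        grad = A.T @ (y - p)
        hess = A.T @ (A * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-(A @ beta))), 1e-12, 1 - 1e-12)
    loglik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return beta, loglik

def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik

def backward_aic(X, y, names):
    """Drop one predictor at a time while doing so lowers the AIC."""
    keep = list(range(X.shape[1]))
    best = aic(fit_logistic(X[:, keep], y)[1], len(keep) + 1)
    while keep:
        trials = []
        for j in keep:
            cand = [c for c in keep if c != j]
            ll = fit_logistic(X[:, np.array(cand, dtype=int)], y)[1]
            trials.append((aic(ll, len(cand) + 1), j))
        score, drop = min(trials)
        if score >= best:
            break
        best = score
        keep.remove(drop)
    return [names[c] for c in keep], best

rng = np.random.default_rng(7)
n = 1000
x1, x2 = rng.normal(size=n), rng.normal(size=n)  # x2 is pure noise
y = (rng.random(n) < 1 / (1 + np.exp(-(-0.5 + 1.5 * x1)))).astype(float)
X = np.column_stack([x1, x2])
print(backward_aic(X, y, ["x1", "x2"]))
```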
The variables that entered the final logistic model were age, parity, retained fetal membranes, the vaginal discharge score, activity on the day of estrus, the estrus score, the reproductive program, AI time after the AAM alarm, and the bull used for AI. Mendes et al. [42] showed that cows with moderate production but lower reproductive performance had lower longevity in the production system. This issue becomes even more challenging with high-producing cows, as milk production is negatively associated with reproductive traits [43,44]. Considering these elements, the age of the cow and her physical state play determining roles in the reproductive success of the production system.
On the other hand, when a pregnancy probability model is used for semen investment management, accurately predicting positive outcomes becomes especially important, because high-value semen is allocated to the cows most likely to become pregnant. In such scenarios, the random forest models using the FULL and GA variable sets outperformed the other models, with only approximately 10% false positives (Table 2). Moreover, the random forest with the FULL dataset presented the highest Kappa statistic. These models can serve as decision-making tools for selecting specific semen for cows predicted to become pregnant, which could yield a higher pregnancy rate for that semen. Consequently, this approach has the potential to amplify genetic gain and deliver a more favorable return on investment.
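As an illustration of the semen-allocation use case, the sketch below assigns a limited number of high-value doses to the cows with the highest predicted pregnancy probabilities. The cow identifiers, probabilities, and optional cut-off are hypothetical. Summing the probabilities to estimate expected pregnancies assumes the model is well calibrated, which is not reported in the text.

```python
def allocate_semen(probs: dict, n_straws: int, min_prob: float = 0.0):
    """Give up to n_straws high-value doses to the cows with the highest
    predicted pregnancy probability at or above min_prob."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    chosen = [cow for cow, p in ranked if p >= min_prob][:n_straws]
    # Expected pregnancies from these doses; meaningful only if well calibrated.
    expected = sum(probs[c] for c in chosen)
    return chosen, expected

predicted = {"A": 0.62, "B": 0.35, "C": 0.71, "D": 0.48}  # hypothetical model output
print(allocate_semen(predicted, 2))
```

Raising `min_prob` trades the number of high-value inseminations for a higher expected conception rate among them, which is where a low false positive rate matters most.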
Random forest is a nonlinear, tree-based ensemble learning model introduced by Breiman [45]. The forest is composed of many decision trees. Each tree is grown on a bootstrap sample of the data, with a random subset of predictors considered at each split, which reduces the correlation between trees. When a new sample is presented, every tree in the forest classifies it. For classification problems, the trees vote, and the class with the most votes is the model's output [40]. Random forests have been reported to consistently offer among the highest prediction accuracies in classification settings [46]. In agriculture and livestock, several studies have reported good results with random forests, especially for predicting binary or categorical outcomes [47]. For example, random forest methods performed well in predicting survival to second lactation in Holstein cattle [48], in modeling the milk yield of dairy cows under heat stress [49], and in predicting the risk of tick presence on livestock farms [50].
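The bootstrap, random-predictor, and voting mechanism described above can be sketched in plain Python. This is a simplified illustration, not Breiman's reference implementation or the randomForest package used through caret. The trees are depth-limited (Breiman grew them fully), splits minimize Gini impurity, and `m` predictors are drawn at random at each node. The toy data are synthetic.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """(feature, threshold) with the lowest weighted Gini impurity, or None."""
    best, best_score = None, gini(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(X, y, depth, max_depth, m, rng):
    if depth == max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]       # leaf: majority class
    features = rng.sample(range(len(X[0])), m)       # random predictor subset
    split = best_split(X, y, features)
    if split is None:
        return Counter(y).most_common(1)[0][0]
    f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, m, rng),
            grow_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, m, rng))

def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def random_forest(X, y, n_trees=25, max_depth=4, m=1, seed=0):
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], 0, max_depth, m, rng))
    return forest

def predict_forest(forest, x):
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]

data_rng = random.Random(3)
X = [[data_rng.random(), data_rng.random()] for _ in range(120)]
y = [1 if row[0] > 0.5 else 0 for row in X]          # only feature 0 matters
forest = random_forest(X, y, n_trees=15, max_depth=3, m=1, seed=0)
print(predict_forest(forest, [0.95, 0.5]), predict_forest(forest, [0.05, 0.5]))
```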
Furthermore, optimizing animal reproduction requires combining management strategies that improve the efficiency of reproductive management. In this regard, several studies have proposed using AAM systems and machine learning algorithms to improve the prediction of health and fertility disorders in dairy cows [51,52,53]. A previous study also showed that routinely collected farm data and test-day milk records are valuable for predicting insemination success in dairy cows [54]. Our preliminary study found that optimizing activity monitoring models by including on-farm measures such as parity, peripartum health history, and environmental conditions can favor the correct identification of estrus and improve activity monitoring alerts regarding the optimal timing for AI, thereby increasing reproductive performance in dairy cows [11].
Our study, which integrated AAM systems with cow-specific and environmental data to improve pregnancy prediction in dairy cows, showed promise but faces challenges. These include limited generalizability, because the data came from a single commercial farm, potential model biases, environmental variability, and the economic and technical demands of implementation. Although AAM systems offer advantages over traditional methods, including reduced labor, integrating them into less technologically advanced farms is challenging. Further research is needed to assess the effectiveness and feasibility of these models across different farm settings.

5. Conclusions

Including cow-specific variables and environmental factors in machine learning models significantly improves the accuracy, sensitivity, and specificity of predicting pregnancy at the time of artificial insemination in dairy cows monitored with automated activity monitoring (AAM) systems. The most accurate models were the support vector machine and logistic regression models with the GA variable set, which combined AAM data, cow health indicators, and environmental variables. These models reached an accuracy of 87%, compared to 84% for the model using only AAM data. These findings can inform decisions on semen investment at AI and highlight the practical relevance and future perspectives of integrating such models in dairy farming. Future studies should explore similar methodologies for embryo recipients to enhance assisted reproduction programs.

Author Contributions

Conceptualization, K.M.L., T.d.P.P. and T.C.M.; methodology, K.M.L., T.d.P.P. and T.C.M.; formal analysis, T.d.P.P.; resources, T.C.M. and L.R.M.; data curation, T.d.P.P. and T.C.M.; writing—original draft preparation, T.C.M.; writing—review and editing, K.M.L., T.d.P.P., T.C.M., L.R.M., P.B.F. and F.S.d.L.; project administration, K.M.L.; funding acquisition, K.M.L. and T.d.P.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES, Finance code 001); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, processes 409620/2022-0 and 409400/2021-1); and Instituto Federal Goiano and CEAGRE (Edital 19/21).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

The data used in this study were sourced from Santa Helena Farm, Goiás, Brazil. The farm owner agreed to the use of the data in this study.

Data Availability Statement

Data will be made available on request.

Acknowledgments

The authors thank Instituto Federal Goiano - Campus Rio Verde, CAPES, CNPq, Fundação de Amparo à Pesquisa do Estado de Goiás (FAPEG), and all collaborators of the Animal Reproduction Laboratory. We also thank Santa Helena Farm for authorizing and collaborating in this study. The authors also thank CAPES for the postdoctoral scholarship provided.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Weik, F.; Archer, J.; Morris, S.; Garrick, D.; Hickson, R. Relationship between body condition score and pregnancy rates following artificial insemination and subsequent natural mating in beef cows on commercial farms in New Zealand. N. Z. J. Anim. Sci. Prod. 2020, 80, 14–20.
2. Ribeiro, E.S.; Gomes, G.; Greco, L.F.; Cerri, R.L.A.; Vieira-Neto, A.; Monteiro, P.L.J., Jr.; Lima, F.S.; Bisinotto, R.S.; Thatcher, W.W.; Santos, J.E.P. Carryover effect of postpartum inflammatory diseases on developmental biology and fertility in lactating dairy cows. J. Dairy Sci. 2016, 99, 2201–2220.
3. Abreu, B.; Barbosa, S.B.P.; Da Silva, E.C.; Santoro, K.R.; Batista, Â.M.V.; Martinez, R.L.V.; Valenza, L.M.; Jatobá, R.B. Productive and reproductive performance of Holstein cows in Agreste, Pernambuco, from 2007 to 2017. Semin. Cienc. Agrar. 2020, 41, 571–586.
4. Siddiqui, M.A.; Das, Z.C.; Bhattacharjee, J.; Rahman, M.M.; Islam, M.M.; Haque, M.A.; Parrish, J.J.; Shamsuddin, M. Factors affecting the first service conception rate of cows in smallholder dairy farms in Bangladesh. Reprod. Domest. Anim. 2013, 48, 500–505.
5. Souza, F.R.; Campos, C.C.; Da Silva, N.A.M.; Dos Santos, R.M. Influence of seasonality; timing of insemination and rectal temperature on conception rate of crossbred dairy cows. Semin. Cienc. Agrar. 2016, 37, 155–162.
6. Palmer, M.A.; Olmos, G.; Boyle, L.A.; Mee, J.F. Estrus detection and estrus characteristics in housed and pastured Holstein–Friesian cows. Theriogenology 2010, 74, 255–264.
7. Reith, S.; Hoy, S. Review: Behavioral signs of estrus and the potential of fully automated systems for detection of estrus in dairy cattle. Animal 2018, 12, 398–407.
8. Marques, O.; Veronese, A.; Merenda, V.R.; Bisinotto, R.S.; Chebel, R.C. Effect of estrous detection strategy on pregnancy outcomes of lactating Holstein cows receiving artificial insemination and embryo transfer. J. Dairy Sci. 2020, 103, 6635–6646.
9. Fricke, P.M.; Giordano, J.O.; Valenza, A.; Lopes Júnior, G.; Amundson, M.C.; Carvalho, P.D. Reproductive performance of lactating dairy cows managed for first service using timed artificial insemination with or without detection of estrus using an activity-monitoring system. J. Dairy Sci. 2014, 97, 2771–2781.
10. Mayo, L.M.; Silvia, W.J.; Ray, D.L.; Jones, B.W.; Stone, A.E.; Tsai, I.C.; Clark, J.D.; Bewley, G.; Heersche, G., Jr. Automated estrous detection using multiple commercial precision dairy monitoring technologies in synchronized dairy cows. J. Dairy Sci. 2019, 102, 2645–2656.
11. Marques, L.R.; Almeida, J.V.N.; Oliveira, A.C.; Paim, T.P.; Marques, T.C.; Leão, K.M. Artificial insemination timing on pregnancy rate of Holstein cows using an automated activity monitoring. Cienc. Rural. 2024, 54, e20220557.
12. Perneel, M.; De Smet, S.; Verwaeren, J. Data driven prediction of dairy cattle lifetime production and its use as a guideline to select surplus youngstock. J. Dairy Sci. 2024, in press.
13. Antanaitis, R.; Juozaitienė, V.; Malašauskienė, D.; Televičius, M.; Urbutis, M.; Zamokas, G.; Baumgartner, W. Prediction of Reproductive Success in Multiparous First Service Dairy Cows by Parameters from In-Line Sensors. Agriculture 2021, 11, 334.
14. Wang, C.W.; Kuo, C.Y.; Chen, C.H.; Hsieh, Y.H.; Su, E.C. Predicting clinical pregnancy using clinical features and machine learning algorithms in in vitro fertilization. PLoS ONE 2022, 17, e0267554.
15. Nie, J.; Fang, J.; Zhao, Y. Cow Health Prediction Method Based on Logistic Regression and Decision Tree. In Proceedings of the 34th Chinese Control and Decision Conference (CCDC), Hefei, China, 15–17 August 2022; pp. 3712–3717.
16. Gutierrez-Reinoso, M.A.; Aponte, P.M.; Garcia-Herreros, M. Genomic Analysis, Progress and Future Perspectives in Dairy Cattle Selection: A Review. Animals 2021, 11, 599.
17. Grzesiak, W.; Zaborski, D.; Sablik, P.; Zukiewicz, A.; Dybus, A.; Szatkowska, I. Detection of cows with insemination problems using selected classification models. Comput. Electron. Agric. 2010, 74, 265–273.
18. Cockburn, M. Review: Application and Prospective Discussion of Machine Learning for the Management of Dairy Farms. Animals 2020, 10, 1690.
19. Hempstalk, K.; McParland, S.; Berry, D. Machine learning algorithms for the prediction of conception success to a given insemination in lactating dairy cows. J. Dairy Sci. 2015, 98, 5262–5273.
20. Zhou, X.; Xu, C.; Wang, H.; Xu, W.; Zhao, Z.; Chen, M.; Jia, B.; Huang, B. The Early Prediction of Common Disorders in Dairy Cows Monitored by Automatic Systems with Machine Learning Algorithms. Animals 2022, 12, 1251.
21. Ferguson, J.D.; Galligan, D.T.; Thomsen, N. Principal descriptors of body condition score in Holstein cows. J. Dairy Sci. 1994, 77, 2695–2703.
  22. Sprecher, D.J.; Hostetler, D.E.; Kaneene, J.B. A lameness scoring system that uses posture and gait to predict dairy cattle reproductive performance. Theriogenology 1997, 47, 1179–1187.
  23. Sheldon, I.M.; Lewis, G.S.; LeBlanc, S.; Gilbert, R.O. Defining postpartum uterine disease in cattle. Theriogenology 2006, 65, 1516–1530.
  24. Pereira, M.H.C.; Sanches, C.P., Jr.; Guida, T.G.; Wiltbank, M.C.; Vasconcelos, J.L.M. Comparison of fertility following use of one versus two intravaginal progesterone inserts in dairy cows without a CL during a synchronization protocol before timed AI or timed embryo transfer. Theriogenology 2017, 89, 72–78.
  25. Mader, T.L.; Davis, M.S.; Brown-Brandl, T. Environmental factors influencing heat stress in feedlot cattle. J. Anim. Sci. 2006, 84, 712–719.
  26. Kuhn, M. caret: Classification and Regression Training. R Package Version 6.0-93. 2022. Available online: https://CRAN.R-project.org/package=caret (accessed on 12 June 2023).
  27. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021.
  28. Rostami, M.; Moradi, P. A clustering based genetic algorithm for feature selection. In Proceedings of the 6th Conference on Information and Knowledge Technology (IKT), Shahrood, Iran, 27–29 May 2014; Institute of Electrical and Electronics Engineers: Shahrood, Iran, 2014; pp. 112–116.
  29. da Silva, M.I.; Oli, N.; Gambonini, F.; Ott, T. Effects of parity and early pregnancy on peripheral blood leukocytes in dairy cattle. bioRxiv 2024, preprint.
  30. Hansen, P.J. The incompletely fulfilled promise of embryo transfer in cattle-why aren’t pregnancy rates greater and what can we do about it? J. Anim. Sci. 2020, 98, skaa288.
  31. Mahnani, A.; Sadeghi-Sefidmazgi, A.; Ansari-Mahyari, S.; Ghorbani, G.R. Assessing the consequences and economic impact of retained placenta in Holstein dairy cattle. Theriogenology 2021, 175, 61–68.
  32. Kamel, E.; Ahmed, H.; Hassan, F. The effect of retained placenta on the reproductive performance and its economic losses in a Holstein dairy herd. Iraqi J. Vet. Sci. 2022, 36, 359–365.
  33. Kashima, I.P.; Ngou, A.A. Retained Fetal Membrane in Tanzanian Dairy Cows: Economic Impacts and Subsequent Reproductive Performances. J. Vet. Med. Anim. Sci. 2021, 4, 1059.
  34. Galvão, K.N.; Bicalho, R.C.; Jeon, S.J. Symposium review: The uterine microbiome associated with the development of uterine disease in dairy cows. J. Dairy Sci. 2019, 102, 11786–11797.
  35. de Oliveira, E.B.; Cunha, F.; Daetz, R.; Figuereido, C.C.; Chebel, R.C.; Santos, J.E.; Risco, C.A.; Jeong, K.C.; Machado, V.S.; Galvão, K.N. Using chitosan microparticles to treat metritis in lactating dairy cows. J. Dairy Sci. 2020, 103, 7377–7391.
  36. Fernandes, P.B.; Marques, K.O.; De Araujo Neto, F.R.; De Oliveira, D.P.; Hurtado-Lugo, N.A.; Aspilcueta-Borquis, R.R.; Tonhati, H. Genetic-Quantitative Study of the First-Service Pregnancy Probability of Murrah Heifers. Reprod. Domest. Anim. 2016, 51, 428–434.
  37. Donoghue, K.A.; Rekaya, R.; Bertrand, J.K.; Misztal, I. Genetic evaluation of calving to first insemination using natural and artificial insemination mating data. J. Anim. Sci. 2004, 82, 362–367.
  38. Peixoto, M.G.; Bergmann, J.A.; Suyama, E.; Carvalho, M.R.; Penna, V.M. Logistic regression analysis of pregnancy rate following transfer of Bos indicus embryos into Bos indicus x Bos taurus heifers. Theriogenology 2007, 67, 287–292.
  39. Friggens, N.C.; Labouriau, R. Probability of pregnancy as affected by oestrus number and days to first oestrus in dairy cows of three breeds and parities. Anim. Reprod. Sci. 2010, 118, 155–162.
  40. Ye, Y.; Xiong, Y.; Zhou, Q.; Wu, J.; Li, X.; Xiao, X. Comparison of Machine Learning Methods and Conventional Logistic Regressions for Predicting Gestational Diabetes Using Routine Clinical Data: A Retrospective Cohort Study. J. Diabetes Res. 2020, 2020, 4168340.
  41. Thurmond, M.C.; Branscum, A.J.; Johnson, W.O.; Bedrick, E.J.; Hanson, T.E. Predicting the probability of abortion in dairy cows: A hierarchical Bayesian logistic-survival model using sequential pregnancy data. Prev. Vet. Med. 2005, 68, 223–239.
  42. Mendes, L.B.; Coppa, M.; Rouel, J.; Martin, B.; Dumont, B.; Ferlay, A.; Blanc, F. Profiles of dairy cows with different productive lifespan emerge from multiple traits assessed at first lactation: The case of a grassland-based dairy system. Livest. Sci. 2021, 246, 104443.
  43. Eldawy, M.H.; Lashen, M.E.S.; Badr, H.M.; Farouk, M.H. Milk production potential and reproductive performance of Egyptian buffalo cows. Trop. Anim. Health Prod. 2021, 53, 282.
  44. Ratwan, P.; Chakravarty, A.K.; Kumar, M. Assessment of relation among production and reproduction traits in Sahiwal cattle at an organized herd of northern India. Biol. Rhythm. Res. 2022, 53, 70–78.
  45. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  46. Fernández-Delgado, M.; Cernadas, E.; Barro, S.; Amorim, D. Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 2014, 15, 3133–3181.
  47. Everingham, Y.; Sexton, J.; Skocaj, D.; Inman-Bamber, G. Accurate prediction of sugarcane yield using a random forest algorithm. Agron. Sustain. Dev. 2016, 36, 27.
  48. van der Heide, E.M.M.; Veerkamp, R.F.; van Pelt, M.L.; Kamphuis, C.; Athanasiadis, I.; Ducro, B.J. Comparing regression, naive Bayes, and random forest methods in the prediction of individual survival to second lactation in Holstein cattle. J. Dairy Sci. 2019, 102, 9409–9421.
  49. Bovo, M.; Agrusti, M.; Benni, S.; Torreggiani, D.; Tassinari, P. Random Forest Modelling of Milk Yield of Dairy Cows under Heat Stress Conditions. Animals 2021, 11, 1305.
  50. Lihou, K.; Wall, R. Predicting the current and future risk of ticks on livestock farms in Britain using random forest models. Vet. Parasitol. 2022, 311, 109806.
  51. Wang, J.; Bell, M.; Liu, X.; Liu, G. Machine-Learning Techniques Can Enhance Dairy Cow Estrus Detection Using Location and Acceleration Data. Animals 2020, 10, 1160.
  52. Kliś, P.; Sawa, A.; Piwczyński, D.; Sitkowska, B.; Bogucki, M. Prediction of cow’s fertility based on data recorded by automatic milking system during the periparturient period. Reprod. Domest. Anim. 2021, 56, 1227–1234.
  53. de Oliveira, E.B.; Ferreira, F.C.; Galvão, K.N.; Youn, J.; Tagkopoulos, I.; Silva-Del-Rio, N.; Pereira, R.V.V.; Machado, V.S.; Lima, F.S. Integration of statistical inferences and machine learning algorithms for prediction of metritis cure in dairy cows. J. Dairy Sci. 2021, 104, 12887–12899.
  54. Rutten, C.J.; Steeneveld, W.; Vernooij, J.C.M.; Huijps, K.; Nielen, M.; Hogeveen, H. A prognostic model to predict the success of artificial insemination in dairy cows based on readily available data. J. Dairy Sci. 2016, 99, 6764–6779.
Figure 1. Accuracy of pregnancy prediction for models built with 5 variable sets and 6 methods (mean shown as a point, with its 95% confidence interval).
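Figure 1 reports each model's accuracy as a mean with a 95% confidence interval, but this section does not state how the interval was obtained. The following is only a minimal Python sketch under an assumed setup: one accuracy value per resample (for example, per cross-validation fold) and a normal-approximation interval of mean ± 1.96 standard errors. The function name `mean_with_ci` and the example accuracies are hypothetical.

```python
import math
import statistics


def mean_with_ci(accuracies, z=1.96):
    """Mean accuracy with a normal-approximation confidence interval.

    Assumption: one accuracy value per resample; z = 1.96 gives roughly 95% coverage.
    """
    n = len(accuracies)
    if n < 2:
        raise ValueError("need at least two resamples to estimate the spread")
    mean = statistics.fmean(accuracies)
    sem = statistics.stdev(accuracies) / math.sqrt(n)  # standard error of the mean
    return mean, mean - z * sem, mean + z * sem


# Hypothetical accuracies from five resamples of one model.
mean, lower, upper = mean_with_ci([0.80, 0.82, 0.78, 0.81, 0.79])
print(f"{mean:.3f} (95% CI {lower:.3f} to {upper:.3f})")  # prints: 0.800 (95% CI 0.786 to 0.814)
```

With only a handful of resamples, a t-distribution multiplier would give a somewhat wider interval than 1.96; the normal approximation is used here only to keep the sketch within the Python standard library.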
Table 1. Variables included in each variable set used in the models to predict pregnancy 30 days after the first artificial insemination (AI) in dairy cows using an automated activity monitoring (AAM) system.
Variables available          Variable set
                             FULL   No-AAM   AAM   GA   STEP-L   STEP-LDA
Cow-level 1
  Age
  Parity
  Body condition score
  Locomotion score
  Retained fetal membranes
  Vaginal discharge score
  Peripartum disease
  DIM on AI day
  Milk yield on AI day
  SCC on AI day
AAM 2
  Activity at estrus day
  Estrus score
  Rumination at estrus day
Breeding 3
  Reproductive program
  AI time after AAM alarm
  Bull
Environmental factors 4
  Season on AI day
  Relative humidity on AI day
  THI (Max.) on AI day
  THI (Min.) on AI day
  THI (Mean) on AI day
1 Parity = primiparous or multiparous; body condition score = 1 (thin) to 5 (fat) according to Ferguson et al. [21]; locomotion score = 1 (walked normally) to 5 (lame) according to Sprecher et al. [22]; vaginal discharge score collected at 11 ± 4 days postpartum = 1 (clear or translucent), 2 (little purulent material), 3 (mucopurulent), 4 (50% or more pus), and 5 (fetid, watery, reddish-brown) according to Sheldon et al. [23]; peripartum disease = hypocalcemia, ketosis, displaced abomasum, laminitis, foot disorders, pneumonia, and clinical mastitis; DIM = days in milk; AI = artificial insemination; SCC = somatic cell count. 2 Automated activity monitoring (SCR Engineering, Netanya, Israel); estrus score = index of estrus intensity used to predict fertility, from 0 (no fertility) to 100 (high fertility). 3 Reproductive program = synchronization of estrus using prostaglandin (PGF; 0.5 mg cloprostenol, Sincrocio®, Cravinhos, Brazil) or synchronization of estrus and ovulation with estradiol and progesterone protocols [24]. 4 Seasons defined for the Southern Hemisphere: spring (21 September to 20 December), summer (21 December to 20 March), fall (21 March to 20 June), and winter (21 June to 20 September).
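The environmental variables in Table 1 are derived from the AI date and weather readings. The Python sketch below illustrates one way to compute them, under stated assumptions: seasons follow the Southern Hemisphere boundaries of footnote 4, with summer taken to start on 21 December (the day after spring ends); THI uses the formulation commonly attributed to Mader et al. [25], THI = (0.8 × T) + [(RH/100) × (T − 14.4)] + 46.4, with T in °C and RH in %, although this section does not state which THI formula the study used; and the daily maximum, minimum, and mean THI are taken over hypothetical (temperature, humidity) readings for the AI day. All function names are hypothetical.

```python
from datetime import date


def southern_season(day: date) -> str:
    """Map a date to a Southern Hemisphere season using the Table 1 boundaries."""
    month_day = (day.month, day.day)  # tuples compare month first, then day
    if (3, 21) <= month_day <= (6, 20):
        return "fall"
    if (6, 21) <= month_day <= (9, 20):
        return "winter"
    if (9, 21) <= month_day <= (12, 20):
        return "spring"
    # Remaining dates (21 December to 20 March) wrap around the new year.
    return "summer"


def thi(temp_c: float, rh_pct: float) -> float:
    """Temperature-humidity index (assumed formula; T in degrees C, RH in %)."""
    return 0.8 * temp_c + (rh_pct / 100.0) * (temp_c - 14.4) + 46.4


def daily_thi_summary(readings):
    """Max, min, and mean THI from (temperature, humidity) readings for one day."""
    if not readings:
        raise ValueError("no readings for this day")
    values = [thi(t, rh) for t, rh in readings]
    return {"max": max(values), "min": min(values), "mean": sum(values) / len(values)}
```

For example, `southern_season(date(2021, 12, 21))` returns `"summer"`, and `thi(30, 50)` is about 78.2. Comparing `(month, day)` tuples keeps the season boundaries readable without building year-specific dates.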
Table 2. Performance of machine learning algorithms in predicting pregnancy at first postpartum AI in dairy cows using automated activity monitoring 1 combined with cow-level 2, breeding management 3, and environmental factors 4.
Model 5          Goodness-of-fit measures 6
                 Kappa  Se    Sp    PPV   NPV   FP    FN    AUC
No-AAM
  logistic       0.37   0.53  0.80  0.59  0.75  0.41  0.25  0.68
  lda            0.31   0.47  0.82  0.60  0.73  0.40  0.27  0.71
  rf             0.23   0.28  0.92  0.67  0.69  0.33  0.31  0.68
  AdaBag         0.23   0.43  0.79  0.54  0.71  0.47  0.29  0.65
  svm            -      -     -     -     -     -     -     -
  treebag        0.11   0.47  0.64  0.43  0.68  0.58  0.32  0.57
AAM
  rf             0.50   0.54  0.92  0.80  0.78  0.20  0.22  0.82
  lda            0.47   0.56  0.89  0.74  0.78  0.26  0.22  0.82
  logistic       0.46   0.56  0.88  0.73  0.78  0.27  0.22  0.84
  treebag        0.43   0.56  0.86  0.69  0.77  0.31  0.23  0.81
  AdaBag         0.40   0.46  0.91  0.73  0.75  0.27  0.25  0.77
  svm            0.37   0.40  0.93  0.76  0.74  0.24  0.27  0.82
FULL
  lda            0.57   0.67  0.88  0.76  0.83  0.24  0.18  0.85
  logistic       0.55   0.64  0.89  0.77  0.81  0.23  0.19  0.85
  rf             0.81   0.50  0.96  0.88  0.77  0.12  0.23  0.86
  svm            0.54   0.64  0.88  0.75  0.81  0.25  0.19  0.84
  AdaBag         0.45   0.50  0.91  0.77  0.77  0.23  0.24  0.82
  treebag        0.42   0.51  0.88  0.71  0.76  0.29  0.24  0.80
STEP
  logistic       0.59   0.68  0.89  0.78  0.83  0.22  0.17  0.86
  lda            0.49   0.58  0.88  0.74  0.79  0.26  0.21  0.83
GA
  lda            0.59   0.68  0.89  0.78  0.83  0.22  0.17  0.86
  logistic       0.57   0.68  0.88  0.75  0.83  0.25  0.17  0.87
  svm            0.57   0.67  0.88  0.76  0.83  0.24  0.18  0.86
  AdaBag         0.53   0.59  0.91  0.78  0.80  0.22  0.20  0.84
  treebag        0.53   0.68  0.84  0.71  0.82  0.29  0.18  0.84
  rf             0.28   0.25  0.98  0.90  0.70  0.10  0.30  0.86
The best value for each measure is highlighted in bold. 1 Automated activity monitoring (SCR Engineering, Netanya, Israel): activity at estrus, rumination at estrus, and estrus score (0 = no fertility to 100 = high fertility). 2 Parity = primiparous or multiparous; body condition score = 1 (thin) to 5 (fat) according to Ferguson et al. [21]; locomotion score = 1 (walked normally) to 5 (lame) according to Sprecher et al. [22]; vaginal discharge score collected at 11 ± 4 days postpartum = 1 (clear or translucent), 2 (little purulent material), 3 (mucopurulent), 4 (50% or more pus), and 5 (fetid, watery, reddish-brown) according to Sheldon et al. [23]; peripartum disease = hypocalcemia, ketosis, displaced abomasum, laminitis, foot disorders, pneumonia, and clinical mastitis; DIM = days in milk; AI = artificial insemination; SCC = somatic cell count. 3 Reproductive program = synchronization of estrus using prostaglandin (PGF; 0.5 mg cloprostenol, Sincrocio®, Cravinhos, Brazil) or synchronization of estrus and ovulation with estradiol and progesterone protocols [24]; artificial insemination time after AAM alarm (8 h or 10 h); and bull. 4 Season on AI day (defined for the Southern Hemisphere: spring = 21 September to 20 December, summer = 21 December to 20 March, fall = 21 March to 20 June, and winter = 21 June to 20 September), temperature on AI day, relative humidity on AI day, and temperature-humidity index (THI) on AI day. 5 Variables included in the models: No-AAM (cow-level, breeding management, and environmental variables), AAM (only the variables recorded by the system: activity, rumination, and estrus score), FULL (No-AAM and AAM variables combined), STEP (variables chosen by stepwise selection), and GA (variables chosen by genetic algorithm: retained fetal membranes, vaginal discharge at 11 ± 4 DPP, AAM activity at estrus, estrus score, season on AI day, and maximum THI). 6 Se = sensitivity, Sp = specificity, PPV = positive predictive value, NPV = negative predictive value, FP = false positive, FN = false negative, AUC = area under the curve; DPP = days postpartum.
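Every measure in Table 2 except AUC comes from the 2 × 2 table of predicted versus observed pregnancy status, while AUC ranks predicted probabilities. The following Python sketch is an illustrative reimplementation, not the authors' code (the reference list cites the caret package and R [26,27]). It assumes labels coded 1 = pregnant and 0 = open, and all function names are hypothetical. The FP and FN columns of Table 2 are consistent with 1 − PPV and 1 − NPV (for example, PPV = 0.59 and FP = 0.41, NPV = 0.75 and FN = 0.25 for the No-AAM logistic model), so the sketch computes them that way; that reading is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = pregnant, 0 = open)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def table2_measures(y_true, y_pred):
    """Kappa, Se, Sp, PPV, NPV, and the assumed FP/FN proportions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    n = tp + tn + fp + fn
    ppv = tp / (tp + fp)  # raises ZeroDivisionError if no cow is predicted pregnant
    npv = tn / (tn + fn)
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals of the 2 x 2 table.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "Kappa": (observed - expected) / (1 - expected),
        "Se": tp / (tp + fn),
        "Sp": tn / (tn + fp),
        "PPV": ppv,
        "NPV": npv,
        "FP": 1 - ppv,  # assumption: share of predicted pregnancies that are wrong
        "FN": 1 - npv,  # assumption: share of predicted open cows that are wrong
    }


def auc(y_true, scores):
    """AUC as the probability that a pregnant cow outscores an open cow (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of cows but easy to verify by hand; for larger herds, a rank-based (Mann–Whitney) formula gives the same value more efficiently.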
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Marques, T.C.; Marques, L.R.; Fernandes, P.B.; de Lima, F.S.; do Prado Paim, T.; Leão, K.M. Machine Learning to Predict Pregnancy in Dairy Cows: An Approach Integrating Automated Activity Monitoring and On-Farm Data. Animals 2024, 14, 1567. https://doi.org/10.3390/ani14111567

AMA Style

Marques TC, Marques LR, Fernandes PB, de Lima FS, do Prado Paim T, Leão KM. Machine Learning to Predict Pregnancy in Dairy Cows: An Approach Integrating Automated Activity Monitoring and On-Farm Data. Animals. 2024; 14(11):1567. https://doi.org/10.3390/ani14111567

Chicago/Turabian Style

Marques, Thaisa Campos, Letícia Ribeiro Marques, Patrick Bezerra Fernandes, Fabio Soares de Lima, Tiago do Prado Paim, and Karen Martins Leão. 2024. "Machine Learning to Predict Pregnancy in Dairy Cows: An Approach Integrating Automated Activity Monitoring and On-Farm Data" Animals 14, no. 11: 1567. https://doi.org/10.3390/ani14111567

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
