Article

The Use of Artificial Neural Networks and a General Discriminant Analysis for Predicting Culling Reasons in Holstein-Friesian Cows Based on First-Lactation Performance Records

by Krzysztof Adamczyk 1,*, Wilhelm Grzesiak 2 and Daniel Zaborski 2

1 Department of Animal Genetics, Breeding and Ethology, University of Agriculture in Krakow, al. Mickiewicza 24/28, 30-059 Kraków, Poland
2 Department of Ruminants Science, West Pomeranian University of Technology, Klemensa Janickiego 29, 71-270 Szczecin, Poland
* Author to whom correspondence should be addressed.
Animals 2021, 11(3), 721; https://doi.org/10.3390/ani11030721
Submission received: 20 January 2021 / Revised: 27 February 2021 / Accepted: 3 March 2021 / Published: 6 March 2021
(This article belongs to the Section Cattle)

Simple Summary

Routinely collected data on the performance of dairy cows are a valuable source of information on the beginning, course, and completion of their productive life. With sufficiently accurate methods, these data can be used to analyze and optimize milk production at the herd level from both a breeding and an economic point of view. In this context, it is important to be able to predict culling reasons for cows early, since an effective method would make it possible to modify breeding actions and farm management practices without waiting for the end of the animals’ productive lives. Therefore, the aim of the present study was to verify whether artificial neural networks and a general discriminant analysis may be effective tools for predicting culling reasons in cows based on routinely collected first-lactation data. Both methods turned out to be most effective in predicting culling due to old age and reproductive problems. This is significant because infertility is one of the conditions that are most difficult to eliminate in dairy herds.

Abstract

The aim of the present study was to verify whether artificial neural networks (ANN) may be an effective tool for predicting the culling reasons in cows based on routinely collected first-lactation records. Data on Holstein-Friesian cows culled in Poland between 2017 and 2018 were used in the present study. A general discriminant analysis (GDA) was applied as a reference method for ANN. Considering all predictive performance measures, ANN were the most effective in predicting the culling of cows due to old age (99.76–99.88% of correctly classified cases). In addition, a very high correct classification rate (99.24–99.98%) was obtained for culling the animals due to reproductive problems. It is significant because infertility is one of the conditions that are the most difficult to eliminate in dairy herds. The correct classification rate for individual culling reasons obtained with GDA (0.00–97.63%) was, in general, lower than that for multilayer perceptrons (MLP). The obtained results indicated that, in order to effectively predict the previously mentioned culling reasons, the following first-lactation parameters should be used: calving age, calving difficulty, and the characteristics of the lactation curve based on Wood’s model parameters.

1. Introduction

Routinely collected data on longevity and survivability of cows and their culling reasons may be used for the analysis of dairy cattle management and milk production profitability in individual animals, herds, or populations [1]. Therefore, these data are increasingly monitored in real time, and the relationships among them are analyzed in the context of animal performance prediction [2].
The studies published so far have indicated multiple possibilities in this regard, ranging from a basic analysis of milk production and linear type traits using economic techniques for decision-making [3,4], through the application of survival analysis to the prediction of longevity breeding value in dairy bulls [5,6], the prediction of health problems associated with metabolic diseases in cows [7,8], the analysis of the association between the leptin gene polymorphism and functional longevity of dairy cows [9], and the objective evaluation of effective transition cow management at a herd level [10], to the prediction of the first test-day milk yield of dairy heifers [11]. These studies have mostly focused on relatively short time periods, i.e., a specific and important moment in a cow’s life (e.g., the perinatal period), whereas research on the effective prediction of dairy cow longevity and/or culling reasons over a longer time span based on routine herd data is still rather scarce. Preliminary studies on culling reasons in cows were carried out by Lacroix et al. [12] and two decades later by Adamczyk et al. [13], who indicated a potential relationship between certain culling reasons and lifetime performance of dairy cows. In addition, Krug et al. [14] developed a model to identify herds with poor welfare based on the Portuguese national cattle database, suggesting, at the same time, the possibility of replacing the laborious and time-consuming procedures required for the Welfare Quality® protocol.
During the cows’ life, the first lactation is one of the most crucial periods. It is associated with the beginning of animal productive life and the change in management conditions, which constitutes a huge challenge for maintaining the optimal welfare level of cows [15,16,17]. This is significant since primiparous cows, as young animals, undergo growth and development, which must be considered in the production and breeding practice [18,19].
In the context of culling decisions, economic and breeding effects of age at first calving and the course of the first lactation are indicated [1]. This results, among others, from an association between the performance of primiparous cows and their lifetime production [20,21]. In this regard, the current attempts at searching for an association between certain production traits and culling reasons in cows are promising [22,23]. However, a qualitatively and quantitatively optimal selection of variables for prediction models and the use of sufficiently accurate analytical methods still remain a challenge.
In the present study, lactation curve parameters estimated from Wood’s model were used as predictors. Wood’s regression curve, as a mathematical equation describing the relationship between milk yield and lactation duration, is frequently used in the studies on the estimation of milk yield in cows [24,25]. This model includes parameters associated with the course of milk yield of a cow during lactation (incline, peak, and decline after the peak).
One of the prediction methods frequently used in animal science is an artificial neural network (ANN). An ANN is an information processing system inspired by biological structures such as the human brain. The popularity of ANN results from their ability to reproduce, to a limited extent, the processes occurring in the brain (incremental information processing, learning new concepts, making decisions, and drawing conclusions based on complex, sometimes irrelevant, or incomplete data) [26]. ANN therefore represent a different approach from traditional statistical methods, in which it is necessary to define an algorithm and record it in the form of a computer program. Instead, ANN are presented with exemplary tasks, and the connections between the network elements and their weight coefficients are modified automatically, according to the assumed training strategy. Besides the ability for self-programming, ANN also show reduced sensitivity to damage to their structural elements and are capable of parallel data processing [27]. There are different types of ANN (feedforward, recurrent, cellular, etc.), among which feedforward ANN consisting of several neuronal layers (an input layer, one or more hidden layers, and an output layer) are very popular. Such ANN are trained in a supervised manner, which means that the desired responses (e.g., culling reasons) are known for each training example (containing cow data). The more recent applications of ANN in animal science include the prediction of milk yield [28], fertility status [29], and assisted or difficult calvings [30] in dairy cows, or the estimation of carcass weight in beef cattle [31], among others. Another, more traditional statistical approach (also belonging to data mining methods) is a general discriminant analysis (GDA).
In this method, discriminant function analysis problems are solved using a general multivariate linear model, in which the dependent variables are binary vectors that reflect the class membership of each case (animal) [32,33]. GDA offers more possibilities than traditional discriminant function analysis, based on a classification rule, which allows for the correct classification of cows and the evaluation of classification accuracy depending on the adopted division criteria. In animal science, GDA has been used for dystocia detection in dairy cows [34] or the examination of factors affecting beef tenderness [35], among others.
The following research hypothesis was adopted in the present study: ANN may be an effective tool for predicting culling reasons in cows based on routinely collected first-lactation data. Moreover, the predictions made by ANN were compared with those obtained using the GDA.

2. Materials and Methods

2.1. Animals

The analysis included data on Holstein-Friesian cows (from 466 herds) culled in Poland between 2017 and 2018. The animals were performance recorded. The data were obtained from the SYMLEK Polish National Milk Recording System. SYMLEK is a system of databases (including the results of data analysis for breeding purposes) on the population of dairy cattle under milk recording in Poland. At the breeding level, the system is managed by the Polish Federation of Cattle Breeders and Dairy Farmers, while ZETO Software is responsible for its technical (IT) side.

2.2. Data Splitting

Based on the whole dataset of test-day records for the first lactation, the cows were grouped according to the culling reason and the age at first calving. The following culling reasons (R) were analyzed: infectious diseases (R1), respiratory system diseases (R2), low milk yield (R3), nutritive and metabolic diseases (R4), leg diseases (R5), udder diseases (R6), infertility and reproduction problems (R7), old age (R8), accidents (R9), and others (R10). The grouping was carried out in order to plot lactation curves through the calculation of Wood’s model parameters within the categories of culling reasons and age at first calving. Using test-day records, 17 cow groups were distinguished according to the age at first calving (at one-month intervals, from 17 to 34 months of age, with the cows calving at the age of 17 and 18 months treated as one group). A total of 164 classes (age group × culling reason) were formed in this way. Theoretically, the number of classes should be 170 (17 age groups × 10 culling reasons), but six age groups had missing data for certain culling reasons (Table 1).
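The grouping rule described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `age_group` and the group numbering (1 to 17) are assumptions made here, not part of the original analysis.

```python
def age_group(age_months):
    """Map age at first calving (months) to a group index from 1 to 17.

    Ages 17 and 18 share group 1; ages 19..34 map to groups 2..17.
    The numbering is a hypothetical convention chosen for this sketch.
    """
    if not 17 <= age_months <= 34:
        raise ValueError("age at first calving outside the 17-34 month range")
    return max(age_months, 18) - 17


# Culling reasons R1..R10 as listed in the text
CULLING_REASONS = [f"R{i}" for i in range(1, 11)]

n_age_groups = len({age_group(m) for m in range(17, 35)})
n_possible_classes = n_age_groups * len(CULLING_REASONS)

# Six age group x culling reason combinations were empty according to the text
n_observed_classes = n_possible_classes - 6
```

With these definitions, `n_age_groups` is 17, `n_possible_classes` is 170, and `n_observed_classes` is 164, matching the counts given in the text.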

2.3. Estimation of Wood’s Model Parameters

Based on milk yield from test-day records, the first-lactation curve parameters were estimated separately for each group (age at first calving × culling reason). For this purpose, the mean values of milk yield from test-day records were determined for each lactation stage. Ten lactation stages were distinguished at 30-day intervals (the first lactation stage from 5 to 30 days of lactation, the second lactation stage from 31 to 60 days, the third lactation stage from 61 to 90 days, the fourth lactation stage from 91 to 120 days, the fifth lactation stage from 121 to 150 days, the sixth lactation stage from 151 to 180 days, the seventh lactation stage from 181 to 210 days, the eighth lactation stage from 211 to 240 days, the ninth lactation stage from 241 to 270 days, and the 10th lactation stage from 271 to 305 days). For the description of the lactation curve, the gamma function proposed by Wood [36] was used:
$$y = a\,t^{b}\,e^{-ct},$$
where y is the milk production (kg) at time t (days), e is Napier’s constant, a is the initial milk yield, b is the rate of increase until the peak is reached, and c is the rate of decline after peak production.
The regression model parameters were estimated with the quasi-Newton method [37]. A total of 948,010 test-day records (for the first 305-day lactation) from 163,369 cows were used for estimating Wood’s model parameters. The estimated Wood’s model parameters (a, b, c) were used as explanatory (input) variables for further analysis.
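As an illustration of how the three curve parameters can be recovered from stage means, the sketch below fits Wood's curve to ten synthetic points. It makes several assumptions that are not taken from the study: the classical sign convention y = a t^b e^(-ct), stage mid-points of 15, 45, ..., 285 days, noise-free yields generated from hypothetical parameter values, and a log-linear least-squares fit (ln y = ln a + b ln t - c t) in place of the quasi-Newton estimation used by the authors.

```python
import math


def lactation_stage(days_in_milk):
    """Map days in milk (5..305) to one of the ten lactation stages."""
    if not 5 <= days_in_milk <= 305:
        raise ValueError("test day outside the 5-305 day window")
    return min(10, (days_in_milk - 1) // 30 + 1)


def wood(t, a, b, c):
    """Wood's curve, assuming the classical sign convention e^(-c t)."""
    return a * t ** b * math.exp(-c * t)


def solve_linear(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_wood_loglinear(times, yields):
    """Least squares on ln y = ln a + b ln t - c t (not the quasi-Newton fit)."""
    rows = [(1.0, math.log(t), -t) for t in times]
    targets = [math.log(y) for y in yields]
    # Normal equations: (X'X) beta = X'z, with beta = (ln a, b, c)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtz = [sum(r[i] * z for r, z in zip(rows, targets)) for i in range(3)]
    ln_a, b, c = solve_linear(xtx, xtz)
    return math.exp(ln_a), b, c


# Hypothetical stage mid-points (days) and noise-free mean yields (kg)
stage_days = [15 + 30 * k for k in range(10)]
mean_yields = [wood(t, 15.0, 0.2, 0.003) for t in stage_days]

a_hat, b_hat, c_hat = fit_wood_loglinear(stage_days, mean_yields)
peak_day = b_hat / c_hat  # dy/dt = 0 at t = b / c under this sign convention
```

On real test-day data the log-linear fit and a direct nonlinear fit generally give slightly different estimates, because the log transform changes how the residuals are weighted.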

2.4. Data Editing

When preparing the training set for classification using neural networks, only cows with a complete set of information were used. Records with less than 1 kg of milk, as well as incomplete or erroneous records (e.g., those with improbable minimum or maximum values of variables), were removed. In addition, only cows with at least nine test-day records were included in the analysis. The final dataset contained 50,879 cows.
In this dataset, the following explanatory (input) variables were included: X1—herd-size (from 3 to 1644 cows), X2—age at first calving (from 17 to 34 months), X3—lactation length (in days), X4—the number of first-lactation test-day records, X5–X7—Wood’s model parameters (a, b, c, respectively) for individual categories (age at first calving × culling reason). Additionally, the following production traits for the first lactation were used as predictors (minimum, maximum, mean values, and standard deviations, respectively): X8–X11—daily milk yield (kg), X12–X15—fat content (%), X16–X19—protein content (%), X20–X23—lactose content (%), X24–X27—dry matter content (%), X28–X31—urea content (mg/L), and X32–X35—somatic cell count (thousand/mL). Moreover, first-lactation nominal variables such as X36—calving difficulty (according to the scale used for performance recording: easy, spontaneous, difficult, very difficult, abortion, and cesarean section) and X37—calving season (spring from 21 March to 20 June, summer from 21 June to 21 September, autumn from 22 September to 22 December, and winter from 23 December to 20 March) were included in the model.
Ultimately, the dataset was randomly divided into a training set (33,071 culling records, 65% of all observations), a validation set (used for controlling the network training process, 7632 records, 15% of all observations) and a test set (used for verifying the predictive performance of the models, 10,176 records, 20% of all observations). The distribution of continuous and nominal predictors in individual sets is presented in Table 2 and Table 3.
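A random 65/15/20 split of this kind can be sketched as follows. The function name, the fixed seed, and the rounding rule are assumptions made for illustration; the rounding rule is chosen only because it reproduces the set sizes reported above for 50,879 cows.

```python
import random


def split_indices(n, train_frac=0.65, val_frac=0.15, seed=0):
    """Shuffle case indices and cut them into training, validation and test sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder, about 20% of cases
    return train, val, test


train_idx, val_idx, test_idx = split_indices(50879)
set_sizes = (len(train_idx), len(val_idx), len(test_idx))  # (33071, 7632, 10176)
```

Taking the test set as the remainder guarantees that every case is assigned to exactly one set, whatever the rounding of the first two fractions.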

2.5. Neural Network Analysis

Different multilayer perceptrons (MLP) with one hidden layer were analyzed. The hidden layer consisted of 5 to 30 neurons (the number of neurons was selected empirically). The number of neurons in the input layer was 45, since the calving season and calving difficulty variables were coded by four and six neurons, respectively (one-of-n encoding) (Figure 1). In the input layer, the min-max transformation was used for continuous variables. In the hidden and output neurons, different types of activation functions were verified (linear, logistic, hyperbolic tangent, and exponential). The networks were trained with the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, which is a powerful second-order training algorithm with very fast convergence but high memory requirements due to storing an approximation of the Hessian matrix [38]. For each analyzed network, a given number of iterations was carried out until reaching the minimum misclassification rate on the validation set. For the evaluation of the network during its training, two error functions were considered, i.e., the sum of squares and cross-entropy. The latter is calculated as the sum of the products of real values and error logarithms for each output neuron.
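The input coding and a single forward pass through such a network can be sketched as below. This is a minimal illustration, not the Statistica implementation: the 29 hidden neurons, the hyperbolic tangent hidden layer and the softmax output follow the configuration reported in the Results for the selected network, while the random weights, the example cow, the [0, 1] training ranges and the standard categorical cross-entropy (minus the logarithm of the probability assigned to the true class) are assumptions.

```python
import math
import random

CALVING_DIFFICULTY = ["easy", "spontaneous", "difficult", "very difficult",
                      "abortion", "cesarean section"]
CALVING_SEASON = ["spring", "summer", "autumn", "winter"]


def min_max(value, lo, hi):
    """Scale a continuous value to [0, 1] using the training-set range."""
    return (value - lo) / (hi - lo)


def one_of_n(level, levels):
    """One-of-n coding: one input neuron per category level."""
    return [1.0 if level == name else 0.0 for name in levels]


def encode_cow(continuous, ranges, difficulty, season):
    """Build the input vector: 35 scaled values + 6 + 4 indicator inputs."""
    scaled = [min_max(x, lo, hi) for x, (lo, hi) in zip(continuous, ranges)]
    return (scaled + one_of_n(difficulty, CALVING_DIFFICULTY)
            + one_of_n(season, CALVING_SEASON))


def softmax(logits):
    """Convert output activations to class probabilities."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One hidden layer with tanh units followed by a softmax output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_out, b_out)]
    return softmax(logits)


def cross_entropy(probs, target_index):
    """Categorical cross-entropy for a one-of-n coded target."""
    return -math.log(probs[target_index])


rng = random.Random(0)
# Hypothetical cow: 35 continuous predictors already lying in [0, 1]
x = encode_cow([rng.random() for _ in range(35)], [(0.0, 1.0)] * 35,
               "difficult", "winter")
# Untrained illustrative weights for a 45-29-10 network
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(45)] for _ in range(29)]
w_out = [[rng.uniform(-0.1, 0.1) for _ in range(29)] for _ in range(10)]
probs = mlp_forward(x, w_hidden, [0.0] * 29, w_out, [0.0] * 10)
loss = cross_entropy(probs, target_index=6)  # index 6 stands for R7 here
```

The predicted culling reason is the class with the highest probability in `probs`; training would adjust the weights to reduce `loss` over the training set.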
Based on the obtained results, the classification matrix was created on the test set and predictive performance measures were calculated (the percentage of correctly classified cases from each category and the overall accuracy). In addition, the positive predictive values (PPV) were calculated, which showed the reliability of predictions made by the neural models. Finally, a sensitivity analysis was carried out for ANN, which allowed for the ordering of predictors (input variables) according to their relative importance. This analysis was based on two criteria: an error ratio, i.e., error when input was set to mean divided by the error when input was used (for continuous predictors) or an average error when input was set to all other categorical levels divided by the error when input was used (for categorical predictors), and a rank, which ordered predictors according to their decreasing importance from one (the most important predictor) to 37 (the least important predictor).
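The performance measures named above can be derived from a classification matrix as in the following sketch. The row/column orientation (rows for actual classes, columns for predicted classes) and the small three-class matrix are assumptions for illustration only and do not reproduce any values from the study.

```python
def classification_measures(matrix):
    """Per-class correct classification rate, PPV, and overall accuracy.

    matrix[i][j] is the number of cases of actual class i predicted as
    class j (an assumed orientation). Undefined ratios are returned as NaN.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    rates, ppv = [], []
    for i in range(n):
        actual = sum(matrix[i])                            # cases truly in class i
        predicted = sum(matrix[r][i] for r in range(n))    # cases assigned to class i
        rates.append(matrix[i][i] / actual if actual else float("nan"))
        ppv.append(matrix[i][i] / predicted if predicted else float("nan"))
    return rates, ppv, correct / total


# Hypothetical three-class classification matrix
toy_matrix = [
    [8, 2, 0],
    [1, 6, 3],
    [1, 0, 9],
]
rates, ppv, accuracy = classification_measures(toy_matrix)
```

The two per-class measures answer different questions: the correct classification rate is computed over the cases that truly belong to a class, whereas PPV is computed over the cases that the model assigned to it.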

2.6. Training of the Neural Model with the Most Discriminative Predictors

Based on the results of sensitivity analysis (Table 4), the set of predictors was limited to the five most discriminative ones, i.e., those with an error ratio above 1.5. The entire procedure was the same as for the networks with the full set of predictors (the selection of the best network out of 10 initial networks, classification matrix, sensitivity analysis, and gains charts), except that the number of input neurons was reduced to ten (Figure 2).
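The error-ratio criterion for continuous predictors, followed by the 1.5 threshold, can be sketched as below. The toy classifier, the five example cases, and the use of the misclassification rate as the error measure are assumptions made for illustration; the variant for categorical predictors (averaging over the other category levels) is not shown.

```python
def misclassification_rate(predictions, targets):
    """Share of cases whose predicted class differs from the target."""
    wrong = sum(p != t for p, t in zip(predictions, targets))
    return wrong / len(targets)


def error_ratio(predict, inputs, targets, column):
    """Error with one input fixed at its mean, divided by the baseline error."""
    baseline = misclassification_rate([predict(x) for x in inputs], targets)
    mean_value = sum(x[column] for x in inputs) / len(inputs)
    neutralised = [x[:column] + [mean_value] + x[column + 1:] for x in inputs]
    degraded = misclassification_rate([predict(x) for x in neutralised], targets)
    return degraded / baseline


def toy_model(x):
    """Hypothetical classifier that only looks at the first input."""
    return 1 if x[0] > 0.5 else 0


inputs = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.3], [0.1, 0.9], [0.6, 0.5]]
targets = [1, 1, 0, 0, 0]

ratios = {f"x{col}": error_ratio(toy_model, inputs, targets, col)
          for col in range(2)}
selected = [name for name, ratio in ratios.items() if ratio > 1.5]
```

Here the first input has a ratio of about 3 (removing its information triples the error) and is retained, while the second input has a ratio of 1 and is dropped. A ratio close to 1 means the model loses nothing when the predictor is replaced by its mean.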

2.7. Discriminant Analysis

Based on the same dataset as for ANN, the GDA was carried out and the classification matrix was created on the test set for the models including all 37 initial or the five most discriminative predictors. The method of the GDA model building was described in more detail by Zaborski et al. [34]. In addition, the predictive performance measures were calculated (the percentage of correctly classified cases from each culling category, the overall accuracy, and PPV).

2.8. Gains Charts

In order to better illustrate the predictive abilities of the neural and GDA models, cumulative gains charts were also plotted. These charts show the relationship between the cumulative gains (the proportion of cases from a given culling category among all the cases belonging to this category) and the percentage of cases predicted by the model as belonging to this category in the whole data set [39]. A diagonal crossing the (0,0) and (1,1) points (the baseline) indicates a random model (without any predictive capabilities). Therefore, the curves located above the diagonal are preferred [the closer the line to the (0,1) point, the better the model] [40].
Statistica software (v. 13.3, Tibco Inc., Tulsa, OK, USA) was used for statistical analysis.
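The points of a cumulative gains curve for one culling category can be computed as in the sketch below. The function name and the eight example cases are hypothetical; the scores stand for the predicted probability of belonging to the category.

```python
def cumulative_gains(scores, is_positive):
    """Return (share of cases selected, share of positives captured) points.

    Cases are ordered by decreasing predicted probability of the category.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_positive = sum(is_positive)
    points = [(0.0, 0.0)]
    captured = 0
    for rank, idx in enumerate(order, start=1):
        captured += is_positive[idx]
        points.append((rank / len(scores), captured / n_positive))
    return points


# Hypothetical predicted probabilities and true memberships for eight cases
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
is_positive = [1, 1, 0, 1, 0, 0, 0, 0]
gains = cumulative_gains(scores, is_positive)
```

In this toy example all three positive cases are found within the first half of the ranked cases, so the curve reaches a gain of 1.0 at 50% of the data and lies above the diagonal baseline throughout.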

3. Results

The most effective ANN with 37 predictors had a relatively high correct classification rate on the training and validation sets (86.73%–96.17% and 87.33%–95.96%, respectively) (Table 5). From among the analyzed ANN, the MLP with one hidden layer and a 45-29-10 structure (the number of neurons in the input, hidden and output layer, respectively) was selected (Figure 1). This perceptron (denoted as MLP37) had the highest correct classification rate on the validation set. Its training took 320 iterations. The cross-entropy error function was applied together with the SoftMax activation function in the network output layer. A hyperbolic tangent activation function was used in the hidden layer.
The sensitivity analysis of MLP37 showed that the greatest influence on the output variable was exerted by lactation curve parameters (a, b, c), age at first calving, and calving difficulty. Their error ratio ranged from 8.870 (calving difficulty) to 188.901 (the a parameter). Therefore, these variables were used as the only input variables for the network with a reduced set of predictors. The remaining input variables for MLP37 had a much lower error ratio, i.e., below 2 (Table 4).
In comparison with the best networks containing 37 predictors, the networks with fewer predictors had lower correct classification rates on both the training (71.46%–83.01%) and validation (71.40%–83.52%) sets. Among these networks, the MLP with one hidden layer and a 10-19-10 structure (denoted as MLP5) was the most effective (Figure 2). Its training took 227 iterations. To evaluate the network performance during its training (as for MLP37), the cross-entropy error function was used together with the SoftMax activation function in the network output layer. Similarly, a hyperbolic tangent activation function was used in the hidden layer (Table 5).
As can be seen from Table 3, the most frequent culling reasons in the test set were: reproductive problems (4055 records), udder diseases (1314 records), and accidents and leg diseases (1047 and 1025 records, respectively). Nevertheless, both MLP37 and MLP5 almost always correctly classified culling records from an old age category (R8) (Table 6). A very high correct classification rate (at least 99%) was also found for cows culled due to reproductive problems (R7). In other cases, the percentage of correct classification was 91%–97% for MLP37 (except for low milk yield–R3, for which it was 77%) and 51%–88% for MLP5. On the other hand, the lowest correct classification rate (77% and 55% for MLP37 and MLP5, respectively) was observed for low milk yield (R3). The percentage of correct classification for individual culling reasons obtained with GDA37 and GDA5 was, in general, lower than that for MLP37 and MLP5 (Table 7).
R1—infectious diseases, R2—respiratory system diseases, R3—low milk yield, R4—nutritive and metabolic diseases, R5—leg diseases, R6—udder diseases, R7—infertility and reproduction problems, R8—old age, R9—accidents, and R10—other. The numbers of correctly classified cases are shown on the diagonal.
A significant indicator of the predictive abilities of ANN and GDA was also the reliability of prediction. In the present study, PPV were used for this purpose (Table 7). In general, these values were quite high for both neural models (88.31–100% for MLP37 and 68.15–100% for MLP5). For GDA, they ranged from 0% to 83.33% (GDA37) and from 0% to 100% (GDA5). In order to get an even better insight into the prediction reliability of ANN and GDA, the cumulative gains charts were plotted and analyzed. They illustrate the relationship between the percentage of correctly classified cases from a given category and the percentage of records from the dataset ordered according to the predicted probability of the class assignment. It should be emphasized that the gain curves for most culling reasons predicted by ANN were located much higher than the baseline [near the (0, 100%) point], which indicates a high prediction reliability of the neural models (Figure 3 and Figure 4). As in the previously reported results, MLP37 was more effective than MLP5. When interpreting gains charts for individual categories, one should also consider the percentage of cases from a given category in the whole dataset. Consequently, the course of the curve for udder diseases was not optimal (of the first 13% of observations assigned to this category with the highest probability by the model, about 80% belonged to this class). On the other hand, the curve for reproductive problems passed much closer to the baseline, even though the prediction reliability was very high. This resulted from the fact that the percentage of cases from this category in the whole dataset was about 40%. The gains obtained with both MLP37 and MLP5 were the highest (besides reproductive problems) for such culling reasons as old age, accidents, or leg diseases. The gains for the GDA models with 37 and five predictors were much lower (Figure 5 and Figure 6).
In principle, the gains curves for individual categories (except for old age) were located very close to the baseline, and some curves were even below this line, which shows the uselessness of such models, since better results can be obtained in a purely random manner (without any model).

4. Discussion

In the present study, old age was the culling reason most accurately predicted by both MLP37 and MLP5. In addition, prediction reliability for ANN was high for this category, which indicates that almost all animals predicted by ANN to be culled due to old age actually belonged to this category. The gains charts were also nearly optimal. Similar values of individual performance indicators were obtained for GDA. Therefore, in this case, it is indeed possible to use lactation curve parameters (from Wood’s model), age at first calving, and calving difficulty as the only predictors in primiparous cows. This may be of great importance from the production practice point of view, since, among all culling reasons, old age directly indicates the productive lifespan of dairy cows, considering the longest-living animals. The possibility of an early prediction of the maximum lifespan of cows (based on first-lactation parameters) may provide information required for the assessment of milk production profitability [41] and the modification of breeding programs for dairy cattle in terms of their longevity [42]. It should also be noted that many authors [1,42,43,44] have indicated cow longevity as one of the most important measures of animal welfare. It is highly significant from both a breeding and a production point of view and due to the increasing sensitivity of dairy product consumers to the human-animal relationship [45]. Therefore, it seems that the prediction of the maximum length of productive life in high-yielding cows should be of interest to both breeders/milk producers and the food industry.
On the other hand, reproductive problems, the second most accurately predicted culling reason (after old age) for MLP37 and MLP5, are among the most frequent difficulties encountered in production practice. It is estimated that they account for approximately 20%–40% of culled dairy cows [1,46]. A high correct classification rate (at least 99% in the case of ANN) for this category was also accompanied by high prediction reliability and high gains (considering the fact that this category was the most frequent one). Cows with reproductive problems were sometimes incorrectly classified to such categories as leg diseases, udder diseases, accidents, other reasons, and (to a lesser extent) low yield, respiratory system diseases, metabolic diseases, and old age. This may have resulted from the relationship between reproductive problems in cows and other culling reasons. An association between cow fertility and respiratory system diseases [47,48], milk yield level [49,50], metabolic diseases [51,52,53], leg diseases [54,55,56], and udder diseases [57,58] has been shown. In the present study, ANN incorrectly classified reproductive problems in cows in these cases. At the same time, this result supports the suggestion made by Adamczyk et al. [59], who recommended considering not only the ultimate culling reasons but also the mutual relationships among individual reasons and the life-history of cows when analyzing longevity and indicating culling reasons for these animals. Cows culled due to reproductive problems were also accurately predicted by GDA. However, PPV and gains were lower than those for ANN considering the proportion of this class in the whole dataset. Similar results were obtained for GDA with a reduced set of predictors.
For the potential application of ANN in dairy production, prediction of culling reasons in cows should be considered in a broader context, i.e., concerning the lifetime performance of animals. In this regard, Kumar and Hooda [60] stated that artificial intelligence may be successfully applied to the prediction of lifetime milk yield of cows based on age at first calving, calving interval, and some parameters of the first and second lactation (service period, lactation milk yield, lactation length, and dry period), whereas Bhosale and Singh [61] reported that, for the effective prediction of lifetime milk yield in cross-bred cows with a proportion of Holstein-Friesian genes, it is sufficient to include only the first-lactation parameters in the ANN input layer (lactation length, peak yield, and lactation total milk yield). Moreover, ANN was very effective in this case for both smaller (fewer than 10 cows) and larger herds. Considering the results reported by Bhosale and Singh [61], it should be noted that, in the present study, the predictive abilities of ANN were confirmed based on the first-lactation data, including Wood’s model parameters. Consequently, the effectiveness of ANN in predicting phenotypic milk performance traits is even more important due to the fact that it corresponds to the significant abilities of ANN to predict breeding value of dairy cattle [62].
In this context, a more traditional method such as a discriminant function analysis was much less effective when compared with ANN including 37 or five predictors. In addition, the application of GDA is associated with certain assumptions about predictors, especially the absence of multicollinearity, which limits its applicability [33]. Predictors should not be correlated with each other since this causes computational problems. These assumptions, however, are less important for ANN.

5. Conclusions

In the present study, it was shown that artificial neural networks may be an effective method of classifying cows culled due to old age based on routinely collected first-lactation data. Among the remaining culling reasons, a high correct classification rate was observed for reproductive problems. An association between this culling reason and low milk yield, udder diseases, metabolic diseases, leg diseases, and respiratory system diseases was also confirmed in our study. It should be emphasized that, for the effective prediction of culling reasons, it was sufficient to include such first-lactation traits as calving age, calving difficulty, and the characteristics of the lactation curve (Wood’s model parameters). The confirmed abilities of ANN may constitute a valuable source of information that can be used for the modification of breeding programs in Holstein-Friesian cattle and the optimization of economic models for dairy herds.

Author Contributions

Conceptualization, K.A. Methodology, W.G. and D.Z. Validation, K.A. and W.G. Formal analysis, K.A. Investigation, D.Z. and W.G. Resources, K.A. Data curation, D.Z. Writing—original draft preparation, K.A., W.G., and D.Z. Writing—review and editing, K.A. and W.G. Visualization, D.Z. Supervision, K.A. Project administration, K.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data are publicly unavailable due to data confidentiality.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. De Vries, A. Cow longevity economics: The cost benefit of keeping the cow in the herd. In Proceedings of the “Cow Longevity Conference”, Hamra Farm/Tumba, Sweden, 28–29 August 2013; pp. 22–52. [Google Scholar]
  2. Brouwer, H.; Stegeman, J.A.; Straatsma, J.W.; Hooijer, G.A.; van Schaik, G. The validity of a monitoring system based on routinely collected dairy cattle health data relative to a standardized herd check. Prev. Vet. Med. 2015, 122, 76–82. [Google Scholar] [CrossRef]
  3. Congleton, W.R. Dairy cow culling decision. 3. Risk of culling on predicted income (an application of Bayes criterion) 1. J. Dairy Sci. 1988, 71, 1916–1925. [Google Scholar] [CrossRef]
  4. Weigel, K.A.; Lawlor, T.J., Jr.; VanRaden, P.M.; Wiggans, G.R. Use of linear type and production data to supplement early predicted transmitting abilities for productive life. J. Dairy Sci. 1998, 81, 2040–2044. [Google Scholar] [CrossRef]
  5. Caraviello, D.Z.; Weigel, K.A.; Gianola, D. Comparison between a Weibull proportional hazards model and a linear model for predicting the genetic merit of US Jersey sires for daughter longevity. J. Dairy Sci. 2004, 87, 1469–1476. [Google Scholar] [CrossRef]
  6. Caraviello, D.Z.; Weigel, K.A.; Gianola, D. Prediction of longevity breeding values for US Holstein sires using survival analysis methodology. J. Dairy Sci. 2004, 87, 3518–3525. [Google Scholar] [CrossRef]
  7. Roberts, T.; Chapinal, N.; LeBlanc, S.J.; Kelton, D.F.; Dubuc, J.; Duffield, T.F. Metabolic parameters in transition cows as indicators for early-lactation culling risk. J. Dairy Sci. 2012, 95, 3057–3063. [Google Scholar] [CrossRef] [PubMed]
  8. Seifi, H.A.; LeBlanc, S.J.; Leslie, K.E.; Duffield, T.F. Metabolic predictors of post-partum disease and culling risk in dairy cattle. Vet. J. 2011, 188, 216–220. [Google Scholar] [CrossRef]
  9. Szyda, J.; Morek-Kopeć, M.; Komisarek, J.; Żarnecki, A. Evaluating markers in selected genes for association with functional longevity of dairy cattle. BMC Genet. 2011, 12, 30. [Google Scholar] [CrossRef] [Green Version]
  10. Nordlund, K. The Transition cow needs space and comfort. In Proceedings of the “Cow Longevity Conference”, Hamra Farm/Tumba, Sweden, 28–29 August 2013; pp. 166–177. [Google Scholar]
  11. Dallago, G.M.; de Figueiredo, D.M.; de Resende Andrade, P.C.; dos Santos, R.A.; Lacroix, R.; Santschi, D.E.; Lefebvre, D.M. Predicting first test day milk yield of dairy heifers. Comput. Electron. Agric. 2019, 166, 105032. [Google Scholar] [CrossRef]
  12. Lacroix, R.; Salehi, F.; Yang, X.Z.; Wade, K.M. Effects of data preprocessing on the performance of artificial neural networks for dairy yield prediction and cow culling classification. Trans. ASAE 1997, 40, 839–846. [Google Scholar] [CrossRef]
  13. Adamczyk, K.; Zaborski, D.; Grzesiak, W.; Makulska, J.; Jagusiak, W. Recognition of culling reasons in Polish dairy cows using data mining methods. Comput. Electron. Agric. 2016, 127, 26–37. [Google Scholar] [CrossRef]
  14. Krug, C.; Haskell, M.J.; Nunes, T.; Stilwell, G. Creating a model to detect dairy cattle farms with poor welfare using a national database. Prev. Vet. Med. 2015, 122, 280–286. [Google Scholar] [CrossRef]
  15. Bach, A. Associations between several aspects of heifer development and dairy cow survivability to second lactation. J. Dairy Sci. 2011, 94, 1052–1057. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  16. Heinrichs, A.J.; Zanton, G.I.; Lascano, G.J.; Jones, C.M. A 100-Year Review: A century of dairy heifer research. J. Dairy Sci. 2017, 100, 10173–10188. [Google Scholar] [CrossRef]
  17. Wathes, D.C.; Pollott, G.E.; Johnson, K.F.; Richardson, H.; Cooke, J.S. Heifer fertility and carry over consequences for life time production in dairy and beef cattle. Animal 2014, 8, 91–104. [Google Scholar] [CrossRef] [Green Version]
  18. Lohakare, J.D.; Südekum, K.-H.; Pattanaik, A.K. Nutrition-induced changes of growth from birth to first calving and its impact on mammary development and first-lactation milk yield in dairy heifers: A review. Asian. Austral. J. Anim. 2012, 25, 1338–1350. [Google Scholar] [CrossRef] [Green Version]
  19. Wall, E.; Coffey, M.P.; Brotherstone, S. The relationship between body energy traits and production and fitness traits in first-lactation dairy cows. J. Dairy Sci. 2007, 90, 1527–1537. [Google Scholar] [CrossRef]
  20. Haworth, G.M.; Tranter, W.P.; Chuck, J.N.; Cheng, Z.; Wathes, D.C. Relationships between age at first calving and first lactation milk yield, and lifetime productivity and longevity in dairy cows. Vet. Rec. 2008, 162, 643–647. [Google Scholar] [CrossRef] [PubMed]
  21. Jairath, L.K.; Hayes, J.F.; Cue, R.I. Correlations between first lactation and lifetime performance traits of Canadian Holsteins. J. Dairy Sci. 1995, 78, 438–448. [Google Scholar] [CrossRef]
  22. Janus, E.; Borkowska, D. Correlations between milk yield in primiparous PHF cows and selected lifetime performance and fertility indicators as well as reasons for culling. Acta Sci. Pol. Zootechnica 2012, 11, 23–32. [Google Scholar]
  23. Sawa, A.; Siatka, K.; Krężel-Czopek, S. Effect of age at first calving on first lactation milk yield, lifetime milk production and longevity of cows. Ann. Anim. Sci. 2019, 19, 189–200. [Google Scholar] [CrossRef] [Green Version]
  24. Jeretina, J.; Babnik, D.; Škorjanc, D. Modeling lactation curve standards for test-day milk yield in Holstein, Brown Swiss and Simmental cows. J. Anim. Plant. Sci. 2013, 23, 754–762. [Google Scholar]
  25. Silvestre, A.M.; Martins, A.M.; Santos, V.A.; Ginja, M.M.; Colaço, J.A. Lactation curves for milk, fat and protein in dairy cows: A full approach. Livest. Sci. 2009, 122, 308–313. [Google Scholar] [CrossRef]
  26. Samarasinghe, S. Neural Networks for Applied Sciences and Engineering: From Fundamentals to Complex. Pattern Recognition; Auerbach: Boca Raton, FL, USA, 2007; ISBN 978-1-4200-1306-1. [Google Scholar]
  27. Grzesiak, W.; Zaborski, D. Examples of the use of data mining methods in animal breeding. In Data Mining Applications in Engineering and Medicine; Karahoca, A., Ed.; InTech: Rijeka, Croatia, 2012; pp. 303–324. [Google Scholar] [CrossRef] [Green Version]
  28. Behkami, S.; Zain, S.M.; Gholami, M.; Khir, M.F.A. Classification of cow milk using artificial neural network developed from the spectral data of single-and three-detector spectrophotometers. Food Chem. 2019, 294, 309–315. [Google Scholar] [CrossRef] [PubMed]
  29. Elfadl, E.A.A.; Abdallah, F.D. Using discriminant analysis and artificial neural network models for classification and prediction of fertility status of Friesian cattle. Am. J. Appl. Math. Stat. 2017, 5, 90–94. [Google Scholar] [CrossRef] [Green Version]
  30. Fenlon, C.; O’Grady, L.; Mee, J.F.; Butler, S.T.; Doherty, M.L.; Dunnion, J. A comparison of 4 predictive models of calving assistance and difficulty in dairy heifers and cows. J. Dairy Sci. 2017, 100, 9746–9758. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  31. Lee, D.-H.; Lee, S.-H.; Cho, B.-K.; Wakholi, C.; Seo, Y.-W.; Cho, S.-H.; Kang, T.-H.; Lee, W.-H. Estimation of carcass weight of Hanwoo (Korean Native Cattle) as a function of body measurements using statistical models and a neural network. Asian-Australas. J. Anim. Sci. 2020, 33, 1633. [Google Scholar] [CrossRef] [Green Version]
  32. Hill, T.; Lewicki, P. Statistics: Methods and Applications; StatSoft: Tulsa, OK, USA, 2007; ISBN 1-884233-59-7. [Google Scholar]
  33. Maddala, G.S. Introduction to Econometrics, 3rd ed.; Wiley India Pvt. Limited: New Delhi, India, 2007; ISBN 978-81-265-1095-5. [Google Scholar]
  34. Zaborski, D.; Proskura, W.S.; Grzesiak, W. The Use of data mining methods for dystocia detection in Polish Holstein-Friesian Black-and-White Cattle. Asian-Australas. J. Anim. Sci. 2018, 31, 1700. [Google Scholar] [CrossRef] [Green Version]
  35. Sifre, L.; Berge, P.; Engel, E.; Martin, J.-F.; Bonny, J.-M.; Listrat, A.; Taylor, R.; Culioli, J. Influence of the spatial organization of the perimysium on beef tenderness. J. Agric. Food Chem. 2005, 53, 8390–8399. [Google Scholar] [CrossRef]
  36. Wood, P.D.P. Algebraic model of the lactation curve in cattle. Nature 1967, 216, 164. [Google Scholar] [CrossRef]
  37. Olori, V.E.; Brotherstone, S.; Hill, W.G.; McGuirk, B.J. Fit of standard models of the lactation curve to weekly records of milk production of cows in a single herd. Livest. Prod. Sci. 1999, 58, 55–63. [Google Scholar] [CrossRef]
  38. Nawi, N.M.; Ransing, M.R.; Ransing, R.S. An improved learning algorithm based on the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method for back propagation neural networks. In Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, Jinan, China, 16–18 October 2006; Volume 1, pp. 152–157. [Google Scholar]
  39. Burez, J.; Van den Poel, D. Handling class imbalance in customer churn prediction. Expert Syst. Appl. 2009, 36, 4626–4636. [Google Scholar] [CrossRef] [Green Version]
  40. Ha, K.; Cho, S.; MacLachlan, D. Response models based on bagging neural networks. J. Interact. Mark. 2005, 19, 17–30. [Google Scholar] [CrossRef]
  41. Krpálková, L.; Cabrera, V.E.; Kvapilík, J.; Burdych, J.; Crump, P. Associations between age at first calving, rearing average daily weight gain, herd milk yield and dairy herd production, reproduction, and profitability. J. Dairy Sci. 2014, 97, 6573–6582. [Google Scholar] [CrossRef] [PubMed]
  42. Miglior, F.; Fleming, A.; Malchiodi, F.; Brito, L.F.; Martin, P.; Baes, C.F. A 100-year review: Identification and genetic selection of economically important traits in dairy cattle. J. Dairy Sci. 2017, 100, 10251–10271. [Google Scholar] [CrossRef]
  43. Oltenacu, P.A.; Broom, D.M. The impact of genetic selection for increased milk yield on the welfare of dairy cows. Anim. Welfare 2010, 19, 39–49. [Google Scholar]
  44. Vasseur, E. Animal Behavior and Well-Being Symposium: Optimizing outcome measures of welfare in dairy cattle assessment. J. Anim. Sci. 2017, 95, 1365–1371. [Google Scholar] [CrossRef]
  45. Wolf, C.A.; Tonsor, G.T.; McKendree, M.G.S.; Thomson, D.U.; Swanson, J.C. Public and farmer perceptions of dairy cattle welfare in the United States. J. Dairy Sci. 2016, 99, 5892–5903. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  46. Adamczyk, K.; Makulska, J.; Jagusiak, W.; Węglarz, A. Associations between strain, herd size, age at first calving, culling reason and lifetime performance characteristics in Holstein-Friesian cows. Animal 2017, 11, 327–334. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  47. Beaudeau, F.; Ohlson, A.; Emanuelson, U. Associations between bovine coronavirus and bovine respiratory syncytial virus infections and animal performance in Swedish dairy herds. J. Dairy Sci. 2010, 93, 1523–1533. [Google Scholar] [CrossRef] [Green Version]
  48. Van der Fels-Klerx, H.J.; Martin, S.W.; Nielen, M.; Huirne, R.B.M. Effects on productivity and risk factors of bovine respiratory disease in dairy heifers; a review for the Netherlands. NJAS-Wag. J. Life Sci. 2002, 50, 27–45. [Google Scholar] [CrossRef] [Green Version]
  49. Inchaisri, C.; Hogeveen, H.; Vos, P.; Van Der Weijden, G.C.; Jorritsma, R. Effect of milk yield characteristics, breed, and parity on success of the first insemination in Dutch dairy cows. J. Dairy Sci. 2010, 93, 5179–5187. [Google Scholar] [CrossRef] [PubMed]
  50. Rearte, R.; LeBlanc, S.J.; Corva, S.G.; de la Sota, R.L.; Lacau-Mengido, I.M.; Giuliodori, M.J. Effect of milk production on reproductive performance in dairy herds. J. Dairy Sci. 2018, 101, 7575–7584. [Google Scholar] [CrossRef]
  51. Bicalho, M.L.S.; Marques, E.C.; Gilbert, R.O.; Bicalho, R.C. The association of plasma glucose, BHBA, and NEFA with postpartum uterine diseases, fertility, and milk production of Holstein dairy cows. Theriogenology 2017, 88, 270–282. [Google Scholar] [CrossRef] [PubMed]
  52. Bisinotto, R.S.; Greco, L.F.; Ribeiro, E.S.; Martinez, N.; Lima, F.S.; Staples, C.R.; Thatcher, W.W.; Santos, J.E.P. Influences of nutrition and metabolism on fertility of dairy cows. Anim. Reprod. 2012, 9, 260–272. [Google Scholar]
  53. Zebeli, Q.; Ghareeb, K.; Humer, E.; Metzler-Zebeli, B.U.; Besenfelder, U. Nutrition, rumen health and inflammation in the transition period and their role on overall health and fertility in dairy cows. Res. Vet. Sci. 2015, 103, 126–136. [Google Scholar] [CrossRef]
  54. Bicalho, R.C.; Oikonomou, G. Control and prevention of lameness associated with claw lesions in dairy cows. Livest. Sci. 2013, 156, 96–105. [Google Scholar] [CrossRef]
  55. Charfeddine, N.; Pérez-Cabal, M.A. Effect of claw disorders on milk production, fertility, and longevity, and their economic impact in Spanish Holstein cows. J. Dairy Sci. 2017, 100, 653–665. [Google Scholar] [CrossRef]
  56. Mottram, T. Animal board invited review: Precision livestock farming for dairy cows with a focus on oestrus detection. Animal 2016, 10, 1575–1584. [Google Scholar] [CrossRef] [Green Version]
  57. Albaaj, A.; Foucras, G.; Raboisson, D. High somatic cell counts and changes in milk fat and protein contents around insemination are negatively associated with conception in dairy cows. Theriogenology 2017, 88, 18–27. [Google Scholar] [CrossRef]
  58. Hudson, C.D.; Bradley, A.J.; Breen, J.E.; Green, M.J. Associations between udder health and reproductive performance in United Kingdom dairy cows. J. Dairy Sci. 2012, 95, 3683–3697. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  59. Adamczyk, K.; Jagusiak, W.; Makulska, J. Analysis of lifetime performance and culling reasons in Black-and-White Holstein-Friesian cows compared with crossbreds. Ann. Anim. Sci. 2018, 18, 1061–1079. [Google Scholar] [CrossRef] [Green Version]
  60. Kumar, H.; Hooda, B.K. Prediction of milk production using artificial neural network. Curr. Adv. Agric. Sci. 2014, 6, 173–175. [Google Scholar] [CrossRef]
  61. Bhosale, M.D.; Singh, T.P. Development of lifetime milk yield equation using artificial neural network in Holstein Friesian crossbred dairy cattle and comparison with multiple linear regression model. Curr. Sci. 2017, 113, 951–955. [Google Scholar] [CrossRef]
  62. Ehret, A.; Hochstuhl, D.; Gianola, D.; Thaller, G. Application of neural networks with back-propagation to genome-enabled prediction of complex traits in Holstein-Friesian and German Fleckvieh cattle. Genet. Sel. Evol. 2015, 47, 22–31. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Figure 1. The structure of the multilayer perceptron with one hidden layer and 37 input variables (MLP37).
Figure 2. The structure of the multilayer perceptron with one hidden layer and five input variables (MLP5).
Figure 3. Gains chart for the culling reasons predicted by the multilayer perceptron with one hidden layer and 37 input variables (MLP37). Black, reference line corresponding to a model without discriminatory power.
Figure 4. Gains chart for the culling reasons predicted by the multilayer perceptron with one hidden layer and five input variables (MLP5). Black, reference line corresponding to a model without discriminatory power.
Figure 5. Gains chart for the culling reasons predicted by the general discriminant analysis model with 37 predictors (GDA37). Black, reference line corresponding to a model without discriminatory power.
Figure 6. Gains chart for the culling reasons predicted by the general discriminant analysis model with five predictors (GDA5). Black, reference line corresponding to a model without discriminatory power.
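Table 1. The number of cows in individual classes based on the age at first calving (AFC) groups and culling reasons (R).

| Age Group | AFC (months) | R1 (866) | R2 (2548) | R3 (16,923) | R4 (22,867) | R5 (42,566) | R6 (57,701) | R7 (379,582) | R8 (323,523) | R9 (51,520) | R10 (49,914) | Total (948,010) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 17–18 | 0 | 0 | 4 | 4 | 7 | 14 | 38 | 9 | 10 | 6 | 92 |
| 2 | 19 | 0 | 0 | 8 | 4 | 12 | 17 | 64 | 14 | 11 | 18 | 148 |
| 3 | 20 | 0 | 2 | 16 | 11 | 23 | 39 | 163 | 22 | 33 | 31 | 340 |
| 4 | 21 | 3 | 4 | 39 | 55 | 54 | 104 | 359 | 77 | 73 | 68 | 836 |
| 5 | 22 | 4 | 13 | 95 | 150 | 208 | 320 | 985 | 169 | 227 | 200 | 2371 |
| 6 | 23 | 10 | 37 | 204 | 313 | 475 | 674 | 2046 | 421 | 573 | 480 | 5233 |
| 7 | 24 | 12 | 49 | 267 | 443 | 697 | 907 | 2956 | 639 | 742 | 647 | 7359 |
| 8 | 25 | 13 | 49 | 256 | 416 | 698 | 892 | 2945 | 710 | 772 | 665 | 7416 |
| 9 | 26 | 7 | 56 | 226 | 369 | 619 | 758 | 2529 | 549 | 600 | 614 | 6327 |
| 10 | 27 | 10 | 30 | 171 | 282 | 537 | 637 | 2128 | 382 | 495 | 436 | 5108 |
| 11 | 28 | 2 | 29 | 147 | 214 | 405 | 538 | 1658 | 305 | 401 | 371 | 4070 |
| 12 | 29 | 4 | 15 | 98 | 190 | 348 | 462 | 1345 | 231 | 314 | 297 | 3304 |
| 13 | 30 | 2 | 14 | 78 | 129 | 274 | 369 | 1025 | 182 | 249 | 226 | 2548 |
| 14 | 31 | 4 | 17 | 63 | 106 | 201 | 270 | 787 | 152 | 193 | 177 | 1970 |
| 15 | 32 | 1 | 11 | 43 | 84 | 160 | 205 | 634 | 103 | 155 | 139 | 1535 |
| 16 | 33 | 0 | 5 | 56 | 66 | 131 | 138 | 507 | 82 | 126 | 118 | 1229 |
| 17 | 34 | 5 | 8 | 26 | 53 | 104 | 131 | 432 | 73 | 88 | 73 | 993 |
| Total | - | 77 | 339 | 1797 | 2889 | 4953 | 6475 | 20,601 | 4120 | 5062 | 4566 | 50,879 |
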
AFC—age at first calving, R—culling reason (R1—infectious diseases, R2—respiratory system diseases, R3—low milk yield, R4—nutritive and metabolic diseases, R5—leg diseases, R6—udder diseases, R7—infertility and reproduction problems, R8—old age, R9—accidents, R10—other). The number of test-day records within each culling reason is given in brackets.
Table 2. Mean and standard deviation (SD) of continuous predictors in individual sets.

| Variable | Training Set (n = 33,071) Mean | Training Set SD | Validation Set (n = 7632) Mean | Validation Set SD | Test Set (n = 10,176) Mean | Test Set SD | Total (n = 50,879) Mean | Total SD |
|---|---|---|---|---|---|---|---|---|
| HERD (number of animals) | 192.68 | 290.44 | 195.97 | 295.83 | 191.88 | 265.77 | 193.02 | 291.07 |
| TD | 9.54 | 3.60 | 9.63 | 3.62 | 9.64 | 3.35 | 9.87 | 3.54 |
| AFC (months) | 26.28 | 3.09 | 26.26 | 3.06 | 26.24 | 3.32 | 26.27 | 3.09 |
| DIM (days) | 286.86 | 119.65 | 287.42 | 120.27 | 286.23 | 110.77 | 286.81 | 119.78 |
| a | 28.56 | 1.22 | 28.56 | 1.21 | 28.55 | 1.29 | 28.56 | 1.22 |
| b | 0.13 | 0.04 | 0.13 | 0.04 | 0.13 | 0.04 | 0.13 | 0.04 |
| c | 0.06 | 0.02 | 0.06 | 0.02 | 0.06 | 0.03 | 0.06 | 0.02 |
| MILK (kg) | 24.33 | 6.37 | 24.34 | 6.39 | 24.21 | 6.00 | 24.31 | 6.36 |
| MILKMIN (kg) | 17.76 | 6.63 | 17.64 | 6.65 | 17.65 | 5.96 | 17.72 | 6.61 |
| MILKMAX (kg) | 30.48 | 7.52 | 30.56 | 7.60 | 30.36 | 7.37 | 30.47 | 7.53 |
| MILKSD (kg) | 4.52 | 2.27 | 4.57 | 2.34 | 4.51 | 2.02 | 4.52 | 2.28 |
| FAT (%) | 4.12 | 0.59 | 4.12 | 0.59 | 4.12 | 0.54 | 4.12 | 0.58 |
| FATMIN (%) | 3.35 | 0.62 | 3.34 | 0.62 | 3.36 | 0.59 | 3.35 | 0.62 |
| FATMAX (%) | 5.09 | 1.00 | 5.09 | 0.99 | 5.07 | 0.97 | 5.08 | 0.99 |
| FATSD (%) | 0.61 | 0.36 | 0.61 | 0.35 | 0.60 | 0.28 | 0.61 | 0.35 |
| PROT (%) | 3.34 | 0.29 | 3.33 | 0.29 | 3.33 | 0.28 | 3.34 | 0.29 |
| PROTMIN (%) | 2.93 | 0.26 | 2.92 | 0.27 | 2.93 | 0.25 | 2.93 | 0.27 |
| PROTMAX (%) | 3.76 | 0.47 | 3.76 | 0.48 | 3.76 | 0.46 | 3.76 | 0.47 |
| PROTSD (%) | 0.30 | 0.15 | 0.30 | 0.15 | 0.30 | 0.14 | 0.30 | 0.15 |
| LACT (%) | 4.84 | 0.16 | 4.84 | 0.16 | 4.84 | 0.15 | 4.84 | 0.16 |
| LACTMIN (%) | 4.61 | 0.27 | 4.61 | 0.28 | 4.61 | 0.26 | 4.61 | 0.27 |
| LACTMAX (%) | 5.02 | 0.16 | 5.02 | 0.16 | 5.02 | 0.15 | 5.02 | 0.16 |
| LACTSD (%) | 0.14 | 0.09 | 0.14 | 0.09 | 0.14 | 0.07 | 0.14 | 0.09 |
| UREA (mg/L) | 223.64 | 60.62 | 222.63 | 60.79 | 223.66 | 60.81 | 223.49 | 61.39 |
| UREAMIN (mg/L) | 145.70 | 60.76 | 144.53 | 59.66 | 145.22 | 58.95 | 145.43 | 60.76 |
| UREAMAX (mg/L) | 312.24 | 89.55 | 310.87 | 94.24 | 313.17 | 90.49 | 312.22 | 91.89 |
| UREASD (mg/L) | 58.15 | 28.49 | 57.95 | 29.67 | 58.87 | 26.27 | 58.27 | 29.20 |
| SCC (thousands/mL) | 532.26 | 913.81 | 528.85 | 878.79 | 554.89 | 738.16 | 536.27 | 925.37 |
| SCCMIN (thousands/mL) | 89.94 | 262.47 | 92.64 | 250.65 | 98.53 | 152.69 | 92.06 | 277.86 |
| SCCMAX (thousands/mL) | 1836.30 | 3046.97 | 1824.38 | 2969.94 | 1890.08 | 2974.92 | 1845.27 | 3053.84 |
| SCCSD (thousands/mL) | 640.55 | 1161.20 | 632.64 | 1117.94 | 661.23 | 999.18 | 643.50 | 1160.49 |
| DM (%) | 13.01 | 0.73 | 13.00 | 0.74 | 13.00 | 0.71 | 13.01 | 0.73 |
| DMMIN (%) | 12.03 | 0.73 | 12.01 | 0.73 | 12.04 | 0.71 | 12.03 | 0.73 |
| DMMAX (%) | 14.14 | 1.17 | 14.14 | 1.16 | 14.14 | 1.14 | 14.14 | 1.16 |
| DMSD (%) | 0.75 | 0.39 | 0.75 | 0.38 | 0.74 | 0.33 | 0.74 | 0.39 |

SD—standard deviation, HERD—herd size, TD—number of test-day records, AFC—age at first calving, DIM—days in milk, a—initial milk yield (Wood’s model parameter), b—rate of increase until the peak is reached (Wood’s model parameter), c—rate of decline after peak production (Wood’s model parameter), MILK—average daily milk yield, MILKMIN—minimum daily milk yield, MILKMAX—maximum daily milk yield, MILKSD—standard deviation of daily milk yield, FAT—average fat content, FATMIN—minimum fat content, FATMAX—maximum fat content, FATSD—standard deviation of fat content, PROT—average protein content, PROTMIN—minimum protein content, PROTMAX—maximum protein content, PROTSD—standard deviation of protein content, LACT—average lactose content, LACTMIN—minimum lactose content, LACTMAX—maximum lactose content, LACTSD—standard deviation of lactose content, UREA—average urea content, UREAMIN—minimum urea content, UREAMAX—maximum urea content, UREASD—standard deviation of urea content, SCC—average somatic cell count, SCCMIN—minimum somatic cell count, SCCMAX—maximum somatic cell count, SCCSD—standard deviation of somatic cell count, DM—average dry matter content, DMMIN—minimum dry matter content, DMMAX—maximum dry matter content, and DMSD—standard deviation of dry matter content.
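
For orientation, the parameters a, b, and c defined above belong to Wood’s lactation curve, whose standard form is y(t) = a·t^b·e^(−ct). The sketch below evaluates that curve and its peak time t = b/c using the mean values from the Total column of Table 2. The function names are illustrative, and the time unit of t is left open because it is not stated in this section.

```python
import math


def wood_yield(t, a, b, c):
    """Milk yield at time t > 0 under Wood's model: a * t**b * exp(-c * t)."""
    return a * t ** b * math.exp(-c * t)


def wood_peak_time(b, c):
    """Time of peak yield, obtained from dy/dt = y * (b / t - c) = 0."""
    return b / c


# Mean parameter values from the "Total" column of Table 2.
a, b, c = 28.56, 0.13, 0.06
t_peak = wood_peak_time(b, c)
print(f"peak at t = {t_peak:.2f}, yield = {wood_yield(t_peak, a, b, c):.2f}")
```

A larger b moves the peak later and a larger c moves it earlier, which is why the three parameters together summarize the shape of the first-lactation curve.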
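Table 3. The number (n) and percentage (%) of cows for individual variants of categorical predictors and the output variable in the training, validation, and test set.

| Predictor | Variant | Training Set n | Training Set % | Validation Set n | Validation Set % | Test Set n | Test Set % | Total n | Total % |
|---|---|---|---|---|---|---|---|---|---|
| Calving season | Spring | 8593 | 26.0 | 1947 | 25.5 | 2634 | 25.9 | 13,174 | 25.9 |
| Calving season | Summer | 7608 | 23.0 | 1730 | 22.7 | 2285 | 22.5 | 11,623 | 22.8 |
| Calving season | Autumn | 8008 | 24.2 | 1873 | 24.5 | 2466 | 24.2 | 12,347 | 24.3 |
| Calving season | Winter | 8862 | 26.8 | 2082 | 27.3 | 2791 | 27.4 | 13,735 | 27.0 |
| Calving difficulty | Unassisted | 12,541 | 37.9 | 2885 | 37.8 | 3835 | 37.7 | 19,261 | 37.9 |
| Calving difficulty | Easy | 18,608 | 56.3 | 4293 | 56.3 | 5718 | 56.2 | 28,619 | 56.3 |
| Calving difficulty | Moderate | 1491 | 4.5 | 355 | 4.7 | 468 | 4.6 | 2314 | 4.6 |
| Calving difficulty | Difficult | 148 | 0.5 | 35 | 0.5 | 53 | 0.5 | 236 | 0.5 |
| Calving difficulty | Abortions | 251 | 0.8 | 58 | 0.8 | 87 | 0.9 | 396 | 0.8 |
| Calving difficulty | Caesarean | 32 | 0.1 | 6 | 0.1 | 15 | 0.2 | 53 | 0.1 |
| Culling reason (output variable) | R1 | 51 | 0.2 | 14 | 0.2 | 12 | 0.1 | 77 | 0.2 |
| Culling reason (output variable) | R2 | 217 | 0.7 | 47 | 0.6 | 75 | 0.7 | 339 | 0.7 |
| Culling reason (output variable) | R3 | 1131 | 3.4 | 303 | 4.0 | 363 | 3.6 | 1797 | 3.5 |
| Culling reason (output variable) | R4 | 1861 | 5.6 | 445 | 5.8 | 583 | 5.7 | 2889 | 5.7 |
| Culling reason (output variable) | R5 | 3206 | 9.7 | 722 | 9.5 | 1025 | 10.1 | 4953 | 9.7 |
| Culling reason (output variable) | R6 | 4221 | 12.8 | 940 | 12.3 | 1314 | 12.9 | 6475 | 12.7 |
| Culling reason (output variable) | R7 | 13,460 | 40.7 | 3086 | 40.4 | 4055 | 39.9 | 20,601 | 40.5 |
| Culling reason (output variable) | R8 | 2686 | 8.1 | 604 | 7.9 | 830 | 8.2 | 4120 | 8.1 |
| Culling reason (output variable) | R9 | 3283 | 9.9 | 732 | 9.6 | 1047 | 10.3 | 5062 | 10.0 |
| Culling reason (output variable) | R10 | 2955 | 8.9 | 739 | 9.7 | 872 | 8.6 | 4566 | 9.0 |
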
R1—infectious diseases, R2—respiratory system diseases, R3—low milk yield, R4—nutritive and metabolic diseases, R5—leg diseases, R6—udder diseases, R7—infertility and reproduction problems, R8—old age, R9—accidents, and R10—others.
Table 4. Sensitivity analysis for the multilayer perceptron with one hidden layer and 37 input variables (MLP37) on the training set.

| Rank | Variable | Ratio |
|---|---|---|
| 1 | a | 188.901 |
| 2 | b | 113.527 |
| 3 | AFC | 45.863 |
| 4 | c | 34.940 |
| 5 | CALV | 8.870 |
| 6 | DM | 1.316 |
| 7 | FAT | 1.284 |
| 8 | SEASON | 1.112 |
| 9 | LACT | 1.112 |
| 10 | SCCSD | 1.093 |
| 11 | PROT | 1.069 |
| 12 | DIM | 1.056 |
| 13 | TD | 1.052 |
| 14 | FATMAX | 1.049 |
| 15 | LACTSD | 1.049 |
| 16 | FATSD | 1.043 |
| 17 | SCC | 1.041 |
| 18 | MILK | 1.037 |
| 19 | UREAMAX | 1.037 |
| 20 | LACTMAX | 1.036 |
| 21 | PROTMAX | 1.035 |
| 22 | MILKMAX | 1.032 |
| 23 | DMMAX | 1.030 |
| 24 | DMSD | 1.029 |
| 25 | PROTMIN | 1.028 |
| 26 | FATMIN | 1.025 |
| 27 | DMMIN | 1.024 |
| 28 | UREAMIN | 1.016 |
| 29 | UREASD | 1.016 |
| 30 | MILKMIN | 1.015 |
| 31 | PROTSD | 1.014 |
| 32 | LACTMIN | 1.012 |
| 33 | SCCMIN | 1.012 |
| 34 | MILKSD | 1.009 |
| 35 | SCCMAX | 1.008 |
| 36 | HERD | 1.008 |
| 37 | UREA | 1.005 |

Ratio—error when input is set to mean divided by error when input is used (for continuous predictors) or average error when input is set to all other categorical levels divided by error when input is used (for categorical predictors), rank—orders predictors according to their decreasing importance from one—the most important predictor to 37—the least important predictor, HERD—herd size, TD—number of test-day records, AFC—age at first calving, DIM—days in milk, a—initial milk yield (Wood’s model parameter), b—rate of increase until the peak is reached (Wood’s model parameter), c—rate of decline after peak production (Wood’s model parameter), MILK—average daily milk yield, MILKMIN—minimum daily milk yield, MILKMAX—maximum daily milk yield, MILKSD—standard deviation of daily milk yield, FAT—average fat content, FATMIN—minimum fat content, FATMAX—maximum fat content, FATSD—standard deviation of fat content, PROT—average protein content, PROTMIN—minimum protein content, PROTMAX—maximum protein content, PROTSD—standard deviation of protein content, LACT—average lactose content, LACTMIN—minimum lactose content, LACTMAX—maximum lactose content, LACTSD—standard deviation of lactose content, UREA—average urea content, UREAMIN—minimum urea content, UREAMAX—maximum urea content, UREASD—standard deviation of urea content, SCC—average somatic cell count, SCCMIN—minimum somatic cell count, SCCMAX—maximum somatic cell count, SCCSD—standard deviation of somatic cell count, DM—average dry matter content, DMMIN—minimum dry matter content, DMMAX—maximum dry matter content, DMSD—standard deviation of dry matter content, CALV—calving difficulty, SEASON—calving season.
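
The ratio defined above can be sketched for a continuous predictor as follows. This is a minimal illustration and not the software used in the study: the toy `predict` function, the sum-of-squares error, and the small data set are all assumptions made for the example, and the categorical case from the definition is not covered.

```python
from statistics import mean


def sum_squared_error(predict, rows, targets):
    """Total squared error of the model over all records."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets))


def sensitivity_ratio(predict, rows, targets, column):
    """Error with one input replaced by its mean, divided by the baseline error."""
    baseline = sum_squared_error(predict, rows, targets)
    col_mean = mean(r[column] for r in rows)
    neutral = [r[:column] + [col_mean] + r[column + 1:] for r in rows]
    return sum_squared_error(predict, neutral, targets) / baseline


def predict(row):
    """Hypothetical fitted model: input 0 matters a lot, input 1 only slightly."""
    return 2.0 * row[0] + 0.1 * row[1]


rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]]
noise = [0.1, -0.1, 0.1, -0.1]  # keeps the baseline error above zero
targets = [predict(r) + e for r, e in zip(rows, noise)]

ratio_strong = sensitivity_ratio(predict, rows, targets, column=0)
ratio_weak = sensitivity_ratio(predict, rows, targets, column=1)
print(ratio_strong, ratio_weak)
```

A ratio close to 1 means the model loses almost nothing when the input is neutralized, which matches the long tail of values near 1.0 in Table 4.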
Table 5. Description of the 10 best multi-layer perceptrons (MLP) with 37 and five predictors.

| Ranking | Number of Input Variables | Network Structure | Quality of the MLP [%], Training Set | Quality of the MLP [%], Validation Set | Quality of the MLP [%], Test Set |
|---|---|---|---|---|---|
| 1 | 37 | 45-29-10 | 96.17 | 95.96 | 95.94 |
| 1 | 5 | 10-19-10 | 83.01 | 83.52 | 82.99 |
| 2 | 37 | 45-27-10 | 90.42 | 89.70 | 89.74 |
| 2 | 5 | 10-20-10 | 79.07 | 79.42 | 78.46 |
| 3 | 37 | 45-29-10 | 88.70 | 88.40 | 88.77 |
| 3 | 5 | 10-6-10 | 75.95 | 76.01 | 75.35 |
| 4 | 37 | 45-24-10 | 86.96 | 86.87 | 86.56 |
| 4 | 5 | 10-20-10 | 74.39 | 74.38 | 74.14 |
| 5 | 37 | 45-22-10 | 86.73 | 87.33 | 86.54 |
| 5 | 5 | 10-12-10 | 71.46 | 71.40 | 70.82 |

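
The first number in each network structure (45 or 10) is larger than the number of predictors (37 or 5). One plausible reading, which is an assumption here and not stated in this section, is that each level of a categorical predictor gets its own input neuron (one-of-N encoding). The short check below shows that this reading reproduces both input-layer widths from the level counts in Table 3; the function name is illustrative.

```python
def input_neurons(n_continuous, categorical_levels):
    """Input-layer width under the assumed one-of-N encoding of categorical predictors."""
    return n_continuous + sum(categorical_levels)


SEASON_LEVELS = 4   # spring, summer, autumn, winter (Table 3)
CALVING_LEVELS = 6  # unassisted, easy, moderate, difficult, abortions, caesarean (Table 3)

# MLP37: 35 continuous predictors (Table 2) plus calving season and calving difficulty.
mlp37_inputs = input_neurons(35, [SEASON_LEVELS, CALVING_LEVELS])

# MLP5: a, b, AFC, and c (continuous) plus calving difficulty, the top five in Table 4.
mlp5_inputs = input_neurons(4, [CALVING_LEVELS])

print(mlp37_inputs, mlp5_inputs)
```

The last number in every structure is 10, one output neuron per culling reason R1 to R10.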
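Table 6. Confusion matrix for the best networks (multilayer perceptrons with one hidden layer and 37 or five input variables) on the test set. Rows give the predicted culling reason and columns the observed culling reason.

| Predicted Culling Reason | No. of Input Variables | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| R1 | 37 | 11 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| R1 | 5 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| R2 | 37 | 1 | 68 | 2 | 0 | 0 | 0 | 6 | 0 | 0 | 0 |
| R2 | 5 | 0 | 49 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| R3 | 37 | 0 | 0 | 279 | 11 | 0 | 0 | 19 | 0 | 0 | 1 |
| R3 | 5 | 0 | 0 | 198 | 56 | 0 | 15 | 0 | 0 | 0 | 16 |
| R4 | 37 | 0 | 3 | 50 | 567 | 0 | 0 | 0 | 0 | 0 | 0 |
| R4 | 5 | 0 | 0 | 68 | 515 | 0 | 0 | 0 | 2 | 0 | 0 |
| R5 | 37 | 0 | 0 | 1 | 0 | 977 | 2 | 2 | 0 | 18 | 43 |
| R5 | 5 | 4 | 15 | 5 | 0 | 755 | 5 | 0 | 0 | 0 | 145 |
| R6 | 37 | 0 | 1 | 18 | 0 | 0 | 1238 | 0 | 0 | 66 | 2 |
| R6 | 5 | 0 | 0 | 33 | 0 | 32 | 996 | 0 | 0 | 362 | 0 |
| R7 | 37 | 0 | 0 | 8 | 4 | 14 | 26 | 4024 | 1 | 0 | 16 |
| R7 | 5 | 0 | 11 | 1 | 0 | 122 | 73 | 4054 | 0 | 86 | 214 |
| R8 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 829 | 0 | 0 |
| R8 | 5 | 0 | 0 | 0 | 12 | 0 | 2 | 0 | 828 | 0 | 0 |
| R9 | 37 | 0 | 3 | 4 | 0 | 2 | 26 | 2 | 0 | 961 | 1 |
| R9 | 5 | 0 | 0 | 6 | 0 | 5 | 215 | 1 | 0 | 599 | 53 |
| R10 | 37 | 0 | 0 | 1 | 1 | 32 | 21 | 2 | 0 | 2 | 809 |
| R10 | 5 | 1 | 0 | 52 | 0 | 111 | 8 | 0 | 0 | 0 | 444 |
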
R1—infectious diseases, R2—respiratory system diseases, R3—low milk yield, R4—nutritive and metabolic diseases, R5—leg diseases, R6—udder diseases, R7—infertility and reproduction problems, R8—old age, R9—accidents, and R10—other. The numbers of correctly classified cases are shown on the diagonal.
Table 7. The correct classification rate for the multilayer perceptrons (MLP) with one hidden layer and the general discriminant analysis (GDA).

| Culling Reason | n | MLP37 Cor. | MLP37 Incor. | MLP37 PPV | MLP5 Cor. | MLP5 Incor. | MLP5 PPV | GDA37 Cor. | GDA37 Incor. | GDA37 PPV | GDA5 Cor. | GDA5 Incor. | GDA5 PPV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R1 | 12 | 91.67 | 8.33 | 91.67 | 58.33 | 41.67 | 100.00 | 0.00 | 100.00 | 0.00 | 0.00 | 100.00 | 0.00 |
| R2 | 75 | 90.67 | 9.33 | 88.31 | 65.33 | 34.67 | 100.00 | 6.67 | 93.33 | 83.33 | 1.33 | 98.67 | 100.00 |
| R3 | 363 | 76.86 | 23.14 | 90.00 | 54.55 | 45.45 | 69.47 | 22.87 | 77.13 | 41.91 | 0.83 | 99.17 | 13.04 |
| R4 | 583 | 97.26 | 2.74 | 91.45 | 88.34 | 11.66 | 88.03 | 88.16 | 11.84 | 70.99 | 90.05 | 9.95 | 66.37 |
| R5 | 1025 | 95.32 | 4.68 | 93.67 | 73.66 | 26.34 | 81.27 | 32.20 | 67.80 | 46.61 | 3.22 | 96.78 | 76.74 |
| R6 | 1314 | 94.22 | 5.78 | 93.43 | 75.80 | 24.20 | 69.99 | 37.98 | 62.02 | 47.89 | 4.19 | 95.81 | 20.44 |
| R7 | 4055 | 99.24 | 0.76 | 98.31 | 99.98 | 0.02 | 88.88 | 91.39 | 8.61 | 58.91 | 97.63 | 2.37 | 49.81 |
| R8 | 830 | 99.88 | 0.12 | 100.00 | 99.76 | 0.24 | 98.34 | 90.36 | 9.64 | 78.13 | 91.32 | 8.68 | 69.73 |
| R9 | 1047 | 91.79 | 8.21 | 96.20 | 57.21 | 42.79 | 68.15 | 2.96 | 97.04 | 40.26 | 0.00 | 100.00 | 0.00 |
| R10 | 872 | 92.78 | 7.22 | 93.20 | 50.92 | 49.08 | 72.08 | 4.82 | 95.18 | 28.00 | 0.00 | 100.00 | 0.00 |
| Total | 10,176 | 95.94 | 4.06 | - | 82.99 | 17.01 | - | 58.57 | 41.43 | - | 52.41 | 47.59 | - |

n—number of records, MLP37—multilayer perceptron with one hidden layer and 37 input variables, MLP5—multilayer perceptron with one hidden layer and five input variables, GDA37—general discriminant analysis with 37 predictors, GDA5—general discriminant analysis with five predictors, Cor.—correct, Incor.—incorrect, PPV—positive predictive value, R1—infectious diseases, R2—respiratory system diseases, R3—low milk yield, R4—nutritive and metabolic diseases, R5—leg diseases, R6—udder diseases, R7—infertility and reproduction problems, R8—old age, R9—accidents, and R10—others.
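
The correct classification rate and the PPV reported in Table 7 can be derived from a confusion matrix such as the one in Table 6. The sketch below does this for MLP37 on the test set, with the cell values as read from Table 6 (rows are predicted reasons, columns are observed reasons). The function name is illustrative, and returning a PPV of 0 when a class is never predicted is an assumed convention.

```python
# Rows = predicted reason, columns = observed reason (R1..R10); MLP37, test set (Table 6).
CONFUSION_MLP37 = [
    [11, 0, 0, 0, 0, 1, 0, 0, 0, 0],
    [1, 68, 2, 0, 0, 0, 6, 0, 0, 0],
    [0, 0, 279, 11, 0, 0, 19, 0, 0, 1],
    [0, 3, 50, 567, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 977, 2, 2, 0, 18, 43],
    [0, 1, 18, 0, 0, 1238, 0, 0, 66, 2],
    [0, 0, 8, 4, 14, 26, 4024, 1, 0, 16],
    [0, 0, 0, 0, 0, 0, 0, 829, 0, 0],
    [0, 3, 4, 0, 2, 26, 2, 0, 961, 1],
    [0, 0, 1, 1, 32, 21, 2, 0, 2, 809],
]


def per_class_rates(cm):
    """Return (correct rate per class, PPV per class, overall accuracy), all in %."""
    k = len(cm)
    observed = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums
    predicted = [sum(row) for row in cm]                            # row sums
    correct = [100.0 * cm[j][j] / observed[j] for j in range(k)]
    # Assumed convention: PPV is 0 when a class is never predicted.
    ppv = [100.0 * cm[i][i] / predicted[i] if predicted[i] else 0.0 for i in range(k)]
    overall = 100.0 * sum(cm[i][i] for i in range(k)) / sum(observed)
    return correct, ppv, overall


correct, ppv, overall = per_class_rates(CONFUSION_MLP37)
for idx, (cor, p) in enumerate(zip(correct, ppv), start=1):
    print(f"R{idx}: correct = {cor:.2f}%, PPV = {p:.2f}%")
print(f"Total correct = {overall:.2f}%")
```

The correct rate divides the diagonal cell by the number of observed cases (column sum), whereas the PPV divides it by the number of predicted cases (row sum), so a class can score high on one and low on the other.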
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Adamczyk, K.; Grzesiak, W.; Zaborski, D. The Use of Artificial Neural Networks and a General Discriminant Analysis for Predicting Culling Reasons in Holstein-Friesian Cows Based on First-Lactation Performance Records. Animals 2021, 11, 721. https://doi.org/10.3390/ani11030721
