1. Introduction
The need to produce more food on less arable land arises from several key factors. With a rapidly growing global population, it is essential to increase food production to meet the rising demand. However, the availability of arable land is limited, and it faces challenges such as urbanization, soil degradation, and climate change. By maximizing food production on limited land, we can promote environmental sustainability by minimizing deforestation and reducing the need for chemical inputs. Additionally, producing more food on less land contributes to food security, particularly in regions vulnerable to hunger and malnutrition. It also improves economic efficiency in agriculture, leading to enhanced profitability, rural development, and overall economic growth [1].
Breeding methodologies like genomic selection (GS) play a crucial role in improving productivity worldwide due to several key factors. GS enables precision breeding by identifying specific genes and markers associated with desirable traits, allowing for targeted selection of high-yielding varieties or breeds. By predicting genetic potential early on, GS accelerates breeding cycles and expedites the development of productive individuals. The accuracy of trait predictions is enhanced through the utilization of genomic data, enabling informed decisions for selecting individuals with higher productivity traits [2]. GS also facilitates the development of cultivars or breeds that are adaptable to diverse environments, ensuring productivity across different regions [3]. Moreover, by optimizing breeding efforts, GS promotes sustainable resource management, minimizing waste and environmental impact while meeting global food demands.
Genomic selection leverages advances in genomics and data analysis as it allows breeders to make more informed decisions in selecting individuals with desired traits, leading to faster and more precise breeding outcomes. By utilizing large-scale genomic data, GS accurately predicts the genetic potential of plants at an early stage, significantly reducing the time and resources required for traditional breeding methods. This acceleration of breeding cycles enables the development of improved varieties with enhanced traits such as yield, disease resistance, and nutritional quality. Furthermore, GS expands the scope of breeding by identifying and incorporating favorable traits from diverse genetic backgrounds, promoting genetic diversity and adaptability. Overall, GS is transforming plant breeding by optimizing efficiency, precision, and genetic gains, ultimately contributing to the development of sustainable and high-performing crop varieties [3,4].
However, it is important to recognize that the presence of genes or markers alone does not guarantee a high-yielding variety that is well adapted to stress conditions. This limitation arises because not all identified genes are necessarily functional in a given context, particularly under abiotic or biotic stress conditions. To address this, transcriptomics and associative transcriptomics play a crucial role in identifying functional genes linked to specific traits, such as stress tolerance. These approaches enable the assessment of gene expression and functionality, thereby offering deeper insights into which genes are actively contributing to a trait. Incorporating transcriptomic data into genomic prediction models can significantly enhance the precision of selecting stress-tolerant, high-yielding varieties, providing a complementary layer of information beyond marker-based selection [5,6].
Other challenges arise when implementing GS in plant breeding programs that deal with large and complex genomes, such as those of wheat and maize. Large plant genomes often contain a substantial amount of repetitive DNA, which can complicate marker identification and analysis [7]. Additionally, the lack of comprehensive knowledge about the genetic basis of many important agronomic traits further complicates the application of GS, particularly for polygenic traits that are influenced by numerous loci with small effects [8].
Furthermore, genes may behave differently depending on the genetic context in which they are expressed, leading to variable performance across environments and genetic populations [9]. The lack of robust, high-throughput phenotyping platforms capable of capturing complex traits across environments remains a significant bottleneck in breeding programs [10]. Addressing these limitations requires further integration of functional genomics, high-throughput phenotyping, and better models that account for environmental and genetic interactions. In summary, large genomes, insufficient phenotyping, and gene expression variability remain central challenges for genomic selection.
However, for a successful implementation of GS in breeding programs, high accuracy is essential [11]. Accurate predictions enable breeders to make reliable decisions, maximizing the genetic potential of offspring and leading to improved traits and higher productivity. High accuracy helps avoid selecting suboptimal individuals, saving valuable resources and time, and allows breeders to focus on specific traits of interest, meeting market demands and environmental challenges. Trust and confidence in the approach are built through accurate predictions, facilitating wider adoption of GS. Ultimately, accurate predictions drive genetic improvement in crops and livestock, contributing to the success of breeding programs.
To enhance the efficiency of the GS methodology, a wide array of statistical machine learning models has been explored, encompassing both parametric approaches such as mixed models, Bayesian models, and penalized regression, as well as nonparametric models such as random forest, gradient boosting machine, and deep learning [12]. Nevertheless, it is essential to recognize the constraints imposed by the No Free Lunch Theorem in statistical machine learning, which asserts the absence of a universal algorithm excelling in all conceivable tasks. Consequently, any enhancement in performance for one task necessitates a trade-off, potentially resulting in reduced performance in another domain. This underscores the absence of one-size-fits-all solutions in the pursuit of optimal performance across diverse problem domains. As a result, even though certain statistical machine learning models have exhibited commendable performance within the genomic prediction context, practical implementation often encounters challenges due to insufficient prediction accuracies. This limitation can be attributed to the multifaceted nature of the GS methodology, a predictive approach influenced by numerous factors.
In the realm of genomic selection (GS), where the primary objective is to identify and select the most promising genetic lines with high accuracy, it becomes imperative to incorporate robust metrics into the selection process. Sensitivity and Specificity serve as pivotal measures in this regard. Sensitivity, traditionally applied in diagnostic testing contexts, signifies the capacity of a test to correctly identify individuals possessing a particular characteristic [13]. A heightened Sensitivity translates to a diminished rate of false negatives, ensuring that fewer instances of the desired characteristic are overlooked or misjudged by the selection process. In the context of plant breeding, where the aim is to pinpoint the most advantageous genetic lines, it is paramount for the models to exhibit a commendable level of Sensitivity. This ensures that the selection process accurately identifies and advances the most promising genetic lines, thereby optimizing breeding outcomes.
Conversely, Specificity characterizes the capability of a diagnostic test to accurately exclude individuals lacking a specific characteristic. It quantifies the precision with which the test identifies individuals devoid of the characteristic under consideration [13]. Within the realm of plant breeding, it is equally crucial to uphold a reasonable level of Specificity. This ensures that the selection process does not erroneously favor or advance genetic lines that do not possess the desired characteristics. By maintaining a balanced approach that encompasses both Sensitivity and Specificity, the selection process can effectively discriminate between superior genetic lines and those that do not meet the desired criteria, thereby enhancing the efficiency and efficacy of genomic selection in plant breeding endeavors. However, since the selection of the best lines in the context of GS is performed with predictions resulting from regression models, lines with larger predicted phenotypic or breeding values are selected when the trait of interest is grain yield or another trait for which a larger score is better.
On the other hand, lower predicted values can be desirable when they signify a higher probability of favorable traits (e.g., disease resistance) in the selected lines. Therefore, in genomic prediction for traits such as disease resistance or pest tolerance, where lower values indicate superior performance, selecting lines with lower predicted values can lead to the development of improved crop varieties with enhanced resilience to biotic stresses. Under both scenarios, selecting the lines with the largest (or lowest) predictions tends to guarantee considerably high Specificity but considerably low Sensitivity. Many plant breeders may not be fully aware of this issue, as Sensitivity and Specificity metrics are not commonly integrated explicitly into their selection processes for identifying the best genetic lines. However, it is important to recognize that Sensitivity and Specificity are both critical measures of the accuracy and reliability of selected lines in plant breeding programs.
For this reason, a good balance between Sensitivity and Specificity is desired in plant breeding: high Sensitivity ensures that valuable cultivars with desirable characteristics are not overlooked, maximizing the potential for identifying superior genetic material, while high Specificity ensures that resources are efficiently allocated by effectively excluding cultivars lacking the desired characteristics. For this reason, in this paper we explore some existing methods for selecting the best lines and evaluate their Sensitivity and Specificity. We aim to provide detailed explanations of each existing selection method, introduce Sensitivity and Specificity as metrics for comparing the chosen selection candidates, and conduct benchmarking to strengthen the empirical evidence on their performance. Our benchmarking analysis encompasses five real datasets, four related to maize and one to soybean, each with multiple traits and environments.
2. Materials and Methods
This study included Maize Data 2–5 (Maize_1, Maize_2, Maize_3, Maize_4) and Soybean Data 9 (Soybean_4) from the data employed by Montesinos-López et al. [14] (see Table A1 of Montesinos-López et al. [14]).
The current analysis with different models is based on the datasets described in Table 1. Four datasets are from maize trials, while the remaining one is from soybean. The population sizes ranged from 999 to 1864 genotypes, while the number of quality SNP markers applied ranged from 1803 to 4085. The four maize datasets each comprised 11 environments, while the soybean dataset had 8. Four traits were included in each maize dataset, while the soybean dataset comprised six traits.
Table 1. Description of the datasets used for performing the benchmarking analysis. Data come from Xavier et al. (2021) [15], recently cited by Montesinos-López et al. (2024) [14].
Data Number | Dataset | Genotypes | Markers | Environments | Traits | Reference |
---|---|---|---|---|---|---|
Data 1 | Maize_1 | 1000 | 4085 | 11 | 4 | Maize Data 2 from [14,15] |
Data 2 | Maize_2 | 1000 | 4085 | 11 | 4 | Maize Data 3 from [14,15] |
Data 3 | Maize_3 | 1000 | 4085 | 11 | 4 | Maize Data 4 from [14,15] |
Data 4 | Maize_4 | 999 | 4085 | 11 | 4 | Maize Data 5 from [14,15] |
Data 5 | Soybean_4 | 1864 | 1803 | 8 | 6 | Soybean Data 9 from [14,15] |
All statistical models described in the next section were implemented for each environment of each dataset, and the results were summarized for each dataset across environments. For this reason, the predictors of the statistical models given in the next section do not take the effect of environments into account.
2.1. Statistical Models
The five statistical models studied in this research can be grouped into two main classes:
GBLUP (RC), GBLUP with a threshold (R), and GBLUP with an optimal fine-tuned threshold (RO). As described below, models R and RO are the basic RC model with certain refinements in how the thresholds are defined;
TGBLUP (threshold GBLUP) with a threshold of 0.5 that classifies candidates as top and non-top (B), and TGBLUP with an optimal probability threshold, denoted as model BO. The BO model is also trained with the TGBLUP, but in place of a 0.5 threshold to classify the lines as top and not top, an optimal probability threshold that guarantees similar Sensitivity and Specificity is used.
Thus, our study includes models RC, R, and RO, as well as B and BO that are described below.
2.1.1. Model RC
Model RC, known as the Genomic Best Linear Unbiased Predictor (GBLUP) model, is structured as a regression framework. The model is defined as follows:

$$Y_i = \mu + g_i + \epsilon_i, \quad i = 1, \dots, n \quad (1)$$

Here, $Y_i$ represents the continuous response variable observed in the $i$th instance; these are BLUEs resulting after accounting for environments and experimental factors (blocks, reps, etc.). $\mu$ stands for the general mean or intercept, and $g_i$ signifies the random effect associated with the $i$th genotype. Additionally, $\epsilon_i$ denotes the random error component for the $i$th genotype, distributed as an independent normal random variable with a mean of 0 and a variance of $\sigma^2$. It is assumed that $\boldsymbol{g} = (g_1, \dots, g_n)^T \sim N(\boldsymbol{0}, \sigma_g^2 \boldsymbol{G})$, where $\boldsymbol{G}$ is a linear kernel referred to as the genomic relationship matrix, calculated using the method outlined in [16]. This model has been implemented in the R statistical software [17], utilizing the BGLR library [18].
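As a concrete illustration, the following minimal R sketch (not the exact script used in this study) fits model (1) with BGLR; the toy marker matrix, response, and MCMC settings are illustrative placeholders.

```r
library(BGLR)

set.seed(1)
X <- matrix(sample(0:2, 100 * 500, replace = TRUE), nrow = 100)  # toy SNP matrix
y <- rnorm(100)                                                  # toy BLUEs

Z <- scale(X, center = TRUE, scale = TRUE)   # centered and standardized markers
G <- tcrossprod(Z) / ncol(Z)                 # genomic relationship matrix (VanRaden-type)

fm <- BGLR(y = y,
           ETA = list(list(K = G, model = "RKHS")),  # GBLUP fitted as RKHS with kernel G
           nIter = 5000, burnIn = 1000, verbose = FALSE)
yHat <- fm$yHat                              # predicted values for all lines
```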
For each fold of the cross-validation, we train model (1) using the whole training set, predictions are computed for the whole testing set, and the predicted values are then classified as top lines or not top lines. Given our interest in selecting the top-performing lines for each trait, we introduce the threshold $\tau$. This threshold is determined as the empirical quantile at level $q$ of the training response values $(y_1, \dots, y_{n_{trn}})$. For our purposes, we have chosen $q = 0.8$, but it is important to note that any other value between 0 and 1 can be employed. The classification of the lines in the testing set as top lines (1) and not top lines (0) under this model was performed by first ordering the lines in the testing set (both observed and predicted) in decreasing order. Then, we identified how many lines of the observed response were larger than the threshold $\tau$, and the same number of lines was selected from the ordered vector of predictions. It is important to point out that, from the predicted lines, we selected the same number of best-performing lines as in the vector of the observed response variable, that is, the top lines, but in this case they were not chosen with regard to the threshold $\tau$. Finally, the lines that matched between the two selected vectors were classified as top lines, and those that did not were classified as not top lines.
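The count-matching part of this rule can be sketched in a few lines of R; the function and variable names below are ours, and the snippet returns only the predicted top/not-top labels that are then matched against the observed top lines.

```r
# One reading of the RC rule: select from the ordered predictions the same
# number of lines that exceed tau among the observed testing responses.
classify_rc <- function(y_obs, y_pred, tau) {
  n_top <- sum(y_obs > tau)           # observed lines above the threshold
  labels <- integer(length(y_pred))   # 0 = not top line
  labels[order(y_pred, decreasing = TRUE)[seq_len(n_top)]] <- 1  # same count from predictions
  labels
}

# Example usage (tau as the 80% quantile of a training response):
# tau <- quantile(y_train, 0.8); classify_rc(y_test, yHat_test, tau)
```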
2.1.2. Model R
Under this model, the predictions were obtained exactly with the trained model RC given above with Equation (1), but the process of classification was performed differently. For this reason, model R stood for GBLUP with a threshold. Here, the threshold,
, was employed to classify the lines into two categories: top lines (denoted as 1) if
, for
) and not top lines (denotes as 0) if
. That is, under this approach after obtaining the continuous predictions using the model RC, for any lines with predicted values exceeding the threshold,
were classified as top lines, while those with predicted values below the threshold were classified as not top lines [
19].
2.1.3. Model RO
The acronym RO stands for Regression Optimum; this model leverages the RC model in its training process to fine-tune the threshold, so it consists of the GBLUP model with an optimal threshold. Here, too, the initial threshold was the 80% quantile of the response variable in the training set; however, this initial threshold was adjusted to ensure similar Sensitivity and Specificity. A schematic representation of the procedural steps involved in the training process of model RO is illustrated in Figure 1. Following the approach previously applied by Montesinos-López et al. [19], these steps are briefly elucidated as follows:
Step 1: Initiate by partitioning the data into distinct subsets, comprising the inner-training, validation, and test sets;
Step 2: Proceed to train Model RC using the inner-training set while utilizing the original response variable;
Step 3: Utilize the trained Model RC (from Step 2) on the validation set to compute predicted continuous values, $\hat{y}_i$ for $i = 1, \dots, n_{val}$. Subsequently, employ these predicted values to look for the optimal threshold $\tau^*$;
Step 4: This optimal threshold ($\tau^*$) is identified as the value that minimizes the average squared difference between Sensitivity and Specificity;
Step 5: Next, with the complete training dataset (comprising both the inner-training and validation sets), retrain Model RC. Use this refitted model to compute predicted values for the testing set, resulting in $\hat{y}_i$ for $i = 1, \dots, n_{tst}$;
Step 6: Subsequently, employing the optimal threshold ($\tau^*$) computed in Step 4 and the predicted values from the testing set in Step 5, classify the lines. If $\hat{y}_i > \tau^*$, categorize the line as a top line (1); otherwise, classify it as a not top line (0).
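Step 4 can be sketched as a simple grid search over candidate thresholds on the validation predictions; the names below are illustrative, and ties are broken by taking the first minimizer.

```r
# Find the threshold tau* that minimizes the squared gap between
# Sensitivity and Specificity on the validation set; tau0 is the
# initial 80% quantile threshold used to define the observed labels.
optimal_threshold <- function(y_val, yhat_val, tau0) {
  truth <- as.integer(y_val > tau0)        # observed top (1) / not top (0)
  grid  <- sort(unique(yhat_val))          # candidate thresholds
  gap <- sapply(grid, function(t) {
    pred <- as.integer(yhat_val > t)
    sens <- sum(pred == 1 & truth == 1) / max(sum(truth == 1), 1)
    spec <- sum(pred == 0 & truth == 0) / max(sum(truth == 0), 1)
    (sens - spec)^2                        # balance criterion
  })
  grid[which.min(gap)]                     # optimal threshold tau*
}
```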
Figure 1 visually illustrates the incorporation of model R within the training process of model RO.
It is noteworthy that this refined optimal rule can be expressed in terms of the conventional threshold value ($\tau$). This equivalence arises because classifying a line as top when $\hat{y}_i > \tau^*$ is analogous to classifying it when $\hat{y}_i^* > \tau$, where $\hat{y}_i^* = \hat{y}_i + (\tau - \tau^*)$. These modified, or adjusted, predicted values are designed to ensure a congruent balance between Sensitivity and Specificity.
It is essential to highlight that, for this RO method, we have introduced a more computationally efficient version, which we refer to as the “Simple (S) RO method”. This approach involves training model (1) only once, rather than k times, using the complete training (inner-training + validation) dataset, and then using this trained model to predict the complete testing sets. That is, we utilize the predictions from model RC prior to classification, which makes the S RO method computationally more efficient because model RC is trained only once. Subsequently, the resulting predicted values are split into an outer training and validation set; this splitting is carried out to choose the optimal threshold, but it uses the predicted values obtained from training on the full training set a single time. Employing a 10-fold inner cross-validation, we determine the optimal threshold for classifying the lines in the testing dataset. The schematic representation of this Simple RO method is akin to Figure 1, with the noteworthy difference that we only train model (1) once. This eliminates the need to repeatedly train model (1) for the number of folds specified in the inner cross-validation, resulting in a significantly more computationally efficient implementation. In all models under study, we focus on selecting the top-performing lines, assuming that these are the best lines. For this reason, Sensitivity is associated with how well the top lines are selected, while Specificity regards how well the non-top lines are excluded.
2.1.4. Model B
Model B, known as the Threshold Bayesian Probit Binary model (TGBLUP), operates on the premise that, given $\boldsymbol{x}_i$ (covariates of dimension $p$), $Y_i$ is a random variable taking binary values, 0 and 1, with the following probabilities:

$$P(Y_i = 1) = \Phi(\mu + g_i), \quad P(Y_i = 0) = 1 - \Phi(\mu + g_i), \quad (2)$$

where $\mu$ represents the intercept parameter, $g_i$ signifies the random effect associated with the $i$th genotype, distributed as per the definition in model (1), and $\Phi$ is the cumulative distribution function of the standard normal distribution. Furthermore, $l_i = \mu + g_i + \epsilon_i$ represents the latent continuous normal process that underlies the observed categories (top lines and not top lines), where $\epsilon_i$ is a normal random error with a mean of 0 and a variance of 1. These $l_i$ values are referred to as “liabilities” [20,21]. The binary categorical phenotypes in model (2) are derived from the underlying phenotypic values $l_i$ as follows: $Y_i = 1$ if $l_i > 0$, and $Y_i = 0$ otherwise.
Since model (2) is articulated within a Bayesian framework, it assumes a flat prior distribution for $\mu$ ($p(\mu) \propto 1$). The TGBLUP has been implemented in the BGLR package [18] within the R statistical software [17]. Under this model, after the training process, the probability of $Y_i = 1$ is computed for each line in the testing set; if this probability is larger than 0.5, the line is classified as a top line, and otherwise as a not top line.
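A hedged sketch of fitting model (2) with BGLR's ordinal (probit) machinery is given below; the toy data and MCMC settings are illustrative, and we assume, as in BGLR's ordinal examples, that the fitted object's probs matrix stores the per-class probabilities.

```r
library(BGLR)

set.seed(2)
X  <- matrix(sample(0:2, 100 * 500, replace = TRUE), nrow = 100)  # toy SNP matrix
y  <- rnorm(100)
yb <- as.integer(y > quantile(y, 0.8))        # 1 = top line, 0 = not top line

Z <- scale(X); G <- tcrossprod(Z) / ncol(Z)   # genomic relationship matrix
fm <- BGLR(y = yb,
           ETA = list(list(K = G, model = "RKHS")),
           response_type = "ordinal",         # Bayesian probit for the binary outcome
           nIter = 5000, burnIn = 1000, verbose = FALSE)

p_top  <- fm$probs[, 2]                       # assumed slot for P(Y_i = 1) per line
labels <- as.integer(p_top > 0.5)             # model B rule: 0.5 probability cutoff
```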
2.1.5. Model BO
Model BO is also trained with the TGBLUP using the model given in Equation (2), but in place of using a threshold of 0.5 to classify the lines as top and not top, we used an optimal probability threshold that guarantees similar Sensitivity and Specificity. For estimating this optimal probability threshold (hyperparameter), in addition to dividing the data into training and testing, the training set was divided into inner-training and validation. According to Montesinos-López et al. [19], all the steps for implementing this method are given next:
Step 1: Commence by converting the continuous response variable into a binary response variable, utilizing the same threshold as employed previously, i.e., the 80% quantile of the training set. Specifically, when the values of the continuous traits surpass the designated threshold, assign them a value of one (1, denoting a top line); otherwise, assign zero (0, signifying not top lines);
Step 2: Initially, partition the data into distinct subsets, namely the inner-training, validation, and test sets;
Step 3: Proceed to train model B, a classification model, using the inner-training set;
Step 4: Employ the trained model B (from Step 3) on the validation set to compute the predicted probabilities, $\hat{p}_i$ for $i = 1, \dots, n_{val}$. Subsequently, utilize these predicted probability values to estimate classification accuracy metrics, facilitating the selection of the optimal probability threshold, $\delta^*$;
Step 5: Identify the optimal probability threshold, $\delta^*$, which minimizes the average of the squared difference between Sensitivity and Specificity;
Step 6: Next, with the complete training dataset (comprising both the inner-training and validation sets), retrain model B and generate probability predictions for the testing set, $\hat{p}_i$ for $i = 1, \dots, n_{tst}$;
Step 7: Subsequently, employing the optimal probability threshold ($\delta^*$) determined in Step 5 and the predicted probabilities from the testing set in Step 6, classify the lines. If $\hat{p}_i > \delta^*$, categorize the line as a top line (1); otherwise, classify it as a not top line (0).
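Steps 4 and 5 mirror the threshold search used for model RO, except that the scan runs over a probability grid; a sketch with illustrative names follows.

```r
# Find the probability threshold delta* balancing Sensitivity and Specificity;
# p_val are validation probabilities from model B and truth_val the binarized labels.
optimal_prob_threshold <- function(truth_val, p_val,
                                   grid = seq(0.01, 0.99, by = 0.01)) {
  gap <- sapply(grid, function(t) {
    pred <- as.integer(p_val > t)
    sens <- sum(pred == 1 & truth_val == 1) / max(sum(truth_val == 1), 1)
    spec <- sum(pred == 0 & truth_val == 0) / max(sum(truth_val == 0), 1)
    (sens - spec)^2
  })
  grid[which.min(gap)]                  # optimal probability threshold delta*
}
```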
For further elaboration on these steps, please refer to Figure 2.
For this BO method, we have also introduced a more computationally efficient version, which we call the “Simple (S) BO method”. Here is how it works: we train model (2) using the entire training (inner-training + validation) dataset and then use this trained model to predict the entire testing set, leveraging the predictions from model B before classification. This S BO method reduces computational resources since, instead of training the model k times with the training data, the classification model used to compute the probabilities is trained only once on the whole training set (Figure 2). Next, we take these predicted values for each fold and split them into an outer training and validation set. Through a 10-fold inner cross-validation process, we determine the best threshold for classifying the lines in the testing dataset. The visual representation of the Simple BO method is similar to Figure 2, but the key difference is that we only train model (2) once. This eliminates the need to repeatedly train model (2) for the specified number of folds in the inner cross-validation, resulting in a significantly more efficient implementation of the Simple BO method.
Ultimately, as all seven evaluated models (RC, R, RO, Simple RO, B, BO, and Simple BO) generate predictions in the form of binary outcomes (0 for not top lines and 1 for top lines), classification metrics have been computed to evaluate prediction accuracy for the testing sets.
2.2. Evaluation of Prediction Performance
In our study, we conducted a rigorous evaluation process for benchmarking the proposed models using a nested cross-validation approach. This approach involved two levels of cross-validation: outer-fold cross-validation and inner-fold cross-validation, as outlined in Montesinos-López et al. [12]. The outer-fold cross-validation aimed to assess the prediction accuracy of our models on unseen data. We utilized a 5-fold cross-validation strategy, where the dataset was randomly split into five subsets or “folds”. The model was trained on four of these folds while the remaining one was reserved for testing. This process was repeated until each fold had served as the test set once. It is important to note that the test sets were exclusively used for evaluation purposes and were never incorporated into the model training process. The average performance across these five testing sets was reported, employing four distinct metrics, as elaborated in the subsequent sections.
Additionally, we computed prediction performance metrics based on the average results obtained from the 5-fold cross-validation. These metrics included the Kappa coefficient, Sensitivity, Specificity, and F1 score. For models B, R, and RC, no hyperparameter tuning was necessary, but for models BO and RO, we fine-tuned a critical hyperparameter: the probability threshold for model BO and a threshold for the RO model. This was carried out to ensure a balance between Sensitivity and Specificity. To achieve this balance, we conducted inner-fold cross-validation using ten folds. The goal was to optimize the threshold values, which were selected as the average threshold value across the ten folds of the inner cross-validation. These optimized thresholds were subsequently employed in the classification of lines into top and non-top categories within each testing set, as illustrated in Figure 1 and Figure 2.
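The overall scheme can be summarized with the following R skeleton, in which predict_fun and threshold_fun are placeholders standing in for the fitted models and the balance search described above; the toy data and dummy functions exist only to make the skeleton runnable.

```r
set.seed(3)
y <- rnorm(100)                                                     # toy responses
predict_fun   <- function(trn, new) rep(mean(y[trn]), length(new))  # stand-in for model (1)/(2)
threshold_fun <- function(obs, pred) unname(quantile(pred, 0.8))    # stand-in for balance search

outer <- sample(rep(1:5, length.out = length(y)))      # 5 outer folds (testing)
results <- lapply(1:5, function(k) {
  trn <- which(outer != k); tst <- which(outer == k)
  inner <- sample(rep(1:10, length.out = length(trn))) # 10 inner folds (tuning)
  taus <- sapply(1:10, function(j) {
    val <- trn[inner == j]; inn <- trn[inner != j]
    threshold_fun(y[val], predict_fun(inn, val))
  })
  list(test_idx = tst, tau_star = mean(taus))          # average tuned threshold per outer fold
})
```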
Subsequently, we computed various metrics based on the predictions generated by the five models (R, RC, RO, B, and BO) for each testing dataset. These metrics are elucidated as follows. Kappa coefficient (κ): a statistical measure of the degree of agreement among raters, accounting for chance, defined as $\kappa = (P_o - P_e)/(1 - P_e)$, where $P_o$ represents the agreement between the predicted and observed values and is computed as $P_o = (TP + TN)/N$, with TN denoting the number of true negatives, TP the number of true positives, FN the number of false negatives, FP the number of false positives, and $N = TP + TN + FP + FN$; $P_e$ denotes the probability of chance agreement and is calculated as $P_e = (TP + FN)/N \times (TP + FP)/N + (FP + TN)/N \times (FN + TN)/N$. Sensitivity: the probability of obtaining a positive test result when the true condition is indeed positive, expressed as Sensitivity = TP/(TP + FN). Specificity: the probability of obtaining a negative test result when the true condition is negative, formulated as Specificity = TN/(TN + FP). Precision: the ratio of correctly predicted positive observations to the total predicted positive observations, defined as Precision = TP/(TP + FP); a higher Precision corresponds to a lower false positive rate and signifies superior prediction accuracy. These metrics serve as critical indicators for evaluating the performance of our models, providing valuable insights into their predictive capabilities and the accuracy of their assessments.
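The metrics just defined can be computed directly from the confusion-matrix counts, as in the following sketch (function and variable names are ours):

```r
# Compute Kappa, Sensitivity, Specificity, Precision, and F1 from
# observed (obs) and predicted (pred) binary labels (1 = top line).
classification_metrics <- function(obs, pred) {
  TP <- sum(pred == 1 & obs == 1); TN <- sum(pred == 0 & obs == 0)
  FP <- sum(pred == 1 & obs == 0); FN <- sum(pred == 0 & obs == 1)
  N  <- TP + TN + FP + FN
  p0 <- (TP + TN) / N                                  # observed agreement
  pe <- (TP + FN) / N * (TP + FP) / N +                # chance agreement
        (FP + TN) / N * (FN + TN) / N
  sens <- TP / (TP + FN); spec <- TN / (TN + FP)
  prec <- TP / (TP + FP)
  c(Kappa = (p0 - pe) / (1 - pe),
    Sensitivity = sens, Specificity = spec,
    Precision = prec, F1 = 2 * prec * sens / (prec + sens))
}
```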
The F1 score, as a composite metric, provides a balanced evaluation by considering both Sensitivity and Precision. This holistic approach acknowledges the impact of false negatives and false positives in the assessment, making it particularly valuable when dealing with datasets that exhibit imbalanced class distribution. Accuracy, on the other hand, is most effective when the costs associated with false positives and false negatives are comparable. If there is a significant disparity in the cost implications of these errors, it is advisable to examine both Precision and Sensitivity [13].
To facilitate interpretation, we compared model RO vs. R, RC, B, and BO, and model BO vs. R, RC, B, and RO. We computed the relative efficiency (RE) in terms of the Kappa score, denoted as $RE_{Kappa}(y, z)$, as follows:

$$RE_{Kappa}(y, z) = \frac{Kappa_y}{Kappa_z}$$

Here, $Kappa_y$ and $Kappa_z$ represent the Kappa coefficients of two of the five models (RO, R, RC, B, and BO). Similarly, concerning Sensitivity, the relative efficiency, denoted as $RE_{Sensitivity}(y, z)$, was computed as:

$$RE_{Sensitivity}(y, z) = \frac{Sensitivity_y}{Sensitivity_z}$$

Again, this calculation involved any two of the five models, and the same approach was employed to calculate the relative efficiency for the F1 score and Specificity. Under all four metrics (Kappa, Sensitivity, Specificity, F1), $RE_x(y, z) > 1$, where $x$ represents Kappa, Sensitivity, Specificity, or F1, indicates that method $y$ yielded superior prediction performance. Conversely, when $RE_x(y, z) < 1$, the preferable method is $z$. In cases where $RE_x(y, z) = 1$, both methods exhibit equal efficiency in their predictive capabilities. This systematic evaluation approach aids in identifying the most effective method for the given context.
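As a toy illustration with invented Kappa values:

```r
# Relative efficiency as the ratio of a metric between two methods;
# the two Kappa values below are made up for the example.
RE <- function(metric_y, metric_z) metric_y / metric_z
RE(0.60, 0.48)  # = 1.25 > 1, so method y is preferred under this metric
```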
4. Discussion
Achieving high prediction accuracies in the application of the GS methodology necessitates several prerequisites; yet, efficiently optimizing the numerous factors that influence its accuracy poses a significant challenge. Given the importance of enhancing predictive methodologies, considerable research efforts are directed towards identifying and implementing strategies that can significantly improve its efficiency. Various studies have examined the impact of training set size and diversity on the accuracy of these methods. Furthermore, investigations have explored how the population structure and its genetic relationship with the breeding population influence the Precision of genomic prediction. Other research areas include the effects of marker density and distribution, linkage disequilibrium, the genetic architecture and heritability of traits, and the exploration of novel statistical machine learning models [5]. Additionally, the integration of supplementary inputs such as proteomics, metabolomics, and enviromics has been examined for their potential to enrich and refine predictive analytics. All these investigations seek to improve the efficiency of the GS methodology and show that improving GS is an ongoing process requiring a combination of innovative methodologies, rigorous validation, and a deep understanding of the biological context. Additionally, staying informed about emerging technologies and ethical considerations is essential in this field.
Regarding statistical machine learning methods, many parametric (mixed models, Bayesian methods) and nonparametric (deep learning, random forest, gradient boosting machines, etc.) state-of-the-art algorithms have been explored in the context of genomic prediction. Notable algorithms that leverage genetic data to predict phenotypic traits or outcomes include GBLUP, Bayesian methods (A, B, C, Lasso, etc.), random forest, gradient boosting machines, support vector machines, deep learning, and kernel methods. However, the predictions resulting from most of these algorithms are not yet optimal for practical applications. For these reasons, researchers continue to work on improving genomic prediction models by developing more sophisticated algorithms, incorporating additional sources of data (e.g., transcriptomics, epigenetics), and refining methods for accounting for complex genetic interactions and environmental factors. While these models have made significant progress, achieving optimal accuracy in all scenarios remains a complex and ongoing challenge in genomics.
The genomic prediction models, while providing extensive results, were validated by comparing predicted values with observed phenotypic data. The high agreement between observed and predicted values supports the practical applicability of these models in breeding programs. Furthermore, cross-validation across environments demonstrated the models’ robustness and generalizability, making them useful in predicting traits under various environmental conditions, even those not represented in the original dataset.
These findings suggest that while the statistical methods are sophisticated, their practical value lies in their ability to reduce the need for exhaustive field testing, accelerating the selection process by enabling breeders to focus on high-potential genotypes early in the breeding cycle. This application demonstrates that genomic prediction can be an effective tool for enhancing breeding efficiency and genetic gain across diverse environments.
Therefore, with the aim of enhancing the accuracy of identifying the superior (or inferior) lines, this research delineates and contrasts five established methodologies for selecting the top (or bottom) lines, evaluating prediction accuracy in terms of classification metrics such as Sensitivity and Specificity. Each method was described in detail to avoid confusion between these methods and to highlight the simplicity or complexity of each. In our comparative analysis with the five real datasets, we observed that, under the original method, model RO outperformed the others in terms of F1 score. Specifically, it exceeded model B by 42.37%, model BO by 9.62%, model R by 60.87%, and model RC by 17.63%. Meanwhile, in terms of the Kappa coefficient, the RO model was superior to models B, BO, R, and RC by 37.46%, 36.21%, 52.18%, and 3.95%, respectively. In terms of Sensitivity, model RO outperformed models B, R, and RC by 145.74%, 250.41%, and 86.20%, respectively. Our results also show that the second-best model was BO, which was only slightly worse than the RO model.
Our results unequivocally highlight the effectiveness of methods RO and BO in identifying the top lines. These methods incorporate a post-processing step to ensure comparable Sensitivity and Specificity, making them better choices. Nonetheless, it is important to acknowledge that achieving this enhanced classification accuracy comes at the cost of increased computational resources during the tuning process, specifically in the selection of the optimal threshold for final line classification. However, this upsurge in computational demands poses no significant challenge when dealing with small to moderately sized datasets. In such cases, only a single hyperparameter—the optimal threshold—needs tuning. Consequently, the advantages offered by the RO and BO methods far outweigh the associated costs. Furthermore, our research reveals that simplified versions of the original RO and BO methods remain highly competitive. These simplified approaches deliver nearly equivalent prediction performance while substantially reducing computational resource requirements compared to the original RO and BO methods. Given this finding, we encourage the adoption of the original RO and BO methods in real-world applications, especially for datasets of small to moderate size, where their implementation is straightforward and beneficial.
Additionally, our results clearly show that the R method consistently yields the lowest performance across all measures. This can be attributed to the persistent bias in predicting lines in the tails, whether they are top or bottom lines. Consequently, the R method tends to underestimate predictions for the top lines and overestimate predictions for the bottom lines, resulting in significant misclassification error when selecting the optimal lines with respect to a threshold.
Additionally, this paper offers a compelling perspective by conceptualizing the challenge of choosing the top (or bottom) lines in breeding programs as a classification problem. Even though some of the proposed methods (R, RC, and RO) employ a regression model during the initial training phase, this approach reframes the selection process as a classification problem. Consequently, the paper introduces a set of classification metrics, including Sensitivity, Specificity, F1 score, and Kappa coefficient, to assess the accuracy and quality of top line selection. These metrics provide a more appropriate and insightful means of evaluating the effectiveness of the chosen top lines. Also, our results show evidence that models BO and RO provide more balanced Sensitivity and Specificity relative to the other methods, which is of paramount importance, since metrics like Sensitivity and Specificity in plant breeding facilitate the precise selection of superior lines, optimization of breeding programs, reduction in errors, improvement of trait selection, and quantitative evaluation of line performance. These metrics play a crucial role in enhancing the efficiency, effectiveness, and success of plant breeding efforts.
Assessing Sensitivity and Specificity
Sensitivity, defined as the model’s ability to correctly identify true positives, is a crucial metric in our analysis, particularly in the context of genomic prediction for breeding. However, we acknowledge that an emphasis on maximizing Sensitivity may inadvertently lead to an increase in false positives, which can complicate the selection of superior phenotypes.
To address this concern, we implemented several strategies to balance Sensitivity and Specificity within our models. We carefully selected thresholds for predicting positive outcomes that aim to minimize false positives while still capturing a high proportion of true positives. Additionally, we utilized cross-validation techniques to assess model performance across diverse datasets, ensuring that our findings are robust and generalizable.
Moreover, we conducted an analysis of the trade-offs involved in adjusting Sensitivity levels, discussing the implications for breeding programs. While higher Sensitivity is desirable to ensure that few superior phenotypes are missed, it is critical to monitor the rate of false positives to avoid misdirecting breeding efforts. This balanced approach allows us to enhance the Precision of our genomic predictions and improve the overall selection process.
These results offer empirical confirmation that the RC model, which represents the conventional approach for selecting the top lines, ranks among the two least efficient strategies in capturing top lines effectively in terms of Sensitivity. Consequently, we strongly advocate for the adoption of methods RO and BO by breeders. These methods have demonstrated their ability to enhance the efficacy of the Genomic Selection (GS) methodology. As previously noted, the practical implementation of GS remains challenging due to the influence of various factors on its performance. Therefore, embracing the RO and BO methods can significantly improve the overall results and reliability of GS in breeding programs.