Linear Mixed Model for Genotype Selection of Sorghum Yield

Tesfa, Mulugeta; Zewotir, Temesgen; Derese, Solomon Assefa; Belay, Denekew Bitew; Shimelis, Hussein

doi:10.3390/app13052784

by Mulugeta Tesfa ¹,²,*, Temesgen Zewotir ³, Solomon Assefa Derese ⁴, Denekew Bitew Belay ¹ and Hussein Shimelis ⁵

¹ Department of Statistics, College of Science, Bahir Dar University, Bahir Dar P.O. Box. 79, Ethiopia

² Department of Statistics, College of Natural and Computational Sciences, Wollo University, Dessie P.O. Box 1145, Ethiopia

³ School of Mathematics, Statistics & Computer Science, University of KwaZulu-Natal, Durban 4041, South Africa

⁴ Department of Plant Science, College of Agriculture, Woldia University, Woldia P.O. Box. 53, Ethiopia

⁵ School of Agricultural, Earth and Environmental Sciences, College of Agriculture, Engineering & Science, Pietermaritzburg, University of KwaZulu-Natal, Durban 4041, South Africa

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(5), 2784; https://doi.org/10.3390/app13052784

Submission received: 26 December 2022 / Revised: 14 February 2023 / Accepted: 15 February 2023 / Published: 21 February 2023

(This article belongs to the Special Issue Applications of Advanced Genomic and Phenomic Technologies for Plant Improvement II)

Abstract: Data analysis using the general linear model assumes that the factors are fixed effects, and the best linear unbiased estimator (BLUE), which is based on mean performance, is then used to select the best performing genotypes. The linear mixed model incorporates both fixed and random effects, which allows genotype performance to be compared through the best linear unbiased predictor (BLUP). The purpose of this study was to identify the best performing genotypes that provided a high grain yield using a mixed model, compare the mean performance of genotypes on grain yield using BLUP and BLUE, and determine the impact of drought on sorghum production in Ethiopia. The experiment used water availability as a treatment, and each replication within the treatment levels used a lattice square design for data collection. The design consisted of 14 × 14 square experimental units (plots) comprising 196 genotypes, where each row of the square was represented as a block receiving 14 genotypes. Phenotypic characteristics were measured for the study. The statistical methods used to identify the best performing genotypes of sorghum were ANOVA and the linear mixed model. The study found that sorghum production was influenced by drought, which restricted sorghum growth due to a shortage of water. Irrigation increased the grain yield from 2.48 to 3.17 t/ha, a difference of 0.69 t/ha between treatments (with and without irrigation). A comparison of the general linear model and the linear mixed model revealed that the mixed model was more accurate and selected the best performing genotypes in grain yield with better accuracy. It is therefore recommended to use the linear mixed model to select the best performing genotypes in grain yield.

Keywords: linear mixed model; best performing genotype; genotype selection

1. Introduction

Sorghum (Sorghum bicolor (L.) Moench) is a flowering plant found in arid and semi-arid regions of the world, where it serves as a primary source of food [1,2]. It accounts for 43% of all major food staples available for consumption in Sub-Saharan Africa [3]. In Ethiopia, sorghum ranks fourth in production after maize, wheat, and teff. Ethiopian farmers consumed 70.43% of their sorghum production as household food, and the rest was used for animal feed and in-kind utility to farmers [4].

Sorghum production is mainly measured through phenotypic characteristics, of which grain yield is the main component. The phenotypic characteristics of cereal crops are affected by genotype effects, environmental conditions (stress), and the interactions between the genotype and the environment [5]. Previous studies found significant differences among genotypes in yield and yield-related traits and also indicated the effect of drought stress on sorghum production. These studies were analyzed using ANOVA, which considers all the factors as fixed effects, and multivariate analysis techniques, particularly cluster analysis and principal component analysis [6,7].

A comparison of the analysis of variance (ANOVA) and the mixed model used the Best Linear Unbiased Predictor (BLUP) to evaluate location-specific genotypic effects for rape cultivars with random locations. The result indicated that the mixed model provided both higher predictive accuracy and greater efficiency in predicting the location-specific genotype effects for rape traits [8,9].

In ANOVA, the performance of sorghum genotypes is obtained from the arithmetic mean of the grain yield of each genotype across replications, which is the Best Linear Unbiased Estimator (BLUE) under the general linear model. However, when the number of genotypes is large, treating the genotype effects as random may be preferable to treating them as fixed, as is done in classical analyses such as ANOVA and additive main effect multiplicative interaction. The prediction of genotype performance is then based on the BLUP, which helps to select the best genotype by ranking the predicted grain yield in a mixed model that incorporates both fixed and random effects [7,9,10].

Selecting genotypes with higher genetic gain for future sorghum production and breeding through BLUP has an advantage in ranking the genotypes: BLUP shrinks the predictions towards the mean, which increases accuracy by reducing the mean square error. Thus, it maximizes the association between the true and predicted genotypic values [9].

The purpose of this study is to select the best genotypes that have high yield performance using a mixed model, compare the yield performance of genotypes through arithmetic mean and BLUP selection techniques of sorghum by considering genotypes as random effects, and determine the impact of drought on sorghum production in Ethiopia.

The paper is organized into four sections that include the methodology, result, discussion, and conclusion of the study. Section 2 contains the methodology that explains the design of experiment and variables in the study as well as the statistical model description of the paper. Section 3 contains the results of the study that explores the descriptive summary of the data and the result of the analysis of genotype selection through the linear mixed model. Section 4 and Section 5 present the study’s discussion and conclusion, respectively.

2. Materials and Methods

2.1. Site Description and Experimental Design

One hundred ninety-six genotypes, collected from the Ethiopian Biodiversity Institute (EBI), were used to identify the agronomic characteristics of sorghum in North East Ethiopia [7]. The experiment was held at the Kobo site of the Sirinka Agricultural Research Center, Ethiopia, in 2014/2015. The experiment had a treatment with two levels, namely, rainfed (insufficient water availability) and irrigated (sufficient water availability) conditions. Each treatment level contained two replications. A lattice square design was used within each replication. The design consisted of 14 × 14 square experimental units (plots) comprising 196 genotypes, where each row of the square was represented as a block receiving 14 genotypes. The genotypes were assigned to the experimental units so that every replication contained all 196 genotypes [7,11]. The layout of the design is indicated in Table 1.
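The plot count implied by this design can be tallied with a short sketch. The snippet below is illustrative only: the labels, the random seed, and the simple within-replication shuffle are assumptions, and the shuffle is a stand-in that does not reproduce the actual lattice square randomization used in the trial.

```python
import random

TREATMENTS = ["rainfed", "irrigated"]   # two treatment levels (from the text)
REPS_PER_TREATMENT = 2                  # two replications per level
N_BLOCKS = 14                           # rows of the 14 x 14 square
BLOCK_SIZE = 14                         # genotypes per block
N_GENOTYPES = N_BLOCKS * BLOCK_SIZE     # 196

def build_layout(seed=0):
    """Return one record per plot: (treatment, rep, block, position, genotype).

    The shuffle is a hypothetical stand-in for the real lattice randomization.
    """
    rng = random.Random(seed)
    plots = []
    for trt in TREATMENTS:
        for rep in range(1, REPS_PER_TREATMENT + 1):
            genotypes = list(range(1, N_GENOTYPES + 1))
            rng.shuffle(genotypes)
            for idx, g in enumerate(genotypes):
                block = idx // BLOCK_SIZE + 1      # row of the square
                position = idx % BLOCK_SIZE + 1    # plot within the row
                plots.append((trt, rep, block, position, g))
    return plots

layout = build_layout()
print(len(layout))  # 2 treatments x 2 replications x 196 genotypes
```

Each genotype therefore contributes four plot-level observations in total, two under each water regime.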

2.2. Variables of the Study

The study considered both the response variables and input variables that incorporated fixed and random variables. The experiment designed for the study measured different phenotypic characteristics of sorghum, of which grain yield was considered as the response variable [12,13]. Treatment was considered as a fixed effect, and replication within the treatment, genotype effect, and the interactions of the genotypes by treatment were assumed to be random effects of the study.

2.3. Linear Mixed Model

The linear mixed model provides flexibility in fitting models for various combinations of fixed and random effects. Grain yield is the response variable that represents the phenotypic characteristics of sorghum, with different factors and a linear mixed model assumed for the analysis. Various studies consider the genotypes, replications, blocks, and all interactions as random and fixed effects [14,15]. The analysis of the phenotypic data for plant breeding and the variety trials depends on a linear mixed model of the form [9,14,16,17]

y_{ijkm} = μ + α_{i} + ρ_{j(i)} + β_{k} + (αβ)_{ik} + ε_{ijkm}    (1)

where y_{ijkm} is the grain yield of the m^th observation for the i^th treatment, the j^th replication, and the k^th genotype; μ is the overall mean of the grain yield; α_{i} is the fixed effect of the i^th treatment (irrigation); β_{k} is the random effect of the k^th genotype; ρ_{j(i)} is the random effect of the j^th replication nested within the i^th treatment; (αβ)_{ik} is the interaction effect of the i^th treatment by the k^th genotype; and ε_{ijkm} is the random error term. Here, ρ_{j(i)}, β_{k}, (αβ)_{ik}, and ε_{ijkm} are random effects and are uncorrelated with each other. The effects of genotype, replication within treatment, and the treatment-by-genotype interaction are normally distributed with mean zero and variances σ_{β}^{2}, σ_{ρ(α)}^{2}, and σ_{αβ}^{2}, respectively. The error term ε_{ijkm} is normally distributed with mean zero and variance σ_{ε}^{2}.
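Because the random terms in Equation (1) are mutually uncorrelated, the variance of a single observation splits additively into the four variance components. The display below is a sketch of that standard decomposition in the notation of Equation (1); the "share" expression is the quantity usually meant when a percentage of total variance is attributed to genotype.

```latex
\operatorname{Var}(y_{ijkm})
  = \sigma_{\beta}^{2} + \sigma_{\rho(\alpha)}^{2} + \sigma_{\alpha\beta}^{2} + \sigma_{\varepsilon}^{2},
\qquad
\text{genotype share}
  = \frac{\sigma_{\beta}^{2}}
         {\sigma_{\beta}^{2} + \sigma_{\rho(\alpha)}^{2} + \sigma_{\alpha\beta}^{2} + \sigma_{\varepsilon}^{2}}
```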

The fixed effects are estimated by the BLUE method, which has the minimum variance among all linear unbiased estimators [18]. The random effects are predicted by the BLUP method, which has better predictive accuracy than the classical BLUE procedure [9,10].

When using ANOVA (GLM), all factors in the analysis are considered to be fixed effects, and the parameter estimates are obtained with the BLUE technique [19]. However, treating the genotypes as a random effect is important for investigating genetic gain in the response. The genotype effect is then predicted by BLUP, which is crucial for selecting the best genotypes for future plant breeding and is more accurate than the fixed-effect estimates obtained by BLUE in ANOVA [9]. There are various estimation techniques for the fixed- and random-effect parameters in the mixed model. The most common are maximum likelihood (ML), restricted maximum likelihood (REML), and the Minimum Norm Quadratic Unbiased Estimator (MINQUE) [19,20]. Of these, REML is used to estimate the variance parameters of the random effects because it accounts for the degrees of freedom lost in estimating the fixed effects [21]. The estimated variance components are then used to estimate the fixed effects of the mixed model by generalized least squares (GLS).

BLUP is a method of predicting unobservable random effects from the observed responses, and it has better accuracy than BLUE [22,23]. Once the BLUP of each random effect is obtained, a hypothesis test can be performed using the p-value to identify whether a specific genotype is a significantly high or low performer in grain yield. The test of the best and worst performing genotypes treats the standardized BLUP as following a Student t-distribution with the denominator degrees of freedom, and it leads to the same conclusion as the p-value approach [24].
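The shrinkage behaviour of BLUP is easiest to see in a simplified setting. The sketch below assumes a balanced one-way random-effects model (genotype plus error only) with known variance components, which is much simpler than Equation (1); the function name `blup_balanced`, the toy yields, and the variance values are all hypothetical.

```python
from statistics import mean

def blup_balanced(groups, var_genotype, var_error):
    """BLUPs of random group effects in a balanced one-way random model.

    groups: dict mapping genotype label -> list of observations (equal length).
    var_genotype, var_error: variance components, treated as known here.
    """
    n = len(next(iter(groups.values())))            # observations per genotype
    grand = mean(v for obs in groups.values() for v in obs)
    # Shrinkage factor: close to 1 when genotype variance dominates.
    shrink = var_genotype / (var_genotype + var_error / n)
    return {g: shrink * (mean(obs) - grand) for g, obs in groups.items()}

# Hypothetical log-scale yields for three genotypes, four plots each.
toy = {
    "A": [1.60, 1.70, 1.65, 1.75],
    "B": [1.00, 0.95, 1.05, 1.00],
    "C": [0.40, 0.50, 0.45, 0.35],
}
blups = blup_balanced(toy, var_genotype=0.10, var_error=0.04)
for g, b in blups.items():
    print(g, round(b, 3))
```

In this balanced case every genotype is shrunk by the same factor, so the ranking by BLUP matches the ranking by arithmetic mean while each predicted effect is pulled towards zero.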

Model diagnosis of the mixed model is the most important visualization to assess the agreement between the model and the data. It is evaluated by marginal and conditional residual plots to test the residuals’ normality, linearity, and homoscedasticity and detect the outlier values. The other measure of diagnosis is a measure of influential diagnosis, which helps to assess the extreme/outlier observation [19]. The analysis of this study was performed by the mixed procedure of the SAS system (version 9.4).

3. Results of the Study

3.1. Descriptive Statistics

Grain yield is the most important phenotypic characteristic of sorghum. The average grain yield was 2.48 t/ha for sorghum grown under drought conditions, with a standard deviation of 0.71 t/ha in log scale. For sorghum grown under sufficient water availability, the average grain yield was 3.17 t/ha, with a corresponding standard deviation of 1.15. The lack of sufficient water revealed the influence of drought on the growth of the sorghum crop, which led to a reduction in grain yield. The coefficient of variation showed that grain yield was more consistent under drought (non-irrigated) conditions than under sufficient water availability (Table 2). Under non-irrigated conditions, the yields of the genotypes were concentrated close to the mean and were therefore less variable, as the standard deviation of grain yield in non-irrigated conditions was smaller than in irrigated conditions.
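The comparison of consistency can be reproduced from the summary figures quoted above. The sketch below takes the means and standard deviations exactly as reported in the text and assumes they are on the same scale; the resulting percentages are a rough check and may differ from the entries of Table 2.

```python
def coefficient_of_variation(sd, mean):
    """Return the coefficient of variation as a percentage."""
    return 100.0 * sd / mean

# Values as reported in the text.
cv_rainfed = coefficient_of_variation(sd=0.71, mean=2.48)
cv_irrigated = coefficient_of_variation(sd=1.15, mean=3.17)
yield_gain = 3.17 - 2.48

print(f"CV rainfed:   {cv_rainfed:.1f}%")
print(f"CV irrigated: {cv_irrigated:.1f}%")
print(f"Mean gain under irrigation: {yield_gain:.2f} t/ha")
```

With these inputs the rainfed coefficient of variation is about 28.6% and the irrigated one about 36.3%, which agrees with the statement that yields were more consistent under drought.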

The distribution of grain yield of sorghum has a longer tail to the right (positively skewed) and shows that the distribution of the measurements of the grain yield of the sorghum is non-normal (histogram of the grain yield). The estimated kernel density of observations of grain yield clearly shows the non-normality of the response variable as the kernel density and normal distribution do not overlap each other. The kernel density of the grain yield shows the skewness of the distribution of the grain yield, specifically skewed to the right, and ensures the non-normality of the grain yield (Figure 1). The scatter plot of the measurements is nonlinear relative to the normal line of the Q-Q plot, in which most of the points deviate from the normal line at the right of the distribution. The patterns of the points of the grain yield scatter plot are curved with the slope increasing from left to right and the result of the theoretical distribution is skewed to the right, which does not provide a better fit for the grain yield of the measurements (Figure 1).

The Shapiro–Wilk test assesses the normality assumption for grain yield by testing the null hypothesis of no significant departure from normality; it is suitable for sample sizes of less than 2000. The test statistic is 0.9548 with an associated p-value < 0.05, so the null hypothesis is rejected. The distribution of grain yield therefore differs significantly from the normal distribution.

One remedial measure for a violation of the normality assumption is to transform the data using the log transformation, the most familiar technique for transforming a non-normal data set into a normal one [25]. Because the grain yield observations were non-normal, the data points were transformed using the natural logarithm. The histogram of the transformed data is symmetrical, which indicates that the normality assumption is satisfied (left-hand side of Figure 2). The points of the transformed data fall along the normal line of the Q-Q plot, showing that the transformed data are approximately normally distributed (0.9811 ± 0.33), with a lower standard deviation than the untransformed grain yield values (2.82 ± 1.017) (right-hand side of Figure 2).
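The effect of the log transformation on a right-skewed variable can be illustrated with simulated data. The sketch below does not use the trial data: it draws hypothetical log-normal "yields" (the parameters 1.0 and 0.35 and the seed are arbitrary assumptions) and compares the moment-based sample skewness before and after taking natural logs.

```python
import math
import random

def sample_skewness(values):
    """Moment-based sample skewness (third central moment over sd cubed)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(42)
# Hypothetical right-skewed "yields": log-normal draws, not the trial data.
raw = [rng.lognormvariate(1.0, 0.35) for _ in range(784)]
logged = [math.log(v) for v in raw]

skew_raw = sample_skewness(raw)
skew_log = sample_skewness(logged)
print(round(skew_raw, 2), round(skew_log, 2))
```

For data of this kind the raw values show clear positive skewness, while the logged values are close to symmetric, which is the pattern described for Figures 1 and 2.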

3.2. The Selection of Best Genotypes Using the Mean Performance of the Genotypes

Table 3 presents the analysis of variance of grain yield (in log scale), with treatment, genotype, replication within treatment, and the genotype-by-treatment interaction as sources of variation. According to the result, there were highly significant differences in grain yield (in log scale) among the genotypes. There was also a significant difference between the treatment levels, which indicates that sorghum production differs between sufficient and insufficient water availability. The effect of replication within treatment on grain yield was significant (p-value < 0.0001), and the effect of the genotype-by-treatment interaction was highly significant (p-value < 0.0001).

The mean performance of the genotypes was obtained as the arithmetic mean of grain yield over replications, which is a best linear unbiased estimator of genotype performance, and the estimates were ranked to select the best and worst performing genotypes of sorghum. The best and worst performing genotypes obtained via the arithmetic mean are presented in Table 4. The top three genotypes were 149, 190, and 145, with mean yields of 1.686, 1.677, and 1.6635 t/ha in the log scale and corresponding standard deviations of 0.245, 0.251, and 0.260, respectively. The three lowest performing genotypes were 41, 78, and 108, with corresponding standard deviations of 0.0445, 0.0606, and 0.0409, respectively (Table 4).

3.3. The Selection of the Genotypes Using a Mixed Model

Table 5 presents the test of the treatment effect on grain yield, showing whether the treatment parameter estimates are significant, together with the tests of the random effects. The hypothesis of no difference in mean grain yield across treatment levels was rejected (p-value < 0.0001). This showed that irrigation changed the mean grain yield in the log scale compared to the absence of treatment (non-irrigated). The estimated mean grain yield (in log scale) was 1.0924 under irrigation and 0.8698 under the non-irrigated level of the treatment. The result also showed variability in grain yield associated with the genotypes: 89.58% of the total variance of grain yield (log scale) was related to the effect of the genotype. A further 8.86% of the total variance of grain yield was associated with the treatment-by-genotype interaction. The variability of grain yield due to the random effect of the genotype and the genotype-by-treatment interaction was highly significant (p-value < 0.5). The random effect of replication within the treatment explained 0.597% of the total variance of grain yield, which was an insignificant contribution (p-value > 0.5) (Table 5). The variability due to the random block effect was very small; thus, the analysis excludes the block effect.
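Because the model is fitted on the natural-log scale, the two treatment estimates are easier to interpret after back-transformation. The sketch below assumes that 1.0924 and 0.8698 are the log-scale means for the irrigated and non-irrigated levels, as the paragraph states. Exponentiated log means are geometric means, so they are not expected to match the arithmetic means of 3.17 and 2.48 t/ha exactly.

```python
import math

# Log-scale estimates as reported in the text (natural log of t/ha).
log_mean_irrigated = 1.0924
log_mean_rainfed = 0.8698

# Exponentiating a mean of logs gives a geometric mean, not an arithmetic mean.
geo_irrigated = math.exp(log_mean_irrigated)
geo_rainfed = math.exp(log_mean_rainfed)

# A difference on the log scale corresponds to a ratio on the original scale.
ratio = math.exp(log_mean_irrigated - log_mean_rainfed)

print(f"{geo_irrigated:.2f} t/ha vs {geo_rainfed:.2f} t/ha, ratio {ratio:.2f}")
```

Read this way, the log-scale difference of 0.2226 corresponds to a geometric mean yield roughly 25% higher under irrigation.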

Table 6 presents the worst and best performing genotypes estimated through BLUP, which is important for selecting the best genotypes for future breeding. The solution for the random effects helps to assess the correlation between the predicted and true values of the genotypes, which is needed to rank the sorghum genotypes. The best genotypes are selected by ranking the BLUP solutions and identifying the genotypes with the minimum mean square error. The BLUP is a precise estimate of the random genotype effect, having a smaller standard error of prediction than the BLUE of genotype performance. The top three genotypes with high grain yields were 149, 190, and 145, with predicted values of 0.668, 0.6589, and 0.6452, respectively, and a standard error of prediction of 0.0689 for each genotype. These BLUP estimates are more accurate than the arithmetic mean (BLUE) estimates, which had standard errors of 0.245, 0.251, and 0.260, respectively. The three genotypes with the lowest grain yield in the log scale were 41, 78, and 108, with standard errors of prediction of 0.0457, 0.0457, and 0.0458, respectively. Among the ten worst performing genotypes, the BLUE estimates for genotypes 66, 180, 85, and 137 were less precise than the BLUP estimates, as their standard deviations under the arithmetic mean (BLUE) were higher. For the other genotypes among the worst ten, however, the BLUE estimates were more precise than the BLUP estimates. For the top ten genotypes, the BLUP estimates of grain yield were more accurate than the BLUE estimates.
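A rough consistency check relates the reported BLUPs to the reported arithmetic means. The sketch below assumes that each BLUP is approximately a shrunken deviation of the genotype's log-scale mean from the overall log-scale mean of 0.9811 (the value reported for the transformed data). This is a simplification: the fitted model also contains interaction and replication terms, so the ratios are only indicative.

```python
# Values as reported in the text (log scale).
overall_log_mean = 0.9811
reported = {
    149: {"mean": 1.686, "blup": 0.668},
    190: {"mean": 1.677, "blup": 0.6589},
    145: {"mean": 1.6635, "blup": 0.6452},
}

# Implied shrinkage = BLUP / (genotype mean - overall mean).
implied = {}
for genotype, r in reported.items():
    deviation = r["mean"] - overall_log_mean
    implied[genotype] = r["blup"] / deviation

for genotype, ratio in implied.items():
    print(genotype, round(ratio, 3))
```

Under these assumptions the three ratios are all close to 0.95, meaning the BLUP keeps most of each genotype's deviation from the overall mean and preserves the ordering 149, 190, 145.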

Table 7 shows the performance of the selected genotypes under irrigation, obtained by considering the genotype-by-treatment interaction. Based on BLUP, the top three genotypes under irrigation were 142, 184, and 173; in contrast, selection using BLUE yielded genotypes 190, 149, and 145 in rank order. Genotype 142 was the tenth best performing genotype in the overall comparison and yielded the highest sorghum production under irrigation, while the other genotypes were not among the overall top ten. Of the least performing genotypes under irrigation, the bottom three were 55, 49, and 41, in that order. In terms of the estimated random effect, genotype 41 was the worst performer.

Table 8 shows the performance of the genotypes under stress (drought), including the top best performing genotypes and the worst performing genotypes. According to the results, the top three genotypes with a high grain yield were 55, 49, and 8, which were among the worst performers under irrigation conditions. The genotypes with the lowest grain yield production under stress were 137, 142, and 184, which were expected to provide insufficient production for community food. Genotypes 142 and 124 were the genotypes that indicated high grain yield under irrigation conditions; however, these were incapable of resisting drought (stress) conditions. This shows that the genotype yielding the high yield under irrigation may be inappropriate for the drought condition.

The model was diagnosed using the marginal and conditional Studentized residuals for grain yield (in log scale). The plot of the residuals against the marginal mean (top left) represented the linearity of the fixed effects, including the intercept and treatment effect. The residual statistics, shown in the lower right corner of the marginal residual panel, indicate that the mean was close to zero and the standard deviation was one. The Q-Q plot checked whether the residuals have a normal distribution. According to the Q-Q plot, the normality of the Studentized residual for grain yield was satisfied, as the points followed the normal line (Figure 3). The histogram and scatter plot showed the normality and randomness of the conditional Studentized residual of grain yield (in log scale), so the assumptions are satisfied (top left and top right panels of the conditional Studentized residual plot).

Given the random effects, the conditional Studentized residual for grain yield represents the difference between the observed and predicted values. The plot of the conditional residuals against the predicted values (top left) depicts the homoscedasticity of the conditional errors. The residual statistics are shown in the lower right, with the mean of the residual being zero and the standard deviation being one. The histogram and Q-Q plot of the residual show that the conditional residual meets the normality assumption (Figure 3).

4. Discussion

The paper described the linear mixed model and discussed its advantages over GLM analysis (ANOVA) using data on the agronomic characteristics of sorghum. It identified sources of variation that lead to invalid conclusions when the classical linear model (ANOVA) is used to analyze the data. The linear mixed model was suggested to be essential for evaluating sorghum agronomic characteristics, primarily grain yield. The descriptive summary of the sorghum yield, consistent with other investigators, showed that drought reduced sorghum production and that grain yield was less variable under drought than under sufficient moisture. This revealed that drought kept the genotypes from reaching extreme yields (too small or too large), so that grain yield was consistent across genotypes [7,26]. The result agreed with studies indicating that sorghum genotypes can resist drought, especially in arid and semi-arid areas, as sorghum in general has great drought tolerance relative to other crops, and that farmers' preferences differ according to their own criteria [6,7,14,27,28,29]. Exploration of the grain yield data through a histogram and Q-Q plot indicated that the distribution was asymmetrical (not normal). The remedial measure for this violation of the normality assumption, which is the most important assumption for the analysis of variance (ANOVA), was the log transformation of grain yield. The transformation reduced the standard deviation from 1.0175 to 0.3305 t/ha, revealing that the transformed data were less variable than the original data. ANOVA considered all factors as fixed effects, which yields inappropriate modeling and can lead to incorrect conclusions through false positives (Type I errors) and reduced power and utility of estimation [30,31,32,33].

This study incorporated both fixed and random effects to identify the effects of genotype and other factors on grain yield, using a mixed model for the analysis of grain yield (in log scale) and comparing its results with those of ANOVA. The fixed-effect parameters in the mixed and ANOVA models were estimated by the BLUE method, and the random-effect parameters in the mixed model were predicted by the BLUP technique with better accuracy. This approach is supported by [34], which explained the use of both fixed and random effects in the mixed model and showed that considering only fixed effects, instead of incorporating random effects, gives an inappropriate model that leads to incorrect conclusions. The overall best performing genotypes in this study differed from those reported in [7], where the analysis was performed separately. In addition, the best genotypes under irrigation and stress conditions also differed, which could be due to the variability of grain yield among the random effects. This study used the BLUP estimate, which is more precise than the BLUE estimate, and it supports the use of the linear mixed model for genotype selection in plant breeding. The study in [23] investigated BLUE for linear functions of the fixed effects and BLUP for linear functions of the random effects, and also demonstrated selection using BLUP in animal breeding [7,8,23,34].

In this case, treatment was considered a fixed effect [7], whereas the genotypes, interactions of the treatment by the genotype, and the replication within treatments were assumed to be random effects that represented the whole population of sorghum [31]. The treatment effect significantly affected the grain yield and caused an increase in grain yield due to the presence of irrigation compared to non-irrigation (lack of sufficient water). Studies indicate that drought impacts cereal crop production, and some other constraints also constrain sorghum production [35,36].

The random effects, except for replication within treatment, which is insignificant for the variability of grain yield, are significant for the variability of grain yield (in log scale) and essential to predict the best genotype for future sorghum production. The grain yield prediction of the genotypes is obtained using BLUP and helps to rank the performance of the genotypes on grain yield [9].

One hundred ninety-six genotypes were used in the study, so a large number of parameters were involved in estimating the random effects [14]. The top three genotypes were 149, 190, and 145; these had high grain yield and are recommended for mass production. The three lowest genotypes were 41, 78, and 108, which showed weak grain yield in the log scale and thus have a low production capacity in any environmental condition. BLUP predicted the unobservable genotype effects on grain yield (in log scale) from the observed grain yield with a minimum mean square error. In this case, the ranking from BLUP was identical to the ranking from the arithmetic mean performance of the genotypes, but the BLUP estimates were more precise [7,8,14].

5. Conclusions

The study revealed that the effect of treatment on grain yield was significant. The production of grain yield was increased due to the application of sufficient water from 2.48 to 3.17 t/ha, and the distribution of grain yield for sorghum had a longer tail to the right (positively skewed), showing that the distribution of the measurements was non-normal. Logarithmic transformation was used to transform the grain yield data to a normal distribution. The distribution of the transformed data had a normal distribution and the new data were suitable for mixed effect analysis.

The best genotypes for grain yield were selected using the BLUP method, which identifies the genotypes with higher grain yield and produces more accurate estimates than the BLUE method. The genotypes with the best capacity for mass production were 149, 190, and 145, listed from the highest yield. The genotypes not recommended for future mass production were 41, 78, and 108, listed from the lowest yield. The ranking of genotype performance was the same for the BLUP method and for the arithmetic mean (BLUE), but the BLUP estimates were more accurate. It is therefore recommended to use a mixed model to select the best genotypes for future sorghum production. A future study will focus on identifying other related factors that predict sorghum grain yield.

Author Contributions

M.T. handled data management, data analysis, and the drafting and revising of the final manuscript. T.Z. and D.B.B. contributed to the conception, design, interpretation of data, and manuscript reviews and revisions. S.A.D. and H.S. contributed the data for the study. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The authors will provide the data upon request, once the necessary information is supplied.

Acknowledgments

The author acknowledges Bahir Dar University and Wollo University for providing MTM admission and a salary, respectively. We also thank Melese Mengesha for his valuable contribution to the English editing of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Wondimu, Z.; Dong, H.; Paterson, A.H.; Worku, W.; Bantte, K. Genetic diversity, population structure, and selection signature in Ethiopian sorghum [Sorghum bicolor L. (Moench)] germplasm. G3 2021, 11, jkab087.
2. Morris, G.P.; Ramu, P.; Deshpande, S.P.; Hash, C.T.; Shah, T.; Upadhyaya, H.D.; Riera-Lizarazu, O.; Brown, P.J.; Acharya, C.B.; Mitchell, S.E.; et al. Population genomic and genome-wide association studies of agroclimatic traits in sorghum. Proc. Natl. Acad. Sci. USA 2013, 110, 453–458.
3. Agbede, T.; Ojeniyi, S.; Adeyemo, A. Effect of poultry manure on soil physical and chemical properties, growth and grain yield of sorghum in southwest, Nigeria. Am.-Eurasian J. Sustain. Agric. 2008, 2, 72–77.
4. CSA. Agricultural Sample Survey 2020/21 (2013 E.C.): September–December, 2020; CSA: Addis Ababa, Ethiopia, 2020; Volume III. Available online: http://www.statsethiopia.gov.et/wp-content/uploads/2021/06/2020_21-2013-E.C-AgSS-Main-Season-Agricultural-Farm-Management-Report.pdf (accessed on 25 December 2022).
5. Beyene, A.; Hussien, S.; Pangirayi, T.; Mark, L. Physiological mechanisms of drought tolerance in sorghum, genetic basis and breeding methods: A review. Afr. J. Agric. Res. 2015, 10, 3029–3040.
6. Derese, S.A.; Shimelis, H.; Laing, M.; Mengistu, F. The impact of drought on sorghum production, and farmer’s varietal and trait preferences, in the north eastern Ethiopia: Implications for breeding. Acta Agric. Scand. Sect. B Soil Plant Sci. 2018, 68, 424–436.
7. Derese, S.A.; Shimelis, H.; Mwadzingeni, L.; Laing, M. Agro-morphological characterisation and selection of sorghum landraces. Acta Agric. Scand. Sect. B Soil Plant Sci. 2018, 68, 585–595.
8. Hu, X. A comprehensive comparison between ANOVA and BLUP to valuate location-specific genotype effects for rape cultivar trials with random locations. Field Crops Res. 2015, 179, 144–149.
9. Piepho, H.P.; Möhring, J.; Melchinger, A.E.; Büchse, A. BLUP for phenotypic selection in plant breeding and variety testing. Euphytica 2008, 161, 209–228.
10. Piepho, H.-P. Best linear unbiased prediction (BLUP) for regional yield trials: A comparison to additive main effects and multiplicative interaction (AMMI) analysis. Theor. Appl. Genet. 1994, 89, 647–654.
11. Bose, R.C. A note on the resolvability of balanced incomplete block designs. Sankhyā Indian J. Stat. 1942, 6, 105–110.
12. Walpole, R.E.; Myers, R.H.; Myers, S.L.; Ye, K. Probability and Statistics for Engineers and Scientists; Macmillan: New York, NY, USA, 1993; Volume 5.
13. Pérez-Vicente, S.; Ruiz, M.E. Descriptive statistics. Allergol. Immunopathol. 2009, 37, 314–320.
14. Saroj, R.; Soumya, S.L.; Singh, S.; Sankar, S.M.; Chaudhary, R.; Yashpal; Saini, N.; Vasudev, S.; Yadava, D.K. Unraveling the relationship between seed yield and yield-related traits in a diversity panel of Brassica juncea using multi-traits mixed model. Front. Plant Sci. 2021, 12, 651936.
15. Gasura, E.; Setimela, P.S.; Souta, C.M. Evaluation of the performance of sorghum genotypes using GGE biplot. Can. J. Plant Sci. 2015, 95, 1205–1214.
16. Robinson, G.K. That BLUP is a good thing: The estimation of random effects. Stat. Sci. 1991, 6, 15–32.
17. Wright, D.B. Some Limits Using Random Slope Models to Measure Academic Growth. Front. Educ. 2017, 2, 58.
18. Welsch, R.E.; Kuh, E. Linear Regression Diagnostics (0898-2937); National Bureau of Economic Research: Cambridge, MA, USA, 1977.
19. Zewotir, T.; Galpin, J.S. Influence diagnostics for linear mixed models. J. Data Sci. 2005, 3, 153–177.
20. Tunaz, A.T. Determination of Best Variance-Covariance Structure in Mixed Model (SAS Proc Mixed) with Various Parameter Estimation Methods. Gaziosmanpaşa Üniv. Ziraat Fak. Derg. 2021, 38, 53–59.
21. Gumedze, F.; Dunne, T. Parameter estimation and inference in the linear mixed model. Linear Algebra Its Appl. 2011, 435, 1920–1944.
22. Jiang, J. A derivation of BLUP—Best linear unbiased predictor. Stat. Probab. Lett. 1997, 32, 321–324.
23. Henderson, C.R. Best linear unbiased estimation and prediction under a selection model. Biometrics 1975, 31, 423–447.
24. Zewotir, T. On employees’ performance appraisal: The impact and treatment of the raters’ effect. South Afr. J. Econ. Manag. Sci. 2012, 15, 44–54.
25. Osborne, J. Improving your data transformations: Applying the Box-Cox transformation. Pract. Assess. Res. Eval. 2010, 15, 12.
26. Newsom, J.T. Structural Equation Modeling: A Multidisciplinary Journal. Multidiscip. J. 2009, 9, 19.
27. Mutava, R.N.; Prasad, P.V.V.; Tuinstra, M.R.; Kofoid, K.D.; Yu, J. Characterization of sorghum genotypes for traits related to drought tolerance. Field Crops Res. 2011, 123, 10–18.
28. Naoura, G.; Sawadogo, N.; Atchozou, E.A.; Emendack, Y.; Hassan, M.A.; Reoungal, D.; Amos, D.N.; Djirabaye, N.; Tabo, R.; Laza, H. Assessment of agro-morphological variability of dry-season sorghum cultivars in Chad as novel sources of drought tolerance. Sci. Rep. 2019, 9, 19581.
29. Boyles, R.E.; Brenton, Z.W.; Kresovich, S. Genetic and genomic resources of sorghum to connect genotype with phenotype in contrasting environments. Plant J. 2019, 97, 21.
30. Knief, U.; Forstmeier, W. Violating the normality assumption may be the lesser of two evils. Behav. Res. Methods 2021, 53, 2576–2590.
31. Yang, R.-C. Towards understanding and use of mixed-model analysis of agricultural experiments. Can. J. Plant Sci. 2010, 90, 605–627.
32. Mcguinness, K.A. Of rowing boats, ocean liners and tests of the ANOVA homogeneity of variance assumption. Austral Ecol. 2002, 27, 681–688.
33. Pusponegoro, N.H.; Rachmawati, R.N.; Notodiputro, K.A.; Sartono, B. Linear mixed model for analyzing longitudinal data: A simulation study of children growth differences. Procedia Comput. Sci. 2017, 116, 284–291.
34. Yu, H.-T. Applying Linear Mixed Effects Models with Crossed Random Effects to Psycholinguistic Data: Multilevel Specification and Model Selection. Quant. Methods Psychol. 2015, 11, 78–88.
35. Ouedraogo, N.; Sanou, J.; Kam, H.; Traore, H.; Adam, M.; Gracen, V.; Danquah, E.Y. Farmers’ perception on impact of drought and their preference for sorghum cultivars in Burkina Faso. Agric. Sci. Res. J. 2017, 7, 277–284.
36. Ray, R.L.; Fares, A.; Risch, E. Effects of drought on crop production and cropping areas in Texas. Agric. Environ. Lett. 2018, 3, 170037.

Figure 1. Histogram and Q-Q plot of grain yield.

Figure 2. Histogram and Q-Q plot of the new data set.

Figure 3. Marginal and conditional Studentized residuals for grain yield (in log scale).

Table 1. Lattice Square Design Layout of the Data.

Treatment→without Irrigation (Level 1)
REP I						REP II
	Genotype (196)						Genotype (196)
Block ↓	1	2	3	⋯	14	Block ↓	1	2	3	⋯	14
1						1
2						2
3						3
⋮						⋮
14						14
Treatment→with Irrigation (Level 2)
REP I						REP II
	Genotype (196)						Genotype (196)
Block↓	1	2	3	⋯	14	Block↓	1	2	3	⋯	14
1						1
2						2
3						3
⋮						⋮
14						14

Table 2. Summary statistics for grain yield of sorghum.

	Non-Irrigated (n = 392)	Irrigated (n = 392)
Min	1.27	1.29
Mean	2.48	3.17
Std Dev	0.71	1.15
Max	4.47	6.81
CV (%)	28.63	36.28
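The coefficient of variation in Table 2 is the standard deviation expressed as a percentage of the mean. The short sketch below recomputes it from the rounded means and standard deviations in the table; the dictionary layout and function name are illustrative only.

```python
# Means and standard deviations of grain yield (t/ha) from Table 2.
summary = {
    "non-irrigated": {"mean": 2.48, "sd": 0.71},
    "irrigated": {"mean": 3.17, "sd": 1.15},
}


def cv_percent(mean, sd):
    """Coefficient of variation: standard deviation as a percentage of the mean."""
    return 100.0 * sd / mean


cv = {name: cv_percent(s["mean"], s["sd"]) for name, s in summary.items()}

for name, value in cv.items():
    print(f"{name:14s} CV = {value:.2f}%")
```

The irrigated plots have both the higher mean and the higher relative spread, so irrigation raised yield on average while also widening the differences among plots.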

Table 3. ANOVA result for grain yield.

Source of Variations	DF	Mean Square	F Value	Pr > F
Treatment	1	9.709817	10,445.7	<0.0001
Genotype	195	0.367059	394.88	<0.0001
Replication	2	0.177854	191.33	<0.0001
Genotype * treatment	195	0.018134	19.51	<0.0001

* denotes the genotype-by-treatment interaction.
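Each F value in Table 3 is a mean square divided by the error mean square, which the table does not list. As a hedged consistency check, the sketch below backs out the implied error mean square from the treatment row and recomputes the remaining F values. Treating all four rows as tested against one common error term is an assumption about how the table was produced.

```python
# Mean squares and F values as reported in Table 3.
anova_rows = {
    "treatment": (9.709817, 10445.7),
    "genotype": (0.367059, 394.88),
    "replication": (0.177854, 191.33),
    "genotype x treatment": (0.018134, 19.51),
}

# Assumption: every F value uses the same error mean square, so it can be
# recovered from any row as mean square / F. The treatment row is used here.
treatment_ms, treatment_f = anova_rows["treatment"]
implied_mse = treatment_ms / treatment_f

recomputed_f = {name: ms / implied_mse for name, (ms, _) in anova_rows.items()}

print(f"implied error mean square: {implied_mse:.6f}")
for name, (ms, reported_f) in anova_rows.items():
    print(f"{name:22s} reported F = {reported_f:10.2f}  "
          f"recomputed F = {recomputed_f[name]:10.2f}")
```

The implied error mean square is close to the residual variance estimated by the mixed model in Table 5, and the recomputed F values match the reported ones to within rounding.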

Table 4. Least and top performing genotypes using arithmetic mean performance.

The Worst Performing Genotypes			The Best Performing Genotypes
Genotype	Mean	Std.Dev	Genotype	Mean	Std.Dev
41	0.3031	0.0445	149	1.6860	0.2448
78	0.3684	0.0606	190	1.6771	0.25147
108	0.4098	0.0409	145	1.6635	0.2603
136	0.4117	0.0746	105	1.6482	0.1886
39	0.4419	0.0555	174	1.6105	0.2122
66	0.4471	0.1903	164	1.5700	0.2243
180	0.4659	0.0818	141	1.5576	0.2272
56	0.4861	0.0507	68	1.5563	0.1609
85	0.5117	0.083	99	1.5247	0.1494
137	0.5125	0.3017	142	1.5177	0.3171

Table 5. Tests on fixed and random effects significance for grain yield.

Solution for Fixed Effects
Effect	Estimate	Standard Error	t Value	Pr > |t|
Overall mean; μ	0.8698	0.02791	31.16	<0.0001
Treatment (irrigated)	0.2226	0.02591	8.59	<0.0001
Treatment (non-irrigated; reference)	0
Test of Fixed Effect
Fixed Effect	Numerator DF	Denominator DF	F value	Pr > F
Treatment	1	782	73.77	<0.0001
Random Effects Variance Parameter Estimates
Random Component	Estimate	Standard Error	Z Value	Pr > Z
Replication; $σ_{ρ (α)}^{2}$	0.000579	0.000448	1.29	0.0978
Genotype; $σ_{β}^{2}$	0.08688	0.009246	9.4	<0.0001
Genotype * treatment; $σ_{α β}^{2}$	0.008596	0.000918	9.37	<0.0001
Residual; $σ_{ε}^{2}$	0.00093	0.000067	13.96	<0.0001

* denotes the genotype-by-treatment interaction.

Table 6. The worst and best performing genotypes using BLUP.

The Worst Performing Genotype (DF = 782)					The Best Performing Genotype (DF = 782)
Genotype	Estimate	Std. Err. Pred	t Value	Pr > |t|	Genotype	Estimate	Std. Err. Pred	t Value	Pr > |t|
41	−0.6442	0.0687	−9.37	<0.0001	149	0.67	0.0687	9.75	<0.0001
78	−0.5825	0.0687	−8.47	<0.0001	190	0.6613	0.0687	9.62	<0.0001
108	−0.543	0.0687	−7.9	<0.0001	145	0.6488	0.0687	9.44	<0.0001
136	−0.541	0.0687	−7.87	<0.0001	105	0.6339	0.0687	9.22	<0.0001
39	−0.5127	0.0687	−7.46	<0.0001	174	0.5982	0.0687	8.7	<0.0001
66	−0.5077	0.0687	−7.39	<0.0001	164	0.56	0.0687	8.15	<0.0001
180	−0.4897	0.0687	−7.12	<0.0001	141	0.5481	0.0687	7.97	<0.0001
56	−0.4702	0.0687	−6.84	<0.0001	68	0.5464	0.0687	7.95	<0.0001
85	−0.4461	0.0687	−6.49	<0.0001	99	0.5166	0.0687	7.52	<0.0001
137	−0.4455	0.0687	−6.48	<0.0001	142	0.5103	0.0687	7.42	<0.0001

Std. Err. Pred = standard error of prediction.

Table 7. The performance of genotypes under irrigation.

Worst Performing Genotypes under Irrigation					Best Performing Genotypes under Irrigation
Genotype	Estimate	Std. Err. Pred	t Value	Pr > |t|	Genotype	Estimate	Std. Err. Pred	t Value	Pr > |t|
55	−0.2117	0.06586	−3.21	0.0014	142	0.1793	0.06586	2.72	0.0066
49	−0.176	0.06586	−2.67	0.0077	184	0.1757	0.06586	2.67	0.0078
41	−0.1725	0.06586	−2.62	0.009	173	0.14	0.06586	2.13	0.0339
78	−0.1682	0.06586	−2.55	0.0109	145	0.1399	0.06586	2.12	0.034
119	−0.1356	0.06586	−2.06	0.0398	190	0.1334	0.06586	2.03	0.0431
36	−0.131	0.06586	−1.99	0.047	125	0.1283	0.06586	1.95	0.0518
2	−0.1295	0.06586	−1.97	0.0497	149	0.1281	0.06586	1.94	0.0522
108	−0.1273	0.06586	−1.93	0.0537	170	0.1277	0.06586	1.94	0.0529
185	−0.1257	0.06586	−1.91	0.0567	189	0.127	0.06586	1.93	0.0542
82	−0.1161	0.06586	−1.76	0.0783	124	0.1259	0.06586	1.91	0.0564

Table 8. The performance of genotypes under stress.

Worst Performing Genotypes under Stress					Best Performing Genotypes under Stress
Genotype	Estimate	Std. Err. Pred	t Value	Pr > |t|	Genotype	Estimate	Std. Err. Pred	t Value	Pr > |t|
137	−0.1576	0.06586	−2.39	0.017	55	0.1712	0.06586	2.6	0.0095
142	−0.1288	0.06586	−1.96	0.0509	49	0.1478	0.06586	2.24	0.0251
184	−0.1276	0.06586	−1.94	0.053	8	0.1258	0.06586	1.91	0.0565
40	−0.1242	0.06586	−1.89	0.0596	2	0.1133	0.06586	1.72	0.0857
124	−0.1142	0.06586	−1.73	0.0833	78	0.1106	0.06586	1.68	0.0937
187	−0.113	0.06586	−1.71	0.0867	41	0.1087	0.06586	1.65	0.0992
158	−0.1096	0.06586	−1.66	0.0965	5	0.1004	0.06586	1.52	0.1277
107	−0.1076	0.06586	−1.63	0.1027	3	0.09952	0.06586	1.51	0.1312
156	−0.1041	0.06586	−1.58	0.1146	119	0.0945	0.06586	1.43	0.1518
157	−0.103	0.06586	−1.56	0.1184	36	0.0934	0.06586	1.42	0.1566

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
