1. Introduction
Physiological seed quality is the ability of the seed to perform its vital functions, characterized by germination, vigor and longevity, and it directly affects the establishment of a crop under field conditions. High-potential seeds guarantee the growth and development of plants and the eventual crop yield [
1,
2,
3].
The correct identification of lots and/or cultivars with seeds of high physiological potential is one of the main tasks of researchers and professionals working in seed physiology and technology. For this, two basic physiological components are taken into account—germination and vigor—these being the factors that theoretically govern the ability of seeds to express their vital functions under biotic and abiotic conditions [
4,
5].
There are several ways to assess the physiological quality of a seed lot; however, the most common method is the germination test [
2,
6]. The germination test is performed to determine the final germination percentage of the seed lots. However, the time for 50% of the seeds to germinate (T50) and the germination uniformity, which is the time difference between two defined germination percentages, together with other defined parameters [
7,
8,
9,
10], complement the germination data and can indicate the performance of seeds in the field.
To assess which seed lot has superior physiological quality, the response variable (percentage of germinated seeds) is commonly assessed by non-linear regressions, such as the Hill function [
7,
10,
11]. Another way to analyze germination data is through linearization with a link function, the Probit function being the most widely used [
12,
13,
14].
However, such approaches are imprecise, as they consider the response variable to be continuous. For instance, in non-linear regressions, the proportions of germinated seeds are cumulative and residual autocorrelation may occur [
15], whereas in linearization the germination percentages are transformed into Probit units on the assumption that the data follow a normal distribution, which ends up generating inaccurate models or simply fails to linearize the data [
14,
16].
As the germination process is qualitative, with a binary result (that is, the seed either germinates or does not), errors are not normally distributed. Thus, the classic regression analysis approach is not appropriate [
15]. An alternative for analyzing this type of data is the theory of generalized linear models, with the binomial distribution being a particular case and indicated for proportion data [
13,
17,
18]. Therefore, our hypothesis is that generalized linear models are more appropriate, since they may provide more accurate information about the problem at hand [
19].
In this context, the objective of this work was to use the generalized linear models, investigating which link function (Probit, Logit and Complementary log-log) is suitable to predict T50 and uniformity during germination of soybean and corn seeds.
3. Results and Discussion
In
Figure 1 (experimental data), we present the representations of the germination process for the different hybrids and/or cultivars of corn and soybeans. Initially, germination is slow, with a low proportion, then there is a period of acceleration and finally, stabilization with all viable seeds germinating.
When the Probit, Logit and Complementary log-log link functions were fitted for the 10 hybrids and/or cultivars tested, no cultivar was adequately fitted by all three link functions simultaneously according to the Deviance criterion. The corn hybrid and soybean cultivar with dispersion parameters closest to 1 were AS 1633 PRO3 and CD251 RR, respectively, both fitted with the Logit link function. Overdispersion was observed in six cultivars for all link functions simultaneously, with Deviance values from 1.3 upward, whereas only the corn cultivar BRS 4103 presented underdispersion, indicated by the three functions used (
Table 3).
The phenomena of under- and overdispersion were defined by [
17] as a variance of the response variable below or above, respectively, the variance expected under the adopted model. The main consequence of these phenomena is the incorrect estimation of standard errors, which can in turn induce an inappropriate choice of models, potentially compromising the conclusions [
26]. Even in the face of researchers’ efforts to control experimental conditions, the occurrence of phenomena such as overdispersion is common for agricultural systems, as there is great variability [
27,
28].
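As a rough illustration of how the dispersion discussed above is usually quantified, the sketch below computes the binomial deviance for grouped germination counts and divides it by the residual degrees of freedom. The counts, the fitted probabilities and the function names are hypothetical and are not taken from Table 3; fitted probabilities are assumed to lie strictly between 0 and 1.

```python
import math

def binomial_deviance(germinated, totals, fitted_probs):
    """Residual deviance of a binomial model for grouped counts."""
    dev = 0.0
    for y, n, p in zip(germinated, totals, fitted_probs):
        mu = n * p  # expected number of germinated seeds
        if y > 0:  # the y*log(y/mu) term is taken as zero when y == 0
            dev += y * math.log(y / mu)
        if y < n:  # same convention for the non-germinated term
            dev += (n - y) * math.log((n - y) / (n - mu))
    return 2.0 * dev

def dispersion(germinated, totals, fitted_probs, n_params=2):
    """Deviance divided by the residual degrees of freedom."""
    residual_df = len(germinated) - n_params
    return binomial_deviance(germinated, totals, fitted_probs) / residual_df

# Hypothetical counts of germinated seeds out of 50 at five observation times
germinated = [2, 10, 27, 41, 48]
totals = [50] * 5
fitted = [0.05, 0.22, 0.52, 0.80, 0.95]  # hypothetical model predictions
phi = dispersion(germinated, totals, fitted)
print(f"estimated dispersion: {phi:.3f}")
```

Values of the ratio well above 1 point to overdispersion and values well below 1 to underdispersion.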
Among the tested link functions, the Complementary log-log gave the highest Deviance values; in other words, most models formulated with that link function are overdispersed, except for BRS 4103. The Probit and Logit functions showed the smallest deviances; of the two, the Logit function provided the best fits, with an appropriate fit in 7 of the 10 hybrids and/or cultivars used in this research (
Table 3).
These results demonstrate the importance of choosing the correct link function, since the use of inaccurate models can generate misleading conclusions [
26,
28]. The Logit model is currently preferred in some areas, for example, in biometrics [
22,
29,
30], and in this study it gave the smallest deviances for most of the hybrids and/or cultivars studied; nevertheless, it is necessary to compare link functions in each case, seeking those that best describe the probability of interest [
31].
For all hybrids and/or cultivars tested, there was agreement between the Deviance criterion and the AIC and BIC information criteria; that is, the functions with the smallest deviances were also those with the lowest AIC and BIC values (
Table 3). This agreement facilitated the selection of the most parsimonious models to evaluate the germination of corn and soybean seeds. Thus, the Probit model was chosen to evaluate the germination of two cultivars, the Logit model for seven cultivars and the Complementary log-log model for one cultivar (
Table 4).
The information criteria presented penalize both the lack of fit to the data and the complexity of the model; therefore, models with lower values were chosen [
19,
32]. According to these criteria, the Logit function stands out as a good alternative for evaluating the germination of corn and soybean seeds. This provides a theoretical basis for other areas, such as seed science and technology, and counters the idea that the Probit model should always be used to assess physiological quality, from germination to longevity in thermal models [
12,
14].
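The usual definitions of the two information criteria can be sketched as follows. The log-likelihood values, the number of observations and the helper names are hypothetical and serve only to show how candidate link functions would be ranked.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k log(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical maximized log-likelihoods for one cultivar
candidates = {"probit": -61.8, "logit": -60.2, "cloglog": -66.5}
n_params, n_obs = 2, 40  # intercept and slope; hypothetical sample size

scores = {
    name: (aic(ll, n_params), bic(ll, n_params, n_obs))
    for name, ll in candidates.items()
}
best = min(scores, key=lambda name: scores[name][0])  # lowest AIC
print(best, scores[best])
```

When the candidate models are fitted to the same data with the same number of parameters, the penalty terms are identical, so AIC and BIC rank the models in the same order as the log-likelihood.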
Keeping the focus on the estimation of the dispersion parameter, the selected models had the standard errors of their estimates corrected by quasi-likelihood, using the Deviance to estimate the constant dispersion parameter according to the procedure in (5). With this correction, as expected, standard errors increased for models with overdispersion and decreased for models with underdispersion; however, the significance of the parameters was not affected (
Table 4).
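A minimal sketch of this kind of correction is given below. It assumes the usual quasi-binomial adjustment, in which each standard error is multiplied by the square root of the estimated dispersion. Procedure (5) is not reproduced in this section, so that correspondence is an assumption, and the standard errors and dispersion values are hypothetical.

```python
import math

def quasi_corrected_se(standard_error, dispersion):
    """Scale a binomial standard error by the square root of the dispersion."""
    return standard_error * math.sqrt(dispersion)

model_se = {"intercept": 0.40, "slope": 0.02}  # hypothetical uncorrected values

for label, phi in (("overdispersed", 1.69), ("underdispersed", 0.64)):
    corrected = {
        name: round(quasi_corrected_se(se, phi), 4)
        for name, se in model_se.items()
    }
    print(label, corrected)
```

A dispersion above 1 inflates the standard errors and a dispersion below 1 shrinks them, which matches the direction of the changes reported above.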
The use of quasi-likelihood to correct estimated standard errors is recommended by [
33] and has been adopted in seed germination studies [
18,
29], in entomology data [
34,
35], in the assessment of ecological data [
26], and in the modeling of the number and dry matter mass of Rhizobium nodules in bean culture [
28]. Therefore, there is a solid body of literature on the use of this methodology.
With the selected models (
Table 4), it is possible to estimate the germination times of interest to the researcher, using the formulas presented in (8), (9) and (10). The average germination time, or the time required for 50% of the seeds used in the germination experiments to germinate (T50), is considered the preferred parameter to describe the germination and physiological quality of seeds submitted to different treatments or to compare different batches of seeds [
7,
8,
9,
11,
14].
In the case of the Probit and Logit models, the T50 can be obtained easily, since the parameters β0 and β1 form a linear equation of the type y = a ± bx. Because both link functions are symmetric around zero, 50% germination corresponds to a linear predictor equal to zero, so setting the equation equal to zero and solving for time gives the T50. Thus, to calculate the T50, we can use the following formula:
$$T_{50} = -\frac{\beta_0}{\beta_1}$$
in which β0 is the intercept and β1 the slope.
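The inversion described above can be sketched for the three link functions considered in this work. The sketch assumes that the linear predictor is β0 + β1·t, with t being time on its original scale. The coefficient values and the helper names are hypothetical and are not taken from Table 4.

```python
import math
from statistics import NormalDist

# Link functions g(p) mapping a germination fraction to the linear predictor
LINKS = {
    "probit": lambda p: NormalDist().inv_cdf(p),
    "logit": lambda p: math.log(p / (1.0 - p)),
    "cloglog": lambda p: math.log(-math.log(1.0 - p)),
}

def time_to_fraction(p, beta0, beta1, link="logit"):
    """Time at which the fitted germination fraction equals p (0 < p < 1)."""
    return (LINKS[link](p) - beta0) / beta1

beta0, beta1 = -4.0, 0.25  # hypothetical intercept and slope
for link in LINKS:
    print(link, round(time_to_fraction(0.5, beta0, beta1, link), 2))
```

For the Probit and Logit links, g(0.5) = 0 and the expression reduces to −β0/β1. The Complementary log-log link is not symmetric, so g(0.5) is not zero and the full expression is needed.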
Table 5 shows the values for T50 obtained in the 10 hybrids and/or cultivars, adopting the models provided in
Table 4 and the expected interval for this parameter obtained experimentally. Since all the results estimated by the chosen functions are contained in the experimental intervals, we can consider the methodology of generalized linear models with the binomial distribution efficient for evaluating the germination of corn and soybeans.
Although some authors defend the use of linear models to estimate germination times, under the assumption of normality of the data [
12,
14,
36], a simple transformation of the germination percentages using a certain link function, such as the Probit model (inv. Norm function in Microsoft Excel), often does not allow the dataset to be linearized [
14,
16]. Thus, an approach that treats germination as a binary variable, in which seeds may or may not germinate, is more appropriate [
13,
18].
It is worth mentioning that another distinguishing feature of this work is that the calculated germination times are based on the number of viable seeds; that is, the T50 in this research is the time required for 50% of the viable seeds to germinate, so no additional formulas are required to calculate the actual number of germinated seeds.
In addition to the traditional T50, other parameters can be used to evaluate the germination of a seed batch. For example, it is possible to calculate the time for 10% of viable seeds to germinate, to identify whether two batches with the same final germination have the same germination uniformity, or even to treat germination as a global process rather than restricting the analysis to a few parameters that can lead to false conclusions about the physiological quality of a seed lot.
As an example of using other parameters to assess germination, we have the work of [
7,
11], which used the four-parameter Hill nonlinear function to estimate, in addition to T50, the maximum germination time and the germination uniformity, which is the time interval between two predefined germination percentages. The germination times of 10 and 90% of the seeds have also been calculated [
14,
37] to evaluate the physiological quality of seeds. However, the two most widespread statistics for evaluating the germination of a seed lot are the T50 and the germination uniformity [
7,
10,
37].
In contrast to the T50, there is no standardization for germination uniformity, which has been defined in several ways: U9010 (time for 90% germination minus time for 10% germination) [
37], U8416 (time for 84% germination minus time for 16% germination) [
10] and U7525 (time for 75% germination minus time for 25% germination) [
7], the latter being the most traditional.
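The three uniformity definitions listed above can be computed from a fitted model as the difference between two inverted germination times. The sketch below assumes a Logit model whose linear predictor is linear in time. The slope value and the function names are hypothetical.

```python
import math

def logit(p):
    """Logit link: log-odds of a germination fraction p (0 < p < 1)."""
    return math.log(p / (1.0 - p))

def uniformity(upper, lower, beta1, link=logit):
    """Time elapsed between the lower and upper germination fractions."""
    # The intercept cancels in the difference, so only the slope is needed.
    return (link(upper) - link(lower)) / beta1

beta1 = 0.25  # hypothetical slope (per hour)
definitions = {"U9010": (0.90, 0.10), "U8416": (0.84, 0.16), "U7525": (0.75, 0.25)}
for name, (upper, lower) in definitions.items():
    print(name, round(uniformity(upper, lower, beta1), 2))
```

Because the intercept cancels, uniformity under this assumption depends only on the slope and on the pair of fractions chosen, which is why values from different definitions are not directly comparable.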
Thus, seed lots or cultivars that exhibit the lowest values of T50 or any other germination time can be considered of higher physiological quality; the same reasoning is valid for germination uniformity [
7,
8,
37].
Currently, research involving the evaluation of seed germination to determine previously mentioned parameters largely uses non-linear models or some link function directly on the percentage data, without paying attention to the type of variable studied and its probability distribution, which often causes convergence problems or even severe errors in parameter estimation [
7,
10,
14]. However, when we use generalized linear models, these difficulties are overcome, as we are working with a simple linear equation. As a demonstration, the germination of the corn cultivar BRS 4103 was modeled by the Complementary log-log function, showing the T50 and the germination uniformity (see
Figure 2). The substitution of T50 = 37.59 in the equation shown in
Figure 2 will return the value of ~ −0.3665, corresponding to 50% germination as indicated in
Table 2.
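The value quoted above can be checked directly from the definition of the Complementary log-log link evaluated at a germination proportion of 0.5, assuming natural logarithms:

```latex
\log\bigl(-\log(1 - 0.5)\bigr) = \log(\ln 2) \approx \log(0.6931) \approx -0.3665
```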
Following selection of the link functions best suited to evaluate the germination of each plot, the germination times T10, T50 and T99 and the uniformity U7525 were determined for all sampling data. Analysis of variance was then performed, complemented with a means test (LSD), in order to compare the physiological potential of corn and soybean cultivars. According to the (modified) Shapiro–Wilk test [
25], the four parameters evaluated have a normal distribution. The results of the analysis of variance revealed that the F test was significant for all corn parameters, whereas, for soybean cultivars, only germination uniformity was not significant at 5% probability.
When evaluating T10, it was possible to observe that the corn cultivar 2A401PW showed the slowest germination, while the cultivars AL Bandeirante and BRS 4103 germinated faster. Among the soybean cultivars evaluated, CD2820 IPRO showed the slowest germination and differed from the other cultivars, taking twice the time to reach 10% germination compared with cultivars CD251 RR and CD2737 RR (
Table 6).
The behavior of the cultivars at T50 changed in relation to T10 only for cultivar 2B587 RR. The corn cultivars AL Bandeirante and BRS 4103 were again faster, reaching 50% germination in approximately 16 h. For soybeans, the cultivar CD2820 IPRO continued to be the least vigorous, whereas cultivars CD251 RR and CD2737 RR continued to exhibit greater physiological quality (
Table 6).
The time for 99% of the seeds to germinate was calculated in order to determine the behavior of hybrids and/or cultivars when they are close to 100% germination. The corn hybrid 2B587 RR, which did not differ statistically from the cultivar AS 1633 PRO3 at the previous times, differed from it by more than 22 h. This differentiation may be due to uniformity, since the corn hybrid 2B587 RR proved to be less uniform. The best-performing corn cultivar at T99 was AL Bandeirante, which was also the most uniform. For soybean cultivars at T99, CD2820 IPRO was confirmed as the least vigorous and CD251 RR and CD2737 RR as the cultivars with the highest physiological quality.
The lower or higher speed of germination of one cultivar in relation to the other is due to the time spent in the restoration of the damaged organelles and tissues before beginning the development of the embryonic axis, during the germination process [
8,
38]. According to [
38], cultivars or seed lots with higher germination speed and uniformity are considered the most vigorous.
The vigor of a seed lot can be defined as the sum of the properties that determine the activity and performance of seed lots of acceptable germination in a wide range of environments [
4,
5]. Thus, the identification of high-performance seed lots or cultivars is an important initiative for the success of agricultural production [
2].
Under these assumptions, we list the corn cultivars AL Bandeirante and BRS 4103 and the soybean cultivars CD251 RR and CD2737 RR as those of greater vigor based on germination times and uniformity. This statement is supported by several authors who, using a simple radicle count, managed to predict the vigor of several species [
8,
39,
40].
The higher vigor of hybrids and/or cultivars is evidenced when analyzing germination in a broader context, considering various germination times (T10, T20, T30, T40, T50, T60, T70, T80, T90 and T99) and the uniformity measure (U7525) through multivariate cluster analysis. In general, the cophenetic coefficient was above 0.90, indicating little distortion relative to the original data matrix [
41]. It was possible to verify the existence of three groups, in the dendrogram, for both corn and soybeans (
Figure 3).
The corn cultivars with the smallest distance were AL Bandeirante and BRS 4103, with a Euclidean distance of 0.96 (
Table 7); that is, they had similar physiological quality, confirming the results for T10, T50 and T99 (
Table 6). The greatest difference in physiological quality was observed between the cultivars BRS 4103 and 2B587 RR with Euclidean distance greater than 7 (
Table 7). Regarding soybean cultivars, the closest were cultivars CD2737 RR and CD251 RR, with a Euclidean distance of 0.15 (
Table 7), confirming the results introduced previously (
Table 5). Additionally, the greatest distance in physiological potential was observed between the cultivar CD2820 IPRO and cultivars CD251 RR and CD2737 RR, with a Euclidean distance of 8.02 (
Table 7).
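To make the distance comparison concrete, the sketch below computes pairwise Euclidean distances between germination profiles. The cultivar labels and values are hypothetical and do not reproduce Table 7. It is not stated here whether the variables were standardized before the distances were computed, so raw values are used as an assumption. The `math.dist` function requires Python 3.8 or later.

```python
import math
from itertools import combinations

# Hypothetical profiles, e.g. (T10, T50, T99, U7525) in hours
profiles = {
    "cultivar_A": (10.0, 16.0, 30.0, 5.0),
    "cultivar_B": (10.5, 16.4, 30.6, 5.3),
    "cultivar_C": (14.0, 21.0, 38.0, 7.5),
}

def pairwise_distances(profiles):
    """Euclidean distance for every pair of profiles."""
    return {
        (a, b): math.dist(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }

distances = pairwise_distances(profiles)
closest = min(distances, key=distances.get)
farthest = max(distances, key=distances.get)
print("closest:", closest, round(distances[closest], 2))
print("farthest:", farthest, round(distances[farthest], 2))
```

Pairs with small distances have similar germination profiles and would be joined first in a dendrogram built from this matrix.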