Use of Phenomics in the Selection of UAV-Based Vegetation Indices and Prediction of Agronomic Traits in Soybean Subjected to Flooding

Charleston dos Santos Lima
Darci Francisco Uhry Junior
Ivan Ricardo Carvalho
Christian Bredemeier
Department of Crop Science, Federal University of Rio Grande do Sul (UFRGS), Porto Alegre 91540-000, RS, Brazil
Department of Agriculture, Northwestern Regional University of the State of Rio Grande do Sul (UNIJUI), Ijuí 98700-000, RS, Brazil
Author to whom correspondence should be addressed.
AgriEngineering 2024, 6(3), 3261–3278
Submission received: 1 August 2024 / Revised: 30 August 2024 / Accepted: 5 September 2024 / Published: 10 September 2024
(This article belongs to the Section Remote Sensing in Agriculture)

Abstract


Flooding is a frequent environmental stress that reduces soybean growth and grain yield in many producing areas in the world, such as the United States, Southeast Asia, and Southern Brazil. In these regions, soybean is frequently cultivated in lowland areas in crop rotation with rice, which provides numerous technical, economic, and environmental benefits. In this context, the identification of the most important spectral variables for the selection of more flooding-tolerant soybean genotypes is a primary demand within plant phenomics, with faster and more reliable results enabled using multispectral sensors mounted on unmanned aerial vehicles (UAVs). Accordingly, this research aimed to identify the optimal UAV-based multispectral vegetation indices for characterizing the response of soybean genotypes subjected to flooding and to test the best linear model fit in predicting tolerance scores, relative maturity group, biomass, and grain yield based on phenomics analysis. Forty-eight soybean cultivars were sown in two environments (flooded and non-flooded). Ground evaluations and UAV-image acquisition were conducted at 13, 38, and 69 days after flooding and at grain harvest, corresponding to the phenological stages V8, R1, R3, and R8, respectively. Data were subjected to variance component analysis and genetic parameters were estimated, with stepwise regression applied for each agronomic variable of interest. Our results showed that vegetation indices behave differently in their suitability for more tolerant genotype selection. Using this approach, phenomics analysis efficiently identified indices with high heritability, accuracy, and genetic variation (>80%), as observed for MSAVI, NDVI, OSAVI, SAVI, VEG, MGRVI, EVI2, NDRE, GRVI, BNDVI, and RGB index. Additionally, variables predicted based on estimated genetic data via phenomics had determination coefficients above 0.90, enabling the reduction in the number of important variables within the linear model.

1. Introduction

Soybean [Glycine max (L.) Merr.] is one of the most relevant commodities on the global market, due to its high oil and protein contents, and is widely used for human and animal nutrition [1,2]. Between 2000 and 2020, global soybean production increased by around 50% in response to the expansion of the agricultural frontier to previously uncultivated regions [3]. This was possible due to the genetic improvement of plants, obtaining more productive cultivars adapted to new growing conditions [4].
Genotypes tolerant to flooded environments have been one of the demands of new production systems, as soybeans have been increasingly sown in lowland areas, which have historically been cultivated with rice (Oryza sativa). However, such areas have hydromorphic soils and are subjected to periods of flooding, generating a reduction of up to 56% in grain yield [5,6]. Therefore, the development of genotypes adapted to conditions of water excess is essential to enable cultivation in these areas.
Currently, the identification of soybean genotypes with increased flooding tolerance and productive stability is based on manual, laborious field measurements and visual observation [7], resulting in low throughput and low data reliability [8]. One alternative is high-throughput phenotyping based on multispectral sensors mounted on unmanned aerial vehicles (UAVs). This approach allows the collection of spectral information on the radiation reflected by plants at different wavelengths of the electromagnetic spectrum and its correlation with agronomic traits [9,10].
From the extraction of reflectance data, vegetation indices (VIs) can be calculated using a metric transformation of the data to measure the presence and state of vegetation [11]. Such index values are directly related to the characteristics of the photosynthetic response of vegetation to incident light. Therefore, healthy plants have high reflectance in the near-infrared region (>790 nm) and low reflectance in the red one (660 nm), due to the absorption peak of photosynthetic pigments, resulting in high normalized index values [12].
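As a minimal sketch of the normalized index described above, the snippet below computes NDVI from near-infrared and red reflectance. The reflectance values are hypothetical and chosen only to illustrate why a healthy canopy yields a higher value; they are not data from this study.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index from NIR and red reflectance (0-1)."""
    denom = nir + red
    if denom == 0:
        raise ValueError("NIR + red reflectance must be non-zero")
    return (nir - red) / denom


# Hypothetical reflectance values, for illustration only.
healthy = ndvi(nir=0.50, red=0.05)   # high NIR, strong red absorption
stressed = ndvi(nir=0.30, red=0.15)  # lower NIR, weaker red absorption
print(round(healthy, 2), round(stressed, 2))
```

The index is bounded between -1 and 1, so values from different flights can be compared on the same scale.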
Vegetation indices have been used to monitor crop development due to their relationship with morphological and physiological characteristics of vegetation. Several studies have shown that VIs can adequately estimate variables such as plant height, shoot biomass, and grain yield [13,14,15]. However, the prediction of parameters from mixed models with multiple variables and growing environments has now become indispensable for genotype selection [16,17]. This allows us to understand the relationship between vegetation indices and genetic components in breeding programs [18].
Methodologies based on restricted maximum likelihood (REML) and best linear unbiased prediction (BLUP) have been used to extract the genetic contribution to the evaluated traits and increase the accuracy of predictions [19]. In this way, the values are partitioned into genotype (G), environment (E), and G × E interaction effects [20]. However, there are no studies using genetic spectral data for the prediction and selection of soybean genotypes subjected to flooding. Therefore, understanding the association between variables obtained by multispectral imaging, such as vegetation indices (VIs), and agronomic traits of interest for flooding tolerance is crucial for breeding programs. This approach can identify easier-to-measure variables to be used in the indirect selection of genotypes for more expensive-to-measure traits, resulting in a faster, labor-saving, and large-scale selection process.
The focus of this study is the possibility of identifying promising spectral variables for the selection of soybean genotypes with increased flooding tolerance using the REML-BLUP methodology. The objective of this research was to identify the best vegetation indices for characterizing these genotypes and to test the best fit of the linear model in predicting the tolerance score, relative maturity group, shoot biomass, and grain yield from phenomics analysis.

2. Materials and Methods

2.1. Experimental Design

The field experiment was conducted at the Rice Experimental Station of the Riograndense Rice Institute, located in the municipality of Cachoeirinha, RS, southern Brazil (UTM Coordinates: 490000 Easting, 6690000 Northing, zone 22 S) (Figure 1). The soil of the experimental site is classified as a Haplic Gleysol (lowland soil), exhibiting the following physical and chemical characteristics in the 0–20 cm layer: clay = 190 mg dm−3; pH = 5.8; P = 20.6 mg dm−3; K = 74 mg dm−3; and organic matter = 14 g kg−1.
To ensure greater genetic variability among the tested materials, 48 soybean cultivars contrasting in relative maturity group (RMG) were sown under rainfed conditions, in a randomized block design in a 48 × 2 factorial scheme (48 cultivars and two environments, flooded or non-flooded), with four replications (Figure 1). The experimental units consisted of four rows spaced 0.5 m apart and six meters long, totaling 12 m².
The cultivars, whose seeds had received industrial fungicide treatment, were sown with a plot seeder in a furrow-ridge system on 18 November 2022, at a density of 36 plants m−2. They are listed below by relative maturity group. RMG < 6.0: BRMX 57K58, BRMX 57I59, BRMX 57IX60, DM 54IX57, DM 56I59, DM 60IX64, NEO 510, NEO 560, NEO 580, NEO 590I2X, CZ15B70 IPRO, ST580 I2X, ST592 IPRO, ST 599 IPRO, P95Y02 IPRO, P95R40 IPRO, P95Y42 IPRO, P95R95 IPRO, P95Y95 IPRO, B5560CE, B5595CE, GH2258 IPRO, M5710 I2X, and M5737 XT. RMG > 6.0: BRMX 64I61, DM 64I63, DM 66I68, DM 70I71, NEO 610, NEO 630, CZ16B17 IPRO, CZ26B12 I2X, ST611 IPRO, ST622 IPRO, TECIRGA6070RR, FTR3868, FTR4664, FTR2266, FTR3165, TMG 2264 IPRO, TMG 2165 IPRO, TMG 22X65 I2X, TMG 7061 PRO, BS IRGA 1642 IPRO, GH5993 IPRO, GH6433 I2X, M6130 I2X, and M6100 XTD.
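The size of the trial follows directly from the design: 48 cultivars × 2 environments × 4 blocks gives 384 plots of 12 m² each. The sketch below generates one possible randomization under the assumption that cultivars are shuffled independently within each environment and block; the source does not describe how the field layout was actually randomized, and the cultivar labels and the function name `randomize_trial` are placeholders.

```python
import random


def randomize_trial(cultivars, environments, n_blocks, seed=2022):
    """Shuffle cultivar order independently within each environment and block."""
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    layout = []
    for env in environments:
        for block in range(1, n_blocks + 1):
            order = list(cultivars)
            rng.shuffle(order)
            for position, cultivar in enumerate(order, start=1):
                layout.append((env, block, position, cultivar))
    return layout


# Placeholder labels standing in for the 48 cultivars.
cultivars = [f"cultivar_{i:02d}" for i in range(1, 49)]
layout = randomize_trial(cultivars, ["flooded", "non-flooded"], n_blocks=4)

plot_area_m2 = 4 * 0.5 * 6.0  # four rows, 0.5 m apart, 6 m long
print(len(layout), plot_area_m2)
```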
Fertilization consisted of the application of 250 kg ha−1 of triple superphosphate at sowing and 300 kg ha−1 of potassium chloride on topdressing, divided into two applications (at V1 and V4 soybean growth stages), for a yield expectation of 4 Mg ha−1. The other crop management practices were performed according to recommendations for soybean cultivation.

2.2. Field Evaluations

The different soybean genotypes were subjected to flooding stress at flowering (phenological stage R1), with a water layer of 5 to 10 cm above ground level. The duration of flooding stress was six days, according to the appearance of injury symptoms in the most sensitive cultivars, and then the area was immediately drained to remove excess water.
After drainage, plant response was assessed at 13, 38, and 69 DAF (days after flooding) by visual scoring according to the flooding tolerance score (FTS). The scoring followed the evaluation criteria described by [21], where a score of 1 indicates no apparent injury and a score of 9 indicates death of all plants in the plot (Figure 2).
At the same time, the height of 10 representative plants in each plot was measured (HEI_o) from the soil surface to the tip of the penultimate trifoliate leaf, with the data expressed in cm plant−1. Additionally, the relative chlorophyll content was estimated with a chlorophyll meter (Figure 3), with measurements carried out on 10 fully expanded, randomly chosen trifoliate leaves in each plot, which were also sampled to quantify the percentage of leaf nitrogen. Proximal NDVI was also determined with a spectroradiometer equipped with an active sensor (GreenSeeker, Trimble, Westminster, CO, USA, Figure 3), which emits radiation at wavelengths in the red (660 nm) and near-infrared (790 nm) regions. This procedure is used as a reference parameter for NDVI data collected from different crops.
Grain yield was determined by harvesting the two central rows of each plot over a length of five meters (area of 5 m²), at the R8 phenological stage (grain maturity). Samples were subjected to screening, cleaning, and weighing on a precision scale, with the values adjusted to 13% water content.
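The moisture adjustment and the conversion to a per-hectare basis can be sketched as follows. The formula is the conventional dry-matter correction, which is assumed here because the source does not state it; the function name `yield_kg_ha` and the sample weight and moisture values are hypothetical.

```python
def yield_kg_ha(sample_kg, moisture_pct, area_m2=5.0, standard_moisture_pct=13.0):
    """Grain yield in kg/ha, corrected to a standard grain moisture content."""
    # Keep dry matter constant while rescaling to the standard moisture.
    correction = (100.0 - moisture_pct) / (100.0 - standard_moisture_pct)
    adjusted_kg = sample_kg * correction
    return adjusted_kg * 10_000.0 / area_m2  # 10,000 m2 per hectare


# Hypothetical plot: 1.2 kg of grain harvested at 16% moisture from 5 m2.
print(round(yield_kg_ha(1.2, 16.0), 1))
```

A sample harvested wetter than 13% is scaled down, and a drier one is scaled up, so plots harvested at different moisture contents become comparable.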

2.3. Image Acquisition Using UAV

The collection of images for spectral evaluation of each genotype was carried out before flooding, at 13, 38, and 69 DAF (days after flooding) and on the day of grain harvest, when the plants were at the phenological stage V8 (eight fully developed nodes), R1 (onset of flowering), R3 (onset of pod formation), R5 (grain filling), and R8 (physiological maturity). The evaluation in both environments (flooded and non-flooded) was carried out on the same day, to avoid changes in flight conditions.
All flights for image collection were carried out with a DJI Phantom 4 RTK multispectral unmanned aerial vehicle (UAV) (Figure 3), with autonomous flight control via the Pix4D application. This platform acquired the multispectral reflectance of plants in the red (R: 660 nm), green (G: 550 nm), blue (B: 450 nm), red-edge (RE: 730 nm), and near-infrared (NIR: 840 nm) regions of the electromagnetic spectrum, with a resolution of 20 megapixels. The general settings consisted of 70% frontal and 70% lateral overlap, a flight height of 30 m above ground level, a camera angle of 90° (nadir), and a flight direction diagonal to the plots. The ground sample distance (GSD) of the processed orthomosaics was 2 cm.
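For a fixed camera, ground sample distance scales linearly with flight height, so the reported 2 cm at 30 m can be used to estimate the GSD at other altitudes. The sketch below assumes the same sensor and lens and flat terrain; the helper name `scaled_gsd_cm` is illustrative.

```python
def scaled_gsd_cm(reference_gsd_cm, reference_height_m, new_height_m):
    """Estimate GSD at a new flight height, assuming the same camera and flat terrain."""
    return reference_gsd_cm * new_height_m / reference_height_m


# Reference values from the text: 2 cm GSD at 30 m above ground level.
for height_m in (20.0, 30.0, 40.0):
    print(height_m, round(scaled_gsd_cm(2.0, 30.0, height_m), 2))
```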
The time for image collection was the same for all flights, which were carried out between 12:00 and 13:00. Additionally, the irradiance sensor (sunshine sensor) attached to the upper end of the UAV was used for radiometric correction of the images.
To estimate soybean plant height, the DJI Phantom 4 Advanced platform was used with an RGB camera with a 12-megapixel resolution. This was used to build the digital surface (DSM) and digital terrain (DTM) models, with a flight height of 20 m above ground level and frontal and lateral image overlap of 75%.
Image processing was carried out in Agisoft Metashape Professional version 2.0 to construct one orthomosaic per flight. The orthomosaics were georeferenced with seven ground control points (GCPs) surveyed with GNSS-RTK for geometric correction of the photos. To extract spectral information from the orthomosaics, QGIS version 3.28 and the "raster calculator" tool were used to separate the spectral bands present in each image. The two central rows of each plot, over a length of four meters, were then delimited with a shapefile vector layer, and the digital number (reflectance) of each spectral band was extracted to calculate the different vegetation indices (VIs) in the visible and multispectral regions (Table 1).
The VIs used in the present study were chosen because they are widely used in high-throughput plant phenotyping procedures and, in general, in remote sensing studies of vegetation [15]. Furthermore, the VIs shown in Table 1 include the most important bands of the electromagnetic spectrum related to vegetation studies, e.g., the visible bands (blue, green, and red) as well as the infrared ones, e.g., red-edge and near-infrared (NIR).
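To illustrate how per-plot band reflectances are turned into indices, the sketch below implements commonly published formulations of several indices named in this study. These formulations are an assumption: Table 1 is the authoritative source for the definitions actually used, and some indices (OSAVI, for example) have more than one variant in the literature. The reflectance values are hypothetical.

```python
import math


def vegetation_indices(blue, green, red, red_edge, nir):
    """Common formulations of selected indices from band reflectances (0-1)."""
    return {
        "NDVI": (nir - red) / (nir + red),
        "NDRE": (nir - red_edge) / (nir + red_edge),
        "BNDVI": (nir - blue) / (nir + blue),
        "SAVI": 1.5 * (nir - red) / (nir + red + 0.5),   # soil factor L = 0.5
        "OSAVI": (nir - red) / (nir + red + 0.16),
        "MSAVI": (2 * nir + 1 - math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2,
        "EVI2": 2.5 * (nir - red) / (nir + 2.4 * red + 1),
        "MGRVI": (green ** 2 - red ** 2) / (green ** 2 + red ** 2),
    }


# Hypothetical mean reflectances for one plot polygon.
vi = vegetation_indices(blue=0.04, green=0.08, red=0.05, red_edge=0.25, nir=0.50)
for name, value in vi.items():
    print(f"{name}: {value:.3f}")
```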
The estimated plant height (HEI_e) followed the same image-processing procedure used to obtain the orthomosaics. In QGIS, the two central rows of each plot were delimited by the vector layer, and the digital terrain model (DTM) was subtracted from the digital surface model (DSM), generating the plant height raster layer. To reduce the effect of the soil on the image, the orthomosaic was binarized with values of 0 (soil/background) and 1 (vegetation). After that, the plant height raster layer was divided by the binary layer, and the estimated average plant height per plot was obtained. The summarized workflow diagram of the procedures used in the present study is shown below (Figure 4).
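Canopy height is conventionally obtained as the surface elevation (DSM) minus the terrain elevation (DTM), averaged over vegetation pixels only. The sketch below reproduces that logic on a tiny grid in plain Python; the elevation values, the mask, and the function name `mean_canopy_height` are hypothetical, and a real workflow would operate on georeferenced rasters.

```python
def mean_canopy_height(dsm, dtm, vegetation_mask):
    """Mean of (DSM - DTM) over pixels flagged as vegetation (mask value 1)."""
    heights = []
    for dsm_row, dtm_row, mask_row in zip(dsm, dtm, vegetation_mask):
        for surface, terrain, is_vegetation in zip(dsm_row, dtm_row, mask_row):
            if is_vegetation == 1:  # soil/background pixels (0) are skipped
                heights.append(surface - terrain)
    if not heights:
        raise ValueError("no vegetation pixels in this plot")
    return sum(heights) / len(heights)


# Hypothetical 2 x 2 plot: elevations in meters, one soil pixel.
dsm = [[10.60, 10.05], [10.70, 10.65]]
dtm = [[10.00, 10.00], [10.00, 10.05]]
mask = [[1, 0], [1, 1]]
print(round(mean_canopy_height(dsm, dtm, mask), 2))
```

Skipping the pixels flagged 0 has the same effect as excluding the undefined results produced when the height layer is divided by a binary layer containing zeros.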

2.4. Statistical Analysis

The data were first checked against the assumptions of the statistical model (additivity, normality, and homogeneity of residual variances). Afterwards, deviance analysis was carried out at a 5% probability level using the χ² test. Additionally, the Akaike information criterion and the convergence of the restricted maximum likelihood (REML) model were analyzed.
The model that best represented the data was Yij = μ + Bj + Gi + Ej + εij, where Yij is the value measured for each variable in each plot; Bj is the (fixed) block effect; Gi is the (random) genetic effect attributed to the cultivars; Ej is the (fixed) environment effect; and εij is the effect associated with the experimental error.
The variables for which the complete model was significant were used to estimate the phenotypic variance (σ2P), broad-sense heritability (H2), genotype mean heritability, treated here as narrow-sense heritability (h2mg), coefficient of determination of the genotype × environment interaction (GEIr2), accuracy (Ac), genotypic correlation between environments (rgloc), genotypic coefficient of variation (CVg), and coefficient of variation of the experimental error (CVe) for each biometric and spectral variable.
Based on the REML estimates, the best linear unbiased predictor (BLUP) was obtained, weighted by the significance of the genotype × environment (G × E) interaction model. A new mean was thus obtained for each variable, free of the effect of the environment (flooded or non-flooded). Furthermore, to verify which variables are the most important for predicting tolerance scores, plant height, shoot biomass, and grain yield, a backward stepwise regression model was applied to remove collinear or unimportant predictors. The dependent variables were fixed, and the explanatory variables were submitted to the multiple linear regression model (y = x1 … xn), using the stepwise procedure at the 5% level by the t-test, according to the methodology proposed by [22]. Additionally, the independence of the variables and the absence of variance inflation and collinearity were assumed.
All statistical analyses were performed in R version 4.3 (via RStudio), using the native lm (linear model) function and the metan and ggplot2 packages [23].
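The source does not give formulas for these parameters. The sketch below uses definitions that are conventional in the REML/BLUP literature for multi-environment trials, so it should be read as an assumption about how such quantities are typically derived from variance components, not as the exact computation performed here. The variance components and the grand mean are hypothetical.

```python
import math


def genetic_parameters(var_g, var_ge, var_e, grand_mean, n_env, n_rep):
    """Conventional genetic parameters from genotype, G x E, and residual variances."""
    var_p = var_g + var_ge + var_e  # phenotypic variance
    h2mg = var_g / (var_g + var_ge / n_env + var_e / (n_env * n_rep))
    cvg = 100.0 * math.sqrt(var_g) / grand_mean
    cve = 100.0 * math.sqrt(var_e) / grand_mean
    return {
        "H2": var_g / var_p,                   # broad-sense heritability
        "GEIr2": var_ge / var_p,               # G x E coefficient of determination
        "h2mg": h2mg,                          # heritability on a genotype-mean basis
        "Ac": math.sqrt(h2mg),                 # selective accuracy
        "rgloc": var_g / (var_g + var_ge),     # genotypic correlation across environments
        "CVg": cvg,
        "CVe": cve,
        "CVg/CVe": cvg / cve,
    }


# Hypothetical variance components; 2 environments and 4 replications as in the trial.
params = genetic_parameters(var_g=4.0, var_ge=1.0, var_e=5.0,
                            grand_mean=50.0, n_env=2, n_rep=4)
for name, value in params.items():
    print(f"{name}: {value:.3f}")
```

Because h2mg divides the interaction and residual variances by the number of environments and replications, it can be much larger than H2 for the same trait, which is consistent with the contrast reported between the two parameters.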
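The effect of BLUP on genotype means can be illustrated with the simplest case: for balanced data and a single random genotype effect, the predicted mean is the grand mean plus the deviation of the genotype mean shrunk by the mean-basis heritability. This is a simplification and an assumption; the full REML/BLUP fit with a G × E term used in the study is not reproduced here. Genotype labels and values are hypothetical.

```python
def blup_genotype_means(genotype_means, h2mg):
    """Shrink observed genotype means toward the grand mean (balanced-data case)."""
    grand_mean = sum(genotype_means.values()) / len(genotype_means)
    return {
        genotype: grand_mean + h2mg * (mean - grand_mean)
        for genotype, mean in genotype_means.items()
    }


# Hypothetical grain yields (kg/ha) for three genotypes.
observed_means = {"G1": 3000.0, "G2": 2200.0, "G3": 1400.0}
blup_means = blup_genotype_means(observed_means, h2mg=0.75)
print(blup_means)
```

The ranking of genotypes is preserved, but extreme means are pulled toward the average, more strongly when heritability is low.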

3. Results

3.1. Analysis of Variance Components

The deviance analysis showed a significant effect for the G × E interaction (p < 0.05) for most of the variables analyzed (Table 2), except for the indices BNDVI, EVI2, MSAVI, NDVI, NDVI_G, NDVIRE, OSAVI, RGBindex, SAVI, and grain yield. Therefore, it was evident that some VIs can present stability even in different environments, presenting a genotype-dependent spectral response.
The analysis of variance components and estimated genetic parameters are presented in Table 3. In general, broad sense heritability (H2) and genotypic correlation between environments (rgloc) presented values lower than 60% for most parameters analyzed. This suggests a greater contribution from the heritability of the average genotype (h2mg), G × E interaction (GEIr2), genotypic coefficient of variation (CVg), and error coefficient of variation (CVe).
Considering h2mg for the analyzed variables, high magnitude values (>80%) were found for the vegetation indices MSAVI, NDVI, OSAVI, SAVI, VEG, NDVIRE, MGRVI, EVI2, NDRE, GRVI, BNDVI, RGBindex, EXG, and NDB, which also showed a lower effect of the genotype–environment interaction (GEIr2), high accuracy (Ac) (>70%), and a ratio of the genotypic coefficient of variation and error (CVg/CVe) close to or greater than 1. These factors indicate high reliability in the selection of genotypes based on these evaluated characteristics.
Biometric variables such as HEI_o, HEI_e, biomass, thousand grain weight (TGW), and grain yield (GY) presented intermediate values regarding the heritability of the characteristics (64% < h2mg < 80%). In this scenario, a greater contribution of the G × E interaction to the expression of the characteristic was also observed (6% < GEIr2 < 30%). However, the accuracy and CVg/CVe ratio maintained satisfactory values.
Low percentages of heritability and accuracy (<60%) were observed for the vegetation indices SRNIRRE, VARI, CVI, GVI, SFDVI, GDVI, and CIG, with a higher percentage of the G × E interaction and coefficient of variation of the experimental error in the respective variables. Such results were also observed for the relative content of total chlorophyll (CHLT), leaf nitrogen content (Leaf N), plants with pods at harvest (Plants_L), and dead plants (Plants_M). This implies low reliability in the use of such characteristics for the selection of soybean genotypes in the field.

3.2. Best Linear Unbiased Prediction (BLUP) Analysis

Considering the need to quickly evaluate a large number of cultivars based on spectral characteristics, a BLUP analysis was carried out for the vegetation indices (VIs) that presented the best values of h2mg, CVg, and CVg/CVe ratio, and also for HEI_o, biomass, and grain yield (Figures 5–11).
It was evidenced by the BLUP analysis that the cultivars DM66I68, TMG2165, CZ26B12, FTR3165, BRMX64I61, and GH5993 IPRO presented the highest values for the vegetation indices MSAVI, MGRVI, RGBindex, and EXG under flooding conditions. These genotypes also presented higher average values for plant height, plant shoot biomass, and grain yield (60 cm, 2000 kg ha−1, and 1000 kg ha−1, respectively).
Cultivars that presented low values for the MSAVI, MGRVI, RGBindex, and EXG indices in the flooded environment (DM 54IX57, P95Y42 IPRO, B5560CE, ST TECIRGA6070, P95Y02 IPRO, and BRMX 57K58) also showed a reduction of 30% in plant height, and of 60% in plant shoot biomass and grain yield, in relation to the best genotypes. As a result, these cultivars achieved grain yields below 200 kg ha−1.
In the environment without excess water (non-flooded), there was a general increase in the magnitude of the values observed for the vegetation indices and biometric variables. However, the most productive genotypes maintained the same behavior, obtaining higher values for the MGRVI and RGBindex indices, mainly. The absence of flooding allowed average values to be obtained of 80 cm, 4000 kg ha−1, and 3500 kg ha−1 for HEI_o, shoot biomass, and grain yield, respectively (Figure 8, Figure 9 and Figure 10).
When the flooding-sensitive cultivars DM 54IX57, P95Y42 IPRO, B5560CE, ST TECIRGA6070, P95Y02 IPRO, and BRMX 57K58 were evaluated in an environment without flooding, it was found that they also presented lower values for the vegetation indices of greater genetic heritability SAVI, OSAVI, NDVI, VEG, RGBindex, and EXG (Figure 5, Figure 6, Figure 7 and Figure 8). Therefore, these cultivars presented HEI_o, shoot biomass, and grain yield close to the population average of 65 cm, 2500, and 2200 kg ha−1, respectively (Figure 9, Figure 10 and Figure 11).
The general average grain yield of the genotypes in the environments was 2200 kg ha−1. Therefore, a reduction of up to 80% in productivity was observed when the cultivars were subjected to flooding (Figure 11). Additionally, it was found that cultivars with low grain yield potential in the stress-free (non-flooded) environment (P95Y02, P95Y04, and BRMX 57K58) maintained the same behavior when exposed to flooding, presenting the greatest grain yield losses.
The prediction of important agronomic traits in the evaluation of soybean cultivars subjected to flooding (Table 4), and the relationship between observed and predicted genetic data (REML-BLUP analysis) using linear models for tolerance score (FTS), shoot biomass, and grain yield, were also analyzed (Figure 12). The coefficient of determination for predicting the flooding tolerance score (FTS) was one of the lowest among the predicted traits (0.78), with 11 significant variables retained by stepwise regression. On the other hand, shoot biomass and grain yield showed coefficients of determination above 0.9, with variations depending on whether the data were genetic values (estimated via REML-BLUP) or phenotypic values (collected directly in the field).
The coefficient of determination for shoot biomass and grain yield with phenotypic values was 0.91 and 0.94, respectively. However, it was found that the prediction based on genetic values tends to better estimate the behavior of biomass and productivity, with r2 of 0.95 and 0.99, respectively. Additionally, it was found that the prediction based on estimated genetic values allowed the reduction in the number of variables used within the model, making it more parsimonious.
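For reference, the coefficient of determination compares the residual sum of squares of the predictions with the total sum of squares of the observations. The sketch below shows the calculation on hypothetical observed and predicted yields; it does not use data from Table 4 or Figure 12.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


# Hypothetical grain yields (kg/ha).
observed = [1000.0, 2000.0, 3000.0, 4000.0]
predicted = [1100.0, 1900.0, 3100.0, 3900.0]
print(round(r_squared(observed, predicted), 3))
```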
The most important vegetation indices for predicting plant shoot biomass were EVI2, NDRE, NDVI, and NDVIRE, which were completely different from the indices selected for predicting grain yield, namely, CIRE, GLI, MGRVI, OSAVI, RGBindex, and RVI (Table 4).
The relative maturity group (RMG) of the cultivars was also predicted, as an indirect measure of the duration of the growth cycle of the tested materials (Table 4). It was observed that the prediction of RMG from vegetation indices obtained a coefficient of determination close to 0.60, requiring 14 predictors through stepwise regression, similar to the condition observed for predicting flooding tolerance, which demonstrates the greater difficulty of estimating categorical variables via machine learning.

4. Discussion

Under flooding stress conditions, soybean cultivars tend to exhibit distinct morphological, physiological, and spectral behaviors [24,25]. This trend was also verified in the present study, as all genotypes showed a reduction in plant height, shoot biomass, grain yield, and magnitude of vegetation indices. However, some VIs maintained a genotype-dependent response, with high heritability regardless of the cultivation environment.
Broad-sense heritability (H2) is defined as the proportion of phenotypic variance attributable to total genetic variance, which is the sum of the additive, dominance, and epistatic variances [26]. The additive variance is the only fraction that is passed on in plant crosses, and it can be evaluated in quantitative genetics through narrow-sense heritability (h2mg) [27]. It is therefore relevant to verify which parameters of the present study have a greater or lesser genetic contribution.
Vegetation indices such as MSAVI, NDVI, VEG, OSAVI, SAVI, MGRVI, EVI2, NDRE, RGBindex, EXG, and NDB showed h2mg > 80%, Ac > 90%, and a CVg/CVe ratio close to 1 (Table 3). In this sense, ref. [28] postulated that heritability above 80%, accuracy above 70%, and a CVg/CVe ratio equal to or greater than 1 indicate data reliability and genetic variation for the trait, allowing their use for the indirect selection of genotypes due to less environmental influence. Furthermore, such values indicate that fewer genes may control the expression of the parameter and that CVg is not dependent on the G × E interaction [29,30]. These factors increase the efficiency of breeding programs by including secondary characteristics such as VIs [31].
Recently, ref. [32] evaluated the heritability of vegetation indices in 28 soybean genotypes. These authors observed heritabilities of 79% and 82% for the SAVI and EVI indices, respectively, and reported gains from indirect selection of up to 20% when plants were selected at the V8 phenological stage based on the vegetation indices. Additionally, ref. [33] evaluated the behavior of 456 wheat cultivars regarding the NDVI, NDRE, and RVI indices in four environments. The authors found that the genetic values of the indices estimated with the REML-BLUP methodology made it possible to accurately select 66% of the most productive genotypes, with NDVI and NDRE being the indices with the highest genetic heritability and high correlation with grain yield, similar to the data obtained in the present study. Therefore, genetic values estimated for VIs via mixed models from soybean cultivars in different environments (flooded or non-flooded) can be used as secondary selection variables. This is due to the removal of effects from the growing environment [34,35].
According to the REML-BLUP methodology, variables with a high percentage of G × E interaction and a high CVe (CHLT, N_leaf, Plants_L, GLI, SRNIRRE, VARI, CVI, GVI, SFDVI, EXGRaw, GDVI, and CIG) did not give consistent results in the evaluation of genotypes between environments (Table 3). This may be associated with significant changes that excess water can cause in the leaf area, number of pods plant−1, and photosynthetic pigments, in addition to modifying radiation reflectance at wavelengths of the visible (400–700 nm), near-infrared (760–900 nm), and short-wave infrared (1600–1850 nm) regions of the electromagnetic spectrum [25]. Previous studies [36,37] reported a better characterization of plants under stress when using the red, red-edge, and infrared spectral bands.
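The three thresholds attributed to ref. [28] can be applied as a simple screening rule. The sketch below filters a table of genetic parameters with those cut-offs; the index names and parameter values are hypothetical placeholders rather than entries from Table 3, and the function name `select_indices` is illustrative.

```python
def select_indices(parameters, min_h2mg=0.80, min_accuracy=0.70, min_cv_ratio=1.0):
    """Return index names meeting the heritability, accuracy, and CVg/CVe cut-offs."""
    return sorted(
        name
        for name, p in parameters.items()
        if p["h2mg"] > min_h2mg
        and p["Ac"] > min_accuracy
        and p["CVg/CVe"] >= min_cv_ratio
    )


# Hypothetical genetic parameters for three unnamed indices.
hypothetical = {
    "index_A": {"h2mg": 0.88, "Ac": 0.94, "CVg/CVe": 1.10},
    "index_B": {"h2mg": 0.55, "Ac": 0.74, "CVg/CVe": 0.60},
    "index_C": {"h2mg": 0.83, "Ac": 0.91, "CVg/CVe": 1.00},
}
print(select_indices(hypothetical))
```

Since the text accepts a CVg/CVe ratio "close to 1", the `min_cv_ratio` argument can be relaxed slightly when a strict cut-off would exclude borderline indices.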
Due to the ability of water stress to mainly affect photosynthetic pigments and relative water content in the leaf, reflectance in the blue, red, and infrared regions of the electromagnetic spectrum is significantly higher under stress conditions [38]. Therefore, indices such as MSAVI, SAVI, OSAVI, NDRE, NDVI, EXG, VEG, MGRVI, and RGBindex can provide better characterization of plants in different environments, since they integrate spectral bands sensitive to photosynthetic changes and water content in the leaf [39,40].
From the stepwise analysis for character prediction, it was evident that RMG was the variable with the lowest fit in the linear model (0.57), requiring 14 predictors in the analysis. In this sense, ref. [13] also found the greater complexity of the correlation between VIs and the duration of soybean crop cycle, which obtained an r2 = 0.70, 30% residual effect, and a significant contribution only from the NDVI, SAVI, GNDVI, and NDRE indices. Additionally, FTS was the variable that also presented the lowest model fit (0.78). This fact can be considered acceptable due to the impact of flooding time (DAF) and flight height on image collection. From this, it is possible to obtain better results with 6 to 12 DAF and a flight altitude of 20 m [24,41]. This was also followed in the present study, except for the flight height (30 m), in addition to the factor associated with the recovery time of the cultivars, which can change the FTS throughout the cycle.
The use of phenotypic values or estimated genetic values for predicting shoot biomass and grain yield gave similar results (Table 4). Both presented a coefficient of determination higher than 0.90. However, the number of variables needed differed: a reduced model was obtained with the genetic values estimated via REML-BLUP, with reduced processing time and complexity of the analysis.
Additionally, the comparison of predicted versus observed data (Figure 12) demonstrated that the values were well fitted by the linear model, even with a smaller number of variables in the model with genetic data (REML-BLUP). Furthermore, more robust prediction models could be applied as a strategy to evaluate parameters with high variability, such as shoot biomass and grain yield.
Efficiency in predicting variables within any model is essential, as it reduces data collection time and avoids using parameters with high collinearity. By applying the REML-BLUP methodology to spectral variables, the effect of the environment is removed and only the genetic weight is attributed to the variable [20]. This allows the maximization of selection accuracy and identification of variables with a greater additive genetic effect [34]. Moreover, it enables the use of VIs that have stability between environments and high genetic heritability. However, there are still few studies on this topic, which justifies the relevance of the study for high-throughput soybean phenotyping based on vegetation indices and the REML-BLUP methodology.
Thus, high-throughput phenotyping using multispectral sensors mounted on unmanned aerial vehicles (UAVs) can be applied through the use of vegetation indices with greater genetic heritability and lower environmental effect (e.g., MSAVI, NDVI, SAVI, OSAVI, VEG, and MGRVI). This enables a faster and more accurate selection of soybean genotypes with greater tolerance to flooding, since the usual methods used in this selection, such as tolerance scores or destructive shoot biomass samples, are time-consuming and laborious. In this sense, the tools and VIs shown in our study can be used in a complementary way in a plant breeding program, both for predicting variables and for genotype selection.
One of the main limitations relates to the compilation of data on a larger scale, which could be minimized by creating a phenotyping package to automate data collection on a greater number of genotypes and, especially, different sites. To minimize the effect of the environment on the methodology proposed in the present study, it is crucial to standardize the procedures for acquiring and processing images obtained by a multispectral UAV-based sensor, such as the radiometric correction by the irradiance sensor, the geometric correction using ground control points (GCPs), and time of flight. This standardization was undertaken in our study and must be observed in subsequent studies.

5. Conclusions

  • The environmental factor exerts a significant influence on vegetation indices, which limits their large-scale use in characterizing soybean genotypes with differential tolerance to flooding using multispectral sensors mounted on unmanned aerial vehicles.
  • The phenomic analysis was efficient for identifying vegetation indices with greater heritability, accuracy, and genetic variation, namely, MSAVI, NDVI, SAVI, OSAVI, VEG, MGRVI, EVI2, GRVI, RGBindex, EXG, NDB, and HEI_e. These variables can be used in subsequent studies to develop predictive models of agronomic variables linked to the response of soybeans to flooding with high accuracy.
  • The REML-BLUP methodology, which characterizes the behavior of each spectral variable as a function of the environment and the evaluated genotypes, allows the identification of predictors (VIs) with greater genetic heritability and their use in the selection of soybean genotypes tolerant to flooding.
  • Variables predicted from genetic data estimated via phenomics had coefficients of determination above 0.90, which allows a reduction in the number of important variables within the linear prediction model.

Author Contributions

C.d.S.L.: conceptualization, investigation, conducting and collecting data from the experiment and writing—original draft. D.F.U.J.: conceptualization, methodology and supervision. I.R.C.: formal analysis, methodology and writing—review and editing. C.B.: conceptualization, investigation, supervision and writing—review and editing. All authors have read and agreed to the published version of the manuscript.


Funding

This research received funding from the National Council for Scientific and Technological Development (CNPq) for the Ph.D. scholarship of the first author, process number: 140742/2021-2.

Data Availability Statement

Data are contained within this article.


Acknowledgments

The authors thank the Riograndense Rice Institute (IRGA).

Conflicts of Interest

The authors declare no conflicts of interest.


Figure 1. Location and experimental design used to evaluate 48 soybean genotypes subjected to two environments (flooded or non-flooded), during the 2022/23 growing season.
Figure 2. Visual rating scale referring to the flooding tolerance scores (FTS). 1: no symptoms of injury and 9: death of all plants in the plot.
Figure 3. Platforms and sensors used to collect spectral data.
Figure 4. Workflow diagram of the procedures used in the present study.
Figure 5. BLUP analysis with predicted genetic data for the MSAVI index.
Figure 6. BLUP analysis with predicted genetic data for the MGRVI index.
Figure 7. BLUP with the predicted genetic data for the RGB index.
Figure 8. BLUP with the predicted genetic data for the EXG index.
Figure 9. BLUP with estimated genetic data for plant height (cm).
Figure 10. BLUP with genetic data estimated for shoot biomass (kg ha−1).
Figure 11. BLUP with predicted genetic data for grain yield (kg ha−1).
Figure 12. Relationship between observed and predicted genetic data (REML-BLUP analysis) using linear models for flooding tolerance score, FTS (a), shoot biomass (b), and grain yield (c).
Table 1. Vegetation indices (VIs) used to analyze soybean cultivars.
| Sensor | Vegetation index | Equation |
|---|---|---|
| RGB | Excess green | EXG = 2 * (G/(R + G + B)) − (R/(R + G + B)) − (B/(R + G + B)) |
| RGB | Excess raw green | EXGRaw = 2 * G − R − B |
| RGB | Excess red | EXR = (1.4 * R − G)/(R + G + B) |
| RGB | Excess raw red | ExRRaw = 1.4 * R − G |
| RGB | Vegetation extraction color index | CIVE = (0.44 * R − 0.81 * G + 0.39 * B)/(R + G + B) + 18.79 |
| RGB | Normalized green–red index | GRVI = (G − R)/(G + R) |
| RGB | Excess modified red | ExRM = (2 * R − G − B)/(R + G + B) |
| RGB | Vegetative index | VEG = g/((r^a) * (b^(1 − a))), with a = 0.667, r = R/(R + G + B), g = G/(G + R + B), b = B/(R + G + B) |
| RGB | Excess blue | ExB = (1.4 * B − G)/(R + G + B) |
| RGB | Normalized blue difference | NDB = (G − B)/(G + B) |
| RGB | Atmospheric resistant index in visible | VARI = (G − R)/(G + R + B) |
| RGB | Green leaf index | GLI = ((G − R) + (G − B))/(2 * G + R + B) |
| RGB | Triangular green index | TGI = (−0.5) * [(190) * (R − G) − (120) * (R − B)]/(R + G + B) |
| RGB | Red–blue index | RBI = (R − B)/(R + B) |
| RGB | Modified green–red index | MGRVI = (G^2 − R^2)/(G^2 + R^2) |
| RGB | RGB index | RGBindex = (G^2 − B * R)/(G^2 + B * R) |
| Multispectral | Normalized difference | NDVI = (NIR − R)/(NIR + R) |
| Multispectral | Edge of red | NDRE = (NIR − RE)/(NIR + RE) |
| Multispectral | Soil-adjusted index | SAVI = (1 + 0.5) * [(NIR − R)/(NIR + R + 0.5)] |
| Multispectral | Modified soil-adjusted index | MSAVI = (2 * NIR + 1 − sqrt((2 * NIR + 1)^2 − 8 * (NIR − R)))/2 |
| Multispectral | Optimized soil-adjusted index | OSAVI = (1 + 0.5) * ((NIR − R)/(NIR + R + 0.5)) |
| Multispectral | Improved vegetation index | EVI = 2.5 * (NIR − R)/(NIR + 6 * R − 7.5 * B + 1) |
| Multispectral | Normalized difference from green | GNDVI = (NIR − G)/(NIR + G) |
| Multispectral | Chlorophyll-green index | CIG = NIR/G − 1 |
| Multispectral | Red-edge chlorophyll index | CIRE = NIR/RE − 1 |
| Multispectral | Simple ratio index | RVI = NIR/R |
| Multispectral | Vegetation difference index | DVI = NIR − R |
| Multispectral | Green vegetation difference index | GDVI = NIR − G |
| Multispectral | Green ratio vegetation index | GVI = NIR/G |
| Multispectral | Vegetation-chlorophyll index | CVI = NIR * (R/G^2) |
| Multispectral | Improved vegetation index 2 | EVI2 = 2.5 * (NIR − R)/(NIR + 2.4 * R + 1) |
| Multispectral | Normalized blue difference | BNDVI = (NIR − B)/(NIR + B) |
| Multispectral | R-Re normalized difference | NDVIRE = (RE − R)/(RE + R) |
| Multispectral | PanNDVI | PanNDVI = (NIR − (G + R + B))/(NIR + (G + R + B)) |
| Multispectral | Simple NIR/RE ratio | SRNIRRe = NIR/RE |
| Multispectral | Spectral feature depth | SFDVI = ((NIR + G)/2) − ((R + NIR)/2) |
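As a minimal sketch of how a few of the indices in Table 1 can be computed, the function below takes band reflectances for a single pixel or plot mean. The function name, argument names, and sample reflectance values are illustrative assumptions rather than part of the study's processing chain, and MSAVI is written in its usual squared form.

```python
import math


def vegetation_indices(nir, red, green, blue, red_edge):
    """Compute selected Table 1 indices for one pixel or plot mean.

    Inputs are reflectances (0-1). Names are hypothetical, chosen
    only for this illustration.
    """
    g2, r2 = green ** 2, red ** 2
    return {
        "NDVI": (nir - red) / (nir + red),
        "NDRE": (nir - red_edge) / (nir + red_edge),
        "SAVI": (1 + 0.5) * (nir - red) / (nir + red + 0.5),
        # Usual MSAVI form: the term (2*NIR + 1) is squared
        "MSAVI": (2 * nir + 1
                  - math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2,
        "EVI2": 2.5 * (nir - red) / (nir + 2.4 * red + 1),
        "GRVI": (green - red) / (green + red),
        "MGRVI": (g2 - r2) / (g2 + r2),
        "RGBindex": (g2 - blue * red) / (g2 + blue * red),
    }


# Made-up reflectances for a healthy canopy (not data from the study)
vi = vegetation_indices(nir=0.50, red=0.10, green=0.20, blue=0.05,
                        red_edge=0.30)
```

In practice the same expressions would be applied band-wise to orthomosaic rasters and averaged per plot; the scalar form is used here only to keep the example self-contained.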
Table 2. Deviance analysis (p < 0.05) using the restricted maximum likelihood method (REML) in 48 soybean genotypes subjected to two environments.
G × E  0.000  0.000  0.009  0.212  0.000  0.000  0.001
G × E  0.000  0.000  0.999  0.034  0.000  0.000  0.000
G × E  0.000  0.003  0.000  0.007  0.984  0.002  0.015
G × E  0.025  0.999  1.000  0.155  1.000  0.002  0.000
G × E  0.000  0.000  0.161  0.347  0.000  0.999  0.000
G × E  0.000  0.000  0.000  0.000
Gen: genotype effect; G × E: effect of the genotype-environment interaction. HEI_e: estimated height; HEI_o: observed height; N_leaf: nitrogen in leaf tissue; Plants_L: plants with pods at harvest; Plants_M: dead plants; TGW: thousand grain weight; GY: grain yield.
Table 3. Analysis of variance components and estimated genetic parameters.
| Parameters | σ²P | H² | GEIr² | h²mg | Ac | rgloc | CVg | CVe | CVg/ |
| Leaf N | 0. |
σ²P: phenotypic variance; H²: broad heritability; GEIr²: genotype × environment interaction; h²mg: mean heritability of genotypes; Ac: accuracy; rgloc: genotypic relationship between environments; CVg: genetic coefficient of variation; CVe: error coefficient of variation.
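The parameters listed above follow from the variance components of the mixed model. The sketch below uses the usual textbook definitions for a genotype × environment trial; these expressions, the function names, and the numeric inputs (including the number of replicates) are assumptions made for illustration and are not quoted from the study. The second function shows the shrinkage form that BLUP takes for balanced data, where a genotype's deviation from the grand mean is weighted by the mean heritability.

```python
import math


def genetic_parameters(var_g, var_ge, var_e, n_env, n_rep, grand_mean):
    """Genetic parameters from variance components (assumed definitions).

    var_g: genotypic variance; var_ge: G x E variance; var_e: residual.
    """
    var_p = var_g + var_ge + var_e  # phenotypic variance
    h2_mg = var_g / (var_g + var_ge / n_env + var_e / (n_env * n_rep))
    cv_g = 100 * math.sqrt(var_g) / grand_mean
    cv_e = 100 * math.sqrt(var_e) / grand_mean
    return {
        "H2": var_g / var_p,                 # broad-sense heritability
        "GEIr2": var_ge / var_p,             # share due to G x E
        "h2mg": h2_mg,                       # heritability of genotype means
        "Ac": math.sqrt(h2_mg),              # selective accuracy
        "rgloc": var_g / (var_g + var_ge),   # genotypic correlation across envs
        "CVg": cv_g,
        "CVe": cv_e,
        "CV_ratio": cv_g / cv_e,
    }


def blup_genotype(genotype_mean, grand_mean, h2_mg):
    """Balanced-data shrinkage: predicted genotypic effect."""
    return h2_mg * (genotype_mean - grand_mean)


# Hypothetical variance components, not values from the study
params = genetic_parameters(var_g=4.0, var_ge=1.0, var_e=5.0,
                            n_env=2, n_rep=3, grand_mean=50.0)
effect = blup_genotype(genotype_mean=58.0, grand_mean=50.0,
                       h2_mg=params["h2mg"])
```

With these made-up inputs, a genotype observed 8 units above the grand mean is predicted to be about 6 units above it genetically, which illustrates how the environmental share of the deviation is discounted.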
Table 4. Variable prediction from multiple linear models (stepwise).
| Predicted Variable | r² | Indices Selected via Stepwise (p < 0.05) |
|---|---|---|
| Flooding tolerance score (FTS) | 0.78 | CHLT, CIRE, EXGRaw, GVI, NDRE, NDVIRE, PanNDVI, |
| Shoot biomass (phenotypic values) | 0.91 | CHLT, CIG, CVI, EVI2, EXGRaw, GDVI, GNDVI, MSAVI, |
| Shoot biomass (genotypic values) | 0.95 | EVI2, NDRE, NDVI, NDVIRE |
| Grain yield (phenotypic values) | 0.94 | CIG, CIRE, EXG, EXGRaw, GDVI, NDRE, SFDVI, SRNIRRe |
| Grain yield (genotypic values) | 0.99 | CHLT, CIRE, GLI, MGRVI, OSAVI, RGBindex, RVI |
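The variable selection summarized in Table 4 can be illustrated with a simplified forward-selection sketch. Note the substitution: the study reports a stepwise procedure with a p < 0.05 criterion, whereas this example adds predictors greedily while adjusted R² improves by at least a hypothetical threshold (`min_gain`). All function names and data below are synthetic assumptions for illustration.

```python
def fit_r2(columns, y):
    """R^2 of an OLS fit with intercept, via the normal equations.

    Gauss-Jordan elimination; fine for a few non-collinear predictors.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented system [X'X | X'y]
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [v - f * w for v, w in zip(A[r], A[c])]
    beta = [A[r][k] / A[r][r] for r in range(k)]
    mean_y = sum(y) / n
    ss_res = sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
                 for i in range(n))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return 1 - ss_res / ss_tot


def forward_select(predictors, y, min_gain=0.01):
    """Greedy forward selection on adjusted R^2 (not the p-value rule)."""
    n = len(y)
    selected, best_adj = [], 0.0
    remaining = list(predictors)
    while remaining:
        scores = {}
        for name in remaining:
            cols = [predictors[m] for m in selected + [name]]
            r2 = fit_r2(cols, y)
            scores[name] = 1 - (1 - r2) * (n - 1) / (n - len(cols) - 1)
        name = max(scores, key=scores.get)
        if scores[name] - best_adj < min_gain:
            break  # no candidate improves the model enough
        selected.append(name)
        best_adj = scores[name]
        remaining.remove(name)
    return selected, best_adj


# Synthetic plot-level data: yield depends on NDVI and NDRE only
data = {
    "NDVI": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "NDRE": [0.30, 0.10, 0.40, 0.20, 0.50, 0.35, 0.15, 0.45],
    "ExB": [0.5, 0.1, 0.4, 0.2, 0.2, 0.4, 0.1, 0.5],
}
grain_yield = [1000 + 3000 * a + 1500 * b
               for a, b in zip(data["NDVI"], data["NDRE"])]
selected, adj_r2 = forward_select(data, grain_yield)
```

On this synthetic data the procedure keeps the two indices that generate the response and leaves out the unrelated one, mirroring how a shorter list of predictors can explain most of the variation.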

