Classification of Soybean Genotypes as to Calcium, Magnesium, and Sulfur Content Using Machine Learning Models and UAV–Multispectral Sensor

Santana, Dthenifer Cordeiro; de Oliveira, Izabela Cristina; Cavalheiro, Sâmela Beutinger; das Chagas, Paulo Henrique Menezes; Teixeira Filho, Marcelo Carvalho Minhoto; Della-Silva, João Lucas; Teodoro, Larissa Pereira Ribeiro; Campos, Cid Naudi Silva; Baio, Fábio Henrique Rojo; da Silva Junior, Carlos Antonio; Teodoro, Paulo Eduardo

doi:10.3390/agriengineering6020090

by Dthenifer Cordeiro Santana ¹, Izabela Cristina de Oliveira ¹, Sâmela Beutinger Cavalheiro ¹, Paulo Henrique Menezes das Chagas ¹, Marcelo Carvalho Minhoto Teixeira Filho ², João Lucas Della-Silva ³, Larissa Pereira Ribeiro Teodoro ¹, Cid Naudi Silva Campos ¹, Fábio Henrique Rojo Baio ¹, Carlos Antonio da Silva Junior ³ and Paulo Eduardo Teodoro ¹,*

¹ Department of Agronomy, Federal University of Mato Grosso do Sul (UFMS), Chapadão do Sul 79560-000, MS, Brazil

² Department of Agronomy, State University of São Paulo (UNESP), Ilha Solteira 15385-000, SP, Brazil

³ Department of Geography, State University of Mato Grosso (UNEMAT), Sinop 78550-000, MT, Brazil

* Author to whom correspondence should be addressed.

AgriEngineering 2024, 6(2), 1581-1593; https://doi.org/10.3390/agriengineering6020090

Submission received: 22 March 2024 / Revised: 13 May 2024 / Accepted: 28 May 2024 / Published: 1 June 2024

Abstract

Making plant breeding programs less expensive, faster, more practical, and more accurate, especially for soybeans, promotes the selection of new soybean genotypes and contributes to the emergence of new varieties that are more efficient in absorbing and metabolizing nutrients. Using spectral information from soybean genotypes combined with nutritional information on secondary macronutrients can help genetic improvement programs select populations that are efficient in absorbing and metabolizing these nutrients. In addition, using machine learning (ML) algorithms to process this information makes the selection of superior genotypes more accurate. Therefore, the objective of this work was to evaluate the performance of ML algorithms and different inputs in classifying soybean genotypes according to secondary macronutrient content. The experiment was conducted in the experimental area of the Federal University of Mato Grosso do Sul, municipality of Chapadão do Sul, Brazil. Soybean was sown in the 2019/20 crop season, with the planting of 103 F2 soybean populations. The experimental design was randomized blocks with two replications. At 60 days after crop emergence (DAE), spectral images were collected with a senseFly eBee RTK fixed-wing remotely piloted aircraft (RPA), with autonomous takeoff control, flight plan, and landing. At the reproductive stage (R1), three leaves were collected per plant to determine the levels of the macronutrients calcium (Ca), magnesium (Mg), and sulfur (S). The spectral data and the Ca, Mg, and S contents of the genotypes were subjected to Pearson correlation analysis; a principal component (PC) analysis was carried out together with the k-means algorithm to divide the genotypes into clusters. The clusters were taken as output variables, while the spectral data were used as input variables for the classification models in the machine learning analyses. The configurations tested in the models were spectral bands (SBs), vegetation indices (VIs), and a combination of both. The combination of machine learning algorithms with spectral data can provide important biological information about soybean plants. The classification of soybean genotypes according to calcium, magnesium, and sulfur content can save time, effort, and labor in field evaluations in genetic improvement programs. Therefore, the use of spectral bands as input data in random forest algorithms makes the process of classifying soybean genotypes in terms of secondary macronutrients efficient and valuable for researchers in the field.

Keywords:

digital agriculture; random forest; remote sensing; secondary macronutrients; plant breeding

1. Introduction

Soybean genetic improvement programs face the challenge of developing more productive and management-responsive genotypes. They bear part of the responsibility for delivering effective solutions by 2050, when meeting the food needs of the world population will require doubling the current agricultural production rate [1]. Among the crops that are responsible for global food security, soybean is a source of protein widely used in animal feed, and a basis for the production of oil used in human food and biofuels [2].

Conventionally in soybean breeding programs, superior cultivars are selected through phenotypic traits, manually and visually measuring the characteristics of interest [3]. Several aspects of breeding programs seek to improve the performance of plants under abiotic stress, mainly to find individuals resilient to conditions of low availability of water and nutrients, as well as to improve the efficiency in the use of these inputs [4]. Root nutrient absorption rates are highly heritable and there is a notable genotypic preference for specific ions [5]. Thus, it is possible to select genotypes that are capable of absorbing certain nutrients, making it possible to generate populations that are efficient in their use, generating savings with the use of fertilizers, and reducing negative environmental impacts due to mistaken applications of them.

The selection of superior plants of agronomic interest was based on phenotypic traits long before the discovery of DNA. Within breeding, these selected plants are used in crosses, and the more crosses and environments used to evaluate the response to selection, the greater the chance of success of the progenies. During the breeding process, researchers need to phenotype a large number of plants and accurately identify the best progeny. In the field of genotyping, there have been significant advances that provide rapid and low-cost genomic information [6], such as marker-assisted recurrent selection (MARS) and genomic selection, which, like all advances in genomic analysis, still require phenotypic data [7].

The use of high-throughput phenotyping (HTP) in agriculture has developed rapidly in recent years and is relatively new, although it builds on remote sensing, which is a well-established field of research [8]. This type of phenotypic measurement provides detailed, non-invasive information about the plant and enables assessments throughout the plant’s life cycle. Therefore, plant breeders can collect information about the variables of interest more efficiently, which enables them to evaluate large soybean populations quickly and accurately [9].

HTP is based on a multiple-image system, in which multispectral sensors operate at determined angles, allowing for the derivation of a mathematical relation between several two-dimensional (2D) images in the visible range (RGB), which allows for obtaining spectral information from plants. Every method that makes use of HTP technologies requires calibration in view of finding accurate answers and enabling image information understanding with plant growth dynamics, so that these data sets can ultimately be used to measure phenotypic variation in biological systems of interest [10].

By obtaining the visible spectral bands, it is possible to carry out calculations to obtain vegetation indices (VIs), which function as metrics related to, for example, senescence, nutritional status, and chlorophyll degradation due to some stress, such as water stress or pathogen [6]. In this context, the spectral region of 470–800 nm is important in the relationship between leaf pigments and nutritional elements, including the secondary macronutrients calcium (Ca), magnesium (Mg), and sulfur (S) [11].

Multispectral sensors generate a large amount of spectral data, which are not directly related to agronomic variables of interest. Machine learning (ML) algorithms can combine spectral and agronomic information about crops and provide accurate results, especially by recognizing patterns that improve the accuracy with which soybean genotypes are identified [12]. ML algorithms can help with various issues regarding plant classification. For them to be efficient, data must be collected in a systematic and representative way to enable the design of a reliable data set [13].

Recent works have combined ML techniques with multispectral data for various phenotyping activities, such as [14], which used leaf reflectance to classify soybean genotypes in terms of industrial traits, reaching levels of correct classification close to 0.9 [15]. Given these applications, such technologies are promising for classifying soybean genotypes that are efficient in nutrient absorption; Santana et al. [16] carried out such a selection in soybeans for primary macronutrients, achieving greater accuracy with algorithms such as SVM and J48.

Using spectral data from soybean genotypes combined with nutritional information regarding secondary macronutrients can help genetic breeding programs select populations that are efficient in absorbing and metabolizing these nutrients. Combined with this information, using machine learning algorithms for data processing makes the selection of superior genotypes more accurate. Therefore, the objective of the study was to verify the classification performance of soybean genotypes regarding secondary macronutrients by ML algorithms and different inputs in datasets.

2. Materials and Methods

The experiment was carried out in the experimental area of the Universidade Federal de Mato Grosso do Sul, municipality of Chapadão do Sul—MS (18°41′33″ S, 52°40′45″ W, altitude of 810 m). Soybean was sown in the 2019/20 crop season, with the planting of 103 F2 soybean populations, using the conventional soil preparation system (plowing and harrowing) (Figure 1).

The soil of the region is classified as dystrophic Red Latosol, with clay texture, pH (H₂O) = 6.2; exchangeable Al (cmolc dm⁻³) = 0.0; Ca+Mg (cmolc dm⁻³) = 4.31; P (mg dm⁻³) = 41.3; K (cmolc dm⁻³) = 0.2; organic matter (g dm⁻³) = 19.74; V (%) = 45; m (%) = 0.0; sum of bases (cmolc dm⁻³) = 2.3; cation exchange capacity, CEC (cmolc dm⁻³) = 5.1, in the 0–0.20 m layer [17]. According to the Köppen classification, the region’s climate is Tropical Savanna (Aw), which means a humid summer and a dry winter.

The experimental design was randomized blocks with two replications, featuring planting lines 3 m long per plot, spacing of 0.45 m, and planting density of 15 plants per meter. The evaluations took place on central line plants.

For sowing, the seeds were treated with fungicide (Pyraclostrobin + Methyl Thiophanate) and insecticide (Fipronil), at a dose of 200 mL of commercial product for every 100 kg of seeds, to protect against pests and soil-borne diseases. Seeds were inoculated with bacteria of the genus Bradyrhizobium at a dose of 200 mL of concentrated liquid inoculant for every 100 kg of seeds. Other cultural treatments were carried out according to the crop needs.

At 60 days after crop emergence (DAE), spectral images were acquired with a senseFly eBee RTK fixed-wing remotely piloted aircraft (RPA), with autonomous control of takeoff, flight plan, and landing. A Parrot Sequoia multispectral sensor was mounted on the eBee, and images were acquired at 09:00 in the morning, at an altitude of 100 m, with a spatial resolution of 0.10 m, under a cloud-free sky. Radiometric calibration of the sensor was performed for the entire scene, using a calibrated reflective surface provided by the manufacturer. The Parrot Sequoia is a multispectral camera for agriculture that includes a sunlight (luminosity) sensor, allowing for the calibration of acquired values, and an additional 16 Mpx RGB camera for recognition. The sensor has a horizontal field of view (HFOV) of 61.9°, vertical field of view (VFOV) of 48.5°, and diagonal field of view (DFOV) of 73.7°, as explained by [15]. Reflectance values were obtained as the average of each replication of the 103 soybean genotypes evaluated, for the green (550 nm), red (660 nm), red-edge (735 nm), and NIR (790 nm) spectral bands (SBs). These wavelengths enabled the calculation of vegetation indices (VIs) such as the Enhanced Vegetation Index (EVI, [18]), Green Normalized Difference Vegetation Index (GNDVI, [19]), Modified Chlorophyll Absorption in Reflectance Index (MCARI, [20]), Modified Soil-adjusted Vegetation Index (MSAVI, [20]), Normalized Difference Red Edge Index (NDRE, [19]), Normalized Difference Vegetation Index (NDVI, [21]), Soil-adjusted Vegetation Index (SAVI, [22]), and Simplified Canopy Chlorophyll Content Index (SCCCI, [23]).
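As an illustration of how such indices are derived from band reflectances, the sketch below computes six of them in Python using their commonly published formulas. It is not the authors' processing chain: the function name, the soil adjustment factor of 0.5, and the reflectance values are assumptions for illustration, and EVI and MCARI are left out because the exact formulations used in the study are not given in the text.

```python
import math


def vegetation_indices(green, red, red_edge, nir, soil_factor=0.5):
    """Compute vegetation indices from band reflectances on a 0-1 scale.

    soil_factor is the SAVI soil adjustment term L (0.5 is an assumed,
    commonly used default).
    """
    ndvi = (nir - red) / (nir + red)
    gndvi = (nir - green) / (nir + green)
    ndre = (nir - red_edge) / (nir + red_edge)
    savi = (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)
    # Self-adjusting form of MSAVI, which needs no explicit soil factor.
    msavi = (2 * nir + 1 - math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2
    # SCCCI as the ratio of NDRE to NDVI.
    sccci = ndre / ndvi
    return {
        "NDVI": ndvi,
        "GNDVI": gndvi,
        "NDRE": ndre,
        "SAVI": savi,
        "MSAVI": msavi,
        "SCCCI": sccci,
    }


# Hypothetical mean reflectances for one plot (not data from the study).
plot = vegetation_indices(green=0.08, red=0.05, red_edge=0.30, nir=0.45)
for name, value in plot.items():
    print(f"{name}: {value:.3f}")
```

Because every index is a deterministic function of the bands, the VIs carry no information that is not already present in the SBs, which is relevant when comparing the two kinds of input.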

RTK (Real-Time Kinematics) technology enabled aerial surveying and estimation of the camera position at the time of image collection, with an accuracy of 2.5 m. The images obtained were mosaicked and orthorectified using the computer program Pix4Dmapper, with the positional accuracy of the orthoimages verified with ground control points (GCPs) surveyed with RTK.

When the plants reached the reproductive stage (R1), three leaves of each plant were collected and washed with water, mild detergent solution (0.1%), acid solution (HCl 0.3%), and deionized water. After washing, samples were kept in paper bags and dried in a forced circulation oven at 65 ± 5 °C until constant dry mass. Then, the samples were weighed on a precision scale (0.0001 g) and ground in a Wiley mill. The contents of the secondary macronutrients (calcium, magnesium, and sulfur) were determined following the methods described in [24].

Data from the spectral information and the macronutrient contents of the genotypes were subjected to Pearson correlation analysis using the Rbio software [25]. The k-means algorithm was then applied to group each genotype with its nearest centroid, keeping the variation within each group small, which resulted in two clusters. Principal component (PC) analysis was performed to display the cluster separation in a biplot, using the “ggfortify” library in the R software [26]. Further, boxplots of the nutrient content of each cluster were drawn, with means compared by the Tukey test, to highlight which set of genotypes had the higher nutrient content.
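The clustering step can be sketched in a few lines of Python. This is a minimal illustration of k-means with two clusters, not the R workflow used in the study: the standardization step, the function names, the random initialization, and the six leaf-content rows are all assumptions made for the example.

```python
import math
import random


def standardize(rows):
    """Scale each column to zero mean and unit (sample) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def kmeans(rows, k=2, n_iter=100, seed=0):
    """Plain k-means: assign rows to the nearest centroid, then update centroids."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(row, centroids[j]))
            for row in rows
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(c) for c in zip(*members)]
    return labels, centroids


# Hypothetical leaf contents (Ca, Mg, S) for six genotypes, not study data.
leaf = [
    [8.1, 3.0, 1.9], [8.4, 3.2, 2.0], [7.9, 2.9, 1.8],
    [11.2, 4.1, 2.6], [11.5, 4.3, 2.7], [10.9, 4.0, 2.5],
]
labels, centroids = kmeans(standardize(leaf), k=2)
print(labels)
```

The cluster labels produced this way play the role of the class variable in the subsequent classification models; which numeric label ends up on which group depends on the initialization.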

The formed clusters were used as output variables, while the spectral data were used as input variables for the following classification models in the machine learning analyses: Multilayer Perceptron Artificial Neural Network (ANN, [27]), REPTree Decision Tree Algorithm (DT, [28]), J48 Decision Tree Algorithm (J48, [29]), Logistic Regression (LR, [30]), random forest (RF, [31]), and Support Vector Machine (SVM, [32]). The algorithms were chosen according to those most recently used in the literature [16,33,34]. The inputs tested in the datasets were spectral bands (SBs), vegetation indices (VIs), and the combination of both VIs+SBs. Cluster classification was based on stratified cross-validation with k-fold = 10 and ten replications, obtaining 100 runs for each model.
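The validation scheme (stratified 10-fold cross-validation repeated ten times, giving 100 runs per model) can be sketched as follows. This is an illustrative stand-in for what Weka does internally, not its implementation; the function names, the seeding by repetition number, and the class sizes are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Split sample indices into folds that keep class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    folds = [[] for _ in range(n_folds)]
    position = 0
    for members in by_class.values():
        rng.shuffle(members)
        # Deal the shuffled members of each class round-robin into the folds.
        for idx in members:
            folds[position % n_folds].append(idx)
            position += 1
    return folds


def repeated_cv_splits(labels, n_folds=10, n_repeats=10):
    """Yield (train_indices, test_indices) for every fold of every repetition."""
    for repeat in range(n_repeats):
        folds = stratified_folds(labels, n_folds, seed=repeat)
        for test in folds:
            held_out = set(test)
            train = [i for i in range(len(labels)) if i not in held_out]
            yield train, sorted(test)


# Hypothetical cluster labels for 206 plots (103 genotypes x 2 replications);
# the split between C1 and C2 is invented for the example.
labels = ["C1"] * 120 + ["C2"] * 86
splits = list(repeated_cv_splits(labels))
print(len(splits))  # 10 folds x 10 repetitions
```

Each of the 100 train/test pairs yields one value of every performance metric, and it is these 100 values per model and input that feed the analysis of variance.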

The model parameters followed the default configuration of the Weka 3.8.5 software. Model performance was evaluated by the percentage of correct classifications (CC), F-score, and kappa coefficient; for all three metrics, higher values indicate better performance. The effects of the inputs, the ML models, and their interaction were tested by analysis of variance, and means were grouped by the Scott–Knott test at the 5% significance level and displayed as boxplots. This analysis used the ggplot2 and ExpDes.pt libraries in the R software [26].

$$\mathrm{CC} = \frac{\text{true positive classification}}{\text{true positive classification} + \text{false negative classification} + \text{false positive classification}} \times 100$$

$$\text{F-score} = \frac{2 \times \text{true positive classification}}{2 \times \text{true positive classification} + \text{false negative classification} + \text{false positive classification}}$$

$$\text{Kappa} = \frac{\text{observed agreement} - \text{agreement expected by chance}}{1 - \text{agreement expected by chance}}$$
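For a two-class problem, the three expressions above can be evaluated directly from a confusion matrix. The sketch below follows them as written (so CC does not count true negatives) and uses the standard definitions of observed and chance agreement for kappa; the function name and the counts are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (CC, F-score, kappa) for a binary confusion matrix.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives, and true negatives.
    """
    n = tp + fp + fn + tn
    cc = tp / (tp + fn + fp) * 100
    f_score = 2 * tp / (2 * tp + fn + fp)
    # Observed agreement: share of cases on the diagonal.
    observed = (tp + tn) / n
    # Chance agreement: product of the row and column marginals, per class.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return cc, f_score, kappa


# Hypothetical counts for 100 classified plots (not results from the study).
cc, f_score, kappa = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(f"CC = {cc:.2f}, F-score = {f_score:.3f}, kappa = {kappa:.2f}")
```

With these counts, 70% of the plots lie on the diagonal while 50% agreement is expected by chance, which is why kappa is much lower than the raw agreement.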

3. Results and Discussion

The Pearson correlation analysis was plotted in the form of a scatterplot (Figure 2), where the shades in red represent positive correlations; the more intense the color, the greater the magnitude of the correlation. Similarly, negative correlations are expressed by the colors in blue, using the same tone condition associated with magnitude. A medium magnitude correlation was noticed between Ca and Mg. The spectral variables presented a low magnitude of correlation with the macronutrients and a high magnitude with each other, in which red and green presented high negative correlations with the VIs, and the VIs and red-edge reached high positive correlations with each other.

The medium-magnitude correlation between calcium and magnesium (Figure 2) can be explained by their similar chemical properties, such as ionic radius, valence, degree of hydration, and mobility; as a result, these nutrients compete for adsorption sites in the soil when being absorbed by plants [35]. Because of this competition for the same absorption sites, soil levels of Ca and Mg must be in balance, since an excess of one limits the absorption of the other, which means lower levels of that nutrient in plant leaves and seeds [36]. The correlations between Ca and S and between Mg and S were very low, owing to the different absorption and metabolic routes within the plant.

The high correlations between spectral bands and vegetation indices are expected, since VI calculation relies on SB data [33]. The low correlations observed between nutritional and spectral variables are attributed to the lack of linearity between these variables, which have complex relationships that are not captured by traditional statistical methods such as Pearson’s correlation. In such cases, ML algorithms are recommended, since they can handle the non-linear relationships between nutritional and spectral variables [37]. These ML algorithms are robust enough to provide reliable results on the relationship between spectral and agronomic data [38].
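The point that a low Pearson coefficient does not rule out a strong relationship can be shown with a small numerical example. In the sketch below, which uses invented values, a variable that depends perfectly but non-linearly on another yields a coefficient of zero.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Invented values: y is fully determined by x in both cases.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear = [2 * v + 1 for v in x]   # straight-line dependence
curved = [v ** 2 for v in x]      # symmetric, non-linear dependence

r_linear = pearson(x, linear)
r_curved = pearson(x, curved)
print(f"linear: r = {r_linear:.2f}; curved: r = {r_curved:.2f}")
```

The coefficient only measures linear association, so a value near zero between a nutrient and a band is compatible with a dependence that a non-linear classifier could still exploit.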

Two clusters (C1 and C2) were formed with the k-means algorithm and displayed through PC analysis; genotypes within a cluster are similar to one another and distinct from the genotypes of the other cluster, based on the macronutrients evaluated (Figure 3). The first two principal components together account for 69% of the total data variation, a value very close to the threshold of 70% suggested in past research [39], so the genotypes could be grouped with confidence and carried forward to the subsequent analyses.

All genotypes received the same fertilization management yet presented different levels of Ca, Mg, and S, which allowed us to separate them into two groups (Figure 3) with the help of the k-means algorithm. The purpose of the k-means algorithm is to split the dataset according to a clustering criterion, in which data are grouped by trait similarity; the groups formed by k-means are called clusters [40]. After the clusters were defined based on the secondary macronutrient content in the leaves, PC analysis was carried out with the first two principal components.

Subsequently, the nutrients from each cluster were subjected to the Tukey test, in which the genotypes grouped in Cluster 2 reached significantly higher values of secondary macronutrient concentration when compared to those in Cluster 1 (Figure 4).

By cluster definition, it is noted that Cluster 2 presented the highest means for all nutrients (Figure 4). Different genotypes presented different efficiencies in the use of nutrients, which are influenced by genetic and physiological factors. The plant being efficient in certain nutrients refers to an individual that produces higher yields per unit of nutrient applied or absorbed when compared to other plants grown in similar environmental conditions [41]. Therefore, it can be stated that the genotypes contained in Cluster 2 are superior in terms of Ca, Mg, and S metabolization efficiency.

Analyzing the performance of the machine learning algorithms, three different parameters were used, namely correct classification (CC), F-score, and kappa coefficient. The interaction between Inputs × ML was found to be significant for correct classification (CC), kappa, and F-score (Table 1).

With the genotype clusters set, the machine learning algorithms were run with the different inputs, searching for the combination with the greatest accuracy in classifying the groups. The combination of ML with multispectral data has shown very good results in modeling diverse crop characteristics, such as yield, biomass, and height [42]. ML methods use advanced statistical tools to model non-linear data with complex interactions between spectral variables and biological variables linked to plants [43]. The LR algorithm served as the baseline in this work, and algorithms that presented superior results were sought.

From the perspective of the inputs, the ML techniques employed do not present notable differences, except for LR, for which the SB+VIs input had greater performance, reaching an accuracy close to 0.60 (Figure 5). With the SB input, the algorithms that obtained the best results were J48, RF, and SVM, achieving accuracies between about 0.55 and 0.60 for this metric. With the VIs input, the algorithm that performed best was RF. With the SB+VIs input, LR had the best accuracy for CC, close to 0.60.

The inputs used made a difference only for RF and LR, for which the best performances were achieved using VIs and SB+VIs, respectively. Evaluating the performance of each input within each algorithm, the use of SB provided better results for RF, which also presented better results when VIs were used. SB+VIs provided better performances for SVM (Figure 6).

For the F-score metric, evaluating the performance of the algorithms with the three inputs, DT, RF, and SVM showed no difference in performance regardless of the input used (Figure 7). ANN and LR showed better performance when using SB+VIs. The J48 algorithm showed better performance when using SB, with values above 0.5. Evaluating the performance of the algorithms with each input, both SB and VIs provided better accuracies for RF, reaching performance above 0.5. SB+VIs showed better accuracies for RF and LR, with values between 0.5 and 0.6.

In general, RF performed better for all tested accuracy metrics, especially when the inputs were SB and VIs. LR performed well for all metrics when SB+VIs was used. J48 and SVM performed well with the SB input, but only for the CC metric.

Vegetation indices summarize information on plant canopy reflectance, which makes it possible to evaluate various quantitative and qualitative plant parameters when combined with algorithms [44]. However, spectral bands are more practical from a data processing perspective because, unlike vegetation indices, they require no additional calculations to obtain the inputs [33]. The use of spectral bands as input data for ML algorithms gives accurate results for the identification of soybean cultivars [14]. Spectral bands as model input data have also provided better accuracy in determining height and days to maturity in soybean plants [45]. Good accuracy was also found in the classification of soybean genotypes for oil and protein characteristics [33].

Among the ML algorithms used, RF achieved better performance than the other algorithms. RF is a learning algorithm that uses a non-parametric model combining a set of decision trees [46]. Random forest is a high-precision ML technique in various agricultural applications [47], such as predictions of corn yield [37,48] and soybean yield [43], prediction of nitrogen content and height of corn plants [49], classification of injuries in soybean seeds [50], and early detection of diseases [51]. RF has also shown an advantage in identifying soybeans and corn, being an algorithm with high potential for remotely sensed data [52].

Therefore, the results support the effectiveness of ML algorithms, notably RF, which was superior to the other methods in classifying soybean genotypes according to secondary macronutrients. Using SB as algorithm input increases accuracy and reduces data processing work. The use of these technologies in plant breeding makes the selection of soybean genotypes that are superior in the absorption and metabolization of nutrients such as Ca, Mg, and S faster, more practical, and non-destructive. In this way, spectral data combined with machine learning techniques allow us to analyze phenotypic characteristics simultaneously and relate them to different nutritional traits, assisting plant improvement by selecting genotypes that will be more efficient in absorbing nutrients. This contribution to agriculture saves time and resources, reducing the cost of labor and of the chemical reagents used in the laboratory to determine such elements. Furthermore, the use of modern techniques such as these algorithms supports a more digital, precise, and efficient agriculture.

4. Conclusions

Machine learning algorithms have demonstrated promising results in being used to classify soybean genotypes in relation to calcium, magnesium, and sulfur content. The algorithm that presented better results than the others was random forest, achieving accuracies close to 0.6 for correct classification and F-score. This algorithm proved to be robust and capable of efficiently generalizing the information obtained, regardless of the type of input used.

In future work, the use of hyperspectral sensors can provide greater amounts of information across the spectrum, in more detail in relation to these and other nutrients. Furthermore, the machine learning techniques applied in this study can be adapted and extended to other agricultural crops and different nutrients, contributing to expanding the application potential and impact of these approaches in high-precision phenotyping.

Author Contributions

Conceptualization, D.C.S., I.C.d.O. and S.B.C.; methodology, P.H.M.d.C. and P.E.T.; software, D.C.S.; validation, L.P.R.T., C.N.S.C. and F.H.R.B.; formal analysis, D.C.S.; investigation, P.H.M.d.C.; resources, P.E.T.; data curation, P.E.T.; writing—original draft preparation, D.C.S.; writing—review and editing, L.P.R.T.; visualization, J.L.D.-S. and C.A.d.S.J.; supervision, P.E.T.; project administration, P.E.T.; funding acquisition, M.C.M.T.F. and P.E.T. All authors have read and agreed to the published version of the manuscript.

Funding

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)—Grant numbers 308295/2023-4, 309250/2021-8, 306022/2021-4 and 304979/2022-8, and Fundação de Apoio ao Desenvolvimento do Ensino, Ciência e Tecnologia do Estado de Mato Grosso do Sul (FUNDECT) TO numbers 88/2021, 07/2022, 318/2022 and 94/2023, and SIAFEM numbers 30478, 31333, 32242 and 33111. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brazil (CAPES)—Financial Code 001.

Data Availability Statement

Data are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

Hincks, J. The World Is Headed for a Food Security Crisis. Here’s How We Can Avert It. 2018. Available online: https://time.com/5216532/global-food-security-richard-deverell/ (accessed on 23 March 2024).
Valliyodan, B.; Ye, H.; Song, L.; Murphy, M.; Shannon, J.G.; Nguyen, H.T. Genetic Diversity and Genomic Strategies for Improving Drought and Waterlogging Tolerance in Soybeans. J. Exp. Bot. 2017, 68, 1835–1849. [Google Scholar] [CrossRef]
Ye, H.; Song, L.; Schapaugh, W.T.; Ali, M.L.; Sinclair, T.R.; Riar, M.K.; Mutava, R.N.; Li, Y.; Vuong, T.; Valliyodan, B. The Importance of Slow Canopy Wilting in Drought Tolerance in Soybean. J. Exp. Bot. 2020, 71, 642–652. [Google Scholar] [CrossRef] [PubMed]
Maia, C.; DoVale, J.C.; Fritsche-Neto, R.; Cavatte, P.C.; Miranda, G.V. The Difference between Breeding for Nutrient Use Efficiency and for Nutrient Stress Tolerance. Crop Breed. Appl. Biotechnol. 2011, 11, 270–275. [Google Scholar] [CrossRef]
Griffiths, M.; Roy, S.; Guo, H.; Seethepalli, A.; Huhman, D.; Ge, Y.; Sharp, R.E.; Fritschi, F.B.; York, L.M. A Multiple Ion-Uptake Phenotyping Platform Reveals Shared Mechanisms Affecting Nutrient Uptake by Roots. Plant Physiol. 2021, 185, 781–795. [Google Scholar] [CrossRef] [PubMed]
Araus, J.L.; Cairns, J.E. Field High-Throughput Phenotyping: The New Crop Breeding Frontier. Trends Plant Sci. 2014, 19, 52–61. [Google Scholar] [CrossRef] [PubMed]
Jannink, J.-L.; Lorenz, A.J.; Iwata, H. Genomic Selection in Plant Breeding: From Theory to Practice. Brief. Funct. Genom. 2010, 9, 166–177. [Google Scholar] [CrossRef] [PubMed]
Roth, L.; Barendregt, C.; Bétrix, C.-A.; Hund, A.; Walter, A. High-Throughput Field Phenotyping of Soybean: Spotting an Ideotype. Remote Sens. Environ. 2022, 269, 112797.
Araus, J.L.; Kefauver, S.C.; Zaman-Allah, M.; Olsen, M.S.; Cairns, J.E. Translating High-Throughput Phenotyping into Genetic Gain. Trends Plant Sci. 2018, 23, 451–466.
Li, L.; Zhang, Q.; Huang, D. A Review of Imaging Techniques for Plant Phenotyping. Sensors 2014, 14, 20078–20111.
Ling, B.; Goodin, D.G.; Raynor, E.J.; Joern, A. Hyperspectral Analysis of Leaf Pigments and Nutritional Elements in Tallgrass Prairie Vegetation. Front. Plant Sci. 2019, 10, 142.
de Medeiros, A.D.; Capobiango, N.P.; da Silva, J.M.; da Silva, L.J.; da Silva, C.B.; dos Santos Dias, D.C.F. Interactive Machine Learning for Soybean Seed and Seedling Quality Classification. Sci. Rep. 2020, 10, 11267.
Barbedo, J.G.A. Detection of Nutrition Deficiencies in Plants Using Proximal Images and Machine Learning: A Review. Comput. Electron. Agric. 2019, 162, 482–492.
Gava, R.; Santana, D.C.; Cotrim, M.F.; Rossi, F.S.; Teodoro, L.P.R.; da Silva Junior, C.A.; Teodoro, P.E. Soybean Cultivars Identification Using Remotely Sensed Image and Machine Learning Models. Sustainability 2022, 14, 7125.
da Silva, E.E.; Baio, F.H.R.; Teodoro, L.P.R.; da Silva Junior, C.A.; Borges, R.S.; Teodoro, P.E. UAV-Multispectral and Vegetation Indices in Soybean Grain Yield Prediction Based on in Situ Observation. Remote Sens. Appl. 2020, 18, 100318.
Santana, D.C.; Teixeira Filho, M.C.M.; da Silva, M.R.; das Chagas, P.H.M.; de Oliveira, J.L.G.; Baio, F.H.R.; Campos, C.N.S.; Teodoro, L.P.R.; da Silva Junior, C.A.; Teodoro, P.E. Machine Learning in the Classification of Soybean Genotypes for Primary Macronutrients’ Content Using UAV–Multispectral Sensor. Remote Sens. 2023, 15, 1457.
Teixeira, P.C.; Donagemma, G.K.; Fontana, A.; Teixeira, W.G. Manual de Métodos de Análise de Solo; Embrapa: Brasília, Brazil, 2017; 573p.
Huete, A.R.; Liu, H.Q.; Batchily, K.V.; van Leeuwen, W. A Comparison of Vegetation Indices over a Global Set of TM Images for EOS-MODIS. Remote Sens. Environ. 1997, 59, 440–451.
Gitelson, A.A.; Kaufman, Y.J.; Merzlyak, M.N. Use of a Green Channel in Remote Sensing of Global Vegetation from EOS-MODIS. Remote Sens. Environ. 1996, 58, 289–298.
Qi, J.; Chehbouni, A.; Huete, A.R.; Kerr, Y.H.; Sorooshian, S. A Modified Soil Adjusted Vegetation Index. Remote Sens. Environ. 1994, 48, 119–126.
Rouse, J.W.; Haas, R.H.; Schell, J.A.; Deering, D.W. Monitoring Vegetation Systems in the Great Plains with ERTS; NASA Special Publication-351; NASA: Washington, DC, USA, 1974; pp. 309–317.
Huete, A.R. A Soil-Adjusted Vegetation Index (SAVI). Remote Sens. Environ. 1988, 25, 295–309.
Raper, T.B.; Varco, J.J. Canopy-Scale Wavelength and Vegetative Index Sensitivities to Cotton Growth Parameters and Nitrogen Status. Precis. Agric. 2015, 16, 62–76.
Bataglia, O.C.; Teixeira, J.P.F.; Furlani, P.R.; Furlani, A.M.C.; Gallo, J.R. Métodos de Análise Química de Plantas; IAC: Campinas, Brazil, 1978; Volume 87.
Bhering, L.L. Rbio: A Tool for Biometric and Statistical Analysis Using the R Platform. Crop Breed. Appl. Biotechnol. 2017, 17, 187–190.
R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2013.
Egmont-Petersen, M.; de Ridder, D.; Handels, H. Image Processing with Neural Networks—A Review. Pattern Recognit. 2002, 35, 2279–2301.
Al Snousy, M.B.; El-Deeb, H.M.; Badran, K.; Al Khlil, I.A. Suite of Decision Tree-Based Classification Algorithms on Cancer Gene Expression Data. Egypt. Inform. J. 2011, 12, 73–82.
Quinlan, J.R. C4.5: Programs for Machine Learning; Morgan Kaufmann: San Mateo, CA, USA, 1993.
Štepanovský, M.; Ibrová, A.; Buk, Z.; Velemínská, J. Novel Age Estimation Model Based on Development of Permanent Teeth Compared with Classical Approach and Other Modern Data Mining Methods. Forensic Sci. Int. 2017, 279, 72–82.
Belgiu, M.; Drăguţ, L. Random Forest in Remote Sensing: A Review of Applications and Future Directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
Nalepa, J.; Kawulok, M. Selecting Training Sets for Support Vector Machines: A Review. Artif. Intell. Rev. 2019, 52, 857–900.
Santana, D.C.; Teodoro, L.P.R.; Baio, F.H.R.; dos Santos, R.G.; Coradi, P.C.; Biduski, B.; da Silva Junior, C.A.; Teodoro, P.E.; Shiratsuchi, L.S. Classification of Soybean Genotypes for Industrial Traits Using UAV Multispectral Imagery and Machine Learning. Remote Sens. Appl. 2023, 29, 100919.
Pereira Ribeiro Teodoro, L.; Estevão, R.; Santana, D.C.; de Oliveira, I.C.; Lopes, M.T.G.; de Azevedo, G.B.; Rojo Baio, F.H.; da Silva Junior, C.A.; Teodoro, P.E. Eucalyptus Species Discrimination Using Hyperspectral Sensor Data and Machine Learning. Forests 2023, 15, 39.
Orlando Filho, J.; de Bittencourt, V.C.; Carmello, Q.A. de C.; Beauclair, E.G.F. de. Relações K, Ca e Mg de Solo, Areia Quartzosa e Produtividade da Cana-de-Açúcar. STAB Açúcar Álcool Subprodutos 1996, 14, 13–17.
Guo, W.; Nazim, H.; Liang, Z.; Yang, D. Magnesium Deficiency in Plants: An Urgent Problem. Crop J. 2016, 4, 83–91.
Osco, L.P.; Ramos, A.P.M.; Faita Pinheiro, M.M.; Moriya, É.A.S.; Imai, N.N.; Estrabis, N.; Ianczyk, F.; de Araújo, F.F.; Liesenberg, V.; de Jorge, L.A.C. A Machine Learning Framework to Predict Nutrient Content in Valencia-Orange Leaf Hyperspectral Measurements. Remote Sens. 2020, 12, 906.
Schwalbert, R.A.; Amado, T.; Corassa, G.; Pott, L.P.; Prasad, P.V.V.; Ciampitti, I.A. Satellite-Based Soybean Yield Forecast: Integrating Machine Learning and Weather Data for Improving Crop Yield Prediction in Southern Brazil. Agric. For. Meteorol. 2020, 284, 107886.
Cruz, C.D.; Regazzi, A.J. Modelos Biométricos Aplicados Ao Melhoramento Genético; UFV: Viçosa, Brazil, 1994; ISBN 85-7269-010-7.
Ahmed, M.; Seraj, R.; Islam, S.M.S. The K-Means Algorithm: A Comprehensive Survey and Performance Evaluation. Electronics 2020, 9, 1295.
Zhu, Q.; Wang, H.; Shan, Y.Z.; Ma, H.Y.; Wang, H.Y.; Xie, F.T.; Ao, X. Physiological Response of Phosphorus-Efficient and Inefficient Soybean Genotypes under Phosphorus-Deficiency. Russ. J. Plant Physiol. 2020, 67, 175–184.
Herrero-Huerta, M.; Rodriguez-Gonzalvez, P.; Rainey, K.M. Yield Prediction by Machine Learning from UAS-Based Multi-Sensor Data Fusion in Soybean. Plant Methods 2020, 16, 78.
Alabi, T.R.; Abebe, A.T.; Chigeza, G.; Fowobaje, K.R. Estimation of Soybean Grain Yield from Multispectral High-Resolution UAV Data with Machine Learning Models in West Africa. Remote Sens. Appl. 2022, 27, 100782.
Xue, J.; Su, B. Significant Remote Sensing Vegetation Indices: A Review of Developments and Applications. J. Sens. 2017, 2017, 1353691.
Teodoro, P.E.; Teodoro, L.P.R.; Baio, F.H.R.; da Silva Junior, C.A.; dos Santos, R.G.; Ramos, A.P.M.; Pinheiro, M.M.F.; Osco, L.P.; Gonçalves, W.N.; Carneiro, A.M. Predicting Days to Maturity, Plant Height, and Grain Yield in Soybean: A Machine and Deep Learning Approach Using Multispectral Data. Remote Sens. 2021, 13, 4632.
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
Jeong, J.H.; Resop, J.P.; Mueller, N.D.; Fleisher, D.H.; Yun, K.; Butler, E.E.; Timlin, D.J.; Shim, K.-M.; Gerber, J.S.; Reddy, V.R. Random Forests for Global and Regional Crop Yield Predictions. PLoS ONE 2016, 11, e0156571.
Marques Ramos, A.P.; Prado Osco, L.; Elis Garcia Furuya, D.; Nunes Gonçalves, W.; Cordeiro Santana, D.; Pereira Ribeiro Teodoro, L.; Antonio da Silva Junior, C.; Fernando Capristo-Silva, G.; Li, J.; Henrique Rojo Baio, F.; et al. A Random Forest Ranking Approach to Predict Yield in Maize with UAV-Based Vegetation Spectral Indices. Comput. Electron. Agric. 2020, 178, 105791.
Osco, L.P.; Junior, J.M.; Ramos, A.P.; Furuya, D.E.; Santana, D.C.; Teodoro, L.P.; Gonçalves, W.N.; Baio, F.H.; Pistori, H.; Junior, C.A.; et al. Leaf Nitrogen Concentration and Plant Height Prediction for Maize Using UAV-Based Multispectral Imagery and Machine Learning Techniques. Remote Sens. 2020, 12, 3237.
Wang, L.; Huang, Z.; Wang, R. Discrimination of Cracked Soybean Seeds by Near-Infrared Spectroscopy and Random Forest Variable Selection. Infrared Phys. Technol. 2021, 115, 103731.
Raza, M.M.; Harding, C.; Liebman, M.; Leandro, L.F. Exploring the Potential of High-Resolution Satellite Imagery for the Detection of Soybean Sudden Death Syndrome. Remote Sens. 2020, 12, 1213.
Wang, L.; Liu, J.; Yang, L.; Yang, F.; Fu, C. Application of Random Forest Method in Maize-Soybean Accurate Identification. Acta Agron. Sin. 2018, 44, 569–580.

Figure 1. Location of the experimental area in Chapadão do Sul, MS, Brazil, and photograph of the experimental area.

Figure 2. Pearson correlation scatterplot between the spectral variables and the secondary macronutrients.

Figure 3. Principal component (PC) analysis of the clusters of soybean genotypes formed by k-means based on Ca, Mg, and S contents.

Figure 4. Boxplot of Ca, Mg, and S means for the clustered data. Means followed by the same letter do not differ among clusters by the Scott–Knott test at 5% probability.

Figure 5. Boxplot with clustering means for percent correct classification regarding the machine learning models. Means followed by the same uppercase letters do not differ for the inputs tested by the Scott–Knott test at 5% probability; means followed by the same lowercase letters do not differ for the algorithms tested by the Scott–Knott test at 5% probability.

Figure 6. Boxplot with clustering means for kappa regarding machine learning models. Means followed by the same uppercase letters do not differ for the inputs tested by the Scott–Knott test at 5% probability; means followed by the same lowercase letters do not differ for the algorithms tested by the Scott–Knott test at 5% probability.

Figure 7. Boxplot with clustering means for F-score regarding the machine learning models tested. Means followed by the same uppercase letters do not differ for the inputs tested by the Scott–Knott test at 5% probability; means followed by the same lowercase letters do not differ for the algorithms tested by the Scott–Knott test at 5% probability.

Table 1. Summary of the analysis of variance for the variables percent correct classification (CC), kappa coefficient, and F-score.

SV	DF	CC	F-Score	Kappa
Inputs	2	12.11 ns	0.01 *	0.03 ***
ML	5	112.00 **	0.10 **	0.97 **
Inputs × ML	10	45.10 **	0.02 **	0.04 ***

ns: not significant; *, **, and ***: significant at 5%, 1%, and 0.1% probability, respectively, by the F-test; SV: sources of variation; DF: degrees of freedom; ML: machine learning.
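
The three response variables in Table 1 are standard classification metrics. As a minimal sketch, and not the authors' code, they can be computed from a confusion matrix as shown below. The function name, the macro-averaging of the F-score across classes, and the example matrix are illustrative assumptions; the study may have used a different averaging scheme.

```python
from typing import Dict, List


def classification_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Compute percent correct classification, Cohen's kappa, and macro F-score.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    row_tot = [sum(cm[i]) for i in range(k)]                        # true counts
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # predicted counts

    # Observed agreement (accuracy) and agreement expected by chance.
    p_obs = sum(cm[i][i] for i in range(k)) / total
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / total ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else 0.0

    # Per-class F-score, then unweighted (macro) mean: an assumption here.
    f_scores = []
    for i in range(k):
        precision = cm[i][i] / col_tot[i] if col_tot[i] else 0.0
        recall = cm[i][i] / row_tot[i] if row_tot[i] else 0.0
        denom = precision + recall
        f_scores.append(2 * precision * recall / denom if denom else 0.0)

    return {
        "cc_percent": 100 * p_obs,
        "kappa": kappa,
        "f_score": sum(f_scores) / k,
    }


if __name__ == "__main__":
    # Hypothetical two-cluster confusion matrix, not data from the study.
    example = [[40, 10], [5, 45]]
    print(classification_metrics(example))
```

Kappa corrects the observed agreement for the agreement expected by chance, which is why it can rank models differently from CC when the clusters are unbalanced.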

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

