**Discrimination of Cultivated Regions of Soybeans (*Glycine max*) Based on Multivariate Data Analysis of Volatile Metabolite Profiles**

**So-Yeon Kim 1, So Young Kim 1, Sang Mi Lee 1, Do Yup Lee 2, Byeung Kon Shin 3, Dong Jin Kang 3, Hyung-Kyoon Choi 4,\* and Young-Suk Kim 1,\***


Academic Editor: Igor Jerković

Received: 2 January 2020; Accepted: 6 February 2020; Published: 10 February 2020

**Abstract:** Soybean (*Glycine max*) is a major crop cultivated in various regions and consumed globally. The formation of volatile compounds in soybeans is influenced by the cultivar as well as environmental factors, such as the climate and soil in the cultivation areas. This study used gas chromatography-mass spectrometry (GC-MS) combined with headspace solid-phase microextraction (HS-SPME) to analyze the volatile compounds of soybeans cultivated in Korea, China, and North America. Multivariate data analyses, namely partial least square-discriminant analysis (PLS-DA) and hierarchical clustering analysis (HCA), were then applied to the GC-MS data sets. The soybeans could be clearly discriminated according to their geographical origins on the PLS-DA score plot. In particular, 25 volatile compounds, including terpenes (limonene, myrcene), esters (ethyl hexanoate, butyl butanoate, butyl prop-2-enoate, butyl acetate, butyl propanoate), and aldehydes (nonanal, heptanal, *(E)*-hex-2-enal, *(E)*-hept-2-enal, acetaldehyde), were the main contributors to the discrimination of soybeans cultivated in China from those cultivated in other regions in the PLS-DA score plot. On the other hand, 15 volatile compounds, such as 2-ethylhexan-1-ol, 2,5-dimethylhexan-2-ol, octanal, and heptanal, were related to Korean soybeans located on the negative PLS 2 axis, whereas 12 volatile compounds, such as oct-1-en-3-ol, heptan-4-ol, butyl butanoate, and butyl acetate, were associated with North American soybeans. However, the multivariate statistical analysis (PLS-DA) was not able to clearly distinguish soybeans cultivated in different regions within Korea, except for those from the Gyeonggi and Kyeongsangbuk provinces.

**Keywords:** gas chromatography-mass spectrometry; solid-phase microextraction; soybean; origin discrimination; volatile compounds

#### **1. Introduction**

Soybean (*Glycine max*) is among the most important crops in the world and is extensively used in the production of soybean flour, soybean milk, fermented products, and oil for consumption by both humans and animals, mainly due to its high protein and fat contents [1]. It is generally accepted that soybean cultivation originated in China, but nowadays, soybeans are produced worldwide, including in North America, South America, and Asia [1]. The importing and exporting of agricultural products are increasing globally due to the expansion of free-trade agreements. These circumstances have resulted in some foreign soybeans with unclear origins being distributed as domestic ones in Korea, which can lead to consumer distrust of the market [2]. The National Agricultural Products Quality Management Service in Korea introduced an agricultural food origin labeling system in 1991 to protect domestic agricultural producers and consumers [2]. Soybeans have been included in this system since 2017, and traders must now mark the origins of all products advertised for sale [2].

The properties and qualities of soybeans can be significantly affected by their cultivation region because each production area has different growing conditions, such as temperature, precipitation, and soil characteristics [3–5]. In particular, some previous studies have demonstrated that light and water characteristics and growth temperatures significantly affect the formation of volatile metabolites, such as alcohols and aldehydes, in plants [6,7]. In addition, Grieshop et al. and Cherry et al. demonstrated that the environmental growing conditions of soybeans could change the synthesis pathways of proteins and fats, thereby affecting their chemical compositions [8,9].

Volatile components of soybeans have been found to be mainly derived from carbohydrates, proteins, and lipids via enzymatic reactions, autoxidation, and/or other chemical reactions during both storage and cultivation [10]. Lee et al. reported that the major aroma constituents of soybean include hexanal, 1-octen-3-ol, γ-butyrolactone, maltol, and phenylethyl alcohol [11]. Also, Boué et al. and Dings et al. identified hexan-1-ol, octan-3-one, oct-1-en-3-ol, ethanol, octanal, 2-propanone, hexan-1-al, and 1-pentan-3-ol (most of which could be lipid oxidative degradation products) as major volatiles in soybean using Tenax trapping and solid-phase microextraction (SPME) combined with gas chromatography-mass spectrometry (GC-MS) analysis [12,13]. Since the composition of volatile compounds in soybeans can vary depending on their cultivation conditions, it might be feasible to use data sets of volatile components to distinguish where soybeans originate.

Metabolomics is a commonly used tool for the identification and quantification of whole metabolites in biological samples [14]. Metabolite profiling has been performed using various instrumental methods, including mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy [15–17]. These approaches have been successfully used to determine the geographical origins of various agricultural products [18–20]. Metabolomics approaches based on several different types of instrumental methods have recently been used to distinguish soybeans of different geographical origins [21,22]. Liquid chromatography-Orbitrap mass spectrometry (LC-Orbitrap MS) and gas chromatography time-of-flight mass spectrometry (GC-TOF MS) have been used to obtain the metabolic fingerprints of soybeans cultivated domestically in different provinces of Korea [21]. Fourier-transform infrared (FT-IR) spectroscopy has also been combined with multivariate statistical analysis to distinguish the geographical origins of Chinese and Korean soybeans [22]. GC-MS has been widely used to discriminate the origins of foodstuffs and agricultural products, such as green tea, omija fruit, and honey, mainly due to its high resolution and sensitivity, particularly in the analysis of volatile compounds [14,23–25]. However, this method has not previously been applied to distinguish soybeans according to their cultivation regions based on data sets of volatiles. Therefore, this study aimed to determine the feasibility of discriminating soybeans according to their cultivation regions using a GC-MS-based metabolomics approach.

#### **2. Results and Discussion**

#### *2.1. Profiling of Total Volatile Compounds in Soybeans*

In total, 146 volatile compounds were identified in GC-MS data sets obtained from soybean samples of different geographical origins. Tables S1–S3 indicate that diverse lipid-derived volatile compounds and terpenes were detected in this study. Previous studies have found the major volatiles of soybeans to be ethanol, 1-octen-3-ol, maltol, phenylethyl alcohol, hexanal, octanal, 2-propanone, and γ-butyrolactone [11,12]. All of these volatile compounds were detected in the present study with the exception of maltol, which could be due to the use of different extraction techniques [11]. The present study employed headspace extraction using SPME, which generally favors the detection of highly volatile compounds with low boiling points.

The 84 volatile compounds detected in the soybeans cultivated in Korea comprised 1 acid, 23 alcohols, 9 aldehydes, 4 esters, 6 furans, 6 benzenes, 10 ketones, 3 lactones, 3 nitrogen-containing compounds, 2 sulfur-containing compounds, 10 hydrocarbons, 6 terpenes, and 1 phenol. Certain alcohols, such as 2-ethylhexan-1-ol, predominated, followed by ketones, such as propan-2-one, while terpenes were found at low levels in most samples. Unlike soybeans grown in China and North America, those cultivated in Korea contained no detectable pyrazines.

The 124 volatile compounds identified in the soybeans cultivated in China comprised 2 acids, 25 alcohols, 13 aldehydes, 13 esters, 4 furans, 8 benzenes, 17 ketones, 4 lactones, 3 nitrogen-containing compounds, 2 sulfur-containing compounds, 16 hydrocarbons, 11 terpenes, 3 phenols, and 3 pyrazines. Among them, 3-methylheptan-4-one was detected at higher levels than in soybeans cultivated in Korea and North America, and there was a greater diversity of terpenes in the Chinese soybeans.

The soybeans cultivated in North America contained 50 volatile compounds: 1 acid, 16 alcohols, 5 aldehydes, 4 esters, 2 furans, 4 benzenes, 4 ketones, 3 lactones, 1 nitrogen-containing compound, 1 sulfur-containing compound, 5 hydrocarbons, 2 terpenes, 1 phenol, and 1 pyrazine. The number of volatile compounds detected in North American soybeans was clearly smaller than in those from other cultivation areas, but there was a greater diversity of alcohols. Oct-1-en-3-ol was detected at higher levels, while propan-2-one and 2-methylprop-1-ene were present at lower levels in North American soybeans. The only pyrazine detected was 2-methylpyrazine. Among esters, the content of 3-hydroxy-2,4,4-trimethylpentyl 2-methylpropanoate was higher in soybeans from Indiana (IN) than in those from other regions of North America.

Several enzymes of soybeans have been studied by various researchers, including lipoxygenase, lipase, urease, amylase, and protease [26,27]. In particular, soybeans are known to be a rich source of lipoxygenase [27], which is one of several enzymes that produce aldehydes and alcohols via enzymatic oxidation [28]. This study found hexanal (from 13-linoleate hydroperoxide) and heptanal (from 11-linoleate hydroperoxide), known as major oxidative products of linoleate hydroperoxides, in most of the cultivation regions, as well as octanal (from 11-oleate hydroperoxide) and nonanal (from 9-/10-oleate hydroperoxide) [29], which are known to be decomposition products of oleate hydroperoxides [30].

Benelli et al. found that the amount of hexanal was related to precipitation and light conditions in the cultivation area [7]. Although Table 1 [31] shows differences in precipitation between the cultivation regions, the amount of hexanal did not differ significantly between the geographical regions studied. In this study, alcohols, which are known to be secondary oxidative products of unsaturated fatty acids [29], predominated in soybeans from Korea, China, and North America; among them, pentan-1-ol and hexan-1-ol (both derived from 13-linoleate hydroperoxide [30,32]) were observed in most samples. As mentioned above, oct-1-en-3-ol (produced from 10-linoleate hydroperoxide [32]) was the most abundant alcohol in soybeans cultivated in North America. Furans can be produced from the oxidation of polyunsaturated fatty acids and carotenoids [33], and 2-alkylfurans are commonly derived from lipid degradation [34]. 2-Methylfuran, 2-ethylfuran, and 2-pentylfuran were detected in soybeans from Korea and China, whereas 2-methylfuran was not found in soybeans from North America.

Several ketones were also identified in soybeans from Korea, China, and North America. Other diverse ketones that are mainly formed from unsaturated fatty acids (e.g., linoleic acid) by lipoxygenase were found in soybeans from China [35,36]. Certain ketones, such as propan-2-one, butan-2-one, and 3-methylheptan-4-one, were commonly found in samples from China. Cheesbrough et al. and Gulen et al. explained that the activities of enzymes, such as peroxidase, increased with the temperature at which the plants were grown [37,38]. Also, some previous studies have reported that lipoxygenase activity is affected by the minimum mean temperature from flowering to maturity [39], which affects the formation of volatile compounds [40]. Table 1 indicates that the annual mean temperature was higher in China (excluding the northeast region) than in the other cultivation areas (Korea and North America). It could, therefore, be assumed that the formation of various ketones in soybeans from China is due to high lipoxygenase activity related to the temperatures of their cultivation areas.


**Table 1.** Climatic conditions and geographic coordinates of Korea, China, and North America.

Diverse terpenes that occur naturally as metabolites are commonly found in plants [41]. In general, terpenes are produced from isopentenyl diphosphate, which is elongated to geranyl diphosphate, farnesyl diphosphate, and geranylgeranyl diphosphate [42]. Terpenes were identified in soybeans from all of the cultivation areas studied but showed the greatest abundance and variety in those from China. Eleven terpenes (α-pinene, α-thujene, sabinene, l-phellandrene, myrcene, α-terpinene, limonene, β-phellandrene, γ-terpinene, terpinolene, and α-cedrene) were detected in soybeans from China. The formation of terpenes could depend on various factors, such as cultivar and region [43]. Marais reported that certain factors, such as increased temperature and acidic conditions, could affect the concentration and diversity of terpenes formed [43]. Also, terpene synthases could be affected by CO2 levels [40]. According to the Planbureau voor de Leefomgeving (PBL) Netherlands Environmental Assessment Agency, China had the largest CO2 emissions in 2016 [44]. In particular, limonene derived from geranyl pyrophosphate was identified in all samples from China. A previous study suggested that a higher CO2 concentration could enhance the activity of limonene synthase [40]. Therefore, the formation of limonene could be significantly affected by CO2 concentration as well as other factors, such as temperature.

#### *2.2. Discrimination of Soybeans by Different Geographical Origins*

To discriminate soybeans according to their geographical origins, the relationship between soybeans from different cultivation regions and their volatile profiles was investigated. GC-MS data sets were processed using unsupervised statistical analysis (principal component analysis (PCA) and hierarchical clustering analysis (HCA)) as well as supervised statistical analysis (partial least square-discriminant analysis (PLS-DA)) [45]. PCA, HCA, and PLS-DA were performed to identify the differences in the volatile profiles obtained from GC-MS analyses of soybeans of different geographical origins.
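As a rough illustration of the unsupervised step, the sketch below autoscales a small peak-area table and extracts the first principal component by power iteration. The scaling choice (unit variance), the function names, and the toy numbers are assumptions made for illustration; the study does not state its preprocessing, and the values are not taken from its data.

```python
import math

def autoscale(rows):
    """Mean-center each column and scale it to unit variance (assumed preprocessing)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def first_component(scaled, n_iter=200):
    """Power iteration on X^T X: returns PC1 loadings, scores, and variance fraction."""
    p = len(scaled[0])
    w = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        scores = [sum(x * wj for x, wj in zip(row, w)) for row in scaled]
        w = [sum(t * row[j] for t, row in zip(scores, scaled)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    scores = [sum(x * wj for x, wj in zip(row, w)) for row in scaled]
    total = sum(v * v for row in scaled for v in row)
    explained = sum(t * t for t in scores) / total
    return w, scores, explained

# Hypothetical peak areas: 6 samples (rows) x 3 volatiles (columns).
peak_areas = [
    [1.0, 2.0, 0.5], [1.2, 2.1, 0.4], [0.9, 1.9, 0.6],   # origin A
    [3.0, 4.0, 0.5], [3.1, 4.2, 0.6], [2.8, 3.9, 0.4],   # origin B
]
loadings, pc1_scores, explained = first_component(autoscale(peak_areas))
print([round(s, 2) for s in pc1_scores], round(explained, 3))
```

In this toy table the two origins fall on opposite sides of the first component, which is the kind of separation a PCA score plot is inspected for.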

In the PCA score plot, the samples were separated according to their geographical origins (data not shown). Since the PCA and PLS-DA score plots were similar, only the PLS-DA results are presented to show the separation of samples according to the cultivation area (Figure 1). Partial least square (PLS) components 1, 2, and 3 in the PLS-DA 3D score plot for soybeans of different origins together explained 37.9% of the total variance (24.66%, 6.84%, and 6.40%, respectively; Figure 1a). The PLS-DA score plot for PLS components 1 and 2 is presented in Figure 1b. The cross-validated model comprised three components, with R<sup>2</sup>X = 0.379, R<sup>2</sup>Y = 0.788, and Q<sup>2</sup>(cum) = 0.709. After 100 permutations, R<sup>2</sup> = 0.177 and Q<sup>2</sup> = −0.219 were obtained.

**Figure 1.** Partial least square-discriminant analysis (PLS-DA) score plot of soybean samples from different cultivation areas; (**a**) 3D score plot; (**b**) score plot PLS[1]-PLS[2], indicating the separation between different cultivation areas.
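The cross-validation and permutation statistics reported for the model can be illustrated with a minimal sketch. It assumes the common definition Q<sup>2</sup> = 1 − PRESS/TSS with leave-one-out predictions, and it uses a one-variable least-squares line as a stand-in for PLS-DA so that it runs with the standard library alone. The function names and the toy data are hypothetical and are not the software or data used in the study.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def q2_leave_one_out(xs, ys):
    """Q2 = 1 - PRESS / TSS; each sample is predicted by a model fitted without it."""
    mean_y = sum(ys) / len(ys)
    tss = sum((y - mean_y) ** 2 for y in ys)
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1.0 - press / tss

def permutation_q2(xs, ys, n_perm=100, seed=0):
    """Recompute Q2 after shuffling the class labels n_perm times."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_perm):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        results.append(q2_leave_one_out(xs, shuffled))
    return results

# Hypothetical data: one volatile's intensity and a dummy-coded origin label.
intensity = [1.0, 1.2, 0.9, 3.0, 3.1, 2.8]
labels = [0, 0, 0, 1, 1, 1]

q2_real = q2_leave_one_out(intensity, labels)
q2_perm = permutation_q2(intensity, labels)
mean_perm = sum(q2_perm) / len(q2_perm)
print(f"Q2 (true labels) = {q2_real:.3f}, mean Q2 (permuted) = {mean_perm:.3f}")
```

A Q<sup>2</sup> for the true labels that is well above the Q<sup>2</sup> values obtained with permuted labels, as in the values reported above, suggests that the separation is unlikely to result from overfitting.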

Some previous studies have shown that the chemical compositions of soybeans can vary significantly with differences in soils, fertilizer treatment, and climatic conditions, as well as other environmental factors [46–48]. Grieshop and Fahey showed that soybeans from China had greater crude protein content than those from North America [8]. Also, Shi et al. [47] demonstrated that soybeans from Korea contained more protein and less oil than those from North America. On the other hand, soybeans from China have been shown to have a lower lipid concentration than those from North America [9]. Volatile compounds of soybeans are produced from nonvolatile precursors, such as lipids, sugars, and proteins [49]. In particular, oxidative degradation of lipids can lead to the formation of diverse volatiles. The amounts of certain lipid-derived compounds, such as oct-1-en-3-ol, differed significantly between soybeans from North America and those cultivated in other regions, which could be due to the higher lipid concentration of North American soybeans. On the other hand, the amounts of benzaldehyde, 2,6-dimethylpyrazine, and 2,5-dimethylpyrazine, which are known to be mainly produced from amino acids as major precursors [50], differed significantly between soybeans from China and those cultivated in other regions. This could be at least partially due to the differences in protein content between soybeans from different cultivation regions [46,48]. However, their exact formation mechanisms remain unclear, and they could involve both biological and chemical mechanisms during the cultivation and storage of the soybeans.

Medic et al. reported that the constituents of soybeans could be significantly altered by diverse environmental factors exerting complex combined effects [50]. This makes it difficult to explain how specific environmental factors influence the formation of volatile components in soybeans. As shown in Figure 1, the soybean samples could be divided into three groups corresponding to Korea, China, and North America. Soybeans from China are located on the negative side of PLS component 1 in this score plot, whereas those from Korea and North America are both located on the positive side of PLS component 1. Soybeans from North America are located on the positive side of PLS component 2, whereas those from Korea are located on the negative side of PLS component 2.
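The layout of the three groups on the score plot can be summarized as a simple sign rule. The sketch below is only a reading aid for Figure 1, not a classifier reported in this study; the function name is hypothetical, and scores close to zero on either component would be ambiguous in practice.

```python
def region_from_scores(t1, t2):
    """Map PLS component 1 and 2 scores to the region whose cluster lies on that side."""
    if t1 < 0:
        return "China"            # negative side of PLS component 1
    if t2 > 0:
        return "North America"    # positive PLS 1, positive PLS 2
    return "Korea"                # positive PLS 1, negative PLS 2

# Hypothetical score pairs, chosen only to exercise each branch.
for scores in [(-4.2, 0.8), (2.1, 3.5), (1.7, -2.4)]:
    print(scores, "->", region_from_scores(*scores))
```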

Tables 2 and 3 list the main volatile metabolites identified according to variable importance in projection (VIP) values >1.20. A VIP value >1 suggests that a compound plays a predominant role in the separation of groups [51]. The major volatile metabolite contributing to the positive PLS 1 axis was 2-ethylhexan-1-ol, while those on the negative axis of PLS component 1 were heptan-4-ol, butan-1-ol, butyl butanoate, octanal, butyl prop-2-enoate, 5-methyl-2-propan-2-ylcyclohexan-1-ol, butyl acetate, butyl propanoate, nonanal, toluene, heptanal, heptan-4-one, 5-ethyloxolan-2-one, 1,2,3-trimethylbenzene, heptan-2-one, (*E*)-hex-2-enal, ethyl hexanoate, (*E*)-hept-2-enal, limonene, 1-butoxybutane, 2-pentylfuran, acetaldehyde, myrcene, and 3-hydroxybutan-2-one. The compounds on the negative axis of PLS component 1 were found in all soybeans from China. On the other hand, the main volatile metabolites contributing to the negative PLS 2 axis were 2-ethylhexan-1-ol, 2,5-dimethylhexan-2-ol, styrene, 2-methylfuran, 2-methylprop-2-ene, propan-2-one, 2-methylprop-2-enal, hexane, methyl acetate, 2-methylpentan-1-ol, octanal, butyl prop-2-enoate, 1-methoxypropan-2-ol, heptanal, and toluene, whereas those on the positive PLS 2 axis were oct-1-en-3-ol, nonane, 4-methyloxolan-2-one, heptan-4-ol, butan-1-ol, octan-3-one, butyl butanoate, 3-hydroxy-2,4,4-trimethylpentyl 2-methylpropanoate, 5-methyl-2-propan-2-ylcyclohexan-1-ol, butyl acetate, butyl propanoate, and nonanal. In Figure 2, soybeans from each country are clustered according to their cultivation regions. The figure shows that soybeans from Korea were clustered more closely than the others, which is possibly due to the much smaller land area of that country (100,339 km<sup>2</sup>) compared to China (9,596,951 km<sup>2</sup>), Canada (9,984,670 km<sup>2</sup>), and the United States (9,826,676 km<sup>2</sup>).
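The VIP-based selection can be sketched as follows, assuming the commonly used definition in which each variable's squared, normalized PLS weights are averaged over the components, weighted by the Y sum of squares that each component explains. The function name, the weights, and the sums of squares are hypothetical placeholders; they are not values from Tables 2 and 3.

```python
import math

def vip_scores(weights, ss_explained):
    """VIP per variable.

    weights[a][j]   : PLS weight of variable j on component a
    ss_explained[a] : Y sum of squares explained by component a
    """
    n_vars = len(weights[0])
    total_ss = sum(ss_explained)
    vips = []
    for j in range(n_vars):
        acc = 0.0
        for w_a, ss_a in zip(weights, ss_explained):
            norm_sq = sum(w * w for w in w_a)   # normalize each weight vector
            acc += ss_a * (w_a[j] ** 2) / norm_sq
        vips.append(math.sqrt(n_vars * acc / total_ss))
    return vips

# Hypothetical two-component model with four variables.
names = ["compound_A", "compound_B", "compound_C", "compound_D"]
weights = [[0.8, 0.1, 0.5, 0.2],
           [0.1, 0.1, 0.9, 0.3]]
ss_explained = [6.0, 2.0]

vips = vip_scores(weights, ss_explained)
selected = [n for n, v in zip(names, vips) if v > 1.20]   # cutoff used in Tables 2 and 3
print(selected)
```

Under this definition the squared VIP values average to 1 across variables, which is why a value above 1 is read as an above-average contribution; the stricter cutoff of 1.20 shortens the list further.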


**Table 2.** The major volatile metabolites identified in soybeans from Korea, China, and North America according to the variable importance in projection (VIP > 1.20) list for partial least square (PLS) component 1.

<sup>1</sup> Retention indices were determined using *n*-alkanes C6 to C30 as an external standard; <sup>2</sup> Retention indices were obtained from the National Institute of Standards and Technology (NIST) database (http://webbook.nist.gov/chemistry); <sup>3</sup> Identification of the compounds was based on the following: A, mass spectrum and retention index agreed with those of the authentic compounds under similar conditions (positive identification); B, mass spectrum and retention index were consistent with those from the NIST database.


**Table 3.** The major volatile metabolites identified in soybeans from Korea, China, and North America according to the variable importance in projection (VIP > 1.20) list for partial least square (PLS) component 2.

<sup>1</sup> Retention indices were determined using *n*-alkanes C6 to C30 as an external standard; <sup>2</sup> Retention indices were obtained from the NIST database (http://webbook.nist.gov/chemistry); <sup>3</sup> Identification of the compounds was based on the following: A, mass spectrum and retention index agreed with those of the authentic compounds under similar conditions (positive identification); B, mass spectrum and retention index were consistent with those from the NIST database; C, mass spectrum was consistent with that of W9N08 (Wiley and NIST) and manual interpretation (tentative identification).
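The retention indices referred to in the table footnotes can be illustrated with a short sketch. It assumes the linear (temperature-programmed) retention index, which interpolates a compound's retention time between the two *n*-alkanes that bracket it; the footnotes do not state which formula was used, and the function name and retention times below are hypothetical.

```python
from bisect import bisect_right

def linear_retention_index(rt, alkane_rts):
    """Linear retention index of a peak eluting at rt.

    alkane_rts maps n-alkane carbon number -> retention time (same units as rt).
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time lies outside the n-alkane ladder")
    # Index of the alkane eluting at or just before rt (capped at the last interval).
    i = min(bisect_right(times, rt) - 1, len(times) - 2)
    n, n_next = carbons[i], carbons[i + 1]
    t_n, t_next = times[i], times[i + 1]
    return 100.0 * (n + (n_next - n) * (rt - t_n) / (t_next - t_n))

# Hypothetical alkane ladder (minutes); the study used C6 to C30.
ladder = {6: 3.0, 7: 5.0, 8: 8.0}
print(linear_retention_index(6.5, ladder))   # halfway between C7 and C8 -> 750.0
```

An index computed in this way can then be compared with a database value to support an identification, as described for categories A and B in the footnotes.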

Figure 2 shows the HCA dendrogram with its associated heatmap, in which all of the samples are grouped in terms of their nearness or similarity [52]. The figure shows that all of the samples except for Kyeongsangnam province Changnyeong (KNCN) could be clustered into two groups: group I consisted of 13 soybean samples cultivated in China, and group II comprised 22 soybean samples from Korea and North America. The amounts of terpenes and esters were greater in group I than in group II. Within group II, the samples from Korea (except for KNCN) and those from North America formed separate subgroups. Among the soybean samples grown in North America, those from the states of Illinois (IL) and Indiana (IN) could be distinguished from the others. Table S3 indicates that the samples from Illinois and Indiana contained greater amounts of alcohols than the other North American soybeans (samples MI, MN, ON, and QB). The annual mean precipitation was similar across North America, but the annual mean temperatures were higher in Illinois and Indiana than in the other regions. Wills et al. reported that the concentrations of esters and alcohols were positively related to temperature [53]. Therefore, it could be inferred that the formation of volatile compounds in soybeans from North America was affected by the cultivation temperature.

**Figure 2.** Heatmap generated by a hierarchical clustering analysis of 146 metabolites.
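The grouping shown in the dendrogram can be illustrated with a minimal agglomerative clustering sketch. Euclidean distance and average linkage are assumptions made here, since the distance metric and linkage method are not stated in the text; the sample names and profiles are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage_clusters(profiles, n_clusters):
    """Merge the two closest clusters (mean pairwise distance) until n_clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [euclidean(profiles[a], profiles[b])
                         for a in clusters[i] for b in clusters[j]]
                d = sum(dists) / len(dists)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical scaled intensities of three volatiles in five samples.
profiles = {
    "S1": [5.0, 4.8, 0.2],
    "S2": [4.7, 5.1, 0.3],
    "S3": [0.4, 0.5, 3.9],
    "S4": [0.6, 0.3, 4.2],
    "S5": [0.5, 0.6, 3.0],
}
print(average_linkage_clusters(profiles, n_clusters=2))
```

Cutting the tree at two clusters mirrors how groups I and II are read from the dendrogram; cutting at a larger number of clusters would correspond to reading its subgroups.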

The multivariate statistical analysis was also performed on the domestic Korean samples alone to investigate whether our method could discriminate samples cultivated in regions close to each other. Neither PCA (data not shown) nor PLS-DA could distinguish these soybeans according to their region. Figure 3a shows that PLS components 1, 2, and 3 together explained 43.7% of the total variance (19.09%, 15.36%, and 8.92%, respectively), while Figure 3b shows that PLS components 1 and 2 explained 33.86%. The cross-validated model comprised three components, with R<sup>2</sup>X = 0.437, R<sup>2</sup>Y = 0.169, and Q<sup>2</sup>(cum) = 0.0535. After 100 permutations, R<sup>2</sup> = 0.0951 and Q<sup>2</sup> = −0.0676 were obtained.

**Figure 3.** PLS-DA score plot of soybeans from Korea on the basis of volatile metabolites: (**a**) 3D score plot; (**b**) score plot PLS[1]-PLS[2]. GGIC, Gyeonggi province Icheon; GGAS, Gyeonggi province Anseong; GWCC, Gangwon province Chuncheon; GWYW, Gangwon province Yeongwol; CBES, Chungcheongbuk province Eumseong; CNCA, Chungcheongnam province Cheonan; CNGJ, Chungcheongnam province Gongju; JBGJ, Jeollabuk province Gimje; JBIS, Jeollabuk province Imsil; JNNJ, Jeollanam province Naju; JNYG, Jeollanam province Yeonggwang; KBCD, Kyeongsangbuk province Cheongdo; KBES, Kyeongsangbuk province Uiseong; KBYC, Kyeongsangbuk province Yeongcheon; KNCN, Kyeongsangnam province Changnyeong; KNMY, Kyeongsangnam province Miryang; KNGC, Kyeongsangnam province Geochang.

Soybean samples from the Gyeonggi and Kyeongsangbuk provinces were clustered according to their regions, whereas the other domestic samples were not clearly grouped in the PLS-DA score plot. As shown in Table 1, the climatic conditions varied with the cultivation area. The mean temperatures in 2016 showed similar tendencies in all of the cultivation regions studied, but there were slight differences in the total precipitation and sun exposure times. Various plant volatiles can be affected by changing biotic and abiotic factors [54]. Vallat et al. explained that the concentrations of nonanal and benzaldehyde were both positively related to precipitation, and positively and negatively related to temperature, respectively [54]. These climate factors could together affect the volatile metabolites formed in soybeans cultivated in different regions. However, the relationships between climate and the amounts of nonanal and benzaldehyde formed were not clear in this study.
