*3.5. Statistical Analysis*

All datasets were processed by multivariate statistical analysis, namely principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA), using SIMCA-P (version 11.0, Umetrics, Umeå, Sweden) to determine how well the soybeans could be discriminated according to their geographical origins. Heatmap visualization and hierarchical clustering analysis were performed based on Pearson's correlation and the average linkage method using MultiExperiment Viewer (MeV) software (version 4.9, The Institute for Genomic Research (TIGR)) [55].
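To make the clustering step concrete, the following is a minimal Python sketch of agglomerative clustering with a Pearson correlation-based distance and average linkage. It is not the MeV implementation used in this study. The distance definition (1 − r), the function names `pearson` and `average_linkage`, and the toy intensity values are all assumptions introduced for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def average_linkage(profiles):
    """Agglomerative clustering with distance 1 - r and average linkage.

    profiles: dict mapping a sample label to a list of peak intensities.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    labels = list(profiles)
    # Pairwise distances between individual profiles (assumed: 1 - r).
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(labels, 2)
    }
    clusters = [(label,) for label in labels]
    merges = []

    def cluster_distance(c1, c2):
        # Average linkage: mean of all between-cluster pair distances.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the two closest clusters and record the merge.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical peak intensities for four samples (made-up numbers,
# not data from this study).
toy = {
    "A1": [1.0, 2.0, 3.0, 4.0],
    "A2": [2.0, 4.1, 6.0, 8.2],
    "B1": [4.0, 3.0, 2.0, 1.0],
    "B2": [8.0, 6.2, 4.1, 2.0],
}
history = average_linkage(toy)
for left, right, d in history:
    print(left, right, round(d, 3))
```

Because the distance depends only on correlation, samples whose volatile profiles have the same shape cluster together even when their absolute intensities differ, which is one reason correlation-based distances are often chosen for heatmaps of metabolite data.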

#### **4. Conclusions**

This study applied GC-MS analysis combined with multivariate statistical analysis to distinguish the geographical origins of soybeans. The profiles of volatile compounds in the soybean samples varied with their cultivation regions. In the PLS-DA results, the soybean samples were, overall, clearly discriminated by their geographical origins. However, those cultivated in Korea (except for the samples from the Gyeonggi and Kyeongsangbuk provinces) could not be clearly separated by region on the PLS-DA score plot. We also determined the major volatile metabolites that contributed to the discrimination of geographical origins on the basis of PLS-DA. An advantage of this approach is that it can distinguish the geographical origin of soybeans without any sample pretreatment, on the basis of volatile metabolite profiles, which are highly related to soybean quality. A limitation is that we did not have enough sample information on post-harvest practices, such as drying and storage conditions, which could affect the volatile profiles. Nevertheless, our results could be applied to the discrimination of soybeans distributed and commercially available in Korea, which was the main objective of this study.

In summary, the findings of this study suggest that combining GC-MS-based analysis of volatile compounds with multivariate data analysis is a useful tool for discriminating the geographical origins of soybeans, although with some limitations for domestically cultivated soybeans.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1420-3049/25/3/763/s1, Figures S1–S6, Tables S1–S4.

**Author Contributions:** Conceptualization, Y.-S.K. and H.-K.C.; methodology, S.Y.K., D.Y.L., H.-K.C., and Y.-S.K.; analysis, S.-Y.K. and S.Y.K.; investigation, S.Y.K.; resources, B.K.S. and D.J.K.; data curation, S.-Y.K. and S.M.L.; writing—original draft preparation, S.-Y.K., S.M.L., and Y.-S.K.; writing—review and editing, S.-Y.K. and Y.-S.K.; visualization, S.-Y.K. and D.Y.L.; supervision, Y.-S.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Korea Institute of Planning and Evaluation for Technology in Food, Agriculture, Forestry (IPET) through the Advanced Production Technology Development Program funded by the Ministry of Agriculture, Food and Rural Affairs (MAFRA) (No. 316081-04), and by the BK21 Plus program funded by the National Research Foundation of Korea (NRF) (No. 22A20130012233).

**Conflicts of Interest:** The authors declare no conflict of interest.
