#### *2.3. Sample Preparation*

Dry, solid plant material from each species was ground and homogenized using a ball mill. Approximately 100 mg of the powdered sample material was weighed and placed into a small centrifuge tube. Samples for GC/Q-ToF analysis were prepared using a two-step method. Two internal standards, tridecane (C13H28) and docosane (C22H46), were selected. The standards were dissolved in dichloromethane to obtain a solution containing 100 μg/mL of each internal standard. First, 340 μL of dichloromethane and 80 μL of the prepared internal standard solution were added to each sample, which was then sonicated for 1 h. Next, the samples were centrifuged for 10 min. This procedure was repeated once without adding the internal standards, after which the supernatant was collected and filtered prior to GC/Q-ToF analysis. Each sample was prepared in duplicate.

#### *2.4. GC/Q-ToF Analysis*

All prepared samples were analyzed using an Agilent 7890B GC instrument equipped with an RS185 PAL3 autosampler. The GC was connected to an Agilent 7250 accurate-mass Q-ToF mass spectrometer. The capillary column (30 m × 0.25 mm i.d.) was coated with a 0.25 μm film of 5% phenyl methyl siloxane (J&W, HP-5MS). Helium at a constant flow rate of 1 mL/min was used as the carrier gas. Each sample was analyzed using the following GC oven program: 50 °C held for 2 min, then heated at 2 °C/min to 280 °C, and finally held at 280 °C for 20 min. A post-run period of 5 min at 300 °C was also used. The inlet was set to 280 °C, and 1 μL of each sample was injected with a split ratio of 10:1. The transfer line from the GC to the Q-ToF was held at 300 °C. Duplicate injections were made for each sample.
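As a consistency check on the oven program described above, the total run time and the programmed oven temperature at any elution time can be computed directly. The following is a minimal sketch only; the function names and the tuple layout are illustrative assumptions and are not part of the instrument software. The spectrum count additionally assumes that acquisition (5 Hz, 5 min solvent delay) continues until the end of the oven program.

```python
# Oven program from the method text: (ramp rate in °C/min, target in °C, hold in min).
# A rate of None marks the initial isothermal step.
OVEN_PROGRAM = [
    (None, 50.0, 2.0),
    (2.0, 280.0, 20.0),
]


def run_time_min(program):
    """Total run time in minutes, excluding the post-run period."""
    total = 0.0
    current = None
    for rate, target, hold in program:
        if rate is not None:
            total += abs(target - current) / rate  # time spent ramping
        total += hold
        current = target
    return total


def oven_temp_at(program, t_min):
    """Programmed oven temperature (°C) at time t_min after injection."""
    elapsed = 0.0
    current = program[0][1]
    for rate, target, hold in program:
        if rate is not None:
            ramp = abs(target - current) / rate
            if t_min < elapsed + ramp:
                direction = 1.0 if target >= current else -1.0
                return current + direction * rate * (t_min - elapsed)
            elapsed += ramp
        current = target
        if t_min < elapsed + hold:
            return current
        elapsed += hold
    return current  # after the program ends, the final temperature is reported


total = run_time_min(OVEN_PROGRAM)          # 2 + 115 + 20 = 137 min
spectra = (total - 5.0) * 60.0 * 5.0        # assumed: 5 Hz from 5 min to end of run
print(total, spectra)
```

Under these assumptions, each injection occupies 137 min of oven time before the post-run period, which is consistent with the late retention times (above 100 min) listed for the diterpenoids and triterpenoids.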

The Q-ToF mass spectrometer was equipped with a high-emission low-energy electron ionization source, which was operated with an electron energy of 70 eV and an emission current of 5.0 μA. During the experiment, the source, quadrupole, and transfer line temperatures were 280 °C, 150 °C, and 300 °C, respectively. All mass spectral data were recorded at a rate of 5 Hz from 35 to 500 *m/z* after a 5 min solvent delay. After every second sample injection, automated ToF mass calibration was performed using a keyword command in the sequence table. Data were acquired using Agilent MassHunter software (version B7.06.274, Agilent Technologies, Santa Clara, CA, USA). Further data processing was accomplished using Agilent MassHunter Qualitative Analysis and Quantitative Analysis (version 10.0.10305.0, Agilent Technologies, Santa Clara, CA, USA). The NIST database (version 2.3, NIST Standard Reference Materials, Gaithersburg, MD, USA) was used for tentative compound identification.

#### *2.5. Data Processing and Statistical Analysis*

As a part of data processing, the GC/Q-ToF data were converted into the .cef file format using Agilent MassHunter Unknown Analysis (version 10.0.7070, Agilent Technologies, Santa Clara, CA, USA). The "SureMass" peak detection and deconvolution algorithm was selected, and a peak area filter of 10,000 counts was applied. Ions with identical elution profiles and similar spectral data were extracted as entities characterized by retention time (*tR*), peak intensity, and mass-to-charge ratio (*m*/*z*). Then, the resulting .cef file for each sample was exported into the Mass Profiler Professional software package (version B.12.05, Agilent Technologies, Santa Clara, CA, USA), which includes SCP algorithms for further data processing.

After examining various minimum abundance counts, a setting of 5000 counts was finally selected for the extraction of entities from the spectra. The alignment of retention time, with a tolerance window of 0.15 min, and the similarity of the spectral pattern were carried out and compared across the entire sample set. The internal standard docosane (C22H46) was selected to normalize the peak intensity across all spectra. A stepwise reduction of entity dimensionality was performed based on common entities found across samples to further process the data. In addition, software settings such as parameter values (filter by flags), the frequency of occurrence (filter by frequency), the abundance of respective entities in classes (filter by sample variability), and one-way analysis of variance (ANOVA) were applied by the software to filter the raw data. After filtering the raw data, quality control of the samples was performed by PCA to further reduce the dimensionality of the GC/Q-ToF data sets, increase interpretability, and minimize information loss. Based on the PCA, an SCP model was constructed. Five algorithms, namely, partial least squares discriminant analysis (PLS-DA), support vector machines (SVM), naive Bayes (NB), decision tree (DT), and neural network (NN), were evaluated. The PLS-DA algorithm was selected since it was particularly well-suited for the project and resulted in the best prediction accuracy when compared to the other algorithms. To validate the model, a k-fold cross-validation procedure was carried out. The validation procedure used three folds (k = 3) and was repeated ten times.
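The first filtering steps described above (minimum abundance, normalization to the internal standard, and filtering by frequency) can be illustrated with a short sketch. This is not the Mass Profiler Professional implementation; the function names, the dictionary layout, the entity names other than docosane and camphor, the peak areas, and the frequency cut-off are all hypothetical and are used only to show the order of operations.

```python
def apply_abundance_threshold(sample, min_counts=5000, keep=("docosane",)):
    """Drop entities below the minimum abundance; always keep the internal standard."""
    return {name: area for name, area in sample.items()
            if area >= min_counts or name in keep}


def normalize_to_internal_standard(sample, is_name="docosane"):
    """Express every entity abundance relative to the internal standard."""
    is_area = sample[is_name]
    if is_area <= 0:
        raise ValueError("internal standard not detected")
    return {name: area / is_area
            for name, area in sample.items() if name != is_name}


def filter_by_frequency(samples, min_fraction):
    """Keep entities that occur in at least min_fraction of all samples."""
    counts = {}
    for sample in samples:
        for name in sample:
            counts[name] = counts.get(name, 0) + 1
    kept = {name for name, n in counts.items() if n / len(samples) >= min_fraction}
    return [{name: value for name, value in sample.items() if name in kept}
            for sample in samples]


# Hypothetical peak areas (counts) for two injections; not measured data.
raw = [
    {"docosane": 200000.0, "camphor": 50000.0, "entity_x": 4000.0},
    {"docosane": 100000.0, "camphor": 30000.0, "entity_y": 20000.0},
]
thresholded = [apply_abundance_threshold(s) for s in raw]
normalized = [normalize_to_internal_standard(s) for s in thresholded]
filtered = filter_by_frequency(normalized, min_fraction=1.0)  # cut-off is an assumption
print(filtered)
```

Normalizing before the frequency filter keeps the comparison between injections independent of small differences in injected amount, which is the purpose of adding docosane at a fixed concentration.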

#### *2.6. Establishment of a Personal Compound Database and Library (PCDL)*

A PCDL was constructed using Agilent PCDL Manager software (version B8.00). Either readily available or isolated and fully characterized in-house chemical compounds were utilized as reference standards to establish the PCDL. Data including the retention time, exact mass, and high-resolution MS fragmentation patterns were exported to the PCDL. Additional information, such as the molecular formula, compound name, and CAS number, was assigned to each entry for constructing the PCDL.

#### **3. Results and Discussion**

#### *3.1. Extraction*

Although hexane is touted as an ideal extraction solvent for capturing a wide variety of volatiles in botanicals, the extraction efficiency is questionable for some of the semi-volatile polar constituents, such as salvinorins from *S. divinorum*. These limitations with hexane are alleviated by utilizing dichloromethane [25] as the solvent of choice. A simple sample extraction procedure with dichloromethane improved the overall throughput and captured a wide variety of volatile analytes for species identification.

#### *3.2. GC/Q-ToF Analysis*

After developing a satisfactory sample extraction technique and an optimized GC/Q-ToF method, the sample data were gathered (Figure 1). Upon examining the chromatograms of the investigated species, compounds were detected in the GC/Q-ToF analysis of the authentic *Salvia* plant extracts. Although there were slight variations among the concentrations of components within a particular *Salvia* species, characteristic and consistent fingerprinting patterns from the same species of *Salvia* were observed. However, distinct differences in their chemical profiles were noticed for different species, as illustrated in Figure 1.

Although approximately 200 compounds were tentatively identified from the five species, only the 32 compounds that were found in the greatest abundance or were characteristic of each species are reported. The tentative identity of each analyte suggested by the NIST database was further confirmed with reference standards and the accurate mass of molecular ions when they were available. Many early-eluting, highly volatile compounds were present in *S. officinalis*, *S. apiana*, and *S. mellifera*; however, these compounds were mostly absent in the samples of *S. divinorum* and *S. miltiorrhiza*. After systematically examining the compounds present in each species, additional characteristic patterns were also established. For example, samples of *S. officinalis* contained the compounds β-thujone, viridiflorol, and verticiol, which were not detected in the other *Salvia* species. Although these compounds have been reported in other plant species, e.g., viridiflorol has been reported as a major constituent of *Allophylus edulis* [26], they were only present in *S. officinalis* among the five *Salvia* species in this study. Thus, the co-existence of β-thujone, viridiflorol, and verticiol can be used to distinguish *S. officinalis* from other *Salvia* species. This finding is also supported by a previous study comparing four *Salvia* species [27]. Likewise, samples of *S. mellifera* contained statistically significant amounts (*p* < 0.05) of camphor when compared to the other species. This is consistent with the report by Martino et al. of *S. mellifera* containing approximately 12.2% camphor [28]. In addition, *S. mellifera* contained β-amyrone, pectolinaringenin, and lupeol, which were not detected in the other analyzed species. Only *S. apiana* samples contained γ-gurjunene and a statistically significant amount of isoledene. Unfortunately, due to the small amount of available literature concerning the volatile constituents of *S. apiana*, the authors were unable to confirm these findings with literature sources. *S. miltiorrhiza* samples contained the greatest amount (*p* < 0.05) of ferruginol, as well as the unique compound tanshinone II. The occurrence of tanshinone II in only *S. miltiorrhiza* samples is also supported by a review from Zhang et al. [9]. In addition to being the only group that possessed the compounds salvinorin A and salvinorin B, *S. divinorum* also contained the greatest abundance (*p* < 0.05) of 8-hexadecyne. Willard and colleagues also reported the utility of salvinorin A in the identification of *S. divinorum* [25]. Utilizing these observed chemical distributions, each species' chemical fingerprint and the peak area percentage of detectable compounds can be obtained (Table 2A,B).

**Figure 1.** Representative chromatograms comparing *Salvia* species. Peak assignments: (1) α-pinene; (2) 1,8-cineole; (3) camphor; (4) viridiflorol; (5) verticiol; (6) salvigenin; (7) 8-hexadecyne; (8) salvinorin B; (9) salvinorin A; (10) γ-gurjunene; (11) lupeol; (12) ferruginol; (13) tanshinone II; (14) cryptotanshinone; (IS-1) tridecane; (IS-2) docosane.


**Table 2.** Tentative compound identification based on NIST library and percent (% peak area) of volatile compounds in methylene chloride extracts of (A) *S. officinalis* and *S. apiana* and (B) *S. divinorum*, *S. mellifera*, and *S. miltiorrhiza* using GC/Q-ToF analysis.

| Compound | *tR* (min) | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Heptacosane <sup>b</sup> | 109.945 | nd | nd | nd | 2.55 | 1.93 | 2.21 | nd | nd | nd | nd | nd | nd |
| Salvigenin <sup>b</sup> | 110.291 | nd | nd | nd | 2.55 | 1.93 | 2.21 | nd | nd | nd | nd | nd | nd |
| Salvinorin B <sup>b</sup> | 110.841 | 3.60 | 0.75 | 1.12 | nd | nd | nd | nd | nd | nd | nd | nd | nd |
| Salvinorin A <sup>a</sup> | 114.026 | 33.48 | 20.24 | 22.70 | nd | nd | nd | nd | nd | nd | nd | nd | nd |
| β-Amyrone <sup>b</sup> | 116.668 | nd | nd | nd | 1.14 | 0.96 | 0.79 | nd | nd | nd | nd | nd | nd |
| Lupeol <sup>b</sup> | 117.441 | nd | nd | nd | 4.16 | 3.34 | 4.09 | nd | nd | nd | nd | nd | nd |

nd: not detected; tr: trace amount; <sup>a</sup> compound identification based on NIST library was confirmed with reference standard; <sup>b</sup> accurate mass was consistent with GC/Q-ToF analysis.

#### *3.3. Chemometric Analysis*

Although GC/MS is a popular means of identifying *Salvia* species, it is often time-consuming [16]. While this method is well suited for small sample sizes, it does not lend itself to high-throughput applications, such as batch processing or quality control. With the coupling of GC to a Q-ToF mass spectrometer, vast amounts of high-resolution structural data can be gathered from compounds in each sample. Using these data along with chemometrics, researchers can develop an SCP model from the data obtained from the species [29,30].

PCA is a useful analysis that can transform large and complex data sets into manageable information for interpretation [31]. The stepwise reduction in entity dimensionality was performed based on filtering by flags, filtering by frequency, filtering by sample variability, and the results of ANOVA. Stepwise filtering intentionally created a strong filter so that the most discriminant entities could be used to construct the prediction model. After filtering, a PCA was performed, as illustrated in Figure 2. Good separation and species-specific clustering of the different *Salvia* species were achieved. Approximately 50% of the variation among species could be attributed to component **1**. Additional variation and separation could be explained by component **2** (15%). Contributing the least, component **3** only accounted for approximately 9% of the variation observed among the species.

**Figure 2.** PCA score plot of five *Salvia* species.

Although the PCA demonstrated good separation between different *Salvia* species, it was unable to assign and predict the identity of unknown/commercial *Salvia* species sold in the U.S. market. Therefore, the GC/Q-ToF data for the authenticated samples were subjected to supervised chemometric methods. The first step in the SCP model construction process is to select the algorithm that is best suited to the project and the data set parameters. The PLS-DA [29] algorithm was found to be the best suited to construct a statistical model for *Salvia* classification and differentiation. Good separation obtained by the PLS-DA model among different *Salvia* species is shown in Figure 3. Once established, the software can use the sample characteristics and the associated algorithms to classify unknown samples. As illustrated in Figure 3, the PLS-DA successfully separated and clustered members of the authentic samples.

**Figure 3.** Score plot of the PLS-DA model constructed based on GC/Q-ToF data for the authenticated *Salvia* samples from five different species.

To validate the constructed model, the same authenticated samples used for the model training were repeatedly used due to the limited number of authenticated plant samples available. Although redundant, this is a valid statistical procedure (k-fold cross validation). Both the recognition and prediction abilities of the class prediction model were 100%, as shown in Table 3. Once the test was complete, a "confusion matrix" was generated. The test results indicated that this SCP could successfully identify and classify samples (Table 3). The construction of the SCP not only allows a large number of samples to be classified efficiently, but also in an automated manner. This allows the user to process additional samples at any point in the future.
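The repeated k-fold procedure (three folds, ten repeats) and the tallying of a confusion matrix can be sketched as follows. This is an illustration of the resampling scheme only, not the software's implementation: the classifier itself (PLS-DA) is omitted, the stratification of folds by species is an assumption that the text does not state, and the number of samples per species used here is hypothetical.

```python
import random
from collections import Counter


def stratified_folds(labels, k, rng):
    """Assign sample indices to k folds, spreading each class evenly over the folds."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def repeated_kfold(labels, k=3, repeats=10, seed=0):
    """Yield (train_indices, test_indices) for every fold of every repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = stratified_folds(labels, k, rng)
        for i in range(k):
            test = sorted(folds[i])
            train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
            yield train, test


def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) pairs; off-diagonal keys are misclassifications."""
    return Counter(zip(true_labels, predicted_labels))


def accuracy(matrix):
    """Fraction of predictions on the diagonal of the confusion matrix."""
    total = sum(matrix.values())
    correct = sum(n for (true, pred), n in matrix.items() if true == pred)
    return correct / total


SPECIES = ["apiana", "divinorum", "mellifera", "miltiorrhiza", "officinalis"]
labels = [s for s in SPECIES for _ in range(6)]  # 6 samples per species is hypothetical
splits = list(repeated_kfold(labels, k=3, repeats=10))
print(len(splits))  # 3 folds x 10 repeats = 30 train/test splits
```

In each split, a model would be trained on the training indices and used to predict the held-out indices; pooling the predictions into `confusion_matrix` gives a table of the kind summarized in Table 3, where a prediction ability of 100% corresponds to all counts lying on the diagonal.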


**Table 3.** Summary of classification results obtained by the PLS-DA model.

#### *3.4. Construction of a Personal Compound Database and Library (PCDL) for High-Throughput Screening*

Although compound identification can be accomplished by manual inspection, this process can be both time-consuming and inefficient due to the large amount of high-resolution data obtained. With this in mind, a PCDL was constructed to facilitate the efficient throughput of samples. From the PCA loading plot (Figure S1 in the Supplementary Material), which is a visual representation of the "characteristic compounds" found in different *Salvia* species, marker compounds correlating to the separation of different species or the clustering of similar species were identified [30]. As illustrated in Table 4, each species could be distinguished by a few select compounds. Hence, the identified marker compounds that were commercially available or isolated in-house were analyzed by using the identical GC/Q-ToF method.


**Table 4.** Proposed marker compounds tentatively identified for the differentiation of selected *Salvia* species.

\* Statistically significant amount detected (*p* < 0.05).

After analyzing the standards, data including the retention time, exact mass, and a curated accurate mass spectrum containing mass assignments for each spectral peak were exported to the PCDL. Utilizing the PCDL software, additional data such as the molecular formula, compound name, and CAS number were also captured. Figure 4 shows an overview of the PCDL table with the spectrum of salvinorin A, one of the marker compounds only present in *S. divinorum*.
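The kind of information stored for each library entry can be pictured with a small record type. The sketch below is hypothetical and does not reflect the PCDL Manager file format; the field names are assumptions. The molecular formula of salvinorin A (C23H28O8) is general chemical knowledge rather than a value stated in this section, and the retention time is the value listed for salvinorin A in Table 2, assumed here to be in minutes. The exact mass is computed from the formula, and a mass error in ppm is the usual way to compare a measured *m*/*z* with it.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional

# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
ELECTRON_MASS = 0.00054858  # u, subtracted for the M+. ion formed in EI


def monoisotopic_mass(formula):
    """Monoisotopic mass of a neutral molecule from a simple formula such as C23H28O8."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


@dataclass
class PcdlEntry:
    """Hypothetical library record; field names are illustrative only."""
    name: str
    formula: str
    retention_time_min: float
    cas_number: Optional[str] = None                           # left unset in this sketch
    spectrum: Dict[float, float] = field(default_factory=dict)  # m/z -> relative abundance

    @property
    def exact_mass(self):
        return monoisotopic_mass(self.formula)

    @property
    def molecular_ion_mz(self):
        return self.exact_mass - ELECTRON_MASS


entry = PcdlEntry(name="salvinorin A", formula="C23H28O8", retention_time_min=114.026)
print(round(entry.exact_mass, 4), round(entry.molecular_ion_mz, 4))
```

Storing the formula rather than a hard-coded mass keeps the exact mass and the theoretical isotope pattern derivable from a single field.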

The commercially available MassHunter Unknown Analysis software uses an algorithm called "SureMass" to find peaks in the accurate mass chromatogram and searches a mass spectral library or PCDL to identify compounds. If the library has locked retention times or index values, these can also be used as filters. If these filters are utilized, "hits" must have the correct retention time (*tR*) and be similar to the database spectrum. Figure 5 illustrates the results for the identification and isotope pattern for salvinorin A in one of the *S. divinorum* samples.

The "SureMass" peak-finding algorithm uses the added information available in high-resolution accurate mass data. For instance, extracted ion chromatograms of salvinorin A are overlaid and compared in Figure 5A. In contrast, a "head-to-tail" comparison plot of the high-resolution mass spectra of the suspected target and the reference compound illustrates the matching spectra (Figure 5B). In addition, the software can generate the compound's isotope pattern if the molecular ion is detected in sufficient abundance. The compound's theoretical value is next compared to the detected isotope's *m*/*z* and relative abundance [30]. Additional confidence in the correct identification of the compound is provided when the theoretical value and detected *m/z* and abundance are good matches. In Figure 5C, the detected isotope pattern of salvinorin A (black vertical lines) is compared to the theoretical isotope pattern represented by red boxes. In the present study, peaks from the sample spectra of the five *Salvia* species that were identified by "SureMass" were compared to the in-house-constructed PCDL. This approach is inherently simple and data review is relatively easy. Once the PCDL is constructed, it not only allows for high sample throughput, but can be easily utilized in the future to analyze additional samples or be shared with research labs that do not have standard marker compounds.
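The general idea of a library search that filters on retention time and then scores spectral similarity can be sketched as below. This is a generic cosine-similarity match and is not the "SureMass" algorithm or the MassHunter scoring function, neither of which is described in detail here. The tolerances, the score threshold, the entry names, and all spectra are hypothetical placeholders.

```python
import math


def cosine_similarity(spec_a, spec_b, mz_tol=0.005):
    """Cosine similarity of two centroided spectra given as {m/z: intensity}.

    Peaks are paired when their m/z values agree within mz_tol; each library
    peak is used at most once.
    """
    dot = 0.0
    used = set()
    for mz_a, int_a in spec_a.items():
        candidates = [mz_b for mz_b in spec_b
                      if abs(mz_b - mz_a) <= mz_tol and mz_b not in used]
        if candidates:
            best = min(candidates, key=lambda mz_b: abs(mz_b - mz_a))
            used.add(best)
            dot += int_a * spec_b[best]
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_library(query_rt, query_spec, library, rt_tol=0.15, min_score=0.8):
    """Return (name, score) hits that pass the retention time filter and score cut-off."""
    hits = []
    for entry in library:
        if abs(entry["rt"] - query_rt) > rt_tol:
            continue  # wrong retention time: rejected before any spectral comparison
        score = cosine_similarity(query_spec, entry["spectrum"])
        if score >= min_score:
            hits.append((entry["name"], score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Placeholder library and query; these are not real reference spectra.
library = [
    {"name": "compound_A", "rt": 10.00, "spectrum": {100.000: 1.0, 150.000: 0.5}},
    {"name": "compound_B", "rt": 10.05, "spectrum": {200.000: 1.0}},
    {"name": "compound_C", "rt": 20.00, "spectrum": {100.000: 1.0, 150.000: 0.5}},
]
hits = search_library(10.02, {100.001: 1.0, 150.002: 0.5}, library)
print(hits)
```

In this toy example, compound_C has the same spectrum as compound_A but is excluded by the retention time filter, which is the role that locked retention times play when they are available in the library.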


**Figure 4.** A section of the PCDL showing some of the content available for each entry and the accurate mass EI spectrum of salvinorin A from the PCDL.

**Figure 5.** Identification of salvinorin A from *S. divinorum* (#22490). (**A**) Overlaid chromatograms of the five ions extracted for salvinorin A; (**B**) a "head-to-tail" comparison plot of high-resolution spectra of salvinorin A from PCDL (black) and the sample (orange); (**C**) the isotope pattern of the molecular ion (black vertical lines) compared to the theoretical pattern (red boxes).

#### **4. Conclusions**

Members of the genus *Salvia* have a rich history of both culinary and medicinal usage. With approximately 900 species included in the genus *Salvia*, the accurate species identification of processed botanical material can be a daunting task [2]. Although arduous, this task is of vast importance since the herb possesses species-specific pharmacological properties [2]. In the present study, we analyzed five species of botanically verified, medicinally important *Salvia* (*apiana, divinorum, mellifera, miltiorrhiza*, and *officinalis*) to develop a single analytical method for species differentiation purposes. Advances in software, the GC/Q-ToF analysis of volatile organics, and the accurate mass spectral data together allowed the unambiguous identification of the five studied *Salvia* species. Although some of the marker compounds can be found in other plants, it is both the combination and concentration of the compounds that can aid in the species identification of *Salvia* botanical material. The implementation of chemometric analysis, *viz*. the PCA [29,30] of the *Salvia* samples, resulted in the identification of marker compounds for different *Salvia* species. Furthermore, the same PCA programs can also be expanded to build prediction models, which may be utilized and modified for high-throughput sample analyses and classification purposes. To aid further, a PCDL combined with high-resolution mass spectrometry was developed with the versatility and ability to identify individual compounds present in *Salvia* samples.

In summary, by utilizing GC/Q-ToF, we obtained chemical fingerprints of each *Salvia* species being investigated. This information was further processed to construct an SCP model. By utilizing this model, future unknown samples can easily and efficiently be identified. As analytical needs change over time, the SCP model allows researchers to expand by including other economically important *Salvia* species. By leveraging advanced analytical techniques and chemometrics, the quality of closely related botanicals can be confirmed successfully, as demonstrated with a broad spectrum of biologically active *Salvia* species with complex chemistries.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/foods11142132/s1, Figure S1: PCA loading plot illustrating suggested species marker compounds.

**Author Contributions:** Sample preparation, gas chromatography, chemical analyses, data analysis, and manuscript preparation, J.L. (Joseph Lee) and M.W.; conceptualization, resources including funding, review, and editing, I.A.K., A.G.C., C.W. and J.L. (Jing Li); chemometric analysis, J.Z.; methodology, data validation and cross confirmation with LC-MS, B.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research is supported by the "Holistic Approach for Potential Drug Interactions with Botanical Drugs" funded by the Center for Drug Evaluation and Research, United States Food and Drug Administration, grant number HHSF223201810175C and "Discovery & Development of Natural Products for Pharmaceutical & Agricultural Applications" funded by the United States Department of Agriculture, Agricultural Research Service, Specific Cooperative Agreement No. 58-6060-6-015. The content is solely the responsibility of the authors and does not necessarily represent the official views of the Food and Drug Administration and the Department of Agriculture, Agricultural Research Service.

**Data Availability Statement:** Data is contained within the article or supplementary material. The data presented in this study are available on request to the corresponding author.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

