**A Rapid HPLC-UV Protocol Coupled to Chemometric Analysis for the Determination of the Major Phenolic Constituents and Tocopherol Content in Almonds and the Discrimination of the Geographical Origin**

**Natasa P. Kalogiouri 1, \* , Petros D. Mitsikaris 2 , Dimitris Klaoudatos 3 , Athanasios N. Papadopoulos 2 and Victoria F. Samanidou 1**


**Citation:** Kalogiouri, N.P.; Mitsikaris, P.D.; Klaoudatos, D.; Papadopoulos, A.N.; Samanidou, V.F. A Rapid HPLC-UV Protocol Coupled to Chemometric Analysis for the Determination of the Major Phenolic Constituents and Tocopherol Content in Almonds and the Discrimination of the Geographical Origin. *Molecules* **2021**, *26*, 5433. https://doi.org/10.3390/molecules26185433

Academic Editors: Anna Barros and Eulogio J. Llorent-Martínez

Received: 17 August 2021 Accepted: 4 September 2021 Published: 7 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Abstract:** Reversed-phase high-pressure liquid chromatographic methods with UV detection (RP-HPLC-UV) were developed for the determination of phenolic compounds and tocopherols in almonds. Nineteen samples of *Texas* almonds originating from the USA and Greece were analyzed, and 7 phenolic acids, 7 flavonoids, and tocopherols (α- and β + γ-) were determined. The methods were validated and showed excellent linearity (r<sup>2</sup> > 0.99). For phenolic compounds, recoveries ranged from 83.1% (syringic acid) to 95.5% (ferulic acid) in the within-day assay (*n* = 6) and from 90.2% (diosmin) to 103.4% (rosmarinic acid) in the between-day assay (*n* = 3 × 3); for tocopherols, they ranged from 95.1% to 100.4% in the within-day assay (*n* = 6) and from 93.2% to 96.2% in the between-day assay (*n* = 3 × 3). The analytes were quantified, and the results were analyzed by principal component analysis (PCA) and agglomerative hierarchical clustering (AHC) to investigate potential relationships between the bioactive content of almonds and their geographical origin. A decision tree (DT) was developed to predict the geographical origin of almonds; it proposed a characteristic marker with a concentration threshold and proved to be a promising and reliable tool for verifying the authenticity of almonds.

**Keywords:** almonds; HPLC; authenticity; PCA; tocopherols; phenolics

#### **1. Introduction**

The current trend in nutrition favors the Mediterranean diet, which is considered one of the healthiest dietary patterns. Nuts are highly nutritious foods with a distinctive taste and beneficial health properties that derive from their unique molecular composition. Popular tree nuts include almonds (*Prunus amygdalus* Batsch or *P. dulcis*), walnuts (*Juglans regia* L.), hazelnuts (*Corylus avellana* L.), and pistachios (*Pistacia vera* L.), among others. Almonds are one of the most popular and widely harvested culinary nuts in the world. Apart from their taste and texture, they have been shown to possess a wide variety of beneficial health properties and are therefore considered an important component of a healthy and nutritious diet [1–5].

Numerous studies have linked regular almond consumption to various pharmacological activities. A meta-analysis observed a significant reduction in LDL-C levels with almond consumption [6]. Additionally, a systematic review conducted by Kalita et al. [7] suggests that eating almonds leads to a significant reduction in total cholesterol, LDL-C, and triglyceride levels, whilst the impact on HDL-C levels is minor. In a six-week randomized, controlled crossover study, individuals who consumed 45 g of almonds per day showed reduced LDL-C and non-HDL-C levels while maintaining their HDL-C levels [8]. The study also demonstrated that almond intake reduced abdominal fat, which is highly significant given that high amounts of abdominal fat are a major factor in metabolic syndrome. Furthermore, studies suggest that, apart from reducing the risk of cardiovascular disease, almonds exhibit anti-inflammatory and anti-carcinogenic effects [9]. Taken together, these studies indicate that almonds can be an effective dietary tool for lowering cholesterol levels and, hence, the risk of cardiovascular disease.

These beneficial health effects are mainly attributed to the favorable phytochemical composition of almonds, which are rich in bioactive constituents, mainly phenolics and tocopherols. Phenolic compounds are secondary plant metabolites that originate from carbohydrates through the shikimate and phenylpropanoid pathways [2,10], and their chemical structure is characterized by one or more aromatic rings bearing at least one hydroxyl group [2]. Tocopherols are one of the two subgroups that make up vitamin E, the other being tocotrienols, and comprise four derivatives: α-, β-, γ-, and δ-tocopherol [11–13]. It has been suggested that phenolic compounds found in almond skins act synergistically with vitamins C and E to protect LDL particles from oxidation, enhancing the individual's overall antioxidant capacity [14].

Although polyphenols and tocopherols are ubiquitous in nuts, and particularly in almonds, their content, distribution, and bioavailability vary depending on genetics, location, plant structure, pre- and post-harvest factors, and climate conditions [15–17]. In this context, analysis of the phenolic content of almonds could provide useful information, making the evaluation of different almond cultivars produced in different countries more accurate. The authentication of almond cultivars also contributes to the assessment of overall almond quality. However, traditional authentication methods are strongly affected by environmental and production factors, which makes differentiating almonds by cultivar, geographical origin, and type of farming a difficult task [15,18–22]. Hence, there is a need for specific analytical methodologies and protocols that are applicable to a wide variety of nut types, with the end goal of differentiating them based on their phenolic content.

The determination of small bioactive molecules in food matrices involves several distinct aspects of the analytical methodology. Phenolic compounds and tocopherols are mainly separated by high-pressure liquid chromatography (HPLC) coupled to UV [12,23,24], diode array (DAD) [25,26], or mass spectrometric (MS) detectors [27,28]. The most crucial step of the analytical methodology is sample preparation. Several laborious and time-consuming protocols have been proposed that require large volumes of organic solvents and Soxhlet-type apparatus [29,30]. The objective is to eliminate the use of organic solvents, minimize extraction times, and select techniques that are suitable for the rapid determination of bioactive constituents [23]. Further processing of the results with chemometric tools extends the scope of the analysis and strengthens the reliability of the conclusions drawn from the experimental data. Data mining and chemometric models are widely used in food authenticity studies to address issues such as the discrimination of botanical origin, geographical origin, and farming type [31–33].

The objective of this research was to develop two rapid HPLC-UV methodologies for the determination of the major phenolic compounds and tocopherols in almonds of the *Texas* variety originating from Greece and the USA. The quantification results were further analyzed with agglomerative hierarchical clustering (AHC) and principal component analysis (PCA) to investigate similarities between samples of the same geographical origin. A decision tree (DT) was developed for the classification of almonds and proved to be a promising and reliable tool for verifying their geographical origin on the basis of their phenolic profile and bioactive content.

#### **2. Results**

#### *2.1. Analytical Performance*

#### 2.1.1. RP-HPLC-UV Method for the Determination of Phenolic Compounds

The analytical parameters of the HPLC-UV methodology for the determination of phenolic compounds, including the calibration curves, linear ranges, coefficients of determination, limits of detection (LODs), limits of quantification (LOQs), precision, and accuracy, are summarized in Table S1. The coefficients of determination ranged between 0.991 and 0.999, showing good linearity for all the phenolic analytes. The LOQs ranged from 0.24 µg/g (rosmarinic acid) to 1.80 µg/g (diosmin), and the LODs from 0.08 µg/g (rosmarinic acid) to 0.60 µg/g (vanillin). The RSD% values of the within-day (*n* = 6) and between-day (*n* = 3 × 3) assays were lower than 6.1 and 10.3, respectively, indicating adequate precision. Accuracy was assessed as the relative percentage recovery (%R) at three concentration levels (0.5, 5, and 10 µg/g) and ranged from 83.1% (syringic acid at the 10 µg/g level) to 95.5% (ferulic acid at the 0.5 µg/g level) for the within-day assay (*n* = 6) (Table S2), and from 90.2% (diosmin) to 103.4% (rosmarinic acid) for the between-day assay (*n* = 3 × 3) (Table S3).

#### 2.1.2. RP-HPLC-UV for the Determination of Tocopherols

The analytical parameters of the RP-HPLC-UV methodology for the determination of tocopherols are presented in Table S4. The LOQs ranged from 0.36 µg/g (γ-tocopherol) to 0.99 µg/g (α-tocopherol), and the LODs from 0.12 µg/g (γ-tocopherol) to 0.33 µg/g (α-tocopherol). The RSD% values of the within-day (*n* = 6) and between-day (*n* = 3 × 3) assays were lower than 5.5 and 8.1, respectively, indicating adequate precision. Accuracy was assessed as the relative percentage recovery at three concentration levels (0.5, 5, and 10 µg/g) and ranged from 95.1% to 100.4% for the within-day assay (*n* = 6) (Table S5), and from 93.2% to 96.2% for the between-day assay (*n* = 3 × 3) (Table S6).

#### *2.2. Application to Real Samples*

#### 2.2.1. Determination of Phenolic Compounds

Nineteen almond samples of the *Texas* variety from the USA and Greece were analyzed, and fourteen phenolic compounds were determined in total. From the class of phenolic acids, gallic acid, ferulic acid, sinapic acid, rosmarinic acid, vanillic acid, p-coumaric acid, and caffeic acid were determined; from the class of flavonoids, diosmin, catechin, epicatechin, quercetin, luteolin, apigenin, and kaempferol were determined. A characteristic chromatogram of a sample spiked at 5 µg/g is presented in the Supplementary Materials (Figure S1), and the retention times of the identified phenolic analytes are given in Table S7. All samples were analyzed in triplicate, and the concentration ranges and mean values (±SD) are presented in Table 1.

The results are in accordance with Coric et al. [26] and Bolling [34]. Specifically, vanillic acid ranged from 1.37 to 4.25 µg/g in Greek almonds and from 1.03 to 2.23 µg/g in American almonds, similar to the range of 0.38–2.84 µg/g reported by Coric et al. [26]. Caffeic acid ranged from 1.18 to 1.85 µg/g in Greek almonds and from 0.82 to 1.90 µg/g in American almonds, slightly higher than the concentrations of up to 1.48 µg/g reported by Coric et al. [26]. Sinapic acid ranged from 1.25 to 4.48 µg/g in Greek almonds and from 1.02 to 3.65 µg/g in American almonds, in line with Coric et al. [26], who reported concentrations of up to 3.50 µg/g. Syringic acid was not detected in any of the samples, while p-coumaric acid was detected below the LOQ. Furthermore, rosmarinic acid ranged from 1.03 to 1.84 µg/g in Greek almonds and from 2.51 to 4.19 µg/g in American almonds; these concentrations are higher than those previously reported by Keser et al. [35]. Notably high concentrations of gallic acid, up to 4.56 µg/g in Greek almonds and up to 1.81 µg/g in American almonds, were also detected compared to the literature [26,34,36].

**Table 1.** Phenolic compounds' quantification results in Greek and American almonds (samples analyzed in triplicate, *n* = 3).


As far as flavonoids are concerned, catechin was the dominant phenolic compound, with similar mean values in Greek (21.3 µg/g) and American (20.2 µg/g) almonds. The second most abundant flavonoid was diosmin, with a higher mean value in American almonds (8.06 µg/g) than in Greek almonds (3.91 µg/g). Apigenin concentrations were higher in Greek almonds (4.65 to 8.65 µg/g) than in American almonds (up to 3.21 µg/g). The mean concentration of luteolin in Greek almonds was 0.59 µg/g, while luteolin was not detected in American almonds. Epicatechin ranged from 3.21 to 6.01 µg/g in Greek almonds and from 1.02 to 1.21 µg/g in American almonds. Kaempferol was detected at a higher mean concentration in Greek almonds (2.53 µg/g) than in American almonds (1.30 µg/g), similar to Coric et al. [26], who reported concentrations of up to 2.63 µg/g. Finally, quercetin was detected at a mean concentration of 0.53 µg/g in Greek almonds and was not detected in American almonds, which is consistent with the literature [28,34] indicating that quercetin occurs in almonds mainly as glycosides rather than as the free aglycone.

#### 2.2.2. Determination of Tocopherols

The separation of tocopherols was achieved within 15 min. The gradient elution program separated α-tocopherol (Rt = 9.2 min) and δ-tocopherol (Rt = 12.1 min), while β- and γ-tocopherol co-eluted (Rt = 10.6 min) and were therefore quantified as a sum, according to Gliszczyńska-Świgło et al. [9]. A representative chromatogram of the 10 µg/g standard mixture is shown in the Supplementary Materials (Figure S2). The analysis showed that almonds are a rich source of α-tocopherol, which ranged from 502 to 802 µg/g in Greek almonds and from 221 to 326 µg/g in American almonds. γ-Tocopherol was measured as the sum of β- and γ-tocopherol, since these isomers co-elute in RP chromatographic systems. δ-Tocopherol was not detected in any of the analyzed samples. All samples were analyzed in triplicate, and the quantification ranges and mean values (±SD) are presented in Table 2.


**Table 2.** Tocopherols' quantification results in Greek and American almonds (samples analyzed in triplicate, *n* = 3).

#### *2.3. Chemometric Analysis*

The quantification results of the determined phenolic compounds and tocopherols (Sections 2.2.1 and 2.2.2) were further processed with chemometric tools to examine if the samples can be classified according to their phenolic composition and tocopherol content.

#### 2.3.1. PCA

PCA was applied to the nineteen almond samples originating from Greece and the USA. The data matrix consisted of sixteen features (quantification results of phenolics and tocopherols) and was normalized using the autoscaling function of the MetaboAnalyst package [37]. Figure 1 presents the score plot, in which the almonds cluster into two groups according to geographical origin: almonds originating from Greece are marked in red and almonds originating from the USA in green. The first two principal components (PCs) explained 66.8% of the variance and grouped samples of the same variety according to geographical origin. The PCA biplot in Figure S3 shows the influence of the variables on each PC.
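The autoscaling and PCA steps can be sketched in Python as follows. This is a minimal illustration assuming NumPy and an SVD-based PCA; it is not the MetaboAnalyst implementation, the function names `autoscale` and `pca_scores` are hypothetical, and the random matrix only mimics the shape of the study's data (19 samples × 16 features).

```python
import numpy as np


def autoscale(X: np.ndarray) -> np.ndarray:
    """Mean-center each feature (column) and scale it to unit sample variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    # A constant column (e.g. a compound never detected) has std = 0; replacing it
    # by 1 leaves that column at 0 after centering instead of dividing by zero.
    std = np.where(std == 0, 1.0, std)
    return (X - mean) / std


def pca_scores(X: np.ndarray, n_components: int = 2):
    """PCA via SVD of the autoscaled samples x features matrix.

    Returns the scores (sample coordinates), the loadings (feature weights)
    and the fraction of total variance explained by each retained PC.
    """
    Xs = autoscale(X)
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]


# Illustrative random data with the study's shape; NOT the measured concentrations.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(19, 16))
scores, loadings, explained = pca_scores(X_demo)
print(f"PC1 + PC2 explain {100 * explained.sum():.1f}% of the variance")
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by origin, gives a score plot of the kind shown in Figure 1, and the rows of `loadings` correspond to the variable vectors of a biplot.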

**Figure 1.** PCA Score plot showing the pairwise correlation between PCs in the clustering between Greek and American almonds.


#### 2.3.2. Agglomerative Hierarchical Clustering

Cluster analysis was employed to divide the data matrix into homogeneous groups by measuring the distance between each pair of objects, without prior knowledge of the group structure. A tree diagram was built with AHC to identify groups of highly similar samples. The algorithm initially treats each object as a singleton cluster (leaf) and then successively merges pairs of clusters until all objects belong to a single large cluster [38], resulting in a tree-based representation known as a dendrogram.
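A minimal sketch of this agglomerative procedure, assuming SciPy, Euclidean distances on autoscaled data, and Ward linkage. The linkage criterion used in the study is not stated, so `method="ward"` is an assumption, the helper `ahc` is hypothetical, and the synthetic two-group data merely stand in for the eleven Greek and eight American samples.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage


def ahc(X: np.ndarray, n_clusters: int = 2, method: str = "ward"):
    """Agglomerative hierarchical clustering of samples (rows of X).

    linkage() starts from singleton clusters and repeatedly merges the two
    closest clusters; the returned linkage matrix encodes the dendrogram.
    """
    std = X.std(axis=0, ddof=1)
    Xs = (X - X.mean(axis=0)) / np.where(std == 0, 1.0, std)  # autoscaling
    Z = linkage(Xs, method=method, metric="euclidean")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")  # cut the tree into n_clusters groups
    return Z, labels


# Synthetic data: 11 "G" and 8 "U" samples with shifted means (not real measurements).
rng = np.random.default_rng(1)
X_demo = np.vstack([rng.normal(0.0, 1.0, size=(11, 16)), rng.normal(5.0, 1.0, size=(8, 16))])
names = [f"G{i}" for i in range(1, 12)] + [f"U{i}" for i in range(1, 9)]
Z, labels = ahc(X_demo)
leaf_order = dendrogram(Z, labels=names, no_plot=True)["ivl"]  # leaf order as drawn in the dendrogram
print(dict(zip(names, labels)))
```

Other linkage rules (`"average"`, `"complete"`) can be passed through `method`; they may order the leaves differently on real data.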

Figure 2 presents the dendrogram of the eleven Greek and eight American almonds, which cluster into two major groups according to their place of origin.

**Figure 2.** HCA dendrogram of Greek (G1–G11) and American (U1–U8) almonds' clustering in two major groups.

The heatmap in Figure 3 presents the data matrix showing pairwise correlations between the Greek (G1–G11) and American almonds (U1–U8). Each one of the colored cells corresponds to a concentration value; the samples are represented in the columns and the compounds in the rows.

**Figure 3.** Clustered image map acquired by HCA dendrogram showing pairwise correlation between almonds produced in the USA (U1–U8, in green on the left-hand side of the heatmap) and Greece (G1–G11, in red on the right-hand side of the heatmap). The red blocks of the heatmap indicate positive correlations, whereas the blue blocks indicate negative correlations for the clustering of the samples. Lighter shades indicate smaller correlation values.

#### 2.3.3. Decision Tree

The DT algorithm was built to develop a prediction model by repeatedly splitting the data into two discrete subsets according to the numerical value (i.e., concentration threshold) of the selected explanatory variable. At each split, the model selects the most significant variable, namely the one that minimizes the model's total error. The initial dataset was split into a training set of twelve samples and a test set of seven samples. The developed DT suggested that ferulic acid could be used as a characteristic marker for discriminating between Greek and American almonds; it classified the samples with zero error, resulting in two terminal nodes and a concentration threshold of 1.54 µg/g. The DT was validated with receiver operating characteristic (ROC) curves (sensitivity versus 1 − specificity) for each class of almonds, which confirmed zero classification error (Figure S4).

According to Figure 4, almonds with ferulic acid concentrations lower than or equal to 1.54 µg/g were produced in the USA, while those with concentrations higher than 1.54 µg/g were produced in Greece.
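For a single explanatory variable, the search for such a concentration threshold can be illustrated with a one-split tree (decision stump). The sketch below scans midpoints between consecutive distinct concentrations and keeps the split with the fewest misclassifications. Minitab's CART procedure uses impurity criteria (e.g., Gini) and searches all variables, so this is a simplified stand-in; the helpers `fit_stump` and `predict` are hypothetical and the concentrations are invented, not the study's data.

```python
from typing import Optional, Sequence, Tuple

# (threshold, label if value <= threshold, label if value > threshold, training errors)
Stump = Tuple[float, str, str, int]


def fit_stump(x: Sequence[float], y: Sequence[str]) -> Stump:
    """Find the threshold on x that best separates the two classes in y."""
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("fit_stump expects exactly two classes")
    xs = sorted(set(x))
    best: Optional[Stump] = None
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2  # candidate threshold between two observed values
        for le, gt in ((classes[0], classes[1]), (classes[1], classes[0])):
            errors = sum(
                (xi <= thr and yi != le) or (xi > thr and yi != gt)
                for xi, yi in zip(x, y)
            )
            if best is None or errors < best[3]:
                best = (thr, le, gt, errors)
    if best is None:
        raise ValueError("x needs at least two distinct values")
    return best


def predict(stump: Stump, value: float) -> str:
    """Apply the single-split rule to a new concentration."""
    thr, le, gt, _ = stump
    return le if value <= thr else gt


# Hypothetical ferulic acid concentrations (ug/g) and origins, for illustration only.
conc = [1.20, 1.35, 1.48, 1.90, 2.10, 2.45]
origin = ["USA", "USA", "USA", "Greece", "Greece", "Greece"]
stump = fit_stump(conc, origin)
print(stump, predict(stump, 1.5), predict(stump, 2.0))
```

The fitted rule has the same shape as the one in Figure 4: values at or below the threshold go to one class and higher values to the other. Placing the cut at the midpoint between neighbouring observations is a common convention; how Minitab positions its cut-off is not assumed here.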


**Figure 4.** Optimal decision tree diagram using predictive analysis for phenolic and tocopherol concentration of almond samples originating from Greece and USA.

#### **3. Materials and Methods**

#### *3.1. Chemicals and Standards*

Acetonitrile (ACN) and 2-propanol (IPA), HPLC grade, were purchased from Panreac AppliChem (Darmstadt, Germany). Methanol (MeOH), HPLC grade, was acquired from Carl Roth (Karlsruhe, Germany). Hexane (reagent grade, 99%) and acetic acid (99%) were purchased from Sigma-Aldrich (Steinheim, Germany). Ultrapure water was provided by a Milli-Q purification system (Millipore, Bedford, MA, USA).

Sinapic acid 95%, gallic acid 98%, ferulic acid 98%, rosmarinic acid 98%, catechin 98%, epicatechin 97%, p-coumaric acid 98%, quercetin 98%, diosmin 97%, kaempferol 97%, vanillic acid 97%, and caffeic acid 98% were purchased from Sigma-Aldrich (Steinheim, Germany). Luteolin 98% was acquired from Santa Cruz Biotechnologies. Apigenin 97% was purchased from Alfa Aesar (Karlsruhe, Germany). α-Tocopherol 96%, β-tocopherol 96%, γ-tocopherol 96%, and δ-tocopherol 96% were purchased from Sigma-Aldrich (Steinheim, Germany). Stock standard solutions of each analyte (1000 mg/L) were prepared in methanol and stored at −20 °C in dark brown glass bottles.

#### *3.2. Collection of Samples*

Eleven Greek almond samples of the *Texas* variety were acquired from different producers in different regions of Greece (Evia, Trikala, Vergina, Katerini, Adendro, Elassona, Mouzaki, Aridaia, Veroia, Drama, Larissa), and eight almond samples of the *Texas* variety originating from California and available on the Greek market were acquired from eight different traders.

#### *3.3. Instrumentation*

The chromatographic analysis of the analytes was performed on an Agilent (Santa Clara, CA, USA) 1220 Infinity HPLC-UV system using gradient elution methods. The HPLC system consisted of a manual injector, a column oven, a degasser, and a UV detector. The analysis was controlled with the Agilent OpenLab software and the Method and Run Control package, and the Data Analysis software package was used to identify and integrate the peaks. A glass vacuum-filtration apparatus produced by Alltech Associates (Deerfield, IL, USA), in combination with cellulose nitrate and nylon 0.22 µm membrane filters (Whatman Laboratory Division, Maidstone, UK), was used to filter the aqueous and organic phases, respectively. QMax RR syringe filters (0.22 µm nylon membrane), purchased from Frisenette ApS (Knebel, Denmark), were used to filter the real samples prior to analysis. An ultrasonic bath (MRC: DC-150H) by MRC (Essex, UK) was utilized to remove the template from the MIP as well as for sample preparation. A vortex mixer from VELP Scientifica (Usmate Velate, Italy) was used to agitate the samples, and a 3–16PK centrifuge by Sigma (Osterode am Harz, Germany) was used for centrifugation.

#### *3.4. Chromatographic Conditions*

A Nucleosil RP-18 analytical column (250 mm × 4.6 mm, 5 µm particle size), supplied by Macherey–Nagel (Düren, Nordrhein-Westfalen, Germany), was used for the analysis of polyphenols at 280 nm. The mobile phase consisted of 2% (*v/v*) acetic acid in ultrapure water (A) and 0.5% (*v/v*) acetic acid in ACN (B). The system operated in gradient mode at 28 °C: at the beginning of the analysis the mobile phase was 100% A, gradually decreasing to 20% A at the 60 min mark. The flow rate was set at 1 mL/min, and each chromatographic run lasted 60 min. The peaks were identified by comparing the retention times of the standard compounds with those of the peaks detected in real samples.
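The phenolic gradient can be written as a list of (time, %A) breakpoints. The sketch below assumes that "gradually decreasing" means a linear ramp between breakpoints, which the text does not state explicitly; the helper `percent_a` and the constant `PHENOLIC_PROGRAM` are hypothetical names.

```python
from bisect import bisect_right
from typing import List, Tuple

# (time in min, % of eluent A) breakpoints for the phenolic method:
# A = 2% v/v acetic acid in water, B = 0.5% v/v acetic acid in ACN.
PHENOLIC_PROGRAM: List[Tuple[float, float]] = [(0.0, 100.0), (60.0, 20.0)]


def percent_a(program: List[Tuple[float, float]], t: float) -> float:
    """%A at time t: linear between breakpoints, held constant outside them."""
    times = [p[0] for p in program]
    if t <= times[0]:
        return program[0][1]
    if t >= times[-1]:
        return program[-1][1]
    i = bisect_right(times, t) - 1  # index of the last breakpoint at or before t
    (t0, a0), (t1, a1) = program[i], program[i + 1]
    return a0 + (a1 - a0) * (t - t0) / (t1 - t0)


for t in (0, 15, 30, 45, 60):
    a = percent_a(PHENOLIC_PROGRAM, t)
    print(f"{t:>2} min: {a:5.1f}% A / {100 - a:5.1f}% B")
```

The same function accepts any breakpoint list; for instance, the tocopherol program could be encoded as `[(0, 50), (7, 50), (12, 100), (15, 100)]`, where the hold from 12 to 15 min is itself an assumption.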

A Kromasil RP-18 analytical column (125 mm × 4.6 mm, 5 µm particle size), purchased from Macherey–Nagel, was used for the analysis of tocopherols at 295 nm. The mobile phase consisted of methanol (A) and ACN (B). The system operated in gradient mode at 28 °C: the mobile phase was held at 50% A until the 7 min mark and was then gradually increased to 100% A at the 12 min mark. The flow rate was set at 1 mL/min, and each chromatographic run lasted 15 min. The peaks were identified by comparing the retention times of the standard compounds with those of the peaks detected in real samples.
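Identification by retention-time matching can be sketched as a nearest-match search within a tolerance window. The reference values below are the tocopherol retention times reported in Section 2.2.2; the ±0.2 min tolerance, the helper `identify_peaks`, and the sample peak list are assumptions for illustration.

```python
from typing import Dict, List, Optional

# Retention times (min) of the tocopherol standards reported in Section 2.2.2.
TOCOPHEROL_RTS: Dict[str, float] = {
    "alpha-tocopherol": 9.2,
    "beta+gamma-tocopherol": 10.6,  # co-eluting pair, quantified as a sum
    "delta-tocopherol": 12.1,
}


def identify_peaks(
    peak_rts: List[float], reference_rts: Dict[str, float], tol: float = 0.2
) -> Dict[float, Optional[str]]:
    """Assign each sample peak to the closest reference standard within +/- tol min.

    Peaks with no standard inside the window are left unassigned (None).
    """
    assignments: Dict[float, Optional[str]] = {}
    for rt in peak_rts:
        best_name: Optional[str] = None
        best_diff = tol
        for name, ref_rt in reference_rts.items():
            diff = abs(rt - ref_rt)
            if diff <= best_diff:
                best_name, best_diff = name, diff
        assignments[rt] = best_name
    return assignments


# Hypothetical peak list from a sample chromatogram.
print(identify_peaks([9.25, 10.55, 11.5], TOCOPHEROL_RTS))
```

In practice the tolerance would be set from the observed run-to-run variation of the standards' retention times.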

#### *3.5. Sample Preparation*

For the extraction of the phenolic compounds, a modified version of the protocol previously proposed by Kritikou et al. [39] was applied. In brief, 0.5 g of each homogenized sample was weighed into a 2 mL Eppendorf tube, and 1 mL of MeOH:H2O (80:20, *v*/*v*) was added. The samples were vortexed for 1 min and then centrifuged for 5 min at 8000 rpm. The extract was collected and filtered through 0.45 µm nylon filters, and 20 µL was injected into the chromatographic system. For the extraction of tocopherols, 1 g of each homogenized sample was weighed into a 15 mL Falcon tube, and 10 mL of hexane was added to extract the lipid fraction. The samples were vortexed for 1 min and then placed in an ultrasonic bath at 40 °C for 10 min. The Falcon tubes were then centrifuged at 8000 rpm for 10 min. The organic layer was transferred and evaporated in a rotary evaporator under vacuum, and the resulting almond oil was collected and stored in dark brown vials at −20 °C. Prior to analysis, 20 mg of oil was weighed and dissolved in 500 µL of 2-propanol, according to Martakos et al. [40]. The mixture was filtered through 0.45 µm nylon syringe filters, and a 20 µL aliquot was injected into the HPLC system.
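The mass-to-volume ratios above fix the conversion from the concentration measured in the injected solution to the content in the original material. The sketch assumes that the instrument reading is in µg/mL, that the final extract volume equals the solvent volume added, and that no further dilution occurs; the study does not spell out its calculation, so `content_ug_per_g` is a hypothetical helper.

```python
def content_ug_per_g(
    conc_ug_per_ml: float, volume_ml: float, mass_g: float, dilution: float = 1.0
) -> float:
    """Convert a concentration in the injected solution to ug of analyte per g of material."""
    if mass_g <= 0:
        raise ValueError("mass_g must be positive")
    return conc_ug_per_ml * dilution * volume_ml / mass_g


# Phenolics: 0.5 g almond extracted with 1 mL MeOH:H2O (80:20), i.e. a factor of 2 mL/g.
phenolic_factor = content_ug_per_g(1.0, volume_ml=1.0, mass_g=0.5)
# Tocopherols: 20 mg (0.020 g) oil in 500 uL (0.5 mL) 2-propanol, i.e. a factor of 25 mL/g of oil.
tocopherol_factor = content_ug_per_g(1.0, volume_ml=0.5, mass_g=0.020)
print(phenolic_factor, tocopherol_factor)
```

Note that the tocopherol factor refers to grams of extracted oil; expressing results per gram of almond would additionally require the oil yield, which is not reported here.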

#### *3.6. Method Validation*

Method validation was performed for both methodologies to estimate selectivity, linearity, limits of detection (LODs) and quantification (LOQs), and within-day and between-day accuracy and precision. Linearity studies were performed in triplicate over working ranges of 0.5–20 µg/g for the phenolic compounds and 5–50 µg/g for the tocopherols. Linearity was assessed by constructing eight-point calibration curves for each analyte from standard solutions, plotting peak area versus concentration. LODs and LOQs were determined on the basis of the signal-to-noise (S/N) ratio of each analyte, as the concentrations at which S/N ratios of 3:1 (LOD) and 10:1 (LOQ) were reached [41].
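The calibration and sensitivity calculations can be sketched as follows. The code fits an ordinary least-squares line of peak area against concentration, reports r², and back-calculates unknowns. For LOD/LOQ it extrapolates from a single low-level standard, assuming S/N scales linearly with concentration; the study instead reports the concentrations at which S/N reached 3:1 and 10:1, so `lod_loq_from_sn` is an approximation, and all function names and calibration data are hypothetical.

```python
import numpy as np


def fit_calibration(conc, area):
    """Least-squares line area = slope * conc + intercept, plus the coefficient of determination."""
    conc = np.asarray(conc, dtype=float)
    area = np.asarray(area, dtype=float)
    slope, intercept = np.polyfit(conc, area, 1)
    residuals = area - (slope * conc + intercept)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((area - area.mean()) ** 2)
    return slope, intercept, r2


def back_calculate(area, slope, intercept):
    """Concentration corresponding to a measured peak area."""
    return (area - intercept) / slope


def lod_loq_from_sn(conc_low, sn_low):
    """Concentrations expected to give S/N = 3 (LOD) and S/N = 10 (LOQ), assuming linear S/N."""
    return 3.0 * conc_low / sn_low, 10.0 * conc_low / sn_low


# Hypothetical eight-point calibration within the 0.5-20 ug/g phenolic range.
levels = [0.5, 1.0, 2.5, 5.0, 7.5, 10.0, 15.0, 20.0]
areas = [60.0, 118.0, 301.0, 596.0, 905.0, 1198.0, 1801.0, 2397.0]
slope, intercept, r2 = fit_calibration(levels, areas)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r2 = {r2:.4f}")
```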

Accuracy and precision were studied for both methods using a pool sample spiked at three concentration levels (0.5, 10, and 20 µg/g for phenolics; 5, 25, and 50 µg/g for tocopherols), all analyzed in triplicate. Accuracy was expressed as relative recovery (R%), calculated by comparing the found and added concentrations of each analyte: R% = (mean concentration found/concentration added) × 100. Precision was expressed as relative standard deviation (RSD%). Within-day precision (repeatability) was assessed in six replicates (*n* = 6), while between-day precision (intermediate precision) was assessed by triplicate analysis of spiked samples on three consecutive days (*n* = 3 × 3) [41]. Five blank matrices were used to assess selectivity, and no interferences were observed in the same chromatographic window for either methodology.
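The recovery and precision figures of merit reduce to simple statistics. The sketch below implements R% as defined above, with an optional native-content term (zero by default) for spiked samples that already contain the analyte, and RSD% as the sample standard deviation over the mean. Pooling all nine between-day results into one RSD is a simplification (an ANOVA-based estimate of intermediate precision is a common alternative), and the replicate values are hypothetical.

```python
import statistics
from typing import Sequence


def recovery_percent(found: Sequence[float], added: float, native: float = 0.0) -> float:
    """R% = (mean found - native content) / added * 100."""
    return (statistics.mean(found) - native) / added * 100.0


def rsd_percent(values: Sequence[float]) -> float:
    """Relative standard deviation (%) using the sample standard deviation (n - 1)."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


# Hypothetical within-day replicates (n = 6) of a sample spiked at 10 ug/g.
within_day = [9.2, 9.5, 9.4, 9.1, 9.6, 9.3]
# Hypothetical between-day results: 3 days x 3 replicates, pooled.
between_day = [9.0, 9.3, 9.1, 9.6, 9.4, 9.5, 8.9, 9.2, 9.3]

print(f"Within-day:  R% = {recovery_percent(within_day, 10.0):.1f}, RSD% = {rsd_percent(within_day):.1f}")
print(f"Between-day: R% = {recovery_percent(between_day, 10.0):.1f}, RSD% = {rsd_percent(between_day):.1f}")
```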

#### *3.7. Chemometric Analysis*

PCA was used to represent the variation in the dataset of nineteen samples and sixteen features (phenolic compounds and tocopherols). PCA is an unsupervised chemometric method used for exploratory data analysis [42]. It reduces the dimensionality of the data while retaining most of its variation by projecting the samples onto principal components (PCs), which are linear combinations of the original variables. The first PC explains the largest variance, the second PC the second largest, and so on [43]. HCA was also used to represent and visualize the classes of almonds, explore the similarities among the analyzed samples, and discover patterns among them [44]. A DT was developed to discover patterns in the quantitative data and predict the geographical origin of the analyzed samples on the basis of a numerical concentration threshold. PCA and HCA were performed in R using the MetaboAnalyst package [37], and the DT was created in Minitab 19 software (Minitab, PA, USA).

#### **4. Conclusions**

This work presents an innovative approach for assessing the bioactive content of almonds, based on two RP-HPLC-UV methodologies for the determination of phenolic compounds and tocopherols, respectively. Nineteen almond samples originating from the USA and Greece were analyzed. Gallic acid, ferulic acid, sinapic acid, caffeic acid, vanillic acid, p-coumaric acid, and rosmarinic acid were determined from the class of phenolic acids, and catechin, epicatechin, diosmin, quercetin, apigenin, luteolin, and kaempferol from the class of flavonoids. From the group of tocopherols, α-tocopherol and the sum of β- and γ-tocopherol were determined. The quantification results were further processed with chemometrics. The PCA score plot showed a clear separation of the almonds into two groups according to their geographical origin (Greece or USA), with the first two PCs explaining 66.7% of the variance. The HCA dendrogram likewise showed two major clusters corresponding to the origin of production. Finally, a DT developed to predict the country of origin suggested ferulic acid as a characteristic marker, with a concentration threshold of 1.54 µg/g.

The findings of this research contribute to the characterization of almonds of the *Texas* variety, showing that geographical origin affects phenolic composition and tocopherol content and that these bioactive constituents could be used to authenticate almonds commercially available on the Greek market.

**Supplementary Materials:** The following are available online: Figure S1: Characteristic chromatogram of an almond sample spiked at 5 µg/g. Figure S2: Characteristic chromatogram of a standard mixture of tocopherols at 10 µg/g. Figure S3: PCA biplot presenting the projection of the data set in PC1 and PC2. The red vectors show the influence of the variables on each PC (atoc: α-tocopherol; bctoc: β + γ-tocopherol). Figure S4: Prediction model performance characteristics. Table S1: Chromatographic retention times of the phenolic compounds determined in almonds. Table S2: Recoveries (%R) for the evaluation of repeatability. Table S3: Recoveries (%R) for the evaluation of intermediate precision. Table S4: Recoveries (%R) for the evaluation of repeatability. Table S5: Recoveries (%R) for the evaluation of repeatability. Table S6: Recoveries (%R) for the evaluation of intermediate precision. Table S7: Chromatographic retention times of the phenolic compounds determined in almonds.

**Author Contributions:** Conceptualization, N.P.K.; methodology, N.P.K.; validation, N.P.K.; formal analysis, N.P.K. and P.D.M.; investigation, N.P.K., P.D.M. and D.K.; resources, A.N.P.; data curation, N.P.K. and P.D.M.; writing—original draft preparation, N.P.K. and P.D.M.; writing—review and editing, N.P.K., D.K., A.N.P. and V.F.S.; visualization, N.P.K. and D.K.; supervision, A.N.P. and V.F.S.; project administration, N.P.K. and V.F.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was co-financed by Greece and the European Union (European Social Fund, ESF) through the Operational Program "Human Resources Development, Education and Lifelong Learning" in the context of the project "Reinforcement of Postdoctoral Researchers—2nd Cycle" (MIS-5033021), implemented by the State Scholarships Foundation (IKY) with grant number 2019-050-0503-17749.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Sample Availability:** Samples of the compounds are available from the authors.

#### **References**


## *Article* **A Rapid LC-MS/MS-PRM Assay for Serologic Quantification of Sialylated O-HPX Glycoforms in Patients with Liver Fibrosis**

**Aswini Panigrahi 1,2 , Julius Benicky 1,2 , Renhuizi Wei 1,2 , Jaeil Ahn 3 , Radoslav Goldman 1,2,4 and Miloslav Sanda 1,2,5, \***


**Abstract:** The development of robust, high-throughput methods is a prerequisite for the successful clinical use of LC-MS/MS assays. In earlier studies, we reported that nLC-MS/MS measurement of the O-glycoforms of HPX is an indicator of liver fibrosis. In this study, we show that a microflow LC-MS/MS method using a single-column setup for analyte capture, desalting, fast gradient elution, and on-line mass spectrometric measurement is robust, substantially faster, and even more sensitive than our nLC setup. We demonstrate the applicability of the workflow to the quantification of O-HPX glycoforms in unfractionated serum samples of control and liver disease patients. The assay requires microliter volumes of serum, and the platform is amenable to one hundred sample injections per day, providing a valuable tool for biomarker validation and screening studies.

**Keywords:** microflow LC-MS; mLC-MS/MS; liver fibrosis; hemopexin; biomarker

#### **1. Introduction**

Biomarker studies rely heavily on nanoflow liquid chromatography tandem mass spectrometry (nLC-MS/MS) for both discovery shotgun proteomics and targeted follow-up validation studies. In contrast to small-molecule quantification, where standard HPLC flow rates are common for LC-MS analysis, nLC-MS/MS has been favored for peptide quantification primarily because of its sensitivity of analyte detection. However, nLC-MS methods remain technically challenging, time-consuming, and less robust [1], which limits their use in clinical laboratories and their application to large sample sets.

More recently, researchers have begun to explore capillary columns with a wider bore than the conventional 75 µm ID nanoflow analytical columns [2–4]. This allows the LC step of proteomic studies to be run at microflow rates and at substantially higher throughput. The increased flow rate shortens the gradient time and improves the reproducibility and robustness of the measurements [5]. However, in a conventional single spray-tip setup, the higher flow rate diminishes ionization efficiency and lowers the sensitivity of detection below acceptable limits for the majority of peptides in complex samples. This has been addressed by the development of a multi-nozzle emitter that splits the flow evenly into multiple smaller streams, which has been shown to substantially enhance ionization efficiency [6]. In combination with advances in the sensitivity of mass spectrometers, microflow LC-MS/MS (mLC-MS/MS) methods reach a sensitivity of detection comparable to that of nLC-MS/MS. Shotgun proteomics studies using mLC-MS/MS have reported the identification of close to 10,000 proteins in cell digests, and stability and reproducibility over thousands of runs [5,7]. In these studies, the robustness of the method in high-throughput bottom-up proteomic analyses was demonstrated using complex cell, tissue, and body fluid digests. The microflow method avoided column overloading, resulting in good peak shapes. This, together with negligible carryover, is critical for accurate quantification by LC-MS/MS. The method has been adapted for protein biomarker studies using data-independent acquisition (DIA), parallel reaction monitoring (PRM), and multiple reaction monitoring (MRM) [3,8–10]. However, we are not aware of any reports of the use of mLC-MS/MS for the analysis of O-glycopeptides.

**Citation:** Panigrahi, A.; Benicky, J.; Wei, R.; Ahn, J.; Goldman, R.; Sanda, M. A Rapid LC-MS/MS-PRM Assay for Serologic Quantification of Sialylated O-HPX Glycoforms in Patients with Liver Fibrosis. *Molecules* **2022**, *27*, 2213. https://doi.org/10.3390/molecules27072213

Academic Editors: Victoria Samanidou and Natasa Kalogiouri

Received: 23 February 2022 Accepted: 25 March 2022 Published: 29 March 2022

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In this study, we developed an mLC-MS/MS-PRM assay for the quantification of site-specific mucin-type O-glycoforms of hemopexin (HPX), which we previously reported as a promising candidate biomarker for the serologic monitoring of liver fibrosis [11,12]. We have shown that the sialylated O-glycoforms of HPX in patient serum are associated with advancing fibrosis in hepatitis C-associated liver disease [11]. This may prove useful for monitoring fibrotic liver disease, which affects a large segment of the world's population and whose progression can be mitigated by timely lifestyle changes and interventions [13,14]. Our newly optimized method performs analyte capture, desalting, and gradient elution on a single column, directly on a tryptic digest of unfractionated serum, which significantly reduces the time needed for sample preparation and analysis. We used the method to quantify HPX glycoforms in serum samples of patients with HCV-induced liver disease, and we demonstrate that the mLC-MS/MS-PRM assay offers substantially higher throughput (100 injections/day) than our previously reported workflow [11] while providing higher sensitivity of detection, making it suitable for improved screening of these glycopeptide biomarker candidates.

#### **2. Results and Discussion**

Liver biopsy has been the gold standard for diagnosing fibrotic changes associated with chronic liver diseases, and non-invasive methods such as liver imaging, ultrasound elastography, and serologic monitoring provide additional options [13]. Serum protein biomarkers, including the glycosylation patterns of liver-secreted proteins, represent an attractive strategy for the serologic monitoring of liver disease (reviewed in [15,16]). We have characterized the O-glycoforms of HPX by mass spectrometry [11,12,17] and demonstrated that the relative abundance of the di- and monosialylated O-glycoforms increases significantly with progressing fibrotic liver disease of HCV etiology [11]. Building on these earlier studies, we aimed to develop a fast mLC-MS/MS assay to quantify HPX glycoforms at high throughput.

#### *2.1. Microflow LC-MS/MS for the Quantification of O-HPX*

We optimized a microflow (1.5 µL/min) LC-MS/MS workflow with 5× higher throughput than the earlier nanoflow (0.3 µL/min) method. In a conventional metal or glass needle emitter setup, the higher flow would translate into a loss of sensitivity because of analyte dilution. To circumvent this, we used a multi-nozzle emitter (8 nozzles, Newomics) [6], which has been reported to achieve sensitivity close to that of routine nLC-MS/MS applications.

Sample trapping and desalting were achieved within 2 min at a flow rate of 5 µL/min using a 20 mm C18 trap column, followed by elution of the analytes at 1.5 µL/min over 3 min, column washing for 2 min, and a 6 min equilibration step (13 min in total; for a schematic, see Supplementary Figure S1). The time gap between sample runs is negligible, making the analysis of approximately 100 samples per day feasible. The analytes were measured by a scheduled PRM assay using an Orbitrap Fusion Lumos mass spectrometer (Thermo Scientific, Dreieich, Germany).
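As a rough check of the throughput figure, the sketch below adds up the step durations given above and counts how many cycles fit into a day. It assumes uninterrupted 24 h operation; the optional between-run gap (`overhead_min`) and the function names are illustrative, not part of the published method.

```python
"""Back-of-the-envelope throughput for the microflow LC-MS/MS cycle."""

# Step durations in minutes, taken from the text.
STEPS_MIN = {
    "trap_and_desalt": 2.0,   # 5 uL/min on the 20 mm C18 column
    "gradient_elution": 3.0,  # 1.5 uL/min
    "column_wash": 2.0,
    "equilibration": 6.0,
}


def cycle_time_min(steps: dict, overhead_min: float = 0.0) -> float:
    """Minutes per injection; overhead_min is a hypothetical gap between runs."""
    return sum(steps.values()) + overhead_min


def injections_per_day(cycle_min: float, hours: float = 24.0) -> int:
    """Number of complete injection cycles that fit into the given time."""
    return int(hours * 60 // cycle_min)


if __name__ == "__main__":
    cycle = cycle_time_min(STEPS_MIN)
    print(f"cycle time: {cycle:.0f} min")                      # 13 min
    print(f"injections per 24 h: {injections_per_day(cycle)}")  # 110
    # A 1 min gap between runs would still allow about 100 injections.
    print(f"with 1 min gap: {injections_per_day(cycle_time_min(STEPS_MIN, 1.0))}")
```

At 110 possible cycles per day, 30 samples in triplicate (90 injections) leave room for roughly 20 QC or blank injections.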

Measurements of serially diluted samples showed optimal sensitivity between 0.1 and 0.2 µg of injected serum protein (Figure 1). The retention time (RT) of the analytes was highly reproducible (RSD 0.20%, Figure 2), which facilitates automated processing of the results. The S-HPX measurement (i.e., the ratio of the disialylated, *m/z* 916.4, to the monosialylated, *m/z* 843.6, analyte) [11] was consistent over 50 injections (RSD 8.91%, Figure 3), demonstrating high technical reproducibility of the label-free tandem mass spectrometry assay.
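The reproducibility figures above are relative standard deviations (RSD). A minimal sketch of the calculation follows; it assumes the sample standard deviation (n − 1 denominator), which the text does not specify, and the replicate values in the usage lines are hypothetical, not data from this study.

```python
import statistics


def rsd_percent(values: list) -> float:
    """Relative standard deviation in percent: sample SD / mean * 100."""
    if len(values) < 2:
        raise ValueError("need at least two replicate values")
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("RSD is undefined for a zero mean")
    return statistics.stdev(values) / mean * 100.0


if __name__ == "__main__":
    # Hypothetical replicate retention times (min) and S-HPX ratios.
    retention_times = [5.84, 5.85, 5.86, 5.85]
    s_hpx_ratios = [7.9, 8.6, 7.4, 8.1]
    print(f"RT RSD: {rsd_percent(retention_times):.2f}%")
    print(f"S-HPX RSD: {rsd_percent(s_hpx_ratios):.2f}%")
```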

**Figure 1.** Peak area of the tryptic monosialylated O-HPX glycopeptide as a function of the amount of serum protein analyzed by mass spectrometry.

**Figure 2.** Retention time of a tryptic HPX O-glycopeptide on an Acclaim PepMap 100 C18 column. The consistent elution time at 5.85 ± 0.12 min demonstrates excellent reproducibility.

#### *2.2. Application of the Micro-Flow LC-MS/MS Assay to Serum Samples of Liver Disease Patients*

In our previous study, we reported the detection of other O-glycoforms of HPX, including the Tn antigen, but we were not able to quantify these analytes in patient samples [11]. With the enhanced sensitivity of the current setup, achieved despite the faster flow rate, we can now quantify these additional analytes (Supplementary Table S1). The inclusion list consisted of multiple O-glycoforms of the N-terminal HPX tryptic peptide [HexNAc (*m/z* 973.5), HexNAc-Hex-Neu5Ac (*m/z* 843.6), HexNAc-Hex-2Neu5Ac (*m/z* 916.4), 2HexNAc-2Hex-2Neu5Ac (*m/z* 1007.7), 2HexNAc-2Hex-3Neu5Ac (*m/z* 1080.5), 2HexNAc-2Hex-4Neu5Ac (*m/z* 1153.2), and HexNAc-Hex (*m/z* 770.9)]. The elution profile shows that all analytes elute within a short window of 5.83–5.91 min (Figure 4). The enhanced detection of the O-HPX glycoforms in unfractionated serum samples with this microflow method may be due to the combination of sample loading capacity and excellent peak shape (Figure 5) obtained at the higher flow rate. Assuming that minor differences in ionization between the glycoforms do not affect the overall results, we calculated the ratios of the peak areas of the multiply sialylated structures to those of the respective singly sialylated structures (916.4/843.6, 1080.5/1007.7, and 1153.2/1007.7), using the transitions described previously [11]; these ratios of the sialylated O-HPX analytes are referred to as S-HPX.
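The inclusion-list values form a regular ladder: adding one Neu5Ac residue (monoisotopic residue mass 291.0954 Da) to a 4+ ion moves it by 291.0954/4 ≈ 72.77 *m/z* units, which matches the 843.6 → 916.4 and 1007.7 → 1080.5 → 1153.2 steps. The sketch below uses that relationship to cross-check the list. The 4+ charge state is inferred from this spacing rather than stated in the text, and the helper name is hypothetical.

```python
# Monoisotopic residue masses (Da) of the monosaccharides in the inclusion list.
RESIDUE_MASS = {
    "HexNAc": 203.079373,
    "Hex": 162.052824,
    "Neu5Ac": 291.095417,
}


def shifted_mz(mz: float, charge: int, added: dict) -> float:
    """m/z after adding glycan residues to an ion, keeping its charge.

    A neutral mass increase dM on an ion carrying z charges shifts m/z by dM / z.
    """
    delta = sum(RESIDUE_MASS[name] * count for name, count in added.items())
    return mz + delta / charge


if __name__ == "__main__":
    z = 4  # assumed charge state of these precursors (inferred, not stated)
    checks = [
        ("HexNAc-Hex -> HexNAc-Hex-Neu5Ac", 770.9, {"Neu5Ac": 1}, 843.6),
        ("HexNAc-Hex-Neu5Ac -> HexNAc-Hex-2Neu5Ac", 843.6, {"Neu5Ac": 1}, 916.4),
        ("HexNAc-Hex-Neu5Ac -> 2HexNAc-2Hex-2Neu5Ac", 843.6,
         {"HexNAc": 1, "Hex": 1, "Neu5Ac": 1}, 1007.7),
        ("2HexNAc-2Hex-2Neu5Ac -> 3Neu5Ac", 1007.7, {"Neu5Ac": 1}, 1080.5),
        ("2HexNAc-2Hex-2Neu5Ac -> 4Neu5Ac", 1007.7, {"Neu5Ac": 2}, 1153.2),
    ]
    for label, start, added, listed in checks:
        predicted = shifted_mz(start, z, added)
        print(f"{label}: predicted {predicted:.2f}, listed {listed}")
```

Each predicted value agrees with the listed one to within 0.1 *m/z*, i.e., within the rounding of the published values.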

**Figure 3.** Repeated measurements of S-HPX in one sample by mass spectrometry. Each dot represents the ratio of *m/z* 916.4 to *m/z* 843.6 in one injection.

As a proof of concept, we quantified S-HPX in serum samples of 15 HCV fibrotic and 15 HCV cirrhotic patients (HALT-C trial participants) and compared the values with those of 15 serum samples from healthy controls. The measurement was performed using a fixed volume of serum, and the measure was normalized by the ratio of glycoforms of the same protein, as described previously [11]. Statistical analyses were performed to assess the association between the different analytes and disease status. The mean ratio ± standard error of 916.4/843.6 in the control, fibrotic, and cirrhotic groups was 7.905 ± 0.8562, 13.69 ± 2.942, and 29.99 ± 4.950, respectively; that of 1080.5/1007.7 was 8.802 ± 0.8, 11.65 ± 1.558, and 21.59 ± 2.587; and that of 1153.2/1007.7 was 1.07 ± 1.131, 4.261 ± 1.979, and 14.65 ± 3.49. One-way ANOVA showed that the relative ratios of the three analytes, 916.4/843.6 (*p* < 0.0001), 1080.5/1007.7 (*p* < 0.0001), and 1153.2/1007.7 (*p* = 0.0004), vary significantly between the control, fibrosis, and cirrhosis groups (Figure 5). This study thus expands the number of meaningful analytes for the detection of liver fibrosis. It confirms the observation of our earlier study that S-HPX increases progressively in fibrotic and cirrhotic participants compared with disease-free controls (Figure 5). Further studies are needed to understand the mechanisms and biological processes controlling this outcome. Nevertheless, our results show that the mLC-MS/MS-PRM assay has adequate analytical performance for the direct quantification of the clinically relevant S-HPX analyte in serum samples.
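Because group means, standard errors, and group sizes (*n* = 15) are reported, the one-way ANOVA can be approximately reconstructed as a consistency check. The sketch below does this for three groups, using the closed-form upper tail of the F(2, d) distribution so that no statistics library is needed. It is not the authors' Prism analysis: the published summaries are rounded, and no Tukey post hoc comparisons are included.

```python
def one_way_anova_from_summary(means, sems, ns):
    """One-way ANOVA reconstructed from group means, SEMs, and group sizes.

    Returns (F, df_between, df_within, p). The closed-form p-value used here
    is exact only for three groups (df_between = 2).
    """
    if not (len(means) == len(sems) == len(ns) == 3):
        raise ValueError("this sketch handles exactly three groups")
    n_total = sum(ns)
    grand_mean = sum(m * n for m, n in zip(means, ns)) / n_total
    # Between-group sum of squares.
    ss_between = sum(n * (m - grand_mean) ** 2 for m, n in zip(means, ns))
    # Within-group sum of squares: (n - 1) * SD^2, where SD^2 = n * SEM^2.
    ss_within = sum((n - 1) * n * s ** 2 for s, n in zip(sems, ns))
    df_between, df_within = 2, n_total - 3
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Upper tail of F(2, d): P(F > f) = (1 + 2f/d) ** (-d/2).
    p_value = (1.0 + 2.0 * f_stat / df_within) ** (-df_within / 2.0)
    return f_stat, df_between, df_within, p_value


if __name__ == "__main__":
    # 916.4/843.6 ratio: control, fibrosis, cirrhosis (mean, SEM), n = 15 each.
    f, d1, d2, p = one_way_anova_from_summary(
        [7.905, 13.69, 29.99], [0.8562, 2.942, 4.950], [15, 15, 15]
    )
    print(f"F({d1}, {d2}) = {f:.1f}, p = {p:.1e}")
```

For the 916.4/843.6 ratio this gives F(2, 42) ≈ 11.6 with *p* ≈ 0.0001, in line with the significance reported above.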

Overall, we demonstrate the utility of a 13 min mLC-MS/MS-PRM assay for the quantification of the S-HPX glycoforms diagnostic of liver fibrosis of HCV etiology. The assay is more sensitive than our earlier method, highly reproducible, and amenable to 100 sample injections per day. Target analyte carryover between sample injections is negligible (results not shown). In conjunction with a simple sample preparation method without an off-line desalting step, our workflow enables the analysis of at least 30 samples per day in triplicate, including the necessary QC injections. These parameters would be applicable in a clinical setting. A further increase in throughput is feasible using a wider-bore capillary column with a higher flow rate, thereby reducing the gradient run time. A multi-nozzle emitter suitable for flow rates of up to 40 µL/min is commercially available and would support such adjustments. Optimization of a high-flow, high-sensitivity methodology will be a focus of future studies.

**Figure 4.** The extracted ion chromatograms showing the elution of the sialylated O-HPX peptides. The composition of the analytes is provided in the bottom right panel.

**Figure 5.** Quantification of S-HPX in controls (CTRL, *n* = 15) and progressing stages of liver disease: liver fibrosis (FIB, *n* = 15) and cirrhosis (CIR, *n* = 15). S-HPX, the ratio of the multiply sialylated to the monosialylated glycopeptide of the same structure (disialoT/monosialoT), increases significantly (*p* < 0.01) from the control to the fibrosis and cirrhosis groups. Ratio of (**A**) HexNAc-Hex-2Neu5Ac/HexNAc-Hex-Neu5Ac, (**B**) 2HexNAc-2Hex-3Neu5Ac/2HexNAc-2Hex-2Neu5Ac, and (**C**) 2HexNAc-2Hex-4Neu5Ac/2HexNAc-2Hex-2Neu5Ac.

#### **3. Materials and Methods**

#### *3.1. Materials*

Ammonium bicarbonate, DL-dithiothreitol (DTT), and iodoacetamide (IAA) (Sigma-Aldrich, St. Louis, MO, USA); sequencing grade trypsin (Promega, Madison, WI, USA); LC/MS grade water, 0.1% formic acid in acetonitrile, and 0.1% formic acid in water (Thermo Fisher Scientific, Waltham, MA, USA); Acclaim PepMap 100 column (Thermo Fisher Scientific, Waltham, MA, USA).

#### *3.2. Sample Processing*

Serum samples were processed by trypsin digestion without any enrichment step, as described earlier [11]. Briefly, 2 µL of each serum sample was diluted to 140 µL with 25 mM ammonium bicarbonate; the proteins were reduced with 5 mM DTT at 60 °C for 1 h, followed by alkylation with 15 mM iodoacetamide for 20 min at room temperature in the dark. Residual iodoacetamide was quenched with 5 mM DTT for 20 min at room temperature. The proteins (20 µL of the above solution) were digested with mass spectrometry grade trypsin (1 µg) at 37 °C overnight. Tryptic peptides were analyzed without further processing to ensure reliable quantification of the glycoforms.
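The dilution scheme above determines how much serum, and therefore roughly how much protein, enters each digest. The sketch below works this out. The serum total-protein concentration of about 70 mg/mL (70 µg/µL) is an assumed typical value, not a number from the study, and the volumes of the DTT and iodoacetamide additions are ignored because they are not given.

```python
# Volumes and amounts from the sample-processing protocol.
SERUM_UL = 2.0            # serum per sample
DILUTED_UL = 140.0        # volume after dilution with 25 mM ammonium bicarbonate
DIGEST_ALIQUOT_UL = 20.0  # portion of the diluted sample that is digested
TRYPSIN_UG = 1.0          # trypsin added per digest

# Assumption (not from the text): typical serum total protein, ~70 ug/uL.
SERUM_PROTEIN_UG_PER_UL = 70.0


def serum_in_aliquot_ul(aliquot_ul: float = DIGEST_ALIQUOT_UL) -> float:
    """Serum volume represented by an aliquot of the diluted sample.

    Reagent additions (DTT, iodoacetamide) are ignored, so the true value is
    slightly lower than this estimate.
    """
    return SERUM_UL * aliquot_ul / DILUTED_UL


def protein_in_aliquot_ug(aliquot_ul: float = DIGEST_ALIQUOT_UL) -> float:
    """Estimated serum protein mass in the aliquot under the assumed concentration."""
    return serum_in_aliquot_ul(aliquot_ul) * SERUM_PROTEIN_UG_PER_UL


if __name__ == "__main__":
    serum = serum_in_aliquot_ul()
    protein = protein_in_aliquot_ug()
    print(f"serum per digest: {serum:.3f} uL")                # 0.286 uL
    print(f"protein per digest: ~{protein:.0f} ug")           # ~20 ug
    print(f"protein:trypsin ~ {protein / TRYPSIN_UG:.0f}:1")  # ~20:1
```

Under this assumption, each digest holds on the order of 20 µg of serum protein, so the 0.1–0.2 µg injected per run (Figure 1) is a small fraction of each digest.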

#### *3.3. Micro-Flow LC-MS/MS-PRM*

LC-MS/MS analysis was performed on an UltiMate 3000 RSLCnano chromatograph coupled to an Orbitrap Fusion Lumos mass spectrometer (Thermo), with a multi-nozzle emitter (NEWOMICS, Berkeley, CA, USA) used as the microflow sprayer. Glycopeptides were separated in microflow mode on an Acclaim PepMap 100 capillary column (75 µm ID × 20 mm length, packed with C18, 5 µm, 300 Å; Thermo). All mobile phases contained 0.1% formic acid, and the gradient program was as follows (Supplementary Figure S1):

| Time (min) | Flow rate (µL/min) | ACN (%) |
|---|---|---|
| Start | 5 | 2 |
| 0–1 | 5 | 2 |
| 1–2 | 1.5 | 2–5 |
| 2–5 | 1.5 | 5–98 |
| 7–9 | 1.5 | 98 |

The column was then re-equilibrated to the starting conditions for an additional 4 min.

We used a parallel reaction monitoring (PRM) workflow with one MS<sup>1</sup> full scan (*m/z* 400–1800, resolution 120K, maximum injection time 50 ms) and scheduled MS/MS fragmentation (isolation window *m/z* 2.0, HCD fragmentation, resolution 30K, scan range *m/z* 200–1400, RF lens 55%) for the analysis of the sialylated O-HPX glycopeptide TPLPPTSAHGNVAEGETKPDPVTER (Table 1).



#### *3.4. Study Population*

Serum samples of participants in the HALT-C trial were obtained from the central repository at the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), as described previously [12]. In this study, O-HPX glycoforms were compared in 30 participants (15 HCV fibrotic and 15 HCV cirrhotic patients) and 15 disease-free controls who donated blood samples at Georgetown University (GU) under approved IRB protocols. Briefly, the HALT-C trial was a prospective randomized controlled trial of 1050 patients that evaluated the effect of long-term, low-dose peginterferon alpha-2a in patients who had failed initial anti-HCV therapy with interferon [18]. The liver disease status of the study participants was classified by biopsy evaluation into fibrosis (Ishak score 3–4) or cirrhosis (Ishak score 5–6). The two groups of liver disease samples and the controls were frequency-matched on age, gender, and race (Supplementary Table S2).

#### *3.5. Data Analysis*

LC-MS/MS data were processed with Quant Browser (Thermo) with manual confirmation and integration. Peak areas were used for peptide and glycopeptide quantification and data normalization. A specific Y ion (e.g., the peptide ion remaining after loss of the whole glycan) was used for the quantification of the O-glycopeptides, and specific peptide backbone fragments (y ions) were used to confirm the correct O-glycopeptide signal. The MS/MS transitions used for the quantification of each glycoform are listed in Table 1. The relative intensity of each multiply sialylated analyte was calculated by normalizing its peak area to the peak area of the monosialylated glycopeptide of the same structure (disialoT/monosialoT, etc.), as described previously [11].
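The normalization step amounts to dividing the peak area of each multiply sialylated glycoform by that of its reference glycoform, using the three *m/z* pairs listed in the statistical analysis. A minimal sketch follows, assuming the peak areas have already been integrated and exported from the quantification software; the dictionary layout, the function name, and the example areas are hypothetical.

```python
import math

# (multiply sialylated precursor m/z, reference glycoform m/z), as used in the text.
RATIO_PAIRS = [("916.4", "843.6"), ("1080.5", "1007.7"), ("1153.2", "1007.7")]


def s_hpx_ratios(peak_areas: dict) -> dict:
    """Return {"num/den": area ratio} for each pair; NaN if the reference is missing."""
    ratios = {}
    for num, den in RATIO_PAIRS:
        ref = peak_areas.get(den, 0.0)
        ratios[f"{num}/{den}"] = peak_areas.get(num, 0.0) / ref if ref > 0 else math.nan
    return ratios


if __name__ == "__main__":
    # Hypothetical integrated Y-ion peak areas for one injection, keyed by precursor m/z.
    areas = {"843.6": 2.0e6, "916.4": 1.6e7, "1007.7": 5.0e5, "1080.5": 4.5e6, "1153.2": 6.0e5}
    for label, value in s_hpx_ratios(areas).items():
        print(f"{label}: {value:.2f}")
```

Recording a missing reference glycoform as NaN rather than zero keeps undetected analytes from being silently averaged into the group statistics.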

Statistical analysis of the HCV dataset was performed using GraphPad Prism software (v9.3.1). The ratios of the three sialylated HPX analytes (*m/z* 916.4, 1080.5, and 1153.2) to their respective less-sialylated forms (*m/z* 843.6, 1007.7, and 1007.7) were used as the quantitative measures for the evaluation of liver disease. Means and standard errors of the mean were calculated, one-way ANOVA was used to assess the association between each analyte and disease status, and the data were visualized with nested Tukey plots.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10.3390/molecules27072213/s1: Figure S1: A schematic showing the gradient conditions for LC-MS/MS analysis; Table S1: Basic information on the samples analyzed in this study, including participant demographics and disease conditions.

**Author Contributions:** Conceptualization, M.S.; methodology, M.S. and J.B.; validation, M.S.; formal analysis, M.S., R.W., J.A. and A.P.; investigation, M.S., J.B. and A.P.; resources, R.G.; data curation, M.S. and A.P.; writing—original draft preparation, A.P.; writing—review and editing, M.S., A.P. and R.G.; visualization, M.S. and A.P.; supervision, R.G. and M.S.; project administration, R.G. and M.S.; funding acquisition, R.G. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported in part by the National Institutes of Health (NIH grants U01CA230692 to RG and MS, R01CA238455 and R01CA135069 to RG). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

**Institutional Review Board Statement:** The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board of Georgetown University, IRB code: 2008-549, study: Glycans in Hepatocellular Carcinoma [12].

**Informed Consent Statement:** All participants provided written informed consent.

**Data Availability Statement:** The datasets generated during the current study are available from the corresponding author on reasonable request.

**Acknowledgments:** Further support was provided by the Office of The Director, National Institutes of Health under Award Number S10OD023557 supporting the operation of the Clinical and Translational Glycoscience Research Center, and Georgetown University, CCSG Grant P30 CA51008 (to Lombardi Comprehensive Cancer Center) supporting the Proteomics and Metabolomics Shared Resource.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

**Sample Availability:** Samples of the compounds are available from the authors.

#### **References**

