**Exploring Serum NMR-Based Metabolomic Fingerprint of Colorectal Cancer Patients: Effects of Surgery and Possible Associations with Cancer Relapse**

**Alessia Vignoli 1,2,†, Elena Mori 3,†, Samantha Di Donato 3, Luca Malorni 3,4, Chiara Biagioni 5, Matteo Benelli 5, Vanessa Calamai 3, Stefano Cantafio 6, Annamaria Parnofiello 3,7, Maddalena Baraghini 6, Alessia Garzi 6, Francesca Del Monte 3, Dario Romagnoli 5, Ilenia Migliaccio 4, Claudio Luchinat 1,2, Leonardo Tenori 1,2,\* and Laura Biganzoli 3,\***


**Abstract:** Background: Colorectal cancer (CRC) is the fourth most commonly diagnosed and third most deadly cancer worldwide. Surgery is the main treatment option for early disease; however, a relevant proportion of CRC patients relapse. Here, variations among preoperative and postoperative serum metabolomic fingerprint of CRC patients were studied, and possible associations between metabolic variations and cancer relapse were explored. Methods: A total of 41 patients with stage I–III CRC, planned for radical resection, were enrolled. Serum samples, collected preoperatively (t0) and 4–6 weeks after surgery before the start of any treatment (t1), were analyzed via NMR spectroscopy. NMR data were analyzed using multivariate and univariate statistical approaches. Results: Serum metabolomic fingerprints show differential clustering between t0 and t1 (82–85% accuracy). Pyruvate, HDL-related parameters, acetone, and 3-hydroxybutyrate appear to be the major players in this discrimination. Eight out of the 41 CRC patients enrolled developed cancer relapse. Postoperative, relapsed patients show an increase of pyruvate and HDL-related parameters, and a decrease of Apo-A1 Apo-B100 ratio and VLDL-related parameters. Conclusions: Surgery significantly alters the metabolomic fingerprint of CRC patients. Some metabolic changes seem to be associated with the development of cancer relapse. These data, if validated in a larger cohort, open new possibilities for risk stratification in patients with early-stage CRC.

**Keywords:** metabolomics; colorectal cancer; nuclear magnetic resonance; surgery; relapse

#### **1. Introduction**

Colorectal cancer (CRC) is the third most frequently diagnosed cancer and the second leading cause of cancer death worldwide [1–3]. A total of 80% of colon cancers are

**Citation:** Vignoli, A.; Mori, E.; Di Donato, S.; Malorni, L.; Biagioni, C.; Benelli, M.; Calamai, V.; Cantafio, S.; Parnofiello, A.; Baraghini, M.; et al. Exploring Serum NMR-Based Metabolomic Fingerprint of Colorectal Cancer Patients: Effects of Surgery and Possible Associations with Cancer Relapse. *Appl. Sci.* **2021**, *11*, 11120. https://doi.org/10.3390/ app112311120

Academic Editor: John Patrick Alao

Received: 21 October 2021 Accepted: 18 November 2021 Published: 23 November 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

diagnosed at early stage (stage 1 to 3), and surgery is the primary treatment option with curative intent for this type of disease [4]. Unfortunately, about 35% of these patients develop cancer relapse, which, in the majority of cases, occurs within the first 2–3 years after surgery [5,6]. TNM staging at diagnosis, based on depth of tumor wall invasion (T), lymph node involvement (N), and presence of distant metastasis (M), is currently the principal instrument available to predict risk of relapse, and thus to identify patients who may have potential benefits from adjuvant treatment [7,8].

Colorectal cancer is a heterogeneous disease, even within the same pathological stage, with different characteristics of clinical onset and different individual response to treatment. Moreover, patients with stage II and III CRC are shown to have different prognoses, particularly those who receive adjuvant chemotherapy, with 5-year overall survival (OS) ranging between 50% and 90% [9].

Adjuvant chemotherapy is strongly indicated in stage III disease, which is associated with a reduction of the relative risk of death of 33%, and an absolute survival benefit of 5–10% [10]. In stage III, the use of oxaliplatin in addition to fluoropyrimidines yields a further significant advantage of about 5% in terms of disease-free survival (DFS) and OS. Conversely, the therapeutic indication in patients with stage II CRC is controversial, as treatment with 5-Fluorouracil has an absolute benefit of 3–4% [11,12]. In patients with clinicopathologically high-risk stage II disease [13], decision-making around adjuvant chemotherapy treatment needs to be carefully evaluated and discussed, considering also recurrence risk factors such as baseline carcinoembryonic antigen and vascular invasion [7]. There is no evidence to support the use of adjuvant chemotherapy in stage I disease. Considering all the above mentioned data, identifying patients who are most likely to benefit from adjuvant chemotherapy and preventing the other patients from futile treatments and unnecessary exposure to toxicity is crucial in stage II disease.

Early detection of disease relapse is extremely relevant in CRC, as radical surgical intervention in patients with oligometastatic CRC can achieve a proven survival benefit. Therefore, early detection of relapse could potentially increase cure rates. Postoperative surveillance with clinical, radiological, and markers examination is often unable to identify early metastatic disease and/or postoperative minimal residual disease. Based on these considerations, improved risk stratification tools are required to reduce the number of patients treated unnecessarily.

Metabolomics is defined as the comprehensive measurement of the ensemble of metabolites present in a biological specimen, the so-called metabolome [14]. Metabolites represent, at the same time, the downstream output of the genome, transcriptome, and proteome, as well as the upstream input from various exogenous factors such as environment, lifestyle, diet, and drug administration [15]. In contrast to genomics, which indicates what might happen, metabolomic profiling/phenotyping captures what is actually happening in the body, and for this reason, in the last few years, metabolomics has been extensively applied in biomedical research [16–22].

Several relevant efforts to improve risk stratification in CRC have been made in the past years, considering mismatch repair (MMR) status, as well as BRAF and KRAS mutations, and the presence of tumor-derived circulating DNA [23,24]. Metabolomics has also emerged as a technique capable of contributing significantly in this setting [25–31]. Some of us have shown, in a cohort of elderly patients, that nuclear magnetic resonance (NMR) based metabolomics can discriminate between early and metastatic CRC. This approach may be a useful tool to build a prognostic model capable of assessing the likelihood of cancer relapse, based on the degree to which a serum fingerprint derived from a patient with early disease resembles that of a metastatic patient [13].

The study presented here explores the variations among preoperative and postoperative metabolomic serum fingerprints of CRC patients, obtained via NMR spectroscopy (Figure 1); moreover, for the first time, to the best of our knowledge, possible associations between pre/post-surgery metabolic variations and cancer relapse are examined.

**Figure 1.** Study design.

#### **2. Materials and Methods**

#### *2.1. Study Cohort*

From June 2017 to August 2018, we prospectively enrolled 41 patients with histologically diagnosed CRC, who were treated as per standard clinical practice at the Prato Hospital. All patients enrolled met the following inclusion criteria: (i) female or male patients with radically operable heteroplasia of the colon/rectum (stage I, II, III); (ii) Eastern Cooperative Oncology Group Scale of Performance Status (ECOG PS) 0–1; (iii) patients of age ≥ 18 years. For all patients enrolled, the following data were collected: (i) demographic data; (ii) clinical and histological characterization of the tumor; (iii) any other clinical information useful for the study (i.e., comorbidities, drug treatments).

All patients signed informed consent before entry into the study. The present study complies with the 1964 Declaration of Helsinki and its later amendments and received the approval by the local ethics committee (Comitato Etico Regione Toscana—Area Vasta Centro, study number: 10208\_bio).

#### *2.2. Samples Collection*

Serum samples were collected and stored following standard operating procedures validated at international level [32]. Two × 10 mL of overnight fasting peripheral blood were collected for each patient at the two timepoints (t0: before the radical tumor resection; t1: 4–6 weeks after surgery before the start of any adjuvant treatment) in serum vacutainer and processed within one hour from phlebotomy. After clot formation at room temperature, tubes were centrifuged at 1600 RCF for 10 min at 4 ◦C. Then, serum aliquots of 1 mL (labelled with an anonymized code) were immediately frozen at −80 ◦C, pending NMR analysis.

#### *2.3. NMR Analysis*

#### 2.3.1. Acquisition of NMR Data

All NMR spectra were acquired using a Bruker 600 MHz spectrometer (Bruker BioSpin, Rheinstetten, Germany) operating at 600.13 MHz proton Larmor frequency, equipped with an automatic refrigerated (6 ◦C) sample changer (SampleJet, Bruker BioSpin). Temperature stabilization (approximately 0.1 K at the sample) was obtained using a BTO 2000 thermocouple. Before NMR acquisition, to equilibrate temperature at 310 K, each sample was maintained inside the NMR probe head for at least 300 s. The spectrometer was calibrated daily, before any measurement, following strict standard operation procedures [33] to ensure high spectral quality and reproducibility.

Serum samples contain low molecular weight metabolites as well as high molecular weight macromolecules; for this reason, three different pulse sequences were used to enable the selective detection of the different serum molecular components: a 1D spin echo Carr–Purcell–Meiboom–Gill sequence (CPMG) was used to selectively detect signals of low molecular weight metabolites, and a 1D diffusion-edited pulse sequence was used to selectively acquire the signals of high molecular weight components (i.e., lipids, lipoproteins, proteins). Moreover, a 1D nuclear Overhauser effect spectroscopy pulse sequence (NOESY) was applied to detect signals of all molecules present in concentrations above the NMR detection limit.

A detailed description of sample preparation procedures, instrument configuration, and NMR parameters setting can be retrieved from our previous publication [16].

#### 2.3.2. Spectral Processing

Before applying Fourier transform, free induction decays were multiplied by an exponential function equivalent to a 0.3 Hz line-broadening factor. Using automated routine of TopSpin 3.6 (Bruker BioSpin), Fourier-transformed spectra were corrected for phase and baseline distortions, and NOESY and CPMG spectra were also calibrated at the anomeric glucose 1H doublet at δ 5.24 ppm. Each 1D spectrum in the range between 0.2 and 10.0 ppm was segmented into chemical shift bins of 0.02 ppm, and the corresponding spectral areas were integrated using AssureNMR software (Bruker BioSpin). The spectral region containing residual water signal (δ 5.12–4.38 ppm) was removed, and the dimension of the system was reduced to 453 bins.

#### *2.4. Statistical Analysis*

All data analysis was executed in the "R" statistical environment [34]. Multivariate analysis was performed on binned spectra without any a priori knowledge of the metabolites present. Multilevel partial least square analysis (mPLS) [35,36] was performed to obtain data reduction (R script developed in-house). Support vector machine [37] applied on the first nine mPLS components was used for classification purposes. Models were evaluated by means of 100 cycles of a Monte Carlo cross-validation scheme (in-housedeveloped R script). In brief, 90% of the pairs of data, selected at random at each iteration, were used as a training set to build the model. Then, the remaining 10% was tested, and sensitivity, specificity, and accuracy (calculated according to the standard definitions) were assessed.

Univariate analysis was conducted directly on the spectral regions associated with the metabolites/lipoproteins present in all serum samples at concentrations above the detection limit (>1 μM). Metabolites and lipoprotein-related parameters were identified and quantified using the Bruker IVDr quantification platform [38]. Metabolites whose levels were lower than the limit of quantification (LOQ) were imputed with half the LOQ (Table S1). Nonparametric Wilcoxon signed-rank test was used to infer intraindividual differences between the two timepoints. The *p*-values were adjusted for multiple testing using the false discovery rate (FDR) procedure with Benjamini–Hochberg [39] correction at α = 0.05. Wilcoxon rank-sum test was used to infer differences between metabolites/lipoproteins of free-from-disease and relapsed CRC patients. The *p*-values were not adjusted for multiple testing because the group of relapse patients is small, and therefore the correction would be too severe, increasing the risk of missing promising biomarkers. However, we are aware that this could increase the risk of a type I error.

Univariate analysis on clinical data was performed using the Fisher test for categorical variables and the ANOVA test for continuous variables. Polyserial correlations between ordinal clinical variables (pT, N, grade, stage, ECOG PS) and metabolites were calculated using the function "polyserial" (R package "polycor"). Point-biserial correlations between dichotomous clinical variables (tumor localization, sex) and metabolites were calculated using the function "biserial.cor" (R package "ltm").

#### **3. Results**

#### *3.1. Characteristics of Enrolled Patients*

Forty-one patients were enrolled in the study (21 female and 20 male). The median age was 73 years (Table 1).


**Table 1.** Descriptive statistics of enrolled CC patients at the time of analysis.

ECOG PS: Eastern Cooperative Oncology Group Performance Status; pT: primary tumor size; N: regional lymph nodes; MSI: microsatellite instability.

> Most of the enrolled patients had a good Eastern Cooperative Oncology Group (ECOG) performance status (PS), with 29 patients (71%) having a PS 0. However, over one half of the patients (*n* = 38; 69%) had comorbidity, of which 20 patients had vascular comorbidity. By inclusion criteria, all patients have early-stage disease: 11 patients (27%) with stage I, 13 patients (32%) stage II, and 17 patients (41%) stage III. In particular, six patients had a

pT1 (5 N0 e 1 N+), eight patients had a pT2 (6 N0 and 2 N+), 23 patients had a pT3 (18 N+), and four patients had a pT4 (3 N0 and 1 N+).

Regarding the 13 patients with stage II, two were at low risk and 11 at high risk for the presence of lymphovascular invasion, T4 or G3–4.

The majority of tumors had intermediate (G2; 48%: N = 19) or high (G3–G4; 47% N = 19) histologic grading, while G1 accounted for 5% of tumors in this population (N = 2). A total of 13 patients had left CRC and 28 right CRC (Table 1).

Half of the patients (46%; *n* = 19) received adjuvant chemotherapy, in accordance with clinical stage of disease. Nine patients received fluoropyrimidine monotherapy and 10 patients received polychemotherapy with oxaliplatin and fluoropyrimidine. Six out of eleven patients at stage II at high risk received adjuvant treatment; the rest of them did not receive chemotherapy for age or comorbidity

Thirteen out of the 17 patients with stage III disease received adjuvant treatment according to tumor stage. At the last follow-up, 19% (*n* = 8) of patients had disease relapse (Table 1). As expected, the patients with relapse had a history of stage III disease or stage II at high risk.

#### *3.2. Effects of Surgery on the Metabolome of CRC Patients*

The mPLS analysis was performed to assess intraindividual variations between t0 and t1 in the metabolomic fingerprints of CRC patients. The results obtained show significant differential clustering, with optimal separation of the two timepoints using each type of NMR spectra acquired, namely CPMG, NOESY, and DIFFUSION (Figure 2). All models classify t0 and t1 samples with an accuracy in the range 82–85%, and the best results were obtained using NOESY spectra. These data indicate that both low molecular weight metabolites and high molecular weight macromolecules (i.e., lipoproteins, proteins) contribute to the discrimination.

**Figure 2.** Score plots of the first two components of the mPLS models calculated using each of the three typologies of NMR spectra acquired: CPMG; NOESY; diffusion-edited. Discrimination accuracy of each model is reported. Each dot represents an NMR spectrum; dots are colored as follows: t0—orange, t1—turquoise. The first component mainly describes the differences between t0 and t1. The second component mainly reports the within-subject variation.

From univariate analysis emerges that after surgery there is a significant increase of pyruvate, HDL cholesterol, HDL phospholipids, HDL Apo-A1, and HDL Apo-A2 (Figure 3). Moreover, after surgery we observed a significant decrement of acetone, 3-hydroxybutyrate, LDL-Chol/HDL-Chol ratio, and Apo-A1/Apo-B100 ratio (Figure 3). Furthermore, several lipoprotein-related subfractions were shown to be significantly altered between t0 and t1 (Figure S1). These data point to a relevant rearrangement of the metabolic pathways related to lipoproteins, ketone bodies, and energy metabolism.

**Figure 3.** Boxplots of the statistically significant metabolites and lipoproteins-related parameters discriminating CRC patients at t0 (orange) and t1 (turquoise); *p*-values obtained using Wilcoxon signed-rank test and adjusted for FDR are reported. \*\*\* *p* < 0.001; \*\* *p* < 0.01; \* *p* < 0.05.

#### *3.3. Associations between Metabolome Variations after Surgery and Cancer Relapse*

Eight out of the 41 CRC patients enrolled in the present study developed cancer relapse in the three years after diagnosis. We hypothesized that different changes in preoperative and postoperative metabolomic serum profiles could be predictive of patients' prognosis. To explore this hypothesis, the difference between each metabolite/lipoproteinrelated parameter at t1 and t0 was calculated, and each resulting difference analyzed via univariate approaches to underline possible divergent behavior in free-from-disease and relapsed CRC patients. Postoperative, relapsed CRC patients show a significant increase of pyruvate, HDL Apo-A1, HDL Apo-A2, HDL cholesterol, HDL free cholesterol, and HDL phospholipids, and a significant decrease of Apo-A1 Apo-B100 ratio, VLDL-5 cholesterol, VLDL-5 free cholesterol, and VLDL-5 phospholipids (Figure 4).

**Figure 4.** Boxplots of the differences between t1 and t0 discriminating free-from-disease (green) and relapsed (red) patients, only statistically significant metabolites and lipoproteins-related parameters are reported; *p*-values obtained using Wilcoxon signed-rank test are reported. \*\* *p* < 0.01; \* *p* < 0.05.

#### *3.4. Associations between Metabolites and Clinical Variables*

Possible associations between metabolites/lipoproteins (main fractions) and clinical variables were investigated. Results are reported in Table S2.

Glycine and histidine showed statistically significant correlations with tumor size. Tyrosine correlates with tumor stage and regional lymph nodal spread (N). N also correlates with isoleucine, Apo-A1, and Apo-A2. Tumor localization (left or right colon) shows correlations with acetone, cholesterol, LDL cholesterol, and Apo-B100. Interestingly a panel of eight metabolic variables (N,N-Dimethylglycine, valine, dimethylsulfone, triglycerides, cholesterol, LDL cholesterol, Apo-A2, Apo-B100) correlates with the Eastern Cooperative Oncology Group Scale of Performance Status. Moreover, as expected, sex shows significant correlations with several metabolites/lipoproteins: creatine, creatinine, glutamine, glycine, isoleucine, leucine, formic acid, cholesterol, LDL cholesterol, HDL cholesterol, Apo-A1, Apo-A2, and Apo-B100. Of note, none of the examined metabolic features show significant correlation with tumor grade.

#### **4. Discussion**

The primary option for the treatment of colorectal cancer is surgery. Adjuvant chemotherapy is strongly indicated in stage III disease and in stage II patients at high risk of relapse. Whereas, in low-risk stage II disease decision-making around adjuvant chemotherapy must be carefully evaluated. At present, postoperative surveillance via clinical, radiological and biomarkers examination often cannot identify early metastatic disease and/or postoperative micrometastatic residual disease.

Based on these considerations, especially in stage II disease, improved risk-stratification tools are required to identify those patients who are most likely to benefit from adjuvant chemotherapy and need to be followed up more closely after surgery to timely detect systemic recurrence. On the other hand, accurate stratification instruments could prevent low-risk patients from unnecessary treatment and possible mild-to-severe adverse reactions.

The analysis described in the present research article shows for the first time, to the best of our knowledge, the metabolomic variations among preoperative and postoperative NMR-based serum fingerprint of CRC patients. Furthermore, metabolomics as novel approach for risk-stratification in CRC setting was evaluated by studying differences between pre- and postoperative serum samples of each patient enrolled. With this innovative approach, each patient in the study population acts as his/her own control, thus eliminating noise from interindividual variability.

Our data demonstrate that metabolomics profiles are influenced by the presence or absence of the cancerous mass. Indeed, the mPLS models calculated using each of the three NMR spectra acquired (namely, CPMG, NOESY, and DIFFUSION) show high discrimination accuracies (range 82–85%). This evidence poses an important question in terms of future study design, since sample collection when the tumor was still in place or after resection can significantly impact on metabolomic data.

From the univariate analysis, it emerges that after surgery, there is a significant increase of pyruvate, HDL cholesterol, HDL phospholipids, HDL Apo-A1, and HDL Apo-A2. Moreover, we observed, postoperative, a significant decrement of acetone, 3-hydroxybutyrate, LDL-Chol/HDL-Chol ratio, and Apo-A1/Apo-B100 ratio. These data point to a relevant rewiring of the metabolic pathways associated to lipoproteins, ketone bodies, and energy metabolism.

Depletion of pyruvate and increase of ketone bodies has been observed in sera of metastatic CRC patients with respect to healthy controls, and this evidence has been associated with an altered energy metabolism, probably reflecting an increased gluconeogenesis and fatty acid oxidation [31]. It is interesting to note that in our dataset, these three metabolites show trend inversions after surgery.

Our data show an increase of several HDL-Chol and a decrease of LDL-Chol lipoproteinrelated parameters post-surgery. This may be explained by the fact that, after cancer resection, an improvement in the inflammatory status of the gut is achieved, allowing for an improved lipid metabolism and lipid assimilation in the absence of the tumor.

Strikingly, despite the low number of recurrence events registered, it is peculiar that the difference in HDL-Chol is particularly marked in relapsed patients and is coupled with a decrease of VLDL-Chol. It has been observed that in colorectal cancerous tissue, the levels of cholesterol and triglycerides were reduced and HDL-Cholesterol level increased, indicating that CRC development destroys the physiological balance of lipids and lipoproteins, leading to lipid metabolic disorders [40]. Preclinical and clinical studies have already investigated the role of cholesterol in CRC progression; however, a clear understanding of the molecular mechanism linking these two entities is still lacking [40,41].

In conclusion, our results show that surgery can affect the metabolomic and lipidomic profiles of CRC patients and they point to possible associations between these metabolic changes and cancer recurrence. This study is based on a small population of CRC patients in which a very limited number of recurrence events are present; therefore, at present, results are only speculative and require further confirmation. In order to validate these findings in a general population, we are conducting a multicentric prospective trial focused on high-risk stage disease, the LIquid BIopsy and METabolomics in colon cancer (LIBIMET) study. LIBIMET aims primarily at redefining the risk of relapse in patients with high-risk, early-stage colon cancer by combining of ctDNA and serum metabolomics.

#### **5. Conclusions**

Taken together, the data here presented highlight the notion that CRC can induce metabolic changes that are reflected at a systemic level and can be detected in serum. This evidence suggests that our approach aimed at detecting micrometastatic CRC by assessing its metabolomic fingerprint in serum is correct, and that this may be exploited for biomarker-oriented research to contribute towards better management of colorectal cancer.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/ 10.3390/app112311120/s1. Table S1: Data completeness for the different metabolites quantified in the serum samples analyzed via NMR. LOQ = limit of quantification. Table S2: Correlation between clinical data and metabolites. Correlation coefficients and p-values are reported in table. Figure S1: Boxplots of the statistically significant lipoproteins-related parameters discriminating of CRC patients at t0 (orange) and t1 (turquoise); p-values obtained using Wilcoxon signed-rank test and adjusted for FDR are reported.

**Author Contributions:** Study conception and design, E.M., S.D.D., C.L., L.T. and L.B.; patient enrolment and management: E.M., S.D.D., V.C., S.C., M.B. (Maddalena Baraghini) and A.G.; collection of clinical data and serum samples: E.M., S.D.D., C.B., M.B. (Maddalena Baraghini), V.C., A.P., S.C., M.B. (Matteo Benelli), A.G. and F.D.M.; NMR analysis: A.V.; statistical analysis, biostatistics, and computational analysis, A.V., C.B., M.B. (Matteo Benelli), D.R. and L.T.; results interpretation, A.V., E.M., S.D.D., L.M., I.M., C.L., L.T. and L.B.; writing—original draft preparation: A.V. and E.M.; writing—review and editing: A.V., E.M., S.D.D., L.M., C.B., M.B. (Matteo Benelli), V.C., A.P., S.C., M.B. (Maddalena Baraghini), A.G., F.D.M., D.R., I.M., C.L., L.T. and L.B.; supervision, C.L, L.T. and L.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the local ethics committee (Comitato Etico Regione Toscana—Area Vasta Centro, study number: 10208\_bio).

**Informed Consent Statement:** Written informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** Data and R script are available from the corresponding authors upon reasonable request.

**Acknowledgments:** In memoriam of Angelo Di Leo who passed away on 13 June 2021, while this work was being completed. The authors acknowledge the Fondazione Pitigliani per la lotta contro i tumori ONLUS for its support. The authors acknowledge Instruct-ERIC, a Landmark ESFRI project, and specifically the CERM/CIRMMP Italy Centre. Alessia Vignoli was supported by an AIRC fellowship for Italy.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Untargeted 1H-NMR Urine Metabolomic Analysis of Preterm Infants with Neonatal Sepsis**

**Panagiota D. Georgiopoulou 1, Styliani A. Chasapi 1, Irene Christopoulou 2, Anastasia Varvarigou 2,\* and Georgios A. Spyroulias 1,\***


**Abstract:** One of the most critical medical conditions occurring after preterm birth is neonatal sepsis, a systemic infection with high rates of morbidity and mortality, chiefly amongst neonates hospitalized in Neonatal Intensive Care Units (NICU). Neonatal sepsis is categorized as early-onset sepsis (EOS) and late-onset sepsis (LOS) regarding the time of the disease onset. The accurate early diagnosis or prognosis have hurdles to overcome, since there are not specific clinical signs or laboratory tests. Herein, a need for biomarkers presents, with the goals of aiding accurate medical treatment, reducing the clinical severity of symptoms and the hospitalization time. Through nuclear magnetic resonance (NMR) based metabolomics, we aim to investigate the urine metabolomic profile of septic neonates and reveal those metabolites which could be indicative for an initial discrimination between the diseased and the healthy ones. Multivariate and univariate statistical analysis between NMR spectroscopic data of urine samples from neonates that developed EOS, LOS, and a healthy control group revealed a discriminate metabolic profile of septic newborns. Gluconate, myo-inositol, betaine, taurine, lactose, glucose, creatinine and hippurate were the metabolites highlighted as significant in most comparisons.

**Keywords:** metabolomics; NMR spectroscopy; neonatal sepsis; EOS; LOS; preterm birth

#### **1. Introduction**

Neonatal sepsis is a systemic inflammatory response to infection, with a wide range and severity of symptoms [1]. The incidence varies all over the world and is quite different among high-, low-, and middle-income countries [2]. According to World Health Organization (WHO), 1.3 to 3.9 million cases are reported annually and 400,000 to 700,000 deaths worldwide [3]. Despite the progress in medical knowledge, neonatal care and antibiotic treatment, sepsis is considered one of the main reasons of morbidity and mortality, especially in very low birth weight (VLBW, <1500 g birth weight) and preterm infants (born before 37 weeks of pregnancy) [4–6]. Extremely preterm infants, whose gestational age (GA) is less than 28 weeks, are at greater risk of sepsis diagnosis [7,8]. However, several studies suggest that also late preterms are highly susceptible to developing sepsis [9–11].

Neonatal sepsis is classified as early onset sepsis (EOS), occurring within 72 h after birth, and late onset sepsis (LOS), 72 h after birth according to the onset time of the findings [12]. The incidence of EOS is 1 to 5 per 1000 live births, decreasing with intrapartum antibiotic therapy [13,14]. Associated with the increasing survival rate of preterm and VLBW infants, LOS rate presents an increase, with neonates weighing less than 750 g having the highest rate of LOS diagnosis [15,16].

Prematurity and low birth weight are among the main risk factors causing sepsis, with premature neonates having three to ten times higher possibility of sepsis diagnosis than normal weight full-term ones [1]. Risk factors and mortality rates differ among early-

**Citation:** Georgiopoulou, P.D.; Chasapi, S.A.; Christopoulou, I.; Varvarigou, A.; Spyroulias, G.A. Untargeted 1H-NMR Urine Metabolomic Analysis of Preterm Infants with Neonatal Sepsis. *Appl. Sci.* **2022**, *12*, 1932. https://doi.org/ 10.3390/app12041932

Academic Editor: Chiara Cavaliere

Received: 24 December 2021 Accepted: 8 February 2022 Published: 12 February 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

and late-onset sepsis [17]. Among the EOS risk factors are fetal distress, low APGAR score, resuscitation of the neonate and multiple pregnancies. Additionally, for low-income countries, as EOS seems to be enhanced from factors such as inadequate antenatal care, unhealthy birth and late recognition of conditions that might induce infections to the mother or the neonate [18]. Factors that can increase risk of LOS are considered mainly invasive procedures, such as surgical interventions, intubation, mechanical ventilation, catheter/probe insertion, long-term parenteral nutrition, frequent blood sampling, low stomach acid and/or insufficient breastfeeding [1].

The pathogenesis of EOS is taking place during the intrapartum period, during which the responsible pathogens are transmitted from mother to neonate [19]. Pathogens causing EOS are usually colonized in maternal genitourinary tract and with the amniotic membrane rupture are transmitted to the fetus or during the labor to neonate [20]. Most frequent EOS's pathogens are the Gram (+) group B Streptococcus (GBS) followed by the Gram (−) *E. Coli* bacteria [21]. Pathogens causing LOS can be transmitted during labor or from the environment. LOS is usually caused by nosocomial or environmental pathogens, specifically Coagulase-negative staphylococci (CONS), Gram (−) bacilli and Fungi, especially *C. albicans*. Apart from differences in the pathogenesis and the time of onset, EOS and LOS have different clinical manifestations [22].

A combination of clinical signs and laboratory findings constitute the diagnosis procedure, with blood culture considered as the "gold" standard [23]. For the physicians, the diagnosis of sepsis is a challenge. The limited diagnostic accuracy of common laboratory tests, such as white blood cell indices and acute phase reactants contribute poorly to the early diagnosis of neonatal sepsis. The diagnostics' accuracy limitations, in combination with the neonate's prematurity and survival status as well as the ambiguity of early clinical signs, urge neonatologists to shut out sepsis [24]. Hence, the discovery of new biomarkers, which can easily be detected at an early stage, has occupied the medical and scientific community, with many studies already having been carried out [25].

In this scope, metabolomic analysis could take upon a fundamental role. Detecting and quantifying a wide variety of small molecules, intermediate or final components of metabolic pathways, metabolomics aim to identify the alterations caused by the condition of biological and medical interest. As the outcome of biochemical procedures regulated by proteins derived from genes expression, metabolomics provides the closest relation with the phenotype of an organism, at a specified time frame in correlation with endogenous and exogenous influences. The utilization of metabolomics as a tool for the validation of new biomarkers for early diagnosis or prognosis of pathophysiological conditions is constantly growing.

Before any metabolomic analysis, based on the main question to be answered, the process of the analysis has to be determined. A targeted metabolomics approach is selected when the aim is to measure the levels of a particular set of metabolites which are suspected to have a relation with a condition. An untargeted approach aims to detect and assign as many peaks as possible related to metabolites biofluid, tissue or cell extract under study, identifying the metabolic "fingerprint" [26]. Nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS) are the two dominant analytical techniques for metabolomic analysis of biofluids. NMR-based analysis offers numerous advantages and could combine chemometrics and basic clinical research [27].

In the field of neonatology, the interest for a prognostic and diagnostic tool for neonatal sepsis is continuously growing. A urine sample, reflecting a holistic view of the metabolism, is considered as the most appropriate biological fluid for analysis of newborns' metabolism via NMR spectroscopy. The impact of neonatal sepsis on newborn metabolism has been in the center of interest the last few years and application of metabolomic studies towards the investigation of a specific profile of septic newborns is on demand, but, to establish metabolites as biomarkers, further analysis must be conducted [28–30]. The aim of our study is to reveal those metabolites with differentiated levels among the septic neonates hospitalized in the Neonatal Intensive Care Unit (NICU) and control/healthy neonates. Such metabolic alterations could lead to early diagnosis of EOS and LOS. To our knowledge, both medical treatment and parenteral nutrition of NICU hospitalization may induce essential metabolic drifts. For this reason, we chose to investigate the comparison of the sepsis metabolic profile with the metabolic profile of NICU hospitalized neonates without severe comorbidities except prematurity, complementary to healthy neonates, whose care after birth was taken over by their mother.

#### **2. Materials and Methods**

#### *2.1. Study Population*

The current study was conducted in the Neonatal Ward and the NICU of the General University Hospital, and the Department of Pharmacy at the University of Patras. Seventyone (n = 71) neonates were recruited in the study, only after informed consent and parental permission having been provided. The neonates were separated into the following four groups: Group A: Septic neonates diagnosed with EOS (n = 23); Group B: Septic neonates diagnosed with LOS (n = 11); Group C: Preterm neonates without any clinical sign or symptom of sepsis or other serious morbidity (n = 14) hospitalized in the NICU; Group D: Healthy preterm neonates (n = 23) not separated from their mother after birth. All NICU neonates were premature, with GA ranging from 25 to 36 GA weeks, while all healthy non NICU preterm neonates ranged from 35 to 36 GA weeks. For each neonate, the following were recorded: (a) perinatal data such as gestational age (GA), birth weight (BW), sex, delivery mode, premature rupture of membranes and Apgar score; (b) clinical data such as mechanical ventilation, treatment with antibiotics, positive blood cultures, C reactive protein (CRP), and breast feeding. All clinical characteristics and laboratory findings of the neonates participated in this study are listed in Table 1. The biological fluid under study was urine and the samples were collected in the first 24 h after birth for EOS neonates and in the third day of the extrauterine life for LOS neonates. The collection was carried out with the use of adhesive pediatric urine collection bags and 1.5 mL of each collected urine sample was transferred to sterile vials and remained at −80 ◦C until the analysis.

**Table 1.** Clinical characteristics and laboratory findings for septic and control neonates. Median and (minimum–maximum) values of the GA, BW, Apgar Score for the first and tenth minute of life, the number (percentage) of the male sex, the delivery mode (cesarian section), the small for gestational age (SGA) infants, the newborns with premature rupture of membranes (>18 h), treatment with mechanical ventilation and/or antibiotics, blood culture positive (Gram-positive, Gram-negative and fungi) and c reactive protein (CRP).


#### *2.2. Ethical Statement*

The study was approved by the General University Hospital of Patras Human Research Ethics Committee and all the procedures performed in this study involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. A written informed consent was obtained from parents.

#### *2.3. NMR Sample Preparation*

The urine samples were stored at −80 ◦C until analysis. For NMR analysis, frozen urine samples were thawed at room temperature and centrifuged at 12,000 rpm for 10 min at 4 ◦C. For each sample, 540 μL of the supernatant was mixed with 60 μL potassium phosphate buffer (1.5 M KH2PO4 in H2O containing 4% 2 Mm NaN3 and 10 mM 4,4 dimethyl-4-silapentane-1-sulfonic acid (DSS), pH 4.2). The mixture was vortexed and 590 μL of it was transferred into a 5 mm NMR tube (Bruker BioSpin srl).

#### *2.4. NMR Experiments*

The 1H-NMR spectra were recorded at 298 K on a Bruker Avance III HD 700 MHz NMR spectrometer equipped with a cryogenically cooled 5.0 mm 1H/13C/15N/D Z-gradient probe. Two kinds of 1H-NMR spectra were recorded for each urine sample of the study's participants. A mono-dimensional (1D) NMR spectrum was acquired using a standard NOESY (noesygppr1d.comp; Bruker BioSpin, Billerica, MA, USA) pulse sequence for water suppression, to reveal all the detectable 1H signals of metabolites. Due to the complexity of urine NMR spectra, as they consist of numerous peaks, presented in a non-distinctive manner, and sometimes overlapped, the identification and further quantification of metabolites using only 1D NMR spectra cannot be accomplished [31]. Hence, for each urine sample, a two-dimensional (2D) *J*-resolved (jresgpprqf.comp; Bruker BioSpin) spectrum acquired, separating the chemical shifts and J-couplings into two different dimensions, making it a very useful NMR experiment for metabolite assignment in metabolomics [32]. Accurately, the acquisition parameters for the 1H 1D NMR and 2D *J*-resolved spectra were 64 scans, 4 dummy scans, FID size 64.536, a spectral width of 10,504 Hz, 3.1 s acquisition time, 2 s a relaxation delay, and 100 ms mixing time [33].

#### *2.5. Data Processing*

After the NMR data acquisition, all spectra were manually processed for phase and baseline corrections and calibrated to the internal standard's DSS peak at 0.00 ppm. The processing was performed using the TopSpin 4.1.1 (Bruker BioSpin srl). For further statistical analysis, all 1H 1D NOESY NMR spectra were transformed into binned numerical data. Each spectrum was fragmented into buckets of 0.02 ppm width for the spectral region 0.70–9.50 ppm, using the AMIX software (Bruker BioSpin). The spectral areas of 4.50–6.00 ppm, 2.70–2.80 ppm and 1.75–1.77 ppm where 1H signals of water, urea and internal standard DSS are observed, were excluded. Chenomx software (NMR Suite Version 9.0, Edmonton, AB, Canada), the online Human Metabolome Database (HMDB) and data from literature, were utilized for the assignment of important and discriminant proton NMR signals of urine samples.

#### *2.6. Statistical Analysis of NMR Data*

The statistical analysis of the spectral binning data was performed using the online tool MetaboAnalyst 5.0 [34]. Multivariate analysis (MVA), consisting of the unsupervised principal component analysis (PCA) and supervised partial least squares discriminant analysis (PLS-DA), was applied on the 1H NMR data after Pareto normalization. This normalization was selected, since NMR information does not deviate much from the original compared to autoscaling, and fewer errors related to noisy spectral regions without biological impact are conducted [35]. PCA and PLS-DA plots were examined to extract information about group clustering and potential outliers. Loadings values from PCA model and variable importance in projection (VIP) scores >1 from PLS-DA were the main evidence source about metabolites with differentiation between groups, responsible for the group clustering. The quality of PLS-DA was ensured using the parameters as follows: goodness of fit R2 and the goodness of predictability Q2, after 10-fold cross-validation test. For statistical correlation of NMR data, a mono-dimensional (1D) statistical total correlation spectroscopy analysis was performed using the muma R package [36], where pareto scaled data were analyzed and the represented pseudo-NMR spectrum displayed the covariance (height) and the Pearsons correlation coefficient (color) of all spectral variables (buckets) with the variable of interest being the ''driver peak" or ''driver signal". Additionally, univariate analysis was performed on the successfully identified metabolites, targeting only the non-overlapping metabolites' peaks for integration. Non-parametric Kruskal–Wallis test, followed by a false discovery rate (FDR) correction, was conducted using RStudio and metabolites with *p*-value <0.05 were characterized as significant.

#### *2.7. Pathway Analysis and Visualization*

A pathway analysis module of MetaboAnalyst 5.0, integrating enrichment analysis and pathway topology analysis through a Google Maps-style visualization system, was adopted to identify the metabolic pathways associated with the statistically significant metabolites for EOS, LOS and healthy preterms. After the import of metabolites' compound names as data input, the Homo Sapiens pathway library from KEGG was selected. A hypergeometric test and relative betweenness centrality were preferred for over-presentation and topological analysis, respectively.

#### **3. Results**

A total of 45 metabolites were successfully detected and assigned in urine samples of the preterm neonates. Due to the complexity and peak overlapping of the 1H NMR urine spectra, statistically important and peaks with discriminant signals were recorded for the qualitative identification of the neonate's urine metabolism. Table S1 represents the list of all the assigned metabolites (Supplementary materials Table S1) and the main pathway they may belong to. Further statistical analysis of the binned data was based on the reported chemical shifts to associate statistical important spectral regions with specific metabolites.

#### *3.1. Metabolic Profile Alterations of Preterms with EOS*

#### 3.1.1. EOS Preterms Versus Non NICU Healthy Preterms

Twenty-three preterm infants (n = 23), hospitalized in NICU, were diagnosed with EOS and their urine NMR metabolic profiles were compared to the first day of life spectroscopic data of the control group, consisting of twenty (n = 20) healthy preterm neonates, treated by their mothers immediately after birth and non-hospitalized in NICU (non NICU preterms). The generated data, consisting of 359 spectral bins, were processed using Pareto normalization. The numerical data matrix was applied as input for the unsupervised PCA and the supervised PLS-DA. Both showed a clear discrimination between the two groups. According to PC1 and PC2, explaining the 73.4% of the total variance, loadings belonging to an unknown pattern of multiple peaks between 7.40–7.50 ppm and at 1.40 ppm, observed to the majority of the NMR spectra of the NICU urine samples (20 out of 23 spectra of EOS samples), were responsible for the separation (Supplementary materials Figure S1). Additionally, gluconate and lactose differed between the two groups, with EOS neonates having the higher concentration level of gluconate and lower of lactose. The PLS-DA model (Figure 1a) shows a discrimination between the two groups, with an R2 of 0.664 and Q2 of 0.409 for the third component after a 10-fold cross-validation test, supporting the unsupervised PCA's clustering. VIP scores, with colored boxes on the right representing the differentiated relative concentrations of the corresponding buckets in each group (blue—low—and red—high—relative concentration) were in accordance with loadings (Figure 1b) and those with higher scores reveal lower intensity of buckets related to ppm of taurine, myo-inositol and betaine, creatinine for EOS and higher intensity of the unknown

area 7.40–7.50 ppm. The box plots of the metabolites responsible for the group clustering confirm the concentration's alteration among the compared groups (Figure 1c). The Y axis represents the normalized concentration of the corresponding spectral region. The obtained negative values come as a result of the total spectral area normalization. The unassigned spectral region of two different types of peaks that resonate at 1.40 ppm statistically correlate with the multiple peaks at the aromatic region of 7.40–7.50 ppm. The statistical correlation was confirmed with the 1D-STOCSY (Supplementary materials Figure S2) that indicates all the correlations of all spectral variables (buckets). To further examine and highlight the structural correlations with the spectral variables that resonate at 7.40–7.50 ppm, we set as ''driver peak" the variable x.1.41, which includes the signals at 1.40 ppm and the observed maximum correlation displayed between the variable x.1.41 and x.7.51. Thus, they may belong to the same metabolite or group of metabolites derived from medication or parenteral nutrition. We attribute the unknown spectral pattern at 7.40–7.50 ppm and at 1.40 ppm as a NICU metabolic-induced characteristic, since it is present predominantly in the 1H NMR spectra from NICU urine samples (Figure 2). Most of the preterm neonates hospitalized in NICU are under antibiotic treatment because of sepsis or of other infectious condition suspicion; also, prophylactic administration of broad-spectrum antibiotics is unfortunately very common. Antibiotics can lead to adverse effects, including necrotizing enterocolitis (NEC) and LOS [37]. Patton et al. studied their impact on fecal metabolome of preterm infants, but urine metabolome has not been investigated yet [38]. Our study suggests the further analysis of these unidentified spectral regions as antibiotic outcome on 1H NMR spectra of urine samples. However, in depth investigation of the antibiotic and/or additional medication effect on the urine NMR metabolic profile is beyond the scope of this research.

#### 3.1.2. EOS Preterms versus NICU Control Preterms

1H NMR urine metabolic profile of the first day of life from twenty-three (n = 23) preterm infants with EOS was compared to that of thirteen (n = 13) preterms hospitalized in NICU without any sign or symptom of sepsis or other infection. This comparison was conducted complementary to the group of preterms without the need of hospitalization, to examine and reduce the effect of NICU hospitalization on newborns urinary metabolism. Multivariate analysis of EOS and NICU control preterms displayed less distinctive classification (Supplementary Materials Figure S3a) regarding the comparison with healthy non-NICU preterms (Supplementary Materials Figure S1a). The PLS-DA model (Figure 3a) separated the two groups, but the low value of the R<sup>2</sup> = 0.491 and the negative Q<sup>2</sup> indicate an overfitted model with low predictability. The buckets responsible for this group separation, as resulting from the PCA loadings plot (Supplementary Materials Figure S3b) and from VIP scores, are related to gluconate, threonine/lactate and 7.40 ppm chemicals shifts of unknown peaks (Figure 3b). Gluconate seems to be present in higher concentration on EOS group, but some buckets related to gluconate (4.09 ppm, 4.15 ppm) have higher VIP score in the control group. This may be justified by the shift of gluconate peaks in different spectra or the presence of sugars, such as glucose and lactose. The pattern of 7.40, 1.40 ppm had higher intensity for control neonates, and may be related to different medical treatment and nutrition. In accordance with prior analysis, EOS neonates differ from controls at the 3.25 ppm spectral region, where mainly betaine is located, overlapping myo-inositol and taurine.

**Figure 1.** Multivariate analysis of NMR data belonging to neonates diagnosed with EOS (pink circles) and healthy neonates without need for NICU hospitalization (blue circles). (**a**) PLS-DA scores plot of the EOS and healthy non-NICU preterms. (**b**) VIP scores and metabolites related to buckets with different concentration among the two groups. (**c**) Box plots of normalized concentration for the discriminant metabolites.

**Figure 2.** Spectral regions of unassigned chemical shifts mostly present on 1H NMR spectra of NICU samples (blue spectrum) and healthy non-NICU preterms (black spectrum).

**Figure 3.** Multivariate analysis of NMR data belonging to neonates diagnosed with EOS (orange circles) and neonates hospitalized in NICU without EOS (purple circles). (**a**) PLS-DA scores plot of the EOS and control group. (**b**) VIP scores and metabolites related to buckets with different concentration among the two groups.

#### 3.1.3. EOS Metabolic Profile Progression between First and Third Day of Life

With the perspective for the validation of a metabolite for further analysis or the identification of a group of metabolites characteristic of the septic urine metabolome, the progression of the metabolome throughout the time of the condition needs to be examined. In our study a comparative analysis of urine metabolomes between the first and the third day of septic newborns' extrauterine life was conducted. For ten (n = 10) out of the twenty-three neonates diagnosed with EOS, urine samples and NMR data from the third day of neonates' life are not available due to handling inaccuracies during sample collection or other excluding criteria regarding the NMR spectra (e.g., baseline and phase distortions) and the presence of lipids, characterized of broad peaks overlapping peaks of small molecules studied in this analysis. Hence, a total of twenty-six (n = 26; paired 1st and 3rd day of life) urine 1H NMR spectra, corresponding to each one of the thirteen (n = 13) EOS preterms, were analyzed. An initial classification of the two groups was performed via PCA, where the third principal component, explains the 79.9% of the cumulated variance (Supplementary materials Figure S4). PLS-DA corroborated and strengthened the clustering, providing a valid and reliable model (Figure 4a). The cross-validation indicated the use of the first two components as optimal for building the classification model, with R<sup>2</sup> = 0.613 and Q<sup>2</sup> = 0.296. VIP scores were in accordance with the PCA's loadings and indicated reduced concentration of myo-inositol, gluconate, betaine, taurine, and creatinine on the third day (Figure 4b). The normalized concentration's differentiated levels of the reported metabolites are clearly represented on their associated box plots (Figure 4c). The alteration of myo-inositol levels between the first and the third day of life has been previously reported for healthy full-term (>38 GA) neonates [33]. Specifically, in association with sepsis, increased levels of myo-inositol have also been detected on the work of Sarafidis et al. for LOS at the day of the disease's onset [28]. The decrease in creatinine is in accordance with the Fanos et al. findings for septic neonates [30].

#### *3.2. Metabolic Profile Alterations of Preterms with LOS*

#### 3.2.1. LOS Preterms versus Non NICU Healthy Preterms

Spectroscopic data of urine samples collected on the third day of life from eleven (n = 11) preterm neonates that developed LOS were compared to the third day's 1H NMR urine profile of twelve (n = 12) non-NICU healthy preterms. The unsupervised PCA, explaining the 75.1% of the total variability within the first three components (PC1 = 43.3%, PC2 = 18.2% and PC3 = 13.6%) indicated a clear tendency for clustering the two groups. Loadings with the greatest impact caused this form of PCA plot, agreed with EOS results and reinforced the claim of the great impact of hospitalization in NICU on urine metabolome (Supplementary materials Figure S5). The supervised PLS-DA, after the cross-validation resampling method, present a reliable and predictable model with R<sup>2</sup> = 0.861 and Q<sup>2</sup> = 0.788 for the second component (Figure 5a). As expected, VIP scores highlighted gluconate and the pattern of 1.40, 7.40–7.50 ppm (Figure 5b), with significant alterations of normalized concentration among groups, clearly displayed on the box plots (Figure 5c). Multivariate analysis of LOS metabolic profile reveals similar metabolic alterations with EOS. This analysis did not add something different compared to the analysis about EOS and healthy non-NICU preterms, and it is primary evidence that the septic profile does not dramatically change in relation to time within the first three days of neonate's life.

#### 3.2.2. LOS Preterms Versus NICU Control Preterms

Following the same procedure with the comparisons for EOS neonates, in order to eliminate the impact of hospitalization and dietary or drugs urine excreted metabolites in urine LOS data compared to the NICU control group. In total, eleven (n = 11) 1H NMR urine LOS profiles were compared with nine (n = 9) 1H NMR urine NICU control profiles from the third day of life. Scores plot of the PCA model (Supplementary materials Figure S5a) primarily did not reveal strong discrimination and resembles the PCA plot clustering of EOS analysis. Additively, presents different urine metabolic profile with the same elevations of myo-inositol, betaine, gluconate, taurine and NICU characteristics (Supplementary materials Figure S6b). The PLS-DA model is characterized by low capacity of predictability with negative Q2 values (Figure 6a). Beyond these metabolites, VIP scores revealed higher concentration of buckets belonging to sugars, for the control group (Figure 6b). The bucket of 1.17 ppm belongs to a crowded 1H NMR region, without a specific metabolite present to the total of urine samples.

(**c**)

**Figure 4.** Multivariate analysis of NMR data belonging to urine samples of neonates diagnosed with EOS the first (green circles) and the third day (blue circles) of their life. (**a**) PLS-DA scores plot of the first- and third-day's samples. (**b**) VIP scores and metabolites related to buckets with different concentration among the two groups. (**c**) Box plots of normalized concentration for the discriminant metabolites.

#### *3.3. Metabolic Profile Alterations between EOS and LOS Neonates*

LOS and EOS neonates, except the onset of the disease, present different clinical symptoms and vary on the severity of the outcome. This differentiation can be reflected also on the metabolism. So, additively to separate comparisons between EOS, LOS and control groups, multivariate analysis was conducted between the urine metabolic profile of the first day of life for twenty-three (n = 23) EOS neonates and third day of life for eleven (n = 11) LOS neonates. For the PCA model (Supplementary materials Figure S7), until the PC3 the variance explained was 70.1% and the plot showed a tendency of clustering the urine metabolic profiles of each group, which became clear on the PLS-DA plot (Figure 7a). The metabolic alterations, according to VIP scores (Figure 7b) and box plots (Figure 7c), resemble the differences among the first and third day of EOS. So, the correlation between EOS and LOS cannot be validated as this metabolic outcome may reflect the adaptation of the neonate to the extrauterine life. To shed light on the alterations of the septic metabolome over the days, targeted metabolomic analysis of specific metabolites already known or suspected based on metabolomics results for their association with sepsis would offer a more certain approach.

**Figure 5.** Multivariate analysis of NMR data belonging to neonates diagnosed with LOS (pink circles) and control neonates without need for hospitalization (blue circles). (**a**) PLS-DA scores plot of the LOS and healthy non-NICU preterms without need for hospitalization. (**b**) VIP scores and metabolites related to buckets with different concentration among the two groups. (**c**) Box plots of normalized concentration for the discriminant metabolites.

**Figure 6.** Multivariate analysis of NMR data belonging to neonates diagnosed with LOS (orange circles) and control neonates of NICU (blue circles). (**a**) PLS-DA scores plot of the LOS and control. (**b**) VIP scores and metabolites related to buckets with different concentration among the two groups.

**Figure 7.** Multivariate analysis of NMR data belonging to urine samples of neonates diagnosed with EOS (red circles) and neonates with LOS (green circles). (**a**) PLS-DA scores plot of EOS and LOS samples. (**b**) VIP scores and metabolites related to buckets with different concentration among the two groups. (**c**) Box plots of normalized concentration for the discriminant metabolites.

#### *3.4. Univariate Statistical Analysis*

A univariate statistical analysis was performed on specific metabolites with discriminant peaks. Metabolites with *p*-value < 0.05 (Table 2) were characterized as statistically significant. Between septic and healthy non-NICU preterms, additively to multivariate results, lactose and hippurate were highlighted as significant for EOS, and dimethylglycine for LOS group. For both septic groups decreased levels of taurine, betaine and increased levels of gluconate also shown by multivariate analysis were reinforced by univariate analysis. EOS and LOS metabolites' comparison confirmed the initial results obtained through multivariate analysis regarding myo-inositol's differentiation. Septic groups and control NICU preterms did not highlight any statistically significant metabolite. The box plots of the statistically significant metabolites represent the differentiation of their relative intensity between septic and healthy control non-NICU neonates (Supplementary materials Figure S8).


**Table 2.** Statistically significant metabolites and their *p*-value for EOS and LOS groups.

#### *3.5. Pathway Analysis*

The identified significant metabolites from multivariate and univariate analysis were implemented into MetaboAnalyst pathway analysis module to determine a qualitative aspect of all the possibly affected metabolic pathways. A separate analysis for the EOS and LOS group was selected, as except myo-inositol, betaine, gluconate, taurine, lactose, creatinine and hippurate which were statistically significant for both groups, glucose and *N*, *N*-Dimethylglycine were characterized as significant only for EOS and LOS group, respectively. A metabolic pathway analysis depicted ten (n = 10) altered metabolic pathways for the EOS (Figure 8a, Supplementary materials Table S2) and nine (n = 9) for the LOS group (Figure 8b, Supplementary materials Table S3). The representation of all the identified pathways was based on the pathway impact (*x*-axis) and the calculated *p*-value (*y*-axis). Among the nine metabolic pathways for EOS, ascorbate/ alderate metabolism and taurine/hypotaurine metabolism were the most significant, with *p*-value < 0.05. Taurine/hypotaurine metabolism pathway had the largest impact factor (0.42), followed by inositol phosphate pathway (0.13). Myo-inositol and taurine were the metabolites involved in these pathways (Supplementary materials Table S2). Regarding the LOS, significant metabolites, glycine/serine, and threonine metabolism pathway, where betaine and *N*,*N*-Dimethylglycine are involved, had the lowest *p*-value (0.01). The pathway impact scores did not present large differences from EOS pathway analysis.

**Figure 8.** Graphically representation of the pathway analysis. Each cycle represents a metabolic pathway, while the color and the size are based on *p*-value and pathway impact, respectively. (**a**) Pathway analysis of EOS group significant metabolites. (**b**) Pathway analysis of LOS significant metabolites.

#### **4. Conclusions**

The findings of our study indicate a discrete metabolic profile of septic neonates. NMR based metabolomic approach revealed the relation among septic and control neonates, highlighting gluconate, myo-inositol, hippurate, taurine, *N*, *N*-Dimethylglycine, betaine, creatinine, glucose and lactose as significant metabolites. Our study reported, for the first time, altered urinary amounts of betaine in EOS and LOS neonates and *N*, *N*-Dimethylglycine in LOS neonates. Differentiated concentration levels of taurine and hippurate via LC-MS analysis have been previously reported by Sarafidis et al., and through UPLC-MS by Mardegan et al. [28,29]; however, our study was the first to detect them in urine samples of septic neonates via NMR. The utilization of the two control groups and their discrete analysis, based on the NICU hospitalization, showed that NICU treatment has a significant impact on neonates' urine metabolome. The observed spectral pattern indicative for the most of the NICU neonates, suggests that it is related to endogenous or exogenous metabolites of the personalized nutrition or medical treatment. Additionally, the impact of nutrition is confirmed from the greatly elevated levels of gluconate for the septic group and lactose for neonates fed from their mothers. Differentiation between EOS and LOS, and the adaptation of the fetus to neonate during the first days of extrauterine life that occur in parallel are reflected to the metabolism. Changes through the first days of life, associated with EOS and LOS, highlight the necessity for chronological coupled sampling with the onset and specific time of the disease progression. This research builds on the power of NMR metabolomic analysis to determine the status of an entire organism by a small amount of non-invasive collected biological sample. The establishment of NMR analysis of metabolome for clinical research in the field of neonatology, leading to large-scale multicenter studies, gives new and promising perspectives for its incorporation into the clinical daily routine and the validation of new combined diagnostic biomarkers.

**Supplementary Materials:** The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/app12041932/s1; Figure S1: Scores and loadings plot of PCA for NMR data belonging to neonates diagnosed with EOS (pink circles) and healthy neonates without need for NICU hospitalization (blue circles). (a) PCA scores plot of the EOS and healthy non-NICU preterms. (b) Loadings plot of PC1 and PC2; Figure S2: 1D-STOCY pseudo-NMR spectrum of correlation coefficients to the other signals in the median urine NMR spectrum and maximum intensity correlation of peaks are color encoded and projected into statistical difference spectra: ''driver peak" was set the one at 1.41 ppm; Figure S3: Scores and loadings plot of PCA for NMR data belonging to neonates diagnosed with EOS (orange circles) and neonates hospitalized in NICU without EOS (purple circles). (a) PCA scores plot of the EOS and control group. (b) Loadings plot of PC1 and PC2; Figure S4: Scores and loadings plot of PCA for NMR data belonging to urine samples of neonates diagnosed with EOS the first (green circles) and the third day (blue circles) of their life. (a) PCA scores plot of the first- and third-day's samples. (b) Loadings plot of PC1 and PC2; Figure S5: Scores and loadings plot of PCA for NMR data belonging to neonates diagnosed with LOS (pink circles) and control neonates with-out need for hospitalization (blue circles). (a) PCA scores plot of the LOS and healthy non-NICU preterms without need for hospitalization. (b) Loadings plot of PC1 and PC2; Figure S6: Scores and loadings plot of PCA for NMR data belonging to neonates diagnosed with LOS (orange circles) and control neonates of NICU (purple circles). (a) PCA scores plot of the LOS and control group. (b) Loadings plot of PC1 and PC2; Figure S7: Scores and loadings plot of PCA for NMR data belonging to urine samples of neonates diagnosed with EOS (red circles) and neonates with LOS (green circles). (a) PCA scores plot of EOS and LOS group. (b) Loadings plot of PC1 and PC2; Figure S8: Box plots of the statistically significant metabolites highlighted from univariate analysis with *p*-value < 0.05, between healthy control non-NICU neonates, LOS and EOS groups; Table S1: 1H NMR Chemical Shifts of Metabolites detected in urine samples of neonates and their main metabolic pathway; Table S2: Detailed results from the pathway analysis of the EOS group's significant metabolites; Table S3: Detailed results from the pathway analysis of the LOS group's significant metabolites.

**Author Contributions:** Study conception and design, I.C., A.V. and G.A.S.; patient enrolment and management: I.C. and A.V.; collection of clinical data and urine samples: I.C.; NMR analysis: S.A.C.; statistical analysis, biostatistics, and computational analysis, P.D.G.; results interpretation, P.D.G., S.A.C., I.C., A.V. and G.A.S.; writing—original draft preparation: P.D.G. and S.A.C.; writing—review and editing: P.D.G., S.A.C., I.C., A.V. and G.A.S.; supervision, A.V. and G.A.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** The work was supported by the INSPIRED (MIS 5002550) which is implemented under the Action 'Reinforcement of the Research and Innovation Infrastructure,' funded by the Operational Program 'Competitiveness, Entrepreneurship and Innovation' (NSRF 2014–2020) and co-financed by Greece and the European Union (European Regional Development Fund). Additionally, EU FP7 REGPOT CT-2011-285950—"SEE-DRUG" project is acknowledged for the purchase of UPAT's 700 MHz NMR equipment.

**Institutional Review Board Statement:** This study was approved by the General University Hospital of Patras human research ethics committee (protocol code 7976/24.4.17, date of research committee approval: 217/ 31.3.2017 and study number approval: 154/16.3.2017 by the research ethics and deontology committee) all procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

**Informed Consent Statement:** Informed consent and parental permission were obtained from all the parents of the participants involved in the study.

**Data Availability Statement:** Data and R script are available from the corresponding authors upon reasonable request.

**Conflicts of Interest:** The authors declare that they have no conflict of interest.

#### **References**


## *Article* **Phenotyping Green and Roasted Beans of Nicaraguan Coffea Arabica Varieties Processed with Different Post-Harvest Practices**

**Gaia Meoni 1,2,3,\*, Claudio Luchinat 1,2,3, Enrico Gotti 4, Alejandro Cadena <sup>5</sup> and Leonardo Tenori 1,2,3,\***

	- alejandro.cadena@caravela.coffee

**Abstract:** Metabolomic tecniques have already been used to characterize two of the most common coffee species, *C. arabica* and *C. canephora*, but no studies have focused on the characterization of green and roasted coffee varieties of a certain species. We aim to provide, using NMR-based metabolomics, detailed and comprehensive information regarding the compositional differences of seven coffee varieties (*C. arabica*) of green and roasted coffee bean batches from Nicaragua. We also evaluated how different varieties react to the same post-harvest procedures such as fermentation time, type of drying and roasting. The characterization of the metabolomic profile of seven different Arabica varieties (Bourbon-typica), allowed us also to assess the possible use of an NMR spectra of bean aqueous extracts to recognize the farm of origin, even considering different farms from the same geographical area (Nueva Segovia). Here, we also evaluated the effect of post-harvest procedures such as fermentation time and type of drying on green and roasted coffee, suggesting that postharvest procedures can be responsible for different flavours. This study provides proof of concept for the ability of NMR to phenotype coffee, helping to authenticate and optimise the best way of processing coffee.

**Keywords:** metabolomics; phenotyping; nuclear magnetic resonance spectroscopy; coffee beans; coffee processing; coffee varieties; post-harvest treatment

#### **1. Introduction**

Green coffee beans are one of the most traded commodities, and coffee is the most consumed beverage after water [1]. Its popularity is due to the attractive organoleptic and energetic characteristics of coffee [2]. The quality of coffee principally derives from the grade of green coffee beans that are influenced by several factors, including genetics, geographic localization, altitude of the plantation, climate, agricultural and postharvest processing factors [3,4]. Moreover, the different processing techniques of coffee beans can impact the final product. Usually, in wet-processed coffee, freshly harvested coffee cherries are de-pulped to remove the skin and most of the fruit around the bean. Then, de-pulped coffee beans are placed in tanks where they can naturally ferment for 12–24 h. This fermentation begins to break down the mucilage, which is a sugary, slimy substance that surrounds the beans. Then, the coffee is dried on courtyards under the direct sun or in shade. The exact implementation of these steps influences the organoleptic properties and the quality of the product [5], which can be described also by the presence and the concentration of certain metabolites (small molecules < 1500 Da) in coffee beans [6]. These

**Citation:** Meoni, G.; Luchinat, C.; Gotti, E.; Cadena, A.; Tenori, L. Phenotyping Green and Roasted Beans of Nicaraguan Coffea Arabica Varieties Processed with Different Post-Harvest Practices. *Appl. Sci.* **2021**, *11*, 11779. https://doi.org/ 10.3390/app112411779

Academic Editor: Chiara Cavaliere

Received: 28 October 2021 Accepted: 3 December 2021 Published: 11 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

differences in metabolites can be therefore used as indicators of coffee quality, and can potentially direct the agronomic and post-harvest procedures to a high-quality grade final product [7].

Metabolomics, the omic science that deals with metabolites the final products of all biochemical reactions occurring in a certain biological system, can be considered an optimal tool to characterize the totality of genetic and environmental interactions and their effect on coffee [8]. Therefore, understanding how different factors affect coffee metabolites could be crucial to improve coffee quality. Most of the commonly used analytical techniques have been extensively applied to characterize the levels of certain specific chemical components (e.g., sugars, caffeine, trigonelline, phenolics, tocopherols, chlorogenic acids and lipids) and the metabolic profiles of the two mostly traded coffee species, *Coffea arabica* and *C. canephora* (Robusta) [7,9–12]. The profiles of Arabica and Robusta species can be distinguished using fatty acids, amino acid enantiomers, caffeine and other xanthine alkaloids, chlorogenic acid concentration, as well as other compounds (furans, phenols, quinic acids, pyridines, biogenic amines, terpenes and steroids) [13]. Arabica coffee accounts for 60% of global production and is preferred by customers due to its distinct flavor and aroma [14]. There are currently more than 50 commercially grown Arabica coffee cultivars but with different traits that can be classified into four groups based on their genetic descriptions: Ethiopian Landrace, Bourbon-Typica group, Introgressed, and F1 Hybrid [15]. However, little is known about the chemical differences between coffee varieties [6,16,17]. Considering that coffee species have demonstrated to react differently to external stimuli, it could be interesting to evaluate how different varieties of the same species and cultivation type, react if exposed to the identical post-harvest conditions or roasting procedure. Mengistu et al. 2020, demonstrated that different coffee varieties of *C. arabica* grown in the north-western highlands of Ethiopia are characterized by different levels of trigonelline, chlorogenic acids and caffeine. However, at present there are no metabolomic based studies determining the fingerprint and the profile of green and roasted beans of different coffee varieties. Here, NMR-based metabolomics is applied to characterize seven different coffee varieties of the same species (*C. arabica*) and the same cultivation type (Bourbon-Typica) localized within the same geographic area of Nicaragua. Moreover, we evaluated how they differently react to the same post-harvest procedure and to the same roasting time and temperature. The experimental design also allowed us to evaluate the differences between the same varieties grown by different farms located within the same territory. The possibility to characterize the profile of coffee of different producers, localized within a restricted geographic area, as previously demonstrated by our group in cow's milk [18], could be of potential interest for precise authentication. This could pave the way for the authentic territorial characterization of specialty coffees. Identifying qualitative attributes and characteristic metabolomic profiles of each producer and promoting transparency concerning its origin could help individual farmers to add value to their products and become more involved in upgrading strategies.

#### **2. Materials and Methods**

#### *2.1. Coffee Beans*

A total of 36 green coffee beans batches were collected in 2019 from three distinct farms (farm 1: 13◦47 11.8 N 86◦32 46.5 W, 1155 m AMSL (above mean sea level); farm 2: 13◦45 08.8 N 86◦29 42.3 W, 1033 m AMSL; farm 3: 13◦44 36.1 N 86◦24 25.0 W, 696 m AMSL) localized in the Nicaraguan department of Nueva Segovia. Table 1 shows all the characteristics of each batch. A total of seven different coffee varieties (*C. arabica,* Bourbon-Typica) have been collected: catuai rojo (CR, number of coffee batches = 8), maracaturra (MC, *n* = 4), bourbon (BO, *n* = 8), caturra (CA, *n* = 4), pacamara (PA, *n* = 4), tekesic (TE, *n* = 4), bourbon rojo (BR, *n* = 4). Selected varieties are recognized by name according to information provided by the growers. With the aim to evaluate the effect of the different types of green coffee processing, we considered for each variety two different times of fermentation (12 h and 24 h duration), and two drying procedures after full washing,

namely "under shade" (Us) and "direct sun" (Ds), see Table 1. Green bean batches (80 g) were roasted by Caravela Ltd. (London, UK) using an IKAWA professional roaster (IKAWA Ltd., London, UK) at 220 ◦C for 5:30 min.


**Table 1.** List of the Nicaraguan coffee batches analyzed.

#### *2.2. NMR Samples*

Seven beans for each batch were grounded using a Caso 1830 coffee grinder, which was thoroughly cleaned between the grinding of each sample. A total of ~0.2 g of crushed beans were weighed into 2 mL Eppendorf tubes and 1 mL of ultrapure H2O (Synergy®, Merck KGaA, Darmstadt, Germany) was added to each sample. Samples were centrifuged 5 min at 14,000 RCF (room temperature) and then incubated at 95 ◦C in closed 2 mL Eppendorf tubes for 1 h. The aqueous extracts were centrifuged for 5 min at 14,000 RCF at 4 ◦C to let the solid debris settle. Then, 300 μL of the supernatant were transferred into a new 1.5 mL Eppendorf tube and mixed with 300 μL of phosphate buffer (1.5 M K2HPO4, 100% (*v/v*) 2H2O, 2 mM NaN3, 5.8 mM TSP; pH 7.4, all reagents have been purchased by Sigma-Aldrich, Darmstadt, Germany) and vortexed for 20 s. A total of 450 μL of this mixture was transferred in a 4.25 mm NMR tube. Samples were weighted and extracted in five technical replicates for each batch.

#### *2.3. NMR Spectroscopic Analysis and Data ProcessingNMR Data Analysis*

One-dimensional (1D) 1H-NMR spectra were measured at 400 MHz using an AVANCE III Bruker spectrometer equipped with a 5 mm BBI 400S1 H-BB-D-05Z probe. The probe temperature was regulated at 300 K and for each spectrum, 64 scans were collected using noesygpps1d (Bruker) pulse sequence, a spectral width of 12.47 ppm, a relaxation delay of 4 s and a total acquisition time of 8 min. The receiver gain was set to 203. FIDs were zero-filled and transformed using exponential line broadening (0.6 Hz), resulting in spectra of 16,384 data points. A total of 260 noesygpps1d spectra were acquired. Because of the low shimming quality, two NMR spectra of roasted coffee beans (361a and 369b) were removed before the statistical analyses (total of considered spectra: n◦ green = 180; n◦ roasted = 178).

#### *2.4. NMR Data Analysis*

Resulting NMR spectra were aligned to the TSP signal (0 ppm) and input variables for statistical analyses were generated via variable size binning (green coffee beans spectra divided into 384 buckets, and roasted coffee beans spectra divided into 419 buckets). Each spectrum was segmented into buckets of 0.02 ppm in the range between 0.4 and 10 ppm, except the resonance regions of caffeine (3.2, 3.4, 7.75 ppm) and chlorogenic acid (6.2, 7.0, 7.50 ppm) because of the significant chemical shift changes observable due to their interaction in aqueous solution [19]. Therefore, the buckets of these regions were merged to have the protons of the corresponding molecule into the same bucket window (merged in green buckets: 3.14–3.28, 3.34–3.44, 6.02–6.5, 6.58–7.2, 7.44–7.68, 7.7–7.9; merged roasted buckets: 3.22–3.28, 3.36–3.48, 6.22–6.44, 6.72–7.2, 7.44–7.66, 7.7–7.88; Supplementary Materials Figure S1). Moreover, the region of residual water (4.5 ppm–5.24 ppm) was excluded. Buckets were then normalized to the measured weight of crushed beans, and thereafter, Probabilistic Quotient Normalization (PQN) was applied. The resulting dataset was used to perform multivariate statistical analysis.

A total of 20 metabolites were identified in all the NMR spectra of green coffee, and 29 were identified in all the roasted one. Among them, 15 metabolites are present both in the NMR spectra of green and roasted aqueous extracts. Comprehensively, 34 different metabolites were assigned (Figure 1). Since most of them resonate in crowded regions of the spectrum, where the presence of other signals below certain peaks cannot be excluded, only 15 metabolites in green and 25 metabolites in roasted coffee, corresponding to well defined and resolved peaks in the spectra, were quantitated considering the area under the peaks. Signal identification was performed using a library of NMR spectra of pure organic compounds (AssureNMR 2.2 software, Bruker BioSpin, Karlsruhe, Germany), public databases (e.g., FooDB, n.d.; PhytoHub, n.d., Edmonton, Alberta) storing reference and literature data [7,10,12]. The resulting matrices were used to perform multivariate and univariate data analyses.

**Figure 1. 1H NMR spectra (400 MHz) of water-soluble green** (**a**) **and roasted** (**b**) **coffee beans.** (1) fatty acids; (2) isoleucine; (3) valine; (4) propionic acid; (5) lactic acid; (6) alanine; (7) unknown1; (8) quinic acid; (9) acetic acid; (10) hydroxy acetone; (11) acetone; (12) gamma-aminobutyric acid (GABA); (13) malic acid; (14) citric acid; (15) choline; (16) caffeine; (17) theophylline; (18) myo-inositol; (19) glycolic acid; (20) trigonelline; (21) 2-furylmethanol; (22) 5-caffeolylquinic acid (5-CQA); (23) sucrose; (24) citraconic acid; (25) chlorogenic acids; (26) fumaric acid; (27) xanthine; (28) formic acid; (29) N-methylpyridinium (NMP); (30) nicotinic acid; (31) 5-hydroxymethylfurfural (5-HMF); (32) anserine; (33) methyl xanthines; (34) putrescine.

#### *2.5. Statistical Analysis*

Data analyses were performed using R, an open-source software for the statistical analysis of data. Multivariate analysis on metabolomic data was performed on processed NMR bucketed spectra. Principal component analysis (PCA) was used as first exploratory analysis [20]. The RF ("Random Forest" of R package) algorithm [21], was used to assess whether green and roasted NMR metabolomic profiles can be used to classify samples according to the variety, origin, and kind of drying (direct sun or under shadow) and fermentation time (12 h or 24 h) of different coffee batches. Random Forest uses a collection of classification trees, each of them is grown by random selection of features from a bootstrap sample at each branch. Class prediction is based on the majority vote of the collection. While the tree is constructed, about one-third of the instances are left out of the bootstrap sample. This data is then used as test sample to obtain an unbiased estimate of the classification (OOB) error. Variable importance is evaluated by measuring the increase of the OOB error when variables are permuted [22].

Univariate analysis was performed on quantitated metabolites. The Kruskal–Wallis test followed by Dunn post hoc analysis [23] was chosen to infer significant differences among independent samples from multiple groups (n◦ groups > 2). The Wilcoxon test was chosen to gather differences between two groups and false discovery rate correction was applied using the Benjamini and Hochberg method (FDR) [24], an adjusted *p*-value < 0.05 was considered statistically significant.

#### **3. Results and Discussion**

#### *3.1. Unsupervised Analysis of 1H NMR Coffee Beans Spectra*

As preliminary evaluation, PCA was performed on the datasets of bucketed 1H-NMR spectra (5 independent samples for each batch), to investigate the quality and the overall behaviour of the acquired green and roasted coffee spectra (Supplementary Materials Figure S2). The sum of the variance of PC1 and PC2 accounts for a total of 89.9% and 76.5% in green and roasted coffee score plots, respectively (Supplementary Materials Figure S2a,b). PCA shows a tendence to form clusters according to the variety (Supplementary Materials Figure S2a,b). The farm effect seems to emerge particularly in BO variety (Supplementary Materials Figure S2a,b), while e subtle differentiation by the fermentation time (12 h vs. 24 h) emerges, especially for the MC and PA green coffee beans water extracts (Supplementary Materials Figure S2a). This is in line with the observation that there are varieties that, being more metabolically susceptible, could also change more significantly in taste depending on the way in which they are processed [24]. Even less marked, these differences are present also in the spectra of roasted beans.

#### *3.2. Coffee Varieties*

Each variety was analyzed, using RF as the supervised machine learning approach, to demonstrate the presence of the fingerprint of coffee varieties both in green and in roasted coffee using all NMR data (mean predictive accuracy 91.7%, Table S1). This type of analysis conducted within the same farm, certainly highlights the strong differences between varieties.

Then, the presence of the varietal fingerprint was investigated regardless of the farm of provenance, using bucketed spectra (Figure 2a,b). The predictive accuracies of green (Figure 2a) and roasted (Figure 2b) coffee beans models are similarly good (87.2% in green and 86.0% in roasted), confirming, as previously seen (Supplementary Materials Table S1), that even after roasting the varietal metabolomic fingerprint could be derived. Among all variety classes, TE and BR class error is the highest (Figure 2a,b). The RF variable importance is calculated for green and roasted coffee beans batches, and the overall importance is assessed by determining the maximum for each descriptor over all classes (Figure 2c,d). As shown in Figure 2c,d, there are some conserved regions (ppm), present both in green and roasted coffee spectra, that mostly contribute as important features ranked by RF: the region between 0.94 and 1.2 ppm could be mainly ascribed to the broad signals of methyl and methylene protons of fatty acids (FA) chains [9], the regions within 4.44 and 4.46 ppm, 8.08–8.12 ppm, 8.80–8.86 ppm and 9.12–9.14 ppm, attributable to trigonelline (TR) protons, and the bucket range from 2.52 ppm to 2.76 ppm corresponding to citric acid (CT) signals. Therefore, fatty acids, trigonelline and citric acid, can be considered descriptors of the varieties both in green and in roasted NMR spectra. It emerges that fatty acids and trigonelline maintain also the same trend between the considered varieties in green and in roasted coffee.

**Figure 2. Variety fingerprint assessment through RF.** Confusion matrices of RF algorithm of green (**a**) and roasted (**b**) coffee bean spectra. A summary of the variable importance measures for the buckets of coffee NMR spectra with variety as the response variable in the RF model is reported: (**c**) for green coffee model (**a**,**d**) for roasted coffee model (**b**). Buckets are ranked according to the mean decrease in classification accuracy when they are permuted. Calculated RF class error and mean decrease accuracy units can be also read as percentage (e.g., class error of 0.05, means 5%). Most important buckets regions corresponding to assignable resonance present both in green and roasted coffee RF models (**c**,**d**) are labeled accordingly: fatty acid, FA; trigonelline, TR; citric acid, CT. Corresponding RF score plots are reported in Supplementary Materials Figure S3.

In addition, with the aim to compare the efficacy of the fingerprinting and profiling approaches [8], RF was applied, even on the matrices of the corresponding peak areas of the identified metabolites in green and roasted spectra (Figure 3a–d). Compared to the RF models built on bucketed spectra (entire spectra, fingerprinting, Figure 2), models built on metabolites resulted to be less accurate (green model, Figure 3a, pred. acc: 79.4%; roasted model, Figure 3b, pred. acc: 69.7%). This suggests that the fingerprint approach is preferable for variety classification/recognition [8].

**Figure 3. Variety profiling.** Confusion matrices of RF algorithm of green (**a**) and roasted (**b**) metabolites of coffee beans. A summary of the variable importance measures for the identified metabolites with variety as the response variable in the RF model (**a**) of green coffee (**c**) and model (**b**) of roasted coffee (**d**). Metabolites are ranked according to the mean decrease in classification accuracy. At the bottom are reported the boxplots of the trigonelline levels in green (**e**) and roasted beans (**f**) among the seven different coffee varieties.

The most important metabolites in discriminating varieties are ranked in Figure 3c,d. This analysis reports trigonelline as the most contributing metabolite in the discrimination of varieties, both in the profile of green (Figure 3c) and in the profile of roasted coffee (Figure 3d).

Univariate statistical analysis (Figure 3e,f, Supplementary Materials Figures S4 and S5) supports that trigonelline trends are conserved between coffee varieties, even after roasting (BO > MC > PA > TE = BR > CA > CR).

An increment of quinic, acetic, fumaric and formic acids and a decrement of gammaaminobutyric acid (GABA), malic acid, theophylline, trigonelline, 5-CQA, sucrose and chlorogenic acids occur following the roasting process (Supplementary Materials Figure S6). Although it is already known that roasting leads to the alteration of these metabolites [12,25,26], there are no data regarding the behavior of such components among different coffee varieties after roasting. As previously seen, trigonelline is particularly interesting in this respect, as it demonstrated a characteristic trend among varieties and was preservable even after roasting. Characteristically higher amounts of trigonelline are usually detected in *C. arabica* with respect to *C. canephora* [27]. This is in line with reports by Mengistu et al., 2020 on Ethiopian coffee, suggesting a characteristic trend of trigonelline among different varieties. The significance of trigonelline has been well established in previous studies, not only as a precursor of flavor and aroma compounds (as one of the main contributors to coffee's bitter taste), but also as a beneficial nutritional compound [28].

The fact that the trigonelline trend is conserved among varieties after roasting could suggest trigonelline as a potential candidate biomarker for variety determination.

#### *3.3. Coffee Farms*

RF models were also created to assess whether the characteristic fingerprint and/or profile of the corresponding coffee farm can be derived from coffee batches of the same variety. However, this hypothesis has been tested only in catuai rojo (CR) and bourbon (BO), since among the seven varieties collected, only these two are produced by more than one farm (see Table 1). RF models have been built to distinguish CR batches of farm 1 and farm 3 and BO batches of farm 1 and farm 2. All the four RF models built to distinguish farms of CR show optimal predictive accuracies (pred. acc%. 94 ± 3.15) both for green and roasted coffee (Supplementary Materials Figure S7a–d). Consistently with the summaries of the most important variables (Supplementary Materials Figure S7a1–d1), univariate analysis on metabolites shows significant higher content of quinic acid, alanine, trigonelline, caffeine, and lower amounts of theophylline, 5-CQA, citric and chlorogenic acid in green coffee beans of the catuai rojo variety of farm 1 when compared to farm 3 (Figure 4a). Higher levels of choline, sucrose, xanthine, and lower levels of 5-hydroxymethyl furfural (5-HMF), fumaric acid, hydroxyacetone and formic acid can be observed in CR roasted coffee of farm 1 (Figure 4b). Even the four RF models built to classify BO coffee batches according to the farm of origin (farm 1 vs. farm 2) show optimal classification accuracies (pred. acc%. 98.1 ± 2.4, Supplementary Materials Figure S7e–h). The summary of variables importance (Supplementary Materials Figure S7f1) and univariate analysis on metabolite levels (Figure 4c) report higher amounts of alanine, 5-CQA, malic and chlorogenic acid, and lower levels of theophylline, quinic acid, GABA and sucrose in green beans of farm 1 when compared to farm 2. The trends of theophylline, 5-CQA and chlorogenic acid are conserved even after roasting (see Figure 4d). Moreover, in roasted BO coffee beans of farm 1 compared to farm 2, lower amounts of 5-HMF, lactic acid, hydroxyacetone, formic and acetic acid and myo-inositol are detected. Taken together, these results suggest that farm 1 is characterized by higher levels of theophylline when compared with the other farms. Theophylline is a xanthine alkaloid and it is usually detected in higher amount in Robusta than in Arabica beans [29,30]. It has already been demonstrated by Mehari et al., that the concentrations of xanthine alkaloids (such as theophylline, theobromine, trigonelline and caffeine) could change significantly in coffee according to geographical origin [31]. Higher levels of theophylline could derive from different caffeine metabolisms of the plant, but

also from caffeine degradation performed by natural occurring microorganisms during bean fermentation [32,33].

**Figure 4. Log2(FC) of metabolites' concentrations characteristic of distinct farms.** Positive Log2(FC) values represent higher level of green coffee (**a**) and roasted coffee (**b**) metabolites in farm 1 compared to farm 3 (negative Log2(FC) values) considering catuai rojo batches; positive values represent higher levels of green coffee (**c**) and roasted coffee (**d**) metabolites in farm 1 compared to farm 2 (negative Log2(FC) values) considering bourbon batches. Dark gray bars represent the metabolites which are statistically significant after the FDR *p*-value correction (FDR < 0.05), gray bars for those metabolites that show a *p*-value < 0.05 but lose significance after the False Discovery Rate correction. Asterisks represent the Cliff's delta effect-size, were "\*\*\*" means large effect, "\*\*" medium effect.

#### *3.4. Evaluation of the Fermentation and Drying Effects on Coffee Metabolomic Profile*

To evaluate the effect of the two times of fermentation (12 h and 24 h), the RF approach was applied on green and roasted coffees, using either the matrices of bucketed spectra or the matrices of metabolites (Supplementary Materials Figure S8a–d). Considering all four models created, the fermentation time mostly affected the profile of green coffee, while the effect was not remarkable in roasted coffee. The RF model built on green metabolites resulted to be the most effective in discriminating the fermentation times (pred. acc%. 72.2, Supplementary Materials Figure S8b). A total of 24 h fermented coffee beans were

characterized by higher levels of acids (in particular: malic acid, acetic acid, chlorogenic acids, 5-CQA, citric acid, fumaric acid, GABA, quinic acid, as reported in Supplementary Materials Figure S8b1).

Univariate analysis corroborates the fact that malic acid levels are statistically different in the two groups. Among all the fifteen quantified metabolites in green coffee beans, only malic acid remained statistically significant after the false discovery rate (FDR) correction (malic acid: *p*-value = 0.0002, FDR = 0.003, cliff's delta = small). Based on the good performance of the model reported as b in Supplementary Materials Figure S8, to check if distinct varieties reacted differently to 12 h or 24 h of fermentations, the effect was evaluated on green coffee metabolites considering each variety separately. As can be seen in Figure 5, each variety reacts differently to the time of fermentation. In particular, the profiles of fermentation times can be distinguished with a predictive accuracy ~100% in maracaturra, pacamara and bourbon rojo (Figure 5b,e,f), suggesting a remarkable change induced by the time of fermentation. As previously reported, the coffee batches longer fermented are characterized by higher levels of acids.

**Figure 5.** *Cont*.

**Figure 5. RF models on green coffee metabolites characterizing each variety according to 12 or 24 h of beans fermentation.** (**a**) catuai rojo; (**b**) maracaturra; (**c**) bourbon; (**d**) caturra; (**e**) pacamara; (**f**) tekesic; (**g**) bourbon rojo. Mean decrease accuracy values are reported on the orthogonal axis of each plot.

The drying effect was also evaluated considering coffee batches processed "under shade" and those at "direct sun" exposure. Four RF models were created using available data (Supplementary Materials Figure S9a–d): also, in this case the effect of the different drying procedure is more remarkable in green coffee (Supplementary Materials Figure S9a,b), and in particular the RF model built on metabolites provided a better discrimination of green coffee batches processed in the two different manners (overall predictive accuracy: 71%, Supplementary Materials Figure S9b). Among the quantified metabolites, amino acids (valine and alanine) seem to be the most affected by these procedures (Supplementary Materials Figure S9b1). Univariate analysis confirms valine as the only metabolite which remained significantly altered after the FDR correction (valine: *p*-value = 0.0006, FDR = 0.008, cliff's delta = small). The effect of the drying procedure is detectable in each variety Figure 6. In Figure 6 it emerges that valine levels are higher in all coffee beans batches dried at direct sun. Alanine, acetic and chlorogenic acid levels are generally altered in all the considered models, but there is not a unique trend common for all the varieties (Figure 6a–e), demonstrating that each variety differently reacts to the type of drying.

**Figure 6.** RF models on green coffee metabolites characterizing each variety according to the different drying procedures used: "under shade" and "direct sun". (**a**) catuai rojo; (**b**) maracaturra; (**c**) bourbon; (**d**) caturra; (**e**) pacamara; (**f**) tekesic; (**g**) bourbon rojo. Mean decrease accuracy values are reported on the orthogonal axis of each plot.

#### **4. Conclusions**

Coffee metabolomics research has primarily focused on green and roasted coffee beans from the two main varieties, *C. arabica* and *C. canephora.* To the best of our knowledge, there are no metabolomic based studies about the characterization of coffee varieties considering both green and roasted coffee. Here, we have presented detailed and comprehensive information regarding the different metabolomic composition of seven Arabica coffee varieties, using an NMR-based metabolomic approach. For each variety, two points of fermentation time (12 h vs. 24 h) and two types of drying procedures (under shade and direct sun) have been considered. The analyses were performed both considering the entire spectra to evaluate the fingerprint of each variety, and on the identified metabolites, both for green and for roasted coffee beans.

The results demonstrated that NMR spectra of both green and roasted coffee beans can be used to recognise coffee varieties with high accuracies (87.2% and 86% using, respectively green and roasted NMR spectra to build the model).

Moreover, it was also possible to characterize, using this approach, the metabolomic profile of distinct coffee farms within the same restricted geographical area of Nicaragua cultivating the same varieties. Our results demonstrate that, even when coffee batches are processed following the same post-harvest procedure, the characteristic fingerprint of each farm could be derived with high predictive accuracies. The opportunity to quickly obtain NMR spectra with a minimal sample preparation, and to use them to classify samples according to the variety, makes the NMR-based metabolomic approach a suitable approach to recognize original products. Moreover, NMR spectroscopy may be considered as a "magnetic tongue" that analyses and predicts food flavours without being targeted and disruptive.

Therefore, the effects of the time of fermentation and drying types were also evaluated, suggesting that both post-harvest procedures are capable of inducing changes in the metabolic profile of coffee beans that are responsible for different flavours in the cup. In particular, the amount of malic acid, which contributes to a tart acidulous and sour taste, is increased in 24 h of fermentation batches of CR, PA, TE, and BR; trigonelline, instead, related to a bitter taste, is increased in 12 h fermentation in CA, while the other varieties show weaker variation based on the treatments; formic acid which gives a sour/lemon taste, is increased in MC green beans at 24 h of fermentation, while it is decreased in BR cultivar at 24 h of fermentation. Caffeine content seems also to be slightly increased by longer fermentation time. The content of acetic acid, which contributes to a sour vinegar taste, seems to be higher, particularly in CR and PA if exposed to the sun drying, instead, for the other varieties, higher content can be obtained if beans are exposed under shade. The present study suggests that post-harvest treatment procedures can differently affect the amount of aroma precursors within distinct coffee varieties and that the kind of processing should be optimized specifically for each variety.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10 .3390/app112411779/s1, Figure S1: 1H NMR spectra of coffee beans. Figure S2: First two components of PCA score plots of 1H NMR bucketed spectra of green (a) and roasted (b) coffee beans. Figure S3: Random Forest multidimensional scaling (MDS) plots. Figure S4: Univariate analysis on metabolites: cultivars' comparison (green coffee beans). Figure S5: Univariate analysis on metabolites: cultivars' comparison (roasted coffee beans). Figure S6: Roasting effect on coffee beans' metabolites. Figure S7: Farms' fingerprint and profiling assessment through RF. Figure S8: RF models built on green and roasted NMR bucketed spectra and metabolites: beans fermented 12 h vs. beans fermented 24 h. Figure S9: RF models built on green and roasted NMR bucketed spectra and metabolites: beans dried under shade vs. beans dried at direct sun.Supplementary data tables: containing NMR data (bucketed spectra and area under the peaks of identified molecules) of green and roasted coffee batches used in this study. Table S1: Variety classification within the three farms through RF.

**Author Contributions:** Conceptualization G.M., C.L., E.G. and L.T.; Data curation G.M.; Formal analysis G.M.; Investigation G.M. and L.T.; Visualization G.M.; Writing Original Draft G.M.; Supervision L.T. and C.L.; Provision of samples A.C.; Funding acquisition C.L.; writing review L.T. and C.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data are contained within the supplementary data table.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Review* **NMR in Metabolomics: From Conventional Statistics to Machine Learning and Neural Network Approaches**

**Carmelo Corsaro 1,\*, Sebastiano Vasi 1, Fortunato Neri 1, Angela Maria Mezzasalma 1, Giulia Neri <sup>2</sup> and Enza Fazio <sup>1</sup>**


**Abstract:** NMR measurements combined with chemometrics allow achieving a great amount of information for the identification of potential biomarkers responsible for a precise metabolic pathway. These kinds of data are useful in different fields, ranging from food to biomedical fields, including health science. The investigation of the whole set of metabolites in a sample, representing its fingerprint in the considered condition, is known as metabolomics and may take advantage of different statistical tools. The new frontier is to adopt self-learning techniques to enhance clustering or classification actions that can improve the predictive power over large amounts of data. Although machine learning is already employed in metabolomics, deep learning and artificial neural networks approaches were only recently successfully applied. In this work, we give an overview of the statistical approaches underlying the wide range of opportunities that machine learning and neural networks allow to perform with accurate metabolites assignment and quantification.Various actual challenges are discussed, such as proper metabolomics, deep learning architectures and model accuracy.

**Keywords:** NMR; metabolomics; biomarkers; clustering; artificial intelligence; machine learning; deep learning; health science

#### **1. Introduction**

Metabolomics corresponds to the part of omics sciences that investigates the whole set of small molecule metabolites in an organism, representing a large number of compounds, such as a portion of organic acids, amino acids, carbohydrates, lipids, etc. [1–3]. The investigation and the recording of metabolites by target analysis, metabolic profiling and metabolic fingerprinting (i.e., extracellular metabolites) are fundamental steps for the discovery of biomarkers, helping in diagnoses and designing appropriate approaches for drug treatment of diseases [4,5]. There are many databases available with metabolomics data, including spectra acquired by nuclear magnetic resonance (NMR) and mass spectrometry (MS), but also metabolic pathways. Among them, we mention the Human Metabolome Database (HMDB) [6] and Biological Magnetic Resonance Bank (BMRB) [7] that contain information on a large number of metabolites gathered from different sources. By means of the corresponding web platform, it is possible, for instance, to search for mono- and bi-dimensional spectra of metabolites, starting from their peak position [3]. However, metabolomics databases still lack homogeneity mainly due to the different acquisition conditions, including employed instruments. Thus, the definition of uniform and minimum reporting standards and data formats would allow an easier comparison and a more accurate investigation of metabolomics data [8].

In recent years, NMR has become one of the most employed analytical non-destructive techniques for clinical metabolomics studies. In fact, it allows to detect and quantify

**Citation:** Corsaro, C.; Vasi, S.; Neri, F.; Mezzasalma, A.M.; Neri, G.; Fazio, E. NMR in Metabolomics: From Conventional Statistics to Machine Learning and Neural Network Approaches. *Appl. Sci.* **2022**, *12*, 2824. https://doi.org/10.3390/ app12062824

Academic Editor: Alessia Vignoli

Received: 31 January 2022 Accepted: 1 March 2022 Published: 9 March 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

metabolic components of a biological matrix whose concentration is comparable or bigger than 1 μM (see Appendix A). Such sensitivity, relatively low if compared with other MS techniques, allows to assign up to 20 metabolites in vivo, and up to 100 metabolites in vitro [9–11]. Numerous strategies are being designed to overcome actual limitations, including a lower selectivity compared to the MS technique coupled with gas or liquid chromatography (GC-MS and LC-MS, respectively) and a low resolution for complex biological matrices. These include the development of new pulse sequences mainly involving field gradients for observing multidimensional hetero- or homo-nuclear correlations [12]. Within metabolomics investigations, NMR analyses are usually coupled with statistical approaches: sample randomization allows to reduce the correlation between confounding variables, sample investigation order and experimental procedures. In the last ten years, nested stratified proportional randomization and matched case-control design were adopted in the case of imbalanced results [13–15].

In any case, data pre-processing is a relevant step before performing data analysis by means of a conventional approach or a statistical one. The goal of pre-processing is to homogenize the acquired data, avoiding the presence of instrumental bias mainly involving peaks' features for a better quantification of metabolites. For example, the pre-processing of NMR spectra concerns phasing, baseline correction, peak alignment, apodization procedures, normalization and binning [16,17] (see Figure 1). In particular, the binning procedure corresponding to the spectral segmentation is performed mainly in those cases of challenging spectral alignment or simply for reducing the data points [18]. Even though binning reduces data resolution, binning procedures are commonly used [19–21].

**Figure 1.** Schematic workflow illustrating the steps of NMR based metabolomic studies coupled with chemometrics and pathway analysis. (1) Sample preparation and NMR tube filling (top left); (2) experimental parameters setting and data acquisition (top right); (3) data processing (middle left); (4) execution of multivariate statistical analysis (bottom right); (5) determination of metabolic pathways (bottom left). Some figures are reprinted from Refs. [22,23] under the terms of the CC-BY license.

For what concerns normalization, recorded spectra are usually normalized by the total integrated area and thus the metabolites concentration can be compared among different samples. In the case of large signals variation, probabilistic quotient normalization can be adopted instead [24]. Finally, deconvolution is also employed for the necessary assignment and quantification of those metabolites whose signals overlap [25,26]. All these pre-processing methods are also chosen, taking into account that the approaches adopted for the data processing are essentially dual: (1) chemometrics, consisting in the employment of statistical analysis for the recognition of similar patterns and for the significant determination of intensity values, and (2) quantitative metabolomics, based on an initial assignment and quantification of metabolites with the subsequent statistics. We outline that, from one side, chemometrics allows an automatic and non-biased classification of metabolites, whereas from the other side, it needs a big number of uniform spectra. These requirements do not apply for quantitative metabolomics [27,28].

In order to gain useful insights and a corresponding interpretation of NMR outcomes, it is indeed mandatory to use statistical and bioinformatic tools, considering the complex output generated [22]. In this work, we discuss the main statistical approaches currently used for NMR-based metabolomics analysis, pointing out the advantages and disadvantages. Illustrative examples are reported, and the actual challenges influencing the analysis are also discussed. On the basis of these evidences, it emerged that innovative experimental procedures would need to be implemented in order to improve the potentiality of existing approaches (i.e., adequate sample sizes and conditions), thereby combining their complementing features with the aim to achieve most of the metabolomic information from an NMR measurement. Nevertheless, on considering the high complexity of biological systems, each regulation level, including the genome, should be considered, yielding corresponding insights on cellular processes. Thus, data coming from different biological levels should be integrated within the same analysis for the observation of interconnectivity changes between the different cellular components. In this context, neural network-based approaches could be adequate in responding to this major challenge and indeed to the exploitation of proper approaches for the weighted consideration of data corresponding to different layers of biological organization.

#### **2. Conventional Approaches**

#### *2.1. Unsupervised Methods*

In the analysis of large metabolomic NMR datasets, unsupervised techniques are applied with the aim to identify any significant pattern within unlabeled databases without any human action. Below, we introduce and describe several unsupervised methods, highlighting their characteristics and implementation procedures. In particular, we describe the following unsupervised techniques: (a) principal component analysis (PCA); (b) clustering; (c) self-organizing maps (SOMs).

#### 2.1.1. Principal Component Analysis (PCA)

Principal component analysis (PCA) is employed for lowering the dimensionality of high-dimensional datasets, preserving as much information as possible by means of a "linear" multivariate analysis [29,30]. This approach employs a linear transformation to define a new smaller set of "summary indices"—or "principal components" (PCs)—that are more easily visualized and analyzed [31]. In this frame, principal components correspond to new variables obtained by the linear combination of the initial variables by solving an eigenvalue/eigenvector problem. The first principal component (PC1) represents the "path" along which the variance of the data is maximized. As happens for the first principal component, the second one (PC2) also defines the maximum variance in the database. Nevertheless, it is completely uncorrelated to the PC1 following a direction that is orthogonal to the first component path. This step reiterates based on the dimensionality of the system, where a next principal component is the direction orthogonal to the prior components with the most variance. If there are significant distinctions between the ranges of initial

variables (those variables with smaller ranges will be dominated by those with larger ones), distorted results may occur. To avoid this kind of problem, it is required to perform a standardization operation before executing PCA that corresponds to a transformation of the data into comparable scales. This can be done by using different scaling transformations, such as autoscaling, the generalized logarithm transform or the Pareto scaling with the aim to enhance the importance of small NMR signals, whose variation is more affected by the noise [32]. One of the most used transformation is the mean centered autoscaling:

$$\frac{value - mean}{\text{st.deviation}},\tag{1}$$

Furthermore, the computation of the covariance matrix is required to discard redundant information mainly due to the presence of any relationship between the initial variables of the data. The covariance matrix is symmetric (a *n* × *n*) being composed by the covariances of all pairs of the considered n variables (*x*1, ..., *xn*):

$$
\begin{bmatrix}
\text{Cov}(\mathbf{x}\_1, \mathbf{x}\_1) & \cdots & \text{Cov}(\mathbf{x}\_1, \mathbf{x}\_n) \\
\vdots & \ddots & \vdots \\
\text{Cov}(\mathbf{x}\_n, \mathbf{x}\_1) & \cdots & \text{Cov}(\mathbf{x}\_n, \mathbf{x}\_n)
\end{bmatrix}
\tag{2}
$$

In this frame, PCs can be obtained by finding the eigenvectors and eigenvalues from this covariance matrix. Figure 2 shows a graph with only three variables axes of the ndimensional variables space. The red point in this figure represents the average point used to move the origin of the coordinate system by means of the mean-centering procedure in the standardization process. Once we define PC1 and PC2, as shown in Figure 2, they define a plane that allows inspecting the organization of the studied database. Further, the projection of the data with respect to the new variables (PCs) is called the score plot, and if the data are statistically different/similar, they can be regrouped and classified.

**Figure 2.** Example plot with 3 variable axes in a n-dimensional variable space. The principal components PC1 and PC2 are reported.

PCA is usually applied in NMR-metabolomic studies because it simplifies the investigation of hundreds of thousands of chemical components in metabolomic database composed of several collected NMR spectra. In this way, each NMR spectrum is confined to a single point in the score plot in which similar spectra are regrouped, and differences on the PC axes shed light on experimental variations between the measurements [28,33–35]. However, it is noteworthy that PCA, like the other latent structure techniques, must be applied to matrices where the number of cases is greater than the number of variables [36].

The PCA technique can also be combined with other statistical approaches, including the analysis of variance (ANOVA) as reported by Smilde et al. [37] in their ANOVAsimultaneous component analysis (ASCA). This method is able to associate observed data changes to the different experimental designs. It is applied to metabolomics data, for example, to study variations of the metabolites level in human saliva due to oral rinsing [38], or the metabolic responses of yeast at different starving conditions [39].

#### 2.1.2. Clustering

Clustering is a data analysis technique used to regroup unlabeled data on the basis of their similarities or differences. Examples of clustering algorithms are essentially the following: exclusive, overlapping, hierarchical, and probabilistic clustering [40,41]. Exclusive and overlapping clustering can be described together because they differ for the existence of one or multiple data points in one or more clustered sets. In fact, while exclusive clustering establishes that a data point can occur only in one cluster, overlapping clustering enables data points to be part of multiple clusters with different degrees of membership. Exclusive and overlapping clustering are hard or k-means clustering and soft or fuzzy k-means clustering, respectively [42–44]. In hard clustering, every element in a database might be a part of a single and precise cluster, whereas in soft clustering, there is a probability of having each data point into a different cluster [44]. Generally speaking, k-means clustering is a "distance-based" method in which each "clustered set" is linked with a centroid that is considered to minimize the sum of the distances between data points in the cluster.

Hierarchical clustering analysis (HCA) is used to recognize non-linear evolution in the data—contrary to what was done by the PCA which shows a linear trend—by means of a regrouping of features sample by sample without having any previous information [45]. This clustering method could be divided in two groups: (i) agglomerative clustering, and (ii) divisive clustering [46,47]. The first one allows to keep data points separate at first, unifying them iteratively later until it one cluster with a precise similarity between the data points is obtained. In the opposite way, divisive clustering creates a separation of data points in a data cluster on the basis of their differences. The clustering analysis leads to dendrograms that are diagrams in which the horizontal row represents the linked residues, whereas the vertical axis describes the correlation between a residue and previous groups. Figure 3 reports a dendrogram obtained by means of hierarchical cluster analysis performed on 1H NMR data on the plasma metabolome of 50 patients with early breast cancer [48]. This kind of analysis allowed to discriminate among three different groups: LR-1 (red), LR-2 (blue) and LR-3 (green). They are characterized by significantly different levels of some metabolites, such as lactate, pyruvate and glutamin [48]. Furthermore, covariance analysis of NMR chemical shift changes allows defining functional clusters of coupled residues [49].

Clustering has been largely applied for metabolomic studies covering fields from medicine to food science, as is reported in the Applications section (Section 4). Here, we anticipate that clustering is essentially adopted for samples' classification by grouping metabolites without any external bias. This allows entering into the details of the precise metabolic pathways that may provide a connection between metabolomics and molecular biology. In such a way, many biomedical applications, including diagnostics and drug synthesis, would reach important improvements.

**Figure 3.** An example of a dendrogram obtained by means of hierarchical cluster analysis performed on 1H NMR data on the plasma metabolome of 50 patients with early breast cancer. From the analysis, 3 different groups are classified: LR-1 (red), LR-2 (blue) and LR-3 (green). In this case, the Ward algorithm is adopted for measuring the distance. Figure reprinted from Ref. [48] under the terms of the CC-BY license.

#### 2.1.3. Self-Organizing Maps (SOMs)

Self-organizing maps (SOMs) were introduced by Kohonen [50] and are widely employed to cluster a database, reduce its dimension and detect its properties by projecting the original data in a new discrete organization of smaller dimensions. This is performed by weighting the data throughout proper vectors in order to achieve the best representation of the sample. Starting from a randomly selected vector, the algorithm constructs the map of weight vectors for defining the optimal weights, providing the best similarity to the chosen random vector. Vectors with weights close to the optimum are linked with each unit of the map allowing to categorize objects in map units. Then, the relative weight and the total amount of neighbors reduce over time. Therefore, SOMs have the great power of reducing the dimensionality of the system while preserving its topology. For that reason, they are commonly adopted for data clustering and as a visualization tool. Another great asset of SOMs concerns the shapes of the clusters that do not require being chosen before applying the algorithm, whereas other clustering techniques usually work well on specific cluster shapes [51]. However, some limitations are evidenced using SOMs. In fact, they are normally of low quality, and the algorithm must be run many times before a satisfactory outcome is reached. Further, it is not easy to furnish information about the whole data distribution by only observing the raw map. Figure 4 reports the cluster of subjects involved in the study of renal cell carcinoma (RCC) by (NMR)-based serum metabolomics that was generated by using SOM (including the weighted maps for the considered 16 metabolites) [52].

The achieved results clearly separate healthy subjects (left region) and RCC patients (right region) within the SOM. Moreover, the weighted maps of the individual metabolites allow to identify a biomarker cluster including the following seven metabolites: alanine, creatine, choline, isoleucine, lactate, leucine, and valine. These may be considered for an early diagnosis of renal cell carcinoma [52].

**Figure 4.** An example of SOM model for studying renal cell carcinoma (RCC). (**A**) SOM classification and discrimination between healthy subjects (left region) and RCC patients (right region) by considering 16 metabolites extracted by means of NMR spectroscopy on serum samples. (**B**–**Q**) Weight maps of the considered 16 metabolites. Darker colors correspond to higher SOM weights. Figure reprinted from Ref. [52] under the terms of the CC-BY license.

#### *2.2. Supervised Methods*

Problems or datasets having response variables (discrete or continuous) are generally treated with supervised methods. We distinguish between classification or regression problems, depending on whether the variables are discrete or continuous, respectively. The supervised technique is based on the association between the response variable (used to drive the model training) and the predictors (namely covariates) with the aim to perform precise predictions [53–55]. In fact, first, a training dataset is used as fitting model, while, in a second step, a testing dataset is used to estimate the predictive power. The relevant predictors are chosen by three types of feature selection methods [56] whose merits and demerits are listed in the scheme drawn in Figure 5 [57]:


**Figure 5.** Scheme about merits and demerits of supervised methods, including filter, wrapper and embedded feature selection approaches.

Then, to measure the robustness of the fitting model and the predictive power, statistical approaches are adopted. Among them, we mention the root mean square error for calculating regression, sensitivity and specificity and the area under the curve for achieving classification.

For simplicity, let us consider that in binary classification, the test prediction can provide the following four results: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). The model sensitivity, which coincides with the TP rate (TPR, i.e., the probability of classifying a real positive case as positive), is defined as TP/(TP + FN). On the contrary, the specificity is defined as TN/(FP + TN) and is linked to the ability of the test to correctly rule out the FP (FP rate, FPR = 1 − specificity). In order to evaluate the performance of binary classification algorithms, the most used approach is that of the receiver operating characteristic (ROC) curve, which consists of plotting TPR vs. FPR for the considered classifier at different threshold values (see Figure 6). The performance of the classifier is usually indicated by the value of the corresponding area under the ROC curve (AUC). Figure 6 shows, as an example, the ROC curve and the corresponding AUC value for a classifier with no predicting power (red dashed line with AUC = 0.5), a perfect classifier (green dotted line with AUC = 1) and a classifier with some predictive power (blue solid line with AUC∼0.8).

Furthermore, several resampling methods, including bootstrapping and cross validation, can be adopted to achieve more reliable outcomes. This is a general description of the supervised methods; in the next, we will briefly enter into the details for some of them including random forest (RF) and k-nearest neighbors (KNN), principal component regression (PCR), partial least squares (PLS), and support vector machine (SVM).

**Figure 6.** ROC curves and corresponding AUC values for three classifiers: no predicting power (red dashed line with AUC = 0.5), perfect classifier (green dotted line with AUC = 1) and some predictive power (blue solid line with AUC∼0.8).

#### 2.2.1. Random Forest (RF) and k-Nearest Neighbors (KNN)

Although RF and KNN algorithms can be used for both supervised and unsupervised statistical analyses, here, we deal with the supervised aspects.

Random forest, as the name itself suggests, is composed by a proper number of decision trees working as an ensemble but individually depict a class from which the most representative corresponds to the model's prediction. Therefore, the idea behind the random forest algorithm is to correct the error obtained in one selection tree by using the predictions of many independent trees and by using the average value predicted by all these trees [58]. RF can deal with categorical features by treating both high dimensional spaces and a large number of training examples. In detail, the first step in a RF scheme is to create a selection tree; then, by using the observations {*Yj*, *Xj*}1<*j*<*K*, where *Xj* is usually a vector and *Yj* is a real number, different sets can be obtained using different splitting criteria which operate on the considered vectors. Each criterion allows the initial subset to be divided into two subsets. An example of the selection tree is shown in Figure 7:

**Figure 7.** Example of decision tree with a different action corresponding to a different conditions set.

Given an observation *Xi*, and known selection tree, one determines in which final node the vector *Xi* is classified in order to predict *Y*.

Instead, the k-nearest neighbors (KNN) algorithm considers that similar outcomes lie near each other. Given again an observation *Xi* and with the aim to predict *Y*, the KNN algorithm selects the k-nearest observations of *Xi* in {*Yj*, *Xj*}1<*j*<*K*. Let *i*1, ..., *ik* be the *k* values which provide the *k* minimum values of the function: g(j) = d(*Xj* - *Xi*). These minimum values can be equal if there are multiple values of *Xj* at the same distance from *Xi* [59]. There are at least the three possibilities for the distance (Euclidean, Manhattan and Minkowski). So, the value predicted for *Yi* is the mean value of the *k* values Y*<sup>j</sup>* for the *k* nearest neighbors of *Xi*:

$$
\hat{Y}\_i = \frac{1}{k} \sum\_{1}^{k} Y\_{ik} \tag{3}
$$

2.2.2. Principal Component Regression (PCR) and Partial Least Squares (PLS)

It is well known that a linear model can be written as *Y* = *Xβ* + , in which *Y* represents the response variable (it can be a single variable or even a matrix), and *X* represents the design matrix having variables along its columns and observations along its rows; *β* corresponds to the coefficients vector (or matrix) and represents the random error vector (or matrix). For a small number of variables and a high number of observations, it is commonly adopted for *β* the ordinary least square solution ((*XTX*)−<sup>1</sup> *XTY*). In the opposite case, where it is not possible to evaluate the inverse of the singular matrix (*XTX*), other solutions have to be considered [60]. One of them is the principal component regression (PCR) that makes use of the first PCs achieved by running PCA to fit the linear regression model instead of using all original variables. However, often, there is not a good correlation between these PCs and the response variables *Y*. Alternatively, the partial least squares (PLS) regression method is more efficient [61]. In the latter case, one has to determine the most suitable number of components to maintain, and then PLS evaluates a linear regression model by employing the projection of predicted and observed variables to a new space according to the following relations:

$$
\Upsilon = \mathcal{U}Q' + F \tag{4}
$$

$$X = TP' + E \tag{5}$$

where *T* and *U*, analogously to PCA, correspond to *X* and *Y* scores and are matrices constituted by latent variables; at the same time, *P* and *Q* correspond to *X* and *Y* loadings, representing the weight matrices of the linear combinations; *E* and *F* represent all that is not possible to explain by using latent variables. Each of them, being expressed as a linear combination of *X* and *Y*, can be rewritten in terms of weight factors as *t* = *Xw* and *u* = *Yc*, where *t* and *u* are two latent variables and *w* and *c* are the corresponding weight vectors. Indeed, PLS evaluates that set of *X* variables that is able to explain the majority of the changes in *Y* variables. Therefore, PLS, by using orthogonal conditions, evaluates those latent variables *t* and *u*, whose covariance is maximal. Ultimately, there are some substantial differences between the PCA and PCR-PLS approaches. In fact, as already mentioned, PCA pertains to unsupervised methods, whereas PCR and PLS pertain to supervised approaches. Moreover, as already mentioned, PCR takes advantage of the first PCs obtained from the PCA, using them as predictors for fitting the regression of a latent variable. Hence, PCA is able to explain just the *X* variance, whereas PLS allows achieving a multi-dimensional route in the *X* space, indicating the maximum variance route in the *Y* space. In other words, in PCR, the principal components become the new (unrelated) variables of the regression, which thus becomes more easily resolvable. Otherwise, in PLS, the *Y* variables are decomposed into principal components too, while those of *X* are rotated along the direction of maximum correlation with respect to the principal components of *Y*. Therefore, the purpose of PLS is to determine latent variables similar to the principal components that maximize the variance of both matrices.

We also mention the partial least squares discriminant analysis, or PLS-DA, which is an alternative when the dependent variables are categorical. Discriminant analysis is a classification algorithm which adds the dimension reduction part to it. PLS-DA allows the employment of predictive and descriptive algorithms other than for discriminative variable choice (see Figure 8a). PLS-DA is executed on NMR spectra for different aims, including food authentication and diseases classification in medical diagnostics [62–65]. However, a more comprehensive variant of PLS is the orthogonal PLS (OPLS) method. It is finalized to separate systematic changes in X into two parts; one of them is in linear relationship with Y and another is irrelevant to Y (generally, perpendicular to it). So, some changes in X which are perpendicular to Y are eliminated, while uncorrelated changes in X are separated from correlated ones (see Figure 8b). In this way, the uncorrelated changes are analyzed separately, favoring the prediction ability and the interpretation of results [66]. This latter is one of the advantage of OPLS with respect to PLS together with the aspect that the inner repetition is not time consuming, which can accelerate the calculation process. In fact, OPLS is more appropriate for discriminating the precise differences between two systems, providing information on the variables with the largest discriminatory power.

**Figure 8.** (**a**) Bidimensional PLS-DA score plot of urine samples obtained from different hospitals. HB—Basurto Hospital, CRC—Cruces Hospital, HD—Donosti Hospital, TX—Txagorritxu Hospital. Figure reprinted from [67] under the terms of Creative Commons Attribution 4.0 International License. (**b**) OPLS scheme.

#### 2.2.3. Support Vector Machine (SVM)

Considering the data organized into a matrix, each subject corresponding to a row vector can be conceived as a single point in the p-space of the considered variables. Data can be essentially organized into two main groups, "separated by a gap" whose margins are defined by support vectors. Instead, the edge located in the gap center separating the data corresponds to the dividing hyperplane. SVM tries to define the support vectors, and the prediction will indicate to which hyperplane side the new observations belong. However, generally, data cannot be linearly separated, and hence it is difficult to determine the separating hyperplane. Nevertheless, SVM can accurately execute a non-linear classification throughout the so-called kernel trick. It consists of an implicit mapping of the considered inputs into high-dimensional feature spaces with the objective to their linear separation in that space [68]. In detail, the optimal hyperplane is the one that provides the highest separation between the two classes. With greater definition, by separation, we mean the maximum amplitude (or width, w) between the lines parallel to the hyperplane without any data points in between. This optimal hyperplane is called the maximum-margin hyperplane and the corresponding linear classifier is the maximum-margin classifier (Figure 9).

**Figure 9.** Linear SVM model highlighting the classification of two classes (red and blue). Figure reprinted from Ref. [68] under the terms of the HighWire Press license.

In addition, in the presence of mislabeled data, the SVM can provide inadequate classifications; therefore, only a few misclassified subjects are found instead by maximizing the separation between the two classes. Finally, validation methods and diagnostic measures are analogous to those adopted in PLS methods. Ultimately, SVM is one of the approaches with the highest accurate prediction, since it is based on statistical learning frameworks [69,70]. It can also be used within machine learning approaches for anomaly detection (such as weather) by choosing an anomaly threshold with the aim to establish whether an observation belongs to the "normal" class or not. Disadvantages of supervised methods include overfitting problems [71] corresponding to the inclusion of noise inside the statistical model. These issues can be provoked by excessive learning, so several validation techniques, such as cross validation [72] or bootstrapping [73], are usually employed to solve them.

#### *2.3. Pathway Analysis Methods*

A powerful method to describe peculiar features of the cell metabolism is pathway analysis (PA), which provides a graphical representation of the relationships among the actors (mainly enzymes and metabolites) of precise catalyzed reactions. Therefore, PA is highly employed for the interpretation of high-dimensional molecular data [74]. In fact, taking advantage of the already acquired knowledge of biological pathways, proteins, metabolites and also genes can be mapped onto newly developed pathways with the objective to draw their collective functions and interactions in that specific biological environment [75]. Although PA was initially developed for the interpretation of transcriptomic data, in the last decades, it has become a common method in metabolomics, being particularly suited to find associations between molecules involved in the same biological function for a given phenotype [76–78].

PA methods include several tools allowing deep statistical analyses in metabolomics known as enrichment analysis. They grant the functional interpretation of the achieved results mainly in terms of statistically significant pathways [79]. These tools can handle heterogeneous and hierarchical vocabularies and may be classified into two distinct collections. The first encompasses "non-topology-based" (non-TB) approaches, which do not consider the acquired knowledge concerning the character of each metabolite in the considered pathways [80]. Non-TB approaches include the over-representation analysis (ORA) as the first generation technique and the functional class scoring (FCS) as the second generation.

Finally, the second collection includes topology-based methods (see Figure 10) that are adopted to determine those pathways that are significantly impacted in a given phenotype.

**Figure 10.** Conceptual map about the topology-based pathway analysis method.

This latter approach can be classified depending on the considered pathways (e.g., signaling or metabolic), inputs (e.g., subset or all metabolites and metabolites *p*-values), chosen mathematical models, outputs (e.g., pathway scores and *p*-values) and the wanted implementation (e.g., web-based or standalone) [81,82]. Note that PA methods were originally developed for genes, but they can be successfully applied for every biomolecule/metabolite [83].

#### 2.3.1. Over-Representation Analysis (ORA)

Over-representation analysis (ORA) is among the most used pathway analysis approaches for the interpretation of metabolomics data needed as input, once the type of annotations to examine is chosen. One obtains a collection of annotations and their associated *p*-value as outputs since a statistical test is applied to determine whether a set of metabolites is enriched by a specific annotation (e.g., a pathway) in comparison to a background set. Different statistics can be applied to obtain information about the studied biological mechanisms and on the specific functionality of a given metabolite set. Among the most used statistics, we would like to mention the well-known binomial probability, Fisher's exact test and the hypergeometric distribution [84,85].

Three are the necessary inputs in ORA analysis: (i) a set of pathways (or metabolite collections); (ii) a catalog of investigating metabolites and, (iii) a background collection of compounds. The list of investigating metabolites usually comes from experimental data after applying a statistical test to determine those metabolites whose signals can be associated with a precise result by choosing a threshold value usually associated to the *p*-values [74]. The background collection includes all metabolites that can be revealed in the considered measurement. If the p-value corresponding to each pathway is obtained by means of the right-tailed Fisher's exact test based on the hypergeometric distribution, the probability to find k metabolites or more in a pathway can be written as [74]:

$$P(X \ge k) = 1 - \sum\_{i=0}^{k-1} \frac{\binom{M}{i} \binom{N-M}{n-i}}{\binom{N}{n}},\tag{6}$$

where *N* corresponds to the number of background compounds, *n* is the number of the measured metabolites, *M* is the number of background metabolites mapping the *i*th pathway, and *k* represents the overlap between *M* and *n*. A scheme of the ORA principle is displayed in Figure 11 as a 3D Venn diagram. Finally, multiple corrections are usually applied, as calculations are made for many pathways, thus obtaining a collection of significantly enriched pathways (SEP).

**Figure 11.** A 3D Venn diagram illustrating the relation between ORA parameters (Equation (6)) in which *N* corresponds to the number of background compounds, *n* is the number of the measured metabolites, *M* is the number of background metabolites mapping the *i*th pathway, and *k* represents the overlap between *M* and *n*.

Before applying ORA, one has to verify if the metabolomics dataset is sufficiently big to furnish proper statistical significance. For instance, usually MS-based techniques can observe more metabolites than NMR-based methods, such as the mono-dimensional NMR ones commonly used for profiling [86]. Indeed, the choice of the most suitable background collection is the real challenge and still remains an open subject because it strictly depends on the situation [74].

#### 2.3.2. Functional Class Scoring (FCS)

Functional class scoring (FCS) methods look for coordinated variations in the metabolites belonging to a specific pathway. In fact, FCS methods take into account those coordinated changes within the individual set of metabolites that, although weak, can have a significant effect of specific pathways [75,78]. Essentially, all FCS methods comprise three steps (see Figure 12):


3. The last FCS step corresponds to estimating the significance of the so-called pathwaylevel statistics. In detail, the null hypothesis can be tested into two different ways: (i) by permuting metabolite labels for every pathways, so comparing the set of metabolites in that pathway with a set of metabolites not included in that pathway (competitive null hypothesis) [75] and (ii) by permuting class labels for every sample, so comparing the collection of metabolites in a considered pathway with itself, whereas the metabolites excluded by that pathway are not considered (self-contained null hypothesis) [91].

**Figure 12.** Schematic representation of the three main steps adopted in FCS methods.

2.3.3. Metabolic Pathway Reconstruction and Simulation

The identification of metabolomic biomarkers and their mapping into a neural network is fundamental to further study the cellular mechanisms and its physiology. The goal is to identify the effects of the metabolites (as a function of their concentration) on the cellular changes, providing a relationship with the most likely biologically meaningful sub-networks. Thus, basing on genome annotation and protein homology, reference pathways could be mapped into a specific organism. However, this mapping method often produces incomplete pathways that need the employment of ab initio metabolomic network construction approaches (such as Bayesian networks), where differential equations describe the changes in a metabolomic network in terms of chemical amounts [98,99]. Qi et al. [100] further improved this approach allowing to optimize accuracy in defining metabolomics features or better the correlation between the substrates whose nature is well known as well as the species of each individual reactions, so defining the classification of the mapped metabolic products in a pathway and their modifications under selected perturbations. Recently, Hu et al. [23] performed a pathway analysis on serum spectra recorded by 1H NMR with the aim to identify eventual biomarkers characterizing the treatment of human lung cancer. After a first statistical analysis in terms of PLS-DA, they were able to identify four metabolic pathways associated with the metabolic perturbation induced by non-small-cell lung cancer (Figure 13) by means of the MetaboAnalyst package [101]. In detail, the highest pathway impact was shown by the metabolisms of (i) taurine and hypotaurine, (ii) d-glutamine and d-glutamate, (iii) glycine, serine and threonine, and (iv) alanine, aspartate and glutamate, thus shedding light on the responsible processes in this kind of cancer.

**Figure 13.** Pathway analysis performed on serum spectra recorded by 1H NMR allowing the identification of main metabolic pathways associated with non-small cell lung cancer. The larger the circle, the higher the impact. The color, from red to yellow, identifies the corresponding significance. Figure reprinted from [23] under the terms of the Creative Commons Attribution 4.0 International License.

#### **3. Artificial Intelligence toward Learning Techniques**

Artificial intelligence (AI) techniques are based on algorithms that try to simulate both human learning and decision making. Indeed, AI exploits the ability of computer algorithms to learn from a given dataset containing precise information that then must be recognized in new dataset in an automatic way. Specifically, the computer algorithms during learning on the test dataset create models that are able to provide information on the probability that a specific result may occur. Furthermore, these programs are usually able to identify the important features associated with the outcome of interest. Artificial intelligence methods can accurately handle big data for biomarkers prediction, allowing the determination of relevant characteristics pertaining to a dataset and a deep comprehension of the significance of such data. Specifically, the integration of metabolic snapshots with metabolic fluxes and the use of knowledge-informed AI methods allow obtaining a profound comprehension of metabolic pathways at the system level. Hence, the development of multi-omic techniques integrating both experimental and computational methods, adequate to extract metabolic information at the cellular and subcellular levels, will provide powerful tools to enter the details of metabolic (dis)regulation, therefore allowing the exploitation of personalized therapies [102].

#### *Machine Learning, Neural Networks and Deep Learning*

All the conventional approaches discussed in the previous sections can be implemented by learning algorithms that let the corresponding network learn by a given dataset and, after performing a test with a sample dataset, can be used with a known predictive power. In this section, we get into details of the different machine learning techniques as a subset of AI methods (Figure 14).

**Figure 14.** Venn diagram illustrating that deep learning is the core of machine learning, which in turn is a technique within AI methods.

In addition, neural networks and deep learning approaches are characterized in terms of the number of node layers, also named depth. Briefly, a node is the locus in which the algorithm performs the calculation and would correspond to the action that a neuron exerts in the human brain when it is subject to a stimulus. As shown in Figure 15b, a node takes different inputs, each having its own weight, that can be amplified or reduced by the activation function, thus giving a corresponding significance to the received inputs with respect to the specific task that the used algorithm is learning. So, a neural network consisting of two or more hidden layers can be classified as a deep learning technique and is usually described by the diagram shown in Figure 15a, together with a scheme of how one node might look (Figure 15b).

**Figure 15.** (**a**) Example scheme of a deep neural networks, reprinted from Ref. [103] under the terms of the CC-BY license; (**b**) operating principle of a single node.

Deep learning techniques, being able to handle large datasets, thus allowing a highlevel description, are already used to provide the optimal route to solve a lot of issues in the field of image recognition, speech recognition, and natural language processing. Furthermore, DL techniques can be divided into three main categories (see Figure 16) that are deepened in Ref. [104]:


tive autoencoder (CAE), variational autoencoder (VAE), self-organizing map (SOM), restricted Boltzmann machine (RBM) and deep belief network (DBN);

• Hybrid learning (both discriminative and generative) includes models composed by both supervised and unsupervised algorithms other than deep transfer learning (DTL) and deep reinforcement learning (DRL).

**Figure 16.** A taxonomy of DL techniques. For acronyms, see main text.

Supervised learning can furnish a discriminative function in classification applications by identifying the different features of those classes that can be extracted by the data. Among them, multi-layer perceptron (MLP) is a feedforward ANN that involves (i) an input layer collecting input signals, (ii) an output layer that provides an outcome in consideration of the processed input and (iii) some hidden layers separating the input and output layers that correspond to the network computational engine. On the contrary, unsupervised learning is employed to recognize eventual correlations by analyzing the signals pattern and to assess the statistical distributions of the achieved results both on original data and on their corresponding classes. This kind of generative approach can be also used as an initial step (pre-processing) before applying supervised learning methods. Most common unsupervised techniques, reported in Figure 16, are listed and briefly described in the next. Hybrid learning paradigms combining both discriminative and generative methods are possible. Hybrid deep learning architectures are usually constituted by multiple models where the basis can indeed be either a supervised or unsupervised deep learning method. Common hybrid learning algorithms are, for example, semi-supervised learning that allows to use a supervision for some data points, keeping the others unlabeled, and deep reinforcement learning (DRL; see Figure 16) that, interacting with an environment, involves the knowledge of performing with sequential decision-making tasks in order to maximize cumulative rewards [104,105]. Their advantages lie in the possibility to consider the best aspects of discriminative and generative models. For instance, a hybrid architecture can adopt small inputs to avoid the problem of determining the right network size and instead an increasing number of neurons in receptive-field spaces [106]. At the same time, by a proper enhancement of the initial weights through suitable algorithms, neural networks in hybrid architectures can provide higher accuracy and predictive power [107,108].

Most of the techniques in the categories indicated before are feed-forward (working from input to output) but, as detailed in the last part of the section, the opposite movement is also possible. This is called backpropagation and works from output to input, allowing the evaluation of the individual neuron's error, allowing to properly modify and fit the algorithm iteratively. Unlike ML, that usually adopts manual identification and description of relevant features, DL techniques aim to execute automatically the features extraction, avoiding almost all human participation. In addition, DL can handle larger datasets, especially of the unstructured type. In fact, DL methods can have unstructured raw data as input (such as text or images) and can directly define which characteristics must be considered to distinguish the original observations. By recognizing similar and/or different patterns, DL methods can adequately cluster inputs. Therefore, DL approaches would need a very high number of observations to be as accurate as possible. Generally, and according to the scheme reported in Figure 16, the most adopted deep learning techniques are the following [104]:

	- (a) Deduce feature maps from input after applying a proper function (convolution);
	- (b) Reveal an image after given changes (max-pooling);
	- (c) Flatten the data for the CNN analysis (flattening);
	- (d) Compiling the loss function by a hidden layer (full connection).
	- (a) Sparse autoencoders have more hidden than input layers for reducing overfitting.
	- (b) Denoising autoencoders are able to reconstruct corrupted data by randomly assigning 0 to some inputs.
	- (c) Contractive autoencoders include a penalty factor to the loss function to prevent overfitting and data repetition when the network has more hidden than input layers.
	- (d) Stacked autoencoders perform two stages of encoding by the inclusion of an additional hidden layer.

10. **Gradient descent** are neural networks that identify a slope corresponding to a relation among variables (for example, the error produced in the neural network and data parameter: small data changes provoke errors variations).

**Figure 17.** Example of a convolutional neural network. Figure reprinted from Ref. [109] under the terms of CC BY-NC-ND 4.0 license.

**Figure 18.** (**a**) Scheme of a RNN. (**b**) Example of a BP network architecture. Figure reprinted from Ref. [109] under the terms of CC BY-NC-ND 4.0 license.

From the above brief information, it emerges that, even if DL methods can be thought as black-box solutions, future generation deep learning can provide a great aid for the analysis of big data and for corresponding reliable results.

#### **4. Applications of Deep Learning Approaches for NMR-Based Metabolomics**

In this section, the applications of deep learning on NMR-based metabolic data for specific different fields are reported and discussed. Here, we briefly introduce the potentiality of the applications of deep learning in metabolomics which today are still relatively low compared to other omics. This is explained since metabolome-specific deep learning architectures should be defined, and dimensionality problems and model evaluation regime should be further evaluated. In any case, data pre-processing using convolutional neural network architecture appears to be the most efficient approach among the deep learning ones. The main advantage of CNNs compared to a traditional neural network is that they automatically detect important features without any human supervision. Specifically, CNNs learn relevant features from image/video at different levels, similar to a

human brain [110]. This is very relevant to analyze both biomedical and food data, whose classification in view of safety security actions is extremely important.

The potentiality of the NMR technique within the field of metabolomics is currently employed for several purposes, including the detection of viable microbes in microbial food safety [10], the assessment of aquatic living organisms subjected to contaminated water [111], the identification of novel biomarkers to diagnose cancer diseases [112] and the monitoring of the plant growth status changing environmental parameters in view of smart agriculture [113]. In the next sections, we discuss some applications of deep learning approaches for NMR-based metabolomics in food and biomedical areas, highlighting their strengths and limitations.

Even before the development of artificial intelligence, statistical analyses were successfully applied in food analysis but with some limitations. For example, traditional methods are usually not very accurate in the classification of similar foods in contrast to modern deep learning approaches that allow enhancing all small differences. However, traditional methods usually constitute the first step, providing the input for neural networks with the aim to achieve a more accurate and automatic output. Furthermore, advanced computational algorithms can be applied not only for statistical analysis, but also to execute simulations whose predictions depend on the considered conditions [114].

#### *4.1. Food*

Foodomics is a term referred to the metabolomic approaches applied to foodstuffs for investigating topics mainly related with nutrition. Nowadays, DL methods are being progressively applied in the food field with different purposes, such as fraud detection [115]. Furthermore, another important issue is to guarantee the geographical origin and production/processing procedures of food, the precise proportions of ingredients, including additives and the kind of used raw materials. In this context, machine learning is a powerful method for achieving an adequate classification. For example, Greer et al. [116] carried out NMR measurements using a not-conventional protocol to measure the magnetization relaxation times (both the longitudinal T1 and transverse T2) and then they efficiently classified cooking oils, milk, and soy sauces (see Figure 19).

Since the considered datasets are very large (typically about 5 × <sup>10</sup><sup>6</sup> points each), the authors first reduced their size by means of the singular value decomposition, thus allowing a fast classification and also providing little insight into the sample physical properties. Figure 19 reports different combinations for the obtained classification features. Figure 19a,b corresponds to the two components used by the Gaussian fit of those peaks revealed by the inverse 2D Laplace transform [117]. A sharp distinction of the samples is clearly shown for every adopted combination. The y-axes of Figure 19a,c report the first component of T1 versus the first and second components of T2, respectively. Contrarily, the y-axes of Figure 19b,d report the second component of T1 versus the first and second components of T2, respectively. The authors found that most of the trained models reached an accuracy up to 100% (see, for example, Figure 20a). Finally, they also pointed out the effect of the sample temperature on classification accuracy for achieving reliable results (see Figure 20b).

**Figure 19.** T1–T2 correlational maps classifying several kinds of oils: olive (blue), canola (orange), corn (yellow) and vegetable (purple) by using the two components used by the Gaussian fit of those peaks revealed by the inverse 2D Laplace transform. (**a**,**c**) report the first component of T1 versus the first and second components of T2, respectively. (**b**,**d**) report the second component of T1 versus the first and second components of T2, respectively. See main text and Ref. [116] for details. Figure reprinted with permission from Ref. [116]. Copyright 2018 Elsevier.

**Figure 20.** (**a**) Comparison of the accuracy for the predictive power of the algorithms applied to classify cooking oil samples by employing three different classification training; (**b**) accuracy of predictive power applied to soy sauce sample highlighting the effect of temperature. Figure adapted with permission from Ref. [116]. Copyright 2018 Elsevier.

Nowadays, deep neural networks (DNNs) are rarely used for metabolomics studies because the assignment of metabolites contribution in NMR spectra still lacks highly reliable yields due to the complexity of the investigated biological matrix and thus of the corresponding signals. As described in the previous section, different deep learning methods were used, but some of them are characterized by some limitations (i.e., low accuracy in classification). Some efforts were made to overcome this problem. Date et al. [118] recently developed a DNN method that includes the evaluation of the so-called mean decrease accuracy (MDA) to estimate every variable. It relies on a permutation algorithm

that allows the recognition of the sample geographical origins and the identification of their biomarkers. On the other hand, for food authenticity and nutritional quality, the fast revelation of viable microbes is still a challenge. Here, we report a multilayer ANN example (see Figure 21) showing four input neurons, two hidden layers made of three neurons, and two output neurons.

**Figure 21.** Multilayer artificial neural network showing 4 input neurons, 2 hidden layers made of 3 neurons, and 2 output neurons of which the one corresponding to "Salmonella" shows the highest value, associated with the prediction performed by the used ANN. Figure reprinted from Ref. [119] under the terms of the CC-BY license.

Such a scheme was organized by Wang et al. [119] for the detection, by means of NMR spectroscopy coupled with deep ANNs, of pathogenic and non-pathogenic microbes. According to the classification method, each output neuron is associated to one possible output. Here, "Salmonella" shows the highest value of output, thus corresponding to the prediction performed by the used ANN. In such a case, the weights of each input are optimized to reach the wanted outcome throughout backpropagation, thus defining multiple epochs and training cycles. Figure 22 reports an example referred to an ANN analysis with two hidden layers of 800 neurons. ANN training is made optimizing a set of training criteria to avoid shallow local minima. In particular, training continues when the loss function decreases after an epoch of training ("greedy" algorithm—case a) and even after a small increase followed by a continuous decrease (case b). On the contrary, training stops for an increase in the loss function after several constant values (case c) and for steep increases (case d) [119].

**Figure 22.** Comparison of the different criteria adopted for the ANN training. (**a**) "Greedy" learning; (**b**) "jumping" out of a local tiny minimum; (**c**) halt at large minima; (**d**) halt at sharp growths in loss. Figure reprinted from Ref. [119] under the terms of the CC-BY license.

Once the network is trained, it is able to perform predictions on new input data. As already mentioned, the loss and the model accuracy provide a measure of the output goodness. In fact, the aim is to minimize the disagreement between the prediction and the reality (loss) and to maximize accuracy (cross-validation method). Thanks to this approach, Wang et al. [119] found that the used ANNs accurately predict 91.2% of unknown microbes

and, after repeating the model training by considering just those metabolites whose amount increased with incubation time, they observed an accuracy up to 99.2%.

Machine learning and neural network approaches are simultaneously adopted to analyze large amounts of NMR metabolomics data for food safety [109]. This can be performed also by means of magnetic resonance imaging (MRI), which is an imaging technique relying on NMR principles. Within the food field, it is mainly used to resolve the tissue texture of foods [120,121]. On the other hand, Teimouri et al. [122] used PLSR, LDA, and ANN for the classification of the data collected by CCD images from food portions, different in color and geometrical aspects. In this way, they were able to classify 2800 food samples in one hour, with an overall accuracy of 93%. Instead, De Sousa Ribeiro et al. [123] developed a CNN approach able to reconstruct degraded information on the label of food packaging. Before applying CNNs, they started with K-means clustering and KNN classification algorithms for the extraction of suitable centroids.

#### *4.2. Biomedical*

Metabolomics-based NMR investigations, coupled with deep learning methods, are increasingly employed within the biomedical field. More profoundly, the use of complex DL architectures hardly allows achieving a predictive power with ranking or selection. As already discussed, DL models use several computational layers to analyze input signals and establish any eventual preferred direction for signal encoding (forward or backward). This procedure does not usually allow the interpretation of input signals in terms of the used model, making it hard to identify biomarkers in a network, where biological and DL modeling are connected (Figure 23).

Today, it is still necessary to uniform assessment metric for biomedical data classification or prediction, also avoiding false negatives in disease diagnosis. Further, deep learning is a promising methodology to treat data collecting by smart wearable sensors, which is considered fundamental in epidemic prediction, disease prevention, and clinical decision making, thus allowing a significant improvement in the quality of life [124,125].

**Figure 23.** The multiomics method represented connects biological (i.e., signal inhibition, signaling network and biochemical feedback) with DL modeling (backpropagation, prediction, convolution, etc.), aiming to maximize the robustness of the approach for the identification of biochemical features referred to specific phenotypes. Figure reprinted from Ref. [124] under the terms of the Creative Commons Attribution Noncommercial License.

With the aim to obtain an accurate metabolites identification from the observation of the corresponding peaks in complex mixtures, Kim et al. [126] developed a convolutional neural network (CNN) model, called SMART-Miner, which is trained on 657 chemical entities collected from HMDB and BMRB databases. After training, the model is able to automatically carry out the recognition of metabolites from 1H-13C HSQC NMR spectra of complex metabolite mixtures, showing higher performance in comparison with other NMR-based metabolomic tools (Figure 24).

**Figure 24.** Overlay of experimental HSQC spectra from a metabolite mixture (black correlations) and the outcomes predicted by SMART-Miner (colored correlations). Figure reprinted with permission from Ref. [126]. Copyright 2021 Wiley Periodicals, Inc.

Brougham et al. [127], by employing ANNs on 1H NMR spectra, performed a successful classification of four lung carcinoma cell lines, showing different drug-resistance patterns. The authors chose human lung carcinoma and adenocarcinoma cell lines together with specific drug-resistant daughter lines (Figure 25). The ANN architecture was constructed at first using three layers and the corresponding weights were determined by minimizing the root mean square error. Then, the authors analyzed networks with four layers, two of which are hidden. Their results show that the four-layer structure with two hidden layers provided a 100% successful classification [127]. These data are very interesting in terms of the robustness of the used approach: the cell lines were correctly classified, even though the effects were provoked by the operator and independently from the spectra chosen for training and validation (Figure 25).

**Figure 25.** (**A**) Example of 1H NMR spectrum for DLKP lung carcinoma cells. Labeled peak corresponds to (a) CH3, (b) CH2, (c) CH2CH=CH, (d) CH2COO, (e) =CHCH2CH=, and (f) HC=CH/CHOCOR. The highlighted intervals at 0.60–1.04 and 1.24–3.56 ppm were used for statistical analysis. (**B**) PCA score plot including data from all four cell lines. (**C**) Residual mean squares error vs. nodes number in the hidden layers, for the 3-layers (full symbols), and in the second (empty triangles) and third (empty circles) layer for the 4-layers networks. Figure reprinted from Ref. [127] under the terms of the Creative Commons Attribution License.

Very recently, Di Donato et al. [128] analyzed serum samples from 94 elderly patients with early stage colorectal cancer and 75 elderly patients with metastatic colorectal cancer. With the aim to separately observe each different molecular components, these authors acquired one-dimensional proton NMR spectra by using three different pulse sequences for each sample: (i) a nuclear Overhauser effect spectroscopy pulse sequence to observe molecules with both low and high molecular weight; (ii) a common spin echo monodimensional pulse sequence [129] to observe only lighter metabolites and (iii) a common diffusion-edited pulse sequence to observe only macromolecules [128]. Their results, taking advantage of Kaplan–Meier curves for prognosis and of a PCA-based kNN analysis, allowed distinguishing relapse-free and metastatic cancer groups, with the advantage of obtaining information about the risks in the early stage of the colorectal cancer disease.

Peng et al. [130], by using two-dimensional NMR correlational spectroscopy on the longitudinal (T1) and transversal components (T2) of the magnetization relaxation time during its equilibrium recovery, were able to perform a molecular phenotyping of blood with the employment of supervised learning models, including neural networks. In detail, by means of a fast two-dimensional Laplace inversion [117], they obtained T1–T2 correlation spectra on a single drop of blood (<5 μL) in a few minutes (Figure 26) with a benchtopsized NMR spectrometer. Then, they converted the NMR correlational maps for deep image analysis, achieving useful insights for medical decision making by the application of machine learning techniques. In particular, after an initial dimensionality reduction by unsupervised analysis, supervised neural network models were applied to train and predict the data that, at the end, were compared with the diagnostic prediction made by

humans. The results showed that ML approaches outperformed the human being and took a much shorter time. Therefore, the authors demonstrated the clinical efficacy of this technique by analyzing human blood in different physiological and pathological conditions, such as oxidation states [130]. Concerning the analysis of different physiological conditions, Figure 26 reports the T1–T2 correlational maps of blood cells at oxygenated (a), oxidized (b), and deoxygenated (c) states. Three peaks with different relaxation times values were observed and assigned to the different microenvironments that water experiences in the considered samples of red blood cells. For the obtained maps, the coordinate for the bulk water peak (slowest component) is shown at the upper left of the map indicating T2 and T1 relaxations (in ms) and T1/T2-ratio, respectively. Instead, the coordinates of the fastest components, due to hydration and bound water molecules [131], are reported close to the corresponding correlation peak (Figure 26).

**Figure 26.** T1–T2 correlational maps in false colors of red blood cells at different conditions: oxygenated (**a**), oxidized (**b**), and deoxygenated (**c**). Figure reprinted from Ref. [130] under the terms of the Creative Commons Attribution 4.0 International License.

#### **5. Conclusions and Future Perspective**

The role played by each metabolite (in terms of identification and quantification) is essential to validate NMR spectroscopy potentiality in this field. Overall, NMR-based metabolomics coupled with machine learning and neural networks improves its power, especially in the food and biomedical fields, paving the way for innovative and hybrid approaches for deep insights into the metabolic fingerprinting of complex biological matrices. In fact, the number of identified metabolites is very low, and in some cases, the metabolites profile analysis is difficult for the high noise level and the multicolinearity with respect to the genomics case. However, the coupling of genomics and metabolomics tools is still a goal to be achieved. To this purpose, the deep learning and neural network approaches are the best methods to use, although the first step may involve the use of linear discriminant analysis to select a subset of metabolites to be used as input for the neural network analysis in view of an accurate classification as well as the generalizability of the method. Therefore, some efforts are still necessary for applying deep learning approaches on NMR metabolomics data, strictly related to the specific properties of the selected/investigated metabolites, evaluating the dimensionality reduction problems and improving the reliability of the evaluation models.

**Author Contributions:** Conceptualization, C.C., F.N. and E.F.; methodology, S.V., A.M.M. and G.N.; writing—review and editing, all authors. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** No new data were created or analyzed in this study, so data sharing is not applicable to this article.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **Appendix A. Technical Aspects**

Nuclear magnetic resonance (NMR) is one of the most employed experimental techniques for investigating the wide composition and structural complexity of biological samples. The NMR technique is characterized by high reproducibility and ease of sample preparation and measurement proceedings. NMR is a non-destructive technique able to perform different measurements on the same sample, providing increasingly accurate and detailed information. NMR also allows to reach a quantitative analysis and to carry out in vivo metabolomics studies. Unfortunately, it has a relatively low sensitivity (μM), but, in combination with chromatography, it shows a great potentiality for targeted analysis. However, it is a relatively young experimental technique with continuous development from both the hardware and software point of view (see Ref. [3] for a more details). For instance, cryoprobes [132–134] and magic angle techniques [17,135,136] are today commonly used for improving the signal-to-noise ratio, while AI methods are used both for signal pre-processing, such as baseline optimization [137–139], and for data analysis, as discussed in the main text of this review.

Briefly, the NMR working principle is based on the resonant excitation of the precession dynamics of the nuclear magnetic moment under the effect of a static magnetic field. Nuclei characterized by an odd number of protons and/or neutrons show a magnetic moment, associated to the nuclear spin characterized by the corresponding quantum number (I). Nuclei with I = 0 possess an intrinsic nuclear magnetic moment (*μ*) so producing a slight local magnetic field (B0). Once immersed in an external magnetic field (B), these nuclei, previously randomly orientated, align themselves either in the same or opposite direction of B. These nuclei, subject to B, move in a precessional motion at a frequency called Larmor frequency, which takes on values in the range of 50–900 MHz (see Figure A1). Indeed, it is characteristic for each nucleus and increases with the strength of the external magnetic field B. In this condition, if the system is irradiated with an electromagnetic radiation at the corresponding Larmor frequency (resonance condition), nuclei can absorb the radiation energy, and the nuclear spins can be promoted to a different Zeeman level.

**Figure A1.** (**A**) Spectroscopies and corresponding frequency ranges. Larmor frequency of most used nuclei for metabolomics analyses with respect to that of the proton when at 600 MHz. (**B**) Parts per million intervals for all these nuclei (15N, 13C, 31P, 19F and 1H) at characteristic chemical environments. Figure reprinted from Ref. [3] under the terms of the Creative Common CC-BY license.

When spins relax toward the fundamental state, they emit a radio frequency (damped in time, called free induction decay (FID)) that well characterizes each nucleus of the system, depending on the corresponding chemical environment that essentially exerts a local magnetic field, causing a shift (chemical shift) from the pure Larmor frequency value. This is commonly indicated by *δ* and measured in parts per million since the recorded frequency is divided by the spectrometer working frequency such that the spectra acquired with different instruments can be compared. Note that nuclei with I = 0, such as 12C and 16O, are NMR inactive [140,141].

Figure A1B reports the most common NMR active nuclei. Among them, 13C and 15N show a wide chemical shift range, together with a sharp line signal, but their poor natural abundance and the low sensitivity (compared to other nuclei as 1H or 19F) limit their employment in the metabolomic investigation. 31P has a good sensitivity (6.6 × <sup>10</sup>−<sup>2</sup> relative to 1H) and a wide spectral range, but only few metabolites, such as nucleoside or phospholipids, contain it, restricting its employment to a few compounds. The same comments can be done about 19F.

The high abundance in nature, high sensitivity and relevant gyromagnetic ratio of 1H makes 1D 1H NMR spectra especially useful in the metabolomic investigation. The 1D 1H NMR spectra are fast to record (few minutes) and just the information contained in only one spectrum can provide useful data to identify and quantify from 50 to 100 metabolites [142,143]. In this case, if nuclear spins are totally relaxed and no polarization transfer

sequences are applied, the intensity of each acquired proton signal is correlated with the corresponding concentration levels in the molecules, and the area under each peak is directly proportional to the number of 1H constituting the corresponding residue, giving a real distribution of the individual metabolites in the sample mixture. This quantification is possible without previous calibration, thanks to the large linear dynamic range and signal response that characterize proton NMR spectroscopy.

Another important aspect in the analysis of the 1H NMR spectra is the solvent suppression, and in this way, several protocols can be used. Commonly, the protonated solvent can be replaced with a deuterated one; this procedure can also require the lyophilization of the sample and the subsequent dispersion in the deuterated solvent. When this is not possible, the solvent peak can be suppressed by using proper pulse sequences [144]. Regarding the identification of metabolites constituting a biological matrix, when they have a unique and high reproducible fingerprint at specific conditions (pH, solvent, temperature), the nontarget strategy can be adopted [145]. This consists of the employment of multimodal models, which clarify how the NMR fingerprint of each sample and among the groups correlate with each other, providing a static analysis. This strategy is very important to give a first overview about the sample composition; however, it is not sufficient to analyze very complex samples. In the latter case, it is more common to adopt the target strategy, which consists in the comparison of the acquired data with available metabolite databases, such as the Human Metabolome Database, Biological Magnetic Resonance Data Bank, Birmingham Metabolite Library, Bbiorefcode (Bruker Biospin Ltd., Billerica, MA, USA) and Chenomx library (Chenomx Inc., Edmonton, AB, USA) [145].

Figure A2 reports a 1H NMR spectrum acquired from human serum at 700 MHz: 55 different metabolites were identified and labeled in the recorded spectrum [3]. In particular, each proton signal can be attributed to the different components of the biofluid, thanks to the high sensitivity of 1H nuclei, its natural abundance and the remarkably narrow line widths, giving a remarkable spectrum resolution. Note that the high intensity of the lactate peak is due to a conversion of the glucose in lactate during the preparation of the sample. To reach a certain assignment of the detected metabolic peaks 1D 1H NMR is sometimes not sufficient. This is due to the relevant numbers of resonances with an ambiguous assignment, and to a peak overlap of the matrix's components. Thus, bi-dimensional (2D) NMR techniques, which investigate the spin–spin correlation among 1H-1H nuclei or with heteroatoms, such as 13C, 15N, 31P, are adopted. In metabolomic studies, typical 2D NMR techniques are 1H-1H correlated spectroscopy (COSY) and total correlation spectroscopy (TOCSY), 1H-13C heteronuclear single quantum coherence (HSQC) and heteronuclear multiple bond correlation (HMBC). HSQC is a great experiment for metabolites identification, which gives information on the direct connectivity between protons and heteroatoms. In particular, the large chemical shift scale of 13C helps to solve the tough issue of the overlapped signals in the proton spectrum, and the variety of HSQC experiments can provide different sets of information on the investigated sample.

For instance, the potentiality of the HSQC technique was proved in the identification of methyl groups of betaine and trimethylamine-N-oxide (TMAO). The proton resonances of methyl groups in TMAO and betaine organic compounds are both close to *δ* = 3.26 ppm, and thus, the signals are not distinguishable. Instead, the carbon chemical shift of methyl groups in TMAO is assigned at 62.2 ppm, while that of betaine is at 55.8 ppm. This information can be easily obtained by the 1H-13C HSQC experiment (see Figure A3), giving an unambiguous identification of the two organic compounds [145]. HMBC is an appropriate technique to analyze the correlations using the coupling of protons with heteroatoms, which are separated up to four bonds, providing complementary information to that given by HSQC for the structural characterization of metabolites.

**Figure A2.** 1H NMR spectrum of ultrafiltered human serum at 700 MHz with the identified compounds labeled above each of the corresponding peaks. Figure reprinted from Ref. [3] under the terms of the Creative Common CC-BY license.

**Figure A3.** 1H-13C 2D HSQC experiment to identify TMAO and betaine organic compounds of a biological matrix. Figure reprinted from Ref. [145] under the terms of the Creative Common CC-BY license.

Thus, metabolic identification can be easily reached by the combination of 2D-NMR techniques with the metabolite databases. However, in some cases, the concentration of metabolites is very low, and their peaks are often overlapped making their identification difficult, even when employing 2D NMR techniques. In these cases, if the sharp chemical shift of the compounds to identify is known, it is recommended to use a standard (reference) compound, which is added in the concentration range 10–100 μM. For instance, this method was applied to identify the uridine diphosphate (UDP) conjugates, which are present in very low concentrations in cellular extracts with overlapped peaks, but their chemical shift is well known and the signal-to-noise ratio (S/N) has sufficient intensity to be quantified by 1D/2D NMR experiments [145]. Figure A4 shows in details how the spiking of pure compounds into a mixture aids the identification of metabolites within the spectrum and also its quantification by performing peak fitting of the two spectral regions corresponding to UDP-nacetylglucosamin (UDP-Gluc-NAc). Note that the proton signal on the left side (*δ* = 5.50 ppm) of UDP-Gluc-NAc is overlapped to that of galactose-1-phosphate (Gal-1- P), whereas signals from the uridine group (*δ* = 5.95 ppm) superimpose with those from UDP-glucose. In addition, without spiking, it is almost impossible to define the shift of the methyl group belonging to UDP-Gluc-Nac acetyl (right region) since there is a big overlap with other signals, such as the multiplets from glutamine and glutamate.

The addition of a standard is also employed to obtain an absolute quantification of the metabolites contained in the sample. Therefore, the estimation of the metabolites concentration can be made by comparing the area of the metabolites NMR peaks with that of the reference sample by the following equation:

$$\frac{M}{S} = \frac{Im}{Is} \times \frac{Ns}{Nm} \tag{A1}$$

in which *M* and *S* represent the amounts of the considered metabolite and that of the reference, while Im and Is indicate the area under the curve of corresponding peaks, and Nm and Ns represent the number of protons which contribute to these bands, respectively [145]. To quantify a small set of metabolites whose resonances are well-resolved peaks, also the pulse length-based concentration determination (PULCON) quantitative NMR can be used. It considers that the signal intensity is inversely proportional to the duration of the 90◦ pulse adopted to excite nuclei [146].

**Figure A4.** (**A**): The spiking of UDP-nacetylglucosamine (UDP-Gluc-NAc) allows its identification and quantitation. (**B**): The same spectrum of A without the addition of UDP-Gluc-NAc. Figure reprinted from Ref. [145] under the terms of the Creative Common CC-BY license.

Finally, another possibility for quantitative NMR analysis was reached by using a digital standard electronic reference to access in vivo concentration (ERETIC) technique [147]. It consists of the generation of a signal via a second channel of the probe and the addition of it as a pseudo-FID during the acquisition of the proton experiment, resulting in a common NMR signal [148]. Initially, the ERETIC technique required to be calibrated before running the quantification measurements and some hardware rearrangements. Improvements of ERETIC are ERETIC2 (Bruker Biospin, Topspin 3.0) and quantification by artificial signal (QUANTAS) [149].

Considering the complexity of NMR spectrum of metabolites, often, peak integration is not a sufficient method for the quantitative estimation, and in these cases, the deconvolution approach is preferred. It consists in the fit of a target peak of the compound by using the signal acquired from the reference compound [150]. Different specific software for NMR, such as TopSpin (Bruker), MNova (Mestrelab Research), Spectrus Processor (ACD/Labs), Delta (JEOL) and Chenomx NMR Suite (Chenomx Inc.) can be used for this goal. Among them, JEOL Delta is the only one completely free of charge, while Chenomx NMR Suite seems to show the best performance because it is based on a sophisticated targeted profiling technology and on reference libraries containing hundreds of metabolite spectral data, allowing a user-friendly deconvolution of complex NMR spectra [151]. The spectral analysis and deconvolution can also be performed with non-specific software, such as Matlab (The MathWorks, Inc.) or R (The R Foundation).

Several factors (i.e., pulse sequence changes or variation in the repetition time) influence the deconvolution process and its accuracy, including the variety of standard compounds present in the library and the need to repeat the NMR data acquisition in the same experimental conditions. Changes in the pulse sequence and/or the repetition time result in a less accurate fitting. The performance of the deconvolution is also influenced by the protons' bond to nitrogen atoms, also called labile (for instance, the *α*-protons in amino acids) [145]. These protons fast exchange with the solvent, and this not only makes

it difficult to detect them, but also provokes changes in the line shape of the close protons peaks. The result is an attenuated resonance, which does not precisely match with the integral corresponding to the other proton peaks in the considered sample. Another error regards the partial peak saturation of the protons, with a resonance close to the presaturation peak of the solvent (commonly water). This was observed for the anomeric protons of carbohydrates, which are frequently resonant close to the water signal, or also for the CH quartet at *δ* 4.11 ppm in the lactate spectrum. Beyond these disadvantages, the deconvolution approach is a great and widely employed tool for the metabolomic quantification studies [145].

Generally, successful NMR metabolomics requires statistical analyses, which have become progressively advanced over the years, and are the focus of this review. Dependent and independent parameters are correlated by means of conventional approaches on the basis of the mathematical relationship and, in turn, on model fitting. On the other hand, machine learning approaches group input data based on a cluster classification without any statistical assumption, while deep learning is devoted to find statistical inferences from a large amount of input data. The future of NMR-based metabolomics is to generalize the learning approaches to optimize predictive ability for specific diseases.

#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Applied Sciences* Editorial Office E-mail: applsci@mdpi.com www.mdpi.com/journal/applsci

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel: +41 61 683 77 34

www.mdpi.com

ISBN 978-3-0365-4554-7