#### *2.1. Participants*

Participants were recruited among enrollees of the BIOSPHERE and GuLiver Axis studies. Both studies were approved by the Ethics Committee of the Università Cattolica del Sacro Cuore (Rome, Italy; protocol number BIOSPHERE: 8498/15; protocol number GuLiver Axis: 741). Study procedures and criteria for participant selection have been previously described [24,25].

In both studies, community-dwellers aged 70 years or older were recruited after signing written informed consent. The presence of PF&S was established according to the operational definition elaborated in the SPRINTT project [7,27]: (a) physical frailty, based on a summary score on the Short Physical Performance Battery (SPPB) [28] between 3 and 9; (b) low appendicular lean mass (aLM), according to the cutpoints of the Foundation for the National Institutes of Health (FNIH) sarcopenia project [29]; and (c) absence of mobility disability (i.e., ability to complete the 400-m walk test) [30]. The present investigation involved 35 participants: 18 with PF&S and 17 nonsarcopenic nonfrail (nonPF&S) controls. Gut microbial profiles, circulating inflammatory mediators, and serum AAs and derivatives were assessed.
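The three-part operational definition above can be expressed as a simple decision rule. The following is a minimal sketch, not the study's own code: the function name `has_pfs` and its arguments are hypothetical, and the SPPB range is assumed to be inclusive at both ends.

```python
def has_pfs(sppb_score: int, low_alm: bool, completed_400m_walk: bool) -> bool:
    """Return True when all three criteria of the operational definition hold.

    Assumption: "between 3 and 9" is read as inclusive (3 <= SPPB <= 9).
    """
    physically_frail = 3 <= sppb_score <= 9   # criterion (a)
    # criterion (b): low aLM; criterion (c): no mobility disability
    return physically_frail and low_alm and completed_400m_walk


# A participant with SPPB 7, low aLM, and a completed 400-m walk meets the definition
print(has_pfs(7, True, True))
```

Under this reading, a participant who cannot complete the 400-m walk is excluded from the PF&S group regardless of SPPB score or aLM.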

#### *2.2. Measurement of Appendicular Lean Mass by Dual X-ray Absorptiometry (DXA)*

aLM was quantified through whole-body DXA scans on a Hologic Discovery A densitometer (Hologic, Inc., Bedford, MA, USA) according to the manufacturer's procedures. Criteria for low aLM were as follows: (a) aLM to body mass index (BMI) ratio (aLMBMI) <0.789 in men and <0.512 in women; or (b) crude aLM <19.75 kg in men and <15.02 kg in women when the aLM/BMI criterion was not met [29].
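As an illustration, the two-tier low-aLM rule can be sketched as follows. The cutpoints are those quoted above; the function name, argument names, and the `"male"`/`"female"` keys are assumptions made for this example only.

```python
# FNIH cutpoints quoted in the text: (aLM/BMI ratio, crude aLM in kg)
FNIH_CUTPOINTS = {"male": (0.789, 19.75), "female": (0.512, 15.02)}


def has_low_alm(alm_kg: float, bmi: float, sex: str) -> bool:
    """Apply criterion (a) first, then fall back to criterion (b)."""
    ratio_cut, crude_cut = FNIH_CUTPOINTS[sex]
    if alm_kg / bmi < ratio_cut:      # (a) BMI-adjusted aLM below cutpoint
        return True
    return alm_kg < crude_cut         # (b) crude aLM below cutpoint


# Hypothetical participant: woman, aLM 14.5 kg, BMI 27 kg/m^2
print(has_low_alm(14.5, 27.0, "female"))
```

In this hypothetical case the ratio (about 0.537) is above the female cutpoint, so the classification rests on the crude aLM criterion.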

#### *2.3. Blood Sample and Stool Collection*

Blood samples were collected in the morning by venipuncture of the median cubital vein after overnight fasting, using commercial collection tubes (BD Vacutainer®; Becton, Dickinson and Co., Franklin Lakes, NJ, USA). Serum separation was obtained after 30 min of clotting at room temperature and subsequent centrifugation at 1000× *g* for 15 min at 4 °C. The upper clear fraction (serum) was collected in 0.5 mL aliquots and stored at −80 °C until analysis.

Participants were carefully instructed on the procedures for fecal sample collection. Stool samples were collected at home in a commercial sterile, dry screw-top container. Upon collection, stool samples were delivered to the Human Microbiome Unit at the Bambino Gesù Children's Hospital (Rome, Italy) and immediately frozen at −80 °C until further processing.

#### *2.4. Measurement of Circulating Inflammatory Mediators*

A multi-marker immunoassay was used to measure circulating levels of a panel of inflammatory markers [11,25,26,31]. Briefly, a set of 27 pro- and anti-inflammatory mediators, including cytokines, chemokines, and growth factors, was assayed in duplicate in serum samples using the Bio-Plex Pro Human Cytokine 27-plex Assay kit (#M500KCAF0Y, Bio-Rad, Hercules, CA, USA) on a Bio-Plex® System with Luminex xMAP Technology (Bio-Rad) (Table 1). Data acquisition was performed with the Bio-Plex Manager Software 6.1 (Bio-Rad) using instrument default settings. Standard curves were optimized across all assayed analytes to remove outliers. Results were obtained as concentrations (pg/mL).


**Table 1.** List of serum inflammatory biomarkers assayed by multiplex immunoassay.

Abbreviations: CCL, C-C motif chemokine ligand; FGF, fibroblast growth factor; G-CSF, granulocyte colony-stimulating factor; GM-CSF, granulocyte macrophage colony-stimulating factor; IFN, interferon; IL, interleukin; IL1Ra, interleukin 1 receptor antagonist; IP, interferon-induced protein; MCP-1, monocyte chemoattractant protein 1; MIP, macrophage inflammatory protein; PDGFBB, platelet-derived growth factor BB; TNF, tumor necrosis factor.

#### *2.5. Determination of Circulating Amino Acids*

Serum concentrations of 37 AAs and derivatives were determined by ultraperformance liquid chromatography/mass spectrometry (UPLC/MS), as described previously [10]. Briefly, 50 μL of sample was added to 100 μL of 10% (w/v) sulfosalicylic acid containing an internal standard mix (50 μM) (Cambridge Isotope Laboratories, Inc., Tewksbury, MA, USA) and subsequently centrifuged at 1000× *g* for 15 min. The supernatant was collected, and 10 μL was mixed with 70 μL of borate buffer and 20 μL of AccQ Tag reagents (Waters Corporation, Milford, MA, USA). The mixture was subsequently heated at 55 °C for 10 min. Samples were then loaded onto a CORTECS UPLC C18 column (1.6 μm, 2.1 × 150 mm; Waters Corporation) for chromatographic separation (ACQUITY H-Class, Waters Corporation) and eluted at a flow rate of 500 μL/min with a linear gradient (9 min) from 99:1 to 1:99 water 0.1% formic acid/acetonitrile 0.1% formic acid. Analyte detection was performed on an ACQUITY QDa single quadrupole mass spectrometer equipped with an electrospray source operating in positive mode (Waters Corporation). AA controls (level 1 and level 2) manufactured by the MCA laboratory of the Queen Beatrix Hospital (Winterswijk, The Netherlands) were used to monitor the analytic process.

#### *2.6. Gut Microbiota DNA Extraction, 16S rRNA Amplification, and Sequencing*

Total genomic DNA was extracted from fecal samples using the QIAamp Fast DNA Stool Mini Kit (Qiagen, Germany), according to the manufacturer's instructions.

The V3-V4 region of the 16S rRNA gene (~460 bp) was amplified using the primer pair 16S\_F (5′-TCG TCG GCA GCG TCA GAT GTG TAT AAG AGA CAG CCT ACG GGN GGC WGC AG-3′) and 16S\_R (5′-GTC TCG TGG GCT CGG AGA TGT GTA TAA GAG ACA GGA CTA CHV GGG TAT CTA ATC C-3′), reported in the MiSeq rRNA Amplicon Sequencing protocol (Illumina, San Diego, CA, USA). Amplification reactions were set up using 2× KAPA HiFi HotStart ReadyMix (KAPA Biosystems Inc., Wilmington, MA, USA). AMPure XP beads (Beckman Coulter Inc., Beverly, MA, USA) were employed to clean up the DNA amplicons. To obtain a unique combination of barcode sequences, a second amplification step was performed using the Illumina Nextera forward and reverse adaptor-primers (Illumina, San Diego, CA, USA). The final library was quantified after a clean-up step using a Quant-iT™ PicoGreen® dsDNA Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA) and diluted to an equimolar concentration (4 nM).

Samples were sequenced on an Illumina MiSeq™ platform, following the manufacturer's specifications, generating 300-base paired-end reads. Bacterial 16S rRNA amplicon data were analyzed using a combination of the QIIME 1.9.1 software pipeline and the VSEARCH v1.1 pipeline. Fastq-join was used to merge paired-end raw sequences, followed by a split library step (QIIME). After dereplication and chimera checking (VSEARCH), reads were clustered into operational taxonomic units (OTUs) at 97% identity. Taxonomy was assigned to each 16S rRNA gene sequence using UCLUST against the Greengenes 13.8 database (97% sequence similarity).

#### *2.7. Statistical Analysis*

Analyses were performed using R (version 3.4.0), a freely available software environment for statistical computing and graphics. Sequential and Orthogonalized Covariance Selection (SO-CovSel) analyses were run in the MATLAB R2015b environment by means of in-house written functions (freely available at www.chem.uniroma1.it/romechemometrics/research/algorithms/).

Descriptive statistics were run on all data. Differences in demographic, anthropometric, clinical, functional characteristics, and inflammatory and metabolic markers between PF&S and nonPF&S participants were assessed via *t*-test statistics and χ² or Fisher exact tests, for continuous and categorical variables, respectively. All tests were two-sided, with statistical significance set at *p* < 0.05.

To compare the gut microbiota alpha diversity between PF&S and nonPF&S participants, the Chao1 index was calculated on raw data and differences were assessed by the Wilcoxon test.
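For reference, the Chao1 richness estimate for a single sample can be computed from its raw OTU counts as sketched below. The text does not state which variant was used; this example assumes the bias-corrected form, $S_{\text{obs}} + F_1(F_1 - 1) / [2(F_2 + 1)]$, where $F_1$ and $F_2$ are the numbers of OTUs observed exactly once and exactly twice. The function name is hypothetical.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate for one sample's raw OTU counts."""
    observed = sum(1 for c in counts if c > 0)      # S_obs: OTUs seen at least once
    singletons = sum(1 for c in counts if c == 1)   # F1
    doubletons = sum(1 for c in counts if c == 2)   # F2
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Toy sample: 6 observed OTUs, 3 singletons, 1 doubleton
print(chao1([10, 1, 1, 1, 2, 0, 5]))  # 7.5
```

Because the estimate depends on singleton and doubleton counts, it is calculated on raw rather than filtered or normalized data.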

Data were then preprocessed by removing OTUs not seen more than three times in at least 20% of the samples and were normalized using a regularized logarithm transformation (rlog). Differential abundance analysis between PF&S and nonPF&S groups at the phylum, family, and genus levels was carried out using a negative binomial distribution on data normalized by "size factors", taking into account sequencing depth between samples. Differences in bacterial abundance were reported as log2 fold change (log2FC). Only comparisons with an absolute log2FC > 1.5 and a Benjamini–Hochberg adjusted *p* value < 0.05 were considered significant.
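The prevalence filter and the significance rule can be sketched as follows. This is an illustrative sketch only: it does not reproduce the negative binomial model or the rlog transformation, and all function names are hypothetical. The thresholds are those stated in the text.

```python
def keep_otu(counts_across_samples, min_count=3, min_fraction=0.20):
    """Keep an OTU seen more than `min_count` times in at least
    `min_fraction` of the samples."""
    n_above = sum(1 for c in counts_across_samples if c > min_count)
    return n_above >= min_fraction * len(counts_across_samples)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_significant(log2fc, p_adj, fc_cut=1.5, alpha=0.05):
    """Significance rule from the text: |log2FC| > 1.5 and adjusted p < 0.05."""
    return abs(log2fc) > fc_cut and p_adj < alpha


# OTU seen more than three times in 2 of 10 samples (exactly 20%) is retained
print(keep_otu([0, 5, 0, 0, 0, 0, 0, 0, 0, 4]))
```

The fold-change criterion is applied to the absolute value, so both enriched and depleted taxa can be flagged.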

After import into MATLAB, serum concentrations of inflammatory and metabolic markers and the abundance of gut microbial OTUs were organized into three matrices (Table 2), to be further processed through a multi-block approach. Given its ability to provide accurate predictions and, at the same time, to identify a parsimonious number of relevant variables (putative markers), the analysis was carried out through the recently developed SO-CovSel algorithm [32].


**Table 2.** Composition of the multi-block dataset used for Sequential and Orthogonalized Covariance Selection (SO-CovSel) analysis.

SO-CovSel is a predictive method that couples variable selection (through the CovSel approach) with sequential multiblock modeling, and it can be used to deal with both quantitative and qualitative responses. According to the method, the response(s) to be predicted can be expressed as a linear combination of variables from the different blocks, as described by the following equation:

$$Y = X_1 B_1 + X_2 B_2 + X_3 B_3$$

The matrices $B_1$, $B_2$, and $B_3$ collect the regression coefficients relating the individual blocks to the response(s). Within the multiblock linear regression framework summarized by the previous equation, one of the main peculiarities of the SO-CovSel method is that not all the variables from the various blocks are used as predictors, but only the most relevant ones, which are selected according to the CovSel algorithm [33]. In CovSel, the first variable is selected as the one having the maximum covariance with the response. The subsequent variables are selected according to the same criterion, but after having orthogonalized both X and Y with respect to the contribution of the previously selected predictors, to avoid redundancy. The other main characteristic of the SO-CovSel method, which derives from its analogy with sequential and orthogonalized partial least squares regression (SO-PLS) [34,35], is that the different blocks are sequentially modeled, after having been orthogonalized with respect to the contribution of the previous ones. This avoids scaling issues and allows evaluating whether each block adds new information or is redundant.
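The CovSel selection loop described above can be sketched in a few lines. This is a minimal illustration written with NumPy, not the in-house MATLAB functions used in the study; the function name `covsel` and the choice of column-centering are assumptions, and the number of variables to select is taken as given.

```python
import numpy as np


def covsel(X, Y, n_select):
    """Return the indices of the columns of X chosen by covariance selection.

    Assumes n_select does not exceed the rank of the centered X.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(X), -1)
    X = X - X.mean(axis=0)          # column-center predictors (assumption)
    Y = Y - Y.mean(axis=0)          # column-center response(s) (assumption)
    selected = []
    for _ in range(n_select):
        # Squared covariance of each predictor with the response(s)
        score = ((X.T @ Y) ** 2).sum(axis=1)
        j = int(np.argmax(score))
        selected.append(j)
        # Orthogonalize X and Y with respect to the selected variable
        t = X[:, [j]]
        P = t @ t.T / (t.T @ t).item()
        X = X - P @ X
        Y = Y - P @ Y
    return selected


# Toy design with mutually orthogonal columns; y depends on columns 1 and 2
X = np.array([[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]], dtype=float)
y = 3 * X[:, 1] + X[:, 2]
print(covsel(X, y, 2))  # [1, 2]
```

After each selection the chosen column is projected out of both X and Y, so a variable carrying the same information as one already selected scores close to zero in later iterations.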

Based on these considerations, for a problem involving three blocks of predictors, such as the one addressed in the present study, the SO-CovSel algorithm can be schematically summarized by the following steps:

1. The CovSel algorithm is used to select relevant variables and calculate a regression model between the first block and the responses

$$Y = X_{1\text{sel}} B_1 + E_1$$

2. The second block is orthogonalized with respect to the variables selected in the first block

$$X_{2\text{orth}} = \left[I - X_{1\text{sel}}\left(X_{1\text{sel}}^T X_{1\text{sel}}\right)^{-1} X_{1\text{sel}}^T\right] X_2$$

3. The CovSel algorithm is used to select relevant variables and calculate a regression model between the orthogonalized second block and the residuals from the first fit

$$E_1 = X_{2\text{sel,orth}} B_{2\text{orth}} + E_2$$

4. The third block is orthogonalized with respect to the variables selected in the first and second blocks

$$X_{3\text{orth}} = \left[I - X_{12\text{sel}}\left(X_{12\text{sel}}^T X_{12\text{sel}}\right)^{-1} X_{12\text{sel}}^T\right] X_3$$

where

$$X_{12\text{sel}} = \left[X_{1\text{sel}} \;\; X_{2\text{sel,orth}}\right]$$

5. The CovSel algorithm is used to select relevant variables and calculate a regression model between the orthogonalized third block and the residuals from the second fit

$$E_2 = X_{3\text{sel,orth}} B_{3\text{orth}} + E_3$$

6. An overall prediction model is built as

$$Y = \hat{Y} + E_3 = X_{1\text{sel}} B_1 + X_{2\text{sel,orth}} B_{2\text{orth}} + X_{3\text{sel,orth}} B_{3\text{orth}} + E_3$$

where the predicted response $\hat{Y}$ is calculated as

$$\hat{Y} = X_{1\text{sel}} B_1 + X_{2\text{sel,orth}} B_{2\text{orth}} + X_{3\text{sel,orth}} B_{3\text{orth}}$$

The algorithm, described in the steps above for regression (i.e., for the prediction of a quantitative response) can easily be adapted for classification problems, such as the one addressed in the present study. Indeed, by suitably coding the response matrix Y, a classification problem can be straightforwardly turned into a regression one. In particular, for a problem involving two classes, Y is a binary coded vector that takes the value 1 for PF&S participants and 0 for nonPF&S controls. The classification is then accomplished by properly thresholding the value of the predicted response.
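The sequential fitting, orthogonalization, and thresholding steps can be sketched as follows. This is a simplified NumPy illustration, not the study's MATLAB implementation. To keep it short, the variable selection is assumed to have been done already and is passed in as hypothetical index lists `sel1`, `sel2`, and `sel3`. The blocks and the response are column-centered, and the 0.5 decision threshold is an assumption, since the text does not specify the value used. Projections are computed by least squares, which is equivalent to $I - X(X^T X)^{-1} X^T$ when the selected columns are linearly independent.

```python
import numpy as np


def orthogonalize(block, basis):
    """Project `block` onto the orthogonal complement of the columns of `basis`."""
    coef, *_ = np.linalg.lstsq(basis, block, rcond=None)
    return block - basis @ coef


def fit_ls(X, y):
    """Least-squares coefficients and residuals of y regressed on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b


def sequential_fit(X1, X2, X3, y, sel1, sel2, sel3):
    """Steps 1 to 6 with the selected column indices supplied by the caller."""
    X1, X2, X3 = (X - X.mean(axis=0) for X in (X1, X2, X3))
    y_mean = y.mean()
    X1s = X1[:, sel1]
    b1, e1 = fit_ls(X1s, y - y_mean)                 # step 1
    X2s = orthogonalize(X2, X1s)[:, sel2]            # steps 2 and 3
    b2, e2 = fit_ls(X2s, e1)
    X12 = np.hstack([X1s, X2s])
    X3s = orthogonalize(X3, X12)[:, sel3]            # steps 4 and 5
    b3, e3 = fit_ls(X3s, e2)
    y_hat = y_mean + X1s @ b1 + X2s @ b2 + X3s @ b3  # step 6
    return y_hat, e3


def classify(y_hat, threshold=0.5):
    """Assign class 1 (PF&S) above the threshold, class 0 (nonPF&S) otherwise."""
    return (np.asarray(y_hat) > threshold).astype(int)


# Synthetic example: three random blocks and a binary-coded response
rng = np.random.default_rng(1)
X1, X2, X3 = (rng.normal(size=(20, 4)) for _ in range(3))
y = (X1[:, 0] + X2[:, 1] > 0).astype(float)
y_hat, e3 = sequential_fit(X1, X2, X3, y, [0, 1], [1], [0])
print(classify(y_hat))
```

In practice, the selected variables and their number would be determined by CovSel within each block rather than fixed in advance as done here.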
