*2.6. Bioinformatics Analysis*

An ad hoc bioinformatics pipeline was built in the R environment [14]. Raw sequences were processed using USEARCH (version 10.0.240). Paired reads were merged, and low-quality reads were discarded. Filtered reads were assigned to different taxonomic levels (from phylum to species) and organised into operational taxonomic units (OTUs). Sequences were clustered at 97% nucleotide similarity, and chimeric sequences were filtered out; taxonomy was assigned against the Greengenes 16S rRNA bacterial database (version 13.8) [15]. Data were normalized with the Total Sum Scaling method, and normalized OTUs were used to investigate community diversity in each sample biotype. The observed richness and the Chao1 [16] and Shannon [17] indices were calculated to analyse the within-sample diversity (α-diversity). The β-diversity analysis was conducted to estimate the between-sample diversity, using the generalized UniFrac index as a distance metric [18]. The resulting phylogenetic distance matrices were represented by multidimensional scaling. Permutational Multivariate Analysis of Variance (PERMANOVA) was performed on the β-diversity distances to statistically assess the grouping of samples by diagnosis. Microbial profiles obtained for each taxonomic level and for each sample biotype were compared among patient groups using the Mann–Whitney U-test and the Kruskal–Wallis rank sum test, applying a 20% cutoff for prevalence. For all statistical analyses, the significance threshold for the *p*-value was set to 0.05, and all *p*-values were corrected for multiple testing with the Benjamini–Hochberg method.
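
As an illustration of the normalization and α-diversity measures named above, the following minimal Python sketch computes Total Sum Scaling, observed richness, Chao1, and the Shannon index for a single sample. It is not the authors' R pipeline: the function names and the example counts are hypothetical, the bias-corrected form of Chao1 and the natural logarithm for Shannon are assumptions, and Chao1 is computed here from raw integer counts because it depends on the number of singletons and doubletons.

```python
import math


def total_sum_scaling(counts):
    """Divide each OTU count by the sample total (relative abundances)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample contains no reads")
    return [c / total for c in counts]


def observed_richness(counts):
    """Number of OTUs with at least one read in the sample."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 estimate (assumed variant), from raw counts."""
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return observed_richness(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts):
    """Shannon index with natural logarithm (assumed base)."""
    proportions = total_sum_scaling(counts)
    return -sum(p * math.log(p) for p in proportions if p > 0)


# Hypothetical OTU counts for one sample (made up for illustration).
sample = [10, 5, 1, 1, 2, 0]
print(observed_richness(sample))  # 5
print(chao1(sample))              # 5.5
print(shannon(sample))
```

Because the Shannon index depends only on proportions, it gives the same value on raw counts and on TSS-normalized abundances, whereas Chao1 is not defined on proportions.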

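The prevalence cutoff and the Benjamini–Hochberg correction described above can be sketched in a similar way. This is again a hedged Python illustration, not the authors' code: the helper names are hypothetical, and reading the 20% cutoff as "a taxon must be detected in at least 20% of samples" is an assumption, since the text does not state how the cutoff was applied.

```python
def prevalence(abundances):
    """Fraction of samples in which a taxon is detected (abundance > 0)."""
    return sum(1 for a in abundances if a > 0) / len(abundances)


def passes_prevalence(abundances, cutoff=0.20):
    """Assumed rule: keep a taxon detected in at least `cutoff` of samples."""
    return prevalence(abundances) >= cutoff


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four per-taxon tests.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
print(adjusted)
print(significant)
```

In this made-up example the adjusted values are approximately 0.02, 0.04, 0.04, and 0.02, so all four tests remain below the 0.05 threshold after correction.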