### *5.8. Statistical Analyses*

T-tests were run in Statistica (v. 13, TIBCO Software, Palo Alto, CA, USA) to determine the significance (*p* < 0.05) of site-to-site, within-colony, and between-colony variation in PalA concentrations. Similarity matrix and hierarchical clustering analyses were performed using PRIMER v.7 and PERMANOVA+ (PRIMER-e, Auckland, New Zealand). In most cases, analyses were performed on the complete microbiome as well as on the three microbiome fractions. The ASV occurrence data were square-root transformed for all analyses. A heat map based on Core80 ASV occurrence was generated from the transformed data, and hierarchical clustering with group-average linkage was applied together with SIMPROF testing, using 9999 permutations and a 5% significance level. Bray–Curtis resemblance matrices were created from ASV occurrences without the use of a dummy variable. To assess basic patterns in community structure, the significance of within- vs. between-colony variation and of site variation was determined by t-test in Statistica v. 13. To compare within-colony (n = 9) and between-colony (n = 27) variation at equal sample sizes, nine between-colony pairwise similarity values were randomly sampled, and the homogeneity of variance between the two groups was checked. Threshold metric Multi-Dimensional Scaling (tmMDS) was then conducted based on Kruskal fit scheme 1, with 500 iterations and a minimum stress of 0.001. Similarity profile testing with SIMPROF was based on the null hypothesis that no groups would demonstrate differences in ASV occurrences. This clustering algorithm was also used to generate confidence levels on the MDS plot, which were set to 65% and 75%. In addition, 95% bootstrap regions were calculated with 43 bootstraps per group, set to ensure a minimum rho of 0.99.
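
To make the resemblance measure concrete, the following minimal Python sketch computes a Bray–Curtis similarity between two samples after square-root transformation. It is an illustration only and not the PRIMER implementation: the function name `bray_curtis_similarity` and the toy counts are hypothetical, and expressing similarity on a 0 to 100 scale is an assumption made here.

```python
import math


def bray_curtis_similarity(sample_a, sample_b):
    """Bray-Curtis similarity (0-100) on square-root-transformed counts.

    Hypothetical helper for illustration; not the PRIMER routine.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must cover the same set of ASVs")
    # Square-root transform down-weights the most abundant ASVs.
    ta = [math.sqrt(x) for x in sample_a]
    tb = [math.sqrt(x) for x in sample_b]
    numerator = sum(abs(x - y) for x, y in zip(ta, tb))
    denominator = sum(x + y for x, y in zip(ta, tb))
    if denominator == 0:
        # No dummy variable is added, so two empty samples are undefined.
        raise ValueError("similarity is undefined for two empty samples")
    return 100.0 * (1.0 - numerator / denominator)


# Toy occurrence counts for three ASVs in two samples (made-up values).
print(round(bray_curtis_similarity([4, 0, 9], [1, 16, 9]), 2))  # 61.54
```

Identical samples give a similarity of 100 and samples with no shared ASVs give 0, which is the property the clustering and ordination steps rely on.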
To assess the contribution of each factor to the variance of the microbial community in this nested experimental design, Site(Colony), permutational multivariate analysis of variance (PERMANOVA) was used. Site-based centroids were calculated, and the PERMDISP algorithm was used to determine the degree of dispersion around the centroid for each site. The overall site-to-site difference in dispersion was determined and pairwise comparisons were also calculated, with 9999 permutations used to determine significance (P(perm) < 0.05). Exploratory analysis of the major ASV contributors to similarity was performed using the SIMPER procedure based on sites and colonies, with the cutoff for low contributions set to 70%.
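
The permutation logic behind PERMANOVA can be sketched for a simplified case. The Python example below implements a one-way pseudo-F test on a distance matrix, which is a deliberate simplification: it does not reproduce the nested Site(Colony) design or the PERMANOVA+ software used here. The function names, the toy distances, and the default seed are all hypothetical.

```python
import random


def pseudo_f(dist, groups):
    """One-way pseudo-F from a square distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares, one term per group.
    ss_within = 0.0
    for label in labels:
        idx = [i for i, g in enumerate(groups) if g == label]
        ss_within += sum(dist[i][j] ** 2
                         for k, i in enumerate(idx)
                         for j in idx[k + 1:]) / len(idx)
    ss_among = ss_total - ss_within
    a = len(labels)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova_one_way(dist, groups, n_perm=9999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign labels under the null hypothesis
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: four samples on a line, two per hypothetical site.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
f_stat, p_value = permanova_one_way(dist, ["A", "A", "B", "B"], n_perm=999)
print(f_stat)  # 200.0
```

With only four samples there are just three distinct groupings, so the p-value cannot fall much below one third; the toy data are meant to show the mechanics rather than a realistic test.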

Co-occurrence networks were constructed from ASV occurrence data filtered to retain only those ASVs present in at least five samples, resulting in a 102-ASV data set. The 102 × 63 matrix was provided as input to FlashWeave v1.0 [64] using default parameters, and the network was visualized in Gephi v. 0.9.2 [65]. To consider whether the ASVs in the Core80, Dynamic50, or Variable fractions of the SaM were affiliated with particular levels of PalA in the ascidian lobes, the PalA niche robust optimum and range were computed using the occurrence and dry weight-normalized contextual data [66]. Weighted gene correlation network analysis (WGCNA package in R [67]) was used to identify modules and their correlation with PalA levels. The matrix was total-sum normalized [68], and WGCNA was used in signed mode. A few modules were detected, but they were not correlated with PalA. Modules were projected on the FlashWeave co-occurrence network and called subsystems.
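
The prevalence filter and total-sum normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions: the table layout (ASV identifier mapped to per-sample counts), the function names, and the toy data are hypothetical, and applying the normalization after the filter is an assumption, since the order of the two steps is not specified.

```python
def filter_by_prevalence(table, min_samples=5):
    """Keep ASVs observed (count > 0) in at least `min_samples` samples."""
    return {asv: counts for asv, counts in table.items()
            if sum(1 for c in counts if c > 0) >= min_samples}


def total_sum_normalize(table):
    """Scale each sample (column) so that its counts sum to 1."""
    if not table:
        return {}
    n_samples = len(next(iter(table.values())))
    totals = [sum(counts[j] for counts in table.values())
              for j in range(n_samples)]
    # Samples with a zero total are left as zeros rather than dividing by 0.
    return {asv: [c / t if t else 0.0 for c, t in zip(counts, totals)]
            for asv, counts in table.items()}


# Toy table: three hypothetical ASVs across six samples.
toy = {
    "ASV_a": [1, 1, 1, 1, 1, 0],
    "ASV_b": [0, 0, 0, 0, 1, 1],  # present in only two samples, dropped
    "ASV_c": [3, 1, 1, 1, 1, 2],
}
kept = filter_by_prevalence(toy, min_samples=5)
normalized = total_sum_normalize(kept)
print(sorted(kept))           # ['ASV_a', 'ASV_c']
print(normalized["ASV_a"][0])  # 0.25
```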

### *5.9. Biosynthetic Gene Cluster Analysis*

To predict the likelihood that Core80 ASV lineages harbor the potential for natural product biosynthesis, we designed a meta-analysis of neighboring genomes found in the Integrated Microbial Genomes (IMG) database [69]. The analysis was conducted only for ASVs whose taxonomic assignment was confident at the genus level. Genomes associated with a total of nine genera were therefore harvested from IMG: *Microbulbifer* (16 genomes), *Pseudovibrio* (24 genomes), *Endozoicomonas* (11 genomes), *Nitrosomonas* (19 of 68 total genomes in this genus), *Nitrospira* (14 genomes), *Hoeflea* (7 genomes), *Lutibacter* (12 genomes), *Halocynthiibacter* (2 genomes), and *Lentimonas*. Since no genomes were found for *Lentimonas*, we instead harvested 8 genomes from the family *Puniceicoccaceae*. This resulted in 113 genomes that were submitted to antiSMASH [70] for analysis. The genomes and the counts of biosynthetic gene clusters assigned to nonribosomal peptide synthetase, polyketide synthase, or a hybrid of the two classes were tabulated (Table S4).
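
As a quick consistency check on the genome counts listed above, the tally can be reproduced in a few lines of Python. The dictionary name and key labels are illustrative; the counts are those given in the text.

```python
# Genome counts per taxon as reported in the text. The last entry is a
# family-level stand-in, since no Lentimonas genomes were available.
genomes_per_taxon = {
    "Microbulbifer": 16,
    "Pseudovibrio": 24,
    "Endozoicomonas": 11,
    "Nitrosomonas": 19,  # subset of the 68 genomes available in IMG
    "Nitrospira": 14,
    "Hoeflea": 7,
    "Lutibacter": 12,
    "Halocynthiibacter": 2,
    "Puniceicoccaceae (stand-in for Lentimonas)": 8,
}

total_genomes = sum(genomes_per_taxon.values())
print(total_genomes)  # 113
```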
