#### *2.2. Phylogenetic Analysis*

A phylogenetic tree of the genus *Zobellia*, including all type strains and representatives of related genera, was inferred from partial 16S rRNA gene sequences, which were retrieved from the genome sequences and, for *Z. russellii* KMM 3677<sup>T</sup>, from a nucleotide sequence. In the neighbor-joining (NJ) tree (Figure 1), all *Zobellia* strains clustered together, and three subclades could be distinguished. One subclade included *Z. uliginosa* and the strains of *Z. galactanivorans*, while a second included *Z. laminariae* and the strains of *Z. amurskyensis*. This clustering indicates higher sequence similarity among the strains within each subclade. Interestingly, *Z. russellii* branched deeply within the *Zobellia* clade and showed substantial evolutionary divergence from all other strains in the genus, with high bootstrap support.

**Figure 1.** Phylogenetic relationships of *Zobellia* species and representatives of related genera of the family *Flavobacteriaceae*, based on 16S rRNA gene sequence comparisons. The phylogenetic tree was constructed using the neighbor-joining (NJ) approach [34], with bootstrap support based on 1000 replicates. The scale bars represent 0.01 substitutions per site.

To clarify the phylogenetic relationships of *Zobellia* species in more detail, using both the newly obtained and the previously available draft genomes, further phylogenomic analyses were performed with the JSpecies Web Server (JSpeciesWS; http://jspecies.ribohost.com/jspeciesws/). JSpeciesWS is a web service for in silico calculation of the extent of identity between genomes. The service measures the average nucleotide identity (ANI) based on BLAST+ (ANIb) and MUMmer (ANIm), as well as correlation indexes of tetranucleotide (Tetra) signatures [35].
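To make the Tetra index more concrete, the following is a minimal sketch of a tetranucleotide-signature correlation. It is a simplification and not the JSpeciesWS implementation: it assumes raw relative frequencies counted on one strand and correlates them with a plain Pearson coefficient, whereas the actual service is understood to work on normalized (z-score) tetranucleotide usage. The function names and the toy sequences are hypothetical.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides over the DNA alphabet.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_frequencies(seq):
    """Relative frequency of each overlapping tetranucleotide in seq."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in KMERS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def tetra_correlation(seq_a, seq_b):
    """Simplified Tetra-like index: correlation of 4-mer frequency profiles."""
    return pearson(tetra_frequencies(seq_a), tetra_frequencies(seq_b))


# Toy sequences (assumption: real input would be whole-genome contigs).
toy_a = "ACGTTGCAAGGCTATCCGA" * 4
toy_b = "TTGACCGTAAGCTAGGCAT" * 4
similarity = tetra_correlation(toy_a, toy_b)
```

On real genomes the profiles would be built from all contigs of each assembly, and values close to 1 would indicate similar oligonucleotide usage.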

The calculated ANI and Tetra values are summarized in Table 3. Consistent with the NJ tree, the genomes of *Z. galactanivorans* OII3 1c and *Z. amurskyensis* MAR 2009 138 showed ANI values above 97% with their corresponding type strains, exceeding the recommended cut-off for species delineation of ∼96% ANI [36]. Some discrepancies between ANI and Tetra values were observed for *Z. uliginosa*. Although the Tetra values were above 0.989, implying that *Z. uliginosa* is closely related to the strains of *Z. galactanivorans*, the estimated ANI values of 92%–94% were lower than the species delineation threshold. Therefore, these strains could either belong to a single species from which *Z. uliginosa* has recently diverged or represent two discrete, albeit closely related, species.
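The reasoning applied to the ANI and Tetra values can be sketched as a small decision rule. This is only an illustration under stated assumptions: the ANI cut-off of 96% is the one cited in the text, while the Tetra level of 0.989 is simply the value reported here for *Z. uliginosa* and is not an established threshold. The function name and the example inputs are hypothetical and are not taken from Table 3.

```python
ANI_SPECIES_CUTOFF = 96.0  # percent; the ~96% cut-off cited in the text [36]
TETRA_SUPPORT = 0.989      # assumption: level reported for Z. uliginosa, not a standard


def interpret_pair(ani_percent, tetra):
    """Classify a genome pair from its ANI (%) and Tetra correlation."""
    if ani_percent >= ANI_SPECIES_CUTOFF:
        return "same species"
    if tetra > TETRA_SUPPORT:
        # High oligonucleotide similarity but ANI below the cut-off:
        # the two measures disagree, so the pair needs closer inspection.
        return "ambiguous: high Tetra, ANI below cut-off"
    return "different species"


# Hypothetical inputs chosen to exercise each branch.
examples = {
    "strain vs. own type strain": interpret_pair(97.5, 0.999),
    "close relatives": interpret_pair(93.0, 0.990),
    "distant relatives": interpret_pair(80.0, 0.900),
}
```

The middle branch corresponds to the situation described for *Z. uliginosa* and *Z. galactanivorans*, where the two indexes point in different directions.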


**Table 3.** Results of average nucleotide identity (ANI; %) and tetranucleotide (Tetra) calculations using the JSpecies Web Server (JSpeciesWS).

#### *2.3. Comparative Genomics*

Since *Z. galactanivorans* DsiJ<sup>T</sup> and *Z. galactanivorans* OII3 1c represent the same species, the genome of strain OII3 1c was excluded from the analysis. However, despite the ANI values, the genome of *Z. amurskyensis* MAR 2009 138 was included in the comparative analysis along with the novel draft genome of the type strain KMM 3526<sup>T</sup>.

Gene prediction and preliminary annotation of the *Z. amurskyensis* and *Z. laminariae* genomes were performed with the Rapid Annotation using Subsystems Technology (RAST) server (http://rast.theseed.org/FIG/rast.cgi). In addition to identifying genes, RAST groups annotated genes into functional subsystems represented by 27 categories of well-characterized metabolic processes and structural complexes [37–39]. Based on such data, we could estimate the contribution of diverse metabolic processes to bacterial life strategies. The KMM 3526<sup>T</sup> and KMM 3676<sup>T</sup> genomes contained 4248 and 4334 protein-coding sequences, of which only 2683 and 2699, respectively, were functionally annotated. According to the server, about 1500 genes in each of the two genomes were assigned to subsystems, among which "Carbohydrates" ranked first in gene content.

Genome characteristics of *Z. amurskyensis* and *Z. laminariae* in comparison to publicly available *Zobellia* genomes are shown in Table 4. Genome sizes varied only slightly, from 5.14 Mb to 5.52 Mb. Estimated GC content ranged from 36.77% in *Z. laminariae* to 42.8% in *Z. galactanivorans*. It is worth noting that the comparison was made between draft genomes, so the overall metrics depend strongly on genome assembly completeness and annotation methods. Since the obtained genomes were annotated using RAST, the other genomes from NCBI were also passed through the RAST server for further comparative analysis.
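Basic metrics of this kind, such as assembly size and GC content, can be derived directly from the contig sequences. The sketch below shows one way to do so, assuming the contigs are already loaded as plain strings; it is not the procedure used to generate Table 4, and the function names and toy contigs are hypothetical.

```python
def gc_content(contigs):
    """GC percentage over all contigs, ignoring ambiguous bases such as N."""
    gc = at = 0
    for seq in contigs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / (gc + at)


def total_length_mb(contigs):
    """Total assembly length in megabases (Mb), counting every position."""
    return sum(len(seq) for seq in contigs) / 1e6


# Toy draft assembly (assumption: real contigs would be read from a FASTA file).
toy_contigs = ["GGCCAATT", "GCNNAT"]
toy_gc = gc_content(toy_contigs)
toy_size = total_length_mb(toy_contigs)
```

Because ambiguous positions and missing regions are simply absent from the counts, both values shift with assembly completeness, which is the caveat raised above for draft genomes.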

**Table 4.** Comparison of the genome characteristics of *Zobellia* strains.


Genome-wide exploration of orthologous genes and clusters across different species is important in comparative genomics for understanding molecular evolution, the structure of genes and genomes, and adaptive capabilities [40]. Orthologs, or orthologous genes, originate by vertical descent from a single gene in the last common ancestor [41]. Comparison and annotation of orthologous clusters among five *Zobellia* genomes were performed using the web server OrthoVenn2 (https://orthovenn2.bioinfotoolkits.net/home) [42]. The proteins inferred for each genome by RAST annotation were used as input. Consistent with the phylogenomic analysis, the pairwise heatmap (Figure 2) demonstrates the phylogenetic proximity of *Z. galactanivorans* to *Z. uliginosa* at the ortholog level.
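The quantity behind such a pairwise heatmap is the number of orthologous clusters that two genomes have in common. The following sketch computes that matrix from cluster memberships; it assumes the clustering has already been done and that each genome is represented by the set of cluster identifiers it contributes to. It does not reproduce OrthoVenn2 itself, and the genome labels and cluster identifiers are hypothetical.

```python
from itertools import combinations


def pairwise_shared(clusters_by_genome):
    """Symmetric matrix of shared-cluster counts between genomes.

    clusters_by_genome maps a genome name to the set of cluster ids
    in which that genome has at least one member gene.
    """
    names = sorted(clusters_by_genome)
    matrix = {a: {b: 0 for b in names} for a in names}
    for a in names:
        # Diagonal: number of clusters the genome participates in.
        matrix[a][a] = len(clusters_by_genome[a])
    for a, b in combinations(names, 2):
        shared = len(clusters_by_genome[a] & clusters_by_genome[b])
        matrix[a][b] = matrix[b][a] = shared
    return matrix


# Toy input with hypothetical genomes and cluster ids.
toy = {
    "genome_A": {"c1", "c2", "c3", "c4"},
    "genome_B": {"c1", "c2", "c3"},
    "genome_C": {"c1", "c5"},
}
overlap = pairwise_shared(toy)
```

In this toy input, `genome_A` and `genome_B` overlap in more clusters than either does with `genome_C`, which is the kind of pattern a heatmap makes visible at a glance.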

**Figure 2.** The pairwise heatmap of overlapping cluster numbers across the genomes.

The Venn diagram is widely used to visualize similarities and differences between genomes. The distribution of shared orthologous clusters and singletons for each strain is depicted in Figure 3. Singletons are genes for which no orthologs could be found in other species; single-copy gene clusters are clusters that contain a single-copy gene in each species [42]. According to the cluster analysis, the five genomes together comprised 4853 clusters, constituting the putative pan-genome of the genus *Zobellia*. The core genome, represented in all strains, was estimated at 2963 clusters, most of which were assigned to cellular metabolic processes.
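The definitions used here (pan-genome, core genome, single-copy clusters) translate into simple set operations on the cluster table. The sketch below is an illustration under the assumption that each cluster is given as a list of genome labels, one per member gene; it is not the OrthoVenn2 algorithm, and the labels and cluster identifiers are hypothetical.

```python
from collections import Counter


def summarize_clusters(cluster_members, genomes):
    """Return (pan, core, single_copy) sets of cluster ids.

    cluster_members maps a cluster id to a list of genome labels,
    with one entry per gene in the cluster.
    """
    genomes = set(genomes)
    pan = set(cluster_members)  # every cluster found in any genome
    core, single_copy = set(), set()
    for cid, members in cluster_members.items():
        per_genome = Counter(members)
        if set(per_genome) == genomes:  # present in all genomes
            core.add(cid)
            if all(n == 1 for n in per_genome.values()):
                single_copy.add(cid)  # exactly one gene per genome
    return pan, core, single_copy


# Toy cluster table with three hypothetical genomes.
toy_genomes = ["A", "B", "C"]
toy_clusters = {
    "c1": ["A", "B", "C"],       # core, single copy
    "c2": ["A", "A", "B", "C"],  # core, duplicated in A
    "c3": ["A", "B"],            # shared by two genomes only
    "c4": ["C", "C"],            # inparalogs unique to C
}
pan, core, single_copy = summarize_clusters(toy_clusters, toy_genomes)
```

Singletons do not appear in this table at all, since by definition they are genes that were not placed in any cluster.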

**Figure 3.** Venn diagram plotted by OrthoVenn2 showing the orthologous protein clusters shared among the genomes of five *Zobellia* strains. The numbers of shared and unique genes and of singletons are shown.

From Figure 3, it is apparent that 516 orthologous clusters composed of 1044 genes were represented only in the *Z. galactanivorans* and *Z. uliginosa* genomes, while the genomes of *Z. amurskyensis* KMM 3526<sup>T</sup> and *Z. amurskyensis* MAR 2009 138 shared 324 clusters of 658 genes. Such clusters are presumably species-specific. Gene ontology (GO) analysis revealed an enrichment of GO:0005983 "starch catabolic process" in both groups. The group of 516 clusters was additionally enriched in GO:0016139 "glycoside catabolic process", and the second group in GO:0008484 "sulfuric ester hydrolase activity".
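Counts such as "516 clusters composed of 1044 genes" correspond to clusters whose members come from a given group of genomes and from no other. A minimal sketch of that selection follows, again assuming each cluster is a list of genome labels with one entry per gene; the function name, labels, and cluster identifiers are hypothetical, and the code is not part of OrthoVenn2.

```python
def clusters_exclusive_to(cluster_members, group):
    """Clusters present in every genome of group and in no other genome.

    Returns the set of matching cluster ids and their total gene count.
    """
    group = set(group)
    hits = {
        cid: members
        for cid, members in cluster_members.items()
        if set(members) == group
    }
    n_genes = sum(len(members) for members in hits.values())
    return set(hits), n_genes


# Toy cluster table with three hypothetical genomes.
toy_clusters = {
    "c1": ["A", "B", "C"],
    "c2": ["A", "B"],
    "c3": ["A", "A", "B"],
    "c4": ["C", "C"],
}
pair_ids, pair_genes = clusters_exclusive_to(toy_clusters, ["A", "B"])
solo_ids, solo_genes = clusters_exclusive_to(toy_clusters, ["C"])
```

Passing a single genome as the group selects clusters made up only of its own inparalogs, so the same function covers both the group-specific and the strain-specific case.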

The dispensable genome of the genus *Zobellia* consists of singletons and inparalogs unique to each strain. The *Z. laminariae* genome contained the highest number of unique genes, including 633 singletons and 11 clusters of 26 inparalogs. In *Z. uliginosa*, *Z. galactanivorans*, and *Z. amurskyensis* MAR 2009 138, 617, 562, and 454 singletons and 48, 23, and 58 inparalogs, respectively, were identified. In the genome of *Z. amurskyensis* KMM 3526<sup>T</sup>, there were only 304 singletons. These accessory genes may underlie metabolic differences among *Zobellia* representatives and determine peculiarities of lifestyle in certain ecological niches, such as sediment, seaweeds, or seawater. However, it should be noted that these differences could also be explained by differences in genome completeness.
