*2.1. Analysis of Amino Acid Sequence Conservation Among Flavivirus C Proteins*

A phylogenetic analysis of the *Flavivirus* C protein and polyprotein amino acid sequences was performed to assess whether the C protein is an indicator of phylogenetic similarity (Figure 1). The C proteins of the Spondweni group viruses, i.e., ZIKV, Spondweni virus (SPOV) and Kedougou virus (KEDV), cluster together and are the most similar to DENV C (Figure 1a). Another cluster corresponds to the mosquito-borne, encephalitis-causing *Flavivirus*: Saint Louis encephalitis (SLEV), WNV, WNV serotype Kunjin (WNV-K), Alfuy (ALFV), Murray Valley encephalitis (MVEV), Usutu (USUV) and Japanese encephalitis (JEV) viruses. The *Flavivirus* polyprotein sequences show similar clusters (Figure 1b). As such, the C protein is a good indicator of viral genetic similarity. We therefore investigated the C protein amino acid sequences, seeking common patterns relevant to biological activity.

**Figure 1.** *Flavivirus* phylogenetic trees. Phylogenetic trees of (**a**) *Flavivirus* C proteins, highlighting in red the viruses with the C protein most similar to dengue virus (DENV) C, i.e., the Spondweni group viruses ZIKV, Spondweni virus (SPOV) and Kedougou virus (KEDV), and of (**b**) the entire viral polyproteins of the same *Flavivirus*. Overall, despite some differences, the same general clusters are seen whether the clustering is based on the polyprotein or on the capsid protein.

The amino acid sequences of the *Flavivirus* C proteins identified above were analyzed in the context of the three main regions identified in the DENV C sequence, *i.e.*, the conserved fold region, the flexible fold region and the N-terminal IDP region (Figure 2). This was done for all mosquito-borne *Flavivirus* relevant for human diseases (Figure 2a), as well as for the four DENV serotypes (Figure 2b). For this, the C protein amino acid sequences of the 16 mosquito-borne *Flavivirus* and of the 4 DENV serotypes were jointly aligned. In agreement with previous work [12,14], five conserved motifs are found in the mosquito-borne *Flavivirus* C proteins and deserve attention, namely: the N-terminal conserved <sup>13</sup>hNML+R<sup>18</sup>; <sup>40</sup>GXGP<sup>43</sup> in loop L1-2; <sup>44</sup>h+hhLAhhAFF+F<sup>56</sup> in the α2 helix; <sup>68</sup>RW<sup>69</sup> of the α3 helix; and, finally, the <sup>84</sup>F++–h<sup>88</sup> motif from α4 (with 'h', '+' and '–' representing hydrophobic, positively charged and negatively charged residues, respectively). Between residues 70 and 100, other motifs, not previously reported and containing hydrophobic and positively charged residues, are visible. Moreover, the amino acid residues G and P, which can break the continuity of α-helices, are conserved in specific positions of the protein, especially in the disordered N-terminal and the flexible fold regions (Figure 2c). Charged residues are also conserved in specific locations, mostly in the conserved fold region and especially after position 95 (Figure 2d). Overall, the disordered N-terminal and flexible fold regions taken together, when compared with the conserved fold region, have on average 10 versus 4 G and P residues (Figure 2c, green), 10 versus 15 K and R residues (Figure 2d, blue), and 1 versus 2 D and E residues (Figure 2d, magenta).
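The motif shorthand used above ('h', '+', '–' and 'X') can be turned into a simple pattern search. The following is a minimal sketch, not the procedure used in this work: the set of residues treated as hydrophobic is an assumption made for illustration, the function names are hypothetical, and the example fragments are toy strings rather than real C protein sequences.

```python
import re

# Residue classes for the motif shorthand. The hydrophobic set is an
# assumption for illustration; K/R and D/E follow the text.
CLASSES = {
    "h": "[AVLIMFWC]",  # hydrophobic (assumed set)
    "+": "[KR]",        # positively charged
    "-": "[DE]",        # negatively charged
    "X": ".",           # any residue
}


def motif_to_regex(motif):
    """Translate a motif such as 'hNML+R' into a compiled regular expression."""
    motif = motif.replace("\u2013", "-")  # accept the en dash used in the text
    return re.compile("".join(CLASSES.get(ch, re.escape(ch)) for ch in motif))


def find_motif(motif, sequence):
    """Return (1-based start position, matched residues) or None if absent."""
    match = motif_to_regex(motif).search(sequence)
    if match is None:
        return None
    return match.start() + 1, match.group()


# Toy fragments (hypothetical, for illustration only).
print(find_motif("hNML+R", "AAFNMLKRGG"))  # (3, 'FNMLKR')
print(find_motif("GXGP", "QGRGPL"))        # (2, 'GRGP')
```

Uppercase letters other than 'X' are matched literally, so only the lowercase 'h' and the charge symbols expand into residue classes.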

**Figure 2.** *Flavivirus* C protein amino acid sequence conservation. (**a**) Mosquito-borne *Flavivirus* C proteins are 55% conserved, with residues being considered conserved if, in a given position, more than 15 are equal (red) or stereochemically similar (black). (**b**) Conservation between DENV serotypes is 80%, with the same criteria as in (**a**). (**c**) Structure-breaking residues G and P (green). (**d**) Charged residues: dark blue for positively charged residues (K and R), light blue for H, and magenta for negatively charged residues (D or E). (**e**) Overall conserved regions of *Flavivirus* C proteins: the disordered N-terminal and the conserved fold regions are clearly conserved in terms of charged and G/P amino acids. In contrast, the flexible fold region allows higher variability. Thus, its main role seems to be to connect the disordered N-terminal and the conserved fold regions, and to enable alternative conformations. DENV C serotype 2 is highlighted in blue. Amino acid residues are numbered according to the consensus, which coincides with the DENV-2 residue numbers. The full virus names are given in the abbreviations section.
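The identity part of the conservation criterion in the caption can be illustrated with a short sketch. It assumes an already aligned set of equal-length sequences with '-' for gaps, counts only identical residues (the stereochemical similarity groups are not specified here, so they are left out), and uses hypothetical function names and a toy alignment rather than the real C protein data.

```python
from collections import Counter


def conserved_columns(alignment, threshold):
    """Return 1-based columns whose most common residue occurs more than
    `threshold` times (gaps are ignored)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    conserved = []
    for i in range(length):
        column = [seq[i] for seq in alignment if seq[i] != "-"]
        if not column:
            continue  # column made only of gaps
        _, count = Counter(column).most_common(1)[0]
        if count > threshold:
            conserved.append(i + 1)
    return conserved


def percent_conserved(alignment, threshold):
    """Percentage of alignment columns that pass the identity criterion."""
    return 100.0 * len(conserved_columns(alignment, threshold)) / len(alignment[0])


# Toy alignment of four sequences (hypothetical); a column counts as
# conserved if more than 3 residues are identical.
toy = ["MNKR", "MNRR", "MQKR", "MN-R"]
print(conserved_columns(toy, 3))  # [1, 4]
print(percent_conserved(toy, 3))  # 50.0
```

For the alignment described in the caption, `threshold` would be 15; extending the check to stereochemically similar residues would require choosing similarity groups, which is an additional assumption.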

Several motifs can be identified in the *Flavivirus* C protein sequences. These correspond to the main sections of the protein, which have been conserved during evolution, likely because they are crucial to protein function (Figure 2e). The N-terminal region, although disordered, is highly conserved in terms of charged and G/P residues. The flexible fold section allows greater variability, in line with previous reports by us and others suggesting that it can adopt several conformations [15].
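Per-region counts of G/P, K/R and D/E residues, such as those compared above, can be obtained with a few lines of code. This is a minimal sketch under stated assumptions: the region boundaries, the region names and the example sequence are hypothetical placeholders, not the boundaries or sequences used in Figure 2.

```python
# Residue classes compared in the text.
RESIDUE_CLASSES = {"GP": "GP", "KR": "KR", "DE": "DE"}


def count_by_region(sequence, regions):
    """Count residues of each class inside each region.

    `regions` maps a region name to a (start, end) pair, 1-based and inclusive.
    """
    counts = {}
    for name, (start, end) in regions.items():
        segment = sequence[start - 1:end]
        counts[name] = {
            label: sum(segment.count(residue) for residue in residues)
            for label, residues in RESIDUE_CLASSES.items()
        }
    return counts


# Toy sequence and boundaries (hypothetical, for illustration only).
toy_sequence = "MGPKRDGAKE"
toy_regions = {"n_terminal": (1, 5), "fold": (6, 10)}
print(count_by_region(toy_sequence, toy_regions))
# {'n_terminal': {'GP': 2, 'KR': 2, 'DE': 0}, 'fold': {'GP': 1, 'KR': 1, 'DE': 2}}
```

Averaging these counts over all aligned sequences would give per-region values comparable to those quoted in the text, provided the real region boundaries are supplied.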
