#### *2.2. Analysis of the Flavivirus C Protein Sequences Hydrophobicity and Secondary Structure Propensity*

Hydrophobicity and α-helical propensity predictions were performed as previously reported [15], using the Kyte-Doolittle [26] and the Deleage-Roux [27] scales, respectively, on the ProtScale server, for the 16 mosquito-borne *Flavivirus* C proteins analyzed (Figure 3). The hydrophobicity scale ranges from −4.5, for highly polar (hydrophilic) amino acid residues, to 4.5, for highly hydrophobic ones [26]. Therefore, when plotting the average values for each amino acid residue of the *Flavivirus* C sequences, negative local minima and positive local maxima indicate hydrophilic and hydrophobic regions, respectively (Figure 3a,b). All proteins display a similar profile, even in the N-terminal and flexible fold regions, despite the slightly higher amino acid residue variability there (Figure 2). The α0 domain, homologous to pep14-23, is amphipathic, with average values near 0. In the flexible fold region, which is also mostly amphipathic, there is a hydrophobicity peak between residues 30 and 40, possibly explaining its intermediate structure/dynamics behavior [13,14]. Some hydrophobicity peaks are observed in the α3 and α4 domains, and the most hydrophobic domain is α2, as expected from the sequence analysis (Figure 2) and from the literature [12,14,18].
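As an illustration of how such profiles can be computed, the following minimal Python sketch applies the Kyte-Doolittle scale with an unweighted sliding window and then takes the position-wise mean and SD across sequences. The window size of 9, the function names and the requirement that all profiles have equal length (no alignment gaps) are assumptions made for this example; they are not settings reported in the text.

```python
from statistics import mean, stdev

# Kyte-Doolittle hydropathy values: -4.5 (most hydrophilic) to 4.5 (most hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Unweighted sliding-window mean; returns one value per full window.

    The window size is an assumption of this sketch, not a value from the text.
    """
    scores = [KD[aa] for aa in seq.upper()]
    return [mean(scores[i:i + window]) for i in range(len(scores) - window + 1)]


def mean_and_sd(profiles):
    """Position-wise mean and sample SD across equal-length profiles."""
    columns = list(zip(*profiles))
    return [mean(c) for c in columns], [stdev(c) for c in columns]
```

Positive values in the averaged profile then mark hydrophobic stretches and negative values mark hydrophilic ones. For real C protein sequences, alignment gaps would need to be handled before averaging.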

**Figure 3.** *Flavivirus* C proteins hydrophobicity and secondary structure predictions. (**a**) Hydrophobicity predictions and (**b**) respective average (black line) ± standard deviation, SD (gray lines). (**c**) α-helical secondary structure predictions and (**d**) respective average (black line) ± SD (gray lines). Amino acid residues are numbered according to the consensus, which coincides with the DENV 2 residue numbers.

For the α-helical predictions, secondary structure is highly probable above a threshold of 1.0 [27]. The *Flavivirus* C protein secondary structure predictions correlate well with the known secondary structure of DENV C (Figure 2e) [12]. Such agreement supports the concept of a transient α0 occurring in these proteins, as hypothesized earlier [15]. Roughly between positions 12 and 20, there is a disordered region with a high tendency to acquire α-helical secondary structure. Importantly, the predicted values are similar and the same tendencies are found in all proteins, with peaks and valleys co-localizing (Figure 3). Along with the data from the previous subsection, these results strengthen the idea that *Flavivirus* C proteins have similar structural and dynamic properties.
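The threshold criterion can be made concrete with a short sketch that extracts the residue ranges where a propensity profile exceeds 1.0. The function name, the 1-based numbering default and the input values used to try it are hypothetical; the function operates on any precomputed profile and does not reproduce the Deleage-Roux scale itself.

```python
def segments_above(profile, threshold=1.0, first_position=1):
    """Return (start, end) residue positions of runs where profile > threshold."""
    segments, start = [], None
    for offset, value in enumerate(profile):
        pos = first_position + offset
        if value > threshold and start is None:
            start = pos  # a new run above the threshold begins here
        elif value <= threshold and start is not None:
            segments.append((start, pos - 1))  # the run ended at the previous residue
            start = None
    if start is not None:
        # The profile ended while still above the threshold.
        segments.append((start, first_position + len(profile) - 1))
    return segments
```

Running this on each protein's profile and comparing the returned ranges is one simple way to check whether peaks co-localize across sequences.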

#### *2.3. Analysis of the Flavivirus C Protein Tertiary Structure Propensity*

The tertiary structure of *Flavivirus* C proteins was then investigated, complementing the α-helical predictions, to help understand the role(s) of the disordered N-terminal region. Following previous work [15], I-TASSER [28–30] was used to predict tertiary structures for the 16 closely related mosquito-borne *Flavivirus* C proteins (Figure 4). Eighty monomer conformations were obtained (several for each sequence) and superimposed with the partial DENV C homodimer structure deposited at the Protein Data Bank (PDB), which was obtained via nuclear magnetic resonance (NMR) spectroscopy (PDB ID: 1R6R). Notably, DENV [12,16], WNV [31] and ZIKV [25] C proteins form homodimers, stabilized by hydrophobic and electrostatic interactions involving their conserved fold region [12–14,25,31–33]. Since this is the most conserved region of the *Flavivirus* C protein sequences (Figure 2), a homodimer is not only a stable conformational arrangement, but also likely to occur. Accordingly, the 28 conformers that had more than 5 backbone clashes with the other monomer when superimposed in a homodimer structure (and thus would not allow a viable homodimer) were discarded (Table 1). The remaining 52 *Flavivirus* C protein conformational models were analyzed while superimposed with the DENV C homodimer (PDB ID: 1R6R, model 21 [12]). These were then grouped into four clusters by visual inspection of their similarity (Figure 4).
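The clash-based filtering step can be sketched as follows. Only the rule of discarding conformers with more than 5 backbone clashes comes from the text. The definition of a clash used here (any pair of backbone atoms from the two chains closer than 3.0 Å), the function names and the data layout (lists of x, y, z tuples) are assumptions made purely for illustration.

```python
from math import dist


def count_clashes(backbone_a, backbone_b, cutoff=3.0):
    """Count atom pairs, one from each chain, closer than `cutoff` angstroms.

    The 3.0 angstrom cutoff is an assumed clash definition, not one from the text.
    """
    return sum(1 for p in backbone_a for q in backbone_b if dist(p, q) < cutoff)


def keep_viable(conformers, partner_backbone, max_clashes=5, cutoff=3.0):
    """Keep conformers with at most `max_clashes` clashes against the partner monomer."""
    return {
        name: coords
        for name, coords in conformers.items()
        if count_clashes(coords, partner_backbone, cutoff) <= max_clashes
    }
```

Here `conformers` maps a hypothetical model name to its superimposed backbone coordinates, and `partner_backbone` holds the coordinates of the other monomer in the reference homodimer.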

**Figure 4.** *Flavivirus* C proteins tertiary structure predictions, organized into four conformational clusters. The *Flavivirus* C protein conformations predicted by I-TASSER are superimposed with the experimental DENV C homodimer structure (black). Amino acid residues of the N-terminal region in α-helix conformation are in blue, the other α-helices in red and the loops in gray. Of the 80 conformers, 52 can be clustered by similarity of conformation, from cluster A to D. Clusters A, B and C have the α1 helix in the experimentally determined DENV C conformation (Protein Data Bank (PDB) ID: 1R6R [12]). In cluster D, the α1 helix is in the conformation of West Nile Virus (WNV) C and ZIKV C (PDB IDs: 1SFK [31] and 5YGH [25], respectively). The closed autoinhibitory conformation of cluster C seems the most probable, as it has the highest number of models. Although unlikely given their transient, unstable nature, the N-terminal IDP regions may interact with each other. Table 1 specifies the composition of each cluster.

Most sequences have a conformer in each cluster (Figure 1 and Table 1). In cluster A, some N-terminal amino acid residues, namely the positively charged ones, are close to α4–α4 and may interact with RNA. Cluster B has the most scattered conformers, with the N-terminal region at the "top", not interacting with other protein regions, resembling a transition between more ordered states. In cluster C, the N-terminal region is in an autoinhibitory conformation, blocking access to the α1–α2–α2–α1 region, as we previously suggested for DENV C [15]. Eighteen conformer models are predicted in this closed conformation, with at least one model from most of the C proteins tested (except JEV C and ZIKV C; see Table 1). Therefore, this conformation can occur in most *Flavivirus* C proteins. In the cluster D conformation, the α1 helix is arranged as in the WNV [14,31] and ZIKV [25] C experimental structures, an arrangement not previously reported for DENV C [15]. This closed conformation also involves the N-terminal region and the α1 domain, and partially blocks the α2–α2 hydrophobic cleft (or totally blocks it, when both monomers are in the same conformation). Importantly, both clusters C and D are closed conformations, supporting the autoinhibition hypothesis.


**Table 1.** Distribution of the I-TASSER predicted models through the four clusters.

Dimers with an A or B conformer in one monomer allow any of the conformers (A to D) to co-exist on the other monomer. The C conformer permits neither C-C homoconformers (i.e., both monomers in the same conformation) nor the C-D and D-C heteroconformers. In contrast, D-D homoconformers are allowed, similarly to the conformation that WNV C adopts in the crystal form [31]. Moreover, to go from cluster A to cluster C or D, the N-terminal region must pass through cluster B. These constraints suggest a path for transitions between conformations, discussed below. Overall, the autoinhibition hypothesis proposed for DENV C [15] is supported, and such a conformation can occur in other *Flavivirus* C proteins.
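The pairing rules and the transition constraint stated above can be summarized in a small sketch. The forbidden pairs (C-C, C-D and D-C) and the requirement that A reaches C or D only through B are taken from the text. The graph encoding, the function names and the choice not to model a direct C to D transition are assumptions of this example, since the text does not address that step.

```python
from collections import deque
from itertools import product

CLUSTERS = "ABCD"
# Monomer pairs stated in the text as not allowed in a homodimer.
FORBIDDEN = {("C", "C"), ("C", "D"), ("D", "C")}


def allowed_dimers():
    """All ordered (monomer 1, monomer 2) cluster pairs that are not forbidden."""
    return [pair for pair in product(CLUSTERS, repeat=2) if pair not in FORBIDDEN]


# Assumed transition graph: A <-> B, B <-> C, B <-> D.
# A direct C <-> D step is not modeled because the text does not mention it.
TRANSITIONS = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}


def shortest_path(start, goal):
    """Breadth-first search for the shortest route between two clusters."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(TRANSITIONS[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

Under these assumptions, 13 of the 16 ordered pairs are allowed, and any route from cluster A to a closed conformation (C or D) passes through cluster B.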
