*2.2. Analysis of Intrinsic Flexibility and Disorder of the Bacterial S1 Proteins and Their Domains.*

To analyze the intrinsic flexibility and disorder of the full-length bacterial S1 proteins and their separate structural domains, we used the FoldUnfold (averaging windows of 11 aa and 5 aa) and IsUnstruct programs; their capabilities and accuracy were described in [26–29]. The results are given in Table 1.

The percentages of disorder predicted for the full-length S1 proteins and for their separate domains by the FoldUnfold (averaging windows of 11 aa and 5 aa) and IsUnstruct programs were closely similar (Table 1).

For full-length proteins, the highest percentage of disorder was detected for the four-domain (30%) and five-domain (30%) proteins using the FoldUnfold program (averaging window 5 aa). The smallest percentage was found for the six-domain proteins (13%) using the FoldUnfold program (averaging window 11 aa). This indicates the predominance of relatively short flexible or unstructured regions in the considered sequences of the proteins of this group, consistent with the fact that the binary predictor of the CH-CDF plot classified 67% of the proteins in this group as ordered.
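The window-size effect described above can be illustrated with a minimal sketch. It assumes a hypothetical per-residue score in which higher values mean a more disorder-prone residue, a centered moving average, and an arbitrary threshold of 0.5; it is not the FoldUnfold scale, threshold, or implementation, and the function names `smooth` and `percent_disorder` are illustrative only.

```python
def smooth(scores, window):
    """Centered moving average; windows are truncated at the sequence ends."""
    half = window // 2
    n = len(scores)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


def percent_disorder(scores, window, threshold=0.5):
    """Percentage of residues whose smoothed score exceeds the threshold.

    The threshold is an assumption chosen for this toy example.
    """
    smoothed = smooth(scores, window)
    flagged = sum(1 for s in smoothed if s > threshold)
    return 100.0 * flagged / len(scores)


# Hypothetical 40-residue profile: ordered (0.0) except for a short
# 4-residue flexible stretch (1.0) in the middle.
profile = [0.0] * 18 + [1.0] * 4 + [0.0] * 18

short_window = percent_disorder(profile, window=5)   # 10.0
long_window = percent_disorder(profile, window=11)   # 0.0
```

In this toy profile the 5 aa window still flags the short flexible stretch, whereas the 11 aa window averages it below the threshold. This is the kind of behaviour that would produce a higher predicted disorder with the shorter window when flexible regions are short.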

Most of the separate S1 domains exhibited disorder values of around 20%. Excluding the third domain in the three-domain proteins and the separate domains in the one-domain proteins, the lowest percentage of disorder predicted by the FoldUnfold program (averaging window 5 aa) was found for the third domain in the six-domain proteins (13%). The FoldUnfold program (averaging window 11 aa) and IsUnstruct also gave a relatively low percentage of intrinsic disorder for this domain compared with other domains in this group and in other groups (by the number of domains): 19% and 21%, respectively. The largest percentage of disorder predicted by the IsUnstruct program belonged to the sixth domain in the six-domain proteins (45%). Using the FoldUnfold program for six-domain proteins, a propensity for a more disordered state in the terminal domains was also identified. Note that we have shown earlier that for long S1 proteins (six-domain S1 proteins) the central part of the protein (the third domain) is more conserved (as percent identity between separate domains) than the terminal domains and is apparently vital for the activity and functionality of S1 proteins [6].

The concept of order and disorder in protein segments has often been investigated in correlation with the presence or absence of protein repeats at the sequence level. It has been noted that intrinsic disorder often corresponds to regions of low compositional complexity (low sequence entropy) and sometimes to repetitive sub-sequences, for example, in fibrillar proteins [30]. Some special cases of protein repeats (for example, in the PEVK (Pro-Glu-Val-Lys) regions of human titin, the prion proteins, or the C-terminal domain (CTD) of RNA polymerase) are discussed in detail in [31]. However, these findings on specific instances are hard to generalize. A general property observed is that a higher level of repeat perfection correlates positively with the disordered state of protein sub-chains [21].
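As an illustration of what low compositional complexity means in practice, the sketch below computes the Shannon entropy of the amino acid composition in sliding windows. The window length of 12 and the two toy sequences are assumptions made for this example; they are not taken from the cited works and are not real protein sequences.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def window_entropies(sequence, window=12):
    """Entropy of every full-length window along the sequence."""
    return [
        shannon_entropy(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]


# Toy sequences (hypothetical): a perfect four-residue repeat
# and a segment in which every residue is different.
repetitive = "PEVK" * 5
diverse = "ACDEFGHIKLMNPQRSTVWY"

repeat_profile = window_entropies(repetitive)
diverse_profile = window_entropies(diverse)
```

Every window of the perfect repeat contains only four residue types in equal proportion, giving 2 bits, while every window of the diverse segment contains 12 distinct residues, giving log2(12), about 3.58 bits. Lower values mark the compositionally simple, repeat-like regions discussed above.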

S1 proteins, which have a low degree of conservation (imperfect repeats) [6] together with the low degree of disorder found within and between the domains, demonstrate the unique structural organization of proteins of this family. Apparently, this organization is closer to the formation of the quaternary structure of globular proteins, with the same structural organization of individual structural domains.


**Table 1.** Intrinsic flexibility and disorder of S1 protein family and its structural domains. The largest and smallest values are highlighted in bold.
