*2.2. Rate of Evolution per Site in RTT and RTT-like Causing Proteins*

We calculated the per-site evolutionary rates of MeCP2, CDKL5, and FOXG1 in chordates to investigate how they relate to structural features and to the distribution of missense point mutations previously suggested to contribute to RTT or RTT-like syndromes. Using the human sequence as a reference, we determined standardized evolutionary rate scores (Z scores), in which values above and below zero reflect evolution at a faster and slower than average rate, respectively (Figure 2 and Supplementary Table S2). Evolutionary rates per site showed similar patterns in all three proteins: low rates of evolution were more commonly observed in domains and ordered regions, with some exceptions such as the transcriptional repression domain (TRD) of MeCP2, part of which showed a higher rate of amino acid substitution. In contrast, non-domain regions, which were also usually disordered (excluding the ordered region surrounding a domain in FOXG1), typically exhibited a higher evolutionary rate, although some regions with low rates of evolution were nonetheless detected (Figure 2). This was corroborated by the distribution of evolutionary rates for residues predicted to be ordered or disordered in the three proteins, with disordered residues showing a wide and overlapping distribution that reflected their conservation. The evolutionary rates of ordered and disordered regions differed significantly in all three proteins (*p* < 2.2e−16 for CDKL5 and FOXG1 and *p* < 6.409e−08 for MeCP2, Mann–Whitney U-test; Figure S1).
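As a minimal illustration of the standardization described above, the following sketch converts raw per-site rates into Z scores by subtracting the mean and dividing by the standard deviation. The function name `standardize_rates`, the choice of the population standard deviation, and the example values are assumptions made for illustration; they are not taken from the analysis pipeline used in this study.

```python
from statistics import mean, pstdev


def standardize_rates(rates):
    """Return Z scores for a list of raw per-site evolutionary rates."""
    mu = mean(rates)
    # Population standard deviation; this choice is an assumption,
    # not something confirmed by the source text.
    sigma = pstdev(rates)
    if sigma == 0:
        raise ValueError("all rates are identical; Z scores are undefined")
    return [(r - mu) / sigma for r in rates]


# Hypothetical raw rates for five sites (illustrative values only).
example_rates = [0.2, 0.4, 1.0, 1.6, 1.8]
z_scores = standardize_rates(example_rates)
```

Under this convention, sites with a Z score below zero evolve more slowly than the protein-wide average and sites above zero evolve faster, which matches the interpretation used for Figure 2.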

**Figure 2.** Rate of evolution per site in human RTT-related proteins. (**A**–**C**) Rates of amino acid substitution in MeCP2 (**A**), CDKL5 (**B**), and FOXG1 (**C**) are shown as blue areas. The bars above the charts indicate the positions of domains in the human sequence, with light blue areas indicating a domain and black lines indicating no domain. Conserved phosphorylation sites, disordered regions, single nucleotide polymorphisms in the general population, and pathogenic missense point mutations are plotted as green, purple, blue, and red lines, respectively. The *x* and *y* axes represent the position along the sequence and the Z score of the evolutionary rate, respectively.

We identified structurally conserved disordered regions, with slowly and rapidly evolving residues reflecting constrained disorder and flexible disorder, respectively [26]. Flexible disorder retains a conserved disordered structure despite the rapid evolution of its residues; amino acid substitutions in such regions are restricted to residues that confer structural flexibility, because a change from structurally disordered to ordered can affect protein function. This type of IDR typically functions as an entropic spring, flexible linker, or spacer without becoming structured and is frequently located outside the domain region [26,35–37]. In contrast, constrained disorder is associated with protein–protein interaction interfaces that adopt a structured conformation or undergo folding upon binding and are thus constrained in terms of sequence, while still requiring flexibility. This module can be present as short linear motifs (SLiMs) or intrinsically disordered domains (IDDs) [26,38]. These regions commonly have secondary structures that may be important for binding, which in turn slows their evolutionary rates [36,39]. IDDs were observed in the MBD, which was predicted to be partly disordered, and in the TRD and NID of MeCP2; this is in accordance with previous reports that structured regions are found only in the MBD, while other regions are extensively disordered [17,18,40]. Most domains with conserved disordered regions are involved in DNA, RNA, and protein binding, as has been demonstrated for these domains of MeCP2 [41]. SLiMs are frequently located outside domains and may contain modification sites. In this study, we predicted that the constrained disorder regions and conserved phosphorylation sites located outside the domains are associated with SLiMs, such as the region spanning from the end of the catalytic domain to the C-terminus of human CDKL5.
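The distinction between constrained and flexible disorder can be sketched as a simple per-residue labeling that combines the disorder prediction with the sign of the Z score. This is only an illustration: the function name `classify_residues`, the label strings, the cutoff at exactly zero, and the example values are all assumptions, and the criteria actually applied in this study (following [26]) may differ.

```python
def classify_residues(z_scores, disordered):
    """Label each residue from its Z score and predicted disorder state.

    z_scores   -- standardized evolutionary rates, one per residue
    disordered -- booleans, True where the residue is predicted disordered
    """
    if len(z_scores) != len(disordered):
        raise ValueError("z_scores and disordered must have the same length")

    labels = []
    for z, is_disordered in zip(z_scores, disordered):
        if not is_disordered:
            labels.append("ordered")
        elif z < 0:
            # Slower than average: treated here as constrained disorder.
            labels.append("constrained_disorder")
        else:
            # Average or faster: treated here as flexible disorder
            # (placing z == 0 in this group is an arbitrary assumption).
            labels.append("flexible_disorder")
    return labels


# Hypothetical values for four residues (illustrative only).
example_z = [-1.2, 0.8, -0.5, 1.5]
example_disorder = [False, True, True, True]
example_labels = classify_residues(example_z, example_disorder)
```

Runs of consecutive residues labeled as constrained disorder outside annotated domains would then be candidates for the SLiM-like regions discussed above, although a real analysis would also need a minimum run length and supporting evidence such as conserved phosphorylation sites.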
