**3. Discussion**

RTT is a progressive postnatal neurodevelopmental disorder. Three genes, *MECP2*, *CDKL5*, and *FOXG1*, were previously thought to cause its variants, with altered *MECP2* as the major contributor. Later, it was suggested that RTT is a monogenic disorder caused by either null mutations or mutations that alter the MBD or NID functions of *MECP2* [15,16,47]. The MBD and NID facilitate the binding of MeCP2 to modified cytosine in chromatin and the recruitment of the NCOR-SMRT complex, respectively; their combination is vital for MeCP2's role as a repressor [48,49]. The disorders caused by alterations in the other two genes, previously characterized as variants of RTT, were designated as distinct disorders with several symptoms that overlap with those of RTT. The three proteins have similarly extensive disordered regions and play important roles in the brain. Structural disorder is itself a distinctive protein property that may contribute to interactions with diverse binding partners and to the versatility of a protein. Although alterations in the three proteins may produce similar symptoms, investigations of their similarity at the molecular level remain scarce, particularly regarding the properties of their disordered structures and their binding partners. Focusing on RTT, we investigated the evolution of their disordered structures through prediction and that of their binding partners through phylogenetic profiling. This approach gives insight into the structural and evolutionary similarity of the biological systems of these proteins, which may provide useful information for the development of an RTT therapy strategy. RTT itself has attracted considerable attention because its causative protein displays features related to epigenetics and has been shown to have a partially or fully disordered structure.

All three proteins have been experimentally shown to be abundant and to play roles in the brain, especially the MeCP2\_e1 and hCDKL5\_1 isoforms [50,51]. This is confirmed by the emergence of neurological impairments when the availability or form of any of these proteins is altered. Through evolutionary analysis and IDR properties, we provide an additional perspective on this feature. Phylogenetic profiling of MeCP2, CDKL5, and FOXG1 and their interacting proteins showed that 240 molecules formed four clusters: chordates, metazoans, multicellular, and eukaryotes. Among the three, only *FOXG1* was a member of Class 2, which comprises genes acquired during metazoan evolution, whereas the acquisition of *MECP2* and *CDKL5* was correlated with chordate evolution. The acquisition of *CDKL5* and *MECP2* may thus have contributed to the development of the chordate brain, and that of *FOXG1* to the development of the metazoan nervous system. Additionally, order–disorder structure predictions revealed that all three proteins had order–disorder structures that were relatively conserved across chordates. The phosphorylation sites of human MeCP2, CDKL5, and FOXG1 were also relatively conserved among chordates. IDRs provide proteins with more interaction areas and PTM sites, spatiotemporal structural heterogeneity, and the ability to associate with and dissociate from binding partners easily. Hence, proteins with long IDRs are likely to be able to bind many different partners. Accordingly, all three proteins were shown to have multiple binding partners; FOXG1 and MeCP2 had the highest numbers of partners, some of which were acquired before metazoans evolved. By cooperating with various protein partners, particularly the co-repressor complex, FOXG1 or MeCP2 can modulate the expression and suppression of different genes [15,52].
The co-repressor complex itself represents a conserved mechanism that takes diverse forms and may act as several functional entities depending on the context in which it is recruited [53]. This indicates the need to regulate the concentration of FOXG1 or MeCP2 precisely; otherwise, altered availability is likely to be deleterious. Several studies have shown that either overexpression or underexpression of MeCP2 and FOXG1 is associated with neurological deficits; this phenomenon may not be independent of their co-repressor complexes, which have been shown to play roles in neurogenesis for FOXG1 and in neuron maturation for MeCP2 [7,15,52]. In contrast, CDKL5 binds fewer proteins, which function in regulating cell adhesion, ciliogenesis, and cell proliferation. We hypothesize that the number of CDKL5 binding partners is underestimated, since this protein was predicted to have relatively long disordered regions with many constrained disorder features and phosphorylation sites; it has also accumulated fewer insertions and deletions than either MeCP2 or FOXG1 during evolution.
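
The assignment of molecules to acquisition clusters discussed above can be illustrated with a minimal sketch. It assumes a simplified rule in which each gene is assigned to the most ancient taxonomic group in which a homolog is detected; this rule is a stand-in for illustration and is not necessarily the clustering procedure used in this study. The group names, gene names (`gene_A` to `gene_D`), and presence profiles are all hypothetical.

```python
# Toy illustration only: group names, gene names, and profiles are hypothetical.
# Taxonomic groups ordered from the narrowest lineage to the most ancient one.
GROUPS = [
    "chordate",
    "other_metazoan",
    "other_multicellular",
    "unicellular_eukaryote",
]

# Cluster label used when a group is the most ancient one containing a homolog.
CLUSTER_FOR_GROUP = {
    "chordate": "chordates",
    "other_metazoan": "metazoans",
    "other_multicellular": "multicellular",
    "unicellular_eukaryote": "eukaryotes",
}


def acquisition_cluster(detected_in):
    """Return the cluster of the most ancient group in which a homolog is detected."""
    for group in reversed(GROUPS):
        if group in detected_in:
            return CLUSTER_FOR_GROUP[group]
    return None  # no homolog detected in any group


# Hypothetical presence profiles: the set of groups in which a homolog was found.
toy_profiles = {
    "gene_A": {"chordate"},
    "gene_B": {"chordate", "other_metazoan"},
    "gene_C": {"chordate", "other_metazoan", "other_multicellular"},
    "gene_D": {"chordate", "other_metazoan", "unicellular_eukaryote"},
}

clusters = {gene: acquisition_cluster(p) for gene, p in toy_profiles.items()}
for gene, cluster in clusters.items():
    print(gene, cluster)
```

Under this rule, a gene with homologs only in chordates falls in the chordate cluster, whereas a gene that also has a homolog in a non-chordate metazoan falls in the metazoan cluster, which mirrors the distinction drawn above between *MECP2*/*CDKL5* and *FOXG1*.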

FOXG1 is a transcription factor that plays an essential role in ventral telencephalon development; it serves as a hallmark of the telencephalon in vertebrates [52,54]. Among the 237 Class 1 or 2 genes, 233 were detected in the cerebral cortex, with nine expressed at a high level (Figure S2). Seven genes were acquired during metazoan evolution, of which four encode MeCP2-interacting molecules and three encode FOXG1-interacting molecules. Since FOXG1 was also acquired during metazoan evolution, the acquisition of FOXG1, SATB2, and SALL1 may have played an essential role in the development of the neocortex. FOXG1 is transiently expressed in neuronal progenitor cells and regulates their migration to the cortical plate [55]. During this process, FOXG1 expression is upregulated, which contributes to cortical plate development [56]. Similarly, the FOXG1-interacting chromatin remodeling factor SATB2 is expressed in the cortical plate and regulates neocortical development [54,55]. Therefore, it is conceivable that transcriptional cooperation between FOXG1 and SATB2 mediates the laminarization of the neocortex. In support of this possibility, patients with SATB2 mutations exhibit an RTT-like phenotype [57,58]. No direct interplay between MeCP2 and FOXG1 has been reported. The regions of these proteins whose alteration leads to RTT or an RTT-like disorder have similar functions in regulating the expression of other genes, but they likely act via distinct pathways. We therefore suggest that FOXG1 is not a potential target for developing a treatment for RTT.
However, induced pluripotent stem cell (iPSC)-derived neurons generated from FOXG1+/− patients and from patients with MECP2 and CDKL5 mutations reportedly exhibited a similar increase in the expression of the synaptic cell adhesion protein orphan glutamate receptor δ-1 subunit (GluD1). This result might be implicated in the clinical symptom overlap among FOXG1-, CDKL5-, and MECP2-related syndromes and indicates the need for further study to reveal the mechanism of each protein [52,59,60].

CDKL5 belongs to the same molecular pathway as MeCP2. MeCP2 was acquired during chordate evolution; a prerequisite for this step was the acquisition of MeCP2-interacting molecules such as ZNF483, SOX2, HIPK2, and HIST2H2A. The MeCP2 kinase HIPK2 was shown to be required for the induction of apoptotic cell death in neuronal and other cell types via phosphorylation of the MeCP2 N-terminus [61]. Given that CDKL5, another MeCP2 kinase, was also acquired during chordate evolution, it is possible that HIPK2 and CDKL5 cooperate to activate MeCP2 during neocortical development. Since apoptotic cell death is increased in the *Cdkl5* knockout mouse brain, CDKL5 probably suppresses apoptosis, in contrast to HIPK2 [62]. Therefore, how these kinases divide their functions in phosphorylating MeCP2 is an important issue. Indeed, the CDKL5-interacting domain was shown to be associated with the C-terminus of MeCP2 [63]. Hence, CDKL5 may phosphorylate the carboxy terminus, and HIPK2 and CDKL5 may activate MeCP2 by phosphorylating different regions of the protein. It has been suggested that MeCP2 also suppresses CDKL5 transcription and that CDKL5 overexpression may also contribute to typical RTT symptoms [64]. Hence, targeting the catalytic domain of CDKL5 may be essential for developing alternative strategies to treat classical RTT, since its impairment alone resulted in some symptoms that overlapped with those of classical RTT. Additionally, the CDKL5 disordered region, which extends from the end of the catalytic domain to the C-terminus, is suggested to contain many SLiMs. These linear motifs theoretically help to determine the various fates of a protein, including subcellular localization, stability, and degradation; they can also promote the recruitment of binding factors and facilitate post-translational modifications [26,38].
Since these motifs typically mediate low-affinity interactions, they can bind molecules of different structures with similar affinity and facilitate transient binding, which are favorable properties for drug targets. Accordingly, this region appears to be a potential target for classical RTT treatment. However, any such approach should also take into account the expression levels of CDKL5, which are highly modulated spatiotemporally [64,65].

IDRs have unique properties that challenge the traditional protein structure paradigm. They differ from ordered regions in residue composition, intramolecular contacts, and function, which leads to different evolutionary rates. Generally, they evolve more rapidly than ordered regions, owing to differences in the accepted point mutations. However, some disordered regions can be highly constrained because they may play crucial roles and have multiple functions; assessing the evolutionary rate of IDRs may thus reveal amino acids that are crucial to a specific protein in the biological system [66]. In this study, we found a distinctive relationship between the evolutionary rates of disordered regions and the symptoms of the disease caused by FOXG1. The N-terminal residues of FOXG1 are highly variable and constrained to be disordered, while the residues from the FBD to the C-terminus are constrained and contain an ordered structure. It has been reported that mutations in the N-terminal region are more likely to be associated with severe phenotypes, whereas mutations in the C-terminal region are associated with milder phenotypes [52]. We found that a phosphorylation site at Ser 19, both reported and predicted, is conserved in chordates even though it lies within flexible disordered regions; casein kinase 1 (CK1) modifies this site and promotes the nuclear import of FOXG1, which corresponds to neurogenesis in the forebrain [67]. This suggests that a flexible disordered region can retain a functional phosphorylation module despite harboring numerous insertions and deletions, and that severe phenotypes may result from the altered function of Ser 19 of FOXG1.

Among the 236 RTT-related genes expressed in the male testis, 47 were expressed at a high level. Because paternally derived de novo mutations have been shown to affect X-linked MeCP2-related Rett syndrome in females [6,68], mutations in these paternally expressed genes may affect sperm-derived genetic and/or epigenetic inheritance and thereby influence the cause of Rett syndrome in a daughter. Further studies are required to analyze these possibilities.

It is important to remember that the structural order–disorder features and phosphorylation sites in this study were inferred using linear sequence predictors, and that the sequences and mutation points were retrieved from databases whose data were collected from studies using various methods. It should also be noted that we used canonical isoforms instead of the predominant brain isoforms; this choice is workable computationally but should be considered carefully in experimental work. This study provides suggestive or hypothetical conclusions; further experimental study is therefore needed to verify its findings. Nevertheless, the results can serve as a basis for further investigation.
