**4. Materials and Methods**

#### *4.1. Sequence Retrieval, Alignment, and Phylogenetic Analysis of MeCP2, CDKL5, and FOXG1 Proteins*

Orthologous sequences of the human proteins that cause RTT and RTT-like disorders (MeCP2, CDKL5, and FOXG1) were retrieved for chordate species from the Kyoto Encyclopedia of Genes and Genomes (KEGG) Sequence Similarity Database (SSDB; https://www.kegg.jp/kegg/ssdb/), using a Smith–Waterman similarity score threshold of 100 and the bidirectional best hits (best–best hits) option [69]. We primarily used the canonical isoforms MeCP2\_e2 and hCDKL\_5 rather than the predominant isoforms in the human brain, MeCP2\_e1 and hCDKL5\_1. MeCP2\_e2 is better characterized than MeCP2\_e1, and RettBASE reports variants using MeCP2\_e2 nomenclature for historical reasons. Variants specific to MeCP2\_e1 are still reported in RettBASE with the prefix MeCP2\_e1. However, we excluded them from our analysis because only one such variant met our criteria, and it could not be mapped onto the MeCP2\_e2 sequence because the two isoforms differ in their *N*-terminal region; this variant is nevertheless reported in our Supplementary Data. CDKL5 presents a similar case, but the sequence differences between hCDKL\_5 and hCDKL5\_1 lie in the *C*-terminal region (residues 905–1030), so the positions of the reported RTT-like variants in the catalytic domain are unaffected. We chose these isoforms because the RTT and RTT-like variants were primarily collected from RettBASE, and they do not differ substantially from the predominant brain isoforms. For each protein, only the hit with the highest similarity score in each species was retained to minimize redundancy.

A dataset was created for each protein and aligned using MAFFT v.7 (https://mafft.cbrc.jp/alignment/software/) with the iterative refinement method (FFT-NS-i) and a maximum of 1000 iterations [70]. Phylogenetic trees were constructed by maximum likelihood using RAxML-HPC2 BlackBox with the RAxML automatic bootstrapping option, under the Jones–Taylor–Thornton (JTT) amino acid substitution model with empirical amino acid frequencies (+F) and gamma-distributed rate heterogeneity (+G) for MeCP2 and CDKL5 (JTT + F + G), and the JTT + G model for FOXG1; these were selected as the best-fitting models under the Bayesian information criterion (BIC) by ModelTest-NG [71,72]. The outgroup for each tree was selected using the NCBI Taxonomy Common Tree, based on the common ancestor of the taxa within each dataset [73]. Phylogenetic tree reconstruction and model selection were performed on the CIPRES Science Gateway (http://www.phylo.org/) [74].
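
The ortholog retrieval criteria can be illustrated with a short Python sketch. It assumes that the KEGG SSDB results for one human query protein have already been exported into a list of hits, each carrying a species code, a gene identifier, its Smith–Waterman score, and a flag for whether it is a bidirectional best hit. The `Hit` record, the `select_orthologs` function, and all example values are hypothetical, and treating the threshold of 100 as inclusive (score ≥ 100) is an assumption. The sketch keeps best–best hits at or above the threshold and then retains only the highest-scoring hit per species.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One hypothetical SSDB hit for a human query protein."""
    species: str     # organism code (placeholder values below)
    gene_id: str     # gene identifier in that organism
    sw_score: int    # Smith–Waterman similarity score
    best_best: bool  # True if the pair is a bidirectional best hit


def select_orthologs(hits: list[Hit], min_score: int = 100) -> dict[str, Hit]:
    """Keep bidirectional best hits scoring >= min_score (assumed inclusive),
    then retain only the highest-scoring hit per species."""
    best: dict[str, Hit] = {}
    for hit in hits:
        if not hit.best_best or hit.sw_score < min_score:
            continue  # fails the retrieval criteria
        current = best.get(hit.species)
        if current is None or hit.sw_score > current.sw_score:
            best[hit.species] = hit  # a higher score replaces the previous hit
    return best


# Toy input with placeholder species and gene names (not real KEGG entries).
hits = [
    Hit("species_A", "A1", 3000, True),
    Hit("species_A", "A2", 150, False),  # not a best-best hit
    Hit("species_B", "B1", 80, True),    # below the score threshold
    Hit("species_C", "C1", 1200, True),
    Hit("species_C", "C2", 1500, True),  # highest score for species_C
]

selected = select_orthologs(hits)
for species, hit in sorted(selected.items()):
    print(species, hit.gene_id, hit.sw_score)
```

In this toy example, species\_B is dropped because its only hit falls below the threshold, and species\_C keeps C2 because it has the higher of its two qualifying scores.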

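Model choice under BIC can be sketched in a similar way. The snippet below computes BIC = k ln(n) − 2 ln L for each candidate model, where L is the maximized likelihood, k the number of free parameters, and n the number of alignment sites, and then picks the model with the lowest value. The log-likelihoods, the alignment length, and the parameter counts (2T − 3 branch lengths for T = 20 taxa in an unrooted tree, plus one gamma shape parameter for +G and 19 free amino acid frequencies for +F) are hypothetical and are not results from this study; ModelTest-NG's own parameter counting and sample-size definition may differ.

```python
from __future__ import annotations

import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(candidates: dict[str, tuple[float, int]], n_sites: int) -> str:
    """Return the candidate model with the smallest BIC.

    `candidates` maps a model name to (log-likelihood, number of free parameters).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_sites))


# Hypothetical example: 20 taxa -> 2 * 20 - 3 = 37 branch lengths (unrooted tree).
N_TAXA = 20
N_BRANCHES = 2 * N_TAXA - 3
N_SITES = 500  # hypothetical alignment length

candidates = {
    "JTT": (-10500.0, N_BRANCHES),               # no extra model parameters
    "JTT+G": (-10200.0, N_BRANCHES + 1),         # + gamma shape parameter
    "JTT+F+G": (-10150.0, N_BRANCHES + 1 + 19),  # + 19 free amino acid frequencies
}

for name, (lnl, k) in candidates.items():
    print(f"{name:8s} BIC = {bic(lnl, k, N_SITES):.2f}")

selected_model = best_model(candidates, N_SITES)
print("Selected:", selected_model)
```

With these made-up values, JTT + F + G has the highest likelihood, but its 19 extra frequency parameters cost more under the BIC penalty than the likelihood gain, so JTT + G is selected. A larger likelihood improvement from +F would reverse that choice.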