*2.1. In Silico Analysis of the RNA Binding Capacity of MLL Proteins*

As a first step, we mapped the predicted RNA binding motifs onto the sequences of four MLL proteins. We used DisoRDPbind, an RNA interaction prediction tool specifically designed to find RNA interaction sites in the disordered regions of proteins. The results (Table 1) indicate that all four MLL proteins contain several putative RNA interaction motifs in their disordered regions. These motifs are found at various positions in the proteins and range in length from a few amino acids to almost a hundred residues, suggesting that RNA binding might be a common feature of MLL proteins.



A comparison with our earlier studies [26] revealed that two conserved disordered binding sites (residues 3537–3545 and 3560–3567) reside within one of the predicted RNA binding regions of MLL4 (residues 3526–3581, Figure 1A), supporting the reliability of the predictions. This region also harbors several cancer-related point mutations, two of which, at positions 3560 (D-N) and 3561 (A-D), fall within a predicted binding site. Taken together, this evidence points to the physiological importance of this protein region and makes its structural and functional study worthwhile. ANCHOR prediction [27] shows that, close to the C-terminal border of the predicted RNA binding region, there is a segment with a strong tendency to form protein-protein interactions (residues 3597–3613, Figure 1A) that corresponds to a run of 14 glutamine residues. Since polyQ repeats in RNA binding proteins have been linked to protein-RNA droplet formation [28], this raises the intriguing possibility that this segment can promote granule formation. Therefore, we chose to test the RNA binding capacity of the MLL4 region between residues 3500 and 3630 (Figure 1A). As an internal control, we selected another disordered region of MLL4, between residues 4210 and 4280, that has no predicted RNA or protein binding sites (Figure 1D).
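The residue ranges quoted above can be cross-checked with a few lines of interval arithmetic. The following is only an illustrative sketch: the `Region` helper and all variable names are hypothetical, and the only inputs are the coordinates stated in the text (1-based, inclusive MLL4 residue numbering is assumed).

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A residue range; assumed 1-based and inclusive at both ends."""
    start: int
    end: int

    def length(self) -> int:
        return self.end - self.start + 1

    def contains(self, other: "Region") -> bool:
        return self.start <= other.start and other.end <= self.end


# Coordinates quoted in the text (MLL4 residue numbering).
construct = Region(3500, 3630)        # tested MLL4 fragment
rna_binding = Region(3526, 3581)      # predicted RNA binding region
conserved_sites = [Region(3537, 3545), Region(3560, 3567)]
mutations = [Region(3560, 3560), Region(3561, 3561)]  # D-N and A-D
anchor_segment = Region(3597, 3613)   # ANCHOR-predicted interaction segment
control = Region(4210, 4280)          # control fragment

# Both conserved binding sites lie inside the predicted RNA binding region.
sites_in_rna_region = all(rna_binding.contains(s) for s in conserved_sites)

# Each of the two point mutations falls inside one of the conserved sites.
mutations_in_site = all(
    any(site.contains(m) for site in conserved_sites) for m in mutations
)

# The chosen construct spans both the RNA binding region and the ANCHOR segment.
construct_covers_both = (
    construct.contains(rna_binding) and construct.contains(anchor_segment)
)

print(sites_in_rna_region, mutations_in_site, construct_covers_both)
print(construct.length(), control.length())
```

Under these assumptions, the tested fragment is 131 residues long and the control fragment 71 residues long.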

**Figure 1.** Structural characterization of the MLL4 regions. Sequences of MLL4<sub>3500–3630</sub> (**A**) and MLL4<sub>4210–4280</sub> (**D**). The predicted RNA binding region is indicated by red letters and the polyQ stretch is framed in red. IUPred (blue) and ANCHOR (green) predictions for MLL4<sub>3500–3630</sub> (**B**) and MLL4<sub>4210–4280</sub> (**E**). Residues with an IUPred score above 0.5 are considered to be disordered, while residues with an ANCHOR score above 0.5 constitute predicted binding sites. Far-UV CD spectra of MLL4<sub>3500–3630</sub> (**C**) and MLL4<sub>4210–4280</sub> (**F**). Inset: temperature-dependent changes in the structure of MLL4<sub>3500–3630</sub>, as observed by monitoring the CD signal at 220 nm.

As RNA binding partners, we opted to test two different lncRNA constructs, both of which have been reported to play a role in leukemias. The first is HOTAIR, which has the ability to bind EZH2 (PRC2). The 5' 300 nucleotides of HOTAIR are thought to mediate its binding to PRC2 complex subunits, but the latest annotation in the NCBI database contains an additional 140 bases at the beginning of the HOTAIR sequence compared to the one reported earlier. Therefore, we prepared the longer version (HOTAIR440), which encompasses the 300 nucleotides already known to be involved in protein-RNA interactions as well as the nucleotides that have not been studied yet. The second is MEG3. Since there is no information available about the region of MEG3 that is able to bind proteins, we used the full-length MEG3 for our experiments.
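The length of the HOTAIR440 construct follows from simple coordinate arithmetic, sketched below. This is an illustration only: the names are hypothetical, and it assumes that the 140 additional bases are all located upstream of the previously reported 5' end and that the remaining sequence is unchanged (1-based, inclusive positions).

```python
# Values stated in the text.
EXTRA_5PRIME_BASES = 140   # bases added at the start in the newer annotation
KNOWN_REGION_LENGTH = 300  # 5' region previously implicated in PRC2 binding


def old_to_new(position: int) -> int:
    """Map a 1-based position in the earlier HOTAIR sequence to the newer one.

    Assumes a pure 5' extension with no other changes to the sequence.
    """
    return position + EXTRA_5PRIME_BASES


# Total length of the construct covering both parts.
construct_length = EXTRA_5PRIME_BASES + KNOWN_REGION_LENGTH

# Where the previously studied region sits within the longer construct.
known_region_in_construct = (old_to_new(1), old_to_new(KNOWN_REGION_LENGTH))

# The not-yet-studied part occupies the positions before it.
unstudied_region_in_construct = (1, EXTRA_5PRIME_BASES)

print(construct_length, known_region_in_construct, unstudied_region_in_construct)
```

Under that assumption, the previously characterized 300 nucleotides occupy positions 141–440 of the construct, and positions 1–140 correspond to the newly annotated bases.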
