*2.3. RNA Binding of MLL43500–3630 and MLL44210–4280*

Microscale thermophoresis (MST) measurements were performed to characterize the RNA binding of the expressed protein regions. We used two lncRNA constructs, HOTAIR440 (a segment of HOTAIR that contains the region involved in binding to EZH2 [31]) and MEG3 (a lncRNA involved in leukemias [32]), as well as a 50 nt long RNA with a random nucleotide sequence. Contrary to the lack of predicted binding sites, MLL44210–4280 showed relatively strong binding to HOTAIR440 with an apparent Kd of 13.05 μM (Figure 2A), while the negative control thymosin beta 4 (Tβ4) did not bind to the RNA, showing signs of interaction only at the highest concentrations applied.

**Figure 2.** RNA binding detected by microscale thermophoresis. MST binding curves of MLL43500–3630 (green), MLL44210–4280 (red) and thymosin beta 4 (blue) to different RNAs: HOTAIR440 (**A**), MEG3 (**B**) and 50 nt RNA (**C**).

In the case of MLL43500–3630, saturation of the reaction could not be reached because of marked aggregation above a 1:20 RNA:protein ratio (Supplementary Figure S2), but using the T-jump values of the MST measurement (Supplementary Figure S3), an approximate binding constant of 0.1 μM could be determined. The appearance of large particles in the solution, generally considered to be aggregates, is indicated by a "wavy" MST curve and a randomly fluctuating normalized fluorescence percentage, as shown in Supplementary Figures S2 and S5. The observed aggregation was dependent on the RNA species, since it was not seen with either of the other tested RNAs (Figure 3B,C) or with a shorter, 300 nt long HOTAIR construct (Supplementary Figure S4). The HOTAIR300 construct overlaps with the 3′-terminal 300 nucleotides of HOTAIR440 but lacks the first 140 nucleotides of the latter. This shorter HOTAIR construct bound to MLL43500–3630 with a Kd of 0.97 μM, with no sign of irregular behavior. Centrifugation (15 min at 13,000× *g*) of the samples resulted in the loss of fluorescent signal in a protein concentration-dependent manner (Supplementary Figure S5), indicating the formation of structures containing both RNA and protein. Such a phenomenon was not observed with MLL44210–4280 or Tβ4 upon mixing them with HOTAIR440, even at significantly higher protein concentrations than those used for MLL43500–3630. Also, MLL43500–3630 did not show aggregation-prone behavior in the absence of RNA.

As we experienced no anomaly in the behavior of MLL43500–3630 when titrated to MEG3, determination of a binding constant was straightforward for this interaction. As shown in Figure 2B, affinity to MEG3 of this region of MLL4 was higher than that of MLL44210–4280. The Kd of MLL43500–3630 binding to MEG3 was calculated to be 0.722 μM, while Kd calculation for MLL44210–4280 was not reliable since saturation of the reaction could not be reached throughout the protein concentration range tested. Tβ4 did not show significant affinity to MEG3, resulting in a failure of binding curve fitting.
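The dependence of a reliable Kd on reaching saturation can be illustrated with a minimal sketch. It assumes a simple 1:1 binding model derived from the law of mass action, which is commonly used for MST data but is not confirmed here as the fitting procedure actually applied. The labeled RNA concentration and the top protein concentration are hypothetical placeholders, as is the 200 μM "weak binder" value; only the 0.722 μM and 13.05 μM Kd values come from the text.

```python
import math


def fraction_bound(protein_total, rna_total, kd):
    """Fraction of RNA in complex for 1:1 binding.

    All three arguments must share the same concentration unit (e.g. uM).
    Solves the quadratic that follows from the law of mass action.
    """
    s = protein_total + rna_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * protein_total * rna_total)) / 2.0
    return complex_conc / rna_total


# Hypothetical assay settings, NOT taken from the paper.
RNA_TOTAL_UM = 0.05     # constant concentration of labeled RNA
TOP_PROTEIN_UM = 50.0   # highest protein concentration in the titration

cases = [
    ("Kd = 0.722 uM (value reported for MEG3)", 0.722),
    ("Kd = 13.05 uM (value reported for HOTAIR440)", 13.05),
    ("Kd = 200 uM (hypothetical weak binder)", 200.0),
]

for label, kd in cases:
    occupancy = fraction_bound(TOP_PROTEIN_UM, RNA_TOTAL_UM, kd)
    print(f"{label}: fraction bound at top concentration = {occupancy:.2f}")
```

Under these assumed settings, a sub-micromolar Kd brings the curve close to its upper plateau within the titration range, whereas a weak binder stays far below it. In the latter case the plateau, and therefore the Kd, is poorly constrained by the data.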

To check for any specificity of binding that the expressed MLL4 regions may possess, we also tested a physiologically irrelevant 50 nt RNA construct. Binding curves presented in Figure 2C indicate that both MLL43500–3630 and MLL44210–4280 are capable of binding to this RNA species, but with a markedly lower affinity than to the lncRNA constructs, while Tβ4 could not bind to it at all. The extended shape of the binding curve and the absence of saturation in the case of both MLL4 constructs indicate weak binding, which prevented reliable determination of the binding constants. Nevertheless, MLL43500–3630 still displayed a stronger affinity towards the RNA than MLL44210–4280.

Electrophoretic Mobility Shift Assay (EMSA) experiments confirmed the findings of the MST measurements (Figure 3), as both MLL4 regions caused a significant change in RNA mobility in the case of the HOTAIR440 and MEG3 RNAs (Figure 3A,B). This shift was drastically less pronounced with the 50 nt RNA sample (Figure 3C), resulting only in a minor weakening of the RNA signal in the lane with the highest protein concentration. This observation corresponds to the outcome of the MST experiments, underlining the existence of a certain level of specificity in the RNA recognition by these two MLL4 regions. The negative control Tβ4 failed to cause any visible change in the RNA mobility, indicating a lack of interaction with any of the tested RNAs. Competitive RNA binding (Figure 3, compare the 3rd and 5th lanes) demonstrated that the observed shift in mobility was indeed a result of RNA-protein interaction, since the shift could be prevented, at least to some extent, by adding excess unlabeled RNA to the reaction mixtures.

The anomalous behavior of the MLL43500–3630:HOTAIR440 interaction observed in MST was seen in the EMSA experiments as well, since at high protein:RNA ratios the samples became highly viscous and remained entirely in the wells during the electrophoretic run. Successful experiments could only be carried out by lowering the applied protein concentration, but the interaction was clearly observable even under these circumstances.

In all of the tested interactions, MLL43500–3630, which contains a predicted RNA binding region, showed higher affinities to RNAs than the other MLL4 segment, supporting the validity of the prediction. On the other hand, binding of MLL44210–4280 could also be detected in all cases, raising the possibility that RNA binding sequences exist that differ from the already described interaction motifs. EZH2, a known RNA binding HKMT, also interacts with RNAs through a region [17] that has no recognizable RNA binding sequence, emphasizing our incomplete knowledge of the sequence determinants of protein-RNA interactions.

**Figure 3.** Electrophoretic Mobility Shift Assay. Interaction of MLL43500–3630, MLL44210–4280 and Tβ4 with HOTAIR440 (**A**), MEG3 (**B**) and 50 nt RNA (**C**). For easier understanding, the coloring scheme of Figure 2 is followed (MLL43500–3630: green, MLL44210–4280: red, Tβ4: blue). Free RNA is indicated by arrows.

### **3. Discussion**

Histone methylation is one of the most studied and best-characterized histone modifications that drive the regulation of complete genetic programs in cells. However, many details of the regulation and targeting of the enzyme complexes mediating histone methylation remain elusive and a subject of debate [23]. One possible regulatory pathway is represented by the ability of certain HKMT complexes to bind different lncRNAs that serve as a targeting platform, bridging transcription factors and HKMT complexes [20,33] at the promoter regions of target genes. PRC2 is one example where multiple experiments have shown that its binding to different lncRNAs results in different physiological outcomes [34]. lncRNAs are involved in many other processes connected to histone modification, and there are examples in the literature of direct interaction between lncRNAs and histone modifier complexes [4,22]. Experimental evidence supports the direct binding of WDR5, a canonical MLL complex subunit, to different lncRNAs in cells [22], indicating the involvement of lncRNAs in the regulation of MLL complexes. By analogy with PRC2, where multiple subunits have been shown to be involved in lncRNA binding (Figure 4A) [15], we hypothesized that MLL proteins might also interact with lncRNAs. This hypothesis was supported by our earlier bioinformatics studies, which suggested the existence of several interaction sites in the so far uncharacterized, mostly disordered regions of HKMTs [26], and by our prediction presented here that the disordered segments of MLL proteins contain several putative RNA binding sequences. We chose to test the RNA binding capability of one such region of MLL4 that also contains a polyQ stretch and is affected by mutations in different cancers. As an internal control, we also tested a different region of MLL4 that contains no such predicted RNA interaction site.

Our expectation was that the isolated small regions of the MLL4 protein would bind RNAs in a nonspecific manner, as was observed for the isolated PRC2 complex components [34]. Surprisingly, we found that MLL44210–4280 bound MEG3 more strongly than HOTAIR440 or the 50 nt random RNA, even though the determination of exact Kd values was not successful in all cases.

More interesting was the MLL43500–3630 region, which behaved dramatically differently with the different RNAs. Binding to MEG3 gave a Kd of 0.722 μM, while binding to the 50 nt random RNA proved to be so weak that a Kd calculation was not successful. Binding to HOTAIR440 seemed to be the strongest, with an apparent Kd of 0.1 μM, but it led to the aggregation of the protein-RNA complex. The aggregation was dependent on the protein-RNA ratio and could be detected over a wide protein concentration range. The same aggregation could not be observed with a shorter HOTAIR construct that consisted of 300 bases (Supplementary Figure S3). The fact that we could not induce such aggregation by the addition of MEG3, which is much longer than HOTAIR440, points to specific recognition rather than a side-effect of RNA length. We also observed the aggregation at low protein concentrations, but only in the presence of an appropriate amount of HOTAIR440, indicating that the process is not driven by the protein itself and is not an artifact of sample preparation.

It has recently been revealed that many proteins can undergo liquid-liquid phase separation when interacting with RNAs, leading to the formation of membraneless organelles that are of significant importance in cellular processes [35]. Experimental evidence supports the involvement of polyQ regions of proteins in RNA-mediated phase separation [28], sometimes in an RNA secondary structure-dependent manner [36]. Since the MLL43500–3630 sequence contains 22.9% glutamine residues and a continuous run of 15 glutamines (Figure 1A), it is not unfounded to speculate that this specific region plays a role in the observed anomaly, but the fact that it only occurs with one of the tested RNA constructs indicates that the process is coordinated by the RNA itself. One possibility is that the longer HOTAIR construct contains more than one binding site for MLL43500–3630, thus facilitating the formation of higher-order protein-RNA structures. Alternatively, HOTAIR440 may have the ability to form secondary structures not found in HOTAIR300 or MEG3, which would also provide an explanation for the different behavior of the three systems. As MLL4 is the only HKMT that contains long polyglutamine repeat stretches [26], phase separation might be a regulatory step specific for this protein. Therefore, it is certainly promising to investigate this peculiar phenomenon in more detail.
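Composition figures such as the glutamine percentage and the longest uninterrupted glutamine run can be derived directly from a one-letter amino acid sequence. The following is a minimal sketch of that calculation; the function name `glutamine_stats` and the toy sequence are illustrative assumptions, and the toy sequence is made up rather than taken from MLL4.

```python
def glutamine_stats(sequence):
    """Return (percent glutamine, longest consecutive glutamine run).

    `sequence` is a protein sequence in one-letter code; case is ignored.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    percent_q = 100.0 * seq.count("Q") / len(seq)

    longest_run = 0
    current_run = 0
    for residue in seq:
        # Extend the current run on Q, reset it on any other residue.
        current_run = current_run + 1 if residue == "Q" else 0
        longest_run = max(longest_run, current_run)

    return percent_q, longest_run


# Made-up placeholder sequence, NOT the MLL4 3500-3630 region.
toy_sequence = "MSPQQQQQALQKQQQLP"
percent_q, longest_run = glutamine_stats(toy_sequence)
print(f"glutamine content: {percent_q:.1f}%, longest polyQ run: {longest_run}")
```

Applied to the actual sequence of the expressed region, the same calculation would be expected to reproduce the values quoted in the text.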

Since both tested lncRNAs are implicated in different cancers [5,37,38], including leukemias, our finding that MLL4 has the capacity to bind them raises the possibility that lncRNAs play a role in MLL/COMPASS complex targeting and regulation to a larger extent than currently recognized.

Although cellular experiments are necessary to prove the validity of the observed interactions, our findings provide the first insights into the structure and function of two regions of MLL4 that have been uncharacterized so far. We were able to show that these regions are capable of RNA binding and may be involved in the lncRNA-mediated regulation of the MLL4 complexes. Based on our results, we suggest that MLL4 complexes utilize different regions on their surface to bind lncRNAs (Figure 4B), similarly to the way PRC2 subunits take part in lncRNA binding. As it was shown that lncRNA binding to WDR5 increases the dwelling time of the protein on the chromatin surface [22], binding of the same RNA to MLL4 might facilitate and accelerate the assembly of a functional methyltransferase complex. Since lncRNAs are large molecules that can adopt various secondary structures and interact with many different partners simultaneously, it is plausible to speculate that a specific and high-affinity interaction can be achieved by the combination of different binding sites distributed along the large surfaces of multi-subunit complexes. Given the central role of histone modifications in gene regulation, it is essential to understand the mechanisms that regulate this process. Mounting evidence supports the involvement of lncRNAs in the coordination of histone-modifying enzymes, but the exact molecular details of their interactions with proteins are yet to be discovered. Recognizing the importance of the disordered/structurally uncharacterized regions of HKMTs in these interactions might be the first step towards a more complete picture regarding the regulation of histone methylation.

**Figure 4.** lncRNA binding of PRC2 and MLL4/COMPASS complex. Schematic representation of the PRC2 (**A**) and MLL4/COMPASS (**B**) complexes, where the known RNA binding subunits are shown in orange and the suggested lncRNA binding subunit MLL4 is green. Subunits currently not known to be involved in lncRNA binding are blue and the lncRNA is represented by a black line. The suggested lncRNA-MLL4 interaction is indicated by a dashed line.

### **4. Materials and Methods**
