#### *2.9. Secondary Structure of Nucleotides Interacting with DOT Residues*

We further classified the nucleotides by their secondary-structure location, their contacts with DOT residues, and the amino acid preferences in the protein; the results are presented in Table 4 and Table S3. Among all secondary structures formed by nucleotides, unpaired bases are the most likely to bind DOT residues. Specifically, A and U prefer to interact with DOT residues when they lie in unpaired regions, whereas C and G interact with DOT residues with similar preference in unpaired and base-paired positions. G and C also interact with DOT residues in pseudoknots, whereas A and U are the least likely to be in pseudoknot form when bound to DOT regions.



Percentages are given in parentheses. Relative contacts in DOT regions are calculated as *Nidt*/*Nprot* × 100.

#### *2.10. Interaction Energy of DOT Residues with Nucleotides*

We computed the interaction energy between amino acids and nucleotides at the binding interface in DOT and ordered regions; the results are presented in Table 5. Most amino acids interact more strongly with nucleotides in ordered regions than in DOT regions. However, some amino acid–nucleotide pairs have more favourable energy when interacting in DOT regions: for example, Arg, His, Ile, Leu, Val, and Phe with G; His, Ser, and Val with C; and Asn, Asp, Gly, Ile, Leu, and Ser with U. In addition, the hydrophobic residues Ile, Leu, and Val have more favourable interactions with G in DOT regions than with the other nucleotides. Since Arg and Lys are important for protein–RNA complex formation through electrostatic interactions, these residues have stronger energies in ordered regions than in DOT regions. On the other hand, His in DOT regions has favourable energy with G and C. These differences in energy could be important for understanding the interactions between DOT regions and the RNA molecule, and might also be used to distinguish the RNA-binding residues of proteins in DOT regions from those in other regions.


**Table 5.** Interaction energy between amino acids and nucleotides in DOT regions.

Interaction energies for non-DOT residues are given in parentheses. Amino acid–nucleotide pairs with favourable interaction energies in DOT regions are shown in bold.

We compared the interaction energies of amino acid–nucleotide pairs at the interface in DOT and other regions; two typical examples are shown in Figure 8. We noticed a wide range of interactions at the interface, including stacking, cation-π, electrostatic, and van der Waals interactions. In DOT regions, the most favourable energies are observed for Asn with U (−3.26 kcal/mol) and His with G (−5.44 kcal/mol) (Figure 8a). On the other hand, Arg and Phe have favourable energies with A (−8.49 kcal/mol) and C (−4.88 kcal/mol), respectively, in non-DOT regions.

**Figure 8.** Amino acids showing (**a**) strong interaction in DOT and weak interaction in non-DOT regions and (**b**) weak interaction in DOT and strong interaction in non-DOT regions.

### **3. Materials and Methods**

We adopted the following protocol to obtain a set of protein–RNA complexes with disorder-to-order transition (DOT) regions: (i) we downloaded the protein–RNA complexes from the PDB and NDB databases (www.rcsb.org) [37–39]; (ii) we clustered all the protein–RNA complexes with a 30% sequence identity cut-off using the CD-HIT suite [40]; (iii) we performed a BLAST search of the protein sequences (using a 99% identity cut-off) to obtain the free protein corresponding to each protein–RNA complex [41,42]. The free proteins have the same sequences as the protein part of the protein–RNA complexes but were crystallized without RNA; note that each free protein has its own PDB ID, distinct from that of the protein–RNA complex; (iv) we obtained the disordered residues of each protein–RNA complex and free protein pair from the missing-residue information given in the "REMARK 465" records of the protein structure files; (v) we isolated the DOT residues by comparing the disordered residues of each free protein and protein–RNA complex pair, selecting residues that are ordered in the protein–RNA complex but disordered in the free protein. Only regions with three or more consecutive DOT residues were considered. The final dataset contains 101 DOT regions in 52 proteins, and the complete data are given in the supplementary information. The representation of DOT and ordered regions in a typical protein–RNA complex (PDB ID: 4H4K) is shown in Figure 9.
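Steps (iv) and (v) of the protocol can be illustrated with a minimal Python sketch. It is not the code used in this work: the function names (`missing_residues`, `contiguous_regions`, `dot_regions`) are hypothetical, the REMARK 465 records are parsed by whitespace splitting rather than by fixed columns, and the free and RNA-bound chains are assumed to share the same residue numbering, which in practice may require a sequence alignment first.

```python
import re

# Residue sequence number with an optional trailing insertion code, e.g. "12" or "12A".
_SEQ = re.compile(r"^(-?\d+)[A-Za-z]?$")


def missing_residues(pdb_text, chain):
    """Collect residue numbers listed in REMARK 465 records for one chain."""
    missing = set()
    for line in pdb_text.splitlines():
        if not line.startswith("REMARK 465"):
            continue
        fields = line.split()[2:]
        # Data rows hold "RES C SSSEQ", optionally preceded by a model number.
        if len(fields) not in (3, 4):
            continue
        match = _SEQ.match(fields[-1])
        # Header and free-text rows do not end in a number and are skipped.
        # Assumption: insertion codes are ignored, so "12" and "12A" collapse to 12.
        if match and fields[-2] == chain:
            missing.add(int(match.group(1)))
    return missing


def contiguous_regions(residues, min_len=3):
    """Group residue numbers into (start, end) runs of at least min_len residues."""
    regions, run = [], []
    for number in sorted(residues):
        if run and number == run[-1] + 1:
            run.append(number)
            continue
        if len(run) >= min_len:
            regions.append((run[0], run[-1]))
        run = [number]
    if len(run) >= min_len:
        regions.append((run[0], run[-1]))
    return regions


def dot_regions(free_text, free_chain, complex_text, complex_chain, min_len=3):
    """Residues missing in the free protein but not missing in the complex."""
    disordered_free = missing_residues(free_text, free_chain)
    disordered_complex = missing_residues(complex_text, complex_chain)
    # Assumption: both chains use the same residue numbering.
    return contiguous_regions(disordered_free - disordered_complex, min_len)
```

Called with the text of a free-protein file and of the matching protein–RNA complex file, together with the two chain identifiers, `dot_regions` returns a list of `(start, end)` residue ranges. The default `min_len=3` mirrors the requirement of three or more consecutive DOT residues.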

**Figure 9.** Representation of disorder-to-order mediated interactions. Free protein, RNA, and complex (CRISPR-Cas RNA Silencing Cmr Complex) are shown in cyan, orange, and green, respectively. The PDB IDs are 4H4K:A (free protein), 3XIL:I (RNA of protein–RNA complex), and 3XIL:B (RNA-bound protein). The disorder-to-order transition (DOT) region is clearly visible in green, where the corresponding region of the free protein is missing.
