Advances and Mechanisms of RNA–Ligand Interaction Predictions

Zhuo, Chen; Zeng, Chengwei; Liu, Haoquan; Wang, Huiwen; Peng, Yunhui; Zhao, Yunjie

doi:10.3390/life15010104

Open Access Review


by Chen Zhuo ¹, Chengwei Zeng ¹, Haoquan Liu ¹, Huiwen Wang ², Yunhui Peng ¹ and Yunjie Zhao ¹,*

¹ Institute of Biophysics and Department of Physics, Central China Normal University, Wuhan 430079, China

² School of Physics and Engineering, Henan University of Science and Technology, Luoyang 471023, China

* Author to whom correspondence should be addressed.

Life 2025, 15(1), 104; https://doi.org/10.3390/life15010104

Submission received: 10 December 2024 / Revised: 13 January 2025 / Accepted: 15 January 2025 / Published: 15 January 2025

(This article belongs to the Section Genetics and Genomics)


Abstract: The diversity and complexity of RNA include sequence, secondary structure, and tertiary structure characteristics. These elements are crucial for RNA’s specific recognition of other molecules. With advancements in biotechnology, RNA–ligand structures allow researchers to utilize experimental data to uncover the mechanisms of complex interactions. However, determining the structures of these complexes experimentally can be technically challenging and often results in low-resolution data. Many machine learning approaches have recently emerged that learn multiscale RNA features to predict these interactions, yet interaction prediction remains a relatively unexplored area. Studying RNA–ligand interactions is essential for understanding biological processes. In this review, we analyze the interaction characteristics of RNA–ligand complexes by examining RNA’s sequence, secondary structure, and tertiary structure. Our goal is to clarify how RNA specifically recognizes ligands. Additionally, we systematically discuss advancements in computational methods for predicting interactions in order to guide future research directions. We aim to inspire the creation of more reliable RNA–ligand interaction prediction tools.

Keywords: RNA secondary structure motifs; RNA pocket geometric feature; RNA–ligand interaction mechanism; structure prediction

1. Introduction

RNA often interacts with other molecules to carry out its biological functions [1,2,3]. Ligands are crucial in regulating RNA functions, affecting RNA activity and stability through various interactions [4]. For instance, benzimidazole translation inhibitors bind to the internal ribosome entry site of the hepatitis C virus, which leads to significant conformational changes of a deep pocket in the RNA [5]. The glmS riboswitch interacts with the metabolite glucosamine 6-phosphate (GlcN6P), a product of the synthase reaction, which acts as a cofactor for self-cleavage. This process regulates metabolite levels and inhibits bacterial cell wall synthesis [6]. Therefore, exploring RNA–ligand interactions is essential to uncovering the functional mechanisms of RNA and understanding its role in biological processes.

The diversity and complexity of the RNA structure enable it to recognize various molecules effectively. Research on the dynamic conformations of riboswitches has identified two primary modes of RNA–ligand interactions: conformational selection and induced fit [7,8,9,10]. For instance, the free SAM-II riboswitch initially forms a simple hairpin structure without the ligand [11]. However, when the SAM ligand is first recognized, the riboswitch undergoes local conformational changes that allow it to stabilize and create a suitable binding site for the ligand. This demonstrates that RNA exhibits significant versatility and plasticity, allowing it to fold into various three-dimensional (3D) shapes that provide specific binding sites for different molecules. With the advancement of biotechnology, an increasing number of RNA-related databases have been established [12,13]. For example, the SMMRNA and NALDB databases provide detailed information on RNA–ligand experimental data, including RNA sequences and structural information, as well as the chemical information of ligands and their binding affinities (Ki, Kd, IC50) for these interactions [14,15]. Other databases, such as ncRPheno, ViRBase v3.0, and LncRNADisease v3.0, catalog the associations between miRNAs, lncRNAs, and other non-coding RNAs, and various diseases in humans and other mammals [16,17,18]. These data sources lay a solid foundation for studying RNA interactions, allowing researchers to explore the interaction interfaces between RNA and their binding partners visually. However, experimental methods such as X-ray crystallography, NMR, and cryo-electron microscopy present several challenges, including high costs and operational complexity [19,20,21,22]. As of 27 November 2024, the Protein Data Bank contains 227,561 experimental structures, but fewer than 1000 of them are RNA–ligand structures [23]. Therefore, there is an urgent need to develop computational methods for predicting RNA interactions.

Machine-learning-based computational methods have recently become essential tools for understanding RNA structure and function [24,25,26]. By leveraging prior knowledge from experimental structures, these methods can analyze vast biological datasets, identify complex patterns, and predict interactions that were previously difficult to detect experimentally [27,28,29]. For instance, RNet utilizes RNA structure network features from three machine learning models to predict RNA–small molecule binding sites [30]. On the other hand, RNAsite employs a random forest model to learn RNA sequence and structure features at various levels to predict RNA–ligand binding sites [31]. Furthermore, deep-learning-based computational methods are emerging as powerful tools for predicting RNA interactions [32,33]. ZHmolReSTasite, for example, uses a deep residual network model to systematically learn RNA sequence, secondary structure, and tertiary structure features for predicting ligand-binding nucleotides [34]. MultiModRLBP combines four deep learning modules to capture multidimensional RNA structural features and predict these nucleotides [35]. However, current methods for predicting RNA–ligand binding affinity are limited, with only RLaffinity and RSAPred available [36,37]. Despite being a relatively unexplored area in RNA research, machine learning algorithms show tremendous potential. RNA structural features and interaction information with other molecules provide valuable data resources and theoretical guidance for computational methods, forming a solid foundation for model training.

This review provides a thorough analysis of RNA–ligand interactions, focusing on the mechanisms through which RNA recognizes ligands and their applications in machine learning algorithms (Figure 1). To aid researchers in understanding the biological functions of RNA, we systematically compile representative databases of RNA–ligand interactions. From the perspective of RNA, we explore how it specifically recognizes ligands through physicochemical interactions across various levels, including primary sequence, secondary structure, and tertiary structure. This approach allows us to gain deeper insights into the mechanisms of RNA recognition. Additionally, we analyze the progress and challenges in computational methods for predicting RNA–ligand interactions, considering different aspects of RNA features. We aim to provide a comprehensive overview of the potential development of computational tools for exploring RNA–ligand interactions, emphasizing the analysis of interaction mechanisms that involve multiple dimensions of RNA characteristics.

2. RNA–Ligand Database Resources

The Protein Data Bank (PDB) is a comprehensive database that stores and retrieves experimental structural information for biological macromolecules [23]. Currently, approximately 92% of the entries in the PDB are protein-related, while around 8% pertain to nucleic acids (Figure 2A). The Nucleic Acid Knowledgebase (NAKB) aggregates all RNA and DNA experimental structures [38]. It provides external resource lists and various interactive tools. As of 27 November 2024, the NAKB offers 18,645 available structures, with RNA complexes making up 37% of these, translating to 6914 entries. However, there are fewer than 1000 RNA–ligand complexes. In recent years, advancements have been made in techniques for resolving RNA complex structures, enabling researchers to uncover the structural details of these complexes at the atomic level [19]. Nevertheless, experimentally elucidating the complexity and dynamics of RNA structures, as well as their responses to environmental changes, remains a formidable challenge. Despite this, the structural data in databases like the PDB and NAKB provide a solid foundation for analyzing and understanding the critical structural features and functional roles of RNA–ligand interactions.

RNA–ligand databases are essential for studying RNA’s biological functions (Table 1). These databases can be divided into two categories: structural databases and interaction databases (Figure 2B). RNA–ligand structural databases provide experimentally determined information about complexes, including RNA sequences and structures, ligand molecular properties, and binding thermodynamic data. For instance, NoncoRNA compiles experimentally validated interactions between non-coding RNAs and drug targets, featuring 8233 entries comprising 5568 ncRNAs and 154 drugs across 134 cancer types [39]. ROBIN is a novel library of nucleic acid binders that reports comprehensive results from small-molecule microarray screenings targeting 36 individual nucleic acids with 24,572 small molecules [40]. Other databases, such as SMMRNA, NALDB, PDBbind, and R-SIM, provide experimentally determined RNA–ligand structures and thermodynamic data, including binding affinities, binding free energies, and experimental conditions [14,15,41,42]. RNA–ligand interaction databases provide detailed information about the interaction interfaces between RNA and ligands. This includes the sequence and structure of the binding interface and the physicochemical environment. For example, RNALigands integrates RNA–ligand interaction data from Inforna 2.0, R-BIND, and the PDB database, offering sequence and structural data for 841 RNA secondary structure motif–ligand pairs [43,44,45]. RNALigands suggests that similar secondary structure motifs tend to bind to similar ligands, indicating that RNA’s secondary structural features carry important information for specific ligand recognition. HARIBOSS analyzes RNA–ligand structural interaction data from the PDB database, providing annotations on the physicochemical properties of ligands and the structures and compositions of RNA pockets at the binding interface [46]. HARIBOSS found that RNA pockets exhibit hydrogen bond donor and acceptor properties similar to those present in proteins. In addition, druggable RNA pockets tend to be more hydrophobic and less exposed to solvents, underscoring the significance of the RNA structural surface shape in ligand recognition.

RNA–ligand databases are essential resources for understanding RNA function and advancing computational methods. However, RNA demonstrates significant dynamics in recognizing other molecules, with multiple conformational states being fundamental to its function [22,48]. While these databases provide experimental structural data and static interaction information, they do not adequately explore or analyze the 3D structural flexibility and geometric features. To address this gap, we developed the RNA pocket databases (RPocket and RPflex), which capture the static and dynamic interactions of RNA–ligand complexes by examining the geometric features and conformational changes of RNA pockets (Figure 2B). The RPocket database focuses on static information related to RNA sequences and structures, calculating geometric properties of ligand-binding and non-binding pockets, including volume, surface area, effective radius, and centroid [47]. We also created pocket-shape descriptors, RPDescriptors, for calculating the geometric features of RNA pockets. RPDescriptor calculates the shape similarity scores, categorizing pockets into sphere-like, rod-like, and disc-like shapes. The shape similarity scores are defined by two descriptors ($rpd_{1}$ and $rpd_{2}$) based on normalized principal moments of inertia ratios. Our findings indicate that the shape and geometric characteristics of pockets exhibit selective specificity for ligand binding. In addition, we assessed RNA flexibility through the conformational changes in RNA pockets, thereby capturing the dynamic features associated with RNA–ligand recognition [49]. RPflex compiled pocket datasets from various sources, including 2276 pockets from RNA complexes, 352 pockets from RNA–ligand complexes, and 526 pockets from RNA–protein complexes. However, due to the limited number of NMR experimental structures available, we collected pocket data from only 160 non-redundant RNA-related structures. To overcome the limitations of experimental data, we need to understand the physics-based foundations of interactions.
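To make the idea of shape descriptors built from normalized principal moments of inertia concrete, the sketch below computes the two ratios I1/I3 and I2/I3 (with I1 ≤ I2 ≤ I3) for a set of points and assigns the nearest idealized shape. It is an illustration only: the exact definitions of the RPDescriptor scores are given in [47] and are not reproduced here, and the unit masses, the function names, and the idealized corner values (rod (0, 1), disc (0.5, 0.5), sphere (1, 1)) are assumptions of this sketch.

```python
import math

def inertia_tensor(points):
    """Inertia tensor of unit-mass points about their centroid (assumption: unit masses)."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    ixx = iyy = izz = ixy = ixz = iyz = 0.0
    for px, py, pz in points:
        x, y, z = px - cx, py - cy, pz - cz
        ixx += y * y + z * z
        iyy += x * x + z * z
        izz += x * x + y * y
        ixy -= x * y
        ixz -= x * z
        iyz -= y * z
    return [[ixx, ixy, ixz], [ixy, iyy, iyz], [ixz, iyz, izz]]

def principal_moments(a):
    """Eigenvalues of a symmetric 3x3 matrix in ascending order (trigonometric closed form)."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0
    if p1 == 0.0:  # already diagonal
        return sorted([a[0][0], a[1][1], a[2][2]])
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
           - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
           + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    e_max = q + 2.0 * p * math.cos(phi)
    e_min = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [e_min, 3.0 * q - e_max - e_min, e_max]

def shape_ratios(points):
    """Normalized principal moment ratios (I1/I3, I2/I3)."""
    i1, i2, i3 = principal_moments(inertia_tensor(points))
    return i1 / i3, i2 / i3

def nearest_shape(points):
    """Label by the closest idealized corner in ratio space (hypothetical rule)."""
    corners = {"rod-like": (0.0, 1.0), "disc-like": (0.5, 0.5), "sphere-like": (1.0, 1.0)}
    ratios = shape_ratios(points)
    return min(corners, key=lambda name: math.dist(ratios, corners[name]))

# Toy point sets standing in for pocket coordinates
rod = [(-1.0, -1.0, 0.0), (1.0, 1.0, 0.0)]
disc = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
sphere = disc + [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
```

In practice the input points would be pocket probe or atom coordinates; a nearest-corner rule is one simple way to turn the two ratios into the three shape classes named above.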

3. Physics-Based Interaction Forces on RNA–Ligand Complexes

RNA–ligand complex stability is achieved through interactions between nucleotides and ligands based on physical principles. These interactions include hydrogen bonds, van der Waals (VdW) forces, stacking interactions, and hydrophobic interactions, all of which are considered short-range forces. In contrast, electrostatic forces are long-range and act over long distances. Long-range forces primarily drive the remote recognition between RNA and ligands, while short-range forces serve to optimize and enhance the structure of the complex [47].

3.1. The Short-Range Forces

The hydrogen bond is an interaction force that occurs between a hydrogen atom and two electronegative atoms, such as oxygen and nitrogen. Each hydrogen bond typically contributes between 0.5 and 4.5 kcal/mol to the overall system energy [50]. The weakest hydrogen bonds are often considered to be van der Waals (VdW) contacts, which contribute between 0.5 and 1 kcal/mol. The detection tool, HBplus, uses default parameters for identifying hydrogen bonds: a hydrogen–acceptor distance less than 2.7 Å and a donor–acceptor distance less than 3.35 Å [51]. For VdW contacts, the default parameters define these contacts as occurring between atoms that are not involved in hydrogen bonds and are separated by less than 3.9 Å. Hydrogen bonds are generally associated with the interaction energy between actual hydrogen bond donors and acceptors. This interaction energy can be described using the following Lennard–Jones potential [52]:

$$V(r) = 4\varepsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right] \tag{1}$$

where $\varepsilon$ represents the potential well depth, $r$ represents the distance between two particles, and $\sigma$ is the distance between the two particles when the potential energy is exactly zero. The bases and backbone of RNA (phosphate and ribose) can form hydrogen bonds and VdW contacts with ligands (Figure 3A) [53,54]. RPflex analyzed the interactions in ligand-binding pockets from NMR RNA–ligand structures [49]. The results show that hydrogen bonds with the bases, phosphate, and ribose account for 63%, 21%, and 16%, respectively. In contrast, VdW contacts with these components account for 72% with bases, 6% with phosphate, and 22% with ribose. These findings are consistent with previous studies, including one by G. Padroni et al., which demonstrated that hydrogen bonding events involving bases are approximately three times more frequent than those with the phosphate-ribose backbone [55]. This further reinforces the idea that bases carry most of the structural information in the RNA binding pocket and serve as the primary source of interactions with ligands.
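As a minimal sketch of Equation (1), the function below evaluates the Lennard–Jones potential and the separation at which it reaches its minimum, r_min = 2^(1/6) σ, where V(r_min) = −ε. The parameter values in the usage lines (ε = 1.0 kcal/mol, σ = 3.0 Å) are illustrative assumptions, not values taken from the cited work.

```python
def lennard_jones(r, epsilon, sigma):
    """Lennard-Jones potential V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def lj_minimum_distance(sigma):
    """Separation at which V(r) is minimal: r_min = 2**(1/6) * sigma."""
    return 2.0 ** (1.0 / 6.0) * sigma

# Illustrative (assumed) parameters: epsilon in kcal/mol, sigma in angstroms
epsilon, sigma = 1.0, 3.0
v_at_sigma = lennard_jones(sigma, epsilon, sigma)                      # zero crossing
v_at_min = lennard_jones(lj_minimum_distance(sigma), epsilon, sigma)   # well bottom
```

The potential is repulsive for r < σ, crosses zero at r = σ, and has well depth ε at r_min, which matches the definitions of ε and σ given above.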

Stacking interactions are non-covalent attractive forces between aromatic rings, contributing approximately 2–6 kcal/mol for each interaction [56]. They typically occur at interatomic distances of 2.7–4.3 Å. A typical example is the acridine-based ligand that targets a telomeric RNA G-quadruplex (Figure 3B), where the binding event is primarily driven by stacking interactions [57]. G. Padroni et al. discovered that stacking and hydrogen bond interactions represent the largest interactions in RNA–ligand complexes, accounting for 34.8% and 34.4%, respectively [55]. This highlights the significance of hydrogen bonds and stacking interactions in RNA recognition. Hydrophobic interactions generally occur at distances between 3.8 and 5.0 Å, with each interaction contributing approximately 1–2 kcal/mol [58,59]. When hydrophobic groups cluster, the water molecules that had been ordered around them are released, which increases the system’s entropy. This increase in entropy is the main driving force behind hydrophobic interactions. Some RNA-targeted drugs, such as cryptolepine hydrate and aminoglycosides, utilize hydrophobic interactions to enhance binding to their RNA targets [60,61]. As illustrated in Figure 3C, the increased affinity of an aminoglycoside analog is mainly attributed to the VdW and hydrophobic components of the solvation free energy, which arise from the methylene chain aligning with or pointing down the major groove [62].
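The distance criteria quoted in this subsection can be collected into a small screening helper. The sketch below uses only the cutoffs stated above (HBplus-style hydrogen bond and VdW defaults, and the stacking and hydrophobic distance ranges). The function names are hypothetical, and any angular or chemical-type criteria that a real detection tool may also apply are deliberately omitted.

```python
# Distance cutoffs in angstroms, as quoted in the text
HBOND_MAX_H_TO_ACCEPTOR = 2.7
HBOND_MAX_DONOR_TO_ACCEPTOR = 3.35
VDW_MAX_DISTANCE = 3.9
STACKING_RANGE = (2.7, 4.3)
HYDROPHOBIC_RANGE = (3.8, 5.0)

def is_hydrogen_bond(h_to_acceptor, donor_to_acceptor):
    """Distance-only hydrogen bond test (no angle checks in this sketch)."""
    return (h_to_acceptor < HBOND_MAX_H_TO_ACCEPTOR
            and donor_to_acceptor < HBOND_MAX_DONOR_TO_ACCEPTOR)

def is_vdw_contact(distance, in_hydrogen_bond):
    """VdW contact: atoms not involved in a hydrogen bond and closer than 3.9 A."""
    return (not in_hydrogen_bond) and distance < VDW_MAX_DISTANCE

def within(distance, bounds):
    """True if distance lies inside the inclusive (low, high) range."""
    low, high = bounds
    return low <= distance <= high

# A 4.0 A contact is compatible with both ranges on distance alone
ambiguous = within(4.0, STACKING_RANGE) and within(4.0, HYDROPHOBIC_RANGE)
```

Because the stacking and hydrophobic ranges overlap between 3.8 and 4.3 Å, distance alone cannot separate them; the chemical nature of the groups involved (for example, whether both partners are aromatic rings) has to be checked as well.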

3.2. The Long-Range Force

Electrostatic interactions are a type of non-covalent force between two charged particles. Their strength depends on the quantity of the charges and the distance separating them [63]. According to Coulomb’s law, the value of electrostatic force can be calculated as follows [64]:

$$F = k \frac{q_{1} q_{2}}{r^{2}} \tag{2}$$

where $r$ is the distance between the charged particles $q_{1}$ and $q_{2}$, and $k$ is the Coulomb constant. For $N$ point charges distributed in a uniform dielectric, the electrostatic potential at position $R$ in space is calculated by

$$\phi(R) = k \sum_{i=1}^{N} \frac{q_{i}}{\left| R - r_{i} \right|} \tag{3}$$

where $q_{i}$ and $r_{i}$ represent the charge and position of the $i$-th particle. Electrostatic interactions occur between the negatively charged RNA phosphate backbone and positively charged ligands. Cation–π interactions are electrostatic interactions between positively charged atoms and the negatively charged electron cloud of aromatic systems [65]. A notable example of cation–π interactions in RNA recognition is guanidine binding to the guanidine II riboswitch [66]. As illustrated in Figure 3D, guanidine cations stack upon the nucleobase guanine, forming a cation–π interaction. These electrostatic forces are characterized by their long-range and non-specific nature. They play a crucial role in the initial stages of complex formation by facilitating contact and recognition between molecules. The RPocket database systematically analyzed 240 pockets in RNA–ligand complexes and proposed a potential mechanism for complex formation [47]. This mechanism suggests that long-range electrostatic interactions guide the initial recognition and binding of RNA and its ligand, while short-range interactions optimize and stabilize the resulting complexes. These interaction pairs can be regarded as distance constraints that aid the structural modeling of complexes and drug design. For example, a nanoassembly system composed of doxorubicin-conjugated polyphosphoester and CD47-targeting siRNA utilizes these principles [67]. In this system, electrostatic interactions and π-π stacking work together to achieve targeted drug delivery, successfully reactivating T cell- and macrophage-mediated anticancer immunotherapy.
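Equations (2) and (3) can be illustrated with a short sketch that evaluates the Coulomb force between two point charges and the potential produced by a set of point charges. SI units and the vacuum value of the Coulomb constant are assumptions of this example; the function names and the toy charge arrangement are hypothetical, and dielectric screening by solvent is not modeled.

```python
import math

COULOMB_CONSTANT = 8.9875517923e9   # N m^2 C^-2 (vacuum value, assumed here)
ELEMENTARY_CHARGE = 1.602176634e-19  # C

def coulomb_force(q1, q2, r):
    """Equation (2): signed magnitude; positive = repulsive, negative = attractive."""
    return COULOMB_CONSTANT * q1 * q2 / r ** 2

def electrostatic_potential(point, charges):
    """Equation (3): potential at `point` from (charge, position) pairs."""
    return COULOMB_CONSTANT * sum(q / math.dist(point, pos) for q, pos in charges)

# Toy example: a +e and a -e charge placed 1 nm either side of the origin
charges = [
    (ELEMENTARY_CHARGE, (1e-9, 0.0, 0.0)),
    (-ELEMENTARY_CHARGE, (-1e-9, 0.0, 0.0)),
]
phi_origin = electrostatic_potential((0.0, 0.0, 0.0), charges)
force = coulomb_force(ELEMENTARY_CHARGE, -ELEMENTARY_CHARGE, 2e-9)
```

The 1/r² decay of the force, compared with the much steeper r⁻⁶ and r⁻¹² terms of Equation (1), is what makes electrostatics the long-range contribution discussed in this subsection.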

4. Interaction Mechanisms Extracted from RNA Features

RNA structure is organized into three hierarchical levels: sequence, secondary, and tertiary structure (Figure 4). The primary sequence determines its coding and functional properties, while its secondary and tertiary structures give rise to unique spatial configurations that enable binding to target molecules. At the binding interface, various non-covalent interactions between RNA and ligands facilitate the recognition and stabilization of the complex structure. Our aim is to investigate the interaction mechanisms that provide valuable insights for developing computational methods.

4.1. Sequence Features

The primary sequence of RNA refers to the linear arrangement of its four nucleotides: adenine (A), guanine (G), cytosine (C), and uracil (U). In the context of RNA–ligand interactions, most hydrogen bonds are formed with the negatively charged oxygen atoms of the phosphate groups in the RNA sugar-phosphate backbone [54,68]. However, ligands preferentially form hydrogen bonds with the nucleobases, with G being the most involved, followed by U, C, and A. Most hydrophobic interactions occur through the surfaces of the nucleobases, with minimal contact made with the sugar moiety [69]. Previous studies have indicated that optimizing stacking interactions may target purine-rich RNA motifs more effectively [69,70]. Additionally, the RPocket database has analyzed nucleotide sequence patterns derived from RNA–ligand binding sites, revealing that certain sequences, specifically “GU” (11.7%), “GG” (8.8%), “GA” (8.8%), and “GC” (8.1%), are more likely to bind with ligands [47]. Furthermore, it has been suggested that certain RNA sequences are more susceptible to recognition by particular small molecules. An example is myotonic dystrophy type 2 (DM2), which is caused by expanded CCUG repeats in RNA [71]. This condition leads to toxicity by sequestering crucial RNA-binding proteins and generating toxic proteins through repeat-associated non-ATG translation. Compounds developed by Childs-Disney et al., such as K-alkyne, 2K-4, and 3K-4, have been studied for their potential to improve the BIN1 pre-mRNA splicing defect in a cellular model of DM2 [72].
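Dinucleotide statistics of the kind reported above can be computed with a simple counting routine. The sketch below tallies overlapping dinucleotides in a list of binding-site sequence segments and converts the counts to fractions. How RPocket extracts and counts its patterns is not described here, so the overlapping-window convention, the function name, and the toy segments are assumptions of this example.

```python
from collections import Counter

def dinucleotide_frequencies(segments):
    """Fraction of each overlapping dinucleotide across all segments."""
    counts = Counter()
    for seq in segments:
        seq = seq.upper()
        for i in range(len(seq) - 1):
            counts[seq[i:i + 2]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}

# Toy binding-site segments (hypothetical, not taken from any database)
freqs = dinucleotide_frequencies(["GUGU", "GGA"])
top_pair = max(freqs, key=freqs.get)
```

Here "GUGU" contributes GU, UG, GU and "GGA" contributes GG, GA, so GU accounts for two of the five dinucleotides.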

4.2. Secondary Structure Features

RNA secondary structure consists of paired and unpaired nucleotides, resulting in structural elements such as helical stems and single-stranded regions. The single-stranded regions comprise unpaired nucleotides, including hairpin loops, internal loops, bulge loops, multibranch loops, and pseudoknots [73]. In our previous work, RBind discovered that 98% of nucleotides bound by ligands are located within or near loop regions (within five nucleotide pairs of the loop regions) [74]. This suggests that loops are potential binding sites for ligands. Further analysis by RPocket examined the distribution of RNA secondary structure at ligand-binding sites [47]. The results indicated that these sites tend to be within specific loop motifs, highlighting the important role of unpaired nucleotides in ligand binding. By calculating pocket shape descriptors, we found that 92.6% of tandem loops are typically located within pockets of the same shape. Binding to incorrect secondary structure motifs may disrupt interactions and undermine structural stability. Our findings suggest that incorporating more complex loop combinations could improve predictions of RNA tertiary structures. Additionally, Disney and colleagues have shown that their RNA loop motif–ligand interaction database, Inforna 2.0, can identify ligands for targets such as microRNAs and viral RNAs [44]. Similarly, the R-BIND and RLigands databases curate pairs of loop motif–ligand interactions, allowing users to screen potential ligands by assessing secondary structure motif similarity among RNAs [43,45]. Therefore, the interactions of nucleotides within RNA secondary structures give rise to loop motifs with ligand recognition capabilities, making these motifs ideal candidates for ligand binding.
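The observation that ligand-bound nucleotides lie in or near loops suggests a simple filter on secondary structure. The sketch below marks nucleotides in a dot-bracket string that are unpaired or lie within a fixed number of positions of an unpaired nucleotide. Measuring proximity along the sequence is a simplification assumed for this example and may differ from how RBind counts nucleotide pairs; the function name is hypothetical, and only "." is treated as unpaired.

```python
def near_loop_mask(dot_bracket, window=5):
    """True for positions that are unpaired or within `window` positions of one."""
    unpaired = [i for i, symbol in enumerate(dot_bracket) if symbol == "."]
    mask = []
    for i in range(len(dot_bracket)):
        mask.append(any(abs(i - j) <= window for j in unpaired))
    return mask

# Toy hairpin: a 10-base-pair stem closing a 3-nucleotide loop
structure = "(" * 10 + "..." + ")" * 10
mask = near_loop_mask(structure, window=5)
candidates = [i for i, flagged in enumerate(mask) if flagged]
```

For this toy hairpin, the candidate set covers the loop itself plus the five stem positions on either side of it, which is the kind of region the statistic above points to.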

4.3. Tertiary Structure Features

Secondary structural elements interact spatially to form pockets in RNA. These RNA pockets are defined as concave areas on the three-dimensional structural surface, serving as specific binding sites for other molecules. Various methods have been developed to identify RNA pockets by analyzing their geometric features, such as volume, surface area, and sphericity [75,76,77]. Our previous findings indicate that most ligand-binding pockets have larger volumes and surface areas compared to non-binding pockets, suggesting that ligands might influence the motions of these pockets [47].

Scientists have identified base recognition and shape complementarity at interfaces as key factors in RNA complex interactions [78,79]. In this context, the solvent-accessible surface area (SASA) and Laplacian norm (LN) help characterize the pocket’s shape, while structural network features illustrate communication between nucleotides. Our previous work, ZHmolReSTasite, analyzed the SASA of ligand-binding and non-binding nucleotides in 60 non-redundant RNA–ligand structures [34]. We found that the SASA can be used to characterize the ligand-induced conformational changes. The ligand-binding nucleotides are often situated in pocket regions on concave, rather than convex, surfaces of RNA, resulting in smaller exposed areas. Similarly, the analysis of the 60 non-redundant RNA–ligand structures revealed that nucleotides involved in binding exhibit lower LN values, indicating that the LN feature captures the surface geometry of pocket structures. Additionally, other studies have transformed RNA structures into networks to predict their functional expression [30,74]. Individual nucleotides are represented as network nodes in the RNA networks, while non-covalent interactions between nucleotides are depicted as network edges. Network topology features can capture global and local interaction features within pocket structures. For instance, RPflex utilizes the degree and clustering coefficient to characterize local interactions within the network, while network diameter is used to infer long-range interactions [49]. The results indicate that in pocket structures, hydrogen bonds formed within base interactions play a critical role in stabilizing RNA structures, whereas backbone interactions are essential for RNA folding. This approach offers an intuitive representation of complex 3D structures, facilitating the identification of critical interaction sites.
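The network quantities mentioned above (degree, clustering coefficient, and diameter) can be computed directly from an edge list in which nodes are nucleotides and edges are non-covalent contacts. The sketch below is a generic graph implementation, not the RPflex code; the helper names and the toy contact list are assumptions, and the diameter routine assumes a connected graph.

```python
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency sets from (node, node) contact pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree(adj, node):
    """Number of direct contacts of a nucleotide."""
    return len(adj[node])

def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))

def diameter(adj):
    """Longest shortest path, via breadth-first search from every node."""
    longest = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        longest = max(longest, max(dist.values()))
    return longest

# Toy contact network: nucleotides 1-3 mutually in contact, 4 touching only 3
adj = build_adjacency([(1, 2), (2, 3), (1, 3), (3, 4)])
```

Degree and clustering coefficient summarize how densely a nucleotide is embedded in its local contacts, while the diameter reflects how far apart, in contact steps, the most distant nucleotides are.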

5. RNA–Ligand Interaction Prediction

By utilizing prior knowledge from experimentally determined structures, machine-learning-based methods can analyze large-scale biological data to identify complex patterns and predict interactions that were previously difficult to detect experimentally [80,81,82]. RNA sequences, secondary structures, and tertiary structural features are converted into representations compatible with machine learning algorithms, using techniques such as one-hot encoding and sliding windows. When combined with embedded features of ligands, these machine learning models can predict aspects of complex interactions (Figure 5).
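One-hot encoding with a sliding window can be sketched as follows: each nucleotide is described by the concatenated one-hot vectors of itself and its sequence neighbors, with zero padding beyond the sequence ends. The alphabet order, the window half-width, and the padding convention are assumptions of this example and vary between published methods.

```python
ALPHABET = "ACGU"  # assumed column order of the one-hot vector

def one_hot(nucleotide):
    """4-element indicator vector; all zeros for symbols outside ACGU."""
    return [1 if nucleotide == letter else 0 for letter in ALPHABET]

def sliding_window_features(sequence, half_width):
    """One feature row per nucleotide: one-hot codes of positions i-h .. i+h."""
    rows = []
    for i in range(len(sequence)):
        row = []
        for j in range(i - half_width, i + half_width + 1):
            if 0 <= j < len(sequence):
                row.extend(one_hot(sequence[j]))
            else:
                row.extend([0, 0, 0, 0])  # zero padding past the sequence ends
        rows.append(row)
    return rows

# Each row has 4 * (2 * half_width + 1) entries
features = sliding_window_features("GAU", half_width=1)
```

Each row can then be concatenated with per-nucleotide structural features (for example, a paired or unpaired flag, or a surface-area value) before being passed to a classifier.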

RNA–ligand binding site prediction methods employ various RNA structural features to identify nucleotide-binding sites for ligands, typically framing the task as a binary classification problem. For example, Rsite identifies ligand-binding sites by analyzing the Euclidean distances between nucleotides within the RNA tertiary structure [83]. In contrast, Rsite2 uses secondary structures instead of tertiary ones to calculate these distances [84]. RBind and RNet convert RNA tertiary structures into structural networks to account for network attributes [30,74]. RNAsite combines features from RNA sequences and tertiary structures, including SASA, LN, and network properties, using the random forest algorithm for predicting ligand-binding sites [31]. Building upon this approach, RLbind integrates RNA sequence and tertiary structure features (network properties and SASA), along with biochemical properties, into a convolutional neural network model to capture RNA features effectively [85]. MultiModRLBP employs four modular deep learning components to incorporate RNA sequence, secondary structure, and tertiary structure features [35]. In our previous research, ZHmolReSTasite utilized a deep residual network to extract RNA sequence, secondary structure, and tertiary structure features (network properties, SASA, LN, and pocket features) for predicting ligand-binding nucleotides [34]. The performance of these methods was evaluated on the benchmark dataset TE18 (Figure 6) [74]. Results indicated that MultiModRLBP outperformed the other methods in terms of recall and Matthews correlation coefficient (MCC) metrics, while ZHmolReSTasite achieved the highest precision. RNet achieved the second-highest accuracy despite relying exclusively on network properties. ZHmolReSTasite also analyzed the role of pocket features, revealing that small molecules tend to bind within RNA pockets rather than on more convex surfaces. These results highlight that integrating RNA sequence and structural features can more accurately and comprehensively capture the specificity of RNA in recognizing ligands, especially the key role of tertiary structure geometric features.
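Because binding site prediction is framed as binary classification per nucleotide, the metrics named above follow from the confusion matrix. The sketch below computes precision, recall, and MCC from true and predicted labels (1 = binding, 0 = non-binding). The label vectors are toy values, not benchmark results, and returning 0.0 when a denominator vanishes is a convention assumed for this example.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0

def recall(y_true, y_pred):
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0

def matthews_corrcoef(y_true, y_pred):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn))."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy per-nucleotide labels (hypothetical)
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
```

MCC is informative here because binding nucleotides are usually a minority class, so a method can score well on accuracy while missing most binding sites.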

RNA–ligand binding preference prediction methods aim to identify potential ligands that interact with RNA, offering valuable insights for functional regulation and drug development. For instance, R-BIND and RLigands utilize RNA sequence and secondary structure features to determine the specificity of small molecules [43,45]. These methods screen their databases by calculating the similarity of secondary structure motifs and predicting ligands that correspond to the target RNA. RNAmigos transforms RNA tertiary structures into networks, considering both canonical and non-canonical base-pairing interactions to predict ligand fingerprints for the target RNA [86]. However, R-BIND and RLigands overlook the characteristics of ligands, which limits their capacity to predict novel ligands. Our work, ZHMol-RLinter, focuses on learning from the RNA–ligand interaction interface. It incorporates RNA sequence, secondary structure, tertiary structure, and physicochemical environment features to predict whether a target RNA and ligand will bind [87]. Compared to other methods, ZHMol-RLinter captures interaction information by learning comprehensive features from both RNA and ligands.

We compared the performance of R-BIND, RLigands, RNAmigos, and ZHMol-RLinter on the challenging unknown ligands in the UNK96 testing set (Table S1). As shown in Figure 7A, ZHMol-RLinter achieved a success rate of 77.1%, while the success rates for the top 20 predicted ligands in R-BIND, RLigands, and RNAmigos were 40.6%, 33.3%, and 29.2%, respectively (Table S2). This demonstrates that ZHMol-RLinter has higher accuracy and comprehensive capabilities. ZHMol-RLinter achieved an F1 score of 70.4% on predicted structures with an RMSD accuracy of 8 Å [87]. The results indicated that ZHMol-RLinter is a reliable method with notable robustness. The accuracy of predicted RNA 3D structures influences the performance of RNA–ligand binding preference predictions. Therefore, advancing RNA 3D structure prediction methods plays a crucial role in deepening our understanding of RNA–ligand interactions [88,89,90,91]. We also evaluated the contribution of different feature types in the random forest model of ZHMol-RLinter. The results indicated that structural geometric features, specifically network properties and LN, are critical in ligand-specific recognition (Figure S1).
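A top-k success rate of the kind quoted above can be sketched as follows, assuming that a test case counts as a success when the known ligand appears among the first k ranked candidates. This definition, the function name, and the toy rankings are assumptions of this example; the evaluation protocol actually used is described in [87].

```python
def top_k_success_rate(ranked_predictions, true_ligands, k=20):
    """Fraction of cases whose true ligand is among the top-k ranked candidates."""
    if not true_ligands:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_ligands)
        if truth in ranked[:k]
    )
    return hits / len(true_ligands)

# Toy rankings for three hypothetical RNA targets (ligand names are placeholders)
ranked = [
    ["lig_a", "lig_b", "lig_c"],
    ["lig_d", "lig_a", "lig_e"],
    ["lig_f", "lig_g", "lig_h"],
]
truth = ["lig_b", "lig_e", "lig_z"]
rate_top2 = top_k_success_rate(ranked, truth, k=2)
rate_top3 = top_k_success_rate(ranked, truth, k=3)
```

Reporting the rate at several values of k shows how quickly a method's ranking recovers the correct ligand, which a single threshold can hide.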

Additionally, we assessed the performance of ZHMol-RLinter on various secondary structure motifs within the UNK96 testing set (Figure S2 and Table S3). Using a 4 Å cutoff, we identified all loop motifs in the UNK96 testing set that physically interact with ligands. This resulted in a total of 138 loop motif–ligand pairs, which included 43 hairpin loops, 47 internal loops, 29 bulge loops, and 19 multibranch loops (Figure 7B). The results showed that the success rates for hairpin, internal, and bulge loops were 67.4%, 63.8%, and 62.1%, respectively (Figure 7C). In contrast, the success rate for multibranch loops was significantly lower at only 26.3%. Multibranch loops are complex structural domains that typically consist of more nucleotides than other types of motifs. This structural complexity arises from the extensive interactions between nucleotides within multibranch loops, which facilitate the formation of intricate tertiary structures in RNA. Consequently, predicting ligands in the multibranch loop regions of RNA is particularly challenging. This limitation highlights the need for substantial improvements in current methods, especially those targeting multibranch loops, to enhance the accuracy of RNA–ligand interaction predictions.

6. Future Directions

RNAs stabilize their complex structures and execute their biological functions through interactions with other molecules, such as small molecular ligands and ions [92,93]. The non-specific electrostatic force between RNA and ions leads to the accumulation of ions around the RNA, resulting in significant ion–ion correlations and fluctuation effects [94]. Ion concentration can influence the RNA recognition process [95,96]. Compared to small molecular ligands, ions are nearly spherical and much smaller in size. This characteristic causes ions to distribute more diffusely on the RNA surface, resulting in non-specific binding. In contrast, small molecular ligands recognize specific structural regions of RNA, interacting with RNA pockets in a “lock-and-key” mode. Thus, our review focuses on exploring RNA–small molecule interactions, revealing how RNA sequence and structural features facilitate the specific recognition of small molecules.

RNAs interact with small molecules through binding pockets [97,98,99,100]. For example, the flavin mononucleotide (FMN) riboswitch regulates gene expression by binding the FMN ligand [101]. This binding process induces changes in the RNA secondary structures, forming stacking interactions within the binding pocket to stabilize the complex. In recent years, we have conducted detailed studies on pocket structures to unveil the mechanisms by which they recognize other molecules. We have developed the RPocket and RPpocket databases, which provide information on RNA–ligand and RNA–protein interactions with pocket topology, respectively [47,102]. To describe the geometry of RNA pockets, we have developed pocket-shape descriptors that categorize pockets as sphere-like, disc-like, or rod-like. Furthermore, we have systematically analyzed the non-catalytic pockets and constructed the HKPocket database to provide druggability information on kinase pockets, thereby advancing our understanding of regulation mechanisms [103,104]. Previous studies have shown that ligand-binding RNA pockets occupy a similar property space to protein pockets that bind ligands [105]. However, RNA–ligand interaction predictions are still inadequate compared with those for proteins. Thus, determining how to capture RNA pocket features to understand RNA–ligand interactions will be a key focus of future work.

We developed computational methods, ZHmolReSTasite and ZHMol-RLinter, utilizing SASA, LN, and network topological descriptors to characterize the geometric features of RNA pockets, effectively demonstrating their ability to specifically recognize ligands. For instance, the average Laplacian norm per nucleotide in the internal loop of the ribosomal RNA (PDB ID: 7JJU_A) is lower than that of the other two loops (Figure S3A). This suggests that the nucleotides in the internal loop are situated in the concave regions of the RNA surface, creating a pocket suitable for GZ6 ligand binding. ZHMol-RLinter accurately predicted the binding preferences of this internal loop. However, the hairpin loop is less likely to form a pocket suitable for CAC ligand binding, which contributes to ZHMol-RLinter’s incorrect prediction for this loop. This may be due to the larger number of long-range interactions present at the hairpin loop, such as the U24–A32, U25–A29, A7–G26, and G8–C27 interaction pairs. This highlights the significance of considering both local and long-range interactions in RNA–ligand interaction predictions, especially the long-distance communication information encoded by nucleotides. Additionally, ZHMol-RLinter failed to predict two unpaired single-stranded ribosomal RNA–ligand complexes, which have their 3D structures exposed to the solvent (Figure S3B). This exposure hinders the formation of RNA pockets. Therefore, investigating the interactions of these complex RNA–ligand structures could be a promising direction for the future. Compared to the advancements in RNA–ligand binding site and preference prediction, methods for predicting binding affinity are still very limited, with only two recently developed approaches (RLaffinity and RSAPred) [36,37]. Future work should comprehensively consider both RNA sequence and structural features, particularly the geometric characteristics of binding pockets. These multidimensional approaches will help more accurately capture the specificity of RNA in recognizing ligands, thereby improving the predictive performance for binding affinity.
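To make the notion of network topological descriptors concrete, the sketch below builds a nucleotide contact network from one representative point per nucleotide and computes two generic descriptors, degree and closeness centrality. The 8 Å cutoff, the choice of descriptors, and the toy coordinates are assumptions made for illustration. The exact features and parameters used by ZHMol-RLinter are not specified in this section and may differ.

```python
import math
from collections import deque


def build_contact_network(coords, cutoff=8.0):
    """Adjacency sets: nucleotides i and j are linked if their points are within cutoff Å."""
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def closeness(adj, source):
    """Closeness centrality: reachable nodes divided by the sum of shortest-path lengths."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search over the unweighted network
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Toy chain of five nucleotides spaced 6 Å apart along x (hypothetical coordinates).
coords = [(6.0 * i, 0.0, 0.0) for i in range(5)]
network = build_contact_network(coords)
degree = {i: len(nbrs) for i, nbrs in network.items()}
central = closeness(network, 2)
```

In this linear toy chain the middle nucleotide has the highest closeness, which mirrors the general idea that nucleotides embedded in densely connected regions receive different descriptor values from those at exposed ends.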

Additionally, a notable challenge in the development of computational methods lies in the scarcity of experimental data. In the PDB, experimentally determined RNA-related complexes account for only about 4% of structures [23]. Even more limited are experimentally determined RNA-binding affinity data, with only 149 nucleic acid–ligand complexes included in the PDBbind 2020 version [41]. Therefore, constructing pre-trained models to compensate for the limited training data available to machine learning models represents a promising pathway for future development. For example, RLaffinity employs a self-supervised pre-training process on RNA–ligand structures without binding affinity data, aiming to maximize embedding learning from the limited data [36]. Pre-trained models provide a valuable strategy for RNA interaction predictions by learning universal representations of RNA from large datasets, capturing the intrinsic patterns and structural information without relying on costly experimental annotations. Furthermore, exploring ensemble learning techniques may offer a way to combine multiple models for better prediction accuracy. For example, WVDL uses a weighted voting method based on RNA sequence information to combine convolutional neural networks, long short-term memory networks, and residual networks for protein binding site prediction [106].
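The idea of weighted voting can be sketched as a weighted average of per-position probabilities from several models. This is a generic soft-voting illustration and not WVDL’s implementation. The model outputs, the weights, and the 0.5 decision threshold are hypothetical.

```python
def weighted_vote(probabilities, weights):
    """Weighted average of per-model probabilities for each position."""
    total = sum(weights)
    n_positions = len(probabilities[0])
    return [
        sum(w * model[i] for w, model in zip(weights, probabilities)) / total
        for i in range(n_positions)
    ]


# Hypothetical per-nucleotide binding probabilities from three models.
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.4, 0.3]
model_c = [0.8, 0.1, 0.5]
weights = [0.5, 0.2, 0.3]  # assumed weights, e.g. derived from validation performance

combined = weighted_vote([model_a, model_b, model_c], weights)
labels = [int(p >= 0.5) for p in combined]  # assumed decision threshold
```

The third position shows why weighting matters: two of the three models score it at or below 0.5, yet the most heavily weighted model pulls the combined score just above the threshold.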

This review examines advances in the study of interactions between RNA and ligands, focusing on the mechanisms of these interactions and the methods used to predict them. Long-range forces play a key role in the recognition of RNA by ligands, while short-range forces help optimize and stabilize the complex structure. We analyze RNA–ligand interaction mechanisms from various perspectives, including sequences and structures, as well as their applications in predicting these interactions. However, the development of methods to predict binding affinity lags behind the prediction of RNA–ligand binding sites and preferences and remains in its early stages. To improve binding affinity predictions, it is crucial to integrate multidimensional RNA sequence and structural features, particularly the geometric characteristics of binding pockets. This approach can better capture the specificity of RNA in recognizing ligands and offers promising insights for future research. Additionally, pre-trained models and ensemble learning algorithms can help machine learning models make better use of scarce training data, helping to address the limitations of experimental data. As computational methods advance, we anticipate significant breakthroughs in RNA–ligand complex research, which will lay a stronger foundation for RNA-targeted drug design and therapeutic strategies.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/life15010104/s1, Figure S1. The importance of different features in the Random Forest model of ZHMol-RLinter. GE represents the geometric features, including LN and network properties; Figure S2. Example of RNA secondary structural motifs from the Arabidopsis thaliana thiamine pyrophosphate riboswitch with its regulatory ligand (PDB ID: 2CKY), including multibranch loop, bulge loop, internal loop, and hairpin loop; Figure S3. (A) The interaction of ribosomal RNA–ligand complexes (PDB ID:7JJU_A). The ribosomal RNA includes a hairpin loop, a bulge, and an internal loop. The hairpin loop has one hydrogen bond and five hydrophobic contacts with ligand (CAC). The internal loop has two hydrogen bonds and four hydrophobic contacts with ligand (GZ6). (B) The unpaired single-stranded ribosomal RNA–ligand complexes; Table S1. List of RNA and small molecules in the UNK96 testing set for ZHMol-RLinter; Table S2. The success rates of ZHMol-RLinter, RLigands, RNAmigos, and R-BIND in the UNK96 testing set; Table S3. The success rates of ZHMol-RLinter in different loop types in the UNK96 testing set.

Author Contributions

C.Z. (Chen Zhuo) collected the information on the RNA–ligand interaction and wrote the manuscript; C.Z. (Chengwei Zeng) and H.L. assisted with computational model information collection; H.W. and Y.P. assisted with the discussion of prediction methods; Y.Z. designed the project and supervised the overall study. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (grant no. 12175081 and 12204154); the Science Fund for Distinguished Young Scholars of Hubei Province (grant no. 2024AFA077); the Natural Science Foundation of Wuhan (grant no. 2024040801020302); and the Fundamental Research Funds for the Central China Normal Universities (grant no. CCNU22QN004 and CCNU24JC012). This work was also supported by the Central China Normal University’s excellent postgraduate education innovation funding project (grant no. 2024CXZZ146); Henan Province Science and Technology Research Project (grant no. 242102310354).

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Serganov, A.; Huang, L.; Patel, D.J. Coenzyme recognition and gene regulation by a flavin mononucleotide riboswitch. Nature 2009, 458, 233–237.
Loan Young, T.; Chang Wang, K.; James Varley, A.; Li, B. Clinical delivery of circular RNA: Lessons learned from RNA drug development. Adv. Drug Deliv. Rev. 2023, 197, 114826.
Falese, J.P.; Donlic, A.; Hargrove, A.E. Targeting RNA with small molecules: From fundamental principles towards the clinic. Chem. Soc. Rev. 2021, 50, 2224–2243.
Hargrove, A.E. Small molecule-RNA targeting: Starting with the fundamentals. Chem. Commun. 2020, 56, 14744–14756.
Dibrov, S.M.; Ding, K.; Brunn, N.D.; Parker, M.A.; Bergdahl, B.M.; Wyles, D.L.; Hermann, T. Structure of a hepatitis C virus RNA domain in complex with a translation inhibitor reveals a binding mode reminiscent of riboswitches. Proc. Natl. Acad. Sci. USA 2012, 109, 5223–5228.
Andreasson, J.O.L.; Savinov, A.; Block, S.M.; Greenleaf, W.J. Comprehensive sequence-to-function mapping of cofactor-dependent RNA catalysis in the glmS ribozyme. Nat. Commun. 2020, 11, 1663.
Bosshard, H.R. Molecular recognition by induced fit: How fit is the concept? Physiology 2001, 16, 171–173.
Duchardt-Ferner, E.; Weigand, J.E.; Ohlenschlager, O.; Schmidtke, S.R.; Suess, B.; Wohnert, J. Highly modular structure and ligand binding by conformational capture in a minimalistic riboswitch. Angew. Chem. Int. Ed. Engl. 2010, 49, 6216–6219.
Leulliot, N.; Varani, G. Current topics in RNA-protein recognition: Control of specificity and biological function through induced fit and conformational capture. Biochemistry 2001, 40, 7947–7956.
Haller, A.; Souliere, M.F.; Micura, R. The dynamic nature of RNA as key to understanding riboswitch mechanisms. Acc. Chem. Res. 2011, 44, 1339–1348.
Haller, A.; Rieder, U.; Aigner, M.; Blanchard, S.C.; Micura, R. Conformational capture of the SAM-II riboswitch. Nat. Chem. Biol. 2011, 7, 393–400.
Rigden, D.J.; Fernandez, X.M. The 2024 Nucleic Acids Research database issue and the online molecular biology database collection. Nucleic Acids Res. 2024, 52, D1–D9.
Berman, H.M.; Lawson, C.L.; Schneider, B. Developing Community Resources for Nucleic Acid Structures. Life 2022, 12, 540.
Mehta, A.; Sonam, S.; Gouri, I.; Loharch, S.; Sharma, D.K.; Parkesh, R. SMMRNA: A database of small molecule modulators of RNA. Nucleic Acids Res. 2014, 42, D132–D141.
Kumar Mishra, S.; Kumar, A. NALDB: Nucleic acid ligand database for small molecules targeting nucleic acid. Database 2016, 2016, baw002.
Zhang, W.; Yao, G.; Wang, J.; Yang, M.; Wang, J.; Zhang, H.; Li, W. ncRPheno: A comprehensive database platform for identification and validation of disease related noncoding RNAs. RNA Biol. 2020, 17, 943–955.
Cheng, J.; Lin, Y.; Xu, L.; Chen, K.; Li, Q.; Xu, K.; Ning, L.; Kang, J.; Cui, T.; Huang, Y.; et al. ViRBase v3.0: A virus and host ncRNA-associated interaction repository with increased coverage and annotation. Nucleic Acids Res. 2022, 50, D928–D933.
Lin, X.; Lu, Y.; Zhang, C.; Cui, Q.; Tang, Y.D.; Ji, X.; Cui, C. LncRNADisease v3.0: An updated database of long non-coding RNA-associated diseases. Nucleic Acids Res. 2024, 52, D1365–D1369.
Szpotkowski, K.; Wojcik, K.; Kurzynska-Kokorniak, A. Structural studies of protein-nucleic acid complexes: A brief overview of the selected techniques. Comput. Struct. Biotechnol. J. 2023, 21, 2858–2872.
Schmidt, M. Time-Resolved Macromolecular Crystallography at Pulsed X-ray Sources. Int. J. Mol. Sci. 2019, 20, 1401.
Danev, R.; Yanagisawa, H.; Kikkawa, M. Cryo-Electron Microscopy Methodology: Current Aspects and Future Directions. Trends Biochem. Sci. 2019, 44, 837–848.
Liu, H.; Gong, Z.; Zhao, Y. Methods and Applications in Proteins and RNAs. Life 2023, 13, 672.
Velankar, S.; Burley, S.K.; Kurisu, G.; Hoch, J.C.; Markley, J.L. The Protein Data Bank Archive. Methods Mol. Biol. 2021, 2305, 3–21.
Wu, K.E.; Zou, J.Y.; Chang, H. Machine learning modeling of RNA structures: Methods, challenges and future perspectives. Brief. Bioinform. 2023, 24, bbad210.
Xiao, H.; Yang, X.; Zhang, Y.; Zhang, Z.; Zhang, G.; Zhang, B.T. RNA-targeted small-molecule drug discoveries: A machine-learning perspective. RNA Biol. 2023, 20, 384–397.
Liu, H.; Zhuo, C.; Gao, J.; Zeng, C.; Zhao, Y. AI-integrated network for RNA complex structure and dynamic prediction. Biophys. Rev. 2024, 5, 041304.
Justyna, M.; Antczak, M.; Szachniuk, M. Machine learning for RNA 2D structure prediction benchmarked on experimental data. Brief. Bioinform. 2023, 24, bbad153.
Schneider, B.; Sweeney, B.A.; Bateman, A.; Cerny, J.; Zok, T.; Szachniuk, M. When will RNA get its AlphaFold moment? Nucleic Acids Res. 2023, 51, 9522–9532.
Bernard, C.; Postic, G.; Ghannay, S.; Tahi, F. State-of-the-RNArt: Benchmarking current methods for RNA 3D structure prediction. NAR Genom. Bioinform. 2024, 6, lqae048.
Liu, H.; Jian, Y.; Hou, J.; Zeng, C.; Zhao, Y. RNet: A network strategy to predict RNA binding preferences. Brief. Bioinform. 2023, 25, bbad482.
Su, H.; Peng, Z.; Yang, J. Recognition of small molecule-RNA binding sites using RNA sequence and structure. Bioinformatics 2021, 37, 36–42.
Zeng, C.; Jian, Y.; Vosoughi, S.; Zeng, C.; Zhao, Y. Evaluating native-like structures of RNA-protein complexes through the deep learning method. Nat. Commun. 2023, 14, 1060.
Zeng, C.; Zhuo, C.; Gao, J.; Liu, H.; Zhao, Y. Advances and Challenges in Scoring Functions for RNA-Protein Complex Structure Prediction. Biomolecules 2024, 14, 1245.
Gao, J.; Liu, H.; Zhuo, C.; Zeng, C.; Zhao, Y. Predicting Small Molecule Binding Nucleotides in RNA Structures Using RNA Surface Topography. J. Chem. Inf. Model. 2024, 64, 6979–6992.
Wang, J.; Quan, L.; Jin, Z.; Wu, H.; Ma, X.; Wang, X.; Xie, J.; Pan, D.; Chen, T.; Wu, T.; et al. MultiModRLBP: A Deep Learning Approach for Multi-Modal RNA-Small Molecule Ligand Binding Sites Prediction. IEEE J. Biomed. Health Inform. 2024, 28, 4995–5006.
Sun, S.; Gao, L. Contrastive pre-training and 3D convolution neural network for RNA and small molecule binding affinity prediction. Bioinformatics 2024, 40, btae155.
Krishnan, S.R.; Roy, A.; Gromiha, M.M. Reliable method for predicting the binding affinity of RNA-small molecule interactions using machine learning. Brief. Bioinform. 2024, 25, bbae002.
Lawson, C.L.; Berman, H.M.; Chen, L.; Vallat, B.; Zirbel, C.L. The Nucleic Acid Knowledgebase: A new portal for 3D structural information about nucleic acids. Nucleic Acids Res. 2024, 52, D245–D254.
Li, L.; Wu, P.; Wang, Z.; Meng, X.; Zha, C.; Li, Z.; Qi, T.; Zhang, Y.; Han, B.; Li, S.; et al. NoncoRNA: A database of experimentally supported non-coding RNAs and drug targets in cancer. J. Hematol. Oncol. 2020, 13, 15.
Yazdani, K.; Jordan, D.; Yang, M.; Fullenkamp, C.R.; Calabrese, D.R.; Boer, R.; Hilimire, T.; Allen, T.E.H.; Khan, R.T.; Schneekloth, J.S., Jr. Machine Learning Informs RNA-Binding Chemical Space. Angew. Chem. Int. Ed. Engl. 2023, 62, e202211358.
Liu, Z.; Li, Y.; Han, L.; Li, J.; Liu, J.; Zhao, Z.; Nie, W.; Liu, Y.; Wang, R. PDB-wide collection of binding data: Current status of the PDBbind database. Bioinformatics 2015, 31, 405–412.
Ramaswamy Krishnan, S.; Roy, A.; Michael Gromiha, M. R-SIM: A Database of Binding Affinities for RNA-small Molecule Interactions. J. Mol. Biol. 2023, 435, 167914.
Sun, S.; Yang, J.; Zhang, Z. RNALigands: A database and web server for RNA-ligand interactions. RNA 2022, 28, 115–122.
Disney, M.D.; Winkelsas, A.M.; Velagapudi, S.P.; Southern, M.; Fallahi, M.; Childs-Disney, J.L. Inforna 2.0: A Platform for the Sequence-Based Design of Small Molecules Targeting Structured RNAs. ACS Chem. Biol. 2016, 11, 1720–1728.
Donlic, A.; Swanson, E.G.; Chiu, L.Y.; Wicks, S.L.; Juru, A.U.; Cai, Z.; Kassam, K.; Laudeman, C.; Sanaba, B.G.; Sugarman, A.; et al. R-BIND 2.0: An Updated Database of Bioactive RNA-Targeting Small Molecules and Associated RNA Secondary Structures. ACS Chem. Biol. 2022, 17, 1556–1566.
Panei, F.P.; Torchet, R.; Menager, H.; Gkeka, P.; Bonomi, M. HARIBOSS: A curated database of RNA-small molecules structures to aid rational drug design. Bioinformatics 2022, 38, 4185–4193.
Zhou, T.; Wang, H.; Zeng, C.; Zhao, Y. RPocket: An intuitive database of RNA pocket topology information with RNA-ligand data resources. BMC Bioinform. 2021, 22, 428.
Haque, F.; Pi, F.; Zhao, Z.; Gu, S.; Hu, H.; Yu, H.; Guo, P. RNA versatility, flexibility, and thermostability for practice in RNA nanotechnology and biomedical applications. Wiley Interdiscip. Rev. RNA 2018, 9, e1452.
Zhuo, C.; Zeng, C.; Yang, R.; Liu, H.; Zhao, Y. RPflex: A Coarse-Grained Network Model for RNA Pocket Flexibility Study. Int. J. Mol. Sci. 2023, 24, 5497.
Auweter, S.D.; Oberstrass, F.C.; Allain, F.H. Sequence-specific binding of single-stranded RNA: Is there a code for recognition? Nucleic Acids Res. 2006, 34, 4943–4959.
McDonald, I.K.; Thornton, J.M. Satisfying hydrogen bonding potential in proteins. J. Mol. Biol. 1994, 238, 777–793.
Schwerdtfeger, P.; Wales, D.J. 100 Years of the Lennard-Jones Potential. J. Chem. Theory Comput. 2024, 20, 3379–3405.
Gilbert, S.D.; Reyes, F.E.; Edwards, A.L.; Batey, R.T. Adaptive ligand binding by the purine riboswitch in the recognition of guanine and adenine analogs. Structure 2009, 17, 857–868.
Seelam, P.P.; Mitra, A.; Sharma, P. Pairing interactions between nucleobases and ligands in aptamer:ligand complexes of riboswitches: Crystal structure analysis, classification, optimal structures, and accurate interaction energies. RNA 2019, 25, 1274–1290.
Padroni, G.; Patwardhan, N.N.; Schapira, M.; Hargrove, A.E. Systematic analysis of the interactions driving small molecule-RNA recognition. RSC Med. Chem. 2020, 11, 802–813.
Wilson, K.A.; Holland, D.J.; Wetmore, S.D. Topology of RNA-protein nucleobase-amino acid pi-pi interactions and comparison to analogous DNA-protein pi-pi contacts. RNA 2016, 22, 696–708.
Collie, G.W.; Sparapani, S.; Parkinson, G.N.; Neidle, S. Structural basis of telomeric RNA quadruplex—Acridine ligand recognition. J. Am. Chem. Soc. 2011, 133, 2721–2728.
Onofrio, A.; Parisi, G.; Punzi, G.; Todisco, S.; Di Noia, M.A.; Bossis, F.; Turi, A.; De Grassi, A.; Pierri, C.L. Distance-dependent hydrophobic-hydrophobic contacts in protein folding simulations. Phys. Chem. Chem. Phys. 2014, 16, 18907–18917.
Dill, K.A.; Ozkan, S.B.; Shell, M.S.; Weikl, T.R. The protein folding problem. Annu. Rev. Biophys. 2008, 37, 289–316.
Nandy, A.; Shekhar, S.; Paul, B.K.; Mukherjee, S. Exploring the Nucleobase-Specific Hydrophobic Interaction of Cryptolepine Hydrate with RNA and Its Subsequent Sequestration. Langmuir 2021, 37, 11176–11187.
Dezanet, C.; Kempf, J.; Mingeot-Leclercq, M.P.; Decout, J.L. Amphiphilic Aminoglycosides as Medicinal Agents. Int. J. Mol. Sci. 2020, 21, 7411.
Murray, J.B.; Meroueh, S.O.; Russell, R.J.; Lentzen, G.; Haddad, J.; Mobashery, S. Interactions of designer antibiotics and the bacterial ribosomal aminoacyl-tRNA site. Chem. Biol. 2006, 13, 129–138.
Koehl, P. Electrostatics calculations: Latest methodological advances. Curr. Opin. Struct. Biol. 2006, 16, 142–151.
Jin, S.; Wang, D.; Jin, X.; Chen, G.Z. Intramolecular electrostatics: Coulomb’s law at sub-nanometers. Chemphyschem 2004, 5, 1623–1629.
Rooman, M.; Lievin, J.; Buisine, E.; Wintjens, R. Cation-pi/H-bond stair motifs at protein-DNA interfaces. J. Mol. Biol. 2002, 319, 67–76.
Huang, L.; Wang, J.; Lilley, D.M.J. The Structure of the Guanidine-II Riboswitch. Cell Chem. Biol. 2017, 24, 695–702.E2.
Li, D.; Cao, Z.; Chen, C.; Li, H.; He, S.; Hou, X.; Liang, M.; Yang, X.; Wang, J. Nanoassembly of doxorubicin-conjugated polyphosphoester and siRNA simultaneously elicited macrophage- and T cell-mediated anticancer immune response for cancer therapy. Biomaterials 2023, 302, 122339.
Krepl, M.; Vogele, J.; Kruse, H.; Duchardt-Ferner, E.; Wohnert, J.; Sponer, J. An intricate balance of hydrogen bonding, ion atmosphere and dynamics facilitates a seamless uracil to cytosine substitution in the U-turn of the neomycin-sensing riboswitch. Nucleic Acids Res. 2018, 46, 6528–6543.
Shih, P.; Pedersen, L.G.; Gibbs, P.R.; Wolfenden, R. Hydrophobicities of the nucleic acid bases: Distribution coefficients from water to cyclohexane. J. Mol. Biol. 1998, 280, 421–430.
Roth, A.; Breaker, R.R. The structural and functional diversity of metabolite-binding riboswitches. Annu. Rev. Biochem. 2009, 78, 305–334.
Jiang, H.; Mankodi, A.; Swanson, M.S.; Moxley, R.T.; Thornton, C.A. Myotonic dystrophy type 1 is associated with nuclear foci of mutant RNA, sequestration of muscleblind proteins and deregulated alternative splicing in neurons. Hum. Mol. Genet. 2004, 13, 3079–3088.
Childs-Disney, J.L.; Yildirim, I.; Park, H.; Lohman, J.R.; Guan, L.; Tran, T.; Sarkar, P.; Schatz, G.C.; Disney, M.D. Structure of the myotonic dystrophy type 2 RNA and designed small molecules that reduce toxicity. ACS Chem. Biol. 2014, 9, 538–550.
Liu, B.; Childs-Disney, J.L.; Znosko, B.M.; Wang, D.; Fallahi, M.; Gallo, S.M.; Disney, M.D. Analysis of secondary structural elements in human microRNA hairpin precursors. BMC Bioinform. 2016, 17, 112.
Wang, K.; Jian, Y.; Wang, H.; Zeng, C.; Zhao, Y. RBind: Computational network method to predict RNA binding sites. Bioinformatics 2018, 34, 3131–3136.
Voss, N.R.; Gerstein, M. 3V: Cavity, channel and cleft volume calculator and extractor. Nucleic Acids Res. 2010, 38, W555–W562.
Kawabata, T. Detection of multiscale pockets on protein surfaces using mathematical morphology. Proteins 2010, 78, 1195–1211.
Guerra, J.V.S.; Ribeiro-Filho, H.V.; Pereira, J.G.C.; Lopes-de-Oliveira, P.S. KVFinder-web: A web-based application for detecting and characterizing biomolecular cavities. Nucleic Acids Res. 2023, 51, W289–W297.
Barik, A.; C, N.; Pilla, S.P.; Bahadur, R.P. Molecular architecture of protein-RNA recognition sites. J. Biomol. Struct. Dyn. 2015, 33, 2738–2751.
Bahadur, R.P.; Zacharias, M.; Janin, J. Dissecting protein-RNA recognition sites. Nucleic Acids Res. 2008, 36, 2705–2716.
Sato, K.; Hamada, M. Recent trends in RNA informatics: A review of machine learning and deep learning for RNA secondary structure prediction and RNA drug discovery. Brief. Bioinform. 2023, 24, bbad186.
Zhou, Y.; Chen, S.J. Advances in machine-learning approaches to RNA-targeted drug design. Artif. Intell. Chem. 2024, 2, 100053.
Morishita, E.C.; Nakamura, S. Recent applications of artificial intelligence in RNA-targeted small molecule drug discovery. Expert Opin. Drug Discov. 2024, 19, 415–431.
Zeng, P.; Li, J.; Ma, W.; Cui, Q. Rsite: A computational method to identify the functional sites of noncoding RNAs. Sci. Rep. 2015, 5, 9179.
Zeng, P.; Cui, Q. Rsite2: An efficient computational method to predict the functional sites of noncoding RNAs. Sci. Rep. 2016, 6, 19016.
Wang, K.; Zhou, R.; Wu, Y.; Li, M. RLBind: A deep learning method to predict RNA-ligand binding sites. Brief. Bioinform. 2023, 24, bbac486.
Oliver, C.; Mallet, V.; Gendron, R.S.; Reinharz, V.; Hamilton, W.L.; Moitessier, N.; Waldispuhl, J. Augmented base pairing networks encode RNA-small molecule binding preferences. Nucleic Acids Res. 2020, 48, 7690–7699.
Zhuo, C.; Gao, J.; Li, A.; Liu, X.; Zhao, Y. A Machine Learning Method for RNA-Small Molecule Binding Preference Prediction. J. Chem. Inf. Model. 2024, 64, 7386–7397.
Waterhouse, A.M.; Studer, G.; Robin, X.; Bienert, S.; Tauriello, G.; Schwede, T. The structure assessment web server: For proteins, complexes and more. Nucleic Acids Res. 2024, 52, W318–W323.
Lukasiak, P.; Antczak, M.; Ratajczak, T.; Szachniuk, M.; Popenda, M.; Adamiak, R.W.; Blazewicz, J. RNAssess—A web server for quality assessment of RNA 3D structures. Nucleic Acids Res. 2015, 43, W502–W506.
Mirabello, C.; Wallner, B. DockQ v2: Improved automatic quality measure for protein multimers, nucleic acids, and small molecules. Bioinformatics 2024, 40, btae586.
Nithin, C.; Kmiecik, S.; Blaszczyk, R.; Nowicka, J.; Tuszynska, I. Comparative analysis of RNA 3D structure prediction methods: Towards enhanced modeling of RNA-ligand interactions. Nucleic Acids Res. 2024, 52, 7465–7486.
Bai, Y.; Das, R.; Millett, I.S.; Herschlag, D.; Doniach, S. Probing counterion modulated repulsion and attraction between nucleic acid duplexes in solution. Proc. Natl. Acad. Sci. USA 2005, 102, 1035–1040.
Chu, V.B.; Herschlag, D. Unwinding RNA’s secrets: Advances in the biology, physics, and modeling of complex RNAs. Curr. Opin. Struct. Biol. 2008, 18, 305–314.
Sun, L.Z.; Zhang, J.X.; Chen, S.J. MCTBI: A web server for predicting metal ion effects in RNA structures. RNA 2017, 23, 1155–1165.
Zhao, C.; Zhang, D.; Jiang, Y.; Chen, S.J. Modeling Loop Composition and Ion Concentration Effects in RNA Hairpin Folding Stability. Biophys. J. 2020, 119, 1439–1455.
Xu, J.; Cotruvo, J.A., Jr. Iron-responsive riboswitches. Curr. Opin. Chem. Biol. 2022, 68, 102135.
Cruz, M.A.; Frederick, T.E.; Mallimadugula, U.L.; Singh, S.; Vithani, N.; Zimmerman, M.I.; Porter, J.R.; Moeder, K.E.; Amarasinghe, G.K.; Bowman, G.R. A cryptic pocket in Ebola VP35 allosterically controls RNA binding. Nat. Commun. 2022, 13, 2269.
Schroeder, G.M.; Cavender, C.E.; Blau, M.E.; Jenkins, J.L.; Mathews, D.H.; Wedekind, J.E. A small RNA that cooperatively senses two stacked metabolites in one pocket for gene control. Nat. Commun. 2022, 13, 199.
Petushkov, I.; Pupov, D.; Bass, I.; Kulbachinskiy, A. Mutations in the CRE pocket of bacterial RNA polymerase affect multiple steps of transcription. Nucleic Acids Res. 2015, 43, 5798–5809.
Liu, H.; Zhao, Y. Integrated modeling of protein and RNA. Brief. Bioinform. 2024, 25, bbae139.
Vicens, Q.; Mondragon, E.; Batey, R.T. Molecular sensing by the aptamer domain of the FMN riboswitch: A general model for ligand binding by conformational selection. Nucleic Acids Res. 2011, 39, 8586–8598.
Yang, R.; Liu, H.; Yang, L.; Zhou, T.; Li, X.; Zhao, Y. RPpocket: An RNA-Protein Intuitive Database with RNA Pocket Topology Resources. Int. J. Mol. Sci. 2022, 23, 6903.
Wang, H.W.; Wang, K.L.; Guan, Z.Y.; Jian, Y.R.; Jia, Y.; Kashanchi, F.; Zeng, C.; Zhao, Y.J. Computational study of non-catalytic T-loop pocket on CDK proteins for drug development. Chin. Phys. B 2017, 26, 128702.
Wang, H.; Qiu, J.; Liu, H.; Xu, Y.; Jia, Y.; Zhao, Y. HKPocket: Human kinase pocket database for drug design. BMC Bioinform. 2019, 20, 617.
Hewitt, W.M.; Calabrese, D.R.; Schneekloth, J.S., Jr. Evidence for ligandable sites in structured RNA throughout the Protein Data Bank. Bioorg. Med. Chem. 2019, 27, 2253–2260.
Pan, Z.; Zhou, S.; Liu, T.; Liu, C.; Zang, M.; Wang, Q. WVDL: Weighted Voting Deep Learning Model for Predicting RNA-Protein Binding Sites. IEEE/ACM Trans. Comput. Biol. Bioinform. 2023, 20, 3322–3328.

Figure 1. RNA features at different levels are used to predict RNA–ligand interactions, including sequence, secondary structure, and tertiary structure features.

Figure 2. (A) The structure distribution of PDB and NAKB databases, and the number of NAKB database structures over time. (B) The type distribution of RNA–ligand databases, including structure, interaction, and RNA pocket databases.

Figure 3. (A) Hydrogen bonding patterns induced by the ligand (2BP) to interact with nucleobases in the crystal structure of guanine riboswitch (PDB ID: 3G4M). (B) Stacking patterns induced by the ligand (R14) to interact with a telomeric RNA G-quadruplex (PDB ID: 3MIJ). (C) The hydrophobicity-driven contacts induced by the ligand (AB6) binding to the ribosomal aminoacyl tRNA (PDB ID: 2FU4). (D) The cation–π interaction formed between the ligand (GAI) and the guanidine II riboswitch (PDB ID: 5NEF).

Figure 4. RNA structure and its features are organized into three hierarchical levels: sequence, secondary structure, and tertiary structure. Various physical interaction forces help stabilize the complex structures formed between RNA and other molecules, such as 2KZT.

Figure 5. Timeline of the development of RNA–ligand interaction prediction, including binding site, binding affinity, and binding preference prediction.

Figure 6. The performance of different methods for RNA–ligand binding site prediction was evaluated on the benchmark dataset TE18, including precision (A), recall (B), and Matthews correlation coefficient (MCC) (C).
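
The three metrics reported in Figure 6 can be computed from per-nucleotide binary labels (1 = binding site, 0 = non-binding site). The following is a minimal sketch in Python using only the standard library; the function name `binding_site_metrics` and the example labels are hypothetical and are not taken from TE18 or from any of the benchmarked tools.

```python
import math


def binding_site_metrics(y_true, y_pred):
    """Return (precision, recall, mcc) for binary per-nucleotide labels.

    y_true and y_pred are equal-length sequences of 0/1 values, where 1
    marks a nucleotide as part of a ligand binding site.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Precision: fraction of predicted binding nucleotides that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of true binding nucleotides that are recovered.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # MCC: balanced measure that also accounts for true negatives;
    # defined here as 0.0 when the denominator vanishes (a common convention).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, mcc


# Hypothetical 8-nucleotide example (illustrative labels only).
example_true = [1, 1, 0, 0, 1, 0, 0, 0]
example_pred = [1, 0, 0, 0, 1, 1, 0, 0]
precision, recall, mcc = binding_site_metrics(example_true, example_pred)
print(f"precision={precision:.3f} recall={recall:.3f} mcc={mcc:.3f}")
```

Because binding nucleotides are usually a small minority of an RNA chain, MCC is often more informative than precision or recall alone, since it also penalizes errors on the non-binding class.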

Figure 7. (A) The performance of ZHMol-RLinter, R-BIND, RLigands, and RNAmigos on the challenging UNK96 testing set. (B) The distribution of motif types in the UNK96 testing set, including hairpin loops, internal loops, bulge loops, and multibranch loops. (C) The performance of ZHMol-RLinter, R-BIND, RLigands, and RNAmigos on different loop motifs within the UNK96 testing set.

Table 1. List of RNA–ligand databases, including release year, description, data availability, and reference.

Database	Year	Description	Data Availability	Reference
SMMRNA	2014	The database of small molecule modulators along with their target RNA and experimentally determined binding data.	N/A	[14]
NALDB	2016	The database provides detailed information about the experimental data of small molecules that were reported to target all types of nucleic acid structures.	https://www.iiti.ac.in/people/~amitk/bsbe/naldb/HOME.php (9 December 2024)	[15]
Inforna 2.0	2017	A small molecule design platform for structured RNAs that integrates all known RNA motif–small molecule binding partners reported in the scientific literature.	N/A	[44]
NoncoRNA	2020	A manually curated database of ncRNAs and drug target associations designed to provide a potential resource of high-quality data for the exploration of drug sensitivity-related ncRNAs in a variety of human cancers.	http://www.ncdtcdb.cn:8080/NoncoRNA/ (9 December 2024)	[39]
RPocket	2021	An intuitive database of RNA pocket topology information with RNA–ligand data resources that provides geometrical size, centroid, shape, and secondary structure elements of RNA pockets.	http://zhaoserver.com.cn/RPocket/RPocket.html (9 December 2024)	[47]
R-BIND	2022	An updated database of bioactive RNA-targeting small molecules and associated RNA secondary structures.	https://rbind.chem.duke.edu (9 December 2024)	[45]
RNALigands	2022	A database of RNA secondary structure motifs and small molecular ligands.	https://github.com/SaisaiSun/RNALigands (9 December 2024)	[43]
HARIBOSS	2022	A database on the pocket structure and physicochemical properties of ligands and RNA.	http://hariboss.pasteur.cloud (9 December 2024)	[46]
R-SIM	2023	An experimentally validated database of RNA–small molecule interactions that provides comprehensive information on the sequence, structure, and classification of RNA, various physicochemical properties of small molecules, binding affinities, and the literature sources of the data.	https://web.iitm.ac.in/bioinfo2/R_SIM/index.html (9 December 2024)	[42]
ROBIN	2023	A library of nucleic acid binders identified by small molecule microarray (SMM) screening, reporting 2003 RNA-binding small molecules and representing the largest fully publicly available experimentally derived library to date.	https://github.com/ky66/ROBIN (9 December 2024)	[40]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
