Article

Determining the Identity of Nucleotides and the Energy of Binding of tRNAs to Their Aminoacyl-tRNA Synthetases Using a Simple Logistic Model

by Piotr H. Pawłowski 1,* and Piotr Zielenkiewicz 1,2
1 Institute of Biochemistry and Biophysics, Polish Academy of Sciences, 02-106 Warszawa, Poland
2 Laboratory of Systems Biology, Institute of Experimental Plant Biology and Biotechnology, Faculty of Biology, University of Warsaw, Miecznikowa 1, 02-096 Warsaw, Poland
* Author to whom correspondence should be addressed.
Life 2024, 14(10), 1328; https://doi.org/10.3390/life14101328
Submission received: 11 September 2024 / Revised: 10 October 2024 / Accepted: 12 October 2024 / Published: 18 October 2024
(This article belongs to the Special Issue What Is Life?)

Abstract

This study showed that the predictor in logistic regression can be applied to estimating the Gibbs free energy of tRNAs’ recognition of and binding to their aminoacyl-tRNA synthetases. Then, 24 linear logistic regression models predicting different classes of tRNAs loaded with a corresponding amino acid were trained in a machine learning classification method, reducing the misclassification error to zero. The models were based on minimal subsets of Boolean explanatory variables describing the favored presence of nucleotides or nucleosides localized in the different parts of the tRNA. In 90% of cases, they agree with the components of the consensus strand in a class of tRNAs loaded by a given amino acid. According to the proposed theoretical model, the values of the free energy for entering the recognition state in the process of tRNA charging were obtained, and the inputs from identity nucleotides and the tRNA strand backbone were distinguished. Almost all the resulting models indicated leading anticodon tandems defining the first and second positions of the anticodon (positions 35 and 36 of the tRNA strand) and the small sets (up to six positions) of the other nucleotides as the natural identity nucleotides most influential in the free energy balance. The magnitude of their input to this energy depends on the position in the strand, favoring positions −1, 35, and 36. The role of position 34 is relatively smaller. These identity attributes may not always be fully arranged in a real single adaptor molecule but are comprehensively present in a given tRNA class. A detailed analysis of the resulting models showed that the absolute value of the energy of binding the tandem 35–36 decreases with the number of identity positions, as well as with the decreasing number of possible hydrogen bonds. On the other hand, in these conditions, the absolute value of the energy of binding of other identity nucleotides increases. 
All the models indicate that the nucleotide-independent energy of the repulsion tRNA backbone decreases with the number of identity nucleotides. It was also shown that the total free energy change in entering the recognition state increases with the amino acid mass, making this process less spontaneous, which may have an evolutionary reference.

1. Introduction

1.1. Background

Proteins perform several critical biological functions as enzymes, structural proteins, or hormones. Their synthesis occurs inside each cell in two basic stages, transcription and translation, accompanied by amino acid delivery. Transcription is a process in which a linear section of DNA encoding a protein (gene) is converted into a molecular template called messenger RNA (mRNA). During translation, the mRNA is read by ribosomes, which use the nucleotide sequence of mRNA as a recipe to synthesize the corresponding polypeptide chain of a future protein. Progressing along the mRNA strand, ribosomes catalyze the formation of covalent peptide bonds between amino acids delivered by specialized transfer RNAs (tRNAs). Each tRNA consists of a few dozen nucleotides, attaches a free amino acid in the cytoplasm, and transports it to the location defined by a three-nucleotide sequence of mRNA (codon), which is recognized by the complementary three-nucleotide anticodon located at one end of the tRNA molecule. As the anticodons commonly and easily determine the amino acid load of a given tRNA, the resulting genetic code, i.e., the strict correspondence between codons and amino acids of the synthesized polypeptide, is a fundamental property of life. The basis of this unambiguous and accurate attachment of an amino acid to the CCA-3′ end of the tRNA molecule is a two-step reaction, i.e., amino acid activation with ATP and the transfer to the attachment site, catalyzed by aminoacyl-tRNA synthetases (aaRSs). The biological necessity of specific tRNA aminoacylation reactions relates to the structure of a tRNA, which determines its identity. The adequate rigidity and plasticity of the typical L-shaped tRNA architecture are essential for tRNA interactions with aaRSs, which require conformational changes and local contact of some nucleotide residues with amino acids of the synthetase (aaRS). 
Thus, as suggested by crystallography, the identity is ensured by a small number of “identity nucleotides” in contact with an aaRS, predominantly located at distal regions of the tRNA molecule [1].
aaRS and tRNA recognition with identity nucleotides is the primary focus of the present study. Its innovation lies in the objective algorithmic selection of identity nucleotides and the description of the identifying interactions of tRNA and aaRSs in the language of Gibbs free energy change, i.e., the maximal work that can be completed in tRNA-aaRS binding at constant temperature and pressure. This work was estimated thanks to a proposed original theoretical model with a working hypothesis postulating the same statistical correctness of natural (thermodynamic) recognition and proper artificial classification using the machine learning algorithm of logistic regression.

1.2. Historical Perspective

Protein synthesis, according to the recipe written in three-letter code [2] carried by the messenger RNA (mRNA), using amino acids transported by transfer RNA (tRNA) [3] is precisely controlled by ribosomes [4]. The earlier recognition and control of a tRNA load with a given amino acid (aa) by aminoacyl-tRNA synthetases [5] is less understood. In this process, the obvious specific role of the anticodon and the universal role of the attachment site CCA-3′ may be supplemented by the other nucleotide components of the tRNA molecule (Supplementary Materials, Table S1). In the present work, we tried to confirm our expectations and find the other elements that significantly contribute to the process of tRNA loading. This intention was anticipated to be achieved by estimating their input to the change in the free energy during the recognition phase of the tRNA-loading process.
The existence of supplementary recognition elements is indicated by the enzymatic “superspecificity” of synthetases. It is widely accepted that ribosomal superspecificity during translational reading is based on the complementarity of the three-letter codon and anticodon, closely exposed by mRNA and tRNA strands. This is not the case for synthetases, which, as was shown for glutaminyl-tRNA synthetase [6], may require interactions between other tRNA identity nucleotides, at very distant positions of the strand, and their protein recognition sites. These identifying elements are most commonly located in the tRNA anticodon, the acceptor stem, and the associated “discriminator” base, position 73 (Figure 1) [7]. Synthetases may sometimes recognize nucleotides in the tRNA variable pocket [8] and certain motifs of the tRNA structure [9]. The crystallographic analyses of the synthetase/tRNA complexes confirm that these macromolecules may interact in a stereochemically complementary manner [10].
The need for additional identity nucleotides may arise from the specific forward position 34 in the tRNA strand (the third place of the anticodon) and the non-complementary nature of the protein recognition site. These conditions mean that the precision of the recognition of the nucleotide occupying this place could be, in the case of tRNA charging, less than in the translational reading. Thus, to avoid pre-translational errors, these local recognitions must be supported by recognizing nucleotides in other positions. This is especially important for amino acids with a more strictly defined third letter of the code. Additionally, in the case of 6-fold degenerate amino acids, which are decoded by as many as six codons (Leu, Ser, Arg), some tRNAs for the same amino acid may expose the different leading nucleotide tandem of the anticodon, which can also cause additional uncertainty in recognition by synthetases. In this case, the long variable arm of tRNA functions as an important discrimination element [11]. Furthermore, the two forms of lysyl-tRNA synthetase [12], to operate properly, require the recognition of additional anticodon-independent identification. On the other hand, the proper fit of the tRNA acceptor site to the catalytic center of synthetase may also require additional identity nucleotides.

1.3. The Goal and the Idea

The goal of this study is to provide an indication and a comprehensive analysis of the identity nucleotides of tRNA, crucial for its successful enzymatic recognition by aaRS. To reach this goal, a classification model explaining different amino acid loads using a selected specific nucleotide content of the tRNA strand was trained by machine learning on 511 transfer RNA sequences from 90 different species, taken from a public database. The idea came from the working hypothesis that natural recognition and proper artificial classification both involve the extraction and analysis of the object attributes, namely, identity nucleotides, leading to semantically the same result. Thus, a proper classification method could offer a theoretical model of natural recognition of tRNA by the aminoacyl-tRNA synthetases and reveal the identity nucleotides, their type, and their position. As a consequence, we may expect that the ratio of the probability of classification success to the probability of classification error equals the odds of the recognition process in nature. Assuming that the natural recognition resembles the thermodynamic process of crossing the energy barrier between the states of non-recognized and recognized tRNA (in both directions), we could estimate the energetic constraints of this process. In this picture (Figure 2), the energy to be overcome (ΔGfor) represents the energy required to bring the tRNA closer and place it at the correct distance and orientation relative to the synthetase anticodon binding domain and the catalytic center. The decrease in the energy after crossing the energy barrier (ΔGrev) is a measure of the stability of the tRNA-synthetase complex achieved through the local interactions necessary for the catalysis of esterification of an amino acid to the 3′ end of a tRNA. 
In the discussed case, ΔG = ΔGfor + ΔGrev, where ΔGfor > 0 and ΔGrev < 0; thus, ΔGfor corresponds to the repulsion, and ΔGrev corresponds to the attraction between the tRNA and its aminoacyl-tRNA synthetase.
According to the presented picture, the free and bound states of tRNA and aaRS are separated by a transition state in which intermolecular attraction and repulsion balance each other. The interaction is predominantly repulsive at longer distances and partially stabilized at shorter distances, before the next step in the process of tRNA loading with an amino acid. We may expect that ΔGfor is built by effectively repulsive long-range interactions (e.g., electrostatic and hydrophobic) and ΔGrev by effectively attractive short-range interactions between selected tRNA nucleotides and aaRS amino acids. In this scheme, certain nucleotides may have the opposite, repulsive effect without changing the overall picture of short-range attraction.
It is worth adding that artificial classification can be performed using different tools, such as simple observations and basic statistical analysis. This paper proposes a machine learning algorithm offering fast big data analysis and objectivity of algorithmic decisions. The advantage of this method is the objective automatic selection of identity nucleotides and analytical description of the recognition process using the so-called predictor function, which quantitatively scores the contribution of each identity nucleotide to the selection of a given tRNA class (amino acid load) and, in this way, evaluates its importance. An inherent positive feature of the machine learning method is that it automatically improves the classification quality through experience.

1.4. The Importance of Positions—Initial Findings

To perform classification tasks, 24 different classes of tRNA were considered. They were distinguished by the charging amino acids and, where applicable, by the different forms of synthetase (i.e., LysI and LysII) [13] or the specific degeneration in the first position of the anticodon (e.g., Leu1 = {GA *} and Leu2 = {AA *}). As attributes of the classes, the nucleotides or nucleosides at a given position in the tRNA strand (−1, 1, 2, 3…76) were used.
One may expect the dominant role of the anticodon in determining the amino acid content of tRNA. The preliminary attempts with the correlation attribute evaluator (Figure 3) showed the highest importance of positions 35 and 36 of the tRNA strand and, surprisingly, relatively low importance of the 3rd letter in the anticodon (position 34 of the tRNA), which is in 41st place. The latter finding confirms the need to look for the other identity nucleotides and, thus, in light of the previous comments, for a theoretical classifier properly modeling natural recognition, which is the topic of the presented work. Thus, the central research question is “Do the nucleotides beyond the anticodon in tRNA significantly contribute to the specificity of aaRS recognition and loading of amino acids?”

1.5. General Observations

A proper classifier for the best assignment of amino acid to a given tRNA was revealed in preliminary numeric experiments. It builds linear logistic regression models, minimizing logistic loss with the LogitBoost algorithm [14,15]. With this classifier, 100% accurate models for amino acid load were obtained, describing the significant positive or negative impact of choosing a given nucleotide at a given position. This way, the most important ensembles (up to eight positions) and their members in the tRNA strand (adding negative or positive input to the free energy) could be indicated, especially in the tRNA anticodon and the position 37, and also in the acceptor stem at the 5′ end position 2, at the 3′ end position 73, in the D-loop position 21 (Figure 4a) and in the variable loop position 48 (Figure 4b). Statistical analysis of the real tRNA sequences showed that the average of the minimal occurrence of the representants of possible identity nucleotides in a given class is 71 ± 15%. This means they may not always be fully arranged in a real single adaptor molecule but were comprehensively present in a given tRNA class (Figure 5). The full occurrence of predicted identity nucleotides, at least in one strand, was only observed in the case of 67% of tRNA classes.

1.6. Main Findings

In the detailed analysis and according to the working hypothesis, the values of the free energies for crossing the energy barrier (forward and reverse, i.e., entering and returning from the recognition state in the process of tRNA charging) were obtained, as well as the contributions to the Gibbs free energy change. The latter was possible due to a specific advantage of the prediction functions of logistic regression models, which, assuming the working hypothesis is correct, may be easily interpreted as non-dimensional changes in the free energy; this interpretation was used in the proposed theoretical model (see the Materials and Methods). The magnitude of the identity nucleotide input to the discussed energy depends on the position in the strand, favoring positions −1, 35, and 36. Detailed analysis of the resulting models shows that the height of the reverse barrier related to anticodon tandems decreases with the number of the identity nucleotides, corresponding to the decreasing hydrogen bonding. On the other hand, the height of the reverse barrier related to the other identity nucleotides increases with the total number of identity nucleotides. Similarly, the total free energy change in entering the binding state increases with the amino acid mass (see the Discussion). It appears that, apart from the anticodon, the identity nucleotides add additional binding energy to overcome a certain energy level specific to a given amino acid class. Thus, the universal genetic code is supported by a precisely distributed quantity of binding energy in the tRNA–ligase interaction. However, this process is not efficient enough to keep the strong binding unchanged, such as for tRNAs transporting low-mass amino acids. It was also noticed that some subsets of the identity nucleotides, together with the anticodon tandem, are always and only present in a given tRNA class, being universal markers. 
Some other subsets, even without anticodon tandem, are unique, i.e., they may only be present in the tRNA of a given class, but they are not universal. Both define something like a pre-translational recognition code. The above findings indicate the possible mechanical and informational role of the unique sites outside the anticodon and raise questions about the direction of the evolution of these sites, which we attempt to answer in the Discussion.

2. Materials and Methods

2.1. Data Collection

Data containing tRNA sequences (nuclear, mitochondrial, and plastid) of molecules transferring the corresponding amino acid (aa) were taken from the tRNAdb (Transfer RNA Database) [16] (link: http://trnadb.bioinf.uni-leipzig.de accessed on 28 March 2023). The data were reviewed to remove duplicates, and entries containing bases other than the standard ones (A, U, G, and C) in positions 35 and 36 were excluded. Finally, 511 entries were collected as a training set, each described by attributes giving the nucleotide or nucleoside at positions −1, 1, 2, 3…76, and by the class attribute. They come from three kingdoms, i.e., Bacteria (140), Archaea (68), and Eukaryota (303), represented by 90 different organisms. The assumed position numbering is presented in Figure 1. The list of the 67 nucleot(s)ides considered and their derivatives, the symbols used, and the one-letter codes is presented in Supplementary Materials, Table S1. Empty positions were also considered.
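The cleaning step described above can be illustrated with a minimal sketch. The entry layout (a dictionary with a "class" label and a "positions" map) and the function name clean_entries are hypothetical and chosen only for illustration; the actual review of the tRNAdb records may have been carried out differently.

```python
# Minimal sketch of the data-cleaning step (hypothetical data layout).
STANDARD_BASES = {"A", "U", "G", "C"}

def clean_entries(entries):
    """Drop duplicates and entries with a non-standard base at position 35 or 36.

    Each entry is assumed to look like
    {"class": "Gln", "positions": {-1: "-", 1: "G", ..., 76: "A"}}.
    """
    seen = set()
    kept = []
    for entry in entries:
        pos = entry["positions"]
        # Exclude entries whose anticodon tandem is not built from A, U, G, C.
        if pos.get(35) not in STANDARD_BASES or pos.get(36) not in STANDARD_BASES:
            continue
        # Two entries count as duplicates if class and all positions coincide.
        key = (entry["class"], tuple(sorted(pos.items())))
        if key in seen:
            continue
        seen.add(key)
        kept.append(entry)
    return kept

toy_entries = [
    {"class": "Gln", "positions": {34: "C", 35: "U", 36: "G"}},
    {"class": "Gln", "positions": {34: "C", 35: "U", 36: "G"}},  # duplicate
    {"class": "Gln", "positions": {34: "C", 35: "P", 36: "G"}},  # modified base at 35
    {"class": "His", "positions": {34: "G", 35: "U", 36: "G"}},
]
cleaned = clean_entries(toy_entries)
```

Modified nucleosides outside positions 35 and 36 are kept on purpose, since the training attributes include the 67 nucleot(s)ides and their derivatives.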
Twenty-four different classes of tRNA were considered. They were defined according to the charging amino acids. In the case of a 2-fold degenerate first and/or second anticodon position (Arg, Leu, Ser), an additional class was attributed (e.g., Ser = {AG *}, Ser2 = {UC *}). The tRNAs loaded by different forms of lysyl-tRNA synthetase (i.e., LysI and LysII) were also distinguished. A list of the distinguished classes, tRNAaa, is presented in Table 1.
Data analysis was performed with Weka 3.8.5 [17], a collection of machine learning algorithms for data mining tasks.

2.2. Preliminary Experiments Methods

In preliminary experiments, the attribute importance (Pearson’s correlation ranking between the attribute and the class) for the full training set was evaluated using CorrelationAttributeEval, with the Ranker -T -1.7976931348623157E308 -N -1.
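As a rough illustration of correlation-based attribute ranking, the sketch below computes Pearson's correlation between one Boolean attribute [posj = Nk] and the indicator of one class, then sorts the attributes by absolute correlation. This is a simplified stand-in written for this text, not Weka's CorrelationAttributeEval, which aggregates over attribute values and classes in its own way; the toy entries and the function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long numeric lists (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def rank_boolean_attributes(entries, target_class):
    """Rank (position, nucleotide) attributes by |r| with the indicator [class = target_class]."""
    y = [1.0 if e["class"] == target_class else 0.0 for e in entries]
    attributes = sorted({(j, n) for e in entries for j, n in e["positions"].items()})
    scored = []
    for j, n in attributes:
        x = [1.0 if e["positions"].get(j) == n else 0.0 for e in entries]
        scored.append(((j, n), abs(pearson(x, y))))
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy data only: two classes sharing the anticodon tandem but differing at position 73.
toy = [
    {"class": "Gln", "positions": {35: "U", 36: "G", 73: "G"}},
    {"class": "Gln", "positions": {35: "U", 36: "G", 73: "A"}},
    {"class": "His", "positions": {35: "U", 36: "G", 73: "C"}},
    {"class": "His", "positions": {35: "U", 36: "G", 73: "C"}},
]
ranking = rank_boolean_attributes(toy, "His")
```

In this toy set the shared tandem carries no information about the class (zero variance), so the position-73 attribute ranks first.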
To find the classifier algorithm of the best predictability of the tRNA class, five different classifiers were trained on a full training set: Dl4jMlpClassifier, LibLINEAR, RandomForest, SimpleLogistic, and SMO. The benchmark was determined using the ZeroR algorithm. Parameters of the training processes were accepted as pre-defined in the Weka environment, except the SimpleLogistic classifier, where an option of useCrossValidation = False was chosen to prevent the data order influence. The classifiers were evaluated using the correctness of predictions in a 66% split, 10-fold cross-validation, and full training set.

2.3. Final Experiments Methods

Finally, the classification tasks with the chosen SimpleLogistic classifier, minimizing misclassification error for the training set, were performed using the weka.classifiers.functions scheme: SimpleLogistic -I 0 -S -M 500 -H 50 -W 0.0. Linear predictor function fi for amino acid class aai was used in the following form:
fi: = bi + Σjk pijk × [posij = Nk]  i = 1…24, j = −1…76, k = 1…67
where [posij = Nk] is a Boolean explanatory variable (value of 0 or 1) for the tRNA from class aai, describing the occurrence of the nucleot(s)ide Nk at the strand position j. In this linear model, pijk is the parameter indicating the relative effect of a particular explanatory variable on the value of the predictor, and bi is the bias, describing the part of the predictor value that cannot be explained by the specific criterion postulated by the model. The symbol Σjk indicates the sum of the components indicated by indexes j and k. In the considered case, index i corresponds to Table 1, strand position j corresponds to Figure 1, and nucleotide k corresponds to Supplementary Materials, Table S1.
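The predictor of Equation (1) can be sketched in a few lines. The bias and weights below are invented toy values, not parameters from Supplementary Table S2, and the function name predictor is hypothetical.

```python
def predictor(bias, weights, positions):
    """f_i = b_i + sum_jk p_ijk * [pos_j = N_k] for one tRNA strand.

    weights maps (position j, nucleotide N_k) to p_ijk;
    positions maps strand position j to the nucleot(s)ide found there.
    """
    return bias + sum(
        p for (j, nucleotide), p in weights.items()
        if positions.get(j) == nucleotide  # Boolean variable [pos_j = N_k]
    )

# Toy parameters for a single hypothetical class (illustrative values only).
toy_bias = -2.0
toy_weights = {(35, "U"): 3.0, (36, "C"): 2.5, (73, "G"): 1.0}

# A strand matching the tandem at 35 and 36 but not the attribute at position 73.
toy_strand = {35: "U", 36: "C", 73: "A"}
f_value = predictor(toy_bias, toy_weights, toy_strand)
```

Only the attributes actually present in the strand contribute, so the same model yields different predictor values for strands carrying different subsets of the selected nucleotides.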

2.4. Statistical Analysis

A detailed review of the examined real tRNA strands and a statistical analysis of the results of the SimpleLogistic classification were performed to obtain the consensus strands, the histograms of position usage, and the averages of the considered energies, using standard formulas in the MO 2007 Excel spreadsheet (Microsoft Corporation, Redmond, WA, USA, 2018, link: https://office.microsoft.com).
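One simple way to derive a consensus strand is to take the most frequent nucleot(s)ide at each position within a class. The sketch below assumes this majority rule and marks empty positions with "-"; both choices are assumptions made for illustration, and the spreadsheet formulas actually used may differ.

```python
from collections import Counter

def consensus_strand(strands, empty="-"):
    """Return the most frequent symbol at every position seen in the class.

    strands is a list of {position: nucleot(s)ide} maps; a missing position
    is counted as the (assumed) empty symbol.
    """
    positions = sorted({j for strand in strands for j in strand})
    consensus = {}
    for j in positions:
        counts = Counter(strand.get(j, empty) for strand in strands)
        consensus[j] = counts.most_common(1)[0][0]
    return consensus

# Toy class of three strands (illustrative only).
toy_class = [
    {35: "U", 36: "G", 73: "G"},
    {35: "U", 36: "G", 73: "G"},
    {35: "U", 36: "G", 73: "A"},
]
toy_consensus = consensus_strand(toy_class)
```

Ties between equally frequent symbols are resolved by the order of first occurrence, which a real analysis would want to handle explicitly.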

2.5. The Theoretical Model of Machine Learning Simulation of tRNA Binding to Aminoacyl-tRNA Synthetase

Let us consider the intermediate stage of tRNA charging [18], i.e., the recognized tRNAaa binding to the corresponding preloaded [Aminoacyl-AMP aaRS] complex
tRNAaa + [Aminoacyl-AMP aaRS] → [tRNAaa Aminoacyl-AMP aaRS]
for each amino acid (aa) and aminoacyl-tRNA synthetase (aaRS) of the genetic code. Twenty-four amino acid classes were considered. This number results from distinguishing the binding of tRNAs for the same amino acid but with different anticodon tandems (positions 35 and 36), as for Leu, Arg, and Ser. Furthermore, two classes of lysyl-tRNA synthetases are distinguished.
When [Aminoacyl-AMP aaRS] complexes for all coded amino acids are equally available, the probability pii of binding the tRNAaai for the i-th amino acid to the corresponding i-th complex may be described by the Boltzmann factors as
$$p_{ii} = \frac{\exp\left(-\Delta G_{ii}/k_B T\right)}{1 + \sum_{j=1}^{n} \exp\left(-\Delta G_{ij}/k_B T\right)}, \qquad i = 1, \ldots, 24$$
where ΔGij is the Gibbs free energy of binding the tRNAaai to the j-th preloaded complex (for the j-th amino acid), T is the absolute temperature [K], kB is Boltzmann’s constant, and n is the number of coded amino acids.
If it is assumed that a properly recognized tRNAaai binds much more strongly than the others (ΔGii ≪ ΔGij for j ≠ i), the above distribution (Equation (3)) can be simplified as
$$p_{ii} = \frac{\exp\left(-\Delta G_{ii}/k_B T\right)}{1 + \exp\left(-\Delta G_{ii}/k_B T\right)}, \qquad i = 1, \ldots, 24$$
Thus, the logit function, or ln(oddsi), defined for different i
logit(pii) = ln(pii/(1 − pii))  i = 1, … 24
can be expressed as
logit(pii) = −ΔGii/kBT  i = 1, … 24
The above rewritten equation allows the assignment of probabilistic meaning to the free energy of binding, i.e.,
ΔGii = − kBT logit(pii)  i = 1, … 24
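The chain from the Boltzmann distribution to Equation (7) can be checked numerically. The sketch below works in units of kBT and uses invented energies; the function names are hypothetical. It compares the full distribution with its simplified form and shows that −logit(p) recovers the binding free energy.

```python
from math import exp, log

def binding_probability(dg_ii, dg_all):
    """Full distribution: exp(-dG_ii) / (1 + sum_j exp(-dG_ij)), energies in kBT."""
    return exp(-dg_ii) / (1.0 + sum(exp(-g) for g in dg_all))

def simplified_probability(dg_ii):
    """Simplified form, valid when the cognate binding dominates the sum."""
    return exp(-dg_ii) / (1.0 + exp(-dg_ii))

def logit(p):
    return log(p / (1.0 - p))

def free_energy_from_probability(p):
    """Equation (7) in kBT units: dG_ii = -logit(p_ii)."""
    return -logit(p)

# Toy energies (kBT): one strongly bound cognate complex, 23 weak non-cognate ones.
toy_dg_cognate = -8.0
toy_dg_all = [toy_dg_cognate] + [5.0] * 23

p_full = binding_probability(toy_dg_cognate, toy_dg_all)
p_simple = simplified_probability(toy_dg_cognate)
dg_recovered = free_energy_from_probability(p_simple)
```

With these toy numbers the non-cognate terms change the probability only marginally, which is the condition under which the simplification is meant to hold.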
If we assume that the same thermodynamic conditions were maintained throughout the evolution of the universal genetic code, and that the probability of evolutionary classification of a given tRNA into a certain tRNAaa class equals the thermodynamic probability of the tRNA binding to the preloaded complex of the corresponding synthetase, then logit(pii), and thus ΔGii, may be considered the physical determinants of the tRNA classes during evolution.
Furthermore, if we assume that ΔGij for a given synthetase depends on the varying contents of the tRNA nucleotide sequence, and that evolutionary selection with logit(pii) as a determinant resembles the classification algorithm with the logistic models ln(odds’i) = fi, where fi is the predictor function predicting the tRNA class aai from its sequence, then the theoretical classification algorithm can model the evolutionary selection, and ln(odds’i) may approximate logit(pii).
Finally, we can put
logit(pii) = fi  i = 1, … 24
Then, combining Equations (7) and (8) results in
ΔGii = −kBTfi  i = 1, … 24
which shows that fi may be considered a dimensionless analog of the free energy of binding.
Consequently, for fi described by Equation (1), the single energy input from the j-th position of the tRNA strand, occupied by the k-th nucleot(s)ide, equals
ΔGijk = −kBTpijk × [posij = Nk]  i = 1, … 24, j = −1…76, k = 1…67
Then, the position- and nucleotide-independent part of the energy may be calculated as
ΔG0ii = −kBTbi  i = 1, … 24,
When dividing the total free energy into forward and reverse parts (Figure 2), i.e.,
ΔGii = ΔGforii + ΔGrevii  i = 1, … 24,
for the long-range repulsion, it is reasonable to approximate the ΔGforii using the nucleotide-independent part of the binding energy change, i.e.,
ΔGforii = ΔG0ii  i = 1, … 24,
and ΔGrevii via the sum (∑jk) of nucleotide-dependent inputs, i.e.,
ΔGrevii = ∑jk ΔGijk.  i = 1, … 24,
with j and k, as defined in Equation (1).
In this approximation, ΔGforii > 0 is expected and estimates the repulsion energy as the tRNA approaches the tRNA synthetase, while ΔGrevii < 0 estimates the attraction energy of the arising bonds, which must be overcome on the return.
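The decomposition of Equations (9)–(14) follows directly from the parameters of the predictor. The sketch below expresses all energies in kBT and uses invented toy parameters; the function name energy_decomposition is hypothetical.

```python
def energy_decomposition(bias, weights, positions):
    """Split dG_ii = -f_i (in kBT) into forward and reverse parts.

    dG_for   = -b_i                      (nucleotide-independent part)
    dG_ijk   = -p_ijk * [pos_j = N_k]    (input of a single nucleotide)
    dG_rev   = sum of all dG_ijk         (nucleotide-dependent part)
    """
    dg_for = -bias
    inputs = {
        (j, nucleotide): -p
        for (j, nucleotide), p in weights.items()
        if positions.get(j) == nucleotide
    }
    dg_rev = sum(inputs.values())
    return dg_for, dg_rev, dg_for + dg_rev, inputs

# Toy parameters (illustrative values only, not taken from Table S2).
toy_bias = -2.0
toy_weights = {(35, "U"): 3.0, (36, "C"): 2.5, (73, "G"): 1.0}
toy_strand = {35: "U", 36: "C", 73: "A"}

dg_for, dg_rev, dg_total, dg_inputs = energy_decomposition(
    toy_bias, toy_weights, toy_strand
)
```

A negative bias gives a positive (repulsive) forward part, and positive weights give negative (attractive) inputs, which matches the sign convention ΔGforii > 0 and ΔGrevii < 0.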

3. Results

3.1. Preliminary Experiments

The attribute-importance ranking evaluated for the full training set using CorrelationAttributeEval is presented in Figure 3. The correlation rank values obtained for the dominating attributes are presented in Table 2. It is worth emphasizing that the third position of the anticodon takes only 41st place in this ranking.
The results of efforts toward finding the classifier algorithm of the best predictability of the tRNA class are presented in Table 3. According to the above findings, the SimpleLogistic classifier was chosen for the following machine learning experiments. The advantage of this algorithm is the automatic selection of the most important attributes, being natural candidates for the identity nucleotides.

3.2. Final Classification Task

To find the identity nucleot(s)ides, classification with the SimpleLogistic classifier was performed with the full training set. The final parameters are presented in Supplementary Materials, Table S2. The resulting model selects the most important positions and their nucleotidic contents, i.e., the nucleot(s)ides with a significant, positive, or negative (~) impact on a value of predictor function (Equation (1)), and the value of a change in the free energy of binding (Equations (9)–(11)). The revealed findings are presented in Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9, Table 4 and Table 5.
The histograms of attribute position usage, separately for positive and negative inputs, are summarized in Figure 4a and Figure 4b, respectively. The negative-input positions 2, 21, 34, 35, 36, 37, and 73 and positive-input position 48 are dominant. They represent anticodons with the neighborhood, acceptor stem, D-loop, and V-loop. Here, in sum, the possible alternative filling in position 34 (degeneration) is neglected.
An example of an individual tRNA class is shown in Figure 5, illustrating that attributes may not always be fully arranged in a real single adaptor molecule but are comprehensively present in a given tRNAaa class. The full sets (max_theor) of predicted class identity nucleotides that occurred at least in one strand were observed in 67% (16/24) of tRNA classes (Figure 6). In other classes, the maximum occurrence (max_real) is at least 71% (5/7) for strands. The minimal minimum (min_real) is 50%, and its average value in all classes is 71 ± 15%.
The landscape of determined Gibbs free energy inputs ΔGijk calculated with the parameters pijk of the trained SimpleLogistic model (Equation (1)) and the theoretical thermodynamical model of binding, assigning to the identity nucleotides of a given tRNAaa class the free energy inputs (Equation (10)), is presented in Figure 7a. The domination of anticodon tandem and positions −1 and 73 is visible. The extracted example for a single class, tRNAGln, is shown in Figure 7b.
The nucleotide-independent part of free energy change, calculated according to parameters bi (Equation (1)) and the theoretical formula (Equation (11)), is presented in Figure 8.
The possible identity nucleotides selected for different tRNAaa classes using the SimpleLogistic classifier are presented in Table 4. The symbol “~” indicates nucleotides of positive energy input (repulsion) (for other symbol meanings please see Supplementary Materials, Table S1). Some cases in position 34 contain more than one nucleot(s)ide (degeneration). The maximum number of possible identity positions per class is eight. The minimum number is two. Position 34 (the third position of the anticodon) is present in 33% (8/24) of classes. Classes with the same anticodon tandems (e.g., Asp and Glu, Gln and His, and Lys I and LysII) indicate different sets of the other identity nucleotides.
The predicted identity positions were compared with those from consensus representation. In Table 5, the consensus representatives (columns) consistent (red) and inconsistent (gray) with predictions of the SimpleLogistic model are indicated. The elements differing from the attributes of negative impact were treated as consistent. A total of 90% (116/129) of predicted identity nucleotides of non-positive input to energy (non-repulsing) agree with the nucleotides in consensus strands for respective tRNAaa classes.
An example of the spatial distribution of identity nucleotides for the twin classes tRNASer and tRNASer2 is presented in Figure 9a,b. This example shows that a single seryl-tRNA synthetase can recognize two ensembles of identity nucleotides.

3.3. The Free Energy Considerations

The change in free energy of binding the tRNAaa to aa-tRNA synthetase was calculated for each tRNA strand in the training set according to Equation (9) and the parameters of Equation (1). The parameters were determined using the SimpleLogistic classifier in the final classification task, which models the ln(odds) of recognizing a tRNA with a given set of identity nucleotides as a linear combination of the Boolean variables characterizing the occurrence of nucleotides plus the non-specific bias. Then, the average values of ΔGii for each tRNAaa class were determined (Figure 10). The tRNASer class shows the most negative value, ΔGii = −8.6 [kBT]. The least negative is the tRNALysII class, ΔGii = −4.0 [kBT].
To approximately illustrate the strength of tRNA and aminoacyl-tRNA synthetase attraction, ΔGii and ΔGforii (=ΔG0ii) were summarized on one chart (Figure 11). The strongest attraction, ΔGrevii = ΔGii − ΔGforii (Equations (12) and (13)), was observed for the tRNAGly class, ΔGrevii = −15.2 [kBT]. The weakest attraction, ΔGrevii = −6.6 [kBT], was observed for the tRNALysII class.
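The reported values can be put into perspective with a short worked calculation. The sketch below takes the class averages quoted in the text (ΔGii and ΔGrevii for tRNALysII, ΔGii for tRNASer), derives the forward part from ΔGforii = ΔGii − ΔGrevii, and converts kBT to kJ/mol. The temperature of 298.15 K is an assumption made only for this illustration, and combining two class averages in this way is itself an approximation.

```python
from math import exp

R = 8.314462618  # molar gas constant, J/(mol K)

def kbt_to_kj_per_mol(dg_kbt, temperature_k=298.15):
    """Convert an energy in kBT to kJ/mol at an assumed temperature."""
    return dg_kbt * R * temperature_k / 1000.0

def recognition_probability(dg_kbt):
    """p = exp(-dG) / (1 + exp(-dG)), rewritten as 1 / (1 + exp(dG))."""
    return 1.0 / (1.0 + exp(dg_kbt))

# Class averages quoted in the text, in kBT.
dg_lys2, dg_rev_lys2 = -4.0, -6.6
dg_ser = -8.6

# Forward (repulsive) part for tRNALysII from dG = dG_for + dG_rev.
dg_for_lys2 = dg_lys2 - dg_rev_lys2

p_lys2 = recognition_probability(dg_lys2)
p_ser = recognition_probability(dg_ser)
dg_lys2_kj = kbt_to_kj_per_mol(dg_lys2)
```

At the assumed temperature one kBT corresponds to about 2.48 kJ/mol, so even the least negative class average implies a recognition probability well above one half.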
The simultaneous dependence of the free energy parts on the maximal number of possible identity points, NIP, is shown in Figure 12. The parts are the forward energy, ΔGforii; the reverse energy limited only to the anticodon tandem, ΔGrevii(tan); and the rest of the reverse energy without the anticodon tandem, ΔGrevii|tan. With increasing NIP, the attraction of the anticodon tandem, ΔGrevii(tan), decreases; the non-specific repulsion, ΔGforii, simultaneously decreases; and the attraction of the other identity nucleotides, ΔGrevii|tan, increases.
The decrease in the attraction of anticodon is not sufficiently balanced by an increase in the attraction of other identity points and a decrease in non-specific repulsion. Thus, the total free energy change increases with NIP (Figure 13).

3.4. The Analysis of the Attraction of Anticodon Tandem

The dependence of the energy of tRNA attraction to aminoacyl-tRNA synthetase in the area of the anticodon tandem on the nucleotide contents is shown in Figure 14. The measure of this attraction, ΔGrevn, i.e., the average reversal energy limited to one position (35 or 36) and one nucleotide type, increases (the attraction weakens) for “weak” nucleotides (A and U), which can possibly form fewer hydrogen bonds. All 24 classes were considered.
The example of the dependence of the total energy of the attraction on the actual number of possible hydrogen bonds in a different identity ensemble of the tRNAGlu class (Figure 15) confirms the above findings. At the constant anticodon tandem, the attraction of the other identity nucleotides increases, so actual ΔGrevii’ decreases with the number of hydrogen bonds. Here, ΔGrevii’ was not averaged over all classes but represents real cases of the same subsets of identity nucleotides.

3.5. The Free Energy and Recognition

To analyze the interaction of tRNAs of different classes (i) with the given aminoacyl-tRNA synthetase (j), Equations (1) and (9) with the parameters of the final classification task were applied to the nucleotides of the tRNA consensus strands (Table 5). This enabled estimating the consensus free energy change ΔGij”. The examples for glutaminyl-tRNA synthetase (Figure 16a), histidyl-tRNA synthetase (Figure 16b), and lysylI-tRNA synthetase (Figure 16c) are shown. The results suggest that to be properly recognized, the change in binding free energy should decrease below a certain level, characteristic for a given class. The weaker bound or repelled tRNAs are not recognized.
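The recognition criterion suggested by these results can be sketched as a simple threshold test. In the example below, the consensus free energy changes and the recognition level are hypothetical placeholders chosen for illustration; they are not values read from Figure 16.

```python
# Hypothetical consensus free energy changes (in kBT) of several tRNA classes,
# all evaluated with the predictor of ONE synthetase. Illustrative values only.
consensus_dg = {"Gln": -7.2, "His": -0.4, "Glu": 1.1, "Lys": 2.3}

# Assumed recognition level for this synthetase (placeholder, in kBT).
THRESHOLD = -2.0


def recognized(dg_by_class, threshold):
    """Return the classes whose free energy change lies below the threshold."""
    return sorted(name for name, dg in dg_by_class.items() if dg < threshold)


print(recognized(consensus_dg, THRESHOLD))
```

Under these assumed numbers, only the cognate class falls below the level, while the weakly bound and repelled classes are rejected.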
The energy of the least strongly bound strands, max ΔGii, for different classes is shown in Figure 17. The binding strength decreases with increasing molecular weight of the transported amino acid in the range of Gly–LysII and then slightly increases. The weakest free energy of binding is −1.62 kBT.

4. Discussion

Although the universal genetic code is a major informational factor governing the development and functioning of all organisms, there is no rational argument suggesting that it is the only form of natural code. For proper functioning, many life processes require other universal identifiers, enabling error-free recognition. A three-letter (three-position) universal genetic code that suffices for translational reading may not suffice when tRNA is charged with the appropriate amino acid by aminoacyl-tRNA synthetase. Preliminary numeric experiments indicate that, for the assignment of a tRNA strand to a given amino acid class, the third position of the anticodon, position 34 of the tRNA strand (Figure 1), is not as important as the dominating anticodon tandem, namely, positions 36 and 35 (Figure 3). We propose to use the name “tandem” because nucleotides at these two positions usually occur together in the discussion of nucleotide importance. The 41st place out of 77, which position 34 takes in the correlation ranking (Table 2), has practical meaning in machine learning. For example, a classifier performing the tRNA classification task of correctly choosing one of the two amino acids coded by the same anticodon tandem (e.g., Gln and His) will not prefer the nucleot(s)ide in position 34, as is the rule in the translation of the universal genetic code. It will recognize other, statistically more important components. Thus, if we assume that the training of a classifier manifests the features of a natural enzymatic process of tRNA recognition and that the final single result of the classification corresponds to the enzymatic load of amino acid, it follows that, in the process of tRNA charging, the empty tRNA transporter may also effectively expose nucleot(s)ide(s) other than the anticodon to aminoacyl-tRNA synthetase.
Marginalization of the third position of the anticodon during amino acid load, a position that is very important in the translation process, may be related to the specificity of the nucleotide–protein interactions (H-bonds, salt bridges, and hydrophobic effect) in the area of the anticodon site during tRNA attachment to the synthetase. Despite the looseness introduced by the “wobble” effect, complementary specificity in position 34 has to be maintained for future translation in the ribosome; however, in some tRNA classes, it may not be guaranteed by the local interaction of the synthetase with a single nucleotide in this position. In such cases, other well-defined nucleotides, even outside the anticodon loop and stem, have to be applied. They may also speed up the recognition process. Thus, it is expected that so-called “identity nucleotides” are a small set of nucleotides determining the identity of tRNA; more precisely, carrying chemical groups that often interact with amino acids on the synthetases.
Another example of the importance of the identity nucleotides concerns the 6-fold degenerate amino acids (Leu, Ser, and Arg), whose tRNAs may expose two different leading nucleotide tandems in the anticodon. In this case, identity nucleotides may reduce the uncertainty level in positions 35 and 36 and prevent charging errors, in effect also speeding up the binding process.
On the other end of the tRNA strand, the proper fit of the tRNA acceptor site to the catalytic center of synthetase may also require additional identifying nucleotides, which may be especially essential in recognizing the tRNA for the same coded amino acid by synthetases of different classes (LysI and LysII).
The strand positions less important than 34, e.g., 8, 20, 33, and 74–76 (Figure 3), may be related to the universal third-order structure of the tRNA strand and the acceptor site CCA-3′.
As nucleotide-specific interactions between aminoacyl-tRNA synthetases and their cognate tRNAs ensure accurate RNA recognition and prevent the binding of noncognate substrates, reducing further translational errors, the above examples highlight the importance of identity nucleotides.
Machine learning algorithms are computational models that allow computers to automatically improve their findings thanks to the experience gained while analyzing the training data sets. Their advantages are the fast processing of big data and objectivity, which is especially important in analyzing biological data. Machine learning classifiers are algorithms that automatically assign data points to classes. As such, they are great tools for modeling the processes of recognition decisions. A simple logistic classifier uses simple logistic regression to predict a binary variable (0, 1) assigned to a decision. This technique assumes that the relationship between the natural log of the odds and the measurement variables, described by the so-called predictor function, is linear. In the discussed case, the predictor function quantitatively sums the presence of selected nucleotides, assigning them appropriate weights manifesting their importance.
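The linearity assumption can be illustrated with a minimal sketch of a logistic model on Boolean indicators. The bias and weights below are hypothetical and serve only to show that the log of the odds of the predicted probability recovers the linear predictor; this is not the Weka implementation.

```python
import math


def log_odds(bias, weights, x):
    """Linear predictor: bias plus the weighted sum of Boolean indicators (0 or 1)."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def probability(f):
    """Inverse logit (sigmoid): maps the log odds f to a probability."""
    return 1.0 / (1.0 + math.exp(-f))


# Hypothetical weights for three nucleotide indicators; not fitted values.
bias, weights = -2.0, [3.0, 2.5, -1.0]

f_all = log_odds(bias, weights, [1, 1, 0])   # first two indicators present
f_none = log_odds(bias, weights, [0, 0, 0])  # none present: only the bias remains
p = probability(f_all)

# Taking ln(p / (1 - p)) recovers the linear predictor.
recovered = math.log(p / (1.0 - p))
print(f_all, f_none, round(recovered, 6))
```

When no indicator is present, the predictor reduces to the bias, which is the nucleotide-independent part of the model.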
The Weka SimpleLogistic classifier was verified in the preliminary numeric experiments as the best classifying algorithm in 10-fold cross-validation and 66% split tests and was finally chosen among others (Table 3). The useful feature of this classifier is the automatic explicit indication of the most important attributes (nucleotides) and their values (weights). The number of algorithmically selected nucleotides depends on the stopping criterion of LogitBoost iterations. The chosen option of minimizing the training misclassification error results in fewer setups of selected attributes than in the case of AIC or cross-validation options.
The final classification task with a full training set indicates the important positions in the tRNA strand in recognizing the proper class. Their usage (Figure 4a,b) correlates (cc = 0.63) with the ranks assigned by the CorrelationAttributeEval (data partially presented in Table 2). The most useful are the anticodon loop area (34–37) and positions 2, 21, 48, and 73, localized in the Acc-stem, D-loop, V-loop, and 3′ end regions. It is assumed that the revealed contents of these places represent the nucleotides, which are necessary for full identification. Thus, they were named identity positions and nucleotides.
The revealed identity positions and nucleotides are presented in Table 4. The class cases shown comprise 2–8 positions, filled with 14 different nucleot(s)ides, i.e., A, B, C, D, G, K, M, P, Q, U, 6, 8, and # (for symbol meanings, see Supplementary Materials, Table S1). They also contain the empty positions (-) and the exclusion rules (~) indicating unfavorable fillings. The presented attributes (positions) with their values (nucleotides) are theoretically the most representative group of features properly determining the values of the linear learners (predictors) for the correct determination of the predicted classes. In some cases, two or three different fillings were proposed at position 34, which was indicated in only eight classes.
The identity positions use the most representative components of the consensus strand in the class of tRNAs loaded by a given amino acid (Table 5). Only 13/129 (10%) positions do not meet consensus meaning. Global consensus, as in positions 74–76, was not useful for specific tRNAaa recognition.
The identity nucleotides are not always fully arranged as a collective in a real single adaptor molecule, but they are at least partially visible in the strands of a given tRNA class (Figure 5). Full sets of predicted class identity nucleotides occurred at least in one strand and were observed in 67% (16/24) of tRNA classes (Figure 6). In other classes, the maximum occurrence exceeded 71% (5/7) of possible identity positions per strand (Glu).
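The per-strand occurrence described above can be sketched as follows. The rule encoding (a position paired with a symbol, with a leading "~" marking an exclusion rule and "-" marking an empty position) mirrors the notation of Table 4, but the rules and strands themselves are hypothetical, and the way exclusion rules are counted here is an assumption of this sketch.

```python
def satisfied(rule, strand):
    """Check one identity rule against a strand.

    A rule is (position, symbol). A symbol starting with "~" is an exclusion
    rule: it is satisfied when that nucleotide is NOT found at the position.
    """
    pos, symbol = rule
    found = strand.get(pos, "-")  # "-" marks an empty position
    if symbol.startswith("~"):
        return found != symbol[1:]
    return found == symbol


def best_fraction(rules, strands):
    """Largest fraction of the rules met by any single strand of the class."""
    return max(sum(satisfied(r, s) for r in rules) / len(rules) for s in strands)


# Hypothetical identity set of one class and two made-up strands.
rules = [(35, "U"), (36, "C"), (73, "G"), (44, "~A")]
strands = [
    {35: "U", 36: "C", 73: "A", 44: "G"},  # misses 73G
    {35: "U", 36: "C", 73: "G", 44: "A"},  # violates the exclusion rule 44~A
]
print(best_fraction(rules, strands))
```

In this made-up class, every rule is met by some strand, yet no single strand meets them all, which is the situation described for the classes without a full set.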
Some ensembles of identity nucleotides, wider than the anticodon tandem, can serve as universal tRNAaa class markers, i.e., they are entirely present in all strands of only one tRNAaa class (Table 6). The above findings raise questions about the direction of the evolution of these sites and the possible informational role and importance of specially marked amino acid transporters.
Some ensembles of identity nucleotides, even without those from positions 35 and 36, exhibit high class specificity. A total of 16 such ensembles in 76 issues within 511 strands were observed as unique, i.e., observed in only one class (Table 7), although not in all of its strands. They contain 2–6 positions written with an 11-letter alphabet, which also includes empty positions or attributes with the opposite impact. They may be a large fraction among identity nucleot(s)ides of a given class (Figure 5), but they may not be common in all strands. Such unique extra-anticodon ensembles may conserve the former predicted classification in the case of a modified anticodon tandem. This mechanism could function during the evolution of the genetic code, producing its extra degeneration. There may be some traces of this phenomenon, e.g., position 21A in Arg and Arg2 and position 48~- in Leu and Leu2 (Table 4). This may also be the source of translation errors.
In the case of a classifier based on a machine learning algorithm correctly classifying representatives of classes occurring in nature, the probability of its decisions must be consistent with the probability of occurrence of real physicochemical processes naturally determining these classes. In analyzing the formation of tRNA classes, we assumed that these are the thermodynamic processes of overcoming the potential barrier during the fitting of the tRNA strand to the corresponding ligase. Thus, the central assumption of the proposed theoretical model of machine learning simulation of the tRNA binding to aminoacyl-tRNA synthetase is the correspondence between the biologically probable diversity of tRNAaa strands revealed by the AI classifier and the thermodynamic probability. This allows for the mathematical alignment of the logarithm of odds for classification task and logits for thermodynamically driven tRNA and synthetase binding (Equation (8)), i.e., ln(odds’i) = logit(pii), where odds’ is the measure of the success in the numeric experiment and pii is the probability of binding described by the Boltzmann distribution (Equation (4)). This allows for the expression of the change in the free energy of binding, ΔGii, by the predictor function fi, which may be treated as dimensionless energy (Equation (9)). Thus, the predictor, including the presence of selected nucleotides, becomes a mathematical model of free energy change related to this nucleotide, namely, its interaction with aaRS.
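Written out under one common assumption, the correspondence reads as follows. The two-state Boltzmann form used here is an assumption of this sketch; the exact forms of Equations (4), (8), and (9) are given in the theoretical part of the paper and are not reproduced in this section.

```latex
p_{ii} = \frac{e^{-\Delta G_{ii}/k_B T}}{1 + e^{-\Delta G_{ii}/k_B T}}
\;\Longrightarrow\;
\operatorname{logit}(p_{ii}) = \ln\frac{p_{ii}}{1 - p_{ii}} = -\frac{\Delta G_{ii}}{k_B T},
\qquad
\ln(\mathrm{odds}'_i) = f_i
\;\Longrightarrow\;
\Delta G_{ii} = -f_i \, k_B T .
```

In this reading, a positive predictor value corresponds to a negative (favorable) free energy change, and the predictor is the free energy expressed in units of −kBT.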
The overall picture for all classes of the energetic input, ΔGijk, of a given tRNA position to a total free energy change is presented in Figure 7a. It shows the energetically rescaled values (Equation (10)) of the coefficients (parijk) of the linear predictor functions (Equation (1)) representing the relative effect of a given filling for the value of the predictor of a given class. The maximal value of a given predictor at a given nucleotide content leads to assigning a classified tRNA strand to a corresponding class. In Figure 7b, a detailed example is shown for a single predictor of tRNAGln class. It calculates five positions of a negative energy input (attraction) and three positions of a positive input (repulsion) if filled with indicated nucleotides. In this work, the positive values of energy input are interpreted as exclusion rules, e.g., ΔGijk > 0 for nucleotide “A” in position 44, which implicates the rule “44~A”, i.e., the presence of “A” in this position testifies against tRNAGln class; all other nucleot(s)ides are neutral.
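The interpretation of positive energy inputs as exclusion rules can be sketched as below. The rescaling ΔG = −coefficient (in kBT) is an assumed stand-in for Equation (10), and the coefficient values are hypothetical; only the "44~A" notation is taken from the text.

```python
def energy_inputs(coefficients):
    """Rescale predictor coefficients to energy inputs in kBT.

    ASSUMES Delta G = -coefficient (in kBT), so a positive coefficient is an
    attraction (negative energy) and a negative coefficient is a repulsion.
    """
    return {key: -value for key, value in coefficients.items()}


def exclusion_rules(energies):
    """Format entries with a positive energy input as 'position~nucleotide'."""
    return sorted(f"{pos}~{nt}" for (pos, nt), dg in energies.items() if dg > 0)


# Hypothetical coefficients keyed by (position, nucleotide); illustrative only.
coefficients = {(35, "U"): 2.1, (36, "G"): 1.8, (44, "A"): -0.9}
energies = energy_inputs(coefficients)
print(exclusion_rules(energies))
```

Entries with a negative energy input are read as attracting identity nucleotides, and nucleotides that do not appear in the predictor are neutral.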
The nucleotide-independent part of free energy change, ΔG0ii (Figure 8), calculated according to the bias of predictor function (Equation (1)) and the theoretical formula (Equation (11)), was always positive, which corresponds to the repulsion. This term represents part of the energy unexplained strictly by the nucleotidic attributes of the simple logistic model. On the other hand, a simple analysis showed its moderate correlation (cc = 0.46) with the common net negative charge of amino acids of aminoacyl-tRNA synthetases at a pH of 7.0 (Figure 18). As seen in Figure 9a,b, tRNA exposes its negatively charged sugar–phosphate backbone toward an overwhelmingly negative enzyme, which may result in electrostatic repulsion. This repulsion is reduced in the cases of tRNA with the crucial third anticodon nucleotides, which have to bind stronger to be properly recognized (see Figure 11). Thus, it is reasonable to assume that ΔG0ii describes electrostatic repulsion between the tRNA backbone and the amino acids of synthetase.
The average change in the free energy of binding ΔGii was estimated at −8.6 to −4.0 kBT (Figure 10). A decreasing trend in the binding energy with the increase in the molecular mass of amino acid until LysII, then a slight increase, was observed. Its comparison (Figure 11) with the energy barrier for the attachment, ΔGforii, approximated by ΔG0ii (Equation (13)), led to the estimation of the strength of binding, i.e., the energy required to reverse the process, ΔGrevii = ΔGii − ΔGforii. The strongest attraction was determined for the tRNAGly class, ΔGrevii = −15.2 [kBT]. The weakest attraction, ΔGrevii = −6.6 [kBT], was found for the tRNALysII class.
Some simultaneous variations in the free energy change with the increase in the number of identity positions, Nip (Figure 12), were observed. The reversal energy limited only to the anticodon tandem, ΔGrevii(tan), increases (the attraction of the tandem decreases), but the rest of the reversal energy, without the anticodon tandem, ΔGrevii|tan, and the forward part of the energy, ΔGforii, decrease (the attraction increases). This interplay of the factors outside the anticodon tandem cannot fully balance the tandem energy variation, so the total free energy change slightly increases (Figure 13) with the number of identity positions.
Specifically, an unambiguous interpretation of the sequence-independent energy component ΔG0ii as the long-range term ΔGforii (Equation (13)) in the “crossing the energy barrier” model (Figure 2) may cause some issues. There could be other sequence-independent interactions that are short ranged (but do not contribute to the sequence-specific recognition process). As the value ΔGforii, calculated as ΔGforii = ΔG0ii, decreases with the number of identity positions Nip (Figure 12), this may suggest that for the appropriately large number of the identity nucleotides, the parameter bi (Equation (11)) and the corresponding repulsion fall to zero independently of increasing the proximity of molecules. Thus, at first approximation, short-range repulsion can be neglected, and ΔGforii mainly reflects the long-range interaction energy.
The “strong” nucleotides (G and C), with three theoretically possible hydrogen bonds in interactions with other RNA or protein, bind aminoacyl-tRNA synthetases more strongly in positions 35 and 36 than the “weak” nucleotides (A, U) (Figure 14). This may suggest the important role of hydrogen bonding. The average data in the other positions for all analyzed classes were insufficient. The dependence of the total energy of the attraction on the actual number of possible hydrogen bonds, NHB, in different identity ensembles of the tRNAGlu class (Figure 15) seems to confirm the above findings. For a constant anticodon tandem, the attraction of the other identity nucleotides increases with the actual number of hydrogen bonds, so the actual ΔGrevii’ decreases.
The examples of recalculations for glutaminyl-tRNA synthetase (Figure 16a), histidyl-tRNA synthetase (Figure 16b), and lysyl-tRNA synthetase (Figure 16c) using consensus tRNAaa strands suggest that for proper recognition, the change in binding free energy should decrease below a certain level, characteristic for a given class. The weaker bound or repelled tRNAs are not recognized. The energies of the least strongly bound real strands, max ΔGii, for different classes, are shown in Figure 17. They estimate the minimal binding energy levels.
Misacylation as a source of mistranslation occurs approximately ten times less frequently than misreading [19]. Nevertheless, it should also be investigated as a potential source of translational errors. The data in Figure 16c show that tRNAAsn is the best candidate to be misrecognized by lysyl-tRNA synthetase of class II.
Generally, the energies presented in a holistic approach in Figure 16a–c and Figure 17 can be skewed and thus misleading due to overrepresentation or underrepresentation. To avoid this effect, the record duplicates were removed from a data set, and the unique strands representing a wide spectrum of possible tRNA occurrences in nature were considered (see Section 2.1). In the theoretical model (Equation (3)), it was assumed that the [Aminoacyl-AMP aaRS] complexes for all coded amino acids are equally available and the classes in the machine learning classifier are equally accessible. These conditions, being reminiscent of the issue in statistical mechanics regarding the principle of equal a priori probabilities, may avoid additional skewness. The entropic component is not a topic here, but at an assumed constant temperature, it does not influence the change in the Gibbs free energy.
The coverage of identity nucleot(s)ides, i.e., the ratio of those that occurred to possibly occurred in a given class (please see the example in Figure 5), varies with the maximal number of identity positions, NIP (Figure 19). It also decreases with the number of identity positions, which raises the question if it may be an evolutional trend. The coverage, similar to the abundance [20], might be a useful parameter in the biophysical modeling of biological processes.
When simultaneously analyzing the field of the two parameters possibly related to the evolutionary history of genetic code, i.e., the molecular weight of charging amino acid, MW, and NIP, a specific manner was found in which the tRNAaa classes of aa belonging to the same metabolic families cover the area of discussed values (Figure 20).
According to this picture, the tRNAaa for aa from the serine and pyruvate families is characterized by a lower range of MW and NIP values. On the other hand, the histidine and the aromatic family contain MW and NIP from the ranges of the higher values. The aspartate and glutamate families cover the range of moderate MW and the wide spectrum of NIP. One may conclude that the serine and pyruvate families, with a low aa mass and weakly identified tRNA, emerged completely at the beginning of the evolution of the code, much earlier than the final histidine and aromatic families. In this scheme, the other families were created throughout the entire period of the evolution.
Moreover, the average free energy change in a given tRNAaa class increases with the molecular weight of the corresponding amino acid (Figure 21). Two weight groups of amino acids were distinguished: below and above 150 [Dalton]. The initial trend for Gly-Met (blue) results in a fragile binding above the mass of methionine, which might stop the aminoacylation of tRNA. It is likely that this trend was evolutionarily changed to the stronger binding trend His-Trp (red) due to small mutations in the previously used stronger bound tRNA strands, even amplifying its binding. The consensus strand (Table 5) of histidine (H) is the most similar to that of glutamic acid (E) and vice versa; these two consensus strands are the closest in the entire analyzed set. The tRNA for histidine is the only one that includes a strongly attracting guanosine (G) at position −1 (Table 4, Figure 7a).
It is reasonable to expect that the tRNAs of the amino acids whose genetic code originated later have more identity positions, whereas the classes whose code arose earlier are more completely covered and transport lighter amino acids [21]. The dependence of the coverage on the number of identity positions (Figure 19) and the specific distribution of aa families on the parameters plane, MWxNIP (Figure 20), seem to support these expectations. As a result, the idea of treating NIP as a determinant of evolutionary progress can be postulated, and only anticodons alone or very short ensembles of identity nucleotides may have played a role at the beginning of the evolution, i.e., during a two-letter code evolution. However, their informative role became too low and came to be supported by bigger sets of spatially distributed elements. This is especially clearly seen in the case of the anticodons of such pairs as Asp-Glu, Arg2-Ser2, Ile-Met, and Cys-Trp, where the third position is essential for proper translation and should be correctly recognized during amino acid load. These pairs could have evolved from two-letter coded amino acids loaded onto strands containing two separate subsets of the extra-anticodon unique identity nucleot(s)ides, which, at some stage of evolution, came to be processed by two different synthetases. Then, unique extra-anticodon positions could prevent docking errors at the third position of the anticodon site. This enabled the evolutionary differentiation of the third position of the anticodon and of the charged amino acids at fixed first and second positions of the anticodon tandem. A similar mechanism, involving entire sets of identity nucleot(s)ides, could also permit the differentiation of the lysyl-tRNA synthetase classes (LysI and LysII).
There are probably many other consequences of the existence of the identity nucleot(s)ides and their unique subsets within tRNA strands, e.g., determining the third-order structure, which requires a detailed analysis in the future. This is in favor of the positive answer to the central research question of this paper regarding the importance of identity nucleotides beyond anticodon.
The identity positions determined by the presented model (Figure 22) cover 62% (60/97) of positions conserved in the three domains of life, as reported in a recent review [22]. The 55 predicted positions in all classes are not cited in this review, and the 37 reported positions are not predicted. This may be due to the limitations of the literature study and the performance of the trained model (which might not always be the highest) at assumed parameters. Both factors can be improved in future research.
The independent results of the Gibbs free energy of tRNA-aaRS interactions are not known to the authors. The only known report of the estimation of long-range electrostatic attractions presented in the work [23] reveals the order of a few kBT, which is similar in magnitude to that observed in the results of our models. In the authors’ opinion, the presented application of the machine learning tool and the thermodynamics approach leads to interesting results that are hard to obtain using other methods, and this is why they are worth publishing.
The discussed importance of identity nucleotides makes the presented findings, such as universal markers, unique ensembles, or Gibbs free energy of tRNA-aaRS binding, important to the broader field of molecular biology or tRNA research.
Despite the present paper focusing on the tRNA, some evolutional aspects, among others indicating the number of identity positions as an important parameter related to evolution, are a good basis for the future study of the coevolution of tRNA and aaRS. Although one may expect that due to transcriptional and translational mechanisms, the synthetases evolve slower than the corresponding tRNA strands and could not be observed, the observed divergence of tRNA-ligase classes (LysI and LysII) is the first gate to this area.

5. Summary

A total of 24 linear logistic regression models selecting identity nucleotides (or nucleosides) and quantitatively evaluating their importance, thus predicting different classes of tRNAs loaded with the corresponding amino acid, were developed using the machine learning classification method. The favored locations of identity nucleotides appear in different parts of the tRNA strand, i.e., in the anticodon loop, especially in the tRNA anticodon and position 37; in the acceptor stem, at the 5′ end, position 2; at the 3′ end, position 73; in the D-loop, position 21; and also in the variable loop, position 48. They agree with the components of the consensus strand in a class of tRNAs loaded with a given amino acid. According to the proposed theoretical model of machine learning simulation with the accepted working hypothesis, the values of the free energy to enter the recognition state in the process of tRNA loading were obtained, and the inputs from the identity nucleotides and tRNA strand backbone were distinguished. Almost all predictions indicate leading anticodon tandems defining the first and the second position of the anticodon (positions 35 and 36 of tRNA strand) and the small sets (up to six positions) of the other nucleotides, with the natural identity nucleotides being the most influential in the free energy balance. The magnitude of their input to this energy depends on the position in the strand, favoring positions −1, 35, and 36. The role of position 34 is relatively smaller. The identity attributes may not always be fully arranged in a real single tRNA molecule but were comprehensively present in a given tRNA class. Some subsets of the identity nucleotides, together with the anticodon, may be treated as universal class markers. Some other subsets, considered even without anticodons, are only present in the tRNA of a given class, but not in all of its strands.
The analysis of the individual logistic models shows that the absolute value of the energy of binding the anticodon tandem, 35–36, decreases with an increasing number of identity positions and with a decreasing possible number of hydrogen bonds. In these conditions, the absolute value of the energy of binding of other identity nucleotides increases. All models indicate a nucleotide-independent energy of repulsion of the tRNA backbone, which decreases with the number of identity nucleotides. It was also shown that the total free energy change in entering the recognition state increases with the amino acid mass, making this process less spontaneous. The identity nucleotides, apart from the anticodon, may add some additional binding energy to overcome a certain energy level specific to a given amino acid class. The stability of the universal genetic code is supported by a precisely controlled quantity of binding energy in tRNA–ligase interaction. However, during evolution, this process was not efficient enough to keep an unchanged level of strong binding, such as that of early tRNAs transporting low-mass amino acids. The strand coverage by the identity positions decreases with the number of identity positions. On the other hand, an increase in the molecular weight of carried amino acids and the diversity of corresponding metabolic families were observed. As the number of identity positions may indicate the evolutional progress of a given tRNA class, the results of this study may be useful in the future analysis of the evolution of tRNA and coevolution of tRNA and aaRS.

6. Conclusions

Identity nucleot(s)ides may be revealed using the machine learning (ML) classification method.
The evolutional trend of the tRNA sequences toward completing identity nucleotides may be postulated.
This phenomenon, as a consequence, may finally lead to the emergence of specific nucleotidic markers of class and unique subsets of the nucleotides appropriately scattered outside the anticodon site, which may guarantee more precise control of the interaction of tRNA and aminoacyl-tRNA synthetase during the process of the amino acid load, and thus less erroneous protein synthesis.
A certain level of change in free energy of binding is required for aminoacylation of tRNA.
Identity nucleotides help to maintain the required free energy level for different anticodon contents.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/life14101328/s1, Table S1: Nucleot(s)ides, their symbols, and one-letter codes; Table S2: Parameters of the model.

Author Contributions

Conceptualization, P.H.P.; methodology, P.H.P.; software, P.H.P.; validation, P.H.P. and P.Z.; formal analysis, P.H.P.; investigation, P.H.P.; resources, P.Z.; data curation, P.H.P.; writing—original draft preparation, P.H.P.; writing—review and editing, P.H.P. and P.Z.; visualization, P.H.P.; supervision, P.Z.; project administration, P.Z.; funding acquisition, P.Z. All authors have read and agreed to the published version of the manuscript.

Funding

Partially supported by the Polish Ministry of Science and Higher Education, under the project: DIR/WK/2018/06.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data generated or analyzed during this study are included in this article and its Supplementary Material files. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Giegé, R.; Frugier, M. Transfer RNA Structure and Identity. In Madame Curie Bioscience Database [Internet]; Landes Bioscience: Austin, TX, USA, 2013. Available online: https://www.ncbi.nlm.nih.gov/books/NBK6236/ (accessed on 11 October 2024).
  2. Watson, J.D.; Crick, F.H. Genetical implications of the structure of deoxyribonucleic acid. Nature 1953, 171, 964–967. [Google Scholar] [CrossRef] [PubMed]
  3. Plescia, O.J.; Palczuk, N.C.; Cora-Figueroa, E.; Mukherjee, A.; Braun, W. Production of antibodies to soluble RNA (sRNA). Proc. Natl. Acad. Sci. USA 1965, 54, 1281–1285. [Google Scholar] [CrossRef] [PubMed]
  4. Palade, G.E. A small particulate component of the cytoplasm. J. Biophys. Biochem. Cytol. 1955, 1, 59–68. [Google Scholar] [CrossRef] [PubMed]
  5. Delarue, M. Aminoacyl-tRNA synthetases. Curr. Opin. Struct. Biol. 1995, 5, 48–55.
  6. Ibba, M.; Hong, K.W.; Sherman, J.M.; Sever, S.; Söll, D. Interactions between tRNA identity nucleotides and their recognition sites in glutaminyl-tRNA synthetase determine the cognate amino acid affinity of the enzyme. Proc. Natl. Acad. Sci. USA 1996, 93, 6953–6958.
  7. Lenhard, B.; Orellana, O.; Ibba, M.; Weygand-Durasevic, I. tRNA recognition and evolution of determinants in seryl-tRNA synthesis. Nucleic Acids Res. 1999, 27, 721–729.
  8. McClain, W.H.; Foss, K. Changing the acceptor identity of a transfer RNA by altering nucleotides in a “variable pocket”. Science 1988, 241, 1804–1807.
  9. Saks, M.E.; Sampson, J.R. Evolution of tRNA recognition systems and tRNA gene sequences. J. Mol. Evol. 1995, 40, 509–518.
  10. Rould, M.A.; Perona, J.J.; Steitz, T.A. Structural basis of anticodon loop recognition by glutaminyl-tRNA synthetase. Nature 1991, 352, 213–218.
  11. Rubio Gomez, M.A.; Ibba, M. Aminoacyl-tRNA synthetases. RNA 2020, 26, 910–936.
  12. Ambrogelly, A.; Korencic, D.; Ibba, M. Functional annotation of class I lysyl-tRNA synthetase phylogeny indicates a limited role for gene transfer. J. Bacteriol. 2002, 184, 4594–4600.
  13. Ribas de Pouplana, L.; Schimmel, P. Two Classes of tRNA Synthetases Suggested by Sterically Compatible Dockings on tRNA Acceptor Stem. Cell 2001, 104, 191–193.
  14. Landwehr, N.; Hall, M.; Frank, E. Logistic Model Trees. Mach. Learn. 2005, 59, 161–205.
  15. Sumner, M.; Frank, E.; Hall, M. Speeding up Logistic Model Tree Induction. In Proceedings of the 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, Porto, Portugal, 3–7 October 2005; pp. 675–683.
  16. Jühling, F.; Mörl, M.; Hartmann, R.K.; Sprinzl, M.; Stadler, P.F.; Pütz, J. tRNAdb 2009: Compilation of tRNA sequences and tRNA genes. Nucleic Acids Res. 2009, 37, D159–D162.
  17. Frank, E.; Hall, M.A.; Witten, I.H. The WEKA Workbench. In Data Mining: Practical Machine Learning Tools and Techniques, 4th ed.; Morgan Kaufmann: Cambridge, MA, USA, 2016.
  18. Berg, J.M.; Tymoczko, J.L.; Stryer, L. Aminoacyl-Transfer RNA Synthetases Read the Genetic Code. In Biochemistry, 5th ed.; Section 29.2.1; W.H. Freeman and Company: New York, NY, USA, 2001.
  19. Pienaar, E.; Viljoen, H.J. The tri-frame model. J. Theor. Biol. 2008, 251, 616–627.
  20. Siwiak, M.; Zielenkiewicz, P. A Comprehensive, Quantitative, and Genome-Wide Model of Translation. PLoS Comput. Biol. 2010, 6, e1000865.
  21. Pawłowski, P.H. The smooth evolution of the universal genetic code (main episodes). Int. J. Sci. 2019, 9, 28–51.
  22. Giegé, R.; Eriani, G. The tRNA identity landscape for aminoacylation and beyond. Nucleic Acids Res. 2023, 51, 1528–1570.
  23. Tworowski, D.; Safro, M. The long-range electrostatic interactions control tRNA-aminoacyl-tRNA synthetase complex formation. Protein Sci. 2003, 12, 1247–1251.
Figure 1. The secondary tRNA structure and the assumed numeration of positions along the strand. AA—amino acid.
Figure 2. The recognition of tRNA as the thermodynamic process of crossing the energy barrier between the states of non-recognized and recognized tRNA. The symbols are as follows: S, the tRNA substrate; E, the tRNA-synthetase enzyme; [ES], the enzyme–substrate complex (bound state); ΔG, the change in the Gibbs free energy; for, the forward part; and rev, the reversal part. The second (basic) transition state, leading to aminoacylation, is indicated by an arrow.
Figure 3. The attribute importance ranking evaluated for the full training set using CorrelationAttributeEval. The correlation of the attributes with the amino acid class was analyzed. The colored area at the bottom indicates parts of the secondary structure of the tRNA molecule. The white area indicates loops, the red area indicates anticodons, and the same colors indicate complementary regions of stems. Position 34 (the 3rd letter of the anticodon) is surprisingly less important than other non-coding positions.
Figure 4. (a). A simple logistic model of tRNA recognition. The usage of positions with a negative input (attraction) to change the free energy. (b). A simple logistic model of tRNA recognition. The usage of positions with positive (repulsion) input to change the free energy. The white area indicates loops, the red area indicates anticodons, and the same colors indicate complementary regions of stems.
Figure 5. An example of identity nucleotides (red) found in the seven positions, i.e., 12, 16, 24, 35, 36, 39, and 50, in the analyzed 21 real tRNAGlu strands (for glutamic acid) using the simple logistic model. Other, non-identity nucleotides in these positions (gray) are also shown. In this case, the mean coverage by the identity nucleotides is 64.6% of the possible area. No single strand carries the full set of identity nucleotides. The figure illustrates the scale of the considered phenomenon, which is also common in other classes. The symbol “?” is 5-methylcytidine.
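The coverage quoted in the Figure 5 caption can be illustrated with a small sketch. It assumes that coverage is the fraction of identity positions at which a strand carries the identity nucleotide, averaged over strands; this definition and the function names are assumptions made here for illustration. The identity set is the Glu row of Table 4, while the two strands are invented placeholders, not sequences from the analyzed data set.

```python
# Identity nucleotides of the tRNAGlu class as listed in Table 4
# (position -> nucleotide symbol).
GLU_IDENTITY = {12: "C", 16: "A", 24: "A", 35: "U", 36: "C", 39: "C", 50: "A"}


def coverage(strand, identity):
    """Fraction of identity positions where the strand carries the identity nucleotide.

    `strand` maps position -> nucleotide symbol (a hypothetical representation).
    """
    hits = sum(1 for pos, nt in identity.items() if strand.get(pos) == nt)
    return hits / len(identity)


def mean_coverage(strands, identity):
    """Average coverage over a collection of strands."""
    return sum(coverage(s, identity) for s in strands) / len(strands)


# Two made-up strands restricted to the seven identity positions.
# They are placeholders for illustration only, NOT real tRNA sequences.
example_strands = [
    {12: "C", 16: "A", 24: "A", 35: "U", 36: "C", 39: "G", 50: "U"},  # 5 of 7 match
    {12: "U", 16: "A", 24: "A", 35: "U", 36: "C", 39: "C", 50: "A"},  # 6 of 7 match
]

for s in example_strands:
    print(f"coverage = {coverage(s, GLU_IDENTITY):.3f}")
print(f"mean coverage = {mean_coverage(example_strands, GLU_IDENTITY):.3f}")
```

With real strands in place of the placeholders, the same average would be expected to correspond to the mean coverage reported in the caption, provided the assumed definition matches the one used in the study.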
Figure 6. The number of identity positions, real and possible cases, found with the SimpleLogistic model for different tRNAaa classes. Color meaning: blue—min_real, the minimal number found in the real tRNAaa strands; orange—max_real, the maximal number found in the real tRNAaa strands; grey—max_theor, the maximal number theoretically predicted by the SimpleLogistic model for the strand of a given tRNAaa class. The tRNAaa classes are presented according to increasing amino acid molecular weight.
Figure 7. (a). The landscape of all inputs to the free energy, ΔGijk, determined in the SimpleLogistic model for the full training set. ΔGijk is the input to the free energy of tRNA binding to the i-th synthetase from the j-th position of the tRNA strand occupied by the k-th nucleot(s)ide. All classes and all nucleot(s)ides are considered. Each point represents a single energy input value, positive or negative. Vertical bars show the maximal or minimal value. The colored area at the bottom indicates parts of the secondary structure of the tRNA molecule. The white area indicates loops, the red area indicates anticodons, and the other matching colors indicate complementary regions of stems. (b). The example for the tRNAGln class, extracted from (a).
Figure 8. The nucleotide-independent part of the free energy change, ΔG0ii, for different tRNAaa classes, calculated according to the parameters of the SimpleLogistic classification, bi (Equation (1)), and the proposed formula of the theoretical model of tRNA binding to tRNA synthetase (Equation (11)). The tRNAaa classes are presented according to increasing amino acid molecular weight.
Figure 9. (a). The spatial distribution of identity nucleotides for class tRNASer. (b). The spatial distribution of identity nucleotides for class tRNASer2. Colors correspond to those in Figure 3, Figure 4a,b, and Figure 7a,b.
Figure 10. The average values of the change in the free energy of binding ΔGii for each tRNAaa class determined using Equation (9) and the parameters of Equation (1) taken from the classification task with the SimpleLogistic classifier. The standard deviation is included. The tRNAaa classes are presented according to increasing amino acid molecular weight.
Figure 11. The values of ΔGii and ΔGforii (=ΔG0ii) are summarized on one chart to estimate strong ΔGrevii = ΔGii − ΔGforii as a measure of the strength of tRNA and aminoacyl-tRNA synthetase attraction. The values are taken from Figure 8 and Figure 10. The tRNAaa classes are presented according to increasing amino acid molecular weight.
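The relation stated in the Figure 11 caption, ΔGrevii = ΔGii − ΔGforii, can be written as a short sketch. The class labels and numbers below are placeholders in arbitrary energy units, chosen only to show the arithmetic; they are not values read from Figure 8 or Figure 10.

```python
def reversal_energy(dg_total, dg_forward):
    """Return dG_rev = dG - dG_for, the relation given in the Figure 11 caption."""
    return dg_total - dg_forward


# Hypothetical (dG, dG_for) pairs in arbitrary energy units; NOT data from the paper.
examples = {
    "classA": (-3.0, 2.0),
    "classB": (-1.5, 2.5),
}

for name, (dg, dg_for) in examples.items():
    print(name, reversal_energy(dg, dg_for))
```

Under the sign convention of Figure 4, where negative inputs denote attraction, a more negative reversal part would indicate stronger attraction between the tRNA and its synthetase.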
Figure 12. The simultaneous dependence of three free energy parts on the maximal number of possible identity points, NIP: the reversal part limited to the anticodon tandem, ΔGrevii(tan); the rest of the reversal energy without the anticodon tandem, ΔGrevii|tan; and the forward part, ΔGforii. Standard deviations are 1.45, 1.49, and 1.21, respectively.
Figure 13. The dependence of the change in total free energy ΔGii on NIP. Standard deviations are shown.
Figure 14. The dependence of the energy of tRNA attraction to aminoacyl-tRNA synthetase in the area of the anticodon tandem on the anticodon contents (G, C, A, and U). ΔGrevn is the reversal energy limited to only one position (35 or 36) and one nucleotide type. Standard deviations are shown.
Figure 15. The dependence of the total energy of the attraction on the actual number of possible hydrogen bonds in a given real identity ensemble of the tRNAGlu class. ΔGrevii’ represents the reversal part of the free energy change for real cases of the same subsets of identity nucleotides in the tRNAGlu strands with the anticodon tandem 35U and 36C. Empty squares represent the a priori zero and the theoretical value of the free energy change for the maximal number of hydrogen bonds, i.e., when all identity nucleotides are present.
Figure 16. The consensus free energy change ΔGij” estimated with the parameters of the final classification task but for the nucleotides present in the consensus strands of different classes. The horizontal line represents the energy of the reduced binding real strands of (a) tRNAGln, (b) tRNAHis, and (c) tRNALysII. The tRNAaa classes are presented according to increasing amino acid molecular weight.
Figure 17. The energy of the less bound strands, maxΔGii, for different classes. The tRNAaa classes are presented according to increasing amino acid molecular weight.
Figure 18. The electric charge [e], positive and negative, of aminoacyl-tRNA synthetases at a pH of 7.0 related to amino acids. The net charge is, in most cases, negative.
Figure 19. The coverage of identity nucleot(s)ides vs. the maximal number of identity points, NIP. Only non-repulsing identity nucleotides were calculated.
Figure 20. The molecular weight of charging amino acid, MW, and NIP of the tRNAaa classes shown in the field of discussed parameter values, with the indicated aa metabolic families, i.e., serine (Ser, Ser2, Gly, and Cys), pyruvate (Ala, Val, Leu, and Leu2), aspartate (Asp, Asn, LysI, LysII, Met, Thr, and Ile), glutamate (Glu, Gln, Pro, Arg, and Arg2), histidine (His), and aromatic (Phe, Trp, and Tyr).
Figure 21. The average free energy change ΔGii vs. the molecular weight of the corresponding amino acid. Two weight groups of amino acids were distinguished: below and above 150 Da. Empty markers indicate glutamic acid (E) and histidine (H). Histidine has the most similar consensus strand to glutamic acid and vice versa.
Figure 22. The identity positions determined by the model and reported in the literature. Note: to avoid overestimation, in the report’s analysis [22], the complementary nucleotide pairs were counted as units. In the analysis of the positions determined by the presented model, class LysI was omitted.
Table 1. A list of analyzed tRNA amino acid classes, tRNAaa.

| tRNAaa Class (aa) | Charging Amino Acid | Positions 35 and 36 | Remarks |
|---|---|---|---|
| Ala | Alanine | G C | 2-letter gen. code |
| Arg | Arginine | C G | degenerate pos. 36 |
| Arg2 | Arginine | C U | degenerate pos. 36 |
| Asn | Asparagine | U U | 3-letter gen. code |
| Asp | Aspartic acid | U C | 3-letter gen. code |
| Cys | Cysteine | C A | 3-letter gen. code |
| Gln | Glutamine | U G | 3-letter gen. code |
| Glu | Glutamic acid | U C | 3-letter gen. code |
| Gly | Glycine | C C | 2-letter gen. code |
| His | Histidine | U G | 3-letter gen. code |
| Ile | Isoleucine | A U | 3-letter gen. code |
| Leu | Leucine | A G | degenerate pos. 36 |
| Leu2 | Leucine | A A | degenerate pos. 36 |
| LysI | Lysine | U U | synthetase class I |
| LysII | Lysine | U U | synthetase class II |
| Met | Methionine | A U | 3-letter gen. code |
| Phe | Phenylalanine | A A | 3-letter gen. code |
| Pro | Proline | G G | 2-letter gen. code |
| Ser | Serine | G A | degenerate pos. 35 and 36 |
| Ser2 | Serine | C U | degenerate pos. 35 and 36 |
| Thr | Threonine | G U | 2-letter gen. code |
| Trp | Tryptophan | C A | 3-letter gen. code |
| Tyr | Tyrosine | U A | 3-letter gen. code |
| Val | Valine | A C | 2-letter gen. code |
Table 2. The values of the correlation rank for the most important attributes of those presented in Figure 3.

| Rank Position | tRNA Sequence Position | Correlation Rank Value |
|---|---|---|
| 1 | 35 | 0.1942 |
| 2 | 36 | 0.1925 |
| 3 | 46 | 0.1659 |
| 4 | 47 | 0.162 |
| 5 | 48 | 0.1612 |
| 6 | 73 | 0.1375 |
| 7 | 11 | 0.1263 |
| 8 | 24 | 0.1213 |
| 9 | 45 | 0.1172 |
| 10 | 13 | 0.1162 |
| 41 | 34 | 0.0752 |
Table 3. Predictions of the trained candidate classifiers.

| Classifier | 10-Fold Cross-Validation | 66% Split | Full Training Set |
|---|---|---|---|
| SimpleLogistic | 94.5205 | 92.5287 | 100 |
| LibLINEAR | 93.9335 | 85.6322 | 100 |
| RandomForest | 93.9335 | 83.908 | 100 |
| SMO | 92.3679 | 81.0345 | 100 |
| Dl4jMlpClassifier | 86.6928 | 78.1609 | 100 |
| ZeroR | 8.0235 | 8.046 | 8.0235 |
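The values in Table 3 can be read as percentages of correctly classified instances. As a rough arithmetic check, the sketch below assumes a data set of 511 instances and a hold-out set of 174 instances for the 66% split. Both sizes are inferred here from the reported percentages and are not stated in this section, so they should be treated as hypotheses rather than as facts about the study.

```python
# ASSUMED sizes, inferred from the percentages only (not given in the source text).
N_FULL = 511      # hypothetical total number of instances
N_HOLDOUT = 174   # hypothetical hold-out size for the 66% split

# Percentages copied from Table 3: (10-fold cross-validation, 66% split).
table3 = {
    "SimpleLogistic": (94.5205, 92.5287),
    "LibLINEAR": (93.9335, 85.6322),
    "RandomForest": (93.9335, 83.908),
    "SMO": (92.3679, 81.0345),
    "Dl4jMlpClassifier": (86.6928, 78.1609),
    "ZeroR": (8.0235, 8.046),
}


def implied_correct(pct, n):
    """Nearest whole number of correct predictions implied by a percentage."""
    return round(pct * n / 100)


def accuracy_pct(correct, n):
    """Accuracy in percent, rounded to four decimals as in Table 3."""
    return round(100 * correct / n, 4)


for name, (cv_pct, split_pct) in table3.items():
    cv_correct = implied_correct(cv_pct, N_FULL)
    split_correct = implied_correct(split_pct, N_HOLDOUT)
    print(
        f"{name}: {cv_correct}/{N_FULL} -> {accuracy_pct(cv_correct, N_FULL)}, "
        f"{split_correct}/{N_HOLDOUT} -> {accuracy_pct(split_correct, N_HOLDOUT)}"
    )
```

If the assumed sizes are right, every percentage maps back to a whole number of correct predictions. The ZeroR row then corresponds to always predicting the most frequent class, which is how that baseline classifier works.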
Table 4. The identity nucleot(s)ides for different tRNAaa classes determined using the SimpleLogistic classifier. Symbol definitions are in Supplementary Materials, Table S1.

| tRNAaa Class | Identity nucleot(s)ides (position followed by symbol) |
|---|---|
| Ala | `3G, 17D, 35G, 36C, 70U, 71C` |
| Arg | `21A, 35C, 36G` |
| Arg2 | `21A, 35C, 36U, 69G, 72U` |
| Asn | `5C, 14U, 35U, 37 6, 39C, 41G, 63~C, 73~A` |
| Asp | `34G8Q, 35U, 36C, 73G` |
| Cys | `9G, 34G, 35C, 36A, 63G, 73U` |
| Gln | `1~G, 35U, 36G, 41C, 44~A, 58~A, 70A, 71C` |
| Glu | `12C, 16A, 24A, 35U, 36C, 39C, 50A` |
| Gly | `35C, 36C` |
| His | `−1G, 2C, 17U, 35U, 36G, 37G` |
| Ile | `2G, 7~G, 35A, 36U, 37 6, 40G, 41G, 70~G` |
| Leu | `35A, 36G, 48~-, 55G` |
| Leu2 | `12A, 13~C, 35A, 36A, 48~-` |
| LysI | `4C, 18~-, 28C, 35U, 36U, 43~G, 71C` |
| LysII | `7~G, 29U, 34)~G, 35U, 36U, 67~G` |
| Met | `7G, 11G, 31P, 34MCB, 35A, 36U` |
| Phe | `21G, 23A, 34G#, 35A, 36A, 44G` |
| Pro | `35G, 36G, 37K` |
| Ser | `25A, 35G, 36A, 48~-, 73G` |
| Ser2 | `20-, 34 7G, 35C, 36U, 46~-` |
| Thr | `2C, 35G, 36U, 73U` |
| Trp | `22G, 34B, 35C, 36A, 70C, 72U, 73~U` |
| Tyr | `21C, 34G, 35U, 36A, 45-, 63~C` |
| Val | `35A, 36C` |
Table 5. The consensus representatives of tRNAaa strands, consistent and inconsistent with the predictions of the SimpleLogistic model. Symbol definitions are in Supplementary Materials, Table S1.

Class order of the symbols in each row: Ala, Arg, Arg2, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Leu2, Lys1b, Lys2b, Met, Phe, Pro, Ser, Ser2, Thr, Trp, Tyr, Val.

| Position | Consensus symbols (in the class order above) |
|---|---|
| −1 | `---------G--------------` |
| 1 | `GGGGGGGGGGGGGGGGGCGGGAGG` |
| 2 | `GGCUAGGCCCGCUGACCGGACGGG` |
| 3 | `GGCCCCUCGCGUCGCCCGACCGAU` |
| 4 | `GCCUAUGCGGCAACUUGCGAGGGU` |
| 5 | `GCCCCACUGUCGGCGGAGGAAGGC` |
| 6 | `CCCUGCCCGUUUGCGCGACAUCGC` |
| 7 | `AGGGGAAGGAUAAGUGAGGGUGGG` |
| 8 | `UUUUUUUUUUUUUUUUUUUUUUUU` |
| 9 | `AAAKAGKGAAAGGAAAAAGGAAAA` |
| 10 | `GGGLGGGGGGGLGGLGLGGGGGGG` |
| 11 | `CCCCUCUUUUCCCCCCCCCCCUCU` |
| 12 | `UUUGACGCUUUMMUUUUCMCUUGU` |
| 13 | `CCUCPGPPUPCGGCCCCCG-CCAP` |
| 14 | `AAAAAAAAAAAAAAAAAAAAAAAA` |
| 15 | `GAAAGGGGAGGGGGGGGGGGGAGG` |
| 16 | `CDUDUCDAUDUDDCDDDCD-DUUD` |
| 17 | `D--C------D--CCDDU------` |
| 18 | `-------------A---U------` |
| 19 | `GGGGGGGGGGGGGGGGGG#GGGGG` |
| 20 | `GGGGGGGGGGGGGGGGGGGGGGGG` |
| 21 | `GAADDDDCDDDDDCDDGDDDDDCD` |
| 22 | `GAGGGAGGGGGAAGGGGGAAGGAU` |
| 23 | `AAACUGCGAAAGGAAAAGGGAACA` |
| 24 | `GGGGAGAAAAGGGGGGGGGGGAGA` |
| 25 | `CUCCUCCCCCCCCCCCCUCCCCCC` |
| 26 | `RRARAAUAAARRRGAARARRAAGA` |
| 27 | `CCCPCPCCCCPCCGPPPCAACCGP` |
| 28 | `UUCPCCCCCAGUCCPCPUCACCCC` |
| 29 | `UUGCCGGGAGCGAGUGAUAGUGAU` |
| 30 | `GGGGGGGCGCCGGGGGGGGACGGG` |
| 31 | `CACGCAAGCGGAAGAPACAACAAC` |
| 32 | `UCCCCCBCCPCPCCCCBUhcCCCCU` |
| 33 | `UUUUUUUUUUUUUUUUUUUUUUUU` |
| 34 | `II{QGGNCGGGU.UCC#UIGGBGI` |
| 35 | `GCCUUCUUCUAAAUUAAGGCGCUA` |
| 36 | `CGUUCAGCCGUGAUUUAGAUUAAC` |
| 37 | `AK66AKAAAK6KKA66*K*66**A` |
| 38 | `CAAAAAACGCACAAAAAPAAAAA?` |
| 39 | `GPGCGPPCGCCPPCPPPGPGPPPG` |
| 40 | `CCCCGCCGCGGCCCCC?CCUGCCC` |
| 41 | `AACGGCCCUCGCUCACUAUCACCA` |
| 42 | `AAGAGGAGGUAAGGAGAAGAGGGG` |
| 43 | `GGGAGUGGGGGGGCAAUGUUGACA` |
| 44 | `AAAAAACAAAAUJGGAAGJJAAUA` |
| 45 | `GGGGGAGGGAG--GGGGG--GG-G` |
| 46 | `-----------CG-----GG--G-` |
| 47 | `-----------CC-----GG--C-` |
| 48 | `-----------CC-----G---C-` |
| 49 | `??GGAA????A?GCAA??GGGGG?` |
| 50 | `CGCGCC??UUCUUCACCCCCUUGC` |
| 51 | `GGAUGCGGGGUGGGGAUAAGAGAC` |
| 52 | `GGGGGGGGGGGGGGGAGGGGGGGG` |
| 53 | `GGGGGGGGGGGGGGGGGGGGGGGG` |
| 54 | `TTT.TUUTTUTTTUTTTTTTTTTT` |
| 55 | `PPPPPPPPPPPPPUPPPPPPPPPP` |
| 56 | `CCCCCCCCCCCCCCCCCCCCCCCC` |
| 57 | `GGGGAGGGGGAAGAGGGAGGGGGG` |
| 58 | `AAAAAAAAAAAAAAAAAAAAA` |
| 59 | `UAAGUAAAUAGAAAGAUAAAUAAA` |
| 60 | `UUUCUUUUUUUUUUUU?UUUUUUU` |
| 61 | `CCCCCCCCCCCCCCCCCCCCCCCC` |
| 62 | `CCCCCCCCCCCCCCCUCCCCCCCC` |
| 63 | `CCUACGCCCCACCCCCGUUCUUGG` |
| 64 | `GUGCGGGGGACAAGUUGGGGAGUG` |
| 65 | `GGGCUUGGGGUCCGCUGGCGCCCG` |
| 66 | `CCCCCUUUCUAUUCAAUCCCACCC` |
| 67 | `GGGACGGGCAAGUGUGCUGUGGCG` |
| 68 | `GGGGGUGGCAGCCGCCUGCUUCCG` |
| 69 | `CGGGUGGGCCGUUGACCGCUCCCA` |
| 70 | `UCGGGGAGCGCGGCGGGCUGGCUA` |
| 71 | `CUGAUCCGGGCGACUGGCCUGCCC` |
| 72 | `CCUCCCCAC?CCCCCCCGCCCUCC` |
| 73 | `AGAGGUUAAAAAAGAAAAGGAGAA` |
| 74 | `CCCCCCCCCCCCCCCCCCCCCCCC` |
| 75 | `CCCCCCCCCCCCCCCCCCCCCCCC` |
| 76 | `AAAAAAAAAAAAAAAAAAAAAAAA` |
Table 6. The universal tRNAaa class markers.

| tRNAaa Class | Markers (position followed by symbol) |
|---|---|
| Ala | `35G, 36C, 71C` |
| Cys | `34G, 35C, 36A, 73U` |
| Glu | `24A, 35U, 36C` |
| LysI | `4C, 28C, 35U, 36U, 71C` |
Table 7. Unique ensembles of identity nucleotides without those from positions 35 and 36. Symbol definitions are in Supplementary Materials, Table S1.

| tRNAaa Class | Ensemble (position followed by symbol) |
|---|---|
| Ala | `3G, 17D, 70U, 71C` |
| Arg2 | `21A, 69G, 72U` |
| Asn | `5C, 39C, 41G, 63~C, 73~A` |
| Asp | `34 8, 73G` |
| Cys | `9G, 34G, 63G, 73U` |
| Glu | `12C, 16A, 24A` |
| His | `−1G, 2C, 17U` |
| | `2C, 17U, 37G` |
| Ile | `2G, 7~G, 37 6, 40G, 41G, 70~G` |
| Met | `7G, 31P, 34C` |
| | `7G, 31P, 34B` |
| Phe | `21G, 23A, 34#` |
| | `23A, 34#, 44G` |
| Ser | `25A, 48~-, 73G` |
| Ser2 | `20-, 34G, 46~-` |
| Trp | `22G, 34B, 70C, 72U, 73~U` |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Pawłowski, P.H.; Zielenkiewicz, P. Determining the Identity of Nucleotides and the Energy of Binding of tRNAs to Their Aminoacyl-tRNA Synthetases Using a Simple Logistic Model. Life 2024, 14, 1328. https://doi.org/10.3390/life14101328
