Article

Comparative Analysis of Codon Usage Patterns in Nuclear and Chloroplast Genome of Dalbergia (Fabaceae)

Key Laboratory of Genetics and Germplasm Innovation of Tropical Special Forest Trees and Ornamental Plants (Ministry of Education), Hainan Key Laboratory for Biology of Tropical Ornamental Plant Germplasm, School of Forestry, Hainan University, Haikou 570228, China
* Authors to whom correspondence should be addressed.
Genes 2023, 14(5), 1110; https://doi.org/10.3390/genes14051110
Submission received: 1 April 2023 / Revised: 4 May 2023 / Accepted: 16 May 2023 / Published: 19 May 2023
(This article belongs to the Section Plant Genetics and Genomics)

Abstract

Dalbergia plants are widely distributed across more than 130 tropical and subtropical countries and have significant economic and medicinal value. Codon usage bias (CUB) is a critical feature for studying gene function and evolution and can provide a better understanding of gene regulation. In this study, we comprehensively analyzed the CUB patterns of the nuclear genome and chloroplast genomes of Dalbergia species, their relationship with gene expression, and the phylogeny of the genus. Our results showed that the synonymous and optimal codons in the coding regions of both the nuclear and chloroplast genomes of Dalbergia preferentially ended with A/U at the third codon position. Natural selection was the primary factor affecting the CUB features. Furthermore, among highly expressed genes of Dalbergia odorifera, genes with stronger CUB exhibited higher expression levels, and these highly expressed genes tended to favor G/C-ending codons. In addition, the branching patterns of the trees built from the protein-coding sequences and from the whole chloroplast genome sequences were very similar, and both differed from the clustering based on the CUB of the chloroplast genome. This study highlights the CUB patterns and features of Dalbergia species in different genomes, explores the correlation between CUB preferences and gene expression, and further investigates the phylogeny of Dalbergia, providing new insights into codon biology and the evolution of Dalbergia plants.

1. Introduction

The central dogma is the fundamental principle of the transfer of genetic information between macromolecules within biological cells [1]. During the transfer of genetic information from the mRNA to the protein, the triplet codons play a crucial role in the formation of proteins in organisms, especially during the translation process [2]. Three bases make up a codon and jointly encode one amino acid, with each amino acid being encoded by one or more codons, but no more than six [3]. Tryptophan (Trp, W) is only encoded by UGG, and methionine (Met, M) is only encoded by AUG. The remaining 18 amino acids are encoded by multiple synonymous codons, ensuring the stability of the translation process [4]. However, synonymous codons are not used randomly or equally; rather, some are preferentially used to encode amino acids over others. This phenomenon of using synonymous codons with different frequencies is known as codon usage bias (CUB) [5,6,7,8]. The preference for certain codons mainly reflects the impact of translation levels under weak natural selection. Furthermore, the composition of the third base of a codon is subject to strong mutational bias and codirectional natural selection [9,10].
The chloroplast (cp) genome is non-recombinant and maternally inherited, exhibiting features such as semi-autonomy and conservative gene content, organization, and structure relative to the mitochondrial and nuclear genomes [11,12]. It is relatively stable in structure and contains a large amount of genetic information specific to plants. Therefore, it plays an important role when we study the origin, systematic evolution, species identification, and classification of plants and serves as a critical data source for exploring the evolutionary relationships between plant species [13,14]. Systematic phylogenetic analysis based on chloroplast genomes has been extensively reported, for instance, Achyranthes (Amaranthaceae) [15], Ferula (Apiaceae) [16], and Aquilegia (Ranunculaceae) [17]. Plants are subject to different evolutionary constraints, and there are differences in codon usage patterns in chloroplast genomes [18].
Dalbergia is a pan-tropical genus with over 269 recognized tree, shrub, and woody vine species [19]. The genus is native to more than 130 countries in tropical and subtropical regions [20]. Members of Dalbergia have high economic and medicinal value [19]. Examples include the rare species Dalbergia odorifera [20,21] and the high-quality timber species Dalbergia cochinchinensis [22]. Studying codon usage bias in Dalbergia may have practical applications for the future genetic modification of these species, which are used for timber, furniture, and other economically valuable products [23,24]. Furthermore, Dalbergia plants possess nitrogen-fixing capabilities and can serve as host species for cultivating parasitic trees such as sandalwood [25]. CUB analysis is an effective method for studying species specificity, evolutionary relationships, and mRNA translation, and for discovering new genes [26]. Therefore, we systematically analyzed the codon usage patterns of the nuclear genome of D. odorifera and of the protein-coding sequences of 25 chloroplast genomes of the Dalbergia genus, and evaluated the impact of natural selection, mutational pressure, and other factors on codon usage. We also investigated the relationship between codon usage preference in the nuclear genome of D. odorifera and gene expression, as well as the phylogeny of the Dalbergia genus. This study aims to provide valuable insights into the genetic modification and improvement of Dalbergia species for economic and environmental purposes and to contribute to a better understanding of codon usage bias in other plants.

2. Materials and Methods

2.1. Data Collection and Processing

The protein-coding sequences (CDSs) of the Dalbergia chloroplast reference genomes were downloaded from the NCBI database (Supplementary Table S1). The nuclear genome protein-coding sequences of D. odorifera and the RPKM expression matrix of four tissues (root, stem, leaf, and seed) were downloaded from the GigaDB dataset (http://gigadb.org/dataset/100760), accessed on 25 May 2022. To minimize errors caused by short sequences, CDSs shorter than 300 bp were removed [27].

2.2. Codon Usage Indicators

CodonW 1.4.2 and the online software Cusp were used to calculate the codon usage metrics [28,29]: ENC, RSCU, GC, GC3s, A3, T3, G3, C3, GC1, GC2, and GC3. The GC content at the different codon positions of the CDSs was calculated with a Python 3.6 script.
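The position-specific GC calculation can be illustrated with a minimal Python sketch. This is not the authors' script; the function name `gc_by_position` is hypothetical, and the sketch assumes an in-frame CDS and counts every codon (so the third value is GC3, not GC3s, which is restricted to synonymous codons).

```python
def gc_by_position(cds):
    """Return (GC1, GC2, GC3) as fractions for an in-frame coding sequence.

    Assumption: the sequence starts at codon position 1; a trailing
    partial codon, if any, is ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence has no complete codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            counts[i % 3] += 1  # i % 3 is the 0-based codon position
    return tuple(c / n_codons for c in counts)


if __name__ == "__main__":
    gc1, gc2, gc3 = gc_by_position("ATGGCC")
    gc12 = (gc1 + gc2) / 2  # mean of GC1 and GC2, as used in neutrality plots
    print(gc1, gc2, gc3, gc12)
```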

2.3. Synonymous Codon Analysis

The relative synonymous codon usage (RSCU) is the ratio of the observed frequency of a particular codon to the expected frequency if all synonymous codons for a specific amino acid were used equally in the coding sequence. A codon with an RSCU value > 1.6 is considered over-represented, while an RSCU value < 0.6 is considered under-represented. RSCU values < 1 indicate that the codon is used less frequently, while RSCU values >1 indicate that the codon is used more frequently. The RSCU index of codons is calculated as follows [30]:
\[
RSCU_{ij} = \frac{X_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}}
\]
where X_{ij} is the frequency of occurrence of the jth codon of the ith amino acid and n_i is the number of synonymous codons of the ith amino acid.
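As an illustration of the RSCU formula, the following Python sketch computes RSCU from a set of in-frame coding sequences. It is a sketch under stated assumptions, not the CodonW implementation: it uses the standard genetic code, the names `CODON_TABLE` and `rscu` are hypothetical, and amino acids with a single codon (Met, Trp) or with no observed codons are skipped.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in TCAG order (assumption: no alternative code table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def rscu(sequences):
    """Return {codon: RSCU} pooled over an iterable of in-frame CDS strings."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # ignores codons with ambiguous bases
                counts[codon] += 1

    # Group sense codons into synonymous families, one per amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        n_i = len(codons)
        total = sum(counts[c] for c in codons)
        if n_i < 2 or total == 0:
            continue  # Met/Trp, or amino acid absent from the input
        expected = total / n_i  # count expected under equal usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    print(rscu(["TTTTTTTTC"]))  # two TTT and one TTC, both coding for Phe
```

When every amino acid is observed, the returned dictionary has 59 entries, matching the 59 synonymous codons referred to in the text.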

2.4. ENC-Plot Analysis

The effective number of codons (ENC) measures how far codon usage departs from the equal use of synonymous codons. It takes values in the range of 20–61. The ENC value is inversely related to the degree of codon usage bias: a smaller ENC value indicates a stronger codon usage bias, and a larger ENC value indicates a weaker one. We used the values of GC3s and ENC as the horizontal and vertical coordinates, respectively, and added the standard curve given by the following equation [31]:
\[
ENC_{exp} = 2 + GC_{3s} + \frac{29}{GC_{3s}^{2} + (1 - GC_{3s})^{2}}
\]
ENC_{exp} represents the expected ENC of a gene when codon usage preference is determined only by GC3s composition. GC3s is the G + C content at the third position of synonymous codons.
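The standard curve can be evaluated directly from the equation. The short Python sketch below does this; the function name `enc_expected` is hypothetical, and GC3s is assumed to be given as a fraction between 0 and 1.

```python
def enc_expected(gc3s):
    """Expected ENC when codon usage depends only on GC3s (a fraction in [0, 1])."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


if __name__ == "__main__":
    # Points for drawing the standard curve of an ENC-plot.
    for step in range(0, 11):
        x = step / 10
        print(f"{x:.1f}\t{enc_expected(x):.2f}")
```

Genes whose observed ENC lies well below the value returned for their GC3s fall below the standard curve in the plot.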

2.5. Parity Rule 2 (PR2) Bias Plot Analysis

The formation of codon usage preferences is closely related to the base in the third position of the codon. We calculated the base composition at the third codon position and plotted G3/(G3 + C3) on the horizontal axis against A3/(A3 + T3) on the vertical axis to analyze the distribution of the third base. Theoretically, base usage at the third codon position should follow the PR2 principle (A = T, C = G), which places a gene at the center of the plot (0.5, 0.5). The distance and direction of a point from the center indicate the degree and direction of the deviation of codon usage from this rule [8].
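The PR2 coordinates of a single gene can be sketched in Python as follows. The function name `pr2_point` is hypothetical, and the sketch assumes that all codons of an in-frame CDS contribute to the third-position counts; some PR2 analyses restrict the counts to four-fold degenerate codons, which this example does not do.

```python
from collections import Counter


def pr2_point(cds):
    """Return (G3/(G3+C3), A3/(A3+T3)) for an in-frame CDS, or None if undefined."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    third = Counter(cds[i] for i in range(2, usable, 3))
    gc = third["G"] + third["C"]
    at = third["A"] + third["T"]
    if gc == 0 or at == 0:
        return None  # a ratio would divide by zero
    return third["G"] / gc, third["A"] / at


if __name__ == "__main__":
    x, y = pr2_point("AATAATAAAAAG")  # third bases: T, T, A, G
    print(x, y)  # x > 0.5 means G is used more than C; y < 0.5 means T more than A
```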

2.6. Neutrality Plot Analysis

Neutrality plot analysis is a method used to explore the factors that influence codon usage patterns. GC3 and GC12 (the mean of GC1 and GC2) are calculated for each gene, and a straight line is fitted with GC3 as the independent variable and GC12 as the dependent variable. The regression coefficient is the key indicator of neutrality: its sign gives the direction of the correlation between GC3 and GC12, and its magnitude gives the strength of the correlation [32].
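The straight-line fit can be written as an ordinary least-squares regression. The Python sketch below is one way to do it with the standard library only; the function name `neutrality_fit` and the example values are hypothetical and are not data from this study.

```python
def neutrality_fit(gc3, gc12):
    """Least-squares fit of GC12 = slope * GC3 + intercept; returns (slope, intercept)."""
    n = len(gc3)
    if n != len(gc12) or n < 2:
        raise ValueError("need two equally long lists with at least two genes")
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    if sxx == 0:
        raise ValueError("GC3 values are all identical; slope is undefined")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Illustrative values only.
    slope, intercept = neutrality_fit([0.2, 0.4, 0.6], [0.40, 0.42, 0.44])
    print(round(slope, 4), round(intercept, 4))
```

A slope close to 1 means GC12 changes in step with GC3, whereas a slope close to 0 means GC12 is largely independent of GC3.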

2.7. Determination of Optimal Codons

Optimal codons are codons that are used at high frequency and whose usage differs between genes with strong and weak codon bias by more than a given threshold. High-frequency codons are those with RSCU values greater than 1. Genes were divided into high and low codon preference groups based on their ENC values, and the difference in RSCU between the two groups (ΔRSCU) was calculated for each codon, with a threshold value of 0.08. A codon with ΔRSCU > 0.08 and RSCU > 1 was considered an optimal codon [33].
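The selection rule can be sketched in Python as below. This is an illustration under assumptions that the text does not spell out: ΔRSCU is taken as the RSCU in the high-bias (low-ENC) gene group minus the RSCU in the low-bias (high-ENC) group, and the RSCU > 1 condition is applied to the RSCU of the full gene set. The function name `optimal_codons` and the example values are hypothetical.

```python
def optimal_codons(rscu_all, rscu_high_bias, rscu_low_bias, delta_threshold=0.08):
    """Return the sorted codons with RSCU > 1 and delta-RSCU above the threshold.

    rscu_all:       {codon: RSCU} over all genes (assumed basis for RSCU > 1)
    rscu_high_bias: {codon: RSCU} in the low-ENC (strong bias) gene group
    rscu_low_bias:  {codon: RSCU} in the high-ENC (weak bias) gene group
    """
    selected = []
    for codon, value in rscu_all.items():
        delta = rscu_high_bias.get(codon, 0.0) - rscu_low_bias.get(codon, 0.0)
        if value > 1.0 and delta > delta_threshold:
            selected.append(codon)
    return sorted(selected)


if __name__ == "__main__":
    # Illustrative values for three alanine codons, not data from this study.
    overall = {"GCA": 1.3, "GCC": 0.7, "GCT": 1.2}
    strong = {"GCA": 1.5, "GCC": 0.6, "GCT": 1.20}
    weak = {"GCA": 1.1, "GCC": 0.9, "GCT": 1.19}
    print(optimal_codons(overall, strong, weak))
```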

2.8. The Correlation between Codon Usage and Gene Expression

In order to investigate the interaction between codon usage and gene expression in the nuclear genome of D. odorifera, we evaluated three levels: (1) At the sequence level, all ENC values of the CDS sequence were divided into two categories (low and high codon bias strength). They were also characterized at high, medium, and low transcription levels, respectively. (2) At the codon level, the bias of bases (A, T, C, and G) in the third position of synonymous codons was calculated. The total number of four bases in each codon and the maximum number of biases in the third position of the codon were calculated, and then all RPKMs of CDS sequences were classified into four categories based on the use of the third base of the codon. (3) At the amino acid level, amino acids were divided into four categories based on the frequency of use of 59 synonymous codons (Supplementary Table S2). For each type of amino acid, the RSCU value of synonymous codons was calculated, and the codon with the highest RSCU was used to group them. The gene expression level of CDS sequences in each group was then analyzed, and t-tests were used to analyze the significant differences in the usage of different categories of synonymous codons for each type of amino acid [34].

2.9. Cluster Analysis and Phylogenetic Tree Construction

Cluster analysis was performed using the RSCU values of codons as features and the Euclidean distance as the distance measure [35]. In addition, 25 chloroplast genome sequences were used for phylogenetic analysis. All sequences were aligned using MAFFT (v7.505) (https://mafft.cbrc.jp/alignment/server/, accessed on 25 May 2022) with the parameters “mafft --thread 8 --threadtb 5 --threadit 0 --reorder --auto”. Regions that were obviously misaligned were removed manually, and the phylogenetic tree was then constructed with the neighbor-joining (NJ) method in MEGA v11.0 (https://www.megasoftware.net/, accessed on 25 May 2022) [36].
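The distance step of the cluster analysis can be illustrated with a short Python sketch that computes pairwise Euclidean distances between RSCU profiles. The function name `rscu_distance_matrix`, the species labels, and the values are hypothetical; the linkage method used to turn the distances into a tree is not stated in the text and is not shown here.

```python
import math


def rscu_distance_matrix(profiles):
    """Return {(a, b): Euclidean distance} for every ordered pair of species.

    profiles: {species: {codon: RSCU}}; a codon missing from a profile is
    treated as 0.0 (an assumption of this sketch).
    """
    species = sorted(profiles)
    codons = sorted({c for p in profiles.values() for c in p})
    dist = {}
    for a in species:
        for b in species:
            squared = sum(
                (profiles[a].get(c, 0.0) - profiles[b].get(c, 0.0)) ** 2
                for c in codons
            )
            dist[(a, b)] = math.sqrt(squared)
    return dist


if __name__ == "__main__":
    # Two-codon toy profiles; real profiles would have 59 entries each.
    toy = {
        "sp1": {"TTT": 1.0, "TTC": 1.0},
        "sp2": {"TTT": 1.6, "TTC": 0.2},
    }
    print(rscu_distance_matrix(toy)[("sp1", "sp2")])
```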

3. Results

3.1. Analysis of Codon Composition Characteristics

In the nuclear genome of D. odorifera, there are 27,940 coding protein sequences, while the 25 chloroplast genomes of the Dalbergia species have a total of 1438 protein-coding sequences (Supplementary Table S3). For each sequence, the usage of seven codons was calculated, and the codon usage biases of the protein-coding sequences in the nuclear genome and chloroplast genomes were compared (Table 1 and Figure 1). Additionally, a correlation analysis was conducted on the codon usage bias of protein-coding sequences in both the nuclear genome and chloroplast genomes (Figure 2A,B).
In the nuclear genome, there were 86 sequences with an ENC value less than 35 and 19,724 sequences with an ENC value greater than 50. The average ENC value was 51.86 (Table 1 and Supplementary Table S4), indicating a weak overall codon usage bias. In the protein-coding sequences of the chloroplast genomes, no sequence had an ENC value less than 35, while 684 sequences had an ENC value greater than 50. The average ENC value was 49.90, which was slightly lower than that observed in the protein-coding sequences of the nuclear genome (Table 1 and Supplementary Table S5). The average GC content of the nuclear genome protein-coding sequences was 46.1%, while that of the chloroplast genome protein-coding sequences was 39.0% (Table 1). The total GC content of the chloroplast genome protein-coding sequences was lower than that of the nuclear genome sequences (t = 58.672, p < 0.0001). Furthermore, the ranking of GC content across codon positions differed between the nuclear and chloroplast protein-coding sequences. In the chloroplast genomes, GC1 was the highest and GC3 the lowest, whereas in the nuclear genome, GC1 was the highest and GC2 the lowest. These results are consistent with previous studies on Mesona chinensis [37] and Sesamum indicum [38] (Figure 2C,D).
In addition, both the nuclear genome and the chloroplast genome protein-coding sequences had GC content and GC3s less than 50%, indicating an enrichment of A/U bases in the third position of the codon. However, there was a significant difference in GC3s between the chloroplast and nuclear genome protein-coding sequences (t = 26.726, p < 0.001), indicating differences in codon usage between the two sequence types. Meanwhile, our correlation analysis shows that the chloroplast genome protein-coding sequences showed a stronger correlation in codon usage bias compared to the nuclear genome protein-coding sequences, particularly in the GC1, GC2, and ENC indices. Notably, the ENC value of both showed a positive correlation with the G/C content and a negative correlation with the A/T content. Additionally, the ENC value was most closely related to GC3 content (Figure 2A,B), indicating a strong conservation of the GC3 at the genetic level.
Furthermore, the analysis of codon composition characteristics in the chloroplast genome of 25 Dalbergia species revealed an average of 24,482 codons, with Dalbergia millettii having the most (24,738) and Dalbergia oliveri having the fewest (24,317) (Table S5). The GC content of the three base positions in each chloroplast genome’s codons was biased toward A and U. The ENC values of all 25 species were between 49 and 50 (Table S5), indicating a weak codon bias. These findings suggest that the codon GC content, ENC value, and their correlations may affect the analysis of factors influencing codon usage.

3.2. Synonymous Codon Analysis

We analyzed synonymous codon usage in 25 species of Dalbergia and found differences in their relative usage across species, which allowed us to group them into nine clusters. The 59-dimensional codon vectors could be classified into six categories. In the chloroplast genomes of Dalbergia, we observed a bias toward using high-frequency codons with A/U endings, with 30 synonymous codons having a relative usage greater than 1 (Figure 3A).

3.3. ENC-Plot Analysis

Mutational pressure and natural selection are important factors that affect codon usage. If mutational pressure is the main factor affecting codon usage preference, the observed ENC values (ENCobs) should lie close to the ENCexp expectation curve. Conversely, if natural selection is the main factor affecting codon usage preference, the observed ENC values will deviate farther from the ENCexp expectation curve [39]. Both the protein-coding sequences of the nuclear genome and those of the chloroplast genomes are mainly distributed below the standard curve, and the ENC values are clustered between 40 and 61 (Figure 3B). This indicates that both have similar codon usage preferences and relatively weak codon bias. In addition, natural selection is the main factor affecting their codon usage preferences, followed by mutation (Figure 3B).

3.4. PR2-Plot Analysis

We conducted a Parity Rule 2 (PR2) analysis of the third codon base (A3/T3 and G3/C3) in the protein-coding sequences of the nuclear genome of D. odorifera and of the 25 chloroplast genomes of Dalbergia. The points were unevenly distributed among the four quadrants of the plot. Specifically, in the protein-coding sequences of the nuclear genome of D. odorifera, the A3/(A3 + T3) ratio was mainly distributed below 0.5, indicating that T was used more frequently than A in the nuclear genome. In addition, in the chloroplast genomes of the Dalbergia species, the G3/(G3 + C3) ratio was mainly distributed above 0.5, indicating that G was used more frequently than C in the chloroplast genome (Figure 3C). These results further demonstrate that base mutations and natural selection jointly affect the codon usage bias in both the nuclear genome and chloroplast genome of the Dalbergia species, with natural selection being the major factor influencing the codon usage bias of these genomes.

3.5. Neutrality Plot Analysis

In neutral plot analysis, if the main factor affecting codon bias is mutation, then the fitted linear regression coefficient is close to 1. Conversely, if the main factor affecting codon bias is natural selection, then the fitted linear regression coefficient should be close to 0 [40]. Neutral plot analysis showed that in the protein-coding sequences of the nuclear genome (r = 0.2864, p < 0.001), GC12 was distributed from 0.2610 to 0.8219, and GC3 was distributed from 0.1877 to 0.9112. In the protein-coding sequences of the chloroplast genome (r = 0.2308, p < 0.001), GC12 was distributed from 0.3046 to 0.5396, and GC3 was distributed from 0.2005 to 0.4805 (Supplementary Table S6). The fluctuations in GC3 and GC12 are smaller in the chloroplast genome and their values tend to be more stable. The correlation coefficients of GC3 and GC12 in the protein-coding sequences of the nuclear genome and chloroplast genome were 0.1294 and 0.3118, respectively (Figure 3D), indicating that both parameters were positively correlated in the protein-coding sequences of the nuclear genome and chloroplast genome, with a closer correlation in the chloroplast genome. The results further indicate that natural selection is the major factor influencing the codon usage patterns in both the nuclear and chloroplast genomes of the Dalbergia genus, while base mutations play a secondary role.

3.6. Optimal Codon Determination

To determine the optimal codons for the protein-coding sequences of the nuclear genome and the chloroplast genomes of 25 Dalbergia species, we sorted the genes by ENC value and selected the 5% of genes at each end of the ENC distribution as the high and low expression gene sets, respectively. The RSCU values of all synonymous codons in the nuclear genome and chloroplast genome were calculated. A total of 25 optimal codons (RSCU > 1, ΔRSCU > 0.08), with 16 ending in A/U (A:6, U:5, G:2, and C:3), were selected from the nuclear genome protein-coding sequences (Supplementary Table S7). In the chloroplast protein-coding sequences, the number of optimal codons ranged from 15 in Dalbergia martinii to 23 in Dalbergia obtusifolia (Figure 4). In addition, the chloroplast genomes of the 25 species share four optimal codons: UUG (leucine), UCU (serine), GCA (alanine), and UGU (cysteine). Except for UUG, these codons end in A/U (Figure 4). This result is similar to that for the nuclear protein-coding sequences. These findings suggest that the nuclear and chloroplast protein-coding sequences of Dalbergia tend to use A/U-ending codons.

3.7. Relationship between Codon Usage and Gene Expression

To better understand the expression of genes at the level of codon bias, we plotted RPKM-ENC heatmaps for the leaf, root, stem, and seed tissues of D. odorifera (Figure 5). The expression trends of genes in the four tissues are similar, and most genes have ENC values greater than 50. This indicates that the codon bias of the protein-coding sequences in the nuclear genome of D. odorifera is relatively weak. To further analyze the relationship between gene expression and codon bias, we conducted a three-level analysis. At the sequence level, we classified all genes into three expression levels: high (RPKM > 10), medium (10 ≥ RPKM > 1), and low (RPKM ≤ 1), with genes having an RPKM of 0 being excluded. Each expression level was then further divided into two groups based on ENC values. Our results showed that more genes had ENC values greater than 50, indicating a lower codon usage bias in the D. odorifera genome. Notably, significant differences in RPKM values were observed in both the high and medium expression level groups (Supplementary Table S8). The relationship between gene expression and codon bias was not consistent across the three expression levels: ① In the high expression level group, gene expression increased with stronger codon bias (t = 3.5113, p = 4.519 × 10^-4); ② In contrast, in the medium expression level group, genes with weaker codon bias had higher expression levels (t = −5.5917, p = 2.348 × 10^-8); ③ In the low expression level group, the relationship between gene expression and codon bias was not evident, unlike in the other two groups (t = −0.9879, p = 0.3233). This result suggests that, among the highly expressed genes of D. odorifera, stronger codon bias is associated with higher expression, while this trend was not observed among the medium and low expression level genes (Figure 6A–C).
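The sequence-level grouping can be sketched in Python as follows. The RPKM thresholds are those given above; the ENC cutoff of 50 used to split each expression level into strong- and weak-bias groups is an assumption of this sketch, as are the names `expression_level` and `group_genes` and the example genes.

```python
def expression_level(rpkm):
    """Classify a gene by RPKM; genes with RPKM of 0 are excluded (None)."""
    if rpkm <= 0:
        return None
    if rpkm > 10:
        return "high"
    if rpkm > 1:
        return "medium"
    return "low"


def group_genes(genes, enc_cutoff=50.0):
    """Group (gene_id, rpkm, enc) tuples by expression level and bias strength.

    enc_cutoff is an assumed threshold: ENC <= cutoff counts as strong bias.
    Returns {(level, bias): [gene_id, ...]}.
    """
    groups = {}
    for gene_id, rpkm, enc in genes:
        level = expression_level(rpkm)
        if level is None:
            continue  # unexpressed gene, excluded from the analysis
        bias = "strong" if enc <= enc_cutoff else "weak"
        groups.setdefault((level, bias), []).append(gene_id)
    return groups


if __name__ == "__main__":
    # Illustrative genes, not data from this study.
    demo = [("g1", 12.5, 45.0), ("g2", 3.0, 55.2), ("g3", 0.0, 48.0), ("g4", 0.4, 52.1)]
    print(group_genes(demo))
```

The RPKM values of the strong- and weak-bias groups within each expression level can then be compared with a t-test, as described in Section 2.8.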
Based on the third base bias of each codon, we grouped all CDS sequences into four groups at the codon level: those ending in A, T, C, and G. CDS sequences with codons biased toward G/C endings had higher RPKM values than those biased toward A/T endings (Figure 6D and Supplementary Table S9). This result suggests that genes biased toward G/C endings at the third base tend to be highly expressed.
On the basis of the frequency of synonymous codon usage for each amino acid, we divided the CDS sequences into two-codon groups (Phe, His, Lys, Asn, Asp, Cys, Gln, Glu, and Tyr) (Figure S1), three-codon groups (Ile), four-codon groups (Pro, Thr, Ala, Gly, and Val), and six-codon groups (Ser, Leu, and Arg) (Figure S2 and Supplementary Table S10). For each compared synonymous codon of an amino acid, except for Leu and Arg, codons ending in G/C were predominantly expressed compared to those ending in A/U (Figures S1 and S2). This result is consistent with the codon-level findings. It is worth noting that although the synonymous codon analysis and optimal codon analysis suggest that D. odorifera prefers to end with A/U, most highly expressed genes in this study tended to end with G/C. This suggests that highly expressed genes in D. odorifera prefer to end with G/C, unlike the codon third base bias preference for A/U.

3.8. Phylogenetic Relationships of 25 Dalbergia Species

To investigate the correlation between codon usage bias and phylogeny, we generated trees using RSCU values, CDS sequences, and whole chloroplast genomes. Our analysis revealed that the phylogenetic trees constructed from the whole chloroplast genomes and the CDS sequences of the 25 Dalbergia species were more consistent with the accepted classification of the species than the tree based on RSCU. These findings could provide insights into the evolution and diversification of Dalbergia (Figure 7).
The clustering based on the RSCU values differed significantly from these trees, with only some branches being identical (Figure 7 and Figure S3). However, the GC content and ENC values were consistent regardless of the tree, indicating that each species in Dalbergia has a fixed codon usage profile. Although GC3 exhibits slight fluctuations, and the CDS-based tree is more similar to the tree based on the whole chloroplast genome, both differ from the clustering tree constructed from RSCU. This suggests that preference-free codons also play a significant role in the process of species evolution, and the relevant properties of preference-free codons need further investigation (Figure 7 and Figure S3).

4. Discussion

Unequal use of synonymous codons in coding sequences is common in all life forms [41]. According to previous studies, codons play an important role in gene regulation and molecular evolution as essential elements in the translation of gene coding regions into proteins. The coupling between codon usage and protein sequence selection varies over a wide range and is particularly pronounced in high preference genes [42,43]. At present, studies of codon usage bias have been reported for many species [39,44,45,46]. In this study, we analyzed the codon usage patterns in the nuclear genome of D. odorifera and in the chloroplast genomes of 25 Dalbergia species. We found that the codon usage patterns were very similar between the nuclear genome and the chloroplast genomes of Dalbergia, as well as among different chloroplast genomes within the genus. Furthermore, we evaluated the influence of factors such as mutation pressure and natural selection on the uneven usage of codons. We also explored in depth the correlation between codon usage bias in the nuclear genome of D. odorifera and its gene expression. Additionally, we constructed trees based on RSCU, CDS, and the whole chloroplast genome, respectively.
Due to the relatively weak selection pressure on the third base of codons, GC3 is usually considered an important parameter for analyzing codon usage bias [47]. Based on our analysis of synonymous and optimal codons, we found that the protein-coding sequences of the nuclear and chloroplast genomes of Dalbergia exhibit a preference for A/U-ending codons (Figure 3A and Figure 4). This result is consistent with the previously reported codon usage bias in dicotyledons such as Theaceae and Solanum [8,48].
The factors that affect codon usage bias are complex, including mutation pressure, GC content, natural selection, gene length, recombination rate, gene expression level, and genetic code repair, among others [49,50,51]. ENC-plot, PR2-plot, and neutrality plot analyses revealed that natural selection is the major factor driving codon usage bias in the genomes of Dalbergia, with mutation pressure playing a secondary role (Figure 3B–D). This is consistent with previous studies of Euphorbiaceae [52], Panicum [53], and Malus [54]. Further exploration is needed into other factors influencing codon usage preferences. Overall, the above-mentioned factors work together to shape the codon usage pattern in Dalbergia species. Our analyses suggest that natural selection is the main force driving the observed codon usage bias, which is consistent with the notion that the usage of synonymous codons is under selection for translational efficiency and accuracy. Understanding the factors that govern codon usage patterns in Dalbergia could provide insights into the evolutionary processes and molecular mechanisms underlying gene expression regulation in this important genus.
In our study, we evaluated the correlation between the expression levels of protein-coding genes in D. odorifera and codon bias at three levels. At the sequence level, we investigated the relationship between gene expression and codon usage bias across high, medium, and low transcription levels. In the high expression gene group, gene expression and codon bias showed a positive correlation, consistent with previous research on Cuscuta australis (Convolvulaceae) [34]. Conversely, the medium expression gene group showed a negative correlation between codon bias and gene expression, while the low expression gene group revealed no clear relationship between the two. This may be attributed to the species-specific characteristics of D. odorifera. Moreover, the choice of expression level thresholds in different studies may influence the results. Previous research primarily focused on the relationship between gene expression and codon bias in highly expressed genes [55]. In our study, we integrated transcription levels for all genes and divided them into three levels to characterize the relationship between gene expression and codon bias in D. odorifera more comprehensively. At the codon level, genes with a G/C-ending codon bias showed significantly higher expression levels than those with an A/U-ending codon bias. At the amino acid level, we found that highly expressed genes tended to use G/C-ending synonymous codons. These results are consistent with those observed in Arabidopsis thaliana [56]. Overall, among highly expressed genes, the strength of codon bias was positively correlated with gene expression, and highly expressed genes tended to use G/C-ending codons.
The standard evolutionary model typically assumes that codon bias is explained by a balance among mutation pressure, natural selection, and genetic drift [57]. We attempted to link codon usage preferences with phylogenetic relationships, seeking to identify similarities and differences between the two. We found that the clustering tree based on RSCU values differed significantly from the two phylogenetic trees. The codon usage features of different Dalbergia species were not entirely identical. Previous literature has reported a negative correlation between codon bias and coding region length, with genes with longer coding regions exhibiting a greater tendency toward random codon usage [58]. The more random the codon usage is, the less likely the codons are to match in sequence alignment. Our results confirmed that the clustering based on codon bias and the evolutionary trees constructed from sequence alignment differed significantly (Figure 7 and Figure S3). It is worth noting that the four species, D. odorifera, Dalbergia hainanensis, Dalbergia tonkinensis, and D. hainanensis, were more likely to be classified together. We found that these species are geographically close, all located in southern Asia, with D. odorifera and D. hainanensis originating in China’s Hainan region [19,20,59]. This indicates that species of closer geographic origin tend to have more similar codon usage preferences and phylogenetic features. Additionally, previous research has shown that clustering based on codon bias does not accurately reflect the true systematic classification and phylogenetic relationships [60]. Our study also confirmed this result. The reasons for the differences may be related to the number of effective codons, the base composition at different positions, and the usage of synonymous codons. This further emphasizes the importance of considering the mutation characteristics of genome sites and sequence information in non-coding regions during the construction of evolutionary trees, thereby supporting in-depth study of the evolution of Dalbergia plants.
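To make the RSCU-based comparison concrete, the following Python sketch computes RSCU values from coding sequences and a simple distance between two RSCU profiles. It uses the usual definition of RSCU (observed codon count divided by the mean count of all synonymous codons for the same amino acid) and the standard genetic code. The Euclidean distance and the function names are illustrative assumptions, not the clustering method used in this study.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def count_codons(cds):
    """Count in-frame codons of one CDS; codons with ambiguous bases are skipped."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    return Counter({c: n for c, n in counts.items() if c in CODON_TABLE})

def rscu(cds_list):
    """Return {codon: RSCU} pooled over all sequences.

    Only amino acids with at least two synonymous codons that occur
    at least once are reported; stop codons are ignored.
    """
    counts = Counter()
    for cds in cds_list:
        counts.update(count_codons(cds))
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if len(codons) < 2 or total == 0:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values

def rscu_distance(a, b):
    """Euclidean distance over the codons present in both RSCU profiles."""
    shared = sorted(set(a) & set(b))
    return sum((a[c] - b[c]) ** 2 for c in shared) ** 0.5
```

For example, `rscu(["GCTGCTGCTGCC"])` gives GCT an RSCU of 3.0 and GCC an RSCU of 1.0, because alanine has four synonymous codons and GCT accounts for three of the four observed alanine codons. A pairwise distance matrix built with `rscu_distance` could then be passed to any hierarchical clustering routine. Such a tree groups species by codon usage similarity rather than by aligned sequence similarity, which is one reason it may disagree with an alignment-based phylogeny.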

5. Conclusions

This study investigated the codon usage patterns of the nuclear and chloroplast genomes of Dalbergia species and explored the correlation between codon usage preferences and gene expression in D. odorifera. Additionally, a phylogenetic analysis of 25 Dalbergia species was conducted. The results showed that Dalbergia species preferred A/U-ending codons at the third position of protein-coding sequences in both the nuclear and chloroplast genomes. A total of 25 optimal codons were identified in the nuclear genome, and between 15 and 23 optimal codons were identified in the chloroplast genomes. Natural selection was found to be the primary factor influencing codon usage bias in Dalbergia species, followed by mutation pressure. Furthermore, among highly expressed genes in D. odorifera, genes with stronger codon usage bias exhibited higher expression levels, and these highly expressed genes tended to use G/C-ending codons. The phylogenetic trees based on chloroplast CDS sequences and on the whole chloroplast genome were highly similar, but the clustering tree based on codon usage features (RSCU values) differed significantly from both. This suggests that unbiased codons play an important role in the evolutionary process of species. Overall, this study provides insights into the codon usage patterns of Dalbergia species and their relationship with gene expression, as well as their evolutionary relationships.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes14051110/s1, Figure S1: The relationship between RPKM value and synonymous codon preference at the amino acid level (all amino acids with two synonymous codons; lowercase letters a and b in the boxplots indicate statistically significant differences); Figure S2: The relationship between RPKM value and synonymous codon preference at the amino acid level (all amino acids with more than two synonymous codons; lowercase letters a and b in the boxplots indicate statistically significant differences); Figure S3: Clustering diagram based on RSCU values and its codon usage indicator trend lines (GC: GC content of chloroplast protein-coding sequences; GC3: content of G+C at the third position of chloroplast protein-coding sequences; ENC: number of effective codons); Table S1: Chloroplast genome of 25 Dalbergia species; Table S2: Amino acids of four types of synonymous codons; Table S3: Whole genome and chloroplast genome protein coding sequence information; Table S4: Codon usage indices of nuclear genome and chloroplast genome; Table S5: Codon GC content of 25 Dalbergia species; Table S6: Neutral plot analysis data for nuclear and chloroplast genomes; Table S7: Nuclear genome optimal codon; Table S8: Statistics of gene expression at different levels; Table S9: Information on the four bases of the nuclear genome in leaf tissues; Table S10: Amino acid level gene expression information statistics.

Author Contributions

M.-Q.T. and S.-Q.X. conceived the project and designed the experiments; Z.-K.W. carried out the study and wrote the manuscript; Y.L. collated the data; H.-Y.Z. revised the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by grants from the National Natural Science Foundation of China (32060149, 32201624), the Hainan Provincial Natural Science Foundation of China (320RC500, 321RC469), the Key Science and Technology Program of Hainan Province (ZDKJ2021031), and the Priming Scientific Research Foundation of Hainan University (KYQD(ZR)1721, KYQD(ZR)-21039).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study are available in the Methods section and Supplementary Materials of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

ENC: Effective number of codons
GC: Total G + C content of the gene
GC1, GC2, GC3: G + C content at the first, second, and third codon positions
GC12: Average GC content at the first and second codon positions
GC3s: G + C content at the third position of synonymous codons
ENCexp: Expected ENC value
ENCobs: Observed ENC value
NCBI: National Center for Biotechnology Information
PR2: Parity Rule 2
RSCU: Relative synonymous codon usage
T3s, C3s, A3s, G3s: Content of T, C, A, and G at the third codon position of synonymous codons
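As an illustration of how the position-specific GC indices listed above relate to one another, the following Python sketch computes them for a single coding sequence. It assumes a common convention in which GC3s excludes Met (ATG), Trp (TGG), and the three stop codons, and in which GC1, GC2, and GC3 are taken over all complete in-frame codons. The function names are hypothetical, and the software used in this study may handle edge cases differently.

```python
# Codons without synonymous alternatives (Met, Trp) plus stop codons;
# these are excluded when computing GC3s under the convention assumed here.
EXCLUDED_FROM_GC3S = {"ATG", "TGG", "TAA", "TAG", "TGA"}

def gc_fraction(bases):
    """Fraction of G or C among the given bases; NaN for an empty input."""
    bases = list(bases)
    if not bases:
        return float("nan")
    return sum(b in "GC" for b in bases) / len(bases)

def gc_indices(cds):
    """Return GC, GC1, GC2, GC3, GC12, and GC3s for one coding sequence."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    gc1 = gc_fraction(c[0] for c in codons)
    gc2 = gc_fraction(c[1] for c in codons)
    gc3 = gc_fraction(c[2] for c in codons)
    synonymous = [c for c in codons if c not in EXCLUDED_FROM_GC3S]
    return {
        "GC": gc_fraction(seq[:usable]),
        "GC1": gc1,
        "GC2": gc2,
        "GC3": gc3,
        "GC12": (gc1 + gc2) / 2,  # average of first and second positions
        "GC3s": gc_fraction(c[2] for c in synonymous),
    }
```

For the toy sequence `ATGGCTGCCAAATAA`, GC1, GC2, and GC3 are all 0.4, while GC3s is 1/3 because only GCT, GCC, and AAA contribute once ATG and the stop codon TAA are set aside.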

References

  1. Saw, P.E.; Xu, X.; Chen, J.; Song, E.W. Non-coding RNAs: The new central dogma of cancer biology. Sci. China Life Sci. 2021, 64, 22–50.
  2. Hershberg, R.; Petrov, D.A. Selection on Codon Bias. Annu. Rev. Genet. 2008, 42, 287–299.
  3. Behura, S.K.; Severson, D.W. Comparative analysis of codon usage bias and codon context patterns between dipteran and hymenopteran sequenced genomes. PLoS ONE 2012, 7, e43111.
  4. Liu, H.; Huang, Y.; Du, X.; Chen, Z.; Zeng, X.; Chen, Y.; Zhang, H. Patterns of synonymous codon usage bias in the model grass Brachypodium distachyon. Genet. Mol. Res. 2012, 11, 4695–4706.
  5. Carlini, D.B.; Stephan, W. In vivo introduction of unpreferred synonymous codons into the Drosophila Adh gene results in reduced levels of ADH protein. Genetics 2003, 163, 239–243.
  6. Roberts, R.J. Restriction and modification enzymes and their recognition sequences. Nucleic Acids Res. 1985, 11 (Suppl. S13), r165–r200.
  7. Archetti, M. Codon usage bias and mutation constraints reduce the level of error minimization of the genetic code. J. Mol. Evol. 2004, 59, 258–266.
  8. Zhang, R.; Zhang, L.; Wang, W.; Zhang, Z.; Du, H.; Qu, Z.; Li, X.Q.; Xiang, H. Differences in Codon Usage Bias between Photosynthesis-Related Genes and Genetic System-Related Genes of Chloroplast Genomes in Cultivated and Wild Solanum Species. Int. J. Mol. Sci. 2018, 19, 3142.
  9. Monroe, J.G.; Srikant, T.; Carbonell-Bejerano, P.; Becker, C.; Lensink, M.; Exposito-Alonso, M.; Klein, M.; Hildebrandt, J.; Neumann, M.; Kliebenstein, D.; et al. Mutation bias reflects natural selection in Arabidopsis thaliana. Nature 2022, 602, 101–105.
  10. Ata, G.; Wang, H.; Bai, H.; Yao, X.; Tao, S. Edging on Mutational Bias, Induced Natural Selection from Host and Natural Reservoirs Predominates Codon Usage Evolution in Hantaan Virus. Front. Microbiol. 2021, 12, 699788.
  11. Wu, Z.; Tian, C.; Yang, Y.; Li, Y.; Liu, Q.; Li, Z.; Jin, K. Comparative and Phylogenetic Analysis of Complete Chloroplast Genomes in Leymus (Triticodae, Poaceae). Genes 2022, 13, 1425.
  12. Zhang, T.; Fang, Y.; Wang, X.; Deng, X.; Zhang, X.; Hu, S.; Yu, J. The complete chloroplast and mitochondrial genome sequences of Boea hygrometrica: Insights into the evolution of plant organellar genomes. PLoS ONE 2012, 7, e30531.
  13. Wang, Y.; Wang, S.; Liu, Y.; Yuan, Q.; Sun, J.; Guo, L. Chloroplast genome variation and phylogenetic relationships of Atractylodes species. BMC Genom. 2021, 22, 103.
  14. Matveeva, T.V.; Pavlova, O.A.; Bogomaz, D.I.; Demkovich, A.E.; Lutova, L.A. Molecular markers for plant species identification and phylogenetics. Ecol. Genet. 2011, 9, 32–43.
  15. Xu, J.; Shen, X.; Liao, B.; Xu, J.; Hou, D. Comparing and phylogenetic analysis chloroplast genome of three Achyranthes species. Sci. Rep. 2020, 10, 10818.
  16. Yang, L.; Abduraimov, O.; Tojibaev, K.; Shomurodov, K.; Zhang, Y.-M.; Li, W.-J. Analysis of complete chloroplast genome sequences and insight into the phylogenetic relationships of Ferula L. BMC Genom. 2022, 23, 643.
  17. Zhang, W.; Wang, H.; Dong, J.; Zhang, T.; Xiao, H. Comparative chloroplast genomes and phylogenetic analysis of Aquilegia. Appl. Plant Sci. 2021, 9, e11412.
  18. Duan, H.; Zhang, Q.; Wang, C.; Li, F.; Tian, F.; Lu, Y.; Hu, Y.; Yang, H.; Cui, G. Analysis of codon usage patterns of the chloroplast genome in Delphinium grandiflorum L. reveals a preference for AT-ending codons as a result of major selection constraints. PeerJ 2021, 9, e10787.
  19. Wu, H.Y.; Wong, K.H.; Kong, B.L.; Siu, T.Y.; But, G.W.; Tsang, S.S.; Lau, D.T.; Shaw, P.C. Comparative Analysis of Chloroplast Genomes of Dalbergia Species for Identification and Phylogenetic Analysis. Plants 2022, 11, 1109.
  20. Zhao, X.; Wang, C.; Meng, H.; Yu, Z.; Yang, M.; Wei, J. Dalbergia odorifera: A review of its traditional uses, phytochemistry, pharmacology, and quality control. J. Ethnopharmacol. 2020, 248, 112328.
  21. Hong, Z.; Li, J.; Liu, X.; Lian, J.; Zhang, N.; Yang, Z.; Niu, Y.; Cui, Z.; Xu, D. The chromosome-level draft genome of Dalbergia odorifera. Gigascience 2020, 9, giaa084.
  22. Shao, F.; Panahipour, L.; Gruber, R. Flavonoids from Dalbergia cochinchinensis: Impact on osteoclastogenesis. J. Dent. Sci. 2023, 18, 112–119.
  23. Hong, Z.; Liao, X.; Ye, Y.; Zhang, N.; Yang, Z.; Zhu, W.; Gao, W.; Sharbrough, J.; Tembrock, L.R.; Xu, D.; et al. A complete mitochondrial genome for fragrant Chinese rosewood (Dalbergia odorifera, Fabaceae) with comparative analyses of genome structure and intergenomic sequence transfers. BMC Genom. 2021, 22, 672.
  24. Sun, Y.; Gao, M.; Kang, S.; Yang, C.; Meng, H.; Yang, Y.; Zhao, X.; Gao, Z.; Xu, Y.; Jin, Y.; et al. Molecular Mechanism Underlying Mechanical Wounding-Induced Flavonoid Accumulation in Dalbergia odorifera T. Chen, an Endangered Tree That Produces Chinese Rosewood. Genes 2020, 11, 478.
  25. Lu, J.K.; Li, Z.S.; Yang, F.C.; Wang, S.K.; Liang, J.F.; He, X.H. Concurrent carbon and nitrogen transfer between hemiparasite Santalum album and two N2-fixing hosts in a sandalwood plantation. For. Ecol. Manag. 2020, 464, 118060.
  26. Zhang, Y.; Shen, Z.; Meng, X.; Zhang, L.; Liu, Z.; Liu, M.; Zhang, F.; Zhao, J. Codon usage patterns across seven Rosales species. BMC Plant Biol. 2022, 22, 65.
  27. Wang, W.; Schalamun, M.; Morales-Suarez, A.; Kainer, D.; Schwessinger, B.; Lanfear, R. Assembly of chloroplast genomes with long- and short-read data: A comparison of approaches using Eucalyptus pauciflora as a test case. BMC Genom. 2018, 19, 977.
  28. Shen, D.; Cheng, A.; Wang, M. Analysis of synonymous codon usage in the outer membrane efflux protein gene of Riemerella anatipestifer. In Proceedings of the 2012 5th International Conference on BioMedical Engineering and Informatics, Chongqing, China, 16–18 October 2012; pp. 1147–1152.
  29. Long, S.; Yao, H.; Wu, Q.; Li, G. Analysis of compositional bias and codon usage pattern of the coding sequence in Banna virus genome. Virus Res. 2018, 258, 68–72.
  30. Sharp, P.M.; Li, W.H. An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 1986, 24, 28–38.
  31. Dilucca, M.; Forcelloni, S.; Georgakilas, A.G.; Giansanti, A.; Pavlopoulou, A. Codon Usage and Phenotypic Divergences of SARS-CoV-2 Genes. Viruses 2020, 12, 498.
  32. Dilucca, M.; Pavlopoulou, A.; Georgakilas, A.G.; Giansanti, A. Codon usage bias in radioresistant bacteria. Gene 2020, 742, 144554.
  33. Wang, L.; Xing, H.; Yuan, Y.; Wang, X.; Saeed, M.; Tao, J.; Feng, W.; Zhang, G.; Song, X.; Sun, X. Genome-wide analysis of codon usage bias in four sequenced cotton species. PLoS ONE 2018, 13, e0194372.
  34. Liu, X.-Y.; Li, Y.; Ji, K.-K.; Zhu, J.; Ling, P.; Zhou, T.; Fan, L.-Y.; Xie, S.-Q. Genome-wide codon usage pattern analysis reveals the correlation between codon usage bias and gene expression in Cuscuta australis. Genomics 2020, 112, 2695–2702.
  35. Almutairi, M.M. Analysis of chromosomes and nucleotides in rice to predict gene expression through codon usage pattern. Saudi J. Biol. Sci. 2021, 28, 4569–4574.
  36. Huang, C.; Fu, D.; Wu, X.; Chu, M.; Ma, X.; Jia, C.; Guo, X.; Bao, P.; Yan, P.; Chunnian, L. Characterization of the complete mitochondrial genome of the Bazhou yak (Bos grunniens). Mitochondrial DNA B Resour. 2019, 4, 3234–3235.
  37. Tang, D.; Wei, F.; Cai, Z.; Wei, Y.; Khan, A.; Miao, J.; Wei, K. Analysis of codon usage bias and evolution in the chloroplast genome of Mesona chinensis Benth. Dev. Genes Evol. 2021, 231, 1–9.
  38. Andargie, M.; Congyi, Z. Genome-wide analysis of codon usage in sesame (Sesamum indicum L.). Heliyon 2022, 8, e08687.
  39. Gao, Y.; Lu, Y.; Song, Y.; Jing, L. Analysis of codon usage bias of WRKY transcription factors in Helianthus annuus. BMC Genom. Data 2022, 23, 46.
  40. He, B.; Dong, H.; Jiang, C.; Cao, F.; Tao, S.; Xu, L.-A. Analysis of codon usage patterns in Ginkgo biloba reveals codon usage tendency from A/U-ending to G/C-ending. Sci. Rep. 2016, 6, 35927.
  41. Nath Choudhury, M.; Uddin, A.; Chakraborty, S. Codon usage bias and its influencing factors for Y-linked genes in human. Comput. Biol. Chem. 2017, 69, 77–86.
  42. Zhou, Z.; Dang, Y.; Zhou, M.; Li, L.; Yu, C.-h.; Fu, J.; Chen, S.; Liu, Y. Codon usage is an important determinant of gene expression levels largely through its effects on transcription. Proc. Natl. Acad. Sci. USA 2016, 113, E6117–E6125.
  43. Ran, W.; Kristensen, D.M.; Koonin, E.V. Coupling between protein level selection and codon usage optimization in the evolution of bacteria and archaea. mBio 2014, 5, e00956-14.
  44. Southworth, J.; Armitage, P.; Fallon, B.; Dawson, H.; Bryk, J.; Carr, M. Patterns of Ancestral Animal Codon Usage Bias Revealed through Holozoan Protists. Mol. Biol. Evol. 2018, 35, 2499–2511.
  45. Shen, Z.; Gan, Z.; Zhang, F.; Yi, X.; Zhang, J.; Wan, X. Analysis of codon usage patterns in citrus based on coding sequence data. BMC Genom. 2020, 21, 234.
  46. Chakraborty, S.; Mazumder, T.H.; Uddin, A. Compositional dynamics and codon usage pattern of BRCA1 gene across nine mammalian species. Genomics 2019, 111, 167–176.
  47. Deb, B.; Uddin, A.; Chakraborty, S. Codon usage pattern and its influencing factors in different genomes of hepadnaviruses. Arch. Virol. 2020, 165, 557–570.
  48. Wang, Z.; Cai, Q.; Wang, Y.; Li, M.; Wang, C.; Wang, Z.; Jiao, C.; Xu, C.; Wang, H.; Zhang, Z. Comparative Analysis of Codon Bias in the Chloroplast Genomes of Theaceae Species. Front. Genet. 2022, 13, 824610.
  49. Barbhuiya, P.A.; Uddin, A.; Chakraborty, S. Understanding the codon usage patterns of mitochondrial CO genes among Amphibians. Gene 2021, 777, 145462.
  50. Marais, G.; Mouchiroud, D.; Duret, L. Does recombination improve selection on codon usage? Lessons from nematode and fly complete genomes. Proc. Natl. Acad. Sci. USA 2001, 98, 5688–5692.
  51. Huang, X.; Jiao, Y.; Guo, J.; Wang, Y.; Chu, G.; Wang, M. Analysis of codon usage patterns in Haloxylon ammodendron based on genomic and transcriptomic data. Gene 2022, 845, 146842.
  52. Wang, Z.; Xu, B.; Li, B.; Zhou, Q.; Wang, G.; Jiang, X.; Wang, C.; Xu, Z. Comparative analysis of codon usage patterns in chloroplast genomes of six Euphorbiaceae species. PeerJ 2020, 8, e8251.
  53. Li, G.; Zhang, L.; Xue, P. Codon usage pattern and genetic diversity in chloroplast genomes of Panicum species. Gene 2021, 802, 145866.
  54. Li, G.; Zhang, L.; Xue, P.; Zhu, M. Comparative Analysis on the Codon Usage Pattern of the Chloroplast Genomes in Malus Species. Biochem. Genet. 2022, 60, 1–15.
  55. Hao, J.; Liang, Y.; Ping, J.; Li, J.; Shi, W.; Su, Y.; Wang, T. Chloroplast gene expression level is negatively correlated with evolutionary rates and selective pressure while positively with codon usage bias in Ophioglossum vulgatum L. BMC Plant Biol. 2022, 22, 580.
  56. Sahoo, S.; Das, S.S.; Rakshit, R. Codon usage pattern and predicted gene expression in Arabidopsis thaliana. Gene 2019, 721, 100012.
  57. Fuller, Z.L.; Haynes, G.D.; Zhu, D.; Batterton, M.; Chao, H.; Dugan, S.; Javaid, M.; Jayaseelan, J.C.; Lee, S.; Li, M.; et al. Evidence for stabilizing selection on codon usage in chromosomal rearrangements of Drosophila pseudoobscura. G3 2014, 4, 2433–2449.
  58. Stoletzki, N. The surprising negative correlation of gene length and optimal codon use—Disentangling translational selection from GC-biased gene conversion in yeast. BMC Evol. Biol. 2011, 11, 93.
  59. Nhung, N.P.; Thu, P.Q.; Chi, N.M.; Dell, B. Vegetative propagation of Dalbergia tonkinensis, a threatened, high-value tree species in South-east Asia. South For. A J. For. Sci. 2019, 81, 195–200.
  60. Quax, T.E.; Claassens, N.J.; Söll, D.; van der Oost, J. Codon Bias as a Means to Fine-Tune Gene Expression. Mol. Cell 2015, 59, 149–161.
Figure 1. Comparison of base composition between nuclear genome and chloroplast genome.
Figure 2. Codon usage parameter characteristics and correlation of codon usage indicators for protein-coding sequences. (A,B) are correlations of codon usage indicators (the darker the red, the stronger the negative correlation; the darker the blue, the stronger the positive correlation). (C,D) are percentages of GC content at different positions in the D. odorifera genome. Note: (A,C) chloroplast genome; (B,D) nuclear genome.
Figure 3. The use of synonymous codons and the factors influencing codon bias (A) Heatmap of synonymous codon usage (darker blue shades represent lower RSCU values, while darker red shades represent higher RSCU values). (B) ENC-plot analysis of nuclear genome and chloroplast genome. (The red curve is the expected ENC value versus GC3s.) (C) Analytical plot of the PR2-plot. (D) Neutral plot analysis of nuclear genome and chloroplast genome sequences. (The red diagonal line represents GC3 equals GC12, and if all points are on this line, it suggests that the codon usage pattern is mainly affected by mutations. Otherwise, it is influenced by natural selection.)
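The reference curve in an ENC-plot and the regression behind a neutrality plot can be sketched in Python as follows. The expected ENC uses the widely cited formula ENCexp = 2 + s + 29 / (s² + (1 − s)²), where s is GC3s. The relative deviation (ENCexp − ENCobs) / ENCexp and the plain least-squares slope are common choices in the literature and are assumptions here, not necessarily the exact procedure used to produce this figure. All function names are hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENC when codon usage is shaped only by GC3s composition."""
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)

def enc_ratio(enc_obs, gc3s):
    """Relative deviation of the observed ENC from the expected ENC.

    Values near 0 place a gene on the expected curve; larger positive
    values place it further below the curve.
    """
    enc_exp = expected_enc(gc3s)
    return (enc_exp - enc_obs) / enc_exp

def neutrality_slope(gc3, gc12):
    """Least-squares slope of GC12 (y) regressed on GC3 (x).

    A slope close to 1 is usually read as mutation-dominated codon usage,
    and a slope close to 0 as selection-dominated codon usage.
    """
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    return sxy / sxx
```

At GC3s = 0.5 the expected ENC is 60.5, close to the theoretical maximum of 61, and it falls to 31 at GC3s = 0 or 1. Genes lying well below the curve use fewer codons than base composition alone would predict, which is commonly interpreted as evidence of selection.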
Figure 4. Optimal codon results of 25 Dalbergia plants (the optimal codon is indicated in red).
Figure 5. The relationship between gene expression levels and ENC values of different tissues in D. odorifera. Four-tissue RPKM-ENC distribution heatmap: (A) leaf, (B) root, (C) seed, (D) stem. The redder the color, the greater the number of neighboring genes, and the bluer the color, the smaller the number of neighboring genes.
Figure 6. Gene expression under different codon bias at the sequence and codon levels. (A) Relationship between highly expressed genes and ENC value; (B) Relationship between medium expression genes and ENC value; (C) Relationship between low expression genes and ENC value; (D) Relationship between RPKM value and third codon base preference at the codon level (Note: In the same boxplot, different letters between groups indicate significant differences (a and b), while identical letters represent non-significant differences (a and a, or b and b).).
Figure 7. Phylogenetic trees and trend lines of their codon usage indicators (A) Phylogenetic tree based on chloroplast protein-coding sequences. (B) Phylogenetic tree based on the whole chloroplast genome.
Table 1. Codon usage indicators in protein-coding sequence of nuclear and chloroplast genome.
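| Indicators | Nuclear Genome: Mean ± SD | Nuclear Genome: Max. | Nuclear Genome: Min. | Chloroplast Genome: Mean ± SD | Chloroplast Genome: Max. | Chloroplast Genome: Min. |
| T3s | 0.390 ± 0.080 | 0.6854 | 0.0317 | 0.397 ± 0.095 | 0.5588 | 0.1684 |
| C3s | 0.267 ± 0.087 | 0.816 | 0 | 0.216 ± 0.066 | 0.3333 | 0.0649 |
| A3s | 0.311 ± 0.073 | 0.6026 | 0.0242 | 0.404 ± 0.076 | 0.5714 | 0.2079 |
| G3s | 0.283 ± 0.068 | 0.8381 | 0.0476 | 0.253 ± 0.093 | 0.487 | 0.1012 |
| ENC | 51.86 ± 4.410 | 61 | 25.17 | 49.9 ± 6.199 | 35.71 | 61 |
| GC3s | 0.432 ± 0.102 | 0.914 | 0.175 | 0.354 ± 0.108 | 0.567 | 0.176 |
| GC | 0.461 ± 0.050 | 0.695 | 0.259 | 0.390 ± 0.046 | 0.502 | 0.285 |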

Share and Cite

MDPI and ACS Style

Wang, Z.-K.; Liu, Y.; Zheng, H.-Y.; Tang, M.-Q.; Xie, S.-Q. Comparative Analysis of Codon Usage Patterns in Nuclear and Chloroplast Genome of Dalbergia (Fabaceae). Genes 2023, 14, 1110. https://doi.org/10.3390/genes14051110