Article

Exploring Molecular and Genetic Differences in Angelica biserrata Roots Under Environmental Changes

State Key Laboratory of Aridland Crop Science, College of Agronomy, Gansu Agricultural University, Lanzhou 730070, China
*
Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2025, 26(8), 3894; https://doi.org/10.3390/ijms26083894
Submission received: 21 February 2025 / Revised: 15 April 2025 / Accepted: 17 April 2025 / Published: 20 April 2025
(This article belongs to the Section Molecular Genetics and Genomics)

Abstract

The roots of Angelica biserrata (Shan et Yuan) Yuan et Shan (A. biserrata), a widely distributed medicinal crop with intraspecific diversity, exhibit significant variability in coumarin content across habitats. This study integrated metabolomics and transcriptomics to dissect the spatial heterogeneity in metabolite profiles and gene expression, revealing the mechanisms driving coumarin biosynthesis divergence. By synthesizing climate-related big data with machine learning and Bayesian-optimized deep learning models, we identified key environmental drivers and predicted optimal cultivation conditions. The key findings were as follows: (1) differences among growing regions most strongly affected coumarins; (2) upstream genes (such as PAL-1, PAL-2, and BGLU44) modulated downstream coumarin metabolites; (3) elevation (Elev) and warmest quarter temperature (Bio10) dominated coumarin variation, whereas May solar radiation (Srad5) and precipitation seasonality (Bio15) controlled transcriptomic reprogramming; (4) the optimized environment for bioactive compounds included mean annual temperature (Bio1) = 9.99 °C, annual precipitation (Bio12) = 1493 mm, Elev = 1728 m, cumulative solar radiation = 152,643 kJ·m−2·day−1, and soil organic carbon = 11,883 g·kg−1. This study aimed to clarify the biological characteristics and differential regulatory mechanisms of A. biserrata roots in different habitats, establish a theoretical framework for understanding the molecular mechanisms controlling metabolic changes across habitats, and contribute to elucidating the formation of active constituents while facilitating their effective utilization.

1. Introduction

Medicinal plants, as critical resources for healthcare and industrial applications, exhibit close correlations between their growth environments and medicinal quality, bioactive compound accumulation, and ecological adaptability [1]. In recent years, the impacts of environmental factors on the distribution, physiological metabolism, and pharmacologically active components of medicinal plants have emerged as a research focus in interdisciplinary fields spanning ecology, pharmacology, and agronomy, driven by global climate change, intensified environmental pollution, and the growing demand for plant-derived medicinal resources. Studies demonstrate that distinct growth environments significantly influence the growth cycles and secondary metabolite synthesis in medicinal plants. For example, Zhou et al. revealed that altitudinal gradients induced marked variations in the flavonoid metabolites of Agriophyllum squarrosum [2]. Similarly, Hosseini et al. reported that drought conditions significantly affected glycyrrhizin synthesis in licorice (Glycyrrhiza glabra L.) [3]. Secondary metabolites in medicinal plants serve not only as crucial products of environmental adaptation but also as key bioactive compounds. Consequently, investigating the relationship between secondary metabolites and ecological factors is essential for ecological cultivation and the quality control of medicinal plants from the source.
The root of Angelica biserrata (Shan et Yuan) Yuan et Shan (A. biserrata), a plant of the genus Angelica in the Apiaceae family, is utilized as the traditional Chinese medicine “Duhuo”, demonstrating significant medicinal and industrial value [4,5,6]. Contemporary pharmacological studies have further emphasized the potential benefits of A. biserrata roots in detoxification, wound healing, liver-soothing, wind-dispelling, and tranquilizing effects [7]. During the 20th century, numerous scholars documented the various chemical components in A. biserrata roots, with coumarins and volatile oil compounds constituting the primary constituents, alongside organic acids and sugars [8,9]. From the perspectives of traditional Chinese medicine pharmacology and plant chemistry, coumarins are considered the most crucial active ingredients [10].
A. biserrata roots are extensively distributed, and diverse ecological environments influence their distribution, morphology, physiological effects, and accumulation of secondary metabolites [5,11]. In a previous study utilizing a species distribution model, we found that A. biserrata roots exhibit strong ecological adaptability in China, with their suitable distribution area covering 40% of China’s land area. However, this has also resulted in the diversification of A. biserrata root varieties and has posed challenges for quality control. Current research on the environmental impacts on A. biserrata root quality predominantly focuses on the quantitative disparities in active compounds, whereas the biosynthetic mechanisms and gene expression underlying these variations remain understudied. Furthermore, the suitable growth environment has a significant effect on improving the content and quality of active compounds. Consequently, developing effective analytical methods to comprehensively investigate the ecological adaptation mechanisms of A. biserrata roots in various habitats and to identify the key environmental variables influencing these differences is crucial for ecological cultivation and quality control.
In recent years, multi-source data generated by various emerging technologies have been extensively utilized in the comprehensive evaluation of medicinal plant quality. Multi-omics approaches, which enable the deeper exploration of the potential differential characteristics of plants in diverse environments, have been widely studied in the quality evaluation of medicinal plants, thanks to the continuous advancements and cost reductions in high-throughput sequencing technologies [12]. These studies primarily concentrate on identifying molecular alterations in the genome, transcriptome, proteome, and metabolome [13]. Among these, the integrated analysis of transcriptomics and metabolomics not only overcomes the limitations of single-omics methods but also systematically and comprehensively elucidates the functions and regulatory mechanisms of molecules, and it is therefore more widely used in the quality evaluation of medicinal plants [14,15]. For instance, Bao et al. conducted a quality analysis of Euryales Semen originating from different sources and varieties by utilizing untargeted metabolomics [16]. Pathway analysis unveiled the pivotal role of flavonoids in the seed development process of Euryales Semen. The research findings suggested a strong similarity in metabolic data among Euryales Semen samples from diverse regions. Furthermore, Zhang et al. utilized untargeted metabolomics and transcriptomics to reveal disparities in metabolite accumulation and gene expression between wild and cultivated Ophiocordyceps sinensis [17]. The integrated analysis of metabolomics and transcriptomics indicated that the genes IMPDH, AK, ADSS, guaA, and GUK were potentially linked to the synthesis of purine nucleotides and nucleosides, providing a fresh perspective on the molecular underpinnings of metabolic variations in medicinal fungi.
With the continuous progress of computer algorithms, the combination of multi-source data and machine learning has become an important method to evaluate the influence of various environmental factors on the quality of medicinal plants [18]. Machine learning enables the assessment of feature importance for prediction outcomes, thereby enhancing the identification of key factors in datasets. This capability is particularly valuable for high-dimensional data, as it facilitates the elimination of irrelevant features, simplifies the model architecture, and improves interpretability [19,20]. For instance, Liu et al. applied machine learning to analyze the correlation between climate data and Panax notoginseng saponin content, revealing that saponin levels were negatively correlated with annual average temperature and annual temperature range. Lower annual average temperatures and reduced annual temperature ranges were shown to promote saponin accumulation [21].
Beyond identifying key environmental variables, deep learning within machine learning has been progressively applied to predict optimal environments due to its advantages in capturing nonlinear relationships between phenotypic traits and yields. For instance, Gharghory et al. proposed an enhanced architecture based on LSTM recurrent neural networks to forecast greenhouse microclimates [22]. Shi et al. integrated deep learning with the Sparrow search algorithm to predict greenhouse microclimates, thereby improving seedling environmental adaptability [23]. Among the various algorithms, deep neural network models (DNNs) are favored for their proficiency in processing large-scale data and time-series predictions [24]. While DNNs excel at capturing spatial features, their lack of explicit memory mechanisms for temporal dependencies renders them relatively weak in time-series modeling. Consequently, optimization algorithms play a pivotal role in enhancing model performance [25]. Commonly employed optimization methods include Particle Swarm Optimization (PSO), Genetic Algorithm (GA), and Bayesian Optimization (BO), among others [26]. Compared to PSO and GA, BO demonstrates superior global exploration capabilities [27]. By iteratively selecting the most promising parameter combinations for evaluation, BO effectively reduces the search space and enhances the optimization efficiency [28]. Cho et al. also analyzed four baseline strategies for DNN hyperparameter optimization in their study, revealing that BO consistently delivered top-tier or near-top-tier performance across all DNN benchmark tests [29].
Based on the aforementioned background, this study investigated A. biserrata roots from different habitats, employing transcriptomics and metabolomics to conduct in-depth analyses of variations in secondary metabolites and genetic information across habitats. Integrated with machine learning, we identified the key environmental variables driving these differences. Additionally, a quantitative model linking growth environments to bioactive molecules was established using deep learning and Bayesian Optimization. This research aimed to elucidate the biological characteristics and differential mechanisms of A. biserrata roots under diverse habitats, thereby providing a scientific basis and theoretical guidance for advancing studies on the interactions between medicinal plants and their ecological environments.

2. Results

2.1. Metabolic Differences and Similarities of Ecotype Samples

Through the quality control of internal standards and QC samples, the pretreatment, derivatization, sample loading, and mass spectrometry system stability of the experimental parts were analyzed and evaluated. As depicted in Figure S1, the PCA model graph obtained through 7-fold cross-validation shows that the QC samples cluster closely together, indicating good stability and reproducibility in this experiment. In this study, a total of 4982 metabolites were identified and analyzed based on The Human Metabolome Database (HMDB), Lipidmaps (v2.3), and the METLIN database. Among them, the 995 secondary metabolites mainly included coumarins and their derivatives, flavonoids, prenol lipids, and steroids and steroid derivatives (Table 1), of which prenol lipids accounted for the largest proportion. The prenol lipids in A. biserrata roots were mainly terpenoids and volatile oils.
To evaluate the diversity of secondary metabolites among the four ecotype samples, principal component analysis (PCA) was employed. PCA represents an unsupervised pattern recognition technique frequently utilized to visualize overarching clustering patterns among distinct groups and to assess the variability within the same set of samples [30]. The PCA results indicated that the biological replicates of the four samples clustered together in distinct regions, highlighting significant differences. PC1 and PC2 contributed 35.2% and 33.4%, respectively, to the sample separation (Figure 1a). To investigate the influence of different habitats on the metabolites of A. biserrata roots, an OPLS-DA model was established to discern the metabolic distinctions between each pair of the four distinct habitats. Based on the high predictability (Q2) of the OPLS-DA models, coupled with permutation tests for additional verification (Figure S2), models exhibiting high predictiveness and reliability were developed. As illustrated in Figure S3, the score plots of the six models exhibit a clear distinction between the various groups.
Through screening for differential metabolites, it was found that the differential metabolites (DAMs) among different varieties were primarily characterized by variations in their metabolite content, rather than their types. The screening results were visualized using Venn diagrams (Figure 1b) and a Volcano plot (Figure 1c). The Volcano plot reveals 313 DAMs between C and G (141 upregulated, 172 downregulated), 266 DAMs between C and H (142 upregulated, 122 downregulated), 286 DAMs between S and H (141 upregulated, 127 downregulated), 271 DAMs between G and C (163 upregulated, 108 downregulated), 325 DAMs between S and C (177 upregulated, 148 downregulated), and 288 DAMs between G and S (141 upregulated, 147 downregulated). These differential metabolites, representing a significant proportion of the organooxygen compounds (13.8–18.8%), prenol lipids (8.6–11.3%), coumarins and their derivatives (7.9–11.3%), carboxylic acids and their derivatives (7.9–10.3%), fatty acyls (8.9–11.7%), and flavonoids (5.9–8.9%), are detailed in Table 2. Among these, organooxygen compounds constituted the highest proportion in all comparison groups. The differences in secondary metabolites were primarily attributed to coumarin and its derivatives, as well as flavonoids. Furthermore, based on the Venn diagram, 58 shared DAMs were identified through pairwise comparisons, with cluster analysis indicating the predominant occurrence of coumarins and organic oxygen compounds, with coumarins notably comprising 22% (Figure S4).
KEGG enrichment analysis was conducted on the differential metabolites of each control group: C-VS-H, G-VS-C, G-VS-H, G-VS-S, S-VS-C, and S-VS-H, which were enriched in 51, 76, 49, 62, 63, and 62 pathways, respectively (Tables S3 and S4). The top 20 pathways with the lowest p-values were selected to construct a circular plot for KEGG enrichment analysis (Figure S5). The results indicated that the enriched pathways across all control groups primarily belonged to the categories of metabolism, environmental information processing, organismal systems, and human diseases. Among these categories, metabolism was the most enriched. In each control group, the most enriched differential metabolites were related to the biosynthesis of various plant secondary metabolites, followed by amino acid metabolism. Furthermore, the metabolites enriched in the biosynthesis pathways of secondary metabolites were primarily coumarins and derivatives.
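As an illustration of how a pathway enrichment p-value of this kind can be obtained, the following minimal sketch uses a one-sided hypergeometric test. This is an assumption for illustration: the text above does not state which test the enrichment tool applied, and the function name and arguments are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: annotated metabolites in the background
    K: background metabolites annotated to the pathway
    n: differential metabolites with an annotation
    k: differential metabolites annotated to the pathway
    (All argument names are illustrative, not from the study.)
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def top_pathways(p_values, limit=20):
    """Return the `limit` pathway names with the lowest p-values."""
    return sorted(p_values, key=p_values.get)[:limit]
```

Ranking pathways by the resulting p-values and keeping the 20 smallest corresponds to the selection used for the circular plots.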

2.2. Genetic Information Differences and Similarities of Ecotype Samples

This study generated a total of 81.24 Gb of clean data based on transcriptome sequencing, with an effective data volume ranging from 5.97 to 7.01 Gb per sample. The Q30 base distribution ranged from 93.81% to 94.67%, with an average GC content of 42.5% (Table S5). The length distribution of the annotated genes and the sample FPKM expression distribution range are shown in Figure S6a,c. Based on the PCA (Figure S6b), the biological replicates of the four samples exhibited significant differences across distinct regions. The clustering analysis of the sequencing samples demonstrated tight clustering of the three biological replicates in each group, indicating the high reproducibility of the transcriptome data (Figure S6d).
The numbers of differentially expressed genes (DEGs) identified in the differential comparisons of C-VS-H, G-VS-C, G-VS-H, G-VS-S, S-VS-C, and S-VS-H were 22,451, 15,782, 21,967, 21,503, 18,357, and 19,590, respectively, as illustrated in Figure S7. The expression profiles of the expressed genes in the four habitats were analyzed, and the results indicated that the gene types were essentially the same. GO enrichment analysis was employed to describe the functions of the DEGs in A. biserrata root samples from the four habitats. All DEGs were effectively annotated into the three functional categories of “cellular component”, “molecular function”, and “biological process” in the GO analysis. The top 30 enriched GO terms are shown in Figure S8. Through pairwise comparisons, it was observed that within the biological process category, the majority of DEGs were enriched in DNA integration and DNA recombination. In the cellular component category, the DEGs exhibited the highest enrichment in the integral component of the membrane and the nucleus. In the molecular function category, the DEGs were predominantly enriched in metal-ion-binding and ATP-binding activities. KEGG enrichment pathway analysis was performed on the DEGs, and all were successfully annotated into categories including “genetic information processing”, “metabolism”, “cellular processes”, and “environmental information processing” (Table S6). The 20 pathways with the lowest p-values were chosen to construct circular plots for the KEGG enrichment analysis (Figure 2). Notably, within the “metabolism” category, phenylpropanoid biosynthesis exhibited the smallest p-values among all comparison groups and demonstrated the highest enrichment based on the number of genes.
To further explore the overall differences in genes, we annotated them to transcription factors. A total of 1691 Unigenes were annotated as transcription factors, among which AP2/ERF-ERF, NAC, and WRKY were the transcription factors with a relatively large number of genes, with 123, 121, and 106 genes, respectively. Upon comparing the distribution of all Unigenes and differentially expressed Unigenes (upregulated and downregulated) in terms of their transcription factor profiles, it was observed that in each comparison group, the transcription factor with the highest number of differentially expressed genes was AP2/ERF-ERF. Specifically, in C-VS-H, there were 17 upregulated and 47 downregulated; in G-VS-C, 28 upregulated and 14 downregulated; in G-VS-H, 27 upregulated and 44 downregulated; in S-VS-C, 44 upregulated and 16 downregulated; in S-VS-G, 32 upregulated and 15 downregulated; and in S-VS-H, 28 upregulated and 34 downregulated (Figure S9).

2.3. Comprehensive Analysis of Metabolomics and Transcriptomics

To investigate the relationship between DEGs and DAMs in A. biserrata roots from different habitats, we conducted an integrated analysis. Spearman correlation coefficients were calculated among the top 30 DEGs and DAMs, and the distribution of the correlations was examined visually. As shown in Figure S10A, in pairwise comparisons across the four habitats, most DAMs were significantly correlated, either positively or negatively, with DEGs. Among all the comparison groups, the proportion of metabolites enriched in the synthesis pathway of secondary metabolites was the highest, and these secondary metabolites all belonged to the coumarin class of phenylpropanoids. Co-enrichment pathway analysis was further conducted on DAMs and DEGs, visualizing the top 30 pathways (Figure S10B). Many pathways related to carbohydrates and amino acids exhibited significant differences. These included carbohydrate metabolism pathways, such as pentose and glucuronate interconversions, starch and sucrose metabolism, and galactose metabolism, and amino acid metabolism pathways, such as cyanoamino acid metabolism and histidine metabolism.
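For readers unfamiliar with rank-based correlation, the sketch below shows one way a Spearman coefficient between a gene's expression values and a metabolite's abundances could be computed: both vectors are converted to average ranks (ties share the mean rank) and a Pearson correlation is taken on the ranks. The function names are hypothetical, and the study's own calculation was presumably done with standard statistical software.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, any monotonic relationship between a DEG and a DAM yields a coefficient of +1 or −1, regardless of scale.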

2.4. Analysis of the Synthetic Pathway of the Main Active Compound: Coumarins

From the perspective of the synthetic pathway, it was mainly the upstream gene that regulated the downstream coumarin difference (Figure 3a). These genes included PAL-1 and PAL-2 which were associated with the synthesis of trans-cinnamate from L-phenylalanine; BGLU44, ANIA_01804, XYL4, BXL1, BGLU18, and ANIA_02828 with the synthesis of cis-2-Hydroxycinnamate from beta-D-Glucosyl-2-coumarinate; and C4H1, 4CL-1, and 4CL-2 with the synthesis of various coumarins and their derivatives from trans-cinnamate.
To verify the reliability of the transcriptome data, we used qRT-PCR to determine the expression levels of seven genes involved in coumarin metabolism (Figure 3b). The expression patterns of most genes in all comparison groups were similar to those obtained in the RNA-Seq analysis. Therefore, the RNA-Seq results have high reproducibility and reliability, which supports further study of the key genes of coumarin accumulation in A. biserrata roots. In addition, the RNA-Seq and qRT-PCR results showed that the data could be used to evaluate the upregulation and downregulation of gene expression. Pairwise comparison revealed that in the C-VS-H comparison group, fifteen genes were downregulated and two were upregulated. In the G-VS-C group, seven genes were upregulated and ten were downregulated. For G-VS-H, fifteen genes were downregulated and two were upregulated. In S-VS-C, five genes were upregulated and twelve were downregulated. In S-VS-G, eight genes were upregulated and nine were downregulated. Finally, in S-VS-H, fourteen genes were upregulated and three were downregulated (Table 3). The differential changes in metabolites were primarily observed in derivatives of the secondary metabolite coumarin, such as osthenol and psoralen.

2.5. Interaction Between Environment and Global Transcriptome as Well as Metabolome

This study used R to conduct Random Forest (RF) analyses on 38 environmental variables and the PC1 feature axes derived from the two omics datasets. As shown in Figure 4, most of the environmental variables were significant in the importance ranking for the metabolomics data. This may be attributed to the intricate nature of transcriptomics data, as genes may have shown diverse responses to the environment. Further analysis showed that the most important and significant environmental variable in the ranking for the metabolomics data was solar radiation in September, and that the precipitation factors (Bio12–Bio19) and solar radiation factors (Srad1–12) ranked higher than the temperature factors (Bio1–11). For the transcriptome, the most important environmental variables were temperature annual range (Bio7) and solar radiation in September.

2.6. Interaction and Interaction Network Between Environment, Coumarin Metabolites, and Genes

Through the interaction network diagram, it was found that osthenol has a significant negative correlation with Bio1, Bio5, Bio6, Bio8, Bio9, Bio10, and Bio11, and a positive correlation with Elev (Figure 5a). The relationship between environmental factors and coumarin metabolites, as well as the genes related to their synthesis, was quite complex, making it difficult to assess the key environmental variables that influenced coumarin differences (Figure S11). Therefore, this study further evaluated the key environmental variables affecting these differences using machine learning and statistical analysis methods.
In this study, the ‘FactoMineR’ package in R was used for multiple factor analysis (MFA), and the differential coumarin metabolites, their related transcriptome gene expression, environmental factors, and origin information were organized into eight datasets, in which the environmental factors were divided into five categories (Figure S12). By balancing the influence of each group of variables, the overall relationship among the groups was analyzed. The position of each variable set in the ordination plot indicates its correlation with, and relative contribution to, the MFA dimensions. The results showed that the metabolites contributed the most to Dim1, followed by the temperature variables, and that the rainfall variables contributed the most to Dim2 (Figure 5b). Figure 5c shows the contributions of the top 20 variables to Dim1 and Dim2. Among them, altitude had the highest contribution to Dim1, and the soil factor (S_oc) had the highest contribution to Dim2. Spearman correlation analysis of the different datasets found that metabolites were significantly positively correlated with altitude and genes, and that genes had the highest correlation with altitude (Figure 5d).
Considering the comprehensiveness and complexity of ecological factors, we used R v4.3.1 to calculate the Spearman correlation coefficients among the environmental variables. Combined with the results of MFA, among variables with absolute correlation coefficients exceeding 0.8, only the variable with the highest contribution was retained, and finally, 11 environmental variables were selected. The ‘vegan’ package in R was used to perform RDA on the 11 environmental variables together with the coumarin-related differential metabolites and differential key genes. The results showed that the four environmental variables with the highest degree of interpretation and significance for genes were Elev, Bio10, Srad5, and Bio15. The four environmental variables with the highest degree of interpretation and significance for metabolites were Srad8, Bio10, Elev, and Bio1 (Figure S13a). VPA was used to evaluate the explanatory ratio of environmental factors to gene and metabolite changes. The results showed that Elev had the highest explanatory ratio of 16.51% for metabolites, followed by Bio10 with 5.88%. For genes, Srad5 had the highest explanatory ratio of 51.08%, followed by Bio15 with 50.50% (Figure S13b). Cross-validation of the random forest analysis showed that selecting two important variables from the four environmental variables gave the ideal regression results with minimized error. The importance ranking showed that Elev and Bio10 had the most important effect on the content of coumarin metabolites, and that Srad5 and Bio15 were the most important for the expression of genes related to coumarin synthesis, which was consistent with the results of VPA (Figure 5e). The marginal effects of the important environmental variables on metabolites and gene expression can be seen in the partial dependence plots. If a variable has little effect on the result, its partial dependence plot should be a horizontal line. The plots show that the responses of coumarin metabolites and gene expression differ significantly across the environmental variables (Figure S14).
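The collinearity filter described above (retain only the highest-contribution variable among variables whose correlation exceeds 0.8 in absolute value) can be sketched as a greedy procedure. This is one plausible reading of that step, not the authors' code; the function name, variable names, and all numeric values below are hypothetical toy inputs.

```python
def prune_collinear(contribution, pair_corr, threshold=0.8):
    """Greedy collinearity filter.

    Visit variables from highest to lowest contribution and keep a variable
    only if its absolute correlation with every already-kept variable does
    not exceed `threshold`.
    """
    kept = []
    for name in sorted(contribution, key=contribution.get, reverse=True):
        if all(abs(pair_corr[frozenset((name, other))]) <= threshold
               for other in kept):
            kept.append(name)
    return kept


# Toy inputs for illustration only (not taken from the study).
contribution = {"elevation": 0.30, "temp_warm": 0.25,
                "temp_mean": 0.20, "rainfall": 0.10}
pair_corr = {
    frozenset(("elevation", "temp_warm")): -0.90,
    frozenset(("elevation", "temp_mean")): -0.85,
    frozenset(("temp_warm", "temp_mean")): 0.95,
    frozenset(("elevation", "rainfall")): 0.20,
    frozenset(("temp_warm", "rainfall")): 0.10,
    frozenset(("temp_mean", "rainfall")): 0.15,
}
selected = prune_collinear(contribution, pair_corr)
```

With these toy inputs, the two temperature variables are dropped because each correlates with the higher-contribution elevation variable beyond the threshold, leaving elevation and rainfall.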

2.7. Optimal Environment for Active Compound Accumulation

This study employed Bayesian Optimization in conjunction with deep neural networks to identify the optimal environmental conditions for the active compounds in A. biserrata roots, thereby offering a valuable reference for the ecological cultivation of Chinese medicinal herbs. The MSE for the test set was 0.0043, with a training value of 0.0025. The RMSE for the test set was 0.0654, with a training value of 0.0495. Additionally, the MAE for the test set was 0.05, with a training value of 0.0388. The R² value was 0.9772, which was close to 1 (Table 4). Figure 6a illustrates the change in the loss function over training epochs, exhibiting a general trend of an initial decrease followed by stabilization. Figure 6b presents the relationship between the actual and predicted values, where all data points lie approximately along a straight line. Figure 6c is a residual plot showing that the residuals are randomly distributed around zero, with no discernible patterns or trends. The optimal values obtained were as follows: Bio1 of 9.9878302 °C, Bio12 of 1493 mm, Elev of 1728 m, SradSUM of 152,643 kJ∙m−2∙day−1, and S_oc (soil organic carbon) content of 11,883 g·kg−1.
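The four evaluation metrics reported above follow standard definitions, which the minimal sketch below computes from paired actual and predicted values. The function name and inputs are hypothetical; the sketch only illustrates the metric definitions and does not reproduce the study's model or data.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R2 for paired actual/predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        "R2": 1 - ss_res / ss_tot,
    }
```

Since RMSE is the square root of MSE, the reported pairs can be cross-checked: the square roots of 0.0043 and 0.0025 are approximately 0.066 and 0.050, in line with the reported RMSE values of 0.0654 and 0.0495 after rounding.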

3. Discussion

3.1. Differences Between Metabolites and Genetic Information of Ecotype A. biserrata Roots

A comparison of metabolite and gene expression differences among the four ecotype samples revealed that the samples differed only in content and not in the types present, suggesting that the different habitats altered only the abundance of metabolites and transcripts. KEGG enrichment analysis, based on the screened differential metabolites and genes, revealed that the most abundant and significant differential metabolite enrichment pathway was secondary metabolite synthesis. Of particular note was the observation that all of these secondary metabolites were coumarin analogs. The most abundant and significant differential gene enrichment pathway was found to be phenylpropanoid synthesis. In many medicinal plants, the phenylpropanoid pathway serves as a pivotal biosynthetic route for critical secondary metabolites, such as coumarins. These metabolites, derived from the phenylpropanoid pathway, not only act as indicators of plant responses to environmental stressors but also function as key mediators in plant pathogen resistance [31]. Studies have demonstrated that variations in growth environments significantly influence the types, concentrations, and biosynthetic pathways of secondary metabolites in medicinal plants [32]. As essential bioactive compounds and adaptive products to environmental conditions, the synthesis and accumulation of secondary metabolites in medicinal plants are closely associated with ecological factors [33]. For example, Du et al. reported that the differential metabolites in Eucommia ulmoides cultivated across distinct regions predominantly originated from flavonoid compounds produced via the phenylpropanoid pathway, with flavonoids constituting its primary bioactive constituents [34]. Similarly, Zhang et al. revealed that environmental disparities in the growth habitats of F. dibotrys were primarily reflected in variations in the abundance of phenolic and flavonoid compounds [35].
In the present study, the biosynthetic process of coumarin was illuminated through an integrated analysis of transcriptomics and metabolomics. The results obtained demonstrated significant disparities between the upstream gene and the downstream coumarin metabolites. This observation suggested the possibility of regulatory influence by the upstream gene on the downstream metabolic differences. Dong et al. demonstrated that in Angelica sinensis (a species of the genus Angelica), when the expression levels of genes related to the biosynthesis of its active compounds—ferulic acid (PAL1, 4CLL4, 4CLL9, C3H, HCT, CCOAOMT, and CCR) and flavonoids (CHS and CHI)—were increased under varying temperatures, the concentrations of these metabolites were simultaneously elevated [36]. This further indicated that environmental factors regulated the accumulation of downstream metabolites through upstream genes. The upstream genes included PAL-1, PAL-2, BGLU44, ANIA_01804, XYL4, BXL1, ANIA_02828, C4H1, 4CL-1, F6H1-3, and 4CL-2. Han et al. reported that the biosynthetic pathway of simple coumarins involved 10 gene families, comprising C4H, C2’H, C3H, 4CL, C3’H, CCoAOMT, COMT, COSY, F6’H, and HCT genes. Most annotated gene fragments in this study were classified into these gene families [37]. To validate the reliability of the transcriptome data, we employed qRT-PCR to determine the expression levels of seven genes involved in coumarin metabolism. The expression patterns of most genes across all control groups showed consistency with those obtained from RNA-Seq analysis. Consequently, our RNA-Seq results demonstrated high reproducibility and reliability, providing valuable insights into the coumarin biosynthetic pathway.
An analysis of coumarin metabolite levels in A. biserrata roots from different regions showed that the coumarin content in samples from Hubei was significantly higher than that from other regions, whereas the content in samples from Shaanxi was the lowest. Han et al. revealed that there were considerable variations in the mass fractions of indicator components in A. biserrata roots from different regions, ranked, from high to low, as Wushan, Hubei, Wuxi, Sichuan, Shaanxi, and Gansu [38]. The results of this study were largely consistent with theirs, suggesting that the environments of regions such as Hubei and the Chuan Yu area (Sichuan–Chongqing) elicit a stronger response that facilitates the accumulation of coumarin components in A. biserrata roots.

3.2. Key Environmental Variables Affecting A. biserrata Root Active Substance Accumulation and Gene Expression

Environmental factors exert a significant impact on the synthesis and gene expression of plant secondary metabolites [39]. In this study, environmental variables were found to have some effect on the overall metabolite and transcriptional PC1 feature axes. However, it was not possible to identify the key environmental variables. This may be attributed to the complexity of the entire dataset and the interactions among environmental variables. Previous studies have demonstrated that environmental variables are complex and dynamic, and their impact on the accumulation of plant metabolites constitutes a comprehensive process [40]. In nature, the plant stress response represents a highly coordinated signaling event that has evolved over thousands of years [41]. The influence of environmental factors on plants is never mediated solely by a single factor [34]. Climate change will induce metabolic alterations in plants and modify the accumulation patterns of primary and secondary metabolites [42]. For instance, barley grown under a single stress factor can adapt and enhance its resistance, whereas the combination of climatic factors mitigates this effect [43]. Likewise, tomato (Solanum lycopersicum) plants exhibit superior performance when grown at elevated temperatures and under higher light conditions compared to when grown under standard conditions [44].
Using machine learning, statistical analysis, and interaction networks, we further analyzed the key environmental variables influencing the differences in coumarin metabolites and gene expression. The final analysis revealed that elevation (Elev) and the mean temperature of the warmest quarter (Bio10) were the key variables influencing differences in coumarin metabolites, whereas solar radiation in May (Srad5) and precipitation seasonality (Bio15) were crucial for gene expression. Altitude serves as a primary regulatory factor influencing growth, development, and the accumulation of active substances [45]. Prior research has demonstrated that altitude exerts a significant influence on terpenoids, phenylpropanoids, fatty acid biosynthesis, and flavonoid biosynthesis [46,47,48,49]. Optimal altitude conditions impact quality by altering the distribution pattern of photosynthetic products and the rate of dry matter accumulation [50].
Solar radiation in May (Srad5) had a significant effect on the expression of genes associated with coumarin synthesis in A. biserrata roots, which could be attributed to May being the flowering period of A. biserrata [51]. Increasing elevation leads to a substantial increase in ultraviolet (UV) radiation intensity, particularly within the UV-A and UV-B spectral bands, and coumarin biosynthesis is highly sensitive to UV exposure. Research indicates that variations in UV-A and UV-B radiation activate the expression of genes encoding key enzymes in phenylpropanoid biosynthesis [52]. Escobar et al. demonstrated that prolonged UV-B exposure triggered divergent expression patterns of phenylpropanoid biosynthetic genes between two Vaccinium corymbosum cultivars, accompanied by distinct disparities in total phenolic compounds and flavonoid accumulation [53]. Similarly, Lei et al. revealed that high-altitude adaptation in Draba oreades Schrenk enhanced resilience to intensified UV radiation and concurrent low-temperature stress, thereby driving dynamic shifts in phenylpropanoid and flavonoid metabolite profiles [54].

3.3. Deep Learning Predicts the Optimal Suitable Environment for Active Substances of A. biserrata Roots

The active compounds in medicinal plants constitute the cornerstone of their therapeutic efficacy, and the environment exerts a substantial influence on the content and quality of these compounds. Optimizing the growth conditions of medicinal plants can augment the concentration and quality of their bioactive compounds. In this study, a combination of Bayesian Optimization and deep neural networks was employed to simulate and predict the optimal environment for the bioactive compounds of A. biserrata roots. The model performed well on all evaluation metrics, including MSE, RMSE, MAE, and R2. Notably, the R2 value approached 1, signifying a close fit of the model to the data, which nearly fully explained the variance therein. Concurrently, the gradual decrease in the loss function across training epochs indicated that the model was learning and enhancing its predictive capabilities, and its eventual stabilization suggested that the model had converged [55]. The high consistency between the actual and predicted values demonstrated the model’s accuracy in predicting the growth environment of A. biserrata roots. The random distribution of residuals, devoid of any discernible patterns or trends, typically indicates that the model is appropriate and free from significant systematic errors. Specifically, the predicted optimal growth conditions for the bioactive compounds of A. biserrata roots were as follows: Bio1 (annual mean temperature) of 9.99 °C, Bio12 (annual precipitation) of 1493 mm, Elev (elevation) of 1728 m, SradSUM (total solar radiation) of 152,643 kJ∙m−2∙day−1, and S_oc (soil organic carbon) of 11,883.
The relevant literature indicates that A. biserrata roots thrive in cool, humid environments, are cold-hardy, and have an optimal growth temperature range of 15–25 °C [56]. However, some studies suggest more vigorous growth at 8–12 °C. Although the simulated temperature is below the optimal range reported in the literature, it remains within the growth range of A. biserrata roots. Lower temperatures reduce the growth rate while potentially inducing environmental stress signals, which activate secondary metabolic pathways, thereby enhancing the accumulation of secondary metabolites [57]. A. biserrata roots adapt to a wide range of annual precipitation, from 600 to 1500 mm. High precipitation maintains soil moisture, fostering growth and active compound accumulation. A. biserrata roots are predominantly found in alpine regions, at altitudes of 1000–2600 m [58]. The predicted altitude aligns with this growth range. High-altitude conditions, including lower temperatures, higher humidity, and fertile soil, favor the growth of A. biserrata roots. Although the literature does not directly address total solar radiation, the shade-loving nature of A. biserrata suggests that excessive radiation may be harmful, whereas moderate radiation is crucial for photosynthesis. Soil organic carbon is a crucial indicator for assessing soil fertility. A. biserrata roots thrive in fertile, loose soil rich in organic matter [59]. According to relevant research findings, the soil organic carbon content predicted in this study is highly beneficial for the growth of A. biserrata roots, promoting root development and nutrient absorption, thereby facilitating growth and the accumulation of active compounds. In summary, the results of this study are generally consistent with the actual conditions of high-quality A. biserrata root cultivation, indirectly supporting the accuracy of the model constructed in this study.
Of course, actual cultivation still requires the fine-tuning of the planting area in conjunction with local conditions and the growth habits of A. biserrata roots to improve its yield and quality.

4. Materials and Methods

4.1. Preparation of A. biserrata Root Materials

Fresh, high-quality samples of A. biserrata roots from four distinct regions were selected, without the utilization of genetically modified organisms (GMOs) or their derivatives, nor the application of chemically synthesized insecticides, fertilizers, growth regulators, or other chemical substances. Farmhouse manure was applied as the standard practice, with a harvesting cycle of two years, and the harvest occurred in October. The collection sites were located away from urban areas, industrial and mining zones, sources of industrial pollution, and domestic waste disposal facilities, encompassing the primary areas of artificial cultivation for A. biserrata roots. At each sampling site, three replicate root samples were randomly collected, placed in sterile plastic bags, maintained on ice, and transported immediately to the laboratory. The harvested roots were rinsed with distilled water to remove surface contaminants and subsequently blot-dried with filter paper. All samples were flash-frozen in liquid nitrogen and stored at −80 °C to ensure the preservation of biomolecular integrity for subsequent transcriptomic and metabolomic analyses. The details of the collection sites are presented in Figure 7.

4.2. Transcriptome Analysis

This study utilized Illumina NovaSeq 2000 sequencing technology to explore the variations in the gene expression of A. biserrata roots across diverse habitats. After the Unigenes were annotated, the sequence reads were aligned to them using the Bowtie2 V2.5.2 software. Subsequently, the expression levels of the Unigenes were quantified using the eXpress software, yielding FPKM values [60]. Fold changes in expression were computed using DESeq2 V3.11 software, and differential expression analysis was conducted using the negative binomial (NB) distribution test. Differentially expressed genes were identified using the criteria of a q-value of less than 0.05 and a fold change greater than 2. The Unigenes were mapped onto the Kyoto Encyclopedia of Genes and Genomes (KEGG) to annotate their biological pathways. Gene Ontology (GO) term assignment was performed by aligning the Unigenes with entries in the Swiss-Prot database and with the corresponding GO terms [61]. For detailed procedures and methods, please refer to the Supplementary Materials.
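The quantification and screening steps described above can be illustrated with a minimal sketch. It assumes the standard FPKM definition and the thresholds stated in the text; the unigene names and values are hypothetical, and treating down-regulation symmetrically (fold change below 1/2) is an assumption, since the text states only "greater than 2". It does not reproduce the eXpress or DESeq2 computations themselves.

```python
import math

def fpkm(fragments: int, length_bp: int, total_mapped: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped)

def is_deg(fold_change: float, q_value: float,
           fc_cutoff: float = 2.0, q_cutoff: float = 0.05) -> bool:
    """Apply q < 0.05 and fold change > 2.

    Assumption: down-regulated genes (fold change < 1/fc_cutoff) also count,
    so the test is made on the absolute log2 fold change.
    """
    return q_value < q_cutoff and abs(math.log2(fold_change)) > math.log2(fc_cutoff)

# Hypothetical results table: (unigene, fold change, q-value)
results = [
    ("unigene_a", 3.2, 0.001),
    ("unigene_b", 1.4, 0.010),
    ("unigene_c", 0.3, 0.030),
    ("unigene_d", 5.0, 0.200),
]
degs = [name for name, fc, q in results if is_deg(fc, q)]

example_fpkm = fpkm(fragments=500, length_bp=2000, total_mapped=20_000_000)
print(degs, example_fpkm)
```

Working on the log2 scale keeps the up- and down-regulation cutoffs symmetric, which is a common convention rather than something the text specifies.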

4.3. Metabolite Determination and Analysis

This study utilized liquid chromatography–tandem mass spectrometry (LC-MS/MS) to quantify differential metabolite profiles. A combination of multivariate and univariate analytical approaches was employed to screen for differential metabolites between groups. In the OPLS-DA and PLS-DA analyses, Variable Importance in Projection (VIP) scores were used to evaluate the influence and explanatory power of each metabolite's expression pattern in distinguishing between sample groups. This aided in the identification of biologically significant differential metabolites. Subsequent t-tests were conducted to determine the statistical significance of the differential metabolites between groups. The selection criteria were a p-value less than 0.05 and a VIP score greater than 1. The identified differential metabolites were subsequently analyzed for metabolic pathway enrichment using the KEGG database V114.0 [34,62].
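As an illustration of the two-part screening rule (VIP > 1 and p < 0.05), the sketch below combines a precomputed VIP score with a two-sample t statistic. Several points are assumptions rather than details from the text: the pooled-variance (Student) form of the t-test, the use of three replicates per group (so that 4 degrees of freedom apply), and all metabolite names and intensities, which are hypothetical. With 4 degrees of freedom, a two-sided p < 0.05 is equivalent to |t| > 2.776, which avoids the need for a t-distribution routine.

```python
from math import sqrt
from statistics import mean, variance

# Two-sided 5% critical value of Student's t for 4 degrees of freedom
# (three replicates per group; an assumption of this sketch).
T_CRIT_DF4 = 2.776

def pooled_t(a, b):
    """Two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(sp2 * (1 / na + 1 / nb))

def is_differential(vip, a, b, vip_cutoff=1.0, t_crit=T_CRIT_DF4):
    """VIP > 1 and p < 0.05, the latter expressed as |t| > critical value."""
    return vip > vip_cutoff and abs(pooled_t(a, b)) > t_crit

# Hypothetical data: name -> (VIP, intensities in group 1, intensities in group 2)
metabolites = {
    "metabolite_a": (1.8, [10.1, 10.4, 9.8], [6.0, 6.3, 5.9]),
    "metabolite_b": (0.7, [10.1, 10.4, 9.8], [6.0, 6.3, 5.9]),
    "metabolite_c": (1.5, [5.0, 5.6, 4.4], [5.2, 4.7, 5.5]),
}
selected = [name for name, (vip, g1, g2) in metabolites.items()
            if is_differential(vip, g1, g2)]
print(selected)
```

Here `metabolite_b` fails on VIP alone and `metabolite_c` fails on the t-test, which shows why both criteria are applied together.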

4.4. Quantitative Real-Time Polymerase Chain Reaction (qRT-PCR)

To validate the RNA-seq data, eight differentially expressed genes (DEGs) involved in coumarin metabolism were selected for quantitative real-time PCR (qRT-PCR) analysis. The primer sequences were designed in-house and synthesized by Qingke Biotech (Beijing Tsingke Biotech Co., Ltd., Beijing, China) based on the mRNA sequences retrieved from the National Center for Biotechnology Information (NCBI) database, as detailed in Table S1. The Actin (ACT) gene served as a reference control, with the forward primer sequence of TGGTATTGTGCTGGATTCTGGT and the reverse primer sequence of TGGATCACCACCAGCAAGG producing an amplicon size of 109 base pairs (bp). Relative changes in gene expression levels were then determined using the 2^(−ΔΔCt) method [63].
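For readers unfamiliar with the relative quantification step, the following minimal sketch computes the 2^(−ΔΔCt) value from four Ct measurements. The Ct values are hypothetical, and the function and parameter names are illustrative; the reference gene corresponds to ACT in the text.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression by the 2^(-ddCt) method.

    dCt  = Ct(target) - Ct(reference gene), computed per condition
    ddCt = dCt(sample) - dCt(control)
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_sample - delta_control)

# Hypothetical Ct values: the target amplifies two cycles earlier in the
# sample than in the control, relative to the reference gene.
fold = relative_expression(22.0, 18.0, 24.0, 18.0)
print(fold)
```

A value above 1 indicates higher expression in the sample than in the control, and a value below 1 indicates lower expression.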

4.5. Environmental Variables

Given the comprehensive and complex nature of ecological factors, we selected 38 environmental variables to explore the mechanism by which the environment influenced the accumulation of metabolites in A. biserrata roots [64] (Table S2). These variables encompassed climate data, soil data, and terrain data. The climate data consisted of solar radiation and 19 bioclimatic factors, collected monthly from January to December. The data were sourced from the WorldClim V2.1 database (http://www.worldclim.org, accessed on 15 October 2023), with a spatial resolution of 2.5 km × 2.5 km. Six soil factors, including soil types and various physical and chemical properties, were selected from the World Soil Database (https://www.fao.org/home/en/, accessed on 15 October 2023) with a spatial resolution of 30 m. Using ArcGIS 10.8, environmental variable values from the 38 factors were extracted to the specific sampling points to obtain relevant information on the environmental conditions.

4.6. Deep Learning and Statistical Analysis

Based on R software version 4.3.1, we utilized two machine learning approaches, namely random forest and Multiple Factor Analysis (MFA), in conjunction with Redundancy Analysis (RDA), Spearman correlation analysis, and Variance Partitioning Analysis (VPA), to identify the pivotal environmental variables that contribute to the significant disparities in active substances and their corresponding gene expression. The environment–genetic–metabolite network was visualized utilizing R software version 4.3.1 in conjunction with Cytoscape version 3.5 [65].
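Of the methods listed above, Spearman correlation is the simplest to show in a few lines. The sketch below is a plain-Python illustration (the analysis in the text was run in R): it ranks both variables, assigning average ranks to ties, and then computes the Pearson correlation of the ranks. The elevation and content values are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical values for four sampling sites
elevation_m = [800, 1200, 1500, 1900]
coumarin_content = [2.1, 2.9, 3.4, 5.0]
rho = spearman(elevation_m, coumarin_content)
print(rho)
```

Because it uses only ranks, the statistic captures monotonic but non-linear relationships between an environmental variable and a metabolite or transcript level.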
To identify the optimal cultivation environment, we concentrated on maximizing the content of osthole, a representative coumarin derived from A. biserrata roots, employing deep neural networks (DNNs) coupled with Bayesian Optimization for prediction. The DNNs and Bayesian Optimization were implemented in Python 3.12, with Keras and scikit-optimize employed for the construction, training, and evaluation of the neural network. The dataset was randomly split into three subsets: 80% of the data were used for training, 10% for validation, and 10% for testing. The transfer functions for the hidden and output layers were the hyperbolic tangent sigmoid function (tansig) and the linear function (in), respectively. Throughout the construction process, we ensured data quality by verifying the absence of missing values, converting pertinent dataset columns to floating-point types to facilitate mathematical calculations, and augmenting the dataset by replicating the original data and introducing random perturbations to enhance data diversity and model generalization. The data were scaled to the [0, 1] range utilizing MinMaxScaler V1.6.0, a widely used data preprocessing technique for deep learning models that aids in accelerating model convergence [66]. BayesSearchCV V0.8.1 was employed for the Bayesian Optimization of hyperparameters.
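The preprocessing described above (augmentation by replication with random perturbation, scaling to [0, 1], and an 80/10/10 split) can be sketched in plain Python as follows. This is an illustration only: the perturbation form (uniform multiplicative noise of ±2%), the number of replicates, the order of operations, the random seed, and the example rows are all assumptions, and the study itself used MinMaxScaler rather than the hand-written scaling shown here.

```python
import random

def min_max_scale(column):
    """Scale one column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def augment(rows, copies, noise, rng):
    """Keep the originals and add `copies` perturbed replicates of each row."""
    out = [list(r) for r in rows]
    for _ in range(copies):
        for row in rows:
            out.append([v * (1 + rng.uniform(-noise, noise)) for v in row])
    return out

def split_80_10_10(rows, rng):
    """Shuffle, then split into training, validation, and test subsets."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical rows: [Bio1 (°C), Bio12 (mm), Elev (m)]
rows = [[9.5, 1400.0, 1700.0], [12.0, 1100.0, 1200.0],
        [8.0, 900.0, 2100.0], [14.0, 1300.0, 900.0]]
augmented = augment(rows, copies=24, noise=0.02, rng=rng)  # 4 + 96 = 100 rows
columns = [min_max_scale(list(col)) for col in zip(*augmented)]
scaled = [list(r) for r in zip(*columns)]
train, val, held_out = split_80_10_10(scaled, rng)
print(len(train), len(val), len(held_out))
```

One design consideration worth noting: when replicates of the same original row fall into both the training and test subsets, test error can look better than it would on genuinely new sites, so splitting before augmenting and fitting the scaler on the training subset only is the more conservative arrangement.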
The procedure was conducted 300 times to assess the model’s accuracy, employing metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination (R2). Additionally, visual aids, including plots of the predicted versus actual values, residual plots, and graphs depicting the variation in the loss function across training epochs, were utilized. To evaluate the suitability of the overall environment, five key environmental variables were selected: Bio1 (annual mean temperature), Bio12 (annual precipitation), Elev (elevation), SradSUM (total solar radiation), and S_oc (soil organic carbon). These variables were employed to predict the optimal environmental conditions.
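The four evaluation metrics named above follow directly from their standard definitions. The sketch below computes them for a pair of actual and predicted value lists; the numbers are hypothetical and are not results from this study.

```python
from math import sqrt

def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE, and R2 for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_actual = sum(actual) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)   # total sum of squares
    return {"MSE": mse, "RMSE": sqrt(mse), "MAE": mae, "R2": 1 - ss_res / ss_tot}

# Hypothetical values with one poorly predicted sample
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

R2 compares the residual error with the variance of the actual values, so a single large miss lowers it sharply even when MAE stays small.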

5. Conclusions

This study aimed to address the challenges posed by the extensive distribution and environmental variability of medicinal plants in achieving standardized quality control and maintaining genetic diversity. Using the widely distributed A. biserrata roots as a model, we integrated multi-omics approaches, computational ecology, and environmental big data to investigate the mechanisms by which ecological factors regulate their bioactive molecules. The results demonstrated that environmental variables significantly influenced both plant growth and the accumulation of bioactive compounds, with the coumarin biosynthesis pathway exhibiting the most pronounced metabolic divergence. By reconstructing the coumarin biosynthetic pathway, we identified the upstream genes (e.g., PAL-1, PAL-2, BGLU44) that governed the differential accumulation of downstream metabolites. Key environmental drivers included elevation (Elev) and the mean temperature of the warmest quarter (Bio10), which modulated coumarin variation, while solar radiation in May (Srad5) and precipitation seasonality (Bio15) predominantly affected gene expression. Furthermore, a quantitative model linking growth environments to bioactive molecule profiles was established, with the optimal parameters defined as follows: mean annual temperature (Bio1) = 9.28 °C, annual precipitation (Bio12) = 1483.66 mm, elevation = 1765.08 m, total solar radiation (SradSUM) = 152,833.23 kJ/m2/d, and soil organic carbon (S_oc) = 11,873.95 mg/kg. This work provides a scientific foundation for the ecological cultivation and sustainable utilization of A. biserrata roots, while highlighting the critical role of ecological strategies in medicinal plant conservation. Our findings propose novel interdisciplinary approaches bridging ecology, agriculture, and phytomedicine for optimized resource management.

Supplementary Materials

The supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms26083894/s1.

Author Contributions

C.H.: Conceptualization, Data curation, Investigation, Methodology, Writing—original draft. Q.L.: Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Validation, Visualization, Writing–review and editing. X.D.: Formal analysis. K.J.: Writing–review and editing. W.L.: Writing–review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Longyuan Youth Innovation and Entrepreneurship Talent Project (GSRC-2023-1-4), the National Natural Science Foundation of China (31860102), and the Youth Tutor Fund Project of Gansu Agricultural University (GAU-QDFC-2024-09).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are openly available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE288417 (accessed on 2 April 2025); code will be made available upon request.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Dessie, Y.; Amsalu, N.; Fassil, A.; Liyew, M. Antibacterial Potential of Selected Traditional Medicinal Plants for Wound Healing in Sekela District, Northwestern Ethiopia. J. Herbs Spices Med. Plants 2025, 31, 113–130.
  2. Zhou, S.; Yan, X.; Yang, J.; Qian, C.; Yin, X.; Fan, X.; Fang, T.; Gao, Y.; Chang, Y.; Liu, W.; et al. Variations in Flavonoid Metabolites Along Altitudinal Gradient in a Desert Medicinal Plant Agriophyllum squarrosum. Front. Plant Sci. 2021, 12, 683265.
  3. Hosseini, S.M.; Samsampour, D.; Ebrahimi, M.; Abadía, J.; Khanahmadi, M. Effect of drought stress on growth parameters, osmolyte contents, antioxidant enzymes and glycyrrhizin synthesis in licorice (Glycyrrhiza glabra L.) grown in the field. Phytochemistry 2018, 156, 124–134.
  4. Yang, Y.F.; Zhang, L.; Yang, X.W. Distribution Assessments of Coumarins from Angelicae Pubescentis Radix in Rat Cerebrospinal Fluid and Brain by Liquid Chromatography Tandem Mass Spectrometry Analysis. Molecules 2018, 1, 225.
  5. Lu, Y.Q.; Wu, H.W.; Yu, X.K.; Zhang, X.; Luo, H.Y.; Tang, L.Y.; Wang, Z.J. Traditional Chinese medicine of Angelicae pubescentis radix: A review of phytochemistry, pharmacology and pharmacokinetics. Front. Pharmacol. 2020, 11, 355.
  6. Guo, Q.Q.; Du, G.C.; Li, Y.X.; Liang, C.Y.; Wang, C.; Zhang, Y.N.; Li, R.G. Nematotoxic coumarins from Angelica pubescens Maxim. f. biserrata Shan et Yuan roots and their physiological effects on Bursaphelenchus xylophilus. J. Nematol. 2018, 50, 559–568.
  7. Zhou, L.L.; Zeng, J.G. Research Advances on Chemical Constituents and Pharmacological Effects of Angelica pubescen. Mod. Chin. Med. 2019, 21, 1739–1748.
  8. Zhu, Y.; Feng, J.; Liu, Q.Y.; Yang, S. Studies on Chemical Constituents of Radix Angelicae pubescentis. In Proceedings of the 2023 8th International Symposium on Energy Science and Chemical Engineering, Guangzhou, China, 24–26 March 2023; College of Commerce Liaoning Economic Management Cadre Institute: Dalian, China; Endocrinology Department Affiliated Hospital of Liaoning University of Traditional Chinese Medicine: Shenyang, China; College of Pharmacy Liaoning University of Traditional Chinese Medicine: Shenyang, China, 2025; pp. 205–209.
  9. Cao, L.D.; Hu, C.; Gu, J.; He, L.; Teng, X.F.; Tian, S.Y.; Li, Z. Studies on the chemical constituents from the roots of Angelicae pubescentis and their inhibition of osteoclastongenesis. Lishizhen Med. Mater. Medica Res. 2020, 33, 2918–2923.
  10. Wang, B.; Liu, X.; Zhou, A.; Meng, M.; Li, Q. Simultaneous analysis of coumarin derivatives in extracts of Radix Angelicae pubescentis (Duhuo) by HPLC-DAD-ESI-MSn technique. Anal. Methods 2014, 10, 39.
  11. Zhen, M.D.; Song, M.M.; He, Z.H.; Li, H.Z. Molecular authentication of the medicinal species of Rhizoma et Radix Heraclei, Radix Angelicae Sinensis, Radix Angelicae Pubescentis and Rhizoma et Radix Notopterygii by integrating ITS2 and its secondary structure. Acta Pharm. Sin. 2021, 56, 2289–2294.
  12. Onda, Y.; Mochida, K. Exploring Genetic Diversity in Plants Using High-Throughput Sequencing Techniques. Curr. Genom. 2016, 17, 358–367.
  13. Sun, J.; Du, L.; Qu, Z.; Wang, H.; Dong, S.; Li, X.; Zhao, H. Integrated metabolomics and proteomics analysis to study the changes in Scutellaria baicalensis at different growth stages. Food Chem. 2023, 419, 136043.
  14. Shen, S.Q.; Zhan, C.S.; Yang, C.K.; Fernie, A.R.; Luo, J. Metabolomics-centered mining of plant metabolic diversity and function: Past decade and future perspectives. Mol. Plant 2022, 16, 43–63.
  15. Klupczyńska, E.A.; Pawłowski Tomasz, A. Regulation of Seed Dormancy and Germination Mechanisms in a Changing Environment. Int. J. Mol. Sci. 2021, 22, 1357.
  16. Bao, K.; Jing, Z.H.; Wang, Q.; Huang, Z.; Han, D.; Dai, S.; Liu, C.; Wu, Q.; Xu, F. Quality analysis of Euryales Semen from different origins and varieties based on untargeted metabolomics. J. Chromatogr. B 2022, 1191, 123114.
  17. Zhang, J.S.; Wang, N.; Chen, W.X.; Zhang, W.; Zhang, H.; Yu, H.; Yi, Y. Integrated metabolomics and transcriptomics reveal metabolites difference between wild and cultivated Ophiocordyceps sinensis. Food Res. Int. 2022, 163, 112275.
  18. Lee, H.; Koo, J.H.; Lee, C.K.; Song, Y.; Joo, W.K.; Chae, C.J. Prediction and Classification of Phenol Contents in Cnidium officinale Makino Using a Stacking Ensemble Model in Climate Change Scenarios. Agronomy 2024, 14, 1766.
  19. Nazarenko, D.V.; Kharyuk, P.V.; Oseledets, I.V.; Rodin, I.A.; Shpigun, O.A. Machine learning for LC-MS medicinal plants identification. Chemom. Intell. Lab. Syst. 2016, 156, e174–e180.
  20. Meng, T.; Jing, X.; Yan, Z.; Pedrycz, W. A survey on machine learning for data fusion. Inf. Fusion 2020, 57, e115–e129.
  21. Liu, C.; Zuo, Z.; Xu, F.; Wang, Y. Study of the suitable climate factors and geographical origins traceability of Panax notoginseng based on correlation analysis and spectral images combined with machine learning. Front. Plant Sci. 2023, 13, 1009727.
  22. Gharghory, S.M. Deep Network based on Long Short-Term Memory for Time Series Prediction of Microclimate Data inside the Greenhouse. Int. J. Comput. Intell. Appl. 2020, 19, 2050013.
  23. Shi, D.Y.; Yuan, P.; Liang, L.W.; Gao, L.; Li, M.; Diao, M. Integration of Deep Learning and Sparrow Search Algorithms to Optimize Greenhouse Microclimate Prediction for Seedling Environment Suitability. Agronomy 2024, 14, 254.
  24. Maleki, M.; Wraith, D. Mixtures of multivariate restricted skew-normal factor analyzer models in a Bayesian framework. Comput. Stat. 2019, 34, 1039–1053.
  25. Arsalan, G.; Ali, S.A.; Meisam, A.; Mohammadzadeh, A.; Jamali, S. Application of Artificial Neural Networks for Mangrove Mapping Using Multi-Temporal and Multi-Source Remote Sensing Imagery. Water 2020, 14, 244.
  26. Ben Said, L.; Basem, A.; Sultan, A.J.; Singh, P.K.; Jasim, D.J.; Anqi, A.E.; Rajab, H.; Ahmed, M.; Rajhi, W. Harnessing meta-heuristic, Bayesian, and search-based techniques in optimizing machine learning models for improved energy storage with microencapsulated PCMs. Int. Commun. Heat Mass Transf. 2025, 162, 108537.
  27. Liu, M.; Zhuang, P.; Lai, F. A Bayesian optimization-genetic algorithm-based approach for automatic parameter calibration of soil models: Application to clay and sand model. Comput. Geotech. 2024, 176, 106717.
  28. Mahani, M.R.; Nechepurenko, I.A.; Rahimof, Y.; Wicht, A. Optimizing data acquisition: A Bayesian approach for efficient machine learning model training. Mach. Learn. Sci. Technol. 2024, 5, 35013.
  29. Cho, H.; Kim, Y.; Lee, E.; Choi, D.; Lee, Y.; Rhee, W. Basic Enhancement Strategies When Using Bayesian Optimization for Hyperparameter Tuning of Deep Neural Networks. IEEE Access 2020, 8, 52588–52608.
  30. Jakubus, M.; Bakinowska, E. Varied macronutrient uptake by plants as an effect of different fertilisation schemes evaluated by PCA. Acta Agric. Scand. Sect. B—Soil Plant Sci. 2020, 70, 56–68.
  31. Dong, N.; Lin, H. Contribution of phenylpropanoid metabolism to plant development and plant–environment interactions. J. Integr. Plant Biol. 2021, 63, 180–209.
  32. Kumar, V.; Nautiyal, C.S. Plant Abiotic and Biotic Stress Alleviation: From an Endophytic Microbial Perspective. Curr. Microbiol. 2022, 79, 311.
  33. Ranner, J.L.; Schalk, S.; Martyniak, C.; Parniske, M.; Gutjahr, C.; Stark, T.D.; Dawid, C. Primary and Secondary Metabolites in Lotus japonicus. J. Agric. Food Chem. 2023, 71, 11277–11303.
  34. Du, J.L.; Lu, X.; Geng, Z.P.; Yuan, Y.; Liu, Y.; Li, J.; Wang, M.; Wang, J. Metabolites changes of Eucommia ulmoides Olive samaras from different regions and cultivars. Ind. Crops Prod. 2022, 189, 115824.
  35. Zhang, C.; Jiang, Y.; Liu, C.; Shi, L.; Li, J.; Zeng, Y.; Guo, L.; Wang, S. Identification of Medicinal Compounds of Fagopyri Dibotryis Rhizome from Different Origins and Its Varieties Using UPLC-MS/MS-Based Metabolomics. Metabolites 2022, 12, 790.
  36. Dong, H.; Li, M.; Jin, L.; Xie, X.; Li, M.; Wei, J. Cool Temperature Enhances Growth, Ferulic Acid and Flavonoid Biosynthesis While Inhibiting Polysaccharide Biosynthesis in Angelica sinensis. Molecules 2022, 27, 320.
  37. Han, X.X.; Li, C.; Sun, S.; Ji, J.; Nie, B.; Maker, G.; Ren, Y.; Wang, L. The chromosome-level genome of female ginseng (Angelica sinensis) provides insights into molecular mechanisms and evolution of coumarin biosynthesis. Plant J. Cell Mol. Biol. 2022, 112, 1224–1237.
  38. Han, F.; Ling, M.X.; Luo, C.; Xiao, Z.; Reng, X.Y.; Tan, Q.S.; Zhang, W.W. Analysis of quality differences in Angelica sinensis and its counterfeit products from different regions. J. Southwest China Norm. Univ. (Nat. Sci. Ed.) 2019, 44, 34–39.
  39. Leisner, C.P.; Potnis, N.; SanzSaez, A. Crosstalk and trade-offs: Plant responses to climate change-associated abiotic and biotic stresses. Plant Cell Environ. 2022, 10, 2946–2963.
  40. Zandalinas, S.I.; Sengupta, S.; Fritschi, F.B.; Azad, R.K.; Nechushtai, R.; Mittler, R. The impact of multifactorial stress combination on plant growth and survival. New Phytol. 2021, 3, 1034–1048.
  41. Chen, Z.H.; Soltis, D.E. Evolution of Environmental Stress Responses in Plants. Plant Cell Environ. 2020, 12, 2827–2831.
  42. Mikkelsen, B.L.; Olsen, C.E.; Lyngkjær, M.F. Accumulation of secondary metabolites in healthy and diseased barley, grown under future climate levels of CO2, ozone and temperature. Phytochemistry 2014, 118, 162–173.
  43. Paukov, A.; Teptina, A.; Morozova, M.; Kruglova, E.; Favero-Longo, S.E.; Bishop, C.; Rajakaruna, N. The Effects of Edaphic and Climatic Factors on Secondary Lichen Chemistry: A Case Study Using Saxicolous Lichens. Diversity 2019, 11, 94.
  44. Gerganova, M.; Popova, A.V.; Stanoeva, D.; Velitchkova, M. Tomato plants acclimate better to elevated temperature and high light than to treatment with each factor separately. Plant Physiol. Biochem. 2016, 104, 234–241.
  45. Li, M.F.; Liu, X.Z.; Wei, J.H. Selection of high altitude planting area of Angelica sinensis based on biomass, bioactive compounds accumulation and antioxidant capacity. Chin. Tradit. Herb. Drugs 2020, 51, 474–481.
  46. Xu, M.; Li, X.; Liu, M.; Shi, Y.; Zhou, H.; Zhang, B.; Yan, J. Spatial variation patterns of plant herbaceous community response to warming along latitudinal and altitudinal gradients in mountainous forests of the Loess Plateau, China. Environ. Exp. Bot. 2020, 172, 103983.
  47. Chauhan, S.; Dhalaria, R.; Ghoshal, S.; Kanwal, K.; Verma, R. Altitudinal Impact on Phytochemical Composition and Mycorrhizal Diversity of Taxus Contorta Griff in the Temperate Forest of Shimla District. J. Basic Microbiol. 2024, 64, e2400016.
  48. Večeřová, K.; Klem, K.; Veselá, B.; Holub, P.; Grace, J.; Urban, O. Combined Effect of Altitude, Season and Light on the Accumulation of Extractable Terpenes in Norway Spruce Needles. Forests 2021, 12, 1737.
  49. Zou, K.; Liu, X.; Zhang, D.; Yang, Q.; Fu, S.; Meng, D.; Chang, W.; Li, R.; Yin, H.; Liang, Y. Flavonoid Biosynthesis Is Likely More Susceptible to Elevation and Tree Age Than Other Branch Pathways Involved in Phenylpropanoid Biosynthesis in Ginkgo Leaves. Front. Plant Sci. 2019, 10, 983.
  50. Costa, S.D.; Gerschlauer, F.; Kiese, R.; Fischer, M.; Kleyer, M.; Hemp, A. Plant niche breadths along environmental gradients and their relationship to plant functional traits. Divers. Distrib. 2018, 24, 1869–1882.
  51. Guo, X.L.; Lin, X.M.; Guo, J.; You, J.M. The Research Status and Prospect of Radix Angelicae Pubescentis. J. Anhui Agric. Sci. 2014, 42, 11673–11674.
  52. Dixon, R.A.; Paiva, N.L. Stress-Induced Phenylpropanoid Metabolism. Plant Cell Online 1995, 7, 1085–1097.
  53. Escobar, L.A.; Silva, O.; Acevedo, P.; Nunes-Nesi, A.; Alberdi, M.; Reyes-Díaz, M. Different levels of UV-B resistance in Vaccinium corymbosum cultivars reveal distinct backgrounds of phenylpropanoid metabolites. Plant Physiol. Biochem. 2017, 118, 541–550.
  54. Lei, L.; Yuan, X.; Fu, K.; Chen, Y.; Lu, Y.; Shou, N.; Wu, D.; Chen, X.; Shi, J.; Zhang, M.; et al. Pseudotargeted metabolomics revealed the adaptive mechanism of Draba oreades Schrenk at high altitude. Front. Plant Sci. 2022, 13, 1052640.
  55. Silambarasan, T.S.; Mukesh, K.D. Eco-Technological Evaluation of Natural Phytochemicals Potential Drug Molecules Against Main Protease: A Machine Learning Algorithm. Cureus 2024, 16, e57151.
  56. Chinese Pharmacopoeia Commission. Pharmacopoeia of People’s Republic of China; China Medical Science and Technology Press: Beijing, China, 2020; Volume 1, pp. 133–134.
  57. Mihang, J.; Liu, X.J.; Liu, L.Y. Different Responses of Terrestrial Carbon Fluxes to Environmental Changes in Cold Temperate Forest Ecosystems. Forests 2024, 15, 1340.
  58. Qaderi, M.M.; Martel, A.B.; Strugnell, C.A. Environmental Factors Regulate Plant Secondary Metabolites. Plants 2023, 3, 447.
  59. Wang, R.X.; Zhang, T.J.; He, F.; Fu, R.; Jing, W.G.; Guo, X.; Wei, F. Identification Analysis of Angelicae sinensis radix and Angelicae pubescentis radix Based on Quantized “Digital Identity” and UHPLC-QTOF-MSE Analysis. J. Am. Soc. Mass Spectrom. 2024, 35, 2222–2229.
  60. Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 2014, 15, 2114–2120. [Google Scholar] [CrossRef]
  61. Thawng, C.N.; Smith, G.B. A transcriptome software comparison for the analyses of treatments expected to give subtle gene expression responses. BMC Genom. 2022, 23, 452. [Google Scholar] [CrossRef]
  62. García, C.J.; Alacid, V.; Tomás-Barberán, F.A.; García, C.; Palazón, P. Untargeted Metabolomics to Explore the Bacteria Exo-Metabolome Related to Plant Biostimulants. Agronomy 2022, 8, 1926. [Google Scholar] [CrossRef]
  63. Livak, K.J.; Schmittgen, T.D. Analysis of relative gene expression data using real time quantitative PCR and the 2(-Delta Delta C(T)) Method. Methods 2001, 25, 402–408. [Google Scholar] [CrossRef] [PubMed]
  64. He, P.; Guo, L.F.; Liu, Y.Z.; Meng, F.; Peng, C. Spatial dynamic simulation of important cash crops based on phenology with Scutellaria baicalensis Georgi as an example. Eur. J. Agron. 2023, 144, 126748. [Google Scholar] [CrossRef]
  65. Farcuh, M.; Li, B.; Rivero, R.M.; Shlizerman, L.; Sadka, A.; Blumwald, E. Sugar metabolism reprogramming in a non-climacteric bud mutant of a climacteric plum fruit during development on the tree. J. Exp. Bot. 2017, 68, 5813–5828. [Google Scholar] [CrossRef] [PubMed]
  66. Ghany, E.A.S.; Mahmood, A.M.; Aziz, E. Adaptive Dynamic Learning Rate Optimization Technique for Colorectal Cancer Diagnosis Based on Histopathological Image Using EfficientNet-B0 Deep Learning Model. Electronics 2024, 13, 3126. [Google Scholar] [CrossRef]
Figure 1. Metabolomics profiles of Ecotype Samples. (a) PCA score plot; (b) Venn plot; (c) Volcano plot.
Figure 2. KEGG enrichment circle diagrams of DEGs. First circle: Enriched classifications, with the outer circle representing the coordinate scale for the number of genes. Different colors signify different classifications. Second circle: The number of genes in that classification within the background set, along with the q-value or p-value. The longer the bar, the more genes are present; the redder the color, the smaller the value; the bluer, the larger. Third circle: Bar chart showing the proportion of up- and downregulated genes. Light red represents the proportion of upregulated genes, while light blue represents the proportion of downregulated genes. The specific numerical values are displayed below. Fourth circle: RichFactor values for each classification (the number of foreground genes in the classification divided by the number of background genes). The background grid lines indicate increments of 0.2.
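The RichFactor shown in the fourth circle is a simple ratio, and a short sketch may make the definition concrete. The pathway names and gene counts below are hypothetical placeholders, not values from this study, and `rich_factor` is an illustrative helper rather than part of any enrichment package.

```python
def rich_factor(foreground: int, background: int) -> float:
    """RichFactor = foreground genes / background genes for one classification."""
    if background <= 0:
        raise ValueError("background gene count must be positive")
    if not 0 <= foreground <= background:
        raise ValueError("foreground count must lie between 0 and the background count")
    return foreground / background


# Hypothetical (foreground, background) gene counts, for illustration only.
example_counts = {
    "Phenylpropanoid biosynthesis": (30, 150),
    "Starch and sucrose metabolism": (12, 240),
}

factors = {name: rich_factor(fg, bg) for name, (fg, bg) in example_counts.items()}

for name, value in factors.items():
    print(f"{name}: RichFactor = {value:.2f}")
```

Because the foreground set is a subset of the background set, the value always lies between 0 and 1, which is why grid lines in increments of 0.2 are sufficient to read the fourth circle.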
Figure 3. (a) Gene changes in metabolites and regulatory enzymes in the process of coumarin metabolism; (b) quantitative real-time PCR (qRT-PCR) validation of select genes.
Figure 4. Random forest (RF) analysis of environmental variables with transcriptomic and metabolomic data.
Figure 5. (a) Environment–metabolite–gene correlation diagram; (b) MFA variable set contribution graph; (c) contribution of 20 variables to Dim1 and Dim2 rankings; (d) Spearman correlation analysis of different datasets; (e) random forest feature sorting.
Figure 6. (a) Loss over epochs; (b) relationship between true and predicted values; (c) residual plot.
Figure 7. Collection point information and sample images.
Table 1. Distribution of metabolites detected by metabolomics.
| Class | Count | Percent |
|---|---|---|
| Benzene and Substituted Derivatives | 314 | 6.04% |
| Carboxylic Acids and Derivatives | 544 | 10.46% |
| Coumarins and Derivatives | 108 | 2.08% |
| Fatty Acyls | 707 | 13.6% |
| Flavonoids | 291 | 5.6% |
| Glycerophospholipids | 95 | 1.83% |
| Organooxygen Compounds | 649 | 12.48% |
| Prenol Lipids | 462 | 8.88% |
| Steroids and Steroid Derivatives | 134 | 2.58% |
| Others | 1896 | 36.46% |
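As a quick consistency check, the percentages in Table 1 can be recomputed from the counts. The sketch below assumes each percentage is the class count divided by the sum of all listed counts, which the table implies but does not state.

```python
# Counts copied from Table 1.
counts = {
    "Benzene and Substituted Derivatives": 314,
    "Carboxylic Acids and Derivatives": 544,
    "Coumarins and Derivatives": 108,
    "Fatty Acyls": 707,
    "Flavonoids": 291,
    "Glycerophospholipids": 95,
    "Organooxygen Compounds": 649,
    "Prenol Lipids": 462,
    "Steroids and Steroid Derivatives": 134,
    "Others": 1896,
}

# Assumption: percent = class count / total of all listed counts.
total = sum(counts.values())
percent = {name: round(100 * n / total, 2) for name, n in counts.items()}

print(f"Total metabolites: {total}")
for name, value in percent.items():
    print(f"{name}: {value}%")
```

Under this assumption the recomputed values agree with the tabulated percentages to two decimal places.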
Table 2. The proportion of differential metabolite classifications in each comparison group.
| Class | C-VS-H | G-VS-C | S-VS-C | G-VS-H | G-VS-S | S-VS-H |
|---|---|---|---|---|---|---|
| Organooxygen compounds | 18.8% | 14% | 15.3% | 16.6% | 13.8% | 14.9% |
| Prenol lipids | 8.6% | 8.8% | 11.3% | 8.7% | 11.1% | 12.6% |
| Coumarins and derivatives | 8.9% | 9.2% | 6.7% | 11.3% | 10.4% | 9.7% |
| Carboxylic acids and derivatives | 10.2% | 10.3% | 10.1% | 8.7% | 7.9% | 10% |
| Fatty acyls | 8.9% | 10.7% | 11.6% | 11.7% | 10.4% | 13% |
| Flavonoids | 8.9% | 5.9% | 7.3% | 7.1% | 8.3% | 6.7% |
Table 3. Information on key genes involved in coumarin synthesis and their differential expression.
The six comparison columns give the direction of regulation in each comparison group.

| Enzyme | Numbering | Gene ID | Gene Name | C-VS-H | G-VS-C | G-VS-H | S-VS-C | S-VS-G | S-VS-H |
|---|---|---|---|---|---|---|---|---|---|
| phenylalanine ammonia-lyase (PAL), K10775 | Gene2 | TRINITY_DN23454_c1_g2_i2_3 | PAL-3 | Down | Up | Down | Up | Down | Down |
| | Gene1 | TRINITY_DN21310_c0_g2_i1_4 | PAL-1 | Down | Up | Down | Up | Down | Down |
| 4-coumarate--CoA ligase (4CL), K01904 | Gene3 | TRINITY_DN18825_c0_g1_i2_4 | 4CL-1 | Down | Up | Down | Up | Down | Down |
| | Gene4 | TRINITY_DN22443_c1_g2_i2_2 | 4CL-1 | Down | Up | Down | Up | Down | Down |
| | Gene6 | TRINITY_DN22443_c1_g4_i1_2 | 4CL-2 | Down | Up | Down | Up | Up | Down |
| | Gene5 | TRINITY_DN23319_c1_g4_i1_3 | 4CL-1 | Down | Up | Down | Up | Up | Down |
| trans-cinnamate 4-monooxygenase (CYP73A), K00487 | Gene8 | TRINITY_DN19642_c0_g1_i1_3 | C4H1 | Down | Up | Up | Down | Down | Down |
| | Gene7 | TRINITY_DN18621_c0_g1_i3_4 | C4H1 | Down | Up | Down | Up | Up | Down |
| beta-glucosidase (bglB), K05350 | Gene9 | TRINITY_DN18641_c0_g1_i2_3 | BGLU44 | Down | Up | Down | Down | Down | Down |
| | Gene10 | TRINITY_DN19407_c0_g1_i2_4 | BGLU44 | Down | Down | Down | Up | Up | Up |
| | Gene12 | TRINITY_DN21033_c0_g1_i8_4 | BGLU18 | Down | Down | Down | Up | Up | Down |
| | Gene11 | TRINITY_DN21908_c0_g2_i2_4 | BGLU44 | Up | Down | Up | Down | Down | Down |
| beta-glucosidase (bglX), K05349 | Gene13 | TRINITY_DN19157_c0_g1_i1_4 | ANIA_01804 | Down | Down | Down | Up | Up | Down |
| | Gene14 | TRINITY_DN20626_c2_g1_i5_1 | ANIA_01804 | Up | Down | Up | Down | Down | Up |
| | Gene15 | TRINITY_DN20683_c0_g1_i2_3 | XYL4 | Down | Down | Down | Down | Up | Down |
| | Gene16 | TRINITY_DN20767_c0_g1_i18_2 | BXL1 | Down | Down | Down | Up | Up | Down |
| | Gene17 | TRINITY_DN22550_c1_g4_i2_3 | ANIA_02828 | Down | Up | Down | Up | Down | Down |
Table 4. DNN model accuracy assessment.
| Metric | Training Set | Test Set |
|---|---|---|
| Mean Squared Error (MSE) | 0.0025 | 0.0043 |
| Root Mean Squared Error (RMSE) | 0.0495 | 0.0654 |
| Mean Absolute Error (MAE) | 0.0388 | 0.05 |
| Coefficient of Determination (R²) | 0.9772 | 0.9552 |
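For readers who want to reproduce this kind of assessment, the four metrics in Table 4 can be computed as in the minimal sketch below. It uses the standard textbook definitions; the article does not state which implementation the authors used. The function name `regression_metrics` and the sample values are illustrative assumptions, not data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R2 using the standard definitions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # R2 is undefined when the observed values are all identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}


# Hypothetical observed and predicted values, for illustration only.
y_true = [0.10, 0.40, 0.35, 0.80]
y_pred = [0.12, 0.38, 0.30, 0.85]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Since RMSE is the square root of MSE, the two rows of Table 4 can be cross-checked against each other: the square roots of 0.0025 and 0.0043 are close to the reported 0.0495 and 0.0654, with the small gaps attributable to rounding of the MSE values.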
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Hu, C.; Li, Q.; Ding, X.; Jiang, K.; Liang, W. Exploring Molecular and Genetic Differences in Angelica biserrata Roots Under Environmental Changes. Int. J. Mol. Sci. 2025, 26, 3894. https://doi.org/10.3390/ijms26083894