Identification, Design, and Application of Noncoding Cis-Regulatory Elements

Xu, Lingna; Liu, Yuwen

doi:10.3390/biom14080945

Open Access Review


by Lingna Xu¹,² and Yuwen Liu¹,²,³,*

¹ Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-Omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China

² Innovation Group of Pig Genome Design and Breeding, Research Centre for Animal Genome, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518124, China

³ Kunpeng Institute of Modern Agriculture at Foshan, Chinese Academy of Agricultural Sciences, Foshan 528226, China

* Author to whom correspondence should be addressed.

Biomolecules 2024, 14(8), 945; https://doi.org/10.3390/biom14080945

Submission received: 26 May 2024 / Revised: 25 July 2024 / Accepted: 30 July 2024 / Published: 5 August 2024


Abstract

Cis-regulatory elements (CREs) play a pivotal role in orchestrating interactions with trans-regulatory factors such as transcription factors, RNA-binding proteins, and noncoding RNAs. These interactions are fundamental to the molecular architecture underpinning complex and diverse biological functions in living organisms, facilitating a myriad of sophisticated and dynamic processes. The rapid advancement in the identification and characterization of these regulatory elements has been marked by initiatives such as the Encyclopedia of DNA Elements (ENCODE) project, which represents a significant milestone in the field. Concurrently, the development of CRE detection technologies, exemplified by massively parallel reporter assays, has progressed at an impressive pace, providing powerful tools for CRE discovery. The exponential growth of multimodal functional genomic data has necessitated the application of advanced analytical methods. Deep learning algorithms, particularly large language models, have emerged as invaluable tools for deconstructing the intricate nucleotide sequences governing CRE function. These advancements facilitate precise predictions of CRE activity and enable the de novo design of CREs. A deeper understanding of CRE operational dynamics is crucial for harnessing their versatile regulatory properties. Such insights are instrumental in refining gene therapy techniques, enhancing the efficacy of selective breeding programs, pushing the boundaries of genetic innovation, and opening new possibilities in microbial synthetic biology.

Keywords:

CREs; MPRA; predicting CRE activity; de novo designing CREs

1. Introduction

Historically, genomic research has concentrated predominantly on the functional implications of protein-coding sequences. Advances in genome annotation have revealed that over 98% of the human genome consists of noncoding DNA [1], and genome-wide association studies (GWAS) have shown that more than 90% of trait-associated variants lie within these noncoding regions [2]. This previously overlooked portion of the genome is now gaining widespread attention. The National Human Genome Research Institute’s Encyclopedia of DNA Elements (ENCODE) project, initiated in 2003, aims to catalog and characterize all functional elements in the human genome [3]. This landmark initiative has significantly advanced our understanding of genomic complexity and of the regulatory mechanisms that govern gene expression.

The ENCODE Consortium has pioneered robust experimental and bioinformatic methods for delineating cis-regulatory elements (CREs) [4,5]. Contemporary high-throughput assays, including chromatin accessibility profiling, histone modification profiling, and three-dimensional chromatin conformation mapping, infer regulatory potential but gauge CRE activity only indirectly, through biochemical proxies. To address this limitation, the massively parallel reporter assay (MPRA) was developed to quantify CRE activity directly and at high throughput using customized plasmid vectors. Here, we summarize MPRA systems comprehensively so that researchers can apply these technologies flexibly in their own studies.

The systematic identification of regulatory elements has greatly enhanced our understanding of CRE function across cell types, tissues, species, and temporal states [6,7,8,9,10,11,12,13]. Numerous researchers have developed deep learning models for in-depth CRE analysis [14,15,16,17,18]. These efforts raise critical questions: Can regulatory elements be predicted across species and across spatial and temporal contexts based on sequence homology? What syntactic logic do CREs follow in regulating gene expression? Decoding the key DNA motifs within regulatory elements is foundational for designing these elements, and customizing transcriptional regulation represents a major innovation in biology.

Understanding, identifying, decoding, and designing noncoding regulatory elements are pivotal for advancing research across life sciences, genetic evolution, disease treatment, and animal husbandry and breeding. This review provides an overview of high-throughput technologies for identifying CREs, summarizes studies on predicting and designing regulatory elements using deep learning models, and discusses the potential applications of noncoding regulatory elements in various fields.

2. High-Throughput Direct Identification of CRE Activity Using MPRA

Transcription factors (TFs) and their associated cofactors bind to DNA within accessible chromatin domains and exert essential regulatory functions. Genomic methods such as micrococcal nuclease sequencing (MNase-seq) [19], DNase I hypersensitive site sequencing (DNase-seq) [20], and the assay for transposase-accessible chromatin using sequencing (ATAC-seq) [21] have been pivotal in demarcating these domains, which harbor critical CREs including promoters, enhancers, and silencers. The epigenetic landscape, defined by core histone modifications, provides a framework for identifying and decoding functional genomic elements and chromatin organization [22]. Techniques such as chromatin immunoprecipitation sequencing (ChIP-seq) [23,24,25], cleavage under targets and release using nuclease (CUT&RUN) [26], and cleavage under targets and tagmentation (CUT&Tag) [27,28] use antibodies against histone modifications to capture the corresponding genomic regulatory regions. Chromosome conformation capture techniques such as Hi-C, which couples proximity ligation with high-throughput sequencing, provide genome-wide quantification of chromatin contacts [29]. Chromatin interaction analysis with paired-end tag sequencing (ChIA-PET) [30,31] and Hi-ChIP, which combines in situ Hi-C with chromatin immunoprecipitation [32], identify interacting CRE regions marked by specific histone modifications or transcription factor binding. These 3D genomic approaches thus provide insights into interactions between regulatory elements [29] and can reveal the downstream target genes of such elements [30,31,32].

Despite these advances, such strategies have inherent limitations and cannot directly localize and quantify CRE activity with high specificity and precision. Chromatin accessibility assays capture a broad spectrum of regulatory regions without indicating whether a given region activates or represses transcription. Histone modification profiling does not encompass all genomic regulatory elements, and three-dimensional interaction maps may include nonfunctional long-range contacts as noise. Accurately localizing CREs within genomic fragments and precisely measuring their activity therefore remain challenging.

The dual-luciferase reporter assay is regarded as the gold standard for quantifying CRE activity, but its low throughput cannot meet the demands of the big data era. Inspired by dual-luciferase plasmid constructs, MPRA systems have been engineered for the high-throughput functional interrogation of regulatory elements. Depending on how each class of regulatory element functions, MPRA uses a specific plasmid backbone and readout strategy. Target genomic fragments are inserted into the appropriate screening plasmid, and the CRE activity of each fragment is evaluated by measuring the transcript abundance of either a barcode paired with the fragment or the fragment itself [33,34,35]. Below, we detail how researchers use MPRA to measure the activity of different classes of CREs.

2.1. MPRA Promoter

In a promoter MPRA, test fragments are paired with unique barcodes and cloned into a plasmid backbone carrying a reporter gene, such as green fluorescent protein (GFP), to enable high-throughput screening (Figure 1). The resulting plasmid library, termed the input library, is sequenced to record the abundance of each fragment-barcode pair. The library is then transfected into the cell line of interest, and RNA is extracted to build an RNA-seq library, termed the output library. The promoter activity of each test fragment is quantified as the normalized output-to-input ratio of its corresponding barcodes.
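The normalized output-to-input ratio described above can be sketched as follows. This is a minimal illustration, assuming barcode count tables have already been parsed into dictionaries; the function name, the counts-per-million (CPM) normalization, the pseudocount of 1, and the median across barcodes are illustrative choices, not the pipeline of any cited study.

```python
import math
import statistics
from collections import defaultdict

# Illustrative sketch; function name and data layout are hypothetical.
def fragment_activity(dna_counts, rna_counts, barcode_to_fragment, pseudocount=1.0):
    """Per-fragment activity as the median log2(RNA CPM / DNA CPM) over its barcodes.

    dna_counts / rna_counts: barcode -> raw read count in the input (plasmid)
    and output (RNA) libraries. Converting to counts per million removes
    differences in sequencing depth between the two libraries.
    """
    dna_total = sum(dna_counts.values())
    rna_total = sum(rna_counts.values())
    per_fragment = defaultdict(list)
    for barcode, dna in dna_counts.items():
        if barcode not in barcode_to_fragment:
            continue  # barcode not linked to any test fragment
        rna = rna_counts.get(barcode, 0)
        dna_cpm = (dna + pseudocount) / dna_total * 1e6
        rna_cpm = (rna + pseudocount) / rna_total * 1e6
        per_fragment[barcode_to_fragment[barcode]].append(math.log2(rna_cpm / dna_cpm))
    # The median across barcodes limits the influence of a single outlier barcode
    return {frag: statistics.median(vals) for frag, vals in per_fragment.items()}
```

Positive values indicate that a fragment's barcodes are over-represented in RNA relative to their plasmid abundance; in practice, values are usually compared against negative-control fragments rather than against zero.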

Ryan et al. leveraged the MPRA system to identify variants that modulate gene expression by influencing promoter activity, thereby contributing to a genetic predisposition to ankylosing spondylitis [36]. In a related study, van Arensbergen et al. employed this approach on a genome-wide scale in K562 cells to map active gene promoters [37]. Furthermore, Zhao et al. advanced this technique by integrating single-cell technology with MPRA and developed single-cell MPRA to pinpoint promoter activity-affecting variants across distinct cell subtypes within the mouse retina [8].

2.2. MPRA Enhancer

Enhancer library construction parallels that for promoters but inserts a minimal promoter (miniP) and a GFP reporter gene between the test fragment and the barcode (Figure 1), yielding a high-throughput screening system for enhancer activity [33]. Unlike promoters, enhancers can act independently of their orientation and their distance from the target promoter. Exploiting this property, Stark et al. developed self-transcribing active regulatory region sequencing (STARR-seq), a derivative of MPRA in which candidate enhancer fragments are cloned downstream of a minimal promoter and upstream of a polyadenylation signal within a plasmid (Figure 2). In cells, an active enhancer acts on the promoter to drive transcription of the reporter gene together with the enhancer sequence itself, so enhancer activity can be measured directly by RNA-seq. Because it can screen DNA fragments at a whole-genome scale, this method was first applied to map enhancer activity throughout the Drosophila genome, providing substantial insights into enhancer function [17,38,39,40].
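Because STARR-seq reads out the enhancer sequence itself, genome-wide data are often summarized by asking, for each genomic window, whether its share of RNA reads exceeds its share of input DNA reads. The sketch below computes that fold enrichment with a one-sided binomial p-value. The window definition, function names, and binomial null model are illustrative assumptions, not the exact peak caller used in the cited studies.

```python
import math

# Illustrative sketch; function names and the null model are assumptions.
def log_binom_pmf(i, n, p):
    """Natural log of the Binomial(n, p) probability mass at i (via lgamma)."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    if k <= 0:
        return 1.0
    if k <= n * p:
        # Tail is large here; subtract the k lower-tail terms from 1.
        lower = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k))
        return max(0.0, 1.0 - lower)
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(log_binom_pmf(i, n, p))
        total += term
        if term < total * 1e-12:  # terms only shrink beyond the mean
            break
    return total

def window_enrichment(rna_reads, dna_reads, rna_total, dna_total):
    """Fold enrichment and one-sided binomial p-value for one genomic window.

    Null hypothesis: the window's share of RNA (reporter) reads equals its
    share of input DNA reads, i.e. it is transcribed no more than its
    plasmid abundance predicts.
    """
    expected_fraction = dna_reads / dna_total
    fold = (rna_reads / rna_total) / expected_fraction
    pval = binom_upper_tail(rna_reads, rna_total, expected_fraction)
    return fold, pval
```

Working in log space with `math.lgamma` avoids the overflow that exact binomial coefficients would cause at realistic library sizes of millions of reads.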

STARR-seq offers several advantages over traditional MPRA, including its applicability to native genomic fragments such as randomly sheared whole-genome fragments, chromatin-immunoprecipitated DNA, or enriched open chromatin regions. These fragments can be efficiently cloned into a STARR-seq vector, simplifying library construction and reducing costs while increasing throughput (Figure 2). Liu et al. successfully applied STARR-seq to delineate and quantify enhancer activity across the mammalian genome, achieving a throughput of 158.6 million sequences [40]. Subsequent innovations have extended the utility of STARR-seq, leading to several variants such as Capture-STARR-seq [41,42], ATAC-STARR-seq [43,44], ChIP-STARR-seq [45], population-scale STARR-seq assays [46], DNA methylation-STARR-seq [47], biallelic targeted STARR-seq [48], and massive active enhancers by sequencing [49]. Each of these variants addresses specific challenges and broadens the range of STARR-seq applications.

STARR-seq delivered by adeno-associated viruses (AAV) [50,51] has enabled high-throughput identification of enhancer activity in vivo, although its efficiency is currently limited by AAV transduction rates. Advances in AAV capsid engineering, particularly those that optimize tissue-specific transduction [52], are expected to substantially improve the throughput and applicability of this approach. Moreover, nanotechnology-based in vivo delivery systems may enable safer and more efficient applications.

Finally, single-cell STARR-seq poses a distinct challenge: the limited number of plasmid-derived transcripts is dispersed across diverse cell subtypes, making these transcripts difficult to capture. Mangan et al. addressed this issue by using semi-nested polymerase chain reaction (PCR) to selectively amplify reporter transcripts from various cell subtypes, enabling the detection of enhancer activity in heterogeneous cell populations [53].

2.3. MPRA Silencer

Like enhancers, silencers act largely independently of position and orientation and can therefore be characterized with high-throughput methods such as STARR-seq. The critical difference lies in promoter selection. For enhancer identification, weak promoters or elements with inherently low promoter activity are preferred because they increase detection sensitivity [54,55]. Conversely, strong promoters, such as phosphoglycerate kinase (PGK) or super core promoter 1 (SCP1), are required to identify silencers, because repression can only be detected against a high baseline of reporter expression (Figure 1 and Figure 2). Jayavelu et al. used a STARR-seq platform driven by the SCP1 promoter to identify approximately 7500 potential silencer fragments across the human and mouse genomes in K562 cells [56]. Extending this paradigm, Hussain et al. used three STARR-seq vectors carrying the SCP1, PGK, or lymphoid-specific recombination-activating gene 2 (Rag2) promoter to evaluate silencer activity within mouse DNase I hypersensitive site regions in T cell lines, and found that stronger promoters, such as PGK and Rag2, detected a greater number of silencers [57]. Hansen et al. applied ATAC-STARR-seq to the female B-cell lymphoblastoid cell line GM12878 to assess enhancer and silencer activity simultaneously [44], exploiting the capacity of this system to detect both types of regulatory element in a single experiment. Although this simultaneous identification is highly innovative, the promoter activity of the ORI element in the ORI-STARR-seq backbone is quite weak, which may produce false positives when identifying silencers [58].
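When one assay reports both enhancers and silencers, fragments are typically called relative to a baseline set by control fragments. The sketch below assumes per-fragment log2(RNA/DNA) activities and a set of negative-control fragments presumed to be regulatory-neutral; the function name, the z-score approach, and the cutoff of 2 are hypothetical choices for illustration only.

```python
import statistics

# Illustrative sketch; names, z-score approach and cutoff are hypothetical.
def classify_fragments(activity, negative_controls, z_cutoff=2.0):
    """Label fragments as enhancer-like, silencer-like or inactive.

    activity: fragment -> log2(RNA/DNA); negative_controls: log2 ratios of
    fragments assumed to be regulatory-neutral. With a strong promoter in the
    backbone, silencers appear as activity below the control baseline and
    enhancers as activity above it.
    """
    mu = statistics.mean(negative_controls)
    sd = statistics.stdev(negative_controls)
    labels = {}
    for frag, value in activity.items():
        z = (value - mu) / sd
        if z >= z_cutoff:
            labels[frag] = "enhancer-like"
        elif z <= -z_cutoff:
            labels[frag] = "silencer-like"
        else:
            labels[frag] = "inactive"
    return labels
```

The sketch also shows why a weak backbone promoter is problematic for silencers: if the control baseline is already near background, a genuine reduction in signal is hard to separate from noise.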

2.4. MPRA Insulator

Insulators shape genome architecture through two principal mechanisms. As enhancer blockers, they prevent enhancers from activating adjacent promoters; as barriers, they shield genes from the transcriptional interference of heterochromatin [59,60,61]. To characterize these elements further, Zhang et al. developed a site-specific heterochromatin insertion of elements in the lamina-associated domain platform. This approach uses a serine integrase-based strategy to identify, at high throughput, insulators that mitigate the influence of heterochromatin on gene transcription, offering new insights into the spatial regulation of gene expression [9] (Figure 3). Hong et al. developed the massively parallel integrated regulatory elements (MPIRE) framework and used it to measure the effects of three insulators (the CTCF binding sites A2 and cHS4 and the B box motif ALOXE3), as well as their mutants, at thousands of genomic locations. All three insulators blocked enhancers in the genome, but only ALOXE3 acted as a heterochromatin barrier [62].
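Measuring an insulator at many genomic locations, as in MPIRE, amounts to comparing reporter expression with and without the insulator at each integration site. The sketch below is a hypothetical summary of such data, assuming expression values keyed by integration-site ID; the function name, the data layout, and the 1-log2-unit threshold are assumptions, not the analysis used by Hong et al.

```python
import math
import statistics

# Illustrative sketch; data layout and threshold are hypothetical.
def insulator_effect(with_insulator, without_insulator, min_drop=1.0):
    """Summarize enhancer blocking across integration sites.

    Both arguments map an integration-site ID to reporter expression measured
    with and without the insulator between enhancer and promoter. Returns the
    median log2 fold change over shared sites and the fraction of shared
    sites where expression drops by at least `min_drop` log2 units.
    """
    shared = sorted(set(with_insulator) & set(without_insulator))
    lfc = [math.log2(with_insulator[s] / without_insulator[s]) for s in shared]
    blocked = sum(1 for x in lfc if x <= -min_drop)
    return statistics.median(lfc), blocked / len(shared)
```

Reporting the fraction of blocked sites alongside the median makes position dependence visible: an insulator that works only in some genomic contexts shows a modest median but a bimodal per-site distribution.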

2.5. MPRA-5′ UTR

The 5′ untranslated region (5′ UTR) plays a crucial role in regulating gene transcription and translation. To investigate this, researchers typically insert target sequences downstream of a promoter and upstream of a reporter gene within an MPRA framework. Combined with in vitro transcription, ribosome profiling, and RNA sequencing, this setup allows the influence of 5′ UTRs on gene expression to be assessed quantitatively at scale [16,63,64,65,66,67] (Figure 4). Because 5′ UTRs vary in length from 18 to more than 3000 bases, and both UTR length and sequence context strongly affect gene expression, Lim et al. developed a pooled full-length UTR multiplex assay on gene expression. This approach uses single-molecule real-time sequencing to link each full-length 5′ UTR to its downstream barcode. Subsequent barcode sequencing of DNA, total mRNA, and ribosome-associated mRNA enables a detailed analysis of how 5′ UTRs regulate both transcription and translation [68].
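Sequencing barcodes in three libraries (DNA, total mRNA, and ribosome-associated mRNA) lets each 5′ UTR be scored at two levels. A minimal sketch, assuming depth-normalized abundances per barcode; the function name, key names, and pseudocount are hypothetical, and the ribosome-to-mRNA ratio is used only as a simple proxy for translation efficiency.

```python
import math

# Illustrative sketch; names and pseudocount are hypothetical.
def utr_effects(dna, total_mrna, ribo_mrna, pseudocount=1.0):
    """Per-barcode transcription and translation readouts for a 5' UTR library.

    dna, total_mrna, ribo_mrna: barcode -> depth-normalized abundance in the
    plasmid DNA, total mRNA and ribosome-associated mRNA libraries.
    """
    results = {}
    for bc in dna:
        d = dna[bc] + pseudocount
        m = total_mrna.get(bc, 0) + pseudocount
        r = ribo_mrna.get(bc, 0) + pseudocount
        results[bc] = {
            "mrna_per_dna": math.log2(m / d),   # transcript level per template copy
            "ribo_per_mrna": math.log2(r / m),  # proxy for translation efficiency
        }
    return results
```

Separating the two ratios is what allows a UTR that raises protein output by stabilizing or increasing mRNA to be distinguished from one that improves ribosome loading.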

2.6. MPRA-3′ UTR

Contemporary MPRA systems for the high-throughput characterization of 3′ untranslated regions (3′ UTRs) adopt a configuration similar to that of STARR-seq: test fragments are placed downstream of a promoter and reporter gene and upstream of a poly(A) signal. Unlike STARR-seq, which uses relatively weak promoters for enhancer detection, MPRA systems for 3′ UTRs incorporate strong promoters to increase transcriptional output. Oikonomou et al. implemented a dual-fluorescence system in which GFP and mCherry are both driven by strong promoters; test fragments were inserted downstream of the mCherry reporter gene, and regulatory elements within the 3′ UTR were identified by fluorescence-activated cell sorting [69]. Griesemer et al. developed the MPRAu system, in which the strong PGK promoter drives a GFP reporter followed by the test fragments, and used it to evaluate the functional impact of 12,173 3′ UTR variants linked to human diseases and evolutionary traits across six cell lines [70]. These plasmid-based configurations enable systematic analysis of the regulatory effects mediated by 3′ UTRs. However, further investigation is needed to elucidate the intricate dynamics of post-transcriptional mRNA translation and stability. The Fast-UTR system, based on a bidirectional tetracycline-regulated viral reporter, has made it possible to quantify the effects of 3′ UTR sequences on both mRNA abundance and protein production [71,72,73]. MPRA systems have also been combined with in vitro transcription for high-throughput analysis of how 3′ UTRs regulate mRNA stability [71,74,75] (Figure 5).
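For stability measurements, a common analysis is to follow each reporter's abundance after transcription is shut off (for example, by switching off a tetracycline-regulated promoter) and fit a first-order decay. This is assumed here as a generic approach rather than taken from the cited studies, and the function name is hypothetical.

```python
import math

# Illustrative sketch; assumes first-order (exponential) decay after shutoff.
def half_life(times, abundances):
    """Estimate mRNA half-life from a decay time course.

    Model: A(t) = A0 * exp(-k * t). An ordinary least-squares line through
    ln A(t) has slope -k, and the half-life is ln(2) / k.
    """
    ys = [math.log(a) for a in abundances]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
             / sum((t - mean_t) ** 2 for t in times))
    k = -slope  # decay rate constant
    return math.log(2) / k
```

Fitting on the log scale keeps the estimate linear and simple; the comparison of interest is then the half-life of a reporter carrying a given 3′ UTR relative to a control UTR.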

2.7. MPRA Technology Is an Effective Strategy for Fine-Mapping Causal Variants of Complex Traits

Comprehensive GWASs have been undertaken to elucidate the influence of genetic variants on complex traits. Loci associated with quantitative traits often display extensive linkage disequilibrium (LD), so each genome-wide significant region contains many correlated variants, only some of which may affect the phenotype. Consequently, traditional GWAS methods often fail to pinpoint the causal variants responsible for trait variation. To address this limitation, experimental and computational strategies have been employed to identify causal variants at specific loci. Statistical fine-mapping methods that combine association statistics with LD models have been developed to prioritize potential causal variants [76,77,78,79]. However, because computational predictions often disagree, experimental validation is still needed to confirm true causal variants. Methods such as allele-specific chromatin accessibility analysis assess how genetic variants influence chromatin accessibility in relevant cell types, thereby affecting gene expression and phenotypes [80,81]. Similarly, single nucleotide polymorphism (SNP)-ChIP techniques have been used to identify SNPs that affect CREs using genetically diverse donor materials [82]. However, these techniques often reveal that multiple SNPs affect a single CRE, making it difficult to isolate the effect of each individual variant.
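To make the idea of statistical fine-mapping concrete, the sketch below shows the simplest case: Wakefield's approximate Bayes factors under the assumption that a locus contains exactly one causal variant. This assumption sidesteps the LD matrix, whereas the methods cited above model LD explicitly. The prior standard deviation of 0.2, the 95% coverage, and the function names are assumptions for illustration.

```python
import math

# Illustrative sketch; single-causal-variant assumption, hypothetical names.
def log_abf(beta, se, prior_sd=0.2):
    """Log of Wakefield's approximate Bayes factor (association vs. null).

    beta, se: GWAS effect estimate and standard error for one variant.
    prior_sd: assumed prior standard deviation of the true effect size.
    """
    v = se ** 2
    w = prior_sd ** 2
    r = w / (v + w)
    z = beta / se
    return 0.5 * math.log(1 - r) + 0.5 * z * z * r

def credible_set(variants, coverage=0.95):
    """Posterior inclusion probabilities (PIPs) and a credible set.

    variants: list of (variant_id, beta, se). The credible set is the smallest
    group of top-ranked variants whose PIPs sum to at least `coverage`.
    """
    logs = {vid: log_abf(b, s) for vid, b, s in variants}
    top = max(logs.values())  # subtract the maximum to avoid overflow in exp
    weights = {vid: math.exp(lv - top) for vid, lv in logs.items()}
    total = sum(weights.values())
    pip = {vid: wt / total for vid, wt in weights.items()}
    selected, cumulative = [], 0.0
    for vid in sorted(pip, key=pip.get, reverse=True):
        selected.append(vid)
        cumulative += pip[vid]
        if cumulative >= coverage:
            break
    return pip, selected
```

In a locus with strong LD, several variants have similar z-scores and share the posterior mass. The resulting credible set is then large, and this is exactly the situation in which experimental assays such as MPRA are needed to separate the candidates.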

The adoption of MPRA has significantly advanced the field by enabling high-throughput validation of thousands of genetic variants for their regulatory activity, thereby fine-mapping causal variants that influence diseases or complex traits [83,84,85]. Recent research has increasingly focused on identifying functional regulatory variants within GWAS loci, using both historical data and novel investigative techniques [11,36,86,87,88,89,90,91,92,93,94,95]. Ryan et al., in a pivotal 2016 study, used a promoter MPRA system to screen approximately 30,000 expression quantitative trait locus (eQTL) variants in B lymphoblastoid cell lines and identified 842 variants with allele-specific expression differences [36]. Enhanced MPRA systems have been widely applied to investigate the regulatory roles of genetic variation across a diverse array of diseases and conditions, including schizophrenia [11,92,95], Alzheimer’s disease [95], obesity [86], melanoma [89], multiple myeloma [88], skin pigmentation disorders [93], Lassa fever [94], and eosinophilic esophagitis [96]. Furthermore, Duan et al. used STARR-seq to assess enhancer activity for nearly 6000 insulin resistance-related GWAS variants in HepG2 liver cancer cells, preadipocytes, and A673 rhabdomyosarcoma cells, demonstrating the practical value of these techniques in functional genomics [91]. Abell et al. used MPRA to investigate the regulatory activity of eQTL and GWAS variants both individually and in linked haplotype combinations, and found that their effects were predominantly additive in most haplotype combinations [87]. These studies underscore the critical role of innovative genomic technologies in advancing our understanding of the genetic determinants of complex traits.
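Identifying variants with allele-specific effects, as in the studies above, reduces to comparing the activity of the reference and alternative alleles across their barcodes. The sketch below uses an exact two-sided permutation test on the difference in mean log2(RNA/DNA). The function name and the test choice are illustrative assumptions, not the statistics of any cited study. Full enumeration is only practical for modest barcode numbers; larger designs would use random permutations or parametric models.

```python
import itertools
import statistics

# Illustrative sketch; exhaustive permutation is feasible only for few barcodes.
def allelic_skew(ref_ratios, alt_ratios):
    """Allelic difference in MPRA activity with an exact permutation p-value.

    ref_ratios / alt_ratios: per-barcode log2(RNA/DNA) values for the
    reference and alternative allele of one variant. Returns the observed
    difference (alt minus ref) and the two-sided p-value obtained by
    enumerating every reassignment of barcodes to the two alleles.
    """
    observed = statistics.mean(alt_ratios) - statistics.mean(ref_ratios)
    pooled = ref_ratios + alt_ratios
    n_alt = len(alt_ratios)
    extreme = total = 0
    for idx in itertools.combinations(range(len(pooled)), n_alt):
        chosen = set(idx)
        alt = [pooled[i] for i in idx]
        ref = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = statistics.mean(alt) - statistics.mean(ref)
        total += 1
        if abs(diff) >= abs(observed) - 1e-12:  # tolerance for float rounding
            extreme += 1
    return observed, extreme / total
```

With four barcodes per allele there are only 70 possible reassignments, so the smallest attainable p-value is 2/70. This illustrates why MPRA designs typically use many barcodes per allele.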

2.8. Limitations of MPRA

Although the MPRA system is a highly effective tool for high-throughput identification of CREs and regulatory variants, it has several limitations that affect its utility in functional genomics. One primary constraint is the inability to ascertain which genes are affected by CREs and regulatory SNPs. This limitation necessitates the integration of MPRA findings with other functional genomic datasets, such as eQTLs and Hi-C, to elucidate the genomic targets of these regulatory elements [97]. Furthermore, the MPRA approach predominantly employs exogenous plasmid constructs to assess regulatory activity and fails to accurately replicate the native chromatin context of the genome. Research has indicated that a fraction of the regulatory elements identified via MPRA are situated within heterochromatin regions, potentially rendering them nonfunctional in their natural genomic environments [40]. Consequently, the application of MPRA to identifying functional CREs and variants requires thorough validation by incorporating additional epigenomic factors, such as chromatin accessibility and histone modifications. Integrative analyses are critical to accurately assessing the functional relevance of regulatory elements in the complex architecture of the genome.

MPRA typically quantifies regulatory activity as the ratio between the RNA-seq coverage and the DNA (input) sequencing coverage of each test fragment. How to handle duplicate reads is a dilemma in RNA-seq experiments, and particularly in MPRA. Retaining duplicates makes activity estimates sensitive to PCR amplification bias, whereas discarding them underestimates the abundance of highly expressed sequences. In our experience, collapsing duplicate reads is preferable for whole-genome screening MPRA, where transcript abundance per potential CRE is low and identical reads are therefore likely to be PCR copies. Conversely, for targeted MPRA of tens of thousands of fragments or fewer, retaining duplicate reads is preferable, because every read derived from a given synthesized fragment has the same sequence, and collapsing would cap its count at one.
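The two strategies can be made explicit as a switch in the counting step. The minimal sketch below assumes reads are already aligned and represented as (fragment, start, end, strand) tuples, and it defines putative PCR duplicates as reads with identical coordinates. The tuple layout and the function name are hypothetical.

```python
from collections import Counter

# Illustrative sketch; read representation and names are hypothetical.
def count_reads(alignments, collapse_duplicates):
    """Count RNA reads per candidate fragment.

    alignments: iterable of (fragment_id, start, end, strand), one per read.
    Reads with identical coordinates are treated as putative PCR duplicates;
    when collapse_duplicates is True, each distinct coordinate counts once.
    """
    if collapse_duplicates:
        alignments = set(alignments)
    return Counter(frag for frag, _start, _end, _strand in alignments)
```

For randomly sheared whole-genome libraries, distinct molecules rarely share exact coordinates, so collapsing mainly removes PCR copies. For a synthesized fragment, every molecule shares the same coordinates, so collapsing reduces its count to one, which is why retaining duplicates is favored for targeted designs.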

A more effective, albeit technically challenging, approach is to add unique molecular identifiers (UMIs) to the library before any PCR step during library construction [17,91,98], for example, by adding UMIs to the plasmid library and to the reverse transcription products before the PCR that attaches sequencing adapters. During data analysis, UMI-based deduplication distinguishes the original DNA and RNA molecules in the library from copies generated by library-construction and sequencing PCR, enabling accurate quantification of regulatory element activity. However, adding UMIs becomes considerably more difficult for highly complex MPRA libraries, and a large amount of sequencing data is required to detect all UMIs comprehensively.
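UMI-based deduplication replaces read counts with molecule counts. The minimal sketch below assumes each read has been parsed into a (barcode, UMI) pair and counts distinct UMIs per barcode. The function name is hypothetical, and the sketch deliberately omits UMI error correction; dedicated tools such as UMI-tools additionally merge UMIs that differ only by sequencing errors.

```python
from collections import defaultdict

# Illustrative sketch; exact-match UMIs only, no sequencing-error correction.
def umi_counts(reads):
    """Count unique molecules per barcode.

    reads: iterable of (barcode, umi) pairs, one per sequenced read. Reads
    sharing both barcode and UMI are assumed to be PCR copies of the same
    original molecule, so only distinct UMIs are counted.
    """
    umis = defaultdict(set)
    for barcode, umi in reads:
        umis[barcode].add(umi)
    return {barcode: len(s) for barcode, s in umis.items()}
```

The sequencing-depth requirement noted above follows directly from this logic: a molecule can be counted only if at least one read carrying its UMI is sequenced.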

3. Using Deep Learning for Predicting CRE Activity and De Novo Design

Since the deep learning renaissance of 2012, deep learning has driven a revolution across many domains, championing data-centric methodologies. Deep learning uses deep neural networks composed of many layers of artificial neurons. Its crucial advantage is that nonlinear feature computations are embedded within the model architecture, enabling the identification of intricate patterns hidden in large datasets [99,100]. High-throughput functional genomics technologies have generated large, high-quality CRE datasets, making the application of deep learning models to CRE analysis both feasible and highly effective.
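Before a DNA sequence can be fed to the neural networks discussed below, it must be converted into a numerical matrix. A common choice for convolutional sequence models is one-hot encoding, in which each position becomes a row of four indicators for A, C, G, and T. The sketch below is a minimal version; the function name is illustrative, and encoding ambiguous bases (N) as all zeros is an assumption.

```python
# Illustrative sketch; treating N (and other ambiguous bases) as all zeros is an assumption.
def one_hot(seq):
    """One-hot encode a DNA sequence as rows of [A, C, G, T] indicators."""
    return [[int(base == b) for b in "ACGT"] for base in seq.upper()]
```

An L-bp sequence thus becomes an L x 4 matrix, the input shape expected by convolutional filters that scan for motifs.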

3.1. Deep Learning Models Are Used to Predict the Activity of CREs and mRNA Expression Levels

Recent breakthroughs have catalyzed the development of deep learning models and considerably advanced genomic analytics. These models predict the protein-binding affinities of DNA and RNA sequences as well as the activity of CREs. The DeepBind model [101] precisely predicts protein-DNA/RNA interactions by decoding nucleotide sequence patterns. A convolutional neural network trained to predict expression levels from promoter and terminator regions revealed that the 3′ UTR fine-tunes mRNA expression, whereas the 5′ UTR has a substantial impact on mRNA expression levels [102]. The Xpresso model uses genomic sequences to estimate mRNA expression levels, enabling quantification of the influence exerted by enhancers, heterochromatic domains, and microRNAs [103]. In parallel, a 5′ UTR language model accurately estimates transcript abundance, revealing insights into gene regulation and mRNA synthesis [15]. These tools are particularly adept at identifying known binding sites and discovering novel interactions, thereby elucidating the intricate regulatory mechanisms of gene expression. In addition, owing to their high predictive accuracy, deep learning models offer crucial insights into the complexities of genetic disorders. Tools such as DeepSEA [104] and Basset [105] predict the impact of genetic variants, such as SNPs and insertions-deletions (InDels), on the activity of enhancers and other CREs. When integrated with human GWAS datasets, such models have helped identify disease-associated causal SNPs [106,107], underscoring their value in unraveling the genetic underpinnings of diseases.
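The core operation in convolutional models such as DeepBind is to slide a filter (a small weight matrix acting as a motif detector) along the encoded sequence, rectify the scores, and pool them. The sketch below illustrates only this operation, not DeepBind itself. The filter is hand-set to match TATA purely for illustration, whereas in a trained model the weights are learned, and all names are hypothetical.

```python
# Illustrative sketch; the TATA filter is hand-set, not learned.
def one_hot(seq):
    """Rows of [A, C, G, T] indicators; ambiguous bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in "ACGT"] for base in seq.upper()]

def conv_max_score(encoded, filt, bias=0.0):
    """Scan one convolutional filter along a one-hot sequence.

    encoded: L x 4 one-hot matrix; filt: w x 4 weight matrix. Each window
    score is the sum of elementwise products plus a bias, passed through a
    ReLU; global max pooling keeps the strongest match in the sequence.
    """
    width = len(filt)
    scores = []
    for start in range(len(encoded) - width + 1):
        s = bias
        for offset in range(width):
            s += sum(x * w for x, w in zip(encoded[start + offset], filt[offset]))
        scores.append(max(0.0, s))  # ReLU
    return max(scores) if scores else 0.0

# Columns are A, C, G, T; one row per motif position (T, A, T, A)
TATA_FILTER = [[0, 0, 0, 1], [1, 0, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
```

With a bias of -3, only a perfect four-base match yields a positive score. This shows how the bias sets the threshold for motif detection.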

3.2. Deep Learning Models Decode Sequence Features of CREs and Accurately Predict Activity

Deep learning architectures, being intricate and nonlinear, are often regarded as opaque “black boxes”. However, uncovering the DNA sequence features these models learn, and how they combine these features to predict CRE activity accurately, broadens our understanding of CRE regulatory syntax. Progress in model interpretability has highlighted the pivotal role of DNA motifs in CRE function, together with their orientation, distribution, and interactions, thereby revealing the regulatory grammar. For example, the DeepSTARR model identified enhancer-associated TF motifs and complex syntax rules, showing that identical TF motifs can differ in function depending on their flanking sequences and spacing [17]. The FactorNet model uses deep neural networks to predict cell type-specific TF binding [108]. DeepSEA [104], Basset [105], and Enformer [109] have systematically decoded the sequence syntax underlying regulatory regions and histone modifications. Cross-species models further demonstrate the conservation of TF-binding preferences, enriching our understanding of fundamental aspects of gene regulation. The DeepMEL model was developed to decode the syntax of enhancers across species, enabling the prediction of enhancers in additional cell types and species [110]. Sethi et al. trained a model on STARR-seq data to decipher enhancer syntax in fruit flies and subsequently applied it to predict enhancers in mammalian species [111]. The DeepArk model uses epigenomic data from four model organisms to examine extensively how DNA sequence influences CRE activity and to predict the regulatory effects of genomic variants [112].
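Interpretation methods include gradient- and attribution-based scores as well as in silico saturation mutagenesis, which predicts the effect of every possible single-base substitution. The sketch below shows the latter, a model-agnostic approach. The `toy_model`, which simply counts GATA motifs, is a hypothetical stand-in for a trained network, and the function names are illustrative.

```python
# Illustrative sketch; toy_model is a hypothetical stand-in for a trained network.
def saturation_mutagenesis(seq, model):
    """Predicted effect of every single-base substitution.

    model: any function mapping a DNA string to a predicted activity.
    Returns one dict per position, mapping each alternative base to the
    change in prediction relative to the original sequence.
    """
    reference = model(seq)
    effects = []
    for i, ref_base in enumerate(seq):
        row = {}
        for alt in "ACGT":
            if alt == ref_base:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            row[alt] = model(mutant) - reference
        effects.append(row)
    return effects

def toy_model(seq):
    """Hypothetical predictor: activity equals the number of GATA motifs."""
    return float(seq.count("GATA"))
```

Positions where most substitutions reduce the prediction mark the motif the model relies on. Applied to a real network, the same procedure can also reveal spacing and flanking-sequence effects.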

3.3. De Novo CRE Design Using Deep Learning Models

Because deep learning models can predict CRE activity directly from DNA sequence in silico, they can make design substantially more efficient than traditional mutagenesis-based approaches. For instance, DeepSEED exemplifies the refinement of sequence context around TF-binding sites, a key aspect of efficient promoter engineering [113]. Meanwhile, the DeepSTARR model from the Stark laboratory and the cell type-specific enhancers designed by the Aerts group demonstrate the transformative power of deep learning in enhancer design [14,17]. Furthermore, optimizing UTRs with deep learning is critical for boosting protein expression and improving the efficacy of mRNA therapies and vaccines [18]. In pioneering work, Zeng et al. identified a UTR combination, named NASAR, consisting of NCA-7d as the 5′ UTR and S27a plus the functional motif R3U as the 3′ UTR, which markedly outperformed traditional UTRs [114]. Tang and colleagues’ Smart5 UTR model, which incorporates N1-methyl-pseudouridine into 5′ UTRs, has significantly improved mRNA vaccine effectiveness against challenging viral variants [115]. De novo design of regulatory elements enables precise tuning of gene expression across a range of activity levels, which is crucial for developing targeted gene therapies.
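The simplest form of model-guided design is a loop that proposes sequence changes and keeps those a predictor scores higher. Published frameworks use more elaborate generative models or evolutionary algorithms. The sketch below shows only a basic greedy hill climb against a hypothetical oracle (`toy_model`, counting GATA motifs), and all names are illustrative.

```python
import random

# Illustrative sketch; toy_model is a hypothetical oracle, not a trained network.
def toy_model(seq):
    """Hypothetical predictor: activity equals the number of GATA motifs."""
    return float(seq.count("GATA"))

def greedy_design(length, model, rounds=200, seed=0):
    """Hill-climbing design: keep single-base mutations that raise the prediction.

    Starts from a random sequence (reproducible via `seed`), proposes one
    random substitution per round, and accepts it only if the model's
    predicted activity strictly increases.
    """
    rng = random.Random(seed)
    seq = "".join(rng.choice("ACGT") for _ in range(length))
    best = model(seq)
    for _ in range(rounds):
        i = rng.randrange(length)
        alt = rng.choice([b for b in "ACGT" if b != seq[i]])
        candidate = seq[:i] + alt + seq[i + 1:]
        score = model(candidate)
        if score > best:
            seq, best = candidate, score
    return seq, best
```

Because only strict improvements are accepted, the predicted activity never decreases. However, the loop can stall on plateaus where no single substitution helps, which is one reason more sophisticated search strategies are used in practice. Designed sequences still require experimental validation, for example by MPRA.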

3.4. Challenges of Integrating Deep Learning with MPRA

Integrating deep learning with MPRA technology can substantially improve the precision and robustness of predictions of CRE activity and gene regulation. MPRA generates extensive datasets that are invaluable for training deep learning models, particularly for predicting TF binding, enhancer activity, and gene expression levels [64,67,74,116,117,118,119,120]. However, such integration presents several challenges. First, to meet the data requirements of deep learning models, MPRA experiments usually require larger libraries. For instance, when the number of test fragments reaches 100,000 to 1 million and library complexity after barcode pairing reaches 10 million to 100 million, the experiment becomes highly challenging [117,119]. Second, the interpretability of deep learning models developed for different MPRA datasets is still being explored [105,121,122]. The high dimensionality of the data increases the risk of overfitting, necessitating regularization, cross-validation, and data augmentation strategies [17,123]. Additionally, training data may lack diversity, so broader MPRA experiments are needed to build more comprehensive datasets. Future research should prioritize the integration of multi-omics data, the development of transfer learning approaches, and advances in model interpretability. Innovations in MPRA technology, such as optimized library design and advanced readout methods, will further improve data quality and quantity [98,124,125]. Addressing these challenges and leveraging the complementary strengths of MPRA and deep learning will advance our understanding of complex genomic regulatory landscapes and their effects on gene expression.
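Two common safeguards against overfitting can be illustrated concretely: reverse-complement data augmentation and evaluation on held-out chromosomes. The latter ensures that training and test sequences do not come from the same genomic neighborhoods. The sketch below is generic practice, not necessarily the procedure of the cited studies. The function names and record layout are hypothetical, and copying labels to reverse complements assumes that the measured activity is strand-independent, which is more reasonable for enhancers than for promoters or UTRs.

```python
# Illustrative sketch; names and record layout are hypothetical.
def reverse_complement(seq):
    """Reverse complement of a DNA string (N maps to N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def augment_with_rc(examples):
    """Double a training set by adding the reverse complement of each sequence.

    examples: list of (sequence, activity). Labels are copied unchanged,
    which assumes strand-independent activity.
    """
    return examples + [(reverse_complement(s), y) for s, y in examples]

def split_by_chromosome(records, test_chroms):
    """Hold out whole chromosomes so train and test share no genomic neighborhood."""
    train = [r for r in records if r["chrom"] not in test_chroms]
    test = [r for r in records if r["chrom"] in test_chroms]
    return train, test
```

Augmentation should be applied after splitting, and only to the training set, so that no test sequence leaks into training as its reverse complement.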

4. Outlook: Broad Applications of CREs across Various Fields

As research into CREs advances, their mechanisms of action are gradually being elucidated. Benefiting from rapid progress in this field, fine-tuning CRE activity to control gene transcription and translation has broad applications across numerous areas (Figure 6).

4.1. Optimizing CREs Is One of the Key Future Research Directions for Human Gene Therapies

Human gene therapy encompasses two primary strategies: (1) introducing DNA, mRNA, or proteins into the body via vectors such as AAV or lipid nanoparticles to replace defective genes and sustain essential biological functions, and (2) using the CRISPR/Cas9 system for precise gene editing to correct pathogenic mutations, or using CRISPR/dCas9 to modulate the expression of disease-related genes. Improving each component of these strategies can substantially increase therapeutic efficacy.

For DNA therapy, optimizing regulatory elements, such as promoters and enhancers, can significantly boost the transcription and expression levels of introduced genes. In mRNA therapy, refining UTR sequences can increase protein translation efficiency, thereby enhancing the therapeutic potential of mRNA-based treatments [18,114,115].
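One concrete sequence feature that UTR optimization typically addresses is the presence of upstream AUG codons (uAUGs) in the 5′ UTR, which can divert scanning ribosomes away from the main start codon. The sketch below flags uAUGs, whether each is in frame with the main ORF, and whether it opens an upstream ORF (uORF) that terminates within the UTR. It is a simplified illustration under stated assumptions: the main start codon is assumed to begin immediately after the UTR, and Kozak context, non-AUG starts, and RNA structure are ignored. The function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class UpstreamAUG:
    position: int            # 0-based offset of the AUG within the 5' UTR
    in_frame: bool           # in frame with the main start codon?
    uorf_end: Optional[int]  # exclusive end of the uORF stop codon, if inside the UTR


def scan_uaugs(utr: str) -> List[UpstreamAUG]:
    """Flag AUG codons in a 5' UTR (DNA or RNA alphabet).

    Assumes the main start codon begins right after the last UTR base, so a
    uAUG at offset i is in frame when (len(utr) - i) is divisible by 3.
    """
    utr = utr.upper().replace("U", "T")
    hits = []
    for i in range(len(utr) - 2):
        if utr[i:i + 3] != "ATG":
            continue
        in_frame = (len(utr) - i) % 3 == 0
        end = None
        # Walk codon by codon from the uAUG looking for a stop within the UTR.
        for j in range(i + 3, len(utr) - 2, 3):
            if utr[j:j + 3] in STOP_CODONS:
                end = j + 3
                break
        hits.append(UpstreamAUG(i, in_frame, end))
    return hits


if __name__ == "__main__":
    for hit in scan_uaugs("GGCAUGAAAUAGCCACC"):
        print(hit)
```

In a design loop, such a filter could, for example, discard candidate UTRs that contain uAUGs before they are scored by a predictive model; whether a given uAUG actually reduces translation depends on context and should be tested experimentally.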

When considering delivery systems such as AAV, the design of tissue- or cell subtype-specific promoters or enhancers can facilitate targeted gene expression, thereby minimizing off-target effects on non-diseased tissues or cells [126,127,128]. Similarly, for gene-editing systems, the optimization of regulatory elements can lead to greater editing precision and efficiency [129,130].
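To make "specificity" concrete, one simple criterion for shortlisting candidate elements, assuming their activity has been measured (e.g., by MPRA) or predicted in the target cell type and in a panel of off-target cell types, is the log2 ratio of target activity to the highest off-target activity. This is an illustrative sketch rather than the criterion used in the cited studies; the cell-type names, activity values, pseudocount, and minimum-activity threshold are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def specificity_score(activity: Dict[str, float], target: str,
                      pseudocount: float = 1e-3) -> float:
    """log2(target activity / highest off-target activity).

    Positive values indicate target-biased activity. Activities are assumed
    non-negative (e.g., RNA/DNA ratios); the pseudocount avoids division by zero.
    """
    off_target = max(v for k, v in activity.items() if k != target)
    return math.log2((activity[target] + pseudocount) / (off_target + pseudocount))


def rank_candidates(candidates: Dict[str, Dict[str, float]], target: str,
                    min_target_activity: float = 1.0) -> List[Tuple[str, float]]:
    """Keep candidates that are sufficiently active in the target cell type,
    then sort them from most to least target-specific."""
    ranked = [
        (name, specificity_score(act, target))
        for name, act in candidates.items()
        if act[target] >= min_target_activity
    ]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical activity values for three candidate enhancers.
    candidates = {
        "enh_A": {"hepatocyte": 8.0, "cardiomyocyte": 1.0, "neuron": 0.5},
        "enh_B": {"hepatocyte": 12.0, "cardiomyocyte": 10.0, "neuron": 2.0},
        "enh_C": {"hepatocyte": 0.5, "cardiomyocyte": 0.1, "neuron": 0.1},
    }
    for name, score in rank_candidates(candidates, "hepatocyte"):
        print(f"{name}: log2 specificity = {score:.2f}")
```

Using the maximum rather than the mean off-target activity is a deliberately conservative choice: a single strongly active off-target cell type is enough to penalize a candidate, which matches the goal of minimizing expression in non-diseased tissues.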

4.2. Optimizing CREs Is Critical for the Development of Agricultural Crops and Livestock Breeds with Superior Quality and Yield

In agriculture, epigenomic studies of livestock [6,131,132,133,134,135] and crops [136,137] have integrated genomic and transcriptomic data to annotate GWAS loci for complex traits, thereby providing essential genetic markers for breeding programs. Breeding efforts are often constrained by trade-offs between complex traits, such as meat production versus intramuscular fat in pigs or milk yield versus protein content in dairy cows. A breakthrough study by Song et al. used CRISPR-Cas9 to create a 54-bp deletion in the promoter of the rice IPA1 gene, which simultaneously increased tiller number and panicle size, thereby overcoming this inherent trade-off [138]. Such results point to the potential of combining marker-assisted selection with gene and CRE editing to improve the yield and quality of crop varieties and livestock breeds.

4.3. In-Depth Research on CREs Lays a Solid Theoretical Groundwork for Customizing Microbial Cell Factories in the Future

In synthetic biology, microbial biomanufacturing has attracted increasing interest for the production of diverse products [139]. Microbial cell factories use microbes as platforms to produce fuels and chemicals, and there is a growing focus on optimizing CREs within bacterial plasmid backbones to enhance gene expression [140] and on increasing protein expression in yeast [141]. Future research could enable the custom design of CREs that respond to temperature, light, or pH, tailoring microbial production to diverse environmental conditions and achieving precise control over industrial output.

4.4. Conclusions and Outlook

The study of noncoding functional CREs is highly active, and a wide range of experimental and analytical methods is continuously being developed. However, the interactions between CREs and trans-regulatory factors, as well as those among different CREs, are highly complex, and current experimental strategies for identifying CREs and their interactions remain immature. This calls for the continued development of new methods, or the integration of multi-omics technologies, to further elucidate the molecular mechanisms and sequence-level regulatory grammar of CREs. Deep learning models for predicting and designing regulatory elements require continued algorithmic development and refinement, as well as more high-quality experimental data to expand training sets and improve prediction accuracy. As research into CREs progresses, we anticipate that their functional annotation and regulatory mechanisms will become clearer. With continued technological advances across disciplines, noncoding CREs are poised for significant breakthroughs in a growing number of areas.

Author Contributions

Data collection and manuscript writing, L.X.; manuscript review, L.X. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (32070595) to Y.L., and the China National Key R&D Program during the 14th Five-Year Plan Period (2021YFF1200503) to Y.L.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the study design, collection, analysis, or interpretation of data; the writing of the manuscript; or the decision to publish the results.

References

1. Santosh, B.; Varshney, A.; Yadava, P.K. Non-coding RNAs: Biological functions and applications. Cell Biochem. Funct. 2015, 33, 14–22.
2. Albert, F.W.; Kruglyak, L. The role of regulatory variation in complex traits and disease. Nat. Rev. Genet. 2015, 16, 197–212.
3. Feingold, E.; Good, P.; Guyer, M.; Kamholz, S.; Liefer, L.; Wetterstrand, K.; Collins, F.; Gingeras, T.; Kampa, D.; Sekinger, E. The ENCODE (ENCyclopedia of DNA elements) project. Science 2004, 306, 636–640.
4. Dunham, I.; Kundaje, A.; Aldred, S.F.; Collins, P.J.; Davis, C.A.; Doyle, F.; Epstein, C.B.; Frietze, S.; Harrow, J.; Kaul, R.; et al. An integrated encyclopedia of DNA elements in the human genome. Nature 2012, 489, 57–74.
5. Abascal, F.; Acosta, R.; Addleman, N.J.; Adrian, J.; Afzal, V.; Ai, R.; Aken, B.; Akiyama, J.A.; Jammal, O.A.; Amrhein, H.; et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 2020, 583, 699–710.
6. Teng, J.; Gao, Y.; Yin, H.; Bai, Z.; Liu, S.; Zeng, H.; Bai, L.; Cai, Z.; Zhao, B.; Li, X.; et al. A compendium of genetic regulatory effects across pig tissues. Nat. Genet. 2024, 56, 112–123.
7. Fu, T.; Amoah, K.; Chan, T.W.; Bahn, J.H.; Lee, J.-H.; Terrazas, S.; Chong, R.; Kosuri, S.; Xiao, X. Massively parallel screen uncovers many rare 3′ UTR variants regulating mRNA abundance of cancer driver genes. Nat. Commun. 2024, 15, 3335.
8. Zhao, S.; Hong, C.K.Y.; Myers, C.A.; Granas, D.M.; White, M.A.; Corbo, J.C.; Cohen, B.A. A single-cell massively parallel reporter assay detects cell-type-specific gene regulation. Nat. Genet. 2023, 55, 346–354.
9. Zhang, M.; Ehmann, M.E.; Matukumalli, S.; Boob, A.G.; Gilbert, D.M.; Zhao, H. SHIELD: A platform for high-throughput screening of barrier-type DNA elements in human cells. Nat. Commun. 2023, 14, 5616.
10. Zhang, D.; Deng, Y.; Kukanja, P.; Agirre, E.; Bartosovic, M.; Dong, M.; Ma, C.; Ma, S.; Su, G.; Bao, S.; et al. Spatial epigenome–transcriptome co-profiling of mammalian tissues. Nature 2023, 616, 113–122.
11. Rummel, C.K.; Gagliardi, M.; Ahmad, R.; Herholt, A.; Jimenez-Barron, L.; Murek, V.; Weigert, L.; Hausruckinger, A.; Maidl, S.; Hauger, B. Massively parallel functional dissection of schizophrenia-associated noncoding genetic variants. Cell 2023, 186, 5165–5182.e33.
12. Xu, J.; Song, F.; Lyu, H.; Kobayashi, M.; Zhang, B.; Zhao, Z.; Hou, Y.; Wang, X.; Luan, Y.; Jia, B.; et al. Subtype-specific 3D genome alteration in acute myeloid leukaemia. Nature 2022, 611, 387–398.
13. Vaishnav, E.D.; de Boer, C.G.; Molinet, J.; Yassour, M.; Fan, L.; Adiconis, X.; Thompson, D.A.; Levin, J.Z.; Cubillos, F.A.; Regev, A. The evolution, evolvability and engineering of gene regulatory DNA. Nature 2022, 603, 455–463.
14. Taskiran, I.I.; Spanier, K.I.; Dickmänken, H.; Kempynck, N.; Pančíková, A.; Ekşi, E.C.; Hulselmans, G.; Ismail, J.N.; Theunis, K.; Vandepoel, R.; et al. Cell-type-directed design of synthetic enhancers. Nature 2024, 626, 212–220.
15. Chu, Y.; Yu, D.; Li, Y.; Huang, K.; Shen, Y.; Cong, L.; Zhang, J.; Wang, M. A 5′ UTR language model for decoding untranslated regions of mRNA and function predictions. Nat. Mach. Intell. 2024, 6, 449–460.
16. Castillo-Hair, S.M.; Fedak, S.; Wang, B.; Linder, J.; Havens, K.; Certo, M.; Seelig, G. Optimizing 5′ UTRs for mRNA-delivered gene editing using deep learning. bioRxiv 2023.
17. de Almeida, B.P.; Reiter, F.; Pagani, M.; Stark, A. DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat. Genet. 2022, 54, 613–624.
18. Castillo-Hair, S.M.; Seelig, G. Machine Learning for Designing Next-Generation mRNA Therapeutics. Acc. Chem. Res. 2022, 55, 24–34.
19. Zaret, K. Micrococcal nuclease analysis of chromatin structure. Curr. Protoc. Mol. Biol. 2005, 45, 21.1.1–21.1.17.
20. Song, L.; Crawford, G.E. DNase-seq: A high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. Cold Spring Harb. Protoc. 2010, 2010, pdb.prot5384.
21. Buenrostro, J.D.; Wu, B.; Chang, H.Y.; Greenleaf, W.J. ATAC-seq: A Method for Assaying Chromatin Accessibility Genome-Wide. Curr. Protoc. Mol. Biol. 2015, 109, 21.29.1–21.29.9.
22. Kundaje, A.; Meuleman, W.; Ernst, J.; Bilenky, M.; Yen, A.; Heravi-Moussavi, A.; Kheradpour, P.; Zhang, Z.; Wang, J.; Ziller, M.J.; et al. Integrative analysis of 111 reference human epigenomes. Nature 2015, 518, 317–330.
23. Park, P.J. ChIP–seq: Advantages and challenges of a maturing technology. Nat. Rev. Genet. 2009, 10, 669–680.
24. Furey, T.S. ChIP–seq and beyond: New and improved methodologies to detect and characterize protein–DNA interactions. Nat. Rev. Genet. 2012, 13, 840–852.
25. Nakato, R.; Sakata, T. Methods for ChIP-seq analysis: A practical workflow and advanced applications. Methods 2021, 187, 44–53.
26. Skene, P.J.; Henikoff, S. An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. Elife 2017, 6, e21856.
27. Kaya-Okur, H.S.; Wu, S.J.; Codomo, C.A.; Pledger, E.S.; Bryson, T.D.; Henikoff, J.G.; Ahmad, K.; Henikoff, S. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat. Commun. 2019, 10, 1930.
28. Kaya-Okur, H.S.; Janssens, D.H.; Henikoff, J.G.; Ahmad, K.; Henikoff, S. Efficient low-cost chromatin profiling with CUT&Tag. Nat. Protoc. 2020, 15, 3264–3283.
29. Pollex, T.; Rabinowitz, A.; Gambetta, M.C.; Marco-Ferreres, R.; Viales, R.R.; Jankowski, A.; Schaub, C.; Furlong, E.E.M. Enhancer–promoter interactions become more instructive in the transition from cell-fate specification to tissue differentiation. Nat. Genet. 2024, 56, 686–696.
30. Grubert, F.; Zaugg, J.B.; Kasowski, M.; Ursu, O.; Spacek, D.V.; Martin, A.R.; Greenside, P.; Srivas, R.; Phanstiel, D.H.; Pekowska, A.; et al. Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions. Cell 2015, 162, 1051–1065.
31. Fulco, C.P.; Nasser, J.; Jones, T.R.; Munson, G.; Bergman, D.T.; Subramanian, V.; Grossman, S.R.; Anyoha, R.; Doughty, B.R.; Patwardhan, T.A.; et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 2019, 51, 1664–1669.
32. Nasser, J.; Bergman, D.T.; Fulco, C.P.; Guckelberger, P.; Doughty, B.R.; Patwardhan, T.A.; Jones, T.R.; Nguyen, T.H.; Ulirsch, J.C.; Lekschas, F.; et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 2021, 593, 238–243.
33. Patwardhan, R.P.; Hiatt, J.B.; Witten, D.M.; Kim, M.J.; Smith, R.P.; May, D.; Lee, C.; Andrie, J.M.; Lee, S.-I.; Cooper, G.M.; et al. Massively parallel functional dissection of mammalian enhancers in vivo. Nat. Biotechnol. 2012, 30, 265–270.
34. Melnikov, A.; Murugan, A.; Zhang, X.; Tesileanu, T.; Wang, L.; Rogov, P.; Feizi, S.; Gnirke, A.; Callan, C.G., Jr.; Kinney, J.B.; et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat. Biotechnol. 2012, 30, 271–277.
35. Kwasnieski, J.C.; Mogno, I.; Myers, C.A.; Corbo, J.C.; Cohen, B.A. Complex effects of nucleotide variants in a mammalian cis-regulatory element. Proc. Natl. Acad. Sci. USA 2012, 109, 19498–19503.
36. Tewhey, R.; Kotliar, D.; Park, D.S.; Liu, B.; Winnicki, S.; Reilly, S.K.; Andersen, K.G.; Mikkelsen, T.S.; Lander, E.S.; Schaffner, S.F.; et al. Direct Identification of Hundreds of Expression-Modulating Variants using a Multiplexed Reporter Assay. Cell 2016, 165, 1519–1529.
37. van Arensbergen, J.; FitzPatrick, V.D.; de Haas, M.; Pagie, L.; Sluimer, J.; Bussemaker, H.J.; van Steensel, B. Genome-wide mapping of autonomous promoter activity in human cells. Nat. Biotechnol. 2017, 35, 145–153.
38. Arnold, C.D.; Gerlach, D.; Stelzer, C.; Boryń, Ł.M.; Rath, M.; Stark, A. Genome-wide quantitative enhancer activity maps identified by STARR-seq. Science 2013, 339, 1074–1077.
39. Arnold, C.D.; Gerlach, D.; Spies, D.; Matts, J.A.; Sytnikova, Y.A.; Pagani, M.; Lau, N.C.; Stark, A. Quantitative genome-wide enhancer activity maps for five Drosophila species show functional enhancer conservation and turnover during cis-regulatory evolution. Nat. Genet. 2014, 46, 685–692.
40. Liu, Y.; Yu, S.; Dhiman, V.K.; Brunetti, T.; Eckart, H.; White, K.P. Functional assessment of human enhancer activities using whole-genome STARR-sequencing. Genome Biol. 2017, 18, 219.
41. Vanhille, L.; Griffon, A.; Maqbool, M.A.; Zacarias-Cabeza, J.; Dao, L.T.; Fernandez, N.; Ballester, B.; Andrau, J.C.; Spicuglia, S. High-throughput and quantitative assessment of enhancer activity in mammals by CapStarr-seq. Nat. Commun. 2015, 6, 6905.
42. Liu, S.; Liu, Y.; Zhang, Q.; Wu, J.; Liang, J.; Yu, S.; Wei, G.-H.; White, K.P.; Wang, X. Systematic identification of regulatory variants associated with cancer risk. Genome Biol. 2017, 18, 194.
43. Wang, X.; He, L.; Goggin, S.M.; Saadat, A.; Wang, L.; Sinnott-Armstrong, N.; Claussnitzer, M.; Kellis, M. High-resolution genome-wide functional dissection of transcriptional regulatory regions and nucleotides in human. Nat. Commun. 2018, 9, 5380.
44. Hansen, T.J.; Hodges, E. ATAC-STARR-seq reveals transcription factor-bound activators and silencers across the chromatin accessible human genome. Genome Res. 2022, 32, 1529–1541.
45. Vockley, C.M.; D’Ippolito, A.M.; McDowell, I.C.; Majoros, W.H.; Safi, A.; Song, L.; Crawford, G.E.; Reddy, T.E. Direct GR Binding Sites Potentiate Clusters of TF Binding across the Human Genome. Cell 2016, 166, 1269–1281.e19.
46. Vockley, C.M.; Guo, C.; Majoros, W.H.; Nodzenski, M.; Scholtens, D.M.; Hayes, M.G.; Lowe, W.L., Jr.; Reddy, T.E. Massively parallel quantification of the regulatory effects of noncoding genetic variation in a human cohort. Genome Res. 2015, 25, 1206–1214.
47. Lea, A.J.; Vockley, C.M.; Johnston, R.A.; Del Carpio, C.A.; Barreiro, L.B.; Reddy, T.E.; Tung, J. Genome-wide quantification of the effects of DNA methylation on human gene regulation. eLife 2018, 7, e37513.
48. Kalita, C.A.; Brown, C.D.; Freiman, A.; Isherwood, J.; Wen, X.; Pique-Regi, R.; Luca, F. High-throughput characterization of genetic effects on DNA-protein binding and gene transcription. Genome Res. 2018, 28, 1701–1708.
49. Zhu, X.; Huang, Q.; Huang, L.; Luo, J.; Li, Q.; Kong, D.; Deng, B.; Gu, Y.; Wang, X.; Li, C.; et al. MAE-seq refines regulatory elements across the genome. Nucleic Acids Res. 2023, 52, e9.
50. Lambert, J.T.; Su-Feher, L.; Cichewicz, K.; Warren, T.L.; Zdilar, I.; Wang, Y.; Lim, K.J.; Haigh, J.L.; Morse, S.J.; Canales, C.P.; et al. Parallel functional testing identifies enhancers active in early postnatal mouse brain. Elife 2021, 10, e69479.
51. Chan, Y.-C.; Kienle, E.; Oti, M.; Di Liddo, A.; Mendez-Lago, M.; Aschauer, D.F.; Peter, M.; Pagani, M.; Arnold, C.; Vonderheit, A.; et al. An unbiased AAV-STARR-seq screen revealing the enhancer activity map of genomic regions in the mouse brain in vivo. Sci. Rep. 2023, 13, 6745.
52. Tabebordbar, M.; Lagerborg, K.A.; Stanton, A.; King, E.M.; Ye, S.; Tellez, L.; Krunnfusz, A.; Tavakoli, S.; Widrick, J.J.; Messemer, K.A.; et al. Directed evolution of a family of AAV capsid variants enabling potent muscle-directed gene delivery across species. Cell 2021, 184, 4919–4938.e22.
53. Mangan, R.J.; Alsina, F.C.; Mosti, F.; Sotelo-Fonseca, J.E.; Snellings, D.A.; Au, E.H.; Carvalho, J.; Sathyan, L.; Johnson, G.D.; Reddy, T.E.; et al. Adaptive sequence divergence forged new neurodevelopmental enhancers in humans. Cell 2022, 185, 4587–4603.e23.
54. Muerdter, F.; Boryń, Ł.M.; Woodfin, A.R.; Neumayr, C.; Rath, M.; Zabidi, M.A.; Pagani, M.; Haberle, V.; Kazmar, T.; Catarino, R.R.; et al. Resolving systematic errors in widely used enhancer activity assays in human cells. Nat. Methods 2018, 15, 141–149.
55. Klein, J.C.; Agarwal, V.; Inoue, F.; Keith, A.; Martin, B.; Kircher, M.; Ahituv, N.; Shendure, J. A systematic evaluation of the design and context dependencies of massively parallel reporter assays. Nat. Methods 2020, 17, 1083–1091.
56. Doni Jayavelu, N.; Jajodia, A.; Mishra, A.; Hawkins, R.D. Candidate silencer elements for the human and mouse genomes. Nat. Commun. 2020, 11, 1061.
57. Hussain, S.; Sadouni, N.; van Essen, D.; Dao, L.T.M.; Ferré, Q.; Charbonnier, G.; Torres, M.; Gallardo, F.; Lecellier, C.H.; Sexton, T.; et al. Short tandem repeats are important contributors to silencer elements in T cells. Nucleic Acids Res. 2023, 51, 4845–4866.
58. Mouri, K.; Dewey, H.B.; Castro, R.; Berenzy, D.; Kales, S.; Tewhey, R. Whole-genome functional characterization of RE1 silencers using a modified massively parallel reporter assay. Cell Genom. 2023, 3, 100234.
59. Kim, S.; Yu, N.-K.; Kaang, B.-K. CTCF as a multifunctional protein in genome regulation and gene expression. Exp. Mol. Med. 2015, 47, e166.
60. Jia, Z.; Li, J.; Ge, X.; Wu, Y.; Guo, Y.; Wu, Q. Tandem CTCF sites function as insulators to balance spatial chromatin contacts and topological enhancer-promoter selection. Genome Biol. 2020, 21, 75.
61. Song, Y.; Liang, Z.; Zhang, J.; Hu, G.; Wang, J.; Li, Y.; Guo, R.; Dong, X.; Babarinde, I.A.; Ping, W.; et al. CTCF functions as an insulator for somatic genes and a chromatin remodeler for pluripotency genes during reprogramming. Cell Rep. 2022, 39, 110626.
62. Hong, C.K.; Erickson, A.A.; Li, J.; Federico, A.J.; Cohen, B.A. Massively parallel characterization of insulator activity across the genome. bioRxiv 2022.
63. Sample, P.J.; Wang, B.; Reid, D.W.; Presnyak, V.; McFadyen, I.J.; Morris, D.R.; Seelig, G. Human 5′ UTR design and variant effect prediction from a massively parallel translation assay. Nat. Biotechnol. 2019, 37, 803–809.
64. Jia, L.; Mao, Y.; Ji, Q.; Dersh, D.; Yewdell, J.W.; Qian, S.-B. Decoding mRNA translatability and stability from the 5′ UTR. Nat. Struct. Mol. Biol. 2020, 27, 814–821.
65. Plassmeyer, S.P.; Florian, C.P.; Kasper, M.J.; Chase, R.; Mueller, S.; Liu, Y.; White, K.M.; Jungers, C.F.; Djuranovic, S.P.; Djuranovic, S.; et al. A Massively Parallel Screen of 5′ UTR Mutations Identifies Variants Impacting Translation and Protein Production in Neurodevelopmental Disorder Genes. medRxiv 2023.
66. Reimão-Pinto, M.M.; Castillo-Hair, S.M.; Seelig, G.; Schier, A.F. The regulatory landscape of 5′ UTRs in translational control during zebrafish embryogenesis. bioRxiv 2023.
67. Cao, J.; Novoa, E.M.; Zhang, Z.; Chen, W.C.W.; Liu, D.; Choi, G.C.G.; Wong, A.S.L.; Wehrspaun, C.; Kellis, M.; Lu, T.K. High-throughput 5′ UTR engineering for enhanced protein production in non-viral gene therapies. Nat. Commun. 2021, 12, 4138.
68. Lim, Y.; Arora, S.; Schuster, S.L.; Corey, L.; Fitzgibbon, M.; Wladyka, C.L.; Wu, X.; Coleman, I.M.; Delrow, J.J.; Corey, E.; et al. Multiplexed functional genomic analysis of 5′ untranslated region mutations across the spectrum of prostate cancer. Nat. Commun. 2021, 12, 4217.
69. Oikonomou, P.; Goodarzi, H.; Tavazoie, S. Systematic identification of regulatory elements in conserved 3′ UTRs of human transcripts. Cell Rep. 2014, 7, 281–292.
70. Griesemer, D.; Xue, J.R.; Reilly, S.K.; Ulirsch, J.C.; Kukreja, K.; Davis, J.R.; Kanai, M.; Yang, D.K.; Butts, J.C.; Guney, M.H.; et al. Genome-wide functional screen of 3′ UTR variants uncovers causal variants for human disease and evolution. Cell 2021, 184, 5247–5260.e19.
71. Litterman, A.J.; Kageyama, R.; Le Tonqueze, O.; Zhao, W.; Gagnon, J.D.; Goodarzi, H.; Erle, D.J.; Ansel, K.M. A massively parallel 3′ UTR reporter assay reveals relationships between nucleotide content, sequence conservation, and mRNA destabilization. Genome Res. 2019, 29, 896–906.
72. Siegel, D.A.; Le Tonqueze, O.; Biton, A.; Zaitlen, N.; Erle, D.J. Massively parallel analysis of human 3′ UTRs reveals that AU-rich element length and registration predict mRNA destabilization. G3 2022, 12, jkab404.
73. Zhao, W.; Pollack, J.L.; Blagev, D.P.; Zaitlen, N.; McManus, M.T.; Erle, D.J. Massively parallel functional annotation of 3′ untranslated regions. Nat. Biotechnol. 2014, 32, 387–391.
74. Rabani, M.; Pieper, L.; Chew, G.L.; Schier, A.F. A Massively Parallel Reporter Assay of 3′ UTR Sequences Identifies In Vivo Rules for mRNA Degradation. Mol. Cell 2017, 68, 1083–1094.e5.
75. Schuster, S.L.; Arora, S.; Wladyka, C.L.; Itagi, P.; Corey, L.; Young, D.; Stackhouse, B.L.; Kollath, L.; Wu, Q.V.; Corey, E.; et al. Multi-level functional genomics reveals molecular and cellular oncogenicity of patient-based 3′ untranslated region mutations. Cell Rep. 2023, 42, 112840.
76. Hormozdiari, F.; Kostem, E.; Kang, E.Y.; Pasaniuc, B.; Eskin, E. Identifying causal variants at loci with multiple signals of association. Genetics 2014, 198, 497–508.
77. Benner, C.; Spencer, C.C.; Havulinna, A.S.; Salomaa, V.; Ripatti, S.; Pirinen, M. FINEMAP: Efficient variable selection using summary data from genome-wide association studies. Bioinformatics 2016, 32, 1493–1501.
78. Pasaniuc, B.; Price, A.L. Dissecting the genetics of complex traits using summary association statistics. Nat. Rev. Genet. 2017, 18, 117–127.
79. Schaid, D.J.; Chen, W.; Larson, N.B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 2018, 19, 491–504.
80. Zhang, S.; Zhang, H.; Zhou, Y.; Qiao, M.; Zhao, S.; Kozlova, A.; Shi, J.; Sanders, A.R.; Wang, G.; Luo, K.; et al. Allele-specific open chromatin in human iPSC neurons elucidates functional disease variants. Science 2020, 369, 561–565.
81. Liang, D.; Elwell, A.L.; Aygün, N.; Krupa, O.; Wolter, J.M.; Kyere, F.A.; Lafferty, M.J.; Cheek, K.E.; Courtney, K.P.; Yusupova, M.; et al. Cell-type-specific effects of genetic variation on chromatin accessibility during human neuronal differentiation. Nat. Neurosci. 2021, 24, 941–953.
82. Vale-Silva, L.A.; Markowitz, T.E.; Hochwagen, A. SNP-ChIP: A versatile and tag-free method to quantify changes in protein binding across the genome. BMC Genom. 2019, 20, 54.
83. Hua, H. From GWAS to single-cell MPRA. Nat. Methods 2023, 20, 349.
84. McAfee, J.C.; Bell, J.L.; Krupa, O.; Matoba, N.; Stein, J.L.; Won, H. Focus on your locus with a massively parallel reporter assay. J. Neurodev. Disord. 2022, 14, 50.
85. Fabo, T.; Khavari, P. Functional characterization of human genomic variation linked to polygenic diseases. Trends Genet. 2023, 39, 462–490.
86. Joslin, A.C.; Sobreira, D.R.; Hansen, G.T.; Sakabe, N.J.; Aneas, I.; Montefiori, L.E.; Farris, K.M.; Gu, J.; Lehman, D.M.; Ober, C.; et al. A functional genomics pipeline identifies pleiotropy and cross-tissue effects within obesity-associated GWAS loci. Nat. Commun. 2021, 12, 5253.
87. Abell, N.S.; DeGorter, M.K.; Gloudemans, M.J.; Greenwald, E.; Smith, K.S.; He, Z.; Montgomery, S.B. Multiple causal variants underlie genetic associations in humans. Science 2022, 375, 1247–1254.
88. Ajore, R.; Niroula, A.; Pertesi, M.; Cafaro, C.; Thodberg, M.; Went, M.; Bao, E.L.; Duran-Lozano, L.; de Lapuente Portilla, A.L.; Olafsdottir, T.; et al. Functional dissection of inherited non-coding variation influencing multiple myeloma risk. Nat. Commun. 2022, 13, 151.
89. Long, E.; Yin, J.; Funderburk, K.M.; Xu, M.; Feng, J.; Kane, A.; Zhang, T.; Myers, T.; Golden, A.; Thakur, R.; et al. Massively parallel reporter assays and variant scoring identified functional variants and target genes for melanoma loci and highlighted cell-type specificity. Am. J. Hum. Genet. 2022, 109, 2210–2229.
90. Cui, Y.; Arnold, F.J.; Peng, F.; Wang, D.; Li, J.S.; Michels, S.; Wagner, E.J.; La Spada, A.R.; Li, W. Alternative polyadenylation transcriptome-wide association study identifies APA-linked susceptibility genes in brain disorders. Nat. Commun. 2023, 14, 583.
91. Duan, Y.Y.; Chen, X.F.; Zhu, R.J.; Jia, Y.Y.; Huang, X.T.; Zhang, M.; Yang, N.; Dong, S.S.; Zeng, M.; Feng, Z.; et al. High-throughput functional dissection of noncoding SNPs with biased allelic enhancer activity for insulin resistance-relevant phenotypes. Am. J. Hum. Genet. 2023, 110, 1266–1288.
92. McAfee, J.C.; Lee, S.; Lee, J.; Bell, J.L.; Krupa, O.; Davis, J.; Insigne, K.; Bond, M.L.; Zhao, N.; Boyle, A.P.; et al. Systematic investigation of allelic regulatory activity of schizophrenia-associated common variants. Cell Genom. 2023, 3, 100404.
93. Feng, Y.; Xie, N.; Inoue, F.; Fan, S.; Saskin, J.; Zhang, C.; Zhang, F.; Hansen, M.E.B.; Nyambo, T.; Mpoloka, S.W.; et al. Integrative functional genomic analyses identify genetic variants influencing skin pigmentation in Africans. Nat. Genet. 2024, 56, 258–272.
94. Kotliar, D.; Raju, S.; Tabrizi, S.; Odia, I.; Goba, A.; Momoh, M.; Sandi, J.D.; Nair, P.; Phelan, E.; Tariyal, R.; et al. Genome-wide association study identifies human genetic variants associated with fatal outcome from Lassa fever. Nat. Microbiol. 2024, 9, 751–762.
95. Myint, L.; Wang, R.; Boukas, L.; Hansen, K.D.; Goff, L.A.; Avramopoulos, D. A screen of 1,049 schizophrenia and 30 Alzheimer’s-associated variants for regulatory potential. Am. J. Med. Genet. B Neuropsychiatr. Genet. 2020, 183, 61–73.
96. Shook, M.S.; Lu, X.; Chen, X.; Parameswaran, S.; Edsall, L.; Trimarchi, M.P.; Ernst, K.; Granitto, M.; Forney, C.; Donmez, O.A.; et al. Systematic identification of genotype-dependent enhancer variants in eosinophilic esophagitis. Am. J. Hum. Genet. 2024, 111, 280–294.
97. Pratt, B.M.; Won, H. Advances in profiling chromatin architecture shed light on the regulatory dynamics underlying brain disorders. Semin. Cell Dev. Biol. 2022, 121, 153–160.
98. Neumayr, C.; Pagani, M.; Stark, A.; Arnold, C.D. STARR-seq and UMI-STARR-seq: Assessing Enhancer Activities for Genome-Wide-, High-, and Low-Complexity Candidate Libraries. Curr. Protoc. Mol. Biol. 2019, 128, e105.
99. Eraslan, G.; Avsec, Ž.; Gagneur, J.; Theis, F.J. Deep learning: New computational modelling techniques for genomics. Nat. Rev. Genet. 2019, 20, 389–403.
100. Zou, J.; Huss, M.; Abid, A.; Mohammadi, P.; Torkamani, A.; Telenti, A. A primer on deep learning in genomics. Nat. Genet. 2019, 51, 12–18.
101. Alipanahi, B.; Delong, A.; Weirauch, M.T.; Frey, B.J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 2015, 33, 831–838.
102. Washburn, J.D.; Mejia-Guerra, M.K.; Ramstein, G.; Kremling, K.A.; Valluru, R.; Buckler, E.S.; Wang, H. Evolutionarily informed deep learning methods for predicting relative transcript abundance from DNA sequence. Proc. Natl. Acad. Sci. USA 2019, 116, 5542–5549.
103. Agarwal, V.; Shendure, J. Predicting mRNA Abundance Directly from Genomic Sequence Using Deep Convolutional Neural Networks. Cell Rep. 2020, 31, 107663.
104. Zhou, J.; Troyanskaya, O.G. Predicting effects of noncoding variants with deep learning–based sequence model. Nat. Methods 2015, 12, 931–934.
105. Kelley, D.R.; Snoek, J.; Rinn, J.L. Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 2016, 26, 990–999.
106. Zhou, J.; Theesfeld, C.L.; Yao, K.; Chen, K.M.; Wong, A.K.; Troyanskaya, O.G. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 2018, 50, 1171–1179.
107. Wang, X.; Goldstein, D.B. Enhancer Domains Predict Gene Pathogenicity and Inform Gene Discovery in Complex Disease. Am. J. Hum. Genet. 2020, 106, 215–233.
108. Quang, D.; Xie, X. FactorNet: A deep learning framework for predicting cell type specific transcription factor binding from nucleotide-resolution sequential data. Methods 2019, 166, 40–47.
109. Avsec, Ž.; Agarwal, V.; Visentin, D.; Ledsam, J.R.; Grabska-Barwinska, A.; Taylor, K.R.; Assael, Y.; Jumper, J.; Kohli, P.; Kelley, D.R. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 2021, 18, 1196–1203.
110. Minnoye, L.; Taskiran, I.I.; Mauduit, D.; Fazio, M.; Van Aerschot, L.; Hulselmans, G.; Christiaens, V.; Makhzami, S.; Seltenhammer, M.; Karras, P. Cross-species analysis of enhancer logic using deep learning. Genome Res. 2020, 30, 1815–1834.
111. Sethi, A.; Gu, M.; Gumusgoz, E.; Chan, L.; Yan, K.-K.; Rozowsky, J.; Barozzi, I.; Afzal, V.; Akiyama, J.A.; Plajzer-Frick, I.; et al. Supervised enhancer prediction with epigenetic pattern recognition and targeted validation. Nat. Methods 2020, 17, 807–814.
112. Cofer, E.M.; Raimundo, J.; Tadych, A.; Yamazaki, Y.; Wong, A.K.; Theesfeld, C.L.; Levine, M.S.; Troyanskaya, O.G. Modeling transcriptional regulation of model species with deep learning. Genome Res. 2021, 31, 1097–1105.
113. Zhang, P.; Wang, H.; Xu, H.; Wei, L.; Liu, L.; Hu, Z.; Wang, X. Deep flanking sequence engineering for efficient promoter design using DeepSEED. Nat. Commun. 2023, 14, 6309.
114. Zeng, C.; Hou, X.; Yan, J.; Zhang, C.; Li, W.; Zhao, W.; Du, S.; Dong, Y. Leveraging mRNA Sequences and Nanoparticles to Deliver SARS-CoV-2 Antigens In Vivo. Adv. Mater. 2020, 32, e2004452.
115. Tang, X.; Huo, M.; Chen, Y.; Huang, H.; Qin, S.; Luo, J.; Qin, Z.; Jiang, X.; Liu, Y.; Duan, X.; et al. A novel deep generative model for mRNA vaccine development: Designing 5′ UTRs with N1-methyl-pseudouridine modification. Acta Pharm. Sin. B 2024, 14, 1814–1826.
116. Deng, C.; Whalen, S.; Steyert, M.; Ziffra, R.; Przytycki, P.F.; Inoue, F.; Pereira, D.A.; Capauto, D.; Norton, S.; Vaccarino, F.M.; et al. Massively parallel characterization of regulatory elements in the developing human cortex. Science 2024, 384, eadh0559.
117. Sahu, B.; Hartonen, T.; Pihlajamaa, P.; Wei, B.; Dave, K.; Zhu, F.; Kaasinen, E.; Lidschreiber, K.; Lidschreiber, M.; Daub, C.O.; et al. Sequence determinants of human gene regulatory elements. Nat. Genet. 2022, 54, 283–294.
118. Cuperus, J.T.; Groves, B.; Kuchina, A.; Rosenberg, A.B.; Jojic, N.; Fields, S.; Seelig, G. Deep learning of the regulatory grammar of yeast 5′ untranslated regions from 500,000 random sequences. Genome Res. 2017, 27, 2015–2024.
de Boer, C.G.; Vaishnav, E.D.; Sadeh, R.; Abeyta, E.L.; Friedman, N.; Regev, A. Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat. Biotechnol. 2020, 38, 56–65. [Google Scholar] [CrossRef]
Kreimer, A.; Ashuach, T.; Inoue, F.; Khodaverdian, A.; Deng, C.; Yosef, N.; Ahituv, N. Massively parallel reporter perturbation assays uncover temporal regulatory architecture during neural differentiation. Nat. Commun. 2022, 13, 1504. [Google Scholar] [CrossRef]
Talukder, A.; Barham, C.; Li, X.; Hu, H. Interpretation of deep learning in genomics and epigenomics. Brief. Bioinform. 2020, 22, bbaa177. [Google Scholar] [CrossRef] [PubMed]
Lanchantin, J.; Singh, R.; Lin, Z.; Qi, Y. Deep Motif: Visualizing Genomic Sequence Classifications. arXiv 2016, arXiv:1605.01133. [Google Scholar]
Cao, Z.; Zhang, S. Simple tricks of convolutional neural network architectures improve DNA–protein binding prediction. Bioinformatics 2018, 35, 1837–1843. [Google Scholar] [CrossRef] [PubMed]
Das, M.; Hossain, A.; Banerjee, D.; Praul, C.A.; Girirajan, S. Challenges and considerations for reproducibility of STARR-seq assays. Genome Res. 2023, 33, 479–495. [Google Scholar] [CrossRef]
Gallego Romero, I.; Lea, A.J. Leveraging massively parallel reporter assays for evolutionary questions. Genome Biol. 2023, 24, 26. [Google Scholar] [CrossRef] [PubMed]
Asokan, A.; Shen, S. Redirecting AAV vectors to extrahepatic tissues. Mol. Ther. 2023, 31, 3371–3375. [Google Scholar] [CrossRef] [PubMed]
Lau, C.H.; Suh, Y. In vivo genome editing in animals using AAV-CRISPR system: Applications to translational research of human disease. F1000Research 2017, 6, 2153. [Google Scholar] [CrossRef]
Jüttner, J.; Szabo, A.; Gross-Scherf, B.; Morikawa, R.K.; Rompani, S.B.; Hantz, P.; Szikra, T.; Esposti, F.; Cowan, C.S.; Bharioke, A.; et al. Targeting neuronal and glial cell types with synthetic promoter AAVs in mice, non-human primates and humans. Nat. Neurosci. 2019, 22, 1345–1356. [Google Scholar] [CrossRef] [PubMed]
Heidersbach, A.J.; Dorighi, K.M.; Gomez, J.A.; Jacobi, A.M.; Haley, B. A versatile, high-efficiency platform for CRISPR-based gene activation. Nat. Commun. 2023, 14, 902. [Google Scholar] [CrossRef]
Zhou, J.; Liu, G.; Zhao, Y.; Zhang, R.; Tang, X.; Li, L.; Jia, X.; Guo, Y.; Wu, Y.; Han, Y.; et al. An efficient CRISPR–Cas12a promoter editing system for crop improvement. Nat. Plants 2023, 9, 588–604. [Google Scholar] [CrossRef]
Pan, Z.; Wang, Y.; Wang, M.; Wang, Y.; Zhu, X.; Gu, S.; Zhong, C.; An, L.; Shan, M.; Damas, J.; et al. An atlas of regulatory elements in chicken: A resource for chicken genetics and genomics. Sci. Adv. 2023, 9, eade1204. [Google Scholar] [CrossRef] [PubMed]
Kern, C.; Wang, Y.; Xu, X.; Pan, Z.; Halstead, M.; Chanthavixay, G.; Saelao, P.; Waters, S.; Xiang, R.; Chamberlain, A.; et al. Functional annotations of three domestic animal genomes provide vital resources for comparative and agricultural research. Nat. Commun. 2021, 12, 1821. [Google Scholar] [CrossRef] [PubMed]
Pan, Z.; Yao, Y.; Yin, H.; Cai, Z.; Wang, Y.; Bai, L.; Kern, C.; Halstead, M.; Chanthavixay, G.; Trakooljul, N.; et al. Pig genome functional annotation enhances the biological interpretation of complex traits and human disease. Nat. Commun. 2021, 12, 5848. [Google Scholar] [CrossRef] [PubMed]
Zhao, Y.; Hou, Y.; Xu, Y.; Luan, Y.; Zhou, H.; Qi, X.; Hu, M.; Wang, D.; Wang, Z.; Fu, Y.; et al. A compendium and comparative epigenomics analysis of cis-regulatory elements in the pig genome. Nat. Commun. 2021, 12, 2217. [Google Scholar] [CrossRef] [PubMed]
Liu, S.; Gao, Y.; Canela-Xandri, O.; Wang, S.; Yu, Y.; Cai, W.; Li, B.; Xiang, R.; Chamberlain, A.J.; Pairo-Castineira, E.; et al. A multi-tissue atlas of regulatory variants in cattle. Nat. Genet. 2022, 54, 1438–1447. [Google Scholar] [CrossRef] [PubMed]
Zhao, L.; Xie, L.; Zhang, Q.; Ouyang, W.; Deng, L.; Guan, P.; Ma, M.; Li, Y.; Zhang, Y.; Xiao, Q.; et al. Integrative analysis of reference epigenomes in 20 rice varieties. Nat. Commun. 2020, 11, 2658. [Google Scholar] [CrossRef] [PubMed]
Marand, A.P.; Chen, Z.; Gallavotti, A.; Schmitz, R.J. A cis-regulatory atlas in maize at single-cell resolution. Cell 2021, 184, 3041–3055.e3021. [Google Scholar] [CrossRef] [PubMed]
Song, X.; Meng, X.; Guo, H.; Cheng, Q.; Jing, Y.; Chen, M.; Liu, G.; Wang, B.; Wang, Y.; Li, J.; et al. Targeting a gene regulatory element enhances rice grain yield by decoupling panicle number and size. Nat. Biotechnol. 2022, 40, 1403–1411. [Google Scholar] [CrossRef] [PubMed]
Ye, J.-W.; Lin, Y.-N.; Yi, X.-Q.; Yu, Z.-X.; Liu, X.; Chen, G.-Q. Synthetic biology of extremophiles: A new wave of biomanufacturing. Trends Biotechnol. 2023, 41, 342–357. [Google Scholar] [CrossRef]
Schmitt, A.D.; Hu, M.; Ren, B. Genome-wide mapping and analysis of chromosome architecture. Nat. Rev. Mol. Cell Biol. 2016, 17, 743–755. [Google Scholar] [CrossRef]
Hong, S.P.; Seip, J.; Walters-Pollak, D.; Rupert, R.; Jackson, R.; Xue, Z.; Zhu, Q. Engineering Yarrowia lipolytica to express secretory invertase with strong FBA1IN promoter. Yeast 2012, 29, 59–72. [Google Scholar] [CrossRef] [PubMed]

Figure 1. The MPRA system is used for high-throughput identification of promoters, enhancers, and silencers. First, a plasmid library pairing test fragments with tags (barcodes) is constructed, and either GFP alone or a promoter followed by GFP is inserted. This GFP plasmid library is then transfected into cells, and cellular RNA is collected to build an RNA-seq library. Using the fragment-tag pairing information, the activity of each test fragment is determined from the ratio of its tag counts in the RNA-seq library to those in the GFP plasmid library. The three assays differ only in the inserted reporter cassette: GFP alone for promoter testing, a minimal promoter (miniP) plus GFP for enhancers, and a strong promoter such as PGK plus GFP for silencers.
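The tag-count comparison described for Figure 1 can be illustrated with a short Python sketch. It is not the authors' analysis pipeline: the function name `mpra_activity`, pooling all tags paired with one fragment, normalizing by library size, and the pseudocount are illustrative assumptions. Dedicated MPRA analysis tools additionally model replicate variability and per-tag noise.

```python
import math
from collections import defaultdict


def mpra_activity(tag_to_element, dna_counts, rna_counts, pseudocount=1.0):
    """Estimate per-fragment activity as log2(RNA fraction / DNA fraction).

    Hypothetical helper; the pooling and pseudocount choices are assumptions.
    tag_to_element: tag (barcode) -> test-fragment ID, from sequencing the plasmid library
    dna_counts:     tag -> read count in the GFP plasmid (DNA) library
    rna_counts:     tag -> read count in the RNA-seq library
    """
    dna_total = sum(dna_counts.values())
    rna_total = sum(rna_counts.values())

    # Pool the reads of all tags paired with the same test fragment.
    dna_sum = defaultdict(float)
    rna_sum = defaultdict(float)
    for tag, element in tag_to_element.items():
        dna_sum[element] += dna_counts.get(tag, 0)
        rna_sum[element] += rna_counts.get(tag, 0)

    activity = {}
    for element in dna_sum:
        # Divide by library size so RNA and DNA sequencing depths are comparable;
        # the pseudocount avoids log(0) for fragments with no RNA reads.
        rna_frac = (rna_sum[element] + pseudocount) / rna_total
        dna_frac = (dna_sum[element] + pseudocount) / dna_total
        activity[element] = math.log2(rna_frac / dna_frac)
    return activity
```

For example, with tags a and b paired to fragment E1 and tag c paired to E2, DNA counts {a: 25, b: 25, c: 150} and RNA counts {a: 100, b: 100, c: 200}, calling `mpra_activity` with `pseudocount=0` gives E1 an activity of 1.0 (twice as abundant in RNA as in DNA) and E2 a negative value.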

Figure 2. The STARR-seq system is used for high-throughput identification of enhancers and silencers. Test fragments are inserted between the GFP gene and the poly(A) signal in a plasmid backbone; the plasmid library is sequenced and transfected into cells. An RNA-seq library is then constructed, and enhancer or silencer activity is quantified as the abundance of each transcribed insert relative to its abundance in the plasmid library. The two assays differ mainly in the promoter used: enhancer screens use a weak promoter such as miniP or ORI, whereas silencer screens use a strong promoter such as PGK.
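In STARR-seq the test fragment is itself part of the transcript, so no separate tag pairing is needed, and the promoter choice in Figure 2 decides which direction of change is of interest. The sketch below is a hypothetical illustration, not a published caller: the function name `starr_calls`, using the median insert as the inactive baseline, and the twofold (log2 = 1) threshold are assumptions. Real analyses rely on replicates and statistical testing.

```python
import math
import statistics


def starr_calls(plasmid_counts, rna_counts, assay, min_log2fc=1.0, pseudocount=1.0):
    """Flag candidate enhancers or silencers from STARR-seq insert counts.

    Hypothetical helper; the median baseline and threshold are assumptions.
    assay: "enhancer" (weak promoter, e.g. miniP) or "silencer" (strong promoter, e.g. PGK)
    """
    if assay not in ("enhancer", "silencer"):
        raise ValueError("assay must be 'enhancer' or 'silencer'")
    dna_total = sum(plasmid_counts.values())
    rna_total = sum(rna_counts.values())

    # Each insert is its own readout: library-size-normalized RNA over plasmid abundance.
    log_ratio = {
        insert: math.log2(((rna_counts.get(insert, 0) + pseudocount) / rna_total)
                          / ((dna + pseudocount) / dna_total))
        for insert, dna in plasmid_counts.items()
    }
    # Assumed baseline: the median insert, i.e. most fragments are treated as inactive.
    baseline = statistics.median(log_ratio.values())

    calls = {}
    for insert, value in log_ratio.items():
        delta = value - baseline
        if assay == "enhancer" and delta >= min_log2fc:
            calls[insert] = "enhancer"  # raises transcription from the weak promoter
        elif assay == "silencer" and delta <= -min_log2fc:
            calls[insert] = "silencer"  # lowers transcription from the strong promoter
    return calls
```

Only increases are reported in the enhancer configuration and only decreases in the silencer configuration, mirroring the promoter logic in the caption.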

Figure 3. High-throughput identification of insulators using the MPRA system. This system assesses how well test sequences shield a reporter gene from the repressive influence of nearby heterochromatin. In the approach developed by Zhang et al., a serine integrase integrates test sequences next to heterochromatin, and the shielding effect of each test fragment is measured through the expression of a reporter gene such as GFP [9].

Figure 4. High-throughput identification of 5′ UTRs using the MPRA system. This strategy combines MPRA with in vitro transcription and ribosome fraction analysis to determine how 5′ UTRs affect protein translation.
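Figure 4 names ribosome component analysis as the translational readout. If that readout is polysome fractionation (an assumption here, since the figure itself is not reproduced), a common per-UTR summary is the mean ribosome load: the read-weighted average number of ribosomes across gradient fractions. The function name `mean_ribosome_load` and the ribosome number assigned to each fraction are hypothetical inputs chosen for illustration.

```python
def mean_ribosome_load(fraction_counts, ribosomes_per_fraction):
    """Mean ribosome load (MRL) of one 5' UTR from polysome-fraction read counts.

    Hypothetical helper; assumes polysome fractionation as the ribosome readout.
    fraction_counts:        depth-normalized reads for this UTR in each gradient fraction
    ribosomes_per_fraction: ribosome number assigned to each fraction
                            (e.g. 0 for free mRNA, 1 for monosomes, 2 for disomes, ...)
    """
    if len(fraction_counts) != len(ribosomes_per_fraction):
        raise ValueError("one ribosome number is needed per fraction")
    total = sum(fraction_counts)
    if total == 0:
        raise ValueError("UTR has no reads in any fraction")
    # Weighted average number of ribosomes bound per transcript.
    weighted = sum(c * r for c, r in zip(fraction_counts, ribosomes_per_fraction))
    return weighted / total
```

For instance, a UTR with reads split 10/30/40/20 across fractions carrying 0, 1, 2, and 3 ribosomes has an MRL of 1.7; UTRs that recruit ribosomes more efficiently shift their reads toward heavier fractions and score higher.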

Figure 5. High-throughput identification of 3′ UTRs using the MPRA system. MPRA is combined with ribosome fraction analysis to study how 3′ UTRs affect protein translation, and in vitro transcribed mRNA collected at different time points is used to study how they affect mRNA stability. Asterisks indicate the time points at which mRNA was collected.
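The time-course design in Figure 5 lends itself to a per-UTR half-life estimate. Assuming first-order decay, N(t) = N0·exp(−kt), the logarithm of abundance falls linearly with time: the least-squares slope gives the decay constant k, and the half-life is ln 2 / k. The function name `mrna_half_life` and returning infinity for transcripts that show no decay are assumptions made for this sketch.

```python
import math


def mrna_half_life(times, abundances):
    """Estimate mRNA half-life from abundance measured at several time points.

    Hypothetical helper; assumes first-order decay N(t) = N0 * exp(-k t),
    so ln N(t) is linear in t with slope -k.
    """
    if len(times) != len(abundances) or len(times) < 2:
        raise ValueError("need at least two paired time points")
    if any(a <= 0 for a in abundances):
        raise ValueError("abundances must be positive to take logarithms")
    logs = [math.log(a) for a in abundances]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    # Ordinary least-squares slope of ln(abundance) versus time.
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    k = -sxy / sxx  # decay rate constant
    if k <= 0:
        return math.inf  # assumption: no detectable decay over the sampled window
    return math.log(2) / k
```

A transcript measured at 800, 400, 200, and 100 units at 0, 2, 4, and 6 h halves every 2 h, so its estimated half-life is 2 h; comparing such estimates across constructs ranks 3′ UTRs by the stability they confer.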

Figure 6. Applications of CRE research in human gene therapy, agricultural molecular breeding, and microbial cell factories. Left: findings on CREs are combined with gene therapy technologies to develop customized treatments for human diseases. Middle: identified functional sites that affect economically important traits are used to breed superior varieties, and functional elements that influence these traits are targeted by gene editing to produce high-quality, high-yield varieties. Right: the production processes and yields of microbial cell factories are continuously improved by optimizing and modifying CREs.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
