Review

Deaminase-Driven Reverse Transcription Mutagenesis in Oncogenesis: Critical Analysis of Transcriptional Strand Asymmetries of Single Base Substitution Signatures

by Edward J. Steele 1,* and Robyn A. Lindley 2
1 Melville Analytics Pty Ltd. and Immunomics, Kangaroo Point, Brisbane 4169, Australia
2 Department Clinical Pathology, Victorian Comprehensive Cancer Centre (VCCC), University of Melbourne, Melbourne 3052, Australia
* Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2025, 26(3), 989; https://doi.org/10.3390/ijms26030989
Submission received: 19 November 2024 / Revised: 5 January 2025 / Accepted: 9 January 2025 / Published: 24 January 2025
(This article belongs to the Section Molecular Oncology)

Abstract

This paper provides a critical analysis of the molecular mechanisms presently used to explain transcriptional strand asymmetries of single base substitution (SBS) signatures observed in cancer genomes curated at the Catalogue of Somatic Mutations in Cancer (COSMIC) database (Wellcome Trust Sanger Institute). The analysis is based on a deaminase-driven reverse transcriptase (DRT) mutagenesis model of cancer oncogenesis involving both the cytosine (AID/APOBEC) and adenosine (ADAR) mutagenic deaminases. In this analysis we apply what is known, or can reasonably be inferred, of the immunoglobulin somatic hypermutation (Ig SHM) mechanism to the analysis of the transcriptional strand asymmetries of the COSMIC SBS signatures that are observed in cancer genomes. The underlying assumption is that somatic mutations arising in cancer genomes are driven by dysregulated off-target Ig SHM-like mutagenic processes at non-Ig loci. It is reasoned that most SBS signatures, whether of “unknown etiology” or assigned-molecular causation, can be readily understood in terms of the DRT-paradigm. These include the major age-related “clock-like” SBS5 signature observed in all cancer genomes sequenced and many other common subset signatures including SBS1, SBS3, SBS2/13, SBS6, SBS12, SBS16, SBS17a/17b, SBS19, SBS21, as well as signatures clearly arising from exogenous causation. We conclude that the DRT-model provides a plausible molecular framework that augments our current understanding of immunogenetic mechanisms driving oncogenesis. It accommodates what is known about AID/APOBEC and ADAR somatic mutation strand asymmetries and provides a fully integrated understanding of the molecular origins of common COSMIC SBS signatures. The DRT-paradigm thus provides scientists and clinicians with additional molecular insights into the causal links between deaminase-associated genomic signatures and oncogenic processes.

1. Introduction

The purpose here in the Introduction is to provide the conceptual background to the molecular analysis that follows (Section 2), that is, our critical analyses of the underlying causes of the strand biased somatic mutation signatures documented in thousands of cancer genomes.
It is now generally agreed that the cytosine (AID/APOBEC) and adenosine (ADAR) deaminases targeting DNA and RNA C-site and A-site substrates play important roles in health and immunity. However, when dysregulated to “off-target” genomic C- and A-sites in protein coding genes they can potentially cause somatic mutations and add to the severity of progressive genetic diseases such as cancer. Their causative role in front-line innate immunity to viral infections, restriction of dangerously active mobile retrotransposons, as well as mutagenesis of the cancer genome at “off-target” DNA and RNA substrates has been reviewed in detail for AID/APOBEC family of deaminases [1,2]. We have focused over the past 20 years on the putative wider impact of aberrant immunoglobulin (Ig) somatic hypermutation (SHM)-like processes also via ADAR1/ADAR2 deaminase A-to-I editors causing A-to-I(G) mutations in both RNA and DNA substrates across the cancer genome. Thus, a role for both AID/APOBEC and ADAR deaminases in targeted somatic mutations (TSM) in codon context in the TP53 DNA binding region in TP53-negative breast cancers was initially reported [3]. These unconventional TSM analyses also show a putative role in human genomic evolution for AID/APOBEC and ADAR deaminases in the appearance of the many single nucleotide polymorphisms (SNPs) in large numbers of defective genes curated in the human OMIM database (Online Mendelian Inheritance in Man) [4]. All these “off-target” dysregulated AID/APOBEC/ADAR mutagenesis data and others have been comprehensively reviewed [5]. Further, these concepts have inspired promising prognostic/prediction algorithms for a number of cancers [6]. We discuss all these types of cancer mutagenesis data in the context of the reverse transcriptase (RT) mechanism of immunoglobulin (Ig) somatic hypermutation (SHM) at both Ig and non-Ig loci [7].
This paper, therefore, lays out a critical analysis of the mechanisms of origin of transcriptional strand asymmetries observed in the single base substitution (SBS) signatures curated at the online COSMIC database (Box 1, Figure 1, Table 1). One subsidiary aim is that this knowledge may also be leveraged to develop more precise predictive genomic tests for use in the clinic and for understanding personalized medicine in patient responses to different cancer treatments, such as in recent machine learning analysis of available genomic sequence data on many cancers [6], as well as the prior work on prognostic/predictive codon context-based targeted somatic mutations (TSM) in High Grade progressing ovarian cancers, HGS-OvCa [8].
Table 1. Somatic point mutation patterns (as a percentage of the total) in data sets involving rearranged murine IgV loci (A) and in human cancer SBS5 (B), SBS3 (C).
A. Somatic Mutations (Mean % 12 Studies Plus SEM) in Rearranged Murine IgV Loci
From \ Mutant Base | A | T | C | G | Total | Strand Bias Factor
A | | 10.6 (1.2) | 6.3 (0.9) | 14.6 (0.7) | 31.6 (1.7) | A>>T 2.9×
T | 3.1 (0.6) | | 5.3 (1.1) | 2.6 (0.6) | 11.0 (1.3) | p < 0.001
C | 4.3 (0.8) | 13.4 (1.3) | | 3.6 (0.7) | 21.3 (1.3) | G>>C 1.7×
G | 20.1 (1.9) | 7.2 (1.4) | 8.7 (0.7) | | 36.1 (2.5) | p < 0.001

B. Somatic Mutations (as Percentage of Total 89,120 Mutations) in SBS5
From \ Mutant Base | A | T | C | G | Total | Strand Bias Factor
A | | 5.3 | 3.7 | 16 | 25 | A>>T 1.1×
T | 4.9 | | 13.9 | 4.3 | 23.1 | p < 0.001
C | 5.4 | 15.5 | | 4.2 | 25.2 | G>>C 1.1×
G | 15.9 | 6.5 | 4.2 | | 26.5 | p < 0.001

C. Somatic Mutations (as Percentage of Total 53,833 Mutations) in SBS3
From \ Mutant Base | A | T | C | G | Total | Strand Bias Factor
A | | 8.4 | 4.7 | 8.8 | 21.8 | A>>T 1.04×
T | 7.8 | | 8.1 | 5.2 | 21 | p > 0.05
C | 9.3 | 8.4 | | 9.5 | 27.2 | G>>C 1.1×
G | 9.3 | 10.9 | 9.8 | | 29.9 | p < 0.001
All data rounded to one decimal place. All mutations are read from the coding or non-transcribed strand (NTS). A. Data from Steele 2009 [9]. B. and C. Data from Alexandrov et al., 2013, 2020 [10,11], taken from the Single Base Substitution Mutational Signatures (v3.4 October 2023) at the COSMIC website at https://cancer.sanger.ac.uk/signatures/sbs/sbs5/ (accessed on 15 April 2024) and https://cancer.sanger.ac.uk/signatures/sbs/sbs3/ (accessed on 15 April 2024). Only cancer types with a minimum of 2000 mutations for the SBS5 or SBS3 signatures with average probability of at least 0.75 are considered, for real mutations on transcribed and non-transcribed strands. In both B and C, a Chi-square 4 × 4 test (assigning a nominal 10 to empty cells) gives very large Chi-square values with p-values < 0.00001. In B, mutations of T-to-G significantly exceed mutations of A-to-C by 1.16× (p < 0.001). In C, there are strand biases within A:T base pairs where A-to-G mutations exceed T-to-C mutations by 1.1× (p < 0.01). Similar data for SBS5 and SBS3 broken down by cancer tissue type are shown in Tables S1 and S2. Generic symbol A>>T means mutations of A exceeding mutations of T at A:T base pairs. Generic symbol G>>C means mutations of G exceeding mutations of C at G:C base pairs.
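The strand bias factors in Table 1 can be checked with a few lines of arithmetic. The sketch below assumes that each factor is simply the ratio of row totals (all mutations of A divided by all mutations of T, and likewise G over C); that reading is inferred from the table and is not stated as a formula in the text. The names `TABLE_1A`, `total_from`, and `strand_bias_factor` are illustrative only.

```python
# Table 1A cell means (% of all mutations), read from the non-transcribed strand.
# Keys are (from_base, to_base); SEM values are omitted for this check.
TABLE_1A = {
    ("A", "T"): 10.6, ("A", "C"): 6.3, ("A", "G"): 14.6,
    ("T", "A"): 3.1, ("T", "C"): 5.3, ("T", "G"): 2.6,
    ("C", "A"): 4.3, ("C", "T"): 13.4, ("C", "G"): 3.6,
    ("G", "A"): 20.1, ("G", "T"): 7.2, ("G", "C"): 8.7,
}


def total_from(spectrum, base):
    """Sum all substitutions that start from `base`."""
    return sum(value for (src, _), value in spectrum.items() if src == base)


def strand_bias_factor(spectrum, numerator, denominator):
    """Assumed definition: ratio of the two row totals, e.g. A over T."""
    return total_from(spectrum, numerator) / total_from(spectrum, denominator)


a_over_t = strand_bias_factor(TABLE_1A, "A", "T")
g_over_c = strand_bias_factor(TABLE_1A, "G", "C")
print(f"A>>T {a_over_t:.1f}x, G>>C {g_over_c:.1f}x")
```

Summing the rounded cells gives row totals that can differ by 0.1 from the Total column (which reports the mean of per-study totals), but the resulting ratios still round to the 2.9× and 1.7× shown in Table 1A.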
Box 1. COSMIC SBS Signatures—Main Summary by C-site, A-site Category.
Alexandrov and colleagues [10,11,12] report single base substitution signatures (SBS) of mutagenesis in cancer using an algorithm-extraction of tri-nucleotide signatures from whole exome (WES) and whole genome (WGS) sequence data from thousands of cancer genomes. The present categories offer an alternative way of understanding these somatic mutation patterns based on the likelihood of the origin of the dominant deaminase-driven signature—C-site or A-site, or both C-site plus A-site. This categorisation is cognisant of the fact that certain apparent non-deaminase driven or ‘environmental exposure’ signatures (Tobacco Smoking, UV exposure, Reactive Oxygen Species viz. 8oxoG) are also present in certain cancer genomes. The most dominant signature is SBS5 occurring in all cancer genomes sequenced. SBS5 displays a ‘dysregulated Ig-Like SHM’ somatic mutation pattern [1,7,8,9,13,14] with transcriptional strand bias of mutations of A exceeding mutations of T (A>>T) and mutations of G exceeding mutations of C (G>>C), as in Table 1, Table 2 and Table 3, online Supplementary Tables S1–S3. This strand-biased mutation pattern is also observed at both Ig and non-Ig loci (the TP53 DNA binding region) that have undergone somatic mutagenesis [7,14]. Listed below are the authors’ main Deaminase Driven Categories. Superscript T indicates Transcriptional Strand Asymmetry. The SBS Mutational Signatures (v3.4—October 2023) are at The Catalogue of Somatic Mutations in Cancer (COSMIC) website at https://cancer.sanger.ac.uk/signatures/sbs/ (accessed on 15 April 2024)
C-site predominantly (putative AID/APOBEC driven)
SBS1^T (at ACG strong strand bias, 5-meCpG, but slight reverse C>>G at CCG, GCG, TCG), SBS2, SBS6, SBS13, SBS19^T (almost purely G>A >> C>T), SBS7a, SBS7b
A-site predominantly (putative ADAR1 and coupled Target Site Reverse Transcription [TSRT], Pol-Eta and/or possibly DNA Pol-Theta driven)
SBS12^T (Liver), SBS16^T (Liver), SBS17a^T, SBS17b^T (but reverse strand bias: not A>>T, it is T>>A), SBS21^T (DNA mismatch repair deficiency, but apparent reverse strand bias: not A>>T, it is T>>A)
C-site plus A-site more or less balanced “Ig-SHM-like” (AID/APOBEC/ADAR driven + TSRT via Pol eta, Pol theta?)
SBS5^T (SBS40 = SBS5?), SBS3^T, SBS9^T (but apparent reverse strand bias: not A>>T, it is T>>A)
SBS signatures not highlighted in above categories are discussed and analysed at length in Results and Analytical Discussion Section 2.1
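As background to the tri-nucleotide signatures mentioned in Box 1, the sketch below enumerates the 96 pyrimidine-centred channels conventionally used to display SBS signatures. It is an illustration of that standard convention, not of the authors' own pipeline, and the helper names `reverse_complement` and `sbs_channel` are hypothetical. Folding purine-reference mutations onto their reverse complement is exactly the step that discards strand information, so transcriptional strand asymmetry has to be assessed before this collapse (on all 192 strand-specific classes).

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs_channel(context, alt):
    """Label a substitution in its pyrimidine-centred form, e.g. 'A[C>T]G'.

    `context` is the 5'-flank, reference base, and 3'-flank as read on one
    strand; `alt` is the mutant base on that same strand.
    """
    if context[1] in "AG":  # purine reference: fold onto the opposite strand
        context = reverse_complement(context)
        alt = reverse_complement(alt)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


# 4 x 4 x 4 contexts x 3 alternative bases = 192 strand-specific classes,
# which collapse to 96 channels once strand is ignored.
channels = {
    sbs_channel(five + ref + three, alt)
    for five, ref, three, alt in product(BASES, repeat=4)
    if alt != ref
}
print(len(channels))
```

For example, a G-to-A change in the context CGT on one strand and a C-to-T change in the context ACG on the other both map to the single channel A[C>T]G.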
Table 2. SBS5: Strand biases in types of mutations in different cancers.
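Column groups: Global Strand Bias (A>>T, G>>C); Strand Bias at Selected Base Pairs (A-to-G>T-to-C, T-to-G>A-to-C, G-to-A>C-to-T, G-to-T>C-to-A)
Cancer | A>>T | G>>C | A-to-G>T-to-C | T-to-G>A-to-C | G-to-A>C-to-T | G-to-T>C-to-A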
Biliary-AdenoCA++++++++++++++
Bladder-TCC+++++++++++++++++
Breast-Cancer++++++++++++++++++
CNS-GBM+++++++++++++++
CNS-Medullo+++++++++++++++
ColoRect-AdenoCARNS+++R++++++++++
ESCC++++++++++++++++++
Eso-AdenoCANS+++NS++NS+++
Head-SCC+++++++++NS++++++
Liver-HCC++++++++++++++++++
Lung-AdenoCA++++++++++++NS+++
Lung-SCC+++++++++R++++++++
Lymph-BNHLNS+++NS+++++++
Lymph-CLL+++++++++++++
Panc-AdenoCA++++++++++++++
Prost-AdenoCA++++++++++++NS+++
Skin-MelanomaR++++++++++++++++
Stomach-AdenoCANS+++++++++++++
Uterus-AdenoCAR+++++++++NS+
Code: Intensity Metric of Strand Bias +++ means p < 0.001, ++ p < 0.01, + p < 0.05, NS p > 0.05. R is Reverse Direction of Dominant Global Strand Bias. These summaries have been constructed from the data in Table S1. Only those cancers where the total number of mutations is approx. 30,000 are shown. The flips and inconsistency of direction, C-to-G>G-to-C and C-to-G<G-to-C, are consistent with REV1 translesion DNA repair equally focused on repairing single bp lesions on both strands [9]. To summarize: C-to-G>G-to-C is NS in Biliary-AdenoCA, Bladder-TCC, CNS-GBM, ColoRect-AdenoCA, Eso-AdenoCA, Lymph-BNHL, Skin-Melanoma, Stomach-AdenoCA, and Uterus-AdenoCA; C-to-G>G-to-C is significant at least at p < 0.05 in Breast-Cancer, Liver-HCC, Lymph-CLL, Panc-AdenoCA, and Prost-AdenoCA; and C-to-G<G-to-C is significant at least at p < 0.05 in CNS-Medullo, ESCC, Head-SCC, Lung-AdenoCA, and Lung-SCC.
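The intensity code used in Table 2 is a simple threshold mapping from p-values. A minimal sketch is given below; the function name `bias_code` is illustrative, and the handling of p exactly equal to 0.05 (treated here as NS) is an assumption, since the legend only defines p < 0.05 and p > 0.05. The direction flag R is recorded separately in the table and is not modelled here.

```python
def bias_code(p_value):
    """Map a strand-bias p-value to the Table 2 intensity code."""
    if p_value < 0.001:
        return "+++"
    if p_value < 0.01:
        return "++"
    if p_value < 0.05:
        return "+"
    return "NS"  # assumption: p == 0.05 is also reported as not significant


for p in (0.0005, 0.005, 0.03, 0.2):
    print(p, bias_code(p))
```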
Table 3. Summary of origins and features of main SBS types.
Transcriptional Strand Asymmetry
COSMIC SBS | Deduced Deamination: DNA (C-to-U) | Deduced Deamination: RNA (A-to-I) | Inferred Cause of Transcriptional Strand Asymmetry | Inferred Cause of T-to-C>A-to-G at Collapsed R Loops †
SBS5 | AID/APOBEC | ADAR (+Hx) | TSRT | ADAR (Hx)
SBS1 | AID/APOBEC | | TSRT |
SBS2/SBS13 | AID/APOBEC | | |
SBS3 | AID/APOBEC | ADAR | TSRT, TCR |
SBS4 | | | TCR |
SBS6 | AID/APOBEC | | |
SBS7a, SBS7b | AID/APOBEC | | TCR |
SBS7c, SBS7d | | ADAR (+Hx) | | ADAR (Hx)
SBS8 | | | TCR |
SBS9 | AID/APOBEC | ADAR (+Hx) | TSRT | ADAR (Hx)
SBS10a,b SBS14 | AID/APOBEC | | |
SBS11 | AID/APOBEC | | TSRT |
SBS12 | AID/APOBEC | ADAR | TSRT |
SBS15 | AID/APOBEC | | |
SBS16 | | ADAR | TSRT |
SBS17a, SBS17b | | ADAR (+Hx) | | ADAR (Hx)
SBS18 | | | TSRT |
SBS19 | AID/APOBEC | | TSRT |
SBS84 | AID/APOBEC | | |
SBS85 | | ADAR (+Hx) | | ADAR (Hx)
† In some cases, depending on context, Wobble Base pairing by Hypoxanthine (Hx) or A-to-I deaminated adenine on the template or transcribed DNA strand (TS) may also lead to excesses of T-to-A>A-to-T, or even T-to-G>A-to-C post replication on the non-transcribed strand (NTS) as Figure 1 e.g., SBS9, SBS7c,7d. For further information on the origin of each signature see text Section 2. It should be noted that there are many overlapping target motifs for AID/APOBEC and ADAR deaminations that reflect uncertainty in the field on the exact hierarchy of the targeting preferences, suggesting a “deaminase overlay” in many somatic mutation cancer signatures (which is implied by the analysis of the origins of global signature SBS5). This has been reviewed [5], and the main tabular deaminase motif target summary is in Table S4. TSRT, target site reverse transcription as Figure 1a; TCR, conventional transcription coupled repair; APOBEC, apolipoprotein B mRNA-editing, catalytic polypeptide; AID, activation-induced cytidine deaminase, a member of APOBEC family of cytosine deaminases; ADAR, adenosine deaminase acting on RNA.

1.1. AID/APOBEC and ADAR Ig SHM-like Dysregulated Mutagenesis

We hypothesize that transcriptional strand asymmetries in SBS signatures can be understood by the action of the mutagenic cytosine (AID/APOBEC) and adenosine (ADAR1/2) deaminases (see Figure 1) often coupled to cellular reverse transcription allowing the generation of distinct transcriptional mutation strand biases. In the case of A-to-I pre-mRNA editing via ADAR1 [27], there is an implied association with cellular reverse transcription, via DNA repair Polymerases eta and theta [7,28] (see Figure 1a). DNA replication of ADAR deaminase-mediated A-to-I DNA modifications (A-to-Hx, Hypoxanthine) can also help explain, as the present analysis shows, distinct strand-biased outcomes at resolved (collapsed) long transcriptional R-Loops. An R-loop, in contrast to an RNA Pol II driven Transcription Bubble is a very long three-stranded nucleic acid structure, composed of a long annealed DNA:RNA hybrid and the associated displaced non-template single-stranded DNA (see Figure 1b).
In addition to the main SBS list in Box 1, there are some secondary downstream mutation signatures [10,11,12], such as Defective Homologous Recombination Repair (dHR; SBS3), defective DNA mismatch repair (dMMR; SBS15, SBS21, SBS26, SBS44), defective base excision repair (dBER; SBS30, SBS36), defective nucleotide excision repair (dNER), and defects in replicative polymerases POLE or POLD1 genes (dPOLE/dPOLD1; SBS10a,10b, SBS14, SBS20) that may result in additional replication fork-based strand-biased signatures (which are not our focus). However, in the majority of these cases it is posited here that the primary source of the de novo somatic mutations is associated with deaminase mutagenic activity: either a C-to-U, C-to-T, or A-to-I modification potentially causing a mutagenic outcome in DNA or RNA sequence of the cancer genome or transcriptome (which can subsequently, if left unrepaired, be copied back to the evolving cancer genome via cellular reverse transcription). This interpretation assigns causative “AID/APOBEC activity” to a far wider set of SBS signatures than the online COSMIC site [11,12] currently does, where it is allocated to just SBS2 and SBS13.
In this paper the deaminase-driven reverse transcriptase (DRT) paradigm is formally introduced to show how the above scenarios can plausibly occur in a transcription-linked path during oncogenesis (see Supplementary File Sections S1 and S2 for a wider historical background to the current analysis including a longer list of abbreviations and definitions). It provides a molecular analytical framework based on molecular biology first principles of DNA replication, RNA transcription, and DNA repair. It is a set of foundation features and assumptions that involve AID/APOBEC and ADAR deamination coupled in many cases to a target site reverse transcription, TSRT [23]. It includes the RT activity of the DNA repair polymerase-eta, with putative back up across the cancer genome by the RT activity of DNA repair polymerase-theta [7,28]. This is a clear variation on known RNA templated DNA repair processes now documented in yeast (Saccharomyces cerevisiae) and in human embryonic kidney cell lines (HEK293 cells); see Section 1.5. Therefore, while this significant step is still not fully understood in every molecular detail, the TSRT process at DNA mutational lesions allows RNA A-to-I mutational modifications to be fixed back into the genomic DNA, scoring primarily as an A-to-G mutation at that site when this unrepaired I (Inosine) is accurately copied and replicated. For example, it helps our understanding of the genesis of the striking transcriptional strand biased A-to-G mutations observed in the genomic DNA of liver cancer cells at WA sites (e.g., the origins of SBS12, Section 2.1.12).
Other prominent endogenous mutation sources of note include reactive oxygen species (ROS) elevated in Innate Immune Responses and the cell-wide stress response initiated by Interferon-Stimulated Gene cascades, which can also activate APOBEC and ADAR deaminases [29]. ROS can result in oxidative 8oxoG modifications that lead to primary G-to-T mutations (SBS18) that are particularly prominent in Brain and CNS abnormalities [30] and some other cancers [11]. Other important endogenous alkylating events at G, A, T bases may result in non-bulky base modifications that cause instructive mutagenic lesions, including O6-meG G-to-A (C-to-T), O4-meT T-to-C (A-to-G) or cytotoxic lesions (N7-meG, N3-meA, N2-meG). These often result in abasic sites and ssDNA nicks that are expected to be repaired by a base excision repair (BER) step [31,32].
The deaminase-driven reverse transcriptase (DRT) hypothesis was first articulated in part in 2010 [13]. It was then developed further when applied to understanding the transcriptional strand biases of C-to-U(T) mutations at G:C base pairs and accompanying targeted mutations occurring at A:T base pairs in the DNA binding region of TP53-ve tumor samples [3,14]. The principles of the DRT hypothesis, as now formally articulated here for cancer mutagenesis, were further employed in toto or in part in subsequent prediction/prognostication analyses by applying more specific codon-context targeted somatic mutation (TSM) analysis to tumor-normal NGS sequence data [8], and in other deaminase-based somatic mutation and genetic analyses [4,6,33].
The main difference between the DRT hypothesis and other diagnostic and therapy-focused deaminase-associated signature analyses [10,11,12,34] is that the DRT-paradigm focuses on the two main types of mutator processes in carcinogenesis (Box 1). These are: (a) mutagenic C-site deaminations by AID/APOBEC (C-to-U, and C-to-T at 5′meCpG sites); and (b) A-site deaminations mediated by ADAR1/2 RNA A-to-I editors (read as A-to-G). In most other oncogenic signature analyses involving transcriptional strand asymmetry, the latter is often ignored or overlooked. This may have been because, in the past, the reverse transcriptase model of Ig SHM itself was controversial. That controversy has now died down as more independent data have accumulated [7,28], together with recognition of the general phenomenon of RNA templated DNA repair, which is now accepted by the non-immunological biochemistry research community working on RNA directed DNA repair mechanisms (see Section 1.5).
The detection of assumed RNA deaminations at ADAR-targeted WA sites now apparent in genomic DNA thus results from the coupling to cellular reverse transcription (DNA Polymerase eta and now putatively DNA Polymerase theta) at many non-Ig loci across the cancer genome. It, therefore, follows that the execution of TSRT with the integration of an error-filled cDNA copy of the base modified transcribed strand (TS) provides the most plausible explanation for understanding how oncogenic strand bias mutation patterns involving both C-site and A-site base modifications arise. This extends 5′ and 3′ as a variable length integrated cDNA “patch” around the deaminase lesion site in the genomic DNA as summarized (Figure 1a) and as developed from the reverse transcriptase mechanism of Ig-SHM [7] (Figure 2). However, it needs to be made clear at this juncture that ADAR-mediated A-to-I deamination can also occur in principle at WA sites (AA or TA) directly on the DNA moiety of annealed RNA: DNA hybrids [35] that are ubiquitously generated at Transcription Bubbles and R-Loops (Section 1.3 and Section 1.4).
Before proceeding to the detailed analysis of the likely origins of the major SBS signatures (Section 2), the main nucleic acid substrates for AID/APOBEC and ADAR deamination first need to be discussed.

1.2. Lagging and Leading Strands of the Replication Forks

These are a significant source of unpaired and exposed ssDNA for AID/APOBEC mediated mutations at C-sites in various SBS signatures. However, they are not strictly relevant to understanding “Transcriptional strand asymmetries” and are not directly discussed or analyzed in detail here, but they are discussed in more detail in Supplementary file Section S2A.

1.3. Stalled Transcription Bubbles in RNA Pol II Transcribed Regions

These provide the great bulk of “Transcriptional strand asymmetries” observed in SBS signatures and are a genome-wide rich source of DNA and RNA substrates for somatic mutations [5,6,36]. Open Transcription Bubbles provide ssDNA in the displaced non-transcribed strand (NTS), as shown in Figure 1a and Figure 2. This allows access for C-to-U DNA deamination in the context of the key variable deaminase motifs, which are often close together and overlapping, particularly in Ig variable regions: AID at WRCN motifs; various APOBEC3 family members (APOBEC3A, APOBEC3B, APOBEC3H) at TCN motifs; and APOBEC3G at CCN motifs. On the template transcribed strand (TS), in addition to ssDNA tracts at the 5′ and 3′ edges of the bubble, the RNA Exosome actively permits access to unpaired C-sites in the annealed RNA:DNA hybrids [16]. Stalled Transcription Bubbles would also allow the annealed RNA:DNA hybrid region to be attacked by ADAR1 or ADAR2 acting on adenosines base paired in dsRNA or in both the DNA and RNA moieties of the DNA:RNA hybrid [35]. The nascent dsRNA in stem-loops emerging from the Transcription Bubble also presents deamination targets for the transcription-coupled Z-DNA binding by ADAR1 associated with RNA Pol II elongation [27]. APOBEC3A is also a known RNA C-to-U editor [1,37,38] and can in theory deaminate nascent pre-mRNA molecules. Stalled Transcription Bubbles are widespread, high-frequency events in all protein coding RNA Pol II transcribed genes studied, from the transcription start site (TSS) to a point about 3 Kb downstream into the genic regions [39].
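To make the overlapping motif idea concrete, the sketch below scans a short sequence for the deaminase motifs named in this section (WRCN, TCN, CCN) together with the ADAR WA motif mentioned in Section 1.1, using IUPAC codes (W = A/T, R = A/G, N = any base). It is a toy illustration only: the position of the targeted base within each motif (`target_index`) is an assumption made for this example, the sequence is invented, and only the strand supplied is scanned.

```python
import re

# IUPAC codes needed for the motifs below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "N": "[ACGT]"}

# motif -> (deaminase label, assumed 0-based index of the targeted base)
MOTIFS = {
    "WRCN": ("AID", 2),
    "TCN": ("APOBEC3A/3B/3H", 1),
    "CCN": ("APOBEC3G", 1),
    "WA": ("ADAR", 1),
}


def find_target_sites(sequence, motif, target_index):
    """Return 0-based positions of the targeted base for every motif match."""
    pattern = "".join(IUPAC[symbol] for symbol in motif)
    # A zero-width lookahead lets overlapping matches be reported.
    lookahead = re.compile(f"(?=({pattern}))")
    return [m.start() + target_index for m in lookahead.finditer(sequence.upper())]


def scan(sequence):
    """Scan one strand for all motifs; returns {motif: [target positions]}."""
    return {motif: find_target_sites(sequence, motif, index)
            for motif, (_, index) in MOTIFS.items()}


example = "AGCTTCAACCG"  # invented sequence for illustration
hits = scan(example)
for motif, positions in hits.items():
    print(motif, MOTIFS[motif][0], positions)
```

In this example the AID motif AACC and the APOBEC3G motif CCG share bases at positions 8 and 9, which is the kind of overlap between deaminase target motifs that the text describes. Scanning the reverse complement in the same way would cover the other strand.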

1.4. Long R-Loops in RNA Pol II Transcribed Regions

R-Loops, in contrast to shorter Stalled Transcription Bubbles, offer a major source of both long unpaired ssDNA and long annealed RNA:DNA hybrid substrates [5,6,26,35] for both AID/APOBEC C-to-U and ADAR1/2 A-to-I deaminations. ADAR2 has been shown in vitro to deaminate both RNA and DNA moieties of the RNA:DNA hybrid [35]; and the ongoing work by Tasakis et al., 2020 (Pers comm N.F. Papavasiliou) reveals direct ADAR DNA deaminations at RNA:DNA hybrids within R-Loops in vivo (Figure 1b), in progressing multiple myeloma [40]. It has been reported that APOBEC3B both regulates R-Loop formation and promotes transcription-associated mutagenesis in cancer [41]. The entire APOBEC3 family is under TP53 expression regulatory control [42], and we also expect that the RNA editing properties of APOBEC3A play a similar role in RNA:DNA hybrid collapse and resolution (Figure 1b) as it is also a major C-to-U DNA editor in cancer genomes.
Recent evidence analyzed herein implies that both nuclear ADAR1 and ADAR2 act to resolve long annealed RNA:DNA hybrids by A-to-I editing the DNA moiety and RNA moiety (Figure 1b). This facilitates the release of the annealed nascent RNA moiety, which then becomes susceptible to digestion by RNase H enzymes that act by cleaving the RNA released in RNA/DNA hybrids. It is conceivable that APOBEC3A also plays a role in C-to-U editing of the nascent RNA at R-Loops in their dissolution. Such RNA and DNA modifications are expected to assist the collapse of R-Loops to dsDNA helices, albeit now potentially modified by putative A-to-I DNA modifications in some cases (Hypoxanthine, Hx). If these are left unrepaired followed by replication across the R-Loop collapsed region, it allows T-to-C and other Wobble Pair transversions T-to-A, T-to-G to be fixed on the NTS of the DNA helix (Figure 1b).
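The strand bookkeeping in the paragraph above can be written out explicitly. The sketch below assumes an A on the transcribed strand (TS) has been deaminated to hypoxanthine (Hx) and asks what substitution is scored on the non-transcribed strand (NTS) once a base has been incorporated opposite the unrepaired Hx. The function name is hypothetical, and the sketch says nothing about how often each pairing occurs.

```python
def nts_outcome_of_ts_hypoxanthine(incorporated_base):
    """NTS-read substitution after replication across Hx on the TS.

    The original A:T pair has A on the TS and T on the NTS. Whatever base is
    incorporated opposite Hx replaces the original NTS T in the daughter
    duplex, so the mutation is scored as T-to-<incorporated base>.
    """
    original_nts_base = "T"
    if incorporated_base == original_nts_base:
        return None  # T opposite Hx restores the original reading: no mutation
    return f"{original_nts_base}-to-{incorporated_base}"


for base in "CAGT":
    print(base, nts_outcome_of_ts_hypoxanthine(base))
```

Pairing of Hx with C gives the T-to-C transition on the NTS, while the wobble pairings with A or G give the T-to-A and T-to-G transversions named in the text.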
Three important areas supplying data from unrelated biomedical research over the past 10–20 years now require highlighting before proceeding to the detailed analysis of SBS mutational strand asymmetries. These are: (a) the phenomenon of RNA templated DNA repair (Section 1.5); (b) experiments showing how AID deaminases actually latch onto and travel with nascent pre-mRNA polymers in RNA Pol II elongation (the “RNA Tether Model”, Section 1.6); and (c) the very real yet not widely known or understood phenomenon of aberrant ‘Non-B Lymphocyte Ig synthesis and secretion’ in all cancer cells examined thus far (Section 1.7).

1.5. Biochemistry of RNA Templated DNA Repair

We highlight the molecular and biochemical data that strongly support the TSRT mechanism at the core of the DNA Polymerase eta-driven reverse transcriptase mechanism of antigen-driven immunoglobulin somatic hypermutation (RT Ig-SHM, Figure 2) in Germinal Centre B lymphocytes [7].
These data have emerged over the past 10–15 years from the non-immunological biochemistry DNA repair discipline. The first come from the group of Francesca Storici, dating to 2007 [43,44,45], on the mechanism of RNA Templated Homologous Recombination (HR) DNA strand break (double or single strand) repair in yeast strains. The responsible cellular reverse transcriptase now identified in the yeast phenomenon is DNA polymerase zeta, which recent data demonstrate is a far more efficient cellular reverse transcriptase in yeast than its DNA repair translesion counterpart DNA polymerase eta [46].
The other definitive and directly relevant work is by the group of Tapas K Hazra [47,48,49]. Their work is in a well-defined DNA repair system with stable human embryonic kidney cell lines (HEK293 cells). These data, particularly in Chakraborty et al., 2023 [49], provide the most definitive demonstration that transcription-coupled (TC) RNA templated non-homologous end joining repair (TC-NHEJ) of double strand breaks (DSB) is clearly executed in a target site reverse transcriptase process (TSRT), as hypothesized by Andrew Franklin during his PhD in 2003, who first demonstrated reverse transcriptase activity of human DNA polymerase eta and of the other Y family members, DNA polymerases iota and kappa [50]. In addition, a quite separate analysis by us of the putative reverse transcriptase origin, via a hypothesized transcription-coupled nucleotide excision repair (TC-NER), of CAGn and related repeat expansions in neurological diseases was published in 2020 by Franklin et al. [22]. Its relevance to the present review lies in its timely relation to the work of the Hazra group, published independently in the same year, on similar in-frame trinucleotide expansion diseases such as spinocerebellar ataxia type-3 [48].

1.6. The RNA Tether Model

The “RNA Tether Model” of Michael R Lieber and colleagues fundamentally enhances our understanding of the mechanism of “off-target RT Ig-SHM like” processes consistent with our general model of dysregulated AID/APOBEC and ADAR driven RT Ig-SHM like mutagenesis across the cancer genome.
The earlier work on the mechanisms of human chromosomal translocations in fragile zones [51] has now led to the clear and definitive RNA tether model [52]. This model of AID deaminase action is applicable not only to chromosomal translocations, but is also potentially generalizable to Ig SHM and Ig Class Switch Recombination (CSR), and thus by implication to APOBEC1 and APOBEC3 family deaminations in cancer mutagenesis [3,5]. AID proteins can be tethered to any nascent emergent RNA during RNA Pol II elongation, leaving their deamination binding domains free to deaminate unpaired cytosines in ssDNA regions in and around Stalled Transcription Bubbles (cf. ssDNA regions as shown in Figure 1a and Figure 2), or potentially even for nascent RNA targeted C-to-U editing by APOBEC3A. The implication is that, provided AID deaminase proteins are in the same immediate nuclear vicinity, any emergent non-Ig nascent RNA can be subject to “off-target” mutagenesis. The immediate vicinity could mean the coincidence of the non-Ig transcribed gene transcription factory with an “Ig transcription factory”, as discussed by P.R. Cook and associates (as reviewed recently in Steele [53]). Therefore, the RNA tether model is a major conceptual advance. It is directly relevant to understanding “dysregulated Ig SHM-like AID/APOBEC/ADAR mutagenesis” across the cancer genome as discussed here and as demonstrated in “hot” and “cold” AID deamination topologically associated domain zones by David G Schatz and colleagues in the Ramos Burkitt Lymphoma derived cell lines [54].

1.7. Non-B Lymphocyte Immunoglobulin

The pan-cancer phenomenon of non-B Lymphocyte immunoglobulin (Ig V[D]J DNA Rearrangements) and immunoglobulin synthesis in de novo emergent cancer cells of diverse epithelial origins is also consistent with a model of “dysregulated AID/APOBEC and ADAR driven RT Ig-SHM-like mutagenesis” across the pan-cancer genome.
This is the important work of the group of Xiaoyan Qiu and colleagues since 1996. They have characterized the clear reality of cancer-derived Ig proteins emerging de novo in all cancer types examined thus far [55,56,57,58]. Enough independent evidence marshaled by the Qiu group and others convinces the present authors this is a real “Non-B cell Ig” phenomenon of cancer cells (and some transient occasional positivity of some normal cells in the lung and colon). Thus, carcinomas derived from the epithelium of the lung, colon, breast, and other tissue sites express and secrete classic Ig molecules, in particular IgG as intact HL heterodimers of 150 kDa of unknown antigen binding specificity. The expressed and secreted Ig is a cancer biomarker [57] and the expressed Ig is associated with pro-tumorigenic properties, metastasis, and tumor evasion. The molecular mechanisms by which all cancers display this phenomenon are poorly understood.
The phenomenon cannot be explained by contaminating infiltrating B lymphocytes or hemopoietic lineage lymphocytic cells—the Ig is clearly cancer cell derived, with genuine heavy chain V[D]J and light VJ rearrangements with overtly functional VDJ and VJ in-frame joints (plus or minus N region additions at the junctions). The patient cancer cells display restricted sets of germline VH rearrangements in sets of VDJ functional rearrangements—and the expressed Ig protein displays aberrant glycosylation in H chain constant regions [58] the assumed glycosylation marker characterized exhaustively by Gregory Lee [57].
An important paper in the series is Zheng et al., 2009 [58], which shows an apparent classic strand bias yet clearly “dysregulated Ig-SHM” occurring de novo in cancer cells. Thus, there are clear somatic (hyper)mutations of these functional rearrangements displaying the signature of Ig SHM strand biased patterns with mutations of A exceeding mutations of T, A>>T (at WA/TW sites, implying a role for DNA Polymerase eta) and mutations of G exceeding mutations of C, G>>C (RGYW/WRCY), as we have reviewed here (Table 1) and elsewhere [7]. This, of course, implies AID-driven reverse transcription mutagenesis in oncogenesis, as discussed in earlier Lindley and Steele papers since 2010 [1,8,13,14].
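The paired motif notation used above (WA/TW, RGYW/WRCY) follows from reading the same site on the two complementary DNA strands. A minimal Python sketch, assuming the standard IUPAC nucleotide ambiguity codes, makes the relationship explicit. The function name `reverse_complement_iupac` is illustrative only and is not taken from any of the cited work.

```python
# Complement table for IUPAC nucleotide codes (W = A/T, R = A/G, Y = C/T, ...).
IUPAC_COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C",
    "R": "Y", "Y": "R", "W": "W", "S": "S",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}

def reverse_complement_iupac(motif: str) -> str:
    """Return the motif as it reads 5'-to-3' on the opposite strand."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(motif.upper()))

# Each motif is printed next to its opposite-strand equivalent.
for motif in ("WRCY", "WA", "TTT"):
    print(motif, "->", reverse_complement_iupac(motif))
```

Read this way, a C-site motif and its G-site partner, or an A-site motif and its T-site partner, describe one and the same double-stranded site, which is why strand bias has to be stated relative to a named strand.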
All this necessary conceptual background now sets the stage for the systematic analysis of the causative origins of the main transcriptional strand biased SBS signatures.

2. Results of Critical Analysis of Transcriptional Strand-Bias: Implications for SBS Origins

The DRT-paradigm is summarized as a set of “dysregulated Ig SHM-like” strand biased patterns; for example, as observed in the dominant yet ‘flat-like’ COSMIC signature SBS5 (Figure 3 and Figure 4, Table 1 and Table 2). We postulate that the transient assembly (and disassembly) in the cell nucleus of AID-associated Ig-SHM like enzymes and membrane-anchoring factors creates a potential “Ig SHM-like Transcription Factory” environment at many sites across the cancer genome, as described by the cell biology work on Transcription Factories by Peter R Cook and associates (reviewed in the context of Ig SHM [53]). The comprehensive genome-wide studies on putative AID-driven Ig-SHM like mutations in the human lymphoblastoid cell line Ramos by David G Schatz and associates are consistent with this view [54]. It would involve the RNA Pol II elongation complex generating a Transcription Bubble at protein coding genes. Such Transcription Bubbles often transiently stall downstream of the transcription start site (TSS), up to approximately 3 Kb downstream of the TSS [39], allowing mutagenic deaminase action at exposed DNA and RNA substrates at nascent RNA stem-loops [27], the annealed 11 nt RNA:DNA hybrid [35], and the unpaired ssDNA in the displaced NTS. Extreme examples of much longer extended RNA:DNA annealed hybrids would be in R-Loops, particularly evident in cancer genomes at Transcription Replication Fork collisions (TRC) on the same strand [25,26], and R-Loop formation at telomeres [24].

2.1. Strong Evidence for DRT Origin of Many SBS Signatures

What follows is a detailed critical analysis of the mechanism of origin of a number of significant SBS signatures in the context of the AID/APOBEC/ADAR deaminase mutational DRT-paradigm. To better understand these analyses, it is advised that the reader refer continuously to the transcriptional strand asymmetry signatures for that SBS at the COSMIC website (e.g., as seen in Figure 3 and Figure 4). Go to “Single Base Substitution (SBS) Signatures” at https://cancer.sanger.ac.uk/signatures/sbs/ (accessed on 15 April 2024), scroll to the SBS signature you select, click that site, go to the chosen “SBSxx Topographical Features” then click “Transcriptional strand asymmetry” to clearly see the histogram plots, as in Figure 3 and Figure 4, i.e., Genic: Transcribed Strand (blue) and Genic: Untranscribed Strand (green), and you can download the Excel files used here to analyze in detail, as we have done in Materials and Methods Section 3.2, e.g., also Supplementary Table S1. A tabular summary of the conclusions and outcomes is displayed in Table 3.

2.1.1. Origins SBS5

The etiology of SBS5 is unknown and is described as “flat” and “Clock-like”, thus age-related. It is the most dominant SBS signature appearing prominently in frequency in all cancer genomes [10,11,12]. There is a general agreement that SBS5 is the result of the accumulation of many somatic mutations over time and cell division cycles arising from the interplay of DNA damage and repair in the broadest meaning of that description [59]. The over-arching feature is the significant transcriptional strand bias at A:T and G:C base pairs displayed by this pan-cancer signature (Table 1 and Table 2).
The somatic mutation pattern for SBS5 is displayed as a “Types of Mutation” pattern in Table 1B in relation to the same pattern, as observed in well-defined experimental somatic mutation assays observed in Germinal Centre B lymphocytes from Peyer’s Patches and spleens of aged or immunized inbred mice and transgenic systems where PCR recombinant artefacts, that blunt strand bias, have been minimized (Table 1A). The data in Table 1B are integrated and pooled across many thousands of sequenced cancer genomes of different cancer tissue types (see Table S1, and strand biases further summarized by cancer type in Table 2). The mutations of A systematically exceed the mutations of T (symbolized as A>>T), and the mutations of G systematically exceed the mutations of C (symbolized as G>>C) at p < 0.001. However, within A:T base pairs, the mutations at A-to-C complement T-to-G and go against this trend, with T-to-G mutations significantly exceeding A-to-C mutations (p < 0.001). This result is consistent across the majority of different cancer tissue types displaying SBS5 signature patterns (Table 2, see Table S1).
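To illustrate how a strand bias such as A>>T can be assessed for significance, the following Python sketch applies a two-sided exact binomial test against a 50:50 expectation. This is an assumption made for illustration, not necessarily the statistical procedure used in Materials and Methods Section 3.2. The function name `strand_bias_pvalue` and the counts are hypothetical.

```python
from math import comb

def strand_bias_pvalue(n_first: int, n_second: int) -> float:
    """Two-sided exact binomial p-value for two counts under a 50:50 null."""
    n = n_first + n_second
    if n == 0:
        return 1.0  # no mutations observed, so no evidence of bias
    k = max(n_first, n_second)
    # Upper tail P(X >= k) for X ~ Binomial(n, 0.5), kept as an exact integer sum.
    tail = sum(comb(n, i) for i in range(k, n + 1))
    # The null distribution is symmetric, so the two-sided value doubles the tail.
    return min(1.0, 2 * tail / 2 ** n)

# Hypothetical counts for illustration only (not taken from Table 1 or Table S1).
mutations_of_a, mutations_of_t = 150, 100
p_value = strand_bias_pvalue(mutations_of_a, mutations_of_t)
print(f"A = {mutations_of_a}, T = {mutations_of_t}, p = {p_value:.4f}")
```

The same call can be applied to any complementary pair read from the same strand, for example T-to-G against A-to-C, where the direction of the excess is simply whichever count is larger.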
Why are these patterns so? Two of the three main sources of deaminase substrates (ssDNA, RNA:DNA hybrids) are associated with Replication Stress (Lagging strand of the Replication Fork, ssDNA) and Transcriptional Stress (R-Loop generation, ssDNA, RNA:DNA Hybrids). In our view, the third source of deamination substrates is at Stalled Transcription Bubbles, as is the case for the more defined Ig SHM systems providing ssDNA, dsRNA stem-loops, and RNA:DNA hybrid substrates, which together provide substrates for AID/APOBEC C-site and ADAR A-site deaminations (Figure 1 and Figure 2). These mutagenic events coupled to TSRT mediated by DNA Polymerase-eta, as for Ig SHM, generate the strong strand biased mutagenesis signal that outweighs the contributions from the other two mentioned sources. Blunting this systematic A>>T strand bias in SBS5 is the systematic strand bias of T-to-G over A-to-C mutations. This strand bias is not evident in the Ig SHM data (Table 1A), but it is the case in the vast majority of different types of cancer genomes with sufficiently large mutation numbers to assess significance (Table 2, see Table S1). There are exceptions to this T-to-G strand bias, viz. Squamous Cell Carcinoma of the Head and Neck (Head-SCC) and Lung cancer (Lung-SCC).
The strand biased pattern for the origins of T-to-G>A-to-C in the vast majority of cancers is plausibly explained by the DRT-Paradigm by assuming a major role for the modified isomer of uracil in pre-mRNA, pseudouridine (ψ), now behaving like “G” base pairing to mis-incorporate a C in the newly synthesized cDNA transcribed strand (TS) via TSRT opposite ψ in the pre-mRNA. There is much data in the pseudouridine literature consistent with this explanation [17,18,19,20].
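The reasoning in this explanation can be traced step by step in a short Python sketch. It assumes, as the DRT model proposes, that the pre-mRNA carries the same sequence as the NTS, that a reverse transcriptase inserts the complement of whatever base the modified nucleotide is read as, and that the NTS is then corrected to match the new TS. The function name `nts_mutation_from_rna_change` is hypothetical.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def rna_to_dna(base: str) -> str:
    """Map an RNA base to its DNA equivalent (U becomes T)."""
    return "T" if base == "U" else base

def nts_mutation_from_rna_change(rna_base: str, read_as: str) -> str:
    """Mutation seen on the NTS after TSRT copies a modified pre-mRNA base.

    rna_base: unmodified base in the pre-mRNA (same sequence as the NTS).
    read_as:  base the modified nucleotide behaves like during copying.
    """
    original_nts = rna_to_dna(rna_base)
    # Reverse transcription writes the complement into the new cDNA (TS).
    new_ts = DNA_COMPLEMENT[rna_to_dna(read_as)]
    # Fixation in the duplex leaves the NTS complementary to the new TS.
    new_nts = DNA_COMPLEMENT[new_ts]
    return f"{original_nts}>{new_nts}"

print("pseudouridine read as G:", nts_mutation_from_rna_change("U", "G"))
print("ADAR A-to-I, inosine read as G:", nts_mutation_from_rna_change("A", "G"))
print("C-to-U RNA editing:", nts_mutation_from_rna_change("C", "U"))
```

Under these assumptions a ψ read as G gives T-to-G on the NTS, inosine read as G gives A-to-G, and an edited C read as U gives C-to-T, matching the directions of strand bias discussed in the text.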
What is a plausible explanation for the other less dominant yet reverse strand biases of A<<T, at certain trinucleotides such as TTC, TTT? These are at classic WA site motifs for ADAR A-to-I RNA deamination [60]. One likely explanation is the role of ADAR1 and ADAR2 assisting R-Loop dissolution, particularly at ubiquitous transcription-generated R-Loops at replication fork head-on collisions or TRC [25]. The annealed RNA:DNA hybrid regions are a target for ADAR1 attack for dissolution of R-Loops at telomeres [24]; and, ADAR2 does this in the wider body of the genome at R-Loop TRC sites [25].
Double strand DNA breaks (DSBs) can provoke R-Loop formation [61] and change the general pattern of ADAR2 A-to-I RNA editing that assists their resolution [62] and specifically assists DNA end resection and homologous recombination (HR). These observations are thus consistent with observed genomic mutagenesis in cancer on foci of R-Loops at TRC sites [25].
The almost universal T-to-G strand bias observed in cancer genomes prompts an additional comment. If the suggestion of excessive pseudouridylation (ψ) in cancer transcriptomes is correct, then it warrants further investigation to establish whether or not ψ is a useful pan-cancer biomarker.

2.1.2. Origin SBS1

This “Clock-Like” signature at CpG sites appears in all cancers examined (Figure 3). The formal description and etiology is a “Clock-Like” (i.e., age-related) signature arising due to spontaneous water deamination or enzymatic deamination of the methylated cytosine at NCG (read CpG) sites.
The dominant ACG motif among the C>T trinucleotides harbors approximately 36% of all substitutions within this signature. At ACG, a G-to-A>C-to-T strand bias is evident when mutations are read from NTS (not significant at the other lower incidence CpG motifs, CCG, GCG, TCG). SBS1 is a minor extracted SBS signature. It is a G>>C strand bias component of the global AID/APOBEC driven strand bias at C-sites in the SBS5 signature.
The current biochemical data are inconsistent with SBS1 arising spontaneously by water hydrolysis in vivo [11,12]; the signature is most likely AID or APOBEC deaminase driven when it appears in cancer genomes in vivo. In our view, SBS1 falls under the umbrella of the DRT-Paradigm.
Detailed comparative biochemical analysis in vitro by Ito et al. [63] of the deamination of C or mC to uracil or thymine suggests that this is an enzyme catalyzed AID/APOBEC deamination signature and that spontaneous water hydrolysis is an unlikely deamination event. Indeed, methylation of C at CpG sites appears to protect cytosines from enzymatic deamination. There is a range of dose dependent activity (“catalytic efficiency”) for C and mC deamination across a range of substrate motifs (see the data in Figure 1 of the paper by Ito et al. [63]). These include a TCpG site and ACA, CCA, GCA, TCC, TCT, TCA, CCC motifs. The deamination efficiency against relevant substrates, methylated or not, was compared for the cytosine deaminases AID, APOBEC1, and APOBEC3A through to APOBEC3H. In all cases unmethylated C-sites were deaminated effectively but at varying dose-dependent efficiencies. What is striking is that in all cases, when the same C-centered motif is methylated, substantial reductions in deamination efficiency, of 51–98%, occurred across the range of AID and APOBECs tested. Indeed, APOBEC3H was “inhibited” in its deamination activity the least by cytosine methylation.
Consistent with this view is the finding that SBS1 is depleted across cancer types for multiple histone marks, including H3K9me3 [12]. One speculation is that this is a consequence of excessive methylation of cytosines protecting against AID/APOBEC deamination of 5meC sites in general.

2.1.3. Origin SBS2/SBS13 (See Figure 4)

These are C-site mutations (Box 1) targeted at G:C base pairs [11,12,64]. The designated origin is attributable to aberrant activity of the AID/APOBEC family of cytosine deaminases particularly APOBEC3A, APOBEC3B, APOBEC3H at lagging strands of replication forks under stress (see Supplementary Information Section S2A). They do not display systematic transcriptional strand asymmetry, but they do show replication strand asymmetry with a preference for the lagging strand indicative of unpaired cytosines on the ssDNA substrates at replication forks. It is agreed that the strand bias at G:C base pairs in SBS13 is most likely generated by the translesion repair enzyme REV1 replicating across abasic sites arising from BER removal of uracil (reviewed in the context of Ig SHM [9]). The SBS2/SBS13 signature (Figure 4) appears jointly and to varying degrees of strength in many cancers (24/32 in [11]). It can be considered as a small and defined subset of the global AID/APOBEC and ADAR strand-biased deaminase-based TSRT signatures already discussed for SBS5, using the DRT-Paradigm. This interpretation also assigns causative “AID/APOBEC” activity to the SBS2/SBS13 at the online COSMIC database [11,12], as shown in the summary Table 3.

2.1.4. Origin SBS3

This complex SBS signature (Figure 3, Table 1) appears in a subset of tumors with Defective Homologous Recombination (dHR) Repair of double strand breaks (DSBs) due to genetic deficiency in BRCA1 or BRCA2 genes. It is a complex signature, and many features are not inconsistent with the DRT-paradigm interpretation.
There are strong parallels in the genomic sequencing analysis on in vitro culture of the avian DT40 cell system consistent with the SBS3 profile [65]. Superficially, it appears similar and “flat” to the SBS5 profile, but there are many differences. Clearly, many unrepaired single base substitution lesions, apart from the more serious DSBs, are elevated in HR Defective patients. The patients themselves are surprisingly long lived given the seriousness of the formal HR deficiency, suggesting that other DNA repair mechanisms compensate. One candidate is back-up RNA-templated DSB repair via DNA repair reverse transcriptases, Pol-eta [49] and thus putatively Pol-theta, which is also a reverse transcriptase [66], acting via TSRT as already discussed.
At A:T base pairs, the global strand biased A>>T pattern is not significant, although there is a clear strand bias for A-to-G exceeding T-to-C mutations, the prominent strand bias in SBS5. There are many distortions in patterns relative to the “types of mutations” that are systematic in SBS5 (Table 2), as seen in Table S2. Many of the cancers with this profile may also have a potential “smoking” adduct etiology for those mutation patterns, or accumulated endogenous adducts on G, and maybe A as well (Table S2). In this regard, BRCA1 deficient DT40 cells display 53BP1 dependent translesion Y family involvement of Pol-eta and Pol-kappa. This dependency of specific base substitution mutations on Pol-eta and Pol-kappa for translesion synthesis [67] is very interesting given that human Y family polymerases eta, kappa, and iota are all known to display reverse transcriptase activity [50,68,69].
Final caveats on the putative involvement of TSRT in the generation of some of the SBS3 signature, particularly at purines G and A are noted. This strand bias could also result from exogenous sources such as tobacco smoking. It could also arise via spontaneous endogenous bulky adducts on G and A, thus conventional Transcription Coupled Repair (TCR) [70,71,72], making detected mutations on the NTS exceed those on the TS, as is clear in the “tobacco smoking” signature of SBS4.
However, ROS-generated 8oxoG modifications in nascent pre-mRNA cannot be ruled out as a primary source of excessive strand biased G-to-T mutations (see below Section 2.1.16).
Further, SBS3 is a minor signature in most BRCA1/2 deficient cancers, except in Breast-Cancer, ESCC, Ovary-AdenoCA, Panc-AdenoCA, and Stomach-AdenoCA (Table S2). It is conceivable within the SBS3 profile, that there is also some endogenous ADAR A-to-I damage at A-sites in pre-mRNA, at uracil isomerization in pre-mRNA (ψ), and at Transcription Bubbles, thus making an expected contribution of A-to-G>T-to-C. The global G>>C strand bias is also prominent and may suggest the involvement of Pol-eta (or putative Pol-theta) TSRT repair, as discussed for SBS5. This appears particularly the case for Breast-Cancer (Table S2).
In summary, a number of processes generating strand bias effects appear to contribute to the SBS3 profile, including the bulky adduct clearance of adducted Gs and As by conventional TCR, AID/APOBEC, and ADAR deaminase-driven reverse transcriptase-coupled processes involving TSRT and back up RNA-HR reverse transcriptase-mediated DSB repair.

2.1.5. Origin SBS4

This is the classic “tobacco smoking” signature. It occurs mainly at G:C base pairs but also at lower frequency at A:T base pairs. The undisputed conventional explanation is that the SBS4 transcriptional strand biases at G:C and A:T base pairs are caused by preferential bulky adduct clearance of adducted Gs and As on the transcribed strand by conventional transcription coupled repair (TCR). This signature has long been considered to be diagnostic of DNA mutagenic damage associated with tobacco smoking [70,71,72].

2.1.6. Origin SBS6

The proposed etiology for this signature is defective DNA mismatch repair (dMMR) with bias to the leading strand at replication forks [12], and it is found in microsatellite unstable tumors. It appears at significantly low incidence in a very small number of cancers (Liver-HCC, Lymph-BNHL, Panc-AdenoCA, Uterus-AdenoCA) [11]. The prominent apparent reversal of G-to-A over C-to-T strand bias (as SBS5, Table 1 and Table 2) at some motifs (CCG, GCG, GCT, TCG, but not ACA, ACG) is similar to patterns at the same motifs in SBS1 (Figure 3). None of these apparent strand biases reach significance, and the numbers of mutations are small. SBS6 is considered a small subset of the AID/APOBEC deaminase driven C-site signature of SBS5 (the DRT-Paradigm).

2.1.7. Origin SBS7a, SBS7b and SBS7c, SBS7d

These have been attributed to exogenous UV exposure observed in Skin-Melanoma genomes [11,12]. Many of the component transcriptional strand biased signatures at both G:C and A:T base pairs can be plausibly understood within the frame of the DRT-paradigm.
The main G:C base pair-targeted mutations in Skin-Melanoma are caused by the formation of cyclobutane pyrimidine dimers (CPD) in DNA. This is a significant damage lesion in the DNA helix blocking transcription and replication passage. It is responsible for >95% of all C>T signature mutations (of C-to-T and G-to-A) in Skin-Melanoma genomes. These numbers and statistics for strand biases at G:C and A:T base pairs are summarized in Table S3 (harvested from https://cancer.sanger.ac.uk/signatures/sbs/ accessed on 15 April 2024). Mutagenesis at CPDs can involve a two-step process in human cells, with cytosine deamination (C-to-U) at certain motifs followed by error-free polymerase bypass repair [73]. The UVB exposure causes cyclobutane pyrimidine dimers (of adjacent pyrimidines written as C=C, T=C). The authors tested this hypothesis, and their results largely confirm this alternative mechanism.
The main assumption in Jin et al. [73] is that the cytosines in the CPDs are deaminated by spontaneous processes to form uracil, which are then faithfully replicated by Y family translesion DNA polymerase eta, and thus incorporate adenines across the deaminated, or uracil-containing CPDs. The resulting mutations in the tri-nucleotide spectrum broadly matches SBS7 (SBS7a,b), which is a very good confirmation of their alternative explanation for adjacent T-T sites appearing at T=C sites within CPDs after UVB exposure and CPD repair.
This is a reasonable explanation apart from the assumption that the recovery and repair process on UVB exposure involves non-catalytic or spontaneous cytosine deamination. Our doubts about this assumption are supported by the experimental method the authors employed [73]. We propose an alternate explanation based on their method of UVB exposure and recovery prior to sequencing the products. The authors irradiated human fibroblast cells with UVB and harvested them 24 and 48 h later to allow time for deamination [73]. In our view, that period of 24–48 h for deamination to occur is the alternative key to understanding these data. This time interval is consistent with the immediacy and time course of a cellular Innate Immune response. It is indeed plenty of time for the Innate Immune response to be marshaled and assembled following this quite powerful attack on the integrity of the cell, particularly the DNA damaged genome. In our opinion, a cellular Innate Immune response is unavoidable.
Thus, our alternative contention is that sunlight UVB damage, such as CPD lesions across the genome, particularly in coding regions, can excite an Interferon Stimulated Gene-dependent Innate Immune response, which includes APOBEC and ADAR activation [29]. This itself is also likely to activate expression of the DNA damage regulator TP53 that is known to coordinate expression of APOBEC3 family genes [42]. Thus, APOBEC3G [74], APOBEC3B, and APOBEC3A at least can expect to be activated [75,76,77], causing expected collateral genomic damage via DNA deaminations [3,5] particularly in melanoma [78] and thus cancer pre-mutations—via C-to-U mutations at T=C and C=C cyclobutane pyrimidine dimers, involving error free DNA direct copying damage repair by DNA Polymerase eta.
In our view, SBS7a/b is a cancer mutation signature involving both active deaminase-driven cytosine deamination coupled at least to translesion DNA repair synthesis involving DNA polymerase eta.
However, not explained is the strong transcriptional reverse strand bias (in relation to SBS5) of G<<C (i.e., C-to-T>G-to-A) in both SBS7a, SBS7b, and at a far lower level of T site mutations, which exceed A site mutations (Table S3). How do these strong and highly significant transcriptional strand biases in the C>T and T>C tri-nucleotide spectral patterns arise (without replication strand bias)?
Plausible explanations that fit the data are in two parts:
1. SBS7a, SBS7b: The strong transcriptional strand bias at TpC-sites first involves the C-to-U deamination step, as shown in Jin et al. [73]. Pol-eta may well be involved in the error-free repair. However, CPDs in the cell genome would also be expected to invoke a strong conventional TCR process [70], involving NER-TCR, directed at the preferential repair of the template or transcribed strand (TS) for RNA Pol II transcription, leaving an excess of unrepaired C-to-T mutations on the displaced non-transcribed strand (NTS). CPDs are akin to obstructive bulky adducts on the template strand, which would be cleared preferentially, as shown earlier for bulky adducts of purines [71,72], as observed in SBS4 (tobacco smoking).
In our opinion the extreme strand biases at G:C base pairs in the SBS7a and SBS7b profiles result from conventional TCR.
2. SBS7c, SBS7d: These mutation levels are <5% of all mutations in Skin-Melanoma genomes. In our opinion the reverse strand biases e.g., T-to-C far exceeding A-to-G require a different explanation as it involves specific mutations at A:T base pairs. The most plausible in progressing malignant melanomas would be the ubiquitous and putative large number of R Loop-Replication Fork conflicts [25], as already discussed to explain similar reversals in strand biases in SBS5. Thus, we invoke ADAR1/2 involvement in the Inosine modification of adenine bases in the DNA moiety of the long annealed RNA:DNA hybrids at R Loops. This would then assist in the release of the pre-mRNA and its degradation, thus dissolution of R Loops as discussed above. The extreme T-to-C strand bias over A-to-G follows replication of the unrepaired Inosine (Hypoxanthine) in the DNA at the collapsed R Loop site. Given Wobble Base pairing off template Hypoxanthine, other possible extreme strand biased signatures at T appearing on the NTS would be T-to-A viz. at TTT trinucleotide motifs (AAA on the TS).
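The second explanation can be followed base by base in a small Python sketch. It assumes that the lesion sits in the TS DNA of the RNA:DNA hybrid and that replication inserts, opposite the lesion, the complement of the base the lesion is read as. The function name `nts_mutation_from_ts_lesion` is hypothetical.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def nts_mutation_from_ts_lesion(ts_base: str, read_as: str) -> str:
    """Mutation seen on the NTS when a modified TS base is replicated.

    ts_base: original base on the transcribed strand (TS).
    read_as: base the unrepaired lesion behaves like as a template.
    """
    original_nts = DNA_COMPLEMENT[ts_base]
    # The newly synthesized partner strand carries the complement of read_as.
    new_nts = DNA_COMPLEMENT[read_as]
    return f"{original_nts}>{new_nts}"

# Hypoxanthine (deaminated A) on the TS, read as G during replication.
print("Hx read as G:", nts_mutation_from_ts_lesion("A", "G"))
# Wobble pairing alternative, with Hx read as T.
print("Hx read as T:", nts_mutation_from_ts_lesion("A", "T"))
```

Under these assumptions an adenine deaminated on the TS appears as T-to-C on the NTS, and the Wobble alternative appears as T-to-A, for example at TTT on the NTS (AAA on the TS).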
We support both of these explanations, although different, as they are economical on basic assumptions, and provide plausible explanations for the intriguing strand biases of SBS7. Together, both explanations are consistent with AID/APOBEC and ADAR deaminations as initiators and drivers of DNA damage in melanoma progression post UV exposure. They are, thus, part of the DRT-Paradigm we employ in our analytical approach to understand the generation of SBS strand bias signatures.
It is noted that there is a strong presence of a T-to-G>A-to-C strand bias (Table S3), which we have speculated is caused by endogenous pseudouridylation (ψ) of uracil in cancer transcriptomes coupled to TSRT (Pol-eta, Pol-theta), so that, as discussed earlier, the base mispair outcome of the RNA modification appears in genomic DNA.

2.1.8. Origin SBS8

Classed as being of “unknown etiology”, it is similar to the signature of alkylation of G and A by methyl methanesulfonate exposure in avian DT40 cells [67]. However, both C-site and A-site Transcriptional Strand Asymmetries are noted at G:C and A:T base pairs. A plausible origin is exposure to alkylating agents [31] (endogenous or exogenous?) and the strand biased profiles are suggestive of bulky adducts of G, A, and T resulting in G-to-T, A-to-T, and T-to-A excesses on the non-transcribed strand via conventional TCR with preferential targeting of the transcribed strand [70], as originally described for bulky adducts of tobacco smoking (cf. SBS4) [71,72].

2.1.9. Origins SBS9

This signature is classed [11] as “In part, polymerase eta activity”. It is classed in Box 1 as a C-site plus A-site, more or less balanced, “Ig-SHM-like” signature (AID/APOBEC/ADAR driven transcriptional strand biased signatures with some TSRT and some Hx in DNA after R Loops have collapsed and replicated; Table 3).
It appears primarily in lymphocytic and lymphoma tumors (Lymph-BNHL, Lymph-CLL). We are genuinely puzzled by this categorization involving Pol-eta activity. In our opinion, DNA Polymerase eta (and theta) can be involved in target site reverse transcription (TSRT) in the strand biased fixation of RNA mutations in DNA as in Ig SHM (Figure 2). Most of the mutations are at A:T base pairs in the T>C and T>G tri-nucleotide components of the SBS9 profile (https://cancer.sanger.ac.uk/signatures/sbs/ accessed on 15 April 2024). Parts of the patterns are interesting with systematic strand bias to the NTS of T-to-A, T-to-C, and T-to-G. These are understandable under the DRT-Paradigm given previous listed analyses (SBS5), yet it involves Hx in DNA at collapsed R Loops.
First, for T-to-C strand bias to the NTS. In our view, this would plausibly involve ADAR1/2 A-to-I editing of the DNA of the annealed RNA:DNA hybrid at R Loops, as they are collapsed and dissolved, in rapidly proliferating lymphocyte cancers. Then, the unrepaired template Hypoxanthine is copied as T-to-C into synthesis of the NTS on replication as discussed (SBS5).
Second, the origin of the T-to-G strand bias could also plausibly involve pseudouridine (ψ) modifications in RNA as discussed and TSRT fixation of T-to-G mispaired mutations in the genome via DNA Polymerase eta acting in its reverse transcriptase repair mode (TSRT).
The main features in SBS9 are understandable from first principles and DRT model assumptions (AID/APOBEC and ADAR deaminations coupled to TSRT). However, also note the analysis [12], where the strong replication strand bias with enrichment of mutations on the leading strand is attributed to the infidelity of polymerase eta.

2.1.10. Origin SBS10a, SBS10b, SBS14

These signatures are associated with replicative DNA polymerase epsilon or POLE gene mutations, with or without dMMR. A mutation in the POLE gene is associated with faulty polymerase proofreading. There is no reason to dispute the attributed origins of these very minor signatures that appear in Colorectal-AdenoCA, Uterus-AdenoCA, and Liver-HCC as a consequence of POLE mutation(s), with or without MMR deficiency. However, Otlu et al. [12] attribute the strong replication strand bias with enrichment of mutations on the leading strand to the defective activity of the polymerases DNA polymerase epsilon (POLE) and polymerase delta (POLD1).

2.1.11. Origin SBS11

There is no reason to qualify the origins of SBS11, as it is associated with Temozolomide treatment. It is a minor yet distinctive signature in CNS-GBM and Panc-Endocrine tumors. The systematic transcriptional strand bias of G-to-A mutations exceeding C-to-T mutations at many C-site motifs (ACC, ACT, GCC, GCG, GCT, TCC) suggests the involvement of AID/APOBEC deamination coupled to TSRT via Pol-eta (or Pol-theta). Thus, a cytosine deaminase explanation at Transcription Bubbles coupled to genomic fixation via TSRT is plausible. The DRT-Paradigm is useful to understand the transcriptional strand bias features of SBS11.

2.1.12. Origin SBS12

This is one of the most interesting signatures in the SBS collection. It is of “Unknown” etiology and dominates Liver hepatocellular carcinoma (HCC) genomes (see Table 2, Table 3 and Table S1). It is largely focused on A:T base pairs, with lower-level mutations at G:C base pairs. The notable feature is the extreme strand bias of A-to-G mutations strongly exceeding T-to-C mutations on the NTS. A plausible interpretation is that this is caused by the oncogenic tumor promoting activity of high ADAR1 expression in such cancers [79], as discussed elsewhere [7,80]. Others [81], including curators at the COSMIC site, suggest this is an example of an unknown etiology involving Transcription Coupled Damage (TCD) causing lesions at adenines on the (displaced) NTS at Transcription Bubbles. However, in the context of the DRT model (Figure 1a), this is a good, although extreme, example of transcription-coupled ADAR1-mediated A-to-I deamination of nascent pre-mRNA stem-loops [27] followed by TSRT at Stalled Transcription Bubbles then fixing the pre-mRNA A-to-I mutations in DNA. This is the most plausible cause of the extreme strand bias of A-to-G mutations over T-to-C, as read on the NTS. The stand-out features of SBS12 are thus understandable from first principles and foundation assumptions of the AID/APOBEC and ADAR deamination paradigm coupled to TSRT involving the RT activity of Pol-eta at least, and/or the putative RT activity of Pol-theta. That is, the DRT-Paradigm.

2.1.13. Origin SBS15

This signature of “Defective DNA mismatch repair” (dMMR) displays features of the DRT-paradigm. It is evident at low level in Biliary-AdenoCA, Colorect-AdenoCA, Stomach-AdenoCA, Uterus-AdenoCA [11]. At the COSMIC site (ver3.4), ESCC displays the signature prominently. It is focused at G:C base pairs for the C>T set of trinucleotide motifs, particularly GCG, but is also evident at GCA, GCC, and GCT motifs. These are key features of core RCN AID deaminase motifs (typically WRCG/W). What is striking about SBS15 is the complete lack of Transcriptional strand asymmetry, see https://cancer.sanger.ac.uk/signatures/sbs/sbs15/#transcriptional-strand-asymmetry (accessed on 15 April 2024). A plausible explanation is that defects in the mismatch repair MSH2-MSH6 heterodimer activity may not sufficiently recruit DNA Polymerase eta to AID-mediated C-to-U DNA lesion sites (thus poor TSRT). Such a deficit has been established in an Ig SHM system in vitro by Patricia J Gearhart and colleagues [82]. Thus, the DRT-Paradigm allows us to better understand the lack of transcriptional asymmetry in SBS15.
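Whether a given trinucleotide context belongs to a degenerate deaminase motif such as RCN can be checked mechanically. The Python sketch below, assuming the standard IUPAC ambiguity codes, converts a motif into a regular expression and tests the SBS15 contexts named above. The function name `matches_motif` is illustrative only.

```python
import re

# Regular-expression character classes for the IUPAC codes used here.
IUPAC_CLASSES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

def matches_motif(sequence: str, motif: str) -> bool:
    """True if the whole sequence fits the IUPAC motif, position by position."""
    pattern = "".join(IUPAC_CLASSES[code] for code in motif.upper())
    return re.fullmatch(pattern, sequence.upper()) is not None

# The four C>T contexts highlighted for SBS15, plus one non-RCN context.
for context in ("GCG", "GCA", "GCC", "GCT", "TCA"):
    print(context, "fits RCN:", matches_motif(context, "RCN"))
```

The same helper can be pointed at longer motifs, such as WRCG, when flanking sequence beyond the trinucleotide context is available.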

2.1.14. Origin SBS16

This signature is of “Unknown” etiology, yet it can be plausibly attributed to “Alcohol consumption” on current observations, and mechanistically to what has been termed Transcription Coupled Damage [12,81]. It is evident in Head-SCC and Liver-HCC [11,12]. At the COSMIC site (ver3.4), ESCC and Liver-HCC display this strong A>>T strand biased signature at A:T base pairs prominently (https://cancer.sanger.ac.uk/signatures/sbs/sbs16/#transcriptional-strand-asymmetry (accessed on 15 April 2024)).
SBS16 is, thus, an A:T bp-focused signature at ATA, ATG, ATT motifs, which are core WA motifs for both ADAR1 mediated A-to-I pre-mRNA modifications [60] and indeed DNA Polymerase eta [83,84] during Ig SHM in vivo [27]. The strand biased mechanisms highlighted in Figure 1 apply. As with SBS12, the SBS16 signature is, therefore, understandable from first principles and foundation assumptions of the AID/APOBEC and ADAR deamination paradigm coupled to TSRT, involving the RT activity of Pol-eta at least, and/or the putative alternate RT activity of Pol-theta, that is, the DRT-Paradigm. See the comments in Table 3.

2.1.15. Origin SBS17a, SBS17b

This signature is also of “Unknown” etiology, and it appears in many cancer genomes but particularly with high somatic mutation numbers in Eso-AdenoCA, Stomach-AdenoCA and is A:T bp focused. In SBS17a, the reverse strand bias of A-to-G<T-to-C on NTS is significant (p < 0.001). In SBS17b, the strand bias of T-to-G>A-to-C at main motifs CTT, GTT, and TTT is also systematically significant (p < 0.001).
The explanations under the DRT-Paradigm for these different transcriptional strand biases are in two parts.
In SBS17a, these strand biased patterns are consistent with ADAR-mediated A-to-I deamination creating hypoxanthine in the DNA moiety of long annealed RNA:DNA hybrids at R Loops in these progressing cancers. On ADAR-assisted dissolution and degradation of the pre-mRNA, unrepaired hypoxanthine in TS DNA can be replicated to produce excess T-to-C (and, via Wobble Base pairing, the alternative T-to-A) mutations on the NTS.
In SBS17b, Wobble Base pairing at hypoxanthine on R Loop dissolution may contribute to the excess of T-to-G over A-to-C. In addition, putative pseudouridylation (ψ) of nascent pre-mRNA at Stalled Transcription Bubbles, as speculated previously, followed by TSRT would also contribute to this pancancer signature (see discussion Section 2.1.1 SBS5).

2.1.16. Origin SBS18

This signature, in many cancer genomes, is putatively caused both by the Innate Immune Response to infections and by internal cellular stress and DNA damage involving reactive oxygen species (ROS), which oxidize nucleic acids, particularly guanines, causing G-to-T mutations in DNA as a consequence of 8oxoG formation. Thus, the strong mutation profile signature of SBS18 is focused on G:C base pairs and most dominantly in C>A trinucleotides. There are two striking features.
The first is the transcriptional strand bias of G-to-T mutations exceeding C-to-A on the NTS. This is particularly evident in ACA, ACC, ACT, CCA, GCT, GCA, GCT, TCA, TCC motifs. In different cancer types with large numbers of mutations the strand bias is very significant in Breast-Cancer, Colorect-AdenoCA, ESCC, Eso-AdenoCA, Stomach-AdenoCA (p < 0.001). This is a striking result, not least because the observation conflicts strongly with known oxidative DNA base damage studies in mammalian cells. Thus, Thorslund et al. [85] investigated defined oxidative DNA base damage exposure of Chinese hamster ovary fibroblast cells in culture. In contrast to mitochondria, they report that 8oxoG is repaired equally on both DNA strands without strand bias. This is expected as 8oxoG modifications are not considered bulky adducted modifications and can be replicated easily or presumably reverse transcribed.
Why do the SBS18 ROS signatures in many cancer genomes in vivo display strong G-to-T over C-to-A strand bias? This is reminiscent of the similar bulky adduct-induced strand biases at Gs in lung cancer mutational hotspots in the TP53 gene on exposure to Benzo[a]pyrene, where the adducts are removed more slowly from the NTS [71,72], the now classic strand-biased outcome of Transcription Coupled Repair, as discussed for SBS4 [70].
An answer that fits the transcriptional asymmetry data assumes oxidative RNA damage in nascent pre-mRNA at Stalled Transcription Bubbles, as specifically speculated on earlier [13,14] based on the published RNA oxidative damage studies of Wu and Li [21]. Thus, in this scenario, strong strand biases of the type G-to-T exceeding C-to-A on the NTS can also, in theory, be generated by ROS stress first as RNA modifications (8oxoG), which are converted to excessive strand biased G-to-T mutations via TSRT and reverse transcriptase functions of Pol-eta (or putatively Pol-theta).
The DRT-Paradigm, thus, allows a plausible understanding of these simple base modified strand biases now appearing in genomic DNA of cancer cells.
The second, and overlooked, feature of SBS18 is the significant 5′ preference for G-to-T mutations. Thus, on average, the incidence of G-to-T mutations (8oxoG) at WG sites is four times more frequent than at SG sites (S is G or C). This has similarities to accessibility of ADAR deaminases to the A-site at WA motifs in dsRNA [60]. It appears that oxidation at the 8 position of G via ROS follows similar biochemistry.
The transcriptional strand asymmetry signature of SBS18 is understandable in part in terms of the DRT-paradigm.
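The 5′ neighbor comparison described for SBS18 can be illustrated with a minimal sketch. It assumes G-to-T counts have already been tallied per NG dinucleotide as read 5′ to 3′ on the NTS; the function names and the counts are hypothetical, and the sketch uses raw counts without normalizing for how often each motif occurs in the genome.

```python
WEAK = {"A", "T"}    # W: weak base pairing
STRONG = {"G", "C"}  # S: strong base pairing


def five_prime_class(dinucleotide):
    """Classify an NG dinucleotide (read 5'->3' on the NTS) as 'W' or 'S'."""
    if len(dinucleotide) != 2 or dinucleotide[1] != "G":
        raise ValueError(f"expected an NG dinucleotide, got {dinucleotide!r}")
    base = dinucleotide[0]
    if base in WEAK:
        return "W"
    if base in STRONG:
        return "S"
    raise ValueError(f"unknown base {base!r}")


def wg_to_sg_ratio(g_to_t_counts):
    """Ratio of G-to-T counts at WG sites to counts at SG sites."""
    totals = {"W": 0, "S": 0}
    for dinucleotide, count in g_to_t_counts.items():
        totals[five_prime_class(dinucleotide)] += count
    if totals["S"] == 0:
        raise ValueError("no mutations at SG sites; ratio undefined")
    return totals["W"] / totals["S"]


# Hypothetical counts, chosen only to illustrate a four-fold WG preference.
example_counts = {"AG": 45, "TG": 35, "CG": 12, "GG": 8}
print(wg_to_sg_ratio(example_counts))
```

A ratio near 4 on real data would correspond to the WG over SG preference described above.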

2.1.17. Origin SBS19

The etiology of this signature is “Unknown”. It appears as a minor signature in CNS-PiloAstro, Liver-HCC and Myeloid-MDS/MPN tumor genomes. The striking transcriptional strand asymmetry profile shows it is almost a pure G-to-A>C-to-T strand biased signature. Again, this is a signature that is best understood under the DRT-Paradigm.

2.1.18. Origin SBS84, SBS85

In Otlu et al. [12] these are assigned as “AID-associated signatures SBS84 and SBS85”. This implies off-target Ig SHM-like mutagenesis across the cancer genome. The reverse transcriptional strand-bias is significant, particularly at A:T base pairs (T-to-C>A-to-G and T-to-A>A-to-T). This suggests R Loop targeting and unrepaired hypoxanthines in DNA in the transcribed strand after nascent RNA release and degradation. The key points on this reverse strand biased feature have been made in Section 2.1.1 SBS5 (and Table 3).

2.1.19. SBS Signatures SBS20 Through SBS44 (as [11])

This paper will not critically evaluate these signatures here, as the main conceptual and interpretation points concerning the DRT-Paradigm have been established, in our opinion, by the above analyses. These additional signatures are all minor mutation patterns apart from SBS40 (“Unknown” etiology, yet it appears much like SBS5). Some are repetitive subsets of other established signatures. Many have no known causes. However, many also have no topographical “Transcriptional strand asymmetry” assigned at time of writing, e.g., SBS40.

3. Materials and Methods

3.1. Cancer Genome Sequence Source Data

All somatic mutation data in sequenced cancer genomes was sourced between 7 and 15 April 2024 at the Catalogue Of Somatic Mutations In Cancer (COSMIC) online site (v3.4) at https://cancer.sanger.ac.uk/signatures/sbs/ (accessed on 15 April 2024).

3.2. Conversion of Transcriptional Strand Asymmetry SBS Somatic Mutation Numbers at the COSMIC Site to “Types of Mutation” Tables

We converted the COSMIC (v3.4) presentation of “Transcriptional strand asymmetry” files (under Topographical features) to a more familiar format for viewing strand biases by the construction of “types of mutations” tables typical of published Ig SHM analyses at Ig loci, as in Table 1 [7]. In such tables, all 12 types of somatic mutations are read from the coding strand, or in the present terminology, on the displaced non-transcribed strand (NTS) generated by RNA Polymerase II Transcription Bubbles.
The analytical steps for conversion of an SBS signature are as follows:

3.2.1. Step One of Conversion

We converted the “Transcriptional strand asymmetry” data for each SBS signature for accumulated real mutations at each of the pyrimidine-centered trinucleotide motifs (C>A, C>G, C>T, T>A, T>C, T>G) on the “transcribed” strand and left unaltered the real mutations on the “untranscribed” strand. For example, at G:C base pairs, real mutations of C>A on the “transcribed strand” are now read as G>T mutations on the “untranscribed strand” (and numbers of real C>A on the “untranscribed strand” are left unaltered); similarly, real mutations of C>T on the “transcribed strand” are now read as G>A on the “untranscribed strand”; and similarly, for A:T base pairs, real mutations of T>C on the “transcribed strand” are read as A>G mutations on the “untranscribed strand” (and T>C real mutations on the “untranscribed strand” are left unaltered). This exercise is repeated for each pyrimidine-centered trinucleotide motif.
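The re-reading rule in Step One can be sketched in a few lines. This is a minimal illustration, not the authors' actual workflow: it assumes each record is a (mutation type, strand label, count) triple, and the function names and example counts are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def read_on_nts(mutation, strand):
    """Return a pyrimidine-centered mutation type as read on the NTS.

    Mutations reported on the "transcribed" strand are complemented
    (e.g. C>A becomes G>T); "untranscribed" mutations are left unaltered.
    """
    ref, alt = mutation.split(">")
    if strand == "transcribed":
        return f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
    if strand == "untranscribed":
        return mutation
    raise ValueError(f"unknown strand label {strand!r}")


def types_of_mutation_table(records):
    """Sum counts per mutation type after re-reading every record on the NTS."""
    table = Counter()
    for mutation, strand, count in records:
        table[read_on_nts(mutation, strand)] += count
    return table


# Hypothetical counts for illustration only (not COSMIC data).
records = [
    ("C>A", "transcribed", 120),
    ("C>A", "untranscribed", 80),
    ("T>C", "transcribed", 50),
    ("T>C", "untranscribed", 70),
]
for mutation_type, n in sorted(types_of_mutation_table(records).items()):
    print(mutation_type, n)
```

Applied to all six pyrimidine-centered classes, this yields the 12 mutation types of a “types of mutations” table, with each complementary pair (for example G>T and C>A) available for a strand bias comparison.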

3.2.2. Step Two of Conversion

This is illustrated for SBS5 Liver-HCC. In the first step (Figure 5), the downloaded text file that has been converted to an Excel file is “v3.2_SBS5_TRANSCR_ASYM.xlsx” and the SBS5 Liver-HCC data are extracted as shown in the screen shot below.

3.2.3. Step Three of Conversion

The next step (Figure 6) is the conversion of the real mutations to a “Types of Mutation” table, as presented in Supplementary Table S1, where strand biases are easily visible, particularly at large N values. Chi-Squared tests (2 × 2) are performed directly and the p level is recorded as NS (p > 0.05) or as a significance level (p < 0.05, p < 0.01, or p < 0.001), tabulated as in the screen shot below (from Supplementary Table S1).
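A 2 × 2 Chi-Squared test with the significance labels used above can be sketched as follows. This is an illustrative sketch under stated assumptions: it uses the Pearson statistic without continuity correction, the function names are hypothetical, and how the 2 × 2 table is populated from the mutation counts is not specified here, so the example table is invented for illustration.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p). With 1 degree of freedom the upper-tail
    probability equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("a row or column total is zero; test undefined")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


def significance_label(p):
    """Map a p value to the labels used in the tables: NS or a threshold."""
    if p < 0.001:
        return "p < 0.001"
    if p < 0.01:
        return "p < 0.01"
    if p < 0.05:
        return "p < 0.05"
    return "NS"


# Hypothetical 2 x 2 table, for illustration only.
chi2, p = chi_squared_2x2(30, 10, 10, 30)
print(round(chi2, 2), significance_label(p))
```

For small expected counts, a continuity-corrected or exact test may be more appropriate than this uncorrected statistic.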

4. Summary and Conclusions

Our analysis focuses on understanding the likely origin of key COSMIC SBS signatures using the DRT paradigm (Table 3) [11,12]. Most signatures analyzed can be understood within the context of the DRT model (Figure 1). Most implicate roles for deamination of cytosines or adenosines in DNA or RNA substrates and include coupling to a reverse transcription step. This interpretation and re-evaluation of the base mechanisms driving somatic mutagenesis is consistent with the original hypothesis that cancer genomes display a “dysregulated AID/APOBEC Ig SHM-like signature” coupled with an ADAR deaminase RNA editing signature and reverse transcription [3,5,7,8,13,14].
As we continue to construct our knowledge of the likely source(s) of different mutational signatures in cancers, our main conclusion here is that we can use the DRT model to provide a more detailed molecular explanation of the roles of the mutagenic homologous families of deaminases during oncogenesis. The SBS suite of signatures is a significant advance, yet the next steps may need even more complex predictive molecular models than the DRT model posited here, to be able to identify and assign causation to the many genomic signature differences observed between different cancers and individuals within a cohort.
Viewing the molecular processes involved through the DRT prism has two main implications. First, it highlights how much more work is needed to better understand the transcription-linked molecular processes contributing to the spectra of mutagenic signatures arising during oncogenesis. Second, the DRT model, and its molecular processes, can potentially lead to the future development of more therapeutically precise predictions for individual patients, and to the personalization of the patient clinical care treatment path.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijms26030989/s1. References [86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114] are cited in Supplementary Materials.

Author Contributions

Both authors conceived this analytical review; E.J.S. wrote the first draft with consultation and amendment by R.A.L. All authors have read and agreed to the published version of the manuscript.

Funding

This analytical review was supported solely by inhouse funds.

Data Availability Statement

All data analyzed in this paper are provided here or in the Supplementary File, and the primary mutation data are available online at the COSMIC website.

Acknowledgments

The helpful assistance of B. Otlu on use of the COSMIC online database is acknowledged. We acknowledge helpful suggestions by Jared Mamrot on earlier drafts.

Conflicts of Interest

Author Edward J. Steele is employed by the company Melville Analytics Pty Ltd. and Immunomics. The remaining author, Associate Professor Robyn A. Lindley, is Hon. Principal Fellow, Dept of Clinical Pathology, Victorian Comprehensive Cancer Centre (VCCC), Faculty of Medicine, Dentistry & Health Sciences, University of Melbourne, and Founder and Chief Scientific Officer, GMDxGenomics, and declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

(Full list in Supplementary file): A>>T, mutations in DNA at A exceed mutations at T; A-to-I, adenosine to inosine RNA editing of adenosine preferentially at WA sites (in DNA the product is called hypoxanthine, Hx); ADAR, adenosine deaminase acting on RNA causing A-to-I modifications (read as A-to-G mutations); APOBEC, apolipoprotein B mRNA-editing, catalytic polypeptide; AID, activation-induced cytidine deaminase, a member of APOBEC family of cytosine deaminases, that initiates via C-to-U DNA editing (or C-to-T mutations after unrepaired replication) in single stranded DNA both in rearranged immunoglobulin (Ig) variable gene (V[D]J) somatic hypermutation (SHM) and heavy chain isotype class switch recombination (CSR); G>>C, mutations in DNA at G exceed mutations at C; C-to-U, cytosine to uracil mutations at appropriate AID (WRCN) and APOBEC3 family protein deaminase motifs at e.g., TCN, CCG, read as C-to-T cytosine to thymine at 5meCpG site motifs; TSM, targeted somatic mutation in codon context; DRT model, deaminase-driven reverse transcriptase mutagenesis; TSRT, target site reverse transcription; N, any base; R, purine base A and G; S, strong base pairing G and C; W, weak base pairing A and T/U.

References

  1. Pecori, R.; Di Giorgio, S.; Lorenzo, J.P.; Papavasiliou, F.N. Functions and consequences of AID/APOBEC-mediated DNA and RNA deamination. Nat. Rev. Genet. 2022, 23, 505–518. [Google Scholar] [CrossRef]
  2. Rodriguez, M.G.; Flath, B.; Chelico, L. The interesting relationship between APOBEC3 deoxycytidine deaminases and cancer: A long road ahead. Open Biol. 2020, 10, 200188. [Google Scholar]
  3. Lindley, R.A. The importance of codon context for understanding the Ig-like somatic hypermutation strand-biased patterns in TP53 mutations in breast cancer. Cancer Genet. 2013, 206, 222–226. [Google Scholar] [CrossRef]
  4. Lindley, R.A.; Hall, N.E. APOBEC and ADAR deaminases may cause many single nucleotide polymorphisms curated in the OMIM database. Mutat. Res. 2018, 810, 33–38. [Google Scholar] [CrossRef]
  5. Lindley, R.A. Review of the mutational role of deaminases and the generation of a cognate molecular model to explain cancer mutation spectra. Med. Res. Arch. 2020, 8, 2177. [Google Scholar] [CrossRef]
  6. Mamrot, J.; Hall, N.E.; Lindley, R.A. Predicting clinical outcomes using cancer progression associated signatures. Oncotarget 2021, 12, 845–858. [Google Scholar] [CrossRef]
  7. Steele, E.J.; Franklin, A.; Lindley, R.A. Somatic mutation patterns at Ig and Non-Ig Loci. DNA Repair 2024, 133, 103607. [Google Scholar] [CrossRef]
  8. Lindley, R.A.; Humbert, P.; Larner, C.; Akmeenmana, E.H.; Pendlebury, C.R.R. Association between targeted somatic mutation (TSM) signatures and HGS-OvCa progression. Cancer Med. 2016, 5, 2629. [Google Scholar] [CrossRef]
  9. Steele, E.J. Mechanism of somatic hypermutation: Critical analysis of strand biased mutation signatures at A:T and G:C base pairs. Mol. Immunol. 2009, 46, 305–320. [Google Scholar] [CrossRef]
  10. Alexandrov, L.B.; Nik-Zainal, S.; Wedge, D.C.; Aparicio, S.A.J.R.; Behjati, S.; Biankin, A.V.; Bignell, G.R.; Bolli, N.; Borg, A.; Borresen-Dale, A.-L.; et al. Signatures of mutational processes in human cancer. Nature 2013, 500, 415–421. [Google Scholar] [CrossRef]
  11. Alexandrov, L.B.; Kim, J.; Haradhvala, N.J.; Huang, M.N.; Ng, A.W.T.; Wu, Y.; Boot, A.; Covington, K.R.; Gordenin, D.A.; Bergstrom, E.N.; et al. The repertoire of mutational signatures in human cancer. Nature 2020, 578, 94. [Google Scholar] [CrossRef]
  12. Otlu, B.; Diaz-Gay, M.; Vernes, I.; Bergstrom, E.N.; Zhivagui, M.; Barnes, M.; Alexandrov, L.B. Topography of mutational signatures in human cancer. Cell Rep. 2023, 42, 112930. [Google Scholar] [CrossRef] [PubMed]
  13. Steele, E.J.; Lindley, R.A. Somatic mutation patterns in non-lymphoid cancers resemble the strand biased somatic hypermutation spectra of antibody genes. DNA Repair 2010, 9, 600. [Google Scholar] [CrossRef]
  14. Lindley, R.A.; Steele, E.J. Critical analysis of strand-biased somatic mutation signatures in TP53 versus Ig genes, in genome-wide data and the etiology of cancer. ISRN Genom. 2013, 2013, 921418. [Google Scholar]
  15. Kuraoka, I.; Endou, M.; Yamaguchi, Y.; Wada, T.; Handa, H.; Tanaka, K. Effects of endogenous DNA base lesions on transcription elongation by mammalian RNA polymerase II. J. Biol. Chem. 2003, 278, 7294. [Google Scholar] [CrossRef]
  16. Basu, U.; Meng, F.-L.; Keim, C.; Grinstein, V.; Pefanus, E.; Eccleston, J.; Zhang, T.; Myers, D.; Wasserman, C.R.; Wesemann, D.R.; et al. The RNA exosome targets the AID cytidine deaminase to both strands of transcribed duplex DNA substrates. Cell 2011, 144, 353–363. [Google Scholar] [CrossRef] [PubMed]
  17. Karijolich, J.; Yi, C.; Yu, Y.-T. Transcriptome-wide dynamics of RNA pseudouridylation. Nat. Rev. Mol. Cell Biol. 2015, 16, 581–585. [Google Scholar] [CrossRef]
  18. Zhou, K.I.; Clark, W.C.; Pan, D.W.; Eckwahl, M.J.; Dai, Q.; Pan, T. Pseudouridines have context-dependent mutation and stop rates in high-throughput sequencing. RNA Biol. 2018, 15, 892–900. [Google Scholar] [CrossRef]
  19. Adamopoulos, P.G.; Athanasopoulou, K.; Daneva, G.N.; Scorilas, A. The Repertoire of RNA Modifications Orchestrates a Plethora of Cellular Responses. Int. J. Mol. Sci. 2023, 24, 2387. [Google Scholar] [CrossRef]
  20. Kierzek, E.; Malgowska, M.; Lisowiec, J.; Turner, D.H.; Gdaniec, Z.; Kierzek, R. The contribution of pseudouridine to stabilities and structure of RNAs. Nucleic Acids Res. 2014, 42, 3492–3501. [Google Scholar] [CrossRef]
  21. Wu, J.; Li, Z. Human polynucleotide phosphorylase reduces oxidative RNA damage and protects HeLa cell against oxidative stress. Biochem. Biophys. Res. Commun. 2008, 372, 288–292. [Google Scholar] [CrossRef]
  22. Franklin, A.; Steele, E.J.; Lindley, R.A. A proposed reverse transcription mechanism for (CAG)n and similar expandable repeats that cause neurological and other diseases. Heliyon 2020, 6, e03258. [Google Scholar] [CrossRef] [PubMed]
  23. Luan, D.D.; Korman, M.H.; Jakubczak, J.L.; Eichbush, T.H. Reverse transcription of R2B mRNA is primed by a nick at the chromosomal target site: A mechanism for non-LTR retrotransposition. Cell 1993, 72, 595. [Google Scholar] [CrossRef]
  24. Shiromoto, Y.; Sakurai, M.; Minakuchi, M.; Ariyoshi, K.; Nishikura, K. ADAR1 RNA editing enzyme regulates R-loop formation and genome stability at telomeres in cancer cells. Nat. Commun. 2021, 12, 1654. [Google Scholar] [CrossRef]
  25. Bayona-Feliu, A.; Herrera-Moyano, E.; Badra-Fajardo, N.; Galvan-Femenia, I.; Soler-Oliva, E.; Aguilera, A. The chromatin network helps prevent cancer-associated mutagenesis at transcription-replication conflicts. Nat. Commun. 2023, 14, 6890. [Google Scholar] [CrossRef]
  26. Stoy, H.; Zwicky, K.; Kuster, D.; Lang, K.S.; Krietsch, J.; Crossley, M.P.; Schmid, J.A.; Cimprich, K.A.; Merrikh, H.; Lopes, M. Direct visualization of transcription-replication conflicts reveals post-replicative DNA:RNA hybrids. Nat. Struct. Mol. Biol. 2023, 30, 348–359. [Google Scholar] [CrossRef]
  27. Steele, E.J.; Lindley, R.A.; Wen, J.; Weiller, G.F. Computational analyses show A-to-G mutations correlate with nascent mRNA hairpins at somatic hypermutation hotspots. DNA Repair 2006, 5, 1346–1363. [Google Scholar] [CrossRef]
  28. Franklin, A.; Steele, E.J. RNA-directed DNA repair and antibody somatic hypermutation. Trends Genet. 2022, 38, 426–436. [Google Scholar] [CrossRef]
  29. Schoggins, J.W.; Rice, C.M. Interferon-stimulated genes and their antiviral effector functions. Curr. Opin. Virol. 2011, 1, 519. [Google Scholar] [CrossRef]
  30. Polyzos, A.A.; McMurray, C.T. Close encounters: Moving along bumps, breaks, and bubbles on expanded trinucleotide tracts. DNA Repair 2017, 56, 144–155. [Google Scholar] [CrossRef]
  31. Wirtz, S.; Nagel, G.; Eshkind, L.; Neurath, M.F.; Samson, L.D.; Kaina, B. Both base excision repair and O6-methylguanine-DNA methyltransferase protect against methylation-induced colon carcinogenesis. Carcinogenesis 2010, 31, 2111–2117. [Google Scholar] [CrossRef] [PubMed]
  32. Krokan, H.E.; Bjoras, M. Base excision repair. Cold Spring Harb. Perspect. Biol. 2013, 5, a012583. [Google Scholar] [CrossRef]
  33. Mamrot, J.; Balachandran, S.; Steele, E.J.; Lindley, R.A. Molecular model linking Th2 polarized M2 tumour-associated macrophages with deaminase-mediated cancer progression mutation signatures. Scand. J. Immunol. 2019, 89, e12760. [Google Scholar] [CrossRef]
  34. Shi, R.; Wang, X.; Wu, Y.; Xu, B.; Zhao, T.; Trapp, C.; Wang, X.; Unger, K.; Zhou, C.; Lu, S.; et al. APOBEC-mediated mutagenesis is a favorable predictor of prognosis and immunotherapy for bladder cancer patients: Evidence from pan-cancer analysis and multiple databases. Theranostics 2022, 12, 4181. [Google Scholar] [CrossRef]
  35. Zheng, Y.C.; Lorenzo, C.; Beal, P.A. DNA Editing in DNA/RNA hybrids by adenosine deaminases that act on RNA. Nucleic Acids Res. 2017, 45, 3369–3377. [Google Scholar] [CrossRef]
  36. Milano, L.; Gautam, A.; Caldecott, K.W. DNA damage and transcription stress. Mol. Cell 2024, 84, 70. [Google Scholar]
  37. Sharma, S.; Patnaik, S.K.; Taggart, R.T.; Kannisto, E.D.; Enriques, S.M.; Gollnick, P.; Baysal, B.E. APOBEC3A cytidine deaminase induces RNA editing in monocytes and macrophages. Nat. Commun. 2015, 6, 6881. [Google Scholar] [CrossRef]
  38. Sharma, S.; Patnaik, S.K.; Kemer, Z.; Baysal, B.E. Transient overexpression of exogenous APOBEC3A causes C-to-U RNA editing of thousands of genes. RNA Biol. 2016, 14, 603. [Google Scholar] [CrossRef]
  39. Fong, N.; Sheridan, R.M.; Ramachandran, S.; Bentley, D.L. The pausing zone and control of RNA polymerase II elongation by Spt5: Implications for the pause-release model. Mol. Cell 2022, 82, 3632–3645.e4. [Google Scholar] [CrossRef]
  40. Tasakis, R.N.; Laganà, A.; Stamkopoulou, D.; Melnekoff, D.T.; Nedumaran, P.; Leshchenko, V.; Pecori, R.; Parekh, S.; Papavasiliou, F.N. ADAR1 can drive Multiple Myeloma progression by acting both as an RNA editor of specific transcripts and as a DNA mutator of their cognate genes. bioRxiv 2020. [Google Scholar] [CrossRef]
  41. McCann, J.L.; Cristini, A.; Law, E.K.; Lee, S.Y.; Tellier, M.; Carpenter, M.A.; Beghe, C.; Sanchez, A.; Jarvis, M.C.; Stefanovska, B.; et al. APOBEC3B regulates R-loops and promotes transcription-associated mutagenesis in cancer. Nat. Genet. 2023, 55, 1721–1734. [Google Scholar] [CrossRef]
  42. Menendez, D.; Nguyen, T.-A.; Snipe, J.; Resnick, M.A. The Cytidine Deaminase APOBEC3 Family Is Subject to Transcriptional Regulation by p53. Mol. Cancer Res. 2017, 15, 735–743. [Google Scholar] [CrossRef]
  43. Storici, F.; Bebenek, K.; Kunkel, T.A.; Gordenin, D.A.; Resnick, M.A. RNA-templated DNA repair. Nature 2007, 447, 338–341. [Google Scholar] [CrossRef]
  44. Keskin, H.; Shen, Y.; Huang, F.; Patel, M.; Yang, T.; Ashley, K.; Mazin, A.V.; Storici, F. Transcript-RNA-templated DNA recombination and repair. Nature 2014, 515, 436. [Google Scholar] [CrossRef]
  45. Meers, C.; Keskin, H.; Banyai, G.; Mazina, O.; Yang, T.; Gombolay, A.L.; Mukherjee, K.; Kaparos, R.; Kaparos, E.I.; Newman, G.; et al. Genetic characterization of three distinct mechanisms supporting RNA-driven DNA repair and modification reveals major role of DNA polymerase. Mol. Cell 2020, 79, 1037.e5. [Google Scholar] [CrossRef]
  46. Mayle, R.; Holloman, W.K.; O’Donnell, M.E. DNA polymerase ζ has robust reverse transcriptase activity relative to other cellular DNA polymerases. J. Biol. Chem. 2024, 300, 107918. [Google Scholar] [CrossRef]
  47. Chakraborty, A.; Tapryal, N.; Venkova, T.; Horikoshi, N.; Pandita, R.K.; Sarker, A.H.; Sarkar, P.S.; Pandita, T.K.; Hazra, T.K. Classical non-homologous end-joining pathway utilizes nascent RNA for error-free double-strand break repair of transcribed genes. Nat. Commun. 2016, 7, 13049. [Google Scholar] [CrossRef] [PubMed]
  48. Chakraborty, A.; Tapryal, N.; Venkova, T.; Mitra, J.; Vasquez, V.; Sarker, A.H.; Duarte-Silva, S.; Huai, W.; Ashizawa, T.; Ghosh, G.; et al. Deficiency in classical nonhomologous end-joining-mediated repair of transcribed genes is linked to SCA3 pathogenesis. Proc. Natl. Acad. Sci. USA 2020, 117, 8154. [Google Scholar] [CrossRef]
  49. Chakraborty, A.; Tapryal, N.; Islam, A.; Sarker, A.H.; Manohar, K.; Mitra, J.; Hegde, M.L.; Hazra, T. Human DNA polymerase η promotes RNA-templated error-free repair of DNA double-strand breaks. J. Biol. Chem. 2023, 299, 102991. [Google Scholar] [CrossRef]
  50. Franklin, A.; Milburn, P.J.; Blanden, R.V.; Steele, E.J. Human DNA polymerase-h(eta), an A-T mutator in somatic hypermutation of rearranged immunoglobulin genes, is a reverse transcriptase. Immunol. Cell Biol. 2004, 82, 219. [Google Scholar] [CrossRef]
  51. Lieber, M.R. Mechanisms of human lymphoid chromosomal translocations. Nat. Rev. Cancer 2016, 16, 387–398. [Google Scholar] [CrossRef]
  52. Liu, D.; Hsieh, C.-L.; Lieber, M.R. The RNA tether model for human chromosomal translocation fragile zones. Trends Biochem. Sci. 2024, 49, 391–400. [Google Scholar] [CrossRef] [PubMed]
  53. Steele, E.J. Somatic hypermutation in immunity and cancer: Critical analysis of strand-biased and codon-context mutation signatures. DNA Repair 2016, 45, 1. [Google Scholar] [CrossRef]
  54. Senigl, F.; Maman, Y.; Dinesh, R.K.; Alinikula, J.; Seth, R.B.; Pecnova, L.; Omer, A.D.; Rao, S.S.P.; Weisz, D.; Buerstedde, J.-M.; et al. Topologically Associated Domains Delineate Susceptibility to Somatic Hypermutation. Cell Rep. 2019, 29, 3902–3915.e8. [Google Scholar] [CrossRef] [PubMed]
  55. Cui, M.; Huang, J.; Zhang, S.; Liu, Q.; Liao, Q.; Qiu, X. Immunoglobulin Expression in Cancer Cells and Its Critical Roles in Tumorigenesis. Front. Immunol. 2021, 12, 613530. [Google Scholar] [CrossRef]
  56. Zhao, J.; Peng, H.; Gao, J.; Nong, A.; Hua, H.; Yang, S.; Chen, L.; Wu, X.; Zhang, H.; Wang, J. Current insights into the expression and functions of tumor-derived immunoglobulins. Cell Death Discov. 2021, 7, 148. [Google Scholar] [CrossRef]
  57. Lee, G. RP215 and GHR106 Monoclonal Antibodies and Potential Therapeutic Applications. Open J. Immunol. 2023, 13, 61–85. [Google Scholar] [CrossRef]
  58. Zheng, J.; Huang, J.; Mao, Y.; Liu, S.; Sun, X.; Zhu, X.; Ma, T.; Zhang, L.; Ji, J.; Zhang, Y.; et al. Immunoglobulin gene transcripts have distinct VHDJH recombination characteristics in human epithelial cancer cells. J. Biol. Chem. 2009, 284, 13610–13619. [Google Scholar] [CrossRef]
  59. Spisak, N.; de Manuel, M.; Milligan, W.; Sella, G.; Przeworski, M. The clock-like accumulation of germline and somatic mutations can arise from the interplay of DNA damage and repair. PLoS Biol. 2024, 22, e3002678. [Google Scholar] [CrossRef] [PubMed]
  60. Bass, B.L. RNA editing by adenosine deaminases that act on RNA. Annu. Rev. Biochem. 2002, 71, 817. [Google Scholar] [CrossRef]
  61. Lim, G.; Hwang, S.; Yu, K.; Kang, J.Y.; Kang, C.; Hohng, S. Translocating RNA polymerase generates R-loops at DNA double-strand breaks without any additional factors. Nucleic Acids Res. 2023, 51, 9838. [Google Scholar] [CrossRef] [PubMed]
  62. Jimeno, S.; Prados-Carvaja, L.R.; Fernandez-Avila, M.J.; Silvia, S.; Silvestris, D.A.; Endara-Coll, M.; Rogriguez-Real, G.; Domingo-Prim, J.; Mejias-Navarro, F.; Romero-Franco, A.; et al. ADAR-mediated RNA editing of DNA:RNA hybrids is required for DNA double strand break repair. Nat. Commun. 2021, 12, 5512. [Google Scholar] [CrossRef] [PubMed]
  63. Ito, F.; Fu, Y.; Kao, S.A.; Yang, H.; Chen, X.S. Family-Wide Comparative Analysis of Cytidine and Methylcytidine Deamination by Eleven Human APOBEC Proteins. J. Mol. Biol. 2017, 429, 1787–1799. [Google Scholar] [CrossRef]
  64. Petljak, M.; Dananberg, A.; Chu, K.; Bergstrom, E.N.; Striepen, J.; von Morgen, P.; Chen, Y.; Shah, H.; Sale, J.E.; Alexandrov, L.B.; et al. Mechanisms of APOBEC3 mutagenesis in human cancer cells. Nature 2022, 607, 799–807. [Google Scholar] [CrossRef] [PubMed]
  65. Zamborszky, J.; Szikriszt, B.; Gervai, J.Z.; Pipek, O.; Poti, A.; Krzystanek, M.; Ribli, D.; Szalai-Gindi, J.M.; Csabai, I.; Szallasi, Z.; et al. Loss of BRCA1 or BRCA2 markedly increases the rate of base substitution mutagenesis and has distinct effects on genomic deletions. Oncogene 2017, 36, 746–755. [Google Scholar] [CrossRef]
  66. Chandramouly, G.; Zhao, J.; McDevitt, S.; Rusanov, T.; Hoang, T.; Borisonnik, N.; Treddinick, T.; Lopezcolorado, F.W.; Kent, T.; Siddique, L.A.; et al. Polθ reverse transcribes RNA and promotes RNA-templated DNA repair. Sci. Adv. 2021, 7, eabf1771. [Google Scholar] [CrossRef] [PubMed]
  67. Chen, D.; Gervai, J.Z.; Poti, A.; Nemeth, E.; Szeltner, Z.; Szikriszt, B.; Gyure, Z.; Zamborszky, J.; Ceccon, M.; di Fagagna, F. d’A.; et al. BRCA1 deficiency specific base substitution mutagenesis is dependent on translesion synthesis and regulated by 53BP1. Nat. Commun. 2022, 13, 226. [Google Scholar] [CrossRef]
  68. Su, Y.; Egli, M.; Guengerich, F.P. Human DNA polymerase η accommodates RNA for strand extension. J. Biol. Chem. 2017, 292, 18044–18051. [Google Scholar] [CrossRef]
  69. Su, Y.; Ghodke, P.P.; Egli, M.; Li, L.; Wang, Y.; Guengerich, F.P. Human DNA polymerase η has reverse transcriptase activity in cellular environments. J. Biol. Chem. 2019, 294, 6073–6081. [Google Scholar] [CrossRef]
  70. Hanawalt, P.C.; Spivak, G. Transcriptional-coupled DNA repair: Two decades of progress and surprises. Nat. Rev. Mol. Cell Biol. 2008, 9, 958–970. [Google Scholar] [CrossRef]
  71. Denissenko, M.F.; Pao, A.; Tang, M.-S.; Pfeifer, G.P. Preferential formation of Benzo[a]pyrene adducts at lung cancer mutational hotspots in P53. Science 1996, 274, 430–432. [Google Scholar] [CrossRef] [PubMed]
  72. Denissenko, M.F.; Pao, A.; Pfeifer, G.P.; Tang, M.-S. Slow repair of bulky DNA adducts along the nontranscribed strand of the human p53 gene may explain the strand bias of transversion mutations in cancers. Oncogene 1998, 16, 1241–1247. [Google Scholar] [CrossRef] [PubMed]
  73. Jin, S.-G.; Pettinga, D.; Johnson, J.; Li, P.; Pfeifer, G.P. The major mechanism of melanoma mutations is based on deamination of cytosine in pyrimidine dimers as determined by circle damage sequencing. Sci. Adv. 2021, 7, eabi6508. [Google Scholar] [CrossRef] [PubMed]
  74. Neil, S.; Bieniasz, P. Human Immunodeficiency Virus, Restriction Factors, and Interferon. J. Interferon Cytokine Res. 2009, 29, 569. [Google Scholar] [CrossRef]
  75. Burns, M.B.; Temiz, N.A.; Harris, R.S. Evidence for APOBEC3B mutagenesis in multiple human cancers. Nat. Genet. 2013, 45, 977–983. [Google Scholar] [CrossRef]
  76. Roberts, S.A.; Lawrence, M.S.; Klimczak, L.J.; Grimm, S.A.; Fargo, D.; Stojanov, P.; Kiezun, A.; Kryukov, G.V.; Carter, S.L.; Saksena, G.; et al. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nat. Genet. 2013, 45, 970–976. [Google Scholar] [CrossRef] [PubMed]
  77. Chan, K.; A Roberts, S.; Klimczak, L.J.; Sterling, J.F.; Saini, N.; Malc, E.P.; Kim, J.; Kwiatkowski, D.J.; Fargo, D.C.; A Mieczkowski, P.; et al. An APOBEC3A hypermutation signature is distinguishable from the signature of background mutagenesis by APOBEC3B in human cancers. Nat. Genet. 2015, 47, 1067–1072. [Google Scholar] [CrossRef]
  78. Zaidi, M.R.; Davis, S.; Noonan, F.P.; Graff-Cherry, C.; Hawley, T.S.; Walker, R.; Feigenbaum, L.; Fuchs, E.; Lyakh, L.; Young, H.A.; et al. Interferon-g links ultraviolet radiation to melanomagenesis in mice. Nature 2011, 469, 458. [Google Scholar] [CrossRef]
  79. Chan, T.H.; Lin, C.H.; Qi, L.; Fei, J.; Li, Y.; Yong, K.J.; Liu, M.; Song, Y.; Chow, R.K.K.; Ng, V.H.E.; et al. A disrupted RNA editing balance mediated by ADARs (Adenosine Deaminases that act on RNA) in human hepatocellular carcinoma. Gut 2014, 63, 832–843.
  80. Lindley, R.A.; Steele, E.J. Presumptive Evidence for ADAR1 A-to-I Deamination at WA-sites as the Mutagenic Genomic Driver in Hepatocellular and Related ADAR1-Hi Cancers. J. Carcinog. Mutagen. 2020, 11, 2.
  81. Haradhvala, N.J.; Polak, P.; Stojanov, P.; Covington, K.R.; Shinbrot, E.; Hess, J.M.; Rheinbay, E.; Kim, J.; Maruvka, Y.E.; Braunstein, L.Z.; et al. Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. Cell 2016, 164, 538–549.
  82. Wilson, T.M.; Vaisman, A.; Martomo, S.A.; Sullivan, P.; Lan, L.; Hanaoka, F.; Yasui, A.; Woodgate, R.; Gearhart, P.J. MSH2–MSH6 stimulates DNA polymerase eta, suggesting a role for A:T mutations in antibody genes. J. Exp. Med. 2005, 201, 637–645.
  83. Rogozin, I.B.; Pavlov, Y.I.; Bebenek, K.; Matsuda, T.; Kunkel, T.A. Somatic mutation hotspots correlate with DNA polymerase eta error spectrum. Nat. Immunol. 2001, 2, 530.
  84. Zeng, X.; Winter, D.B.; Kasmer, C.; Kraemer, K.H.; Lehmann, A.R.; Gearhart, P.J. DNA polymerase η is an A–T mutator in somatic hypermutation of immunoglobulin variable genes. Nat. Immunol. 2001, 2, 537–541.
  85. Thorslund, T.; Sunesen, M.; Bohr, V.A.; Stevnsner, T. Repair of 8-oxoG is slower in endogenous nuclear genes than in mitochondrial DNA and is without strand bias. DNA Repair 2002, 1, 261.
  86. Anderson, C.J.; Talmane, L.; Luft, J.; Connelly, J.; Nicholson, M.D.; Verburg, J.C.; Pich, O.; Campbell, S.; Giasi, M.; Wei, P.-C.; et al. Strand-resolved mutagenicity of DNA damage and repair. Nature 2024, 630, 744–751.
  87. Aitken, S.J.; Anderson, C.J.; Connor, F.; Pich, O.; Sundaram, V.; Feig, C.; Rayner, T.F.; Luck, M.; Aitken, S.; Luft, J.; et al. Pervasive lesion segregation shapes cancer genome evolution. Nature 2020, 583, 265–270.
  88. Beale, R.C.L.; Petersen-Mahrt, S.K.; Watt, I.N.; Harris, R.S.; Rada, C.; Neuberger, M.S. Comparison of the different context-dependence of DNA deamination by APOBEC enzymes: Correlation with mutation spectra in vivo. J. Mol. Biol. 2004, 337, 585–596.
  89. Berry, M.W.; Browne, M.; Langville, A.N.; Pauca, V.P.; Plemmons, R.J. Algorithms and applications for approximate nonnegative matrix factorization. Comput. Stat. Data Anal. 2007, 52, 155–173.
  90. Bishop, K.N.; Holmes, R.K.; Sheehy, A.M.; Davidson, N.O.; Cho, S.J.; Malim, M.H. Cytidine deamination of retroviral DNA by diverse APOBEC proteins. Curr. Biol. 2004, 14, 1392–1396.
  91. Blanc, V.; Davidson, N.O. C-to-U RNA Editing: Mechanisms leading to genetic diversity. J. Biol. Chem. 2003, 278, 1395–1398.
  92. Buisson, R.; Langenbucher, A.; Bowen, D.; Kwan, E.E.; Benes, C.H.; Zou, L.; Lawrence, M.S. Passenger hotspot mutations in cancer driven by APOBEC3A and mesoscale genomic features. Science 2019, 364, eaaw2872.
  93. Burns, M.B.; Lackey, L.; Carpenter, M.A.; Rathore, A.; Land, A.M.; Leonard, B.; Refsland, E.W.; Kotandeniya, D.; Tretyakova, N.; Nikas, J.B.; et al. APOBEC3B is an enzymatic source of mutation in breast cancer. Nature 2013, 494, 366–371.
  94. Dang, Y.; Wang, X.; Esselman, W.J.; Zheng, Y.-H. Identification of APOBEC3-DE as another antiretroviral factor from the Human APOBEC Family. J. Virol. 2006, 80, 10522–10533.
  95. Ewa, B.; Danuta, M.-S. Polycyclic aromatic hydrocarbons and PAH-related DNA adducts. J. Appl. Genet. 2017, 58, 321–330.
  96. Guttenplan, J.B.; Kosinska, W.; Zhao, Z.-L.; Chen, K.-M.; Aliaga, C.; DelTondo, J.; Cooper, T.; Sun, Y.-W.; Zhang, S.-M.; Jiang, K.; et al. Mutagenesis and carcinogenesis induced by dibenzo[a,l]pyrene in the mouse oral cavity: A potential new model for oral cancer. Int. J. Cancer 2012, 130, 2783–2790.
  97. Haché, G.; Liddament, M.T.; Harris, R.S. The retroviral hypermutation specificity of APOBEC3F and APOBEC3G is governed by the C-terminal DNA Cytosine deaminase domain. J. Biol. Chem. 2005, 280, 10920–10924.
  98. Harari, A.; Ooms, M.; Mulder, L.C.; Simon, V. Polymorphisms and splice variants influence the antiretroviral activity of human APOBEC3H. J. Virol. 2009, 83, 295–303.
  99. Henry, M.; Guetard, D.; Suspene, R.; Rusniok, C.; Wain-Hobson, S.; Vartanian, J.-P. Genetic editing of HBV DNA by monodomain human APOBEC3 cytidine deaminases and the recombinant nature of APOBEC3G. PLoS ONE 2009, 4, e4277.
  100. Huang, Y.; Chen, C.; Russu, I.M. Dynamics and stability of individual base pairs in two homologous RNA-DNA hybrids. Biochemistry 2009, 48, 3988–3997.
  101. Leonard, B.; Hart, S.N.; Burns, M.B.; Carpenter, M.A.; Temiz, N.A.; Rathore, A.; Vogel, R.I.; Nikas, J.B.; Law, E.K.; Brown, W.L.; et al. APOBEC3B upregulation and genomic mutation patterns in serous ovarian carcinoma. Cancer Res. 2013, 73, 7222–7231.
  102. Liddament, M.T.; Brown, W.L.; Schumacher, A.J.; Harris, R.S. APOBEC3F properties and hypermutation preferences indicate activity against HIV-1 in vivo. Curr. Biol. 2004, 14, 1385–1391.
  103. Lindley, R.A.; Steele, E.J. Deaminases and Why Mice Sometimes Lie in Immuno-Oncology Pre-Clinical Trials? Ann. Clin. Oncol. 2019.
  104. Logue, E.C.; Bloch, N.; Dhuey, E.; Zhang, R.; Cao, P.; Herate, C.; Chauveau, L.; Hubbard, S.R.; Landau, N.R. A DNA sequence recognition loop on APOBEC3A controls substrate specificity. PLoS ONE 2014, 9, e97062.
  105. Malvezzi, S.; Farnung, L.; Aloisi, C.M.N.; Angelov, T.; Cramer, P.; Sturla, S.J. Mechanism of RNA polymerase II stalling by DNA alkylation. Proc. Natl. Acad. Sci. USA 2017, 114, 12172–12177.
  106. Mertz, T.M.; Harcy, V.; Roberts, S.A. Risks at the DNA Replication Fork: Effects upon Carcinogenesis and Tumor Heterogeneity. Genes 2017, 8, 46.
  107. Rosenberg, B.R.; Hamilton, C.E.; Mwangi, M.M.; Dewell, S.; Papavasiliou, F.N. Transcriptome-wide sequencing reveals numerous APOBEC1 mRNA editing targets in transcript 3′ UTRs. Nat. Struct. Mol. Biol. 2011, 18, 230–236.
  108. Sanchez, A.; Ortega, P.; Sakhtemani, R.; Manjunath, L.; Oh, S.; Bournique, E.; Becker, A.; Kim, K.; Durfee, C.; Temiz, N.A.; et al. Mesoscale DNA features impact APOBEC3A and APOBEC3B deaminase activity and shape tumor mutational landscapes. Nat. Commun. 2024, 15, 2370.
  109. Seplyarskiy, V.B.; Soldatov, R.A.; Popadin, K.Y.; Antonarakis, S.E.; Bazykin, G.A.; Nikolaev, S.I. APOBEC-induced mutations in human cancers are strongly enriched on the lagging DNA strand during replication. Genome Res. 2016, 26, 174–182.
  110. Sowden, M.; Hamm, J.K.; Smith, H.C. Overexpression of APOBEC-1 results in mooring sequence-dependent promiscuous RNA editing. J. Biol. Chem. 1996, 271, 3011–3017.
  111. Swann, P.F. Why do O6-alkylguanine and O4-alkylthymine miscode? The relationship between the structure of DNA containing O6-alkylguanine and O4-alkylthymine and the mutagenic properties of these bases. Mutat. Res. 1990, 233, 81–94.
  112. Taylor, B.J.M.; Nik-Zainal, S.; Wu, Y.L.; Stebbings, L.A.; Raine, K.; Campbell, P.J.; Rada, C.; Stratton, M.R.; Neuberger, M.S. DNA deaminases induce break-associated mutation showers with implication of APOBEC3B and 3A in breast cancer kataegis. eLife 2013, 2, e00534.
  113. Wiegand, H.L.; Doehle, B.P.; Bogerd, H.P.; Cullen, B.R. A second human antiretroviral factor, APOBEC3F, is suppressed by the HIV-1 and HIV-2 Vif proteins. EMBO J. 2004, 23, 2451–2458.
  114. Yu, Q.; Chen, D.; Konig, R.; Mariani, R.; Unutmaz, D.; Landau, N.R. APOBEC3B and APOBEC3C are potent inhibitors of simian immunodeficiency virus replication. J. Biol. Chem. 2004, 279, 53379–53386.
Figure 1. DRT Model of Cancer Mutagenesis. The main substrates and single base substitutions, with their strand bias consequences, for deamination-driven reverse transcription (DRT) mutagenesis in progressing cancer genomes are shown for: (a) Stalled Transcription Bubble. The ssDNA sites in the open transcription bubble are targeted by the AID/APOBEC cytosine deaminases, which create C-to-U and abasic lesion sites. Black strands represent DNA. Red strands represent RNA. Blue strands represent cDNA. RNA mutations (G-to-A, G-to-C, G-to-U), indicated by open circles, appear as a consequence of transcription across these AID/APOBEC cytosine deamination lesion sites [15] on the transcribed strand (TS) by the RNA Polymerase II elongation complex (RNA Pol II); the RNA exosome allows access to unpaired cytosines on the TS in the RNA:DNA hybrid [16]. RNA mutations also arise by transcription-coupled ADAR1 deamination of adenine to inosine (A-to-I) in the nascent dsRNA or on both nucleic acid moieties of the annealed RNA:DNA hybrid (9–11 nt), indicated by closed circles. Other subsidiary non-deaminase-driven RNA modifications could include endogenous uracil isomerization to pseudouridine (ψ) to give a U-to-G miscoding substitution [17,18,19,20], indicated by closed triangles; or non-deaminase-driven RNA miscoding mutations (G-to-U) following reactive oxygen species (ROS) generation of 8-oxoG (cf. SBS18 transcriptional strand asymmetry) in nascent RNA or the annealed RNA:DNA hybrids [21], indicated by inverted closed triangles. The last TSRT step is effectively a potentially “error prone” DNA repair process akin to a patch nucleotide excision repair (NER) on the TS, allowing replication of the helix in that damaged genomic region, as discussed at length in Figure 4 of Franklin et al. [22]. Alternate symbol fills symbolize an RNA mutation or modification as a complementary base pairing partner in DNA.
Also see and compare the previously published schematic summary showing the main elements of the reverse transcriptase (RT) mechanism for immunoglobulin (Ig) somatic hypermutation (SHM), termed RT Ig-SHM, and the target site reverse transcription (TSRT) process as a patch correction around DNA lesion sites following Luan et al., 1993 [23], as discussed by Steele et al., 2024 [7], and Figure 2. A site without a base is also called an abasic site. (b) R Loops. See text for more detail on deamination modifications by ADAR1 or ADAR2 [24,25] at long (40 nt–670 nt) annealed RNA:DNA hybrids at R Loops [26]. Black strands represent DNA. Red strands represent RNA. R Loops are often generated under replicative stress in the body of the genome, particularly at transcription replication fork collisions or conflicts (TRCs) on the same strand [25,26], at deaminated A-sites in both the RNA and DNA moieties. These DNA A-to-I modifications are also referred to as hypoxanthine (Hx). As discussed in the text, such deaminations contribute to R Loop dissolution by facilitating the release of the firmly bound RNA and then its degradation by RNaseH activity. After R Loop collapse, the inosine-modified TS (Hx) sites that remain unrepaired will be replicated over and result in excess T-to-C, T-to-A and T-to-G mutations (filled stars) on the NTS. The incidence of these mutations (in the order T-to-C > T-to-A > T-to-G) results in transcriptional strand asymmetry signatures, as discussed in detail in the text and summarized in Table 3.
Figure 2. The reverse transcriptase mechanism of strand-biased immunoglobulin somatic hypermutation (RT-Ig SHM). A schematic outline showing the main mutational events at Stalled Transcription Bubbles generated by RNA Pol II [7]. Black strands represent DNA. Red strands represent RNA. Blue strands represent cDNA. The green sites are A-to-I RNA editing sites in double-stranded RNA or in both the RNA and DNA moieties of RNA:DNA hybrids. A site without a base is also called an abasic site.
Figure 3. Transcriptional Strand Asymmetry Profiles SBS5, SBS3 and SBS1. These profiles are taken directly from the publicly available COSMIC Single Base Substitution Mutational Signatures (v3.4 October 2023) at https://cancer.sanger.ac.uk/signatures/sbs/ (accessed between 7 and 15 April 2024) [10,11,12]. Transcriptional strand asymmetry in SBS5 is notable at ACN and ATN trinucleotides, and to a lesser extent at TCN and TTN (mutated base underlined), where the strong strand bias A>>T and G>>C (Table 1) is now often reversed to T>>A and C>>G when read from the non-transcribed or coding strand. See discussion of Table 1 and Table 2. SBS3 has a far flatter profile, with C>A, C>G and C>T broadly elevated across the C-site trinucleotides and a similarly flat profile at the T>A, T>C and T>G trinucleotides. See Table 1 for statistical significance of the main profiles. Tables S1–S3 summarize the types of mutations observed in different cancer types. For SBS1 the dominant signature is G-to-A exceeding C-to-T systematically across all tissues (p < 0.001), mainly at ACG motifs; the apparent reverse strand bias in the SBS1 profile at CCG, GCG and TCG motifs is not significant.
Figure 4. Transcriptional Strand Asymmetry Profiles SBS5 compared to SBS2 and SBS13. These profiles are taken directly from the publicly available COSMIC Single Base Substitution Mutational Signatures (v3.4 October 2023) at https://cancer.sanger.ac.uk/signatures/sbs/ (accessed on 15 April 2024).
Figure 5. Conversion of text file to Excel file. SBS5 Liver-HCC text file “v3.2_SBS5_TRANSCR_ASYM.xlsx”. https://cancer.sanger.ac.uk/signatures/sbs/sbs5/ (accessed on 15 April 2024).
Figure 6. Conversion of text file to Excel file. Conversion of real mutations to a “Types of Mutation” table.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Steele, E.J.; Lindley, R.A. Deaminase-Driven Reverse Transcription Mutagenesis in Oncogenesis: Critical Analysis of Transcriptional Strand Asymmetries of Single Base Substitution Signatures. Int. J. Mol. Sci. 2025, 26, 989. https://doi.org/10.3390/ijms26030989
