**Bioanalytical and Mass Spectrometric Methods for Aldehyde Profiling in Biological Fluids**

#### **Romel P. Dator, Morwena J. Solivio, Peter W. Villalta \* and Silvia Balbo \***

Masonic Cancer Center, University of Minnesota, 2231 6th Street SE, Minneapolis, MN 55455, USA; rpdator@umn.edu (R.P.D.); msolivio@umn.edu (M.J.S.)

**\*** Correspondence: villa001@umn.edu (P.W.V.); balbo006@umn.edu (S.B.); Tel.: +1-612-626-8165 (P.W.V.); +1-612-624-4240 (S.B.)

Received: 22 March 2019; Accepted: 22 May 2019; Published: 4 June 2019

**Abstract:** Human exposure to aldehydes is implicated in multiple diseases including diabetes, cardiovascular diseases, neurodegenerative disorders (e.g., Alzheimer's and Parkinson's diseases), and cancer. Because these compounds are strong electrophiles, they can react with nucleophilic sites in DNA and proteins to form reversible and irreversible modifications. These modifications, if not eliminated or repaired, can alter cellular homeostasis, cause cell death, and ultimately contribute to disease pathogenesis. This review provides an overview of the current knowledge of the methods and applications of aldehyde exposure measurements, with a particular focus on bioanalytical and mass spectrometric techniques, including recent advances in mass spectrometry (MS)-based profiling methods for identifying potential biomarkers of aldehyde exposure. We discuss the various derivatization reagents used to capture small polar aldehydes and methods to quantify these compounds in biological matrices. In addition, we present emerging mass spectrometry-based methods, which use high-resolution accurate mass (HR/AM) analysis for characterizing carbonyl compounds, and their potential applications in molecular epidemiology studies. The diverse bioanalytical methods presented here, including simple and rapid techniques allowing remote monitoring of aldehydes, real-time imaging of aldehydic load in cells, advances in MS instrumentation, high performance chromatographic separation, and improved bioinformatics tools, enable increased sensitivity for identifying specific aldehydes and new biomarkers of aldehyde exposure. Finally, the combination of these techniques with exciting new methods for single cell analysis provides the potential for detection and profiling of aldehydes at a cellular level, opening up the opportunity to minutely dissect their roles and biological consequences in cellular metabolism and disease pathogenesis.

**Keywords:** aldehydes; genotoxicity; cancer; diseases; oxidative stress; exposure biomarkers; high-resolution mass spectrometry; data-dependent profiling; derivatization; biological fluids; isotope labeling

#### **1. Introduction**

#### *Sources of Human Exposure to Aldehydes*

Aldehydes are characterized by the presence of a reactive –HC=O site and often exist in combination with other functional groups. They are ubiquitous in the environment, originating from man-made sources as well as natural processes (Figure 1). The hydroxyl radical-mediated photochemical oxidation of hydrocarbons generates aldehydes in the atmosphere [1–3]. For instance, formaldehyde is produced from the oxidation of methane and naturally occurring compounds, such as terpenoids and isoprenoids from tree foliage [2]. In industrialized areas, the majority of aldehydes are produced from motor vehicle exhaust (internal diesel engine combustion), which either directly yields aldehydes or generates hydrocarbons, which are eventually converted to aldehydes by photochemical oxidation reactions [1,4–8]. Formaldehyde, acetaldehyde, and acrolein are significant contributors to the overall summed risk of mobile sources of air toxicants according to the United States Environmental Protection Agency (U.S. EPA) [1]. Other sources of aldehydes include agricultural and forest fires, incinerators, and coal-based power plants [9–13]. Additionally, humans are exposed to aldehydes in residential and occupational settings where aldehydes are present in confined spaces [14] due to the release of fumes from indoor furniture, carpets, fabrics, household cleaning agents, cosmetic products, and paints [12,15–18]. Aldehydes are also widely used as fumigants and for biological specimen preservation [1]. Another major source of aldehyde exposure is cigarette smoke. Mainstream tobacco smoke (MTS) contains significant amounts of acetaldehyde as the major component, followed by acrolein, formaldehyde, and crotonaldehyde [19–26]. Similarly, popular devices such as e-cigarettes, which are advocated as safer alternatives to tobacco, have been found to generate high concentrations of aldehydes [27–37]. Aldehydes are also present in food and beverages (as flavorings), and in alcoholic drinks either as congeners or, in the case of acetaldehyde, as the oxidative by-product of ethanol [38–40]. Biotransformation is another source of aldehyde exposure. This includes the metabolism of a sizeable number of environmental agents, such as drugs, tobacco smoke, alcohol, and other forms of xenobiotics [41–43]. Of note, exposure also comes from the metabolism of a number of widely used anticancer drugs such as cyclophosphamide, ifosfamide, and misonidazole, as well as other drugs used for the treatment of diseases such as epilepsy and HIV-1 infection [1]. 
The production of aldehydes is proposed to be an important contributor to the toxicity and undesirable side effects of treatment with these drugs.

**Figure 1.** Exogenous and endogenous sources of human exposure to aldehydes.

Finally, normal cellular metabolic pathways such as lipid peroxidation, Alk-B type repair, histone demethylation, carbohydrate or ascorbate autoxidation, carbohydrate metabolism, and amine oxidase-, cytochrome P-450-, and myeloperoxidase-catalyzed metabolic pathways produce aldehydes endogenously [1,44,45]. The metabolism of molecules such as amino acids, vitamins, and steroids, to name a few, also generates aldehydes [46]. Aldehydes are generally formed during conditions of high oxidative stress. Oxidants are generated as a result of normal intracellular metabolism in the mitochondria, peroxisomes, and a number of cytosolic enzyme systems [47]. These metabolic free radicals and oxidants are referred to as reactive oxygen species (ROS). A balance between ROS production and removal by the antioxidant defense systems is essential to maintaining redox homeostasis. A disturbance in this balance favoring pro-oxidative conditions results in oxidative stress. An elevated level of ROS and the resulting oxidative stress lead to biological damage and are implicated in aging and in the pathologies of various conditions including cancer, cardiovascular, inflammatory, and neurodegenerative diseases [47,48]. The generation of aldehydes is one important consequence of sustained oxidative stress, which can result in the auto-oxidation of lipids (damaging cell membranes) and fatty acids within cells. Lipid peroxidation occurs when a variety of ROS and/or reactive nitrogen species (RNS) oxidize lipids containing carbon-carbon double bonds, especially polyunsaturated fatty acids, resulting in free radical chain reactions and subsequent formation of by-products such as lipid radicals, hydrocarbons, and aldehydes [49]. The correlation between elevated ROS and aldehyde production has been extensively studied and is known to contribute to a multitude of disease pathologies by altering proteomic, genomic, cell signaling, and metabolic processes [50,51]. Indeed, 4-hydroxy-2-nonenal (4-HNE) and malondialdehyde (MDA) are both used as markers of the magnitude of oxidative stress and lipid peroxidation [52]. Dietary consumption of polyunsaturated fatty acids and the subsequent oxidation of these molecules can also result in the formation of aldehydes. The carbohydrate or ascorbate autoxidation pathways generate endogenous glyoxal, which is a major lipid and DNA oxidative degradation product [1]. 
Likewise, methylglyoxal is produced through the enzymatic reactions of triose phosphate intermediates such as glyceraldehyde-3-phosphate and dihydroxyacetone phosphate during glycolysis or from the metabolism of ketone bodies or threonine [53]. Serum amine oxidase (SAO) and polyamine oxidase (PAO) also generate endogenous aldehydes by catalyzing the deamination of biogenic amines [1]. In summary, the dysregulation of metabolic processes and oxidative stress results in lipid peroxidation, carbohydrate auto-oxidation, protein oxidation, and polyamine catabolism, all of which result in aldehyde formation.

#### **2. Biological Consequences of Aldehyde Exposure on Genome Integrity, Carcinogenesis, and Other Diseases**

Low molecular weight aldehydes such as formaldehyde, acetaldehyde, and acrolein are generally toxic compounds. The majority of the most abundant aldehydes are irritants at high doses, and, due to their volatility, induce acute inhalation toxicity. Additionally, aldehydes are believed to play major roles in various debilitating diseases such as cancer and neurodegeneration. Aldehydes are highly reactive, electrophilic compounds, which can exert their detrimental role through interactions with various biomolecules such as phospholipids, peptides, regulatory proteins, enzymes, and DNA forming covalent modifications, affecting their normal functions and leading to mutations and chromosomal aberrations. These mediated effects vary from physiological and homeostatic, to cytotoxic, mutagenic, and carcinogenic [54,55]. Figure 2 shows the structures of common aldehydes implicated in the pathogenesis of multiple human diseases.

Formaldehyde, and acetaldehyde associated with alcohol consumption, have been classified as Group 1 human carcinogens by the International Agency for Research on Cancer (IARC) [56–59]. Both compounds are believed to exert their carcinogenic effects by reacting with DNA, forming covalent modifications known as DNA adducts [60–66]. These adducts, if not repaired or eliminated, may translate into mutations and ultimately into dysregulation of normal cellular growth. Aldehyde toxicity is also implicated in aging and age-related diseases such as cardiovascular and neurological disorders [67–70]. Unlike free radicals, which have short half-lives ranging from nanoseconds to milliseconds, reactive carbonyl compounds (RCCs) including aldehydes are more stable, with half-lives ranging from minutes to hours. Because of this relative stability, aldehydes are long-lived and can therefore diffuse from their point of origin and attack intracellular and extracellular targets that are distant from the radical events [71,72].

**Figure 2.** Structures of common aldehydes associated with various human diseases.

Mounting evidence indicates that endogenous aldehydes, such as MDA, 4-HNE, 3-aminopropanal (3-AP), acrolein, formaldehyde, and methylglyoxal, are mediators of neurodegeneration [73] and aldehydes formed during lipid peroxidation (advanced lipoxidation end-products, ALEs) and sugar glycoxidation (advanced glycoxidation end-products, AGEs) accumulate in several oxidative stress and aging disorders [74]. Aldehydes foster oligomerization of proteins and peptides found in neuritic plaques, which is a characteristic of Alzheimer's disease (AD) [75–77]. Physiological concentrations of these aldehydes range from nM to several hundred μM [78,79]. Methylglyoxal concentration in human blood is estimated to be in the 100–120 nM range, while its cellular concentration is about 1–5 μM and 0.1–1 μM for glyoxal [80–82]. MDA concentration in serum is 0.93 ± 0.39 μM [83] and 4-HNE concentration in cells is less than 1 μM [52]. Likewise, the levels of acrolein formed by metabolism are hard to quantify and may reach very high levels in certain microenvironments [84]. Increased levels of these aldehydes in brain and cerebrospinal fluid (CSF) were reported for various neurodegenerative disorders [85]. The levels of 4-HNE are found to increase in the brain regions of deceased AD patients compared to age-matched controls [86], and are elevated in CSF of AD patients compared to healthy controls [87]. Likewise, acrolein is found to be elevated in the amygdala and hippocampus/parahippocampal gyrus in brains of AD patients compared to controls [88]. Protein carbonylation has been associated with the progression of several neurodegenerative disorders including AD, Parkinson's disease (PD), multiple sclerosis (MS) and amyotrophic lateral sclerosis (ALS).

Methylglyoxal is found at significantly higher levels in diabetic patients compared to healthy controls [89], while 4-HNE is known to form adducts with mitochondrial proteins (specifically through interactions with cysteine, histidine, and lysine residues), lipids, and DNA, resulting in mitochondrial malfunction. The mitochondrial electron transport chain is the most important source of endogenous ROS, converting 1–2% of the total oxygen consumed into superoxide anions [90,91]. An estimated 1–8% of the 4-HNE produced in cells forms adducts with proteins, with 30% of this adduct formation occurring in the mitochondria, making it consequential in ROS production [51,92,93]. In some cases, ROS overproduction has been associated with mutations in a mitochondrial gene that encodes a component of the electron transport chain [94]. Increasing damage to mitochondrial DNA inevitably results in compromised mitochondrial function and integrity, leading to a vicious cycle of ROS generation and DNA damage [91]. Oxidative damage to mitochondrial DNA in the heart and the brain has been shown to decrease the lifespan in mammals, and mitochondrial dysfunction has been associated with some neurological disorders including AD, PD, Huntington's disease (HD), and ALS [48,95].

Finally, endogenous aldehydes may also play a role in the free radical theory of aging at the molecular level, which has gained widespread attention and acceptance. In this context, aging is viewed as a process related to an imbalance favoring pro-oxidant over antioxidant molecules (either by ROS elevation or an age-related downregulation of antioxidant molecules and ROS-mitigating enzymes) and consequently an increase in oxidative stress and the level of aldehydes resulting from it [72].

Despite the fact that these molecules fundamentally underlie early events driving the initiation and propagation of various pathologies, their exact role and diagnostic or prognostic value as clinical biomarkers have been underexploited [96]. The complete cellular "aldehydic load" is considered an important parameter for appraisal of these pathologic statuses [97,98]. Developing methods to detect free aldehydes in biological systems is important in understanding the roles and functions of these molecules in cellular processes and disease pathogenesis. The measurement of free aldehydes has the potential to be used to characterize exposure, but also to identify biomarkers for early disease diagnosis, monitor disease progression and response to therapy, and investigate physiological malfunctions such as high oxidative stress.

#### **3. Metabolism of Aldehydes**

As outlined in the previous section, excessive exposure to aldehydes can result in the disruption of a number of cellular functions, which can ultimately contribute to human diseases. The balance between the activation and detoxification of aldehydes will dictate their toxicity, which is dependent on the aldehyde itself and the presence of aldehyde metabolizing enzymes in cells. Several metabolic pathways and metabolizing enzymes are responsible for the metabolism and detoxification of aldehydes. These enzymes include aldehyde-oxidizing enzymes, aldehyde-reducing enzymes, and glutathione (GSH)-dependent aldehyde metabolizing enzymes, as previously reviewed by O'Brien [1]. For instance, 4-HNE is metabolized by glutathione S-transferase (GST) and aldehyde dehydrogenase 2 (ALDH2), and to a minor extent alcohol dehydrogenase (ADH) in rat hepatocytes [92,99–102]. Methylglyoxal is likely metabolized by glyoxalase (GLOX) and reduced by aldo-keto reductase (AKR) 1A2 [1]. The inhibition of ALDH2 activity, with the consequent increase in the level of aldehydes by oxidative stress was also observed in humans and diabetic mice during aging and is associated with cardiac dysfunction [103]. Elimination and in vivo metabolism of alkanals and aromatic aldehydes is via dehydrogenase-catalyzed oxidation. Likewise, the main in vivo elimination and metabolism of alkenals such as acrolein is via glutathione conjugation catalyzed by glutathione transferases [1].

In the case of formaldehyde, its metabolism is known to be mediated by alcohol and aldehyde dehydrogenases, ADH5 and ALDH2, respectively. Depletion of GSH levels in hepatocytes and inhibition of these enzymes result in a marked increase in formaldehyde cytotoxicity [104]. Formaldehyde is a potent DNA and protein cross-linking molecule that organisms produce in vast quantities, through one carbon metabolism (1C-metabolism), and in processes such as enzymatic demethylation of histones and nucleic acids [105]. This is supported by the blood formaldehyde concentration, which ranges from 20–100 μM, and 200–400 μM in a healthy human brain, indicating a substantial source of this molecule [106–109]. A study on mice revealed a two-tier protection mechanism, shielding mice from high levels of endogenous formaldehyde. The first tier involved the enzyme ADH5, which eliminates formaldehyde, while the Fanconi Anemia pathway for cross-link repair reverts DNA damage due to formaldehyde. It was hypothesized that ADH5-dependent formaldehyde oxidation into formate could provide 1C units to enable nucleotide synthesis [110]. Formaldehyde reacts spontaneously with intracellular GSH, present in substantial amounts to form S-hydroxymethylglutathione (HMGSH), which undergoes oxidation by ADH5 and NAD(P)<sup>+</sup> to generate S-formylglutathione (FGSH), which is subsequently converted by S-formylglutathione hydrolase (FGH) regenerating GSH and yielding formate. The formate formed in this process is eventually used in biosynthetic reactions [111], thus showing that formaldehyde detoxification produces a 1C unit sustaining essential metabolism [55], including the biosynthesis of purines and thymidine, homeostasis of amino acids glycine, serine, and methionine, epigenetic maintenance, and redox defense [112]. This biochemical route of formaldehyde detoxification can therefore provide the cell with utilizable 1C units [111]. 
Since this genotoxic molecule is generated in large amounts in the human body, a steady-state balance between formaldehyde generation and removal is established due to detoxification by cellular enzymes including alcohol dehydrogenase 1 (ADH1), which reduces cytosolic formaldehyde to methanol; mitochondrial ALDH2; cytosolic alcohol dehydrogenase 3 (ADH3), also known as glutathione-dependent formaldehyde dehydrogenase; and the previously mentioned ADH5, all of which are responsible for formaldehyde metabolism [113–116].

Aldehydes are oxidized by the aldehyde dehydrogenase superfamily, of which 16 genes and 3 pseudogenes have been identified in the human genome, including ALDH1A, ALDH2, ALDH1B1, ALDH3A1, and ALDH3A2. ALDH2, for example, is efficient at metabolizing acetaldehyde, a reactive metabolite of ethanol, to acetate and likely plays a major role in reducing the toxicity of aldehydes in humans [117]. Likewise, the aldehyde-reducing enzymes are another group of enzymes, responsible for the reduction of aldehydes to alcohols, and they can be divided into several classes according to the cofactors they require. The ADH superfamily preferentially uses NADH to reduce aldehydes to alcohols, while using NAD<sup>+</sup> to catalyze the reverse reaction, but to a lesser extent [1]. This class of enzymes is located in the cytosol and includes ADH1, ADH2, and ADH3. The aldo-keto reductase superfamily uses NADPH solely, while others use both NADPH and NADH. This class of enzymes includes AKR1A1, AKR1C, and AKR7A1. The short-chain dehydrogenase/reductase superfamily is another class of aldehyde-reducing enzymes responsible for the detoxification of aldehydes in cells. This class of enzymes includes carbonyl reductase (CR) and hydroxypyruvate reductase (GRHPR). CR is considered the main quinone oxidoreductase in human liver and catalyzes the two-electron reductive detoxification of quinones, including PAHs [118]. Another class of aldehyde-metabolizing enzymes is GSH-dependent and includes ADH5, GSTs, and glyoxalase 1 (GLO1). The class III alcohol dehydrogenase detoxifies formaldehyde via glutathione conjugation. Glutathione conjugation is catalyzed by glutathione transferases and predominantly forms conjugates with alkenals and hydroxyalkenals. Glyoxal and methylglyoxal are metabolized by glutathione conjugation and subsequent isomerization by glyoxalases [1]. The activities of these enzymes in living cells dictate the toxicity of aldehydes. 
Given the well-established roles of reactive carbonyls in cellular metabolism and their contributions to human diseases, methods that allow the elucidation of their roles and functions in biological systems are needed. As a panel of biomarkers, these compounds could be used to determine exposure, support early disease diagnosis, and monitor disease progression and therapeutic efficacy.

#### **4. Bioanalytical and Mass Spectrometric Methods for Characterizing Aldehydes**

A wide variety of analytical and biochemical techniques are used to identify and quantify aldehydes. Traditionally, the analysis of aldehydes or carbonyl compounds has been performed on matrices such as air, water, and soil for environmental monitoring of air and water quality by US federal agencies and standards organizations such as the US EPA, NIOSH, and ASTM (see Section 4.2 below) [119–123]. Because aldehydes play important roles in cellular processes and are linked to various diseases, these methods were further extended to the identification and characterization of these compounds in biological fluids such as plasma, cerebrospinal fluid (CSF), urine, exhaled breath condensate (EBC), and saliva. One challenging aspect of measuring aldehydes in biological matrices is their inherent volatility, polarity, and biochemical instability. Thus, derivatization is commonly used for the analysis of low molecular weight aldehydes in complex matrices to improve chromatographic separation, MS ionization, and MS/MS fragmentation detectability [119,124–127]. A wide range of derivatization reagents, as previously reviewed by Santa [124], and analytical methods are being applied for the analysis of carbonyl compounds in food and beverages, as previously reviewed by Osorio [39]. The different derivatization techniques and analytical methods used to identify and measure these compounds each have their strengths and limitations, and the most suitable technique and experimental strategy depend on the information one wants to obtain. Nonetheless, methods to improve the overall sensitivity and detection of aldehydes in complex biological matrices are still being developed to enable trace level analysis and allow elucidation of their contributions and impact on human health.

#### *4.1. Colorimetric*/*Fluorimetric*/*Amperometric Methods*

One of the most commonly used methods for the analysis of aldehydes in biological fluids is the assay of thiobarbituric acid reactive substances (TBARS), which are produced under high oxidative stress conditions resulting from lipid peroxidation. Oxidation of lipids generates reactive and unstable lipid hydroperoxides, and further decomposition of these hydroperoxides yields MDA, a well-known biomarker of oxidative stress. MDA forms a 1:2 adduct with 2-thiobarbituric acid (2-TBA) that can be measured spectrophotometrically or fluorimetrically [128,129] (Figure 3). Although the specificity of this approach is in question, as TBA can react with compounds other than MDA, it is still widely applied to measure lipid peroxidation in various biological samples including animal and human tissues and biofluids, as well as food and drugs [129]. One strategy employed to overcome this limitation is the prior precipitation of lipoproteins to eliminate interfering soluble 2-TBA-reactive substances. Because interfering TBARS are minimized, the assay becomes quite specific for lipid peroxidation [129,130]. In addition, extraction of MDA-reactant adducts is also employed; however, this approach introduces another time-consuming step and adversely affects the precision of the assay [130].

**Figure 3.** Reaction of 2-thiobarbituric acid (2-TBA) with malondialdehyde (MDA), a biomarker of oxidative stress. 2-TBA reacts with MDA to form a colored product, which is measured spectrophotometrically at 532 nm. The intensity of the colored product reflects the level of lipid peroxidation in the sample.
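
The spectrophotometric readout of the TBARS assay can be converted into an MDA-equivalent concentration with the Beer-Lambert law. The following is a minimal sketch, not a validated protocol: the molar absorptivity is a commonly cited literature value for the MDA-(2-TBA)<sub>2</sub> adduct at 532 nm and is an assumption here, since it is not given in this review, and the function name, path length, and dilution factor are hypothetical. In practice, a calibration curve prepared with an MDA standard is often preferred.

```python
# Beer-Lambert estimate of MDA equivalents from a TBARS absorbance reading.
# ASSUMPTION: molar absorptivity of the MDA-(TBA)2 adduct at 532 nm; this is a
# commonly cited literature value and is not taken from this review.
EPSILON_532_L_PER_MOL_CM = 1.56e5


def mda_concentration_um(absorbance, path_cm=1.0, dilution=1.0,
                         epsilon=EPSILON_532_L_PER_MOL_CM):
    """Return the MDA-equivalent concentration in micromolar.

    absorbance : blank-corrected absorbance at 532 nm
    path_cm    : optical path length in cm (hypothetical default of 1 cm)
    dilution   : factor by which the sample was diluted before reading
    """
    if absorbance < 0:
        raise ValueError("absorbance must be non-negative (blank-corrected)")
    if path_cm <= 0 or epsilon <= 0 or dilution <= 0:
        raise ValueError("path length, epsilon, and dilution must be positive")
    molar = absorbance / (epsilon * path_cm)   # A = epsilon * l * c
    return molar * 1e6 * dilution              # mol/L -> umol/L, undo dilution


if __name__ == "__main__":
    for reading in (0.050, 0.156, 0.300):
        print(f"A532 = {reading:.3f} -> {mda_concentration_um(reading):.2f} uM MDA equivalents")
```

Because 2-TBA also reacts with compounds other than MDA, the result is best reported as "MDA equivalents" rather than as an MDA concentration.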

Another rapid and simple strategy to determine aldehydes in biological fluids, such as saliva, is the use of a microfluidic paper-based analytical device (μPAD) [131]. This device is based on the reaction of aldehydes with 3-methyl-2-benzothiazolinone hydrazone (MBTH) and iron (III) to form a blue formazan complex, which can be evaluated visually (Figure 4) [131]. This approach is simple, rapid, and non-invasive for the analysis of salivary aldehydes, which could be useful in assessing oral cancer risk in population-based studies and in point-of-care diagnostics for aldehyde exposure. Methods based on capillary electrophoresis, coupled with amperometric detection (CE-AD) and using electroactive 2-TBA, have been developed and used to analyze two non-electroactive aldehydes, methylglyoxal and glyoxal, in urine and water samples. This method demonstrates good specificity for methylglyoxal and glyoxal through the formation of stable pink-chromophore adducts with 2-TBA. Using this approach, the limits of detection (LODs) obtained are 0.2 μg L<sup>−1</sup> (0.6 nmol L<sup>−1</sup>) and 1.0 μg L<sup>−1</sup> (3.2 nmol L<sup>−1</sup>) for methylglyoxal and glyoxal, respectively [132]. The approaches described above are simple, and the instrumentation is easy to operate for rapid screening of aldehydes in various matrices. In addition, these analytical techniques can be applied for remote monitoring of aldehydes where more sophisticated bioanalytical tools and mass spectrometry instrumentation are not available. The limitations of these techniques, however, are their low specificity and selectivity for identifying aldehydes, which can be further confounded by increased matrix complexity.

**Figure 4.** Reaction of MBTH with aldehydes to form an intense blue-colored complex. Figure adapted from Reference [131] (Copyright 2016, Elsevier).

#### *4.2. High-Performance Liquid Chromatography (HPLC) with Ultraviolet (UV)*/*Fluorescence Detection*

Historically, HPLC-UV has been the method of choice for characterizing and quantifying aldehydes in a wide array of matrices, and HPLC-UV methods were originally developed for environmental analysis. The characterization and quantification of aldehydes have since gained widespread use in the food and beverage industry and in the biomedical field, where aldehydes have been shown to play major roles in cellular processes and disease pathogenesis. The derivatization of carbonyl compounds is typically accomplished using 2,4-dinitrophenylhydrazine (DNPH) to form the corresponding carbonyl-hydrazones, which are then analyzed by HPLC with ultraviolet detection. HPLC-UV detection is commonly used to characterize and quantify carbonyl compounds in various matrices because of its simplicity, robustness, and reproducibility. DNPH derivatization and HPLC-UV analysis are used in environmental monitoring of air and water quality and for screening and monitoring carbonyl compounds in various matrices by the US federal agencies (Table 1) [119,133–137]. The HPLC-UV technique is also being used in the food industry to measure aldehydes in food and beverages [39,138–142] and in biomedical research to measure aldehydes and carbonyls in various matrices such as urine, plasma, and serum samples [40,143–152]. DNPH derivatization is also used in conjunction with a reducing agent, 2-picoline borane (2-PB), to stabilize carbonyl-hydrazones and to resolve isomeric compounds produced during the reaction that might interfere with subsequent quantitative analysis by HPLC-UV [153]. DNPH and hydroquinone impregnated into silica cartridges have been used for the determination of acrolein and other carbonyl compounds in cigarette smoke [22]. This approach is useful for characterizing carbonyls in air samples for environmental analysis as well as for the characterization of other α,β-unsaturated aldehydes in tobacco smoke. DNPH derivatization was also used for the analysis and measurement of acetaldehyde in plasma and red blood cells [154], formaldehyde determination in human tissue [151], carbonyl compounds in exhaled breath of e-cigarette users [35], and for the measurement of formaldehyde released from heated hair straightening cosmetic products [18]. Other reagents such as the previously mentioned 2-thiobarbituric acid (2-TBA) and diaminonaphthalene (DAN) are also being used for HPLC-UV analysis of carbonyl compounds from biological matrices and environmental samples [155–157].
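
Hydrazone formation with DNPH is a condensation in which one molecule of water is lost per carbonyl group, so the mass of a carbonyl-hydrazone follows directly from the elemental formulas. The following is a minimal sketch for mono-derivatized carbonyls; the function names and the selection of aldehydes are illustrative assumptions, the isotope masses are standard reference values rather than values from this review, and dicarbonyls such as glyoxal, which can react with two DNPH molecules, are not handled.

```python
import re

# Monoisotopic masses of the most abundant isotopes (standard reference values).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}

DNPH = "C6H6N4O4"   # 2,4-dinitrophenylhydrazine
WATER = "H2O"       # lost in the condensation step


def parse_formula(formula):
    """Parse a simple Hill-style formula such as 'C6H12O' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def dnph_hydrazone_mass(carbonyl_formula):
    """Neutral monoisotopic mass of the mono-DNPH hydrazone: M + DNPH - H2O."""
    return (monoisotopic_mass(carbonyl_formula)
            + monoisotopic_mass(DNPH)
            - monoisotopic_mass(WATER))


# Illustrative selection of aldehydes discussed in this review.
ALDEHYDES = {
    "formaldehyde": "CH2O",
    "acetaldehyde": "C2H4O",
    "acrolein": "C3H4O",
    "hexanal": "C6H12O",
    "heptanal": "C7H14O",
}

if __name__ == "__main__":
    for name, formula in ALDEHYDES.items():
        print(f"{name:13s} {formula:7s} hydrazone M = {dnph_hydrazone_mass(formula):.4f} Da")
```

The values are neutral monoisotopic masses; the mass of a proton or another adduct ion would still need to be added or subtracted to obtain the m/z observed in a given ionization mode.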

To improve sensitivity and allow for simultaneous derivatization and extraction of derivatized carbonyls for HPLC-UV analysis, a wide array of sample preparation techniques have been introduced into the analytical workflows. For instance, a method for the quantification of the early lung cancer biomarkers hexanal and heptanal in urine has been developed using a bar adsorptive microextraction (BAμE) technique and DNPH derivatization. This approach uses an adsorptive bar impregnated with the derivatization reagent for simultaneous derivatization and extraction of derivatized carbonyls. The LODs obtained for hexanal and heptanal are 0.80 μmol L<sup>−1</sup> (800 nmol L<sup>−1</sup>) and 0.40 μmol L<sup>−1</sup> (400 nmol L<sup>−1</sup>), respectively [145]. Similarly, magnetic solid phase extraction coupled with in-situ DNPH derivatization (MSPE-ISD) was developed for the determination of hexanal and heptanal in urine. The extraction, purification, and derivatization of aldehydes are integrated into a single analytical step, simplifying the measurement workflow. The LODs are 1.7 and 2.5 nmol L<sup>−1</sup> for hexanal and heptanal, respectively. Using this approach, the levels of hexanal and heptanal in the urine of lung cancer patients were found to be higher compared to healthy controls [147]. Another method for the analysis of hexanal and heptanal in plasma used DNPH adsorbed on a polymer monolith composed of poly(methacrylic acid-co-ethylene glycol dimethacrylate) for simultaneous derivatization and microextraction, followed by HPLC-UV analysis. The LODs obtained are 2.4 and 3.6 nmol L<sup>−1</sup> for hexanal and heptanal, respectively [150]. This monolith microextraction technique was further extended to the analysis of 5-hydroxymethylfurfural (5-HMF) in foods and beverages such as coffee, honey, beer, and soda, as well as in urine [142]. In addition, a method using dispersive liquid–liquid microextraction of DNPH-derivatized aldehydes with 1-dodecanol has been developed. The sample was centrifuged and the droplet subsequently solidified on an ice bath for easy removal of the derivatized compounds for HPLC-UV analysis. The LODs obtained for hexanal and heptanal are 7.90 nmol L<sup>−1</sup> and 2.34 nmol L<sup>−1</sup>, respectively. This approach afforded higher sensitivity compared to the conventional liquid-liquid microextraction methods [146]. An alternative approach developed by the same group uses ultrasound-assisted headspace liquid-phase microextraction with in-drop derivatization for the extraction and determination of hexanal and heptanal in blood. This technique uses a polychloroprene PCR tube containing the extraction solvent, methyl cyanide, and the derivatization reagent, DNPH. Volatile aldehydes are then headspace extracted and derivatized simultaneously in the droplet and analyzed by HPLC-UV. The LODs for hexanal and heptanal are 0.79 nmol L<sup>−1</sup> and 0.80 nmol L<sup>−1</sup>, respectively [148].

**Table 1.** DNPH derivatization and HPLC-UV analysis of carbonyl compounds for environmental analysis.


In addition to UV detection, fluorogenic derivatization reagents for the HPLC analysis of aldehydes are widespread in the literature. These tagging reagents are used either as pre-column labeling reagents or in one-pot derivatization of aldehydes. For instance, the labeling reagent 1,3,5,7-tetramethyl-8-aminozide-difluoroboradiaza-s-indacene (BODIPY-aminozide) is used as a pre-column derivatization reagent to monitor aldehydes in human serum by HPLC with fluorescence detection [158]. The BODIPY-based reagent reacts with aldehydes to form stable and highly fluorescent BODIPY hydrazone derivatives, which are easily separated and detected by HPLC with fluorescence detection at 495 nm (maximum excitation wavelength) and 505 nm (maximum emission wavelength). This approach is used to measure trace aliphatic aldehydes in serum samples without a pretreatment or enrichment step [158]. Other reagents used for pre-column labeling are 2,2′-furil to label aldehydes [159] and 4-(*N,N*-dimethylaminosulfonyl)-7-hydrazino-2,1,3-benzoxadiazole to label 4-HNE in human serum [160]. For the one-pot derivatization of aldehydes, rhodamine B hydrazide (RBH) [161], 2-aminoacridone [162], 9-fluorenylmethoxycarbonyl hydrazine (FMOC-hydrazine) [163], and 2-TBA [164] are used for the determination of malondialdehyde in biological fluids [161] by HPLC with fluorescence detection. For the determination of methylglyoxal, glyoxal, and diacetyl using HPLC-fluorescence, the most commonly used derivatization reagents are 4-methoxy-*o*-phenylenediamine (4-MPD) [165] and 1,2-diamino-4,5-dimethoxybenzene (DDB) [166]. Monitoring of methylglyoxal and glyoxal in diabetic patients has been proposed to help assess the risk of developing diabetic complications. Additionally, an increase in oxidative stress biomarkers has been reported in juvenile swimmers, but no prior data have been reported on α-ketoaldehydes in urine associated with swim training. Thus, these methods were applied to compare the levels of these molecules in urine samples from healthy volunteers, diabetic subjects, and juvenile swimmers [165]. For the analysis of acrolein formed in plasma by the metabolism of drugs such as cyclophosphamide and ifosfamide, luminarin 3 [167] and *m*-aminophenol [168] were used for derivatization followed by HPLC-fluorimetric analysis [167].

HPLC coupled with UV or fluorescence detection is widely used for aldehyde analysis in various environmental and biological matrices. These techniques have been the methods of choice as they offer good sensitivity and robustness. Along with innovative sample pre-treatment incorporated into the assays, low detection limits were obtained for quantifying specific biomarkers associated with various diseases. However, these methods do not provide structural information relating to the analyte of interest and require synthetic standards for analyte identification and confirmation. Finally, co-eluting peaks during HPLC separation can further confound the identification and quantitation of known and unknown carbonyl compounds via UV or fluorescence.

#### *4.3. Aldehyde Visualization in Cells*

In addition to HPLC with fluorimetric detection, fluorescent probes such as FP1 and FAP-1 have been designed and synthesized for real-time visualization of formaldehyde in cells [169,170]. These formaldehyde probes are based on the 2-aza-Cope sigmatropic rearrangement, which yields a highly fluorescent signal for the selective and sensitive detection of aldehydes in cells [169,170]. Recently, a novel technique based on real-time imaging of aldehydes in cells using multicolor fluorogenic hydrazone transfer ("DarkZone") was developed (Figure 5). This approach used a cell permeable DarkZone dye (7-(diethylamino)coumarin; DEAC) as a quenched hydrazone, which lights up when the quencher-aldehyde is replaced by the target aldehyde. The fluorescence signals are then detected by flow cytometry or microscopy without the need for washing or cell lysis. This strategy is useful for determining the aldehyde load associated with human diseases [171]. More recently, a fluorescent probe to visualize specific and total biogenic carbonyls was developed based on the pattern and fluorescence spectral profile unique to the target carbonyl compound. The probe is based on an *N*-aminoanthranilate methyl ester moiety [96]. These techniques offer real-time monitoring of total aldehydes in cells and can distinguish certain aldehydes based on their unique fluorescence excitation and emission spectra. Overall, real-time imaging of aldehyde production in cells using aldehyde-specific probes allows elucidation of the roles and functions of these compounds in cellular processes and their involvement in disease pathogenesis. However, because no structural information is obtained, these techniques offer limited selectivity and specificity for identifying individual carbonyls in cells. Finally, these techniques are not applicable to biological matrices such as blood, urine, CSF or saliva.

**Figure 5.** Real-time imaging of total aldehydic load in cells. Cellular aldehyde labeling fluorescence images and flow cytometry data. HeLa cells were exposed to varying concentrations of: (**a**) formaldehyde; (**b**) glycolaldehyde; (**c**) acrolein; and (**d**) acetaldehyde along with 20 μM of the dye AFDZ and 10 mM catalyst (2,4-dimethoxyaniline) with images taken after 1 h of incubation. Note that 50 μM was used with acrolein and 100 μM for the other aldehydes tested. (**e**) K562 cells pretreated with 250 μM daidzin and incubated with 40 μM of AFDZ dye, 10 mM catalyst (2,4-dimethoxyaniline), and with/without 20 mM ethanol. (**f**) Flow cytometry data monitoring the production of aldehyde over time in K562 cells with/without ethanol. The fluorescence intensities were compared to that obtained from *t* = 0 without added ethanol and daidzin. Scale bars (20 μm) are shown. Reprinted from [171] (Copyright 2016, American Chemical Society).

#### *4.4. Gas Chromatography (GC)*/*Gas Chromatography-Mass Spectrometry (GC-MS)*

Mass spectrometry is widely used for the characterization and quantification of carbonyl compounds, providing more selectivity, specificity, and sensitivity than is possible with UV or fluorescence detection [39,124,172]. A wide variety of derivatization reagents and sample preparation methods are used to enhance the detection and sensitivity of mass spectrometric analysis of aldehydes (Table 2). For GC-MS analysis, derivatization increases the volatility of aldehydes in biological fluids and is most commonly done with *O*-2,3,4,5,6-pentafluorobenzyl hydroxylamine hydrochloride (PFBHA), which has been used for the analysis of saliva-available carbonyls in chewing tobacco products [173], to measure methylglyoxal and glyoxal in plasma of diabetic patients [174] and formaldehyde in urine [175], and for the determination of MDA and 4-HNE levels in plasma [176]. In addition, PFBHA derivatization is often combined with headspace microextraction, with derivatization performed on-fiber or on-droplet, so that volatile carbonyls are extracted and derivatized simultaneously before GC-MS analysis. For instance, a quantitative method for the analysis of hexanal, heptanal, and other volatile aldehydes in human blood was developed using headspace solid-phase microextraction with on-fiber derivatization with PFBHA and subsequent analysis by GC-MS. This approach afforded LODs of 0.006 nM (0.006 nmol L<sup>−1</sup>) and 0.005 nM (0.005 nmol L<sup>−1</sup>) for hexanal and heptanal, respectively [177,178]. Similarly, this approach is implemented for the determination of hexanal, heptanal, octanal, nonanal, and decanal in exhaled breath [179,180] and for the analysis of volatile low molecular weight carbonyls in urine [181]. Likewise, several C3–C9 aldehydes have been identified as promising biomarkers of non-small cell lung cancer (NSCLC) in the exhaled breath of patients with lung cancer using on-fiber derivatization with PFBHA. The LOD and LOQ obtained for all aldehydes are 0.001 nM and 0.003 nM, respectively [182]. On-fiber derivatization using 2,2,2-trifluoroethylhydrazine (TFEH) as derivatization reagent is also used for the analysis of MDA in blood [183].

In addition, PFBHA derivatization on droplet is used for the analysis of hexanal and heptanal in blood [184]. In this strategy, the derivatizing agent is dissolved in an organic solvent such as decane, and volatile aldehydes are headspace extracted and derivatized in the droplet, which is subsequently injected for GC-MS analysis. Likewise, a stir bar sorptive extraction (SBSE) method for the GC-MS analysis of 4-HNE in urine was developed. This approach used a stir bar impregnated with the derivatization agent, PFBHA. The resulting oximes were further acylated using sulfuric acid and thermally desorbed and analyzed by GC-MS. This approach affords an LOD of 22.5 pg mL<sup>−1</sup> (0.06 nmol L<sup>−1</sup>) and an LOQ of 75 pg mL<sup>−1</sup> (0.19 nmol L<sup>−1</sup>) for the target carbonyl, 4-HNE [185]. PFBHA is also used in combination with other derivatization reagents. For example, a novel two-step derivatization approach using PFBHA as the first derivatizing agent followed by *N*-methyl-*N*-trimethylsilyl-trifluoroacetamide (MSTFA) was developed for the analysis of glyoxal, methylglyoxal, and 3-deoxyglucosone in human plasma by GC-MS [186]. Other derivatization reagents used for GC-MS are 2,3,4,5,6-pentafluorobenzyl bromide (PFB-Br) [187,188] and 2,4,6-trichlorophenylhydrazine (TCPH) [189] for the analysis of MDA in urine; phenylhydrazine (PH) for the analysis of MDA in plasma and rat liver microsomes [190]; pentafluorophenyl hydrazine (PFPH) for the analysis of carbonyls in MTS [23]; 2,3-diaminonaphthalene along with salting-out assisted liquid–liquid extraction (SALLE) and dispersive liquid–liquid microextraction (DLLME) for the analysis of glyoxal and methylglyoxal in urine [191]; and meso-stilbenediamine [192] and 1,2-diaminopropane [193] for the analysis of methylglyoxal in the serum of diabetic patients and healthy controls by capillary GC-FID.

Methods based on gas chromatography without prior derivatization are also used for the analysis of volatile aldehydes. For example, GC-MS coupled to a headspace generation autosampler is used for the analysis of endogenous aldehydes in urine as potential biomarkers of oxidative stress [194] and of carbonyls such as acetaldehyde, propionaldehyde, acrolein, and crotonaldehyde in MTS [195]. Similarly, acetaldehyde in the saliva of subjects after alcohol consumption is determined without prior derivatization using headspace extraction and GC coupled with a flame ionization detector (FID) [40]. Direct analysis without derivatization has also been applied to characterize toxic compounds such as benzene, toluene, butyraldehyde, benzaldehyde, and tolualdehyde in saliva using micro-solid-phase extraction (μSPE) and GC-IMS [196]. Gas chromatography coupled with detection systems such as FID and mass spectrometry is an ideal tool for the direct analysis of low molecular weight, volatile carbonyl compounds in complex matrices. However, these methods require derivatization for the analysis of high-molecular weight, less volatile carbonyls.


**Table 2.** Bioanalytical techniques for characterizing carbonyl compounds.

NR, not reported; MTS, mainstream tobacco smoke; RLM, rat liver microsomes.

#### *4.5. Liquid Chromatography-Mass Spectrometry (LC-MS)*

#### 4.5.1. Methods Based on Selected Reaction Monitoring (SRM)

Liquid chromatography–mass spectrometry-based approaches have been used extensively to quantify derivatized carbonyl compounds, and recently for screening of unknown carbonyl compounds. Aldehyde derivatizations using 2,4-DNPH [143,197–202], dansylhydrazine (DnsHz) [203,204], *N*-(1-chloroalkyl)pyridinium [205], *o*-phenylenediamine [206], D-cysteine [207], 9,10-phenanthrenequinone (PQ) [208], 3-nitrophenylhydrazine [209], and 3,4-diaminobenzophenone [210] have been used to provide chromatographic retention and separation, efficient MS ionization, and MS/MS detectability. Typically, LC-MS analysis has been performed using selected reaction monitoring (SRM) with either atmospheric pressure chemical ionization (APCI), atmospheric pressure photoionization (APPI) or electrospray ionization (ESI). For example, D-cysteine has been used to generate alkyl-thiazolidine-carboxylic acid derivatives, which were analyzed by LC-SRM to quantify aldehydes in beverages with an LOD and LOQ of 0.2–1.9 μg L<sup>−1</sup> (1.36–8.76 nmol L<sup>−1</sup>) and 0.7–6.0 μg L<sup>−1</sup> (4.76–27.6 nmol L<sup>−1</sup>), respectively [207]. Alternatively, a method for profiling lipophilic reactive carbonyls in biological samples based on dansylhydrazine derivatization and LC-SRM has been developed with monitoring of the characteristic product ion, *m*/*z* 236.1, corresponding to the 5-dimethylaminonaphthalene-1-sulfonyl moiety. This approach detects 400 free reactive carbonyls in plasma samples from mice, of which 34 are confirmed by synthetic standards [204]. Furthermore, charged derivatization reagents, such as 4-(2-(trimethylammonio)ethoxy)benzenaminium halide (4-APC), 4-(2-((4-bromophenethyl)dimethylammonio)ethoxy)benzenaminium dibromide (4-APEBA), *N*-[2-(aminooxy)ethyl]-*N,N*-dimethyl-1-dodecylammonium (QDA), and *N,N,N*-triethyl-2-hydrazinyl-2-oxoethanaminium bromide (HIQB), have been used to enhance ionization of the carbonyls for LC-MS analysis. 
For example, 4-APC, which contains an aniline moiety for reaction with aliphatic aldehydes and a quaternary ammonium group for improved ionization efficiency and sensitivity, was developed for the analysis and quantitation of aldehydes in biological fluids [211] (Figure 6). Similarly, a second-generation derivatization reagent, 4-APEBA, which contains a bromophenethyl group that incorporates an isotopic signature and provides additional fragmentation identifiers, has been developed [212]. Another labeling reagent using *N*-(1-chloroalkyl)pyridinium quaternization to provide a charged tag was developed for quantifying aliphatic fatty aldehydes. This approach is used to measure the levels of long-chain non-volatile fatty aldehydes in thyroid carcinoma tissues [205].

**Figure 6.** Commonly used differential isotope labeling reagents for profiling and relative quantitation of carbonyl compounds.

Assays with simultaneous derivatization and analysis have been developed. For example, a fully automated in-tube solid phase microextraction/liquid chromatography method with post-column derivatization with hydroxylamine hydrochloride and mass spectrometry was developed for the analysis of hexanal and heptanal in human urine as potential biomarkers for lung cancer [213]. In addition, this approach has been extended to the analysis of urinary malondialdehyde by DNPH derivatization and LC-SRM [198]. Similarly, an approach based on magnetic solid phase extraction coupled with in-situ derivatization with 2,4-DNPH was developed for the determination of hexanal and heptanal in the urine of lung cancer patients [147]. Likewise, an Alternate Isotope-Coded Derivatization Assay (AIDA) was developed to quantify malondialdehyde and 4-HNE in exhaled breath condensate by LC-SRM. This approach affords good quantitation of MDA and 4-HNE and is in good agreement with quantitation of the same samples using external calibration [199].

#### 4.5.2. Screening LC-MS Methods

SRM analysis provides excellent sensitivity and good specificity for quantitative analysis but lacks the ability to screen for unknown aldehydes and requires a knowledge of unique SRM transitions of the known carbonyl compounds to be measured. Thus, data-dependent LC-MS/MS analysis (DDA) with DNPH derivatization is frequently used for untargeted profiling with MS<sup>n</sup> spectra used for identification and structural elucidation [135,215,229]. Studies using negative ionization have described the MS and MS/MS behavior of DNPH-derivatized carbonyls [215,216,229]. Studies using positive electrospray ionization have characterized DNPH-derivatized malondialdehyde [198,199,217,230] and 4-HNE [199], and recently we characterized the positive ionization and fragmentation of a wide range of DNPH-derivatized carbonyls to establish consistent fragmentation rules applicable to this class of compounds, allowing for screening of unknown carbonyl compounds and comprehensive detection [218] (Table 3).
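As a hedged illustration of how candidate precursor masses for this kind of screening might be tabulated, the sketch below computes the expected *m*/*z* of a mono-DNPH hydrazone from the elemental composition of the parent carbonyl. It assumes condensation of one DNPH molecule with loss of one water molecule and a singly charged ion; the function and variable names are illustrative and are not taken from any of the cited studies.

```python
# Monoisotopic masses (u) of the elements needed for this example.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727646  # proton mass, used for [M+H]+ and [M-H]-

DNPH = {"C": 6, "H": 6, "N": 4, "O": 4}  # 2,4-dinitrophenylhydrazine
WATER = {"H": 2, "O": 1}                 # lost when the hydrazone forms


def mono_mass(formula):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def dnph_hydrazone_mz(carbonyl, positive=True):
    """Expected m/z of a singly charged mono-DNPH hydrazone.

    Assumption: one carbonyl group reacts with one DNPH molecule and
    releases one water molecule; dicarbonyls would need a second tag.
    """
    neutral = mono_mass(carbonyl) + mono_mass(DNPH) - mono_mass(WATER)
    return neutral + PROTON if positive else neutral - PROTON


acetaldehyde = {"C": 2, "H": 4, "O": 1}
hexanal = {"C": 6, "H": 12, "O": 1}

if __name__ == "__main__":
    for name, formula in [("acetaldehyde", acetaldehyde), ("hexanal", hexanal)]:
        print(name,
              round(dnph_hydrazone_mz(formula, positive=True), 4),
              round(dnph_hydrazone_mz(formula, positive=False), 4))
```

A list built this way only covers suspected carbonyls; unknowns still have to be found through shared fragmentation behavior, as described above.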

Differential Isotope Labeling for Profiling and Relative Quantitation of Aldehydes

To allow simultaneous identification and quantitation of carbonyl compounds in biological fluids and alcoholic beverages, isotopically labeled counterparts are used for differential labeling (Figure 6). 4-APC and its labeled counterpart, D4-4-APC, have been used for untargeted profiling of aldehydes by differential stable isotope labeling using liquid chromatography-double neutral loss scan-mass spectrometry (SIL-LC-DNLS-MS). Pooled control samples are labeled with isotope labeled compounds, while the individual samples are derivatized with the unlabeled versions. This approach involves scanning of the two characteristic neutral fragments of 87 Da and 91 Da generated upon CID corresponding to the unlabeled 4-APC and labeled D4-4-APC-derivatized carbonyls, respectively. This strategy enables profiling of 16 and 19 aldehyde-containing compounds in human urine and white wine, respectively. Finally, five aldehydes in human urine and four aldehydes in white wine are confirmed by comparison with synthetic standards [219]. This approach was further extended using an enrichment step by solid phase-extraction using stable isotope labeling–solid phase extraction–liquid chromatography–double precursor ion scan/double neutral loss scan–mass spectrometry analysis (SIL-SPE-LC-DPIS/DNLS-MS) for profiling and relative quantitation of aldehydes in beer. The pair of isotope reagents, 4-APC and D4-4-APC, are used for differential labeling of the samples and co-eluting *m*/*z* pairs separated by 4 Da were detected and identified in the mass spectral data obtained by high resolution LC-QTOF-MS. Using this approach, 25 candidate aldehydes are detected in beer. The 25 candidate aldehydes are then quantified in different beer samples using a targeted MRM approach by monitoring the MRM transitions [M]<sup>+</sup> → [M]<sup>+</sup> − 87 and [M + 4]<sup>+</sup> → [M + 4]<sup>+</sup> − 91 corresponding to 4-APC and D4-4-APC, respectively. 
Fifteen aldehydes are identified and confirmed by comparison with synthetic standards and MS/MS analysis [220]. Likewise, differential labeling for profiling and relative quantitation of fatty aldehydes in biological samples using 2,4-bis-(diethylamino)-6-hydrazino-1,3,5-triazine and its deuterated counterpart has been developed. Using the 2VO dementia rat model system, 43 and 19 fatty aldehydes are found to be significantly altered between the control and model groups in plasma and brain tissue, respectively [214].
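The pair-finding step that underlies these differential labeling workflows can be sketched as follows. This is a minimal illustration rather than the procedure of any cited study: it assumes a peak list of singly charged features that each carry one tag, uses the exact mass shift of four deuterium-for-hydrogen substitutions as the target spacing, and applies arbitrary *m*/*z* and retention-time tolerances. The `Feature` type and `find_label_pairs` function are hypothetical names.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    mz: float         # measured m/z of the peak
    rt: float         # retention time in minutes
    intensity: float  # peak height or area


# Exact shift for four H -> D substitutions (assumes one D4 tag, charge 1).
D4_SHIFT = 4 * (2.01410178 - 1.00782503)


def find_label_pairs(features, shift=D4_SHIFT, mz_tol=0.005, rt_tol=0.1):
    """Return (light, heavy, light/heavy ratio) for co-eluting isotope pairs.

    With unlabeled reagent on the individual sample and labeled reagent on
    the pooled control, the ratio is the level relative to the pool.
    """
    ordered = sorted(features, key=lambda f: f.mz)
    pairs = []
    for i, light in enumerate(ordered):
        for heavy in ordered[i + 1:]:
            delta = heavy.mz - light.mz
            if delta > shift + mz_tol:
                break  # list is sorted, so no later peak can match
            if abs(delta - shift) <= mz_tol and abs(heavy.rt - light.rt) <= rt_tol:
                pairs.append((light, heavy, light.intensity / heavy.intensity))
    return pairs


example = [
    Feature(300.2000, 5.00, 2.0e5),  # unlabeled derivative
    Feature(304.2251, 4.98, 1.0e5),  # labeled partner, co-eluting
    Feature(304.2251, 9.00, 5.0e4),  # right spacing, wrong retention time
    Feature(310.0000, 5.00, 1.0e4),  # unrelated peak
]
pairs = find_label_pairs(example)
```

The retention-time window is kept non-zero because deuterated derivatives can elute slightly apart from their unlabeled counterparts in reversed-phase LC; the appropriate width depends on the separation used.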


**Table 3.** LC-MS-based methods for characterizing carbonyl compounds.

#### *Toxics* **2019** , *7*, 32




A high-performance chemical isotope labeling (CIL)-LC-MS method for profiling and quantitative analysis of the carbonyl sub-metabolome in human urine using dansylhydrazine (DnsHz) as labeling reagent has been developed [222]. Identification and relative quantitation of carbonyl metabolites was performed using differential tagging with <sup>12</sup>C-DnsHz and <sup>13</sup>C-DnsHz in urine samples and subsequent analysis using LC-QTOF-MS. An in-house software program was developed to process the CIL LC-MS mass spectral data, and a custom library of DnsHz-labeled standards was constructed (www.mycompoundid.org) for carbonyl metabolite identification. In total, 1737 peak pairs are detected in human urine, of which 33 are confirmed [222]. In addition, a strategy based on isotope labeling and liquid chromatography–double precursor ion scan mass spectrometry (IL-LC-DPIS-MS) was developed for the comprehensive profiling and relative quantitation of carbonyl compounds in human serum using the labeling reagent HIQB and its corresponding isotope-labeled analog, D7-HIQB [223]. The characteristic product ions, *m*/*z* 130.1/137.1, generated upon collision-induced dissociation (CID) are monitored in the double precursor ion scans during mass spectrometry analysis. In total, 156 candidate carbonyl compounds are detected in human serum, of which 12 are further identified by synthetic standards. Using a targeted MRM mode, 44 carbonyls are found to be statistically different in myelogenous leukemia patients compared to healthy controls [223].

Methods Using High-Resolution/Accurate Mass Data Dependent Acquisition (DDA) and Data Independent Acquisition (DIA)

High-resolution mass spectrometry-based methods for metabolomics profiling provide accurate masses of both precursor and MS/MS fragment ions, and thus allow confident identification of detected metabolites in complex biological matrices. Recently, we have developed a high-resolution accurate mass data-dependent MS<sup>3</sup> neutral loss (NL) screening strategy to characterize DNPH-derivatized carbonyls in biological fluids, allowing for the simultaneous detection and quantitation of suspected and unknown/unanticipated carbonyl compounds [218]. Previous analyses of DNPH-derivatized carbonyls were mostly performed in negative ionization mode and at relatively high flow rates, which limit the sensitivity of detection and quantitation of trace level analytes (Table 3). We found that, in positive mode, these compounds showed a characteristic neutral loss of hydroxyl radical (•OH) upon CID. This NL is not observed in negative mode. The characteristic neutral loss of •OH from DNPH-derivatized carbonyls is then used as a screening criterion during MS acquisition, allowing unambiguous identification of RCCs (Figure 7). Furthermore, a relative quantitation strategy by differential isotope labeling using D0-DNPH and D3-DNPH is implemented to determine the relative levels of carbonyls after specific exposures. Using this approach, pre-exposure samples are labeled with D0-DNPH, while post-exposure samples are labeled with D3-DNPH. The samples are combined in a 1:1 (v/v) ratio and analyzed by our HR/AM NL screening strategy. The MS-based workflow provides an accurate, rapid, and robust method to identify and quantify toxic carbonyls in various biological matrices for exposure risk assessment. This is in contrast to previous work, which used relatively high flow rates (0.2–1.5 mL min<sup>−1</sup>) and low-resolution MS analysis, limiting sensitivity and identification confidence at trace analyte levels. 
We applied this method to characterize the levels of carbonyls after alcohol consumption in humans and showed that acetaldehyde levels are increased after exposure. This strategy is currently being used to characterize the carbonyls associated with e-cigarette use (vaping) as well as tobacco smoking.
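The screening logic described above can be illustrated offline with a short sketch. It is not the acquisition method itself, which runs in the instrument control software; it only shows how a fragment list might be checked for the accurate-mass •OH loss (17.0027 Da, the value given in Figure 7) and how a D3/D0 intensity ratio gives the post- versus pre-exposure level. The 5 ppm tolerance, the assumption of singly charged ions, and all function names are illustrative choices.

```python
OH_LOSS = 17.0027  # accurate mass of the hydroxyl radical loss (Da)


def ppm_error(observed, expected):
    """Signed mass error of an observed m/z in parts per million."""
    return (observed - expected) / expected * 1e6


def has_oh_neutral_loss(precursor_mz, fragment_mzs, tol_ppm=5.0):
    """True if any fragment sits at precursor - 17.0027 within tol_ppm.

    Assumes singly charged precursor and fragment ions.
    """
    target = precursor_mz - OH_LOSS
    return any(abs(ppm_error(mz, target)) <= tol_ppm for mz in fragment_mzs)


def relative_level(d0_intensity, d3_intensity):
    """Post/pre-exposure ratio: D0 labels pre-exposure, D3 post-exposure."""
    return d3_intensity / d0_intensity


# DNPH-derivatized acetaldehyde, [M+H]+ at m/z 225.0618.
flagged = has_oh_neutral_loss(225.0618, [164.0323, 208.0591])
# A nominally identical loss of 17 Da as NH3 (17.0265 Da) is rejected.
rejected = has_oh_neutral_loss(225.0618, [208.0353])
```

The second call shows why accurate mass matters here: at this *m*/*z*, a loss of ammonia differs from the •OH loss by more than 100 ppm, so a tight tolerance separates the two, whereas unit-resolution data would not.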


**Figure 7.** Development of a high-resolution accurate mass data-dependent MS<sup>3</sup> neutral loss screening strategy for profiling and quantitative analysis of aldehydes in biological fluids. (**a**) The high-resolution accurate mass of •OH (17.0027 Da) was used to screen for all DNPH-derivatized aldehydes. (**b**) Monitoring of specific fragment ions (*m*/*z* 78.0332 and *m*/*z* 164.0323) minimizes possible false positive identification. (**c**) Representative MS, MS2, and MS3 spectra of DNPH-derivatized acetaldehyde and proposed structures of major fragment ions. Reprinted with permission from Ref. [218] (Copyright 2017, Springer).

Another strategy, based on ultra-high-resolution Fourier transform mass spectrometry (UHR FT-MS) using the Orbitrap Fusion Tribrid instrument, was developed for profiling carbonyl metabolites in crude biological extracts. This approach uses a chemoselective tagging reagent, QDA, and its labeled counterpart, <sup>13</sup>CD<sub>3</sub>-QDA, for differential isotope labeling of biological samples. Data-dependent TopN MS/MS of the targeted mass difference of 4.0219 Da (QDA and <sup>13</sup>CD<sub>3</sub>-QDA metabolite pairs) is performed with direct infusion allowing for long acquisition times, resolved isotopic peaks and
high-quality MS and MS/MS data. MS and MS/MS spectral data are processed using custom software, the Precalculated Exact Mass Isotopologue Search Engine (PREMISE), for QDA–<sup>13</sup>CD<sub>3</sub>-QDA ion pair and isotopologue identification. The workflow identifies 66 carbonyls in mouse tumor tissues, of which 14 carbonyls are quantified using authentic standards [231]. A similar derivatization and differential labeling approach is applied for the profiling and untargeted metabolomics of carbonyl compounds in cell extracts [226]. Likewise, direct infusion and FT-ICR-MS are used for the analysis of aldehydes and ketones in exhaled breath using 2-(aminooxy)ethyl-*N,N,N*-trimethylammonium iodide (ATM) and 4-(2-aminooxyethyl)-morpholin-4-ium chloride (AMAH) as derivatizing agents [227,228]. ATM is chemically functionalized on a novel microreactor to selectively preconcentrate volatile aldehydes and ketones. This approach demonstrated detection of C1–C12 aldehydes and is applicable to any gaseous sample [227]. Similarly, AMAH is used as a derivatizing agent coated within a silicon microreactor to capture volatile carbonyls as AMAH-carbonyl adducts, which are analyzed by FT-ICR-MS. Subsequent treatment of the derivatized-carbonyl adducts with poly(4-vinylpyridine) yielded volatile carbonyl adducts, which can be analyzed using GC-MS. These complementary approaches using FT-ICR-MS and GC-MS provide convenient and flexible identification and quantification of isomeric volatile organic compounds in exhaled breath [228]. In addition, an on-line weak-cation exchange liquid chromatography–tandem mass spectrometry method using LC-QTOF-MS<sup>2</sup> has been developed for screening aldehydes in plasma and urine samples. This strategy involves derivatization of aldehydes with 4-APC and subsequent reduction by NaBH<sub>3</sub>CN. The characteristic MS/MS fragmentation of 4-APC derivatized aldehydes allows confirmation of known aldehydes as well as differentiation of hydroxylated and non-hydroxylated aldehydes [221]. 
Finally, a novel DIA strategy has been developed for the global analysis of aldehydes and ketones in biological samples. The strategy is based on TSH (*p*-toluenesulfonylhydrazine) derivatization of carbonyl compounds and Sequential Window Acquisition of All Theoretical Fragment-Ion spectra (SWATH) detection. Although the TSH-derivatized carbonyls are efficiently detected in both positive and negative modes, the negative ion mode data acquisition exhibits the signature fragment ion at *m*/*z* 155.0172, which is monitored using ESI-QqTOF-SWATH allowing chemo-selective identification of carbonyl compounds. Using this strategy, 61 target carbonyls were successfully identified and quantified in biological samples. In addition, SWATH MS data acquisition provides high resolution accurate mass measurements of both the precursor and fragment ions, allowing for confident identification of derivatized compounds [224].

Overall, HPLC coupled with mass spectrometry techniques are powerful tools for profiling and performing quantitative analysis of aldehydes in various biological matrices. The high selectivity and specificity of these methods, along with the structural information obtained from MS and MS<sup>n</sup> mass spectral data, are ideal for identifying knowns and unknowns. The more recent LC-MS-based methods presented here offer improved sensitivity, selectivity, and specificity for the detection of aldehydes in complex biological matrices. Although these techniques are highly sensitive, they are also susceptible to matrix interferences and therefore require rigorous sample clean-up. In addition, these techniques require expensive instrumentation and highly trained users, and are less portable. New and innovative MS-based techniques continue to evolve towards novel applications, in particular trace level analysis suited to human exposure assessment, allowing for elucidation of the contributions of aldehydes to, and their impact on, human health.

#### **5. Future Perspectives**

The increased emphasis on the need to improve methods to comprehensively characterize exposures, and the parallel development of enhanced technology, are resulting in a number of exciting new analytical techniques and approaches. The introduction of the concept of the exposome, defined as the totality of chemical exposures in an individual's lifetime [232], has brought to light new analytical challenges related to the complexity of capturing the totality of various exposures, which are often chemically diverse, present at trace levels, and, in some cases, result from a combination of endogenous and exogenous sources. To address this complexity, tools have been developed to analyze specific classes of compounds, resulting in a number of complementary approaches. Aldehydes are a major component of the exposome, and aldehyde exposure is important in the pathogenesis of several diseases, including certain cancers. Profiling and characterizing these compounds is particularly difficult due to their reactivity and the ubiquitous presence of many of them. The improvement of tools for the investigation of the "aldehydome", the sum of all exogenous and endogenously-formed aldehydes, is needed to elucidate the complex roles these compounds play in physiological and pathological events. With the availability of more advanced MS instrumentation, high performance chromatographic separation, and improved bioinformatics tools, the data acquired allow for increased sensitivity, identification of specific aldehydes, and the establishment of new biomarkers of exposure and effect. Additionally, the combination of these techniques with exciting new methods for single cell detection provides the potential for detection and profiling of aldehydes at a cellular level, opening up the opportunity to minutely dissect their roles and functions in biological systems and in pathogenesis.

**Author Contributions:** All authors critically reviewed all relevant literature and contributed to writing of the manuscript.

**Funding:** The work presented in this review carried out in the Balbo Research group was supported by NIOSH-funded MCOHS ERC Pilot Research Training Program (OH008434).

**Acknowledgments:** Mass spectrometry was carried out in the Analytical Biochemistry Shared Resource of the Masonic Cancer Center, University of Minnesota, funded in part by Cancer Center Support Grant CA-077598 and S10 RR-024618 (Shared Instrumentation Grant).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**



#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Review* **Protein Adductomics: Analytical Developments and Applications in Human Biomonitoring**

#### **George W. Preston and David H. Phillips \***

Environmental Research Group, Department of Analytical, Environmental and Forensic Science, School of Population Health and Environmental Sciences, King's College London, Franklin-Wilkins Building, 150 Stamford Street, London SE1 9NH, UK; george.preston@kcl.ac.uk

**\*** Correspondence: david.phillips@kcl.ac.uk

Received: 22 March 2019; Accepted: 20 May 2019; Published: 25 May 2019

**Abstract:** Proteins contain many sites that are subject to modification by electrophiles. Detection and characterisation of these modifications can give insights into environmental agents and endogenous processes that may be contributing factors to chronic human diseases. An untargeted approach, utilising mass spectrometry to detect modified amino acids or peptides, has been applied to blood proteins haemoglobin and albumin, focusing in particular on the *N*-terminal valine residue of haemoglobin and the cysteine-34 residue in albumin. Technical developments to firstly detect simultaneously multiple adducts at these sites and then subsequently to identify them are reviewed here. Recent studies in which the methods have been applied to biomonitoring human exposure to environmental toxicants are described. With advances in sensitivity, high-throughput handling of samples and robust quality control, these methods have considerable potential for identifying causes of human chronic disease and of identifying individuals at risk.

**Keywords:** haemoglobin; albumin; mass spectrometry; biomarkers; protein adducts

#### **1. The Exposome and Adductomics**

Many decades of epidemiological observations have indicated that incidences of chronic human diseases are likely to result from a combination of environmental exposures to chemical and physical stressors, and predispositions inherent in human genetics. The wide geographical variation of many such diseases implies that it is environmental factors that play the dominant role, and not inherited predisposition, in disease causation [1], but knowledge of what the environmental factors are is often far from complete. As a consequence, estimations of overall risks associated with these factors are inaccurate and important associations may go undetected. These limitations have recently been framed within the context of the *exposome*, which can be thought of as the environmental counterpart of the genome. Conceptually, the exposome aims to reflect the totality of environmental exposures throughout the human lifespan, and to take into account both external components (e.g., exogenous environmental agents) and internal ones (e.g., endogenous cellular processes that give rise to altered stasis or function) [2–5]. For strategies to improve human health to be effective, it is essential to unravel the causes of chronic human diseases and to assess accurately their risks. The goal of studying the exposome (i.e., of *exposomics*) is disease prevention through the acquisition of a broad scientific perspective that encompasses health, environmental, educational, socioeconomic and political factors [6–9].

Two recent collaborative projects have applied the exposome concept to investigating environmental impacts on human health by assessing environmental exposure at personal and population levels within existing short- and long-term population studies. In the *EXPOsOMICS* project the emphasis has been on the measurement and impact of air and water pollution, studied in a number of adult and child study populations [10]. In the *HELIX* project the focus has been on early-life events, examining exposure to a range of chemicals and physical agents in existing birth cohorts [11]. Both these projects utilised a combination of exposure monitoring, using mobile and static monitors, smartphone and satellite data, and omics techniques to investigate biomarkers associated with exposures. The multi-omic approach has included metabolome, proteome, transcriptome, epigenome and adductome profiles. While many of the results of these interrelated analyses have yet to emerge, it is anticipated that new insights into the importance of environmental factors in the aetiology of human diseases will ensue and that the studies will point the way to improved strategies for monitoring human exposures and their health consequences.

While these projects have focused on human exposures and health outcomes, broader ecological issues may also be addressed by the exposome concept. The adverse outcome pathway (AOP) concept seeks to define the initial molecular events that culminate in adverse (toxicological) endpoints [12]. There is currently much discussion of how to assess the properties of complex mixtures of chemicals, taking into consideration possible positive and negative interactions between their components, in order to refine hazard identification and risk assessment. It has been proposed that considering the relative contributions of components of the exposome in relation to complex mixtures, combined with a mechanistic understanding of the induced adverse effects, may improve the integrated risk assessment for both human and environmental health [13].

Electrophiles have long been suspected in the causality of cancer and other chronic diseases. Because they are reactive, they can be measured indirectly through the adducts they form with protein and DNA. Indeed, damage to, or modification of, DNA by reactive intermediates of chemical carcinogens or by ionising and non-ionising radiation is a key early event in the carcinogenic process. The exposome concept encompasses a "top-down" approach to identifying environmental factors that determine susceptibility to disease throughout the entire lifespan. In parallel, a "bottom-up" approach can investigate biomarkers specific for certain environmental exposures, based on knowledge of environmental carcinogens and their pathways of metabolic activation. As part of this approach, protein adductomics constitutes the untargeted investigation of modification of proteins by endogenous or exogenous agents.

#### **2. Approaches to Protein Adductomics**

The concept of an adductome (that is, a collection of addition products) implicates two types of reactant: Those that add, and those to which they add. In the context of the present discussion, these are nucleophilic protein sites (amino acid residues) and electrophilic toxicants, respectively<sup>1</sup>. Reactants of either type are potentially diverse, meaning that the adductome could be vast. From an analytical standpoint, this potential vastness (i.e., structural diversity) is problematic because of the lack of a common 'handle' or 'signature' by which to purify and identify the adducts. Accordingly, investigators have focused on adducts of either specific nucleophiles or specific electrophiles. If the investigator's aim is to discover biomarkers of exposure, a nucleophile is selected and the electrophiles to which it adds are captured; if the aim is instead to discover targets, an electrophile is selected and the nucleophiles that add to it are captured. Given that the focus of this review is on biomarkers of environmental exposure, we will concentrate on the former approach. The latter approach is also important, however, because it is a route by which novel adducts could be accessed, either directly [14,15] or indirectly [16].

Of the methods that capture electrophiles, the most advanced methods are based on haemoglobin (Hb) and human serum albumin (HSA). There have been a number of important methodological developments since Rappaport et al. reviewed the subject in 2012 [17]. Another, related review [18] was published during the preparation of the present review.

<sup>1</sup> A note regarding language. For the reaction of a nucleophile with an electrophile, the view of the chemist is that the *nucleophile* is the active participant, providing electrons for the chemical bond ('nucleophilic addition', 'nucleophilic attack', and so on). Toxicologists, on the other hand, tend to speak of the *toxicant* as active (toxicant 'binds' to target), and since the toxicant is usually an electrophile the roles would seem to switch. This second interpretation is equally logical because the nucleophilic targets are often endogenous and less mobile (e.g., DNA or protein) and therefore seem to be passive entities.

#### *2.1. Hb as a Target of Electrophiles*

Hb is found in the erythrocytes, where it functions as an oxygen carrier. Its high concentration and reactivity (see below) make it a likely target of electrophiles, and its long lifetime in vivo (126 days, the lifetime of an erythrocyte [19]) presumably gives the resulting adducts an opportunity to accumulate. Human Hb A, the major form of Hb in adults, is a tetramer composed of two α-chains and two β-chains. The four chains, each of which binds one molecule of haem, all adopt similar folds in the tetramer. The α- and β-chains have several amino acid residues in common, including the *N*-terminal valine residues [20]. The α-amino groups of these terminal residues are nucleophilic, and have been observed to react with toxicologically-relevant electrophiles [21]. The *N*-terminal α-amino groups of the α- and β-chains have similar pKa values and similar reactivity towards certain electrophiles (e.g., the acetylating agent acetic anhydride), but not necessarily towards all electrophiles [22]. For example, another acetylating agent, methyl acetyl phosphate, has been observed to modify the *N*-terminus of only the β-chain [23].

The β-chain of Hb possesses a cysteine residue (Cys-β93) for which there is no equivalent in the α-chain [20]. Adducts of Hb Cys-β93 have been the subject of both targeted and, to a lesser extent, untargeted adductomic analyses (see below). A targeted adductomic method (i.e., a method involving simultaneous monitoring of multiple known/hypothesised adducts) was used to monitor Hb adducts of 15 different aromatic amines (e.g., 4-aminobiphenyl) in tobacco smokers' blood [24]. These, it should be pointed out, are not adducts of the amines themselves, but rather of the corresponding arylnitroso compounds [25]. Arylnitroso compounds form via oxidation of the amines' *N*-hydroxy metabolites, in a reaction for which, in the erythrocyte at least, the oxidant is the oxy form of Hb itself. The Cys-β93 adducts of arylnitroso compounds are *N*-arylsulfinamides, which hydrolyse under acidic conditions to regenerate their corresponding aromatic amines [26]. On this basis, detection of the aromatic amines liberated by acid hydrolysis of *N*-arylsulfinamides has been used as an indirect way of detecting the adducts [24].

#### *2.2. The N-alkyl Edman Method*

The analytical tractability of Hb *N*-terminal adducts is due to a general property of *N*-terminal amino acid residues, namely their ability to be detached from the rest of the protein via Edman degradation. This is a procedure that was originally developed for protein sequencing, but which was modified in the 1980s by Ehrenberg and co-workers for the analysis of Hb *N*-terminal adducts [27]. Ehrenberg and co-workers' procedure has been referred to as the '*N*-alkyl Edman method' because of its ability to detect, for example, *N*α-methyl and *N*α-ethyl substituents [28,29]. In fact, the observed *N*α-substituents have not been limited to simple alkyl groups, but for convenience the modified *N*-terminal amino acid is referred to as *N*-alkylvaline. Edman's original procedure involved reacting the α-amino group of a peptide with phenyl isothiocyanate, which rendered an acid-labile product [30]. Treatment of this product with anhydrous acid liberates the terminal amino acid as an anilinothiazolinone, which is then isomerised in aqueous acid to a phenylthiohydantoin (PTH) [31]. Ehrenberg and co-workers found that Hb with *N*-terminal *N*-alkylvaline (i.e., a secondary amine) reacted with isothiocyanate reagents in the same way as unmodified Hb, but that the resulting derivatives were labile even under neutral conditions [27]. The final product, a substituted PTH, could therefore be isolated using conditions under which unmodified Hb remained intact.

In subsequent iterations of the *N*-alkyl Edman method, the isothiocyanate reagent was varied so as to generate analytes appropriate for particular analytical methods. The most recent iteration, the 'FI*R*E procedure', uses fluorescein isothiocyanate ('FI*R*E' being a contraction of 'fluorescein isothiocyanate', '*R*-group' and 'Edman degradation') [32]. The FI*R*E procedure was initially developed with targeted analysis in mind, but was later adapted for untargeted analyses ('FI*R*E screening procedure' [28]).

#### *2.3. The Role of Tandem Mass Spectrometry in Protein Adductomics*

Like most other adductomic methodologies, the FI*R*E screening procedure utilises tandem mass spectrometry (MS/MS) for the detection of adducts. MS/MS, as its name suggests, involves two stages of mass analysis. The first stage is for intact precursor ions (e.g., protonated molecules) and the second stage is for product ions (i.e., fragments of precursor ions). A process of fragmentation takes place in between the two stages. Mass analysis can be performed in either a static mode, whereby ions of specified mass-to-charge ratio (*m*/*z*) are isolated, or a dynamic mode, whereby a continuous range of *m*/*z* values is scanned. Either stage can be performed in either mode, meaning that a number of different types of experiment are possible. In selected reaction monitoring (SRM), a technique commonly used for targeted analyses, ions of pre-specified *m*/*z* are isolated at both stages. Isolation is achieved by defining a narrow window of permissible *m*/*z* values and is often done using a quadrupole mass filter. An apparatus commonly used for SRM is the triple quadrupole mass spectrometer, which consists of two quadrupole mass filters, with a collision cell between them, connected in series. The first and second stages of mass analysis take place in the first and second filters, respectively, with fragmentation taking place in the collision cell. Other MS/MS techniques of relevance to this review are precursor ion scanning, data-dependent acquisition (DDA) and data-independent acquisition (DIA). These will be covered in more detail in the sections concerning HSA adductomics.

#### *2.4. Stepped MS/MS Methods*

Several adductomic studies have employed stepped methods, which can be thought of as hybrids of SRM and scanning. A stepped method consists of a sequence of SRM experiments that collectively resemble a scan. In considering how the methods work, it is instructive to think of adducts' structures in terms of two distinct parts: A constant part that derives from the nucleophile (common to all precursor ions) and a variable part that derives from the electrophile (variable among precursor ions). It follows, therefore, that a given product ion (or neutral fragment) will be either constant or variable depending on how the precursor ion becomes broken up into fragments. Given that the variable parts of the precursor ions are unlikely to be known *a priori*, the constituent SRM experiments of a stepped method must be necessarily arbitrary. For this reason, it is common to see lists of equally-spaced integer or half-integer *m*/*z* values [28]. We have referred to these arbitrary values as sampling points [33]. The idea of an arbitrary SRM experiment might strike the reader as odd, since SRM is traditionally used for targeted analyses, but for untargeted analyses it does not matter where the sampling points fall. The important thing is that, collectively, they are able to capture all relevant adducts. The limitation of stepped methods is their low resolution, which means that they are unable to identify adducts unambiguously purely on the basis of mass. Their value, therefore, tends to be in providing a quantitative description of the distribution of adducts.
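The resolution limit described above can be illustrated with a short sketch. It assumes a hypothetical grid of sampling points spaced 1 Da apart and three illustrative added groups that share a nominal mass of 28 Da; these compositions were chosen only for the arithmetic and are not adducts reported in the studies cited. The function names are likewise illustrative.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.99491462}


def shift_mass(composition):
    """Monoisotopic mass of an added group given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


def sampling_point(mass_shift, spacing=1.0):
    """Nearest point on an equally spaced grid of sampling points (assumed grid)."""
    return round(mass_shift / spacing) * spacing


# Hypothetical added groups with the same nominal mass (28 Da).
candidates = {
    "C2H4": {"C": 2, "H": 4},
    "CO": {"C": 1, "O": 1},
    "N2": {"N": 2},
}

for name, composition in candidates.items():
    exact = shift_mass(composition)
    print(f"{name}: exact shift {exact:.4f} Da, sampling point {sampling_point(exact):.1f}")
```

All three groups map to the same sampling point even though their exact masses differ by tens of millidaltons, which is why a stepped method on its own describes the distribution of adducts better than it identifies them.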

#### *2.5. The FIRE Screening Procedure*

The FI*R*E screening procedure [28] is a method for untargeted detection of Hb adducts (Figure 1). It is a stepped method akin to the 'adductome approach to detect DNA damage' developed by Kanaly et al. [34]. In the FI*R*E screening procedure, different precursor ions (protonated fluorescein thiohydantoins, FTHs) are captured at the first stage of mass analysis via one of 136 different windows. Each window is approximately 0.7 *m*/*z* units wide, and the *m*/*z* values on which the windows are centred are 1 Da apart. Thus, by cycling through all 136 windows, the method can capture a wide range of precursor ions and can, therefore, detect the corresponding range of mass shifts (between +14 and +149 Da). Once captured, a precursor ion is fragmented, and its products are passed to the second stage of mass analysis. Here, a set of fixed windows permit only constant product ions to pass to the detector (implicates loss of variable neutral fragments), and a variable window permits only variable product ions to pass (implicates loss of constant neutral fragments). If the right combination of constant and variable product ions is detected, then the presence of a corresponding FTH, and therefore Hb adduct, can be inferred. This MS/MS is done 'online' following the chromatographic separation of the FTHs, and the data thus generated are, like those reported by Kanaly et al., visualised as an 'adductome map', usually a plot of *m*/*z* against retention time [28,34,35].
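As a minimal sketch of the first-stage windows described above, the following code builds one window per nominal mass shift from +14 to +149 Da, each 0.7 *m*/*z* units wide, for singly protonated precursors. The *m*/*z* of the unmodified FTH (`UNMODIFIED_FTH_MZ`) is an assumed placeholder rather than a value taken from the text, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    mass_shift: int  # nominal mass shift relative to the unmodified FTH (Da)
    centre: float    # m/z on which the isolation window is centred
    width: float     # full width of the window (m/z units)

    def contains(self, mz):
        return abs(mz - self.centre) <= self.width / 2


def fire_windows(unmodified_mz, first_shift=14, last_shift=149, width=0.7):
    """One window per integer mass shift, assuming singly charged precursors."""
    return [Window(shift, unmodified_mz + shift, width)
            for shift in range(first_shift, last_shift + 1)]


def find_window(windows, mz):
    """Return the window that captures mz, or None if mz falls between windows."""
    for window in windows:
        if window.contains(mz):
            return window
    return None


UNMODIFIED_FTH_MZ = 489.1  # assumed placeholder, not taken from the text
windows = fire_windows(UNMODIFIED_FTH_MZ)
print(len(windows), windows[0].mass_shift, windows[-1].mass_shift)
```

Because the windows are narrower than their spacing, a precursor whose *m*/*z* lies far from any nominal value would fall between windows in this simplified model; `find_window` returns `None` in that case.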

**Figure 1.** Main steps of the FI*R*E ('fluorescein isothiocyanate', '*R*-group' and 'Edman degradation') screening procedure for Hb *N*-terminal adductomics. The procedure detects 'R' groups, which are generated when an *N*-terminus of Hb reacts with an electrophile in vivo. The *N*-termini are derivatised with fluorescein isothiocyanate, and derivatives with 'R' groups are selectively decomposed to the corresponding fluorescein thiohydantoins. The thiohydantoins are analysed using LC and online 'stepped' triple quadrupole mass spectrometry.

Carlsson et al. used their procedure to screen the blood of smokers and non-smokers and detected 26 features of interest; this study is described below in Section 3.

#### *2.6. HSA as a Target of Electrophiles*

HSA is the major protein in human plasma. Its lifetime in vivo, whilst shorter than that of Hb, is presumably still long enough for adducts to accumulate. In vivo, HSA binds fatty acids, scavenges metal ions, and contributes to the oncotic pressure of blood [22]. Extensive use of HSA has been made for the biological monitoring of toxicants, and a detailed account of this can be found in the recent review by Sabbioni and Turesky [36]. For the purposes of the present review, we focus on providing a background to the untargeted HSA adductomics studies.

HSA contains a number of nucleophilic sites, including (but not limited to) histidine residues, lysine residues and a single reduced cysteine residue (Cys-34). Notably, histidine residues in HSA, as in Hb, are targets of epoxides [37,38]. Lysine residues in serum albumins are notable targets of aflatoxin B1 dialdehyde [39,40].

Cys-34 is the only site in HSA for which untargeted adductomic methods have been developed. The motivation to look at this particular site is related to the unique chemistry of thiol groups, and the fact that HSA Cys-34 accounts for the majority of such groups in human plasma [41]. Given that the reacting species is a thiolate anion rather than a thiol group proper [41], adduct formation should be promoted by alkaline conditions and/or basic groups within the local protein environment. The pKa of the HSA Cys-34 thiol group is controversial, but is generally regarded to be lower than that of a typical thiol group [41]. In the three-dimensional structure of HSA, as determined by X-ray crystallography, the side chain of Cys-34 is partially buried [42]. On this basis, it has been inferred that there might be a limit to the size of the electrophiles that HSA Cys-34 can add to. It has also been recognised, however, that the tertiary structure of HSA is dynamic and that Cys-34 may become less buried upon deprotonation of the thiol group [17,43]. HSA Cys-34 is reactive towards a variety of toxicologically-relevant electrophiles, including sulphur mustard and metabolites of aromatic amines [44,45], and can also undergo oxidative transformations [46,47]. It appears that, in vivo, a substantial proportion of the HSA Cys-34 thiol groups is *S*-thiolated (*S*-[cystein-*S*-yl], *S*-[glutathion-*S*-yl] and so on), and a smaller, but appreciable proportion is found as the corresponding sulfenic, sulfinic or sulfonic acids [41,47].

#### *2.7. HSA Cys-34 Adductomics*

To date, methods for HSA Cys-34 adductomics have been based exclusively on peptide analytes (Figure 2). When HSA is digested with trypsin, and no cleavages are missed, Cys-34 and its adducts are found in a 21-amino-acid peptide [48,49]. This peptide, which Rappaport's group has referred to as 'T3' (i.e., the third-heaviest tryptic peptide [49]), has been used as an analyte in a number of studies [49–51]. When a combination of trypsin and chymotrypsin is used, the Cys-34-containing peptide is instead the LQQCPF hexapeptide [43]. The use of Pronase, suggested by Sabbioni and Turesky as a means of generating lower-molecular-weight analytes, has not to our knowledge been implemented for untargeted HSA Cys-34 adductomics [36]. When Noort et al. [44,52] used Pronase to digest HSA adducts of either sulphur mustard or acrylamide, the respective modifications were found in the CPF tripeptide.
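The relationship between the 21-residue tryptic peptide and the LQQCPF hexapeptide can be sketched with a simple in-silico digestion. The sequence assigned to T3 below is not given in this review; it is the sequence commonly reported for this peptide and should be verified against a reference HSA sequence before use. The cleavage rules are also simplified assumptions: trypsin is taken to cut after K or R, chymotrypsin after F, Y or W, and neither before proline.

```python
# Commonly reported sequence of the Cys-34-containing tryptic peptide of HSA
# (assumption: not stated in the text; verify against a reference sequence).
T3 = "ALVLIAFAQYLQQCPFEDHVK"


def cleave(sequence, residues):
    """Cut C-terminal to any residue in `residues`, except when followed by proline."""
    pieces, start = [], 0
    for i, amino_acid in enumerate(sequence):
        following = sequence[i + 1] if i + 1 < len(sequence) else ""
        if amino_acid in residues and following != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])
    return pieces


tryptic = cleave(T3, "KR")                                           # trypsin only
combined = [p for piece in tryptic for p in cleave(piece, "FYW")]  # then chymotrypsin

print(len(T3), tryptic)
print(combined)
```

Under these simplified rules the tryptic peptide is left intact by trypsin, while the additional chymotryptic cuts release the cysteine-containing hexapeptide.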

**Figure 2.** Main steps of published HSA Cys-34 adductomic workflows. The reaction of HSA with an electrophile in the blood plasma installs an 'R' group at the Cys-34 site. The HSA is isolated from plasma or serum and digested, usually with trypsin, to produce a mixture of peptides. Some of the peptides contain 'R' groups and others do not (the introduction of an enrichment step prior to digestion can limit the number of those that do not). Peptides are then separated chromatographically and analysed using MS/MS. One of the MS/MS methods, a stepped triple-quadrupole method termed FS-SRM, is depicted. This method monitors three variable product ions of the tryptic 'T3' peptide (*y*<sub>15</sub>, *y*<sub>16</sub> and *y*<sub>17</sub>).

Some of the first untargeted HSA adductomic analyses were performed by Aldini et al. using the technique of precursor ion scanning [43] (see also the 'chemical modificomics' method proposed by Goto et al. [53]). Precursor ion scanning is an MS/MS technique involving a scan at the first stage of mass analysis and the isolation of a constant product ion at the second stage. The result is a spectrum of the different precursor ions that give rise to a given product. Aldini et al. [43] reacted purified HSA with a mixture of α,β-unsaturated aldehydes (4-hydroxy-2-nonenal, 4-hydroxy-2-hexenal and acrolein), and digested the products with trypsin and chymotrypsin. Analysis of the digestion products, using liquid chromatography (LC) and online precursor ion scanning, revealed peaks corresponding to substituted LQQCPF peptides. These, in turn, corresponded to HSA Cys-34 Michael adducts of the α,β-unsaturated aldehyde reactants.

#### *2.8. Fixed-Step SRM of HSA Adducts*

An important development, reported by Li et al. in 2011, was the demonstration of a stepped method called fixed-step SRM (FS-SRM [49]). FS-SRM consists of a sequence of SRM experiments that collectively resemble a linked scan [54]. In developing the method, Li et al. drew on elements of the 'adductome approach to detect DNA damage' described by Kanaly et al. [34,55], and also a method of analysing mercapturic acids described by Wagner et al. [56]. Being a stepped method, FS-SRM is broadly analogous to the FI*R*E screening procedure (which, in fact, it pre-dates). The analytes in FS-SRM are substituted T3 peptides, and the precursor ions captured in the first stage of mass analysis are triply-protonated peptides. The product ions isolated in the second stage are doubly-charged variable *y*-ions and a singly-charged constant *b*-ion. Together, these precursor and product ions constitute what is effectively a peptide sequence tag [57]. The sampling points used for FS-SRM are 4.5 Da apart and, in Li and co-workers' study, there were 77 of them. FS-SRM differs from the other stepped methods in that, for FS-SRM, the sample is infused into the mass spectrometer as a mixture of adducts rather than as a series of eluted components. There is still an LC step but it is disconnected from the mass spectrometry, and it serves to capture the entire population of adducts rather than to separate them. The method is therefore freed from a major constraint imposed by LC, namely the need for a full set of SRM experiments to be done within the width of a chromatographic peak.
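The arithmetic linking the sampling-point spacing to the charge states can be sketched as follows. The sketch assumes that the 4.5 Da spacing refers to the mass of the added group, so that the corresponding *m*/*z* step is the mass step divided by the charge of the ion being monitored; the function and variable names are illustrative.

```python
MASS_STEP_DA = 4.5   # spacing of sampling points, from the text
N_POINTS = 77        # number of sampling points, from the text
PRECURSOR_CHARGE = 3  # triply-protonated T3 peptides
PRODUCT_CHARGE = 2    # doubly-charged variable y-ions


def mz_step(mass_step_da, charge):
    """m/z increment that corresponds to a given mass increment at a given charge."""
    return mass_step_da / charge


precursor_step = mz_step(MASS_STEP_DA, PRECURSOR_CHARGE)
product_step = mz_step(MASS_STEP_DA, PRODUCT_CHARGE)
mass_range_covered = MASS_STEP_DA * (N_POINTS - 1)

# Offsets (relative to the first sampling point) of each precursor/product pair.
transition_offsets = [(i * precursor_step, i * product_step) for i in range(N_POINTS)]

print(precursor_step, product_step, mass_range_covered)
```

Under this assumption the precursor window advances by 1.5 *m*/*z* units and the variable product window by 2.25 *m*/*z* units per step, and the 77 points span 342 Da between the first and last sampling points.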

Our personal experience with protein adductomics has been in the implementation of FS-SRM for epidemiological studies [10,33]. Such studies, which typically involve tens or hundreds of samples, pose challenges that are not necessarily encountered in smaller pilot studies. In implementing the method of Li et al., the main challenge that we faced was the need for higher throughput. This was addressed by evaluating the various stages of sample preparation (HSA purification, adduct enrichment, digestion and peptide clean-up) and optimising these where possible. Notably, we deleted the adduct enrichment step, and we changed the method of sample clean-up from HPLC (serial) to solid-phase extraction (SPE; effectively parallel). A model adduct, prepared by treating HSA with *N*-ethylmaleimide, proved useful for evaluating the performance of the methods.

In parallel with our work on FS-SRM, Grigoryan et al. [50] developed a new analytical workflow based on LC with on-line DDA mass spectrometry. In DDA, the *data* on which the acquisition is *dependent* are precursor ions' *m*/*z* values, and they are obtained via a high-resolution scan—using, for example, an Orbitrap mass analyser. The *data* are used to direct the isolation of precursor ions, and so only these precursor ions are fragmented. The *acquisition* is the scan via which the resulting product ions are detected. In addition to their analytical method, Grigoryan et al. [50] also developed methods for sample preparation and data analysis (the 'adductomics pipeline'). The method of sample preparation is essentially a streamlined version of the one developed by Li et al. [49]. One major difference with respect to the earlier method, however, was the omission of a reducing agent, which had previously been used to reduce protein disulphide bonds prior to tryptic digestion. The effect of omitting the reducing agent was to preserve *S*-thiolated forms of Cys-34. The method of data analysis begins with the detection of a tag (a combination of constant and variable product ions) in the product-ion scan data. The corresponding precursor ion is then identified, and an ion count chromatogram for this precursor ion is extracted. A particularly innovative part of the pipeline is the method by which the peptide analytes are quantified. Each analyte is quantified relative to a 'housekeeping peptide', which is another tryptic peptide of HSA. In this way, the method is able to control for variation in the quantity of digested HSA. Grigoryan et al. [50] used their pipeline to analyse samples of plasma from smokers and non-smokers, and found a total of 43 putative adducts (see Section 3 below).
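The idea behind quantification relative to a housekeeping peptide can be illustrated with a minimal sketch. The peak areas below are invented for illustration, and the function is a simple ratio rather than the published pipeline's implementation.

```python
def relative_abundance(adduct_area, housekeeping_area):
    """Adduct peak area expressed relative to the housekeeping peptide's peak area."""
    if housekeeping_area <= 0:
        raise ValueError("housekeeping peptide peak area must be positive")
    return adduct_area / housekeeping_area


# Hypothetical peak areas: sample B contains half as much digested HSA as sample A.
samples = {
    "A": {"adduct": 2.0e5, "housekeeping": 1.0e8},
    "B": {"adduct": 1.0e5, "housekeeping": 5.0e7},
}

ratios = {name: relative_abundance(areas["adduct"], areas["housekeeping"])
          for name, areas in samples.items()}
print(ratios)
```

Although the raw adduct peak areas differ twofold, the ratios agree, which is the sense in which the housekeeping peptide controls for variation in the quantity of digested HSA.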

#### *2.9. Multiplex Adduct Peptide Profiling*

Another promising method for HSA Cys-34 adductomics (and potentially also Hb Cys-β93 adductomics) is 'multiplex adduct peptide profiling' (MAPP [51]). MAPP utilises DIA mass spectrometry, which is perhaps the least prescriptive of all MS/MS techniques. Similar to a stepped SRM-based method, DIA captures precursor ions via a series of contiguous windows. The windows are, however, rather wider than those used for SRM, and it is therefore likely that a given window will capture multiple precursor ions (in MAPP, for example, the width of each window is 10 *m*/*z* units). As in DDA mass spectrometry, the second stage of mass analysis is a scan, and a high-resolution scan is done as an alternative first stage.
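The consequence of using wide, contiguous windows can be sketched as follows. The window width of 10 *m*/*z* units is taken from the text, whereas the start of the scanned range and the precursor *m*/*z* values are hypothetical.

```python
from collections import defaultdict


def dia_window(mz, start_mz, width):
    """Return the (lower, upper) bounds of the contiguous window that contains mz."""
    index = int((mz - start_mz) // width)
    lower = start_mz + index * width
    return (lower, lower + width)


def group_by_window(precursor_mzs, start_mz=800.0, width=10.0):
    """Group precursor m/z values by the window in which they would be co-isolated."""
    groups = defaultdict(list)
    for mz in precursor_mzs:
        groups[dia_window(mz, start_mz, width)].append(mz)
    return dict(groups)


precursors = [811.8, 817.1, 823.4]  # hypothetical precursor m/z values
groups = group_by_window(precursors)
print(groups)
```

The first two precursors share a window and would therefore be fragmented together, so their product ions appear in the same spectrum and have to be deconvolved afterwards.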

The MAPP method, like the 'adductomics pipeline', requires prior knowledge of the peptide analyte's sequence and the site of modification. Series of constant product ions (e.g., *b*-ions from backbone scission near the *N*-terminus) are recognised and are linked back to their respective precursor ions via common chromatographic retention times. The substituted peptide's mass shift is then confirmed by the presence of corresponding variable product ions. Although the authors were only able to identify oxidised and *S*-thiolated forms of HSA Cys-34, their method has the potential to detect toxicologically-relevant adducts (e.g., if the samples could be further enriched for these adducts prior to analysis).

#### *2.10. Hb and HSA Compared*

Given that Hb and HSA contain some of the same nucleophilic functional groups, these proteins might be expected to have overlapping reactivity towards electrophiles. The observation that cysteine residues in HSA and Hb can add to comparable amounts of benzene oxide in vivo, for example, is evidence of such overlap [58]. On the other hand, Dingley et al. [59] found that dietary exposure to 2-amino-1-methyl-6-phenylimidazo[4,5-*b*]pyridine (PhIP; see Section 3.3) caused the formation of substantially larger amounts of HSA adducts than Hb adducts. A similar fate has been observed for aflatoxin B1 in rats: of a given dose of this toxicant, a substantially higher proportion is found bound to serum albumin than to Hb [60,61]. This might also be expected to be the case in humans, and indeed assays for HSA adducts of aflatoxin B1 dialdehyde have been developed [62]. Possible reasons for differences in the amount or type of adducts include (i) the fact that Hb and HSA are synthesised at different sites in the body (in different cell types), and as a result could be exposed to different electrophiles [36]; (ii) the fact that Hb resides inside the erythrocyte, whereas HSA is secreted [18]; (iii) the influence of neighbouring amino acid side chains and cofactors on the reactivity of the nucleophilic groups (see Sections 2.1 and 2.6); and (iv) the possibility that the erythrocyte membrane could shield Hb from electrophiles, or even sequester electrophiles [63]. It is also worth considering that apparent differences in the extent of adduct formation could reflect differences in chemical and biological stability of the proteins and/or modifications.

#### *2.11. Other Target Proteins*

Few proteins other than Hb and HSA have been discussed as candidates for untargeted adductomic analyses, and fewer still have been investigated experimentally. Hb and HSA adducts are probably two of the richest and most accessible sources of potential biomarkers, but this is not to say that other proteins could not provide additional and unique information. Three other proteins of relevance to the present review have been discussed: Collagen, histones and apolipoproteins. Collagen is mentioned by Scheepers in his workshop report [19], presumably because of its abundance in the body and its extremely long lifespan in certain tissues [64]. However, there have been few attempts to use collagen adducts for biological monitoring, probably because of the heterogeneity, physical properties and limited accessibility of collagen [65–67]. Histones, which are also mentioned by Scheepers, represent a more promising source of biomarkers. Work on histone adducts has not been extensive, but some interesting results have been obtained. *N*-Terminal segments of histones are of particular interest because they protrude from nucleosomal core particles, and, on this basis, it is plausible that they could be accessible to electrophiles. Consistent with this idea, SooHoo et al. [68] observed modifications near the *N*-termini of histones isolated from cultured human lymphoblasts that had been exposed to *anti*-benzo[*a*]pyrene 7,8-dihydrodiol-9,10-oxide (BPDE). Fabrizi et al. [69] used a model peptide to infer the reactivity of an *N*-terminal segment of histone H2B towards phosgene, and observed the incorporation of carbonyl groups into the peptide.

Apolipoproteins have been investigated as targets of endogenous electrophiles, such as the lipid oxidation product 4-hydroxy-2-nonenal. By definition, endogenous adducts cannot be biomarkers of exposure in the strict sense, but they could potentially be biomarkers of effect. We mention them here because they have been the subject of a recent untargeted adductomics study. This study focused on adducts of histidine and lysine residues in human low density lipoprotein [35]. Unlike the FI*R*E screening procedure or FS-SRM, the method is not site-specific; rather, it detects modifications to any and all residues of a particular amino acid. The analytes are 'free' amino acids, which are prepared from lipoprotein by acid hydrolysis. Consequently, they may represent a mixture of sites, and perhaps a mixture of proteins. The analytical method, like others described elsewhere in this article, involves ultraperformance LC and triple quadrupole mass spectrometry. It appears to be a stepped method, in which a constant product ion is isolated at the second stage of mass analysis. For adducts of histidine residues, the constant product is the immonium ion of histidine, and for adducts of lysine residues, it is a deaminated immonium ion of lysine. Shibata et al. [35] used their method to analyse low density lipoprotein that had first been purified from human plasma, and then oxidised in vitro. The oxidised lipoprotein was treated with sodium borohydride to reduce imine linkages (as in, for example, a lysine residue adduct of 9-oxononanoic acid), before being hydrolysed and the resulting amino acids analysed. The authors produced adductome maps for lipoprotein with and without oxidation, and by comparing these maps they were able to attribute the formation of the aforementioned 9-oxononanoic acid adduct to the oxidising conditions.
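The idea of screening for a constant product ion can be illustrated with a minimal sketch. It assumes that each MS/MS spectrum has already been reduced to a precursor *m/z* and a list of fragment *m/z* values; the function name, the example spectra and the unit-resolution tolerance are hypothetical, and the two diagnostic *m/z* values are commonly tabulated nominal values rather than figures taken from Shibata et al. [35].

```python
# Sketch only: flag MS/MS spectra that contain a chosen diagnostic product ion.
# The m/z constants are commonly tabulated values (an assumption, not from the source).
HIS_IMMONIUM_MZ = 110.07            # immonium ion of histidine
LYS_DEAMINATED_IMMONIUM_MZ = 84.08  # lysine immonium ion after loss of NH3


def precursors_with_product(spectra, product_mz, tol=0.5):
    """Return precursor m/z values whose fragments include product_mz within tol."""
    hits = []
    for precursor_mz, fragments in spectra:
        if any(abs(fragment - product_mz) <= tol for fragment in fragments):
            hits.append(precursor_mz)
    return hits


# Hypothetical spectra: (precursor m/z, [fragment m/z, ...])
example_spectra = [
    (312.2, [110.1, 156.1, 266.2]),
    (245.1, [84.1, 130.1]),
    (400.3, [120.1, 175.1]),
]

his_hits = precursors_with_product(example_spectra, HIS_IMMONIUM_MZ)
lys_hits = precursors_with_product(example_spectra, LYS_DEAMINATED_IMMONIUM_MZ)
```

Plotting the flagged precursors against retention time would give something resembling an adductome map, although the published method steps the first mass analyser on the instrument rather than filtering spectra after acquisition.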

#### *2.12. Adduct Enrichment*

Enrichment, in the context of untargeted adductomics, entails depletion of the unmodified nucleophile and possibly also other substances that might interfere with the detection of the adducts. In the FI*R*E screening procedure for Hb adducts, enrichment is facilitated by the detachment of the *N*-alkylvaline residues. This exaggerates the relatively minor difference in structure between Hb and its adducts, thereby allowing the unmodified Hb to be removed readily [28]. For HSA Cys-34 adductomics, methods of enrichment have mainly exploited the reactivity of the Cys-34 thiol group, which is present in the unmodified HSA but not in the adducts. Funk et al. [70] demonstrated the use of a disulfide-functionalised resin for scavenging unmodified HSA, and this method was later used in adductomic workflows [49,51,71]. The main limitation of the thiol scavenging method is that it does not remove *S*-thiolated HSA: If a reducing agent is later added to reduce the other disulfide bonds in HSA (i.e., those of the cystine residues) then the *S*-thiolation is reversed and the Cys-34 thiol would seem to reappear. Funk et al. [70] sought to limit this effect by removing the S-thiolation prior to the scavenging step. In our hands, the thiol scavenging method proved difficult to implement in a high-throughput setting, and so we deleted it from our workflow [33]. Chung et al. [71] used thiol scavenging as the first of two stages of enrichment, the second stage being an antibody-mediated purification of the substituted T3 peptides using a polyclonal antibody raised against the T3 peptide but having cross-reactivity with adducts.

#### **3. Human Biomonitoring**

#### *3.1. Methodological Considerations*

Human biomonitoring refers to the quantification of xenobiotics or their derivatives (and sometimes their early effects) in human biospecimens [72]. As well as confirming the nature of the exposure, biomonitoring aims to measure the internal dose of the xenobiotic(s). The biomonitoring of protein adducts is usually done as part of the 'bottom up' (targeted) approach (see Section 1). A typical targeted method might involve isotope dilution (i.e., the addition of a known amount of an isotopically-labelled standard) followed by LC-MS/MS. This would require prior characterisation of the adduct and synthesis of a suitable standard.
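As an illustration of the isotope dilution principle, the following sketch estimates the amount of an adduct from the ratio of its peak area to that of the labelled standard. The function name, the optional response factor and the numerical values are hypothetical; the sketch assumes that analyte and standard respond identically unless a response factor is supplied.

```python
def isotope_dilution_amount(area_analyte, area_standard, amount_standard,
                            response_factor=1.0):
    """Estimate the analyte amount from peak areas and the spiked standard amount.

    response_factor is the analyte/standard response ratio per unit amount;
    1.0 assumes the labelled standard behaves identically to the analyte.
    """
    if area_standard <= 0:
        raise ValueError("standard peak area must be positive")
    return (area_analyte / area_standard) * amount_standard / response_factor


# Hypothetical example: 10 pmol of labelled standard was added to the sample.
amount_pmol = isotope_dilution_amount(
    area_analyte=5000.0, area_standard=20000.0, amount_standard=10.0
)
```

Because the standard is added before work-up, losses during sample preparation affect analyte and standard alike and cancel in the ratio, which is the main reason the approach is favoured for targeted quantification.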

In principle, data collected via the untargeted approach (e.g., peak areas from LC-MS/MS) could be used in the same way as those collected in targeted studies. However, this would depend on the untargeted method achieving an acceptable accuracy, precision and dynamic range for each relevant adduct. At some stage, a synthetic reference compound would be needed to confirm a particular adduct's identity, and to implicate the corresponding electrophile [18]. For hitherto unknown adducts, possible identities must first be proposed. Methods that have assisted in this endeavour have included database searching, the use of calculator software, and the comparison of measured and predicted physicochemical properties [50,73]. The characterisation of novel adducts—a challenging aspect of the research—has been reviewed in detail by Carlsson et al. [18].

Accuracy, in practice, may suffer as a consequence of the need to capture a range of adducts. It is likely that the use of generic standards (e.g., the S-carbamidomethylated T3 peptide for FS-SRM) affects accuracy, and therefore precludes absolute quantification [33]. Dynamic ranges are dependent on the analytical method, and presumably also on the ability to enrich adducts. As judged from lowest reported adduct concentrations, the detection limits of Grigoryan and co-workers' LC-MS-based method, and of the FI*R*E screening procedure, are good (<7 and <0.1 adduct molecules per million HSA molecules or Hb chains, respectively [28,50]). The methods should, therefore, be able to detect some xenobiotic adducts, although in practice relatively few such adducts have been observed [50]. For FS-SRM (our implementation), the detection and quantification limits are in the region of one adduct molecule per thousand HSA molecules, and are probably too high to detect xenobiotic adducts [33]. Putative adducts detected by FS-SRM and the other methods may, however, relate to the early effects of exposure.

At the present time, the role of the untargeted methods is to complement the targeted methods, rather than to replace them. Indeed, approaches that combine both methods have been proposed [7]. Some authors advocate a more pragmatic 'fit-for-purpose' approach, which balances methodological rigour with cost. Dennis et al. draw a distinction between regulatory endeavours, which require maximal rigour, and exploratory studies whose aims might be achievable without a fully validated method [7].

While the discipline of untargeted protein adductomics is still a relatively young one, there have been a number of pilot studies that have sought to demonstrate its utility. Additionally, some targeted investigations have looked for adduct formation at the same sites (e.g., Cys-34 of HSA) and these will also be mentioned here.

#### *3.2. Human Biomonitoring of Hb Adducts*

In the first adductomic application of the FI*R*E method (see Section 2.5), Hb samples from smokers and non-smokers were analysed and compared [28]. In all samples seven adducts at the *N*-terminal valine residue were identified; these were the addition of methyl and ethyl groups, and adducts formed by ethylene oxide, acrylonitrile, methyl vinyl ketone, acrylamide and glycidamide; in addition, a further 19 unknown adducts were detected in all samples. Subsequently, one of these unknown adducts has been identified as derived from ethyl methyl ketone [74]. A further four have been attributed to the precursor electrophiles glyoxal, methylglyoxal, acrylic acid and 1-octen-3-one [73]; and recently another adduct, detected in smokers and non-smokers at similar levels, has been identified as *N*-(4-hydroxybenzyl)valine, postulated to have arisen from either 4-quinone methide, which could form the valine adduct via a Michael addition, or 4-hydroxybenzaldehyde, which could form the same adduct via a Schiff base formation followed by reduction [75].

Applying their untargeted Hb adductomic approach to a larger study population, Carlsson et al. [76] analysed blood samples from healthy children about 12 years old (*n* = 51). In this cohort, a total of 24 adducts (12 of them previously identified; see above) were observed and their levels quantified. Relatively large interindividual variations in adduct levels were observed. The frequencies of micronuclei in erythrocytes were also determined. Analysis using a partial least-squares regression model showed that as much as 60% of the micronucleus variation could be explained by the adduct levels. This indicates the ability of such studies to align measurements of internal dose (protein adducts) with endpoints of genotoxicity (micronucleus formation).

#### *3.3. Human Biomonitoring of HSA Adducts*

An early study that demonstrated the utility of monitoring HSA for alkylated cysteine involved exposure of human blood to 14C-labelled sulfur mustard (the chemical warfare agent mustard gas) [44]. Isolation and tryptic digestion of albumin produced the 21-amino acid fragment containing a sulfur mustard-cysteine adduct, detected by micro-LC-MS/MS. An alternative method, which employed Pronase for the digestion, yielded a modified tripeptide (Cys-Pro-Phe), which was detected with greater sensitivity than the 21-amino acid fragment. The method was used to analyse samples of blood from nine Iranians exposed to sulfur mustard in 1986, during the Iran-Iraq war. In all nine cases, the sulfur mustard-adducted tripeptide was detected.

Application of the FS-SRM method to analyses of archived plasma protein that had been pooled according to subjects' ethnicities and tobacco smoking habits demonstrated differences between pools [49] and suggested that FS-SRM might be able to detect statistically significant differences between groups of individual samples that had not been pooled.

A pilot study of 20 smokers and 20 never-smokers provided evidence of the effect of smoking on levels of putative HSA adducts. Differences between smokers and never-smokers were most apparent in putative adducts with net gains in mass between 105 Da and 114 Da (relative to unmodified HSA) [33].

Further investigations of the effects of tobacco smoking have revealed around 43 adduct features, some of which are positively associated with smoking and some of which are negatively associated. The former result from genotoxic constituents of tobacco smoke, such as ethylene oxide and acrylonitrile, while the latter, which include Cys-34 oxidation products and disulfides, may reflect alterations in the serum redox state of smokers, resulting in lower adduct levels [50].

Grigoryan et al. used LC and high-resolution mass spectrometry to investigate interactions between Cys-34 and reactive oxygen species (ROS) [47]. Chronic exposure to ROS is linked to many chronic diseases and, in this study, a number of adducts originating from ROS were detected in human serum: sulfinic acid, sulfonic acid and a proposed sulfinamide structure (a mono-oxygenated moiety also with the loss of two hydrogen atoms).

Antibody enrichment may pave the way to a more sensitive assay. Using a polyclonal antibody, raised against the T3 peptide, but with cross-reactivity to the peptide containing adducts (see Section 2.12), ten modified T3 peptides were detected in human plasma samples; eight of them were characterised and they included Cys-34 oxidation products, modification involving loss of water or lysine, cysteinylation, and transpeptidation of arginine [71].

In a study of women from the Xuanwei and Fuyuan counties in China, where extensive use of smoky coal for heating and cooking has resulted in very high rates of lung cancer among non-smokers, HSA Cys-34 adducts were compared in 29 females who used smoky coal and 10 controls using other energy sources [77]. Fifty different modified T3 peptides were identified, including oxidation products, mixed disulfides, rearrangements and truncations. Two peptides that were detected at significantly *lower* levels in the smoky coal group were adducts of glutathione and γ-glutamylcysteine. The results are interpreted as evidence that exposure to the indoor combustion products results in depletion of glutathione, an essential antioxidant, as well as its precursor γ-glutamylcysteine [77].

A recent study on the health effects of urban air pollution, the Oxford Street II study [78], involved a randomised crossover design whereby three groups of volunteers (healthy subjects, chronic obstructive pulmonary disease (COPD) sufferers and patients with ischaemic heart disease (IHD)) walked for two hours along a busy street in London where traffic is restricted to diesel buses and taxis. The volunteers also spent two hours walking in a London park on a separate occasion. They were monitored for respiratory and cardiovascular function in both environments and, in addition, two studies have analysed their HSA samples for adducts. In the first report, Liu et al. [79] analysed 50 HSA samples by high-resolution mass spectrometry to determine whether protein modifications differ between COPD or IHD patients and healthy subjects. The untargeted analysis of adducts at the Cys-34 locus of HSA detected 39 adducts with sufficient data, and these adducts were examined for associations with estimated exposures to air pollution and health status. Multivariate linear regression revealed 21 significant associations, mainly with the underlying diseases, but also with air-pollution exposures. Interestingly, most of the associations indicated that adduct levels decreased with the presence of disease or increased pollutant concentrations. Negative associations of COPD and IHD with the Cys-34 disulfide of glutathione and two Cys-34 sulfoxidations were consistent with results from smokers and non-smokers [50] and from non-smoking women exposed to indoor combustion of coal and wood [77].

In the second study, Preston et al. [80] examined a larger number of Oxford Street II samples by the FS-SRM method. Associations between amounts of putative adducts and two types of measure were tested: Pollution (e.g., ambient concentrations of nitrogen dioxide and particulate matter) and health outcome (e.g., measures of lung health and arterial stiffness). There were 11 instances of a response variable being associated with a pollution measurement and eight instances of a response variable being associated with a health outcome measure. However, no two measures of different types were associated with the same adduct amount, suggesting that the internal changes responsible for health outcomes may differ from those that effect changes in adduct amounts.

In a more targeted study, Bellamri et al. [81] investigated the formation in human subjects of HSA adducts at Cys-34 by PhIP, which is formed in cooked meats and may be associated with colorectal, prostate and mammary cancer. Volunteers abstained from eating cooked well-done meat or fish for three weeks, then ate a semi-controlled diet that included cooked beef containing known quantities of PhIP for four weeks. The volunteers then returned to their regular diets, but with the exclusion of cooked well-done meat and fish for a further four weeks. The authors found that an adduct of oxidised PhIP, which was below the limit of detection (LOD) (10 femtograms PhIP/mg HSA) in most subjects before the meat feeding, increased by up to 560-fold at week 4 in subjects who ate meat containing 8.0 to 11.7 μg of PhIP per 150–200 g serving. In contrast, the adduct remained below the LOD in subjects who ingested 1.2 or 3.0 μg PhIP per serving, and PhIP-HSA adduct levels did not correlate with PhIP intake levels across four exposure groups (*p* = 0.76). There were also indications that the PhIP adduct was unstable, having a half-life of less than two weeks. Nevertheless, the study demonstrates that the Cys-34 site in HSA is accessible to a relatively large molecule like PhIP, despite concerns about possible steric hindrance (see Section 2.6).

#### **4. Prospects**

A key advantage of monitoring proteins for adducts is the abundance of material that can be obtained from tissue banks; for example, red blood cells are an abundant source of Hb and blood plasma or serum is an abundant source of HSA. The proteins' lifespans in blood mean that there is a substantial "capture period" for monitoring exposure to genotoxicants; and protein adducts, unlike DNA adducts, are not subject to loss through repair processes. Full implementation of the exposome concept requires monitoring individuals or populations at several points in time over the course of their lives [3,4]. This is achievable if biobanks collect material from individuals not just once but multiple times, and such biobanks already exist.

Dried blood spots can also be a suitable source of protein for investigation [82]. If obtained from neonatal blood spots (i.e., Guthrie spots) then the material provides a valuable opportunity for investigating exposures in utero. A single blood spot of about 50 μL is estimated to contain about 9.6 mg of protein, of which about 7.7 mg will be Hb and 1.2 mg HSA [82]. In a proof-of-principle study, Yano et al. [83] identified 26 Cys-34 adducts (oxidation and *S*-thiolation products) in HSA isolated from dried blood spots of 49 newborn babies and were able to distinguish between newborns of smoking and non-smoking mothers on the basis of the levels of a putative cyano modification to Cys-34.

There is also the potential to broaden the scope of adductomics by investigating novel modifiable loci in blood proteins. Cys-34 of HSA and the *N*-terminus of Hb are undoubtedly major targets for electrophiles, but the literature hints at wider reactivity within the blood proteome. Consequently, a hitherto-untapped source of analytes for protein adductomics can be envisaged. Mapping the loci at which adducts can form will be beneficial and, for this purpose, new chemical tools will be required. Identifying adducts detected by the top-down approaches will be a challenge, necessitating chemical synthesis of candidate structures for unequivocal characterisation.

Protein adductomics is a component of the exposome concept that is still relatively novel, but it is one that has already demonstrated the ability to capture electrophiles of both endogenous and exogenous origin; this suggests the potential to contribute meaningfully to the aims of the exposome concept—to describe the totality of all biologically relevant exposures. Rapid advances in mass spectrometry instrumentation, with significant increases in sensitivity and resolution, will drive further advances in protein adductomics methodology. When coupled with other omics approaches, such as proteomics, transcriptomics and metabolomics, all of which have the potential for high-throughput screening of populations, a future can be envisaged in which it will be possible to capture snapshots of human exposure to genotoxicants and the resultant biological consequences at multiple stages throughout life. Building this comprehensive picture should shed significant light on the causes and courses of chronic diseases in humans. Such knowledge will provide new opportunities for early intervention to reduce potentially harmful human exposure, to monitor the effectiveness of intervention strategies and, ultimately, to prevent diseases before they occur.

**Author Contributions:** Both authors critically reviewed the literature and contributed equally to writing the manuscript.

**Funding:** The authors' research was funded by the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 308610 (the EXPOsOMICS project). Additional funding was from Cancer Research UK (Programme Grant CRUK/A14329) and the MRC-PHE Centre for Environment and Health (MRC grant number G0801056/1).

**Conflicts of Interest:** The authors declare no conflict of interest.


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Internal Doses of Glycidol in Children and Estimation of Associated Cancer Risk**

#### **Jenny Aasa <sup>1</sup>, Efstathios Vryonidis <sup>1</sup>, Lilianne Abramsson-Zetterberg <sup>2</sup> and Margareta Törnqvist <sup>1,</sup>\***


Received: 20 December 2018; Accepted: 29 January 2019; Published: 1 February 2019

**Abstract:** The general population is exposed to the genotoxic carcinogen glycidol via food containing refined edible oils, in which glycidol is present in the form of fatty acid esters. In this study, internal (in vivo) doses of glycidol were determined in a cohort of 50 children and in a reference group of 12 adults (non-smokers and smokers). The lifetime in vivo doses and intakes of glycidol were calculated from the levels of the hemoglobin (Hb) adduct *N*-(2,3-dihydroxypropyl)valine in blood samples from the subjects, demonstrating a fivefold variation between the children. The estimated mean intake (1.4 μg/kg/day) was about two times higher than the intake estimated for children by the European Food Safety Authority. The data from adults indicate that, compared to the children, the non-smoking subjects are exposed to about the same levels and the smoking subjects to higher levels. The estimated lifetime cancer risk (200/10<sup>5</sup>) was calculated by a multiplicative risk model from the lifetime in vivo doses of glycidol in the children, and exceeds what is considered to be an acceptable cancer risk. The results emphasize the importance of further clarifying exposure to glycidol and to other possible precursors that could contribute to the observed adduct levels.

**Keywords:** glycidol; Hb adduct; *N*-(2,3-dihydroxypropyl)valine; in vivo; cancer risk; UPLC/MS/MS

#### **1. Introduction**

Exposure to genotoxic compounds in interaction with other factors contributes to an increased risk of cancer development [1]. For many cancer types, the onset of the disease occurs several decades after a specific exposure [2]. Quantification of the cancer risk from exposure to genotoxic compounds is usually based on data from carcinogenicity studies at high doses in rodents, with extrapolation of the obtained cancer risk coefficient to estimated human exposure doses of the studied compound. An improved estimate of the risk would be obtained if the internal (in vivo) dose of the studied compound/metabolite could be used for species extrapolations. This is particularly important for ongoing human exposures, where measured in vivo doses also give an improved estimate of the exposure.

One compound for which there is ongoing human exposure is the genotoxic compound glycidol [3]. This compound has received much attention in recent years, largely due to the detection of glycidyl fatty acid esters in food intended for children, such as infant formula [4]. Possible exposure sources are food products containing refined edible oils, in which glycidol is present in the form of glycidyl fatty acid esters [5,6]. The esters are hydrolyzed in the stomach, leading to formation of glycidol (Figure 1) [7]. This is of concern to human health, as glycidol is classified by the International Agency for Research on Cancer (IARC) as probably carcinogenic to humans, Group 2A [8]. Protecting children from exposure to genotoxic compounds is important. Children are considered more vulnerable than adults to exposure to chemical compounds, because children are still growing and organ cells are rapidly dividing [9]. This likely increases the rate of fixation of mutations, which can lead to the onset of cancer.

**Figure 1.** Glycidol is formed in vivo by hydrolysis of glycidyl fatty acid esters [3], where R represents different ester side chains. Glycidol reacts with the N-terminal valine in hemoglobin (Hb) to form *N*-(2,3-dihydroxypropyl)valine (diHOPrVal) as the N-terminal residue.

Here, we will present results from a study of children where the in vivo dose of glycidol has been quantified in blood samples from the individuals. Also, blood samples from 12 adults were analyzed as a reference group. As genotoxic chemicals are usually electrophiles and short-lived in vivo, measurement in vivo of the ultimate genotoxic compound/metabolite is generally difficult per se. Instead, stable adducts to proteins may be used as a biomarker of exposure/internal dose. We have previously developed a method for measurement of adducts to the N-terminal valine in hemoglobin (Hb) using GC/MS/MS [10]. Later, the method was developed for LC/MS/MS, and is referred to as the FIRE procedure [11,12]. If the reaction rate constant for the formation of the specific adduct from the studied electrophile to the N-terminal valine is known, the in vivo dose expressed as the area under the concentration–time curve (AUC) can be calculated for the electrophile from a measured Hb adduct level [13]. This analytical approach has previously been applied in studies of exposed animals and humans for monitoring of internal exposures and quantification of in vivo doses of reactive compounds, for instance present in food [14–16], or for the screening of adducts with an adductomics approach for the investigation of background exposures in humans [17,18]. In the present study, in vivo doses of glycidol were calculated from measured Hb adduct levels in the human blood samples (Figure 1). The in vivo doses were then used for the estimation of the lifetime excess cancer risk, due to glycidol exposure.

#### **2. Materials and Methods**

#### *2.1. Chemicals*

Glycidol (98%, CAS No. 556-52-5) was purchased from Acros Organics (Geel, Belgium). L-Valine-(13C5) (96–98% purity), RS-glyceraldehyde and sodium cyanoborohydride, used for the synthesis of the internal standard *N*-(2,3-dihydroxypropyl)-(13C5)valine, were obtained from Cambridge Isotope Laboratories, Inc (Tewksbury, MA, USA) and Sigma Aldrich (St Louis, MO, USA) respectively. Cyanoacetic acid and ammonium hydroxide were purchased from Fluka (Buchs, Switzerland). The analytical standard used for the calibration curve, fluorescein thiohydantoin of *N*-(2,3-dihydroxypropyl)valine (diHOPrVal-FTH), was synthesized previously within the research group [15]. Fluorescein isothiocyanate (FITC) was purchased from Karl Industries (Aurora, OH, USA) and potassium hydrogen carbonate (KHCO3) from Merck (Darmstadt, Germany). All other chemicals (analytical grade) were obtained from Sigma Aldrich (St Louis, MO, USA).

#### *2.2. Study Population*

Blood samples from 50 children at the age of about 12 years (35 boys, 15 girls) were obtained from a study in 2014 by the National Food Agency in Sweden regarding food-related exposures in children of school age (reviewed by the Regional Ethical Review Board, Uppsala, Sweden; No. 2013/354 (date of approval: 13 December 2013); following the rules of the Declaration of Helsinki). Venous blood from each individual was sampled on one occasion, and the samples used for the analysis of Hb adducts were centrifuged at 1500× *g* for 10 min to separate red blood cells (RBCs) from the plasma prior to storage of the RBCs at −20 °C. In addition to these samples, measurements were also conducted on blood samples from six non-smoking and six smoking adults (males), collected earlier (1997) with ethical approval from the Regional Ethical Review Board, Stockholm, Sweden (No. 96-312; date of approval: 14 October 1996). In connection with the blood collection, these blood samples were centrifuged at 1500× *g* for 10 min, to separate RBCs and plasma. The RBCs were washed three times with an equal volume of 0.9% NaCl followed by centrifugation and lysis by the addition of an equal volume of distilled water prior to storage at −20 °C.

#### *2.3. Synthesis of Internal Standard*

The internal standard (IS), *N*-(2,3-dihydroxypropyl)-(13C5)valine fluorescein thiohydantoin (FTH), was synthesized in two steps. First, RS-glyceraldehyde (21.6 mg, 240 μmol) and (13C5)valine (13.7 mg, 116 μmol) were mixed in 3.6 mL methanol, followed by the addition of sodium cyanoborohydride (8.9 mg, 142 μmol) for a reduction of the formed Schiff base. The reaction solution was mixed (750 rpm) for 20 hours at 37 °C, followed by evaporation of the solvent. In the second step, the product, *N*-(2,3-dihydroxypropyl)-(13C5)valine, was dissolved in 5 mL acetonitrile (40%, aq.) with 0.125 M KHCO3 and 540 μL FITC (90 mg, 231 μmol, dissolved in dimethylformamide) to generate the final FTH derivative to be used as IS. The derivatization reaction was kept for 20 hours at 37 °C during mixing (700 rpm), before termination by addition of 1 M HCl (250 μL, 250 μmol). The reaction solution was stored in the freezer overnight followed by centrifugation for 10 min (5000 rpm at 4 °C).

The supernatant was filtered and concentrated to ca. 2 mL, and the final product to be used as IS was separated from by-products using semi-preparative HPLC-UV. Eight injections of 250 μL each were applied to a Hichrom C18 column (10 mm × 250 mm, 5 μm). The mobile phase was run in isocratic mode for 8 min, followed by gradient mode starting from 40% A (water) and increasing to 100% B (acetonitrile) in 1 min, which was held for 4 min before re-equilibration of the column for 3 min prior to the next injection. The flow rate was 4 mL/min and the total run time was 16 min. The UV spectrophotometer (Shimadzu SPD-6A) was set to a wavelength (λ) of 274 nm.

The identity, the purity and the *m/z* of the product were confirmed using LC/UV/MS/MS (API 3200 Q-Trap, AB Sciex, Concord, ON, Canada) running in full scan positive mode (*m*/*z* 80–800). Five μL was injected onto the column (Discovery HS, C18, 150 × 2.1 mm, 3 μm) with a gradient starting from 90% A (water, 0.1% formic acid) for 0.5 min and increasing to 40% B (acetonitrile, 0.1% formic acid) in 6.5 min. B was then increased further to 100% and held for 2 min, followed by re-equilibration of the column. The flow rate was 0.2 mL/min and the total run time was 12 min. All settings for the MS instrument were essentially as in previous studies [11,15]. Finally, the solvent of the verified synthesized product was evaporated to dryness, and the yield of the product was determined gravimetrically to be 7.5 mg (13.2 μmol, 11.4%).
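The reported yield can be checked with a short calculation. The sketch below assumes that the yield is expressed relative to (13C5)valine (116 μmol), which is the reagent present in the smallest molar amount in Section 2.3; the helper names are hypothetical.

```python
def percent_yield(product_umol, limiting_reagent_umol):
    """Molar yield of product relative to the limiting reagent, in percent."""
    return 100.0 * product_umol / limiting_reagent_umol


def implied_molar_mass(mass_mg, amount_umol):
    """Molar mass (g/mol) implied by a weighed mass and a stated molar amount."""
    return mass_mg / amount_umol * 1000.0  # mg/umol equals kg/mol, so scale to g/mol


# Values quoted in Section 2.3; valine as limiting reagent is an assumption.
yield_pct = percent_yield(product_umol=13.2, limiting_reagent_umol=116.0)
molar_mass = implied_molar_mass(mass_mg=7.5, amount_umol=13.2)
```

Under this assumption the calculation reproduces the stated 11.4%, and the implied molar mass of roughly 568 g/mol is consistent with a fluorescein thiohydantoin derivative carrying five 13C atoms.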

#### *2.4. Procedure for Hemoglobin Adduct Measurement*

The blood samples (250 μL) were prepared for analysis according to the FIRE procedure, where the fluorescein isothiocyanate (FITC) reagent is used for the measurement of adducts (*R*) from electrophilic compounds with a modified Edman procedure [11,12]. The hemoglobin (Hb) content was measured in all samples (RBCs diluted with water, 1:1) prior to derivatization with FITC and detachment of the adducted N-terminal residues, carried out with mixing at 37 °C overnight. Internal standard (*N*-(2,3-dihydroxypropyl)-(13C5)valine) was added prior to the work-up procedure, which was performed as described in several previous papers [15,17,19]. A final sample volume of 100 μL (40% acetonitrile in water) was used for analysis of Hb adducts with ultra-high performance liquid chromatography (UPLC™) and high-resolution mass spectrometry (HRMS) (Section 2.6). The intraday variability of the FIRE procedure was investigated by processing three individual blood samples from the children five times in parallel on one occasion.

Calibration samples were prepared by adding known concentrations of diHOPrVal-FTH to bovine blood (Håtunalab AB, Bro, Sweden) followed by the work-up procedure. Two sets of calibration samples were prepared: Set 1) in duplicates at four levels at 0.04–1.6 pmol/sample (for measurement of background levels in human samples) and Set 2) in triplicates at six levels at 11–600 pmol/sample (for kval determination). The preparation and analysis of the calibration samples were as described by Aasa et al. [15], with the exception of the analysis of the samples from Set 1, which were analyzed with HRMS (described in Section 2.6).
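A calibration curve of this kind is typically evaluated by fitting a straight line to the response ratio (analyte/internal standard) against the added amount, and then inverting the line for unknown samples. The following is a minimal sketch using ordinary least squares; the calibration levels lie within the 0.04–1.6 pmol/sample range of Set 1, but the individual levels and response ratios are synthetic, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amount_from_ratio(ratio, slope, intercept):
    """Invert the calibration line to obtain the amount in pmol/sample."""
    return (ratio - intercept) / slope


# Synthetic, perfectly linear calibration data (illustrative values only).
amounts_pmol = [0.04, 0.2, 0.8, 1.6]
ratios = [0.5 * a + 0.01 for a in amounts_pmol]

slope, intercept = fit_line(amounts_pmol, ratios)
unknown_pmol = amount_from_ratio(0.26, slope, intercept)
```

Dividing the resulting amount by the mass of Hb in the sample would give the adduct level in pmol/g Hb, the unit used in Section 2.5.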

#### *2.5. Measurement of Reaction Rate and Calculation of Internal Dose from Adduct Levels*

The daily adduct level increment (*a*) was calculated from the measured steady-state level of the adducts (*A*<sub>ss</sub>) and the erythrocyte lifetime (*t*<sub>er</sub>), according to Equation (1). The *t*<sub>er</sub> in humans was assumed to be 126 days [13,20,21].

$$A_{ss} = a \frac{t_{er}}{2} \tag{1}$$
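As a minimal sketch of how Equation (1) is applied, the relation can be rearranged to give the daily increment from a measured steady-state level. The function name is hypothetical, and the two example inputs are simply the lowest and highest adduct levels reported for the children in Section 3.

```python
def daily_adduct_increment(a_ss: float, t_er_days: float = 126.0) -> float:
    """Rearrange Equation (1), A_ss = a * t_er / 2, to solve for a.

    a_ss      -- steady-state adduct level (pmol/g Hb)
    t_er_days -- erythrocyte lifetime in days (126 days is assumed in the text)
    """
    if t_er_days <= 0:
        raise ValueError("erythrocyte lifetime must be positive")
    return 2.0 * a_ss / t_er_days


if __name__ == "__main__":
    # Lowest and highest adduct levels reported for the children (pmol/g Hb).
    for a_ss in (4.4, 20.0):
        a = daily_adduct_increment(a_ss)
        print(f"A_ss = {a_ss:5.1f} pmol/g Hb -> a = {a:.3f} pmol/g Hb per day")
```

The factor of 2 reflects that, at steady state, the adduct level corresponds to half an erythrocyte lifetime of accumulated daily increments.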

The daily adduct increment was then used to calculate the daily AUC, using the second-order rate constant for the reaction between glycidol and the N-terminal valine, kval. This constant was determined by incubation of glycidol with fresh human blood from four individuals (from Komponentlab, Karolinska University Hospital, Huddinge, Sweden). Triplicate samples of whole blood from each individual at three dose levels, 0 (control), 125 and 250 μM glycidol (Hb: 102–136 g/L), were incubated for one hour at 37 °C during mixing (750 rpm). The incubations were terminated by centrifugation and washing according to previous procedures [15]. The samples (250 μL) were then derivatized with FITC overnight, followed by work-up and analysis by LC/MS/MS, as described by Aasa et al. [15] and as for calibration Set 2 (Section 2.4). The kval could then be determined and used to calculate the in vivo doses (AUC) from measured adduct levels (*A*) according to Equation (2), assuming that glycidol at these concentrations is stable during the 1-hour incubation time, as observed earlier [15]. To calculate the daily AUC, *A* is replaced with the daily adduct increment (*a*) calculated with Equation (1).

$$\mathrm{AUC}\ (\mu\mathrm{Mh}) = \frac{A\ (\mathrm{pmol/g\ Hb})}{k_{\mathrm{val}}\ (\mathrm{pmol/g\ Hb}/\mu\mathrm{Mh})} \tag{2}$$
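Equation (2) is a simple division of an adduct level by the rate constant. A short sketch follows, assuming the kval of 19.2 pmol/g Hb per μMh reported in Section 3 and the 126-day erythrocyte lifetime used for Equation (1); the function names are illustrative only.

```python
K_VAL = 19.2  # pmol/g Hb per uMh; value reported in Section 3


def auc_from_adduct(adduct_level: float, k_val: float = K_VAL) -> float:
    """Equation (2): AUC (uMh) = A (pmol/g Hb) / k_val (pmol/g Hb per uMh)."""
    if k_val <= 0:
        raise ValueError("k_val must be positive")
    return adduct_level / k_val


def daily_auc(a_ss: float, t_er_days: float = 126.0, k_val: float = K_VAL) -> float:
    """Daily AUC: Equation (2) with A replaced by the daily increment a.

    The daily increment is obtained from Equation (1) as a = 2 * A_ss / t_er.
    """
    daily_increment = 2.0 * a_ss / t_er_days
    return auc_from_adduct(daily_increment, k_val)


if __name__ == "__main__":
    # Highest adduct level reported for the children (pmol/g Hb).
    print(f"daily AUC = {daily_auc(20.0):.4f} uMh per day")
```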

#### *2.6. LC/MS/MS System*

The analysis of Hb adducts in the in vitro incubations was performed with a triple quadrupole LC/MS/MS instrument, as described by Aasa et al. [15]. Compared to the previous analysis, the *m/z* transitions 563.1 > 447.1 (from glycidol) and 565.1 > 449.1 (from the internal standard) were also included in the analysis.

The analysis of Hb adducts in blood samples from the studied groups of humans was conducted with a Dionex Ultimate 3000 UHPLC system connected to a Q Exactive™ HF Hybrid Quadrupole-Orbitrap™ high-resolution mass spectrometer (HRMS) (Thermo Fisher Scientific, MA, USA). The chromatographic separation was performed by injection (20 μL) on an Acquity UPLC™ HSS C18 column, 2.1 × 100 mm, 1.8 μm (Waters, Sollentuna, Sweden). The mobile phase (0.3 mL/min) was run in gradient mode, starting at 80% A (0.1% formic acid in H2O:ACN; 95:5, *v*/*v*) and increasing to 100% B (0.1% formic acid in H2O:ACN; 5:95, *v*/*v*) in 9 min. The final composition was held for 2.5 min before re-equilibration for 3 min. *N*-(2,3-Dihydroxypropyl)valine fluorescein thiohydantoin was monitored in parallel reaction monitoring (PRM) mode with a resolution of 60 000 and a normalized collision energy (NCE) of 45. The software XCalibur (Thermo Fisher Scientific, MA, USA) was used for processing of the data. The levels of the Hb adduct were quantified by using the accurate masses *m*/*z* 563.1463 (diHOPrVal-FTH) and *m*/*z* 568.1629 (internal standard, *N*-(2,3-dihydroxypropyl)-(13C5)valine fluorescein thiohydantoin), together with the specific fragments of diHOPrVal-FTH (*m/z* 503.0893, 460.0704 and 447.0629) and of the internal standard (*m/z* 505.0955, 462.0772 and 449.0694), with a 3–5 ppm mass tolerance.
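To make the ppm mass tolerance concrete, the sketch below converts a tolerance in ppm into an absolute *m/z* extraction window and computes the ppm error of an observed mass. It is a generic calculation, not the XCalibur implementation, and the helper names are hypothetical.

```python
def ppm_window(mz: float, ppm: float) -> tuple[float, float]:
    """Return the (low, high) m/z limits for a symmetric tolerance given in ppm."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass error of an observed m/z relative to the theoretical value, in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


if __name__ == "__main__":
    precursor = 563.1463  # diHOPrVal-FTH precursor m/z quoted in the text
    for tol in (3.0, 5.0):
        low, high = ppm_window(precursor, tol)
        print(f"{tol:.0f} ppm window: {low:.4f} to {high:.4f}")
```

At *m/z* 563.1463 a 5 ppm tolerance corresponds to roughly ±0.0028 *m/z* units, which is why accurate-mass extraction is far more selective than a unit-resolution transition.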

#### *2.7. Statistical Analysis*

For the measured and calculated parameters (Hb adduct levels, AUC and intakes), the minimum and maximum values along with the mean values and the standard deviations are reported for the studied groups. A *t*-test and Grubbs' test were used to analyze differences between the studied groups and to test for outliers, respectively.
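For readers unfamiliar with Grubbs' test, the following sketch computes only the test statistic, G = max|x − mean| / s, using the Python standard library. The replicate values are invented for illustration, and the comparison of G with a tabulated critical value (which depends on sample size and significance level) is deliberately left out.

```python
import statistics


def grubbs_statistic(values: list[float]) -> tuple[float, int]:
    """Return Grubbs' G and the index of the most extreme value.

    G is the largest absolute deviation from the sample mean divided by
    the sample standard deviation (n - 1 in the denominator).
    """
    if len(values) < 3:
        raise ValueError("Grubbs' test needs at least three values")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("all values are identical")
    deviations = [abs(v - mean) for v in values]
    idx = max(range(len(values)), key=deviations.__getitem__)
    return deviations[idx] / sd, idx


if __name__ == "__main__":
    replicates = [1.0, 2.0, 3.0, 4.0, 10.0]  # hypothetical data, not from the study
    g, idx = grubbs_statistic(replicates)
    print(f"G = {g:.3f}, suspect value = {replicates[idx]}")
```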

#### **3. Results**

For the present study we used high-resolution mass spectrometry (HRMS) operating in parallel reaction monitoring (PRM) mode, which improved the selectivity for detection of the adduct *N*-(2,3-dihydroxypropyl)valine (diHOPrVal). We also synthesized an isotopically substituted internal standard, *N*-(2,3-dihydroxypropyl)-(13C5)valine fluorescein thiohydantoin, specific for the quantification of the studied adduct, which improves the quantification compared to our previous studies [15,18]. A representative chromatogram from analysis of a human blood sample and an *m/z* spectrum of *N*-(2,3-dihydroxypropyl)valine-FTH are shown in Figure 2. The variability of the FIRE method was tested by processing five parallel blood samples from three children, which showed a relative standard deviation of 4.2–7.3% among the replicates (data not shown).

**Figure 2.** Ion chromatogram from a human sample and exact mass spectrum for the fluorescein thiohydantoin (FTH) derivative (diHOPrVal-FTH) of the N-terminal valine adduct formed by glycidol (6.6 pmol/g Hb at 3 ppm mass tolerance). The *m/z* fragments 503, 460 and 447 were used for quantification of the adduct diHOPrVal-FTH (*m/z* 563).

The Hb adduct could be quantified in all studied subjects. The adduct levels observed in the samples from the children (*n* = 50) varied between 4.4 and 20 pmol/g Hb (Figure 3, Table 1A). No statistically significant difference in the levels was observed between the sexes of the children. For the adults, the range of the observed adduct levels was 6.3–31 pmol/g Hb (Figure 3, Table 1A). Although this group consisted of a small number of subjects, there was a statistically significant difference in the adduct levels between the smoking and the non-smoking adults, with about twofold higher mean levels in the smokers (23.4 pmol/g Hb) compared to the non-smokers (10.3 pmol/g Hb) (*p* < 0.01). The adduct levels of the non-smoking adults were in the range of the adduct levels in the children (Figure 3, Table 1A). Assuming that a chronic exposure (to glycidol or its precursor) gives rise to the observed adduct levels, the daily adduct increment was calculated using Equation (1) (Table 1A).

**Figure 3.** *N*-(2,3-Dihydroxypropyl)valine adduct levels in blood from children (*n* = 50), non-smoking adults (*n* = 6), and smoking adults (*n* = 6). The mean values are marked as horizontal bars. The higher level observed in the smokers is likely due to the presence of glycidol in tobacco smoke (see Section 4).

**Table 1.** (A) Measured steady state Hb adduct levels (*N*-(2,3-dihydroxypropyl)valine) in blood samples from children and adults and corresponding calculated daily adduct level increments, and (B) the corresponding estimated daily in vivo doses (AUC) and intakes of glycidol.


<sup>a</sup> Mean levels quantified from four different *m/z* transitions in the MS analysis of the studied Hb adduct. <sup>b</sup> The daily AUC was calculated from the daily adduct level increment (Equation 1) and the second-order reaction rate constant, kval (Equation 2), assumed to be at steady state (from chronic exposure). <sup>c</sup> The calculations of the daily Hb adduct increment, AUC and intake have been reported for boys and girls together as no statistical differences were observed between the sexes. <sup>d</sup> Glycidol in tobacco smoking is likely contributing to the daily intake level (see Section 4).

The daily intake of glycidol (μg/kg per day) was then calculated from the obtained daily Hb adduct level increments. This calculation requires the relation between the adduct level (or in vivo dose) and the administered dose of glycidol. We used the results recently published by Abraham et al., who studied the relation between glycidol-induced diHOPrVal adduct levels and administered dose of glycidol from palm fat in human subjects [22]. The adduct increment in their study was calculated to be 82 pmol/g Hb per mg glycidol/kg body weight (b.w.). This figure was used to estimate the mean daily intakes of glycidol in the different groups of subjects in our study, as presented in Table 1B.
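The intake estimate described above amounts to dividing the daily adduct increment by the factor from Abraham et al. and converting mg to μg. A minimal sketch, with a hypothetical function name and the children's adduct range from the text as illustrative input:

```python
ADDUCT_PER_DOSE = 82.0  # pmol/g Hb per (mg glycidol/kg b.w.), from Abraham et al. [22]


def daily_intake_ug_per_kg(daily_increment: float,
                           adduct_per_dose: float = ADDUCT_PER_DOSE) -> float:
    """Convert a daily adduct increment (pmol/g Hb per day) to intake (ug/kg b.w. per day)."""
    if adduct_per_dose <= 0:
        raise ValueError("adduct_per_dose must be positive")
    intake_mg_per_kg = daily_increment / adduct_per_dose
    return intake_mg_per_kg * 1000.0  # mg -> ug


if __name__ == "__main__":
    t_er_days = 126.0  # erythrocyte lifetime assumed in Section 2.5
    for a_ss in (4.4, 20.0):  # range of adduct levels in the children (pmol/g Hb)
        increment = 2.0 * a_ss / t_er_days  # Equation (1) rearranged
        print(f"A_ss = {a_ss:4.1f} -> intake = {daily_intake_ug_per_kg(increment):.2f} ug/kg per day")
```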

Further, the daily in vivo doses (AUC) of glycidol in the studied human individuals (Table 1B) were calculated from the daily adduct level increments and the kval according to Equation (2). The second-order reaction rate constant (kval) was determined to be 19.2 ± 0.6 pmol/g Hb per μMh from the linear slopes of plots of the Hb adduct levels (*y*-axis) obtained from triplicate incubations of glycidol at two doses (AUC in vitro: *x*-axis) in fresh human whole blood from four individuals (Figure 4). The AUC was used for the calculation of the cancer risk due to glycidol exposure, as further discussed in the Discussion.
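The slope-based determination of kval can be sketched as an ordinary least-squares fit of adduct level against in vitro AUC (concentration × incubation time). The adduct values below are invented so that the fit reproduces the reported slope; they are not the study's measurements, and the authors' exact fitting procedure is not specified in the text.

```python
def linear_slope(x: list[float], y: list[float]) -> float:
    """Ordinary least-squares slope of y on x (with intercept)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


if __name__ == "__main__":
    incubation_h = 1.0
    doses_um = [0.0, 125.0, 250.0]                # dose levels used in Section 2.5
    auc_in_vitro = [c * incubation_h for c in doses_um]
    adducts = [0.0, 2400.0, 4800.0]               # hypothetical adduct levels (pmol/g Hb)
    print(f"k_val = {linear_slope(auc_in_vitro, adducts):.1f} pmol/g Hb per uMh")
```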

**Figure 4.** Determination of the second-order reaction rate constant, kval, for the formation of the *N*-(2,3-dihydroxypropyl)valine adduct in Hb. The data points represent the adduct levels from 1-hour incubations of glycidol with fresh human blood from triplicate samples at each dose level from four individuals, where each point corresponds to the mean from two *m/z* transitions. One replicate at the highest dose for individual 2 (data point in parentheses) was excluded from the analyses as it was judged to be an outlier (Grubbs' test).

#### **4. Discussion**

#### *4.1. N-(2,3-dihydroxypropyl)valine Adduct Levels*

In this study, we quantified the levels of the *N*-(2,3-dihydroxypropyl)valine adduct (diHOPrVal) in samples from 50 children of about 12 years of age, showing a fivefold variation in the adduct levels (ca. 4–20 pmol/g Hb). No significant difference was observed between the sexes of the children. The diHOPrVal adduct levels were also quantified in a small number of adults (*n* = 12), where the mean adduct levels were about twice as high in the smokers as in the non-smokers. As glycidol is known to be present in tobacco smoke, this was expected [23,24]. The mean Hb adduct level in the studied non-smoking adults indicated approximately the same exposure for this group as for the children. A larger sample of adults should be included for a more reliable comparison.

The diHOPrVal adduct in Hb has earlier been quantified only in small groups of adults in a few published studies, which all show somewhat lower levels compared to our study (Table 2). The observed variation of the mean adduct levels between the different studies may be due to differences in exposure between the studied groups, but also due to the fact that the analyses are performed at different laboratories and with different analytical methods for measurement of the N-terminal adduct in Hb; the *N*-alkyl Edman (GC/MS/MS) and the FIRE procedure (LC/MS/MS), and with no inter-calibration between the laboratories.



<sup>a</sup> FIRE procedure (adduct level expressed as per g Hb, approximately the same as per globin). <sup>b</sup> *N*-alkyl Edman method.

We have assumed that the observed diHOPrVal adduct levels in children and non-smokers originate from exposure to the genotoxic compound glycidol via food, but this adduct may also theoretically originate from other precursors (Figure 5). One possibility is the food contaminant 3-monochloropropane-1,2-diol (3-MCPD), which often occurs together with glycidol in food. 3-MCPD would, however, make only a very small contribution, as its rate constant for formation of the adduct to the N-terminal in Hb is more than 1000 times lower than that of glycidol [28]. The food-related compounds allyl alcohol, found in garlic, or anhydro sugars from carbohydrates, can also theoretically be precursors to the adduct [29,30]. Glycidol could plausibly be formed from allyl alcohol via metabolic oxidation. The heating of carbohydrate-rich food (anhydro sugars) could theoretically form glycidol, as indicated by an animal experiment in which animals were given heat-processed feed and the diHOPrVal adduct was measured [30]. Other theoretically possible precursors are the endogenously produced glyceraldehyde and glycidaldehyde. Both compounds, though, require reduction after formation of a Schiff base with the N-terminal valine in Hb to form the stable diHOPrVal adduct [30]. It is not known whether the reduction of protein adducts from Schiff bases occurs in vivo, but it was observed to occur in blood in vitro [31]. Epichlorohydrin, from occupational exposure, could also form diHOPrVal, but it is not a probable exposure source in the presently studied group of humans [27]. Thus, sources of the measured adduct other than glycidol cannot be excluded, which could potentially lead to an overestimation of the in vivo doses (AUC) of glycidol in the studied subjects. Low-molecular-mass adducts can in many cases have several possible precursor electrophiles.

**Figure 5.** Examples of possible precursors to *N*-(2,3-dihydroxypropyl)valine (cf. [29,30]).

In this study, we calculated the AUC of glycidol from the quantified diHOPrVal levels in human subjects, assuming that glycidol is the dominating source of the adduct. The fivefold differences in the measured adduct levels and the corresponding intakes and AUC of glycidol among the children (Table 1) could partly be explained by different dietary habits, as glycidyl fatty acid esters are present in different types of food products [3], but also by different genotypes/phenotypes for metabolizing enzymes (e.g., epoxide hydrolase and glutathione transferase, cf. [3]). However, we could not observe any significant correlations between the diHOPrVal levels in the children and any type of registered food product (from food frequency questionnaires) with potential impact on glycidol exposure (bread, sweets, chips and other fried food) in this limited study (data not shown). Studies of a larger number of subjects, using food frequency questionnaires with more specific questions, could possibly provide sufficient statistical material for such analyses.

#### *4.2. In Vivo Dose and Intake of Glycidol*

Knowledge about the AUC of a particular exposure allows a more accurate estimation of the cancer risk. Assuming that the AUCs and the intakes calculated for the studied subjects reflect the exposure of the general European population, the values imply a higher exposure to glycidol than expected from the mean glycidol intakes estimated by the European Food Safety Authority (EFSA), ca. 0.2 μg/kg b.w./day and 0.6 μg/kg b.w./day for adults and children, respectively [3], and by the Swedish National Food Agency (NFA), 0.1 μg/kg b.w./day for adults [32] (Table 3). The relation between adduct level and intake of glycidol in humans was obtained from a recently published human exposure study by Abraham et al., which involved 11 persons with intake of palm fat over 4 weeks corresponding to a mean daily intake of glycidol of 4.3 μg/kg b.w. [22]. The methods for measurement of the diHOPrVal adducts used by Abraham et al. and by us are not inter-calibrated, which might contribute some uncertainty to this calculation, as does the assumed mean erythrocyte lifetime of 126 days (cf. Mitlyng et al. [33] and Abraham et al. [22]).

**Table 3.** Estimated daily intakes of glycidol and calculated AUC at corresponding estimated lifetime exposures of glycidol, calculated from Hb adduct levels measured in children and adults (non-smokers) in the present study. Daily intakes estimated from dietary patterns by the European Food Safety Authority (EFSA) and the Swedish National Food Agency (NFA) are used for comparison [3,32].


<sup>a</sup> n.a.: not available.

The corresponding figures of the AUC per administered dose of glycidol, obtained by different methods, are ca. 35% lower both for rats and monkeys [34,35] compared with the value calculated for humans, which indicates that there are no major differences in the disappearance rate of glycidol between these species and which also supports the reliability of the obtained human data.

#### *4.3. Human Cancer Risk*

Different models have been used to assess the human cancer risk of glycidol based on published carcinogenicity data from studies in rodents [36,37]. The European Food Safety Authority (EFSA) has used the Margin of Exposure (MOE) approach, which is based on estimated intake values of glycidol and the reference point T25 (the dose corresponding to a 25% tumor incidence in the animals used in the carcinogenicity studies) [3]. Using the MOE, EFSA concluded that there is a health concern associated with glycidol exposure. Furthermore, the California Environmental Protection Agency (C.EPA) has calculated a non-significant risk level (NSRL; 1 cancer case in 10<sup>5</sup> individuals over a lifetime) of 0.54 μg glycidol per day using an additive risk model [38].

Common to these two models is that the estimation of the cancer risk is based on extrapolation from rat to human and on the administered doses of glycidol in the carcinogenicity studies. As an improvement of cancer risk estimations, our group at Stockholm University has developed a risk model based on species extrapolation via internal doses (AUC) and background tumor incidence. This model is referred to as the multiplicative risk model and has recently been validated for glycidol (presented in a forthcoming paper [33]) and for a few other genotoxic carcinogens [39–41]. With this model, described in detail in the cited papers, a cancer risk coefficient (β), which describes the relative tumor risk per in vivo dose, is derived. The relative risk coefficient has been shown to be approximately independent of tumor site, sex, and species for all tested compounds as well as for ionizing radiation. Thus, a common risk coefficient can be derived from the responding sites in the test species in animal cancer tests with a compound. Accordingly, this risk coefficient is assumed to be valid in humans as well and can be used to calculate the human cancer risk for the specific exposure studied, when the in vivo dose and background cancer incidence are known.

For the calculation of the human cancer risk due to glycidol exposure, the in vivo dose over a lifetime (70 years) is compared to the obtained cancer risk coefficient β [33], which is somewhat higher than the risk coefficient obtained by an additive model, as by C.EPA [38]. In the present work, we estimated a mean daily intake of 1.4 μg glycidol per kg body weight for the children, which implies about 36 mg/kg during a lifetime (70 years). This is equivalent to a cumulative AUC of ca. 150 μMh (Table 3). Assuming that the background tumor frequency in the general human population is ca. 30% [42], this lifetime exposure leads to an estimate of ca. 200 additional cancer cases in a population of 100,000 (i.e., relative risk increment). In the calculations, we have not considered different contributions to the risk from exposure at different ages, where children in general have higher risk increments per AUC compared with adults. This was observed for subjects exposed to ionizing radiation, where the excess relative risk for children is higher compared to adults, as reviewed by Kutanzi et al. [43].
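The lifetime figures quoted above can be checked with a few lines of arithmetic. This sketch assumes 365 days per year and reuses the conversion factor of 82 pmol/g Hb per mg/kg [22] and the kval of 19.2 pmol/g Hb per μMh from Section 3; the function names are hypothetical, and the risk coefficient β itself is not reproduced because its value is not given in the text.

```python
DAYS_PER_YEAR = 365        # assumption; leap days are ignored
ADDUCT_PER_DOSE = 82.0     # pmol/g Hb per (mg glycidol/kg b.w.) [22]
K_VAL = 19.2               # pmol/g Hb per uMh (Section 3)


def lifetime_intake_mg_per_kg(daily_intake_ug: float, years: int = 70) -> float:
    """Cumulative intake (mg/kg b.w.) from a constant daily intake (ug/kg b.w. per day)."""
    return daily_intake_ug * DAYS_PER_YEAR * years / 1000.0


def lifetime_auc_umh(daily_intake_ug: float, years: int = 70) -> float:
    """Cumulative AUC (uMh) from a constant daily intake (ug/kg b.w. per day)."""
    daily_increment = daily_intake_ug / 1000.0 * ADDUCT_PER_DOSE  # pmol/g Hb per day
    daily_auc = daily_increment / K_VAL                           # Equation (2)
    return daily_auc * DAYS_PER_YEAR * years


if __name__ == "__main__":
    intake = 1.4  # mean daily intake estimated for the children (ug/kg b.w.)
    print(f"lifetime intake = {lifetime_intake_mg_per_kg(intake):.1f} mg/kg")
    print(f"lifetime AUC    = {lifetime_auc_umh(intake):.0f} uMh")
```

Under these assumptions the results (about 36 mg/kg and about 150 μMh) agree with the rounded values given in the text.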

#### **5. Conclusions**

This is the first study to measure the Hb adduct *N*-(2,3-dihydroxypropyl)valine and to calculate the corresponding in vivo doses of glycidol in children. The observed variation in the in vivo doses of glycidol within the children's cohort is likely due to dietary habits and/or different genotypes/phenotypes of metabolic enzymes. The data on diHOPrVal adduct levels in the children as well as in the small group of adults, despite some remaining uncertainties, indicate that the calculated intakes of glycidol give contributions that exceed what is considered to be an acceptable cancer risk, using a multiplicative cancer risk model. The obtained data, calculated intakes, and corresponding estimated cancer risks emphasize the importance of further clarifying the background exposure to glycidol from food, as well as other possible sources of the observed diHOPrVal adduct levels in the population, particularly in children.

**Author Contributions:** Conceptualization, M.T.; Data curation, L.A.-Z. and M.T.; Funding acquisition, M.T.; Investigation, J.A. and E.V.; Methodology, E.V.; Project administration, M.T.; Resources, L.A.-Z. and M.T.; Supervision, J.A. and M.T.; Visualization, J.A. and E.V.; Writing – original draft, J.A.; Writing – review & editing, J.A., E.V., L.A.-Z. and M.T.

**Funding:** This research was funded by the Research Council Formas grant number 216-2012-1450 and Stockholm University, Stockholm, Sweden.

**Acknowledgments:** We wish to thank Natalia Kotova for building up the used pilot biobank of blood samples from the children at the Swedish National Food Agency, which was financially supported by the Civil Contingencies Agency.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Review* **Biological Evaluation of DNA Biomarkers in a Chemically Defined and Site-Specific Manner**

#### **Ke Bian <sup>1</sup>, James C. Delaney <sup>2</sup>, Xianhao Zhou <sup>1</sup> and Deyu Li <sup>1,</sup>\***


Received: 25 May 2019; Accepted: 14 June 2019; Published: 25 June 2019

**Abstract:** As described elsewhere in this Special Issue on biomarkers, much progress has been made in the detection of modified DNA within organisms at endogenous and exogenous levels of exposure to chemical species, including putative carcinogens and chemotherapeutic agents. Advances in the detection of damaged or unnatural bases have been able to provide correlations to support or refute hypotheses between the level of exposure to oxidative, alkylative, and other stresses, and the resulting DNA damage (lesion formation). However, such stresses can form a plethora of modified nucleobases, and it is therefore difficult to determine the individual contribution of a particular modification to alter a cell's genetic fate, as measured in the form of toxicity by stalled replication past the damage, by subsequent mutation, and by lesion repair. Chemical incorporation of a modification at a specific site within a vector (site-specific mutagenesis) has been a useful tool to deconvolute what types of damage quantified in biologically relevant systems may lead to toxicity and/or mutagenicity, thereby allowing researchers to focus on the most relevant biomarkers that may impact human health. Here, we will review a sampling of the DNA modifications that have been studied by shuttle vector techniques.

**Keywords:** DNA lesion; DNA damage; shuttle vector technique; replication block; mutagenicity; mutational spectrum; mutational signature; DNA repair; DNA adduct bypass; site-specific mutagenesis

#### **1. Introduction**

The human genome is constantly exposed to and damaged by endogenous chemicals, such as reactive oxygen species, lipid peroxidation intermediates, and alkylating agents. These electrophilic reactive chemicals, as well as environmental carcinogens and administered drugs, are known to generate various DNA adducts [1–3]. Some of the adducts block DNA replication or cause mutations and have been used as biomarkers to monitor the level of DNA damage or of disease progression [4–6]. One of the major goals for researchers is to understand the deleterious consequences of those lesions within the cell or animal. Among the different methods for studying the biological effects of the adducts, use of shuttle vectors containing a chemically defined lesion at a specific site has provided information about the biological and toxicological properties of the adduct [4,7]. The shuttle vector-based methods normally involve the steps outlined in Figure 1. *Oligonucleotide synthesis*: An oligonucleotide (oligo) containing a structurally defined lesion at a specific site is made either through a biomimetic route (in situ formation by direct chemical reaction, followed by HPLC purification of site-specifically modified oligo), or purely synthetically using a normal or convertible nucleoside phosphoramidite, etc. *Vector construction*: An ss- or ds-DNA vector containing the modified oligo is built by cutting the parent vector with one or a pair of restriction endonuclease(s), followed by ligation of the 5′-phosphorylated modified oligo. *Cellular processing*: The vector is transfected into different types of cells (e.g., *Escherichia coli* (*E. coli*) or mammalian), and cellular polymerases are allowed to replicate or transcriptionally bypass the lesion under different repair or bypass conditions. *Data analysis*: DNA is extracted and amplified using PCR, and the biological outcomes are analyzed, which include the ability of the lesion/adduct to block polymerases or cause a mutation when processed by a polymerase during cellular replication. This assessment could be done by plaque or colony counting and picking with Sanger sequencing, <sup>32</sup>P-postlabeling and thin-layer chromatography (TLC), liquid chromatography-mass spectrometry (LC-MS), next-generation sequencing (NGS), etc. [4,5,7–9]. The shuttle vector-based method was initially introduced by Essigmann [7,9–11], and further developed and utilized by Wang [4,5], Moriya [12,13], Livneh [14,15], Greenberg [16,17], Basu [18,19], Lloyd [20,21], Loechler [22,23], Fuchs [24,25], Pagès [26,27], and others. Several informative review articles have been written by these authors on designing and applying the methods.

**Figure 1.** Schematic overview of the shuttle vector-based methods for evaluating DNA biomarkers.

In this work, we will review a variety of DNA biomarkers or probes that have been studied using the shuttle vector techniques and briefly summarize their biological outcomes. In all cases, focus is placed on the effect of the lesion to block replication and to cause mutations. For the details regarding the formation of DNA damage and other properties of the lesions, please refer to the original literature or review articles. We apologize in advance to researchers whose work we could not include in this review. After detailed discussions on individual lesions, we will provide some perspectives on possible future directions.

#### **2. Discussions on Individual Modifications**

Below, we will cover modified DNA structures generated from oxidative stress, alkylation, and other processes (Figures 2–5). In the following sections, the biological effects of a certain lesion are briefly summarized. Please see Figures 2–5 for chemical structures and Table 1 for detailed information.


**Table 1.** Bypass efficiency and mutagenicity of DNA modifications.



**Figure 2.** Structures of oxidative lesions.

**Figure 3.** Structures of alkyl modifications.

**Figure 4.** Structures of bulky and crosslinked lesions.

**Figure 5.** Structures of other nucleotide analogs.

#### *2.1. Oxidative Biomarkers*

All the structures of modifications covered in this section are displayed in Figure 2. 8-Oxo-7,8-dihydro-2′-deoxyguanosine (8-oxo-G) is not a strong block to replication, demonstrating greater than 80% bypass efficiency in *E. coli* [28]. Its mutagenic pairing with A during replication in wild type (WT) cells leads to a low amount of G>T mutation (3%) [28]. However, in MutY- cells (MutY: adenine glycosylase in 8-oxo-G:A base excision repair), the G>T mutation increases to 44% [29]. 8-oxo-G causes mainly G>T mutation with a frequency of 8% in human cells [28,29]. Thymidine glycol (Tg) is not a replication block, and it is not mutagenic in *E. coli*; however, tandem lesions of 8-oxo-G and Tg are twice as effective as a single 8-oxo-G in blocking DNA replication, and the dual lesion is more mutagenic than 8-oxo-G [38]. Fapy-dG (N-(2-deoxy-α,β-D-erythropentofuranosyl)-N-(2,6-diamino-4-hydroxy-5-formamidopyrimidine)) strongly blocks replication by 60–70% in *E. coli*, but it is not very mutagenic, providing less than 2% G>T mutation [31]. Fapy-dG causes 10% G>T mutation in human cells [30]. 5-Guanidino-4-nitroimidazole (NI) strongly blocks replication (93%) in *E. coli*, giving mainly G>T (22%) and G>A (19%) mutations, and some G>C (9%) mutation as well [32]. Oxaluric acid (Oa) is toxic, blocking replication by 50% and causing nearly 100% G>T mutation in *E. coli* [28,31,34]. Oxazolone (Oz) strongly blocks replication and is very mutagenic, causing 86% G>T mutation [28]. Cyanuric acid lesion (Ca) blocks replication by 35% in *E. coli*, and is very mutagenic with 95% G>T mutation [28]. Guanidinohydantoin (Gh) slightly blocks replication (25%), and it is highly mutagenic, yielding 97% G>C and 2% G>T mutation [34]. Two stable stereoisomers of spiroiminodihydantoin (Sp1 and Sp2) are strong replication blocks (91%), and are both very mutagenic, causing mainly G>C (72% for Sp1 and 57% for Sp2) and G>T (27% for Sp1 and 41% for Sp2) mutations [34]. Urea lesion (Ur) is a strong replication block (90%) causing 54% G>T, 35% G>C, and 9% G>A mutations [29,35]. Imidazolone adduct (Iz) can be bypassed in *E. coli* with a 40% blockage in replication, essentially causing G>C (88%) mutation, with some G>A (2%) and G>T (1%) mutations [32].

8,5′-Cyclo-2′-deoxyguanosine (cdG) is a strong replication block (89%) in *E. coli*, and knocking out pol V increases its replication block; it is mutagenic and causes 20% G>A mutation [36]. The 5′*S*-diastereomer of cyclo-dG (S-cdG) also strongly blocks DNA replication (96%) in human cells, giving primarily G>T (35%) and G>A (20%) mutations [37]. 8,5′-Cyclo-2′-deoxyadenosine (cdA) is 31% bypassed in *E. coli*, but the bypass efficiency drops to 13% when pol V is removed from the cell [36]. It is mutagenic and causes A>T (11%) mutation [36]. The 5′*S*-diastereomer of cyclo-dA (S-cdA) strongly blocks replication in human cells by 94% [37]. Knocking down pol η by siRNA decreases the bypass efficiency and mutagenicity of S-cdA [37].

5-Chlorocytosine (5-Cl-dC) blocks replication (25%), forming a low level of C>T mutation (5%) in *E. coli* [39]. 5-Hydroxycytosine (5-OH-dC) is not mutagenic in *E. coli* [40]. 5-Hydroxyuracil (5-OH-dU, derived from 5-OH-dC) is very mutagenic, providing 83% C>T mutation in *E. coli* [40]. 5,6-Dihydroxy-5,6-dihydrouracil (Ug) is also very mutagenic (80% C>T) in *E. coli* [40].

Tetrahydrofuran (THF) is a stable structural analog of the abasic site (AP site), which is itself unstable and may lead to further damage to the DNA strand. THF strongly blocks replication (>95%) and causes G>T (50%), G>C (26%), and G>A (7%) mutations; additionally, it causes 13% −1 frameshift mutation [28,29,32,34,80].
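When comparing lesions across studies, it can help to hold the reported values in a small structured record. The sketch below does this for a few of the oxidative lesions discussed above, using the *E. coli* figures quoted in the text; the class and field names are hypothetical, and treating bypass efficiency as the complement of the replication block (100% minus block) is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lesion:
    name: str
    block_pct: float      # replication block in E. coli (%), as quoted in the text
    main_mutation: str    # most frequent mutation type
    mutation_pct: float   # frequency of that mutation (%)

    @property
    def bypass_pct(self) -> float:
        # Assumption: bypass efficiency is the complement of the replication block.
        return 100.0 - self.block_pct


LESIONS = [
    Lesion("NI", 93.0, "G>T", 22.0),
    Lesion("Ca", 35.0, "G>T", 95.0),
    Lesion("Iz", 40.0, "G>C", 88.0),
    Lesion("cdG", 89.0, "G>A", 20.0),
]


def rank_by_block(lesions: list[Lesion]) -> list[Lesion]:
    """Sort lesions from the strongest to the weakest replication block."""
    return sorted(lesions, key=lambda lesion: lesion.block_pct, reverse=True)


if __name__ == "__main__":
    for lesion in rank_by_block(LESIONS):
        print(f"{lesion.name:4s} block {lesion.block_pct:4.0f}%  "
              f"bypass {lesion.bypass_pct:4.0f}%  "
              f"{lesion.main_mutation} {lesion.mutation_pct:.0f}%")
```

The listing makes the point of the section visible at a glance: the strength of a replication block and the mutagenicity of a lesion vary independently (for example, Ca is a weak block but highly mutagenic, whereas NI is a strong block with moderate mutation frequencies).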

#### *2.2. Alkyl Biomarkers*

All the structures of modifications covered in this section are displayed in Figure 3. 1-Methyldeoxyguanosine (m1G) is a strong replication block either with or without the repair enzyme AlkB (85% and 97%); it mainly causes ~3% G>T mutation in WT *E. coli*, which increases to more than 50% in AlkB- *E. coli* (AlkB: alkyl DNA adduct direct reversal of damage repair protein) [9,33]. *N*2-methylguanine (m2G) weakly blocks replication by 10% in *E. coli*, there is no significant change when knocking out either AlkB or DinB (DinB: DNA polymerase IV), and a small amount of G>A mutation (3%) is seen [42]. *N*2-ethylguanine (e2G) does not block replication in *E. coli* and causes a low amount of G>A mutation (2%); eliminating AlkB and DinB does not change the replication bypass and mutagenicity significantly [42]. *N*2-carboxymethyl-2 -deoxyguanosine

(*N*2-CMdG) and *N*2-(1-carboxyethyl)-2 -deoxyguanosine (*N*2-CEdG) do not block DNA replication and are not mutagenic in WT mammalian cells; however, each of them causes G>A (23%) and G>T (15%) mutations in mouse embryonic fibroblast (MEF) cells that are deficient in pol κ [43]. *N*2-CEdG blocks replication in *E. coli* [44]. The *R*-*N*2-CEdG is a stronger replication block (61%) than *S*-*N*2-CEdG (25%); however, neither of them are mutagenic [44]. *N*2-furfurylguanine (*N*2-FF-dG) does not block replication in WT *E. coli*; however, it blocks replication about 72% in DinB- cells [42]. It is not very mutagenic with or without DinB [42]. 2-Tetrahydrofuran-2-yl-methylguanine (*N*2-HF-dG) is similar in structure to *N*2-FF-dG and strongly blocks replication (72%) only when DinB is knocked out, and causes only 2% G>C mutation [42]. *O*6-methylguanine (*O*6mG) is very mutagenic and leads to almost 100% G>A mutation in Ada/Ogt/UvrB triple knockout *E. coli* (Ada/Ogt: alkyl DNA adduct direct reversal of damage repair protein; UvrB: nucleotide excision repair) [45,46]. *N*-Nitroso compounds induce DNA lesions: *O*6-pyridyloxobutyl-dG (*O*6-POB-dG), *O*6-pyridylhydroxybutyl-dG (*O*6-PHB-dG), *O*6-carboxymethyl-dG (*O*6-CMdG), which have two structural analogs: *O*6-aminocarbonylmethyl-dG (*O*6-ACM-dG) and *O*6-hydroxyethyl-dG (*O*6-HOEt-dG) [47]. *O*6-POB-dG slightly blocks DNA replication and induces G>A (90%) transition and G>T (2.5%) transversion in *E. coli* [47]. *O*6-PHB-dG is a moderate impediment to DNA replication and causes G>A (95%) mutation exclusively in *E. coli* [47]. *O*6-CMdG strongly inhibits replication in *E. coli*, but causes moderate G>A (10%) mutation [47]. *O*6-ACM-dG and *O*6-HOEt-dG are two analogs of *O*6-CM-dG. Both *O*6-ACM-dG (2% bypass) and *O*6-HOEt-dG (15% bypass) strongly block DNA replication [47]. They also induce G>A mutation with 30% and 40% frequencies, respectively [47]. 
Major acrolein-dG adducts include 8α and 8β isomers of 3H-8-hydroxy-3-(β-D-2′-deoxyribofuranosyl)-5,6,7,8-tetrahydropyrido[3,2-a]purine-9-one (γ-OH-PdG), 6α and 6β isomers (α-OH-PdG), and 1,*N*2-(1,3-propano)-2′-deoxyguanosine (PdG) [12]. The bypass efficiency for γ-OH-PdG is 73% compared to dG control in human cells, and γ-OH-PdG is not very mutagenic (<1%) [12]. α-OH-PdG strongly blocks DNA replication with a bypass efficiency of 17% in human cells and it causes G>T (11%) mutation [13]. PdG strongly blocks replication in human cells and mainly causes 6% G>T mutation [12]. Most of the derivatives of PdG moderately block DNA replication in human cells and cause mainly G>T mutation (2–8%) [81]. 1,*N*2-ethenoguanine (1,*N*2-eG) is a strong replication blocker (96%) in *E. coli* and causes G>A and G>T mutations at 6% each, plus a small amount of G>C (2%) mutation; it also causes −1 and −2 frameshift mutations (5%), and knocking out AlkB leads to higher replication block and almost doubles the mutagenicity [8]. 2′-Fluoro-*N*2,3-ε-2′-deoxyarabinoguanosine (2′-F-*N*2,3-eG), a stable analog of *N*2,3-ethenoguanine (*N*2,3-eG), blocks replication by 79%, and causes 30% G>A mutation in *E. coli*, with AlkB having no significant influence on its replication bypass and mutagenicity [8].

1-Methyldeoxyadenosine (m1A) strongly blocks replication in AlkB- *E. coli* (88%), but it is not very mutagenic, causing <1% A>T mutation; m1A does not block replication in AlkB+ *E. coli* cells [9]. 1,*N*6-ethenoadenine (eA) weakly blocks replication by 4% in WT *E. coli*, but significantly blocks replication (95%) when AlkB is knocked out; likewise, eA is not mutagenic in WT *E. coli*, but shows strong mutagenicity in AlkB- cells (25% A>T mutation) [33,49]. Bypass efficiency of eA in human cells is 17% [50]. 1,*N*6-ethanoadenine (EA) does not block replication in WT *E. coli*, but strongly blocks replication by 86% when AlkB is removed; it is not very mutagenic in either WT or AlkB- cells, causing only 2% A>C mutations [49]. *N*6-carboxymethyl-2′-deoxyadenosine (*N*6-CMdA) minimally blocks replication in *E. coli* and is not mutagenic [36]. *S*-*N*6-HB-dA (HB = 2-hydroxy-3-buten-1-yl) and *R*,*R*-*N*6,*N*6-DHB-dA (DHB = 2,3-dihydroxybutan-1,4-diyl) do not block DNA replication and are not mutagenic in *E. coli* [51]. *S*,*S*-*N*6,*N*6-DHB-dA moderately inhibits replication with a 60% bypass efficiency, and causes minimal 1% A>G mutation [51]. *R*,*S*-1,*N*6-γ-HMHP-dA (HMHP = 2-hydroxy-3-hydroxymethylpropan-1,3-diyl) strongly inhibits DNA replication but causes only 2% A>T mutation [51].

*O*2-Methylthymidine (*O*2-Me-dT) can be bypassed by 55% in human cells and mainly causes T>A mutation (56%) [53]. *O*2-[4-(3-pyridyl)-4-oxobut-1-yl]thymidine (*O*2-POB-dT) exhibits genotoxicity, showing 26% bypass efficiency, and is mutagenic with 47% T>A transversion [53]. Both *O*2-Me-dT and *O*2-POB-dT strongly block DNA replication in *E. coli* (95% and 97%) [54]. *O*2-Me-dT induces 10% T>A and 10% T>G mutations [54]. *O*2-POB-dT induces 38% T>G and 12% T>A mutations [54]. *O*2-Ethylthymidine (*O*2-EtdT) is a strong replication block (79%) in *E. coli*, and knocking out pol IV increases the blocking activity, while knocking out pol V increases the replication block even more [55]. It is very mutagenic and forms T>C (35%), T>A (15%), and T>G (5%) mutations, and mutation frequency drops when pol V is knocked out [55]. The bypass efficiency of *O*2-dT alkyl adducts in *E. coli* depends on the size of the alkyl lesion [82]. More than 20% of adducts can be bypassed during replication for ethyl and methyl substitutions, but less than 10% can be bypassed for propyl, and less than 5% for butyl adducts, with the major mutation type being T>C point mutation [82]. *O*2-alkyldT lesions strongly inhibit DNA replication (40–85%) in mammalian cells [52]. The blockage effect increases with the size and branching of the alkyl groups [52]. These lesions cause T>A and T>G mutations [52]. 3-Methyldeoxythymidine (m3T) strongly blocks replication in *E. coli* by 94% and is very mutagenic, generating mainly T>A (32%) transversion mutation; eliminating AlkB slightly increases its replication blocking power and mutagenicity [9]. *N*3-Ethylthymidine (*N*3-EtdT) strongly blocks replication by 83% in *E. coli*, and knocking out pol V or pol IV increases its blocking activity; it is very mutagenic, causing T>A (21%), T>C (15%) and T>G (3%) mutations, and removing pol V eliminates the mutagenicity of this adduct [55]. *N*3-carboxymethylthymidine (*N*3-CMdT) strongly blocks replication by 45% in *E. coli*, with the major mutation being T>A (66%); knocking out pol V slightly increases the mutation rate, whereas knocking out pol IV decreases it [36].
*O*4-carboxymethylthymidine (*O*4-CMdT) is a strong replication block (51%) and very mutagenic, causing 86% T>C mutation [36]. *N*3-CMdT, *O*4-CMdT and *O*6-carboxymethyl-dG (*O*6-CMdG) moderately block DNA replication in human cells [48]. *N*3-CMdT causes T>A (81%) mutation; *O*4-CMdT causes T>C (68%) mutation; *O*6-CMdG causes G>A (6.4%) mutation; neither *N*6-CMdA nor *N*4-CMdC blocks replication or induces mutation [48]. *O*4-Ethylthymidine (*O*4-EtdT) does not strongly block replication (24%) in WT *E. coli*, but it cannot be efficiently bypassed in pol II/IV/V triple-knockout cells [55]. The major mutation of *O*4-EtdT is T>C (84%) transition; however, it does not cause mutations in *E. coli* lacking pol V [55]. *O*4-Alkylthymidine (*O*4-alkyldT) lesions moderately block DNA replication in human cells; pol ι and pol ζ promote the bypass of all *O*4-alkyldT lesions except *O*4-MedT [56]. The *O*4-alkyldT lesions induce only T>C transition mutations in cells [56].

3-Methyldeoxycytidine (m3C) has been demonstrated to strongly block replication (>90%) and generate mainly C>T (50%) and C>A (30%) mutations in AlkB- *E. coli* cells [9]. However, the lesion is not mutagenic and does not block the replicative polymerases in WT (AlkB+) cells [9]. 3-Ethyldeoxycytidine (e3C) does not block replication in WT *E. coli*; however, it dramatically blocks replication when AlkB is knocked out (91%) [9]. e3C causes 17% C>T, 11% C>A, and 2% C>G mutations in AlkB- *E. coli*, but is not mutagenic in WT cells [9]. The m3C, e3C, and m1A lesions presumably have their methyl or ethyl groups removed by AlkB's direct reversal of DNA alkyl damage mechanism prior to encountering the DNA polymerase [9]. *N*4-carboxymethyl-2′-deoxycytidine (*N*4-CMdC) weakly blocks replication (17%) and is not mutagenic in *E. coli* [36]. 5-Methylcytosine (5mC) and its derivatives 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) neither block replication nor cause mutation in *E. coli* [39,59]. 5mC also does not block replication in human cells, whereas 5hmC (5%), 5fC (25%), and 5caC (28%) partially block DNA replication in human cells [58]. 3,*N*4-ethenocytosine (eC) is a toxic adduct, which strongly blocks replication (76%) and leads to mutation with a pattern of dominant C>A (24%) and fewer C>T (11%) mutations in WT *E. coli*; in AlkB- cells, the blockage of replication increases to 87% and mutagenicity rises to 49% C>A and 31% C>T mutations [33]. The lipid peroxidation-derived product 4-oxo-2(*E*)-nonenal reacts with dG, dA, and dC in DNA to form heptanone (H)-etheno (e) adducts [50]. H-edC shows strong DNA replication blocking in both *E. coli* (99%) and human cells (90%) [50]. It causes mainly C>G (40%) mutation in *E. coli*; however, mostly C>A (60%) and C>T (32%) mutations are seen in human cells [50].

5-Hydroxymethyluracil (5hmU) blocks replication by 20%, but it is not mutagenic in human cells [60]. The *S*p alkyl phosphotriester (*S*p-alkyl-PTE) lesions display comparable replication bypass efficiency to unmodified DNA in *E. coli*; *S*p-Me-PTE is mutagenic, causing TT>GT (50%) and TT>GC (15%) mutations [61]. In contrast, *R*p-alkyl-PTEs block DNA replication (30–70%) but are not mutagenic [61]. Interestingly, *n*Pr- and *n*Bu-PTEs exhibit higher bypass efficiencies than Me- and Et-PTEs [61].

#### *2.3. Bulky Lesions*

All the structures of modifications covered in this section are displayed in Figure 4. *N*-(deoxyguanosin-8-yl)-1-aminopyrene (C8-AP-dG) moderately blocks DNA replication in human cells [66]. *N*-acetyl-2-aminofluorene (C8-AAF-dG) strongly blocks replication [66]. 2-Aminofluorene (C8-AF-dG) slightly blocks replication [66]. All three adducts can be bypassed in a nearly error-free manner [66]. Aristolochic acids I and II (AA-I, AA-II) are found in all *Aristolochia* species and generate the aristolactam (AL) metabolite, which forms DNA adducts with dA and dG. Both AL-II-dA and AL-II-dG strongly block DNA replication in MEF cells [63]. AL-II-dA causes 22% A>T mutation and AL-II-dG causes 9% G>T transversion [63]. Knocking out the rev3L gene dramatically suppresses bypass of AL-I-dA in MEF cells and abolishes A>T transversion [67]. Benzo[*a*]pyrene (BP)-7,8-diol-9,10-epoxide-*N*2-deoxyguanosine (BPDE-dG) is an adduct formed by benzo[*a*]pyrene; it predominantly miscodes with G>T (73%) and G>A (12%) mutations in WT MEF cells [68]. Knocking out the rev1 gene decreases the bypass efficiency of BPDE-dG to 40% and changes the mutation frequency to 32% G>T and 18% G>A [68]. Knocking out the rev3L gene significantly decreases the bypass efficiency to 13% and decreases the mutation to 6% G>T [68]. Mitomycin C (MC) generates dG-N2-MC and dG-N2-2,7-diaminomitosene (DAM) adducts, which are bypassed with 38% and 27% efficiency in human cells, respectively [62]. The major type of mutation is G>T (18% for dG-N2-MC and 10% for dG-N2-2,7-DAM) [62]. The aflatoxin B1-N7-dG adduct (AFB1-N7-dG) is weakly mutagenic in *E. coli*, causing 1.5% G>T mutation [64], and its FAPY adduct causes 14% G>T mutation [65].

#### *2.4. Crosslinked Lesions*

All the structures of modifications covered in this section are displayed in Figure 4. The *N*2-guanine-*N*2-guanine interstrand crosslink (ICL) 3-(2-deoxyribos-1-yl)-5,6,7,8-(*N*2-deoxyguanosyl)-6(either R or S)-methylpyrimido[1,2-R]purine-10(3H)-one is a product induced by acetaldehyde/crotonaldehyde [69]. ICL-S and ICL-R moderately inhibit DNA replication in WT *E. coli*; however, their replication blocking effects increase in uvr- *E. coli* cells [69]. ICL-Rd is a moderate block in WT *E. coli*, but it almost completely blocks replication in uvr- cells [69]. All three lesions are weakly mutagenic in *E. coli*, causing exclusively 5′-G>T (3%) transversions; no mutation is observed at the 3′-G site [69]. Similar mutations generated by these lesions are seen in human cells, except that ICL-S has a slightly higher mutation frequency (6%) [69]. The crosslinks formed by *cis*-diamminedichloroplatinum(II) (*cis*-DDP, cisplatin) between two guanines or between adenine and guanine strongly block DNA replication in *E. coli*, but they are not very mutagenic [72]. The 5-formylcytosine-mediated peptide crosslink causes 7% C>T and 1% C>G mutation and 2% C deletion [73]. The γ-hydroxypropanodeoxyguanosine (γ-HOPdG)-mediated crosslink between peptide and guanine is mutagenic, causing 5% G>T and 3% G>C mutations; however, the crosslink between peptide and γ-hydroxypropanodeoxyadenine (γ-HOPdA) is not mutagenic [20].

#### *2.5. Other Nucleotide Analogs*

All the structures of modifications covered in this section are displayed in Figure 5. A series of unnatural analogs of thymine (T) was developed by the Kool group to probe the biological requirements for DNA polymerases [74]. 3-Toluene-1-β-D-deoxyriboside (H) strongly blocks replication (95%) and is very mutagenic, causing T>A (41%), T>C (5%), and T>G (4%) point mutations and −1 frameshift mutation (13%). 2,4-Difluoro-5-toluene-1-β-D-deoxyriboside (F) strongly blocks replication (87%) and is mutagenic, causing T>A (9%), T>C (1%), and T>G (1%) mutations. 2,4-Dichloro-5-toluene-1-β-D-deoxyriboside (L) strongly blocks replication (80%) and is slightly mutagenic, causing T>A (5%) mutation. 2,4-Dibromo-5-toluene-1-β-D-deoxyriboside (B) strongly blocks replication (88%) and is mutagenic, causing T>A (24%) mutation. 2,4-Diiodo-5-toluene-1-β-D-deoxyriboside (I) strongly blocks replication (90%) and is very mutagenic, causing T>A (46%), T>C (1%), and T>G (1%) point mutations and −1 frameshift mutation (6%) [74]. xG is an 'expanded base' of dG (retaining the hydrogen-bonding face), which strongly blocks replication (89%) and is very mutagenic, causing G>A (95%) mutation [75]. xA (expanded A) weakly blocks replication (20%) and is not mutagenic; xT (expanded T) weakly blocks replication (27%), but is very mutagenic, causing T>A (73%) mutation; xC (expanded dC) strongly blocks replication (71%) and is mutagenic, causing C>A (10%) mutation [75].

The α-anomer of deoxynucleosides (α-dN) can be generated as a result of hydroxyl radical attack on deoxyribose [76]. All α-dNs except α-dA strongly block replication in *E. coli* [76]. α-dC blocks almost 99% of replication and causes 72% C>A mutation [76]. α-dG also strongly blocks replication and causes 60% G>A mutation [76]. α-dT blocks almost 99% of replication but it is not mutagenic in WT *E. coli* [76]. α-dA is not mutagenic [76]. The anticancer agent 6-thioguanine (sG) and its derivative *S*6-methylthioguanine (*S*6mG) do not strongly block replication in either *E. coli* or human cells [78]. sG causes 11% G>A mutation and *S*6mG causes 94% G>A mutation in *E. coli* [78]. sG is less mutagenic (8%) than *S*6mG (40%) in human cells as well [78]. Guanine-*S*6-sulfonic acid (SO3HG) is another derivative of sG [78]. It is not a strong replication block in *E. coli*, but it is very mutagenic, causing 77% G>A mutation [78]. The anti-HIV drug KP1212 is an analog of deoxycytidine [57]. It does not block replication in *E. coli*, but is mutagenic, causing 10% C>T mutation [57]. Among the four 2′-deoxyxylonucleosides (xN), only xA and xG exhibit a replication block in *E. coli* [77]. xA is the only mutagenic lesion among the four and causes 10% A>G mutation [77]. Base J strongly blocks replication by 48%, but is not mutagenic in human cells [60].

#### **3. Perspectives**

In this review, we survey the biological effects of various DNA lesions or biomarkers studied by shuttle vector techniques, which allow one to gain insight into how DNA damage or other chemically defined nucleobases are processed by polymerases and repair machinery in a natural cellular environment under physiological conditions. Among the new methods that have been developed or applied in the last decade, MS-based strategies and NGS methods have been demonstrated to be efficient for analyzing the biological outcomes of lesions. LC-MS-based methods are sensitive and accurate for quantifying the degree of lesion bypass and point mutations [4,5]. NGS techniques allow for a large-scale population analysis of many samples at the same time and provide information from a genomic perspective [4,8]. Another possible direction for using vectors as probes to analyze biomarkers is to study the mutational spectrum or mutational signature of a certain chemical or damaging agent [83–86]. LC-MS- and NGS-based analyses not only consider the biological consequences at the lesion site, but also incorporate information from the neighboring bases, such as one or two nucleotides next to the lesion site in both the 5′ and 3′ directions. An oligonucleotide containing the modified base can be made surrounded by nearest (and next-to-nearest) randomized bases and ligated into a shuttle vector. While cellular analysis may pull out a hotspot consensus sequence for poor repair and/or mutagenic replication, this will not answer the primary question of contextual bias in adduct formation. Shuttle vector systems whereby the vector is treated with the chemical to be assessed, followed by quantification of adduct type and amount, and transfection into isogenic cells of varying repair and/or replication backgrounds may tease apart the contribution of local sequence environment to adduct formation, repair, and replication.
Such vectors were used over a decade ago [87], and coupled with NGS throughput and bioinformatics, may provide enough reads to make statistically significant claims. Shuttle vectors are currently, to our knowledge, mainly DNA-based; however, one can envision use of RNA-based vectors to study the effect of modified RNA bases on cellular processes such as viral replication, translation, reverse transcription, and possibly even repair. While the role of DNA damage in toxicology focuses mainly on the direct adduction of chemical damage to DNA, pool mutagenesis has often been overlooked, and it would be interesting to leverage shuttle vector techniques to study the incorporation of modified bases from the nucleotide pool in the form of damaged DNA or from DNA-based therapeutics.
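
To make the sequence-context idea above concrete, the following is a minimal Python sketch, not a published pipeline. It enumerates the contexts obtained when a lesion (written here as `X`) is flanked by randomized nearest or next-to-nearest bases, and tallies context-specific mutation frequencies from progeny reads. The function names, the `X` placeholder, and the toy reads are hypothetical. The sketch also assumes that the flanking bases are read back unchanged in the progeny, so that each read can be assigned to its original context.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "ACGT"


def randomized_contexts(flank: int = 1, lesion: str = "X") -> list[str]:
    """List every context with `flank` randomized bases on each side of the lesion."""
    contexts = []
    for left in product(BASES, repeat=flank):
        for right in product(BASES, repeat=flank):
            contexts.append("".join(left) + lesion + "".join(right))
    return contexts


def mutation_frequencies(reads, original_base: str, flank: int = 1):
    """Tally base calls at the lesion site, grouped by flanking context.

    `reads` are progeny sequences of length 2 * flank + 1 centered on the
    lesion site (hypothetical, pre-trimmed input). Returns a dict of the
    form {context: {"G>T": fraction, ...}} for the non-original base calls.
    """
    counts = defaultdict(Counter)
    for read in reads:
        left, call, right = read[:flank], read[flank], read[flank + 1:]
        counts[left + "X" + right][call] += 1

    frequencies = {}
    for context, calls in counts.items():
        total = sum(calls.values())
        frequencies[context] = {
            f"{original_base}>{base}": n / total
            for base, n in calls.items()
            if base != original_base
        }
    return frequencies


if __name__ == "__main__":
    print(len(randomized_contexts(1)))  # nearest-neighbor contexts
    print(len(randomized_contexts(2)))  # nearest plus next-to-nearest contexts
    toy_reads = ["AGC"] * 8 + ["ATC"] * 2  # made-up reads for a dG-derived lesion
    print(mutation_frequencies(toy_reads, original_base="G"))
```

One randomized base on each side gives 4 × 4 = 16 contexts, and two on each side give 4⁴ = 256, which indicates how many reads per context an NGS experiment would need before context-specific frequencies can be compared.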

**Author Contributions:** Conceptualization, K.B. and D.L.; writing—original draft preparation, K.B., J.C.D., X.Z. and D.L.; writing—review and editing, K.B., J.C.D., X.Z. and D.L.; supervision, D.L.; funding acquisition, D.L.

**Funding:** This work was supported by National Institutes of Health under grant numbers R15 CA213042 and R01 ES028865 (to D.L.).

**Acknowledgments:** The authors want to thank the RI-INBRE program and its director Bongsup Cho for their kind support.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
