1. Introduction
DNA adducts are modifications of the genome resulting from the covalent binding of structural moieties to the nucleosides, or from molecular rearrangement of the nucleosides, associated with exposure to xenobiotics and with endogenous processes. Liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS) has become a mainstay of DNA adduct analysis [1,2,3]. A key advantage of this instrumentation, in particular high-resolution mass spectrometry (HRMS), is its selectivity: it allows the identification of analytes and provides straightforward quantification, as opposed to indirect methods such as 32P-postlabeling. The comprehensive, global analysis of DNA adducts is termed adductomics, wherein various types of DNA modifications are identified in a sample. HRMS has contributed to advances in this approach by providing accurate masses of the adduct analytes, which facilitates identification, eventually aids in the elucidation of chemical structures, and can be employed to study trace exposure to selected chemicals [4,5,6].
Nontargeted HRMS acquisition methods applied in metabolomics and proteomics studies are being adapted for adductomics, e.g., data-independent acquisition (DIA) and data-dependent acquisition (DDA). DIA is a tandem MS acquisition method consisting of the sequential fragmentation of precursor ions, selected by specific m/z values or windows, spanning an m/z spectrum range [7,8]. In a DDA method, the most intense scanned precursor ions are sequentially selected and fragmented [9]. DIA has been considered advantageous due to its high accuracy and sensitivity for low-abundance analytes in metabolomics applications [7] and in detecting DNA adducts [10,11]. These LC-HRMS acquisition methods tend to produce large amounts of data, and analyzing such data manually, i.e., directly in the instrument software, is tedious and time-consuming. The lack of matching entries in mass spectral libraries and the unavailability of standards pose further difficulties for adduct identification. Several data processing methods have been developed for metabolomics and proteomics projects, e.g., Sequential Window Acquisition of all Theoretical Mass Spectra (SWATH) [12]. However, automated data analysis methods that can detect compounds through tandem MS transitions, such as by tracking a neutral loss under collision-induced dissociation (CID), are still lacking for untargeted approaches.
Our team is developing a DNA adductomics approach for ecotoxicological applications. Over the past decades, human activities have aggravated environmental pressure, including the pollution load, in aquatic ecosystems worldwide, including the Baltic Sea [13]. Various chemical pollutants accumulate in sediments [14], leading to chronic exposure of benthic fauna to sediment-bound contaminants. The Baltic amphipod Monoporeia affinis is a well-established sentinel species in various monitoring programs. In our previous study [6], an HRMS-based adductomics approach was employed using M. affinis sampled within the scope of the Swedish National Marine Monitoring Program (SNMMP), conducted by the Swedish Environmental Protection Agency. In that sample set, only 18 adducts were found by processing the DIA data manually.
In the present study, “nLossFinder,” a graphical user interface (GUI) program running under MATLAB (MathWorks Inc., Natick, MA), was developed for DNA adductomics. For the development and testing of this program, we used commercial calf thymus DNA (ctDNA) and DNA extracted from the field-collected M. affinis of our previous study [6]. The raw MS data were obtained by LC-HRMS/MS, employing a DIA method with sequential precursor windows. The developed program is based on finding the neutral loss difference between precursors (MS1) and adducted nucleobase fragments (MS2) with high mass accuracy. The workflow of this approach for DNA adductomics is described along with the existing challenges. Such a software-aided tool for nontargeted DNA adduct detection is significant for the high-throughput screening of adducts in biological specimens, such as those collected in laboratory experiments and environmental surveys, and is useful in further advancing the field of DNA adductomics.
3. Results and Discussion
The fragmentation of ionized adducts on the nucleosides (dA, dG, dC, and dT) by tandem mass spectrometry (MS/MS) can occur with the neutral loss of deoxyribose, resulting in the corresponding ionized bases adenine (A), guanine (G), cytosine (C), and thymine (T) with the specific adduct moiety (Figure 1A). Thus, a potential DNA adduct is considered detected when the difference between a precursor and a specific fragment corresponds to the deoxyribose neutral loss (116.0473 Da). Based on this fragmentation, the approach of employing a DIA method with sequential precursor dissociation spanning over m/z windows is illustrated in Figure 1B.
For each DIA window, a set of precursor ions within a determined m/z range is selected and fragmented. Together, the number and the width of the precursor windows span the m/z range of the precursor ions. Throughout a chromatographic run, m/z spectra are recorded in sequences of blocks. In each block, the first scan corresponds to a full-MS1 precursor ion spectrum, and the following scans are full-MS2 spectra of the fragmentation products of the precursors within each DIA m/z window, i.e., each MS1 scan is followed by n MS2 scans, where n corresponds to the number of precursor DIA windows (Figure 1). The data obtained by this method can be analyzed manually using the instrument software (e.g., Thermo Fisher Xcalibur Qual Browser). However, this process is more efficient and less time-consuming when performed computationally. Thus, a user-friendly GUI program with embedded algorithms, called nLossFinder, was developed for this purpose in this work.
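The block structure just described (one MS1 scan followed by n MS2 scans) can be sketched as follows. This is an illustrative Python snippet, not nLossFinder code, and the scan labels are hypothetical.

```python
# Reshape a flat DIA scan sequence into blocks of 1 MS1 + n MS2 scans.

def split_into_blocks(scans, n_windows):
    """Group a flat scan list into blocks of one MS1 scan followed by
    n_windows MS2 scans (one per DIA precursor window)."""
    block_size = 1 + n_windows
    return [scans[i:i + block_size] for i in range(0, len(scans), block_size)]

# A run with 3 DIA windows: blocks of 4 scans each.
scans = ["MS1", "MS2_w1", "MS2_w2", "MS2_w3",
         "MS1", "MS2_w1", "MS2_w2", "MS2_w3"]
blocks = split_into_blocks(scans, n_windows=3)
print(len(blocks))   # 2
print(blocks[0][0])  # MS1
```

Each trace (the MS1 chromatogram, or the MS2 chromatogram of a given window) then receives one data point per block, which is why the number of windows directly limits the number of spectral points per chromatographic peak.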
Peak detection in nLossFinder is performed in two steps, which depend on parameters defined by the user in the GUI. As mentioned in the Methods section (Section 2), first, PICs are extracted from the raw precursor (MS1) and tandem (MS2, for each DIA window) data. Then, a matched filter algorithm is employed to detect peaks in the PICs. A peak is detected when part of the signal in a PIC has a Gaussian-like shape, i.e., a segment with an increasing slope followed by a decreasing slope that can be distinguished from the noise. For optimal identification, a peak should comprise more than five data points, although less resolved peaks (three points) can also be detected, depending on the GUI settings. Other parameters, such as the signal-to-noise ratio, can be set to modulate the sensitivity of the matched filter and thereby the sensitivity of peak detection relative to chemical noise.
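A greatly simplified sketch of this kind of slope-based peak picking is given below, assuming a median-based noise estimate and illustrative parameter names; nLossFinder's actual matched-filter implementation (in MATLAB) is more elaborate.

```python
# Minimal slope-based peak detection in a PIC trace: a candidate peak
# is a run of points rising to an apex and then falling, with at least
# min_points points and an apex above snr * noise. Parameter names and
# the noise model are illustrative assumptions.
import statistics

def detect_peaks(intensities, min_points=5, snr=3.0):
    """Return apex indices of peak-like regions in a PIC trace."""
    noise = statistics.median(intensities)  # crude noise estimate
    peaks = []
    i = 1
    while i < len(intensities) - 1:
        # apex: strictly higher than the left neighbour, not lower than the right
        if intensities[i] > intensities[i - 1] and intensities[i] >= intensities[i + 1]:
            left = i
            while left > 0 and intensities[left - 1] < intensities[left]:
                left -= 1
            right = i
            while right < len(intensities) - 1 and intensities[right + 1] < intensities[right]:
                right += 1
            width = right - left + 1
            if width >= min_points and intensities[i] > snr * max(noise, 1e-12):
                peaks.append(i)
            i = right + 1
        else:
            i += 1
    return peaks

trace = [1, 1, 2, 5, 12, 30, 14, 6, 2, 1, 1]
print(detect_peaks(trace))  # [5]
```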
During the development of nLossFinder, ctDNA was initially used as a mechanistic model sample. The nucleoside mixture obtained by digestion of ctDNA was analyzed by LC-HRMS/MS using the DIA method, varying the number of precursor windows and the window width. These experiments, summarized in Table 1, were performed to assess the quality of the results while varying the number of DIA windows (n, cf. Figure 1) and the window widths. The results demonstrate that either too many DIA windows, with widths as narrow as m/z 5, or too few, as wide as m/z 100 or 350, yield fewer potential matches. The quality of the results was assessed based on the total number of putative adducts found and the peak shapes observed in nLossFinder. Good peaks (in Table 1) are those with a regular Gaussian shape and without noise (spikes). With wider DIA windows, the peaks become increasingly noisy.
In our previous study [6], the DIA settings corresponded to the CT10 experiment (Table 1), but the interpretation of the results was performed manually. An approach such as CT5 or CT10, i.e., many narrow DIA windows, is, in principle, helpful for identifying DNA adducts manually with the instrument software (Xcalibur Qual Browser). When using an algorithm such as nLossFinder, where visualization is not a concern for finding neutral losses, fewer and wider windows increase the number of spectral points per peak. The identification of peaks of abundant compounds may not be much affected by a low scanning frequency; for instance, with 30 DIA windows set in the analysis, spectral data are recorded once every 30 scans in the precursor (MS1) and in each fragment (MS2) chromatogram (cf. Figure 1). However, less abundant compounds may be represented by too few spectral signals, which may hinder the identification of features (peaks). Further, very wide windows (>m/z 50) resulted in noisy peaks, and the number of adducts detected with nLossFinder was low compared with narrower windows. These results suggest that there are limits at the instrumental level for this approach, where very wide DIA windows may compromise the quality of the results. Thus, according to these experiments, the most putative adducts with good-quality peak shapes were detected with nLossFinder when the precursor window width was set to m/z 10 or 20, with 16 or 9 DIA windows (n), respectively, and precursor centers from m/z 200 to 350.
The DNA adductomics approach was applied to M. affinis, using the conditions optimized with the ctDNA digest sample. The DNA from the sampled amphipods had been extracted and digested in our earlier study [6]. A pool of digested DNA from 12 individual amphipods was analyzed using the DIA settings that gave the best results according to Table 1, i.e., nine windows of width m/z 20, with precursor centers from m/z 200 to 350. This data set was analyzed in nLossFinder, and a list of 153 putative DNA adducts was generated; note that possible ESI adducts and isotopes were not removed from this list. A table with the putative adducts found in M. affinis is presented in the Supplementary Information (Table S1). These adducts are represented in the form of an adductome map in Figure 2. The map is created by plotting all detected adducts, i.e., the m/z values of the molecular ions against their corresponding elution (retention) times. This type of adductome mapping can serve as a fingerprint of all background DNA modifications in a specific species from a particular location and provide information for establishing background assessment criteria [27] based on the detected adducts.
The adducts found in the M. affinis samples included the epigenetic markers 5-me-dC and N6-me-dA and the oxidative stress marker 8-oxo-dG, which were confirmed by comparison with reference compounds, as shown in our previous study [6]. In addition, four other modifications (5-OH-dC, dU, dI, and Gh, cf. Table S1), tentatively identified earlier [6], were found in the present study. In the earlier study on the same type of sample, performed in 2018 [6] with manual data processing, a total of 18 putative adducts were detected, of which 6 (Table S2) were not found using nLossFinder in the sample analyzed in this work. Their absence was confirmed manually using the instrument software (Thermo Xcalibur Qual Browser), which indicates that these adducts may be unstable when stored in the resulting matrix at −20 °C for nearly two years.
To summarize the key features of nLossFinder: prior to data processing, i.e., finding adducts, the user must set up the peak detection parameters. New parameters should be set for different experimental conditions (e.g., a different number of DIA windows and widths, or a different chromatographic elution time window). The parameters for the extraction of PICs (Figure 3A, exemplified for 5-OH-dC) define the minimum number of points in a PIC, the tolerance for missing data points within a PIC, and the m/z error tolerance. The parameters for peak detection define how the ZAF detects peaks in the extracted PICs (Figure 3B). A detailed explanation of the steps and parameters used in the peak detection algorithms of nLossFinder is given in the Supplementary Information. After processing the experimental data, the user can visualize the detected adducts as a list, with options to sort the results by retention time, m/z, or precursor intensity (Figure 3C). It is possible to visualize and discard unwanted matches, such as potential ESI adducts, isotopes, or otherwise undesirable matches, e.g., those of very low intensity, with irregular peak shapes, or with noise. Furthermore, the analysis of the results can be continued elsewhere: a list containing the retention time, m/z, intensity, and peak area values of the precursors and specific fragments, as well as the number of data points per peak, is exported as a comma-separated values (CSV) file, which can be opened in another program (e.g., Microsoft Excel) for further analysis. nLossFinder processes the data rapidly and can thus provide high-throughput results, although each sample must currently be analyzed separately. The processing time for the M. affinis sample data was 36 s on a PC (Shark Gaming Systems, Glostrup, Denmark) equipped with an Intel(R) Core(TM) i9-9900K CPU @ 3.60 GHz, 16 GB RAM, and a 4 GB GPU.
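The CSV export step can be sketched as follows; the field names and the `matches` structure are illustrative assumptions, not nLossFinder's internal format.

```python
# Write one CSV row per accepted match, with retention time, m/z,
# intensity, peak area, and points per peak. The record layout here is
# a hypothetical simplification of the exported table.
import csv

matches = [
    {"rt_min": 4.82, "precursor_mz": 284.0989, "fragment_mz": 168.0516,
     "intensity": 1.2e6, "peak_area": 3.4e6, "points_per_peak": 8},
]

with open("putative_adducts.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=list(matches[0].keys()))
    writer.writeheader()
    writer.writerows(matches)
```

A file in this shape opens directly in spreadsheet software such as Microsoft Excel for further filtering and annotation.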
A common disadvantage of using computational programs for the screening of certain chemicals/adducts is that they may generate false positives. These may arise, for instance, from artifacts that mimic the deoxyribose neutral loss, e.g., neutral losses of isomers of deoxyribose, or neutral losses from multiply charged precursors. Because reference compounds are seldom available, the user must set criteria for the interpretation of the results. To minimize the number of false-positive matches in nLossFinder, the user can verify the peak shapes of the precursor and specific fragment in each match and discard matches with irregular shapes or improper overlap. An algorithm in nLossFinder automatically performs such tasks, using peak shape as a filtering criterion to reduce false positives. This algorithm separates the detected peak data from the rest of the PIC data (surrounding noise). This is accomplished by applying a smoothing filter around each detected peak apex and intersecting the local minima of the smoothed peak with the PIC data, which extracts the best-shaped portion of the peak and discards the rest of the PIC data (noise before and after the peak). The algorithm also checks for local maxima around the peak, which can reveal that the peak is surrounded by high PIC noise signals. Moreover, the overlap between the precursor and specific fragment peaks is verified. Matches that do not meet these criteria (shape and overlap) are discarded automatically. However, a final visual review of the matches in nLossFinder is recommended to confirm that all matches have the desired shapes, proper overlap, etc. (as in Figure 3C). Some false matches may eventually pass these filters, such as very-low-abundance precursors characterized by multiple small consecutive peaks. Of the total findings presented in Table S1 (about 150 putative adducts), only a few low-intensity matches (about 5) had to be removed manually. The visual confirmation also allows studying the matches that correspond to isotopes and ESI adducts, which are considered false positives; alternatively, this latter process can be performed from the output table.
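The smoothing/local-minima trimming described above can be sketched as follows, assuming a simple moving-average smoother; the window size and function names are illustrative choices, not nLossFinder's.

```python
# Trim a detected peak: smooth the PIC, walk outward from the apex to
# the nearest local minima of the smoothed trace, and keep only the
# data between them (discarding surrounding noise).

def moving_average(y, w=3):
    """Centered moving average with edge shrinking."""
    half = w // 2
    return [sum(y[max(0, i - half):i + half + 1]) / len(y[max(0, i - half):i + half + 1])
            for i in range(len(y))]

def trim_peak(pic, apex):
    """Return (start, end) indices bounding the apex between the
    nearest local minima of the smoothed trace."""
    s = moving_average(pic)
    start = apex
    while start > 0 and s[start - 1] < s[start]:
        start -= 1
    end = apex
    while end < len(s) - 1 and s[end + 1] < s[end]:
        end += 1
    return start, end

pic = [3, 2, 1, 4, 10, 25, 11, 5, 1, 2, 3]
print(trim_peak(pic, apex=5))  # (1, 9)
```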
Isotopes can be useful for determining the precursor ion charge but cannot be taken as positive matches and should therefore be excluded. Moreover, ESI adducts, such as sodium, ammonium, and potassium adducts, should not be counted as additional DNA adducts if the compound is already found in its protonated form. Isotopes and ESI adducts can be identified by sorting the precursors by retention time and comparing the m/z values of co-eluting matches. When all samples in a study are prepared and processed in the same way, such false positives may not have a significant impact, e.g., in identifying adducts linked to a disease state or to a particular exposure. Nevertheless, if required for selective purposes, a targeted acquisition method, e.g., parallel reaction monitoring, could be applied to the generated list of putative adducts to increase the confidence of identification and to help rule out false positives from the untargeted DIA approach. In future development of nLossFinder, it is planned that isotopes and ESI adducts will be removed or highlighted, and that batches of samples can be analyzed rather than one at a time, as at present. This will allow comparison across samples, including background samples when available, to identify and remove false positives.
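The manual isotope/ESI-adduct screening described above can be approximated computationally. The sketch below flags co-eluting matches separated by a 13C isotope spacing or by the Na+- or K+-for-H+ mass offsets (standard mass differences); the tolerances and data layout are assumptions.

```python
# Flag matches that co-elute with an earlier-listed match and differ
# in m/z by a 13C spacing or a common ESI-adduct offset.

C13_SPACING = 1.00336  # Da, 13C minus 12C
NA_FOR_H = 21.98194    # Da, [M+Na]+ minus [M+H]+
K_FOR_H = 37.95588     # Da, [M+K]+ minus [M+H]+

def flag_redundant(matches, rt_tol=0.1, mz_tol=0.005):
    """matches: list of (rt, mz) tuples. Returns indices flagged as
    isotopes or ESI adducts of an earlier, co-eluting match."""
    flagged = set()
    for i, (rt_i, mz_i) in enumerate(matches):
        for j, (rt_j, mz_j) in enumerate(matches):
            if j <= i or abs(rt_i - rt_j) > rt_tol:
                continue
            delta = mz_j - mz_i
            for offset in (C13_SPACING, NA_FOR_H, K_FOR_H):
                if abs(delta - offset) <= mz_tol:
                    flagged.add(j)
    return sorted(flagged)

matches = [(4.82, 284.0989),   # putative adduct, [M+H]+
           (4.83, 285.1023),   # its 13C isotope
           (4.82, 306.0808)]   # its [M+Na]+ adduct
print(flag_redundant(matches))  # [1, 2]
```

Such a post-filter could run on the exported CSV table; the planned batch mode would extend the same comparison across samples, including background samples.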