1. Introduction
International law enforcement and justice entities have reached a consensus that DNA analysis is the “gold standard” in forensic investigations. Short tandem repeats (STRs), characterized by their widespread distribution and high genetic polymorphism within populations, have emerged as pivotal genetic markers for investigative procedures. Common crime scene traces such as cigarette butts, hair, and fingerprints often present challenges because of their low-template DNA, which is susceptible to random effects during polymerase chain reaction (PCR) amplification, including allelic imbalance and allele dropout [1]. These factors add complexity to subsequent STR profile analyses.
To address allele dropout (Type II error), various sensitivity-enhancing methods have been explored, such as increasing PCR cycles, reducing PCR volume, using nested PCR, enhancing fluorescent dye signals, extending injection times, and employing higher-purity formamide during sample preparation for capillary electrophoresis [2,3,4]. However, the heightened sensitivity increases the risk of mislabeling non-allelic signals (Type I error), which can arise from PCR products (e.g., stutter, non-template-dependent nucleotide addition, and non-specific amplification products) or instrumental artifacts (e.g., spikes, raised baselines, and incomplete spectral separation resulting in pull-up or bleed-through) [5]. Many laboratories currently adhere to the recommended analytical threshold (AT) provided by amplification kit manufacturers when analyzing forensic samples. This conservative approach aims to minimize the impact of background noise and PCR artifacts. However, for low-template samples, conservative ATs cannot reliably differentiate the target DNA signal from noise [6]. Additionally, in cases of limited sample quantity, it is impractical to further increase the detection sensitivity and conduct retests. The SWGDAM Interpretation Guidelines emphasize that “an AT defines the minimum height requirement at and above which detected peaks can be reliably distinguished from background noise. Peaks above AT are generally not considered noise, and are either artifacts or true alleles” [5]. Therefore, to ensure optimal signal processing parameters during DNA analysis, it may be advisable to select an AT that minimizes both Type I and Type II errors.
In practice, forensic DNA analysis laboratories employ different methods to determine the threshold for analysis. Gilder et al. [7] proposed a method endorsed by the IUPAC, based on Kaiser’s suggestion [8,9]. In this approach, a threshold is established by analyzing the baseline noise to ensure that signals arising from random fluctuations are not erroneously labeled as true alleles. Marciano et al. [10] described a dynamic, locus- and sample-specific AT based on the mean and standard deviation of noise in regions flanking a locus within an individual sample. This system achieved 97.2% accuracy in allele detection, representing an 11.4% increase over the lowest static threshold (50 RFU). Additionally, some methods rely on the relationship between the RFU signal and the DNA input into PCR; these originated in the field of chemical analysis and were later applied to DNA analysis [11,12,13,14]. Several previous studies have compared methods for determining the optimal AT. Rakay et al. [14] separately tested ATs derived from negatives, ATs derived from the relationship between RFU signals and DNA input, and commonly employed ATs to compare their impact on both Type I and Type II errors. They suggested that, for samples amplified with less than 0.5 ng of DNA, applying ATs derived from baseline analysis of negatives can reduce the probability of allele dropout by a factor of 100 without significantly increasing the probability of erroneous noise detection. Bregu et al. [15] likewise outlined and compared four methods that rely on the analysis of baseline noise from a number of negatives to calculate ATs. They found that variations in procedural conditions could affect the baseline noise associated with genetic analysis, ultimately influencing the determination of the analysis threshold in laboratories, and they recommended the use of ATs derived from negative samples with lower DNA levels. However, despite the abundance of methods for calculating ATs based on negative signals, there is still no clear consensus on the preferred method for practical casework. Law enforcement agencies therefore face a challenge in the absence of a scientific guide and framework for adjusting ATs, particularly for low-template samples.
This study analyzed the status and distribution of baseline noise across multiple laboratories over three years, considering reagent kits, testing quarters, laboratory conditions, and amplification cycle numbers, using a large number of negative control profiles. The aim was to explore whether each laboratory needs to establish its own optimal AT. We then applied established methods, relying on amplification negatives, to determine ATs for analyzing low-template DNA profiles and compared the resulting error rates across laboratories. The overall objective was to establish a universally applicable AT calculation practice model for scientific and efficient genetic analysis, providing guidance and references for laboratory personnel in diverse settings.
2. Materials and Methods
2.1. Collection of Historical and Experimental Data
In total, 929 negative control samples were collected from six laboratories (LAB_a–f) between 2019 and 2022. The amplification kits used included the AGCU EX22 kit (Applied ScienTech, Wuxi, China), PowerPlex 21 kit (Promega, Madison, WI, USA), and VeriFiler™ Plus kit (Thermo Fisher Scientific, Waltham, MA, USA). Amplified products were analyzed using an ABI 3500 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA).
Additionally, experiments on low-template DNA samples were conducted by seven laboratories (LAB_a–e, LAB_g, and LAB_h) using the VeriFiler™ Plus kit. All experiments were performed by the same experimenter. Female control DNA 9947A (OriGene, Rockville, MD, USA) was diluted to three concentrations: 31.25 pg/µL, 15.625 pg/µL, and 7.8125 pg/µL. Each PCR used 1 µL of DNA in a total volume of 10 µL, following the routine casework protocol. Three PCR replicates were conducted for each 9947A concentration and for the negative control at 27, 29, and 31 cycles. The amplified products were separated via capillary electrophoresis on an ABI 3500 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA), with three replicates for each amplified product. The research protocol was reviewed and approved by the Ethics Committee at the Institute of Forensic Medicine, Sichuan University (No. KS2022770, 4 March 2022).
2.2. Analysis of Large-Scale Negative Control Samples
The negative control results from the historical records and the low-template DNA experiments were analyzed using GeneMapper ID-X. First, the empirical AT commonly used in each laboratory was applied to all negative samples, and negative samples with peak heights above this threshold were excluded. Subsequently, an AT of 1 RFU (with the threshold for the internal lane standard set at 175 RFU) was used to analyze the remaining negative samples. The data from the GeneMapper ID-X “Sizing Table” for each dye were exported, containing the details of each signal above the AT, including marker, allele, size, height, area, and data point. An in-house Python script was used to filter out signals outside the read region recommended by the manufacturer, and all signals within 2 bases of an internal lane standard fragment were removed to avoid the influence of pull-up [15]. Negative samples from each laboratory were then grouped into quarters at three-month intervals, and the signal number and height distribution for each dye across quarters and laboratories were analyzed.
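As an illustration, this filtering step can be sketched in a few lines of Python. This is a minimal sketch, not the in-house script itself; the column names follow the GeneMapper “Sizing Table” export, while the dye labels and read-region bounds shown here are hypothetical placeholders to be replaced with the kit manufacturer’s values:

```python
import pandas as pd

# Hypothetical read-region bounds (bp) per dye; actual bounds follow the kit manual.
READ_REGION = {"6-FAM": (75.0, 480.0), "VIC": (75.0, 480.0)}  # one entry per dye
ILS_TOLERANCE = 2.0  # bases: signals this close to an ILS fragment are removed

def filter_negative_signals(sizing_table: pd.DataFrame, ils_sizes: list) -> pd.DataFrame:
    """Keep peaks inside the kit read region and away from internal-lane-standard fragments."""
    df = sizing_table.copy()
    in_region = df.apply(
        lambda r: READ_REGION[r["Dye"]][0] <= r["Size"] <= READ_REGION[r["Dye"]][1],
        axis=1,
    )
    near_ils = df["Size"].apply(
        lambda s: any(abs(s - ils) <= ILS_TOLERANCE for ils in ils_sizes)
    )
    return df[in_region & ~near_ils]
```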
2.3. Study on ATs
The low-template DNA results were analyzed using GeneMapper ID-X, with a minimum AT of 1 RFU for each non-internal-standard dye. Signal data from the “Sizing Table” for each dye were exported and processed using a Python script. This script first screened the peaks occurring at locus positions. Subsequently, to eliminate pull-up peaks, it excluded signals meeting two criteria: sharing the same position (±0.3 bases) as an allelic peak in another dye, and having a peak height of 5% or less of that allelic peak [16]. The AT for each dye was then varied from 1 to 200 RFU, and the numbers of allelic dropouts and non-allelic peaks at each AT were counted against the reference genotype of control DNA 9947A at 1 ng input.
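A minimal sketch of this pull-up exclusion rule, assuming the same “Sizing Table” layout as above (using a non-empty “Allele” field to mark labeled allelic peaks is our assumption):

```python
import pandas as pd

SIZE_TOLERANCE = 0.3  # bases: same position as a peak in another dye
PULLUP_RATIO = 0.05   # pull-up height is at most 5% of the source allelic peak

def remove_pullup(peaks: pd.DataFrame) -> pd.DataFrame:
    """Drop peaks that co-locate (±0.3 b) with an allelic peak in another dye
    and measure at most 5% of that peak's height."""
    keep = []
    for _, p in peaks.iterrows():
        sources = peaks[
            (peaks["Dye"] != p["Dye"])
            & ((peaks["Size"] - p["Size"]).abs() <= SIZE_TOLERANCE)
            & peaks["Allele"].notna()  # candidate source must be an allelic peak
        ]
        is_pullup = (p["Height"] <= PULLUP_RATIO * sources["Height"]).any()
        keep.append(not is_pullup)
    return peaks[pd.Series(keep, index=peaks.index)]
```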
In addition, the low-template control DNA samples were analyzed using GeneMapper ID-X with six distinct groups of ATs. These comprised one conventional threshold (denoted ATori) and five thresholds calculated using previously published methods [7,8,9,12,15,17,18]. The ATori group used a threshold of 175 RFU for each dye. The other five thresholds were derived from the analysis of signals from the negative samples in the low-template DNA experiments, as follows.
AT1: AT1 was calculated using the following equation:

$$AT_1 = \bar{Y}_n + k \, s_{Y,n}$$

where $\bar{Y}_n$ is the mean of the negative signals, $s_{Y,n}$ is the standard deviation of the negative signals, and $k$ is a constant that depends on the desired confidence level. In accordance with the preceding literature on the choice of $k$ [7,8,12,13,14], this study set $k$ equal to three.
AT2: The following equation was used to determine AT2:

$$AT_2 = \bar{Y}_n + t_{\alpha,\nu} \, s_{Y,n}$$

where $\bar{Y}_n$ and $s_{Y,n}$ are the mean and standard deviation of the negative signals, respectively, $t_{\alpha,\nu}$ is the one-sided critical value from the t-distribution for a given confidence level with $\nu = n_n - 1$ degrees of freedom, and $n_n$ is the number of negative samples.
AT3: AT3 was computed using the following equation:

$$AT_3 = \bar{Y}_n + t_{\alpha,\nu} \, s_{Y,n} \sqrt{1 + \frac{1}{n_n}}$$

where the parameters are defined as in the equation for AT2, and the factor $\sqrt{1 + 1/n_n}$ expresses the correction for the uncertainty of the true and calculated mean negative signal.
AT4: AT4 reflects the level of background noise directly. It was set at the 99th percentile of the negative signal heights, i.e., the value separating the lowest 99% of the negative signals from the rest.
AT5: The following equation was used to determine AT5:

$$AT_5 = e^{\upsilon + k\sqrt{\tau}}$$

The calculation of AT5 was based on the assumption that the negative signal follows a lognormal distribution. Consequently, the natural logarithm of the negative signal is assumed to follow a normal distribution, with mean $\upsilon$ and variance $\tau$. The value of the factor $k$ depends on the confidence level chosen to estimate the noise; as for AT1, $k$ was set to three.
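For concreteness, the five calculations, as reconstructed above, can be expressed in a few lines of Python. This is a minimal sketch rather than the NegaProcess implementation; `scipy` is assumed for the one-sided t critical value:

```python
import numpy as np
from scipy import stats

def calc_ats(heights, k=3.0, alpha=0.01):
    """Compute AT1-AT5 (RFU) from an array of negative-control peak heights."""
    y = np.asarray(heights, dtype=float)
    n = y.size
    mean, sd = y.mean(), y.std(ddof=1)
    t = stats.t.ppf(1 - alpha, df=n - 1)  # one-sided critical value, nu = n - 1
    log_y = np.log(y)                     # lognormal assumption behind AT5
    return {
        "AT1": mean + k * sd,
        "AT2": mean + t * sd,
        "AT3": mean + t * sd * np.sqrt(1 + 1 / n),
        "AT4": np.percentile(y, 99),      # 99th percentile of the noise
        "AT5": np.exp(log_y.mean() + k * log_y.std(ddof=1)),
    }
```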
The exported “Sizing Table” for each AT group underwent further analysis. The numbers of allelic dropouts and non-allelic peaks in the low-template DNA samples under different DNA inputs and amplification cycle numbers were counted and analyzed. Receiver operating characteristic (ROC) curves were plotted for the six AT groups based on the true-positive and false-positive results, and ROC analysis was used to identify the optimal AT calculation method for different laboratories under different conditions.
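A simplified sketch of how such a curve can be traced by sweeping the AT is given below; here the true-positive rate is taken as the fraction of known allele peaks retained at each threshold, and the false-positive rate as the fraction of known non-allele peaks retained, which is one of several possible conventions:

```python
import numpy as np
import matplotlib.pyplot as plt

def roc_points(allele_heights, artifact_heights, thresholds=range(1, 201)):
    """True- and false-positive rates when peaks at or above each AT are labeled alleles."""
    alleles = np.asarray(allele_heights, dtype=float)
    artifacts = np.asarray(artifact_heights, dtype=float)
    tpr = np.array([(alleles >= at).mean() for at in thresholds])    # alleles retained
    fpr = np.array([(artifacts >= at).mean() for at in thresholds])  # artifacts retained
    return fpr, tpr

# Usage: compare AT choices for one condition
# fpr, tpr = roc_points(allele_h, artifact_h)
# plt.plot(fpr, tpr)
# plt.xlabel("False-positive rate"); plt.ylabel("True-positive rate"); plt.show()
```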
2.4. Building of the Executable Program NegaProcess
Utilizing PyInstaller 6.1.0 (http://www.pyinstaller.org/, accessed on 20 October 2023), the signal analysis script for negative samples and the scripts for the five negative-signal-based AT calculation methods were compiled into an executable program named NegaProcess. This program enables researchers to load any number of negative control samples, assess the baseline conditions of their laboratory, and scientifically adjust the AT to enhance sample analysis. Importantly, it requires no programming or statistical expertise, saving users the time otherwise spent on learning analysis procedures.
4. Discussion
Analysts generally pursue two objectives when detecting and analyzing samples: maximizing the obtainable information and minimizing noise. There is, however, a tradeoff between these objectives. The conservative ATs commonly recommended by manufacturers aim to accomplish the latter, yet valuable information may exist below the AT, particularly for low-template samples. Indiscriminate reduction of the AT, on the other hand, may yield information that lacks interpretability and fails to provide compelling conclusions in legal contexts.
This study employed two types of detection data: historical data collected from various laboratories, and data obtained from low-template standard samples. The analysis explored the baseline states of negative control samples and assessed the impact of different ATs on both allele and non-allele detection results. The historical data encompassed outcomes from three distinct STR detection kits, with the choice of kit constituting a factor contributing to baseline disparities. The signal height distribution of negative samples from diverse kits exhibited distinctive characteristics, primarily manifested in varying levels of average and maximum peak heights for each dye, owing to different fluorescent labels. Consequently, each kit manufacturer recommends a specific AT.
To examine whether the signal distribution of negative samples changed over time, the historical negative samples were categorized into quarters (four per year). The results indicated that the observed differences appeared to occur randomly. Further analysis was conducted to determine whether these differences exhibited any seasonal pattern; the findings revealed no correlation between the quantity or distribution of negative signals across quarters and seasonal temperature fluctuations. Most laboratories operate in controlled environments with constant temperature and humidity, minimizing the impact of outdoor temperature changes on the experimental process. Random variations may instead arise from the irregular maintenance of the electrophoresis instruments. Investigating the influence of instrument maintenance on experimental results requires dedicated monitoring of instrument status and thorough record keeping, and we advocate that every laboratory adopt these practices to standardize experimental conditions. For a more comprehensive interpretation of analysis results, repeated tests of negative control samples can be performed after each instrument maintenance session to observe and document the baseline status.
Variations between laboratories are expected, owing to differences in experimental instruments and reagent configurations. In laboratories with stringent control over experimental conditions, maintaining a lower baseline while adhering to widely accepted ATs may result in the oversight of potential information. This becomes a critical issue, especially in the analysis of challenging materials. In addition, when faced with low-template samples, laboratories typically opt to increase the number of PCR cycles during amplification. Ideally, increasing the number of PCR cycles should exponentially increase the target DNA template quantity and raise the probability of detecting alleles without affecting the negative signals. However, the results of this study indicated that with an increase in PCR cycle numbers, non-specific amplification products or exogenous DNA in the negatives also increased, leading to differences in the distribution of negative signals at different PCR cycle numbers. Specifically, the average and maximum signal heights of the same dye at 31 cycles were significantly higher than those at other cycle numbers. Importantly, these heightened signals did not exceed commonly used thresholds in this study. Designating a profile as contaminated only occurs when these signals exceed the AT and their quantity surpasses the maximum allowable number of drop-in events (sporadic contamination) [18,19]. Contamination, if present, may be reproducible and can be deduced through different types of negative controls. According to the findings of this study, when opting for an increase in PCR cycle numbers, analysis under conservative thresholds ensures the stability of allele dropout events at a specific level while concealing non-allele peaks below the AT (Figure 5). Meanwhile, for the analysis of samples with extremely low templates, defined in this study as quantities below 7.8125 pg, it is imperative to employ conservative ATs to minimize interference from non-allele peaks.
Currently, various laboratories face a substantial backlog, with certain cases retaining only detection results and no remaining samples for retesting. To reanalyze such detection data, there is a pressing need to establish scientific ATs that accurately distinguish allele peaks from noise. This study employed five previously reported methods based on negative signals to calculate appropriate ATs and analyzed the results of low-template standard samples from seven laboratories. Based on the ROC curves, AT5 exhibited the lowest overall error rate in most cases. This favorable performance can be attributed to AT5’s grounding in the assumption that negative signals conform to a lognormal distribution, which aligns closely with the observed right-skewed distribution of signal heights for each dye across the various kits. Furthermore, reanalysis of cases showed the significant potential of adjusting the AT based on the AT5 calculation method to unearth genuine allele information. Notably, in specific cases involving extremely low template quantities and increased PCR cycle numbers, the original conservative AT exhibited even lower overall error rates, consistent with the conclusions mentioned above.
Moreover, we examined the size and height of non-allele peaks at each locus under each calculated AT. Using the results obtained under AT5 for LAB_a as an illustration, the occurrence of numerous non-allele peaks was not random; they appeared consistently across samples with different template quantities and PCR cycle numbers (Figure S11). Upon cross-referencing with standard genotyping, regularly occurring non-allele peaks were identified as stutters. Although the analysis process in GeneMapper ID-X initially filters stutter based on the default stutter ratio of the VeriFiler™ Plus kit, its presence in the current results suggests that the stutter ratio may differ under low-template conditions, leading to an increased number of non-allele peaks. To address this issue, specialized verification of low-template samples is recommended to eliminate interference from these non-allele peaks during interpretation. Additionally, certain laboratories (such as LAB_c) exhibited a substantial number of spikes and off-ladder (OL) peaks in their analysis results, which contributed to a higher number of outliers in the statistical analysis of non-allele peaks after the AT was lowered. Spike peaks may be caused by external particles, such as dust or dried small aggregates, entering the capillary or gel, or by fluctuations in the current [18]. To maintain consistency in sample quantity across laboratories, we chose not to exclude anomalous files; such outliers can be mitigated by increasing the sample size. Crucially, when encountering such issues, analysts should promptly review experimental procedures and ensure the proper maintenance of electrophoresis instruments.
This study centers on the analysis of low-template samples, significantly enhancing information availability through the scientific adjustment of the AT. However, certain limitations of this research must be acknowledged. Without reference genotypes, distinguishing whether peaks above the adjusted AT represent alleles or non-allelic artifacts remains challenging. Forensic researchers have recognized this problem and actively worked toward discerning non-allelic peaks or directly mitigating their impact, particularly through the incorporation of artificial intelligence [10,20,21,22,23,24,25]. Nevertheless, the “black box” issue introduced by sophisticated artificial intelligence algorithms poses a challenge in legal contexts [26]. To meet the criteria of intelligibility and acceptability in court, algorithms addressing such issues must prioritize transparency and readability. Developing a straightforward machine learning algorithm that effectively categorizes peaks above the AT is the focus of our ongoing research on this persistent challenge.
5. Conclusions
In conclusion, this study systematically investigated the baseline signals in electrophoresis results by leveraging data from multiple laboratories. Variability in negative signal distribution was observed across different reagent kits, laboratory conditions, and amplification cycle numbers. Our findings underscore the impact of routine instrument maintenance and reagent changes on baseline levels, providing valuable insights for laboratories conducting forensic DNA analyses. Adjusting the AT according to specific laboratory conditions is crucial for minimizing allele dropout and non-allelic peak detection, ensuring accurate and reliable results. Moreover, a comparative analysis of the five AT calculation methods revealed that, barring extreme scenarios of low template amounts and high PCR cycle numbers, the AT5 method consistently demonstrated the lowest overall error rate. This suggests that AT5 is a promising method for enhancing allele detection, particularly in the analysis of challenging historical data. As a practical outcome, we developed a user-friendly program for real-time statistical analysis that facilitates prompt adjustments to the AT based on laboratory-specific conditions. This tool empowers laboratory personnel to conduct efficient and scientifically guided analyses, thereby maximizing information retrieval and ensuring robust forensic DNA analysis.
In summary, our comprehensive investigation of baseline signals and AT optimization provides valuable insights for forensic DNA analysis. The tailored adjustments recommended in this study, supported by the empirical evidence, offer a practical framework for laboratories to enhance the accuracy and reliability of genetic analysis procedures.