1. Introduction
Prenatal testing for genomic defects of a fetus before birth is an integral component of current obstetric practice [
1]. The primary aim of the testing is screening for the presence of abnormal number of chromosomes. Testing is focused mainly on identifying an additional copy of a chromosome, the condition called trisomy [
2], represented as an excessive proportion of genomic material from the aberrant chromosome. Discovery of fetal DNA fragments in plasma extracted from maternal blood, cell-free fetal DNA (cffDNA) [
3], opened up new options in the field of prenatal screening called non-invasive prenatal testing (NIPT) [
1,
4]. In contrast to established screening methods [
5], sampling of genetic material from the mother’s circulation does not pose any direct risk for the fetus [
6]. On the other hand, fetal DNA fragments constitute only a minor part of the sampled, mostly maternal, genomic material and so pose a challenge for reliable detection of present aberration.
The proportion of fetal fragments in analyzed DNA mixture is called fetal fraction (FF). A sample with a trisomic fetus typically has an aberrant proportion of genomic material from the trisomic chromosome, compared to healthy samples. A trisomic sample with a very low FF can, however, be incorrectly evaluated as healthy, since the aberrant chromosome may cause only a weak deviation from normal values that can be presumed to represent a normal measurement error [
7], which is particularly problematic for the detection of sub-chromosomal aberrations. Reliable estimation of the FF is therefore a crucial step of NIPT analysis to avoid false-negative results [
8,
9]. This can be based on relevant anthropometric and laboratory processing attributes with significant correlation with FF, namely the gestational age and the body mass index of the mother [
10]. They were, however, reported to be too weak for use as stand-alone information for reliable conclusions, and so more sophisticated technical parameters exploiting different characteristics between maternal and fetal DNA fragments are typically used [
11].
Count based methods of FF determination calculate disproportion of number of reads mapped to chromosomes that differ in mother and fetus genotypes. Although they are quite reliable, they can be used only on samples with male (XX vs. XY genotype) [
12] or trisomic (2 copies vs. 3 copies of an aberrant chromosome) fetuses [
13,
14]. In pregnancy with a healthy female fetus, the FF have to be estimated by alternative methods.
Methods based on sequence variation in DNA fragments are, on the other hand, highly reliable [
15,
16], although their application typically requires another laboratory assay to recover genetic map of parents, and so it is considered too expensive and time-consuming for routine diagnostics. Alternatively, FF may be estimated from the genomic location patterns that slightly differ between maternal and fetal fragments due to differences of their nucleosome positioning (SANEFALCON method) [
17] and euchromatic DNA structure (SeqFF method) [
18]. Due to high accuracy and no additional laboratory costs, the SeqFF method is a preferred method for samples with female fetus [
19].
The length of DNA fragment is another promising attribute for differentiation of DNA fragments, since fetal fragments tend to be shorter than maternal ones (
Figure 1). Although the fragment lengths improve prediction of NIPT tests by detection of false positive predictions [
20,
21,
22], their potential for prediction of FF has been understudied. Only a simple length-based metric has been proposed in [
23], where the ratio of the number of shorter (100–150 bases) and longer (163–168 bases) fragments has correlated with the FF.
We demonstrate in this study that fragment length profile may be utilized as a fairly accurate predictor of the FF. We investigate various methods and enhancements that lead to improvement of prediction accuracy and propose weighting training samples to improve accuracy in samples with low FF that are more prone to wrong testing conclusions. Although the proposed method is not as accurate as SeqFF, we show that their combination leads to more a reliable predictor than individual methods alone.
3. Discussion
Accurate estimation of fetal fraction in analyzed mixtures of fetal and maternal DNA fragments is a critical part of non-invasive prenatal testing. The main challenge is to identify samples with too low a number of fetal fragments for reliable prediction of aneuploidy, since over-estimation of prediction may lead to false-negative results. Conversely, accuracy is not so important in typical settings for samples with a sufficiently high fetal fraction. Prediction models should be therefore trained and evaluated with a special focus on the importance of individual samples.
Several methods have been proposed for FF prediction, each with its own benefits and limitations. Early established methods were based on disproportion of number of fragments from sex chromosomes. They are simple to understand and implement and require a relatively small number of samples for training. Their calculation is therefore typically the first step in estimation, whether as the gold standard for other trained methods or initial determination of fetus sex. Testing may also utilize two separated prediction methods for male and female fetuses, possibly with different thresholds based on their precision. With the rising number of tested samples, the more complex models can be trained without the risk of overfitting. Since the number of samples should be much higher than the number of trained model parameters, laboratories should reach at least hundreds of samples for fragment length based models and tens of thousands for the position based models. This is a great advantage for simpler models, since the acquisition of such a number of samples may be limiting for small laboratories. This represents a problem for already established tests as well, since beneficial changes in laboratory processing may be halted due to a high cost of re-training.
Available methods are also highly diverse in the attributes they use for prediction, from the simplest models that utilize only patient attributes, to highly complex attributes aggregated from sequenced fragments. Combination of several independent, even weak predictors, can perform better than any of them alone, as was shown in many other domains [
24]. Fetal fraction prediction benefited from such coupling as well, as presented in this study. Furthermore, the more accurate predictions including more methods may indicate or even correct the potential abnormal diversions of any single method. Present or future methods may be incrementally supplemented and so gradually improve the overall predictions. The effect of individual methods should be however weighted to prefer methods with higher accuracy. Assigned weights may also indicate that a gain from a less accurate predictor becomes negligible and so may be excluded without significant impact.
We propose several approaches that markedly improve accuracy of prediction of fetal fraction, making results of non-invasive prenatal testing more reliable. Firstly, the fragment length profiles may be utilized as a reliable predictor of FF and achieve similar precision to the favored methods based on positions of DNA fragments. Secondly, the combination of results from multiple predictors achieve far better predictions, at least when they are based on different attributes of input DNA fragments. The final combined method is free of systematic bias when compared to the Y-based method. We conclude that low quality predictors, like regressors based on relevant anthropometric and laboratory processing attributes, may contribute to the overall accuracy. Other methods utilizing their own set of distinctive attributes, both present and future, can be similarly included, and so improve prediction compared to their stand-alone usage. Based on these findings, we conclude that additional, possibly independent, information can significantly raise the prediction accuracy of FF prediction, and thus, should be used when possible.
Since the SeqFF publication does not provide a training option, we were forced to use the published parameters on our dataset. As a consequence, we were unable to reach correlation reported by the SeqFF study; however, our combined method reached similar scores. We presume that the combined method can significantly surpass the performance of the standalone SeqFF method if its parameters are properly trained on a similar dataset.
Finally, appropriate weighting of samples in the training process may achieve higher accuracy for samples with low FF, and so allow a more decision regarding which samples have enough fetal fragments for subsequent testing for genomic aberrations.
4. Materials and Methods
4.1. Sample Acquisition
We collected 2454 informative samples from women undergoing NIPT testing, with single euploid male fetus, for training and testing of FF models. These samples were concluded as informative based on their fetal fraction and manual examination. The hard coded FF threshold for an informative sample was 4% but in some cases a sample with lower FF was reclassified as informative by manual examination. Our work was part of two clinical studies approved by the Ethical Committee of the Bratislava Self-Governing Region (Sabinovska ul.16, 820 05 Bratislava): the first one called “NIPT study” (study ID 35900_2015 approved on 30 April of 2015 under the decision ID 03899_2015) and the second one called “SNiPT” (study ID 37136/2018 approved on 11 June of 2018 under the decision ID 07507/2018/HF). All patients that were included in the study signed written informed consents consistent with the Helsinki declaration which were approved by the above-mentioned ethics committee.
4.2. Sample Preparation and Sequencing
Peripheral blood from pregnant women was collected into EDTA tubes and kept at 4 °C until plasma separation. Blood plasma was separated within 36 h after collection and stored at −20 °C at a DNA isolation unit. DNA was isolated using a QIAgen DNA Blood Mini Kit (Hilden, Germany). Standard fragment libraries were prepared from isolated DNA using a modified protocol of the Illumina TruSeq Nano Kit (San Diego, CA, USA) as described previously [
20]. Briefly, to decrease the laboratory costs we used reduced volumes of reagents, which was compensated for by completing 9 cycles of PCR instead of 8, as per protocol. Physical size selection of cfDNA fragments was performed using specific volumes of magnetic beads in order to enrich FF. Illumina NextSeq 500/550 High Output Kit v2 (San Diego, CA, USA) (75 cycles) was used for massively parallel sequencing of prepared libraries using pair-end sequencing with read length of 35 bp.
4.3. Mapping and GC Correction
The first stage of data processing was carried out as previously described [
20]. NextSeq-produced fastq files (two per sample) were directly mapped using the Bowtie 2 [
25] algorithm with --very-sensitive option to the human reference genome hg19 (GRCh37). Reads with mapping quality of 40 or higher were retained for further data processing. Length of a DNA fragment was determined as the difference of the leftmost and the rightmost mapped base of the corresponding read pair.
Next, we weighted mapped reads to eliminate the GC bias according to [
26] (with the exclusion of intra-run normalization) to retrieve better estimates of underlying chromosomal distributions.
The corrected number of fragments per chromosome is determined by summing the corrected read counts over all bins of the specified chromosome. The exception is chromosome Y, which is presented only pregnancies with male fetuses. Even in such cases, low proportion of mappable regions does not contain enough reads for reliable correction. As a result, a vector of corrected number of fragments per autosomes, called autosomal counts, corrected number of fragments per chromosome X (chrX), and uncorrected number of fragments per chromosome Y (chrY) were passed to the calculation of the reference, Y-based FF (see below).
4.4. Data Preparation and Evaluation of Models
Fetal fractions were calculated using several proposed predictors and state-of-the-art method SeqFF. Predictions were compared with the reference Y-based method. The robustness of this comparison stems from using a multitude of training and testing datasets, thereby suppressing bias from a single random sampling. Each model presented in this paper was evaluated on 100 testing sets by Pearson correlation coefficient between Y-based values and predicted values of FF. For each such experiment, the whole dataset (2454 male samples) was divided into 80% for training and 20% for testing.
The proportion of fragments for each target length (50–220 bases) were calculated for each sample separately. Moreover, in the case of the support vector machine and neural network estimators, the resulting 171 feature vectors of training sets were normalized to have zero mean and unit variance.
Y-Based Estimator
The Y-based FF was calculated according to the equation
from [
27], where male %chr Y and female %chr Y is the mean fraction of chromosome Y sequence reads of plasma samples obtained from 14 adult male individuals and pregnant women bearing euploid female fetuses, respectively. %chr Y is the fraction of chromosome Y sequence reads of the sample whose FF we want to calculate.
4.5. FF Estimators Based on Fragment Length
4.5.1. Linear Regression Model (LRM)
Fetal fractions were estimated using a standard linear regression model provided by R-software given by the equation
where
F is vector of fetal fractions,
L is matrix of fragment lengths from 50 to 220 bp,
θ is vector of parameters to be estimated, and
ε is vector of errors. The training and testing sets were selected from the collected data. The parameters
θ resulting from training were used to estimate FF for testing data.
4.5.2. Non-Linear Regression Model (NLRM)
We introduce a new approach for FF prediction—nonlinear function used in NLRM. Motivation behind this method originated from a published method using the ratio of shorter (100–150 bases) and longer (163–168 bases) fragments [
23]. However, when we used a ratio of the same fragment lengths as reported in the publication, we obtained sub-par results for our data. Therefore, we think that the choice of fragment lengths is data specific and, in the first step, we were exhaustively searching for the best fragment lengths to use for our data.
Let I =
be set of proportions of fragments of length 50 bp to 220 bp. Let
be subsets of consecutive elements of I. Ratio of fragments is defined as
To effectively compute the best performing ratio, reduction of the number of parameters is needed when using optimization methods. To this end, we chose only consecutive intervals of fragment lengths for evaluation. The ratio with the best correlation with Y-based FF (FY) was then chosen for second step. We found out that was maximal for the subsets and .
In the second step, we applied a non-linear regression method for estimating model parameters (weights) corresponding to fragment lengths. In this case, NLRM is defined by the equation
where
is the vector of fetal fractions, η is a nonlinear function of selected fragment lengths,
is vector of parameters to be estimated, and
is vector of errors. Consider the non-linear regression model where for each sample (indexed by
), the FF estimation is given by
Note that in the above, the nonlinear function η is given by the ratio of fragment lengths weighted by parameters .
Similarly to LRM, the training and testing sets were selected from the collected data. The
nlm function from
R software (
https://www.r-project.org/) was used to estimate the parameters for NLRM. The parameters were then used to estimate FF for testing data. Other optimisation methods from
R software like
nls function and
optim function using Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm [
28] were also applied to estimate parameters for NLRM, but resulting variance of the FF was much bigger compared to using
nlm. 4.5.3. Neural Network (NN)
A multilayer perceptron (MLP) regression model was trained in several steps for which the datasets of normalized fragment length proportions were preprocessed. We selected
DNNRegressor estimator from open source machine learning framework
TensorFlow [
8] as implementation of MLP regression. We tried two- and three-layer network architectures and several activation functions in the
DNNRegressor method. As a result, we found the following configuration to have the best performance for the given input size: 55 units in input layer and 25 in two hidden layers each,
Adagrad optimizer, hyperbolic tangent as activation function, and mean reduction of loss.
4.5.4. Support Vector Machine (SVM)
We selected a SVM estimator from open source machine learning library
sklearn [
29] as the implementation of SVM regression. Again, datasets of normalized fragment length proportions were used to predict fetal fraction. Parameters of the SVM were optimized through grid search using the method of sklearn (
GridSearchCV) in conjunction with rigorous testing and empirical evidence. The following parameters were found to give the best results: linear kernel, epsilon 0.01, tolerance for stopping criterion (tol) 0.001, and penalty parameter of the error term (C) 1.0. Non-linear kernels were also tested but all had poorer performance than linear kernel.
4.6. FF Estimators Based on Read Counts
SeqFF
The number of fetal DNA fragments varies in different regions of the genome and significantly correlates with GC content and presence of coding regions. The position-based prediction method SeqFF [
11] takes a vector of Loess-corrected fragment counts as input, partitioned into bins 50,000 bases long. The fetal fraction is then determined using standard multivariate regression models trained on a large number of samples (25,312 in original study) which exceeds this study close to tenfold. In addition, published software for SeqFF calculation does not provide a training option. For these reasons we used a pre-trained model and testing script from the original study to evaluate prediction accuracy on our dataset.
4.7. Correlation with Sample Attributes
Additional sample attributes from requisition were compared with the reference FF. Slight significant correlation was observed in three attributes, namely the gestational age, the body mass index of the mother, and the DNA library concentration, with Pearson correlation at the level 0.1, −0.33, and −0.22, respectively.
4.8. Combinaton of FF Estimators
A combined estimator for FF was built from SVM estimator and SeqFF estimator. Furthermore, we examined impact of additional predictive attributes from requisitions and laboratory processing of samples, namely the gestational age, the body mass index of the mother, and the DNA library concentration. Then, 5 variables, specifically FF estimates resulting from the SeqFF and SVM methods, the gestational age, the body mass index of the mother, and the DNA library concentration were inserted into linear regression with Y-based FF as a response variable. Finally, the parameters obtained from this linear regression were used to estimate FF for testing data.
4.9. Weighted Samples
The crucial step in the standard NIPT analysis is to identify samples with low fetal fractions that may not have enough fetal fragments for reliable identification of aneuploidy. Such samples are concluded as uninformative and may be subject to repeated sampling, typically few weeks later. Overestimation of FF may spuriously inflate confidence in a ploidy call for a sample where the assay is not expected to be sensitive. On the other hand, for high-FF samples the accuracy of FF is less important because the aneuploidy-calling algorithm is sufficiently sensitive anyway.
With this fact in mind, we tried to improve predictions in the lower range of Y-based FF targets. All samples with fetal fraction below 10% (according to the Y-based estimator) within the dataset were duplicated, triplicated, and quadruplicated, giving three distinct datasets. To evaluate this approach, we trained prediction models on these datasets and compared them with models trained on the original dataset. Prediction ability of FF in lower and higher range was evaluated separately.