## **5. Target Enrichment**

Several target enrichment protocols, including the xGen Lockdown probe protocol (IDT, Coralville, IA, USA), the NimbleGen SeqCap EZ system (Roche, Indianapolis, IN, USA), and the SureSelect Target Enrichment System (Agilent Technologies, Santa Clara, CA, USA), operate using the same general procedure (Figure 3). Of these, the IDT xGen Lockdown probe protocol appears to be the most commonly used [39]. This protocol recommends using 500 ng of each prepared library as the input. Enrichment begins by combining the library DNA with blocking oligos, after which the mixture is dried using a SpeedVac system. Blocking oligos are short oligonucleotide sequences that are added to decrease the possibility of hybridization between library adapters and capture probes during the target enrichment process [39]. The hybridization reaction is then performed by combining the biotinylated probes with the dried DNA. Following hybridization, streptavidin-coated beads are added to pull down the probe-target complexes. Non-target fragments, which are not bound by any probe, are then washed away, and post-capture PCR is performed to further amplify the target fragments. The final step involves purification of the post-capture PCR amplicons, after which the enriched library may be quantified and validated for sequencing on an NGS instrument. Some of the commercially available probe capture enrichment kits are listed in Table 3.

**Figure 3.** Overview of the target enrichment process.


**Table 3.** Commercially available probe capture enrichment kits.

The probes used in target enrichment are either DNA or RNA oligonucleotides designed specifically for the genomic region of interest. Probes are designed to the desired tiling density across the target region. The tiling density refers to the extent to which the probes cover the target region. For example, 1× tiling density means that the probes cover the region of interest once, whereas 2× tiling density means that the region of interest is covered twice by a series of overlapping probes. Figure 4 depicts the differences between 1× and 2× tiling densities. The probes are often approximately 120 nt in length, although this can vary, and are labeled by 5′-terminal biotinylation. Once the desired probes have been designed, they can be synthesized by a biotechnology company for use in enrichment studies. Table 4 summarizes various probe design methods that have been used in reported target enrichment studies.

**Figure 4.** Comparison of 1× and 2× Tiling Density (Adapted from IDT).
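The relationship between tiling density and probe placement can be illustrated with a minimal Python sketch. It assumes that probes are placed at a fixed step of probe length divided by tiling density and that a final probe is anchored to the end of the target; the function names `tile_probes` and `depth_at` are hypothetical, and commercial design tools may place probes differently.

```python
def tile_probes(target_start, target_end, probe_len=120, tiling=1):
    """Return half-open (start, end) probe intervals covering the target.

    Assumption: probes are spaced at probe_len // tiling, so 1x tiling gives
    end-to-end probes and 2x tiling gives probes that overlap by half.
    """
    if tiling < 1:
        raise ValueError("tiling must be >= 1")
    if target_end - target_start < probe_len:
        raise ValueError("target is shorter than one probe")
    step = max(1, probe_len // tiling)
    last_start = target_end - probe_len
    probes = []
    pos = target_start
    while pos < last_start:
        probes.append((pos, pos + probe_len))
        pos += step
    # Anchor the final probe to the end of the target region.
    probes.append((last_start, target_end))
    return probes


def depth_at(probes, pos):
    """Count how many probes cover a given position."""
    return sum(start <= pos < end for start, end in probes)


# Toy 600 nt target region (illustrative coordinates only).
one_x = tile_probes(0, 600, probe_len=120, tiling=1)
two_x = tile_probes(0, 600, probe_len=120, tiling=2)
```

In this toy example, an interior position such as 300 is covered by one probe at 1× and by two probes at 2×. Positions at the extreme ends of the region are covered by fewer probes under this simple scheme, which is one reason a real design may extend probes beyond the target boundaries.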

**Table 4.** Summary of probe design methods used in target enrichment protocols.




In studies focusing on highly diverse viruses such as HIV or HCV, probe design requires careful consideration if the probe set is to be inclusive of all subtypes and groups. To design probes for variable sequences, such as those present in the different subtypes of HIV, one strategy is to first design probes based on a consensus sequence and then design additional probes that cover the variable regions of each subtype of interest [19]. Alternatively, probes can be designed to be specific to one subtype rather than inclusive of all subtypes [26].
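The consensus-first strategy can be sketched as follows. This is a minimal illustration, not the procedure used in the cited studies: it assumes the subtype sequences are already aligned to equal length, takes the most common base at each column as the consensus, and flags a column as variable when the most common base falls below an arbitrary frequency threshold. The function name, the threshold, and the toy sequences are all hypothetical.

```python
from collections import Counter


def consensus_and_variable_sites(aligned, min_major_freq=0.9):
    """Return (consensus, variable_positions) for equal-length aligned sequences.

    Gap characters, if present, are treated like any other symbol.
    """
    if not aligned:
        raise ValueError("no sequences supplied")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    variable = []
    for i in range(length):
        column = [seq[i] for seq in aligned]
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base)
        # A column is 'variable' if the majority base is not dominant enough.
        if count / len(aligned) < min_major_freq:
            variable.append(i)
    return "".join(consensus), variable


# Toy aligned fragments standing in for subtype references (not real HIV data).
subtype_refs = {
    "subtype_1": "ATGGGTGCGA",
    "subtype_2": "ATGGGTGCGA",
    "subtype_3": "ATGGGAGCGA",
    "subtype_4": "ATGGGTGCGT",
}
consensus, variable_sites = consensus_and_variable_sites(list(subtype_refs.values()))
```

Here the consensus would seed the first round of probe design, while the flagged positions (5 and 9 in this toy alignment) mark where subtype-specific probes might be added.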

While the IDT xGen Lockdown probe protocol appears to be the most commonly used target enrichment protocol, an alternative that has recently been gaining attention is the myBaits Hybridization Capture Kit by Arbor Biosciences (Ann Arbor, MI, USA). The myBaits protocol uses pools of in-solution biotinylated RNA/DNA probes that are provided with reagents and allow for targeted sequencing on NGS platforms such as Illumina (San Diego, CA, USA), Ion Torrent, PacBio (Menlo Park, CA, USA), and Nanopore [42]. The kit can also be used with custom-designed probes.

The specific design of the probes will be influenced by the particular goal of the laboratory investigation. Additionally, the choice between RNA and DNA probes may depend on factors such as cost, storage requirements, and probe stability. A major advantage of DNA probes is their storage stability: they can be safely stored at −20 °C, whereas RNA probes are sensitive to freeze-thaw cycles and need to be held at −80 °C for long-term storage [46]. Nevertheless, RNA probes are often used because RNA-DNA duplexes are more stable and hybridize more efficiently than DNA-DNA duplexes [46].

## **6. Next-Generation Sequencing**

After target enrichment, samples are sequenced on an NGS platform [47]. Although several NGS platforms are available, the MiSeq and NextSeq systems by Illumina have been most commonly used in target enrichment studies [19,25]. Both the MiSeq and the NextSeq use sequencing-by-synthesis technology, in which the addition of fluorescently labeled nucleotides is tracked as the DNA chain is copied [47]. This process occurs in a massively parallel fashion, with the number of cycles determining the read length. The main differences between the two platforms are read length and data output. The MiSeq generates a maximum read length of 600 bp with a maximum output of 13.2–15 Gb, whereas the NextSeq generates a maximum read length of 300 bp with an output of 32.5–39 Gb [48]. Both the MiSeq and NextSeq have been used successfully in target enrichment studies, so the choice of sequencing platform will depend on the specifics of the research project and the availability of sequencing instruments.

## **7. Post-Sequencing Analysis (Bioinformatics)**

After sequencing on an NGS instrument is complete, the data from the run are analyzed using bioinformatics tools. Both the MiSeq and NextSeq systems provide read information in FASTQ files, which can then be imported into bioinformatics software for analysis. Regardless of the platform, many researchers apply the same procedures to refine their sequencing data. The data are first cleaned by discarding reads with low quality scores, and adapter sequences are then removed from the remaining reads. The resulting good-quality reads are mapped to a reference sequence, either one available from GenBank or a custom-defined reference. Once the reads have been aligned, a consensus sequence can be derived and the final alignment determined for further downstream applications [19,25,26]. A summary of the bioinformatics tools that have been used in target enrichment studies of HIV and HCV can be found in Table 5.
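The first two refinement steps, quality filtering followed by adapter removal, can be sketched in plain Python. This is an illustrative simplification rather than the method of any tool listed in Table 5: it assumes four-line FASTQ records with Phred+33 quality encoding, filters on mean read quality, and trims only exact matches to the adapter. The function names, the thresholds, and the choice of the adapter prefix `AGATCGGAAGAGC` are assumptions made for the example.

```python
def parse_fastq(lines):
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        yield records[i][1:], records[i + 1], records[i + 3]


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not qual:
        return 0.0
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


def clean_reads(lines, adapter, min_mean_q=20, min_len=10):
    """Discard low-quality reads, then trim adapters from the rest."""
    kept = []
    for name, seq, qual in parse_fastq(lines):
        if mean_phred(qual) < min_mean_q:
            continue  # step 1: drop reads with low quality scores
        seq, qual = trim_adapter(seq, qual, adapter)  # step 2: remove adapter
        if len(seq) >= min_len:
            kept.append((name, seq, qual))
    return kept


ADAPTER = "AGATCGGAAGAGC"  # assumed adapter prefix, for illustration only
toy_fastq = [
    "@read1", "ACGT" * 4 + ADAPTER, "+", "I" * 29,  # high quality, has adapter
    "@read2", "ACGT" * 4, "+", "#" * 16,            # low quality, discarded
]
cleaned = clean_reads(toy_fastq, ADAPTER)
```

In this toy input, `read2` is discarded for low quality and `read1` is kept with its adapter removed. Dedicated trimming tools additionally handle partial and mismatched adapters and per-base quality trimming, and the cleaned reads would then be passed to a read mapper for alignment against the chosen reference.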

**Table 5.** Summary of bioinformatics platforms used in target enrichment protocols.


## **8. Target Enrichment Performance**

The success of target enrichment protocols has been demonstrated in studies comparing sequencing data from runs performed with and without enrichment prior to sequencing. In a study by P. Miyazato et al. [26], libraries prepared in the absence of enrichment resulted in 1.9% of the total reads mapping to the provirus. When the same libraries were enriched, the proportion of reads mapping to the provirus increased from 1.9% to 99%. Similarly, in a study by S. Iwase et al. [25], DNA-capture sequencing was tested in latently HIV-1-infected cell lines. In the absence of target enrichment, only three of a total of 1.6 × 10<sup>6</sup> reads mapped to the provirus. This number increased in a subsequent experiment involving target enrichment prior to sequencing: out of 560,000 mapped reads, 28,000 aligned with the provirus [25]. This target enrichment protocol allowed the researchers to characterize the provirus using a new method, and the authors indicated that it could be applied to other experiments aimed at treating HIV-1 infection.
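The magnitude of these improvements can be expressed as a fold enrichment, taken here as the on-target read fraction after enrichment divided by the fraction before enrichment. The short sketch below applies this to the figures quoted above; the function names are hypothetical. Note that for the Iwase et al. data the pre-enrichment denominator is total reads whereas the post-enrichment denominator is mapped reads, so that ratio should be read as an approximation only.

```python
def on_target_fraction(on_target_reads, denominator_reads):
    """Fraction of reads that map to the target (here, the provirus)."""
    return on_target_reads / denominator_reads


def fold_enrichment(fraction_before, fraction_after):
    """Ratio of on-target fractions with and without enrichment."""
    return fraction_after / fraction_before


# Miyazato et al. [26]: 1.9% of reads on target before, 99% after.
miyazato_fold = fold_enrichment(0.019, 0.99)

# Iwase et al. [25]: 3 of 1.6e6 reads before; 28,000 of 560,000 mapped reads after.
iwase_before = on_target_fraction(3, 1.6e6)
iwase_after = on_target_fraction(28_000, 560_000)
iwase_fold = fold_enrichment(iwase_before, iwase_after)
```

Under these assumptions the Miyazato et al. figures correspond to roughly a 52-fold increase in the on-target fraction, and the Iwase et al. figures to an on-target fraction of 5% after enrichment, an increase on the order of 10<sup>4</sup>-fold.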

Target enrichment has also been shown, in an HIV study by J. Yamaguchi et al. [19], to aid in the sequencing of low-titer samples. They found that the genomes obtained from samples with viral loads (VLs) between log 4 and log 5 copies/mL were still incomplete in the absence of the enrichment procedure. Moreover, when samples at an even lower titer of log 3.5 copies/mL were used, sequencing without the enrichment steps resulted in only 20–50% coverage. In comparison, sequencing the same low-titer samples (log 3.5 copies/mL) using the enrichment protocol resulted in full genome sequences. This result is important because it indicates that low-titer specimens, such as those from patients undergoing antiretroviral therapy, may be characterized using the probe capture enrichment method.
