**3. Results**

## *3.1. Identification and Distribution of a Novel DNA Repeat in the* T. cruzi *Genome*

The *T. cruzi* genome is composed of diverse repetitive elements that vary in size and copy number among strains. The smallest high-copy-number element found in the *T. cruzi* genome is satellite DNA, a 195-bp repeat present in approximately 20,000 copies [29]. We therefore set a window length of 150 nucleotides, shorter than this smallest known element, to screen for new repetitive elements. We began by investigating both haplotypes (Esmeraldo-like, S, and non-Esmeraldo-like, P) of the clone CL Brener genome sequence. This strain was used in the *T. cruzi* genome sequencing project, so a considerable amount of information is available for future correlation analyses. Moreover, CL Brener has a hybrid origin, containing haplotypes from different DTUs, which could increase the robustness of our observations: because the parental strains belong to DTUs II and III, any DNA element found in both haplotypes is more likely to be present in the genome sequences of other DTUs. In our approach, we slid a 150-nucleotide window along every chromosome sequence of each haplotype, one nucleotide at a time, producing millions of 150-nucleotide fragments covering the entire *T. cruzi* CL Brener genome (Figure 1A). The resulting list of 52 million fragments was then summarized to show how often each 100% identical fragment appeared during the window screening.
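The window screen described above amounts to exact k-mer counting with k = 150 and a step of one nucleotide. The following is a minimal sketch under that reading, not the authors' pipeline; the function name `count_windows` is hypothetical, and the toy call uses k = 4 so the result can be checked by hand.

```python
from collections import Counter
from typing import Iterable


def count_windows(chromosomes: Iterable[str], k: int = 150) -> Counter:
    """Slide a k-nt window along each sequence, one nucleotide at a time,
    and count how often each exact (100% identical) k-mer occurs."""
    counts: Counter = Counter()
    for seq in chromosomes:
        seq = seq.upper()  # treat soft-masked lowercase bases like uppercase
        # A sequence of length L yields L - k + 1 windows (none if L < k).
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return counts


# Toy usage with k = 4 instead of 150 so the output is easy to check by hand.
toy = count_windows(["ACGTACGTAC"], k=4)
print(sum(toy.values()))  # 7 windows in total
print(toy["ACGT"], toy["TACG"])  # 2 1
```

As written, the sketch counts only the forward strand; whether the original screen also merged reverse-complement fragments is not stated here. Holding tens of millions of 150-nt strings in a Python dictionary would also need many gigabytes of memory, so a genome-scale run would more plausibly rely on hashing or a dedicated k-mer counting tool.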

Four sequential filtering steps were then applied to clean these data and isolate potential repetitive sequences. The first step excluded any fragment containing at least one undefined nucleotide (N), and the second retained only fragments that appeared at least ten times in the list. The third step used RepeatMasker to exclude fragments with a significant match to known repetitive elements, and the fourth excluded any fragment derived from multigene family genes. This yielded a final list of 67 unique 150-nucleotide fragments (Figure 1A). Because the fragments were generated by a sliding window, in which consecutive fragments are offset by only one nucleotide, the 67 fragments were aligned to determine whether they were independent repetitive sequences or parts of a longer sequence. As shown in Figure 1B, all 67 sequences aligned together with 100% identity, yielding a 241-nucleotide consensus sequence (Supplementary File S1). Thus, the 150-nucleotide sliding-window screen, followed by the filtering and alignment steps, identified a novel repetitive sequence in the *T. cruzi* genome that has not been described to date.
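The four filters can be expressed as a single pass over the fragment counts. In this sketch, steps 1 and 2 are implemented directly, while steps 3 and 4 depend on external results (RepeatMasker output and a multigene family annotation), so they are represented by hypothetical predicate functions, `matches_known_repeat` and `in_multigene_family`, that would have to be built from those results. The name `filter_candidates` and the toy counts are illustrative only.

```python
from collections import Counter
from typing import Callable, List


def filter_candidates(
    counts: Counter,
    min_count: int = 10,
    matches_known_repeat: Callable[[str], bool] = lambda s: False,
    in_multigene_family: Callable[[str], bool] = lambda s: False,
) -> List[str]:
    """Apply the four filters in order and return the surviving fragments."""
    survivors: List[str] = []
    for fragment, n in counts.items():
        if "N" in fragment.upper():  # step 1: undefined nucleotide present
            continue
        if n < min_count:  # step 2: seen fewer than min_count times
            continue
        if matches_known_repeat(fragment):  # step 3: hypothetical stand-in for a RepeatMasker hit
            continue
        if in_multigene_family(fragment):  # step 4: hypothetical stand-in for multigene family annotation
            continue
        survivors.append(fragment)
    return sorted(survivors)


# Toy usage: 4-nt "fragments" with made-up counts; "CCCC" plays a known repeat.
toy_counts = Counter({"ACGT": 12, "ACNT": 50, "GGGG": 3, "TTTT": 10, "CCCC": 11})
print(filter_candidates(toy_counts, matches_known_repeat=lambda s: s == "CCCC"))
# ['ACGT', 'TTTT']
```

Keeping the external filters as plain predicates lets the cheap checks (N content, copy number) run first, so the comparatively expensive repeat and gene-family lookups only see fragments that already passed them.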

**Figure 1.** Strategy to identify a novel repetitive element in the *T. cruzi* CL Brener genome. (**A**) Schematic representation of the 150-nucleotide (nt) sliding window used to generate sequences covering the entire CL Brener genome, and of the filtering steps used to exclude known repetitive elements. (**B**) Alignment of the 67 sequences of 150 bp retained after the filtering steps, which produced the 241-nucleotide consensus sequence of the repetitive element.
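The assembly of the 67 overlapping fragments into one consensus (Figure 1B) relies on the fact that windows from the same locus are identical wherever they overlap. One simple way to illustrate this is a greedy exact-overlap merge, sketched below. This is a swapped-in technique for illustration, not necessarily the alignment procedure used for Figure 1B; `greedy_merge`, `overlap`, and the minimum-overlap parameter `min_len` are hypothetical names, and the toy sequence is invented.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b,
    or 0 if no such overlap of at least min_len exists."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_merge(fragments: List[str], min_len: int = 20) -> List[str]:
    """Greedily join fragments with exact suffix/prefix overlaps into contigs."""
    # Drop duplicates and any fragment fully contained in a longer one.
    contigs: List[str] = []
    for f in sorted(set(fragments), key=len, reverse=True):
        if not any(f in c for c in contigs):
            contigs.append(f)
    while True:
        # Find the pair (i, j) with the longest exact overlap of i's end onto j's start.
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no remaining overlaps: each entry is a separate contig
        merged = contigs[best_i] + contigs[best_j][best_len:]
        rest = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        # Remove anything the new contig now fully contains, then keep the contig.
        contigs = [c for c in rest if c not in merged] + [merged]


# Toy usage: 8-nt windows taken at offsets 0, 3, 6, 9 and 13 of a 21-nt sequence.
seq = "ACGTTGCAAGGCTTACGATCG"
windows = [seq[i:i + 8] for i in (0, 3, 6, 9, 13)]
print(greedy_merge(windows, min_len=4) == [seq])  # True
```

Greedy merging can mis-join fragments when short overlaps occur by chance, which is why a minimum overlap length is enforced; in the toy example, the 3-nt `ACG` shared by the end of one window and the start of another falls below `min_len` and is never considered.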
