*3.2. The 241 nt Repeat Is Enriched at Intergenic Regions of T. cruzi Genome Sequences*


Since the 241 nt repeat was present in both CL Brener haplotypes, we next asked whether this repeat is present (i) in other *T. cruzi* strains and (ii) in other trypanosomatids. A Blast-n search revealed that the 241 nt sequence is present in all searched *T. cruzi* strains, including the bat strain *T. cruzi marinkellei*, but is absent from *Leishmania* and *Trypanosoma brucei* (Supplementary File S2).

To characterize this new repetitive element found in *T. cruzi*, we first surveyed the genome sequences available in TriTrypDB for *T. cruzi* strains from different DTUs that were sequenced using long reads (PacBio technology; Pacific Biosciences of California, Inc., Menlo Park, CA, USA), which provide more reliable genome assemblies (Table 1). We then chose one strain per DTU for analysis, namely Dm28c (TcI), Y (TcII), and TCC (TcVI). No TcIII or TcIV strains have been sequenced with PacBio technology, and the TcV strain genome (Bug 2148) lacks annotation (Table 1). Even though the *T. cruzi marinkellei* genome was not sequenced by PacBio, we decided to perform some analyses on this strain in order to gain evolutionary insights into this repeat.

**Table 1.** *T. cruzi* genomes available in TriTrypDB. With the exception of *T. cruzi marinkellei* (sequenced by Illumina technology; Illumina Inc., San Diego, CA, USA) and the CL Brener strain (reference genome sequenced by whole genome shotgun assembly), the strains were sequenced by PacBio technology. NA: not annotated; \*: strains chosen for formal analysis.


To identify the locations of this repetitive sequence in the genome, Blast-n searches were performed against the *T. cruzi* genome sequences. From the retrieved regions, only those with a minimum length of 140 bp and at least 95% identity to the 241 nt repeat were selected. The great majority of the copies of this repetitive element were found in intergenic regions of the analyzed strains (Supplementary File S3): 100% in Dm28c (1117 of 1117), 99.5% in Y (742 of 746), 100% in TCC (1171 of 1171), 96.7% on CL Brener S (322 of 334), and 97.5% on CL Brener P (398 of 408). The repeats found in genic regions (Supplementary File S4) were in genes encoding hypothetical proteins (four genes on Y, eight genes on CL Brener\_S, and eight genes on CL Brener\_P), ATPases (one on CL Brener\_S and two on CL Brener\_P), and trans-sialidases (two on CL Brener\_S). In fact, the identity between these two trans-sialidase sequences and the 241 nucleotide consensus sequence is the cause of the alignment break observed in Figure 1B (nucleotides 30–60 and 180–210), where the 150 nt fragments 100% identical to these trans-sialidases were eliminated after the multigenic family filtering step. We then further investigated the repeats located in intergenic regions.
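The selection and classification steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. It assumes that the Blast-n hits are available in the standard 12-column tabular format (`-outfmt 6`, with the repeat as query and the genome as subject), that gene coordinates are supplied as simple per-contig intervals, and that any overlap with an annotated gene counts as "genic". The names `Hit`, `parse_blast_tabular`, `select_hits`, and `classify` are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    contig: str
    start: int       # 1-based, inclusive, always <= end
    end: int
    identity: float  # percent identity reported by BLAST
    length: int      # alignment length


def parse_blast_tabular(lines: Iterable[str]) -> List[Hit]:
    """Parse default -outfmt 6 lines (qseqid sseqid pident length ... sstart send ...)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        s, e = int(f[8]), int(f[9])  # subject start/end; reversed on the minus strand
        hits.append(Hit(f[1], min(s, e), max(s, e), float(f[2]), int(f[3])))
    return hits


def select_hits(hits: Iterable[Hit], min_length: int = 140,
                min_identity: float = 95.0) -> List[Hit]:
    """Keep hits meeting the length and identity thresholds stated in the text."""
    return [h for h in hits if h.length >= min_length and h.identity >= min_identity]


def classify(hit: Hit, genes: Dict[str, List[Tuple[int, int]]]) -> str:
    """Return 'genic' if the hit overlaps any gene interval (assumed rule), else 'intergenic'."""
    for g_start, g_end in genes.get(hit.contig, []):
        if hit.start <= g_end and g_start <= hit.end:
            return "genic"
    return "intergenic"
```

With hits classified this way, the per-strain intergenic percentage is simply the number of "intergenic" hits divided by the number of selected hits.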

The sequences selected from the Blast-n searches were at least 140 nucleotides long, but approximately 90% of the repeat copies found in the Dm28c, TCC, and CL Brener genomes, and 87.5% of those in the Y strain, ranged from 231 to 244 nucleotides (Figure 2A). Since the consensus sequence of this new repetitive element is 241 nucleotides long, we refer to it as the 241 nt repeat. The repeats located in intergenic regions were found interspersed throughout the genome sequences rather than organized in tandem in a head-to-tail fashion (Supplementary Files S5–S7), and they were present in most chromosomes of the assembled genomes of the CL Brener and Y strains (Table 2) (the Dm28c and TCC genome sequences are not assembled into chromosomes). Even though larger chromosomes showed more copies of the repeat, its distribution was not proportional to chromosome size, as seen on Y strain chromosomes 2 and 3 (Table 2), where 3 copies are found on chromosome 2 and 73 copies are found on chromosome 3. Different copy numbers were also observed between the CL Brener haplotypes, as seen on chromosome 40 (Table 2), where 34 repeats were found on the S haplotype and 8 on the P haplotype. In addition, the repeat distribution showed different profiles among the chromosomes: it was found near the chromosome ends in some chromosomes and concentrated in the middle in others (Supplementary Files S5–S7). Therefore, there was no preferential location shared across the chromosomes of the CL Brener and Y strains.
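As an illustration of the summaries reported above (the length distribution, the per-chromosome copy numbers, and the interspersed rather than tandem arrangement), the following Python sketch shows one way such values could be computed from a list of repeat coordinates. It is a sketch under stated assumptions, not the procedure used in this study: copies are assumed to be given as (chromosome, start, end) tuples with 1-based inclusive coordinates, copy length is taken as the genomic span, and two copies are treated as a head-to-tail tandem pair when the gap between them does not exceed a hypothetical `max_gap` parameter. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Copy = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive


def fraction_in_length_range(copies: List[Copy], lo: int = 231, hi: int = 244) -> float:
    """Fraction of copies whose genomic span lies between lo and hi nucleotides."""
    if not copies:
        return 0.0
    in_range = sum(1 for _, s, e in copies if lo <= e - s + 1 <= hi)
    return in_range / len(copies)


def copies_per_chromosome(copies: List[Copy]) -> Counter:
    """Count repeat copies on each chromosome."""
    return Counter(chrom for chrom, _, _ in copies)


def count_tandem_pairs(copies: List[Copy], max_gap: int = 0) -> int:
    """Count neighbouring copies separated by at most max_gap nucleotides (assumed rule)."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, s, e in copies:
        by_chrom.setdefault(chrom, []).append((s, e))
    pairs = 0
    for intervals in by_chrom.values():
        intervals.sort()
        for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
            if next_start - prev_end - 1 <= max_gap:
                pairs += 1
    return pairs
```

Under these assumptions, a tandem-pair count of zero (or close to zero) relative to the total copy number would be consistent with an interspersed organization.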
