*3.4. The 241 nt Repeat Is Found Closer to Upstream Genes and May Be Part of the 3′UTR of Trans-Sialidase Gene mRNA*

The intergenic location of the 241 nt repeat and the different gene profiles found upstream and downstream of it motivated us to determine the distance between the repeat and its upstream and downstream genes (indicated by "dup" and "ddown" in Figure 2B). The 241 nt repeat was significantly closer to upstream genes than to downstream genes in all of the *T. cruzi* strains analyzed (Figure 3A and Supplementary Files S8 and S9), including *T. cruzi* marinkellei (Supplementary Files S8 and S9), with the exception of the Sylvio X10/1 strain (Supplementary Files S8 and S9).
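The comparison of upstream and downstream distances can be illustrated with a minimal sketch of a two-sample Student's t statistic. This assumes an unpaired test with pooled variance, which the text does not specify; the function name `student_t` and the distance values in the usage note are hypothetical and are not data from this study.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Return (t, degrees_of_freedom) for an unpaired, pooled-variance
    Student's t-test comparing the means of samples a and b.

    Assumption: equal variances in both groups (classic Student's test).
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled estimate of the common variance (sample variances, n - 1).
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df
```

For example, with hypothetical distances `dup = [310, 450, 520, 390]` and `ddown = [2100, 1800, 2600, 2300]` (in nucleotides), `student_t(dup, ddown)` returns a negative t value, indicating a smaller mean distance on the upstream side. Converting t to a *p* value requires the t distribution, which is outside the scope of this sketch.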

**Figure 3.** Distance from the 241 nt repeat to upstream and downstream genes. The distances of each gene upstream (blue symbols) and downstream (red symbols) of the 241 nt repeat are plotted on the graph. Horizontal bars indicate the mean, and \* indicates a *p* value < 0.001 from the Student's t-test. (**A**) Distance from the 241 nt repeat to upstream and downstream genes in Dm28c, Y, TCC and the CL Brener Esmeraldo-like and non-Esmeraldo-like haplotypes. (**B**) Distance from the 241 nt repeat to the four main multigenic families among upstream and downstream genes in the Dm28c, Y, TCC and CL Brener genome sequences. (**C**) Distance from the repeat to hypothetical proteins.

When the distance between the repeat and each multigenic family was analyzed, the 241 nt repeat was found to be significantly closer to some upstream multigenic families, including trans-sialidase, MASP and mucin, in all strains analyzed. Upstream GP63 genes were also closer to the repeat than downstream GP63 genes; however, the difference was significant only in the Dm28c strain (Figure 3B). In contrast, the distance from RHS genes to the repeat differed among strains: in Y and CL Brener P, the repeat was closer to upstream RHS genes (significant only in the Y strain), whereas in Dm28c and TCC, the repeat was closer to downstream RHS genes (Figure 3C).

The proximity of the 241 nt repeat to upstream genes raised the question of whether this repeat could be transcribed as part of the 3′UTR of the mRNA of upstream genes. Since the UTR length of mRNA from *T. cruzi* varies in size, ranging from 17 to 2800 nucleotides, and is generally limited to 56.65% of the final mRNA [31], we used these two pieces of information to infer the possible presence of the 241 nt repeat in the 3′UTR of the final mRNA of trans-sialidase, MASP, mucin and GP63 genes. To this end, we first calculated the distance from the first nucleotide after the stop codon of the upstream gene to the last nucleotide of the 241 nt sequence, as shown in Figure 4A (dB), and then analyzed the proportion of genes where dB was lower than 2800 bp. Second, we calculated the distance from the first nucleotide of the upstream gene to the last nucleotide of the 241 nt repeat (dA in Figure 4A) and calculated the dB/dA ratio. Then, the proportion of genes where dB/dA was lower than 56.65% was investigated. Therefore, when dB is lower than 2800 bp and the dB/dA ratio is lower than 56.65%, it is possible that the repeat is contained within the 3′UTR of the final mRNA.
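The two criteria described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coordinate layout (1-based, inclusive, on the coding strand), the class `UpstreamGene` and the function names are hypothetical and do not reproduce the pipeline actually used in this work; only the 2800 nt and 56.65% thresholds come from the text.

```python
from dataclasses import dataclass

MAX_UTR_LEN = 2800         # upper bound of T. cruzi UTR length cited in the text [31]
MAX_UTR_FRACTION = 0.5665  # upper bound of the UTR share of the final mRNA [31]


@dataclass
class UpstreamGene:
    # Hypothetical 1-based inclusive coordinates on the coding strand.
    gene_start: int   # first nucleotide of the upstream gene
    stop_end: int     # last nucleotide of its stop codon
    repeat_end: int   # last nucleotide of the 241 nt repeat


def d_b(g):
    """dB: first nucleotide after the stop codon to the end of the repeat."""
    return g.repeat_end - g.stop_end


def d_a(g):
    """dA: first nucleotide of the gene to the end of the repeat."""
    return g.repeat_end - g.gene_start + 1


def repeat_may_be_in_utr(g):
    """True when both criteria hold: dB < 2800 and dB/dA < 56.65%."""
    return d_b(g) < MAX_UTR_LEN and d_b(g) / d_a(g) < MAX_UTR_FRACTION


def proportions(genes):
    """Fractions of genes passing the length and the ratio criterion."""
    n = len(genes)
    by_len = sum(d_b(g) < MAX_UTR_LEN for g in genes) / n
    by_ratio = sum(d_b(g) / d_a(g) < MAX_UTR_FRACTION for g in genes) / n
    return by_len, by_ratio
```

As a usage example with made-up coordinates, `UpstreamGene(gene_start=1, stop_end=1500, repeat_end=2000)` gives dB = 500 and dA = 2000, so the ratio is 25% and both criteria are met. Whether the distances are counted inclusively is an assumption of this sketch and shifts the values by at most one nucleotide.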

**Figure 4.** The 241 nt repeat is found in the 3′UTR of upstream genes. (**A**) Schematic representation of the genes upstream and downstream of the 241 nt sequence and the distances used to predict the 3′UTR length (dA). (**B**) Predicted 3′UTR including the 241 nt sequence (distance dA is represented on "b"). The percentage of genes whose predicted 3′UTR is shorter than 2800 bp is listed on the left of the graphs. (**C**) The predicted 3′UTR size (distance dB is represented on "b", which includes the 241 nt sequence) in proportion to the final full mRNA was plotted. The percentage of genes whose predicted 3′UTR represents less than 56.65% of the mRNA is listed on the left of the graphs. In (**B**) and (**C**), the four major representative genes among upstream and downstream genes are represented. Abbreviations: MASP, mucin-associated surface protein; GP, glycoprotein.

Over 96% of the trans-sialidase genes from Dm28c, Y, TCC and CL Brener showed a dB lower than 2800 bp and a dB/dA ratio lower than 56.65% (Figure 4B,C). For MASP genes, there was variation among the strains. In Dm28c, 66.9% of the MASP genes showed a dB lower than 2800 bp (Figure 4B), and 65.7% showed a dB/dA ratio lower than 56.65% (Figure 4C). In the Y strain, 98% of the MASP genes showed a dB lower than 2800 bp (Figure 4B), and 92.5% showed a dB/dA ratio lower than 56.65% (Figure 4C). In TCC, 72.2% of the MASP genes showed a dB lower than 2800 bp (Figure 4B), and 70% showed a dB/dA ratio lower than 56.65% (Figure 4C). In the CL Brener strain, 90.6% (S) and 89.1% (P) of the MASP genes showed a dB lower than 2800 bp (Figure 4B), and 82.3% (S) and 86.7% (P) showed a dB/dA ratio lower than 56.65% (Figure 4C). The analysis of GP63 genes showed that 100% of the GP63 genes from Dm28c, TCC and CL Brener S had a dB lower than 2800 bp (Figure 4B) and a dB/dA ratio lower than 56.65% (Figure 4C). In the Y strain, 98.2% of the GP63 genes showed a dB lower than 2800 bp (Figure 4B), and 83.6% showed a dB/dA ratio lower than 56.65% (Figure 4C). In CL Brener P, 78.6% of the GP63 genes showed a dB lower than 2800 bp (Figure 4B) and a dB/dA ratio lower than 56.65% (Figure 4C).

As the results above show, trans-sialidase, MASP and GP63 genes (from the four strains) had similar proportions of genes with a dB lower than 2800 bp and a dB/dA ratio lower than 56.65%. The mucin genes analyzed, however, showed greater differences between the dB and dB/dA ratio analyses. In Dm28c, 66.7% of mucin genes showed a dB lower than 2800 bp (Figure 4B), and 0% of mucin genes showed a dB/dA ratio lower than 56.65% (Figure 4C). In the Y strain, 92.5% of mucin genes showed a dB lower than 2800 bp (Figure 4B), and 83.6% showed a dB/dA ratio lower than 56.65% (Figure 4C). In TCC, 91.3% of mucin genes showed a dB lower than 2800 bp (Figure 4B), and 47.8% showed a dB/dA ratio lower than 56.65% (Figure 4C). In CL Brener, 92.9% (S) and 95.5% (P) of mucin genes showed a dB lower than 2800 bp (Figure 4B), and 42.9% (S) and 90.9% (P) showed a dB/dA ratio lower than 56.65% (Figure 4C).

These data suggest that the 241 nt repeat may be part of the 3′UTR of the final mRNA of most trans-sialidase, MASP and GP63 genes. The mucin genes analyzed here showed a lower proportion of genes in which the 241 nt repeat is located in the 3′UTR. Additionally, mucin is the multigenic family with the fewest genes in the genome associated with the repeat (approximately 5%; Supplementary File S8); thus, any function of this repeat may play only a minor role in mucin genes.
