*3.5. The 241 nt Repeats Are Found Significantly Expressed in Transcriptomes and Highly Correlated to the mRNA 3′ UTR Sequence*

To determine whether the 241 nt repeat is indeed expressed in the 3′ UTR of the mature mRNA, and what role it might play in gene expression, we analyzed transcriptome datasets available in GenBank [32]. First, we selected all RNA-seq datasets of epimastigote and trypomastigote forms with at least two replicates each. Such datasets were available only for the Dm28c, Y and CL Brener strains, and only the Y and CL Brener data could be analyzed because of the percentage of aligned reads (Supplementary File S10). Therefore, RNA-seq data from epimastigote and trypomastigote forms of the *T. cruzi* CL Brener strain (Franco G.R., unpublished data) and Y strain [33,34] were aligned against the corresponding reference genome sequence, coverage analysis was performed, and the expression profiles of the 241 nt repeat and its surrounding regions were obtained.

To determine whether reads cover the 241 nt repeat sequence, we used counts per million mapped reads (CPM). A cut-off of 2 was established, and regions with CPM above 2 were considered significantly expressed. Figure 5A shows the CPM values for the epimastigote and trypomastigote forms of the Y and CL Brener strains: the great majority of the repeats are expressed in epimastigotes (96.9% in Y, and 70.6% and 76.5% in CL Brener S and P, respectively) and in trypomastigotes (66.4% in Y and 100% in CL Brener S and P). Indeed, the mean CPM exceeds 60 in the epimastigote form of the Y strain and in the trypomastigote form of the CL Brener strain. Thus, the 241 nt repeats are not merely repetitive elements in the genome but are also expressed as constituents of the mature transcripts.
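The CPM filter can be illustrated with a minimal Python sketch. It assumes CPM is computed in the usual way (reads mapped to a region divided by the total number of mapped reads in the library, times 10^6) and that per-repeat read counts are already available; the helper names and counts below are hypothetical and do not reproduce the actual pipeline.

```python
def cpm(region_counts, total_mapped_reads):
    """Counts per million: reads in each region scaled by library size (hypothetical helper)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return {name: n * 1_000_000 / total_mapped_reads for name, n in region_counts.items()}


def fraction_expressed(cpm_values, cutoff=2.0):
    """Fraction of regions whose CPM is above the cut-off (2 in the text)."""
    if not cpm_values:
        return 0.0
    above = sum(1 for v in cpm_values.values() if v > cutoff)
    return above / len(cpm_values)


# Hypothetical counts for three repeat copies in a library of 20 million mapped reads
counts = {"rep_1": 1500, "rep_2": 30, "rep_3": 45}
values = cpm(counts, 20_000_000)
print(values)                      # rep_1 -> 75.0, rep_2 -> 1.5, rep_3 -> 2.25
print(fraction_expressed(values))  # two of the three copies pass the cut-off
```

Summing such fractions per strain and life-cycle form would yield percentages of the kind reported for Figure 5A.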

**Figure 5.** RNA-seq analysis of the Y and CL Brener strains of *T. cruzi*. After alignment of the RNA-seq reads to the reference genomes (Y and CL Brener strains), coverage and expression profiles were obtained for the 241 nt repeats and their surrounding regions. (**A**) CPM values of the 241 nt repeats in epimastigotes (light gray) and trypomastigotes (dark gray) of the CL Brener and Y strains. The dashed line indicates the cut-off (CPM = 2). (**B**–**D**) TPM values of transcripts aligned to the 241 nt repeat, dup and ddown segments of CL Brener\_S (**B**), CL Brener\_P (**C**) and the Y strain (**D**). (**E**–**G**) Expression profiles of multigenic families associated and not associated with the 241 nt repeat. Log2(FC) values of the trypomastigote/epimastigote ratios were calculated, and a 1.5-fold change was chosen as the cut-off (log2 1.5 ≈ 0.585). A log2(FC) > 0.585 indicates genes upregulated in trypomastigotes (dark gray), a log2(FC) < −0.585 indicates genes upregulated in epimastigotes (light gray), and a log2(FC) between −0.585 and 0.585 indicates nondifferential expression (black). Abbreviations: CPM, counts per million; TPM, transcripts per million; FC, fold change; TS, trans-sialidase; MASP, mucin-associated surface protein; DGF-1, dispersed gene family 1; RHS, retrotransposon hot spot; MF, multigenic family.
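The three-way classification used in Figure 5E–G can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the input values are hypothetical expression estimates per gene, and the pseudocount of 1 (added to avoid taking the logarithm of zero) is an assumption not stated in the text.

```python
import math

# A 1.5-fold change on the log2 scale: log2(1.5) is approximately 0.585
FC_CUTOFF = 1.5
LOG2_CUTOFF = math.log2(FC_CUTOFF)


def log2_fold_change(trypo, epi, pseudocount=1.0):
    """log2(trypomastigote / epimastigote); the pseudocount is an assumption that avoids log(0)."""
    return math.log2((trypo + pseudocount) / (epi + pseudocount))


def classify(trypo, epi, pseudocount=1.0):
    """Assign a gene to one of the three classes shown in Figure 5E-G."""
    lfc = log2_fold_change(trypo, epi, pseudocount)
    if lfc > LOG2_CUTOFF:
        return "up_in_trypomastigotes"  # dark gray
    if lfc < -LOG2_CUTOFF:
        return "up_in_epimastigotes"    # light gray
    return "nondifferential"            # black


# Hypothetical expression values for three genes
print(classify(31, 15))  # log2(32/16) = 1.0
print(classify(15, 31))  # log2(16/32) = -1.0
print(classify(10, 9))   # log2(11/10), about 0.14
```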

Having confirmed the presence of the 241 nt repeat in mRNAs, we next analyzed whether the repeat is indeed located in the 3′ UTR of multigenic family genes (trans-sialidase, MASP, mucin and GP63), as predicted by the genomic analysis. For that, the TPMs of three regions were considered: the 241 nt repeat, the region between the repeat and the upstream gene (dup in Figure 2B) and the region between the repeat and the downstream gene (ddown in Figure 2B). Additionally, only 241 nt repeats flanked by one upstream gene and one downstream gene were considered (patterns ++ and −− in Figure 2B and Table 3).
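How dup and ddown could be derived from annotation coordinates is sketched below. The sketch assumes that patterns ++ and −− mean both flanking genes lie on the same strand, that coordinates are 0-based half-open genomic intervals, and that on the minus strand the gene to the right is upstream in the direction of transcription; the `Interval` type and `flanking_regions` helper are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Half-open genomic interval [start, end) in 0-based coordinates."""
    start: int
    end: int


def flanking_regions(left_gene, repeat, right_gene, strand):
    """Return (dup, ddown) for a repeat lying between two genes on the same strand.

    Coordinates are genomic (left to right). On the '-' strand the gene on the right
    is assumed to be upstream in the direction of transcription, so the gaps swap roles.
    """
    if not (left_gene.end <= repeat.start and repeat.end <= right_gene.start):
        raise ValueError("expected left_gene < repeat < right_gene without overlap")
    left_gap = Interval(left_gene.end, repeat.start)
    right_gap = Interval(repeat.end, right_gene.start)
    if strand == "+":  # pattern ++: upstream gene on the left
        return left_gap, right_gap
    if strand == "-":  # pattern --: upstream gene on the right
        return right_gap, left_gap
    raise ValueError("strand must be '+' or '-'")


# Hypothetical coordinates: gene, 241 nt repeat, gene
g1, rep, g2 = Interval(0, 1000), Interval(1200, 1441), Interval(1600, 2600)
dup, ddown = flanking_regions(g1, rep, g2, "+")
print(dup, ddown)
```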

The rationale of the TPM analysis was that, if the 241 nt repeat is part of the mRNA 3′ UTR, the TPMs of the repeat and dup should be similar, whereas if the repeat is not part of the 3′ UTR, its TPM should be similar to the ddown TPM. Figure 5B–D shows the mean TPM values for the Y strain (Figure 5B), CL Brener\_S (Figure 5C) and CL Brener\_P (Figure 5D). The TPMs of the 241 nt repeat regions are higher than those of ddown and closer to those of dup. To assess whether the repeat and dup TPM values are associated, a statistical test was applied.
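The comparison can be made concrete with a short sketch. It uses the standard TPM definition (read counts divided by feature length, then rescaled so that all features sum to 10^6) and a simple "closer to dup than to ddown" criterion; that criterion is an assumption about how the comparison was framed, and the counts and lengths are hypothetical.

```python
def tpm(counts, lengths):
    """Transcripts per million: divide counts by feature length, then rescale to sum to 1e6."""
    rates = {name: counts[name] / lengths[name] for name in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any feature")
    return {name: rate / total * 1_000_000 for name, rate in rates.items()}


def closer_to_dup(tpm_repeat, tpm_dup, tpm_ddown):
    """True when the repeat TPM is nearer dup than ddown, as expected for a 3' UTR repeat."""
    return abs(tpm_repeat - tpm_dup) < abs(tpm_repeat - tpm_ddown)


# Hypothetical read counts and lengths (nt) for one repeat locus; in a real analysis
# the normalisation sum runs over all features in the annotation, not just these three.
counts = {"dup": 400, "repeat": 241, "ddown": 10}
lengths = {"dup": 400, "repeat": 241, "ddown": 200}
values = tpm(counts, lengths)
print({k: round(v) for k, v in values.items()})
print(closer_to_dup(values["repeat"], values["dup"], values["ddown"]))
```

Length normalization matters here because dup, the repeat and ddown differ in size, so raw counts alone would favor the longest segment.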

In both the CL Brener (S and P) and Y strains, the 241 nt repeats were significantly associated with dup according to the one-tailed sign test (*p* < 0.01 for both strains). The association between the repeat and dup was strong when dup corresponded to genes of the multigenic families (trans-sialidase, MASP, mucin and GP63). For the remaining genes, the association with the repeats was not significant in CL Brener\_S (*p* = 0.82), borderline in CL Brener\_P (*p* = 0.046) and significant in the Y genome (*p* = 0.002); in all cases, however, it was weaker than in the set of the four gene families (trans-sialidase, MASP, mucin and GP63).
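A one-tailed sign test of this kind can be written with the standard library alone. The sketch assumes that each repeat contributes one paired observation, scored as a success when the repeat TPM is closer to dup than to ddown, that ties are discarded, and that the null hypothesis is a success probability of 0.5; these scoring rules and the example TPM triples are assumptions for illustration.

```python
from math import comb


def sign_test_one_tailed(successes, trials):
    """P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials


def sign_test_from_pairs(triples):
    """triples: (tpm_repeat, tpm_dup, tpm_ddown); success = repeat nearer dup than ddown."""
    successes = trials = 0
    for rep, dup, ddown in triples:
        d_up, d_down = abs(rep - dup), abs(rep - ddown)
        if d_up == d_down:
            continue  # ties carry no sign information and are dropped
        trials += 1
        successes += d_up < d_down
    return sign_test_one_tailed(successes, trials)


# Hypothetical TPM triples for four repeat loci (three are closer to dup)
triples = [(100, 95, 5), (80, 82, 10), (60, 70, 2), (50, 10, 45)]
print(sign_test_from_pairs(triples))  # 5/16 = 0.3125
print(sign_test_one_tailed(9, 10))    # 11/1024, about 0.0107
```

With SciPy installed, `scipy.stats.binomtest(k, n, p=0.5, alternative="greater").pvalue` should return the same one-tailed value.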

Additionally, the dup and 241 nt repeat regions were analyzed in terms of coverage, and in the CL Brener and Y strains most of these regions were completely covered by reads (Supplementary File S11). Taken together, these data strongly indicate that the 241 nt repeat is indeed expressed in the 3′ UTR of genes of multigenic families such as trans-sialidase, MASP, mucin and GP63.
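"Completely covered" can be operationalized as every base of a region having at least one aligned read. The sketch below assumes a per-base depth list for one contig (for example, derived from `samtools depth -a`, which reports zero-depth positions too) and a minimum depth of 1; both the threshold and the helper names are assumptions, not the criterion used in Supplementary File S11.

```python
def covered_fraction(depth, start, end, min_depth=1):
    """Fraction of positions in [start, end) with read depth >= min_depth."""
    if end <= start:
        raise ValueError("empty interval")
    window = depth[start:end]
    if len(window) != end - start:
        raise ValueError("interval extends beyond the depth array")
    return sum(1 for d in window if d >= min_depth) / (end - start)


def fully_covered(depth, start, end, min_depth=1):
    """True when every position in the interval reaches min_depth."""
    return covered_fraction(depth, start, end, min_depth) == 1.0


# Hypothetical per-base depth along a short stretch of a contig
depth = [0, 3, 5, 5, 4, 0, 2, 2]
print(covered_fraction(depth, 1, 5))  # 1.0, e.g. a fully covered repeat segment
print(covered_fraction(depth, 4, 8))  # 0.75, one zero-depth base
```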
