*3.6. Genes Associated with the 241 nt Repeat Show a Distinct Expression Profile Between Epimastigotes and Trypomastigotes*

The RNA-seq data also allowed us to quantify gene expression in epimastigotes and trypomastigotes of the CL Brener and Y strains. For this analysis, genes from six multigenic families (MFs) were selected: four enriched among upstream genes (trans-sialidase, MASP, mucin and GP63) and two enriched among downstream genes (DGF-1 and RHS). Only upstream and downstream genes from patterns ++ and −− (Figure 2B) were considered, and all genes annotated as "pseudogene" were excluded.

The log2 fold change (FC) of trypomastigote/epimastigote expression was calculated for the genes of the six multigenic families, and the results are summarized in Figure 5E–G. An FC threshold of 1.5 was adopted; thus, genes were considered upregulated in trypomastigotes when log2(FC) was higher than 0.585 (log2 1.5 ≈ 0.585) and upregulated in epimastigotes when log2(FC) was lower than −0.585. Figure 5E–G shows the percentage of genes upregulated in epimastigotes, upregulated in trypomastigotes, and not differentially expressed. To compare FCs, the MF genes were organized into four groups: (1) total MF genes from the genome; (2) MF genes not flanking the 241 nt repeat (nonassociated); (3) MF genes located upstream of the 241 nt repeat; and (4) MF genes located downstream of the 241 nt repeat. For the six multigenic families of the *T. cruzi* Y strain, the total MF genes were distributed similarly among the three FC ranges (Figure 5E): 32% were upregulated in trypomastigotes, 35% were upregulated in epimastigotes, and 35% showed no differential expression.
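The three-way classification above can be sketched in a few lines of Python. This is a minimal illustration, assuming per-gene log2(trypomastigote/epimastigote) values are already available; the function names and category labels are hypothetical and are not taken from the authors' pipeline. Note that log2(1.5) = 0.58496..., which the text rounds to 0.585.

```python
import math

# FC threshold of 1.5 from the text: |log2(FC)| > log2(1.5) ≈ 0.585
LOG2_THRESHOLD = math.log2(1.5)


def classify(log2_fc, threshold=LOG2_THRESHOLD):
    """Assign one gene (log2 of trypomastigote/epimastigote FC) to an FC range."""
    if log2_fc > threshold:
        return "up_trypomastigote"
    if log2_fc < -threshold:
        return "up_epimastigote"
    return "not_differential"


def category_percentages(log2_fcs):
    """Percentage of genes falling into each of the three FC ranges."""
    total = len(log2_fcs)
    if total == 0:
        raise ValueError("no genes to classify")
    counts = {"up_trypomastigote": 0, "up_epimastigote": 0, "not_differential": 0}
    for value in log2_fcs:
        counts[classify(value)] += 1
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical log2(FC) values for four genes (not data from this study)
print(category_percentages([1.2, -0.9, 0.1, 0.7]))
# {'up_trypomastigote': 50.0, 'up_epimastigote': 25.0, 'not_differential': 25.0}
```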

MF genes not associated with the 241 nt repeat and MF genes among downstream genes showed results similar to those of the total MF genes; however, MF genes among upstream genes showed a lower percentage of genes upregulated in trypomastigotes (~14%).

In *T. cruzi* CL Brener, ~38% (S) and ~39% (P) of the total MF genes were not differentially expressed between epimastigotes and trypomastigotes, ~29% (S) and ~30% (P) were upregulated in epimastigotes, and ~33% (S) and ~31% (P) were upregulated in trypomastigotes. Only slight changes were observed among nonassociated MF genes. In contrast, MF genes among upstream genes showed an increase in the percentage upregulated in trypomastigotes (~53% in S and ~39% in P) together with a decrease in the percentage upregulated in epimastigotes (~16% in S and ~22% in P). Additionally, MF genes among downstream genes of CL Brener P showed a decrease in epimastigote-upregulated genes (~20%), whereas in CL Brener S the percentages for MF genes among downstream genes were similar to those of nonassociated MF genes.
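A sketch of how the four comparison groups could be tabulated, assuming the upstream and downstream gene sets have already been filtered (patterns ++ and −−, pseudogenes excluded) as described above. All names are hypothetical, and the toy values in the usage line are not data from this study.

```python
import math
from collections import Counter

LOG2_THRESHOLD = math.log2(1.5)  # FC = 1.5, as stated in the text
CATEGORIES = ("up_trypo", "up_epi", "not_de")


def classify(log2_fc):
    """Map a log2(trypomastigote/epimastigote FC) value to an FC range."""
    if log2_fc > LOG2_THRESHOLD:
        return "up_trypo"
    if log2_fc < -LOG2_THRESHOLD:
        return "up_epi"
    return "not_de"


def group_percentages(mf_log2fc, upstream_ids, downstream_ids):
    """Tabulate FC categories (%) for the four comparison groups.

    mf_log2fc      -- dict: MF gene ID -> log2(trypomastigote/epimastigote FC)
    upstream_ids   -- set of gene IDs upstream of a 241 nt repeat
    downstream_ids -- set of gene IDs downstream of a 241 nt repeat
    """
    flanking = upstream_ids | downstream_ids
    groups = {
        "total": list(mf_log2fc),
        "nonassociated": [g for g in mf_log2fc if g not in flanking],
        "upstream": [g for g in mf_log2fc if g in upstream_ids],
        "downstream": [g for g in mf_log2fc if g in downstream_ids],
    }
    table = {}
    for name, genes in groups.items():
        counts = Counter(classify(mf_log2fc[g]) for g in genes)
        n = len(genes)
        # Empty groups get 0% everywhere rather than a division error
        table[name] = {c: (100.0 * counts[c] / n if n else 0.0) for c in CATEGORIES}
    return table


table = group_percentages({"a": 1.0, "b": -1.0, "c": 0.0, "d": 2.0}, {"a", "d"}, {"b"})
print(table["upstream"])  # {'up_trypo': 100.0, 'up_epi': 0.0, 'not_de': 0.0}
```

Because set membership is checked independently, a gene lying upstream of one repeat and downstream of another would be counted in both groups in this sketch; the text does not state how such cases were handled.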

These changes in the proportion of differentially expressed genes (DEGs) among MF genes carrying the 241 nt repeat in their 3′ UTR strongly point to a relevant role of the 241 nt repeat in regulating gene expression across the life cycle stages of *T. cruzi*.

#### **4. Discussion**

The *T. cruzi* genome has a highly repetitive DNA fraction that comprises at least 50% of the genome [14]. Apart from the multigenic families, which mostly encode surface proteins and some of whose functions are established, most repetitive elements in the *T. cruzi* genome do not yet have a defined function. Using a new genome screening approach based on a sliding window of 150 nucleotides, sequential filtering steps, and alignment of the resulting sequences, we identified a 241 nt repetitive sequence and mapped it on each chromosome of clone CL Brener by Blast-n search in TriTrypDB. Further analysis showed that the 241 nt repeat is present in all analyzed strains of *T. cruzi* as well as in the ancestral subspecies *T. cruzi marinkellei*. The repeat is distributed on almost all chromosomes of the CL Brener and Y strains as an interspersed repetitive element and is enriched in chromosomes with high concentrations of multigenic families, such as chromosomes 18, 28, 38, and 48 from CL Brener S [35].

However, the 241 nt repeat does not appear to be randomly distributed along the genome, as it is closely associated with its upstream genes (defined according to its transcription orientation). Measuring the distance from the 241 nt repeat to its upstream and downstream genes revealed a significantly shorter distance to upstream genes. Furthermore, the repertoire of genes found upstream of the 241 nt repeat was mostly composed of surface protein genes in all analyzed strains (Dm28c, Y, TCC, CL Brener, Brazil A4) and in *T. cruzi marinkellei*. Surprisingly, in the Sylvio strain, the repeat was not close to upstream genes, and the genes flanking it differed from those found in all other strains. Since the other strains from TcI, TcII, and TcVI, as well as the ancestral *T. cruzi marinkellei*, presented the same repertoire of genes flanking the 241 nt repeat, the differences in the Sylvio strain may result from its genome assembly, which is fragmented in repetitive regions [30].
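To make the orientation-dependent definition of upstream and downstream genes concrete, the sketch below finds the nearest non-overlapping gene on each side of a repeat and reports the intergenic gap. It assumes that upstream/downstream is defined relative to the repeat's own strand, that all features lie on the same chromosome, and that distance is the number of nucleotides between features; these details are assumptions, and `Feature` and `flanking_genes` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def flanking_genes(repeat, genes):
    """Nearest non-overlapping genes on each side of a repeat, oriented by its strand.

    Returns (upstream, downstream); each is (gene, gap_in_nt) or None.
    """
    left = [g for g in genes if g.end < repeat.start]
    right = [g for g in genes if g.start > repeat.end]
    nearest_left = max(left, key=lambda g: g.end, default=None)
    nearest_right = min(right, key=lambda g: g.start, default=None)
    left_hit = None if nearest_left is None else (nearest_left, repeat.start - nearest_left.end - 1)
    right_hit = None if nearest_right is None else (nearest_right, nearest_right.start - repeat.end - 1)
    # On "+" the 5' (upstream) side has lower coordinates; on "-" it is reversed
    if repeat.strand == "+":
        return left_hit, right_hit
    return right_hit, left_hit


# Hypothetical coordinates: a 241 nt repeat between two genes
genes = [Feature("TS", 100, 1000, "+"), Feature("DGF1", 1500, 5000, "+")]
up, down = flanking_genes(Feature("rep", 1050, 1290, "+"), genes)
print(up[0].name, up[1], down[0].name, down[1])  # TS 49 DGF1 209
```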

In the *T. cruzi* genome, repeated elements (micro- and minisatellite repetitive DNA, retroelements) and multigenic families encoding surface proteins (TS, GP63, MASP, and mucins), as well as DGF-1 and RHS, are located in large nonsyntenic regions [16,36,37], also named disruptive compartments [37]. Because the 241 nt repeats are primarily associated with surface protein genes, RHS and DGF-1, they map to the nonsyntenic regions of the genome. Interestingly, even the hypothetical protein genes carrying the 241 nt repeat were mapped to these regions. Taken together, these results suggest that the duplication of the 241 nt repeat occurred alongside the expansion of multigenic families in *T. cruzi*.

Transcriptome data analysis showed that the 241 nt repeat is indeed expressed in the epimastigote and trypomastigote forms of *T. cruzi* (Y and CL Brener strains). Moreover, the 241 nt repeat is transcribed as part of the 3′ UTR of trans-sialidase, MASP, mucin and GP63 mRNAs, and its presence may be involved in regulating gene expression. Gene expression analysis of trypomastigotes and epimastigotes (Y and CL Brener) indicated that MF genes associated with the 241 nt repeat have a different expression profile from MF genes not associated with the repeat. In the CL Brener strain, a higher percentage of MF genes among upstream genes are upregulated in trypomastigotes, whereas in the Y strain, a lower percentage of MF genes among upstream genes are upregulated in trypomastigotes. However, the molecular basis of the different expression patterns of genes harboring the 241 nt repeat in the two strains (Y and CL Brener) remains unknown.
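Whether a repeat lies within transcribed sequence can be checked by measuring the fraction of its bases covered by aligned reads, similar in spirit to the percentage-covered values listed in Supplementary File S11. The sketch below clips read intervals to the region and merges overlaps; the half-open, 0-based coordinates and the function name are assumptions, and this is not presented as the authors' exact computation.

```python
def covered_fraction(region, reads):
    """Fraction of bases in `region` covered by at least one read.

    region -- (start, end) half-open, 0-based interval (e.g. a 241 nt repeat)
    reads  -- iterable of (start, end) half-open intervals of aligned reads
    """
    r_start, r_end = region
    length = r_end - r_start
    if length <= 0:
        raise ValueError("region must have positive length")
    # Keep only reads overlapping the region, clipped to its boundaries
    clipped = sorted(
        (max(s, r_start), min(e, r_end)) for s, e in reads if s < r_end and e > r_start
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this read: close the current merged interval
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent read: extend the merged interval
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / length


# Hypothetical read intervals over a 241 nt region
print(covered_fraction((1000, 1241), [(900, 1050), (1040, 1100), (1200, 1300)]))
```

Merging the intervals first ensures that overlapping reads are not double-counted, so the result is always between 0 and 1.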

Three findings reinforce the possibility of a biological function for this repeat: (i) the presence of the 241 nt repeat in the genomes of all analyzed DTUs and in the ancestral *T. cruzi marinkellei*; (ii) the conserved repertoire of genes flanking the 241 nt repeat in different strains and in the ancestral subspecies; and (iii) the presence of the 241 nt repeat in the 3′ UTR of MF genes whose expression changes across the *T. cruzi* life cycle stages. Therefore, we propose that the 241 nt repeat could act as a cis-regulatory element on mRNA, playing a role in the posttranscriptional regulation of *T. cruzi* surface proteins. UTR segments are involved in gene expression regulation at steps ranging from mRNA transcription to mRNA decay [38] and in the interaction of mRNA with other RNA molecules [39]. Diverse 3′ UTR elements have been described to have cis-regulatory functions in gene expression [40,41], not only in later-diverging eukaryotes but also in trypanosomatids [10,20,42–44]. In *T. cruzi*, for example, mRNAs harboring a 43-nt U-rich element in their 3′ UTR are upregulated in amastigote forms. This U-rich sequence is bound by the RNA-binding protein TcUBP1, which leads to mRNA destabilization in epimastigotes and mRNA expression in amastigotes [45]. Moreover, in *Leishmania*, a 450-nt sequence was identified that acts as a cis-regulatory element, conferring amastigote stage-specific expression on mRNAs harboring it in their 3′ UTR [10]. Further experiments are needed to investigate the proposed biological function of this repeat, which, if confirmed, will contribute to understanding the control of gene expression in *T. cruzi*, a medically important organism with a gene expression system unique among eukaryotes.

#### **5. Conclusions**

Through a new genome screening approach involving a sliding nucleotide window, filtering steps, and sequence alignment, a novel 241 nt repetitive sequence was identified. This element (named the 241 nt repeat) is absent from the *T. brucei* and *Leishmania* spp. genomes and is interspersed on almost all chromosomes of *T. cruzi* (Y and CL Brener strains). The repertoire of genes found upstream of the 241 nt repeat was mostly composed of surface protein genes encoding trans-sialidases, MASP, mucins and GP63 protease. Since (i) this new repeat is transcribed as part of the 3′ UTR of mRNAs from these multigenic families and (ii) MF genes harboring the 241 nt repeat present a gene expression profile different from those not harboring it, the involvement of the 241 nt repeat in the control of gene expression in *T. cruzi* is strongly suggested.
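A toy version of the sliding-window screen is sketched below: every window is indexed by a strand-independent key, and windows seen at least `min_copies` times are reported. It assumes exact matching only; the filtering and alignment steps that recovered diverged copies of the 241 nt repeat are not reproduced, and all names and parameters other than the 150 nt window size are hypothetical.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def repeated_windows(sequences, window=150, step=1, min_copies=2):
    """Toy exact-match sliding-window screen for repeated sequence.

    sequences -- dict: name -> DNA string (e.g. chromosomes)
    Returns {canonical_window: [(name, offset), ...]} for windows seen at
    least `min_copies` times, counting a window and its reverse complement
    together.
    """
    hits = defaultdict(list)
    for name, seq in sequences.items():
        seq = seq.upper()
        for i in range(0, len(seq) - window + 1, step):
            w = seq[i:i + window]
            if "N" in w:
                continue  # skip assembly gaps
            key = min(w, reverse_complement(w))  # strand-independent key
            hits[key].append((name, i))
    return {k: v for k, v in hits.items() if len(v) >= min_copies}


# Tiny hypothetical example with a 5 nt window instead of 150 nt
found = repeated_windows({"chr1": "CCGATTCCC", "chr2": "TTGAATCTT"}, window=5)
print(found["GAATC"])  # [('chr1', 2), ('chr2', 2)]
```

Storing every window makes memory grow with genome size, so for whole genomes a k-mer counting or alignment-based tool would be more practical; the sketch only illustrates the windowing and strand-merging idea.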

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2073-4425/11/10/1235/s1; Supplementary File S1. DNA sequence of the 241 nt repeat; Supplementary File S2. Number of retrieved sequences from the 241 nt Blast-n search on available genome sequences of *Leishmania*, *Trypanosoma brucei* and *T. cruzi* from TriTrypDB; Supplementary File S3. Graphs representing the frequency and location of 241 nt repeats on each chromosome of the Esmeraldo-like haplotype from the *T. cruzi* CL Brener genome; Supplementary File S4. Graphs representing the frequency and location of 241 nt repeats on each chromosome of the non-Esmeraldo-like haplotype from the *T. cruzi* CL Brener genome; Supplementary File S5. Graphs representing the frequency and location of 241 nt repeats on each chromosome of the *T. cruzi* Y strain (YC6 from TriTrypDB); Supplementary File S6. List of 241 nt repeats found in intergenic regions of *T. cruzi* genome sequences from the following strains: Dm28c, Y, TCC and CL Brener; Supplementary File S7. List of 241 nt repeats found inside coding regions of *T. cruzi* genome sequences from the following strains: Dm28c, Y, TCC and CL Brener; Supplementary File S8. Tables containing (i) the total number of genes found upstream and downstream of the 241 nt repeat and their representation in percentage, (ii) list of genes and their percentages among the total genes from the genome sequence and (iii) genes from the genome found upstream and downstream of the 241 nt repeat (%); Supplementary File S9. Repertoire of genes flanking the 241 nt repeat in *T. cruzi* Brazil A4, Sylvio X10/1 and *T. cruzi marinkellei* and a graph showing the distance from the 241 nt repeat to upstream and downstream genes of *T. cruzi* Brazil A4, Sylvio X10/1 and *T. cruzi marinkellei*; Supplementary File S10. Table containing information on the RNA-seq alignment of *T. cruzi* CL Brener, Y and Dm28c strains; Supplementary File S11. Table containing the percentage of dup region covered by RNA-seq reads of *T. cruzi* CL Brener and Y strains.

**Author Contributions:** Conceptualization: M.C.E.; methodology: S.G.C., M.d.S.R., J.S.L.P., M.Y.N.J., and M.C.E.; formal analysis: S.G.C., M.M., M.d.S.R., J.S.L.P., M.Y.N.J., and M.C.E.; investigation: S.G.C., M.M., M.Y.N.J., J.P.C.d.C., and M.C.E.; resources: M.C.E.; data curation: S.G.C., M.M., N.d.O.N., and M.Y.N.J.; writing—original draft preparation: S.G.C., M.Y.N.J.; writing—review and editing: J.F.d.S., J.P.C.d.C., and M.C.E.; supervision: J.F.d.S.; project administration: M.C.E.; funding acquisition: M.C.E. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by FAPESP under grants CeTICS 2013/07467-1 and 2016/50050-2. M.C.E. is a fellow from CNPq (grant 306199/2018-1).

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
