*3.2. Phylogeny and Classification of the RHS Multigene Family of Clone CLB*

In the phylogenetic analysis, the transcribed RHS genes were examined for the presence of RHS domains by rpsBLAST against the database of conserved domains, using an e-value cutoff of 1 × 10<sup>−5</sup> [18]. To reveal the true extent of recombination events within RHS genes, we excluded from this analysis non-LTR retrotransposons and other protein families with which RHS are commonly associated. The presence of conserved RHS domains (pfam07999, PTZ00209, and TIGR01631) was also confirmed in other databases (CDD, Pfam, SMART, KOG, COG, PRK, and TIGR). The 139 RHS amino acid sequences were analyzed by the maximum likelihood method in RAxML v8.2.9 with automatic selection of the substitution model (PROTGAMMAAUTO). One thousand bootstrap replicates were run to assess the reliability of the groups, using a support cutoff of >75. Seventy-four RHS sequences fell into groups 1 to 10 with bootstrap values above the cutoff (indicated in colors), whereas three groups comprising the remaining 65 sequences, with bootstrap values below the cutoff (indicated in black), were designated as unclassified. Group size ranged from two RHS sequences in group 10 (light blue) to 15 sequences in group 3 (red) (Figure 2 and Table 1). The phylogenetic analysis showed that each RHS group is monophyletic. The results are also shown as a midpoint-rooted tree (Figure S5), in which all sequences are labeled with their respective TriTrypDB accession numbers [41].
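The group assignment described above amounts to a simple rule: a clade becomes a numbered group only if its bootstrap support exceeds 75, and every remaining sequence is left unclassified (74 + 65 = 139 sequences here). The Python sketch below illustrates that rule on a toy example. The `Clade` record, the `classify` function, and the sequence IDs are hypothetical, and the sketch assumes the candidate clades have already been extracted from the bootstrapped tree, which it does not do itself.

```python
from dataclasses import dataclass, field


# Hypothetical representation of a node in the bootstrapped ML tree:
# its bootstrap support and the sequence IDs (leaves) below it.
@dataclass
class Clade:
    support: float
    members: list = field(default_factory=list)


BOOTSTRAP_CUTOFF = 75  # a clade becomes a numbered group only if support > 75


def classify(clades, all_ids, cutoff=BOOTSTRAP_CUTOFF):
    """Number the well-supported clades as groups; all other sequences are unclassified.

    Candidate clades are processed in the order given; a clade that shares
    members with an already accepted group is skipped (simplifying assumption).
    """
    groups = {}
    assigned = set()
    for clade in clades:
        if clade.support > cutoff and not assigned.intersection(clade.members):
            groups[len(groups) + 1] = list(clade.members)
            assigned.update(clade.members)
    unclassified = [seq for seq in all_ids if seq not in assigned]
    return groups, unclassified


# Toy example with placeholder IDs (not real TriTrypDB accessions).
clades = [
    Clade(98, ["seqA", "seqB", "seqC"]),
    Clade(82, ["seqD", "seqE"]),
    Clade(40, ["seqF", "seqG"]),  # below the cutoff
]
all_ids = ["seqA", "seqB", "seqC", "seqD", "seqE", "seqF", "seqG", "seqH"]
groups, unclassified = classify(clades, all_ids)
print(groups)        # {1: ['seqA', 'seqB', 'seqC'], 2: ['seqD', 'seqE']}
print(unclassified)  # ['seqF', 'seqG', 'seqH']
```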

**Figure 2.** Phylogeny and classification of transcribed RHS sequences. Phylogenetic analysis was carried out using RAxML v8.2.9 with an automatic search for substitution models (PROTGAMMAAUTO) selected by the Akaike information criterion (AIC) (auto-prot = AIC), with 1000 bootstrap replicates. Groups 1–10 comprise RHS sequences with bootstrap support above the cutoff and are distinguished by color; RHS sequences with bootstrap values below the cutoff (unclassified groups) are shown in black.
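For reference, a RAxML v8 run matching the settings in the caption (PROTGAMMAAUTO, AIC model selection, 1000 bootstrap replicates) could be assembled as below. This is a sketch, not the command used in the study: the executable name (`raxmlHPC`; multithreaded builds such as `raxmlHPC-PTHREADS` are named differently and also require `-T`), the seed, the alignment file name, and the run name are assumptions, and the hypothetical `raxml_command` function only builds the argument list without running it.

```python
import shlex


def raxml_command(alignment, run_name, replicates=1000, seed=12345,
                  executable="raxmlHPC"):
    """Build (but do not run) a RAxML v8 command line for a rapid-bootstrap ML analysis."""
    return [
        executable,
        "-f", "a",              # rapid bootstrap analysis plus best-scoring ML tree search
        "-m", "PROTGAMMAAUTO",  # automatic protein substitution model selection, Gamma rates
        "--auto-prot=aic",      # choose the model by the Akaike information criterion
        "-p", str(seed),        # seed for parsimony starting trees (assumed value)
        "-x", str(seed),        # seed for rapid bootstrapping (assumed value)
        "-N", str(replicates),  # number of bootstrap replicates
        "-s", alignment,        # input alignment file (hypothetical name)
        "-n", run_name,         # suffix appended to the output file names
    ]


cmd = raxml_command("rhs_139.phy", "RHS_CLB")
print(shlex.join(cmd))
# raxmlHPC -f a -m PROTGAMMAAUTO --auto-prot=aic -p 12345 -x 12345 -N 1000 -s rhs_139.phy -n RHS_CLB
```

Passing the list to `subprocess.run` would execute it once RAxML is installed; keeping construction separate makes the parameters easy to inspect.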


**Table 1.** Distribution of the members of RHS groups across the chromosomes of clone CLB.



<sup>1</sup> TriTrypDB [41]. <sup>2</sup> CDS (coding DNA sequence), size in bp. <sup>3</sup> Translated peptide, size in amino acids (aa). <sup>4</sup> Direction of transcription. <sup>5</sup> RHS located in the subtelomeric regions of the chromosomes of clone CLB [45]. <sup>6</sup> Genomic coordinates on the in silico chromosomes of clone CLB (TcChr) [40]. \* The other allele at the same locus is a pseudogene.
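Footnotes 2 and 3 give CDS length in bp and peptide length in aa, while the molecular masses in the following paragraph are quoted in kDa. The relation can be approximated as in this Python sketch, which assumes that the CDS length includes the stop codon and uses the common rule of thumb of about 110 Da per residue; actual masses depend on amino acid composition, and the `cds_to_protein` helper and the input lengths are hypothetical.

```python
AVG_RESIDUE_DA = 110.0  # rule-of-thumb mean mass of one amino acid residue, in daltons


def cds_to_protein(cds_bp, includes_stop=True):
    """Return (length in aa, approximate mass in kDa) for a CDS of cds_bp nucleotides."""
    if cds_bp % 3 != 0:
        raise ValueError("CDS length should be a multiple of 3")
    aa = cds_bp // 3 - (1 if includes_stop else 0)  # the stop codon adds no residue
    return aa, round(aa * AVG_RESIDUE_DA / 1000, 1)


print(cds_to_protein(1653))  # (550, 60.5)
print(cds_to_protein(4503))  # (1500, 165.0)
```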

Table 1 provides detailed information on the RHS groups of the CLB genome, including chromosome mapping, genomic location (including subtelomeric regions), coding sequence size, and predicted protein size. Most transcribed RHS genes (70%) encode proteins of approximately 60 to 180 kDa, and the remainder encode peptides of 10 to 38 kDa. The RHS sequences selected for phylogenetic analysis were those assigned to CLB chromosomes (TcChr). Of the 74 RHS sequences, 58 have only one copy, located in either haplotype S or P, and are therefore hemizygous. Twenty-two of these hemizygous genes are located in the subtelomere, a polymorphic region susceptible to homologous recombination, including ectopic recombination [5,45,46].
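The hemizygosity criterion used above (a copy mapped to only one of the two haplotypes, P or S) reduces to a simple check. In this Python sketch the gene IDs and haplotype assignments are hypothetical, and it assumes that only functional copies are listed, so in the sketch a locus whose other allele is a pseudogene (marked \* in Table 1) counts as hemizygous only if the pseudogene is left out. The final line recomputes the proportion implied by the reported counts (58 of 74).

```python
def is_hemizygous(haplotypes):
    """True when a functional copy maps to exactly one of the two CLB haplotypes (P or S)."""
    return len(set(haplotypes) & {"P", "S"}) == 1


# Hypothetical loci: gene ID -> (haplotypes carrying a functional copy, subtelomeric?)
loci = {
    "geneA": ({"P"}, True),
    "geneB": ({"P", "S"}, False),
    "geneC": ({"S"}, False),
    "geneD": ({"S"}, True),
}
hemizygous = [g for g, (haps, _) in loci.items() if is_hemizygous(haps)]
subtelomeric = [g for g in hemizygous if loci[g][1]]
print(len(hemizygous), len(subtelomeric))  # 3 2

# Proportion implied by the counts reported for clone CLB: 58 hemizygous genes out of 74.
print(f"{58 / 74:.0%}")  # 78%
```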

Our results showed that hemizygous RHS genes are also found in interstitial chromosome regions where synteny is interrupted by a set of RHS sequences [47,48]. It has been proposed that the *T. cruzi* genome is organized into two compartments: a core compartment comprising conserved and hypothetical conserved genes, and a non-syntenic disruptive compartment enriched in repetitive sequences such as members of the TS, MASP, and mucin multigene families [3]. Other multigene families (GP63, DGF-1, and RHS) are dispersed across both compartments [3].

The subtelomeres of *T. cruzi* could be included in the disruptive compartment, since they are enriched in genes encoding surface proteins (TS, MASP, and DGF-1), retrotransposon hot spot (RHS) genes, retrotransposon elements, satellite DNA, and RNA-helicase and N-acetyltransferase genes [45,48–51]. Twenty-five chromosome ends of clone CLB (TcChr) are composed mostly of RHS genes and pseudogenes [45]. The disruptive compartment, including the subtelomeric regions, could act as a site of homologous recombination [2,3,5,26,28–30,32–35].

Members of the RHS groups are organized in multiple clusters at various genomic locations on different chromosomes, including the core and disruptive compartments and the subtelomeres (Table 1 and Figure 1). The distance between two contiguous RHS genes ranged from 2 to 50,000 bp and their identity from 55 to 98%, suggesting gene duplication by mitotic homologous recombination, as described in fungi [52,53]. Some rearrangements could be explained by unequal crossing-over between homologous chromatids (interhomolog crossover), leading to the loss of the tandem counterparts in one of the haplotypes. For example, the RHS genes of groups 1 and 7, located on chromosomes TcChr4-P and TcChr7-S, respectively, were mapped to only one haplotype, indicating that these genes were lost from the corresponding haplotype (Figure 3A,B).
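Distances between contiguous RHS genes can be derived from genomic coordinates such as those in Table 1. The Python sketch below computes the gap between two genes and chains neighboring genes on the same chromosome into clusters. The `Gene` record, the coordinates, and the use of 50,000 bp (the upper end of the reported range) as a clustering threshold are assumptions for illustration, not the criterion used in the study.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def intergenic_distance(a, b):
    """Number of bp separating two genes on the same chromosome (0 if they touch or overlap)."""
    left, right = sorted((a, b), key=lambda g: g.start)
    return max(0, right.start - left.end - 1)


def tandem_clusters(genes, max_gap=50_000):
    """Chain consecutive genes on the same chromosome whose gap is <= max_gap bp."""
    clusters = []
    for gene in sorted(genes, key=lambda g: (g.chrom, g.start)):
        last = clusters[-1][-1] if clusters else None
        if (last is not None and last.chrom == gene.chrom
                and intergenic_distance(last, gene) <= max_gap):
            clusters[-1].append(gene)
        else:
            clusters.append([gene])
    return clusters


# Placeholder coordinates, not real CLB data.
genes = [
    Gene("g1", "chrX", 1_000, 3_000),
    Gene("g2", "chrX", 3_003, 5_000),    # 2 bp after g1
    Gene("g3", "chrX", 60_000, 62_000),  # 54,999 bp after g2: starts a new cluster
    Gene("g4", "chrY", 100, 2_000),
]
print(intergenic_distance(genes[0], genes[1]))                  # 2
print([[g.gene_id for g in c] for c in tandem_clusters(genes)])  # [['g1', 'g2'], ['g3'], ['g4']]
```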

**Figure 3.** Gene duplication events in the RHS sequences of clone CLB. The figure shows the physical map of the chromosome regions involved in the recombination events. For clarity, only RHS sequences are shown. The direction of transcription is indicated by blue (sense) and red (antisense) arrows. (**A**,**B**) Groups 1 and 4: duplication of RHS genes by unequal crossing-over with loss of tandem counterparts in one of the haplotypes (TcChr4-S and TcChr7-P). (**C**) Group 6: duplication of RHS genes by unequal crossing-over with conservation of one of the RHS counterparts in the TcChr15-S haplotype. (**D**) Group 7: duplication followed by gene conversion between paralogous genes located in the TcChr16-P and TcChr16-S haplotypes (interlocus nonallelic gene conversion). The identity between homologous RHS proteins of the P and S haplotypes is indicated in the figure. The identity between paralogous RHS proteins ranged from 93 to 100%. The physical maps showing the position of RHS sequences were downloaded from the public genome database TriTrypDB [41].

The RHS genes of group 6 were mapped to chromosomes TcChr15-P and TcChr15-S; only the first gene of the cluster (TcCLB.511871.130) was present on the TcChr15-S haplotype, and the remaining genes were lost by unequal crossing-over between homologous chromatids (Figure 3C). The homologous RHS genes of TcChr15-P encode proteins with >93% identity to each other, and they share 84% identity with the paralogous RHS (TcCLB.511871.130) of the TcChr15-S haplotype. These results indicate that duplication gave rise to tandem RHS sequences that maintained the structure of the functional gene.
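Identity values such as the >93% and 84% above depend on how percent identity is defined. As one common convention, this Python sketch divides the number of identical aligned residues by the number of alignment columns in which at least one sequence has a residue; other tools divide by the shorter sequence length or exclude every gapped column, so its numbers need not match those reported here. The `percent_identity` function and the example sequences are hypothetical.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity of two equal-length aligned sequences ('-' marks a gap).

    Identical residues are divided by the number of columns in which at least
    one sequence has a residue (gap-gap columns are ignored).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aln_a, aln_b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)


print(percent_identity("MKTAYIAKQR", "MKTAYLAKQR"))  # 90.0 (one substitution)
print(percent_identity("MKTAY-AKQR", "MKTAYIAKQR"))  # 90.0 (one gap)
```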

The RHS genes of group 7 located on chromosomes TcChr16-P and TcChr16-S share 84–97% identity (Figure 3D), and this arrangement could be explained by gene duplication followed by gene conversion between nonallelic loci (interlocus nonallelic gene conversion), e.g., between the RHS genes TcCLB.507843.10 (TcChr16-S) and TcCLB.506809.5
