*2.4. Intra-Assemblage Genetic Diversity*

Tables 5–7 and Table S5 show the genetic diversity of the representative partial *gdh*, *bg*, and *tpi* sequences generated in the present study. For each sequence, these tables report the stretch, the single nucleotide polymorphisms (SNPs) detected, and the GenBank accession number. Assemblages and sub-assemblages were assigned by direct comparison of the sequencing results obtained at the three loci investigated. Sequences with double-peak positions that could not be unequivocally assigned to a given assemblage/sub-assemblage were reported as ambiguous sequences.

A total of 63 sequences were successfully characterised at the *gdh* locus (Table 5). All 17 assemblage A sequences were unequivocally identified as sub-assemblage AII. Of these, seven were 100% identical to reference sequence L40510, and the remaining 10 differed from it by 1–6 SNPs. BIII sequences showed a high degree of genetic diversity: 21/24 of the sequences assigned to this sub-assemblage corresponded to distinct genotypes (genetic variants) of the parasite. These sequences differed by 4–13 SNPs from reference sequence AF069059, most of them associated with ambiguous (double-peak) positions. Similarly, most (20/22) of the sequences identified as ambiguous BIII/BIV differed from one another, and they differed by 9–17 SNPs from reference sequence L40508. Virtually all SNPs detected in BIII/BIV sequences corresponded to double peaks at single nucleotide positions.
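The SNP counts above can be illustrated with a minimal Python sketch, which is not the procedure used in the study. It assumes the query has already been aligned to the reference over the same stretch, and it treats an IUPAC ambiguity code in the query as a double-peak position. The function name `list_snps` and the toy sequences are hypothetical.

```python
# IUPAC nucleotide codes. M, R, Y, and K are the ambiguity codes listed in the
# table footnotes; S, W, and N are standard codes added for completeness.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "Y": "CT", "K": "GT",
    "S": "CG", "W": "AT", "N": "ACGT",
}


def list_snps(reference, query, offset=1):
    """Return (ref_base, position, query_base, is_double_peak) per differing site.

    Both sequences must be pre-aligned and of equal length (assumption).
    `offset` is the reference coordinate of the first aligned base.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be pre-aligned to equal length")
    snps = []
    for i, (ref_base, query_base) in enumerate(zip(reference.upper(), query.upper())):
        if ref_base == query_base:
            continue
        # More than one possible base at the site means an ambiguous call.
        is_double_peak = len(IUPAC.get(query_base, "")) > 1
        snps.append((ref_base, i + offset, query_base, is_double_peak))
    return snps


# Toy example (made-up sequences, not data from the study).
example = list_snps("ACGTACGT", "ACGYACAT")
print(len(example), example)
```

In this toy case the query differs from the reference at two positions, one of which (T4Y) is an ambiguous call.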

At the *bg* locus, a total of 55 sequences were fully characterised (Table 6). Of the 14 assemblage A sequences, two belonged to AII and five to AIII. All AII and AIII sequences were identical to reference sequences AY072723 and AY072724, respectively. Five sequences were considered mixed AII + AIII infections based on the presence of two double-peak positions (C415Y and T423Y), taking sequence AY072723 as reference. Two additional sequences corresponded to AII + B and AIII + B mixed infections, differing by 32 and 38 SNPs from reference sequence AY072727, respectively. With one exception, all the detected SNPs corresponded to clear double-peak positions. Compared to the *gdh* locus, a lower (but still substantial) degree of genetic variability was observed within the 41 sequences assigned to assemblage B at the *bg* locus. All of them differed by 1–6 SNPs from reference sequence AY072727. The most frequently detected genotype was a genetic variant carrying two transitions, C165T and A183G.
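The mutation notation used above (reference base, position, observed base or ambiguity code) can be read as in the following sketch. It is an illustrative helper only; the name `classify_mutation` is hypothetical, and the ambiguity codes are limited to those listed in the table footnotes (K, M, R, Y).

```python
import re

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
# Ambiguity codes given in the table footnotes.
AMBIGUOUS = {"K": "GT", "M": "AC", "R": "AG", "Y": "CT"}


def classify_mutation(mutation):
    """Parse a label such as 'C165T' and return (ref, position, alt, kind)."""
    match = re.fullmatch(r"([ACGT])(\d+)([ACGTKMRY])", mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {mutation!r}")
    ref, position, alt = match.group(1), int(match.group(2)), match.group(3)
    if ref == alt:
        raise ValueError(f"no change encoded in {mutation!r}")
    if alt in AMBIGUOUS:
        kind = "double peak"      # two bases called at the same position
    elif {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        kind = "transition"       # purine<->purine or pyrimidine<->pyrimidine
    else:
        kind = "transversion"     # purine<->pyrimidine
    return ref, position, alt, kind


for label in ("C165T", "A183G", "C415Y", "T423Y"):
    print(label, classify_mutation(label))
```

Applied to the labels in the text, C165T and A183G are classified as transitions, whereas C415Y and T423Y are flagged as double-peak positions.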

A total of 55 sequences were fully characterised at the *tpi* locus (Table 7). Within assemblage A, 14 sequences were assigned to sub-assemblage AII. Of these, eight showed 100% identity with reference sequence U57897, whereas the remaining six differed from it by 1–2 SNPs. Two additional sequences were identified as AII + BIII and presented 94–95 SNPs when aligned with reference sequence U57897. Of the 25 sequences assigned to BIII, seven showed 100% identity with reference sequence AF069561. The remaining 18 sequences grouped into 16 distinct genotypes that differed by 1–6 SNPs from reference sequence AF069561. Only a single sequence was confirmed as BIV, differing by 3 SNPs from reference sequence AF069560. Finally, virtually all (12/13) sequences with an ambiguous BIII/BIV result differed from one another, and they differed from reference sequence AF069560 by 7–11 SNPs. As in the case of the BIII/BIV sequences identified at the *gdh* locus, most of the SNPs identified at the *tpi* locus were associated with ambiguous nucleotide positions.
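Statements such as "18 sequences grouped into 16 distinct genotypes" amount to grouping sequences by their SNP profile relative to a common reference. A minimal sketch of that tally follows, assuming each sequence is summarised as a collection of mutation labels; the function name `genotype_counts`, the sample names, and the profiles are all made up for illustration.

```python
from collections import Counter


def genotype_counts(snp_profiles):
    """Count how many samples share each SNP profile.

    `snp_profiles` maps a sample name to an iterable of mutation labels
    relative to one reference; label order within a profile is ignored.
    """
    return Counter(frozenset(profile) for profile in snp_profiles.values())


# Made-up profiles, not data from the study.
profiles = {
    "sample_1": [],                      # identical to the reference
    "sample_2": [],
    "sample_3": ["C165T", "A183G"],
    "sample_4": ["A183G", "C165T"],      # same genotype as sample_3
    "sample_5": ["C165T", "A183G"],
    "sample_6": ["C39T"],
}

counts = genotype_counts(profiles)
print("distinct genotypes:", len(counts))
print("most frequent:", counts.most_common(1)[0])
```

Here the six toy samples collapse into three genotypes, the most frequent being the one shared by three samples.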




*Pathogens* **2021**, *10*, 206



M, A/C; R, A/G; Y, C/T.

**Table 6.** Diversity, frequency, and main molecular features of *G. duodenalis* sequences at the *bg* locus generated in the present study. GenBank accession numbers are provided.


K, G/T; M, A/C; R, A/G; Y, C/T.




Figure 1 shows the phylogenetic tree obtained for the *gdh* gene by maximum parsimony and Bayesian methods. All *G. duodenalis* sequences clustered together as a monophyletic group containing distinct, well-supported clades (100% bootstrap support and 1.0 posterior probability). Two major branches were formed and included all (A–F) *G. duodenalis* assemblages. The sequences from indigenous people from the Brazilian Amazon clustered in the branches for assemblage A (97% bootstrap support and 1.0 posterior probability) and assemblage B (100% bootstrap support and 1.0 posterior probability). Within assemblage B, the sequences obtained in this study clustered with the sub-assemblage BIII and BIV reference strains. Similar phylogenetic trees for the *bg* and *tpi* sequences generated in the present study are shown in Figures S1 and S2, respectively.

**Figure 1.** Maximum parsimony phylogenetic tree based on *gdh* sequences of *G. duodenalis*. Numbers on nodes indicate the bootstrap/posterior probability values. Black filled circles represent sequences generated in the present study. GenBank accession numbers for all sequences used in the phylogenetic analysis are embedded in the tree. *Giardia muris* was used as the outgroup.
