**2. Results**

#### *2.1. Occurrence of Enteric Parasites*

A total of 1093 children aged 3–14 years participated in the present study, of which 807 children were enrolled from 18 schools and resided in 66 neighbourhoods in 10 districts of the Zambézia province. The other 286 children were enrolled from six primary healthcare centres and a hospital clinic and resided in 37 different neighbourhoods in six districts. Overall, *G. duodenalis* was the most prevalent enteric parasite found [41.7%, 95% confidence interval (CI): 38.8–44.7%], followed by *Blastocystis* sp. (14.1%, 95% CI: 12.1–16.3%), and *Cryptosporidium* spp. (1.6%, 95% CI: 0.9–2.5%). The prevalence rates of these pathogens in each participating school and healthcare centre are summarized in Table 1. Estimates did not consider the clustered nature of the data, as this task was thoroughly conducted elsewhere [27].
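The prevalence estimates above are simple proportions with 95% confidence intervals. The text does not state which interval method was used, so the following minimal sketch assumes a Wilson score interval; the function name `wilson_interval` is hypothetical, and the count of 456 positive children is the one reported for *G. duodenalis* in Section 2.2.

```python
import math


def wilson_interval(positives: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%).

    Assumption: the source does not name its CI method; Wilson is used
    here only as a common choice for prevalence data.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = positives / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# 456 G. duodenalis-positive children out of 1093 participants.
low, high = wilson_interval(456, 1093)
print(f"{456 / 1093:.1%} (95% CI: {low:.1%} to {high:.1%})")
```

Like the reported estimates, this calculation ignores clustering by school or healthcare centre, so it should be read as an unadjusted interval.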

**Table 1.** Molecular-based prevalence rates of *Giardia duodenalis*, *Cryptosporidium* spp., and *Blastocystis* sp. in the surveyed paediatric population by school or medical centre of origin in the Zambézia province, Mozambique.


#### *2.2. Prevalence and Molecular Characterization of G. duodenalis*

A total of 456 DNA isolates tested positive for *G. duodenalis* by real-time polymerase chain reaction (qPCR; 336 and 120 in asymptomatic and symptomatic children, respectively). Cycle threshold (Ct) values had a median of 31.6 (range: 18.0–42.1) in asymptomatic children and of 31.2 (range: 19.7–41.4) in symptomatic children. Overall, 45.8% (209/456) had qPCR Ct values ≥32, and 39.7% (181/456) had qPCR Ct values <30. Based on previous molecular studies conducted by our research team using this very same method in human populations from other African countries including Mozambique [23,28,29], only DNA isolates with qPCR Ct values <32 (*n* = 247) were assessed for genotyping and sub-genotyping purposes in order to optimise available resources.
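The Ct-based selection rule described above can be sketched as follows. The Ct values in the example are hypothetical and are not study data; the function names `ct_bin` and `select_for_genotyping` are likewise illustrative. Only the cut-offs (30 and 32) come from the text.

```python
from collections import Counter


def ct_bin(ct: float) -> str:
    """Assign a qPCR Ct value to one of the ranges used in the text."""
    if ct < 30:
        return "<30"
    if ct < 32:
        return "30 to <32"
    return ">=32"


def select_for_genotyping(ct_values, cutoff: float = 32.0):
    """Keep only isolates whose Ct value is strictly below the cutoff."""
    return [ct for ct in ct_values if ct < cutoff]


# Hypothetical Ct values, for illustration only.
example_cts = [18.0, 29.9, 30.0, 31.6, 32.0, 41.4]
bins = Counter(ct_bin(ct) for ct in example_cts)
selected = select_for_genotyping(example_cts)
print(dict(bins), selected)

# The reported counts are internally consistent: 456 positives minus
# 209 isolates with Ct >= 32 leaves the 247 isolates taken forward.
assert 456 - 209 == 247
```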

Assemblage/sub-assemblage assignment was conducted by direct comparison of the sequencing results obtained at the three loci (*gdh*, *bg*, and *tpi*) investigated. Sequences presenting double peak positions that could not be assigned unequivocally to a given assemblage/sub-assemblage were reported as ambiguous sequences. Out of the 247 DNA isolates investigated, 15.0% (37/247), 10.1% (25/247), and 6.1% (15/247) yielded amplicons at the *gdh*, *bg*, and *tpi* loci, respectively (Table 2). Overall, 17.4% (43/247) were amplified at one or more loci, whereas multi-locus genotyping data at all three loci were available for 4.0% (10/247) of them. Most (88.4%, 38/43) of the isolates successfully amplified at any of the three markers assessed had qPCR Ct values <30. Sequence analyses revealed the presence of assemblages A (7.0%, 3/43) and B (88.4%, 38/43). Two additional isolates (4.6%, 2/43) corresponded to mixed A+B infections. No infections caused by host-restricted canine (C, D), feline (F), or ruminant (E) assemblages were detected. All genotyped isolates except one were obtained from asymptomatic children.

**Table 2.** Multilocus genotyping results of the *G. duodenalis*-positive children (*n* = 43) successfully genotyped at any of the three loci investigated in the Zambézia province, Mozambique.


<sup>1</sup> Symptomatic child, *bg*: β-giardin, *gdh*: glutamate dehydrogenase, qPCR: real-time PCR, *tpi*: triose phosphate isomerase.

All three A sequences were assigned to the sub-assemblage AII of the parasite. Out of the 40 B sequences, 12.5% (5/40) were identified as sub-assemblage BIII, 20.0% (8/40) as sub-assemblage BIV, 52.5% (21/40) as ambiguous BIII/BIV sequences, and the remaining 15.0% (6/40) were only genotyped at the assemblage level.

The diversity, frequency, and main features of the *G. duodenalis* sequences generated at the *gdh* locus are shown in Table S1. Briefly, all four AII sequences were identical to reference sequence L40510. In contrast, a much higher level of genetic diversity was observed within the 34 sequences assigned to assemblage B at this locus. Indeed, the five sequences identified as BIII differed by 1–9 single nucleotide polymorphisms (SNPs) from reference sequence AF069059, most of them associated with heterozygous C/T peaks at positions 99, 147, 150, 309, and 336. The nine sequences unambiguously assigned to BIV differed by 2–8 SNPs from reference sequence L40508. Six of them presented nucleotide substitutions (mainly C↔T transitions) at positions 183, 387, 396, and 423, but no ambiguous positions in the form of double peaks. The SNPs present in the three remaining BIV sequences combined mutations and heterozygous positions in different proportions. Remarkably, virtually all ambiguous BIII/BIV sequences differed from one another and by 5–15 SNPs from reference sequence L40508. Most of these SNPs involved heterozygous C/T (and, to a lesser extent, A/G) peaks. In contrast, SNPs associated with C↔T or A↔G transition mutations were rare or non-existent. Some of these ambiguous BIII/BIV sequences presented clear double peaks at defined positions specific for BIII (e.g., 99, 147, 150, 309, and 336) and BIV (e.g., 183, 387, 396, and 423) sequences, suggesting the occurrence (at an unknown rate) of true BIII+BIV intra-assemblage mixed infections. Several heterozygous positions within BIII and BIII/BIV (particularly the latter) sequences were potentially associated with amino acid changes in the polypeptide chain.
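The distinction drawn above between true substitutions (transitions or transversions) and heterozygous double-peak positions can be illustrated with a short sketch. It assumes that query and reference are already aligned without gaps and that double peaks are recorded as IUPAC two-base ambiguity codes (e.g., Y for C/T, R for A/G); both are assumptions, as are the function names and the toy sequences, none of which come from the study.

```python
# IUPAC two-base ambiguity codes, assumed here to encode double peaks.
IUPAC_TWO_BASE = {
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}
BASES = {"A", "C", "G", "T"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_site(ref_base: str, obs_base: str) -> str:
    """Label one aligned position relative to the reference base."""
    ref, obs = ref_base.upper(), obs_base.upper()
    if obs == ref:
        return "match"
    if obs in IUPAC_TWO_BASE:
        return "heterozygous"
    if ref not in BASES or obs not in BASES:
        return "other"  # gaps, N, or codes not handled in this sketch
    pair = {ref, obs}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def compare_to_reference(ref_seq: str, query_seq: str, start: int = 1):
    """List (position, ref, observed, kind) for every non-matching site."""
    if len(ref_seq) != len(query_seq):
        raise ValueError("sequences must be aligned to the same length")
    differences = []
    for offset, (ref, obs) in enumerate(zip(ref_seq, query_seq)):
        kind = classify_site(ref, obs)
        if kind != "match":
            differences.append((start + offset, ref.upper(), obs.upper(), kind))
    return differences


# Toy sequences for illustration only (not study data).
diffs = compare_to_reference("ACGTACGT", "ACGYATGA")
for diff in diffs:
    print(diff)
```

The `start` argument lets reported positions follow the numbering of a chosen reference sequence rather than the start of the trimmed alignment.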

The diversity, frequency, and main features of the 25 *G. duodenalis* sequences generated at the *bg* locus are summarized in Table S2. The only sequence assigned to AII was identical to reference sequence AY072723. The other 24 sequences, all belonging to the assemblage B of *G. duodenalis*, presented a comparatively lower degree of genetic diversity than their counterparts at the *gdh* locus. Of them, four sequences were identical to reference sequence AY072727, whereas the remaining 20 sequences differed from it by 1–5 SNPs. Variations involving transitional C↔T or A↔G mutations and double peaks tended to accumulate at positions 183, 309, 519, and 565 of AY072727; some of them (including a transversion C/A mutation) were involved in amino acid substitutions at the protein level.

The diversity, frequency, and main features of the 15 *G. duodenalis* sequences generated at the *tpi* locus are summarized in Table S3. Of the three sequences identified as AII, two varied by 1–2 SNPs (including a transversion C/G mutation at position 287) from reference sequence U57897. The third AII sequence lacked sufficient quality to accurately determine the presence of potential SNPs. The 10 sequences characterised as BIII differed by 1–8 SNPs from reference sequence AF069561. Six of them included only transitional C↔T or A↔G mutations, whereas the remaining four had several heterozygous positions. Detected SNPs tended to accumulate at positions 34, 108, and 141 of AF069561, some of them involved in amino acid substitutions in the polypeptide chain. Two isolates corresponded to ambiguous BIII/BIV sequences differing by seven SNPs from reference sequence AF069560. Most of these SNPs were the result of transitional C/T or A/G mutations, one of them (T57C) involving an amino acid change at the protein level. No transversion mutations were detected within sequences generated at the *tpi* locus.

The evolutionary relationships among the *G. duodenalis* sequences generated at the *gdh* locus in the present study are shown in Figure 1. Sequences of human origin generated by our research team in previous studies conducted in geographical areas of high (Ethiopia, Angola, Brazil, and Iran) and low (Spain) endemicity were also included in the analysis for comparative purposes. Assemblage A sequences grouped together in well-defined clusters with appropriate reference sequences. Although assemblage B sequences also formed a well-supported clade (88% bootstrap support), sub-assemblage BIII and BIV sequences could not be segregated into independent clusters. Phylogenetic trees generated at the *bg* (Figure S1) and *tpi* (Figure S2) loci seem to corroborate this finding.

**Figure 1.** Phylogenetic relationships among *Giardia duodenalis* assemblages and sub-assemblages identified in infected symptomatic and asymptomatic children in the Zambézia province, Mozambique. The analysis was conducted using the neighbor-joining method on a 412-bp fragment (corresponding to positions 79–490 of reference sequence L40508) of the *gdh* gene. Genetic distances were calculated using the Kimura two-parameter model. Green filled squares represent sequences generated in the present study. Purple filled dots represent reference sequences. Bootstrap values lower than 50% are not displayed. *Giardia ardeae* was used as the outgroup taxon to root the tree.
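For readers unfamiliar with the Kimura two-parameter model named in the caption, the pairwise distance is d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of sites differing by a transition and by a transversion, respectively. The sketch below is a plain illustration of that formula, not the software used for the figure; the function name and the toy sequences are hypothetical, and sites containing anything other than A, C, G, or T are simply skipped, which is an assumption of this sketch.

```python
import math

BASES = {"A", "C", "G", "T"}
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES:
            continue  # skip gaps and ambiguity codes (sketch assumption)
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    # math.log raises ValueError if the sequences are too divergent
    # for the model (non-positive argument).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Toy 10-bp sequences differing by one A/G transition (not study data).
d = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
print(round(d, 4))
```

Because the model corrects for multiple substitutions at the same site, the distance is slightly larger than the raw proportion of differing sites.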
