### 4.7. Statistical Modeling

Data were analyzed using R version 4.0.5 [55]. Qualitative variables were compared using the chi-squared test or Fisher's exact test and described using odds ratios (ORs) with 95% confidence intervals (95% CIs).
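The 2 × 2 comparisons described above can be illustrated with a small, stdlib-only Python sketch. It assumes a table laid out as `[[a, b], [c, d]]` (rows: exposed/unexposed; columns: outcome present/absent) and computes the sample odds ratio with a Woolf (log-scale) 95% CI and a two-sided Fisher exact p-value. The function names are hypothetical, and the text does not state which OR/CI estimator was used in R. Note that R's `fisher.test` reports a conditional maximum-likelihood OR, which can differ from the sample OR computed here.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Sample odds ratio and Woolf (log) 95% CI for the 2x2 table [[a, b], [c, d]]."""
    if min(a, b, c, d) == 0:
        # The log-scale standard error is undefined when any cell is zero.
        raise ValueError("zero cell: Woolf interval undefined")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value: sum of the hypergeometric probabilities of all
    tables with the observed margins that are no more probable than the observed one."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):  # probability of x in the top-left cell, margins fixed
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo_x, hi_x = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance so tables tied with the observed one are counted.
    return sum(prob(x) for x in range(lo_x, hi_x + 1) if prob(x) <= p_obs * (1 + 1e-7))
```

For example, `odds_ratio_ci(10, 20, 5, 40)` gives an OR of 4.0 with a CI that is symmetric around log(4) on the log scale.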

Only participants with complete data records were included in the final analysis. A new diarrhea variable was derived from the reported stool consistency and the number of bowel movements (depositions) per day, following the World Health Organization's definition of diarrhea: a case of acute diarrhea was defined as a person with more than three liquid stools per day over a period of less than 2 weeks [56]. Prevalence estimates with 95% CIs for single infections and coinfections in the study population were calculated using the epiR package version 0.5-10 [57].
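The case definition and the prevalence estimates can be sketched as follows. The argument names are hypothetical, and the Wilson score interval is an assumption: the text does not say which confidence-interval method was selected in epiR.

```python
import math

def is_acute_diarrhea(liquid_stools_per_day: int, duration_days: float) -> bool:
    # Case definition as stated in the text: more than three liquid stools per day,
    # lasting less than 2 weeks (14 days). Argument names are hypothetical.
    return liquid_stools_per_day > 3 and duration_days < 14

def wilson_ci(cases: int, n: int, z: float = 1.959964):
    """Point prevalence and Wilson score 95% CI (assumed interval method)."""
    if n <= 0 or not 0 <= cases <= n:
        raise ValueError("need n > 0 and 0 <= cases <= n")
    p = cases / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return p, max(0.0, centre - half), min(1.0, centre + half)
```

The Wilson interval is used here because, unlike the simple Wald interval, it stays within [0, 1] and behaves sensibly for rare infections with few or zero cases.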

#### 4.7.1. Co-Occurrence of Enteric Pathogens

Null model analysis was used to determine whether enteric protozoan coinfections were positively associated, negatively associated, or random. Data were organized as a 4 × 507 (rows × columns) presence-absence matrix in which each row represented a protozoan species and each column a study participant; "1" indicated that the species was present in that host and "0" that it was absent.
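A minimal sketch of how such a matrix can be assembled, assuming (hypothetically) that each participant's results are stored as a set of detected species names. Rows follow the species order used elsewhere in this section and columns follow participant order, so 507 participants yield a 4 × 507 matrix.

```python
SPECIES = ["Blastocystis sp.", "G. duodenalis", "E. histolytica", "Cryptosporidium spp."]

def presence_absence_matrix(detections, species=SPECIES):
    """Rows = species, columns = participants; 1 if the species was detected in that host."""
    return [[1 if sp in found else 0 for found in detections] for sp in species]

def count_coinfections(matrix):
    """Number of participants (columns) harbouring two or more species."""
    return sum(1 for column in zip(*matrix) if sum(column) >= 2)
```

For instance, three participants with `{"G. duodenalis"}`, no detections, and `{"Blastocystis sp.", "G. duodenalis"}` give a 4 × 3 matrix with one coinfected column.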

The C-score (checkerboard score) was the index used to characterize co-occurrence patterns. Null matrices were generated with the fixed-row, equiprobable-column algorithm [58], which keeps each species' observed number of infected hosts fixed while treating all participants as equally likely to harbor it. The observed C-score was compared with the C-scores of 5000 null matrices generated by Monte Carlo simulation; an observed C-score larger than expected indicates segregation (fewer coinfections than expected by chance), whereas a smaller one indicates aggregation (more coinfections than expected). To compare the degree of co-occurrence across datasets, a standardized effect size (SES) was also calculated, which measures the number of standard deviations by which the observed C-score lies above or below the mean C-score of the simulated matrices. The package "EcoSimR" version 0.1.0 was used to carry out the analysis [59].
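The null model procedure can be sketched in plain Python. This is an illustrative re-implementation, not EcoSimR's code: the C-score is the mean number of checkerboard units over all species pairs, each null matrix keeps the row totals and places them in uniformly chosen columns (fixed-equiprobable), and SES = (observed − mean of simulated) / SD of simulated. The function names and the one-tailed p-value convention are assumptions.

```python
import itertools
import random
import statistics

def c_score(m):
    """Mean checkerboard units over all species (row) pairs:
    (r_i - S_ij) * (r_j - S_ij), with r = row totals and S = shared hosts."""
    pairs = list(itertools.combinations(range(len(m)), 2))
    total = 0
    for i, j in pairs:
        shared = sum(a * b for a, b in zip(m[i], m[j]))
        total += (sum(m[i]) - shared) * (sum(m[j]) - shared)
    return total / len(pairs)

def fixed_equiprobable(m, rng):
    """One null matrix: each row keeps its total, placed in columns drawn uniformly
    without replacement (column totals are free to vary)."""
    n_cols = len(m[0])
    null = []
    for row in m:
        chosen = set(rng.sample(range(n_cols), sum(row)))
        null.append([1 if col in chosen else 0 for col in range(n_cols)])
    return null

def null_model_test(m, n_sim=5000, seed=1):
    """Observed C-score, SES, and Monte Carlo tail probabilities."""
    rng = random.Random(seed)
    observed = c_score(m)
    sims = [c_score(fixed_equiprobable(m, rng)) for _ in range(n_sim)]
    sd = statistics.stdev(sims)
    ses = (observed - statistics.mean(sims)) / sd if sd > 0 else float("nan")
    p_lower = sum(s <= observed for s in sims) / n_sim  # small -> aggregation
    p_upper = sum(s >= observed for s in sims) / n_sim  # small -> segregation
    return observed, ses, p_lower, p_upper
```

A perfectly segregated toy matrix (two species never sharing a host) gives the maximum C-score, a positive SES, and a small upper-tail probability.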

#### 4.7.2. Assessing the Impact of Coinfection with Enteric Pathogens on Diarrhea Severity

Partial least squares (PLS) regression was used to assess the impact of coinfection with enteric protozoa on the development of clinical symptoms. This technique was selected because it offers several advantages over other regression methods: it is the least restrictive of the multivariate techniques for exploring complex ecological patterns [60], including the impact of coinfections on host health [23], and it is distribution-free and well suited to handling multicollinearity [61]. In our analysis, we defined an explanatory block and a response block. The explanatory block (PLS X component) was defined by a presence-absence matrix representing the enteric protozoan community (*Blastocystis* sp., *G. duodenalis*, *E. histolytica*, and *Cryptosporidium* spp.). In addition, because the clinical presentation of diarrheal disease varies with age, as mentioned previously, age in years was included as a covariate in the explanatory block. The response block (PLS Y component) comprised the main symptoms reported to be associated with infection by these protist species: abdominal distension, lack of appetite, itchy skin, perianal pruritus, constipation, nausea, abdominal pain, and acute diarrhea.
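To make the link between the X block and the Y block concrete, the following pure-Python sketch estimates the first PLS component with the NIPALS algorithm on standardized blocks. It is not the plspm implementation, which fits a PLS path model and may scale and normalize weights differently, so its numbers should not be expected to match the reported results. The column-list input format and function names are assumptions. In this study's setting, the X columns would be the four species indicators plus age, and the Y columns the symptom indicators.

```python
import math

def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def standardize(cols):
    """Z-score each column (columns given as lists); constant columns become zeros."""
    out = []
    for col in cols:
        n = len(col)
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / (n - 1))
        out.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return out

def pls_first_component(x_cols, y_cols, tol=1e-10, max_iter=500):
    """First PLS2 component via NIPALS.

    Returns (w, c, t, r2_y): X weights, Y loadings, X scores, and the share of
    standardized Y variance reproduced by the rank-one fit t c'."""
    X, Y = standardize(x_cols), standardize(y_cols)
    n = len(X[0])
    u = max(Y, key=lambda col: _dot(col, col))  # start from the most variable Y column
    t_old = None
    for _ in range(max_iter):
        w = [_dot(xc, u) for xc in X]                            # w = X'u
        norm = math.sqrt(_dot(w, w))
        if norm == 0:
            raise ValueError("X block is uncorrelated with the Y scores")
        w = [v / norm for v in w]                                # ||w|| = 1
        t = [sum(w[k] * X[k][i] for k in range(len(X))) for i in range(n)]  # t = Xw
        tt = _dot(t, t)
        c = [_dot(yc, t) / tt for yc in Y]                       # c = Y't / t't
        cc = _dot(c, c)
        u = [sum(c[k] * Y[k][i] for k in range(len(Y))) / cc for i in range(n)]  # u = Yc / c'c
        if t_old is not None and math.dist(t, t_old) < tol:
            break
        t_old = t
    ss_y = sum(v * v for yc in Y for v in yc)
    ss_res = sum((Y[k][i] - t[i] * c[k]) ** 2 for k in range(len(Y)) for i in range(n))
    return w, c, t, 1 - ss_res / ss_y
```

The last return value is one way to express the share of response-block variability explained by the explanatory block, the quantity the text reports as a percentage.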

The significance of the PLS models was assessed using the Stone–Geisser Q² statistic, a cross-validated redundancy measure designed to evaluate the predictive relevance of the exogenous variables. Q² values greater than 0.0975 (i.e., 1 − 0.95²) were taken to indicate that the predictors are significant, whereas values below this threshold indicate no significance. Finally, the percentage of observed MNLS variability explained by the enteric pathogen block was also estimated. The "plspm" package version 0.4.9 was used to perform the analysis [62].
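The cross-validation logic behind Q² can be sketched generically as Q² = 1 − PRESS/SS, where PRESS sums the squared leave-one-out prediction errors and SS is the total sum of squares of the response. This sketch assumes leave-one-out splitting and uses a simple least-squares line as a stand-in model purely for illustration. plspm computes its own statistic within the PLS framework, and the `fit` interface here is hypothetical.

```python
def loo_q2(x, y, fit):
    """Leave-one-out Stone-Geisser Q^2 = 1 - PRESS / SS for one response.
    `fit(x_train, y_train)` must return a predict(x_new) callable (hypothetical interface)."""
    n = len(y)
    mean_y = sum(y) / n
    ss = sum((v - mean_y) ** 2 for v in y)
    press = 0.0
    for i in range(n):
        predict = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])  # refit without case i
        press += (y[i] - predict(x[i])) ** 2                 # error on the held-out case
    return 1 - press / ss

def ols_fit(x, y):
    """Simple least-squares line; a stand-in model for illustration only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (v - my) for a, v in zip(x, y)) / sxx
    a0 = my - b * mx
    return lambda x_new: a0 + b * x_new

Q2_THRESHOLD = 1 - 0.95 ** 2  # 0.0975, the cut-off quoted in the text
```

Perfectly predictable data give Q² = 1, while a response unrelated to the predictor typically yields Q² below 0.0975, often negative, because held-out predictions are worse than simply using the mean.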
