*4.3. Statistical Analysis*

To compare the functional, activation, and memory profiles of *Mtb*-specific CD4+ T-cells between antigen conditions or patient groups, we used the Wilcoxon rank-sum test or the Kruskal–Wallis test followed by Dunn's multiple comparisons test, as appropriate. Differences in these profiles between baseline and two months were analyzed using the Wilcoxon signed-rank test. Cytokine combinations were assessed using SPICE (simplified presentation of incredibly complex evaluations) software, and data are reported after background subtraction [33]. Receiver operating characteristic (ROC) curve analysis was used for single-marker analysis.
COMPASS (combinatorial polyfunctionality analysis of antigen-specific T-cells) and MIMOSA (mixture models for single cell assays) analyses were conducted in R (v1.1.463, The R Foundation for Statistical Computing, Vienna, Austria) using the packages as described previously [19]. To measure the expression of phenotypic markers reliably on antigen-specific cells, we first determined whether the response to antigen stimulation was significantly higher than the non-specific cytokine production detectable in unstimulated samples. To do this, we applied MIMOSA and defined antigen responders as participants with a false discovery rate (FDR) < 0.05.
COMPASS uses a Bayesian hierarchical mixture model to identify antigen-specific changes simultaneously across all possible T-cell subsets [19]. This approach has higher sensitivity and specificity for detecting true responses than alternatives such as a simple log fold change [19]. Posterior probabilities estimated by a Markov chain Monte Carlo algorithm are summarized into two scores, the functionality score and the polyfunctionality score, which can be correlated with any outcome of interest.
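For readers unfamiliar with single-marker ROC analysis, the sketch below shows what the curve and its area summarize. It is an illustration only, not the analysis code used here (the statistics were run in STATA and GraphPad Prism). It assumes that higher marker values indicate the "case" group, and the function names and marker values are hypothetical. The area under the curve equals the probability that a randomly chosen case has a higher marker value than a randomly chosen control, with ties counted as one half.

```python
from itertools import product


def roc_points(cases, controls):
    """(FPR, TPR) pairs, calling a sample 'positive' when marker >= threshold.

    Assumption: higher marker values indicate a case.
    """
    thresholds = sorted(set(cases) | set(controls), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(x >= t for x in cases) / len(cases)
        fpr = sum(y >= t for y in controls) / len(controls)
        points.append((fpr, tpr))
    return points


def trapezoid_auc(points):
    """Area under the piecewise-linear empirical ROC curve (trapezoidal rule)."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def roc_auc(cases, controls):
    """AUC as P(case > control) + 0.5 * P(tie), i.e. Mann-Whitney U / (n1 * n2)."""
    score = 0.0
    for x, y in product(cases, controls):
        if x > y:
            score += 1.0
        elif x == y:
            score += 0.5
    return score / (len(cases) * len(controls))


# Hypothetical marker values (e.g. % of Mtb-specific cells expressing a marker).
cases = [3, 4, 5]
controls = [1, 2, 3]
print(roc_auc(cases, controls), trapezoid_auc(roc_points(cases, controls)))
```

Both routes give the same area for the empirical curve, which is why the AUC is often read as a rank-based measure of how well one marker separates two groups.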
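The baseline versus two-month comparison relies on the Wilcoxon signed-rank test. The following is a minimal sketch of that test using the large-sample normal approximation. Its conventions are assumptions and may differ from STATA or Prism: zero differences are dropped, tied absolute differences get average ranks with a tie-corrected variance, and no continuity correction is applied. For small paired samples an exact p-value is generally preferable. The function name and data are hypothetical.

```python
import math


def wilcoxon_signed_rank(before, after):
    """Two-sided Wilcoxon signed-rank test, normal approximation.

    Returns (W+, z, p). Zero differences are dropped; tied |differences|
    receive average ranks and the variance is tie-corrected.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank |d| from smallest to largest, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # W+ is the sum of ranks of the positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p = 2 * (1 - Phi(|z|))
    return w_plus, z, p


# Hypothetical paired values for one marker at baseline and two months.
print(wilcoxon_signed_rank([1, 2, 3, 4, 5, 6, 7, 8], [2, 4, 6, 8, 10, 12, 14, 16]))
```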
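The responder definition (FDR < 0.05) converts per-participant posterior probabilities of response into a false discovery rate. The sketch below shows one common Bayesian FDR construction under that assumption: rank participants from most to least likely responder, and assign each a q-value equal to the mean posterior probability of non-response among everyone ranked at or above them. This shows the general idea only and is not the MIMOSA package's code, whose details may differ. The function names and probabilities are hypothetical.

```python
def bayesian_fdr(p_response):
    """q-value per participant from posterior probabilities of response.

    Each q-value is the expected proportion of false calls if this participant
    and everyone ranked above them were declared responders.
    """
    order = sorted(range(len(p_response)), key=lambda i: p_response[i], reverse=True)
    q = [0.0] * len(p_response)
    cumulative_null = 0.0
    for rank, i in enumerate(order, start=1):
        cumulative_null += 1.0 - p_response[i]  # posterior prob. of NON-response
        q[i] = cumulative_null / rank
    return q


def call_responders(p_response, fdr_threshold=0.05):
    """Flag participants whose Bayesian FDR falls below the threshold."""
    return [qi < fdr_threshold for qi in bayesian_fdr(p_response)]


# Hypothetical posterior probabilities of an antigen-specific response.
posteriors = [0.999, 0.99, 0.5, 0.98]
print(call_responders(posteriors))
```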
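To make the two COMPASS summaries concrete, the sketch below computes them from a participant's posterior response probabilities over cytokine subsets. It follows the definitions as we understand them from the COMPASS publication. The functionality score is the average posterior probability over all 2^M − 1 non-null subsets of M markers. The polyfunctionality score weights each subset of degree d by d / C(M, d) and is normalized so that a participant responding in every subset scores 1. The package restricts the calculation to subsets retained in the fitted model, so its denominators can differ; defer to the package output for real analyses. The function name and inputs are hypothetical.

```python
from math import comb


def compass_scores(posterior, degrees, n_markers):
    """Functionality (FS) and polyfunctionality (PFS) scores for one participant.

    posterior: posterior probability of an antigen-specific response for each
        non-null cytokine subset (assumed here to cover all 2**M - 1 subsets).
    degrees: number of cytokines expressed in each subset (1..M).
    """
    if len(posterior) != len(degrees):
        raise ValueError("one degree is needed per subset")
    n_subsets = 2 ** n_markers - 1
    fs = sum(posterior) / n_subsets
    # Higher-degree subsets get more weight; dividing by C(M, d) stops the
    # many low-degree subsets from dominating.
    weighted = sum(p * d / comb(n_markers, d) for p, d in zip(posterior, degrees))
    # Sum of d / C(M, d) over all non-null subsets is M(M + 1) / 2.
    pfs = weighted * 2 / (n_markers * (n_markers + 1))
    return fs, pfs


# Hypothetical two-marker example: subsets {A}, {B}, {A, B}.
degrees = [1, 1, 2]
print(compass_scores([0.0, 0.0, 1.0], degrees, n_markers=2))  # double-positive only
print(compass_scores([1.0, 0.0, 0.0], degrees, n_markers=2))  # single-positive only
```

Both example participants respond in one of three subsets and therefore share the same functionality score, but the double-positive response earns the higher polyfunctionality score.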
We analyzed data using STATA (v14.2; StataCorp, College Station, TX, USA) and GraphPad Prism (v8.1; GraphPad Software, La Jolla, CA, USA). A *p*-value < 0.05 was considered statistically significant and we adjusted for multiple comparisons where needed.
