### *3.1. Acceptability and Feasibility Indicators*

Process indicators are listed in Table 2. Six participants never started the training program: 5 were MPWs and 1 was an ASHA. The reasons for not starting the training were largely other work commitments and other personal or family commitments. Further, several participants (*n* = 9) started the training but could not complete it, similarly due to family or work commitments, as well as inclement weather, as the training took place during the monsoon season (making it difficult for F2F participants to travel to the training facility). Thus, 27 (64%) participants completed the full training program: 8 (57%) in F2F, 8 (57%) in DGT, and 11 (79%) in DGT+. We observed differences in program completion between the different types of non-specialist health workers: 16 (70%) ASHAs, 8 (80%) ASHA Facilitators, and 3 (33%) MPWs completed the training.



F2F: Face-to-Face; DGT: Digital Training.

There were a total of 399 support calls related to technical assistance for the digital training programs. Among the DGT participants, there were 255 calls, comprising calls made by the participants and calls made by the research team in response. In total, 58% of these calls (149 out of 255) were from participants to the research team, while 42% (106 out of 255) were from the research team in response to participants' queries. For DGT+ participants, the major difference was that our research team initiated most of the calls (as opposed to participants initiating them). Among DGT+ participants, there were 144 calls; our research team initiated 60% of the calls (87 out of 144), while 40% (57 out of 144) were from participants to our research team. The number of calls per participant ranged from 4 to 37. The calls primarily related to technical challenges, as summarized in Table 3, such as poor connectivity, the mobile app not loading or being deleted from the phone, and challenges with navigating the course content.


**Table 3.** Common technical challenges mentioned by participants during phone calls with the research team in the digital training programs.

Table 4 summarizes participants' responses to the satisfaction and acceptability questionnaire for each training program. Mean scores across the domains were generally 5 or greater (out of a possible 6), indicating that participants rated the training programs favorably for feasibility, acceptability, and adoption. Across study arms, appropriateness was rated lowest, suggesting that additional efforts are necessary to promote engagement with the program content. Findings from the focus group discussions (*n* = 28 participants) were grouped within the same four domains from the satisfaction and acceptability questionnaire, as highlighted in Table 5. Recommendations for improving the F2F training included increasing the duration of the training and clarifying some of the training manual content. For the digital training programs, the main recommendations were to ensure that the entire course could be accessed offline, given poor internet connectivity in the region; to provide a more comprehensive orientation session at the beginning of the program, covering the smartphone app and navigation of the digital program interface; and to extend the availability of telephone support from the research team.

**Table 4.** Participant ratings of satisfaction and acceptability with the training programs \*.


\* The measure was adapted from an existing inventory [55–58]. It consists of 26 items and was tailored to the Face-to-Face (F2F) or digital (DGT) training and translated into Hindi for use in this study. The items are rated on a six-point Likert scale, with 1 being the lowest and 6 the highest score. The questionnaire covers the domains of acceptability, appropriateness, adoption, and feasibility. The average score for each domain was calculated by summing the scores of all items in the domain and dividing by the number of items in the domain. F2F: Face-to-Face; DGT: Digital Training.

**Table 5.** Summary of key findings from the focus group discussions with participants in the three training programs.




F2F: Face-to-Face; DGT: Digital Training; HAP: Healthy Activity Program; PHQ-9: 9-item Patient Health Questionnaire.

### *3.2. Preliminary Effectiveness Outcome*

Using a paired *t*-test, we explored whether there was a statistically significant mean difference between the competency scores obtained pre- and post-training for all participants (all three training programs combined). Participants (*N* = 36) scored better overall on the post-training assessment (Mean = 35.43; SD = 11.39) than on the pre-training (baseline) assessment (Mean = 25.82; SD = 7.42), out of a maximum attainable score of 100. This represents a significant increase of 9.61 points (95% CI: 5.17 to 14.04), *t* (35) = 4.401, *p* < 0.0005, suggesting that competency scores increased after completing the training program regardless of training format (F2F or digital). For the F2F training, the Wilcoxon signed-rank test showed a significant change in participants' competency scores (*Z* = 2.934, *p* = 0.0033). For the DGT participants, the change was not significant (*Z* = 0.863, *p* = 0.3882), whereas for the DGT+ participants the change was statistically significant (*Z* = 2.271, *p* = 0.0231), as illustrated in Figure 2. For F2F, the mean competency score improved by 13.8 (SD = 6.6) points, while for the DGT and DGT+ arms the improvements were 2.5 (SD = 7.8) and 12.7 (SD = 18.2) points, respectively.
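For readers wishing to reproduce this type of analysis, the paired *t*-statistic reported above is the mean of the within-participant change scores divided by its standard error. A minimal sketch in Python (the scores below are illustrative placeholders, not the study data):

```python
import math

def paired_t(pre, post):
    """Paired t-test statistic: mean change divided by its standard error."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the change scores (n - 1 denominator)
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    se = math.sqrt(var_d / n)
    t = mean_d / se  # compare against a t distribution with n - 1 df
    return mean_d, t

# Hypothetical pre/post competency scores for four participants
pre = [20, 25, 30, 22]
post = [30, 33, 36, 30]
mean_change, t_stat = paired_t(pre, post)
print(mean_change, round(t_stat, 3))
```

In practice a library routine (e.g., `scipy.stats.ttest_rel`) would also return the *p*-value; the sketch only exposes the arithmetic behind the reported statistic.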

**Figure 2.** Change in competency assessment scores within each training program. Note: this Figure includes scores from the *n* = 11 F2F, *n* = 12 DGT, and *n* = 13 DGT+ participants who completed the post-training (endline) competency measure; however, some of these participants did not complete the full training programs. F2F: Face-to-Face; DGT: Digital Training.

Next, we explored differences between training groups in the change in competency scores. We conducted a Welch's ANOVA, which showed a statistically significant difference between the three groups in the change in competency scores from pre- to post-training, *F* (2,21) = 7.0358, *p* = 0.00455. Following up with a Games–Howell post-hoc test, we found a statistically significant difference in the pre- to post-training change between the F2F and DGT arms (*p* < 0.01), but not between the F2F and DGT+ arms.
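Welch's ANOVA was chosen here because, unlike the classic one-way ANOVA, it does not assume equal variances across groups; it reweights each group by the precision of its mean (n/s²) and adjusts the denominator degrees of freedom accordingly. A minimal pure-Python sketch of the test statistic, using hypothetical change scores rather than the study data:

```python
def welch_anova(groups):
    """Welch's one-way ANOVA (does not assume equal group variances)."""
    k = len(groups)
    ns = [len(g) for g in groups]
    means = [sum(g) / n for g, n in zip(groups, ns)]
    variances = [sum((x - m) ** 2 for x in g) / (n - 1)
                 for g, m, n in zip(groups, means, ns)]
    w = [n / v for n, v in zip(ns, variances)]  # precision weights n/s^2
    w_sum = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, means)) / w_sum
    # Weighted between-group mean square
    num = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, means)) / (k - 1)
    # Correction term accounting for unequal variances
    a = sum((1 - wi / w_sum) ** 2 / (n - 1) for wi, n in zip(w, ns))
    den = 1 + 2 * (k - 2) * a / (k ** 2 - 1)
    f = num / den
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * a)  # adjusted denominator df (non-integer in general)
    return f, df1, df2

# Hypothetical change scores for three training arms (not the study data)
f, df1, df2 = welch_anova([[1, 2, 3], [2, 3, 4], [10, 11, 12]])
print(round(f, 3), df1, round(df2, 2))
```

A library implementation (e.g., `pingouin.welch_anova` together with `pingouin.pairwise_gameshowell` for the post-hoc comparisons) would additionally supply *p*-values; the sketch shows only how the *F* statistic and its degrees of freedom arise.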
