#### *2.3. Data Preparation*

The usage of mental health apps by each participant was measured with four metrics: the average daily time spent on mental health apps (DT), the average daily number of launches of mental health apps (DL), the average duration of daily sessions of mental health apps (DS), and the number of days on which mental health apps were used (UD) during the 21-day period.

The average daily time spent was obtained from the total daily time spent on the apps, in minutes with fractions of seconds, over the 21 days. The average daily launches of the apps were obtained from the total daily count of sessions on the apps over the 21 days. The average duration of daily sessions was calculated by summing the sessions over the 21 days and averaging over the 21 days. The number of days of use was calculated as the count of unique days on which mental health apps were used. The four usage metrics were calculated separately for mental health apps overall, guidance-based mental health apps, and tracking-based mental health apps.
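As an illustration of the four metrics, the following minimal sketch computes DT, DL, DS, and UD for one participant from a list of sessions. It is not the authors' code: the input layout (`(day, duration_minutes)` tuples), the function name `usage_metrics`, and the sample data are hypothetical. The text leaves the exact formula for DS open; the sketch assumes one possible reading, namely the mean session length on each day of use, summed over the window and divided by 21.

```python
from collections import defaultdict
from datetime import date

STUDY_DAYS = 21  # length of the observation window stated in the text


def usage_metrics(sessions, study_days=STUDY_DAYS):
    """Compute DT, DL, DS, and UD for one participant.

    sessions: iterable of (day, duration_minutes) tuples (hypothetical layout).
    """
    per_day = defaultdict(list)
    for day, minutes in sessions:
        per_day[day].append(minutes)

    total_minutes = sum(sum(v) for v in per_day.values())
    total_launches = sum(len(v) for v in per_day.values())
    # Assumed reading of DS: mean session length on each day of use,
    # summed over the window and divided by the number of study days.
    summed_daily_means = sum(sum(v) / len(v) for v in per_day.values())

    return {
        "DT": total_minutes / study_days,       # average daily time spent
        "DL": total_launches / study_days,      # average daily launches
        "DS": summed_daily_means / study_days,  # average daily session duration
        "UD": len(per_day),                     # unique days of use
    }


# Hypothetical participant: two sessions on one day, one on the next.
example = [
    (date(2020, 4, 1), 10.0),
    (date(2020, 4, 1), 20.0),
    (date(2020, 4, 2), 12.0),
]
metrics = usage_metrics(example)
```

Running the same function on the sessions of guidance-based apps only, or tracking-based apps only, would give the per-category metrics described above.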

The study was designed to answer three research questions, each of which required different criteria for the usage metrics. The first question (RQ1) concerned the number of users of mental health apps before and during COVID-19 and did not require the four usage metrics (DT, DL, DS, and UD). The second question (RQ2) concerned the change in the usage of mental health apps overall before and during COVID-19. In this case, all four usage metrics were measured for the participants. We had 11 users in 2019 and 14 users in 2020 with fewer than 2 days of usage, and 327 users in 2019 and 452 users in 2020 with no usage. Users with no usage or with fewer than 2 days of use over the 21 days were excluded, because users with fewer than 2 days of usage may have installed mental health apps as a trial and may not have returned to use them. Furthermore, they spent a negligible amount of time on mental health apps (below 2 min). The third question (RQ3) concerned the change in usage of the two categories of mental health apps before and during COVID-19. We took all four usage metrics in this case as well and again excluded users with no usage or with fewer than 2 days of usage. For guidance-based apps, we found 11 users in 2019 and 14 users in 2020 with fewer than 2 days of usage; for tracking-based apps, we found 2 users in 2019 and 5 users in 2020. With regard to users with no usage, we had 335 users in 2019 and 478 users in 2020 for guidance-based apps, and 359 users in 2019 and 531 users in 2020 for tracking-based apps. As in RQ2, the users with fewer than 2 days of usage were excluded since they spent a negligible amount of time on mental health apps and may have installed them as a trial.
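The exclusion rule for RQ2 and RQ3 can be sketched as a simple filter on UD. The function name `split_by_eligibility`, the dictionary layout, and the participant identifiers below are hypothetical and serve only to illustrate the rule. Only the threshold of 2 days comes from the text.

```python
MIN_DAYS_OF_USE = 2  # threshold stated in the text


def split_by_eligibility(ud_by_user, min_days=MIN_DAYS_OF_USE):
    """Split participants by their number of days of use (UD).

    ud_by_user: dict mapping a participant id to UD (hypothetical layout).
    Returns (included, no_usage, trial_only) as sorted lists of ids.
    """
    included = sorted(u for u, d in ud_by_user.items() if d >= min_days)
    no_usage = sorted(u for u, d in ud_by_user.items() if d == 0)
    trial_only = sorted(u for u, d in ud_by_user.items() if 0 < d < min_days)
    return included, no_usage, trial_only


# Hypothetical UD values for five participants in one year.
ud_2020 = {"p01": 0, "p02": 1, "p03": 2, "p04": 14, "p05": 0}
included, no_usage, trial_only = split_by_eligibility(ud_2020)
```

Applying the filter once per year and once per app category (overall, guidance-based, tracking-based) would yield exclusion counts of the kind reported above.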

#### *2.4. Data Analysis*

The statistical analysis was performed in JASP 0.14.1 [33]. Chi-square tests were used to determine the change in the number of users from pre-COVID-19 to during COVID-19, and to determine the relationship between the use of mental health apps and the demographic variables of their users. Phi coefficients were used to determine the effect size of these relationships. The normality of the data was checked by conducting Shapiro–Wilk tests on the four usage metrics (DT, DL, DS, and UD) with respect to the years and demographics for the overall mental health, guidance-based, and tracking-based apps. The majority of the usage metrics (that is, 66 out of the 96 distributions) did not have a normal distribution; hence, non-parametric tests were used. Medians and interquartile ranges (IQR) were used for the descriptive statistics since the usage measures were not normally distributed. Since the usage metrics were continuous variables, the Mann–Whitney U test was applied to compare the usage from pre-COVID-19 to during COVID-19. The Mann–Whitney U test was first conducted on the usage across the two time periods, separately for overall mental health apps, guidance-based apps, and tracking-based apps. Then, the usage was tested against age and gender for the overall mental health apps, guidance-based, and tracking-based apps. Cohen's d was used to determine the effect size of these relationships.
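For readers who wish to see the quantities behind these tests, the following sketch computes them by hand in plain Python. It is not the JASP implementation. It assumes a 2×2 contingency table without continuity correction, average ranks for ties in the Mann–Whitney U statistic, and the pooled-standard-deviation form of Cohen's d. The settings used in JASP are not stated in the text, the function names and sample values are hypothetical, and the Shapiro–Wilk test is omitted.

```python
import math
import statistics


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 degree of freedom
    phi = math.sqrt(chi2 / n)                 # effect size for a 2x2 table
    return chi2, p_value, phi


def mann_whitney_u(x, y):
    """Return (U_x, U_y), assigning average ranks to tied values."""
    pooled = sorted((v, g) for g, values in enumerate((x, y)) for v in values)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    return u_x, len(x) * len(y) - u_x


def cohens_d(x, y):
    """Cohen's d using the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled_var)


# Hypothetical counts: rows = year, columns = used / did not use the apps.
chi2, p_value, phi = chi_square_2x2(10, 20, 20, 10)

# Hypothetical DT values (minutes) for two small groups of users.
u_before, u_during = mann_whitney_u([1.0, 2.0], [2.0, 3.0])
d = cohens_d([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

The p-value for the Mann–Whitney U test is left out of the sketch, since it requires either exact tables or a normal approximation with a tie correction, both of which a statistics package handles.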
