Article
Peer-Review Record

Accounting for Patient Engagement in Randomized Controlled Trials Evaluating Digital Cognitive Behavioral Therapies

Appl. Sci. 2022, 12(10), 4952; https://doi.org/10.3390/app12104952
by Oleksandr Sverdlov 1,* and Yevgen Ryeznik 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 19 February 2022 / Revised: 2 May 2022 / Accepted: 10 May 2022 / Published: 13 May 2022
(This article belongs to the Special Issue Digital Therapeutics Applications for Chronic Disease Management)

Round 1

Reviewer 1 Report

This is a well written article that offers some valuable insights for researchers.

Author Response

Thank you very much for your review. We value your feedback and comments.

Reviewer 2 Report

Line 42, the authors state that it is quite challenging to quantify and build reliable statistical models relating engagement with clinical efficacy. For instance, “more time using a digital therapeutic” does not necessarily translate into “better treatment effect”. Can the authors please provide citations or further supporting information for this statement? This would assist the reader in gauging the importance of how to measure engagement.

 

Line 43, the authors state: Fourth, high heterogeneity in the usage/engagement patterns may potentially confound treatment effects. Can the authors please provide citations or further supporting information for this statement? A further explanation of this line would assist statistical modelers in understanding how complex variables need to be accounted for in clinical trials.



Line 68, For simplicity, we assume that X = the percentage of successfully completed dCBT trainings at the end of the treatment period provides a meaningful quantification of individual engagement with the dCBT. Why have the authors chosen this definition of engagement? How does this simple definition of engagement affect the authors’ results and conclusions?

 

Line 70, In other words, X is measured on a scale 0 to 1, where 0 means 0% completed trainings, 0.6 means 60% completed trainings, etc. By contrast, in the control group the sham application has no active therapeutic ingredients of a psychosocial intervention, and therefore the engagement is a structural zero for every participant in the control group. Can the authors further address why it is pertinent not to model engagement in the control group? It seems like this variable would affect clinical outcomes in any type of trial group.

 

Line 287, The values of power of the ANCOVA test and the two-sample t-test (which are obtained at the average observed level of engagement in the experimental group) are very close. Can the authors please explain what this means for their original study assumptions?

Line 439, In our considered example, engagement in the experimental group (plausibly correlated with the clinical endpoint) represents an influential covariate that should be adjusted for in the analysis. While this may be true, can the authors contextualize this result with the fact that engagement is defined very simply and only considered for the treatment group?

 

Line 441, Simple unadjusted analysis using a two-sample t-test can be applied to compare the effect of the experimental treatment versus control at the observed average level of engagement. While this is a legitimate approach, it is not optimal for the considered problem. An ANCOVA model with proper adjustment for engagement leads to minimum variance unbiased estimates and is potentially more powerful. This statement, which refers to the power of and differences between the two statistical tests, is generally true and known. Can the authors please explain how their modeling has added to this already known difference between the benefits of using ANCOVA vs. the two-sample t-test?

 

Line 446, Furthermore, an investigator may be interested in treatment comparison at levels of engagement other than the average level observed in the study, and this is possible with the ANCOVA but not with the two-sample t-test. Are the authors stating that their study results are informing this statement? If so, could the authors please further explain why their study was needed to reach this conclusion which seems to already be a feature of ANCOVA? In other words, researchers using ANCOVA techniques may already be aware of what ANCOVA techniques can offer, what then is the benefit of this study?

 

Line 451, Our considered example used a very simple linear model, just to illustrate the phenomenon. In practice, more complex models may be used to describe a relationship between the primary endpoint, treatment, engagement, and possibly other factors. How has this example, which utilizes a simple linear model and a simple definition of engagement, furthered the reader’s understanding of the benefits of ANCOVA vs. two-sample t-tests?

 

Line 458, Unfortunately, mechanisms of action of CBT and dCBT may be elusive, quantifying engagement may be challenging, models can be difficult to formulate, and experiments may lack statistical power. Despite all these challenges, it is essential that science-based models are developed before interventions are tested in RCTs. How does this manuscript account for the fact that clinical trials involve complex variables that need to be continually teased out and understood? Can utilizing simply defined variables to model statistical effects be useful? In light of this, what is the contribution of this manuscript?

 

Line 470, However, one should always mind uncertainty, which can be high and render the resulting confidence intervals meaningless. This again speaks to the need for models that accurately reflect mechanisms of action of experimental therapies. This is a great point; could the authors please explain how their study has shed light on this matter? It seems like the limitations of choosing a simple definition of engagement and not accounting for many possible theoretical control variables or competing covariates render the conclusions in this manuscript hard to appreciate.

 

Line 477, What if engagement was high but there was no clinical benefit of the experimental treatment compared to the control? Technically, since data on engagement is acquired after randomization, it is affected by treatment and should be regarded as a response. In our model we used engagement as a covariate, and it was assumed to be measured without error. What bearing does this reflection have on the results and conclusions of the present study? If one wants to account for the effect of covariates, then one should indeed use ANCOVA over a two-sample t-test. This is known, so the question is: what knowledge is this manuscript contributing? It may be that the way the study hypotheses are stated is confusing to a reader. It is unclear whether the study is highlighting the need to account for engagement or the need to use ANCOVA. In either case, the manuscript can be strengthened by the authors further explaining how their study results account for the insightful limitations that they identify in the discussion section.

Author Response

Thank you very much for your review. We value your feedback and welcome your suggestions for improvement. Please see our point-by-point responses below.

---

Comment #1: Line 42, the authors state that it is quite challenging to quantify and build reliable statistical models relating engagement with clinical efficacy. For instance, “more time using a digital therapeutic” does not necessarily translate into “better treatment effect”. Can the authors please provide citations or further supporting information for this statement? This would assist the reader in gauging the importance of how to measure engagement.

Response to #1: Thank you for this comment. We have added a reference to a recently published study of a smartphone-based intervention as an adjunct to the standard of care in schizophrenia (Ghaemi et al., 2022), for which the first author served as the trial statistician. In essence, the study found no benefit on the 12-week primary and secondary efficacy endpoints even though application engagement was good and patient and clinical-investigator satisfaction was high.

---

Comment #2: Line 43, the authors state: Fourth, high heterogeneity in the usage/engagement patterns may potentially confound treatment effects. Can the authors please provide citations or further supporting information for this statement? A further explanation of this line would assist statistical modelers in understanding how complex variables need to be accounted for in clinical trials.

Response to #2: We concur with your comment. The following text and the reference have been added: “…For instance, if patients are randomized to treatments that do not meet their expectations, this may affect the patients’ motivation to engage and comply with the treatments, which, in turn, may impact the study outcomes.” [Reference: Truzoli et al. (2021) Patient expectations of assigned treatments impact strength of randomised control trials. Frontiers in Medicine, 8:648403. doi: 10.3389/fmed.2021.648403]

---

Comment #3: Line 68, For simplicity, we assume that X = the percentage of successfully completed dCBT trainings at the end of the treatment period provides a meaningful quantification of individual engagement with the dCBT. Why have the authors chosen this definition of engagement? How does this simple definition of engagement affect the authors’ results and conclusions?

Response to #3: Thank you for this comment. We have added the following text (and some additional references): “…While the utility of “completion rate” as a measure of engagement for digital mental health interventions has been well documented [Refs: Zeng et al., 2020; Palmier-Claus et al., 2012; Kreyenbuhl et al., 2016; Bucci et al., 2018], we acknowledge that this is only one way to measure engagement. In practice, other, more elaborate metrics may be considered. However, in our opinion, a simple metric such as “completion rate” is rather informative and can be used as a starting point for developing statistical models.”

---

Comment #4: Line 70, In other words, X is measured on a scale 0 to 1, where 0 means 0% completed trainings, 0.6 means 60% completed trainings, etc. By contrast, in the control group the sham application has no active therapeutic ingredients of a psychosocial intervention, and therefore the engagement is a structural zero for every participant in the control group. Can the authors further address why it is pertinent not to model engagement in the control group? It seems like this variable would affect clinical outcomes in any type of trial group.

Response to #4: Thank you for this comment. Indeed, there may be different ways to define and quantify engagement, and in this paper we considered only one of them. To clarify, the following text has been added: “…To illustrate the latter point, consider, for example, a recent randomized, sham-controlled clinical trial of a smartphone-based application as an adjunct to the standard of care in schizophrenia [Ref: Ghaemi et al., 2022]. Participants in the experimental group were exposed for a period of 12 weeks to an app that was designed as a self-management tool in schizophrenia. The users could engage with the app by prompt or on demand, and the app provided interactive cognitive and behavioral exercises that were hypothesized to improve symptoms in schizophrenia. Participants in the sham group were exposed for 12 weeks to an app that was similar in appearance to the experimental app but did not deliver any active therapeutic content; it only sent periodic prompts to open the app, in which case a digital clock timer was displayed. The sham control arm was chosen to account for the nonspecific effects of engagement with a smartphone. Clearly, in this case the engagement defined as “exercise completion rate” can be quantified only in the experimental group, and it can be viewed as zero in the control group.”

---

Comment #5: Line 287, The values of power of the ANCOVA test and the two-sample t-test (which are obtained at the average observed level of engagement in the experimental group) are very close. Can the authors please explain what this means for their original study assumptions?

Response to #5: We have modified the sentence to make the second observation clearer: “…The values of power of the ANCOVA test and the two-sample t-test are very close. This implies that both the ANCOVA and the two-sample t-test are appropriate for inference on the treatment difference at the average observed level of engagement.”
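To illustrate the claim in this response, here is a minimal Monte Carlo sketch (not the authors' code; the model, sample sizes, and engagement distribution are illustrative assumptions) comparing the power of a regression slope test on engagement with that of the two-sample t-test under a simple linear model with structural-zero engagement in the control arm:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_power(n_per_arm=50, beta1=0.8, sigma=1.0, n_sim=1000, alpha=0.05):
    """Monte Carlo power of the regression slope test vs. the two-sample t-test
    under Y = beta0 + beta1*X + eps, where engagement X is a structural zero
    in the control arm (all parameter values here are illustrative)."""
    reject_t = reject_slope = 0
    for _ in range(n_sim):
        x_ctrl = np.zeros(n_per_arm)              # structural-zero engagement
        x_trt = rng.uniform(0.3, 1.0, n_per_arm)  # assumed engagement distribution
        y_ctrl = rng.normal(0.0, sigma, n_per_arm)
        y_trt = beta1 * x_trt + rng.normal(0.0, sigma, n_per_arm)
        # unadjusted two-sample t-test on the group means
        if stats.ttest_ind(y_trt, y_ctrl).pvalue < alpha:
            reject_t += 1
        # regression of Y on X pooled across arms; the slope test plays the
        # role of the engagement-adjusted (ANCOVA-type) comparison here
        x = np.concatenate([x_ctrl, x_trt])
        y = np.concatenate([y_ctrl, y_trt])
        if stats.linregress(x, y).pvalue < alpha:
            reject_slope += 1
    return reject_t / n_sim, reject_slope / n_sim

p_t, p_slope = simulate_power()
print(f"t-test power: {p_t:.3f}, slope-test power: {p_slope:.3f}")
```

Under these assumptions both tests target the treatment difference at the average observed engagement, so the simulated power values come out close, consistent with the response above.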

---

Comment #6: Line 439, In our considered example, engagement in the experimental group (plausibly correlated with the clinical endpoint) represents an influential covariate that should be adjusted for in the analysis. While this may be true, can the authors contextualize this result with the fact that engagement is defined very simply and only considered for the treatment group?

Response to #6: We concur with your comment. We have modified the first paragraph of Section 6, emphasizing our modeling assumptions, the message on the existing knowledge of the statistical properties of ANCOVA and the two-sample t-test, and the added value of our study. Specifically, one key contribution of our current paper (that, to our knowledge, has not been studied before) is the investigation of statistical power and sample size requirements to address research questions involving treatment comparison at different levels of engagement (including, but not limited to, the average observed in the study). We have shown how these issues can be addressed at the design planning stage, and how a subsequent data analysis can be performed. Clearly, our study has limitations, and these have been highlighted as topics for future work.

---

Comment #7: Line 441, Simple unadjusted analysis using a two-sample t-test can be applied to compare the effect of the experimental treatment versus control at the observed average level of engagement. While this is a legitimate approach, it is not optimal for the considered problem. An ANCOVA model with proper adjustment for engagement leads to minimum variance unbiased estimates and is potentially more powerful. This statement, which refers to the power of and differences between the two statistical tests, is generally true and known. Can the authors please explain how their modeling has added to this already known difference between the benefits of using ANCOVA vs. the two-sample t-test?

Response to #7: We concur with this comment. Please see our detailed response to comment #6.

---

Comment #8: Line 446, Furthermore, an investigator may be interested in treatment comparison at levels of engagement other than the average level observed in the study, and this is possible with the ANCOVA but not with the two-sample t-test. Are the authors stating that their study results are informing this statement? If so, could the authors please further explain why their study was needed to reach this conclusion which seems to already be a feature of ANCOVA? In other words, researchers using ANCOVA techniques may already be aware of what ANCOVA techniques can offer, what then is the benefit of this study?

Response to #8: We concur with this comment. Please see our detailed response to comment #6.
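To make the contribution discussed in the response to comment #6 concrete, a small sketch (simulated data with illustrative parameter values, not the authors' actual analysis) of how a fitted regression on engagement supports treatment comparison at engagement levels other than the observed average; under the simple model Y = beta0 + beta1*X with structural-zero X in control, the treatment effect at engagement x0 is beta1*x0:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated trial under the paper's simple linear model (illustrative values):
n = 50
x_trt = rng.uniform(0.3, 1.0, n)          # engagement in the experimental arm
x = np.concatenate([np.zeros(n), x_trt])  # structural zero in the control arm
y = 0.8 * x + rng.normal(0.0, 1.0, 2 * n)

fit = stats.linregress(x, y)
t_crit = stats.t.ppf(0.975, df=2 * n - 2)

def effect_at(x0):
    """Estimated treatment effect beta1*x0 with a 95% Wald-type CI.
    Since the effect is the difference of fitted means at x0 vs. 0,
    its standard error is x0 times the slope's standard error."""
    est = fit.slope * x0
    half = t_crit * fit.stderr * x0
    return est, (est - half, est + half)

for x0 in (0.25, 0.5, np.mean(x_trt), 1.0):
    est, ci = effect_at(x0)
    print(f"x0={x0:.2f}: effect={est:.2f}, 95% CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

A two-sample t-test only yields the comparison at the observed mean engagement, whereas the fitted model can be interrogated at any engagement level (with wider intervals farther from the observed data).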

---

Comment #9: Line 451, Our considered example used a very simple linear model, just to illustrate the phenomenon. In practice, more complex models may be used to describe a relationship between the primary endpoint, treatment, engagement, and possibly other factors. How has this example, which utilizes a simple linear model and a simple definition of engagement, furthered the reader’s understanding of the benefits of ANCOVA vs. two-sample t-tests?

Response to #9: We concur with this comment. More complex data structures can (and should) be considered, and the corresponding statistical developments will have both similarities to and differences from the approach we presented. Please also see our detailed response to comment #6 (on both the limitations and the future work).

---

Comment #10: Line 458, Unfortunately, mechanisms of action of CBT and dCBT may be elusive, quantifying engagement may be challenging, models can be difficult to formulate, and experiments may lack statistical power. Despite all these challenges, it is essential that science-based models are developed before interventions are tested in RCTs. How does this manuscript account for the fact that clinical trials involve complex variables that need to be continually teased out and understood? Can utilizing simply defined variables to model statistical effects be useful? In light of this, what is the contribution of this manuscript?

Response to #10: Thank you for this comment. Indeed, our study considered a rather simple model, and we list this as a limitation. At the same time, our study captures an essential feature of experiments evaluating complex cognitive behavioral interventions: the need to consider patient engagement in both the design and analysis of such experiments. Our considered measure of engagement (“completion rate”) is only one established metric; there are many others that may be useful in practice. Please also see our detailed response to comment #6.

---

Comment #11: Line 470, However, one should always mind uncertainty, which can be high and render the resulting confidence intervals meaningless. This again speaks to the need for models that accurately reflect mechanisms of action of experimental therapies. This is a great point; could the authors please explain how their study has shed light on this matter? It seems like the limitations of choosing a simple definition of engagement and not accounting for many possible theoretical control variables or competing covariates render the conclusions in this manuscript hard to appreciate.

Response to #11: This paragraph has been streamlined by keeping only the main points. Please see our detailed response to comment #6.

---

Comment #12: Line 477, What if engagement was high but there was no clinical benefit of the experimental treatment compared to the control? Technically, since data on engagement is acquired after randomization, it is affected by treatment and should be regarded as a response. In our model we used engagement as a covariate, and it was assumed to be measured without error. What bearing does this reflection have on the results and conclusions of the present study? If one wants to account for the effect of covariates, then one should indeed use ANCOVA over a two-sample t-test. This is known, so the question is: what knowledge is this manuscript contributing? It may be that the way the study hypotheses are stated is confusing to a reader. It is unclear whether the study is highlighting the need to account for engagement or the need to use ANCOVA. In either case, the manuscript can be strengthened by the authors further explaining how their study results account for the insightful limitations that they identify in the discussion section.

Response to #12: Thank you for this comment. The intent of this paragraph is not to reflect on the results of the current study; its purpose is to highlight a potential problem for the future work. We have rearranged the concluding paragraphs such that the joint modeling of efficacy and engagement is listed as one of the directions for future research.

Reviewer 3 Report

Please see the attachment.

Comments for author File: Comments.pdf

Author Response

Thank you very much for your review. We value your feedback and welcome your suggestions for improvement. Below please find our point-by-point responses to your comments.

---

Comment #1: I suggest the authors discuss patient engagement in the context of mediation analysis instead of covariate adjustment. In randomized trials, a covariate, or baseline covariate, usually refers to a variable measured before randomization and hence not confounding the treatment effect. However, patient engagement is measured after treatment assignment and is impacted by the treatment. The estimands discussed in this paper are hence more related to the direct and indirect effects in the mediation analysis literature, rather than the covariate-adjusted analysis in FDA (2021). One paper with a similar setting is Valeri et al. (2014).

Response to #1: Thank you for this comment. We concur with your suggestion and have included this point explicitly as a topic for future work: “…As one of the reviewers pointed out, it may be more appropriate to discuss patient engagement in the context of mediation analysis instead of covariate adjustment. An approach to mediation analysis when a continuous mediator is measured with error and the outcome follows a generalized linear model was described in [Valeri, L., Lin, X., and VanderWeele, T.J. Mediation analysis when a continuous mediator is measured with error and the outcome follows a generalized linear model. Statistics in Medicine 2014; 33(28):4875–4890]. This approach may also be useful in RCTs of digital mental health interventions, and we defer this important topic to future work.”

---

Comment #2: Page 6. (n-2)/(n-3) is larger than 1 instead of smaller than 1. Hence ANCOVA-based inference is not guaranteed to be more powerful than two-sample t-tests. The authors also need to further explain why the two methods have similar power in Figure 2 and show when ANCOVA can be more powerful.

Response to #2: Thank you very much for this comment. We have identified the mistake and made the correction as you suggested. Indeed, the two estimators of the error variance may be numerically different (with a difference possible in either direction), and this difference depends on the strength of the correlation and the values of the sample variances. We have elaborated on these items in the text.
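The point about the two error-variance estimators can be checked numerically. The sketch below (illustrative values, not the authors' code) computes the pooled two-sample variance and the regression residual variance on the same simulated data; which one is larger depends on the strength of the engagement-outcome correlation and on sampling variability:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 50  # illustrative per-arm sample size

def variance_estimates(beta1):
    """Return (pooled two-sample variance, regression residual variance)
    for one simulated trial with structural-zero engagement in control."""
    x_trt = rng.uniform(0.3, 1.0, n)
    x = np.concatenate([np.zeros(n), x_trt])
    y = beta1 * x + rng.normal(0.0, 1.0, 2 * n)
    y_c, y_t = y[:n], y[n:]
    # pooled variance used by the two-sample t-test (df = 2n - 2)
    s2_pooled = ((n - 1) * y_c.var(ddof=1) + (n - 1) * y_t.var(ddof=1)) / (2 * n - 2)
    # residual variance from the regression on engagement (also df = 2n - 2,
    # since the model has two parameters: intercept and slope)
    fit = stats.linregress(x, y)
    resid = y - (fit.intercept + fit.slope * x)
    s2_reg = resid @ resid / (2 * n - 2)
    return s2_pooled, s2_reg

for b in (0.0, 2.0):
    s2p, s2r = variance_estimates(b)
    print(f"beta1={b}: pooled={s2p:.3f}, regression residual={s2r:.3f}")
```

With no engagement effect the two estimates are similar; a strong effect tends to inflate the pooled variance relative to the residual variance, but in any single sample the difference can go either way.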

---

Comment #3: This paper assumes a very stringent model on the relationship among the outcome, treatment, and patient engagement. In particular, more evidence, such as real data examples, is needed to show a linear effect of patient engagement on the outcome on [0, 1]. Since the interpretation of one estimand is at zero engagement, it is not clear to me whether this model holds on the boundary. The authors could consider proposing statistical tests of model correctness or using model-robust inference.

Response to #3: We concur with your comment. This definitely deserves additional investigation, which we highlight in the discussion section. Specifically, the following text has been added: “…While the approach in this paper was considered for a rather simple model (a single covariate representing engagement in the experimental group, measured on a continuous scale from 0 to 1), it captures an essential feature of experiments evaluating complex interventions: the need to consider patient engagement in both the design and analysis of the experimental data. However, our assumed model can be challenged on a number of grounds. First, it is rather stringent, and it may not hold on the boundary when the engagement is equal to zero. Formal statistical tests may be required to check the plausibility of this assumption, and model-robust inference may be necessary at the analysis stage. We designate this as an important problem and defer it to future work. Second, the formulation of a model adequately describing the relationship between the primary endpoint, treatment, engagement, and possibly other factors is a major problem in itself. It will depend on the disease area, the target patient population, the mechanism of action of the experimental therapy, etc. Product developers and clinical investigators should work closely with statisticians to build scientifically sound models and test/validate them in carefully designed experiments. These considerations merit further investigation, but they are beyond the scope of the present work.”

Round 2

Reviewer 2 Report

The authors have addressed all of the questions and feedback I have provided previously. 

Reviewer 3 Report

Thank you for your responses to my comments. 
