Predicting Adherence to Home-Based Cardiac Rehabilitation with Data-Driven Methods

Filos, Dimitris; Claes, Jomme; Cornelissen, Véronique; Kouidi, Evangelia; Chouvarda, Ioanna

doi:10.3390/app13106120


by Dimitris Filos ¹,*, Jomme Claes ², Véronique Cornelissen ², Evangelia Kouidi ³ and Ioanna Chouvarda ¹,*

¹ Laboratory of Computing, Medical Informatics and Biomedical Imaging Technologies, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

² Department of Rehabilitation Sciences, University of Leuven, 3000 Leuven, Belgium

³ Laboratory of Sports Medicine, Department of Physical Education and Sport Science, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

* Authors to whom correspondence should be addressed.

Appl. Sci. 2023, 13(10), 6120; https://doi.org/10.3390/app13106120

Submission received: 7 April 2023 / Revised: 15 May 2023 / Accepted: 15 May 2023 / Published: 16 May 2023

(This article belongs to the Special Issue Human Activity Recognition (HAR) in Healthcare)


Abstract:

Cardiac rehabilitation (CR) focuses on the improvement of health or the prevention of further disease progression after an event. Despite the documented benefits of CR programs, participation remains suboptimal. Home-based CR programs have been proposed to improve uptake and adherence. The goal of this study was to apply an end-to-end methodology including machine learning techniques to predict the 6-month adherence of cardiovascular disease (CVD) patients to a home-based telemonitoring CR program, combining patients’ clinical information with their actual program participation during a short familiarization phase. Fifty CVD patients participated in such a program for 6 months, enabling personalized guidance during a phase III CR study. Clinical, fitness, and psychological data were measured at baseline, whereas actual adherence, in terms of weekly exercise session duration and patient heart rate, was measured using wearables. Hierarchical clustering was used to identify different groups based on (1) patients’ clinical baseline characteristics, (2) exercise adherence during the familiarization phase, and (3) the whole program adherence, whereas repeated decision tree (DT) and random forest (RF) models were trained on the clustering output to predict long-term adherence. Finally, for each cluster of patients, network analysis was applied to discover correlations of their characteristics that link to adherence. Based on baseline characteristics, patients were clustered into three groups, with differences in behavior and risk factors, whereas adherent, non-adherent, and transient adherent patients were identified during the familiarization phase. Regarding the prediction of long-term adherence, the most common DT showed higher performance compared with RF (precision: 80.2 ± 19.5% and 71.8 ± 25.8%, recall: 94.5 ± 14.5% and 71.8 ± 25.8% for DT and RF, respectively).
The analysis of the DT rules and the analysis of the feature importance of the RF model highlighted the significance of non-adherence during the familiarization phase, as well as that of the baseline characteristics to predict future adherence. Network analysis revealed different relationships in different clusters of patients and the interplay between their behavioral characteristics. In conclusion, the main novelty of this study is the application of machine learning techniques combining patient characteristics before the start of the home-based CR programs with data during a short familiarization phase, which can predict long-term adherence with high accuracy. The data used in this study are available through connected health technologies and standard measurements in CR; thus, the proposed methodology can be generalized to other telerehabilitation programs and help healthcare providers to improve patient-tailored enrolment strategies and resource allocation.

Keywords: adherence; cardiac rehabilitation; machine learning; prediction; exercise; home-based; familiarization phase; telemonitoring

1. Introduction

Cardiovascular diseases (CVDs) constitute one of the major health problems in Europe, accounting for about 45% of all causes of death, and with a continuously increasing prevalence [1]. This high incidence and prevalence of CVD leads to a high personal health burden and places a huge burden on society, with EUR 210 billion/year spent on the management of CVD [2]. Considerable differences in prevalence are observed between European countries, mainly associated with the prevalence of several risk factors, such as smoking, obesity, diabetes, and physical inactivity.

Physical activity has been recognized to have a beneficial effect on the prevention of CVD. Therefore, exercise training is a central part of secondary prevention programs, referred to as “cardiac rehabilitation” (CR) [3]. Based on the clinical guidelines [3], moderate exercise for at least 150 min per week and behavioral changes towards a less sedentary lifestyle can reduce cardiovascular risk factors. The WHO [4] defines adherence as the extent to which a person’s behavior (e.g., lifestyle changes) corresponds with current recommendations. Adherence to physical activity can be quantified in several ways [5]. In general, adherence to lifestyle changes has been recognized as a crucial component towards better management of patients with chronic disease, but this goal is rarely achieved [6,7]. Likewise, despite the beneficial effects of CR, participation rates remain low, with less than 40% of eligible patients attending CR programs [8,9]. Low socioeconomic status, age, gender, the proximity to a CR center, and behavioral aspects such as lack of motivation and reduced self-efficacy have been identified [9] as the main barriers to CR participation.

In order to overcome the aforementioned barriers and increase both uptake and adherence to CR, home-based telerehabilitation services have been developed, considering the advances in technology and the Internet of things (IoT) [10]. Indeed, telerehabilitation programs have proven to be a safe and effective approach to managing heart failure (HF) patients [11]. In a meta-analysis by Claes [12], it was found that center-based and home-based CR had equal effects on exercise capacity, while others found equal effects on quality of life (QoL) and cost-effectiveness [13].

One major advantage of modern telerehabilitation services is the personalized guidance that they offer, which is facilitated by the availability of low-cost and unobtrusive devices that integrate various sensors that are useful for the quantification of exercise response [11]. For example, accelerometry data can be used to evaluate the volume of exercise, heart rate sensors are able to capture exercise intensity, and geolocation services allow for the estimation of walking distance. However, the plethora of available devices also leads to a large heterogeneity in intervention design, ranging from motivational messages [14] to telephone counseling [15] or personalized real-time adaptation of exercise sessions [16].

However, the problem of non-adherence is also apparent in home-based or self-management interventions [17], where significant variations in the levels of adherence have been reported compared to center-based CR programs. Several RCTs [18] and meta-analyses [19,20] have reported that patient-centered approaches show an improvement in patient adherence to CR programs. However, most of these studies were observational and used subjective information and self-reports to quantify adherence [21].

Both patient-related factors and intervention design could be addressed to increase adherence [22]. In a systematic review, it was found that both the program characteristics and personal factors, including health and cognition status, influence adherence to exercise programs [23]. Essery et al. [21] presented a list of factors that are associated with adherence to home-based physical therapies, where it was reported that the perception of health status, self-motivation, or current physical activity level presents a strong positive association with adherence, while daily stress has a strong negative association with it. On the other hand, the incorporation of data-driven or rule-based models can guide decisions, leading to improved adherence [24].

It would be beneficial to integrate objective patient information from a short period of time to predict long-term adherence to exercise and, thus, proceed with appropriate targeted modifications and better use of resources. In [25] the Discontinuation Prediction Score (DiPS) was introduced, which compares each week’s average steps with those of the first week of the program. It can be used to score the probability of dropping out from exercise programs during the week. The prediction model used objectively collected physical activity data from 210 physically inactive women aged 25 to 69 years, and it applied logistic regression and support-vector machines to predict the DiPS. As found, the adherence rate decreased as the program progressed, whereas daily steps at the start of the program and the steps measured during the previous week were significant predictors of DiPS. In a study performed by [26], three different clusters of participants were created based on basic individual characteristics and training data collected during the first three months of the application’s use. Deep learning techniques were applied for the prediction of adherence to exercise during the fourth month of the program. Finally, [27] applied data-characteristics-based long short-term memory (DC-LSTM) recurrent neural networks (RNNs) to predict outdoor physical activity, taking into account patient profiles and environmental characteristics, such as weather, temperature, and humidity. However, in all of these studies, the focus was on the short-term prediction. It would be beneficial to identify the patients who are likely to be non-adherent at an early stage of the program, so as to modify the motivation strategies and to use the resources efficiently. In a previous study performed by our research group, adherence to a short familiarization period for home-based CR was combined with clinical characteristics to predict future adherence [28]. 
A support-vector machine (SVM) classifier was trained using the most significant features. However, in that study, only those patients who could clearly be considered to be adherent or non-adherent during the familiarization phase were included. This resulted in the exclusion of a considerable number of patients who were moderately adherent, hindering the generalization of the method and results.

In this paper, we hypothesize that a predictive model based on machine learning techniques, which integrates (i) patient clinical characteristics, (ii) data from self-reports, and (iii) objective physical activity information gathered during a short familiarization phase, can predict longer-term adherence to a home-based CR program for CVD patients. Therefore, the specific aims of this study were as follows: (1) to cluster patients into distinct groups based on the adherence to the system during a 6-week familiarization period, (2) to investigate significant differences between the groups during the aforementioned period, and (3) to implement a model that could predict the use of the system during a 6-month CR program. Following a data-driven approach, while adherence prediction was considered to be a discrete problem (N classes), the number of classes was not predefined but, rather, identified during the analysis pipeline via clustering. Finally, the predictive model needed to be explainable so that it could be used by the clinical experts to better support patients in adhering to home-based CR or to search for other CR alternatives if predicted adherence to home-based CR was low.

2. Materials and Methods

The graphical overview of the proposed methodology is depicted in Figure 1. Each part of the figure is described in detail in the following sections. In brief, the predictive model was built on data collected from patients with CVD. Different types of data were available, including clinical data and actual usage of the system as recorded by smartwatches. Unsupervised learning methods, namely hierarchical and spectral clustering, were used to assign the patients to groups. Machine learning techniques were then applied to predict long-term adherence to the telerehabilitation program. Finally, network analysis was performed to identify relationships between the features.

2.1. Data Description

This study uses data that were collected during the PATHway-I trial [16]—a single-blinded randomized control trial (RCT) involving 120 patients that were randomized into a usual care group and an intervention group, on a 1:1 basis. Given the scope of the present study, only the patients from the intervention group were included in this analysis.

In brief, Physical Activity Towards Health (PATHway) was a home-based CR platform that aimed to empower patients towards self-management of their CVD [16]. It combined gamified approaches (ExerClass/ExerGames), e-coaching, and outdoor activities such as jogging or bicycling, to promote an active and healthy lifestyle according to standard clinical guidelines [29]. Clinical evaluation performed before the start of the CR program helped clinical experts to set personalized goals and exercise intensities for the patients. Heart rate (HR), captured by a Microsoft Band smartwatch (which measures HR accurately [30]), along with subjective information from questionnaires, allowed for the continuous monitoring and adaptation of the program based on patient performance and preferences, in both the short- and longer-term horizons. The short-term adaptation aimed to guide the patient during the ExerClass/ExerGame sessions to exercise within the personalized beneficial HR zone. This was followed by the provision of a variety of aerobic or resistance exercises of different levels of intensity or difficulty and targeting different body parts. On the other hand, the customization of the exercise program on a weekly basis aimed to improve patients’ adherence to the exercise program. To achieve this goal, a decision support system (DSS) was developed that integrated this patient information with clinical guidelines and experts’ knowledge [24]. Finally, a notification module was included, aiming to provide tailored messages to the patients to maintain their engagement with the PATHway system [31].

Patients randomized to the intervention group participated in a familiarization phase to become acquainted with the home-based CR intervention. During the first 4 weeks, the patient was guided by experts on how to use the PATHway system. Because observing patient behavior without additional supervision is valuable for evaluating adherence to the exercise program, an additional 2-week period was considered, during which the patient used the system without supervision by an expert. Thus, the total duration of the familiarization phase was 6 weeks, and it represented approximately 20% of the whole program’s duration (Figure 2). Patients with a median duration of exercise sessions per week equal to zero were considered to be absent from the program and were excluded from further analysis.

2.1.1. Data from Baseline and Periodic Clinical Evaluation

A plethora of data were collected during baseline and at 3 months and 6 months after the start of the intervention. The data were categorized into three main categories:

Cardiovascular Risk Profile: These markers were collected through blood sampling and anthropometric measurements. The Framingham cardiovascular risk score was calculated as described in [32].
Health-Related Physical Fitness: These data represent the findings from a maximally graded cardiopulmonary exercise test (CPET) on a bicycle, along with muscle strength testing including maximal isometric and isokinetic quadriceps strength, handgrip strength, and a 30 s sit-to-stand test.
Psychological wellbeing and intervention effectiveness: This subjective information was collected using standardized questionnaires assessing QoL [33], physical activity behavior [34,35,36], smoking, alcohol consumption [37], diet [38], stress [39], medication adherence [40], mental wellbeing [41], social support [42], self-efficacy [43], and perceived health status [44,45].

In total, 59 features were measured at baseline and 6-month follow-up, while 52 of these features were collected at 3-month follow-up. A detailed overview of the collected data has been published previously [16,46].

2.1.2. Exercise Session Data

During the execution of the exercise session, the duration of the session was captured either automatically in the case of ExerClass/ExerGames or synchronized later when the patient exercised outdoors. Independent of the type of session (ExerClass/ExerGame or outdoor activity), the heart rate of each patient was captured by the smartwatch, with a sampling frequency of 1 Hz. These data were used to quantify patient performance and adherence to the exercise program.

Exercise Adherence Metric

The adherence to the system was assessed in terms of the mean duration of exercise sessions performed each week. In more detail, the adherence to the exercise program in week i was measured as follows:

$$adher[i] = \frac{SessDuration[i]}{N_{sessions}[i]} \qquad (1)$$

In Equation (1), SessDuration[i] is the total duration of the exercise sessions performed during week i, while Nsessions[i] is the total number of sessions in that week. SessDuration was measured automatically in the case of indoor activity with the PATHway system, while in the case of outdoor activity the patient started and ended the recording of the session. Since a patient could forget to stop the recording, the maximum value for SessDuration[i] was set to 120 min. In addition, sessions with a duration of less than 10 min were excluded from the analysis, as they were characterized as invalid activities [46].
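The computation in Equation (1) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name and the example durations are hypothetical, and the sketch assumes that the 120 min cap is applied to each recorded session and that a week without valid sessions has an adherence of zero.

```python
from statistics import mean

MIN_VALID_MIN = 10     # sessions shorter than this are treated as invalid
MAX_SESSION_MIN = 120  # cap for recordings that were probably never stopped


def weekly_adherence(session_minutes):
    """Return adher[i] for one week: the mean duration of the valid sessions.

    Assumptions (not stated in this form in the text): the 120 min cap is
    applied per session, and a week without valid sessions yields 0.0.
    """
    valid = [min(d, MAX_SESSION_MIN) for d in session_minutes if d >= MIN_VALID_MIN]
    return mean(valid) if valid else 0.0


# Hypothetical week: one invalid 8 min session and one over-long recording.
week = [35, 8, 240, 50]
print(round(weekly_adherence(week), 1))
```

Dividing the total valid duration by the number of valid sessions is the same as taking their mean, which is why `statistics.mean` is used here.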

Exercise Performance Metric

According to [29], a patient must exercise above a minimal HR threshold to achieve health benefits, which is defined as 40% of the maximum HR measured during a cardiopulmonary exercise test (CPET). Thus, HRlower is defined as follows:

$$HR_{lower} = 0.4 \cdot HR_{peak} \qquad (2)$$

where $HR_{peak}$ is the maximum HR measured during baseline CPET. To quantify patient performance, $HR_{time}$ was estimated as the percentage of time that the HR was greater than $HR_{lower}$.

In addition, the $HR_{mean}$ for the $k^{th}$ session was calculated as follows:

$$HR_{mean}[k] = \frac{1}{n} \sum_{i = 1}^{n} HR_{sig}(i) \qquad (3)$$

where n is the total number of samples of the $HR_{sig}$ signal during the session, and $HR_{norm}$ was measured as follows:

$$HR_{norm} = \frac{HR_{mean}}{HR_{peak}} \cdot 100 \qquad (4)$$

This reflects the percentage of the mean session HR with respect to the maximum possible value. These metrics provide averaged values of the subject’s HR and, therefore, are not significantly affected by any artifacts or low signal accuracy that may occur due to the exercise.
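As an illustration of Equations (2)–(4), the following Python sketch computes the four quantities for a single session. The function name, the dictionary keys, and the sample values are hypothetical and are only meant to show how the definitions fit together.

```python
def hr_metrics(hr_sig, hr_peak):
    """Compute HR_lower, HR_time (%), HR_mean and HR_norm (%) for one session.

    hr_sig  : heart-rate samples in beats per minute (1 Hz in the study)
    hr_peak : maximum HR measured during the baseline CPET
    """
    if not hr_sig:
        raise ValueError("hr_sig must contain at least one sample")
    hr_lower = 0.4 * hr_peak                                # Equation (2)
    above = sum(hr > hr_lower for hr in hr_sig)             # samples above threshold
    hr_time = 100.0 * above / len(hr_sig)                   # share of session time
    hr_mean = sum(hr_sig) / len(hr_sig)                     # Equation (3)
    hr_norm = 100.0 * hr_mean / hr_peak                     # Equation (4)
    return {"hr_lower": hr_lower, "hr_time": hr_time,
            "hr_mean": hr_mean, "hr_norm": hr_norm}


# Hypothetical four-sample session for a patient with HR_peak = 160 bpm.
print(hr_metrics([60, 70, 90, 100], hr_peak=160))
```

Because all three session metrics are averages or proportions over the whole signal, a few corrupted samples change them only slightly, which matches the robustness argument made above.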

2.2. Investigation of Different Patient Clusters at Baseline

The patients that attended the home-based CR programs presented different profiles with regard to their exercise behavior or their clinical characteristics. Thus, the first step towards implementing a model that could predict future adherence to the program was to categorize the patients into different clusters. In this study, the clustering was based on (1) the characteristics collected before the start of the program, (2) the adherence to the program during the familiarization phase, and (3) the adherence to the whole 6-month exercise program.

2.2.1. Clustering Baseline Profiles

Hierarchical clustering was used to categorize the patients into different groups based on their baseline characteristics (Table 1). Hierarchical clustering is an algorithm that groups objects with similar characteristics into a tree-like hierarchy [47]. The main advantage of hierarchical clustering is that it is easy to interpret, as the dendrograms provide visual information on the observations and the clusters to which they belong at each level of detail. In the present study, the number of clusters was selected based on the one that maximized the silhouette value [48].
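The selection of the number of clusters by the silhouette value can be sketched as follows. This is a simplified illustration rather than the pipeline used in the study: it assumes Euclidean distances and average linkage (neither is specified in the text), uses only the Python standard library, and runs on hypothetical two-dimensional points instead of the baseline features.

```python
from itertools import combinations
from math import dist


def average_distance(points, a, b):
    """Average-linkage distance between two lists of point indices."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_distance(points, clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a].extend(clusters.pop(b))  # a < b, so index a is unaffected
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def silhouette(points, labels):
    """Mean silhouette value; singleton clusters contribute 0 by convention."""
    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(label, []).append(i)
    total = 0.0
    for i, label in enumerate(labels):
        own = groups[label]
        if len(own) == 1:
            continue
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(average_distance(points, [i], other)
                for other_label, other in groups.items() if other_label != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, k_values):
    """Return the k with the highest mean silhouette, plus all scores."""
    scores = {k: silhouette(points, agglomerate(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Hypothetical data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
k, scores = best_k(points, range(2, 6))
print(k)
```

In practice a library routine would replace the quadratic-time merging loop; the sketch only makes explicit that each candidate number of clusters is scored and the maximum is kept.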

2.2.2. Clustering Familiarization Adherence Behavior

The clustering of the patients was based on the mean duration of exercise sessions performed each week ($adher$) during the 6 weeks of the familiarization phase. In this respect, for each patient, the $adher$ value for each of the 6 weeks was computed, and hierarchical clustering was applied. Maximum silhouette values were used to identify the optimal number of clusters.

2.2.3. Clustering Whole-Program Adherence Behavior

The adherence for the whole 28-week period of the program was based on $adher$, which was calculated for the period after the familiarization phase; thus, 22 weeks were used to cluster the patients. However, because the number of features was comparable to the number of patients included in the study (approximately 1:2), hierarchical clustering was inefficient, as this method is sensitive to outliers [47]. For this reason, spectral clustering was applied to categorize the patients into different groups [49]. A self-tuning kernel [50] was used, and the number of diffusion iterations was set to 18.

2.3. Predictive Modeling for Whole-Program Adherence Prediction

A classification decision tree was built to predict the adherence to the exercise program. Decision trees are supervised learning algorithms that are often used in multiclass classification [51]. The main advantage of their use, apart from their good performance, is their interpretability, as they allow for the visualization of the model in terms of rules. The data used for the model’s development were (1) the clusters of patients that were created based on the baseline characteristics, and (2) the clusters related to the adherence to the exercise program during the familiarization phase.

Because of the small sample size, we ran the model 100 times with different combinations of training and test datasets. Each time, the whole dataset was split into training and testing subsets, at a 9:1 ratio.

The minimum number of observations that should exist in each node of the tree to attempt a split was set to 4, and 10-fold cross-validation was carried out. For the implementation of the model, the “rpart” R package was used [52].
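The repeated 9:1 splitting can be sketched in Python as follows. The study itself used the “rpart” R package for the trees; this sketch only illustrates the resampling scheme, assumes plain (non-stratified) random splits, and uses a hypothetical function name and seed.

```python
import random


def repeated_splits(n_samples, n_runs=100, test_fraction=0.1, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated random hold-out splits.

    Assumption: splits are drawn uniformly at random without stratification.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_runs):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield sorted(idx[n_test:]), sorted(idx[:n_test])


# With 41 patients, each run trains on 37 patients and tests on 4.
for train_idx, test_idx in repeated_splits(41, n_runs=3):
    print(len(train_idx), len(test_idx))
```

A tree would then be fitted on each training subset and evaluated on the corresponding test subset, and the resulting trees and metrics would be tallied over the 100 runs.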

For each of the 100 models, the performance of the classification was measured using precision ($Prec$), recall ($Rec$), and accuracy ($ACC$), which were defined as follows:

$$Prec = \frac{TP}{TP + FP} \qquad (5)$$

$$Rec = \frac{TP}{TP + FN} \qquad (6)$$

$$ACC = \frac{TP + TN}{TP + TN + FP + FN} \qquad (7)$$

where $TP$, $TN$, $FP$, and $FN$ are the numbers of true positives, true negatives, false positives, and false negatives, respectively. The adherent group was selected to be the positive group. The frequency of each model was calculated, and the mean performance metrics were extracted.
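Equations (5)–(7) can be illustrated with a short Python sketch in which the adherent class is treated as positive and every other class as negative. The function names, the class labels, and the choice to return 0.0 for an undefined ratio are assumptions made for illustration only.

```python
from statistics import mean, stdev


def classification_metrics(y_true, y_pred, positive="adherent"):
    """Return (Prec, Rec, ACC) with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    prec = tp / (tp + fp) if tp + fp else 0.0  # assumption: 0.0 when undefined
    rec = tp / (tp + fn) if tp + fn else 0.0
    acc = (tp + tn) / len(pairs)
    return prec, rec, acc


def summarize(values):
    """Mean and sample standard deviation over repeated runs."""
    return mean(values), stdev(values)


# Hypothetical test fold of four patients.
y_true = ["adherent", "adherent", "non-adherent", "non-adherent"]
y_pred = ["adherent", "adherent", "adherent", "non-adherent"]
print(classification_metrics(y_true, y_pred))
```

With test subsets of only a few patients, a single misclassification moves these ratios considerably, which is consistent with the wide standard deviations reported in the abstract.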

However, one of the main drawbacks of decision trees is instability, especially in cases where the sample size is small; thus, minor changes in the training dataset can lead to modifications in the tree. Therefore, a random forest (RF) technique was applied, which is more stable and robust. An RF uses voting techniques to aggregate tree-structured classifiers into a single classifier [53]. A 10-fold cross-validation was applied, and 100 runs of the RF were used to extract the most important features and the performance metrics $Prec$, $Rec$, and $ACC$.

2.4. Statistical Differences between the Groups

The Kruskal–Wallis non-parametric statistical test was used to investigate the existence of significant differences among the groups of patients, as well as to identify any significant differences between different time periods, since this test is more robust when the sample size is small [54]. For the comparison between time periods, the analysis was based on the differences computed between the two periods. In all cases, differences were considered statistically significant at a probability threshold of 0.05.
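For readers unfamiliar with the test, the Kruskal–Wallis H statistic can be sketched in plain Python as below. The function names and the sample values are hypothetical. The p-value helper is an assumption-laden shortcut: it uses the chi-square approximation and is valid only for exactly three groups (two degrees of freedom), where the chi-square survival function reduces to exp(−H/2).

```python
from collections import Counter
from math import exp


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, offset = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        total += rank_sum ** 2 / len(g)
        offset += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def p_value_three_groups(h):
    """Chi-square approximation for exactly three groups (df = 2)."""
    return exp(-h / 2)


# Hypothetical feature values for three clusters of patients.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 2), round(p_value_three_groups(h), 4))
```

The chi-square approximation is rough for groups as small as some of the clusters in this study, so a statistics package with exact or permutation-based p-values would be preferable in practice.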

Spearman’s rank correlation coefficient $\rho$ was used to estimate the rank association between the variables, and it was computed as follows:

$$\rho = \frac{cov(R(X), R(Y))}{\sigma_{R(X)} \sigma_{R(Y)}} \qquad (8)$$

where $R(X_{i})$ and $R(Y_{i})$ are the ranks of the variables $X_{i}$ and $Y_{i}$, respectively, $cov$ is the covariance, and $\sigma_{R(X)}$ and $\sigma_{R(Y)}$ are the standard deviations of $R(X)$ and $R(Y)$, respectively. This measure is non-parametric and is recommended when the data do not necessarily come from a normal distribution.
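Equation (8) amounts to a Pearson correlation computed on ranks, which the following Python sketch makes explicit. The function names and example values are hypothetical; tied values receive the mean of their ranks.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman_rho(x, y):
    """Spearman's rho: covariance of the ranks over the product of their SDs."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    # The 1/n factors of the covariance and the standard deviations cancel.
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical weekly adherence (min) and normalized HR (%) for five weeks.
adher = [10, 20, 30, 40, 50]
hr_norm = [61, 63, 62, 65, 64]
print(round(spearman_rho(adher, hr_norm), 2))
```

The function is undefined when one of the variables is constant, because the standard deviation of its ranks is then zero.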

2.5. Network Analysis Per Group

Network research aims to understand how a process works and identify the system components as well as the statistical relations between them, with the former being represented as the nodes of the system and the latter as links between the nodes [55]. Following this systems medicine approach [56], psychological networks have been widely used in recent years to conceptualize the interplay of different components of human behavior [57].

In this study, a network analysis was performed to identify the network structure for each group of patients based on their baseline characteristics, their adherence during the familiarization phase, and their adherence to the whole program. The data previously used for the creation of the clusters related to the baseline characteristics and the adherence during the familiarization phase were also used to create the networks. For the network analysis of adherence to the whole exercise program, both types of data were considered. However, in all cases, only features that presented statistically significant differences between the clusters using the Kruskal–Wallis test were used for the creation of the networks. In addition, for a more accurate estimation of the networks, the number of nodes had to be less than the number of members of the group. Therefore, the features were ordered based on the p-values calculated using the Kruskal–Wallis test, and only the most significant were included in the analysis.

The network analysis was implemented in R, using the “glasso” package [58] based on [59] for LASSO regularization. In more detail, the Gaussian graphical model (GGM) [60] was estimated using “glasso” and EBIC model selection, since it has been found that this combination works well in retrieving the correct network structure [61]. To assess the importance of the nodes in the network structure, three measures were used: node strength and closeness quantify how well a node is directly or indirectly connected to others, respectively, while betweenness quantifies the node’s importance in the average path between two other nodes [57].
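Of the three centrality measures, node strength is the simplest to illustrate. The Python sketch below computes it from a symmetric matrix of edge weights such as the partial correlations of an estimated GGM. The matrix values are hypothetical and are not taken from the study, and the function name is an assumption.

```python
def node_strength(weights, names):
    """Sum of absolute edge weights attached to each node (diagonal ignored)."""
    return {
        name: sum(abs(w) for j, w in enumerate(row) if j != i)
        for i, (name, row) in enumerate(zip(names, weights))
    }


# Hypothetical regularized partial correlations between three features.
names = ["BARSE", "steps", "PSS"]
weights = [
    [0.0, 0.4, -0.2],
    [0.4, 0.0, 0.0],
    [-0.2, 0.0, 0.0],
]
print(node_strength(weights, names))
```

Absolute values are used so that strong negative associations count as much as strong positive ones; closeness and betweenness additionally require shortest-path computations and are not covered by this sketch.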

Regarding the metrics that were captured during the session ($adher$, $HR_{norm}$, and $HR_{time}$), temporal networks were created using the “graphicalVAR” package [62]. The number of LASSO tuning parameters that were tested was set to 50.

3. Results

3.1. Absent Versus Present Patients during Familiarization

Of the 50 patients that were included in the intervention group (after the exclusion of the patients who dropped out), 9 were considered to be absent (Absent group) from the exercise program, since they exercised very sparsely, i.e., their median weekly duration of exercise sessions during the familiarization phase, as well as in the following weeks, was zero minutes. Therefore, they were excluded from further analysis; thus, 41 patients were included in the Present group.

It was found that the Absent patients had a lower score on the BARSE questionnaire, which measures the subjects’ perceived capabilities to exercise three times per week for at least 40 min over the next two months [35]. In addition, the Absent patients had a lower sedentary time and a higher light activity time during baseline testing (Table 1). These findings suggest that the Absent patients felt capable of engaging in enough physical activity by themselves, and they considered that they did not need the telerehabilitation system to become more active.

While the Absent patients were not further studied in the next sections, it is worth noting their differences with the Present group regarding their clinical characteristics after the 6-month intervention period. It was found that Present patients reduced their waist circumference, while their BMI, triglyceride levels, and peak load during the CPET remained stable. In contrast, the Absent patients had increased BMI, waist circumference, and triglyceride levels, and a reduced peak load during CPET. These findings suggest that exercise had a slightly positive effect on patients who participated in the CR program.

Finally, based on the observation of the exercise behavior in terms of the number of sessions performed as well as their duration, it was found that the patients who were characterized as absent during the familiarization phase continued to remain inactive during the rest of the program (Figure 3). The statistical analysis of the mean duration of the sessions each week revealed the existence of statistically significant differences between the groups—mainly during the first half of the 6-month program.

3.2. Patient Profile Clusters

The hierarchical clustering resulted in the creation of different clusters of Present patients considering their baseline characteristics and their exercise behavior during the familiarization period. More details are provided in the following sections.

3.2.1. Clusters Based on Clinical Baseline Characteristics

Three clusters of patients were found to maximize the silhouette value, using hierarchical clustering analysis on the baseline characteristics. Cluster 1 included 5 patients, while 15 and 21 patients were included in Clusters 2 and 3, respectively. The use of the Kruskal–Wallis test revealed statistically significant differences between the three clusters (Table 2).

As shown in Table 2, Cluster 1 included patients with lower cardiovascular risk compared to the patients from the other clusters. Those patients were confident that they could exercise regularly (BARSE), and this was reflected in a higher daily number of steps and lower sedentary time. The opposite behavior was observed in the patients included in Cluster 2. Those patients were more sedentary and less physically active, as reflected by lower daily levels of MVPA and steps. In addition, they had the lowest PSS scores and the highest BARSE scores, glucose levels, and cardiovascular risk. Finally, the third and largest cluster included patients who were active, as they achieved the recommended guidelines for daily steps and MVPA, and their muscular strength was the highest compared with the other groups. However, these patients were less confident that they could exercise regularly, and they had the lowest scores in the PACE survey, which captures the attainment of physical activity guidelines. For these reasons, Cluster 1 is referred to as “Low-Risk”, Cluster 2 as “High-Risk”, and Cluster 3 as “Average-Baseline”.

3.2.2. Clusters of Patient Adherence during Familiarization

Three clusters were identified based on the hierarchical clustering, containing 12, 24, and 5 patients, respectively. The observation and the statistical analysis of the mean $adher$ values revealed information regarding the exercise behavior of the patients in each group. As shown in Figure 4, during the first two weeks, the patients from all of the clusters presented similar behavior as they attended the demonstration sessions, and they performed one ExerClass or ExerGame.

Slight differences were observed for the following 2 weeks, during which the demonstration of the systems was performed. However, the p-values obtained from the Kruskal–Wallis test decreased continuously, and the differences became statistically significant in the last two weeks of the familiarization phase, when the patients used the system at their homes without any supervision (Table 3). Patients from Cluster 1 presented a continuous and gradual increase in their adherence, while patients from Cluster 2, which was the largest cluster, had a low adherence that decreased even further during the last 2 weeks. Finally, a small cluster of patients (Cluster 3) presented fluctuations regarding their time spent exercising (Figure 4). For this reason, and for simplicity, Cluster 1 was named “adherent-6w”, Cluster 2 “non-adherent-6w”, and Cluster 3 “transient-6w”.
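The weekly between-cluster comparison relies on the Kruskal–Wallis test, whose statistic is $H = \frac{12}{N(N+1)} \sum_i \frac{R_i^2}{n_i} - 3(N+1)$, where $R_i$ is the rank sum and $n_i$ the size of group $i$. The following is a minimal sketch for exactly three groups, where the chi-square approximation with two degrees of freedom gives the closed form $p = e^{-H/2}$. The weekly minutes below are hypothetical and the function name `kruskal_wallis_3` is an assumption for illustration.

```python
import math
from collections import Counter

def kruskal_wallis_3(groups):
    """Kruskal-Wallis H and approximate p-value for exactly three groups."""
    if len(groups) != 3:
        raise ValueError("this sketch only handles three groups (df = 2)")
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # Assign each distinct value the average of the ranks it occupies (ties).
    rank = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j
    h = 12 / (n * (n + 1)) * sum(
        sum(rank[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    # Standard tie correction: divide H by 1 - sum(t^3 - t) / (N^3 - N).
    ties = Counter(pooled).values()
    h /= 1 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)
    p = math.exp(-h / 2)   # chi-square survival function for df = 2
    return h, p

# Hypothetical exercise minutes in one week for three clusters.
adherent = [95, 110, 120]
transient = [60, 70, 80]
non_adherent = [10, 20, 30]
h, p = kruskal_wallis_3([adherent, transient, non_adherent])
print(round(h, 3), round(p, 4))
```

With such small groups the chi-square approximation is rough, so an exact or permutation-based p-value may be preferable in practice.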

The evolution of the performance metrics during the familiarization phase for the three clusters is depicted in Figure 5a,b. As observed, all patients, independent of their adherence to the exercise program, spent more than 60% of their time with an HR above the lower HR threshold, and the mean HR during the session was 80% of the maximum HR. Although the non-adherent-6w patients had lower $\mathit{HR}_{\mathit{time}}$ values in most of the weeks, there were no statistically significant differences between the clusters (Table 4). These results suggest that when the patients exercised, they performed moderate-to-vigorous activity, and they performed similarly, independent of how frequently they participated in the rehabilitation program.

Finally, the correlation matrices in Figure 5c show a strong negative correlation between the adherence and the mean HR during the session (as a percentage of the maximum HR) in the adherent-6w group, while for transient-6w patients the correlation was strongly positive, and for the non-adherent-6w group the correlation was tight. Taking Figure 5b into account as well, where adherent-6w patients present lower performance compared with transient-6w patients, this finding suggests that adherence did not necessarily lead to better performance during exercise and that, generally, the patients tended to exercise in beneficial HR zones (Figure 5a).

3.3. Program Adherence Clusters

The spectral clustering that was based on $\mathit{adher}$ for the period after the familiarization phase (week 6) resulted in the formation of two clusters. Figure 6 depicts the $\mathit{adher}$ over the whole intervention period (32 weeks). The first cluster included 24 patients and represented those individuals that were adherent to the exercise program, while the second cluster consisted of non-adherent patients (17 members). As depicted in Figure 6, even for the “Adherent” cluster, a slight decrease in exercise duration was observed during the last 4 weeks of the program.

3.4. Predicting Program Adherence

Figure 7 provides a visual overview of the distribution of patients into different clusters based on the analysis performed. As depicted, the majority of the patients who remained non-adherent during the whole program’s duration were also non-adherent during the familiarization phase. On the other hand, patients who were adherent during the familiarization phase tended to also be adherent for the whole program. One additional interesting finding is that the active patients with low cardiovascular risk during baseline did not adhere to the exercise program.

A decision tree model was created to predict future adherence to a home-based CR program according to the clinical data at baseline and the adherence during a short familiarization phase. Based on multiple train/test splits and model building with cross-validation in each run, the most frequent models, which together represented 92% of the total number of models, are depicted in Figure 8. As observed, model “a” (left) was the most frequent model, and it had the highest performance (accuracy = 82.3 ± 14.7%, precision = 80.2 ± 19.5%, and recall = 94.5 ± 14.5%).
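One way to identify the most frequent tree across repeated runs and to summarize its performance as mean ± SD is sketched below. It assumes that each fitted tree can be reduced to a short signature string describing the order of its splits; the signatures, the metric values, and the function name `summarize` are hypothetical and serve only to illustrate the bookkeeping.

```python
from collections import Counter
from statistics import mean, stdev

# One entry per run: (tree signature, precision, recall). Values are hypothetical.
runs = [
    ("familiarization>baseline", 0.80, 1.00),
    ("familiarization>baseline", 0.75, 0.90),
    ("baseline>familiarization", 0.70, 0.85),
    ("familiarization>baseline", 0.85, 0.95),
    ("familiarization+transient>baseline", 0.78, 0.92),
]

def summarize(runs):
    """Find the most frequent tree signature and its mean and SD metrics."""
    counts = Counter(sig for sig, _, _ in runs)
    top_sig, top_n = counts.most_common(1)[0]
    precisions = [p for sig, p, _ in runs if sig == top_sig]
    recalls = [r for sig, _, r in runs if sig == top_sig]
    return {
        "signature": top_sig,
        "share": top_n / len(runs),                      # fraction of runs
        "precision": (mean(precisions), stdev(precisions)),
        "recall": (mean(recalls), stdev(recalls)),
    }

summary = summarize(runs)
print(summary)
```

The large standard deviations reported in this section are what one would expect when each test split contains only a handful of patients.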

The rules of the model with the highest performance (Figure 8a) were as follows:

A patient who is recruited for a home- and exercise-based rehabilitation program has a 58.5 ± 2.5% probability of being adherent without any additional knowledge.
If the patient is adherent during the familiarization phase, then the probability of being adherent for the whole program reaches 92.3 ± 3.5%.
For a patient who is non-adherent or has a transient exercise behavior in the familiarization phase, the probability of being non-adherent for the rest of the program is 55.6 ± 3%.
- If those patients are of high risk, based on the baseline characteristics, then the probability of being non-adherent increases to 82.4 ± 3.9%.
- If those patients are of low risk or are included in the average-baseline cluster, then the probability of being adherent is 63.3 ± 3.3%.
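The rules listed above can be written as a small lookup function. This is a sketch that uses only the mean leaf probabilities reported for model “a” and ignores their standard deviations; the function name `adherence_probability` and the category strings are assumptions chosen for illustration.

```python
def adherence_probability(familiarization, baseline):
    """Approximate probability of whole-program adherence under model "a".

    familiarization: "adherent", "non-adherent", or "transient"
    baseline: "low-risk", "high-risk", or "average-baseline"
    """
    if familiarization == "adherent":
        return 0.923                     # rule 2: adherent during familiarization
    if familiarization not in ("non-adherent", "transient"):
        raise ValueError(f"unknown familiarization cluster: {familiarization!r}")
    if baseline == "high-risk":
        return 1 - 0.824                 # leaf reports P(non-adherent) = 82.4%
    if baseline in ("low-risk", "average-baseline"):
        return 0.633
    raise ValueError(f"unknown baseline cluster: {baseline!r}")

for fam in ("adherent", "transient", "non-adherent"):
    for base in ("low-risk", "high-risk", "average-baseline"):
        print(fam, base, round(adherence_probability(fam, base), 3))
```

Note that the baseline cluster only matters for patients who were not clearly adherent during familiarization, which is the structure of the tree in Figure 8a.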

Model “b” (middle) is very similar to model “a”; the difference is that the branch for continued adherence (second rule) also includes the patients with transient adherence during the familiarization phase, while the third rule is the same, with very similar probabilities. The third model, model “c”, uses rules similar to those of model “a”, but it considers the baseline clusters first and then the adherence during the familiarization phase.

This instability of the decision tree classification was reduced by the use of the RF classification technique. Four features were identified in all of the RF runs as being significant for the classification of adherence. Figure 9 depicts the mean importance of the features that were used in each of the 100 runs of the classifier. The performance of the RF model (accuracy = 73.4 ± 17.5%, precision = 71.8 ± 25.8%, and recall = 87.7 ± 24%) was lower compared with the most frequent decision tree model (model “a”), but the RF model was more robust, revealing the importance of transient-6w users for the prediction of adherence to the whole program.
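For reference, the three performance measures compared here follow directly from the confusion counts of a test split. The sketch below assumes that “adherent” is treated as the positive class, which is an assumption and not stated in the text; the counts are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # share of all correct predictions
        "precision": tp / (tp + fp),     # predicted adherent who truly were
        "recall": tp / (tp + fn),        # truly adherent who were detected
    }

# Hypothetical counts for a single test split.
metrics = classification_metrics(tp=9, fp=2, fn=1, tn=5)
print({name: round(value, 3) for name, value in metrics.items()})
```

A recall that is higher than the precision, as reported for both models, indicates that few adherent patients are missed at the cost of labeling some non-adherent patients as adherent.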

3.5. Network Analysis and Detection of Structure in Patient Profiles

Network analysis focused on the comparison of the bigger clusters for each type of analysis, i.e., the two clusters that included the most members. For baseline characteristics, the clusters with the high-risk and sedentary patients and the high-risk and fit patients were compared. The nodes on the graphs represent the features that were statistically significant between all three groups. As shown in Figure 10 (upper), the structure of the networks differed. In the cluster with the high-risk and fit patients, a stronger interplay among the characteristics was observed. The centrality measures denoted that the effect of each node was stronger for most of the features. Although a strictly causal relation was not defined, this structure may suggest that it is possible to drive changes in some factors and see effects in others, much more than in the high-risk and sedentary group. The main differences between the groups were as follows:

In the high-risk and fit group, the risk was correlated with glucose and SBP, while in the high-risk and sedentary group, the risk was correlated with the level of MVPA.
In the average group, STS and SBP were positively correlated, while in the high-risk and sedentary group they were negatively correlated.
In the high-risk and fit group, the main connections included peak HR–MVPA–STS–SBP (physical/cardiovascular condition), while in the high-risk and sedentary group a glucose–SBP–STS link prevailed.
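The idea of comparing network structure and node centrality between clusters can be illustrated with a minimal correlation network. The sketch assumes Pearson correlation, an arbitrary edge threshold of 0.5, and strength (the sum of absolute edge weights) as the centrality measure; these choices, the toy values, and the function names are illustrative assumptions and may differ from the network estimation used in this study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_network(features, threshold=0.5):
    """Keep an edge between two features when |r| reaches the threshold."""
    names = list(features)
    edges = {}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            r = pearson(features[u], features[v])
            if abs(r) >= threshold:
                edges[(u, v)] = r
    return edges

def strength(edges):
    """Strength centrality: sum of absolute edge weights per node."""
    result = {}
    for (u, v), r in edges.items():
        result[u] = result.get(u, 0.0) + abs(r)
        result[v] = result.get(v, 0.0) + abs(r)
    return result

# Hypothetical feature values for five patients of one cluster.
cluster = {
    "risk":    [1, 2, 3, 4, 5],
    "glucose": [2, 4, 6, 8, 10],
    "SBP":     [5, 4, 3, 2, 1],
    "MVPA":    [3, 1, 4, 1, 5],
}
edges = correlation_network(cluster)
centrality = strength(edges)
print(edges)
print(centrality)
```

Running the same functions on a second cluster and comparing the edge sets and strengths gives a simple, if crude, analogue of the between-group comparison described above.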

More interpretable results of the network analysis are provided by the comparison of the clusters that were created based on the $\mathit{adher}$ during the familiarization period. In this case, the comparison focused on the adherent and non-adherent clusters, and each node denotes a week. As depicted in Figure 11 (upper), for the adherent group, there were positive relationships with $\mathit{adher}$. In the non-adherent cluster, there was a break of the positive relationship between week 4 and week 5.

The temporal graphs suggest a causal relationship between exercise HR performance and the next week’s adherence. In the transient adherence group, (a) adherence was positively affected by previous adherence and by good $\mathit{HR}_{\mathit{norm}}$ (i.e., performance), and (b) adherence improved the next week’s $\mathit{HR}_{\mathit{time}}$. In the non-adherent group, $\mathit{HR}_{\mathit{time}}$ positively affected $\mathit{adher}$. These links did not exist in the adherent group, in which the adherence behavior was not affected by the performance within the session. This means that there may be room for adjustments in the exercise sessions, to improve performance and influence adherence.
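A temporal link of this kind can be probed, in its simplest form, with a lag-1 correlation between one week’s HR-based performance and the following week’s adherence. The sketch below is not the temporal graph estimation used in this study; the weekly values and the function name `lagged_correlation` are hypothetical, and a lagged correlation on its own does not establish causality.

```python
import math

def lagged_correlation(cause, effect, lag=1):
    """Pearson r between cause[t] and effect[t + lag]."""
    x = cause[:len(cause) - lag]
    y = effect[lag:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical weekly series for one patient over six weeks.
hr_time = [0.5, 0.6, 0.7, 0.6, 0.8, 0.9]   # share of session time above threshold
adher = [30, 50, 60, 70, 60, 80]           # exercise minutes per week

same_week = lagged_correlation(hr_time, adher, lag=0)
next_week = lagged_correlation(hr_time, adher, lag=1)
print(round(same_week, 3), round(next_week, 3))
```

In this toy series the next-week correlation is stronger than the same-week one, which is the pattern that would motivate adjusting session intensity to influence later adherence.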

Finally, regarding the analysis of the graphs for the two clusters that were created based on adherence to the whole exercise program, the 10 most significant features were considered. Those features included adherence in weeks 2–6 of the familiarization phase, as well as five features based on the baseline characteristics. The network structure for both the adherent and non-adherent groups is depicted in Figure 12. As observed, weight and pVO2 were important and influential nodes in the non-adherent group, whereas in the adherent group, the adherence in weeks 2–6 remained correlated.

4. Discussion

Although the beneficial effects of CR have been thoroughly described in several studies, the uptake of and adherence to center-based or home-based CR remain suboptimal. The limited adherence to CR programs has implications for patients’ clinical status and the effective use of resources.

This study proposes an approach to predict long-term exercise adherence in a home-based CR setting, based on readily available baseline data before the start of a CR program. These data include clinical information, behavioral characteristics, and cardiovascular fitness, as well as HR and exercise duration during a familiarization phase of the intervention.

The methodology is based on the combination of unsupervised and supervised machine learning techniques in order to predict, from the initial stages of the CR programs, those patients who are more likely to be adherent during a 6-month period. In more detail, the unsupervised methods aim to identify different patients’ profiles based on clinical and behavioral characteristics, whereas supervised techniques use these profiles to make the prediction. To the best of our knowledge, based on the literature, this is the first data-driven end-to-end method that is able to predict long-term adherence in such programs, using data that are commonly collected during CR programs.

Initially, clustering was chosen as an unsupervised method to show the group characteristics at baseline and the adherence behavior in a limited familiarization period, without imposing a binary problem. The baseline data led to the formation of three patient groups, suggesting (1) a low-risk and active group of patients, (2) high-risk sedentary patients, and (3) a considerable number of patients who were of high cardiovascular risk but were also fit and motivated. The exercise familiarization showed three adherence behaviors (high adherence, low adherence, and transient adherence), while the exercise sessions after the familiarization phase led to two clusters: adherent and non-adherent. These two clusters were the targets for prediction, while the clusters based on the baseline data and the familiarization phase served as inputs for the prediction model.

Two types of models were tested: (1) decision trees, and (2) RF. The first type is more interpretable but also unstable, while the second type offers both robustness and explainability. Regarding the decision trees, the most common model produced after 100 runs with 10-fold cross-validation achieved both high precision and high recall (80.2 ± 19.5% and 94.5 ± 14.5%, respectively), and the rules were simple and explainable. As shown in Figure 8a, only approximately 60% of the target patients were adherent. However, if a patient was adherent during the familiarization phase, the long-term adherence rates exceeded 90%. For the rest of the patients, their clinical profiles can help the clinical experts to identify the non-adherent ones. A similar conclusion can also be reached by the observation of Figure 8b,c.

On the other hand, the RF model had lower performance (precision = 71.8 ± 25.8% and recall = 87.7 ± 24%), but it also revealed the importance of non-adherence during the familiarization phase and the high-risk and sedentary profile for the prediction of the whole-program adherence.

Previous studies focused on groups of patients that presented clear exercise behavior in terms of adherence, while they excluded patients with intermediate behaviors from the analysis [28], thereby somewhat limiting the generalizability of the model. In the present study, the transient adherence and initial clinical profile were found to be important for the prediction in both the RF model and the decision tree model (Figure 9 and Figure 8b, respectively). However, the validation of the models using an external dataset is a necessary next step.

While the decision tree model predicted that those who were adherent in the familiarization phase would continue to be adherent, it also shed light on the other cases, where the combination of adherence profile and clinical baseline seemed to play a role in subsequent adherence. For example, the patients with high cardiovascular risk seemed to be more susceptible to support and improved adherence, while patients with a low cardiovascular risk might need different handling, as they were predicted to continue being non-adherent, potentially because they had already established a physically active lifestyle and, perhaps, did not have the motivation to follow a specific program.

This is an important point that recognizes and sheds some light on the gray zone profiles or behaviors, which is also supported by the network analysis. Different network structures of baseline characteristics showed more correlated features in the high-risk and fit group of patients, and potentially more room for intervention. Temporal analysis at familiarization showed an interplay between HR performance and adherence in the transient and non-adherent groups, with adherence influenced by $\mathit{HR}_{\mathit{time}}$ or $\mathit{HR}_{\mathit{norm}}$, which may also suggest further room for improvement and personalization of sessions. In the present study, exercise intensity was not a factor predicting adherence; however, in the temporal graph analysis a temporal link between HR and subsequent session duration was noted.

The role of both the familiarization phase and patient self-confidence is closely linked to understanding the program and self-motivation of patients. This has also been highlighted by [63], who mentioned family support to help keep patients engaged in a home-based CR program and suggested educating both patients and families to improve adherence to home-based CR programs. In addition, several previous works have identified factors that affect either short-term or long-term adherence to home-based CR programs, such as self-motivation, physical activity levels, or perception of self-status [9,21], and they propose patient-centered strategies to improve adherence to the exercise programs [18,64].

On the other hand, few works have attempted to make a predictive model in a data-driven manner to increase the chances of a match between patient and CR program. Recently, predictive models using machine learning techniques have been proposed [26]. However, those studies are only able to perform short-term predictions, while our model provides a longer-term adherence prediction.

A major advantage of the methodology presented in the present article was that the models were based on variables that are easily collected. In addition, the present study does not disregard the patients with intermediate behavior, such as transient adherence, to make future predictions, making our approach more generalizable compared with previous studies [28].

Addressing adherence to lifestyle changes, including exercise training, is both important and very difficult, since participation rates in CR programs depend on several factors. Understanding those factors and predicting patients’ behavior, such as exercise compliance, is important in clinical practice; thus, the clinical implications of our work could be substantial. First, identifying areas for improvement in the interventions can increase adherence and the effectiveness of home-based CR; this, in turn, can lead to a better health status and quality of life for patients with CVD. In addition, since home-based CR methods have also proven to be more cost-effective, this could also help alleviate the financial stress placed on healthcare systems by the management of CVD patients [65]. Second, being able to predict adherence to home-based CR could contribute to better allocation of resources. Tang [66] showed that patient characteristics influence the choice of a certain type of CR delivery mode. The clinician could use this information to advise for or against home-based CR for a specific patient, increasing the likelihood of a match between patient and CR program.

The main limitation of the present study is the fact that the results were based on a small dataset (41 patients) collected as part of an RCT described in [16]. However, the data used for the predictive models and patient clustering are typically collected before a patient is recruited into a CR program, which increases the generalizability of the method and makes it feasible to increase the sample size and allow for external validation of the models. Nonetheless, this method needs to be validated using larger datasets, and this is one of the future directions of our study.

An additional limitation is that in the present study, adherence was mainly associated with the use of a home-based CR platform. However, the use of technology during exercise may not fully cover or represent adherence to the desired health behavior. As observed in Figure 6, patients in the adherent group presented a decrease in their adherence over time. This finding could be explained by the fact that these patients were becoming more confident in their physical activity behavior and might choose to exercise on their own, without the constant need to be stimulated by a home-based system. Finally, information related to the age and the sex of the patients was not available during the analysis, and their inclusion could lead to different clusters based on patients’ baseline characteristics. This lack of information is a limitation of our study.

The results of the present study highlighted the importance of patients’ characteristics and behavior in the familiarization phase for predicting adherence to home-based CR programs. Considering that CR programs are effective in improving patients’ functional capacity, psychosocial status, and quality of life, technology should be leveraged for the widespread implementation of CR programs in patients with CVD or other chronic diseases.

Author Contributions

I.C. conceived the idea and formulated the research goals. I.C. and D.F. designed and developed the methodology and implemented the computer code and the algorithms. J.C. and D.F. preprocessed the data. D.F. and I.C. coordinated the writing of all drafts of the manuscript. J.C., E.K. and V.C. provided knowledge in the field of cardiac rehabilitation and performed the critical review of the manuscript. All authors contributed to the submitted versions of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Institutional Review Board Statement

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. The study protocol was approved by the ethics committees of UZ Leuven/KU Leuven (Belgium; S59023), the Research Ethics Committees of Mater Misericordiae University Hospital and Beaumont University Hospital in Dublin, Ireland (1/378/1846), and the ethics committee of Dublin City University (DCU; REC2016/123), Ireland.

Informed Consent Statement

Informed consent was obtained from all individual participants included in the study.

Data Availability Statement

Data sharing is not applicable to this article.

Acknowledgments

We would like to thank the PATHway consortium for their cooperation in completing the PATHway trial and for providing the data necessary to make this work possible. We would also like to thank Elisavet Koutsiana for her support in cleaning the data. More information about the PATHway trial can be found at: https://cordis.europa.eu/project/id/643491 (accessed on 16 May 2023).

Conflicts of Interest

The authors declare no potential conflict of interest with respect to the research, authorship, and/or publication of this article.

References

Timmis, A.; Townsend, N.; Gale, C.P.; Torbica, A.; Lettino, M.; Petersen, S.E.; Mossialos, E.A.; Maggioni, A.P.; Kazakiewicz, D.; May, H.T.; et al. European society of cardiology: Cardiovascular disease statistics 2019. Eur. Heart J. 2020, 41, 12–85.
Wilkins, E.; Wilson, L.; Wickramasinghe, K.; Bhatnagar, P.; Leal, J.; Luengo-Fernandez, R.; Burns, R.; Rayner, M.; Townsend, N. European Cardiovascular Disease Statistics 2017; European Heart Network: Brussels, Belgium, 2017.
Piepoli, M.F.; Hoes, A.W.; Agewall, S.; Albus, C.; Brotons, C.; Catapano, A.L.; Cooney, M.T.; Corrà, U.; Cosyns, B.; Deaton, C.; et al. 2016 European Guidelines on cardiovascular disease prevention in clinical practice. Eur. Heart J. 2016, 37, 2315–2381.
WHO. Adherence to Long-Term Therapies: Evidence for Action; WHO: Geneva, Switzerland, 2003.
Livitckaia, K.; Koutkias, V.; Maglaveras, N.; Kouidi, E.; Van Gils, M.; Chouvarda, I. Adherence to physical activity in patients with heart disease: Types, settings and evaluation instruments. In Proceedings of the International Conference on Biomedical and Health Informatics, Thessaloniki, Greece, 18–21 November 2017; pp. 255–259.
Naderi, S.H.; Bestwick, J.P.; Wald, D.S. Adherence to drugs that prevent cardiovascular disease: Meta-analysis on 376,162 patients. Am. J. Med. 2012, 125, 882–887.e1.
Bjarnason-Wehrens, B.; McGee, H.; Zwisler, A.D.; Piepoli, M.F.; Benzer, W.; Schmid, J.P.; Dendale, P.; Pogosova, N.G.V.; Zdrenghea, D.; Niebauer, J.; et al. Cardiac rehabilitation in Europe: Results from the European Cardiac Rehabilitation Inventory Survey. Eur. J. Prev. Cardiol. 2010, 17, 410–418.
Kotseva, K.; De Backer, G.; De Bacquer, D.; Rydén, L.; Hoes, A.; Grobbee, D.; Maggioni, A.; Marques-Vidal, P.; Jennings, C.; Abreu, A.; et al. Lifestyle and impact on cardiovascular risk factor control in coronary patients across 27 countries: Results from the European Society of Cardiology ESC-EORP EUROASPIRE V registry. Eur. J. Prev. Cardiol. 2019, 26, 824–835.
Chindhy, S.; Taub, P.R.; Lavie, C.J.J.; Shen, J. Current Challenges in Cardiac Rehabilitation: Strategies to Overcome Social Factors and Attendance Barriers. Expert Rev. Cardiovasc. Ther. 2020, 18, 777–789.
Rose, K.; Eldridge, S.; Chapin, L. The Internet of Things (IoT): An Overview. Int. J. Eng. Res. Appl. 2015, 5, 71–82.
Cavalheiro, A.H.; Silva Cardoso, J.; Rocha, A.; Moreira, E.; Azevedo, L.F. Effectiveness of Tele-rehabilitation Programs in Heart Failure: A Systematic Review and Meta-analysis. Health Serv. Insights 2021, 14, 1–10.
Claes, J.; Buys, R.; Budts, W.; Smart, N.; Cornelissen, V.A. Longer-term effects of home-based exercise interventions on exercise capacity and physical activity in coronary artery disease patients: A systematic review and meta-analysis. Eur. J. Prev. Cardiol. 2017, 24, 244–256.
Rawstorn, J.C.; Gant, N.; Direito, A.; Beckmann, C.; Maddison, R. Telehealth exercise-based cardiac rehabilitation: A systematic review and meta-analysis. Heart 2016, 102, 1183–1192.
Frederix, I.; Hansen, D.; Coninx, K.; Vandervoort, P.; Vandijck, D.; Hens, N.; Van Craenenbroeck, E.; Van Driessche, N.; Dendale, P. Medium-term effectiveness of a comprehensive internet-based and patient-specific telerehabilitation program with text messaging support for cardiac patients: Randomized controlled trial. J. Med. Internet Res. 2015, 17, e185.
Pinto, B.M.; Goldstein, M.G.; Papandonatos, G.D.; Farrell, N.; Tilkemeier, P.; Marcus, B.H.; Todaro, J.F. Maintenance of exercise after phase II cardiac rehabilitation: A randomized controlled trial. Am. J. Prev. Med. 2011, 41, 274–283.
Claes, J.; Buys, R.; Woods, C.; Briggs, A.; Geue, C.; Aitken, M.; Moyna, N.; Moran, K.; McCaffrey, N.; Chouvarda, I.; et al. PATHway I: Design and rationale for the investigation of the feasibility, clinical effectiveness and cost-effectiveness of a technology-enabled cardiac rehabilitation platform. BMJ Open 2017, 7, e016781.
Anderson, L.; Sharp, G.A.; Norton, R.J.; Dalal, H.; Dean, S.G.; Jolly, K.; Cowie, A.; Zawada, A.; Taylor, R.S. Home-based versus centre-based cardiac rehabilitation. Cochrane Database Syst. Rev. 2017, 6.
Pfaeffli Dale, L.; Whittaker, R.; Dixon, R.; Stewart, R.; Jiang, Y.; Carter, K.; Maddison, R. Acceptability of a Mobile Health Exercise-Based Cardiac Rehabilitation Intervention. J. Cardiopulm. Rehabil. Prev. 2015, 35, 312–319.
Hannan, A.L.; Harders, M.P.; Hing, W.; Climstein, M.; Coombes, J.S.; Furness, J. Impact of wearable physical activity monitoring devices with exercise prescription or advice in the maintenance phase of cardiac rehabilitation: Systematic review and meta-analysis. BMC Sports Sci. Med. Rehabil. 2019, 11, 14.
Hamilton, S.J.; Mills, B.; Birch, E.M.; Thompson, S.C. Smartphones in the secondary prevention of cardiovascular disease: A systematic review. BMC Cardiovasc. Disord. 2018, 18, 25.
Essery, R.; Geraghty, A.W.A.; Kirby, S.; Yardley, L. Predictors of adherence to home-based physical therapies: A systematic review. Disabil. Rehabil. 2017, 39, 519–534.
Beinart, N.A.; Goodchild, C.E.; Weinman, J.A.; Ayis, S.; Godfrey, E.L. Individual and intervention-related factors associated with adherence to home exercise in chronic low back pain: A systematic review. Spine J. 2013, 13, 1940–1950.
Picorelli, A.M.A.; Pereira, L.S.M.; Pereira, D.S.; Felício, D.; Sherrington, C. Adherence to exercise programs for older people is influenced by program characteristics and personal factors: A systematic review. J. Physiother. 2014, 60, 151–156.
Triantafyllidis, A.; Filos, D.; Buys, R.; Claes, J.; Cornelissen, V.; Kouidi, E.; Chatzitofis, A.; Zarpalas, D.; Daras, P.; Walsh, D.; et al. Computerized decision support for beneficial home-based exercise rehabilitation in patients with cardiovascular disease. Comput. Methods Programs Biomed. 2018, 162, 1–10.
Zhou, M.; Fukuoka, Y.; Goldberg, K.; Vittinghoff, E.; Aswani, A. Applying machine learning to predict future adherence to physical activity programs. BMC Med. Inform. Decis. Mak. 2019, 19, 169.
Bastidas, O.J.; Zahia, S.; Fuente-Vidal, A.; Férez, N.S.; Noguera, O.R.; Montane, J.; Garcia-Zapirain, B. Predicting physical exercise adherence in fitness apps using a deep learning approach. Int. J. Environ. Res. Public Health 2021, 18, 10769.
Kim, J.C.; Chung, K. Prediction model of user physical activity using data characteristics-based long short-term memory recurrent neural networks. KSII Trans. Internet Inf. Syst. 2019, 13, 2060–2077.
Claes, J.; Filos, D.; Cornelissen, V.; Chouvarda, I. Prediction of the Adherence to a Home-Based Cardiac Rehabilitation Program. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), Berlin, Germany, 23–27 July 2019.
ACSM. ACSM’s Guidelines for Exercise Testing and Prescription; ACSM: Indianapolis, IN, USA, 2013; Volume 9, ISBN 978-1-6091-3955-1.
Shcherbina, A.; Mikael Mattsson, C.; Waggott, D.; Salisbury, H.; Christle, J.W.; Hastie, T.; Wheeler, M.T.; Ashley, E.A. Accuracy in wrist-worn, sensor-based measurements of heart rate and energy expenditure in a diverse cohort. J. Pers. Med. 2017, 7, 3.
Walsh, D.M.J.; Kieran, M.; Cornelissen, V.; Buys, R.; Claes, J.; Zampognaro, P.; Melillo, F.; Maglaveras, N.; Chouvarda, I.; Triantafyllidis, A.; et al. The development and codesign of the PATHway intervention: A theory-driven eHealth platform for the self-management of cardiovascular disease. Transl. Behav. Med. 2019, 9, 76–98.
Wilson, P.W.F.; D’Agostino, R.B.; Levy, D.; Belanger, A.M.; Silbershatz, H.; Kannel, W.B. Prediction of coronary heart disease using risk factor categories. Circulation 1998, 97, 1837–1847.
Cruz, L.N.; Camey, S.A.; Fleck, M.P.; Polanczyk, C.A. World Health Organization quality of life instrument-brief and Short Form-36 in patients with coronary artery disease: Do they measure similar quality of life concepts? Psychol. Health Med. 2009, 14, 619–628.
Hardie Murphy, M.; Rowe, D.A.; Belton, S.; Woods, C.B. Validity of a two-item physical activity questionnaire for assessing attainment of physical activity guidelines in youth. BMC Public Health 2015, 15, 1080.
McAuley, E. The role of efficacy cognitions in the prediction of exercise behavior in middle-aged adults. J. Behav. Med. 1992, 15, 65–88.
Sniehotta, F.F.; Schwarzer, R.; Scholz, U.; Schuz, B. Action planning and coping planning for long-term lifestyle change: Theory and assessment. Eur. J. Soc. Psychol. 2005, 35, 565–579.
Lawford, B.R.; Barnes, M.; Connor, J.P.; Heslop, K.; Nyst, P.; Young, R.M.D. Alcohol use disorders identification test (AUDIT) scores are elevated in antipsychotic-induced hyperprolactinaemia. J. Psychopharmacol. 2012, 26, 324–329.
Martínez-González, M.A.; García-Arellano, A.; Toledo, E.; Salas-Salvadó, J.; Buil-Cosiales, P.; Corella, D.; Covas, M.I.; Schröder, H.; Arós, F.; Gómez-Gracia, E.; et al. A 14-item mediterranean diet assessment tool and obesity indexes among high-risk subjects: The PREDIMED trial. PLoS ONE 2012, 7, e43134.
Cohen, S.; Kamarck, T.; Mermelstein, R. A Global Measure of Perceived Stress. J. Health Soc. Behav. 1983, 24, 385–396.
Morisky, D.E.; Ang, A.; Krousel-Wood, M.; Ward, H.J. Predictive validity of a medication adherence measure in an outpatient setting. J. Clin. Hypertens. 2008, 10, 348–354.
Ng Fat, L.; Scholes, S.; Boniface, S.; Mindell, J.; Stewart-Brown, S. Evaluating and establishing national norms for mental wellbeing using the short Warwick–Edinburgh Mental Well-being Scale (SWEMWBS): Findings from the Health Survey for England. Qual. Life Res. 2017, 26, 1129–1144.
Vaglio, J.; Conard, M.; Poston, W.S.; O’Keefe, J.; Haddock, C.K.; House, J.; Spertus, J.A. Testing the performance of the ENRICHD Social Support Instrument in cardiac patients. Health Qual. Life Outcomes 2004, 2, 24.
Shields, C.A.; Brawley, L.R. Preferring proxy-agency: Impact on self-efficacy for exercise. J. Health Psychol. 2006, 11, 904–914.
Razykov, I.; Ziegelstein, R.C.; Whooley, M.A.; Thombs, B.D. The PHQ-9 versus the PHQ-8—Is item 9 useful for assessing suicide risk in coronary artery disease patients? Data from the Heart and Soul Study. J. Psychosom. Res. 2012, 73, 163–168.
Broadbent, E.; Petrie, K.J.; Main, J.; Weinman, J. The Brief Illness Perception Questionnaire. J. Psychosom. Res. 2006, 60, 631–637.
Claes, J.; Cornelissen, V.; McDermott, C.; Moyna, N.; Pattyn, N.; Cornelis, N.; Gallagher, A.; McCormack, C.; Newton, H.; Gillain, A.; et al. Feasibility, Acceptability, and Clinical Effectiveness of a Technology-Enabled Cardiac Rehabilitation Platform (Physical Activity Toward Health-I): Randomized Controlled Trial. J. Med. Internet Res. 2020, 22, e14221.
Nielsen, F. Hierarchical Clustering. In Introduction to HPC with MPI for Data Science; Springer International Publishing: Cham, Switzerland, 2016; pp. 195–211. ISBN 9789811305535.
Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65.
John, C.R.; Watson, D.; Barnes, M.R.; Pitzalis, C.; Lewis, M.J. Spectrum: Fast density-aware spectral clustering for single and multi-omic data. Bioinformatics 2020, 36, 1159–1166.
Lihi, Z.-M.; Perona, P. Self-Tuning Spectral Clustering. In Advances in Neural Information Processing Systems; Saul, L., Weiss, Y., Bottou, L., Eds.; MIT Press: Cambridge, MA, USA, 2004; Volume 17. [Google Scholar]
Vens, C.; Struyf, J.; Schietgat, L.; Džeroski, S.; Blockeel, H. Decision trees for hierarchical multi-label classification. Mach. Learn. 2008, 73, 185–214. [Google Scholar] [CrossRef]
Therneau, T.; Atkinson, B. rpart: Recursive Partitioning and Regression Trees; Scientific Research Publishing: Wuhan, China, 2019. [Google Scholar]
Mahdi Abdulkareem, N.; Mohsin Abdulazeez, A. Machine Learning Classification Based on Radom Forest Algorithm: A Review. Int. J. Sci. Bus. 2021, 5, 128–142. [Google Scholar] [CrossRef]
Kruskal, W.H.; Wallis, W.A. Use of Ranks in One-Criterion Variance Analysis. J. Am. Stat. Assoc. 1952, 47, 583–621. [Google Scholar] [CrossRef]
Borsboom, D.; Deserno, M.K.; Rhemtulla, M.; Epskamp, S.; Fried, E.I.; McNally, R.J.; Robinaugh, D.J.; Perugini, M.; Dalege, J.; Costantini, G.; et al. Network analysis of multivariate data in psychological science. Nat. Rev. Methods Prim. 2021, 1, 58. [Google Scholar] [CrossRef]
Zanin, M.; Aitya, N.A.A.; Basilio, J.; Baumbach, J.; Benis, A.; Behera, C.K.; Bucholc, M.; Castiglione, F.; Chouvarda, I.; Comte, B.; et al. An Early Stage Researcher’s Primer on Systems Medicine Terminology. Netw. Syst. Med. 2021, 4, 2–50. [Google Scholar] [CrossRef] [PubMed]
Epskamp, S.; Borsboom, D.; Fried, E.I. Estimating psychological networks and their accuracy: A tutorial paper. Behav. Res. Methods 2018, 50, 195–212. [Google Scholar] [CrossRef]
Friedman, J.; Hastie, T.; Tibshirani, R. glasso: Graphical Lasso: Estimation of Gaussian Graphical Models. 2019. Available online: https://CRAN.R-project.org/package=glasso (accessed on 6 April 2023).
Friedman, J.; Hastie, T.; Tibshirani, R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 2008, 9, 432–441. [Google Scholar] [CrossRef]
Epskamp, S.; Waldorp, L.J.; Mõttus, R.; Borsboom, D. The Gaussian Graphical Model in Cross-Sectional and Time-Series Data. Multivar. Behav. Res. 2018, 53, 453–480. [Google Scholar] [CrossRef]
Epskamp, S. Brief Report on Estimating Regularized Gaussian Networks from Continuous and Ordinal Data. arXiv 2016, arXiv:1606.05771. [Google Scholar]
Epskamp, S. graphicalVAR: Graphical VAR for Experience Sampling Data. 2021. Available online: https://cran.r-project.org/web/packages/graphicalVAR/graphicalVAR.pdf (accessed on 6 April 2023).
Ge, C.; Ma, J.; Xu, Y.; Shi, Y.J.; Zhao, C.H.; Gao, L.; Bai, J.; Wang, Y.; Sun, Z.J.; Guo, J.; et al. Predictors of adherence to home-based cardiac rehabilitation program among coronary artery disease outpatients in China. J. Geriatr. Cardiol. 2019, 16, 749–755. [Google Scholar] [CrossRef]
Shaw, J.F.; Pilon, S.; Vierula, M.; McIsaac, D.I. Predictors of adherence to prescribed exercise programs for older adults with medical or surgical indications for exercise: A systematic review. Syst. Rev. 2022, 11, 80. [Google Scholar] [CrossRef] [PubMed]
Heindl, B.; Ramirez, L.; Joseph, L.; Clarkson, S.; Thomas, R.; Bittner, V. Hybrid cardiac rehabilitation—The state of the science and the way forward. Prog. Cardiovasc. Dis. 2022, 70, 175–182. [Google Scholar] [CrossRef]
Tang, L.H.; Harrison, A.; Skou, S.T.; Taylor, R.S.; Dalal, H.; Doherty, P. Are patient characteristics and modes of delivery associated with completion of cardiac rehabilitation? A national registry analysis. Int. J. Cardiol. 2022, 361, 7–13. [Google Scholar] [CrossRef]

Figure 1. Graphical overview of the proposed approach. Each part of the graph is described in detail in the following sections.

Figure 2. Timeline for the intervention study’s structure.

Figure 3. Evolution of the number of sessions (left) and the mean session duration (right) for the present (green) and absent (red) patients. Regarding the mean session duration (right), statistically significant differences were found for weeks 1 to 11, 13, 17, 20, and 24. The character # means number of sessions.

Figure 4. Evolution of mean adher values for the patients included in each cluster.

Figure 5. On the upper part, $HR_{time}$ (a) and $HR_{norm}$ (b) during the familiarization period are depicted for the three clusters. On the bottom (c), the Spearman’s correlation for those variables with $adher$ is provided; the bigger and darker the circle, the greater the correlation. Blue and red denote positive and negative correlation, respectively.

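
The correlation shown in panel (c) of Figure 5 can be illustrated with a minimal sketch. It assumes Spearman's rho is computed as the Pearson correlation of average ranks (the standard definition); the function names and the weekly values are hypothetical and are not taken from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical weekly values for one group (illustration only).
hr_time = [61.0, 74.0, 78.0, 80.0, 71.0, 77.0]  # % of session time above the lower HR limit
adher = [9.0, 10.0, 11.0, 13.0, 10.5, 6.0]      # mean session duration in minutes
rho = spearman(hr_time, adher)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based coefficient is a reasonable choice here because session durations and heart rate percentages need not be linearly related or normally distributed.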

Figure 6. Evolution of the mean $adher$ value (mean weekly session duration) for the two clusters based on the spectral clustering. The dotted line reflects the end of the familiarization phase.


Figure 7. Sankey diagram regarding the different clusters of the analysis for the Present cluster. The Absent patients (n = 9) were not included in the analysis.

Figure 8. The three most frequent models created after splitting the dataset 100 times into different training and testing datasets. Model (a) was created 61 times, whereas models (b) and (c) were created 17 and 14 times, respectively. The performance metrics for each model are also shown in the figure. The models are represented as dendrograms with rules. In each node of the tree, the most common class is depicted, along with the respective probability (mean ± std).
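
The repeated-split procedure summarized in the Figure 8 caption can be sketched as follows. This is a simplified illustration, not the authors' implementation: a one-level decision stump stands in for a full decision tree, the "structure" of a model is reduced to the feature it splits on, and the patient data, feature names, split fraction, and seeds are all hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def fit_stump(rows):
    """Return (feature, threshold, label_above) with the fewest training errors."""
    best = None
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            for label_above in (0, 1):
                errors = sum(
                    (label_above if x[f] > thr else 1 - label_above) != y
                    for x, y in rows
                )
                if best is None or errors < best[0]:
                    best = (errors, f, thr, label_above)
    return best[1:]


def predict(stump, x):
    f, thr, label_above = stump
    return label_above if x[f] > thr else 1 - label_above


def repeated_splits(rows, n_runs=100, test_fraction=0.3, seed=0):
    """Refit on random splits; tally model structures and their test precision."""
    rng = random.Random(seed)
    counts, precisions = Counter(), {}
    for _ in range(n_runs):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_fraction)
        test, train = shuffled[:n_test], shuffled[n_test:]
        stump = fit_stump(train)
        structure = stump[0]  # here the "structure" is just the split feature
        counts[structure] += 1
        # Precision = share of positive predictions that are truly positive.
        predicted_pos = [y for x, y in test if predict(stump, x) == 1]
        if predicted_pos:  # undefined when nothing is predicted positive
            precisions.setdefault(structure, []).append(mean(predicted_pos))
    return counts, precisions


# Hypothetical data: 41 patients, two features, noisy binary adherence label.
data_rng = random.Random(42)
patients = []
for _ in range(41):
    week6_minutes = data_rng.uniform(0, 40)
    questionnaire_score = data_rng.uniform(1, 7)
    label = 1 if week6_minutes + data_rng.gauss(0, 5) > 15 else 0
    patients.append(((week6_minutes, questionnaire_score), label))

counts, precisions = repeated_splits(patients)
for structure, n in counts.most_common():
    p = precisions.get(structure, [])
    summary = f"{mean(p):.2f} ± {pstdev(p):.2f}" if p else "n/a"
    print(f"split on feature {structure}: {n} runs, precision {summary}")
```

Tallying structures across splits shows how stable a model is under resampling, and reporting each structure's metrics as mean ± std mirrors the way the figure summarizes performance.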

Figure 9. The mean importance for the features that were used in each of the 100 runs of the RF classification.

Figure 10. Networks of the two most populated clusters, based on the baseline clinical characteristics (upper). The centrality measures are depicted at the bottom.

Figure 11. Temporal networks for the three clusters created using the adherence during the familiarization phase. In the Adherent-6w group (left), there appears to be no connection between adherence and the HR metrics representing performance during the session. The respective graphs for the non-adherent and transient users are depicted in the middle and on the right, respectively.

Figure 12. Networks for the adherent and non-adherent clusters, including data from both the adherence during the familiarization phase and the baseline clinical characteristics. The numbers inside the nodes (upper) represent the variable adher during each week. The different measures of the importance of each node are depicted at the bottom.

Table 1. Statistically significant differences between the Present and Absent groups during baseline, and differences between the start and the end of the CR program.

	Present	Absent	p-Value
	Baseline
BARSE	67.361 ± 22.5	53.93 ± 16.6	0.043
Sedentary time (min)	752.53 ± 98.3	677.89 ± 55.1	0.016
Light activity time (min/day)	559.88 ± 80.87	620.78 ± 50.54	0.035
	Baseline–6 months
BMI (kg/m²)	0.037 ± 1.08	1.21 ± 1.84	0.022
Waist circumference (cm)	−2.06 ± 5.25	2.77 ± 4.24	0.014
Triglycerides (mmol/L)	−0.029 ± 0.67	0.48 ± 0.72	0.028
pLoad (Watt)	−0.122 ± 25.4	−14.44 ± 18.1	0.028

BARSE = barriers self-efficacy scale; BMI = body mass index; pLoad = peak load achieved during CPET.

Table 2. Baseline clinical characteristics that presented statistically significant differences between the three groups of patients (p < 0.05). The values are presented as the mean value ± standard deviation.

	Cluster 1	Cluster 2	Cluster 3
	Low-Risk & Active	High-Risk & Sedentary	High-Risk & Fit
Glucose (mmol/L)	4.71 ± 0.5	6.04 ± 1.81	5.81 ± 1.04
Risk score (%)	6.16 ± 5.9	18.65 ± 10.43	16.79 ± 10.035
BARSE	81.23 ± 11.1	75.13 ± 17.63	58.43 ± 24.39
PSS	13 ± 7.5	7.73 ± 5.34	12.48 ± 6.129
PACE	4.5 ± 1.7	4.5 ± 1.85	3.12 ± 1.387
Illness perception	38.2 ± 13.74	24.13 ± 12.69	34.1 ± 13.37
SF-36 mental	78.19 ± 15.14	83.36 ± 14.65	74.18 ± 14.17
EE (kcal)	1576 ± 639.62	1115.4 ± 187.27	1583.62 ± 395.35
MVPA (min)	229.4 ± 46.39	77.2 ± 29.46	132.43 ± 41.3
Steps (n)	16,720 ± 767.8	8994.93 ± 903.62	13,173.14 ± 1475.88
30 s STS (n)	23 ± 3.24	16.53 ± 4.29	19.29 ± 4.69
Sedentary time (min)	655.6 ± 35.84	840.93 ± 67.96	712.48 ± 76.99
Quadriceps isokinetic (J)	1636.38 ± 314.61	1885.51 ± 849.03	2378.82 ± 576.13
Quadriceps isometric (Nm)	107.2 ± 26.33	142.4 ± 51.32	156.79 ± 38.1

BARSE = barriers self-efficacy scale; PSS = perceived stress scale; PACE = physical activity questionnaire; SF-36 = short-form 36; EE = energy expenditure; MVPA = moderate-to-vigorous physical activity; STS = sit-to-stand.

Table 3. Mean session duration for all of the patients in each group for weeks 1 to 6 of the familiarization phase. The values are presented in minutes as the mean ± standard deviation.

Week	Cluster 1 Adherent-6w	Cluster 2 Non-Adherent-6w	Cluster 3 Transient-6w	p-Value
1	9.43 ± 1.6	8.94 ± 2.4	9.4 ± 1.3	0.87
2	10.07 ± 5.7	9.69 ± 7.3	8.93 ± 13.4	0.72
3	16 ± 7.9	11.27 ± 8.6	20.6 ± 19	0.35
4	18.75 ± 11.2	13.16 ± 9	9.78 ± 16.3	0.99
5	22.88 ± 9.8	10 ± 9	28.3 ± 16.8	0.00054
6	28.28 ± 12.6	5.67 ± 7.3	29.16 ± 2.4	<0.0001
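
The p-values in Table 3 compare three clusters week by week. The table does not name the test used; the sketch below assumes a Kruskal–Wallis rank test, which is a common choice for comparing more than two small groups without assuming normality. The function name and the sample values are hypothetical, and the closed-form p-value relies on the chi-square approximation with two degrees of freedom, which is only rough for groups this small.

```python
import math


def kruskal_wallis_3(groups):
    """Kruskal-Wallis H and approximate p-value for exactly three groups."""
    assert len(groups) == 3, "the closed-form p-value below only holds for df = 2"
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    # Average rank of each distinct value in the pooled, sorted sample.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        t = j - i + 1  # size of this tie group
        tie_term += t ** 3 - t
        i = j + 1
    rank_sums_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_sums_term - 3 * (n_total + 1)
    h /= 1 - tie_term / (n_total ** 3 - n_total)  # standard tie correction
    p = math.exp(-h / 2)  # chi-square survival function for 2 degrees of freedom
    return h, p


# Hypothetical week-6 mean session durations in minutes (not the study data).
adherent = [28.0, 35.5, 20.1, 31.2, 40.3]
non_adherent = [5.0, 0.0, 12.4, 3.3]
transient = [29.0, 27.5, 31.9]
h_stat, p_value = kruskal_wallis_3([adherent, non_adherent, transient])
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```

Applying such a test separately to each week would yield one p-value per row, which matches the layout of the table.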

Table 4. Performance metrics for the 3 session adherence clusters.

	% Time spent above $HR_{lower}$				% $HR_{norm}$
Week	Adherent-6w	Non-Adherent-6w	Transient-6w	p-Value	Adherent-6w	Non-Adherent-6w	Transient-6w	p-Value
1	81.01 ± 15.5	63.44 ± 25.6	61.41 ± 37.8	0.171	84.62 ± 12.78	78.91 ± 15.69	77.74 ± 18.22	0.455
2	82.74 ± 18.4	74.12 ± 18	83.8 ± 20.9	0.256	83.1 ± 11.92	83.71 ± 15.99	91.83 ± 11.51	0.5
3	72.72 ± 30.2	77.71 ± 14.7	82.87 ± 21.3	0.838	78.03 ± 14.65	87.93 ± 14.34	91.42 ± 19.63	0.161
4	79.38 ± 24.1	80.31 ± 14.7	95.37 ± 0.9	0.181	80.76 ± 12.32	86.67 ± 14.49	93.18 ± 19.94	0.508
5	76.9 ± 27.7	70.77 ± 24.4	73.6 ± 25.3	0.492	81.77 ± 14.59	79.98 ± 14.28	83.44 ± 13.24	0.959
6	80.7 ± 28	77.44 ± 10.4	82.81 ± 15.8	0.36	82.72 ± 16.5	86.63 ± 14.44	88.22 ± 13.83	0.88

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

