1. Introduction
Diabetes mellitus is a disease characterized by a dysregulation of the natural homeostasis of glucose concentration in the body. It has a projected prevalence of 9.9% of the global population by 2045 (629 million people affected), and the estimated yearly cost in healthcare expenditure by the same year is 776 billion USD [1]. Approximately 10% of people with diabetes have type 1 diabetes, which has a very early onset, usually manifesting during childhood or adolescence, and requires intensive supervision and treatment. Type 1 Diabetes (T1D) is defined by the partial or total inability of the beta cells of the pancreas to produce insulin, which leads to abnormally high levels of blood glucose. This problem worsens after meal ingestion, when glucose enters the bloodstream through the gut, and during physical activity or stress, which alter the uptake or production of glucose in different parts of the body.
An Artificial Pancreas (AP) is a device that feeds continuous glucose measurements into an insulin pump in real time and continuously adjusts the insulin dosage delivered to the patient. It is one of the most promising solutions to the complications of T1D, and several prototypes have been under development over the last few years, including commercially available devices [2]. Physical activity has been linked to changes in glucose trends and variability in patients with diabetes [3,4]. The accuracy of glucose measurement is of great importance for the correct performance of AP devices, since it drives the decision making of the algorithms behind them. The accuracy of Continuous Glucose Monitoring (CGM) devices has been observed to be affected during exercise periods. Taleb et al. [5] showed that the accuracy of both the Dexcom G4 and Medtronic Enlite devices dropped during aerobic physical activity when compared to reference glucose measurements. Increased CGM error has been found consistently during aerobic exercise, even among the more recently released devices on the market [6]. Biagi et al. [7] showed that the accuracy of Medtronic Enlite 2 devices dropped during both aerobic and anaerobic exercise, but only the results during aerobic exercise were significantly different. This conclusion has also been reported in the literature before [5,8], which suggests an underlying problem with the mechanism of glucose estimation in the subcutaneous tissue during periods of physical activity.
Exercise monitoring is widespread today in most parts of the world, with many different devices designed and marketed to estimate the intensity, type, and duration of physical activity. Integration of all available signals from wearable physical activity monitors (wearables) remains an open issue [9], and many of the wearable signals can only be extracted as processed variables. The use of wearable devices to improve T1D management has been reviewed before [10], and many studies have added different isolated wearable variables to AP controller algorithms [11], supervision algorithms [12], or prediction and classification algorithms [13,14]. Turksoy [15] studied the influence of wearable signals (named biometric variables in the reference) in the context of their possible implementation in an artificial pancreas system, showing that the amount of information carried by each signal changes depending on the type of exercise performed and placing special importance on the estimation of the total energy spent during physical activity.
The motivation for this work arises from the possibility of compensating for the greater estimation error shown during exercise periods. Our working hypothesis is that exercise monitoring devices could provide information to a CGM device that allows for a real-time correction of the shift in glucose estimation during physical activity. The work presented in this paper analyzes the influence of the different signals provided by three different wearables (Fitbit Charge HR, Microsoft Band 2, and Polar HR) on the accuracy of a CGM device during aerobic exercise. Each wearable signal is critically selected or discarded as an input for a multiple regression model designed to compensate for the measurement error induced by the exercise session. The final regression model improves the accuracy of the CGM signal significantly during the exercise period, and baseline accuracy is maintained during the resting period. It is shown here that only two wearable signals are necessary to enhance the CGM measurements, and that redundancy of sensors only marginally increases the accuracy of the CGM estimations.
This paper is structured as follows. First, the methods, models, protocol, and datasets used are described, including a brief description of all the devices employed in the study. Second, the results of the model validation are provided, along with the critical selection of signals to be used in the model. Then, the results are discussed critically, followed by the final conclusions.
2. Materials and Methods
2.1. Patients
The dataset used here was obtained from a longitudinal, prospective, interventional study with the goal of analyzing the performance of an exercise-challenged closed-loop controller in people with T1D [16]. A total of six participants were enrolled at the Clinic University Hospital of Barcelona. The protocol was approved by the Ethics Committee of the hospital. Criteria for eligibility were: (1) age between 18 and 60 years old, (2) Body Mass Index (BMI) between 18 and 30 kg/m², (3) Glycated Hemoglobin A1c (HbA1C) between 6.0% and 8.5%, and (4) use of Continuous Subcutaneous Insulin Infusion (CSII) for at least six months. Exclusion criteria included: (1) pregnancy, (2) use of experimental drugs or devices in the past 30 days, (3) onset of progressive fatal diseases, (4) hypoglycemia unawareness, (5) drug or alcohol abuse, and (6) systemic diseases other than T1D, including hepatic, neurological, and endocrine-related illnesses. A summary of the patient demographics is shown in Table 1.
Each participant underwent three aerobic and three anaerobic exercise tests. The order of the type of exercise was randomized, but the same type was carried out for three consecutive trials. The subjects used Paradigm Veo® insulin pumps, and two Enlite-2® (Medtronic Minimed, Northridge, CA, USA) CGM sensors were inserted by the subjects in different parts of the abdomen the day before the trial. Plasma glucose (PG) was measured as a reference value every 15 min using the YSI 2300 Stat Plus Glucose Analyzer (YSI Incorporated Life Sciences, Yellow Springs, OH, USA). The exercise schedules were as follows:
Aerobic routine: Patients exercised doing three bouts of 15 min on a cycloergometer at 60% of the patient's maximum capacity, with five minutes of rest between them.
Anaerobic (resistance) routine: Patients exercised doing five bouts of eight repetitions of four different exercise sets of 15 min at 70% of the patients’ maximum capacity with 90 s of rest between sets.
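The exercise intensity was converted into units of heart rate, which was different for each patient and calculated as follows:
$$\mathrm{HR}_i = \frac{I}{100}\left(\mathrm{HR}_{\max,i} - \mathrm{HR}_{\mathrm{rest},i}\right) + \mathrm{HR}_{\mathrm{rest},i}$$
where $\mathrm{HR}_i$ is the intensity of the exercise in terms of heart rate for patient $i$, $I$ stands for the percentage intensity of the exercise (60 for aerobic exercise, 70 for anaerobic), and $\mathrm{HR}_{\max,i}$ stands for the maximum heart rate of patient $i$, defined by separate age-based expressions for women and for men, where $\mathrm{Age}_i$ is the age of patient $i$ in years. Lastly, $\mathrm{HR}_{\mathrm{rest},i}$ is the resting heart rate of patient $i$ as measured at the beginning of the experiment.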
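As an illustration, the conversion from percentage intensity to a target heart rate can be sketched in Python. The sketch assumes a heart-rate-reserve form, in which the target lies the stated percentage of the way between the resting and the maximum heart rate. The function name and the example values are hypothetical, and the maximum heart rate is passed in directly rather than derived from age and sex.

```python
def target_heart_rate(intensity_pct: float, hr_max: float, hr_rest: float) -> float:
    """Target heart rate (bpm) at a given percentage of the heart-rate reserve.

    Assumption: the target is hr_rest plus intensity_pct percent of the
    reserve (hr_max - hr_rest). hr_max must be supplied by the caller.
    """
    if not 0 <= intensity_pct <= 100:
        raise ValueError("intensity_pct must be between 0 and 100")
    return hr_rest + intensity_pct / 100.0 * (hr_max - hr_rest)


# Hypothetical patient: maximum 180 bpm, resting 60 bpm, aerobic session at 60%.
aerobic_target = target_heart_rate(60, hr_max=180, hr_rest=60)
```

With these made-up values the reserve is 120 bpm, so the aerobic target sits 72 bpm above the resting heart rate.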
2.2. Exercise Monitoring Devices
Off-the-shelf physical activity monitors were used to obtain the different biometric variables registered in the study. Patients wore three different devices simultaneously:
Fitbit Charge HR™ (Fitbit, San Francisco, CA, USA): a physical activity monitor that tracks several exercise signals: (1) heart rate, (2) steps, (3) floor level, (4) metabolic equivalents (METs), and (5) calories burned.
Microsoft® Band 2 (Microsoft, Redmond, WA, USA): a multipurpose device that tracks physiological variables such as: (1) heart rate, (2) steps, (3) galvanic skin response, (4) skin temperature and (5) movements.
Polar heart rate monitor, Model RCX3® (Polar, Kempele, Finland): a commercially-established heart rate monitor.
The variables listed above were recorded with a sampling interval of one minute and downloaded at the end of each trial. Each device provided a different number of signals, listed next, for a total of 11 signals. The nomenclature of each signal is defined here, since the signals are referred to using this notation from here on. For each signal, a three-letter code representative of its nature was defined. The first letter of the triplet is the initial letter of the device that provided the signal, and the second and third letters are an abbreviation of the name of the signal itself. For example, the signal named FHR stands for F from the Fitbit device and HR from the Heart Rate signal.
FHR: Fitbit Heart Rate; measured in beats per minute; can be used to estimate exercise intensity.
FST: Fitbit STeps; the number of steps walked.
FLV: Fitbit LeVel; the number of floors of stairs climbed.
FME: Fitbit MEts; the metabolic equivalent of tasks; an estimation of exercise intensity normalized for each patient.
FCA: Fitbit CAlories; calories burned.
MHR: Microsoft Band Heart Rate.
MTM: Microsoft Band TeMperature; skin temperature.
MGS: Microsoft Band Galvanic Skin Response; an estimation of the skin electrodermal activity, also known as the conductance level of the skin.
MST: Microsoft Band STeps; number of steps walked.
MMO: Microsoft Band MOvements; accumulated movement magnitude registered by the accelerometer.
PHR: Polar Heart Rate.
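The naming convention above can be captured in a small lookup, sketched here in Python. The dictionary and function names are hypothetical and serve only to make the three-letter codes machine-readable.

```python
# First letter: device that provided the signal.
DEVICES = {"F": "Fitbit Charge HR", "M": "Microsoft Band 2", "P": "Polar RCX3"}

# Second and third letters: abbreviation of the signal name.
SIGNALS = {
    "HR": "heart rate",
    "ST": "steps",
    "LV": "floor level",
    "ME": "metabolic equivalents",
    "CA": "calories burned",
    "TM": "skin temperature",
    "GS": "galvanic skin response",
    "MO": "movements",
}

ALL_CODES = ["FHR", "FST", "FLV", "FME", "FCA",
             "MHR", "MTM", "MGS", "MST", "MMO", "PHR"]


def decode(code: str) -> tuple:
    """Split a three-letter signal code into (device, signal name)."""
    return DEVICES[code[0]], SIGNALS[code[1:]]
```

For example, `decode("MTM")` resolves to the Microsoft Band 2 skin temperature signal.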
2.3. Data Filtering and Pre-Processing
Thirty-six trials were completed, and data were collected for all of them. In order to align CGM and YSI data, CGM data were linearly interpolated to one sample per minute. Following the rationale described in [7], data arrays corresponding to either CGM or YSI malfunction, error outliers, or more than 30% of data samples flagged as faulty were discarded. A total of 665 sampled pairs were available, distributed among 27 streams of data out of the 36 corresponding to paired errors of CGM vs. YSI following the rule:
$$e_i(k) = \mathrm{CGM}_i(k) - \mathrm{YSI}_i(k)$$
where $e_i(k)$ represents the CGM estimation error for data stream $i$ at timestamp $k$. Similarly, $\mathrm{CGM}_i(k)$ and $\mathrm{YSI}_i(k)$ are the CGM data and the plasma glucose data measured with the YSI, respectively. Each data stream $i$ is a time series corresponding to a particular experiment and CGM signal.
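The alignment and pairing step can be sketched as follows. This is a minimal illustration, assuming integer timestamps in minutes and an error defined as the CGM value minus the reference value. The function names and the sample values are hypothetical.

```python
def interpolate_per_minute(times_min, values):
    """Linearly interpolate samples onto a one-minute grid.

    times_min: increasing integer timestamps (minutes); values: readings.
    Returns a dict mapping each minute to an interpolated value.
    """
    grid = {}
    points = list(zip(times_min, values))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        for t in range(t0, t1 + 1):
            grid[t] = v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return grid


def paired_errors(cgm_grid, ysi_times_min, ysi_values):
    """CGM minus reference at every reference timestamp covered by the grid."""
    return [(t, cgm_grid[t] - ysi)
            for t, ysi in zip(ysi_times_min, ysi_values) if t in cgm_grid]


# Made-up CGM readings every 5 min and reference draws every 15 min (mg/dL).
cgm = interpolate_per_minute([0, 5, 10, 15], [100.0, 110.0, 120.0, 115.0])
errors = paired_errors(cgm, [0, 15], [95.0, 120.0])
```

Reference samples that fall outside the interpolated CGM range are simply skipped, which loosely mirrors discarding pairs without synchronous data.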
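CGM accuracy was evaluated using the Mean Absolute Relative Deviation (MARD), calculated as:
$$\mathrm{MARD}_i = \frac{100}{n}\sum_{k=1}^{n}\frac{\left|\mathrm{CGM}_i(k) - \mathrm{YSI}_i(k)\right|}{\mathrm{YSI}_i(k)}$$
where $\mathrm{MARD}_i$ is the average percent error of data stream $i$ and $n$ is the total number of samples available for $i$.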
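A minimal sketch of the MARD computation for one data stream is given below. It assumes the usual definition, the mean of the absolute CGM-to-reference differences relative to the reference, expressed in percent. The function name and sample values are illustrative only.

```python
def mard(cgm, reference):
    """Mean Absolute Relative Deviation (percent) over paired samples."""
    if len(cgm) != len(reference) or not cgm:
        raise ValueError("cgm and reference must be non-empty and equally long")
    relative = [abs(c - r) / r for c, r in zip(cgm, reference)]
    return 100.0 * sum(relative) / len(relative)


# Two made-up pairs, each off by 10% of the reference value.
example_mard = mard([110.0, 90.0], [100.0, 100.0])
```

Because the deviation is taken in absolute value, overestimation and underestimation contribute equally, so the two pairs above yield a MARD of about 10%.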
Wearable data were pre-processed by removing the baseline value of each wearable signal and trial from the signal itself. The baseline of each signal was calculated as the median value of that particular variable for each data stream evaluated only in the resting period. In an out-of-clinic environment, this value is trivial to obtain since it would be the equilibrium value of the signal during resting or sleeping periods.
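The baseline removal can be sketched as below, assuming that each sample carries a flag marking whether it belongs to the resting period and that missing samples are stored as `None`. Both conventions, and the function name, are assumptions made for the illustration.

```python
from statistics import median


def remove_baseline(signal, is_rest):
    """Subtract the resting-period median from a wearable signal.

    signal: list of readings (None for missing samples).
    is_rest: list of booleans, True where the sample is in the resting period.
    """
    rest_values = [x for x, rest in zip(signal, is_rest) if rest and x is not None]
    baseline = median(rest_values)
    return [None if x is None else x - baseline for x in signal]


# Made-up heart rate stream: three resting samples, then two exercise samples.
centered = remove_baseline([60, 62, 61, 120, 130], [True, True, True, False, False])
```

After this step the signal is close to zero at rest, so a model without intercept that uses it as input adds almost no correction outside exercise.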
The availability of the different wearable variables, defined as the complement of the percentage of missing samples, varied from signal to signal. The availability of each of the signals obtained from the wearables is provided in Table 2.
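Availability as defined here can be computed in one line per signal. The sketch below again assumes that missing samples are stored as `None`; the function name is hypothetical.

```python
def availability_pct(signal):
    """Percentage of non-missing samples in a wearable signal."""
    if not signal:
        raise ValueError("signal must not be empty")
    missing = sum(1 for x in signal if x is None)
    return 100.0 * (1 - missing / len(signal))


# Made-up stream with one missing sample out of four.
example_availability = availability_pct([72, None, 75, 74])
```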
2.4. Data Analysis
2.4.1. Model Fit
Linear regression models were used to fit the CGM error data, using only the wearable signals as inputs. The intercept of the model was forced to be zero in order to compensate only for the error observed in the exercise period: since the wearable signals have their resting baseline removed, a model without intercept produces little or no correction at rest. The linear regression model is given by:
$$\hat{e}_i = W_i\,\beta$$
where $\beta$ is the vector of regression parameters, of a length equal to the number of wearable signals used. Each of the components of $W_i$, corresponding to each of the wearable signals, is a column vector of length $n$. Finally, the new enhanced CGM (eCGM) estimations, corrected using the new estimated error, are calculated as follows:
$$\mathrm{eCGM}_i = \mathrm{CGM}_i - \hat{e}_i$$
where $\mathrm{eCGM}_i$ represents the enhanced CGM values for data stream $i$, $\mathrm{CGM}_i$ are the original sensor glucose estimations, and $\hat{e}_i$ is the output of the linear regression above.
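The fit and the correction can be sketched with ordinary least squares through the origin. The code below is an illustration under stated assumptions, not the implementation used in the study: it solves the normal equations directly, it assumes the error is defined as CGM minus reference so that the correction is subtracted, and all names and numbers are hypothetical.

```python
def fit_zero_intercept(W, e):
    """Least-squares beta for e ~ W * beta with no intercept term.

    W: list of rows, one per sample, each holding the baseline-removed
       wearable signals at that sample. e: list of CGM errors.
    """
    p = len(W[0])
    # Augmented normal equations [W^T W | W^T e].
    A = [
        [sum(row[a] * row[b] for row in W) for b in range(p)]
        + [sum(row[a] * err for row, err in zip(W, e))]
        for a in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot_row = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot = A[col][col]
        if abs(pivot) < 1e-12:
            raise ValueError("wearable signals are collinear")
        A[col] = [v / pivot for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[r][p] for r in range(p)]


def enhance_cgm(cgm, W, beta):
    """Subtract the estimated error from each CGM sample."""
    return [c - sum(b * x for b, x in zip(beta, row)) for c, row in zip(cgm, W)]


# Made-up data: two wearable inputs and errors generated with beta = [2, -3].
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
e = [2.0, -3.0, -1.0]
beta = fit_zero_intercept(W, e)
ecgm = enhance_cgm([102.0, 97.0, 99.0], W, beta)
```

In this noise-free toy case the fit recovers the generating coefficients, and every corrected value returns to the assumed reference of 100 mg/dL. Note that a row of zeros in `W` (a sample at resting baseline) leaves the CGM value untouched, which is the practical effect of dropping the intercept.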
2.4.2. Exercise Signal Selection
In order to determine which of the wearable signals in the study were more relevant for our purposes, a backwards elimination of variables approach was followed, in which variables were sequentially discarded as inputs to the correction model, based on:
Lack of availability of synchronous data to the CGM-YSI available pairs.
Lack of representation of the type of physical activity performed in the study: cycloergometer aerobic exercise.
High correlation between wearable signals. A high correlation between two signals implies that the amount of information carried by each one of those signals is similar; thus, one of them can be eliminated without hindering the predictive capabilities of the model.
MARD improvement evaluated in the validation sets (see Section 2.4.3 below). A wearable signal was removed from the input set if the MARD for the validation sets (both in the exercise period and in global terms) was improved by removing that signal alone.
If a variable complied with the above rules, it was removed from the inputs of the multiple regression model, and the model was fit again to the data and the outputs re-evaluated, until no improvement of the error was achieved by removing any of the inputs.
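The iterative part of this procedure can be sketched as a generic backward elimination loop. The sketch covers only the last criterion (validation MARD) and collapses it into a single score for brevity, whereas the text requires improvement both in the exercise period and globally. The scoring function is a made-up stand-in for refitting and validating the model, and all names are hypothetical.

```python
def backward_eliminate(signals, validation_mard):
    """Drop one input at a time while the validation score keeps improving.

    validation_mard: callable taking a list of signal codes and returning
    the validation MARD of the model refitted with those inputs.
    """
    selected = list(signals)
    best = validation_mard(selected)
    while len(selected) > 1:
        candidates = [
            (validation_mard([s for s in selected if s != dropped]), dropped)
            for dropped in selected
        ]
        score, dropped = min(candidates)
        if score >= best:  # no single removal improves the error: stop
            break
        selected.remove(dropped)
        best = score
    return selected, best


def toy_score(inputs):
    """Made-up score: penalize missing informative and redundant inputs."""
    informative = {"FME", "MTM"}
    missing = len(informative - set(inputs))
    redundant = len(set(inputs) - informative)
    return 10.0 + 2.0 * missing + 0.5 * redundant


kept, final_score = backward_eliminate(["FME", "MTM", "FST", "MHR"], toy_score)
```

With the toy score, the loop removes the two redundant inputs one after the other and stops once removing either remaining signal would make the score worse.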
2.4.3. Cross-Validation
The predictive power of the model was validated by performing a non-exhaustive leave-nine-out cross-validation. Out of the set of 27 streams of data, two thirds (i.e., 18 randomly chosen streams) were selected as a fitting set, and the remaining nine were used to validate the fitted model. Model accuracy was evaluated on the validation set, the results were stored, and the process was repeated with newly drawn random fitting and validation sets, for a total of 20 repetitions. All accuracy metrics reported from here on are averages over the 20 validation sets of nine streams each.
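The repeated random splitting can be sketched as follows. The `fit` and `evaluate` callables are placeholders for the model fitting and the accuracy metric, and the seed is an arbitrary choice made only so that the sketch is reproducible; none of these names come from the study.

```python
import random


def random_splits(stream_ids, n_fit=18, n_repeats=20, seed=0):
    """Yield (fitting, validation) partitions drawn at random without overlap."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        shuffled = rng.sample(stream_ids, len(stream_ids))
        yield shuffled[:n_fit], shuffled[n_fit:]


def cross_validate(stream_ids, fit, evaluate, **split_options):
    """Average the validation metric over all random splits."""
    scores = [
        evaluate(fit(fit_ids), val_ids)
        for fit_ids, val_ids in random_splits(stream_ids, **split_options)
    ]
    return sum(scores) / len(scores)


# 27 stream identifiers split 18/9, repeated 20 times.
splits = list(random_splits(list(range(27))))
```

Because the splits are drawn independently, a given stream can appear in several validation sets and, in principle, in none, which is what makes the scheme non-exhaustive.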
4. Discussion
The first outcome that can be extracted from the results is that further integration of physiological signals in CGM devices is desirable. CGM devices use algorithmic rules to convert the chemical measurements of the subcutaneous probe into relatively accurate plasma glucose values. If the data extracted from wearable devices were integrated into those algorithms, glucose estimation during exercise could benefit considerably.
The proposed linear regression model (composed of only two parameters) achieved a reduction in the MARD during aerobic exercise that practically nullified the influence of the exercise on the CGM estimation error. Indeed, the MARD was significantly reduced from 17.46% to 13.8%, much closer to the global average MARD value of 13.61% for the original CGM readings. This result supports the choice of a regression model for error compensation and validates the starting hypothesis.
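The size of this improvement can be made explicit with a short calculation on the three MARD values quoted above. The variable names are illustrative.

```python
# MARD values (percent) quoted in the text.
mard_exercise_cgm = 17.46   # original CGM, exercise period
mard_exercise_ecgm = 13.8   # enhanced CGM, exercise period
mard_global_cgm = 13.61     # original CGM, global average

absolute_reduction = mard_exercise_cgm - mard_exercise_ecgm          # percentage points
relative_reduction_pct = 100.0 * absolute_reduction / mard_exercise_cgm
residual_gap = mard_exercise_ecgm - mard_global_cgm                  # percentage points
```

The correction removes 3.66 percentage points of exercise MARD, roughly a 21% relative reduction, and leaves the exercise period about 0.19 percentage points above the global average of the uncorrected sensor.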
The regression model presented here differs from other studies involving wearables in the AP (see [11,12]) in that it yields a more general result whose application is not limited to closed-loop studies. It can be used to provide better estimations to an AP device, but it can also simply provide more accurate continuous glucose readings to people with diabetes, regardless of whether or not they are using an AP device or undergoing CSII treatment.
The Clarke EGA shown in Figure 3 indicated very encouraging results for the exercise period; while the original CGM already had 100% of the samples in the A + B zone, eCGM showed a shift of the data points towards the A zone for the exercise period (72.4% → 85.7%). However, Clarke’s analysis also showed a very small rise of points in Zone D during the rest period (0.2% → 0.6%), while no points were present in Zone C. This is a consequence of the discontinuity in the zones defined for Clarke’s EGA, where Zone D can be accessed directly from Zone A or B. This is one of the main criticisms [18] against Clarke’s EGA, and it was the reason that motivated the comparison with other analysis methodologies.
The results shown in Table 6 are even more positive, showing an increase of points in Zone A (79.2% → 89.2%), while showing no points in Zones C + D + E. ISO results from Table 7 also showed better results for eCGM than for the original sensor, increasing the proportion of “OK” samples during the exercise period (55.2% → 64.4%), as well as the total samples in the ISO zone (66.6% → 68.8%).
Figure 2 compares the MARD of the proposed models in order of decreasing complexity, showing that validation performance also improved as complexity decreased. The difference in the global MARD values between the last two models was almost non-existent, while the MARD during the exercise phase of the study actually worsened. However, these differences were very small when compared to the MARD reduction achieved by utilizing any of those models versus not using any model to compensate for exercise (17.46% → 13.8% and 17.46% → 14.14%). It can be argued that a model with only one parameter is preferable to the more complex two-parameter model, especially given that the two input signals came from different devices on the market: FME was provided by the Fitbit device, and MTM was extracted from the Microsoft device.
This work did not focus on the nature of the signals used in the model. Rather, the focus here was on the user-level data that can be easily extracted from a device, which translates into a more straightforward implementation of the model in portable devices. This implies that some of the signals that each device estimates using undisclosed algorithms might not be reproducible from other raw data; e.g., FME is an energy expenditure estimation calculated by the Fitbit device from its sensors, which can differ from implementations in other devices. In a recent study comparing different research and consumer devices, an error range of 10% was found for heart rate and energy expenditure measurements [20].
It must be noted that the current approach has some drawbacks due to its intentional simplicity. First, while the CGM error was reduced significantly in the exercise period, this was achieved by means of a direct algebraic link to the signals of the wearable devices, which may carry noise from the wearable into the CGM estimate, although that problem was not observed in our data. Additionally, more complex dynamic models might further enhance CGM accuracy, but the design of such models is beyond the scope of this work.