1. Introduction
Sleepiness and vigilance are closely associated with core body (CBT) and skin temperatures. A high daytime CBT aligns with a low skin temperature for optimal vigilance, while a low night-time CBT aligns with a high skin temperature for optimal sleep. Sleep initiation is bidirectionally associated with the circadian rhythms of the CBT and skin temperature, with habitual sleep onset coinciding with CBT decline and increased skin temperature [
1].
The CBT decline during sleep is facilitated by peripheral vasodilation, which increases the cutaneous blood flow such that internal heat from the body’s core is dissipated into the micro-environment surrounding the sleeper’s body. The rise in blood flow through peripheral cutaneous blood vessels causes the skin temperature to rise [
2]. This is particularly evident during the transition from wakefulness to sleep, which is accompanied by an increase in both the distal (hands and feet) and proximal (abdomen and chest) skin temperature by approximately 1 °C and 0.5 °C, respectively [
3].
Under the controlled conditions of a constant routine protocol, Kräuchi et al. [
3] identified the gradient between the distal and proximal skin temperature to be the best predictor for sleep onset latency. A causative role of warming the feet in increasing the distal skin temperature and shortening the sleep onset latency was verified in [
4,
5]. The warming of the proximal temperature by less than 1 °C can reduce the sleep onset latency by at least 3 min, as reported in [
6].
The close association between sleep, body temperature regulation, and patterns of skin temperature change motivates the need for practical and ecologically valid means to unobtrusively measure the skin temperature during sleep and enable the longitudinal characterization of the dynamics of skin temperature during sleep.
The research presented in this article proposes an algorithm to estimate the wrist distal skin temperature during sleep with a minute-level temporal resolution using the data from a temperature sensor strip positioned slightly under the surface of a smart bed’s mattress.
2. Materials and Methods
2.1. Smart Bed Platform
Sleep Number’s primary sensing technology uses a pressure signal sampled at 500 Hz from a sensor positioned inside an inflatable air bladder within a smart bed. Ballistocardiography (BCG) signals, which reflect movement, cardiac activity, and respiratory activity, are obtained from the pressure signal through signal processing methods described in [
7]. The smart bed’s embedded software relies on machine learning methods to determine, in real time, bed presence, bed entry, bed exit, body movements, position changes, time to fall asleep, breathing rate, and heart rate [
7].
In addition to the pressure signal, recently introduced smart bed models (C360 [
8]) include an array of five temperature sensors arranged along a temperature sensor strip (TSS). These sensors are impalpable as they are located 0.25 inches below the sleeping surface.
The C360 smart bed type used in this study has two TSSs on each side of the bed, positioned as indicated in
Figure 1. For a queen-sized bed (width: 152.4 cm height: 203.2 cm), the individual temperature sensors are separated from each other by 14 cm. For a king-sized bed (width 203.2 cm and height: 203.2 cm), the spacing between the individual temperature sensors is 16.8 cm.
The individual temperature sensors are referred to as T1 to T5 starting from the edge to the center of the bed. The embedded software in the smart bed samples the temperature data of each individual sensor at an approximate rate of 1 sample/s. The resolution of each temperature sensor is 0.1 °C.
2.2. Study Design and Participants
This study, approved by the Institutional Review Board (IRB) of Northwestern University (protocol number STU00217800), was conducted between August 2023 and March 2024. Eighteen volunteers were recruited (9M/9F), with a mean age of 44.5 (standard deviation SD: 6.7) years (see the table in
Section 3.1), from Sleep Number customers having a C360 smart bed and residing within easy reach of the Sleep Health Center of Northwestern University in the western Chicago area, IL, USA.
The study volunteers signed an IRB-approved consent form and participated in this study for 16 nights in total. Each participant collected data from 14 sleep sessions at home in their usual smart bed and spent two nights in the sleep lab on a queen-sized C360 smart bed. Two queen-sized C360 smart beds, to which participants were randomly assigned, were installed in the research facilities at the Sleep Health Center at Northwestern University.
The in-home portion of the study included the collection of data from the TSS; the proximal skin temperature, using an iButton [
9] (type DS1922L; Maxim, Dallas Semiconductor Corp., Dallas, TX, USA) positioned just below the left collar bone on the participant’s chest; the foot distal skin temperature, using an iButton positioned near the arch on the right foot; and the non-dominant wrist distal skin temperature, using the Empatica E4 (Boston, MA, USA) wristband [
10]. The iButton device offers two resolutions, 0.5 °C or 0.0625 °C [
11]; we used the latter in this study. The Empatica E4’s skin temperature resolution is 0.02 °C [
12]
The in-lab portion of the study included polysomnography (PSG) signals [
13], the proximal skin temperature, and the foot distal skin temperature, measured using iButtons positioned similarly to the in-home condition. The Empatica wristband was not used in the lab to minimize the burden on participants who slept with a PSG setup that included an electroencephalogram, electrocardiogram, electromyogram, and breathing sensors. The TSS data from the smart beds installed in the sleep lab were also collected.
The sampling periodicity for the iButtons was set to 30 s, the sampling rate for the skin temperature acquired by the Empatica wristband was 4 samples/s, and the TSS sampling rate was approximately 1 sample/s/sensor.
In this research, we focus on estimating the wrist distal skin temperature (i.e., the temperature measured by the Empatica wristband) using the TSS data. Henceforth, we refer to the wrist distal skin temperature as DST.
The analysis of the complete dataset collected in this study including PSG signals, sleep architecture, and behavioral outcomes is ongoing and will be the topic of a future publication.
2.3. Distal Skin Temperature Estimation Algorithm
Figure 2 shows the block diagram of the algorithm to estimate the DST at time
n in minutes. An early version of this algorithm was presented in [
14].
The data from the temperature sensor array, acquired at a sampling frequency of approximately a sample per second for each sensor, were aggregated at the minute level by calculating the mean value such that the sampling rate for each individual temperature sensor was exactly 1 sample/min. The temperature data from the Empatica wristband and the iButtons were also sub-sampled at 1 sample/min to match the TSS’s sampling rate. Given the low variability of the skin temperature values [
15], such an aggregation can be performed without the risk of aliasing.
The five TSS values for the n-th minute were sorted in descending order ; and the top three largest values were selected for further processing. This selection aimed at focusing the analysis on the temperature values that most likely touched the body of the study participant.
A first DST estimation
was obtained using a gradient boosted tree implemented using the XGboost library in Python [
16]
. Because of the strong correlation between consecutive DST values [
15], five consecutive DST estimates were grouped into a five-element vector
, which was further processed by a random forest [
17] to produce the final DST estimation at time
n,
. The two regression models, namely, the gradient boosted tree and random forest, were sequentially trained as described in
Section 2.4.
2.4. Training and Validation of the Gradient Boosted Tree and the Random Forest Models
The data from all the sleep sessions from the in-home portion of the study were used to train and validate the regression models. Data selection criteria were applied to ensure that only high-quality data were used to train the models. The data from a sleep session were included if at least 80% of the TSS and Empatica data for that session had good enough quality characterized by the absence of missing data or invalid values that included null, negative, and large temperature oscillations within a minute (which indicate electronic issues). In addition, using the smart bed data from the pressure signal, the temperature data corresponding to segments of bed absence were discarded.
All the sleep session data from a given participant were considered as an indivisible block such that no data from the same participant could be in both training and validation sets. This procedure increases the likelihood of the validation accuracy to reflect the generalization accuracy (i.e., the accuracy on unseen data).
The skin temperature data measured at the non-dominant wrist by the Empatica E4 device was set as the ground-truth (or reference) DST value. These data were first smoothed using the robust locally weighted regression algorithm (LOWESS [
18]) and then sub-sampled from 4 samples/second to 1 sample/min to match the TSS sampling rate.
A five-fold cross-validation (CV) procedure [
19] was followed to train the regression models. The set of participants was partitioned into five subsets that were mutually exclusive and collectively exhaustive. For each CV fold, the data from a given subset were selected for accuracy evaluation (see accuracy evaluation method in
Section 2.5) and the data from the remaining four subsets were used to train the regression models in sequential order: the gradient boosted tree first, and then the random forest (see
Figure 3a).
The optimal set of hyperparameters for both regression models was determined by selecting a portion (20%) of the data to perform a Bayesian-type of optimization [
20] (implemented in Python using the Hyperopt library) in the hyperparameter space defined as described in
Table 1. The optimal hyperparameter values were used for the five-fold CV.
2.5. Accuracy Evaluation
For each CV fold, the accuracy was evaluated at the sleep session level first (see
Figure 3b) and then the mean values across all sleep sessions in the validation set were calculated to obtain the accuracy corresponding to the CV fold.
At the sleep session level, the agreement between the ground-truth values obtained from the reference DST and the estimated DST values (smoothed using LOWESS) was evaluated using Bland–Altman plots [
21], from which the bias, the upper limit of agreement (LoA), and the lower LoA were calculated. Given the ground-truth DST values for a sleep session
and the corresponding DST estimated values
, the bias, lower LoA, and upper LoA were calculated as described in Equation (
1). In this equation, the initial value for
n is 5 since the first estimation
can only occur after five values:
have been accumulated (see
Figure 2).
is the standard error and the limits of agreement LoA correspond to the the 95% confidence interval for the bias.
In addition to the bias and LoA, the coefficient of determination
was also calculated to quantify the agreement between the reference and estimated DST. The coefficient of determination is informative of the regression type of analyses [
22] and can be interpreted as the proportion of the variation in the reference DST that is explained by the estimated DST.
2.6. DST Estimation Algorithm Verification Using the in-Lab Data
The DST for all in-lab sessions was estimated using the algorithm presented in
Section 2.3. For each sleep session, the estimated DST values starting at lights-off (which is the event that indicates to the study participants that they may start attempting to fall asleep) and ending an hour after lights-off, were normalized by subtracting the initial value to obtain a DST change curve.
The hour-long period for this analysis was intended to capture the falling asleep process which should be associated with an increase in DST, as elaborated in
Section 1. The grand average estimated DST change across all in-lab sessions was finally calculated by averaging the individual sleep session DST change curves.
An identical process to the one described in the previous paragraph was followed for the measured foot distal temperature data to obtain a grand average foot distal temperature change (with respect to lights-off) across all in-lab sessions. While the latter may not be identical to the grand average estimated DST change because of the difference in location (wrist versus foot), it is reasonable to expect a significant and strong correlation between these curves during the falling asleep process. We have therefore calculated the Pearson correlation and corresponding statistical significance.
4. Discussion
The significant association between skin temperature and sleep, the potentially causal role of the former on the transition from wakefulness to sleep, and sleep maintenance motivates the need for unobtrusive and ecologically valid means to measure the skin temperature while lying in bed. This can enable the characterization of the skin temperature dynamics across a large sample of individuals for a long period of time and under real life conditions.
In this study, we leveraged the capabilities of a smart bed (C360) equipped with temperature sensor strips located on each side of the bed and slightly under the sleeping surface to develop an algorithm to estimate the wrist distal skin temperature. For this purpose, we leveraged the data collected in a study where the participants (owners of C360 smart beds) collected temperature data for fourteen nights sleeping on their own smart bed at home, and for two in-lab nights sleeping on a C360 bed installed in the sleep lab. This design allowed us to train and validate the algorithm with the in-home data and verify the results using the in-lab data.
The relatively slow temporal variation in the skin temperature [
15] enabled the resampling of both the sensor strip and the ground-truth data at 1 sample/min. The orientation of the sensor strip (see
Figure 1) suggests its sensitivity to lateral sleep movements, which were (in part) compensated by our strategy of selecting, at each time point, the data from the top three out of the five temperature values provided by the sensor.
The architecture of the skin temperature estimation algorithm was organized into two stages; the first stage is a regression model implemented as a gradient boosted decision tree and the second stage leveraged the strong temporal correlation between consecutive skin temperature values to generate the final skin temperature estimation using five consecutive values (i.e., five minutes) from the first stage, which were grouped using a rolling window. We approached the training of this staged regression model in a sequential manner and used a five-fold cross-validation procedure that ensured that the data from a given participant were used either in the training or the validation but not in both. This strategy increases the likelihood that the validation accuracy reflects the generalization accuracy when the model is applied to data from “unseen” individuals.
The convergence of training and validation curves towards a coefficient of determination close to 0.9 (see
Figure 4) suggests that the model has an appropriate generalization ability and may explain about 90% of the variability in the reference skin temperature data. The Bland–Altman analysis across all cross-validation folds suggests that the algorithm may not only be able to explain a high proportion of the variance but also reflect the ground-truth skin temperature with a near-zero bias and absolute error
°C.
The examination of the data from nine sleep sessions in the validation set (see
Figure 6) illustrates the close agreement between the estimated and ground-truth distal skin temperature. It is illustrative to appreciate that the moments where the estimation error appears as more prominent correspond to large deviations in the ground-truth data (see first and last graph in
Figure 6). It may be argued that the estimation provided by our algorithm is more stable and as such may be more resilient to perturbations. This is likely the result of the second-stage random forest model that processes five consecutive estimates from the first stage. However, this can also indicate a limitation in the ability to capture rapidly changing temperature variations that may reflect sleep-relevant aspects.
The verification results using the in-lab data provide further evidence of the generalization ability of the algorithm. Indeed, the temperature sensor strip data for each participant were collected from a different smart bed (i.e., in-lab versus at-home), and the distal temperature data were measured on the foot instead of the wrist. Furthermore, there were two more participants for the in-lab condition with respect to the in-home condition. While the distal skin temperature measured on the wrist may not be identical in value to that measured on the foot [
5], it can be reasonably expected that during the falling asleep process, both temperatures increase, which facilitates the decrease in the core body temperature. The latter is a hallmark of the transition from wakefulness to sleep [
1]. The grand average distal skin temperature change curves shown in
Figure 6 show that the foot distal skin temperature increases, which reflects a transition from wakefulness to sleep and that the grand average distal skin temperature estimated by our algorithm shows a similar increasing trend that significantly and strongly correlates (Pearson correlation: 0.99) with the measured distal skin temperature.
To train the distal skin temperature estimation model, strict quality selection criteria were applied. Indeed, about 20% (49/229) of the in-home sleep session data was not considered for training due to its insufficient TSS quality and the data from 85 in-home sleep sessions were also excluded due to the insufficient quality of the ground-truth reference. Emphasizing data quality over quantity for the training of machine learning models has been recommended as a strategy to improve model performance [
23]. Notwithstanding the strict data quality criteria applied to model training, the verification with the in-lab data considered all the sleep sessions.
There are a number of limitations associated with this research. The number of participants in this study is small and this limits the generalization of the results we have presented.
The reference for the distal skin temperature was collected from the wrist. However, it may be more appropriate to consider as the distal temperature reference a mean value across several sensors positioned on all limbs. The algorithm architecture proposed in this article may still be able to successfully use that type of reference.
The distal-to-proximal gradient temperature was suggested to be the best predictor for sleep onset [
24] and, as such, it may be preferable to design an algorithm to estimate this gradient instead of the distal skin temperature alone.
Sleep stage information from the in-home data is not available given that those recordings did not include electroencephalograms or other PSG signals to enable sleep staging. This limits the analysis of the accuracy depending on the sleep stages.
The precise position of the sleeper on the bed was not recorded, which could have provided valuable context information about the segments with missing or erroneous data that we discarded in our analysis. Our remediation strategy consisted of selecting the top three temperature values as proxies for the sensors that touch the body, and could be optimized if the sleeper location was known.
The accuracy of the system was not tested while using electric blankets or other devices. Provided that these would not directly touch the sensors and modify only the in-bed micro-environment, it is reasonable to expect that the algorithm can still provide accurate skin temperature estimation. However, this needs to be formally validated.
The method proposed in this research to unobtrusively estimate the distal skin temperature while lying in bed has a promising accuracy and generalization ability. To the best of our knowledge, this is the first study reporting on the feasibility of unobtrusively estimating the skin temperature while lying in bed. Additional research is needed to confirm these encouraging results in a larger cohort.