1. Introduction
Interest in the study of sleep is growing, driven by greater awareness of the negative consequences of poor sleep quality. These include daytime drowsiness, which can lower work performance and increase the risk of falling asleep while driving, a weakened immune system, and a generally lower quality of life [1,2].
First of all, good-quality sleep is characterized by sufficient Total Sleep Time (TST); specifically, 7 to 9 h are recommended. It must also be continuous, with few interruptions, each as short as possible. It must then be deep enough and be preceded by a short time to fall asleep [3,4]. Finally, it is important to determine whether the individual suffers from sleep disorders, such as respiratory disturbances, which could prevent sufficient energy recovery during the night and lead to daytime drowsiness [5].
The medical gold standard for the study of sleep is polysomnography (PSG), a medical examination in which electroencephalogram (EEG), electrocardiogram (ECG), electromyogram (EMG), electrooculogram (EOG), nasal flow, and movements during the night are monitored. The collected data are analyzed by a physician, who performs the sleep scoring and identifies any sleep disorders [6,7,8].
Sleep scoring is the identification of the subject’s sleep stage over time. During the night, light sleep (NREM1 and NREM2 phases), deep sleep (NREM3 phase), and REM sleep alternate. Light sleep is characterized by a slowing of cardiac and respiratory activity. Deep sleep is characterized by minimal Heart Rate (HR) and slow brain waves. REM sleep is distinguished by Rapid Eye Movements, from which it takes its name, and by a higher HR [9,10]. The advantage of polysomnography is that it analyzes several physiological parameters, including EEG, yielding the most accurate scoring and reporting of disturbances. Its disadvantages, however, are cost and invasiveness. A less invasive technique for monitoring sleep is actigraphy, which scores sleep from the movements detected by a wearable device [11,12]. Thanks to improvements in wearable sensors, a growing number of devices can perform scoring using HR, Heart Rate Variability (HRV), and 3-axis acceleration [13,14].
Sleep is strongly related to the Autonomic Nervous System (ANS), which regulates many involuntary body functions, such as breathing, digestion, HR, and blood pressure. The ANS has two main components: the sympathetic nervous system, which drives “fight or flight” responses, and the parasympathetic nervous system, which promotes rest and recovery. During sleep, the ANS plays a key role in regulating the different sleep stages, especially the transitions between REM and non-REM sleep. HRV measures the variation in the time intervals between successive heartbeats (RR intervals) and is closely related to ANS activity [15]. Because of this relationship, wearable devices can perform sleep scoring fairly accurately: less accurately than PSG, but at lower cost and with less invasiveness. Besides the RR intervals themselves, several HRV metrics, such as SDNN (Standard Deviation of NN intervals) and RMSSD (Root Mean Square of Successive Differences), are used together with HR and acceleration to achieve higher accuracy.
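To illustrate the two HRV metrics named above, the following minimal Python sketch computes SDNN and RMSSD from a list of RR intervals. It assumes the intervals are already artifact-free (NN intervals) and uses the sample standard deviation for SDNN. Both choices, and the RR values themselves, are illustrative assumptions and do not describe the processing used in this work.

```python
import math
import statistics


def sdnn(rr_ms):
    """SDNN: standard deviation of NN (artifact-free RR) intervals, in ms."""
    # Sample standard deviation (n - 1 denominator); an assumed convention.
    return statistics.stdev(rr_ms)


def rmssd(rr_ms):
    """RMSSD: root mean square of successive RR-interval differences, in ms."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


rr = [800, 810, 790, 820, 805]  # hypothetical RR intervals (ms)
print(f"SDNN = {sdnn(rr):.2f} ms, RMSSD = {rmssd(rr):.2f} ms")
```

SDNN summarizes overall variability across the window. RMSSD emphasizes beat-to-beat changes, which are commonly associated with parasympathetic activity.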
An important topic in the study of sleep is on-the-fly sleep scoring. An on-line (or on-the-fly) algorithm generates an output in response to each portion of the input without knowing the future input [16]. An off-line algorithm, instead, provides its output only after receiving the whole input sequence. Besides estimating how the previous night went, in some cases it would be useful to know which stage the subject is in at a specific moment. For example, it is sometimes useful to prevent a subject from entering REM or deep sleep, and to act immediately if this happens. Several studies have proposed algorithms that identify the sleep stage during the night.
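The Python sketch below makes the on-line/off-line distinction concrete. It contrasts a causal moving average, which emits one value per incoming sample using only past data, with a centered moving average, which needs future samples and can therefore only run off-line. The moving average simply stands in for any per-sample computation. The window sizes and HR values are hypothetical.

```python
from collections import deque


class OnlineMovingAverage:
    """Causal (on-line) moving average: each output uses only past samples."""

    def __init__(self, window):
        self.buf = deque(maxlen=window)  # oldest samples drop out automatically

    def update(self, x):
        self.buf.append(x)
        return sum(self.buf) / len(self.buf)


def offline_centered_average(xs, half_width):
    """Off-line centered average: each output also uses future samples,
    so it can only be computed once the whole sequence is available."""
    out = []
    for i in range(len(xs)):
        lo, hi = max(0, i - half_width), min(len(xs), i + half_width + 1)
        out.append(sum(xs[lo:hi]) / (hi - lo))
    return out


hr = [70, 68, 66, 60, 58]  # hypothetical 1 Hz heart-rate samples (bpm)
ma = OnlineMovingAverage(window=3)
online = [ma.update(x) for x in hr]         # available as each sample arrives
offline = offline_centered_average(hr, 1)   # available only at the end
```

The on-line outputs for the first samples never change when later samples arrive. The centered outputs, by contrast, depend on data that an on-the-fly scorer has not yet seen.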
Thankachan et al. [17] developed an automated threshold calculation algorithm capable of identifying NREM, REM, and wake in mice. They used EEG and EMG and achieved an overall accuracy of 90%.
Koushik et al. [18] developed a time-distributed Convolutional Neural Network able to perform real-time sleep scoring. Their algorithm identifies five stages: N1, N2, N3, REM, and wake. They used a single-channel EEG obtained from a modified version of the Muse headband [19] and achieved an overall accuracy of 83.5%.
Rault et al. [20] developed another real-time algorithm able to detect sleep–wake patterns from a single-channel EEG with a resolution of 30 s. They applied it to critically ill patients, achieving a median sensitivity of 97.9% and a median specificity of 49.7% in sleep–wake pattern identification.
Zhang et al. [21] implemented a real-time Automated Sleep Scoring device with a single probe that senses blood oxygen, PPG, and actigraphy. It performed four-stage sleep scoring with an accuracy of 75.1%.
Most of these approaches require EEG, which is invasive, or dedicated ad-hoc devices. The idea behind this work is to implement an on-the-fly scoring algorithm that works with signals available from a consumer smartwatch, in particular HR, RR intervals, and 3-axis acceleration at 1 Hz. The research significance of this concept lies not only in performing sleep scoring with HR, RR intervals, and accelerations, but in using these parameters in an on-the-fly sleep scoring algorithm. Obtaining these data from cheap, minimally invasive, and widespread wearable devices can enable interesting applications. One use case for on-the-fly sleep scoring is driving.
Despite the spread of autonomous driving, available systems are not completely autonomous. Cars with partial and conditional automation (L2–3) require human attention: the driver must be able to take back control of the vehicle when needed through the handover procedure [22]. In an automated context, drivers tend to be distracted and can become drowsy more easily. Drowsy driving is a public health problem that causes 17.6% of road accidents, according to an AAA Foundation study of the National Highway Traffic Safety Administration [23]. In particular, a deeper sleep state leads to greater sleep inertia, which can prevent the driver from regaining control in time [24].
The problem with the proposed algorithm is its dependency on the 3-axis accelerometer, whose signal could be strongly affected at the wheel, especially on rough terrain. Therefore, a second version of the algorithm that does not use the accelerometer was developed and tested.
2. Materials and Methods
Participants were monitored overnight in their homes with a full PSG. Before taking part in the data collection, they signed an informed consent and filled out a form reporting gender, age, height, weight, and whether or not they were taking beta-blockers. Along with the PSG, they wore a smartwatch with a consumer-grade optical HR sensor. The wearable uses photoplethysmography (PPG) to continuously detect HR at 1 Hz, and its accelerometer provides 3-axis acceleration data at up to 25 Hz. In addition to HR, the PPG sensor allows the detection of RR intervals. The smartwatch was tested in different physical activity scenarios, achieving a Mean Absolute Percentage Error (MAPE) on mean HR of around 3%. All of the above data were extracted at 1 Hz. The PSGs were read by a physician, who produced the medical scoring.
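The MAPE figure quoted above is conventionally the mean of absolute errors relative to reference values, expressed as a percentage. The Python sketch below shows the computation. It assumes per-scenario mean HR values from a reference device and from the smartwatch. These values are invented for illustration and are not the validation data mentioned above.

```python
def mape(reference, estimate):
    """Mean Absolute Percentage Error (%) of estimates against reference values."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = (abs(r - e) / abs(r) for r, e in zip(reference, estimate))
    return 100.0 * sum(errors) / len(reference)


# Hypothetical per-scenario mean HR (bpm): reference device vs. smartwatch
ref_hr = [60.0, 100.0, 150.0]
watch_hr = [62.0, 97.0, 150.0]
print(f"MAPE on mean HR: {mape(ref_hr, watch_hr):.2f}%")
```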
2.1. Motion-Aware On-the-Fly Sleep Scoring Algorithm
Data acquired from the wearable device were passed to the Matlab algorithm capable of on-the-fly scoring (every second, the algorithm evaluates the sleep stage the subject is in based on the data acquired so far).
The algorithm takes HR, RR intervals, and 3-axis acceleration as input. Assuming that the subject is awake at the beginning of the recording, it starts by looking for the physiological HR fall that marks sleep onset [9]. Next, it identifies the subsequent stages by following the known behavior of each sleep stage.
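The sleep-onset step, an HR decrease followed by HR stabilization inside the evaluation window, can be sketched in Python as an on-line detector fed one 1 Hz HR sample at a time. Several details are hypothetical choices for illustration and are not the paper's calibrated parameters:

- splitting the window into an earlier and a later half;
- the parameters win_size, drop_bpm, and stable_std.

```python
import statistics
from collections import deque


class SleepOnsetDetector:
    """Hypothetical on-line sleep-onset check: an HR drop followed by a stable HR.

    win_size, drop_bpm and stable_std are illustrative values only.
    """

    def __init__(self, win_size=120, drop_bpm=5.0, stable_std=2.0):
        self.hr = deque(maxlen=win_size)
        self.drop_bpm = drop_bpm
        self.stable_std = stable_std

    def update(self, hr_bpm):
        """Feed one 1 Hz HR sample; return True if onset is seen in the window."""
        self.hr.append(hr_bpm)
        if len(self.hr) < self.hr.maxlen:
            return False  # not enough history yet
        samples = list(self.hr)
        half = len(samples) // 2
        early, late = samples[:half], samples[half:]
        # HR decrease: later half clearly lower than the earlier half
        dropped = statistics.mean(early) - statistics.mean(late) >= self.drop_bpm
        # HR stabilization: little spread in the later half
        stable = statistics.pstdev(late) <= self.stable_std
        return dropped and stable


det = SleepOnsetDetector()
trace = [72.0] * 60 + [64.0] * 60  # hypothetical: awake, then HR falls and settles
onset_at = next(i for i, hr in enumerate(trace) if det.update(hr))
```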
LIGHT Sleep: NREM1 is a transitional phase between wakefulness and sleep, lasting a few minutes, in which the body begins to relax and from which one is easily awakened.
Sleep then transitions to NREM2, a phase more stable than NREM1 and with fewer movements, which covers about 50% of total sleep. In the algorithm under consideration, NREM1 and NREM2 are combined into the light sleep stage, a phase characterized by lower HR and Respiratory Rate than wakefulness.
DEEP Sleep: NREM3 sleep, also known as deep sleep, is characterized by a further lowering and regularization of HR and respiratory rate; movements are minimal. In adults, it covers about 15–20% of total sleep and occurs mostly in the early part of the night.
REM Sleep: The last stage is the Rapid Eye Movements (REM) phase, named after the irregular movements the eyes make during this stage. It is characterized by brain activity similar to that of wakefulness and, consequently, by higher cardiac activity than the other sleep phases. Total muscle atonia is present. It covers about 20–25% of total sleep and occurs mostly in the final part of the night [10,25].
The algorithm begins with the assumption that the user who starts it is awake. Even though the smartwatch needs more than 1 s to evaluate the different metrics, it starts evaluating them as soon as it is worn, so their values are already available when the application starts. Then, after each sample acquisition, if the subject is awake, the algorithm looks for an HR decrease followed by HR stabilization inside the evaluation window. If this occurs, the subject has fallen asleep. Once the subject is asleep, the algorithm behaves as follows:
IF
a large enough portion of HR is greater than the threshold AND
a large enough portion of RR intervals is smaller than the threshold AND
there are enough movements,
THEN the subject is AWAKE;
ELSE IF
THEN the subject is in DEEP sleep;
ELSE IF
THEN the subject is in REM sleep;
ELSE the subject is in LIGHT sleep.
All the recordings were processed by this algorithm, and the resulting scoring was compared with the PSG scoring produced by the physician. The flowchart in Figure 1 shows how the algorithm works according to the rules above. The 10 parameters (winSize, hrTh, rrTh, accTh, stdTh, x1, x2, x3, x4, and x5) were defined from a set of polysomnography tests other than those presented in this paper. In particular, the behavior of HR, RR intervals, and accelerations was observed in the different sleep stages. hrTh, rrTh, and accTh are based on the behavior of those metrics during wakefulness. stdTh is based on how the standard deviation of the moving-average-filtered HR decreases during deep sleep. The parameters x3 and x5 represent the minimum time intervals required for transitions to the deep and REM stages. The tests presented in this paper are used to validate the algorithm through a double-blind experiment: the physician performs the PSG scoring independently of the algorithm results, and the algorithm results are collected without knowledge of the PSG scoring.
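A minimal Python sketch of the rule cascade for one evaluation window follows. Only the AWAKE rule mirrors the text directly. The DEEP and REM conditions are not spelled out in this section, so both checks here are assumptions:

- DEEP: a low standard deviation of the moving-averaged HR, inspired by the description of stdTh.
- REM: a mostly motionless window, inspired by REM atonia.

The threshold values, the single min_share standing in for "a large enough portion", and the omission of the parameters x1 to x5 (including the minimum-duration constraints) are further simplifications.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Hypothetical thresholds, not the calibrated values used in the paper."""
    hr_th: float = 65.0     # bpm; HR above this points to wakefulness
    rr_th: float = 0.92     # s; RR intervals below this point to wakefulness
    acc_th: float = 0.05    # g; acceleration above this counts as movement
    std_th: float = 1.0     # bpm; smoothed-HR std below this points to deep sleep
    min_share: float = 0.6  # "a large enough portion" of the window
    ma_len: int = 5         # moving-average length applied before the std check


def share(xs, pred):
    """Fraction of the window's samples that satisfy pred."""
    return sum(1 for x in xs if pred(x)) / len(xs)


def moving_average(xs, n):
    return [statistics.mean(xs[i:i + n]) for i in range(len(xs) - n + 1)]


def classify_window(hr, rr, acc, p=Params()):
    """Rule cascade AWAKE -> DEEP -> REM -> LIGHT over one window of 1 Hz samples.

    The window is assumed to hold at least p.ma_len samples.
    """
    # AWAKE: high HR, short RR intervals, and enough movement (as in the text).
    if (share(hr, lambda x: x > p.hr_th) >= p.min_share
            and share(rr, lambda x: x < p.rr_th) >= p.min_share
            and share(acc, lambda x: x > p.acc_th) >= p.min_share):
        return "AWAKE"
    # ASSUMPTION: DEEP when the moving-averaged HR is very stable.
    if statistics.pstdev(moving_average(hr, p.ma_len)) < p.std_th:
        return "DEEP"
    # ASSUMPTION: REM when the wrist is mostly motionless (muscle atonia).
    if share(acc, lambda x: x <= p.acc_th) >= p.min_share:
        return "REM"
    return "LIGHT"
```

Because the branches are evaluated in order, each later rule only sees windows that all earlier rules have rejected. This is why the final branch can label everything else as light sleep.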
2.2. On-the-Fly Sleep Scoring Algorithm
As previously stated, a second version of the algorithm was developed that performs on-the-fly sleep scoring without the accelerometer. The idea is similar to the previous one: signals are compared with thresholds inside a window to identify the sleep stage. In this case too, the algorithm begins with the assumption that the user who starts it is awake. Then, after each sample acquisition, if the subject is awake, the algorithm looks for an HR decrease followed by HR stabilization inside the evaluation window. If this occurs, the subject has fallen asleep. Once the subject is asleep, the algorithm behaves as follows:
IF
a large enough portion of HR is greater than the threshold AND
a large enough portion of RR intervals is smaller than the threshold,
THEN the subject is AWAKE;
ELSE IF
THEN the subject is in DEEP sleep;
ELSE IF
a large enough portion of the SDNN is larger than the 1st threshold AND
a large enough portion of the SDNN is larger than the 2nd threshold,
THEN the subject is in REM sleep;
ELSE the subject is in LIGHT sleep.
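The accelerometer-free cascade can be sketched in the same hedged way. Here the AWAKE rule simply drops the movement condition, and the REM rule follows the two-threshold SDNN condition stated above. The following elements are hypothetical:

- the DEEP check, a plain HR standard deviation below a threshold;
- the assumption that sdnn is a per-second series of SDNN values computed over recent RR intervals;
- every numeric value.

```python
import statistics


def share(xs, pred):
    """Fraction of the window's samples that satisfy pred."""
    return sum(1 for x in xs if pred(x)) / len(xs)


def classify_window_no_acc(hr, rr, sdnn, *, hr_th=65.0, rr_th=0.92,
                           std_th=1.0, sdnn_th1=0.05, sdnn_th2=0.08,
                           share_awake=0.6, share1=0.6, share2=0.3):
    """Accelerometer-free rule cascade over one window of 1 Hz samples.

    All thresholds and required shares are hypothetical placeholders;
    sdnn is assumed to be a per-second SDNN series in seconds.
    """
    # AWAKE: HR/RR rule with the movement condition removed.
    if (share(hr, lambda x: x > hr_th) >= share_awake
            and share(rr, lambda x: x < rr_th) >= share_awake):
        return "AWAKE"
    # ASSUMPTION: DEEP when HR is very stable within the window.
    if statistics.pstdev(hr) < std_th:
        return "DEEP"
    # REM: enough SDNN above the 1st threshold AND enough above the 2nd one.
    if (share(sdnn, lambda x: x > sdnn_th1) >= share1
            and share(sdnn, lambda x: x > sdnn_th2) >= share2):
        return "REM"
    return "LIGHT"
```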
6. Discussion
PSG is the gold standard for the study of sleep structure and pathology. It consists of monitoring a patient for one night while collecting data from different kinds of sensors, such as EEG and EOG. The next day, the collected data are read by a physician. This test provides the highest reliability in sleep detection, but it is invasive and expensive [26]. To address these limitations, there has been growing interest in wearable devices such as smartwatches and smart rings, whose sensors have improved greatly in recent years and now enable remarkable results in sleep estimation. The problem addressed in this work is to create an on-the-fly scoring algorithm that can be used with wearable devices. Data were collected from wearable devices, and a Matlab algorithm was written to take the collected data as input and compute the scoring. PSG was used as ground truth, as it is the most accurate technology for sleep scoring. The comparison of the results led to the following findings.
The most difficult phase to identify was light sleep, a transitional phase that shares characteristics with wakefulness and the REM stage and is therefore easily confused with them. The easiest to identify were wakefulness and deep sleep. Wakefulness shows higher HR and acceleration than every sleep stage, so it is usually more recognizable than the other stages, except for motionless wakefulness. Having both HR and accelerometer data helps to distinguish motionless wakefulness from sleep. Deep sleep is also recognizable thanks to its characteristically small HR variation. The greater reliability in wakefulness recognition made the sleep–wake parameters, namely TST, Wakefulness After Sleep Onset (WASO), Sleep Onset time, and awakening time, more accurate. In particular, the Sleep Onset, detected by looking for the physiological fall of HR, was the best-identified parameter.
Other works have compared the sleep scoring of different wearables with that obtained by PSG. Kainec et al. [27] compared five commercial sleep-tracking devices with actigraphy and PSG. Lee et al. [28] compared 11 commercial sleep devices, including wearable, nearable, and airable devices, and found a bias toward the light sleep stage across all of them. For wearables in particular, they found that wakefulness was usually misclassified as light sleep. Chinoy et al. [29] compared the performance of seven consumer sleep-tracking devices with PSG and found that they overestimated light sleep with respect to the medical device. Kim et al. [30] evaluated the sleep-tracking ability of a specific smartwatch, the Samsung Galaxy Watch 3, against PSG and obtained weaker results on REM sleep. All of these studies agree that TST is overestimated at the expense of WASO. This is due to the difficulty of detecting motionless wakefulness, which sometimes also causes our algorithm to misclassify it as light sleep. The main limitations of our results with respect to PSG are the absence of EEG, which is critical for reliable sleep stage identification, and the on-the-fly nature of the algorithm: evaluations cannot be performed on the whole signals, only on the portion collected so far.
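The Python sketch below derives the sleep–wake parameters discussed above from a stage-label sequence. It also shows how scoring motionless wakefulness as light sleep inflates TST at the expense of WASO. The exact definitions used in this work are not given in this section, so the ones below are assumptions:

- onset: the first non-AWAKE sample;
- awakening: the first sample after the last non-AWAKE one;
- WASO: AWAKE time between onset and awakening.

The label sequences are invented.

```python
def sleep_wake_params(hypnogram, fs_hz=1.0):
    """Summarize a hypnogram of stage labels sampled at fs_hz (times in seconds).

    Assumed definitions: sleep onset = first non-AWAKE sample; awakening =
    first sample after the last non-AWAKE one; TST = time in non-AWAKE
    samples; WASO = AWAKE time between onset and awakening.
    """
    asleep = [label != "AWAKE" for label in hypnogram]
    if not any(asleep):
        return None  # no sleep detected in the recording
    onset = asleep.index(True)
    awakening = len(asleep) - asleep[::-1].index(True)  # index after last sleep sample
    waso_samples = sum(1 for s in asleep[onset:awakening] if not s)
    return {
        "sleep_onset_s": onset / fs_hz,
        "awakening_s": awakening / fs_hz,
        "tst_s": sum(asleep) / fs_hz,
        "waso_s": waso_samples / fs_hz,
    }


# Hypothetical night: two seconds of motionless wake inside the sleep period
night = ["AWAKE"] * 3 + ["LIGHT"] * 4 + ["AWAKE"] * 2 + ["DEEP"] * 5 + ["AWAKE"]
# Same night with that motionless wake scored as LIGHT
scored = ["AWAKE"] * 3 + ["LIGHT"] * 6 + ["DEEP"] * 5 + ["AWAKE"]
print(sleep_wake_params(night))
print(sleep_wake_params(scored))
```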
An interesting application of the on-the-fly sleep scoring algorithm could be identifying a deep sleep state while driving, to allow the driver to recover. Because the motion-aware algorithm depends on the 3-axis accelerometer, whose signal could be strongly affected at the wheel, especially on rough terrain, a second version that does not use the accelerometer was developed and tested. Although the main in-vehicle application is detecting deep sleep, all four stages were considered in the analysis to allow a more general use of the algorithm. With this version, too, the easiest stages to identify were wakefulness and deep sleep. It provided the same results as the previous version in deep sleep identification, since the previous algorithm also identified this stage using HR alone. Performance in wakefulness and REM identification slightly worsened because the contribution of the accelerometer is now missing. Performance on the sleep–wake parameters, with the exception of the Sleep Onset Advance, which depends only on HR, also slightly worsened due to the lack of accelerations.
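Per-stage comparisons of this kind are usually based on a confusion matrix between PSG and algorithm labels. The Python sketch below computes overall accuracy and per-stage recall from two aligned label sequences. PSG is conventionally scored in 30 s epochs, whereas the algorithm outputs one label per second. The sketch therefore assumes the sequences have already been aligned, for instance by repeating each epoch label for every second it covers; this section does not describe that step. The label sequences are invented.

```python
from collections import Counter

STAGES = ("AWAKE", "LIGHT", "DEEP", "REM")


def agreement(psg, algo, stages=STAGES):
    """Compare aligned PSG and algorithm labels.

    Returns overall accuracy, confusion counts keyed by (psg_label, algo_label),
    and per-stage recall (share of PSG samples of each stage matched).
    """
    if len(psg) != len(algo) or not psg:
        raise ValueError("label sequences must be non-empty and aligned")
    confusion = Counter(zip(psg, algo))
    accuracy = sum(confusion[(s, s)] for s in stages) / len(psg)
    recall = {}
    for s in stages:
        total = sum(confusion[(s, a)] for a in stages)
        recall[s] = confusion[(s, s)] / total if total else float("nan")
    return accuracy, confusion, recall


psg = ["AWAKE", "AWAKE", "LIGHT", "LIGHT", "DEEP", "REM"]
algo = ["AWAKE", "LIGHT", "LIGHT", "LIGHT", "DEEP", "LIGHT"]
acc, conf, rec = agreement(psg, algo)
```

In this toy example both errors are predictions of LIGHT, the kind of light-sleep bias reported in the literature discussed above.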
In other applications, such as sports rehabilitation, healthcare, and lifestyle monitoring, where the accelerations are not strongly affected by the environment, the motion-aware algorithm could be employed, providing better results. Movements are very helpful in recognizing wakefulness, usually characterized by higher accelerations, and REM sleep, which is known for its muscular atonia.
7. Conclusions
This study demonstrates that an on-the-fly sleep scoring algorithm can be effectively implemented using only Heart Rate (HR) and RR interval data from consumer wearable devices, with acceptable accuracy and reliability for practical applications. Where the situation allows, acceleration can also be included in sleep staging for better results. The algorithm was tested with and without accelerometer input to assess the influence of motion data on sleep stage identification.
The results suggest that while accelerometer data improves the identification of certain stages, particularly wakefulness and REM sleep, a reliable level of performance is still achievable with HR and RR intervals alone.
The findings underscore the potential of such algorithms in applications where on-the-fly monitoring is essential, such as in safety-critical environments like autonomous driving. In these cases, the ability to exclude the accelerometer makes the algorithm less sensitive to external motion interference, offering a versatile solution. Future work could focus on further refining this model to enhance accuracy across all sleep stages, potentially incorporating additional physiological markers as wearable technology evolves.