1. Introduction
Patients in hospitals have a 17% chance of experiencing at least one complication postoperatively [1], which increases to up to 27% for patients undergoing major surgery. These statistics underscore the importance of early detection of postoperative complications in hospitals. Outside of high-acuity departments such as the ICU, detection of complications is based on intermittent spot checks of vital signs. The main limitation of spot checks is the time between measurements, which is typically a few hours. Deviations in vital signs often manifest several hours before an adverse event occurs [2] but may go undetected for too long when only spot checks are used. Continuous monitoring of vital signs has previously been shown to contribute to earlier detection of adverse events in surgical departments [3]. However, current continuous monitoring techniques, as used in the ICU, are not suitable for deployment in low-acuity departments. Wired sensors are generally not acceptable there, as they can impede patient mobilization and be uncomfortable for patients [4]. Instead, wearable or contactless sensors would be more appropriate for continuous monitoring in low-acuity departments.
Camera-based vital signs monitoring is a group of relatively new technologies that could facilitate continuous monitoring in low-acuity departments. The main advantage of camera-based monitoring is that it is contactless and thus unobtrusive compared to contact sensors. Camera-based monitoring research focuses mainly on three parts of the electromagnetic spectrum: visible light (RGB), near-infrared (NIR) light, and far-infrared light (thermal radiation). Within these three groups, many techniques have been developed to monitor various vital parameters, such as heart rate [5,6,7,8], heart rate variability [9], respiration rate [10,11,12,13], SpO2 [14,15], and temperature [16]. Furthermore, research has shown that monitoring other facial features, such as the redness of the skin, can provide markers for detecting patient deterioration [17,18].
Camera-based heart and respiration rate monitoring is by far the most mature of these technologies. In this work, we focus on remote photoplethysmography (remote PPG) for heart rate monitoring and breathing motion extraction for respiration rate monitoring. First, remote PPG is a technique that measures microscopic skin color variations caused by the cardiovascular pulse wave. The remote PPG signal can be extracted using RGB and NIR cameras, and many methods have been proposed for its (robust) extraction [5,6]. Second, breathing motion extraction is based on detecting the respiratory movements of the torso. Various methods have been proposed, for example, based on optical flow [10] or cross-correlation [11].
While camera-based heart and respiration rate monitoring is a well-explored field of research, most validations of these techniques have been performed on healthy volunteers and/or short video recordings. Continuous camera-based monitoring poses potentially substantial challenges that cannot easily be emulated by short recordings of healthy subjects, including naturally occurring patient activity, changes in illumination, occlusion of the patient, and clinical interventions. Ideally, these challenges would be investigated in patients in low-acuity departments, where we would like to apply this technology; however, the lack of continuously measured reference values for heart and respiration rates there limits what we can learn about the accuracy of camera-based monitoring. Although not ideal, the ICU provides a suitable tradeoff between the availability of reference values and the presence of challenging monitoring situations.
Previous research by Jorge et al. [19] on camera-based heart and respiration rate monitoring in the ICU provided insights into the coverage and mean absolute error (MAE) using monochrome recordings of 15 postoperative ICU patients with a total duration of 233.5 h. First, they manually removed privacy moments and occlusions of the patient by visitors or staff. From the remaining video, they extracted heart rate with a coverage of 53.2% and respiration rate with a coverage of 63.1%.
In our work, we intend to add to their insights into camera-based monitoring performance and coverage by including other types of ICU patients. Specifically, we evaluate state-of-the-art camera-based heart and respiration rate extraction methods on a group of postoperative cardiac surgery patients, a surgery type not included in the work of Jorge et al. [19], and a second group of patients with an emergency admission to the ICU. These groups of patients tend to have a wider range of vital parameters and a higher rate of adverse events than the types of postoperative patients in the work of Jorge et al. [19]. This can provide useful insight into whether the system remains accurate during abnormalities in the vital parameters, which can be an indicator of future deterioration.
Furthermore, Jorge et al. [19] used a monochrome camera to extract heart rate. However, previous research [5] shows that heart rate extraction based on a single-wavelength camera is inferior to techniques that combine multiple color channels. In this work, we will show that for many patients, the only limitation on performance and coverage is the room lighting: for these patients, a 5 beats/min (BPM) agreement of 97.3% (at an average heart rate of 80 BPM) and a coverage of 80.1% were reached. When room lighting is insufficient, a monochrome camera that is sensitive in the NIR spectrum and an NIR light source are still needed to provide coverage. This presents the additional challenge of combining measurements from both modalities depending on lighting conditions.
Finally, Jorge et al. [19] extracted the respiration rate from changes in pixel intensity caused by breathing motion. Instead, an optical flow method is likely more effective for breathing motion extraction, since pixel intensity changes do not always correlate with breathing motion. Wang et al. [10] demonstrated that such a method can be highly sensitive to breathing motion, which is required to measure the breathing motion of patients with varying breathing patterns, amplitudes of chest movement, and postures relative to the camera. However, this method cannot yet distinguish breathing motion from other motion. The presence of non-breathing motion should be detected, as any respiration rate extracted while non-breathing motion is present could be inaccurate. In this work, we therefore develop a metric to determine whether extracted breathing motion is distorted by non-breathing motion.
In this work, our aim is to provide new insights into the performance and coverage of camera-based continuous heart and respiration rate monitoring using state-of-the-art methods. While it is unlikely that perfect coverage can be achieved with the current methods, we aim to evaluate the usability of camera-based monitoring as a replacement for spot-check-based monitoring in the general ward. In this setting, it is not crucial to measure absolutely continuously, but up-to-date measurements need to be available when required by hospital staff. Challenging situations in which camera-based monitoring becomes inaccurate therefore need to be understood, as their prolonged presence could leave staff without up-to-date and accurate heart and respiration rate measurements. We therefore investigate where challenging situations occur and what their effect is on the extraction of the heart and respiration rate.
Figure 1 shows our complete proposed methodology for camera-based heart and respiration rate monitoring. The remainder of this paper is structured as follows: In Section 2, we outline the measurement setup, video data collection, and reference vital parameters; present the heart and respiration rate extraction methods for both RGB-based and NIR-based extraction; detail how we analyze the performance of the heart and respiration rate extraction; and develop the breathing motion metric that distinguishes breathing motion from non-breathing motion. In Section 3, we lay out the results for each vital parameter and imaging modality. Finally, in Section 4, we discuss the results and draw conclusions on the current performance of camera-based heart and respiration rate monitoring.
2. Materials and Methods
This work is part of the FORSEE study, a prospective single-center validation study of video-based vital sign monitoring in ICU patients in a tertiary care hospital (Catharina Ziekenhuis, Eindhoven, The Netherlands). The study was reviewed by the medical ethics committee MEC-U (Nieuwegein, The Netherlands, file no. W20.180), which determined that the Medical Research Involving Human Subjects Act (WMO) did not apply to this study. The study protocol was approved by the internal review board of the Catharina hospital in Eindhoven. The study was registered on clinicaltrials.gov under registry number NCT06099327 and took place from August 2022 until August 2023.
2.1. Measurement Setup
Video recording utilized two camera types: RGB and NIR. The RGB camera (3860-C-HQ Rev. 2, IDS GmbH, Obersulm, Germany) was configured for 32 fps at a resolution of 484 × 274 pixels with 2× binning and 12-bit depth. To ensure adequate brightness of the face, the RGB camera’s automatic shutter function was run once every minute, and the ROI for the brightness target was set to the face ROI detected with a face detector [22]. All other camera functions were disabled. The NIR camera (UI-3370CP-NIR-GL, IDS GmbH, Obersulm, Germany) operated at 16 fps at a resolution of 1920 × 1080 pixels with 8-bit depth. The lower frame rate was chosen to allow a longer shutter time, which was set to 66 ms to ensure sufficient brightness of the NIR image during the night. Again, all other functions were disabled. Both cameras were equipped with a fixed 8 mm focal length lens (M118FM08, TAMRON, Saitama City, Japan) with a fully open aperture.
Mounting the cameras above the patient’s bed would be ideal, as they would not obstruct the staff, and the higher angle of incidence would reduce the chance of staff occluding the view of the patient. However, altering the ICU room to mount the cameras on the ceiling was not feasible for this study. Instead, a mobile camera setup on a medical cart was placed at the foot end of the bed. This allowed a single setup to be used for data collection in multiple ICU rooms, and as the setup was mobile, it could be moved out of the way if needed by the staff. The cameras and a baby monitor (DVM-71, Alecto, Kerkrade, The Netherlands), of which only the NIR light source was used, were mounted close together on the cart at a height of 2 m and were aimed such that patient visibility could be ensured regardless of bed height and headrest adjustments. Each camera collected approximately 2 GB/min of video data, for a total of approximately 5.8 TB of video per 24 h per patient. Therefore, a laptop with 8 TB of external storage was used to collect and store the video data of one patient. After a recording was finished, the video data were transferred to another storage location and processed offline. Some patients were also seated in a chair next to the bed for part of the recording, in which case the setup was moved so that they remained within the field of view. If a moment of privacy was needed during a recording, the cameras were covered. The complete measurement setup can be seen in Figure 2.
2.2. Video Dataset
In total, this study included 36 ICU patients from the Catharina hospital (Eindhoven, The Netherlands). For 25 patients, informed consent was obtained prior to elective surgery with planned postoperative admission to the ICU. The remaining 11 patients had unscheduled (emergency) admissions to the ICU, and informed consent was obtained from the patient if they were conscious and otherwise from a family member. Each patient participated in the trial for up to 24 h or until discharged from the ICU, whichever occurred first. For one postoperative patient, a failure in the recording laptop caused the RGB camera to not store any video data, and therefore, this patient was not included in any further analysis. The summarized characteristics of the patients can be found in Table 1. Furthermore, the operation of the NIR camera depended on the operation of the NIR light source. In some cases, the NIR light source or the whole cart was moved or unplugged from the outlet, rendering the video recording from the NIR camera inadequate for further analysis. This occurred for two postoperative patients in total.
Each video was annotated for a number of conditions that would impede the measurement completely, i.e., when the patient was not in the field of view of the camera, was (partially) occluded by staff or equipment, or when the camera was covered by staff for privacy reasons. For the purpose of annotation, a second set of videos was created from the raw video recordings, downsampled to a frame rate of 1 frame/min. This resulted in a manageable number of frames to be annotated compared to the original 699 h of recording per camera. Furthermore, the frame rate of 1 frame/min is considered sufficiently high that most patient occlusions by staff and equipment can be detected, as they typically take 1 min or more. These annotations were then interpolated to the frame rates of the recordings by copying the nearest annotation to each video frame in the original recording.
2.3. Reference Vital Signs
In addition to the video recordings, contact-based vital signs measurements were also collected. For the purpose of this study, a single-lead ECG was collected from the patient monitor (IntelliVue MP70/MX750, Philips Medical, Best, The Netherlands) as the gold standard for heart rate. Furthermore, airway flow, airway pressure, and ECG thoracic impedance were collected as the gold standard for respiration rate. For the first 24 patients, contact sensor data were collected using a second laptop connected to the patient monitor running the iXTrend software package version 2.1 (ixitos GmbH, Aachen, Germany). For the remaining patients, Data Warehouse Connect version 4.0 (Philips Medizin Systeme Böblingen GmbH, Böblingen, Germany) was used to collect reference data from the patient monitor. The collection software was changed during the study due to the replacement of patient monitors in the ICU, as the new patient monitors were not compatible with the iXTrend software.
2.4. Heart Rate Extraction
In this work, the camera-based heart rate was extracted from a face ROI through the extraction of the remote PPG signal using both the RGB and NIR cameras. We briefly detail the processing pipeline for heart rate extraction for both cameras using (modified) existing state-of-the-art methods. The different processing steps can be seen in Figure 1, where we refer to the methodology used below each processing step. Furthermore, we have used an asterisk to mark methods that were modified and a plus sign to denote processing steps that are new contributions.
2.4.1. Face ROI Detection
For both NIR and RGB video, the face ROI was automatically detected using a face detector [20] every 30 s. The face detector produced ROIs and a corresponding score for all potential faces in the image. The patient’s face ROI was selected from the two highest-scoring ROIs depending on which was closer to the center of the frame. For all other frames, the face ROI was tracked using a Kernelized Correlation Filter (KCF) tracker [21]. If the face detector or tracker produced a face ROI whose center jumped more than 50 pixels in either direction between two frames, the face detector was immediately run again to ensure the patient’s face was still correctly detected.
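A minimal sketch of this detect-and-track loop is shown below, using OpenCV’s KCF tracker (available in the contrib build). The detector call `detect_faces` is a hypothetical stand-in for the face detector of [20]; the 30 s re-detection interval and the 50-pixel jump check follow the description above.

```python
import cv2
import numpy as np

def track_face(frames, fps, detect_faces, redetect_every_s=30, max_jump_px=50):
    """Detect the face ROI every 30 s and track it with KCF in between.

    detect_faces(frame) is a hypothetical stand-in for the detector in
    [20]; it should return candidate (x, y, w, h) boxes sorted by score.
    """
    rois, roi, tracker = [], None, None
    for i, frame in enumerate(frames):
        redetect = (i % int(redetect_every_s * fps) == 0) or roi is None
        if not redetect:
            ok, new_roi = tracker.update(frame)
            # Re-run the detector on tracker failure or implausible jumps.
            jump = ok and (abs(new_roi[0] - roi[0]) > max_jump_px or
                           abs(new_roi[1] - roi[1]) > max_jump_px)
            redetect = (not ok) or jump
            if not redetect:
                roi = new_roi
        if redetect:
            candidates = detect_faces(frame)[:2]  # two highest-scoring ROIs
            if candidates:
                # Keep the candidate closest to the center of the frame.
                center = np.array([frame.shape[1] / 2, frame.shape[0] / 2])
                roi = min(candidates, key=lambda b: np.linalg.norm(
                    np.array([b[0] + b[2] / 2, b[1] + b[3] / 2]) - center))
                tracker = cv2.TrackerKCF_create()
                tracker.init(frame, tuple(int(v) for v in roi))
        rois.append(roi)
    return rois
```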
2.4.2. Remote PPG Signal Extraction
The remote PPG signal was then extracted from the video frames using the detected face ROIs. First, for the NIR camera, the remote PPG signal was simply calculated as the spatial mean of the pixels inside the face ROI over time. Here, we must consider that there is spatial variation in the phase of the remote PPG signal across the skin [23], which could affect the spatial mean. We used a Fourier-spectrum-based heart rate extraction (see Section 2.4.3), which limits the effects of this phase variation. Additionally, Kamshilin et al. [23] note that local skin pixels can have counter-phase PPG signals, which would be problematic when averaging. In our work, the area of skin covered by a single pixel is much larger, which makes it unlikely that we can observe or mitigate these effects. For the RGB camera, the model-based Plane-Orthogonal-to-Skin (POS) method [5] was used to extract the remote PPG signal from the R, G, and B color channels in the face ROI.
Normally, the POS method uses the spatial means of the R, G, and B channels inside the face ROI to extract the remote PPG signal. However, in the hospital environment, where lighting is less controlled, overexposure of pixels can occur, which distorts the pulsatile changes in color due to blood flow. To limit the influence of overexposed pixels on the channel means, we propose assigning a weight to each pixel based on n, the number of times that pixel was overexposed in the last 0.5 s (16 frames), such that frequently overexposed pixels receive a low weight. The weights were updated for each new frame and then used to calculate the weighted spatial means of the R, G, and B channels.
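The exact weight formula did not survive in this text; the sketch below therefore uses w = 1 − n/16 as an illustrative assumption, with 12-bit pixels assumed near-saturated at 4080.

```python
import numpy as np

def weighted_channel_means(frames, saturation=4080, window=16):
    """Overexposure-weighted spatial means of R, G, and B in the face ROI.

    frames: (T, H, W, 3) array of face-ROI pixels (12-bit values assumed,
    so `saturation` marks near-clipped pixels). The exact weight formula
    is not given in the text; w = 1 - n/window is an illustrative
    assumption, where n counts overexposure events in the last `window`
    frames.
    """
    T = frames.shape[0]
    over = (frames >= saturation).any(axis=-1)  # (T, H, W) clip mask
    means = np.zeros((T, 3))
    for t in range(T):
        n = over[max(0, t - window + 1):t + 1].sum(axis=0)  # events per pixel
        w = 1.0 - n / window                                # assumed weight
        if w.sum() == 0:                                    # all pixels clipped
            w = np.ones_like(w)
        means[t] = (frames[t] * w[..., None]).sum(axis=(0, 1)) / w.sum()
    return means
```

The resulting per-frame channel means can then be fed into the POS method in place of the unweighted spatial means.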
2.4.3. Heart Rate Estimation from Remote PPG
Both the RGB- and NIR-based remote PPG signals were segmented into non-overlapping segments of 60 s using a Hanning window. The windowed segments were then zero-padded such that the Fourier spectrum of the segments had a resolution of 0.1 BPM. For each segment, the heart rate was chosen as the peak of the Fourier spectrum between 40 and 220 BPM, which we consider the range of valid human heart rates. Then, based on the video annotations, any segments containing frames that were marked as privacy-sensitive, out of view, etc., were discarded, and the remaining segments were marked as valid time. The valid recording time per patient is reported in Table 3.
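As a minimal sketch of this estimator (NumPy, with the 60 s segment already extracted; the 0.1 BPM resolution fixes the FFT length):

```python
import numpy as np

def heart_rate_from_ppg(segment, fs, resolution_bpm=0.1, lo=40, hi=220):
    """Estimate HR as the dominant Fourier peak of a 60 s PPG segment."""
    windowed = segment * np.hanning(len(segment))
    n_fft = int(round(fs * 60.0 / resolution_bpm))  # zero-pad to 0.1 BPM bins
    spectrum = np.abs(np.fft.rfft(windowed, n=n_fft))
    freqs_bpm = np.fft.rfftfreq(n_fft, d=1.0 / fs) * 60.0
    band = (freqs_bpm >= lo) & (freqs_bpm <= hi)    # valid human heart rates
    return freqs_bpm[band][np.argmax(spectrum[band])]
```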
2.4.4. Reference Heart Rate
The reference heart rate was extracted from the single-lead ECG measurements. The R peaks of each patient’s ECG were detected using RDECO [24] and processed into inter-beat intervals (IBIs). These IBIs were also divided into segments of 60 s, and the heart rate of each segment was calculated from the mean of the IBIs.
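For clarity, converting the mean IBI of a segment into a heart rate in BPM is a one-liner (a trivial sketch; the segmentation itself is omitted):

```python
import numpy as np

def segment_heart_rate(ibis_s):
    """Heart rate (BPM) of a 60 s segment from its inter-beat intervals (s)."""
    return 60.0 / np.mean(ibis_s)
```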
2.4.5. Error Analysis of RGB-Based Measurements
RGB-based heart rate extraction relies on visible light to enable successful heart rate measurements. As the measurement setup in this work does not include a dedicated visible light source, variations in room lighting, weather, and bed placement in the ICU room can strongly affect the lighting conditions. While we described above how we attempted to prevent and counteract overexposure of face pixels, there is also a minimum brightness level below which heart rate can no longer be reliably extracted from RGB video frames. First, we determined the brightness level as the mean across all pixels in the face ROI and the R, G, and B channels. Then, we assessed the 5 BPM agreement, coverage, and mean absolute error (MAE) as we increased a minimum brightness threshold to determine how the RGB-based extraction was affected by the face ROI brightness.
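A sketch of such a threshold sweep is given below, assuming per-segment arrays of camera and reference heart rates plus the mean face-ROI brightness (the variable names are ours):

```python
import numpy as np

def sweep_brightness(hr_cam, hr_ref, brightness, thresholds, tol_bpm=5.0):
    """5 BPM agreement, coverage, and MAE as the brightness threshold rises.

    All inputs are per-60 s-segment arrays; `brightness` is the mean
    face-ROI brightness of each segment.
    """
    results = []
    for thr in thresholds:
        keep = brightness >= thr
        err = np.abs(hr_cam[keep] - hr_ref[keep])
        results.append({
            "threshold": thr,
            "agreement": np.mean(err <= tol_bpm) if keep.any() else np.nan,
            "mae": err.mean() if keep.any() else np.nan,
            "coverage": keep.mean(),  # fraction of valid segments retained
        })
    return results
```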
After removing segments that could have poor agreement due to the brightness level, we further investigated the RGB recordings of patients with significant errors and, based on inspection of the ground truth and video recordings, sketched a picture of the likely causes of poor-quality monitoring in those recordings. Furthermore, we highlight which step in the heart rate extraction appears to be the problem.
2.4.6. Error Analysis of NIR-Based Measurements
The methodology of NIR-based heart rate extraction differs from the RGB-based extraction. The NIR-based extraction uses a dedicated infrared light source, which means that the lighting conditions are largely controlled. However, unlike the RGB-based extraction, the NIR-based measurements are not robust to patient motion. Patient motion can add intensity variations to the skin pixels that are not caused by blood flow. If their frequency content is within the heart rate range, i.e., 0.67–3.67 Hz (40–220 BPM), the extraction of the heart rate during patient motion can become inaccurate. As the NIR camera is monochrome, the POS method cannot be applied to provide robustness against patient motion. Therefore, patient motion must be detected so that NIR-based measurements distorted by it can be discarded.
To evaluate the effect of motion on the remote PPG extraction, we used Lucas–Kanade optical flow [25] to quantify the motion in each frame of the NIR videos. We then computed the mean optical flow magnitude over all pixels within the face ROI. Finally, we classified a segment of the remote PPG signal as containing motion if any frame within it had a motion magnitude greater than some threshold. This threshold was determined based on a tradeoff between the agreement and coverage of the NIR-based heart rate extraction.
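A sketch of the per-frame motion score is shown below, approximating dense flow by tracking a regular grid of points inside the face ROI with OpenCV’s pyramidal Lucas–Kanade implementation (this exact form is our assumption, not prescribed by [25]):

```python
import cv2
import numpy as np

def motion_magnitude(prev_gray, curr_gray, roi, grid_step=8):
    """Mean Lucas-Kanade flow magnitude over a grid of points in the face ROI."""
    x, y, w, h = roi
    xs, ys = np.meshgrid(np.arange(x, x + w, grid_step),
                         np.arange(y, y + h, grid_step))
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
    pts = pts.reshape(-1, 1, 2)
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    ok = status.ravel() == 1
    flow = (new_pts - pts).reshape(-1, 2)[ok]
    return np.linalg.norm(flow, axis=1).mean()

def segment_has_motion(magnitudes, threshold):
    """A segment is flagged if any of its frames exceeds the motion threshold."""
    return bool(np.max(magnitudes) > threshold)
```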
2.4.7. Combining RGB and NIR
In the brightness range where the RGB camera has no coverage or poor performance, the NIR camera should replace it to ensure coverage. To enable this, we applied the combination strategy detailed in Figure 3. Since the RGB-based extraction is motion-robust and the NIR-based extraction is not, the RGB-based extraction was used above the to-be-determined face ROI brightness threshold. Below the brightness threshold, the NIR measurement was used as long as the patient’s motion was below the motion threshold.
After choosing a motion threshold for the NIR measurements and an RGB face ROI brightness threshold, we reported the performance of the combined RGB and NIR heart rate measurements for each patient individually. For each patient, we reported a number of evaluation metrics: the 5 BPM agreement, MAE, coverage, largest time gap, and valid time. The largest time gap is the longest sequence of discarded measurements in the camera-based heart rate measurements during the valid time of each patient. The valid time is the total time during which the cameras were recording and reference measurements were available, excluding annotated moments of privacy, when the patient was not present, or when the view of the patient was obstructed by staff or equipment.
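The per-segment decision rule of Figure 3 then reduces to a few lines (a sketch; the threshold names are ours):

```python
import numpy as np

def combined_heart_rate(hr_rgb, hr_nir, brightness, motion,
                        brightness_thr, motion_thr):
    """Per-segment combination of RGB- and NIR-based heart rates.

    RGB is used when the face ROI is bright enough; otherwise NIR is used,
    but only if the segment is free of patient motion. Segments failing
    both checks yield NaN (no measurement).
    """
    out = np.full(len(hr_rgb), np.nan)
    use_rgb = brightness >= brightness_thr
    use_nir = ~use_rgb & (motion <= motion_thr)
    out[use_rgb] = hr_rgb[use_rgb]
    out[use_nir] = hr_nir[use_nir]
    return out
```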
2.5. Respiration Rate Extraction
To enable camera-based respiration rate extraction under all lighting conditions, both the RGB and NIR cameras were used. The extraction of the respiration rate in this work is based on measuring the chest movements of the patients. To accurately measure these chest movements, the optical flow method proposed by Wang et al. [10], specifically the M1D-OF method described in their work, was used for both types of video. This method restricts the general optical flow calculation to increase its sensitivity to breathing motion based on two assumptions: First, all respiratory motion is assumed to be in the vertical direction; accordingly, the horizontal flow is set to zero in the optical flow estimation. Second, the breathing motion is assumed to be homogeneous across the entire chest ROI. The different processing steps for respiration rate extraction can be found in Figure 1.
2.5.1. Chest ROI Detection
The chest ROI used to determine breathing motion is derived from the location and size of the face ROI. It is defined as a region directly below the face ROI; its height is equal to the height of the face ROI, and its width is three times that of the face ROI.
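In code, this geometric construction is simply the following (a sketch; boxes are (x, y, w, h) with y increasing downward, and horizontal centering on the face ROI is our assumption, as the text does not specify the horizontal placement):

```python
def chest_roi(face_roi):
    """Chest ROI directly below the face ROI: same height, triple width."""
    x, y, w, h = face_roi
    return (x - w, y + h, 3 * w, h)  # assumed centered under the face
```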
2.5.2. Breathing Motion Extraction
For each video frame, the pixels in the chest ROI were extracted and converted to grayscale. The M1D-OF method extracts the chest motion between pairs of frames. The distance between these frames was chosen to be 12 frames for the RGB camera and 5 frames for the NIR camera. To obtain the respiration signal, the chest motion was postprocessed as described by Wang et al. [10], which includes a sliding-window cumulative sum that integrates the signal while avoiding baseline wander of the respiration signal.
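A sketch of this stage is shown below; `m1d_of_step` (defined in Section 2.5.5) returns the vertical velocity between two frames, and the sliding-window cumulative sum follows the idea described by Wang et al. [10], with the window length being our assumption:

```python
import numpy as np

def breathing_signal(gray_rois, frame_dist, win):
    """Velocity between frame pairs, integrated with a sliding cumulative sum.

    gray_rois: list of grayscale chest-ROI images; frame_dist is 12 for
    RGB and 5 for NIR; `win` (in samples) is an assumed integration window.
    """
    v = np.array([m1d_of_step(gray_rois[i], gray_rois[i + frame_dist])[0]
                  for i in range(len(gray_rois) - frame_dist)])
    # Summing velocity over a sliding window yields a displacement-like
    # signal while slow drift (baseline wander) falls out.
    return np.convolve(v, np.ones(win), mode="valid")
```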
2.5.3. Respiration Rate Estimation
Similarly to the calculation of the heart rate, we used a windowed Fourier transform to extract the respiration rate from the chest motion signal. Again, we divided the respiration signals into segments of 60 s and selected the respiration rate as a peak of the Fourier spectrum of each segment. However, the highest spectral peak is not always the respiration rate; due to signal morphology, it can instead be one of the higher-order harmonics. To find the correct respiration rate, we first selected the two largest peaks within the 6–48 breaths/min (RPM) range, requiring that the smaller peak be at least half the height of the larger. Second, we determined the first harmonic through the criteria in Table 2. If a valid first harmonic was found, it was chosen as the respiration rate; otherwise, the highest peak was selected. In addition, any segments where the patient was out of view or occluded from the camera were discarded; the remaining valid time can be found in Table 4.
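A sketch of the two-peak selection is given below. The actual first-harmonic criteria live in Table 2 and are not reproduced here, so the near-integer-multiple check below is a hypothetical stand-in for them:

```python
import numpy as np
from scipy.signal import find_peaks

def respiration_rate(spectrum, freqs_rpm):
    """Pick the respiration rate from a segment's Fourier spectrum (6-48 RPM)."""
    band = (freqs_rpm >= 6) & (freqs_rpm <= 48)
    idx, _ = find_peaks(spectrum * band)
    if len(idx) == 0:
        return np.nan
    top = sorted(idx, key=lambda i: spectrum[i], reverse=True)[:2]
    if len(top) == 2 and spectrum[top[1]] >= 0.5 * spectrum[top[0]]:
        lo, hi = sorted(freqs_rpm[top])
        # Placeholder for the Table 2 criteria: accept the lower peak as the
        # fundamental if the higher one sits near an integer multiple of it.
        ratio = hi / lo
        if round(ratio) >= 2 and abs(ratio - round(ratio)) < 0.1:
            return lo
    return freqs_rpm[top[0]]  # fall back to the highest peak
```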
2.5.4. Reference Respiration Rate
For the reference respiration rate, we used the same extraction approach as for the camera-based respiration rate for all three reference signals; however, they were not all available all of the time. To form a single reference respiration rate, they were prioritized in the following order: airway flow, airway pressure, and finally ECG-based thoracic impedance. If all three were unavailable, the segment was removed from further consideration.
2.5.5. Detection of Breathing Motion Errors
The M1D-OF method [10] cannot yet differentiate between breathing and other motion in the chest ROI, leading to inaccurate respiration measurements in segments containing non-breathing motion. Here, we develop a metric to detect when non-breathing motion occurs so that inaccurate measurements can be discarded.
The image brightness function $I(x, y, t)$ describes the brightness of the pixels in the chest ROI over time. The gradient constraint equation [26] describes how optical flow, i.e., motion, relates to the image brightness function. The gradient constraint equation at time $t$ is written as follows:

$$I_x(x, y, t)\,u(x, y) + I_y(x, y, t)\,v(x, y) + I_t(x, y, t) = 0, \tag{1}$$

where $I_x$, $I_y$, and $I_t$ are the gradients of the brightness function toward their respective variables. The calculation of these gradients is detailed by Wang et al. [10]. Finally, $u$ and $v$ represent the horizontal and vertical motion. To solve Equation (1) for $u$ and $v$, the M1D-OF method of Wang et al. [10] constrains the solution through the assumption that breathing motion is equal across the chest ROI and that the motion is only in the vertical direction. In terms of Equation (1), this means that $u(x, y) = 0$ for all $(x, y)$ and that $v(x, y)$ is constant over all $(x, y)$. Given these assumptions, the least squares (LS) solution $\hat{v}$ is defined as follows:

$$\hat{v} = -\frac{\mathbf{I}_y^{\top}\mathbf{I}_t}{\mathbf{I}_y^{\top}\mathbf{I}_y}. \tag{2}$$

Here, $\mathbf{I}_y$ and $\mathbf{I}_t$ are column vectors containing, respectively, the spatial and temporal gradients of each pixel in the chest ROI. This LS solution minimizes the residual error $\mathbf{r}$:

$$\mathbf{r} = \mathbf{I}_y\hat{v} + \mathbf{I}_t. \tag{3}$$

The constrained model of optical flow in the chest ROI only fits well to motion that does indeed adhere to the restrictions. We can argue that the constrained solution $\hat{v}$ in Equation (2) will generally not be a good solution to Equation (1) if $u \neq 0$ and/or $v$ is not constant across all $(x, y)$. In terms of the true motion, the gradient constraint can be written as follows:

$$\mathbf{I}_t = -\left(\mathbf{I}_x \odot \mathbf{u} + \mathbf{I}_y \odot \mathbf{v}\right), \tag{4}$$

where $\mathbf{u}$ and $\mathbf{v}$ are the vectors of the true velocities $u$ and $v$ of all pixels in the chest ROI, and $\odot$ denotes the element-wise product. If we substitute Equation (4) into Equation (3), we obtain

$$\mathbf{r} = \mathbf{I}_y\hat{v} - \mathbf{I}_x \odot \mathbf{u} - \mathbf{I}_y \odot \mathbf{v} = -\,\mathbf{I}_x \odot \mathbf{u} + \mathbf{I}_y \odot \left(\hat{v}\mathbf{1} - \mathbf{v}\right). \tag{5}$$

As such, the residual error contains the unmodeled motion in the horizontal direction and the difference between the true motion in the vertical direction and the solution $\hat{v}$. Thus, any pixel in the ROI that contains motion that does not adhere to the constrained model will contribute to an increase in the residual error. An increase in the residual error can therefore be a sign of non-breathing motion.
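In code, the constrained estimate and its residual are compact (a minimal sketch; the central-difference gradients are our assumption, whereas Wang et al. [10] detail the exact gradient computation):

```python
import numpy as np

def m1d_of_step(roi_prev, roi_curr):
    """Constrained optical flow over a grayscale chest ROI.

    Implements Equations (2) and (3): with u = 0 and v constant over the
    ROI, the least squares vertical velocity and the residual follow
    directly from the stacked per-pixel gradient constraints.
    """
    roi_prev = roi_prev.astype(float)
    roi_curr = roi_curr.astype(float)
    I_t = (roi_curr - roi_prev).ravel()                             # temporal gradient
    I_y = np.gradient(0.5 * (roi_prev + roi_curr), axis=0).ravel()  # vertical gradient
    v_hat = -(I_y @ I_t) / (I_y @ I_y)                              # Equation (2)
    r = I_y * v_hat + I_t                                           # Equation (3)
    return v_hat, r
```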
An additional source of residual error is the measurement error in the spatial ($\mathbf{I}_y$) and temporal ($\mathbf{I}_t$) gradients, as explained by Kearney et al. [26], which is caused by sensor and quantization noise. Furthermore, systematic error arises from non-linearity in the image brightness $I$, especially at sudden changes such as blanket edges. Although small motions, such as breathing, are less affected, large non-breathing motions can be amplified by non-linearity in the brightness function [26].
To summarize, non-breathing motion errors consist of unmodeled dynamics; thus, the error changes with the direction and magnitude of the non-breathing motion and can be further amplified by non-linearity in the brightness function. In contrast, measurement error is unrelated to motion and changes only slowly over time with the lighting conditions. To detect the presence of non-breathing motion using the residual error, regardless of the contribution of measurement error, we used a sliding window with a 6 s duration and a 1-frame stride to measure the standard deviation of the residual error. If a segment’s standard deviation surpassed a threshold, it was marked as containing non-breathing motion. In addition, we also discarded segments in which no motion was detected at all: any segment where the standard deviation of the measured breathing motion was below a threshold of 0.02 was discarded.
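A sketch of this gating is shown below, assuming a per-frame scalar summary of the residual (e.g., its RMS over the ROI pixels; this summary is our assumption) and a placeholder residual threshold to be chosen from the agreement/coverage tradeoff:

```python
import numpy as np

def sliding_std(x, win):
    """Standard deviation of x over a sliding window with a 1-frame stride."""
    return np.array([np.std(x[i:i + win]) for i in range(len(x) - win + 1)])

def discard_segment(residual, breathing, fs, resid_thresh, still_thresh=0.02):
    """Discard a 60 s segment on non-breathing motion or no motion at all.

    `residual` is a per-frame scalar summary of r (an assumption);
    `breathing` is the segment's extracted breathing motion signal.
    """
    resid_std = sliding_std(residual, int(6 * fs))  # 6 s windows
    if np.any(resid_std > resid_thresh):            # non-breathing motion
        return True
    return np.std(breathing) < still_thresh        # no motion detected
```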
To show how well this approach can work, we plotted the 3 RPM agreement, MAE, and coverage of the respiration rate over a range of thresholds. Since both NIR- and RGB-based respiration rate extraction use grayscale images, the same methodology can be applied to both; however, since the hardware and settings differ between the cameras, they must be evaluated separately.
2.5.6. Combining RGB and NIR
The combination strategy for the respiration rate measurements based on RGB and NIR is shown in Figure 4, in which we denote, for each modality, the standard deviation of the breathing motion in a segment. First, thresholds of the breathing motion metric for the RGB- and NIR-based measurements were selected such that the overall 3 RPM agreement across all patients was 90%. Then, we chose whether to use the RGB- or NIR-based measurement based on the brightness level in the chest ROI: if it was sufficient, the RGB-based measurement was used; otherwise, the NIR measurement was chosen. We determined the brightness threshold by plotting the overall 3 RPM agreement, MAE, and coverage of both modalities and of the combined measurements against the threshold and chose the threshold that yielded the best overall performance.
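Analogous to the heart rate combination, the per-segment rule of Figure 4 can be sketched as follows (names are ours; the boolean inputs come from the breathing motion metric with the thresholds selected above):

```python
import numpy as np

def combined_respiration_rate(rr_rgb, rr_nir, brightness, ok_rgb, ok_nir,
                              brightness_thr):
    """Per-segment combination of RGB- and NIR-based respiration rates.

    ok_rgb/ok_nir are boolean arrays from the breathing motion metric
    (True if the segment passed its modality's threshold).
    """
    out = np.full(len(rr_rgb), np.nan)
    use_rgb = (brightness >= brightness_thr) & ok_rgb
    use_nir = (brightness < brightness_thr) & ok_nir
    out[use_rgb] = rr_rgb[use_rgb]
    out[use_nir] = rr_nir[use_nir]
    return out
```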
Finally, after choosing the breathing motion and brightness thresholds, we reported a number of evaluation metrics for each patient: the 3 RPM agreement, MAE, coverage, largest time gap, and valid time, with the largest time gap and valid time defined as in Section 2.4.7. Using these evaluation metrics, we investigated whether there were any significant errors in the extraction of the respiration rate. Here, we do not distinguish between RGB- and NIR-based measurements but only interpret the combined performance, since the method of extraction is shared between the two modalities.
4. Discussion
In this work, our objective was to provide new insights into the performance and coverage of camera-based continuous heart and respiration rate monitoring using state-of-the-art methods in ICU patients. Furthermore, we aimed to evaluate the usability of camera-based monitoring to replace spot-check-based monitoring in the general ward.
In terms of RGB-based heart rate extraction, the brightness level was the only factor limiting agreement and coverage in 18 of 35 RGB recordings. In these recordings, segments with a brightness level greater than 25 showed a 5 BPM agreement of 97.3% and a coverage of 80.1%. Only one of these recordings was from the emergency group, as emergency patients more often faced challenging monitoring conditions than the postoperative group. Key challenges were identified in the RGB-based heart rate measurements of the remaining 17 patients. First, irregular heart rates complicate Fourier spectrum analysis because the rhythm is not represented by a single peak. This occurred for three postoperative and seven emergency patients. Further investigation into the type of rhythm and its effect on the heart rate measurement is left for future work. Second, RGB-based heart rate measurements often became inaccurate due to limited face visibility, which resulted in poor face ROI detection, especially when staff or visitors obstructed the camera view. The camera view was defined by our measurement setup: we used one RGB and one NIR camera at the bed’s foot end with a fixed lens magnification, chosen to always keep the patient within the camera’s field of view. Future research should investigate how the current camera monitoring setup can be improved to increase agreement and coverage where the system of this work failed. Possible improvements include adding multiple cameras at different room locations for better visibility, ensuring the patient’s chest and face are always visible, or splitting monitoring tasks between different cameras with increased lens magnification for more detailed ROIs. Of course, this could also be facilitated by a single camera with a higher resolution.
In terms of the NIR-based heart rate extraction, we found that the ICU environment was suboptimal for evaluating the efficacy of the NIR-based monitoring system. Many recordings did not contribute to heart rate measurement coverage because the room never became sufficiently dark for the RGB-based measurements to fail. Where NIR-based measurements did work, their performance was decent, but their coverage was limited by the presence of patient motion. To improve the usability of the NIR camera for heart rate monitoring, there are a few things to consider. First, in one patient, there was a problem with illumination variation caused by a blinking light that illuminated the patient. For RGB-based monitoring, this is not problematic, since the POS method is robust against illumination variations; this is not the case for NIR monitoring. To avoid this issue, visible light filters on the camera lens could be used. However, in that case, a more powerful IR light source may be needed. The current light source was chosen with the primary goal of being safe with regard to infrared light exposure of the ICU patient, but with careful consideration of safety standards, more could potentially be achieved. Furthermore, in some patients, the NIR-based extraction could not be used because the IR light source was not pointed correctly at the patient. Here, an IR light source with a wider angle of illumination could prevent this from occurring.
In the evaluation of the combined heart rate extraction using RGB and NIR measurements, a 5 BPM agreement of 81.5%, an MAE of 6.52 BPM, and a coverage of 81.9% were achieved. The agreement was 91% in the postoperative patients but only 62.6% in the emergency patients. Relative to the sizes of the two groups, more patients in the emergency group had an irregular heart rate, which negatively impacted the monitoring performance. Furthermore, more patients in the emergency group had problems related to the visibility of the face, which negatively affected the agreement. However, the two groups are too small to evaluate whether the difference in performance is systematic or occurred by chance, especially since irregular heart rhythms and problems with face detection also occurred in some postoperative patients.
We have noted that irregular heart rhythms can be problematic for accurate heart rate monitoring. This can be partially attributed to the fact that a Fourier-spectrum-based heart rate extraction cannot always accurately represent the heart rate of irregular rhythms. A time-domain analysis of the heart rhythm would potentially be better suited when the rhythm is irregular. Implementing a time-domain analysis might also require changes to other steps in the processing chain and is therefore left for future work. In previous work [27], we investigated the effect of atrial fibrillation and flutter on heart rate monitoring in a different dataset; a similar analysis of this dataset will be performed in future work.
One more important factor that we have not yet considered is skin pigmentation. Prior work [5] showed that the amplitude of the remote PPG signal is related to skin tone, especially in RGB-based measurements. In our study, unfortunately, we could not evaluate the influence of skin pigmentation on the performance of heart rate extraction, as all patients were classified as having skin type I/II on the Fitzpatrick scale. Further evaluation of these effects is left for future work.
We evaluated continuous camera-based respiration monitoring using an optical flow method that finds the breathing motion by restricting the optical flow in the chest ROI. We furthermore developed a metric that uses these restrictions and the residual error of the restricted optical flow model to determine whether a segment contains any non-breathing motion that could affect the accuracy of the respiration rate measurement. Using the breathing motion metric for both NIR- and RGB-based respiration rate extraction, we evaluated the overall performance of the combined respiration rate. Overall, a 3 RPM agreement of 91.1%, an MAE of 1.12 RPM, and a coverage of 59.1% were achieved. Postoperative patients achieved an agreement of 92.5% and a coverage of 64.8%, while emergency patients achieved an agreement of 86.7% and a coverage of 47%. The lower coverage for the emergency patient group can be partly attributed to the failure of the face ROI detection, which also affected the quality of the chest ROI. Furthermore, postoperative patients tended to be sedated at the start of the recording, while emergency patients generally were not, making it more likely that segments of emergency patients were excluded based on the presence of non-breathing motion.
Our camera-based respiration rate extraction methodology still has some limitations. First, the respiration rate calculation based on Fourier spectrum peaks was not always successful; in particular, the morphology of the breathing motion and irregular breathing could cause a loss of agreement with the reference. A respiration rate measurement based on breath-to-breath intervals would therefore likely be more successful and should be evaluated in future work. Furthermore, the breathing motion metric tends to be conservative: if any non-breathing motion is detected in a segment, the segment is immediately discarded. In future work, we aim to extend the metric to an indicator that can detect non-breathing motion on a per-breath basis, such that only real breaths are used to determine the respiration rate, while signal affected by non-breathing motion is ignored. Finally, in the ICU patients in this work, the assumption that the breathing motion is primarily in the vertical direction held; in other environments, however, this might not be the case, and the methodology should be extended such that the system can automatically detect the direction of respiration and adapt its assumptions accordingly.