1. Introduction
Patients in hospitals have a 17% chance of experiencing at least one complication postoperatively [1], which increases to up to 27% for patients undergoing major surgery. These statistics underscore the importance of early detection of postoperative complications in hospitals. Outside of high-acuity departments such as the ICU, detection of complications is based on intermittent spot checks of vital signs. The main limitation of spot checks is the time between measurements, which is typically a few hours. Deviations in vital signs often manifest several hours before an adverse event occurs [2] but may go undetected for too long when only spot checks are used. Continuous monitoring of vital signs has previously been shown to contribute to earlier detection of adverse events in surgical departments [3]. However, current continuous monitoring techniques, as used in the ICU, are not suitable for deployment in low-acuity departments. Wired sensors are generally not acceptable there, as they can impede patient mobilization and be uncomfortable for patients [4]. Instead, wearable or contactless sensors would be more appropriate for continuous monitoring in low-acuity departments.
Camera-based vital signs monitoring is a group of relatively new technologies that could facilitate continuous monitoring in low-acuity departments. The main advantage of camera-based monitoring is that it is contactless and thus unobtrusive compared to contact sensors. Camera-based monitoring research focuses mainly on three parts of the electromagnetic spectrum: visible light (RGB), near-infrared (NIR) light, and far-infrared light (thermal radiation). Within these three groups, many techniques have been developed to monitor various vital parameters, such as heart rate [5,6,7,8], heart rate variability [9], respiration rate [10,11,12,13], SpO2 [14,15], and temperature [16]. Furthermore, research has shown that monitoring other facial features, such as the redness of the skin, can provide markers for detecting patient deterioration [17,18].
Camera-based heart and respiration rate monitoring is by far the most mature of these technologies. In this work, we focus on remote photoplethysmography (remote PPG) for heart rate monitoring and breathing motion extraction for respiration rate monitoring. First, remote PPG is a technique that measures microscopic skin color variations caused by the cardiovascular pulse wave. The remote PPG signal can be extracted using RGB and NIR cameras, and many methods have been proposed for its (robust) extraction [5,6]. Second, breathing motion extraction is based on detecting the respiratory movements of the torso. Various methods have been proposed, for example, based on optical flow [10] or cross-correlation [11].
While camera-based heart and respiration rate monitoring is a well-explored field of research, most validations of these techniques have been performed on healthy volunteers and/or short video recordings. Continuous camera-based monitoring poses potentially substantial challenges that cannot easily be emulated by short recordings of healthy subjects, including naturally occurring patient activity, changes in illumination, occlusion of the patient, and clinical interventions. Ideally, these challenges would be investigated in patients in low-acuity departments, where we would like to apply this technology; however, the lack of continuously measured reference values for heart and respiration rates there limits what we can learn about the accuracy of camera-based monitoring. Although not ideal, the ICU provides a suitable tradeoff between the availability of reference values and the presence of challenging monitoring situations.
Previous research by Jorge et al. [19] on camera-based heart and respiration rate monitoring in the ICU provided insights into the coverage and mean absolute error (MAE) using monochrome recordings of 15 postoperative ICU patients with a total duration of 233.5 h. First, they manually removed privacy moments and occlusions of the patient by visitors or staff. From the remaining video, they extracted heart rate with a coverage of 53.2% and respiration rate with a coverage of 63.1%.
In our work, we intend to add to their insights into camera-based monitoring performance and coverage by including other types of ICU patients. Specifically, we evaluate state-of-the-art camera-based heart and respiration rate extraction methods on a group of postoperative cardiac surgery patients, a surgery type not included in the work of Jorge et al. [19], and a second group of patients with an emergency admission to the ICU. These groups of patients tend to have a wider range of vital parameters and a higher rate of adverse events than the types of postoperative patients in the work of Jorge et al. [19]. This can provide useful insight into whether the system remains accurate during abnormalities in the vital parameters, which can be an indicator of future deterioration.
Furthermore, Jorge et al. [19] used a monochrome camera to extract heart rate. However, previous research [5] shows that heart rate extraction based on a single-wavelength camera is inferior to techniques that combine multiple color channels. In this work, we will show that for many patients, the only limitation on performance and coverage is the room lighting: for these patients, a 5 beats/min (BPM) agreement of 97.3% (at an average heart rate of 80 BPM) and a coverage of 80.1% were reached. When room lighting is insufficient, a monochrome camera that is sensitive in the NIR spectrum and an NIR light source are still needed to provide coverage. This presents the additional challenge of combining measurements from both modalities depending on lighting conditions.
Finally, Jorge et al. [19] extracted the respiration rate from changes in pixel intensity caused by breathing motion. Instead, an optical flow method is likely more effective for breathing motion extraction, since pixel intensity changes do not always correlate with breathing motion. Wang et al. [10] demonstrated that such a method can be highly sensitive to breathing motion, which is required to measure the breathing motion of patients with varying breathing patterns, amplitudes of chest movement, and postures relative to the camera. However, this method cannot yet distinguish breathing motion from other motion. The presence of non-breathing motion should be detected, as any respiration rate extracted while non-breathing motion is present could be inaccurate. In this work, we therefore develop a metric to determine whether extracted breathing motion is distorted by non-breathing motion.
In this work, our aim is to provide new insights into the performance and coverage of camera-based continuous heart and respiration rate monitoring using state-of-the-art methods. While it is unlikely that perfect coverage can be achieved with the current methods, we aim to evaluate the usability of camera-based monitoring as a replacement for spot-check-based monitoring in the general ward. In this setting, it is not crucial to measure absolutely continuously, but up-to-date measurements need to be available when required by hospital staff. Challenging situations in which camera-based monitoring becomes inaccurate therefore need to be understood, as their prolonged presence could leave staff without up-to-date and accurate heart and respiration rate measurements. We therefore investigate where challenging situations occur and what their effect is on the extraction of the heart and respiration rate.
Figure 1 shows our complete proposed methodology for camera-based heart and respiration rate monitoring. The remainder of this paper is structured as follows: In Section 2, we outline the measurement setup, video data collection, and reference vital parameters; present the heart and respiration rate extraction methods for both RGB-based and NIR-based extraction; detail how we analyze the performance of the heart and respiration rate extraction; and develop the breathing motion metric that distinguishes breathing motion from non-breathing motion. In Section 3, we lay out the results for each vital parameter and imaging modality. Finally, in Section 4, we discuss the results and draw conclusions on the current performance of camera-based heart and respiration rate monitoring.
2. Materials and Methods
This work is part of the FORSEE study, a prospective single-center validation study of video-based vital sign monitoring in ICU patients in a tertiary care hospital (Catharina Ziekenhuis, Eindhoven, The Netherlands). The study was reviewed by the medical ethics committee MEC-U (Nieuwegein, The Netherlands, file no. W20.180), which determined that the Medical Research Involving Human Subjects Act (WMO) did not apply to this study. The study protocol was approved by the internal review board of the Catharina hospital in Eindhoven. The study was registered on clinicaltrials.gov under registry number NCT06099327 and took place from August 2022 until August 2023.
2.1. Measurement Setup
Video recording utilized two camera types: RGB and NIR. The RGB camera (3860-C-HQ Rev. 2, IDS GmbH, Obersulm, Germany) was configured for 32 fps at a resolution of 484 × 274 pixels with 2× binning and 12-bit depth. To ensure adequate brightness of the face, the RGB camera’s automatic shutter function was run once every minute, and the ROI for the brightness target was set to the face ROI detected with a face detector [22]. All other camera functions were disabled. The NIR camera (UI-3370CP-NIR-GL, IDS GmbH, Obersulm, Germany) operated at 16 fps at a resolution of 1920 × 1080 pixels with 8-bit depth. The lower frame rate was chosen to allow a longer shutter time, which was set to 66 ms to ensure sufficient brightness of the NIR image during the night. Again, all other functions were disabled. Both cameras were equipped with a fixed 8 mm focal length lens (M118FM08, TAMRON, Saitama City, Japan) with a fully open aperture.
Mounting the cameras above the patient’s bed would be ideal, as they would not obstruct the staff, and the higher angle of incidence would reduce the chance of staff occluding the view of the patient. However, altering the ICU room to mount the cameras on the ceiling was not feasible for this study. Instead, a mobile camera setup on a medical cart was placed at the foot end of the bed. This allowed a single setup to be used for data collection in multiple ICU rooms, and as the setup was mobile, it could be moved out of the way if needed by the staff. The cameras and a baby monitor (DVM-71, Alecto, Kerkrade, The Netherlands), of which only the NIR light source was used, were mounted close together on the cart at a height of 2 m and were aimed such that patient visibility could be ensured regardless of bed height and headrest adjustments. Each camera collected approximately 2 GB/min of video data, for a total of approximately 5.8 TB of video per 24 h per patient. Therefore, a laptop with 8 TB of external storage was used to collect and store the video data of one patient. After a recording was finished, the video data were transferred to another storage location and processed offline. Some patients were also seated in a chair next to the bed for part of the recording, in which case the setup was moved so that they remained within the field of view. If a moment of privacy was needed during a recording, the cameras were covered. The complete measurement setup can be seen in Figure 2.
2.2. Video Dataset
In total, this study included 36 ICU patients from the Catharina hospital (Eindhoven, The Netherlands). For 25 patients, informed consent was obtained prior to elective surgery with planned postoperative admission to the ICU. The remaining 11 patients had unscheduled (emergency) admissions to the ICU, and informed consent was obtained from the patient if they were conscious and otherwise from a family member. Each patient participated in the trial for up to 24 h or until discharged from the ICU, whichever occurred first. For one postoperative patient, a failure in the recording laptop caused the RGB camera to not store any video data, and therefore, this patient was not included in any further analysis. The summarized characteristics of the patients can be found in Table 1. Furthermore, the operation of the NIR camera depended on the operation of the NIR light source. In some cases, the NIR light source or the whole cart was moved or unplugged from the outlet, rendering the video recording from the NIR camera inadequate for further analysis. This occurred for two postoperative patients in total.
Each video was annotated for a number of conditions that would impede the measurement completely, i.e., when the patient was not in the field of view of the camera, was (partially) occluded by staff or equipment, or when the camera was covered by staff for privacy reasons. For the purpose of annotation, a second set of videos was created from the raw video recordings, downsampled to a frame rate of 1 frame/min. This resulted in a manageable number of frames to be annotated compared to the original 699 h of recording per camera. Furthermore, the frame rate of 1 frame/min is considered sufficiently high that most patient occlusions by staff and equipment can be detected, as they typically take 1 min or more. These annotations were then interpolated to the frame rates of the recordings by copying the nearest annotation to each video frame in the original recording.
2.3. Reference Vital Signs
In addition to the video recordings, contact-based vital signs measurements were also collected. For the purpose of this study, a single-lead ECG was collected from the patient monitor (IntelliVue MP70/MX750, Philips Medical, Best, The Netherlands) as the gold standard for heart rate. Furthermore, airway flow, airway pressure, and ECG thoracic impedance were collected as the gold standard for respiration rate. For the first 24 patients, contact sensor data were collected using a second laptop connected to the patient monitor running the iXTrend software package version 2.1 (ixitos GmbH, Aachen, Germany). For the remaining patients, Data Warehouse Connect version 4.0 (Philips Medizin Systeme Böblingen GmbH, Böblingen, Germany) was used to collect reference data from the patient monitor. The collection software was changed during the study due to the replacement of patient monitors in the ICU, as the new patient monitors were not compatible with the iXTrend software.
2.4. Heart Rate Extraction
In this work, the camera-based heart rate was extracted from a face ROI through the extraction of the remote PPG signal using both the RGB and NIR cameras. We briefly detail the processing pipeline for heart rate extraction for both cameras using (modified) existing state-of-the-art methods. The different processing steps can be seen in Figure 1, where we refer to the methodology used below each processing step. Furthermore, we have used an asterisk to mark methods that were modified and a plus sign to denote processing steps that are new contributions.
2.4.1. Face ROI Detection
For both NIR and RGB video, the face ROI was automatically detected using a face detector [20] every 30 s. The face detector produced ROIs and a corresponding score for all potential faces in the image. The patient’s face ROI was selected from the two highest-scoring ROIs depending on which was closer to the center of the frame. For all other frames, the face ROI was tracked using a Kernelized Correlation Filter (KCF) tracker [21]. If the face detector or tracker produced a face ROI whose center jumped more than 50 pixels in either direction between two frames, the face detector was immediately run again to ensure the patient’s face was still correctly detected.
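A minimal sketch of this detect-and-track loop is shown below, using OpenCV’s KCF tracker (available in the contrib build). The detector call `detect_faces` is a hypothetical stand-in for the face detector of [20]; the 30 s re-detection interval and the 50-pixel jump check follow the description above.

```python
import cv2
import numpy as np

def track_face(frames, fps, detect_faces, redetect_every_s=30, max_jump_px=50):
    """Detect the face ROI every 30 s and track it with KCF in between.

    detect_faces(frame) is a hypothetical stand-in for the detector in
    [20]; it should return candidate (x, y, w, h) boxes sorted by score.
    """
    rois, roi, tracker = [], None, None
    for i, frame in enumerate(frames):
        redetect = (i % int(redetect_every_s * fps) == 0) or roi is None
        if not redetect:
            ok, new_roi = tracker.update(frame)
            # Re-run the detector on tracker failure or implausible jumps.
            jump = ok and (abs(new_roi[0] - roi[0]) > max_jump_px or
                           abs(new_roi[1] - roi[1]) > max_jump_px)
            redetect = (not ok) or jump
            if not redetect:
                roi = new_roi
        if redetect:
            candidates = detect_faces(frame)[:2]  # two highest-scoring ROIs
            if candidates:
                # Keep the candidate closest to the center of the frame.
                center = np.array([frame.shape[1] / 2, frame.shape[0] / 2])
                roi = min(candidates, key=lambda b: np.linalg.norm(
                    np.array([b[0] + b[2] / 2, b[1] + b[3] / 2]) - center))
                tracker = cv2.TrackerKCF_create()
                tracker.init(frame, tuple(int(v) for v in roi))
        rois.append(roi)
    return rois
```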
2.4.2. Remote PPG Signal Extraction
The remote PPG signal was then extracted from the video frames using the detected face ROIs. First, for the NIR camera, the remote PPG signal was simply calculated as the spatial mean of the pixels inside the face ROI over time. Here, we must consider that there is spatial variation in the phase of the remote PPG signal across the skin [23], which could affect the spatial mean. We used a Fourier-spectrum-based heart rate extraction (see Section 2.4.3), which limits the effects of this phase variation. Additionally, Kamshilin et al. [23] note that local skin pixels can have counter-phase PPG signals, which would be problematic when averaging. In our work, the area of skin covered by a single pixel is much larger, which makes it unlikely that we can observe or mitigate these effects. For the RGB camera, the model-based Plane-Orthogonal-to-Skin (POS) method [5] was used to extract the remote PPG signal from the R, G, and B color channels in the face ROI.
Normally, the POS method uses the spatial means of the R, G, and B channels inside the face ROI to extract the remote PPG signal. However, in the hospital environment, where lighting is less controlled, overexposure of pixels can occur, which distorts the pulsatile changes in color due to blood flow. To limit the influence of overexposed pixels on the channel means, we propose assigning a weight to each pixel based on n, the number of times that pixel was overexposed in the last 0.5 s (16 frames), such that frequently overexposed pixels receive a low weight. The weights were updated for each new frame and then used to calculate the weighted spatial means of the R, G, and B channels.
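The exact weight formula did not survive in this text; the sketch below therefore uses w = 1 − n/16 as an illustrative assumption, with 12-bit pixels assumed near-saturated at 4080.

```python
import numpy as np

def weighted_channel_means(frames, saturation=4080, window=16):
    """Overexposure-weighted spatial means of R, G, and B in the face ROI.

    frames: (T, H, W, 3) array of face-ROI pixels (12-bit values assumed,
    so `saturation` marks near-clipped pixels). The exact weight formula
    is not given in the text; w = 1 - n/window is an illustrative
    assumption, where n counts overexposure events in the last `window`
    frames.
    """
    T = frames.shape[0]
    over = (frames >= saturation).any(axis=-1)  # (T, H, W) clip mask
    means = np.zeros((T, 3))
    for t in range(T):
        n = over[max(0, t - window + 1):t + 1].sum(axis=0)  # events per pixel
        w = 1.0 - n / window                                # assumed weight
        if w.sum() == 0:                                    # all pixels clipped
            w = np.ones_like(w)
        means[t] = (frames[t] * w[..., None]).sum(axis=(0, 1)) / w.sum()
    return means
```

The resulting per-frame channel means can then be fed into the POS method in place of the unweighted spatial means.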
2.4.3. Heart Rate Estimation from Remote PPG
Both the RGB- and NIR-based remote PPG signals were segmented into non-overlapping segments of 60 s using a Hanning window. The windowed segments were then zero-padded such that the Fourier spectrum of the segments had a resolution of 0.1 BPM. For each segment, the heart rate was chosen as the peak of the Fourier spectrum between 40 and 220 BPM, which we consider the range of valid human heart rates. Then, based on the video annotations, any segments containing frames that were marked as privacy-sensitive, out of view, etc., were discarded, and the remaining segments were marked as valid time. The valid recording time per patient is reported in Table 3.
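As a minimal sketch of this estimator (NumPy, with the 60 s segment already extracted; the 0.1 BPM resolution fixes the FFT length):

```python
import numpy as np

def heart_rate_from_ppg(segment, fs, resolution_bpm=0.1, lo=40, hi=220):
    """Estimate HR as the dominant Fourier peak of a 60 s PPG segment."""
    windowed = segment * np.hanning(len(segment))
    n_fft = int(round(fs * 60.0 / resolution_bpm))  # zero-pad to 0.1 BPM bins
    spectrum = np.abs(np.fft.rfft(windowed, n=n_fft))
    freqs_bpm = np.fft.rfftfreq(n_fft, d=1.0 / fs) * 60.0
    band = (freqs_bpm >= lo) & (freqs_bpm <= hi)    # valid human heart rates
    return freqs_bpm[band][np.argmax(spectrum[band])]
```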
2.4.4. Reference Heart Rate
The reference heart rate was extracted from the single-lead ECG measurements. The R peaks of each patient’s ECG were detected using RDECO [24] and processed into inter-beat intervals (IBIs). These IBIs were also divided into segments of 60 s, and the heart rate of each segment was calculated from the mean of the IBIs.
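For clarity, converting the mean IBI of a segment into a heart rate in BPM is a one-liner (a trivial sketch; the segmentation itself is omitted):

```python
import numpy as np

def segment_heart_rate(ibis_s):
    """Heart rate (BPM) of a 60 s segment from its inter-beat intervals (s)."""
    return 60.0 / np.mean(ibis_s)
```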
2.4.5. Error Analysis of RGB-Based Measurements
RGB-based heart rate extraction relies on visible light to enable successful heart rate measurements. As the measurement setup in this work does not include a dedicated visible light source, variations in room lighting, weather, and bed placement in the ICU room can strongly affect the lighting conditions. While we described above how we attempted to prevent and counteract overexposure of face pixels, there is also a minimum brightness level below which heart rate can no longer be reliably extracted from RGB video frames. First, we determined the brightness level as the mean across all pixels in the face ROI and the R, G, and B channels. Then, we assessed the 5 BPM agreement, coverage, and mean absolute error (MAE) as we increased a minimum brightness threshold to determine how the RGB-based extraction was affected by the face ROI brightness.
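A sketch of such a threshold sweep is given below, assuming per-segment arrays of camera and reference heart rates plus the mean face-ROI brightness (the variable names are ours):

```python
import numpy as np

def sweep_brightness(hr_cam, hr_ref, brightness, thresholds, tol_bpm=5.0):
    """5 BPM agreement, coverage, and MAE as the brightness threshold rises.

    All inputs are per-60 s-segment arrays; `brightness` is the mean
    face-ROI brightness of each segment.
    """
    results = []
    for thr in thresholds:
        keep = brightness >= thr
        err = np.abs(hr_cam[keep] - hr_ref[keep])
        results.append({
            "threshold": thr,
            "agreement": np.mean(err <= tol_bpm) if keep.any() else np.nan,
            "mae": err.mean() if keep.any() else np.nan,
            "coverage": keep.mean(),  # fraction of valid segments retained
        })
    return results
```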
After removing segments that could have poor agreement due to the brightness level, we further investigated the RGB recordings of patients with significant errors and, based on inspection of the ground truth and video recordings, sketched a picture of the likely causes of poor-quality monitoring in those recordings. Furthermore, we highlight which step in the heart rate extraction appears to be the problem.
2.4.6. Error Analysis of NIR-Based Measurements
The methodology of NIR-based heart rate extraction differs from the RGB-based extraction. The NIR-based extraction uses a dedicated infrared light source, which means that the lighting conditions are largely controlled. However, unlike the RGB-based extraction, the NIR-based measurements are not robust to patient motion. Patient motion can add intensity variations to the skin pixels that are not caused by blood flow. If their frequency content is within the heart rate range, i.e., 0.67–3.67 Hz (40–220 BPM), the extraction of the heart rate during patient motion can become inaccurate. As the NIR camera is monochrome, the POS method cannot be applied to provide robustness against patient motion. Therefore, patient motion must be detected so that NIR-based measurements distorted by it can be discarded.
To evaluate the effect of motion on the remote PPG extraction, we used Lucas–Kanade optical flow [25] to quantify the motion in each frame of the NIR videos. We then computed the mean optical flow magnitude over all pixels within the face ROI. Finally, we classified a segment of the remote PPG signal as containing motion if any frame within it had a motion magnitude greater than some threshold. This threshold was determined based on a tradeoff between the agreement and coverage of the NIR-based heart rate extraction.
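A sketch of the per-frame motion score is shown below, approximating dense flow by tracking a regular grid of points inside the face ROI with OpenCV’s pyramidal Lucas–Kanade implementation (this exact form is our assumption, not prescribed by [25]):

```python
import cv2
import numpy as np

def motion_magnitude(prev_gray, curr_gray, roi, grid_step=8):
    """Mean Lucas-Kanade flow magnitude over a grid of points in the face ROI."""
    x, y, w, h = roi
    xs, ys = np.meshgrid(np.arange(x, x + w, grid_step),
                         np.arange(y, y + h, grid_step))
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
    pts = pts.reshape(-1, 1, 2)
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    ok = status.ravel() == 1
    flow = (new_pts - pts).reshape(-1, 2)[ok]
    return np.linalg.norm(flow, axis=1).mean()

def segment_has_motion(magnitudes, threshold):
    """A segment is flagged if any of its frames exceeds the motion threshold."""
    return bool(np.max(magnitudes) > threshold)
```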
2.4.7. Combining RGB and NIR
In the brightness range where the RGB camera has no coverage or poor performance, the NIR camera should replace it to ensure coverage. To enable this, we applied the combination strategy detailed in Figure 3. Since the RGB-based extraction is motion-robust and the NIR-based extraction is not, the RGB-based extraction was used above the to-be-determined face ROI brightness threshold. Below the brightness threshold, the NIR measurement was used as long as the patient’s motion was below the motion threshold.
After choosing a motion threshold for the NIR measurements and an RGB face ROI brightness threshold, we reported the performance of the combined RGB and NIR heart rate measurements for each patient individually. For each patient, we reported a number of evaluation metrics: the 5 BPM agreement, MAE, coverage, largest time gap, and valid time. The largest time gap is the longest sequence of discarded measurements in the camera-based heart rate measurements during the valid time of each patient. The valid time is the total time during which the cameras were recording and reference measurements were available, excluding annotated moments of privacy, when the patient was not present, or when the view of the patient was obstructed by staff or equipment.
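The per-segment decision rule of Figure 3 then reduces to a few lines (a sketch; the threshold names are ours):

```python
import numpy as np

def combined_heart_rate(hr_rgb, hr_nir, brightness, motion,
                        brightness_thr, motion_thr):
    """Per-segment combination of RGB- and NIR-based heart rates.

    RGB is used when the face ROI is bright enough; otherwise NIR is used,
    but only if the segment is free of patient motion. Segments failing
    both checks yield NaN (no measurement).
    """
    out = np.full(len(hr_rgb), np.nan)
    use_rgb = brightness >= brightness_thr
    use_nir = ~use_rgb & (motion <= motion_thr)
    out[use_rgb] = hr_rgb[use_rgb]
    out[use_nir] = hr_nir[use_nir]
    return out
```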
2.5. Respiration Rate Extraction
To enable camera-based respiration rate extraction under all lighting conditions, both the RGB and NIR cameras were used. The extraction of the respiration rate in this work is based on measuring the chest movements of the patients. To accurately measure these chest movements, the optical flow method proposed by Wang et al. [10], specifically the M1D-OF method described in their work, was used for both types of video. This method restricts the general optical flow calculation to increase its sensitivity to breathing motion based on two assumptions: First, all respiratory motion is assumed to be in the vertical direction; accordingly, the horizontal flow is set to zero in the optical flow estimation. Second, the breathing motion is assumed to be homogeneous across the entire chest ROI. The different processing steps for respiration rate extraction can be found in Figure 1.
2.5.1. Chest ROI Detection
The chest ROI used to determine breathing motion is derived from the location and size of the face ROI. It is defined as a region directly below the face ROI; its height is equal to the height of the face ROI, and its width is three times that of the face ROI.
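In code, this geometric construction is simply the following (a sketch; boxes are (x, y, w, h) with y increasing downward, and horizontal centering on the face ROI is our assumption, as the text does not specify the horizontal placement):

```python
def chest_roi(face_roi):
    """Chest ROI directly below the face ROI: same height, triple width."""
    x, y, w, h = face_roi
    return (x - w, y + h, 3 * w, h)  # assumed centered under the face
```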
2.5.2. Breathing Motion Extraction
For each video frame, the pixels in the chest ROI were extracted and converted to grayscale. The M1D-OF method extracts the chest motion between pairs of frames. The distance between these frames was chosen to be 12 frames for the RGB camera and 5 frames for the NIR camera. To obtain the respiration signal, the chest motion was postprocessed as described by Wang et al. [10], which includes a sliding-window cumulative sum that integrates the signal while avoiding baseline wander of the respiration signal.
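A sketch of this stage is shown below; `m1d_of_step` (defined in Section 2.5.5) returns the vertical velocity between two frames, and the sliding-window cumulative sum follows the idea described by Wang et al. [10], with the window length being our assumption:

```python
import numpy as np

def breathing_signal(gray_rois, frame_dist, win):
    """Velocity between frame pairs, integrated with a sliding cumulative sum.

    gray_rois: list of grayscale chest-ROI images; frame_dist is 12 for
    RGB and 5 for NIR; `win` (in samples) is an assumed integration window.
    """
    v = np.array([m1d_of_step(gray_rois[i], gray_rois[i + frame_dist])[0]
                  for i in range(len(gray_rois) - frame_dist)])
    # Summing velocity over a sliding window yields a displacement-like
    # signal while slow drift (baseline wander) falls out.
    return np.convolve(v, np.ones(win), mode="valid")
```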
2.5.3. Respiration Rate Estimation
Similarly to the calculation of the heart rate, we used a windowed Fourier transform to extract the respiration rate from the chest motion signal. Again, we divided the respiration signals into segments of 60 s and selected the respiration rate as a peak of the Fourier spectrum of each segment. However, the highest spectral peak is not always the respiration rate; due to signal morphology, it can instead be one of the higher-order harmonics. To find the correct respiration rate, we first selected the two largest peaks within the 6–48 breaths/min (RPM) range, requiring that the smaller peak be at least half the height of the larger. Second, we determined the first harmonic through the criteria in Table 2. If a valid first harmonic was found, it was chosen as the respiration rate; otherwise, the highest peak was selected. In addition, any segments where the patient was out of view or occluded from the camera were discarded; the remaining valid time can be found in Table 4.
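A sketch of the two-peak selection is given below. The actual first-harmonic criteria live in Table 2 and are not reproduced here, so the near-integer-multiple check below is a hypothetical stand-in for them:

```python
import numpy as np
from scipy.signal import find_peaks

def respiration_rate(spectrum, freqs_rpm):
    """Pick the respiration rate from a segment's Fourier spectrum (6-48 RPM)."""
    band = (freqs_rpm >= 6) & (freqs_rpm <= 48)
    idx, _ = find_peaks(spectrum * band)
    if len(idx) == 0:
        return np.nan
    top = sorted(idx, key=lambda i: spectrum[i], reverse=True)[:2]
    if len(top) == 2 and spectrum[top[1]] >= 0.5 * spectrum[top[0]]:
        lo, hi = sorted(freqs_rpm[top])
        # Placeholder for the Table 2 criteria: accept the lower peak as the
        # fundamental if the higher one sits near an integer multiple of it.
        ratio = hi / lo
        if round(ratio) >= 2 and abs(ratio - round(ratio)) < 0.1:
            return lo
    return freqs_rpm[top[0]]  # fall back to the highest peak
```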
2.5.4. Reference Respiration Rate
For the reference respiration rate, we used the same extraction approach as for the camera-based respiration rate for all three reference signals; however, they were not all available all of the time. To form a single reference respiration rate, they were prioritized in the following order: airway flow, airway pressure, and finally ECG-based thoracic impedance. If all three were unavailable, the segment was removed from further consideration.
2.5.5. Detection of Breathing Motion Errors
The M1D-OF method [10] cannot yet differentiate between breathing and other motion in the chest ROI, leading to inaccurate respiration measurements in segments containing non-breathing motion. Here, we develop a metric to detect when non-breathing motion occurs so that inaccurate measurements can be discarded.
The image brightness function $I(x, y, t)$ describes the brightness of the pixels in the chest ROI over time. The gradient constraint equation [26] describes how optical flow, i.e., motion, relates to the image brightness function. The gradient constraint equation at time $t$ is written as follows:

$$I_x(x, y, t)\,u(x, y) + I_y(x, y, t)\,v(x, y) + I_t(x, y, t) = 0, \tag{1}$$

where $I_x$, $I_y$, and $I_t$ are the gradients of the brightness function toward their respective variables. The calculation of these gradients is detailed by Wang et al. [10]. Finally, $u$ and $v$ represent the horizontal and vertical motion. To solve Equation (1) for $u$ and $v$, the M1D-OF method of Wang et al. [10] constrains the solution through the assumption that breathing motion is equal across the chest ROI and that the motion is only in the vertical direction. In terms of Equation (1), this means that $u(x, y) = 0$ for all $(x, y)$ and that $v(x, y)$ is constant over all $(x, y)$. Given these assumptions, the least squares (LS) solution $\hat{v}$ is defined as follows:

$$\hat{v} = -\frac{\mathbf{I}_y^{\top}\mathbf{I}_t}{\mathbf{I}_y^{\top}\mathbf{I}_y}. \tag{2}$$

Here, $\mathbf{I}_y$ and $\mathbf{I}_t$ are column vectors containing, respectively, the spatial and temporal gradients of each pixel in the chest ROI. This LS solution minimizes the residual error $\mathbf{r}$:

$$\mathbf{r} = \mathbf{I}_y\hat{v} + \mathbf{I}_t. \tag{3}$$

The constrained model of optical flow in the chest ROI only fits well to motion that does indeed adhere to the restrictions. We can argue that the constrained solution $\hat{v}$ in Equation (2) will generally not be a good solution to Equation (1) if $u \neq 0$ and/or $v$ is not constant across all $(x, y)$. In terms of the true motion, the gradient constraint can be written as follows:

$$\mathbf{I}_t = -\left(\mathbf{I}_x \odot \mathbf{u} + \mathbf{I}_y \odot \mathbf{v}\right), \tag{4}$$

where $\mathbf{u}$ and $\mathbf{v}$ are the vectors of the true velocities $u$ and $v$ of all pixels in the chest ROI, and $\odot$ denotes the element-wise product. If we substitute Equation (4) into Equation (3), we obtain

$$\mathbf{r} = \mathbf{I}_y\hat{v} - \mathbf{I}_x \odot \mathbf{u} - \mathbf{I}_y \odot \mathbf{v} = -\,\mathbf{I}_x \odot \mathbf{u} + \mathbf{I}_y \odot \left(\hat{v}\mathbf{1} - \mathbf{v}\right). \tag{5}$$

As such, the residual error contains the unmodeled motion in the horizontal direction and the difference between the true motion in the vertical direction and the solution $\hat{v}$. Thus, any pixel in the ROI that contains motion that does not adhere to the constrained model will contribute to an increase in the residual error. An increase in the residual error can therefore be a sign of non-breathing motion.
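In code, the constrained estimate and its residual are compact (a minimal sketch; the central-difference gradients are our assumption, whereas Wang et al. [10] detail the exact gradient computation):

```python
import numpy as np

def m1d_of_step(roi_prev, roi_curr):
    """Constrained optical flow over a grayscale chest ROI.

    Implements Equations (2) and (3): with u = 0 and v constant over the
    ROI, the least squares vertical velocity and the residual follow
    directly from the stacked per-pixel gradient constraints.
    """
    roi_prev = roi_prev.astype(float)
    roi_curr = roi_curr.astype(float)
    I_t = (roi_curr - roi_prev).ravel()                             # temporal gradient
    I_y = np.gradient(0.5 * (roi_prev + roi_curr), axis=0).ravel()  # vertical gradient
    v_hat = -(I_y @ I_t) / (I_y @ I_y)                              # Equation (2)
    r = I_y * v_hat + I_t                                           # Equation (3)
    return v_hat, r
```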
An additional source of residual error is the measurement error in the spatial ($\mathbf{I}_y$) and temporal ($\mathbf{I}_t$) gradients, as explained by Kearney et al. [26], which is caused by sensor and quantization noise. Furthermore, systematic error arises from non-linearity in the image brightness $I$, especially at sudden changes such as blanket edges. Although small motions, such as breathing, are less affected, large non-breathing motions can be amplified by non-linearity in the brightness function [26].
To summarize, non-breathing motion errors consist of unmodeled dynamics; thus, the error changes with the direction and magnitude of the non-breathing motion and can be further amplified by non-linearity in the brightness function. In contrast, measurement error is unrelated to motion and changes only slowly over time with the lighting conditions. To detect the presence of non-breathing motion using the residual error, regardless of the contribution of measurement error, we used a sliding window with a 6 s duration and a 1-frame stride to measure the standard deviation of the residual error. If a segment’s standard deviation surpassed a threshold, it was marked as containing non-breathing motion. In addition, we also discarded segments in which no motion was detected at all: any segment where the standard deviation of the measured breathing motion was below a threshold of 0.02 was discarded.
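A sketch of this gating is shown below, assuming a per-frame scalar summary of the residual (e.g., its RMS over the ROI pixels; this summary is our assumption) and a placeholder residual threshold to be chosen from the agreement/coverage tradeoff:

```python
import numpy as np

def sliding_std(x, win):
    """Standard deviation of x over a sliding window with a 1-frame stride."""
    return np.array([np.std(x[i:i + win]) for i in range(len(x) - win + 1)])

def discard_segment(residual, breathing, fs, resid_thresh, still_thresh=0.02):
    """Discard a 60 s segment on non-breathing motion or no motion at all.

    `residual` is a per-frame scalar summary of r (an assumption);
    `breathing` is the segment's extracted breathing motion signal.
    """
    resid_std = sliding_std(residual, int(6 * fs))  # 6 s windows
    if np.any(resid_std > resid_thresh):            # non-breathing motion
        return True
    return np.std(breathing) < still_thresh        # no motion detected
```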
To show how well this approach can work, we plotted the 3 RPM agreement, MAE, and coverage of the respiration rate over a range of thresholds. Since both NIR- and RGB-based respiration rate extraction use grayscale images, the same methodology can be applied to both; however, since the hardware and settings differ between the cameras, they must be evaluated separately.
2.5.6. Combining RGB and NIR
The combination strategy for the respiration rate measurements based on RGB and NIR is shown in Figure 4, in which we denote, for each modality, the standard deviation of the breathing motion in a segment. First, thresholds of the breathing motion metric for the RGB- and NIR-based measurements were selected such that the overall 3 RPM agreement across all patients was 90%. Then, we chose whether to use the RGB- or NIR-based measurement based on the brightness level in the chest ROI: if it was sufficient, the RGB-based measurement was used; otherwise, the NIR measurement was chosen. We determined the brightness threshold by plotting the overall 3 RPM agreement, MAE, and coverage of both modalities and of the combined measurements against the threshold and chose the threshold that yielded the best overall performance.
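Analogous to the heart rate combination, the per-segment rule of Figure 4 can be sketched as follows (names are ours; the boolean inputs come from the breathing motion metric with the thresholds selected above):

```python
import numpy as np

def combined_respiration_rate(rr_rgb, rr_nir, brightness, ok_rgb, ok_nir,
                              brightness_thr):
    """Per-segment combination of RGB- and NIR-based respiration rates.

    ok_rgb/ok_nir are boolean arrays from the breathing motion metric
    (True if the segment passed its modality's threshold).
    """
    out = np.full(len(rr_rgb), np.nan)
    use_rgb = (brightness >= brightness_thr) & ok_rgb
    use_nir = (brightness < brightness_thr) & ok_nir
    out[use_rgb] = rr_rgb[use_rgb]
    out[use_nir] = rr_nir[use_nir]
    return out
```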
Finally, after choosing the breathing motion and brightness thresholds, we reported a number of evaluation metrics for each patient: the 3 RPM agreement, MAE, coverage, largest time gap, and valid time, with the largest time gap and valid time defined as in Section 2.4.7. Using these evaluation metrics, we investigated whether there were any significant errors in the extraction of the respiration rate. Here, we do not distinguish between RGB- and NIR-based measurements but only interpret the combined performance, since the method of extraction is shared between the two modalities.
4. Discussion
In this work, our objective was to provide new insights into the performance and coverage of camera-based continuous heart and respiration rate monitoring using state-of-the-art methods in ICU patients. Furthermore, we aimed to evaluate the usability of camera-based monitoring to replace spot-check-based monitoring in the general ward.
In terms of RGB-based heart rate extraction, the brightness level was the only factor limiting agreement and coverage in 18 of 35 RGB recordings. In these recordings, segments with a brightness level greater than 25 showed a 5 BPM agreement of 97.3% and a coverage of 80.1%. Only one of these recordings was from the emergency group, as emergency patients more often faced challenging monitoring conditions than the postoperative group. Key challenges were identified in the RGB-based heart rate measurements of the remaining 17 patients. First, irregular heart rates complicate Fourier spectrum analysis because the rhythm is not represented by a single peak. This occurred for three postoperative and seven emergency patients. Further investigation into the type of rhythm and its effect on the heart rate measurement is left for future work. Second, RGB-based heart rate measurements often became inaccurate due to limited face visibility, which resulted in poor face ROI detection, especially when staff or visitors obstructed the camera view. The camera view was defined by our measurement setup: we used one RGB and one NIR camera at the bed’s foot end with a fixed lens magnification, chosen to always keep the patient within the camera’s field of view. Future research should investigate how the current camera monitoring setup can be improved to increase agreement and coverage where the system of this work failed. Possible improvements include adding multiple cameras at different room locations for better visibility, ensuring the patient’s chest and face are always visible, or splitting monitoring tasks between different cameras with increased lens magnification for more detailed ROIs. Of course, this could also be facilitated by a single camera with a higher resolution.
In terms of the NIR-based heart rate extraction, we found that the ICU environment was suboptimal for evaluating the efficacy of the NIR-based monitoring system. Many recordings did not contribute to heart rate measurement coverage because the room never became sufficiently dark for the RGB-based measurements to fail. Where NIR-based measurements did work, their performance was decent, but their coverage was limited by the presence of patient motion. To improve the usability of the NIR camera for heart rate monitoring, there are a few things to consider. First, in one patient, there was a problem with illumination variation caused by a blinking light that illuminated the patient. For RGB-based monitoring, this is not problematic, since the POS method is robust against illumination variations; this is not the case for NIR monitoring. To avoid this issue, visible light filters on the camera lens could be used. However, in that case, a more powerful IR light source may be needed. The current light source was chosen with the primary goal of being safe with regard to infrared light exposure of the ICU patient, but with careful consideration of safety standards, more could potentially be achieved. Furthermore, in some patients, the NIR-based extraction could not be used because the IR light source was not pointed correctly at the patient. Here, an IR light source with a wider angle of illumination could prevent this from occurring.
In the evaluation of the combined heart rate extraction using RGB and NIR measurements, a 5 BPM agreement of 81.5%, an MAE of 6.52 BPM, and a coverage of 81.9% were achieved. The agreement was 91% in the postoperative patients but only 62.6% in the emergency patients. Relative to the sizes of the two groups, more patients in the emergency group had an irregular heart rate, which negatively impacted the monitoring performance. Furthermore, more patients in the emergency group had problems related to the visibility of the face, which negatively affected the agreement. However, the two groups are too small to evaluate whether the difference in performance is systematic or occurred by chance, especially since irregular heart rhythms and problems with face detection also occurred in some postoperative patients.
We have noted that irregular heart rhythms can be problematic for accurate heart rate monitoring. This can be partially attributed to the fact that a Fourier-spectrum-based heart rate extraction cannot always accurately represent the heart rate of irregular rhythms. A time-domain analysis of the heart rhythm would potentially be better suited when the rhythm is irregular. Implementing a time-domain analysis might also require changes to other steps in the processing chain and is therefore left for future work. In previous work [27], we investigated the effect of atrial fibrillation and flutter on heart rate monitoring in a different dataset; a similar analysis of this dataset will be performed in future work.
One more important factor that we have not yet considered is skin pigmentation. Prior work [5] showed that the amplitude of the remote PPG signal is related to skin tone, especially in RGB-based measurements. In our study, unfortunately, we could not evaluate the influence of skin pigmentation on the performance of heart rate extraction, as all patients were classified as having skin type I/II on the Fitzpatrick scale. Further evaluation of these effects is left for future work.
We evaluated continuous camera-based respiration monitoring using an optical flow method that finds the breathing motion by restricting the optical flow in the chest ROI. We furthermore developed a metric that uses these restrictions and the residual error of the restricted optical flow model to determine whether a segment contains any non-breathing motion that could affect the accuracy of the respiration rate measurement. Using the breathing motion metric for both NIR- and RGB-based respiration rate extraction, we evaluated the overall performance of the combined respiration rate. Overall, a 3 RPM agreement of 91.1%, an MAE of 1.12 RPM, and a coverage of 59.1% were achieved. Postoperative patients achieved an agreement of 92.5% and a coverage of 64.8%, while emergency patients achieved an agreement of 86.7% and a coverage of 47%. The lower coverage for the emergency patient group can be partly attributed to the failure of the face ROI detection, which also affected the quality of the chest ROI. Furthermore, postoperative patients tended to be sedated at the start of the recording, while emergency patients generally were not, making it more likely that segments of emergency patients were excluded based on the presence of non-breathing motion.
Our camera-based respiration rate extraction methodology still has some limitations. First, the respiration rate calculation based on Fourier spectrum peaks was not always successful; in particular, the morphology of the breathing motion and irregular breathing could cause a loss of agreement with the reference. A respiration rate measurement based on breath-to-breath intervals would therefore likely be more successful and should be evaluated in future work. Furthermore, the breathing motion metric tends to be conservative: if any non-breathing motion is detected in a segment, the segment is immediately discarded. In future work, we aim to extend the metric to an indicator that can detect non-breathing motion on a per-breath basis, such that only real breaths are used to determine the respiration rate, while signal affected by non-breathing motion is ignored. Finally, in the ICU patients in this work, the assumption that the breathing motion is primarily in the vertical direction held; in other environments, however, this might not be the case, and the methodology should be extended such that the system can automatically detect the direction of respiration and adapt its assumptions accordingly.