*3.2. Interval Processing*

The design process went through three cycles, creating and testing three different algorithms: Eulerian video magnification followed by motion extraction, principal flow field, and micromotion and stationarity detection. The final algorithm, micromotion and stationarity detection, achieved the design goal of consistently providing an RR for the subject and proceeded to the secondary analysis, in which it was compared to the continuous vital signs recorded in the subject's EMR.

### 3.2.1. Eulerian Video Magnification Followed by Motion Extraction

The initial technique for extracting RR from footage was an Eulerian video magnification (EVM) algorithm to amplify movement, followed by a motion history image (MHI) algorithm to extract the motion.

EVM amplification has been described in other fields [17]. Our algorithm decomposes the video into pyramids of Laplacian images of different spatial frequencies to allow for greater accuracy and large amplification of minute motion. Laplacian pyramids are commonly used in motion magnification [18]. After the pyramids were constructed, a low-pass filter was applied and pixel intensity was increased at each pyramid level. The amplified reconstructed frame was then superimposed on the original frame.
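As a rough illustration of this pipeline (the study's exact parameters are not given), the sketch below builds a Laplacian pyramid, applies a temporal low-pass filter per level (here a simple exponential moving average standing in for the published EVM filter bank), amplifies the temporal deviation, and reconstructs the frame. The kernel size, amplification factor `alpha`, and smoothing constant `cutoff` are illustrative assumptions.

```python
import numpy as np

def gaussian_blur(frame, k=5):
    # Separable box-filter approximation of a Gaussian blur.
    kernel = np.ones(k) / k
    blurred = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, frame)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, blurred)

def laplacian_pyramid(frame, levels=3):
    # Each level holds the spatial detail lost by one round of blur + downsample.
    pyramid, current = [], frame.astype(float)
    for _ in range(levels):
        blurred = gaussian_blur(current)
        pyramid.append(current - blurred)   # band-pass detail at this scale
        current = blurred[::2, ::2]         # downsample for the next level
    pyramid.append(current)                 # residual low-frequency image
    return pyramid

def magnify(frames, alpha=20.0, cutoff=0.5):
    # Temporal low-pass (exponential moving average) per pyramid level,
    # then amplify each level's deviation from that average.
    out, ema = [], None
    for frame in frames:
        pyr = laplacian_pyramid(frame)
        if ema is None:
            ema = [p.copy() for p in pyr]
        amplified = []
        for i, p in enumerate(pyr):
            ema[i] = cutoff * p + (1 - cutoff) * ema[i]
            amplified.append(p + alpha * (p - ema[i]))  # boost temporal change
        # Reconstruct: upsample the residual and add detail levels back.
        rec = amplified[-1]
        for detail in reversed(amplified[:-1]):
            rec = np.repeat(np.repeat(rec, 2, axis=0), 2, axis=1)[:detail.shape[0], :detail.shape[1]]
            rec = rec + detail
        out.append(rec)
    return out
```

The amplified frames would then be superimposed on the originals as described above.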

The MHI algorithm then generated the motion between frames by referencing successive binary silhouette images of the baby. The motion gradient between frames was used to calculate the amplitude of the inspiration/expiration signal of the baby's breath and generate a continuous respiratory waveform, from which the respiratory rate could then be calculated. To our knowledge, quantifying motion in an MHI in this way has not previously been studied.
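A minimal sketch of the MHI idea, under assumed parameters (the paper does not specify thresholds or decay): pixels that recently changed hold a high "history" value that decays over time, and the per-frame sum of the history image yields a motion-amplitude signal from which a rate can be counted.

```python
import numpy as np

def motion_history_signal(frames, thresh=10.0, tau=15):
    # Motion history image: pixels with recent motion are set to tau and
    # decay by 1 per frame; the per-frame sum gives a motion-amplitude signal.
    mhi = np.zeros_like(frames[0], dtype=float)
    signal, prev = [], frames[0].astype(float)
    for frame in frames[1:]:
        cur = frame.astype(float)
        moving = np.abs(cur - prev) > thresh        # binary silhouette of change
        mhi = np.where(moving, tau, np.maximum(mhi - 1, 0))
        signal.append(mhi.sum())
        prev = cur
    return np.array(signal)

def rate_from_signal(signal, fps):
    # Count zero-crossings of the mean-removed signal; one breath cycle
    # produces two crossings.
    centered = signal - signal.mean()
    crossings = np.sum(np.sign(centered[:-1]) != np.sign(centered[1:]))
    duration_min = len(signal) / fps / 60.0
    return (crossings / 2) / duration_min
```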

The benefit of the EVM and MHI approach was its ability to magnify small motions with minimal processing and few artifacts. The critical deficit was a high rate of false positive signals, arising primarily from amplification of noise generated by changes in lighting and by infant movement.

### 3.2.2. Principal Flow Field

Principal flow field (PFF) methodology was adopted in hopes of decreasing the impact of noise and increasing processing speed. Flow fields are a common way to evaluate movement in engineering. To optimize processing speed, our PFFs were computed on segments instead of full frames. Optical flow fields were calculated by generating pixel movement gradients between frames. Knowing that inhalation and exhalation would move in opposite directions allowed us to form flow field matrices localized to pixels representing respiration rather than noise.

PFFs were computed using the flow field matrices on the initial frame and adapted for each sequential frame. The PFF was used to generate a continuous respiration signal, which again was used to generate RR.
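One way the flow-field step could be sketched (the paper's exact formulation is not given): a single Lucas-Kanade least-squares motion vector per region, projected onto the dominant motion axis to yield a signed respiration signal. The region coordinates and the use of SVD to find the principal axis are assumptions for illustration.

```python
import numpy as np

def block_flow(prev, cur):
    # Lucas-Kanade estimate of one (u, v) motion vector for a whole block:
    # solve the least-squares system [Ix Iy] @ [u, v] = -It.
    ix = np.gradient(prev.astype(float), axis=1).ravel()
    iy = np.gradient(prev.astype(float), axis=0).ravel()
    it = (cur.astype(float) - prev.astype(float)).ravel()
    A = np.stack([ix, iy], axis=1)
    uv, *_ = np.linalg.lstsq(A, -it, rcond=None)
    return uv  # (u, v) displacement in pixels/frame

def respiration_signal(frames, region):
    # Project each frame-to-frame flow vector of the chest region onto the
    # principal motion axis, giving a signed breathing signal.
    r0, r1, c0, c1 = region
    vectors = np.array([block_flow(a[r0:r1, c0:c1], b[r0:r1, c0:c1])
                        for a, b in zip(frames[:-1], frames[1:])])
    # Principal axis = first right singular vector of the centered flow vectors.
    _, _, vt = np.linalg.svd(vectors - vectors.mean(axis=0))
    return vectors @ vt[0]
```

The resulting signal oscillates with inhalation and exhalation and can be converted to an RR by peak or zero-crossing counting, as with the other methods.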

The benefit of PFF was its localization to respiratory movements, which decreased processing times while picking up movements as minute as those detected by the EVM algorithm. However, this algorithm was limited by recurrent false positive readings when no infant was present in the crib. The optical flow fields picked up the most plausible signal resembling respiratory movement even when that motion was actually noise from video capture, video compression, movement artifacts, or lighting changes. As there is always some level of noise, this algorithm would generate respiratory signals and rates even if a baby was not breathing or was not present in the frame.

### 3.2.3. Micromotion and Stationarity Detection (MSD)

The final algorithm utilized micromotion and stationarity detection (MSD), which has not previously been studied for this application. To overcome the challenge of finding respiratory motion at a frame-to-frame level without incidentally measuring noise, the MSD algorithm analyzed and modeled the noise instead of trying to eliminate it.

In order to model noise characteristics, the image was divided into small sub-regions where changes in pixel intensity were measured over time. The model of the noise consisted of standard deviation (SD) measures of the changes in pixel intensity.

Assuming that the SD of the change in pixel intensity over a series of frames containing no motion, only noise, remains relatively small and constant from frame to frame, a large change in the SD of pixel intensity indicates a micro movement. This is how imperceptible movements of chest rise and fall could be located and measured.

Breathing motions were then quantified by taking SD measures of the SD measures, generating heat maps that represented motion (Figure 2). To determine an RR, the number of peaks in the SD-measured values was counted and averaged over 100 frames.

**Figure 2.** Heat map derived from standard deviation (SD) measures of SD measures showing motion due to breathing. The red region represents high SD measures and the blue region represents low SD measures. The red region was concentrated near the baby's chest, indicating that the measurement showed motion associated with breathing.
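The sub-region noise model and the SD-of-SD heat map described above can be sketched as follows. The block size, sliding-window length, and the use of zero-crossing counting in place of peak counting are illustrative assumptions, not the study's implementation.

```python
import numpy as np

def region_sd_map(frames, block=8, window=10):
    # For each sub-region, SD over a sliding window of the frame-to-frame
    # change in mean pixel intensity: the noise model described above.
    stack = np.array(frames, dtype=float)
    t, h, w = stack.shape
    # Mean intensity of each (block x block) sub-region in each frame.
    means = stack.reshape(t, h // block, block, w // block, block).mean(axis=(2, 4))
    diffs = np.diff(means, axis=0)                   # frame-to-frame change
    sds = np.array([diffs[i:i + window].std(axis=0)  # windowed SD per region
                    for i in range(diffs.shape[0] - window)])
    return sds                                       # (time, rows, cols)

def msd_rate(frames, fps=10.0, block=8, window=10):
    sds = region_sd_map(frames, block, window)
    heat = sds.std(axis=0)                  # SD of the SD measures (heat map)
    r, c = np.unravel_index(np.argmax(heat), heat.shape)
    trace = sds[:, r, c]                    # signal at the "breathing" region
    centered = trace - trace.mean()
    crossings = np.sum(np.sign(centered[:-1]) != np.sign(centered[1:]))
    minutes = len(trace) / fps / 60.0
    return (crossings / 2) / minutes        # breaths per minute
```

Regions dominated by sensor noise keep a small, stable windowed SD, so the SD-of-SD heat map lights up only where periodic micromotion (chest rise and fall) modulates it.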

A significant benefit of the MSD is its insusceptibility to noise and its capability to detect whether an infant is in the frame and whether that infant has had an apneic event, defined as a pause in breathing of greater than 20 s. The MSD algorithm had difficulty measuring RR while the patient made gross or macro movements, such as movements of the arms, legs, or torso while crying or shifting during sleep. After such movements, the algorithm needs to recalibrate over 100 frames, or 10 s at 10 frames per second, to ensure it has located the subject and is measuring the correct signal. ECG impedance pneumography also cannot extract RR during large patient movements, but it recovers more quickly, after only a couple of seconds.

*3.3. Respiratory Rate Correlation between ECG and MSD Video-Based Monitoring*

The secondary analysis was completed by running the MSD analysis on two patients. Their 48 h of video recording was scanned for continuous time frames where the patient remained asleep, relatively still, and unobstructed by staff or parents providing care. The combined continuous uninterrupted video consisted of 21 min and 50 s and contained 246 time points.

The MSD algorithm takes 10 s to calibrate, as described above, and then produces an RR every 5 s. The EMR produced a time point every 1 s as long as there was no interruption in the signal. To ensure the two data sets represented the same points in time, the RRs at 10 sequential timestamps from the video algorithm were assessed against the same timestamps from the EMR; if there was a large discrepancy, that series was considered inaccurate and was not included in the final data set. After confirming that the timestamps matched, respiratory rates from the MSD algorithm were compared to the RRs from the EMR at the corresponding timestamps until another interruption in either signal prevented the two RRs from being generated at the same time. This assessment was then repeated to find the next usable segment of data.
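This matching procedure might be sketched as follows. The function name, the discrepancy threshold `tol`, and the use of mean absolute difference are hypothetical, since the paper does not state what counted as a "large discrepancy".

```python
import numpy as np

def aligned_segments(video_ts, video_rr, emr_ts, emr_rr, check=10, tol=15.0):
    # Pair each video timestamp with the EMR reading at the same second,
    # then validate each run of `check` sequential pairs: if the mean
    # absolute RR discrepancy exceeds `tol` (BPM), the series is judged
    # misaligned and dropped from the final data set.
    emr = dict(zip(emr_ts, emr_rr))
    pairs = [(v, emr[t]) for t, v in zip(video_ts, video_rr) if t in emr]
    kept = []
    for i in range(0, len(pairs) - check + 1, check):
        block = pairs[i:i + check]
        if np.mean([abs(a - b) for a, b in block]) <= tol:
            kept.extend(block)
    return kept  # list of (video_rr, emr_rr) pairs at matched timestamps
```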

The average RR over the approximately 22 min was 65 breaths per minute (BPM) in the EMR data and 67 BPM in the video monitor data. The standard deviations were 18.4 and 19.7, respectively. Both patients recorded were male and Caucasian. Their mean gestational age at birth was 29 weeks and 5 days, and their mean adjusted age at recording was 36 weeks and 2 days. Overlaid tracings of the EMR and video monitor over the recording time (in seconds) for both subjects are shown in Figure 3.

**Figure 3.** (**a**) Respiratory rate (RR) (y-axis) from the video monitoring system compared to that of the extracted electronic medical record (EMR) data over an 8.4 min recording made up of 90 time points (x-axis); (**b**) RR (y-axis) from the video monitoring system compared to that of the extracted EMR data over a 13.4 min recording made up of 155 time points (x-axis).

Comparison between the EMR and MSD respiratory rates via the Bland–Altman method showed that the video monitoring system had a bias of 1.3 fewer breaths per minute, and 94.3% of all time point comparisons fell between the upper and lower limits of agreement (Figure 4).

**Figure 4.** Bland–Altman plot. The central dark line represents a bias of −1.3 breaths per minute (BPM). The dashed lines represent the upper limit of agreement (10.9 BPM) and the lower limit of agreement (−13.5 BPM).

A linear regression between the EMR data and those of the camera-based non-contact monitor showed a correlation coefficient (multiple R) of 0.948, with a *p* value of 0.001, and an R-squared of 0.90. Assuming that the EMR data were representative of the true respiration rate, the error of the video-based monitoring system was calculated as 6.36 breaths per minute via a root mean square analysis (Figure 5).

**Figure 5.** Linear regression comparing video-based monitoring respiration rate (RR) (y-axis) vs. electronic medical record (EMR) RR (x-axis). The dashed lines represent upper and lower boundaries of root mean square error between the modes of measurement.
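The agreement statistics reported above can be computed from paired rate series with a few lines of standard computation. This sketch assumes the conventional 1.96-SD limits of agreement for the Bland–Altman analysis and treats the EMR rate as ground truth for the RMS error, as the text does.

```python
import numpy as np

def bland_altman(a, b):
    # Bias = mean difference; limits of agreement = bias +/- 1.96 SD of the
    # differences; also report the fraction of points inside the limits.
    diff = np.asarray(a, float) - np.asarray(b, float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    inside = np.mean((diff >= lower) & (diff <= upper))
    return bias, (lower, upper), inside

def agreement_stats(video_rr, emr_rr):
    # Pearson r (the "multiple R" of a simple regression) and RMS error,
    # treating the EMR rate as the true respiration rate.
    r = np.corrcoef(video_rr, emr_rr)[0, 1]
    rmse = np.sqrt(np.mean((np.asarray(video_rr, float) -
                            np.asarray(emr_rr, float)) ** 2))
    return r, rmse
```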
