Video-based monitoring is a potential non-contact system that could improve patient care. This iterative design study developed a novel algorithm that derives respiratory rate (RR) from video footage of stable NICU patients in open cribs with corrected gestational ages ranging from 33 to 40 weeks. The final algorithm used a proprietary technique of micromotion and stationarity detection (MSD) that models background noise in order to amplify and record respiratory motion. We found a significant correlation (r = 0.948, p = 0.001) between MSD and the current hospital standard, electrocardiogram impedance pneumography. Our video-based system showed a bias of −1.3 breaths per minute and a root mean square error of 6.36 breaths per minute compared to standard continuous monitoring. Further work is needed to evaluate the ability of video-based monitors to observe clinical changes in a larger population of patients over extended periods of time.
Impedance pneumography (IMP) measures changes in electrical signal secondary to the movements of the chest and diaphragm and has been widely adopted in multiple clinical settings. IMP uses mathematical algorithms to convert signals from other electronic monitors, such as pulse oximeters and electrocardiograms (ECGs), into a respiratory waveform and RR, while RIP is derived directly from the attached straps. IMP can be integrated into the standard continuous monitoring systems used by most hospitals. While any manipulation of data or signal can introduce error, studies have shown strong correlation between RR gathered from RIP and IMP.
Nevertheless, IMP has several limitations. Information has to be gathered from electronic leads attached directly to the skin, and the signal needs to be amplified to be measured, making it susceptible to artifacts or noise. Artifacts and signal noise can originate from inadequate attachment of the electronic leads as well as any movement of the patient not related to breathing. In the neonatal population, these limitations are particularly apparent and produce several unique adverse outcomes. The frequency of false alarms secondary to the frequent movement of newborns is associated with provider alarm fatigue, infant hearing loss, and a disruptive environment for development. Additionally, the humid environment of neonatal incubators and the infant’s thin, underdeveloped skin cause the adhesive on the leads to fail and require frequent changing. The recurrent application of adhesive to fragile premature skin causes breakdown and inflammation of the dermal barrier, introducing possible sources of infection.
Several innovations have been developed for non-contact monitoring in neonatology to minimize the risk of dermal injury and alarm fatigue. A variety of techniques have been studied, including radio wave signaling, ultrasound, imaging photoplethysmography (PPG), and video-based respiratory monitors. In controlled environments, they have been shown to correlate with ECG monitoring, with correlation coefficients from 0.79 to 0.92. Video-based respiratory monitors are particularly versatile due to the accessibility of cameras. These systems could be easily integrated into current monitoring systems and potentially used in clinics and at home for virtual medical appointments. However, because neonatal respiratory motion is subtle, these technologies have struggled to determine RR accurately: amplifying the movement also amplifies signal noise, making it difficult to obtain an accurate measurement.
Eighteen subjects were enrolled and recorded during 2016–2017; one was excluded from the final analysis due to conflicting documentation. At the time of recording, all infants had a chronologic age under 10 weeks and corrected gestational ages of 33–40 weeks. The mean chronologic age at the time of recording was 5 weeks, and the mean corrected gestational age was 35.5 weeks. The mean gestational age at birth was 30 weeks and 3 days. The population was 65% male and included a variety of racial backgrounds, with the largest group being White at 35%. Race was defined per the electronic medical record, with Asian including individuals of Indian and Southeast Asian descent. All subjects had comorbidities of prematurity; the prevalence of the most pertinent (apnea of prematurity, history of respiratory distress syndrome, chronic lung disease, and anemia of prematurity) is shown in Table 1.
Table 1. Demographics of enrolled subjects.
|Native Hawaiian or Other Pacific Islander||2||0.12|
|Apnea of Prematurity||13||0.76|
|History of Respiratory Distress Syndrome||11||0.65|
|Chronic Lung Disease||4||0.24|
|Anemia of Prematurity||15||0.88|
|Characteristic||Mean||SD 1|
|Age at Recording (weeks)||5.3||3.1|
|Corrected GA 2 at Recording (weeks)||35.5||1.9|
|Birth GA 2 (weeks)||30.5||2.5|
|Height at Birth (cm)||40.4||3.8|
|Weight at Birth (kg)||1.4||0.4|
1 Standard Deviation (SD); 2 Gestational Age (GA).
The initial technique for extracting RR from footage was an Eulerian video magnification (EVM) algorithm to amplify movement, followed by a motion history image (MHI) algorithm to extract the motion.
EVM amplification has been described in other fields. Our algorithm decomposes the video into pyramids of Laplacian images at different spatial frequencies, which allows greater accuracy and large amplification of minute motion. Laplacian pyramids are commonly used in motion magnification. After the pyramids were built, a low-pass filter was applied and pixel intensity was increased by an amount that depended on the pyramid level. The amplified reconstructed frame was then superimposed on the original frame.
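The pyramid-based amplification described above can be illustrated with a minimal sketch. It is not the study's implementation: the 2x2 block averaging used in place of a Gaussian blur, the use of the difference of two exponential low-pass filters as the temporal filter, and all names and default values (`magnify`, `gains`, `fast`, `slow`) are assumptions made for illustration.

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging 2x2 blocks (crude stand-in for blur + decimate)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbour repetition."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def laplacian_pyramid(frame, levels):
    """Split a frame into `levels` detail images plus a low-resolution residual."""
    pyramid, current = [], np.asarray(frame, dtype=float)
    for _ in range(levels):
        smaller = downsample(current)
        pyramid.append(current - upsample(smaller))  # detail lost by downsampling
        current = smaller
    pyramid.append(current)
    return pyramid

def collapse(pyramid):
    """Invert laplacian_pyramid by upsampling and adding the details back."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = upsample(current) + detail
    return current

def magnify(frames, gains, fast=0.4, slow=0.05):
    """Amplify temporal changes per pyramid level and add them to each frame.

    `gains` has one entry per pyramid level (details first, residual last).
    Frame height and width must be divisible by 2 ** (len(gains) - 1).
    """
    levels = len(gains) - 1
    lp_fast = lp_slow = None
    output = []
    for frame in frames:
        pyramid = laplacian_pyramid(frame, levels)
        if lp_fast is None:  # start both filters at the first frame
            lp_fast = [p.copy() for p in pyramid]
            lp_slow = [p.copy() for p in pyramid]
        boosted = []
        for i, level in enumerate(pyramid):
            # Assumption: two exponential low-pass filters whose difference
            # isolates temporal change; a static scene yields zero.
            lp_fast[i] = fast * level + (1 - fast) * lp_fast[i]
            lp_slow[i] = slow * level + (1 - slow) * lp_slow[i]
            boosted.append(gains[i] * (lp_fast[i] - lp_slow[i]))
        # Superimpose the amplified reconstruction on the original frame.
        output.append(np.asarray(frame, dtype=float) + collapse(boosted))
    return output
```

With this construction a static scene passes through unchanged, while a slow periodic intensity change is enlarged by roughly the chosen gain. Any noise that falls in the same temporal band is enlarged as well.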
The MHI algorithm then generated the motion between frames by referencing successive binary silhouette images of the baby. The motion gradient between frames was used to calculate the amplitude of the inspiration/expiration signal of the baby’s breath and to generate a continuous respiratory waveform, from which the respiratory rate could then be calculated. To our knowledge, quantifying motion from an MHI in this way has not previously been studied.
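As an illustration of the MHI step, the following sketch uses the common textbook form of a motion history image, in which pixels that change between successive binary silhouettes are set to a fixed duration and all other pixels decay by one per frame. The thresholding used to form the silhouette and the use of the mean MHI value as the motion amplitude are assumptions; the study's own quantification is not specified in the text.

```python
import numpy as np

def update_mhi(mhi, motion_mask, duration):
    """Set moving pixels to `duration`; decay all others by one step toward zero."""
    return np.where(motion_mask, duration, np.maximum(mhi - 1, 0))

def motion_signal(frames, threshold, duration=5):
    """Return one motion amplitude per frame transition.

    Assumption: the silhouette is a simple intensity threshold and the
    amplitude is the mean of the motion history image.
    """
    mhi = np.zeros(np.asarray(frames[0]).shape)
    previous = np.asarray(frames[0]) > threshold
    signal = []
    for frame in frames[1:]:
        silhouette = np.asarray(frame) > threshold
        motion = silhouette != previous          # pixels that changed state
        mhi = update_mhi(mhi, motion, duration)
        signal.append(float(mhi.mean()))
        previous = silhouette
    return signal
```

Because every silhouette change is recorded regardless of its cause, lighting changes that flip pixels across the threshold contribute to the signal in the same way as chest motion.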
The benefit of the EVM and MHI approach was its ability to magnify small motions with minimal processing and few artifacts. Its critical deficit was a high rate of false positive signals, primarily from amplification of noise generated by changes in lighting and infant movement.
Principal flow field (PFF) methodology was then used in the hope of decreasing the impact of noise and increasing processing speed. Flow fields are a common way to evaluate movement in engineering. To optimize processing speed, our PFFs were computed on segments instead of full frames. Optical flow fields were calculated by generating pixel movement gradients between frames. Knowing that inhalation and exhalation would move in opposite directions allowed us to form flow field matrices localized to pixels representing respiration rather than noise.
PFFs were computed using the flow field matrices on the initial frame and adapted for each sequential frame. The PFF was used to generate a continuous respiration signal, which again was used to generate RR.
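The text does not specify how the optical flow or its principal component was computed, so the following is only a sketch of one plausible reading. It assumes a Lucas-Kanade-style least-squares estimate of a single shift per segment, and it takes the "principal" direction to be the first principal component of the flow vectors over time, so that inhalation and exhalation appear as opposite signs. All function names are hypothetical.

```python
import numpy as np

def segment_flow(prev, curr):
    """Least-squares estimate of one (vx, vy) shift, in pixels, for a whole segment."""
    mean = 0.5 * (np.asarray(prev, dtype=float) + np.asarray(curr, dtype=float))
    iy, ix = np.gradient(mean)                  # derivatives along rows, then columns
    it = np.asarray(curr, dtype=float) - np.asarray(prev, dtype=float)
    a = np.stack([ix.ravel(), iy.ravel()], axis=1)
    # Brightness constancy: ix * vx + iy * vy + it = 0
    v, *_ = np.linalg.lstsq(a, -it.ravel(), rcond=None)
    return v

def principal_direction(flows):
    """Unit vector along which the flow vectors vary most."""
    flows = np.asarray(flows, dtype=float)
    _, _, vt = np.linalg.svd(flows - flows.mean(axis=0), full_matrices=False)
    return vt[0]

def respiration_signal(segments):
    """Signed motion along the principal direction, one value per frame transition."""
    flows = [segment_flow(a, b) for a, b in zip(segments, segments[1:])]
    return np.asarray(flows) @ principal_direction(flows)
```

The sign of the principal direction is arbitrary, so only sign changes in the signal are meaningful. The estimator also always returns some shift, even for pure noise, which is consistent with the false positive behavior reported for this approach.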
The benefit of PFF was that it localized respiratory movements, which decreased processing times while still picking up movements as minute as those detected by the EVM algorithm. However, this algorithm was limited by recurrent false positive readings when no infant was present in the crib. The optical flow fields picked up the most plausible signal resembling respiratory movement even when that motion was actually noise from video capture, video compression, movement artifacts, or lighting changes. As there is always some level of noise, this algorithm would generate respiratory signals and rates even if a baby was not breathing or was not present in the frame.
The final algorithm utilized micromotion and stationarity detection (MSD) and has not previously been studied for this application. To overcome the challenge of finding respiratory motion at a frame-to-frame level without incidentally measuring noise, the MSD analyzed and modeled the noise instead of trying to eliminate it.
In order to model noise characteristics, the image was divided into small sub-regions where changes in pixel intensity were measured over time. The model of the noise consisted of standard deviation (SD) measures of the changes in pixel intensity.
Assuming that the SD of the change in pixel intensity over a series of frames, with no motion and only noise, would remain relatively small and equal from frame to frame, then a large change in the SD of pixel intensity would indicate a micro movement. This is how imperceptible movements of chest rise and fall could be located and measured.
Breathing motions were quantified by taking SD measures of the SD measures. This generated heat maps that represented motion (Figure 2). To determine an RR, the number of peaks was counted and averaged over 100 frames of SD measured values.
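The noise model and the SD-of-SD heat map can be sketched as follows. The MSD technique is described as proprietary, so this is only an illustration of the idea in the text, not the actual algorithm: the sub-region size (`block`), the use of frame-to-frame differences as the "change in pixel intensity", and the function names are all assumptions.

```python
import numpy as np

def block_sd_of_changes(frames, block=8):
    """SD of frame-to-frame intensity change within each sub-region.

    Returns an array of shape (T - 1, H // block, W // block).
    """
    frames = np.asarray(frames, dtype=float)
    diffs = np.diff(frames, axis=0)                    # change between successive frames
    t, h, w = diffs.shape
    rows, cols = h // block, w // block
    tiles = diffs[:, :rows * block, :cols * block]
    tiles = tiles.reshape(t, rows, block, cols, block)
    return tiles.std(axis=(2, 4))

def motion_heat_map(frames, block=8):
    """SD over time of the per-block SD.

    Under noise alone the per-block SD stays nearly constant, so this map is
    close to zero; it rises where the noise statistics fluctuate over time.
    """
    return block_sd_of_changes(frames, block).std(axis=0)
```

In a synthetic test with uniform sensor noise and a single sub-region whose intensity varies periodically, the map peaks at that sub-region, which mirrors the concentration near the chest described for Figure 2.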
Figure 2. Heat map derived from standard deviation (SD) measures of SD measures showing motion due to breathing. The red region represents high SD measures and the blue region represents low SD measures. The red region was concentrated near the baby’s chest, indicating that the measurement showed motion associated with breathing.
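The peak-counting step can be sketched as below. The text does not define what counts as a peak, so the rule used here (a local maximum above the window mean) is an assumption, as are the function names. The default of 10 frames per second follows the frame rate stated for the recordings, which makes a 100-frame window 10 s long.

```python
def count_peaks(signal):
    """Count local maxima that lie above the mean of the window."""
    mean = sum(signal) / len(signal)
    return sum(
        1
        for i in range(1, len(signal) - 1)
        if signal[i - 1] < signal[i] >= signal[i + 1] and signal[i] > mean
    )

def respiratory_rate(signal, fps=10.0):
    """Convert the peak count in a window to breaths per minute."""
    window_seconds = len(signal) / fps
    return count_peaks(signal) * 60.0 / window_seconds
```

With a 10 s window, each counted peak changes the estimate by 6 breaths per minute, so a single missed or spurious peak has a visible effect on any one reading.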
A significant benefit of the MSD is its robustness to noise and its capability to detect whether an infant is in the frame and whether that infant has had an apneic event, defined as a pause in breathing for greater than 20 s. The MSD algorithm had difficulty measuring RR while the patient made gross or macro movements, such as movement of the arms, legs, or torso with crying or shifting while sleeping. After such movements, the algorithm needs to recalibrate over 100 frames, or 10 s at 10 frames per second, to ensure it has located the subject and is measuring the correct signal. ECG impedance pneumography also cannot extract RR during large patient movements, but it recovers more quickly, within a couple of seconds.
The secondary analysis was completed by running the MSD analysis on two patients. Their 48 h of video recording was scanned for continuous time frames where the patient remained asleep, relatively still, and unobstructed by staff or parents providing care. The combined continuous uninterrupted video consisted of 21 min and 50 s and contained 246 time points.
The MSD algorithm takes 10 s to calibrate, as described above, and then populates an RR every 5 s. The EMR produced a time point every 1 s as long as there was no interruption in the signal. To ensure the two data sets represented the same points in time, the RRs at 10 sequential timestamps from the video algorithm were assessed against the RRs at the same timestamps from the EMR; if there was a large discrepancy, that series was considered inaccurate and was not included in the final data set. After confirmation that the timestamps matched, respiratory rates from the MSD algorithm were compared to the RR from the EMR at the corresponding timestamp until another interruption in either signal prevented the two RRs from being generated at the same time. This assessment was then repeated to find the next usable segment of data.
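The matching procedure can be sketched as follows. The text does not state what counted as a "large discrepancy", so the threshold `max_mean_gap` and its value are purely hypothetical, as are the function names and the choice of a mean absolute difference as the criterion.

```python
def pair_by_timestamp(video_rr, emr_rr):
    """Pair rates that share a timestamp.

    Both inputs map a timestamp in seconds to a rate in breaths per minute.
    """
    return [(t, video_rr[t], emr_rr[t]) for t in sorted(video_rr) if t in emr_rr]

def series_is_usable(pairs, check_points=10, max_mean_gap=15.0):
    """Accept a series only if its first `check_points` pairs roughly agree.

    `max_mean_gap` is a hypothetical threshold, not a value from the study.
    """
    head = pairs[:check_points]
    if len(head) < check_points:
        return False
    mean_gap = sum(abs(video - emr) for _, video, emr in head) / check_points
    return mean_gap <= max_mean_gap
```

Because the video algorithm reports every 5 s and the EMR every 1 s, only every fifth EMR value is paired in an uninterrupted segment.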
The average RR over the approximately 22 min was 65 breaths per minute (BPM) in the EMR data and 67 BPM in the video monitor data. The standard deviations were 18.4 and 19.7 BPM, respectively. Both patients recorded were male and Caucasian. Their mean gestational age at birth was 29 weeks and 5 days. Their mean adjusted age at recording was 36 weeks and 2 days. Overlaid tracings of the EMR and video monitor data over the recording time (in seconds) for both subjects are shown in Figure 3.
Figure 3. (a) Respiratory rate (RR) (y-axis) from the video monitoring system compared to that of the extracted electronic medical record (EMR) data over an 8.4 min recording made up of 90 time points (x-axis); (b) RR (y-axis) from the video monitoring system compared to that of the extracted EMR data over a 13.4 min recording made up of 155 time points (x-axis).
Comparison between the EMR and MSD respiratory rates via the Bland–Altman method showed that the video monitoring system had a bias of 1.3 fewer breaths per minute, and 94.3% of all time point comparisons fell between the upper and lower limits of agreement (Figure 4).
Figure 4. Bland–Altman plot. The central dark line represents a bias of −1.3 breaths per minute (BPM). The dashed lines represent the upper limit of agreement (10.9 BPM) and lower limit of agreement (−13.5 BPM).
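For reference, a Bland–Altman analysis of paired readings can be sketched as below. The sketch assumes that differences are taken as video minus EMR and that the limits of agreement are the bias plus or minus 1.96 sample standard deviations of the differences; the text does not state which convention was used, and the function names are illustrative.

```python
from statistics import mean, stdev

def bland_altman(video, reference):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [v - r for v, r in zip(video, reference)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)   # assumed 1.96 SD convention
    return bias, bias - half_width, bias + half_width

def fraction_within_limits(video, reference):
    """Share of paired differences that fall inside the limits of agreement."""
    _, lower, upper = bland_altman(video, reference)
    diffs = [v - r for v, r in zip(video, reference)]
    return sum(lower <= d <= upper for d in diffs) / len(diffs)
```

Under that convention, the limits in Figure 4 sit 12.2 BPM either side of the bias, which would correspond to a standard deviation of the differences of about 6.2 BPM.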
A linear regression between the EMR data and those of the camera-based non-contact monitor showed a correlation coefficient or multiple R of 0.948, with a p value of 0.001. The regression showed an R squared of 90%. Assuming that the EMR data were representative of the true respiration rate, the error of the video-based monitoring system was calculated as 6.36 breaths per minute via a root mean square analysis (Figure 5).
Figure 5. Linear regression comparing video-based monitoring respiration rate (RR) (y-axis) vs. electronic medical record (EMR) RR (x-axis). The dashed lines represent upper and lower boundaries of root mean square error between the modes of measurement.
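The correlation coefficient and root mean square error reported above can be computed from paired readings as in the following sketch, which treats the EMR values as the reference. The function names are illustrative.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def rmse(estimate, reference):
    """Root mean square error of `estimate` against `reference`."""
    squared = [(e - r) ** 2 for e, r in zip(estimate, reference)]
    return sqrt(sum(squared) / len(squared))
```

Two consistency checks follow from these definitions. Squaring r = 0.948 gives about 0.90, matching the reported R squared. If the error is computed on the same pairs as the Bland–Altman analysis, the mean squared error equals the squared bias plus the variance of the differences; a bias of −1.3 BPM with a standard deviation of about 6.2 BPM (inferred from the limits in Figure 4 under a 1.96 SD convention) gives roughly 6.4 BPM, in line with the reported 6.36.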
The main limitations of this study are the relatively small sample size, the short recording time, and the lack of a gold-standard comparison. While the correlations are encouraging for the 246 time points recorded, the data were limited to a small population and a short time period. As the patients in the secondary analysis were both Caucasian, the lack of diversity potentially limits the generalizability of the technology to a broader population. This limitation may be mitigated by the technology’s capacity to detect RR through varied infant attire and swaddles of different colors. Lastly, by selecting relatively stable infants for 48 h of monitoring, our study was not designed to assess whether video monitoring systems can track clinically significant trends in respiratory rate. As this monitor was not available to the clinical team, it could not be assessed whether it produced more or fewer false alarms or missed apneic or tachypneic events.
Video-based respiratory monitoring is at the earliest stages of development. However, our study shows the potential of MSD to make this technology equivalent to impedance pneumography, at least for a stable neonate in an open crib. While the comparison still carries many qualifiers, it is a major step toward a future of contact-free respiratory monitoring. Video-based monitoring could be integrated into incubator designs and attached to preexisting hospital cribs. With further validation, this technology could decrease the need for cardiac leads in more stable infants and serve as an adjunct monitoring tool for critically ill patients, possibly decreasing false alarms from patient movement.
A next major step for video-based respiratory monitoring systems is to validate the technology in a large cohort study inclusive of a diverse population with varying skin tone, race, sex, age, and clinical stability. Once the technology is validated, studies that allow providers to view RR in real time would demonstrate the ability of video monitoring to screen for apneas and tachypnea in infants. To be trusted in clinical settings, this technology must be shown to be as sensitive as impedance pneumography. While preliminary data from this study suggest that it is as sensitive and specific as ECG-based systems, a longer and larger study focused on comparing significant events between the two systems is needed. There may also be opportunities for this technology to reduce false positive alarms, a known issue for hospitalized patients, either by being a more accurate monitor or by providing negative feedback to current monitors.
As MSD amplifies and tracks movement by averaging background noise, it has an advantage over impedance pneumography, which often falls short because of increased noise at faster respiratory rates. A future study comparing video-based respiratory monitoring and ECG impedance pneumography to the gold standard of counting breaths for a minute in clinically ill and tachypneic patients could potentially show superiority of video-based systems in monitoring RR.
With the majority of pediatric illnesses being respiratory-related and the accessibility of high-quality cameras, such as those present in most personal cell phones, a video-based monitoring system could also have a large impact on outpatient medicine. It is conceivable that this technology could allow healthcare providers to measure RR throughout a telehealth video visit. This would provide valuable information that would aid in the decision-making process of whether or not a patient needs to go to an emergency room or would be safe to be cared for at home. With the SARS-CoV-2 pandemic increasing demand and showing the value of telemedicine visits, this technology could be invaluable.