Human vital signs such as temperature, breathing rate (BR), heart rate (HR), heart rate variability (HRV), blood oxygen saturation (SpO2), and blood pressure (BP) are, to a large extent, indicators of a person's state of health. As shown above, the effects of COVID-19 on vital signs are most pronounced in cardiorespiratory state and temperature. Certain other signs, such as coughing and conjunctivitis, are clearly detectable by video camera under the right circumstances but difficult to detect remotely with non-imaging technology; coughing can also be detected acoustically.
There are various non-contact methods for measuring vital signs remotely, including magnetic induction, the Doppler effect with radar or sonar, video camera imaging, and thermal imaging [1]. These techniques have been shown to monitor vital signs remotely with acceptable accuracy, reliability, and, in some cases, practicality [1]. Each method has its pros and cons and may excel in some scenarios while falling short in others. Doppler radar is highly affected by motion artefacts, may have biological effects on humans, and is unsuitable for monitoring multiple subjects simultaneously. Of the methods studied, visible-spectrum and thermal cameras offer the best options: thermal cameras because of their potential to measure all likely symptoms, especially temperature, with a single device, and video cameras because of their availability, cost, adaptability, and compatibility.
Infrared thermography (IRT), also known as thermal imaging, is a promising monitoring and diagnostic technique in the medical field [2]. For example, thermal imaging can be used to assess HR and BR, endocrinological disorders (especially diabetes mellitus), vascular disorders, neurological disorders, musculoskeletal disorders, and oncology, and it finds use in regenerative medicine and surgery. A popular application of thermal imaging is screening for fever at airports, schools, hospitals, etc. IRT is a passive, non-contact monitoring technique that senses the radiation naturally emitted from an object such as human skin; it requires no illuminating radiation source or dedicated light source. These are its most significant advantages over other non-contact techniques: thermal cameras are not affected by illumination variation, can work in darkness, and are difficult to deceive with makeup or masks [3].
Thermal cameras respond to wavelengths starting at around 3 micrometres (µm), whereas visible wavelengths end at about 0.75 µm. An array of thermally sensitive elements behind a suitable lens produces an image much like that of an optical camera, with higher radiation measured from hotter objects. The result is a greyscale image in which lighter areas are warmer; depending on the application, this scale may be reversed for human operators. The use of “false color” is common in thermal images and thermal video because it makes temperature differences easier for human operators to discern. Colder temperatures are often assigned shades of purple, blue, or green, whereas hotter temperatures are given shades of orange, red, or yellow [4]. Figure 1 shows a thermal image in which the person appears in shades of orange and yellow, whereas other areas are blue and purple, because the person radiates more heat than the surrounding objects. False color is purely a remapping of grey levels and is not a feature of the underlying thermal data.
Figure 1. Thermal image captured by a thermal camera.
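Since false color is purely a remapping of grey levels, it can be sketched as a per-pixel interpolation between a few color anchors. The function name and the anchor colors below are illustrative assumptions, not any camera vendor's actual palette:

```python
import numpy as np

def false_color(grey):
    """Map normalized grey levels (0..1) to RGB false color.

    Cold values map toward purple/blue, hot values toward orange/yellow --
    a purely cosmetic remapping; the underlying data are unchanged.
    """
    grey = np.clip(np.asarray(grey, dtype=float), 0.0, 1.0)
    # Anchor points along the grey scale and the RGB color at each anchor.
    anchors = np.array([0.0, 0.33, 0.66, 1.0])
    colors = np.array([
        [0.2, 0.0, 0.4],   # cold: purple
        [0.0, 0.2, 0.9],   # cool: blue
        [0.9, 0.3, 0.0],   # warm: orange
        [1.0, 1.0, 0.2],   # hot: yellow
    ])
    # Interpolate each RGB channel independently over the anchors.
    return np.stack(
        [np.interp(grey, anchors, colors[:, c]) for c in range(3)], axis=-1
    )

# A synthetic 2x2 "thermal" frame, from a cold corner to a hot corner.
frame = np.array([[0.0, 0.3], [0.7, 1.0]])
print(false_color(frame).shape)  # (2, 2, 3)
```

Reversing the grey scale for human operators, as mentioned above, would simply flip the anchor ordering.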
Thermal cameras work on the principle that all objects with a temperature above absolute zero (−273.15 °C) emit electromagnetic energy, also known as infrared radiation or thermal radiation [5].
To accurately observe the temperature of a target using a thermal camera, the target should be an ideal black body. A black body in thermal equilibrium absorbs all incident electromagnetic radiation and emits an equal amount of energy through isotropic radiation. The wavelength distribution of the emitted radiation is governed by the temperature of the body; as the temperature increases, the distribution shifts towards shorter wavelengths. This phenomenon is described by Planck’s law. A thermal camera measures the radiation emitted by a body, whose predominant wavelength depends on temperature. The human body is not an ideal Planck black-body radiator, but it is a close approximation in some bands [5]. It is generally accepted that the emissivity of human skin in the IR range of 3–14 µm is 0.98 ± 0.01 [5]; while close to that of a black body, this can still cause disturbances due to non-ideal emission and reflections. Thus, there will always be some degree of ambiguity when measuring skin temperature through thermal imaging. In a controlled environment, this can be accounted for with calibration; in uncontrolled environments, calibration is significantly more difficult.
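The relationship between temperature and the emitted spectrum can be made concrete with two standard results, stated here for reference:

```latex
% Planck's law: spectral radiance of a black body at temperature T
B_\lambda(\lambda, T) = \frac{2hc^2}{\lambda^5}\,
  \frac{1}{e^{hc/(\lambda k_B T)} - 1}

% Wien's displacement law: the wavelength at which the distribution peaks
\lambda_{\max} = \frac{b}{T}, \qquad b \approx 2898~\mu\mathrm{m\,K}
```

For skin at roughly 305–310 K, Wien’s law places the peak near 9.3–9.5 µm, which is why thermal cameras used for body-temperature screening operate in the long-wave infrared band.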
This is just one of the many limitations that arise when using thermal cameras to detect elevated body temperature. Infrared camera manufacturers, such as FLIR [6], state that such technology can only be used to detect skin temperature (as opposed to core body temperature) and must be operated in a controlled environment to do so: the ambient temperature must be in the range of 20 to 24 °C, and the relative humidity between 10% and 50%. The temperature measured at the skin’s surface is offset from the subject’s core body temperature; hence, measurements must be calibrated to be indicative of body temperature. Environmental, operational, and subject-related factors such as convective airflow, reflective surfaces, IR contamination (e.g., from sunlight or heaters), sweat, accessories covering part of the face (such as glasses or baseball caps), ambient temperature, humidity, and emissivity all affect the acquired data. Thus, thermal imaging may be used to detect anomalies relative to other subjects, but many factors prevent determining the absolute core body temperature of a given subject at a distance. The area medially adjacent to the inner canthus of the eye is suggested to provide the most consistent measurement, as determined through mass screening [7].
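The calibration step described above (mapping an offset skin reading to an estimate of core temperature under fixed conditions) can be sketched as a simple least-squares linear fit. The paired readings below are entirely hypothetical, and this is a minimal illustration, not any manufacturer's procedure:

```python
import numpy as np

# Hypothetical paired readings: thermal-camera inner-canthus temperature vs.
# a reference contact (core) thermometer, both in degrees Celsius, collected
# under the same controlled ambient conditions.
skin = np.array([34.1, 34.4, 34.9, 35.3, 35.8, 36.2])
core = np.array([36.4, 36.6, 36.9, 37.2, 37.6, 37.9])

# Least-squares linear calibration: core ~= a * skin + b.
a, b = np.polyfit(skin, core, 1)

def estimate_core(skin_temp):
    """Estimate core temperature from a canthus reading (same conditions)."""
    return a * skin_temp + b

print(round(estimate_core(35.0), 2))  # roughly 37.0 for this synthetic data
```

Because the offset depends on airflow, humidity, emissivity, and the other factors listed above, such a fit is only valid for the conditions under which it was collected.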
Video imaging is a passive, non-contact modality that can be delivered from common sources of video data, including hand-held and fixed video cameras, webcams, and smartphones, or from sensor platforms such as unmanned aerial vehicles (UAVs) and robots, as shown in Figure 2. Video analysis of vital signs generally relies on two phenomena. The first, exploited by color-based methods or imaging photoplethysmography (iPPG), depends on skin color variations caused by cardiorespiratory activity. The second, exploited by motion-based methods, relies on cyclic body motion such as an arterial pulse, head movements, or movements of the thoracic and abdominal regions due to cardiorespiratory activity.
Figure 2. Different sensors used as video cameras. (a) Webcam, (b) digital camera, (c) smartphone, (d) Microsoft Kinect, and (e) UAV.
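The color-based (iPPG) principle can be sketched in a few lines: spatially average a skin region's green channel frame by frame, then bandpass the trace to the plausible pulse band. This is a minimal generic illustration, not any specific published pipeline; the function names, ROI convention, and synthetic frames are assumptions:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def ippg_raw_signal(frames, roi):
    """Spatially average the green channel inside a skin ROI, per frame.

    frames: array of shape (n_frames, height, width, 3), RGB.
    roi: (top, bottom, left, right) pixel bounds of the skin region.
    """
    top, bottom, left, right = roi
    return frames[:, top:bottom, left:right, 1].mean(axis=(1, 2))

def bandpass(signal, fs, low=0.7, high=4.0, order=3):
    """Keep only frequencies plausible for a human pulse (~42-240 bpm)."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, signal)

# Synthetic demo: a 1.2 Hz (72 bpm) pulse riding on a constant skin tone.
fs = 30.0                                   # camera frame rate (Hz)
n = 300                                     # 10 s of video
tvec = np.arange(n) / fs
pulse = 0.5 * np.sin(2 * np.pi * 1.2 * tvec)
frames = np.full((n, 8, 8, 3), 120.0)
frames[:, :, :, 1] += pulse[:, None, None]  # pulse modulates the green channel

raw = ippg_raw_signal(frames, (0, 8, 0, 8))
clean = bandpass(raw, fs)
peak_hz = np.abs(np.fft.rfft(clean)).argmax() / (n / fs)
print(round(peak_hz * 60))  # 72 (estimated heart rate in bpm)
```

Real footage adds motion, lighting changes, and compression noise, which is why published methods layer tracking, source separation, and robust spectral estimation on top of this core.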
Video cameras have been shown to measure heart rate in controlled clinical settings, outdoors, from multiple people, and at long range [8]. Recent work has demonstrated breathing-rate measurement in challenging scenarios, such as from a drone camera platform [9]. Video cameras can both diagnose and identify people, which can be useful in many scenarios [10]. Their cost is low, their installed base is large, and the availability of video cameras that can be pressed into service is high.
Video cameras are often integrated with a microphone, which may allow the detection of coughing [11][12][13]. An important advantage of video cameras is the possibility of using advanced image processing to detect posture and to recognize whether a patient is coughing. At longer ranges, individuals in acute distress might be detected by their movements or by their collapse [9].
A significant advantage of video cameras over all other sensors, including thermal cameras, is their arbitrary field of view: with a change of lens, the same camera can image bacteria or the ice caps of Mars [8].
To enhance the performance of vital sign monitoring, different technologies can be combined, as shown in Table 1. For example, by combining an RGB camera, a monochrome camera with a color filter, and a thermal camera, Gupta et al. [14] proposed an efficient multicamera system to measure HR and HRV. In their method, facial landmarks were detected in real time using conditional regression forests (CRF), and a region of interest (ROI) was selected. The raw signal was computed as the spatial average of the pixels within the ROI for each channel, i.e., red, green, blue, magenta, and thermal (RGBMT). Independent component analysis (ICA) was used to recover the underlying source signals, and a bandpass filter was applied to the selected source signal. Finally, peak detection was performed to calculate HR and HRV. The experimental results showed that the GRT (green, red, and thermal) channel combination gave the most accurate results, demonstrating that including more spectral channels can yield more robust and accurate measurements.
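The final step of such pipelines, peak detection on a filtered pulse signal, can be sketched as follows. The function name and the 0.4 s minimum peak spacing are illustrative assumptions, and scipy's generic `find_peaks` stands in for whatever detector the study used:

```python
import numpy as np
from scipy.signal import find_peaks

def hr_hrv_from_signal(signal, fs):
    """Estimate heart rate (bpm) and HRV (SDNN, in ms) from a filtered pulse.

    Assumes successive beats are at least 0.4 s apart (HR below 150 bpm).
    """
    peaks, _ = find_peaks(signal, distance=int(0.4 * fs))
    ibi = np.diff(peaks) / fs          # inter-beat intervals in seconds
    hr = 60.0 / ibi.mean()             # mean heart rate in bpm
    sdnn = ibi.std(ddof=1) * 1000.0    # SDNN, a standard HRV index, in ms
    return hr, sdnn

# Demo: a clean 1 Hz (60 bpm) pulse sampled at a 30 fps camera rate.
fs = 30.0
t = np.arange(300) / fs
pulse = np.sin(2 * np.pi * 1.0 * t + 0.3)  # phase offset off the sample grid
hr, sdnn = hr_hrv_from_signal(pulse, fs)
print(round(hr))  # 60
```

For a perfectly periodic signal the inter-beat intervals are identical, so SDNN is near zero; on real pulse data the beat-to-beat variation is exactly what HRV quantifies.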
Hu et al. [15] presented a dual-mode imaging technique based on visible and long-wave infrared wavelengths to extract breathing rate and pattern remotely using both an RGB and a thermal camera. After image registration, an ROI was identified in the RGB image using a cascade object detector based on the Viola–Jones algorithm together with a screening technique based on biological characteristics. The corresponding regions in the thermal image were then selected through linear coordinate mapping. Interest points within the ROIs of the visible-light grey images were selected with the Shi–Tomasi corner detector, derived from the Harris–Stephens detector, and cross-spectrum ROI tracking was achieved using linear coordinate mapping. The mean pixel intensity was then computed within the ROI of the thermal image for every frame, the raw intensity traces were smoothed with a moving-average filter, and the breathing rate was extracted from the smoothed signal. Adding RGB images made it possible to detect and track the face and facial tissue in the thermal images more accurately and quickly. However, such systems may not be feasible for measuring vital signs at night.
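The last two steps, moving-average smoothing of the thermal intensity trace and breathing-rate extraction, can be sketched generically. The spectral-peak estimator below is a common generic choice, not necessarily the exact method of [15], and the synthetic trace is an assumption:

```python
import numpy as np

def moving_average(x, window):
    """Smooth a raw intensity trace with a simple boxcar moving average."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")

def breathing_rate(signal, fs, low=0.1, high=0.7):
    """Estimate breathing rate (breaths/min) as the dominant FFT frequency
    inside a plausible respiratory band (6-42 breaths/min)."""
    x = signal - signal.mean()
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    spectrum = np.abs(np.fft.rfft(x))
    band = (freqs >= low) & (freqs <= high)
    return freqs[band][spectrum[band].argmax()] * 60.0

# Demo: a nostril-region mean intensity oscillating at 0.25 Hz
# (15 breaths/min), sampled at 10 Hz with additive noise.
fs = 10.0
t = np.arange(600) / fs
rng = np.random.default_rng(0)
raw = np.sin(2 * np.pi * 0.25 * t) + 0.3 * rng.standard_normal(t.size)
smooth = moving_average(raw, window=5)
print(round(breathing_rate(smooth, fs), 1))  # 15.0
```

The boxcar smoother attenuates high-frequency noise while barely touching the slow respiratory oscillation, which is why it suffices before the spectral estimate.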
Another dual-camera imaging system was proposed in [16] to measure BR and HR at night using an RGB-infrared camera and a thermal imager. Image registration was based on an affine transformation; face detection was performed with a pre-trained boosted cascade classifier or a fully convolutional network; and a discriminative regression-based approach was used to locate facial landmarks. The IR images were used to select the ROI in the thermal images via linear coordinate mapping, and ROI tracking in the thermal images was performed by spatio-temporal context learning (STC). A moving-average filter was applied to the raw signal, and finally, HR and BR were calculated.
Bennett et al. [17] used both a thermal and an optical camera to monitor heart rate and blood perfusion based on Eulerian video magnification (EVM). After identifying ROIs in both the thermal and optical videos, the average intensity was computed within each ROI. A Butterworth lowpass filter was then applied to these signals, which were subsequently normalized. The experimental results on blood perfusion showed that optical video with EVM was sensitive to skin color and lighting conditions, while thermal video with EVM was not. However, the proposed method did not account for subject movement, and the sample size was small. Although a temperature response caused by blood occlusion was sensed by the thermal camera, it did not follow the pulsatile pattern of perfusion.
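Full EVM operates on a spatially decomposed video pyramid; the sketch below shows only its temporal core, under the simplifying assumption of a single spatial scale: bandpass each pixel's intensity over time and add the amplified band back. The function name, filter parameters, and synthetic trace are illustrative assumptions:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def magnify_temporal(pixel_traces, fs, low, high, alpha, order=2):
    """Eulerian-style magnification core: bandpass each pixel's intensity
    over time and add the amplified band back to the original trace.

    pixel_traces: array of shape (n_frames, n_pixels).
    """
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    band = filtfilt(b, a, pixel_traces, axis=0)
    return pixel_traces + alpha * band

# Demo: one pixel with a tiny 1 Hz pulsation, nearly invisible at
# amplitude 0.05 on a baseline of 100 intensity units.
fs = 30.0
t = np.arange(300) / fs
trace = 100.0 + 0.05 * np.sin(2 * np.pi * 1.0 * t)
magnified = magnify_temporal(trace[:, None], fs, low=0.7, high=1.5, alpha=20.0)
print(np.ptp(magnified))  # much larger swing than np.ptp(trace) ~ 0.1
```

Because the band is isolated before amplification, the static baseline is untouched and only the pulsatile component is exaggerated, which is the effect the heart rate and perfusion measurements above rely on.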
Table 1. Different studies to monitor vital signs using combined technology.
| Ref | Sensor Used | Vital Signs | ROI | Used Technique | Result |
|---|---|---|---|---|---|
| Gupta et al. [14] | RGB, monochrome, and thermal camera | HR and HRV | Cheeks and forehead | ICA | Error = 4.62% |
| Hu et al. [15] | RGB and thermal camera | BR | Nose and mouth | Viola–Jones algorithm together with the screening technique | LCC = 0.971 |
| Hu et al. [16] | RGB infrared and thermal camera | HR and BR | Mouth and nose regions | Moving average filter | LCC = 0.831 (BR), LCC = 0.933 (HR) |
| Bennett et al. [17] | Thermal and digital camera | HR and blood perfusion | Face and arm | EVM | - |

Note: LCC = linear correlation coefficient.