*2.3. Sensor Fusion Combining RGB Sensor and IRT for ROI Detection*

Stable measurement of body temperature and RR using an IRT requires precise ROI detection of facial regions (i.e., the face, nose, and mouth), because temperature is estimated over the facial area and respiration occurs at the nose and mouth. An RGB camera can localize facial landmarks accurately using existing methods [20]. Therefore, we introduced a sensor fusion method in which the facial landmarks detected in the RGB video are used to locate the corresponding landmarks in the thermal video.

The facial landmarks in the thermal video are obtained by applying a homography that maps the RGB image coordinates of the nose and mouth, detected with the open-source library "dlib", to thermal image coordinates. The homography between the two images is given by Equation (1), where the homography matrix H is

$$\begin{aligned} H &= \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix}, \\ x_{thermo} &= \frac{h_{11}x_{RGB} + h_{12}y_{RGB} + h_{13}}{h_{31}x_{RGB} + h_{32}y_{RGB} + h_{33}}, \\ y_{thermo} &= \frac{h_{21}x_{RGB} + h_{22}y_{RGB} + h_{23}}{h_{31}x_{RGB} + h_{32}y_{RGB} + h_{33}} \end{aligned} \tag{1}$$

where $x_{RGB}$, $y_{RGB}$ and $x_{thermo}$, $y_{thermo}$ are the image coordinates in the RGB and thermal images, respectively, and each $h_{ij}$ ($i, j = 1, 2, 3$) is an element of the homography matrix H. Figure 3 shows a flowchart of the image processing used to estimate H. The estimation is based on matching the face profile between the RGB and thermal images using pattern matching. First, the profile region is extracted from the RGB and thermal images shown in Figure 3a,b using the "grabcut" method [21] of OpenCV, yielding the profile images shown in Figure 3c. Corresponding points between the images are then found by computing Oriented FAST and Rotated BRIEF (ORB) features for the two profile images and performing an exhaustive (brute-force) search for correspondences among the feature points of each image [22]. The homography matrix is estimated from the resulting point correspondences using the random sample consensus (RANSAC) method [23]. Finally, the facial landmarks in the thermal image (Figure 3e) are obtained by applying the homography matrix to the facial landmarks detected in the RGB image (Figure 3d).
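As a minimal sketch of the final step, the following plain-Python function applies Equation (1) to a list of RGB landmark coordinates. The function name `apply_homography`, the matrix `H_example`, and the landmark values are hypothetical and chosen only for illustration; they are not taken from the measurement system described here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def apply_homography(H: Sequence[Sequence[float]],
                     points: Sequence[Point]) -> List[Point]:
    """Map (x_RGB, y_RGB) points to (x_thermo, y_thermo) using Equation (1)."""
    mapped = []
    for x, y in points:
        # Shared denominator of both fractions in Equation (1).
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        if abs(w) < 1e-12:
            raise ValueError("point maps to infinity under this homography")
        x_t = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
        y_t = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
        mapped.append((x_t, y_t))
    return mapped


# Hypothetical homography: scale RGB coordinates by 0.5, then shift by (10, 20).
H_example = [
    [0.5, 0.0, 10.0],
    [0.0, 0.5, 20.0],
    [0.0, 0.0, 1.0],
]

# Hypothetical nose and mouth landmarks in RGB pixel coordinates.
rgb_landmarks = [(320.0, 240.0), (320.0, 300.0)]
thermal_landmarks = apply_homography(H_example, rgb_landmarks)
print(thermal_landmarks)  # [(170.0, 140.0), (170.0, 170.0)]
```

Because the third row of `H_example` is (0, 0, 1), the denominator equals 1 and the mapping reduces to an affine transform; a general homography would have nonzero $h_{31}$ and $h_{32}$, which makes the scaling depend on position in the image.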

**Figure 3.** Feature matching for region-of-interest (ROI) detection in the thermal image. The figure is reproduced with copyright permission from Reference [14].
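To illustrate how a homography can be estimated from point correspondences that contain mismatches, the following is a simplified plain-Python sketch of RANSAC over four-point homography solves. It is not the implementation used in this work: it assumes that matched point pairs are already available (e.g., from ORB matching), fixes $h_{33} = 1$, omits the final refit on all inliers, and uses hypothetical names (`ransac_homography`, `homography_from_4`), a hypothetical ground-truth matrix `H_true`, and synthetic data. In practice, a library routine such as OpenCV's `cv2.findHomography` with the `cv2.RANSAC` flag would typically be used instead.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def project(H: Matrix, p: Point) -> Optional[Point]:
    """Apply Equation (1) to one point; None if the denominator vanishes."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        return None
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def solve_linear(A: List[List[float]], b: List[float]) -> Optional[List[float]]:
    """Gauss-Jordan elimination with partial pivoting; None if singular."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def homography_from_4(src: Sequence[Point], dst: Sequence[Point]) -> Optional[Matrix]:
    """Solve Equation (1) for h11..h32 from four point pairs, fixing h33 = 1."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Equation (1) multiplied through by its denominator, one row per coordinate.
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    if h is None:
        return None
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def ransac_homography(src: Sequence[Point], dst: Sequence[Point],
                      threshold: float = 3.0, iterations: int = 200,
                      seed: int = 0) -> Tuple[Optional[Matrix], List[int]]:
    """Keep the 4-point homography that agrees with the most correspondences."""
    rng = random.Random(seed)
    best_H, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(range(len(src)), 4)
        H = homography_from_4([src[i] for i in sample], [dst[i] for i in sample])
        if H is None:
            continue
        inliers = []
        for i in range(len(src)):
            q = project(H, src[i])
            # Inlier if the reprojection error is within `threshold` pixels.
            if q is not None and (q[0] - dst[i][0]) ** 2 + (q[1] - dst[i][1]) ** 2 <= threshold ** 2:
                inliers.append(i)
        if len(inliers) > len(best_inliers):
            best_H, best_inliers = H, inliers
    return best_H, best_inliers


# Synthetic check: hypothetical ground-truth homography and random RGB points.
H_true = [[0.5, 0.02, 10.0], [-0.01, 0.5, 20.0], [1e-5, 2e-5, 1.0]]
data_rng = random.Random(1)
src_pts = [(data_rng.uniform(0, 640), data_rng.uniform(0, 480)) for _ in range(20)]
dst_pts = [project(H_true, p) for p in src_pts]
# Corrupt three correspondences to mimic wrong feature matches.
for i, (dx, dy) in zip((2, 9, 15), ((40.0, -30.0), (-35.0, 50.0), (60.0, 45.0))):
    dst_pts[i] = (dst_pts[i][0] + dx, dst_pts[i][1] + dy)

H_est, inliers = ransac_homography(src_pts, dst_pts)
print(len(inliers), "inliers; rejected:", sorted(set(range(20)) - set(inliers)))
```

In this synthetic setting the three deliberately corrupted correspondences should be rejected, which mirrors why a robust estimator is useful here: feature matches between RGB and thermal profile images can include wrong pairs, and a plain least-squares fit would be pulled toward them.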
