*2.4. Dataset*

The dataset used in this study is publicly available in [45]. The authors used a FLIR Vue Pro camera to capture thermographic images for the dataset. During image capture, participants looked at a fixed point while the camera was moved to nine equidistant positions, forming a semicircle around the volunteer. Thus, the dataset contained nine thermographic images of the face of each participant. Figure 5 displays examples of photos that comprise the dataset.

**Figure 5.** Examples of dataset images [45].

The complete dataset contained 998 images from 111 participants. However, to obtain a dataset balanced with respect to the volunteers' gender, only 781 images were used. Of these, 658 were used for training and 123 for validation of the transfer learning with the YOLO network. To evaluate the performance of YOLO for object detection, each image had to be labeled with the annotations of its respective bounding boxes.

The face ROIs are the ear, eye, forehead, and whole face. These areas have known temperature thresholds for febrility and can be seen directly; thus, they are suitable for screening febrile people using thermography [46]. However, not all of these regions are always visible, owing to glasses, face masks, hair covering the forehead or ears, and other occlusions.

LabelImg software, a free graphical tool for image and video annotation [47], was used to label all images used in this study. Figure 6 shows the graphical interface of the LabelImg software.

**Figure 6.** LabelImg software graphical interface.
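
LabelImg can export annotations in the YOLO text format, one `.txt` file per image. As an illustration, the minimal sketch below parses such a file; the class-index order shown is an assumption, since it depends on the classes file defined during labeling.

```python
# A minimal sketch, assuming LabelImg exported annotations in the YOLO text
# format: one .txt file per image, one bounding box per line.

CLASS_NAMES = ["ear", "eye", "forehead", "face"]  # assumed index order


def parse_yolo_labels(path):
    """Parse one annotation file. Each line reads
    '<class_id> <x_center> <y_center> <width> <height>',
    with all coordinates normalized to [0, 1]."""
    boxes = []
    with open(path) as f:
        for line in f:
            class_id, xc, yc, w, h = line.split()
            boxes.append({
                "label": CLASS_NAMES[int(class_id)],
                "x_center": float(xc),
                "y_center": float(yc),
                "width": float(w),
                "height": float(h),
            })
    return boxes
```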

#### **3. Results**

The training of the object detector was performed on Google Colaboratory, a cloud-based computational environment that requires no configuration and allows code to be written and executed directly in the browser.

To assess the training of the network, it was necessary to quantify the prediction accuracy by comparing the predictions made by the model with the real object locations in the images. Thus, the mean average precision (mAP), one of the most common metrics for determining the accuracy of object detectors, was employed [48].
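
A detection counts as a true positive when its overlap with the ground-truth box, measured as the intersection over union (IoU), exceeds a chosen threshold. A minimal sketch of that computation, assuming the common corner-coordinate box convention (which the paper does not specify):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) corner format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Overlap area is zero when the boxes are disjoint.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```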

Other metrics used to evaluate the performance of the trained network were precision (P), recall (R), and F1-score (F1); for these, higher values indicate better results. Additionally, the counts of true positives (TP), false positives (FP), and false negatives (FN) were employed as performance metrics.

The resulting metric values of the trained network, with a confidence threshold of 25% (conf_threshold = 0.25), were as follows: TP = 452, FP = 46, FN = 9, P = 0.91, R = 0.98, and F1 = 0.94. The mAP at an IoU threshold of 50%, also known as mAP@0.50, was 0.97. Table 1 presents the performance of the model for each class.
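
As a consistency check, these precision, recall, and F1 values follow directly from the reported counts:

```python
# Reproducing the reported validation metrics from the raw counts
# (conf_threshold = 0.25).
tp, fp, fn = 452, 46, 9

precision = tp / (tp + fp)                          # 452/498 ≈ 0.91
recall = tp / (tp + fn)                             # 452/461 ≈ 0.98
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.94

print(f"P = {precision:.2f}, R = {recall:.2f}, F1 = {f1:.2f}")
```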

Tests were carried out with photos of six volunteers, none of whom were present in the training dataset, to evaluate the prediction accuracy on new images. A Testo-885 thermographic camera captured these new images. After transfer learning, the object detector analyzed them and detected all ROIs, even for volunteers wearing masks or caps and with long hair. Figure 7 displays some of these images.


**Table 1.** Performance for each ROI during validation.

**Figure 7.** Images produced using the Testo-885 camera and analyzed by YOLO.

When identifying an object, the YOLO detector provides the coordinates, width, and height of its bounding box, which delimits the ROI where the temperature is analyzed. From each ROI, the values of the pixels with the highest temperatures were extracted; thus, the algorithm discards regions covered by hair, sweat, or fabric, which are generally at lower temperatures. The highest temperatures appear as the lightest colors in Figure 8.
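
The paper does not list its extraction code; the sketch below is a minimal illustration of this step, assuming the thermal image is available as a 2-D grayscale NumPy array and that the bounding box uses YOLO's center-based pixel convention.

```python
import numpy as np


def max_pixel_in_roi(image, box):
    """Return the highest pixel value inside a detected ROI.

    image: 2-D NumPy array holding the grayscale thermal image.
    box:   (x_center, y_center, width, height) in pixels, following
           YOLO's center-based convention (an assumption here).
    """
    xc, yc, w, h = box
    # Convert the center-based box to corner coordinates, clipped to the image.
    x1, y1 = max(int(xc - w / 2), 0), max(int(yc - h / 2), 0)
    x2 = min(int(xc + w / 2), image.shape[1])
    y2 = min(int(yc + h / 2), image.shape[0])
    roi = image[y1:y2, x1:x2]
    # Cooler occlusions (hair, sweat, fabric) map to darker pixels,
    # so taking the maximum effectively discards them.
    return int(roi.max())
```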

Figure 9 displays a boxplot of the pixel values in each ROI, as depicted in Figure 8. The distribution of the pixels in the forehead region displays lower values than those in the eye regions, indicating that the eyes are at a higher temperature than the forehead.

According to [25], the maximum or mean temperatures of ROIs can be adopted to assess human body surfaces. However, the ROI segmentation performed in this paper may include background, parts of glasses, masks, and hair, which decreases the mean temperature of the ROI. Therefore, to avoid this issue, the maximum temperature of each ROI was adopted.

The temperature scale on the right side of Figure 8 indicates that darker colors correspond to temperatures close to 24 °C, while lighter colors approach 35 °C. In the thermal imager's standard operating mode, these limit values are generated automatically by the camera's operating software: the highest value indicates the maximum temperature of the objects in the imager's field of view, and the lowest value indicates the minimum.

**Figure 8.** Extraction of ROIs to be analyzed.

Radiometric thermogram output is not always available, depending on the imager manufacturer. Thus, a method for reading temperatures in the region of interest directly from the thermal image was developed, so that the method can be widely used.

Along the temperature scale, there are 267 pixels; the first one, pixel zero, has a value of 254, and the last has a value of 4. Figure 10 presents the relationship between the pixel values and their respective positions on the scale as a dashed green line. Equation (1) shows the first-order polynomial obtained through linear regression, with a coefficient of determination (R²) of 0.9941. The solid red line on the graph shows the behavior of the straight line that describes this relationship.

$$
i = -1.064v + 270.256,\tag{1}
$$

where *v* is the pixel value and *i* is the position on the temperature scale.

**Figure 10.** Pixel value as a function of position on the temperature scale. The dashed green line depicts the pixel values extracted from the image. The solid red line displays the behavior of the equation obtained by linear regression.
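
The regression behind Equation (1) can be sketched with NumPy. The `scale_pixels` array below is an idealized stand-in for the 267 values read along the color bar; with real image data, the fit yields the reported R² of 0.9941.

```python
import numpy as np

# Idealized stand-in for the 267 grayscale values sampled along the color bar,
# from position 0 (value 254) to position 266 (value 4); in practice these
# values would be read from the thermal image itself.
positions = np.arange(267)
scale_pixels = np.linspace(254, 4, 267)

# Fit position i as a first-order polynomial of pixel value v (Equation (1)).
slope, intercept = np.polyfit(scale_pixels, positions, deg=1)
print(f"i = {slope:.3f}v + {intercept:.3f}")  # i = -1.064v + 270.256
```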

Because the scale is linear, higher temperatures correspond to higher pixel values; thus, the pixels positioned at the beginning of the scale represent the highest temperatures, and the pixels at the end indicate the lowest. Figure 11 depicts the relationship between the pixel positions and the respective temperatures of the scale shown in Figure 8. Equation (2) presents the straight line that describes this relationship.

$$T = \left[ (y_2 - y_1) / 266 \right] i + y_1,\tag{2}$$

where *i* is the pixel position, obtained using Equation (1), *y*<sub>1</sub> is the highest value recorded on the temperature scale, *y*<sub>2</sub> is the lowest value recorded on the temperature scale, and *T* is the temperature (in °C) of the analyzed pixel.

**Figure 11.** Temperature as a function of pixel position.

By substituting Equation (1) into Equation (2), the temperature *T* of the analyzed pixel can be obtained from the pixel value *v*, the highest value recorded on the temperature scale, *y*<sub>1</sub>, and the lowest value, *y*<sub>2</sub>. Equation (3) displays the result:

$$T = \frac{1}{266} \left[ y_1 (1.064v - 4.256) - y_2 (1.064v - 270.256) \right]. \tag{3}$$
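
Equation (3) translates directly into code. The sketch below converts a pixel value to a temperature and checks the two end points of the scale in Figure 8 (*y*<sub>1</sub> = 35 °C, *y*<sub>2</sub> = 24 °C):

```python
def pixel_to_temperature(v, y1, y2):
    """Convert a pixel value v to temperature (°C) via Equation (3).

    y1: highest temperature printed on the image's scale bar;
    y2: lowest temperature printed on the scale bar.
    """
    return (y1 * (1.064 * v - 4.256) - y2 * (1.064 * v - 270.256)) / 266.0


# End points of the Figure 8 scale: the brightest pixel (v = 254) must map
# to y1 = 35 °C and the darkest (v = 4) to y2 = 24 °C.
assert abs(pixel_to_temperature(254, 35, 24) - 35.0) < 1e-9
assert abs(pixel_to_temperature(4, 35, 24) - 24.0) < 1e-9
```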

After obtaining the highest temperatures of each image ROI, the highest value among these temperatures represents the final temperature of the volunteer. Figure 12 shows images of 24 volunteers, and Table 2 lists the highest temperature recorded in each ROI for each person.

**Figure 12.** Images produced using the Testo-885 camera and analyzed by YOLO, with temperature estimated using Equation (3).


**Table 2.** Estimated volunteer temperatures.

Table 2 shows that the temperature variations among most volunteers are small (less than 1 °C). However, considering that human beings are homeothermic, this indicates that the surface temperature undergoes variations that the core body temperature does not. Furthermore, it confirms that facial surface temperature is predominantly lower than body temperature, as shown by the mean temperature values of the non-febrile volunteers.

For volunteers 6 and 22, only one ROI was visible in the image, so only one temperature could be obtained for the analysis. This illustrates both a limitation of the intelligent system and the achievement of its objective: these volunteers were not in a direct line of sight of the thermal imager that would allow more ROIs to be identified, yet the system still identified at least one ROI, enabling the analysis of the person's temperature.

Unfortunately, only four febrile volunteers were obtained in the image acquisition campaign; they had previously been diagnosed as feverish by a health team through body temperature checks. Their measured temperatures are identified as volunteers 21, 22, 23, and 24 in Table 2. Despite the small sample of febrile people, two observations stand out: the maximum facial temperatures of volunteers 21 and 24 (37.1 °C and 37.3 °C, respectively) did not exceed the usual fever thresholds for core temperature (37.5 °C or 38.0 °C), and the temperatures of different facial regions of volunteer 23 differed by 0.5 °C, which is significant for the diagnosis of fever. This supports the hypothesis that the febrile diagnostic criteria for core body temperature (37.5 °C or 38.0 °C) should not be applied to human facial temperature.

#### **4. Conclusions**

This study employed a transfer deep learning method to detect and recognize ROIs, including the face, forehead, eyes, and ears, in thermographic images using the YOLO object detector. After training a CNN from a dataset made available by other researchers, images of new volunteers obtained in the laboratory served as input to the CNN to evaluate the detection performance of the ROIs.

Tests verified that ROI detection was feasible even with the use of masks, caps, helmets, or with features hidden by hair.

As displayed in Figure 9, temperatures varied among the ROIs. The criterion of adopting the highest temperature within each ROI proved efficient, as areas not directly exposed to the camera, such as those covered by hair, are disregarded.

This study presents a simple system for obtaining temperature values directly from thermographic images without significant computational processing. These improvements in detecting the maximum and minimum temperatures of ROIs can provide better results for identifying febrile people.

As infrared thermography measures the surface (skin) temperature rather than the core temperature of the human body, future work will apply adequate criteria to analyze the febrility of individuals from the temperatures of the ROIs. This avoids using a single temperature threshold to indicate a feverish state for all regions of the human face. Screening people for fever through infrared thermography should apply a different, adequate threshold temperature for each facial region, typically lower than the core body temperature threshold (37.5 °C), given that facial skin is predominantly cooler than the body core. Additionally, expanding the dataset will improve the detection of ROIs and allow more reliable screening of febrile people.

Finally, other deep learning algorithms will be applied, evaluated, and compared to the results presented in this work.

**Author Contributions:** Conceptualization, R.B.N., J.S. and P.R.M.; data curation, J.R.d.S.; formal analysis, J.R.d.S., R.B.N. and J.S.; funding acquisition, M.A.d.S.L.C. and P.R.M.; investigation, J.R.d.S., G.M.d.A. and M.A.d.S.L.C.; methodology, J.R.d.S., G.M.d.A., M.A.d.S.L.C., H.L.M.C., R.B.N., J.S. and P.R.M.; project administration, P.R.M.; resources, M.A.d.S.L.C. and P.R.M.; software, J.R.d.S., G.M.d.A., M.A.d.S.L.C., R.B.N. and J.S.; supervision, G.M.d.A., M.A.d.S.L.C. and P.R.M.; validation, J.R.d.S., R.B.N. and J.S.; visualization, R.B.N.; writing–original draft, J.R.d.S.; writing–review & editing, R.B.N., J.S. and P.R.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by FAPES (Espírito Santo Research and Innovation Support Foundation), grant numbers 14/2019 (Master's scholarship), 03/2020 (Induced Demand Assessment–COVID-19 Project), and 04/2021 (Research Support). APC and text review were partially funded by the Federal Institute of Espírito Santo through the Institutional Scientific Diffusion Program (PRODIF).

**Institutional Review Board Statement:** The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of Federal Institute of Espírito Santo - Brazil (protocol code CAAE 33502120.2.0000.5072, approved on 29 July 2020).

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Acknowledgments:** The authors thank Masterplace Mall, Construtora Paulo Octávio, Sagrada Família Church in Jardim Camburi, and the Energy Laboratory of Ifes Campus Vitória for collaborating in the field research with volunteers. This work was also supported by the Federal Institute of Espírito Santo and the National Council for Scientific and Technological Development (CNPq).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

