1. Introduction
The automatic and remote monitoring of the environment, vital signs and human behaviour has increasingly wide applications in improving quality of life and the care of elderly, sick or disabled people [1,2,3,4,5,6]. These solutions can also increase the comfort of patients or the safety of people working in difficult conditions. Such systems can use different types of devices to observe and record parameters, phenomena and the surroundings. Those most commonly used are acceleration sensors, motion sensors, sensor networks, cameras and advanced vision systems using visible light cameras and thermovision. Typical threats to disabled or elderly people include falls, cuts, contusions and burns. Early detection of such situations can save the health or even the life of the injured person. Direct contact with temperatures dangerous to health or life can have particularly severe consequences. The process of treating burns is usually difficult and long lasting, so it is worth preventing this type of accident. Such situations occur constantly in everyday life, and people with various diseases or health problems may become their potential victims. Since burns can occur at a temperature of 70 °C within 1 s [
7], it seems reasonable to use a warning system that could protect people at risk. Given how quickly burns occur, this article proposes an autonomous mobile system for the early detection of danger and warning against burns. There are several devices on the market that can be used to build such a system (Figure 1). The proposed system uses the Flir One Pro [8] thermal imaging camera working with a smartphone. The mobile camera records a thermal image directly in front of the user and transmits this information to the mobile device, where image analysis, threat detection and warning take place. The Flir One Pro captures thermal images with a resolution of 320 × 240 (images are scaled up) and a maximum temperature range of up to 400 °C. The device is equipped with its own battery and a built-in visible light camera with a resolution of 1440 × 1080. The FLIR ONE cameras can be connected to Android and iOS devices via a micro USB or USB-C connector and can record images at eight frames per second.
A better solution is to use integrated devices in the form of goggles with a built-in IR camera, but in this case, it is important that the equipment used has sufficient computing power or is able to communicate with mobile devices, as is the case with the Flir One Pro camera. The thermal imaging camera integrated with the goggles presented in
Figure 1 (SATIR Thermal Vision 256) has an optical resolution of 256 × 192 and a temperature range of −20 °C to +550 °C. The set supports communication in the 4G standard, which makes it possible to expand its capabilities and use external image analysis algorithms. Currently, however, the device cannot be used directly in the proposed system.
To detect the risk of burns, images transmitted from a thermal imaging camera (in the form of a temperature matrix in degrees Celsius) showing the environment directly in front of the user are used. This makes it possible for the application to react directly, based on the temperature values in the image, which may additionally determine the level of threat. The thermal imaging camera, mounted on glasses or on the chest, captures the thermal image and transmits it to the mobile monitoring application, which analyses the image and determines the threat level. The algorithm implemented in it (discussed below) detects arms or hands in the camera's field of view and detects high temperatures (over 70 °C). The system analyzes thermal images autonomously using deep learning methods based on a CNN, which enables high effectiveness and additional classification of threats when the user's arms or hands appear in the monitored area. The use of CNNs should significantly increase the effectiveness of the system relative to similar solutions.
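The high-temperature stage described above reduces to a threshold test on the temperature matrix. A minimal NumPy sketch (the 70 °C threshold and the 240 × 320 frame shape follow the text; the function name and the synthetic frame are illustrative):

```python
import numpy as np

BURN_THRESHOLD_C = 70.0  # burns can occur within ~1 s at this temperature

def detect_hot_regions(frame_c, threshold_c=BURN_THRESHOLD_C):
    """Return (mask, danger flag, max temperature) for a thermal frame.

    frame_c: 2-D array of temperatures in degrees Celsius, e.g. the
    240 x 320 matrix delivered by the mobile thermal camera.
    """
    mask = frame_c >= threshold_c
    return mask, bool(mask.any()), float(frame_c.max())

# Synthetic 240 x 320 frame: room temperature with one hot spot (e.g. a pan).
frame = np.full((240, 320), 22.0)
frame[100:120, 150:170] = 95.0

mask, danger, t_max = detect_hot_regions(frame)
```

In the full system, a positive `danger` flag would trigger the warning block, while the mask feeds the subsequent hand-detection analysis.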
2. Related Works
Since the main task of the proposed system is to warn of hand burns, hand detection is its most critical component. Issues related to hand and arm detection are widely discussed in the literature across various applications. They very often concern gesture recognition [
10], detecting location, tracking hands, building human–robot communication interfaces, controlling devices and systems or manipulating objects in 3D space [
11,
12]. Typical solutions most often use independent image analysis methods (using RGB (conventional cameras), RGB-D (Kinect—Microsoft, Leap Motion—
https://leap2.ultraleap.com/products/leap-motion-controller-2/ (accessed on 3 September 2024)) or IR (Flir One,
https://www.flir.eu/products/flir-one-pro/ (accessed on 3 September 2024)) cameras) but also specialized equipment in the form of gloves communicating with a computer or robot. The appearance of depth sensors on the market has also resulted in 3D analysis methods [
11] allowing for the assessment of hand location in three dimensions. Currently, methods using deep learning and convolutional neural networks are becoming more and more common [
13].
The first group comprises methods based on the analysis of an image in the RGB or YCbCr palette. They use colour information for hand detection and localization, which is the basis for further interpretation. In article [
14], an image is first mapped to the CbCr colour space, and then edge detection is performed using morphological operations. As a result, the algorithm locates areas that meet the colour criteria and designates hand areas. The proposed algorithm obtained the following results: TPR = 94.6% and FPR = 2.8%. The authors did not present details of the studied image set, which makes it difficult to assess the universality of the method. Another example of a hand localization method in visible light is described in paper [
15]. Here, the authors proposed a fast, real-time classification method based on hand shape. The algorithm detects the hand area based on colour, determines the orientation of the hand in a vertical position and then uses shape context analysis, template matching, orientation histograms, Hausdorff distance analysis and Hu moments to determine shape coefficients and classify them. The set contained 499 images, and the method was used for gesture recognition.
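The colour-criterion idea behind methods such as [14] can be sketched as a Cb/Cr range test. The conversion below is the standard BT.601 full-range mapping; the skin bounds are common literature values, not the thresholds of the cited paper:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """HxWx3 uint8 RGB -> YCbCr (ITU-R BT.601, full range)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def skin_mask(rgb, cb_range=(77, 127), cr_range=(133, 173)):
    """Boolean mask of pixels whose Cb/Cr fall inside typical skin bounds."""
    ycbcr = rgb_to_ycbcr(rgb)
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))

# One skin-like pixel and one green background pixel:
sample = np.array([[[200, 140, 120], [0, 200, 0]]], dtype=np.uint8)
mask = skin_mask(sample)
```

The mask produced this way typically needs morphological cleanup (as in [14]) before the hand region can be delineated reliably.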
In recent years, with a significant drop in the prices of thermal imaging cameras and the rapid development of thermovision, the number of applications of this imaging method has been increasing in many areas. This opens up completely new possibilities and makes it possible to extend methods developed for visible light. Hand detection in thermal images makes it possible to eliminate problems typical of visible light images, e.g., weak or uneven lighting, the impact of skin colour or a complex coloured background. Exemplary solutions are based on a model of the hand area or on brightness information in the analyzed area (model-based or appearance-based approaches). The article [
16] described a hand segmentation method based on statistical texture features, including, among others, mean brightness, standard deviation, entropy, homogeneity and contrast. Image segmentation was performed using k-means cluster analysis, and the resulting regions were then analyzed using the above-mentioned texture features, which allowed for the classification of regions located in the hand area. The algorithm was used in the treatment of rheumatoid arthritis. In paper [
17], an adaptive hand segmentation algorithm was proposed. First, a Gaussian model representing the background was prepared, from which the approximate area of the hand was determined. In the next step, five areas were generated inside the hand area and temperature distribution models were created for them. After the image was analyzed by the five prepared models, the resulting masks were combined into a single mask. Effectiveness was determined as the ratio of the bounding box marked by the expert to the bounding box determined by the algorithm, and amounted to 86%.
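In the spirit of the texture-based approach of [16], per-pixel feature vectors can be clustered with k-means. The sketch below is purely illustrative: it uses only local mean and standard deviation plus raw temperature, and a minimal two-cluster k-means, not the cited authors' full feature set or implementation:

```python
import numpy as np

def local_stats(img, win=5):
    """Per-pixel local mean and standard deviation over a win x win window."""
    pad = win // 2
    p = np.pad(img.astype(np.float64), pad, mode="edge")
    w = np.lib.stride_tricks.sliding_window_view(p, (win, win))
    w = w.reshape(img.shape[0], img.shape[1], -1)
    return w.mean(-1), w.std(-1)

def kmeans2(features, iters=20):
    """Minimal two-cluster k-means, seeded with the coolest/warmest samples."""
    centers = features[[features[:, 0].argmin(), features[:, 0].argmax()]]
    for _ in range(iters):
        d = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for k in range(2):
            if (labels == k).any():
                centers[k] = features[labels == k].mean(0)
    return labels

def segment_hand(img):
    """Cluster per-pixel (temperature, local mean, local std) features;
    returns a mask where 1 marks the warmer, hand-like cluster."""
    mean, std = local_stats(img)
    feats = np.stack([img.astype(np.float64).ravel(),
                      mean.ravel(), std.ravel()], axis=1)
    labels = kmeans2(feats).reshape(img.shape)
    if img[labels == 0].mean() > img[labels == 1].mean():
        labels = 1 - labels
    return labels
```

On a synthetic frame with a ~33 °C region over a ~25 °C background, the warmer cluster coincides with the hand-like region; real thermal frames require the richer texture features used in [16].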
Another application of hand segmentation in thermal imaging is biometrics [
18]. The goal in this publication was to segment the hand and determine the vein system. The authors proposed several approaches based on thermovision and a combination of visible light and thermovision. In the case of thermal images, the active shape model method was used. In cases where segmentation for visible light images was combined with thermal images, masks established in visible light were used in thermal imaging to preselect the hand region. In this scenario, it was important to match the mask obtained from the visible image to the shape of the hand in the thermal image.
In previous studies, the authors proposed a hand detection method with an SVM classifier using the proposed geometric features and texture parameters in the hand area [
19]. A set of over 5100 images was prepared, containing the user's hands and arms in various situations and with objects at elevated temperatures. Hand detection accuracy reached Acc = 0.89, and high-temperature detection was correct for all images in the test set.
In the hand detection process, thermal images are also used in combination with visible light and depth image information [
20]. In the case of 3D methods, information about colour and depth (RGB-D) is used. The use of a thermal camera in this case increased efficiency by reducing the impact of variable external lighting on segmentation results. The authors annotated ground truth bounding boxes for RGB, depth and thermal images, and the Fast R-CNN object detector was used to analyze them. The algorithm was trained on a set of several thousand images (2000 RGB images, 1000 thermal images and 1000 depth images) recorded with two cameras and a depth sensor. The authors noted that RGB images had the greatest impact on detection efficiency, followed by depth images and then thermal images, but they did not report the efficiency values obtained by the proposed method. Another example is publication [
21], where deep neural networks were proposed for the detection and classification of gestures on sequences of combined colour, depth and stereo-IR images. The proposed 3D recurrent network was trained on the Sports-1M set containing video sequences of 487 types of sports activities. The algorithm did not directly detect hands, and the gesture classification efficiency reached 98.2%.
In publication [
22], the authors applied a deep convolutional network to recognize hand gestures in static images. It was noted that hand detection against a complex background is difficult and classic methods are not always effective; therefore, the proposed model functioned simultaneously as a detector and classifier. Relatively few images were used: 1600 training and 400 test images. Gesture classification efficiency reached approximately 94.7%.
Another method related to hand analysis was proposed in paper [
23] where the goal was to locate the hand and detect skin in visible light images. The performance of RCNN [
24] and Fast-RCNN [
25] combined with skin segmentation was compared on several image sets (over 13,200 images containing hands). Hand detection efficiency on various image sets reached maximum values of 96–97%. As a direction for further improvement, the authors indicated increasing robustness to poor lighting, shadows and image blur, because on other image sets the effectiveness dropped to as low as 30–40%. In publication [
26], the aim was to detect the driver’s hands while using a mobile phone or when they were placed on the steering wheel. The authors proposed a modified version of Fast-RCNN for hand, smartphone and steering wheel detection. Then, using geometric relationships, the system determined whether the driver had his hands on the steering wheel or was using a smartphone. The effectiveness of smartphone detection reached 94%, and the detection of hands on the steering wheel was effective in 93% of cases.
Another example of a hand detection algorithm combined with position determination using CNN was presented in publication [
27]. The authors suggested that methods of detecting hands and their position and orientation also help computers understand human intentions and provide guidance for more complex tasks. The proposed CNN model was tested on selected sets (e.g., the Oxford Hand Dataset—13,050 hand images) and achieved a sensitivity of 99–100% at the stage of generating hand region proposals for position analysis.
In publication [
28], the authors tested many detectors and, as a result, proposed a hand recognition method using the Yolov7 and Yolov7x models. The Oxford Hand Dataset was again used for testing. The following results were achieved: 84.7% precision and 79.9% recall.
In light of the research and examples described above, a hand detection method based on deep learning and convolutional neural networks is proposed. A set of over 21,000 thermal images and a model based on a 15-layer convolutional network were prepared. The network was optimized, and a set of hyperparameters is proposed to ensure high training efficiency and speed. The system presented here fully and automatically analyzes images obtained from a mobile thermal imaging camera and detects hot objects as well as the risk of burning hands. Its task is to increase the safety of visually impaired or blind people during their everyday activities. The proposed system is an innovative application of thermal imaging, as there are no similar solutions on the market combining mobile technologies and thermal image analysis using a CNN. The main reason for choosing a CNN was its well-confirmed high effectiveness in image classification, object detection and semantic segmentation, which results from the image analysis mechanism used (based on the hierarchical arrangement of convolutional layers and the use of receptive fields) and effective, automatic feature extraction [
29,
30]. The use of a CNN was made possible by preparing a set of over 21,000 images.
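As a schematic illustration only, a small convolutional classifier for "hands present / absent" on single-channel thermal frames might look as follows in PyTorch; the layer counts and filter sizes below are placeholders, not the tuned 15-layer architecture described in the paper:

```python
import torch
import torch.nn as nn

class HandNet(nn.Module):
    """Illustrative small CNN for 'hands present / absent' classification of
    single-channel thermal frames; layer sizes are placeholders, not the
    paper's tuned 15-layer architecture."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pooling
        )
        self.classifier = nn.Linear(64, 2)  # two classes: hands / no hands

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = HandNet()
logits = model(torch.zeros(1, 1, 240, 320))  # one 240 x 320 thermal frame
```

The stacked convolution/pooling layers realize the hierarchical receptive-field mechanism mentioned above; global average pooling keeps the classifier head independent of the exact input resolution.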
4. Experiments and Results
The results of hand detection and threat level classification obtained using the developed algorithm are presented below. Hot object detection is easy to accomplish using classic image segmentation methods (e.g., thresholding), so there are no requirements regarding the number of training images. Detection of the risk of burns is based on the CNN, and in this case the size of the image set affects the results. Depending on the number of images in the examined set, the algorithm achieved different burn-risk detection effectiveness: for 10,590 images (50% of the entire set), accuracy reached 0.975; for 15,885 images (75%), Acc = 0.986; and for the full set of 21,180 images, 0.995. The algorithm reached Acc above 0.9 for approximately 5295 images (25% of the entire set). The first block of the algorithm detects hot objects with temperatures that pose a threat to health with 100% efficiency. The threshold value of 70 degrees Celsius makes it possible to detect objects that can cause burns quickly. The algorithm block responsible for burn detection requires a more complex image analysis. Using the proposed CNN-based model, the system can determine with high efficiency whether hands or arms are present in the analyzed image. At the beginning of the study, several different network configurations and structures were tested and compared. To evaluate the model, four typical measures were used: accuracy, precision, specificity and sensitivity, based on the confusion matrix recording the numbers of correct and incorrect hand detections. The set was divided into training and test sets in two configurations (similar to [
19,
21]): 75/25 and 90/10, and the impact on the learning process was verified.
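The four measures follow directly from the confusion-matrix counts; the counts in the example below are illustrative, not the paper's results:

```python
def metrics(tp, fp, tn, fn):
    """Accuracy, precision, specificity and sensitivity from
    confusion-matrix counts (TP, FP, TN, FN)."""
    acc = (tp + tn) / (tp + fp + tn + fn)   # overall correctness
    prec = tp / (tp + fp)                   # correct positive detections
    spec = tn / (tn + fp)                   # correct rejections
    sens = tp / (tp + fn)                   # detected among actual positives
    return acc, prec, spec, sens

# Illustrative counts only (not the paper's confusion matrix):
acc, prec, spec, sens = metrics(tp=90, fp=10, tn=80, fn=20)
```

Sensitivity is the most safety-relevant of the four here, since a missed hand (FN) means a missed burn warning.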
As can be seen in
Table 1, the algorithm achieves very good results for various set configurations. Further research and results concern a set divided in a 90/10 ratio (19,062 training images and 2118 test images). Some selected cases of correct and incorrect hand detection are presented below.
Figure 8 shows cases where the algorithm correctly responded to the presence of hands (TP) and the presence of hot objects at the same time. In some cases, even images with a small portion of the hand are classified correctly.
Figure 9 shows examples of images where the algorithm detected hands even though, according to expert opinion, they do not actually appear in the image (FP). This situation results from areas of warm air (with a temperature and shape similar to that of a hand) surrounded by hot objects. The next images in
Figure 10 are examples where the algorithm correctly reported no hands (TN), because none appear in the images. In such a situation, the algorithm should only warn about the existence of a hot object, but the level and intensity of the warning may be lower.
Figure 11 shows false negative cases (FN), in which the algorithm did not detect hands even though they are actually present in the image. These constitute less than 0.2% of cases, which is very important because this is a key element of the algorithm, whose operation affects the safety of the system user. The lack of hand detection may be caused by too small a part of the hand being visible in the image (just entering the camera's field of view) or by interference in the hand area resulting from the camera operation. Bearing in mind that the algorithm analyses subsequent frames continuously, the hand will be detected when most of it appears in the image. Even if the hand is not detected, the high-temperature detection block will inform the user about the hot object, which should increase their alertness.
Several examples were selected from available publications in which the problem of hand detection in visible light or thermal images appears. Since in most cases the authors did not present comprehensive hand detection results (or hand detection was only part of the analysis process), the comparison includes those solutions and results that allow for at least a partial comparison of effectiveness. Results obtained in visible light, RGB-D and IR, using classic algorithms and deep learning, are summarized against the method proposed here. For the reference solutions, the results obtained by their authors on their own datasets are presented; the proposed CNN models were tested on the image set discussed here. Additional information, such as the number of images used by the authors, is also included. For the method discussed here, averaged results obtained during testing with 10-fold cross-validation are presented (
Table 2).
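The 10-fold cross-validation protocol used to average these results can be sketched as follows; the `evaluate` callback is a placeholder standing in for training the CNN on the training folds and measuring accuracy on the held-out fold:

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Split n sample indices into k shuffled, roughly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def cross_validate(n_samples, evaluate, k=10):
    """Average a per-fold score over k folds; each fold serves once as the test set."""
    folds = kfold_indices(n_samples, k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(evaluate(train_idx, test_idx))
    return float(np.mean(scores))
```

Every image is used for testing exactly once across the ten runs, so the averaged score is less sensitive to a particular train/test split than a single 90/10 division.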
6. Conclusions
In summary, the main contributions of this research are the proposal of a deep learning-assisted automated method for burn detection, an analysis of the impact of several hyperparameters on the model, the proposal of the best-tuned model and a comparison with state-of-the-art methods. The high efficiency of the described method proves that it can be used as an element of a thermal threat warning system. Detection of a dangerous temperature is achieved with 100% accuracy. Hand and burn risk detection reaches Acc = 99.5% and Prec = 99.5%. Compared to previous studies and classic methods, effectiveness has improved significantly, by about 8–9%. Compared to current methods using CNNs, the effectiveness is comparable or better (
Table 2). The use of deep learning and convolutional neural networks and optimization of the network structure also allowed for faster image analysis than in the case of the compared methods. The most important features of the proposed method are an independent and fully automatic hand detection block, high effectiveness and speed of action and classification of various threat levels. Given that the system will analyze the environment continuously, it can be assumed that the user will be warned of a hot object and burn early enough, but a situation of direct risk of burns may only occur when a user does not react to previous warnings. The solution should be considered as a real-time system [
32,
33] with hard real-time constraints, because the image analysis results should be delivered simultaneously with the recording of subsequent image frames, and exceeding the time limit may pose a health risk. A hot object is detected from a distance of over 4 m, so the user is informed about a potential threat in advance, which increases their vigilance. When recording at 8 frames per second, the system reacts to the appearance of a hand and a hot object in the frame within 0.125–0.25 s (the first or second frame). If the user is about 0.5 m away from the hot object, touching the object may take about 1 s, so the user has about 0.5 s to react. If the user holds out their hand while the hot object is further away, the reaction time is longer. The method may have limitations in very small spaces, where an object may suddenly appear in the field of view when the user turns towards a hot object that was previously not visible to the camera.
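The timing figures above follow directly from the frame rate; a quick arithmetic check (the 1 s time-to-touch is the text's estimate, and the computed margin ignores warning-delivery latency, which the text's 0.5 s reaction figure additionally budgets for):

```python
FRAME_RATE_HZ = 8                        # Flir One Pro frame rate
frame_period_s = 1 / FRAME_RATE_HZ       # 0.125 s per frame

# The hand/hot-object pair is detected in the first or second frame:
detection_s = (1 * frame_period_s, 2 * frame_period_s)

# Text's estimate: reaching a hot object ~0.5 m away takes about 1 s.
time_to_touch_s = 1.0
margin_s = time_to_touch_s - detection_s[1]  # time left before warning latency
```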
When testing the developed model on a desktop computer with a GPU, the average classification time for the 2118 test images was approximately 0.74 s, using an Nvidia RTX 3060 graphics card and CUDA library version 12.5. It can therefore be assumed that porting the model to a mobile platform should not prevent its use.
The burn risk detection efficiency (hand and hot object) reached 99.7%. Various neural network configurations and structures, and various training and test set configurations, were tested in the hand detection process on over 21,000 images. The results prove that it is possible to automatically and quickly analyze the user's environment and protect them against burns using the proposed deep learning methods and device sets. With further development and miniaturization of vision equipment, the proposed method can be used more comfortably by the user. Ultimately, the proposed CNN model will be implemented on Android mobile devices. For this purpose, the TensorFlow library can be used, which supports Android and allows for building and exporting trained CNN networks [
34,
35]. The capabilities and hardware reserves offered by newer devices (fast multi-core CPUs with frequencies above 3 GHz, GPUs clocked at over 1.4 GHz and specialized AI systems) and large memory capacities (up to a dozen GB) will allow for further expansion, as well as the use of faster mobile thermal imaging cameras when they appear on the market. Hardware support for machine learning is increasingly offered in mobile devices (for example, the Dimensity 9300, Snapdragon 8 Gen 3 and Dimensity 8300 [
36] processors contain specialized AI systems) and allows for the use of increasingly advanced solutions related to image processing and analysis. The proposed CNN system and architecture contribute to the development of thermal image analysis by showing new possibilities offered by available mobile devices. Applications provided by camera manufacturers most often only enable temperature measurement at a point or in a given area. The addition of artificial intelligence algorithms can facilitate reasoning, the observation of phenomena or processes, and the automation of performed tasks, similarly to vision systems operating in visible light. Thermal images are used in vision systems less frequently than visible light images, but the results of analyzing both types of images can also be combined, which most often improves the final effectiveness of these systems.