**1. Introduction**

Each year, many civilian injuries and deaths, as well as extensive property damage, are caused by intentionally set and naturally occurring fires. Fires are classified into residential structure fires (home structures, including one- and two-family homes, manufactured homes, and apartments), non-residential structure fires (public assembly, school and college, store and office, industrial facilities, and other structures), and outdoor fires (bush, grass, forest, rubbish, and vehicle fires) [1]. Research on automatic fire detection and monitoring has long focused on residential and non-residential structure fires in order to protect people and property.

Smoke is an important cue because it often indicates the start of a fire; however, flames sometimes appear first, so both smoke and flames require early detection if a fire is to be extinguished quickly. Many methods of detecting smoke and flames for early fire suppression have been studied. To reduce fire damage, early fire detection systems commonly use heat sensors, smoke sensors, and flame sensors that detect flames in the infrared and ultraviolet spectra [2,3]. Sensors used in buildings, factories, and other interior spaces detect particles produced by flames and smoke through an ionization-based chemical reaction, which requires the particles to reach the sensor. Traditional sensor-based fire alarm systems therefore perform well only in close proximity to the fire or in very confined spaces [4,5]. However, such systems are expensive because many devices must be installed to achieve fast detection. Heat sensors respond slowly because they rely on the temperature difference from the surroundings, and smoke sensors may respond late or not at all depending on smoke speed and air flow. In addition, sensor-based systems cannot provide users with information about the location or size of a fire, and they are difficult to install outdoors. As mentioned earlier, fires can occur anywhere at any time and must be detectable in a wide variety of locations.

To overcome the shortcomings of sensor-based detection systems, many methods of detecting smoke and fire using camera (image-based) sensors have been studied [6,7]. Compared to sensor-based fire detectors, video fire detectors offer many advantages, such as fast response, long-range detection, and coverage of large protected areas. However, most recent video fire detection methods still suffer from a high rate of false alarms [8].

Vision-based fire detection can be divided into short-range and long-range detection. Long-range systems monitor smoke and fire on distant mountains or fields, typically using fixed CCD (Charge-Coupled Device) cameras for forest and wildfire surveillance [9–11]. In addition, Zhao et al. [12] described deep-learning-based wildfire identification using unmanned aerial vehicles; to extract local extremal regions of smoke, they used a rapidly growing Maximally Stable Extremal Region (MSER) detection method for initial smoke region detection.

More research has been conducted on short-range fire and smoke detection than on long-range forest or wildfire detection. Early camera-based systems detected fires in tunnels and mountains using black-and-white images [13,14]; these early feature extraction methods detected flames by measuring histogram changes, exploiting the temporal characteristics of flames in black-and-white images. Over the last 20 years, image-based flame detection methods using motion, color, shape, texture, and frequency analysis have been studied [15–21].

Conventional flame detection methods include methods using RGB (Red, Green, Blue), HSV (Hue, Saturation, Value), and YCbCr color models; wavelet transforms applied after detecting moving regions and flame-colored pixels; analysis of flame intensity changes over time; analysis of the contour shape of flames in the HSV color model and the time-space domain; and methods using infrared images.

Color-based fire detection algorithms flag cases where the flame's color level exceeds a certain threshold in the brightness information of a color space such as RGB, YCbCr, HSI (Hue, Saturation, Intensity), or CIEL\*a\*b\* (CIELAB, Commission Internationale de l'Éclairage) [22–26]. Algorithms using spatial-domain analysis distinguish flame-colored from non-flame-colored regions; some determine fire by analyzing the texture or the frequency components of the flame candidate area [15,26,27]. Algorithms using frequency analysis in the time domain determine fire by analyzing the frequency of a specific level value of the flame candidate region as it changes over time [22,23,25,26].
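As an illustration of such color-rule tests, a flame-pixel classifier can be sketched as follows. The rules and threshold values here are illustrative assumptions for a minimal sketch, not the exact rules of any cited method:

```python
import numpy as np

def flame_pixel_mask(rgb, r_thresh=190, s_thresh=0.25):
    """Classify pixels as flame-colored using simple RGB/saturation rules.

    Illustrative rules: R > r_thresh, R >= G >= B, and saturation above
    a level that decreases as R grows, so bright reddish pixels pass.
    """
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    mx = rgb.max(axis=-1).astype(float)
    mn = rgb.min(axis=-1).astype(float)
    sat = np.where(mx > 0, (mx - mn) / mx, 0.0)  # HSV-style saturation
    rule1 = r > r_thresh                         # bright red channel
    rule2 = (r >= g) & (g >= b)                  # flame color ordering
    rule3 = sat >= (255.0 - r) * s_thresh / r_thresh
    return rule1 & rule2 & rule3
```

In practice such a mask only produces flame *candidate* pixels; the spatial- and time-domain analyses described above are then applied to the candidate regions.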

Chen et al. [27] studied a rule-based fire detection system using RGB and HSI color models, exploiting the fact that flames spread in irregular shapes when a fire occurs. Toreyin et al. [28] proposed a system that distinguishes fire from non-fire using temporal and spatial wavelet analysis of high-frequency components of input images, based on the fact that smoke appears translucent in the early stages of a fire. Yuan [7] proposed an algorithm that quickly estimates the motion orientation of smoke together with an accumulative motion model based on the integral image; it builds an orientation histogram of motion vectors, exploiting the upward movement of smoke, and declares smoke when many motion vectors point predominantly upward. Yuan [29] proposed a neural-network-based smoke detection algorithm trained on feature vectors generated from LBP (Local Binary Pattern) and LBPV (Local Binary Pattern Variance) histograms, which are robust to rotation and lighting, computed over multi-scale pyramid images. Celik and Demirel [30] presented experimental results using the YCbCr color space and proposed a pixel classification algorithm for flames; to this end, they suggested an innovative algorithm that separates the chrominance from the luminance components. However, this method relied on heuristic membership functions and did not generalize well to new data. Fujiwara [31] proposed a smoke detection algorithm that characterizes smoke shapes using fractal encoding, exploiting the self-similarity of smoke in grayscale images. Liu and Ahuja [16] detected the fire region with a region-growing method seeded from the high-brightness initial fire region; they asserted that fire and non-fire zones can be separated by the change of Fourier coefficients over time. Philips [32] classified the fire region using its change of state over time, after the flame candidate region was detected by a color histogram adapted with a Gaussian filter. Tian et al. [33] detected smoke regions by image separation: after a background model was created, smoke was detected by its gray color and partial transparency. A limitation of vision-based methods is that they fail to detect transparent smoke. Moreover, they often mistake natural objects for flame and smoke, for example, the sun, artificial lights, light reflected from various surfaces, and dust particles. Additionally, scene complexity and low video quality can affect the robustness of vision-based flame detection algorithms, increasing the false alarm rate. Barmpoutis et al. [34,35] likewise asserted that high false alarm rates are caused by natural objects with characteristics similar to flame and by the variation of flame appearance; other claimed causes include environmental changes that complicate fire detection, such as clouds, movement of rigid objects in the scene, and sun and light reflections. Hence, the difficulty of detecting fire flames in digital images stems from the chaotic and complex nature of fire. Lee et al. [36] proposed a smoke detection algorithm based on the Histogram of Oriented Gradients (HOG) and LBP; AdaBoost, which constructs a strong classifier as a linear combination of weak classifiers, was used to classify the trained object.
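The LBP texture feature used in several of the methods above can be sketched in a few lines. This is a minimal 8-neighbour, single-scale variant; published smoke detectors typically use rotation-invariant and multi-scale extensions:

```python
import numpy as np

def lbp_histogram(gray):
    """Compute a basic 8-neighbour LBP code for each interior pixel of a
    grayscale image and return the normalised 256-bin histogram used as
    a texture feature vector."""
    g = gray.astype(int)
    c = g[1:-1, 1:-1]  # centre pixels (borders skipped)
    # neighbour offsets in clockwise order starting at top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        codes |= ((nb >= c).astype(int) << bit)  # 1 if neighbour >= centre
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()
```

Histograms of this kind, computed per image block, are the feature vectors that classifiers such as AdaBoost or a neural network are trained on.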

In contrast, deep-learning-based fire flame and smoke detection systems extract features automatically, making the process much more reliable and efficient than conventional hand-crafted feature extraction. However, deep learning requires tremendous computational power, not only during training, but also when deploying trained models to hardware for specific tasks. For fire detection with security surveillance cameras, techniques combining real-time image analysis and deep learning have been proposed.

Recently, several deep learning algorithms for fire flame and smoke detection have been proposed. Frizzi et al. [37] researched Convolutional Neural Network (CNN) based smoke and flame detection; Sang [38] studied the classification of smoke and flame image features using a convolutional neural network; Wu et al. [39] studied the detection of fire and smoke regions by extracting dynamic and static features using the ViBe algorithm; Shen et al. [40] detected fire flames using the YOLO (You Only Look Once) model; and Khan et al. [41] researched a disaster management system for early fire detection and automatic reaction in indoor and outdoor environments using CNNs. Zhang et al. [42] researched forest fire detection using fire-patch detection with two joined deep CNNs applied to forest images. However, these models have many parameters, which require a large computing space; thus, they are unsuitable for on-field fire detection applications using low-cost, low-performance hardware. Muhammad et al. [43] fine-tuned various CNN variants: AlexNet [41], SqueezeNet [43], GoogLeNet [44], and MobileNetV2 [45], using Foggia's dataset [46] as the major portion of their training data. Although Foggia's dataset includes 14 fire and 17 non-fire videos with multiple frames, it contains many similar images, which restricts the performance of models trained on it to a very narrow range of images. Recently, much research has been conducted on Faster R-CNN, which shows higher performance than other network models such as R-CNN (Region-based Convolutional Neural Network) and Fast R-CNN. Barmpoutis et al. [39] studied multidimensional texture analysis based on higher-order linear dynamical systems together with deep networks; they classified fire using the Faster R-CNN model combined with spatial analysis on the Grassmann manifold. A wildland forest fire and smoke detection algorithm based on Faster R-CNN was suggested by Zhang et al. [47] to avoid complex hand-crafted processing.

As mentioned above, malfunctions of image-based smoke and flame detection have been drastically reduced by the development of deep learning, but malfunctions still exist due to the limitations of deep learning itself. Most existing approaches aim to detect either smoke or fire in images, but as explained, they suffer from a variety of limitations. To address these limitations, this paper proposes a Faster R-CNN model augmented with object attributes to increase smoke and fire flame detection rates and decrease the false positive rate. The proposed method can detect both smoke and fire flames in the same image, and it offers many advantages and better performance than existing visual recognition CNN models for recognizing fire flames and smoke in images. Additionally, we developed a novel algorithm that is robust to abrupt changes in the natural environment, reducing false positive smoke detections, as shown in Figure 1.

**Figure 1.** Flowchart of the proposed algorithm.
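The final acceptance step combining detector confidence with additional cues might look as follows. This is a purely hypothetical sketch; the weights, the particular features, and the threshold are assumptions for illustration, not the definition used in this paper:

```python
def final_decision(cnn_score, flame_color_ratio, motion_ratio,
                   weights=(0.6, 0.25, 0.15), th=0.5):
    """Combine detector confidence with hand-crafted cues measured
    inside the detected bounding box into a single score FD, and
    report fire only if FD exceeds the threshold TH.

    All feature definitions and weights here are hypothetical.
    """
    fd = (weights[0] * cnn_score            # network confidence
          + weights[1] * flame_color_ratio  # fraction of flame-colored pixels
          + weights[2] * motion_ratio)      # fraction of moving pixels
    return fd > th, fd
```

The point of such a weighted combination is that a confident network output alone does not trigger an alarm unless the box also exhibits flame-like color and motion, which is one way to suppress false positives from static flame-colored objects.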

This paper is organized as follows. Section 2 proposes the deep learning model architecture for flame and smoke detection with surveillance cameras. Section 3 explains several techniques to reduce the false alarm rate and improve the detection rate. Section 4 presents our experimental results and discussion. Finally, Section 5 gives a brief conclusion and future research directions.

### **2. Deep Learning (Faster R-CNN)**

It is often more difficult to localize objects within an image than to classify whole images. Deep learning with the R-CNN method takes several steps. First, R-CNN creates region proposals (bounding boxes) for areas where an object may exist and resizes each extracted bounding box to a uniform size to use as input to a CNN. Next, the model uses an SVM (Support Vector Machine) to classify each selected region. Finally, it uses a linear regression model to refine the coordinates of the bounding box of the categorized object; training is thus divided into three parts. Figure 2 depicts the full flow of the proposed system. In Figure 2, an RPN (Region Proposal Network) is used to find a predefined number of regions (bounding boxes) that may contain objects, using features computed by the CNN. The next step is to obtain a list of candidate objects and their locations in the original image. We apply region of interest pooling (ROIP) to the CNN features using the bounding boxes of the candidate objects, extracting the features corresponding to each object as a new fixed-size tensor. Finally, this information is used in the R-CNN module to classify the contents of each bounding box and to adjust its coordinates. As a result, Faster R-CNN displays the bounding boxes of detected objects on the screen. The proposed algorithm is added at the end of the Faster R-CNN pipeline: we finally accept a detection when FD (Final Decision), computed from several features in the bounding box, is greater than a threshold (TH).
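The ROIP step described above can be sketched as follows: each proposal box on the feature map is divided into a fixed grid of cells and each cell is max-pooled, so that every proposal yields a tensor of the same size regardless of the box's dimensions. This is a simplified sketch; real implementations handle sub-pixel bin boundaries (e.g. by interpolation, as in ROI Align):

```python
import numpy as np

def roi_max_pool(feature_map, box, out_size=2):
    """Max-pool one region of an (H, W, C) feature map into an
    (out_size, out_size, C) tensor, as in ROI pooling.

    box = (y0, x0, y1, x1) in feature-map coordinates, end-exclusive.
    """
    y0, x0, y1, x1 = box
    region = feature_map[y0:y1, x0:x1, :]
    h, w, c = region.shape
    # integer bin edges partitioning the region into out_size x out_size cells
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    out = np.empty((out_size, out_size, c))
    for i in range(out_size):
        for j in range(out_size):
            cell = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1], :]
            out[i, j] = cell.max(axis=(0, 1))  # max over the cell
    return out
```

Because every proposal is reduced to the same fixed-size tensor, the subsequent classification and box-regression layers can use ordinary fully connected layers.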

**Figure 2.** Faster R-CNN system flow.
