**1. Introduction**

Wildfires cause significant harm to humans and damage to private and public property; they pose a constant threat to public safety. More than 200,000 wildfires occur globally every year, burning an area of 3.5–4.5 million km<sup>2</sup> [1]. Moreover, climate change is gradually intensifying the effects of these wildfires; there is thus considerable interest in wildfire management [2–4]. As wildfires are difficult to control once they spread beyond a certain area, early detection is the most important factor in minimizing wildfire damage. Traditionally, wildfires were detected primarily by human observers, but a deep-learning-based automatic wildfire detection system fed by real-time surveillance cameras offers constant and accurate monitoring that human observers cannot provide. The available methods for the early detection of wildfires can be categorized as sensor-based technologies and camera-based image-processing technologies. Sensors that detect changes in smoke, pressure, humidity, and temperature are widely used for fire detection. However, this approach has several disadvantages, such as high initial cost and high false-alarm rates, as sensor performance is significantly affected by the surrounding environment [5–7].

With the rapid development of digital cameras and image-processing technologies, traditional methods have been replaced by video- and image-based methods [8]. Using these methods, a large forest area can be monitored, and fire and smoke can be detected immediately after the outbreak of a wildfire. In addition, owing to intelligent image-analysis technology, image-based methods can address the inflexibility of sensing technology in new environments [9]. Early approaches include support vector machines (SVMs) [10,11] for classifying wildfire images and fuzzy c-means clustering [12] for identifying potential fire regions. More recently, convolutional neural networks (CNNs), which provide excellent image classification and object detection by extracting features and patterns from images, have made many contributions to the wildfire-detection field [13–16]. The CNN is one of the most popular neural network architectures and has been successfully applied in many research and industry applications, such as computer vision and image processing [17,18]. Such networks have been developed and successfully applied to many challenging image-classification problems, improving model performance [19,20]. Muhammad et al. [21] modified the GoogleNet architecture for fire detection to increase accuracy and proposed a framework for fire detection in closed-circuit television surveillance systems. Jung et al. [22] developed a decision support system architecture for wildfire management and evaluated CNN-based fire-detection technology on the Fire dataset. As noted by Jain et al. in their review of machine-learning applications in wildfire detection [23], Zhang et al. found that CNNs outperform SVM-based methods [24], and Cao et al. reported a 97.8% accuracy rate for smoke detection using convolutional layers [25].
Recently, advances in mobile communication technology have made it possible to use unmanned aerial vehicles (UAVs), which are more flexible than fixed fire-monitoring towers; images obtained from UAVs are used to train fire-detection models [26,27].

Despite the contributions of these successful studies, some issues still need to be resolved before this technology can be applied in the field. Mountain-image data are easy to obtain, owing to the availability of various established datasets. However, not only is there a dearth of fire or smoke images of wildfires in datasets, but such data are also relatively difficult to obtain because they require surveillance cameras or drones operating at the site of the wildfire [28,29]. Therefore, research on damage detection frequently faces a data-imbalance problem, which causes overfitting; overfitting in turn degrades model performance [30]. To address this data-imbalance problem, recent studies have generated synthetic images to expand the fire/smoke dataset [24,31]. In early studies, the data were augmented using artificially generated indoor smoke and flames, or artificial images composed by cutting and pasting flame images onto backgrounds. However, this requires considerable manpower, and it is difficult to emulate the characteristics of wildfire images using indoor images. Generative adversarial networks (GANs) [32] are models that create new images using two networks, a generator and a discriminator. The generator creates data similar to the training set, and the discriminator distinguishes between real data and the fake data created by the generator. Data-augmentation methods such as image rotation and cropping can also expand the training dataset; however, GANs can increase dataset diversity as well as the amount of data. They have recently produced impressive photorealistic images [33–36]. GANs have been shown to improve classifier performance, mainly in areas where damage data are difficult to obtain [37–39]. However, there are relatively few related studies in the field of wildfire detection. Namozov et al. used GANs to give original fire photographs winter and evening backgrounds, adding variety in season and time of day [28]. However, it is difficult to produce varied fire scenarios in varied places, as the resultant image retains not only the background of the original photo but also the shape of the flame and smoke. To apply an early wildfire-detection model in the field, it is necessary to train on various types of wildfire images with new backgrounds, such that wildfire detection can be performed reliably even in a new environment.

With the development of CNN models and the deepening of neural networks, problems such as vanishing gradients arise, causing overfitting and deterioration of model performance. An algorithm built on the recent DenseNet architecture [40] can address this issue. DenseNet improves model performance by concatenating the feature maps of all previous layers into the input of the next layer, maximizing the information flow between layers.
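The dense connectivity described above can be sketched in a few lines. This is a minimal numpy illustration, not the authors' implementation: the random 1 × 1 projection is a hypothetical stand-in for DenseNet's conv-BN-ReLU composite layer, used only to show how each layer consumes the concatenation of all previous feature maps and how the channel count grows.

```python
import numpy as np

def dense_block(x, num_layers, growth_rate, rng):
    """Sketch of a DenseNet dense block: each layer receives the
    concatenation of all previous feature maps along the channel axis.
    A random 1x1 projection stands in for the conv-BN-ReLU layer."""
    features = [x]  # list of (channels, H, W) feature maps
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=0)  # concat on channel axis
        w = rng.standard_normal((growth_rate, inp.shape[0])) * 0.01
        # 1x1 "convolution": mix channels at each pixel, then ReLU
        out = np.maximum(0.0, np.einsum('oc,chw->ohw', w, inp))
        features.append(out)  # the new map feeds every later layer
    return np.concatenate(features, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))  # 16 input channels
y = dense_block(x, num_layers=4, growth_rate=12, rng=rng)
print(y.shape)  # channels grow to 16 + 4 * 12 = 64
```

The key point is the `np.concatenate` call: unlike ResNet's additive skip connections, every earlier feature map remains directly visible to every later layer.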

Inspired by recent works, we generated synthetic wildfire images using GANs to transform images of fire-free mountains into images of mountains with wildfires. A k-fold (k = 5) cross-validation scheme was applied to the models, and two training sets were prepared: train set A, consisting only of the original images, and train set B, consisting of the original and generated images. Each dataset was divided into training and test data and used to train a model developed based on DenseNet; this facilitated comparison with two pre-trained models, VGG-16 [19] and ResNet-50 [20]. This paper is organized as follows. Section 2 describes the architecture of cycle-consistent adversarial networks (CycleGANs) [41], one of the main GAN algorithms used for data augmentation, and DenseNet [40], which is used for wildfire-image classification (wildfire detection). The experimental results obtained using both models and the classification performance comparison with the pre-trained models are presented in Section 3. Section 4 presents the conclusions of this study.
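The k-fold splitting scheme can be sketched as follows. The fold count (k = 5) and the wildfire image count (1395) come from this paper; the random seed and the manual index-splitting (in place of a library utility such as scikit-learn's `KFold`) are illustrative assumptions.

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=42):
    """Partition sample indices into k folds; each fold serves once as
    the test set while the remaining k-1 folds form the training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test

# e.g. splitting the 1395 original wildfire images into k = 5 folds
splits = list(kfold_indices(1395, k=5))
print(len(splits))                            # 5 train/test pairs
print(len(splits[0][0]), len(splits[0][1]))   # 1116 train, 279 test
```

Every image appears in exactly one test fold, so the five test-set evaluations together cover the whole dataset without leakage between train and test.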

#### **2. Materials and Methods**

#### *2.1. Data Collection*


The wildfire and non-fire images used for training the GAN model and the CNN classification models were collected. The mountain datasets were obtained from the eight-scene-categories database [42] and a Korean tourist-spot database [43]. However, no open data benchmark is available for fire or smoke images of wildfires [28]. These images were therefore obtained solely through web crawling; this limitation resulted in a data imbalance. Because the early fire-detection model is intended for application in drones and surveillance cameras for monitoring, both categories of the dataset were crawled from images or videos obtained using drones. Samples of the dataset are presented in Figure 1. A total of 4959 non-wildfire images and 1395 wildfire images constituted our original dataset and were resized to 224 × 224 for the network input.
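The resizing step can be sketched as follows. This is a minimal nearest-neighbour implementation in numpy; the 480 × 640 input size is an illustrative assumption, and an actual pipeline would typically use a library resize with interpolation (e.g. PIL or OpenCV).

```python
import numpy as np

def resize_nearest(img, out_h=224, out_w=224):
    """Minimal nearest-neighbour resize to the 224x224 network input.
    Each output pixel copies the nearest source pixel."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows][:, cols]

img = np.zeros((480, 640, 3), dtype=np.uint8)  # dummy crawled frame
small = resize_nearest(img)
print(small.shape)  # (224, 224, 3)
```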

**Figure 1.** Sample mountain and wildfire images from conducted data collection. (**a**) Mountain images from eight scene categories database. (**b**) Mountain images from Korean tourist spot database. (**c**) Drone-captured mountain images obtained via web image crawling. (**d**) Drone-captured wildfire images obtained via web image crawling. (**e**) Drone-captured wildfire images obtained via web video crawling.


#### *2.2. CycleGAN Image-to-Image Translation*

To generate wildfire images, CycleGAN [41] was used; it performs image-to-image translation from a reference image domain (X) to a target image domain (Y) without relying on paired images. As illustrated in Figure 2, CycleGAN uses two loss functions: the adversarial loss [33] and the cycle-consistency loss [41].

**Figure 2.** Architecture of CycleGAN; mapping between two image domains x and y. The model training is performed as the forward and inverse mappings are learned simultaneously using the adversarial loss and cycle-consistency loss.

Our objective was to train *Gx*→*y* such that the discriminator *Dy* cannot distinguish the distribution of images generated by *Gx*→*y* from the distribution of images in domain Y. This objective can be written as follows:

$$\mathcal{L}\_{GAN}\big(G\_{x\to y}, D\_Y, X, Y\big) = \mathbb{E}\_{y\sim p\_{data}(y)}[\log D\_Y(y)] + \mathbb{E}\_{x\sim p\_{data}(x)}[\log(1 - D\_Y(G\_{x\to y}(x)))].\tag{1}$$

$$\mathcal{L}\_{GAN}\big(G\_{y\to x}, D\_X, X, Y\big) = \mathbb{E}\_{x\sim p\_{data}(x)}[\log D\_X(x)] + \mathbb{E}\_{y\sim p\_{data}(y)}[\log(1 - D\_X(G\_{y\to x}(y)))].\tag{2}$$
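Equation (1) can be checked numerically. The sketch below computes the adversarial loss from discriminator scores; the score values themselves are made-up illustrative inputs.

```python
import numpy as np

def gan_loss(d_real, d_fake):
    """Adversarial loss of Eq. (1): E[log D_Y(y)] + E[log(1 - D_Y(G(x)))],
    where d_real = D_Y(y) and d_fake = D_Y(G_{x->y}(x)) lie in (0, 1)."""
    d_real, d_fake = np.asarray(d_real), np.asarray(d_fake)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# The discriminator maximises this quantity; the generator minimises
# the second term by pushing d_fake toward 1.
confident = gan_loss(d_real=[0.9, 0.95], d_fake=[0.1, 0.05])
fooled = gan_loss(d_real=[0.9, 0.95], d_fake=[0.8, 0.9])
print(confident > fooled)  # True: a fooled discriminator scores lower
```

Equation (2) is the same expression with the roles of the domains swapped (generator *Gy*→*x* and discriminator *Dx*).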

However, a general GAN is not trained over the entire distribution of the actual data; it is only trained to reduce the loss. A mode-collapse problem can therefore occur, in which optimization fails because the generator does not cover the full data distribution and maps all input images to the same output image. To solve this problem, CycleGAN adds inverse mapping and a cycle-consistency loss (L*cyc*) to the objectives in Equations (1) and (2), so that diverse outputs are produced. The cycle-consistency loss is as follows:

$$\mathcal{L}\_{cyc}\big(G\_{x\to y}, G\_{y\to x}\big) = \mathbb{E}\_{x\sim p\_{data}(x)}\Big[\Big\|G\_{y\to x}\big(G\_{x\to y}(x)\big) - x\Big\|\_1\Big] + \mathbb{E}\_{y\sim p\_{data}(y)}\Big[\Big\|G\_{x\to y}\big(G\_{y\to x}(y)\big) - y\Big\|\_1\Big].\tag{3}$$
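Equation (3) can be sketched numerically: it is the mean L1 distance between each image and its round-trip reconstruction. The arrays below are random stand-ins for images and for the outputs of *Gy*→*x*(*Gx*→*y*(x)).

```python
import numpy as np

def cycle_loss(x, x_rec, y, y_rec):
    """Cycle-consistency loss of Eq. (3): mean L1 distance between each
    image and its round-trip reconstruction through both generators."""
    return (np.mean(np.abs(x_rec - x)) +
            np.mean(np.abs(y_rec - y)))

rng = np.random.default_rng(0)
x = rng.random((224, 224, 3))
y = rng.random((224, 224, 3))
perfect = cycle_loss(x, x, y, y)            # exact round trip -> 0
noisy = cycle_loss(x, x + 0.1, y, y + 0.1)  # uniform error of 0.1 each way
print(perfect, round(noisy, 3))
```

A perfect round trip gives zero loss, so minimizing this term forces each generator to preserve enough content for the other generator to undo the translation.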

In addition, an identity loss (L*im*) was added to regularize the generators: when a generator receives an image already in its target domain (e.g., a Y-domain image fed to *Gx*→*y*), its output should be identical to the input. This allows translated images to be generated while minimizing the damage to the original image.

$$\mathcal{L}\_{im}\big(G\_{x\to y}, G\_{y\to x}\big) = \mathbb{E}\_{y\sim p\_{data}(y)}\Big[\big\|G\_{x\to y}(y) - y\big\|\_1\Big] + \mathbb{E}\_{x\sim p\_{data}(x)}\Big[\big\|G\_{y\to x}(x) - x\big\|\_1\Big].\tag{4}$$

The final objective, combining all the losses, is as follows. Using CycleGAN in this manner, various wildfire images could be created while maintaining the shape and background color of the forest site.

$$\mathcal{L}\big(G\_{x\to y}, G\_{y\to x}, D\_X, D\_Y\big) = \mathcal{L}\_{GAN}\big(G\_{x\to y}, D\_Y, X, Y\big) + \mathcal{L}\_{GAN}\big(G\_{y\to x}, D\_X, X, Y\big) + \lambda\mathcal{L}\_{cyc}\big(G\_{x\to y}, G\_{y\to x}\big) + \mathcal{L}\_{im}\big(G\_{x\to y}, G\_{y\to x}\big).\tag{5}$$
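The combined objective can be sketched as a simple weighted sum of the four terms. The individual loss values below are illustrative, and the weight λ = 10 is the value used in the original CycleGAN paper; this section does not state the value used here.

```python
import numpy as np

def total_loss(l_gan_xy, l_gan_yx, l_cyc, l_im, lam=10.0):
    """Full CycleGAN objective of Eq. (5): both adversarial losses plus
    the lambda-weighted cycle-consistency loss and the identity loss.
    lam = 10.0 follows the original CycleGAN paper (an assumption here)."""
    return l_gan_xy + l_gan_yx + lam * l_cyc + l_im

# Illustrative per-term values at some training step
loss = total_loss(l_gan_xy=0.7, l_gan_yx=0.6, l_cyc=0.05, l_im=0.02)
print(loss)
```

Because the cycle term is multiplied by λ, reconstruction fidelity dominates the objective, which is what keeps the forest shape and background color intact while the fire appearance is translated.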
