*Article* **Satellite Observations of Fire Activity in Relation to Biophysical Forcing Effect of Land Surface Temperature in Mediterranean Climate**

**Julia S. Stoyanova \*, Christo G. Georgiev and Plamen N. Neytchev**

National Institute of Meteorology and Hydrology, 1784 Sofia, Bulgaria; christo.georgiev@meteo.bg (C.G.G.); plamen.neytchev@meteo.bg (P.N.N.)

**\*** Correspondence: julia.stoyanova@meteo.bg; Tel.: +359-2-462-4603

**Abstract:** The present work is aimed at gaining more knowledge on the nature of the relation between land surface temperature (LST) as a biophysical parameter, which is related to the coupled effect of the energy and water cycles, and fire activity over Bulgaria, in the Eastern Mediterranean. In the ecosystems of this area, prolonged droughts and heat waves create preconditions in the land surface state that increase the frequency and intensity of landscape fires. The relationships between the spatial–temporal variability of LST and fire activity modulated by land cover types and Soil Moisture Availability (SMA) are quantified. Long-term (2007–2018) datasets derived from geostationary MSG satellite observations are used: LST retrieved by the LSASAF LST product; fire activity assessed by the LSASAF FRP-Pixel product. All fires in the period of July–September occur in days associated with positive LST anomalies. Exponential regression models fit the link between LST monthly means, LST positive anomalies, LST-T2 (as a first proxy of sensible heat exchange with atmosphere), and FRP fire characteristics (number of detections; released energy FRP, MW) at high correlations. The values of biophysical drivers, at which the maximum FRP (MW) might be expected at the corresponding probability level, are identified. Results suggest that the biophysical index LST is sensitive to the changes in the dynamics of vegetation fire occurrence and severity. Dependences are found for forest, shrubs, and cultivated LCs, which indicate that satellite IR retrievals of radiative temperature is a reliable source of information for vegetation dryness and fire activity.

**Keywords:** geostationary satellite observations; wildfire regime; biophysical drivers; land surface temperature; land cover type; trends

#### **1. Introduction**

Fire is a global phenomenon closely linked to climate variability and human practices with critical regional implications [1,2]. It is an important process in the modulation of the Earth system through the links between weather, climate, and vegetation and has the potential to impact on the global climate system by changing the ability of the surface to absorb and emit energy [3,4].

Climatic variability is a major driver of fire in many terrestrial ecosystems, as reflected in Bradstock's conceptual model of four climatic "switches" that influence fire regimes by controlling fuel amount, fuel moisture, and fire weather at contrasting temporal scales [1]. Fire regimes are also affected by other controls such as landscape-scale patterns of vegetation, topography, and human activities [5]. Climate is connected to fires at two distinct temporal scales [6]. Short-term climatic anomalies (from months to years) affect fires by modifying vegetation growth and fuel moisture before the fire and by influencing weather during the period of fire spread. In addition, climate has more indirect, long-term (decadal or longer) effects on the distribution of major vegetation types, which in turn constrain the landscape-scale mosaics of fuels and vegetation.

**Citation:** Stoyanova, J.S.;

Georgiev, C.G.; Neytchev, P.N. Satellite Observations of Fire Activity in Relation to Biophysical Forcing Effect of Land Surface Temperature in Mediterranean Climate. *Remote Sens.* **2022**, *14*, 1747. https://doi.org/ 10.3390/rs14071747

Academic Editors: Elena Marcos, Leonor Calvo, Susana Suarez-Seoane and Víctor Fernández-García

Received: 28 February 2022 Accepted: 1 April 2022 Published: 5 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Soil moisture (SM) has been identified as a key variable in understanding and predicting wildfire hazards [7–11]. Soil moisture, defined as the water contained in the unsaturated soil zone [12], not only influences vegetation growth conditions and consequently the accumulation of wildfire fuel, but also determines the vegetation moisture content and hence the flammability of the vegetation [7,9,13].

Land surface temperature (LST) is one of the most important parameters controlling physical processes responsible for the land surface balance of water, energy, and CO2 [14,15]. In the context of wildfire studies, the LST consideration has two main aspects: first, as a pre-fire indicator of land surface state (SM deficit in root zone and vegetation dryness, evapotranspiration, leaf temperature), and second, as a post-fire characteristic of fireinduced environmental changes. The variations in the spatial distribution of LST as a result of fires are usually the focus of research works that characterize fire intensity and burn severity. Higher post-fire LST of burned areas has been reported in remotely-sensed images [3,4,16–18], that is accounted mainly due to a decrease in transpiration and an increase in the Bowen ratio (β = sensible heating/latent heating) [19].

Although relationships between drought and fire seem quite interrelated, only a few studies have explored LST as a pre-fire indicator [20,21]. Short-term analyses based on LST daily anomalies have been performed to predict fire occurrence [20].

The vegetation cover partitions the incoming solar radiation into sensible and latent heat fluxes depending on its structure as well as affects the surface roughness, which can in turn alter heat and moisture transport; the type of vegetation and its seasonal dynamics affect the land's vulnerability to fire ignition and spread. That is why spatial–temporal patterns of LST can contribute to the monitoring of processes that structure ecosystem development and may be associated with fire occurrences to help fire management. Satellite measurements are a source of information for the accurate estimation of LST at the global and regional level, thus helping to evaluate the land surface–atmosphere exchange processes and can serve as a valuable metric of surface state [4,14].

In this study, we consider data for skin surface temperature retrieved by satellite measurements to explore the significance of LST as a biophysical driver, which (in combination with SM, air temperature, and humidity) controlled long-term wildfire activity during the study period of 2007–2018 over the Eastern Mediterranean region, as seen in coherent satellite active fire observations.

Studies have used satellite remote sensing for fire monitoring from the early 1990s. With the rapid development of remote sensing technology, satellites are increasingly used for fire monitoring [22–26] and applied at large scales. In recent years, many studies have been committed to identifying the spatial and temporal characteristics of fires from the global and regional perspectives with the use of satellite data [27–31]. All these studies were performed by using observations by low earth orbit (LEO) satellite sensors with the Very High Resolution Radiometer (AVHRR), Moderate Resolution Imaging Spectroradiometer (MODIS), and Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER).

Geostationary (GEO) fire products were first generated over the Americas using the Geostationary Operational Environmental Satellite Visible Infrared Spin Scan Radiometer Atmospheric Sounder (GOES-VAS), e.g., in [32], and this led to the development of the long-standing GOES WildFire Automated Biomass Burning Algorithm (GOES WFABBA) product [33]. Roberts et al. [34] first demonstrated the retrieval of Fire Radiative Power (FRP) from GEO data and developed a full "fire thermal anomaly" active fire detection and FRP retrieval algorithm for GEO systems. This was initially applied to data from Meteosat Second Generation [35], and an operational version is now used to generate a series of geostationary active fire detection and FRP retrieval products that span much of the globe, including Meteosat over Africa and Europe [36,37], GOES-East and West over the Americas [38], and Himawari over Asia [39]. Wooster et al. [40] summarized the history of achievements in the field of active fire remote sensing, reviewed the physical basis of the approaches used, the nature of the active fire detection and characterization techniques deployed, and highlighted some of the key current capabilities and applications.

However, since every observation by the satellite sensors lasts only an instant, what they actually measured was the rate of release of FRE per unit of time or Fire Radiative Power (FRP) in MW [24,41,42]. It was shown that the FRE released by a fire is directly proportional to the biomass consumed as well as the smoke emitted [43]. Measurement of fire radiative energy (FRE) release rates or power (FRP) from space offers opportunities for the characterization of biomass burning and emissions in a quantitative manner. In essence, it has enabled the rating of fires as well as an estimation of regional FRP fluxes, which reflect the relative concentrations of biomass burning activities.

Mediterranean regions are some of the most affected by wildfires, and remote-sensed information about fire activity as provided by the SEVIRI instrument on board Meteosat-8 is especially valuable for forest and civil protection activities [44]. On the other hand, the Mediterranean is one of the most responsive regions to climate change, as evidenced by "pronounced warming" and significant decreases in spring and summer precipitation, which have led to regime shifts toward more arid climates [45,46]. Triggered by large-scale atmospheric forcing, Mediterranean regional heat waves are often amplified by surface preconditioning such as negative soil moisture anomalies and vegetation stress [47].

The research works of Sifakis et al. [48], Amraoui et al. [44], and Di Biase and Laneve [49] were aimed at understanding the spatial and temporal patterns of landscape fires in the Mediterranean, applying different remote sensing data from GEO sensors. Compared to LEO systems, GEO products offer higher temporal resolutions but coarser spatial resolutions, and each sensor only provides data over a specific region of the Earth. The observations from the geostationary meteorological satellite MSG provide a valuable source of information about fire occurrences in the Mediterranean region. Sifakis et al. [48] reported that during the summer period of 2007, MSG-SEVIRI data successfully detected 82% of the fire events in Greek territory, with less than 1% false alarms. Using data from MSG-1 satellite, Amraoui et al. [44] performed an analysis of the spatial distribution of fire events in the months of July and August during the period of 2007–2009. Around half of the fire pixels were detected in croplands, and the remaining half was evenly distributed between forest and shrub. Based on the analysis of the low, mid, and upper atmospheric fields of geopotential, temperature, relative humidity, and wind in two extreme events of fire activity that struck Greece and Italy on 24–25 July and 22–27 August 2007, they suggested a conceptual model for meteorological conditions that favor the occurrence of severe wildfire episodes in Italy and the Balkan Peninsula.

The current study is the first one to use the FRP product retrieved by geostationary satellite observations to investigate fire activity in a recent long-term period for the Eastern Mediterranean.

In Mediterranean ecosystems, prolonged droughts and heat waves have created the preconditions for the increasing frequency and intensity of forest fires [50], the underlying mechanism being the reduction of live and dead fuel moisture content as a response of the soil–plant system to the increased vapor pressure deficit [51]. Vegetation response varies with species as well as with forest structure and soil/terrain characteristics, and it is determined by evapotranspiration. The moisture of dead fuels is affected by weather variations as well, and it is regulated through evaporation [52]. The climate and weather patterns of the Mediterranean region highlight the value of having an improved understanding of the relationships between drought and wildfire; more specifically, an understanding of how drought is related to fire danger outputs. A number of studies have examined the link between drought indicators and wildfire occurrence [53–55]. Many drought indices (see [56]) are driven by stand and climate variables of precipitation and/or temperature, but more recent developments include variables that express conditions at the land surface– atmosphere interface such as vegetation health [57], soil moisture deficit [58–62], actual evapotranspiration [63], and evaporative demand [56,64,65]. These last indexes are important because they can reflect the outcome of the coupled physical processes of energy–water cycles on the land surface.

In our previous work [66], long-term LST anomalies retrieved by MSG SIVIRI were used to identify drought-prone areas in the Eastern Mediterranean. Song [21] showed that there is no linear relationship between LST anomaly and fire occurrences, but the time series of LST anomaly before fire events show a significant trend. Resolving LST–fire interactions facilitate fire predictions and the issuance of early warnings.

We use long-term spatially and temporally consistent satellite observations from the SEVIRI sensor of geostationary MSG satellite to assess the biophysical forcing effect of vegetation fires on a climatic basis at a regional scale. The study is focused on two key aspects of wild fire problems that are not systematically studied in the literature: (i) an evaluation of the importance of LST (monthly mean values, anomalies, and difference with air temperature, T2m) as precursor/s of vegetated land surface dryness and the related pre-fire signals of vegetation stress and fire occurrence; (ii) an exploration of the variability of these biophysical drivers of vulnerability in biomass burning across the climatic gradient, delineated into main land cover types, and the quantification of relations with fire activity using long-term records of satellite information from Meteosat, ground observations, and SVAT model data of Soil Moisture Availability (SMA) to the land cover as reference data over the Eastern Mediterranean region (Bulgaria). The specific objectives to be met are the following:


#### **2. Materials and Methods**

This work is based on the methodology developed in our work [66], adapted to assess environmental control on fire activity and its spatial–temporal variability by using satellite observations from geostationary Meteosat. As a biophysical index related to the forcing of fire activity, the satellite-derived IR skin temperature from SEVIRI observations with the EUMETSAT LSASAF LST product is proposed. Based on long-term data records, LST mean summer monthly values, and their anomalies and deviation from the air temperature, T2m were applied as indicators of the coupled energy and water cycles. Fire-promoting SM anomaly patterns derived by the SVAT modeling approach were used as a reference for the evaluation of the LST–fire intertwining. The pattern of fire activity over Bulgaria was characterized by the evaluation of SEVIRI-based satellite detections of biomass burning according to the LSASAF FRP-Pixel product. Benefiting from the geostationary satellites' high temporal resolution, the relationship between LST and fire regimes was studied and the underlying biophysical processes were explored.

Statistical comparative analyses of long-term data records of the LSASAF product, SVAT meteorological model outputs, and ground observations (meteorological parameters and actual fires) were applied following the methodology in [66], which was further extended (in Section 2.5). Overlaid graphical analyses of the summer season (June–September 2007–2018) dynamics of biophysical indexes in relation to fire characteristics (number of detections and severity) were performed to study the sensitivity of the approach. All datasets taken from information sources with different spatial resolutions were placed over the grid of the ECMWF global NWP model, IFS version O1280 (about 9 km spatial resolution). Only values with 1400 points that cover the region of Bulgaria were considered.

#### *2.1. The LSASAF FRP-PIXEL Product*

In order to characterize fire activity in this study, radiative energy released by fire events, as provided by the LSA SAF Fire Radiative Power-Pixel (FRP) product [36,67], was used. The LSASAF FRP-PIXEL product provided information on the location, timing, and fire radiative power (FRP, in MW) output of landscape fires ("wildfires") detected every 15 min across the full Meteosat disk at the native spatial resolution of the SEVIRI sensor. The FRP provided information on the measured radiant heat output of detected fires; the heat output was a direct result of the combustion process whereby carbon-based fuel is oxidized to CO2 with the release of a certain "heat yield" [24]. FRP calculation relies on the Middle Infrared (MIR) method and assumes FRP to be proportional to the difference between the observed fire pixel radiance measured from the MSG SEVIRI radiometer and the "background" radiance that would have been observed at the same location in the absence of fire. The LSASAF algorithm for deriving FRP from SEVIRI radiance measurements is fully described in [37]. The type of fire detection algorithm that is proposed for use with SEVIRI is based on the principles applied to generate active fire detections within the MODIS Fire Products [24].

While this FRP methodology offers many advantages, among the uncertainness reported, two factors are likely to reduce the satellite-measured FRP that is emitted by the fire:


The FRP algorithm is subject to other sources of uncertainties, essentially due to the characteristics of SEVIRI, as described in [37]. Among these limitations, one can state that the FRP-derived value is quite sensitive to the fire location within the pixel. A fire located at the center of the instrument's instantaneous field-of-view will elevate the pixel temperature much more than a fire located far away from its center. Additionally, abnormally low radiances might be observed surrounding a fire pixel due to the negative lobes of the point spread function. As a result, the background temperature might be colder, thereby increasing the estimated FRP value. The rather coarse spatial resolution of SEVIRI may cause the non-detection of smaller/less intense fires that MODIS can detect, for instance, but SEVIRI misses. The non-linearity of the SEVIRI 3.9 channel above 310 K and the saturation of that band above 335 K are responsible for the error in the FRP estimation. As a result, FRP values derived from SEVIRI underestimate those derived from MODIS by about 40% [34].

A significant remote sensing constraint for fires is the presence of cloudiness in fire weather situations, as all satellite methods used for monitoring fires have limitations that tend to cause important biases in the final product. The algorithms cannot detect existing fires due to the elimination of cloudy pixels or the assumption of some detection in partially cloudy pixels as "false" fire detections based on "contextual" tests.

#### *2.2. Biophysical Indexes*

In our work [66], the spatial–temporal variability of land surface state dry anomalies was characterized by the land surface temperature with the use of LST retrievals from the MSG satellite measurements and SMA. Using long-term data records, the same biophysical drivers of drought occurrence have been further explored here in terms of their forcing effects on fire activity through the related biophysical processes. In addition to land surface radiometric temperature characterized by its monthly values and calculated anomalies, the blended parameter, which is the difference between skin surface temperature and air temperature at 2 m above the earth's surface (LST-T2m), is proposed to be used as a first proxy of sensible heat flux exchange. The patterns of these biophysical indexes were defined as the mean monthly values for June, July, August, and September. The SMA

anomalies during the summer months were considered in parallel to explore the fire forcing effect of soil moisture deficit.

To quantify the relations, statistical analyses and modeling were applied. Since the considered biophysical parameters and the procedure of their application have been explained in detail in [66], only the main points are marked here, and the new elements are further explained in Section 2.5.

#### 2.2.1. The LSASAF LST Product

For the purposes of the current study, the LSASAF LST product (MLST, LSA-001) has been applied [68]. The LST retrieval was based on clear-sky measurements from the MSG system in the thermal infrared window (MSG/SEVIRI channels IR10.8 and IR12.0), every 15 min within the area covered by the MSG disk. Theoretically, LST values could be determined from MSG 96 times per day, but in practice, fewer observations were available due to cloud cover.

The 12-year time series for the region of Bulgaria for the period of June–September (2007–2018), which included only days/locations with satellite fire detections, was constructed. Datasets for LSASAF LST at 0900 UTC and at 1200 UTC (averaged over 5 measurements within ±30 min of these two time slots to avoid the limitations of cloudy pixels) were inferred at MSG pixel bases. For some climatic assessments, the averaged 3 × 3 pixels LST values around the grid points were also processed. The difference between LST and the air temperature, T2m (observations at the SYNOP network at 0900 UTC and 1200 UTC) was also constructed for the same studied period.

#### 2.2.2. SVAT Model and SMAI

For the identification of fire-promoting SM anomaly patterns, the "SVAT\_bg" model developed at the National Institute of Meteorology and Hydrology (NIMH) of Bulgaria was used [61,69,70]. This is a simple 1D site-scale meteorological model that exploits the concept of one layer of vegetated land surface and two levels of moisture availability along the root zone depth. One of the main "SVAT\_bg" model output parameters is soil moisture (SM) for different root zone soil depths (20, 50, 100 cm). For assessing land surface state, the SMA concept was adopted to serve as an information source for "warnings" of environmental constraints. Based on the "SVAT\_bg" model-derived SM, a quantitative SMA index, SMAI was developed and operationally calculated at a site scale for the region of each NIMH synoptic station. At each point of the used grid, the SMAI values at the nearest synoptic station were considered for the calculation of the anomalous water content in the unsaturated root zone. SMAI was designed as a 6-level threshold scheme to account for moistening conditions [66]. The site-scale assessment of mean monthly SMAI anomalies at synoptic stations was compared with the overlaid FRP Pixel detections from corresponding MSG pixels covering the region of the station applying the statistical analyses.

#### *2.3. Ground Observations of Forest Fires*

Ground observations of the number of actual forest fires were considered according to the database of the State Forest Agency of Bulgaria (SFA), the responsible national institution, and were used for comparison with satellite detections.

#### *2.4. Target Region and Land Cover*

The study area includes the whole territory of Bulgaria, Southeastern Europe, located within latitude circles of 40.25 N and 45.0 N and meridians of 20.5 N and 29.5 E. The region falls under Mediterranean climate influences, characterized by dry summers and mild, wet winters, both with irregular precipitation distribution. Information about land covers (LC) provided by the ESA-CCI Land Cover Map product at a spatial resolution of 300 m [71] was used. Maps were updated on an annual basis. The LC classification system from ESA-CCI includes 24 vegetation types. For the purposes of this study, we reclassified these types into the following three main vegetation groups (Figure 1): forests (classes 50, 60, 61, 62, 71, 72, 80, 81, 82, 90), scrubland (11, 12, 40, 100, 110, 120, 121, 122), and cultivated areas (10, 20, 30, 130). Other land cover types, including permanent snow and ice (220) and barren and water bodies (200, 210), where there are limited fires, were not analyzed in our study. Urban areas (190) were also excluded.

**Figure 1.** Geographical distribution of the vegetation cover types according to the ESA-CCI Land Cover database 2018 over (**a**) Europe; (**b**) Bulgaria, SE Europe, reclassified for the purposes of the study into three categories: forest, shrubland, grassland.

#### *2.5. Numerical Analyses*

To characterize the fire distribution pattern, a dataset of FRP detections (location, timing, and energy) for the period of June–September (2004–2019) was constructed [72,73]. All pixels with at least one fire detection were considered. Fire regime was characterized by the frequency of detections and severity according to the released energy from biomass burning (FRP fire radiative energy, MW). Each fire pixel was associated with the coordinates of the nearest point on the grid, where LST, LST anomalies, and (LST-T2m) had been derived. The procedure was performed for monthly (June, July, August, and September) accumulated energy released from fires in the corresponding grid point.

In order to validate the adopted methodology, monthly mean anomalies of LST and SMA that had been inferred in [66] and visualized in color-coded maps were evaluated for consistency with FRP fire pixel spatial distribution over Bulgaria by using qualitative comparative analysis. Examples for July 2007 are presented in Figure 2. Figure 2a shows that the higher positive LST anomalies (in red) correspond to a higher number of FRPdetected fire pixels. The stronger negative SMA anomalies (indicated by reddish colors) are related to a higher number of fire detections (Figure 2b). The consistency of the data allows quantitative studies to be performed further.

Based on long-term records (June–September 2007–2018), stochastic graphical analysis was performed. The consistency in the behavior of the fire activity and the biophysical drivers LST, LST anomalies, (LST-T2m), and SMA anomalies was analyzed in terms of their mean, spatiotemporal variability on a monthly and annual basis, as well as their anomalous distribution and relations. The summer seasonal dynamics of biophysical conditions for different LC vegetation types were studied and their relation to vulnerability of fire ignition/spread was evaluated.

A graphical description of the locality, the spread and skewness groups of numerical data of FRP detections, released energy from biomass burning, LST, LST anomalies, and the (LST-T2m) temperature difference was performed through their boxplot quartiles. Two types of regression analyses were performed; the first one made use of estimates by the method of least squares of the conditional mean of the response variable across values of the predictor variables. The second method used Quantile Regression (QR) estimates of

the conditional median of the response variable, which has received increasing attention in recent years and is applied in many areas, including data analysis in the natural sciences.

**Figure 2.** Spatial distribution of LSASAF FRP fire pixel detections superimposed over: (**a**) LSASAF LST 0900 UTC anomalies and (**b**) monthly mean SMA anomalies (50 cm soil depth). Examples for July 2007, Bulgaria: The LST and SMA anomalies were calculated for the 2007–2018 period [66].

To identify regions most vulnerable to fires in Bulgaria, the spatial pattern of the accumulated fire numbers detected by the LSASAF FRP-Pixel product for the period of June–September 2004–2019 was considered. A statistical evaluation of the spatial distribution of fire pixels was performed by applying the Mann–Kendall statistic test. To perform all these comparisons, the R language for statistical computing was used [74].

All numerical evaluations were performed for the three main vegetation groups: forest, shrubland, and cultivated, as well as considered all together in a sample "all land cover". Such a reclassification into three LC categories has been used in other studies of fire activity over the Mediterranean using satellite data, e.g., in [44].

A summary of all data used for comparative analyses in the current study are presented in Table 1.


**Table 1.** Summary of data used and their characteristics.

#### **3. Results**

Fire regimes were described through statistical distributions of frequency and severity over the studied area in the summer season during a 16-year time period. Using 12 years of data for LST and related biophysical parameters, the environmental determinants of fire regimes were assessed by exploring how environmental drivers operating over a range of scales affected the spatial and temporal patterns of these fires.

#### *3.1. Active Fire Monitoring from Space*

The results from the study of fire activity derived from satellite observations over Bulgaria are presented in maps of accumulated fire detections for each month of the July–September period and each year of the whole 2004–2019 period. Examples for selected years are shown in Figure 3 (see also [72,73]). The fire severity was assessed following the color-indicated released energy of burning biomass FRP (MW). The fire activity exhibited various seasonal distributions from year to year, e.g., maximum activity and severity was observed in July 2007 (also in 2006, 2016, 2017, not presented), in August 2004 (also 2006, 2008), and in September 2019 (2011, 2012). There are repetitions of spots with higher fire activity, suggesting that the spatial distribution varies depending on the evolution of the horizontal patterns of the biophysical drivers (to be further considered in Sections 3.2 and 3.3).

**Figure 3.** Maps of the spatial–temporal (monthly/annual) distribution of fire activity over Bulgaria. Color-coded severity of biomass burning according LSASAF FRP-Pixel (MW) is indicated. Examples for July, August, September and annually for 2004, 2007, and 2019 are presented.

Results from the validation of the MSG FRP-Pixel product comparing the number of detected fire pixels in forested areas with the course of actual forest fires reported by ground observations of the State Forest Agency (SFA) of Bulgaria are shown in Figure 4. Comparisons are performed for the period of July–September 2007–2018. A good agreement between the two independent sources of information for forest fire dynamics during different months and years is observed: the increase in actual fires corresponds to the increase in satellite fire pixel detections; the highest forest fire activity is in July 2007; for July and August, the two lines are very close, indicating an almost similar number of fire detections; for September, the courses are similar but the number of satellite detections are lower, probably due to the cloudier conditions in September.

#### *3.2. Biophysical Drivers and Fire Activity*

#### 3.2.1. Annual Trends in Fire Activity along with LST

Figure 5 shows a comparison of LST (at 0900 UTC) with the fire energy released according to the FRP-Pixel product (MW). The course of LST over time (red lines) shows behavior synchronized with fire energy (blue lines) for all vegetation types. The sample size allows a regression analysis to be applied, and the results show an existing linear relationship between the parameters: Trend lines of both parameters, LST (red dashed line) and FRP, MW (blue dashed line), for the considered LC types were obtained (Figure 5a–d). The coefficient of determination R2, which is the square of the correlation, shows how much of the observed scatter in the data is due to the hypothetical linear component as opposed to the unexplained

random error. Despite the rather complex nature of the curves, the resulting R2 values confirm a significant linear component (values bottom right in the panels).

**Figure 4.** Inter-annual dynamics of forest fires over Bulgaria according the LSASAF FRP-Pixel product detections (blue line) and ground observations of actual forest fires by the national SFA (green line) for (**a**) July, (**b**) August, (**c**) September 2007–2018.

**Figure 5.** Time series of fire activity characterized by the total energy released from biomass burning per year FRP, MW (blue line) along with LST (red line) (June–September, 2007–2018) for: (**a**) All LC samples; (**b**) Cultivated LC; (**c**) Shrubs LC; (**d**) Forest LC. Each time series is fitted with a trend line using the linear regression technique (least squares method).

Figure 5a shows the trend lines for LST and the energy released from biomass burning for a sample of all land cover types without classifying them ("all LC"); the linear regression line fits the relation between the parameters at high R2 values of 0.68. The values of R<sup>2</sup> for forest-shrubs-cultivated types vary between 0.61 and 0.68, and the validity of these conclusions passed the significant test from 0.001 up to 0.003 levels (Figure 5b–d). The FRP (blue lines) decreases with the decrease in LST (red lines) within individual diapason for biomes (e.g., 32–29 ◦C for forest and 36–34 ◦C for shrubs/cultivated, as seen in

Figure 5c,d). A high level of fit between the trends in the number of fire pixels and the LST as a biophysical driver is also experienced (Supplementary Material). In this case R2 varies within 0.67–0.62 for shrubs and forest vegetation types, at high significant levels, indicating a descending trend for the fire accuracy along with the LST decrease.

To clarify the nature of this linear structure, it is examined by linear regression on the individual components in each data pair. Again, the results are similar (values in the bottom left panel in Figure 5), although at the limit of reliability. For at least one group in each pair, a linear model with a similar slope of about −0.25 is obtained, and its significance level is slightly above 0.10. Therefore, for at least one pair, a significant confirmation of the estimated downward trend line was obtained.

#### *3.3. Statistical Analyses*

Monthly wildfire activity (2007–2018) characterized by the number of fire pixel according to the LSASAF FRP-Pixel product, and the ratio between the total energy released (MW) and the total number of fire pixels detected for a specific month and LC type are shown in Table 2. The number of fire pixels in forest and shrubs in June represents 6.5–7.6% percent of the fire pixels detected in July. Fires in cultivated LC in June seem lower by about 5% than their amount in July due to the specific vegetation in the managed LC areas still being in the growing phase. For September, the wildfires (forest, shrubs) from the forest fires in August are at 37% and 50% from the shrub fires in the same month. However, in the cultivated LC, the relatively large number of fire pixels are detected in September (75% from their number in August), which is likely due to some controlled agriculture burning activities.

**Table 2.** Monthly accumulated FRP-Pixel detections for the period of 2007–2018 and the total FRP energy released per pixel (MW), DFI. Examples for shrubs, forest, and cultivated LCs for June, July, August, and September are presented.


The ratio between the total FRP energy released and the accumulated number of fire pixels at a specific LC type was introduced to serve as a measure of mean Detected Fire Intensity (DFI) observed by the satellite. The DFI from the forest fire pixels is the lowest one, ranging from 35 MW/pixel in June to 96 MW/pixel in August. For shrubs and cultivated LC types, the DFI range is 72–101 MW/pixel and 70–107 MW/pixel, respectively.

The satellite-derived FRP are likely to estimate a specific portion of the actual fireemitted FRP (see Section 2.1). Data in Table 2 show that shrubs LC is characterized by the largest amount of total energy released by fires for the summer season. In order to assess how the portion of satellite-detected fire energy depends on the vegetation type during the fire season, the DFI values of forest and cultivated LC are divided by the DFI obtained for shrubs in each month (Table 3). It can be seen that the ratio between the DFI parameters of forest and shrubs is 0.48 in June, then increases gradually to 0.95 in August and becomes 0.89 in September. At the same time, the ratio between the DFI values for cultivated–shrubs LC is close to 1.0, ranging from 0.96 in June to 1.06 in August–September. This result comes from the different ability of satellite observations (FRP product) to estimate actual emitted energy from wildfires at different LC types, and this also depends on the biophysical status of the vegetation along the fire season.


**Table 3.** Ratios of the monthly accumulated DFI of forest and cultivated LC fires to the DFI of shrubs fires for the period of 2007–2018.

#### 3.3.1. Box Plots Analyses

Boxplots of the distribution of fire radiative energy (MW) according to the FRP-Pixel product and biophysical indexes for forest and shrubs LCs are shown in Figures 6 and 7.

**Figure 6.** Boxplots of monthly mean values (2007–2018) for the region of Bulgaria during the summer season (June–September): from the LSASAF FRP-Pixel (MW) for (**a**) forest and (**b**) shrubs; from the LSASAF LST 0900 UTC ◦C for (**c**) forest and (**d**) shrubs. The boxes represent the interquartile range (from the 25th to 75th percentile, first and third red line), whiskers cover 99.3% of the data, and the middle (red) lines represent the mean values.

158

**Figure 7.** Boxplots of monthly mean values (2007–2018) for the region of Bulgaria during the summer season (June–September) from the LSASAF LST anomaly for (**a**) forest and (**b**) shrubs, and the (LST-T2m) difference for (**c**) forest and (**d**) shrubs. The boxes represent the interquartile range (from 25th to 75th percentile, first and third red line), whiskers cover 99.3% of the data, and the middle (red) lines represent the mean values.

The plots in Figure 6 show different aggregation of data by months. Through a visual comparison of the corresponding notches, it can be seen that there is sufficient reason to accept the statistical hypothesis that the medians are significantly different. The red lines show the median and first and fourth quartile averages of the same statistics for all individual boxplots presented and are used to visually assess the degree of difference depending on the month.

The boxplots in Figure 6a,b demonstrate the variability of the monthly mean values of FRP (MW) for forest and shrubs LC during the summer. The statistical analyses confirm that for July, the ratio between the median of total FRP box plots for shrubs and forest fires is higher than the corresponding ratio for the DFI parameter in Table 3: shrub fires were detected with about 50% higher released FRP (MW), while the DFI parameter was only 35% larger than this for forest fires. This effect seems to be due to higher energy emitted by some forest fires in July.

On the contrary, for June, the medians of total FRP are not much different for these two LC types (Figure 6a,b), but the DFI parameter is twice higher for shrubs than the corresponding DFI value obtained for forest fires (Table 3). This suggests the occurrence of more forest fires under the canopy of dry dead forest fuel with much less energy measured by the satellite and that a certain amount of radiant energy may have been intercepted (scattered and absorbed) by the forest canopy.

The summer course of the FRP median corresponds well to the LST statistical behavior for the forest LC (Figure 6a,c). In June, optimal SMA is present in the deep forest root zone that leads to fully open stomata conductance and unlimited evapotranspiration, leading to decreasing canopy temperature and minimum LST. For shrubs (Figure 6b,d), there is a disagreement: the LST median maximum appears in August, but the maximum of the total FRP median is in July. The reason could be that in shrub lands, the SMA easily reaches the wilting point, in which the stomata conductance strongly decreases in reaction to the increasing water stress conditions through the transpiration regulation mechanism. These lead to an increasing shrub canopy LST in August.

All fires in the period of July–September occur in days associated with positive values of LST anomalies (averaged in the area of 3 × 3 pixels) around the same location (Figure 7a,b). All 25% of the fire detections in the days associated with negative anomalies of LST occur in June in forest and shrubs LCs. This result suggests that in June, some fires may mostly occur in dead fuel where the canopy temperature is below the historical average for today's day-of-year.

The trend of increasing (LST-T2m) from June to September (Figure 7c,d) corresponds to the gradual decrease of SMA to vegetation cover on the one hand, and to the increasing LST on the other. This is consistent with the result on the disagreement in the behavior of LST versus FRP in August for shrubs LC (Figure 6b,d).

#### 3.3.2. Quantile Regression

Results from the quantile regression analysis of the relationship between log FRP (MW) and biophysical parameters (LST, LST anomalies, (LST-T2m), and SMA anomalies) are presented on Figure 8a–d.

These graphs can quite comprehensively characterize the dependences between each pair of quantities. QR examples of the "all LC" type are here presented. Figure 8a shows at which LST values the maximum FRP (MW) might be expected and the corresponding probability level the FRP might reach the maximum. Accordingly, the QR in Figure 8a shows how the log(FRP, MW) maximum depends on the predictor LST, with a confidence of 5%, 25%, and other levels of significance. The QR lines indicate that the high values of the log(FRP, MW) > 5 can be expected to occur at LST = 35 ◦C with a 25% probability level; extreme values of log(FRP, MW) > 7 are possible with 5% reliability at LST above 40 ◦C, and with 10% reliability above 45 ◦C.

On the plot of Figure 8b, the probability level of FRP maxima at a specific temperature difference (LST-T2m) is illustrated. Very high values of log(FRP, MW) > 6 can be expected to occur at (LST-T2m) = 5 ◦C with 10% reliability; extreme values of log(FRP, MW) > 7 are possible at a 5% significance level for (LST-T2m) above 12 ◦C. This illustrates how the log(FRP, MW) maximum depends on the predictor (LST-T2m), with confidence levels of 5%, 10%, and other levels of significance.

The QR between FRP and LST anomalies (Figure 8c) shows that very high values of log(FRP, MW) > 6 can be expected to occur at negative LST anomalies, with a probability of less than 10%. Extreme values of log(FRP, MW) > 7 are not possible even at a 5% significance level, with negative LST anomalies. In parallel, higher FRP energy occurs at negative SMAI anomalies (Figure 8d). The decreasing trend of energy released, along with the decreasing values of negative or low positive anomalies, i.e., decreasing drought (agri-

cultural and ecological), at corresponding probability levels are shown. Extreme values of log(FRP, MW) > 7 are possible at 10% significance with negative SMAI anomalies.

**Figure 8.** Quantile regression of the total energy released by biomass burning (all land cover types) in Bulgaria (June–September 2007–2018): log (Total FRP-Pixel, MW) vs. (**a**) LSASAF LST 0900 UTC; (**b**) (LSASAF LST-T2m) temperature difference; (**c**) LSASAF LST anomaly (2007–2018); (**d**) SMAI anomaly at 20 cm soil depth. Regression lines for 50%, 75%, 90%, and 95% quantile are shown.

Such QR graphs have been developed for forest, shrubs, and cultivated vegetation types and confirm similar relations for the fire activity over the whole country. The results obtained for some smaller samples over specific regions of Bulgaria show an intersection of the probability lines that is an indication of insufficient data.

#### 3.3.3. Correlation Analyses

The results of the least-squares method applied for the quantitative evaluation of the relation between biogeophysical indexes (LST, LST anomaly, (LST−T2m) temperature difference, including the SMAI anomaly at 20, 50, 100 cm depth) and the severity of wildfire events are presented in Table 4. Monthly means LST at around 0900 UTC, which is 1200 local time (selected as a less cloudy period), are considered. For all land cover types (all biomes included in a sample), exponential regression lines fit the relations at high R<sup>2</sup> values and high significance levels (Figure 9).

**Table 4.** Coefficients of determination R2 of the regression between the energy of biomass burning, FRP (MW), and the monthly mean values of biophysical indexes (July–August, 2007–2018). Only days with fire detections are considered.


**Figure 9.** Regression models of the total energy released by biomass burning in Bulgaria according to the FRP-Pixel product, MW vs. LSASAF LST 0900 UTC for: (**a**) July; (**b**) August; (**c**) September, and vs. LSASAF LST anomaly (2007–2018) for: (**d**) July; (**e**) August; (**f**) September. Examples for cultivated land cover are presented.

The SMA index is especially efficient as a measure of fuel dryness, taken at 50 cm or 100 cm soil depth, for all cover types in the period of July–August. With the progressive establishment of dry conditions during August and September, the correlation decreases when the SMA at 20 cm and 50 cm becomes fully exhausted, and the SMAI is equivalent to a constant value of zero.

Exponential regression models better fit the FRP (MW)–LST relations from Table 4. Examples for cultivated LC are shown in Figure 9a–c, indicating high significance levels (*p* < 0.007) of the dependences for July, August, and September. Similar exponential models for the regression of FRP (MW) versus the LST anomalies, R2 at about 0.6 (*p* < 0.003), are presented in (Figure 9d–f). Results confirm a statistically significant high correlation between biophysical drivers (LST, LST anomalies) and the wildfire severity.

All these results suggest that the biophysical index LST and the related parameters LST anomalies and (LST-T2m) are sensitive to the dynamics of the occurrence and severity of vegetation fires in all studied LC types.

#### 3.3.4. Spatial Pattern and Trends

To characterize fire activity, plots of monthly sums of FRP detections are considered. Figure 10 shows a color-coded map of accumulated fire pixels in the period of June–September 2004–2019. The fire numbers are categorized into the following classes: 1–30 (low, yellow color), 30–90 (moderate, orange color), 90–150 (high, red color), and >150 (extreme, dark red). Thus, the spatial distribution of the spots with higher fire activity is revealed. The "+" sign on the map indicates that the trend passed the Mann–Kendall significance test at the 5% level; thus, regions with a statistically significant positive trend in fire activity over the studied period are localized.

**Figure 10.** Map of "hot spots" of fire activity over Bulgaria, SE Europe based on long-term satellite observations (June–September, 2004–2019) using the LSASAF FRP-Pixel product. The fire numbers are the sum of the fire pixels (4 × 5 km MSG resolution over Bulgaria) detected and resampled in a 10 × 10 km plot. The "+" sign indicates locations where the trend passed the Mann–Kendall significance test at the 5% level.

The long-term trends of fire activity (number of detections) are different when it is delineated into different land cover types (superposition of Figures 1b and 10). These considerations show that there is no positive trend of fire activity in regions with forest land cover types. There are limited hot spots for fires, located in the northern and southeastern parts of Bulgaria, indicating an increasing trend in fire activity that is mostly scattered in the shrublands. Positive trends are also seen in limited areas of cultivated land cover types, but in this case, it is rather a result of human activities than a climate trend.

#### **4. Discussion**

The results of our study confirm the driving role of land surface temperature and the related biophysical parameters in the spatial–temporal evolution of fire activity in Eastern Mediterranean as well as in forcing the vulnerability of forest, shrubs, and cultivated land cover types to wildfire. Based on the 2007–2018 summer time series, the trend analyses (Section 3.2.1) show the synchronized behavior of annual LST values with the released FRP energy (MW) from biomass burning (respectively to the number of fire pixels). The analysis of satellite retrievals from geostationary satellite data show a statistically significant declining trend for the wildfire characteristics, consistent with the course of LST for all considered vegetation types (Figure 3).

The box-plot statistics (Figure 6a,c) show that the median of the FRP (MW) energy released from forest fires in the summer course corresponds well to the behavior of the LST median, which confirms the relationship between the biophysical index LST and fire activity. All fires in the period of July–September occurred in days associated with positive values of LST anomalies, while the other 25% of the fire detections in days associated with negative anomalies of LST occurred in June at forest and shrubs LCs (Figure 7a,b).

A high correlation between the monthly mean values of the biophysical indexes (LST, LST anomalies, LST-T2m, and SMA) and FRP-detected fire pixels was obtained. Exponential regression models fit the relations for all LC types at high significance levels (Section 3.3.3). This result is a step forward from the findings of Song [21], who stressed on the spatial–temporal information provided by LST anomalies (2001–2019 time series over Australia) on fire activity, confirming a significant trend before the fire events but not establishing a linear relationship with fires.

The values of LST, LST anomalies, and LST-T2m, for which the maximum FRP (MW) values might be expected at the corresponding probability level, were statistically estimated (Section 3.3.2) as being able to contribute to fire preparedness activities. For example, extreme values of log(FRP, MW) > 7 were possible at a 5% significance level for (LST-T2m) above 12 ◦C. A previous study reported that forest fires in August over Bulgaria might occur under land surface drought conditions defined by middy (MSG LST-T2m difference) >12 ◦C prior to the fire [62].

These dependences obtained for the Eastern Mediterranean confirm and extend the reported results in the literature that land surface state can significantly contribute to the enhancement of weather impacting on the fire environment such as drought (long-term and/or short-term), terrain, and fuel conditions [62,75,76]. Plants constitute the main ignition material in the landscape, and their moisture content plays an important role because it may serve to retard ignition or mitigate the propagation of a fire [77–80]. For this reason, the fuel moisture content is a common component of fire danger and related fire regime assessments; e.g., in [81,82]. Skin temperature is an important disclosure of fuel moisture content, and the results reported confirm the role of LST as a biophysical index of fire activity. The physical mechanism of LST–SMA relations and related fire occurrence covers a range of coupled biophysical processes: as a moist soil surface dries out, more of the incoming solar energy is reflected, and a larger fraction of the absorbed energy is used to heat the air and soil [83]. The heat flow into the soil increases at first, then decreases as the soil becomes very dry [84]. This results in increasing land surface temperatures, which also influences the rate of drying and evapotranspiration. As a result, high LST is a cause and product of dry periods; in other words, drought begets drought and extreme fire activity.

Land surface temperature explains 80% of the variance in air temperature, and vegetation density also plays an important role in explaining the air temperature variance [85]. Since only data from days with detected fires in the summer months were considered, most of the results of the statistical analyses are related to conditions of limited SMA for the vegetation cover. In such conditions, the (LST-T2m) parameter is a measure of the evapotranspiration rate, and relations are specified by atmosphere/land weather and climate conditions as well as by physiological and structural features of the LC [86,87] Hence, the trend of increasing (LST-T2m) from June to September (see Figure 7c,d) is related to gradually decreasing evapotranspiration during the summer months due to the gradual decrease of SMA to vegetation cover on the one hand, and/or the increasing LST on the other.

All these results suggest that the biophysical index LST and the related parameters of LST anomalies and (LST-T2m) are sensitive to the dynamics of vegetation fire occurrence and severity. The dependences are valid for forest, shrubs, and cultivated LCs at high significant levels and indicate that satellite IR retrievals of radiative temperature is a reliable source of information for vegetation dryness and fire occurrence. Some existing mismatches are due to highly inhomogeneous land cover from one side and the complicated nature of their biogeophysical relations. LST is directly linked to precipitation, cloudiness, and solar irradiance, while SMAI is influenced by the cumulative effect of these meteorological parameters and with functional links to vegetation. In the case of when the period of moisture depletion is long, the LST-SMA relation is weak because the LST continues to increase; on the other hand, after full SM depletion, the "SVAT\_bg" model does not assess any further SM changes. In terms of the model simulations, full SM depletion denotes an interruption of liquid and water vapor flow because SM has reached its constant minimum value (at specific soil/climate), which is set to be the maximum hygroscopic value [69].

In addition, there are some limitations in the remote sensing approach for assessing the relation of fire activity to the distribution of its biophysical drivers. Such cases may be due to different abilities of the satellite observations and the derived FRP product to estimate actual emitted energy from wildfire in different LC types, and this also depends on the biophysical status of the vegetation during the fire season. This ability of FRP is significantly different as compared to wildfires in the forest and shrubs LCs in June (reported here, Table 2 and Figure 6a,b) due to the specific physical properties of the forest canopy, which leads to larger amounts of radiant energy to be intercepted. This in turn makes some of the low-energy forest fires impossible to be detected by the satellite. In August and September, with the decrease in SMA and the fraction of vegetation cover, the canopy biophysical properties changed so that a larger amount of energy released by forest fires could be evaluated by satellite measurements, and most of the forest fires could be detected as well.

Although landscape fires in Mediterranean countries are mainly caused by human activities related to agricultural practices, this study confirms that fire regime components (occurrence and severity) exhibit characteristic spatial and temporal patterns that reflect differences in the relative importance of various environmental drivers. In this regard, an identification of the regions most vulnerable to biomass burning is of importance for some prevention activities. Different land covers have widely differing flammability, which depends on species composition, stand age and density, microclimate, and soil conditions [31,88]. For this reason, a detailed discrimination of the land cover types is needed in studying short-term weather and land surface state influences on the fire ignition and spread. In our study, the LC typology is simplified to consider three types: forests, scrubland, and cultivated; this is because relationships between monthly mean values of the physical indexes and fire activity are explored over long-term 12-year periods. Moreover, the forest LC is considered to be spatially stable throughout all 12 years while more land cover changes may occur in the category scrubland during the observation period, and more detailed classification would be difficult to apply. Our qualitative comparative analysis of the ESA CCI LC maps for 2006–2015 does not show any significant changes in the land cover over Bulgaria for the test period.

Recent concerns about the potential increases of forest fires under climate change underline the importance of fire–climate feedbacks [89–91]. In this context, an important result of our study based on a dataset of satellite FRP detections from the last 16 years is that no positive trend in fire activity for forest land cover type has been observed over Bulgaria. There are significant increasing trends in fire activity for single spots in the low land areas of shrubs LC types in the northern and southeastern part of Bulgaria. A reasonable explanation for these results is that the forests are a more stable environment over a longterm period, which also confirms the validity of our approach for the simplification of the LC variety into three main types for the purposes of the current climate study.

#### **5. Conclusions**

This study quantified the relationship between the spatial–temporal variability of land surface temperature (LST) and fire activity on a short-term climatic scale over the Eastern Mediterranean (Bulgaria), accounting for physical properties such as land cover and soil moisture that, combined with LST, provide a valuable metric for surface state. Based on satellite observations from the geostationary Meteosat LSASAF-FRP-Pixel product (summer season 2007–2018), the distribution of fire activity on a monthly/annual basis in relation to the biophysical forcing effect of LST (assessed by IR MSG satellite measurements) was observed in land cover types of different fire vulnerability (forest, shrubland, cultivated).

A synchronized annual behavior between LST and FRP with a declining trend (2007–2018) in wildfire characteristics was seen. Exponential models fit the relationships of LST monthly means, LST anomalies, and LST-T2, as a first proxy of sensible heat exchange with atmosphere, as well as SMA with FRP fire characteristics (MW). All fires in the period of July–September occurred in days associated with positive LST anomalies.

The values of LST, LST anomalies, and LST-T2m, for which the maximum FRP energy (MW) might be expected at corresponding probability levels, were estimated as being capable of contributing to fire preparedness activities. For example, monthly mean extremes of log(FRP, MW) > 7 could be observed with a 5% reliability at an LST monthly mean above 40 ◦C and (LST-T2m) above 12 ◦C, and with a 10% reliability above 45 ◦C. The reported biophysical forcing effects of LST on vegetation fires provided more understanding of the relationships between drought and wildfire; more specifically, of how drought is related to fire danger outputs. Since forest fires have the potential to affect regional climate through changes in the energy budget [91–93], this knowledge is especially important for the Mediterranean region, where land–atmosphere coupling has become one of the important aspects of global environmental change.

To advance the added value of this study by using LST as a biophysical index of drought for fire management, further evidence in support of this approach is observed: First, the use of related FRP products from the MODIS sensor to validate the usefulness of this approach for short-term applications; Second, the evaluation and adaptation of the LST applications in the operational mode for the analyses of real fire situations in the scope of early warnings of fire risk assessment, thus contributing as the final step in the fire management practice.

**Supplementary Materials:** The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/rs14071747/s1, Figure S1: (**a**) FRPdetections & LST.shrubs. (**b**) FRPdetections & LST.cultiv; Figure S2: (**a**) July and September 2007–2018 Shrubs LC. (**b**) August 2007–2018 Shrubs LC; Figure S3: (**a**) FRP-Pixel\_detections.SMA\_anomaly\_at\_50cm\_depth.forest.montly.7; (**b**) FRP-Pixel\_detections.SMA\_anomaly\_at\_50cm\_depth.forest.montly.8; (**c**) FRP-Pixel\_detections. SMA\_anomaly\_at\_50cm\_depth.shrubs.montly.7.

**Author Contributions:** Conceptualization, J.S.S.; methodology, J.S.S. and C.G.G.; software, C.G.G. and P.N.N.; formal analysis, C.G.G. and P.N.N.; investigation, J.S.S.; writing—original draft preparation, J.S.S.; writing—editing, J.S.S. and C.G.G.; project administration, J.S.S.; funding acquisition, J.S.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was partially funded by EUMETSAT LSASAF Project, CDOP-3, 2017–2022.

**Data Availability Statement:** Data is available at EUMETSAT Product Navigator. Available online: https://navigator.eumetsat.int/start (accessed on 24 February 2022).

**Acknowledgments:** Data processing and preparation of illustration material in Sections 2.5 and 3.1 were performed by Andrey Kulishev within the frame of the LSASAF project implementation. LSASAF is acknowledged for providing the requested LST and FRP data. Data in Section 2.3 have been kindly provided by the State Forest Agency of Bulgaria. Thanks are due to the anonymous reviewers for the valuable suggestions and comments.

**Conflicts of Interest:** The authors declare no conflict of interest. The funder had no role in the design of the study; in the analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


## *Article* **Multilabel Image Classification with Deep Transfer Learning for Decision Support on Wildfire Response**

**Minsoo Park 1, Dai Quoc Tran 1, Seungsoo Lee <sup>2</sup> and Seunghee Park 1,3,\***


**Abstract:** Given the explosive growth of information technology and the development of computer vision with convolutional neural networks, wildfire field data information systems are adopting automation and intelligence. However, some limitations remain in acquiring insights from data, such as the risk of overfitting caused by insufficient datasets. Moreover, most previous studies have only focused on detecting fires or smoke, whereas detecting persons and other objects of interest is equally crucial for wildfire response strategies. Therefore, this study developed a multilabel classification (MLC) model, which applies transfer learning and data augmentation and outputs multiple pieces of information on the same object or image. VGG-16, ResNet-50, and DenseNet-121 were used as pretrained models for transfer learning. The models were trained using the dataset constructed in this study and were compared based on various performance metrics. Moreover, the use of control variable methods revealed that transfer learning and data augmentation can perform better when used in the proposed MLC model. The resulting visualization is a heatmap processed from gradient-weighted class activation mapping that shows the reliability of predictions and the position of each class. The MLC model can address the limitations of existing forest fire identification algorithms, which mostly focuses on binary classification. This study can guide future research on implementing deep learning-based field image analysis and decision support systems in wildfire response work.

**Keywords:** wildfire response; multilabel classification; data augmentation; decision support systems; transfer learning

#### **1. Introduction**

Wildfires have become increasingly intense and frequent worldwide in recent years [1]. A wildfire not only destroys infrastructure in fire-hit areas and causes casualties to firefighters and civilians but also causes fatal damage to the environment, releasing large amounts of carbon dioxide [2]. To minimize such damage, decision makers from the responsible agencies aim to detect fires as quickly as possible and to extinguish them quickly and safely [3]. Wildfire response is a continuous decision-making process based on a variety of information that is constantly shared in a spatiotemporal range, from the moment a disaster occurs to when the situation is resolved [4]. Efficient and rapid decision making in urgent disaster situations requires the analysis of decision-support information based on data from various sources [5].

Video and image data are key factors for early detection and real-time monitoring to prevent fires from spreading to uncontrollable levels [6]. Over the past few decades, the use of convolutional neural networks (CNNs) in image analysis and intelligent video surveillance has proven to be faster and more effective than other sensing technologies in

**Citation:** Park, M.; Tran, D.Q.; Lee, S.; Park, S. Multilabel Image Classification with Deep Transfer Learning for Decision Support on Wildfire Response. *Remote Sens.* **2021**, *13*, 3985. https://doi.org/10.3390/ rs13193985

Academic Editor: Elena Marcos

Received: 8 September 2021 Accepted: 4 October 2021 Published: 5 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

minimizing forest fire damage [7]. Nevertheless, several problems must be addressed in forest fire detection and response when using CNNs.

The first problem is that most of the research on wildfires using deep-learning-based computer vision is mainly limited to binary classifications, such as the classification of wildfire and non-wildfire images [8]. In other words, these models are focused on detecting forest fires but ignore other meaningful information, such as information on the surrounding site. Even though a wide range of regions can be filmed via unmanned aerial vehicles (UAVs) or surveillance cameras, information provisions for decision makers are limited in this single-label classification model environment because only one type of result can be obtained from one instance. Unlike single-label classification or multiclass classification, where the classification scheme is mutually exclusive, multilabel classification (MLC) does not have a specified number of labels per image (instance), and the classes are non-exclusive. Therefore, the model can be trained by embedding more information in a single instance [9].

In the context of wildfire response, information is shared to establish a common understanding of wildfire responders regarding the disaster situations they encounter [10]. Information on the disaster site is a vital data source that must be shared to enable timely and appropriate responses. Therefore, information concerning human lives and property at the site of occurrence must be considered to ensure effective and optimized response decisions by decision makers [11].

Another important problem is that the performance of the learning model can be degraded by overfitting owing to insufficient data [12]. The lack of large-scale image data benchmarks remains a common obstacle in training deep neural networks [13]. However, transfer learning and data augmentation can significantly enhance the predictive performance of binary classification models to overcome image data limitations. In particular, transfer learning (with fine-tuning of pretrained models) improves accuracy compared to scanarios when the parameters of the model are initialized from scratch (i.e., without applying transfer learning) [14].

The main purpose of this study was to develop a decision support system for wildfire responders through the early detection and on-site monitoring of wildfire events. A decision support system should perform two functions: (1) process incoming data and (2) provide relevant information [15]. The received data are limited to image data from an optical camera, and the results of deep learning can be analyzed. Therefore, we propose a transfer learning approach for the MLC model to address the following challenges:


In this study, it is significant that MLC was used to provide multiple pieces of information within the image frame, away from the binary or multi-class classifications mainly covered in previous studies. The reason for using this multi-information framework is to share various pieces of information at disaster sites with disaster responders in near-real time. In the model configuration, we tried to lower the error rate as much as possible by using data augmentation, transfer learning, by adding similar data, and cross validation. In order to minimize the resolution gap between the CNN input model and the actual captured image, a method of dividing and evaluating the image was attempted.

The backbone network of the MLC was constructed using VGG16 [16], ResNet50 [17], and DenseNet121 [18], which are mainly used in CNN-based binary classification. These models were retrained on a dataset built by researchers and were validated using 10-fold cross-validation. The size of the dataset used in the training model was increased by data

augmentation to overcome the limitations caused by a lack of data. Finally, the model with the best performance among the three models was selected using the evaluation metric, and the result was visualized as a class activation map (CAM).

The remainder of this paper is organized as follows: Section 2 briefly summarizes previous studies on wildfire detection and response using image data and decision support systems. Section 3 presents the multilabel image classification, transfer learning model, and evaluation methods. The results of relevant experiments are analyzed and discussed in Section 4. Finally, the conclusions are presented in Section 5.

#### **2. Related Work**

Effective disaster management relies on the participation and communication of people from geographically dispersed organizations; therefore, information management is critical to disaster response tasks [19]. Because forest fires can cause widespread damage depending on the direction and speed of the fire, strategic plans are required to ensure prioritization and resource allocation to protect nearby homes and to evacuate people. In the past, limitations in data collection techniques constrained these decision-making processes, making them dependent on the subjective experience of the decision-maker [20]. Recent advances in information technology have led to a sharp increase in the amount of information available for decision making. Nevertheless, human capability in information processing is limited, and it is problematic to process information acquired at the scene of a forest fire timely and reliably. To solve this problem, a forest fire decision-support checklist for the information system was developed [21], and machine-learning-based research has steadily increased in the field of forest fire response and management since the 2000s [11]. Analyzing wildfire sites with artificial intelligence can substantially reduce the response time, decrease firefighting costs, and help minimize potential damage and loss of life [5].

Traditionally, wildfires have mainly been detected by human observations from fire towers or detection cameras, which are difficult to use owing to observer errors and time– space limitations [21]. Research on image-based automated detection that can monitor wildfires in real-time or near-real-time according to the data acquisition environment using satellites and ground detection cameras has been steadily increasing over the past decade [22]. Satellites have different characteristics depending on their orbit, which can be either a solar synchronous orbit or a geostationary orbit. Data from solar synchronous orbit satellites have a high spatial resolution but a low time resolution, which limits their applicability in cases of forest fires. Conversely, geostationary orbit satellites have a high temporal resolution but a low spatial resolution. According to previous studies, geostationary orbit satellites can continuously provide a wide and constant field-of-view over the same surface area; however, many countries do not have satellites owing to budget constraints, atmospheric interference, and low spatial resolution [23]. Therefore, satellites are not suitable for the early detection of small-scale wildfires [24]. On the other hand, small UAVs or surveillance cameras incur much lower operating costs than other technologies [25], offer high maneuverability, flexible perspectives, and resolution and have been recognized for their high potential in detecting wildfires early and for providing field information [26].

Previous studies combined image data and artificial intelligence methods to improve the accuracy of forest fire detection or to minimize the factors that cause errors. Damage detection studies often face the problem of data imbalances [27], which previously relied only on images downloaded from the Web and social media platforms [28,29]. Online image databases, such as the Corsican Fire Database, have been used for binary classification as a useful test set for comparing computer vision algorithms [30] but are still not available in MLC. Recent studies have demonstrated its effectiveness using data augmentation or transfer learning for the generalization of the performance of CNN models [31] and have shown its potential in object detection or MLC fields.

Because neural networks cannot be generalized to untrained situations, the importance of the dataset has been steadily emphasized to improve the performance of the model. During model verification, the smoke color and texture are too similar to other natural phenomena such as fog, clouds, and water vapor, and because it is difficult to detect smoke during the night, algorithms relying on smoke detection generally cause problems such as high false alarm rates [31,32]. The current study was conducted by including the objects that could not be differentiated in the dataset.

#### **3. Materials and Methods**

#### *3.1. Data Augmentation*

Data augmentation is the task of artificially enlarging the training dataset using modified data or synthesizing the training dataset from a few datasets before training the CNN model, which lowers the test error rate and significantly improves the robustness of the model to avoid overfitting. The most popular and proven effective current practices for data augmentation are affine transformation, including the rotation and reflection of the original image and color modification, including brightness transformation [33]. In this study, the image dataset was pre-processed in terms of reflection, rotation, and brightness, which are commonly used data augmentation techniques in previous studies to increase the richness of the training datasets.

#### *3.2. Transfer Learning*

Transfer learning is another approach to prevent overfitting [34]. It is a machine learning method that uses the weights of the pretrained models as weights for the initial or intermediate layers of the new objective model. In computer vision, transfer learning refers mainly to the use of pretrained models. This method is widely used to handle tasks that lack data availability [35]. There are two representative approaches for applying a pretrained model, called a fixed feature extractor and fine-tuning. The fixed feature extractor is a method of learning only the fully connected layer in a pretrained model and fixing the weights of the remaining layers. It is mainly applied when the amount of data is small, but the training data used for pretraining are similar to the training data of the target model. This approach is uncommon for the deep learning of damage detection areas, such as wildfire monitoring images, because of the dissimilarity between ImageNet and the given wildfire images.

On the other hand, fine-tuning not only replaces the fully connected layers of the pretrained model with a new one that outputs the desired number of classes to re-train from the given dataset but also fine-tunes all or part of the parameters in the pretrained convolutional layers and pooling layers by backpropagation. It is used when the amount of data is sufficient, even if the training data are not similar. This is shown in Figure 1.

The pretrained CNN model from ImageNet [36], which contains 1.4 million images with 1000 classes, is used for transfer learning. However, as there are no labels similar to flame or smoke or other on-site images to assist disaster response in the ImageNet label, fine-tuning is introduced.

#### *3.3. Multilabel Classification Loss*

Cross-entropy is defined as the calculation of the difference between the two probability distributions p and q, i.e., error calculations. Cross entropy is used as a loss function in machine learning. However, our framework uses binary cross entropy (BCE), which has commonly been used in the loss function for multilabel classification. The CNN model performs training by adjusting the model parameters such that probabilistic prediction is as similar to ground-truth probabilities as possible through the BCE. In other words, the probability of the output and the target similarly adjusts the model parameters. The BCE loss is defined by the following equation:

$$\mathcal{L}\_{\text{BCE}} = -\frac{1}{N} \sum\_{i=1}^{N} [p(y\_i) \log q(y\_i) + \{1 - p(y\_i)\} \log \{1 - q(y\_i)\}],\tag{1}$$

where *N* denotes the total count of images, *p*(*yi*) denotes the probability of class *yi* in the target, and *q*(*yi*) denotes the predicted probability of class *yi*.

**Figure 1.** Transfer learning architecture with fine-tuning using the pretrained CNN model initialized with the weights trained from ImageNet.

#### *3.4. Proposed Network*

The MLC model used in this study consists of a backbone network pretrained on ImageNet and fully connected layers. In multilabel classification, the training set consists of instances associated with the label set, and the model analyzes the training instances with a known label set to predict the label set of unknown instances. Figure 2 shows an example of a framework for an MLC-based proposed model with DenseNet121 as the backbone. The fully connected layer included dropout [37] and batch normalization [34]. The order of dropout, batch normalization, and rectified linear units (ReLU) were constructed based on the methodology of Ioffe [34] and Li [38].

Six classes were printed out, and the model was configured to achieve the following goals required for disaster response during the event of a forest fire: (a) check whether a forest fire has occurred, (b) detect smoke for the early detection of fires, (c) detect the burning area for extinguishing, and (d) detect the areas where human or property damage may occur.

In this study, we selected each of the three pretrained models mentioned above as a backbone network. An MLC model was constructed to provide information that can be supported for wildfire response from CCTV or UAV images. Finally, we compared the performance of each model.

**Figure 2.** Proposed approach pipeline: framework of multilabel classification with transfer learning from DenseNet-121 as a backbone network. The probabilities generated by the sigmoid function were independently output at the end of the neural network classifier.

#### *3.5. Performance Metrics*

Instances in single-label classification can only be classified correctly or incorrectly, and these results are mutually exclusive. However, the classification schemes in multilabel classification are mutually non-exclusive: in some cases, the predicted results from the classification model may only partially match the elements of the real label assigned to the instance. Thus, methods for evaluating multilabel models require evaluation metrics specific to multilabel learning [39]. Generally, there are two main groups of evaluation metrics in the recent literature: example-based metrics and label-based metrics [40]. Labelbased measurements return macro/micro averages across all labels after the performance of the training system, for each label is calculated individually, whereas example-based measurements return mean values throughout the test set based on differences in the actual and predicted label sets for all instances. To evaluate the performance of each model and to verify the effectiveness of transfer learning and data augmentation, this study used macro/micro average precision (PC/PO), macro/micro average recall (RC/RO), and macro/micro average F1-score (F1C/F1O). The abbreviations for evaluation metrics are based on the notations of Zhu [41] and Yan [9]. The metrics are defined as follows:

$$PC = \frac{1}{q} \sum\_{\lambda=1}^{q} \frac{TP\_{\lambda}^{q}}{TP\_{\lambda}^{q} + FP\_{\lambda}^{q}} \tag{2}$$

$$RC = \frac{1}{q} \sum\_{i}^{q} \frac{TP\_{\lambda}^{q}}{TP\_{\lambda}^{q} + FN\_{\lambda}^{q}} \tag{3}$$

$$PO = \frac{\sum\_{\lambda=1}^{q} TP\_{\lambda}}{\sum\_{\lambda=1}^{q} (TP\_{\lambda} + FP\_{\lambda})} \tag{4}$$

$$RO = \frac{\sum\_{\lambda=1}^{q} TP\_{\lambda}}{\sum\_{\lambda=1}^{q} (TP\_{\lambda} + FN\_{\lambda})} \tag{5}$$

$$F1\text{C} = \frac{2 \ast PC \ast RC}{PC + RC} \tag{6}$$

$$F1O = \frac{2 \star PO \star RO}{PO + RO} \tag{7}$$

In the above equations, *TP*, *FP*, and *FN* denote true positives, false positives, and false negatives, respectively, as evaluated by the classifier.

Macro averages are used to evaluate the classification model on the average of all of the labels. In contrast, the micro average is weighted by the number of instances of each label, which makes it a more effective evaluation metric on datasets with class imbalance problems. The *F*1 score is a harmonic average that considers both precision and recall. Therefore, the *F*1 score is generally considered a more important metric for comparing the models. In addition, the datasets for MLC generally suffer from data imbalance, and thus, micro-average-based metrics are considered important.

In addition, this study used Hamming loss (*HL*) and mean average precision (*mAP*), which are represented by example-based matrices. These metrics are defined as follows:

$$\text{Harmonic Loss} = \frac{1}{|D|} \sum\_{i=1}^{|D|} \frac{|Y\_i \Delta Z\_i|}{|L|} \tag{8}$$

$$mAP = \frac{1}{|L|} \sum\_{i=1}^{|L|} AP\_i \tag{9}$$

In the above equations, |*D*| is the number of samples, |*L*| is the number of labels, and *APi* is the average map of label *i*. Hamming loss is the ratio of a single misclassified label to the total number of labels (considering both the cases when incorrect labels are predicted and when associated labels are not predicted) and is one of the best-known multilabel evaluation methods [42]. The mean average precision was the mean value of the average precision for each class.

#### *3.6. Class Activation Mapping*

In the CNN model, the convolutional units of various layers act as object detectors. However, the use of fully connected layers causes a loss in the localizing features of these objects. Class activation mapping (CAM) [43] is used as a CNN model translation method and is a popular tool for researchers to generate attention heatmaps. A feature of CAM is that the network can include the approximate location information of the object even though the network has been trained to solve a classification task [8]. To calculate the CAM value, the fully connected layer is modified with the global average pooling layer (GAP). Subsequently, a fully connected layer connected to each class is attached and fine-tuned. However, it has a limitation in that it must use a GAP layer. When replacing the fully connected layer with GAP, the fine-tuning of the rear part is required again. However, CAM can only be extracted for the last convolutional layer.

Gradient-weighted class activation mapping (Grad-CAM) [44] solves this problem using a gradient. Specifically, it uses the gradient information coming into the last convolutional layer to take into account the importance of each neuron to the target label. In this study, Grad-CAM was used to emphasize the prediction values determined by the classification model and to visualize the location of the prediction target.

#### **4. Results**

This section presents the learning process and test results of the MLC model to support wildfire responses. Experiments were conducted in a CentOS (Community Enterprise Operating System) Linux release 8.2.2004 environment with Nvidia Tesla V100 GPU, 32 GB memory, and models were built and trained using PyTorch [45], a deep learning opensource framework.

#### *4.1. Dataset*

The dataset used to train and test the deep learning model contained daytime and nighttime wildfire images captured by surveillance cameras or drone cameras downloaded from the Web and cropped images of a controlled fire in the forest captured by a drone by the researchers. This study also included a day–night image matching (DNIM) dataset [46], which was used to reduce the effects of day and night lighting changes, and Korean tourist

spot (KTS) [47] datasets generated for deep learning research, which comprise images linked by forest labels containing important wooden cultural properties in the forest. Additionally, wildfire-like images and 154 cloud and 100 sun images were also included as datasets because they have similar properties with early wildfire smoke and flames as the color or shape and are often detected erroneously. As such, they were included in the training dataset to prevent predictable errors in the verification stage and to train the robust model against wildfire-like images.

The collected images were resized or cropped to 224 × 224 pixels to consider whether the model is applicable to high-definition images. The datasets included 3,800 images. Figure 3 shows samples of the images.

**Figure 3.** Resized sample of annotated images for MLC: (**a**) downloaded from the Web; (**b**) KTS dataset; (**c**) DNIM dataset; (**d**) dataset for error protection purposes; (**e**) dataset of a controlled fire captured by researchers.

All instances were annotated according to the following classes: "Wildfire", "Non-Fire", "Flame", "Smoke", "Building", and "Pedestrian" (each class was abbreviated as "W", "N", "F", "S", "B", and "P", respectively). Table 1 lists the number of images for each designated label set before data augmentation. It consists of 2165 images downloaded from the Web, 1000 images from the KTS dataset, 101 images from the DNIM dataset, 254 images for error protection purposes, and 280 cropped images captured by the researchers. To ensure the annotation quality and accuracy, all of the annotated images were checked twice by different authors.



#### *4.2. Data Partition*

The dataset used for the experiment was divided into train, validation, and test sets. The test dataset included 2280 images from the entire dataset. The remaining 1520 images were pre-processed by data augmentation techniques, such as rotation, horizontal flip, and brightness, which are typically used in CNN image classification studies to secure sufficient data for learning, as shown in Table 2. Table 1 also lists the number of images for each designated label set after augmentation. Overall, the non-fire label group was the

most common, and as the number of multi-labels increased, the number of the label group decreased. In particular, the number of label groups in which pedestrians and houses exist in the wildfire site, which is difficult to obtain, has the lowest number.

**Table 2.** Number of images generated by each data augmentation method.


Since drones are generally not perpendicular to the horizon or are inverted when photographing wildfires, the rotation was not set up as extreme, such as at 90◦ or 180◦, but was instead set up between 10◦ and 350◦, considering the lateral tilt of the drone. In addition, if the brightness of the image is too high or too low, the boundary line of the objective target becomes unclear, and the object becomes ambiguous. Therefore, data enhancement was performed between the maximum brightness *l* = 1.2 and the minimum brightness *l* = 0.8. After data augmentation, the training and test datasets were divided in a ratio of 4:1. In the model learning phase, 912 randomly sampled instances from the training dataset were evenly divided into 10 groups for evaluation using the cross-validation strategy.

The total number of classes of the prepared data was checked, and the distribution is shown in Figure 4. Due to the nature of wildfire response, most of the early detection was performed by smoke, so the number of smoke classes was higher than the number of flame classes. In addition, the wildfire classes and non-fire classes also had an imbalance, and the building and pedestrian classes also had relatively few classes. Since there was an imbalance between the labeling classification table in Table 1 and the overall class distribution in Figure 4, the micro average-based metric evaluation index should be checked.

**Figure 4.** Histogram of class distribution on multi-label classification datasets after data pre-processing.

#### *4.3. Performance Analysis*

This study compared the models with different backbones and verified the efficiency of transfer learning and data augmentation. The model was constructed using training and validation sets that had been partitioned by a 10-fold cross-validation strategy, and the final performance was measured according to each performance metric from the test dataset. The initialized learnable parameters (i.e., hyperparameters) for the CNN-based MLC architectures are listed in Table 3.


**Table 3.** Parameters for each CNN architecture.

The models were trained using binary cross-entropy as a loss function with the selected parameters. Each model was trained using a 10-fold cross-validation strategy, and the results were calculated 10 times. The training process using the validation scheme of each model with the selected hyperparameter combination is illustrated in Figure 5. In the case of VGG-16, the training loss fell gently and started at a very high validation loss value, while the training loss of ResNet-50 and DenseNet-121 fell sharply to about epoch 10 at the initial stage and then remained close to zero. However, it was found that DensNet-121 remained lower in terms of the validation learning curve. In the final epoch, epoch 100, it was shown that both the training loss and validation loss recorded the lowest values in DensNet-121.

**Figure 5.** Learning curve over epochs (until 100). (**a**) Training learning curve. Final loss: VGG-16 (0.00789); ResNet-50 (0.00137); DenseNet-121 (0.00048). (**b**) Validation learning curve. Final loss VGG-16 (0.06790); ResNet-50 (0.03352); DenseNet-121 (0.00318).

The models were evaluated using the label-based performance metrics, which are shown in Figure 6 as a box plot. All of the proposed models showed good multilabel classification ability using images of forest and wildfire sites for disaster response, with high scores (above 0.9) for most of the evaluation metrics. Among the proposed models, DenseNet-121 not only showed a significantly higher score for all of the evaluation metrics (distribution of the highest box and median value) but the interquartile ranges of each metric result were also typically smaller (i.e., with fewer distributed results) than in other models. Thus, the model maintained consistently high performance over several tests. Table 4 presents the results of the evaluation measurements with the mean and standard deviation.

However, an evaluation that only uses label-based measurements cannot highlight the dependencies between classes. Therefore, Table 4 presents example-based scores that consider all of the classes simultaneously and thus are considered more suitable for multilabel problems. The mAP score for the best-performing model (DenseNet-121) was 0.9629, whereas the mAP score for HL was 0.009.

**Figure 6.** Boxplot of the 10-fold cross-validation results from performance metrics for MLC model with each backbone network.


**Table 4.** Compared performance scores of each backbone network (mean ± standard deviation).

In addition, the per-class score of the area under the receiver operating characteristic curve (ROC-AUC) values of our proposed models were calculated to determine the performance for each class in the image dataset. The ROC curve is a graph showing the performance of the classification model at all possible classification thresholds, unlike the recall and precision values that change as the threshold is adjusted. AUC is a numerical value calculated from the area under the ROC curve and represents the measure of separability. Therefore, the ROC-AUC is a performance metric that is more robust than other performance indicators. AUC values range from 0 to 1, where AUC = 0.5 indicates that the model performed a random guess, and thus, the prediction was the entirely unacceptable. The best performance is when AUC = 1, indicating that all of the instances are properly classified. Table 5 presents the results with mean and standard deviation values.

**Table 5.** ROC-AUC scores per class for each model (mean ± standard deviation).


The dataset for the MLC model includes pictures of fires or non-fires (because the results are mutually exclusive). Therefore, the results of two classes—"Wildfire" and "Nonfire"—are calculated in almost the same way. The results of the classes "Wildfire" and "Smoke" were also calculated similarly, as flames are inevitably accompanied by smoke, although this smoke may be invisible because some fires are small or obscured by forests. The accuracy of the pedestrian and building labels was low in all of the models, which

can be attributed to the relatively small number of labels assigned to the instances. It was confirmed that the ROC-AUC scores in all of the classes were generally high in the transfer learning algorithm using DenseNet-121 as a network.

Finally, to confirm the effect of transfer learning and data augmentation on the training model, we removed one data limit overcoming strategies each time using the control variable method and obtained the F1-score and the HL value. This method was implemented for DenseNet-121, which showed the highest performance. The training learning curve over epochs is illustrated in Figure 7. In the training stage, there was a significant difference in the slope of the learning cover curve when no data limit overcoming strategies were used and when one or more strategy was used; a steep learning curve was demonstrated with the strategies; a shallow learning curve was demonstrated without the strategies. When all of the strategies were used, the curve was formed the most rapidly, and the lowest final loss was calculated. This means other models require more practice or attempts before a performance begins to improve until the same level is reached. The curve produced in the case of using all of the strategies was formed the most rapidly, and the lowest final loss score was also calculated. There was no significant difference in the data augmentation and transfer learning effects when looking at the gradient slope or the final loss, but it was shown that the roughness of the curve was further reduced when data augmentation was used. In other words, learning was more stable.

**Figure 7.** Learning curve over epochs (until 100) trained by the control variable method. Final loss. Transfer learning and data augmentation (0.00048); data augmentation (0.01807); transfer learning (0.01909); none (0.05216).

Additionally, the test results determined by the evaluation metrics are listed in Table 6. The results of this experiment show that transfer learning can significantly improve multilabel classification performance. With the exception of the transfer learning strategy, the macro average F1-score decreased by 0.0745, the micro average F1-score decreased by 0.0466, and the HL increased by 0.0286. In the case where only augmentation was used, the macro average F1-score decreased by 0.1159, the micro average F1-score decreased by 0.0701, and the HL increased by 0.0412.

The performance of transfer learning was further reduced when trained only with the datasets with no data augmentation. Hence, the quantitative number of the datasets required for learning in MLC has a significant impact on the performance of the model.


**Table 6.** F1 scores and Hamming loss values for the models trained by the control variable method.

#### *4.4. Visualization*

To perform localization, a bounding box is drawn by the thresholding method, which retains over 20% of the grad-CAM result. It also provides a confidence score indicating the extent to which the model's predictions are true, considering a threshold value of 0.5. Figures 8 and 9 show an example of the results obtained with DenseNet-121 as a backbone.

**Figure 8.** Original image (first column) and the heatmap and bounding box (columns 2–5) for a well-classified example (**Case 1**); an example to review errors for objects with similar colors or shapes (**Case 2**); and an example of an error for a particular class (**Case 3**). Case 3 has an error for the building class. The number above each picture is the predictive confidence score.

**Figure 9.** Example of a heatmap and bounding box result for the class that has the possibility of an error. Confidence score (**a**): Wildfire (0.8117); Smoke (0.8158); Flame (0.0096); Non-Fire (0.1860); Building (0.0006); Pedestrian (0.9952). Confidence score (**b**): Wildfire (0.8117); Smoke (0.8158); Flame (0.0096); Non-Fire (0.1860); Building (0.0006); Pedestrian (0.9952).

> As shown in Figure 8, the sum of the confidence scores between the two classes is almost 100% because the wildfire and non-fire classes are mutually exclusive. Among the test datasets, a Case 1 image was selected as a sample of labeled wildfire with pedestrians, a Case 2 image was selected as a sample with a confusing object, such as sun or fog, that can be treated as a wildfire object for verification. Finally, a sample image with a night fire was selected to evaluate the model in nighttime conditions in Case 3.

> In Case 1, the model predicted smoke, flames, non-fire, and a pedestrian with confidence scores of 0.9464, 0.7642, 0.0539, and 0.8623, respectively. The heatmap and bounding box were separated from each other to express the location. Conversely, for non-fire that is not assigned to an instance, a heatmap map without fire or smoke was displayed in the bush area.

> Case 2 used an image taken at sunrise in a foggy mountainous area. All of the classes except for the non-fire class showed a score of 0.000, and it was possible to examine whether the model worked correctly to classify the sun and fog, which are frequently used for evaluating errors in wildfire detection.

> Case 3 was an image of a wildfire that occurred near a downtown area at nighttime, and fireworks were displayed nearby, which may have some effect on detection. For the wildfire class, a heatmap was formed even in an area unrelated to the wildfire (lower left), which was judged to have detected the smoke generated by the firecrackers on the image. However, although the lighting in the dark at night was considered in the model training process, the heatmap represented the residential area, but the reliability of the building class was still very low (0.0030).

> These results were similarly expressed in other test datasets, indicating that the accuracy of the classification was only improved if the CNN model was observable with the naked eye because only clear targets were labeled during the dataset preparation process.

Using an example, Figure 9 shows that the classification model is robust to a small object or noise in a photograph. As shown in Figure 9a, firefighters were dispatched to extinguish the fire in the forest, and nearby hikers were caught on camera. Although the human shape in Figure 9a looks very small, it is detectable with a 0.9952 confidence score, and an approximate location of the object was determined. Figure 9b shows a house with lights on and a nearby forest fire. Despite the similarity of the lamp to the fire image, the heatmap result did not recognize this part as a fire. Thus, the model looked at the appropriate part when identifying each class.

Finally, Figure 10 shows the influence of transfer learning and data augmentation. As discussed in the previous subsection, four cases were classified using the control variable method. For each case, the heatmap and bounding box of a specific class (smoke and person in Figure 10) were visualized, and the probability values were calculated. In the case of not using the data-shortage overcoming strategy, it was found that the heatmap represented the wrong place, and the class that was difficult to distinguish could not be detected at all. (If the value is less than 0.5, it is not treated as a detected value.) Therefore, the accuracy difference is significant if data supplementation strategies are not used for wildfire images, where data are inevitably lacking, as shown in Figures 9 and 10. When a data supplementation method was used, the heatmap distribution was somewhat reasonable, and the confidence value for the class that was to be detected was significantly increased. When both strategies were used, the heatmap distribution was the cleanest, and the positive predictive probability was the highest.

**Figure 10.** Sample of the Grad-CAM results depending on the model trained by the control variable method.

#### *4.5. Application*

The proposed model was applied to an image captured by the researchers using a DJI Phantom 4 Pro RTK drone, which had an image size of 1280 × 720 high-definition (HD) units. The filming site was composed of a virtual wildfire environment similar to that of a fire created by lighting a drum around the forest.

Although the captured HD image can be resized to the input size of the proposed model, downscaling a high-resolution image may result in the loss of information that is useful for classification, and the model may not operate smoothly [48]. Thus, the images were divided into 28 equal parts of 224 × 224, and the model was evaluated for the divided pictures. When the pictures are divided without overlapping parts, there is a possibility of a blind spot where the object to be found is cut off. Thus, the images are divided such that there are overlapping parts. The predicted classification values of each part of the picture were merged into the entire image and were visualized. The results of applying the proposed model to the drone shooting screen are shown in Figure 11. The confidence value was over 50%, and the label corresponding to the photograph part was predicted. In Figure 11, a forest fire was detected based on smoke in the central part of the whole picture. However, there was also an error (9.02%) in the building area, which was important for preserving residential and cultural assets. This error can be explained as follows: the large object was still cut off in the cropped image despite the application of the overlapping method. The white dotted circle was drawn to highlight the area with people, and the model correctly predicted that there were people in the area at 99.34% and 77.03%.

**Figure 11.** Sample of model application with the confidence score for each class. The blue and red boxes represent 224 × 224 images.

#### **5. Discussion**

In this study, transfer learning and data augmentation were combined to improve the capabilities of the model. Three different pretrained models were used to handle data limitations, data augmentation was performed, and each model was evaluated using labelbased evaluative metrics and example-based evaluative metrics. In conclusion, DenseNet-121 surpassed VGG-16 and ResNet-50 in the proposed MLC model. This is confirmed by the results of the evaluation metrics. With the advancement in camera technology, the image resolution increases, but training a CNN to handle large images is particularly difficult. The problems are the cost and learning time caused by excessive computational load in the initial layer. Because of the discrepancy between the image size in these models and the image size taken from the imaging device, we split the high-resolution image into smaller parts and processed them separately. The method proposed in this study loses less data and is expected to better classify small objects compared to scenarios when the original image is reduced in size and only a single image is processed. The proposed framework can be converted into other applications of image-based decision-making systems for disaster response fields to extract redundant information from one object.

Some previous studies used public data for binary classification problems (fire and non-fire). However, a dataset with multiple labels or classes changes according to the requirements of the system, and it is difficult to use the datasets from previous studies. The classified labels were defined to solve the need for a response from the image sources collected at the site. Fire is often accompanied by smoke, which is released faster than

flames. The flame of a forest fire is barely visible from a distance. However, the smoke columns caused by fires are usually visible on camera. Therefore, early smoke detection is an effective way to prevent potential fire disasters [49]. Based on the detection of flames, field responders can be informed as to where the flames need to be extinguished. Decision makers for wildfire response, who receive information on the life and property at the fire site, use this basic information to decide on the evacuation route by considering the spot of fire occurrence to preferentially protect the area where there is a possibility of severe damage and to establish a line of defense. Instructions for prioritizing such tasks and for efficiently allocating limited support resources must be provided. Therefore, label categories can be defined for wildfire response and building large image benchmarks for disaster response.

This is a basic study that provides multilabel information of target areas from cameras by applying CNN to wildfire response. Considering that multilabeling was performed manually by the researchers, the distinction between instances was vague in some of the images collected from external data sources. Instances that were too small to distinguish were not labeled to avoid overfitting. In addition, in the case of an instance that cannot be easily distinguished with the naked eye, it was not possible to easily add a class to be classified because of a label classification error.

Therefore, future research should aim to construct a formal annotated data benchmark for wildfire response in deep learning systems to enable the use of field information for supporting disaster decision-makers from the perspective of the wildfire detection algorithm. For example, the state of the wildfire may be understood from the fire shape. Crown fires are the most intense and dangerous wildfires, and surface fires cause relatively little damage. It is also important to identify forest species in disaster areas using videos. If the forests of the target area are coniferous, fires may spread to a large area. To provide this additional information, it is important to ensure communication between photographers, labeling workers, and deep learning model developers. From the perspective of wildfire response, future studies should also aim to develop an integrated wildfire-response decision-support system that can provide decision makers with various insights. Location can be retrieved from the global positioning system (GPS) of drones filming in disaster areas, and this can be combined with data on weather conditions that greatly affect wildfire disasters, such as wind direction, wind speed, and drying rate at the target site. In addition, when combined with a geographic information system (GIS), it is possible to determine the slope of the target area because a steep slope is difficult to control during a wildfire.

#### **6. Conclusions**

To the best of our knowledge, previous computer vision-based frameworks for managing fires have only used binary classification. However, in disaster response scenarios, decision makers must prioritize extinguishing operations by considering the range of flames, major surrounding structures such as residential facilities or cultural assets, and residents at the site. Various types of information on the scene of a wildfire can be obtained and analyzed using the photographs from an imaging device. However, annotation work is limited because of a lack of training datasets and the fact that previous wildfire detection research has only focused on binary classification. To solve these problems, we proposed a basic MLC-based framework to support wildfire responses.

The proposed model was verified through well-known evaluation indicators from the dataset selected by the researchers, and DenseNet-121, the most effective of the three representative models, was selected as the final model. Then, we visualized the result through grad-cam, and proposed a method to divide and evaluate each image to prevent data omission when applied to FHD or higher photos according to recently developed camera technology.

**Author Contributions:** Conceptualization, M.P.; methodology, M.P. and D.Q.T.; data curation, S.L.; writing—original draft preparation, M.P. and S.P.; writing—review and editing, M.P. and S.P.; visualization, M.P. and D.Q.T.; project administration, M.P.; funding acquisition, S.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by a grant (2019-MOIS31-011) from the Fundamental Technology Development Program for Extreme Disaster Response funded by the Ministry of Interior and Safety (MOIS, Korea).

**Data Availability Statement:** Data is contained within the article and reference.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

