#### *2.2. Deep-Feature Extraction*

Feature extraction can be defined as an image-processing technique used to identify the important characteristics of imaged areas. There are many procedures for feature extraction in image processing and RS, such as Haralick texture features [45,46], spectral features (e.g., the NDVI) [10], transformation-based features (e.g., principal component analysis), and deep features [15,47,48]. Among these, deep-feature-extraction methods have found a specific place in the RS community because they have great potential for extracting complex features from an image [49]. Deep-learning methods can automatically extract high-level spatial and spectral features simultaneously [50]. This advantage has led to their use in many RS applications, such as change detection [51], classification [52], anomaly detection [53], and damage mapping [54].

**Figure 1.** Overview of the general framework for the burned area mapping.

**Figure 2.** Overview of the proposed DSMNN-Net architecture for burned area mapping.

Deep features are extracted by convolution layers, and the arrangement and diversity of these layers have led to many proposed deep-learning-based methods [55–57]. Designing an informative arrangement of convolution layers remains a major challenge. In this regard, the present study presents a novel framework based on standard 3D, 2D, and depthwise convolution layers combined with morphological layers. As illustrated in Figure 2, the proposed method has two deep-feature-extraction streams: the first stream investigates the pre-fire dataset, and the second explores the deep features of the post-event dataset. Each stream includes 2D-depthwise and standard 3D/2D convolution layers, together with morphological layers based on erosion and dilation operators. Initially, deep features are extracted by 3D multiscale convolution layers, and the extracted features are then fed into a 3D convolution layer. The main advantage of 3D convolution layers is that they capture the full spectral content of the input dataset by considering the relations among all of the spectral bands. Furthermore, the multiscale block enhances the robustness of the DSMNN-Net against variations in object size [12], because it applies convolution layers with different kernel sizes, which increases the efficiency of the network. The resulting features are reshaped into 2D feature maps, and 2D-depthwise convolution layers are then applied. Next, hybrid morphological layers based on 2D dilation and erosion are combined with 2D convolution layers to extract higher-level features. To this end, two erosion layers are used first, followed by a 2D convolution layer and dilation layers (see Figure 2). Finally, 2D convolution, erosion, and dilation layers form the last part of the morphological deep-feature extractor. The deep features extracted by the two streams are concatenated, flattened, and passed to two fully connected layers; finally, a soft-max layer classifies the input data. The main differences between the proposed architecture and other CNN frameworks are:


#### *2.3. Convolution Layer*

Convolution layers are the core building blocks of deep-learning methods and learn feature representations of the input data. A convolution layer applies several convolution kernels to extract different types of meaningful features. This study used 3D/2D convolution layers for deep-feature extraction [58–60]. Mathematically, the feature value (*v*) in the *l*th layer is expressed according to Equation (1) [61]:

$$\mathbf{v}^{l} = \mathbf{g}\left(w^{l}\mathbf{x}^{l-1} + b^{l}\right),\tag{1}$$

where *x* is the input data, *g* is the activation function, *b* is the bias vector of the current layer, and *w* is the weight vector. The value (*v*) at position (*x*,*y*,*z*) on the *j*th feature map in the *i*th layer of a 3D convolution layer is given by Equation (2) [62]:

$$v_{i,j}^{xyz} = \mathbf{g}\left(b_{i,j} + \sum_{\chi} \sum_{\omega=0}^{\Omega_i - 1} \sum_{\varphi=0}^{\Phi_i - 1} \sum_{\lambda=0}^{\Lambda_i - 1} \mathbf{W}_{i,j,\chi}^{\omega,\varphi,\lambda}\, v_{i-1,\chi}^{(x+\omega)(y+\varphi)(z+\lambda)}\right) \tag{2}$$

where *χ* indexes the feature cubes in the (*i* − 1)th layer connected to the current feature cube, and Ω, Φ, and Λ are the length, width, and depth of the convolution kernel, respectively. In 2D convolution, the output of the *j*th feature map in the *i*th layer at spatial location (*x*,*y*) can be computed using Equation (3):

$$v_{i,j}^{xy} = \mathbf{g}\left(b_{i,j} + \sum_{\chi} \sum_{\omega=0}^{\Omega_i - 1} \sum_{\varphi=0}^{\Phi_i - 1} \mathbf{W}_{i,j,\chi}^{\omega,\varphi}\, v_{i-1,\chi}^{(x+\omega)(y+\varphi)}\right) \tag{3}$$
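To make Equations (2) and (3) concrete, the following minimal PyTorch sketch shows how a 3D kernel slides over the rows, columns, and spectral bands of an image patch, whereas a 2D kernel slides over the spatial dimensions only. The patch size, band count, and kernel sizes are arbitrary assumptions chosen purely for illustration.

```python
# Illustrative only: a 3D convolution kernel covers rows, columns, and spectral
# bands (Equation (2)); a 2D kernel covers rows and columns only (Equation (3)).
import torch
import torch.nn as nn

patch = torch.randn(1, 1, 30, 9, 9)               # (batch, channel, bands, H, W)

conv3d = nn.Conv3d(1, 4, kernel_size=(7, 3, 3))   # Λ = 7 along the bands, Ω = Φ = 3
out3d = conv3d(patch)
print(out3d.shape)                                 # torch.Size([1, 4, 24, 7, 7])

conv2d = nn.Conv2d(30, 4, kernel_size=3)           # bands treated as input channels
out2d = conv2d(patch.squeeze(1))
print(out2d.shape)                                 # torch.Size([1, 4, 7, 7])
```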

#### *2.4. Morphological Operation Layers*

Morphological operators apply topological operations to images in order to recover or filter out specific structures [63,64]. Mathematical morphology operators are nonlinear image operators based on the spatial structure of the image [65–67]. *Dilation* and *erosion* are shape-sensitive operations that can be helpful for extracting discriminative spatial-contextual information during the training stage [67–69]. *Erosion* (⊖) and *dilation* (⊕) are the two basic morphological operations and can be defined for a grayscale image *X* of size *M* × *N* and a structuring element *W*, as in Equation (4) [65,66]:

$$\begin{array}{l} (X \oplus W)(x, y) = \max\limits_{(l, m) \in S} \left(X(x - l, y - m) + W_d(l, m)\right) \\ S = \{(l, m) \mid l \in \{1, 2, 3, \dots, a\};\ m \in \{1, 2, 3, \dots, b\}\} \end{array} \tag{4}$$

where *Wd* is the structuring element of the dilation operator, defined on the domain *S*. Accordingly, the erosion operator with structuring element *We* can be defined as in Equation (5):

$$(X \ominus W)(x, y) = \min_{(l, m) \in S} \left(X(x + l, y + m) - W_e(l, m)\right) \tag{5}$$

The structuring elements are initialized with random values at the start of the training process, and the back-propagation algorithm is used to update them in the morphological layers. The propagation of the gradient through these layers is very similar to that of a standard neural network.
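As an illustration, the following sketch shows one way such trainable morphological layers could be written: the structuring element is a randomly initialized learnable tensor, and because the max/min operations of Equations (4) and (5) route gradients to the selected window elements, it is updated by ordinary back-propagation. The window size, initialization scale, and per-channel parameterization are assumptions, not the exact DSMNN-Net configuration.

```python
# Hypothetical trainable grayscale dilation and erosion layers (Equations (4)
# and (5)). The structuring element W is a learnable, randomly initialized
# tensor updated by back-propagation. Window size is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Dilation2d(nn.Module):
    def __init__(self, channels: int, size: int = 3):
        super().__init__()
        self.size = size
        self.w = nn.Parameter(0.01 * torch.randn(channels, size * size))

    def forward(self, x):                                   # x: (B, C, H, W)
        b, c, h, w = x.shape
        pad = self.size // 2
        # Extract a size x size neighbourhood around every pixel.
        patches = F.unfold(x, self.size, padding=pad)       # (B, C*k*k, H*W)
        patches = patches.view(b, c, self.size * self.size, h * w)
        # (X ⊕ W)(x, y) = max over the window of X + W
        out = (patches + self.w.view(1, c, -1, 1)).max(dim=2).values
        return out.reshape(b, c, h, w)


class Erosion2d(Dilation2d):
    def forward(self, x):
        b, c, h, w = x.shape
        pad = self.size // 2
        patches = F.unfold(x, self.size, padding=pad)
        patches = patches.view(b, c, self.size * self.size, h * w)
        # (X ⊖ W)(x, y) = min over the window of X - W
        out = (patches - self.w.view(1, c, -1, 1)).min(dim=2).values
        return out.reshape(b, c, h, w)
```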

#### *2.5. Classification*

After deep-feature extraction by the convolution and morphological layers, the deep features are passed to a flattening layer that reshapes them into 1D vectors. These vectors are then fed to the first and second fully connected layers. The final layer is a soft-max layer, which assigns a probability to each class for every input pixel. Figure 1 presents the classification procedure of this framework.
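To show how the pieces described in Sections 2.2–2.5 fit together, the following PyTorch sketch assembles the two feature-extraction streams and this classification head. The layer counts, channel widths, and kernel sizes are assumptions chosen for brevity, and the hybrid morphological block of Section 2.4 is stood in for by plain 2D convolutions; this is an illustrative approximation rather than the exact DSMNN-Net configuration.

```python
import torch
import torch.nn as nn


class Stream(nn.Module):
    """One feature-extraction stream (pre-fire or post-fire dataset)."""

    def __init__(self, bands: int):
        super().__init__()
        # Multiscale 3D block: parallel 3D convolutions with different kernel
        # sizes make the features less sensitive to object size.
        self.scale3 = nn.Conv3d(1, 8, kernel_size=3, padding=1)
        self.scale5 = nn.Conv3d(1, 8, kernel_size=5, padding=2)
        # Standard 3D convolution applied to the concatenated multiscale maps.
        self.conv3d = nn.Conv3d(16, 16, kernel_size=3, padding=1)
        ch2d = 16 * bands  # channels after folding the spectral depth into 2D maps
        # 2D depthwise convolution (groups equal to the number of channels).
        self.depthwise = nn.Conv2d(ch2d, ch2d, kernel_size=3, padding=1, groups=ch2d)
        # Stand-in for the hybrid morphological block of Section 2.4.
        self.morph_like = nn.Sequential(
            nn.Conv2d(ch2d, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )

    def forward(self, x):                      # x: (batch, 1, bands, H, W)
        x = torch.relu(torch.cat([self.scale3(x), self.scale5(x)], dim=1))
        x = torch.relu(self.conv3d(x))
        b, c, d, h, w = x.shape
        x = x.reshape(b, c * d, h, w)          # 3D feature cubes -> 2D feature maps
        x = torch.relu(self.depthwise(x))
        return self.morph_like(x)


class DSMNNSketch(nn.Module):
    """Two-stream sketch: pre-fire and post-fire features are concatenated,
    flattened, passed through two fully connected layers, and classified."""

    def __init__(self, pre_bands: int, post_bands: int, n_classes: int = 2):
        super().__init__()
        self.pre_stream = Stream(pre_bands)
        self.post_stream = Stream(post_bands)
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(128), nn.ReLU(),     # first fully connected layer
            nn.Linear(128, n_classes),         # second fully connected layer
        )

    def forward(self, pre, post):
        fused = torch.cat([self.pre_stream(pre), self.post_stream(post)], dim=1)
        logits = self.head(fused)
        return torch.softmax(logits, dim=1)    # per-class probabilities (soft-max)
```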

#### *2.6. Training Process*

The network parameters are initialized and then tuned iteratively by an optimizer, such as stochastic gradient descent. The DSMNN-Net is trained on the training data, and the network error is obtained by calculating the loss value on the validation dataset. The training error is fed to the optimizer and used to update the parameters. Through back-propagation, the parameters are updated at each step to decrease the discrepancy between the network outputs and the validation dataset. The Tversky loss function, a generalization of the Dice score [70], is used to calculate the network error during the training process. The Tversky index (*TI*) between *Ψ*ˆ (predicted value) and *Ψ* (truth value) is defined as in Equation (6):

$$TI(\hat{\Psi}, \Psi, \alpha, \beta) = \frac{|\hat{\Psi} \cap \Psi|}{|\hat{\Psi} \cap \Psi| + \alpha\,|\hat{\Psi} \setminus \Psi| + \beta\,|\Psi \setminus \hat{\Psi}|} \tag{6}$$

where *α* and *β* control the magnitude of penalties for false positive and false negative pixels, respectively. These parameters are often chosen based on trial and error.
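Assuming the prediction is a per-pixel burn probability and the reference is a binary map, Equation (6) can be turned into a loss by minimizing 1 − *TI*. The smoothing constant and the example α, β values below are illustrative assumptions (α = β = 0.5 recovers the Dice score):

```python
# Tversky loss from Equation (6): |Ψ̂ ∩ Ψ| corresponds to true positives,
# |Ψ̂ \ Ψ| to false positives (weighted by α), and |Ψ \ Ψ̂| to false
# negatives (weighted by β). The smoothing constant avoids division by zero.
import torch

def tversky_loss(pred, target, alpha=0.3, beta=0.7, smooth=1e-6):
    """pred: predicted burn probabilities in [0, 1]; target: binary reference map."""
    pred, target = pred.reshape(-1), target.reshape(-1).float()
    tp = (pred * target).sum()            # |Ψ̂ ∩ Ψ|  (true positives)
    fp = (pred * (1.0 - target)).sum()    # |Ψ̂ \ Ψ|  (false positives)
    fn = ((1.0 - pred) * target).sum()    # |Ψ \ Ψ̂|  (false negatives)
    ti = (tp + smooth) / (tp + alpha * fp + beta * fn + smooth)
    return 1.0 - ti                       # minimise 1 - Tversky index
```

A single training step as described above might then look as follows, where `model`, `pre_patch`, `post_patch`, and `labels` are assumed to come from the architecture sketch of Section 2.5 and a hypothetical data loader:

```python
# Hypothetical training step: the batch loss is back-propagated and the
# optimizer updates the parameters; the same loss computed on the
# validation dataset is used to monitor the network error.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
optimizer.zero_grad()
probs = model(pre_patch, post_patch)        # (batch, 2) class probabilities
loss = tversky_loss(probs[:, 1], labels)    # class 1 = burned
loss.backward()
optimizer.step()
```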

#### *2.7. Accuracy Assessment*

We assessed the BAM results using both visual and numerical analyses. The numerical analysis was based on standard measurement indices; to this end, the five most common quantitative assessment metrics were selected to evaluate the results: the overall accuracy (OA), the kappa coefficient (KC), the F1-score, recall, and the intersection over union (IoU).
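As a reference for how these indices relate to the confusion matrix, a hypothetical helper for the two-class burned/unburned case could be written as follows (it assumes both classes are present in the maps):

```python
# Computes OA, kappa, F1-score, recall, and IoU from binary burned/unburned maps.
import numpy as np

def assessment(pred, truth):
    pred, truth = np.asarray(pred).ravel(), np.asarray(truth).ravel()
    tp = np.sum((pred == 1) & (truth == 1))
    tn = np.sum((pred == 0) & (truth == 0))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    n = tp + tn + fp + fn
    oa = (tp + tn) / n
    # Expected agreement by chance for the kappa coefficient.
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (oa - pe) / (1 - pe)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return {"OA": oa, "KC": kappa, "F1": f1, "Recall": recall, "IoU": iou}
```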

To compare the performance of the proposed method, two state-of-the-art deep-learning methods were selected for this study. The first was the deep Siamese network, which has been proposed in many studies for change-detection purposes [71–73]; this method has three convolution layers in each stream, followed by fully connected layers for classification. The second was the CNN-based framework designed by Belenguer-Plomer, Tanase, Chuvieco and Bovolo [42] for mapping burned areas; this method has two convolution layers and a max-pooling layer, followed by two fully connected layers. More details of this method can be found in [42].
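For orientation only, a rough sketch of the second baseline's structure as summarized above (two convolution layers, one max-pooling layer, two fully connected layers) is given below; the channel widths, kernel sizes, and input band count are assumptions, and the exact configuration is described in [42]:

```python
# Rough sketch of the baseline CNN structure summarised above; all sizes are
# placeholder assumptions, not the configuration reported in [42].
import torch.nn as nn

baseline_cnn = nn.Sequential(
    nn.Conv2d(in_channels=10, out_channels=32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),
    nn.LazyLinear(128),
    nn.ReLU(),
    nn.Linear(128, 2),        # burned / unburned
)
```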

#### **3. Case Study and Satellite Images**

This section describes the case study areas and the satellite data in more detail.

#### *3.1. Study Area*

Both study areas in this research are located on the Australian continent. The main reason for choosing these areas was the availability of PRISMA hyperspectral datasets for them. Reference data are the most important factor in the evaluation of BAM results; thus, the reference data were obtained from visual analysis and the interpretation of BAM results reported in previous papers. Figure 3 presents the locations of the two study areas in the south of the Australian continent.

**Figure 3.** The locations of the two study areas for burned area mapping.

Figure 4 shows the incorporated burned-area datasets for the first study area, and Figure 5 illustrates the incorporated dataset for the BAM of the second study area. The details of the incorporated datasets for both study areas are given in Table 1.


**Figure 4.** The dataset used for the burned area mapping for the first study area. (**a**) Pre-event Sentinel-2 imagery. (**b**) Post-event Sentinel-2 dataset. (**c**) Post-event PRISMA hyperspectral imagery. (**d**) Ground truth.

**Figure 5.** Illustration of the various incorporated datasets for the burned area mapping for the second study area. (**a**) Pre-event Sentinel-2 imagery. (**b**) Post-event Sentinel-2 dataset. (**c**) Post-event PRISMA hyperspectral imagery. (**d**) Ground truth.


**Table 1.** The main characteristics of the incorporated datasets for both case studies.
