*2.4. Severity Segmentation Branch*

Considering that the features of burnt EV bodies are close to those of the intact vehicles in the source dataset used for pretraining, transfer learning is effective for the foreground extraction task, and a simple ASPP module yields good accuracy. In the severity segmentation task, however, the burnt regions are amorphous and their features abstract, and the number of classes also increases from 2 to 4. A network architecture with a stronger feature representation capability is therefore needed.

Reinforcing contextual information and exploiting attention mechanisms are two major directions in semantic segmentation research. Inspired by DenseASPP, this paper proposes a densely connected multi-scale structure with an attention module, named DA-EMA. The overall structure of the severity segmentation branch, including the DA-EMA module, is shown in Figure 4.

**Figure 4.** Structure of the severity segmentation branch. C denotes the number of output channels, K the number of bases contained in the EMA unit, and d the dilation rate.

Simply increasing the dilation rate of the ASPP module to enlarge the receptive field may degrade overall model performance because of a loss of modeling capability: at very large rates, a 3 × 3 kernel samples mostly outside the valid feature region and degenerates toward a 1 × 1 kernel. To address this problem while further enlarging the receptive field, Yang et al. [25] proposed the densely connected ASPP (DenseASPP) module, which follows the connectivity pattern of DenseNet [26].
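A minimal PyTorch sketch of this dense connectivity pattern is given below; the class name, growth-channel count, and layer composition are illustrative assumptions rather than the exact DenseASPP configuration:

```python
import torch
import torch.nn as nn

class DenselyConnectedDilatedBlock(nn.Module):
    """DenseASPP-style block: each dilated conv receives the
    concatenation of the input and all previous layers' outputs,
    so later layers see an increasingly large receptive field."""
    def __init__(self, in_channels, growth_channels, dilation_rates):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for d in dilation_rates:
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels, growth_channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(growth_channels),
                nn.ReLU(inplace=True),
            ))
            channels += growth_channels  # dense concatenation grows the input

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)
```

Because every layer reuses all earlier outputs, the stack realizes many combinations of dilation rates, covering a denser range of scales than parallel ASPP branches.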

Attention mechanisms have proven effective in many semantic segmentation scenarios by performing feature recalibration and feature enhancement [27]. In this paper, an attention module is added at every level of the densely connected structure to enhance the multi-scale feature representation. However, traditional attention modules need to generate a large attention map, which incurs high computational complexity and high GPU memory cost. The lightweight expectation maximization attention (EMA) module [28] is a good alternative in this case. Instead of treating all pixels as reconstruction bases of the attention map, the EMA module uses the expectation maximization (EM) algorithm to find a compact set of bases in an iterative manner, which greatly reduces the computational complexity. A typical EMA unit consists of three operations: responsibility estimation ($A_E$), likelihood maximization ($A_M$), and data re-estimation ($A_R$). Given the input $\mathbf{X} \in \mathbb{R}^{N \times C}$ and the initial bases $\boldsymbol{\mu} \in \mathbb{R}^{K \times C}$, $A_E$ estimates the latent variables $\mathbf{Z} \in \mathbb{R}^{N \times K}$ as 'responsibilities'; this step functions as the E step of the EM algorithm. $A_M$ uses the estimates to update the bases $\boldsymbol{\mu}$, which works as the M step. The $A_E$ and $A_M$ steps execute alternately for a pre-specified number of iterations. Then, with the converged $\boldsymbol{\mu}$ and $\mathbf{Z}$, $A_R$ reconstructs the original $\mathbf{X}$ as $\mathbf{Y}$ and outputs it. The detailed structure of one EMA unit is shown in Figure 5.
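The three operations can be sketched compactly. The following is a simplified PyTorch rendering of one EMA unit under stated assumptions (ℓ2-normalized bases, softmax responsibilities over the K bases, a fixed number of iterations, and randomly initialized bases kept as a buffer); it omits the 1 × 1 convolutions and training-time base updates of the original implementation in [28]:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EMAUnit(nn.Module):
    """Simplified expectation-maximization attention unit.
    Alternates A_E (responsibility estimation) and A_M (likelihood
    maximization), then A_R reconstructs the input from the
    converged bases. Hyperparameter defaults are illustrative."""
    def __init__(self, channels, num_bases=64, num_iters=3):
        super().__init__()
        self.num_iters = num_iters
        # Initial bases mu: (1, K, C), shared across the batch.
        mu = torch.randn(1, num_bases, channels)
        self.register_buffer("mu", F.normalize(mu, dim=2))

    def forward(self, x):                                 # x: (B, C, H, W)
        b, c, h, w = x.shape
        feats = x.view(b, c, h * w).transpose(1, 2)       # X: (B, N, C)
        mu = self.mu.expand(b, -1, -1)                    # mu: (B, K, C)
        with torch.no_grad():                             # EM iterations
            for _ in range(self.num_iters):
                # A_E: responsibilities Z (B, N, K), softmax over the K bases
                z = F.softmax(feats @ mu.transpose(1, 2), dim=2)
                # A_M: update bases as responsibility-weighted means of X
                weight = z.sum(dim=1, keepdim=True).transpose(1, 2)  # (B, K, 1)
                mu = (z.transpose(1, 2) @ feats) / (weight + 1e-6)
                mu = F.normalize(mu, dim=2)
        # A_R: reconstruct X as Y from the converged Z and mu
        z = F.softmax(feats @ mu.transpose(1, 2), dim=2)
        y = (z @ mu).transpose(1, 2).view(b, c, h, w)
        return y
```

Since K is much smaller than the number of pixels N, the attention maps here are of size N × K rather than N × N, which is the source of the memory and compute savings.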

**Figure 5.** Detailed structure of an expectation maximization attention unit.

To improve the contextual representation, dilated convolution is used extensively in the proposed network. Wang et al. [29] identified a "gridding" issue in the dilated convolution framework: because zeros are padded between kernel elements, the receptive field of the kernel covers only the locations sampled by non-zero weights, so neighboring information is lost. In this paper, the dilation rates in the proposed DA-EMA module were changed from (3, 6, 12, 18, 24) to (3, 7, 11, 16, 21), which share no common divisor larger than 1; this lets the densely connected convolution layers use more of the available information while alleviating the gridding effect. As shown in Figure 4, the DA-EMA module contains five EMA units paired with dilated convolutions, and a sixth EMA unit processes the concatenated feature map; a sketch of this composition follows below. The detailed configuration of the dilated convolution layers and EMA units is listed in Table 3.
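Combining the two sketches above gives a rough rendering of the DA-EMA module with the modified rates (3, 7, 11, 16, 21). The channel counts and the placement of the EMA units in the dense path are assumptions inferred from Figure 4; Table 3 gives the actual configuration:

```python
# Reuses torch, nn, and the EMAUnit class from the sketches above.
class DAEMAModule(nn.Module):
    """Sketch of the DA-EMA module: five densely connected dilated
    conv layers, each followed by an EMA unit, plus a sixth EMA unit
    applied to the concatenated multi-scale feature map."""
    def __init__(self, in_channels, growth_channels=64,
                 dilation_rates=(3, 7, 11, 16, 21), num_bases=64):
        super().__init__()
        self.convs = nn.ModuleList()
        self.emas = nn.ModuleList()
        channels = in_channels
        for d in dilation_rates:
            self.convs.append(nn.Sequential(
                nn.Conv2d(channels, growth_channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(growth_channels),
                nn.ReLU(inplace=True),
            ))
            self.emas.append(EMAUnit(growth_channels, num_bases))
            channels += growth_channels  # dense concatenation grows the input
        self.final_ema = EMAUnit(channels, num_bases)  # sixth EMA unit

    def forward(self, x):
        features = [x]
        for conv, ema in zip(self.convs, self.emas):
            out = ema(conv(torch.cat(features, dim=1)))
            features.append(out)
        return self.final_ema(torch.cat(features, dim=1))
```

Because the five rates are pairwise coprime, successive dilated layers sample interleaved rather than aligned grids, which is what mitigates the gridding artifact noted by Wang et al. [29].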


