*2.4. Model Inference Speed Improvements*

The SwishBlock bottleneck module in the previous section uses a convolution layer followed by a batch normalization (BN) layer in its forward pass. The batch normalization layer accelerates the network's convergence and helps to mitigate overfitting. Although it plays an active role during training, it adds an extra operation during forward inference, which takes up more memory and slows down processing. Therefore, in this section, the parameters of the batch normalization layer in the SwishBlock bottleneck module are merged into the convolutional layer to improve the forward inference speed of the model. The original batch normalization layer is defined as follows:

$$\mu\_B \leftarrow \frac{1}{m} \sum\_{i=1}^{m} \mathbf{x}\_i \tag{2}$$

$$
\sigma\_B^2 \leftarrow \frac{1}{m} \sum\_{i=1}^m \left(\mathbf{x}\_i - \boldsymbol{\mu}\_B\right)^2 \tag{3}
$$

$$\hat{\mathbf{x}}\_i \leftarrow \frac{\mathbf{x}\_i - \mu\_B}{\sqrt{\sigma\_B^2 + \varepsilon}} \tag{4}$$

$$y\_i \leftarrow \gamma \hat{\mathbf{x}}\_i + \beta \tag{5}$$

where *μ<sub>B</sub>* is the mean of the mini-batch, *σ<sub>B</sub>*<sup>2</sup> is its variance, *x̂<sub>i</sub>* is the normalized input, and *y<sub>i</sub>* is the output after scaling and shifting by the batch normalization layer. The convolution layer is computed as shown in Equation (6), and passing the convolution output through the BN layer gives Equation (7). Substituting Equation (6) into Equation (7) and expanding yields the new weight and bias terms of the convolution layer, as shown in Equation (8). The new weight and bias are then given by Equations (9) and (10), respectively. Using these merged parameters, the convolution layer of the SwishBlock bottleneck module produces exactly the same result as the original convolution layer followed by the batch normalization layer, while improving the model's forward inference speed.

$$out = \sum\_{i=1}^{k} w\_i \mathbf{x}\_i + b \tag{6}$$

$$BN = \frac{\gamma (out - \mu\_B)}{\sqrt{\sigma\_B^2 + \varepsilon}} + \beta \tag{7}$$

$$BN = \frac{\gamma \left(\sum\_{i=1}^{k} w\_i \mathbf{x}\_i + b - \mu\_B\right)}{\sqrt{\sigma\_B^2 + \varepsilon}} + \beta = \frac{\gamma \sum\_{i=1}^{k} w\_i \mathbf{x}\_i + \gamma (b - \mu\_B)}{\sqrt{\sigma\_B^2 + \varepsilon}} + \beta \tag{8}$$

$$w\_{new,i} = \frac{\gamma \, w\_i}{\sqrt{\sigma\_B^2 + \varepsilon}} \tag{9}$$

$$b\_{new} = \frac{\gamma (b - \mu\_B)}{\sqrt{\sigma\_B^2 + \varepsilon}} + \beta \tag{10}$$
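The fusion in Equations (6)–(10) can be checked numerically. The following is a minimal NumPy sketch (variable names and the toy values are illustrative, not from the paper): it computes the two-step convolution-plus-BN output, then the single fused convolution with the merged weight and bias, and confirms they agree.

```python
import numpy as np

# Toy check of folding batch normalization into a convolution,
# following Equations (6)-(10). For one output position, the
# convolution is out = sum_i w_i * x_i + b (Eq. 6); the BN
# statistics (mu, var) and parameters (gamma, beta) are assumed
# fixed at inference time, as they are after training.
rng = np.random.default_rng(0)
k = 5
x = rng.standard_normal(k)        # input values x_i
w = rng.standard_normal(k)        # convolution weights w_i
b = 0.3                           # convolution bias
gamma, beta = 1.5, -0.2           # learned BN scale / shift
mu, var, eps = 0.1, 0.8, 1e-5     # BN running mean / variance

# Original two-step computation: convolution (Eq. 6), then BN (Eq. 7).
out = np.dot(w, x) + b
bn = gamma * (out - mu) / np.sqrt(var + eps) + beta

# Fused parameters (Eqs. 9 and 10): each weight is rescaled and the
# bias absorbs the BN shift, so one convolution replaces conv + BN.
w_new = gamma * w / np.sqrt(var + eps)
b_new = gamma * (b - mu) / np.sqrt(var + eps) + beta
fused = np.dot(w_new, x) + b_new

# The fused layer reproduces the original output exactly
# (up to floating-point precision).
assert np.isclose(bn, fused)
```

Because the fused layer performs a single multiply-accumulate pass instead of two, it saves both memory traffic and compute at inference time without changing the network's output.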
