#### 5.1.1. Convolutional Layer

The convolutional layer convolves the input image with convolution kernels and then applies an activation function to the result. Together, these operations extract and enhance the texture features of the image. The convolution operation can be expressed as:

$$\mathbf{x}\_{j}^{l} = f\left(\sum\_{i \in \mathcal{M}\_{j}} \mathbf{x}\_{i}^{l-1} \ast \mathbf{k}\_{ij}^{l} + \mathbf{b}\_{j}\right) \tag{10}$$

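where $l$ is the index of the current layer, $\mathbf{x}\_{j}^{l}$ is the $j$-th output feature map of layer $l$, $\mathbf{k}\_{ij}^{l}$ is the weight matrix of the convolution kernel connecting input map $i$ to output map $j$, $\mathcal{M}\_{j}$ is the set of input feature maps connected to output map $j$, $\mathbf{b}\_{j}$ is the bias (offset) term of output map $j$, $\ast$ denotes convolution, and $f(\cdot)$ is the activation function.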
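To make Eq. (10) concrete, the following is a minimal pure-Python sketch that computes a single output map $\mathbf{x}\_{j}^{l}$. Several details are assumptions not fixed by the text: the convolution is a true 2-D convolution (kernel flipped) in "valid" mode with stride 1 and no padding, and the default activation is `tanh`. Many deep learning frameworks actually compute cross-correlation (no kernel flip) in their convolution layers; with learned kernels the two are equivalent up to flipping the weights. The function names `conv2d_valid` and `conv_layer_output` are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def conv2d_valid(x: Sequence[Sequence[float]], k: Sequence[Sequence[float]]) -> Matrix:
    """True 2-D convolution in 'valid' mode (stride 1, no padding; assumptions)."""
    kh, kw = len(k), len(k[0])
    out: Matrix = []
    for r in range(len(x) - kh + 1):
        row = []
        for c in range(len(x[0]) - kw + 1):
            s = 0.0
            for u in range(kh):
                for v in range(kw):
                    # Indexing the kernel back to front flips it, which makes
                    # this a convolution rather than a cross-correlation.
                    s += x[r + u][c + v] * k[kh - 1 - u][kw - 1 - v]
            row.append(s)
        out.append(row)
    return out


def conv_layer_output(
    inputs: Sequence[Matrix],
    kernels: Sequence[Matrix],
    M_j: Sequence[int],
    b_j: float,
    f: Callable[[float], float] = math.tanh,  # activation choice is an assumption
) -> Matrix:
    """Eq. (10): x_j^l = f(sum_{i in M_j} x_i^{l-1} * k_ij^l + b_j).

    inputs[i] is the input map x_i^{l-1}; kernels[i] is k_ij^l for the fixed output map j.
    """
    if not M_j:
        raise ValueError("M_j must contain at least one input map index")
    acc = None
    for i in M_j:
        y = conv2d_valid(inputs[i], kernels[i])
        # Accumulate the convolved maps element by element.
        acc = y if acc is None else [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(acc, y)]
    # Add the bias of output map j, then apply the activation function.
    return [[f(v + b_j) for v in row] for row in acc]


# Usage: two 3 x 3 input maps, two 2 x 2 kernels, identity activation for readability.
x0 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x1 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
k0 = [[1, 0], [0, 0]]
k1 = [[0, 0], [0, 1]]
print(conv_layer_output([x0, x1], [k0, k1], M_j=[0, 1], b_j=-1.0, f=lambda v: v))
# → [[5.0, 5.0], [7.0, 9.0]]
```

Each kernel here contains a single 1, so after flipping, `k0` picks the lower-right neighbour of `x0` and `k1` picks the aligned element of `x1`; summing over $\mathcal{M}\_{j}$ and adding the bias gives the printed map.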

#### 5.1.2. Pooling Layer

The pooling layer, also called a sub-sampling layer, usually follows a convolutional layer. Its sub-sampling function reduces redundant features and shrinks the feature maps, which reduces the number of network parameters in subsequent layers and helps avoid overfitting. The mathematical model can be described as:

$$\mathbf{x}\_{j}^{l} = f\left(\boldsymbol{\beta}\_{j}^{l} \, \mathrm{down}\left(\mathbf{x}\_{j}^{l-1}\right) + \mathbf{b}\_{j}^{l}\right) \tag{11}$$

where $\mathrm{down}(\cdot)$ denotes the sub-sampling function. Typically, this function sums over each distinct, non-overlapping $n \times n$ block of the input map, so the output map is $n$ times smaller along both spatial dimensions. Each output map has its own multiplicative bias $\beta$ and additive bias $b$.
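The sub-sampling step of Eq. (11) can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the authors' implementation: `down` sums each non-overlapping $n \times n$ block as described above, leftover border rows and columns (when the map size is not a multiple of $n$) are simply dropped, and the default activation is `tanh`. The names `down` and `pooling_layer_output` are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def down(x: Sequence[Sequence[float]], n: int) -> Matrix:
    """Sum over each non-overlapping n x n block of x.

    Rows/columns left over when the size is not a multiple of n are dropped
    (an assumption; the text does not say how borders are handled).
    """
    rows, cols = len(x) // n, len(x[0]) // n
    return [
        [sum(x[r * n + u][c * n + v] for u in range(n) for v in range(n))
         for c in range(cols)]
        for r in range(rows)
    ]


def pooling_layer_output(
    x_prev: Sequence[Sequence[float]],
    n: int,
    beta: float,
    b: float,
    f: Callable[[float], float] = math.tanh,  # activation choice is an assumption
) -> Matrix:
    """Eq. (11): x_j^l = f(beta_j^l * down(x_j^{l-1}) + b_j^l)."""
    return [[f(beta * s + b) for s in row] for row in down(x_prev, n)]


# Usage: a 4 x 4 map pooled with n = 2; beta = 1/n**2 turns the block sum into an average.
x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
print(down(x, 2))  # → [[14, 22], [46, 54]]
print(pooling_layer_output(x, 2, beta=0.25, b=0.0, f=lambda v: v))  # → [[3.5, 5.5], [11.5, 13.5]]
```

Because $\beta$ is learned per output map, the sum-based $\mathrm{down}(\cdot)$ loses nothing compared with averaging: choosing $\beta = 1/n^{2}$ recovers average pooling exactly, as the second printed line shows.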
