*4.3. CNN Parameter Adjustment*

We used the stochastic gradient descent (SGD) method to optimize the network parameters. Although the convolutional neural network used in this article does not have many layers, it is nevertheless a deep learning architecture. Let *θ* denote the parameters of the neural network; the negative conditional log-likelihood of the training data can then be expressed by Equation (7):

$$J(\theta) = \frac{1}{m} \sum\_{i=1}^{m} L\left(\mathbf{x}^i, \mathbf{y}^i, \theta\right) \tag{7}$$

where *L* is the loss corresponding to the *i*-th training example. For this summed cost function, gradient descent requires computing the gradient given by Equation (8):

$$\nabla\_{\theta} J(\theta) = \frac{1}{m} \sum\_{i=1}^{m} \nabla\_{\theta} L\left(\mathbf{x}^{i}, \mathbf{y}^{i}, \theta\right) \tag{8}$$
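For concreteness, the following is a minimal NumPy sketch of this full-batch computation; `per_example_grad` is a hypothetical callable returning the gradient of *L* for a single example, not part of this article's implementation:

```python
import numpy as np

def full_batch_gradient(per_example_grad, X, Y, theta):
    """Average the per-example gradients over all m training examples,
    as in Equation (8); the cost of this loop grows linearly with m."""
    grad = np.zeros_like(theta)
    for x_i, y_i in zip(X, Y):
        grad += per_example_grad(x_i, y_i, theta)  # gradient of L(x^i, y^i, theta)
    return grad / len(X)
```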

The time complexity of this optimization step is *O*(*m*), where *m* is the number of examples in the training set, so the cost of each gradient computation grows as the training set grows. The core idea of the stochastic gradient descent method is to approximate this gradient from a small sample: at each step, a minibatch of *m*′ training examples is drawn, with *m*′ << *m*, and the result computed on the minibatch is used to estimate the full-sample gradient, which reduces the amount of computation significantly. The gradient estimate is given by Equation (9):

$$\hat{g} = \frac{1}{m'} \nabla\_{\theta} \sum\_{i=1}^{m'} L\left(\mathbf{x}^i, \mathbf{y}^i, \theta\right) \tag{9}$$
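Continuing the sketch above (and reusing the hypothetical `per_example_grad`), the estimator of Equation (9) draws *m*′ indices uniformly and averages the gradient over that subset only:

```python
import numpy as np

def minibatch_gradient(per_example_grad, X, Y, theta, m_prime, rng=None):
    """Estimate the full-sample gradient from a minibatch of m' << m
    examples drawn without replacement, as in Equation (9)."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(X), size=m_prime, replace=False)
    g_hat = np.zeros_like(theta)
    for i in idx:
        g_hat += per_example_grad(X[i], Y[i], theta)
    return g_hat / m_prime
```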

The minibatch estimate above is then used to update the parameters with learning rate *ε*, as in Equation (10):

$$\theta \leftarrow \theta - \epsilon\hat{g} \tag{10}$$
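Equations (9) and (10) combine into the familiar SGD loop sketched below; it builds on `minibatch_gradient` above, and the learning rate `eps` and step count are illustrative defaults, not values reported in this article:

```python
def sgd(per_example_grad, X, Y, theta, m_prime=32, eps=0.01, steps=1000):
    """Repeat the update of Equation (10) using the minibatch
    gradient estimate of Equation (9)."""
    for _ in range(steps):
        g_hat = minibatch_gradient(per_example_grad, X, Y, theta, m_prime)
        theta = theta - eps * g_hat  # theta <- theta - eps * g_hat
    return theta
```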

The minibatch gradient descent method thus overcomes both the high time complexity and the unreliability of plain gradient descent in the parameter optimization process. Finally, we optimized the parameters of the classification layer using the cost function in Equation (11):

$$f(\theta) = -\frac{1}{m} \left[ \sum\_{i=1}^{m} \sum\_{j=1}^{n} 1\{y\_i = j\} \log\left(\frac{e^{\theta\_j^T \mathbf{x}\_i}}{\sum\_{k=1}^{n} e^{\theta\_k^T \mathbf{x}\_i}}\right) \right] \tag{11}$$

The optimization process then amounts to minimizing this cost function so that the parameters *θ* reach their optimal values.
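As an illustrative sketch only, the cost of Equation (11) can be written in vectorized NumPy as follows, assuming `Theta` stacks the class weight vectors as rows and the labels are integers in {0, …, *n* − 1} (conventions not specified in the text):

```python
import numpy as np

def softmax_cost(Theta, X, y):
    """Cost of Equation (11): mean negative log-likelihood under the
    softmax model.  Theta: (n, d) class weights; X: (m, d) inputs;
    y: (m,) integer class labels."""
    logits = X @ Theta.T                          # theta_j^T x_i for all i, j
    logits -= logits.max(axis=1, keepdims=True)   # shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Pick the log-probability of each example's true class and average.
    return -log_probs[np.arange(len(X)), y].mean()
```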
