### 2.5.1. BOA

Selecting appropriate hyperparameters has become a key issue in image classification tasks, since model performance largely depends on this choice. Manual tuning is difficult and time-consuming and rarely finds the optimal parameters. Widely used automatic hyperparameter tuning methods include the grid search algorithm (GSA), the random search algorithm (RSA), and the BOA. The GSA is essentially an enumeration method and therefore becomes very costly in time when the objective function is complex [34]. Although the RSA no longer evaluates all values within a parameter range, its randomly selected sample points may miss the optimal values [35]. The BOA is one of the most popular methods for tuning hyperparameters in deep learning models [36]. Its main idea is that, given an objective function to be optimized, the posterior distribution of the objective function is updated by continuously adding sample points until the posterior distribution approximately matches the true distribution or a predetermined number of iterations is reached. Because it adjusts hyperparameters on the basis of prior information, it is faster, more effective, and more efficient than the previous two algorithms. The optimization problem addressed by the BOA is formulated as follows:

$$x^* = \arg\max_{x \in S} f(x) \tag{3}$$

Here, *S* is the candidate set of *x* and *f*(*x*) is the objective function. The goal of the BOA is to select an *x* from *S* such that *f*(*x*) is maximized (or minimized).
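For illustration, the optimization loop in Equation (3) can be sketched with the scikit-optimize library, which fits a Gaussian-process surrogate and updates its posterior with every new sample point. The toy objective and search bounds below are illustrative placeholders, not the hyperparameters or implementation used in this study.

```python
# Minimal sketch of Bayesian optimization (Eq. 3): pick x in S maximizing f(x).
# Uses scikit-optimize's gp_minimize; since it minimizes, we negate f(x).
import numpy as np
from skopt import gp_minimize

def f(x):
    """Toy objective standing in for the true objective f(x)."""
    return np.sin(3.0 * x[0]) + 0.5 * x[0]

result = gp_minimize(
    func=lambda x: -f(x),          # negate: gp_minimize minimizes
    dimensions=[(-2.0, 2.0)],      # candidate set S (here, a 1-D interval)
    n_calls=30,                    # number of sample points added to the posterior
    random_state=0,
)

x_star = result.x                  # approximate arg max of f over S
print("x* =", x_star, "f(x*) =", f(x_star))
```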

The BOA was used to optimize the hyperparameters of the back propagation neural network (BPNN), AlexNet, VGG16, ResNet34, and IRBOA models. The activation function adopted for each model was ReLU, the batch\_size was set to 64, and the numbers of training epochs for the BPNN and the CNN models were 5000 and 100, respectively. The cross-entropy function was employed as the loss function, and the accuracy on the validation set was selected as the objective function of the BOA. The optimized variables are those introduced in Sections 2.5.2, 2.5.3, and 2.5.4: the number of neurons in the hidden layer of the BPNN (hidden), the optimizer, the learning\_rate, the update interval of the learning-rate decay algorithm (step\_size), the multiplication factor for updating the learning rate (gamma), and the L2 regularization parameter (weight\_decay). Table 2 shows the search space of each hyperparameter.

**Table 2.** Hyperparameter search space for the BOA.
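As one possible realization of this setup (the numeric ranges below are illustrative placeholders and do not reproduce Table 2), the search space and objective could be encoded with scikit-optimize as follows, where `train_and_validate` is a hypothetical helper that trains a model with the given hyperparameters and returns its validation accuracy.

```python
# Hypothetical sketch of the BOA search space and objective; the ranges are
# placeholders, not the values listed in Table 2.
from skopt import gp_minimize
from skopt.space import Categorical, Integer, Real
from skopt.utils import use_named_args

space = [
    Integer(16, 512, name="hidden"),                      # BPNN hidden-layer neurons
    Categorical(["SGD", "Adam"], name="optimizer"),       # optimizer choice (Section 2.5.2)
    Real(1e-5, 1e-1, prior="log-uniform", name="learning_rate"),
    Integer(5, 50, name="step_size"),                      # learning-rate decay interval
    Real(0.1, 0.9, name="gamma"),                          # decay multiplication factor
    Real(1e-6, 1e-2, prior="log-uniform", name="weight_decay"),  # L2 regularization
]

@use_named_args(space)
def objective(**params):
    # train_and_validate is a hypothetical helper: it trains the network with
    # the given hyperparameters and returns the validation-set accuracy.
    val_acc = train_and_validate(**params)
    return -val_acc   # gp_minimize minimizes, so negate the accuracy

result = gp_minimize(objective, space, n_calls=50, random_state=0)
```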


### 2.5.2. Optimizer

The optimizer minimizes the loss during training through gradient descent, thereby improving the accuracy of the model. The stochastic gradient descent (SGD) algorithm and the adaptive moment estimation (Adam) algorithm are two widely used and effective optimizers for image classification tasks in deep learning. Each has its own advantages and disadvantages; hence, the optimizer itself was treated as a hyperparameter and selected by the BOA described in Section 2.5.1.
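As a sketch of how a BOA-selected optimizer choice could be applied (PyTorch is assumed here; the paper's training code is not shown), the selected name can simply switch between `torch.optim.SGD` and `torch.optim.Adam`, with the remaining tuned hyperparameters passed through:

```python
# Hypothetical sketch: build the optimizer and learning-rate scheduler from
# BOA-selected hyperparameters (PyTorch assumed).
import torch

def build_optimizer(model, optimizer_name, learning_rate, weight_decay, step_size, gamma):
    if optimizer_name == "SGD":
        optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate,
                                    weight_decay=weight_decay)
    else:  # "Adam"
        optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate,
                                     weight_decay=weight_decay)
    # StepLR multiplies the learning rate by `gamma` every `step_size` epochs,
    # corresponding to the decay hyperparameters tuned in Section 2.5.1.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=gamma)
    return optimizer, scheduler
```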
