**3. Architecture of the Neural Network**

#### *3.1. Description of Our Model*

The main procedure of the experiment is shown in Figure 2. First, we collect data on a variety of minerals. Then, we label all the data and split the dataset into a training set and a test set. HELaplace processing is applied to both the training set and the test set. The processed training set is then used to train a convolutional neural network based on the YOLOv5 model. Finally, the predicted class of each mineral picture in the test set is computed and the accuracy rate is recorded.
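The steps above can be sketched as a small pipeline. This is a minimal illustration, not the authors' actual code: the function and file names are hypothetical placeholders, and the HELaplace and YOLOv5 stages are only indicated in comments.

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    """samples: list of (image_path, mineral_label) pairs.
    Shuffles and splits into a training set and a test set, as in Figure 2."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    split = int(len(shuffled) * train_ratio)
    # HELaplace preprocessing would then be applied to both subsets,
    # and the training subset fed to the YOLOv5 model.
    return shuffled[:split], shuffled[split:]

# Hypothetical labeled collection of 100 images over 7 mineral classes:
train, test = split_dataset([(f"img_{i}.jpg", i % 7) for i in range(100)])
print(len(train), len(test))  # 80 20
```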

**Figure 2.** The structure of our model.

Specifically, Figure 3 illustrates the structure of the YOLOv5 network. It consists of four parts: input, backbone, neck, and prediction. The input side uses Mosaic data augmentation [23] and adaptive anchor box calculation. The backbone uses the focus structure and the cross-stage partial connections (CSP) structure. The neck uses a feature pyramid network (FPN) + path aggregation network (PAN) structure. The prediction part uses non-maximum suppression (NMS) to filter the targets, so it has high accuracy. As a new type of deep neural network (DNN), YOLOv5 differs from traditional algorithms that require a strict image pixel size: its adaptive image scaling imposes no requirement on image size. We also modified the letterbox function in datasets.py of the YOLOv5 code so that only a minimal black border is added when adapting the original image, reducing information redundancy and therefore greatly improving the processing speed.

The CSP structure of YOLOv5s divides the original input into two branches and performs separate convolution operations to halve the number of channels. One branch performs the Bottleneck × N operation, and the two branches are then concatenated. This keeps the input and output of BottleneckCSP the same size, which enables the model to learn more features. The neck of YOLOv5 has the same FPN + PAN structure as YOLOv4. However, the convolution operation used in the neck of YOLOv4 is regular; in contrast, the CSP2 structure, inspired by the CSPNet [31] design, is used in the neck of YOLOv5 to enhance network feature fusion and improve identification accuracy.
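The idea behind the minimal-border letterboxing can be sketched as follows. This is a hypothetical re-implementation of the concept, not the modified datasets.py itself: the image is scaled so its longer side fits the network input, and each side is then padded only up to the next multiple of the network stride rather than to a full square.

```python
def letterbox_shape(h, w, new_size=640, stride=32):
    """Return the padded (height, width) after adaptive scaling with
    minimal black borders (sketch of the letterbox idea)."""
    r = min(new_size / h, new_size / w)          # scale ratio for the longer side
    new_h, new_w = round(h * r), round(w * r)    # unpadded target size
    pad_h = (stride - new_h % stride) % stride   # minimal border, not full 640
    pad_w = (stride - new_w % stride) % stride
    return (new_h + pad_h, new_w + pad_w)

# A 720x1280 image is scaled to 360x640, then padded only to 384x640
# instead of a full 640x640 square, so less redundant border is processed:
print(letterbox_shape(720, 1280))  # (384, 640)
```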

**Figure 3.** The main model of YOLOv5 network.

#### *3.2. Model Training*

In this paper, the deep learning integrated development environment is PyCharm. Test environment: NVIDIA GTX 1060 GPU, 8 GB memory, Intel Core(TM) i7-8750H CPU, and Python 3.8 as the programming language. The parameters used for model training are shown in Table 2; parameters not listed in the table keep their default values.

**Table 2.** Parameters used for model training.


In our experiments, we use the GIoU loss [32] as our loss function. A smaller value indicates more accurate results. The loss is expressed as

$$\text{GIoU}\_{\text{loss}} = \frac{1}{\sum\_{p} 1} \sum\_{p} \left(1 - \text{IoU}\_{p}\right) \tag{4}$$

where *p* denotes the index of a predicted positive example and IoU*p* is the intersection-over-union ratio of the predicted positive example box *p* with the corresponding ground-truth box.
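A minimal sketch of this loss, assuming axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates (the full GIoU loss in [32] adds an enclosing-box penalty term, which is omitted here for brevity):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def iou_loss(pred_boxes, true_boxes):
    """Mean (1 - IoU_p) over predicted positive examples, as in Eq. (4)."""
    terms = [1 - iou(p, t) for p, t in zip(pred_boxes, true_boxes)]
    return sum(terms) / len(terms)

# Perfectly overlapping prediction gives zero loss:
print(iou_loss([(0, 0, 2, 2)], [(0, 0, 2, 2)]))  # 0.0
```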

We recorded the changes in the GIoU loss during training and tested the accuracy of the model on the validation set after each training epoch. The change in GIoU loss during training is shown in Figure 4. It can be seen that the model converges effectively, and the GIoU loss reaches a low level after 50 epochs. According to the figure, the model achieves its best accuracy on the validation set after the 90th epoch; the accuracy decreases when training continues, probably due to some overfitting.
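Selecting the checkpoint at the validation-accuracy peak, before overfitting sets in, can be sketched as follows (a generic illustration, not the authors' training script):

```python
def pick_best_epoch(val_accuracies):
    """Return the 1-based epoch with the highest validation accuracy,
    i.e. the checkpoint to keep once later epochs start to overfit."""
    best = max(range(len(val_accuracies)), key=val_accuracies.__getitem__)
    return best + 1

# Hypothetical accuracy curve that rises, peaks, then dips (overfitting),
# qualitatively like Figure 4b:
print(pick_best_epoch([0.70, 0.85, 0.92, 0.90, 0.88]))  # 3
```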

**Figure 4.** Change in (**a**) loss value and (**b**) precision value.

#### **4. Test Result and Discussion**

To test the accuracy of our method, we selected 13,911 images from a collection of 220,057 images to evaluate our neural network model. After an image is input to the neural network, the mineral category with the highest probability is output. We evaluate the performance of our method in terms of accuracy and also compare it with other methods, reporting the results below.
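The accuracy metric used here can be sketched as a top-1 check over the test set. The class-probability rows are assumed to come from the trained network; the values below are illustrative only.

```python
def top1_accuracy(prob_rows, labels):
    """Fraction of test images whose highest-probability class matches
    the true mineral label."""
    correct = sum(
        max(range(len(row)), key=row.__getitem__) == y
        for row, y in zip(prob_rows, labels)
    )
    return correct / len(labels)

# Three hypothetical test images over three mineral classes:
probs = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
print(top1_accuracy(probs, [1, 0, 1]))  # 2 of 3 predictions correct
```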
