**1. Introduction**

The particle size distribution of run-of-mine ore has a strong influence on the grinding process. Variations in the particle size distribution directly affect the throughput and power consumption of mills, especially autogenous (AG) and semi-autogenous (SAG) grinding mills [1]. Therefore, it is critical to evaluate the size distribution of run-of-mine ore on the conveyor belts in real time [2,3]. Measuring the particle size distribution by sampling and sieving is common but time-consuming. By contrast, analysis based on machine vision is considered a non-invasive, fast, and inexpensive technique for rock size measurement [4]. Since the 1980s, many studies have been conducted to evaluate the particle size distribution of materials on a conveyor belt based on machine vision and image processing technology [2,5,6]. Scholars have mainly explored three aspects. The first is accurate ore contour detection algorithms. The second is a reasonable evaluation model that converts two-dimensional information of the ore into three-dimensional information, which is then used to evaluate energy consumption or particle size distribution. The third is new technology, including neural networks, deep learning, and genetic algorithms.

In 1988, Lange developed an on-line, real-time system that captures images of rocks on conveyor belts, processes the images to obtain chord-length distributions, and then transforms the chord-length distributions into equivalent sieve sizes. Lange offered a method to distinguish belt ores of different size distributions [2]. Lin and Miller developed an image-based system that used image processing to obtain the chord lengths of rocks and used two kernel functions to calculate the cumulative chord-length distributions of regularly and irregularly shaped particles, respectively. The last step was to transform the chord-length distributions into size distributions by a transformation equation [5]. In 1997, Yen and co-workers used an empirical correction function to solve the coarse particle overlap problem [7]. Before 2000, limited by hardware technology, it was difficult to obtain sharp images, and many computationally expensive algorithms could not be adopted. Scholars mainly researched the software and hardware framework of image-based systems, contour detection, reasonable measurement parameters, and size transformation functions.

After 2000, with the rapid development of computer hardware and new technologies, image-based, on-line, real-time particle size measurement made much progress. Singh and Mohan Rao extracted the RGB color information and visual texture of particles and developed a system based on a radial basis neural network, which was used for ore classification and ore sorting [8]. Al-Thyabat and co-workers evaluated ore image segmentation results by means of Feret's diameter and equivalent area diameter, and studied the effect of camera position experimentally [9]. Levner proposed a classification-driven watershed segmentation to segment belt ore images, using machine learning to produce markers and identify ore edges [10]. Outal et al. provided a calibration method for evaluating the 3D size distribution from 2D segmentation results [11]. Andersson evaluated the size distribution of particles using ordinal logistic regression [12]. Hamzeloo et al. used different particle equivalent models to evaluate the 3D size distribution, including the equivalent area circle, best-fit rectangle, Feret diameter, and maximum inscribed disk [4]. In addition, the impact of particle shape on product properties has been researched extensively [13–16]. Many image-based analysis methods have been successfully applied to evaluate the particle size distribution [17–19]; however, three problems with image-based, on-line particle size analysis methods remain unresolved.

The first problem is that no research has focused on empty belt identification. In the course of production, we should not turn on or switch to an empty conveyor belt. Therefore, accurate recognition of the empty belt is necessary to realize the automatic control and switching of conveyor belts. The second problem is accurate segmentation of belt ore images, especially coarse-fine images. The experimental results of previous research do not show that the proposed methods can accurately segment both coarse and fine materials. An accurate image segmentation method is the basis of a particle size distribution measurement system. Although there is a method that uses machine learning to identify the pixels of ore edges, contour detection based on image segmentation is more stable, accurate, and adaptable. In the future, ore image segmentation based on deep learning is expected to become common practice. The third problem is the overlapping of particles. Even though many studies have sought viable solutions to this problem, most of them rely on empirical correction [4,7,20]. Dynamic image analysis (DIA) is a feasible method to solve the overlap problem [15].

Traditional rock image analysis methods cannot distinguish between different types of images, such as empty belt images, mixed material images, and coarse material images, although these are clearly distinct. The mixed materials include coarse-fine materials and fine materials. It is difficult to process all three types of belt ore images accurately with a single image segmentation algorithm; therefore, our analysis method should be able to classify the acquired images accurately. In recent years, with the rapid development of computer hardware and deep learning theory, convolutional neural networks (CNNs) have shown great progress in the field of image recognition and classification [21,22]. Krizhevsky et al. developed AlexNet, which can reach 83.6% top-5 accuracy for the ImageNet dataset [23]. At present, the top-5 accuracy of many convolutional neural networks in image recognition tasks can reach more than 90% for the ImageNet dataset [24–26]. Many research studies on rock image recognition and classification based on deep learning have achieved high accuracy [8,27,28].

*Minerals* **2020**, *10*, 1115 3 of 16

In this research, a method based on deep learning and image processing technology was developed in order to accurately segment belt ore images. The strategy was to first classify the belt ore images into empty belt, mixed materials, and coarse materials, and then use different algorithms to process the mixed material and coarse material images. We focused on the accuracy of belt ore image segmentation and empty belt identification. Both the model for converting two-dimensional information of ore into three-dimensional information and the impact of particle shape on the product properties are beyond the scope of this article.

#### **2. Details of the Method**

The proposed method is divided into three layers. The first layer is a classifier based on a convolutional neural network. The second layer consists of two image processing algorithms based on the OpenCV library, which process coarse material images and mixed material images, respectively. The third layer is the statistics layer. The classifier divides the raw images into three classes: empty belt, coarse materials, and mixed materials. If the belt is empty, an alarm is raised; otherwise, the coarse image segmentation (CIS) algorithm processes coarse material images and the fine image segmentation (FIS) algorithm processes mixed material images. Finally, the cumulative area distribution is calculated from the areas of the segmented regions. The technical roadmap is shown in Figure 1.
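The routing between the layers can be illustrated with a short sketch. It is not the authors' implementation: `BeltClass`, `process_image`, and the four callables passed in are hypothetical names, and the classifier and the two segmentation algorithms are treated as black boxes that are assumed to return a class label and a list of region areas, respectively.

```python
from enum import Enum
from typing import Any, Callable, List, Optional


class BeltClass(Enum):
    """The three image classes produced by the first layer."""
    EMPTY = "empty belt"
    COARSE = "coarse materials"
    MIXED = "mixed materials"


def process_image(
    image: Any,
    classify: Callable[[Any], BeltClass],          # layer 1: CNN classifier (assumed interface)
    segment_coarse: Callable[[Any], List[float]],  # layer 2: CIS, assumed to return region areas
    segment_fine: Callable[[Any], List[float]],    # layer 2: FIS, assumed to return region areas
    alarm: Callable[[Any], None],                  # hypothetical alarm hook for empty belts
) -> Optional[List[float]]:
    """Route one belt image through the classifier and the matching algorithm.

    Returns the segmented region areas for the statistics layer,
    or None when the belt is empty and the alarm was raised instead.
    """
    label = classify(image)
    if label is BeltClass.EMPTY:
        alarm(image)
        return None
    segment = segment_coarse if label is BeltClass.COARSE else segment_fine
    return segment(image)
```

Passing the classifier and segmenters in as callables keeps the routing logic testable without a trained model or OpenCV.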

**Figure 1.** The technical roadmap of the method.
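One plausible reading of the statistics layer is sketched below: for each size threshold, it reports the fraction of the total segmented area contributed by regions no larger than that threshold. The function name, the use of region area as the size measure, and the choice of thresholds are assumptions made for illustration; the text does not specify how the bins are defined.

```python
from typing import List, Sequence


def cumulative_area_distribution(
    areas: Sequence[float], thresholds: Sequence[float]
) -> List[float]:
    """Fraction of total area in regions whose area is <= each threshold.

    `areas` are segmented region areas (e.g. in px^2); `thresholds` must use
    the same unit. Both the size measure and the bins are assumptions.
    """
    total = sum(areas)
    if total <= 0:
        raise ValueError("at least one region with positive area is required")
    return [sum(a for a in areas if a <= t) / total for t in thresholds]


# Four regions with a total area of 100 px^2.
fractions = cumulative_area_distribution([10, 20, 30, 40], [10, 25, 40])
```

For the example call, the regions up to 10, 25, and 40 px² account for 10, 30, and 100 of the 100 px² total, respectively.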

#### *2.1. The Convolutional Neural Network Classifier*

#### 2.1.1. The Dataset Preparation

The size of the dataset is crucial to the performance of the trained model; an insufficient dataset leads to low recognition accuracy. The dataset used in this study consisted of 2880 images collected from the process control system (PCS) of a mineral-processing plant in Yunnan Province, China. The images were taken by eight cameras installed on eight feeding belts, at a collection rate of eight photos per minute for each camera. AXIS P3227-LVE cameras from AXIS were used, as well as LED PCS6-LED80 W lamps from Woodgrove. Each camera takes two belt ore images in succession every 15 s. The PCS stores the latest 100 pictures from each camera; therefore, the images in the PCS are completely updated every 12.5 min (100 pictures at eight pictures per minute). We wrote a Python script to transfer the pictures from the storage folders to specified folders; the transfer was executed every 13 min for a week.

In the image transfer stage, the goal was to obtain sufficient belt ore images, and directly transferring the images saved by the PCS is efficient and economical. Only two feeding belts run at any one time; therefore, there were many duplicate images in the specified folders. All of the images have a size of 2304 × 1728 px (JPEG files). We planned to select about 3000 sharp, non-repetitive, and representative belt ore images as the original dataset.

A total of 982 empty belt images were picked out, including empty belts with water stains, empty belts with small particles, images taken with a telephoto lens, and images taken with a short focal length lens. We classified belt ore images with a fine content of less than 30% as coarse material images. The features of coarse material images are obvious and similar; therefore, the number of coarse material images could be reduced appropriately, and 841 coarse material images were selected. A total of 1057 mixed material images were selected according to the proportion of fine material: 100%, 90%, 70%, and 50% fine material. All images were taken at an industrial site rather than staged; therefore, the proportion of fine material was an estimate and not an exact value. In total, 2880 high-quality, representative images were selected from the saved images as the original dataset. After physical verification, 522 px in each image was found to be equal to 30 cm. Several examples of the images are shown in Figure 2.
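The transfer script itself is not given in the text. The following is a minimal sketch of how such a periodic transfer could look, assuming the PCS exposes each camera's images as `.jpg` files in a folder; the function names and the folder layout are hypothetical. Skipping byte-identical files is an added convenience that only catches exact repeats, not visually similar images, so it does not replace the manual selection described above.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Set


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def transfer_new_images(src_dir: Path, dst_dir: Path, seen: Set[str]) -> int:
    """Copy .jpg files from src_dir to dst_dir, skipping byte-identical repeats.

    `seen` holds the digests of images copied so far and is updated in place.
    Returns the number of files copied in this pass.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied = 0
    for src in sorted(src_dir.glob("*.jpg")):
        digest = file_digest(src)
        if digest in seen:
            continue  # identical bytes were already transferred
        seen.add(digest)
        # Prefix with part of the digest so a reused file name cannot
        # overwrite an older image in the destination folder.
        shutil.copy2(src, dst_dir / f"{digest[:12]}_{src.name}")
        copied += 1
    return copied
```

Calling `transfer_new_images` once every 13 min (for example from a scheduler, or a loop with `time.sleep(13 * 60)`) would match the interval stated above, which is slightly longer than the 12.5 min it takes the PCS to replace all 100 stored pictures.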

**Figure 2.** Samples of the original dataset: empty belt, mixed materials, and coarse materials.
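The reported calibration (522 px corresponding to 30 cm) gives a simple scale factor for converting pixel measurements to physical units. The sketch below assumes the scale is uniform across the whole image; the constant and function names are illustrative.

```python
# Calibration reported for the dataset: 522 px correspond to 30 cm.
CM_PER_PX = 30.0 / 522.0


def px_to_cm(length_px: float) -> float:
    """Convert a length in pixels to centimetres (uniform scale assumed)."""
    return length_px * CM_PER_PX


def px2_to_cm2(area_px2: float) -> float:
    """Convert an area in square pixels to square centimetres."""
    return area_px2 * CM_PER_PX ** 2


# Physical extent covered by one 2304 x 1728 px image under this assumption.
field_of_view_cm = (px_to_cm(2304), px_to_cm(1728))
```

Under this assumption, one image covers roughly 132 cm × 99 cm of the belt.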

#### 2.1.2. Model Training

The design of our network is shown in Figure 3. The network consists of two convolution layers, two max-pooling layers, two fully connected layers, and three ReLU activation functions. A DELL R730 server was used to train the model, with Windows Server 2012 as the operating system, an Intel E5-2609V4 CPU, 32 GB of RAM, and an Nvidia P2000 GPU (5 GB).

The original dataset was used as the raw data. The input images, which have three channels, were resized to 500 × 500 × 3. All input images underwent a two-step pretreatment process. In the first step, the value range of the pixels was changed from 0–255 to 0–1. In the second step, the image was normalized by using the empirical mean vector and the empirical std vector. The normalization is described as follows:

$$mean = [0.485, \, 0.456, \, 0.406] \tag{1}$$

$$std = [0.229, \, 0.224, \, 0.225] \tag{2}$$

$$result = (image - mean) / std \tag{3}$$
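The two-step pretreatment can be written in a few lines of NumPy. This is a sketch rather than the authors' code: it assumes the image has already been resized, is stored as an 8-bit height × width × channel array, and has its channels in RGB order (these mean and std values are the widely used ImageNet statistics, which are given in RGB order; images loaded with OpenCV are BGR and would need their channels reversed first). The function name `preprocess` is illustrative.

```python
import numpy as np

# Empirical mean and std vectors from Equations (1) and (2), one value per channel.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image_u8: np.ndarray) -> np.ndarray:
    """Scale an H x W x 3 uint8 image to 0-1, then apply Equation (3)."""
    if image_u8.ndim != 3 or image_u8.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image")
    scaled = image_u8.astype(np.float32) / 255.0  # step 1: 0-255 -> 0-1
    # Step 2: broadcasting applies the per-channel mean and std to every pixel.
    return (scaled - MEAN) / STD


black = preprocess(np.zeros((500, 500, 3), dtype=np.uint8))
white = preprocess(np.full((500, 500, 3), 255, dtype=np.uint8))
```

A black pixel maps to `-mean / std` and a white pixel to `(1 - mean) / std` in each channel, so the normalized values are no longer confined to 0–1.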

The original dataset was completely shuffled: 20% of the images were assigned to the test set, 20% to the validation set, and the remaining 60% were used as the training set. The processed images were input into the neural network for model training. The number of epochs was set to 10, the batch size was set to 32, the learning rate was set to 0.001, and the cross-entropy was used to evaluate the training loss. The prediction result was compared with the true label in order to calculate the training accuracy and the validation accuracy of every epoch. The training accuracy and validation accuracy were used to update the weights in the model [29]. The training process is shown in Figure 4. At epoch 10, the training accuracy was 99.48%, the validation accuracy was 100%, and the training cross-entropy was 0.0146. The model was then evaluated on the test set, and the prediction accuracy was 100%.
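To make the split sizes concrete, the sketch below shuffles image indices and divides them 60/20/20. The function name and the fixed seed are assumptions (the text does not say how the shuffle was seeded); the dataset size, batch size, and number of epochs are taken from the text.

```python
import math
import random
from typing import List, Tuple


def split_indices(
    n_images: int, seed: int = 0, test_frac: float = 0.2, val_frac: float = 0.2
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle image indices and split them into train, validation and test."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_test = round(n_images * test_frac)
    n_val = round(n_images * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]      # remaining 60%
    return train, val, test


train, val, test = split_indices(2880)
batches_per_epoch = math.ceil(len(train) / 32)  # batch size 32
total_updates = batches_per_epoch * 10          # 10 epochs
```

With 2880 images this gives 1728 training, 576 validation, and 576 test images, that is, 54 batches per epoch and 540 weight updates over the 10 epochs.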

**Figure 3.** The architecture of our network.

**Figure 4.** Training process using our network.

#### *2.2. CIS and FIS Algorithms*

#### 2.2.1. CIS Algorithm

the surface of the coarse ores and the coarse ores covered by fine particles led to the region of coarse ores in the image being divided into many small regions. Therefore, the CIS algorithm should be able
