*2.1. Encircle–City Feature*

By comparing a large number of sample images, it was found that the bright area of coal block images is significantly larger than that of gangue images. However, a contrast index alone does not distinguish coal from gangue very reliably: although the contrast of coal is significantly greater than that of gangue from a local point of view, this difference is partially canceled out when the image is viewed as a whole.

In this article, the basic idea of the *Encircle–City Feature* is to divide the sample image into several contiguous small 50 × 50 areas, without overlaps or blanks, and to perform the following operations in each. Assuming that the matrix of one small 50 × 50 area is *I*, the implementation steps are as follows:

Step 1: Divide each sample image evenly into *M* × *N* small areas with *M* rows and *N* columns, such that each small area is 50 × 50 pixels without overlaps or blanks, and perform Steps 2 to 4 for each one.

Step 2: Obtain the average gray value of the "*City*". The total gray value (denoted by *City\_sum\_gray*) of the central small 30 × 30 area is calculated to obtain the average gray value (denoted by *City\_average\_gray*) of this central district. The central district is like a castle located in the central area, so we call it the "*City*" (shown in red in Figure 1). This is given by Equation (1).

$$\begin{cases} \text{City}\_{\text{sum\\_gray}} = \sum\_{i=11}^{40} \sum\_{j=11}^{40} I(i, j) \\\\ \text{City}\_{\text{average\\_gray}} = \text{City}\_{\text{sum\\_gray}} / 900 \end{cases} \tag{1}$$


**Figure 1.** Schematic diagram of *Encircle–City Feature*.

Step 3: Obtain the total gray value (denoted by *Encircle\_sum\_gray*) and the average gray value (denoted by *Encircle\_average\_gray*) of the "*Encircle*". The peripheral part of the small 50 × 50 area, excluding the central 40 × 40 pixels, is like the wall around a castle, so it is called the "*Encircle*" (shown in blue in Figure 1). Equation (2) is as follows:

$$\begin{cases} \text{Encircle}\_{\text{sum\\_gray}} = \sum\_{i=1}^{50} \sum\_{j=1}^{50} I(i, j) - \sum\_{i=6}^{45} \sum\_{j=6}^{45} I(i, j) \\\\ \text{Encircle}\_{\text{average\\_gray}} = \text{Encircle}\_{\text{sum\\_gray}} / 900 \end{cases} \tag{2}$$

Step 4: For the small area of Row *m* and Column *n*, obtain the "*Encircle–City*" value using Equation (3):

$$\begin{cases} \text{Encircle-City}(m, n) = \text{City}\_{\text{average\\_gray}} - \text{Encircle}\_{\text{average\\_gray}} \\\\ (m \le M,\ n \le N) \end{cases} \tag{3}$$
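Steps 2–4 for a single block can be sketched as follows. This is a minimal NumPy illustration, not the authors' code; the function name is ours, and the paper's 1-indexed pixel ranges become 0-indexed slices.

```python
import numpy as np

def encircle_city_value(block):
    """Per-block "Encircle-City" value of Equations (1)-(3)."""
    assert block.shape == (50, 50)
    city_average_gray = block[10:40, 10:40].sum() / 900        # Eq. (1): central 30x30 "City"
    encircle_sum_gray = block.sum() - block[5:45, 5:45].sum()  # Eq. (2): ring outside the central 40x40
    encircle_average_gray = encircle_sum_gray / 900            # the ring also holds 900 pixels
    return city_average_gray - encircle_average_gray           # Eq. (3)

# A block with a bright centre and a dark border yields a large positive value:
demo = np.zeros((50, 50))
demo[10:40, 10:40] = 9.0
print(encircle_city_value(demo))  # 9.0
```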

Step 5: Obtain the average value of the *M* × *N* "*Encircle–City*" matrix obtained in Step 4 and calculate the "*Encircle–City*" value of the whole sample image, as shown in Equation (4).

$$\begin{cases} \text{Average}\_{\text{gray}} = \sum\_{m=1}^{M} \sum\_{n=1}^{N} \text{Encircle-City}(m, n) / (M \times N) \\\\ \text{Length} = \text{find-length}(\text{Encircle-City} > \text{Average}\_{\text{gray}}) \\\\ \text{Encircle-City}\_{\text{eigenvalue}} = \text{Length} / (M \times N) \end{cases} \tag{4}$$

where *Average\_gray* denotes the mean of all elements of the "*Encircle–City*" matrix; *Length* denotes the number of elements of the "*Encircle–City*" matrix larger than *Average\_gray*, which is obtained by the function "*find-length*"; and *Encircle–City\_eigenvalue* stands for the overall "*Encircle–City Feature*" value of one sample image, as shown in Figure 2.
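The full Steps 1–5 can be sketched as follows, assuming the image sides are exact multiples of 50. This is a hypothetical NumPy helper, not the authors' implementation.

```python
import numpy as np

def encircle_city_eigenvalue(img):
    """Steps 1-5: per-block Encircle-City values over an M x N grid of
    50x50 blocks, then the fraction of blocks above the grid mean (Eq. 4)."""
    M, N = img.shape[0] // 50, img.shape[1] // 50
    ec = np.empty((M, N))
    for m in range(M):
        for n in range(N):
            block = img[m * 50:(m + 1) * 50, n * 50:(n + 1) * 50]
            city_avg = block[10:40, 10:40].sum() / 900                # Eq. (1)
            enc_avg = (block.sum() - block[5:45, 5:45].sum()) / 900   # Eq. (2)
            ec[m, n] = city_avg - enc_avg                             # Eq. (3)
    average_gray = ec.mean()                      # mean of the Encircle-City matrix
    length = np.count_nonzero(ec > average_gray)  # the "find-length" function
    return length / (M * N)                       # Encircle-City eigenvalue
```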

**Figure 2.** Recognition network for coal and gangue images.

In this article, 50 × 50 small areas were used to segment one overall sample image, in which the "*Encircle*" contains 900 pixels and the "*City*" contains 900 pixels, meaning that the *Encircle–City Feature* can reflect the texture features of the image well under ideal circumstances. It should be noted that when we used the *Encircle–City Feature* value alone to identify coal and gangue, the recognition accuracy reached 83.24%; this is not discussed in detail due to limited space.

Unfortunately, real images are often irregular, so the "*Encircle*" and the "*City*" of a small 50 × 50 local area do not necessarily follow the ideal situation shown in Figure 2, which leads to the lower accuracy (83.24%) when identifying coal and gangue images using the *Encircle–City Feature* alone. In fact, we are more concerned with the light–dark contrast of small local areas than with the whole image, so we introduced an auxiliary value of the "*Encircle–City Feature*", the "*Encircle–City Assist*", the details of which are given below.

Step 1: Divide each sample image evenly into *M* × *N* small areas with *M* rows and *N* columns, such that each small area is 50 × 50 pixels without overlaps or blanks, and perform Steps 2 and 3 for each one.

Step 2: Sort the image pixels of the small 50 × 50 area in ascending order according to the gray value, as shown in Equation (5):

$$block\_{sort} = sort(block)\tag{5}$$

where "*block\_sort*" is the matrix after arrangement in ascending order, which is calculated and returned by the function "*sort*".

Step 3: Calculate "*Encircle–City\_assist*", the auxiliary value of the "*Encircle–City Feature*" of the current small 50 × 50 block, which is the elementwise difference between the second half of "*block\_sort*" and its first half, as shown in Equation (6).

$$\text{Encircle-City}\_{\text{assist}} = block\_{sort}(2501 : 5000) - block\_{sort}(1 : 2500) \tag{6}$$

It should be noted that when we used the "*Encircle–City\_assist*" value alone to identify coal and gangue, the recognition accuracy reached 78.21%; this is not discussed in detail due to limited space.
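Steps 2 and 3 can be sketched as follows. Note that the indices 2501:5000 in Equation (6) suggest a 5000-element vector, while a 50 × 50 block holds 2500 pixels, so this hypothetical sketch takes the two halves generically.

```python
import numpy as np

def encircle_city_assist(block):
    """Equations (5)-(6): sort the block's pixels in ascending order, then
    subtract the lower half from the upper half elementwise."""
    block_sort = np.sort(block.ravel())   # Eq. (5): ascending sort of all pixels
    half = block_sort.size // 2
    return block_sort[half:] - block_sort[:half]   # Eq. (6): upper half minus lower half
```

The result is a vector of local light–dark gaps; the larger its entries, the stronger the contrast within the block.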

*2.2. ASGS-CWOA-BP*

#### 2.2.1. Overview of ASGS-CWOA

We proposed ASGS-CWOA in [20] with three contributions: the strategy of adaptive shrinking grid search (ASGS), the strategy of opposite–middle raid (OMR) and the adaptive standard deviation updating amount (ASDUA); it was shown to have superior performance compared with some state-of-the-art algorithms at the time. Accordingly, in this study, we use ASGS-CWOA to optimize the weights of the BP neural network for coal gangue image recognition. To adapt to the particularity of the recognition network weights, which are always small, some necessary adjustments of the step size are made according to the following rules.

In this article, the variation range of the weights was set between −5 and 5 based on experience, i.e., *range\_max* = 5 and *range\_min* = −5. Correspondingly, the value range is [−5, 5] in any dimension for the position of one wolf.

Thus, the step size of the siege stage can be obtained by using Equation (7) as follows:

$$\begin{cases} step\\_c\\_max = (range\\_max - range\\_min) / 2 \\\\ step\\_c\\_min = 0.01 \\\\ step\_c = step\\_c\\_min \times (range\\_max - range\\_min) \times \exp(\log(step\\_c\\_min / step\\_c\\_max) \times t / T) \end{cases} \tag{7}$$

where *step\_c\_max* is the upper limit of the siege step size, *step\_c\_min* is the lower limit, *t* indicates the current number of iterations and *T* represents the upper limit.

The step sizes of the migration stage and the summons–raid stage can be obtained by using Equation (8) as follows:

$$\begin{cases} step\_a = step\_c \times 100, \text{ when } step\_c \ge 0.001 \\\\ step\_a = step\_c \times 1000, \text{ when } step\_c < 0.001 \\\\ step\_b = step\_a \times 2 \end{cases} \tag{8}$$

where *step\_a* denotes the step size of the migration stage and *step\_b* that of the summons–raid stage. To prevent the *step\_c* value from becoming smaller and smaller with each iteration, such that the values of *step\_a* and *step\_b* become too small and harm the optimization effect, the values of *step\_a* and *step\_b* are amplified when the value of *step\_c* is less than 0.001.
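The step-size schedule can be sketched as follows. The extracted form of Equation (8) lost its subscripts, so the pairing *step\_b* = 2 × *step\_a* used here is an assumption; the function names are ours.

```python
import math

range_max, range_min = 5.0, -5.0   # weight bounds used in this article

def siege_step(t, T, step_c_min=0.01):
    """Equation (7): the siege step step_c decays exponentially as the
    iteration count t approaches the upper limit T."""
    step_c_max = (range_max - range_min) / 2
    return (step_c_min * (range_max - range_min)
            * math.exp(math.log(step_c_min / step_c_max) * t / T))

def amplified_steps(step_c):
    """Equation (8): amplify the migration step step_a and the raid step
    step_b so they do not vanish as step_c shrinks below 0.001."""
    step_a = step_c * (100 if step_c >= 0.001 else 1000)
    step_b = step_a * 2   # assumed relation between step_a and step_b
    return step_a, step_b
```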

#### 2.2.2. The Recognition Network

Based on the BP neural network and the ASGS-CWOA algorithm and considering the factors of low network complexity and less computation, this research designed a recognition network with a simple structure for coal and gangue images (RN-CGI), which includes six input nodes, four hidden nodes and one output node, as shown in Figure 2.

Here, the hidden layer adopts the "tansig" kernel function, and the output layer adopts the "purelin" kernel function. The position coordinates of each wolf in the wolf pack represent the weights of the BP neural network, and the fitness value is jointly calculated by the recognition network and the sample eigenvector according to Equations (9) and (10).

$$X\_i = (x\_{i1}, \dots, x\_{id}, \dots, x\_{iD}) \quad (i = 1, \dots, n;\ d = 1, \dots, D) \tag{9}$$

$$\begin{Bmatrix} \text{Net.} W = \begin{bmatrix} \mathbf{x}\_{i1} & \mathbf{x}\_{i5} & \mathbf{x}\_{i9} & \mathbf{x}\_{i13} & \mathbf{x}\_{i17} & \mathbf{x}\_{i21} \\ \mathbf{x}\_{i2} & \mathbf{x}\_{i6} & \mathbf{x}\_{i10} & \mathbf{x}\_{i14} & \mathbf{x}\_{i18} & \mathbf{x}\_{i22} \\ \mathbf{x}\_{i3} & \mathbf{x}\_{i7} & \mathbf{x}\_{i11} & \mathbf{x}\_{i15} & \mathbf{x}\_{i19} & \mathbf{x}\_{i23} \\ \mathbf{x}\_{i4} & \mathbf{x}\_{i8} & \mathbf{x}\_{i12} & \mathbf{x}\_{i16} & \mathbf{x}\_{i20} & \mathbf{x}\_{i24} \end{bmatrix} \\ \text{Net.} L = \begin{bmatrix} \mathbf{x}\_{i25} \\ \mathbf{x}\_{i26} \\ \mathbf{x}\_{i27} \\ \mathbf{x}\_{i28} \end{bmatrix} \end{Bmatrix} \tag{10}$$

where *x\_id* is the coordinate of the *i*-th wolf in the *d*-th dimension, *Net.W* is the weight matrix from the input layer to the hidden layer, and *Net.L* is the weight matrix from the hidden layer to the output layer. From the total number of elements of *Net.W* and *Net.L* (24 + 4), it is obvious that *D* is 28, correspondingly. In this way, the location information of each wolf can be mapped onto the weight parameters of the recognition network. By continuously optimizing the locations of the wolves, the potential optimal solution with the best fitness value is obtained.
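The mapping of Equations (9) and (10) can be sketched as follows; this is a hypothetical helper (name and array layout ours) that fills *Net.W* column by column, as the matrix in Equation (10) indicates.

```python
import numpy as np

def position_to_weights(x):
    """Equations (9)-(10): map one wolf's 28-dimensional position onto the
    network weights. Net.W (4x6) takes x_i1..x_i24 column by column;
    Net.L (4x1) takes x_i25..x_i28."""
    x = np.asarray(x, dtype=float)
    assert x.size == 28
    net_W = x[:24].reshape(6, 4).T   # each consecutive 4-element group is one column
    net_L = x[24:].reshape(4, 1)
    return net_W, net_L
```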

In this article, the network output values of all training samples are calculated according to the network weight parameters mapped from the position information of *wolf\_i*. Since this article considers the binary classification of coal and gangue images (coal = 0; gangue = 1) and the BP neural network adopts the "tansig" and "purelin" kernel functions, the following judgment can be made for the network output value *out\_i* of the *i*-th sample: when *out\_i* is less than 0.5, the sample is judged to be coal, i.e., set to 0; when it is greater than or equal to 0.5, it is judged to be gangue, i.e., set to 1. Accordingly, the fitness function is given by Equation (11).

$$\begin{cases} out = (out\_1, out\_2, \dots, out\_i, \dots, out\_{num}), \ i = 1, 2, \dots, num \\\\ right\\_num = length(out = BJ) \\\\ fitness\_k = right\\_num / num \end{cases} \tag{11}$$

where *num* is the number of training or test samples, *right\_num* is the number of correctly identified samples, *BJ* is the label (0 or 1) of the training or test sample, *length*(*out* = *BJ*) is a function that calculates and returns the number of correctly identified samples and *fitness\_k* is the fitness value of the *k*-th wolf *wolf\_k*, that is, the recognition accuracy.
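The fitness evaluation of Equation (11) amounts to thresholding and scoring; a minimal sketch (function name ours):

```python
import numpy as np

def fitness(out, BJ):
    """Equation (11): threshold the network outputs at 0.5 (coal = 0,
    gangue = 1) and return the recognition accuracy over num samples."""
    predicted = (np.asarray(out) >= 0.5).astype(int)
    right_num = int(np.count_nonzero(predicted == np.asarray(BJ)))  # length(out = BJ)
    return right_num / len(BJ)   # fitness_k = right_num / num
```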

#### *2.3. Overview of the Proposed Method*
