*2.1. ROI Extraction*

In the region of interest (ROI) extraction stage, to ensure a high defect detection rate, the DoG and edge detection with the MGRTS jointly detect suspicious defect areas, and a dedicated post-processing step merges similar areas that may belong to distributed defects. In the proposed method, edge detection with the MGRTS iteratively segments most suspicious defect regions, while the DoG is mainly used to detect large-scale defects that cannot be completely segmented by the MGRTS, as well as defects that the MGRTS may miss when the background texture gradient is strong.

### 2.1.1. MGRT-Based Iterative Threshold Segmentation

In the MGRTS, the mask gradient response is the gradient map after the strong gradient has been eliminated by the binary mask, so that various defects can be effectively extracted from the mask gradient response map by iterative threshold segmentation. The operation process is as follows.

Firstly, we calculate the horizontal gradient of the original image and obtain the gradient response map, the Original Gradient (OG). Then, adaptive threshold segmentation is used to obtain the binary image of the OG. Next, the binary image is used as a mask to eliminate the strong-gradient regions of the OG, thus obtaining the mask gradient response. As an iteration, we then repeat the first step on the mask gradient response map. Finally, the binary images obtained in each iteration are combined to obtain the segmentation results of defects of different saliency.
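
This loop can be sketched in a few lines (a minimal sketch, not the authors' code; `seg_fn` stands in for the adaptive threshold segmentation described below, and all names are illustrative):

```python
import numpy as np

def mgrts(gradient, seg_fn, max_iters=5):
    """Iterative threshold segmentation guided by mask gradient response maps.

    gradient: 2-D array, the Original Gradient (OG) response map.
    seg_fn:   callable returning a binary segmentation of a gradient map.
    Returns the union (logical OR) of the binary maps of all iterations.
    """
    mg = gradient.astype(np.float64)
    combined = np.zeros(mg.shape, dtype=bool)
    for _ in range(max_iters):
        binary = seg_fn(mg).astype(bool)   # adaptive threshold segmentation
        if not binary.any():               # nothing left to segment
            break
        combined |= binary                 # accumulate segmented regions
        mg = np.where(binary, 0.0, mg)     # mask out the strong gradients
    return combined
```

In the full method, the loop would also evaluate the termination condition of Equations (3)–(5) rather than relying only on the iteration cap.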

As shown in Figure 5, the original image (Figure 5a) contains aluminum chips and scratches, and Figure 5b is the gradient response map based on the Sobel operator. By using iterative threshold segmentation guided by mask gradient response maps, the defect areas (Figure 5f) are segmented from the gradient map. In each iteration, adaptive threshold segmentation is realized by Equation (1) and Equation (2).

$$G(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{1}$$

$$f_{bin}(x, y) = \begin{cases} 1, & f(x, y) > f(x, y) * G(x, y) + \lambda \sigma_g \\ 0, & f(x, y) \le f(x, y) * G(x, y) + \lambda \sigma_g \end{cases} \tag{2}$$

Equation (1) generates a Gaussian weight matrix of size m × m, where σ is the standard deviation. Equation (2) combines the local Gaussian-weighted sum with the global standard deviation σ<sub>g</sub> to adapt to local texture changes, so that the algorithm can better extract details and improve the detection of non-obvious defects. The ∗ denotes the convolution operator, and λ is a weight coefficient.
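
As a concrete sketch (not the authors' implementation), Equation (2) can be realized with a Gaussian filter; the parameter values here are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def adaptive_threshold(f, sigma=4.0, lam=3.5):
    """Equation (2): a pixel is foreground (1) when it exceeds its local
    Gaussian-weighted mean (Equation (1)) plus lam times the global std."""
    f = f.astype(np.float64)
    local_mean = gaussian_filter(f, sigma)  # f(x, y) * G(x, y)
    sigma_g = f.std()                       # global standard deviation
    return (f > local_mean + lam * sigma_g).astype(np.uint8)
```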

**Figure 5.** Iterative threshold segmentation of gradient map: (**a**) original image; (**b**) gradient response map (Original Gradient, or OG); (**c**) response map of the first threshold segmentation; (**d**) mask gradient (MG1) after the first threshold segmentation; (**e**) response map of the second threshold segmentation; (**f**) the final segmentation result.

As can be seen from Figure 5, when obvious defects (aluminum chips) and slight defects (scratches) exist at the same time, the slight defects cannot be completely segmented from the gradient map after the first threshold segmentation (Figure 5c). Therefore, we adopt the iterative method and design termination conditions. Before the second threshold segmentation, the response map of the first segmentation is inverted to obtain the mask. The mask is applied to the gradient map to eliminate the strong-gradient regions segmented in the first pass, yielding the gradient map (Figure 5d) for the second pass. Figure 5e shows the response map of the second threshold segmentation. Finally, by combining the response maps (Figure 5c,e) of the two segmentations, the final segmentation result (Figure 5f) is obtained.

The iteration termination condition consists of two parts: the maximum number of iterations and the change degree of the Masked Gradient (MG). As soon as either part is satisfied, the iteration terminates. The maximum number of iterations is a hyperparameter N, and the change degree of the masked gradient is calculated by Equations (3)–(5):

$$g_{i-1} = \text{mean}(MG_{i-1}) + \lambda\,\text{std}(MG_{i-1}) \tag{3}$$

$$g_{i} = \text{mean}(MG_{i}) + \lambda\,\text{std}(MG_{i}), \; i = 1, 2, \dots, N \tag{4}$$

$$Isover = \begin{cases} true, & g_{i-1} - g_{i} \le \delta \ \text{or} \ i = N \\ false, & g_{i-1} - g_{i} > \delta \ \text{and} \ i < N \end{cases} \tag{5}$$

where mean(*MG*) calculates the mean value of the masked gradient map, and std(*MG*) calculates its standard deviation. g<sub>i</sub> describes the information distribution of the masked gradient map and represents the information change of the gradient map after the *i*-th iteration. When *i* = 1, MG<sub>i−1</sub> = MG<sub>0</sub> is the original gradient OG. λ is the weight mentioned above, and δ is the threshold of the change degree. Figure 6 shows the histogram and statistical information of the gradient map in different iterations. Figure 6a shows the histogram distributions and the difference in statistical information between the Original Gradient (OG) shown in Figure 5b and the Mask Gradient (MG<sub>1</sub>) after the first threshold segmentation shown in Figure 5d, where λ is set to 5. (We first set λ to 3, since at 3 sigma the confidence probability of a normal distribution is 99.7%. To achieve a better recall rate and precision, we tested values from 3 to 5 with a step of 0.5. According to the test results, λ can be set within the range [3.5, 5]. For relatively simple surfaces such as aluminum strip, it can be set to 3.5.)
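
A direct transcription of Equations (3)–(5) (sketch only; parameter defaults follow the values reported later in Section 3.2):

```python
import numpy as np

def change_degree(mg, lam=5.0):
    """Equations (3)-(4): g = mean(MG) + lam * std(MG)."""
    return float(np.mean(mg) + lam * np.std(mg))

def is_over(g_prev, g_cur, i, n_max=5, delta=6.0):
    """Equation (5): terminate when the information change is at most
    delta, or when the maximum iteration count N is reached."""
    return (g_prev - g_cur) <= delta or i == n_max
```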

**Figure 6.** Histogram and statistical information of gradient graph in different iterations: (**a**) the original gradient (OG) and the mask gradient (MG1) after the first threshold segmentation; (**b**) the mask gradients MG1 and MG2 after the second threshold segmentation; (**c**) the OG and the MG1 of another test sample; (**d**) the OG; (**e**) the MG1.

It can be observed that after the first threshold segmentation, the statistical information value g<sub>0</sub> of the OG (red solid line in Figure 6a) is very different from the value g<sub>1</sub> of MG<sub>1</sub> (green solid line in Figure 6a), with g<sub>0</sub> − g<sub>1</sub> = 28.3, so a second threshold segmentation is needed. Figure 6b shows that after the second threshold segmentation, the statistical information value g<sub>1</sub> of MG<sub>1</sub> (red solid line) and the value g<sub>2</sub> of MG<sub>2</sub> (green solid line) are almost identical, so the iteration can end. Figure 6c shows the distribution of statistical information of another sample after the first iteration of threshold segmentation; Figure 6d,e are the gradient maps OG and MG<sub>1</sub>, respectively. For this sample, one segmentation is enough, so the distance (5.78) between g<sub>0</sub> and g<sub>1</sub> in Figure 6c provides a reference for the selection of the threshold δ.

### 2.1.2. Difference of Gaussians

Difference of Gaussians is used effectively in the Scale Invariant Feature Transform (SIFT) [37] to identify potential interest points that are invariant to scale and orientation. First, the scale space of an image is defined as a function, *L(x, y,* σ*)*, produced by the convolution of a variable-scale Gaussian, *G(x, y,* σ*)* (defined in Equation (1)), with the input image *f(x, y)*,

$$L(x, y, \sigma) = G(x, y, \sigma) * f(x, y). \tag{6}$$

Then the DoG result image is the Difference of Gaussian function convolved with the image, *D(x, y,* σ*)*, which can be computed from the difference of two nearby scales separated by a constant multiplicative factor *k*,

$$\begin{aligned} D(x, y, \sigma) &= (G(x, y, k\sigma) - G(x, y, \sigma)) * f(x, y) \\ &= L(x, y, k\sigma) - L(x, y, \sigma). \end{aligned} \tag{7}$$

Considering the time consumption and defect scale, only two scales σ = 0 (the original image) and σ = 7.1 (the corresponding window size is 45) are used, and the result of DoG is

$$D(x, y) = G(x, y, 7.1) * f(x, y) - f(x, y). \tag{8}$$

The construction of *D(x, y)* for a surface defect image of an aluminum ingot is shown in Figure 7. Figure 7a is the original image with an oxide film defect. Figure 7b is the blurred image after convolving the Gaussian function with the original image; the Gaussian window size is set to 45 according to the experiment. Figure 7c is the response map of the DoG calculated by Equation (8). In the collected image of an aluminum ingot surface, the gray value of the defect area is lower than that of the texture background to varying degrees, so this paper uses *G*(*x*, *y*, 7.1) ∗ *f*(*x*, *y*) − *f*(*x*, *y*) to reduce the influence of the background. Figure 7d shows the result of *f*(*x*, *y*) − *G*(*x*, *y*, 7.1) ∗ *f*(*x*, *y*), which introduces part of the texture response compared with Figure 7c. Figure 7e is the binary image after segmentation with a fixed threshold; it will be combined with the result image of the MGRTS by a logical OR operator.
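
A minimal sketch of Equation (8) and its fixed-threshold segmentation (the threshold value 20 is illustrative, not from the paper):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_response(f, sigma=7.1):
    """Equation (8): blurred minus original, so defect areas that are
    darker than the textured background yield a positive response."""
    f = f.astype(np.float64)
    return gaussian_filter(f, sigma) - f

def dog_binary(f, thresh=20.0):
    """Fixed-threshold segmentation of the DoG response (Figure 7e)."""
    return (dog_response(f) > thresh).astype(np.uint8)
```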

### 2.1.3. Similar Areas Merge

The segmentation results of the MGRTS and DoG are merged, and the enclosing rectangle of each defect area is obtained by contour extraction after morphological dilation. In this way, we can locate the bounding box of defects with clear boundaries; however, for distributed defects without clear boundaries, we need to further merge similar regions so as to obtain their bounding boxes more completely. For each defect ROI, the mean value and standard deviation of the original image (Figure 8a) and of the gradient map (*OG*) (Figure 8b) are calculated, giving an information distribution descriptor v of length 4. Figure 8 shows examples of a similar region (red box) and a dissimilar region (green box) in the original image and the gradient image. As shown in Figure 9, for two dissimilar regions, the gray-level histograms and statistical information (mean, standard deviation) of their original and gradient images differ to some extent. For two similar regions, however, the difference is very small, as shown in Figure 10.

**Figure 7.** Difference of Gaussian of aluminum ingot image with a large-scale defect: (**a**) the original image; (**b**) the Gaussian blur effect; (**c**) the DoG response map calculated by Equation (8); (**d**) the DoG response map calculated by the opposite of Equation (8); (**e**) the binary image after threshold segmentation.

**Figure 8.** Examples of similar region (red) and dissimilar region (green) in (**a**) the original image and (**b**) the gradient image.

**Figure 9.** Gray histogram and statistical information of two dissimilar regions: (**a**) original images; (**b**) gradient images.

As shown in Figure 11, when iterating through the extracted candidate regions, we decide whether to merge two windows into one by calculating the spatial distance between them and the Euclidean distance between their information distribution descriptors.

If the two windows overlap, or the spatial Euclidean distance d<sub>s</sub> (Equation (9)) between their center points *(p, q)* is less than the threshold δ<sub>s</sub>, the information distribution vectors v<sub>1</sub>, v<sub>2</sub> are extracted and their Euclidean distance d<sub>v</sub> (Equation (10)) is calculated.

$$d_s = \sqrt{(p_x - q_x)^2 + (p_y - q_y)^2} \tag{9}$$

$$d_v(v_1, v_2) = \sqrt{\sum_{i=1}^{4} (v_{1i} - v_{2i})^2} \tag{10}$$
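
The merge decision can be sketched as follows (illustrative names; the default thresholds follow the values chosen later in this subsection):

```python
import math

def spatial_distance(p, q):
    """Equation (9): Euclidean distance between two window centers."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def descriptor_distance(v1, v2):
    """Equation (10): Euclidean distance between two length-4 information
    distribution descriptors (mean/std of original and gradient images)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

def should_merge(p, q, v1, v2, overlap=False, delta_s=150.0, delta_v=1.0):
    """Merge two windows when they overlap or are spatially close AND
    their information distributions are similar."""
    close = overlap or spatial_distance(p, q) < delta_s
    return close and descriptor_distance(v1, v2) < delta_v
```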

**Figure 10.** Histogram and statistical information of two similar regions: (**a**) original images; (**b**) gradient images.

**Figure 11.** Schematic diagram of similar areas merging.

If the distance is less than the threshold δ<sub>v</sub>, the two windows are merged. After testing the effect of different values on the merging results, we set δ<sub>s</sub> to 150. As shown in Figure 12, the merging results are insensitive to the value of δ<sub>s</sub>; for the pitted slag inclusion defect in this paper, 150 works well. It is recommended to set δ<sub>s</sub> relatively high, as this ensures that similar areas are merged together as much as possible. Thus, defect ROIs can be detected completely, and the reduction in the number of ROIs helps to improve the speed of subsequent classification.

As for the value of δ<sub>v</sub>, we calculated the Euclidean distance between v<sub>1</sub> and v<sub>2</sub> extracted from 31,304 pairs of similar areas (Figure 13a) and 13,137 pairs of dissimilar areas (Figure 13b). As shown in Figure 13a, the profile of the histogram (H) approximately follows a normal distribution. According to the 3-sigma principle of the normal distribution and the observation of the two histograms, we set δ<sub>v</sub> = 1 (3 × std(H) = 0.082).

**Figure 12.** Similar areas merge results with different δ<sub>s</sub>: (**a**) before merging; (**b**) merge result when δ<sub>s</sub> is set to 30 or 40; (**c**) δ<sub>s</sub> = 50; (**d**) δ<sub>s</sub> = 60, 70, 80, 90; (**e**) δ<sub>s</sub> = 100, 150; (**f**) δ<sub>s</sub> = 200.

**Figure 13.** The Euclidean distance (d<sub>v</sub>) histogram of similar areas and dissimilar areas: (**a**) similar areas distance histogram; (**b**) dissimilar areas distance histogram.

### *2.2. Defect ROI Classification*

In the classification stage, considering the strong feature extraction and representation ability of CNNs, we use the inception-v3 [38] network structure to accurately identify various defects with large intra-class variation and high inter-class similarity.

Inception [39] is a popular convolutional neural network model proposed by Google. Its inception block design increases the depth and width of the network while keeping the amount of computation constant. The inception-v3 network is the third version; its biggest change is the decomposition of the 7 × 7 convolution kernel into 1 × 7 and 7 × 1 one-dimensional kernels. This speeds up computation and splits one convolution layer into two, which further increases the depth of the network and strengthens its nonlinearity. Since the aluminum ingot surface defect data set used in this paper differs considerably from ImageNet [40], fine-tuning is adopted to train the model parameters.
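
The computational saving of this factorization is easy to check: for the same input and output channels, a 1 × 7 plus a 7 × 1 kernel needs 14/49 of the weights of a full 7 × 7 kernel (the channel count 192 below is purely illustrative):

```python
def conv_params(kh, kw, in_ch, out_ch):
    """Weight count of a convolution layer (biases ignored)."""
    return kh * kw * in_ch * out_ch

full = conv_params(7, 7, 192, 192)                            # 7 x 7 kernel
factored = conv_params(1, 7, 192, 192) + conv_params(7, 1, 192, 192)
```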

The aluminum ingot defect samples used in this paper are collected from an aluminum ingot production line in China. The data set suffers from the problem that the number of real defect samples is very small, while the number of false defect and texture background samples is very large. To overcome this class imbalance and improve defect classification accuracy, we preprocess the sample set with data augmentation and introduce the focal loss into the loss function.

### 2.2.1. Data Augmentation

Based on the analysis of difficult cases in the online application, we use basic image transformations such as flipping, contrast enhancement, and sharpening to create a larger data set. Figure 14 shows the original defect images and the corresponding transformed images.
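
A minimal array-level sketch of such transformations (the contrast gain 1.5 is illustrative; sharpening would be added analogously with a convolution kernel):

```python
import numpy as np

def augment(img):
    """Return simple augmented variants of a defect ROI image:
    horizontal flip, vertical flip, and a contrast stretch about 127."""
    contrast = np.clip(127 + 1.5 * (img.astype(np.float64) - 127),
                       0, 255).astype(img.dtype)
    return [np.fliplr(img), np.flipud(img), contrast]
```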

**Figure 14.** Image transformation of slag inclusion defect.

### 2.2.2. Focal Loss for Multi-Class

The focal loss [41] was designed by Lin et al. to address the one-stage object detection scenario, in which there is an extreme imbalance between foreground and background classes during training. The focal loss for binary classification is given by Equation (11),

$$FL(p\_t) = -(1 - p\_t)^\gamma \log(p\_t),\tag{11}$$

where p<sub>t</sub> ∈ [0, 1] is the model's estimated probability for the class with label y = 1, and (1 − p<sub>t</sub>)<sup>γ</sup> is a modulating factor with a tunable focusing parameter γ ≥ 0 that down-weights easy examples and thus focuses training on hard negatives. Similarly, for *k*-class classification, the focal loss for multi-class (FLM) is as follows,

$$FLM(P_{k \times 1}) = -Y_{k \times 1} (1_{k \times 1} - P_{k \times 1})^\gamma \log(P_{k \times 1}), \tag{12}$$

where Y<sub>k×1</sub> is a one-hot label vector with k elements and P<sub>k×1</sub> is the model's estimated probability vector. The multiplication and logarithm are element-wise vector operations.
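
Equation (12) reduces to a few lines of element-wise arithmetic (a sketch; the probability clipping that guards the logarithm is our addition):

```python
import numpy as np

def focal_loss_multiclass(p, y, gamma=2.0):
    """Equation (12): FLM = -sum_k y_k * (1 - p_k)**gamma * log(p_k),
    with all operations element-wise on the probability/one-hot vectors."""
    p = np.clip(np.asarray(p, dtype=np.float64), 1e-12, 1.0)
    y = np.asarray(y, dtype=np.float64)
    return float(-np.sum(y * (1.0 - p) ** gamma * np.log(p)))
```

With γ = 0, the expression falls back to the ordinary cross-entropy loss.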

### **3. Results**

The algorithm proposed in this paper is a two-stage target detection algorithm, so corresponding experiments are carried out to analyze and evaluate the performance of the ROI extraction and ROI classification algorithms. Finally, the performance of the whole algorithm is evaluated.

### *3.1. Evaluation Metric*

In actual production, the impact of missed defect detection is much more serious than that of false detection. Therefore, a surface defect detection system needs a high recall rate for real defects while maintaining high precision. In the experiments, precision, recall, and F1-score are used to evaluate the system performance, and accuracy is used to evaluate the classifier performance. These metrics are defined as follows,

$$precision = \frac{TP}{TP + FP}, \quad recall = \frac{TP}{TP + FN}, \quad \text{F1-score} = 2 \times \frac{precision \times recall}{precision + recall}, \tag{13}$$

$$acc = \frac{TP + TN}{TP + FN + FP + TN}, \tag{14}$$

where *TP* represents the number of true positives, *FP* represents the number of false positives, *FN* represents the number of false negatives, and *TN* represents the number of true negatives.
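
These definitions translate directly into code (a sketch with illustrative counts):

```python
def detection_metrics(tp, fp, fn, tn=0):
    """Equations (13)-(14): precision, recall, F1-score, and accuracy."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    acc = (tp + tn) / (tp + fn + fp + tn)
    return precision, recall, f1, acc
```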

### *3.2. Experimental Analysis of ROI Extraction Algorithm*

In the MGRTS, the maximum number of iterations N and the change degree threshold δ are set to 5 and 6, respectively, and the Gaussian weight matrix size m in adaptive threshold segmentation is set to 25. In the DoG, we conducted experiments to choose the most appropriate Gaussian window size, which is related to the texture scale. We set the window size to 25, 35, 45, and 55 to test the effect of the DoG. The experimental result (Figure 15) shows that a window size of 45 works best; that is, the DoG not only highlights the defect structure but also suppresses most of the texture background.

**Figure 15.** Cropped DoG results with different window sizes: (**a**) Original image; (**b**) DoG image with window size (ws) 25; (**c**) DoG image with ws 35; (**d**) DoG image with ws 45; (**e**) DoG image with ws 55.

We also experimented to test the ROI extraction effectiveness of the MGRTS and DoG, and the performance of the similar-area merging algorithm. Figure 16 shows a few representative results for different defects, including oxide film (Figure 16a,b), oil stain (Figure 16c), pitted slag inclusion (Figure 16d), and crack (Figure 16e). The first row of Figure 16 is the binary response map of the MGRTS, the second row is the binary response map of the DoG, and the third row is the combination of the two response maps. As shown in Figure 16b, the MGRTS failed to segment the large-scale defect completely, and it also failed to detect the oil stain in Figure 16c, which is mixed into a dense texture background; the DoG algorithm makes up for both shortcomings. The fourth row of Figure 16 shows the enclosing rectangle of each defect area obtained by contour extraction after morphological dilation, and the last row is the final ROI bounding box after merging similar regions. It can be seen that for large-scale defects (Figure 16a,b) and distributed defects without obvious boundaries (Figure 16d), the similar-region merging algorithm integrates local regions into a complete bounding box, which also reduces the number of ROI windows and improves classification efficiency and accuracy.

Table 1 shows the quantitative evaluation of the MGRTS, the DoG, and their combination in terms of recall and precision. We tested on a data set of defective images captured on an aluminum ingot milling production line. The data set consists of 180 images of size 4096 × 1024, containing 153 defects such as oxide film, oil stain, crack, slag inclusion, and pitted slag inclusion. It can be seen that the combination of the MGRTS and DoG boosts the ROI extraction performance, especially the recall rate, which is more important for the production line.


**Table 1.** Comparison of MGRTS, DoG, and the combination of both.

### *3.3. Experimental Analysis of Defect ROI Classification*

In the classification experiment, we used 32,665 defect ROI images of aluminum ingots, of which 10% were randomly selected as the validation set and 10% as the test set; the remaining 80% were used as the training set. The specific number of each type of defect is shown in Table 2, from which it can be seen that the sample numbers of the defect types are extremely imbalanced.

**Table 2.** Specific number of each type of defect used in the classification experiment.


**Figure 16.** ROI extraction results of different defects: (**a**) OF; (**b**) OF; (**c**) Oil; (**d**) PSI; (**e**) Sc.

In order to verify the ability of focal loss to deal with sample imbalance, we compared the classification performance of cross-entropy loss and focal loss in the inception-v3 network. Figure 17 shows the recall of the two methods for each type of defect on the test set. From the green curve in the figure, it can be seen that inception-v3 with focal loss significantly improves the recall rate of adhesion aluminum and mosquito defects, which have relatively few samples.

**Figure 17.** The recall curve of the two methods for each type of defect.

We also compared the improved inception-v3 network with the traditional machine learning method proposed in [42]. As described in [42], we extracted seven features including anisometry, circularity, ratio between width and area, compactness, rectangularity, elongation, and ratio between area and perimeter. An Artificial Neural Network (ANN) was then trained with the features extracted from the aluminum ingot defect images. The classification accuracy comparison is listed in Table 3. The ANN with extracted geometric features failed to recognize the AA and Mo defects, because the AA defects are geometrically similar to PSI, while the Sc and Mo defects are similar to SI; these seven features cannot distinguish them well.

**Table 3.** Classification accuracy of the Artificial Neural Network (ANN), inception-v3, and inception-v3 with focal loss.


### *3.4. Overall Performance Analysis of the Proposed Algorithm*

We test the overall performance of our algorithm on the data set of 180 defective images described above. On the premise that the detection resolution meets the needs of the industrial field, we down-sample the image to half of its original size to improve processing speed.

As a comparison, we also test three one-stage object detection algorithms: YOLOv3 [43], RetinaNet [41], and YOLOv4 [44]. To match the network structure, improve detection accuracy, and reduce the loss caused by large-scale down-sampling, we preprocess the original annotated images. First, the original image is down-sampled to half of its original size; then the aluminum ingot area image after boundary detection is divided into two parts; finally, each part is normalized to 512 × 512 for network training. Due to the small number of original images, we augment the defective images to three times the original number; 2/3 are used as the training set, and the remaining 1/3 as the test set.

Figure 18 shows the detection effect of the four methods for different defects. To prevent some small defects from being covered by the bounding boxes, we show the detection results of the four methods on four separate images. The detection results of YOLOv3, RetinaNet, YOLOv4, and our method are shown from left to right in each group of comparison images. YOLOv3 uses multi-scale features to detect objects, and it identifies defects such as large-scale oxide film (Figure 18d), crack (Figure 18e,f), and small-scale slag inclusion (Figure 18c) well, even though the sample data set used in this paper is small. However, for the distributed pitted slag inclusion (Figure 18a,g) and the low-contrast scratches (Figure 18b), its performance is poor; for the scratch defect in particular, the recall rate is very low. In contrast, our algorithm detects scratches and pitted slag inclusion well thanks to the iterative threshold segmentation of the masked gradient response map and the merging of similar regions.

Table 4 compares the recall (R), precision (P), and F1-score (F1) of the four methods for each type of defect, along with the inference time of each algorithm. In the experiments, the top-1 strategy was used when compiling the detection statistics, and no score threshold was applied. The average recall rate and precision of our algorithm are both over 92.0%, whereas, dragged down by the scratch defect, the average recall rate of YOLOv3 is only 66.1%. Meanwhile, RetinaNet is worse at detecting Cr defects. On the whole, the metrics of YOLOv4 are high, but, as with YOLOv3, its recall rate for scratch defects is low. Our algorithm has a relatively low recall and precision for pitted slag inclusion defects because the resolution of some defects in the image is low, which affects the accuracy of defect classification. As can be seen from Table 4, our algorithm achieves the highest F1-score and the shortest inference time.


**Figure 18.** Detection effect of different methods: (**a**) PSI; (**b**) Sc; (**c**) SI; (**d**) OF; (**e**) Cr; (**f**) Cr; (**g**) PSI. The detection results of YOLOv3, RetinaNet, YOLOv4, and our method are shown from left to right in each group of comparison images.


**Table 4.** Performance comparison of the four methods for each type of defect.

In order to test the robustness of the algorithm to illumination changes, we increased and reduced the brightness of the original image to simulate changes in light source brightness. As shown in Figure 19, the brightness of the original image is changed by −40%, −20%, +20%, and +40%, respectively. The detection performance for each defect is essentially unaffected by illumination, for two reasons: (1) ROI extraction is based on gradient differences, which are not affected by the overall brightness change of the image; (2) thanks to the data augmentation described in Section 2.2.1, ROI classification is robust to illumination change to a certain extent.

**Figure 19.** Detection effect of different brightness: (**a**) PSI; (**b**) OF; (**c**) PSI; (**d**) Sc. From left to right, the brightness of the image is reduced by 40%, reduced by 20%, unchanged, increased by 20%, and increased by 40%.

We also analyzed the reasons for the failure cases (shown in Figure 20) of our method. Figure 20a shows that our method produces false negatives for Sc defects, mainly caused by their low contrast and their horizontal distribution, which resembles the milling grain background. Similarly, some small PSI defects with low contrast are missed in Figure 20c. Because there were no corresponding samples in the classification network training, pitted oil areas are incorrectly detected as PSI defects, as shown in Figure 20b. For the large-scale oxide film in Figure 20d, the coverage area is so large that its interior is treated as a regular pattern and neglected by the MGRTS + DoG; as a result, only the edge is retained.

**Figure 20.** Failure cases of our method: (**a**) False negatives (blue box) of Sc (to facilitate observation, we increased the contrast by 20%); (**b**) False positives (yellow box) of PSI; (**c**) False negatives (blue box) of PSI (to facilitate observation, we increased the contrast by 20%); (**d**) Incomplete detection of OF.
