#### **2. The Characteristics of Defect Images**

The surface temperature of continuous casting slabs is very high during production, reaching about 800–900 °C. As a result, the surface oxidizes and forms a large number of scales of various shapes [1], which seriously interfere with the detection and recognition of defects. At the same time, because of insufficient illumination at the production site and the rough surface of the slabs, the slab surface appears uneven in the images. Image quality is further degraded by splashing cooling water, rolling mill vibrations, and other factors. Figure 1 shows several common surface images of continuous casting slabs, including cracks, scales, lighting variations, and slag marks. The cracks are true defects, while the scales, lighting variations, and slag marks are interference factors that may lead to misclassification. The interference factors are labeled as false defects and are also treated as a class to be recognized. The main task of surface inspection for continuous casting slabs is to recognize crack defects against the complicated background and interference factors.

**Figure 1.** Surface images of continuous casting slabs. (**a**) Cracks; (**b**) Scales; (**c**) Lighting variations; (**d**) Slag marks.

Figure 1a shows a longitudinal crack defect sample, abbreviated as cracks. The cracks run mainly along the longitudinal direction of the slab, are curved in shape, and range in length from a few centimeters to dozens of centimeters. They usually have a certain depth and therefore show grayscale values different from those of the surrounding pixels under intense illumination. The occurrence of cracks is mostly related to the high temperature of the steel and to various mechanical behaviors in the solidification process. A crack is a very serious defect.

Figure 1b shows a scale sample. The shape of scales is irregular and varies greatly. Some scales are warped, but most adhere to the slab surface and show texture features. Because the scales cover the slab surface, some fine crack defects are hard to identify.

Figure 1c shows lighting variations. The bright and dark areas are caused by irradiation from multiple light sources or by changes in illumination intensity. The boundaries between the bright and dark areas appear as straight lines, and the gray values on the two sides are noticeably different. Therefore, these boundaries are sometimes misclassified as cracks.

Figure 1d shows slag marks, which are mainly formed by residual slag. Slag marks are also distributed along the longitudinal direction of the slabs with a certain width, and their gray values are lower than those of the surrounding pixels. These images are easily misclassified as crack defects.

#### **3. Basic Principles of the Proposed Method**

Feature extraction is the core of the defect recognition algorithm, and its quality directly affects the recognition results. The feature extraction scheme proposed in this paper, DNST-GLCM-KSR, combines three techniques, DNST, GLCM, and KSR, which are described in detail below.

#### *3.1. Discrete Nonseparable Shearlet Transform*

Wavelet analysis has been applied in many fields because of its multi-scale decomposition and fast computation. Its drawback is insufficient directional decomposition: it can only decompose an image in the horizontal, vertical, and diagonal directions. To compensate for this deficiency, multi-scale geometric analysis (MGA) methods were proposed. Typical MGA methods include the Ridgelet transform [20], Curvelet transform [21], Contourlet transform [22], Shearlet transform [23], and Bandelet transform [24]. The most widely used are the Contourlet and Shearlet transforms, while the other methods are limited by their slow computational speed. The Contourlet transform is fast, but its directional representation is limited. The Shearlet transform is slower than the Contourlet transform, but its directional representation is more flexible. Continuous casting slabs move very slowly on the track, at less than 1 m/s, so the defect recognition algorithm does not need to be especially fast. The discrete nonseparable shearlet transform (DNST) [13,14] is a new kind of shearlet transform.

W.Q. Lim [13] proposed the discrete nonseparable shearlet transform (DNST) based on the discrete frame. DNST is constructed from a 2D nonseparable fan filter (improved directional selectivity) and a separable compactly supported shearlet generator (excellent localization properties). It is a directional representation system that extends the wavelet frame. DNST shares an advantage of wavelets, namely a unified treatment of the continuum and digital settings.

The two-dimensional discrete shearlet transform is usually obtained from a cone-adapted discrete shearlet system defined by the scaling functions $\phi_m$ and the shearlet functions $\psi_{j,k,m}$ and $\widetilde{\psi}_{j,k,m}$ (the latter obtained by swapping the order of the two variables of $\psi_{j,k,m}$):

$$\left\{\phi_m: m \in \mathbb{Z}^2\right\} \cup \left\{\psi_{j,k,m}, \widetilde{\psi}_{j,k,m}: j \in \mathbb{Z}, j \ge 0, |k| \le 2^{\frac{j}{2}}, m \in \mathbb{Z}^2\right\} \tag{1}$$

where *j* is the scale parameter, *m* is the translation parameter, and *k* is the shear (direction) parameter.

The nonseparable generator $\psi^{\text{non}}$ is defined as follows:

$$
\hat{\psi}^{\text{non}}(\xi) = P(\xi_1/2, \xi_2) \hat{\psi}(\xi) \tag{2}
$$

where the trigonometric polynomial *P* is a 2D fan filter, which improves directional selectivity in the frequency domain at each scale; $\psi$ is the 2D separable shearlet generator, which provides excellent localization properties; and $\hat{\psi}$ is the Fourier transform of $\psi$. The nonseparable shearlets $\psi^{\text{non}}_{j,k,m}(\mathbf{x})$ are generated from $\psi^{\text{non}}$ by setting

$$
\psi_{j,k,m}^{\text{non}}(\mathbf{x}) = 2^{\frac{3}{4}j} \psi^{\text{non}} \left( S_k A_{2^j} \mathbf{x} - M_{c_j} m \right) \tag{3}
$$

where $A_{2^j}$ is the parabolic scaling matrix, $S_k$ is the shear matrix, and $M_{c_j}$ is a sampling matrix given by $M_{c_j} = \mathrm{diag}(c_1^j, c_2^j)$. We only discuss the shearlet coefficients associated with $A_{2^j}$ and $S_k$; the same procedure can be applied to compute the shearlet coefficients associated with $\widetilde{A}_{2^j}$ and $\widetilde{S}_k$. After faithfully digitizing $\psi^{\text{non}}_{j,k,m}$, the digital formulation of the discrete nonseparable shearlet transform (DNST) is given by

$$DNST_{j,k,m}(f_J) = \left(f_J * \overline{\psi}_{j,k}^d\right) \left(2^J A_{2^j}^{-1} M_{c_j} m\right) \text{ for } j = 0, \dots, J-1 \tag{4}$$

where $f_J$ denotes the scaling coefficients and $\psi^d_{j,k}$ are the digital shearlet filters, $\psi^d_{j,k} = S^d_{k2^{-j/2}}(p_j * W_j)$. Here $S^d_{k2^{-j/2}}$ is the discrete shear operator, $p_j$ are the Fourier coefficients of *P*, and $W_j = g_{J-j} \otimes h_{J-j/2}$, where $g_{J-j}$ and $h_{J-j/2}$ are 1D filters. Please refer to reference [13] for details.
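To make the indexing in Equations (1) and (3) more concrete, the following is a minimal sketch rather than an implementation of DNST itself. It assumes the forms commonly used in the shearlet literature, $A_{2^j} = \mathrm{diag}(2^j, 2^{j/2})$ and $S_k = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}$, which are not written out in the text above. The function names and the default sampling constants `c = (1.0, 1.0)` are hypothetical, and the shear range follows the bound $|k| \le 2^{j/2}$ exactly as written in Equation (1).

```python
import numpy as np

def parabolic_scaling(j):
    """Assumed parabolic scaling matrix A_{2^j} = diag(2^j, 2^(j/2))."""
    return np.diag([2.0 ** j, 2.0 ** (j / 2)])

def shear(k):
    """Assumed shear matrix S_k = [[1, k], [0, 1]]."""
    return np.array([[1.0, float(k)], [0.0, 1.0]])

def shear_indices(j):
    """Integer shear parameters k with |k| <= 2^(j/2), as in Equation (1)."""
    kmax = int(np.floor(2.0 ** (j / 2)))
    return list(range(-kmax, kmax + 1))

def shearlet_argument(x, j, k, m, c=(1.0, 1.0)):
    """Argument S_k A_{2^j} x - M_{c_j} m of the generator in Equation (3).

    c holds the two sampling constants for scale j (hypothetical defaults).
    """
    M = np.diag(c)
    x = np.asarray(x, dtype=float)
    m = np.asarray(m, dtype=float)
    return shear(k) @ parabolic_scaling(j) @ x - M @ m

# Number of shear directions per cone at the first few scales.
for j in range(4):
    print(j, len(shear_indices(j)))
```

The loop only counts how many shear directions each scale contributes under these assumptions; the actual filters $\psi^d_{j,k}$ are built as described in reference [13].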

The frequency tiling induced by such a discrete shearlet system is shown in Figure 2a, where $\hat{\phi}$, $\hat{\psi}^{\text{non}}$, and $\hat{\widetilde{\psi}}^{\text{non}}$ are associated with the square in the center, the horizontal cone (white), and the vertical cone (yellow), respectively. Each scale corresponds to a ring of tiles, and each shear is associated with a pair of tiles in a certain direction within the ring. With a proper choice of the parameters associated with the translation, the DNST is obtained as a series of filtering operations. Each shearlet function has two symmetric tiles. The magnitude of the shearlet filter is shown in Figure 2b,c. Figure 2b shows the frequency tiles of a DNST filter corresponding to the first scale, and Figure 2c shows those of a DNST filter corresponding to the second scale.

**Figure 2.** (**a**) The tiling of the ideal frequency plane by a cone-adapted shearlet system. (**b**) The frequency tiles of a discrete nonseparable shearlet transform (DNST) filter corresponding to the first scale. (**c**) The frequency tiles of a DNST filter corresponding to the second scale.

#### *3.2. Gray-Level Co-Occurrence Matrix*

The gray-level co-occurrence matrix (GLCM) is an effective texture feature extraction approach. GLCM considers not only the distribution of intensities but also the relative positions of pixels in an image [25]. Let *Q* be an operator that defines the position of two pixels relative to each other, and consider an image $I_f$ with $L$ possible intensity levels. Let *G* be an $L \times L$ matrix whose element $g_{vh}$ is the number of times that pixel pairs with intensities $z_v$ and $z_h$ occur in $I_f$ in the position specified by *Q*. A matrix formed in this manner is referred to as a gray-level co-occurrence matrix. Generally, the GLCM is not used directly as a texture feature; instead, it is summarized by descriptors such as energy, contrast, entropy, homogeneity, and correlation.
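The construction above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the implementation used in the paper: the position operator *Q* is taken to be a fixed pixel offset `(dx, dy)`, the image is assumed to be already quantized to integers in `0..levels-1`, and the descriptor formulas follow one common textbook convention (definitions of energy in particular vary between references). The function names are hypothetical.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Count pixel pairs (image[r, c], image[r + dy, c + dx]).

    The offset (dx, dy) plays the role of the position operator Q.
    """
    img = np.asarray(image, dtype=np.intp)
    rows, cols = img.shape
    # Restrict to reference pixels whose neighbor lies inside the image.
    r0, r1 = max(0, -dy), min(rows, rows - dy)
    c0, c1 = max(0, -dx), min(cols, cols - dx)
    ref = img[r0:r1, c0:c1]
    nbr = img[r0 + dy:r1 + dy, c0 + dx:c1 + dx]
    G = np.zeros((levels, levels), dtype=np.int64)
    np.add.at(G, (ref.ravel(), nbr.ravel()), 1)  # accumulate repeated pairs
    return G

def glcm_descriptors(G):
    """Five texture descriptors computed from the normalized GLCM."""
    p = G / G.sum()
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())
    nonzero = p[p > 0]
    if sd_i * sd_j > 0:
        corr = ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j)
    else:
        corr = 0.0  # correlation is undefined for a constant image
    return {
        "energy": (p ** 2).sum(),
        "contrast": (((i - j) ** 2) * p).sum(),
        "entropy": -(nonzero * np.log2(nonzero)).sum(),
        "homogeneity": (p / (1.0 + np.abs(i - j))).sum(),
        "correlation": corr,
    }

# Tiny 3-level example with Q = "one pixel to the right".
example = [[0, 0, 1],
           [1, 2, 2],
           [0, 1, 2]]
G = glcm(example, levels=3)
print(G)
print(glcm_descriptors(G))
```

In the tiny example, the pair (0, 1) occurs twice with the second pixel immediately to the right of the first, so the entry of `G` in row 0, column 1 is 2.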

#### *3.3. Kernel Spectral Regression*

Kernel spectral regression (KSR) is a dimensionality reduction method based on manifold learning and subspace learning [18]. KSR assumes that the original data lie on a low-dimensional manifold embedded in the high-dimensional observation space. It preserves the neighborhood relations of each sample so as to uncover the low-dimensional manifold structure contained in the high-dimensional data. KSR only needs to solve a set of regularized least squares problems, which results in substantial savings of both time and memory. It makes efficient use of label and local neighborhood information to discover the intrinsic discriminant structure in the data. The algorithmic procedure is stated below.


$$w_{ij} = \begin{cases} 1/m_k, & \text{if } \mathbf{x}_i \text{ and } \mathbf{x}_j \text{ both belong to the } k\text{-th class} \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

where $w_{ij}$, the $(i, j)$-th element of the weight matrix **W**, is the weight of the edge joining vertices *i* and *j*, and $m_k$ is the number of samples in the *k*-th class.

(3) Responses generation. Find $\mathbf{y}_0, \mathbf{y}_1, \cdots, \mathbf{y}_{c-1}$, the generalized eigenvectors associated with the *c* largest eigenvalues of the eigenproblem

$$\mathbf{W}\mathbf{y} = \lambda \mathbf{D}\mathbf{y} \tag{6}$$

where **D** is a diagonal matrix whose (*i*, *i*)-th element equals the sum of the *i*-th column of **W**, and *c* is the number of classes.

(4) Regularized kernel least squares. Find *c* − 1 vectors $\boldsymbol{\alpha}_1, \cdots, \boldsymbol{\alpha}_{c-1} \in \mathbb{R}^m$, where $\boldsymbol{\alpha}_k$ ($k = 1, \cdots, c-1$) is the solution of the linear system

$$(\mathbf{K} + \alpha \mathbf{I})\boldsymbol{\alpha}_k = \mathbf{y}_k \tag{7}$$

where $K_{ij} = K(\mathbf{x}_i, \mathbf{x}_j)$ and the scalar $\alpha$ is the regularization parameter. It can be easily verified that the function $f(\mathbf{x}) = \sum_{i=1}^{m} \alpha_i^k K(\mathbf{x}, \mathbf{x}_i)$ is the solution of the following regularized kernel least squares problem:

$$\min_{f \in H_K} \sum_{i=1}^{m} \left( f(\mathbf{x}_i) - y_i^k \right)^2 + \alpha \|f\|_K^2 \tag{8}$$

(5) KSR embedding. Let $\Theta = [\boldsymbol{\alpha}_1, \boldsymbol{\alpha}_2, \cdots, \boldsymbol{\alpha}_{c-1}]$, an $m \times (c-1)$ transformation matrix. The samples can be embedded into the ($c-1$)-dimensional subspace by

$$\mathbf{x} \to \mathbf{z} = \Theta^{T} K(:, \mathbf{x}) \tag{9}$$

where $K(:, \mathbf{x}) = [K(\mathbf{x}_1, \mathbf{x}), K(\mathbf{x}_2, \mathbf{x}), \cdots, K(\mathbf{x}_m, \mathbf{x})]^T$.
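Steps (3) to (5) can be sketched as follows. This is a minimal NumPy sketch under several assumptions, not the authors' implementation. The weight matrix uses class-size-normalized weights (the reciprocal of the number of samples in each class), which is the usual supervised choice in spectral regression and is assumed here. For such a matrix every column sums to 1, so **D** is the identity and the class indicator vectors span the leading eigenspace of Equation (6); the sketch therefore obtains the responses by orthogonalizing indicator vectors against the all-ones vector instead of calling an eigensolver. The RBF kernel, the parameter names `gamma` and `reg`, and all function names are hypothetical choices for illustration.

```python
import numpy as np

def class_weight_matrix(labels):
    """Weights: 1/(class size) if samples i and j share a class, else 0."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    counts = same.sum(axis=1)              # class size for each sample
    return same / counts[:, None]

def responses(labels):
    """c - 1 responses spanning the top eigenspace of W y = lambda D y.

    Assumes at least two classes. The all-ones vector is placed first in
    the QR factorization and then dropped, since it separates no classes.
    """
    labels = np.asarray(labels)
    classes = np.unique(labels)
    m = labels.shape[0]
    indicators = [(labels == c).astype(float) for c in classes[:-1]]
    basis = np.column_stack([np.ones(m)] + indicators)
    Q, _ = np.linalg.qr(basis)
    return Q[:, 1:]

def rbf_kernel(A, B, gamma):
    """Assumed kernel: K(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def ksr_fit(X, labels, reg=0.01, gamma=1.0):
    """Solve (K + reg * I) alpha_k = y_k for all k at once (Equation (7))."""
    Y = responses(labels)
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + reg * np.eye(X.shape[0]), Y)

def ksr_embed(Theta, X_train, X_new, gamma=1.0):
    """Equation (9): z = Theta^T K(:, x), one row of output per new sample."""
    K_new = rbf_kernel(X_train, X_new, gamma)   # shape (m, n_new)
    return (Theta.T @ K_new).T

# Toy usage: 3 classes of 2D points embedded into c - 1 = 2 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(mu, 0.3, (10, 2)) for mu in (0.0, 3.0, -3.0)])
labels = np.repeat([0, 1, 2], 10)
Theta = ksr_fit(X, labels, reg=0.01, gamma=0.5)
Z = ksr_embed(Theta, X, X, gamma=0.5)
print(Theta.shape, Z.shape)
```

Solving for all right-hand sides in a single `np.linalg.solve` call reuses one factorization of the regularized kernel matrix, which reflects the point made above that KSR only requires regularized least squares solves.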

#### **4. Defect Recognition Algorithm**

The defect recognition algorithm is the core of the surface quality inspection system. Generally, a defect recognition algorithm consists of image preprocessing, feature extraction, and classification. To retain as much information as possible from the surface images of continuous casting slabs, we do not carry out image preprocessing. Figure 3 is the schematic diagram of the defect recognition algorithm. The details are as follows.


respectively. Then, the mean and variance of each texture descriptor are calculated, so each sample image yields a 10-dimensional GLCM texture feature vector.


**Figure 3.** Schematic diagram of defect recognition algorithm.

#### **5. Experiments and Discussions**

In this section, we introduce the sample database used in the experiments in Section 5.1. Some important parameter settings are explained in Section 5.2. In Section 5.3, the feature extraction results of DNST are presented and compared with three commonly used multi-scale methods and one texture extraction method. In Section 5.4, the experimental results of the proposed scheme are presented and compared with other feature combination schemes. The advantages of our proposed scheme in classification time and accuracy are discussed in detail in Section 5.5. Finally, we analyze the specific classification and visualization results of the proposed scheme.

#### *5.1. Sample Database*

The samples were collected by an online surface inspection system on a continuous casting slab production line in a steel plant. The defect database consists of 496 samples, divided into positive and negative samples. The 222 positive samples contain crack defects. The 274 negative samples comprise three types of images: scales, lighting variations, and slag marks. The cracks are defects, and the other three types are pseudo-defects. Because pseudo-defects are the main cause of false classification, they are labeled as a sample class of their own. The odd-numbered samples were used as the training set, and the even-numbered samples as the test set. All sample images are cropped to 128 × 128 pixels for classification.
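The odd/even split can be illustrated with a short sketch. It assumes that the samples are numbered from 1 over the whole database; the text does not state whether numbering is global or per class, so this is only one possible reading, and the function name is hypothetical.

```python
def split_odd_even(samples):
    """Split a sequence numbered from 1 into odd- and even-numbered items."""
    train = samples[0::2]   # 1st, 3rd, 5th, ... (odd-numbered samples)
    test = samples[1::2]    # 2nd, 4th, 6th, ... (even-numbered samples)
    return train, test

# Sample numbers 1..496 stand in for the actual images.
sample_ids = list(range(1, 497))
train_ids, test_ids = split_odd_even(sample_ids)
print(len(train_ids), len(test_ids))
```

Under this reading, the 496 samples are divided evenly into 248 training samples and 248 test samples.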
