**1. Introduction**

Surface defect detection is a critical step in the metal industry. As the underlying technologies become more feasible and their results reliable enough to support decisions, optical non-destructive testing (ONDT) has gained increasing attention in this field. This is mainly due to the development of the tools involved: lasers, cameras, and faster computers capable of processing the large amounts of data encoded in optical measurements [1]. A review of the main ONDT technologies, including fiber optics [3], electronic speckle [4], infrared thermography [5], endoscopy, and terahertz technology, is provided in [2]. This paper focuses on the digital speckle measurement method because it relies on CCD technology and advanced computer vision. For high-quality inspection of various materials in all kinds of environments, computer vision has become the mainstream approach and has replaced conventional manual inspection, which is inefficient and labor-intensive.

Texture analysis is a powerful tool for detecting defects in visual inspection applications, since textures carry valuable information about the features of different materials [6]. In computer vision, textures are broadly divided into two main categories: statistical and structural. As shown in Figure 1, statistical textures are isotropic and do not have easily identifiable primitives. In contrast, structural (or patterned) textures are characterized by a set of repetitive primitives and placement rules, as shown in Figure 2. Both statistical and structural textures can be homogeneous (Figure 1a,b and Figure 2a,b) or inhomogeneous (Figure 1c,d and Figure 2c,d). Note that Figures 1d and 2b are taken from References [7] and [8], respectively. As can be seen, the milled surface we deal with exhibits structural textures that are either homogeneous or inhomogeneous (Figure 2a,c,d).

**Figure 1.** Statistical texture examples. (**a**) Hot-rolled steel strip surface, homogeneous; (**b**) Con-casting slab surface, homogeneous; (**c**) Con-casting slab surface, inhomogeneous; (**d**) Bridge deck, inhomogeneous.

**Figure 2.** Structural texture examples. (**a**) Milled surface of aluminum ingot, oriented, homogeneous; (**b**) Fabric, isotropic, homogeneous; (**c**,**d**) Milled surface of aluminum ingot, inhomogeneous.

To enable automatic and non-destructive detection, visual inspection systems have been widely applied to surface inspection of concrete structures [7,9–11] and metal surfaces [12–30]. For concrete structures, many studies attempt to detect cracks through image analysis [7,9–11]. For metal surfaces, visual inspection systems have been applied to both ferrous and nonferrous metals. For nonferrous metals, methods to detect surface defects on products such as aluminum strips [12–14], aluminum foils [15], and aluminum profiles [16–18] are well established. For ferrous metals, the types of steel surfaces studied for vision-based defect detection include slab [14,19,20], plate [21–23], hot strip [24–26], and cold strip [27–29]. A comprehensive survey of typical flat steel products can be found in [30]. In general, the above defect detection techniques can be roughly divided into three categories: statistical, filtering, and machine learning.

Statistical methods build a mathematical model from probability theory and mathematical statistics in order to infer, predict, quantitatively analyze, and summarize the spatial distribution of pixel values [31]. Reference [7] presents a multiple-feature crack detection algorithm for bridge decks. A comprehensive analysis of multiple features (intensity-based, gradient-based, and scale-space) and multiple classifiers (random forests, support vector machines, and AdaBoost) shows a peak classifier performance of 95%. Reference [24] proposed a simple, noise-robust feature descriptor, the adjacent evaluation completed local binary patterns, for recognizing surface defects on hot-rolled steel strips. Filtering-based methods commonly apply a filter bank to an image and calculate the energy of the filter responses. To provide an efficient multi-scale directional representation of different defects, the shearlet transform is introduced in [14]. With the popularity of artificial intelligence in recent years, machine learning has been applied extensively in surface defect detection. Reference [9] used a supervised machine learning method, the light gradient boosting machine (LightGBM), to detect cracks in concrete surface imagery. The features are derived from pixel values and the geometric shapes of cracks. In addition, spectral filtering approaches are suitable for defect detection in uniformly textured images composed of basic texture primitives with a high degree of periodicity [32]. The Fourier transform (FT) was used in [33] to detect defects in directionally textured surfaces. Nevertheless, FT-based approaches are inadequate when the Fourier frequency components of the background and the defect areas are highly mixed [34]. The Gabor wavelet was used in [30] to extract features from images with periodic texture. The wavelet transform has been successfully applied to defect detection on statistical surfaces such as cold-rolled steel strips [27] and hot-rolled steel strips [35], and it has also worked well for homogeneous patterned surfaces [36]. Navarro et al. [6] present a wavelet reconstruction scheme to detect defects in a wide variety of structural and statistical textures.
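To make the filter-bank idea mentioned above concrete, the following is a minimal sketch, not the method of any cited reference. It builds a small bank of real Gabor kernels at four orientations and reports the mean squared response (the "energy") of each filter on a synthetic striped texture. The function names (`gabor_kernel`, `filter_bank_energy`, `demo`), the kernel size, wavelength, and envelope width are all illustrative assumptions, and filtering is done by circular convolution through the FFT for brevity.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, sigma):
    """Real (cosine) Gabor kernel; parameters are illustrative assumptions."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the carrier wave varies along direction theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    kernel = envelope * np.cos(2.0 * np.pi * xr / wavelength)
    # Remove the mean so that constant image regions produce no response.
    return kernel - kernel.mean()


def filter_bank_energy(image, kernels):
    """Mean squared response of each kernel, using circular FFT convolution."""
    image = np.asarray(image, dtype=float)
    spectrum = np.fft.fft2(image)
    energies = []
    for kernel in kernels:
        # Zero-pad the kernel to the image size; the resulting spatial shift
        # of the response does not change its energy.
        kernel_spectrum = np.fft.fft2(kernel, s=image.shape)
        response = np.real(np.fft.ifft2(spectrum * kernel_spectrum))
        energies.append(float(np.mean(response ** 2)))
    return np.array(energies)


def demo():
    """Energy per orientation for a synthetic texture with vertical stripes."""
    thetas = [0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4]
    kernels = [gabor_kernel(15, 8.0, t, 4.0) for t in thetas]
    # Stripes vary along x with a period of 8 pixels (hypothetical test image).
    _, x = np.mgrid[0:64, 0:64]
    stripes = np.cos(2.0 * np.pi * x / 8.0)
    return thetas, filter_bank_energy(stripes, kernels)


if __name__ == "__main__":
    for theta, energy in zip(*demo()):
        print(f"theta = {theta:.2f} rad, energy = {energy:.4f}")
```

In this sketch the filter whose carrier direction matches the stripes dominates the energy vector. A defect that disturbs the regular texture would change the local energy distribution, which is the cue that filtering-based detectors typically exploit.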

Recently, carefully designed deep convolutional neural networks have emerged as powerful tools in a variety of computer vision tasks. Reference [10] proposed an improved You Only Look Once (YOLOv3) detector with transfer learning, batch renormalization, and focal loss for concrete bridge surface damage detection. The improved single-stage detector achieved a detection accuracy of 80% on a dataset of 2206 inspection images labeled with four types of concrete damage. Reference [11] proposed a crack detection method for concrete crack images based on semantic segmentation with a deep fully convolutional network (FCN) and a VGG16 backbone. The FCN is trained end-to-end on a subset of 500 annotated 227 × 227-pixel crack-labeled images and achieves an average precision of about 90%. An end-to-end steel strip defect detection network was outlined in [28]; this system uses a symmetric surround saliency map for surface defect detection and deep convolutional neural networks (CNNs) to classify seven classes of steel strip defects. To inspect defects on a steel surface, Reference [23] presents a new classification priority network (CPN) and a new classification network, the multi-group convolutional neural network (MG-CNN).

However, these defect detection methods are primarily designed for crack defects on concrete structures or for metal surfaces with untextured backgrounds. As far as we know, there is no literature on surface defect detection for aluminum ingots with a milling grain background. The surface of an aluminum ingot after milling always has multi-directional and multi-scale grinding texture patterns; sometimes, the distribution of the grinding ridges is uneven. After milling, various surface defects (Figure 3) appear on the surface of the aluminum ingot, such as small local defects (Figure 3a), distributed defects with complex texture and fuzzy boundaries (Figure 3b–d), longitudinal linear defects running through the whole image (Figure 3e), and large-scale distributed defects with irregular shapes (Figure 3f). In addition, there are many pseudo defects with various patterns on the surface of the aluminum ingot, such as aluminum chips (AC), mosquitoes (Mo) (Figure 3g), and the milling grain (Figure 3h). These factors greatly increase the difficulty of defect detection and recognition. To handle these problems, we propose a detection algorithm for aluminum ingot surface defects that combines traditional detection with deep learning classification and that has been applied to the production line of an aluminum ingot milling surface.

The main contributions of the paper are summarized below:


4. At the beginning of the project, even without a large number of labeled samples, the algorithm can still be deployed and can detect suspicious regions quickly, owing to the improved ROI detection algorithm.

**Figure 3.** Samples of different defects: (**a**) Slag inclusion (SI); (**b**) Pitted slag inclusion (PSI); (**c**) Adhesion aluminum (AA); (**d**) Scratches (Sc); (**e**) Crack (Cr); (**f**) Oxide film (OF); (**g**) Mosquito (Mo) and aluminum chips (AC); (**h**) Texture background (Tb).
