1. Introduction
As high-speed railway construction expands and networks develop rapidly, ensuring safe service is becoming ever more important. Practice shows that long-term repeated loading inevitably causes microscopic damage to railway facilities, deterioration of the main structural components, and reduced infrastructure performance. Extended operating hours lead to frequent, large-scale deterioration across many lines. For example, some lines exhibit serious structural defects, such as successive fastener failures and broken rails, because of their extreme operating environments. Railway defects generally signal potentially serious safety hazards. Failure to deal with these defects promptly shortens the service life of railway facilities and directly affects the safety of line operations. To overcome the shortcomings of existing detection methods, the development of intelligent detection technologies has become a requirement for maintaining the safe and stable operation of high-speed railway lines [1]. Traditional manual inspection no longer meets practical needs, and intelligent detection technology based on computer vision has attracted significant attention in recent years [2,3,4]. With the continuous development of related theories and technologies such as image processing, machine learning, and artificial intelligence, research in computer vision is gradually shifting from theory to practical applications, to newly proposed vision detection models for specific applications, and to the development of corresponding algorithms.
Visual inspection based on image processing and pattern recognition is a recently developed technology and is considered one of the most promising approaches to rail surface scratch detection [5,6]. In a visual inspection system, a high-speed (usually line-scan) camera is installed on an inspection train. The camera collects images of the track, and computer vision and pattern recognition methods are then used to detect defects in the track components. Surface defect detection is a relatively active application area in current computer vision research. It has been widely used in industrial inspection, for example for textile contamination detection [7,8] and metal welding surface detection [9].
Defect detection is generally divided into two stages: image feature extraction and defect recognition. Mandriota et al. [10] first applied a Gabor transform to the input image to form a feature vector and then used a support vector machine (SVM) to recognize scratched rails. Marino et al. [11] proposed a real-time track detection system based on the wavelet transform and a multi-layer perceptron. Li et al. proposed a real-time detection method for discrete rail surface scratches [12]. Their system first enhances the contrast between defects and background using the local normalization (LN) method, which is nonlinear and illumination independent. The defect localization based on projection profile (DLBP) algorithm then identifies possible defects. As an alternative to LN+DLBP, MLC+PEME was proposed to detect possible discrete defects [13]. MLC (local Michelson-like contrast) first enhances image contrast, and PEME (proportion emphasized maximum entropy) then automatically locates possible defects by maximizing the object entropy while keeping the defect proportion low. He et al. [14] used a defect enhancement algorithm based on reverse Perona–Malik (P–M) diffusion. In addition, some improvements have been proposed, such as cascaded learning of weak classifiers [15] or the Latent Dirichlet Allocation (LDA) model [16] for fastener deterioration. Methods of this type generally achieve better classification accuracy, but they cannot always classify defects characterized by small or large intraclass differences. Meanwhile, deep learning applications have developed rapidly in terms of both detection accuracy and execution efficiency, and a number of methods have been built on these widely used technologies. For example, Gibert et al. [17] used deep convolutional neural networks (DCNNs) for automatic detection of fastener state and demonstrated the method’s effectiveness. Methods of this type generally achieve better recognition accuracy, but they are often associated with high training complexity and require a large number of training samples to prevent overfitting.
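The local normalization step mentioned above can be illustrated with a minimal sketch. It assumes the common form of LN, in which each pixel is centered by the mean and scaled by the standard deviation of its neighborhood; the function names, the window radius, and `eps` are illustrative choices and are not taken from [12].

```python
import numpy as np

def box_mean(img, radius):
    """Mean over a (2*radius+1)^2 window, using reflection padding at the edges."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="reflect")
    # Integral image with a leading row and column of zeros.
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    window_sum = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
                  - integral[k:k + h, :w] + integral[:h, :w])
    return window_sum / (k * k)

def local_normalize(img, radius=8, eps=1e-6):
    """Subtract the local mean and divide by the local standard deviation."""
    img = np.asarray(img, dtype=float)
    mean = box_mean(img, radius)
    var = box_mean(img * img, radius) - mean ** 2
    std = np.sqrt(np.maximum(var, 0.0))  # clip tiny negative values from rounding
    return (img - mean) / (std + eps)
```

Because the local mean and standard deviation absorb any positive gain and constant offset, a uniformly brightened image gives nearly the same output, while a small dark spot on an otherwise uniform surface shows up as a strongly negative value.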
In track inspection, the intrinsic variability of the imaging environment, the complex evolution of the line state, and the diversity, scale differences, and sparseness of weak defects in a railway system (such as track surface defects) pose severe challenges. Although track defect detection research has made considerable progress, it still falls short in three respects: robustness, generalization, and accuracy. Specifically, factors such as external lighting, camera shake, and noise in the operating environment can significantly affect system performance [5,7]. Many methods proposed in the literature are effective only for specific types of defect, and a general defect detection model and theoretical framework are lacking [12,16]. In actual operations, the false alarm rate remains high even when the detection system meets the missed detection rate target. The low-level visual features commonly used in current systems are the main reason for the high false alarm rate. One complication is that defect targets take various shapes, and their low-level visual characteristics, such as gray level, texture, and shape, differ significantly, which makes them difficult to represent uniformly. In addition, factors such as lighting and noise increase the complexity of a defect’s visual feature representation.
We believe that it is necessary to move away from current defect image representation methods based on visual features, to study the statistical and mathematical characteristics of defect images in depth, and to establish new defect image representation models and detection algorithms. In view of the limitations of traditional methods based on limited visual feature extraction, we explore a ‘simple’ description of defect images in accordance with the principle of ‘Occam’s razor’ as an alternative to current representations based on visual features. Because of the way the track surface operates, it has a relatively consistent background, whereas defect targets are sparse. This sparseness is reflected in two facts: the number of defect targets is small relative to the entire track line, and the area covered by defect targets is a small fraction of each collected track image. We therefore regard sparseness and low rank as essential attributes of defect images. Building new defect image representation models and detection algorithms on these attributes introduces a new general technical solution into this field.
Generally, a track image under inspection is relatively consistent and normal, and any defect corresponds to an abnormal area. From the perspective of matrix analysis, the gray-scale matrices of track surface images are correlated, and a potential correlation exists between the defect area on the track surface and the background. Recently, low-rank and sparse decomposition methods have shown promising results in foreground detection. The only assumption made about the background is that any variation in its appearance can be captured by a low-rank matrix [18]. In this simple form, a matrix composed of the observed data is decomposed into a low-rank matrix representing the background and a sparse matrix representing the foreground objects.
In recent years, sparse coding and low-rank recovery have received considerable attention, and researchers have studied their basic theory, algorithms, and applications extensively [19,20,21]. Sparse representation theory in particular has been widely used in computer vision [22,23,24,25] and in many hyperspectral image (HSI) processing tasks [26,27]. The theory [28,29,30,31] assumes that an unknown signal can be sparsely represented by a linear combination of a few signals from an overcomplete dictionary. The corresponding sparse code has good feature representation ability and can implicitly encode interclass information. Different constraints imposed on the coding, $\boldsymbol{\alpha}$, address difficult problems associated with different visual tasks. In addition, a series of studies considered the sparse structure of the sparse codes or prior knowledge of the spatial distribution of the residual term $\mathbf{e}$ [32,33,34,35]. A typical example is the ProxFlow method. Its objective function is defined as follows:
$$\min_{\boldsymbol{\alpha},\,\mathbf{e}} \; \frac{1}{2}\left\| \mathbf{x} - \mathbf{D}\boldsymbol{\alpha} - \mathbf{e} \right\|_2^2 + \lambda_1 \left\| \boldsymbol{\alpha} \right\|_1 + \lambda_2 \left\| \mathbf{e} \right\|_{\ell_1/\ell_\infty} \qquad (1)$$
where the input signal, $\mathbf{x}$, is described by a sparse linear representation, $\mathbf{D}\boldsymbol{\alpha}$, over the overcomplete dictionary, $\mathbf{D}$, plus a residual term, $\mathbf{e}$, and $\lambda_1$ and $\lambda_2$ are regularization weights. Here, the $\ell_1/\ell_\infty$ norm of $\mathbf{e}$ is a structured sparsity norm. This structured sparsity norm was proposed there for the first time and can yield compact foreground detection. However, the ProxFlow method requires a large amount of pure background data to train a clean overcomplete dictionary; otherwise, a clean background cannot be restored and complete foreground detection cannot be achieved. In addition, decomposition methods that combine low rank and sparseness (low-rank matrix recovery theory) have made great progress in signal processing, pattern recognition, and other fields. The only assumption made is that any change in the background appearance can be captured by a low-rank matrix [18]. In this simple form, the observed data matrix is decomposed into a sparse matrix [36] and a low-rank matrix, representing the sparse foreground and the background, respectively. This is the well-known robust principal component analysis (RPCA), which has been studied extensively [21,37,38,39,40,41].
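To make the roles of the sparse code and the residual concrete, the following sketch solves a simplified variant of this kind of objective with proximal gradient descent (ISTA). It is an illustration under stated assumptions, not the ProxFlow algorithm: the structured norm on the residual is replaced by a plain $\ell_1$ norm so that both proximal steps reduce to soft-thresholding, and the function names, the weights `lam_alpha` and `lam_e`, and the iteration count are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm (t may be a scalar or a vector)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code_with_residual(x, D, lam_alpha=0.05, lam_e=0.1, n_iter=500):
    """Minimize 0.5*||x - D a - e||^2 + lam_alpha*||a||_1 + lam_e*||e||_1 by ISTA.

    Simplification (assumption): the residual e gets a plain l1 penalty
    instead of a structured sparsity norm.
    """
    n, k = D.shape
    A = np.hstack([D, np.eye(n)])            # stacked operator acting on z = [a; e]
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant of the gradient
    thresh = step * np.concatenate([np.full(k, lam_alpha), np.full(n, lam_e)])
    z = np.zeros(k + n)
    for _ in range(n_iter):
        grad = A.T @ (A @ z - x)             # gradient of the quadratic data term
        z = soft_threshold(z - step * grad, thresh)
    return z[:k], z[k:]
```

With a dictionary that spans the background (for example, a single flat atom) and a signal that contains one isolated spike, the code absorbs the background and the residual keeps only the spike. Grouping residual entries, as a structured norm does, would additionally favor spatially compact residuals.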
The observation matrix $\mathbf{M}$ can be formed by linear superposition of the matrices $\mathbf{L}$ and $\mathbf{S}$, that is, $\mathbf{M} = \mathbf{L} + \mathbf{S}$, assuming that the singular value vector of $\mathbf{L}$ is sparse. In other words, $\mathbf{L}$ is a low-rank matrix, and the elements of $\mathbf{S}$ are sparse. Low-rank matrix recovery can then be expressed as:
$$\min_{\mathbf{L},\,\mathbf{S}} \; \left\| \mathbf{L} \right\|_* + \lambda \left\| \mathbf{S} \right\|_1 \quad \text{s.t.} \quad \mathbf{M} = \mathbf{L} + \mathbf{S} \qquad (2)$$
where $\lambda > 0$ balances the two terms and $\left\| \mathbf{S} \right\|_1$ is the sum of the absolute values of the entries of $\mathbf{S}$. If all elements of the matrix can be observed, this problem is also referred to as robust principal component analysis (RPCA). The nuclear norm of a matrix is the $\ell_1$ norm of its singular value vector; i.e., $\left\| \mathbf{L} \right\|_* = \sum_i \sigma_i(\mathbf{L})$. The nuclear norm is easily verified to be a convex function. In practical applications, a noise component can also be included in the superposition equation. The low-rank matrix recovery problem (2) can be cast as a semi-definite programming problem, so it is computationally tractable. Candes et al. [18] proved that, when the number of samples meets certain conditions, nuclear norm minimization recovers the matrix exactly with high probability. The lower the rank of the matrix, the more missing elements can be tolerated [18,42]. Further theoretical results show that this recovery is also stable in the presence of noise [43]. The matrix representation of input data is more intuitive and convenient, and the low-rank representation may be considered a generalization of the sparse vector representation. Low-rank matrix recovery theory has attracted wide attention and has had an important impact on signal processing, pattern recognition, and computer vision. For instance, it has been explored extensively in texture extraction and texture modeling [44,45], face recognition [35], video denoising [46], segmentation [47,48], and moving object detection in videos [21], with good results.
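The decomposition of an observation matrix into low-rank and sparse parts can be sketched with an inexact augmented Lagrangian scheme, one common solver for RPCA. The sketch below is a generic illustration rather than the method proposed in this paper. The function names, the initial penalty, and the growth factor `rho` are assumptions made here; the default weight $1/\sqrt{\max(m, n)}$ is the conventional choice for an $m \times n$ matrix.

```python
import numpy as np

def shrink(X, tau):
    """Entrywise soft-thresholding: proximal operator of tau * l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def rpca(M, lam=None, rho=1.2, tol=1e-7, max_iter=500):
    """Split a nonzero matrix M into low-rank L and sparse S with M ~= L + S."""
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    norm_M = np.linalg.norm(M, "fro")
    mu = 1.25 / np.linalg.norm(M, 2)       # initial penalty (assumed heuristic)
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                   # Lagrange multiplier for M = L + S
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)   # low-rank update
        S = shrink(M - L + Y / mu, lam / mu)          # sparse update
        R = M - L - S                                 # constraint residual
        Y = Y + mu * R
        mu *= rho                                     # gradually tighten the constraint
        if np.linalg.norm(R, "fro") <= tol * norm_M:
            break
    return L, S
```

In a track inspection setting, one would expect `L` to hold the consistent rail background and `S` the candidate defect pixels, provided the two components are not strongly coherent.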
However, according to low-rank matrix recovery theory, the decomposition performance degrades when the low-rank and sparse components are highly coherent. Therefore, when the background is cluttered or resembles the salient objects, the two are difficult to separate using the aforementioned methods based on low-rank matrix recovery theory. In addition, noise, lighting, vibration, and other factors make the background appear dynamic. The low-rank constraint cannot adequately describe this dynamic background change, so some background elements are decomposed into the sparse component.
In this paper, we propose a new defect detection model. Specifically, to better describe the dynamic background, we use a differential regularization term to model the background’s dynamic behavior. Background modeling based on this differential regularization term can adequately capture background noise and other factors. To encode prior information about the spatial structure of the defect components, we introduce structured sparsity norm constraints and use the hierarchical segmentation tree of superpixel blocks so that as much of the defect area as possible is absorbed into compact, sparse components. The experimental results show that the proposed model handles the dynamics of the track surface well and that it can be used for structural inspection of the defect area.
In summary, the main contributions of this paper are threefold.
We propose a general defect detection model and theoretical framework based on low-rank matrix recovery theory. Although the framework itself is general, this is the first study to introduce a low-rank framework to rail surface inspection and to demonstrate its superiority over baseline methods.
We propose a differential regularization term constraint to replace the general low-rank constraint. It can reconstruct the dynamic background and counters the degradation in decomposition performance that occurs when the low-rank and sparse components are highly coherent.
For the structured sparsity norm constraint, we use the hierarchical tree structure of a superpixel segmentation to regularize the sparse components and obtain a more compact and complete defect area. Because track images offer limited features, our study applies the structured sparsity constraints to the gray levels of superpixel blocks (that is, the gray values of the pixels in each block, as well as the block as a whole, are arranged as a vector on which the constraints are imposed). This differs from the use of structured sparsity in the feature space of superpixel blocks in several previous studies (that is, a constraint on the matrix formed by the feature vectors of the blocks).
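The idea of a structured sparsity norm defined on a hierarchical superpixel tree can be illustrated with a generic tree-structured $\ell_1/\ell_\infty$ norm, which sums, over every node of the tree, the largest absolute gray-level residual inside that node. This is only a sketch under assumptions: the six-pixel tree, the unit weights, and the function name are hypothetical, and the formulation detailed in Section 2 may differ.

```python
import numpy as np

def tree_structured_norm(s, tree, weights=None):
    """Sum over all tree nodes of weight * max |s| within the node's pixel group."""
    s = np.asarray(s, dtype=float).ravel()
    total = 0.0
    for level, groups in enumerate(tree):
        for j, idx in enumerate(groups):
            w = 1.0 if weights is None else weights[level][j]
            total += w * np.max(np.abs(s[idx]))
    return total

# Hypothetical 3-level tree over six pixels, from fine superpixels up to the root.
tree = [
    [[0, 1], [2, 3], [4, 5]],
    [[0, 1, 2, 3], [4, 5]],
    [[0, 1, 2, 3, 4, 5]],
]
compact = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])    # two active pixels in one superpixel
scattered = np.array([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])  # same l1 norm, spread over two branches

print(tree_structured_norm(compact, tree))    # 3.0
print(tree_structured_norm(scattered, tree))  # 5.0
```

Both vectors have the same $\ell_1$ norm, but the scattered one is penalized more heavily, which is why a norm of this kind favors sparse components that fill whole superpixel blocks.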
The remainder of this paper is organized as follows. Section 2 offers an overview and details of the proposed method. The experimental setup and results are presented in Section 3, followed by our conclusion in Section 4.