#### **1. Introduction**

With the rapid development of the railway industry, operating mileage, speed and traffic density continue to increase, and the requirements for railway inspection are rising accordingly [1]. When a train runs at high speed, friction, rolling contact and elastic deformation occur between the train and the track surface. As running time accumulates, these effects produce rail surface defects such as wear, breakage, peeling and cracks, which seriously threaten the safety of the rail transit system [2]. Therefore, it is particularly important to study detection methods for rail surface defects.

As the traditional method for rail surface detection, manual inspection [3] is time-consuming, labor-intensive [4] and inefficient [5]. With the development of defect detection technology, many rail defect detection methods have emerged, such as ultrasonic flaw detection [6], eddy current flaw detection [7], three-dimensional detection [8] and radar detection [9]. These methods are very effective in detecting internal defects. However, the signals generated by defects on the rail surface are very weak and are difficult to detect with these methods. In addition, the defect signals are easily disturbed by the surrounding environment, which makes it difficult to achieve satisfactory results. There is still considerable room for improvement in rail surface defect detection technology.

**Citation:** Bai, T.; Gao, J.; Yang, J.; Yao, D. A Study on Railway Surface Defects Detection Based on Machine Vision. *Entropy* **2021**, *23*, 1437. https://doi.org/10.3390/e23111437

Academic Editors: Yongbo Li, Fengshou Gu and Xihui (Larry) Liang

Received: 1 September 2021 Accepted: 26 October 2021 Published: 30 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

With the development of computer technology, machine vision [10] has been applied to rail surface defect detection. Rail surface images are acquired by linear array cameras and automatically synthesized according to the required length. Defect data are obtained by manual screening of actual detection images and used for model training and testing. This approach requires an analysis of rail surface defect information, gray information [11] and background information [12], and it detects rail surface defects with either a feature extraction algorithm [13] or an operator template and model-based threshold segmentation method [14]. However, these methods are sensitive to defect characteristics, which may lead to detection blind spots [15]. This makes it difficult for machine vision methods to achieve good detection performance.
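The step of synthesizing line-scan output into images of a required length can be illustrated with a minimal sketch. It assumes each camera exposure yields one row of gray values and that consecutive rows are simply grouped into fixed-length frames; the name `assemble_frames`, the `frame_length` parameter and the choice to drop an incomplete trailing frame are hypothetical and not taken from the paper.

```python
from typing import Iterable, Iterator, List

Row = List[int]  # one scan line: gray values across the rail width


def assemble_frames(rows: Iterable[Row], frame_length: int) -> Iterator[List[Row]]:
    """Group consecutive scan lines into images of `frame_length` rows.

    Assumption: a trailing group shorter than `frame_length` is dropped.
    """
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")
    frame: List[Row] = []
    for row in rows:
        frame.append(row)
        if len(frame) == frame_length:
            yield frame
            frame = []


# Illustrative input: 10 scan lines, each 4 pixels wide.
scan_lines = [[i] * 4 for i in range(10)]
frames = list(assemble_frames(scan_lines, frame_length=3))
print(len(frames), len(frames[0]), len(frames[0][0]))
```

In a real acquisition system the frame length would be tied to the train speed and the line rate of the camera, which this sketch does not model.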

In recent years, with the development of target detection technology and neural networks [16], deep learning frameworks have been proposed for the detection of various railway components. Liu et al. [17] proposed a classification and recognition method based on image fusion features and Bayesian compression, which detected the status of fasteners by extracting improved edge orientation histogram (IEOH) and macroscopic local binary pattern (MSLBP) features. Cui et al. [18] segmented the fastener image into different parts to avoid interference from fastener fragments and tested the segmentation model in a real-time deep learning module.

Regarding deep learning frameworks for rail surface defect detection, Xu et al. [19] proposed an improved Faster R-CNN (region-based convolutional neural network) for railway subgrade defect recognition. The improved method performs well, but it has a slow detection speed and a large model. Lu et al. [20] combined a U-Net image segmentation network with a damage location method for damage detection on high-speed railways. This method achieves a high detection accuracy but is limited by a slow detection speed and a large model size. Yuan et al. [21] applied MobileNetV2 to detect rail surface defects, which achieved high-speed real-time detection, but the detection accuracy was low. Faghih-Roohi et al. [22] proposed improved deep convolutional neural networks (DCNN) to efficiently extract and recognize image features, with mini-batch gradient descent used to optimize the network for the automatic detection of track surface defects. This method requires a long network training time. Song et al. [23] proposed a deep learning method in which the YOLOv3 (You Only Look Once, YOLO) algorithm was used to detect rail surface defects. This method has a fast detection speed but low detection accuracy.

To solve the above problems, this paper proposes an improved YOLOv4 [24] method for rail surface defect detection. The MobileNetV3 lightweight network is used as the backbone of YOLOv4, and depthwise separable convolution is applied in the PANet layer of YOLOv4 to further reduce the number of parameters. The method treats rail surface defect detection as an end-to-end regression problem and maintains detection effectiveness with a simplified network while improving detection speed and accuracy. It provides a new approach to rail surface defect detection.
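The parameter saving from depthwise separable convolution can be made concrete with a short calculation. The sketch below uses the standard weight counts (bias terms ignored): a k x k convolution has k·k·C_in·C_out weights, while its depthwise separable counterpart has k·k·C_in weights for the depthwise step plus C_in·C_out for the 1 x 1 pointwise step. The channel sizes are illustrative assumptions and are not taken from the paper's network.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer size (assumption, not from the paper).
c_in, c_out, k = 256, 512, 3
std = standard_conv_params(c_in, c_out, k)
dws = depthwise_separable_params(c_in, c_out, k)
ratio = dws / std  # equals 1 / c_out + 1 / k**2

print(std, dws, round(ratio, 4))
```

For 3 x 3 kernels the ratio approaches 1/9 as the number of output channels grows, which is why replacing standard convolutions in a wide layer such as the neck shrinks the model substantially.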

The main contributions of this paper are as follows: (1) The MobileNetV3 network is used to optimize the YOLOv4 model for rail surface defect detection, and depthwise separable convolution is applied to the PANet layer in YOLOv4. This reduces the parameter count and model size and improves the detection speed. (2) Field tests are conducted on the track to collect data, a dataset is created with Gaussian noise added, and a rail surface defect detection model is established. The test results show that the proposed method can effectively detect rail surface defects.
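Adding Gaussian noise to a dataset can be sketched as follows. This is a minimal illustration that assumes 8-bit grayscale images stored as nested lists and zero-mean additive noise; the function name `add_gaussian_noise` and the default `sigma` are hypothetical, and the paper's actual noise parameters are not given in this section.

```python
import random
from typing import List, Optional

Image = List[List[int]]  # grayscale image, pixel values in 0..255


def add_gaussian_noise(image: Image, sigma: float = 10.0, mean: float = 0.0,
                       seed: Optional[int] = None) -> Image:
    """Return a copy of `image` with additive Gaussian noise, clipped to 0..255."""
    rng = random.Random(seed)  # local generator so results are reproducible
    noisy: Image = []
    for row in image:
        noisy.append([
            min(255, max(0, round(pixel + rng.gauss(mean, sigma))))
            for pixel in row
        ])
    return noisy


# Illustrative 3 x 4 image with a uniform mid-gray level.
clean = [[128] * 4 for _ in range(3)]
augmented = add_gaussian_noise(clean, sigma=10.0, seed=0)
print(len(augmented), len(augmented[0]))
```

Clipping keeps the result a valid 8-bit image; the original image is left unmodified so both versions can be kept in the dataset.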

The rest of this article is organized as follows. Section 2 discusses the theoretical background of YOLOv4 and depthwise separable convolution. Section 3 presents the technical route of the proposed method. Section 4 verifies the effectiveness of the method through practical application. Finally, Section 5 draws the conclusions.

#### **2. Theoretical Background**

Object detection methods based on deep learning and machine vision are widely used in current research. To apply them, a large number of images is first collected to build an image dataset; the objects to be detected are then annotated to obtain the object information; next, a deep network is trained on the training set and its annotations to obtain a model; finally, the trained model is used for the detection test. The most important step is the training of the deep network model. At present, a target detector is mainly composed of four parts: input, backbone, neck and head. As shown in Figure 1, the one-stage network is structurally simpler than the two-stage network, which adds a sparse prediction stage.

**Figure 1.** Object detector framework.

Before the YOLO [25] algorithm was proposed, R-CNN [26] was one of the most popular two-stage algorithms. It applies a CNN to target detection by combining the CNN with region proposals [27]. First, selective search [28] or edge boxes are used to generate candidate regions [29], and then each region is classified by the CNN. Compared with one-stage algorithms, two-stage algorithms have a slower detection speed. The YOLO algorithm, which has a one-stage network structure, was therefore proposed. Its core idea is to convert target detection into a regression problem with the target image as the network input: a single neural network directly yields the bounding box positions and the target categories. In this way, a fast detection speed and high precision can be achieved from the feature information.
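To illustrate what treating detection as regression means in practice, the sketch below decodes one raw network output into a bounding box. It assumes the anchor-based parameterization commonly used in YOLOv2 and YOLOv3 (sigmoid offsets inside a grid cell, exponential scaling of an anchor); this section of the paper does not give the formulas, so the function `decode_box`, the anchor size and the stride are illustrative assumptions.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t_x: float, t_y: float, t_w: float, t_h: float,
               cell_x: int, cell_y: int,
               anchor_w: float, anchor_h: float,
               stride: int) -> Tuple[float, float, float, float]:
    """Map regressed offsets to (center_x, center_y, width, height) in pixels.

    Assumed YOLOv3-style decoding, not taken from the paper.
    """
    center_x = (sigmoid(t_x) + cell_x) * stride  # offset stays inside the cell
    center_y = (sigmoid(t_y) + cell_y) * stride
    width = anchor_w * math.exp(t_w)             # anchor scaled by the prediction
    height = anchor_h * math.exp(t_h)
    return center_x, center_y, width, height


# Zero offsets place the box at the center of cell (6, 6) with the anchor's size.
box = decode_box(0.0, 0.0, 0.0, 0.0, cell_x=6, cell_y=6,
                 anchor_w=116.0, anchor_h=90.0, stride=32)
print(box)
```

Because every grid cell regresses its boxes and class scores in the same forward pass, no separate region proposal stage is needed, which is the source of the speed advantage over two-stage detectors.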

The YOLOv4 algorithm builds on YOLOv3. It is a powerful target detection algorithm with which a fast and accurate target detector can be trained. As shown in Figure 1, the network structure is mainly composed of a backbone network, a neck network and a head network. CSPDarknet53 serves as the backbone network, the neck network combines an SPP add-on module with PANet path aggregation, and the YOLOv3 head is used as the head network.

The PANet layer originates from an instance segmentation algorithm. Its network structure is shown in the neck part of Figure 2. Compared with the feature pyramid network (FPN), PANet adds a DownSample operation after UpSample to refine the features repeatedly. Parameter aggregation is carried out across the different backbone layers, which further improves the feature extraction ability. In YOLOv4, the PANet structure is mainly applied to the three effective feature layers.

**Figure 2.** YOLOv4 structure diagram.
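The order of operations in the PANet neck can be traced at the level of feature-map sizes. The sketch below assumes a 416 x 416 input and three feature layers at strides 8, 16 and 32, which are common YOLOv4 settings but are not stated in this section; it only tracks spatial sizes and does not perform any convolution. The function name `panet_trace` is hypothetical.

```python
from typing import List, Tuple


def panet_trace(input_size: int = 416,
                strides: Tuple[int, ...] = (8, 16, 32)) -> Tuple[List[int], List[Tuple[str, int]]]:
    """Return the feature-map sizes and the ordered fusion steps of the neck."""
    sizes = [input_size // s for s in strides]  # ordered fine -> coarse
    steps: List[Tuple[str, int]] = []

    # Top-down path (as in FPN): upsample the coarse map and fuse with finer ones.
    current = sizes[-1]
    for target in reversed(sizes[:-1]):
        current *= 2
        if current != target:
            raise ValueError("strides must differ by a factor of 2")
        steps.append(("upsample+fuse", target))

    # Bottom-up path (added by PANet): downsample again and fuse a second time.
    for target in sizes[1:]:
        current //= 2
        if current != target:
            raise ValueError("strides must differ by a factor of 2")
        steps.append(("downsample+fuse", target))

    return sizes, steps


sizes, steps = panet_trace()
for name, size in steps:
    print(f"{name:>16} -> {size} x {size}")
```

Each of the three scales is therefore fused twice, once on the way up and once on the way down, before being passed to the head.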
