An Improved Target Network Model for Rail Surface Defect Detection

Zhang, Ye; Feng, Tianshi; Song, Yating; Shi, Yuhang; Cai, Guoqiang

doi:10.3390/app14156467


by Ye Zhang ^1,2,*, Tianshi Feng ^1,2, Yating Song ^1,2, Yuhang Shi ^1,2 and Guoqiang Cai ^3,4,*

¹ Beijing Engineering Research Center of Urban Transport Operation Guarantee, Beijing University of Technology, Beijing 100124, China

² Beijing Key Laboratory of Traffic Engineering, Beijing University of Technology, Beijing 100124, China

³ School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China

⁴ State Key Laboratory of Advanced Rail Autonomous Operation, Beijing Jiaotong University, Beijing 100044, China

^* Authors to whom correspondence should be addressed.

Appl. Sci. 2024, 14(15), 6467; https://doi.org/10.3390/app14156467

Submission received: 1 July 2024 / Revised: 19 July 2024 / Accepted: 23 July 2024 / Published: 24 July 2024

(This article belongs to the Special Issue Intelligent Management and Application of Sustainable Transportation Systems)


Abstract

Rail surface defects typically serve as early indicators of railway malfunctions, which may compromise the quality and corrosion resistance of rails, thereby endangering the safe operation of trains. The timely detection of defects is essential to ensure the safe operation of railways. To improve the classification accuracy of rail surface defect detection, this paper proposes a rail surface defect detection algorithm based on MobileNet-YOLOv7. To bring lightweight deep learning algorithms into the engineering application of rail surface defect detection, a MobileNetV3 lightweight network is used as the backbone network for YOLOv7 to enhance both speed and accuracy in complex defect extraction. Subsequently, the efficient intersection over union (EIOU) loss function is utilized as the positional loss function to bolster system resilience. Finally, the k-means++ clustering algorithm is applied to obtain new anchor boxes. The experimental results demonstrate the effectiveness of the proposed method, which achieves superior detection accuracy compared with traditional algorithms.

Keywords: rail surface; MobileNet; feature extraction; convolutional neural networks; machine vision

1. Introduction

Railway infrastructure serves as the backbone of modern transportation systems, facilitating the long-distance movement of goods and passengers. The quality of railway tracks is crucial for the safe operation of trains, with rails forming the center of railway network efficiency and safety, constituting a pivotal foundation for rail transportation. Traditional manual inspections and contact measurement technologies persist in fault detection within rails [1]. However, these methods not only require a significant number of engineers but are also time-consuming and labor-intensive. Moreover, they suffer from slow detection speeds, rendering them incapable of promptly identifying faults [2]. Surface defects on the rails often result from the repetitive motion of trains on the tracks and the friction induced by defective train wheels [3]. Failing to promptly identify and manage these surface defects can lead to an increase in maintenance costs and grave consequences [4]. Therefore, technology capable of conducting automatic rail inspection, reducing operational time, and curbing maintenance expenses becomes imperative for enhancing the safety of railway transportation [5,6].

The heavy loads borne by rails and their prolonged exposure to the environment can result in numerous imperfections on the rail surface, such as flaking, short-wave irregularities, squats, fissures in the rails, and deficiencies in fastening components [7,8,9,10]. For example, one common problem affecting the steel rail surface is deep squats, which are rail defects along the running band of the rails. Flaking, on the other hand, refers to the progressive horizontal separation of the rail’s running surface near the gauge corner, often causing the scaling or chipping of small slivers. These imperfections can have a substantial impact on the operational safety of railway transportation. Failure to detect them swiftly and diligently could jeopardize the safety of train operations and the life and property of passengers. Consequently, to maintain the optimal operational condition of the rails, it is imperative to conduct periodic examinations of the rail system. The rapid and intelligent detection of surface imperfections on rails has become a vital aspect of railway transportation.

There is a growing realization of the necessity for advanced inspection and monitoring technologies to ensure the integrity and reliability of railway tracks. As a result, non-destructive testing methods such as ultrasonic detection, eddy current detection, laser detection, and acoustic emission have been proposed. These methods belong to the domain of physical detection techniques, rooted in the principles of physics with hardware as their core. They exploit physical phenomena induced by acoustic, optical, electromagnetic, and similar fields to evaluate the quality of rails through the detection of physical manifestations on the rails [11]. Among the various options available, ultrasonic and eddy current detection methods are commonly favored for their noncontact characteristics. The ultrasonic detection method harnesses the properties of sound waves to identify internal defects and rail fractures. While excelling at pinpointing tiny cracks with high precision, it may encounter limitations in detecting medium to large cracks [12]. Electromagnetic acoustic emission (EMAE) technology can effectively avoid the interference of wheel-rail rolling noise (WRRN) and performs well in rail crack detection, but the detection of small cracks is still limited at high train speeds [13]. Eddy current testing, utilizing eddy current technique (ECT) probes and post-processing systems, can detect railhead cracks [9,14]. However, this method faces challenges in identifying rail faults at different levels of severity. In railway systems, laser detection is employed for two-dimensional contour measurements, thereby facilitating a detailed assessment of the condition of the railway track surface [15]. Some scholars have collected track inspection images and ultrasonic B-scan images using the camera and ultrasonic equipment of a rail inspection vehicle, applied algorithms to segment the track surface images captured by the camera and to filter the ultrasonic data, and finally tested the images using the trained model [16].

In recent years, artificial intelligence (AI) has led to the widespread application of detection technologies in various fields, such as 3D modeling from unmanned aerial vehicle (UAV) point clouds [17]. This is particularly evident in the automatic identification of defects in civil infrastructure, such as automatic pavement texture recognition, tunnel crack detection, rail track identification, and unwanted object detection [18,19,20]. In comparison with traditional manual inspection methods, AI models exhibit superior computational performance, faster detection speeds, and a more cost-effective and secure approach. Therefore, the utilization of AI deep learning models for the automated identification and assessment of defects and service conditions in railway tracks is feasible. The effective combination of digital twins (DTs) and sixth-generation (6G) vehicle-to-everything (V2X) communications can enhance the analysis of driver behavior and enable fast and accurate diagnosis of vehicle operating conditions to help in vehicle decision-making tasks [21]. Kim B et al. proposed a transformer-based hybrid model that can simultaneously utilize temporal and spatial features to identify anomalies in railway heating, ventilating, and air conditioning (HVAC) systems, achieving good performance [22,23]. Feng et al. proposed a method employing the probability structure topic model (STM) for the classification and defect detection of automatic fasteners in railway inspection systems [24]. The SeMA-UNet model optimizes deep learning models in data-constrained scenarios, conducts comprehensive feature extraction on railway images, and can achieve better results than traditional models [25]. T. Ye and colleagues devised an automated object detection system capable of discerning obstacles on the railway tracks, including curved sections [26]. Cai et al. proposed the PPNN network by combining a probability neural network and a particle swarm optimization algorithm; they achieved satisfactory results in recognizing subway vehicle noise using this network [27]. Aydin et al. devised a fusion model that harnesses deep convolutional neural network (CNN) features in combination with a support vector machine (SVM) for the classification of rail defects [28]. Aytekin et al. conducted an analysis of a fusion methodology for images acquired by a high-speed 3D laser rangefinder, predicated on pixel and histogram similarities. This approach entails reduced computational complexity, rendering it applicable for real-time surveillance [29].

In order to overcome the problem of requiring a large number of training samples for most CNN models, some researchers have used ensemble models for detecting rail surface defects. The detection results indicate that the ensemble algorithm outperforms single detection architectures [30]. Luo H et al. proposed a method based on an improved YOLOv5s, which utilizes techniques such as data augmentation, introducing a global attention mechanism and optimizing loss functions to enhance the detection accuracy of rail surface defects [31]. Kim I et al. applied the RAG-PaDiM algorithm to railway track defect segmentation. By utilizing a residual attention-guided U-Net algorithm to generate embedding vectors and by performing operations such as residual connections and attention gates to provide regional weights, the algorithm achieves accurate pixel-level area curve segmentation and improves task performance [32]. Zheng et al. employed a depth data-driven model and transfer learning approach. They adapted the foundational YOLOv3 and RetinaNet pre-trained models to discern and appraise cracks on the railway surface. This model exhibited commendable recall and precision metrics when applied to a constrained dataset [3]. The FS-RSDD model uses prototype learning to estimate the defect probability of test samples, overcoming the limitations of supervised learning algorithms to achieve high-precision defect detection and localization with limited training samples [33]. Zhou et al., through the manipulation of acquired images from the railway, extracted precise positional information pertaining to the areas of rails necessitating polishing. They employed a machine vision system to discern regions requiring track grinding [34].

Generally, while machine vision and neural networks have been widely employed in industrial inspection tasks, there remains a relative scarcity of research focused on detecting rail surface defects characterized by intricate noise patterns and diverse sample sets. To further improve the accuracy of surface defect detection on railway tracks, this paper proposes a lightweight improved network MobileNet-YOLOv7. The primary contributions are as follows:

(1): The network structure is simplified by adopting MobileNetV3 as the feature extraction network, which reduces the number of model parameters, enlarges the receptive field, extracts local features of samples more effectively, and improves both computational efficiency and detection accuracy.
(2): To address problems of slow convergence and overfitting in the model, we use the k-means++ clustering algorithm to adjust the anchors for object detection, improving the alignment between anchor boxes and real samples. The results show that this method can effectively accelerate network convergence speed and improve detection accuracy while mitigating sample imbalance concerns.
(3): In order to tackle the challenges of oscillation and slow convergence in the loss function during algorithm training, we opted to substitute the conventional loss function with the EIOU function. By integrating the EIOU function, the algorithm can retain the essential features of the loss while minimizing the difference between the width and height of the target and anchors, consequently enhancing localization performance. The method proposed in this paper is shown in Figure 1.

2. Methodology

YOLOv7 [35] is one of the latest versions in the YOLO series. The YOLOv7 network structure mainly includes a backbone layer, feature pyramid network (FPN) layer, and a head layer [36]. The backbone in YOLOv7 serves as the primary feature extraction network, responsible for processing input images to extract features. These extracted features, referred to as feature layers, constitute a set of features derived from the input images. Furthermore, the FPN is an advanced feature extraction network within YOLOv7 that integrates three key feature layers obtained from the backbone to amalgamate feature information across various scales. The head layer in YOLOv7 is the classifier and regressor, which assesses the feature points to determine whether there is an object corresponding to the prior box on the feature point.

The YOLO family is a widely used class of object detection algorithms, and in recent years, many scholars have used YOLO models to detect surface defects on the track. The YOLO model itself is also being updated at a very fast rate. Although YOLOv7 performs well in terms of speed and accuracy, its use of multiple convolutional operations for feature extraction results in a multilayered network with many parameters, so there is still room for improvement in detection speed and accuracy. Furthermore, more and more engineering departments are using edge devices for onsite detection. The YOLOv7 algorithm employs a large number of stacked structures which, while making the network easy to optimize, also lead to excessive model parameters and high hardware demands. This makes it unsuitable for deployment and real-time detection on mobile devices or hardware with limited computing power.

The enhancements to the algorithm are primarily categorized into two aspects: leveraging the capabilities of MobileNetV3 and substituting the YOLOv7 backbone network with the lightweight MobileNetV3 network to establish a symmetrical network structure. MobileNetV3 runs 3 × 3 depthwise separable convolutions on multi-channel convolutional kernels, followed by 1 × 1 pointwise convolutions to reduce the model size. Proper anchor boxes play a crucial role in obtaining an excellent detection model. This paper clusters the rail surface defect dataset using the k-means++ algorithm, and the obtained anchor boxes based on the new clustering results are incorporated into the improved algorithm model, increasing the robustness of the algorithm during training. Utilizing the fast data processing speed of YOLOv7 can accelerate the model’s training speed to achieve real-time data processing. Due to the issue of imbalanced sample quality in the dataset, training low-quality samples can lead to significant fluctuations in loss values, resulting in oscillations in the convergence curve of the loss function. To solve this problem, this paper proposes using the EIOU to replace the original bounding box loss function CIOU, allowing the bounding box loss function to perform the regression more effectively and stably.

2.1. Improved YOLOv7 Network

Traditional CNNs have large memory requirements and computational costs, making them unsuitable to run on mobile and embedded devices. To address this issue, the MobileNet [37] network was developed. The MobileNet series is widely used in object detection for its fast and accurate detection capabilities and has become a representative family of lightweight networks. The overall architecture of MobileNetV3 largely follows the design of MobileNetV2, incorporating lightweight depthwise convolution (DWC) and residual blocks. As a lightweight network, MobileNetV3 stands out for using neural architecture search (NAS) to build the network and for integrating the depthwise separable convolutions of MobileNetV1 with the inverted residual structure with linear bottlenecks of MobileNetV2. The former reduces the model’s computational load, while the latter increases the model’s representational capacity. Additionally, it introduces the squeeze-and-excitation (SE) lightweight attention mechanism in network construction. MobileNetV3 has shown excellent performance in tasks such as mobile image classification, object detection, and semantic segmentation. Figure 2 displays the layout of the MobileNetV3 block.

DWC is a commonly used convolution operation in lightweight networks. Assuming the input feature map size is H × W × N and the convolution kernels are of size D_{K} × D_{K}, DWC generates M output feature maps.

The computational cost of the standard convolution is:

H \times W \times N \times D_{K} \times D_{K} \times M

(1)

The computational cost of DWC is:

D_{K} \times D_{K} \times N \times H \times W + M \times H \times W \times N

(2)

The comparison of computational cost between DWC and standard convolution is:

\frac{D_{K} \times D_{K} \times N \times H \times W + M \times H \times W \times N}{H \times W \times N \times D_{K} \times D_{K} \times M} = \frac{1}{M} + \frac{1}{D_{K}^{2}}

(3)

For the commonly used 3 × 3 convolution kernel, DWC can reduce computational costs by approximately 90%, significantly lowering the computational burden. Figure 3 illustrates the implementation process.
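As a quick numerical check of Equations (1)–(3), the following sketch evaluates both cost formulas for one layer. The layer dimensions (an 80 × 80 feature map with 128 input and 256 output channels) are hypothetical values chosen for illustration and are not taken from the paper.

```python
def standard_conv_cost(h, w, n, m, dk):
    """Cost of a standard convolution, Equation (1)."""
    return h * w * n * dk * dk * m


def depthwise_separable_cost(h, w, n, m, dk):
    """Depthwise (dk x dk per channel) plus pointwise (1 x 1) cost, Equation (2)."""
    return dk * dk * n * h * w + m * h * w * n


def cost_ratio(h, w, n, m, dk):
    """Ratio of Equation (2) to Equation (1), i.e. the left side of Equation (3)."""
    return depthwise_separable_cost(h, w, n, m, dk) / standard_conv_cost(h, w, n, m, dk)


# Hypothetical layer: 80 x 80 feature map, 128 input channels, 256 output channels.
ratio = cost_ratio(80, 80, 128, 256, 3)
closed_form = 1 / 256 + 1 / 3 ** 2  # right side of Equation (3)
print(f"ratio = {ratio:.4f}, closed form = {closed_form:.4f}")
print(f"cost reduction = {100 * (1 - ratio):.1f}%")
```

For a 3 × 3 kernel the ratio is dominated by the 1/9 term, so the saving approaches roughly 89% as the number of output channels M grows.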

To ensure the real-time performance and improve accuracy, the SE lightweight attention mechanism is introduced into the network. First, global descriptive features are obtained through the squeeze operation; then, the excitation operation generates weights for each channel; lastly, the output weights of the excitation operation are used as importance indices of feature channels for the reweight operation, which weighs the previous features and recalibrates each channel dimension. Additionally, due to the high computational cost of the sigmoid function in the Swish activation function, which is significant in real-time applications, MobileNetV3 uses an improved h-swish activation function based on the Swish activation function to reduce computational cost effectively and achieve fast detection purposes. The expression of the h-swish activation function is Equation (4):

\text{h-swish}(x) = x \cdot \frac{\text{ReLU6}(x + 3)}{6}

(4)
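Equation (4) can be evaluated with a few lines of plain Python. The sketch below is only an illustration of the formula; the Swish function is included for comparison and uses the usual definition x · sigmoid(x), which is an assumption since the paper does not write it out.

```python
import math


def relu6(x: float) -> float:
    """ReLU clipped to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def h_swish(x: float) -> float:
    """Hard-swish activation, Equation (4): piecewise and free of exponentials."""
    return x * relu6(x + 3.0) / 6.0


def swish(x: float) -> float:
    """Swish activation x * sigmoid(x), shown only for comparison."""
    return x / (1.0 + math.exp(-x))


for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"x={x:+.1f}  h-swish={h_swish(x):+.4f}  swish={swish(x):+.4f}")
```

The hard variant is exactly zero for x ≤ −3 and exactly x for x ≥ 3, and it needs only a clamp, a multiplication, and a division.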

Substituting the traditional backbone network with the lightweight MobileNetV3 network enables a significant reduction in computational expenses and model size while preserving a satisfactory level of accuracy. This makes YOLOv7 more suitable for use in resource-constrained scenarios. The structure of the network used by the algorithm is shown in Figure 4.
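The squeeze, excitation, and reweight steps of the SE mechanism described earlier in this section can be sketched on nested Python lists. This is a minimal illustration, not the authors' implementation: the weight matrices `w1` and `w2` are hypothetical, biases are omitted, and the use of a hard-sigmoid gate is an assumption.

```python
def hard_sigmoid(x):
    """Piecewise-linear gate in [0, 1]; assumed here as the excitation output."""
    return min(max(x + 3.0, 0.0), 6.0) / 6.0


def se_reweight(feature_map, w1, w2):
    """Apply squeeze, excitation, and reweight to a C x H x W nested list.

    w1 has shape (C/r) x C and w2 has shape C x (C/r); both are hypothetical.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: fully connected -> ReLU -> fully connected -> gate.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [hard_sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Reweight: scale every value of a channel by that channel's gate.
    scaled = [[[v * g for v in row] for row in ch] for ch, g in zip(feature_map, gates)]
    return scaled, gates


# Two 2 x 2 channels and a reduction to a single hidden unit.
feature_map = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
w1 = [[0.5, 0.5]]
w2 = [[1.0], [-1.0]]
scaled, gates = se_reweight(feature_map, w1, w2)
print(gates)
```

With these weights the first channel is emphasized and the second suppressed, which is the channel recalibration the text describes.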

2.2. The EIoU Loss Function

YOLOv7’s loss function has three components: localization, object confidence, and classification loss. The choice of localization loss function is the complete intersection over union (CIOU).

L_{CIOU} = 1 - CIOU = 1 - IOU + \frac{ρ^{2}(b, b^{gt})}{c^{2}} + αv

(5)

v = \frac{4}{π^{2}} \left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}

(6)

α = \frac{v}{(1 - IOU) + v}

(7)

IOU = \frac{A \cap B}{A \cup B}

(8)

where A and B denote the two boxes being compared, b and b^{gt} denote the centers of the predicted box and the ground truth box, respectively, ρ represents the Euclidean distance between the two centers, c denotes the diagonal length of the smallest rectangle that encloses both the predicted box and the ground truth box, α is the weight function, v is the factor that measures the consistency of the aspect ratio, w and w^{gt} are the widths of the predicted box and the ground truth box, and h and h^{gt} are the heights of the predicted box and the ground truth box.

However, CIOU only considers the relative proportion of width and height and does not take into account the actual differences in width and height or their confidence levels. Therefore, when the predicted box and the ground truth box have the same aspect ratio but different sizes, v in Equation (6) becomes zero and the aspect ratio penalty term of the loss function fails, which is not conducive to model optimization. This paper introduces the efficient IOU (EIOU) loss function to replace the CIOU. The target bounding box position and feature information are taken into account by the EIOU loss function. By splitting the aspect ratio term of the CIOU and directly penalizing the differences in width and height between the predicted box and the ground truth box, the process accelerates convergence and improves regression accuracy. The loss function consists of three parts: overlap area loss L_{IOU}, center point distance loss L_{dis}, and width–height loss L_{asp}, following Equations (9) and (10).

L_{EIOU} = L_{IOU} + L_{dis} + L_{asp}

(9)

L_{EIOU} = 1 - IOU + \frac{ρ^{2}(b, b^{gt})}{c^{2}} + \frac{ρ^{2}(w, w^{gt})}{c_{w}^{2}} + \frac{ρ^{2}(h, h^{gt})}{c_{h}^{2}}

(10)

where c_{w} and c_{h} are the width and height of the smallest rectangle that contains both the predicted box and the ground truth box.
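To make the difference between the two localization losses concrete, the sketch below evaluates Equations (5)–(8) and the EIOU loss for a single pair of boxes. It is an illustrative sketch rather than the training code: boxes are assumed to be given as (center x, center y, width, height), and the example boxes are hypothetical.

```python
import math


def _corners(box):
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


def iou_and_enclosure(pred, gt):
    """Return IOU plus the width and height of the smallest enclosing rectangle."""
    px1, py1, px2, py2 = _corners(pred)
    gx1, gy1, gx2, gy2 = _corners(gt)
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    return inter / union, cw, ch


def ciou_loss(pred, gt):
    """CIOU loss, Equations (5)-(7)."""
    iou, cw, ch = iou_and_enclosure(pred, gt)
    rho2 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    v = 4 / math.pi ** 2 * (math.atan(gt[2] / gt[3]) - math.atan(pred[2] / pred[3])) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0  # guard against 0 / 0
    return 1 - iou + rho2 / (cw ** 2 + ch ** 2) + alpha * v


def eiou_loss(pred, gt):
    """EIOU loss: overlap, center distance, and separate width and height terms."""
    iou, cw, ch = iou_and_enclosure(pred, gt)
    rho2 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    return (1 - iou + rho2 / (cw ** 2 + ch ** 2)
            + (pred[2] - gt[2]) ** 2 / cw ** 2
            + (pred[3] - gt[3]) ** 2 / ch ** 2)


# Same center and same 2:1 aspect ratio, but the prediction is twice as large.
gt = (50.0, 50.0, 40.0, 20.0)
pred = (50.0, 50.0, 80.0, 40.0)
print(f"CIOU loss = {ciou_loss(pred, gt):.2f}, EIOU loss = {eiou_loss(pred, gt):.2f}")
```

In this example the aspect ratio factor v is zero, so the CIOU loss reduces to 1 − IOU, whereas the EIOU loss still penalizes the width and height mismatch explicitly.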

The conventional IOU loss function can encounter issues when dealing with small objects because the IOU values for small objects are usually low. The EIOU loss function can better handle small objects by incorporating center point and aspect ratio information, improving small object detection performance. Additionally, the EIOU loss function is more sensitive to changes in the position and shape of the target box, making it adaptable to targets of different scales and shapes. This enhances the robustness of the model, enabling it to effectively detect targets in various scenarios.

2.3. Unit Clustering Based on the k-means++ Algorithm

The default sizes of the initial anchor boxes in the YOLOv7 algorithm are derived from the COCO training set using the k-means algorithm, whose results are influenced by the initial cluster centers. If these default sizes were used directly for training, this would not only affect the final model’s accuracy but also prolong the training process and hinder convergence. Therefore, it is essential to recluster the dataset to obtain suitable anchor box sizes, which makes the learning process of deep convolutional neural networks smoother and enables better predictions.

We have improved the accuracy of the proposed object detection network in predicting object positions by using the k-means++ clustering algorithm [38] instead of the original k-means clustering algorithm. Compared with k-means, the k-means++ clustering algorithm improves the accuracy of classification results by optimizing the selection of initial points. By using k-means++ to cluster the dataset, more accurate and representative anchor boxes are generated by reducing the bias in the clustering results caused by the random selection of initial cluster centers.

When using the k-means++ clustering algorithm to select candidate boxes for rail surface defect samples, the distance between sample b_{i} and cluster center c_{j} is given by Equation (11), where C = \{b_{1}(x_{1}, y_{1}), \dots, b_{n}(x_{n}, y_{n})\} is the set of widths and heights of all labeled rails and rail defects.

d(b_{i}, c_{j}) = 1 - IOU(b_{i}, c_{j})

(11)

where IOU(b_{i}, c_{j}) represents the intersection over union between b_{i} and c_{j}.

Once the current cluster centers are determined, the probability of a sample being selected as the next cluster center is given by Equation (12):

P(x) = \frac{d(b_{i}, c_{j})}{\sum_{m = 1}^{N} D(x_{m})^{2}}

(12)
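The clustering procedure can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: boxes are (width, height) pairs, the distance is Equation (11), the seeding step follows the standard k-means++ rule of sampling in proportion to the squared distance to the nearest chosen center (an assumption about how Equation (12) is meant to be read), cluster centers are updated with the mean, and the sample boxes are synthetic.

```python
import random


def wh_iou(a, b):
    """IOU of two (width, height) boxes aligned at a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def distance(box, center):
    """Equation (11): d = 1 - IOU."""
    return 1.0 - wh_iou(box, center)


def kmeanspp_seed(boxes, k, rng):
    """Pick k initial centers, favoring boxes far from the centers chosen so far."""
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        d2 = [min(distance(b, c) for c in centers) ** 2 for b in boxes]
        if sum(d2) == 0:  # every box coincides with a center already
            centers.append(rng.choice(boxes))
        else:
            centers.append(rng.choices(boxes, weights=d2, k=1)[0])
    return centers


def cluster_anchors(boxes, k, iterations=50, seed=0):
    """Run k-means with k-means++ seeding and return centers sorted by area."""
    rng = random.Random(seed)
    centers = kmeanspp_seed(boxes, k, rng)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for b in boxes:
            nearest = min(range(k), key=lambda j: distance(b, centers[j]))
            groups[nearest].append(b)
        updated = [
            (sum(b[0] for b in g) / len(g), sum(b[1] for b in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if updated == centers:
            break
        centers = updated
    return sorted(centers, key=lambda c: c[0] * c[1])


# Synthetic (width, height) labels: compact, very wide, and very tall boxes.
boxes = ([(118 + i, 60 + i) for i in range(5)]
         + [(614 + i, 36 + i) for i in range(5)]
         + [(42 + i, 618 + i) for i in range(5)])
anchors = cluster_anchors(boxes, k=3)
print([(round(w), round(h)) for w, h in anchors])
```

For the detector described in this paper, the same routine would be called with k = 9 on the labeled training boxes.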

When the input image size is 640 × 640, the algorithm will generate three different sizes of feature map outputs: 80 × 80, 40 × 40, and 20 × 20. Among them, the 80 × 80 map represents the shallow feature map, suitable for detecting small objects; the 20 × 20 map represents the deep feature map for capturing contour and structural information; and the 40 × 40 feature map is used for detecting medium-sized objects between the other two scales. Each feature map has 3 types of anchors, totaling 9 anchor boxes.
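The three output resolutions follow from dividing the input size by the downsampling factor of each detection scale. The strides of 8, 16, and 32 used below are an assumption inferred from the stated sizes (640/80, 640/40, 640/20); the sketch simply reproduces the arithmetic.

```python
INPUT_SIZE = 640
STRIDES = (8, 16, 32)      # assumed downsampling factors of the three scales
ANCHORS_PER_SCALE = 3

# Side length of each output feature map.
grids = [INPUT_SIZE // s for s in STRIDES]

# Number of distinct anchor shapes and of candidate boxes over all grid cells.
total_anchor_shapes = ANCHORS_PER_SCALE * len(STRIDES)
candidate_boxes = sum(g * g * ANCHORS_PER_SCALE for g in grids)

print(grids, total_anchor_shapes, candidate_boxes)
```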

In this research, applying k-means++ clustering to our training dataset resulted in the identification of nine anchor box sizes, which are (117, 62), (123, 99), (162, 79), (618, 35), (164, 147), (614, 41), (43, 621), (626, 59), and (85, 619). In each pair, the first number is the width of the anchor box and the second number is its height. The calculation results are shown in Figure 5, where the x-axis and y-axis represent the width and height of the ground truth bounding box, respectively. Each color represents one cluster, and “★” represents the centroid of each cluster after clustering. Because there are 9 sets of anchors in the algorithm, there are 9 centroid points in the figure.

3. Experimental Design

3.1. Dataset of Rail Surface Defects

In this study, the dataset used is a publicly available dataset [28], and the railway surface images were captured by the Railways Research and Technology Center (DATEM) using a measurement train on the Ankara–Konya and Ankara–Eskisehir lines in Turkey. The dataset categorizes images into four different classes based on the types of defects found on the track surface, namely Healthy, Joint, Squats, and Severe Squats (Ssquats), with 492, 408, 608, and 330 samples in each category, respectively. The pixel size of the rail defect images in the dataset is 224 × 224. Figure 6 represents a random selection of images from the dataset. To expand the existing dataset and improve robustness, rotation and mirroring were applied to all four types of images. Image mirroring, rotation, and other operations have been proven to be effective in training object detection algorithms [39]. During the data preparation phase, training images were annotated using LabelImg (version: 1.8.6). After dataset preparation, 80% of the data was used for training and validation, while the remaining 20% was used for testing. Moreover, all experimental studies utilized the same training, validation, and test sets to ensure fairness in comparisons.

3.2. Experimental Platform and Equipment

The experimental platform featured an Intel(R) Core (TM) i7-10700K CPU @ 3.80 GHz, an NVIDIA GeForce RTX 3080 graphics card, and 16 GB of RAM. The operating system was Windows 10; Python 3.8, PyTorch 1.12.0, and CUDA 12.3 were used to build the software environment for deep learning. The training process for the image samples lasted 400 epochs, with a mini-batch size of 16, an initial learning rate of 0.001, and a dropout rate of 0.5. Additionally, the size of the input image was adjusted to 640 × 640 pixels during the experiment.

3.3. Evaluation Metrics

The paper uses a variety of metrics to evaluate the performance of the rail surface defect detection model, specifically including precision (P), recall (R), mAP@0.5, F1 score, AP (average precision), and mAP (mean AP).

Precision is the proportion of samples predicted as positive that are truly positive, while recall is the proportion of actual positive samples that the model detects correctly. Precision (P) and recall (R) are defined as follows.

P = \frac{TP}{TP + FP}

(13)

R = \frac{TP}{TP + FN}

(14)

where TP is the number of correctly detected targets, FP is the number of false detections (including targets assigned to the wrong class), and FN is the number of targets missed by the model. A threshold value of 0.5 was employed, whereby a prediction box was classified as a positive sample only if its intersection over union (IOU) with the ground truth box surpassed 0.5; otherwise, it was deemed a negative sample.

To assess the model’s comprehensive classification and recall capabilities, the harmonic mean of the above two metrics was used as the F1 score. A higher F1 score indicates a more effective testing methodology. F1 scores are defined as follows:

F1 = \frac{2 \times P \times R}{P + R}

(15)
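Equations (13)–(15) translate directly into code. In the sketch below the detection counts are hypothetical; the second call plugs in the precision (94.9%) and recall (90.6%) reported in the conclusions of this paper, purely as an arithmetic illustration.

```python
def precision(tp: int, fp: int) -> float:
    """Equation (13)."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Equation (14)."""
    return tp / (tp + fn)


def f1_score(p: float, r: float) -> float:
    """Equation (15): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Hypothetical counts: 90 correct detections, 5 false alarms, 10 misses.
p = precision(tp=90, fp=5)
r = recall(tp=90, fn=10)
print(f"P = {p:.3f}, R = {r:.3f}, F1 = {f1_score(p, r):.3f}")

# Precision and recall as reported in the conclusions of this paper.
print(f"F1 = {f1_score(0.949, 0.906):.3f}")
```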

The equations for calculating AP (average precision) and mAP (mean average precision) are as follows:

AP = \int_{0}^{1} P(R) \, dR

(16)

mAP = \frac{1}{N} \sum_{i = 1}^{N} AP_{i}

(17)

where P(R) denotes the precision as a function of recall on the precision–recall curve, AP_{i} is the average precision of the i-th class, and N is the number of object classes detected by the model.
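The integral in Equation (16) is evaluated numerically in practice. The sketch below approximates it with the trapezoidal rule over sampled points of the precision–recall curve and then averages over classes as in Equation (17). The curve points and per-class values are hypothetical, and the plain trapezoidal rule is a simplifying assumption; detection toolkits often interpolate the precision values first.

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve (trapezoidal rule).

    `recalls` must be sorted in ascending order and aligned with `precisions`.
    """
    area = 0.0
    for i in range(1, len(recalls)):
        width = recalls[i] - recalls[i - 1]
        area += width * (precisions[i] + precisions[i - 1]) / 2
    return area


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Hypothetical precision-recall samples for one class.
ap = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.5])
# Hypothetical AP values for three classes.
m_ap = mean_average_precision([ap, 1.0, 0.75])
print(f"AP = {ap:.3f}, mAP = {m_ap:.3f}")
```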

4. Experimental Results and Analysis

4.1. Evaluation Metrics

To validate the effectiveness of the improved algorithm for detecting small targets, this study conducted ablation experiments under consistent training methods and environments. MobileNetV3, k-means++ clustering, and EIOU loss function modules were sequentially added to the original YOLOv7 algorithm in the ablation experiments. Table 1 shows results from the ablation tests.

In Table 1, “√” indicates that the corresponding module was used in the experimental algorithm, while “×” indicates that the module was not used. Overall, our model showed higher classification accuracy than YOLOv7 for the three defect categories, with a maximum accuracy difference of 5%, and our overall false positive rate was lower than that of YOLOv7. Specifically, using MobileNetV3 as the backbone network of the original network optimized the model’s learning capability and improved the average detection accuracy by 1.6%, with gains in detection accuracy for the various target classes. Using the EIoU as the bounding box regression loss function addressed the unclear definition of aspect ratios and the sample imbalance issue in the regression, leading to a 2.5% mAP increase. Incorporating the k-means++ clustering algorithm adapted the anchors better to targets of different scales and shapes and enhanced the network’s information acquisition capability, resulting in accuracy improvements of 0.4% and 2.3% for the two smaller target classes, respectively; its impact on larger target classes was minor, hence there was no improvement in the Joint class accuracy. Through this sequence of ablation experiments, we were able to clearly observe the beneficial impact of each enhancement on the overall detection performance, thereby substantiating the effectiveness of these enhancements.

4.2. Comparative Experiments

Based on the experimental findings, we computed precision and recall values at various thresholds, connecting these data points to construct a precision–recall (PR) curve, illustrated in Figure 7a. The curve that is closer to the upper right-hand corner shows that as the recall rate increases, the decrease in precision is less pronounced, indicating a better overall performance of the model.

The confusion matrix, which reflects the relationship between the true class of the sample data and the class predicted by the classifier, is a common way to evaluate the performance of a classifier. The horizontal axis of the confusion matrix represents predicted values, while the vertical axis represents true values and “background” represents the background class. The diagonal of the matrix shows the number of correctly identified samples, with darker colors indicating higher numbers. Visualizing test results through the confusion matrix not only allows the number of correct recognitions to be calculated for each class, but also provides insights into the distribution of misclassifications. The confusion matrices obtained with YOLOv7 and the proposed method on the same dataset are shown in Figure 7b. The TP rates for Joint, Squats, and Ssquats are 100%, 97%, and 86%, respectively. The corresponding FP rates are low, at 0%, 3%, and 14%, respectively. While occasional instances of FN may arise due to the prevalence of shadows and the influence of intricate environmental conditions, potentially affecting the model’s performance, the overall classification of rail defects demonstrates precision and comprehensiveness.

4.3. Comparison with Classical Algorithms

To further validate whether the proposed model is superior at detecting rail surface defects compared with classical algorithms, comparative experiments were conducted with mainstream object detection models, namely Faster RCNN, YOLOv5s, and YOLOv7, on the same dataset (as shown in Table 2). This type of comparison allows the detection performance of the different methods to be assessed directly. The models were compared in terms of precision, recall, F1 score, and mAP. The two-stage Faster RCNN detection algorithm achieved an F1 score of 0.86 and an mAP of 87.4%. YOLOv5s showed improved detection performance over Faster RCNN, with mAP and F1 scores of 93.9% and 0.90, respectively; however, these were still lower than those of the proposed method. Our mAP was 1.3% and 4.9% higher than those of YOLOv5s and YOLOv7, respectively. Given this superior overall performance, the proposed model is more suitable for locating and detecting rail surface defects in real-world scenarios than the other algorithms considered.
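For reference, the F1 score combines precision and recall as their harmonic mean. The sketch below assumes this standard definition; `f1_score` is an illustrative helper, not the authors' code. Because the precision and recall values in Table 2 are themselves rounded, values recomputed from them may differ from the reported F1 in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Faster RCNN row of Table 2: precision 82.1%, recall 90.1%.
faster_rcnn_f1 = round(f1_score(0.821, 0.901), 2)  # 0.86
```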

5. Conclusions

In this paper, the original YOLOv7 algorithm was improved to obtain a better-performing rail surface defect detection algorithm, with MobileNetV3 replacing the original backbone network in the feature extraction part. Applying a lightweight feature extraction network significantly reduced the model size, with potential performance gains in detecting rail surface defects. Furthermore, the EIoU loss function was adopted for bounding box regression, and the k-means++ algorithm was used for data clustering, which mitigated the instability arising from different initial cluster center selections. Compared with the original algorithm, clustering accuracy and robustness were improved, increasing convergence speed and accuracy. Experimental validation confirmed that the algorithm presented in this paper achieves rapid and high-precision positioning, yielding a precision of 94.9%, a recall of 90.6%, and a mean average precision (mAP) of 95.2%. This detection accuracy makes the algorithm well suited to rail surface defect detection tasks.
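The way k-means++ reduces sensitivity to the initial cluster centers can be sketched as follows: each new center is drawn with probability proportional to its squared distance from the nearest center already chosen, so the seeds tend to be spread across the data. The example assumes bounding boxes represented as (width, height) pairs and a distance of 1 − IoU, which is a common choice for anchor clustering but is an assumption here; the function names and the `seed` parameter are hypothetical.

```python
import random


def wh_iou(a, b):
    """IoU of two boxes given as (width, height), aligned at a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeanspp_seeds(boxes, k, seed=0):
    """Choose k initial cluster centers using k-means++ seeding."""
    rng = random.Random(seed)
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        # Squared distance from each box to its nearest chosen center.
        d2 = [min((1.0 - wh_iou(b, c)) ** 2 for c in centers) for b in boxes]
        if sum(d2) == 0:
            # All boxes coincide with existing centers; fall back to uniform.
            centers.append(rng.choice(boxes))
            continue
        # Boxes far from every center are more likely to be picked;
        # boxes already chosen have weight zero.
        centers.append(rng.choices(boxes, weights=d2, k=1)[0])
    return centers


# Toy (width, height) pairs in pixels, for illustration only.
toy_boxes = [(10, 10), (12, 11), (50, 40), (52, 42), (120, 30), (118, 32)]
seeds = kmeanspp_seeds(toy_boxes, k=3)
```

The seeds would then be refined by ordinary k-means iterations to obtain the final anchor boxes.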

While this paper has made strides in enhancing the mean average precision (mAP), it is important to highlight that the dataset used in this study is from specific railway lines, and further verification is needed to detect surface defects on rails in different environments. Additionally, due to the more detailed feature extraction of MobileNetV3 on the surface condition of rails, the algorithm may mistakenly detect certain areas affected by light and environmental pollution as defects, potentially increasing the false positive rate and requiring further improvement. In future research, we will consider expanding the proposed framework and further improving the lightweight features of the network to optimize the model for application on mobile embedded devices, addressing more complex scenarios in railway surface health analysis.

Author Contributions

Conceptualization, Y.Z. and G.C.; Methodology, T.F.; Resources, Y.S. (Yating Song); Software, T.F.; Validation, Y.S. (Yating Song); Writing—original draft, Y.S. (Yuhang Shi); Writing—review and editing, Y.Z. and G.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Key Research and Development Program of China (“Research on safety assurance, operation and maintenance technologies of high-speed train based on active-safety”) under Grant 2022YFB4301204.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. The rail surface defect detection system.

Figure 2. Structure of the MobileNetV3 block.

Figure 3. Schematic diagram of the DWC process.

Figure 4. The algorithm’s network structure.

Figure 5. The bounding box sizes clustering using the k-means algorithm.

Figure 6. Randomly chosen flawed samples from the dataset.

Figure 7. Confusion matrix comparison. (a) P–R curve; (b) confusion matrix.

Table 1. Ablation results for the improved algorithm.

MobileNetV3	EIOU	k-means++	Input/(Pixel × Pixel)	AP Joint/(%)	AP Squats/(%)	AP Ssquats/(%)	mAP@0.5/(%)
×	×	×	640 × 640	99.4	95.8	75.5	90.2
√	×	×	640 × 640	99.6	96.3	79.5	91.8
√	√	×	640 × 640	99.6	96.0	87.3	94.3
√	√	√	640 × 640	99.6	96.4	89.6	95.2
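
The mAP@0.5 column in Table 1 appears to be the unweighted mean of the three per-class AP values. The short check below, with the AP values copied from the table, illustrates this; `mean_average_precision` is an illustrative helper name.

```python
# Per-class AP values (Joint, Squats, Ssquats) from the four rows of Table 1.
TABLE_1_AP = [
    (99.4, 95.8, 75.5),  # baseline YOLOv7
    (99.6, 96.3, 79.5),  # + MobileNetV3
    (99.6, 96.0, 87.3),  # + MobileNetV3 + EIoU
    (99.6, 96.4, 89.6),  # + MobileNetV3 + EIoU + k-means++
]


def mean_average_precision(class_aps):
    """Unweighted mean of the per-class average precision values."""
    return sum(class_aps) / len(class_aps)


map_values = [round(mean_average_precision(row), 1) for row in TABLE_1_AP]
# map_values == [90.2, 91.8, 94.3, 95.2], matching the mAP@0.5 column.
```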

Table 2. Comparison of detection results with mainstream object detection algorithms.

Model	Precision (%)	Recall (%)	mAP@0.5 (%)	F1
Faster RCNN	82.1	90.1	87.4	0.86
YOLOv5s	90.3	89.4	93.9	0.90
YOLOv7	84.4	90.2	90.3	0.87
Ours	94.9	90.6	95.2	0.92

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
