Current-Signal-Based Fault Diagnosis of Railway Point Machines Using Machine Learning

Sugiana, Ahmad; Cahyadi, Willy Anugrah; Yusran, Yasser

doi:10.3390/app14010267

by Ahmad Sugiana *, Willy Anugrah Cahyadi and Yasser Yusran

School of Electrical Engineering, Telkom University, Jl. Telekomunikasi Terusan Buah Batu, Bandung 40257, Indonesia

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(1), 267; https://doi.org/10.3390/app14010267

Submission received: 3 December 2023 / Revised: 20 December 2023 / Accepted: 21 December 2023 / Published: 28 December 2023

(This article belongs to the Special Issue Railway Infrastructures Engineering: Latest Advances and Prospects)

Abstract

Most railway operators still rely on conventional maintenance for railway point machines (RPMs), which are among the most vital pieces of equipment for ensuring the safety of train operation. Conventional maintenance lacks accuracy, is inefficient, and incurs high labor costs. This study developed a cost-effective and accurate fault diagnosis (FD) method based on current data to increase the overall efficiency of RPM maintenance. The FD method for RPM equipment discussed in this paper distinguishes three working conditions: normal, warning, and failure. The method is based on time-series current signals gathered while the RPM was in operation. Time-domain features were extracted from the time-series data and filtered using scalable hypothesis testing. The selected features became the datasets for machine learning modeling. Six machine learning algorithms were compared in order to find the algorithm with the best FD accuracy. The results showed 100% accuracy for the Decision Tree and Random Forest algorithms in the FD method. The results of the FD method could be important for maintenance teams in determining suitable maintenance activities based on RPM working conditions.

Keywords:

railway point machine; fault diagnosis; machine learning; predictive maintenance

1. Introduction

The performance and safety of railway systems are significantly affected by railway point machines (RPMs) [1], one of the most critical pieces of equipment in railway signaling systems. An RPM consists of an electric motor and mechanical assembly, which move, detect, and lock the end position of the point tongue either individually or following the direction of the route formed. According to some relevant studies, over the years, failures in RPMs have been the most frequent cause of faults in railway signaling systems, which has a significant impact on the efficiency and safety of train operation [2,3,4]. In addition, at present, conventional maintenance for RPM equipment is mostly employed by railway operators, utilizing large amounts of manpower and material resources [5]. In order to optimize the effectiveness of machine operation, reduce unplanned downtime, and decrease operational and maintenance expenses, it becomes imperative to create an intelligent fault diagnosis (FD) method for RPM equipment. The method should be designed to evaluate the health status, aiming to identify the type, severity, and degradation trend of potential faults [6]. The FD method also has a big impact on increasing the quality and efficiency of maintenance by providing appropriate maintenance recommendations based on the current state of the equipment [7].

In a dynamic system, there are three main approaches to fault diagnosis [8]: model-based, knowledge-based, and data-driven fault diagnosis. Model-based fault diagnosis compares the measured process output with an estimate generated by an appropriate mathematical model of the system working under normal operating parameters [8]. Knowledge-based fault diagnosis relies on a large amount of historical data to construct a knowledge base that explicitly represents the dependency of system variables [8]. Data-driven fault diagnosis derives valuable information from data captured by sensors and actuators in a dynamic system [9]. In recent years, data-driven fault diagnosis for railway point machines has received increased attention [10] because there is no requirement to model the system. Two kinds of methods are used in data-driven fault diagnosis: statistical and machine learning methods [8]. The machine learning method has several advantages over the statistical method, including its ability to learn the entire system’s behavior from only a few datasets [8], describe very complex and non-linear systems with great accuracy in defect identification, and discover new issues or errors with insufficient data [8].

In recent years, various sensors, including voltage [11], current [11], force [11], acoustic [2], and vibration [12], have been employed in the FD method for RPM equipment. The implementation of a current sensor is the most practical since it does not interfere with operations and can be installed in the power supply, which is located inside the equipment room in the station area, rather than inside the RPM equipment or field area [11]. Furthermore, electric current data can be used to diagnose not just electrical but also mechanical failures [13].

Several studies have shown promising outcomes in utilizing this method for diagnosing faults in railway point machines. For instance, Asada et al. [11] applied wavelet transforms and Support Vector Machines (SVMs) to detect RPM faults using voltage and current data, and the experimental results demonstrated 100% cross-validation accuracy. Jin et al. [14] successfully combined the self-organizing map-minimum quantization error and principal component analysis $T^{2}$ for classifying RPM conditions utilizing voltage and current data, achieving 81% classification accuracy in fault detection. Sa et al. [15] experimented to determine whether it was worth replacing RPM using current-data and shapelet methods, achieving 97% accuracy for balanced data and 95% accuracy for unbalanced data in fault detection. Guo et al. [16] examined current data and utilized the stacked autoencoder method to detect failures in RPM equipment, achieving 100% accuracy. However, these studies give numerous methods for detecting or diagnosing failures that use two or more sensors in data monitoring. The smaller number of sensors utilized in the FD method could reduce installation costs [17]. So, further study on the use of only electric current data is required to give low-cost and accurate solutions in the FD method (multi-classification problems) for RPM equipment.

As a result, the purpose of this research is to examine the use of time-series current data and machine learning (ML) techniques in fault diagnosis for railway point machines. When compared to frequency-domain and time–frequency-domain analysis, time-domain analysis is one of the most frequently utilized signal-processing methods since it has the lowest computational cost without neglecting useful information from motor conditions [18,19]. Six ML algorithms will be employed in order to determine which model performs the best in the fault diagnosis of railway point machines: Logistic Regression (LR), k-Nearest Neighbor (kNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Artificial Neural Network (ANN). The remainder of this paper is structured as follows. Section 2 describes the methodology. Section 3 presents the research outcomes. Section 4 discusses the results. Section 5 presents the conclusions.

2. Methods

This research was carried out at PT Kereta Api Indonesia (Persero), also known as PT KAI (Persero), which is Indonesia’s largest railway operator. Figure 1 shows the research methodology used in this paper. First, an electric current sensor was developed to monitor the RPM equipment, which is the most applicable sensor in RPM monitoring implementations [11] and delivers a high level of precision in describing the electro-mechanical conditions [13]. Second, various measurement information was acquired for each type of RPM equipment condition using current-data acquisition (DAQ). The three conditions that were monitored were normal, warning, and failure. The obtained time-series current data were extracted into some relevant time-domain features. Third, the features were filtered to become training and testing datasets for machine learning modeling. This study employed six types of ML algorithms: Logistic Regression (LR), k-Nearest Neighbor (kNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Artificial Neural Network (ANN). These are supervised learning ML algorithms widely used in predictive maintenance applications, especially in fault diagnosis [5,20]. The confusion matrix approach was utilized to evaluate the classification results of the six ML models [21]. The results were compared to determine which model achieved the best accuracy in predicting RPM equipment conditions.

2.1. Experimental Setup and Data Preparation

The experimental data in this study were obtained from PT KAI (Persero), the largest railway operator in Indonesia. In order to simulate the warning and failure conditions, the current data of a railway point machine were obtained by conducting the experiments on an RPM test bench so it did not interrupt railway operation. The research object was an NSE-type RPM, the most commonly used type of RPM in Indonesia. The NSE-type RPM has a 120 Vdc motor to operate the moving rod in order to change the point direction. Figure 2 shows the experimental setup.

This work employed a WCS1700 current sensor, which was linked to a NodeMCU ESP8266 microcontroller board. The current sensor reading was calibrated with a Transmille 3041A Precision Multi-Product Calibrator. After calibration, the current sensor was installed in the neutral line of the RPM equipment’s power line input. Figure 3 shows the installation connection of the current sensor in the neutral line of the RPM equipment. The working current always passes through the neutral line, whether the RPM is operating in the normal direction or in the reverse direction. The current sensor can therefore monitor the RPM during every type of operation.

The current data were acquired while the RPM was in operation and recorded to a .csv file on the workstation laptop. The data acquisition was performed at a sampling frequency of 10 Hz under three categories of RPM conditions: (1) normal, (2) warning, and (3) failure. Normal conditions occurred when the RPM performed its functions in good condition. Warning conditions occurred when the clutch assembly of the RPM contained old grease mixed with mud and needed to be serviced. The clutch assembly is a critical component of the RPM that engages and disengages the power transfer between the gearbox and the moving rod. The warning conditions of the clutch were considered because they cannot be examined through conventional maintenance such as visual inspection and are commonly found in RPM operation [11]. Figure 4 depicts the difference between the RPM clutch assembly under warning and normal conditions. Failure conditions occurred when the RPM was unable to travel into the end position due to an obstruction. The collected time-series current signals were processed and filtered using time-domain feature extraction and scalable hypothesis testing. The filtered features were utilized as the dataset for machine learning modeling in railway point machine fault diagnosis.
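As a rough illustration of how one recorded operation might be loaded for later processing, the sketch below parses a CSV log and derives the operation duration from the 10 Hz sampling frequency stated above. The column name `current_a`, the helper names, and the sample values are hypothetical; the layout of the recorded .csv file is not described in this paper.

```python
import csv
import io

SAMPLING_HZ = 10  # sampling frequency stated in the text


def load_current_log(csv_text, column="current_a"):
    """Return the current samples (in amperes) of one recorded operation.

    The column name is a hypothetical choice; adapt it to the real log.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader]


def operation_duration_s(samples, sampling_hz=SAMPLING_HZ):
    """Duration of the recorded operation in seconds."""
    return len(samples) / sampling_hz


# Made-up example log, not measured data.
example_log = "current_a\n0.0\n3.2\n2.1\n2.0\n2.1\n0.0\n"
samples = load_current_log(example_log)
duration = operation_duration_s(samples)
```

Because the sampling rate is fixed, the number of samples in a recording is proportional to the time the RPM needs to reach its end position.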

2.2. Time-Domain Feature Extraction Based on Scalable Hypothesis Testing

The current data collected during the experimental activity were time-series data. A time series is a set of observations collected in order of time [22]. In order to use a set of time series, $D = \{x_{i}\}_{i=1}^{N}$, as input for supervised machine learning algorithms, each time series $x_{i}$ should be transformed into a well-defined feature space with a problem-specific dimension $M$ and feature vector $\vec{x}_{i} = (x_{i,1}, x_{i,2}, \dots, x_{i,M})$ [23]. In this paper, time-domain feature extraction is proposed as a method for obtaining specific features. The use of time-domain features is one of the most frequently utilized signal-processing methods since it has the lowest computational cost without neglecting useful information from motor conditions [18,19]. Several time-domain features, such as skewness and kurtosis, are robust to machine operating conditions [24]. Table 1 lists the time-domain features that were utilized in this study.
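The mapping from a variable-length time series to a fixed-length feature vector can be sketched in plain Python as follows. This is a minimal illustration, not the tsfresh implementation: it assumes population (uncorrected) moments for variance, skewness, and kurtosis, whereas a library may apply bias corrections, so values can differ slightly. The function name and the example input are hypothetical.

```python
import math
import statistics


def time_domain_features(samples):
    """Compute 11 time-domain features of one current time series."""
    n = len(samples)
    mean = sum(samples) / n
    # Population central moments (divide by n, no bias correction).
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    std = math.sqrt(m2)
    return {
        "length": n,
        "maximum": max(samples),
        "minimum": min(samples),
        "mean": mean,
        "median": statistics.median(samples),
        "sum_values": sum(samples),
        "variance": m2,
        "standard_deviation": std,
        "root_mean_square": math.sqrt(sum(x * x for x in samples) / n),
        # Skewness and excess kurtosis are undefined for a constant signal;
        # 0.0 is returned here as a placeholder convention.
        "skewness": m3 / std ** 3 if std > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 - 3.0 if std > 0 else 0.0,
    }


# Made-up samples; any series length yields the same 11 features.
features = time_domain_features([1.0, 2.0, 3.0, 4.0, 5.0])
```

Whatever the length of the input series, the output always has the same 11 entries, which is what allows recordings of different durations to share one dataset.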

The extracted features shown in Table 1 were filtered using hypothesis testing. Each feature $X_{i}$ was scored according to its significance in predicting target $Y$; that is, each feature was statistically tested to determine whether it was relevant or irrelevant for predicting $Y$. Each test generated a p-value that estimated the probability that feature $X_{i}$ was not relevant for predicting the target. In this case, a p-value lower than 0.05 indicated that feature $X_{i}$ was relevant for predicting target $Y$. All of these steps employed tsfresh, an open-source Python package that conducts automated feature extraction and selection based on hypothesis testing, specifically designed for time-series datasets.
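The idea of scoring each feature with a p-value and keeping those below 0.05 can be illustrated with the sketch below. It deliberately swaps in a simple permutation test on the one-way ANOVA F statistic, which is not the set of tests that tsfresh applies; the function names and the feature values are made up for illustration.

```python
import random


def f_statistic(values, labels):
    """One-way ANOVA F statistic of `values` grouped by `labels`."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(values, labels, n_perm=1000, seed=0):
    """Share of label shuffles whose F is at least the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def select_relevant(feature_table, labels, alpha=0.05):
    """Keep the features whose p-value is below the threshold alpha."""
    return [name for name, values in feature_table.items()
            if permutation_p_value(values, labels) < alpha]


# Made-up feature values for 12 hypothetical recordings.
labels = ["normal"] * 4 + ["warning"] * 4 + ["failure"] * 4
feature_table = {
    "length": [40, 41, 42, 43, 50, 51, 52, 53, 90, 91, 92, 93],
    "maximum": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
}
selected = select_relevant(feature_table, labels)
```

In this toy table the first feature separates the three classes and is kept, while the second has the same distribution in every class and is discarded.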

2.3. Machine Learning Modeling

The selected features from hypothesis testing were employed to establish a dataset for machine learning modeling. The dataset was split into training and testing datasets with an 80/20 ratio to avoid overfitting or underfitting [25]. Figure 5 depicts the ML modeling approach performed using the training dataset and evaluated using the testing dataset.
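An 80/20 split can be sketched as follows. The paper does not state whether the split was stratified by class; stratification, the fixed seed, and the function name are assumptions made here so that each condition keeps roughly the same share in both subsets. The class counts are those reported in Section 3.1.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), split class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = ["normal"] * 128 + ["warning"] * 145 + ["failure"] * 153
train_idx, test_idx = stratified_split(labels)
```

The test indices are held back until the very end, so that they play no part in hyperparameter tuning.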

The training dataset was utilized to tune the hyperparameters of each machine learning model, that is, to decide which parameter values are best for machine learning modeling. Grid search with cross-validation is one method for hyperparameter tuning. Common cross-validation schemes include leave-one-out cross-validation (LOOCV), 10-fold cross-validation, and 5-fold cross-validation. In this paper, 5-fold cross-validation was used in the proposed method, as it has been reported to be the most efficient approach, with the lowest computing cost and the best outcome [26]. The 5-fold cross-validation approach divided the training dataset into 5 folds, with 80% of the data (4 folds) used as a training set and 20% of the data (1 fold) used to validate the ML model’s accuracy (80/20 ratio) [26]. This procedure was repeated five times, each time with a different fold as validation data. Each validation iteration yielded an accuracy and an error value, and the 5-fold cross-validation result was the mean of these values. Fivefold cross-validation was performed for every hyperparameter combination in the grid, so a larger number of hyperparameters required more iterations to obtain the grid search cross-validation results. We utilized the hyperparameters that yielded the best validation accuracy to train the optimal model on the whole training dataset [26]. Finally, the testing dataset was used to compare the optimal models. The Logistic Regression (LR), k-Nearest Neighbor (kNN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Artificial Neural Network (ANN) algorithms are the most widely utilized ML algorithms in fault diagnosis due to their high accuracy outcomes [5,20]. RF has a significant advantage when training samples are limited, and, compared to ANN and SVM, RF performs significantly better on noisy data [27].

Figure 5. Machine learning modeling process. Green arrows indicate the validation process, whereas white arrows indicate the final training and testing processes on the blind test set [28].
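The 5-fold procedure described above can be sketched in plain Python as follows. A 1-nearest-neighbour rule is used purely as a stand-in classifier, and the data are toy values; neither is taken from this study, and the function names are hypothetical.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def nearest_neighbour_predict(train_x, train_y, query):
    """Stand-in classifier: label of the closest training sample (1-NN)."""
    distances = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[distances.index(min(distances))]


def cross_val_accuracy(X, y, k=5):
    """Mean validation accuracy over the k folds."""
    scores = []
    for train, validation in k_fold_splits(len(X), k):
        train_x = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(nearest_neighbour_predict(train_x, train_y, X[i]) == y[i]
                      for i in validation)
        scores.append(correct / len(validation))
    return sum(scores) / len(scores)


# Toy, clearly separable data (not from the paper).
X = [[0.01 * i] for i in range(10)] + [[5.0 + 0.01 * i] for i in range(10)]
y = ["normal"] * 10 + ["failure"] * 10
mean_accuracy = cross_val_accuracy(X, y)
```

In a grid search, `cross_val_accuracy` would be evaluated once per hyperparameter combination, and the combination with the highest mean would be retained.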

All optimal models developed from these algorithms were assessed using the confusion matrix technique, which compared the actual and predicted conditions of the RPM to determine which ML algorithm performed the best in fault diagnostics for RPM equipment. One of the metrics commonly used to evaluate ML models is accuracy, which is the ratio of the number of correct predictions (both true positives $TP$ and true negatives $TN$) to the total number of predictions (including true positives $TP$, true negatives $TN$, false positives $FP$, and false negatives $FN$). The formula is given in Equation (1).

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
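For a three-class problem such as this one, the correct predictions are the diagonal entries of the confusion matrix, so accuracy reduces to the diagonal sum divided by the total count. The sketch below illustrates this with made-up labels; the function names are hypothetical and the predictions are not results from this study.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions (diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up predictions for six hypothetical test samples.
classes = ["normal", "warning", "failure"]
actual = ["normal", "normal", "warning", "warning", "failure", "failure"]
predicted = ["normal", "warning", "warning", "warning", "failure", "failure"]
matrix = confusion_matrix(actual, predicted, classes)
score = accuracy(matrix)
```

Here one normal sample is misclassified as warning, which appears as the single off-diagonal count in the first row.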

3. Results

3.1. Results of Current-Data Measurements

The electric current of the RPM equipment was measured under three conditions: (1) normal operating conditions, (2) warning conditions with old grease, and (3) failure conditions when moving to the end position. Figure 6 shows the current datasets obtained for each RPM equipment condition. The number of data samples obtained for conditions 1, 2, and 3 was 128, 145, and 153, respectively. The dataset was therefore approximately balanced, which has several advantages, such as improving model performance, preventing bias, and enhancing generalization [29]. The time-series current signals of condition 1 and condition 2 were very similar in shape. The only visible difference was that condition 1 took less time to advance the RPM toward the end position than condition 2.

To gain a better understanding of the obtained current data, we compared the three types of current data for conditions 1, 2, and 3 of the RPM equipment using average data, as shown in Figure 7. The three types of data differed considerably in terms of data patterns and lengths (data dimension). Therefore, time-series feature extraction is useful for dimensionality reduction [30]. Dimensionality reduction gives every sample used as input for ML modeling the same dimension, that is, the same number of variables or features.

3.2. Selection of the Relevant Time-Domain Features

In this paper, 11 time-domain features (length, maximum, minimum, mean, median, sum values, variance, standard deviation, root mean square, skewness, and kurtosis) were extracted from the RPM operation’s time-series current data. The time-series current data were acquired from the three RPM equipment conditions (normal, warning, and failure), with 128, 145, and 153 data samples, respectively. The 11 features were filtered by performing scalable hypothesis testing to identify the features that had no relevant relationship with the classifications or labels. Table 2 displays the outcomes of the feature selection process.

The 11 extracted features were sorted by p-value. The results showed that the length feature had the lowest p-value. This means that the length feature had a strong relevant relationship with the classifications. The maximum feature had a p-value greater than 0.05, indicating that the maximum feature had no significant relationship with the classifications. Thus, it was eliminated, and the rest of the features were utilized as the dataset for machine learning modeling.

3.3. Determining the Optimal Model

Hyperparameter tuning for the six machine learning algorithms (LR, kNN, SVM, DT, RF, and ANN) was conducted in this study to determine the parameters of each machine learning method that yielded the best validation accuracy. The machine learning parameters that were investigated in grid search 5-fold cross-validation were as follows: the parameter in the LR algorithm was the C value (range 1–10); the parameter in the kNN algorithm was the number of neighbors (range 1–10); the parameters in the SVM algorithm were the C value (range 1–10), gamma value (range 0.1–1), and kernel type (linear or rbf); the parameters in the DT algorithm were the maximum depth of the tree (range 1–5) and the criterion function (gini or entropy); the parameters in the RF algorithm were the number of trees in the forest (range 1–5) and the criterion function (gini or entropy); and the parameters in the ANN were the number of hidden layers, neurons, and epochs, as well as the batch size. Because of the low dimensionality of the dataset, the proposed ANN employed only one hidden layer in a feedforward architecture. The input layer contained 10 nodes (one per selected feature), the hidden layer contained 7 nodes, and the output layer contained 3 nodes (one per condition). The ANN model employed a batch size of 32, which is the optimum size for faster convergence compared to a large batch size [4]. To find the optimal number of epochs, we monitored the validation loss every 30 epochs using the early stopping method. The optimal number of epochs found in this study was 1414. Table 3 presents the grid search 5-fold cross-validation results, including the training and validation accuracy values explained in Section 2.3. Table 3 also describes the best parameters in each ML model.
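To make the size of the search concrete, the sketch below enumerates the grids listed above and counts the model fits required by 5-fold cross-validation. The step sizes (1 for integer ranges, 0.1 for gamma) and the parameter names, which follow scikit-learn naming conventions, are assumptions; the paper gives only the ranges. The ANN is omitted because its search ranges are not listed.

```python
from itertools import product

# Assumed grids: ranges from the text, step sizes and names are assumptions.
param_grids = {
    "LR": {"C": list(range(1, 11))},
    "kNN": {"n_neighbors": list(range(1, 11))},
    "SVM": {
        "C": list(range(1, 11)),
        "gamma": [i / 10 for i in range(1, 11)],
        "kernel": ["linear", "rbf"],
    },
    "DT": {"max_depth": list(range(1, 6)), "criterion": ["gini", "entropy"]},
    "RF": {"n_estimators": list(range(1, 6)), "criterion": ["gini", "entropy"]},
}


def expand_grid(grid):
    """List every combination of the parameter values in one grid."""
    names = list(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]


N_FOLDS = 5
fits_per_model = {name: len(expand_grid(grid)) * N_FOLDS
                  for name, grid in param_grids.items()}
```

Under these assumptions the SVM grid is by far the largest, because its three parameters multiply, which illustrates why more hyperparameters require more iterations.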

The six ML models achieved more than 80% validation accuracy. The kNN algorithm yielded the lowest validation accuracy, whereas the DT and RF algorithms exhibited the best performance with 100% accuracy. These parameter values were implemented to train the best models for each ML technique and were tested using the confusion matrix approach.

3.4. Comparison of Algorithms’ Performance

All the parameters gathered in Section 3.3 were utilized to train the optimal ML model for the fault diagnosis of RPM equipment. The training dataset, along with these parameters, was used in the training of the optimal machine learning model. Following the construction of the model, it was assessed on a testing dataset using the confusion matrix approach, which compared the actual and predicted conditions of the railway point machine. Figure 8 depicts the performance of the optimal model in terms of each algorithm.

According to the results in Figure 8 and using Equation (1), the accuracy of the optimal ML model was 84% for the kNN algorithm, 87% for the LR algorithm, 94% for the SVM and ANN algorithms, and 100% for the DT and RF algorithms. The LR and kNN algorithms achieved testing accuracy values of less than 90%, whereas the SVM, DT, RF, and ANN algorithms achieved testing accuracy values higher than 90%, indicating that they performed well in the fault diagnosis of railway point machines, especially the DT and RF algorithms, which achieved testing accuracy values of 100%.

4. Discussion

The purpose of this study is to propose a fault diagnosis method for railway point machines using current signals and machine learning techniques. Current-signal measurement makes sensor monitoring safer, since the sensor can be installed inside the equipment room of the station, and the measurements do not interfere with operations because the current sensor senses the current based on electromagnetic principles. In addition, the installation cost is low because only one sensor is used. Time-domain feature extraction based on scalable hypothesis testing aids in the identification of important features in fault diagnosis, which helps to avoid overfitting or underfitting in the machine learning model. The selected features constitute the dataset for machine learning modeling, and the optimal parameters are obtained using grid search 5-fold cross-validation and then used to train the optimal models. Two ML algorithms, DT and RF, performed very well in classifying RPM conditions, and the ANN and SVM algorithms also showed good accuracy. However, the other algorithms, LR and kNN, performed less well, especially in classifying conditions 1 and 2. The primary explanation for this is the similarity of the current-data patterns between conditions 1 and 2. Nevertheless, the DT and RF algorithms could manage this issue and correctly classify the RPM conditions.

Table 4 compares earlier works on current-signal-based fault diagnosis for railway point machines. The method proposed in this study for classifying RPM equipment conditions using current signals demonstrated excellent accuracy in the application to fault diagnosis with multi-classifications, whereas previous works only investigated binary classifications in the fault detection of railway point machines.

5. Conclusions

In this research, we proposed a fault diagnosis approach based on current signals and machine learning methods for railway point machines. In this approach, the current signals collected from the test-bench experiment are converted into time-series data. Then, time-domain feature extraction based on scalable hypothesis testing is utilized to select the relevant features for the classifications. Finally, the selected features are employed in machine learning modeling to determine which ML algorithms perform the best. The research results demonstrate that the LR and kNN algorithms achieve reasonable accuracy, although below 90%. Furthermore, the SVM, DT, RF, and ANN algorithms show great accuracy (higher than 90%) in the fault diagnosis of railway point machines, especially the DT and RF algorithms, which achieve a testing accuracy of 100% in classifying RPM equipment conditions. This study has some limitations. For example, the data were collected via a test bench, rather than in the field; the current-monitoring system was not developed using industrial data acquisition; and the number of classes can be further increased. However, the success achieved when using one current sensor, which has low installation costs for the proposed fault diagnosis method, will be a key consideration for maintenance teams of railway companies.

Implementing the current-based fault diagnosis method for railway point machines in real-world railway operations requires several steps, such as sensor installation, data collection, data transmission to the server, data processing, integration with the monitoring system, and an alert mechanism. Potential challenges and barriers to its adoption include the technical complexity for maintenance teams, sensor reliability, and cost implications; in addition, the fault diagnosis system must meet industry requirements.

Scaling up the implementation of current-signal-based fault diagnosis for railway point machines across a larger railway network will require several considerations such as sensor network architecture, communication architecture, big data management, robust algorithms, and big data interpretability. There are several strategies for overcoming the scalability issue, for instance, pilot testing, incremental implementation, and continuous improvements.

Further research into combining data from other sensors, such as vibration and acoustic sensors, is also needed. A potential benefit of adding vibration and acoustic sensors is that both could monitor the railway point machine all the time, even when it is not in operation, whereas current sensors only monitor the railway point machine when it is in operation. The use of both sensors would therefore increase the availability of condition monitoring. Additionally, combining current, vibration, and acoustic sensors would give more information about the machine’s conditions, especially in terms of electrical-mechanical parameters, resulting in a comprehensive fault diagnosis model. However, a potential drawback is that vibration and acoustic sensors must be installed inside the railway point machine, which is not easy in the narrow RPM space, and the installation must not compromise the safety of the machine’s operation. In addition, the complexity of signal processing would increase because several signals in the RPM would be monitored.

All problems and challenges will be future research topics since they are unavoidable barriers to the development and implementation of RPM intelligent fault diagnosis. Addressing these problems via continuous scientific research will pave the way for innovative solutions, enabling the continual improvement and refinement of railway point machine (RPM) intelligent fault diagnostic systems. Researchers could investigate innovative methods, technologies, and strategies to overcome these challenges, eventually adding to the robustness and efficient adoption of intelligent fault diagnosis in railway operations.

Author Contributions

Conceptualization, A.S. and W.A.C.; methodology, A.S. and W.A.C.; software, Y.Y.; validation, A.S., W.A.C. and Y.Y.; formal analysis, W.A.C. and Y.Y.; investigation, A.S.; resources, A.S. and Y.Y.; data curation, A.S., W.A.C. and Y.Y.; writing—original draft preparation, Y.Y.; visualization, Y.Y.; supervision, A.S. and W.A.C.; project administration, Y.Y.; funding acquisition, A.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The research was possible with the help of UPT Balai Yasa Sintel and LAA of PT KAI (Persero) who provided the materials and facility for the data collection experiments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Hu, X.; Cao, Y.; Tang, T.; Sun, Y. Data-driven technology of fault diagnosis in railway point machines: Review and challenges. Transp. Saf. Environ. 2022, 4, tdac036.
2. Cao, Y.; Sun, Y.; Xie, G.; Li, P. A Sound-Based Fault Diagnosis Method for Railway Point Machines Based on Two-Stage Feature Selection Strategy and Ensemble Classifier. IEEE Trans. Intell. Transp. Syst. 2022, 23, 12074–12083.
3. Wang, F.; Xu, T.; Tang, T.; Zhou, M.; Wang, H. Bilevel Feature Extraction-Based Text Mining for Fault Diagnosis of Railway Systems. IEEE Trans. Intell. Transp. Syst. 2017, 18, 49–58.
4. Wang, F.; Tang, T.; Yin, J.; Li, Y.; Ren, F. A signal segmentation and feature fusion based RUL prediction method for railway point system. In Proceedings of the IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, Maui, HI, USA, 4–7 November 2018; pp. 2303–2308.
5. Çinar, Z.M.; Nuhu, A.A.; Zeeshan, Q.; Korhan, O.; Asmael, M.; Safaei, B. Machine learning in predictive maintenance towards sustainable smart manufacturing in industry 4.0. Sustainability 2020, 12, 8211.
6. Cong, F. Special Issue on Machine Condition Monitoring and Fault Diagnosis: From Theory to Application. Appl. Sci. 2023, 13, 11550.
7. Jieyang, P.; Kimmig, A.; Dongkun, W.; Niu, Z.; Zhi, F.; Jiahai, W.; Liu, X.; Ovtcharova, J. A systematic review of data-driven approaches to fault diagnosis and early warning. J. Intell. Manuf. 2022, 34, 3277–3304.
8. Escobet, T.; Bregon, A.; Pulido, B.; Puig, V. Fault Diagnosis of Dynamic Systems: Quantitative and Qualitative Approaches; Springer: Berlin/Heidelberg, Germany, 2019; pp. 1–462.
9. Gonzalez-Jimenez, D.; Del-Olmo, J.; Poza, J.; Garramiola, F.; Madina, P. Data-driven fault diagnosis for electric drives: A review. Sensors 2021, 21, 4024.
10. Sun, Y.; Cao, Y.; Li, P. Contactless Fault Diagnosis for Railway Point Machines Based on Multi-Scale Fractional Wavelet Packet Energy Entropy and Synchronous Optimization Strategy. IEEE Trans. Veh. Technol. 2022, 71, 5906–5914.
11. Asada, T.; Roberts, C.; Koseki, T. An algorithm for improved performance of railway condition monitoring equipment: Alternating-current point machine case study. Transp. Res. Part C Emerg. Technol. 2013, 30, 81–92.
12. Sun, Y.; Cao, Y.; Li, P.; Xie, G.; Wen, T.; Su, S. Vibration-based Fault Diagnosis for Railway Point Machines using VMD and Multiscale Fluctuation-based Dispersion Entropy. Chin. J. Electron. 2023, 32, 1–11.
13. Bermeo-Ayerbe, M.A.; Cocquempot, V.; Ocampo-Martinez, C.; Diaz-Rozo, J. Remaining useful life estimation of ball-bearings based on motor current signature analysis. Reliab. Eng. Syst. Saf. 2023, 235, 109209.
14. Jin, W.; Shi, Z.; Siegel, D.; Dersin, P.; Douziech, C.; Pugnaloni, M.; Cascia, P.L.; Lee, J. Development and evaluation of health monitoring techniques for railway point machines. In Proceedings of the 2015 IEEE Conference on Prognostics and Health Management: Enhancing Safety, Efficiency, Availability, and Effectiveness of Systems Through PHM Technology and Application, PHM 2015, Austin, TX, USA, 22–25 June 2015.
15. Sa, J.; Choi, Y.; Chung, Y.; Kim, H.Y.; Park, D.; Yoon, S. Replacement condition detection of railway point machines using an electric current sensor. Sensors 2017, 17, 263.
16. Guo, Z.; Ye, H.; Dong, W.; Yan, X.; Ji, Y. A fault detection method for railway point machine operations based on stacked autoencoders. In Proceedings of the ICAC 2018—2018 24th IEEE International Conference on Automation and Computing: Improving Productivity through Automation and Computing, Newcastle Upon Tyne, UK, 6–7 September 2018; Chinese Automation and Computing Society in the UK—CACSUK. pp. 1–6.
17. Florian, E.; Sgarbossa, F.; Zennaro, I. Machine learning-based predictive maintenance: A cost-oriented model for implementation. Int. J. Prod. Econ. 2021, 236, 108114.
18. Lee, J.; Choi, H.; Park, D.; Chung, Y.; Kim, H.Y.; Yoon, S. Fault detection and diagnosis of railway point machines by sound analysis. Sensors 2016, 16, 549.
19. Bolbolamiri, N.; Sanai, M.S.; Mirabadi, A. Time-Domain Stator Current Condition Monitoring: Analyzing Point Failures Detection by Kolmogorov-Smirnov (K-S) Test. Int. J. Electr. Comput. Energ. Electron. Commun. Eng. 2012, 6, 587–592.
Carvalho, T.P.; Soares, F.A.A.M.N.; Vita, R.; Francisco, P. A systematic literature review of machine learning methods applied to predictive maintenance. Comput. Ind. Eng. 2019, 137, 106024. [Google Scholar] [CrossRef]
Bukhsh, Z.A.; Saeed, A.; Stipanovic, I.; Doree, A.G. Predictive maintenance using tree-based classification techniques: A case of railway switches. Transp. Res. Part C Emerg. Technol. 2019, 101, 35–54. [Google Scholar] [CrossRef]
Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015. [Google Scholar]
Christ, M.; Braun, N.; Neuffer, J.; Kempa-Liehr, A.W. Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh—A Python package). Neurocomputing 2018, 307, 72–77. [Google Scholar] [CrossRef]
Lei, Y.; Yang, B.; Jiang, X.; Jia, F.; Li, N.; Nandi, A.K. Applications of machine learning to machine fault diagnosis: A review and roadmap. Mech. Syst. Signal Process. 2020, 138, 106587. [Google Scholar] [CrossRef]
Gholamy, A.; Kreinovich, V.; Kosheleva, O. Why 70/30 or 80/20 Relation Between Training and Testing Sets: A Pedagogical Explanation. Dep. Tech. Rep. (CS) Comput. Sci. Univ. Tex. El Paso 2018, 1209, 1–6. [Google Scholar]
James, G.; Witten, D.; Hastie, T.; Tibshirani, R.; Taylor, J. An Introduction to Statistical Learning: With Applications in Python; Springer: Cham, Switzerland, 2023. [Google Scholar] [CrossRef]
Han, T.; Jiang, D.; Zhao, Q.; Wang, L.; Yin, K. Comparison of random forest, artificial neural networks and support vector machine for intelligent diagnosis of rotating machinery. Trans. Inst. Meas. Control 2018, 40, 2681–2693. [Google Scholar] [CrossRef]
Xu, Y.; Goodacre, R. On Splitting Training and Validation Set: A Comparative Study of Cross-Validation, Bootstrap and Systematic Sampling for Estimating the Generalization Performance of Supervised Learning. J. Anal. Test. 2018, 2, 249–262. [Google Scholar] [CrossRef] [PubMed]
Alkharabsheh, K.; Alawadi, S.; Kebande, V.R.; Crespo, Y.; Fernández-Delgado, M.; Taboada, J.A. A comparison of machine learning algorithms on design smell detection using balanced and imbalanced dataset: A study of God class. Inf. Softw. Technol. 2022, 143, 106736. [Google Scholar] [CrossRef]
Zebari, R.; Abdulazeez, A.; Zeebaree, D.; Zebari, D.; Saeed, J. A comprehensive review of dimensionality reduction techniques for feature selection and feature extraction. J. Appl. Sci. Technol. Trends 2020, 1, 56–70. [Google Scholar] [CrossRef]

Figure 1. The proposed research methodology using current signals and machine learning for fault diagnosis of railway point machines.

Figure 2. Experimental setup: (a) workstation laptop, (b) current sensor, (c) NSE-type RPM, (d) RPM test bench.

Figure 3. Illustration of current sensor installation.

Figure 4. Clutch assembly conditions of RPM equipment: (a) normal conditions with new grease, (b) warning conditions with old grease mixed with mud.

Figure 6. The collected time-series current signals of the RPM: (a) 128 samples in condition 1, (b) 145 samples in condition 2, (c) 153 samples in condition 3. Each color corresponds to a different sample.

Figure 7. The average of the time-series current signals for the 3 conditions of RPM equipment.

Figure 8. Confusion matrices for the six ML algorithms: (a) LR, (b) kNN, (c) SVM, (d) DT, (e) RF, (f) ANN.

Table 1. Summary of the extracted time-domain features.

Feature	Equation
Length	$s_{1} = N$
Maximum	$s_{2} = \max(x_{i})$
Minimum	$s_{3} = \min(x_{i})$
Median	$s_{4} = \begin{cases} x_{(N + 1) / 2}, & N \text{ is odd} \\ \frac{x_{N / 2} + x_{N / 2 + 1}}{2}, & N \text{ is even} \end{cases}$
Sum values	$s_{5} = \sum_{i = 1}^{N} x_{i}$
Mean	$s_{6} = \frac{s_{5}}{N}$
Variance	$s_{7} = \frac{1}{N} \sum_{i = 1}^{N} {(x_{i} - s_{6})}^{2}$
Standard deviation	$s_{8} = \sqrt{s_{7}}$
Root mean square	$s_{9} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} x_{i}^{2}}$
Skewness	$s_{10} = \frac{1}{N} \sum_{i = 1}^{N} \frac{{(x_{i} - s_{6})}^{3}}{s_{8}^{3}}$
Kurtosis	$s_{11} = \frac{1}{N} \sum_{i = 1}^{N} \frac{{(x_{i} - s_{6})}^{4}}{s_{8}^{4}}$
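
The formulas in Table 1 can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: the function name `time_domain_features`, the dictionary keys, and the sample values are assumptions made for the example. The median is taken over the sorted samples, and the variance uses the 1/N (population) form given in the table.

```python
import math


def time_domain_features(x):
    """Compute the eleven time-domain features of Table 1 for one signal.

    `x` is a sequence of current samples. The function name and the
    dictionary keys are illustrative choices, not taken from the paper.
    """
    n = len(x)
    if n == 0:
        raise ValueError("signal must contain at least one sample")

    ordered = sorted(x)
    mid = n // 2
    # Median: middle value for odd N, mean of the two middle values for even N.
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

    total = sum(x)                                   # s5
    mean = total / n                                 # s6
    variance = sum((v - mean) ** 2 for v in x) / n   # s7, population form (1/N)
    std = math.sqrt(variance)                        # s8
    rms = math.sqrt(sum(v * v for v in x) / n)       # s9

    # Skewness and kurtosis are undefined for a constant signal (std == 0).
    if std > 0:
        skewness = sum((v - mean) ** 3 for v in x) / (n * std ** 3)  # s10
        kurtosis = sum((v - mean) ** 4 for v in x) / (n * std ** 4)  # s11
    else:
        skewness = kurtosis = float("nan")

    return {
        "length": n,
        "maximum": max(x),
        "minimum": min(x),
        "median": median,
        "sum_values": total,
        "mean": mean,
        "variance": variance,
        "standard_deviation": std,
        "root_mean_square": rms,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Hypothetical current samples in amperes, for illustration only.
features = time_domain_features([0.0, 1.2, 1.5, 1.4, 0.3])
for name, value in features.items():
    print(f"{name}: {value}")
```

Feature-extraction libraries may use bias-corrected estimators for skewness and kurtosis, so their values can differ slightly from the plain 1/N forms used here.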

Table 2. The selection of time-domain features.

Feature	p-Value	Relevance
Length	$6.48 \times 10^{-74}$	True
Sum values	$2.08 \times 10^{-63}$	True
Mean	$8.47 \times 10^{-59}$	True
Median	$3.11 \times 10^{-58}$	True
Root mean square	$1.04 \times 10^{-56}$	True
Kurtosis	$3.07 \times 10^{-42}$	True
Skewness	$4.44 \times 10^{-42}$	True
Minimum	$1.14 \times 10^{-39}$	True
Standard deviation	$8.53 \times 10^{-8}$	True
Variance	$8.53 \times 10^{-8}$	True
Maximum	$6.40 \times 10^{-1}$	False
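
The relevance column of Table 2 can be reproduced from the p-values with a multiple-testing correction. The sketch below assumes the Benjamini-Yekutieli step-up procedure, which the scalable-hypothesis-testing approach of Christ et al. uses for feature filtering, at a false discovery rate level of 0.05. The level actually used by the authors is not stated in this section, so it is an assumption, as is the function name `benjamini_yekutieli`.

```python
def benjamini_yekutieli(p_values, fdr_level=0.05):
    """Return {feature: is_relevant} using the Benjamini-Yekutieli rule.

    `p_values` maps feature names to p-values. `fdr_level` is an assumed
    setting, not a value reported in the paper.
    """
    m = len(p_values)
    # Correction factor c(m) = 1 + 1/2 + ... + 1/m for dependent hypotheses.
    harmonic = sum(1.0 / i for i in range(1, m + 1))
    ranked = sorted(p_values.items(), key=lambda item: item[1])

    # Find the largest rank k with p_(k) <= k / (m * c(m)) * fdr_level.
    cutoff = 0
    for k, (_, p) in enumerate(ranked, start=1):
        if p <= k / (m * harmonic) * fdr_level:
            cutoff = k

    # Every feature ranked at or below the cutoff is declared relevant.
    return {name: rank <= cutoff
            for rank, (name, _) in enumerate(ranked, start=1)}


# p-values copied from Table 2.
TABLE_2_P_VALUES = {
    "Length": 6.48e-74,
    "Sum values": 2.08e-63,
    "Mean": 8.47e-59,
    "Median": 3.11e-58,
    "Root mean square": 1.04e-56,
    "Kurtosis": 3.07e-42,
    "Skewness": 4.44e-42,
    "Minimum": 1.14e-39,
    "Standard deviation": 8.53e-8,
    "Variance": 8.53e-8,
    "Maximum": 6.40e-1,
}

relevance = benjamini_yekutieli(TABLE_2_P_VALUES)
for name, keep in relevance.items():
    print(f"{name}: {keep}")
```

Under these assumptions, ten features pass the threshold and only Maximum is rejected, in agreement with Table 2.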

Table 3. The results of grid search 5-fold cross-validation for the six ML algorithms.

ML Algorithm	Hyperparameters	Train Accuracy	Validation Accuracy
LR	C = 7	88%	88%
kNN	n neighbors = 1	100%	84%
SVM	kernel = rbf, C = 7, gamma = 0.1	100%	93%
DT	criterion = gini, maximum depth = 2	100%	100%
RF	criterion = gini, n estimators = 4	100%	100%
ANN	configuration for layers and neurons = [10 7 3], batch size = 32, epochs = 1414	93%	94%
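
The grid search with 5-fold cross-validation summarized in Table 3 follows a simple pattern: for each candidate hyperparameter value, train on four folds, validate on the fifth, rotate, and average the validation accuracy. The sketch below illustrates that pattern with a small k-nearest-neighbors classifier written in plain Python. It is not the authors' code: the helper names, the interleaved fold assignment, and the toy data are assumptions made for the example, and the toy data are not RPM measurements.

```python
from collections import Counter


def k_fold_indices(n_samples, n_folds=5):
    """Split range(n_samples) into n_folds interleaved validation folds."""
    return [list(range(f, n_samples, n_folds)) for f in range(n_folds)]


def knn_predict(train_x, train_y, query, n_neighbors):
    """Majority vote among the n_neighbors closest training points."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in nearest[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(x, y, n_neighbors, n_folds=5):
    """Mean validation accuracy of kNN over n_folds folds."""
    scores = []
    for fold in k_fold_indices(len(x), n_folds):
        held_out = set(fold)
        train_idx = [i for i in range(len(x)) if i not in held_out]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        hits = sum(
            knn_predict(train_x, train_y, x[i], n_neighbors) == y[i]
            for i in fold
        )
        scores.append(hits / len(fold))
    return sum(scores) / n_folds


def grid_search(x, y, grid, n_folds=5):
    """Evaluate every candidate in `grid`; ties go to the first candidate."""
    results = {k: cross_val_accuracy(x, y, k, n_folds) for k in grid}
    best = max(results, key=results.get)
    return best, results


# Hypothetical, well-separated toy data: 3 conditions x 10 samples, 2 features.
toy_x = [(5.0 * c + 0.1 * j, 5.0 * c - 0.1 * j)
         for c in range(3) for j in range(10)]
toy_y = [c + 1 for c in range(3) for j in range(10)]

best_k, scores = grid_search(toy_x, toy_y, grid=[1, 3, 5])
print("best n_neighbors:", best_k)
print("validation accuracy per candidate:", scores)
```

The same loop applies to any of the six algorithms in Table 3 by swapping the classifier and the hyperparameter grid. With real data the folds should be shuffled or stratified so that each fold contains all three conditions; the interleaved split here achieves that only because the toy samples are ordered by class in equal-sized blocks.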

Table 4. Comparison of the proposed method and previous works.

Item	Method A [15]	Method B [16]	Proposed Method
Signal	RPM current	RPM current	RPM current
Dataset	Field data	Field data	Test bench
Task	RPM replacement	Fault detection	Fault diagnosis
Classification	Binary class	Binary class	Multi-class
Feature extraction	Shapelet	Stacked autoencoders	Time-domain feature extraction based on scalable hypothesis testing
Classifier	Shapelet	Stacked autoencoders	Decision Tree
Accuracy	97%	100%	100%

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

