1. Introduction
Distributed optical fiber sensors [1] have emerged as one of the most effective methods for monitoring civil structures. This is primarily due to the inherent advantages of optical fibers, including electromagnetic interference immunity, corrosion resistance, electrical isolation, cost efficiency, compact size, and light weight. Furthermore, these sensors offer a significant reduction in cost per sensing point, low maintenance requirements, enhanced spatial resolution, independence from electrical supply (aside from the interrogator), and reliable measurements. These favorable characteristics allow their use in harsh environments where the use of traditional electronic sensors is inherently limited and where their maintenance and periodic inspection are costly for the structure’s owner. The most important advantage of distributed optical fiber sensor technologies lies in the ability to obtain information in a distributed way throughout a monitored structure, allowing even tens of kilometers to be monitored simultaneously using a single interrogator. The high spatial resolution of distributed optical fiber sensors enables disturbances to be located over the entire length of the sensing optical fiber with high accuracy [2]. In addition, the almost ubiquitous availability of unused fiber optic cables already installed provides a very attractive opportunity to repurpose them and perform distributed monitoring with existing infrastructure without the need for new installations [3].
There is a wide range of distributed optical fiber sensors based on light scattering phenomena (Raman, Brillouin and Rayleigh) occurring in an optical fiber [4]. In recent years, distributed acoustic sensors (DAS) based on Rayleigh scattering have become one of the most widely used technologies for monitoring large structures. The high sensitivity, high dynamic range and high sampling rate enable real-time distributed vibration monitoring. Among the most popular applications of this type of sensor are intrusion detection, oil and gas pipeline monitoring, structural health monitoring, seismic monitoring, and railway track monitoring [5,6].
The extended railway infrastructure is highly suitable for DAS monitoring, since DAS can detect the vibrations generated by trains running on the tracks. This monitoring technique allows for the estimation of crucial parameters, such as the train’s position, speed, and length [6,7]. These estimations serve two critical purposes. Firstly, they confirm the absence of uncoupled wagons during train journeys, ensuring safety. Secondly, they enable efficient management of rail traffic, thereby minimizing congestion and optimizing operational efficiency. Moreover, DAS measurements also enable the detection of potential train and track faults through the analysis of their characteristic acoustic signatures. For instance, flat wheels or track wear can be identified based on the distinct acoustic patterns they generate [8,9]. This capability allows for the early detection of such problems, facilitating prompt maintenance and ensuring the overall safety and efficiency of railway operations. In addition to track monitoring, third party intrusions (TPI) can be detected [10], and the overhead contact line could also be equipped with optical fibers to monitor its status [11,12].
Despite the advantages of DAS, its use in large-scale monitoring involves significant challenges. On the one hand, DAS sensitivity is highly dependent on the installation and the surroundings of the monitoring fiber [13,14]. This is because the coupling to the monitoring medium or structure determines the amount of mechanical vibration transferred to the fiber cable, so a non-uniform installation leads to an uneven sensor response. In practice, it is very difficult to achieve constant coupling when monitoring civil infrastructures that extend over tens of kilometers. The situation becomes more challenging when using optical fibers previously installed for telecommunication purposes (i.e., not intended for distributed monitoring), since the fiber coupling is likely to show significant and unpredictable changes along the route. On the other hand, the signal-to-interference-and-noise ratio of each monitored point along the fiber can be strongly affected by local acoustic interference due to environmental conditions. As a result of these difficulties, a DAS sensor is composed of many monitored locations with unknown and variable sensitivity. Due to the large volume of data generated by this type of sensor, the manual search for reliable monitoring locations (i.e., with good coupling and low interference) becomes a prohibitively time-consuming task.
Within the existing literature, numerous algorithms have been proposed to enhance the signal-to-noise ratio of measurements obtained through a distributed acoustic sensor. These algorithms employ techniques such as wavelet decomposition [15], empirical mode decomposition [16], curvelet transform [17], moving average, and dynamic window moving averaging [18,19], to name a few. However, these methods do not adequately address the diverse coupling levels encountered in DAS sensors.
Several studies have been conducted using distributed fiber optic sensing in the railway industry [6,7,8,9,20,21,22] for applications such as railway condition monitoring and train position and/or speed estimation. Some of them report the use of new installations with fibers installed for monitoring purposes [8,20], thus having uniform coupling along the monitored route, while others [21,22] use already installed fibers, emphasizing the problem of unpredictable sensitivity changes.
Regarding the characterization of spatial locations, in [9], the fiber locations are characterized by averaging the acoustic response along the time axis for different trains on different days, obtaining a “signature” of the sensor-rail system. Although this method provides information about the energy at each fiber position, it must be interpreted with care, as not all high-energy locations indicate good monitoring positions. For example, locations affected by constant high-energy acoustic interference (e.g., in locations near roads or construction sites) are expected to show non-repeatable acoustic waveforms over time. In [21], a study similar to the one presented in this paper is carried out, in which waveforms generated by different trains at the same fiber location are compared, but the similarity between them is not evaluated numerically. Furthermore, fiber locations were chosen randomly and were not compared along the route of the cable. In a more general context, a method to select monitoring locations with good coupling is described in [23] in order to apply acoustic beamforming and source location methods to DAS measurements. However, only acoustic sources at fixed positions have been analyzed, resulting in a different characterization for each source position.
To our knowledge, existing research has acknowledged the presence of coupling variations along the optical fiber and has proposed methods to characterize these signal levels. However, thus far, there is a lack of literature on automated techniques specifically designed to identify well-coupled fiber locations for railway monitoring applications utilizing distributed acoustic sensing.
In this work, an automated method to characterize the repeatability of the temporal waveforms caused by trains and acquired at each monitoring position of a DAS is proposed to identify locations with both good mechanical coupling and low acoustic interferences. The method benefits from two important facts: (i) the distance between the railway and the monitoring fiber is fixed, and (ii) all trains running on the track are of the same model. This allows the generation of repeatable acoustic signals that disturb the fiber along its entire length for each train crossing the track, without the need for additional experiments or installations. The proposed method is validated with measurements performed over three days: two of them on consecutive days, while the third one took place one month after the first measurement. During the acquisition stage, 71 trains were automatically detected, and the waveforms acquired at an illustrative 520 m long fiber section were compared. The results indicate that spatial measurement locations with good waveform repeatability can be automatically discriminated by the proposed method.
The structure of this paper is as follows: In the first section, the distributed acoustic sensor employed and the field experimental conditions are presented. The procedures for two essential tasks that allow the correct interpretation of the data, namely station identification and redundant fiber removal, are explained. Then, a method used to automatically generate a dataset is described in detail. Next, the pre-processing and the automatic train detection algorithm are explained. Finally, the method for characterizing the repeatability at each spatial monitoring location is explained and verified using experimental data.
3. Results
In this section, the repeatability of the temporal waveforms acquired at each spatial monitoring location is described. Then, the acoustic waveforms obtained at specific locations selected by the algorithm are illustrated.
3.1. Correlation Analysis
To evaluate the repeatability of the acoustic signals, measurements are performed over three days: two of them on consecutive days, while the third one took place one month later. The study is performed on an illustrative 520 m long section of track between stations (the same region shown in Figure 5), corresponding to 1300 spatial sampling points (separated by 40 cm). Figure 6 shows the schematic of the illustrative region. The train tracks are situated within a tunnel, while the optical fiber is buried underground inside a plastic pipe covered by concrete at an unknown depth.
To reduce the variability of the measured waveforms, only trains moving in the same direction are analyzed, so that the distance between the railway track and the optical fiber is fixed for each sensing point during the study. A total of 28 trains are detected on the first day of measurement (5-h acquisition), 11 trains on the second day (2-h acquisition) and 32 trains on the last day (6-h acquisition), resulting in a total of 71 trains.
To assess the waveform repeatability at a specific measurement location, the cross-correlation function [26] between every pair of detected acoustic waveforms is calculated and its maximum is recorded. Note that the full cross-correlation function is used, rather than a single correlation value at zero lag, because the signals are delayed in time with respect to each other. This process results in a matrix of size 71 × 71 for each spatial location, containing the correlation peak amplitude of each combination of waveforms generated by the detected trains. From the cross-correlation calculation, the time lag between the signals is also obtained.
Figure 7 shows two illustrative examples of correlation matrices at two different spatial locations, where the color scale represents the peak correlation value. Figure 7a,b shows spatial locations with high and low repeatability of the temporal waveforms, respectively. The yellow diagonal lines correspond to the maximum autocorrelation value of the acoustic signals (i.e., equal to 1), while the other correlation values are obtained from combinations between different waveforms. A comparison of the two correlation matrices reveals very different results. On the one hand, Figure 7a shows a correlation matrix where there is a high correlation ranging from 0.8 to 1.0 between many of the waveforms (indicated by the green and yellow color), corresponding to a spatial location with good coupling and low interference, since the acquired waveforms exhibit good repeatability over time. On the other hand, Figure 7b shows a correlation matrix with values not exceeding 0.5 for almost all combinations (indicated by the darker blue color), corresponding to a spatial location with low waveform repeatability. This can be explained by either poor fiber coupling, a high level of acoustic interference, or both effects combined. The non-uniformity in the correlation matrix shown in Figure 7a occurs because the temporal waveform depends on the speed of the train [20]. Thus, only signals acquired when the trains are travelling at similar speeds will be highly correlated. However, if the number of analyzed waveforms is sufficiently large, signals generated by trains travelling at similar speeds are likely to be present, increasing the chance that each waveform has a correlated counterpart within the dataset.
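As an illustration of the correlation analysis described in this subsection, the following Python sketch builds the peak cross-correlation matrix and the associated time lags for the waveforms recorded at one spatial location. It is a minimal sketch and not the implementation used in this work: the function name peak_xcorr_matrix, the unit-energy normalization (chosen so that every autocorrelation peak equals 1, as on the diagonal of the matrices) and the use of NumPy are assumptions.

```python
import numpy as np


def peak_xcorr_matrix(waveforms):
    """Peak normalized cross-correlation and lag for every waveform pair.

    waveforms: sequence of 1-D arrays, one per detected train, all acquired
    at the same spatial location (hypothetical input layout).
    Returns (peaks, lags), both of shape (n, n). lags[i, j] > 0 means that
    waveform i is delayed by that many samples with respect to waveform j.
    """
    # Assumption: scale each waveform to unit energy so that the
    # autocorrelation peak of every waveform equals 1.
    normed = []
    for w in waveforms:
        w = np.asarray(w, dtype=float)
        energy = np.linalg.norm(w)
        normed.append(w / energy if energy > 0 else w)

    n = len(normed)
    peaks = np.eye(n)                   # diagonal: autocorrelation peaks
    lags = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i):              # lower triangle only
            cc = np.correlate(normed[i], normed[j], mode="full")
            k = int(np.argmax(cc))
            lag = k - (len(normed[j]) - 1)   # index -> lag in samples
            peaks[i, j] = peaks[j, i] = cc[k]
            lags[i, j] = lag
            lags[j, i] = -lag
    return peaks, lags
```

With the 71 detected trains, such a function would return a 71 × 71 matrix per spatial location, of which only the 71 · 70 / 2 = 2485 values below the diagonal are distinct.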
3.2. Correlation Indicator
Since not all channels are able to capture the train-induced acoustic waveforms reliably, it is of the utmost importance to identify the locations, distributed unpredictably along the fiber, where the acoustic waveforms exhibit good repeatability over time. In this study, an indicator that summarizes the information from the correlation matrix into a single value representing each spatial monitoring location is proposed. Such an indicator is obtained by counting the number of waveforms whose correlation with at least one other waveform, taken from the lower triangle of the matrix (below the main diagonal), exceeds a given correlation threshold. Note that the number of waveforms is counted, not the total number of correlations, as the purpose of the indicator is to measure how many of the analyzed waveforms find at least one similar waveform within the set. The indicator is designed in this way to reduce the impact of working with a dataset containing signals generated by trains travelling at different speeds. Regardless of whether there are many different speeds, if a waveform is repeated at least once, the spatial location score will increase. The maximum possible count equals the total number of signals analyzed, which is used to normalize the indicator. Thus, the indicator expressed as a percentage can vary between 0 and 100%, representing how many of the analyzed signals found a similar waveform in the dataset.
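The indicator described above can be sketched in a few lines of Python. This is an illustrative reading of the description, not the authors' code: the function name correlated_waveform_percentage is hypothetical, the input is assumed to be a symmetric matrix of peak correlation values for one spatial location, and the default threshold of 0.7 is the example value used in this section.

```python
import numpy as np


def correlated_waveform_percentage(peaks, threshold=0.7):
    """Percentage of waveforms with at least one similar waveform.

    peaks: symmetric (n, n) matrix of peak cross-correlation values for
    one spatial location (n >= 1), with autocorrelations on the diagonal.
    """
    peaks = np.asarray(peaks, dtype=float)
    n = peaks.shape[0]
    off_diag = peaks.copy()
    # Exclude the autocorrelation terms, which always equal 1.
    np.fill_diagonal(off_diag, -np.inf)
    # A waveform counts once, however many partners exceed the threshold.
    has_match = (off_diag > threshold).any(axis=1)
    # Normalize by the number of analyzed waveforms (maximum possible count).
    return 100.0 * has_match.sum() / n
```

Because the matrix is symmetric, scanning each row is equivalent to scanning the pairs in the lower triangle and marking both members of every pair that exceeds the threshold.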
As an example, a correlation threshold value equal to 0.7 is chosen to guarantee good similarity between acoustic waveforms. Figure 8 shows the percentage of correlated waveforms for the illustrative 520 m long fiber section. For each spatial location, the correlation matrix is obtained from the signals of the 71 detected trains and evaluated using the proposed indicator. The two locations used as examples in Figure 7 (high and low repeatability) are highlighted using red and black dots, respectively. The proposed indicator assigns a higher score to the spatial location with the highest number of correlated waveforms (red dot), indicating a more reliable monitoring location than the one showing low correlation (black dot).
As can be seen, the percentage of correlated waveforms at each spatial location along the fiber changes unpredictably and does not follow a specific pattern. Note that even closely spaced measurement locations exhibit different levels of correlated waveforms, reinforcing the idea that a method must be applied to automatically find useful measuring locations. By using the proposed indicator, it is possible to discriminate the repeatability of the waveforms at each spatial measuring location, making it possible to find the most reliable positions to perform effective measurements. From the correlated waveform percentage, trusted monitoring locations can be chosen using a user-defined confidence threshold.
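The final selection step can be illustrated as follows. This is only a sketch under stated assumptions: the function name select_trusted_locations and the default confidence value of 50% are hypothetical (the confidence threshold is user-defined), while the 0.4 m spacing corresponds to the spatial sampling of the analyzed section.

```python
import numpy as np


def select_trusted_locations(scores_percent, confidence=50.0, spacing_m=0.4):
    """Return indices and positions of locations meeting the confidence level.

    scores_percent: correlated-waveform percentage for each spatial location.
    confidence: user-defined threshold in percent (hypothetical default).
    spacing_m: distance between adjacent sampling points, in meters.
    """
    scores = np.asarray(scores_percent, dtype=float)
    indices = np.flatnonzero(scores >= confidence)
    # Positions are measured from the start of the analyzed fiber section.
    return indices, indices * spacing_m
```

Raising the confidence value keeps fewer but more repeatable locations, so the choice depends on how many monitoring points the application needs.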
3.3. Acoustic Waveform Comparison over Time
In this section, waveforms at the same spatial location and acquired over different days are compared. Waveforms are temporally aligned using the time delay calculated from the cross-correlation function. Spatial measurement points with a high correlation count are selected to demonstrate high repeatability over time. Each train acoustic waveform is identified by its spatial location and by the number of days and minutes elapsed with respect to a reference detected train.
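The temporal alignment mentioned above can be sketched as follows. This is a minimal illustration rather than the processing used in this work: the function name align_to_reference is hypothetical, both waveforms are assumed to have the same number of samples, and samples shifted in from outside the record are filled with zeros.

```python
import numpy as np


def align_to_reference(reference, other):
    """Shift `other` so that it overlaps `reference` at the correlation peak.

    Returns (aligned, lag), where lag > 0 means `other` was delayed by
    `lag` samples with respect to `reference`.
    """
    reference = np.asarray(reference, dtype=float)
    other = np.asarray(other, dtype=float)
    if reference.shape != other.shape or reference.ndim != 1:
        raise ValueError("waveforms must be 1-D and of equal length")

    cc = np.correlate(other, reference, mode="full")
    lag = int(np.argmax(cc)) - (len(reference) - 1)

    aligned = np.zeros_like(other)
    if lag > 0:        # other arrives late: move it earlier
        aligned[:len(other) - lag] = other[lag:]
    elif lag < 0:      # other arrives early: move it later
        aligned[-lag:] = other[:len(other) + lag]
    else:
        aligned[:] = other
    return aligned, lag
```

Dividing the lag by the sampling rate gives the delay in seconds, which can then be used to overlay the two waveforms on a common time axis.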
Figure 9 shows the acoustic waveform of two trains measured at a selected spatial location on the same day, 90 min apart. Note that the vertical axis of the orange line is shifted in order to better illustrate the shape of each waveform. A high correlation value of 0.9 is obtained, demonstrating the high similarity between the two signals in both amplitude and waveform. At the top of the figure, a 4-carriage train is also illustrated, in which the bogies are numbered from 1 to 8. Note that the waveform contains 8 peaks, one for each bogie of the train. The correlation value obtained between the two waveforms is shown in the lower right corner.
Figure 10 shows the acoustic waveform of two trains measured at the same spatial location, on the same day, 30 min apart. An increase in train speed with respect to the waveforms shown in Figure 9 can be observed, evidenced by a shortening of the waveform duration. Nevertheless, a high correlation value of 0.95 is obtained between the two waveforms. This indicates that even though the waveform is dependent on the train speed, trains with the same speed generate highly repeatable signals.
Figure 11 shows the acoustic waveform of two trains measured at the same spatial location on the same day, 6 min apart. An even greater increase in speed can be seen in this case, indicated by the shorter duration of the waveform. Notice that at this train speed, the peaks corresponding to bogies 2 and 3 are merged into one, and the same is true for the peaks corresponding to bogies 4 and 5 and to bogies 6 and 7 (as depicted by the numbering at each peak). This illustrates how the sensor loses the ability to count each bogie individually at this train speed. This is due to the low-pass filtering effect that occurs as a consequence of integrating the phase measurement along the gauge length [8]. Nevertheless, a high correlation of 0.85 is obtained between the acoustic waveforms, strengthening the idea that trains travelling through a spatial point at the same speed have repeatable waveforms over time.
Figure 12 shows the acoustic waveform of three trains obtained on three different days: a reference train obtained on day 0, a second one obtained one day and 6 min later, and a third one obtained one month and 12 min after the reference. Note that a new vertical axis (yellow) is added that is also vertically offset to better illustrate the waveform of the signals. High correlation values with respect to the reference are obtained in both cases, corresponding to 0.96 and 0.92, respectively. This demonstrates that the measurements acquired at the spatial monitoring location selected using our method exhibit high correlation even with a month’s difference.
Figure 13 shows the waveform of two trains acquired at the same spatial location, on the same day, 90 min apart. The signals exhibit a very similar duration and temporal location of the peaks, indicating a similar train velocity, but a noticeable difference in the signal amplitude.
The origin of this amplitude increase is unknown, but as there is only one train model running on the railway, it is suspected that it may be caused by a change in the weight of the train due to a considerable increase in the number of passengers. This result indicates that the DAS sensor can recognize changes in the amplitude of the acoustic signal generated by a train, which can be presumably associated with the weight of the train.
4. Discussion
Applying the proposed method systematically would allow the fiber coupling to be monitored over time, which could provide valuable information about the environment surrounding the fiber. For example, for fibers buried in the ground, it would be possible to study how the soil compactness changes after a day of rain or after high temperatures. In the same way, it would also make it possible to assess the coupling of different types of fiber installations, e.g., to compare whether a fiber buried in the ground has better coupling than a fiber attached to a tunnel wall, or to compare different attachment methods. From such studies, a decision can be made to change or upgrade the fiber installation in key areas to allow distributed monitoring at those locations. Similarly, this information can be considered in future fiber communications installations to take advantage of distributed monitoring capabilities as well.
Note that temporal signals are analyzed here instead of spatial signals, as suggested by [20]. This approach takes advantage of the localized nature of fadings, as well as the random variation in their position along the fiber. When a train passes on a railway track, vibrations are generated, resulting in fading-free signals at certain fiber locations, while others experience fading. However, due to frequency changes in the laser and the highly dynamic nature of the environment in which the fiber is installed (i.e., it is affected by environmental factors and acoustic interference), the position of fadings is constantly changing over time. Thus, by creating a sufficiently large dataset, the probability of measuring the vibration generated by a passing train without fading increases at every location along the fiber. This enables the comparison of waveforms from multiple trains at virtually all locations along the fiber. When a fading occurs, the correlation score assigned to that spatial location decreases, as the acquired signals will not correlate well with others. This lack of correlation is the reason why no spatial location achieves a perfect 100% correlation score. Despite this effect, the proposed method is able to characterize the spatial locations of the fiber due to the number of detected waveforms.
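The argument about dataset size can be made slightly more concrete with a simple probabilistic sketch. It rests on assumptions that are not stated in this work: at a given location, each train passage is affected by fading independently with the same probability p (a hypothetical parameter), and a faded waveform never exceeds the correlation threshold. Under these assumptions, for N detected trains and a score S expressed in percent:

```latex
\[
P(\text{at least two fading-free waveforms})
  = 1 - p^{N} - N\,(1-p)\,p^{N-1},
\qquad
\mathrm{E}[S] \le 100\,(1-p)\,\%.
\]
```

The first expression approaches 1 as N grows, which is consistent with the benefit of a large dataset, while the bound on the expected score is consistent with the observation that no location reaches 100% whenever p > 0.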