1. Introduction
The efficiency and cost-effectiveness of railway freight transportation can be significantly enhanced by embracing automation, enabling a more flexible and competitive service. Automation and autonomous operations have already been widely implemented in road, air, and marine transportation [1]. In the railway sector, automation has primarily been developed in public transport, including driverless metro systems, light rail transit (LRT), people movers, and automated guided transit (AGT). The core concept involves shifting operational tasks from human drivers to automated train control systems, such as the European Rail Traffic Management System (ERTMS). According to the International Electrotechnical Commission (IEC) standard 62290-1, Autonomous Train Operation (ATO) is a highly automated system that reduces the need for driver supervision [2].
For fully autonomous train operation, all train operator responsibilities must be transferred to advanced systems capable of sensing the environment, detecting obstacles, and responding appropriately to potential hazards. The Obstacle Detection System (ODS), a critical component of ATO, must function in accordance with freight-specific and general railway automation standards, such as EN 62267. To meet stringent railway regulations, the ODS must reliably detect obstacles in challenging environments and under poor visibility conditions. As a machine vision system incorporating both hardware and software solutions, the ODS provides critical information about obstacles on or near railway tracks and estimates their distance from the train.
A high-performance ODS must operate in real time, function across various lighting conditions, including daytime, low-light, and nighttime scenarios, detect obstacles at long distances (up to 2 km), and remain reliable under adverse weather conditions, such as heavy snow or extreme heat. However, most existing ODS technologies are limited to optimal lighting conditions, reducing their effectiveness in low-visibility environments. In such scenarios, thermal imaging systems offer a viable solution due to their ability to detect objects in the infrared spectrum, independent of ambient lighting. Despite their potential, fully developed methods for detecting both living and non-living objects using thermal imaging remain limited, highlighting the need for further research and development in this field [3].
In the field of obstacle detection for autonomous train operations, several approaches have been explored to enhance safety and reliability. One method for long-distance obstacle detection involves using LiDAR to process point cloud data, which offers reliable safety even in low-light and tunnel environments [4]. To improve detection accuracy and adaptability, Tang and Yang developed a multi-sensor obstacle detection system that integrates point cloud data and visual inputs, overcoming the limitations of single-sensor detection [5]. An approach proposed by Brucker et al. utilizes a shallow neural network that incorporates both local and global information to segment railway images and detect obstacles, outperforming baseline methods on a custom dataset [6].
The problem of autonomous train operation (ATO) might seem less complex than autonomous car driving, since trains run on rail tracks and no steering is involved. The ATO system focuses on safely and reliably slowing down and/or stopping the train if critical obstacles are detected on the rail tracks or in their vicinity. The actual stopping distance of a freight train is influenced by various factors, including train length, weight, speed, and brake system efficiency. Unlike cars, whose braking distance is 10 to 50 m, freight trains have braking distances from 500 m to more than 2 km, depending on national and international regulations and the rail section. Therefore, for a reliable ATO system, it is essential to detect obstacles much further from the train than in autonomous car braking systems.
This paper proposes an image-plane homography approach enhanced with fuzzy logic for object distance estimation using a single-camera system. The method is designed for application in both daytime and nighttime conditions, utilizing thermal camera imagery to ensure reliable performance under varying illumination. The primary objective is obstacle detection in railway environments, where the homography transformation establishes a mapping between two planes: the image plane and the rail track plane.
The image-plane homography method for object distance estimation using a thermal camera, as described in [7], is enhanced through a fuzzy logic approach. Under ideal conditions, where the rail tracks lie within a single plane and the image is undistorted and rectified, homography can provide highly accurate distance estimations. However, in real-world scenarios, image distortions, rail track inclinations of up to 2%, and other environmental factors introduce variations in accuracy. Additionally, inaccuracies in the bounding box coordinates of detected objects further affect the precision of distance estimation.

Thermal camera calibration and rectification present unique challenges compared to conventional visual systems. While traditional methods, such as using a chessboard pattern, are widely employed for camera calibration in optical imaging, they are not directly applicable to thermal cameras. This is because thermal images typically lack the high-contrast features required for precise corner detection, such as those found in a chessboard pattern. Furthermore, thermal cameras are subject to different distortion effects, such as non-uniformity in sensor response and lens distortion, which cannot be easily corrected using conventional calibration techniques. As a result, the rectification of thermal images using standard methods is not feasible in this context. This limitation led to the need for a strategy involving multiple homography matrices, which account for various viewing conditions, rather than relying on a single, pre-calculated calibration matrix [7].
To address these challenges, this paper proposes a hybrid fuzzy logic-based homography distance estimation method. Instead of relying on a single transformation matrix, the approach adapts multiple transformation matrices based on object features extracted from images acquired by the obstacle detection system (ODS). Fuzzy logic is then applied to refine distance estimation, effectively handling nonlinearities and uncertainties present in the image data. In this framework, the bounding box characteristics of detected objects serve as input variables, while the output is the estimated object distance.
The integration of fuzzy logic with homography-based distance estimation presents a novel approach for obstacle detection and distance measurement in autonomous train operations. Traditional methods for distance estimation typically rely on complex calibration techniques, which are often costly and difficult to implement in dynamic, real-world environments. This paper introduces a hybrid fuzzy logic–homography method, which eliminates the need for such calibration by utilizing pre-calculated homography matrices and fuzzy logic to handle the inherent uncertainty and nonlinearity in distance estimation. The main theoretical innovation of this approach lies in its ability to seamlessly integrate fuzzy logic with geometric transformations (homography matrices) to estimate distances accurately from thermal and night-vision camera images, without requiring complex camera calibration or environmental assumptions.
One of the unique contributions of this paper is the use of fuzzy membership functions tailored to specific ranges based on experimental data. By incorporating domain knowledge, such as regulatory braking distances for freight trains, the fuzzy system is designed to provide reliable distance estimates in real-world conditions. This method is particularly well suited for low-light environments, where traditional vision systems may struggle, thus addressing a critical gap in autonomous train safety. Furthermore, by pre-calculating homography matrices and utilizing fuzzy logic for dynamic estimation, this approach offers a significant improvement over existing systems that rely on more rigid calibration procedures.
2. Related Work
Deep learning techniques have been widely utilized for obstacle detection in railway systems, significantly enhancing accuracy and efficiency in challenging environments. Convolutional Neural Network (CNN)-based models have achieved high detection performance, with reported accuracy rates reaching up to 98% in identifying obstacles on railway tracks [8]. While these methods demonstrate strong potential for real-world deployment, further validation under diverse environmental conditions is necessary. In more complex scenarios, feature-aware networks have been introduced to improve the detection of small obstacles, achieving a mean average precision (mAP) of 92.7% while maintaining a lightweight and efficient design, making them suitable for rail transit applications [9]. Additionally, infrared (IR)-based pedestrian detection approaches have proven highly effective in night vision surveillance, achieving a segmentation accuracy of 93% and a detection accuracy of 90%, demonstrating strong capability in identifying and tracking pedestrians in low-light conditions [10].
Accurate object distance estimation is critical in safety-sensitive applications such as mobile robotics, autonomous vehicles, and Intelligent Transportation Systems (ITSs). Distance measurement techniques are generally categorized into active and passive methods [11]. Active methods rely on sensors such as ultrasonic, radar, and laser scanners, which utilize signal reflections for measurement. Conversely, passive methods derive object position information using camera-based scene analysis [12].
Vision-based systems, which employ passive measurement techniques, provide valuable spatial information and are commonly classified into monocular and stereo vision approaches. Stereo vision-based systems utilize multiple cameras and triangulation to determine three-dimensional object coordinates, facilitating precise distance estimation [13]. Monocular vision-based systems, which rely on a single camera, estimate object distances by leveraging geometric relationships within the scene. Various approaches have been explored for monocular distance estimation, including object-relative positioning, geometric transformations, and pixel-height-based mapping. For instance, a method utilizing a single camera with dual off-axis apertures covered by color filters estimates distances by analyzing the relative shift between object projections through both apertures [14]. Another approach maps pixel height to physical distance, achieving an accuracy of 98.76%, though its application is limited to basic geometric shapes such as rectangular, cylindrical, triangular, and circular objects [15]. Additionally, a model describing the relationship between object resolution and distance as a growth series has been proposed, demonstrating minimal error rates between 0.5% and 1% when tested on randomly captured images [16]. Distance estimation using a single camera has also been explored for human face detection, where a formula based on the relationship between pixel area and object distance was derived using the pinhole camera model, camera calibration, and area mapping, achieving measurement accuracy exceeding 95% [17].
For automotive applications, various approaches have been introduced to estimate distances between vehicles. One method computes the ratio between real-world distances, measured in meters, and pixel distances from the host vehicle to a detected object, achieving high accuracy [18]. A hybrid technique combining two methods, one based on the inverse proportionality of vehicle distance and width and another relying on position-based object mapping into 3D space, has also been proposed [19]. In testing on 1000 sequential images with a resolution of 640 × 480 pixels captured by a single camera, this method achieved an overall accuracy of 94.9%, though its effectiveness was limited to daytime conditions. Another approach utilized the focal length in pixels, the camera height above the ground, and the y-coordinate of the bottom line of the bounding box surrounding detected vehicles to estimate distances, incorporating homography to enhance accuracy [20]. Improvements in monocular vision-based vehicle distance estimation using vehicle pose information have also been proposed, demonstrating reliability within a 30-m range [21].
Inverse Perspective Mapping (IPM) has been explored as a monocular vision-based distance estimation technique, transforming forward-facing images into a top-down "bird's-eye" view to establish a linear relationship between image and real-world distances [22]. While this method has shown promising results, its accuracy decreases at longer distances due to perspective distortions. An alternative approach based on known parameters, such as the camera field of view, height, and angle, has been proposed [23]. When compared to IPM, this method demonstrated better performance for short-range distances, while both techniques exhibited similar error rates for medium-range distances (22–27 m). However, for long-distance estimation, the error rate of the IPM method increased significantly, reaching up to 9% [24].
Homography-based distance estimation has also been investigated using single thermal cameras in railway applications. Experimental validation involved placing objects (humans) on railway tracks, where distances were estimated as 49 m, 152 m, 295 m, and 491 m. These estimates were compared against ground-truth measurements of 50 m, 150 m, 300 m, and 500 m, respectively, resulting in an acceptable error margin of approximately 2% [7]. These findings highlight the potential of homography as a reliable monocular vision-based distance estimation technique, particularly when integrated with corrective methods to compensate for environmental variations and image distortions.
3. Object Distance Estimation Using Image-Plane Homography
The so-called homography [25,26] offers the possibility of mapping the image plane to the corresponding world plane. As a result of this mapping, the world 3D coordinates of each point in the imaged world plane can be calculated. Focused on determining a projective transformation between images [27], numerous homography estimation methods have been proposed for single-source and multimodal images, with various applications such as image registration, image fusion and object tracking [26].
As the railway tracks lie in a plane with respect to the locomotive frontal profile, homography mapping is important for obtaining an estimation of the distance d from an on-board mono camera to an object point on the rail track. Figure 1 provides a bird's-eye-view illustration of the rail track plane with an object on the rails at distance dh from the camera, together with a real thermal camera image of the same rail track scene.
This estimation of object distance involves two phases: the calculation of homography matrix H and the mapping of points from one plane to another, i.e., from the rail tracks to the camera image plane.
Point x from the rail track plane is mapped onto point x′ in the image according to the following:

$$\mathbf{x}' = H\,\mathbf{x}, \qquad (1)$$

where x is the homogeneous vector of the point from the real-world plane, x′ is the homogeneous vector of the corresponding point in the image plane and H is the 3 × 3 homography matrix. Equation (1) can be written in the following form:

$$\begin{pmatrix} x'_1 \\ x'_2 \\ x'_3 \end{pmatrix} = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \qquad (2)$$

where (x, y) are the inhomogeneous coordinates of the point from the real-world rail track plane, measured in relation to an arbitrarily chosen coordinate system, and (x'_1, x'_2, x'_3) are the homogeneous coordinates of the corresponding image point [26]. The inhomogeneous coordinates of the image point, i.e., its real coordinates in the image (u, v), are obtained from (2) as follows:

$$u = \frac{x'_1}{x'_3} = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}, \qquad v = \frac{x'_2}{x'_3} = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}. \qquad (3)$$
Homography matrix H has eight degrees of freedom. This means that, in order to calculate the elements of matrix H, one needs to build a system of eight equations of the form (3). Two equations of the form (3) are built from the coordinates of one point pair. Hence, to build a system of eight linear equations, the coordinates of four points (in the image and in the real world) are required, with the restriction that no three of the points are collinear. The last element of the H matrix, h33, is usually taken to be equal to 1, i.e., h33 = 1 [25].
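To make the construction of H concrete, the following sketch builds the 8 × 8 linear system described above from four point correspondences and solves for h11 … h32 with h33 = 1. It is a minimal illustration in Python/NumPy; the numeric point coordinates are hypothetical placeholders, not the values measured in the experiments.

```python
import numpy as np

def homography_from_points(world_pts, image_pts):
    """Estimate the 3x3 homography H mapping world-plane points to image
    points from exactly four correspondences, fixing h33 = 1."""
    A, b = [], []
    for (x, y), (u, v) in zip(world_pts, image_pts):
        # Two linear equations per correspondence, rearranged from Eq. (3):
        # u*(h31*x + h32*y + 1) = h11*x + h12*y + h13
        # v*(h31*x + h32*y + 1) = h21*x + h22*y + h23
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.asarray(A, float), np.asarray(b, float))
    return np.append(h, 1.0).reshape(3, 3)   # append h33 = 1

# Hypothetical correspondences: world coordinates (metres, rail plane, with
# y running along the track) and the matching image coordinates (pixels).
world = [(-0.75, 400), (0.75, 400), (-0.75, 950), (0.75, 950)]
image = [(301, 322), (339, 322), (315, 280), (325, 280)]
H = homography_from_points(world, image)
```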
For the calculation of the homography matrix H, four points were used for which the coordinates in both the real world and the image were known (Figure 2c). The real-world coordinates were obtained from experimental data, where thermal images were acquired on a closed section of the railway. Two people were standing on the rail tracks, one 400 m and the other 950 m from the camera. The rail tracks were detected in the acquired thermal image using edge detection and a region growing algorithm [28] (Figure 2a), while the objects on the rail tracks were detected using the YOLO (You Only Look Once) object detector (Figure 2b), trained on the COCO dataset [29,30]. Once the rail tracks and the bounding boxes of the detected objects (humans) were obtained, four points in the image plane were determined as the corners of the rectangle shown in Figure 2c. The rectangle's corners are the intersections of the detected rail tracks and horizontal lines with the same v coordinates as the lower edges of the detected bounding boxes. The corresponding real-world coordinates were straightforward to obtain, since the height of the camera above the ground (the rail track plane) was known, the camera was placed in the middle of the tracks, and the distance between the rails was known and constant.
The coordinates of the two points in the image corresponding to the object 400 m away were marked as (u1, v1) and (u2, v2), while the other two points, corresponding to the object 950 m away, were marked as (u3, v3) and (u4, v4) (Figure 2c).
Carrying out the procedure explained above using a thermal image of the scene with the objects on the rail tracks at distances of 400 m and 950 m led to a system of equations, which was solved for the elements of the homography matrix H400–950 (4). Using the inverse of the homography matrix (4), the estimation of the distance between the camera and any real-world point on the rail track plane can be calculated as

$$\mathbf{x} = H_{400\text{–}950}^{-1}\,\mathbf{x}'. \qquad (5)$$
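A minimal sketch of this back-projection step: an image point (u, v) on the rails is mapped through the inverse homography and de-homogenised, and the recovered world y coordinate is read off as the along-track distance. This assumes the world frame of the previous sketch, where y runs along the track from the camera.

```python
import numpy as np

def estimate_distance(H, u, v):
    """Back-project image point (u, v) onto the rail-track plane and return
    its along-track distance from the camera (the world y coordinate)."""
    x = np.linalg.inv(H) @ np.array([u, v, 1.0])
    x /= x[2]              # de-homogenise
    return float(x[1])     # the y axis runs along the rails

d_h = estimate_distance(H, 320, 300)   # H from the previous sketch;
                                       # the pixel coordinates are hypothetical
```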
The next step was to calculate five new homography matrices using the real-world object coordinates and the corresponding points in the image. The objects (persons) were placed 400 m and 600 m from the camera on the rail tracks, and the H400–600 matrix was calculated using the same method as previously described. The process was repeated for the distance pairs of 500 m and 700 m, 600 m and 800 m, 700 m and 900 m, and 800 m and 950 m, and H500–700, H600–800, H700–900, and H800–950 were calculated, respectively. To evaluate homography-based distance estimation, the real measured distances from the camera to the objects' points on the rail track plane were compared to the estimated ones (Table 1).
The choice of homography matrices is influenced by the trade-off between estimation accuracy and the range of applicability. As demonstrated in Table 1, a homography matrix that covers the specific range of the object's real distance provides more accurate estimates than a matrix derived from the entire 400–950 m range. This highlights the importance of tailoring the homography matrix to the specific distance range for improved performance.
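Before the fuzzy refinement is introduced, this range-tailoring idea can be read as a simple lookup: a coarse estimate from the wide-range matrix selects the narrower-band matrix whose span contains it. The sketch below follows that reading; the band edges come from the matrix names above, while the selection rule itself and the placeholder matrices are our assumptions.

```python
import numpy as np

# Band-specific homographies, each fitted as described above from object
# pairs at the band's end distances; identity matrices are placeholders.
BANDS = {
    (400, 600): np.eye(3),
    (500, 700): np.eye(3),
    (600, 800): np.eye(3),
    (700, 900): np.eye(3),
    (800, 950): np.eye(3),
}

def refine_distance(u, v, coarse_d):
    """Re-estimate the distance with the band matrix whose span contains
    the coarse estimate; bands overlap, so the first matching band wins."""
    for (lo, hi), H_band in BANDS.items():
        if lo <= coarse_d <= hi:
            return estimate_distance(H_band, u, v)  # from the earlier sketch
    return coarse_d  # outside all bands: keep the coarse estimate
```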
The error in the homography-based distance estimation was a consequence of uncertainty in the homography matrix H calculation, uncertainty in object detection (i.e., in the detection of the exact intersection point of the object and the rail track) and the usage of an uncalibrated and unrectified image.
As can be seen in Table 1 and Figure 3, the error was smaller when the real distances were within the range covered by the homography matrix. Each homography matrix is tailored to a specific distance range, improving the accuracy of distance estimation by accounting for different perspectives and depth variations. This approach provides greater robustness than using a single matrix for the entire distance range and reduces the impact of image distortions and other calibration-related issues.
To handle the previously mentioned uncertainties, a novel FuzzyH method is proposed and explained in the following section as an improvement of the presented homography-based distance estimation methods.
4. Hybrid Fuzzy Logic–Homography Distance Estimation—FuzzyH Method
After analyzing the error ranges achieved by the above-described homography-based distance estimation method, a new hybrid fuzzy logic–homography method (named FuzzyH) was proposed to increase distance estimation accuracy by obtaining a compensation factor.
Fuzzy set theory, introduced by Lotfi A. Zadeh in 1965, has become a powerful tool for handling uncertainty and nonlinearity, particularly in complex systems where precise modeling is challenging [31]. Recent applications of fuzzy systems include autonomous driving, robotic control, and image processing. For example, fuzzy logic has been used for path planning, obstacle avoidance, and multi-sensor data fusion in autonomous vehicles, as well as for image enhancement and target recognition in image processing [32,33].
According to Mendel [34], generally speaking, a fuzzy logic system (FLS) is "a nonlinear mapping of an input data (feature) vector into a scalar output (the vector output case decomposes into a collection of independent multi-input/single-output systems)". A fuzzy logic system maps (most often crisp) inputs into crisp outputs, with a fuzzy inference process as an intermediate step. When applied to a real-world scenario, a fuzzy logic technique requires the following three steps: fuzzification, fuzzy inference, and defuzzification [30]. The fuzzification of input data is the determination of the degree to which a crisp input belongs to each of the appropriate fuzzy sets. After the input data are fuzzified and their membership values obtained, the next step applies them to the antecedents of the fuzzy rules. The last step, defuzzification, transforms the fuzzy set obtained by the inference engine into a crisp output value.
The flow chart of the proposed FuzzyH system, which integrates fuzzy decision making and multiple homography matrices for distance estimation, is shown in Figure 4. The inputs to the fuzzy system were the distance estimated from the homography matrix H400–950 and the v coordinate in the image, while the output was coefficient Kd, used for the generation of vector K and, further, for the final distance estimation used for obstacle detection and decision making in the autonomous train system.
First, the distance (dh) between the camera and the object was estimated using the described homography-based method. Then, Mamdani's model [35,36,37], with two inputs and one output, was used for the fuzzy logic system. The inputs to the fuzzy logic system (FLS) were defined as the distance dh between the camera and the object on the rail track and the v coordinate of the detected object point in the image.
The distance ranges for fuzzy logic were determined based on the empirical data obtained from our experiments conducted on a railway section, where visible rail tracks extend up to 980 m. The furthest distance used for the calculation of homography matrices was 950 m, as this represented the maximum observable distance in the experimental setup. Distances beyond 1000 m were not considered due to the limitations of the camera resolution, which made object detection unreliable at such distances.
Regarding the selection of the central, medium-distance range, 700 m was chosen, as it corresponds to the typical braking distance for freight trains according to both railway regulations and expert knowledge in the field. This distance range is also critical for operational safety and control.
Additionally, the fuzzy logic system was designed with five distinct ranges derived from the experimental data. The range is divided into the sub-ranges Short, Mid-short, Mid, Mid-long and Long, and five fuzzy membership functions are defined as shown in Figure 5.
The first and last membership functions were modeled using trapezoidal functions, while the three intermediate ranges were represented using triangular membership functions. This choice of fuzzy membership functions was aimed at ensuring accurate and reliable distance estimation while also accounting for the real-world conditions observed during experimentation.
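These membership shapes can be written down directly; the sketch below defines triangular and trapezoidal membership helpers and instantiates the five distance sets. The breakpoints are illustrative guesses spaced over the 400–950 m range, mirroring the Short … Long layout of Figure 5, not the exact parameters used in the paper.

```python
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership with feet a, c and peak b."""
    return np.interp(x, [a, b, c], [0.0, 1.0, 0.0])

def trapmf(x, a, b, c, d):
    """Trapezoidal membership with feet a, d and plateau [b, c]."""
    return np.interp(x, [a, b, c, d], [0.0, 1.0, 1.0, 0.0])

# Five fuzzy sets over the input distance d_h (metres); breakpoints are
# illustrative only.
d = np.linspace(400, 950, 551)
short     = trapmf(d, 350, 400, 450, 550)
mid_short = trimf(d, 450, 575, 700)
mid       = trimf(d, 575, 675, 775)
mid_long  = trimf(d, 700, 800, 900)
long_     = trapmf(d, 800, 900, 950, 1000)
```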
Likewise, the v coordinates (in pixels) in the image for the same distances were determined as the intersection of the detected rail tracks and the detected object, and five fuzzy membership functions were defined as shown in Figure 6. The fuzzy membership functions of the input image point v coordinate were chosen to correspond to the fuzzy membership functions of the input distance dh and were based on the expert knowledge gained during real-world experimentation.
The output of the fuzzy logic system was defined as the distance coefficient (Kd) with a 0–1 range (Figure 7). The role of this coefficient was to increase the accuracy of distance estimation.
It can be seen that the fuzzy membership functions of the FLS input dh are symmetrical. On the other hand, the asymmetric membership functions of the FLS input v coordinate and the output distance coefficient Kd contain parts that can be regarded as symmetrical. This symmetry helps reduce the number of fuzzy rules and supports their uniformization.
Fuzzy rules for this FLS were defined based on expert knowledge and experience. Some of the rules are given below:
If dh is Short and the v coordinate is High, then Kd is High_K.
If dh is Long and the v coordinate is Low, then Kd is High_K.
If dh is Mid-short and the v coordinate is Mid-high, then Kd is Mid-low_K.
If dh is Mid and the v coordinate is Mid, then Kd is Mid_K.
If dh is Mid-long and the v coordinate is Mid-low, then Kd is Mid-high_K.
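Under the Mamdani scheme, each rule's firing strength is the minimum of its antecedent membership degrees, the output sets are clipped at those strengths, the clipped sets are aggregated by maximum, and the crisp Kd is the centroid of the aggregate. The sketch below applies that machinery to two of the rules above, reusing the trimf helper from the earlier sketch; the output-set breakpoints and the example degrees are illustrative assumptions, not the parameters of Figure 7.

```python
import numpy as np

K = np.linspace(0.0, 1.0, 201)          # universe of the output K_d

# Illustrative output sets for K_d (trimf from the previous sketch).
mid_K  = trimf(K, 0.30, 0.50, 0.70)
high_K = trimf(K, 0.65, 0.85, 1.00)

def mamdani_kd(mu_short_d, mu_high_v, mu_mid_d, mu_mid_v):
    """Mamdani inference for two of the rules listed above, followed by
    centroid defuzzification; inputs are antecedent membership degrees."""
    w1 = min(mu_short_d, mu_high_v)     # IF d_h Short AND v High THEN High_K
    w2 = min(mu_mid_d, mu_mid_v)        # IF d_h Mid   AND v Mid  THEN Mid_K
    agg = np.maximum(np.minimum(w1, high_K),   # clip each consequent at its
                     np.minimum(w2, mid_K))    # strength, aggregate with max
    return float((K * agg).sum() / agg.sum()) if agg.any() else 0.5

k_d = mamdani_kd(0.7, 0.6, 0.2, 0.3)    # illustrative membership degrees
```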
Output variable Kd was used for the calculation of the distance, using the homography matrices H400–600, H500–700, H600–800, H700–900 and H800–950 already calculated as described in Section 3 of this paper. For the estimation of a new (corrected) object distance, compensation factor K was multiplied with the corresponding inverse matrices H400–600⁻¹, H500–700⁻¹, H600–800⁻¹, H700–900⁻¹, and H800–950⁻¹ to obtain new matrices to be used for object distance estimation. The compensation factor is vector K with five elements (K1, K2, K3, K4, and K5), which satisfy the following condition:
In addition, the calculation of each K vector element was related to the range of the originally estimated distance dh, as well as to the obtained distance coefficient Kd. The vector elements were calculated as follows:

where a = 400 m, b = 500 m, c = 600 m, d = 700 m, e = 800 m, f = 900 m and g = 950 m.
For a random point on the rail track plane, whose coordinates in the image are (u, v), the distance from the camera can now be calculated as follows:
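The closed-form expression is not reproduced above; as a substitute, the sketch below gives one plausible reading of the complete FuzzyH computation: the wide-range matrix yields a coarse dh, the FLS turns (dh, v) into Kd, and the band containing dh supplies the compensated inverse homography. The k_element function is a hypothetical stand-in for the paper's range-dependent K formula (the expression in a … g), and the scalar compensation is read here as scaling the recovered along-track coordinate.

```python
import numpy as np

def k_element(band_index, d_h, k_d):
    """Hypothetical stand-in for the paper's range-dependent formula that
    turns the fuzzy coefficient K_d into the band's compensation factor."""
    return 1.0 + 0.1 * (k_d - 0.5)        # illustrative only

def fuzzyh_distance(u, v, H_wide, bands, antecedents):
    """Sketch of the FuzzyH estimate for an image point (u, v) on the rails.

    Reuses estimate_distance, the BANDS-style dict and mamdani_kd from the
    earlier sketches; antecedents(d_h, v) returns the membership degrees
    fed to mamdani_kd.
    """
    d_h = estimate_distance(H_wide, u, v)       # coarse H400-950 estimate
    k_d = mamdani_kd(*antecedents(d_h, v))      # fuzzy distance coefficient
    for i, ((lo, hi), H_band) in enumerate(bands.items()):
        if lo <= d_h <= hi:                     # band containing d_h
            # "K multiplied with the inverse matrix" is read as scaling the
            # along-track coordinate recovered through that inverse.
            return k_element(i, d_h, k_d) * estimate_distance(H_band, u, v)
    return d_h                                  # outside all bands
```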
The accuracy of the FuzzyH method was evaluated by comparing its corrected object distance estimates with actual measured distances from the camera to the objects in various real-world field scenarios, as explained in the following section.
5. Serbian Railways Experiments
To evaluate the proposed FuzzyH method for long-range object (obstacle) detection, field tests were performed at a Serbian railway test site (at a location in the village of Babin potok, near the city of Prokuplje) approved for experimental use by the Serbian Railway authorities (Figure 8 and Figure 9).
For the development and evaluation of the proposed method, a static thermal camera positioned along the rail tracks was used while the objects (such as moving obstacles) were in motion. This setup was designed to simulate real-world conditions for distance estimation, with a focus on tracking moving objects at varying distances. The experimental setup for the field tests is shown in Figure 8, where the thermal camera is mounted on a static test stand with a view of straight rail tracks approximately 1000 m in length.
For real-time object detection, the system relied on a thermal camera operating at 60 frames per second (fps) with a maximum resolution of 640 × 480 pixels. This setup was specifically chosen to mitigate the influence of vibrations and ensure stable performance. In this configuration, the system could detect objects efficiently using a pre-trained deep learning model, and GPU acceleration was employed to further optimize the processing speed. The system operated in low-light (night) conditions (the illuminance, measured with a luxmeter, was 0 lux).
The test stand with the thermal camera was placed on a level crossing marked with the letter A in Figure 9. For the initial homography-based object distance estimations, objects (persons) were at distances of 400 m and 950 m (marked with the letters B and C in Figure 9, respectively). During the experiments, three people, members of the research team, including the authors of this paper, mimicked potential static and dynamic obstacles on the rail tracks at various distances from the camera test stand. Before the test, the test site was marked every 5 m along the rail tracks.
6. Results and Discussion
The estimation of distances between the thermal camera and the objects (persons) detected in the thermal camera images was performed using the two described methods: the original homography-based method with homography matrix H400–950 and the FuzzyH method. A comparison of FuzzyH and homography distance estimates is presented in Table 2 for randomly selected thermal camera images in which the objects (persons) were static or moving along the rail tracks across the whole test site. The results show that the maximum absolute error of the homography-based estimation was 90 m (13.53%) for the real distance of 665 m, while the minimum absolute error, for the real distance of 465 m, was 24 m (5.16%). The results also show that the maximum absolute error of the FuzzyH method was smaller than that of the homography-based method: the FuzzyH maximum absolute error of 32 m (7.11%) occurred for an obstacle 450 m from the camera, while its minimum absolute error of 7 m (0.82%) was obtained for the real distance of 850 m.
The presented experimental results show that, compared to the homography-based method, the FuzzyH method yields significantly better results (Figure 10). In addition, the FuzzyH results are highly accurate, bearing in mind the low visibility and the distance range from 400 m to 950 m, which is classified as long range, with a maximum error of slightly more than 7%. Long-range distance estimation is highly dependent on image quality, image resolution, and object detection reliability.
In order to evaluate the real-time performance of the proposed system, a detailed performance analysis experiment was conducted. The system was designed to operate in real time, with pre-calculated homography matrices minimizing processing time during operation. To assess the consistency of the system's execution times, the experiment was repeated for 1000 iterations, measuring the execution time of each run. An Interquartile Range (IQR) filtering method was applied to remove extreme outliers, followed by statistical analysis including the mean, median, standard deviation, variance, and a 95% confidence interval (CI). The analysis was conducted on a Lenovo Yoga Pro 7, equipped with an Intel® Core™ Ultra 7 155H processor (1.40 GHz), which supports efficient multi-threaded processing through Intel's hybrid core design and Thread Director technology. The results revealed a mean execution time of 3.6545 × 10⁻⁶ seconds, with minimal variability (standard deviation of 8.6862 × 10⁻⁷) and a narrow 95% confidence interval, confirming the system's ability to perform consistently in real-time environments.
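This timing protocol is straightforward to reproduce. The sketch below follows the description above (1000 iterations, IQR outlier filtering, then summary statistics); the normal-approximation form of the 95% CI is our assumption, as the paper does not state which CI estimator was used.

```python
import time
import numpy as np

def timing_stats(fn, *args, n=1000):
    """Time n runs of fn, drop IQR outliers, and report summary statistics."""
    t = np.empty(n)
    for i in range(n):
        t0 = time.perf_counter()
        fn(*args)
        t[i] = time.perf_counter() - t0
    q1, q3 = np.percentile(t, [25, 75])
    iqr = q3 - q1
    t = t[(t >= q1 - 1.5 * iqr) & (t <= q3 + 1.5 * iqr)]  # IQR filtering
    mean, std = t.mean(), t.std(ddof=1)
    half = 1.96 * std / np.sqrt(len(t))   # 95% CI half-width (normal approx.)
    return {"mean": mean, "median": float(np.median(t)), "std": std,
            "variance": t.var(ddof=1), "ci95": (mean - half, mean + half)}
```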
To evaluate the robustness of the proposed FuzzyH method against various uncertain disturbances [38] in different environmental conditions, evaluation experiments were performed at another location, a level crossing in the village of Žitorađa, near the city of Niš, the Republic of Serbia. The experimental setup was similar to the previously described field tests, consisting of a thermal camera on a test stand (Figure 8), and the experiments were performed in nighttime conditions, with an illuminance of 0 lux. However, the camera was not centered between the rail tracks and gave an angled view of them. The evaluation was performed for distances of 100 m and 200 m in the first scenario and 50 m, 100 m, 300 m, and 500 m in the second, meaning that the proposed method was also tested for distances outside the 400 m to 950 m range used for the development of the FuzzyH distance estimator. In the first evaluation scenario, two people were near the rail tracks; the distance between the first person and the experimental setup was estimated as 202 m, yielding an estimation error of 2 m (1%), since the real measured distance was 200 m. However, the second person was 100 m from the camera and the estimation error was 41 m (41%) (Figure 11).
In the second evaluation scenario, the camera was placed outside the rail tracks and four people were standing in the vicinity of the rail tracks, 50 m, 100 m, 300 m, and 500 m from the camera. The distances estimated using the FuzzyH method were 148 m, 178 m, 332 m, and 428 m (Figure 12). Hence, for the distance of 300 m, the estimation error was 32 m (10.67%), and for the distance of 500 m, the absolute value of the error was 72 m (14.4%).
Although the estimation error was significant and the results were inaccurate for short distances between the camera and the detected object, the overall results remain relevant, since the FuzzyH method was developed for distances of more than 400 m, and only long-range distances are important for ATO. The evaluation conditions were also complex: the scenarios changed, the experimental setup was repositioned, and neither the IR camera nor the parameters of the FuzzyH estimator were additionally adjusted. The evaluation results show that the closer the object is to the long-range band, the more accurate the FuzzyH estimation becomes.
In the experimental setup, the thermal camera was fixed in a position relative to the rail track plane, ensuring a consistent camera perspective during the distance estimation process. Additionally, the influence of the rail track inclination, which, in our setup, did not exceed 2%, was considered to be minimal. While inclination can introduce some errors, this effect is reduced by the fact that both the camera and the train are situated on two parallel inclined planes, preserving the relative geometry between them. As such, any variation in inclination should have a negligible impact on the estimation process. However, a more detailed study on the impact of camera positioning and rail track inclination will be conducted in future work, as discussed in the conclusion.
In real-world applications, the system would be deployed on a moving freight train, operating at speeds ranging from 70 km/h to 100 km/h, depending on the size and weight of the train. While train speed is an important consideration, the proposed system is not significantly affected by this factor. The primary challenge for real-time performance is the effect of vibrations, rather than speed. Operating at 60 fps, the system is capable of maintaining accurate distance estimation even at higher speeds typical of freight trains. The influence of both speed and vibration on the system’s accuracy is minimal, making the method suitable for deployment on moving trains.
7. Conclusions
In this paper, a novel approach for long-range obstacle distance estimation in autonomous train operation (ATO) was proposed, combining image-plane homography with fuzzy logic (FuzzyH). The method was designed to enhance the accuracy and robustness of distance estimation using thermal imaging, particularly under low-visibility conditions.
The experimental results demonstrated that the FuzzyH method significantly outperforms the conventional homography-based approach, achieving highly accurate distance estimations in the range of 400 m to 950 m, with a maximum error of just over 7%. This study also highlighted that long-range distance estimation is highly dependent on image quality, resolution, and object detection reliability. Additional evaluations conducted in different environmental conditions confirmed the robustness of the proposed method, even when the camera was not centrally positioned along the tracks and under complete darkness.
These results demonstrate that the proposed method is computationally efficient and capable of meeting the real-time requirements of ATO systems. The system’s consistent performance ensures that it is scalable and suitable for deployment in operational environments.
The method showed reduced accuracy for short distances, but this limitation is not critical for ATO applications, where long-range detection is essential due to the extensive braking distances required for freight trains. The evaluation conditions were intentionally challenging, involving variations in experimental setup and environmental factors without additional system calibration, further validating the adaptability and reliability of the FuzzyH approach. Overall, the proposed FuzzyH method presents a significant advancement in distance estimation for ATO, improving safety and reliability in railway environments.
While the proposed hybrid fuzzy logic–homography method demonstrates promising results for real-time distance estimation in autonomous train operations, there remain several challenges to address in future research. One key challenge is ensuring robustness under varying environmental conditions, such as changes in camera position, rail inclinations, and weather conditions, all of which can affect the accuracy of distance estimations. Additionally, the current method has been tested in a controlled setting, and further validation is required to ensure its scalability and effectiveness in diverse operational environments. Future work will focus on expanding the approach to include a broader range of operational scenarios and comparing its performance to state-of-the-art techniques.
Overall, the contributions of this paper lie in the development of an efficient, low-cost, and scalable distance estimation method for autonomous trains, with potential applications in safety-critical transportation systems. Future research will continue to refine this approach and explore its integration with other technologies, such as real-time object tracking and multi-sensor fusion, to further enhance its robustness and accuracy.