A MSA-YOLO Obstacle Detection Algorithm for Rail Transit in Foggy Weather

Chen, Jian; Li, Donghui; Qu, Weiqiang; Wang, Zhiwei

doi:10.3390/app14167322


by Jian Chen^1,2,3, Donghui Li^1,*, Weiqiang Qu^2,4,* and Zhiwei Wang^2,4

¹ School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China

² Zhejiang Stream Rail Intelligent Control Technology Co., Ltd., Jiaxing 314001, China

³ School of Computer and Information Engineering, Institute for Artificial Intelligence, Shanghai Polytechnic University, Shanghai 201209, China

⁴ Shanghai Stream Rail Transportation Equipment Co., Ltd., Shanghai 200126, China

* Authors to whom correspondence should be addressed.

Appl. Sci. 2024, 14(16), 7322; https://doi.org/10.3390/app14167322

Submission received: 3 July 2024 / Revised: 10 August 2024 / Accepted: 16 August 2024 / Published: 20 August 2024

(This article belongs to the Special Issue Application of New Technology and New Ideas in Intelligent Transportation System)


Abstract:

Obstacles on rail transit significantly compromise operational safety, particularly under dense fog conditions. To address missed and false detections in traditional rail transit detection methods, this paper proposes a multi-scale adaptive YOLO (MSA-YOLO) algorithm. The algorithm incorporates six filters: defog, white balance, gamma, contrast, tone, and sharpen, to remove fog and enhance image quality. However, determining the hyperparameters of these filters is challenging. We employ a multi-scale adaptive module to optimize filter hyperparameters, enhancing fog removal and image quality. Subsequently, YOLO is utilized to detect obstacles on rail transit tracks. The experimental results are encouraging, demonstrating the effectiveness of our proposed method in foggy scenarios.

Keywords:

multi-scale adaptive YOLO; obstacle detection; rail transit; foggy weather; deep learning

1. Introduction

Rail transit plays a critical role in fostering urban economic development and fulfilling citizens’ transportation needs. However, the intrusion of obstacles poses a significant threat to the operational safety of rail systems [1,2]. Traditional methods for detecting obstacles often rely on manual inspection, which is both time-consuming and labor-intensive. The current mainstream solution employs video surveillance [3,4]. Nevertheless, the significant degradation in video image quality caused by fog presents a formidable challenge to traditional techniques used for detecting obstacles along rail transit perimeters.

Obstacle detection in rail transit using video images predominantly encompasses three methodologies: feature-based methods, machine learning-based methods, and deep learning-based methods. Feature-based object detection techniques extract features from images and employ classifiers or regressors for object identification. Commonly utilized features include Haar features [5], HOG features [6], and SIFT features [7]. These methods typically require manual feature and classifier design, and their performance is significantly constrained by the selection and design of these features. Machine learning-based object detection techniques acquire feature representations and target models by training classifiers or regressors. Conventional machine learning methods include support vector machines, random forests, and AdaBoost [8,9]. These approaches still require manual feature extraction before training and inference. In recent years, remarkable advances have been achieved in object detection through deep learning. These approaches use deep convolutional neural networks (CNNs) to learn intricate feature representations from images and precisely localize objects within them. Prominent among these techniques are Faster R-CNN, You Only Look Once (YOLO), and the Single Shot MultiBox Detector (SSD) [10]. These methods enable end-to-end object detection and remove the need for manual feature engineering, yielding substantial improvements in both precision and computational efficiency, along with stronger discriminative capability and better generalization.

In object detection, YOLO is a representative single-stage algorithm: it uses convolutional neural networks to extract target features and treats detection as a regression problem rather than a classification problem. The network therefore outputs target categories and their bounding-box positions directly, markedly improving detection performance. Deqiang He proposed the FE-YOLO algorithm, an improved CNN-based one-stage object detection method, to enhance obstacle detection accuracy in rail transit environments, focusing on small and irregular obstacles [11]. Tao Ye proposed the SEF-Net algorithm, which enhances railway obstacle detection by improving accuracy and speed, particularly for small objects in complex environments. It integrates stable bottom feature extraction, lightweight feature extraction, and adaptive feature fusion modules [12]. The algorithms described above effectively address obstacle detection in rail transit under clear or lightly foggy weather conditions. However, detecting obstacles in severe fog remains a significant unresolved challenge in rail transit safety, and research on detection in dense fog is relatively limited compared with clear-weather detection. Utilizing defog models for image enhancement has emerged as a powerful technique and has seen significant advancements in recent years. Kaiming He proposed a method to remove haze from a single image using the dark channel prior, which relies on the observation that haze-free outdoor images have pixels with very low intensities in at least one color channel [13]. Vishwanath A. Sindagi introduced an unsupervised domain adaptive object detection framework designed for adverse weather conditions like haze and rain, using weather-specific priors to improve detection performance by minimizing weather-related distortions in image features [14]. However, image degradation caused by dense fog still significantly lowers detection accuracy, resulting in issues such as missed detections and false alarms.

A target detection model trained on high-quality, clear images often fails to achieve satisfactory performance under adverse weather conditions, such as dense fog. One effective approach is to decompose images captured in these adverse conditions into clean images and their corresponding weather information. Based on weather information, image quality can be appropriately enhanced, potentially recovering more latent information about originally blurred and misidentified objects. This image enhancement technique not only improves the clarity and contrast of images but also enhances the performance of target detection models under adverse weather conditions. As a result, it allows for accurate identification and detection of target objects in environments such as dense fog, rain, and snow, thereby increasing the reliability and stability of the system. However, enhancing image quality under varying levels of fog density remains a challenging problem that requires further investigation.

To address the aforementioned issues, this paper proposes an MSA-YOLO algorithm for rail transit under foggy weather conditions. The algorithm applies a multi-scale adaptive technique to the foggy image to suppress interference such as dense fog and recover latent useful information from the image. Subsequently, the YOLOv3 network is employed to detect obstacles in rail transit. Utilizing its powerful convolutional neural network architecture, YOLOv3 performs multi-scale object detection, ensuring high accuracy while maintaining real-time performance. The system effectively identifies and locates obstacles on the tracks even under complex foggy conditions, significantly enhancing detection accuracy and robustness.

2. Related Work

2.1. Feature-Based Obstacle Detection

Using feature extraction to identify obstacles is a traditional technical approach that demands relatively low computational power from hardware. This method involves analyzing specific attributes of the image, such as edges, textures, and shapes, to distinguish obstacles from the background. Due to its efficiency, feature extraction has been widely adopted in various real-time applications where computational resources are limited. L. A. Fonseca Rodriguez discussed an autonomous train driving system using artificial vision algorithms to detect obstacles on railway tracks [15]. By employing the Hough transform, the system identifies rails and areas of interest and conducts real-time video analysis to detect potential obstacles, improving safety and performance. Huashan Ye introduced a new method based on the Hough Transform for fast detection of lines and circles in images [16]. Mucahit Karaduman presented an image processing-based obstacle detection system using laser measurement for railway safety [17]. Hiroki Mukojima presented a method for obstacle detection on railway tracks using a moving camera with background subtraction. By comparing input and reference camera images, the method effectively detects various obstacles [18]. Tingting Yao introduced an image-based obstacle detection system for automatic train supervision. It improved grayscale and binarization algorithms and used Canny edge detection with detection windows to filter out irrelevant information [19]. Zhongli Wang proposed a robust rail track extraction method using inverse projective mapping and the Hough transform [20]. In some straightforward scenarios, feature-based obstacle detection methods can perform effectively. These methods leverage predefined image features, allowing for reliable detection when environmental conditions are relatively simple and controlled. 
However, traditional feature extraction techniques may struggle with complex or dynamic environments where the appearance of obstacles can vary significantly. Thus, while effective in certain scenarios, these methods may require augmentation with more advanced techniques to ensure robust performance in diverse conditions.

2.2. Machine Learning-Based Obstacle Detection

Machine learning-based obstacle detection learns image features from large training datasets to automatically detect potential obstacles in the environment. Timothy W. Ubbens described a monocular vision-based obstacle detection method using a support vector machine. By training the support vector machine on images of unobstructed floors, the system classifies anything not recognized as a floor as an obstacle [21]. Raja Sattiraju presented a machine learning-based obstacle detection method for automatic train pairing using ultra-wideband technology [22]. C. S. Arvind discussed an obstacle detection and avoidance system for autonomous vehicles using reinforcement learning. By combining Q-learning with a multilayer perceptron neural network (MLP-NN) and ultrasonic sensors, the system achieves improved accuracy in detecting and avoiding static obstacles [23]. Compared to feature-based methods, machine learning-based approaches provide greater adaptability and detection accuracy, allowing for more effective performance in complex and dynamic environments. However, traditional machine learning-based obstacle detection still requires manual feature extraction, which is both time-consuming and labor-intensive.

2.3. Deep Learning-Based Obstacle Detection

Deep learning has been successfully applied to obstacle detection. Rajiv Kapoor proposed an intelligent railway surveillance framework using deep learning for object and track recognition [24]. Haixia Pan presented a multitask learning framework for railway obstacle intrusion detection using convolutional neural networks [25]. The model integrated obstacle detection and track line classification, employing a shared encoder and task-specific decoders. Zhangyi Wang proposed an efficient rail area detection method using convolutional neural networks (CNN) [26]. The method involved two main stages: extracting the rail area with a CNN and optimizing the contours with polygon fitting. Juan Li proposed FB-Net, a robust CNN for real-world railway traffic detection, balancing speed and accuracy [27]. Zhipeng Zhang proposed an AI-based framework for automatically detecting grade-crossing trespassing near misses using computer vision analysis of surveillance video data [28].

In summary, current deep learning-based railway traffic obstacle detection algorithms can detect obstacles effectively in clear weather conditions but experience significantly reduced accuracy in adverse weather conditions such as fog. Particularly in dense fog environments, existing algorithms fail to meet the requirements for railway traffic obstacle detection. To address these issues, the MSA-YOLO algorithm is proposed. This algorithm enhances the network architecture to improve feature extraction and modeling capabilities. The multi-scale network significantly enhances robustness to disturbances in adverse environments and improves obstacle perception capabilities.

3. Methodology

Figure 1 illustrates the MSA-YOLO algorithm. This algorithm comprises three main components: a multi-scale adaptive module, an image processing module, and a YOLO module. The image processing module comprises several advanced filters: a defogging filter, a white balance filter, a gamma filter, a contrast filter, a tone filter, and a sharpen filter. The multi-scale adaptive module is employed to optimize the parameters within the filtering module. These filters collaborate to enhance image quality by mitigating the effects of fog, adjusting color balance, optimizing brightness and contrast, refining tonal range, and sharpening details. This comprehensive approach significantly enhances the algorithm’s overall performance, especially in adverse weather conditions. The YOLO algorithm integrates both local and global information, substantially enhancing the network’s capability to detect objects of varying scales, even in challenging foggy conditions. This integration significantly improves detection accuracy and robustness, making the algorithm highly effective for real-world applications.

3.1. A Multi-Scale Adaptive Module

The multi-scale adaptive network module demonstrates exceptional performance in the field of image processing, particularly in tasks such as defogging and various filtering operations. By employing convolution kernels of different scales, this module effectively processes multi-level information in images, thereby enhancing image quality and detail representation. Specifically, the module includes three different scales of convolution kernels: 7 × 7, 5 × 5, and 3 × 3.

The input to the network is a foggy image, and the architecture consists of three parallel branches for multi-scale feature extraction and fusion. The first branch employs five layers of 7 × 7 convolutions, designed to cover a larger receptive field and capture global features and long-range dependencies of the image. This branch effectively handles images with complex backgrounds and extracts large-scale global information through multiple layers of 7 × 7 convolutions. The second branch uses five layers of 5 × 5 convolutions, balancing receptive field size and computational complexity. Compared to the 7 × 7 convolutions, the 5 × 5 convolutions better capture medium-scale features while being more computationally efficient. This branch enriches the overall feature representation by extracting medium-scale features through multiple layers of 5 × 5 convolutions. The third branch directly inputs the original foggy image, preserving the initial information and details. This ensures that the original image information is retained during feature extraction, providing a reliable foundation for subsequent processing.

At the output stage, the outputs of the three branches are summed to achieve multi-scale feature fusion. By combining features from different scales, the network can comprehensively represent the image information, enhancing defogging effectiveness and feature representation capability. The fused features are then passed through five layers of 3 × 3 convolutions for further detail extraction. The 3 × 3 convolutions, with their smaller receptive field, capture fine structures and textures in the image. This multi-layered approach allows for fine-grained processing of the fused features, enhancing the detailed representation of the image.
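The receptive-field argument behind the three kernel sizes can be checked with a few lines of arithmetic. The sketch below assumes stride-1, undilated convolutions, which the text does not state explicitly; the helper name `receptive_field` is hypothetical.

```python
from typing import Sequence


def receptive_field(kernel_sizes: Sequence[int]) -> int:
    """Receptive field (in pixels) of a stack of stride-1, undilated convolutions."""
    field = 1
    for k in kernel_sizes:
        # Each stride-1 layer widens the receptive field by (k - 1) pixels.
        field += k - 1
    return field


branch_7x7 = receptive_field([7] * 5)            # five 7 x 7 layers -> 31
branch_5x5 = receptive_field([5] * 5)            # five 5 x 5 layers -> 21
refine_3x3 = receptive_field([3] * 5)            # five 3 x 3 layers -> 11
full_path = receptive_field([7] * 5 + [3] * 5)   # 7 x 7 branch plus refinement -> 41
```

Under these assumptions the 7 × 7 branch sees a 31-pixel neighbourhood, the 5 × 5 branch a 21-pixel one, and the 3 × 3 refinement stage adds a further 10 pixels of context to whichever branch feeds it.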

Finally, the output is processed through two dense layers for higher-level feature extraction and classification. These dense layers linearly combine the features extracted by the convolutional layers and enhance the model’s expressive power through nonlinear activation functions. The processing through the two dense layers results in a high-quality, defogged image.

3.2. An Image Processing Module

3.2.1. A Defogging Filter

The dark channel prior defogging filter is an effective image-defogging algorithm that removes fog by exploiting the dark channel prior of an image, thereby restoring the image’s clarity and contrast [13]. Based on the atmospheric scattering model, a foggy image can be modeled as follows:

$$I(x) = J(x)\,t(x) + A\,(1 - t(x)) \tag{1}$$

where $I(x)$ represents the foggy image and $J(x)$ denotes the scene radiance (clean image). $A$ is the global atmospheric light, and $t(x)$ is the medium transmission map, defined as follows:

$$t(x) = e^{-\beta d(x)} \tag{2}$$

where $\beta$ represents the scattering coefficient of the atmosphere and $d(x)$ denotes the scene depth.

For each pixel in the input image, the minimum value over all color channels within a local window is selected. This forms the dark channel image:

$$J^{dark}(x) = \min_{y \in \Omega(x)} \left( \min_{c \in \{r, g, b\}} I^{c}(y) \right) \tag{3}$$

where $J^{dark}$ is the dark channel image, $\Omega(x)$ represents the local window around pixel $x$, and $I^{c}(y)$ is the value of color channel $c$ at pixel $y$. The atmospheric light $A$ is then estimated by selecting the top 0.1% brightest pixels in the dark channel image and using the corresponding pixels in the original image.

The transmission map $t(x)$ describes the portion of light that is not scattered and reaches the camera. It is estimated as follows:

$$t(x) = 1 - \omega \frac{J^{dark}(x)}{A} \tag{4}$$

In this context, ω is a hyperparameter that is optimized through backpropagation by the multi-scale adaptive network to improve the performance of the defog filter for foggy image detection.
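The defogging steps can be sketched in plain Python on a small image stored as nested lists of RGB tuples in [0, 1]. This is an illustration under stated assumptions, not the authors' implementation: the recovery step inverts Equation (1) as J = (I - A) / t + A, the lower bound `t_min` on the transmission and the choice of the brightest selected pixel as the scalar A follow common practice for the dark channel prior [13], and all function names are hypothetical. In MSA-YOLO, ω would be predicted by the multi-scale adaptive module rather than fixed.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]


def dark_channel(img: Image, radius: int = 1) -> List[List[float]]:
    """Equation (3): minimum over color channels within a local window."""
    h, w = len(img), len(img[0])
    dark = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dark[y][x] = min(
                min(img[yy][xx])
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            )
    return dark


def estimate_atmospheric_light(img: Image, dark: List[List[float]],
                               top_fraction: float = 0.001) -> float:
    """Scalar A from the brightest dark-channel pixels (assumed: take the max)."""
    h, w = len(img), len(img[0])
    ranked = sorted(((dark[y][x], y, x) for y in range(h) for x in range(w)),
                    reverse=True)
    count = max(1, int(len(ranked) * top_fraction))
    return max(max(img[y][x]) for _, y, x in ranked[:count])


def defog(img: Image, omega: float = 0.95, t_min: float = 0.1,
          radius: int = 1) -> Image:
    """Estimate t(x) with Equation (4), then invert Equation (1)."""
    dark = dark_channel(img, radius)
    light = max(estimate_atmospheric_light(img, dark), 1e-6)  # avoid division by zero
    result: Image = []
    for y, row in enumerate(img):
        out_row = []
        for x, pixel in enumerate(row):
            t = max(t_min, 1.0 - omega * dark[y][x] / light)
            r, g, b = (min(1.0, max(0.0, (c - light) / t + light)) for c in pixel)
            out_row.append((r, g, b))
        result.append(out_row)
    return result
```

Setting `omega = 0` leaves the image unchanged (t = 1 everywhere), while values close to 1 remove more fog, which is why ω is a natural parameter to learn per image.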

3.2.2. White Balance Filter

The white balance algorithm adjusts the proportions of the red, green, and blue (RGB) channels in an image to ensure that white objects appear truly white. The white balance filter is defined as follows:

$$\begin{cases} r_{o} = W_{r}\, r_{i} \\ g_{o} = W_{g}\, g_{i} \\ b_{o} = W_{b}\, b_{i} \end{cases} \tag{5}$$

where $r_{i}$, $g_{i}$, and $b_{i}$ represent the values of the three color channels of the input image in the white balance filter, and $r_{o}$, $g_{o}$, and $b_{o}$ represent the corresponding output values. The coefficients $W_{r}$, $W_{g}$, and $W_{b}$ are the parameters that require optimization.

3.2.3. Gamma Filter

The Gamma filter is a nonlinear operation commonly used in image processing. It enhances the visual quality of an image by adjusting its brightness and contrast. The filter is defined as follows:

$$\begin{cases} r_{o} = r_{i}^{G} \\ g_{o} = g_{i}^{G} \\ b_{o} = b_{i}^{G} \end{cases} \tag{6}$$

where $G$ is a hyperparameter that requires optimization.
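Equations (5) and (6) are both pixel-wise operations, so they can be illustrated on a single RGB tuple. The sketch below assumes channel values normalized to [0, 1]; the function names are hypothetical, and no clipping is applied because the equations do not specify any.

```python
from typing import Tuple

Pixel = Tuple[float, float, float]


def white_balance(pixel: Pixel, weights: Pixel) -> Pixel:
    """Equation (5): scale each channel by its own coefficient (W_r, W_g, W_b)."""
    r, g, b = pixel
    w_r, w_g, w_b = weights
    return (w_r * r, w_g * g, w_b * b)


def gamma_correct(pixel: Pixel, gamma: float) -> Pixel:
    """Equation (6): raise each channel to the power G."""
    r, g, b = pixel
    return (r ** gamma, g ** gamma, b ** gamma)
```

For values in [0, 1], a gamma below 1 brightens the image and a gamma above 1 darkens it; for example, `gamma_correct((0.25, 1.0, 0.0), 0.5)` maps the first channel from 0.25 to 0.5 while leaving 0 and 1 unchanged.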

3.2.4. Contrast Filter

A contrast filter is an image processing tool used to enhance the contrast of an image. By increasing the difference between the light and dark areas, a contrast filter can make the features of the image more distinguishable and visually appealing. The primary goal of a contrast filter is to improve the image’s clarity and detail, making it more suitable for visual analysis or presentation. The filter is defined as follows:

$$Lum(P_{i}) = 0.27\, r_{i} + 0.67\, g_{i} + 0.06\, b_{i} \tag{7}$$

$$EnLum(P_{i}) = \frac{1}{2} \left( 1 - \cos \left( \pi \times Lum(P_{i}) \right) \right) \tag{8}$$

$$En(P_{i}) = P_{i} \times \frac{EnLum(P_{i})}{Lum(P_{i})} \tag{9}$$

$$P_{o} = \alpha \cdot En(P_{i}) + (1 - \alpha) \cdot P_{i} \tag{10}$$

where $P_{i}$ and $P_{o}$ denote the input and output pixel values, and $\alpha$ is the parameter that requires optimization.
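A per-pixel sketch of Equations (7) to (10) is given below, assuming channel values in [0, 1]. The small constant `eps` guarding the division in Equation (9) is an added assumption for black pixels, and the function names are hypothetical.

```python
import math
from typing import Tuple

Pixel = Tuple[float, float, float]


def luminance(pixel: Pixel) -> float:
    """Equation (7): weighted sum of the three color channels."""
    r, g, b = pixel
    return 0.27 * r + 0.67 * g + 0.06 * b


def contrast(pixel: Pixel, alpha: float, eps: float = 1e-6) -> Pixel:
    """Equations (8)-(10): blend the pixel with its contrast-enhanced version."""
    lum = min(max(luminance(pixel), 0.0), 1.0)
    enhanced_lum = 0.5 * (1.0 - math.cos(math.pi * lum))    # Equation (8)
    scale = enhanced_lum / (lum + eps)                        # Equation (9)
    r, g, b = pixel
    return (
        alpha * (r * scale) + (1.0 - alpha) * r,              # Equation (10)
        alpha * (g * scale) + (1.0 - alpha) * g,
        alpha * (b * scale) + (1.0 - alpha) * b,
    )
```

The cosine curve in Equation (8) pushes luminance below 0.5 toward 0 and luminance above 0.5 toward 1, leaving mid-gray almost unchanged, so α controls how strongly dark and bright regions are separated.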

3.2.5. Tone Filter

A tone filter is an image processing tool designed to adjust the tonal range and distribution of an image. It enhances the visual quality by manipulating highlights, midtones, and shadows, thus bringing out details, improving contrast, and creating a more balanced and aesthetically pleasing image. The filter is defined as follows [29]:

$$P_{o} = \frac{1}{T_{L}} \sum_{j = 0}^{L - 1} \mathrm{clip}(L \cdot P_{i} - j, 0, 1)\, t_{j} \tag{11}$$

where the $L$ parameters are represented as $\{t_{0}, t_{1}, \dots, t_{L - 1}\}$ and $T_{L} = \sum_{l = 0}^{L - 1} t_{l}$. In a tone filter, $\{t_{0}, t_{1}, \dots, t_{L - 1}\}$ are the parameters that require optimization.
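Equation (11) describes a monotonic piecewise-linear curve whose segment slopes are proportional to the parameters. A minimal sketch for one channel value in [0, 1] follows; it assumes positive parameters indexed by the summation variable j, and the function name `tone_curve` is hypothetical.

```python
from typing import Sequence


def tone_curve(value: float, t: Sequence[float]) -> float:
    """Equation (11): piecewise-linear tone mapping with L = len(t) segments."""
    levels = len(t)
    total = sum(t)  # T_L, the normalizing constant
    acc = 0.0
    for j in range(levels):
        # clip(L * P_i - j, 0, 1): how much of segment j lies below the input value
        covered = min(max(levels * value - j, 0.0), 1.0)
        acc += covered * t[j]
    return acc / total
```

With all parameters equal the curve is the identity. With `t = [0.5, 1.0, 2.0, 0.5]`, an input of 0.5 fully covers the first two segments and maps to (0.5 + 1.0) / 4 = 0.375, so the lower half of the range is compressed and the third segment is stretched.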

3.2.6. Sharpen Filter

A sharpen filter is an image processing tool used to enhance details and edges in an image, making it appear clearer. By emphasizing the edges and fine structures in the image, a sharpen filter can make the visual effect more vivid and dynamic. The filter is defined as follows [30]:

$$F(x, \lambda) = I(x) + \lambda \left( I(x) - Gau(I(x)) \right) \tag{12}$$

where $I(x)$ represents the input image, $Gau(I(x))$ represents the Gaussian-filtered image, and $\lambda$ is a positive scaling factor. In a sharpen filter, $\lambda$ is the parameter that requires optimization.
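Equation (12) is an unsharp-masking operation, which can be sketched on a single-channel image stored as nested lists. The 3 × 3 binomial kernel and the edge-replicating border handling are assumptions chosen for brevity, since the text does not specify the Gaussian kernel; the function names are hypothetical.

```python
from typing import List

Gray = List[List[float]]

# 3 x 3 binomial approximation of a Gaussian kernel (weights sum to 16).
KERNEL = ((1, 2, 1), (2, 4, 2), (1, 2, 1))


def gaussian_blur3(img: Gray) -> Gray:
    """Blur with the 3 x 3 kernel, replicating edge pixels at the border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc / 16.0
    return out


def sharpen(img: Gray, lam: float) -> Gray:
    """Equation (12): add lam times the difference between image and blur."""
    blurred = gaussian_blur3(img)
    return [[i + lam * (i - b) for i, b in zip(row_i, row_b)]
            for row_i, row_b in zip(img, blurred)]
```

Flat regions are unchanged because the image equals its blur there, while a step edge overshoots on both sides, which is what makes edges appear crisper. The result can leave [0, 1], so a practical pipeline would likely clip afterwards.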

3.3. Detection Network Module

YOLOv3 utilizes a novel network architecture known as Darknet-53, which comprises 53 convolutional layers. Unlike previous models that use fully connected layers, YOLOv3 employs anchor boxes for predicting bounding boxes. It performs these predictions at three different scales, enabling more accurate detection of objects of varying sizes. Each prediction layer corresponds to a specific scale, enhancing the model’s flexibility and precision. Darknet-53 leverages residual blocks, skip connections, and upsampling techniques to extract more meaningful features from images, thus improving learning and detection accuracy. Due to its outstanding performance in object detection, YOLOv3 is highly suitable for applications such as image editing, security surveillance, crowd detection, and autonomous driving [31,32]. In the detection network presented in this paper, we employ an identical network architecture to detect obstacles in rail transit systems.

4. Experiments

4.1. Data Training

The dataset comprises five annotated object classes: person, bicycle, car, bus, and motorcycle. These classes are sourced from two public datasets, VOC2007 and VOC2012. The foggy images are generated using Equations (1) and (2), where $d(x)$ is defined as follows:

$$d(x) = -0.04 \times \rho + \sqrt{\max(row, col)} \tag{13}$$

where $\rho$ represents the Euclidean distance from the current pixel to the central pixel, and $row$ and $col$ denote the number of rows and columns in the image, respectively. By setting $A = 0.5$ and $\beta = 0.01 \times i + 0.05$, where $i$ is an integer from 0 to 9, ten different levels of fog can be applied to each image.
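The fog synthesis described above combines Equations (1), (2), and (13). The following sketch applies it to an image stored as nested lists of RGB tuples in [0, 1]; taking the central pixel as `(rows // 2, cols // 2)` is an assumption, and the function name `add_fog` is hypothetical.

```python
import math
from typing import List, Tuple

Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]


def add_fog(img: Image, level: int, light: float = 0.5) -> Image:
    """Synthesize fog of a given level (0 to 9) on a clean image."""
    beta = 0.01 * level + 0.05
    rows, cols = len(img), len(img[0])
    cy, cx = rows // 2, cols // 2            # assumed location of the central pixel
    base = math.sqrt(max(rows, cols))
    foggy: Image = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            rho = math.hypot(y - cy, x - cx)
            depth = -0.04 * rho + base        # Equation (13)
            t = math.exp(-beta * depth)       # Equation (2)
            r, g, b = img[y][x]
            out_row.append((                  # Equation (1)
                r * t + light * (1.0 - t),
                g * t + light * (1.0 - t),
                b * t + light * (1.0 - t),
            ))
        foggy.append(out_row)
    return foggy
```

Because the depth is largest at the image center, the synthetic fog is densest there and thins toward the corners, and a higher level raises β and therefore lowers the transmission everywhere.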

To accommodate obstacle detection in rail transit under both normal and foggy weather conditions, a hybrid approach has been adopted for the training dataset. This method enhances the model’s robustness across various weather conditions and improves its applicability in real-world scenarios. Before each image is fed into the network for training, there is a 2/3 probability that it will be randomly augmented with simulated fog. This augmentation enables the model to learn and adapt to obstacle detection tasks in foggy conditions. By combining normal images with foggy ones, the entire training pipeline is executed end-to-end using YOLOv3 detection loss, ensuring the model learns to detect obstacles in both normal and foggy environments during training.
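The hybrid sampling rule can be sketched as a small helper that decides, per training image, whether to add fog and at which level. The 2/3 probability comes from the text; drawing the fog level uniformly from the ten available levels is an assumption, and the function name is hypothetical.

```python
import random
from typing import Optional


def sample_fog_level(rng: random.Random, p_fog: float = 2.0 / 3.0) -> Optional[int]:
    """Return a fog level in 0..9 with probability p_fog, otherwise None (keep clear)."""
    if rng.random() < p_fog:
        # Assumed: each of the ten fog levels is equally likely.
        return rng.randint(0, 9)
    return None


rng = random.Random(0)  # fixed seed only to make the example reproducible
levels = [sample_fog_level(rng) for _ in range(8)]
```

Images for which the helper returns `None` are passed to the network unchanged, so roughly one third of each batch remains clear-weather data.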

Furthermore, to enhance the model’s detection performance, a multi-scale adaptive module has been integrated into the training process. This module is weakly supervised through detection loss, eliminating the need for manually annotated ground truth images and thus reducing the dependency on large-scale labeled datasets. The multi-scale adaptive module can extract and fuse features at different scales, allowing the model to more accurately detect obstacles of various sizes and shapes. This multi-scale feature extraction and fusion method significantly enhances the model’s performance in complex scenarios.

4.2. Implementation Details

The MSA-YOLO model is trained using the Adam optimizer over a total of 80 epochs. The initial learning rate is set to $10^{-4}$ and decays to a final learning rate of $10^{-6}$. The learning rate is adjusted using a warm-up phase followed by a cosine annealing schedule. During training, the MSA-YOLO model predicts bounding boxes at three different scales, with three anchors at each scale to ensure accurate detection of objects of various sizes. This multi-scale prediction approach enhances the model’s robustness and accuracy in detecting objects of different sizes and shapes.
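A warm-up followed by cosine annealing can be written as a single function of training progress. In the sketch below the 80 epochs and the two learning-rate endpoints come from the text, whereas the two-epoch linear warm-up starting from zero is an assumption, since the warm-up length is not reported; the function name is hypothetical.

```python
import math


def learning_rate(epoch: float, total_epochs: float = 80.0,
                  warmup_epochs: float = 2.0,
                  lr_init: float = 1e-4, lr_end: float = 1e-6) -> float:
    """Linear warm-up to lr_init, then cosine annealing down to lr_end."""
    if epoch < warmup_epochs:
        # Assumed: warm-up ramps linearly from 0 to lr_init.
        return lr_init * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return lr_end + 0.5 * (lr_init - lr_end) * (1.0 + math.cos(math.pi * progress))
```

The schedule peaks at the end of the warm-up and then decays smoothly, reaching the final learning rate at the last epoch. Passing a fractional `epoch` allows the rate to be updated per step rather than per epoch.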

Our experiments are conducted using the TensorFlow framework and executed on an NVIDIA RTX3090 Ti GPU (Santa Clara, CA, USA). The computational power of this high-performance GPU significantly accelerates the training process, enabling us to complete numerous training iterations within a reasonable timeframe. This setup ensures that the model can be efficiently trained to achieve high performance in object detection tasks.

4.3. Experimental Results

4.3.1. Defogging Experimental Results

To validate the effectiveness of the defogging network proposed in this paper, a comparative analysis was conducted between the IA network [33] and the multi-scale adaptive (MSA) network introduced in this study. With β values ranging from 0.05 to 0.14, the defogging effects of the IA and the multi-scale adaptive networks are illustrated in Figure 2. Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) are employed as technical metrics to evaluate the defogging performance of different algorithms. As shown in the figure, the PSNR for the fog image, the defogged image using IA, and the defogged image using MSA are 13.4221, 12.713, and 13.472, respectively; the SSIM values are 0.6594, 0.7398, and 0.7672, respectively. From these PSNR and SSIM values, it is evident that MSA achieves the most satisfactory defogging results for railway images, exhibiting the highest PSNR and SSIM values. The superior performance of MSA can be attributed to the incorporation of a multi-scale network, which enables the algorithm to adaptively remove dense fog while simultaneously enhancing critical target information within the images through multiple filters.
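For reference, PSNR can be computed from the mean squared error between a clean reference image and a processed image. The sketch below works on flattened pixel sequences and assumes a peak value of 1.0 for images normalized to [0, 1]; the paper does not state its normalization, and the function name is hypothetical. SSIM is omitted because it requires local window statistics.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], test: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01 and therefore a PSNR of 20 dB, which puts the values of about 13 dB reported above into perspective: the foggy and defogged images still differ substantially from the clean reference.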

To further validate the reliability of the proposed algorithm, we conducted tests on PSNR and SSIM using 200 images. The experimental results are presented in Table 1. As shown in the table, the proposed MSA network consistently outperforms the IA algorithm. Specifically, MSA achieves higher PSNR and SSIM values, indicating its significant advantage in preserving image quality and detail. The MSA algorithm is better equipped to handle images with varying fog densities, thereby enhancing the accuracy and robustness of target detection. These results further confirm the potential and reliability of MSA in practical applications, providing an effective solution for defogging railway images.

4.3.2. Object Detection Experimental Results

To validate the effectiveness of the object detection algorithm proposed in this paper, a comparative analysis was conducted between IA-YOLO [33] and MSA-YOLO. Figure 3 illustrates the object detection performance of IA-YOLO and MSA-YOLO under different β values and various foggy conditions. As shown in the figure, when β < 0.1, both IA-YOLO and MSA-YOLO effectively detect cars. However, when β ≥ 0.1, IA-YOLO fails to detect cars in foggy environments, whereas MSA-YOLO continues to perform effectively. This indicates that MSA-YOLO excels at removing dense fog while preserving and enhancing critical obstacle information in the images.

The MSA module improves the network’s ability to detect objects of varying scales by integrating high-resolution detail information with low-resolution global information. High-resolution detail information allows the system to capture fine features of target objects, while low-resolution global information provides a comprehensive view, enabling the system to recognize a wider range of target objects in complex environments. This multi-scale information fusion allows MSA-YOLO to perform exceptionally well in various conditions, particularly in complex, foggy weather. The model can accurately identify and locate obstacles in the images, thereby enhancing the system’s robustness and detection accuracy.

Figure 4 shows the obstacle detection results of MSA-YOLO and IA-YOLO under varying fog densities. In the first column, IA-YOLO misses a bus. In the second column, IA-YOLO detects a section of the road as a bus. In the third column, IA-YOLO identifies a tree trunk as a pedestrian. However, MSA-YOLO correctly identifies the objects in the images. These examples demonstrate MSA-YOLO’s excellent performance in identifying various objects of different sizes, shapes, and orientations in complex scenes. This achievement proves the high sensitivity and accuracy of our algorithm, further validating the advancement and practicality of MSA-YOLO in object detection. These visual examples showcase the broad potential and value of MSA-YOLO in real-world applications.

5. Conclusions

We propose the MSA-YOLO algorithm to enhance obstacle detection performance in rail transit under foggy conditions. The input image is processed through a series of filters: defogging, white balance, Gamma, contrast, tone, and sharpening. The MSA effectively removes dense fog, while preserving and enhancing critical obstacle information, facilitating YOLO in obstacle detection. Experimental results demonstrate that the proposed MSA-YOLO algorithm achieves higher detection accuracy in foggy scenarios.

While the MSA-YOLO algorithm has shown promising results, several areas for future work remain. First, the algorithm could be further optimized for real-time performance in practical rail transit applications. Enhancing computational efficiency without compromising detection accuracy will be critical for deployment in real-world scenarios. Second, testing and refining the algorithm on more diverse and extensive datasets, including real-world foggy weather images, will help to improve its generalizability and robustness. Additionally, exploring the integration of the MSA-YOLO algorithm with other advanced defogging techniques and adaptive learning mechanisms could further enhance its performance.

Author Contributions

Study conception, design, and original draft writing: J.C.; data collection: D.L.; analysis, interpretation of results and writing review/editing: J.C. and W.Q.; draft manuscript preparation and editing: Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Grants 2023AY31004, the Research Project of Jiaxing Civil Technology Innovation Research, and Innovative Education Special Project for Intelligent Navigation Applications in 2023 under Grant 2023BD022.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

Authors Weiqiang Qu and Zhiwei Wang were employed by the companies Zhejiang Stream Rail Intelligent Control Technology Co., Ltd. and Shanghai Stream Rail Transportation Equipment Co., Ltd. Author Jian Chen was also employed by the company Zhejiang Stream Rail Intelligent Control Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

He, D.; Ren, R.; Li, K.; Zou, Z.; Ma, R.; Qin, Y.; Yang, W. Urban rail transit obstacle detection based on improved R-CNN. Measurement 2022, 196, 111277.
Guo, J.; Lou, H.; Chen, H.; Liu, H.; Gu, J.; Bi, L.; Duan, X. A new detection algorithm for alien intrusion on highway. Sci. Rep. 2023, 13, 10667.
Meng, C.; Wang, Z.; Shi, L.; Gao, Y.; Tao, Y.; Wei, L. SDRC-YOLO: A novel foreign object intrusion detection algorithm in railway scenarios. Electronics 2023, 12, 1256.
Li, B.; Tan, L.; Wang, F.; Liu, L. A railway intrusion detection method based on decomposition and semi-supervised learning for accident protection. Accid. Anal. Prev. 2023, 189, 107124.
Arfi, A.M.; Bal, D.; Hasan, M.A.; Islam, N.; Arafat, Y. Real time human face detection and recognition based on Haar features. In Proceedings of the 2020 IEEE Region 10 Symposium, Dhaka, Bangladesh, 5–7 June 2020; pp. 517–521.
Sultana, M.; Ahmed, T.; Chakraborty, P.; Khatun, M.; Hasan, M.R.; Uddin, M.S. Object detection using template and HOG feature matching. Int. J. Adv. Comput. Sci. 2020, 11, 233–238.
Chhabra, P.; Garg, N.K.; Kumar, M. Content-based image retrieval system using ORB and SIFT features. Neural Comput. Appl. 2020, 32, 2725–2733.
Hechri, A.; Mtibaa, A. Two-stage traffic sign detection and recognition based on SVM and convolutional neural networks. IET Image Process. 2020, 14, 939–946.
Chen, L.; Liu, Z.; Tong, L.; Jiang, Z.; Wang, S.; Dong, J.; Zhou, H. Underwater object detection using invert multi-class Adaboost with deep learning. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8.
Lee, J.; Hwang, K.I. YOLO with adaptive frame control for real-time object detection applications. Multimed. Tools Appl. 2022, 81, 36375–36396.
He, D.; Zou, Z.; Chen, Y.; Liu, B.; Miao, J. Rail transit obstacle detection based on improved CNN. IEEE Trans. Instrum. Meas. 2021, 70, 1–14.
Ye, T.; Zhao, Z.; Wang, S.; Zhou, F.; Gao, X. A stable lightweight and adaptive feature enhanced convolution neural network for efficient railway transit object detection. IEEE Trans. Intell. Transp. Syst. 2022, 23, 17952–17965.
He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353.
Sindagi, V.A.; Oza, P.; Yasarla, R.; Patel, V.M. Prior-based domain adaptive object detection for hazy and rainy conditions. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 13 November 2020; pp. 763–780.
Rodriguez, L.F.; Uribe, J.A.; Bonilla, J.V. Obstacle detection over rails using hough transform. In Proceedings of the 2012 XVII Symposium of Image, Signal Processing, and Artificial Vision (STSIVA), Medellin, Colombia, 12–14 September 2012; pp. 317–322.
Ye, H.; Shang, G.; Wang, L.; Zheng, M. A new method based on hough transform for quick line and circle detection. In Proceedings of the 2015 8th International Conference on Biomedical Engineering and Informatics (BMEI), Shenyang, China, 14–16 October 2015; pp. 52–56.
Karaduman, M. Image processing based obstacle detection with laser measurement in railways. In Proceedings of the 2017 10th International Conference on Electrical and Electronics Engineering (ELECO), Bursa, Turkey, 30 November–2 December 2017; pp. 899–903.
Mukojima, H.; Deguchi, D.; Kawanishi, Y.; Ide, I.; Murase, H.; Ukai, M.; Nagamine, N.; Nakasone, R. Moving camera background-subtraction for obstacle detection on railway tracks. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3967–3971.
Yao, T.; Dai, S.; Wang, P.; He, Y. Image based obstacle detection for automatic train supervision. In Proceedings of the 2012 5th International Congress on Image and Signal Processing, Chongqing, China, 16–18 October 2012; pp. 1267–1270.
Wang, Z.; Wu, X.; Yan, Y.; Jia, C.; Cai, B.; Huang, Z.; Wang, G.; Zhang, T. An inverse projective mapping-based approach for robust rail track extraction. In Proceedings of the 2015 8th International Congress on Image and Signal Processing (CISP), Shenyang, China, 14–16 October 2015; pp. 888–893.
Ubbens, T.W.; Schuurman, D.C. Vision-based obstacle detection using a support vector machine. In Proceedings of the 2009 Canadian Conference on Electrical and Computer Engineering, St. John’s, NL, Canada, 3–6 May 2009; pp. 459–462.
Sattiraju, R.; Kochems, J.; Schotten, H.D. Machine learning based obstacle detection for Automatic Train Pairing. In Proceedings of the 2017 IEEE 13th International Workshop on Factory Communication Systems (WFCS), Trondheim, Norway, 31 May–2 June 2017; pp. 1–4.
Arvind, C.S.; Senthilnath, J. Autonomous vehicle for obstacle detection and avoidance using reinforcement learning. In Proceedings of the Soft Computing for Problem Solving: SocProS 2018, Singapore, 28 November 2020; Volume 1, pp. 55–66.
Kapoor, R.; Goel, R.; Sharma, A. An intelligent railway surveillance framework based on recognition of object and railway track using deep learning. Multimed. Tools Appl. 2022, 81, 21083–21109.
Pan, H.; Li, Y.; Wang, H.; Tian, X. Railway obstacle intrusion detection based on convolution neural network multitask learning. Electronics 2022, 11, 2697.
Wang, Z.; Wu, X.; Yu, G.; Li, M. Efficient rail area detection using convolutional neural network. IEEE Access 2018, 6, 77656–77664.
Li, J.; Zhou, F.; Ye, T. Real-world railway traffic detection based on faster better network. IEEE Access 2018, 6, 68730–68739.
Zhang, Z.; Trivedi, C.; Liu, X. Automated detection of grade-crossing-trespassing near misses based on computer vision analysis of surveillance video data. Saf. Sci. 2018, 110, 276–285.
Hu, Y.; He, H.; Xu, C.; Wang, B.; Lin, S. Exposure: A white-box photo post-processing framework. ACM Trans. Graph. 2018, 37, 1–17.
Polesel, A.; Ramponi, G.; Mathews, V.J. Image enhancement via adaptive unsharp masking. IEEE Trans. Image Process. 2000, 9, 505–510.
Zheng, S.; Wu, J.; Duan, S.; Liu, F.; Pan, J. An improved crowd counting method based on YOLOv3. Mob. Netw. Appl. 2022, 1–9.
Ma, X.; Ouyang, W.; Simonelli, A.; Ricci, E. 3d object detection from images for autonomous driving: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 10, 3537–3556.
Liu, W.; Ren, G.; Yu, R.; Guo, S.; Zhu, J.; Zhang, L. Image-adaptive YOLO for object detection in adverse weather conditions. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 22 February–1 March 2022; Volume 36, pp. 1792–1800.

Figure 1. The MSA-YOLO framework.

Figure 2. Defogging effects of different algorithms.

Figure 3. Detection performance of IA-YOLO and MSA-YOLO at varying fog densities.

Figure 4. Detection results of IA-YOLO and MSA-YOLO.

Table 1. Defogging results of railway images.

Metric	IA	MSA
PSNR (dB)	15.8945	17.4097
SSIM	0.7735	0.8044
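
For readers unfamiliar with the first metric in Table 1, the sketch below computes PSNR from its standard definition, 10·log10(MAX² / MSE), where MSE is the mean squared error between a fog-free reference image and the defogged result. It is a minimal illustration, not the evaluation code used in the paper: the function name, the flattened list-of-intensities image representation, and the 8-bit peak value of 255 are assumptions. SSIM is not shown because it requires local window statistics.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are given as flat sequences of pixel intensities
    (an assumption made to keep the example dependency-free).
    """
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("images must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    clean = [100.0, 120.0, 140.0, 160.0]
    defogged = [110.0, 110.0, 150.0, 150.0]
    print(f"PSNR: {psnr(clean, defogged):.2f} dB")
```

A higher PSNR means the defogged image is closer to the reference, which is how the larger MSA value in Table 1 should be read.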

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

